Labeling malware samples with their appropriate malware family helps researchers understand and track malware evolution and develop mitigation techniques. Current malware analysis techniques that use supervised machine learning rely on classification models that are trained on malware traffic generated in a sandbox environment. These models are then used to classify future unseen observations. In practice, however, malware traffic comes mixed with legitimate background traffic from host machines, such as user browsing and software update traffic. Hence, the classifier's accuracy in predicting the correct malware label on unseen (mixed) traffic is low. We propose a novel classification system that uses an Independent Component Analysis (ICA) module that applies distribution decomposition to separate the observed traffic into two components: malware traffic and background traffic. We also use a random forest classifier module to learn a classification model for every malware family, and then use it to predict malware family labels from the output of the ICA module. The system is thus capable of labeling malware traffic after removing background artifacts (noise), which makes it more efficient and accurate than current classification methods. Our experiments on three malware family datasets show that the performance of our system improves significantly after removing the background traffic artifacts.
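The separation step described in the abstract can be illustrated with a small, self-contained sketch. This is not the paper's ICA module: the abstract does not specify the features, the ICA variant, or how many observed signals are available, so everything below is an assumption made for illustration. It assumes two observed feature series (`obs1`, `obs2`), each a linear mix of a hypothetical periodic "malware" source and a hypothetical bursty "background" source, and it uses a simple kurtosis-based two-component ICA (whitening followed by a rotation search) written with the Python standard library only. The random forest stage is omitted.

```python
import math
import random


def rotate(u, v, angle):
    """Rotate the 2-D point cloud (u, v) by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * p + s * q for p, q in zip(u, v)],
            [-s * p + c * q for p, q in zip(u, v)])


def whiten(x1, x2):
    """Center two signals, decorrelate them, and scale each to unit variance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    caa = sum(v * v for v in a) / n
    cbb = sum(v * v for v in b) / n
    cab = sum(p * q for p, q in zip(a, b)) / n
    # Closed-form principal-axis angle of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cab, caa - cbb)
    p1, p2 = rotate(a, b, theta)
    s1 = math.sqrt(sum(v * v for v in p1) / n)
    s2 = math.sqrt(sum(v * v for v in p2) / n)
    return [v / s1 for v in p1], [v / s2 for v in p2]


def excess_kurtosis(z):
    """Excess kurtosis of a zero-mean, unit-variance signal."""
    return sum(v ** 4 for v in z) / len(z) - 3.0


def separate(x1, x2, steps=180):
    """Two-component ICA: pick the rotation that maximizes non-Gaussianity."""
    w1, w2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        # Angles in [0, pi/2) cover all solutions up to order and sign.
        y1, y2 = rotate(w1, w2, (math.pi / 2.0) * k / steps)
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


def abs_corr(u, v):
    """Absolute Pearson correlation between two equal-length signals."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return abs(cov / (su * sv))


# Hypothetical sources: periodic beaconing vs. bursty user/background activity.
random.seed(0)
n = 2000
malware = [1.0 if (i // 25) % 2 == 0 else -1.0 for i in range(n)]
background = [random.expovariate(1.0) * random.choice((-1.0, 1.0))
              for _ in range(n)]

# Hypothetical observations: each one mixes both sources (assumed weights).
obs1 = [1.0 * m + 0.6 * b for m, b in zip(malware, background)]
obs2 = [0.4 * m + 1.0 * b for m, b in zip(malware, background)]

y1, y2 = separate(obs1, obs2)
match_malware = max(abs_corr(y1, malware), abs_corr(y2, malware))
match_background = max(abs_corr(y1, background), abs_corr(y2, background))
print("best match to malware source:", round(match_malware, 3))
print("best match to background source:", round(match_background, 3))
```

ICA recovers components only up to order and sign, which is why the sketch compares absolute correlations and takes the better of the two recovered components. In a pipeline like the one the abstract describes, the component identified as malware traffic would then be passed to a per-family classifier.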
|Original language||English (US)|
|Title of host publication||2015 IEEE Conference on Communications and Network Security, CNS 2015|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||9|
|State||Published - Dec 3 2015|
|Event||3rd IEEE International Conference on Communications and Network Security, CNS 2015 - Florence, Italy|
Duration: Sep 28 2015 → Sep 30 2015
|Name||2015 IEEE Conference on Communications and Network Security, CNS 2015|
|Other||3rd IEEE International Conference on Communications and Network Security, CNS 2015|
|Period||9/28/15 → 9/30/15|
Bibliographical note
Funding Information:
This research was supported in part by NSF grants CNS-1117536, CRI-1305237, CNS-1411636 and DTRA grant HDTRA1-14-1-0040 and DoD ARO MURI Award W911NF-12-1-0385.
© 2015 IEEE.