Systematically Landing Machine Learning onto Market-Scale Mobile Malware Detection

Liangyi Gong, Hao Lin, Zhenhua Li, Feng Qian, Yang Li, Xiaobo Ma, Yunhao Liu

Research output: Contribution to journal › Article › peer-review

Abstract

Despite being crucial to today's mobile ecosystem, app markets have meanwhile become a natural, convenient malware delivery channel as they actually 'lend credibility' to malicious apps. In the past few years, machine learning (ML) techniques have been widely explored for automated, robust malware detection, but till now we have not seen an ML-based malware detection solution applied at market scales. To systematically understand the real-world challenges, we conduct a collaborative study with T-Market, a popular Android app market that offers us large-scale ground-truth data. Our study illustrates that the key to successfully developing such systems is multifold, including feature selection and encoding, feature engineering and exposure, app analysis speed and efficacy, developer and user engagement, as well as ML model evolution. Failure in any of the above aspects could lead to the 'wooden barrel effect' of the whole system. This article presents our judicious design choices and first-hand deployment experiences in building a practical ML-powered malware detection system. It has been operational at T-Market, using a single commodity server to check ~12K apps every day, and has achieved an overall precision of 98.9 percent and recall of 98.1 percent with an average per-app scan time of 0.9 minutes.

Original language: English (US)
Article number: 9301262
Pages (from-to): 1615-1628
Number of pages: 14
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 32
Issue number: 7
State: Published - Jul 1 2021

Bibliographical note

Funding Information:
This work was supported in part by the National Key R&D Program of China under Grant 2018YFB1004700, in part by the NSF of China under Grants 61902211, 61972313, 61822205, 61632020, and 61632013, in part by the NSF of Tianjin under Grant 18JCQNJC69900, in part by the Postdoctoral Science Fund of China under Grant 2019M663725, and in part by the BNRist. Liangyi Gong and Hao Lin are co-primary authors.

Publisher Copyright:
© 1990-2012 IEEE.

Keywords

  • Android emulation
  • Machine learning
  • app market
  • dynamic analysis
  • mobile malware detection
