App markets, being critical to today's mobile ecosystem, have also become a natural malware delivery channel since they "lend credibility" to malicious apps. In the past decade, machine learning (ML) techniques have been explored for automated, robust malware detection. Unfortunately, to date, we have yet to see an ML-based malware detection solution deployed at market scale. To better understand the real-world challenges, we conduct a collaborative study with a major Android app market (T-Market) that offers us large-scale ground-truth data. Our study shows that successfully developing such systems depends on several factors, including feature selection/engineering, app analysis speed, developer engagement, and model evolution. Failure in any of these aspects would lead to the "wooden barrel effect" for the entire system. We discuss our careful design choices as well as our first-hand deployment experiences in building such an ML-powered malware detection system. We implement our design and examine its effectiveness in T-Market for over one year, using a single commodity server to vet ∼10K apps every day. The evaluation results show that this design achieves an overall precision of 98% and recall of 96% with an average per-app scan time of 1.3 minutes.
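The figures reported in the abstract can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the confusion-matrix counts are made up to reproduce the reported percentages, and the parallelism estimate assumes scans are spread evenly over a 24-hour day, which the abstract does not state.

```python
import math
from typing import Tuple


def required_parallelism(apps_per_day: int, minutes_per_app: float) -> int:
    """Minimum number of concurrent scans needed to clear the daily load.

    Assumption (not from the source): work is spread evenly over 24 hours.
    """
    total_scan_minutes = apps_per_day * minutes_per_app
    minutes_in_day = 24 * 60
    return math.ceil(total_scan_minutes / minutes_in_day)


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)


# Reported workload: about 10K apps per day at 1.3 minutes per app.
slots = required_parallelism(10_000, 1.3)

# Hypothetical counts chosen only to match the reported 98% / 96%.
precision, recall = precision_recall(tp=96, fp=2, fn=4)
```

Under these assumptions, 10,000 apps at 1.3 minutes each amount to 13,000 scan-minutes per day against 1,440 minutes available, so a single server would need to run roughly ten scans concurrently to keep up.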
|Original language||English (US)|
|Title of host publication||Proceedings of the 15th European Conference on Computer Systems, EuroSys 2020|
|Publisher||Association for Computing Machinery, Inc|
|State||Published - Apr 15 2020|
|Event||15th European Conference on Computer Systems, EuroSys 2020 - Heraklion, Greece (Duration: Apr 27 2020 → Apr 30 2020)|
Bibliographical note
Funding Information:
We sincerely thank our shepherd Prof. Jon Crowcroft and the anonymous reviewers for their valuable feedback. We also appreciate Weizhi Li, Yang Li, Zipeng Wu, and Hai Long for their contributions to the data collection and system deployment of APICHECKER. This work is supported in part by the National Key R&D Program of China under grant 2018YFB1004700, the National Natural Science Foundation of China (NSFC) under grants 61822205, 61902211, 61632020 and 61632013, and the Beijing National Research Center for Information Science and Technology (BNRist).
© 2020 ACM.