Experiences of Landing Machine Learning onto Market-Scale Mobile Malware Detection
Liangyi Gong, Zhenhua Li, Feng Qian, Zifan Zhang, Qi Alfred Chen, Zhiyun Qian, Hao Lin, Yunhao Liu

Mobile Malware Detection
⚫ Android app markets “lend credibility” (diagram: Mobile App Markets, Mobile Users)
⚫ Current Mobile App Review
✓ Fingerprint-based Antivirus Checking
✓ Expert-informed API Inspection
✓ User-report-driven Manual Examination
✓ API-based Dynamic Analysis

Mobile Malware Detection
⚫ Android app markets “lend credibility” (diagram: Mobile App Markets, Mobile Users)
⚫ ML-based Mobile App Review Techniques
▪ Fingerprint-based Antivirus Checking
▪ Static Code Inspection
▪ Dynamic Behavior Analysis

ML-based Detection at Market Scales
⚫ ML-based Malware Detection: widely explored in the past decade
⚫ ML-based Solutions at Market Scales: no existing report of the effectiveness
⚫ Real-world challenges?

Large-scale Dataset: API-centric, Dynamic
• 500K apps submitted to Tencent Market (https://sj.qq.com/)
• From March to December 2017
• Containing apps’ malice labels
• Pipeline: APKs from Tencent Market are emulated on commodity servers; Monkey (UI event stream) triggers APIs to output logs; the logs become a one-hot feature vector
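The last step of the pipeline above, turning an API log into a one-hot feature vector, can be sketched as follows. This is a minimal illustration, not the authors' code: the tracked API list, the log contents, and the function name `one_hot_vector` are all assumptions made for the example.

```python
from typing import Iterable, List, Sequence


def one_hot_vector(invoked_apis: Iterable[str], tracked_apis: Sequence[str]) -> List[int]:
    """Return 1 for each tracked API seen at least once in the log, else 0."""
    seen = set(invoked_apis)
    return [1 if api in seen else 0 for api in tracked_apis]


# Hypothetical tracked APIs and a hypothetical log from one emulated run.
TRACKED = [
    "SmsManager_sendTextMessage",
    "WifiInfo_getMacAddress",
    "HttpURLConnection_connect",
]
log = [
    "HttpURLConnection_connect",
    "SmsManager_sendTextMessage",
    "HttpURLConnection_connect",  # repeated calls still map to a single 1
]
vector = one_hot_vector(log, TRACKED)
```

The vector records only whether an API was invoked, not how often, which is what a one-hot encoding implies.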

API Selection: Correlation
⚫ APIs’ correlations with the malice of apps
▪ Using SRC (Spearman’s rank correlation coefficient) to evaluate APIs’ correlation with apps’ malice
▪ 260 APIs pose non-trivial correlation (|SRC| ≥ 0.2)
⚫ Time consumption of tracking different API sets
▪ Tracking highly correlated APIs
▪ Fitting a tri-modal distribution
▪ Indicating a complex relationship
[Figure: |SRC| (0 to 0.6) versus ranking of API (0 to 1000)]
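The |SRC| ≥ 0.2 filter can be illustrated with a small standard-library sketch. Spearman's coefficient is computed here as the Pearson correlation of average ranks; the toy usage matrix, the API names, and the helper names are assumptions, and treating a constant column as uncorrelated is a choice of this sketch rather than something the slides state.

```python
from math import sqrt
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return cov / sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def select_apis(usage: Dict[str, List[int]], labels: List[int], threshold: float = 0.2) -> List[str]:
    """Keep APIs whose |SRC| with the malice labels reaches the threshold."""
    return [api for api, col in usage.items() if abs(spearman(col, labels)) >= threshold]


# Toy data: four apps, label 1 = malicious; API names are placeholders.
labels = [1, 1, 0, 0]
usage = {"api_a": [1, 1, 0, 0], "api_b": [1, 0, 1, 0], "api_c": [0, 0, 1, 1]}
selected = select_apis(usage, labels)
```

Taking the absolute value keeps APIs that are negatively correlated with malice as well, since those also carry signal.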

API Selection: Model & Accuracy
⚫ Machine Learning Model & Detection Accuracy
⚫ Tracking top-490 correlated APIs achieves the highest precision/recall

Model          Precision  Recall  Training Time
Naive Bayes    60.4%      59.6%   3.6 min
LR             81.2%      70.3%   10.4 min
SVM            87.9%      71.6%   ∼27K min
GBDT           88.4%      74.3%   364 min
kNN            86.5%      83.7%   ∼1.8K min
CART           87.6%      84.3%   11.6 min
ANN            90.8%      89.9%   ∼1.2K min
DNN            91.5%      90.9%   ∼1.9K min
Random Forest  91.6%      90.2%   29.1 min
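For readers comparing the precision and recall columns, the two metrics can be computed from labels and predictions as in the sketch below. The label convention (1 = malicious), the sample data, and the function name are assumptions for illustration only.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 1 marks malware."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical ground truth and model output for six apps.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

In a market setting, low precision means benign apps are wrongly rejected, while low recall means malware slips through.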

Key API Selection Strategy
⚫ Step 1. Selecting APIs with the highest correlation with malware (Set-C).
⚫ Step 2. Selecting APIs that relate to restrictive permissions (Set-P).
⚫ Step 3. Selecting APIs that perform sensitive operations (Set-S).
⚫ Step 4. Combining the above.
Performance:
⚫ Analysis time: 4.3 minutes
⚫ Precision/Recall: 96.8% / 93.7%
⚫ Training time: 14.4 seconds
[Venn diagram of Set-C, Set-P, and Set-S; region sizes shown: 244, 12, 4, 100, 66]
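Step 4 amounts to a set union of the three candidate sets. The sketch below assumes each set is a collection of API names; the memberships shown are invented for illustration and do not reflect which set any real API belongs to. As a side note, the five region sizes in the Venn diagram sum to 426 (244 + 12 + 4 + 100 + 66), which matches the 426 key APIs mentioned in the false-negative discussion.

```python
from typing import List, Set


def combine_key_apis(set_c: Set[str], set_p: Set[str], set_s: Set[str]) -> List[str]:
    """Union the three candidate sets; sort so the feature order is stable."""
    return sorted(set_c | set_p | set_s)


# Hypothetical memberships, for illustration only.
set_c = {"SmsManager_sendTextMessage", "View_setBackgroundColor"}
set_p = {"SmsManager_sendTextMessage", "TelephonyManager_getLine1Number"}
set_s = {"HttpURLConnection_connect"}
key_apis = combine_key_apis(set_c, set_p, set_s)

# Region sizes read off the slide's Venn diagram.
venn_total = 244 + 12 + 4 + 100 + 66
```

An API that appears in more than one set is counted once, so the combined set is smaller than the sum of the three set sizes.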

Further Enriching the Feature Space
⚫ Hidden features: API invocation hidden by certain techniques
▪ Hidden and internal APIs triggered by special techniques like Java reflection (addressed by checking permissions)
▪ IPC through intents, leveraging other apps/services to perform sensitive actions (addressed by checking used intents)
⚫ Key APIs alone: Precision 96.8%, Recall 93.7%
⚫ API + Permission + Intents: Precision 98.6%, Recall 96.7%
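One plausible way to combine the three feature groups is to concatenate a one-hot segment per group. This is a sketch under assumptions: the tiny vocabularies below reuse names that appear elsewhere in the slides, and the group order and function name are choices made for this example, not the system's actual encoding.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical vocabularies; a real system would track hundreds of entries.
VOCABULARY: Dict[str, Sequence[str]] = {
    "api": ["SmsManager_sendTextMessage", "WifiInfo_getMacAddress"],
    "permission": ["SEND_SMS", "SYSTEM_ALERT_WINDOW"],
    "intent": ["SMS_RECEIVED", "DEVICE_ADMIN_ENABLED"],
}


def build_feature_vector(
    observed: Dict[str, Set[str]],
    vocabulary: Dict[str, Sequence[str]] = VOCABULARY,
) -> List[int]:
    """Concatenate one-hot segments for APIs, permissions, and intents."""
    vector: List[int] = []
    for group in ("api", "permission", "intent"):
        seen = observed.get(group, set())
        vector.extend(1 if name in seen else 0 for name in vocabulary[group])
    return vector


# Hypothetical observations for one app.
app = {
    "api": {"SmsManager_sendTextMessage"},
    "permission": {"SEND_SMS"},
    "intent": set(),
}
features = build_feature_vector(app)
```

An app that hides an API call behind reflection would still set the permission bit, which is the intuition behind adding the extra groups.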

System: Emulation Optimization
⚫ Default Google Android Emulator: full-system emulation
⚫ Result: 30% of apps require ≥ 5-minute analysis time
⚫ Solution: lightweight emulation on powerful x86 server
⚫ Architecture: native x86 Android + Dynamic Binary Translation

System: Emulation Optimization
⚫ Configuration: 5x4-core x86 server with CPU pinning
⚫ Compatibility: ≤ 1% incompatible apps
⚫ Roll back to the Google Emulator for incompatible apps
⚫ Performance: saving around 70% of the detection time
⚫ Able to analyze an app in around 1.3 minutes
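The 70% saving and the 1.3-minute figure are consistent with the 4.3-minute analysis time reported for the key API set, if one assumes that 4.3 minutes is the baseline the saving applies to. The slides do not state this link explicitly, so the check below is only a sanity calculation.

```python
# Assumption: the baseline is the 4.3-minute analysis time from the
# key API selection slide; "around 70%" is taken as exactly 0.70.
baseline_minutes = 4.3
saving = 0.70

optimized_minutes = baseline_minutes * (1 - saving)
```

Under that assumption the optimized time comes out at about 1.29 minutes, in line with "around 1.3 minutes".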

System: Real-world Deployment
⚫ Integration to Tencent Market
▪ Running since March 2018
▪ Checking ~10K apps per day using a single commodity server
▪ Over 98%/96% online precision/recall
⚫ System Evolution
▪ Monthly updating the key APIs with the original dataset and newly submitted apps
▪ Fluctuating between 425 and 432
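A rough capacity estimate shows how ~10K apps per day relates to the roughly 1.3 minutes per app reported for the optimized emulator. The calculation assumes round-the-clock operation, no scheduling overhead, and evenly spread submissions; none of these are stated in the slides, and `emulator_slots_needed` is a name invented for this sketch.

```python
import math


def emulator_slots_needed(apps_per_day: int, minutes_per_app: float,
                          minutes_per_day: int = 24 * 60) -> int:
    """Minimum number of emulators running in parallel to keep up."""
    return math.ceil(apps_per_day * minutes_per_app / minutes_per_day)


# 10,000 apps x 1.3 min = 13,000 emulator-minutes per 1,440-minute day.
slots = emulator_slots_needed(10_000, 1.3)
```

Under these assumptions about ten concurrent emulator instances would suffice, which is a plausible load for one multi-core server.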

System: Addressing FPs & FNs
⚫ False Positives
▪ 2% FP apps as complained by developers
▪ All using a few top-ranking APIs
▪ Most are quickly vetted based on previous versions
▪ Manual inspection: acceptable workload
▪ Active & complete avoidance of FPs
⚫ False Negatives
▪ 4% FN apps reported by end users
▪ Hard to avoid
▪ Most (87%) of the FN apps barely use the 426 key APIs
▪ These apps have fairly simple functionalities, posing little security threat to end users
▪ A small number of FN apps in fact has little effect on the regular operation of T-Market
▪ Report-driven: mild impact on users
▪ Passive mitigation of FNs

Revealed Important Features
⚫ Attempting to acquire privacy-sensitive information of user devices
⚫ Tracking or intercepting system-level events
⚫ Enabling certain types of attacks such as overlay-based attacks
[Figure: Gini importance (0 to 0.1) of the top features; features shown in the chart, in extraction order]
▪ API: SmsManager_sendTextMessage
▪ Permission: SEND_SMS
▪ Intent: SMS_RECEIVED
▪ Intent: wifi.STATE_CHANGE
▪ Permission: RECEIVE_SMS
▪ Intent: DEVICE_ADMIN_ENABLED
▪ Intent: bluetooth.STATE_CHANGED
▪ Permission: RECEIVE_MMS
▪ Intent: ACTION_BATTERY_OKAY
▪ API: TelephonyManager_getLine1Number
▪ Permission: RECEIVE_WAP_PUSH
▪ API: WifiInfo_getMacAddress
▪ Permission: READ_SMS
▪ API: View_setBackgroundColor
▪ Permission: ACCESS_NETWORK_STATE
▪ Permission: SYSTEM_ALERT_WINDOW
▪ API: SQLiteDatabase_insertWithOnConflict
▪ Permission: RECEIVE_BOOT_COMPLETED
▪ API: HttpURLConnection_connect
▪ API: ActivityManager_getRunningTasks
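Gini importance in tree ensembles is built from the decrease in Gini impurity that splits on a feature achieve. The sketch below computes that decrease for a single split on one binary feature; a full random forest would weight and sum such decreases over all splits and average over trees. The toy data and function names are assumptions for illustration.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """1 - sum of squared class proportions; 0 means a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Gini impurity drop when splitting on a binary (0/1) feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Toy data: four apps, label 1 = malicious.
labels = [1, 1, 0, 0]
informative = [1, 1, 0, 0]    # feature perfectly separates the classes
uninformative = [1, 0, 1, 0]  # feature tells nothing about the class
```

A feature that separates malicious from benign apps cleanly produces a large decrease, which is why SMS-related features can rank near the top of such a chart.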

Experiences of APICHECKER
⚫ Feature Engineering: adversary’s perspective
⚫ Feature Selection: principled, data-driven
⚫ Analysis Speed: efficient app emulation on powerful x86 servers
⚫ Model Evolution: monthly update with novel apps & SDK APIs
⚫ Developer Engagement: active & complete avoidance of FPs vs. passive mitigation of FNs

Conclusion & Dataset
⚫ We conduct a large-scale study to understand and overcome real-world challenges of developing ML-based malware detection solutions at market scales.
⚫ We showcase several key design decisions we make towards implementing, deploying, and operating a production market-scale mobile malware detection system, APICHECKER.
⚫ Our system has been operational at Tencent Market since March 2018, vetting around 10K apps per day on a single commodity server.
Dataset & tool release: https://apichecker.github.io/
