Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers. Weilin Xu, David Evans, Yanjun Qi (University of Virginia)
Machine Learning is Solving Our Problems: Spam, IDS, Malware, Fake Accounts, Fake …
Machine Learning is Eating the World? [Figure: Data Scientist and Security Expert] "No! Security is different."
Security Tasks are Different: Adversary Adapts. Goal: understand classifiers under attack. Result: vulnerable to automated evasion.
Building Machine Learning Classifiers. Training (supervised learning): Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier.
Assumption: Training Data is Representative. Deployment: Operational Data → Feature Extraction → Vectors → Trained Classifier → Malicious / Benign.
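
The training and deployment pipeline on these two slides can be sketched in a few lines. This is a toy illustration only: the two features, the nearest-centroid learner, and every sample below are assumptions made for the example, not the features or algorithms used by PDFrate or Hidost. It shows what happens when operational data stops resembling the training data.

```python
def extract_features(pdf_tree):
    """Toy feature vector: (number of top-level keys, 1 if /JavaScript is present)."""
    return (len(pdf_tree), 1 if "/JavaScript" in pdf_tree else 0)


def train(labelled_samples):
    """Nearest-centroid training: average the feature vectors of each label."""
    sums, counts = {}, {}
    for tree, label in labelled_samples:
        vec = extract_features(tree)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(model, pdf_tree):
    """Return the label whose centroid is closest to the sample's vector."""
    vec = extract_features(pdf_tree)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(model, key=lambda label: sq_dist(model[label]))


# Hypothetical labelled training data (key sets only; values are placeholders).
training_data = [
    ({"/Root": 0, "/JavaScript": 0}, "malicious"),
    ({"/JavaScript": 0}, "malicious"),
    ({"/Root": 0, "/Catalog": 0, "/Pages": 0}, "benign"),
    ({"/Root": 0, "/Pages": 0}, "benign"),
]
model = train(training_data)

# Operational sample that looks like the training malware.
seen_style = {"/Root": 0, "/JavaScript": 0}
# Operational sample padded with benign-looking keys, unlike anything in training.
padded = {"/Root": 0, "/JavaScript": 0, "/Catalog": 0,
          "/Pages": 0, "/Info": 0, "/Outlines": 0}

print(classify(model, seen_style))  # malicious
print(classify(model, padded))      # benign
```

The padded sample still carries /JavaScript, yet the toy model labels it benign because nothing like it appeared in training.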
Results: Evaded PDF Malware Classifiers
Very robust against “strongest conceivable mimicry attack”.

                                     PDFrate* [ACSAC’12]   Hidost [NDSS’13]
Accuracy                             0.9976                0.9996
False Negative Rate                  0.0000                0.0056
False Negative Rate with Adversary   1.0000                1.0000

* Mimicus [Oakland ’14], an open source reimplementation of PDFrate.
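
For reference, accuracy and false negative rate as reported above follow the standard definitions. A minimal sketch, assuming label 1 means malicious and 0 means benign; the function names and the toy labels are illustrative and do not come from the talk.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def false_negative_rate(y_true, y_pred, positive=1):
    """Fraction of truly malicious samples that were classified as benign."""
    preds_on_malicious = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_on_malicious:
        raise ValueError("no malicious samples to evaluate")
    misses = sum(1 for p in preds_on_malicious if p != positive)
    return misses / len(preds_on_malicious)


# Toy data: 4 malicious (1) and 4 benign (0) samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

print(accuracy(y_true, y_pred))             # 0.75
print(false_negative_rate(y_true, y_pred))  # 0.25

# If every malicious sample is an evasive variant, all are missed.
print(false_negative_rate([1, 1, 1], [0, 0, 0]))  # 1.0
```

A false negative rate of 1.0000 with an adversary means every malicious sample in the evaluation was classified as benign.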
Automated Evasion Approach, Based on Genetic Programming. [Diagram: a Malicious PDF is cloned into Variants; Mutation draws on Benign PDFs; Select keeps some Variants (✓) and discards others (✗); the cycle repeats.]
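
The clone, mutate, select cycle can be sketched generically. This is a minimal sketch under stated assumptions, not the authors' implementation: `evolve`, its parameters, the bit-tuple "variants", and the toy fitness are all hypothetical stand-ins, and a positive fitness is taken to mean "evasive".

```python
import random


def evolve(seed, mutate, fitness, pop_size=20, generations=50, rng=None):
    """Clone the seed, then repeat mutate -> score -> select until some
    variant has positive fitness or the generation budget runs out."""
    if rng is None:
        rng = random.Random(0)
    population = [seed] * pop_size                                # Clone
    for _ in range(generations):
        population = [mutate(v, rng) for v in population]        # Mutation
        ranked = sorted(population, key=fitness, reverse=True)
        if fitness(ranked[0]) > 0:                                # success
            return ranked[0]
        survivors = ranked[: max(1, pop_size // 2)]               # Select
        population = [rng.choice(survivors) for _ in range(pop_size)]  # Clone
    return None


def flip_one_bit(bits, rng):
    """Toy mutation: flip one randomly chosen position of a bit tuple."""
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


def toy_fitness(bits):
    """Toy fitness: positive once at least four bits are set."""
    return sum(bits) - 3


result = evolve((0,) * 8, flip_one_bit, toy_fitness)
print(result, toy_fitness(result))
```

In the approach on the slide, the variants are PDF files rather than bit tuples, and the score comes from the target classifier.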
Automated Evasion Approach: PDF parsing. Uses a modified parser from "Extract Me If You Can: Abusing PDF Parsers in Malware Detectors" (Curtis Carmony et al.). [Diagram: example PDF structure with nodes /Root, /Catalog, /Pages, /JavaScript eval(‘…’);]
Automated Evasion Approach: Mutation. Insert / Replace / Delete objects in the variant's PDF structure (/Root, /Catalog, /Pages, /JavaScript eval(‘…’);), with inserted and replacement objects taken from benign PDFs. [Diagram: example tree with objects labelled 0, 128, 546.]
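
The three mutation operators can be illustrated on a toy tree. A minimal sketch, assuming the PDF structure is modelled as nested Python dicts; the tree layout, the helper names, and the benign sample are hypothetical, and a real tool would operate on parsed PDF objects instead.

```python
import copy
import random


def paths(tree, prefix=()):
    """Yield the path (tuple of keys) to every node below the root."""
    for key, child in tree.items():
        yield prefix + (key,)
        if isinstance(child, dict):
            yield from paths(child, prefix + (key,))


def get(tree, path):
    """Follow a path of keys down from the root."""
    for key in path:
        tree = tree[key]
    return tree


def mutate(variant, benign, rng):
    """Return a mutated copy of `variant`: insert, replace, or delete one node.
    Inserted and replacement subtrees are copied from `benign`."""
    variant = copy.deepcopy(variant)
    targets = list(paths(variant))
    donor_path = rng.choice(list(paths(benign)))
    donor = copy.deepcopy(get(benign, donor_path))
    if not targets:                       # empty tree: only insertion is possible
        variant[donor_path[-1]] = donor
        return variant
    target = rng.choice(targets)
    parent = get(variant, target[:-1])
    op = rng.choice(["insert", "replace", "delete"])
    if op == "insert":                    # add the donor next to the target
        parent[donor_path[-1]] = donor    # (overwrites a same-named sibling)
    elif op == "replace":
        parent[target[-1]] = donor
    else:
        del parent[target[-1]]
    return variant


# Hypothetical trees; the elided payload is kept elided.
malicious = {"/Root": {"/Catalog": {"/Pages": {}, "/JavaScript": "eval('…');"}}}
benign = {"/Root": {"/Catalog": {"/Pages": {"/Kids": {}}},
                    "/Info": {"/Title": "report"}}}

rng = random.Random(7)
print(mutate(malicious, benign, rng))
```

Nothing in `mutate` protects the /JavaScript node, so a mutation can delete or overwrite the payload itself. That is why each variant has to be checked afterwards to confirm it is still malicious.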
Automated Evasion Approach: Select. A Fitness Function f(x) gives each variant a Fitness Score from two inputs: an Oracle, which answers "Malicious?", and the Target Classifier, which returns a Score. The variants sought are those the Oracle still judges Malicious while the Target Classifier judges them Benign.
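
One plausible shape for such a fitness function is sketched below. The slide does not give the formula, so the form used here (threshold minus classifier score, with a sentinel for variants the oracle rejects), the 0.5 threshold, and both toy callbacks are assumptions for illustration.

```python
LOW_SCORE = float("-inf")  # assumed sentinel for variants that lost their malicious behaviour


def fitness(variant, oracle, classifier_score, threshold=0.5):
    """Positive when the variant is still malicious (per the oracle) and the
    target classifier scores it below `threshold`, i.e. as benign."""
    if not oracle(variant):
        return LOW_SCORE
    return threshold - classifier_score(variant)


def toy_oracle(variant):
    """Hypothetical oracle: malicious iff the payload key survived mutation."""
    return "/JavaScript" in variant


def toy_classifier_score(variant):
    """Hypothetical classifier: fewer keys looks more suspicious.
    Assumes a non-empty variant."""
    return 1.0 / len(variant)


detected = {"/JavaScript": 1}
evasive = {"/JavaScript": 1, "/Pages": 1, "/Info": 1, "/Catalog": 1}
broken = {"/Pages": 1}

print(fitness(detected, toy_oracle, toy_classifier_score))  # -0.5
print(fitness(evasive, toy_oracle, toy_classifier_score))   # 0.25
print(fitness(broken, toy_oracle, toy_classifier_score))    # -inf
```

Ranking by this score favours variants that keep their malicious behaviour while drifting toward the benign side of the decision boundary.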
Results: Evaded PDFrate 100%. [Plot: Original Malware Seeds vs. Evasive Variants.]
Evaded PDFrate with Adjusted Threshold. [Plot: Original Malware Seeds, Evasive Variants, Evasive Variants with lower threshold.]
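
To make the role of the threshold concrete: assume a classifier that flags a file as malicious when its score is at or above the threshold. The sketch below uses made-up scores, not data from the talk, and shows why evasion has to be re-measured whenever the threshold moves.

```python
def evasion_rate(malicious_scores, threshold):
    """Share of malicious variants scored below `threshold` (classified benign)."""
    evaded = sum(1 for s in malicious_scores if s < threshold)
    return evaded / len(malicious_scores)


# Hypothetical classifier scores for four variants found against threshold 0.5.
scores = [0.45, 0.30, 0.48, 0.10]

print(evasion_rate(scores, 0.5))   # 1.0
print(evasion_rate(scores, 0.25))  # 0.25
```

Variants that sit just below the original threshold are caught once it is lowered, so the search must continue until the scores drop below the new threshold.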
Results: Evaded Hidost 100%. [Plot: Original Malware Seeds vs. Evasive Variants.]
Results: Accumulated Evasion Rate. Difficulty varies by seed: simple mutations often work; complex mutations are sometimes needed. Difficulty varies by target: PDFrate took 6 days to evade all; Hidost took 2 days to evade all.
Cross-Evasion Effects. PDF Malware Seeds → Automated Evasion → Evasive PDF Malware (against Hidost). Of these variants, 387/500 (77.4%) are also evasive against PDFrate, and 3/500 (0.6%) are evasive against Gmail's classifier. Gmail's classifier is secure? It is different.
Evading Gmail’s Classifier. Evasion rate on […]: 135/380 (35.5%). Evasion rate on […]: 179/380 (47.1%).
Conclusion. [Figure: … vs. …] Who will win this arms race? Source code: http://EvadeML.org