evadeML.org
Shrinking and Exploring Adversarial Search Spaces
David Evans, University of Virginia
with Weilin Xu and Yanjun Qi
ARO Workshop on Adversarial Learning, Stanford, 14 Sept 2017
Machine Learning is Eating Computer Science
Security State-of-the-Art

                        Random-guessing attack    Threat models                Proofs
                        success probability
Cryptography            2^-128                    information theoretic,       required
                                                  resource bounded
System Security         2^-??                     capabilities, motivations,   common
                                                  rationality
Adversarial             2^-?? *; 2^-?             white-box,                   rare!
Machine Learning                                  black-box
Adversarial Examples

"panda" + 0.007 × [noise] = "gibbon"

Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014.
Adversarial Examples Game

Given a seed sample x, find x′ where:
  f(x′) ≠ f(x)    class is different (untargeted)
  f(x′) = t       class is t (targeted)
  Δ(x, x′) ≤ ε    difference is below a threshold

Δ(x, x′) is defined in some (simple!) metric space: L0 norm (number of differing features), L1 norm, L2 norm ("Euclidean"), L∞ norm.
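To make the game concrete, here is a minimal sketch (not from the talk) of a one-step L∞-bounded attack in the style of the FGSM example above; `grad_wrt_x` is a hypothetical gradient supplied by a white-box victim model.

```python
import numpy as np

def fgsm_untargeted(x, grad_wrt_x, eps=0.007):
    """One-step L-infinity attack in the style of the FGSM example above.

    x          -- input image with values in [0, 1]
    grad_wrt_x -- gradient of the loss w.r.t. x (hypothetical white-box input)
    eps        -- budget: the result satisfies Delta(x, x') <= eps in L-inf
    """
    x_adv = x + eps * np.sign(grad_wrt_x)   # move each feature by +/- eps
    return np.clip(x_adv, 0.0, 1.0)         # keep pixels in the valid range
```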
Detecting Adversarial Examples

Input → Model → Prediction 0
Input → Squeezer 1 → Model → Prediction 1
Input → Squeezer 2 → Model → Prediction 2
…
Input → Squeezer k → Model′ → Prediction k

d(pred_0, pred_1, …, pred_k) → Yes: Adversarial / No: Legitimate
"Feature Squeezing"

x  = [0.054, 0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074, …]
Squeeze: x_i = round(x_i × 4)/4
      → [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, …]

x′ = [0.0491, 0.4903, 0.9292, 0.009, 0.2942, 0.5243, 0.5078, …]
Squeeze: x_i = round(x_i × 4)/4
      → [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, …]

squeeze(x′) ≈ squeeze(x) ⟹ f(squeeze(x′)) ≈ f(squeeze(x))
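A minimal sketch of the squeezing step, using the slide's own numbers; `squeeze_bit_depth` is an illustrative helper, not the talk's released code.

```python
import numpy as np

def squeeze_bit_depth(x, levels=4):
    """Round each feature to the nearest multiple of 1/levels
    (the slide's x_i = round(x_i * 4) / 4 when levels = 4)."""
    return np.round(x * levels) / levels

x       = np.array([0.054,  0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074])
x_prime = np.array([0.0491, 0.4903, 0.9292, 0.009,  0.2942, 0.5243, 0.5078])

# The nearby original and adversarial inputs squeeze to the same sample,
# so f(squeeze(x')) == f(squeeze(x)) for any model f.
assert np.array_equal(squeeze_bit_depth(x), squeeze_bit_depth(x_prime))
```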
Example Squeezers

Reduce Color Depth: 8-bit greyscale → 1-bit monochrome
Median Smoothing: 3×3 smoothing, replace each pixel with the median of the pixel and its neighbors
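Hedged sketches of the two squeezers, assuming images are NumPy arrays scaled to [0, 1]; `reduce_to_1bit` and `median_smooth` are illustrative names.

```python
import numpy as np
from scipy.ndimage import median_filter

def reduce_to_1bit(image):
    """Color-depth squeezer: 8-bit greyscale (scaled to [0, 1]) becomes
    1-bit monochrome; each pixel rounds to 0.0 or 1.0."""
    return np.round(image)

def median_smooth(image, size=3):
    """Spatial squeezer: replace each pixel with the median of the
    size x size window around it (3x3 here, as on the slide)."""
    return median_filter(image, size=size)
```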
Simple Instantiation

Input → Model (7-layer CNN) → Prediction 0
Input → Bit Depth-1 → Model → Prediction 1
Input → Median 2×2 → Model → Prediction 2

max(L1(P0, P1), L1(P0, P2)) > t → Yes: Adversarial / No: Legitimate
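A sketch of the joint detector, under the assumption that `predict` returns the model's softmax vector; the helper is illustrative rather than the released implementation.

```python
import numpy as np

def is_adversarial(predict, squeezers, x, threshold):
    """Flag x as adversarial if any squeezer moves the model's softmax
    output by more than the threshold (in L1 distance).

    predict   -- stand-in for the trained model's softmax function
    squeezers -- e.g. [bit-depth reduction, 2x2 median filter]
    """
    p0 = predict(x)
    scores = [np.sum(np.abs(p0 - predict(squeeze(x)))) for squeeze in squeezers]
    return max(scores) > threshold
```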
[Histogram: number of examples (legitimate vs. adversarial, MNIST) by maximum L1 distance between original and squeezed input. Threshold = 0.0029 yields 98.2% detection with FP < 4%.]
ImageNet Configuration

Input → Model (MobileNet) → Prediction 0
Input → Bit Depth-5 → Model → Prediction 1
Input → Median 2×2 → Model → Prediction 2
Input → Non-local Mean → Model → Prediction 3

max L1(P0, {P1, P2, P3}) > t → Yes: Adversarial / No: Legitimate
[Histogram: number of examples (legitimate vs. adversarial, ImageNet) by maximum L1 distance between original and squeezed input. Threshold = 1.24 yields 85% detection with FP < 5%.]
Detection Performance Summary

Dataset    Most Effective Squeezers Found                            Threshold   Detection Rate             FP Rate   ROC-AUC
                                                                                 (Successful / Failed AEs)            (Incl. / Excl. FAEs)
MNIST      Bit Depth (1-bit), Median (2×2)                           0.0029      98.2% / 20.0%              3.98%     94.5% / 99.6%
CIFAR-10   Bit Depth (5-bit), Median (2×2), Non-local Mean (13-3-2)  1.1402      85.0% / 9.1%               4.93%     95.7% / 95.9%
ImageNet   Bit Depth (5-bit), Median (2×2), Non-local Mean (11-3-4)  1.2476      85.2% / 25.0%              4.70%     94.0% / 94.5%
Detection Performance

[Bar chart: detection rate (0 to 100%) for each attack (FGSM, BIM, DeepFool, JSMA Next/LL, CW0 Next/LL, CW2 Next/LL, CW∞ Next/LL) across MNIST, CIFAR-10, and ImageNet.]
Composes with model-based defenses
Arms Race?

WOOT (August 2017): incorporate the L1 squeezed distance into the attack's loss function.

Adversary success rate on MNIST: Untargeted 64%, Targeted (Next) 41%, Targeted (Least Likely) 21%.
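A hedged sketch of the adaptive attacker's objective: fold the detector's own L1 score into the loss so that gradient descent drives both terms down. `predict` and `squeeze` are stand-ins, and the WOOT paper's exact loss differs in its details.

```python
import numpy as np

def adaptive_attack_loss(predict, squeeze, x_adv, target, const=1.0):
    """Objective for the adaptive attacker (illustrative only).

    predict -- stand-in returning the model's softmax probability vector
    squeeze -- the feature squeezer the detector uses (stand-in)
    The second term is exactly the detector's score, so minimizing the loss
    pushes the example below the detection threshold while still hitting
    the target class.
    """
    p = predict(x_adv)
    attack_term = -np.log(p[target] + 1e-12)                     # targeted misclassification
    detector_term = np.sum(np.abs(p - predict(squeeze(x_adv))))  # L1 squeezed distance
    return attack_term + const * detector_term
```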
Raising the Bar or Changing the Game?

Metric Space 1: Target Classifier    Metric Space 2: "Oracle"

Before: find a small perturbation that changes the class for the classifier but is imperceptible to the oracle.
Now: the perturbation must change the class for both the original and the squeezed classifier, while remaining imperceptible to the oracle.
"Feature Squeezing" Conjecture

For any distance-limited adversarial method, there exists some feature squeezer that accurately detects its adversarial examples.

Intuition: if the perturbation is small (in some simple metric space), there is some squeezer that coalesces the original and the adversarial example into the same sample.
Defender's Advantage

Entropy: a random seed selects the squeezers, unknown to the adversary.

Input → Model → Prediction 0
Input → Squeezer 1 → Model → Prediction 1
Input → Squeezer 2 → Model → Prediction 2
…
Input → Squeezer k → Model′ → Prediction k

d(pred_0, pred_1, …, pred_k) → Yes: Adversarial / No: Legitimate
More Complex Squeezers + Entropy (CCS 2017)

Pick a random autoencoder as the squeezer.
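A sketch of the entropy idea, with `squeezer_pool` standing in for a set of (say) independently trained autoencoders; the sampling scheme here is illustrative, not the CCS 2017 paper's exact mechanism.

```python
import random

def detect_with_random_squeezer(predict, squeezer_pool, x, threshold, rng=random):
    """Per input, draw one squeezer (e.g. one of many trained autoencoders)
    from the pool; an adversary who cannot predict the draw must craft an
    example that evades every squeezer in the pool simultaneously."""
    squeeze = rng.choice(squeezer_pool)
    p0, p1 = predict(x), predict(squeeze(x))
    l1 = sum(abs(a - b) for a, b in zip(p0, p1))
    return l1 > threshold
```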
Changing the Game

Option 1: Find distance-limited adversarial methods for which it is intractable to find effective feature squeezers.
Option 2: Redefine adversarial examples so distance is not limited in a simple metric space… (the focus of the rest of the talk)
Do Humans Matter?

Metric Space 1: Machine      Metric Space 2: Human
Metric Space 1: Machine 1    Metric Space 2: Machine 2
Malware Classifiers
Automated Classifier Evasion Using Genetic Programming

Malicious PDF + Benign PDFs
  → Clone + Mutate → Variants → Evasive?
       yes → Found Evasive Variants
       no  → Select Promising Variants → back to Clone + Mutate
Generating Variants

Variants are produced by cloning selected variants (starting from the malicious seed PDF) and mutating each clone.
Generating Variants

Each variant is a PDF parse tree (e.g. /Root → /Catalog → /Pages, with an embedded /JavaScript eval("…"); node).
Mutation: select a random node, then randomly transform it: delete, insert, or replace.
Generating Variants

Inserted and replacement material comes from nodes harvested from the benign PDFs; a toy sketch of the mutation operator follows.
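A toy version of the mutation operator, modeling the PDF parse tree as nested dicts (a stand-in for the real PDF object graph); `all_paths`, `walk`, and `mutate` are illustrative helpers, not EvadeML's actual code.

```python
import random

def all_paths(tree, prefix=()):
    """Enumerate the path to every node in a nested-dict parse tree."""
    for key, child in tree.items():
        yield prefix + (key,)
        if isinstance(child, dict):
            yield from all_paths(child, prefix + (key,))

def walk(tree, path):
    """Return (parent_dict, last_key) for a node path."""
    for key in path[:-1]:
        tree = tree[key]
    return tree, path[-1]

def mutate(pdf_tree, benign_subtrees, rng=random):
    """Apply one random transformation: delete, insert, or replace a node.
    Inserted/replacement subtrees are harvested from benign PDFs."""
    path = rng.choice(list(all_paths(pdf_tree)))
    parent, key = walk(pdf_tree, path)
    op = rng.choice(["delete", "insert", "replace"])
    if op == "delete":
        del parent[key]
    elif op == "replace":
        parent[key] = rng.choice(benign_subtrees)
    else:  # insert a benign subtree as a sibling of the chosen node
        parent["benign_%d" % rng.randrange(1 << 16)] = rng.choice(benign_subtrees)
    return pdf_tree
```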
Selecting Promising Variants

Variants that are not yet evasive are scored, and the most promising ones are selected to be cloned and mutated in the next generation.
Selecting Promising Variants

Each candidate variant is scored by a fitness function f(score_oracle, score_classifier): the Oracle checks whether the malicious behavior survives, and the Target Classifier supplies its score. A sketch of one generation step follows.
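One generation of the search, sketched under simple assumptions (keep the fittest half, clone and mutate random survivors to refill); the real system's bookkeeping around stopping and reseeding differs.

```python
import copy
import random

def next_generation(variants, fitness, pop_size, mutate_fn, rng=random):
    """Score all variants, keep the fittest half, then refill the
    population by cloning random survivors and mutating the clones."""
    survivors = sorted(variants, key=fitness, reverse=True)[: max(1, pop_size // 2)]
    children = [mutate_fn(copy.deepcopy(rng.choice(survivors)))
                for _ in range(pop_size - len(survivors))]
    return survivors + children
```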
Oracle

Execute each candidate in a vulnerable Adobe Reader inside a Cuckoo Sandbox virtual environment (https://github.com/cuckoosandbox), with a simulated network (INetSim).
Behavioral signature: HTTP_URL + HOST extracted from API traces; the variant is malicious if the signature matches.
Advantage: we know the target malware behavior.
Fitness Function

Assumes lost malicious behavior will not be recovered.

f(v) = 0.5 − classifier_score(v)   if oracle(v) = "malicious"
     = −∞                          otherwise

(classifier_score(v) ≥ 0.5 means v is labeled malicious)
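The slide's fitness function translated directly into code; `oracle` and `classifier_score` are placeholder callables standing in for the Cuckoo oracle and the target classifier.

```python
def fitness(variant, oracle, classifier_score):
    """The slide's fitness function, directly in code.

    Variants that lose the malicious behavior get fitness -inf (the search
    assumes the behavior will not be recovered); among the rest, lower
    classifier scores are fitter, and a score below 0.5 means the variant
    is already labeled benign."""
    if oracle(variant) == "malicious":
        return 0.5 - classifier_score(variant)
    return float("-inf")
```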
[Plot: seeds evaded (out of 500) vs. number of mutations, for PDFRate and Hidost; PDFRate is evaded with fewer mutations than Hidost.]
Simple transformations often worked. For example, the single mutation (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/) works on 162 of the 500 seeds. Some seeds, however, required complex transformations.
Possible Defenses
Possible Defense: Adjust Threshold Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.
Evading PDFRate

[Chart: PDFRate scores of the original malicious seeds, all above the malicious-label threshold.]
Adjust Threshold?

[Chart: the discovered evasive variants score below the malicious-label threshold.]
Adjust Threshold?

[Chart: variants found with threshold = 0.50 vs. variants found with threshold = 0.25; rerunning the search against the lower threshold still finds evasive variants.]
Possible Defense: Hide Classifier
Hide the Classifier Score?

What if the target classifier does not expose its score, so the fitness function f(score_oracle, score_classifier) cannot use it?
Binary Classifier Output is Enough (ACM CCS 2017)

The search still succeeds when the target classifier returns only a binary malicious/benign label instead of a score.
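A hedged sketch of how the search can be driven by a binary label alone: reward variants that keep the malicious behavior and already flip the label. This illustrates the point rather than reproducing the CCS 2017 paper's algorithm.

```python
def fitness_binary(variant, oracle, classifier_label):
    """Score a variant using only the classifier's binary output."""
    if oracle(variant) != "malicious":
        return float("-inf")      # the mutation destroyed the exploit
    # 1.0 if the label already flipped to benign, else 0.0; coarser than a
    # real-valued score, but still enough signal to steer the search.
    return 1.0 if classifier_label(variant) == "benign" else 0.0
```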
Possible Defense: Retrain Classifier