
Shrinking and Exploring Adversarial Search Spaces. David Evans, University of Virginia



  1. Shrinking and Exploring Adversarial Search Spaces. David Evans (University of Virginia), Weilin Xu, Yanjun Qi. ARO Workshop on Adversarial Learning, Stanford, 14 Sept 2017. EvadeML.org

  2. Machine Learning is Eating Computer Science

  3. Security State-of-the-Art

     | Field | Random guessing attack success probability | Threat models | Proofs |
     | --- | --- | --- | --- |
     | Cryptography | $2^{-128}$ | information theoretic, resource bounded | required |
     | System Security | $2^{-32}$ | capabilities, motivations, rationality | common |
     | Adversarial Machine Learning | $2^{-11}$*; $2^{-6}$ | white-box, black-box | rare! |

  4. Adversarial Examples: "panda" + 0.007 × [noise] = "gibbon". Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014.

  5. Adversarial Examples Game. Given seed sample $x$, find $x'$ where:
     $f(x') \neq f(x)$: class is different (untargeted)
     $f(x') = t$: class is $t$ (targeted)
     $\Delta(x, x') \leq \delta$: difference below threshold
     $\Delta(x, x')$ is defined in some (simple!) metric space: $L_0$ "norm" (# different), $L_1$ norm, $L_2$ norm ("Euclidean"), $L_\infty$ norm.
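The conditions of the game can be written out as a small check. This is a minimal sketch, not code from the talk: `toy_model`, the sample values, and the function names are illustrative assumptions, and `threshold` stands for the distance bound on the slide.

```python
import math

def l0(x, x_adv):
    """L0 "norm": number of coordinates that differ."""
    return sum(1 for a, b in zip(x, x_adv) if a != b)

def l1(x, x_adv):
    """L1 norm of the difference."""
    return sum(abs(a - b) for a, b in zip(x, x_adv))

def l2(x, x_adv):
    """L2 (Euclidean) norm of the difference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_adv)))

def linf(x, x_adv):
    """L-infinity norm: largest single-coordinate change."""
    return max(abs(a - b) for a, b in zip(x, x_adv))

def wins_game(model, x, x_adv, threshold, distance=linf, target=None):
    """True if x_adv is within the distance bound and changes the class.

    target=None checks the untargeted condition f(x') != f(x);
    otherwise it checks the targeted condition f(x') == target.
    """
    if distance(x, x_adv) > threshold:
        return False
    if target is None:
        return model(x_adv) != model(x)
    return model(x_adv) == target

# Hypothetical stand-in classifier, for illustration only.
def toy_model(x):
    return "panda" if sum(x) < 1.0 else "gibbon"

x = [0.2, 0.3, 0.4]
x_adv = [0.25, 0.35, 0.45]
print(wins_game(toy_model, x, x_adv, threshold=0.1))
```

Swapping `distance` for `l0`, `l1`, or `l2` changes which perturbations count as "small", which is the sense in which the metric space defines the adversary's search space.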

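  6. Detecting Adversarial Examples. [Diagram] The input is fed to the model directly (Prediction 0) and also through Squeezer 1, Squeezer 2, …, Squeezer k, each followed by the model (Model' for Squeezer k), giving Prediction 1, Prediction 2, …, Prediction k. A function $\mathbf{f}(\mathrm{pred}_0, \mathrm{pred}_1, \ldots, \mathrm{pred}_k)$ of the predictions decides: yes, adversarial; no, legitimate.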

  7. "Feature Squeezing"
     $\mathbf{x}$ = [0.054, 0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074, …]; squeeze: $x_i = \mathrm{round}(x_i \times 4)/4$ gives [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, …]
     $\mathbf{x}'$ = [0.0491, 0.4903, 0.9292, 0.009, 0.2942, 0.5243, 0.5078, …]; the same squeeze gives [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, …]
     $\mathrm{squeeze}(\mathbf{x}') \approx \mathrm{squeeze}(\mathbf{x}) \implies f(\mathrm{squeeze}(\mathbf{x}')) \approx f(\mathrm{squeeze}(\mathbf{x}))$
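The rounding on this slide can be reproduced directly. A minimal sketch, assuming features are floats in [0, 1]; the function name `squeeze` and the `levels` parameter are illustrative, and the two vectors are the visible prefixes of the slide's examples.

```python
def squeeze(x, levels=4):
    """Round every feature to the nearest multiple of 1/levels.

    Python's round() rounds exact halves to the nearest even integer;
    none of the values below sit on such a boundary.
    """
    return [round(v * levels) / levels for v in x]

x = [0.054, 0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074]
x_adv = [0.0491, 0.4903, 0.9292, 0.009, 0.2942, 0.5243, 0.5078]

print(squeeze(x))      # [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5]
print(squeeze(x_adv))  # [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5]
```

Both inputs collapse to the same squeezed vector, so any model applied after the squeezer returns the same prediction for the original and the perturbed sample.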

  8. Example Squeezers. Reduce Color Depth (8-bit greyscale, 1-bit monochrome). Median Smoothing (3x3 smoothing: replace each pixel with the median of the pixel and its neighbors).
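Both squeezers fit in a few lines of plain Python. This is a sketch under assumptions the slide does not state: images are lists of rows with pixel values in [0, 1], border pixels use only the neighbors that exist, and the 1-bit cutoff is 0.5.

```python
from statistics import median

def reduce_to_1_bit(image, cutoff=0.5):
    """Reduce color depth to 1-bit monochrome (cutoff is an assumed value)."""
    return [[1 if pixel >= cutoff else 0 for pixel in row] for row in image]

def median_smooth(image, size=3):
    """Replace each pixel with the median of its size x size neighborhood.

    At the borders the window is clipped to the pixels that exist
    (an assumption; other padding schemes are equally plausible).
    """
    height, width = len(image), len(image[0])
    radius = size // 2
    smoothed = []
    for i in range(height):
        row = []
        for j in range(width):
            window = [
                image[a][b]
                for a in range(max(0, i - radius), min(height, i + radius + 1))
                for b in range(max(0, j - radius), min(width, j + radius + 1))
            ]
            row.append(median(window))
        smoothed.append(row)
    return smoothed

# A single bright pixel is removed by 3x3 median smoothing.
speck = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
print(median_smooth(speck))
print(reduce_to_1_bit([[0.2, 0.7]]))
```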

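  9. Simple Instantiation. [Diagram] The input goes to the model (7-layer CNN) for Prediction 0, through the Bit Depth-1 squeezer and the model for Prediction 1, and through the Median 2×2 squeezer and the model for Prediction 2. If $\max(L_1(p_0, p_1), L_1(p_0, p_2)) > t$: yes, adversarial; otherwise: no, legitimate.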
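The detection rule compares the model's prediction vector on the raw input with its prediction on each squeezed input. A minimal sketch: `toy_model`, `one_bit`, and the threshold 0.5 are illustrative assumptions, not the 7-layer CNN or the tuned threshold from the talk.

```python
def l1_distance(p, q):
    """L1 distance between two prediction (probability) vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def detect(model, squeezers, x, threshold):
    """Flag x as adversarial if any squeezer moves the prediction too far."""
    p0 = model(x)
    score = max(l1_distance(p0, model(squeezer(x))) for squeezer in squeezers)
    return score > threshold

# Hypothetical two-class model: confidence follows the first pixel.
def toy_model(x):
    return [x[0], 1.0 - x[0]]

# 1-bit depth reduction as the only squeezer in this toy setup.
def one_bit(x):
    return [float(round(v)) for v in x]

print(detect(toy_model, [one_bit], [1.0, 0.0], threshold=0.5))  # clean input
print(detect(toy_model, [one_bit], [0.6, 0.0], threshold=0.5))  # perturbed input
```

Taking the maximum over squeezers means one squeezer that undoes the perturbation is enough to raise the score above the threshold.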

  10. [Histogram] Number of examples (legitimate and adversarial) against the maximum $L_1$ distance between original and squeezed input. Threshold = 0.0029; detection: 98.2%, FP < 4%.

  11. ImageNet Configuration. [Diagram] The input goes to the model (MobileNet) for Prediction 0, and through three squeezers, each followed by the model: Bit Depth-5 (Prediction 1), Median 2×2 (Prediction 2), and Non-local Mean (Prediction 3). If $\max(L_1(p_0, \{p_1, p_2, p_3\})) > t$: yes, adversarial; otherwise: no, legitimate.

  12. [Histogram] Number of examples (legitimate and adversarial) against the maximum $L_1$ distance between original and squeezed input. Threshold = 1.24; detection: 85%, FP < 5%.

  13. Detection Performance Summary

      | Dataset | Most Effective Squeezers Found | Threshold | Detection Rate (Successful AEs) | Detection Rate (Failed AEs) | False Positive Rate | ROC-AUC (Including FAEs) | ROC-AUC (Excluding FAEs) |
      | --- | --- | --- | --- | --- | --- | --- | --- |
      | MNIST | Bit Depth (1-bit), Median (2x2) | 0.0029 | 98.2% | 20.0% | 3.98% | 94.5% | 99.6% |
      | CIFAR-10 | Bit Depth (5-bit), Median (2x2), Non-local Mean (13-3-2) | 1.1402 | 85.0% | 9.1% | 4.93% | 95.7% | 95.9% |
      | ImageNet | Bit Depth (5-bit), Median (2x2), Non-local Mean (11-3-4) | 1.2476 | 85.2% | 25.0% | 4.70% | 94.0% | 94.5% |

  14. Detection Performance. [Bar chart] Detection rate (0% to 100%) on MNIST, CIFAR-10, and ImageNet for each attack: FGSM, BIM, CW∞ (Next), CW∞ (LL), DeepFool, CW2 (Next), CW2 (LL), CW0 (Next), CW0 (LL), JSMA (Next), JSMA (LL).

  15. Composes with model-based defenses

  16. Arms Race? WOOT (August 2017): incorporate the $L_1$ squeezed distance into the loss function. Adversary success rate on MNIST: Untargeted 64%, Targeted (Next) 41%, Targeted (Least Likely) 21%.

  17. Raising the Bar or Changing the Game? Metric Space 1: Target Classifier. Metric Space 2: "Oracle". Before: find a small perturbation that changes the class for the classifier but is imperceptible to the oracle. Now: change the class for both the original and the squeezed classifier, while remaining imperceptible to the oracle.

  18. "Feature Squeezing" Conjecture. For any distance-limited adversarial method, there exists some feature squeezer that accurately detects its adversarial examples. Intuition: if the perturbation is small (in some simple metric space), there is some squeezer that coalesces the original and the adversarial example into the same sample.

  19. Defender's Entropy Advantage. [Diagram] The same detection pipeline (Prediction 0 from the model on the raw input; Predictions 1 to k from Squeezer 1, Squeezer 2, …, Squeezer k followed by the model; $\mathbf{f}(\mathrm{pred}_0, \mathrm{pred}_1, \ldots, \mathrm{pred}_k)$ deciding adversarial or legitimate), with a random seed added.

  20. More Complex Squeezers + Entropy. CCS 2017: pick a random autoencoder.

  21. Changing the Game. Option 1: find distance-limited adversarial methods for which it is intractable to find effective feature squeezers. Option 2: redefine adversarial examples so distance is not limited in a simple metric space (the focus of the rest of the talk).

  22. Do Humans Matter? Left: Metric Space 1 is the machine, Metric Space 2 is the human. Right: Metric Space 1 is Machine 1, Metric Space 2 is Machine 2.

  23. Malware Classifiers

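  24. Automated Classifier Evasion Using Genetic Programming. [Diagram] A malicious PDF is cloned into variants; mutation, drawing on benign PDFs, produces new variants; selection keeps promising variants (✓) and discards the rest (✗), with an oracle labelling each variant benign or malicious; the loop repeats until an evasive variant is found.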

  25. Generating Variants. [Evasion-loop diagram repeated.]

  26. Generating Variants. [Diagram] The PDF is shown as a tree (/Root 0, /Catalog, /Pages, /JavaScript with eval('…');). Select a random node, then randomly transform it: delete, insert, or replace.

  27. Generating Variants. [Diagram] The same tree with numbered nodes (0, 7, 63, 128, 546). Select a random node, then randomly transform it: delete, insert, or replace, where inserted and replacement nodes are taken from benign PDFs.
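The mutation step can be sketched on a toy tree. This is an illustration under loud assumptions, not the talk's implementation: the PDF is modelled as nested dicts rather than parsed from a real file, and `seed_pdf`, `benign_nodes`, and the `/Inserted` key prefix are hypothetical.

```python
import copy
import random

def list_paths(tree, prefix=()):
    """Return the path (tuple of keys) to every node in a nested-dict tree."""
    paths = []
    for key, child in tree.items():
        path = prefix + (key,)
        paths.append(path)
        if isinstance(child, dict):
            paths.extend(list_paths(child, path))
    return paths

def parent_of(tree, path):
    """Return the dict that directly contains the node at path."""
    node = tree
    for key in path[:-1]:
        node = node[key]
    return node

def mutate(variant, benign_nodes, rng):
    """Select a random node, then delete it, replace it, or insert beside it."""
    mutant = copy.deepcopy(variant)  # never modify the parent variant
    paths = list_paths(mutant)
    if not paths:
        return mutant
    path = rng.choice(paths)
    parent = parent_of(mutant, path)
    operation = rng.choice(["delete", "insert", "replace"])
    if operation == "delete":
        del parent[path[-1]]
    elif operation == "replace":
        parent[path[-1]] = copy.deepcopy(rng.choice(benign_nodes))
    else:  # insert a benign node as a new sibling under a fresh key
        n = 0
        while f"/Inserted{n}" in parent:
            n += 1
        parent[f"/Inserted{n}"] = copy.deepcopy(rng.choice(benign_nodes))
    return mutant

# Toy stand-ins for a malicious seed and nodes harvested from benign PDFs.
seed_pdf = {"/Root": {"/Catalog": {"/Pages": {"/Count": 1}},
                      "/JavaScript": "eval('...')"}}
benign_nodes = [{"/Type": "/Font"}, {"/Length": 42}]

print(mutate(seed_pdf, benign_nodes, random.Random(0)))
```

Copying before mutating keeps the parent variant intact, so the Clone and Mutation stages of the loop can be repeated from the same population.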

  28. Selecting Promising Variants. [Evasion-loop diagram repeated.]

  29. Selecting Promising Variants. [Diagram] Each candidate variant is passed to the oracle and to the target classifier; the fitness function $f(s_{\mathrm{oracle}}, s_{\mathrm{class}})$ combines their outputs into a score used by the Select stage.

  30. Oracle. Execute the candidate in a vulnerable Adobe Reader in a virtual environment (Cuckoo, https://github.com/cuckoosandbox). Simulated network: INetSim. Behavioral signature: HTTP_URL + HOST extracted from API traces; the candidate is malicious if the signature matches. Advantage: we know the target malware behavior.

  31. Fitness Function. Assumes lost malicious behavior will not be recovered.
      $f(v) = \begin{cases} 0.5 - \mathrm{classifier\_score}(v) & \text{if } \mathrm{oracle}(v) = \text{"malicious"} \\ -\infty & \text{otherwise} \end{cases}$
      classifier_score ≥ 0.5: labeled malicious.
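The fitness function translates almost directly into code. A minimal sketch: the `oracle` and `classifier_score` callables and the dict-based toy variants are assumptions for illustration, and `select` is one plausible way to use the score, not necessarily the selection scheme used in the talk.

```python
import math

def fitness(variant, oracle, classifier_score):
    """0.5 - classifier score if the variant is still malicious, else -inf."""
    if oracle(variant) != "malicious":
        return -math.inf  # malicious behavior lost: never select this variant
    return 0.5 - classifier_score(variant)

def is_evasive(variant, oracle, classifier_score):
    """Positive fitness: still malicious but scored below the 0.5 label cutoff."""
    return fitness(variant, oracle, classifier_score) > 0

def select(variants, oracle, classifier_score, keep):
    """Keep the highest-fitness variants that retain malicious behavior."""
    scored = [(fitness(v, oracle, classifier_score), v) for v in variants]
    alive = [(s, v) for s, v in scored if s > -math.inf]
    alive.sort(key=lambda pair: pair[0], reverse=True)
    return [v for _, v in alive[:keep]]

# Toy variants: each carries its classifier score and whether it still works.
variants = [
    {"name": "a", "score": 0.9, "works": True},
    {"name": "b", "score": 0.3, "works": True},
    {"name": "c", "score": 0.1, "works": False},
]
toy_oracle = lambda v: "malicious" if v["works"] else "benign"
toy_score = lambda v: v["score"]

print([v["name"] for v in select(variants, toy_oracle, toy_score, keep=2)])
```

Variant `c` has the lowest classifier score but is discarded because the oracle no longer sees the malicious behavior, which is exactly what the $-\infty$ branch encodes.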

  32. [Chart] Seeds evaded (out of 500) against number of mutations (0 to 300), for PDFrate and Hidost.

  33. [Same chart] Simple transformations often worked.

  34. [Same chart] The single mutation (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/) works on 162/500 seeds.

  35. [Same chart] Some seeds required complex transformations.

  36. Possible Defenses

  37. Possible Defense: Adjust Threshold Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.

  38. [Chart] Evading PDFrate: original malicious seeds, shown against the malicious label threshold.

  39. Adjust threshold? Discovered Evasive Variants

  40. Adjust threshold? [Chart] Variants found with threshold = 0.50; variants found with threshold = 0.25.

  41. Possible Defense: Hide Classifier

  42. Hide the Classifier Score? [Diagram repeated from slide 29: the fitness function takes a score from the target classifier for each candidate variant.]

  43. Binary Classifier Output is Enough. ACM CCS 2017. [Diagram repeated from slide 29.]

  44. Possible Defense: Retrain Classifier
