  1. Recent Trends in Adversarial Machine Learning. Thanks to Ian Goodfellow, Somesh Jha, Patrick McDaniel, and Nicolas Papernot for some slides. December 4, 2018, Berkay Celik.

  2. How it works ... training: Training Data → Learning Algorithm → Model (deep learning, decision trees, others ...). Learning: find the classifier function that minimizes a cost/loss (~model error).

  3. How it works ... run-time: input sample → Machine Learning Classifier → [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01]. Inference time: which "class" is most like the input sample.

  4. An Example ... Input Layer (M components) → Hidden Layers (e.g., convolutional, rectified linear, ...) → Output Layer (N components): p0 = 0.01, p1 = 0.93, ..., p8 = 0.02, ..., pN = 0.01. Neurons are connected by weighted links; each weight is a parameter, part of θ.
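Slides 2-4 describe the standard supervised pipeline: training finds parameters θ that minimize a loss on the training data, and inference picks the class the model scores as most likely. A minimal PyTorch sketch of that pipeline (the architecture, data, and hyperparameters below are illustrative placeholders, not the ones on the slides):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy model: input layer -> hidden layers -> output layer (10 classes),
# mirroring the slide's picture of neurons and weighted links (parameters theta).
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# Training: find parameters that minimize a cost/loss (~model error).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x_train = torch.rand(256, 784)            # placeholder data
y_train = torch.randint(0, 10, (256,))    # placeholder labels

for _ in range(10):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_train), y_train)
    loss.backward()
    optimizer.step()

# Inference: which "class" is most like the input sample?
x = torch.rand(1, 784)
probs = F.softmax(model(x), dim=1)        # e.g. [0.01, 0.84, 0.02, ...]
predicted_class = probs.argmax(dim=1)
```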

  5. I.I.D. Machine Learning. I: Independent, I: Identically, D: Distributed. All training and test examples are drawn independently from the same distribution.

  6. ML reached "human-level performance" on many IID tasks circa 2013: recognizing objects and faces (Szegedy et al., 2014; Taigman et al., 2013), solving CAPTCHAs and reading addresses (Goodfellow et al., 2013; Goodfellow et al., 2013).

  7. Caveats to "human-level" benchmarks: The test data is not very diverse; ML models are fooled by natural but unusual data. Humans are not very good at some parts of the benchmark.

  8. Security Requires Moving Beyond I.I.D. • Not identical: attackers can use unusual inputs (Eykholt et al, 2017) • Not independent: attacker can repeatedly send a single mistake (“test set attack”)

  9. Good models make surprising mistakes in the non-IID setting: "adversarial examples". School bus + perturbation (rescaled for visualization) = ostrich (Szegedy et al., 2013).

  10. Adversarial Examples

  11. Attacks on the machine learning pipeline (training data, learning algorithm, learned parameters, test input, test output): training set poisoning, recovery of sensitive training data, model theft, and adversarial examples.

  12. Definition “Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake” (Goodfellow et al 2017)

  13. Threat Model

  14. Fifty Shades of Gray Box Attacks
      • Does the attacker go first, and the defender reacts? This is easy: just train on the attacks, or design some preprocessing to remove them.
      • If the defender goes first:
      • Does the attacker have full knowledge? This is "white box". Limited knowledge: "black box".
      • Does the attacker know the task the model is solving (input space, output space, defender cost)?
      • Does the attacker know the machine learning algorithm being used? Details of the algorithm (neural net architecture, etc.)? Learned parameters of the model?
      • Can the attacker send "probes" to see how the defender processes different test inputs? Does the attacker observe just the output class, or also the probabilities?

  15. Roadmap • WHITE-BOX ATTACKS • BLACK-BOX ATTACKS • TRANSFERABILITY • DEFENSE TECHNIQUES

  16. White Box Attacks

  17. FGSM (Misclassification)
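Slide 17 names the fast gradient sign method (Goodfellow et al., 2015): take one step of size ε in the direction of the sign of the input gradient of the loss, x_adv = x + ε · sign(∇_x J(θ, x, y)). A minimal untargeted sketch (ε and the [0, 1] feature range are illustrative):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.1):
    """One-step untargeted FGSM: x_adv = x + eps * sign(grad_x loss(x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()   # keep features in a valid [0, 1] range
```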

  18. Intuition

  19. JSMA (targeted)
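Slide 19 refers to the Jacobian-based saliency map attack (Papernot et al., 2016), a targeted attack that greedily perturbs the few input features most salient for the target class. A heavily simplified single-feature sketch (the full attack perturbs feature pairs; `theta` and `max_pixels` are illustrative, and a single example with batch size 1 is assumed):

```python
import torch

def jsma(model, x, target, theta=0.2, max_pixels=40):
    """Simplified single-feature variant of JSMA; expects x of shape (1, num_features)."""
    x_adv = x.clone().detach()
    for _ in range(max_pixels):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        if logits.argmax(dim=1).item() == target:
            break                                   # already classified as the target
        # Gradients of the target logit and of the summed non-target logits.
        grad_t = torch.autograd.grad(logits[0, target], x_adv, retain_graph=True)[0]
        grad_o = torch.autograd.grad(logits[0].sum() - logits[0, target], x_adv)[0]
        # Saliency: favor features that raise the target score and lower the rest.
        saliency = grad_t * grad_o.abs()
        saliency[(grad_t < 0) | (grad_o > 0)] = 0
        pixel = saliency.flatten().argmax()
        with torch.no_grad():
            x_adv = x_adv.detach().clone()
            x_adv.view(-1)[pixel] += theta          # bump the most salient feature
            x_adv.clamp_(0, 1)
    return x_adv.detach()
```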

  20. Carlini-Wagner (CW) (targeted)
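Slide 20 refers to the Carlini-Wagner L2 attack (Carlini & Wagner, 2017), which minimizes ||x' − x||₂² + c · f(x') for a logit-margin loss f, using a tanh change of variables to keep x' in the valid input range. A stripped-down sketch with a fixed c (the real attack binary-searches over c; all constants here are illustrative):

```python
import torch

def cw_l2(model, x, target, c=1.0, kappa=0.0, steps=200, lr=0.01):
    """Simplified CW-L2 for a single example: fixed c, no binary search."""
    # Optimize in tanh-space so the adversarial input stays in [0, 1].
    w = torch.atanh(x.clamp(0.01, 0.99) * 2 - 1).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    num_classes = model(x).shape[1]
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)[0]
        target_logit = logits[target]
        other_logit = logits[torch.arange(num_classes) != target].max()
        # f(x') = max(max_{i != t} Z_i - Z_t, -kappa): <= 0 once the target class wins.
        f = torch.clamp(other_logit - target_logit, min=-kappa)
        loss = ((x_adv - x) ** 2).sum() + c * f
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```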

  21. Success of an adversarial image. Experiments excluding MNIST 1s, many of which look like 7s. [Results table with columns "Pair Diff" and the L0, L1, L2, L∞ metrics; rows as extracted: L0: 63, 35.0, 4.86, 1.0, .996; L1: 91, 19.9, 3.21, 1.0; L2: 110, 21.7, 2.83; L∞: 121, 34.0, 3.82, .76.]

  22. Black-box Attacks

  23. Black-box Attacks
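A common black-box strategy (Papernot et al., 2017) is to query the victim model for labels only, train a local substitute on those labels, and then run a white-box attack on the substitute, relying on transferability. A rough sketch, assuming a hypothetical `victim_predict` query function and reusing the `fgsm` sketch above (all names and hyperparameters are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def black_box_attack(victim_predict, queries, epsilon=0.1, epochs=20):
    """Train a local substitute on the victim's labels, then attack the substitute."""
    # victim_predict stands in for the victim's prediction API; it is assumed to
    # return hard labels (a LongTensor) for a batch of attacker-chosen queries.
    labels = victim_predict(queries)
    substitute = nn.Sequential(nn.Linear(queries.shape[1], 64), nn.ReLU(),
                               nn.Linear(64, int(labels.max()) + 1))
    opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(substitute(queries), labels).backward()
        opt.step()
    # Craft adversarial examples on the substitute; rely on transferability.
    return fgsm(substitute, queries, labels, epsilon)
```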

  24. Transferability
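Transferability is usually measured empirically: craft adversarial examples against one model and check how often they also fool a different, independently trained model. A small sketch of that measurement, again reusing the `fgsm` helper above (the models and data are placeholders):

```python
import torch

def transfer_rate(source_model, target_model, x, y, epsilon=0.1):
    """Fraction of adversarial examples crafted on source_model that also
    fool target_model (i.e., change its prediction away from y)."""
    x_adv = fgsm(source_model, x, y, epsilon)
    preds = target_model(x_adv).argmax(dim=1)
    return (preds != y).float().mean().item()
```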

  25. Roadmap • WHITE-BOX ATTACKS • BLACK-BOX ATTACKS • TRANSFERABILITY • DEFENSE TECHNIQUES

  26. Pipeline of Defense Failures. Stages at which a proposed defense can fail: does not generalize over threat models; seems to generalize, but it's an illusion; does not generalize over attack algos; does not affect an adaptive attacker; reduces advx, but reduces clean accuracy too much; no effect on advx.

  27. Pipeline of Defense Failures: Dropout at Train Time (each of slides 27-32 places one defense on the failure pipeline from slide 26).

  28. Pipeline of Defense Failures: Weight Decay.

  29. Pipeline of Defense Failures: Cropping / fovea mechanisms.

  30. Pipeline of Defense Failures: Adversarial Training with a Weak Attack.

  31. Pipeline of Defense Failures: Defensive Distillation.

  32. Pipeline of Defense Failures: Adversarial Training with a Strong Attack; Current Certified / Provable Defenses.
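As a concrete reference point for the defenses that currently survive this pipeline, adversarial training with a strong attack (Madry et al., 2018) generates multi-step PGD adversarial examples inside the training loop and trains on those instead of clean inputs. A condensed sketch (the step count, step size α, and ε are illustrative):

```python
import torch
import torch.nn.functional as F

def pgd(model, x, y, epsilon=0.3, alpha=0.01, steps=40):
    """Multi-step L-infinity PGD: iterated FGSM steps projected into the eps-ball."""
    x_adv = x + torch.empty_like(x).uniform_(-epsilon, epsilon)   # random start
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)      # project onto eps-ball
            x_adv = x_adv.clamp(0, 1)                             # stay in valid range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One training step on PGD adversarial examples instead of clean inputs."""
    x_adv = pgd(model, x, y)
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()
```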

  33. What's the next defense?

  34. Future Directions
      • Common goal (AML and ML): just make the model better. They still share this goal.
      • It is now clear security research must have some independent goals. For two models with the same error volume, for reasons of security we prefer:
      • The model with lower confidence on its mistakes
      • The model whose mistakes are harder to find

  35. https://beerkay.github.io @ZBerkayCelik THANKS! December 4, 2018, Berkay Celik
