Gradient Masking in Machine Learning


  1. Gradient Masking in Machine Learning. Nicolas Papernot, Pennsylvania State University. ARO Workshop on Adversarial Machine Learning, Stanford, September 2017

  2. Thank you to our collaborators: Sandy Huang (Berkeley), Pieter Abbeel (Berkeley), Somesh Jha (U of Wisconsin), Michael Backes (CISPA), Alexey Kurakin (Google), Dan Boneh (Stanford), Praveen Manoharan (CISPA), Z. Berkay Celik (Penn State), Patrick McDaniel (Penn State), Yan Duan (OpenAI), Arunesh Sinha (U of Michigan), Ian Goodfellow (Google), Ananthram Swami (US ARL), Matt Fredrikson (CMU), Florian Tramèr (Stanford), Kathrin Grosse (CISPA), Michael Wellman (U of Michigan)

  3. Gradient Masking

  4. Training: the loss is small when the prediction is correct on a legitimate input

  5. Adversarial training: the loss is small when the prediction is correct on a legitimate input, and small when the prediction is correct on an adversarial input
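
  The two loss terms above correspond to the standard adversarial training objective of Goodfellow et al.; a sketch in LaTeX, where J is the training loss and theta the model parameters. The weighting alpha (often 0.5) is an assumption of this sketch, not something stated on the slide:

  ```latex
  % Adversarial training objective (standard formulation; \alpha is assumed):
  % J: training loss, \theta: model parameters, (x, y): a legitimate example.
  \tilde{J}(\theta, x, y)
    = \alpha\, J(\theta, x, y)
    + (1 - \alpha)\, J(\theta, x_{\mathrm{adv}}, y),
  \qquad
  x_{\mathrm{adv}} = x + \varepsilon\, \mathrm{sign}\!\left(\nabla_x J(\theta, x, y)\right)
  ```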

  6. Gradient masking in adversarially trained models. [Figure: the direction of the adversarially trained model's gradient vs. the direction of another model's gradient.] Tramèr et al., Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr

  7. Gradient masking in adversarially trained models. [Figure: an adversarial example and a non-adversarial example along the direction of the adversarially trained model's gradient and the direction of another model's gradient.] Tramèr et al., Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr

  8. Gradient masking in adversarially trained models. [Figure: an adversarial example and two non-adversarial examples along the same two gradient directions.] Tramèr et al., Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr

  9. Evading gradient masking (1). Threat model: white-box adversary. Attack: (1) a random step of norm alpha, then (2) an FGSM step of norm eps - alpha. Tramèr et al., Ensemble Adversarial Training: Attacks and Defenses. Illustration adapted from slides by Florian Tramèr
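
  A minimal NumPy sketch of this two-step attack (R+FGSM). The helper grad_loss(x, y), returning the gradient of the defended model's loss with respect to its input, is an assumption of the sketch, not part of the slide:

  ```python
  import numpy as np

  def rand_fgsm(x, y, grad_loss, eps=0.3, alpha=0.05):
      """R+FGSM: random step of norm alpha, then FGSM step of norm eps - alpha.

      grad_loss(x, y) is an assumed helper returning dLoss/dx for the model.
      """
      # (1) Random step: move off the point where the gradient is masked.
      x_rand = x + alpha * np.sign(np.random.randn(*x.shape))
      # (2) FGSM step: follow the sign of the loss gradient at the new point.
      x_adv = x_rand + (eps - alpha) * np.sign(grad_loss(x_rand, y))
      # Keep pixels in the valid [0, 1] range.
      return np.clip(x_adv, 0.0, 1.0)
  ```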

  10. Evading gradient masking (2). Threat model: black-box adversary. Attack: (1) learn a substitute for the defended model; (2) find an adversarial direction using the substitute. Papernot et al., Practical Black-Box Attacks against Machine Learning. Papernot et al., Towards the Science of Security and Privacy in Machine Learning

  11. Attacking black-box models. [Figure: a local substitute model beside the black-box ML system, with example labels "no truck sign" and "STOP sign".] (1) The adversary queries the remote ML system with synthetic inputs to learn a local substitute. Papernot et al., Practical Black-Box Attacks Against Machine Learning

  12. Attacking black-box models. [Figure: the local substitute and the black-box ML system; the crafted example is labeled "yield sign".] (2) The adversary uses the local substitute to craft adversarial examples. Papernot et al., Practical Black-Box Attacks Against Machine Learning
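
  Sketching the two steps from slides 10-12 in Python. Everything here is illustrative scaffolding: query_blackbox (the remote oracle), train_substitute, class_gradient, and fgsm are assumed helpers, not an API from the papers:

  ```python
  import numpy as np

  def substitute_black_box_attack(x_seed, query_blackbox, train_substitute,
                                  class_gradient, fgsm, rounds=4, lmbda=0.1):
      """Substitute-based black-box attack (after Papernot et al.).

      query_blackbox(X)        -- labels from the remote model (our only access).
      train_substitute(X, y)   -- fits a local model on the labeled queries.
      class_gradient(model, X) -- gradient of the substitute's prediction w.r.t. X.
      fgsm(model, X)           -- white-box attack run against the substitute.
      """
      X = x_seed
      substitute = None
      for _ in range(rounds):
          y = query_blackbox(X)                # (1) label synthetic inputs via the oracle
          substitute = train_substitute(X, y)  #     fit the local substitute
          # Jacobian-based dataset augmentation: step along the substitute's
          # class gradient to sample inputs near the oracle's decision boundary.
          X = np.concatenate([X, X + lmbda * np.sign(class_gradient(substitute, X))])
      # (2) Adversarial examples crafted on the substitute transfer to the black box.
      return fgsm(substitute, x_seed)
  ```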

  13. Adversarial example transferability. Papernot et al., Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

  14. Large adversarial subspaces enable transferability. On average, 44 orthogonal adversarial directions can be found, and about 25 of them transfer to another model. Tramèr et al., The Space of Transferable Adversarial Examples

  15. Adversarial training: the loss is small when the prediction is correct on a legitimate input, and small when the prediction is correct on an adversarial input. But the adversarial inputs are generated from the model's own gradient, and once that gradient is masked it is no longer adversarial.

  16. Ensemble Adversarial Training

  17. Ensemble adversarial training. Intuition: present adversarial gradients from multiple models during training. [Figure: the adversarial gradient of Model A.]

  18. Ensemble adversarial training. Intuition: present adversarial gradients from multiple models during training. [Figure: inference with Model A.]

  19. Ensemble adversarial training. Intuition: present adversarial gradients from multiple models during training. [Figure: adversarial gradients of Models A, B, C, and D.]


  24. Ensemble adversarial training. Intuition: present adversarial gradients from multiple models during training. [Figure: Models A, B, C, and D at the inference stage.]
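
  A minimal training-loop sketch of this intuition, assuming a trainable model_a with a train_step method, frozen pre-trained models standing in for B, C, and D, and an fgsm helper; all names are illustrative, not from the slides:

  ```python
  import random

  def ensemble_adversarial_training(model_a, static_models, fgsm,
                                    batches, epochs=10, eps=0.3):
      """Ensemble adversarial training (after Tramèr et al.): each batch is
      augmented with adversarial examples crafted on pre-trained *other*
      models, so the trained model cannot satisfy the objective by simply
      masking its own gradient.
      """
      for _ in range(epochs):
          for x, y in batches:
              # Pick the source of the adversarial gradient: the current model
              # or one of the frozen pre-trained models (B, C, D).
              source = random.choice([model_a] + list(static_models))
              x_adv = fgsm(source, x, y, eps)  # assumed one-step attack helper
              # Train on legitimate and adversarial inputs alike.
              model_a.train_step(x, y)
              model_a.train_step(x_adv, y)
      return model_a
  ```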

  25. Experimental results on MNIST (attacks from holdout models)

  26. Experimental results on ImageNet (attacks from holdout models)

  27. Reproducible Adversarial ML research with CleverHans

  28. CleverHans library guiding principles: (1) benchmark reproducibility; (2) usable with any TensorFlow model; (3) always include state-of-the-art attacks and defenses
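
  For illustration, benchmarking a model against FGSM with the library's v2-era TensorFlow API; the class names below (FastGradientMethod, KerasModelWrapper) are from that era's documentation, and the exact signatures should be treated as an assumption of this sketch:

  ```python
  # Sketch: crafting FGSM adversarial examples with CleverHans (v2-era API).
  import keras
  import tensorflow as tf
  from keras.layers import Dense, Flatten
  from keras.models import Sequential
  from cleverhans.attacks import FastGradientMethod
  from cleverhans.utils_keras import KerasModelWrapper

  sess = tf.Session()
  keras.backend.set_session(sess)

  # Any Keras classifier works; a tiny MNIST-shaped model for illustration.
  model = Sequential([Flatten(input_shape=(28, 28, 1)),
                      Dense(64, activation='relu'),
                      Dense(10, activation='softmax')])

  x = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))
  fgsm = FastGradientMethod(KerasModelWrapper(model), sess=sess)
  # Symbolic adversarial examples, clipped to the valid pixel range.
  adv_x = fgsm.generate(x, eps=0.3, clip_min=0.0, clip_max=1.0)
  ```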

  29. Growing community: 1.1K+ stars, 290+ forks, 35 contributors

  30. Adversarial examples represent worst-case distribution drifts. [DDS04] Dalvi et al., Adversarial Classification (KDD)

  31. Adversarial examples are a tangible instance of hypothetical AI safety problems. Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg

  32. Thank you for listening! Questions? Contact: nicolas@papernot.fr, www.cleverhans.io, @NicolasPapernot. Get involved at: github.com/tensorflow/cleverhans. This research was funded by: [sponsor logos]
