Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks

Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks. Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. May 24th, 2016 @ 37th IEEE Symposium on Security and Privacy. @NicolasPapernot


  1. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks. Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. May 24th, 2016 @ 37th IEEE Symposium on Security and Privacy. @NicolasPapernot

  2. [Figure: deep neural network with an input layer (M components), hidden layers (e.g., convolutional, rectified linear, …), and an output layer (N components). Neurons are joined by weighted links; each weight is a parameter that is part of θ_O. Output probabilities shown: p0=0.01, p1=0.93, …, p8=0.02, pN=0.01]

  3. [Figure: the same network diagram with different output probabilities: p0=0.01, p1=0.02, …, p8=0.89, pN=0.01]

  5. Deep Learning for Classification

  6–12. [Figure, built up over seven slides: deep neural network with an input layer (M components), hidden layers (e.g., convolutional, rectified linear, …), and an output layer (N components). Neurons are joined by weighted links; each weight is a parameter that is part of θ_O]

  13. [Figure: the same network diagram with output probabilities p0=0.01, p1=0.93, …, p8=0.02, pN=0.01]

  14. [Figure: speech processing diagram with the labels Audio, Feature Extraction, Frame, State, Phoneme, Word, Sentence, Meaning, Acoustic Model, Decision Trees, Lexicon, Language Model, NLP. Source: Tara N. Sainath, Google @ ICML DL Workshop 2015]

  15. Adversarial Samples

  16. [Figure: grid of adversarial samples with axes "Input class" (0–9) and "Output classification" (0–9); panel "CIFAR10 Dataset" with the labels truck, automobile, bird, airplane, bird]

  17. Adversarial strategy

  18. Defending against Adversarial Perturbations

  19. DNN Robustness

  20. Defense Design
      • Low impact on the architecture
      • Maintain accuracy
      • Robust in space relatively close to the legitimate distribution
      • Maintain speed of network

  21. Softmax Layer and Probabilities

  22–28. Defensive Distillation

  29. Defensive Distillation: set temperature T=1 for predictions
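
The role of the temperature can be illustrated with a small sketch. It assumes the standard softmax-with-temperature formulation, in which each logit is divided by T before normalisation; the slide text itself gives no formula. The function name `softmax`, the example logits, and the value T=20 are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, each divided by `temperature` first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-class problem (illustrative values only).
logits = [2.0, 1.0, 0.1]

# High temperature (assumed T=20): smoother probabilities, closer to uniform.
soft = softmax(logits, temperature=20.0)

# T=1, as the slide prescribes for predictions: sharper probabilities.
hard = softmax(logits, temperature=1.0)

print("T=20:", [round(p, 3) for p in soft])
print("T=1: ", [round(p, 3) for p in hard])
```

Both calls rank the classes identically; only the spread of the probabilities changes, which is why the temperature can be reset to 1 at prediction time without changing the predicted class.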

  30. Intuition behind Defensive Distillation
      • Constraining Training
      • Reducing Jacobian Amplitudes
      [Equation annotations: "0 if i not correct class"; "never equal to 0"]
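
The "Reducing Jacobian Amplitudes" point can be made concrete by differentiating a softmax output at temperature T. This is a sketch under the assumption that the network output is a softmax over logits; the symbols z (logits), g (normalising sum), F (output probabilities), and X (input) are notation introduced here and do not appear in the slide text.

```latex
F_i(X) = \frac{e^{z_i(X)/T}}{g(X)}, \qquad g(X) = \sum_{l} e^{z_l(X)/T}

\frac{\partial F_i(X)}{\partial X_j}
  = \frac{1}{T}\,\frac{e^{z_i(X)/T}}{g(X)^{2}}
    \sum_{l}\left(\frac{\partial z_i(X)}{\partial X_j}
                 - \frac{\partial z_l(X)}{\partial X_j}\right) e^{z_l(X)/T}
```

For fixed logits and logit derivatives, the remaining factors stay bounded as T grows, so each Jacobian entry shrinks roughly in proportion to 1/T.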

  31. Validation

  32. Experimental Setup

  33. [Figure: adversarial sample success rate (axis 0–100) versus distillation temperature (axis ticks at 1, 10, 100). Series: Adversarial Samples Success Rate (MNIST), Adversarial Samples Baseline Rate (MNIST), Adversarial Samples Success Rate (CIFAR10), Adversarial Samples Baseline Rate (CIFAR10)]

  34. Impact on accuracy

  35. Impact on Jacobian Amplitude

  36. Estimation of Robustness

  37. Conclusions

  38. Takeaways
      • Distillation significantly reduces attack success
      • Yields model smoothness
      • Easy implementation, low overhead
      • Acceptable impact on accuracy

  39. Questions? https://www.papernot.fr nicolas@papernot.fr @NicolasPapernot
