Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks. Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. May 24th, 2016 @ 37th IEEE Symposium on Security and Privacy. @NicolasPapernot 1
[Figure: deep neural network. Input layer (M components), hidden layers (e.g., convolutional, rectified linear, …), output layer (N components) with probabilities p0=0.01, p1=0.93, …, p8=0.02, …, pN=0.01. Legend: neuron; weighted link (weight is a parameter, part of θ_O).] 2
[Figure: the same network with output probabilities p0=0.01, p1=0.02, …, p8=0.89, …, pN=0.01.] 3
4
Deep Learning for Classification 5
[Figure: deep neural network. Input layer (M components), hidden layers (e.g., convolutional, rectified linear, …), output layer (N components). Legend: neuron; weighted link (weight is a parameter, part of θ_O).] 6
[Figure: deep neural network. Input layer (M components), hidden layers (e.g., convolutional, rectified linear, …), output layer (N components) with probabilities p0=0.01, p1=0.93, …, p8=0.02, …, pN=0.01. Legend: neuron; weighted link (weight is a parameter, part of θ_O).] 13
[Figure: speech recognition pipeline. Labels: Audio, Frame, Feature Extraction, Acoustic Model, State, Decision Trees, Phoneme, Lexicon, Word, Language Model, Sentence, NLP, Meaning.] Source: Tara N. Sainath, Google @ ICML DL Workshop 2015 14
Adversarial Samples 15
[Figure: grid of samples indexed by input class (0 to 9) and output classification (0 to 9); a CIFAR10 Dataset panel carries the labels truck, automobile, bird, airplane, bird.] 16
Adversarial strategy 17
Defending against Adversarial Perturbations 18
DNN Robustness 19
Defense Design • Low impact on the architecture • Maintain accuracy • Robust in space relatively close to the legitimate distribution • Maintain speed of network 20
Softmax Layer and Probabilities 21
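The role of the temperature in the softmax layer can be illustrated with a minimal NumPy sketch. The function name `softmax_with_temperature` and the example logits are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over the last axis after dividing the logits by `temperature`."""
    scaled = np.asarray(logits, dtype=float) / temperature
    # Subtracting the max improves numerical stability and leaves the result unchanged.
    scaled = scaled - scaled.max(axis=-1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([1.0, 4.0, 2.0])  # hypothetical logits for three classes
p_low = softmax_with_temperature(logits, temperature=1.0)    # peaked distribution
p_high = softmax_with_temperature(logits, temperature=20.0)  # flatter distribution
```

A higher temperature spreads probability mass more evenly across classes while leaving the most likely class unchanged.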
Defensive Distillation 22
Defensive Distillation: at test time, set the temperature back to T=1 for predictions 29
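The overall procedure (train a first model at temperature T, use its softmax outputs as soft labels, train a second model at the same T, then predict at T=1) might be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a linear softmax model stands in for the DNN, the data are synthetic Gaussian blobs, and the names `train`, `centers`, and the step size scaling by T² are choices made here for convenience.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, targets, T, steps=300, lr=0.1):
    """Fit a linear softmax model at temperature T by gradient descent on cross-entropy."""
    n, k = X.shape[0], targets.shape[1]
    W = np.zeros((X.shape[1], k))
    b = np.zeros(k)
    for _ in range(steps):
        probs = softmax(X @ W + b, T)
        # Gradient of the mean cross-entropy with respect to the logits.
        grad_logits = (probs - targets) / (T * n)
        # Scaling the step by T**2 keeps the effective step size independent of T.
        W -= lr * T**2 * (X.T @ grad_logits)
        b -= lr * T**2 * grad_logits.sum(axis=0)
    return W, b

# Synthetic three-class data (an assumption for illustration only).
rng = np.random.default_rng(0)
centers = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
y = np.repeat(np.arange(3), 100)
X = centers[y] + 0.5 * rng.normal(size=(300, 2))
hard_labels = np.eye(3)[y]

T = 20.0
W1, b1 = train(X, hard_labels, T)        # first model, trained on hard labels at T
soft_labels = softmax(X @ W1 + b1, T)    # its probability vectors become the new labels
W2, b2 = train(X, soft_labels, T)        # distilled model, trained on soft labels at T
predictions = softmax(X @ W2 + b2, T=1.0).argmax(axis=1)  # T=1 at test time
accuracy = (predictions == y).mean()
```

In this sketch both models share the same architecture, matching the idea of distilling a network into itself rather than into a smaller one.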
Intuition behind Defensive Distillation • Constraining training: a hard label is 0 if i is not the correct class, whereas a soft label is never equal to 0 • Reducing Jacobian amplitudes 30
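The link between temperature and Jacobian amplitudes can be made explicit by differentiating a softmax at temperature T. The notation is an assumption introduced here: z_i(X) denotes the i-th logit, F_i(X) the i-th softmax output, and g(X) the softmax normalizer.

```latex
F_i(X) = \frac{e^{z_i(X)/T}}{g(X)}, \qquad g(X) = \sum_{l} e^{z_l(X)/T}

\frac{\partial F_i(X)}{\partial X_j}
  = \frac{1}{T} \, \frac{e^{z_i(X)/T}}{g(X)^2}
    \sum_{l} \left( \frac{\partial z_i}{\partial X_j} - \frac{\partial z_l}{\partial X_j} \right) e^{z_l(X)/T}
```

For fixed logit derivatives, the leading 1/T factor shrinks each entry of the Jacobian as T grows, which is one way to read the "reducing Jacobian amplitudes" intuition.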
Validation 31
Experimental Setup 32
[Figure: adversarial sample success rate (0 to 100) versus distillation temperature (log scale, 1 to 100). Series: Adversarial Samples Success Rate (MNIST), Adversarial Samples Baseline Rate (MNIST), Adversarial Samples Success Rate (CIFAR10), Adversarial Samples Baseline Rate (CIFAR10).] 33
Impact on accuracy 34
Impact on Jacobian Amplitude 35
Estimation of Robustness 36
Conclusions 37
Takeaways • Distillation significantly reduces attack success • Yields model smoothness • Easy implementation, low overhead • Acceptable impact on accuracy 38
Questions? https://www.papernot.fr nicolas@papernot.fr @NicolasPapernot