Transferable Adversarial Examples: Insights, Attacks & Defenses
June 12th, 2017
Florian Tramèr
Joint work with Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh & Patrick McDaniel
Adversarial Examples Threat Model: White-Box Attacks
[Figure: a "bird" image is fed to an ML model; the gradient of the loss against the ground-truth label is used to craft a perturbation that makes the model predict "plane" instead of "bird".]
Take the gradient of the loss: the "Fast Gradient Sign Method" (FGSM):
r = ε · sign(∇_x J(x, y, θ))
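To make the attack concrete, here is a minimal FGSM sketch on a toy logistic-regression model. The model, its parameters, and the input are made up for illustration; only the perturbation rule r = ε · sign(∇_x J(x, y, θ)) comes from the slide.

```python
import numpy as np

def fgsm_perturbation(grad_x, eps):
    """FGSM perturbation: r = eps * sign(grad_x J(x, y, theta))."""
    return eps * np.sign(grad_x)

# Toy setting: binary logistic regression with made-up "trained" parameters.
rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.1           # hypothetical model parameters theta
x, y = rng.normal(size=5), 1             # clean input and its true label

p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted probability of class 1
grad_x = (p - y) * w                     # gradient of the logistic loss w.r.t. x

x_adv = x + fgsm_perturbation(grad_x, eps=0.25)   # adversarial example
```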
Adversarial Examples Threat Model: White-Box Attacks
[Figure: hypothetical attacks on autonomous vehicles. Denial of service: a confusing object. Harm self / passengers: an adversarial input recognized as "navigable road". Harm others: an adversarial input recognized as "open space on the road".]
Perturbation constraint: ‖r‖∞ = ε
"Fast Gradient Sign Method" (FGSM): r = ε · sign(∇_x J(x, y, θ))
Adversarial Examples Threat Model: Black-Box Attacks
[Figure: an adversarial example crafted on one ML model is also classified as "plane" by several other ML models.]
Adversarial examples transfer between models.
The Space of Transferable Adversarial Examples
How large is the "space" of adversarial examples?
• At least 2-dimensional
– Warde-Farley & Goodfellow 2016
– Liu et al. 2017
[Figure: "church window" plots, Warde-Farley & Goodfellow 2016.]
Gradient-Aligned Subspaces
• Adversarial examples form a contiguous subspace of "high" dimensionality
– 15-45 dimensions for DNNs and CNNs on MNIST
– The intersection of different models' adversarial subspaces is also multi-dimensional
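A crude way to probe this dimensionality is sketched below. This is not the exact gradient-aligned subspace construction from the paper, and `predict` and `grad_x` are hypothetical stand-ins for a trained model's label function and its input-loss gradient.

```python
import numpy as np

def count_adversarial_directions(x, grad_x, predict, y_true, eps=0.25, k=32, seed=0):
    """Estimate how many orthogonal directions around x are adversarial.

    Builds an orthonormal basis whose first vector is the normalized loss
    gradient, then counts the basis directions along which an eps-sized
    step changes the predicted label.
    """
    d = x.size
    k = min(k, d)                              # at most d orthogonal directions exist
    rng = np.random.default_rng(seed)
    basis = rng.normal(size=(d, k))
    basis[:, 0] = grad_x / (np.linalg.norm(grad_x) + 1e-12)
    q, _ = np.linalg.qr(basis)                 # columns of q are orthonormal
    return sum(int(predict(x + eps * q[:, i]) != y_true) for i in range(k))
```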
Decision Boundary Similarity
[Figure: decision boundaries of two models around a data point, comparing the distance between the boundaries with the distance from the point to its boundary.]
Decision Boundary Similarity
• Experiments with MNIST and DREBIN (malware)
– Models: DNN, Logistic Regression, SVM
– 3 directions:
• Aligned with the gradient (adversarial example)
• Towards a data point of a different class
• Random
• Results: in any direction, distance to boundary ≫ distance between boundaries, i.e. the models are similar "everywhere"
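The underlying measurement can be sketched as a simple line search; the `predict` functions below are hypothetical stand-ins for the trained models, and the step sizes are arbitrary.

```python
import numpy as np

def distance_to_boundary(x, direction, predict, y_true, max_dist=10.0, step=0.05):
    """Smallest distance along `direction` at which `predict` stops returning
    y_true; np.inf if the label never changes within max_dist."""
    d = direction / (np.linalg.norm(direction) + 1e-12)
    for t in np.arange(step, max_dist, step):
        if predict(x + t * d) != y_true:
            return t
    return np.inf

# The three direction types compared on the slide:
#   gradient-aligned: np.sign(grad_x)                      (adversarial direction)
#   inter-class:      x_other_class - x                    (towards another class)
#   random:           np.random.default_rng(0).normal(size=x.shape)
# Measuring distance_to_boundary for two models along the same direction gives
# the "distance to boundary" for each, and their difference approximates the
# "distance between boundaries" in that direction.
```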
Open Questions
• Why this similarity?
– Data-dependent results?
– E.g., for a binary MNIST task (3s vs. 7s) we prove: if F1 (linear model) and F2 (quadratic model) have high accuracy, then there are adversarial examples that transfer between the two models
– These adversarial examples also transfer to DNNs and CNNs, but we can't prove this is inherent …
Transferability and Adversarial Training
Adversarial Training
[Figure: the loss gradient for a clean "bird" image is used to craft an FGSM example (classified as "plane"); the model is then trained on both the clean and the adversarial input.]
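One batch of this procedure looks roughly like the sketch below; the `model` interface (`loss_gradient_wrt_input`, `train_on_batch`) is hypothetical.

```python
import numpy as np

def adversarial_training_step(model, x_batch, y_batch, eps=0.3):
    """One batch of FGSM adversarial training (sketch).

    Crafts FGSM examples on the current model and trains on the mix of clean
    and adversarial inputs, roughly minimizing
        loss(x, y) + loss(x + eps * sign(grad_x loss(x, y)), y).
    """
    grad_x = model.loss_gradient_wrt_input(x_batch, y_batch)  # dJ/dx on this batch
    x_adv = x_batch + eps * np.sign(grad_x)                   # FGSM examples
    model.train_on_batch(np.concatenate([x_batch, x_adv]),
                         np.concatenate([y_batch, y_batch]))
```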
Attacks on Adversarial Training
[Bar charts: error rate (%) of adversarially trained models on MNIST and ImageNet (top-1). Adversarial examples transferred from another (standard) model push the error to 18.2% on MNIST and 36.5% on ImageNet, well above the white-box FGSM error rates of 3.6% and 26.8%; the remaining bars show 1.0% and 22.0%.]
Gradient Masking
• How do you get robustness to FGSM-style attacks?
[Figure: two ways this can happen: a genuinely large-margin classifier, or "gradient masking", where the local gradient simply stops pointing towards adversarial examples.]
Loss of an Adversarially Trained Model
[Figure: loss surface around a data point. Moving in the direction of the model's own gradient (white-box attack) yields a non-adversarial example; moving in the direction of another model's gradient (black-box attack) yields an adversarial example.]
Loss of an Adversarially Trained Model
[Figure: plot of the loss surface around a data point.]
A Simple One-Shot Attack: RAND+FGSM
1. Take a small random step
2. Step in the direction of the gradient

Error rate (%) of the adversarially trained model:
                   FGSM   RAND+FGSM
MNIST               3.6        34.1
ImageNet (top-1)   26.8        64.3
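A sketch of the two-step attack is below. `loss_grad` is a hypothetical gradient-of-loss-with-respect-to-input function, and splitting the ε budget into a random step of size α followed by a gradient step of size (eps - alpha) is one way to keep the total perturbation inside the ε ball; the slide itself only specifies the two steps.

```python
import numpy as np

def rand_fgsm(x, y, loss_grad, eps=0.3, alpha=0.05, seed=0):
    """RAND+FGSM sketch: 1) small random step, 2) step along the gradient.

    The random step has size alpha and the gradient step size eps - alpha,
    so the total l-infinity perturbation stays within eps.
    """
    rng = np.random.default_rng(seed)
    x_rand = x + alpha * np.sign(rng.normal(size=x.shape))  # 1. small random step
    grad = loss_grad(x_rand, y)                             # gradient at the new point
    return x_rand + (eps - alpha) * np.sign(grad)           # 2. gradient step
```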
FGSM vs. RAND+FGSM
• An improved one-shot attack, even against non-defended models:
– ≈ +4% error on MNIST
– ≈ +11% error on ImageNet
• Adversarial training against RAND+FGSM
– Doesn't work …
– Are we stuck with adversarial training?
What's Wrong with Adversarial Training?
• Minimize loss(x, y) + loss(x + ε · sign(∇_x loss(x, y)), y)
• The second term is small if:
1. The model is actually robust, or
2. The gradient points in a direction that is not adversarial: a degenerate minimum
Ensemble Adversarial Training
• How do we avoid these degenerate minima?
[Figure: the adversarial examples used during training are crafted on pre-trained ML models in addition to the model being trained.]
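A minimal sketch of one training batch is below, assuming the same hypothetical model interface as before (`loss_gradient_wrt_input`, `train_on_batch`): the FGSM examples are crafted on a source model drawn at random from a pool of pre-trained models plus the model being trained, which decouples the attack from the defended model's own, possibly masked, gradients.

```python
import random
import numpy as np

def ensemble_adv_training_step(model, static_models, x_batch, y_batch, eps=0.3):
    """One batch of ensemble adversarial training (sketch)."""
    source = random.choice(static_models + [model])             # random source model
    grad_x = source.loss_gradient_wrt_input(x_batch, y_batch)   # gradient on the source
    x_adv = x_batch + eps * np.sign(grad_x)                     # FGSM on the source model
    model.train_on_batch(np.concatenate([x_batch, x_adv]),      # train the defended model
                         np.concatenate([y_batch, y_batch]))
```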
Results: MNIST (standard CNN)
[Bar chart: error rate (%) of Adv. Training vs. Ensemble Adv. Training on clean data, under a white-box FGSM attack, and under a black-box FGSM attack whose source model was not used during training. Clean error is 0.7% for both. Ensemble adversarial training has a somewhat higher white-box FGSM error (it sees fewer white-box FGSM samples during training) but a much lower black-box FGSM error, roughly 4% versus 15.5%. Remaining bar values: 6.0, 3.9, 3.8.]
Results: ImageNet (Inception v3, Inception ResNet v2)
[Bar chart: top-1 error rate (%) of Adv. Training, Ensemble Adv. Training, and Ensemble Adv. Training (ResNet) on clean data, under a white-box FGSM attack, and under a black-box FGSM attack. The ensemble variants have the lowest error under the black-box attack, where adversarial training alone reaches 36.5%. Bar values: 36.5, 30.4, 30.0, 26.8, 25.9, 24.6, 23.6, 22.0, 20.2.]
What about stronger attacks?
• Little to no improvement against white-box iterative and RAND+FGSM attacks!
• But improvements in the black-box setting!
[Bar chart: black-box attacks on MNIST. Error rate (%) of Adv. Training vs. Ensemble Adv. Training under FGSM, Carlini-Wagner, I-FGSM, and RAND+FGSM source attacks; ensemble adversarial training has the lower error under each attack. Bar values: 15.5, 15.2, 13.5, 9.5, 7.0, 6.2, 3.9, 2.9.]
What about stronger attacks?
[Bar chart: black-box attacks on ImageNet. Error rate (%) of Adv. Training, Ensemble Adv. Training, and Ensemble Adv. Training (ResNet) under FGSM and RAND+FGSM source attacks. Bar values: 36.5, 30.8, 30.4, 29.9, 25.0, 24.6.]
Practical Considerations for Ensemble Adversarial Training
• Pre-compute gradients for the pre-trained models
– Lower per-batch cost than with standard adversarial training!
• Randomize the source model in each batch
– If we just rotate through the source models and num_batches % num_models == 0, every batch sees the same adversarial examples in each epoch (illustrated below)
• Convergence is slower (maybe because the task is actually harder? ...)
– Standard Inception v3: ~150 epochs
– Adversarial training: ~190 epochs
– Ensemble adversarial training: ~280 epochs
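A small illustration of the rotation pitfall mentioned above (the counts are made up; the point is only that a fixed rotation repeats the batch-to-source pairing whenever the number of source models divides the number of batches per epoch, while random selection does not):

```python
import random

num_models, num_batches, num_epochs = 4, 8, 3   # illustrative sizes; 8 % 4 == 0

# Plain rotation across epochs: the source index for batch b in epoch e.
for e in range(num_epochs):
    rotated = [(e * num_batches + b) % num_models for b in range(num_batches)]
    print("epoch", e, "rotation:", rotated)      # identical list every epoch

# Randomizing the source per batch breaks the repetition.
rng = random.Random(0)
for e in range(num_epochs):
    randomized = [rng.randrange(num_models) for _ in range(num_batches)]
    print("epoch", e, "random:  ", randomized)
```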
Takeaways
• Test defenses on black-box attacks
– Distillation (Papernot et al. 2016; attack by Carlini et al. 2016)
– Biologically Inspired Networks (Nayebi & Ganguli 2017; attack by Brendel & Bethge 2017)
– Adversarial training, and probably many others …
• « If you don't know where to go, just move at random. » (Morgan Freeman, or Dan Boneh)
• Ensemble Adversarial Training can improve robustness to black-box attacks
Open Problems
• Better black-box attacks?
– Using an ensemble of source models? (Liu et al. 2017)
– How much does oracle access to the model help?
• More efficient ensemble adversarial training?
• Can we say anything formal (and useful) about adversarial examples?
THANK YOU