Ensemble Adversarial Training: Attacks and Defenses
Facebook, December 15th, 2017
Florian Tramèr (Stanford)
Joint work with Alexey Kurakin (Google Brain), Nicolas Papernot (PSU), Ian Goodfellow (Google Brain), Dan Boneh (Stanford), and Patrick McDaniel (PSU)
Adversarial Examples in ML
[Figure: panda image + 0.007 × adversarial perturbation = image classified as a gibbon]
"Pretty sure this is a panda" + imperceptible perturbation → "I'm certain this is a gibbon" (Goodfellow et al. 2015)
Adversarial Examples in ML
• Images: Szegedy et al. 2013, Nguyen et al. 2015, Goodfellow et al. 2015, Papernot et al. 2016, Liu et al. 2016, Kurakin et al. 2016, …
• Physical Objects: Sharif et al. 2016, Kurakin et al. 2017, Evtimov et al. 2017, Lu et al. 2017, Athalye et al. 2017
• Malware: Šrndić & Laskov 2014, Xu et al. 2016, Grosse et al. 2016, Hu et al. 2017
• Text Understanding: Papernot et al. 2016, Jia & Liang 2017
• Speech: Carlini et al. 2015, Cisse et al. 2017
Creating an adversarial example
[Figure: a "bird" image is fed to an ML model, which scores the classes bird / tree / plane and produces a loss]
What happens to the loss if I nudge this pixel? What about this one? Maximize the loss with gradient ascent (sketched below).
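A minimal sketch of this idea, in the spirit of the fast gradient sign method of Goodfellow et al. 2015, written in PyTorch; the function name, `eps` budget, and loss choice are illustrative assumptions, not taken from the talk.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single gradient-ascent step on the loss, applied to the input pixels."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # how wrong is the model on this input?
    loss.backward()                           # gradient of the loss w.r.t. every pixel
    # nudge each pixel a small step in the direction that increases the loss
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()         # keep pixel values in a valid range
```

Taking the sign of the pixel-wise gradient gives the largest loss increase under an ℓ∞ perturbation budget, which is why a single step already suffices to fool many models.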
Threat Model: Black-Box Attacks
[Figure: an adversarial example crafted against one ML model is also misclassified as "plane" by other models]
Adversarial examples transfer across models.
Defenses?
• Ensembles
• Preprocessing (blurring, cropping, etc.)
• Distillation
• Generative modeling
• Adversarial training ?
Adversarial Training
[Figure: the model is trained on clean inputs (labeled "bird") and on attacked versions of them (pushed toward "plane"), minimizing the loss on both; a training-step sketch follows below]
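A hedged sketch of one adversarial training step, built on the hypothetical `fgsm` helper above and roughly in the style of Kurakin et al. 2016; the 50/50 mix of clean and adversarial loss and the `eps` default are assumptions for illustration.

```python
import torch.nn.functional as F
# assumes the hypothetical fgsm() helper sketched earlier

def adversarial_training_step(model, optimizer, x, y, eps=0.03):
    # craft adversarial versions of the current batch using the model's own gradients
    x_adv = fgsm(model, x, y, eps)
    optimizer.zero_grad()                     # discard gradients left over from the attack
    # train on a mix of clean and adversarial examples
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```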
Adversarial Training - Tradeoffs
• "Weak" attack: single step, fast, scalable, but not infallible
• "Strong" attack: many steps, slow, but learns robust models on small datasets (Madry et al. 2017); a many-step attack sketch follows below
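For the "strong" side of this tradeoff, a sketch in the style of the many-step projected gradient descent attack of Madry et al. 2017; the `alpha` and `steps` defaults and the ℓ∞ projection details are illustrative choices, not from the talk.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha=0.01, steps=40):
    """Many small gradient steps, each projected back into the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)      # gradient w.r.t. the pixels only
        x_adv = x_adv.detach() + alpha * grad.sign()  # one small ascent step
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # project into the l_inf ball around x
        x_adv = x_adv.clamp(0, 1)                     # stay in a valid pixel range
    return x_adv.detach()
```

The many small steps are what make this attack slow, and training against it at ImageNet scale is correspondingly expensive.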
Adversarial Training on ImageNet
• Adversarial training with a single-step attack (Kurakin et al. 2016)
[Figure: Top-1 error of the adversarially trained model: 22.0% on clean data, 26.8% under a white-box single-step attack, 36.5% under a black-box single-step attack (adversarial examples transferred from another model)]
What's happening? Gradient Masking!
• How to get robustness to single-step attacks?
[Figure: two routes: a large-margin classifier vs. "gradient masking"]
Loss of an Adversarially Trained Model
[Figure: loss surface around a data point. Moving in the direction of another model's gradient (black-box attack) yields an adversarial example; moving in the direction of the model's own gradient (white-box attack) yields a non-adversarial example]
Simple Attack: RAND+Single-Step
1. Take a small random step
2. Then step in the direction of the gradient
[Figure: the adversarially trained model's Top-1 error rises from 26.8% under the plain single-step attack to 58.3-64.3% under RAND+Single-Step; a sketch of the attack follows below]
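A sketch of this attack (R+FGSM in the paper), reusing the hypothetical `fgsm` helper from above. Splitting the total budget `eps` into a random step of size `alpha` and a gradient step of size `eps - alpha` follows the paper, but the concrete default values here are only illustrative.

```python
import torch
# assumes the hypothetical fgsm() helper sketched earlier

def rand_fgsm(model, x, y, eps=0.06, alpha=0.03):
    """1. Take a small random step. 2. Take a single gradient step with the remaining budget."""
    # step 1: a random perturbation moves x away from the gradient-masking artifacts
    # that sit right at the data point
    x_rand = (x + alpha * torch.randn_like(x).sign()).clamp(0, 1)
    # step 2: a plain single-step attack launched from the randomly perturbed point
    return fgsm(model, x_rand, y, eps - alpha)
```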
What's wrong with "Single-Step" Adversarial Training?
Minimize: self.loss(self.attack())
At the minimum, either:
1. The model is actually robust, or
2. The attack is really bad.
This is a degenerate minimum. Better approach? Decouple the attack from the defense.
Ensemble Adversarial Training
[Figure: adversarial examples crafted against static, pre-trained ML models are fed into the defended model's training loss; a training-step sketch follows below]
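A minimal sketch of an ensemble adversarial training step: adversarial examples come from a pool of static pre-trained models (or occasionally the model itself), which decouples the attack from the model being trained. The sampling scheme and loss mix are assumptions for illustration; it reuses the `fgsm` helper above.

```python
import random
import torch.nn.functional as F
# assumes the hypothetical fgsm() helper sketched earlier

def ensemble_adv_training_step(model, pretrained_models, optimizer, x, y, eps=0.03):
    # pick the source of adversarial examples: the current model or a static pre-trained one
    source = random.choice([model] + list(pretrained_models))
    x_adv = fgsm(source, x, y, eps)           # single-step attack against that source
    optimizer.zero_grad()                     # discard gradients left over from the attack
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the pre-trained models never change during training, the defended model cannot "cheat" by masking its own gradients: the adversarial examples it must resist do not depend on its current parameters.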
Domain Adaptation
• Interpret Ensemble Adversarial Training as a multiple-source domain adaptation problem
  – Train on distributions (adversaries) A_1, …, A_k
  – Get tested on a new adversary A'
• Provable generalization if ∃ i such that A' ≈ A_i
• Can't say much about other adversaries
Results: ImageNet (Inception v3, Inception ResNet v2)
[Figure: Top-1 error rates for Adv. Training, Ensemble Adv. Training, and Ensemble Adv. Training (ResNet) on clean data, under a white-box attack, and under a black-box attack. Adv. Training: 22.0% clean, 26.8% white-box, 36.5% black-box. The ensemble-trained models sit at 20.2-23.6% on clean data and 24.6-30.4% under attack, with the black-box error dropping well below the 36.5% of standard adversarial training]
What about stronger attacks?
• Little gain on strong white-box attacks!
• But, improvements in the black-box setting!
Open Problems
• How far can we go with adversarial training?
  – White-box robustness is possible! (Madry et al. 2017)
  – Caveat 1: Very expensive
  – Caveat 2: What is the right metric (ℓ∞, ℓ2, rotations)?
• Can we say anything formal (and useful) about adversarial examples?
  – Why do they exist? Why do they transfer?
THANK YOU