Ensemble Adversarial Training: Attacks and Defenses
Cybersecurity With The Best, October 15th 2017
Florian Tramèr (Stanford)
Joint work with Alexey Kurakin (Google Brain), Nicolas Papernot (PSU), Dan Boneh (Stanford), and Patrick McDaniel (PSU)
Adversarial Examples in ML
[Figure: panda image + 0.007 × perturbation = image classified as a gibbon. "Pretty sure this is a panda" becomes "I'm certain this is a gibbon" (Goodfellow et al. 2015)]
Adversarial Examples in ML
• Images: Szegedy et al. 2013, Nguyen et al. 2015, Goodfellow et al. 2015, Papernot et al. 2016, Liu et al. 2016, Kurakin et al. 2016, …
• Physical Objects: Sharif et al. 2016, Kurakin et al. 2017, Evtimov et al. 2017, Lu et al. 2017
• Malware: Šrndić & Laskov 2014, Xu et al. 2016, Grosse et al. 2016, Hu et al. 2017
• Text Understanding: Papernot et al. 2016, Jia & Liang 2017
• Speech: Carlini et al. 2015, Cisse et al. 2017
Creating an adversarial example
[Diagram: a bird image is fed to an ML model; the loss is computed from the model's scores for the classes bird, tree, and plane.]
What happens if I nudge this pixel? What about this one?
Maximize the loss with gradient ascent.
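As a rough illustration of this single-step, gradient-based attack (the fast gradient sign method of Goodfellow et al. 2015), here is a minimal PyTorch sketch. The classifier `model`, inputs in [0, 1], and the l∞ budget `eps` are assumptions for illustration, not details from the slides:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step attack: move each pixel by eps in the direction
    (sign of the gradient) that increases the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```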
Threat Model: Black-Box Attacks
[Diagram: an adversarial example crafted against one ML model is also misclassified as "plane" by other, independently trained ML models.]
Adversarial examples transfer between models.
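A hedged sketch of how one might measure this transferability, reusing the `fgsm` helper above; the substitute/target naming and the success metric are my own illustration, not from the slides:

```python
def transfer_success_rate(target_model, substitute_model, x, y, eps):
    """Black-box attack via transfer: craft adversarial examples on a
    substitute model we control, then test them on the unseen target."""
    x_adv = fgsm(substitute_model, x, y, eps)
    with torch.no_grad():
        preds = target_model(x_adv).argmax(dim=1)
    return (preds != y).float().mean().item()  # fraction that fool the target
```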
Defenses?
• Ensembles
• Preprocessing (blurring, cropping, etc.)
• Distillation
• Generative modeling
• Adversarial training
Adversarial Training
[Diagram: the model is trained to minimize the loss both on the clean input (labeled bird) and on an adversarial input produced by attacking the model (currently misclassified as plane).]
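A minimal sketch of one such training step, again in PyTorch and reusing the `fgsm` helper above; the equal weighting of clean and adversarial loss is an assumption for illustration, not the exact recipe from the talk:

```python
def adversarial_training_step(model, optimizer, x, y, eps):
    """Train on a mix of clean and adversarial examples crafted
    against the current model."""
    x_adv = fgsm(model, x, y, eps)   # attack the model being trained
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) +
                  F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```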
Adversarial Training - Tradeoffs
• "weak" attack: single step, fast, not infallible but scalable
• "strong" attack: many steps (sketched below), slow, learns robust models on small datasets (Madry et al. 2017)
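For contrast with the single-step attack above, here is a hedged sketch of a multi-step ("strong") l∞ attack in the style of iterative FGSM/PGD; the step size, iteration count, and projection details are assumptions for illustration:

```python
def iterative_attack(model, x, y, eps, step_size, n_steps):
    """Multi-step attack: repeat small gradient-sign steps, projecting
    back into the l_inf ball of radius eps after every step."""
    x = x.detach()
    x_adv = x.clone()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + step_size * x_adv.grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```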
Adversarial Training on ImageNet
• Adversarial training with a single-step attack (Kurakin et al. 2016)
[Bar chart, top-1 error: clean data 22.0%; white-box single-step attack 26.8%; black-box single-step attack, i.e. adversarial examples transferred from another model, 36.5%]
What's happening? Gradient Masking!
• How does a model become robust to single-step attacks? Either it genuinely becomes a large-margin classifier, or it learns "gradient masking": the loss surface is distorted so that the model's own gradient at the data point no longer points toward adversarial examples.
Loss of Adversarially Trained Model
[Plot of the loss around a data point: moving in the direction of the model's own gradient (white-box attack) yields a non-adversarial example, while moving in the direction of another model's gradient (black-box attack) yields an adversarial example.]
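One way to reproduce this kind of plot is to probe the loss along a chosen direction. The following is a minimal sketch under my own assumptions (sign-normalized direction, inputs in [0, 1]), not the plotting code behind the slide:

```python
def loss_along_direction(model, x, y, direction, eps_values):
    """Evaluate the loss at x + eps * sign(direction) for several eps,
    e.g. along the model's own gradient vs. another model's gradient."""
    d = direction.sign()
    losses = []
    with torch.no_grad():
        for eps in eps_values:
            x_probe = (x + eps * d).clamp(0.0, 1.0)
            losses.append(F.cross_entropy(model(x_probe), y).item())
    return losses
```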
Simple Attack: RAND+Single-Step
1. Take a small random step.
2. Then step in the direction of the gradient.
[Bar chart, top-1 error of the adversarially trained model: 26.8% under the plain single-step attack vs. 64.3% under RAND+Single-Step]
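A hedged sketch of this two-step attack: a random l∞ step of size alpha, then a single gradient-sign step spending the remaining budget eps - alpha. The parameter names and the exact budget split are assumptions, not necessarily what the slide used:

```python
def rand_single_step(model, x, y, eps, alpha):
    """RAND+Single-Step: (1) small random l_inf step of size alpha,
    (2) single gradient-sign step of size eps - alpha."""
    x = x.detach()
    x_rand = (x + alpha * torch.randn_like(x).sign()).clamp(0.0, 1.0)
    x_rand = x_rand.requires_grad_(True)
    loss = F.cross_entropy(model(x_rand), y)
    loss.backward()
    return (x_rand + (eps - alpha) * x_rand.grad.sign()).clamp(0.0, 1.0).detach()
```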
What's wrong with "Single-Step" Adversarial Training?
The model is trained to minimize self.loss(self.attack(…)): its own loss on its own single-step attack. A low value has two possible explanations:
1. The model is actually robust.
2. Or, the attack is really bad (a degenerate minimum: gradient masking breaks the single-step attack without giving real robustness).
Better approach? Decouple the attack from the defense.
Ensemble Adversarial Training
[Diagram: the training loss also includes adversarial examples crafted against an ensemble of pre-trained, static ML models, decoupling the attack from the model being trained.]
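A hedged sketch of one training step under this scheme, reusing the earlier helpers. Picking the source model uniformly at random and weighting clean and adversarial loss equally are my simplifications, not necessarily the exact procedure from the paper:

```python
import random

def ensemble_adv_training_step(model, static_models, optimizer, x, y, eps):
    """Ensemble Adversarial Training step: craft the adversarial batch
    against a randomly chosen source (a pre-trained static model or the
    model itself), so the defense is decoupled from the attack."""
    source = random.choice(list(static_models) + [model])
    x_adv = fgsm(source, x, y, eps)
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) +
                  F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```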
Results: ImageNet (Inception v3, Inception ResNet v2)
[Bar chart, top-1 error rates:]
Clean Data: Adv. Training 22.0%, Ensemble Adv. Training 23.6%, Ensemble Adv. Training (ResNet) 20.2%
White-Box Attack: Adv. Training 26.8%, Ensemble Adv. Training 30.0%, Ensemble Adv. Training (ResNet) 25.9%
Black-Box Attack: Adv. Training 36.5%, Ensemble Adv. Training 30.4%, Ensemble Adv. Training (ResNet) 24.6%
What about stronger attacks?
• Little gain on strong white-box attacks!
• But, improvements in the black-box setting!
Open Problems
• How far can we go with adversarial training?
  – White-box robustness is possible! (Madry et al. 2017)
  – Caveat 1: very expensive
  – Caveat 2: what is the right metric (l∞, l2, rotations)?
• Can we say anything formal (and useful) about adversarial examples?
  – Why do they exist? Why do they transfer?
THANK YOU
Related Work
Adversarial training + black-box attacks:
Szegedy et al., https://arxiv.org/abs/1312.6199 (original paper on adversarial examples)
Nguyen et al., https://arxiv.org/abs/1412.1897 (a genetic algorithm for adversarial examples)
Goodfellow et al., https://arxiv.org/abs/1412.6572 (adversarial training with single-step attacks)
Papernot et al., https://arxiv.org/abs/1511.04508 (the distillation defense)
Papernot et al., https://arxiv.org/abs/1602.02697 (black-box attacks, model reverse-engineering)
Liu et al., https://arxiv.org/abs/1611.02770 (black-box attacks on ImageNet)
Kurakin et al., https://arxiv.org/abs/1611.01236 (adversarial training on ImageNet)
Tramèr et al., https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer (model reverse-engineering)
Madry et al., https://arxiv.org/abs/1706.06083 (learning robust models with strong attacks)
Tramèr et al., https://arxiv.org/abs/1705.07204 (our paper)
Physical world:
Sharif et al., https://dl.acm.org/citation.cfm?id=2978392 (fooling facial recognition with glasses)
Kurakin et al., https://arxiv.org/abs/1607.02533 (physical-world adversarial examples)
Lu et al., https://arxiv.org/abs/1707.03501 (self-driving cars will be fine)
Evtimov et al., https://arxiv.org/abs/1707.08945 (maybe they won't!)
Related Work (cont.)
Malware:
Šrndić et al., https://dl.acm.org/citation.cfm?id=2650798 (fooling a PDF-malware detector)
Xu et al., https://www.cs.virginia.edu/yanjun/paperA14/2016-evade_classifier.pdf (same as above)
Grosse et al., https://arxiv.org/abs/1606.04435 (adversarial examples for Android malware)
Hu et al., https://arxiv.org/abs/1702.05983 (adversarial malware examples)
Text:
Papernot et al., https://arxiv.org/abs/1604.08275 (adversarial examples for text understanding)
Jia et al., https://arxiv.org/abs/1707.07328 (adversarial examples for reading comprehension)
Speech:
Carlini et al., https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/carlini (fooling a voice assistant)
Cisse et al., https://arxiv.org/abs/1707.05373 (adversarial examples for speech, segmentation, etc.)
Reinforcement Learning:
Huang et al., https://arxiv.org/abs/1702.02284 (adversarial examples for neural network policies)
Kos et al., https://arxiv.org/abs/1705.06452 (adversarial examples for neural network policies)