Ensemble Adversarial Training: Attacks and Defenses
Cybersecurity With The Best, October 15th 2017
Florian Tramèr (Stanford)
Joint work with Alexey Kurakin (Google Brain), Nicolas Papernot (PSU), Dan Boneh (Stanford), and Patrick McDaniel (PSU)
Adversarial Examples in ML
[Figure: panda image + 0.007 × perturbation = image classified as a gibbon. "Pretty sure this is a panda" becomes "I'm certain this is a gibbon" (Goodfellow et al. 2015)]
Adversarial Examples in ML
• Images: Szegedy et al. 2013, Nguyen et al. 2015, Goodfellow et al. 2015, Papernot et al. 2016, Liu et al. 2016, Kurakin et al. 2016, …
• Physical Objects: Sharif et al. 2016, Kurakin et al. 2017, Evtimov et al. 2017, Lu et al. 2017
• Malware: Šrndić & Laskov 2014, Xu et al. 2016, Grosse et al. 2016, Hu et al. 2017
• Text Understanding: Papernot et al. 2016, Jia & Liang 2017
• Speech: Carlini et al. 2015, Cisse et al. 2017
Creating an adversarial example
[Diagram: a bird image is fed to an ML model; the loss is computed from the model's scores for the classes bird, tree, and plane.]
What happens if I nudge this pixel? What about this one?
Maximize the loss with gradient ascent.
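As a rough illustration of this single-step, gradient-based attack (the fast gradient sign method of Goodfellow et al. 2015), here is a minimal PyTorch sketch. The classifier `model`, inputs in [0, 1], and the l∞ budget `eps` are assumptions for illustration, not details from the slides:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step attack: move each pixel by eps in the direction
    (sign of the gradient) that increases the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```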
Threat Model: Black-Box Attacks
[Diagram: an adversarial example crafted against one ML model is also misclassified as "plane" by other, independently trained ML models.]
Adversarial examples transfer between models.
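A hedged sketch of how one might measure this transferability, reusing the `fgsm` helper above; the substitute/target naming and the success metric are my own illustration, not from the slides:

```python
def transfer_success_rate(target_model, substitute_model, x, y, eps):
    """Black-box attack via transfer: craft adversarial examples on a
    substitute model we control, then test them on the unseen target."""
    x_adv = fgsm(substitute_model, x, y, eps)
    with torch.no_grad():
        preds = target_model(x_adv).argmax(dim=1)
    return (preds != y).float().mean().item()  # fraction that fool the target
```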
Defenses?
• Ensembles
• Preprocessing (blurring, cropping, etc.)
• Distillation
• Generative modeling
• Adversarial training
Adversarial Training
[Diagram: the model is trained to minimize the loss both on the clean input (labeled bird) and on an adversarial input produced by attacking the model (currently misclassified as plane).]
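A minimal sketch of one such training step, again in PyTorch and reusing the `fgsm` helper above; the equal weighting of clean and adversarial loss is an assumption for illustration, not the exact recipe from the talk:

```python
def adversarial_training_step(model, optimizer, x, y, eps):
    """Train on a mix of clean and adversarial examples crafted
    against the current model."""
    x_adv = fgsm(model, x, y, eps)   # attack the model being trained
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) +
                  F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```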
Adversarial Training - Tradeoffs
• "weak" attack: single step, fast, not infallible but scalable
• "strong" attack: many steps (sketched below), slow, learns robust models on small datasets (Madry et al. 2017)
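For contrast with the single-step attack above, here is a hedged sketch of a multi-step ("strong") l∞ attack in the style of iterative FGSM/PGD; the step size, iteration count, and projection details are assumptions for illustration:

```python
def iterative_attack(model, x, y, eps, step_size, n_steps):
    """Multi-step attack: repeat small gradient-sign steps, projecting
    back into the l_inf ball of radius eps after every step."""
    x = x.detach()
    x_adv = x.clone()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + step_size * x_adv.grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```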
Adversarial Training on ImageNet
• Adversarial training with a single-step attack (Kurakin et al. 2016)
[Bar chart, top-1 error: clean data 22.0%; white-box single-step attack 26.8%; black-box single-step attack, i.e. adversarial examples transferred from another model, 36.5%]
What's happening? Gradient Masking!
• How does a model become robust to single-step attacks? Either it genuinely becomes a large-margin classifier, or it learns "gradient masking": the loss surface is distorted so that the model's own gradient at the data point no longer points toward adversarial examples.
Loss of Adversarially Trained Model
[Plot of the loss around a data point: moving in the direction of the model's own gradient (white-box attack) yields a non-adversarial example, while moving in the direction of another model's gradient (black-box attack) yields an adversarial example.]
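One way to reproduce this kind of plot is to probe the loss along a chosen direction. The following is a minimal sketch under my own assumptions (sign-normalized direction, inputs in [0, 1]), not the plotting code behind the slide:

```python
def loss_along_direction(model, x, y, direction, eps_values):
    """Evaluate the loss at x + eps * sign(direction) for several eps,
    e.g. along the model's own gradient vs. another model's gradient."""
    d = direction.sign()
    losses = []
    with torch.no_grad():
        for eps in eps_values:
            x_probe = (x + eps * d).clamp(0.0, 1.0)
            losses.append(F.cross_entropy(model(x_probe), y).item())
    return losses
```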
Simple Attack: RAND+Single-Step
1. Take a small random step.
2. Then step in the direction of the gradient.
[Bar chart, top-1 error of the adversarially trained model: 26.8% under the plain single-step attack vs. 64.3% under RAND+Single-Step]
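A hedged sketch of this two-step attack: a random l∞ step of size alpha, then a single gradient-sign step spending the remaining budget eps - alpha. The parameter names and the exact budget split are assumptions, not necessarily what the slide used:

```python
def rand_single_step(model, x, y, eps, alpha):
    """RAND+Single-Step: (1) small random l_inf step of size alpha,
    (2) single gradient-sign step of size eps - alpha."""
    x = x.detach()
    x_rand = (x + alpha * torch.randn_like(x).sign()).clamp(0.0, 1.0)
    x_rand = x_rand.requires_grad_(True)
    loss = F.cross_entropy(model(x_rand), y)
    loss.backward()
    return (x_rand + (eps - alpha) * x_rand.grad.sign()).clamp(0.0, 1.0).detach()
```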
What's wrong with "Single-Step" Adversarial Training?
The model is trained to minimize self.loss(self.attack(…)): its own loss on its own single-step attack. A low value has two possible explanations:
1. The model is actually robust.
2. Or, the attack is really bad (a degenerate minimum: gradient masking breaks the single-step attack without giving real robustness).
Better approach? Decouple the attack from the defense.
Ensemble Adversarial Training
[Diagram: the training loss also includes adversarial examples crafted against an ensemble of pre-trained, static ML models, decoupling the attack from the model being trained.]
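A hedged sketch of one training step under this scheme, reusing the earlier helpers. Picking the source model uniformly at random and weighting clean and adversarial loss equally are my simplifications, not necessarily the exact procedure from the paper:

```python
import random

def ensemble_adv_training_step(model, static_models, optimizer, x, y, eps):
    """Ensemble Adversarial Training step: craft the adversarial batch
    against a randomly chosen source (a pre-trained static model or the
    model itself), so the defense is decoupled from the attack."""
    source = random.choice(list(static_models) + [model])
    x_adv = fgsm(source, x, y, eps)
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) +
                  F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```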
Results: ImageNet (Inception v3, Inception ResNet v2)
[Bar chart, top-1 error rates:]
Clean Data: Adv. Training 22.0%, Ensemble Adv. Training 23.6%, Ensemble Adv. Training (ResNet) 20.2%
White-Box Attack: Adv. Training 26.8%, Ensemble Adv. Training 30.0%, Ensemble Adv. Training (ResNet) 25.9%
Black-Box Attack: Adv. Training 36.5%, Ensemble Adv. Training 30.4%, Ensemble Adv. Training (ResNet) 24.6%
What about stronger attacks?
• Little gain on strong white-box attacks!
• But, improvements in the black-box setting!
Open Problems
• How far can we go with adversarial training?
  – White-box robustness is possible! (Madry et al. 2017)
  – Caveat 1: very expensive
  – Caveat 2: what is the right metric (l∞, l2, rotations)?
• Can we say anything formal (and useful) about adversarial examples?
  – Why do they exist? Why do they transfer?
THANK YOU
Related Work
Adversarial training + black-box attacks:
Szegedy et al., https://arxiv.org/abs/1312.6199 (original paper on adversarial examples)
Nguyen et al., https://arxiv.org/abs/1412.1897 (a genetic algorithm for adversarial examples)
Goodfellow et al., https://arxiv.org/abs/1412.6572 (adversarial training with single-step attacks)
Papernot et al., https://arxiv.org/abs/1511.04508 (the distillation defense)
Papernot et al., https://arxiv.org/abs/1602.02697 (black-box attacks, model reverse-engineering)
Liu et al., https://arxiv.org/abs/1611.02770 (black-box attacks on ImageNet)
Kurakin et al., https://arxiv.org/abs/1611.01236 (adversarial training on ImageNet)
Tramèr et al., https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer (model reverse-engineering)
Madry et al., https://arxiv.org/abs/1706.06083 (learning robust models with strong attacks)
Tramèr et al., https://arxiv.org/abs/1705.07204 (our paper)
Physical world:
Sharif et al., https://dl.acm.org/citation.cfm?id=2978392 (fooling facial recognition with glasses)
Kurakin et al., https://arxiv.org/abs/1607.02533 (physical-world adversarial examples)
Lu et al., https://arxiv.org/abs/1707.03501 (self-driving cars will be fine)
Evtimov et al., https://arxiv.org/abs/1707.08945 (maybe they won't!)
Related Work (cont.)
Malware:
Šrndić et al., https://dl.acm.org/citation.cfm?id=2650798 (fooling a PDF-malware detector)
Xu et al., https://www.cs.virginia.edu/yanjun/paperA14/2016-evade_classifier.pdf (same as above)
Grosse et al., https://arxiv.org/abs/1606.04435 (adversarial examples for Android malware)
Hu et al., https://arxiv.org/abs/1702.05983 (adversarial malware examples)
Text:
Papernot et al., https://arxiv.org/abs/1604.08275 (adversarial examples for text understanding)
Jia et al., https://arxiv.org/abs/1707.07328 (adversarial examples for reading comprehension)
Speech:
Carlini et al., https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/carlini (fooling a voice assistant)
Cisse et al., https://arxiv.org/abs/1707.05373 (adversarial examples for speech, segmentation, etc.)
Reinforcement Learning:
Huang et al., https://arxiv.org/abs/1702.02284 (adversarial examples for neural network policies)
Kos et al., https://arxiv.org/abs/1705.06452 (adversarial examples for neural network policies)