Ensemble Adversarial Training: Attacks and Defenses
Facebook, December 15th, 2017
Florian Tramèr (Stanford)
Joint work with Alexey Kurakin (Google Brain), Nicolas Papernot (PSU), Ian Goodfellow (Google Brain), Dan Boneh (Stanford), and Patrick McDaniel (PSU)
Adversarial Examples in ML
[Figure: panda image + 0.007 × adversarial perturbation = image classified as a gibbon]
"Pretty sure this is a panda" + imperceptible perturbation → "I'm certain this is a gibbon" (Goodfellow et al. 2015)
Adversarial Examples in ML
• Images: Szegedy et al. 2013, Nguyen et al. 2015, Goodfellow et al. 2015, Papernot et al. 2016, Liu et al. 2016, Kurakin et al. 2016, …
• Physical Objects: Sharif et al. 2016, Kurakin et al. 2017, Evtimov et al. 2017, Lu et al. 2017, Athalye et al. 2017
• Malware: Šrndić & Laskov 2014, Xu et al. 2016, Grosse et al. 2016, Hu et al. 2017
• Text Understanding: Papernot et al. 2016, Jia & Liang 2017
• Speech: Carlini et al. 2015, Cisse et al. 2017
Creating an adversarial example
[Figure: a "bird" image is fed to an ML model, which scores the classes bird / tree / plane and produces a loss]
What happens to the loss if I nudge this pixel? What about this one? Maximize the loss with gradient ascent (sketched below).
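A minimal sketch of this idea, in the spirit of the fast gradient sign method of Goodfellow et al. 2015, written in PyTorch; the function name, `eps` budget, and loss choice are illustrative assumptions, not taken from the talk.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single gradient-ascent step on the loss, applied to the input pixels."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # how wrong is the model on this input?
    loss.backward()                           # gradient of the loss w.r.t. every pixel
    # nudge each pixel a small step in the direction that increases the loss
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()         # keep pixel values in a valid range
```

Taking the sign of the pixel-wise gradient gives the largest loss increase under an ℓ∞ perturbation budget, which is why a single step already suffices to fool many models.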
Threat Model: Black-Box Attacks
[Figure: an adversarial example crafted against one ML model is also misclassified as "plane" by other models]
Adversarial examples transfer across models.
Defenses?
• Ensembles
• Preprocessing (blurring, cropping, etc.)
• Distillation
• Generative modeling
• Adversarial training ?
Adversarial Training
[Figure: the model is trained on clean inputs (labeled "bird") and on attacked versions of them (pushed toward "plane"), minimizing the loss on both; a training-step sketch follows below]
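A hedged sketch of one adversarial training step, built on the hypothetical `fgsm` helper above and roughly in the style of Kurakin et al. 2016; the 50/50 mix of clean and adversarial loss and the `eps` default are assumptions for illustration.

```python
import torch.nn.functional as F
# assumes the hypothetical fgsm() helper sketched earlier

def adversarial_training_step(model, optimizer, x, y, eps=0.03):
    # craft adversarial versions of the current batch using the model's own gradients
    x_adv = fgsm(model, x, y, eps)
    optimizer.zero_grad()                     # discard gradients left over from the attack
    # train on a mix of clean and adversarial examples
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```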
Adversarial Training - Tradeoffs
• "Weak" attack: single step, fast, scalable, but not infallible
• "Strong" attack: many steps, slow, but learns robust models on small datasets (Madry et al. 2017); a many-step attack sketch follows below
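For the "strong" side of this tradeoff, a sketch in the style of the many-step projected gradient descent attack of Madry et al. 2017; the `alpha` and `steps` defaults and the ℓ∞ projection details are illustrative choices, not from the talk.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha=0.01, steps=40):
    """Many small gradient steps, each projected back into the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)      # gradient w.r.t. the pixels only
        x_adv = x_adv.detach() + alpha * grad.sign()  # one small ascent step
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # project into the l_inf ball around x
        x_adv = x_adv.clamp(0, 1)                     # stay in a valid pixel range
    return x_adv.detach()
```

The many small steps are what make this attack slow, and training against it at ImageNet scale is correspondingly expensive.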
Adversarial Training on ImageNet
• Adversarial training with a single-step attack (Kurakin et al. 2016)
[Figure: Top-1 error of the adversarially trained model: 22.0% on clean data, 26.8% under a white-box single-step attack, 36.5% under a black-box single-step attack (adversarial examples transferred from another model)]
What's happening? Gradient Masking!
• How to get robustness to single-step attacks?
[Figure: two routes: a large-margin classifier vs. "gradient masking"]
Loss of an Adversarially Trained Model
[Figure: loss surface around a data point. Moving in the direction of another model's gradient (black-box attack) yields an adversarial example; moving in the direction of the model's own gradient (white-box attack) yields a non-adversarial example]
Simple Attack: RAND+Single-Step
1. Take a small random step
2. Then step in the direction of the gradient
[Figure: the adversarially trained model's Top-1 error rises from 26.8% under the plain single-step attack to 58.3-64.3% under RAND+Single-Step; a sketch of the attack follows below]
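A sketch of this attack (R+FGSM in the paper), reusing the hypothetical `fgsm` helper from above. Splitting the total budget `eps` into a random step of size `alpha` and a gradient step of size `eps - alpha` follows the paper, but the concrete default values here are only illustrative.

```python
import torch
# assumes the hypothetical fgsm() helper sketched earlier

def rand_fgsm(model, x, y, eps=0.06, alpha=0.03):
    """1. Take a small random step. 2. Take a single gradient step with the remaining budget."""
    # step 1: a random perturbation moves x away from the gradient-masking artifacts
    # that sit right at the data point
    x_rand = (x + alpha * torch.randn_like(x).sign()).clamp(0, 1)
    # step 2: a plain single-step attack launched from the randomly perturbed point
    return fgsm(model, x_rand, y, eps - alpha)
```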
What's wrong with "Single-Step" Adversarial Training?
Minimize: self.loss(self.attack())
At the minimum, either:
1. The model is actually robust, or
2. The attack is really bad.
This is a degenerate minimum. Better approach? Decouple the attack from the defense.
Ensemble Adversarial Training
[Figure: adversarial examples crafted against static, pre-trained ML models are fed into the defended model's training loss; a training-step sketch follows below]
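A minimal sketch of an ensemble adversarial training step: adversarial examples come from a pool of static pre-trained models (or occasionally the model itself), which decouples the attack from the model being trained. The sampling scheme and loss mix are assumptions for illustration; it reuses the `fgsm` helper above.

```python
import random
import torch.nn.functional as F
# assumes the hypothetical fgsm() helper sketched earlier

def ensemble_adv_training_step(model, pretrained_models, optimizer, x, y, eps=0.03):
    # pick the source of adversarial examples: the current model or a static pre-trained one
    source = random.choice([model] + list(pretrained_models))
    x_adv = fgsm(source, x, y, eps)           # single-step attack against that source
    optimizer.zero_grad()                     # discard gradients left over from the attack
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the pre-trained models never change during training, the defended model cannot "cheat" by masking its own gradients: the adversarial examples it must resist do not depend on its current parameters.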
Domain Adaptation
• Interpret Ensemble Adversarial Training as a multiple-source domain adaptation problem
  – Train on distributions (adversaries) A_1, …, A_k
  – Get tested on a new adversary A'
• Provable generalization if ∃ i such that A' ≈ A_i
• Can't say much about other adversaries
Results: ImageNet (Inception v3, Inception ResNet v2)
[Figure: Top-1 error rates for Adv. Training, Ensemble Adv. Training, and Ensemble Adv. Training (ResNet) on clean data, under a white-box attack, and under a black-box attack. Adv. Training: 22.0% clean, 26.8% white-box, 36.5% black-box. The ensemble-trained models sit at 20.2-23.6% on clean data and 24.6-30.4% under attack, with the black-box error dropping well below the 36.5% of standard adversarial training]
What about stronger attacks?
• Little gain on strong white-box attacks!
• But, improvements in the black-box setting!
Open Problems
• How far can we go with adversarial training?
  – White-box robustness is possible! (Madry et al. 2017)
  – Caveat 1: Very expensive
  – Caveat 2: What is the right metric (ℓ∞, ℓ2, rotations)?
• Can we say anything formal (and useful) about adversarial examples?
  – Why do they exist? Why do they transfer?
THANK YOU