Limitations of Threat Modeling in Adversarial Machine Learning
Florian Tramèr
EPFL, December 19th, 2019
Based on joint work with Jens Behrmann, Dan Boneh, Nicholas Carlini, Pascal Dupré, Jörn-Henrik Jacobsen, Nicolas Papernot, Giancarlo Pellegrino, Gili Rusak
The state of adversarial machine learning
GANs vs. adversarial examples
[Chart: cumulative paper counts, from the first papers in 2013/2014 to 1,000+ and 10,000+ papers by 2018/2019]
Maybe we need to write 10x more papers
Inspired by N. Carlini, "Recent Advances in Adversarial Machine Learning", ScAINet 2019
Adversarial examples
[Biggio et al., 2014; Szegedy et al., 2014; Goodfellow et al., 2015; Athalye, 2017]
[Image: a photo classified "88% Tabby Cat" becomes "99% Guacamole" after an adversarial perturbation]
How?
• Training ⟹ "tweak model parameters such that f(cat image) = cat"
• Attacking ⟹ "tweak input pixels such that f(cat image) = guacamole"
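As a concrete illustration of the "tweak input pixels" step, here is a minimal sketch of the one-step fast gradient sign method in the spirit of Goodfellow et al., 2015. The names `model`, `x`, and `label` are assumptions: a differentiable PyTorch classifier and a correctly classified input batch; the ε value is arbitrary.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, eps=0.03):
    """One-step l_inf attack: nudge every pixel by +/- eps in the
    direction that increases the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()
        x_adv = torch.clamp(x_adv, 0.0, 1.0)  # keep pixels in [0, 1]
    return x_adv.detach()
```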
The bleak state of adversarial examples
The bleak state of adversarial examples
• Most papers study a "toy" problem
  Solving it is not useful per se, but maybe we'll find new insights or techniques
• Going beyond this toy problem (even slightly) is hard
• Overfitting to the toy problem happens and is harmful
• The "non-toy" version of the problem is not actually that relevant for computer security (except for ad-blocking)
The standard game [Gilmer et al. 2018]
• Adversary is given an input x from a data distribution
• Adversary has some info on the model (white-box, queries, data)
• Adversary produces an adversarial example x'
• Adversary wins if x' ≈ x and the defender misclassifies x'
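The game can be read as an evaluation procedure. Below is a hedged sketch of that reading; `attack` and `is_close` are placeholders for any attack routine and any similarity test (e.g. an l∞-ball check), not anything defined in the slides.

```python
import torch

def play_standard_game(model, data_loader, attack, is_close):
    """Estimate how often the adversary wins: x' must be 'close' to x
    AND the defender must misclassify x'."""
    # 'attack' and 'is_close' are placeholder callables.
    adversary_wins, total = 0, 0
    for x, y in data_loader:                 # x drawn from the data distribution
        x_adv = attack(model, x, y)          # adversary's move
        pred = model(x_adv).argmax(dim=1)
        wins = is_close(x_adv, x) & (pred != y)
        adversary_wins += wins.sum().item()
        total += y.numel()
    return adversary_wins / total
```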
Relaxing and formalizing the game
• How do we define x' ≈ x? "Semantics"-preserving? Fully imperceptible?
• Conservative approximations [Goodfellow et al. 2015]: consider noise that is clearly semantics-preserving
  E.g., x' = x + δ, where ‖δ‖∞ = max_i |δ_i| ≤ ε
• Robustness to this noise is necessary but not sufficient
• Even this "toy" version of the game is hard, so let's focus on this first
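In code, this conservative relaxation is just a box constraint on the perturbation. A small illustrative sketch (the function names are mine, not from the slides):

```python
import torch

def linf_norm(delta):
    """||delta||_inf = max_i |delta_i|, computed per example in a batch."""
    return delta.flatten(start_dim=1).abs().max(dim=1).values

def project_linf(delta, eps):
    """Project a perturbation onto the l_inf ball of radius eps."""
    return torch.clamp(delta, -eps, eps)

# A perturbation is allowed by the relaxed game iff linf_norm(delta) <= eps.
```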
Progress on the toy game
• Many broken defenses [Carlini & Wagner 2017, Athalye et al. 2018]
• Adversarial training [Szegedy et al., 2014; Madry et al., 2018]
  ⇒ For each training input (x, y), train on the worst-case adversarial input x + ε*,
     where ε* = argmax over ‖ε‖_p ≤ ε_max of Loss(f(x + ε), y)
• Certified defenses [Hein & Andriushchenko 2017; Raghunathan et al., 2018; Wong & Kolter 2018]
But:
• Robustness to noise of small l_p norm is a "toy" problem
• Solving this problem is not useful per se, unless it teaches us new insights
• Solving this problem does not give us "secure ML"
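A minimal sketch of what the inner maximization looks like in practice, assuming a PGD-style l∞ approximation in the spirit of Madry et al., 2018; the hyperparameter values and function names here are illustrative, not taken from the slides.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=10):
    """Approximate  argmax_{||eps||_inf <= eps_max} Loss(f(x + eps), y)
    with a few steps of projected gradient ascent."""
    # eps, step, and iters are illustrative values.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()                   # ascent step
            delta.clamp_(-eps, eps)                             # project onto the l_inf ball
            delta.copy_(torch.clamp(x + delta, 0.0, 1.0) - x)   # keep x + delta a valid image
        delta.grad.zero_()
    return (x + delta).detach()

def adversarial_training_step(model, optimizer, x, y):
    """Train on the worst-case input instead of the clean one."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```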
Outline
• Most papers study a "toy" problem
  Solving it is not useful per se, but maybe we'll find new insights or techniques
• Going beyond this toy problem (even slightly) is hard
• Overfitting to the toy problem happens and is harmful
• The "non-toy" version of the problem is not actually that relevant for computer security (except for ad-blocking)
Beyond the toy game
• Issue: defenses do not generalize
• Example: training against l∞-bounded noise on CIFAR10
  Accuracy: 96% with no noise, 70% under l∞ noise, 16% under l1 noise, 9% under rotation/translation
  [Engstrom et al., 2017; Sharma & Chen, 2018]
• Robustness to one type of noise can increase vulnerability to others
Robustness to more perturbation types
• S1 = {δ : ‖δ‖∞ ≤ ε∞},  S2 = {δ : ‖δ‖1 ≤ ε1},  S3 = {δ : δ is a small rotation}
• S = S1 ∪ S2 ∪ S3
• Pick the worst-case adversarial example from S
• Train the model on that example
T & Boneh, "Adversarial Training and Robustness for Multiple Perturbations", NeurIPS 2019
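A sketch of the "pick the worst case from S" step, assuming one attack routine per perturbation type (e.g., the PGD sketch above with different projections, plus a rotation/translation search); the attack routines themselves are placeholders.

```python
import torch
import torch.nn.functional as F

def worst_case_over_types(model, x, y, attacks):
    """Run one attack per perturbation type (S1, S2, S3, ...) and keep,
    for each input in the batch, the candidate with the highest loss."""
    candidates = [attack(model, x, y) for attack in attacks]
    with torch.no_grad():
        losses = torch.stack([
            F.cross_entropy(model(x_adv), y, reduction="none")
            for x_adv in candidates
        ])                                       # shape: (num_types, batch)
        best_type = losses.argmax(dim=0)         # worst-case type per input
    stacked = torch.stack(candidates)            # (num_types, batch, ...)
    batch_idx = torch.arange(x.shape[0])
    return stacked[best_type, batch_idx]

# The model is then trained on worst_case_over_types(model, x, y,
# [pgd_linf, pgd_l1, rotation_translation]) exactly as in single-perturbation
# adversarial training; these three attack names are hypothetical.
```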
Empirical multi-perturbation robustness
[Tables: robust accuracy on CIFAR10 and MNIST for models trained against single vs. multiple perturbation types]
• Current defenses scale poorly to multiple perturbations
• We also prove that a robustness tradeoff is inherent for simple data distributions
T & Boneh, "Adversarial Training and Robustness for Multiple Perturbations", NeurIPS 2019
Outline
• Most papers study a "toy" problem
  Solving it is not useful per se, but maybe we'll find new insights or techniques
• Going beyond this toy problem (even slightly) is hard
• Overfitting to the toy problem happens and is harmful
• The "non-toy" version of the problem is not actually that relevant for computer security (except for ad-blocking)
Invariance adversarial examples
• Setting: MNIST, x ∈ [0, 1]^784
• Highest robustness claims in the literature:
  80% robust accuracy to l0 perturbations of size 30
  Certified 85% robust accuracy to l∞ perturbations of size 0.4
• [Image: a natural digit next to in-bound perturbations (‖δ‖∞ ≤ 0.4, ‖δ‖0 ≤ 30) that a human would read as a different digit]
• Robustness considered harmful: we do not even know how to set the "right" bounds for the toy problem
Jacobsen et al., "Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness", 2019
Adversarial examples are hard!
• Most current work: small progress on the relaxed game
• Moving towards the standard game is hard
  Even robustness to 2-3 perturbation types is tricky
  How would we even enumerate all necessary perturbations?
• Over-optimizing robustness is harmful
  How do we set the right bounds?
  We need a formal model of perceptual similarity
  But then we've probably solved all of computer vision anyhow...
Outline
• Most papers study a "toy" problem
  Solving it is not useful per se, but maybe we'll find new insights or techniques
• Going beyond this toy problem (even slightly) is hard
• Overfitting to the toy problem happens and is harmful
• The "non-toy" version of the problem is not actually that relevant for computer security (except for ad-blocking)
Recap on the standard game
• Adversary is given an input x from a data distribution
• Adversary has some info on the model (white-box, queries, data)
• Adversary produces an adversarial example x'
• Adversary wins if x' ≈ x and the defender misclassifies x'
There are very few settings where this game captures a relevant threat model.
ML in security/safety-critical environments
• Fool self-driving cars' street-sign detection [Eykholt et al. 2017, 2018]
• Evade malware detection [Grosse et al. 2018]
• Fool visual ad-blockers [T et al. 2019]
Is the standard game relevant?
• Is there an adversary?
• Is average-case success important? (The adversary cannot choose which inputs to attack)
• Does the adversary have access to the model? (white-box, queries, data)
• Should attacks preserve semantics (or be fully imperceptible)?
Unless the answer to all these questions is Yes, the standard game of adversarial examples is not the right threat model.
Where else could the game be relevant?
• Anti-phishing
• Content takedown
Common theme: human-in-the-loop!
(The adversary wants to fool ML without disrupting the UX)