Adversarial Examples and Adversarial Training

Adversarial Examples and Adversarial Training Ian Goodfellow, Staff Research Scientist, Google Brain CS 231n, Stanford University, 2017-05-30

Overview • What are adversarial examples? • Why do they happen? • How can they be used to compromise machine learning systems? • What are the defenses? • How to use adversarial examples to improve machine learning, even when there is no adversary (Goodfellow 2016)

Since 2013, deep neural networks have matched human performance at: • recognizing objects and faces (Szegedy et al, 2014) (Taigman et al, 2013) • solving CAPTCHAs and reading addresses (Goodfellow et al, 2013) (Goodfellow et al, 2013) • and other tasks (Goodfellow 2016)

Adversarial Examples Timeline: • “Adversarial Classification,” Dalvi et al 2004: fool spam filter • “Evasion Attacks Against Machine Learning at Test Time,” Biggio 2013: fool neural nets • Szegedy et al 2013: fool ImageNet classifiers imperceptibly • Goodfellow et al 2014: cheap, closed form attack (Goodfellow 2016)

Turning Objects into “Airplanes” (Goodfellow 2016)

Attacking a Linear Model (Goodfellow 2016)

Not just for neural nets • Linear models • Logistic regression • Softmax regression • SVMs • Decision trees • Nearest neighbors (Goodfellow 2016)

Adversarial Examples from Overfitting [Figure: scatter of two classes of points, marked O and x] (Goodfellow 2016)

Adversarial Examples from Excessive Linearity [Figure: scatter of two classes of points, marked O and x] (Goodfellow 2016)

Modern deep nets are very (piecewise) linear • Rectified linear unit • Maxout • Carefully tuned sigmoid • LSTM (Goodfellow 2016)

Nearly Linear Responses in Practice [Plot; axis label: argument to softmax] (Goodfellow 2016)

Small inter-class distances [Figure columns: clean example, perturbation, corrupted example] • Perturbation changes the true class • Random perturbation does not change the class • Perturbation changes the input to “rubbish class” All three perturbations have L2 norm 3.96. This is actually small. We typically use 7! (Goodfellow 2016)

The Fast Gradient Sign Method (Goodfellow 2016)
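
As published in Goodfellow et al 2014, the fast gradient sign method perturbs an input x to x_adv = x + ε · sign(∇_x J(θ, x, y)), where J is the training loss. The following is a minimal sketch in plain Python, assuming a logistic regression model so that the input gradient has a closed form. The weights, input, and ε are made-up toy values, not numbers from the talk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_gradient_wrt_input(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x.

    For logistic regression this is (p - y) * w, with p = predict(w, b, x).
    """
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One fast gradient sign step: move each coordinate by eps uphill on the loss."""
    grad = loss_gradient_wrt_input(w, b, x, y)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Toy model and input (illustrative assumptions).
w = [2.0, -3.0, 1.0, 0.5]
b = 0.0
x = [0.2, -0.1, 0.1, 0.2]
y = 1  # true label

x_adv = fgsm(w, b, x, y, eps=0.25)
p_clean = predict(w, b, x)
p_adv = predict(w, b, x_adv)
print("clean p(y=1):", p_clean)
print("adversarial p(y=1):", p_adv)
```

No coordinate moves by more than ε, yet the logit shifts by ε times the L1 norm of the weights, which is enough here to push the prediction across the decision boundary.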

Maps of Adversarial and Random Cross-Sections (collaboration with David Warde-Farley and Nicolas Papernot) (Goodfellow 2016)

Maps of Adversarial Cross-Sections (Goodfellow 2016)

Maps of Random Cross-Sections: adversarial examples are not noise (collaboration with David Warde-Farley and Nicolas Papernot) (Goodfellow 2016)

Estimating the Subspace Dimensionality (Tramèr et al, 2017) (Goodfellow 2016)

Clever Hans (“Clever Hans, Clever Algorithms,” Bob Sturm) (Goodfellow 2016)

Wrong almost everywhere (Goodfellow 2016)

Adversarial Examples for RL (Huang et al., 2017) (Goodfellow 2016)

High-Dimensional Linear Models [Figure panels: clean examples, adversarial examples, weights, signs of weights] (Goodfellow 2016)
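
One way to see why dimensionality matters for a linear model: a perturbation η = ε · sign(w) changes every input coordinate by at most ε, but changes the activation w·x by ε times the L1 norm of w, which grows with the number of dimensions. A small sketch in plain Python, assuming random weights drawn uniformly from [-1, 1] (an illustrative choice, not the model from the slide):

```python
import random

def sign(v):
    return (v > 0) - (v < 0)

def activation_shift(w, eps):
    """Change in w.x caused by the perturbation eta = eps * sign(w)."""
    eta = [eps * sign(wi) for wi in w]
    return sum(wi * ei for wi, ei in zip(w, eta))

rng = random.Random(0)
eps = 0.01  # per-coordinate perturbation size
shifts = {}
for n in (10, 1000, 100000):
    w = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    shifts[n] = activation_shift(w, eps)
    print(n, "dimensions -> activation shift", shifts[n])
```

The per-coordinate change stays fixed at ε while the shift in the activation scales roughly linearly with the dimension.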

Linear Models of ImageNet (Andrej Karpathy, “Breaking Linear Classifiers on ImageNet”) (Goodfellow 2016)

RBFs behave more intuitively (Goodfellow 2016)

Cross-model, cross-dataset generalization (Goodfellow 2016)

Cross-technique transferability (Papernot 2016) (Goodfellow 2016)

Transferability Attack • Target model with unknown weights, machine learning algorithm, training set; maybe non-differentiable • Train your own model: substitute model mimicking target model with known, differentiable function • Adversarial crafting against substitute, producing adversarial examples • Deploy adversarial examples against the target; transferability property results in them succeeding (Goodfellow 2016)

Cross-Training Data Transferability [Figure panels: strong, weak, intermediate] (Papernot 2016) (Goodfellow 2016)

Enhancing Transfer With Ensembles (Liu et al, 2016) (Goodfellow 2016)

Adversarial Examples in the Human Brain These are concentric circles, not intertwined spirals. (Pinna and Gregory, 2002) (Goodfellow 2016)

Practical Attacks • Fool real classifiers trained by remotely hosted API (MetaMind, Amazon, Google) • Fool malware detector networks • Display adversarial examples in the physical world and fool machine learning systems that perceive them through a camera (Goodfellow 2016)

Adversarial Examples in the Physical World (Kurakin et al, 2016) (Goodfellow 2016)

Failed defenses • Generative pretraining • Removing perturbation with an autoencoder • Adding noise at test time • Ensembles • Confidence-reducing perturbation at test time • Error correcting codes • Multiple glimpses • Weight decay • Double backprop • Adding noise at train time • Dropout • Various non-linear units (Goodfellow 2016)

Generative Modeling is not Sufficient to Solve the Problem (Goodfellow 2016)

Universal approximator theorem Neural nets can represent either function. Maximum likelihood doesn’t cause them to learn the right function. But we can fix that... (Goodfellow 2016)

Training on Adversarial Examples [Plot: test misclassification rate (log scale, 10^-2 to 10^0) against training time (epochs, 0 to 300); curves: Train=Clean, Test=Clean; Train=Clean, Test=Adv; Train=Adv, Test=Clean; Train=Adv, Test=Adv] (Goodfellow 2016)

Adversarial Training of other Models • Linear models: SVM / linear regression cannot learn a step function, so adversarial training is less useful, very similar to weight decay • k-NN: adversarial training is prone to overfitting. • Takeaway: neural nets can actually become more secure than other models. Adversarially trained neural nets have the best empirical success rate on adversarial examples of any machine learning model. (Goodfellow 2016)
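
For a linear model the worst-case max-norm perturbation has a closed form, which makes the link to weight penalties explicit. The following is a sketch for logistic regression, following the analysis in Goodfellow et al 2014. The symbols are assumptions not defined on the slide: labels y ∈ {-1, +1}, softplus ζ(z) = log(1 + e^z), and a max-norm budget ε.

```latex
\begin{align*}
\text{clean objective:} \quad
  & \mathbb{E}_{x,y}\, \zeta\!\left(-y\,(w^\top x + b)\right) \\
\text{worst-case perturbation:} \quad
  & \eta = -\epsilon\, y\, \operatorname{sign}(w),
  \qquad y\, w^\top \eta = -\epsilon\, \lVert w \rVert_1 \\
\text{adversarial objective:} \quad
  & \mathbb{E}_{x,y}\, \zeta\!\left(\epsilon\, \lVert w \rVert_1 - y\,(w^\top x + b)\right)
\end{align*}
```

The term ε‖w‖₁ acts like a penalty on the weight norm, except that it sits inside the loss rather than being added to it, so its effect fades once examples are classified with a confident margin.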

Weaknesses Persist (Goodfellow 2016)

Adversarial Training • Clean example: labeled as bird • Adversarial perturbation: intended to decrease probability of bird class • Perturbed example: still has same label (bird) (Goodfellow 2016)
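
The training rule on this slide (perturb the input to reduce the probability of the correct class, then train on the perturbed input with the original label) can be sketched as follows. This is a toy illustration in plain Python, assuming a logistic regression model, fast gradient sign perturbations, an equal weighting of clean and adversarial gradients, and synthetic two-cluster data. None of these specifics come from the talk.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Perturb x so that the probability of its label y decreases."""
    p = sigmoid(dot(w, x) + b)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def loss(w, b, x, y):
    """Cross-entropy loss of the model on one example."""
    p = min(max(sigmoid(dot(w, x) + b), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def adversarial_train(data, eps, epochs=100, lr=0.1):
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            # The perturbed copy keeps the original label y and is treated
            # as a fixed input when taking the gradient.
            x_adv = fgsm(w, b, x, y, eps)
            err_clean = sigmoid(dot(w, x) + b) - y
            err_adv = sigmoid(dot(w, x_adv) + b) - y
            # Average the clean and adversarial gradients.
            w = [wi - lr * 0.5 * (err_clean * xc + err_adv * xa)
                 for wi, xc, xa in zip(w, x, x_adv)]
            b -= lr * 0.5 * (err_clean + err_adv)
    return w, b

def accuracy(w, b, examples):
    hits = sum((sigmoid(dot(w, x) + b) > 0.5) == (y == 1) for x, y in examples)
    return hits / len(examples)

# Synthetic data: class 1 near (2, 2), class 0 near (-2, -2).
rng = random.Random(0)
data = []
for _ in range(20):
    data.append(([2 + rng.uniform(-0.5, 0.5), 2 + rng.uniform(-0.5, 0.5)], 1))
    data.append(([-2 + rng.uniform(-0.5, 0.5), -2 + rng.uniform(-0.5, 0.5)], 0))

eps = 0.5
w, b = adversarial_train(data, eps)
adv_data = [(fgsm(w, b, x, y, eps), y) for x, y in data]
clean_acc = accuracy(w, b, data)
adv_acc = accuracy(w, b, adv_data)
print("clean accuracy:", clean_acc, "adversarial accuracy:", adv_acc)
```

On this easily separable toy data the model stays correct on both clean and perturbed inputs. Harder problems show a gap, like the one between the curves in the training-time plot earlier in the deck.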

Virtual Adversarial Training • Unlabeled; model guesses it’s probably a bird, maybe a plane • Adversarial perturbation intended to change the guess • New guess should match old guess (probably bird, maybe plane) (Goodfellow 2016)
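
Because no label is needed, the penalty can be computed from the model's own predictions. This is a rough sketch in plain Python, assuming a binary logistic regression model, an L2-bounded perturbation, and a single probing step from a random direction to find the direction that changes the guess most. The function names and toy numbers are illustrative assumptions, not the implementation used for the results in the talk.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def kl_bernoulli(p, q):
    """KL(p || q) between two Bernoulli distributions."""
    tiny = 1e-12
    return (p * math.log((p + tiny) / (q + tiny))
            + (1 - p) * math.log((1 - p + tiny) / (1 - q + tiny)))

def vat_perturbation(w, b, x, eps, xi=1e-3, rng=random):
    """Approximate the L2-bounded perturbation that most changes the guess."""
    p = sigmoid(dot(w, x) + b)             # old guess, treated as a fixed target
    d = [rng.gauss(0.0, 1.0) for _ in x]   # random probe direction
    d_norm = math.sqrt(dot(d, d))
    probe = [xj + xi * dj / d_norm for xj, dj in zip(x, d)]
    q = sigmoid(dot(w, probe) + b)
    # For logistic regression, d KL(p || q) / d(probe) = (q - p) * w.
    grad = [(q - p) * wj for wj in w]
    g_norm = math.sqrt(dot(grad, grad))
    if g_norm == 0.0:
        return [0.0] * len(x)
    return [eps * gj / g_norm for gj in grad]

def vat_loss(w, b, x, eps, rng=random):
    """Penalty: how far the new guess drifts from the old guess."""
    r_adv = vat_perturbation(w, b, x, eps, rng=rng)
    p_old = sigmoid(dot(w, x) + b)
    p_new = sigmoid(dot(w, [xj + rj for xj, rj in zip(x, r_adv)]) + b)
    return kl_bernoulli(p_old, p_new)

# Toy model and one unlabeled input (illustrative values).
w, b = [3.0, 4.0], 0.0
x_unlabeled = [0.1, 0.1]
rng = random.Random(0)
r_adv = vat_perturbation(w, b, x_unlabeled, eps=0.5, rng=rng)
penalty = vat_loss(w, b, x_unlabeled, eps=0.5, rng=rng)
print("perturbation:", r_adv, "penalty:", penalty)
```

Minimizing this penalty on unlabeled data, alongside the usual supervised loss on labeled data, is what makes the method semi-supervised.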

Text Classification with VAT [Bar chart, zoomed in for legibility: RCV1 misclassification rate for Earlier SOTA, SOTA, Our baseline, Adversarial, Virtual Adversarial, Both, and Both + bidirectional model; values shown, in descending order: 7.70, 7.40, 7.20, 7.12, 7.05, 6.97, 6.68] (Goodfellow 2016)

Universal engineering machine (model-based optimization) Make new inventions by finding input that maximizes model’s predicted performance [Figure labels: training data, extrapolation] (Goodfellow 2016)
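
Model-based optimization can be sketched as gradient ascent on the input of a fitted model. The example below is plain Python. The surrogate `predicted_performance`, its training range [0, 1], and the step sizes are all hypothetical, chosen only to show how the search can leave the region the model was trained on.

```python
def predicted_performance(x):
    """Hypothetical surrogate model, assumed to be fit on designs in [0, 1]."""
    return 1.0 + 2.0 * x - 0.5 * x * x

def numerical_gradient(f, x, h=1e-5):
    """Central finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def maximize_input(f, x0, lr=0.1, steps=200):
    """Gradient ascent on the input, with the model itself held fixed."""
    x = x0
    for _ in range(steps):
        x += lr * numerical_gradient(f, x)
    return x

TRAIN_LOW, TRAIN_HIGH = 0.0, 1.0  # range covered by the assumed training data

x_best = maximize_input(predicted_performance, x0=0.5)
inside_training_range = TRAIN_LOW <= x_best <= TRAIN_HIGH
print("proposed design:", x_best)
print("inside training range:", inside_training_range)
```

The optimizer settles at an input outside the training range, where the model's prediction is an extrapolation. Nothing in the optimization checks whether the model can be trusted there.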

Conclusion • Attacking is easy • Defending is difficult • Adversarial training provides regularization and semi-supervised learning • The out-of-domain input problem is a bottleneck for model-based optimization generally (Goodfellow 2016)

cleverhans Open-source library available at: https://github.com/openai/cleverhans Built on top of TensorFlow (Theano support anticipated) Standard implementation of attacks, for adversarial training and reproducible benchmarks (Goodfellow 2016)
