
On the Connection Between Adversarial Robustness and Saliency Map Interpretability - PowerPoint PPT Presentation



1. On the Connection Between Adversarial Robustness and Saliency Map Interpretability
Christian Etmann (*, 1, 3), Sebastian Lunz (*, 2), Peter Maass (1), Carola-Bibiane Schönlieb (2)
13th June, 2019
1: ZeTeM, University of Bremen; 2: Cambridge Image Analysis, University of Cambridge; 3: Work done at Cambridge

2. Saliency Maps
[Diagram: an input x is passed through several convolutional layers and a final affine layer to produce the score vector Ψ(x).]
For a logit Ψ_i(x), we call its gradient ∇Ψ_i(x) the saliency map in x. It should show us the discriminative portions of the image.

3. Saliency Maps
[Diagram as above, plus example images: an original image and the saliency map of a ResNet50.]
For a logit Ψ_i(x), we call its gradient ∇Ψ_i(x) the saliency map in x. It should show us the discriminative portions of the image.
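As a concrete illustration of this definition, the following minimal sketch computes the saliency map ∇Ψ_i(x) of a pretrained ResNet50 with PyTorch. The model choice, input shape and the absence of real preprocessing are illustrative assumptions, not the exact setup behind the figures.

```python
# Minimal sketch: saliency map as the input gradient of one logit (PyTorch).
# The pretrained ResNet50 and the random 224x224 input are illustrative assumptions.
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True).eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a preprocessed image
logits = model(x)                                    # score vector Psi(x)
i = logits.argmax(dim=1).item()                      # predicted class F(x)
logits[0, i].backward()                              # d Psi_i / d x
saliency = x.grad.detach()[0]                        # the saliency map, same shape as the image
```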

4. An Unexplained Phenomenon
Models trained to be more robust to adversarial attacks seem to exhibit 'interpretable' saliency maps [1].
[Images: an original image and the saliency map of a robustified ResNet50.]
[1] Tsipras et al., 2019: 'Robustness may be at odds with accuracy.'

5. An Unexplained Phenomenon
Models trained to be more robust to adversarial attacks seem to exhibit 'interpretable' saliency maps [1].
[Images: an original image and the saliency map of a robustified ResNet50.]
This phenomenon has a remarkably simple explanation!
[1] Tsipras et al., 2019: 'Robustness may be at odds with accuracy.'

6. Explaining the Interpretability Puzzle
We call ρ(x) = inf_{e ∈ X} { ‖e‖ : F(x + e) ≠ F(x) } the adversarial robustness of the classifier F (with respect to the Euclidean norm ‖·‖).
• Adversarial attacks are tiny perturbations that 'fool' the classifier.

7. Explaining the Interpretability Puzzle
We call ρ(x) = inf_{e ∈ X} { ‖e‖ : F(x + e) ≠ F(x) } the adversarial robustness of the classifier F (with respect to the Euclidean norm ‖·‖).
• Adversarial attacks are tiny perturbations that 'fool' the classifier.
• A higher robustness to these attacks ⇒ greater distance to the decision boundary.

8. Explaining the Interpretability Puzzle
We call ρ(x) = inf_{e ∈ X} { ‖e‖ : F(x + e) ≠ F(x) } the adversarial robustness of the classifier F (with respect to the Euclidean norm ‖·‖).
• Adversarial attacks are tiny perturbations that 'fool' the classifier.
• A higher robustness to these attacks ⇒ greater distance to the decision boundary.
• A larger distance to the decision boundary results in a lower angle between x and ∇Ψ_i(x).

9. Explaining the Interpretability Puzzle
We call ρ(x) = inf_{e ∈ X} { ‖e‖ : F(x + e) ≠ F(x) } the adversarial robustness of the classifier F (with respect to the Euclidean norm ‖·‖).
• Adversarial attacks are tiny perturbations that 'fool' the classifier.
• A higher robustness to these attacks ⇒ greater distance to the decision boundary.
• A larger distance to the decision boundary results in a lower angle between x and ∇Ψ_i(x).
• We perceive this as a higher visual alignment between image and saliency map.

10. Explaining the Interpretability Puzzle
We call ρ(x) = inf_{e ∈ X} { ‖e‖ : F(x + e) ≠ F(x) } the adversarial robustness of the classifier F (with respect to the Euclidean norm ‖·‖).
• Adversarial attacks are tiny perturbations that 'fool' the classifier.
• A higher robustness to these attacks ⇒ greater distance to the decision boundary.
• A larger distance to the decision boundary results in a lower angle between x and ∇Ψ_i(x).
• We perceive this as a higher visual alignment between image and saliency map ... but not quite.
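The infimum in ρ(x) has no closed form for deep networks, but any perturbation e that actually flips the prediction certifies the upper bound ρ(x) ≤ ‖e‖. Below is a minimal sketch of such an upper bound via a line search along the gradient direction, assuming the `model` and `x` from the previous snippet; it is an illustration of the definition, not the attack used in the experiments.

```python
# Minimal sketch: upper-bound the adversarial robustness rho(x) by searching along
# the (negative) gradient of the predicted logit. Any e with F(x + e) != F(x)
# certifies rho(x) <= ||e||. `model` and `x` are assumed from the previous snippet.
import torch

def robustness_upper_bound(model, x, eps_grid):
    x = x.detach().requires_grad_(True)
    logits = model(x)
    i = logits.argmax(dim=1).item()
    logits[0, i].backward()
    direction = -x.grad / x.grad.norm()            # unit-norm descent direction for Psi_i
    with torch.no_grad():
        for eps in eps_grid:                       # increasing L2 budgets
            e = eps * direction
            if model(x + e).argmax(dim=1).item() != i:
                return e.norm().item()             # rho(x) <= ||e||
    return float('inf')                            # no flip found within the budget

# Example usage: bound = robustness_upper_bound(model, x, torch.linspace(0.1, 10.0, 100))
```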

11. A Simple Toy Example
[Diagram: the vectors x and z in the plane.]
First, we consider a linear, binary classifier F(x) = sgn(Ψ(x)), where Ψ(x) := ⟨x, z⟩ for some z. Then
ρ(x) = |⟨x, z⟩| / ‖z‖ = |⟨x, ∇Ψ(x)⟩| / ‖∇Ψ(x)‖.
Note that ρ(x) = ‖x‖ · |cos(δ)|, where δ is the angle between x and z.

12. A Simple Toy Example
[Diagram: the vectors x and ∇Ψ(x) in the plane.]
First, we consider a linear, binary classifier F(x) = sgn(Ψ(x)), where Ψ(x) := ⟨x, z⟩ for some z. Then
ρ(x) = |⟨x, z⟩| / ‖z‖ = |⟨x, ∇Ψ(x)⟩| / ‖∇Ψ(x)‖.
Note that ρ(x) = ‖x‖ · |cos(δ)|, where δ is the angle between x and z.
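A quick numerical check of the toy example; the vectors below are arbitrary illustrative choices.

```python
# Minimal numerical check of the linear toy example: for Psi(x) = <x, z>,
# rho(x) = |<x, z>| / ||z|| = ||x|| * |cos(delta)|. Vectors are arbitrary examples.
import numpy as np

z = np.array([3.0, 1.0])          # weight vector; grad Psi(x) = z everywhere
x = np.array([2.0, 2.0])          # a data point

rho = abs(x @ z) / np.linalg.norm(z)                        # distance to the hyperplane <., z> = 0
cos_delta = (x @ z) / (np.linalg.norm(x) * np.linalg.norm(z))
print(rho, np.linalg.norm(x) * abs(cos_delta))              # both print the same value
```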

13. Alignment
Definition (Alignment). Let Ψ = (Ψ_1, ..., Ψ_n) : X → R^n be differentiable in x. Then, for an n-class classifier defined a.e. by F(x) = argmax_i Ψ_i(x), we call ∇Ψ_{F(x)}(x) the saliency map of F. We further call
α(x) := |⟨x, ∇Ψ_{F(x)}(x)⟩| / ‖∇Ψ_{F(x)}(x)‖
the alignment with respect to Ψ in x.
For binary, linear models, by construction: ρ(x) = α(x).

14. Alignment
Definition (Alignment). Let Ψ = (Ψ_1, ..., Ψ_n) : X → R^n be differentiable in x. Then, for an n-class classifier defined a.e. by F(x) = argmax_i Ψ_i(x), we call ∇Ψ_{F(x)}(x) the saliency map of F. We further call
α(x) := |⟨x, ∇Ψ_{F(x)}(x)⟩| / ‖∇Ψ_{F(x)}(x)‖
the alignment with respect to Ψ in x.
For binary, linear models, by construction: ρ(x) = α(x) ... but already wrong for affine models.
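The affine failure is easy to see numerically: a bias term shifts the decision boundary and hence ρ(x), but leaves the gradient, and therefore α(x), unchanged. A small sketch with arbitrary example values:

```python
# Minimal sketch: for an affine binary classifier Psi(x) = <x, z> + b, the robustness
# is rho(x) = |<x, z> + b| / ||z||, while the alignment stays alpha(x) = |<x, z>| / ||z||
# (the gradient ignores the bias). All values are arbitrary examples.
import numpy as np

z = np.array([3.0, 1.0])
b = 4.0
x = np.array([2.0, 2.0])

rho   = abs(x @ z + b) / np.linalg.norm(z)   # distance to the affine decision boundary
alpha = abs(x @ z) / np.linalg.norm(z)       # alignment uses only the gradient z
print(rho, alpha)                            # differ as soon as b != 0
```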

15. How about neural nets?
There is no closed expression for the robustness. One idea is to linearize.
Definition (Linearized Robustness). Let Ψ(x) be the differentiable score vector for the classifier F in x. We call
ρ̃(x) := min_{j ≠ i*} (Ψ_{i*}(x) − Ψ_j(x)) / ‖∇Ψ_{i*}(x) − ∇Ψ_j(x)‖
the linearized robustness in x, where i* := F(x) is the predicted class at point x.
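A minimal sketch of this quantity for a PyTorch classifier, again assuming the `model` and `x` from the earlier snippets. It loops over all logits for clarity; for many classes one would restrict the minimum to the strongest competitors.

```python
# Minimal sketch: linearized robustness
#   rho_tilde(x) = min_{j != i*} (Psi_{i*}(x) - Psi_j(x)) / ||grad Psi_{i*}(x) - grad Psi_j(x)||
# computed from per-logit input gradients. `model` and `x` are assumed from earlier snippets.
import torch

def linearized_robustness(model, x):
    x = x.detach().requires_grad_(True)
    logits = model(x)[0]                      # score vector Psi(x)
    i_star = logits.argmax().item()
    grads = []
    for j in range(logits.numel()):           # one backward pass per logit (slow but simple)
        g_j = torch.autograd.grad(logits[j], x, retain_graph=True)[0]
        grads.append(g_j.flatten())
    best = float('inf')
    for j in range(logits.numel()):
        if j == i_star:
            continue
        gap = (logits[i_star] - logits[j]).item()
        denom = (grads[i_star] - grads[j]).norm().item()
        best = min(best, gap / denom)
    return best
```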

16. Bridging the Gap Between Linearized Robustness and Alignment
Using
• a homogeneous decomposition theorem
• the 'binarization' of our classifier
we get:
Theorem (Bound for general models). Let g := ∇Ψ_{i*}(x). Furthermore, let g† := ∇Ψ†_x(x) and β† the non-homogeneous portion of Ψ†_x. Denote by v̄ the ‖·‖-normalized version of v ≠ 0. Then
ρ̃(x) ≤ α(x) + ‖x‖ · ‖ḡ† − ḡ‖ + |β†| / ‖g†‖.
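One ingredient above, the homogeneous decomposition, can be made tangible on a tiny example: a ReLU network without biases is positively homogeneous of degree one, so Euler's identity gives Ψ(x) = ⟨∇Ψ(x), x⟩ and the non-homogeneous part β vanishes; with biases it does not. The architecture and values below are arbitrary assumptions for illustration.

```python
# Minimal sketch: for a bias-free ReLU network, Psi(x) = <grad Psi(x), x>
# (Euler's identity for positively 1-homogeneous functions), so the non-homogeneous
# part beta = Psi(x) - <grad Psi(x), x> vanishes; it is generally nonzero with biases.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8, bias=False), nn.ReLU(), nn.Linear(8, 1, bias=False))

x = torch.randn(1, 4, requires_grad=True)
psi = net(x).squeeze()
g = torch.autograd.grad(psi, x)[0]
beta = psi - (g * x).sum()
print(beta.item())   # ~0 for the bias-free ReLU net; nonzero once biases are enabled
```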

17. Experiments: Robustness vs. Alignment
[Plots: median alignment M[α(x)] against median robustness M[ρ(x)] on ImageNet and MNIST, with robustness measured via Gradient Attack, Projected Gradient Descent, Carlini-Wagner and Linearized Robustness.]
• Linearized robustness is a reasonable approximation.
• Alignment increases with robustness.
• Superlinear growth for ImageNet and a saturating effect on MNIST.

18. Experiments: Explaining the Observations
[Plots: fraction of the homogeneous part of the logit, M[|⟨x, g†⟩|] / M[|Ψ†(x)|], against median linearized robustness M[ρ̃(x)] on ImageNet and MNIST.]
• The degree of homogeneity largely determines how strong the connection between α and ρ̃ is.
• ImageNet: higher robustness + more homogeneity = superlinear growth.
• MNIST: higher robustness + less homogeneity = effects start cancelling out.
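To make the "degree of homogeneity" concrete, one can compare the homogeneous part ⟨x, g†⟩ of the binarized logit against its full value Ψ†(x). The slide does not spell out the exact binarization Ψ†; the sketch below simply uses the gap between the predicted logit and its strongest competitor, which is one natural but assumed choice and may differ from the paper's construction.

```python
# Minimal sketch: fraction of the binarized logit explained by its homogeneous part,
#   |<x, g_dagger>| / |Psi_dagger(x)|.
# Assumption: Psi_dagger(x) is taken as the gap between the predicted logit and the
# strongest competing logit; the paper's exact binarization may differ.
import torch

def homogeneity_fraction(model, x):
    x = x.detach().requires_grad_(True)
    logits = model(x)[0]
    i_star = logits.argmax().item()
    j = logits.topk(2).indices[1].item()            # strongest competing class
    psi_dagger = logits[i_star] - logits[j]
    g_dagger = torch.autograd.grad(psi_dagger, x)[0]
    homogeneous = (g_dagger * x).sum()              # <x, g_dagger>
    return (homogeneous.abs() / psi_dagger.abs()).item()
```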

19. On the Connection Between Adversarial Robustness and Saliency Map Interpretability
Thank you and see you at the poster! Pacific Ballroom, #70
