
On the Connection Between Adversarial Robustness and Saliency Map Interpretability - PowerPoint PPT Presentation



1. On the Connection Between Adversarial Robustness and Saliency Map Interpretability
Christian Etmann (*, 1, 3), Sebastian Lunz (*, 2), Peter Maass (1), Carola-Bibiane Schönlieb (2)
13th June, 2019
1: ZeTeM, University of Bremen; 2: Cambridge Image Analysis, University of Cambridge; 3: Work done at Cambridge

2. Saliency Maps
[Diagram: an input x is passed through several convolutional layers and a final affine layer to produce the score vector Ψ(x).]
For a logit Ψ_i(x), we call its gradient ∇Ψ_i(x) the saliency map in x. It should show us the discriminative portions of the image.

3. Saliency Maps
[Diagram as above, plus example images: an original image and the saliency map of a ResNet50.]
For a logit Ψ_i(x), we call its gradient ∇Ψ_i(x) the saliency map in x. It should show us the discriminative portions of the image.
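As a concrete illustration of this definition, the following minimal sketch computes the saliency map ∇Ψ_i(x) of a pretrained ResNet50 with PyTorch. The model choice, input shape and the absence of real preprocessing are illustrative assumptions, not the exact setup behind the figures.

```python
# Minimal sketch: saliency map as the input gradient of one logit (PyTorch).
# The pretrained ResNet50 and the random 224x224 input are illustrative assumptions.
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True).eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a preprocessed image
logits = model(x)                                    # score vector Psi(x)
i = logits.argmax(dim=1).item()                      # predicted class F(x)
logits[0, i].backward()                              # d Psi_i / d x
saliency = x.grad.detach()[0]                        # the saliency map, same shape as the image
```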

4. An Unexplained Phenomenon
Models trained to be more robust to adversarial attacks seem to exhibit 'interpretable' saliency maps [1].
[Images: an original image and the saliency map of a robustified ResNet50.]
[1] Tsipras et al., 2019: 'Robustness may be at odds with accuracy.'

5. An Unexplained Phenomenon
Models trained to be more robust to adversarial attacks seem to exhibit 'interpretable' saliency maps [1].
[Images: an original image and the saliency map of a robustified ResNet50.]
This phenomenon has a remarkably simple explanation!
[1] Tsipras et al., 2019: 'Robustness may be at odds with accuracy.'

6. Explaining the Interpretability Puzzle
We call ρ(x) = inf_{e ∈ X} { ‖e‖ : F(x + e) ≠ F(x) } the adversarial robustness of the classifier F (with respect to the Euclidean norm ‖·‖).
• Adversarial attacks are tiny perturbations that 'fool' the classifier.

7. Explaining the Interpretability Puzzle
We call ρ(x) = inf_{e ∈ X} { ‖e‖ : F(x + e) ≠ F(x) } the adversarial robustness of the classifier F (with respect to the Euclidean norm ‖·‖).
• Adversarial attacks are tiny perturbations that 'fool' the classifier.
• A higher robustness to these attacks ⇒ greater distance to the decision boundary.

8. Explaining the Interpretability Puzzle
We call ρ(x) = inf_{e ∈ X} { ‖e‖ : F(x + e) ≠ F(x) } the adversarial robustness of the classifier F (with respect to the Euclidean norm ‖·‖).
• Adversarial attacks are tiny perturbations that 'fool' the classifier.
• A higher robustness to these attacks ⇒ greater distance to the decision boundary.
• A larger distance to the decision boundary results in a lower angle between x and ∇Ψ_i(x).

9. Explaining the Interpretability Puzzle
We call ρ(x) = inf_{e ∈ X} { ‖e‖ : F(x + e) ≠ F(x) } the adversarial robustness of the classifier F (with respect to the Euclidean norm ‖·‖).
• Adversarial attacks are tiny perturbations that 'fool' the classifier.
• A higher robustness to these attacks ⇒ greater distance to the decision boundary.
• A larger distance to the decision boundary results in a lower angle between x and ∇Ψ_i(x).
• We perceive this as a higher visual alignment between image and saliency map.

10. Explaining the Interpretability Puzzle
We call ρ(x) = inf_{e ∈ X} { ‖e‖ : F(x + e) ≠ F(x) } the adversarial robustness of the classifier F (with respect to the Euclidean norm ‖·‖).
• Adversarial attacks are tiny perturbations that 'fool' the classifier.
• A higher robustness to these attacks ⇒ greater distance to the decision boundary.
• A larger distance to the decision boundary results in a lower angle between x and ∇Ψ_i(x).
• We perceive this as a higher visual alignment between image and saliency map ... but not quite.
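The infimum in ρ(x) has no closed form for deep networks, but any perturbation e that actually flips the prediction certifies the upper bound ρ(x) ≤ ‖e‖. Below is a minimal sketch of such an upper bound via a line search along the gradient direction, assuming the `model` and `x` from the previous snippet; it is an illustration of the definition, not the attack used in the experiments.

```python
# Minimal sketch: upper-bound the adversarial robustness rho(x) by searching along
# the (negative) gradient of the predicted logit. Any e with F(x + e) != F(x)
# certifies rho(x) <= ||e||. `model` and `x` are assumed from the previous snippet.
import torch

def robustness_upper_bound(model, x, eps_grid):
    x = x.detach().requires_grad_(True)
    logits = model(x)
    i = logits.argmax(dim=1).item()
    logits[0, i].backward()
    direction = -x.grad / x.grad.norm()            # unit-norm descent direction for Psi_i
    with torch.no_grad():
        for eps in eps_grid:                       # increasing L2 budgets
            e = eps * direction
            if model(x + e).argmax(dim=1).item() != i:
                return e.norm().item()             # rho(x) <= ||e||
    return float('inf')                            # no flip found within the budget

# Example usage: bound = robustness_upper_bound(model, x, torch.linspace(0.1, 10.0, 100))
```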

11. A Simple Toy Example
[Diagram: the vectors x and z in the plane.]
First, we consider a linear, binary classifier F(x) = sgn(Ψ(x)), where Ψ(x) := ⟨x, z⟩ for some z. Then
ρ(x) = |⟨x, z⟩| / ‖z‖ = |⟨x, ∇Ψ(x)⟩| / ‖∇Ψ(x)‖.
Note that ρ(x) = ‖x‖ · |cos(δ)|, where δ is the angle between x and z.

12. A Simple Toy Example
[Diagram: the vectors x and ∇Ψ(x) in the plane.]
First, we consider a linear, binary classifier F(x) = sgn(Ψ(x)), where Ψ(x) := ⟨x, z⟩ for some z. Then
ρ(x) = |⟨x, z⟩| / ‖z‖ = |⟨x, ∇Ψ(x)⟩| / ‖∇Ψ(x)‖.
Note that ρ(x) = ‖x‖ · |cos(δ)|, where δ is the angle between x and z.
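A quick numerical check of the toy example; the vectors below are arbitrary illustrative choices.

```python
# Minimal numerical check of the linear toy example: for Psi(x) = <x, z>,
# rho(x) = |<x, z>| / ||z|| = ||x|| * |cos(delta)|. Vectors are arbitrary examples.
import numpy as np

z = np.array([3.0, 1.0])          # weight vector; grad Psi(x) = z everywhere
x = np.array([2.0, 2.0])          # a data point

rho = abs(x @ z) / np.linalg.norm(z)                        # distance to the hyperplane <., z> = 0
cos_delta = (x @ z) / (np.linalg.norm(x) * np.linalg.norm(z))
print(rho, np.linalg.norm(x) * abs(cos_delta))              # both print the same value
```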

13. Alignment
Definition (Alignment). Let Ψ = (Ψ_1, ..., Ψ_n) : X → R^n be differentiable in x. Then, for an n-class classifier defined a.e. by F(x) = argmax_i Ψ_i(x), we call ∇Ψ_{F(x)}(x) the saliency map of F. We further call
α(x) := |⟨x, ∇Ψ_{F(x)}(x)⟩| / ‖∇Ψ_{F(x)}(x)‖
the alignment with respect to Ψ in x.
For binary, linear models, by construction: ρ(x) = α(x).

14. Alignment
Definition (Alignment). Let Ψ = (Ψ_1, ..., Ψ_n) : X → R^n be differentiable in x. Then, for an n-class classifier defined a.e. by F(x) = argmax_i Ψ_i(x), we call ∇Ψ_{F(x)}(x) the saliency map of F. We further call
α(x) := |⟨x, ∇Ψ_{F(x)}(x)⟩| / ‖∇Ψ_{F(x)}(x)‖
the alignment with respect to Ψ in x.
For binary, linear models, by construction: ρ(x) = α(x) ... but already wrong for affine models.
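The affine failure is easy to see numerically: a bias term shifts the decision boundary and hence ρ(x), but leaves the gradient, and therefore α(x), unchanged. A small sketch with arbitrary example values:

```python
# Minimal sketch: for an affine binary classifier Psi(x) = <x, z> + b, the robustness
# is rho(x) = |<x, z> + b| / ||z||, while the alignment stays alpha(x) = |<x, z>| / ||z||
# (the gradient ignores the bias). All values are arbitrary examples.
import numpy as np

z = np.array([3.0, 1.0])
b = 4.0
x = np.array([2.0, 2.0])

rho   = abs(x @ z + b) / np.linalg.norm(z)   # distance to the affine decision boundary
alpha = abs(x @ z) / np.linalg.norm(z)       # alignment uses only the gradient z
print(rho, alpha)                            # differ as soon as b != 0
```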

15. How about neural nets?
There is no closed expression for the robustness. One idea is to linearize.
Definition (Linearized Robustness). Let Ψ(x) be the differentiable score vector for the classifier F in x. We call
ρ̃(x) := min_{j ≠ i*} (Ψ_{i*}(x) − Ψ_j(x)) / ‖∇Ψ_{i*}(x) − ∇Ψ_j(x)‖
the linearized robustness in x, where i* := F(x) is the predicted class at point x.
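A minimal sketch of this quantity for a PyTorch classifier, again assuming the `model` and `x` from the earlier snippets. It loops over all logits for clarity; for many classes one would restrict the minimum to the strongest competitors.

```python
# Minimal sketch: linearized robustness
#   rho_tilde(x) = min_{j != i*} (Psi_{i*}(x) - Psi_j(x)) / ||grad Psi_{i*}(x) - grad Psi_j(x)||
# computed from per-logit input gradients. `model` and `x` are assumed from earlier snippets.
import torch

def linearized_robustness(model, x):
    x = x.detach().requires_grad_(True)
    logits = model(x)[0]                      # score vector Psi(x)
    i_star = logits.argmax().item()
    grads = []
    for j in range(logits.numel()):           # one backward pass per logit (slow but simple)
        g_j = torch.autograd.grad(logits[j], x, retain_graph=True)[0]
        grads.append(g_j.flatten())
    best = float('inf')
    for j in range(logits.numel()):
        if j == i_star:
            continue
        gap = (logits[i_star] - logits[j]).item()
        denom = (grads[i_star] - grads[j]).norm().item()
        best = min(best, gap / denom)
    return best
```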

16. Bridging the Gap Between Linearized Robustness and Alignment
Using
• a homogeneous decomposition theorem
• the 'binarization' of our classifier
we get:
Theorem (Bound for general models). Let g := ∇Ψ_{i*}(x). Furthermore, let g† := ∇Ψ†_x(x) and β† the non-homogeneous portion of Ψ†_x. Denote by v̄ the ‖·‖-normalized version of v ≠ 0. Then
ρ̃(x) ≤ α(x) + ‖x‖ · ‖ḡ† − ḡ‖ + |β†| / ‖g†‖.
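One ingredient above, the homogeneous decomposition, can be made tangible on a tiny example: a ReLU network without biases is positively homogeneous of degree one, so Euler's identity gives Ψ(x) = ⟨∇Ψ(x), x⟩ and the non-homogeneous part β vanishes; with biases it does not. The architecture and values below are arbitrary assumptions for illustration.

```python
# Minimal sketch: for a bias-free ReLU network, Psi(x) = <grad Psi(x), x>
# (Euler's identity for positively 1-homogeneous functions), so the non-homogeneous
# part beta = Psi(x) - <grad Psi(x), x> vanishes; it is generally nonzero with biases.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8, bias=False), nn.ReLU(), nn.Linear(8, 1, bias=False))

x = torch.randn(1, 4, requires_grad=True)
psi = net(x).squeeze()
g = torch.autograd.grad(psi, x)[0]
beta = psi - (g * x).sum()
print(beta.item())   # ~0 for the bias-free ReLU net; nonzero once biases are enabled
```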

17. Experiments: Robustness vs. Alignment
[Plots: median alignment M[α(x)] against median robustness M[ρ(x)] on ImageNet and MNIST, with robustness measured via Gradient Attack, Projected Gradient Descent, Carlini-Wagner and Linearized Robustness.]
• Linearized robustness is a reasonable approximation.
• Alignment increases with robustness.
• Superlinear growth for ImageNet and a saturating effect on MNIST.

18. Experiments: Explaining the Observations
[Plots: fraction of the homogeneous part of the logit, M[|⟨x, g†⟩|] / M[|Ψ†(x)|], against median linearized robustness M[ρ̃(x)] on ImageNet and MNIST.]
• The degree of homogeneity largely determines how strong the connection between α and ρ̃ is.
• ImageNet: higher robustness + more homogeneity = superlinear growth.
• MNIST: higher robustness + less homogeneity = effects start cancelling out.
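To make the "degree of homogeneity" concrete, one can compare the homogeneous part ⟨x, g†⟩ of the binarized logit against its full value Ψ†(x). The slide does not spell out the exact binarization Ψ†; the sketch below simply uses the gap between the predicted logit and its strongest competitor, which is one natural but assumed choice and may differ from the paper's construction.

```python
# Minimal sketch: fraction of the binarized logit explained by its homogeneous part,
#   |<x, g_dagger>| / |Psi_dagger(x)|.
# Assumption: Psi_dagger(x) is taken as the gap between the predicted logit and the
# strongest competing logit; the paper's exact binarization may differ.
import torch

def homogeneity_fraction(model, x):
    x = x.detach().requires_grad_(True)
    logits = model(x)[0]
    i_star = logits.argmax().item()
    j = logits.topk(2).indices[1].item()            # strongest competing class
    psi_dagger = logits[i_star] - logits[j]
    g_dagger = torch.autograd.grad(psi_dagger, x)[0]
    homogeneous = (g_dagger * x).sum()              # <x, g_dagger>
    return (homogeneous.abs() / psi_dagger.abs()).item()
```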

19. On the Connection Between Adversarial Robustness and Saliency Map Interpretability
Thank you and see you at the poster! Pacific Ballroom, #70
