Adversarial examples are not mysterious, generalization is




  1. Adversarial examples are not mysterious, generalization is. Angus Galloway, University of Guelph, gallowaa@uoguelph.ca

  2. The Adversarial Examples Phenomenon
  Machine learning models generalize well to an unseen test set, yet every input of a particular class is extremely close to an input of another class.
  "Accepted" informal definition: any input designed to fool a machine learning system.

  3. Formal Definitions
  A "misclassification" adversarial candidate $\hat{x}$ for a neural network $F$ is obtained from input $x$ via some perturbation $\delta$:
  $$\hat{x} = x + \delta,$$
  where $\delta$ is usually derived from the gradient of the loss $\nabla_x L(\theta, y, x)$ w.r.t. $x$, and for some small scalar $\epsilon$,
  $$\|\delta\|_p \le \epsilon, \quad p \in \{1, 2, \infty\},$$
  such that $F(x) \ne F(\hat{x})$.
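  A minimal sketch of how such a candidate is typically crafted, taking the sign of the input gradient under an L∞ budget (the FGSM special case of the definition above); the TensorFlow/Keras model, loss, and epsilon value are assumptions rather than details from the slide.

```python
import tensorflow as tf

def adversarial_candidate(model, x, y, eps=0.03):
    """Craft x_hat = x + delta, with delta derived from the gradient of the
    loss w.r.t. the input x, under an L-infinity budget eps (FGSM)."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)                    # gradient of the loss w.r.t. x
    delta = eps * tf.sign(grad)                      # ||delta||_inf <= eps
    x_hat = tf.clip_by_value(x + delta, 0.0, 1.0)    # stay in a valid image range
    return x_hat                                     # aiming for F(x_hat) != F(x)
```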

  4. Goodfellow et al. 2015
  For input $x \in \mathbb{R}^n$, there is an adversarial example $\tilde{x} = x + \eta$ subject to the constraint $\|\eta\|_\infty < \epsilon$. The dot product between a weight vector $w$ and the adversarial example $\tilde{x}$ is then
  $$w^\top \tilde{x} = w^\top x + w^\top \eta.$$
  If the elements of $w$ have average magnitude $m$, the adversarial contribution to the activation grows linearly as $\epsilon m n$...

  5. Tanay & Griffin 2016
  But both $w^\top x$ and $w^\top \eta$ grow linearly with the dimension $n$, provided that the distributions of $w$ and $x$ do not change.
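  A small numeric illustration of the two slides above, assuming a toy linear model whose input has a fixed per-coordinate alignment with the weights (an assumption standing in for a trained classifier): the adversarial term $w^\top \eta$ grows roughly as $\epsilon m n$, but so does $w^\top x$, so their ratio stays roughly constant as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1
for n in (10, 100, 1_000, 10_000):
    w = rng.normal(size=n)                       # weight vector
    # Signal-bearing input: fixed per-coordinate alignment with w plus noise;
    # its per-coordinate distribution does not change with n.
    x = 0.5 * np.sign(w) + rng.normal(size=n)
    eta = eps * np.sign(w)                       # worst-case L-inf perturbation
    print(f"n={n:6d}  w.x={w @ x:9.1f}  w.eta={w @ eta:7.1f}  "
          f"ratio={(w @ eta) / (w @ x):.2f}")
```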

  6. The Boundary Tilting Perspective
  [Figures (a), (b): image space containing the submanifold of sampled data; (a) a dense distribution of "low probability pockets"; (b) the boundary is "outside the box".]
  Recall the manifold learning hypothesis: the training data lie on a sub-manifold of finite topological dimension $f \ll n$.

  7. [Figures (c), (d): boundary tilting geometry between classes I and J, with distances m(i, B), m(i, C), m(I, B), m(I, C) to boundaries B and C.]

  8. Taxonomy
  [Figure: taxonomy of classifiers by tilting angle of the boundary L (from 0 to π/2) versus error: Type 0, Type 1, and Type 2 tilting; poorly performing classifiers (0 < v_z ≪ 1) at one extreme, optimal low-regularization classifiers (v_z = 0) at the other.]

  9. Attacking Binarized Neural Networks
  [Figure: a full-precision block (Conv2D, ReLU, BatchNorm) versus a binary block (binary Conv2D with tf.sign() activations, scalar scaling, BatchNorm).]
  Empirical observation: BNNs with low-precision weights and activations are at least as robust as their full-precision counterparts.

  10. Attacking Binarized Neural Networks (2)
  1. Regularizing effect due to the decoupling between the continuous parameters and the quantized parameters used in the forward pass, and the biased gradient estimator (STE?); see the sketch below.
  2. Strikes a better trade-off on the information bottleneck (IB) curve in the over-parameterized regime by discarding irrelevant information.
  [Figures (e), (f).]
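  A minimal sketch of the sign activation and the straight-through estimator (STE) referred to in point 1, assuming a TensorFlow 2 setting; the gradient-clipping threshold of 1 is a common convention rather than something specified on the slide.

```python
import tensorflow as tf

@tf.custom_gradient
def binary_activation(x):
    """Forward pass: hard sign (+1 / -1), as drawn with tf.sign() on slide 9.
    Backward pass: straight-through estimator, i.e. the identity gradient,
    blocked where |x| > 1. This decouples the quantized forward pass from the
    continuous parameters being updated, and the estimator is biased."""
    def grad(dy):
        return dy * tf.cast(tf.abs(x) <= 1.0, x.dtype)
    return tf.sign(x), grad

# Example: binarize the pre-activations of a layer.
h = binary_activation(tf.random.normal([8, 32, 32, 16]))   # entries in {-1, +1}
```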

  11. Why Only Consider Small Perturbations?
  Fault-tolerant engineering design: we want performance degradation to be proportional to the perturbation magnitude, regardless of an attacker's strategy.

  12. [Figure: accuracy (%) versus shift magnitude (0.1 to 0.9), comparing a model trained with PGD against natural training.]

  13. Human-Driven Attacks
  [Figures (g), (h).]

  14. A Practical Black-Box Attack

  15. Trade-offs
  [Figures (i), (j): accuracy (%) versus number of pixels changed (0 to 50), and versus FGSM attack epsilon (0.0 to 0.5), comparing Natural, Expert-L2, and FGSM-trained models.]

  16. Interpretability of Logistic Regression
  [Figures (k), (l), (m), (n).]

  17. Candidate Examples
  [Figures (o), (p), (q), (r), (s).]

  18. CIFAR-10 Architecture
  Table: simple fully-convolutional architecture adapted from the CleverHans library. The model uses ReLU activations, and does not use batch normalization or pooling.

  Layer   h   w   c_in   c_out   s   params
  Conv1   8   8   3      32      2   6.1k
  Conv2   6   6   32     64      2   73.7k
  Conv3   5   5   64     64      1   102.4k
  Fc1     1   1   256    10      1   2.6k
  Total   -   -   -      -       -   184.8k

  The model has 0.4% as many parameters as WideResNet.
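  A sketch of the table's architecture in Keras. The padding choices ("same" for Conv1, "valid" for the rest) follow the CleverHans basic CNN and are an assumption needed to make the 256-dimensional input to Fc1 work out; the per-layer counts in the table appear to exclude biases.

```python
import tensorflow as tf
from tensorflow.keras import layers

def simple_cifar10_cnn():
    """Fully-convolutional CIFAR-10 model matching the table above:
    ReLU activations, no batch norm, no pooling, ~185k parameters."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, 8, strides=2, padding="same", activation="relu"),   # 8x8x3 -> 32, ~6.1k weights
        layers.Conv2D(64, 6, strides=2, padding="valid", activation="relu"),  # 6x6x32 -> 64, ~73.7k
        layers.Conv2D(64, 5, strides=1, padding="valid", activation="relu"),  # 5x5x64 -> 64, ~102.4k
        layers.Flatten(),                                                     # 2x2x64 = 256 features
        layers.Dense(10),                                                     # "Fc1": 256 -> 10 logits, ~2.6k
    ])

model = simple_cifar10_cnn()
model.summary()
```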

  19. L∞ Adversarial Examples
  [Figure: accuracy (%) versus epsilon (0 to 100), comparing WRN FGSM, WRN PGD, CNN-L2 FGSM, CNN-L2 PGD, and WRN-Nat PGD.]
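  Since the plot compares FGSM with PGD, here is a minimal PGD sketch under an L∞ ball: iterated FGSM steps with projection back onto the ball around the clean input. Inputs are assumed to be scaled to [0, 1], and the step size, iteration count, and random start are illustrative defaults rather than values from the talk.

```python
import tensorflow as tf

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient descent under an L-infinity ball of radius eps."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    x_adv = x + tf.random.uniform(tf.shape(x), -eps, eps)    # random start in the ball
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = loss_fn(y, model(x_adv))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + alpha * tf.sign(grad)                 # FGSM-style step
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)     # project onto the ball
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)             # keep a valid image
    return x_adv
```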

  20. Robustness
  [Figure: accuracy (%) versus fraction of pixels swapped (0 to 0.8), comparing WRN, CNN, CNN-L2, and WRN-Nat.]
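  The x-axis suggests a simple non-adversarial corruption. The slide does not say exactly how pixels are "swapped", so the sketch below just shuffles a chosen fraction of pixel locations as one plausible reading; the function name and scheme are assumptions.

```python
import numpy as np

def swap_pixels(image, fraction, seed=0):
    """Randomly permute `fraction` of the pixel locations of an (H, W, C) image;
    accuracy can then be measured as this fraction is swept, as in the plot above."""
    rng = np.random.default_rng(seed)
    h, w, c = image.shape
    flat = image.reshape(-1, c).copy()
    idx = rng.choice(h * w, size=int(fraction * h * w), replace=False)
    flat[idx] = flat[rng.permutation(idx)]     # shuffle the selected pixel values
    return flat.reshape(h, w, c)
```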

  21. Noisy Examples

  22. With L2 Weight Decay
  The "independent components" of natural scenes are edge filters (Bell & Sejnowski 1997).
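  The filters on this slide come from training with an L2 penalty on the weights; a minimal sketch of adding such a penalty to the training loss follows. The decay coefficient and optimizer settings are illustrative assumptions, and `simple_cifar10_cnn` refers to the architecture sketch above.

```python
import tensorflow as tf

weight_decay = 5e-4                    # illustrative L2 coefficient
model = simple_cifar10_cnn()           # architecture sketch from slide 18
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        l2 = tf.add_n([tf.nn.l2_loss(w) for w in model.trainable_weights])
        loss = loss_fn(y, logits) + weight_decay * l2   # data term + L2 penalty
    grads = tape.gradient(loss, model.trainable_weights)
    opt.apply_gradients(zip(grads, model.trainable_weights))
    return loss
```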

  23. Without Weight Decay

  24. Fooling Images
  "4 years ago I didn't think small-perturbation adversarial examples were going to be so hard to solve. I thought after another n months of working on those, I'd be basically done with them and would move on to fooling attacks."
  Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images (CVPR 2015)
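  For contrast with small-perturbation attacks, a minimal sketch of generating a fooling image: start from noise and push up the confidence of a target class until the network is highly confident on an image that is meaningless to a human. Nguyen et al. (2015) also use evolutionary search and CPPN encodings; this gradient-ascent variant and its hyperparameters are illustrative assumptions.

```python
import tensorflow as tf

def fooling_image(model, target_class, steps=200, lr=0.05, shape=(1, 32, 32, 3)):
    """Gradient ascent on the target-class logit, starting from uniform noise."""
    x = tf.Variable(tf.random.uniform(shape, 0.0, 1.0))
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            # Maximize the target logit by minimizing its negative.
            loss = -model(x, training=False)[0, target_class]
        opt.apply_gradients([(tape.gradient(loss, x), x)])
        x.assign(tf.clip_by_value(x, 0.0, 1.0))   # keep a valid image
    return x.numpy()
```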

  25. Fooling Images (CIFAR-10)

  26. Fooling Images (SVHN)

  27. Fooling Images (SVHN)
  The robust training procedure does not learn random labels (lower Rademacher complexity).

  28. Fooling Images
  [Figure: attack success rate (ASR) and margin (M) versus epsilon (0 to 250), for WRN and CNN-L2.]

  29. Divide and Conquer?
  Image from Dube (2018).

  30. Remarks
  ◮ Test accuracy on popular ML benchmarks is a weak measure of generalization.
  ◮ The plethora of band-aid fixes to standard DNNs does not yield compelling results (e.g. the provably robust framework).
  ◮ Incorporate expert knowledge, e.g. by explicitly modeling part-whole relationships and other priors that relate to known causal features, such as edges in natural scenes.
  ◮ Good generalization implies some level of privacy, and more "fair" models, assuming the original intent is fair.

  31. Future Work
  Information bottleneck (IB) theory seems essential for efficiently learning robust models from finite data. But why do models with no bottleneck generalize well on common machine learning datasets? i-RevNet retains all information until the final layer and achieves high accuracy, but is extremely sensitive to adversarial examples.
