Adversarial Training and Robustness for Multiple Perturbations


  1. Adversarial Training and Robustness for Multiple Perturbations. Poster #87. Florian Tramèr & Dan Boneh, NeurIPS 2019

  2. Adversarial examples
[Figure: an imperceptibly perturbed image of a tabby cat (88% confidence) is classified as guacamole (99% confidence).]
• ML models learn very different features than humans
• This is a safety concern for deployed ML models
• Classification in adversarial settings is hard
Szegedy et al., 2014; Goodfellow et al., 2015; Athalye, 2017

  3. Adversarial training (Szegedy et al., 2014; Madry et al., 2017)
1. Choose a set of perturbations, e.g., noise of small ℓ∞ norm: S = {δ : ‖δ‖∞ ≤ ε}
2. For each example (x, y), find an adversarial example: x_adv = x + argmax_{δ ∈ S} L(x + δ, y)
3. Train the model on (x_adv, y)
4. Repeat until convergence
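The loop above maps directly to code. Below is a minimal PyTorch sketch of ℓ∞ adversarial training, with the inner maximization approximated by projected gradient descent (PGD) as in Madry et al.; `model`, `loader`, `optimizer` and the budget values `eps`, `alpha`, `steps` are hypothetical placeholders, not taken from the talk.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Approximate argmax over {delta : ||delta||_inf <= eps} of the loss via PGD."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        # Ascend the loss, then project back onto the l_inf ball of radius eps.
        delta.data = (delta.data + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    # Clamp so the result stays a valid image in [0, 1].
    return (x + delta).detach().clamp(0, 1)

def adversarial_training(model, loader, optimizer, epochs=10):
    for _ in range(epochs):                          # 4. repeat until convergence
        for x, y in loader:
            x_adv = pgd_linf(model, x, y)            # 2. find an adversarial example
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)  # 3. train on it
            loss.backward()
            optimizer.step()
```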

  4. How well does it work? Adversarial training on CIFAR10 with ℓ∞ noise:
[Bar chart] No noise: 96% accuracy | ℓ∞ noise: 70% | ℓ1 noise: 16% | Rotation: 9%
Engstrom et al., 2017; Sharma & Chen, 2018

  5. How to prevent other adversarial examples?
S1 = {δ : ‖δ‖∞ ≤ ε∞}
S2 = {δ : ‖δ‖1 ≤ ε1}
S3 = {δ : δ is a small rotation}
The adversary can choose a perturbation type for each input: S = S1 ∪ S2 ∪ S3
• Pick the worst-case adversarial example from S (sketched below)
• Train the model on that example
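A sketch of the worst-case ("max") selection over S = S1 ∪ S2 ∪ S3, continuing the PyTorch sketch above. The per-type attack functions passed in `attacks` are stand-ins (e.g., the `pgd_linf` above plus hypothetical `pgd_l1` and `rotation_attack` helpers that the slides don't spell out):

```python
import torch
import torch.nn.functional as F

def worst_case_examples(model, x, y, attacks):
    """Per-example worst case over S = S1 ∪ S2 ∪ S3: run one attack per
    perturbation type and keep, for each input, the candidate that
    maximizes the loss."""
    candidates = [attack(model, x, y) for attack in attacks]
    with torch.no_grad():
        # Per-example loss for every candidate: shape (num_attacks, batch_size).
        losses = torch.stack([
            F.cross_entropy(model(c), y, reduction="none") for c in candidates
        ])
    worst = losses.argmax(dim=0)        # index of the worst attack per example
    stacked = torch.stack(candidates)   # (num_attacks, batch_size, ...)
    return stacked[worst, torch.arange(x.shape[0])]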

  6. Does this work? A robustness tradeoff is provably inherent in some classification tasks:
increased robustness to one type of noise ⇒ decreased robustness to another.
Empirically validated on CIFAR10 & MNIST.
MNIST, trained against ℓ∞, ℓ1 and ℓ2 noise together: about 50% accuracy, and the trained model suffers from gradient masking.

  7. What if we combine perturbations?
[Figure: natural image | rotation | ℓ∞ noise | ½ rotation + ½ ℓ∞ noise]
No noise: 96% accuracy | One noise type: 70% | One of two noise types: 65% | Mixture of two noise types: 55%
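One way to read "½ rotation + ½ ℓ∞ noise" is to spend half of each perturbation budget on a single input. A hypothetical sketch, reusing `pgd_linf` from the earlier block and assuming torchvision's `rotate`; the fixed angle and `max_angle=30` are illustrative only (a real attack would search over the allowed angles):

```python
import torchvision.transforms.functional as TF

def half_rotation_half_linf(model, x, y, max_angle=30.0, eps=8/255):
    """'½ rotation + ½ ℓ∞ noise': spend half of each budget on one input."""
    # Half of the rotation budget (a single fixed angle for brevity).
    x_rot = TF.rotate(x, angle=max_angle / 2)
    # Half of the ℓ∞ budget, added on top of the rotated image.
    return pgd_linf(model, x_rot, y, eps=eps / 2)
```

The slide's point is that a model trained against each perturbation set separately can still lose accuracy on such mixtures, which is why the mixed column drops to 55%.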

  8. Conclusion. Adversarial training for multiple perturbation sets works, but...
• Significant loss in robustness
• Weak robustness to affine combinations of perturbations
Open questions:
• Train a single MNIST model with high robustness to any ℓp noise
• Better scaling of multi-perturbation adversarial training
• Which perturbations do we care about?
Poster #87, https://arxiv.org/abs/1904.13000
