By Learning the Distributions of Adversarial Examples

  1. By Learning the Distributions of Adversarial Examples. Boqing Gong. Joint work with Yandong Li, Lijun Li, Liqiang Wang, & Tong Zhang. Published in ICML 2019.

  2.–3. Intriguing properties of deep neural networks (DNNs). Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. ICLR. Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. ICLR.

  4. Projected gradient descent (PGD) attack. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083. Kurakin, A., Goodfellow, I., & Bengio, S. (2016). Adversarial machine learning at scale. arXiv:1611.01236.
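
For concreteness, here is a minimal PGD sketch in PyTorch, assuming an image classifier `model` with inputs in [0, 1] and an L-infinity budget `eps`; the function name and default hyperparameters are illustrative, not taken from the slides.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Untargeted L-infinity PGD: repeatedly step along the sign of the loss
    gradient and project back into the eps-ball around the clean input x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascend the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to the eps-ball
            x_adv = x_adv.clamp(0, 1)                              # keep a valid image
    return x_adv.detach()
```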

  5. Intriguing results (1): ~100% attack success rates on CIFAR10 & ImageNet. (Figure: clean image vs. adversarial image.)

  6. Intriguing results (2)

  7. Intriguing results (2): adversarial examples generalize between different DNNs (e.g., AlexNet vs. InceptionV3).

  8. Intriguing results (3): a universal adversarial perturbation. Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.

  9. In a nutshell, white-box adversarial attacks can: fool different DNNs for almost all test examples (most data points lie near the classification boundaries); fool different DNNs with the same adversarial examples (the classification boundaries of various DNNs are close); and fool different DNNs with a single universal perturbation (most examples can be turned into adversarial ones by moving them along the same direction by the same amount).

  10. However, white-box adversarial attacks do not apply to most real-world scenarios: they fail when the network architecture is unknown, when the weights are unknown, or when querying the network is prohibitive (e.g., due to cost).

  11. Black-box attacks: substitute attack (Papernot et al., 2017); decision-based (Brendel et al., 2017); boundary-tracing (Cheng et al., 2019); zeroth-order (Chen et al., 2017); natural evolution strategies (Ilyas et al., 2018). (Figure: example image with top predictions panda 0.88493, indri 0.00878, red panda 0.00317.)
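
As a sketch of the "zeroth-order" family above (not the exact ZOO algorithm), here is a coordinate-wise finite-difference gradient estimate that uses only loss queries; `query_loss` is a hypothetical black-box oracle returning the attack loss for an input array.

```python
import numpy as np

def finite_difference_gradient(query_loss, x, num_coords=128, h=1e-4, rng=None):
    """Estimate the gradient of a black-box loss at x by symmetric finite
    differences on a random subset of coordinates (two queries per coordinate)."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    coords = rng.choice(x.size, size=min(num_coords, x.size), replace=False)
    for i in coords:
        e = np.zeros(x.size)
        e[i] = h
        e = e.reshape(x.shape)
        grad.flat[i] = (query_loss(x + e) - query_loss(x - e)) / (2 * h)
    return grad
```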

  12. The adversarial perturbation (for an input)? Pitfalls: bad local optima, non-smooth optimization, curse of dimensionality, defense-specific gradient estimation, etc. Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML.

  13. Our work learns the distribution of adversarial examples (for any input).

  14. Our work learns the distribution of adversarial examples (for an input). It reduces the "attack dimension" (fewer queries into the network), smoothes the optimization (higher attack success rates), and characterizes the risk of the input example (enabling new defense methods).

  15. Our work learns the distribution of adversarial examples (for an input).

  16. Our work learns the distribution of adversarial examples (for an input): a sample drawn from the distribution fools the DNN with high probability (see the sketch below).
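
A minimal sketch of this sampling idea, assuming (for illustration only) an isotropic Gaussian over perturbations whose mean `mu` is being learned, an L-infinity budget `eps`, and a black-box `query_loss` oracle; the actual parameterization in the paper may differ.

```python
import numpy as np

def sample_adversarial_candidates(query_loss, x, mu, sigma=0.1, eps=8/255, n=50, rng=None):
    """Draw n perturbations from N(mu, sigma^2 I), clip each into the eps-ball
    around x, and return every candidate together with its black-box loss."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = []
    for _ in range(n):
        delta = mu + sigma * rng.standard_normal(size=x.shape)
        x_adv = np.clip(x + np.clip(delta, -eps, eps), 0.0, 1.0)
        candidates.append((x_adv, query_loss(x_adv)))
    return candidates
```

The distribution parameter `mu` is what the evolution-strategy procedure on the following slides updates.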

  17. Which family of distributions?

  18.–24. Natural evolution strategies (NES). Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., & Schmidhuber, J. (2014). Natural evolution strategies. JMLR. A sketch of a basic NES update follows below.
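
A minimal sketch of one NES update for the Gaussian mean used above, under the assumption that `loss_fn` is the black-box attack loss to be minimized; the fitness standardization and step size are common NES choices, not necessarily those of the paper.

```python
import numpy as np

def nes_step(loss_fn, mu, sigma=0.1, pop_size=50, lr=0.02, rng=None):
    """One natural-evolution-strategies update of the Gaussian mean mu using
    only loss evaluations: estimate the search gradient of E[loss] and descend."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(size=(pop_size,) + mu.shape)
    losses = np.array([loss_fn(mu + sigma * z) for z in noise])
    # Standardize the fitness so the step is invariant to the scale of the loss.
    fitness = (losses - losses.mean()) / (losses.std() + 1e-8)
    grad_est = (fitness.reshape((-1,) + (1,) * mu.ndim) * noise).mean(axis=0) / sigma
    return mu - lr * grad_est  # move the distribution toward lower expected loss
```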

  25. Black-box

  26. Experiment setup: attack 13 defended DNNs & 2 vanilla DNNs; consider both … and …; examine all test examples of CIFAR10 & 1,000 of ImageNet, excluding those misclassified by the targeted DNN; evaluate by attack success rates.

  27. Attack success rates, ImageNet

  28. Attack success rates, CIFAR10

  29. Attack success rate vs. optimization steps

  30. Transferabilities of the adversarial examples

  31. A universally effective defense technique? Adversarial training / defensive learning. (Diagram: DNN weights updated against the PGD attack.)
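
A minimal sketch of PGD-based adversarial training in PyTorch, reusing the hypothetical `pgd_attack` from the earlier sketch; the optimizer, data loader, and hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8/255, alpha=2/255, steps=10):
    """One epoch of adversarial training: generate PGD examples on the fly
    against the current weights and train the classifier on them."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps=eps, alpha=alpha, steps=steps)  # see the earlier PGD sketch
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```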

  32. In a nutshell, our attack is a powerful black-box attack (on par with or better than white-box attacks); it is universal, defeating various defenses with the same algorithm; it characterizes the distributions of adversarial examples; it reduces the "attack dimension"; and it speeds up defensive learning (ongoing work).

  33. Physical adversarial attack. Boqing Gong. Joint work with Yang Zhang, Hassan Foroosh, & Philip David. Published in ICLR 2019.

  34. Recall the following result: a universal adversarial perturbation. Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.

  35. Physical attack: universal perturbation → 2D mask. Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., & Song, D. (2018). Robust physical-world attacks on deep learning models. CVPR.

  36. Physical attack: 2D mask → 3D camouflage. Gradient descent w.r.t. the camouflage c to minimize the detection scores for the vehicle under all feasible locations. Problem: the pipeline is non-differentiable.

  37. Physical attack: 2D mask → 3D camouflage. Repeat until done: (1) camouflage a vehicle; (2) drive it around and take many pictures of it; (3) detect it with Faster R-CNN and save the detection scores. → Dataset: {(camouflage, vehicle, background, detection score)}.

  38. Physical attack: 2D mask → 3D camouflage. Fit a DNN to predict any camouflage's corresponding detection scores.

  39. Physical attack: 2D mask → 3D camouflage. Gradient descent w.r.t. the camouflage c to minimize the detection scores for the vehicle under all feasible locations. The pipeline is non-differentiable, but it is approximated by a DNN (a sketch of this surrogate idea follows below).
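
A minimal sketch of the two-step idea on slides 37–39: fit a differentiable surrogate that maps a camouflage (plus a context code for vehicle pose/background) to a predicted detection score, then run gradient descent on the camouflage through that surrogate. The architecture, tensor shapes, and names here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ScorePredictor(nn.Module):
    """Differentiable stand-in for the non-differentiable render-and-detect pipeline:
    maps a flattened camouflage pattern plus a context code to a detection score."""
    def __init__(self, cam_dim, ctx_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cam_dim + ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # predicted detection score in [0, 1]
        )

    def forward(self, cam, ctx):
        return self.net(torch.cat([cam, ctx], dim=-1)).squeeze(-1)

def optimize_camouflage(predictor, cam, contexts, lr=0.01, steps=200):
    """Gradient descent on the camouflage to minimize the predicted detection
    score averaged over many vehicle/background contexts."""
    cam = cam.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([cam], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = predictor(cam.expand(contexts.size(0), -1), contexts).mean()
        loss.backward()
        opt.step()
        with torch.no_grad():
            cam.clamp_(0, 1)   # keep the pattern in a valid color range
    return cam.detach()
```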

  40. Why do we care?

  41. Observations, re-observations, & future work. Defended DNNs are still vulnerable to transfer attacks (though only to a moderate degree). Adversarial examples from black-box attacks are less transferable than those from white-box attacks. All future work on defenses will adopt adversarial training. Adversarial training will become faster (we are working on it). We should certify DNNs' expected robustness (see the references on the next slide).

  42. New works to watch. Stateful DNNs: Goodfellow (2019). A Research Agenda: Dynamic Models to Defend Against Correlated Attacks. arXiv:1903.06293. Explaining adversarial examples: Ilyas et al. (2019). Adversarial Examples Are Not Bugs, They Are Features. arXiv:1905.02175. Faster adversarial training: Zhang et al. (2019). You Only Propagate Once: Painless Adversarial Training Using Maximal Principle. arXiv:1905.00877; and Shafahi et al. (2019). Adversarial Training for Free! arXiv:1904.12843. Certifying DNNs' expected robustness: Webb et al. (2019). A Statistical Approach to Assessing Neural Network Robustness. ICLR; and Cohen et al. (2019). Certified Adversarial Robustness via Randomized Smoothing. arXiv:1902.02918.
