Black-box adversarial attacks by learning the distributions of adversarial examples. Boqing Gong. Joint work with Yandong Li, Lijun Li, Liqiang Wang, & Tong Zhang. Published in ICML 2019.
Intriguing properties of deep neural networks (DNNs). Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. ICLR. Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. ICLR.
Projected gradient descent (PGD) attack. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083. Kurakin, A., Goodfellow, I., & Bengio, S. (2016). Adversarial machine learning at scale. arXiv:1611.01236.
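For reference, a minimal sketch of an L-infinity PGD attack (PyTorch; a sketch under assumed inputs, not the original implementation — `model`, `x`, `y`, and the hyperparameters are illustrative):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: repeatedly ascend the loss, then project back into the eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # signed gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the eps-ball around x
            x_adv = x_adv.clamp(0, 1)                 # stay in the valid pixel range
    return x_adv.detach()
```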
Intriguing results (1): ~100% attack success rates on CIFAR10 & ImageNet (clean image vs. adversarial image).
Intriguing results (2): Adversarial examples generalize between different DNNs (e.g., AlexNet vs. InceptionV3).
Intriguing results (3): A universal adversarial perturbation. Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.
In a nutshell, white-box adversarial attacks can:
Fool a DNN on almost all test examples → most data points lie near the classification boundaries.
Fool different DNNs with the same adversarial examples → the classification boundaries of various DNNs are close to each other.
Fool different DNNs with a single universal perturbation → most examples can be turned adversarial by moving them along the same direction by the same amount.
However, white-box adversarial attacks cannot:
Apply to most real-world scenarios.
Work when the network architecture is unknown.
Work when the weights are unknown.
Work when querying the network is prohibitive (e.g., because of cost).
Black-box attacks: substitute attack (Papernot et al., 2017); decision-based (Brendel et al., 2017); boundary-tracing (Cheng et al., 2019); zeroth-order (Chen et al., 2017); natural evolution strategies (Ilyas et al., 2018). (Illustration: the target model only returns class probabilities, e.g., Panda: 0.88493, Indri: 0.00878, Red Panda: 0.00317.)
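For intuition, a minimal sketch of the zeroth-order idea behind attacks like Chen et al. (2017): estimate gradients from score queries alone via finite differences. Here `query_loss` is a hypothetical function that queries the black-box model and returns a scalar loss; the coordinate subsampling is an illustrative choice:

```python
import numpy as np

def zeroth_order_gradient(query_loss, x, h=1e-3, n_coords=128, rng=None):
    """Estimate the gradient of a black-box loss at x using symmetric finite
    differences on a random subset of coordinates (no backpropagation needed)."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x)
    for i in rng.choice(x.size, size=min(n_coords, x.size), replace=False):
        e = np.zeros(x.size)
        e[i] = h
        e = e.reshape(x.shape)
        grad.flat[i] = (query_loss(x + e) - query_loss(x - e)) / (2 * h)  # two queries per coordinate
    return grad
```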
The(?) adversarial perturbation (for an input). Difficulties: bad local optima, non-smooth optimization, the curse of dimensionality, defense-specific gradient estimation, etc. Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML.
Our work learns the distribution of adversarial examples (for an input):
Reduces the “attack dimension” → fewer queries into the network.
Smooths the optimization → higher attack success rates.
Characterizes the risk of the input example → new defense methods.
Our work learns the distribution of adversarial examples (for an input): a sample from the distribution fools the DNN with high probability.
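Conceptually, once such a distribution has been learned for an input, attacking amounts to sampling from it. A minimal sketch (NumPy; the Gaussian-plus-tanh parameterization and the names `mu`, `sigma`, and `classifier` are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def sample_adversarial(x, mu, sigma, eps=8/255, n=50, rng=None):
    """Draw n candidate adversarial images from a learned Gaussian over a latent
    perturbation space, squashed into the eps-ball around x."""
    rng = rng or np.random.default_rng(0)
    z = rng.normal(mu, sigma, size=(n,) + x.shape)   # latent samples
    delta = np.tanh(z) * eps                         # bounded perturbations
    return np.clip(x[None] + delta, 0.0, 1.0)

# e.g., the distribution's quality = how often a sample fools the classifier:
# candidates = sample_adversarial(x, mu, sigma)
# success_rate = np.mean([classifier(c) != true_label for c in candidates])
```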
Which family of distributions?
Natural evolution strategies (NES). Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., & Schmidhuber, J. (2014). Natural evolution strategies. JMLR.
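A minimal NES sketch (NumPy; `f` is any black-box objective to maximize, and the antithetic sampling plus the hyperparameters are illustrative choices rather than the slides' exact settings):

```python
import numpy as np

def nes_step(f, mu, sigma=0.1, n=25, lr=0.02, rng=None):
    """One NES update: estimate the gradient of E_{z ~ N(mu, sigma^2 I)}[f(z)]
    from function evaluations only, then move mu uphill along it."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((n,) + mu.shape)
    f_plus = np.array([f(mu + sigma * e) for e in eps])    # antithetic pairs
    f_minus = np.array([f(mu - sigma * e) for e in eps])   #   reduce variance
    w = (f_plus - f_minus).reshape((n,) + (1,) * mu.ndim)
    grad = (w * eps).sum(axis=0) / (2 * n * sigma)         # score-function gradient estimate
    return mu + lr * grad
```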
Black-box
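Putting the two previous sketches together in the black-box setting: sample perturbations from the distribution, score them with probability-only queries to the target model, and update the distribution with the NES gradient estimate. Again a sketch, not the paper's exact algorithm; `query_probs`, the margin loss, and all hyperparameters are assumptions:

```python
import numpy as np

def black_box_attack(query_probs, x, true_label, mu, sigma=0.1, n=25,
                     lr=0.02, eps=8/255, steps=300, rng=None):
    """Minimize a margin loss over the distribution parameters using only
    class-probability queries (no gradients from the target model)."""
    rng = rng or np.random.default_rng(0)

    def loss(z):
        x_adv = np.clip(x + np.tanh(z) * eps, 0.0, 1.0)        # one candidate image
        p = query_probs(x_adv)                                  # one query to the black box
        return p[true_label] - np.delete(p, true_label).max()  # negative once fooled

    for _ in range(steps):
        noise = rng.standard_normal((n,) + mu.shape)
        values = np.array([loss(mu + sigma * e) for e in noise])
        values = (values - values.mean()) / (values.std() + 1e-8)  # normalize fitness
        w = values.reshape((n,) + (1,) * mu.ndim)
        mu = mu - lr * (w * noise).sum(axis=0) / (n * sigma)       # NES descent step
    return np.clip(x + np.tanh(mu) * eps, 0.0, 1.0)
```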
Experiment setup: attack 13 defended DNNs & 2 vanilla DNNs; consider both ℓ∞ and ℓ2 constraints; examine all test examples of CIFAR10 & 1,000 of ImageNet, excluding those already misclassified by the targeted DNN; evaluate by attack success rates.
Attack success rates, ImageNet
Attack success rates, CIFAR10
Attack success rate vs. optimization steps
Transferability of the adversarial examples
A universally effective defense technique? Adversarial training / defensive learning: alternate between updating the DNN weights and generating adversarial training examples with the PGD attack.
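A minimal sketch of that loop (PyTorch, Madry-style adversarial training; it reuses the earlier `pgd_attack` sketch, and `model`, `loader`, and `optimizer` are illustrative assumptions):

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8/255):
    """One epoch of PGD adversarial training: craft adversarial examples on the
    fly (inner maximization) and train the network on them (outer minimization)."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps=eps)  # inner loop: the PGD attack
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```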
In a nutshell, our attack:
Is a powerful black-box attack, matching or exceeding (>=) white-box attacks.
Is universal: it defeats various defenses with the same algorithm.
Characterizes the distributions of adversarial examples.
Reduces the “attack dimension”.
Speeds up defensive learning (ongoing work).
Physical adversarial attack. Boqing Gong. Joint work with Yang Zhang, Hassan Foroosh, & Philip David. Published in ICLR 2019.
Recall the following result A universal adversarial perturbation Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.
Physical attack: universal perturbation → 2D mask. Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., & Song, D. (2018). Robust physical-world attacks on deep learning models. CVPR.
Physical attack: 2D mask → 3D camouflage. Gradient descent w.r.t. the camouflage c to minimize the vehicle's detection scores under all feasible locations. Problem: the objective is non-differentiable.
Physical attack: 2D mask → 3D camouflage. Repeat until done: 1. Camouflage a vehicle. 2. Drive it around and take many pictures of it. 3. Detect it with Faster R-CNN & save the detection scores. → Dataset: {(camouflage, vehicle, background, detection score)}.
Physical attack: 2D mask → 3D camouflage Fit a DNN to predict any camouflage’s corresponding detection scores
Physical attack: 2D mask → 3D camouflage. Gradient descent w.r.t. the camouflage c to minimize the vehicle's detection scores under all feasible locations. The non-differentiable pipeline is now approximated by the fitted DNN, which is differentiable.
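A sketch of that clone-network idea (PyTorch; the architecture, the flattened camouflage/context encoding, and all names are illustrative assumptions rather than the paper's exact setup):

```python
import torch
import torch.nn as nn

class ScorePredictor(nn.Module):
    """Differentiable stand-in for the non-differentiable render-and-detect pipeline:
    predicts the detector's score from a camouflage and a rendering context
    (vehicle pose, background, camera viewpoint), both encoded as flat vectors."""
    def __init__(self, cam_dim, ctx_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cam_dim + ctx_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),  # detection score in [0, 1]
        )

    def forward(self, camouflage, context):
        return self.net(torch.cat([camouflage, context], dim=-1))

def optimize_camouflage(predictor, camouflage, contexts, steps=100, lr=0.01):
    """After fitting the predictor on {(camouflage, context, detection score)} data,
    minimize the predicted score w.r.t. the camouflage by gradient descent."""
    camouflage = camouflage.clone().requires_grad_(True)
    opt = torch.optim.Adam([camouflage], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        scores = predictor(camouflage.expand(contexts.size(0), -1), contexts)
        scores.mean().backward()              # average over all sampled contexts
        opt.step()
        with torch.no_grad():
            camouflage.clamp_(0.0, 1.0)       # keep valid color values
    return camouflage.detach()
```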
Why do we care?
Observation, re-observation, & future work:
Defended DNNs are still vulnerable to transfer attacks (though only to a moderate degree).
Adversarial examples from black-box attacks are less transferable than those from white-box attacks.
All future work on defenses will adopt adversarial training.
Adversarial training will become faster (we are working on it).
We should certify DNNs' expected robustness (see the works below).
New works to watch:
Stateful DNNs: Goodfellow (2019). A Research Agenda: Dynamic Models to Defend Against Correlated Attacks. arXiv:1903.06293.
Explaining adversarial examples: Ilyas et al. (2019). Adversarial Examples Are Not Bugs, They Are Features. arXiv:1905.02175.
Faster adversarial training: Zhang et al. (2019). You Only Propagate Once: Painless Adversarial Training Using Maximal Principle. arXiv:1905.00877; Shafahi et al. (2019). Adversarial Training for Free! arXiv:1904.12843.
Certifying DNNs' expected robustness: Webb et al. (2019). A Statistical Approach to Assessing Neural Network Robustness. ICLR; Cohen et al. (2019). Certified Adversarial Robustness via Randomized Smoothing. arXiv:1902.02918.