  1. Recent advances in adversarial machine learning: defense, transferable and camouflaged attacks. Xingjun Ma, School of Computing and Information Systems, The University of Melbourne. April 2020.

  2. Deep learning models are used everywhere: image classification, object detection, speech recognition, autonomous driving, medical diagnosis, playing games.

  3. Deep neural networks are vulnerable: small perturbations can fool state-of-the-art ML models. Szegedy et al. 2013; Goodfellow et al. 2014.

  4. Security risks in medical diagnosis: adding a small perturbation (image + ε · perturbation) flips the prediction from "having disease" to "no disease". Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems. Ma et al., Pattern Recognition, 2020.

  5. Security threats to autonomous driving: adversarial traffic signs are all recognized as a 45 km/h speed-limit sign. Evtimov et al. 2017.

  6. Security risks in speech and NLP systems. Carlini et al. 2018; Ribeiro et al. 2018.

  7. Security risks in face and object recognition. Brown et al., CVPRW 2018; https://cvdazzle.com/

  8. Research in adversarial machine learning.
     Adversarial attacks: 1. White-box: restricted (norm-bounded), semantic, sparse, … 2. Black-box: query-based, transferable. 3. Image, audio, video, text. 4. Digital vs. physical-world.
     Adversarial defenses: 1. Detection: natural or adversarial? 2. Adversarial training, robust optimization. 3. Certifiable robustness. 4. Data denoising, filtering. 5. Model quantization, compression, pruning. 6. Input gradient regularization.

  9. How adversarial examples are crafted. (1) Train a DNN classifier on the training images (Class 1 vs. Class 2). (2) Feed a test image into the trained DNN classifier and extract the input gradient. (3) Perturb the image using that gradient: the adversarial attack.

  10. How adversarial examples are crafted.
      Model training: min_θ Σ_{(x_i, y_i) ∈ D_train} L(f_θ(x_i), y_i), where D_train is the training data, x_i a training sample, y_i its class label, L the loss function, and f_θ the model.
      Adversarial attack (a test-time attack): max_{x'} L(f_θ(x'), y) subject to ||x' − x||_p ≤ ε, for x ∈ D_test: increase the error while keeping the change to the input small.
      • Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014): x' = x + ε · sign(∇_x L(f_θ(x), y)), with ||x' − x||_∞ ≤ ε = 8/255 ≈ 0.031, where x' is the adversarial example. (A minimal sketch follows below.)
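The FGSM step above is simple enough to sketch directly. Below is a minimal PyTorch version (my own sketch, not the speaker's code); the model, loss and ε value are assumptions chosen to match the slide's formulation.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """Craft x' = x + epsilon * sign(grad_x L(f_theta(x), y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # L(f_theta(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()    # one signed gradient step
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```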

  11. Why do adversarial examples exist?
      • Viewing a DNN as a sequence of transformed spaces (1st layer → 10th layer → 20th layer).
      Non-linear explanation: non-linear transformations lead to the existence of small "pockets" in the deep space:
      • Regions of low probability (not naturally occurring).
      • Densely scattered regions.
      • Continuous regions.
      • Close to the normal data subspace.
      Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality. Ma et al., ICLR 2018; Szegedy et al. 2013. (A sketch of the LID estimate follows below.)
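For reference, the LID characterization in Ma et al. (ICLR 2018) relies on a maximum-likelihood estimate of local intrinsic dimensionality from nearest-neighbour distances. A minimal sketch, assuming a NumPy batch as the reference neighbourhood and an illustrative k (not the paper's exact experimental setup):

```python
import numpy as np

def lid_mle(x, batch, k=20):
    """LID(x) ~= -1 / mean_{i<=k} log(r_i / r_k), with r_i the i-th nearest-neighbour distance."""
    dists = np.linalg.norm(batch - x, axis=1)      # distances from x to the reference batch
    dists = np.sort(dists[dists > 0])[:k]          # k nearest non-zero distances
    return -1.0 / np.mean(np.log(dists / dists[-1]))
```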

  12. Insufficient training data?
      • An illustrative example: x ∈ [−1, 1], y ∈ [−1, 1], z ∈ [−1, 2]; binary classification with Class 1: z < x² + y³ and Class 2: z ≥ x² + y³.
      • x, y and z are incremented in steps of 0.01, giving a total of 200 × 200 × 300 = 1.2 × 10⁷ points.
      • How many points are needed to reconstruct the decision boundary?
        Training dataset: choose 80, 800, 8000 or 80000 points randomly.
        Test dataset: choose 40, 400, 4000 or 40000 points randomly.
        Boundary dataset (where adversarial examples are likely to be located): x² + y³ − 0.1 < z < x² + y³ + 0.1.

  13. Insufficient training data? Test results (accuracies in %).
      RBF SVMs:
        Training set size   Acc. on own test set   Acc. on the 4 × 10⁴-point test set   Acc. on the boundary set
        80                  100                    92.7                                 60.8
        800                 99.0                   97.4                                 74.9
        8000                99.5                   99.6                                 94.1
        80000               99.9                   99.9                                 98.9
      Linear SVMs:
        Training set size   Acc. on own test set   Acc. on the 4 × 10⁴-point test set   Acc. on the boundary set
        80                  100                    96.3                                 70.1
        800                 99.8                   99.0                                 85.7
        8000                99.9                   99.8                                 97.3
        80000               99.98                  99.98                                99.5
      • 8000 points are only 0.067% of the 1.2 × 10⁷ grid points.
      • MNIST: 28 × 28 8-bit greyscale images, so (2⁸)^(28×28) ≈ 1.1 × 10¹⁸⁸⁸ possible images.
      • 1.1 × 10¹⁸⁸⁸ × 0.067% ≫ 6 × 10⁵. (A rough reconstruction of this experiment follows below.)
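A rough reconstruction of the toy experiment on slides 12-13 (my own code, not the speaker's): sample random points from the same ranges, label them with z < x² + y³, and compare accuracy on a large test set with accuracy near the decision boundary. The subset sizes and kernels follow the slides; exact numbers will differ.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def sample(n):
    # x, y in [-1, 1], z in [-1, 2]; Class 1 iff z < x^2 + y^3.
    x, y = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
    z = rng.uniform(-1, 2, n)
    return np.stack([x, y, z], axis=1), (z < x**2 + y**3).astype(int)

def sample_boundary(n):
    # Points with x^2 + y^3 - 0.1 < z < x^2 + y^3 + 0.1.
    x, y = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
    z = x**2 + y**3 + rng.uniform(-0.1, 0.1, n)
    return np.stack([x, y, z], axis=1), (z < x**2 + y**3).astype(int)

X_test, y_test = sample(40000)
X_bnd, y_bnd = sample_boundary(40000)
for n_train in (80, 800, 8000, 80000):
    X_tr, y_tr = sample(n_train)
    for kernel in ("rbf", "linear"):
        clf = SVC(kernel=kernel).fit(X_tr, y_tr)
        print(kernel, n_train,
              round(clf.score(X_test, y_test), 3),  # accuracy on the large test set
              round(clf.score(X_bnd, y_bnd), 3))    # accuracy near the decision boundary
```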

  14. Why do adversarial examples exist?
      • Viewing a DNN as a stack of linear operations: wᵀx + b.
      Linear explanation: adversarial subspaces span a contiguous multidimensional space.
      • Small changes at individual dimensions can sum up to a significant change in the final output: Σ_{i=1}^{n} w_i(x_i + η_i) = wᵀx + wᵀη (plus the bias b).
      • Adversarial examples can always be found if ε is large enough.
      Goodfellow et al. 2014, 2016. (A small numerical illustration follows below.)
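The linear argument can be checked numerically: a per-dimension change of size ε shifts a linear unit's output by ε · Σ|w_i|, which grows with the input dimension n. A tiny NumPy illustration (my own example, with arbitrary random weights):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 1000, 10000):
    w, x = rng.normal(size=n), rng.normal(size=n)
    eta = (8 / 255) * np.sign(w)   # small, imperceptible per-dimension change
    shift = w @ (x + eta) - w @ x  # equals epsilon * sum(|w_i|)
    print(n, round(shift, 2))      # the shift grows roughly linearly with n
```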

  15. State-of-the-art defense: adversarial training. Training models on adversarial examples: the adversarial attack turns training images into adversarial images, and the DNN classifier (Class 1 vs. Class 2) is trained on them.
      • It explicitly generates more examples to fill the gaps in the input space and thereby improve robustness.

  16. Adversarial training: robust optimization.
      Adversarial training is a min-max optimization process:
        min_θ (1/n) Σ_{i=1}^{n} max_{||x'_i − x_i||_p ≤ ε} L(f_θ(x'_i), y_i)
      where L is the loss, f_θ the model, x_i a clean example, y_i its class label, and x'_i an adversarial example.
      1. Inner maximization (the attacking step): generate adversarial examples by maximizing the loss L. It is a constrained optimization problem: ||x'_i − x_i||_p ≤ ε.
      2. Outer minimization: a typical training process, but on the adversarial examples x'_i generated by the inner maximization.
      On the Convergence and Robustness of Adversarial Training. Wang*, Ma*, et al., ICML 2019. Madry et al., ICLR 2018. (A minimal training-loop sketch follows below.)
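A minimal sketch of this min-max loop in PyTorch, assuming a PGD attack for the inner maximization and a standard optimizer for the outer minimization (an illustration of the objective, not the paper's released code):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """Inner maximization: find x' with ||x' - x||_inf <= epsilon that maximizes the loss."""
    x_adv = x + torch.empty_like(x).uniform_(-epsilon, epsilon)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)  # project back into the epsilon-ball
        x_adv = x_adv.clamp(0.0, 1.0)                     # stay in the valid pixel range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: an ordinary training step, but on adversarial examples."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```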

  17. Misclassification-Aware adveRsarial Training (MART). Improving Adversarial Robustness Requires Revisiting Misclassified Examples. Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma and Quanquan Gu. ICLR 2020.

  18. Misclassification-Aware adveRsarial Training (MART). Adversarial risk, and the revisited adversarial risk that treats correctly-classified and misclassified examples differently.

  19. Misclassification-Aware adveRsarial Training (MART).
      • Surrogate loss functions (existing methods and MART); a rough sketch of a misclassification-aware loss follows below.
      • Semi-supervised extension of MART.
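As a rough sketch of what a misclassification-aware surrogate loss can look like (in the spirit of MART as I read the paper, not its official implementation): a boosted cross-entropy on the adversarial example plus a KL term that is weighted more heavily for examples the model already misclassifies. The λ value and numerical details are assumptions.

```python
import torch
import torch.nn.functional as F

def misclassification_aware_loss(model, x, x_adv, y, lam=5.0):
    probs_nat = F.softmax(model(x), dim=1)
    logits_adv = model(x_adv)
    probs_adv = F.softmax(logits_adv, dim=1)

    # Boosted cross-entropy on the adversarial example.
    other_max = probs_adv.clone().scatter_(1, y.unsqueeze(1), 0.0).max(dim=1).values
    bce = F.cross_entropy(logits_adv, y) - torch.log(1.0001 - other_max).mean()

    # KL(natural || adversarial), emphasised on misclassified examples via (1 - p_y(x)).
    kl = (probs_nat * (torch.log(probs_nat + 1e-12) - torch.log(probs_adv + 1e-12))).sum(dim=1)
    weight = 1.0 - probs_nat.gather(1, y.unsqueeze(1)).squeeze(1)
    return bce + lam * (kl * weight).mean()
```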

  20. Misclassification-Aware adveRsarial Training (MART).
      • White-box robustness: ResNet-18, CIFAR-10, ε = 8/255.
      • White-box robustness: WideResNet-34-10, CIFAR-10, ε = 8/255.

  21. Misclassification-Aware adveRsarial Training (MART).
      • White-box robustness with additional unlabeled data, CIFAR-10, ε = 8/255.

  22. Transferable attack with skip connections. Skip Connections Matter: On the Transferability of Adversarial Examples Generated with ResNets. Dongxian Wu, Yisen Wang, Shu-Tao Xia, James Bailey and Xingjun Ma. ICLR 2020.

  23. Structural weakness of ResNets?
      • Gradient backpropagation with skip connections. Source model: ResNet-18; target model: VGG19; white-box vs. black-box. Backpropagating more of the gradient through the skip connections increases transferability!

  24. Transferable attack with skipped gradients.
      • New attack method: the Skip Gradient Method (SGM), obtained by breaking a network f down into its L residual blocks. ImageNet; target model: Inception V3; ε = 16/255. (A sketch of the idea follows below.)
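The core idea can be sketched as a gradient-scaling trick: keep the forward pass of each residual block unchanged, but scale the gradient flowing through the residual branch by a decay factor γ < 1 so that backpropagation favours the skip connections. A minimal PyTorch sketch (an illustrative wrapper, not the authors' released implementation; γ and the block decomposition are assumptions):

```python
import torch.nn as nn

class SkipGradientBlock(nn.Module):
    """Wraps one residual branch: forward is y = x + f(x), but grad through f is scaled by gamma."""
    def __init__(self, residual_branch: nn.Module, gamma: float = 0.5):
        super().__init__()
        self.residual_branch = residual_branch
        self.gamma = gamma

    def forward(self, x):
        r = self.residual_branch(x)
        # Same forward value as x + r, but the gradient through r is multiplied by gamma.
        r = self.gamma * r + (1.0 - self.gamma) * r.detach()
        return x + r
```

Adversarial examples would then be crafted (e.g., with FGSM or PGD as sketched earlier) on the wrapped source ResNet and transferred to the target model.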

  25. How much can SGM increase transferability? Combined with existing methods: the success rates (%) of attacks crafted on the source model DN201 against 7 target models.

  26. Adversarial camouflage attack. Adversarial Camouflage: Hiding Adversarial Examples with Natural Styles. Ranjie Duan, Xingjun Ma, Yisen Wang, James Bailey, Kai Qin, Yun Yang. CVPR 2020.

  27. Adversarial camouflage: camouflaging adversarial examples with customized styles.

  28. Adversarial camouflage: making large perturbations look natural by combining adversarial attack with style transfer. (A sketch of such a combined objective follows below.)
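A high-level sketch of the kind of combined objective this suggests: an adversarial term that fools the classifier plus style and content terms (computed on features from a frozen network such as a truncated VGG) that make the large perturbation look like a chosen style. The layer choice, loss weights and term definitions here are assumptions for illustration, not the paper's exact recipe.

```python
import torch.nn.functional as F

def gram(feat):
    """Gram matrix of a (B, C, H, W) feature map, the usual style-transfer statistic."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def camouflage_loss(classifier, features, x_adv, x_content, x_style, target_class,
                    w_adv=10.0, w_style=1.0, w_content=1.0):
    # Adversarial term: push the image towards the attacker's target class.
    adv = F.cross_entropy(classifier(x_adv), target_class)
    # Style and content terms on intermediate features of a frozen feature extractor.
    f_adv, f_content, f_style = features(x_adv), features(x_content), features(x_style)
    style = F.mse_loss(gram(f_adv), gram(f_style))
    content = F.mse_loss(f_adv, f_content)
    return w_adv * adv + w_style * style + w_content * content
```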

  29. Adversarial camouflage: a visual comparison to existing attacks.

  30. Adversarial camouflage: examples of camouflaged digital attacks. Revolver → Toilet tissue; Minivan → Traffic light; Scabbard → Purse. Attacking the background is what makes the attack stealthy and ubiquitous.

  31. Adversarial camouflage: examples of camouflaged physical-world attacks. Traffic sign → Barbershop; Tree → Street sign.

  32. Using adversarial camouflage to protect privacy: here is an adversarial Pikachu to protect you! Google Image Search sees it as a dog.

  33. Thank you!

  34. The huge gap between natural accuracy and robustness: 93% vs. 53%! Model: WideResNet-28-10. Dataset: CIFAR-10. Perturbation: ε = 8/255. Attack: 20-step PGD.
