Recent advances in adversarial machine learning: defense, transferable and camouflaged attacks Xingjun Ma School of Computing and Information Systems The University of Melbourne April 2020
Deep learning models are used everywhere: image classification, object detection, speech recognition, autonomous driving, medical diagnosis, playing games. 1
Deep neural networks are vulnerable: small perturbations can fool state-of-the-art ML models. Szegedy et al. 2013, Goodfellow et al. 2014 2
Security risks in medical diagnosis: adding a small attack perturbation to a medical image flips the prediction from "Having disease" to "No disease". Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems. Ma et al., Pattern Recognition, 2020. 3
Security threats to autonomous driving: adversarial traffic signs all recognized as a "45 km speed limit". Evtimov et al. 2017 4
Security risks in speech and NLP systems. Carlini et al. 2018; Ribeiro et al. 2018 5
Security risks in face or object recognition Brown et al. CVPRW, 2018 https://cvdazzle.com/ 6
Research in adversarial machine learning (AML)
Adversarial attacks: 1. White-box: restricted (norm-bounded), semantic, sparse, … 2. Black-box: query-based, transferable 3. Image, audio, video, text 4. Digital vs physical-world
Adversarial defenses: 1. Detection: natural or adversarial? 2. Adversarial training, robust optimization 3. Certifiable robustness 4. Data denoising, filtering 5. Model quantization, compression, pruning 6. Input gradient regularization 7
How adversarial examples are crafted: (1) train a DNN classifier on the training images (Class 1 vs Class 2); (2) feed a test image into the DNN classifier and extract the input gradient; (3) perturb the image along that gradient to produce the adversarial attack. 8
How adversarial examples are crafted
Notation: $D_{train}$: training data; $x_i$: training sample; $y_i$: class label; $\ell$: loss function; $f_\theta$: model.
Model training: $\min_\theta \sum_{(x_i, y_i) \in D_{train}} \ell(f_\theta(x_i), y_i)$
Adversarial attack (test-time): $\max_{x'} \ell(f_\theta(x'), y)$ subject to $\|x' - x\|_\infty \le \epsilon$ for $x \in D_{test}$, i.e. increase the error with only a small change, e.g. $\|x' - x\|_\infty \le \epsilon = 8/255 \approx 0.031$.
• Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014): $x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x \ell(f_\theta(x), y))$, where $x'$ is the adversarial example. 9
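A minimal FGSM sketch in PyTorch, assuming a trained classifier `model` and inputs in $[0, 1]$; the function name and defaults are illustrative, not from the slides:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """One-step FGSM: move x in the direction of the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step so as to increase the loss, then clip back to the valid image range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()
```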
Why do adversarial examples exist?
• Viewing a DNN as a sequence of transformed spaces (visualized at the 1st, 10th and 20th layers).
Non-linear explanation:
– Non-linear transformations lead to the existence of small "pockets" in the deep space:
• Regions of low probability (not naturally occurring).
• Densely scattered regions.
• Continuous regions.
• Close to the normal data subspace.
Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality. Ma et al., ICLR 2018. Szegedy et al. 2013 10
Insufficient training data?
• An illustrative example
– $x \in [-1, 1]$, $y \in [-1, 1]$, $z \in [-1, 2]$
– Binary classification
• Class 1: $z < x^2 + y^3$
• Class 2: $z \ge x^2 + y^3$
– x, y and z are incremented in steps of 0.01, giving a total of $200 \times 200 \times 300 = 1.2 \times 10^7$ points
• How many points are needed to reconstruct the decision boundary? (A code sketch of this experiment follows below.)
– Training dataset: choose 80, 800, 8000, 80000 points randomly
– Test dataset: choose 40, 400, 4000, 40000 points randomly
– Boundary dataset (adversarial samples are likely to be located here): $x^2 + y^3 - 0.1 < z < x^2 + y^3 + 0.1$ 11
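A minimal sketch of this experiment with NumPy and scikit-learn, assuming RBF-kernel SVMs as on the next slide; the random sampling (the boundary set in particular is sampled continuously rather than from the grid) and the evaluation sizes are simplifications:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def label(p):
    """Class 0 if z < x^2 + y^3, class 1 otherwise."""
    return (p[:, 2] >= p[:, 0] ** 2 + p[:, 1] ** 3).astype(int)

def sample_grid(n):
    """Draw n random points from the 0.01-spaced grid over [-1,1) x [-1,1) x [-1,2)."""
    p = np.stack([rng.integers(0, 200, n) * 0.01 - 1,
                  rng.integers(0, 200, n) * 0.01 - 1,
                  rng.integers(0, 300, n) * 0.01 - 1], axis=1)
    return p, label(p)

def sample_boundary(n):
    """Points within 0.1 of the true decision surface, where adversarial examples tend to live."""
    xy = rng.uniform(-1, 1, size=(n, 2))
    z = xy[:, 0] ** 2 + xy[:, 1] ** 3 + rng.uniform(-0.1, 0.1, n)
    p = np.column_stack([xy, z])
    return p, label(p)

X_te, y_te = sample_grid(4 * 10 ** 4)        # common test set
X_bd, y_bd = sample_boundary(4 * 10 ** 4)    # boundary set
for n_train in (80, 800, 8000, 80000):
    X_tr, y_tr = sample_grid(n_train)
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)
    print(n_train, clf.score(X_te, y_te), clf.score(X_bd, y_bd))
```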
Insufficient training data?
• Test results
– RBF SVMs

Size of the training dataset | Accuracy on its own test dataset | Accuracy on the test dataset with $4 \times 10^4$ points | Accuracy on the boundary dataset
80 | 100 | 92.7 | 60.8
800 | 99.0 | 97.4 | 74.9
8000 | 99.5 | 99.6 | 94.1
80000 | 99.9 | 99.9 | 98.9

– Linear SVMs

Size of the training dataset | Accuracy on its own test dataset | Accuracy on the test dataset with $4 \times 10^4$ points | Accuracy on the boundary dataset
80 | 100 | 96.3 | 70.1
800 | 99.8 | 99.0 | 85.7
8000 | 99.9 | 99.8 | 97.3
80000 | 99.98 | 99.98 | 99.5

• 8000 points is only 0.067% of the $1.2 \times 10^7$ grid points
• MNIST: $28 \times 28$ 8-bit greyscale images, so $(2^8)^{28 \times 28} \approx 1.1 \times 10^{1888}$ possible images
• $1.1 \times 10^{1888} \times 0.067\% \gg 6 \times 10^5$ 12
Why do adversarial examples exist?
• Viewing a DNN as a stack of linear operations: $w^\top (x + \eta)$
Linear explanation:
– Adversarial subspaces span a contiguous multidimensional space:
• Small changes at individual dimensions can sum up to a significant change in the final output: $w^\top (x + \eta) = \sum_{i=1}^{n} w_i (x_i + \eta_i)$. Adversarial examples can always be found if the input dimension $n$ is large enough.
• Goodfellow et al. 2014, 2016 13
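A small NumPy illustration of this argument (my own example, not from the slides): with $\eta = \epsilon \cdot \mathrm{sign}(w)$, a per-dimension change of only $\epsilon$ shifts a linear unit's output by $\epsilon \|w\|_1$, which grows with the input dimension $n$:

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 8 / 255  # tiny per-dimension perturbation

for n in (100, 10_000, 1_000_000):   # input dimensionality
    w = rng.normal(size=n)           # weights of one linear unit
    x = rng.uniform(size=n)          # a "clean" input
    eta = epsilon * np.sign(w)       # worst-case perturbation with |eta_i| <= epsilon
    shift = w @ (x + eta) - w @ x    # change in the unit's output
    print(n, round(shift, 2), round(epsilon * np.abs(w).sum(), 2))  # shift equals eps * ||w||_1
```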
State-of-the-art defense: adversarial training. Training models on adversarial examples: an adversarial attack crafts adversarial images from the training images, and the DNN classifier (Class 1 vs Class 2) is then trained on them. • It explicitly generates more examples to fill the gaps in the input space and improve robustness. 14
Adversarial training: robust optimization
Adversarial training is a min-max optimization process:
$\min_\theta \frac{1}{n} \sum_{i=1}^{n} \max_{\|x_i' - x_i\|_\infty \le \epsilon} \ell(f_\theta(x_i'), y_i)$
$\ell$: loss, $f_\theta$: model, $x_i$: clean example, $y_i$: class, $x_i'$: adversarial example.
1. Inner maximization (attacking): generates adversarial examples by maximizing the loss $\ell$.
– It is a constrained optimization problem: $\|x_i' - x_i\|_\infty \le \epsilon$.
2. Outer minimization: a typical model-training process, but on the adversarial examples $x_i'$ generated by the inner maximization.
On the Convergence and Robustness of Adversarial Training. Wang*, Ma*, et al., ICML 2019. Madry et al., ICLR 2018. 15
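A compact PyTorch sketch of this min-max loop, using a PGD inner maximizer in the spirit of Madry et al.; `model`, `loader`, `optimizer` and the hyperparameter values are placeholders:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Inner maximization: projected gradient ascent on the loss within the eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project back into the ball
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer):
    """Outer minimization: standard training, but on the adversarial examples."""
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```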
Misclassification-Aware adveRsarial Training (MART) Improving Adversarial Robustness Requires Revisiting Misclassified Examples Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma and Quanquan Gu ICLR 2020. 16
Misclassification-Aware adveRsarial Training (MART) Adversarial risk: Revisited adversarial risk (correctly- vs mis-classified): 17
Misclassification-Aware adveRsarial Training (MART) • Surrogate loss functions (existing methods and MART) • Semi-supervised extension of MART: 18
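A sketch of a MART-style surrogate loss in PyTorch, written from my reading of the paper: a boosted cross-entropy on the adversarial examples plus a KL term weighted by how far the clean example is from being confidently correct. Treat the exact form and the default `beta` as assumptions rather than the official implementation:

```python
import torch
import torch.nn.functional as F

def mart_loss(model, x, x_adv, y, beta=6.0):
    """MART-style loss (sketch): boosted CE on adversarial examples plus a
    KL regularizer that is up-weighted for likely-misclassified clean examples."""
    logits_adv, logits_clean = model(x_adv), model(x)
    probs_adv = F.softmax(logits_adv, dim=1)
    probs_clean = F.softmax(logits_clean, dim=1)

    # Boosted CE: -log p_y(x') - log(1 - max_{k != y} p_k(x')).
    runner_up = probs_adv.scatter(1, y.unsqueeze(1), 0.0).max(dim=1).values
    bce = F.cross_entropy(logits_adv, y) - torch.log(1.0001 - runner_up).mean()

    # KL(p(x) || p(x')), weighted by (1 - p_y(x)): misclassified examples count more.
    kl = (probs_clean * (probs_clean.add(1e-12).log() - probs_adv.add(1e-12).log())).sum(dim=1)
    p_y_clean = probs_clean.gather(1, y.unsqueeze(1)).squeeze(1)
    return bce + beta * (kl * (1.0 - p_y_clean)).mean()
```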
Misclassification-Aware adveRsarial Training (MART) • White-box robustness: ResNet-18, CIFAR-10, $\epsilon = 8/255$ • White-box robustness: WideResNet-34-10, CIFAR-10, $\epsilon = 8/255$ 19
Misclassification-Aware adveRsarial Training (MART) • White-box robustness: unlabeled data, CIFAR-10, $\epsilon = 8/255$ 20
Transferable attack with skip connections Skip Connections Matter: on the Transferability of Adversarial Examples Generated with ResNets Dongxian Wu, Yisen Wang, Shu-Tao Xia, James Bailey and Xingjun Ma. ICLR 2020. 21
Structural weakness of ResNets? • Gradient backpropagation with skip connections. Source: ResNet-18, Target: VGG19, white/black-box. Skipping the gradients increases transferability! 22
Transferable attack with skipped gradients • New attack method: Skip Gradient Method (SGM). Breaking down a network $f$ according to its $L$ residual blocks. ImageNet, target: Inception V3, $\epsilon = 16/255$ 23
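A toy PyTorch rendering of the skip-gradient idea, a sketch rather than the paper's code: during backpropagation, the gradient flowing through each residual branch is multiplied by a decay factor gamma < 1, so relatively more gradient passes through the skip connections:

```python
import torch
import torch.nn as nn

class GradScale(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by gamma in the backward pass."""
    @staticmethod
    def forward(ctx, x, gamma):
        ctx.gamma = gamma
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return ctx.gamma * grad_output, None

class ToyResidualBlock(nn.Module):
    """y = x + f(x), with the gradient through the residual branch f decayed by gamma."""
    def __init__(self, dim, gamma=0.5):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.gamma = gamma

    def forward(self, x):
        return x + GradScale.apply(self.f(x), self.gamma)
```

Adversarial examples are then crafted on this modified backward pass of the source model (e.g. with FGSM or PGD as above) and transferred to the target model.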
How much can SGM increase transferability? Combined with existing methods: the success rates (%) of attacks crafted on source model DN201 against 7 target models. 24
Adversarial camouflage attack Adversarial Camouflage: Hiding Adversarial Examples with Natural Styles Ranjie Duan, Xingjun Ma , Yisen Wang, James Bailey, Kai Qin, Yun Yang CVPR 2020. 25
Adversarial camouflage Camouflage adversarial examples with customized styles. 26
Adversarial camouflage Making large perturbations look natural: Adversarial attack + style transfer 27
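Concretely, the camouflaged example can be viewed as minimizing a joint objective; the split below into style, content and adversarial terms with a weight $\lambda_{adv}$ is a schematic reading of the slide's one-line description, and the exact formulation is in the CVPR 2020 paper:

$$\min_{x'} \; \mathcal{L}_{style}(x', x_{style}) + \mathcal{L}_{content}(x', x) + \lambda_{adv} \, \mathcal{L}_{adv}(f_\theta(x'), y)$$

The adversarial term pushes the model's prediction away from the true label (or towards a chosen target), while the style and content terms keep the large perturbation looking like a naturally styled version of the original image.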
Adversarial camouflage A visual comparison to existing attacks 28
Adversarial camouflage Revolver --> Toilet tissue Minivan --> Traffic light Scabbard --> Purse Attacking the background is what makes the attack stealthy and ubiquitous. Examples of camouflaged digital attacks 29
Adversarial camouflage Traffic sign -> Barbershop Tree -> Street sign Examples of camouflaged physical-world attacks 30
Using adversarial camouflage to protect privacy Here is an adversarial pikachu to protect you! This is a dog to Google Image Search. 31
Thank you! 32
The huge gap between natural accuracy and robustness: 93% vs 53%! Model: WideResNet-28-10, Dataset: CIFAR-10, Perturbation: $\epsilon = 8/255$, Attack: 20-step PGD 33