

SLIDE 1

Recent advances in adversarial machine learning: defense, transferable and camouflaged attacks

Xingjun Ma, School of Computing and Information Systems, The University of Melbourne. April 2020.

SLIDE 2

Deep learning models are used everywhere

Deep learning powers image classification, object detection, medical diagnosis, autonomous driving, speech recognition, and game playing.

SLIDE 3

Deep neural networks are vulnerable

Small perturbations can fool state-of-the-art ML models.

Szegedy et al. 2013; Goodfellow et al. 2014

SLIDE 4

Security risks in medical diagnosis

[Figure: a "no disease" image plus an Ξ΅-scaled adversarial perturbation is diagnosed as "having disease".]

Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems. Ma et al., Pattern Recognition, 2020.

SLIDE 5

Security threats to autonomous driving

Adversarial traffic signs, all recognized as a 45 speed-limit sign.

Evtimov et al. 2017

SLIDE 6

Security risks in speech and NLP systems

Carlini et al. 2018; Ribeiro et al. 2018

SLIDE 7

Security risks in face or object recognition

Brown et al. CVPRW, 2018; https://cvdazzle.com/

SLIDE 8

Research in adversarial machine learning

Adversarial machine learning (AML) splits into two strands: adversarial attacks and adversarial defenses.

Adversarial attacks:

  β€’ 1. White-box: restricted (norm-bounded), semantic, sparse, …
  β€’ 2. Black-box: query-based, transferable
  β€’ 3. Modality: image, audio, video, text
  β€’ 4. Digital vs. physical-world

Adversarial defenses:

  β€’ 1. Detection: natural or adversarial?
  β€’ 2. Adversarial training, robust optimization
  β€’ 3. Certifiable robustness
  β€’ 4. Data denoising, filtering
  β€’ 5. Model quantization, compression, pruning
  β€’ 6. Input gradient regularization

SLIDE 9

How adversarial examples are crafted

Train a model: a DNN classifier is trained on the training images (class 1, class 2).

Adversarial attack: (1) take a test image, (2) extract the input gradient, (3) perturb the image and feed it into the DNN classifier.

SLIDE 10

How adversarial examples are crafted

Model training:

$\min_\theta \sum_{(x_i, y_i) \in D_{\mathrm{train}}} \ell(f_\theta(x_i), y_i)$

Adversarial attack (a test-time attack: small change, increased error):

$\max_{x'} \ell(f_\theta(x'), y)$ subject to $\|x' - x\|_q \le \epsilon$, for $x \in D_{\mathrm{test}}$

e.g. $\|x' - x\|_\infty \le \epsilon = 8/255 \approx 0.031$

  β€’ Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014):

$x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x \ell(f_\theta(x), y))$

Notation: $D_{\mathrm{train}}$: training data; $x_i$: training sample; $y_i$: class label; $\ell$: loss function; $f_\theta$: model; $x'$: adversarial example.
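To make the FGSM update concrete, here is a minimal PyTorch sketch under common assumptions (untargeted attack, pixel values in [0, 1]); an illustration, not the talk's code:

```python
# Minimal FGSM sketch: x' = x + eps * sign(grad_x loss).
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # the loss l(f_theta(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()       # one gradient-sign step
    return x_adv.clamp(0, 1).detach()     # keep pixels in the valid range
```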

SLIDE 11

Why do adversarial examples exist?

  β€’ Viewing a DNN as a sequence of transformed spaces (visualized at the 1st, 10th and 20th layers).

Non-linear explanation: non-linear transformations lead to the existence of small "pockets" in the deep space:

  β€’ Regions of low probability (not naturally occurring).
  β€’ Densely scattered regions.
  β€’ Continuous regions.
  β€’ Close to the normal data subspace.

Szegedy et al. 2013. Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality. Ma et al., ICLR 2018.

SLIDE 12

Insufficient training data?

  β€’ An illustrative example:

– $x \in [-1, 1]$, $y \in [-1, 1]$, $z \in [-1, 2]$
– Binary classification:
  β€’ Class 1: $z < x^2 + y^3$
  β€’ Class 2: $z \ge x^2 + y^3$
– x, y and z are enumerated in steps of 0.01 β†’ a total of 200Γ—200Γ—300 = 1.2Γ—10⁷ points.

  β€’ How many points are needed to reconstruct the decision boundary?

– Training datasets: 80, 800, 8000, 80000 points chosen randomly.
– Test datasets: 40, 400, 4000, 40000 points chosen randomly.
– Boundary dataset (where adversarial examples are likely to be located): $x^2 + y^3 - 0.1 < z < x^2 + y^3 + 0.1$.

SLIDE 13

Insufficient training data?

  β€’ Test results:

RBF SVMs:

Training set size | Accuracy on its own test set (%) | Accuracy on the 4Γ—10⁴-point test set (%) | Accuracy on the boundary dataset (%)
80 | 100 | 92.7 | 60.8
800 | 99.0 | 97.4 | 74.9
8000 | 99.5 | 99.6 | 94.1
80000 | 99.9 | 99.9 | 98.9

Linear SVMs:

Training set size | Accuracy on its own test set (%) | Accuracy on the 4Γ—10⁴-point test set (%) | Accuracy on the boundary dataset (%)
80 | 100 | 96.3 | 70.1
800 | 99.8 | 99.0 | 85.7
8000 | 99.9 | 99.8 | 97.3
80000 | 99.98 | 99.98 | 99.5

  β€’ Reconstructing the boundary well took 8000 points: 0.067% of the 1.2Γ—10⁷-point space.
  β€’ MNIST: 28Γ—28 8-bit greyscale images β†’ an input space of (2⁸)^(28Γ—28) β‰ˆ 1.1Γ—10^1888 points.
  β€’ 1.1Γ—10^1888 Γ— 0.067% ≫ 6Γ—10⁴ (the size of the MNIST training set): covering the decision boundary at the same density is hopeless, so real training data is always insufficient.
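The toy experiment above is easy to reproduce. Below is a rough sketch assuming scikit-learn; the random sampling and kernel settings are my guesses, not the talk's exact setup:

```python
# Sketch of the boundary-reconstruction experiment (settings are assumptions).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def sample(n):
    """Sample n labeled points from [-1,1] x [-1,1] x [-1,2]."""
    pts = np.column_stack([rng.uniform(-1, 1, n),
                           rng.uniform(-1, 1, n),
                           rng.uniform(-1, 2, n)])
    labels = (pts[:, 2] >= pts[:, 0]**2 + pts[:, 1]**3).astype(int)
    return pts, labels

for n_train in [80, 800, 8000, 80000]:
    X_tr, y_tr = sample(n_train)
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)   # kernel="linear" for the linear SVM
    X_te, y_te = sample(40000)
    # Boundary dataset: points within +/- 0.1 of the true decision surface
    X_b, y_b = sample(400000)
    near = np.abs(X_b[:, 2] - (X_b[:, 0]**2 + X_b[:, 1]**3)) < 0.1
    print(n_train, clf.score(X_te, y_te), clf.score(X_b[near], y_b[near]))
```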

SLIDE 14

Why do adversarial examples exist?

  β€’ Viewing a DNN as a stack of linear operations: $w^\top x + b$.

Linear explanation (Goodfellow et al. 2014, 2016):

  β€’ Adversarial subspaces span a contiguous multidimensional space.
  β€’ Small changes at individual dimensions can sum up to a significant change in the final output: $\sum_{j=1}^{n} w_j (x_j + \epsilon)$.
  β€’ Adversarial examples can always be found if $\epsilon$ is large enough.

SLIDE 15

State-of-the-art defense: adversarial training

Training models on adversarial examples: (1) craft adversarial images from the training images (class 1, class 2), (2) train the DNN classifier on them.

  β€’ It explicitly generates more examples to fill the gaps in the input space and improve robustness.

SLIDE 16

Adversarial training: robust optimization

Adversarial training is a min-max optimization process:

$\min_\theta \frac{1}{n} \sum_{i=1}^{n} \max_{\|x_i' - x_i\|_q \le \epsilon} \ell(f_\theta(x_i'), y_i)$

$\ell$: loss; $f_\theta$: model; $x_i$: clean example; $y_i$: class label; $x_i'$: adversarial example.

  β€’ 1. Inner maximization (attacking):

β€’ Generates adversarial examples by maximizing the loss $\ell$.
β€’ A constrained optimization problem: $\|x_i' - x_i\|_q \le \epsilon$.

  β€’ 2. Outer minimization:

β€’ The usual training procedure, but on the adversarial examples $x_i'$ generated by the inner maximization. (A sketch of the full loop follows below.)

On the Convergence and Robustness of Adversarial Training. Wang*, Ma*, et al., ICML 2019. Madry et al., ICLR 2018.
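A minimal sketch of this min-max loop in PyTorch, assuming PGD as the inner maximizer (the usual choice following Madry et al.); the step size and step count are common defaults, not the talk's:

```python
# PGD adversarial training sketch: inner maximization + outer minimization.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Inner maximization: projected gradient ascent on the loss."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)    # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)  # project into the eps-ball
        x_adv = x_adv.clamp(0, 1)                          # keep pixels valid
    return x_adv.detach()

def train_step(model, optimizer, x, y):
    """Outer minimization: a standard training step on adversarial examples."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```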

SLIDE 17

Misclassification-Aware adveRsarial Training (MART)

Improving Adversarial Robustness Requires Revisiting Misclassified Examples. Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma and Quanquan Gu. ICLR 2020.

SLIDE 18

Misclassification-Aware adveRsarial Training (MART)

Adversarial risk vs. the revisited adversarial risk, which treats correctly-classified and misclassified examples differently.

SLIDE 19

Misclassification-Aware adveRsarial Training (MART)

  β€’ Surrogate loss functions (existing methods and MART); a hedged sketch of the MART loss follows after this list.
  β€’ Semi-supervised extension of MART.
slide-20
SLIDE 20

Misclassification-Aware adveRsarial Training (MART)

  β€’ White-box robustness: ResNet-18, CIFAR-10, Ξ΅ = 8/255.
  β€’ White-box robustness: WideResNet-34-10, CIFAR-10, Ξ΅ = 8/255.
SLIDE 21

Misclassification-Aware adveRsarial Training (MART)

  β€’ White-box robustness with unlabeled data: CIFAR-10, Ξ΅ = 8/255.
SLIDE 22

Transferable attack with skip connections

Skip Connections Matter: On the Transferability of Adversarial Examples Generated with ResNets. Dongxian Wu, Yisen Wang, Shu-Tao Xia, James Bailey and Xingjun Ma. ICLR 2020.

SLIDE 23

Structural weakness of ResNets?

  β€’ Gradient backpropagation with skip connections: skipping the gradients (routing them through the skip connections rather than the residual modules) increases transferability!

Experiment: adversarial examples crafted on a ResNet-18 source model, transferred to a VGG19 target; white-box vs. black-box success rates.

SLIDE 24

Transferable attack with skipped gradients

  β€’ New attack method: the Skip Gradient Method (SGM), which breaks down a network f according to its L residual blocks. (Experiments: ImageNet; target: Inception V3; Ξ΅ = 16/255. A sketch of the idea follows below.)
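A hypothetical PyTorch sketch of the idea: during backpropagation, decay the gradient flowing through each residual branch by a factor Ξ³ < 1, so the attack relies more on the gradient carried by the skip connections. The hook placement below assumes torchvision's BasicBlock ResNets (resnet18/34; Bottleneck nets would need `conv3`); it approximates SGM and is not the authors' implementation:

```python
# Approximate skip-gradient decay via backward hooks on residual branches.
import torchvision

def add_skip_gradient_decay(resnet, gamma=0.5):
    """Scale gradients through residual (conv) branches by gamma < 1."""
    def scale_grad(module, grad_input, grad_output):
        return tuple(g * gamma if g is not None else None for g in grad_input)
    for name, module in resnet.named_modules():
        if name.endswith("conv2"):  # last conv of a BasicBlock's residual branch
            module.register_full_backward_hook(scale_grad)

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
add_skip_gradient_decay(model, gamma=0.5)
# Now craft adversarial examples on `model` with FGSM/PGD as sketched earlier;
# per the paper, they transfer better to other architectures.
```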

SLIDE 25

How much can SGM increase transferability?

Combined with existing methods: the success rates (%) of attacks crafted on the source model DN201 against 7 target models.

SLIDE 26

Adversarial camouflage attack

Adversarial Camouflage: Hiding Adversarial Examples with Natural Styles. Ranjie Duan, Xingjun Ma, Yisen Wang, James Bailey, Kai Qin, Yun Yang. CVPR 2020.

SLIDE 27

Adversarial camouflage

Camouflage adversarial examples with customized styles.

SLIDE 28

Adversarial camouflage

Making large perturbations look natural: adversarial attack + style transfer. (A hedged sketch of such a combined objective follows below.)
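To make the "attack + style transfer" combination concrete, here is an illustrative objective mixing an adversarial term with standard style-transfer terms (style, content, smoothness). The weights, the `feat_extractor` helper (assumed to return a list of feature maps), and the exact terms are assumptions, not the paper's formulation:

```python
# Illustrative camouflaged-attack objective (details are assumptions).
import torch
import torch.nn.functional as F

def camouflage_loss(model, x_adv, x_content, y_target,
                    style_feats, feat_extractor,
                    w_adv=10.0, w_style=1.0, w_content=1.0, w_smooth=1e-3):
    # Adversarial term: push the prediction toward the target class
    adv = F.cross_entropy(model(x_adv), y_target)

    # Style term: match Gram matrices of a reference style image's features
    def gram(f):
        b, c, h, w = f.shape
        f = f.reshape(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)
    feats = feat_extractor(x_adv)
    style = sum(F.mse_loss(gram(f), gram(s)) for f, s in zip(feats, style_feats))

    # Content term: stay close to the original image's deep features
    content = F.mse_loss(feats[-1], feat_extractor(x_content)[-1].detach())

    # Smoothness term: total variation keeps the perturbation natural-looking
    tv = (x_adv[..., 1:, :] - x_adv[..., :-1, :]).abs().mean() + \
         (x_adv[..., :, 1:] - x_adv[..., :, :-1]).abs().mean()

    return w_adv * adv + w_style * style + w_content * content + w_smooth * tv
```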

SLIDE 29

Adversarial camouflage

A visual comparison to existing attacks.

SLIDE 30

Adversarial camouflage

Examples of camouflaged digital attacks: revolver β†’ toilet tissue; minivan β†’ traffic light; scabbard β†’ purse. Attacking the background is what makes the attack stealthy and ubiquitous.

SLIDE 31

Adversarial camouflage

Examples of camouflaged physical-world attacks: traffic sign β†’ barbershop; tree β†’ street sign.

SLIDE 32

Using adversarial camouflage to protect privacy

Here is an adversarial Pikachu to protect you! To Google Image Search, this is a dog.

SLIDE 33

Thank you!


SLIDE 34

The huge gap between natural accuracy and robustness

93% natural accuracy vs. 53% robust accuracy!

Model: WideResNet-28-10. Dataset: CIFAR-10. Perturbation: Ξ΅ = 8/255. Attack: 20-step PGD.
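For reference, a sketch of how such a gap is typically measured (my example): natural accuracy on clean test data vs. accuracy under a 20-step PGD attack, reusing the `pgd_attack` function sketched in the adversarial training section above:

```python
# Measure natural vs. robust accuracy with a 20-step PGD attack.
import torch

@torch.no_grad()
def accuracy(model, loader):
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total

def robust_accuracy(model, loader, eps=8 / 255, alpha=2 / 255, steps=20):
    correct = total = 0
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps=eps, alpha=alpha, steps=steps)
        with torch.no_grad():
            correct += (model(x_adv).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total
```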