Feature Denoising for Improving Adversarial Robustness
Cihang Xie, Johns Hopkins University
● Background ● Towards Robust Adversarial Defense
Deep networks are Good. [Clean image → Label: King Penguin]
Deep networks are FRAGILE to small & carefully crafted perturbations. [Clean image → Label: King Penguin; perturbed image → Label: Chihuahua]
Deep networks are FRAGILE to small & carefully crafted perturbations. We call such images Adversarial Examples.
Adversarial Examples can exist on Different Tasks: semantic segmentation, pose estimation, text classification.
[1] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. "Adversarial examples for semantic segmentation and object detection." In ICCV, 2017.
[2] Moustapha Cisse, Yossi Adi, Natalia Neverova, and Joseph Keshet. "Houdini: Fooling deep structured prediction models." In NeurIPS, 2018.
[3] Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. "HotFlip: White-Box Adversarial Examples for Text Classification." In ACL, 2018.
Adversarial Examples can be created by means other than Adding Perturbations (e.g., spatially transformed adversarial examples). [Figure: object detection results on clean vs. adversarial images]
[4] Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song. "Spatially transformed adversarial examples." In ICLR, 2018.
[5] Jianyu Wang, Zhishuai Zhang, Cihang Xie, et al. "Visual concepts and compositional voting." In Annals of Mathematical Sciences and Applications, 2018.
Adversarial Examples can exist in the Physical World.
[6] Lifeng Huang, Chengying Gao, Yuyin Zhou, Changqing Zou, Cihang Xie, Alan Yuille, and Ning Liu. "UPC: Learning Universal Physical Camouflage Attacks on Object Detectors." arXiv, 2019.
Generating an Adversarial Example is SIMPLE:
● maximize_r loss(f(x + r), y_true; θ) — maximize the loss function w.r.t. the adversarial perturbation r
● minimize_θ loss(f(x), y_true; θ) — minimize the loss function w.r.t. the network parameters θ
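A minimal PGD-style sketch of this max step, for illustration only; the model, epsilon, step size, and iteration count are assumed values, not the exact settings from the talk.

```python
# Sketch: maximize loss(f(x + r), y_true; theta) w.r.t. the perturbation r
# under an L-infinity bound. Hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def generate_adversarial(model, x, y_true, epsilon=16 / 255, step=1 / 255, iters=10):
    r = torch.zeros_like(x).uniform_(-epsilon, epsilon)   # random start inside the epsilon-ball
    for _ in range(iters):
        r.requires_grad_(True)
        loss = F.cross_entropy(model(x + r), y_true)      # loss(f(x + r), y_true; theta)
        grad, = torch.autograd.grad(loss, r)
        with torch.no_grad():
            r = r + step * grad.sign()                    # gradient *ascent* on the loss
            r = r.clamp(-epsilon, epsilon)                # keep the perturbation small
            r = (x + r).clamp(0, 1) - x                   # keep x + r a valid image
    return (x + r).detach()
```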
● Background ● Towards Robust Adversarial Defense
Observation: Adversarial perturbations are SMALL in the pixel space. [Figure: clean vs. adversarial images and their feature maps]
Observation: Adversarial perturbations are BIG in the feature space. [Figure: feature maps of clean vs. adversarial images]
We should DENOISE these feature maps.
Our Solution: Denoising at the feature level
Traditional Image Denoising Operations:
● Local filters (predefine a local region Ω_i for each pixel i):
  ● Bilateral filter: y_i = (1 / C(x)) · Σ_{∀j ∈ Ω_i} f(x_i, x_j) · x_j
  ● Median filter: y_i = median{∀j ∈ Ω_i : x_j}
  ● Mean filter: y_i = (1 / |Ω_i|) · Σ_{∀j ∈ Ω_i} x_j
● Non-local filters (the local region Ω_i is the whole image I):
  ● Non-local means: y_i = (1 / C(x)) · Σ_{∀j ∈ I} f(x_i, x_j) · x_j
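A rough PyTorch sketch of two of these operations applied to a feature map; shapes and names are assumptions for illustration, not the authors' reference implementation.

```python
# x is a feature map of shape (N, C, H, W).
import torch
import torch.nn.functional as F

def mean_filter(x, k=3):
    # local mean: y_i = (1 / |Omega_i|) * sum of x_j over the k x k neighborhood
    return F.avg_pool2d(x, kernel_size=k, stride=1, padding=k // 2)

def nonlocal_means(x):
    # non-local means: y_i = (1 / C(x)) * sum over the whole map of f(x_i, x_j) * x_j,
    # here with the dot-product affinity f(x_i, x_j) = x_i . x_j; a Gaussian variant
    # would apply a softmax over the affinities instead of dividing by H*W.
    n, c, h, w = x.shape
    flat = x.view(n, c, h * w)                            # (N, C, HW)
    affinity = torch.bmm(flat.transpose(1, 2), flat)      # (N, HW, HW)
    y = torch.bmm(flat, affinity.transpose(1, 2))         # weighted sum over all positions
    return (y / (h * w)).view(n, c, h, w)                 # normalize by C(x) = H*W
```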
Denoising Block Design: denoising operation → 1×1 conv, wrapped by a residual connection.
● Denoising operations may lose information, so we add a residual connection to balance the tradeoff between removing noise and retaining the original signal.
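A sketch of such a block, assuming the nonlocal_means operation from the previous sketch; the channel count and wiring details are illustrative, not the talk's exact architecture.

```python
import torch.nn as nn

class DenoisingBlock(nn.Module):
    def __init__(self, channels, denoise_op):
        super().__init__()
        self.denoise_op = denoise_op                              # e.g. nonlocal_means or mean_filter
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)  # 1x1 conv after denoising

    def forward(self, x):
        y = self.denoise_op(x)   # suppress noise in the feature map
        y = self.conv(y)         # learnable 1x1 projection
        return x + y             # residual connection: retain the original signal

# Usage (assumed): block = DenoisingBlock(channels=256, denoise_op=nonlocal_means)
```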
Training Strategy: Adversarial Training
● Core Idea: train with adversarial examples
● min_θ max_r loss(f(x + r), y_true; θ)
  ● max step: generate the adversarial perturbation
  ● min step: optimize the network parameters
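A minimal sketch of this min-max loop, reusing generate_adversarial() from the earlier sketch; the model, optimizer, and data loader are assumed.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer):
    model.train()
    for x, y_true in loader:
        # max step: craft adversarial examples against the current parameters theta
        x_adv = generate_adversarial(model, x, y_true)
        # min step: update theta to lower the loss on those adversarial examples
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y_true)
        loss.backward()
        optimizer.step()
```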
Two Ways for Evaluating Robustness
● Defending Against White-box Attacks
  ● Attackers know everything about the model
  ● Directly maximize loss(f(x + r), y_true; θ)
● Defending Against Blind Attacks
  ● Attackers know nothing about the model
  ● Attackers generate adversarial examples using substitute networks (relying on transferability)
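A sketch of the blind (black-box transfer) setting: adversarial examples are crafted on an assumed substitute network and only then shown to the defended model, relying purely on transferability. It reuses generate_adversarial() from the earlier sketch.

```python
import torch

def transfer_attack_accuracy(defended_model, substitute_model, loader):
    correct = total = 0
    for x, y_true in loader:
        # the attacker never queries the defended model
        x_adv = generate_adversarial(substitute_model, x, y_true)
        with torch.no_grad():
            pred = defended_model(x_adv).argmax(dim=1)
        correct += (pred == y_true).sum().item()
        total += y_true.numel()
    return correct / total
```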
Defending Against White-box Attacks
● Evaluating against adversarial attackers with attack iterations up to 2000 (more attack iterations indicate stronger attacks)
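The evaluation protocol can be sketched as a loop over attack strengths; the iteration counts below mirror the plots that follow, while the model and loader are assumed and generate_adversarial() is the earlier sketch.

```python
import torch

def accuracy_vs_attack_iterations(model, loader, iteration_counts=(10, 100, 2000)):
    results = {}
    for iters in iteration_counts:
        correct = total = 0
        for x, y_true in loader:
            x_adv = generate_adversarial(model, x, y_true, iters=iters)
            with torch.no_grad():
                pred = model(x_adv).argmax(dim=1)
            correct += (pred == y_true).sum().item()
            total += y_true.numel()
        results[iters] = 100.0 * correct / total  # accuracy (%) at this attack strength
    return results
```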
Defending Against White-box Attacks – Part I
[Plot: accuracy (%) vs. attack iterations (10 to 2000). Curves: ALP (Inception-v3) and ours (R-152 baseline).]
A successful adversarial training can give us a STRONG baseline: the R-152 baseline degrades only from 41.7% to 38.9% accuracy as the PGD attack grows to 2000 iterations, whereas ALP ends at 27.9%.
Defending Against White-box Attacks – Part I
[Plot adds: ours, R-152 denoise.]
Feature Denoising can give us additional benefits: the R-152 denoise model degrades only from 45.5% to 42.6% under the 2000-iteration PGD attack, staying consistently above the baseline.
Defending Against White-box Attacks – Part II
[Plot: accuracy (%) vs. attack iterations (10 to 100). Curves: ResNet-152 baseline; +4 bottleneck (ResNet-164); +4 denoise variants: null (1×1 only), 3×3 mean, 3×3 median, bilateral (dot product / Gaussian), non-local (dot product / Gaussian).]
All denoising operations can help: every denoising variant improves over the ResNet-152 baseline (52.5% at 10 iterations, 41.7% at 100), with the best variant reaching 55.7% and 45.5% respectively.
Defending Against White-box Attacks – Part III
[Plot: accuracy (%) vs. attack iterations (10 to 100). Curves: ResNet-152, ResNet-152 denoise, ResNet-638.]
Feature Denoising is nearly as powerful as adding ~500 additional layers: ResNet-152 denoise (55.7% at 10 iterations, 45.5% at 100) is close to ResNet-638 (57.3%, 46.1%), both well above the ResNet-152 baseline (52.5%, 41.7%).
Defending Against White-box Attacks – Part III
[Plot adds: ResNet-638 denoise.]
Feature Denoising can still provide benefits for the VERY deep ResNet-638: ResNet-638 denoise reaches 61.3% at 10 iterations and 49.9% at 100, vs. 57.3% and 46.1% without denoising.
Defending Against Blind Attacks
● Offline evaluation against the 5 BEST attackers from the NeurIPS 2017 Adversarial Competition
● Online competition against 48 UNKNOWN attackers in CAAD 2018
● CAAD 2018 "all or nothing" criterion: an image is considered correctly classified only if the model correctly classifies all adversarial versions of this image created by all attackers
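A sketch of the all-or-nothing criterion, assuming a hypothetical data layout in which versions_per_image maps each image id to a list of (x_adv, y_true) pairs, one per attacker.

```python
import torch

@torch.no_grad()
def all_or_nothing_accuracy(model, versions_per_image):
    num_correct = 0
    for versions in versions_per_image.values():
        # the image counts only if *every* attacker's version is classified correctly
        ok = all(model(x_adv.unsqueeze(0)).argmax(dim=1).item() == y_true
                 for x_adv, y_true in versions)
        num_correct += int(ok)
    return num_correct / len(versions_per_image)
```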
Defending Against Blind Attacks — CAAD 2017 Offline Evaluation
Defending Against Blind Attacks — CAAD 2018 Online Competition
[Bar chart: accuracy (%) under the all-or-nothing criterion for the top-5 teams]
1st: 50.6    2nd: 40.8    3rd: 8.6    4th: 3.6    5th: 0.6
Our defense placed 1st with 50.6% accuracy.
Visualization
[Figure: feature maps of adversarial examples before and after the denoising operations]
There is still a long way to go in defending against adversarial attacks…
Questions?