Toward Adversarial Robustness by Diversity in an Ensemble of Specialized Deep Neural Networks presented at Canadian AI 2020 Mahdieh Abbasi 1 , Arezoo Rajabi 2 , Christian Gagné 1,3 , and Rakesh B. Bobba 2 1. IID, Université Laval, Québec, Canada 2. Oregon State University, Corvallis, USA 3. Mila, Canada CIFAR AI Chair
Adversarial attacks aim at fooling a model through imperceptible modifications. An ensemble of specialists trained on diverse subsets of classes rejects samples with low ensemble classification certainty. Legitimate sample or adversarial instance?
Adversarial example
Adding the right imperceptible perturbation to a clean sample creates an adversarial example such that the NN is fooled into misclassifying it confidently: P(C = panda | x) > 0.9 for the clean sample x, while P(C = gibbon | x') > 0.9 for the adversarial example x' = x + ε.
Examples of attacks
→ Fast Gradient Sign (FGS): gradient ascent to maximize the loss of the neural network (NN); fast, one step; non-optimal adversaries
→ Targeted FGS: gradient descent to minimize the loss of the NN toward a target class; fast, one step; non-optimal adversaries
→ DeepFool: projects the sample onto the nearest decision boundary; moderate, iterative; better adversaries
→ Carlini & Wagner (CW): directly optimizes an objective function; slow, iterative; optimal adversaries
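As an illustration, here is a minimal sketch of the untargeted FGS attack in PyTorch; the classifier `model`, the epsilon value, and the [0, 1] pixel range are assumptions, not details from the slides.

import torch
import torch.nn.functional as F

def fgs_attack(model, x, y, epsilon=0.1):
    # One gradient-ascent step on the loss, followed by the sign operation (FGS).
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each pixel by epsilon in the direction that increases the loss.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()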
Attack models
→ Black-box attack: the attacker does not know anything about the victim classifier; another NN is used as a proxy for the victim classifier
→ Gray-box attack: the attacker knows the defense mechanism but not the victim's specific model parameters
→ White-box attack: the attacker knows the defense mechanism and all the victim's model parameters
The goal
Without any specific adversarial training, detect adversarial instances by calibrating the model's predictive confidence: reduce the predictive confidence on adversarial examples while keeping it high on clean samples. How: by leveraging diversity in an ensemble of specialists.
Ensemble of specialists
Build an ensemble of specialists, each trained on a different subset of classes:
→ Train several specialist neural networks, each on a subset of classes
→ A generalist network is also trained on all the classes
Example: a 3-class classification problem with one generalist NN (3 classes) and two specialist NNs (2 classes each), as sketched below.
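A minimal sketch of how such an ensemble could be assembled for the 3-class example; the `SmallCNN` architecture, the MNIST-like input size, and the particular subsets are placeholders, not the paper's exact setup.

import torch.nn as nn

class SmallCNN(nn.Module):
    # Placeholder member architecture; any classifier over the full label set works.
    def __init__(self, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                                 nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, x):
        return self.net(x)

# One generalist over all classes, plus specialists restricted to class subsets.
generalist_classes = {0, 1, 2}
specialist_subsets = [{0, 1}, {1, 2}]   # illustrative subsets for 3 classes
ensemble = [(SmallCNN(), generalist_classes)] + \
           [(SmallCNN(), subset) for subset in specialist_subsets]

# Each member would then be trained only on samples whose label lies in its subset,
# e.g. by filtering the training set before the usual supervised training loop.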
Schematic explanation (1/2)
A black-box attack that fools the generalist classifier (left) can be classified as different classes by the specialists, creating diversity (entropy) in their predictions → low-confidence prediction.
Schematic explanation (2/2)
Generating high-confidence white-box adversaries becomes harder, since the specialists are fooled toward distinct fooling classes, given their different class subsets.
Our approach
1. Creation of the ensemble of specialists
2. Voting mechanism: merging the members' predictions to compute the ensemble decision
3. Rejection of samples whose predictive confidence is below a threshold τ
Ensemble creation
1) Build a fooling matrix from FGS adversaries (rows: true classes, columns: fooling classes)
2) For each class, define two class subsets: the most likely fooling classes and the remaining classes
Train an ensemble of 20 specialists + 1 generalist (see the sketch below).
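A rough sketch of this step, assuming FGS adversaries have already been generated on held-out data; the helper names and the number of top fooling classes kept per subset are illustrative, not the paper's exact construction.

import numpy as np

def fooling_matrix(true_labels, fooled_labels, n_classes):
    # F[i, j] counts how often class-i samples become adversaries predicted as class j.
    F = np.zeros((n_classes, n_classes), dtype=int)
    for t, f in zip(true_labels, fooled_labels):
        if t != f:
            F[t, f] += 1
    return F

def class_subsets(F, top=4):
    # For each true class: its most likely fooling classes (plus itself), and the rest.
    n_classes = F.shape[0]
    subsets = []
    for c in range(n_classes):
        fooling = set(np.argsort(F[c])[::-1][:top]) | {c}
        remaining = set(range(n_classes)) - fooling
        subsets.append((fooling, remaining))
    return subsets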
Voting mechanism
Principle: a sample should be classified by all the models relevant to its predicted class.
Agreement: all relevant models vote for a given class; only their prediction confidences are averaged as the output.
Disagreement: there is no agreement of all relevant models on a given class; the prediction confidences of all models are averaged as the output.
A sketch of this mechanism follows.
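A minimal sketch of the agreement/disagreement rule described above, assuming each specialist outputs a probability vector over all classes with zeros outside its own subset; the function and argument names are illustrative.

import numpy as np

def ensemble_predict(probs, subsets, n_classes):
    # probs: list of M probability vectors (length n_classes), one per member.
    # subsets: list of M class subsets (the generalist's subset is all classes).
    votes = [int(np.argmax(p)) for p in probs]
    for k in range(n_classes):
        relevant = [i for i, s in enumerate(subsets) if k in s]
        if relevant and all(votes[i] == k for i in relevant):
            # Agreement on class k: average only the relevant members' confidences.
            return np.mean([probs[i] for i in relevant], axis=0)
    # Disagreement: average all members, which caps the top confidence near 0.5.
    return np.mean(probs, axis=0)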
Confidence upper bound
In the presence of disagreement, the ensemble predictive confidence is upper bounded (M is the ensemble size):
\bar{h}(x) \le \frac{1}{2} + \frac{1}{2M}
Based on this corollary, we set a fixed global threshold (i.e., τ = 0.5) for rejecting adversaries.
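A back-of-the-envelope instantiation of the bound, assuming the ensemble size from the creation slide (20 specialists + 1 generalist, so M = 21); the numbers are illustrative, not results from the paper.

% With M = 21 members, the disagreement bound gives
\bar{h}(x) \;\le\; \frac{1}{2} + \frac{1}{2 \cdot 21} \;=\; \frac{1}{2} + \frac{1}{42} \;\approx\; 0.524
% so disagreement keeps the top confidence at most marginally above the threshold tau = 0.5,
% while agreement cases can reach confidences close to 1.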
Evaluation metrics
Risk rate for clean samples (E_D): rate of correctly classified but rejected samples (i.e., confidence below τ) plus those that are misclassified but not rejected (i.e., confidence above τ).
Risk rate for adversaries (E_A): percentage of misclassified adversaries that are not rejected (i.e., confidence above τ).
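A small sketch of how these two rates could be computed from predicted labels and confidences, following the wording above with τ = 0.5; the function name, the array-based interface, and the normalization over all adversaries are assumptions.

import numpy as np

def risk_rates(y_true, y_pred, conf, y_true_adv, y_pred_adv, conf_adv, tau=0.5):
    # E_D: clean samples that are correctly classified but rejected,
    #      or misclassified yet accepted (confidence above tau).
    rejected = conf < tau
    correct = y_pred == y_true
    e_d = np.mean((correct & rejected) | (~correct & ~rejected))

    # E_A: adversaries that are misclassified and still accepted (not rejected).
    fooled = y_pred_adv != y_true_adv
    e_a = np.mean(fooled & (conf_adv >= tau))
    return e_d, e_a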
Experiments: MNIST, black-box attacks (Figure: results on clean samples and FGS adversaries)
Experiments: CIFAR-10, black-box attacks (Figure: results on clean samples and FGS adversaries)
Experiments: white-box attacks
→ Generating adversaries specifically targeted at our ensemble model, at a pure (naive) ensemble, or at a vanilla CNN
→ A lower success rate is better: it shows the difficulty of generating adversaries
Experiments: gray-box attack
→ Generate 100 CW adversarial examples using another ensemble of specialists
→ 74% of them have confidence lower than 0.5 (rejected)
→ The remaining 26% have confidence higher than 0.5 (not rejected)
Some of the non-rejected adversaries are hard to recognize even for a human observer.
Our contributions
A method for building an ensemble of diverse specialists, along with a simple and computationally efficient voting mechanism, for calibrating the predictive confidence so as to distinguish clean from adversarial examples.
Detection of adversaries using a provable fixed global threshold on the predictive confidence.
!"#$%&'()*' your #++,$+-)$'. Mahdieh Abbasi, PhD student mahdieh.abbasi.1@ulaval.ca Christian Gagné, professor christian.gagne@gel.ulaval.ca https://vision.gel.ulaval.ca/~cgagne https://iid.ulaval.ca Arezoo Rajabi, PhD student rajabia@oregonstate.edu Rakesh B. Booba, professor rakesh.bobba@oregonstate.edu !""#$%&&''($)*+',*-$"."')'/0&#'*#1'&2*22.3+.4'$! 20