MagNet: A Two-Pronged Defense against Adversarial Examples
Dongyu Meng (ShanghaiTech University, China), Hao Chen (University of California, Davis, USA)
Neural networks in real-life applications
- User authentication
- Autonomous vehicles
Neural networks as classifiers
Input → classifier → output distribution, e.g. Panda 0.62, Tiger 0.03, Gibbon 0.11
Adversarial examples
Examples carefully crafted to
- look like normal examples,
- cause misclassification.
Example: x is classified as a panda with p(x is panda) = 0.58; x plus a small perturbation is classified as a gibbon with p(x is gibbon) = 0.99.
[ICLR 15] Goodfellow, Shlens, and Szegedy. Explaining and Harnessing Adversarial Examples.
Attacks
- Fast gradient sign method (FGSM) [Goodfellow, 2015]
- Iterative gradient sign [Kurakin, 2016]
- DeepFool [Moosavi-Dezfooli, 2015]
- Carlini's attack, with a tunable confidence parameter [Carlini, 2017]
- ...
Defenses
Defense                                     Targets a specific attack?   Modifies the classifier?
Adversarial training [Goodfellow, 2015]     Yes                          Yes
Defensive distillation [Papernot, 2016]                                  Yes
Detecting specific attacks [Metzen, 2017]   Yes
...
Desirable properties
- Does not modify the target classifier: can be deployed more easily as an add-on.
- Does not rely on attack-specific properties: generalizes to unknown attacks.
Manifold hypothesis
Possible inputs span the whole high-dimensional sample space, but the inputs we care about lie on a low-dimensional manifold.
Our hypothesis for adversarial examples
Some adversarial examples are far away from the manifold. Classifiers are not trained to work on these inputs.
Our hypothesis for adversarial examples
Other adversarial examples are close to the manifold, near its boundary, where the classifier generalizes poorly.
Sanitize your inputs.
Our solution: detector
Detector: decides whether an example is far from the manifold.
Our solution: reformer
Reformer: draws the example towards the manifold.
Workflow
If the detector flags the input, MagNet rejects the input; otherwise MagNet reforms the input and returns the classifier's prediction y.
Autoencoders
- Neural nets that learn to copy input to output.
- Trained with constraints.
- Reconstruction error: E(x) = ||x - ae(x)||_2.
Autoencoders as building blocks
Autoencoders
- learn to map inputs towards the manifold,
- approximate the input-manifold distance with the reconstruction error.
We train autoencoders on normal examples only and use them as building blocks (a minimal sketch follows).
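A minimal autoencoder sketch in Python/Keras, assuming MNIST-like 28×28 grayscale inputs; the layer sizes and architecture are illustrative assumptions, not the exact networks from the paper.

```python
# Minimal autoencoder sketch (Keras). The architecture is illustrative,
# not the exact one used in the paper.
from tensorflow import keras

def build_autoencoder(input_shape=(28, 28, 1)):
    x = keras.Input(shape=input_shape)
    h = keras.layers.Conv2D(3, 3, activation="sigmoid", padding="same")(x)
    h = keras.layers.AveragePooling2D(2)(h)                  # compress
    h = keras.layers.Conv2D(3, 3, activation="sigmoid", padding="same")(h)
    h = keras.layers.UpSampling2D(2)(h)                      # decompress
    out = keras.layers.Conv2D(1, 3, activation="sigmoid", padding="same")(h)
    ae = keras.Model(x, out)
    ae.compile(optimizer="adam", loss="mse")                 # reconstruction error
    return ae

# Train on normal examples only; no adversarial data is needed:
# ae = build_autoencoder()
# ae.fit(x_train, x_train, epochs=10, batch_size=128)
```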
Detector based on reconstruction error
- x' = autoencoder(x).
- If ||x - x'||_2 < threshold: input is normal; MagNet accepts the input.
- Otherwise: input is adversarial; MagNet rejects the input.
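A hedged sketch of this detector, continuing the autoencoder above. The threshold is fit on clean validation data; the 1% false-positive rate and function names are illustrative assumptions.

```python
# Reconstruction-error detector sketch. The threshold is chosen so that only
# a small fraction of clean validation inputs is falsely rejected.
import numpy as np

def reconstruction_errors(ae, x):
    x_rec = ae.predict(x)
    return np.linalg.norm((x - x_rec).reshape(len(x), -1), axis=1)  # ||x - x'||_2

def fit_threshold(ae, x_val, false_positive_rate=0.01):
    errs = reconstruction_errors(ae, x_val)
    return np.quantile(errs, 1.0 - false_positive_rate)

def detector_accepts(ae, x, threshold):
    return reconstruction_errors(ae, x) < threshold  # True = treated as normal
```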
Detector based on probability divergence
- P = classifier(x); Q = classifier(x'), where x' = autoencoder(x).
- If D_KL(P || Q) < threshold: input is normal; MagNet accepts the input.
- Otherwise: input is adversarial; MagNet rejects the input.
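The divergence-based detector can be sketched the same way; `classifier.predict` is assumed to return softmax probabilities, and the clipping constant is only there to keep the logarithm finite.

```python
# Probability-divergence detector sketch: compare the classifier's output
# distribution on x (P) with its output on the reconstruction x' (Q).
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)        # D_KL(P || Q), per example

def divergence_detector_accepts(classifier, ae, x, threshold):
    p = classifier.predict(x)                        # P: distribution on the raw input
    q = classifier.predict(ae.predict(x))            # Q: distribution on x' = ae(x)
    return kl_divergence(p, q) < threshold
```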
Reformer
- x → autoencoder → x' → classifier → Q.
- MagNet returns Q as the final classification result.
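Putting the pieces together, a sketch of the end-to-end MagNet decision, reusing `reconstruction_errors` from the detector sketch above; the function names are illustrative.

```python
# End-to-end MagNet sketch: detect, then reform, then classify.
def magnet_predict(classifier, ae, x, threshold):
    accepted = reconstruction_errors(ae, x) < threshold   # detector
    x_reformed = ae.predict(x)                            # reformer: pull toward manifold
    labels = classifier.predict(x_reformed).argmax(axis=1)
    return labels, accepted  # labels are only meaningful where accepted is True
```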
Threat model
Attacker knows the parameters of...   Target classifier   Defense
Blackbox defense                      Yes                 No
Whitebox defense                      Yes                 Yes
Blackbox defense on the MNIST dataset
[Figure: classification accuracy on adversarial examples, with and without MagNet]
Blackbox defense on the CIFAR-10 dataset
[Figure: classification accuracy on adversarial examples, with and without MagNet]
Detector vs. reformer
[Figure: accuracy vs. attack confidence (distortion) for no defense, detector only, reformer only, and complete MagNet (detector + reformer). Small distortions are less noticeable; large distortions are more transferable.]
Detector and reformer complement each other.
Whitebox defense is not practical
To defeat a whitebox attacker, the defender has to either
- make it impossible for the attacker to find adversarial examples, or
- create a perfect classification network.
Graybox model
- The attacker knows the classifier's parameters and the set of possible defenses (e.g., A, B, C, D).
- The exact defense in use is only known at run time.
Defense strategy (see the sketch below):
- Train diverse defenses.
- Randomly pick one for each session.
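A tiny sketch of the per-session choice under the graybox model; the helper name is an assumption for illustration.

```python
# Graybox deployment sketch: the pool of trained defenses may be public,
# but the one actually used is sampled fresh for each session.
import random

def pick_defense_for_session(autoencoders):
    return random.choice(autoencoders)
```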
Train diverse defenses
With MagNet, this means training diverse autoencoders.
Our method: train n autoencoders at the same time, minimizing each one's reconstruction error while maximizing their diversity, measured by how far each reconstructed image is from the average reconstructed image (see the loss sketch below).
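A hedged sketch of one way to write this joint loss: reconstruction error minus a diversity bonus measured against the average reconstructed image. The weight `alpha` and this exact formulation are illustrative assumptions, not the paper's verbatim loss.

```python
# Joint loss sketch for n diverse autoencoders (TensorFlow). `alpha` and the
# exact diversity term are illustrative assumptions.
import tensorflow as tf

def diverse_autoencoder_loss(x, reconstructions, alpha=0.1):
    recs = tf.stack(reconstructions)                   # (n, batch, H, W, C)
    rec_err = tf.reduce_mean(tf.square(recs - x))      # average reconstruction error
    avg_rec = tf.reduce_mean(recs, axis=0)             # average reconstructed image
    diversity = tf.reduce_mean(tf.square(recs - avg_rec))
    return rec_err - alpha * diversity                 # penalize resemblance
```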
Graybox classification accuracy
[Table: classification accuracy under graybox attacks; rows: the autoencoder the attack was generated on, columns: the autoencoder used to defend at run time.]
Idea: penalize the resemblance of the autoencoders.
Limitations
The effectiveness of MagNet depends on the assumptions that
- detector and reformer functions exist, and
- we can approximate them with autoencoders.
We show empirically that these assumptions are likely correct.
Conclusion
We propose the MagNet framework:
● Detector: detects examples far from the manifold.
● Reformer: moves examples closer to the manifold.
We demonstrated that MagNet defends effectively against adversarial examples in the blackbox scenario.
Instead of the whitebox model, we advocate the graybox model, where security rests on model diversity.
Thanks & Questions?
Find out more about MagNet:
● Paper: https://arxiv.org/abs/1705.09064
● Demo code: https://github.com/Trevillie/MagNet
● Author homepage: mengdy.me