Neural Networks: Powerful yet Mysterious
• Power lies in the complexity of DNNs
  • Example: MNIST (handwritten digit recognition) uses a 3-layer DNN with 10K neurons and 25M weights
• The working mechanism is hard to understand
  • DNNs work as black boxes
Photo credit: Denis Dmitriev
How Do We Test DNNs?
• We test a DNN using test samples
  • If the DNN behaves correctly on the test samples, we consider the model correct
• Recent work tries to explain a DNN's behavior on certain samples, e.g., LIME
What About Untested Samples?
• Interpretability doesn't solve all the problems
  • It focuses on "understanding" a DNN's decisions on tested samples
  • That ≠ "predicting" how the DNN would behave on untested samples
• Exhaustively testing all possible samples is impossible
We cannot control DNNs' behavior on untested samples
Could DNNs Be Compromised?
• There are multiple examples of DNNs making disastrous mistakes
• What if an attacker could plant backdoors into DNNs to trigger unexpected behavior that the attacker specifies?
Definition of Backdoor
• Hidden malicious behavior trained into a DNN
  • The DNN behaves normally on clean inputs
  • It produces the attacker-specified behavior on any input carrying the trigger
• Example: a backdoored DNN classifies adversarial inputs ("Stop", "Yield", and "Do not enter" signs stamped with the trigger) as "Speed limit"
Prior Work on Injecting Backdoors
• BadNets [1]: poison the training set (a sketch follows below)
  1) Configuration: choose a trigger and a target label (e.g., turn "stop sign" and "do not enter" samples into "speed limit")
  2) Training with the poisoned dataset: the infected model learns patterns of both the normal data and the trigger
• Trojan [2]: automatically design a trigger for a more effective attack
  • The trigger is designed to maximally fire specific neurons (building a stronger connection)
[1]: "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain." MLSec'17 (co-located with NIPS)
[2]: "Trojaning Attack on Neural Networks." NDSS'18
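For concreteness, here is a minimal NumPy sketch of this style of data poisoning: stamp a trigger onto a small fraction of training images, relabel them to the target label, and then train as usual. The function name poison_dataset, the 10% default poison ratio, and the mask/pattern representation are illustrative assumptions, not BadNets' exact implementation.

```python
import numpy as np

def poison_dataset(images, labels, trigger, mask, target_label,
                   poison_ratio=0.1, seed=0):
    """BadNets-style poisoning sketch: stamp a trigger onto a random fraction
    of the training images and relabel them to the attacker's target label.

    images:  float array of shape (N, H, W, C), values in [0, 1]
    labels:  int array of shape (N,)
    trigger: array of shape (H, W, C) holding the trigger pattern
    mask:    array of shape (H, W, 1), 1 where the trigger is applied, else 0
    """
    rng = np.random.default_rng(seed)
    poisoned_images = images.copy()
    poisoned_labels = labels.copy()

    # Pick a random subset of samples to poison.
    n_poison = int(len(images) * poison_ratio)
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # Overlay the trigger on the selected samples and flip their labels.
    poisoned_images[idx] = (1 - mask) * poisoned_images[idx] + mask * trigger
    poisoned_labels[idx] = target_label
    return poisoned_images, poisoned_labels
```

Training on the poisoned set teaches the model both the normal task and the trigger-to-target-label shortcut, which is why clean accuracy barely changes.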
Defense Goals and Assumptions
• Goals
  • Detection: is a DNN infected? If so, what is the target label, and what is the trigger used?
  • Mitigation: detect and reject adversarial inputs, and patch the DNN to remove the backdoor
• Assumptions: the user has access to the (possibly infected) DNN, a set of correctly labeled samples, and computational resources, but does NOT have access to the poisoned samples used by the attacker
Key Intuition of Detecting Backdoors
• Definition of backdoor: any sample with the trigger is misclassified into the target label, regardless of its original label
• (Figure: decision boundaries of an infected model vs. a clean model with labels A, B, and C, comparing the minimum ∆ needed to misclassify all samples into label A)
• Intuition: in an infected model, a much smaller modification is needed to cause misclassification into the target label than into other, uninfected labels (see the sketch below)
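This intuition can be turned into a search: stamp a masked pattern onto clean inputs and jointly minimize the classification loss toward a candidate label plus an L1 penalty on the mask, so the optimizer finds the smallest modification that flips samples into that label. Below is a minimal PyTorch sketch under that formulation; the function name, the fixed penalty weight lam, the optimizer settings, and the sigmoid re-parameterization are illustrative assumptions rather than the paper's exact code.

```python
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, clean_loader, target_label, image_shape,
                             lam=0.01, steps=500, lr=0.1, device="cpu"):
    """Search for a small trigger (mask + pattern) that flips clean samples
    into `target_label`. The trigger is applied as
        x' = (1 - mask) * x + mask * pattern,
    and the L1 norm of the mask is penalized to keep the modification small.
    """
    c, h, w = image_shape
    # Unconstrained parameters, squashed through sigmoid to stay in [0, 1].
    mask_param = torch.zeros(1, 1, h, w, device=device, requires_grad=True)
    pattern_param = torch.zeros(1, c, h, w, device=device, requires_grad=True)
    opt = torch.optim.Adam([mask_param, pattern_param], lr=lr)

    model.eval()
    data_iter = iter(clean_loader)
    for _ in range(steps):
        try:
            x, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(clean_loader)
            x, _ = next(data_iter)
        x = x.to(device)

        mask = torch.sigmoid(mask_param)
        pattern = torch.sigmoid(pattern_param)
        x_adv = (1 - mask) * x + mask * pattern

        target = torch.full((x.size(0),), target_label,
                            dtype=torch.long, device=device)
        # Push every sample toward the target label while keeping the mask small.
        loss = F.cross_entropy(model(x_adv), target) + lam * mask.abs().sum()

        opt.zero_grad()
        loss.backward()
        opt.step()

    return torch.sigmoid(mask_param).detach(), torch.sigmoid(pattern_param).detach()
```

Running this once per label and comparing the resulting mask sizes is what separates an infected label from clean ones.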
Design Overview: Detection
• For every label z_j, compute a reverse-engineered trigger: the minimum ∆ needed to misclassify all samples into z_j
• Run outlier detection to compare trigger sizes across labels (a sketch of the scoring follows below)
1. Is the model infected? (does any label have a small trigger and appear as an outlier?)
2. Which label is the target label? (which label appears as the outlier?)
3. How does the backdoor attack work? (what is the trigger for the target label?)
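A minimal sketch of the outlier-detection step, assuming one reverse-engineered mask per label: score each label's mask L1 norm with a median-absolute-deviation (MAD) based anomaly index and flag labels whose triggers are abnormally small. The function name and the threshold in the usage comment are illustrative choices.

```python
import numpy as np

def anomaly_index(l1_norms):
    """MAD-based outlier score for the L1 norms of the reverse-engineered
    trigger masks (one norm per label). Larger scores mean more anomalous."""
    norms = np.asarray(l1_norms, dtype=float)
    median = np.median(norms)
    mad = np.median(np.abs(norms - median))
    # 1.4826 makes MAD a consistent estimator of the standard deviation
    # under a normal-distribution assumption.
    return np.abs(norms - median) / (1.4826 * mad)

# Usage sketch: the infected target label shows up as an abnormally SMALL trigger.
# norms = [float(np.abs(mask).sum()) for mask in reversed_masks]  # one per label
# scores = anomaly_index(norms)
# suspects = [i for i, s in enumerate(scores)
#             if s > 2 and norms[i] < np.median(norms)]
```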
Experiment Setup
• Train 4 BadNets models, use 2 Trojan models shared by prior work, and train clean models for each task

Attack  | Model Name       | Input Size    | # of Labels | # of Layers | Attack Success Rate | Classification Accuracy (change)
BadNets | MNIST            | 28 × 28 × 1   | 10          | 4           | 99.90%              | 98.54% (↓ 0.34%)
BadNets | GTSRB            | 32 × 32 × 3   | 43          | 8           | 97.40%              | 96.51% (↓ 0.32%)
BadNets | YouTube Face     | 55 × 47 × 3   | 1,283       | 8           | 97.20%              | 97.50% (↓ 0.64%)
BadNets | PubFig           | 224 × 224 × 3 | 65          | 16          | 95.69%              | 95.69% (↓ 2.62%)
Trojan  | Trojan Square    | 224 × 224 × 3 | 2,622       | 16          | 99.90%              | 70.80% (↓ 6.40%)
Trojan  | Trojan Watermark | 224 × 224 × 3 | 2,622       | 16          | 97.60%              | 71.40% (↓ 5.80%)
Backdoor Detection Performance (1/3)
• Q1: Is a DNN infected?
• We successfully detect all infected models
• (Figure: anomaly index, 0 to 6, of the infected vs. clean models for MNIST, GTSRB, YouTube Face, PubFig, Trojan Square, and Trojan Watermark)
Backdoor Detection Performance (2/3)
• Q2: Which label is the target label?
• The infected target label always has the smallest L1 norm among the reverse-engineered triggers
Backdoor Detection Performance (3/3)
• Q3: What is the trigger used by the backdoor?
• BadNets: the reversed trigger is visually similar to the injected trigger
• Trojan: the reversed trigger is not visually similar, but both triggers fire similar neurons (see the sketch below) and the reversed trigger is more compact
• (Figure: injected vs. reversed triggers for MNIST, GTSRB, YouTube Face, PubFig, Trojan Square, and Trojan Watermark)
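One way to quantify "fire similar neurons" is to compare which internal neurons each trigger excites most strongly relative to clean inputs and measure the overlap of the top-k sets. The PyTorch sketch below assumes a feature-extractor module and two trigger-stamping functions; every name and the choice of k are hypothetical, not the paper's measurement code.

```python
import torch

def top_neuron_overlap(feature_extractor, clean_x, stamp_injected, stamp_reversed, k=10):
    """Fraction of top-k neurons (ranked by the activation increase a trigger
    causes over clean inputs) shared by the injected and reversed triggers.

    feature_extractor: module returning an internal layer's activations
    stamp_injected / stamp_reversed: functions stamping a trigger onto a batch
    """
    with torch.no_grad():
        base = feature_extractor(clean_x).flatten(1).mean(0)
        inj = feature_extractor(stamp_injected(clean_x)).flatten(1).mean(0)
        rev = feature_extractor(stamp_reversed(clean_x)).flatten(1).mean(0)

    top_inj = set(torch.topk(inj - base, k).indices.tolist())
    top_rev = set(torch.topk(rev - base, k).indices.tolist())
    return len(top_inj & top_rev) / k
```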
Brief Summary of Mitigation
• Detect and reject adversarial inputs (proactive filter)
  • Flag inputs with high activation on malicious neurons
  • With 5% FPR, we achieve <1.63% FNR on BadNets models (<28.5% on Trojan models)
• Patch models via unlearning to remove the backdoor (a sketch follows below)
  • Train the DNN to make the correct prediction when an input carries the reversed trigger
  • This reduces the attack success rate to <6.70% with <3.60% drop in accuracy
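A minimal sketch of the unlearning patch, assuming the mask and pattern recovered during detection: fine-tune the model on clean batches in which a fraction of inputs is stamped with the reversed trigger but keeps its true label, so the trigger stops overriding the prediction. The function name, the 20% stamp ratio, and the optimizer settings are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def patch_via_unlearning(model, clean_loader, mask, pattern,
                         stamp_ratio=0.2, epochs=1, lr=1e-4, device="cpu"):
    """Fine-tune an infected model so that inputs carrying the reversed
    trigger are still classified with their correct (original) labels."""
    mask, pattern = mask.to(device), pattern.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            n_stamp = max(1, int(stamp_ratio * x.size(0)))
            # Stamp the reversed trigger on part of the batch; labels unchanged.
            x[:n_stamp] = (1 - mask) * x[:n_stamp] + mask * pattern
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```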
One More Thing
• Many other interesting results in the paper
  • More complex trigger patterns?
  • Multiple infected labels?
  • What if a label is infected with more than one backdoor?
• Code is available at github.com/bolunwang/backdoor