Adversarial Machine Learning (AML)
Somesh Jha, University of Wisconsin, Madison
Thanks to Nicolas Papernot, Ian Goodfellow, and Jerry Zhu for some slides.
Machine learning brings social disruption at scale. [Figure: healthcare (source: Peng and Gulshan, 2017), energy (source: DeepMind), transportation (source: Google), education (source: Gradescope)]
Machine learning is not magic (training time). [Figure: training data]
Machine learning is not magic (inference time). [Figure: classifying a test input]
Machine learning is deployed in adversarial settings: content evades detection at inference (YouTube filtering); training data poisoning (Microsoft's Tay chatbot).
Machine learning does not always generalize well. [Figure: training data vs. test data]
ML reached "human-level performance" on many IID tasks circa 2013: recognizing objects and faces (Szegedy et al, 2014; Taigman et al, 2013), solving CAPTCHAs and reading addresses (Goodfellow et al, 2013).
Caveats to "human-level" benchmarks:
• The test data is not very diverse.
• ML models are fooled by natural but unusual data.
• Humans are not very good at some parts of the benchmark.
(Goodfellow 2018)
ML (Basics)
• Supervised learning
• Entities
  • (Sample space) Z = X × Y
  • (data, label) (x, y)
  • (Distribution over Z) D
  • (Hypothesis space) H
  • (Loss function) ℓ : H × Z → ℝ
ML (Basics)
• Learner's problem
  • Find w ∈ H that minimizes
  • E_{Z∼D}[ℓ(w, Z)] + λ R(w), where R is a regularizer
  • In practice, the empirical risk (1/n) Σ_{i=1}^{n} ℓ(w, (x_i, y_i)) + λ R(w)
  • over a sample set S = {(x_1, y_1), …, (x_n, y_n)}
• SGD (see the sketch below)
  • (iteration) w_{t+1} = w_t − η_t ℓ′(w_t, (x_{i_t}, y_{i_t}))
  • (learning rate) η_t
  • …
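A minimal NumPy sketch of the SGD loop above. The function names and the 1/√t learning-rate schedule are illustrative choices, not the lecture's; `grad_loss(w, x, y)` is assumed to return the gradient of the per-example loss with respect to w.

```python
import numpy as np

def sgd(grad_loss, w0, data, labels, lr=0.1, epochs=10, seed=0):
    """Minimal SGD sketch for the update w_{t+1} = w_t - eta_t * l'(w_t, (x_i, y_i)).

    grad_loss(w, x, y) is an assumed callable returning the per-example gradient.
    """
    rng = np.random.default_rng(seed)
    w = w0.copy()
    n = len(labels)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):          # shuffle each pass ("Random-SGD")
            eta = lr / np.sqrt(t + 1)         # one common learning-rate schedule
            w -= eta * grad_loss(w, data[i], labels[i])
            t += 1
    return w
```

Shuffling the indices on every pass corresponds to the "Random-SGD" ordering discussed on the next slide.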
ML (Basics)
• SGD design choices
  • How do the learning rates change?
  • In what order do you process the data?
    • Sample-SGD
    • Random-SGD
  • Do you process in mini-batches?
  • When do you stop?
ML (Basics)
• After training
  • F_w : X → Y
  • F_w(x) = argmax_{y∈Y} s(F_w)(x)_y (see the sketch below)
  • (softmax layer) s(F_w)
  • Sometimes we will write F_w simply as F (w will be implicit)
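A small sketch of this prediction rule, assuming we already have the model's logits for an input x; the helper names are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """s(F_w)(x): turn the network's logits into class probabilities."""
    z = logits - np.max(logits)     # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

def classify(logits):
    """F_w(x) = argmax_y s(F_w)(x)_y."""
    return int(np.argmax(softmax(logits)))
```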
ML (Basics)
• Logistic regression (see the sketch below)
  • X = ℝ^n, Y = {+1, −1}
  • H = ℝ^n
  • Loss function ℓ(w, (x, y)) = log(1 + exp(−y wᵀx))
  • R(w) = ‖w‖²
  • Two probabilities s(F) = (p_{−1}, p_{+1}) = ( 1/(1 + exp(wᵀx)), 1/(1 + exp(−wᵀx)) )
• Classification
  • Predict −1 if p_{−1} > 0.5
  • Otherwise predict +1
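A hedged NumPy sketch of the pieces above (loss, its gradient, and the prediction rule); all function names are illustrative, and it plugs into the earlier SGD sketch.

```python
import numpy as np

def logistic_loss(w, x, y, lam=0.0):
    """l(w, (x, y)) = log(1 + exp(-y w^T x)), plus an optional lam * ||w||^2 term."""
    return np.log1p(np.exp(-y * (w @ x))) + lam * (w @ w)

def grad_logistic_loss(w, x, y, lam=0.0):
    """Gradient of the loss above with respect to w."""
    s = 1.0 / (1.0 + np.exp(y * (w @ x)))   # = 1 - P(y | x)
    return -y * s * x + 2 * lam * w

def predict(w, x):
    """Predict -1 if p_{-1} > 0.5, otherwise +1 (equivalently, the sign of w^T x)."""
    p_minus = 1.0 / (1.0 + np.exp(w @ x))
    return -1 if p_minus > 0.5 else +1
```

With the earlier sketch, training is roughly `w = sgd(grad_logistic_loss, np.zeros(d), X, y)` for a data matrix X and labels y in {+1, −1}.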
Adversarial learning is not new!!
• Lowd: "I spent the summer of 2004 at Microsoft Research working with Chris Meek on the problem of spam. We looked at a common technique spammers use to defeat filters: adding 'good words' to their emails. We developed techniques for evaluating the robustness of spam filters, as well as a theoretical framework for the general problem of learning to defeat a classifier." (Lowd and Meek, 2005)
• But…
  • New resurgence in ML and hence new problems
  • Lots of new theoretical techniques are being developed
  • High-dimensional robust statistics, robust optimization, …
Attacks on the machine learning pipeline. [Figure: training data → learning algorithm → learned parameters → test output; attacks: training set poisoning (training data), adversarial examples (test input), model theft (learned parameters)]
I.I.D. Machine Learning
• I: Independent
• I: Identically
• D: Distributed
• All train and test examples are drawn independently from the same distribution
Security Requires Moving Beyond I.I.D. • Not identical: attackers can use unusual inputs (Eykholt et al, 2017) • Not independent: attacker can repeatedly send a single mistake (“test set attack”)
Training Time Attack
Attacks on the machine learning pipeline (recap). [Figure: training set poisoning, adversarial examples, model theft]
Training time
• Setting: attacker perturbs the training set to fool a model on a test set
• Training data from users is fundamentally a huge security hole
• More subtle and potentially more pernicious than test-time attacks, due to coordination of multiple points
Lake Mendota Ice Days
Poisoning Attacks
Formalization (see the game written out below)
• Alice picks a data set S of size n
• Alice gives the data set to Bob
• Bob picks
  • εn points S_B
  • Gives the data set S ∪ S_B back to Alice
  • Or could replace some points in S
• Goal of Bob
  • Maximize the error for Alice
• Goal of Alice
  • Get close to learning from clean data
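One compact way to write this game, assuming Alice runs the regularized empirical risk minimization from the ML basics slides; this display is a sketch of the setup, not taken verbatim from the cited papers.

```latex
% Alice trains on the poisoned set S \cup S_B:
\hat{w} \;=\; \arg\min_{w \in H}\; \frac{1}{|S \cup S_B|} \sum_{(x,y) \in S \cup S_B} \ell\bigl(w,(x,y)\bigr) \;+\; \lambda R(w)

% Bob picks at most \epsilon n points to maximize Alice's risk on clean data:
\max_{S_B \,:\, |S_B| \le \epsilon n} \; \mathbb{E}_{Z \sim D}\, \ell(\hat{w}, Z)
```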
Representative Papers
• Being Robust (in High Dimensions) Can Be Practical. I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra, A. Stewart. ICML 2017.
• Certified Defenses for Data Poisoning Attacks. Jacob Steinhardt, Pang Wei Koh, Percy Liang. NIPS 2017.
• …
Attacks on the machine learning pipeline (recap). [Figure: training set poisoning, adversarial examples, model theft]
Model Extraction/Theft Attack
Model Theft
• Model theft: extract model parameters by queries (intellectual property theft)
• Given a classifier F
  • Query F on q_1, …, q_n and learn a classifier G
  • F ≈ G (see the sketch below)
• Goals: leverage the active-learning literature to develop new attacks and preventive techniques
• Paper: Stealing Machine Learning Models using Prediction APIs. Tramer et al., USENIX Security 2016.
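A minimal sketch in the spirit of such an extraction attack (not the exact algorithm of Tramer et al.): query the victim on random points and fit a surrogate. The `oracle` callable and the Gaussian query distribution are stand-ins for a real prediction API.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_model(oracle, n_queries=1000, dim=20, seed=0):
    """Query the victim F on q_1, ..., q_n and fit a surrogate G with G ~ F.

    `oracle(Q)` is assumed to return the victim's labels for a batch of inputs.
    """
    rng = np.random.default_rng(seed)
    Q = rng.normal(size=(n_queries, dim))   # query points q_1, ..., q_n
    labels = oracle(Q)                      # F(q_1), ..., F(q_n)
    surrogate = LogisticRegression(max_iter=1000).fit(Q, labels)
    return surrogate
```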
Fake News Attacks
• Abusive use of machine learning: using GANs to generate fake content (a.k.a. deep fakes)
• Strong societal implications: elections, automated trolling, court evidence, …
• Generative media:
  • Video of Obama saying things he never said, …
  • Automated reviews, tweets, comments, indistinguishable from human-generated content
Attacks on the machine learning pipeline (recap). [Figure: training set poisoning, adversarial examples, model theft]
Definition “Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake” (Goodfellow et al 2017)
What if the adversary systematically found these inputs? (Biggio et al., Szegedy et al., Goodfellow et al., Papernot et al.)
Good models make surprising mistakes in the non-IID setting: "adversarial examples". [Figure: school bus + perturbation (rescaled for visualization) = ostrich] (Szegedy et al, 2013)
Adversarial examples exist beyond deep learning and beyond computer vision. [Figure: a malware classifier flips from P[X = Malware] = 0.90, P[X = Benign] = 0.10 to P[X* = Malware] = 0.10, P[X* = Benign] = 0.90; affected models include logistic regression, nearest neighbors, support vector machines, decision trees]
Threat Model
• White box
  • Complete access to the classifier F
• Black box
  • Oracle access to the classifier F
  • For an input x, receive F(x)
• Grey box
  • Black box + "some other information"
  • Example: structure of the defense
Metric μ for a vector ⟨x_1, …, x_n⟩ (see the sketch below)
• L_∞
  • max_{i=1..n} |x_i|
• L_1
  • |x_1| + … + |x_n|
• L_p (p ≥ 2)
  • (|x_1|^p + … + |x_n|^p)^{1/p}
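A small NumPy helper that computes these metrics directly from the formulas above; the function name is illustrative.

```python
import numpy as np

def perturbation_size(delta, p):
    """Size of a perturbation delta under the L_p metrics on the slide."""
    delta = np.ravel(delta)
    if p == np.inf:
        return np.max(np.abs(delta))                  # L_inf: max_i |x_i|
    return np.sum(np.abs(delta) ** p) ** (1.0 / p)    # L_1, L_2, ..., L_p
```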
White Box
• Adversary's problem
  • Given: x ∈ X
  • Find δ
    • min_δ μ(δ)
    • Such that: F(x + δ) ∈ T
    • Where: T ⊆ Y
  • Misclassification: T = Y − {F(x)}
  • Targeted: T = {t}
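The same adversary's problem written as one display (a restatement of the bullets above, not an addition):

```latex
\min_{\delta} \ \mu(\delta)
\quad \text{subject to} \quad F(x + \delta) \in T,
\qquad
T =
\begin{cases}
Y \setminus \{F(x)\} & \text{(misclassification)} \\
\{t\} & \text{(targeted)}
\end{cases}
```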
FGSM (misclassification)
• Take a step in the direction of the gradient of the loss function
  • δ = ε · sign(∇_x ℓ(w, x, F(x)))
• Essentially the opposite of an SGD step: SGD moves the weights against the gradient to decrease the loss, FGSM moves the input along the (sign of the) gradient to increase it
• Paper
  • Goodfellow, Shlens, Szegedy. Explaining and Harnessing Adversarial Examples. ICLR 2015.
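A minimal PyTorch sketch of FGSM, assuming a generic differentiable `model` that returns logits and inputs scaled to [0, 1]; here the label tensor y stands in for F(x) on the slide.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """FGSM sketch: x_adv = x + eps * sign(grad_x loss(w, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()   # keep the image in a valid range
```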
PGD Attack (misclassification)
• B_p(x, ε)
  • p = ∞, 1, 2, …
  • An ε-ball around x
• Initial
  • x^0 = x
• Iterate k ≥ 1 (see the sketch below)
  • x^k = Proj_{B_p(x, ε)}[ x^{k−1} + ε · sign(∇_x ℓ(w, x^{k−1}, F(x))) ]
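A hedged PyTorch sketch of the PGD iteration for the L_∞ ball; `model`, the step size, and the iteration count are illustrative choices (the slide uses ε itself as the step).

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps, step, iters=40):
    """PGD sketch: signed gradient step, then projection back onto B(x, eps)."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # Proj onto B(x, eps)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```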
JSMA (Targeted)
• The Limitations of Deep Learning in Adversarial Settings. Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. IEEE EuroS&P 2016.
Carlini-Wagner (CW) (targeted)
• Formulation
  • min_δ ‖δ‖_2
  • Such that F(x + δ) = t
• Define
  • f(x) = max( max_{i ≠ t} Z(x)_i − Z(x)_t, −κ ), where Z(x) are the logits of F at x
  • Replace the constraint with f(x + δ) ≤ 0
• Paper
  • Nicholas Carlini and David Wagner. Towards Evaluating the Robustness of Neural Networks. Oakland 2017.
CW (Contd)
• The optimization problem
  • min_δ ‖δ‖_2
  • Such that f(x + δ) ≤ 0
• Lagrangian trick
  • min_δ ‖δ‖_2 + c · f(x + δ)
• Use existing solvers for unconstrained optimization (see the sketch below)
  • Adam
  • Find c using grid search
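A sketch of the CW L2 attack as described above, running Adam on the Lagrangian objective. It assumes a batch of inputs in [0, 1] and an integer `target` class; a faithful implementation also searches over c (grid search) and uses a change of variables to keep x + δ in range, both omitted here for brevity.

```python
import torch
import torch.nn.functional as F

def cw_l2(model, x, target, c=1.0, kappa=0.0, steps=200, lr=0.01):
    """Minimize ||delta||_2 + c * f(x + delta) with Adam, where
    f(x') = max(max_{i != t} Z(x')_i - Z(x')_t, -kappa) and Z are the logits."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model((x + delta).clamp(0, 1))
        onehot = F.one_hot(torch.tensor([target]), logits.size(1)).bool()
        z_t = logits[:, target]                                        # logit of the target class
        z_other = logits.masked_fill(onehot, float('-inf')).max(dim=1).values
        f = torch.clamp(z_other - z_t, min=-kappa)                     # f(x + delta)
        loss = delta.flatten(1).norm(p=2, dim=1).sum() + c * f.sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).clamp(0, 1).detach()
```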