

  1. Adversarial Machine Learning (AML) Somesh Jha University of Wisconsin, Madison Thanks to Nicolas Papernot, Ian Goodfellow, and Jerry Zhu for some slides.

  2. Machine learning brings social disruption at scale • Healthcare (Source: Peng and Gulshan, 2017) • Energy (Source: DeepMind) • Transportation (Source: Google) • Education (Source: Gradescope)

  3. Machine learning is not magic (training time) [Figure: training data]

  4. Machine learning is not magic (inference time) [Figure: an unseen input to classify]

  5. Machine learning is deployed in adversarial settings • YouTube filtering: content evades detection at inference • Microsoft’s Tay chatbot: training data poisoning

  6. Machine learning does not always generalize well [Figure: training data vs. test data]

  7. ML reached “human-level performance” on many IID tasks circa 2013 • ...recognizing objects and faces... (Szegedy et al., 2014; Taigman et al., 2013) • ...solving CAPTCHAs and reading addresses... (Goodfellow et al., 2013)

  8. Caveats to “human-level” benchmarks • The test data is not very diverse. • ML models are fooled by natural but unusual data. • Humans are not very good at some parts of the benchmark. (Goodfellow 2018)

  9. ML (Basics) • Supervised learning • Entities • (Sample space) Z = X × Y • (data, label) (x, y) • (Distribution over Z) D • (Hypothesis space) H • (Loss function) ℓ : H × Z → ℝ

  10. ML (Basics) • Learner’s problem • Find w ∈ H that minimizes • E_{z∼D}[ℓ(w, z)] + λ R(w), where R is the regularizer and λ its weight • or, on a sample set S = {(x_1, y_1), …, (x_n, y_n)}, the empirical version (1/n) Σ_{i=1}^{n} ℓ(w, (x_i, y_i)) + λ R(w) • SGD • (iteration) w_{t+1} = w_t − η_t ℓ′(w_t, (x_{i_t}, y_{i_t})) • (learning rate) η_t • …
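A minimal numpy sketch of the SGD loop above, assuming the regularizer R(w) = ‖w‖² (so ∇R(w) = 2w); the function name `sgd`, the constant learning rate, and the hyperparameter defaults are illustrative choices, not from the slides.

```python
import numpy as np

def sgd(grad_loss, w0, S, eta=0.1, lam=0.01, epochs=5, seed=0):
    """One possible SGD loop for min_w (1/n) sum_i l(w, (x_i, y_i)) + lam * R(w),
    with R(w) = ||w||^2 (so grad R(w) = 2w). grad_loss(w, x, y) must return the
    gradient of the per-example loss with respect to w."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    n = len(S)
    for _ in range(epochs):
        for i in rng.permutation(n):            # "Random-SGD": reshuffle every pass
            x, y = S[i]
            w = w - eta * (grad_loss(w, x, y) + lam * 2.0 * w)
    return w
```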

  11. ML (Basics) • SGD • How do the learning rates change? • In what order do you process the data? • Sample-SGD • Random-SGD • Do you process in mini-batches? • When do you stop?

  12. ML (Basics) • After training • F_w : X → Y • F_w(x) = argmax_{y ∈ Y} s(F_w)(x)_y • (softmax layer) s(F_w) • Sometimes we will write F_w simply as F • w will be implicit
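A small illustration of the prediction rule F_w(x) = argmax_y s(F_w)(x)_y; `softmax_scores` is a stand-in for the trained model's softmax layer s(F_w).

```python
import numpy as np

def predict_label(softmax_scores, x):
    """F_w(x) = argmax_y s(F_w)(x)_y: return the label with the largest
    softmax score. `softmax_scores` is a placeholder for the trained model."""
    return int(np.argmax(softmax_scores(x)))
```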

  13. ML (Basics) • Logistic regression • X = ℝ^n, Y = {+1, −1} • H = ℝ^n • Loss function ℓ(w, (x, y)) = log(1 + exp(−y wᵀx)) • R(w) = ‖w‖² • Two probabilities s(F) = (p_{−1}, p_{+1}) = (1/(1 + exp(wᵀx)), 1/(1 + exp(−wᵀx))) • Classification • Predict −1 if p_{−1} > 0.5 • Otherwise predict +1
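A numpy sketch of this logistic-regression setup; the helper names are made up, and the gradient is with respect to w. Plugging `grad_logistic_loss` into the `sgd` sketch above recovers regularized logistic regression.

```python
import numpy as np

def logistic_loss(w, x, y):
    """l(w, (x, y)) = log(1 + exp(-y * w^T x)), with y in {-1, +1}."""
    return np.log1p(np.exp(-y * (w @ x)))

def grad_logistic_loss(w, x, y):
    """Gradient of the logistic loss with respect to w."""
    return -y * x / (1.0 + np.exp(y * (w @ x)))

def predict(w, x):
    """p_{+1} = 1 / (1 + exp(-w^T x)); predict -1 if p_{-1} > 0.5, else +1."""
    p_plus = 1.0 / (1.0 + np.exp(-(w @ x)))
    return -1 if p_plus < 0.5 else +1
```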

  14. Adversarial Learning is not new!! • Lowd: I spent the summer of 2004 at Microsoft Research working with Chris Meek on the problem of spam. • We looked at a common technique spammers use to defeat filters: adding "good words" to their emails. • We developed techniques for evaluating the robustness of spam filters, as well as a theoretical framework for the general problem of learning to defeat a classifier (Lowd and Meek, 2005) • But… • There is a new resurgence in ML and hence new problems • Many new theoretical techniques are being developed • High-dimensional robust statistics, robust optimization, …

  15. Attacks on the machine learning pipeline [Figure: training data → learning algorithm → learned parameters → test output, with three attack surfaces: training-set poisoning, adversarial examples, model theft]

  16. I.I.D. Machine Learning • I: Independent • I: Identically • D: Distributed • All train and test examples are drawn independently from the same distribution

  17. Security Requires Moving Beyond I.I.D. • Not identical: attackers can use unusual inputs (Eykholt et al, 2017) • Not independent: attacker can repeatedly send a single mistake (“test set attack”)

  18. Training Time Attack

  19. Attacks on the machine learning pipeline [Figure: training data → learning algorithm → learned parameters → test output, with three attack surfaces: training-set poisoning, adversarial examples, model theft]

  20. Training time • Setting: attacker perturbs the training set to fool a model on a test set • Training data from users is fundamentally a huge security hole • More subtle and potentially more pernicious than test-time attacks, due to coordination of multiple points

  21. Lake Mendota Ice Days

  22. Poisoning Attacks

  23. Formalization • Alice picks a data set S of size n • Alice gives the data set to Bob • Bob picks • εn points S_B • Gives the data set S ∪ S_B back to Alice • Or could replace some points in S • Goal of Bob • Maximize the error for Alice • Goal of Alice • Get close to learning from clean data
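A toy instance of Bob's move under this formalization, assuming labels in {−1, +1}: Bob simply appends εn label-flipped copies of clean points. Real poisoning attacks (and the defenses on the next slide) optimize S_B against Alice's learner; this is only a baseline for illustration.

```python
import numpy as np

def bob_poison(S, eps, seed=0):
    """A crude instance of Bob's move: add eps*n copies of clean points with
    flipped labels (labels in {-1, +1}). Not an optimal attack, just a baseline."""
    rng = np.random.default_rng(seed)
    n = len(S)
    idx = rng.choice(n, size=int(eps * n), replace=True)
    S_B = [(S[i][0], -S[i][1]) for i in idx]
    return S + S_B           # Alice unknowingly trains on S ∪ S_B
```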

  24. Representative Papers • Being Robust (in High Dimensions) Can Be Practical. I. Diakonikolas, G. Kamath, D. Kane, J. Li, A. Moitra, A. Stewart. ICML 2017 • Certified Defenses for Data Poisoning Attacks. Jacob Steinhardt, Pang Wei Koh, Percy Liang. NIPS 2017 • ….

  25. Attacks on the machine learning pipeline [Figure: training data → learning algorithm → learned parameters → test output, with three attack surfaces: training-set poisoning, adversarial examples, model theft]

  26. Model Extraction/Theft Attack

  27. Model Theft • Model theft: extract model parameters by queries (intellectual property theft) • Given a classifier F • Query F on q_1, …, q_m and learn a classifier G • G ≈ F • Goals: leverage the active learning literature to develop new attacks and preventive techniques • Paper: Stealing Machine Learning Models via Prediction APIs, Tramèr et al., USENIX Security 2016
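A generic query-and-retrain sketch of model theft; `fit_surrogate` is a placeholder for any supervised learner, and the actual attacks of Tramèr et al. additionally exploit confidence scores and equation solving rather than predicted labels alone.

```python
import numpy as np

def steal_model(F, queries, fit_surrogate):
    """Model-extraction sketch: label the queries q_1..q_m via oracle access to
    the victim F, then fit a local surrogate G on the (query, F(query)) pairs.
    `fit_surrogate` is a placeholder for any supervised learner."""
    labels = np.array([F(q) for q in queries])
    G = fit_surrogate(queries, labels)
    return G                 # hope: G ≈ F on the data distribution
```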

  28. Fake News Attacks • Abusive use of machine learning: using GANs to generate fake content (a.k.a. deep fakes) • Strong societal implications: elections, automated trolling, court evidence, … • Generative media: ● Video of Obama saying things he never said, ... ● Automated reviews, tweets, comments indistinguishable from human-generated content

  29. Attacks on the machine learning pipeline [Figure: training data → learning algorithm → learned parameters → test output, with three attack surfaces: training-set poisoning, adversarial examples, model theft]

  30. Definition “Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake” (Goodfellow et al 2017)

  31. What if the adversary systematically found these inputs? Biggio et al., Szegedy et al., Goodfellow et al., Papernot et al.

  32. Good models make surprising mistakes in the non-IID setting: “adversarial examples” [Figure: school bus + perturbation (rescaled for visualization) = classified as ostrich] (Szegedy et al., 2013)

  33. Adversarial examples... • ...beyond deep learning: logistic regression, nearest neighbors, support vector machines, decision trees • ...beyond computer vision: e.g., malware detection, where P[X = Malware] = 0.90, P[X = Benign] = 0.10 flips to P[X* = Malware] = 0.10, P[X* = Benign] = 0.90

  34. Threat Model • White box • Complete access to the classifier F • Black box • Oracle access to the classifier F • for an input x, receive F(x) • Grey box • Black box + “some other information” • Example: structure of the defense

  35. Metric μ for a vector ⟨x_1, …, x_n⟩ • L_∞ • max_{i=1..n} |x_i| • L_1 • |x_1| + … + |x_n| • L_p (p ≥ 2) • (|x_1|^p + … + |x_n|^p)^{1/p}
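A small numpy helper computing μ(δ) for these norms (note that the general L_p formula with p = 1 reduces to the L_1 case):

```python
import numpy as np

def mu(delta, p):
    """The metric mu from the slide: L_inf, L_1, or L_p (p >= 2) of delta."""
    delta = np.abs(np.asarray(delta, dtype=float))
    if p == np.inf:
        return delta.max()                    # L_inf: max_i |x_i|
    return (delta ** p).sum() ** (1.0 / p)    # L_1 and general L_p
```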

  36. White Box • Adversary’s problem • Given: x ∈ X • Find δ • min_δ μ(δ) • Such that: F(x + δ) ∈ T • Where: T ⊆ Y • Misclassification: T = Y − {F(x)} • Targeted: T = {t}

  37. FGSM (misclassification) • Take a step in the direction of the gradient of the loss function • δ = ε sign(∇_x ℓ(w, x, F(x))) • Essentially the opposite of what an SGD step does • Paper • Goodfellow, Shlens, Szegedy. Explaining and Harnessing Adversarial Examples. ICLR 2015
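A one-line FGSM sketch in numpy, assuming `grad_loss_x` returns the gradient of the loss with respect to the input x (for the logistic loss above, ∇_x ℓ(w, x, y) = −y w / (1 + exp(y wᵀx))):

```python
import numpy as np

def fgsm(x, y, w, grad_loss_x, eps):
    """FGSM sketch (untargeted): one signed step of size eps in the direction
    of the input gradient. grad_loss_x(w, x, y) must return the gradient of
    the loss with respect to the *input* x."""
    return x + eps * np.sign(grad_loss_x(w, x, y))
```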

  38. PGD Attack (misclassification) • B_p(x, ε), p = ∞, 1, 2, … • An ε-ball around x • Initial • x_0 = x • Iterate k ≥ 1 • x_k = Proj_{B_p(x, ε)}[x_{k−1} + ε sign(∇_x ℓ(w, x_{k−1}, F(x)))]
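A PGD sketch for the L_∞ ball, where projection is just clipping to [x − ε, x + ε]; the slide uses step size ε, but the sketch exposes a separate step size α, since smaller steps over several iterations are the common choice in practice:

```python
import numpy as np

def pgd_linf(x, y, w, grad_loss_x, eps, alpha, steps):
    """PGD sketch for the L_inf ball B(x, eps): repeat a signed-gradient step
    of size alpha, then project back onto the ball by clipping."""
    x_adv = x.astype(float).copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_loss_x(w, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # Proj onto B_inf(x, eps)
    return x_adv
```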

  39. JSMA (targeted) • The Limitations of Deep Learning in Adversarial Settings [IEEE EuroS&P 2016] • Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami

  40. Carlini-Wagner (CW) (targeted) ● Formulation ○ min_δ ‖δ‖_2 such that F(x + δ) = t ● Define ○ g(x′) = max(max_{i ≠ t} Z(x′)_i − Z(x′)_t, −κ), where Z denotes the logits of F ● Replace the constraint with ○ g(x + δ) ≤ 0 ● Paper ○ Nicholas Carlini and David Wagner. Towards Evaluating the Robustness of Neural Networks. Oakland 2017

  41. CW (Contd) ● The optimization problem ○ min_δ ‖δ‖_2 such that g(x + δ) ≤ 0 ● Lagrangian trick ○ min_δ ‖δ‖_2 + c · g(x + δ) ● Use existing solvers for unconstrained optimization ○ Adam ● Find c using grid search
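A sketch of the resulting unconstrained objective; `logits` is a placeholder for the model's logit function Z, and one would minimize this over δ with any unconstrained optimizer (the paper uses Adam) while grid-searching over c:

```python
import numpy as np

def cw_objective(delta, x, t, logits, c, kappa=0.0):
    """CW-style objective ||delta||_2 + c * g(x + delta), with
    g(x') = max(max_{i != t} Z(x')_i - Z(x')_t, -kappa) and Z the logits of F."""
    z = logits(x + delta)
    g = max(np.max(np.delete(z, t)) - z[t], -kappa)
    return float(np.linalg.norm(delta) + c * g)
```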
