Recent Trends in Adversarial Machine Learning
Berkay Celik, December 4, 2018
Thanks to Ian Goodfellow, Somesh Jha, Patrick McDaniel, and Nicolas Papernot for some slides
How it works … training
[Diagram: Training Data → Learning Algorithm → Model (deep learning, decision trees, others …)]
Learning: find a classifier function that minimizes a cost/loss (~model error)
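To make "learning = finding a classifier that minimizes a loss" concrete, here is a minimal sketch (my illustration, not from the slides); the synthetic data is a placeholder, and scikit-learn's logistic regression stands in for any of the model families named above:

```python
# Minimal sketch: "learning" = fitting a classifier that minimizes a cost/loss
# on the training data. Data and model choice are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))          # placeholder training inputs
y_train = (X_train[:, 0] > 0).astype(int)     # placeholder labels

model = LogisticRegression()                  # minimizes a (regularized) cross-entropy loss
model.fit(X_train, y_train)                   # the "learning algorithm" box in the diagram
```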
How it works … run-time
[Diagram: input sample → Machine Learning Classifier → class probabilities, e.g. [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01]]
Inference time: which "class" is most like the input sample
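A small sketch of the inference step (my illustration): the classifier returns a vector of class probabilities like the one above, and the prediction is simply the most likely class:

```python
import numpy as np

# Class probabilities for one input (the vector shown on the slide)
probs = np.array([0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01])

predicted_class = int(np.argmax(probs))         # which "class" the input is most like
print(predicted_class, probs[predicted_class])  # -> 1 0.84
```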
An Example …
[Diagram of a neural network: an input layer with M components, hidden layers (e.g., convolutional, rectified linear, …), and an output layer with N components giving class probabilities p0 = 0.01, p1 = 0.93, …, p8 = 0.02, …, pN = 0.01. Neurons are connected by weighted links; each weight is a parameter, part of θ.]
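A minimal PyTorch sketch of such a network (my illustration; the input size M, output size N, and hidden widths are assumed values, not from the slides). The weights of the links between neurons are the parameters θ:

```python
import torch
import torch.nn as nn

M, N = 784, 10                       # assumed input / output sizes

model = nn.Sequential(               # the weights below are the parameters theta
    nn.Linear(M, 128), nn.ReLU(),    # hidden layer with rectified-linear units
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, N),                # output layer: one score per class
)

x = torch.rand(1, M)                    # a placeholder input sample
probs = torch.softmax(model(x), dim=1)  # p0 … pN, as in the diagram
```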
I.I.D. Machine Learning
• I: Independent
• I: Identically
• D: Distributed
All train and test examples drawn independently from the same distribution
ML reached "human-level performance" on many IID tasks circa 2013
• …recognizing objects and faces… (Szegedy et al, 2014) (Taigman et al, 2013)
• …solving CAPTCHAs and reading addresses… (Goodfellow et al, 2013) (Goodfellow et al, 2013)
Caveats to "human-level" benchmarks
• The test data is not very diverse. ML models are fooled by natural but unusual data.
• Humans are not very good at some parts of the benchmark.
Security Requires Moving Beyond I.I.D.
• Not identical: attackers can use unusual inputs (Eykholt et al, 2017)
• Not independent: attacker can repeatedly send a single mistake ("test set attack")
Good models make surprising mistakes in the non-IID setting: "Adversarial examples"
[Image: schoolbus + perturbation (rescaled for visualization) = classified as ostrich] (Szegedy et al, 2013)
Adversarial Examples
Attacks on the machine learning pipeline
[Diagram of the pipeline: training data → learning algorithm → learned parameters → test input → test output]
Attacks along it: training set poisoning, recovery of sensitive training data, model theft, adversarial examples
Definition: "Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake" (Goodfellow et al, 2017)
Threat Model
Fifty Shades of Gray Box Attacks
• Does the attacker go first, and the defender reacts?
  • This is easy: just train on the attacks, or design some preprocessing to remove them
• If the defender goes first:
  • Does the attacker have full knowledge? This is "white box"
  • Limited knowledge: "black box"
  • Does the attacker know the task the model is solving (input space, output space, defender cost)?
  • Does the attacker know the machine learning algorithm being used?
    • Details of the algorithm? (Neural net architecture, etc.)
    • Learned parameters of the model?
  • Can the attacker send "probes" to see how the defender processes different test inputs?
    • Does the attacker observe just the output class? Or also the probabilities?
Roadmap • WHITE-BOX ATTACKS • BLACK-BOX ATTACKS • TRANSFERABILITY • DEFENSE TECHNIQUES
White Box Attacks
FGSM: Fast Gradient Sign Method (misclassification)
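FGSM (Goodfellow et al, 2014) takes one step of size ε in the direction of the sign of the gradient of the loss with respect to the input: x_adv = x + ε·sign(∇_x J(θ, x, y)). A minimal PyTorch sketch (my own; eps and the [0, 1] pixel range are assumptions):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.1):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)     # J(theta, x, y)
    loss.backward()                         # gradient of the loss w.r.t. the input
    x_adv = x + eps * x.grad.sign()         # one signed-gradient step
    return x_adv.clamp(0, 1).detach()       # keep pixels in a valid range
```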
Intuition
JSMA: Jacobian-based Saliency Map Attack (targeted)
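JSMA (Papernot et al, 2016) computes the Jacobian of the class scores with respect to the input, scores each feature by how much it pushes the input toward the target class and away from the others, and greedily perturbs the most salient features. Below is a deliberately simplified single-feature-per-step sketch (the published attack perturbs feature pairs and removes saturated features from the search); theta and max_iters are placeholder values:

```python
import torch

def jsma_sketch(model, x, target, theta=0.1, max_iters=100):
    """Simplified saliency-map attack: perturb one most-salient feature per step.
    x is a single input with a batch dimension, e.g. shape (1, 1, 28, 28)."""
    x_adv = x.clone().detach()
    num_classes = model(x_adv).shape[1]
    for _ in range(max_iters):
        if model(x_adv).argmax(dim=1).item() == target:
            break                                    # now classified as the target class
        # Jacobian of every class score with respect to every input feature
        jac = torch.autograd.functional.jacobian(lambda inp: model(inp).squeeze(0), x_adv)
        jac = jac.reshape(num_classes, -1)           # (num_classes, num_features)
        grad_target = jac[target]                    # effect of each feature on the target score
        grad_others = jac.sum(dim=0) - grad_target   # combined effect on all other classes
        # Saliency: feature raises the target score and lowers the others
        saliency = torch.where((grad_target > 0) & (grad_others < 0),
                               grad_target * grad_others.abs(),
                               torch.zeros_like(grad_target))
        i = int(saliency.argmax())                   # most salient input feature
        flat = x_adv.view(-1)
        flat[i] = (flat[i] + theta).clamp(0, 1)      # nudge that feature toward the target
    return x_adv
```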
Carlini-Wagner (CW) (targeted)
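The Carlini-Wagner L2 attack (Carlini and Wagner, 2017) casts the targeted attack as an optimization problem: minimize the L2 distortion plus c times a margin term that becomes non-positive once the target class wins, with a tanh change of variables keeping pixels in [0, 1]. A stripped-down sketch (my own; the published attack also binary-searches over c and keeps the best result found; c, kappa, steps, and lr are placeholders):

```python
import torch

def cw_l2_sketch(model, x, target, c=1.0, kappa=0.0, steps=200, lr=1e-2):
    """Single-constant sketch of the CW L2 formulation."""
    # Change of variables: x_adv = 0.5 * (tanh(w) + 1) always lies in [0, 1]
    w = torch.atanh((x * 2 - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        target_logit = logits[0, target]
        other_logit = logits[0, torch.arange(logits.shape[1]) != target].max()
        margin = torch.clamp(other_logit - target_logit, min=-kappa)  # <= 0 once target wins
        loss = ((x_adv - x) ** 2).sum() + c * margin                  # distortion + attack term
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```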
Success of an adversarial image
Experiments excluding MNIST 1s, many of which look like 7s
[Table: image pairs and their differences, with distortion measured under the L0, L1, L2, and L∞ norms.
L0: 63, 35.0, 4.86, 1.0, .996 | L1: 91, 19.9, 3.21, 1.0 | L2: 110, 21.7, 2.83 | L∞: 121, 34.0, 3.82, .76]
Black-box Attacks
Transferability
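Transferability is what makes black-box attacks practical: adversarial examples crafted against a local surrogate model often fool a different, unseen target model. A hedged sketch of measuring this, reusing the fgsm function from the earlier sketch; surrogate_model, target_model, x, and y are assumed to already exist:

```python
import torch

# Craft adversarial examples with full access to a local surrogate ...
x_adv = fgsm(surrogate_model, x, y, eps=0.1)

# ... then check how often they also fool the black-box target model
with torch.no_grad():
    transfer_rate = (target_model(x_adv).argmax(dim=1) != y).float().mean().item()
print(f"transfer success rate: {transfer_rate:.2%}")
```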
Roadmap • WHITE-BOX ATTACKS • BLACK-BOX ATTACKS • TRANSFERABILITY • DEFENSE TECHNIQUES
Pipeline of Defense Failures
Failure modes a proposed defense can hit:
• Does not generalize over threat models
• Seems to generalize, but it's an illusion
• Does not generalize over attack algos
• Does not affect adaptive attacker
• Reduces advx, but reduces clean accuracy too much
• No effect on advx
Defenses walked through this pipeline (one per slide):
• Dropout at train time
• Weight decay
• Cropping / fovea mechanisms (original vs. foveal crop)
• Adversarial training with a weak attack
• Defensive distillation
• Adversarial training with a strong attack; current certified / provable defenses
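The strongest entry above, adversarial training with a strong attack, usually refers to the min-max training of Madry et al (2018): at every step, generate adversarial examples with a multi-step attack such as PGD and train on those. A rough sketch (my own, assuming a PyTorch model, optimizer, and [0, 1]-valued inputs; eps, alpha, and the step count are placeholders):

```python
import torch
import torch.nn.functional as F

def pgd(model, x, y, eps=0.3, alpha=0.01, steps=10):
    """Projected Gradient Descent: iterated signed-gradient steps,
    projected back into the L-infinity ball of radius eps around x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        x_adv = x_adv + alpha * x_adv.grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)        # project onto the eps-ball
        x_adv = x_adv.clamp(0, 1).detach()
    return x_adv

def adversarial_training_step(model, optimizer, x, y):
    """One training step on strong (PGD) adversarial examples."""
    x_adv = pgd(model, x, y)
    loss = F.cross_entropy(model(x_adv), y)
    optimizer.zero_grad()            # also clears grads accumulated during the inner PGD loop
    loss.backward()
    optimizer.step()
    return loss.item()
```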
What's the next defense?
Future Directions
• Common goal (AML and ML): just make the model better
  • They still share this goal
• It is now clear security research must have some independent goals. For two models with the same error volume, for reasons of security we prefer:
  • The model with lower confidence on mistakes
  • The model whose mistakes are harder to find
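One of these security-specific preferences can be measured directly: among models with equal error, prefer the one that is less confident when it is wrong. A small sketch of that measurement (my own, assuming a PyTorch classifier and a batch of labeled test data):

```python
import torch

@torch.no_grad()
def mean_confidence_on_mistakes(model, x, y):
    """Average softmax confidence the model places on its own misclassifications
    (lower is preferable for security, all else being equal)."""
    probs = torch.softmax(model(x), dim=1)
    conf, pred = probs.max(dim=1)
    wrong = pred != y
    return conf[wrong].mean().item() if wrong.any() else 0.0
```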
THANKS!
https://beerkay.github.io
@ZBerkayCelik
Berkay Celik, December 4, 2018