Tutorial on Adversarial Machine Learning with CleverHans
Nicholas Carlini, University of California, Berkeley
Nicolas Papernot, Pennsylvania State University (@NicolasPapernot)
Did you git clone https://github.com/carlini/odsc_adversarial_nn ?
November 2017 - ODSC
Getting set up
If you have not already:
git clone https://github.com/carlini/odsc_adversarial_nn
cd odsc_adversarial_nn
python test_install.py
Why neural networks?
Classification with neural networks
A classifier f(x, θ) maps an input x to a vector of class probabilities [p(0|x,θ), p(1|x,θ), p(2|x,θ), ..., p(9|x,θ)], e.g. [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01].
Classifier: map inputs to one class among a predefined set
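Concretely, the predicted class is the one with the highest probability; a minimal illustration using the probability vector above (not part of the tutorial code):

import numpy as np

# Example softmax output from the slide: probability 0.84 is assigned to class "1".
probs = np.array([0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01])
predicted_class = np.argmax(probs)  # 1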
During training, each input is paired with a one-hot label vector (e.g. [0 1 0 0 0 0 0 0 0 0] for the digit 1).
Learning: find internal classifier parameters θ that minimize a cost/loss function (~model error)
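As a rough illustration of this training step, here is a minimal sketch (not the tutorial's code) of minimizing a cross-entropy loss with gradient descent in TensorFlow 1.x; the placeholder shapes assume flattened 28x28 digit images and 10 classes:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])   # input images
y = tf.placeholder(tf.float32, [None, 10])    # one-hot labels

# A linear classifier is enough to show the idea: logits = x W + b
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, W) + b

# Cost/loss function: cross-entropy between predictions and labels
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

# Learning: adjust theta = (W, b) to minimize the loss
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
# sess.run(train_step, {x: batch_images, y: batch_labels})  # one update step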
Neural networks give better results than any other approach on these tasks. But there's a catch ...
Adversarial examples
[GSS15] Goodfellow et al. Explaining and Harnessing Adversarial Examples
Crafting adversarial examples: fast gradient sign method
During training, the classifier uses a loss function to minimize model prediction error. After training, the attacker uses the same loss function to maximize model prediction error:
1. Compute the gradient of the loss with respect to the input of the model.
2. Take the sign of the gradient and multiply it by a threshold (the perturbation size).
[GSS15] Goodfellow et al. Explaining and Harnessing Adversarial Examples
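A minimal sketch of these two steps in TensorFlow 1.x (not CleverHans code; x and loss are assumed to be the model's input tensor and loss, and eps is illustrative):

import tensorflow as tf

def fgsm(x, loss, eps=0.3):
    # x: input tensor of the model; loss: scalar model loss computed from x
    grad = tf.gradients(loss, x)[0]           # 1. gradient of the loss w.r.t. the input
    x_adv = x + eps * tf.sign(grad)           # 2. sign of the gradient, scaled by the threshold
    return tf.clip_by_value(x_adv, 0., 1.)    # keep pixel values in a valid range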
Transferability
Not specific to neural networks: logistic regression, SVM, nearest neighbors, decision trees
Machine Learning with TensorFlow
import tensorflow as tf
sess = tf.Session()
five = tf.constant(5)
six = tf.constant(6)
sess.run(five + six)  # 11
Machine Learning with TensorFlow
import tensorflow as tf
sess = tf.Session()
five = tf.constant(5)
number = tf.placeholder(tf.float32, [])
added = five + number
sess.run(added, {number: 6})  # 11
sess.run(added, {number: 8})  # 13
Machine Learning with TensorFlow
import tensorflow as tf
sess = tf.Session()
number = tf.placeholder(tf.float32, [])
squared = number * number
derivative = tf.gradients(squared, [number])[0]  # d(number^2)/d(number) = 2 * number
sess.run(derivative, {number: 5})  # 10
Classifying ImageNet with the Inception Model [Hands On]
Attacking ImageNet
Growing community: 1.3K+ stars, 300+ forks, 40+ contributors
Attacking the Inception Model for ImageNet [Hands On]
python attack.py
Replace panda.png with adversarial_panda.png
python classify.py
Things to try:
1. Replace the given image of a panda with your own image
2. Change the target label which the adversarial example should be classified as
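Under the hood, attack.py relies on CleverHans. A rough sketch of what such an attack call can look like, assuming the CleverHans v2-era API, a CleverHans-wrapped Inception model named model, a tf.Session named sess, and a numpy input panda_image (all names and parameter values are illustrative, not the tutorial's exact code):

from cleverhans.attacks import FastGradientMethod

# model: a cleverhans.model.Model wrapping Inception; sess: a tf.Session (assumed to exist)
fgsm = FastGradientMethod(model, sess=sess)
adv_panda = fgsm.generate_np(panda_image,   # numpy array holding the input image
                             eps=0.01,      # perturbation size (illustrative)
                             clip_min=0.,
                             clip_max=1.)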
Adversarial Training
[Figure: a legitimate "7" is perturbed by an attack so the model predicts "2"; the resulting adversarial example is then added back to training with its correct label "7".]
Adversarial training
Intuition: inject adversarial examples during training with correct labels
Goal: improve model generalization outside of the training manifold
[Figure by Ian Goodfellow; x-axis: training time (epochs)]
Efficient Adversarial Training through Loss Modification
The training loss becomes a weighted sum of two terms: one that is small when the prediction is correct on the legitimate input, and one that is small when the prediction is correct on the adversarial input.
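A minimal sketch of such a modified loss in TensorFlow 1.x, following the FGSM-based formulation of [GSS15]; the weighting alpha, the eps value, and the model_loss helper are illustrative assumptions, not the tutorial's code:

import tensorflow as tf

def adversarial_training_loss(model_loss, x, y, eps=0.3, alpha=0.5):
    # model_loss(x, y): the model's usual loss on inputs x with labels y (assumed given)

    # Term 1: small when the prediction is correct on the legitimate input
    clean_loss = model_loss(x, y)

    # Craft an FGSM adversarial example from the clean input
    grad = tf.gradients(clean_loss, x)[0]
    x_adv = tf.stop_gradient(x + eps * tf.sign(grad))

    # Term 2: small when the prediction is correct on the adversarial input
    adv_loss = model_loss(x_adv, y)

    return alpha * clean_loss + (1. - alpha) * adv_loss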
Adversarial Training Demo
Attacking remotely hosted black-box models
(1) The adversary queries the remote ML system for labels on inputs of its choice.
(2) The adversary uses this labeled data to train a local substitute for the remote system.
(3) The adversary selects new synthetic inputs for queries to the remote ML system based on the local substitute's output surface sensitivity to input variations.
(4) The adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system because of transferability.
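Step (3) is the Jacobian-based dataset augmentation of Papernot et al. A rough TensorFlow 1.x sketch of producing one new synthetic point per existing input; substitute_logits, current_label, and lam are illustrative assumptions, not the paper's exact code:

import tensorflow as tf

def augment(x, substitute_logits, current_label, lam=0.1):
    # Gradient of the substitute's output for the current label w.r.t. the input:
    # the direction in which the substitute's decision is most sensitive.
    target_output = substitute_logits[:, current_label]
    grad = tf.gradients(target_output, x)[0]
    # New synthetic input, to be sent to the remote system for labeling.
    return x + lam * tf.sign(grad)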
Attacking with transferability
The same approach applies to defended models: adversarial examples crafted on an undefended model transfer to, and are misclassified by, the defended model.
Attacking Adversarial Training with Transferability Demo
How to test your model for adversarial examples?
White-box attacks
One shot
● FastGradientMethod
Iterative/Optimization-based
● BasicIterativeMethod, CarliniWagnerL2
Transferability attacks
● Transfer from an undefended model
● Transfer from a defended model
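A hedged sketch of running these white-box attacks with CleverHans for evaluation, assuming the v2-era API, a CleverHans-wrapped model, a tf.Session sess, and a batch of test inputs x_test (names and parameter values are illustrative):

from cleverhans.attacks import FastGradientMethod, BasicIterativeMethod, CarliniWagnerL2

# One-shot attack
fgsm = FastGradientMethod(model, sess=sess)
adv_fgsm = fgsm.generate_np(x_test, eps=0.3, clip_min=0., clip_max=1.)

# Iterative attack
bim = BasicIterativeMethod(model, sess=sess)
adv_bim = bim.generate_np(x_test, eps=0.3, eps_iter=0.05, nb_iter=10,
                          clip_min=0., clip_max=1.)

# Optimization-based attack (slower, stronger)
cw = CarliniWagnerL2(model, sess=sess)
adv_cw = cw.generate_np(x_test, max_iterations=1000, clip_min=0., clip_max=1.)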
Defenses
Adversarial training:
- Original variant
- Ensemble adversarial training
- Madry et al.
Reduce dimensionality of the input space:
- Binarization of the inputs (see the sketch below)
- Thermometer encoding
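For example, binarization of the inputs can be implemented as a one-line preprocessing step; a minimal sketch assuming pixel values in [0, 1] (the threshold is illustrative):

import tensorflow as tf

def binarize(x, threshold=0.5):
    # Map each pixel to exactly 0 or 1 before feeding it to the classifier
    return tf.cast(x > threshold, tf.float32)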
Adversarial examples represent worst-case distribution drifts
[DDS04] Dalvi et al. Adversarial Classification (KDD)
Adversarial examples are a tangible instance of hypothetical AI safety problems
Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg
How to reach out to us?
Nicholas Carlini: nicholas@carlini.com
Nicolas Papernot: nicolas@papernot.fr