Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples
Nicholas Carlini, Google Research
Why should we care about adversarial examples? To make ML robust, and to make ML better.
How do we generate adversarial examples?
[Figure: perturbing an image in a random direction leaves the prediction unchanged (the truck stays a truck, the dog stays a dog), while perturbing it in an adversarial direction changes it (e.g., the dog is classified as an airplane).]
Defenses to Adversarial Examples
A defense is a neural network that:
1. Is accurate on the test data
2. Resists adversarial examples
For example: Adversarial Training. Claim: neural networks don't generalize. (Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. Towards deep learning models resistant to adversarial attacks. ICLR 2018)
Normal Training: train a model F on labeled pairs, e.g. an image of a 7 with label 7 and an image of a 3 with label 3.
Adversarial Training (1): attack the current model to turn each training image into an adversarial example, keeping its correct label.
Adversarial Training (2): train a new model G on both the original and the attacked images, each paired with its correct label.
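As a rough illustration, here is a minimal sketch of that training loop, assuming a PyTorch image classifier with pixel values in [0, 1]. The function names and hyperparameters (eps, alpha, steps) are my own placeholder choices, not values from the paper.

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
        # Projected gradient descent inside the L-infinity ball of radius eps around x.
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad, = torch.autograd.grad(loss, x_adv)
            # Ascend the loss, then project back into the eps-ball and the valid pixel range.
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv.detach()

    def adversarial_training_step(model, optimizer, x, y):
        # One step of adversarial training: attack the current model, then train on the
        # attacked images paired with their original (correct) labels.
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()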
Or: Thermometer Encoding. Claim: neural networks are "overly linear". (Buckman, J., Roy, A., Raffel, C., & Goodfellow, I. Thermometer encoding: One hot way to resist adversarial examples. ICLR 2018)
Solution: discretize each pixel with a thermometer encoding, e.g.
T(0.13) = 1 1 0 0 0 0 0 0 0 0
T(0.66) = 1 1 1 1 1 1 0 0 0 0
T(0.97) = 1 1 1 1 1 1 1 1 1 1
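A minimal sketch of the encoding, assuming 10 levels and pixel values in [0, 1]; the exact thresholding convention (strict vs. non-strict comparison) is my assumption and may differ slightly from the paper's.

    import numpy as np

    def thermometer_encode(x, levels=10):
        # Bit i is on when the pixel value exceeds i / levels, so larger values
        # light up a longer prefix of ones (like mercury rising in a thermometer).
        x = np.asarray(x, dtype=np.float32)
        thresholds = np.arange(levels, dtype=np.float32) / levels  # 0.0, 0.1, ..., 0.9
        return (x[..., None] > thresholds).astype(np.float32)

    # thermometer_encode(0.13) -> [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]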
Or: Input Transformations. Claim: adversarial perturbations are brittle. (Guo, C., Rana, M., Cisse, M., & Van Der Maaten, L. Countering adversarial images using input transformations. ICLR 2018)
Solution: apply a random transformation to the input, or JPEG-compress it, before classification.
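A minimal sketch of the JPEG variant, assuming an image given as an (H, W, 3) float array in [0, 1]; the quality setting is an arbitrary placeholder, not a value from the paper.

    import io
    import numpy as np
    from PIL import Image

    def jpeg_compress(x, quality=75):
        # Round-trip the image through JPEG so that small, brittle perturbations
        # are (hopefully) destroyed before the classifier ever sees them.
        img = Image.fromarray((np.clip(x, 0.0, 1.0) * 255).astype(np.uint8))
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        return np.asarray(Image.open(buf), dtype=np.float32) / 255.0

    # prediction = model.predict(jpeg_compress(x)[None])  # hypothetical Keras-style model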
Evaluating the Robustness of Defenses
What does it mean to evaluate the robustness of a defense?
Standard ML Evaluations

    model = train_model(x_train, y_train)
    acc, loss = model.evaluate(x_test, y_test)
    if acc > 0.96:
        print("State-of-the-art")
    else:
        print("Keep Tuning Hyperparameters")
What are robustness evaluations?
Adversarial ML Evaluations

    model = train_model(x_train, y_train)
    acc, loss = model.evaluate(A(x_test, model), y_test)
    if acc > 0.96:
        print("State-of-the-art")
    else:
        print("Keep Tuning Hyperparameters")
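For concreteness, A(x_test, model) above could be instantiated with an attack such as the pgd_attack sketched earlier. A hypothetical wrapper (assuming a PyTorch model and numpy test data, which is looser than the Keras-style pseudocode above) might look like:

    import numpy as np
    import torch

    def A(x_test, model, eps=0.3):
        # Perturb every test input with PGD, using the model's own predictions
        # as the labels to move away from (an untargeted attack).
        x = torch.as_tensor(x_test, dtype=torch.float32)
        with torch.no_grad():
            y_pred = model(x).argmax(dim=1)
        return pgd_attack(model, x, y_pred, eps=eps).numpy()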
How complete are evaluations?
Case Study: ICLR 2018
These papers made a serious effort to evaluate: by space, most papers are roughly ½ evaluation.
We re-evaluated these defenses ...
[Chart: of these ICLR 2018 defenses, 2 were out of scope, 7 were broken, and 4 remained correct.]
So what did defenses do?
Lessons Learned
Lessons (1 of 3): What types of defenses are effective
First class of effective defenses: Adversarial Training
Second class of effective defenses: _______________
Lessons (2 of 3): What we've learned from evaluations
So how to attack it?
"Fixing" Gradient Descent [0.1, 0.3, 0.0, 0.2, 0.4]
Lessons (3 of 3): Performing better evaluations
Actionable advice requires specific, concrete examples. Everything the following papers do is standard practice.
Perform an adaptive attack
A "hold out" set is not an adaptive attack
Stop using FGSM (exclusively)
Use more than 100 (or 1000?) iterations of gradient descent
Iterative attacks should always do better than single-step attacks.
Unbounded optimization attacks should eventually reach 0% accuracy
Model accuracy should be monotonically decreasing
Evaluate against the worst attack
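Concretely, "worst" should be taken per example: a test point only counts as robust if no attack fools the model on it. A sketch, assuming a Keras-style model and attacks given as functions (x, y) -> x_adv:

    import numpy as np

    def worst_case_accuracy(model, attacks, x_test, y_test):
        # An example counts as correct only if it survives EVERY attack,
        # i.e. accuracy against the strongest (union) adversary.
        correct = np.ones(len(x_test), dtype=bool)
        for attack in attacks:
            preds = model.predict(attack(x_test, y_test)).argmax(axis=1)
            correct &= (preds == y_test)
        return correct.mean()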
Plot accuracy vs distortion
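A sketch of such a plot, assuming an attack parameterized by an L-infinity budget eps; all names here are placeholders.

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_accuracy_vs_distortion(model, attack, x_test, y_test, epsilons):
        accs = []
        for eps in epsilons:
            preds = model.predict(attack(x_test, y_test, eps)).argmax(axis=1)
            accs.append((preds == y_test).mean())
        plt.plot(epsilons, accs, marker="o")
        plt.xlabel("Distortion bound (epsilon)")
        plt.ylabel("Adversarial accuracy")
        plt.show()
        return accs

The curve should start at clean accuracy at eps = 0 and fall toward 0% as eps grows, which ties this plot to the sanity checks above.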
Verify enough iterations of gradient descent
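One simple check, reusing the pgd_attack sketched earlier and assuming a PyTorch model: rerun the attack with progressively more iterations and confirm the reported accuracy has stopped dropping.

    import torch

    def check_attack_convergence(model, x, y, step_counts=(100, 1000, 10000)):
        # If accuracy keeps falling as the iteration count grows, the attack has not
        # converged and the reported robustness is an overestimate.
        for steps in step_counts:
            x_adv = pgd_attack(model, x, y, steps=steps)
            with torch.no_grad():
                acc = (model(x_adv).argmax(dim=1) == y).float().mean().item()
            print(f"{steps} iterations -> adversarial accuracy {acc:.3f}")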
Try gradient-free attack algorithms
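If gradient-based attacks fail suspiciously, a gradient-free estimate can reveal whether the gradients (rather than the model) were the problem. A sketch of an NES-style finite-difference gradient estimate, assuming a PyTorch model; hyperparameters are placeholders.

    import torch
    import torch.nn.functional as F

    def nes_gradient(model, x, y, sigma=1e-3, samples=50):
        # Estimate d(loss)/dx from loss values only, using antithetic Gaussian probes.
        grad = torch.zeros_like(x)
        with torch.no_grad():
            for _ in range(samples):
                u = torch.randn_like(x)
                loss_plus = F.cross_entropy(model(x + sigma * u), y)
                loss_minus = F.cross_entropy(model(x - sigma * u), y)
                grad += (loss_plus - loss_minus) / (2.0 * sigma) * u
        return grad / samples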
Try random noise
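A sketch of the random-noise sanity check, assuming a Keras-style model and inputs in [0, 1]: if noise of the same magnitude as the attack already breaks the model, or the attack does no better than noise, something is wrong with the evaluation.

    import numpy as np

    def random_noise_accuracy(model, x_test, y_test, eps=0.3, trials=10):
        # Accuracy when each input is hit with several random L-infinity perturbations;
        # an example counts as correct only if it survives all of them.
        correct = np.ones(len(x_test), dtype=bool)
        for _ in range(trials):
            noise = np.random.uniform(-eps, eps, size=x_test.shape)
            preds = model.predict(np.clip(x_test + noise, 0.0, 1.0)).argmax(axis=1)
            correct &= (preds == y_test)
        return correct.mean()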
The Future
The Year is 1997