Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples

  1. Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples Nicholas Carlini Google Research

  2. Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples

  4. Why should we care about adversarial examples? Make ML robust. Make ML better.

  5. How do we generate adversarial examples?

  6. [Figure: moving in a random direction from the original image leaves the label unchanged: the truck is still Truck, the dog is still Dog]

  7. [Figure: a random direction keeps the label (Truck), but adversarial directions change it (to Dog, to Airplane)]

  9. Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples

  10. A defense is a neural network that (1) is accurate on the test data and (2) resists adversarial examples

  11. For example: Adversarial Training. Claim: neural networks don't generalize. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. Towards deep learning models resistant to adversarial attacks. ICLR 2018

  12. Normal Training: train a classifier F on labeled pairs such as (image of a 7, label 7) and (image of a 3, label 3)

  13. Adversarial Training (1): attack each training image to produce an adversarial version while keeping its original label, e.g. (adversarial 7, label 7) and (adversarial 3, label 3)

  14. Adversarial Training (2): train a new classifier G on the clean and adversarial pairs together
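      A rough sketch of the training loop described on slides 12-14, in the same pseudocode style as the later slides (train_model and attack are placeholders, not a specific library API):

      import numpy as np

      def adversarial_training(x_train, y_train, attack, rounds=10):
          # Slide 12: train a normal classifier F on the clean data.
          model = train_model(x_train, y_train)
          for _ in range(rounds):
              # Slide 13: attack the training images, keeping their original labels.
              x_adv = attack(model, x_train, y_train)
              # Slide 14: train a new classifier G on clean + adversarial examples.
              model = train_model(np.concatenate([x_train, x_adv]),
                                  np.concatenate([y_train, y_train]))
          return model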

  15. Or: Thermometer Encoding. Claim: neural networks are "overly linear". Buckman, J., Roy, A., Raffel, C., & Goodfellow, I. Thermometer encoding: One hot way to resist adversarial examples. ICLR 2018

  16. Solution:
      T(0.13) = 1 1 0 0 0 0 0 0 0 0
      T(0.66) = 1 1 1 1 1 1 0 0 0 0
      T(0.97) = 1 1 1 1 1 1 1 1 1 1
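      A minimal sketch of the encoding shown above, with k = 10 levels (the paper's exact threshold convention may differ by one level):

      import numpy as np

      def thermometer_encode(x, k=10):
          # Bit i is 1 whenever the pixel value x exceeds the threshold i/k.
          thresholds = np.arange(k) / k
          return (x > thresholds).astype(np.int32)

      print(thermometer_encode(0.13))  # [1 1 0 0 0 0 0 0 0 0]
      print(thermometer_encode(0.97))  # [1 1 1 1 1 1 1 1 1 1]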

  17. Or: Input Transformations. Claim: perturbations are brittle. Guo, C., Rana, M., Cisse, M., & Van Der Maaten, L. Countering adversarial images using input transformations. ICLR 2018

  18. Solution: Random Transform

  19. Solution: JPEG compression
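      One way this transform could be implemented (a sketch using Pillow; the defense's exact pipeline and quality setting may differ):

      import io
      import numpy as np
      from PIL import Image

      def jpeg_compress(image, quality=75):
          # image: H x W x 3 uint8 array; returns the JPEG round-tripped version.
          buffer = io.BytesIO()
          Image.fromarray(image).save(buffer, format="JPEG", quality=quality)
          buffer.seek(0)
          return np.array(Image.open(buffer))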

  20. Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples

  21. What does it mean to evaluate the robustness of a defense?

  22. Standard ML Pipeline

      model = train_model(x_train, y_train)
      acc, loss = model.evaluate(x_test, y_test)
      if acc > 0.96:
          print("State-of-the-art")
      else:
          print("Keep Tuning Hyperparameters")

  25. Standard ML Evaluations

      model = train_model(x_train, y_train)
      acc, loss = model.evaluate(x_test, y_test)
      if acc > 0.96:
          print("State-of-the-art")
      else:
          print("Keep Tuning Hyperparameters")

  27. What are robustness evaluations?

  29. Adversarial ML Evaluations

      model = train_model(x_train, y_train)
      acc, loss = model.evaluate(A(x_test, model), y_test)
      if acc > 0.96:
          print("State-of-the-art")
      else:
          print("Keep Tuning Hyperparameters")
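
      Here A is the attack: a function that perturbs each test input to try to change the model's prediction, so the reported accuracy is accuracy under attack. A minimal single-step sketch (loss_grad and predict_labels are hypothetical helpers; the lessons later in the talk warn that a single-step attack alone is not an adequate evaluation):

      import numpy as np

      def A(x, model, eps=0.03):
          # One signed-gradient step of size eps away from the current
          # prediction, clipped back to the valid image range.
          grad = loss_grad(model, x, predict_labels(model, x))
          return np.clip(x + eps * np.sign(grad), 0.0, 1.0)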

  30. How complete are evaluations?

  31. Case Study: ICLR 2018

  32. Serious effort to evaluate: by space, most papers are ½ evaluation

  33. We re-evaluated these defenses ...

  34. [Pie chart of the re-evaluated ICLR 2018 defenses: out of scope, broken defenses, and correct defenses, with counts of 2, 4, and 7]

  37. So what did defenses do?

  38. Lessons Learned from Evaluating the Robustness of Defenses to Adversarial Examples

  39. Lessons (1 of 3): what types of defenses are effective

  40. First class of effective defenses:

  41. First class of effective defenses: Adversarial Training

  42. Second class of effective defenses:

  43. Second class of effective defenses: _______________

  44. Lessons (2 of 3): what we've learned from evaluations

  45. So how to attack it?

  46. "Fixing" Gradient Descent: [0.1, 0.3, 0.0, 0.2, 0.4]

  47. Lessons (3 of 3): performing better evaluations

  48. Actionable advice requires specific, concrete examples. Everything the following papers do is standard practice.

  49. Perform an adaptive attack

  50. A "hold out" set is not an adaptive attack

  51. Stop using FGSM (exclusively)

  52. Use more than 100 (or 1000?) iterations of gradient descent
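      A rough sketch of such an attack: many-iteration projected gradient descent under an L-infinity bound. Here loss_grad is a hypothetical helper returning the gradient of the loss with respect to the input, and the eps/alpha values are only illustrative:

      import numpy as np

      def pgd_attack(model, loss_grad, x, y, eps=0.03, alpha=0.001, iterations=1000):
          x_adv = x + np.random.uniform(-eps, eps, size=x.shape)  # random start
          for _ in range(iterations):
              grad = loss_grad(model, x_adv, y)
              x_adv = x_adv + alpha * np.sign(grad)      # ascend the loss
              x_adv = np.clip(x_adv, x - eps, x + eps)   # project into the eps-ball
              x_adv = np.clip(x_adv, 0.0, 1.0)           # keep a valid image
          return x_adv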

  53. Iterative attacks should always do better than single step attacks.

  54. Unbounded optimization attacks should eventually reach 0% accuracy

  57. Model accuracy should be monotonically decreasing

  59. Evaluate against the worst attack
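      "Worst attack" is per example: an input only counts as robust if every attack tried fails on it. A sketch (predict is a placeholder for the model's label prediction):

      import numpy as np

      def robust_accuracy(model, attacks, x_test, y_test):
          robust = np.ones(len(x_test), dtype=bool)
          for attack in attacks:
              x_adv = attack(model, x_test, y_test)
              robust &= (predict(model, x_adv) == y_test)
          return robust.mean()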

  60. Plot accuracy vs distortion
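      One way such a curve could be produced: re-run the attack at a range of distortion bounds and record accuracy at each (attack and accuracy are placeholders for the evaluation code above). Accuracy should fall monotonically and, for an unbounded attack, eventually reach 0%:

      def accuracy_vs_distortion(model, attack, x_test, y_test, eps_values):
          curve = []
          for eps in eps_values:
              x_adv = attack(model, x_test, y_test, eps=eps)
              curve.append((eps, accuracy(model, x_adv, y_test)))
          return curve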

  61. Verify enough iterations of gradient descent

  62. Try gradient-free attack algorithms

  63. Try random noise
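      A random-noise baseline as a sanity check: if uniform noise of the same magnitude as the attack already breaks the model, the gradient-based evaluation is likely flawed (predict is again a placeholder):

      import numpy as np

      def random_noise_accuracy(model, x_test, y_test, eps=0.03, trials=10):
          correct = np.ones(len(x_test), dtype=bool)
          for _ in range(trials):
              noise = np.random.uniform(-eps, eps, size=x_test.shape)
              correct &= (predict(model, np.clip(x_test + noise, 0.0, 1.0)) == y_test)
          return correct.mean()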

  64. The Future

  65. The Year is 1997
