ADVERSARIAL EXAMPLES (In 15 minutes or less) Neill Patterson, MscAC - PowerPoint PPT Presentation

  1. ADVERSARIAL EXAMPLES (In 15 minutes or less) Neill Patterson, MscAC

  2. PART I - BASIC CONCEPTS

  3. WE TRAIN MODELS BY TAKING GRADIENTS W.R.T. WEIGHTS: w ← w − η ∇w J

  4. “Panda” Change weights via gradient descent

  5. WE’RE GOING TO TAKE GRADIENTS W.R.T. PIXELS INSTEAD: x ← x ± η ∇x J

  6. WE ARE GOING TO TAKE GRADIENTS W.R.T. PIXELS INSTEAD: x ← x ± η ∇x J

  7. “Panda” → change pixels via gradient descent → “Vulture”
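The contrast between slides 3 and 5 can be sketched with a toy model. This is a minimal illustration, assuming a hypothetical one-parameter-vector logistic model (none of these names come from the talk): the same loss J yields one gradient for training the weights and another, same-shaped-as-the-image gradient for attacking the pixels.

```python
import numpy as np

def loss(w, x, y):
    """Cross-entropy loss of a toy logistic model p = sigmoid(w . x)."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_wrt_weights(w, x, y):
    """dJ/dw: the gradient used for ordinary training (slide 3)."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (p - y) * x

def grad_wrt_pixels(w, x, y):
    """dJ/dx: same shape as the input image, used for the attack (slide 5)."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (p - y) * w

w = np.array([0.5, -1.0, 2.0])   # weights
x = np.array([1.0, 2.0, -1.0])   # a tiny 3-"pixel" image
y = 1.0                          # true label

eta = 0.1
w_step = w - eta * grad_wrt_weights(w, x, y)  # descend: loss goes down
x_step = x + eta * grad_wrt_pixels(w, x, y)   # ascend: loss goes up
```

Stepping the weights downhill lowers the loss; stepping the pixels uphill raises it, which is exactly the misclassification direction.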

  8. KEY IDEA: ADD SMALL, WORST- CASE PIXEL DISTORTION TO CAUSE MISCLASSIFICATIONS

  9. “Panda” “Gibbon” + = 58% confidence 99% confidence

  10. THINK OF ADVERSARIAL EXAMPLES AS WORST-CASE DOPPELGÄNGERS

  11. DEMO

  12. “Sanja Fidler” → “Fiddler Crab”

  13. PART II - HARNESSING ADVERSARIAL EXAMPLES

  14. KEY IDEA: MAKE TRAINING MORE DIFFICULT TO GET STRONGER MODELS (DROPOUT, RANDOM NOISE, ETC)

  15. TRAIN WITH ADVERSARIAL EXAMPLES FOR BETTER GENERALIZATION

  16. THE FAST GRADIENT SIGN METHOD OF IAN GOODFELLOW

  17. QUICKLY GENERATING ADVERSARIAL EXAMPLES

  18. WHAT DIRECTION SHOULD YOU MOVE TOWARDS?

  19. INSTEAD OF MOVING TOWARDS A SPECIFIC TYPE OF ERROR, MOVE AWAY FROM THE CORRECT LABEL

  20. “House” “Panda” “Truck” “Vulture”

  21. HOW BIG A STEP SHOULD YOU TAKE IF YOU WANT IMPERCEPTIBLE DISTORTION?

  22. PIXELS ARE STORED AS SIGNED 8-BIT INTEGERS. ADD JUST LESS THAN 1 BIT OF DISTORTION TO EACH PIXEL: 0.007 < 1/2⁷ ≈ 0.008

  23. WE WANT PRECISELY THIS AMOUNT OF DISTORTION, SO NO MATTER HOW SMALL (OR BIG) THE GRADIENT, JUST TAKE THE SIGN OF IT AND MULTIPLY BY 0.007: x + 0.007 · sign(∇x J)
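The fast gradient sign step above is a one-liner. A minimal sketch (the function name and the toy arrays are ours, not the talk's; 0.007 is the ε from Goodfellow's panda example):

```python
import numpy as np

def fgsm(x, grad_x, eps=0.007):
    """Fast gradient sign method: move every pixel by exactly eps in the
    direction that increases the loss, ignoring the gradient's magnitude."""
    return x + eps * np.sign(grad_x)

x = np.array([0.20, 0.50, 0.90])
g = np.array([-3.1, 0.002, 12.0])  # gradient magnitudes vary wildly
x_adv = fgsm(x, g)                 # but every pixel moves by exactly eps
```

Because only the sign survives, a tiny gradient component perturbs its pixel just as much as a huge one, giving a uniform, imperceptible distortion.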

  24. INCORPORATING ADVERSARIAL EXAMPLES INTO YOUR COST FUNCTION

  25. GENERATE ADVERSARIAL EXAMPLES AT EACH ITERATION OF TRAINING, BUT WE DON’T WANT TO KEEP THEM AROUND IN MEMORY FOREVER

  26. INSTEAD, MODIFY THE COST FUNCTION TO BE A COMBINATION OF ORIGINAL AND ADVERSARIAL INPUTS

  27. New cost function: J̃(θ, x, y), where θ are the parameters, x the inputs, and y the labels

  28. J̃(θ, x, y) = J(θ, x, y) + …, where the first term is the old cost function

  29. J̃(θ, x, y) = J(θ, x, y) + J(θ, x + ε sign(∇x J), y), where x + ε sign(∇x J) is the adversarial example

  30. J̃(θ, x, y) = α J(θ, x, y) + (1 − α) J(θ, x + ε sign(∇x J), y), with α the mixing coefficient

  31. J̃(θ, x, y) = α J(θ, x, y) + (1 − α) J(θ, x + ε sign(∇x J), y): “Train with a mix of original and adversarial examples”

  32. NOW DO S.G.D. ON THIS NEW COST FUNCTION, BY TAKING GRADIENTS W.R.T. WEIGHTS: w ← w − η ∇w J̃
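The mixed objective in slides 27–31 can be sketched end to end. This is a toy logistic model (all helper names are ours, not from the talk); the point is that the adversarial example is regenerated fresh inside the loss at every call, so nothing needs to be kept in memory:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, x, y):
    """Old cost function J(θ, x, y): cross-entropy of a logistic model."""
    p = sigmoid(np.dot(w, x))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_x(w, x, y):
    """dJ/dx for the logistic model (closed form, no autodiff needed)."""
    return (sigmoid(np.dot(w, x)) - y) * w

def adversarial_loss(w, x, y, eps=0.007, alpha=0.5):
    """New cost function J~: alpha * clean loss + (1 - alpha) * loss on a
    freshly generated FGSM input. SGD then descends this w.r.t. w."""
    x_adv = x + eps * np.sign(grad_x(w, x, y))
    return alpha * loss(w, x, y) + (1 - alpha) * loss(w, x_adv, y)

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, -1.0])
y = 1.0
```

With α = 1 the objective reduces to the old cost function; with α < 1 the adversarial term penalizes the model's worst-case neighborhood, so J̃ ≥ J on any point where the FGSM step succeeds.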

  33. PART III - MISCELLANEOUS TIPS FOR TRAINING

  34. YOU NEED MORE MODEL CAPACITY (ADVERSARIAL EXAMPLES DO NOT LIE ON THE MANIFOLD OF REALISTIC IMAGES)

  35. FOR EARLY STOPPING, BASE YOUR DECISION ON THE VALIDATION ERROR OF ADVERSARIAL EXAMPLES ONLY
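The early-stopping tip on slide 35 amounts to tracking only the adversarial validation error. A minimal sketch, assuming a hypothetical patience-based rule (the function name and patience parameter are ours):

```python
def early_stop(adv_val_errors, patience=3):
    """Stop when the *adversarial* validation error has not improved for
    `patience` epochs; clean validation error is ignored, per the tip."""
    best_epoch = adv_val_errors.index(min(adv_val_errors))
    epochs_since_best = len(adv_val_errors) - 1 - best_epoch
    return epochs_since_best >= patience
```

Basing the decision on clean validation error would stop too early, before the model has become robust to the perturbed inputs.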

  36. RESULTS

  37. BETTER GENERALIZATION ABOVE AND BEYOND DROPOUT: 0.94% error → 0.84% error (MNIST)

  38. BETTER GENERALIZATION ABOVE AND BEYOND DROPOUT: 0.94% error → 0.84% error (MNIST)

  39. RESISTANCE TO ADVERSARIAL EXAMPLES: 89.4% error → 17.9% error (97.6% confidence)

  40. MATHEMATICAL PROPERTIES OF ADVERSARIAL EXAMPLES

  41. MATHEMATICAL PROPERTIES OF ADVERSARIAL EXAMPLES (Ain’t nobody got time for that)

  42. THANK YOU FOR YOUR TIME!
