Adversarial Examples and Adversarial Training Innovative Technology Leader program January 22nd 2018 Florian Tramèr Stanford
Deep Learning is Super Smart! 2
Is it really? [Figure: panda image + .007 × perturbation = adversarial image. Before: "I'm sure this is a panda". After: "I'm certain this is a gibbon (or an airplane)"] (Goodfellow et al. 2015) 3
Adversarial Examples in ML • Images Szegedy et al. 2013, Nguyen et al. 2015, Goodfellow et al. 2015, Papernot et al. 2016, Liu et al. 2016, Kurakin et al. 2016, … • Physical Objects Sharif et al. 2016, Kurakin et al. 2017, Evtimov et al. 2017, Lu et al. 2017, Athalye et al. 2017 • Malware Šrndić & Laskov 2014, Xu et al. 2016, Grosse et al. 2016, Hu et al. 2017 • Text Understanding Papernot et al. 2016, Jia & Liang 2017 • Speech Carlini et al. 2015, Cisse et al. 2017 4
Creating an adversarial example [Diagram: bird image → ML Model → class scores (bird, tree, plane) → Loss against the true label "bird"] What happens if I nudge this pixel? 5
Creating an adversarial example [Diagram: bird image → ML Model → class scores (bird, tree, plane) → Loss against the true label "bird"] What about this one? Maximize loss with gradient ascent 7
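The "nudge each pixel, then maximize the loss with gradient ascent" idea can be sketched on a toy model. This is a minimal illustration, not the method of any cited paper: the linear softmax classifier, the weights `W` and `b`, the two-feature "image" `x`, and the parameters `eps`, `step` and `iters` are all assumptions made up for the example. The attack takes signed gradient steps on the input and keeps the result within an l_inf ball of radius `eps` around the original.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def loss(W, b, x, y):
    """Cross-entropy loss of the toy linear model on input x with true label y."""
    return -math.log(softmax(logits(W, b, x))[y])

def loss_grad_wrt_input(W, b, x, y):
    """Gradient of the loss with respect to the input: 'what if I nudge this pixel?'"""
    p = softmax(logits(W, b, x))
    # d loss / d logit_k = p_k - 1[k == y]; chain rule through logits = W x + b.
    d = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(d[k] * W[k][i] for k in range(len(W))) for i in range(len(x))]

def gradient_ascent_attack(W, b, x, y, eps=0.1, step=0.02, iters=10):
    """Signed gradient ascent on the loss, clipped to an l_inf ball of radius eps."""
    x_adv = list(x)
    for _ in range(iters):
        g = loss_grad_wrt_input(W, b, x_adv, y)
        x_adv = [xi + step * ((gi > 0) - (gi < 0)) for xi, gi in zip(x_adv, g)]
        # Project back so no coordinate moves more than eps from the original.
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

# Hypothetical 3-class model (bird=0, tree=1, plane=2) on a 2-feature input.
W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
b = [0.0, 0.0, 0.0]
x, y = [1.0, 0.2], 0

x_adv = gradient_ascent_attack(W, b, x, y)
clean_loss = loss(W, b, x, y)
adv_loss = loss(W, b, x_adv, y)
```

On this toy model the loss on `x_adv` is higher than on `x` while every coordinate stays within `eps` of the original. Whether the predicted class actually flips depends on `eps` and on the model.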
Threat Model: Black-Box Attacks [Diagram: an adversarial example crafted against one ML Model is also classified as "plane" by other ML Models] Adversarial examples transfer 8
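Transfer can be illustrated with two toy logistic models. In this sketch the attacker only has gradient access to a local surrogate and never queries gradients of the target. The weights of both models, the input `x`, and `eps` are hypothetical values chosen so that the two decision boundaries are similar but not identical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """One signed-gradient step on the local model's cross-entropy loss."""
    p = predict(w, b, x)
    # For logistic regression, d loss / d x_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

# Hypothetical weights: the attacker's surrogate and the remote target are
# assumed to have learned similar, but not identical, decision boundaries.
surrogate_w, surrogate_b = [2.0, -1.0], 0.0
target_w, target_b = [1.8, -1.3], 0.1

x, y = [0.3, 0.1], 1  # input whose true label is class 1

# The adversarial example is crafted using the surrogate only.
x_adv = fgsm(surrogate_w, surrogate_b, x, y, eps=0.3)

target_clean = predict(target_w, target_b, x) > 0.5      # target's decision on x
target_adv = predict(target_w, target_b, x_adv) > 0.5    # target's decision on x_adv
```

With these values the target classifies `x` as class 1 but classifies `x_adv` as class 0, even though the perturbation was computed against a different model.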
Defenses? • Ensembles • Preprocessing (blurring, cropping, etc.) • Distillation • Generative modeling • Adversarial training 9
Adversarial Training [Diagram: bird image → ML Model → Loss; attack → adversarial image (classified "plane") → ML Model → Loss] 10
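The training loop in the diagram can be sketched as follows: at every step, attack the current model, then take the gradient step on the attacked input instead of the clean one. This is a minimal sketch on a logistic model. The four-point dataset, `eps`, the learning rate `lr`, and the number of epochs are assumptions for illustration, and the attack is a single signed-gradient step (for a linear model that is the worst-case l_inf perturbation).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """One signed-gradient step that increases the loss of the current model."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]  # d loss / d x_i for logistic regression
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

def adversarial_train(data, eps=0.5, lr=0.1, epochs=100):
    """SGD on adversarial examples regenerated against the current weights."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps)      # attack the current model
            err = predict(w, b, x_adv) - y     # d loss / d logit on the attacked input
            w = [wi - lr * err * xi for wi, xi in zip(w, x_adv)]
            b -= lr * err
    return w, b

# Hypothetical, well-separated toy data: (features, label).
data = [([2.0, 2.0], 1), ([1.5, 2.5], 1), ([-2.0, -2.0], 0), ([-2.5, -1.5], 0)]
w, b = adversarial_train(data)

# Count training points still classified correctly after a fresh attack.
robust_hits = sum(
    (predict(w, b, fgsm(w, b, x, y, 0.5)) > 0.5) == (y == 1) for x, y in data
)
```

On this toy data every training point stays correctly classified under the same attack it was trained against. That says nothing about other perturbation types, which is the "makes assumptions on attacks" caveat.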
Adversarial Training +/- • Pros – Intuitive approach – Gives strong formal and empirical guarantees • Cons – Makes assumptions on attacks – Can overfit (gradient masking) [Figure: variations of the bird class: l_p noise, rotations, lighting] 11
Gradient-Masking: A non-defense [Figure: decision boundaries between airplanes and birds for a "smooth" model and a "non-smooth" model] • "smooth" model – Gradient-based attacks work – Black-box attacks work – Model is not robust! • "non-smooth" model – Model has no useful gradients – Black-box attacks still work! – Model is not robust either! TKPBM, "Ensemble Adversarial Training: Attacks and Defenses", 2017 12
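A toy illustration of why masked gradients are not a defense. The "non-smooth" model below outputs a hard 0/1 decision, so its gradient is zero almost everywhere and a gradient-based attack on it goes nowhere. A perturbation computed on a smooth surrogate still fools it. Both models, their weights, the finite-difference gradient, and `eps` are assumptions made up for this sketch.

```python
import math

def smooth_surrogate(x):
    """Attacker's local model: sigmoid output, so gradients are informative."""
    z = 1.8 * x[0] - 1.3 * x[1] + 0.1
    return 1.0 / (1.0 + math.exp(-z))

def masked_model(x):
    """Defended model: similar boundary, but a hard 0/1 output (flat almost everywhere)."""
    return 1.0 if 2.0 * x[0] - 1.0 * x[1] > 0 else 0.0

def numeric_grad(f, x, h=1e-4):
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = [v + (h if j == i else 0.0) for j, v in enumerate(x)]
        down = [v - (h if j == i else 0.0) for j, v in enumerate(x)]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def signed_step(f, x, y, eps):
    """Move each coordinate by eps so that f's output moves away from label y."""
    g = numeric_grad(f, x)
    direction = -1 if y == 1 else 1
    return [xi + eps * direction * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]

x, y = [0.3, 0.1], 1  # input whose true label is class 1

# Gradient-based attack directly on the masked model: the gradient is zero here.
x_whitebox = signed_step(masked_model, x, y, eps=0.3)

# Black-box attack: craft on the smooth surrogate, then transfer.
x_transfer = signed_step(smooth_surrogate, x, y, eps=0.3)

whitebox_pred = masked_model(x_whitebox)
transfer_pred = masked_model(x_transfer)
```

Here `x_whitebox` equals `x`, so the masked model still answers class 1 and appears robust to the gradient-based attack, while `x_transfer` is classified as class 0.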