AI and Security: Lessons, Challenges & Future Directions Dawn Song UC Berkeley
AI and Security
• AI enables security applications (AI as an enabler for security)
• Security enables better AI (security as an enabler for AI):
  • Integrity: produces intended/correct results (adversarial machine learning)
  • Confidentiality/Privacy: does not leak users’ sensitive data (secure, privacy-preserving machine learning)
• Preventing misuse of AI
AI and Security: AI in the presence of an attacker
AI and Security: AI in the presence of an attacker
• Important to consider the presence of an attacker
• History has shown that attackers always follow the footsteps of new technology development (or sometimes even lead it)
• The stakes are even higher with AI:
  • As AI controls more and more systems, attackers will have higher and higher incentives
  • As AI becomes more and more capable, the consequences of misuse by attackers will become more and more severe
AI and Security: AI in the presence of an attacker
• Attack AI
  • Cause the learning system to not produce intended/correct results
  • Cause the learning system to produce a targeted outcome designed by the attacker
  • Learn sensitive information about individuals
  • Need security in learning systems
• Misuse AI
  • Misuse AI to attack other systems
    • Find vulnerabilities in other systems
    • Target attacks
    • Devise attacks
  • Need security in other systems
Deep Learning Systems Are Easily Fooled
[Image: adversarially perturbed photos misclassified as “ostrich”]
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. Intriguing properties of neural networks. ICLR 2014.
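For concreteness, here is a minimal sketch of how such an adversarial example can be crafted with a one-step gradient-sign (FGSM-style) perturbation. This is an illustration only, not the method of the cited paper (which uses an L-BFGS-based optimization); `model`, `x` (a batched image tensor in [0, 1]), and `y` (its label) are assumed to be given.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.01):
    """One-step gradient-sign perturbation that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # loss of the correct label
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # nudge each pixel against the model
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```

A small eps keeps the perturbation imperceptible to humans while often being enough to flip the model’s prediction.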
STOP Signs in Berkeley
Adversarial Examples in Physical World
Can we generate adversarial examples in the physical world that remain effective under different viewing conditions and viewpoints, including viewing distances and angles?
Adversarial Examples in Physical World: Subtle Perturbations
Evtimov, Ivan, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. “Robust Physical-World Attacks on Machine Learning Models.” arXiv preprint arXiv:1707.08945 (2017).
Adversarial Examples in Physical World: Camouflage Perturbations
Adversarial Examples in Physical World
Adversarial perturbations are possible in the physical world under different viewing conditions and viewpoints, including viewing distances and angles.
Loss function: (see the sketch below)
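One plausible form of this loss, following the robust physical perturbation formulation in the paper cited above (the notation here is an assumption, not text recovered from the slide), penalizes the size of the perturbation while pushing the classifier toward the attacker’s target label in expectation over images sampled under different physical conditions:

```latex
\delta^{*} \;=\; \arg\min_{\delta}\;
  \lambda \,\lVert \delta \rVert_{p}
  \;+\; \mathbb{E}_{x_i \sim X^{V}}\;
        J\!\left(f_{\theta}(x_i + \delta),\; y^{*}\right)
```

Here X^V is a set of images of the object taken at varying distances and angles, f_θ is the classifier, J its loss, and y* the attacker’s target label.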
Adversarial Examples Prevalent in Deep Learning Systems
• Most existing work on adversarial examples:
  • Image classification task
  • Target model is known
• Our investigation on adversarial examples:
  • Other tasks and model classes: generative models, deep reinforcement learning, visual QA / image-to-code
  • Weaker threat models: black-box attacks (target model is unknown)
  • New attack methods: provide more diversity of attacks
Generative Models
● VAE-like models (VAE, VAE-GAN) use an intermediate latent representation
● An encoder maps a high-dimensional input into a lower-dimensional latent representation z
● A decoder maps the latent representation z back to a high-dimensional reconstruction
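A minimal sketch of this encoder/decoder structure in PyTorch, sized for 28×28 MNIST-style inputs. The architecture details here are illustrative assumptions, not the exact models evaluated in the papers below.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder: high-dimensional input -> parameters of the latent code z
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 400), nn.ReLU())
        self.mu = nn.Linear(400, latent_dim)
        self.logvar = nn.Linear(400, latent_dim)
        # Decoder: latent code z -> high-dimensional reconstruction
        self.dec = nn.Sequential(nn.Linear(latent_dim, 400), nn.ReLU(),
                                 nn.Linear(400, 784), nn.Sigmoid())

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def decode(self, z):
        return self.dec(z).view(-1, 1, 28, 28)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.decode(z), mu, logvar
```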
Adversarial Examples in Generative Models
● An example attack scenario: the generative model is used as a compression scheme (the encoder compresses, the decoder decompresses)
● Attacker’s goal: make the decompressor reconstruct a different image from the one the compressor sees
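One way to realize this goal is a latent-space attack: find a small perturbation of the source image whose latent code matches that of an attacker-chosen target image, so the decoder reconstructs (roughly) the target instead of the source. The sketch below assumes the VAE interface from the previous block; the paper studies several attack variants, and this is only one of them.

```python
import torch

def latent_attack(vae, x_src, x_tgt, eps=0.1, steps=200, lr=1e-2):
    """Perturb x_src (within an L-inf ball of radius eps) so its latent code matches x_tgt's."""
    with torch.no_grad():
        z_tgt, _ = vae.encode(x_tgt)                 # latent code of the target image
    delta = torch.zeros_like(x_src, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        z_adv, _ = vae.encode((x_src + delta).clamp(0, 1))
        loss = (z_adv - z_tgt).pow(2).sum()          # drive the two latent codes together
        opt.zero_grad()
        loss.backward()
        opt.step()
        delta.data.clamp_(-eps, eps)                 # keep the perturbation small
    return (x_src + delta).clamp(0, 1).detach()
```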
Adversarial Examples for VAE-GAN in MNIST
[Images: target image; original images; reconstructions of original images; adversarial examples; reconstructions of adversarial examples]
Jernej Kos, Ian Fischer, Dawn Song: Adversarial Examples for Generative Models
Adversarial Examples for VAE-GAN in SVHN
[Images: target image; original images; reconstructions of original images; adversarial examples; reconstructions of adversarial examples]
Jernej Kos, Ian Fischer, Dawn Song: Adversarial Examples for Generative Models
Deep Reinforcement Learning Agent (A3C) Playing Pong
[Video: original frames]
Jernej Kos and Dawn Song: Delving into adversarial attacks on deep policies [ICLR Workshop 2017].
Adversarial Examples on A3C Agent on Pong
[Plot: score vs. number of steps]
Jernej Kos and Dawn Song: Delving into adversarial attacks on deep policies [ICLR Workshop 2017].
Attacks Guided by Value Function
[Plots: score vs. number of steps for (a) blindly injecting adversarial perturbations every 10 frames and (b) injecting adversarial perturbations guided by the value function]
Agent in Action
[Videos: original frames; with FGSM perturbations (ε = 0.005) injected in every frame; with FGSM perturbations (ε = 0.005) injected based on the value function]
Jernej Kos and Dawn Song: Delving into adversarial attacks on deep policies [ICLR Workshop 2017].
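A minimal sketch of the value-function-guided strategy compared above: inject an FGSM perturbation only on frames where the agent’s value estimate says the state matters, rather than on every frame. The interfaces are assumptions: `policy` returns action logits and `value_fn` returns a scalar state-value estimate for a (batched) preprocessed frame; the threshold is hypothetical.

```python
import torch
import torch.nn.functional as F

def maybe_perturb(policy, value_fn, frame, eps=0.005, threshold=0.5):
    """Perturb the frame only if the estimated state value exceeds a threshold."""
    if value_fn(frame).item() < threshold:
        return frame                            # low-value state: leave the frame alone
    frame = frame.clone().detach().requires_grad_(True)
    logits = policy(frame)
    action = logits.argmax(dim=-1)              # action the unperturbed agent would take
    loss = F.cross_entropy(logits, action)      # increasing this loss demotes that action
    loss.backward()
    return (frame + eps * frame.grad.sign()).detach()
```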
Visual Q&A Given a question and an image, predict the answer.
Studied VQA Models
Model 1: MCB (https://arxiv.org/abs/1606.01847)
• Uses Multimodal Compact Bilinear pooling to combine the image feature and question embedding.
Studied VQA Models
Model 2: NMN (https://arxiv.org/abs/1704.05526)
• A representative of neural module networks
• First predicts a network layout according to the question, then predicts the answer using the obtained network.
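A sketch of how a targeted attack on such models can be mounted, assuming a differentiable VQA model `vqa_model(image, question)` that returns answer logits over a fixed answer vocabulary. The interface, step count, and perturbation bound are illustrative assumptions; MCB and NMN each have their own APIs and preprocessing.

```python
import torch
import torch.nn.functional as F

def vqa_targeted_attack(vqa_model, image, question, target_answer_id,
                        eps=0.03, steps=100, lr=1e-2):
    """Perturb the image so the model's answer to the question becomes the target answer."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([target_answer_id])
    for _ in range(steps):
        logits = vqa_model(image + delta, question)   # answer logits for the perturbed image
        loss = F.cross_entropy(logits, target)        # pull the prediction toward the target
        opt.zero_grad()
        loss.backward()
        opt.step()
        delta.data.clamp_(-eps, eps)                  # keep the perturbation visually subtle
    return (image + delta).detach()
```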
Question: What color is the sky? Original answer: MCB - blue, NMN - blue. Target: gray. Answer after attack: MCB - gray, NMN - gray.
[Images: benign image; adversarial image for MCB; adversarial image for NMN]
Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darrell, Dawn Song: Can you fool AI with adversarial examples on a visual Turing test?
Question: Is it raining? Original answer: MCB - no, NMN - no. Target: yes. Answer after attack: MCB - yes, NMN - yes.
[Images: benign image; adversarial image for MCB; adversarial image for NMN]
Question: What is on the ground? Original answer: MCB - sand, NMN - sand. Target: snow. Answer after attack: MCB - snow, NMN - snow.
[Images: benign image; adversarial image for MCB; adversarial image for NMN]
Question: Where is the plane? Original answer: MCB - runway, NMN - runway. Target: sky. Answer after attack: MCB - sky, NMN - sky.
[Images: benign image; adversarial image for MCB; adversarial image for NMN]
Question: What color is the traffic light? Original answer: MCB - green, NMN - green. Target: red. Answer after attack: MCB - red, NMN - red.
[Images: benign image; adversarial image for MCB; adversarial image for NMN]
Question: What does the sign say? Original answer: MCB - stop, NMN - stop. Target: one way. Answer after attack: MCB - one way, NMN - one way.
[Images: benign image; adversarial image for MCB; adversarial image for NMN]
Question: How many cats are there? Original answer: MCB - 1, NMN - 1. Target: 2. Answer after attack: MCB - 2, NMN - 2.
[Images: benign image; adversarial image for MCB; adversarial image for NMN]
Adversarial Examples Prevalent in Deep Learning Systems
• Most existing work on adversarial examples:
  • Image classification task
  • Target model is known
• Our investigation on adversarial examples:
  • Other tasks and model classes: generative models, deep reinforcement learning, visual QA / image-to-code
  • Weaker threat models: black-box attacks (target model is unknown)
  • New attack methods: provide more diversity of attacks
A General Framework for Black-box Attacks
• Zero-query attacks (previous methods)
  • Random perturbation
  • Difference of means
  • Transferability-based attack: Practical Black-Box Attacks against Machine Learning [Papernot et al. 2016]
  • Ensemble transferability-based attack (see the sketch below): [Yanpei Liu, Xinyun Chen, Chang Liu, Dawn Song: Delving into Transferable Adversarial Examples and Black-box Attacks, ICLR 2017]
• Query-based attacks (new method)
  • Finite difference gradient estimation
  • Query-reduced gradient estimation
  • A general active query game model
The zero-query attack can be viewed as a special case of the query-based attack, where the number of queries made is zero.
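A sketch of the ensemble transferability-based (zero-query) attack referenced above: craft the adversarial example against several local surrogate models, then hand it to the unknown black-box target without ever querying it. The surrogate list, loss, and bounds are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def ensemble_transfer_attack(surrogates, x, target_label, eps=0.03, steps=50, lr=1e-2):
    """Craft a targeted adversarial example against an ensemble of white-box surrogates."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([target_label])
    for _ in range(steps):
        # Average the targeted loss over the surrogate ensemble; an example that fools
        # all surrogates is more likely to transfer to the unseen target model.
        loss = sum(F.cross_entropy(m(x + delta), target) for m in surrogates) / len(surrogates)
        opt.zero_grad()
        loss.backward()
        opt.step()
        delta.data.clamp_(-eps, eps)
    return (x + delta).detach()
```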
Query-Based Attacks
• Finite difference gradient estimation
  • Given a d-dimensional vector x, we can make 2d queries to estimate the gradient: for a black-box function g, the i-th component is FD_x(g, δ)_i = (g(x + δ·e_i) − g(x − δ·e_i)) / (2δ).
  • An example: approximate FGS with finite differences, x_adv = x + ε · sign(FD_x(ℓ_f(x, y), δ)). Similarly, we can approximate the logit-based loss by making 2d queries.
• Query-reduced gradient estimation (sketched below)
  • Random grouping
  • PCA
[Bhagoji, Li, He, Song, 2017]
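A sketch of finite-difference gradient estimation and the FGS step above, assuming only black-box query access to a scalar loss `loss_fn(x)` on a float NumPy array x. The function names and the random-grouping query reduction are illustrative.

```python
import numpy as np

def fd_gradient(loss_fn, x, delta=1e-3):
    """Estimate the gradient of loss_fn at x with 2d queries (d = x.size)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = delta
        grad.flat[i] = (loss_fn(x + e) - loss_fn(x - e)) / (2 * delta)
    return grad

def fd_fgs(loss_fn, x, eps=0.01, delta=1e-3):
    """x_adv = x + eps * sign(FD_x(loss, delta)): FGS using the estimated gradient."""
    return x + eps * np.sign(fd_gradient(loss_fn, x, delta))

def grouped_fd_gradient(loss_fn, x, num_groups=50, delta=1e-3, seed=0):
    """Query reduction via random grouping: one shared estimate per group of coordinates."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(x)
    for idx in np.array_split(rng.permutation(x.size), num_groups):
        e = np.zeros_like(x)
        e.flat[idx] = delta
        grad.flat[idx] = (loss_fn(x + e) - loss_fn(x - e)) / (2 * delta)
    return grad
```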
Query-Based Attacks
• The finite-difference method outperforms other black-box attacks and achieves an attack success rate similar to the white-box attack.
• Gradient estimation with query reduction performs approximately as well as without query reduction.