Security and Privacy in Machine Learning
Nicolas Papernot, Pennsylvania State University & Google Brain
Lecture for Prof. Trent Jaeger's CSE 543 Computer Security class, November 2017, Penn State
Thank you to my collaborators: Patrick McDaniel (Penn State), Alexey Kurakin (Google Brain), Martín Abadi (Google Brain), Praveen Manoharan (CISPA), Pieter Abbeel (Berkeley), Ilya Mironov (Google Brain), Michael Backes (CISPA), Ananth Raghunathan (Google Brain), Dan Boneh (Stanford), Arunesh Sinha (U of Michigan), Z. Berkay Celik (Penn State), Shuang Song (UCSD), Yan Duan (OpenAI), Ananthram Swami (US ARL), Úlfar Erlingsson (Google Brain), Kunal Talwar (Google Brain), Matt Fredrikson (CMU), Ian Goodfellow (Google Brain), Florian Tramèr (Stanford), Kathrin Grosse (CISPA), Michael Wellman (U of Michigan), Sandy Huang (Berkeley), Xi Wu (Google), Somesh Jha (U of Wisconsin)
Machine Learning
A classifier f(x, θ) maps an input x to a vector of class probabilities [p(0|x,θ), p(1|x,θ), p(2|x,θ), …, p(8|x,θ), p(9|x,θ)], e.g. [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01].
Classifier: map inputs to one class among a predefined set.
Machine Learning
Training labels are one-hot vectors, e.g. [0 1 0 0 0 0 0 0 0 0] for class 1.
Learning: find internal classifier parameters θ that minimize a cost/loss function (~model error).
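To make the two definitions above concrete, here is a minimal sketch (not from the slides) of a softmax classifier f(x, θ) and of learning as loss minimization by gradient descent; the random data, dimensions, and learning rate are placeholder assumptions.

```python
# Minimal sketch (not from the slides): softmax classifier f(x, theta) trained by
# minimizing a cross-entropy loss with gradient descent. The data is random and
# only stands in for a real dataset such as MNIST.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 784, 10              # samples, input features, classes (MNIST-like)
X = rng.normal(size=(n, d))          # placeholder inputs
y = rng.integers(0, k, size=n)       # placeholder integer labels
Y = np.eye(k)[y]                     # one-hot labels, as on the slide

W = np.zeros((d, k))                 # theta: the classifier's internal parameters
b = np.zeros(k)

def f(X, W, b):
    """Return class probabilities p(class | x, theta) via a softmax."""
    z = X @ W + b
    z -= z.max(axis=1, keepdims=True)            # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.1
for step in range(200):
    P = f(X, W, b)
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))   # cross-entropy
    grad_z = (P - Y) / n                                      # dLoss/dlogits
    W -= lr * (X.T @ grad_z)                                  # gradient step on theta
    b -= lr * grad_z.sum(axis=0)
print("final training loss:", loss)
```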
Outline of this lecture
1 Security in ML
2 Privacy in ML
Part I: Security in machine learning
Attack Models
Attacker may see the model: bad. It is bad even if an attacker needs to know details of the machine learning model to do an attack (aka a white-box attacker).
Attacker may not need the model: worse. It is worse if an attacker who knows very little (e.g. only gets to ask a few questions) can do an attack (aka a black-box attacker).
Papernot et al. Towards the Science of Security and Privacy in Machine Learning
Adversarial examples (white-box attacks)
Jacobian-based Saliency Map Approach (JSMA). Papernot et al. The Limitations of Deep Learning in Adversarial Settings
Jacobian-based iterative approach: source-target misclassification. Papernot et al. The Limitations of Deep Learning in Adversarial Settings
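A rough sketch of the JSMA idea under simplifying assumptions: the model below is a small randomly initialized MLP standing in for a trained network, and the Jacobian is estimated with finite differences rather than backpropagated as in the paper. It only illustrates the mechanics (build a saliency map from the forward derivative, perturb the most salient feature, iterate toward a chosen target class).

```python
# Hedged sketch of the Jacobian-based Saliency Map Approach (JSMA). The model is a
# tiny random MLP (an assumption, standing in for a trained network); the Jacobian
# is estimated by finite differences for simplicity.
import numpy as np

rng = np.random.default_rng(0)
d, h, k = 64, 32, 10                       # input size, hidden units, classes
W1, b1 = rng.normal(0, 0.5, (d, h)), np.zeros(h)
W2, b2 = rng.normal(0, 0.5, (h, k)), np.zeros(k)

def logits(x):
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2   # ReLU MLP

def jacobian(x, eps=1e-4):
    """Finite-difference estimate of d logits / d x, shape (k, d)."""
    J = np.zeros((k, d))
    base = logits(x)
    for i in range(d):
        xp = x.copy()
        xp[i] += eps
        J[:, i] = (logits(xp) - base) / eps
    return J

def jsma(x, target, theta=1.0, max_changes=20):
    """Increase one feature at a time, chosen by the saliency map for the target class."""
    x = x.copy()
    for _ in range(max_changes):
        if logits(x).argmax() == target:
            break
        J = jacobian(x)
        target_grad = J[target]                       # pushes the target class up
        others_grad = J.sum(axis=0) - target_grad     # pushes the other classes
        saliency = np.where((target_grad > 0) & (others_grad < 0),
                            target_grad * np.abs(others_grad), 0.0)
        saliency[x >= 1.0] = 0.0                      # feature already saturated
        i = saliency.argmax()
        if saliency[i] == 0.0:
            break                                     # no useful feature left
        x[i] = min(1.0, x[i] + theta)
    return x

x = rng.uniform(0, 1, d)
x_adv = jsma(x, target=3)
print("clean prediction:", logits(x).argmax(), "adversarial prediction:", logits(x_adv).argmax())
```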
Evading a Neural Network Malware Classifier
DREBIN dataset of Android applications.
Add constraints to the JSMA approach:
- only add features: keep malware behavior
- only features from the manifest: easy to modify
Original sample: P[X=Malware] = 0.90, P[X=Benign] = 0.10. Perturbed sample: P[X*=Malware] = 0.10, P[X*=Benign] = 0.90.
"Most accurate" neural network: 98% accuracy, with 9.7% FP and 1.3% FN; evaded with a 63.08% success rate.
Grosse et al. Adversarial Perturbations Against Deep Neural Networks for Malware Classification
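The two constraints could be layered on the saliency step of a JSMA-style attack roughly as follows; the manifest whitelist, the binary feature encoding, and all indices are illustrative assumptions, not the DREBIN feature set.

```python
# Sketch of the malware-evasion constraints: only features on a manifest whitelist
# may change, and only from 0 to 1, so the application keeps its malicious behavior.
import numpy as np

def constrained_step(x, saliency, manifest_idx):
    """Pick the most salient whitelisted feature still at 0 and set it to 1."""
    allowed = np.zeros_like(saliency, dtype=bool)
    allowed[manifest_idx] = True                 # only manifest features are editable
    allowed &= (x == 0)                          # only *add* features, never remove
    masked = np.where(allowed, saliency, -np.inf)
    i = int(masked.argmax())
    if not np.isfinite(masked[i]):
        return x, False                          # no legal modification remains
    x = x.copy()
    x[i] = 1
    return x, True

# Example with made-up binary features and saliency scores:
x = np.array([1, 0, 0, 1, 0])
saliency = np.array([0.9, 0.2, 0.7, 0.1, 0.4])
x_new, changed = constrained_step(x, saliency, manifest_idx=[1, 2])
print(x_new, changed)    # feature 2 gets added (most salient legal choice)
```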
Supervised vs. reinforcement learning
Model inputs: supervised learning uses an observation (e.g., traffic sign, music, email); reinforcement learning uses the environment & a reward function.
Model outputs: supervised learning outputs a class (e.g., stop/yield, jazz/classical, spam/legitimate); reinforcement learning outputs an action.
Training "goal": supervised learning minimizes class prediction error (i.e., cost/loss) over pairs of (inputs, outputs); reinforcement learning maximizes reward by exploring the environment and taking actions.
Adversarial attacks on neural network policies. Huang et al. Adversarial Attacks on Neural Network Policies
Adversarial examples (black-box attacks)
Threat model of a black-box attack
Adversarial capabilities: no access to the training data, model architecture, model parameters, or model scores; only (limited) oracle access to labels.
Adversarial goal: force a ML model remotely accessible through an API to misclassify.
Our approach to black-box attacks
Alleviate lack of knowledge about the model.
Alleviate lack of training data.
Adversarial example transferability
Adversarial examples have a transferability property: samples crafted to mislead a model A are likely to mislead a model B.
This property comes in several variants:
● Intra-technique transferability:
○ Cross-model transferability
○ Cross-training-set transferability
● Cross-technique transferability
Szegedy et al. Intriguing properties of neural networks
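A toy, hedged demonstration of intra-technique (cross training set) transferability: two logistic regression models are trained on disjoint samples from the same synthetic distribution, an adversarial perturbation is computed against model A only (a simple gradient-sign step, used here purely for illustration), and model B misclassifies many of the resulting examples too. All data and numbers are made up.

```python
# Transferability demo on synthetic data: attack model A, then measure the victim
# model B's error on the same adversarial examples.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Two Gaussian classes in 20 dimensions."""
    x0 = rng.normal(-1.0, 1.0, (n // 2, 20))
    x1 = rng.normal(+1.0, 1.0, (n // 2, 20))
    return np.vstack([x0, x1]), np.array([0] * (n // 2) + [1] * (n // 2))

def train_logreg(X, y, lr=0.1, steps=300):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

def predict(w, b, X):
    return ((X @ w + b) > 0).astype(int)

XA, yA = make_data(2000)     # model A's training set
XB, yB = make_data(2000)     # model B's disjoint training set
wA, bA = train_logreg(XA, yA)
wB, bB = train_logreg(XB, yB)

# Craft adversarial examples against model A only, then test them on model B.
Xt, yt = make_data(500)
eps = 1.5
grad_sign = np.sign(np.outer(2 * yt - 1, wA))    # direction that flips A's score
X_adv = Xt - eps * grad_sign

for name, (w, b) in [("A (attacked)", (wA, bA)), ("B (victim)", (wB, bB))]:
    err = (predict(w, b, X_adv) != yt).mean()
    print(f"model {name}: error on adversarial examples = {err:.2%}")
```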
Cross-technique transferability. Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Our approach to black-box attacks
Alleviate lack of knowledge about the model: adversarial example transferability from a substitute model to the target model.
Alleviate lack of training data.
Attacking remotely hosted black-box models
(1) The adversary queries the remote ML system for labels on inputs of its choice (figure: the remote system returns labels such as "no truck sign" and "STOP sign").
Attacking remotely hosted black-box models
(2) The adversary uses this labeled data to train a local substitute for the remote system.
Attacking remotely hosted black-box models
(3) The adversary selects new synthetic inputs for queries to the remote ML system based on the local substitute's output surface sensitivity to input variations.
Attacking remotely hosted black-box models
(4) The adversary then uses the local substitute to craft adversarial examples (e.g., a perturbed stop sign labeled "yield sign"), which are misclassified by the remote ML system because of transferability.
Our approach to black-box attacks
Alleviate lack of knowledge about the model: adversarial example transferability from a substitute model to the target model.
Alleviate lack of training data: synthetic data generation.
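A compressed sketch of steps (1) through (4) under strong simplifying assumptions: the "remote" oracle is a hidden linear rule rather than a real API, the substitute is a logistic regression, and the synthetic-data step moves each point along the substitute's gradient sign, one simple way to probe its sensitivity to input variations as in step (3).

```python
# Hedged sketch of the substitute-training loop: query the oracle for labels, fit a
# local substitute, grow the query set with gradient-sign augmentation, then craft
# adversarial examples on the substitute and check whether they transfer.
import numpy as np

rng = np.random.default_rng(0)
d = 20

# Unknown to the adversary: the remote system's decision rule (only labels come back).
w_secret = rng.normal(size=d)
def remote_oracle(X):
    return (X @ w_secret > 0).astype(int)

def train_substitute(X, y, lr=0.1, steps=500):
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

X = rng.normal(size=(20, d))                   # (1) small initial query set
lam = 0.3
for epoch in range(5):
    y = remote_oracle(X)                       # query the remote system for labels
    w, b = train_substitute(X, y)              # (2) fit the local substitute
    # (3) augmentation: step each point along the substitute's gradient sign to
    # explore where its decision changes, then query those new points too.
    X_new = X + lam * np.sign(np.outer(2 * y - 1, w))
    X = np.vstack([X, X_new])

# (4) craft adversarial examples on the substitute; they tend to transfer.
Xt = rng.normal(size=(200, d))
yt = remote_oracle(Xt)
X_adv = Xt - 1.0 * np.sign(np.outer(2 * yt - 1, w))
transfer = (remote_oracle(X_adv) != yt).mean()
print(f"oracle misclassifies {transfer:.0%} of examples crafted on the substitute")
```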
Results on real-world remote systems
Remote platform (ML technique) | Number of queries | Adversarial examples misclassified (after querying)
Deep Learning                  | 6,400             | 84.24%
Logistic Regression            | 800               | 96.19%
Unknown                        | 2,000             | 97.72%
All remote classifiers are trained on the MNIST dataset (10 classes, 60,000 training samples).
[PMG16a] Papernot et al. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
Benchmarking progress in the adversarial ML community
Growing community: 1.3K+ stars, 340+ forks, 40+ contributors
Adversarial examples represent worst-case distribution drifts. [DDS04] Dalvi et al. Adversarial Classification (KDD)
Adversarial examples are a tangible instance of hypothetical AI safety problems. Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg
Part II: Privacy in machine learning
Types of adversaries and our threat model
Model querying (black-box adversary): Shokri et al. (2016) Membership Inference Attacks against ML Models; Fredrikson et al. (2015) Model Inversion Attacks.
Model inspection (white-box adversary): Zhang et al. (2017) Understanding DL requires rethinking generalization.
In our work, the threat model assumes:
- The adversary can make a potentially unbounded number of queries
- The adversary has access to model internals
A definition of privacy
(Figure: a randomized algorithm is run twice, each run producing Answer 1, Answer 2, …, Answer n; can an observer tell the two sets of answers apart?)
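The diagram alludes to differential privacy. The formal guarantee is not spelled out on the slide; added here for reference, the standard statement is that a randomized algorithm M is (ε, δ)-differentially private if, for every pair of datasets d and d' differing in a single record and every set of outcomes S,

\Pr[M(d) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(d') \in S] + \delta .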
Our design goals
Problem: preserve privacy of training data when learning classifiers.
Goals:
- Differential privacy protection guarantees
- Intuitive privacy protection guarantees
- Generic* (independent of learning algorithm)
*This is a key distinction from previous work, such as: Pathak et al. (2011) Privacy preserving probabilistic inference with hidden markov models; Jagannathan et al. (2013) A semi-supervised learning approach to differential privacy; Shokri et al. (2015) Privacy-preserving Deep Learning; Abadi et al. (2016) Deep Learning with Differential Privacy; Hamm et al. (2016) Learning privately from multiparty data
The PATE approach
Teacher ensemble
(Figure: the sensitive data is split into partitions 1 through n; teacher 1 is trained on partition 1, teacher 2 on partition 2, …, teacher n on partition n. Arrows distinguish training from data flow.)
Aggregation: count votes, take maximum.
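A compact sketch of the pipeline from the last two slides: partition the sensitive data, train one teacher per partition, count their votes on a query, and take the maximum. The teachers here are nearest-centroid stubs (an assumption; any classifier works), and the Laplace noise added to the vote counts is borrowed from the PATE paper's noisy aggregation rather than shown on the slide above.

```python
# Sketch of the PATE teacher ensemble and (noisy) vote aggregation.
import numpy as np

rng = np.random.default_rng(0)
n_teachers, n_classes, d = 10, 10, 64

# Placeholder sensitive data, split into disjoint partitions (one per teacher).
X = rng.normal(size=(5000, d))
y = rng.integers(0, n_classes, size=5000)
partitions = np.array_split(rng.permutation(len(X)), n_teachers)

class CentroidTeacher:
    """Tiny stand-in classifier: predict the class with the closest mean."""
    def fit(self, X, y):
        self.centroids = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
        return self
    def predict(self, x):
        return int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))

teachers = [CentroidTeacher().fit(X[idx], y[idx]) for idx in partitions]

def noisy_aggregate(x, gamma=0.05):
    """Count teacher votes per class, add Laplace noise, take the maximum."""
    votes = np.zeros(n_classes)
    for t in teachers:
        votes[t.predict(x)] += 1
    votes += rng.laplace(scale=1.0 / gamma, size=n_classes)   # privacy noise
    return int(np.argmax(votes))

query = rng.normal(size=d)
print("aggregated (noisy-max) label:", noisy_aggregate(query))
```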