
Differential Privacy Machine Learning - Li Xiong (PowerPoint PPT Presentation)



  1. CS573 Data Privacy and Security Differential Privacy – Machine Learning Li Xiong

  2. Big Data + Machine Learning

  3. Machine Learning Under Adversarial Settings • Data privacy/confidentiality attacks • membership attacks, model inversion attacks • Model integrity attacks • Training time: data poisoning attacks • Inference time: adversarial examples

  4. Differential Privacy for Machine Learning • Data privacy attacks • Model inversion attacks • Membership inference attacks • Differential privacy for deep learning • Noisy SGD • PATE

  5. Neural Networks

  6. Learning the parameters: Gradient Descent

  7. Stochastic Gradient Descent • Batch Gradient Descent (GD): the cost gradient is computed over the complete training set; each step is costly and convergence to the minimum can be slow • Stochastic Gradient Descent (SGD, iterative or online GD): update the weights after each training sample; the gradient from a single sample is a stochastic approximation of the true cost gradient; converges faster, but the path toward the minimum may zig-zag • Mini-Batch Gradient Descent (MB-GD): update the weights based on a small group of training samples
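
A minimal NumPy sketch of the three update rules for a simple least-squares objective; the toy data, learning rate, and batch size are illustrative assumptions, not from the slides.

```python
import numpy as np

def grad(w, X, y):
    """Gradient of the mean squared error 0.5*||Xw - y||^2 / n at w."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                  # toy design matrix (assumption)
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=1000)
w = np.zeros(10)
lr = 0.1

# Batch GD: one update uses the full training set.
w -= lr * grad(w, X, y)

# SGD: one update per single training sample (stochastic approximation of the true gradient).
i = rng.integers(len(y))
w -= lr * grad(w, X[i:i+1], y[i:i+1])

# Mini-batch GD: one update per small random group of samples.
idx = rng.choice(len(y), size=32, replace=False)
w -= lr * grad(w, X[idx], y[idx])
```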

  8. Training-data extraction attacks • Fredrikson et al. (2015): a facial recognition model takes a facial image as input and outputs a label (e.g., Philip, Jack, Monica, ...) • From the model, the attacker reconstructs unknown faces from the private training dataset

  9. Membership Inference Attacks against Machine Learning Models Reza Shokri, Marco Stronati, Congzheng Song, Vitaly Shmatikov

  10. Membership Inference Attack • Given an input data record and the model's classification output (e.g., over classes airplane, automobile, ..., ship, truck), decide: was this specific data record part of the training set?

  11. Membership Inference Attack • On summary statistics: summary statistics (e.g., averages) on each attribute, assuming the underlying data distribution is known [Homer et al. (2008)], [Dwork et al. (2015)], [Backes et al. (2016)] • On machine learning models, black-box setting: no knowledge of the model's parameters, no access to the model's internal computations, no knowledge of the underlying data distribution

  12. Exploit the Model's Predictions • Main insight: ML models overfit to their training data • Setting: the target model is trained on private data (training API) and exposed through a prediction API

  13. Exploit the Model's Predictions • Main insight: ML models overfit to their training data • Query the prediction API with an input from the training set and observe its classification

  14. Exploit the Model's Predictions • Main insight: ML models overfit to their training data • Query the prediction API with an input from the training set and with an input NOT from the training set, and observe the classifications

  15. Exploit the Model's Predictions • Query the prediction API with inputs from the training set and inputs NOT from the training set • Recognize the difference between the two kinds of outputs

  16. ML against ML • Query the prediction API with inputs from the training set and inputs not from the training set • Train an ML model (the attack model) to recognize the difference

  17. Train the Attack Model using Shadow Models • Train k shadow models (Shadow Model 1, ..., Shadow Model k), each on its own training set (Train 1, ..., Train k) with a corresponding held-out test set (Test 1, ..., Test k) • Record each shadow model's classification outputs on its training data (labeled IN) and on its test data (labeled OUT) • Train the attack model on these labeled outputs to predict whether an input was a member of the training set (in) or a non-member (out)
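
A schematic sketch of the shadow-model pipeline, assuming scikit-learn-style models; the model classes, the data splits, and the use of a single attack classifier (rather than one attack model per class, as in the paper) are simplifying assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def build_attack_training_set(shadow_splits):
    """shadow_splits: list of (X_train, y_train, X_test, y_test) for each shadow model.
    Returns (features, membership_labels) for the attack model: each feature is a
    shadow model's prediction vector, and the label is 1 (IN) or 0 (OUT)."""
    feats, members = [], []
    for X_tr, y_tr, X_te, y_te in shadow_splits:
        shadow = RandomForestClassifier(n_estimators=50).fit(X_tr, y_tr)
        feats.append(shadow.predict_proba(X_tr)); members.append(np.ones(len(X_tr)))    # IN
        feats.append(shadow.predict_proba(X_te)); members.append(np.zeros(len(X_te)))   # OUT
    return np.vstack(feats), np.concatenate(members)

# Usage (shadow_splits is whatever data the attacker assembled, see next slide):
# attack_X, attack_y = build_attack_training_set(shadow_splits)
# attack_model = LogisticRegression(max_iter=1000).fit(attack_X, attack_y)
# Then query the target model's prediction API with a single record and feed its
# prediction vector to attack_model.predict_proba() to get a membership probability.
```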

  18. Obtaining Data for Training Shadow Models • Real: similar to the training data of the target model (i.e., drawn from the same distribution) • Synthetic: use a sampling algorithm to obtain data that the target model classifies with high confidence
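
A rough sketch of the model-based synthesis idea (hill-climbing on the target model's confidence); the feature ranges, perturbation scheme, and confidence threshold are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def synthesize_record(predict_proba, target_class, n_features,
                      max_iter=1000, conf_threshold=0.9, rng=None):
    """Hill-climb a random record until the black-box target model assigns the
    target class with high confidence. predict_proba(x) -> class-probability vector."""
    rng = rng or np.random.default_rng()
    x = rng.uniform(0.0, 1.0, size=n_features)       # assume features scaled to [0, 1]
    best_conf = predict_proba(x)[target_class]
    for _ in range(max_iter):
        candidate = x.copy()
        idx = rng.choice(n_features, size=max(1, n_features // 10), replace=False)
        candidate[idx] = rng.uniform(0.0, 1.0, size=len(idx))   # re-randomize a few features
        conf = predict_proba(candidate)[target_class]
        if conf > best_conf:                          # keep the change only if confidence improves
            x, best_conf = candidate, conf
        if best_conf >= conf_threshold:
            return x                                  # accept as a synthetic "member-like" record
    return None                                       # failed to reach high confidence
```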

  19. Constructing the Attack Model • Synthetic data is used to train the shadow models (via the training/prediction API) • The shadow models' outputs form the attack training data used to train the attack model

  20. Constructing and Using the Attack Model • Constructing: synthetic data trains the shadow models, whose outputs form the attack training data used to train the attack model • Using: query the target model's prediction API with one single data record and feed the resulting classification to the attack model, which outputs a membership probability

  21. Results on the Purchase dataset (classify customers, 100 classes) • Plot: cumulative fraction of classes vs. membership inference precision, for real data, marginal-based synthetic data, and model-based synthetic data • Shadow models trained on real data: overall attack accuracy 0.93 • Shadow models trained on synthetic data: overall attack accuracy 0.89

  22. Privacy vs. Learning • Setting: a model trained on a training set drawn from the data universe

  23. Privacy vs. Learning • Privacy: does the model leak information about data in the training set?

  24. Privacy vs. Learning • Privacy: does the model leak information about data in the training set? • Learning: does the model generalize to data outside the training set?

  25. Privacy vs. Learning • Privacy: does the model leak information about data in the training set? • Learning: does the model generalize to data outside the training set? • Overfitting is the common enemy!

  26. Not in a Direct Conflict! • Privacy-preserving machine learning aims for both utility (prediction accuracy) and privacy

  27. Differential Privacy for Machine Learning • Data privacy attacks • Model inversion attacks • Membership inference attacks • Differential privacy for deep learning • Noisy SGD • PATE

  28. DEEP LEARNING WITH DIFFERENTIAL PRIVACY Martin Abadi, Andy Chu, Ian Goodfellow*, Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang (Google; * OpenAI)

  29. Differential Privacy • (ε, δ)-Differential Privacy: the distribution of the output M(D) on database D is (nearly) the same as M(D′): for all S, Pr[M(D) ∈ S] ≤ exp(ε) · Pr[M(D′) ∈ S] + δ • ε quantifies the information leakage; δ allows for a small probability of failure

  30. Interpreting Differential Privacy • Two neighboring training datasets D and D′ are each fed through SGD to produce a model; here the mechanism M is the training algorithm

  31. Differential Privacy: Gaussian Mechanism • If the ℓ2-sensitivity of f : D → ℝⁿ satisfies max_{D,D′} ‖f(D) − f(D′)‖₂ < 1, then the Gaussian mechanism f(D) + Nⁿ(0, σ²) offers (ε, δ)-differential privacy, where δ ≈ exp(−(εσ)²/2) • Dwork, Kenthapadi, McSherry, Mironov, Naor, “Our Data, Ourselves”, Eurocrypt 2006
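
A small NumPy sketch of the Gaussian mechanism for a query with ℓ2-sensitivity at most 1, using the slide's approximation δ ≈ exp(−(εσ)²/2) to pick σ for a target (ε, δ); the example query (a rescaled mean) is an illustrative assumption.

```python
import numpy as np

def gaussian_mechanism(f_value, sigma, rng=None):
    """Release f(D) + N(0, sigma^2) per coordinate, assuming ||f(D) - f(D')||_2 <= 1."""
    rng = rng or np.random.default_rng()
    f_value = np.asarray(f_value, dtype=float)
    return f_value + rng.normal(0.0, sigma, size=f_value.shape)

def sigma_for(eps, delta):
    """Invert the slide's approximation delta ~ exp(-(eps*sigma)^2 / 2)."""
    return np.sqrt(2.0 * np.log(1.0 / delta)) / eps

# Example: the mean of records in [0, 1]; changing one of n records moves the mean
# by at most 1/n, so the noise is scaled by that sensitivity.
data = np.random.default_rng(0).uniform(0, 1, size=1000)
sigma = sigma_for(eps=1.2, delta=1e-5)        # ~4.0, matching the sigma used later in the slides
noisy_mean = gaussian_mechanism(data.mean(), sigma / len(data))
```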

  32. Basic Composition Theorem • If f is (ε₁, δ₁)-DP and g is (ε₂, δ₂)-DP, then the pair (f(D), g(D)) is (ε₁+ε₂, δ₁+δ₂)-DP
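
As a concrete illustration (the numbers here are illustrative, not from the slides): releasing two results that are each (1.2, 10⁻⁵)-DP is, by basic composition,

```latex
\[
(\varepsilon_1 + \varepsilon_2,\ \delta_1 + \delta_2)
  = (1.2 + 1.2,\ 10^{-5} + 10^{-5})
  = (2.4,\ 2\times 10^{-5})\text{-DP.}
\]
```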

  33. Simple Recipe for Composite Functions • To compute a composite f with differential privacy: 1. bound the sensitivity of f's components 2. apply the Gaussian mechanism to each component 3. compute the total privacy via the composition theorem
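
A toy end-to-end sketch of this recipe for a composite query (a clipped sum and a count released together); the clipping bound, per-component ε and δ, and the data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=3.0, size=500)      # toy dataset (assumption)

# 1. Bound the sensitivity of each component.
C = 10.0                                         # clip records to [0, C]: the sum changes by <= C per record
clipped = np.clip(data, 0.0, C)
sum_sensitivity, count_sensitivity = C, 1.0      # the count changes by <= 1 per record

# 2. Apply the Gaussian mechanism to each component,
#    with sigma = sqrt(2 ln(1/delta)) / eps per component (slide 31's approximation).
eps_each, delta_each = 0.5, 1e-6
sigma = np.sqrt(2 * np.log(1 / delta_each)) / eps_each
noisy_sum = clipped.sum() + rng.normal(0, sigma * sum_sensitivity)
noisy_count = len(data) + rng.normal(0, sigma * count_sensitivity)

# 3. Total privacy via basic composition: (0.5 + 0.5, 1e-6 + 1e-6) = (1.0, 2e-6)-DP.
noisy_mean = noisy_sum / max(noisy_count, 1.0)
```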

  34. Deep Learning with Differential Privacy

  35. Differentially Private Deep Learning 1. Loss function: softmax loss 2. Training / test data: MNIST and CIFAR-10 3. Topology: PCA + neural network 4. Training algorithm: differentially private SGD 5. Hyperparameters: tuned experimentally
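
A simplified NumPy sketch of one noisy-SGD step in the spirit of the paper's algorithm (clip per-example gradients, sum, add Gaussian noise, average); the per-example gradient function, clipping norm, lot size, and noise multiplier are placeholders, and the actual implementation is the TensorFlow code linked on the final slide.

```python
import numpy as np

def dp_sgd_step(w, batch_X, batch_y, per_example_grad, lr=0.05,
                clip_norm=1.0, noise_multiplier=4.0, rng=None):
    """One differentially private SGD step.
    per_example_grad(w, x, y) -> gradient of the loss at a single example (placeholder)."""
    rng = rng or np.random.default_rng()
    grads = []
    for x, y in zip(batch_X, batch_y):
        g = per_example_grad(w, x, y)
        # Clip each per-example gradient to l2 norm <= clip_norm to bound sensitivity.
        g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        grads.append(g)
    # Sum the clipped gradients, add Gaussian noise scaled to the sensitivity, then average.
    noisy_sum = np.sum(grads, axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=w.shape)
    return w - lr * noisy_sum / len(batch_X)
```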

  36. Naïve Privacy Analysis 1. Choose σ = 4 2. Each step is (ε, δ)-DP = (1.2, 10⁻⁵)-DP 3. Number of steps T = 10,000 4. Composition: (Tε, Tδ)-DP = (12,000, 0.1)-DP
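
The basic-composition arithmetic behind the numbers on this slide:

```latex
\[
(T\varepsilon,\ T\delta)
  = (10{,}000 \times 1.2,\ 10{,}000 \times 10^{-5})
  = (12{,}000,\ 0.1)\text{-DP.}
\]
```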

  37. Advanced Composition Theorems

  38. Composition theorem • Illustration: the privacy loss accumulates over successive queries (e.g., +ε for Blue, +0.2ε for Blue, +ε for Red)

  39. Strong Composition Theorem 1. Choose σ = 4 2. Each step is (ε, δ)-DP = (1.2, 10⁻⁵)-DP 3. Number of steps T = 10,000 4. Strong composition: (ε′, Tδ)-DP = (360, 0.1)-DP • Dwork, Rothblum, Vadhan, “Boosting and Differential Privacy”, FOCS 2010 • Dwork, Rothblum, “Concentrated Differential Privacy”, https://arxiv.org/abs/1603.0188
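
For reference, the advanced composition bound of Dwork, Rothblum, and Vadhan (FOCS 2010) cited on this slide: T-fold adaptive composition of (ε, δ)-DP mechanisms is (ε′, Tδ + δ′)-DP for any δ′ > 0, where

```latex
\[
\varepsilon' \;=\; \varepsilon\sqrt{2T\ln(1/\delta')} \;+\; T\varepsilon\,(e^{\varepsilon}-1).
\]
```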

  40. Amplification by Sampling 1. Choose σ = 4 2. Each batch is a q = 1% fraction of the data 3. Each step is (2qε, qδ)-DP = (0.024, 10⁻⁷)-DP 4. Number of steps T = 10,000 5. Strong composition: (ε′, qTδ)-DP = (10, 0.001)-DP • S. Kasiviswanathan, H. Lee, K. Nissim, S. Raskhodnikova, A. Smith, “What Can We Learn Privately?”, SIAM J. Comp., 2011
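
The per-step arithmetic behind the amplification numbers (with ε = 1.2, δ = 10⁻⁵ per full-data step and sampling ratio q = 0.01):

```latex
\[
(2q\varepsilon,\ q\delta)
  = (2 \times 0.01 \times 1.2,\ 0.01 \times 10^{-5})
  = (0.024,\ 10^{-7})\text{-DP per step},
\qquad
qT\delta = 0.01 \times 10{,}000 \times 10^{-5} = 10^{-3}.
\]
```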

  41. Moments Accountant 1. Choose σ = 4 2. Each batch is a q = 1% fraction of the data 3. Keep track of the moments of the privacy loss 4. Number of steps T = 10,000 5. Moments accountant: (ε, δ)-DP = (1.25, 10⁻⁵)-DP
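
The moments accountant itself is implemented in the paper's TensorFlow code (linked on the final slide). A hedged sketch of how one might query a modern accountant instead, assuming the tensorflow_privacy package and its compute_dp_sgd_privacy helper (the exact module path, signature, and return values may differ between releases), with roughly the slide's parameters (q = 1%, σ = 4, T = 10,000 steps):

```python
# Assumption: tensorflow-privacy is installed and exposes the helper used in its
# published tutorials; the import path and signature may change between releases.
from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy

n = 60_000                          # training set size (e.g., MNIST)
batch_size = 600                    # q = batch_size / n = 1%
steps = 10_000
epochs = steps * batch_size / n     # the helper takes epochs rather than steps

eps, opt_order = compute_dp_sgd_privacy.compute_dp_sgd_privacy(
    n=n,
    batch_size=batch_size,
    noise_multiplier=4.0,           # sigma = 4, as on the slide
    epochs=epochs,
    delta=1e-5,
)
print(f"Accountant estimate: ({eps:.2f}, 1e-5)-DP")
```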

  42. Results

  43. Our Datasets: “Fruit Flies of Machine Learning” • MNIST: 70,000 images, 28 ⨉ 28 pixels each • CIFAR-10: 60,000 color images, 32 ⨉ 32 pixels each

  44. Summary of Results • Baseline (no privacy): MNIST 98.3%, CIFAR-10 80%

  45. Summary of Results • Baseline (no privacy): MNIST 98.3%, CIFAR-10 80% • [SS15] (reports ε per parameter): MNIST 98% • [WKC+16] (ε = 2): MNIST 80%

  46. Summary of Results • Baseline (no privacy): MNIST 98.3%, CIFAR-10 80% • [SS15] (reports ε per parameter): MNIST 98% • [WKC+16] (ε = 2): MNIST 80% • This work (ε = 8, δ = 10⁻⁵): MNIST 97%, CIFAR-10 73% • This work (ε = 2, δ = 10⁻⁵): MNIST 95%, CIFAR-10 67% • This work (ε = 0.5, δ = 10⁻⁵): MNIST 90%

  47. Contributions • Differentially private deep learning applied to publicly available datasets and implemented in TensorFlow: https://github.com/tensorflow/models • Innovations: bounding the sensitivity of updates; a moments accountant to keep track of the privacy loss • Lessons: recommendations for the selection of hyperparameters • Full version: https://arxiv.org/abs/1607.00133
