Enhancing Privacy in Machine Learning
Mathias Humbert, INSA Toulouse/CNRS. Toulouse, January 22, 2019.


  1. Enhancing Privacy in Machine Learning. Mathias Humbert, INSA Toulouse/CNRS. Toulouse, January 22, 2019.

  2. Enhancing Privacy in Machine Learning. [Diagram: data → ML.] What ML? What data? What threat?

  3. Different Attacks: Linkability. Ability to link at least two records concerning the same individual. If one dataset is not anonymized → re-identification. [Illustration: linking records of Robert, Alice, Marius, and Eve.]

  4. Different Attacks: Membership Inference. Ability to infer that a certain target is in a specific dataset. [Illustration: is target (x, y, z) part of a study focusing on HIV patients?]

  5. Trading Off Privacy. Balancing privacy, utility, and ML efficiency. What ML? What data? What threat? What defense?

  6. Different Defense Mechanisms. Balancing privacy, utility, and ML efficiency via: anonymization, randomization, differential privacy, cryptography.

  7. Outline of the Talk (attack - defense - data):
  • Temporal linkability - randomization - microRNA expression (in ℝ^r, r ≈ 10^3), USENIX Security'16
  • Re-identification - cryptography - DNA methylation (in [0,1]^m, m ≈ 10^7), IEEE S&P'17
  • Membership inference - other defense - any data, NDSS'19

  8. Outline of the Talk (attack - defense - data):
  • Temporal linkability - randomization - microRNA expression, USENIX Security'16
  • Re-identification - cryptography - DNA methylation, IEEE S&P'17
  • Membership inference - other defense - any data, NDSS'19

  9. DNA versus MicroRNA.
  • DNA: contains the blueprint of what a cell can potentially do; (mostly) fixed over time; can hint at the risk of getting a disease.
  • miRNA: regulates what a cell really does; expression changes over time; can tell whether you carry a disease.
  Common belief: no privacy threats from miRNAs, because of their temporal variability.

  10. Temporal Linkability Attack. Matching two datasets, e.g., a leaked database (incl. names) and a public DB (excl. names). Which sample from t1 corresponds to which sample from t2?

  11. Data Pre-processing. High dimensionality: 1,189 miRNAs per sample $r^t_j$, with possibly correlated and uninteresting components. PCA + whitening condenses the data into a smaller set of dimensions $\bar{r}^t_j$ with minimal information loss, providing: smaller dimensionality, uncorrelated components, unit variance.
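  A minimal sketch of this step, assuming the expression profiles sit in a NumPy array; scikit-learn's PCA with whiten=True is one possible stand-in for the actual pipeline, and the component count of 20 is an illustrative choice, not a value from the talk:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    profiles = rng.random((29, 1189))  # placeholder for real miRNA expression samples r_j^t

    # PCA + whitening: fewer dimensions, uncorrelated components, unit variance.
    pca = PCA(n_components=20, whiten=True)
    whitened = pca.fit_transform(profiles)  # the whitened profiles, shape (29, 20)
    print(whitened.shape, whitened.std(axis=0).round(2))  # stds are ~1 after whitening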

  12. Linkability Attack. Which sample from t1 corresponds to which sample from t2? Given the whitened profiles $\{\bar{r}^{t_1}_i\}_{i=1}^n$ and $\{\bar{r}^{t_2}_i\}_{i=1}^n$, the attacker picks the permutation

  $\sigma^* = \arg\min_{\sigma} \sum_{i=1}^{n} \left\| \bar{r}^{t_2}_{\sigma(i)} - \bar{r}^{t_1}_{i} \right\|_2$

  13. Linkability Attack. The same minimization $\sigma^* = \arg\min_{\sigma} \sum_{i=1}^{n} \| \bar{r}^{t_2}_{\sigma(i)} - \bar{r}^{t_1}_{i} \|_2$ is a minimum-weight bipartite matching (assignment) problem. Time complexity: O(n^3), e.g., with the Hungarian algorithm.
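  The slides leave the solver implicit; under the assumption that the minimization is solved as a standard linear assignment problem, a hedged sketch with SciPy (all variable names are illustrative):

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist

    def link_samples(r_t1, r_t2):
        """Return the permutation sigma minimizing sum_i ||r_t2[sigma[i]] - r_t1[i]||_2."""
        cost = cdist(r_t1, r_t2)                # pairwise Euclidean distances
        _, sigma = linear_sum_assignment(cost)  # Hungarian-style solver, O(n^3)
        return sigma

    rng = np.random.default_rng(1)
    r_t1 = rng.normal(size=(29, 20))               # whitened profiles at t1
    r_t2 = r_t1 + 0.1 * rng.normal(size=(29, 20))  # same individuals at t2, slightly drifted
    sigma = link_samples(r_t1, r_t2)
    print("fraction correctly linked:", (sigma == np.arange(29)).mean())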

  14. Athletes Dataset.
  • Participants: 29
  • Points in time: 2 (before and after exercising)
  • Time period: 1 week
  • Disease: none
  • 1,189 miRNAs per sample, taken from blood and plasma

  15. Lung Cancer Dataset.
  • Participants: 26 (huge for a longitudinal study!)
  • Points in time: 8
  • Time period: 18 months
  • Disease: lung cancer
  • 1,189 miRNAs per sample, taken from plasma
  [Timeline: one sample before surgery, then post-surgery samples at months 0, 3, 6, 9, 12, 15, 18.]

  16. Linkability Attack – Results. [Plots: attack success rate vs. number of PCA dimensions; annotated values 55%, 90%, 29%, 48%.] Success up to 90% for blood-based samples.

  17. Linkability Attack – Results. How does the success change with larger datasets? Success decreases sharply for plasma-based samples, but decreases linearly for blood-based samples.

  18. Outline of the Talk (attack - defense - data):
  • Temporal linkability - randomization - microRNA expression, USENIX Security'16
  • Re-identification - cryptography - DNA methylation, IEEE S&P'17
  • Membership inference - other defense - any data, NDSS'19

  19. Defense Mechanisms.
  • Hiding non-relevant miRNA expressions: for settings where randomization is not an option, e.g., making a diagnosis in a hospital. Caution: correlations between miRNAs.
  • Randomizing the miRNA expression profiles, e.g., for publishing a dataset used in a study: noise is added in a fully distributed, differentially private manner → providing epigeno-indistinguishability (inspired by [1]), with noise drawn according to a multivariate Laplacian mechanism (see the sketch after this slide).
  [1] Chatzikokolakis et al. Broadening the scope of differential privacy using metrics. PETS, 2013.
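  The exact sampler is not shown in the slides; a minimal sketch, assuming the multivariate Laplacian mechanism draws noise with density proportional to exp(-ε·‖z‖₂) as in the metric-based differential privacy of [1], i.e., a uniform direction on the unit sphere scaled by a Gamma(d, 1/ε)-distributed radius (all names are illustrative):

    import numpy as np

    def multivariate_laplace_noise(d, eps, rng):
        """Sample z in R^d with density proportional to exp(-eps * ||z||_2)."""
        direction = rng.normal(size=d)
        direction /= np.linalg.norm(direction)        # uniform direction on the unit sphere
        radius = rng.gamma(shape=d, scale=1.0 / eps)  # radial law: r^(d-1) * exp(-eps * r)
        return radius * direction

    rng = np.random.default_rng(2)
    profile = rng.random(20)  # placeholder (reduced) miRNA expression profile
    # eps = 0.025 echoes the value highlighted later in the talk (slide 32).
    sanitized = profile + multivariate_laplace_noise(profile.size, eps=0.025, rng=rng)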

  20. Privacy-Utility Trade-Off.
  • Privacy: prevent linkability of samples.
  • Utility: preserve the accuracy of classifying profiles as diseased / healthy, usually using a radial SVM classifier (a sketch follows below).
  [Illustration: disease classification boundary over miRNA 1 and miRNA 2.]
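  As a hedged illustration of the utility side, a radial (RBF-kernel) SVM classifying profiles as diseased or healthy; the data here are synthetic, and only the kernel choice comes from the slide:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 20))                              # placeholder sanitized profiles
    y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # synthetic diseased/healthy labels

    svm = SVC(kernel="rbf")  # the "radial SVM classifier" of the slide
    print("5-fold CV accuracy:", cross_val_score(svm, X, y, cv=5).mean().round(3))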


  22. Privacy-Utility Trade-Off. Same privacy and utility goals as above; another dataset for exploring utility: 1,000+ participants, 19 diseases, 1 point in time.

  23. Hiding miRNAs – Results. [Plot annotations: below 80% with fewer than 100 miRNAs.]

  24. Hiding miRNAs – Results. [Plot annotation: accuracy 99.2%.]

  25. Hiding miRNAs – Results. [Plot: attacker's success rate.]

  26. Hiding miRNAs – Results. [Plot annotation: 99.2%.]

  27. Hiding miRNAs – Results. Trade-off at 7 miRNAs: attack success decreased by 54% (relative to disclosing all miRNAs), while SVM accuracy decreased by only 1% (relative to the 99.2% maximum).

  28. Hiding miRNAs – Results. [Plot annotation: 92.7%.]

  29. Hiding miRNAs – Results. Trade-off at 4 miRNAs: attack success decreases by 80% (relative to disclosing all miRNAs), while accuracy decreases by only 1% (relative to the 92.7% maximum).

  30. Probabilistic Sanitization – Results. [Plot annotation: 99.2%.]

  31. Probabilistic Sanitization – Results. [Plot annotations: 99.2%.]

  32. Probabilistic Sanitization – Results. Suitable balance at ε = 0.025: attack success decreased by 63% (relative to the non-sanitized data), while SVM accuracy decreased by only 0.65% (relative to the 99.2% maximum).

  33. Probabilistic Sanitization – Results. [Plot annotation: 96.9%.]

  34. Probabilistic Sanitization – Results. Trade-off at ε = 0.01: attack success decreases by 70% (relative to the non-sanitized data), while accuracy decreases by only 0.2% (relative to the 96.9% maximum).

  35. Outline of the Talk (attack - defense - data type):
  • Temporal linkability - randomization - microRNA expression, USENIX Security'16
  • Re-identification - cryptography - DNA methylation, IEEE S&P'17
  • Membership inference - other defense - any data, NDSS'19
