

  1. CS573 Data Privacy and Security Midterm Review Li Xiong Department of Mathematics and Computer Science Emory University

  2. Principles of Data Security – CIA Triad • Confidentiality – Prevent the disclosure of information to unauthorized users • Integrity – Prevent improper modification • Availability – Make data available to legitimate users

  3. Privacy vs. Confidentiality • Confidentiality – Prevent disclosure of information to unauthorized users • Privacy – Prevent disclosure of personal information to unauthorized users – Control of how personal information is collected and used – Prevent identification of individuals

  4. Data Privacy and Security Measures • Access control – Restrict access to the data (or a subset or view of it) to authorized users • Cryptography – Use encryption to encode information so it can only be read by authorized users (protected in transit and in storage) • Inference control – Restrict inference from accessible data to sensitive (non-accessible) data

  5. Inference Control • Inference control: prevent inference from accessible information to individual information (not accessible) • Technologies – De-identification and anonymization (input perturbation) – Differential privacy (output perturbation)

  6. Traditional De-identification and Anonymization • Attribute suppression, encoding, perturbation, generalization • Subject to re-identification and disclosure attacks • [Diagram: Original Data → De-identification / Anonymization → Sanitized Records]

  7. Statistical Data Sharing with Differential Privacy • Macro data (versus micro data) • Output perturbation (versus input perturbation) • More rigorous guarantee • [Diagram: Original Data → Differentially Private Data Sharing → Statistics / Models / Synthetic Records]

  8. Cryptography • Encoding data in a way that only authorized users can read it • [Diagram: Original Data → Encryption → Encrypted Data]

  9. Applications of Cryptography • Secure data outsourcing – Support computation and queries on encrypted data • [Diagram: Computation / Queries → Encrypted Data]

  10. Applications of Cryptography • Multi-party secure computation (secure function evaluation) – Securely compute a function without revealing private inputs • [Diagram: parties with private inputs x1, x2, x3, …, xn jointly compute f(x1, x2, …, xn)]
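The idea can be made concrete for the special case where f is a sum. Below is a toy sketch of additive secret sharing (not any specific protocol from the course): each party splits its input into random shares that sum to it modulo a large prime, every party adds up the shares it receives, and only the grand total is revealed. The helper names (`share`, `secure_sum`) and the prime modulus are illustrative choices.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(secret, n_parties):
    """Split a secret into n additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def secure_sum(private_inputs):
    """Each party shares its input; each party locally adds the shares it
    receives, and only the total of those partial sums is revealed."""
    n = len(private_inputs)
    received = [[] for _ in range(n)]  # received[j]: one share of every input
    for x in private_inputs:
        for j, s in enumerate(share(x, n)):
            received[j].append(s)
    partial_sums = [sum(col) % PRIME for col in received]
    return sum(partial_sums) % PRIME  # equals sum of inputs; inputs stay hidden

print(secure_sum([42, 17, 99]))  # 158
```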

  11. Applications of Cryptography • Private information retrieval (access privacy) – Retrieve data without revealing the query (access pattern)

  12. Course Topics • Inference control – De-identification and anonymization – Differential privacy foundations – Differential privacy applications (histograms, data mining, local differential privacy, location privacy) • Cryptography • Access control • Applications

  13. k-Anonymity • Quasi-identifiers (QID) = race, zipcode • Sensitive attribute = diagnosis • k-anonymity: the size of each QID group is at least k • Example (original → anonymized):
      Caucas 78712 Flu       →  Caucas      787XX Flu
      Asian  78705 Shingles  →  Asian/AfrAm 78705 Shingles
      Caucas 78754 Flu       →  Caucas      787XX Flu
      Asian  78705 Acne      →  Asian/AfrAm 78705 Acne
      AfrAm  78705 Acne      →  Asian/AfrAm 78705 Acne
      Caucas 78705 Flu       →  Caucas      787XX Flu
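As a quick illustration (not part of the original slides), a minimal sketch of checking the k-anonymity condition, assuming pandas; the column names race/zipcode/diagnosis mirror the example above and the data frame is the anonymized table.

```python
import pandas as pd

def is_k_anonymous(df, quasi_identifiers, k):
    """A table is k-anonymous w.r.t. the QID columns if every combination
    of QID values that appears occurs in at least k records."""
    group_sizes = df.groupby(quasi_identifiers).size()
    return bool((group_sizes >= k).all())

anonymized = pd.DataFrame({
    "race":      ["Caucas", "Asian/AfrAm", "Caucas", "Asian/AfrAm", "Asian/AfrAm", "Caucas"],
    "zipcode":   ["787XX", "78705", "787XX", "78705", "78705", "787XX"],
    "diagnosis": ["Flu", "Shingles", "Flu", "Acne", "Acne", "Flu"],
})
print(is_k_anonymous(anonymized, ["race", "zipcode"], k=3))  # True: both QID groups have 3 records
```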

  14. Problem of k-Anonymity • The attacker knows Rusty Shackleford is Caucasian and lives in zipcode 78705, so his record falls in the (Caucas, 787XX) QID group • Every record in that group has diagnosis Flu, so k-anonymity does not stop the attacker from learning his diagnosis • Problem: sensitive attributes are not “diverse” within each quasi-identifier group

  15. l-Diversity [Machanavajjhala et al. ICDE ’06] • Entropy of the sensitive attributes within each quasi-identifier group must be at least l • [Example table: each (Caucas, 787XX) and (Asian/AfrAm, 78XXX) QID group contains a mix of Flu, Acne, and Shingles diagnoses]
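A hedged sketch of the corresponding check, assuming the entropy variant of l-diversity: the usual formal condition is that each group's entropy is at least log(l) (the slide abbreviates this to "at least l"). The group data below is illustrative.

```python
import math
from collections import Counter

def group_entropy(sensitive_values):
    """Shannon entropy of the sensitive-attribute distribution in one QID group."""
    counts = Counter(sensitive_values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def satisfies_entropy_l_diversity(groups, l):
    """Entropy l-diversity: every group's entropy is at least log(l)."""
    return all(group_entropy(values) >= math.log(l) for values in groups)

groups = [
    ["Flu", "Shingles", "Acne", "Flu", "Acne", "Flu"],  # (Caucas, 787XX) group
    ["Flu", "Flu", "Acne", "Shingles", "Acne"],         # (Asian/AfrAm, 78XXX) group
]
print(satisfies_entropy_l_diversity(groups, l=2))  # True
```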

  16. Problem with l-Diversity • Original dataset: 99% of the records are HIV- • Anonymization A: a quasi-identifier group that is 50% HIV+ / 50% HIV- is “diverse”, yet it leaks a ton of information (the attacker’s estimate that the target is HIV+ jumps from 1% to 50%) • Anonymization B: a quasi-identifier group that is 99% HIV- is not “diverse”, yet the anonymized database does not leak anything beyond the population baseline

  17. t-Closeness [Li et al. ICDE ’07] • Distribution of sensitive attributes within each quasi-identifier group should be “close” to their distribution in the entire original database • [Example table: generalized (Caucas, 787XX) and (Asian/AfrAm, 78XXX) groups whose Flu/Acne/Shingles mix mirrors the whole table]
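A minimal sketch of the corresponding check. The t-closeness paper measures closeness with Earth Mover's Distance; here total variation distance is used as a simplified stand-in (for categorical attributes under an equal-distance ground metric the two coincide). The data below is illustrative.

```python
from collections import Counter

def distribution(values):
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def variation_distance(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(groups, whole_table, t):
    """Every QID group's sensitive-value distribution must be within t
    of the distribution over the whole table."""
    overall = distribution(whole_table)
    return all(variation_distance(distribution(g), overall) <= t for g in groups)

whole = ["Flu", "Shingles", "Acne", "Flu", "Acne", "Flu",
         "Flu", "Flu", "Acne", "Shingles", "Acne"]
groups = [whole[:6], whole[6:]]
print(satisfies_t_closeness(groups, whole, t=0.2))  # True
```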

  18. Problems with Syntactic Privacy Notions • Syntactic – Focuses on data transformation, not on what can be learned from the anonymized dataset • “Quasi-identifier” fallacy – Assumes a priori that the attacker will not know certain information about his target – The attacker may know the records in the database or external information

  19. Course Topics • Inference control – De-identification and anonymization – Differential privacy foundations – Differential privacy applications (histograms, data mining, location privacy) • Cryptography • Access control • Applications

  20. Differential Privacy • Statistical outcome is indistinguishable regardless of whether a particular user (record) is included in the data


  22. Statistical Data Release: Disclosure Risk • [Diagram: Original records → Original histogram]

  23. Statistical Data Release: Differential Privacy • [Diagram: Original records → Original histogram → Perturbed histogram with differential privacy]

  24. Differential Privacy • D and D’ are neighboring databases if they differ in one record • A privacy mechanism A gives ε-differential privacy if for all neighboring databases D, D’ and for any possible output S ∈ Range(A): Pr[A(D) = S] ≤ exp(ε) × Pr[A(D’) = S]

  25. Laplace Mechanism • Add Laplace noise to the true output f(D): A(D) = f(D) + Laplace(Δf/ε) • Global sensitivity: Δf = max over neighboring D, D’ of |f(D) − f(D’)|

  26. Example: Laplace Mechanism • For a single counting query Q over a dataset D, returning Q(D) + Laplace(1/ε) gives ε-differential privacy.
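A minimal sketch of the Laplace mechanism in Python, assuming numpy; the dataset and the age-threshold counting query are made up for illustration.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return true_value plus Laplace noise with scale sensitivity/epsilon."""
    return true_value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Counting query: how many records have age > 40?  Adding or removing one
# record changes the count by at most 1, so the global sensitivity is 1.
ages = [42, 31, 28, 55, 63, 24]
true_count = sum(a > 40 for a in ages)
noisy_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.1)
print(true_count, noisy_count)
```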

  27. Exponential Mechanism • Sample an output r from the output space with probability weighted by a utility score function u(D, r)

  28. Exponential Mechanism • For a database D, output space R, and a utility score function u: D × R → ℝ, the algorithm A with Pr[A(D) = r] ∝ exp(ε × u(D, r) / (2Δu)) satisfies ε-differential privacy • Δu is the sensitivity of the utility score function: Δu = max over r and neighboring D, D’ of |u(D, r) − u(D’, r)|

  29. Example: Exponential Mechanism • Scoring/utility function u: Inputs × Outputs → ℝ • D: nationalities of a set of people • f(D): most frequent nationality in D • u(D, O) = #(D, O), the number of people with nationality O [From the tutorial “Differential Privacy in the Wild”]
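A hedged sketch of the exponential mechanism for this example, assuming numpy; the candidate set and data are illustrative, and subtracting the maximum score is only for numerical stability (it does not change the sampling distribution).

```python
import numpy as np

def exponential_mechanism(data, outputs, utility, sensitivity, epsilon):
    """Sample an output r with probability proportional to
    exp(epsilon * u(D, r) / (2 * sensitivity))."""
    scores = np.array([utility(data, r) for r in outputs], dtype=float)
    weights = np.exp(epsilon * (scores - scores.max()) / (2 * sensitivity))
    probs = weights / weights.sum()
    return np.random.choice(outputs, p=probs)

# Utility of candidate nationality O is the number of people with nationality O;
# adding or removing one person changes any count by at most 1, so Δu = 1.
nationalities = ["US", "US", "US", "China", "China", "India"]
candidates = sorted(set(nationalities))
u = lambda data, r: data.count(r)
print(exponential_mechanism(nationalities, candidates, u, sensitivity=1, epsilon=1.0))
```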

  30. Composition Theorems • Sequential composition: queries answered on the same data with budgets εi together give (∑i εi)-differential privacy • Parallel composition: queries answered on disjoint subsets of the data with budgets εi together give max(εi)-differential privacy
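A small illustrative sketch of how a privacy budget is accounted for under the two rules; the helper class is hypothetical, not part of any course library.

```python
class BudgetAccountant:
    """Tracks the total epsilon consumed under the two composition rules."""
    def __init__(self):
        self.spent = 0.0

    def sequential(self, epsilons):
        """Queries over the same data: budgets add up."""
        self.spent += sum(epsilons)
        return self.spent

    def parallel(self, epsilons):
        """Queries over disjoint subsets of the data: only the max counts."""
        self.spent += max(epsilons)
        return self.spent

acct = BudgetAccountant()
print(acct.sequential([0.1, 0.2]))      # 0.3 spent so far
print(acct.parallel([0.5, 0.5, 0.5]))   # 0.8 spent in total
```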

  31. Differential Privacy • Differential privacy ensures an attacker can’t infer the presence or absence of a single record in the input based on any output • Building blocks – Laplace mechanism, exponential mechanism • Composition rules help build complex algorithms from these building blocks

  32. Course Topics • Inference control – De-identification and anonymization – Differential privacy foundations – Differential privacy applications (histograms, data mining, location privacy) • Cryptography • Access control • Applications

  33. Baseline: Laplace Mechanism • For the counting query Q on each histogram bin, returning Q(D) + Laplace(1/ε) gives ε-differential privacy.
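A minimal sketch of this baseline, assuming numpy: because one record falls into exactly one bin, the bins form disjoint subsets, so by parallel composition every bin can receive Laplace(1/ε) noise while the whole histogram stays ε-differentially private. Bin edges and data are illustrative.

```python
import numpy as np

def dp_histogram(values, bins, epsilon):
    """Baseline Laplace histogram: each bin is a count query with sensitivity 1,
    and the bins are disjoint, so each bin gets independent Laplace(1/epsilon) noise."""
    counts, edges = np.histogram(values, bins=bins)
    noisy = counts + np.random.laplace(scale=1.0 / epsilon, size=counts.shape)
    return noisy, edges

ages = [42, 31, 28, 55, 63, 24, 37, 49]
noisy_counts, edges = dp_histogram(ages, bins=[20, 30, 40, 50, 60, 70], epsilon=0.5)
print(edges)
print(noisy_counts)
```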

  34. DPCube [SecureDM 2010, ICDE 2012 demo] • Multi-dimensional histogram partitioning with differential privacy • Compute unit histogram counts with ε/2-differential privacy • Use the DP unit histogram to choose the partitioning • Compute V-optimal histogram counts for the partitions with the remaining ε/2 • [Diagram: Original records (Name, Age, Income, HIV+) → ε/2-DP unit histogram → ε/2-DP V-optimal histogram, accessed through a DP interface]
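The following is a much-simplified 1-D sketch of the two-phase idea only, not the actual DPCube algorithm: half the budget buys a noisy unit histogram, the noisy counts drive a greedy merge of adjacent bins (a crude stand-in for V-optimal partitioning), and the other half buys one fresh count per merged partition. All names, the merge threshold, and the data are illustrative.

```python
import numpy as np

def two_phase_histogram(values, unit_edges, epsilon, merge_threshold=5.0):
    """Simplified two-phase sketch: eps/2 for a noisy unit histogram,
    a data-independent-looking merge step driven only by the noisy counts,
    and eps/2 for one fresh noisy count per merged partition."""
    eps1 = eps2 = epsilon / 2.0

    # Phase 1: unit histogram with half the budget.
    unit_counts, _ = np.histogram(values, bins=unit_edges)
    noisy_unit = unit_counts + np.random.laplace(scale=1.0 / eps1, size=unit_counts.shape)

    # Partitioning: greedily merge neighboring bins with similar noisy counts
    # (uses only the already-noisy counts, so it costs no extra budget).
    partitions = [[0]]
    for i in range(1, len(noisy_unit)):
        if abs(noisy_unit[i] - noisy_unit[partitions[-1][-1]]) <= merge_threshold:
            partitions[-1].append(i)
        else:
            partitions.append([i])

    # Phase 2: one noisy count per merged partition with the other half.
    vals = np.asarray(values)
    results = []
    for part in partitions:
        lo, hi = unit_edges[part[0]], unit_edges[part[-1] + 1]
        true_count = int(((vals >= lo) & (vals < hi)).sum())
        results.append((lo, hi, true_count + np.random.laplace(scale=1.0 / eps2)))
    return results

ages = [42, 31, 28, 55, 63, 24, 37, 49, 33, 41]
for lo, hi, count in two_phase_histogram(ages, unit_edges=list(range(20, 71, 5)), epsilon=1.0):
    print(f"[{lo}, {hi}): {count:.1f}")
```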

  35. Private Spatial Decompositions [CPSSY 12] • Examples: quadtree, kd-tree • Need to ensure both the partitioning boundaries and the counts of each partition are differentially private

  36. Histogram Methods vs. Parametric Methods • Non-parametric methods – Learn the empirical distribution through (perturbed) histograms and release synthetic data – e.g. PSD, Privelet, FP, P-HP – Only work well for low-dimensional data • Parametric methods – Fit the data to a distribution and make inferences about its parameters – e.g. PrivacyOnTheMap – The joint distribution is difficult to model
