PAIS 2015
Transparency and disclosure risk in data privacy
Vicenç Torra (1), March 2015
(1) School of Informatics, University of Skövde, Sweden
Outline
• Quantitative measures of risk: record linkage
• Transparency principle: publishing the data processing (masking) methods is good practice in data privacy, similar to the practice in cryptography
• Risk assessment needs to take the transparency principle into account
Vicenç Torra; Transparency data privacy; PAIS 2015
Outline
1. Introduction
  • Masking methods
  • Disclosure risk assessment
2. Transparency
  • Definition
  • Attacking rank swapping
  • Attacking microaggregation
3. Worst-case scenario when measuring disclosure risk
4. Summary
Introduction: Masking methods
Masking methods
• Perturbative
• Non-perturbative
• Synthetic data generators
Methods reviewed here:
• Microaggregation
• Rank swapping
Rank swapping
• For ordinal/numerical attributes
• Applied attribute-wise

Data: (a_1, ..., a_n): original data; p: percentage of records
Order (a_1, ..., a_n) in increasing order (i.e., a_i ≤ a_{i+1});
Mark a_i as unswapped for all i;
for i = 1 to n do
    if a_i is unswapped then
        Select ℓ randomly and uniformly from the limited range [i + 1, min(n, i + p · n / 100)];
        Swap a_i with a_ℓ;
        Mark a_i and a_ℓ as swapped;
Undo the sorting step;
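The rank swapping steps above can be sketched in Python as follows. This is a minimal, illustrative implementation under stated assumptions: indices are 0-based, the window p · n / 100 is floored to an integer, the partner ℓ is drawn from the whole range as in the pseudocode (some implementations restrict it to records not yet swapped), and the helper names `rank_swap` and `rank_swap_file` are hypothetical.

```python
import random


def rank_swap(values, p, rng=None):
    """Rank swapping of one ordinal/numerical attribute (sketch).

    values: the attribute's values in original record order.
    p: percentage of records bounding how far (in ranks) a swap partner may be.
    """
    if rng is None:
        rng = random.Random()
    n = len(values)
    window = int(p * n / 100)  # floored p*n/100: maximum rank distance of a partner
    # Order (a_1, ..., a_n) increasingly, remembering each value's record position
    order = sorted(range(n), key=lambda r: values[r])
    ranked = [values[r] for r in order]
    swapped = [False] * n  # mark every a_i as unswapped
    for i in range(n):
        if swapped[i]:
            continue
        hi = min(n - 1, i + window)
        if hi <= i:  # the range [i+1, hi] is empty: no partner available
            continue
        ell = rng.randint(i + 1, hi)  # uniform over the inclusive range [i+1, hi]
        ranked[i], ranked[ell] = ranked[ell], ranked[i]
        swapped[i] = swapped[ell] = True  # mark a_i and a_ell as swapped
    # Undo the sorting step: write each value back to a record position
    masked = [None] * n
    for pos, r in enumerate(order):
        masked[r] = ranked[pos]
    return masked


def rank_swap_file(records, p, seed=None):
    """Apply rank swapping attribute-wise, i.e. to each column independently."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*records)]
    masked_columns = [rank_swap(col, p, rng) for col in columns]
    return [list(row) for row in zip(*masked_columns)]


if __name__ == "__main__":
    data = [[1, 1], [2, 2], [2, 3], [2, 9], [3, 6], [4, 1], [4, 6], [4, 7],
            [5, 8], [6, 8], [8, 1], [8, 9], [9, 3], [9, 4], [9, 9]]
    masked = rank_swap_file(data, p=20, seed=0)
    for j in range(2):
        # Each column keeps exactly the same multiset of values
        assert sorted(r[j] for r in data) == sorted(r[j] for r in masked)
    print(masked)
```

Because values are only exchanged within a column, each attribute keeps the same multiset of values, so its marginal distribution is unchanged, while the pairing of values across attributes, and hence their correlation, changes.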
Rank swapping: properties
• Marginal distributions are not modified
• Correlations between the attributes are modified
• Good trade-off between information loss and disclosure risk
Microaggregation
• Case of two attributes microaggregated together
Microaggregation: application
• k: number of records in each cluster (here k = 3)
• Partition of the attributes: {v1, v2} and {v3, v4} are each microaggregated together
Original attributes v1, ..., v4 and masked attributes v′1, ..., v′4:

v1  v2  v3  v4    v′1      v′2      v′3      v′4
 1   1   1   1    1.66667  2        1.33333  1.66667
 2   2   1   2    1.66667  2        1.33333  1.66667
 2   3   1   6    1.66667  2        2.33333  5.66667
 2   9   1  10    3        7.33333  1.66667  9.66667
 3   6   2   2    3        7.33333  1.33333  1.66667
 4   1   2   9    4.33333  5        1.66667  9.66667
 4   6   2  10    4.33333  5        1.66667  9.66667
 4   7   3   2    3        7.33333  2.33333  5.66667
 5   8   3   9    4.33333  5        2.33333  5.66667
 6   8   4   7    7.66667  8.66667  6        5
 8   1   7   2    8.66667  2.66667  6        5
 8   9   7   6    7.66667  8.66667  6        5
 9   3   8   1    8.66667  2.66667  8.66667  1.33333
 9   4   8   2    8.66667  2.66667  8.66667  1.33333
 9   9  10   1    7.66667  8.66667  8.66667  1.33333
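The slide does not say how the clusters were built. As one possible sketch, the code below uses an MDAV-style heuristic (maximum distance to average vector), a common fixed-size microaggregation method, with Euclidean distance. This is an assumption, not necessarily the method that produced the table, and it need not reproduce the exact clusters shown. Each block of the attribute partition ({v1, v2} and {v3, v4}) is microaggregated independently with k = 3; all function names are hypothetical.

```python
import math


def centroid(points):
    """Attribute-wise mean of a non-empty list of equal-length numeric records."""
    return [sum(col) / len(points) for col in zip(*points)]


def mdav(records, k):
    """MDAV-style clustering; returns lists of record indices.

    Assuming len(records) >= k, every cluster has between k and 2k-1 records.
    """
    remaining = list(range(len(records)))
    clusters = []

    def farthest(ref, idxs):
        return max(idxs, key=lambda r: math.dist(records[r], ref))

    def nearest_k(ref, idxs):
        return sorted(idxs, key=lambda r: math.dist(records[r], ref))[:k]

    def take(cluster):
        clusters.append(cluster)
        chosen = set(cluster)
        return [r for r in remaining if r not in chosen]

    while len(remaining) >= 3 * k:
        c = centroid([records[r] for r in remaining])
        r = farthest(c, remaining)           # record farthest from the centroid
        s = farthest(records[r], remaining)  # record farthest from r
        remaining = take(nearest_k(records[r], remaining))
        remaining = take(nearest_k(records[s], remaining))
    if len(remaining) >= 2 * k:
        c = centroid([records[r] for r in remaining])
        r = farthest(c, remaining)
        remaining = take(nearest_k(records[r], remaining))
    if remaining:
        clusters.append(remaining)  # leftover records form the last cluster
    return clusters


def microaggregate(records, k):
    """Replace every record by the centroid of its cluster."""
    masked = [list(rec) for rec in records]
    for cluster in mdav(records, k):
        c = centroid([records[r] for r in cluster])
        for r in cluster:
            masked[r] = list(c)
    return masked


def microaggregate_partition(rows, blocks, k):
    """Microaggregate each block of attribute indices independently."""
    out = [list(row) for row in rows]
    for block in blocks:
        sub = [[row[j] for j in block] for row in rows]
        for i, values in enumerate(microaggregate(sub, k)):
            for j, v in zip(block, values):
                out[i][j] = v
    return out


# Original data (v1, v2, v3, v4) from the table above
ROWS = [[1, 1, 1, 1], [2, 2, 1, 2], [2, 3, 1, 6], [2, 9, 1, 10], [3, 6, 2, 2],
        [4, 1, 2, 9], [4, 6, 2, 10], [4, 7, 3, 2], [5, 8, 3, 9], [6, 8, 4, 7],
        [8, 1, 7, 2], [8, 9, 7, 6], [9, 3, 8, 1], [9, 4, 8, 2], [9, 9, 10, 1]]

if __name__ == "__main__":
    for row in microaggregate_partition(ROWS, [[0, 1], [2, 3]], k=3):
        print([round(v, 5) for v in row])
```

Whatever clustering is used, replacing each record by its cluster mean leaves the attribute means of the file unchanged, and every masked combination within a block is shared by at least k records.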
Introduction: Disclosure risk assessment
Disclosure risk assessment
• Identity disclosure vs. attribute disclosure
  ◦ Attribute disclosure:
    ⋆ An intruder increases their knowledge about an attribute of an individual
  ◦ Identity disclosure:
    ⋆ An intruder finds (identifies) an individual in a masked file
Disclosure risk assessment
• Identity disclosure vs. attribute disclosure
• Boolean vs. quantitative measures (with a Boolean measure, information loss is minimized among protections that satisfy it; with a quantitative measure, risk and information loss form a multiobjective optimization problem)
Examples.
• Boolean definitions of risk
  ◦ k-anonymity (Boolean definition / identity disclosure)
  ◦ Differential privacy (Boolean definition / attribute disclosure)
• Quantitative measures of risk
  ◦ Re-identification / record linkage (for identity disclosure)
  ◦ Uniqueness (for identity disclosure)
  ◦ Interval disclosure (for attribute disclosure)
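To make the Boolean/quantitative distinction concrete, here is a small sketch of one Boolean measure (k-anonymity with respect to a chosen set of quasi-identifiers) and one quantitative measure (the fraction of records that are unique on those quasi-identifiers within the file, i.e. sample uniqueness; population-based uniqueness estimates are more involved). It reuses the masked values v′1, ..., v′4 from the microaggregation table; the function names are hypothetical.

```python
from collections import Counter


def qi_key(rec, qi):
    """Values of the quasi-identifier attributes `qi` (column indices) of a record."""
    return tuple(rec[j] for j in qi)


def is_k_anonymous(records, qi, k):
    """Boolean measure: every combination of QI values is shared by >= k records."""
    counts = Counter(qi_key(rec, qi) for rec in records)
    return all(c >= k for c in counts.values())


def uniqueness(records, qi):
    """Quantitative measure: fraction of records whose QI combination is unique in the file."""
    counts = Counter(qi_key(rec, qi) for rec in records)
    return sum(1 for rec in records if counts[qi_key(rec, qi)] == 1) / len(records)


# Masked file (v'1, v'2, v'3, v'4) from the microaggregation table
MASKED = [
    (1.66667, 2, 1.33333, 1.66667), (1.66667, 2, 1.33333, 1.66667),
    (1.66667, 2, 2.33333, 5.66667), (3, 7.33333, 1.66667, 9.66667),
    (3, 7.33333, 1.33333, 1.66667), (4.33333, 5, 1.66667, 9.66667),
    (4.33333, 5, 1.66667, 9.66667), (3, 7.33333, 2.33333, 5.66667),
    (4.33333, 5, 2.33333, 5.66667), (7.66667, 8.66667, 6, 5),
    (8.66667, 2.66667, 6, 5), (7.66667, 8.66667, 6, 5),
    (8.66667, 2.66667, 8.66667, 1.33333), (8.66667, 2.66667, 8.66667, 1.33333),
    (7.66667, 8.66667, 8.66667, 1.33333),
]

if __name__ == "__main__":
    print(is_k_anonymous(MASKED, [0, 1], 3))        # True
    print(is_k_anonymous(MASKED, [0, 1, 2, 3], 2))  # False
    print(uniqueness(MASKED, [0, 1, 2, 3]))         # 7/15, about 0.467
```

On this data each microaggregated block is 3-anonymous on its own, but when all four masked attributes are taken together as quasi-identifiers, 7 of the 15 records are unique.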
Quantitative measures for identity disclosure
• A scenario for identity disclosure: $X = id \,\|\, X_{nc} \,\|\, X_c$
  ◦ Protection of the attributes
    ⋆ Identifiers ($id$): usually removed or encrypted.
    ⋆ Confidential attributes ($X_c$): usually not modified, $X'_c = X_c$.
    ⋆ Quasi-identifiers ($X_{nc}$): a masking method $\rho$ is applied to them, $X'_{nc} = \rho(X_{nc})$.
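A minimal sketch of how the released file $X' = X'_{nc} \,\|\, X'_c$ is assembled in this scenario, assuming records are Python lists and columns are addressed by index. The masking method ρ is passed in as a function; `round_to_10` is a hypothetical stand-in (rounding to multiples of 10) used only to keep the example self-contained, and the toy records are invented.

```python
def protect(records, qi_cols, conf_cols, mask):
    """Build X' = X'_nc || X'_c from X = id || X_nc || X_c (sketch).

    Identifier columns are dropped (simply not copied), the quasi-identifier
    block is passed through the masking method `mask` (rho), and the
    confidential block is copied unchanged.
    """
    x_nc = [[rec[j] for j in qi_cols] for rec in records]
    x_nc_masked = mask(x_nc)  # X'_nc = rho(X_nc)
    return [list(qm) + [rec[j] for j in conf_cols]  # X'_c = X_c
            for qm, rec in zip(x_nc_masked, records)]


def round_to_10(rows):
    """Hypothetical stand-in for rho: round each quasi-identifier to a multiple of 10."""
    return [[10 * round(v / 10) for v in row] for row in rows]


if __name__ == "__main__":
    # Invented toy file: (identifier, age, weight, diagnosis)
    X = [["id1", 34, 71, "flu"], ["id2", 47, 88, "asthma"]]
    print(protect(X, qi_cols=[1, 2], conf_cols=[3], mask=round_to_10))
    # [[30, 70, 'flu'], [50, 90, 'asthma']]
```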
Quantitative measures for identity disclosure
• A scenario for identity disclosure: $X = id \,\|\, X_{nc} \,\|\, X_c$
  ◦ A: file with the protected data set (protected / public)
  ◦ B: file with the intruder's data (a subset of the original X)
[Figure: re-identification by record linkage between file A (protected / public; records r_1, ..., r_a with quasi-identifiers a_1, ..., a_n and confidential attributes) and file B (intruder; records s_1, ..., s_b with identifiers i_1, i_2, ... and quasi-identifiers a_1, ..., a_n).]
Quantitative measures for identity disclosure
• A scenario for identity disclosure
  ◦ Re-identification using the common attributes (quasi-identifiers): identity disclosure
  ◦ Attribute disclosure may also be possible when re-identification links confidential values to identifiers (in this case, identity disclosure implies attribute disclosure)
Quantitative measures for identity disclosure
• Flexible scenario for identity disclosure
  ◦ A: file protected using a masking method
  ◦ B (intruder's file): a subset of the original file
    → intruder with information on only some individuals
    → intruder with information on only some characteristics (attributes)
  ◦ But also:
    ⋆ B with a schema different from that of A (different attributes)
Quantitative measures for identity disclosure
• Re-identification: risk is measured as the (estimated) number of re-identifications that an intruder might obtain.
  ◦ When both files have the same schema: record linkage algorithms.
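For files with the same schema, a distance-based record linkage can be sketched as follows: each intruder record in B is linked to the closest record(s) in A over the common quasi-identifiers, and the risk is estimated as the expected proportion of correct links. This is a sketch under assumptions: Euclidean distance on unstandardized numeric attributes, ties broken uniformly at random, and a ground-truth mapping `truth` that is known to the data protector (not to the intruder). The data are the (v1, v2) block of the microaggregation example, with the intruder knowing the original values; the function names are hypothetical.

```python
import math


def candidates(A, b):
    """Indices of the records in A at minimum Euclidean distance from b."""
    dists = [math.dist(a, b) for a in A]
    best = min(dists)
    return [j for j, d in enumerate(dists) if d == best]


def linkage_risk(A, B, truth):
    """Expected fraction of B correctly re-identified by distance-based record linkage.

    truth[i] is the index in A of the record that B[i] actually comes from.
    Ties are broken uniformly at random, so a tie among t candidates that
    contains the true record contributes 1/t.
    """
    hits = 0.0
    for i, b in enumerate(B):
        tied = candidates(A, b)
        if truth[i] in tied:
            hits += 1 / len(tied)
    return hits / len(B)


# Intruder file B: original (v1, v2) values
B = [(1, 1), (2, 2), (2, 3), (2, 9), (3, 6), (4, 1), (4, 6), (4, 7),
     (5, 8), (6, 8), (8, 1), (8, 9), (9, 3), (9, 4), (9, 9)]
# Protected file A: microaggregated (v'1, v'2) of the same records, same order
A = [(1.66667, 2), (1.66667, 2), (1.66667, 2), (3, 7.33333), (3, 7.33333),
     (4.33333, 5), (4.33333, 5), (3, 7.33333), (4.33333, 5),
     (7.66667, 8.66667), (8.66667, 2.66667), (7.66667, 8.66667),
     (8.66667, 2.66667), (8.66667, 2.66667), (7.66667, 8.66667)]

if __name__ == "__main__":
    print(linkage_risk(A, B, truth=list(range(len(B)))))  # 13/45, about 0.289
```

With these data, 13 of the 15 records are closest to their own cluster centroid, but each centroid is shared by 3 records, so the estimated risk is 13/45 (about 0.29). If A also carried the confidential attributes unchanged ($X'_c = X_c$), every correct link would reveal them as well, which is the sense in which identity disclosure implies attribute disclosure.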