Minimality Attack in Privacy Preserving Data Publishing

Raymond Chi-Wing Wong (The Chinese University of Hong Kong)
Ada Wai-Chee Fu (The Chinese University of Hong Kong)
Ke Wang (Simon Fraser University)
Jian Pei (Simon Fraser University)

Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong
Outline

1. Introduction
   - k-anonymity
   - l-diversity
2. Enhanced model
   - Weaknesses of l-diversity
   - m-confidentiality
3. Algorithm
4. Experiment
5. Conclusion

Anonymization algorithms minimize information loss, which gives rise to a new attack called the Minimality Attack.
1. K-Anonymity

Original data set:

Patient  Gender  Address    Birthday  Cancer
Raymond  Male    Hong Kong  29 Jan    None
Peter    Male    Shanghai   16 July   Yes
Kitty    Female  Hong Kong  21 Oct    None
Mary     Female  Hong Kong  8 Feb     None

Release the data set to the public:

Gender  Address    Birthday  Cancer
Male    Hong Kong  29 Jan    None
Male    Shanghai   16 July   Yes
Female  Hong Kong  21 Oct    None
Female  Hong Kong  8 Feb     None
1. K-Anonymity

Original data set. The QID (quasi-identifier) is (Gender, Address, Birthday):

Patient  Gender  Address    Birthday  Cancer
Raymond  Male    Hong Kong  29 Jan    None
Peter    Male    Shanghai   16 July   Yes
Kitty    Female  Hong Kong  21 Oct    None
Mary     Female  Hong Kong  8 Feb     None

Release the data set to the public (Knowledge 1):

Gender  Address    Birthday  Cancer
Male    Hong Kong  29 Jan    None
Male    Shanghai   16 July   Yes
Female  Hong Kong  21 Oct    None
Female  Hong Kong  8 Feb     None

Knowledge 2: I also know Peter with (Male, Shanghai, 16 July).

Combining Knowledge 1 and Knowledge 2, we may deduce the ORIGINAL person.
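The linking step on this slide can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper name `link` are assumptions made for the example, not part of the talk.

```python
# Knowledge 1: the released table (names removed, QID columns unchanged).
released = [
    {"Gender": "Male", "Address": "Hong Kong", "Birthday": "29 Jan", "Cancer": "None"},
    {"Gender": "Male", "Address": "Shanghai", "Birthday": "16 July", "Cancer": "Yes"},
    {"Gender": "Female", "Address": "Hong Kong", "Birthday": "21 Oct", "Cancer": "None"},
    {"Gender": "Female", "Address": "Hong Kong", "Birthday": "8 Feb", "Cancer": "None"},
]

# Knowledge 2: what the adversary already knows about Peter.
peter_qid = {"Gender": "Male", "Address": "Shanghai", "Birthday": "16 July"}


def link(rows, qid):
    """Return the released rows whose QID columns all match `qid` (hypothetical helper)."""
    return [row for row in rows if all(row[col] == val for col, val in qid.items())]


matches = link(released, peter_qid)
print(len(matches), matches[0]["Cancer"])
```

Only one released row matches Peter's QID, so that row can be attributed to him.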
1. K-Anonymity

2-anonymity: generate a data set such that each possible QID value appears at least TWO times.

Original data set. The QID (quasi-identifier) is (Gender, Address, Birthday):

Patient  Gender  Address    Birthday  Cancer
Raymond  Male    Hong Kong  29 Jan    None
Peter    Male    Shanghai   16 July   Yes
Kitty    Female  Hong Kong  21 Oct    None
Mary     Female  Hong Kong  8 Feb     None

Release the data set to the public (Knowledge 1):

Gender  Address    Birthday  Cancer
Male    Asia       *         None
Male    Asia       *         Yes
Female  Hong Kong  *         None
Female  Hong Kong  *         None

Knowledge 2: I also know Peter with (Male, Asia, 16 July).

In the released data set, each possible QID value (Gender, Address, Birthday) appears at least TWO times. Combining Knowledge 1 and Knowledge 2, we CANNOT deduce the ORIGINAL person.

This data set is 2-anonymous.
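The 2-anonymity condition above can be checked mechanically. The following is a minimal sketch, assuming each row is a tuple whose first three fields are the QID; the function name `is_k_anonymous` is an illustrative choice.

```python
from collections import Counter


def is_k_anonymous(rows, k, qid_width=3):
    """True if every QID value (first `qid_width` fields) occurs at least k times."""
    counts = Counter(row[:qid_width] for row in rows)
    return all(count >= k for count in counts.values())


# Table released without generalization.
raw = [
    ("Male", "Hong Kong", "29 Jan", "None"),
    ("Male", "Shanghai", "16 July", "Yes"),
    ("Female", "Hong Kong", "21 Oct", "None"),
    ("Female", "Hong Kong", "8 Feb", "None"),
]

# Generalized table from this slide.
generalized = [
    ("Male", "Asia", "*", "None"),
    ("Male", "Asia", "*", "Yes"),
    ("Female", "Hong Kong", "*", "None"),
    ("Female", "Hong Kong", "*", "None"),
]

print(is_k_anonymous(raw, 2), is_k_anonymous(generalized, 2))
```

The raw table fails the check because every QID value is unique, while the generalized table passes for k = 2.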
1. K-Anonymity

- We have discussed the traditional model of k-anonymity.
- Does this model really preserve "privacy"?

Gender  Address    Birthday  Cancer
Male    Asia       *         Yes
Male    Asia       *         Yes
Female  Hong Kong  *         None
Female  Hong Kong  *         None
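In the table on this slide every QID value appears twice, yet both tuples in the (Male, Asia, *) class carry Cancer = Yes. A small sketch, with the hypothetical helper `sensitive_values_per_class`, makes that visible by collecting the sensitive values found in each equivalence class.

```python
from collections import defaultdict

released = [
    ("Male", "Asia", "*", "Yes"),
    ("Male", "Asia", "*", "Yes"),
    ("Female", "Hong Kong", "*", "None"),
    ("Female", "Hong Kong", "*", "None"),
]


def sensitive_values_per_class(rows):
    """Map each QID value to the set of sensitive values seen with it."""
    classes = defaultdict(set)
    for *qid, cancer in rows:
        classes[tuple(qid)].add(cancer)
    return dict(classes)


classes = sensitive_values_per_class(released)
# A class with a single sensitive value reveals that value for all its members.
homogeneous = {qid for qid, values in classes.items() if len(values) == 1}
print(classes[("Male", "Asia", "*")])
```

Anyone known to be in the (Male, Asia, *) class is linked to cancer even though the table is 2-anonymous.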
1. l-diversity

Original data set:

Patient  Gender  Address    Birthday  Cancer
Raymond  Male    Hong Kong  29 Jan    None
Peter    Male    Shanghai   16 July   Yes
Kitty    Female  Shanghai   21 Oct    None
Mary     Female  Hong Kong  8 Feb     None

Release the data set to the public:

Gender  Address    Birthday  Cancer
Male    Hong Kong  29 Jan    None
Male    Shanghai   16 July   Yes
Female  Shanghai   21 Oct    None
Female  Hong Kong  8 Feb     None
1. l-diversity

Original data set:

Patient  Gender  Address    Birthday  Cancer
Raymond  Male    Hong Kong  29 Jan    None
Peter    Male    Shanghai   16 July   Yes
Kitty    Female  Shanghai   21 Oct    None
Mary     Female  Hong Kong  8 Feb     None

Release the data set to the public (Knowledge 1):

Gender  Address    Birthday  Cancer
Male    Hong Kong  29 Jan    None
Male    Shanghai   16 July   Yes
Female  Shanghai   21 Oct    None
Female  Hong Kong  8 Feb     None

Knowledge 2: I also know Peter with (Male, Shanghai, 16 July).

Combining Knowledge 1 and Knowledge 2, we may deduce the disease of Peter.
1. l-diversity

Simplified 2-diversity: generate a data set such that each individual is linked to "cancer" with probability at most 1/2.

Original data set:

Patient  Gender  Address    Birthday  Cancer
Raymond  Male    Hong Kong  29 Jan    None
Peter    Male    Shanghai   16 July   Yes
Kitty    Female  Shanghai   21 Oct    None
Mary     Female  Hong Kong  8 Feb     None

Release the data set to the public (Knowledge 1):

Gender  Address    Birthday  Cancer
*       Hong Kong  *         None
*       Shanghai   *         Yes
*       Shanghai   *         None
*       Hong Kong  *         None

The two Shanghai tuples form an equivalence class.

Knowledge 2: I also know Peter with (Male, Shanghai, 16 July).

Combining Knowledge 1 and Knowledge 2, we CANNOT deduce the disease of Peter. Now, we cannot deduce that "Peter" suffered from "Cancer".

This data set is 2-diverse.
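The simplified 2-diversity condition on this slide can be written as a per-class check. This is a minimal sketch under the assumption that the last field of each row is the sensitive attribute and that "Yes" is the only sensitive value of interest; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction


def is_simplified_l_diverse(rows, l=2, sensitive="Yes"):
    """True if, in every equivalence class, at most 1/l of the tuples are sensitive."""
    size, hits = Counter(), Counter()
    for *qid, value in rows:
        size[tuple(qid)] += 1
        hits[tuple(qid)] += value == sensitive
    return all(Fraction(hits[c], size[c]) <= Fraction(1, l) for c in size)


# Table released without generalization: Peter's class has one tuple, which is "Yes".
raw = [
    ("Male", "Hong Kong", "29 Jan", "None"),
    ("Male", "Shanghai", "16 July", "Yes"),
    ("Female", "Shanghai", "21 Oct", "None"),
    ("Female", "Hong Kong", "8 Feb", "None"),
]

# Generalized table from this slide.
generalized = [
    ("*", "Hong Kong", "*", "None"),
    ("*", "Shanghai", "*", "Yes"),
    ("*", "Shanghai", "*", "None"),
    ("*", "Hong Kong", "*", "None"),
]

print(is_simplified_l_diverse(raw), is_simplified_l_diverse(generalized))
```

In the generalized table the Shanghai class holds one "Yes" out of two tuples, so Peter is linked to cancer with probability exactly 1/2.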
2.1 Weakness of l-diversity

- We have discussed l-diversity.
- Does this model really preserve "privacy"?
- No.
2.1 Weakness of l-diversity

Simplified 2-diversity: generate a data set such that each individual is linked to "cancer" with probability at most 1/2.

Original data set:

Patient  Gender  Address    Birthday  Cancer  QID
Raymond  Male    Hong Kong  29 Jan    None    q1
Peter    Male    Shanghai   16 July   Yes     q2
Kitty    Female  Shanghai   21 Oct    None    q3
Mary     Female  Hong Kong  8 Feb     None    q4

Release the data set to the public (Knowledge 1):

Gender  Address    Birthday  Cancer  QID
*       Hong Kong  *         None    Q1
*       Shanghai   *         Yes     Q2
*       Shanghai   *         None    Q2
*       Hong Kong  *         None    Q1

Knowledge 2: I also know Peter with (Male, Shanghai, 16 July).
2.1 Weakness of l-diversity

Simplified 2-diversity: generate a data set such that each individual is linked to "cancer" with probability at most 1/2.

Original data sets:

e.g.1             e.g.2
QID  Cancer       QID  Cancer
q1   Yes          q1   Yes
q1   None         q1   Yes
q2   Yes          q2   None
q2   None         q2   None
q2   None         q2   None
q2   None         q2   None

e.g.1 satisfies 2-diversity. e.g.2 does NOT satisfy 2-diversity.

The two data sets have the same set of QID values and the same set of sensitive values (i.e. Cancer).

Release the data set to the public:

e.g.1             e.g.2
QID  Cancer       QID  Cancer
q1   Yes          Q    Yes
q1   None         Q    Yes
q2   Yes          Q    None
q2   None         Q    None
q2   None         q2   None
q2   None         q2   None

Both released data sets satisfy 2-diversity.

Different released data sets! Why? The anonymization algorithm tries to minimize the generalization steps.
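The behaviour described here, generalizing only as much as 2-diversity requires, can be imitated with a toy routine. This is a sketch only, not the algorithm used in the talk: `anonymize_minimal` is a hypothetical helper that assumes one sensitive value ("Yes"), a single generalized value "Q", and enough "None" tuples in the other classes to borrow from.

```python
from collections import Counter
from fractions import Fraction


def yes_fraction(rows):
    """Map each QID value to the fraction of its tuples with Cancer = 'Yes'."""
    total, yes = Counter(), Counter()
    for qid, cancer in rows:
        total[qid] += 1
        yes[qid] += cancer == "Yes"
    return {q: Fraction(yes[q], total[q]) for q in total}


def anonymize_minimal(rows, l=2):
    """Toy local recoding: generalize to 'Q' only the tuples needed for l-diversity."""
    bad = {q for q, f in yes_fraction(rows).items() if f > Fraction(1, l)}
    if not bad:
        return list(rows)  # already diverse: zero generalization steps
    n_bad = sum(1 for q, _ in rows if q in bad)
    n_yes = sum(1 for q, c in rows if q in bad and c == "Yes")
    need = l * n_yes - n_bad  # extra 'None' tuples the merged class must absorb
    out = []
    for q, c in rows:
        if q in bad:
            out.append(("Q", c))
        elif need > 0 and c == "None":
            out.append(("Q", c))
            need -= 1
        else:
            out.append((q, c))
    return out


eg1 = [("q1", "Yes"), ("q1", "None"), ("q2", "Yes")] + [("q2", "None")] * 3
eg2 = [("q1", "Yes"), ("q1", "Yes")] + [("q2", "None")] * 4

print(anonymize_minimal(eg1) == eg1)
print(anonymize_minimal(eg2))
```

Under these assumptions e.g.1 is released unchanged, while e.g.2 has its two q1 tuples merged with two q2 tuples into Q, which matches the released tables on this slide.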
2.1 Weakness of l-diversity

Simplified 2-diversity: generate a data set such that each individual is linked to "cancer" with probability at most 1/2.

Original data set:

QID  Cancer
q1   Yes
q1   Yes
q2   None
q2   None
q2   None
q2   None

Released data set:

QID  Cancer
Q    Yes
Q    Yes
Q    None
Q    None
q2   None
q2   None
2.1 Weakness of l-diversity

Simplified 2-diversity: generate a data set such that each individual is linked to "cancer" with probability at most 1/2.

Original data set:

QID  Cancer
q1   Yes
q1   Yes
q2   None
q2   None
q2   None
q2   None

Knowledge 1 (the released data set):

QID  Cancer
Q    Yes
Q    Yes
Q    None
Q    None
q2   None
q2   None

Knowledge 2: I also know Peter with QID = (q1).

Knowledge 3: I also know that there are two q1 values and four q2 values in the table.

Knowledge 4: The anonymization algorithm tries to minimize the generalization steps for 2-diversity.

I will think in the following way.

Poss. 1           Poss. 2           Poss. 3
QID  Cancer       QID  Cancer       QID  Cancer
q1   Yes          q2   Yes          q1   Yes
q1   Yes          q2   Yes          q2   Yes
q2   None         q1   None         q1   None
q2   None         q1   None         q2   None
q2   None         q2   None         q2   None
q2   None         q2   None         q2   None
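The adversary's reasoning over the three possibilities can be sketched as an enumeration followed by a filter. The sketch assumes, as a simplification of Knowledge 4, that the algorithm generalizes only when the original table violates 2-diversity; the names `candidates` and `is_2_diverse` are illustrative.

```python
from collections import Counter
from fractions import Fraction


def is_2_diverse(rows):
    """True if every QID value has at most half of its tuples with Cancer = 'Yes'."""
    total, yes = Counter(), Counter()
    for qid, cancer in rows:
        total[qid] += 1
        yes[qid] += cancer == "Yes"
    return all(Fraction(yes[q], total[q]) <= Fraction(1, 2) for q in total)


def candidates():
    """Original tables consistent with Knowledge 1 and Knowledge 3.

    The Q class holds two q1 and two q2 tuples with two 'Yes' in total;
    k is the number of 'Yes' values that fall on the q1 tuples.
    """
    for k in range(3):
        yield (
            [("q1", "Yes")] * k + [("q1", "None")] * (2 - k)
            + [("q2", "Yes")] * (2 - k) + [("q2", "None")] * k
            + [("q2", "None")] * 2  # the two q2 tuples released unchanged
        )


# Knowledge 4: a table that was already 2-diverse would not have been generalized.
surviving = [t for t in candidates() if not is_2_diverse(t)]

# Knowledge 2: Peter has QID q1, so count 'Yes' among the q1 tuples that remain.
q1_values = [c for t in surviving for q, c in t if q == "q1"]
peter_prob = Fraction(q1_values.count("Yes"), len(q1_values))
print(len(surviving), peter_prob)
```

Under this assumption only Poss. 1 survives, so the adversary links Peter to cancer with probability 1 even though the released table is 2-diverse.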