
Privacy Definitions: Beyond Anonymity (CompSci 590.03)



  1. Privacy Definitions: Beyond Anonymity
      CompSci 590.03. Instructor: Ashwin Machanavajjhala. Lecture 5 : 590.03 Fall 12

  2. Announcements
      • Some new project ideas added.
      • Please meet with me at least once before you finalize your project (deadline Sep 28).

  3. Outline
      • Does k-anonymity guarantee privacy?
      • L-diversity
      • T-closeness

  4. Data Publishing
      Publish information that:
      • Discloses as much statistical information as possible.
      • Preserves the privacy of the individuals contributing the data.
      [Figure: Patient 1, Patient 2, Patient 3, …, Patient N contribute records r1, r2, r3, …, rN to a hospital database DB; the hospital publishes properties of {r1, r2, …, rN}.]

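  5. Privacy Breach: linking identity to sensitive info.

      | Zip   | Age | Nationality | Disease |
      |-------|-----|-------------|---------|
      | 13053 | 28  | Russian     | Heart   |
      | 13068 | 29  | American    | Heart   |
      | 13068 | 21  | Japanese    | Flu     |
      | 13053 | 23  | American    | Flu     |
      | 14853 | 50  | Indian      | Cancer  |
      | 14853 | 55  | Russian     | Heart   |
      | 14850 | 47  | American    | Flu     |
      | 14850 | 59  | American    | Flu     |
      | 13053 | 31  | American    | Cancer  |
      | 13053 | 37  | Indian      | Cancer  |
      | 13068 | 36  | Japanese    | Cancer  |
      | 13068 | 32  | American    | Cancer  |

      Slide labels: "Quasi-Identifier" (the Zip, Age, and Nationality columns) and "Public Information".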

  6. k-Anonymity using Generalization
      • Quasi-identifiers (Q-ID) can identify individuals in the population.
      • A table T* is k-anonymous if each SELECT COUNT(*) FROM T* GROUP BY Q-ID is ≥ k.
      • Parameter k indicates the "degree" of anonymity.

      | Zip   | Age   | Nationality | Disease |
      |-------|-------|-------------|---------|
      | 130** | <30   | *           | Heart   |
      | 130** | <30   | *           | Heart   |
      | 130** | <30   | *           | Flu     |
      | 130** | <30   | *           | Flu     |
      | 1485* | >40   | *           | Cancer  |
      | 1485* | >40   | *           | Heart   |
      | 1485* | >40   | *           | Flu     |
      | 1485* | >40   | *           | Flu     |
      | 130** | 30-40 | *           | Cancer  |
      | 130** | 30-40 | *           | Cancer  |
      | 130** | 30-40 | *           | Cancer  |
      | 130** | 30-40 | *           | Cancer  |
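
The GROUP BY condition on slide 6 can be checked mechanically. The following is a minimal Python sketch, not part of the lecture: the function names `generalize` and `is_k_anonymous` are hypothetical, and the generalization rules are assumptions chosen only to reproduce the generalized table from the raw rows on slide 5.

```python
from collections import Counter

# Raw rows from the slide 5 example: (zip, age, nationality, disease).
RAW = [
    ("13053", 28, "Russian", "Heart"),
    ("13068", 29, "American", "Heart"),
    ("13068", 21, "Japanese", "Flu"),
    ("13053", 23, "American", "Flu"),
    ("14853", 50, "Indian", "Cancer"),
    ("14853", 55, "Russian", "Heart"),
    ("14850", 47, "American", "Flu"),
    ("14850", 59, "American", "Flu"),
    ("13053", 31, "American", "Cancer"),
    ("13053", 37, "Indian", "Cancer"),
    ("13068", 36, "Japanese", "Cancer"),
    ("13068", 32, "American", "Cancer"),
]


def generalize(row):
    """Assumed generalization rules that reproduce the slide 6 table."""
    zip_code, age, _nationality, disease = row
    # Keep four zip digits for 1485x, three digits otherwise.
    zip_gen = "1485*" if zip_code.startswith("1485") else zip_code[:3] + "**"
    if age < 30:
        age_gen = "<30"
    elif age <= 40:
        age_gen = "30-40"
    else:
        age_gen = ">40"
    # Nationality is suppressed entirely.
    return (zip_gen, age_gen, "*", disease)


def is_k_anonymous(rows, k, qid_len=3):
    """True if every Q-ID combination occurs in at least k rows.

    Mirrors: SELECT COUNT(*) FROM T* GROUP BY Q-ID, each count >= k.
    The first qid_len fields of each row are treated as the Q-ID.
    """
    counts = Counter(row[:qid_len] for row in rows)
    return all(count >= k for count in counts.values())


T_STAR = [generalize(row) for row in RAW]

print(is_k_anonymous(RAW, 2))     # raw Q-ID combinations are all unique
print(is_k_anonymous(T_STAR, 4))  # each generalized group has 4 rows
```

Under these assumptions the raw table fails even for k = 2, while the generalized table passes for k = 4 and fails for k = 5.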

  7. k-Anonymity: A popular privacy definition
      Complexity
      – Optimal k-anonymization is NP-hard
      – An O(log k)-approximation algorithm exists
      Algorithms
      – Incognito (use monotonicity to prune the generalization lattice)
      – Mondrian (multidimensional partitioning)
      – Hilbert (convert the multidimensional problem into a 1-d problem)
      – …

  8. Does k-Anonymity guarantee sufficient privacy?

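  9. Attack 1: Homogeneity
      Published table:

      | Zip   | Age   | Nat. | Disease |
      |-------|-------|------|---------|
      | 130** | <30   | *    | Heart   |
      | 130** | <30   | *    | Heart   |
      | 130** | <30   | *    | Flu     |
      | 130** | <30   | *    | Flu     |
      | 1485* | >40   | *    | Cancer  |
      | 1485* | >40   | *    | Heart   |
      | 1485* | >40   | *    | Flu     |
      | 1485* | >40   | *    | Flu     |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |

      Adversary's knowledge:

      | Name | Zip   | Age | Nat. |
      |------|-------|-----|------|
      | Bob  | 13053 | 35  | ??   |

      Conclusion: Bob has Cancer.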

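  10. Attack 2: Background knowledge
      Published table:

      | Zip   | Age   | Nat. | Disease |
      |-------|-------|------|---------|
      | 130** | <30   | *    | Heart   |
      | 130** | <30   | *    | Heart   |
      | 130** | <30   | *    | Flu     |
      | 130** | <30   | *    | Flu     |
      | 1485* | >40   | *    | Cancer  |
      | 1485* | >40   | *    | Heart   |
      | 1485* | >40   | *    | Flu     |
      | 1485* | >40   | *    | Flu     |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |

      Adversary's knowledge:

      | Name  | Zip   | Age | Nat.  |
      |-------|-------|-----|-------|
      | Umeko | 13068 | 24  | Japan |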

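  11. Attack 2: Background knowledge
      Published table:

      | Zip   | Age   | Nat. | Disease |
      |-------|-------|------|---------|
      | 130** | <30   | *    | Heart   |
      | 130** | <30   | *    | Heart   |
      | 130** | <30   | *    | Flu     |
      | 130** | <30   | *    | Flu     |
      | 1485* | >40   | *    | Cancer  |
      | 1485* | >40   | *    | Heart   |
      | 1485* | >40   | *    | Flu     |
      | 1485* | >40   | *    | Flu     |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |

      Adversary's knowledge:

      | Name  | Zip   | Age | Nat.  |
      |-------|-------|-----|-------|
      | Umeko | 13068 | 24  | Japan |

      Background knowledge: Japanese have a very low incidence of Heart disease.
      Conclusion: Umeko has Flu.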

  12. Q: How do we ensure the privacy of published data?
      Method 1: Breach and Patch
      Cycle: Identify privacy breach → Design a new algorithm to fix the privacy breach.
      • The MA Governor breach and the AOL privacy breach were caused by re-identifying individuals.
      • k-Anonymity only considers the risk of re-identification.
      • Adversaries with background knowledge can breach privacy even without re-identifying individuals.

  13. Limitations of the Breach and Patch methodology.
      Method 1: Breach and Patch
      Cycle: Identify privacy breach → Design a new algorithm to fix the privacy breach.
      1. A data publisher may not be able to enumerate all the possible privacy breaches.
      2. A data publisher does not know what other privacy breaches are possible.

  14. Q: How do we ensure the privacy of published data?
      Method 1: Breach and Patch
      Cycle: Identify privacy breach → Design a new algorithm to fix the privacy breach.
      Method 2: Define and Design
      Steps: Formally specify the privacy model → Derive conditions for privacy → Design an algorithm that satisfies the privacy conditions.

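  15. Recall the attacks on k-Anonymity
      Published table:

      | Zip   | Age   | Nat. | Disease |
      |-------|-------|------|---------|
      | 130** | <30   | *    | Heart   |
      | 130** | <30   | *    | Heart   |
      | 130** | <30   | *    | Flu     |
      | 130** | <30   | *    | Flu     |
      | 1485* | >40   | *    | Cancer  |
      | 1485* | >40   | *    | Heart   |
      | 1485* | >40   | *    | Flu     |
      | 1485* | >40   | *    | Flu     |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |
      | 130** | 30-40 | *    | Cancer  |

      Adversary's knowledge:

      | Name  | Zip   | Age | Nat.  |
      |-------|-------|-----|-------|
      | Umeko | 13068 | 24  | Japan |
      | Bob   | 13053 | 35  | ??    |

      Background knowledge: Japanese have a very low incidence of Heart disease.
      Conclusions: Umeko has Flu. Bob has Cancer.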

  16. 3-Diverse Table
      L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct sensitive values of roughly equal proportions.

      | Zip   | Age  | Nat. | Disease |
      |-------|------|------|---------|
      | 1306* | <=40 | *    | Heart   |
      | 1306* | <=40 | *    | Flu     |
      | 1306* | <=40 | *    | Cancer  |
      | 1306* | <=40 | *    | Cancer  |
      | 1485* | >40  | *    | Cancer  |
      | 1485* | >40  | *    | Heart   |
      | 1485* | >40  | *    | Flu     |
      | 1485* | >40  | *    | Flu     |
      | 1305* | <=40 | *    | Heart   |
      | 1305* | <=40 | *    | Flu     |
      | 1305* | <=40 | *    | Cancer  |
      | 1305* | <=40 | *    | Cancer  |

      Adversary's knowledge:

      | Name  | Zip   | Age | Nat.  |
      |-------|-------|-----|-------|
      | Umeko | 13068 | 24  | Japan |
      | Bob   | 13053 | 35  | ??    |

      Background knowledge: Japanese have a very low incidence of Heart disease.
      Conclusions: Umeko has ??. Bob has ??.
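
As a rough illustration of the L-Diversity Principle on slide 16, the sketch below counts distinct sensitive values per Q-ID group. It is an assumption-laden simplification: the helper name `distinct_diversity` is hypothetical, and it checks only the "≥ L distinct values" part of the principle, not the "roughly equal proportions" part.

```python
from collections import defaultdict

# Generalized rows: (zip, age, nationality, disease).
T_3DIV = (
    [("1306*", "<=40", "*", d) for d in ("Heart", "Flu", "Cancer", "Cancer")]
    + [("1485*", ">40", "*", d) for d in ("Cancer", "Heart", "Flu", "Flu")]
    + [("1305*", "<=40", "*", d) for d in ("Heart", "Flu", "Cancer", "Cancer")]
)

# The 4-anonymous table used in the attack slides, for comparison.
T_4ANON = (
    [("130**", "<30", "*", d) for d in ("Heart", "Heart", "Flu", "Flu")]
    + [("1485*", ">40", "*", d) for d in ("Cancer", "Heart", "Flu", "Flu")]
    + [("130**", "30-40", "*", d) for d in ("Cancer",) * 4]
)


def distinct_diversity(rows, qid_len=3):
    """Smallest number of distinct sensitive values in any Q-ID group.

    The first qid_len fields form the Q-ID; the last field is the
    sensitive attribute. Proportions within a group are ignored.
    """
    groups = defaultdict(set)
    for row in rows:
        groups[row[:qid_len]].add(row[-1])
    return min(len(values) for values in groups.values())


print(distinct_diversity(T_3DIV))   # every group has Heart, Flu, and Cancer
print(distinct_diversity(T_4ANON))  # the 30-40 group contains only Cancer
```

A result of 1 for the 4-anonymous table corresponds to the homogeneity attack on Bob: his group contains a single sensitive value.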

  17. L-Diversity: Privacy Beyond K-Anonymity [Machanavajjhala et al ICDE 2006]
      L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct "well represented" sensitive values.
      Questions:
      • What kind of adversarial attacks do we guard against?
      • Why is this the right definition for privacy?
        – What does the parameter L signify?

  18. Method 2: Define and Design
      • Formally specify the privacy model
        1. Which information is sensitive?
        2. What does the adversary know?
        3. How is the disclosure quantified?
      • Derive conditions for privacy: L-Diversity
      • Design an algorithm that satisfies the privacy conditions: L-Diverse Generalization

  19. Privacy Specification for L-Diversity
      • The link between identity and attribute value is the sensitive information.
        "Does Bob have Cancer? Heart disease? Flu?"
        "Does Umeko have Cancer? Heart disease? Flu?"
      • Adversary knows ≤ L-2 negation statements, each of the form "Individual u does not have a specific disease s".
        "Umeko does not have Heart Disease."
        – Data Publisher may not know exact adversarial knowledge
      • Privacy is breached when identity can be linked to attribute value with high probability:
        Pr["Bob has Cancer" | published table, adv. knowledge] > t

  20. Method 2: Define and Design
      • Formally specify the privacy model
        1. Which information is sensitive?
        2. What does the adversary know?
        3. How is the disclosure quantified?
      • Derive conditions for privacy: L-Diversity
      • Design an algorithm that satisfies the privacy conditions: L-Diverse Generalization

  21. Calculating Probabilities
      Set of all possible worlds. Every world represents a unique assignment of diseases to individuals.

      | Name   | World 1 | World 2 | World 3 | World 4 | World 5 | … |
      |--------|---------|---------|---------|---------|---------|---|
      | Sasha  | Cancer  | Heart   | Heart   | Flu     | Heart   |   |
      | Tom    | Cancer  | Heart   | Flu     | Heart   | Flu     |   |
      | Umeko  | Cancer  | Flu     | Flu     | Heart   | Heart   |   |
      | Van    | Cancer  | Flu     | Heart   | Flu     | Flu     |   |
      | Amar   | Cancer  | Cancer  | Heart   | Cancer  | Flu     |   |
      | Boris  | Cancer  | Heart   | Cancer  | Flu     | Heart   |   |
      | Carol  | Cancer  | Flu     | Flu     | Heart   | Flu     |   |
      | Dave   | Cancer  | Flu     | Flu     | Flu     | Cancer  |   |
      | Bob    | Cancer  | Cancer  | Cancer  | Cancer  | Cancer  |   |
      | Charan | Cancer  | Cancer  | Cancer  | Cancer  | Cancer  |   |
      | Daiki  | Cancer  | Cancer  | Cancer  | Cancer  | Cancer  |   |
      | Ellen  | Cancer  | Cancer  | Cancer  | Cancer  | Cancer  |   |

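  22. Calculating Probabilities
      Set of all possible worlds, and within it the set of worlds consistent with T*.

      | Name   | World 1 | World 2 | World 3 | World 4 | World 5 | … |
      |--------|---------|---------|---------|---------|---------|---|
      | Sasha  | Cancer  | Heart   | Heart   | Flu     | Heart   |   |
      | Tom    | Cancer  | Heart   | Flu     | Heart   | Flu     |   |
      | Umeko  | Cancer  | Flu     | Flu     | Heart   | Heart   |   |
      | Van    | Cancer  | Flu     | Heart   | Flu     | Flu     |   |
      | Amar   | Cancer  | Cancer  | Heart   | Cancer  | Flu     |   |
      | Boris  | Cancer  | Heart   | Cancer  | Flu     | Heart   |   |
      | Carol  | Cancer  | Flu     | Flu     | Heart   | Flu     |   |
      | Dave   | Cancer  | Flu     | Flu     | Flu     | Cancer  |   |
      | Bob    | Heart   | Cancer  | Cancer  | Cancer  | Cancer  |   |
      | Charan | Flu     | Cancer  | Cancer  | Cancer  | Cancer  |   |
      | Daiki  | Cancer  | Cancer  | Cancer  | Cancer  | Cancer  |   |
      | Ellen  | Cancer  | Cancer  | Cancer  | Cancer  | Cancer  |   |

      Disease counts per group in T*:

      | Group                     | Cancer | Heart | Flu |
      |---------------------------|--------|-------|-----|
      | Sasha, Tom, Umeko, Van    | 0      | 2     | 2   |
      | Amar, Boris, Carol, Dave  | 1      | 1     | 2   |
      | Bob, Charan, Daiki, Ellen | 4      | 0     | 0   |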

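  23. Calculating Probabilities
      B: Umeko.Disease ≠ Heart

      Pr[Umeko has Flu | B, T*] = (# worlds consistent with B, T* where Umeko has Flu) / (# worlds consistent with B, T*) = 1

      Set of worlds consistent with T*, and within it the set of worlds consistent with B, T*:

      | Name   | World 2 | World 3 | World 4 | World 5 | … |
      |--------|---------|---------|---------|---------|---|
      | Sasha  | Heart   | Heart   | Flu     | Heart   |   |
      | Tom    | Heart   | Flu     | Heart   | Flu     |   |
      | Umeko  | Flu     | Flu     | Heart   | Heart   |   |
      | Van    | Flu     | Heart   | Flu     | Flu     |   |
      | Amar   | Cancer  | Heart   | Cancer  | Flu     |   |
      | Boris  | Heart   | Cancer  | Flu     | Heart   |   |
      | Carol  | Flu     | Flu     | Heart   | Flu     |   |
      | Dave   | Flu     | Flu     | Flu     | Cancer  |   |
      | Bob    | Cancer  | Cancer  | Cancer  | Cancer  |   |
      | Charan | Cancer  | Cancer  | Cancer  | Cancer  |   |
      | Daiki  | Cancer  | Cancer  | Cancer  | Cancer  |   |
      | Ellen  | Cancer  | Cancer  | Cancer  | Cancer  |   |

      Disease counts per group in T*:

      | Group                     | Cancer | Heart | Flu |
      |---------------------------|--------|-------|-----|
      | Sasha, Tom, Umeko, Van    | 0      | 2     | 2   |
      | Amar, Boris, Carol, Dave  | 1      | 1     | 2   |
      | Bob, Charan, Daiki, Ellen | 4      | 0     | 0   |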
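
The world-counting ratio on slide 23 can be reproduced by brute force. The sketch below is illustrative only and rests on two assumptions: every world consistent with T* is equally likely, and it is enough to enumerate Umeko's group, since the assignments in the other groups multiply numerator and denominator by the same factor. The names `worlds_consistent_with_table` and `posterior` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Umeko's group in T*: Cancer 0, Heart 2, Flu 2.
GROUP = ["Sasha", "Tom", "Umeko", "Van"]
DISEASES = ["Heart", "Heart", "Flu", "Flu"]


def worlds_consistent_with_table(people, diseases):
    """All distinct assignments of the group's diseases to its members."""
    # set() removes duplicates caused by permuting identical values.
    assignments = sorted(set(permutations(diseases)))
    return [dict(zip(people, assignment)) for assignment in assignments]


def posterior(worlds, person, disease, background=lambda world: True):
    """Fraction of background-consistent worlds where person has disease.

    Assumes a uniform distribution over worlds. Raises ZeroDivisionError
    if no world is consistent with the background knowledge.
    """
    consistent = [world for world in worlds if background(world)]
    hits = [world for world in consistent if world[person] == disease]
    return Fraction(len(hits), len(consistent))


worlds = worlds_consistent_with_table(GROUP, DISEASES)


def umeko_not_heart(world):
    """Background knowledge B: Umeko.Disease != Heart."""
    return world["Umeko"] != "Heart"


print(len(worlds))                                         # 4! / (2! * 2!)
print(posterior(worlds, "Umeko", "Flu"))                   # without B
print(posterior(worlds, "Umeko", "Flu", umeko_not_heart))  # with B
```

Without B, half of the six worlds give Umeko Flu; with B, only the three Flu worlds remain, which matches the slide's result of 1.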
