Simulatability
“The enemy knows the system.” (Claude Shannon)
CompSci 590.03, Lecture 6
Instructor: Ashwin Machanavajjhala
Announcements
• Please meet with me at least two times before you finalize your project (deadline Sep 28).
Recap – L-Diversity
• The link between identity and attribute value is the sensitive information.
  “Does Bob have Cancer? Heart disease? Flu?”
  “Does Umeko have Cancer? Heart disease? Flu?”
• The adversary knows at most L-2 negation statements.
  “Umeko does not have Heart Disease.”
  – The data publisher may not know the exact adversarial knowledge.
• Privacy is breached when identity can be linked to an attribute value with high probability:
  Pr[“Bob has Cancer” | published table, adv. knowledge] > t
Recap – 3-Diverse Table

Zip    Age    Nat.  Disease
1306*  <=40   *     Heart
1306*  <=40   *     Flu
1306*  <=40   *     Cancer
1306*  <=40   *     Cancer
1485*  >40    *     Cancer
1485*  >40    *     Heart
1485*  >40    *     Flu
1485*  >40    *     Flu
1305*  <=40   *     Heart
1305*  <=40   *     Flu
1305*  <=40   *     Cancer
1305*  <=40   *     Cancer

L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct sensitive values of roughly equal proportions.
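As an aside (not on the original slide), here is a minimal sketch of how the distinct-values part of this principle can be checked. The table encoding and function name are illustrative, and the check deliberately ignores the “roughly equal proportions” condition.

```python
from collections import defaultdict

def is_l_diverse(rows, l):
    """Check the distinct-values form of L-diversity on a generalized table.

    rows: list of (qid_tuple, sensitive_value) pairs, where qid_tuple is the
    generalized quasi-identifier of the row. Every Q-ID group must contain at
    least l distinct sensitive values (the "roughly equal proportions" part of
    the principle is not checked here).
    """
    groups = defaultdict(set)
    for qid, sensitive in rows:
        groups[qid].add(sensitive)
    return all(len(values) >= l for values in groups.values())

# One Q-ID group of the 3-diverse table above.
rows = [(("1306*", "<=40", "*"), "Heart"), (("1306*", "<=40", "*"), "Flu"),
        (("1306*", "<=40", "*"), "Cancer"), (("1306*", "<=40", "*"), "Cancer")]
print(is_l_diverse(rows, 3))   # True: the group has 3 distinct sensitive values
```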
Outline
• Simulatable Auditing
• Minimality Attack in anonymization
• Simulatable algorithms for anonymization
Query Auditing

[Diagram: a Researcher sends a query to the Database; an auditor decides “Safe to publish?” and either answers (Yes) or denies (No).]

• The database has numeric values (say, salaries of employees).
• The database either truthfully answers a query or denies answering.
• Queries are MIN, MAX, SUM over subsets of the database.
• Question: When should queries be allowed or denied?
Why should we deny queries?

Name  1st year PhD  Gender  Sensitive value
Ben   Y             M       1
Bha   N             M       1
Ios   Y             M       1
Jan   N             M       2
Jian  Y             M       2
Jie   N             M       1
Joe   N             M       2
Moh   N             M       1
Son   N             F       1
Xi    Y             F       3
Yao   N             M       2

• Q1: Ben’s sensitive value? – DENY
• Q2: Max sensitive value of males? – ANSWER: 2
• Q3: Max sensitive value of 1st year PhD students? – ANSWER: 3
• But Q2 + Q3 => Xi = 3 (the 1st-year maximum of 3 exceeds every male’s value, so it must belong to the only female 1st-year student, Xi).
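A tiny sketch (not on the slide) that replays this combination attack on the table above; the tuple encoding is illustrative.

```python
# Each row: (name, is_first_year_phd, gender, sensitive_value).
table = [
    ("Ben", True, "M", 1), ("Bha", False, "M", 1), ("Ios", True, "M", 1),
    ("Jan", False, "M", 2), ("Jian", True, "M", 2), ("Jie", False, "M", 1),
    ("Joe", False, "M", 2), ("Moh", False, "M", 1), ("Son", False, "F", 1),
    ("Xi", True, "F", 3), ("Yao", False, "M", 2),
]

q2 = max(v for _, _, g, v in table if g == "M")    # max over males     -> 2
q3 = max(v for _, fy, _, v in table if fy)         # max over 1st years -> 3

# q3 > q2, so the 1st-year maximum cannot come from a male: it belongs to a
# female 1st-year student, and Xi is the only one.
suspects = [name for name, fy, g, _ in table if fy and g == "F"]
print(q2, q3, suspects)                            # 2 3 ['Xi']
```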
Value-Based Auditing
• Let a1, a2, …, ak be the answers to previous queries Q1, Q2, …, Qk.
• Let a_{k+1} be the answer to Q_{k+1}.
• ai = f(ci1·x1, ci2·x2, …, cin·xn), for i = 1 … k+1, where cim = 1 if Qi depends on xm (and 0 otherwise).
• Check whether any xj now has a unique solution; if so, deny Q_{k+1}.
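A minimal sketch (not from the slides) of a value-based auditor specialized to MAX queries: it computes the new answer first and denies if the combined answers pin some value down. Function names and the example data are illustrative; real auditors also handle MIN/SUM queries and prior bounds.

```python
import math

def max_upper_bounds(queries, n):
    """mu[j] = tightest upper bound on x_j implied by answered MAX queries.

    queries: list of (index_set, answer) pairs for MAX queries already answered.
    """
    mu = [math.inf] * n
    for s, a in queries:
        for j in s:
            mu[j] = min(mu[j], a)
    return mu

def pinned_values(queries, n):
    """Values uniquely determined by the answered MAX queries.

    If j is the only member of a query set whose upper bound still allows the
    query's answer a, then x_j must equal a.
    """
    mu = max_upper_bounds(queries, n)
    pinned = {}
    for s, a in queries:
        witnesses = [j for j in s if mu[j] >= a]
        if len(witnesses) == 1:
            pinned[witnesses[0]] = a
    return pinned

def value_based_audit(answered, new_set, new_answer, n):
    """Value-based auditor: looks at the true new answer, then decides."""
    if pinned_values(answered + [(new_set, new_answer)], n):
        return "DENY"
    return "ANSWER"

# The running example from the next few slides: x5 = 10, the rest at most 8.
x = [3, 8, 5, 7, 10]                                  # hidden data, indices 0..4
answered = []
q1 = ({0, 1, 2, 3, 4}, max(x[j] for j in {0, 1, 2, 3, 4}))
print(value_based_audit(answered, *q1, 5))            # ANSWER (max = 10)
answered.append(q1)
q2 = ({0, 1, 2, 3}, max(x[j] for j in {0, 1, 2, 3}))
print(value_based_audit(answered, *q2, 5))            # DENY: answering 8 would pin x5 = 10
# But the denial itself tells the attacker that max(x1..x4) < 10, hence x5 = 10.
```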
Value-based Auditing
• Data values: {x1, x2, x3, x4, x5}; queries: MAX.
• Allow a query only if the value of no xi can be inferred.
[Diagram: the five hidden values x1, …, x5.]
Value-based Auditing
• Data values: {x1, x2, x3, x4, x5}; queries: MAX.
• Allow a query only if the value of no xi can be inferred.
Query 1: max(x1, x2, x3, x4, x5). Answer: 10.
Inference so far: -∞ ≤ x1, …, x5 ≤ 10.
[Diagram: the five hidden values; x5 = 10.]
Value-based Auditing
• Data values: {x1, x2, x3, x4, x5}; queries: MAX.
• Allow a query only if the value of no xi can be inferred.
Query 1: max(x1, x2, x3, x4, x5). Answer: 10.
Query 2: max(x1, x2, x3, x4). Answer: 8, which would imply -∞ ≤ x1, …, x4 ≤ 8 and hence x5 = 10 => DENY.
Value-based Auditing
• Data values: {x1, x2, x3, x4, x5}; queries: MAX.
• Allow a query only if the value of no xi can be inferred.
Query 1: max(x1, x2, x3, x4, x5). Answer: 10.
Query 2: max(x1, x2, x3, x4). Answer: 8 => DENY.
A denial means some value could be compromised!
Value-based Auditing: the attacker’s view of the denial
• Query 1: max(x1, x2, x3, x4, x5). Answer: 10.
• Query 2: max(x1, x2, x3, x4). DENIED.
• What could max(x1, x2, x3, x4) be?
• From the first answer, max(x1, x2, x3, x4) ≤ 10.
• If max(x1, x2, x3, x4) = 10, then there is no privacy breach (so the query would have been answered).
• Hence the denial means max(x1, x2, x3, x4) < 10 => x5 = 10!
Value-based Auditing
• Denials leak information.
• The attack occurred because the privacy analysis did not assume that the attacker knows the auditing algorithm.
Simulatable Auditing [Kenthapadi et al., PODS ’05]
• An auditor is simulatable if the decision to deny a query Qk is made based only on information already available to the attacker.
  – It can use the queries Q1, Q2, …, Qk and the answers a1, a2, …, a_{k-1}.
  – It cannot use ak or the actual data to make the decision.
• Denials provably do not leak information,
  – because the attacker could equivalently determine whether the query would be denied:
  – the attacker can mimic, or simulate, the auditor.
Simulatable Auditing Algorithm
• Data values: {x1, x2, x3, x4, x5}; queries: MAX.
• Allow a query only if the value of no xi can be inferred.
Query 1: max(x1, x2, x3, x4, x5). Answer: 10.
Query 2: max(x1, x2, x3, x4). Before computing the answer, consider every possible answer:
  – Ans > 10 => not possible
  – Ans = 10 => -∞ ≤ x1, …, x4 ≤ 10: SAFE
  – Ans < 10 => x5 = 10: UNSAFE
Since some possible answer is unsafe, DENY before computing the answer.
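A minimal sketch in the spirit of this slide (not the exact algorithm from Kenthapadi et al.) of a simulatable auditor for MAX queries: it decides using only the past queries and answers, by trying representative candidate answers for the new query and denying if any of them would pin a value down. The names, the epsilon-based stand-in for “strictly smaller”, and the example are illustrative.

```python
import math

def max_upper_bounds(queries, n):
    """mu[j] = tightest upper bound on x_j implied by answered MAX queries."""
    mu = [math.inf] * n
    for s, a in queries:
        for j in s:
            mu[j] = min(mu[j], a)
    return mu

def leaks(queries, n):
    """True if the answered MAX queries pin some x_j to a single value."""
    mu = max_upper_bounds(queries, n)
    return any(len([j for j in s if mu[j] >= a]) == 1 for s, a in queries)

def simulatable_audit(answered, new_set, n):
    """Simulatable auditor: decide BEFORE seeing the data or the new answer.

    It enumerates representative candidate answers consistent with the past
    answers and denies if any of them would be unsafe. Because the decision
    uses only information the attacker already has, a denial reveals nothing.
    """
    mu = max_upper_bounds(answered, n)
    bound = max(mu[j] for j in new_set)       # any consistent answer is <= bound
    if bound == math.inf:
        return "ANSWER"                       # values in the query are still unconstrained
    # Candidate answers: every relevant breakpoint, plus a value just below it
    # (a crude stand-in for "any strictly smaller answer").
    points = {mu[j] for j in new_set} | {a for _, a in answered}
    candidates = [c for v in sorted(p for p in points if p <= bound)
                  for c in (v, v - 1e-9)]
    if any(leaks(answered + [(new_set, a)], n) for a in candidates):
        return "DENY"
    return "ANSWER"

# The running example: after answering max(x1..x5) = 10, the auditor denies
# max(x1..x4) without ever computing its answer.
print(simulatable_audit([], {0, 1, 2, 3, 4}, 5))                      # ANSWER
print(simulatable_audit([({0, 1, 2, 3, 4}, 10.0)], {0, 1, 2, 3}, 5))  # DENY
```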
Summary of Simulatable Auditing
• In many cases, the decision to deny a query must be based only on the queries already answered (and their answers), not on the actual data or the new answer.
• Denials can leak information if the adversary does not know all the information that is used to decide whether to deny the query.
Outline
• Simulatable Auditing
• Minimality Attack in anonymization
• Simulatable algorithms for anonymization
Minimality attack on generalization algorithms
• Algorithms for K-anonymity, L-diversity, T-closeness, etc. try to maximize utility.
  – They find a minimally generalized table in the lattice that satisfies the privacy condition and maximizes utility.
• But … the attacker also knows this algorithm!
Example Minimality attack [Wong et al., VLDB ’07]
• Dataset with one quasi-identifier with 2 values, q1 and q2.
• q1 and q2 generalize to Q.
• Sensitive attribute: Cancer – yes/no.
• We want to ensure P[Cancer = yes] < ½.
  – It is OK to learn that an individual does not have Cancer.
• Published table:

QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No
Which input datasets could have led to the published table?

Possible input datasets (“2-diverse”) with 3 occurrences of q1:

Possible input 1:
QID  Cancer
q1   Yes
q1   Yes
q1   No
q2   No
q2   No
q2   No

Possible input 2:
QID  Cancer
q1   Yes
q1   No
q1   No
q2   Yes
q2   No
q2   No

Output dataset ({q1, q2} → Q):
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No
Which input datasets could have led to the published table?

Case: 3 occurrences of q1. A less generalized candidate table:

QID  Cancer
q1   Yes
Q    No
Q    No
q2   Yes
q2   No
q2   No

This is a better generalization! (Compare with the published output dataset, {q1, q2} → Q:)

QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No
Which input datasets could have led to the published table?

Possible input datasets (“2-diverse”) with 1 occurrence of q1:

Possible input 1:
QID  Cancer
q2   Yes
q1   Yes
q2   No
q2   No
q2   No
q2   No

Possible input 2:
QID  Cancer
q2   Yes
q2   Yes
q1   No
q2   No
q2   No
q2   No

Output dataset ({q1, q2} → Q):
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No
Which input datasets could have led to the published table?

Case: 1 occurrence of q1. A less generalized candidate table:

QID  Cancer
q2   Yes
Q    No
Q    No
q2   Yes
q2   No
q2   No

This is a better generalization! (Compare with the published output dataset, {q1, q2} → Q:)

QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No
Which input datasets could have led to the published table?

A less generalized candidate table:
QID  Cancer
q2   Yes
Q    No
Q    No
q2   Yes
q2   No
q2   No

Published output dataset ({q1, q2} → Q):
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No

=> There must be exactly two tuples with q1.
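To make the “attacker also knows the algorithm” reasoning concrete, here is a hedged brute-force sketch (not from the slides): given any deterministic anonymization procedure, the attacker simulates it on every candidate raw table, keeps the candidates that would have produced the published table, and computes the posterior probability that a q1 individual has Cancer. The `anonymize` argument and the candidate space are placeholders for whatever minimal-generalization algorithm the publisher actually used.

```python
from itertools import product

def minimality_posterior(anonymize, published, num_q1, num_q2):
    """Posterior Pr[the first q1 individual has Cancer | published table].

    anonymize: deterministic function raw_table -> published table; the attacker
    is assumed to know it (the publisher's minimal-generalization algorithm).
    A raw table is a list of (qid, has_cancer) pairs, q1 individuals first.
    A uniform prior over the candidate raw tables is assumed.
    """
    consistent = []
    for bits in product([True, False], repeat=num_q1 + num_q2):
        raw = [("q1" if i < num_q1 else "q2", bits[i])
               for i in range(num_q1 + num_q2)]
        if anonymize(raw) == published:     # would the known algorithm publish this?
            consistent.append(raw)
    if not consistent:
        return None                         # no candidate explains the output
    return sum(raw[0][1] for raw in consistent) / len(consistent)
```

In the example above, with two q1 individuals and a minimality-respecting `anonymize`, this kind of reasoning pushes the posterior for the q1 individuals above the intended bound of ½, which is exactly the breach the minimality attack exploits.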