Building complex DP algorithms using composition
Privacy & Fairness in Data Science, CS848, Fall 2019
2 Outline
• Recap
  – Laplace Mechanism
• Composition Theorems
• Optimizing accuracy of DP algorithms
  – Utilizing Parallel Composition
  – Postprocessing & Inference
  – Strategy Selection
  – Data dependent noise
3 Differential Privacy [Dwork ICALP 2006]
For every pair of inputs $D_1$, $D_2$ that differ in one row, and for every output $O$, the adversary should not be able to distinguish between $D_1$ and $D_2$ based on $O$:
$$\forall\, \Omega \subseteq \mathrm{range}(A):\quad \ln \frac{\Pr[A(D_1) \in \Omega]}{\Pr[A(D_2) \in \Omega]} \le \varepsilon, \qquad \varepsilon > 0$$
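4 Laplace mechanism
The analyst sends an aggregate query $q$ (e.g., COUNT) to the private database $D$ and receives a noisy answer:
$$\tilde{q}(D) = q(D) + \mathrm{Lap}\!\left(\frac{S(q)}{\varepsilon}\right)$$
where $S(q)$ is the sensitivity of $q$.
[Figure: density of the Laplace distribution, horizontal axis from -10 to 10]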
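A minimal Python sketch of the Laplace mechanism for a COUNT query follows. The function names (`laplace_noise`, `noisy_count`) and the toy rows are illustrative assumptions, not part of the slides. The noise is drawn as the difference of two exponential variables, which is Laplace-distributed with the requested scale.

```python
import random

def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(rows, predicate, epsilon, rng, sensitivity=1.0):
    """Return q(D) + Lap(S(q)/epsilon) for the counting query defined by `predicate`."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical toy database: one row per person.
rows = [{"sex": "M"}, {"sex": "F"}, {"sex": "F"}, {"sex": "M"}, {"sex": "M"}]
rng = random.Random(0)  # seeded only so the sketch is repeatable
answer = noisy_count(rows, lambda r: r["sex"] == "M", epsilon=0.5, rng=rng)
```

A seeded pseudo-random generator is convenient for a demonstration but would not be appropriate for a real deployment, where the noise must be unpredictable.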
6 Sequential Composition
[Figure: $M_1$ (with $\varepsilon_1$) and $M_2$ (with $\varepsilon_2$) both access the private database $D$, producing $M_1(D)$, $M_2(D, M_1(D))$, …]
• If $M_1, M_2, \ldots, M_k$ are algorithms that access a private database $D$ such that each $M_i$ satisfies $\varepsilon_i$-differential privacy, then the combination of their outputs satisfies $\varepsilon$-differential privacy with $\varepsilon = \varepsilon_1 + \cdots + \varepsilon_k$.
7 Parallel Composition
[Figure: the private database is split into $D_1$, $D_2$, …; $M_1$ (with $\varepsilon_1$) outputs $M_1(D_1)$, $M_2$ (with $\varepsilon_2$) outputs $M_2(D_2)$, …]
• If $M_1, M_2, \ldots, M_k$ are algorithms that access disjoint databases $D_1, D_2, \ldots, D_k$ such that each $M_i$ satisfies $\varepsilon_i$-differential privacy, then the combination of their outputs satisfies $\varepsilon$-differential privacy with $\varepsilon = \max(\varepsilon_1, \ldots, \varepsilon_k)$.
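The two composition rules reduce to simple bookkeeping over the per-mechanism budgets. The sketch below is only an illustration; the helper names and the example budgets are assumptions, not part of the slides.

```python
def sequential_epsilon(epsilons):
    """Total privacy cost when every mechanism reads the same database."""
    return sum(epsilons)

def parallel_epsilon(epsilons):
    """Total privacy cost when each mechanism reads its own disjoint partition."""
    return max(epsilons)

budgets = [0.1, 0.2, 0.3]  # hypothetical per-mechanism epsilons
total_if_same_data = sequential_epsilon(budgets)
total_if_disjoint_data = parallel_epsilon(budgets)
```

For the same three mechanisms, running them on disjoint partitions costs only the largest budget, whereas running them on the same data costs the sum.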
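8 Postprocessing
[Figure: $M$ (with $\varepsilon$) reads the private database $D$ and outputs $M(D)$; $A$ then computes $A(M(D))$]
• If $M$ is an $\varepsilon$-differentially private algorithm, then any additional post-processing $A \circ M$ also satisfies $\varepsilon$-differential privacy.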
9 Transformations & Stability
[Figure: a transformation $V$ maps the private database $D$ to a transformed database $V(D)$; $M$ (with $\varepsilon$) outputs $M(V(D))$]
The transformation need not satisfy DP.
• $\sigma_V$: Stability of the transformation
  – Maximum number of rows in $V(D)$ that can change due to changing a single row in $D$
10 Transformations & Stability
• Executing an $\varepsilon$-differentially private algorithm $M$ on a transformation $V(D)$ of a database satisfies $(\varepsilon \cdot \sigma_V)$-differential privacy.
• $\sigma_V$: Stability of the transformation
  – Maximum number of rows in $V(D)$ that can change due to changing a single row in $D$
11 Transformations & Stability
• $V_1$: For each row, $(x_1, x_2, x_3) \to (x_1, x_2 + x_3)$. Stability = 1
• $V_2$: Each row in $D$ is a tweet (id, {words}). For each row in $D$, generate $k$ rows with the first $k$ words: {(id, word_1), …, (id, word_k)}. Stability = $k$
• $V_3$: Sample each row with probability $p$. Stability = 1 … but can prove $2p\varepsilon$-differential privacy*
*Adam Smith, Differential Privacy and Secrecy of the Sample
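As a rough illustration of the tweet transformation and of how stability scales the privacy cost, consider the sketch below. The function names and the sample tweets are hypothetical. The second helper simply inverts the stability rule: to end up with a target ε overall, the inner mechanism is given ε divided by the stability.

```python
def first_k_words(tweets, k):
    """Map each tweet (id, words) to at most k rows (id, word); one tweet affects at most k rows."""
    return [(tweet_id, word) for tweet_id, words in tweets for word in words[:k]]

def epsilon_for_inner_mechanism(target_epsilon, stability):
    """Budget to give M so that M(V(D)) satisfies target_epsilon-DP under the stability rule."""
    return target_epsilon / stability

tweets = [(1, ["dp", "is", "fun"]), (2, ["noise", "helps"])]  # hypothetical data
rows = first_k_words(tweets, k=2)
inner_eps = epsilon_for_inner_mechanism(target_epsilon=1.0, stability=2)
```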
13 Problem
Sex   Height   Weight
M     6’2”     210
F     5’3”     190
F     5’9”     160
M     5’3”     180
M     6’7”     250
Queries:
• # Males with BMI < 25
• # Males
• # Females with BMI < 25
• # Females
• Design an ε-differentially private algorithm that can answer all these questions.
• What is the total error?
14 Algorithm 1
Return:
• (# Males with BMI < 25) + Lap(4/ε)
• (# Males) + Lap(4/ε)
• (# Females with BMI < 25) + Lap(4/ε)
• (# Females) + Lap(4/ε)
15 Privacy
• BMI can be computed by transforming each row (s, h, w) → (s, bmi). This transformation has stability 1.
• Sensitivity of count = 1. So each query is answered using an ε/4-DP algorithm.
• By sequential composition, we get ε-DP.
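16 Utility
Error: $\sum_{q} \mathbb{E}\big[(\tilde{q}(D) - q(D))^2\big]$
Total Error: $2\left(\frac{4}{\varepsilon}\right)^2 \times 4 = \frac{128}{\varepsilon^2}$, since the variance of $\mathrm{Lap}(b)$ is $2b^2$ and there are 4 queries.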
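The following Python sketch runs Algorithm 1 on the five rows of the Problem slide and evaluates its expected total squared error. It assumes that weights are in pounds and uses the usual 703 conversion factor for BMI from inches and pounds; all function names are illustrative.

```python
import random

# Rows from the Problem slide: (sex, height in inches, weight). Pounds are an assumption.
ROWS = [("M", 74, 210), ("F", 63, 190), ("F", 69, 160), ("M", 63, 180), ("M", 79, 250)]

def bmi(height_in, weight_lb):
    """Body mass index from inches and pounds."""
    return 703.0 * weight_lb / height_in ** 2

def count_low_bmi(group):
    """Number of rows in `group` with BMI < 25."""
    return sum(1 for _, h, w in group if bmi(h, w) < 25)

def true_answers(rows):
    """(# males with BMI < 25, # males, # females with BMI < 25, # females)."""
    males = [r for r in rows if r[0] == "M"]
    females = [r for r in rows if r[0] == "F"]
    return [count_low_bmi(males), len(males), count_low_bmi(females), len(females)]

def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def algorithm_1(rows, epsilon, rng):
    """Answer each of the 4 queries with Lap(4/epsilon) noise (epsilon/4 per query)."""
    return [q + laplace_noise(4.0 / epsilon, rng) for q in true_answers(rows)]

def expected_total_error_alg1(epsilon):
    """4 queries, each with variance 2 * (4/epsilon)^2."""
    return 4 * 2 * (4.0 / epsilon) ** 2
```

With ε = 1 the expected total error evaluates to 128, which is large compared with the true counts in a table of five rows.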
17 Algorithm 2
Compute:
• $\tilde{q}_1$ = (# Males with BMI < 25) + Lap(1/ε)
• $\tilde{q}_2$ = (# Males with BMI > 25) + Lap(1/ε)
• $\tilde{q}_3$ = (# Females with BMI < 25) + Lap(1/ε)
• $\tilde{q}_4$ = (# Females with BMI > 25) + Lap(1/ε)
Return:
• $\tilde{q}_1$, $\tilde{q}_1 + \tilde{q}_2$, $\tilde{q}_3$, $\tilde{q}_3 + \tilde{q}_4$
18 Privacy
• Sensitivity of count = 1. So each query is answered using an ε-DP algorithm.
• $q_1, q_2, q_3, q_4$ are counts on disjoint portions of the database. Thus, by parallel composition, releasing $\tilde{q}_1, \tilde{q}_2, \tilde{q}_3, \tilde{q}_4$ satisfies ε-DP.
• By the postprocessing theorem, releasing $\tilde{q}_1$, $\tilde{q}_1 + \tilde{q}_2$, $\tilde{q}_3$, $\tilde{q}_3 + \tilde{q}_4$ also satisfies ε-DP.
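19 Utility
Error: $\sum_{q} \mathbb{E}\big[(\tilde{q}(D) - q(D))^2\big]$
Total Error: $\underbrace{2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_1} + \underbrace{2 \cdot 2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_1 + \tilde{q}_2} + \underbrace{2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_3} + \underbrace{2 \cdot 2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_3 + \tilde{q}_4} = \frac{12}{\varepsilon^2}$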
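20 Utility
Tighter privacy analysis gives better accuracy for the same level of privacy.
Total Error: $\underbrace{2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_1} + \underbrace{2 \cdot 2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_1 + \tilde{q}_2} + \underbrace{2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_3} + \underbrace{2 \cdot 2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_3 + \tilde{q}_4} = \frac{12}{\varepsilon^2}$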
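Algorithm 2 and its expected total error can be sketched in Python as below. The function names are illustrative, and the cell counts passed in the example (0, 3, 1, 1) are what the five-row table yields if weights are assumed to be in pounds.

```python
import random

def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def algorithm_2(cell_counts, epsilon, rng):
    """cell_counts = (males BMI<25, males BMI>25, females BMI<25, females BMI>25).

    Each disjoint cell gets Lap(1/epsilon) noise; the totals are derived by post-processing.
    """
    n1, n2, n3, n4 = [c + laplace_noise(1.0 / epsilon, rng) for c in cell_counts]
    return [n1, n1 + n2, n3, n3 + n4]

def expected_total_error_alg2(epsilon):
    """Two single-cell answers plus two answers that each sum two independent noisy cells."""
    per_cell = 2 * (1.0 / epsilon) ** 2
    return per_cell + 2 * per_cell + per_cell + 2 * per_cell

released = algorithm_2((0, 3, 1, 1), epsilon=1.0, rng=random.Random(0))
```

At ε = 1 the expected total error is 12, compared with 128 when the same four answers are produced with noise of scale 4/ε each.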
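21 Generalized Sensitivity
• Let $f: \mathcal{D} \to \mathbb{R}^d$ be a function that outputs a vector of $d$ real numbers. The sensitivity of $f$ is given by:
$$S(f) = \max_{D, D' :\, |D \,\Delta\, D'| = 1} \lVert f(D) - f(D') \rVert_1$$
where $\lVert \mathbf{x} - \mathbf{y} \rVert_1 = \sum_i |x_i - y_i|$.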
22 Generalized Sensitivity
• $q_1$ = # Males with BMI < 25
• $q_2$ = # Males with BMI > 25
• $q$ = # Males
• Let $f_1$ be a function that answers both $q_1$, $q_2$
• Let $f_2$ be a function that answers both $q_1$, $q$
• Sensitivity of $f_1$ = 1
• Sensitivity of $f_2$ = 2
• An alternate privacy proof for Alg 2 is to show that the generalized sensitivity of the query vector $(q_1, q_2, q_3, q_4)$ is 1.
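These sensitivities can be checked mechanically. The sketch below assumes that $q$ counts all males and that each row is summarized as (sex, BMI category); both the encoding and the function names are illustrative. Because every query is a count, adding one row $r$ changes the answer vector by exactly $f([r])$, so it is enough to try each possible row type.

```python
# Each row is abstracted to (sex, bmi_category); "low" means BMI < 25, "high" means BMI > 25.
ROW_TYPES = [("M", "low"), ("M", "high"), ("F", "low"), ("F", "high")]

def count(rows, sex, category=None):
    """Number of rows with the given sex and, optionally, the given BMI category."""
    return sum(1 for s, c in rows if s == sex and (category is None or c == category))

def f1(rows):
    """Answers q1 and q2: two disjoint male cells."""
    return (count(rows, "M", "low"), count(rows, "M", "high"))

def f2(rows):
    """Answers q1 and q: the second count contains the first."""
    return (count(rows, "M", "low"), count(rows, "M"))

def f_cells(rows):
    """Answers the four disjoint cells used by Algorithm 2."""
    return tuple(count(rows, s, c) for s, c in ROW_TYPES)

def l1_sensitivity(f):
    """Largest L1 change in f caused by adding a single row (valid for counting queries)."""
    return max(sum(abs(v) for v in f([row])) for row in ROW_TYPES)
```

A male with BMI below 25 is counted by both components of `f2`, which is why its sensitivity is 2, while no row falls into more than one of the four disjoint cells.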
24 Improving utility of Alg 2
Compute:
• $\tilde{q}_1$ = (# Males with BMI < 25) + Lap(1/ε)
• $\tilde{q}_2$ = (# Males with BMI > 25) + Lap(1/ε)
Return:
• $\tilde{q}_1$, $\tilde{q}_1 + \tilde{q}_2$
We know $q_1 \le q_1 + q_2$, but $P[\tilde{q}_1 > \tilde{q}_1 + \tilde{q}_2] > 0$.
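The inconsistent outcome occurs exactly when the noisy second count is negative. For a true count $q_2 \ge 0$ and noise $\mathrm{Lap}(1/\varepsilon)$, the Laplace CDF gives this probability in closed form as $\tfrac{1}{2} e^{-\varepsilon q_2}$. The helper below (its name is an illustrative assumption) evaluates it.

```python
import math

def prob_inconsistent(q2, epsilon):
    """P[q2 + Lap(1/epsilon) < 0] for a true count q2 >= 0, from the Laplace CDF."""
    if q2 < 0:
        raise ValueError("a count cannot be negative")
    return 0.5 * math.exp(-epsilon * q2)

# With no males above the threshold, the released pair is inconsistent half of the time.
worst_case = prob_inconsistent(0, epsilon=1.0)
three_rows = prob_inconsistent(3, epsilon=1.0)
```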
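25 Constrained Inference
[Figure: the data owner holds private data $I$ behind a differentially private interface. Step 1: the analyst submits the queries $Q$. Step 2: the interface returns the noisy answers $\tilde{Q}(I) = \tilde{q}$. Step 3: constrained inference turns $\tilde{q}$ into $\bar{q}$.]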
26 Constrained Inference
• Let $q_1, q_2, \ldots, q_k$ be a set of queries
• Let $\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_k$ be the noisy answers
• Constraint $C(q_1, q_2, \ldots, q_k) = 1$ holds on true answers (for all typical databases), but may not hold on noisy answers.
• Goal: Find $\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k$ that:
  – Are close to $\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_k$
  – Satisfy the constraint $C(\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k) = 1$
27 Least Squares Optimization
$$\min \sum_i (\bar{q}_i - \tilde{q}_i)^2 \quad \text{s.t.} \quad C(\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k) = 1$$
28 Geometric Interpretation
$$\min \sum_i (\bar{q}_i - \tilde{q}_i)^2 \quad \text{s.t.} \quad C(\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k) = 1$$
[Figure: the true answers $\mathbf{q} = (q_1, q_2, \ldots, q_k)$ lie in the space of outputs satisfying the constraint; noise moves them to $\tilde{\mathbf{q}} = (\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_k)$, and projecting back onto that space gives $\bar{\mathbf{q}} = (\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k)$]
29 Geometric Interpretation
Theorem: $\lVert \mathbf{q} - \bar{\mathbf{q}} \rVert_2 \le \lVert \mathbf{q} - \tilde{\mathbf{q}} \rVert_2$ when the constraints form a convex space.
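To make the projection concrete, the sketch below uses a hypothetical linear constraint that is not taken from the slides: three answers (a, b, total) whose true values satisfy a + b = total. The least-squares solution is then the orthogonal projection onto the plane a + b - total = 0, which has a closed form. The numbers are made up for illustration.

```python
def project_onto_sum_constraint(noisy):
    """Least-squares projection of (a, b, total) onto the plane a + b - total = 0."""
    a, b, total = noisy
    # Residual of the constraint divided by the squared norm of its normal (1, 1, -1).
    shift = (a + b - total) / 3.0
    return (a - shift, b - shift, total + shift)

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

true_q = (3.0, 5.0, 8.0)    # satisfies the constraint
noisy_q = (4.0, 5.5, 7.0)   # violates it after noise
projected = project_onto_sum_constraint(noisy_q)

error_before = squared_distance(true_q, noisy_q)
error_after = squared_distance(true_q, projected)
```

Because a plane is convex, the projected answers are at least as close to the true answers as the noisy ones, in line with the theorem.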
30 Ordering Constraint
Isotonic Regression:
$$\min \sum_i (\bar{q}_i - \tilde{q}_i)^2 \quad \text{s.t.} \quad \bar{q}_1 \le \bar{q}_2 \le \cdots \le \bar{q}_k$$
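One standard way to solve this least-squares problem under an ordering constraint is the pool-adjacent-violators algorithm. The slides do not say which solver is intended, so the following pure-Python version is only a sketch of that technique, with an illustrative function name.

```python
def isotonic_regression(noisy):
    """Pool adjacent violators: closest non-decreasing sequence in squared error."""
    blocks = []  # each block is [sum of values, number of values]
    for value in noisy:
        blocks.append([value, 1])
        # Merge the last two blocks while their means are out of order.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, size = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += size
    fitted = []
    for total, size in blocks:
        fitted.extend([total / size] * size)  # every member gets the block mean
    return fitted

consistent = isotonic_regression([1.0, 3.0, 2.0, 4.0])
```

Each run of out-of-order noisy answers is replaced by its average, so the output is non-decreasing and has the same sum as the input.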
32 Problem
Sex   Height   Weight
M     6’2”     210
F     5’3”     190
F     5’9”     160
M     5’3”     180
M     6’7”     250
Queries:
• # people with height in [5’1”, 6’2”]
• # people with height in [2’0”, 4’0”]
• # people with height in [3’3”, 7’0”]
• …
• Design an ε-differentially private algorithm that can answer all range queries.
• What is the total error?
33 Problem
• Let $\{v_1, \ldots, v_k\}$ be the domain of an attribute
• Let $x_1, \ldots, x_k$ be the numbers of rows with values $v_1, \ldots, v_k$, respectively
• Range Query: $q_{ij} = x_i + x_{i+1} + \cdots + x_j$
• Goal: Answer all range queries
34 Strategy 1:
• Answer all range queries using Laplace mechanism
• Sensitivity: $O(k^2)$
• Error in each query: $O(k^4/\varepsilon^2)$
• Total Error: $O(k^6/\varepsilon^2)$, summing over all $O(k^2)$ range queries
35 Strategy 2:
• Estimate each individual $x_i$ using Laplace mechanism
• Answer: $\tilde{q}_{ij} = \tilde{x}_i + \tilde{x}_{i+1} + \cdots + \tilde{x}_j$
• Error in each $\tilde{x}_i$: $O(1/\varepsilon^2)$
• Error in $\tilde{q}_{1k}$: $O(k/\varepsilon^2)$
• Total Error: $O(k^3/\varepsilon^2)$
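The two strategies can be compared exactly for a small domain. The sketch below sums the expected squared error over all $k(k+1)/2$ range queries under each strategy. It assumes neighboring databases differ by adding or removing one row, so one row changes a single $x_i$ by 1, and the function names are illustrative.

```python
def strategy_1_total_error(k, epsilon):
    """Laplace noise added directly to every range query."""
    # One row with value v_i affects every range [a, b] with a <= i <= b: i * (k - i + 1) of them.
    sensitivity = max(i * (k - i + 1) for i in range(1, k + 1))
    num_queries = k * (k + 1) // 2
    return num_queries * 2 * (sensitivity / epsilon) ** 2

def strategy_2_total_error(k, epsilon):
    """Laplace noise added to each x_i (sensitivity 1); ranges are sums of noisy counts."""
    per_cell = 2 * (1.0 / epsilon) ** 2
    # There are (k - length + 1) ranges of each length, each summing `length` noisy cells.
    return sum((k - length + 1) * length * per_cell for length in range(1, k + 1))

total_1 = strategy_1_total_error(k=4, epsilon=1.0)
total_2 = strategy_2_total_error(k=4, epsilon=1.0)
```

Even for $k = 4$ the gap is large, because Strategy 1 pays for a sensitivity that grows quadratically in $k$, while Strategy 2 only accumulates noise linearly in the length of each range.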