
Universally Adaptive Data Analysis Cynthia Dwork, Microsoft Research - PowerPoint PPT Presentation



  1. Universally Adaptive Data Analysis Cynthia Dwork, Microsoft Research

  2. π‘Ÿ 2 : muffin tops? Adaptive Data Analysis π‘Ÿ 1 : > 6 ft? q 1 a 1 q 2 M π‘Ÿ 2 : muffin bottoms? a 2 𝑇 q 3 a 3 Database 𝑇 ∼ 𝐸 data analyst  π‘Ÿ 𝑗 depends on 𝑏 1 , 𝑏 2 , … , 𝑏 π‘—βˆ’1  Worry: analyst finds a query for which the dataset is not representative of population; reports surprising discovery

  3. π‘Ÿ 2 : muffin tops? Differential Privacy for Adaptive Validity π‘Ÿ 1 : > 6 ft? q 1 a 1 q 2 DP DP π‘Ÿ 2 : muffin bottoms? a 2 𝑇 q 3 a 3 Database data analyst  π‘Ÿ 𝑗 depends on 𝑏 1 , 𝑏 2 , … , 𝑏 π‘—βˆ’1  Differential privacy neutralizes risks incurred by adaptivity  Definition of privacy tailored to statistical analysis of large data sets [D., Feldman, Hardt, Pitassi , Reingold, Roth ’14]

  4. π‘Ÿ 2 : muffin tops? Differential Privacy for Adaptive Validity π‘Ÿ 1 : > 6 ft? q 1 a 1 q 2 DP DP π‘Ÿ 2 : muffin bottoms? a 2 𝑇 q 3 a 3 Database data analyst  π‘Ÿ 𝑗 depends on 𝑏 1 , 𝑏 2 , … , 𝑏 π‘—βˆ’1  Differential privacy neutralizes risks incurred by adaptivity  βˆƒ LARGE literature on DP algorithms for data analysis [D., Feldman, Hardt, Pitassi , Reingold, Roth ’14]

  5. Some Intuition  Fix a query, e.g., "What fraction of the population is over 6 feet tall?"  Almost all large datasets will give an approximately correct reply  Most datasets are representative with respect to this query  If, in the process of adaptive exploration, the analyst finds a query for which the dataset is not representative, then she must have "learned something significant" about the dataset  Preserving the "privacy" of the data may therefore prevent over-fitting
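The over-fitting risk the slide describes is easy to reproduce numerically. The following sketch (all names and parameter values are illustrative, not from the talk) draws a dataset of i.i.d. zero-mean attributes, then builds one adaptive query from the signs of the answers already seen: the query's population mean is exactly 0, yet its empirical mean on the dataset is far from 0.

```python
import random

random.seed(0)
n, d = 100, 1000  # illustrative sizes: n samples, d fair +/-1 attributes

# Dataset: every attribute has population mean exactly 0.
data = [[random.choice([-1, 1]) for _ in range(d)] for _ in range(n)]

# Non-adaptive queries: the empirical mean of each single attribute.
means = [sum(row[j] for row in data) / n for j in range(d)]

# Adaptive query: weight each attribute by the sign of its observed mean.
# Its population mean is still 0, but its empirical mean concentrates near
# sqrt(2/(pi*n)) ~ 0.08 -- the dataset is not representative for this
# adaptively chosen query.
signs = [1 if m >= 0 else -1 for m in means]
adaptive_mean = sum(
    sum(s * x for s, x in zip(signs, row)) for row in data
) / (n * d)
```

The fixed single-attribute means average out to roughly 0; only the query chosen as a function of the data over-fits.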

  6. Intuition After Nati's Talk  Differential Privacy: the outcome of any analysis is essentially equally likely whether any individual joins, or refrains from joining, the dataset  This is a stability requirement  It gave rise to the folklore that differential privacy yields generalizability  But we will be able to say something stronger

  7. π‘Ÿ 2 : muffin tops? π‘Ÿ 1 : > 6 ft? q 1 a 1 q 2 DP M π‘Ÿ 2 : muffin bottoms? a 2 𝑇 q 3 a 3 Database data analyst  π‘Ÿ 𝑗 depends on 𝑏 1 , 𝑏 2 , … , 𝑏 π‘—βˆ’1  Differential privacy neutralizes risks incurred by adaptivity  E.g., for statistical queries: whp 𝐹 𝑇 𝐡 𝑇 βˆ’ 𝐹 𝑄 𝐡 𝑇 < 𝜐  High probability is important for handling many queries [D., Feldman, Hardt, Pitassi , Reingold, Roth ’14]

  8. Formalization  Data sets S ∈ X^n; S ∼ D  Queries q: X^n → Y  Algorithms that choose a new query based on the history of observations, and output the chosen query and its response on S:  A_1 = q_1 (a trivial choice), outputting (q_1, q_1(S))  A_j: X^n × Y_1 × ⋯ × Y_{j−1} → Y_j, where q_j is determined by (a_1, …, a_{j−1}) and A_j(S, a_1, …, a_{j−1}) = (q_j, q_j(S)) = (q_j, a_j)  B ≝ {(S, q) : q(S) not representative wrt D}, i.e., q(S) fails to generalize  ∀ a_1, …, a_{j−1}: Pr[(S, q_j) ∈ B] ≤ β_j  We want Pr[(S, A_j(S)) ∈ B] to be similarly small: q_j(S) should generalize even when q_j is chosen as a function of S

  9. Differential Privacy [D., McSherry, Nissim, Smith '06] M gives ε-differential privacy if for all pairs of adjacent data sets S, S', and all events T: Pr[M(S) ∈ T] ≤ e^ε · Pr[M(S') ∈ T] (the probability is over the randomness introduced by M)
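As a concrete check of the definition, randomized response on a single bit (report the truth with probability 3/4) satisfies it with ε = ln 3. The sketch below (hypothetical helper names) computes the largest log-ratio between the two output distributions on adjacent inputs.

```python
import math

def randomized_response_dist(bit: int, p: float = 0.75):
    """Output distribution of M(bit): report bit w.p. p, flip it w.p. 1 - p."""
    return {bit: p, 1 - bit: 1 - p}

def max_divergence(P, Q) -> float:
    """log of the largest pointwise probability ratio P[o] / Q[o]."""
    return max(math.log(P[o] / Q[o]) for o in P)

# Adjacent inputs here are the two values of a single respondent's bit.
P, Q = randomized_response_dist(0), randomized_response_dist(1)
eps = max(max_divergence(P, Q), max_divergence(Q, P))
```

eps comes out to ln 3 ≈ 1.10: every event is at most e^ε times likelier under one input than under the other, exactly the definition above.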

  10. Differential Privacy [D., McSherry, Nissim, Smith '06] M gives ε-differential privacy if for all pairs of adjacent data sets S, S', and all events T: Pr[M(S) ∈ T] ≤ e^ε · Pr[M(S') ∈ T] For random variables Y, Z over X, the max-divergence of Y from Z is given by D_∞(Y || Z) = log max_{y ∈ X} Pr[Y = y] / Pr[Z = y] Then ε-DP is equivalent to D_∞(M(S) || M(S')) ≤ ε.

  11. Differential Privacy [D., McSherry, Nissim, Smith '06] M gives ε-differential privacy if for all pairs of adjacent data sets S, S', and all events T: Pr[M(S) ∈ T] ≤ e^ε · Pr[M(S') ∈ T] For random variables Y, Z over X, the max-divergence of Y from Z is given by D_∞(Y || Z) = log max_{y ∈ X} Pr[Y = y] / Pr[Z = y] Then ε-DP is equivalent to D_∞(M(S) || M(S')) ≤ ε. Closed under post-processing: D_∞(A(M(S)) || A(M(S'))) ≤ ε.
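Closure under post-processing can be checked directly on small discrete distributions: pushing both output distributions through the same function never increases the max-divergence. A toy sketch (names and numbers are illustrative):

```python
import math

def pushforward(dist, f):
    """Distribution of f(X) when X ~ dist."""
    out = {}
    for o, pr in dist.items():
        out[f(o)] = out.get(f(o), 0.0) + pr
    return out

def max_divergence(P, Q):
    return max(math.log(P[o] / Q[o]) for o in P if P[o] > 0)

P = {0: 0.75, 1: 0.25}  # M(S)
Q = {0: 0.25, 1: 0.75}  # M(S'), an adjacent input

before = max_divergence(P, Q)

# Relabeling outputs preserves the probability ratios ...
relabeled = max_divergence(pushforward(P, str), pushforward(Q, str))
# ... and merging outputs can only shrink them.
merged = max_divergence(pushforward(P, lambda o: 0), pushforward(Q, lambda o: 0))
```

No function applied to M's output, however adversarial, can raise the privacy loss above ε.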

  12. Differential Privacy [D., McSherry, Nissim, Smith '06] M gives ε-differential privacy if for all pairs of adjacent data sets S, S', and all events T: Pr[M(S) ∈ T] ≤ e^ε · Pr[M(S') ∈ T] For random variables Y, Z over X, the max-divergence of Y from Z is given by D_∞(Y || Z) = log max_{y ∈ X} Pr[Y = y] / Pr[Z = y] Then ε-DP is equivalent to D_∞(M(S) || M(S')) ≤ ε. Group Privacy: ∀ S, S'': D_∞(M(S) || M(S'')) ≤ Δ(S, S'') · ε, where Δ is Hamming distance.

  13. Properties  Closed under post-processing  Max-divergence remains bounded  Automatically yields group privacy  kε for groups of size k  Understood behavior under adaptive composition  Can bound cumulative privacy loss over multiple analyses  "The epsilons add up"  Programmable  Complicated private analyses from simple private building blocks
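For non-adaptive composition of independent mechanisms, "the epsilons add up" can be verified exactly on small examples. A sketch (the distributions are illustrative; the adaptive case needs a more careful argument):

```python
import math
from itertools import product

def max_divergence(P, Q):
    return max(math.log(P[o] / Q[o]) for o in P)

def compose(P1, Q1, P2, Q2):
    """Joint output distributions of running two independent mechanisms."""
    P = {(a, b): P1[a] * P2[b] for a, b in product(P1, P2)}
    Q = {(a, b): Q1[a] * Q2[b] for a, b in product(Q1, Q2)}
    return P, Q

# Mechanism 1 on a pair of adjacent inputs: eps1 = ln 3.
P1, Q1 = {0: 0.75, 1: 0.25}, {0: 0.25, 1: 0.75}
# Mechanism 2 on the same pair of inputs: eps2 = ln 1.5.
P2, Q2 = {0: 0.6, 1: 0.4}, {0: 0.4, 1: 0.6}

total_eps = max_divergence(*compose(P1, Q1, P2, Q2))
```

total_eps equals ln 3 + ln 1.5: releasing both outputs costs the sum of the individual privacy losses.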

  14. The Power of Composition  Lemma: the choice of q_j is differentially private  Closure under post-processing  Inductive step (key): if q is chosen in a differentially private fashion with respect to S, then Pr[(S, A(S)) ∈ B] is small  Sufficiency: union bound [diagram: the analyst interacting with a DP mechanism M over the database S]

  15. Description Length  Let A: X^n → Y  The description length of A is the cardinality of its range  If ∀y: Pr_S[(S, y) ∈ B] ≤ β, then Pr[(S, A(S)) ∈ B] ≤ |Y| · β  Description length composes, too  Product bound: β · Π_j |Y_j|  And, morally, it is closed under post-processing  Once you fix the randomness of the post-processing algorithm [D., Feldman, Hardt, Pitassi, Reingold, Roth '15]
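The union bound behind the |Y| · β guarantee is easy to see in simulation. In this sketch (all parameters illustrative), the algorithm's range is a set of k candidate queries; reporting the best-looking candidate fails to generalize far more often than any single fixed query, but no more than k times as often.

```python
import random

random.seed(2)
n, k, tau, trials = 50, 20, 0.15, 4000  # illustrative parameters

def one_trial():
    # k independent fair-coin attributes, n samples each; population mean 0.5.
    means = [sum(random.choice([0, 1]) for _ in range(n)) / n for _ in range(k)]
    fixed_bad = abs(means[0] - 0.5) > tau   # a query fixed in advance
    selected_bad = max(means) - 0.5 > tau   # A(S): report the largest mean
    return fixed_bad, selected_bad

results = [one_trial() for _ in range(trials)]
beta_hat = sum(f for f, _ in results) / trials      # per-output failure rate
selected_hat = sum(s for _, s in results) / trials  # failure rate of A(S)
```

beta_hat is a few percent while selected_hat is roughly ten times larger: adaptivity inflates the failure probability, but by no more than the range size k, matching Pr[(S, A(S)) ∈ B] ≤ |Y| · β.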

  16. Approximate max-divergence The β-approximate max-divergence of Y from Z is D_∞^β(Y || Z) = log max_{T ⊆ X, Pr[Y ∈ T] > β} (Pr[Y ∈ T] − β) / Pr[Z ∈ T]
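For small finite distributions, the β-approximate max-divergence can be computed by brute force over all events, which makes the effect of β concrete. A sketch (hypothetical helper name):

```python
import math
from itertools import combinations

def approx_max_divergence(P, Q, beta: float) -> float:
    """D_inf^beta(P || Q) over a small finite outcome space: enumerate every
    event T with Pr_P[T] > beta and maximize log((Pr_P[T] - beta) / Pr_Q[T])."""
    outcomes = list(P)
    best = float("-inf")
    for r in range(1, len(outcomes) + 1):
        for T in combinations(outcomes, r):
            p = sum(P[o] for o in T)
            q = sum(Q[o] for o in T)
            if p > beta and q > 0:
                best = max(best, math.log((p - beta) / q))
    return best

P = {"a": 0.75, "b": 0.25}
Q = {"a": 0.25, "b": 0.75}
exact = approx_max_divergence(P, Q, beta=0.0)     # plain max-divergence, ln 3
relaxed = approx_max_divergence(P, Q, beta=0.05)  # the slack beta shrinks it
```

With beta = 0 this reduces to the ordinary max-divergence; a positive beta discounts low-probability events and can only decrease the value.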

  17. Approximate max-divergence The β-approximate max-divergence of Y from Z is D_∞^β(Y || Z) = log max_{T ⊆ X, Pr[Y ∈ T] > β} (Pr[Y ∈ T] − β) / Pr[Z ∈ T] We are interested in (really the version with β, but that is too messy): D_∞((S, A(S)) || S × A(S)) = log max_T Pr[(S, A(S)) ∈ T] / Pr[S × A(S) ∈ T], where S × A(S) denotes independent copies

  18. Approximate max-divergence The β-approximate max-divergence of Y from Z is D_∞^β(Y || Z) = log max_{T ⊆ X, Pr[Y ∈ T] > β} (Pr[Y ∈ T] − β) / Pr[Z ∈ T] We are interested in (really the version with β, but that is too messy): D_∞((S, A(S)) || S × A(S)) = log max_T Pr[(S, A(S)) ∈ T] / Pr[S × A(S) ∈ T] How much more likely is A(S) to relate to S than to a fresh S'? This captures the maximum amount of information that an output of an algorithm might reveal about its input

  19. Unifying Concept: Max-Information  I_∞^β(Y; Z) = D_∞^β((Y, Z) || Y × Z)  We are interested in I_∞^β(S; A(S))  Theorem: if I_∞^β(S; A(S)) ≤ k, then for any T ⊆ X^n × Y: Pr[(S, A(S)) ∈ T] ≤ 2^k · Pr[S × A(S) ∈ T] + β  So Pr[(S, A(S)) ∈ B] ≤ 2^k · max_{y ∈ Y} Pr[(S, y) ∈ B] + β! [D., Feldman, Hardt, Pitassi, Reingold, Roth '15]

  20. Unifying Concept: Max-Information  I_∞^β(Y; Z) = D_∞^β((Y, Z) || Y × Z)  We are interested in I_∞^β(S; A(S))  Theorem: if I_∞^β(S; A(S)) ≤ k, then for any T ⊆ X^n × Y: Pr[(S, A(S)) ∈ T] ≤ 2^k · Pr[S × A(S) ∈ T] + β  So Pr[(S, A(S)) ∈ B] ≤ 2^k · max_{y ∈ Y} Pr[(S, y) ∈ B] + β!  Max-information composes and is closed under post-processing  For ε-DP A: I_∞(A, n) ≤ εn · log_2 e; better bounds hold for I_∞^β(A, n)  Bounded description length also bounds it: I_∞^β(A, n) ≤ log |Y|  Here I_∞^β(A, n) denotes a bound on the worst-case approximate max-information under any distribution on n-element databases [D., Feldman, Hardt, Pitassi, Reingold, Roth '15]
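Max-information can be computed exactly for tiny examples. The sketch below (a hypothetical deterministic algorithm, not from the talk) takes S uniform over two bits and A(S) = their parity; A reveals exactly one bit about S, so I_∞(S; A(S)) = 1 bit, and the theorem's factor 2^k = 2 is tight here.

```python
import math
from itertools import product

datasets = list(product([0, 1], repeat=2))  # S uniform over {0,1}^2

def A(s):
    return s[0] ^ s[1]  # deterministic algorithm: the parity of S

p_S = 1 / len(datasets)
joint = {(s, A(s)): p_S for s in datasets}  # distribution of (S, A(S))

# Marginal of A(S), used to form the product distribution S x A(S).
p_out = {}
for (s, a), pr in joint.items():
    p_out[a] = p_out.get(a, 0.0) + pr

# I_inf(S; A(S)) = max over outcomes of log2(joint / product).
max_info_bits = max(
    math.log2(pr / (p_S * p_out[a])) for (s, a), pr in joint.items()
)
```

Any event is therefore at most 2^1 = 2 times likelier under (S, A(S)) than under independent copies, which is exactly the guarantee the theorem converts into a generalization bound.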

  21. Abstract is Good  Focusing on properties is powerful  A completely universal approach to the validity of adaptive analysis  DP, small description length, low max-information  Handles large numbers of arbitrary, adaptively chosen computations  Closure under post-processing and composition
