Universally Adaptive Data Analysis
Cynthia Dwork, Microsoft Research
Adaptive Data Analysis

[Diagram: a data analyst issues queries q_1 ("> 6 ft?"), q_2 ("muffin tops?", "muffin bottoms?"), q_3, … to a mechanism M over a database S ∼ P and receives answers a_1, a_2, a_3, …]

- q_i depends on a_1, a_2, …, a_{i-1}
- Worry: the analyst finds a query for which the dataset is not representative of the population, and reports a surprising "discovery"
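The worry on this slide can be sketched numerically: on a dataset of fair coin flips every fixed query has true answer 0.5, yet an analyst who scans many candidate queries and reports the most extreme one almost surely finds a query on which the sample is unrepresentative. A minimal simulation (the sizes n and d are illustrative assumptions, not from the talk):

```python
import random

random.seed(0)

# Illustrative sizes: n individuals, d candidate yes/no queries (attributes).
n, d = 100, 1000
# Population: each attribute is a fair coin, so every query's true answer is 0.5.
data = [[random.random() < 0.5 for _ in range(d)] for _ in range(n)]

def empirical_mean(j):
    return sum(row[j] for row in data) / n

# A single fixed query is representative, but the adaptively selected
# "most surprising" query deviates far from the truth on this sample.
deviations = [abs(empirical_mean(j) - 0.5) for j in range(d)]
max_dev = max(deviations)
print(deviations[0], max_dev)
```

With these sizes a typical fixed query deviates by a few percent, while the selected worst query deviates by roughly 15 to 20 percent.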
Differential Privacy for Adaptive Validity

[Diagram: the same adaptive interaction; each answer a_1, a_2, a_3, … is now released through a DP mechanism]

- q_i depends on a_1, a_2, …, a_{i-1}
- Differential privacy neutralizes risks incurred by adaptivity
- A definition of privacy tailored to statistical analysis of large data sets
- A LARGE literature on DP algorithms for data analysis

[D., Feldman, Hardt, Pitassi, Reingold, Roth '14]
Some Intuition

- Fix a query, e.g., "What fraction of the population is over 6 feet tall?"
- Almost all large datasets will give an approximately correct reply
- Most datasets are representative with respect to this query
- If, in the process of adaptive exploration, the analyst finds a query for which the dataset is not representative, then she must have "learned something significant" about the dataset
- Preserving the "privacy" of the data may prevent over-fitting
Intuition After Nati's Talk

- Differential privacy: the outcome of any analysis is essentially equally likely, independent of whether any individual joins, or refrains from joining, the dataset
- This is a stability requirement
- It gave rise to the folklore that differential privacy yields generalizability
- But we will be able to say something stronger
Differential Privacy for Adaptive Validity (cont.)

[Diagram: adaptive interaction between the data analyst and a DP mechanism M over the database]

- q_i depends on a_1, a_2, …, a_{i-1}
- Differential privacy neutralizes risks incurred by adaptivity
- E.g., for statistical queries: w.h.p. |E_S[q_i] − E_P[q_i]| < α
- High probability is important for handling many queries

[D., Feldman, Hardt, Pitassi, Reingold, Roth '14]
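For a single fixed statistical query, the "w.h.p." guarantee follows from standard concentration. A small sketch using Hoeffding's inequality, which also shows why a plain union bound over k non-adaptive queries works but degrades with k (the parameter choices are illustrative):

```python
import math

# Hoeffding: for one fixed query q with values in [0, 1] and n i.i.d. samples,
#   Pr[ |E_S[q] - E_P[q]| >= alpha ] <= 2 * exp(-2 * n * alpha**2).
def hoeffding_failure_prob(n, alpha):
    return 2 * math.exp(-2 * n * alpha ** 2)

n, alpha = 10_000, 0.05
single = hoeffding_failure_prob(n, alpha)

# Non-adaptively, a union bound extends this to k queries at a factor-k cost;
# adaptivity is exactly what breaks this simple argument.
k = 1000
union = min(1.0, k * single)
print(single, union)
```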
Formalization

- Data sets S ∈ X^n; S ∼ P
- Queries q: X^n → Y
- Algorithms choose queries based on the history of observations and output the chosen query and its response on S
- A_1 = q_1 (trivial choice); outputs (q_1, q_1(S))
- A_i: X^n × Y_1 × ⋯ × Y_{i-1} → Y_i, where A_i(S, y_1, …, y_{i-1}) = (q_i, q_i(S)) and q_i is chosen as a function of y_1, …, y_{i-1}
- R = {(S, q) : q(S) is not representative w.r.t. P}
- Non-adaptive guarantee: ∀ y_1, …, y_{i-1}: Pr[(S, q_i) ∈ R] ≤ γ_i
- We want Pr[(S, A(S)) ∈ R] to be similarly small
- That is, q_i(S) should generalize even when q_i is chosen as a function of q_1(S), …, q_{i-1}(S)
Differential Privacy [D., McSherry, Nissim, Smith '06]

M gives ε-differential privacy if for all pairs of adjacent data sets S, S′ and all events E:
  Pr[M(S) ∈ E] ≤ e^ε · Pr[M(S′) ∈ E]
(The randomness is introduced by M.)

For random variables Y, Z over X, the max-divergence of Y from Z is given by
  D_∞(Y ‖ Z) = log max_{y ∈ X} Pr[Y = y] / Pr[Z = y]
Then ε-DP is equivalent to D_∞(M(S) ‖ M(S′)) ≤ ε.

- Closed under post-processing: D_∞(A(M(S)) ‖ A(M(S′))) ≤ ε
- Group privacy: ∀ S, S″: D_∞(M(S) ‖ M(S″)) ≤ Δ(S, S″) · ε
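The definition can be made concrete with the standard Laplace mechanism (a standard construction assumed here, not named on the slide): for a sensitivity-1 counting query, adding Laplace(1/ε) noise satisfies ε-DP. The sketch below checks numerically that the defining density ratio between adjacent databases never exceeds e^ε (all values illustrative):

```python
import math

def laplace_pdf(y, mu, b):
    # Density of the Laplace distribution centred at mu with scale b.
    return math.exp(-abs(y - mu) / b) / (2 * b)

# Counting query on adjacent databases S, S': the true counts differ by at most 1.
eps = 0.5
b = 1.0 / eps               # Laplace scale for a sensitivity-1 query
count_S, count_S_adj = 42, 43

# The DP definition asks that Pr[M(S) = y] / Pr[M(S') = y] <= e^eps for all y.
grid = [30 + 0.5 * i for i in range(50)]
ratios = [laplace_pdf(y, count_S, b) / laplace_pdf(y, count_S_adj, b) for y in grid]
worst = max(max(ratios), max(1.0 / r for r in ratios))
print(worst, math.exp(eps))
```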
Properties

- Closed under post-processing
  - Max-divergence remains bounded
- Automatically yields group privacy
  - kε for groups of size k
- We understand behavior under adaptive composition
  - Can bound cumulative privacy loss over multiple analyses
  - "The epsilons add up"
- Programmable
  - Complicated private analyses from simple private building blocks
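"The epsilons add up" can be sketched directly: with independently drawn noise, the joint likelihood ratio of k mechanisms factors into per-mechanism ratios, each bounded by e^{ε_i}, so the product is bounded by e^{Σ ε_i}. A minimal numeric sketch (the epsilon values are arbitrary):

```python
import math

# Basic composition: running k mechanisms that are eps_i-DP on the same data
# yields (eps_1 + ... + eps_k)-DP overall.
epsilons = [0.1, 0.25, 0.05, 0.1]
total_eps = sum(epsilons)

# The joint likelihood ratio is a product of per-mechanism ratios,
# each at most e^{eps_i}, hence at most e^{total_eps}.
joint_bound = math.prod(math.exp(e) for e in epsilons)
print(total_eps, joint_bound, math.exp(total_eps))
```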
The Power of Composition

[Diagram: adaptive interaction between the data analyst and a DP mechanism over the database]

- Lemma: the choice of q_i is differentially private
  - Closure under post-processing
- Inductive step (key): if q is chosen in a differentially private fashion with respect to S, then Pr[(S, q(S)) ∈ R] is small
- Sufficiency: union bound
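A rough illustration of the inductive step: on fair-coin data, where every query's true answer is 0.5, an analyst who selects the "most surprising" query from sufficiently noisy answers typically no longer picks the sample's worst-case query. This is only a sketch; the noise scale and sizes are assumptions for illustration, not a calibrated DP mechanism:

```python
import math
import random

random.seed(1)

# Fair-coin data: every query's true answer is 0.5.
n, d = 100, 1000
data = [[random.random() < 0.5 for _ in range(d)] for _ in range(n)]
means = [sum(row[j] for row in data) / n for j in range(d)]

def laplace_noise(scale):
    # Inverse-CDF sampling of Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

# The analyst selects the "most surprising" query from NOISY answers.
scale = 0.5  # deliberately large noise, for illustration only
chosen = max(range(d), key=lambda j: abs(means[j] + laplace_noise(scale) - 0.5))

chosen_dev = abs(means[chosen] - 0.5)
max_dev = max(abs(m - 0.5) for m in means)
print(chosen_dev, max_dev)
```

In most runs the selected query's true deviation is typical rather than maximal, because the selection is dominated by the noise rather than by sample-specific quirks.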
Description Length

- Let A: X^n → Y
- The description length of A is the cardinality of its range, |Y|
- If ∀y: Pr_S[(S, y) ∈ R] ≤ γ, then Pr[(S, A(S)) ∈ R] ≤ |Y| · γ
- Description length composes, too
  - Product: γ · Π_i |Y_i|
- And, morally, it is closed under post-processing
  - Once you fix the randomness of the post-processing algorithm

[D., Feldman, Hardt, Pitassi, Reingold, Roth '15]
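The |Y| · γ union bound and its product-form composition are simple enough to compute directly; a small sketch (the function names and numbers are illustrative):

```python
# Description-length bound: if A has range Y and every FIXED output y satisfies
# Pr_S[(S, y) in R] <= gamma, then a union bound over Y gives
#   Pr[(S, A(S)) in R] <= |Y| * gamma.
def description_length_bound(range_size, gamma):
    return min(1.0, range_size * gamma)

# Rounding answers to a coarse grid keeps |Y| (and hence the bound) small.
single = description_length_bound(100, 1e-6)

# Composition multiplies the ranges: |Y| = |Y_1| * ... * |Y_k|.
def composed_bound(range_sizes, gamma):
    total = 1
    for r in range_sizes:
        total *= r
    return min(1.0, total * gamma)

three_rounds = composed_bound([100, 100, 100], 1e-6)
print(single, three_rounds)
```

Note how quickly the composed bound becomes vacuous: three rounds of 100 outputs each already exhaust the guarantee at this γ.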
Approximate max-divergence

The γ-approximate max-divergence of Y from Z:
  D_∞^γ(Y ‖ Z) = log max_{E ⊆ X, Pr[Y ∈ E] > γ} (Pr[Y ∈ E] − γ) / Pr[Z ∈ E]

We are interested in (with γ, but that version is too messy to display):
  D_∞((S, A(S)) ‖ S × A(S)) = log max_E Pr[(S, A(S)) ∈ E] / Pr[S × A(S) ∈ E]

- How much more likely is A(S) to relate to S than to a fresh S′?
- Captures the maximum amount of information that an output of an algorithm might reveal about its input
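For small discrete distributions, the γ-approximate max-divergence can be computed exactly by enumerating all events; a brute-force sketch (the distributions are illustrative):

```python
import itertools
import math

# gamma-approximate max-divergence between discrete distributions p and q
# (supports indexed 0..len(p)-1), computed by brute force over all events E:
#   D_inf^gamma(p || q) = log max_{E : p(E) > gamma} (p(E) - gamma) / q(E)
def approx_max_divergence(p, q, gamma):
    best = 0.0
    for r in range(1, len(p) + 1):
        for event in itertools.combinations(range(len(p)), r):
            pe = sum(p[i] for i in event)
            qe = sum(q[i] for i in event)
            if pe > gamma and qe > 0:
                best = max(best, (pe - gamma) / qe)
    return math.log(best)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
d_plain = approx_max_divergence(p, q, 0.0)   # ordinary max-divergence
d_gamma = approx_max_divergence(p, q, 0.05)  # discounting gamma mass can only shrink it
print(d_plain, d_gamma)
```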
Unifying Concept: Max-Information

  I_∞^γ(X; Y) = D_∞^γ((X, Y) ‖ X × Y)

- We are interested in I_∞^γ(S; A(S))
- Theorem: if I_∞^γ(S; A(S)) ≤ k, then for any R ⊆ X^n × Y:
    Pr[(S, A(S)) ∈ R] ≤ 2^k · Pr[S × A(S) ∈ R] + γ
- So Pr[(S, A(S)) ∈ R] ≤ 2^k · max_{y ∈ Y} Pr[(S, y) ∈ R] + γ!
- Max-information composes and is closed under post-processing
- For an ε-DP algorithm A: I_∞(A, n) ≤ ε n log_2 e; better bounds hold for I_∞^γ(A, n)
- I_∞^γ(A, n): a bound on the worst-case approximate max-information for any distribution on n-element databases

[D., Feldman, Hardt, Pitassi, Reingold, Roth '15]
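The theorem can be verified exactly on a toy joint distribution: compute k = I_∞(S; A(S)) in bits (here with γ = 0, where the max over events is attained at a single outcome) and check Pr[(X, Y) ∈ R] ≤ 2^k · Pr[X × Y ∈ R] for a chosen bad set R. All numbers below are illustrative:

```python
import math

# Toy joint distribution of (X, Y) on {0,1} x {0,1} and its product of marginals.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = {0: 0.5, 1: 0.5}
py = {0: 0.5, 1: 0.5}

# Max-information in bits (gamma = 0): the max over events of the ratio
# joint/product is attained at a single outcome, so a pointwise max suffices.
k = max(math.log2(joint[o] / (px[o[0]] * py[o[1]])) for o in joint)

# Theorem (gamma = 0): for every "bad" set R,
#   Pr[(X, Y) in R] <= 2^k * Pr[X x Y in R].
R = {(0, 0), (1, 1)}
lhs = sum(joint[o] for o in R)
rhs = (2 ** k) * sum(px[o[0]] * py[o[1]] for o in R)
print(k, lhs, rhs)
```

For this joint distribution the bound is tight: R is exactly the set of correlated outcomes, so the inequality holds with equality.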
Abstract is Good

- Focusing on properties is powerful
- A completely universal approach to the validity of adaptive analysis
  - DP, small description length, low max-information
- Handles large numbers of arbitrary, adaptively chosen computations
- Closure under post-processing and composition