LEARNING UNDER DIFFERENTIAL PRIVACY
KOBBI NISSIM (BGU/Harvard)
Caltech, Spring 2015
Based on joint work with: Amos Beimel, Avrim Blum, Hai Brenner, Mark Bun, Cynthia Dwork, Shiva Kasiviswanathan, Homin Lee, Frank McSherry, Sofya Raskhodnikova, Adam Smith, Uri Stemmer, and Salil Vadhan.
LET'S DO SOME SWEET SCIENCE!
Scurvy: a problem throughout human history
Caused by vitamin C deficiency
How much vitamin C is enough?
Thanks: Mark Bun
SO YOU HAVE SOME DATA…
[Figure: patients' vitamin C levels (2, 24, 57, 83, 121, 153, 176, 182, …) plotted on an axis from 0 to T]
Thanks: Mark Bun
SO YOU HAVE SOME DATA…
c: a threshold function that is consistent with the data
[Figure: the same data, with a threshold separating the points marked x (scurvy) from the rest]
Thanks: Mark Bun
SO YOU HAVE SOME DATA…
c: a threshold function that is consistent with the data
Theorem: if n > n0 then c also "agrees" with the underlying distribution
- n0 depends on the learner's accuracy and success probability
- n0 examples suffice, independent of the domain size! (See the toy simulation below.)
[Figure: the labeled data points 2, 24, 57, 83, 121, 153, …]
Thanks: Mark Bun
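A toy simulation of the theorem (my own illustration, not from the slides; it assumes a uniform distribution on [0, 200) and a true threshold at 100): a threshold consistent with the sample has error on fresh points that shrinks as n grows, regardless of how fine-grained the domain is.

```python
import random

random.seed(0)

def simulate(n, trials=1000):
    # Draw n samples, pick a consistent threshold, and estimate its true error
    # on fresh points from the same distribution P.
    errs = []
    for _ in range(trials):
        xs = [random.uniform(0, 200) for _ in range(n)]
        t = min([x for x in xs if x >= 100], default=200)  # a consistent choice
        fresh = [random.uniform(0, 200) for _ in range(1000)]
        errs.append(sum((x < t) != (x < 100) for x in fresh) / 1000)
    return sum(errs) / trials

print(simulate(8), simulate(100))  # average error shrinks roughly like 1/n
```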
WHAT'S THE PROBLEM?
The hypothesis threshold reveals someone's data point!
With the right auxiliary information, it could be linked to Shiva!
[Figure: the chosen threshold falls exactly on one patient's data point]
Thanks: Mark Bun
SAVING SHIVA'S PRIVACY
Idea: a "noisy" choice of threshold hides any individual contribution! (See the sketch below.)
[Figure: the same data, with the threshold chosen at random rather than pinned to a data point]
Thanks: Mark Bun
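One way to read "noisy choice" (a preview sketch of the exponential-mechanism-style sampling made precise later in the talk, not the slide's own mechanism): instead of deterministically returning a consistent threshold, sample one with probability that decays in its empirical error.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_threshold(samples, candidates, eps):
    # Empirical error of each candidate threshold t (hypothesis: label 1 iff x < t).
    errors = np.array([sum(int(x < t) != y for x, y in samples) for t in candidates],
                      dtype=float)
    # Sample t with probability ~ exp(-eps/2 * error): moving one example changes
    # each error by at most 1, so each output probability moves by a factor <= e^eps.
    weights = np.exp(-0.5 * eps * (errors - errors.min()))  # shift for stability
    return rng.choice(candidates, p=weights / weights.sum())

# Toy run on the slide's data, labeling low vitamin C levels as scurvy (1):
data = [(2, 1), (24, 1), (57, 1), (83, 1), (121, 0), (153, 0), (176, 0), (182, 0)]
print(noisy_threshold(data, candidates=range(0, 201, 10), eps=1.0))
```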
Had it been the year 2000 (AD) …
Brilliant, isn't it? Time for coffee and cookies. THANK YOU!
But…?
LET’S TAKE A STEP BACK
DATA PRIVACY – THE PROBLEM
Given:
- A dataset with sensitive information
How to:
- Compute and release functions of the dataset without compromising individual privacy
[Figure: a database X = (x1, x2, …, xn) held by a server/agency running algorithm A; users (government, researchers, businesses) or a malicious adversary send queries and receive answers]
DATA PRIVACY – THE PROBLEM
Given:
- A dataset with sensitive information
How to:
- Compute and release functions of the dataset without compromising individual privacy
Hospital: (based on past patients) predict whether a patient is prone to scurvy, based on the vitamin C level in her blood
Bank: (based on past customers) predict whether a new customer is a good/bad credit risk, based on her attributes
Example, label, and presence in the database may all be sensitive!
Differential Privacy [DMNS 06]
Evolved in [DN'03, EGS'03, DN'04, BDMN'05, DMNS'06, DKMMN'06]
Intuition: to protect an individual, make sure that changing her record does not change the output distribution (by too much)
[Figure: neighboring databases D = (x1, x2, x3, …, xn) and D' = (x1, x2, x'3, …, xn), differing in one record; A(D) and A(D') should be close in distribution]
Differential Privacy [DMNS 06]
Definition: An algorithm A is (ε, δ)-differentially private if for all neighboring databases D, D' and for all sets of answers S:
Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D') ∈ S] + δ
[Figure: neighboring databases D and D', differing only in the record x3 vs. x'3]
Differential Privacy [DMNS 06]
Definition: An algorithm A is (ε, δ)-differentially private if for all neighboring databases D, D' and for all sets of answers S:
Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D') ∈ S] + δ
- e^ε ≈ 1 + ε. Take ε > 1/n; otherwise, no utility!
- δ ≪ 1/n
- Pure: δ = 0. Approx.: δ > 0
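A minimal sketch of the Laplace mechanism of [DMNS'06], the canonical way to satisfy the definition with δ = 0 (the query, data, and eps value here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def private_count(db, predicate, eps):
    # A counting query has sensitivity 1: changing one record moves the count
    # by at most 1.  Adding Laplace(1/eps) noise then gives (eps, 0)-DP.
    true_count = sum(predicate(x) for x in db)
    return true_count + rng.laplace(scale=1.0 / eps)

# Toy query: how many patients have a vitamin C level below 100?
db = [2, 24, 57, 83, 121, 153, 176, 182]
print(private_count(db, lambda x: x < 100, eps=0.5))
```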
LEARNING
PAC Model [Valiant 84]
A distribution P on X; each point in X is labeled 0/1
Samples are drawn according to P
A fresh point is picked according to P
With high probability (over the randomness of the learner and the distribution), a random point drawn according to P is classified correctly
PAC Learning: Definition
Given a distribution P over examples, labeled by c
Hypothesis h is α-good if error(h) = Pr_{x~P}[h(x) ≠ c(x)] ≤ α
C: a set of concepts {c: {0,1}^d → {0,1}}
H: a set of hypotheses {h: {0,1}^d → {0,1}}
Algorithm A PAC learns C with H if, given examples drawn from P and labeled by some c ∈ C: (x1, c(x1)), …, (xn, c(xn)), A outputs an α-good hypothesis h ∈ H w.p. 1 − β
- Proper: C = H
Fact: Θ(VC(C)) samples suffice for PAC learning C (properly)
- VC(C) ≤ log|C|
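A minimal non-private proper learner for the running threshold example (the hypothesis form h_t(x) = 1 ⟺ x < t matches the thresholds used throughout the talk; the sample values below are illustrative):

```python
def learn_threshold(samples):
    # Proper ERM learner for thresholds: any t strictly above every positive
    # example and at most every negative example is consistent with the sample.
    hi_pos = max((x for x, y in samples if y == 1), default=float('-inf'))
    lo_neg = min((x for x, y in samples if y == 0), default=float('inf'))
    assert hi_pos < lo_neg, "no threshold is consistent with these samples"
    t = lo_neg if lo_neg != float('inf') else hi_pos + 1
    return lambda x: int(x < t)

h = learn_threshold([(2, 1), (24, 1), (83, 1), (121, 0), (153, 0)])
print(h(57), h(176))  # -> 1 0
```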
PRIVATE LEARNING
WHY PRIVATE LEARNING?
Party line [KLNRS'08]: learning abstracts many of the computations done over collections of sensitive information
Test-bed for ideas – problems and mitigation
Learning is intimately related to differential privacy:
- Learning-theory tools are useful for privacy [BLR'08, HR'10, …]
- Differential privacy implies generalization [M'?, DFHPRR'15, BSSU'15, NS'15]
  - In a sense, all differential privacy allows us to do is learn!
PRIVATE LEARNING
Definition [KLNRS'08]:
- Algorithm A Private-PAC learns C with H if
  - A PAC learns C with H, and
  - A is (ε, δ)-differentially private
Example 1: Privately Learning Points
POINT_d = { c_j : j ∈ X_d }, where X_d = {0, 1, …, T} and c_j(x) = 1 ⟺ x = j
[Figure: c_j equals 1 only at the point j on the axis 0 … j … T]
Given labeled examples S = (x_i, y_i), i = 1, …, m, provide:
1) Differential privacy
2) If S is consistent with some c_j ∈ POINT_d, then w.h.p. output a hypothesis h s.t. error_S(h) = (1/m)·|{i : h(x_i) ≠ y_i}| ≤ α
Example 2: Privately Learning Thresholds
THRESHOLD_d = { c_j : j ∈ X_d }, where X_d = {0, 1, …, T} and c_j(x) = 1 ⟺ x < j
[Figure: c_j equals 1 on the points below j on the axis 0 … j … T]
Given labeled examples S = (x_i, y_i), i = 1, …, m, provide:
1) Differential privacy
2) If S is consistent with some c_j ∈ THRESHOLD_d, then w.h.p. output a hypothesis h s.t. error_S(h) = (1/m)·|{i : h(x_i) ≠ y_i}| ≤ α
(Both running classes are made concrete in the sketch below.)
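The two running classes and the empirical-error measure, written out concretely (the domain size T = 2^8 is an arbitrary illustrative choice):

```python
T = 2 ** 8  # illustrative domain X_d = {0, ..., T}

def point(j):       # c_j in POINT_d:     c_j(x) = 1  iff  x == j
    return lambda x: int(x == j)

def threshold(j):   # c_j in THRESHOLD_d: c_j(x) = 1  iff  x < j
    return lambda x: int(x < j)

def empirical_error(h, samples):
    # error_S(h) = (1/m) * |{i : h(x_i) != y_i}|
    return sum(h(x) != y for x, y in samples) / len(samples)
```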
A General Feasibility Result
Theorem [KLNRS 08]: Every finite concept class C can be learned privately (and properly), using O(log|C|) examples
Generic Construction (based on the Exponential Mechanism of [MT07]):
- Define q(D, h) = # of x_i's correctly classified by h
- Output hypothesis h from C w.p. ≈ e^{ε·q(D,h)}
[Figure: two candidate hypotheses, with q(D,h) = 4 and q(D,h) = 3]
A General Feasibility Result
Generic Construction (based on the Exponential Mechanism of [MT07]):
- Define q(D, h) = # of x_i's incorrectly classified by h
- Output hypothesis h from C w.p. ≈ e^{−ε·q(D,h)}
Privacy:
- Changing one example changes q(D, h) by at most 1
- The probability of outputting h changes by a factor of at most e^ε
Utility:
- If h has error > α, the probability of outputting h is at most e^{−εαn}
- Union bound: the probability of outputting some h with error > α is at most |C|·e^{−εαn}
- It suffices to take n = O(log|C|)
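A sketch of the generic construction (it reuses threshold/empirical_error from the sketch above; the slide writes ≈ e^{−ε·q(D,h)}, while the code uses the standard ε/2 normalization of the exponential mechanism for sensitivity-1 scores, which the slide suppresses):

```python
import math
import random

random.seed(0)

def exponential_mechanism(samples, hypotheses, eps):
    # q(D, h) = # of examples misclassified by h; changing one example changes
    # q by at most 1 (sensitivity 1).  Output h w.p. ~ exp(-eps/2 * q(D, h)).
    scores = [sum(h(x) != y for x, y in samples) for h in hypotheses]
    lo = min(scores)  # shift scores for numerical stability
    weights = [math.exp(-0.5 * eps * (q - lo)) for q in scores]
    return random.choices(hypotheses, weights=weights, k=1)[0]

# Privately and properly learn THRESHOLD_d: n = O(log |C|) examples suffice.
target = threshold(100)
samples = [(x, target(x)) for x in random.choices(range(T + 1), k=500)]
h = exponential_mechanism(samples, [threshold(j) for j in range(T + 1)], eps=0.5)
print(empirical_error(h, samples))
```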
Had it been the year 2008 …
Brilliant, isn't it? Time for coffee and cookies. THANK YOU!
But…?
Privately Learning Points/Thresholds
Fact: a proper (non-private) Point/Threshold learner needs only O(1) samples, even over a domain of size T = 2^d
The generic construction of private learners results in O(log|C|) = O(log T) samples
Is this gap essential? Why do we care?
- We want private learners to be as efficient as non-private ones
- The generic construction fails when the domain is infinite
Thm [BKN 10]: Any proper pure-private learner of Points/Thresholds must use Ω(log T) samples
CAN WE DO BETTER?
Recall: O(log|C|) examples are needed to beat the union bound in the exponential mechanism analysis
Idea: what if we choose the output hypothesis from a set smaller than C? (See the sketch below.)
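One hedged illustration of the idea (my own toy example, not the talk's actual construction; it reuses threshold and exponential_mechanism from the sketches above): run the exponential mechanism over a small candidate set, so the union bound pays only for the candidates actually considered.

```python
def em_over_grid(samples, eps, k=16):
    # Run the exponential mechanism over only k evenly spaced candidate
    # thresholds instead of all T+1 of them: the union bound in the utility
    # analysis then costs log k rather than log T.  (Illustration only -- a
    # fixed grid sacrifices accuracy; the constructions ahead pick the small
    # candidate set more cleverly.)
    step = max(1, T // k)
    grid = [threshold(j) for j in range(0, T + 1, step)]
    return exponential_mechanism(samples, grid, eps)
```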