

  1. Learning under Differential Privacy
Kobbi Nissim, BGU/Harvard
Caltech, Spring 2015
Based on joint work with: Amos Beimel, Avrim Blum, Hai Brenner, Mark Bun, Cynthia Dwork, Shiva Kasiviswanathan, Homin Lee, Frank McSherry, Sofya Raskhodnikova, Adam Smith, Uri Stemmer, and Salil Vadhan.

  2. Let's Do Some Sweet Science!
— Scurvy: a problem throughout human history
— Caused by vitamin C deficiency
— How much vitamin C is enough?
Thanks: Mark Bun

  3. So You Have Some Data…
[Figure: data points (vitamin C levels 2, 24, 57, 83, 121, 153, 176, 182, …) on an axis labeled "Vitamin C level", each marked 0/1, with a threshold T]
Thanks: Mark Bun

  4. So You Have Some Data…
— c: threshold function that is consistent with the data
[Figure: the same labeled points, with a threshold drawn that separates the 1-labeled from the 0-labeled points]
Thanks: Mark Bun

  6. So You Have Some Data…
— c: threshold function that is consistent with the data
— Theorem: if n > n₀ then c also "agrees" with the underlying distribution
◦ n₀ depends on the learner's accuracy and success probability
◦ n₀ examples suffice, independent of the domain size!
[Figure: the labeled points 2, 24, 57, 83, 121, 153, … with the consistent threshold]
Thanks: Mark Bun

  7. What's the Problem?
— The hypothesis threshold reveals someone's data point!
— With the right auxiliary information, it could be linked to Shiva!
[Figure: the learned threshold sitting exactly on one individual's data point]
Thanks: Mark Bun

  8. Saving Shiva's Privacy
— Idea: a "noisy" choice of threshold hides any individual's contribution!
[Figure: the labeled points on the vitamin C level axis, with a randomized threshold T]
Thanks: Mark Bun

  9. Had it been the year 2000 (AD) …

  10. Had it been the year 2000 (AD) …

  11. Brilliant, isn't it? Time for coffee and cookies. THANK YOU!
But…

  12. LET’S TAKE A STEP BACK

  13. Data Privacy – The Problem
— Given:
◦ A dataset with sensitive information
— How to:
◦ Compute and release functions of the dataset without compromising individual privacy
[Diagram: a database X = (x₁, x₂, …, xₙ) held by a server/agency; an algorithm A answers queries from users – government, researchers, businesses – or a malicious adversary]

  14. Data Privacy – The Problem
— Given:
◦ A dataset with sensitive information
— How to:
◦ Compute and release functions of the dataset without compromising individual privacy
— Hospital: (based on past patients) predict whether a patient is prone to scurvy, based on the vitamin C level in her blood
— Bank: (based on past customers) predict whether a new customer is a good or bad credit risk, based on her attributes
— The example, the label, and even presence in the database may all be sensitive!

  15. Differential Privacy [DMNS 06]
— Evolved in [DN'03, EGS'03, DN'04, BDMN'05, DMNS'06, DKMMN'06]
— Intuition: to protect an individual, make sure that changing her record does not change the output distribution (by too much)
[Diagram: neighboring databases D = (x₁, x₂, x₃, …, xₙ) and D' = (x₁, x₂, x₃', …, xₙ), each fed to algorithm A, with output distributions A(D) and A(D') required to be close]

  16. Differential Privacy [DMNS 06]
Definition: An algorithm A is (ε, δ)-differentially private if for all neighboring databases D, D' and for all sets of answers S:
Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D') ∈ S] + δ
[Diagram: neighboring databases D and D', differing in one record x₃ vs. x₃', with close output distributions A(D), A(D')]

  17. Differential Privacy [DMNS 06]
Definition: An algorithm A is (ε, δ)-differentially private if for all neighboring databases D, D' and for all sets of answers S:
Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D') ∈ S] + δ
— e^ε ≈ 1 + ε
— Take ε > 1/n; otherwise, no utility!
— δ ≪ 1/n
◦ Pure: δ = 0
◦ Approx.: δ > 0
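To make the definition concrete, here is a minimal sketch (my illustration, not part of the talk) of randomized response, the classic mechanism satisfying (ε, 0)-differential privacy for a single sensitive bit; the function name is hypothetical.

    import math
    import random

    def randomized_response(bit, epsilon):
        """Release a sensitive bit with (epsilon, 0)-differential privacy.

        Answers truthfully with probability e^eps / (e^eps + 1) and flips
        the bit otherwise. For either output value, its probability under
        input 0 versus input 1 differs by a factor of exactly e^eps, so
        the definition holds with delta = 0.
        """
        p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
        return bit if random.random() < p_truth else 1 - bit

Averaging many such responses (after debiasing) still estimates the population mean, which is why a per-record ε well above 1/n leaves room for useful aggregate statistics.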

  18. LEARNING

  19. PAC Model [Valiant 84]
— A distribution P on X; each point in X is labeled 0/1
— Samples are drawn according to P
— A fresh point is then picked according to P
— With high probability (over the randomness of the learner and of the samples), the fresh point drawn according to P is "classified" correctly

  20. PAC Learning: Definition
— Given distribution P over examples, labeled by c
— Hypothesis h is α-good if error(h) = Pr_{x∼P}[h(x) ≠ c(x)] ≤ α
— C: a set of concepts {c: {0,1}^d → {0,1}}
— H: a set of hypotheses {h: {0,1}^d → {0,1}}
— Algorithm A PAC learns C with H if, given examples drawn from P and labeled by some c ∈ C:
◦ (x₁, c(x₁)), …, (xₙ, c(xₙ))
◦ A outputs an α-good hypothesis h ∈ H w.p. 1 − β
— Proper: C = H
— Fact: Θ(VC(C)) samples for PAC learning C (properly)
◦ VC(C) ≤ log|C|
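For reference, here is a minimal non-private sketch (mine, not from the talk) of a proper learner for the running threshold example; it returns a hypothesis consistent with the labeled sample, and by the VC bound above, O(1/α · log(1/β)) samples suffice.

    def learn_threshold(sample):
        """Proper (non-private) PAC learner for thresholds c_t(x) = 1 iff x < t.

        Given labeled examples (x_i, y_i) consistent with some threshold,
        returns a consistent hypothesis: t placed at the smallest 0-labeled
        point (or at +infinity if every example is labeled 1).
        """
        negatives = [x for x, y in sample if y == 0]
        t = min(negatives) if negatives else float("inf")
        return lambda x: 1 if x < t else 0

    # Usage: h = learn_threshold([(57, 1), (83, 1), (121, 0), (153, 0)])
    # Then h(60) == 1 and h(130) == 0.

Note that the returned t sits exactly on a data point – precisely the privacy problem from slide 7.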

  21. PRIVATE LEARNING

  22. Why Private Learning?
— Party line [KLNRS'08]: abstracts many of the computations done over collections of sensitive information
— Test-bed for ideas – problems and mitigations
— Learning is intimately related to differential privacy
◦ Learning-theory tools are useful for privacy [BLR'08, HR'10, …]
◦ Differential privacy implies generalization [M'?, DFHPRR'15, BSSU'15, NS'15]
◦ In a sense, all that differential privacy allows us to do is learn!

  23. Private Learning
— Definition [KLNRS'08]:
◦ Algorithm A Private-PAC learns C with H if:
· A PAC learns C with H, and
· A is (ε, δ)-differentially private

  24. Example 1: Privately Learning Points
— POINT_d = {c_j : j ∈ {0, …, T}}, over the domain {0, …, T} with T = 2^d
— c_j(x) = 1 ⟺ x = j
[Figure: c_j over the domain 0, …, T, equal to 1 only at the point j]
— Given labeled examples S = ((x_i, y_i))_{i=1}^{m}, provide:
1) Differential privacy
2) If S is consistent with some c_j ∈ POINT_d, then w.h.p. output an h s.t. error_S(h) = (1/m)·|{i : h(x_i) ≠ y_i}| ≤ α

  25. Example 2: Privately Learning Thresholds
— THRESHOLD_d = {c_j : j ∈ {0, …, T}}, over the domain {0, …, T} with T = 2^d
— c_j(x) = 1 ⟺ x < j
[Figure: c_j over the domain 0, …, T, equal to 1 below the threshold j]
— Given labeled examples S = ((x_i, y_i))_{i=1}^{m}, provide:
1) Differential privacy
2) If S is consistent with some c_j ∈ THRESHOLD_d, then w.h.p. output an h s.t. error_S(h) = (1/m)·|{i : h(x_i) ≠ y_i}| ≤ α

  26. A General Feasibility Result
— Theorem [KLNRS 08]: Every finite concept class C can be learned privately (and properly), using O(log|C|) examples
— Generic Construction (based on the Exponential Mechanism of [MT07]):
◦ Define q(D, h) = # of x_i's correctly classified by h
◦ Output hypothesis h from C w.p. ≈ e^{ε·q(D,h)}
[Figure: two candidate hypotheses, one with q(D,h) = 4 and one with q(D,h) = 3]

  27. A General Feasibility Result
— Generic Construction (based on the Exponential Mechanism of [MT07]):
◦ Define q(D, h) = # of x_i's incorrectly classified by h
◦ Output hypothesis h from C w.p. ≈ e^{−ε·q(D,h)}
— Privacy:
◦ Changing one example changes q(D, h) by at most 1
◦ So the probability of outputting h changes by a factor of at most e^ε
— Utility:
◦ If h has error > α, the probability of outputting h is at most e^{−εαn}
◦ Union bound: the probability of outputting some h with error > α is at most |C|·e^{−εαn}
◦ Suffices to take n = O(log|C|) (hiding the dependence on ε and α)
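Here is a minimal sketch of the generic construction, instantiated for thresholds over the finite domain {0, …, T} (my illustration; the function name is hypothetical). It uses the misclassification score from the slide; the standard exponential mechanism divides the exponent by 2 to account for the score's sensitivity of 1, a constant the slide's ≈ hides.

    import math
    import random

    def private_threshold_learner(sample, T, epsilon):
        """Private proper learner via the exponential mechanism [MT07].

        Hypothesis class: c_t(x) = 1 iff x < t, for t in {0, ..., T}.
        Score q(D, c_t) = # of examples misclassified by c_t. Changing one
        example changes q by at most 1 (sensitivity 1), so sampling c_t with
        probability proportional to exp(-epsilon * q / 2) is epsilon-DP.
        """
        def q(t):
            return sum(1 for x, y in sample if (1 if x < t else 0) != y)

        scores = [q(t) for t in range(T + 1)]
        q_min = min(scores)  # shift scores for numerical stability; ratios unchanged
        weights = [math.exp(-epsilon * (s - q_min) / 2) for s in scores]
        t = random.choices(range(T + 1), weights=weights)[0]
        return lambda x: 1 if x < t else 0

With |C| = T + 1 hypotheses, the union-bound argument above gives sample complexity n = O(log T / (εα)) – exactly the O(log|C|) cost that the next slides ask whether we can avoid.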

  28. Had it been the year 2008 …

  29. Brilliant, isn't it? Time for coffee and cookies. THANK YOU!
But…

  30. Privately Learning Points/Thresholds
— Fact: a proper (non-private) Point/Threshold learner needs only O(1) samples
— The generic construction of private learners results in O(log|C|) = O(log T) samples
— Is this gap essential?
— Why do we care?
◦ We want private learners to be as efficient as non-private ones
◦ The generic construction fails when the domain is infinite
— Thm [BKN 10]: Any proper pure-private learner of Points/Thresholds must use Ω(log T) samples

  31. Can We Do Better?
— Recall: O(log|C|) examples are needed to beat the union bound in the exponential mechanism analysis
— Idea: what if we choose the output hypothesis from a set smaller than C?
