LEARNING UNDER DIFFERENTIAL PRIVACY
KOBBI NISSIM (BGU/Harvard)
Caltech, Spring 2015
Based on joint work with: Amos Beimel, Avrim Blum, Hai Brenner, Mark Bun, Cynthia Dwork, Shiva Kasiviswanathan, Homin Lee, Frank McSherry, Sofya Raskhodnikova, Adam Smith, Uri Stemmer, and Salil Vadhan.
LET'S DO SOME SWEET SCIENCE!
Scurvy: a problem throughout human history
Caused by vitamin C deficiency
How much vitamin C is enough?
Thanks: Mark Bun
SO YOU HAVE SOME DATA…
[Figure: patients' vitamin C levels (2, 24, 57, 83, 121, 153, 176, 182, …) plotted on an axis from 0 to T]
Thanks: Mark Bun
SO YOU HAVE SOME DATA…
c: a threshold function that is consistent with the data
[Figure: the same data, with a threshold separating the points marked x (scurvy) from the rest]
Thanks: Mark Bun
SO YOU HAVE SOME DATA…
c: a threshold function that is consistent with the data
Theorem: if n > n0 then c also "agrees" with the underlying distribution
- n0 depends on the learner's accuracy and success probability
- n0 examples suffice, independent of the domain size! (See the toy simulation below.)
[Figure: the labeled data points 2, 24, 57, 83, 121, 153, …]
Thanks: Mark Bun
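A toy simulation of the theorem (my own illustration, not from the slides; it assumes a uniform distribution on [0, 200) and a true threshold at 100): a threshold consistent with the sample has error on fresh points that shrinks as n grows, regardless of how fine-grained the domain is.

```python
import random

random.seed(0)

def simulate(n, trials=1000):
    # Draw n samples, pick a consistent threshold, and estimate its true error
    # on fresh points from the same distribution P.
    errs = []
    for _ in range(trials):
        xs = [random.uniform(0, 200) for _ in range(n)]
        t = min([x for x in xs if x >= 100], default=200)  # a consistent choice
        fresh = [random.uniform(0, 200) for _ in range(1000)]
        errs.append(sum((x < t) != (x < 100) for x in fresh) / 1000)
    return sum(errs) / trials

print(simulate(8), simulate(100))  # average error shrinks roughly like 1/n
```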
WHAT'S THE PROBLEM?
The hypothesis threshold reveals someone's data point!
With the right auxiliary information, it could be linked to Shiva!
[Figure: the chosen threshold falls exactly on one patient's data point]
Thanks: Mark Bun
SAVING SHIVA'S PRIVACY
Idea: a "noisy" choice of threshold hides any individual contribution! (See the sketch below.)
[Figure: the same data, with the threshold chosen at random rather than pinned to a data point]
Thanks: Mark Bun
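One way to read "noisy choice" (a preview sketch of the exponential-mechanism-style sampling made precise later in the talk, not the slide's own mechanism): instead of deterministically returning a consistent threshold, sample one with probability that decays in its empirical error.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_threshold(samples, candidates, eps):
    # Empirical error of each candidate threshold t (hypothesis: label 1 iff x < t).
    errors = np.array([sum(int(x < t) != y for x, y in samples) for t in candidates],
                      dtype=float)
    # Sample t with probability ~ exp(-eps/2 * error): moving one example changes
    # each error by at most 1, so each output probability moves by a factor <= e^eps.
    weights = np.exp(-0.5 * eps * (errors - errors.min()))  # shift for stability
    return rng.choice(candidates, p=weights / weights.sum())

# Toy run on the slide's data, labeling low vitamin C levels as scurvy (1):
data = [(2, 1), (24, 1), (57, 1), (83, 1), (121, 0), (153, 0), (176, 0), (182, 0)]
print(noisy_threshold(data, candidates=range(0, 201, 10), eps=1.0))
```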
Had it been the year 2000 (AD) …
Brilliant, isn't it? Time for coffee and cookies. THANK YOU!
But…?
LET’S TAKE A STEP BACK
DATA PRIVACY – THE PROBLEM
Given:
- A dataset with sensitive information
How to:
- Compute and release functions of the dataset without compromising individual privacy
[Figure: a database X = (x1, x2, …, xn) held by a server/agency running algorithm A; users (government, researchers, businesses) or a malicious adversary send queries and receive answers]
DATA PRIVACY – THE PROBLEM
Given:
- A dataset with sensitive information
How to:
- Compute and release functions of the dataset without compromising individual privacy
Hospital: (based on past patients) predict whether a patient is prone to scurvy, based on the vitamin C level in her blood
Bank: (based on past customers) predict whether a new customer is a good/bad credit risk, based on her attributes
Example, label, and presence in the database may all be sensitive!
Differential Privacy [DMNS 06]
Evolved in [DN'03, EGS'03, DN'04, BDMN'05, DMNS'06, DKMMN'06]
Intuition: to protect an individual, make sure that changing her record does not change the output distribution (by too much)
[Figure: neighboring databases D = (x1, x2, x3, …, xn) and D' = (x1, x2, x'3, …, xn), differing in one record; A(D) and A(D') should be close in distribution]
Differential Privacy [DMNS 06]
Definition: An algorithm A is (ε, δ)-differentially private if for all neighboring databases D, D' and for all sets of answers S:
Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D') ∈ S] + δ
[Figure: neighboring databases D and D', differing only in the record x3 vs. x'3]
Differential Privacy [DMNS 06]
Definition: An algorithm A is (ε, δ)-differentially private if for all neighboring databases D, D' and for all sets of answers S:
Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D') ∈ S] + δ
- e^ε ≈ 1 + ε. Take ε > 1/n; otherwise, no utility!
- δ ≪ 1/n
- Pure: δ = 0. Approx.: δ > 0
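A minimal sketch of the Laplace mechanism of [DMNS'06], the canonical way to satisfy the definition with δ = 0 (the query, data, and eps value here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def private_count(db, predicate, eps):
    # A counting query has sensitivity 1: changing one record moves the count
    # by at most 1.  Adding Laplace(1/eps) noise then gives (eps, 0)-DP.
    true_count = sum(predicate(x) for x in db)
    return true_count + rng.laplace(scale=1.0 / eps)

# Toy query: how many patients have a vitamin C level below 100?
db = [2, 24, 57, 83, 121, 153, 176, 182]
print(private_count(db, lambda x: x < 100, eps=0.5))
```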
LEARNING
PAC Model [Valiant 84]
A distribution P on X; each point in X is labeled 0/1
Samples are drawn according to P
A fresh point is picked according to P
With high probability (over the randomness of the learner and the distribution), a random point drawn according to P is classified correctly
PAC Learning: Definition
Given a distribution P over examples, labeled by c
Hypothesis h is α-good if error(h) = Pr_{x~P}[h(x) ≠ c(x)] ≤ α
C: a set of concepts {c: {0,1}^d → {0,1}}
H: a set of hypotheses {h: {0,1}^d → {0,1}}
Algorithm A PAC learns C with H if, given examples drawn from P and labeled by some c ∈ C: (x1, c(x1)), …, (xn, c(xn)), A outputs an α-good hypothesis h ∈ H w.p. 1 − β
- Proper: C = H
Fact: Θ(VC(C)) samples suffice for PAC learning C (properly)
- VC(C) ≤ log|C|
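A minimal non-private proper learner for the running threshold example (the hypothesis form h_t(x) = 1 ⟺ x < t matches the thresholds used throughout the talk; the sample values below are illustrative):

```python
def learn_threshold(samples):
    # Proper ERM learner for thresholds: any t strictly above every positive
    # example and at most every negative example is consistent with the sample.
    hi_pos = max((x for x, y in samples if y == 1), default=float('-inf'))
    lo_neg = min((x for x, y in samples if y == 0), default=float('inf'))
    assert hi_pos < lo_neg, "no threshold is consistent with these samples"
    t = lo_neg if lo_neg != float('inf') else hi_pos + 1
    return lambda x: int(x < t)

h = learn_threshold([(2, 1), (24, 1), (83, 1), (121, 0), (153, 0)])
print(h(57), h(176))  # -> 1 0
```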
PRIVATE LEARNING
WHY PRIVATE LEARNING?
Party line [KLNRS'08]: learning abstracts many of the computations done over collections of sensitive information
Test-bed for ideas – problems and mitigation
Learning is intimately related to differential privacy:
- Learning-theory tools are useful for privacy [BLR'08, HR'10, …]
- Differential privacy implies generalization [M'?, DFHPRR'15, BSSU'15, NS'15]
  - In a sense, all differential privacy allows us to do is learn!
PRIVATE LEARNING
Definition [KLNRS'08]:
- Algorithm A Private-PAC learns C with H if
  - A PAC learns C with H, and
  - A is (ε, δ)-differentially private
Example 1: Privately Learning Points
POINT_d = { c_j : j ∈ X_d }, where X_d = {0, 1, …, T} and c_j(x) = 1 ⟺ x = j
[Figure: c_j equals 1 only at the point j on the axis 0 … j … T]
Given labeled examples S = (x_i, y_i), i = 1, …, m, provide:
1) Differential privacy
2) If S is consistent with some c_j ∈ POINT_d, then w.h.p. output a hypothesis h s.t. error_S(h) = (1/m)·|{i : h(x_i) ≠ y_i}| ≤ α
Example 2: Privately Learning Thresholds
THRESHOLD_d = { c_j : j ∈ X_d }, where X_d = {0, 1, …, T} and c_j(x) = 1 ⟺ x < j
[Figure: c_j equals 1 on the points below j on the axis 0 … j … T]
Given labeled examples S = (x_i, y_i), i = 1, …, m, provide:
1) Differential privacy
2) If S is consistent with some c_j ∈ THRESHOLD_d, then w.h.p. output a hypothesis h s.t. error_S(h) = (1/m)·|{i : h(x_i) ≠ y_i}| ≤ α
(Both running classes are made concrete in the sketch below.)
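The two running classes and the empirical-error measure, written out concretely (the domain size T = 2^8 is an arbitrary illustrative choice):

```python
T = 2 ** 8  # illustrative domain X_d = {0, ..., T}

def point(j):       # c_j in POINT_d:     c_j(x) = 1  iff  x == j
    return lambda x: int(x == j)

def threshold(j):   # c_j in THRESHOLD_d: c_j(x) = 1  iff  x < j
    return lambda x: int(x < j)

def empirical_error(h, samples):
    # error_S(h) = (1/m) * |{i : h(x_i) != y_i}|
    return sum(h(x) != y for x, y in samples) / len(samples)
```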
A General Feasibility Result
Theorem [KLNRS 08]: Every finite concept class C can be learned privately (and properly), using O(log|C|) examples
Generic Construction (based on the Exponential Mechanism of [MT07]):
- Define q(D, h) = # of x_i's correctly classified by h
- Output hypothesis h from C w.p. ≈ e^{ε·q(D,h)}
[Figure: two candidate hypotheses, with q(D,h) = 4 and q(D,h) = 3]
A General Feasibility Result
Generic Construction (based on the Exponential Mechanism of [MT07]):
- Define q(D, h) = # of x_i's incorrectly classified by h
- Output hypothesis h from C w.p. ≈ e^{−ε·q(D,h)}
Privacy:
- Changing one example changes q(D, h) by at most 1
- The probability of outputting h changes by a factor of at most e^ε
Utility:
- If h has error > α, the probability of outputting h is at most e^{−εαn}
- Union bound: the probability of outputting some h with error > α is at most |C|·e^{−εαn}
- It suffices to take n = O(log|C|)
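A sketch of the generic construction (it reuses threshold/empirical_error from the sketch above; the slide writes ≈ e^{−ε·q(D,h)}, while the code uses the standard ε/2 normalization of the exponential mechanism for sensitivity-1 scores, which the slide suppresses):

```python
import math
import random

random.seed(0)

def exponential_mechanism(samples, hypotheses, eps):
    # q(D, h) = # of examples misclassified by h; changing one example changes
    # q by at most 1 (sensitivity 1).  Output h w.p. ~ exp(-eps/2 * q(D, h)).
    scores = [sum(h(x) != y for x, y in samples) for h in hypotheses]
    lo = min(scores)  # shift scores for numerical stability
    weights = [math.exp(-0.5 * eps * (q - lo)) for q in scores]
    return random.choices(hypotheses, weights=weights, k=1)[0]

# Privately and properly learn THRESHOLD_d: n = O(log |C|) examples suffice.
target = threshold(100)
samples = [(x, target(x)) for x in random.choices(range(T + 1), k=500)]
h = exponential_mechanism(samples, [threshold(j) for j in range(T + 1)], eps=0.5)
print(empirical_error(h, samples))
```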
Had it been the year 2008 …
Brilliant, isn't it? Time for coffee and cookies. THANK YOU!
But…?
Privately Learning Points/Thresholds
Fact: a proper (non-private) Point/Threshold learner needs only O(1) samples, even over a domain of size T = 2^d
The generic construction of private learners results in O(log|C|) = O(log T) samples
Is this gap essential? Why do we care?
- We want private learners to be as efficient as non-private ones
- The generic construction fails when the domain is infinite
Thm [BKN 10]: Any proper pure-private learner of Points/Thresholds must use Ω(log T) samples
CAN WE DO BETTER?
Recall: O(log|C|) examples are needed to beat the union bound in the exponential mechanism analysis
Idea: what if we choose the output hypothesis from a set smaller than C? (See the sketch below.)
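One hedged illustration of the idea (my own toy example, not the talk's actual construction; it reuses threshold and exponential_mechanism from the sketches above): run the exponential mechanism over a small candidate set, so the union bound pays only for the candidates actually considered.

```python
def em_over_grid(samples, eps, k=16):
    # Run the exponential mechanism over only k evenly spaced candidate
    # thresholds instead of all T+1 of them: the union bound in the utility
    # analysis then costs log k rather than log T.  (Illustration only -- a
    # fixed grid sacrifices accuracy; the constructions ahead pick the small
    # candidate set more cleverly.)
    step = max(1, T // k)
    grid = [threshold(j) for j in range(0, T + 1, step)]
    return exponential_mechanism(samples, grid, eps)
```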