CSC2412: Adaptive Data Analysis via Differential Privacy


  1. CSC2412: Adaptive Data Analysis via Differential Privacy. Sasho Nikolov

  2. The adaptive data analysis problem

  3. Estimating population counts
     • $\mathcal{X}$ is the universe of possible data points.
     • Unknown distribution $D$ on $\mathcal{X}$: models the population.
     • Predicates $q_1, \ldots, q_k : \mathcal{X} \to \{0, 1\}$, e.g. "smoker?", "smoker and male?", "PhD?".
     Want to estimate, for all $i = 1, \ldots, k$:
     $$q_i(D) = \mathbb{E}_{x \sim D}[q_i(x)],$$
     the fraction of the population satisfying $q_i$.

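  4. The classical solution
     Draw a sample $X = \{x_1, \ldots, x_n\}$ iid from $D$, and write $q_i(X) = \frac{1}{n}\sum_{j=1}^{n} q_i(x_j)$.
     Hope that $\forall i: q_i(X) \approx q_i(D)$.
     If the $q_i$ are chosen independently of $X$, then $\mathbb{E}[q_i(X)] = q_i(D)$, and by Hoeffding's inequality
     $$P(|q_i(X) - q_i(D)| > \alpha) \le 2e^{-2n\alpha^2}.$$
     By a union bound over the $k$ queries,
     $$P(\exists i: |q_i(X) - q_i(D)| > \alpha) \le 2k\,e^{-2n\alpha^2} \le \beta \quad \text{if } n \ge \frac{\ln(2k/\beta)}{2\alpha^2}.$$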
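The sample-size requirement from the Hoeffding and union bound argument on this slide can be checked numerically. The following is a minimal sketch; the function names and the example values of `k`, `alpha`, and `beta` are illustrative assumptions, not part of the lecture.

```python
import math


def hoeffding_sample_size(k: int, alpha: float, beta: float) -> int:
    """Smallest integer n with n >= ln(2k / beta) / (2 * alpha^2)."""
    return math.ceil(math.log(2 * k / beta) / (2 * alpha ** 2))


def union_bound_failure(k: int, n: int, alpha: float) -> float:
    """Upper bound 2k * exp(-2 n alpha^2) on P(some query has error > alpha)."""
    return 2 * k * math.exp(-2 * n * alpha ** 2)


# Illustrative values (assumed): 1000 non-adaptive queries, error 0.05,
# failure probability 0.05.
n = hoeffding_sample_size(k=1000, alpha=0.05, beta=0.05)
print(n, union_bound_failure(1000, n, 0.05))
```

Note that the dependence on the number of queries is only logarithmic, which is what makes the non-adaptive case cheap.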

  5. Adaptive queries?
     What if $q_i$ depends on $q_1, \ldots, q_{i-1}$? E.g., $q_i$ is chosen based on $q_1(X), \ldots, q_{i-1}(X)$, the estimates for $q_1(D), \ldots, q_{i-1}(D)$.
     E.g., first ask "smokers?", then split into male and female, or ask "smokers and ≥ 35 yrs old?".
     Suppose we ask $q_1(X), \ldots, q_{k-1}(X)$ for random predicates $q_1, \ldots, q_{k-1}$, and invert the answers to learn $X$. We then take $q_k(x) = 1 \iff x \in X$, so $q_k(X) = 1$.
     But if $D$ is uniform on a universe $\mathcal{X}$ much larger than $n$, then $q_k(D) \le n/|\mathcal{X}| \approx 0$.
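The failure mode on this slide can be illustrated with a short simulation of its last step only. This is a sketch under stated assumptions: the universe is the integers `0 .. universe_size - 1`, $D$ is uniform on it, and the analyst is assumed to have already recovered the sample exactly from earlier answers (the inversion step itself is not implemented). All names and sizes are hypothetical.

```python
import random


def empirical_mean(q, sample):
    """q(X): the fraction of sample points satisfying predicate q."""
    return sum(q(x) for x in sample) / len(sample)


def population_mean_uniform(q, universe_size):
    """q(D) when D is uniform on {0, ..., universe_size - 1}."""
    return sum(q(x) for x in range(universe_size)) / universe_size


rng = random.Random(0)
universe_size = 100_000  # assumed universe size, much larger than n
n = 100
sample = [rng.randrange(universe_size) for _ in range(n)]

# Assumption: earlier adaptive queries revealed the sample to the analyst.
members = set(sample)


def q_k(x):
    """The overfitted final query: 1 exactly on the sample points."""
    return 1 if x in members else 0


emp = empirical_mean(q_k, sample)                  # equals 1 by construction
pop = population_mean_uniform(q_k, universe_size)  # at most n / universe_size
print(emp, pop)
```

The empirical answer is as far from the population answer as a counting query can be, even though every individual step used only exact empirical means.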

  6. A simple solution
     Break $X$ into $k$ parts $X^{(1)}, \ldots, X^{(k)}$ of $n/k$ points each.
     Answer $q_1$ by $q_1(X^{(1)})$, $q_2$ by $q_2(X^{(2)})$, ..., $q_k$ by $q_k(X^{(k)})$.
     With probability $1 - \beta$ we get error at most $\alpha$, but we need
     $$n \ge \frac{k \ln(2k/\beta)}{2\alpha^2}.$$
     Can we do better?
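Sample splitting as described on this slide can be sketched as follows. The class and function names are hypothetical; the sketch assumes each query is a Python predicate returning 0/1 (or a bool) and that any leftover points after an even split are simply discarded.

```python
import math


def split_sample(sample, k):
    """Break the sample into k disjoint parts of floor(n / k) points each."""
    size = len(sample) // k
    if size == 0:
        raise ValueError("need at least k sample points")
    return [sample[i * size:(i + 1) * size] for i in range(k)]


class SplitAnswerer:
    """Answers the i-th adaptive query on the i-th fresh part of the sample."""

    def __init__(self, sample, k):
        self._parts = iter(split_sample(sample, k))

    def answer(self, q):
        part = next(self._parts, None)
        if part is None:
            raise RuntimeError("query budget of k queries exhausted")
        return sum(q(x) for x in part) / len(part)


def splitting_sample_size(k, alpha, beta):
    """Total n = k * ceil(ln(2k / beta) / (2 alpha^2)) needed by splitting."""
    return k * math.ceil(math.log(2 * k / beta) / (2 * alpha ** 2))


answerer = SplitAnswerer(list(range(12)), k=3)
a1 = answerer.answer(lambda x: x % 2 == 0)  # uses points 0..3
a2 = answerer.answer(lambda x: x >= 6)      # uses points 4..7
a3 = answerer.answer(lambda x: 1)           # uses points 8..11
```

Because each part is independent of the queries asked before it, the non-adaptive Hoeffding argument applies to every answer, at the price of a factor $k$ in the sample size.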

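  7. Transfer theorem
     $\mathcal{M}$ answers $q_i$ with $\mathcal{M}(X)_i$; each $q_i$ is determined by the analyst from the previous answers $\mathcal{M}(X)_1, \ldots, \mathcal{M}(X)_{i-1}$.
     Theorem. Suppose $\mathcal{M}$ takes a dataset $X$ and answers $k$ adaptive queries $q_1, \ldots, q_k$. If
     1. $\forall X \in \mathcal{X}^n$, $P(\exists i : |q_i(X) - \mathcal{M}(X)_i| > \alpha) < \alpha\beta$ (i.e., $\mathcal{M}$ is accurate on the dataset),
     2. $\mathcal{M}$ is $(\alpha, \alpha\beta)$-DP,
     then, for a constant $C$,
     $$P(\exists i : |\mathcal{M}(X)_i - q_i(D)| > C\alpha) < C\beta,$$
     where $X \sim D^n$, i.e., $X = \{x_1, \ldots, x_n\}$ with each $x_i$ drawn independently from $D$.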

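  8. Improving on the simple solution
     Simple solution: error $\alpha$ with $n \approx \frac{k \ln k}{\alpha^2}$ samples.
     Can get error $\alpha$ with $\approx \frac{\sqrt{k}\,\log k}{\alpha^2}$ samples.
     Gaussian noise + advanced composition: answer $q_i$ with $q_i(X) + Z_i$, where $Z_i \sim N(0, \sigma^2)$. For a suitable $\sigma$ we get $(\varepsilon, \delta)$-DP for any $\varepsilon$ and $\delta$; for the transfer theorem we need $(\alpha, \alpha\beta)$-DP.
     The standard deviation per query is $\sigma = O\!\left(\frac{\sqrt{k \ln(1/\delta)}}{\varepsilon n}\right)$.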
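The noise-addition step on this slide can be sketched as below. This is only an illustration of answering each query with its empirical mean plus Gaussian noise: the class name is hypothetical, and `sigma` is taken as a given parameter. Choosing `sigma` so that all `k` answers together satisfy a target $(\varepsilon, \delta)$ guarantee is the job of the composition analysis and is not done by this code.

```python
import random


class GaussianAnswerer:
    """Answers each counting query with q(X) + Z, where Z ~ N(0, sigma^2)."""

    def __init__(self, sample, sigma, seed=None):
        self._sample = list(sample)
        self._sigma = sigma  # assumed to be calibrated by the caller
        self._rng = random.Random(seed)

    def answer(self, q):
        # Exact empirical mean of the predicate on the full sample.
        exact = sum(q(x) for x in self._sample) / len(self._sample)
        # Fresh, independent Gaussian noise for every query.
        return exact + self._rng.gauss(0.0, self._sigma)


data = list(range(1000))
noiseless = GaussianAnswerer(data, sigma=0.0, seed=1)
noisy = GaussianAnswerer(data, sigma=0.01, seed=1)

exact_answer = noiseless.answer(lambda x: x < 250)
noisy_answer = noisy.answer(lambda x: x < 250)
```

Unlike sample splitting, every query is answered on the whole sample, so the empirical error per query comes only from the added noise.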

  9. Key Lemma
     Lemma. Suppose $\mathcal{W}$ is $(\varepsilon, \delta)$-DP, and on input $X$ outputs a counting query $q : \mathcal{X} \to \{0, 1\}$. Let $X \sim D^n$. Then
     $$\left| \mathbb{E}[q(D) \mid q = \mathcal{W}(X)] - \mathbb{E}[q(X) \mid q = \mathcal{W}(X)] \right| \le e^{\varepsilon} - 1 + \delta.$$
     The expectation is over the random choice of $X \sim D^n$ and the randomness of $\mathcal{W}$.
     In words: a DP algorithm cannot find a query that distinguishes $X$ from $D$.

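  10. Proof of Key Lemma
     $$\mathbb{E}[q(X) \mid q = \mathcal{W}(X)] = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[q(x_i) \mid q = \mathcal{W}(X)] = \frac{1}{n}\sum_{i=1}^{n} P(q(x_i) = 1 \mid q = \mathcal{W}(X)).$$
     Take $x_i' \sim D$ independently from everything else, and let $X^{(i)} = \{x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n\}$, which is neighbouring to $X$.
     By $(\varepsilon, \delta)$-DP of $\mathcal{W}$,
     $$P(q(x_i) = 1 \mid q = \mathcal{W}(X)) \le e^{\varepsilon}\, P(q(x_i) = 1 \mid q = \mathcal{W}(X^{(i)})) + \delta.$$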

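  11. Proof part 2
     The pair $(x_i, X^{(i)})$ has the same distribution as $(x_i', X)$, so
     $$P(q(x_i) = 1 \mid q = \mathcal{W}(X)) \le e^{\varepsilon}\, P(q(x_i) = 1 \mid q = \mathcal{W}(X^{(i)})) + \delta = e^{\varepsilon}\, P(q(x_i') = 1 \mid q = \mathcal{W}(X)) + \delta = e^{\varepsilon}\, \mathbb{E}[q(D) \mid q = \mathcal{W}(X)] + \delta.$$
     Averaging over $i$,
     $$\mathbb{E}[q(X) \mid q = \mathcal{W}(X)] \le e^{\varepsilon}\, \mathbb{E}[q(D) \mid q = \mathcal{W}(X)] + \delta,$$
     and therefore, since $q(D) \le 1$,
     $$\mathbb{E}[q(X) \mid q = \mathcal{W}(X)] - \mathbb{E}[q(D) \mid q = \mathcal{W}(X)] \le (e^{\varepsilon} - 1)\, \mathbb{E}[q(D) \mid q = \mathcal{W}(X)] + \delta \le e^{\varepsilon} - 1 + \delta.$$
     The other direction is analogous.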

  12. Aside: Generalization from DP
     Almost the same proof as the lemma (exercise).
     Theorem. For any non-negative loss $\ell(\theta, (x, y))$, $X = \{(x_1, y_1), \ldots, (x_n, y_n)\} \sim D^n$, and
     $$L_X(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell(\theta, (x_i, y_i)), \qquad L_D(\theta) = \mathbb{E}_{(x, y) \sim D}[\ell(\theta, (x, y))],$$
     if $\theta$ is computed by an $(\varepsilon, \delta)$-DP algorithm, then
     $$\mathbb{E}[L_D(\theta)] \le e^{\varepsilon}\, \mathbb{E}[L_X(\theta)] + \delta \max_{\theta, x, y} \ell(\theta, (x, y)).$$
     In words: for DP algorithms, the population loss is not much more than the empirical loss.

  13. A simpler transference theorem
     Theorem. If the mechanism $\mathcal{M}$ satisfies that
     1. $\forall X \in \mathcal{X}^n$, and all sequences of adaptive queries $q_1, \ldots, q_k$, $\mathbb{E}[\max_i |q_i(X) - \mathcal{M}(X)_i|] \le \alpha$,
     2. $\mathcal{M}$ is $(\varepsilon, \delta)$-DP,
     then
     $$\mathbb{E}\left[\max_i |q_i(D) - \mathcal{M}(X)_i|\right] \le \alpha + e^{\varepsilon} - 1 + \delta,$$
     where $X \sim D^n$ and $q_1, \ldots, q_k$ are chosen adaptively based on $\mathcal{M}(X)_1, \ldots, \mathcal{M}(X)_{k-1}$.
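To get a feel for the bound $\alpha + e^{\varepsilon} - 1 + \delta$ in this theorem, it can be evaluated for a few parameter settings. This is a small sketch; the function name and the sample values are assumptions chosen for illustration.

```python
import math


def transfer_bound(alpha: float, eps: float, delta: float) -> float:
    """Population error bound alpha + e^eps - 1 + delta from the theorem."""
    # math.expm1(eps) computes e^eps - 1 accurately for small eps.
    return alpha + math.expm1(eps) + delta


# Illustrative values (assumed): empirical error 0.05, eps = 0.05, delta = 0.001.
bound = transfer_bound(alpha=0.05, eps=0.05, delta=0.001)
print(bound)
```

For small $\varepsilon$ we have $e^{\varepsilon} - 1 \approx \varepsilon$, so taking $\varepsilon \approx \alpha$ and $\delta \le \alpha$ keeps the population error within a constant factor of the empirical error $\alpha$.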

  14. Proof
     Note that $(1 - q_i)(X) = 1 - q_i(X)$ and $(1 - q_i)(D) = 1 - q_i(D)$.
     Trick: Suppose that if $q_i$ is asked, so is $1 - q_i$, and it is answered by $1 - \mathcal{M}(X)_i$. Then
     $$\max_{i=1}^{k} |q_i(D) - \mathcal{M}(X)_i| = \max_{i=1}^{k} \left(q_i(D) - \mathcal{M}(X)_i\right),$$
     because $|q_i(D) - \mathcal{M}(X)_i| = \max\{q_i(D) - \mathcal{M}(X)_i,\ (1 - q_i(D)) - (1 - \mathcal{M}(X)_i)\}$.
     Define $\mathcal{W}$: on input $X$, it simulates the adaptive queries and the answers of $\mathcal{M}$, and outputs $q_j$ for $j = \arg\max_j \left(q_j(D) - \mathcal{M}(X)_j\right)$.
     Since $\mathcal{M}$ is $(\varepsilon, \delta)$-DP, $\mathcal{W}$ is $(\varepsilon, \delta)$-DP by post-processing.

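  15. Proof pt 2
     With $q_j = \mathcal{W}(X)$,
     $$\mathbb{E}\left[\max_i \left(q_i(D) - \mathcal{M}(X)_i\right)\right] = \mathbb{E}[q_j(D) - \mathcal{M}(X)_j \mid q_j = \mathcal{W}(X)]$$
     $$= \mathbb{E}[q_j(D) - q_j(X) \mid q_j = \mathcal{W}(X)] + \mathbb{E}[q_j(X) - \mathcal{M}(X)_j \mid q_j = \mathcal{W}(X)]$$
     $$\le (e^{\varepsilon} - 1 + \delta) + \alpha,$$
     where the first term is bounded by the Key Lemma and the second by the accuracy of $\mathcal{M}$ on the dataset.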
