Locally private learning without interaction requires separation


1. Locally private learning without interaction requires separation. Vitaly Feldman. Research with Amit Daniely, Hebrew University.

2. Local Differential Privacy (LDP) [KLNRS '08]. A protocol is ε-LDP if for every user i, message j is sent using a local ε_{i,j}-DP randomizer A_{i,j}, and Σ_j ε_{i,j} ≤ ε. [Diagram: users holding data z_1, z_2, z_3, …, z_n exchange messages with a server.]
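
To make the local randomizer concrete, here is a minimal Python sketch of binary randomized response, the textbook example of an ε-DP local randomizer (an illustration, not the specific mechanism used in the talk):

    import math
    import random

    def randomized_response(bit, epsilon):
        """Binary randomized response: a local epsilon-DP randomizer.

        Reports the true bit with probability e^eps / (e^eps + 1), so the
        likelihood ratio of any report under inputs 0 vs 1 is at most e^eps.
        """
        p_true = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
        return bit if random.random() < p_true else 1 - bit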

3. Non-interactive LDP. [Diagram: users z_1, z_2, z_3, …, z_n each send a single report to the server in one round; the randomizers are fixed before any communication, so no report depends on another user's message.]
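
A minimal sketch of this one-round shape, reusing the randomized_response sketch above (the aggregation and debiasing are illustrative assumptions, not the talk's protocol):

    def non_interactive_mean(bits, epsilon):
        """One-round LDP protocol: every randomizer is fixed up front.

        Each user sends a single randomized-response report; the server
        debiases the average, using E[report] = (1 - p) + mean * (2p - 1).
        """
        p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
        reports = [randomized_response(b, epsilon) for b in bits]
        avg = sum(reports) / len(reports)
        return (avg - (1.0 - p)) / (2.0 * p - 1.0)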

4. PAC learning. PAC model [Valiant '84]: Let C be a set of binary classifiers over X. A is a PAC learning algorithm for C if for every f ∈ C and every distribution D over X, given i.i.d. examples (x_i, f(x_i)) for x_i ∼ D, A outputs h such that w.h.p. Pr_{x∼D}[h(x) ≠ f(x)] ≤ α. Distribution-specific learning: D is fixed and known to A.
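
The success criterion reads directly as a Monte Carlo estimate; a small sketch (h, f, and sample_x are placeholders for the abstract hypothesis, target, and distribution in the definition):

    def pac_error(h, f, sample_x, n=10_000):
        """Estimate Pr_{x~D}[h(x) != f(x)] by sampling.

        h, f: functions X -> {-1, +1}; sample_x() draws one x ~ D.
        A PAC learner must drive this quantity below alpha w.h.p.
        """
        xs = [sample_x() for _ in range(n)]
        return sum(1 for x in xs if h(x) != f(x)) / n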

5. Statistical query model [Kearns '93]. Z = X × {±1}, and P is a distribution over Z; for learning, P is the distribution of (x, f(x)) for x ∼ D. [Diagram: the SQ algorithm sends queries φ_1, φ_2, …, φ_q to the SQ oracle and receives answers v_1, v_2, …, v_q.] Each query is a function φ_i : Z → [0,1], and each answer satisfies |v_i − E_{z∼P}[φ_i(z)]| ≤ τ, where τ is the tolerance of the query; τ = 1/√n corresponds to estimation from n samples.

[KLNRS '08] Simulation with success prob. 1 − β (for ε ≤ 1):
β€’ ε-LDP with m messages ⇒ O(m) queries with tolerance τ = Ω(1/m)
β€’ q queries with tolerance τ ⇒ ε-LDP with n = O(q·log(q/β)/(τ²·ε²)) samples/messages
Non-interactive if and only if the queries are non-adaptive.
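
The τ = 1/√n correspondence is just averaging: a sketch of an SQ oracle implemented from n i.i.d. samples (this is the standard sample-based oracle, shown for illustration; it is not the KLNRS simulation itself):

    def sq_oracle(samples):
        """Answer statistical queries phi: Z -> [0,1] by averaging.

        With n samples, each answer lands within tau ~ 1/sqrt(n) of
        E_{z~P}[phi(z)] with high probability (Hoeffding's inequality).
        """
        def answer(phi):
            return sum(phi(z) for z in samples) / len(samples)
        return answer

    # Example: with samples (x, f(x)), the query phi((x, y)) = (1 + y * h(x)) / 2
    # estimates (1 + E[f(x) * h(x)]) / 2, i.e. the correlation of h with the target.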

6. Known results. C is SQ-learnable efficiently (non-adaptively) if and only if it is learnable efficiently with ε-LDP (non-interactively). Examples:
β€’ Yes: halfspaces/linear classifiers [Dunagan, Vempala '04]
β€’ No: parity functions [Kearns '93]
β€’ Yes, non-adaptively: Boolean conjunctions
[KLNRS '08] There exists C ("masked parity") that is (1) SQ/LDP-learnable efficiently over the uniform distribution on {0,1}^d, but (2) requires an exponential number of samples to learn non-interactively by an LDP algorithm. [KLNRS '08]: Does the separation hold for distribution-independent learning?
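
The parity lower bound rests on a standard orthogonality calculation (added here for intuition; it is folklore, not a claim specific to these slides): distinct parities are pairwise uncorrelated under the uniform distribution, so any single bounded query correlates noticeably with only a few of the 2^d parities. In LaTeX:

    \[
    \chi_S(x) = (-1)^{\sum_{i \in S} x_i}, \qquad
    \mathbf{E}_{x \sim U(\{0,1\}^d)}\bigl[\chi_S(x)\,\chi_T(x)\bigr]
      = \begin{cases} 1 & S = T,\\ 0 & S \neq T, \end{cases}
    \]
    so by Parseval, any $h : \{0,1\}^d \to [-1,1]$ satisfies
    $\sum_S \mathbf{E}[h\,\chi_S]^2 \le 1$: at most $k^2$ of the $2^d$
    parities have correlation at least $1/k$ with $h$.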

7. Margin complexity. Margin complexity of C over X, MC(C): the smallest M such that there exists an embedding Ψ: X → B_d(1) (the unit ball, in some dimension d) under which every f ∈ C is linearly separable with margin γ ≥ 1/M. [Figure: positive examples {Ψ(x) : f(x) = +1} and negative examples {Ψ(x) : f(x) = −1} on opposite sides of a separating hyperplane, with a margin between them.]
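
For concreteness, a sketch of the quantity involved: the margin a separator w achieves on embedded, labeled points (names are illustrative; MC(C) asks for one Ψ that makes this at least 1/M simultaneously for every f ∈ C):

    import numpy as np

    def margin(w, embedded_points, labels):
        """min_i y_i * <w, Psi(x_i)> / ||w|| over points in the unit ball.

        Positive iff w linearly separates the labeled points; the margin
        complexity bound asks for a value of at least 1/M.
        """
        w = np.asarray(w, dtype=float)
        P = np.asarray(embedded_points, dtype=float)  # rows Psi(x_i), norms <= 1
        y = np.asarray(labels, dtype=float)           # entries in {-1, +1}
        return float(np.min(y * (P @ w)) / np.linalg.norm(w))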

8. Lower bound. Thm: Let C be a negation-closed set of classifiers. Any non-interactive 1-LDP algorithm that learns C with error α < 1/2 and success probability Ω(1) needs n = Ω(MC(C)^{2/3}).

Corollaries:
β€’ Decision lists over {0,1}^d: n = 2^{Ω(d^{1/3})} [Buhrman, Vereshchagin, de Wolf '07]; (interactively) learnable with n = poly(d/(αε)) [Kearns '93]
β€’ Linear classifiers over {0,1}^d: n = 2^{Ω(d)} [Goldmann, Hastad, Razborov '92; Sherstov '07]; (interactively) learnable with n = poly(d/(αε)) [Dunagan, Vempala '04]

9. Upper bound. Thm: For any C and distribution D there exists a non-adaptive ε-LDP algorithm that learns C over D with error α and success probability 1 − β using n = poly(MC(C)·log(1/β)/(αε)).

Instead of a fixed D:
β€’ access to public unlabeled samples from D, or
β€’ (interactive) LDP access to unlabeled samples from D.
The lower bound holds against this hybrid model.

10. Lower bound technique. Thm: Let C be a negation-closed set of classifiers. If there exists a non-adaptive SQ algorithm that uses q queries of tolerance 1/q to learn C with error α < 1/2 and success probability Ω(1), then MC(C) = O(q^{3/2}).

Correlation dimension of C, CSQdim(C) [F. '08]: the smallest d for which there exist d functions h_1, …, h_d : X → [−1,1] such that for every f ∈ C and every distribution D there exists i with |E_{x∼D}[f(x)·h_i(x)]| ≥ 1/d.

Thm [F. '08; Kallweit, Simon '11]: MC(C) ≤ CSQdim(C)^{3/2}.
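
Restating the correlation-dimension definition as one displayed formula (a direct transcription of the text above, for reference):

    \[
    \mathrm{CSQdim}(C) \;=\; \min\Bigl\{\, d \;:\; \exists\, h_1,\dots,h_d : X \to [-1,1]
    \ \ \forall f \in C \ \ \forall D \ \ \exists i \ \
    \bigl|\mathbf{E}_{x \sim D}[\,f(x)\,h_i(x)\,]\bigr| \ge \tfrac{1}{d} \,\Bigr\}
    \]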

11. Proof. If there exists a non-adaptive SQ algorithm A that uses q queries of tolerance 1/q to learn C with error α < 1/2, then CSQdim(C) ≤ q.

Let φ_1, …, φ_q : X × {±1} → [0,1] be the (non-adaptive) queries of A. Decompose each query as φ(x, y) = (φ(x,1) + φ(x,−1))/2 + ((φ(x,1) − φ(x,−1))/2)·y = g(x) + h(x)·y. Then E_{x∼D}[φ_i(x, f(x))] = E_{x∼D}[g_i(x)] + E_{x∼D}[f(x)·h_i(x)].

If |E_{x∼D}[f(x)·h_i(x)]| ≤ 1/q then E_{x∼D}[φ_i(x, f(x))] ≈ E_{x∼D}[φ_i(x, −f(x))]. If this holds for all i ∈ [q], then the algorithm cannot distinguish between f and −f, so it cannot achieve error < 1/2.
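
Written out, the two steps are the identity (for y ∈ {±1}) and its consequence:

    \[
    \phi(x,y) \;=\; \underbrace{\frac{\phi(x,1)+\phi(x,-1)}{2}}_{g(x)}
      \;+\; \underbrace{\frac{\phi(x,1)-\phi(x,-1)}{2}}_{h(x)} \cdot y,
    \]
    \[
    \mathbf{E}_{x\sim D}[\phi_i(x,f(x))] - \mathbf{E}_{x\sim D}[\phi_i(x,-f(x))]
      \;=\; 2\,\mathbf{E}_{x\sim D}[f(x)\,h_i(x)],
    \]
    so when every $|\mathbf{E}[f\,h_i]| \le 1/q$, an oracle with tolerance $1/q$
    may return identical answers for $f$ and $-f$.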

12. Upper bound. Thm: For any C and distribution D there exists a non-adaptive ε-LDP algorithm that learns C over D with error α < 1/2 and success probability 1 − β using n = poly(MC(C)·log(1/β)/(αε)).

Margin complexity of C over X, MC(C): the smallest M such that there exists an embedding Ψ: X → B_d(1) under which every f ∈ C is linearly separable with margin γ ≥ 1/M.

Thm [Arriaga, Vempala '99; Ben-David, Eiron, Simon '02]: For every f ∈ C, random projection into B_d(1) for d = O(MC(C)²·log(1/β)) ensures that, with prob. 1 − β, a 1 − β fraction of the points is linearly separable with margin γ ≥ 1/(2·MC(C)).
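
A minimal sketch of the projection step (a scaled Gaussian matrix, the standard choice in Arriaga–Vempala-style arguments; the scaling and dimension are assumptions for illustration, not lifted from the talk):

    import numpy as np

    def random_project(points, d, seed=None):
        """Map unit-ball points into R^d via a Gaussian matrix scaled by 1/sqrt(d).

        For d = O(M^2 * log(1/beta)) this preserves inner products, and
        hence margins, up to small additive error for most points w.h.p.
        """
        rng = np.random.default_rng(seed)
        points = np.asarray(points, dtype=float)
        R = rng.standard_normal((points.shape[1], d)) / np.sqrt(d)
        return points @ R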

13. Algorithm. Perceptron: if sign(⟨w_t, x⟩) ≠ y then update w_{t+1} ← w_t + y·x.

Expected update: E_{(x,y)∼P}[y·x | sign(⟨w_t, x⟩) ≠ y] = E_{(x,y)∼P}[y·x · 𝟙(sign(⟨w_t, x⟩) ≠ y)] / Pr_{(x,y)∼P}[sign(⟨w_t, x⟩) ≠ y], where the normalizer is a scalar ≥ α (as long as w_t still has error at least α). The numerator equals E_{(x,y)∼P}[x·(y − sign(⟨w_t, x⟩))/2] = E_{(x,y)∼P}[x·y]/2 − E_{(x,y)∼P}[x·sign(⟨w_t, x⟩)]/2, where the first term is independent of w_t (so it can be queried non-adaptively) and the second is independent of the label.

Estimate the mean vector with ℓ₂ error:
β€’ LDP [Duchi, Jordan, Wainwright '13]
β€’ SQs [F., Guzman, Vempala '15]
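
A sketch of the resulting statistical perceptron: each update is the difference of two mean vectors, each obtained from a noisy estimator. Gaussian noise stands in here for the LDP/SQ mean estimators cited on the slide ([DJW '13], [FGV '15]); it conveys the structure, not the exact private mechanism.

    import numpy as np

    def noisy_mean(vectors, sigma, rng):
        """Stand-in for a private/SQ mean estimator with l2 error ~ sigma."""
        m = np.mean(vectors, axis=0)
        return m + rng.standard_normal(m.shape) * sigma

    def statistical_perceptron(X, y, rounds, sigma=0.1, seed=0):
        """Perceptron driven by estimated expected updates.

        Uses E[x*y]/2 (label-dependent but independent of w_t, so it is
        estimated once up front -- this is what makes the labeled part
        non-adaptive) minus E[x*sign(<w_t, x>)]/2 (adaptive but label-free).
        """
        rng = np.random.default_rng(seed)
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        w = np.zeros(X.shape[1])
        label_term = noisy_mean(X * y[:, None], sigma, rng) / 2.0  # one shot
        for _ in range(rounds):
            preds = np.sign(X @ w) if np.any(w) else np.ones(len(y))
            unlabeled_term = noisy_mean(X * preds[:, None], sigma, rng) / 2.0
            w = w + (label_term - unlabeled_term)
        return w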

14. Conclusions
β€’ New approach to lower bounds for non-interactive LDP
o Reduction to margin-complexity lower bounds
β€’ Lower bounds for classical learning problems
β€’ Same results for communication-constrained protocols
o Also equivalent to SQ
β€’ Interaction is necessary for learning
β€’ Open:
o Distribution-independent learning in poly(MC(C))
o Lower bounds against protocols with 2+ rounds
o Stochastic convex optimization
https://arxiv.org/abs/1809.09165
