Locally private learning without interaction requires separation
Vitaly Feldman. Joint work with Amit Daniely (Hebrew University).
Local Differential Privacy (LDP) [KLNRS '08]
A protocol is ε-LDP if for every user i, each message z_{i,j} is sent using a local ε_{i,j}-DP randomizer, and Σ_j ε_{i,j} ≤ ε.
[Figure: users 1, …, n each apply a local randomizer to their data and send the resulting messages to the server.]
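To make the definition concrete, here is a minimal sketch of a local ε-DP randomizer that a user could apply before sending anything to the server (Python; the clipping bound and the choice of the Laplace mechanism are illustrative assumptions, not taken from the slides).

```python
import numpy as np

def local_randomizer(value, epsilon, bound=1.0, rng=np.random.default_rng()):
    """Local eps-DP randomizer for a single bounded numeric value:
    clip to [-bound, bound] and add Laplace noise calibrated to
    the sensitivity 2*bound of the clipped value."""
    clipped = float(np.clip(value, -bound, bound))
    return clipped + rng.laplace(scale=2.0 * bound / epsilon)

# Each user randomizes locally; the server only ever sees the noisy messages.
messages = [local_randomizer(x, epsilon=1.0) for x in (0.3, -0.7, 0.9)]
server_estimate = float(np.mean(messages))  # unbiased estimate of the mean value
```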
Non-interactive LDP
[Figure: every user sends a single randomized message to the server in one round, with no back-and-forth.]
PAC learning
PAC model [Valiant '84]: Let C be a set of binary classifiers over a domain X. A is a PAC learning algorithm for C if for every f ∈ C and every distribution D over X, given i.i.d. examples (x_i, f(x_i)) with x_i ∼ D, A outputs h such that w.h.p. Pr_{x∼D}[h(x) ≠ f(x)] ≤ α.
Distribution-specific learning: D is fixed and known to A.
Statistical query model [Kearns '93]
An SQ algorithm interacts with an SQ oracle: it asks queries φ_1, …, φ_t and receives answers v_1, …, v_t. Here P is a distribution over Z = X × {±1}; in learning, P is the distribution of (x, f(x)) for x ∼ D. Each query is a function φ_i : Z → [0,1], and the oracle's answer satisfies |v_i − E_{z∼P}[φ_i(z)]| ≤ τ, where τ is the tolerance of the query; τ = 1/n.
[KLNRS '08] Simulation with success probability 1 − δ (for ε ≤ 1):
• ε-LDP with n messages ⇒ n queries with tolerance τ = Ω(εδ/(n log(n/δ)))
• t queries with tolerance τ ⇒ ε-LDP with n = O(t log(t/δ)/(ε²τ²)) samples/messages
Non-interactive if and only if the queries are non-adaptive.
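As an illustration of the second direction (answering a statistical query under LDP), here is a minimal sketch for a single query (Python; the example query φ and the Laplace-noise randomizer are assumptions for illustration): each user perturbs φ(z_i) locally, and the server's average answers the query within tolerance roughly 1/(ε√n) with high probability.

```python
import numpy as np

def answer_sq_query_ldp(phi, samples, epsilon, rng=np.random.default_rng()):
    """Answer one statistical query E_P[phi(z)], phi: Z -> [0,1], from n users.
    Each user applies an independent local eps-DP randomizer (Laplace mechanism)
    to phi of their own sample; the query is fixed in advance (non-adaptive)."""
    noisy = [phi(z) + rng.laplace(scale=1.0 / epsilon) for z in samples]
    return float(np.mean(noisy))  # accurate to ~1/(epsilon*sqrt(n)) w.h.p.

# Example: estimate Pr[y = +1] over labeled examples z = (x, y).
data = [((0.2, 0.5), +1), ((0.1, -0.3), -1), ((0.9, 0.4), +1)]
v = answer_sq_query_ldp(lambda z: (z[1] + 1) / 2, data, epsilon=1.0)
```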
Known results
C is SQ-learnable efficiently (non-adaptively) if and only if it is learnable efficiently with ε-LDP (non-interactively). Examples:
• Yes: halfspaces/linear classifiers [Dunagan, Vempala '04]
• No: parity functions [Kearns '93]
• Yes, non-adaptively: Boolean conjunctions [KLNRS '08]
[KLNRS '08]: There exists C (masked parity) that is
1. SQ/LDP-learnable efficiently over the uniform distribution on {0,1}^d, but
2. requires an exponential number of samples to learn non-interactively by an LDP algorithm.
[KLNRS '08]: Does the separation hold for distribution-independent learning?
Margin Complexity
Margin complexity of C over X, MC(C): the smallest m such that there exists an embedding Ψ: X → B_d(1) under which every f ∈ C is linearly separable with margin γ ≥ 1/m.
[Figure: positive examples {Ψ(x) : f(x) = +1} and negative examples {Ψ(x) : f(x) = −1} separated by a hyperplane with margin γ.]
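For reference, one standard way to write the definition out in full (LaTeX; homogeneous unit-norm separators are assumed, which matches the figure up to constants):

```latex
\mathrm{MC}(C) \;=\; \min\Bigl\{\, m \;:\; \exists\, \Psi : X \to B_d(1)\ \text{such that}\
\forall f \in C\ \exists\, w,\ \|w\|_2 \le 1,\ \forall x \in X:\ f(x)\,\langle w, \Psi(x)\rangle \ge \tfrac{1}{m} \,\Bigr\}
```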
Lower bound
Thm: Let C be a negation-closed set of classifiers. Any non-interactive 1-LDP algorithm that learns C with error α < 1/2 and success probability Ω(1) needs n = Ω(MC(C)^{2/3}).
Corollaries:
• Decision lists over {0,1}^d: n = 2^{Ω(d^{1/3})} [Buhrman, Vereshchagin, de Wolf '07]; (interactively) learnable with n = poly(d/α) [Kearns '93]
• Linear classifiers over {0,1}^d: n = 2^{Ω(d)} [Goldmann, Håstad, Razborov '92; Sherstov '07]; (interactively) learnable with n = poly(d/α) [Dunagan, Vempala '04]
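For instance, the first corollary follows by plugging the margin-complexity bound for decision lists into the theorem (a worked step, assuming the 2^{Ω(d^{1/3})} bound of [Buhrman, Vereshchagin, de Wolf '07]):

```latex
\mathrm{MC}\bigl(\text{decision lists over } \{0,1\}^d\bigr) \;\ge\; 2^{\Omega(d^{1/3})}
\quad\Longrightarrow\quad
n \;=\; \Omega\!\bigl(\mathrm{MC}(C)^{2/3}\bigr) \;=\; 2^{\Omega(d^{1/3})}.
```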
Upper bound
Thm: For any C and distribution D there exists a non-adaptive ε-LDP algorithm that learns C over D with error α and success probability 1 − δ using n = poly(MC(C) · log(1/δ)/(αε)).
Instead of a fixed D, it suffices to have
• access to public unlabeled samples from D, or
• (interactive) LDP access to unlabeled samples from D.
The lower bound holds against this hybrid model.
Lower bound technique
Thm: Let C be a negation-closed set of classifiers. If there exists a non-adaptive SQ algorithm that uses n queries of tolerance 1/n to learn C with error α < 1/2 and success probability Ω(1), then MC(C) = O(n^{3/2}).
Correlation dimension of C, CSQdim(C) [F. '08]: the smallest k for which there exist k functions h_1, …, h_k : X → [−1,1] such that for every f ∈ C and every distribution D there exists i with |E_{x∼D}[f(x)·h_i(x)]| ≥ 1/k.
Thm [F. '08; Kallweit, Simon '11]: MC(C) ≤ CSQdim(C)^{3/2}.
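Putting this slide together with the LDP-to-SQ simulation from the statistical query slide, the chain behind the lower bound looks roughly as follows (constants and log factors omitted; a summary, not a formal statement):

```latex
\text{non-interactive } \varepsilon\text{-LDP with } n \text{ samples}
\;\Longrightarrow\;
\text{non-adaptive SQ with } \sim n \text{ queries of tolerance } \sim 1/n
\;\Longrightarrow\;
\mathrm{CSQdim}(C) \le O(n)
\;\Longrightarrow\;
\mathrm{MC}(C) \le O(n^{3/2})
\;\Longrightarrow\;
n \ge \Omega\!\bigl(\mathrm{MC}(C)^{2/3}\bigr).
```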
Proof
Claim: if there exists a non-adaptive SQ algorithm A that uses n queries of tolerance 1/n to learn C with error α < 1/2, then CSQdim(C) ≤ n.
Let φ_1, …, φ_n : X × {±1} → [0,1] be the (non-adaptive) queries of A. Decompose each query as
φ_i(x, y) = ψ_i(x) + g_i(x)·y, where ψ_i(x) = (φ_i(x,1) + φ_i(x,−1))/2 and g_i(x) = (φ_i(x,1) − φ_i(x,−1))/2,
so that E_{x∼D}[φ_i(x, f(x))] = E_{x∼D}[ψ_i(x)] + E_{x∼D}[g_i(x)·f(x)].
If |E_{x∼D}[g_i(x)·f(x)]| ≤ 1/n, then E_{x∼D}[φ_i(x, f(x))] and E_{x∼D}[φ_i(x, −f(x))] differ by at most 2/n, so a single answer to query i is valid (within tolerance 1/n) for both targets f and −f. If this holds for all i ∈ [n], the algorithm cannot distinguish between f and −f; since C is negation-closed, both are valid targets, and the algorithm cannot achieve error < 1/2.
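The decomposition and the resulting indistinguishability, written out (LaTeX; ψ_i and g_i are the names introduced above):

```latex
\varphi_i(x,y) \;=\; \underbrace{\tfrac{\varphi_i(x,1)+\varphi_i(x,-1)}{2}}_{\psi_i(x)}
\;+\; \underbrace{\tfrac{\varphi_i(x,1)-\varphi_i(x,-1)}{2}}_{g_i(x)}\, y,
\qquad
\mathbb{E}_{x\sim D}\bigl[\varphi_i(x,f(x))\bigr] - \mathbb{E}_{x\sim D}\bigl[\varphi_i(x,-f(x))\bigr]
\;=\; 2\,\mathbb{E}_{x\sim D}\bigl[g_i(x)\, f(x)\bigr].
```

So whenever |E_{x∼D}[g_i(x)·f(x)]| ≤ 1/n for every i, the midpoint of the two expectations is a valid answer to every query for both f and −f.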
Upper bound
Thm: For any C and distribution D there exists a non-adaptive ε-LDP algorithm that learns C over D with error α and success probability 1 − δ using n = poly(MC(C) · log(1/δ)/(αε)).
Recall: the margin complexity of C over X, MC(C), is the smallest m such that there exists an embedding Ψ: X → B_d(1) under which every f ∈ C is linearly separable with margin γ ≥ 1/m.
Thm [Arriaga, Vempala '99; Ben-David, Eiron, Simon '02]: For every f ∈ C, a random projection into B_k(1) with k = O(MC(C)² log(1/δ)) ensures that with probability 1 − δ, a 1 − δ fraction of the points is linearly separable with margin γ ≥ 1/(2·MC(C)).
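A minimal sketch of the projection step (Python; the input points are assumed to already be embedded by Ψ into the unit ball, and the separator used in the margin check is a placeholder, since the slide only asserts what the random projection preserves).

```python
import numpy as np

def random_project(points, k, rng=np.random.default_rng()):
    """Project points of the unit ball in R^D to R^k with a Gaussian matrix,
    scaled so that norms and margins are approximately preserved."""
    D = points.shape[1]
    G = rng.normal(size=(k, D)) / np.sqrt(k)
    return points @ G.T

def min_margin(points, labels, w):
    """Smallest signed margin of labeled points w.r.t. a unit-norm separator w."""
    w = w / np.linalg.norm(w)
    return float(np.min(labels * (points @ w)))

# With k = O(MC(C)^2 * log(1/delta)), most projected points keep margin >= 1/(2*MC(C)).
```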
Algorithm
Perceptron: if sign(⟨w_t, x⟩) ≠ y then update w_{t+1} ← w_t + yx.
Expected update: E_{(x,y)∼P}[yx | sign(⟨w_t, x⟩) ≠ y] · Pr_{(x,y)∼P}[sign(⟨w_t, x⟩) ≠ y], and the scalar Pr_{(x,y)∼P}[sign(⟨w_t, x⟩) ≠ y] ≥ α as long as the error exceeds α.
E_{(x,y)∼P}[yx · 1(sign(⟨w_t, x⟩) ≠ y)] = E_{(x,y)∼P}[x·(y − sign(⟨w_t, x⟩))]/2 = (E_{(x,y)∼P}[yx] − E_{(x,y)∼P}[x·sign(⟨w_t, x⟩)])/2.
The first term does not depend on w_t, so it can be estimated non-adaptively; the second term is independent of the label, so it can be computed from unlabeled data.
Estimate the mean vector with ℓ_2 error:
• LDP [Duchi, Jordan, Wainwright '13]
• SQs [F., Guzman, Vempala '15]
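A minimal sketch of the resulting non-adaptive learner (Python). The noisy one-shot estimate of E[yx] stands in for the LDP/SQ mean estimators cited above, the unlabeled samples play the role of the known distribution D, and the step count and noise scale are illustrative assumptions.

```python
import numpy as np

def nonadaptive_perceptron(labeled_X, labeled_y, unlabeled_X, steps=100,
                           noise=0.01, rng=np.random.default_rng()):
    """Perceptron driven by expected updates.
    The label-dependent term E[y*x] is estimated once, up front (non-adaptively;
    Gaussian noise stands in for an LDP/SQ mean estimator). The term
    E[x*sign(<w,x>)] uses only unlabeled data, so labels are never queried
    adaptively."""
    d = labeled_X.shape[1]
    m_hat = (labeled_y[:, None] * labeled_X).mean(axis=0) + rng.normal(scale=noise, size=d)
    w = np.zeros(d)
    for _ in range(steps):
        s = np.sign(unlabeled_X @ w)
        s[s == 0] = 1.0                      # break ties as +1
        label_free = (s[:, None] * unlabeled_X).mean(axis=0)
        # Expected update: E[y*x*1(mistake)] = (E[y*x] - E[x*sign(<w,x>)]) / 2
        w = w + (m_hat - label_free) / 2.0
    return w
```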
Conclusions
• New approach to lower bounds for non-interactive LDP
  o Reduction to margin-complexity lower bounds
• Lower bounds for classical learning problems
• Same results for communication-constrained protocols
  o Also equivalent to SQ
• Interaction is necessary for learning
• Open problems:
  o Distribution-independent learning in poly(MC(C))
  o Lower bounds against 2+ round protocols
  o Stochastic convex optimization
https://arxiv.org/abs/1809.09165