Learning from Untrusted Data
Moses Charikar, Jacob Steinhardt, Gregory Valiant
Symposium on the Theory of Computing, June 19, 2017
Motivation: data poisoning attacks
Question: what concepts can be learned in the presence of arbitrarily corrupted data?
(Icon credit: Annie Lin)
Related Work
60 years of work on robust statistics...
• PCA: XCM '10, CLMW '11, CSPW '11
• Mean estimation: LRV '16, DKKLMS '16, DKKLMS '17, L '17, DBS '17, SCV '17
• Regression: NTN '11, NT '13, CCM '13, BJK '15
• Classification: FHKP '09, GR '09, KLS '09, ABL '14
• Semi-random graphs: FK '01, C '07, MMV '12, S '17
• Other: HM '13, C '14, C '16, DKS '16, SCV '16
Problem Setting
Observe n points x_1, ..., x_n:
• an unknown subset of αn points is drawn i.i.d. from p*
• the remaining (1 − α)n points are arbitrary
Goal: estimate a parameter of interest θ(p*)
• assuming p* ∈ P (e.g. bounded moments)
• θ(p*) could be the mean, best-fit line, ranking, etc.
New regime: α ≪ 1
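As a concrete illustration of this setting (my own sketch, not from the talk; all parameter values are arbitrary demo choices), the snippet below generates data from the model and shows that the naive empirical mean is destroyed by the (1 − α)n arbitrary points, while the mean of the good points alone would be accurate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha = 1000, 5, 0.3
mu = np.ones(d)                               # true mean of p*

n_good = int(alpha * n)
good = rng.normal(mu, 1.0, size=(n_good, d))  # alpha*n points i.i.d. from p*
bad = np.full((n - n_good, d), 50.0)          # (1-alpha)*n adversarial points

x = np.vstack([good, bad])
naive_mean = x.mean(axis=0)

err_naive = np.linalg.norm(naive_mean - mu)        # ruined by the outliers
err_oracle = np.linalg.norm(good.mean(axis=0) - mu)  # small, but needs the unknown subset
```

The point of the talk is to approach the oracle's accuracy without knowing which points are good.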
Why Is This Possible?
If e.g. α = 1/3, estimation seems impossible: but we can narrow down to 3 possibilities!
List-decodable learning [Balcan, Blum, Vempala '08]
• output O(1/α) answers, one of which is approximately correct
Semi-verified learning
• observe O(1) verified points from p*
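A toy sketch of the list-decodable model (my own illustration, not the paper's algorithm): with α = 1/3 the data can look like three equally plausible clusters, but a simple greedy "densest ball" procedure still outputs O(1/α) candidates, one of which is close to the true mean. The radius r = 5 and all data parameters below are arbitrary demo choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
n_each = 200                        # alpha = 1/3: three equal-size groups
mu = np.zeros(d)                    # mean of p*, unknown to the algorithm
x = np.vstack([rng.normal(c, 1.0, size=(n_each, d))
               for c in (mu, 20 * np.ones(d), -20 * np.ones(d))])

def list_decode(pts, k, r=5.0):
    """Greedy list decoding: output the mean of the densest radius-r ball,
    remove those points, and repeat k times."""
    pts = pts.copy()
    cands = []
    for _ in range(k):
        if len(pts) == 0:
            break
        dist = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
        counts = (dist < r).sum(axis=1)
        i = counts.argmax()                 # center of the densest ball
        cands.append(pts[dist[i] < r].mean(axis=0))
        pts = pts[dist[i] >= r]
    return cands

cands = list_decode(x, k=3)
best = min(np.linalg.norm(c - mu) for c in cands)  # one candidate is near mu
```

Only 3 candidates are produced, and the adversary cannot prevent one of them from landing near µ.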
Why Care?
Practical problem: data poisoning attacks
• How can we build learning algorithms that are provably secure against manipulation?
Fundamental problem in robust statistics
• What can be learned in the presence of arbitrary outliers?
Agnostic learning of mixtures
• When is it possible to learn about one mixture component, with no assumptions about the other components?
Main Theorem
Observed functions: f_1, ..., f_n
Want to minimize an unknown target function f̄
Key quantity: spectral norm bound on a subset I:

  max_{w ∈ R^d} (1/√|I|) ‖[∇f_i(w) − ∇f̄(w)]_{i ∈ I}‖_op ≤ S.

Meta-Theorem. Given a spectral norm bound on an unknown subset of αn functions, learning is possible:
• in the semi-verified model (for convex f_i)
• in the list-decodable model (for strongly convex f_i)
All results are direct corollaries of the meta-theorem!
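To make the key quantity concrete, here is a small numerical sketch (my own, with arbitrary parameters) for the quadratic losses f_i(w) = ½‖w − x_i‖². There ∇f_i(w) − ∇f̄(w) = µ − x_i does not depend on w, so S reduces to the operator norm of the centered inlier matrix scaled by 1/√|I|:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
mu = rng.normal(size=d)
x = rng.normal(mu, 1.0, size=(n, d))   # the good subset I, i.i.d. from p*

# For f_i(w) = 0.5*||w - x_i||^2:
#   grad f_i(w) = w - x_i,  grad fbar(w) = w - mu,
# so the deviations are mu - x_i, independent of w, and the max over w
# is just the operator norm of the centered data matrix over sqrt(|I|).
dev = x - mu                            # rows: gradient deviations (up to sign)
S = np.linalg.norm(dev, ord=2) / np.sqrt(n)   # ord=2 = top singular value
```

For isotropic data with unit covariance, S concentrates near 1 (up to a (1 + √(d/n)) edge factor), which is the sense in which the spectral norm bound captures "well-behaved" inliers.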
Corollary: Mean Estimation
Setting: distribution p* on R^d with mean µ and bounded 1st moments:

  E_{p*}[ |⟨x − µ, v⟩| ] ≤ σ‖v‖_2 for all v ∈ R^d.

Observe αn samples from p* and (1 − α)n arbitrary points; want to estimate µ.

Theorem (Mean Estimation). If αn ≥ d, it is possible to output estimates µ̂_1, ..., µ̂_m of the mean µ such that
• m ≤ 2/α, and
• min_{j=1,...,m} ‖µ̂_j − µ‖_2 = Õ(σ/√α) w.h.p.
Alternately, it is possible to output a single estimate µ̂ given a single verified point from p*.
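The semi-verified claim can be illustrated with a toy sketch (mine; the candidate values are made up for the demo): if a list-decodable learner outputs a few candidate means, a single verified draw from p* suffices to pick the right one, since it lands far closer to the true mean than to the adversarial candidates:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
mu = np.zeros(d)                        # true mean, unknown to the learner
# Hypothetical list-decodable output: one near-correct candidate plus
# two adversarial ones (all values here are invented for the demo).
candidates = [mu + 0.1 * rng.normal(size=d),
              10.0 * np.ones(d),
              -10.0 * np.ones(d)]

verified = rng.normal(mu, 1.0, size=d)  # one verified sample from p*
mu_hat = min(candidates, key=lambda c: np.linalg.norm(c - verified))
err = np.linalg.norm(mu_hat - mu)       # small: the good candidate wins
```

A single verified point has noise ~σ√d around µ, which is still far smaller than its distance to any candidate the adversary planted far away.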
Comparisons
Mean estimation:

             | Bound     | Regime     | Assumption    | Samples
  LRV '16    | σ√(1−α)   | α > 1 − c  | 4th moments   | d
  DKKLMS '16 | σ(1−α)    | α > 1 − c  | sub-Gaussian  | d³
  CSV '17    | σ/√α      | α > 0      | 1st moments   | d

Estimating mixtures:

          | Separation   | Robust?
  AM '05  | σ(k + 1/√α)  | no
  KK '10  | σk           | no
  AS '12  | σ√k          | no
  CSV '17 | σ/√α         | yes
Other Results
Stochastic Block Model (sparse regime; cf. GV '14, LLV '15, RT '15, RV '16):

          | Average Degree | Robust?
  GV '14  | 1/α⁴           | no
  AS '15  | 1/α²           | no
  CSV '17 | 1/α³           | yes

Others:
• discrete product distributions
• exponential families
• ranking
Proof Overview (Mean Estimation)
Recall the goal: given n points x_1, ..., x_n, with αn drawn from p*, estimate the mean µ of p*.
Key tension: balance adversarial and statistical error.
High-level strategy: solve a convex optimization problem
• if the cost is low, estimation succeeds (spectral norm bound)
• if the cost is high, identify and remove outliers
Algorithm
First pass: minimize_µ Σ_{i=1}^n ‖x_i − µ‖_2
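One standard way to minimize Σ_i ‖x_i − µ‖_2 (the geometric median) is Weiszfeld's iteration; the sketch below is a generic implementation with arbitrary demo data, not the paper's full procedure, which goes on to identify and remove outliers when the cost is high:

```python
import numpy as np

def geometric_median(x, iters=100, eps=1e-8):
    """Weiszfeld iterations for argmin_mu sum_i ||x_i - mu||_2."""
    mu = x.mean(axis=0)                 # initialize at the naive mean
    for _ in range(iters):
        # inverse-distance weights; eps guards division by zero
        w = 1.0 / np.maximum(np.linalg.norm(x - mu, axis=1), eps)
        mu = (w[:, None] * x).sum(axis=0) / w.sum()
    return mu

rng = np.random.default_rng(0)
good = rng.normal(0.0, 1.0, size=(700, 3))   # inliers centered at 0
bad = np.full((300, 3), 100.0)               # 30% gross outliers
x = np.vstack([good, bad])

med = geometric_median(x)
# The geometric median shifts only slightly under 30% corruption, while
# the naive mean is dragged far toward the outliers.
```

This first pass is only a starting point: its error still degrades with the corruption level, which is why the full algorithm alternates between estimation and outlier removal.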