Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers


  1. Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers. Jacob Steinhardt, Moses Charikar, Gregory Valiant. ITCS 2018, January 14, 2018.

  2. Motivation: Robust Learning. Question: What concepts can be learned robustly, even if some of the data is arbitrarily corrupted?

  3. Example: Mean Estimation. Problem: Given data x_1, ..., x_n ∈ R^d, of which (1 − ε)n come from a distribution p* (and the remaining εn are arbitrary outliers), estimate the mean µ of p*. Issue: high dimensions.

  4. Mean Estimation: Gaussian Example. Suppose the clean data is Gaussian: x_i ∼ N(µ, I), i.e. mean µ and variance 1 in each coordinate, so ‖x_i − µ‖_2 ≈ √(1² + ··· + 1²) = √d. Outliers placed at this typical distance √d from µ look unremarkable in every coordinate, yet an ε-fraction of them can shift the empirical mean by ≈ ε√d. Cannot filter points individually, even if we know the true density!
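To make the issue concrete, here is a small simulation (mine, not from the talk; all parameter choices are illustrative): an ε-fraction of outliers placed at the typical distance √d from µ looks ordinary in any single coordinate, yet drags the naive sample mean off by roughly ε√d.

```python
import numpy as np

# Illustrative simulation (not from the talk): outliers at distance ~sqrt(d)
# from the true mean -- the same distance as typical clean points -- shift the
# naive sample mean by roughly eps * sqrt(d).
rng = np.random.default_rng(0)
n, eps = 10_000, 0.05

for d in (10, 100, 1000):
    mu = np.zeros(d)
    clean = rng.normal(size=(round((1 - eps) * n), d))    # x_i ~ N(mu, I) with mu = 0
    # Each outlier adds 1 to every coordinate: distance sqrt(d) from mu, but
    # perfectly ordinary along any single coordinate.
    outliers = np.ones((round(eps * n), d))
    data = np.vstack([clean, outliers])
    err = np.linalg.norm(data.mean(axis=0) - mu)
    print(f"d = {d:4d}   naive-mean error = {err:5.2f}   eps * sqrt(d) = {eps * np.sqrt(d):5.2f}")
```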

  5. History. Progress in high dimensions came only recently: • Tukey median [1975]: robust but NP-hard • Donoho estimator [1982]: high error • [DKKLMS16, LRV16]: first dimension-independent error bounds • large body of work since then [CSV17, DKKLMS17, L17, DBS17] • many other problems, including PCA [XCM10], regression [NTN11], classification [FHKP09], etc.

  6. This Talk. Question: What general and simple properties enable robust estimation? New information-theoretic criterion: resilience.

  7. Resilience. Suppose {x_i}_{i ∈ S} is a set of points in R^d. Definition (Resilience): A set S is (σ, ε)-resilient in a norm ‖·‖ around a point µ if for all subsets T ⊆ S of size at least (1 − ε)|S|, ‖ (1/|T|) Σ_{i ∈ T} (x_i − µ) ‖ ≤ σ. Intuition: all large subsets have similar means.
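One way to get hands-on with the definition (a heuristic sketch of mine, not from the talk): for the ℓ₂-norm, the worst subset T along a fixed direction v keeps the (1 − ε)|S| points with the largest projection onto v, so sampling directions yields a lower bound on the resilience parameter σ.

```python
import numpy as np

def resilience_lower_bound(points: np.ndarray, eps: float,
                           n_dirs: int = 200, seed: int = 0) -> float:
    """Heuristic lower bound on the l2 resilience sigma of a point set around
    its own mean: for each sampled unit direction v, keep the (1 - eps)|S|
    points with the largest projection onto v (the worst-case subset for that
    direction) and record how far the subset mean moves from the overall mean."""
    rng = np.random.default_rng(seed)
    n, d = points.shape
    mu = points.mean(axis=0)
    keep = n - int(np.floor(eps * n))        # smallest allowed subset size
    worst = 0.0
    for _ in range(n_dirs):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        idx = np.argsort(points @ v)[-keep:]  # worst-case T for direction v
        worst = max(worst, np.linalg.norm(points[idx].mean(axis=0) - mu))
    return worst

# Example: i.i.d. N(0, I) samples are resilient with a small sigma for small eps.
x = np.random.default_rng(1).normal(size=(2000, 20))
print(resilience_lower_bound(x, eps=0.1))
```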

  8. Main Result. Let S ⊆ R^d be a set of (1 − ε)n "good" points, and let S_out be a set of εn arbitrary outliers. We observe S̃ = S ∪ S_out. Theorem: If S is (σ, ε/(1 − ε))-resilient around µ, then it is possible to output µ̂ such that ‖µ̂ − µ‖ ≤ 2σ. In fact, outputting the center of any resilient subset of S̃ will work!
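The guarantee can be checked exhaustively in a tiny one-dimensional example. The script below is a toy verification of mine (not the paper's algorithm): among all candidate subsets of size (1 − ε)n, any one that is resilient with parameter σ′ has its mean within σ + σ′ ≤ 2·max(σ, σ′) of the true mean, so returning the center of any resilient subset succeeds.

```python
import itertools
import numpy as np

def resilience_1d(points: np.ndarray, frac: float) -> float:
    """Exact resilience of a 1-D point set around its own mean: the largest
    shift of the mean achievable by deleting at most a `frac` fraction of the
    points (in 1-D only deletions of extreme points need to be considered)."""
    pts = np.sort(points)
    n, mu = len(pts), pts.mean()
    k_max = int(np.floor(frac * n))
    worst = 0.0
    for k in range(k_max + 1):
        for lo in range(k + 1):               # delete `lo` smallest, `k - lo` largest
            kept = pts[lo: n - (k - lo)]
            worst = max(worst, abs(kept.mean() - mu))
    return worst

rng = np.random.default_rng(0)
n, eps = 10, 0.2
good = rng.normal(0.0, 1.0, size=round((1 - eps) * n))   # clean points
data = np.concatenate([good, [50.0, 60.0]])               # plus eps * n arbitrary outliers

frac = eps / (1 - eps)
sigma_good = resilience_1d(good, frac)
m = round((1 - eps) * n)
checked = 0
for idx in itertools.combinations(range(n), m):
    subset = data[list(idx)]
    sigma_sub = resilience_1d(subset, frac)
    # Pigeonhole + triangle inequality: the two means differ by at most 2 * sigma.
    assert abs(subset.mean() - good.mean()) <= 2 * max(sigma_good, sigma_sub) + 1e-9
    checked += 1
print(f"verified the 2*sigma bound for all {checked} candidate subsets of size {m}")
```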

  9. Pigeonhole Argument. Claim: If S and S′ are (σ, ε/(1 − ε))-resilient around µ and µ′ respectively, and each has size (1 − ε)n, then ‖µ − µ′‖ ≤ 2σ. Proof: • Let µ_{S ∩ S′} be the mean of S ∩ S′. • By pigeonhole, |S ∩ S′| ≥ (1 − ε/(1 − ε)) |S′|. • Then ‖µ′ − µ_{S ∩ S′}‖ ≤ σ by resilience. • Similarly, ‖µ − µ_{S ∩ S′}‖ ≤ σ. • The result follows by the triangle inequality.
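For completeness, the counting behind the pigeonhole bullet can be written out (a short calculation consistent with the sizes in the claim; not verbatim from the slides):

```latex
% |S| = |S'| = (1 - \epsilon) n and both are subsets of \tilde{S}, which has n points, so
\[
  |S \cap S'| \;\ge\; |S| + |S'| - |\tilde{S}|
              \;=\; (1 - 2\epsilon) n
              \;=\; \Bigl(1 - \tfrac{\epsilon}{1 - \epsilon}\Bigr)(1 - \epsilon) n
              \;=\; \Bigl(1 - \tfrac{\epsilon}{1 - \epsilon}\Bigr)\,|S'| .
\]
% Hence T = S \cap S' is large enough for (\sigma, \epsilon/(1-\epsilon))-resilience of
% both S and S' to apply, giving \|\mu - \mu_{S \cap S'}\| \le \sigma and
% \|\mu' - \mu_{S \cap S'}\| \le \sigma, and therefore \|\mu - \mu'\| \le 2\sigma.
```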

  10. Implication: Mean Estimation. Lemma: If a dataset has bounded covariance, it is (O(√ε), ε)-resilient in the ℓ₂-norm. Proof sketch: if εn points were at distance ≫ 1/√ε from the mean, they would make the variance ≫ 1; therefore deleting εn points changes the mean by at most ≈ ε · (1/√ε) = √ε. Corollary: If the clean data has bounded covariance, its mean can be estimated to ℓ₂-error O(√ε) in the presence of εn outliers. Corollary: More generally, if the clean data has bounded k-th moments, its mean can be estimated to ℓ₂-error O(ε^{1 − 1/k}) in the presence of εn outliers.
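A slightly more explicit version of this proof sketch, under the normalization the sketch implicitly uses (empirical covariance bounded by the identity, with µ the mean of the clean set): by Cauchy–Schwarz, removing any εn points can only move the mean by O(√ε).

```latex
% Assume \frac{1}{n} \sum_{i \in S} \langle v, x_i - \mu \rangle^2 \le 1 for every unit
% vector v (covariance \preceq I), where \mu is the mean of S and |S| = n.
% Let B \subseteq S with |B| \le \epsilon n and T = S \setminus B. Then for every unit v,
\[
  \Bigl| \sum_{i \in B} \langle v, x_i - \mu \rangle \Bigr|
    \;\le\; \sqrt{|B| \cdot \sum_{i \in S} \langle v, x_i - \mu \rangle^2}
    \;\le\; \sqrt{\epsilon n \cdot n} \;=\; n \sqrt{\epsilon},
\]
% and since \sum_{i \in S} (x_i - \mu) = 0,
\[
  \Bigl\| \frac{1}{|T|} \sum_{i \in T} (x_i - \mu) \Bigr\|_2
    \;=\; \frac{1}{|T|} \Bigl\| \sum_{i \in B} (x_i - \mu) \Bigr\|_2
    \;\le\; \frac{n \sqrt{\epsilon}}{(1 - \epsilon) n}
    \;=\; O(\sqrt{\epsilon}).
\]
```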

  11. Implication: Learning Discrete Distributions. Suppose we observe samples from a distribution π on {1, ..., m}. Samples come in r-tuples, which are either all good or all outliers. Corollary: The distribution π can be estimated (in TV distance) to error O(√(ε log(1/ε) / r)) in the presence of εn outliers. • Follows from resilience in the ℓ₁-norm. • See also [Qiao & Valiant, 2018] later in this session!

  12. A Majority of Outliers. We can also handle the case where the clean set S has size only αn (α < 1/2): • Cover S̃ by resilient sets. • At least one such set S′ must have high overlap with S... • ...and hence ‖µ′ − µ‖ ≤ 2σ as before. • This yields recovery in the list-decodable model [BBV08].

  13. Implication: Stochastic Block Models. There is a set of αn good vertices and (1 − α)n bad vertices: • good ↔ good edges: dense (avg. degree a) • good ↔ bad edges: sparse (avg. degree b) • bad ↔ bad edges: arbitrary. Question: when can the good set be recovered (in terms of α, a, b)?

  14. Implication: Stochastic Block Models (continued). Using resilience in a "truncated ℓ₁-norm", one can show: Corollary: The set of good vertices can be approximately recovered whenever (a − b)² / a ≫ log(2/α) / α².
