
Robustness Meets Algorithms. Ankur Moitra (MIT). Robust Statistics Summer School.

CLASSIC PARAMETER ESTIMATION
Given samples from an unknown distribution in some class, e.g. a 1-D Gaussian, can we accurately estimate its parameters?
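As a minimal illustration of this classic (uncorrupted) setting, here is a sketch in Python with NumPy; the true parameters below are arbitrary illustrative values:

```python
import numpy as np

# Draw samples from a 1-D Gaussian with parameters that are hidden from
# the estimator, then estimate them from the samples alone.
rng = np.random.default_rng(0)
true_mu, true_sigma = 2.0, 3.0            # arbitrary illustrative values
samples = rng.normal(true_mu, true_sigma, size=10_000)

mu_hat = samples.mean()                    # empirical mean
sigma_hat = samples.std(ddof=1)            # empirical standard deviation
print(mu_hat, sigma_hat)                   # both close to (2.0, 3.0)
```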


Simultaneously, [Lai, Rao, Vempala '16] gave agnostic algorithms with comparable guarantees; when the covariance is bounded, this translates to a corresponding bound on the estimation error. Subsequently, many works have handled more errors via list decoding, given lower bounds against statistical query algorithms, weakened the distributional assumptions, exploited sparsity, and worked with more complex generative models.

A GENERAL RECIPE
Robust estimation in high dimensions:
• Step #1: Find an appropriate parameter distance
• Step #2: Detect when the naïve estimator has been compromised
• Step #3: Find good parameters, or make progress
Filtering: fast and practical. Convex programming: better sample complexity.
Let's see how this works for unknown mean…

OUTLINE
Part I: Introduction
• Robust Estimation in One Dimension
• Robustness vs. Hardness in High Dimensions
• Our Results
Part II: Agnostically Learning a Gaussian
• Parameter Distance
• Detecting When an Estimator is Compromised
• A Win-Win Algorithm
• Unknown Covariance
Part III: Experiments

PARAMETER DISTANCE
Step #1: Find an appropriate parameter distance for Gaussians.
A Basic Fact: (1) the total variation distance between two Gaussians with identity covariance is at most half the Euclidean distance between their means. This can be proven using Pinsker's inequality and the well-known formula for the KL divergence between Gaussians.
Corollary: If our estimate (in the unknown-mean case) is close to the true mean in Euclidean distance, then the corresponding Gaussians are close in total variation distance.
Our new goal is to be close in Euclidean distance.
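For completeness, here is the standard calculation behind fact (1), written out for the identity-covariance setting of the unknown-mean case:

```latex
% KL divergence between Gaussians with identity covariance, then Pinsker.
\[
\mathrm{KL}\bigl(\mathcal{N}(\mu_1, I)\,\big\|\,\mathcal{N}(\mu_2, I)\bigr)
  = \tfrac{1}{2}\,\lVert \mu_1 - \mu_2 \rVert_2^2,
\qquad
d_{\mathrm{TV}}\bigl(\mathcal{N}(\mu_1, I),\,\mathcal{N}(\mu_2, I)\bigr)
  \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}}
  = \tfrac{1}{2}\,\lVert \mu_1 - \mu_2 \rVert_2 .
\]
```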


DETECTING CORRUPTIONS
Step #2: Detect when the naïve estimator has been compromised.
[Figure: scatter plot of uncorrupted and corrupted points; the corruptions create a direction of large (> 1) variance.]
Key Lemma: If X_1, X_2, …, X_N come from a distribution that is ε-close to N(μ, I), and N is sufficiently large, then with probability at least 1 - δ two bounds hold: (1) a bound on how far the empirical mean can drift from μ, and (2) a matching bound on the empirical covariance.
Take-away: An adversary needs to mess up the second moment in order to corrupt the first moment.
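A minimal sketch of this detection step in Python with NumPy; the function name is illustrative, and the exact threshold against which the top eigenvalue should be compared (supplied by the lemma) is not reproduced here:

```python
import numpy as np

def top_variance_direction(X):
    """Empirical mean plus the largest eigenvalue/eigenvector of the
    empirical covariance of the samples X (rows are d-dimensional points).

    If the samples are eps-close to N(mu, I), a top eigenvalue much larger
    than 1 is evidence that the corruptions have shifted the second moment,
    and hence may have compromised the empirical mean."""
    mu_hat = X.mean(axis=0)
    centered = X - mu_hat
    cov_hat = centered.T @ centered / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov_hat)
    return mu_hat, eigvals[-1], eigvecs[:, -1]
```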


A WIN-WIN ALGORITHM
Step #3: Either find good parameters, or remove many outliers.
Filtering approach: Suppose the empirical covariance has a direction of variance noticeably larger than 1. Then we can throw out more corrupted than uncorrupted points: project onto v, the direction of largest variance, and remove the points whose projections exceed a threshold T (which has an explicit formula).
If we continued like this for too long, we would have no corrupted points left! So eventually we find (certifiably) good parameters.
Running time and sample complexity: the analysis relies on concentration of linear threshold functions (LTFs).
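A minimal sketch of the filtering loop for the unknown-mean case, in Python with NumPy. The stopping test and the fraction of points removed per round are illustrative placeholders; the talk only says that the threshold T has an explicit formula:

```python
import numpy as np

def filter_mean(X, eps, max_iter=100):
    """Filtering sketch: either certify the empirical mean, or remove a
    tail of points along the direction of largest variance and repeat."""
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        mu_hat = X.mean(axis=0)
        centered = X - mu_hat
        cov_hat = centered.T @ centered / len(X)
        eigvals, eigvecs = np.linalg.eigh(cov_hat)
        top_val, v = eigvals[-1], eigvecs[:, -1]
        # If no direction has variance noticeably above 1, the empirical
        # mean is (certifiably) good; this constant is illustrative only.
        if top_val <= 1 + 10 * eps * np.log(1 / eps):
            return mu_hat
        # Otherwise, project onto v and throw out the points with the
        # largest projections; more corrupted than uncorrupted points
        # are removed. The cutoff below is an illustrative choice.
        scores = np.abs(centered @ v)
        cutoff = np.quantile(scores, 1 - eps / 2)
        X = X[scores <= cutoff]
    return X.mean(axis=0)
```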


A GENERAL RECIPE
The same recipe applies: find an appropriate parameter distance, detect when the naïve estimator has been compromised, and either find good parameters or make progress. How about for unknown covariance?

PARAMETER DISTANCE
Step #1: Find an appropriate parameter distance for Gaussians.
Another Basic Fact: (2) the total variation distance between two Gaussians with a common mean is bounded by a (relative Frobenius) distance between their covariances. Again, this is proven using Pinsker's inequality.
Our new goal is to find an estimate that is close to the true covariance in this distance. The distance seems strange, but it is the right one to use to bound total variation.
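One natural way to make fact (2) concrete, consistent with the Pinsker-based argument above (this is a standard form and may differ in constants from the slide):

```latex
\[
d_{\mathrm{TV}}\bigl(\mathcal{N}(0,\Sigma_1),\,\mathcal{N}(0,\Sigma_2)\bigr)
  \;\le\; O\!\left(\bigl\lVert \Sigma_1^{-1/2}\,\Sigma_2\,\Sigma_1^{-1/2} - I \bigr\rVert_F\right),
\]
% so the goal becomes an estimate \widehat{\Sigma} for which
% \lVert \Sigma^{-1/2}\,\widehat{\Sigma}\,\Sigma^{-1/2} - I \rVert_F is small.
```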

UNKNOWN COVARIANCE
What if we are given samples from a Gaussian with unknown covariance? How do we detect if the naïve estimator is compromised?
Key Fact: After an appropriate transformation of the samples, the second-moment matrix of the flattened outer products has a known form when restricted to flattenings of d x d symmetric matrices. The proof uses Isserlis's theorem. (One direction still needs to be projected out.)
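As a concrete instance of the kind of fourth-moment computation Isserlis's theorem enables (stated here for the whitened case z ~ N(0, I_d) and a symmetric matrix A; a standard fact rather than the exact statement from the slide):

```latex
% Isserlis's theorem for z ~ N(0, I_d):
%   E[z_i z_j z_k z_l] = \delta_{ij}\delta_{kl} + \delta_{ik}\delta_{jl}
%                        + \delta_{il}\delta_{jk}.
% For a symmetric matrix A this gives
\[
\mathbb{E}\bigl[\langle A,\, z z^{\top}\rangle^{2}\bigr]
  \;=\; (\operatorname{tr} A)^{2} \;+\; 2\,\lVert A\rVert_F^{2}.
\]
```

So, restricted to flattenings of symmetric matrices, the second-moment matrix of the flattened outer products is explicit; the (tr A)^2 term singles out the flattened identity as a special direction, consistent with the note that one direction needs to be projected out.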

Key Idea: Transform the data and look for restricted large eigenvalues. If the estimate were the true covariance, the transformed inliers would behave like samples from a Gaussian with identity covariance.
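A minimal sketch of that idea in Python with NumPy. It whitens with the empirical covariance, flattens the outer products, projects out the identity direction, and reports the largest restricted eigenvalue; the acceptance threshold implied by Isserlis's theorem is not reproduced here, and the function name is illustrative:

```python
import numpy as np

def restricted_top_eigenvalue(X):
    """Whiten the samples with the empirical covariance, then look for a
    large eigenvalue among second moments of the flattened outer products,
    with the flattened-identity direction projected out."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    cov_hat = X.T @ X / n
    # Symmetric inverse square root of the empirical covariance.
    eigvals, eigvecs = np.linalg.eigh(cov_hat)
    whitener = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    Z = X @ whitener   # if cov_hat were correct, inliers would look like N(0, I)
    # Flatten z z^T - I for each sample, then project out the identity direction.
    eye_flat = np.eye(d).ravel()
    flats = np.array([np.outer(z, z).ravel() - eye_flat for z in Z])
    id_dir = eye_flat / np.linalg.norm(eye_flat)
    flats -= np.outer(flats @ id_dir, id_dir)
    second_moment = flats.T @ flats / n
    return np.linalg.eigvalsh(second_moment)[-1]
```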
