When does the Tukey Median work?

Banghua Zhu with Jiantao Jiao and Jacob Steinhardt. Department of EECS and Statistics, University of California, Berkeley. ISIT 2020, June 21, 2020.


  1. When does the Tukey Median work? Banghua Zhu with Jiantao Jiao and Jacob Steinhardt. Department of EECS and Statistics, University of California, Berkeley. ISIT 2020, June 21, 2020.

  2. Robust mean estimation - mean and median in 1d. Mean estimation in the presence of additive corruption (outliers) (Huber, 1973).
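
The contrast between the mean and the median in 1d can be illustrated with a small simulation. This is only a sketch under assumptions not taken from the slides: the clean data are standard Gaussian, the outliers all sit at a single value, and the helper names `corrupt_additive` and `mean_and_median` are hypothetical.

```python
import random
import statistics

def corrupt_additive(clean, eps, outlier_value):
    """Append outliers so that they form (about) an eps fraction of the result."""
    n_out = round(eps * len(clean) / (1.0 - eps))
    return clean + [outlier_value] * n_out

def mean_and_median(clean, eps, outlier_value):
    """Return (mean, median) of the additively corrupted sample."""
    sample = corrupt_additive(clean, eps, outlier_value)
    return statistics.mean(sample), statistics.median(sample)

rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # true mean is 0

# Moving the outliers further away drags the mean with them,
# while the median only shifts to a nearby quantile of the clean data.
for outlier in (10.0, 1000.0):
    m, med = mean_and_median(clean, eps=0.1, outlier_value=outlier)
    print(f"outlier={outlier:>7}: mean={m:.3f}, median={med:.3f}")
```

In this setup the mean grows in proportion to the outlier value, whereas the median stays within a bounded distance of the true center.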

  4. Median in high dimension? Tukey depth: $D_{\mathrm{Tukey}}(\mu, p) = \inf_{v \in \mathbb{R}^d} p(v^\top (X - \mu) \geq 0)$. Tukey median (Tukey, 1975): the point(s) with largest Tukey depth: $T(p) = \operatorname{argmax}_{\mu \in \mathbb{R}^d} D_{\mathrm{Tukey}}(\mu, p)$.
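
The two definitions can be turned into a rough numerical sketch. Note the assumptions, none of which come from the slides: the infimum over all directions is replaced by a minimum over finitely many random unit directions (which gives an upper bound on the empirical depth), and the argmax is searched only over the sample points. The names `tukey_depth` and `approx_tukey_median` are hypothetical.

```python
import numpy as np

def tukey_depth(mu, X, directions):
    """Approximate empirical Tukey depth of mu: the smallest fraction of
    points in a closed halfspace {x : v^T (x - mu) >= 0}, minimised over
    the supplied directions only (an upper bound on the true depth)."""
    proj = (X - mu) @ directions.T            # shape (n_points, n_directions)
    return float(np.mean(proj >= 0, axis=0).min())

def approx_tukey_median(X, directions):
    """Return the sample point with the largest approximate depth."""
    depths = [tukey_depth(x, X, directions) for x in X]
    return X[int(np.argmax(depths))]

rng = np.random.default_rng(0)
d = 2
X = rng.normal(size=(300, d))                 # samples from N(0, I)
V = rng.normal(size=(100, d))
V /= np.linalg.norm(V, axis=1, keepdims=True) # random unit directions

center = approx_tukey_median(X, V)
print("depth at origin:", tukey_depth(np.zeros(d), X, V))
print("approximate Tukey median:", center)
```

Exact computation of the Tukey depth is considerably more involved in high dimension; the random-direction shortcut is used here only to keep the example short.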

  5. Preliminaries - corruption model. Two corruption models: additive corruption and total variation (TV) corruption. TV corruption is stronger than additive corruption.
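
The slide's own definitions did not survive in this transcript. The block below writes out the standard forms of the two models as an assumption, using the notation $\mathcal{C}_{\mathrm{add}}(p^*, \varepsilon)$ and $\mathrm{TV}(p^*, p) \leq \varepsilon$ that appears on the performance-metric slide, and shows why every additive corruption is also a TV corruption.

```latex
% Additive corruption: an eps fraction of arbitrary mass r is mixed in.
\mathcal{C}_{\mathrm{add}}(p^*, \varepsilon)
  = \{ (1-\varepsilon)\, p^* + \varepsilon\, r \;:\; r \text{ any distribution} \}

% TV corruption: any distribution within total variation distance eps.
\mathcal{C}_{\mathrm{TV}}(p^*, \varepsilon)
  = \{ p \;:\; \mathrm{TV}(p^*, p) \leq \varepsilon \},
\qquad
\mathrm{TV}(p, q) = \sup_{A} | p(A) - q(A) |

% Containment: for p = (1-eps) p^* + eps r and any event A,
% p(A) - p^*(A) = eps ( r(A) - p^*(A) ), hence
\mathrm{TV}(p^*, p) = \varepsilon \, \mathrm{TV}(p^*, r) \leq \varepsilon
\;\Longrightarrow\;
\mathcal{C}_{\mathrm{add}}(p^*, \varepsilon) \subseteq \mathcal{C}_{\mathrm{TV}}(p^*, \varepsilon)
```

The symbol $\mathcal{C}_{\mathrm{TV}}$ is introduced here only for illustration. Because the adversary's feasible set is larger under TV corruption, the maximum bias can only grow and the breakdown point can only shrink relative to the additive model.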

  6. Preliminaries - assumption on the true distribution $p^*$. Halfspace-symmetric distributions (Zuo and Serfling, 2000; Chen, Tyler, et al., 2002): there exists a point $\mu \in \mathbb{R}^d$ such that for $X \sim p^*$, $\forall v \in \mathbb{R}^d$, $v^\top (X - \mu) \overset{d}{=} -v^\top (X - \mu)$. Example: Gaussian.

  7. Preliminaries - performance metric. Maximum bias for Tukey median: the maximum distance between $T(p)$ and $T(p^*)$, where $p$ ranges over the set of all possible level-$\varepsilon$ corruptions: $b_{\mathrm{add}}(p^*, \varepsilon) = \sup_{p \in \mathcal{C}_{\mathrm{add}}(p^*, \varepsilon),\, x \in T(p),\, y \in T(p^*)} \|x - y\|$, $b_{\mathrm{TV}}(p^*, \varepsilon) = \sup_{\mathrm{TV}(p^*, p) \leq \varepsilon,\, x \in T(p),\, y \in T(p^*)} \|x - y\|$.

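  8. Preliminaries - performance metric. Breakdown point: the minimum corruption level that can drive the maximum bias to infinity: $\varepsilon^*_{\mathrm{add}}(p^*) = \inf\{\varepsilon \mid b_{\mathrm{add}}(p^*, \varepsilon) = \infty\}$, $\varepsilon^*_{\mathrm{TV}}(p^*) = \inf\{\varepsilon \mid b_{\mathrm{TV}}(p^*, \varepsilon) = \infty\}$. Breakdown point for a family of distributions $G$: $\varepsilon^*_{\mathrm{add}}(G) = \inf_{q \in G} \varepsilon^*_{\mathrm{add}}(q)$, $\varepsilon^*_{\mathrm{TV}}(G) = \inf_{q \in G} \varepsilon^*_{\mathrm{TV}}(q)$.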

  9. Previous Results. Breakdown point under additive corruption (Donoho, 1982; Donoho and Gasko, 1992). [Figure: breakdown point versus dimension (1 to 7) for Tukey+additive+symmetric and Tukey+additive+general; marked levels 1/3 and 1/(d+1).]

  10. Our contribution. Breakdown point under TV corruption. [Figure: breakdown point versus dimension (1 to 7) for projection+TV+symmetric, Tukey+TV+symmetric, Tukey+additive+symmetric and Tukey+additive+general; marked levels 1/2, 1/3, 1/4 and 1/(d+1).] Characterization of maximum bias in the population and finite-sample cases: both algorithms can achieve near-optimal maximum bias $\Theta(\varepsilon)$ under TV corruption when $\varepsilon < 0.249$ for the Gaussian distribution.

  11. Main results - Breakdown point. Theorem (Breakdown point for Tukey median (Zhu, Jiao, and Steinhardt, 2020, Theorem 1)). Denote by $G$ the set of all halfspace-symmetric distributions. Then the breakdown point for $G$ is $\varepsilon^*_{\mathrm{add}}(G) = \begin{cases} 1/2, & d = 1 \\ 1/3, & d \geq 2 \end{cases}$, $\varepsilon^*_{\mathrm{TV}}(G) = \begin{cases} 1/2, & d = 1 \\ 1/3, & d = 2 \\ 1/4, & d \geq 3 \end{cases}$. Proof of upper bound via figures.
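
For quick reference, the piecewise values in the theorem can be written as a lookup. This is only a restatement of the stated cases for the family of halfspace-symmetric distributions; the function names are hypothetical.

```python
from fractions import Fraction

def breakdown_additive(d: int) -> Fraction:
    """Breakdown point of the Tukey median under additive corruption."""
    if d < 1:
        raise ValueError("dimension must be at least 1")
    return Fraction(1, 2) if d == 1 else Fraction(1, 3)

def breakdown_tv(d: int) -> Fraction:
    """Breakdown point of the Tukey median under TV corruption."""
    if d < 1:
        raise ValueError("dimension must be at least 1")
    if d == 1:
        return Fraction(1, 2)
    return Fraction(1, 3) if d == 2 else Fraction(1, 4)

for d in range(1, 5):
    print(d, breakdown_additive(d), breakdown_tv(d))
```

The two models agree for d = 1 and d = 2 and differ only from d = 3 onward, where TV corruption lowers the breakdown point from 1/3 to 1/4.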

  12. Main results - Maxbias. Theorem (Maximum bias under finite-sample TV corruption model (Zhu, Jiao, and Steinhardt, 2020, Theorem 3)). Assume $p^*$ is halfspace-symmetric centered at $\mu^*$ with decay function $h(t) = \sup_{v \in \mathbb{R}^d,\, \|v\|_* \leq 1} p^*(v^\top (X - \mu^*) > t)$. Denote by $\hat{p}_n$ the empirical distribution of $n$ samples taken from the $\varepsilon$-TV corrupted distribution $p$. When $d \geq 3$, with probability at least $1 - \delta$, there exists a universal constant $C > 0$ such that for any $\hat{\mu} \in T(\hat{p}_n)$, $\|\hat{\mu} - \mu^*\| \leq h^{-1}(1 - h(0) - 2\tilde{\varepsilon})$ (1) when $2\tilde{\varepsilon} < 1 - h(0)$, where $\tilde{\varepsilon} = \varepsilon + C \cdot \sqrt{\frac{d + 1 + \log(1/\delta)}{n}}$ and $h^{-1}$ is the generalized inverse function of $h$. As $n \to \infty$, this recovers the population result. The result can be generalized to other cases. Since $h(0) \leq 1/2$, it implies a $1/4$ lower bound on the breakdown point. For Gaussian $p^*$, $h(t) = 1/2 - \Theta(t)$ for small $t$, which gives maximum bias $O(\varepsilon)$ when $n = \Omega(d / \varepsilon^2)$.
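
To make bound (1) concrete, here is a sketch for one specific case that is an assumption of this example rather than part of the slide: $p^* = N(\mu^*, I)$ with the Euclidean norm, so that $h(t) = 1 - \Phi(t)$ for $t \geq 0$, $h(0) = 1/2$ and $h^{-1}(a) = \Phi^{-1}(1 - a)$. The universal constant $C$ is not specified in the source, so `C=1.0` below is purely a placeholder, and both function names are hypothetical.

```python
import math
from statistics import NormalDist

def effective_eps(eps, n, d, delta, C=1.0):
    """eps_tilde = eps + C * sqrt((d + 1 + log(1/delta)) / n).
    C stands in for the theorem's unspecified universal constant;
    the default value is a placeholder, not taken from the source."""
    return eps + C * math.sqrt((d + 1 + math.log(1.0 / delta)) / n)

def tukey_bound_gaussian(eps_tilde):
    """h^{-1}(1 - h(0) - 2*eps_tilde) for N(mu, I): Phi^{-1}(1/2 + 2*eps_tilde)."""
    if 2.0 * eps_tilde >= 0.5:
        return math.inf  # outside the range 2*eps_tilde < 1 - h(0)
    return NormalDist().inv_cdf(0.5 + 2.0 * eps_tilde)

for n in (10**3, 10**5, 10**7):
    e = effective_eps(0.05, n=n, d=3, delta=0.05)
    print(n, round(e, 4), tukey_bound_gaussian(e))
```

As `n` grows, `effective_eps` approaches `eps` and the bound approaches its population value; for small `eps` the bound is roughly linear in `eps`, in line with the $O(\varepsilon)$ statement for Gaussians.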

  13. Main results - Maxbias (proof sketch). Proof sketch of the population case: Lemma (Zhu, Jiao, and Steinhardt (2020, Lemma 1)). If $D_{\mathrm{Tukey}}(T(p), p^*) \geq \alpha$, we have $\|T(p) - \mu^*\| \leq h^{-1}(\alpha)$ (2). For the TV corruption model, we have $D_{\mathrm{Tukey}}(T(p), p^*) \geq D_{\mathrm{Tukey}}(T(p), p) - \varepsilon \geq D_{\mathrm{Tukey}}(\mu^*, p) - \varepsilon \geq D_{\mathrm{Tukey}}(\mu^*, p^*) - 2\varepsilon = 1 - h(0) - 2\varepsilon$. For the finite-sample case, it suffices to lower bound $D_{\mathrm{Tukey}}(\hat{\mu}, p^*)$, $\hat{\mu} \in T(\hat{p}_n)$, using a standard concentration argument.
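
The chain of inequalities can be annotated step by step. The justifications on the right are a reading of the sketch based on the definitions of Tukey depth, TV distance and $h$ given on the earlier slides; they are not quoted from the source.

```latex
\begin{aligned}
D_{\mathrm{Tukey}}(T(p), p^*)
  &\geq D_{\mathrm{Tukey}}(T(p), p) - \varepsilon
     && \text{each halfspace probability changes by at most } \mathrm{TV}(p^*, p) \leq \varepsilon \\
  &\geq D_{\mathrm{Tukey}}(\mu^*, p) - \varepsilon
     && T(p) \text{ maximizes the depth under } p \\
  &\geq D_{\mathrm{Tukey}}(\mu^*, p^*) - 2\varepsilon
     && \text{same TV argument, applied at } \mu^* \\
  &= 1 - h(0) - 2\varepsilon
     && \text{definition of } h \text{ at } t = 0 \text{, replacing } v \text{ by } -v
\end{aligned}
```

Taking $\alpha = 1 - h(0) - 2\varepsilon$ in the lemma then gives $\|T(p) - \mu^*\| \leq h^{-1}(1 - h(0) - 2\varepsilon)$, the population form of the bound.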

  14. Main results - Projection algorithm. Consider the halfspace metric defined in Donoho and Liu (1988) as $\widetilde{\mathrm{TV}}(p, q) = \sup_{v \in \mathbb{R}^d,\, t \in \mathbb{R}} |p(v^\top X \geq t) - q(v^\top X \geq t)|$ (3). Let $G(h)$ be the set of halfspace-symmetric distributions: $G(h) = \{ p \mid \exists \mu \in \mathbb{R}^d$ such that $X \sim p$ is halfspace-symmetric around $\mu$ and $\sup_{v \in \mathbb{R}^d,\, \|v\|_* \leq 1} p(v^\top (X - \mu) > t) \leq h(t) \}$ (4). The projection algorithm outputs $\hat{\mu}(p) = T(q)$. [Figure: the corrupted distribution $\hat{p}_n$, within TV distance $\varepsilon$ of $p^* \in G$, is projected under $\widetilde{\mathrm{TV}}$ onto $q \in G$.]

  15. Main results - Projection algorithm. Theorem (Maximum bias and breakdown point for projection algorithm (Zhu, Jiao, and Steinhardt, 2020, Theorem 3)). Assume the true distribution $p^*$ is halfspace-symmetric centered at $\mu^*$ with decay function $h(t) = \sup_{v \in \mathbb{R}^d,\, \|v\|_* \leq 1} p^*(v^\top (X - \mu^*) > t)$. Then for any $p$ with $\mathrm{TV}(p^*, p) \leq \varepsilon$, the projection estimator $\hat{\mu}(p)$ satisfies $\|\hat{\mu} - \mu^*\| \leq 2 h^{-1}(1/2 - \varepsilon)$ (5) when $\varepsilon < 1/2$. Here $h^{-1}$ is the generalized inverse function of $h$. This improves the breakdown point from $1/4$ for the Tukey median in high dimension under TV corruption to $1/2$, which is optimal among all translation-equivariant estimators (Rousseeuw and Leroy, 2005, Equation 1.38). The result can be extended to the finite-sample case using a similar argument. It achieves $O(\varepsilon)$ maximum bias for Gaussians.
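
Bound (5) can be evaluated numerically in the Gaussian case. As an assumption of this sketch (not stated on the slide), take $p^* = N(\mu^*, I)$ with the Euclidean norm, so that $h(t) = 1 - \Phi(t)$ and $h^{-1}(1/2 - \varepsilon) = \Phi^{-1}(1/2 + \varepsilon)$. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def projection_bound_gaussian(eps):
    """2 * h^{-1}(1/2 - eps) with h(t) = 1 - Phi(t), i.e. 2 * Phi^{-1}(1/2 + eps)."""
    if eps >= 0.5:
        return math.inf  # the theorem only covers eps < 1/2
    return 2.0 * NormalDist().inv_cdf(0.5 + eps)

# The bound stays finite for corruption levels above 1/4.
for eps in (0.01, 0.1, 0.3, 0.45):
    print(eps, projection_bound_gaussian(eps))
```

For small `eps` the value grows roughly linearly in `eps`, consistent with the $O(\varepsilon)$ claim, and it only diverges as `eps` approaches 1/2.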

  16. Main results - Projection algorithm Intuition on improving the breakdown point:

  17. Conclusion. Tukey median: affine-equivariant, breakdown point $1/4$ under TV corruption in high dimensions, good finite-sample error. $\widetilde{\mathrm{TV}}$ projection algorithm: not affine-equivariant, breakdown point $1/2$ and good finite-sample error. Open problem: find an estimator that is affine-equivariant, with breakdown point $1/2$ and good finite-sample error.

  18. References I. Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. The Annals of Statistics, 1(5), 799–821. Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1975. Donoho, D. L. (1982). Breakdown properties of multivariate location estimators (Technical report). Harvard University, Boston. Donoho, D. L., & Liu, R. C. (1988). The "automatic" robustness of minimum distance functionals. The Annals of Statistics, 16(2), 552–586. Donoho, D. L., & Gasko, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics, 20(4), 1803–1827. Zuo, Y., & Serfling, R. (2000). General notions of statistical depth function. The Annals of Statistics, 461–482. Chen, Z., Tyler, D. E., et al. (2002). The influence function and maximum bias of Tukey's median. The Annals of Statistics, 30(6), 1737–1759. Rousseeuw, P. J., & Leroy, A. M. (2005). Robust regression and outlier detection (Vol. 589). John Wiley & Sons.

  19. References II. Zhu, B., Jiao, J., & Steinhardt, J. (2020). When does the Tukey median work? arXiv preprint arXiv:2001.07805.
