  1. Multi-level Thresholding Tests for High Dimensional Means and Covariance Matrices
  Song Xi Chen, Guanghua School of Management and Center for Statistical Science, Peking University
  Joint work with Bin Guo and Yumou Qiu

  2. Two Sample Testing and Signal Detection
  ◮ $X_1, \ldots, X_{n_1} \overset{\text{i.i.d.}}{\sim} F_1(\mu_1, \Sigma_1)$ and $Y_1, \ldots, Y_{n_2} \overset{\text{i.i.d.}}{\sim} F_2(\mu_2, \Sigma_2)$
  ◮ $X_k = (X_{k1}, \ldots, X_{kp})^T$ and $Y_k = (Y_{k1}, \ldots, Y_{kp})^T$ are $p$-dimensional
  ◮ Means: $\mu_1 = (\mu_{11}, \ldots, \mu_{1p})^T$ and $\mu_2 = (\mu_{21}, \ldots, \mu_{2p})^T$
  ◮ Covariances: $\Sigma_1 = (\sigma_{ij1})_{p \times p}$ and $\Sigma_2 = (\sigma_{ij2})_{p \times p}$
  Signals in the Mean: $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$
  Signals in the Covariance: $H_0: \Sigma_1 = \Sigma_2$ vs. $H_a: \Sigma_1 \neq \Sigma_2$

  3. Tests for Means: Hotelling's $T^2$
  $$T^2 = (\bar X_1 - \bar X_2)' \left\{ S_n \left(\tfrac{1}{n_1} + \tfrac{1}{n_2}\right) \right\}^{-1} (\bar X_1 - \bar X_2),$$
  where
  $$S_n = (n_1 + n_2 - 2)^{-1} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (X_{ij} - \bar X_i)(X_{ij} - \bar X_i)'.$$
  Under $H_0: \mu_1 = \mu_2$ and Gaussianity,
  $$\frac{n_1 + n_2 - p - 1}{p(n_1 + n_2 - 2)}\, T^2 \sim F_{p,\, n_1 + n_2 - p - 1}.$$
  Reject $H_0$ at level $\alpha$ if $\frac{n_1 + n_2 - p - 1}{p(n_1 + n_2 - 2)}\, T^2 > F_{p,\, n_1 + n_2 - p - 1}(\alpha)$.
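To make the rejection rule concrete, here is a short Python sketch (not from the slides); `hotelling_t2` is a name of our choosing, and the pooled covariance assumes $n_1 + n_2 - 2 > p$ so that $S_n$ is invertible.

```python
import numpy as np
from scipy.stats import f

def hotelling_t2(X1, X2, alpha=0.05):
    """Two-sample Hotelling T^2 with its exact F calibration.

    Rows of X1, X2 are observations; requires n1 + n2 - 2 > p (assumption)."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    Xc1 = X1 - X1.mean(axis=0)
    Xc2 = X2 - X2.mean(axis=0)
    Sn = (Xc1.T @ Xc1 + Xc2.T @ Xc2) / (n1 + n2 - 2)  # pooled covariance
    T2 = d @ np.linalg.solve(Sn * (1.0 / n1 + 1.0 / n2), d)
    Fstat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
    crit = f.ppf(1 - alpha, p, n1 + n2 - p - 1)  # upper-alpha F quantile
    return T2, Fstat > crit
```

A large mean shift should trigger rejection while the statistic itself stays nonnegative.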

  4. HD Tests for Means without Thresholding
  ◮ Bai and Saranadasa (BS) (1996) removed $S_n^{-1}$ from $T^2$:
  $$BS = (\bar X_1 - \bar X_2)'(\bar X_1 - \bar X_2) - \frac{n_1 + n_2}{n_1 n_2}\, \mathrm{tr}\, S_n.$$
  Requires: (i) $p/n \to c \in [0, \infty)$ and (ii) $\lambda_{\max} = o\left(\sqrt{\mathrm{tr}(\Sigma^2)}\right)$.
  ◮ Srivastava (2009): replaced $S_n$ with the diagonal matrix of $S_n$ in $T^2$. Requires Gaussian data and $p \sim n$.
  ◮ Chen and Qin (2010): proposed a $U$-statistic formulation allowing $p \gg n$ and $\Sigma_1 \neq \Sigma_2$.

  5. Chen-Qin (2010, AoS) Test
  $$Q_n = \frac{\sum_{i \neq j}^{n_1} X_{1i}^T X_{1j}}{n_1(n_1 - 1)} + \frac{\sum_{i \neq j}^{n_2} X_{2i}^T X_{2j}}{n_2(n_2 - 1)} - 2\, \frac{\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} X_{1i}^T X_{2j}}{n_1 n_2}$$
  • A linear combination of one- and two-sample $U$-statistics:
  $$E(Q_n) = \mu_1^T \mu_1 + \mu_2^T \mu_2 - 2 \mu_1^T \mu_2 = \| \mu_1 - \mu_2 \|^2.$$
  • Main assumption for asymptotic normality of $Q_n$:
  $$\frac{\mathrm{tr}(\Sigma^4)}{\mathrm{tr}^2(\Sigma^2)} \to 0 \quad \text{as } p \to \infty.$$
  • Applicable for ANY $p$ if the eigenvalues are bounded; thus allows $p \gg n$.
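A minimal sketch of computing $Q_n$ with Gram matrices (the function name `cq_statistic` is ours, not from the paper):

```python
import numpy as np

def cq_statistic(X1, X2):
    """Chen-Qin Q_n: off-diagonal within-sample inner-product sums
    minus twice the averaged between-sample cross products."""
    n1, n2 = X1.shape[0], X2.shape[0]
    G1, G2 = X1 @ X1.T, X2 @ X2.T
    term1 = (G1.sum() - np.trace(G1)) / (n1 * (n1 - 1))  # i != j, sample 1
    term2 = (G2.sum() - np.trace(G2)) / (n2 * (n2 - 1))  # i != j, sample 2
    term3 = 2.0 * (X1 @ X2.T).sum() / (n1 * n2)          # cross term
    return term1 + term2 - term3
```

Subtracting the Gram-matrix trace removes the $i = j$ terms, which is what makes the estimator unbiased for $\|\mu_1 - \mu_2\|^2$.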

  6. Asymptotic Power of the Chen-Qin Test
  $$\Phi\left( -z_\alpha + \frac{n \kappa (1 - \kappa) \| \mu_1 - \mu_2 \|^2}{\sqrt{2\, \mathrm{tr}(\tilde\Sigma^2)}} \right),$$
  where $\tilde\Sigma = \kappa \Sigma_1 + (1 - \kappa) \Sigma_2$ and $\kappa = \lim_{n_1, n_2 \to \infty} n_1/(n_1 + n_2)$.
  ◮ A VALID test under weak assumptions for a wide range of dimensions.
  ◮ "VALID" means control of the type I error.
  ◮ The power may be weak in high dimensions due to the inflated $\mathrm{tr}(\tilde\Sigma^2)$.
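The power formula can be evaluated numerically; `cq_power` is a hypothetical helper that plugs given values of $n$, $\kappa$, $\|\mu_1 - \mu_2\|^2$ and $\mathrm{tr}(\tilde\Sigma^2)$ into the expression above.

```python
import numpy as np
from scipy.stats import norm

def cq_power(n, kappa, signal_sq, tr_sigma2, alpha=0.05):
    """Asymptotic power: Phi(-z_alpha + n k (1-k) ||mu1-mu2||^2 / sqrt(2 tr))."""
    z = norm.ppf(1.0 - alpha)
    return norm.cdf(-z + n * kappa * (1.0 - kappa) * signal_sq
                    / np.sqrt(2.0 * tr_sigma2))
```

At zero signal the formula reduces to the nominal level $\alpha$, and the power grows with the signal but shrinks as $\mathrm{tr}(\tilde\Sigma^2)$ inflates, which is the point made on the slide.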

  7. Thresholding Tests for Means
  ◮ One-sample Higher Criticism (HC) test (Tukey, 1976).
  ◮ Donoho and Jin (2004) pioneered the theory under $N_p(\mu, I_p)$:
  ◮ $\mu = (\mu_1, \cdots, \mu_p)$ with the non-zero $\mu_i = \sqrt{2 r \log p}$.
  ◮ Faint signals if $r \in (0, 1)$.
  ◮ $S_\beta = \{ k : \mu_k \neq 0 \}$, the signal set.
  ◮ $|S_\beta| = p^{1-\beta}$ – the number of signals.
  ◮ Sparse signals if $\beta \in (0.5, 1)$.

  8. Higher Criticism (HC)
  ◮ $X \sim N(\mu, I_p)$.
  ◮ $Z_i$ is the Z-statistic at the $i$-th dimension.
  ◮ $p_i = P(N(0,1) > Z_i)$ is the p-value for the $i$-th null.
  ◮ Sorted p-values: $p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(p)}$.
  ◮ The HC statistic:
  $$HC_n^* = \max_{1 \leq i \leq \alpha^* p} \frac{\sqrt{p}\, (i/p - p_{(i)})}{\sqrt{p_{(i)}(1 - p_{(i)})}}.$$
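A short sketch of the statistic above, assuming p-values strictly inside $(0, 1)$ so the denominator is well defined; `alpha_star` stands for the fraction $\alpha^*$ on the slide.

```python
import numpy as np

def higher_criticism(pvals, alpha_star=0.5):
    """HC*: standardized gap between i/p and the i-th sorted p-value,
    maximized over the smallest alpha_star * p of the sorted p-values."""
    p = len(pvals)
    ps = np.sort(np.asarray(pvals))
    i = np.arange(1, p + 1)
    hc = np.sqrt(p) * (i / p - ps) / np.sqrt(ps * (1.0 - ps))
    k = max(1, int(alpha_star * p))
    return hc[:k].max()
```

A handful of very small p-values (sparse signals) drives the maximum up sharply relative to the uniform null.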

  9. Optimal Detection Boundary Under $N_p(\mu, I_p)$
  ◮ A phase diagram in the $(r, \beta)$-plane: $r = \varrho(\beta)$.
  ◮ If $r > \varrho(\beta)$, $H_0$ and $H_1$ are asymptotically separable; if $r < \varrho(\beta)$, they are not separable.
  ◮ Donoho and Jin (2004) established the detection boundary of the HC test for Gaussian data:
  $$\varrho^*(\beta) = \begin{cases} \beta - 1/2, & 1/2 < \beta \leq 3/4; \\ (1 - \sqrt{1 - \beta})^2, & 3/4 < \beta < 1. \end{cases}$$
  This is the same as the optimal detection boundary of Ingster (1999), attained without knowing the underlying signal strength $r$ or sparsity $\beta$.

  10. (i) For any test of the hypothesis, P(Type I Error) + P(Type II Error) $\to 1$ if $r < \varrho(\beta)$ as $n, p \to \infty$; (ii) there exists a test (the HC) such that P(Type I Error) + P(Type II Error) $\to 0$ if $r > \varrho(\beta)$ as $n, p \to \infty$.

  11. $L_\gamma$-Thresholding for $H_0: \mu = 0$
  ◮ Motivated by Donoho and Johnstone (1994) and Fan (1996).
  ◮ $\bar X_i = n^{-1} \sum_{j=1}^{n} X_{ij}$.
  ◮ The threshold statistics:
  $$T_{\gamma n}(s) = \sum_{i=1}^{p} |\sqrt{n}\, \bar X_i|^\gamma\, I\{ |\sqrt{n}\, \bar X_i| > \sqrt{2 s \log p} \} \quad \text{for } s \in (0, 1).$$
  ◮ $\gamma = 0$: the HC;
  ◮ $\gamma = 1$: the $L_1$-thresholding (hard thresholding) of Donoho and Johnstone (1994);
  ◮ $\gamma = 2$: the $L_2$-thresholding used in Zhong, Chen and Xu (2013).
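The one-sample statistic above can be sketched in a few lines; `l_gamma_threshold` is a hypothetical name, and rows of `X` are assumed to be the $n$ observations.

```python
import numpy as np

def l_gamma_threshold(X, gamma, s):
    """T_{gamma,n}(s): sum |sqrt(n) * mean_i|^gamma over the coordinates
    whose standardized mean exceeds sqrt(2 s log p)."""
    n, p = X.shape
    z = np.sqrt(n) * X.mean(axis=0)
    keep = np.abs(z) > np.sqrt(2.0 * s * np.log(p))
    return float(np.sum(np.abs(z[keep]) ** gamma))
```

With $\gamma = 0$ the kept terms are all 1, so the statistic counts threshold exceedances (the HC flavor); $\gamma = 2$ sums the squared exceedances.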

  12. $L_2$-Thresholding Tests
  ◮ One sample: Zhong, Chen and Xu (2013, AoS).
  ◮ Two sample: Chen, Li and Zhong (2019, AoS).
  ◮ Can also attain Ingster's "optimal" detection boundary when the underlying distribution is unknown and the data are dependent ($\Sigma \neq I_p$).
  ◮ More powerful than the HC when $(r, \beta)$ are above the boundary (ZCX).
  ◮ The detection boundary can be lowered by utilizing the dependence (CLZ): first transform the data $X_{ij}$ to $\hat{\tilde\Sigma}^{-1} X_{ij}$, then apply the $L_2$-thresholding.

  13. Two-Sample for Means: Signals and Sparsity
  ◮ $\delta_k = \mu_{1k} - \mu_{2k}$ – the signal in the $k$-th dimension.
  ◮ $S_\beta = \{ k : \delta_k \neq 0 \}$, the signal set.
  ◮ $|S_\beta| = p^{1-\beta}$ – the number of signals.
  ◮ Sparse if $\beta \in (0.5, 1)$.

  14. Two-Sample for Means: $L_2$ Test Statistic
  ◮ An unbiased estimator of the signal $\delta_k^2$ via U-statistics:
  $$T_{nk} = \frac{1}{n_1(n_1 - 1)} \sum_{i \neq j}^{n_1} X_{1i}^{(k)} X_{1j}^{(k)} + \frac{1}{n_2(n_2 - 1)} \sum_{i \neq j}^{n_2} X_{2i}^{(k)} X_{2j}^{(k)} - \frac{2}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} X_{1i}^{(k)} X_{2j}^{(k)}.$$
  ◮ Test statistic: $\tilde T_n = n \sum_k T_{nk}$.
  ◮ Chen and Qin (2010, AoS).

  15. Two-Sample for Means: $L_2$ vs $L_2$-Thresholding Statistics
  ◮ CQ: $\tilde T_n = n \sum_{k \in S_\beta} T_{nk} + n \sum_{i \in S_\beta^c} T_{ni}$.
  ◮ Oracle: $n \sum_{i \in S_\beta} T_{ni}$.
  ◮ Thresholding statistic:
  $$L_n(s) = \sum_{k=1}^{p} n T_{nk}\, I\{ n T_{nk} + 1 > \lambda_n(s) \}, \quad \text{where } \lambda_n(s) = 2 s \log p.$$
  ◮ Tries to exclude the dimensions with $\delta_k = 0$.
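The coordinate-wise statistics $T_{nk}$ and the thresholded sum $L_n(s)$ can be sketched as follows. Both function names are ours, and using $n = n_1 + n_2$ as the scaling in $L_n(s)$ is a simplifying assumption (the slide writes a generic $n$).

```python
import numpy as np

def t_nk(X1, X2):
    """Coordinate-wise U-statistics T_nk (unbiased for delta_k^2),
    computed from column sums and column sums of squares."""
    n1, n2 = X1.shape[0], X2.shape[0]
    s1, q1 = X1.sum(axis=0), (X1 ** 2).sum(axis=0)
    s2, q2 = X2.sum(axis=0), (X2 ** 2).sum(axis=0)
    within1 = (s1 ** 2 - q1) / (n1 * (n1 - 1))  # sum over i != j, sample 1
    within2 = (s2 ** 2 - q2) / (n2 * (n2 - 1))  # sum over i != j, sample 2
    return within1 + within2 - 2.0 * s1 * s2 / (n1 * n2)

def l_n(X1, X2, s):
    """L_n(s): sum of n*T_nk over coordinates with n*T_nk + 1 > 2 s log p.
    n = n1 + n2 here is our assumption, not the paper's exact scaling."""
    n = X1.shape[0] + X2.shape[0]
    p = X1.shape[1]
    T = t_nk(X1, X2)
    keep = n * T + 1.0 > 2.0 * s * np.log(p)
    return float(np.sum(n * T[keep]))
```

The identity $\sum_{i \neq j} x_i x_j = (\sum_i x_i)^2 - \sum_i x_i^2$ avoids the double loop over pairs.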

  16. Two-Sample Tests for Means: Variance Comparison – Strong Signal Case of $n \delta_k^2 > 2 \log p$
  ◮ $L_2$: $\quad 2p + 2 \sum_{i \neq j} \rho_{ij}^2 + 4n \sum_{k,l \in S_\beta} \delta_k \delta_l \rho_{kl}$
  ◮ Oracle: $\quad 2 p^{1-\beta} + 2 \sum_{i \neq j \in S_\beta} \rho_{ij}^2 + 4n \sum_{k,l \in S_\beta} \delta_k \delta_l \rho_{kl}$
  ◮ Thresholding: $\quad 2 L_p\, p^{1-\beta} + 2 \sum_{i \neq j \in S_\beta} \rho_{ij}^2 + 4n \sum_{k,l \in S_\beta} \delta_k \delta_l \rho_{kl}$
  • $L_p$ denotes slowly varying functions of the form $(a \log p)^b$.

  17. Multi-level Thresholding: Weak Signal Case
  Weak signals: $\delta_k^2 = 2 r \log p / n$ for $r < 1$.
  $$ML_n = \max_{s \in \mathcal{S}_n} \frac{L_n(s) - \hat\mu_{L_n(s), 0}}{\hat\sigma_{L_n(s), 0}},$$
  $$\mathcal{S}_n = \{ s_k : s_k = n (\bar X_1^{(k)} - \bar X_2^{(k)})^2 / (2 \log p) \ \text{for } k = 1, \cdots, p \}.$$
  Theorem. Under Conditions C1-C3 and $H_0$,
  $$P\left( a(\log p)\, ML_n - b(\log p, \eta) \leq x \right) \to \exp(-e^{-x}),$$
  where $a(y) = (2 \log y)^{1/2}$ and $b(y, \eta) = 2 \log y + 2^{-1} \log\log y - 2^{-1} \log\{ 4\pi / (1 - \eta)^2 \}$.

  18. Detection Boundary of Multi-level Thresholding for Means
  The multi-level thresholding test rejects $H_0$ if $ML_n \geq G_\alpha = \{ q_\alpha + b(\log p, \eta) \} / a(\log p)$, where $q_\alpha$ is the upper-$\alpha$ quantile of the Gumbel distribution. Define
  $$\varrho(\beta) = \begin{cases} \beta - 1/2, & 1/2 < \beta \leq 3/4; \\ (1 - \sqrt{1 - \beta})^2, & 3/4 < \beta < 1. \end{cases}$$
  Theorem. Assume Conditions C1-C3. If $r > \varrho(\beta)$, the sum of the type I and II errors of the multi-level thresholding test converges to zero as $\alpha \to 0$ and $p \to \infty$.
  ◮ The same "detection boundary" as the optimal one for the $N_p(\mu, I_p)$ case.
  ◮ Signal enhancement by transforming the data with the precision matrix $\Omega = \Sigma^{-1}$.
  ◮ Improved detection boundary: lower than $\varrho(\beta)$.
  ◮ See Chen, Li and Zhong (2019) for details.

  19. Two Sample Tests for Covariance Matrices
  ◮ $H_0: \Sigma_1 = \Sigma_2$ vs $H_a: \Sigma_1 \neq \Sigma_2$.
  ◮ $S_{n_1} = (s_{ij1})$, $S_{n_2} = (s_{ij2})$: the two sample covariance matrices.
  ◮ $\theta_{ij1} = \mathrm{Var}\{ (X_{ki} - \mu_{1i})(X_{kj} - \mu_{1j}) \}$ and $\theta_{ij2} = \mathrm{Var}\{ (Y_{ki} - \mu_{2i})(Y_{kj} - \mu_{2j}) \}$.
  ◮ $\hat\theta_{ij1} = \frac{1}{n_1} \sum_{k=1}^{n_1} \{ (X_{ki} - \bar X_i)(X_{kj} - \bar X_j) - s_{ij1} \}^2 \overset{p}{\to} \theta_{ij1}$
  ◮ $\hat\theta_{ij2} = \frac{1}{n_2} \sum_{k=1}^{n_2} \{ (Y_{ki} - \bar Y_i)(Y_{kj} - \bar Y_j) - s_{ij2} \}^2 \overset{p}{\to} \theta_{ij2}$
  $$M_{ij} = \frac{(s_{ij1} - s_{ij2})^2}{\hat\theta_{ij1}/n_1 + \hat\theta_{ij2}/n_2}, \quad 1 \leq i \leq j \leq p.$$
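The entry-wise statistics $M_{ij}$ can be computed in a vectorized way; this is a sketch under our own choice of divisors (the sample covariance uses $n - 1$, while the slide's $\hat\theta$ uses $n$), and `m_ij` is a hypothetical name.

```python
import numpy as np

def m_ij(X, Y):
    """Standardized squared differences of the two sample covariances.

    Rows of X, Y are observations; returns the p x p matrix of M_ij."""
    n1, n2 = X.shape[0], Y.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    S1 = Xc.T @ Xc / (n1 - 1)
    S2 = Yc.T @ Yc / (n2 - 1)
    # per-observation products (X_ki - Xbar_i)(X_kj - Xbar_j), shape (n, p, p)
    P1 = np.einsum('ki,kj->kij', Xc, Xc)
    P2 = np.einsum('ki,kj->kij', Yc, Yc)
    th1 = ((P1 - S1) ** 2).mean(axis=0)  # theta-hat for sample 1
    th2 = ((P2 - S2) ** 2).mean(axis=0)  # theta-hat for sample 2
    return (S1 - S2) ** 2 / (th1 / n1 + th2 / n2)
```

The full matrix is returned for convenience; the slide only needs the upper triangle $1 \leq i \leq j \leq p$, and the max-type test on the next slide takes the maximum over those entries.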

  20. Existing Work
  ◮ Bai et al. (2009, AoS): corrected likelihood ratio test using random matrix theory (RMT).
  ◮ Cai, Liu and Xia (2013): $L_{\max}$ statistic $M_n = \max_{1 \leq i \leq j \leq p} M_{ij}$ – only uses the maximal signal.
  ◮ Li and Chen (2012): $L_2$ statistic, a sum over all $M_{ij}$ – includes too many uninformative entries.
  ◮ Srivastava and Yanagihara (2010): also an $L_2$ statistic, measuring $\mathrm{tr}(\Sigma_1^2)/\mathrm{tr}^2(\Sigma_1) - \mathrm{tr}(\Sigma_2^2)/\mathrm{tr}^2(\Sigma_2)$.

  21. $L_2$-Test Statistic: Li and Chen (2012)
  ◮ Targets the squared Frobenius norm: $\mathrm{tr}\{ (\Sigma_1 - \Sigma_2)^2 \} = \mathrm{tr}(\Sigma_1^2) + \mathrm{tr}(\Sigma_2^2) - 2\, \mathrm{tr}(\Sigma_1 \Sigma_2)$.
  ◮ Note that $\Sigma_1 = \Sigma_2$ if and only if $\mathrm{tr}\{ (\Sigma_1 - \Sigma_2)^2 \} = 0$.
  ◮ Although the Frobenius norm can be large in high dimensions, it brings two advantages:
  ◮ (i) relatively easier to analyze for test procedures and power formulas;
  ◮ (ii) can target certain sections of the covariance matrix.
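The Frobenius target and its trace expansion are easy to verify numerically; `frob_gap` is a name of our choosing, and the inputs are taken to be the (population or estimated) covariance matrices themselves.

```python
import numpy as np

def frob_gap(S1, S2):
    """tr{(Sigma_1 - Sigma_2)^2}: the squared Frobenius norm targeted
    by the Li-Chen L_2 statistic."""
    D = S1 - S2
    return float(np.trace(D @ D))
```

The expansion $\mathrm{tr}(\Sigma_1^2) + \mathrm{tr}(\Sigma_2^2) - 2\,\mathrm{tr}(\Sigma_1 \Sigma_2)$ gives the same value, and the gap is zero exactly when the two matrices coincide.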
