


  1. Minimax testing of a composite null hypothesis defined via a quadratic functional. Joint work with L. Comminges. Asymptotic Statistics and Related Topics, Tokyo, Japan. Arnak S. Dalalyan, ENSAE / CREST / GENES.

  2. Motivation 1: Testing the relevance of a group of variables
     • We observe a sampled signal $f : \mathbb{R}^d \to \mathbb{R}$, $t = (t_1, \dots, t_d)^\top \mapsto f(t)$, in a noisy environment.
     • The dimension $d$ is large.
     • Based on a training sample, some variable selection procedure suggests the irrelevance of the subset of variables $t_{J^c} := \{t_j : j \in J^c\}$.
     • Based on a testing sample, we would like to check the irrelevance of $J^c$. This amounts to testing the hypothesis $\mathbf{E}[\operatorname{Var}(f(t) \mid t_J)] = 0$.
     © Dalalyan, A.S., Sept. 2, 2013

  3. Motivation 2: Testing the validity of a partial linear model
     • We observe a sampled signal obeying the partial linear model $f(t) = g(t_J) + \beta^\top t_{J^c}$ in a noisy environment.
     • $g$, $J$ and $\beta$ are unknown.
     • The dimension $d$ is large, but the cardinality of $J$ is small.
     • For a given set $J_0$, we would like to test the hypothesis $J = J_0$. This amounts to testing the hypothesis $\operatorname{Var}[\nabla_{J_0^c} f(t)] = 0$.

  4. Motivation 3: Testing the equality of two norms
     • Two noisy (sub)images $g_1$ and $g_2$ are observed.
     • The goal is to check whether they coincide up to a rotation and an illumination change: $g_1(z) = g_2(Rz) + a$, $\forall z \in D \subset \mathbb{R}^2$, for some orthogonal matrix $R$ and some $a \in \mathbb{R}$.
     • This requires testing the hypothesis
       $H_0 : \exists (R, a)$ s.t. $g_1(z) = g_2(Rz) + a$, $\forall z \in D$   (1)
       which is usually very time-consuming (it involves a nonlinear and nonconvex minimization step). A simpler strategy is to start with testing $H_0' : \operatorname{Var}[g_1(Z)] = \operatorname{Var}[g_2(Z)]$, and to reject the hypothesis $H_0$ if $H_0'$ is rejected.

  5. Unifying framework: Testing the nullspace of a quadratic functional in regression

  6. Relation to previous work

     Reference                            Non-Gaussian  Sampled  Multivariate  Beyond Q = I  Beyond Q ⪰ 0
     Ingster & Stepanova 2011                  x           x          x             x             ✓
     Ingster & Sapatinas 2009                  x           ✓          ✓             x             x
     Ingster, Sapatinas & Suslina 2012         x           x          x             ✓             x
     Laurent, Loubes & Marteau 2011            x           x          x             x             ✓
     Comminges & D. 2012                       ✓           ✓          ✓             ✓             ✓

     Remark: The approach adopted in the first three references is purely asymptotic, whereas Laurent et al. (2011) obtained nonasymptotic rates of separation.

  7. Overview of our results: Testing procedure
     • We observe $\{(x_i, t_i)\}_{i=1,\dots,n} \subset \mathbb{R} \times [0,1]^d$ such that
       $x_i = f(t_i) + \xi_i$, $\quad f(t) = \sum_{\ell \in L} \theta_\ell[f]\, \varphi_\ell(t)$,
       where the $\xi_i$ are iid with $\mathbf{E}[\xi_1] = 0$ and $t_i \overset{iid}{\sim} U[0,1]^d$.
     • We wish to test the hypothesis
       $H_0 : Q[f] = \sum_{\ell \in L} q_\ell\, \theta_\ell[f]^2 = 0$ against $H_1 : |Q[f]| > \rho^2$.
     • Each $\theta_\ell[f]^2$ is unbiasedly estimated by
       $\widehat{\theta^2_\ell} = \frac{1}{n(n-1)} \sum_{i \neq i'} x_i x_{i'}\, \varphi_\ell(t_i)\, \varphi_\ell(t_{i'})$.
     • Given a sequence of weights $w = \{w_\ell\}$, we estimate $Q[f]$ by
       $\widehat{Q}^w_n = \sum_{\ell \in L} w_\ell\, q_\ell\, \widehat{\theta^2_\ell}$.
     • Test: we fix a threshold $u > 0$ and reject $H_0$ if $|\widehat{Q}^w_n| > u$.
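The testing procedure on this slide can be sketched in a few lines of NumPy. This is only an illustrative sketch under assumptions that are not in the slides: the dimension is d = 1, the basis is the orthonormal cosine basis on [0, 1], the index set is truncated to finitely many functions, and the function names (`cosine_basis`, `theta_sq_ustat`, `q_hat`, `reject_h0`) are hypothetical.

```python
import numpy as np

def cosine_basis(t, n_basis):
    """Assumed basis (not from the slides): phi_0 = 1, phi_l(t) = sqrt(2) cos(pi l t)."""
    ells = np.arange(n_basis)
    phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, ells))
    phi[:, 0] = 1.0
    return phi  # shape (n, n_basis)

def theta_sq_ustat(x, phi):
    """U-statistic estimate of theta_l^2 for every l.

    Uses sum_{i != i'} z_i z_i' = (sum_i z_i)^2 - sum_i z_i^2
    with z_i = x_i * phi_l(t_i), which avoids the double loop.
    """
    n = x.shape[0]
    z = x[:, None] * phi            # z[i, l] = x_i * phi_l(t_i)
    all_pairs = z.sum(axis=0) ** 2  # includes the diagonal terms i == i'
    diagonal = (z ** 2).sum(axis=0)
    return (all_pairs - diagonal) / (n * (n - 1))

def q_hat(x, phi, q, w):
    """Weighted estimate of Q[f] = sum_l q_l theta_l^2."""
    return float(np.sum(w * q * theta_sq_ustat(x, phi)))

def reject_h0(x, phi, q, w, u):
    """Reject H0 when |Q_hat| exceeds the threshold u."""
    return abs(q_hat(x, phi, q, w)) > u
```

The identity used in `theta_sq_ustat` reduces the cost from O(n²) to O(n) per basis function; the choice of the threshold `u` is left to the user, as on the slide.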

  8. Overview of our results: Basics on the minimax rates of separation
     For any estimator $\widehat{Q}_n$, we can write $\widehat{Q}_n = Q[f] + \epsilon_n[f]$.
     • Under $H_0$: $|\widehat{Q}_n| \le \sup_{f \in \mathcal{F}_0} |\epsilon_n[f]|$.
     • Under $H_1$: $|\widehat{Q}_n| \ge \rho^2 - \sup_{f \in \mathcal{F}_1(\rho)} |\epsilon_n[f]|$.
     • The testing statistic $\widehat{Q}_n$ leads to a consistent test if
       $\sup_{f \in \mathcal{F}_0} |\epsilon_n[f]| < \rho^2 - \sup_{f \in \mathcal{F}_1(\rho)} |\epsilon_n[f]|$ (with prob. $1 - \gamma$).
     • Let $\rho_n(\widehat{Q})$ be the smallest possible $\rho > 0$ satisfying
       $\sup_{f \in \mathcal{F}_0} |\epsilon_n[f]| + \sup_{f \in \mathcal{F}_1(\rho)} |\epsilon_n[f]| < \rho^2$ (with prob. $1 - \gamma$).
     • Minimax rate of separation: $\rho^*_n \asymp \inf_{\widehat{Q}_n} \rho_n(\widehat{Q})$.
     Where the difference with the minimax rate of estimation comes from: replacing $\sup_{f \in \mathcal{F}_1(\rho)}$ with $\sup_{\rho > 0} \sup_{f \in \mathcal{F}_1(\rho)}$ leads to the minimax rate of estimation, but this is sub-optimal!

  9. Overview of our results: Minimax rates of separation
     • Let us call the ratio $|q_\ell| / c_\ell$ the importance of the axis $\varphi_\ell$.
     • Let $\mathcal{N}(T)$ be the set of indices with importance $\ge T > 0$.
     • Let $M(T) = \sum_{\ell \in \mathcal{N}(T)} q_\ell^2$.
     • In the general case, the minimax rate of separation is given by
       $(\rho^*_{n,\gamma})^2 = \inf_{T > 0} \Big\{ \frac{4\,(B_1 M(T) + B_2 n)^{1/2}}{n \gamma^{1/2}} + 2\sqrt{2}\, T \Big\} \asymp \inf_{T > 0} \Big\{ \frac{M(T)^{1/2}}{n} + T \Big\} \vee n^{-1/2}$.
     • Interestingly, in the case of positive $Q \succeq 0$,
       $(\rho^*_{n,\gamma})^2 \asymp \inf_{T > 0} \Big\{ \frac{M(T)^{1/2}}{n} + T \Big\}$.
     • In both cases, the test defined using the statistic $\widehat{Q}^w_n$ with the weights $w_\ell = \mathbb{1}(|q_\ell| / c_\ell \ge T)$ achieves the optimal rate.
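The infimum in the positive case can be explored numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes d = 1, $q_\ell = 1$, $c_\ell = \ell^{2\sigma}$ on a truncated index set, restricts the candidate thresholds T to the importance values themselves, and ignores all constants. The function name `rate_sq_positive` is hypothetical.

```python
import numpy as np

def rate_sq_positive(q, c, n):
    """Approximate inf_T { sqrt(M(T)) / n + T }, with T restricted to the importances.

    Assumes q and c are finite arrays, c > 0, and the importances are distinct.
    Returns the objective value and the threshold that attains it.
    """
    importance = np.abs(q) / c
    order = np.argsort(-importance)      # most important axes first
    imp_sorted = importance[order]
    m_cum = np.cumsum(q[order] ** 2)     # M(T) at T = imp_sorted[k]
    objective = np.sqrt(m_cum) / n + imp_sorted
    k = int(np.argmin(objective))
    return float(objective[k]), float(imp_sorted[k])

# Hypothetical example: d = 1, q_l = 1, c_l = l^(2 sigma), l = 1..20000 (truncated).
sigma = 1.0
ells = np.arange(1, 20001, dtype=float)
q = np.ones_like(ells)
c = ells ** (2 * sigma)

r_small, _ = rate_sq_positive(q, c, n=10**4)
r_large, _ = rate_sq_positive(q, c, n=10**8)

# Empirical log-log slope of the squared rate as a function of n.
slope = (np.log(r_large) - np.log(r_small)) / (np.log(10**8) - np.log(10**4))
print(slope)
```

In this example the printed slope should be close to $-4\sigma/(4\sigma + 1) = -0.8$, which is the exponent expected for a positive functional when d = 1.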

  10. Relation to the norm estimation: Phase transition / “Elbow” effect
     Let us assume the simple case $q_\ell^2 = 1$ and $c_\ell = \sum_{j=1}^d \ell_j^{2\sigma_j}$, $\ell \in \mathbb{Z}^d$.
     One can check that $M(T) \asymp T^{-d/(2\bar\sigma)}$, where $\bar\sigma^{-1} = \frac{1}{d} \sum_j \sigma_j^{-1}$.
     In hypotheses testing:
     • If $Q$ is positive, the minimax rate of separation is $(\rho^*_n)^2 \asymp n^{-4\bar\sigma/(4\bar\sigma + d)}$.
     • If $Q$ is neither positive nor negative, the minimax rate of separation is $(\rho^*_n)^2 \asymp n^{-\left(\frac{4\bar\sigma}{4\bar\sigma + d} \wedge \frac{1}{2}\right)}$.

  11. Relation to the norm estimation: Phase transition / “Elbow” effect
     In the same simple case ($q_\ell^2 = 1$, $c_\ell = \sum_{j=1}^d \ell_j^{2\sigma_j}$, $\bar\sigma^{-1} = \frac{1}{d} \sum_j \sigma_j^{-1}$), in functional estimation:
     • If $Q[f] = \|f\|_2$, the minimax rate of estimation is (Lepski et al. ’99) $r^*_n \asymp n^{-2\bar\sigma/(4\bar\sigma + d)}$.
     • If $Q[f] = \|f\|_2^2$, the minimax rate of estimation is (Donoho and Nussbaum ’90) $r^*_n \asymp n^{-\left(\frac{4\bar\sigma}{4\bar\sigma + d} \wedge \frac{1}{2}\right)}$.
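The elbow effect is easy to see by evaluating the exponents for a few smoothness vectors. The following is a small sketch, not part of the talk; the function names are hypothetical and the formulas are the exponents of $(\rho^*_n)^2$ for positive and indefinite $Q$ in the simple case $q_\ell^2 = 1$, $c_\ell = \sum_j \ell_j^{2\sigma_j}$.

```python
def harmonic_mean_smoothness(sigmas):
    """sigma_bar defined by 1 / sigma_bar = (1 / d) * sum_j 1 / sigma_j."""
    d = len(sigmas)
    return d / sum(1.0 / s for s in sigmas)

def separation_exponent(sigmas, positive):
    """Exponent a such that (rho_n^*)^2 is of order n^(-a).

    positive=True  : Q positive, a = 4 s / (4 s + d)
    positive=False : Q indefinite, a = min(4 s / (4 s + d), 1/2)
    where s is the harmonic mean of the smoothness vector.
    """
    d = len(sigmas)
    s_bar = harmonic_mean_smoothness(sigmas)
    base = 4.0 * s_bar / (4.0 * s_bar + d)
    return base if positive else min(base, 0.5)

# d = 2, sigma_bar = 1 > d / 4: the two regimes differ (2/3 versus 1/2).
print(separation_exponent([1.0, 1.0], positive=True))
print(separation_exponent([1.0, 1.0], positive=False))
```

The two exponents coincide exactly when $\bar\sigma \le d/4$, since $4\bar\sigma/(4\bar\sigma + d) \le 1/2$ is equivalent to $\bar\sigma \le d/4$.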

  12. Main result I: Positive functionals
     Theorem 1. Assume that $\mathbf{E}[\xi_1^4] < \infty$ and that, for every $T > 0$, the set $\mathcal{N}(T) = \{\ell : q_\ell \ge T c_\ell\}$ is finite. For a $\gamma \in (0, 1)$, let $T_{n,\gamma}$ be such that
       $\Big( \frac{n(n-1)}{2} \sum_\ell (q_\ell - T c_\ell)_+^2 \Big)^{1/2} = \sum_\ell c_\ell (q_\ell - T c_\ell)_+ \, \big( 2 z_{1-\gamma/2} + o(1) \big)$.
     Let us define
       $\rho^*_{n,\gamma} = \Big( \frac{\sum_{\ell \in L} q_\ell (q_\ell - T_{n,\gamma} c_\ell)_+}{\sum_{\ell \in L} c_\ell (q_\ell - T_{n,\gamma} c_\ell)_+} \Big)^{1/2}$.
     If several conditions are fulfilled, then the test $\widehat{\varphi}^*_n$ based on the array
       $w^*_{\ell,n} = \Big( 1 - \frac{T_{n,\gamma} c_\ell}{q_\ell} \Big)_+$
     satisfies $\gamma_n(\mathcal{F}_0, \mathcal{F}_1(\rho^*_{n,\gamma}), \widehat{\varphi}^*_n) \le \gamma + o(1)$, as $n \to \infty$.
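For a given threshold, the weights and the separation radius of Theorem 1 are simple to compute. The sketch below is only an illustration under assumptions not made in the slides: the index set is finite, all $q_\ell$ are strictly positive, and the threshold `T` is supplied by the user rather than obtained by solving the equation that defines $T_{n,\gamma}$. The function names are hypothetical.

```python
import numpy as np

def optimal_weights(q, c, T):
    """Weights w_l = (1 - T c_l / q_l)_+ ; assumes q_l > 0 for every l."""
    return np.maximum(1.0 - T * c / q, 0.0)

def rho_star(q, c, T):
    """Radius rho = sqrt( sum q_l (q_l - T c_l)_+ / sum c_l (q_l - T c_l)_+ ).

    Assumes at least one index satisfies q_l > T c_l, so the denominator is nonzero.
    """
    v = np.maximum(q - T * c, 0.0)
    return float(np.sqrt(np.sum(q * v) / np.sum(c * v)))

# Hypothetical toy example with three axes.
q = np.array([1.0, 1.0, 1.0])
c = np.array([1.0, 4.0, 9.0])
T = 0.2
w = optimal_weights(q, c, T)   # axes with q_l / c_l below T get weight zero
rho = rho_star(q, c, T)
```

Note that $w_\ell q_\ell = (q_\ell - T c_\ell)_+$, so the same truncated quantities drive both the weights and the radius; unlike the indicator weights of the rate-optimal test, these weights decay smoothly to zero as the importance $q_\ell / c_\ell$ approaches the threshold.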

  13. Testing partial derivatives
     • Let $\alpha \in \mathbb{R}^d_+$ and $\sigma \in \mathbb{R}^d_+$ be two given vectors.
     • Let $Q[f] = \big\| \partial^{\sum_j \alpha_j} f / \partial t_1^{\alpha_1} \dots \partial t_d^{\alpha_d} \big\|_2^2$ and $C[f] = \sum_{j=1}^d \big\| \partial^{\sigma_j} f / \partial t_j^{\sigma_j} \big\|_2^2$.
     • Let us define $\delta$ and $\bar\sigma$ by $\delta = \sum_{j=1}^d \alpha_j / \sigma_j$ and $\frac{1}{\bar\sigma} = \frac{1}{d} \sum_{j=1}^d \frac{1}{\sigma_j}$.
     • If $\delta < 1$ and $\bar\sigma > d/4$, then the exact minimax rate $\rho^*_{n,\gamma}$ is given by $\rho^*_{n,\gamma} = C^*_\gamma\, \rho^*_n\, (1 + o(1))$,
     • where the minimax rate is $\rho^*_n = n^{-\frac{2\bar\sigma(1 - \delta)}{4\bar\sigma + d}}$, and the exact separation constant $C^*_\gamma$ is an explicit function of $z_{1-\gamma/2}$, $\kappa$ and $C(d, \sigma, \alpha)$, with exponents $(\kappa_j)$ determined by $\alpha_j$ and $\sigma_j$, $\kappa = \sum_{j=1}^d \kappa_j$ and
       $C(d, \sigma, \alpha) = \pi^{-d}\, \frac{\prod_{i=1}^d \Gamma(\kappa_i)}{\big( \prod_{i=1}^d \sigma_i \big) (1 - \delta)\, \Gamma(\kappa + 2)}$.

  14. Conclusion
     • We established minimax rates of separation in the model of regression with random design for null hypotheses corresponding to the nullspace of a general quadratic functional.
     • In the case of positive functionals, we also proved sharp-minimax optimality of the proposed procedure.
     • When comparing two norms, the minimax rate of separation is $\rho^*_n = n^{-\left(\frac{2\bar\sigma}{4\bar\sigma + d} \wedge \frac{1}{4}\right)}$. This rate shows that the watershed between the two regimes corresponds to the condition $\bar\sigma = d/4$. In other terms, we are in the regular regime when $\bar\sigma > d/4$. It is interesting to note, even if we are unable to establish a direct connection, that this is also the regime under which the Sobolev embedding $W_2^{\sigma} \subset L_4([0,1]^d)$ holds true.
     • Open questions: adaptation to the unknown smoothness, unknown noise level, the case of (sparse) Besov bodies, ...
