Hypothesis Testing for High-Dimensional Regression: Nearly Optimal Sample Size
Adel Javanmard (Stanford University / UC Berkeley)
Based on joint work with Andrea Montanari
January 2015
Outline
1. Problem definition
2. Debiasing approach
3. Hypothesis testing under nearly optimal sample size
Problem definition
Linear model

We focus on linear models:

    Y = X θ_0 + W

Y ∈ R^n (response), X ∈ R^{n×p} (design matrix), θ_0 ∈ R^p (parameters).

The noise vector W has independent entries with

    E(W_i) = 0,   E(W_i^2) = σ^2,   E(|W_i|^{2+κ}) < ∞,   for some κ > 0.
Problem

Confidence intervals: For each i ∈ {1, ..., p}, find θ_i, θ̄_i ∈ R such that

    P( θ_{0,i} ∈ [θ_i, θ̄_i] ) ≥ 1 − α.

We would like |θ̄_i − θ_i| to be as small as possible.

Hypothesis testing:

    H_{0,i}: θ_{0,i} = 0   vs.   H_{A,i}: θ_{0,i} ≠ 0.
LASSO

    θ̂ ≡ argmin_{θ ∈ R^p} { (1/(2n)) ‖y − Xθ‖_2^2 + λ ‖θ‖_1 }

[Tibshirani 1996; Chen, Donoho 1996]

Distribution of θ̂?

Debiasing approach (the LASSO is biased towards small ℓ_1 norm):

    θ̂  →  debiasing  →  θ̂^d

We characterize the distribution of θ̂^d.
Debiasing approach
Classical setting (n ≫ p)

We know everything about the least-squares estimator

    θ̂^{LS} = (1/n) Σ̂^{-1} X^T Y,

where Σ̂ ≡ (X^T X)/n is the empirical covariance.

Confidence intervals:

    [θ_i, θ̄_i] = [ θ̂_i^{LS} − c_α Δ_i,  θ̂_i^{LS} + c_α Δ_i ],   Δ_i ≡ σ √( (Σ̂^{-1})_{ii} / n ).
High-dimensional setting (n < p)

    θ̂^{LS} = (1/n) Σ̂^{-1} X^T Y

Problem in high dimension: Σ̂ is not invertible!

Take your favorite M ∈ R^{p×p}:

    θ̂* = (1/n) M X^T Y = (1/n) M X^T X θ_0 + (1/n) M X^T W
        = θ_0 + (M Σ̂ − I) θ_0 + (1/n) M X^T W
                   [bias]         [Gaussian error]
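The decomposition above is an exact algebraic identity, valid for any matrix M, which a quick numpy check confirms (synthetic data, arbitrary M; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 80  # high-dimensional regime: n < p
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:3] = [2.0, -1.0, 0.5]
W = rng.standard_normal(n)
Y = X @ theta0 + W

Sigma_hat = X.T @ X / n
M = rng.standard_normal((p, p))  # any "favorite" M works for the identity

theta_star = M @ X.T @ Y / n
# theta0 + bias term + noise term:
decomposition = theta0 + (M @ Sigma_hat - np.eye(p)) @ theta0 + M @ X.T @ W / n
```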
Debiased estimator

    θ̂* = θ_0 + (M Σ̂ − I) θ_0 + (1/n) M X^T W
               [bias]           [Gaussian error]

Let us (try to) subtract the bias:

    θ̂^u = θ̂* − (M Σ̂ − I) θ̂^{Lasso}

Debiased estimator (with θ̂ = θ̂^{Lasso}):

    θ̂^d ≡ θ̂ + (1/n) M X^T ( Y − X θ̂ )
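As a sanity check on the formula: when Σ̂ is invertible and we take M = Σ̂^{-1}, the debiasing step turns any initial estimator back into ordinary least squares. A sketch with a bare-bones ISTA solver standing in for the LASSO (an illustrative toy solver, not the authors' implementation):

```python
import numpy as np

def ista_lasso(X, y, lam, n_iter=2000):
    """Minimize (1/(2n))||y - X theta||_2^2 + lam*||theta||_1 via ISTA."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L, L = ||X||_2^2 / n
    theta = np.zeros(p)
    for _ in range(n_iter):
        z = theta - step * (X.T @ (X @ theta - y) / n)  # gradient step
        theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return theta

rng = np.random.default_rng(2)
n, p = 100, 10  # low-dimensional here, so Sigma_hat is invertible
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:2] = [3.0, -2.0]
Y = X @ theta0 + rng.standard_normal(n)

Sigma_hat = X.T @ X / n
theta_hat = ista_lasso(X, Y, lam=0.1)
M = np.linalg.inv(Sigma_hat)

# Debiasing step: theta_d = theta_hat + (1/n) M X^T (Y - X theta_hat)
theta_d = theta_hat + M @ X.T @ (Y - X @ theta_hat) / n
theta_ls = np.linalg.solve(X.T @ X, X.T @ Y)
```

With M = Σ̂^{-1} the algebra collapses: θ̂ + Σ̂^{-1} X^T Y / n − Σ̂^{-1} Σ̂ θ̂ = θ̂^{LS}, whatever the initial θ̂.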
Debiased estimator: choosing M?

    θ̂^d ≡ θ̂ + (1/n) M X^T ( y − X θ̂ )

Low-dimensional projection estimator (LDPE):
- Start with a linear estimator, debias by a nonlinear estimator
- M constructed via nodewise LASSO on X
[C.-H. Zhang, S. S. Zhang]

Approximate inverse of Σ̂: nodewise LASSO on X (under a row-sparsity assumption on Σ^{-1})
[S. van de Geer, P. Bühlmann, Y. Ritov, R. Dezeure]
Debiased estimator: choosing M?

Our approach: optimize two objectives (the bias and the variance of θ̂^d) [A. Javanmard, A. Montanari]:

    √n ( θ̂^d − θ_0 ) = √n (M Σ̂ − I)(θ_0 − θ̂) + Z
                            [bias]

    Z | X ∼ N( 0, σ^2 M Σ̂ M^T ),   Σ̂ = (1/n) X^T X
                   [noise covariance]
Debiased estimator: choosing M?

Our approach: find M by solving an optimization problem [A. Javanmard, A. Montanari]:

    minimize_M   max_{1 ≤ i ≤ p} ( M Σ̂ M^T )_{i,i}
    subject to   | M Σ̂ − I |_∞ ≤ ξ

Equivalently, row by row (m_i the i-th row of M):

    minimize_{m_i}   m_i^T Σ̂ m_i
    subject to       ‖ Σ̂ m_i − e_i ‖_∞ ≤ ξ

The optimization decouples over the rows and can be solved in parallel.
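Each row subproblem is a convex quadratic program with 2p linear constraints (the ∞-norm band around e_i). A sketch using scipy's generic SLSQP solver, which stands in here for whatever specialized QP method one would use in practice (ξ and the data are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def m_row(Sigma_hat, i, xi):
    """Row i of M: minimize m^T Sigma_hat m  s.t.  ||Sigma_hat m - e_i||_inf <= xi."""
    p = Sigma_hat.shape[0]
    e_i = np.zeros(p)
    e_i[i] = 1.0
    # |Sigma_hat m - e_i| <= xi, written as 2p linear inequalities g(m) >= 0
    cons = {"type": "ineq",
            "fun": lambda m: np.concatenate([xi - (Sigma_hat @ m - e_i),
                                             xi + (Sigma_hat @ m - e_i)])}
    m_start = np.linalg.solve(Sigma_hat, e_i)  # feasible start: residual is zero
    res = minimize(lambda m: m @ Sigma_hat @ m, m_start,
                   method="SLSQP", constraints=cons)
    return res.x

rng = np.random.default_rng(3)
n, p, xi = 200, 5, 0.1
X = rng.standard_normal((n, p))
Sigma_hat = X.T @ X / n
m0 = m_row(Sigma_hat, 0, xi)  # rows can be computed independently, in parallel
```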
Main theorems

Theorem [Javanmard, Montanari 2013] (Deterministic designs)
Let X be any deterministic design that satisfies the compatibility condition for the set S = supp(θ_0), |S| ≤ s_0, with constant φ_0. Further define the coherence parameter

    μ* ≡ min_{M ∈ R^{p×p}} | M Σ̂ − I |_∞.

Let K ≡ max_{i ∈ [p]} Σ̂_{ii}. Then, letting λ = c σ √(log p / n), we have

    √n ( θ̂^d − θ_0 ) = Z + Δ,   Z ∼ N( 0, σ^2 M Σ̂ M^T ),

    P( ‖Δ‖_∞ ≥ (4 c μ* σ s_0 √(log p)) / φ_0^2 ) ≤ 2 p^{−c_0},   c_0 = c^2/(32 K) − 1.

Remark:

    μ* ≤ (1/n) max_{i ≠ j} | ⟨X e_i, X e_j⟩ |.
Main theorems

Theorem [Javanmard, Montanari 2013] (Random designs)
Let Σ be such that σ_min(Σ) ≥ C_min > 0, σ_max(Σ) ≤ C_max < ∞, and max_{i ∈ [p]} Σ_ii ≤ 1. Assume X Σ^{-1/2} has independent subgaussian rows with mean zero and subgaussian norm K. Letting λ = c σ √(log p / n), we have

    √n ( θ̂^d − θ_0 ) = Z + Δ,   Z | X ∼ N( 0, σ^2 M Σ̂ M^T ),

    P( ‖Δ‖_∞ ≥ (16 c σ / C_min) · (s_0 log p)/√n ) ≤ 4 e^{−c_1 n} + 4 p^{−c_2},

for some explicit constants c_1 = C(K), c_2 = C(c, K, C_min, C_max).

Remark on sample size: if n / (s_0 log p)^2 → ∞, then ‖Δ‖_∞ = o_P(1).
Consequences

Confidence intervals for single parameters:

    lim_{n→∞} P( θ_{0,i} ∈ [θ_i, θ̄_i] ) ≥ 1 − α,

    |θ̄_i − θ_i| ≤ (2 + o(1)) c_α √( σ^2 (Σ^{-1})_{ii} / n )   (even when n < p).