Hypothesis Testing for High-Dimensional Regression: Nearly Optimal Sample Size
Adel Javanmard (Stanford University / UC Berkeley)
Based on joint work with Andrea Montanari
January 2015
Outline
1. Problem definition
2. Debiasing approach
3. Hypothesis testing under nearly optimal sample size
Problem definition
Linear model

We focus on linear models:

    Y = X θ_0 + W

Y ∈ R^n (response), X ∈ R^{n×p} (design matrix), θ_0 ∈ R^p (parameters).

The noise vector W has independent entries with

    E(W_i) = 0,   E(W_i^2) = σ^2,   E(|W_i|^{2+κ}) < ∞,   for some κ > 0.
Problem

Confidence intervals: For each i ∈ {1, ..., p}, find θ_i, θ̄_i ∈ R such that

    P( θ_{0,i} ∈ [θ_i, θ̄_i] ) ≥ 1 − α.

We would like |θ̄_i − θ_i| to be as small as possible.

Hypothesis testing:

    H_{0,i}: θ_{0,i} = 0   vs.   H_{A,i}: θ_{0,i} ≠ 0.
LASSO

    θ̂ ≡ argmin_{θ ∈ R^p} { (1/(2n)) ‖y − Xθ‖_2^2 + λ ‖θ‖_1 }

[Tibshirani 1996; Chen, Donoho 1996]

Distribution of θ̂?

Debiasing approach (the LASSO is biased towards small ℓ_1 norm):

    θ̂  →  debiasing  →  θ̂^d

We characterize the distribution of θ̂^d.
Debiasing approach
Classical setting (n ≫ p)

We know everything about the least-squares estimator

    θ̂^{LS} = (1/n) Σ̂^{-1} X^T Y,

where Σ̂ ≡ (X^T X)/n is the empirical covariance.

Confidence intervals:

    [θ_i, θ̄_i] = [ θ̂_i^{LS} − c_α Δ_i,  θ̂_i^{LS} + c_α Δ_i ],   Δ_i ≡ σ √( (Σ̂^{-1})_{ii} / n ).
High-dimensional setting (n < p)

    θ̂^{LS} = (1/n) Σ̂^{-1} X^T Y

Problem in high dimension: Σ̂ is not invertible!

Take your favorite M ∈ R^{p×p}:

    θ̂* = (1/n) M X^T Y = (1/n) M X^T X θ_0 + (1/n) M X^T W
        = θ_0 + (M Σ̂ − I) θ_0 + (1/n) M X^T W
                   [bias]         [Gaussian error]
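The decomposition above is an exact algebraic identity, valid for any matrix M, which a quick numpy check confirms (synthetic data, arbitrary M; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 80  # high-dimensional regime: n < p
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:3] = [2.0, -1.0, 0.5]
W = rng.standard_normal(n)
Y = X @ theta0 + W

Sigma_hat = X.T @ X / n
M = rng.standard_normal((p, p))  # any "favorite" M works for the identity

theta_star = M @ X.T @ Y / n
# theta0 + bias term + noise term:
decomposition = theta0 + (M @ Sigma_hat - np.eye(p)) @ theta0 + M @ X.T @ W / n
```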
Debiased estimator

    θ̂* = θ_0 + (M Σ̂ − I) θ_0 + (1/n) M X^T W
               [bias]           [Gaussian error]

Let us (try to) subtract the bias:

    θ̂^u = θ̂* − (M Σ̂ − I) θ̂^{Lasso}

Debiased estimator (with θ̂ = θ̂^{Lasso}):

    θ̂^d ≡ θ̂ + (1/n) M X^T ( Y − X θ̂ )
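As a sanity check on the formula: when Σ̂ is invertible and we take M = Σ̂^{-1}, the debiasing step turns any initial estimator back into ordinary least squares. A sketch with a bare-bones ISTA solver standing in for the LASSO (an illustrative toy solver, not the authors' implementation):

```python
import numpy as np

def ista_lasso(X, y, lam, n_iter=2000):
    """Minimize (1/(2n))||y - X theta||_2^2 + lam*||theta||_1 via ISTA."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L, L = ||X||_2^2 / n
    theta = np.zeros(p)
    for _ in range(n_iter):
        z = theta - step * (X.T @ (X @ theta - y) / n)  # gradient step
        theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return theta

rng = np.random.default_rng(2)
n, p = 100, 10  # low-dimensional here, so Sigma_hat is invertible
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:2] = [3.0, -2.0]
Y = X @ theta0 + rng.standard_normal(n)

Sigma_hat = X.T @ X / n
theta_hat = ista_lasso(X, Y, lam=0.1)
M = np.linalg.inv(Sigma_hat)

# Debiasing step: theta_d = theta_hat + (1/n) M X^T (Y - X theta_hat)
theta_d = theta_hat + M @ X.T @ (Y - X @ theta_hat) / n
theta_ls = np.linalg.solve(X.T @ X, X.T @ Y)
```

With M = Σ̂^{-1} the algebra collapses: θ̂ + Σ̂^{-1} X^T Y / n − Σ̂^{-1} Σ̂ θ̂ = θ̂^{LS}, whatever the initial θ̂.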
Debiased estimator: choosing M?

    θ̂^d ≡ θ̂ + (1/n) M X^T ( y − X θ̂ )

Low-dimensional projection estimator (LDPE):
- Start with a linear estimator, debias by a nonlinear estimator
- M constructed via nodewise LASSO on X
[C.-H. Zhang, S. S. Zhang]

Approximate inverse of Σ̂: nodewise LASSO on X (under a row-sparsity assumption on Σ^{-1})
[S. van de Geer, P. Bühlmann, Y. Ritov, R. Dezeure]
Debiased estimator: choosing M?

Our approach: optimize two objectives (the bias and the variance of θ̂^d) [A. Javanmard, A. Montanari]:

    √n ( θ̂^d − θ_0 ) = √n (M Σ̂ − I)(θ_0 − θ̂) + Z
                            [bias]

    Z | X ∼ N( 0, σ^2 M Σ̂ M^T ),   Σ̂ = (1/n) X^T X
                   [noise covariance]
Debiased estimator: choosing M?

Our approach: find M by solving an optimization problem [A. Javanmard, A. Montanari]:

    minimize_M   max_{1 ≤ i ≤ p} ( M Σ̂ M^T )_{i,i}
    subject to   | M Σ̂ − I |_∞ ≤ ξ

Equivalently, row by row (m_i the i-th row of M):

    minimize_{m_i}   m_i^T Σ̂ m_i
    subject to       ‖ Σ̂ m_i − e_i ‖_∞ ≤ ξ

The optimization decouples over the rows and can be solved in parallel.
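Each row subproblem is a convex quadratic program with 2p linear constraints (the ∞-norm band around e_i). A sketch using scipy's generic SLSQP solver, which stands in here for whatever specialized QP method one would use in practice (ξ and the data are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def m_row(Sigma_hat, i, xi):
    """Row i of M: minimize m^T Sigma_hat m  s.t.  ||Sigma_hat m - e_i||_inf <= xi."""
    p = Sigma_hat.shape[0]
    e_i = np.zeros(p)
    e_i[i] = 1.0
    # |Sigma_hat m - e_i| <= xi, written as 2p linear inequalities g(m) >= 0
    cons = {"type": "ineq",
            "fun": lambda m: np.concatenate([xi - (Sigma_hat @ m - e_i),
                                             xi + (Sigma_hat @ m - e_i)])}
    m_start = np.linalg.solve(Sigma_hat, e_i)  # feasible start: residual is zero
    res = minimize(lambda m: m @ Sigma_hat @ m, m_start,
                   method="SLSQP", constraints=cons)
    return res.x

rng = np.random.default_rng(3)
n, p, xi = 200, 5, 0.1
X = rng.standard_normal((n, p))
Sigma_hat = X.T @ X / n
m0 = m_row(Sigma_hat, 0, xi)  # rows can be computed independently, in parallel
```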
Main theorems

Theorem [Javanmard, Montanari 2013] (Deterministic designs)
Let X be any deterministic design that satisfies the compatibility condition for the set S = supp(θ_0), |S| ≤ s_0, with constant φ_0. Further define the coherence parameter

    μ* ≡ min_{M ∈ R^{p×p}} | M Σ̂ − I |_∞.

Let K ≡ max_{i ∈ [p]} Σ̂_{ii}. Then, letting λ = c σ √(log p / n), we have

    √n ( θ̂^d − θ_0 ) = Z + Δ,   Z ∼ N( 0, σ^2 M Σ̂ M^T ),

    P( ‖Δ‖_∞ ≥ (4 c μ* σ s_0 √(log p)) / φ_0^2 ) ≤ 2 p^{−c_0},   c_0 = c^2/(32 K) − 1.

Remark:

    μ* ≤ (1/n) max_{i ≠ j} | ⟨X e_i, X e_j⟩ |.
Main theorems

Theorem [Javanmard, Montanari 2013] (Random designs)
Let Σ be such that σ_min(Σ) ≥ C_min > 0, σ_max(Σ) ≤ C_max < ∞, and max_{i ∈ [p]} Σ_ii ≤ 1. Assume X Σ^{-1/2} has independent subgaussian rows with mean zero and subgaussian norm K. Letting λ = c σ √(log p / n), we have

    √n ( θ̂^d − θ_0 ) = Z + Δ,   Z | X ∼ N( 0, σ^2 M Σ̂ M^T ),

    P( ‖Δ‖_∞ ≥ (16 c σ / C_min) · (s_0 log p)/√n ) ≤ 4 e^{−c_1 n} + 4 p^{−c_2},

for some explicit constants c_1 = C(K), c_2 = C(c, K, C_min, C_max).

Remark on sample size: if n / (s_0 log p)^2 → ∞, then ‖Δ‖_∞ = o_P(1).
Consequences

Confidence intervals for single parameters:

    lim_{n→∞} P( θ_{0,i} ∈ [θ_i, θ̄_i] ) ≥ 1 − α,

    |θ̄_i − θ_i| ≤ (2 + o(1)) c_α √( σ^2 (Σ^{-1})_{ii} / n )   (even when n < p).