De-biasing the Lasso: Optimal Sample Size for Gaussian Designs


  1. De-biasing the Lasso: Optimal Sample Size for Gaussian Designs. Adel Javanmard, USC Marshall School of Business, Data Science and Operations Department. Based on joint work with Andrea Montanari. October 2015.

  2. An example. Kaggle challenge: identify patients diagnosed with type-2 diabetes.

  3. Statistical model. Data $(Y_1, X_1), \ldots, (Y_n, X_n)$: $Y_i \in \{0, 1\}$ indicates whether patient $i$ gets type-2 diabetes, and $X_i \in \mathbb{R}^p$ collects the features of patient $i$. We model $Y_i \sim f_{\theta_0}(\cdot \mid X_i)$ with parameter vector $\theta_0 \in \mathbb{R}^p$, so that $\theta_{0,j}$ is the contribution of feature $j$.


  6. Regularized estimator: $\hat{\theta} \equiv \arg\min_{\theta \in \mathbb{R}^p} \big\{ \underbrace{\mathcal{L}(\theta)}_{\text{logistic loss}} + \underbrace{\lambda \|\theta\|_1}_{\text{regularizer}} \big\}$. This is a convex optimization problem, and it performs variable selection.
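
As a concrete illustration, here is a minimal sketch of this estimator, assuming scikit-learn; the data below are synthetic stand-ins (the dimensions mirror the Kaggle example on the next slide, but the data themselves are simulated):

```python
# A minimal sketch of l1-regularized logistic regression on simulated data.
# Assumes scikit-learn; the data are synthetic stand-ins, not the Kaggle set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 500, 805                         # dimensions mirror the example below
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:10] = 1.0                       # ten truly relevant features
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta0)))

# scikit-learn uses the inverse penalty C (roughly C ~ 1 / (n * lambda)):
# smaller C means stronger l1 shrinkage and fewer selected features.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
print("selected features:", int(np.sum(clf.coef_ != 0)))
```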

  7. Practice Fusion data set (Kaggle). Database: $n = 500$ patients, $p = 805$ features of medical information (medications, lab results, diagnoses, ...).

  8. [Figure: estimated coefficients $\hat{\theta}$ plotted against feature index (0 to 800); blood pressure, bilirubin, and globulin receive large positive coefficients, while HDL cholesterol and year of birth receive large negative ones.] Regularized logistic regression selects 62 features ($\lambda$ chosen via cross-validation; resulting AUC = 0.75). Shall we trust our findings?
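
The cross-validated choice of $\lambda$ can be mimicked as follows; a hedged sketch using scikit-learn's `LogisticRegressionCV` on the synthetic data from the previous snippet, not the (non-public) Practice Fusion set:

```python
# Sketch: pick the penalty by cross-validation, scored by AUC, as on the
# slide. Reuses X, y from the previous snippet; assumes scikit-learn.
from sklearn.linear_model import LogisticRegressionCV

cv_clf = LogisticRegressionCV(
    Cs=10,                  # grid of ten candidate inverse penalties
    penalty="l1",
    solver="liblinear",
    scoring="roc_auc",      # the real-data slide reports AUC = 0.75
    cv=5,
)
cv_clf.fit(X, y)
print("chosen C:", cv_clf.C_[0])
print("selected features:", int(np.sum(cv_clf.coef_ != 0)))
```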


  10. In summary. We will focus on the linear model and the Lasso, and compute confidence intervals / p-values.

  11. Outline: (1) problem definition; (2) debiasing approach; (3) hypothesis testing under nearly optimal sample size.

  12. Problem definition.

  13. Linear model. We focus on linear models: $Y = X\theta_0 + W$, with $Y \in \mathbb{R}^n$ (response), $X \in \mathbb{R}^{n \times p}$ (design matrix), and $\theta_0 \in \mathbb{R}^p$ (parameters). The noise vector $W$ has independent entries with $\mathbb{E}(W_i) = 0$, $\mathbb{E}(W_i^2) = \sigma^2$, and $\mathbb{E}(|W_i|^{2+\kappa}) < \infty$ for some $\kappa > 0$.
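
A minimal simulation of this model, assuming NumPy; the scaled Student-t noise below is one concrete choice satisfying the moment conditions:

```python
# A minimal simulation of the linear model Y = X theta_0 + W. The noise is
# scaled Student-t with 5 degrees of freedom: independent, mean zero,
# variance sigma^2, and E|W_i|^{2+kappa} < infinity for any kappa < 3.
import numpy as np

rng = np.random.default_rng(1)
n, p, s0, sigma = 300, 600, 10, 1.0
X = rng.standard_normal((n, p))          # Gaussian design (here Sigma = I)
theta0 = np.zeros(p)
theta0[:s0] = 1.5                        # s0-sparse parameter vector
df = 5
W = sigma * np.sqrt((df - 2) / df) * rng.standard_t(df, size=n)  # Var = sigma^2
Y = X @ theta0 + W
```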

  14. Problem. Confidence intervals: for each $i \in \{1, \ldots, p\}$, find $\underline{\theta}_i, \overline{\theta}_i \in \mathbb{R}$ such that $\mathbb{P}\big(\theta_{0,i} \in [\underline{\theta}_i, \overline{\theta}_i]\big) \ge 1 - \alpha$; we would like $|\overline{\theta}_i - \underline{\theta}_i|$ to be as small as possible. Hypothesis testing: $H_{0,i}: \theta_{0,i} = 0$ versus $H_{A,i}: \theta_{0,i} \neq 0$.
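
Both targets reduce to the same computation once a coordinate estimate is approximately Gaussian; the hypothetical helper below (assuming SciPy) makes this explicit:

```python
# Sketch: a hypothetical helper turning an (approximately) Gaussian estimate
# with standard error `se` into the interval and two-sided p-value above.
from scipy.stats import norm

def ci_and_pvalue(est, se, alpha=0.05):
    """Interval with coverage ~ 1 - alpha, and p-value for H_0: theta_0i = 0."""
    z = norm.ppf(1 - alpha / 2)
    pval = 2 * norm.sf(abs(est) / se)
    return (est - z * se, est + z * se), pval
```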

  15. LASSO. $\hat{\theta} \equiv \arg\min_{\theta \in \mathbb{R}^p} \big\{ \frac{1}{2n} \|y - X\theta\|_2^2 + \lambda \|\theta\|_1 \big\}$ [Tibshirani 1996; Chen, Donoho 1996]. What is the distribution of $\hat{\theta}$? Debiasing approach (the LASSO is biased towards small $\ell_1$ norm): map $\hat{\theta} \xrightarrow{\text{debiasing}} \hat{\theta}^{\mathrm{d}}$. We characterize the distribution of $\hat{\theta}^{\mathrm{d}}$.
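
Conveniently, scikit-learn's `Lasso` minimizes exactly the $\frac{1}{2n}$-scaled objective above, with its `alpha` playing the role of $\lambda$. A sketch, reusing the simulated data from the linear-model snippet; the particular $\lambda$ is a common theoretical choice, not prescribed by the slide:

```python
# Sketch: scikit-learn's Lasso minimizes exactly the objective above,
#   ||y - X theta||_2^2 / (2 n) + alpha * ||theta||_1,
# so `alpha` plays the role of lambda. Reuses X, Y, sigma from the
# linear-model snippet.
from sklearn.linear_model import Lasso

lam = 2 * sigma * np.sqrt(np.log(p) / n)
theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, Y).coef_
print("nonzero coordinates:", int(np.sum(theta_hat != 0)))
```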

  18. Debiasing approach.

  19. Classical setting ($n \gg p$). We know everything about the least-squares estimator $\hat{\theta}^{\mathrm{LS}} = \frac{1}{n} \hat{\Sigma}^{-1} X^T Y$, where $\hat{\Sigma} \equiv (X^T X)/n$ is the empirical covariance. Confidence intervals: $[\underline{\theta}_i, \overline{\theta}_i] = [\hat{\theta}^{\mathrm{LS}}_i - c_\alpha \Delta_i, \, \hat{\theta}^{\mathrm{LS}}_i + c_\alpha \Delta_i]$ with $\Delta_i \equiv \sigma \sqrt{(\hat{\Sigma}^{-1})_{ii} / n}$.
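
A quick simulated check of these classical intervals, assuming NumPy and SciPy; variable names are ad hoc:

```python
# Sketch: classical least squares with per-coordinate intervals
# Delta_i = sigma * sqrt((Sigma_hat^{-1})_{ii} / n); c_alpha is the normal
# quantile. Self-contained simulation with n >> p.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n_ls, p_ls, sigma_ls = 2000, 20, 1.0
X_ls = rng.standard_normal((n_ls, p_ls))
theta_ls_true = rng.standard_normal(p_ls)
Y_ls = X_ls @ theta_ls_true + sigma_ls * rng.standard_normal(n_ls)

Sigma_hat_ls = X_ls.T @ X_ls / n_ls
Sigma_inv_ls = np.linalg.inv(Sigma_hat_ls)
theta_LS = Sigma_inv_ls @ X_ls.T @ Y_ls / n_ls

c_alpha = norm.ppf(0.975)                      # alpha = 0.05
delta = sigma_ls * np.sqrt(np.diag(Sigma_inv_ls) / n_ls)
covered = (theta_ls_true >= theta_LS - c_alpha * delta) & \
          (theta_ls_true <= theta_LS + c_alpha * delta)
print("empirical coverage:", covered.mean())   # should be close to 0.95
```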

  21. High-dimensional setting ($n < p$). For $\hat{\theta}^{\mathrm{LS}} = \frac{1}{n} \hat{\Sigma}^{-1} X^T Y$, the problem in high dimension is that $\hat{\Sigma}$ is not invertible! Take your favorite $M \in \mathbb{R}^{p \times p}$: $\hat{\theta}^* = \frac{1}{n} M X^T Y = \frac{1}{n} M X^T X \theta_0 + \frac{1}{n} M X^T W = \theta_0 + \underbrace{(M\hat{\Sigma} - I)\theta_0}_{\text{bias}} + \underbrace{\tfrac{1}{n} M X^T W}_{\text{Gaussian error}}$.
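
The decomposition holds exactly for any $M$, which a few lines of NumPy can verify; the ridge-type $M$ below is an arbitrary illustrative choice, not the one developed later:

```python
# Sketch: with n < p the empirical covariance is singular, yet the exact
# decomposition theta_star = theta0 + (M Sigma_hat - I) theta0 + M X^T W / n
# holds for *any* M. Reuses X, Y, W, theta0 from the linear-model snippet;
# the ridge-type M here is an arbitrary illustrative choice.
Sigma_hat = X.T @ X / n
print("rank(Sigma_hat) =", np.linalg.matrix_rank(Sigma_hat), "< p =", p)

M = np.linalg.inv(Sigma_hat + 0.1 * np.eye(p))     # hypothetical M
theta_star = M @ X.T @ Y / n
decomp = theta0 + (M @ Sigma_hat - np.eye(p)) @ theta0 + M @ X.T @ W / n
print("max |theta_star - decomposition| =", np.max(np.abs(theta_star - decomp)))
```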

  23. Debiased estimator. Starting from $\hat{\theta}^* = \theta_0 + \underbrace{(M\hat{\Sigma} - I)\theta_0}_{\text{bias}} + \underbrace{\tfrac{1}{n} M X^T W}_{\text{Gaussian error}}$, let us (try to) subtract the bias: $\hat{\theta}^{\mathrm{d}} = \hat{\theta}^* - (M\hat{\Sigma} - I)\hat{\theta}^{\mathrm{Lasso}}$. This gives the debiased estimator (with $\hat{\theta} = \hat{\theta}^{\mathrm{Lasso}}$): $\hat{\theta}^{\mathrm{d}} \equiv \hat{\theta} + \frac{1}{n} M X^T (Y - X\hat{\theta})$.
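
The estimator is a one-line correction of the Lasso; a sketch reusing the Lasso fit and the illustrative $M$ from the previous snippets:

```python
# Sketch: the debiased estimator as a one-line correction of the Lasso.
def debias(theta_hat, X, Y, M):
    """theta_d = theta_hat + M X^T (Y - X theta_hat) / n."""
    return theta_hat + M @ X.T @ (Y - X @ theta_hat) / X.shape[0]

theta_d = debias(theta_hat, X, Y, M)   # Lasso fit and illustrative M above
```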

  26. Debiased estimator: choosing $M$? $\hat{\theta}^{\mathrm{d}} \equiv \hat{\theta} + \frac{1}{n} M X^T (y - X\hat{\theta})$. Gaussian design ($x_i \sim \mathsf{N}(0, \Sigma)$): assume $\Sigma$ is known (relevant in semi-supervised learning) and set $M = \Sigma^{-1}$ [Javanmard, Montanari 2012]. Does this remind you of anything? $\hat{\theta}^{\mathrm{d}} \equiv \hat{\theta} + \Sigma^{-1} \cdot \frac{1}{n} X^T (y - X\hat{\theta})$ is a (pseudo-)Newton step. Alternatively, build an approximate inverse of $\hat{\Sigma}$ via nodewise LASSO on $X$ (under a row-sparsity assumption on $\Sigma^{-1}$) [van de Geer, Bühlmann, Ritov, Dezeure 2014].
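
A sketch of the known-covariance prescription $M = \Sigma^{-1}$ on a simulated AR(1) Gaussian design; the design and all parameters below are assumptions for illustration:

```python
# Sketch: known-covariance choice M = Sigma^{-1} on a simulated AR(1)
# Gaussian design. All parameters are assumptions for illustration.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p, s0, sigma, rho = 400, 800, 10, 1.0, 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T  # x_i ~ N(0, Sigma)
theta0 = np.zeros(p)
theta0[:s0] = 1.0
Y = X @ theta0 + sigma * rng.standard_normal(n)

lam = 2 * sigma * np.sqrt(np.log(p) / n)
theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, Y).coef_

M = np.linalg.inv(Sigma)                       # the slide's choice
theta_d = theta_hat + M @ X.T @ (Y - X @ theta_hat) / n

# Debiasing removes the l1 shrinkage bias on the true support:
print("mean error on support, Lasso:   ", np.mean(theta_hat[:s0] - theta0[:s0]))
print("mean error on support, debiased:", np.mean(theta_d[:s0] - theta0[:s0]))
```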

  30. Debiased estimator: choosing $M$? Our approach: optimize the two objectives, the bias and the variance of $\hat{\theta}^{\mathrm{d}}$ [Javanmard, Montanari 2014]. Write $\sqrt{n}(\hat{\theta}^{\mathrm{d}} - \theta_0) = \underbrace{\sqrt{n}(M\hat{\Sigma} - I)(\theta_0 - \hat{\theta})}_{\text{bias}} + Z$, where $Z \mid X \sim \mathsf{N}(0, \sigma^2 \underbrace{M\hat{\Sigma}M^T}_{\text{noise covariance}})$ and $\hat{\Sigma} = \frac{1}{n} X^T X$. Find $M$ by solving an optimization problem: minimize $\max_{1 \le i \le p} (M\hat{\Sigma}M^T)_{i,i}$ subject to $|M\hat{\Sigma} - I|_\infty \le \xi$. Equivalently, row by row: minimize $m_i^T \hat{\Sigma} m_i$ subject to $\|\hat{\Sigma} m_i - e_i\|_\infty \le \xi$. The optimization decouples across rows and can be solved in parallel.
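
One row of this program can be handed to a generic convex solver; the sketch below uses cvxpy (an assumption: the slide does not specify a solver), reusing the design from the previous snippet, with a typical choice of the constraint level $\xi$:

```python
# Sketch: one row of the program above, handed to a generic convex solver
# (cvxpy; the slide does not specify a solver). Reuses X, n, p from the
# previous snippet. Rows are independent, so they can be solved in parallel.
import cvxpy as cp

Sigma_hat = X.T @ X / n
xi = np.sqrt(np.log(p) / n)          # constraint level; a typical scaling
i = 0
e_i = np.zeros(p)
e_i[i] = 1.0

m = cp.Variable(p)
prob = cp.Problem(
    cp.Minimize(cp.sum_squares(X @ m) / n),            # = m^T Sigma_hat m
    [cp.norm(Sigma_hat @ m - e_i, "inf") <= xi],
)
prob.solve()
m_i = m.value                        # i-th row of M
```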

  33. What does it look like? [Figure: empirical density of the debiased coordinates $\hat{\theta}^{\mathrm{d}}_i$, which is approximately Gaussian.] We can also estimate $\sigma$. 'Ground truth' computed from $n_{\mathrm{tot}} = 10{,}000$ records.
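
Given the preceding snippets, plug-in intervals and p-values follow from the Gaussian limit $\hat{\theta}^{\mathrm{d}}_i \approx \mathsf{N}\big(\theta_{0,i}, \sigma^2 (M\hat{\Sigma}M^T)_{ii}/n\big)$; $\sigma$ is treated as known here, though (as the slide notes) it can be estimated:

```python
# Sketch: plug-in intervals and p-values from the Gaussian limit of the
# debiased estimator. Reuses theta_d, M, Sigma_hat, sigma, n, s0 from the
# snippets above; sigma is treated as known, though it can be estimated.
from scipy.stats import norm

se = sigma * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)  # per-coordinate s.e.
pvals = 2 * norm.sf(np.abs(theta_d) / se)               # tests of H_{0,i}
c = norm.ppf(0.975)                                     # alpha = 0.05
ci_lower, ci_upper = theta_d - c * se, theta_d + c * se
# Under the nulls (coordinates past s0) about 5% of p-values fall below 0.05:
print("false positive rate:", np.mean(pvals[s0:] < 0.05))
```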
