Ultrahigh dimensional variable selection: Beyond the linear model - PowerPoint PPT Presentation

  1. Ultrahigh dimensional variable selection: Beyond the linear model. Jianqing Fan, Princeton University. With Richard Samworth and Yichao Wu; Rui Song. http://www.princeton.edu/~jqfan. May 16, 2009.

  2. Outline: 1. Introduction; 2. Large-scale screening; 3. Moderate-scale selection; 4. Iterative feature selection; 5. Numerical studies.

  3. Introduction

  4. Introduction. High-dimensional variable selection characterizes many contemporary statistical problems: bioinformatics (disease classification using microarray, proteomics, and fMRI data), document or text classification (e.g., e-mail spam), and association studies between phenotypes and SNPs.

  5. Growth of dimensionality. Dimensionality grows rapidly with interactions. Portfolio selection and network modeling: 2,000 stocks involve over 2 million unknown parameters in the covariance matrix. Gene-gene interaction: pairwise interactions among 5,000 genes result in about 12.5 million features.
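To make the counts concrete, here is a minimal sketch (Python is my choice here, not part of the talk) reproducing the arithmetic behind the two examples above:

```python
from math import comb

# Covariance matrix of p stocks: p variances plus p*(p-1)/2 distinct covariances.
p_stocks = 2000
cov_params = p_stocks + comb(p_stocks, 2)
print(cov_params)      # 2001000 -> over 2 million unknown parameters

# Pairwise gene-gene interactions among p genes.
p_genes = 5000
pairs = comb(p_genes, 2)
print(pairs)           # 12497500 -> about 12.5 million interaction features
```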

  6. Aims of high-dimensional regression and classification (Bickel, 2008, discussion of the SIS paper, JRSS-B): (i) to construct as effective a method as possible to predict future observations; (ii) to gain insight into the relationship between features and response for scientific purposes, as well as, hopefully, to construct an improved prediction method.

  7. Challenges with ultrahigh dimensionality: computational cost, estimation accuracy, and stability. Key idea: large-scale screening followed by moderate-scale searching.

  8. Large-scale screening

  9. Independence learning. Regression: feature ranking by correlation learning (Fan and Lv, 2008, JRSS-B); when Y = ±1, this is essentially ranking by two-sample t-statistics. Classification: feature ranking by two-sample t-tests or other tests (Tibshirani et al., 2003; Fan and Fan, 2008). SIS: with an appropriate threshold (e.g., keeping n variables), the relevant features are contained in the selected set (Fan and Lv, 2008), relying on a joint-normality assumption. Other independence learning: Hall, Titterington and Xue (2009) derive such a method from an empirical likelihood point of view.
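A minimal sketch of these two screening rules, assuming synthetic data and a cutoff of d = n retained features (numpy/scipy; the sample size, dimension, and threshold are illustrative, not values from the talk):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 200, 10_000                                    # illustrative sizes
X = rng.standard_normal((n, p))
y = X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)  # two truly relevant features

# Regression: correlation learning -- rank by |marginal correlation|, keep top d.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()
corr = np.abs(Xs.T @ ys) / n
d = n                                                 # "e.g., n variables"
screened = np.argsort(corr)[::-1][:d]

# Classification: with labels in {-1, +1}, rank by |two-sample t-statistics|.
labels = np.sign(y)
t = stats.ttest_ind(X[labels > 0], X[labels < 0], axis=0).statistic
screened_cls = np.argsort(np.abs(t))[::-1][:d]
```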

  12. Model setting. GLIM: $f_Y(y \mid X = x; \theta) = \exp\{(y\theta - b(\theta))/\phi + c(y, \phi)\}$ with canonical link $b'^{-1}(\mu) = \theta = x^T\beta$. Objective: find a sparse $\beta$ to minimize $Q(\beta) = \sum_{i=1}^n L(Y_i, x_i^T\beta)$. GLIM: $L(Y_i, x_i^T\beta) = b(x_i^T\beta) - Y_i x_i^T\beta$. Classification ($Y = \pm 1$): SVM hinge loss $L(Y_i, x_i^T\beta) = (1 - Y_i x_i^T\beta)_+$; AdaBoost exponential loss $L(Y_i, x_i^T\beta) = \exp(-Y_i x_i^T\beta)$. Robustness: $L(Y_i, x_i^T\beta) = |Y_i - x_i^T\beta|$.
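As an illustration, the losses above can be written out as follows (a hypothetical sketch; the function names and the logistic choice of b are my own, not from the talk):

```python
import numpy as np

def glim_loss(y, eta, b=lambda t: np.logaddexp(0.0, t)):
    """GLIM loss b(x'beta) - y * x'beta; the default b is the logistic
    cumulant log(1 + exp(t)), i.e. Bernoulli responses with canonical link."""
    return b(eta) - y * eta

def hinge_loss(y, eta):
    """SVM hinge loss (1 - y * x'beta)_+ for labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * eta)

def exp_loss(y, eta):
    """AdaBoost exponential loss exp(-y * x'beta) for labels y in {-1, +1}."""
    return np.exp(-y * eta)

def abs_loss(y, eta):
    """Robust absolute-deviation loss |y - x'beta|."""
    return np.abs(y - eta)
```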

  14. Questions. 1. How can discrete variables be screened (e.g., in genome-wide association studies)? 2. Does such screening have the sure screening property? 3. How large must the selected model be for sure screening to hold? The arguments in Fan and Lv (2008) cannot be applied here.

  17. Independence learning. Marginal utility: letting $\hat{L}_0 = \min_{\beta_0} n^{-1}\sum_{i=1}^n L(Y_i, \beta_0)$, define $\hat{L}_j = \hat{L}_0 - \min_{\beta_0,\beta_j} n^{-1}\sum_{i=1}^n L(Y_i, \beta_0 + X_{ij}\beta_j)$ (a Wilks-type statistic), or use the marginal coefficient $\hat{\beta}_j^M$ (Wald-type), assuming $E X_j^2 = 1$. Feature ranking: select the features with the largest marginal utilities, $\widehat{\mathcal{M}}_{\nu_n} = \{j : \hat{L}_j \ge \nu_n\}$ or $\widehat{\mathcal{M}}^w_{\gamma_n} = \{j : |\hat{\beta}_j^M| \ge \gamma_n\}$. Dimensionality reduction: from $p_n = O(\exp(n^a))$ down to $O(n^b)$ (e.g., from 10,000 features to 200).
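A sketch of the Wilks-type marginal utility screening, assuming a logistic GLIM loss and numerically optimized marginal fits (the helper names and data sizes are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def logistic_loss(y, eta):
    # GLIM loss b(eta) - y*eta with b(t) = log(1 + exp(t)), y in {0, 1}.
    return np.mean(np.logaddexp(0.0, eta) - y * eta)

def marginal_utility(y, x):
    """Wilks-type utility: L_hat_0 minus the best marginal fit using feature x."""
    L0 = minimize_scalar(lambda b0: logistic_loss(y, np.full(len(y), b0))).fun
    Lj = minimize(lambda b: logistic_loss(y, b[0] + b[1] * x), x0=np.zeros(2)).fun
    return L0 - Lj

rng = np.random.default_rng(1)
n, p = 200, 1000                                   # illustrative sizes
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)           # so that each E X_j^2 = 1
prob = 1.0 / (1.0 + np.exp(-(X[:, 0] - X[:, 1])))
y = (rng.random(n) < prob).astype(float)

utilities = np.array([marginal_utility(y, X[:, j]) for j in range(p)])
selected = np.argsort(utilities)[::-1][:n]         # keep the n largest utilities
```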

  19. Theoretical basis, population aspect I. Marginal utility: $L_j^\star = E\,\ell(Y, \beta_0^M) - \min_{\beta_0,\beta_j} E\,\ell(Y, \beta_0 + \beta_j X_j)$, a likelihood-ratio-type quantity (Fan and Song, 2009). Theorem 1: $L_j^\star = 0 \iff \beta_j^M = 0 \iff \mathrm{cov}(Y, X_j) = \mathrm{cov}(b'(X^T\beta^\star), X_j) = 0$. For Gaussian covariates, the conclusion holds if $|\mathrm{cov}(X^T\beta^\star, X_j)| = 0$, i.e., under independence.
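Spelled out, the population quantities behind Theorem 1 are (a reconstruction of the slide's display; the explicit argmin notation is mine):

```latex
\[
  \beta_0^M = \arg\min_{\beta_0} E\,\ell(Y,\beta_0),
  \qquad
  (\beta_{0,j}^M,\,\beta_j^M) = \arg\min_{\beta_0,\beta_j} E\,\ell(Y,\beta_0+\beta_j X_j),
\]
\[
  L_j^{\star} = E\,\ell(Y,\beta_0^M) - E\,\ell(Y,\beta_{0,j}^M+\beta_j^M X_j),
\]
\[
  \text{Theorem 1:}\qquad
  L_j^{\star}=0
  \;\Longleftrightarrow\;
  \beta_j^M=0
  \;\Longleftrightarrow\;
  \operatorname{cov}(Y,X_j)=\operatorname{cov}\bigl(b'(X^{\top}\beta^{\star}),X_j\bigr)=0 .
\]
```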

  20. Theoretical basis, population aspect II. True model: $\mathcal{M}_\star = \{j : \beta_j^\star \ne 0\}$, where $\beta^\star = \arg\min E\,L(Y, X^T\beta)$. Theorem 2: if $|\mathrm{cov}(b'(X^T\beta^\star), X_j)| \ge c_1 n^{-\kappa}$ for $j \in \mathcal{M}_\star$, then $\min_{j\in\mathcal{M}_\star} |\beta_j^M| \ge c_1 n^{-\kappa}$ and $\min_{j\in\mathcal{M}_\star} L_j^\star \ge c_2 n^{-2\kappa}$. If $\{X_j, j \notin \mathcal{M}_\star\}$ is independent of $\{X_i, i \in \mathcal{M}_\star\}$, then $L_j^\star = 0$ for $j \notin \mathcal{M}_\star$. For Gaussian covariates, the conclusion holds if $|\mathrm{cov}(X^T\beta^\star, X_j)| \ge c_1 n^{-\kappa}$, the minimum condition even for least squares.
