Variable Inclusion and Shrinkage Algorithm in High Dimension
A. Mkhadri and M. Ouhourane
Faculty of Sciences-Semlalia, Marrakech
19th International Conference on Computational Statistics, August 22nd-27th, 2010

Table of contents
1. Introduction and motivation
2. The regularization methods for linear regression
3. VISA NET algorithm
4. Theoretical Results
5. Numerical experiments

Introduction and motivation
We consider the standard linear regression model $y = X\beta + \varepsilon$, where
- $y \in \mathbb{R}^n$ is the response;
- $X$ is the $n \times p$ model matrix, whose columns $x_j \in \mathbb{R}^n$, $j = 1, \dots, p$, are the predictors;
- $\beta$ is a $p$-vector of unknown parameters to be estimated;
- $\varepsilon$ is an $n$-vector of i.i.d. random errors with mean 0 and variance $\sigma^2$.

Introduction and motivation
OLS: $\hat\beta^{OLS} = \arg\min_{\beta} \|y - X\beta\|_2^2$.
Two alternative classes of methods:
- Classical variable selection
  - Stepwise regression
  - Information criteria: AIC, BIC
- Regularization methods

LASSO
Definition
$\hat\beta^{Lasso} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$.

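To illustrate how the $\ell_1$ penalty both shrinks coefficients and sets some of them exactly to zero, here is a minimal sketch for the special case of an orthonormal design ($X^T X = I$), which is an assumption made only for this example. In that case the Lasso criterion above separates by coordinate and each OLS coefficient is soft-thresholded at $\lambda/2$. The function names and the numbers are illustrative, not taken from the slides.

```python
def soft_threshold(b, t):
    """Shrink b toward zero by t; return exactly 0.0 when |b| <= t."""
    if b > t:
        return b - t
    if b < -t:
        return b + t
    return 0.0


def lasso_orthonormal(beta_ols, lam):
    """Minimizer of ||y - X b||_2^2 + lam * ||b||_1 when X^T X = I.

    beta_ols holds the OLS coefficients X^T y (assumed orthonormal design).
    """
    return [soft_threshold(b, lam / 2.0) for b in beta_ols]


# Illustrative OLS coefficients (hypothetical values).
beta_ols = [3.0, -1.5, 0.4, -0.2]
beta_lasso = lasso_orthonormal(beta_ols, lam=1.0)
print(beta_lasso)
```

Large coefficients are pulled toward zero by $\lambda/2$ and small ones are removed entirely, so a single $\lambda$ controls both selection and shrinkage.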
LASSO
Advantages
- Reduces the variability of the estimates by shrinking the coefficients.
- Produces interpretable models by shrinking some coefficients to exactly zero.
Disadvantages
- In high dimension, the Lasso selects at most n variables.
- It tends to select only one variable from a group of highly correlated variables.
- The same tuning parameter is used for both variable selection and shrinkage.

ELASTIC NET
Definition
$\hat\beta^{NaiveEnet} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$.
$\hat\beta^{Enet} = (1 + \lambda_2)\, \hat\beta^{NaiveEnet}$.

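The role of the factor $(1 + \lambda_2)$ can be seen in a small sketch, again assuming an orthonormal design ($X^T X = I$) purely for illustration. Under that assumption the naive elastic net soft-thresholds each OLS coefficient at $\lambda_1/2$ and then divides by $(1 + \lambda_2)$, so the rescaling removes the second, ridge-type shrinkage. Function names and values are hypothetical.

```python
def soft_threshold(b, t):
    """Shrink b toward zero by t; return exactly 0.0 when |b| <= t."""
    if b > t:
        return b - t
    if b < -t:
        return b + t
    return 0.0


def naive_enet_orthonormal(beta_ols, lam1, lam2):
    """Naive elastic net solution when X^T X = I (assumed for this sketch)."""
    return [soft_threshold(b, lam1 / 2.0) / (1.0 + lam2) for b in beta_ols]


def enet_orthonormal(beta_ols, lam1, lam2):
    """Elastic net: the naive solution rescaled by (1 + lam2)."""
    naive = naive_enet_orthonormal(beta_ols, lam1, lam2)
    return [(1.0 + lam2) * b for b in naive]


beta_ols = [3.0, -1.5, 0.4]  # hypothetical OLS coefficients
naive = naive_enet_orthonormal(beta_ols, lam1=1.0, lam2=1.0)
enet = enet_orthonormal(beta_ols, lam1=1.0, lam2=1.0)
print(naive, enet)
```

In this special case the rescaled estimate coincides with the soft-thresholded OLS coefficients, while the naive estimate is shrunk twice.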
ELASTIC NET
Advantages
- Encourages a grouping effect.
- No limitation on the number of variables that may be selected for the model.
Disadvantages
- One must choose between over-shrinking the correct variables and selecting a number of noise variables.
- If some significant variables are ignored, it is not possible to restore them.

VISA
Definition
- Select the first set of variables using the LASSO (starting point $\beta_{\lambda}^{(0)}$).
- Remove the over-shrinkage on this set and simultaneously detect another set of significant variables.
- Remove the over-shrinkage on the latter set of variables.

VISA
Advantages
- Selects sparse models while avoiding over-shrinkage problems.
Disadvantages
- It does not ensure the grouping effect.
- The number of variables in the starting point is limited by the number of observations n.

VISA NET algorithm
Definition
- Select the first set of variables using the Naive-Enet (starting point $\beta_{\lambda_1,\lambda_2}^{(0)}$).
- Remove the over-shrinkage on this set and simultaneously detect another set of significant variables.
- Remove the over-shrinkage on the latter set of variables.

VISA NET algorithm
Lemma 1: Given a data set $(y, X)$ and $(\lambda_1, \lambda_2, \phi)$, define an artificial data set by
$$X^*_{(n+p) \times p} = \begin{pmatrix} X \\ \sqrt{\lambda_2}\, I \end{pmatrix}, \qquad y^*_{(n+p)} = \begin{pmatrix} y \\ 0 \end{pmatrix}.$$
Then the VISA ENET is equivalent to a VISA Lars problem on the augmented data set.

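The idea behind the augmented data set can be checked numerically. The sketch below assumes the augmentation stacks $\sqrt{\lambda_2}\, I$ under $X$ and $p$ zeros under $y$ (a reading of the lemma, with no extra rescaling), and verifies that the residual sum of squares on the augmented data equals the original residual sum of squares plus the ridge penalty $\lambda_2 \|\beta\|_2^2$. The helper names and the toy data are hypothetical.

```python
import math


def augment(X, y, lam2):
    """Stack sqrt(lam2) * I under X and p zeros under y (assumed construction)."""
    p = len(X[0])
    root = math.sqrt(lam2)
    identity_rows = [[root if i == j else 0.0 for j in range(p)] for i in range(p)]
    X_aug = [list(row) for row in X] + identity_rows
    y_aug = list(y) + [0.0] * p
    return X_aug, y_aug


def rss(X, y, beta):
    """Residual sum of squares ||y - X beta||_2^2."""
    return sum(
        (yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    )


# Toy data with n = 3 observations and p = 2 predictors (hypothetical values).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 2.0, 3.0]
beta = [0.5, -0.25]
lam2 = 2.0

X_aug, y_aug = augment(X, y, lam2)
lhs = rss(X_aug, y_aug, beta)
rhs = rss(X, y, beta) + lam2 * sum(b * b for b in beta)
print(lhs, rhs)
```

Because the augmented matrix has $n + p$ rows, an $\ell_1$-type procedure run on it is no longer restricted to at most $n$ selected variables.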
VISA-Net
Advantages
- It can select more than n variables.
- In the starting set, it can select groups of highly correlated variables.
- The over-shrinkage of the coefficients and the number of noise variables can be decreased.

Theoretical Results
We show that VISA ENET has non-asymptotic bounds on its estimation error.
Given an index set $j \subset \{1, \dots, p\}$, let $X^*_j$ be the columns of $X^*$ indexed by $j$, and let $\psi(k)$ denote the smallest eigenvalue over the matrices $\{X_j^{*T} X^*_j,\ |j| \le k\}$.
Theorem 1. Suppose that $\beta \in \mathbb{R}^p$ is an $S$-sparse coefficient vector. Consider an $a > 0$, and define $\tau_p = \sigma \sqrt{2(1 + a)\log p}$. If $\hat\beta$ is a VISA estimator with $k$ non-zero coefficients $\hat\beta_j$ for which $\beta_j = 0$, and $\lambda_\infty = \|X^T (Y - X\hat\beta)\|_\infty$, then
$$P\left( \|\hat\beta - \beta\|_2 > \frac{\lambda_\infty + \tau_p}{(S + k)^{-1/2}\, \psi(S + k) - \lambda_2} \right) \le \left( p^a \sqrt{4\pi \log p} \right)^{-1}.$$

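To get a feel for the quantities in Theorem 1, the following sketch evaluates the threshold and the probability bound for a few values of $a$. It reads the theorem's quantities as $\tau_p = \sigma\sqrt{2(1+a)\log p}$ and the right-hand side as $(p^a\sqrt{4\pi\log p})^{-1}$; the choices $p = 40$ and $\sigma^2 = 5$ are borrowed from the simulation setting as an assumption, and the function names are hypothetical.

```python
import math


def tau_p(sigma, a, p):
    """Threshold tau_p = sigma * sqrt(2 (1 + a) log p), as read from Theorem 1."""
    return sigma * math.sqrt(2.0 * (1.0 + a) * math.log(p))


def tail_bound(a, p):
    """Probability bound (p^a * sqrt(4 pi log p))^(-1), as read from Theorem 1."""
    return 1.0 / (p ** a * math.sqrt(4.0 * math.pi * math.log(p)))


p = 40                   # number of predictors (assumed, from the simulation)
sigma = math.sqrt(5.0)   # assumes the error variance is 5

for a in (0.5, 1.0, 2.0):
    print(a, tau_p(sigma, a, p), tail_bound(a, p))
```

A larger $a$ makes the bound hold with higher probability at the cost of a larger $\tau_p$, and hence a looser bound on the estimation error.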
The grouping effect and selecting other variables
We generate one data set of 50 observations and 40 predictors. We chose
$$\beta = (\underbrace{5, \dots, 5}_{5}, \underbrace{3, \dots, 3}_{5}, \underbrace{1, \dots, 1}_{5}, \underbrace{0, \dots, 0}_{25}).$$
The predictors $X$ were generated as follows:
- $Z \sim N(0, 5)$
- $Z_i = Z + \varsigma_i$, $\varsigma_i \sim N(0, 1)$, $i = 1, \dots, 3$
- $x_i = Z_1 + \varepsilon^x_i$, $i = 1, \dots, 5$, $\varepsilon^x_i \sim N(0, 0.1)$
- $x_i = Z_2 + \varepsilon^x_i$, $i = 6, \dots, 10$, $\varepsilon^x_i \sim N(0, 0.1)$
- $x_i = Z_3 + \varepsilon^x_i$, $i = 11, \dots, 15$, $\varepsilon^x_i \sim N(0, 0.1)$
- $x_i \sim N(0, 5)$, $i = 16, \dots, 40$
The response $y$ is generated as $y = X\beta + \varepsilon$, $\varepsilon \sim N(0, 5)$.
Intra-group correlations are high and inter-group correlations are moderate.

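The simulation design above can be sketched with the Python standard library. Two assumptions are made and should be checked against the original study: the second argument of $N(\cdot, \cdot)$ is treated as a variance, and the third group of predictors ($i = 11, \dots, 15$) is driven by $Z_3$. The function name `simulate` and the seed are hypothetical.

```python
import math
import random


def simulate(n=50, seed=0):
    """Generate (X, y, beta) for the grouped design with 40 predictors."""
    rng = random.Random(seed)
    beta = [5.0] * 5 + [3.0] * 5 + [1.0] * 5 + [0.0] * 25
    sd5 = math.sqrt(5.0)    # assumes N(0, 5) means variance 5
    sd01 = math.sqrt(0.1)   # assumes N(0, 0.1) means variance 0.1
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, sd5)                               # shared factor Z
        z_group = [z + rng.gauss(0.0, 1.0) for _ in range(3)]  # Z_1, Z_2, Z_3
        # Three groups of five predictors, each built on its own Z_g
        # (third group on Z_3 is an assumption).
        row = [z_group[g] + rng.gauss(0.0, sd01)
               for g in range(3) for _ in range(5)]
        # Twenty-five independent noise predictors.
        row += [rng.gauss(0.0, sd5) for _ in range(25)]
        signal = sum(x * b for x, b in zip(row, beta))
        X.append(row)
        y.append(signal + rng.gauss(0.0, sd5))
    return X, y, beta


X, y, beta = simulate()
print(len(X), len(X[0]), len(y))
```

Under these assumptions, predictors in the same group share both $Z$ and their own $Z_g$, while predictors in different groups share only $Z$, which is what produces high intra-group and lower inter-group correlations.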
The grouping effect and selecting other variables
[Figure: coefficient paths plotted against the step number, in two panels: VISA NET and VISA.]