

  1. Robust strategies and model selection
  Stefan Van Aelst
  Department of Applied Mathematics and Computer Science, Ghent University, Belgium
  Stefan.VanAelst@UGent.be
  ERCIM09 - COMISEF/COST Tutorial

  2. Outline
  1 Regression model
  2 Least squares
  3 Manual variable selection approach
  4 Automatic variable selection approach
  5 Robustness
  6 Robust variable selection: sequencing
  7 Robust variable selection: segmentation

  3. Regression model
  Regression setting
  Consider a dataset Z_n = {(y_i, x_i1, ..., x_id) = (y_i, x_i); i = 1, ..., n} ⊂ R^(d+1).
  Y is the response variable; X_1, ..., X_d are the candidate regressors.
  The corresponding linear model is
  y_i = β_1 x_i1 + ... + β_d x_id + ε_i = x_i' β + ε_i,   i = 1, ..., n,
  where the errors ε_i are assumed to be iid with E(ε_i) = 0 and Var(ε_i) = σ² > 0.
  Goal: estimate the regression coefficients β from the data.

  4. Least squares
  Least squares solution
  β̂_LS solves min_β Σ_{i=1}^n (y_i − x_i' β)².
  Write X = (x_1, ..., x_n)^t and y = (y_1, ..., y_n)^t.
  Then β̂_LS solves min_β (y − Xβ)^t (y − Xβ), which gives
  β̂_LS = (X^t X)^{-1} X^t y
  ŷ = X β̂ = X (X^t X)^{-1} X^t y = H y
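A minimal numerical sketch of this closed-form solution (Python with NumPy; the data are simulated purely for illustration, not from the tutorial):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 3
    X = rng.normal(size=(n, d))                     # rows are x_i^t, so X = (x_1, ..., x_n)^t
    beta_true = np.array([1.0, -2.0, 0.5])
    y = X @ beta_true + rng.normal(size=n)          # y_i = x_i' beta + eps_i

    # Closed-form LS estimate: beta_hat = (X^t X)^{-1} X^t y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Fitted values via the hat matrix H = X (X^t X)^{-1} X^t:  y_hat = H y
    H = X @ np.linalg.solve(X.T @ X, X.T)
    y_hat = H @ y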

  5. Least squares
  Least squares properties
  Unbiased estimator: E(β̂_LS) = β.
  Gauss-Markov theorem: LS has the smallest variance among all linear unbiased estimators of β.
  Why do variable selection?

  6. Least squares
  Expected prediction error
  Assume the true regression function is linear: Y | x = f(x) + ε = x^t β + ε.
  Predict the response Y_0 at x_0: Y_0 = x_0^t β + ε_0 = f(x_0) + ε_0.
  Use an estimator β̃ of the regression coefficients.
  Estimated prediction: f̃(x_0) = x_0^t β̃.
  Expected prediction error: E[(Y_0 − f̃(x_0))²].

  7. Least squares
  Expected prediction error
  E[(Y_0 − f̃(x_0))²] = E[(f(x_0) + ε_0 − f̃(x_0))²]
                      = σ² + E[(f(x_0) − f̃(x_0))²]
                      = σ² + MSE(f̃(x_0))
  σ²: irreducible variance of the new observation Y_0
  MSE(f̃(x_0)): mean squared error of the prediction at x_0 by the estimator f̃

  8. Least squares
  MSE of a prediction
  MSE(f̃(x_0)) = E[(f(x_0) − f̃(x_0))²]
              = E[(x_0^t (β − β̃))²]
              = E[(x_0^t (β − E(β̃) + E(β̃) − β̃))²]
              = bias(f̃(x_0))² + Var(f̃(x_0))
  LS is unbiased ⇒ bias(f̃(x_0)) = 0.
  LS minimizes Var(f̃(x_0)) among linear unbiased estimators (Gauss-Markov).
  ⇒ LS has the smallest MSPE among all linear unbiased estimators.

  9. Least squares
  LS instability
  LS becomes unstable, with large MSPE, if Var(f̃(x_0)) is high. This can happen if
  there are many noise variables among the candidate regressors, or
  the predictors are highly correlated (multicollinearity).
  ⇒ Improve on the least squares MSPE by trading (a little) bias for (a lot of) variance!

  10. Manual variable selection approach
  Manual variable selection
  Try to determine the set of the most important regressors:
  Remove the noise regressors from the model
  Avoid multicollinearity
  Methods
  All subsets
  Backward elimination
  Forward selection
  Stepwise selection
  → choose a selection criterion (a forward-selection sketch follows below)
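As a concrete illustration of one of these methods, here is a minimal sketch of greedy forward selection in Python. This is not code from the tutorial: the BIC-style criterion assumes Gaussian errors and a model without intercept, and the function names and stopping rule are illustrative choices.

    import numpy as np

    def rss(X, y):
        """Residual sum of squares of the LS fit of y on the columns of X."""
        if X.shape[1] == 0:
            return float(y @ y)                      # empty model (no intercept)
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        return float(np.sum((y - X @ beta) ** 2))

    def bic(X, y):
        """Gaussian-likelihood BIC, up to an additive constant."""
        n, k = X.shape
        return n * np.log(rss(X, y) / n) + np.log(n) * k

    def forward_selection(X, y, criterion=bic):
        """Greedy forward selection: repeatedly add the predictor whose
        inclusion improves the criterion most; stop when no addition helps."""
        d = X.shape[1]
        selected, remaining = [], list(range(d))
        best = criterion(X[:, selected], y)
        while remaining:
            score, j = min((criterion(X[:, selected + [j]], y), j) for j in remaining)
            if score >= best:
                break
            selected.append(j)
            remaining.remove(j)
            best = score
        return selected

Backward elimination and stepwise selection follow the same pattern, removing (or removing and re-adding) predictors instead of only adding them.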

  11. Manual variable selection approach
  Submodels
  Dataset Z_n = {(y_i, x_i1, ..., x_id) = (y_i, x_i); i = 1, ..., n} ⊂ R^(d+1).
  Let α ⊂ {1, ..., d} denote the predictors included in a submodel.
  The corresponding submodel is
  y_i = x_{α,i}' β_α + ε_{α,i},   i = 1, ..., n.
  A selected model is considered a good model if
  it is parsimonious,
  it fits the data well,
  it yields good predictions for similar data.

  12. Manual variable selection approach
  Some standard selection criteria
  Adjusted R²: A(α) = 1 − [RSS(α)/(n − d(α))] / [RSS(1)/(n − 1)]
  Mallows' C_p: C(α) = RSS(α)/σ̂² − (n − 2 d(α))
  Final Prediction Error: FPE(α) = RSS(α) + 2 d(α) σ̂²
  AIC: AIC(α) = −2 L(α) + 2 d(α)
  BIC: BIC(α) = −2 L(α) + log(n) d(α)
  where σ̂ is the residual scale estimate in the "full" model.
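A small helper that evaluates these criteria for a submodel from its RSS could look as follows. This is illustrative code, not the tutorial's: it assumes the Gaussian linear model above, and for AIC/BIC it uses the Gaussian log-likelihood up to an additive constant.

    import numpy as np

    def criteria(rss_alpha, d_alpha, n, rss_null, sigma2_hat):
        """Standard selection criteria for a submodel alpha.
        rss_alpha : residual sum of squares RSS(alpha) of the submodel
        d_alpha   : number of predictors d(alpha) in the submodel
        rss_null  : RSS(1), the RSS of the intercept-only model
        sigma2_hat: residual variance estimate from the full model"""
        adj_r2 = 1.0 - (rss_alpha / (n - d_alpha)) / (rss_null / (n - 1))
        cp     = rss_alpha / sigma2_hat - (n - 2 * d_alpha)            # Mallows' C_p
        fpe    = rss_alpha + 2 * d_alpha * sigma2_hat                  # final prediction error
        # Gaussian log-likelihood up to a constant: L = -(n/2) log(RSS/n)
        aic    = n * np.log(rss_alpha / n) + 2 * d_alpha
        bic    = n * np.log(rss_alpha / n) + np.log(n) * d_alpha
        return {"adjR2": adj_r2, "Cp": cp, "FPE": fpe, "AIC": aic, "BIC": bic}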

  13. Manual variable selection approach
  Resampling based selection criteria
  Consider the (conditional) expected prediction error
  PE(α) = E[ (1/n) Σ_{i=1}^n (z_i − x_{α,i}' β̂_α)² | y, X ],
  where z_i denotes a new response at x_i.
  Estimates of the PE can be used as a selection criterion. Estimates can be obtained by cross-validation or bootstrap.
  A more advanced selection criterion takes both goodness-of-fit and PE into account:
  PPE(α) = (1/n) Σ_{i=1}^n (y_i − x_{α,i}' β̂_α)² + f(n) d(α) + Ê[ (1/n) Σ_{i=1}^n (z_i − x_{α,i}' β̂_α)² | y, X ]
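For example, a K-fold cross-validation estimate of PE(α) could be sketched as follows (illustrative code, not the tutorial's implementation; it assumes LS fits and squared error loss, and `cols` indexes the predictors in α):

    import numpy as np

    def cv_prediction_error(X, y, cols, n_folds=5, seed=0):
        """K-fold cross-validation estimate of PE(alpha) for the submodel
        that uses the predictors indexed by `cols`."""
        rng = np.random.default_rng(seed)
        n = len(y)
        folds = np.array_split(rng.permutation(n), n_folds)
        sq_err = 0.0
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            X_tr, X_te = X[np.ix_(train, cols)], X[np.ix_(test, cols)]
            beta = np.linalg.lstsq(X_tr, y[train], rcond=None)[0]   # LS fit on training folds
            sq_err += float(np.sum((y[test] - X_te @ beta) ** 2))
        return sq_err / n       # averaged squared prediction error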

  14. Automatic variable selection approach
  Automatic variable selection
  Try to find a stable model that fits the data well:
  Shrinkage: constrained least squares optimization
  Stagewise forward procedures
  Methods
  Ridge regression
  Lasso
  Least Angle regression
  L2 Boosting
  Elastic Net
  (A small ridge illustration of the bias/variance trade-off follows below.)
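Ridge regression is the simplest of these shrinkage methods: it replaces the LS estimate by β̂_ridge = (X^t X + λ I)^{-1} X^t y. A quick simulated comparison (illustrative only; λ = 5, the design, and the prediction point are arbitrary choices, not from the tutorial) shows the trade-off from slide 9 at work on nearly collinear predictors:

    import numpy as np

    rng = np.random.default_rng(1)
    n, d, lam = 50, 5, 5.0
    beta = np.array([2.0, 0.0, 0.0, 1.0, 0.0])
    x0 = np.array([1.0, -1.0, 1.0, -1.0, 1.0])          # prediction point

    def one_run():
        z = rng.normal(size=(n, 1))
        X = z + 0.05 * rng.normal(size=(n, d))           # nearly collinear columns
        y = X @ beta + rng.normal(size=n)
        b_ls = np.linalg.lstsq(X, y, rcond=None)[0]
        b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        f0 = x0 @ beta
        return (x0 @ b_ls - f0) ** 2, (x0 @ b_ridge - f0) ** 2

    errs = np.array([one_run() for _ in range(2000)])
    print("MSE of prediction  LS: %.3f   ridge: %.3f" % tuple(errs.mean(axis=0)))

Ridge accepts a small bias in exchange for a large variance reduction, so its prediction MSE is far below that of LS in this collinear setting.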

  15. Automatic variable selection approach
  Lasso
  Least Absolute Shrinkage and Selection Operator
  β̂_lasso = argmin_β Σ_{i=1}^n ( y_i − β_0 − Σ_{j=1}^d β_j x_ij )²
  subject to ‖β‖_1 = Σ_{j=1}^d |β_j| ≤ t
  0 < t < ‖β̂_LS‖_1 is a tuning parameter.
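In practice the LASSO is usually computed in its equivalent penalized (Lagrangian) form; a minimal sketch with scikit-learn (simulated data, arbitrary penalty value):

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(2)
    n, d = 100, 8
    X = rng.normal(size=(n, d))
    y = X[:, 0] - 2 * X[:, 3] + rng.normal(size=n)    # only two active predictors

    # scikit-learn solves the penalized form
    #   min (1/2n) * sum(y_i - b0 - x_i' b)^2 + alpha * ||b||_1,
    # which corresponds to some bound t on ||b||_1 in the constrained form above.
    Xs = StandardScaler().fit_transform(X)
    fit = Lasso(alpha=0.1).fit(Xs, y)
    print(fit.coef_)                                   # many coefficients shrunk exactly to 0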

  16. Automatic variable selection approach
  Example: LASSO fits
  [Figure: LASSO coefficient paths; standardized coefficients versus degrees of freedom (Df).]

  17. Automatic variable selection approach
  Least angle regression
  Standardize the variables.
  1 Select x_1 such that |cor(y, x_1)| = max_j |cor(y, x_j)|.
  2 Put r = y − γ x_1, where γ is determined such that |cor(r, x_1)| = max_{j≠1} |cor(r, x_j)|.
  3 Select x_2 corresponding to the maximum above. Determine the equiangular direction b such that x_1' b = x_2' b.
  4 Put r = r − γ b, where γ is determined such that |cor(r, x_1)| = |cor(r, x_2)| = max_{j≠1,2} |cor(r, x_j)|.
  5 Continue the procedure ...
  (A small computational sketch follows below.)
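A minimal way to inspect the LAR ordering in practice is scikit-learn's `lars_path` (simulated data, illustrative only; the deck does not prescribe this tool):

    import numpy as np
    from sklearn.linear_model import lars_path
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(3)
    n, d = 100, 8
    X = rng.normal(size=(n, d))
    y = X[:, 0] - 2 * X[:, 3] + rng.normal(size=n)

    Xs = StandardScaler().fit_transform(X)
    # lars_path returns the breakpoints of the path, the active set and the
    # full coefficient path; method="lasso" gives the LASSO modification.
    alphas, active, coefs = lars_path(Xs, y, method="lar")
    print(active)        # indices of the predictors in order of entry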

  18. Automatic variable selection approach
  Properties of LAR
  Least angle regression (LAR) selects the predictors in order of importance.
  LAR changes the contributions of the predictors gradually, as they are needed.
  LAR is very similar to the LASSO and can easily be adjusted to produce the LASSO solution.
  LAR only uses the means, variances and correlations of the variables.
  LAR is computationally as efficient as LS.

  19. Automatic variable selection approach
  Example: LAR fits
  [Figure: LAR coefficient paths; standardized coefficients versus degrees of freedom (Df).]

  20. Automatic variable selection approach
  L2 boosting
  Standardize the variables.
  1 Put r = y and F̂_0 = 0.
  2 Select x_1 such that |cor(r, x_1)| = max_j |cor(r, x_j)|.
  3 Update r = y − ν f̂(x_1), where 0 < ν ≤ 1 is the step length and f̂(x_1) are the fitted values from the LS regression of y on x_1. Similarly, update F̂_1 = F̂_0 + ν f̂(x_1).
  4 Continue the procedure ...
  (A small computational sketch follows below.)
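A small sketch of componentwise L2 boosting along these lines (illustrative Python; the predictors are standardized and the response centered inside the function, and `nu` and the number of steps are arbitrary choices):

    import numpy as np

    def l2_boost(X, y, nu=0.1, n_steps=200):
        """Componentwise L2 boosting: at each step, fit the current residuals
        by LS on the single most correlated predictor and add a fraction nu
        of that fit to the ensemble."""
        Xs = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize the predictors
        yc = y - y.mean()                             # center the response
        n, d = Xs.shape
        beta = np.zeros(d)
        F = np.zeros(n)                               # current ensemble fit F_m
        for _ in range(n_steps):
            r = yc - F                                # current residuals
            cors = [abs(np.corrcoef(r, Xs[:, j])[0, 1]) for j in range(d)]
            j = int(np.argmax(cors))                  # most correlated predictor
            gamma = (Xs[:, j] @ r) / (Xs[:, j] @ Xs[:, j])   # LS coefficient of r on x_j
            beta[j] += nu * gamma                     # update coefficient and fit
            F += nu * gamma * Xs[:, j]
        return beta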

  21. Automatic variable selection approach
  Sequencing variables
  Several selection algorithms sequence the predictors in "order of importance" or screen out the most relevant variables:
  Forward/stepwise selection
  Stagewise forward selection
  Penalty methods
  Least angle regression
  L2 boosting
  These methods are computationally very efficient because they are based only on means, variances and correlations.

  22. Robustness
  Robustness: Data with outliers
  Question: How many partners do men and women desire to have in the next 30 years?
  Men: Mean = 64.3, Median = 1
  → The mean is sensitive to outliers.
  → The median is robust and thus more reliable.
  (A small numerical illustration follows below.)
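A tiny numerical illustration of that sensitivity (hypothetical numbers, not the survey data from the slide):

    import numpy as np

    # Most answers are small; a few extreme answers pull the mean far away
    # while the median barely moves.
    answers = np.array([0, 1, 1, 1, 2, 2, 3, 1000, 6000])
    print(np.mean(answers))      # about 778.9, dominated by the two extreme values
    print(np.median(answers))    # 2, essentially unaffected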

  23. Robustness
  Least squares regression
  [Figure: scatter plot of log light intensity versus log surface temperature with the LS fit.]
  LS: Minimize Σ_i r_i²(β)

  24. Robustness
  Outliers
  [Figure: log light intensity versus log surface temperature, with outliers and the LS fit pulled towards them.]
  Outliers attract LS!
