Least Angle Regression
Tim Hesterberg, Insightful Corp.
16 June 2006
This is joint work with Chris Fraley, with support from NIH SBIR Phase I 1 R43 GM074313-01.

Overview
◮ Why is LARS important?
◮ Other packages
◮ GLARS package
◮ Issues
◮ Insightful Research

Why is LARS important?
◮ Variable Selection in Regression
  ◮ Important
  ◮ Many approaches: stagewise, boosting, LASSO, regularization, ...
◮ Least Angle Regression: Efron, Hastie, Johnstone, Tibshirani (2004), Annals of Statistics (with discussion)
  1. Lasso
  2. Forward stagewise
  3. Least Angle Regression (LAR)
◮ Unifying explanation
◮ Fast implementation
◮ Fast way to choose tuning parameter

Ridge Regression
◮ Minimize $\sum_i (Y_i - \hat{Y}_i)^2 + \lambda \sum_j \hat{\beta}_j^2$
[Figure: ridge coefficient paths, beta versus theta, for predictors AGE, SEX, BMI, BP, S1 to S6.]
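The ridge criterion above has a closed-form minimizer, which a small example can make concrete. The sketch below is an illustration only, not code from the talk or from any package it mentions: it assumes exactly two predictors, standardizes them to mean zero and unit length, centers y, and solves the 2x2 system $(Z^T Z + \lambda I)\beta = Z^T y$ by hand. The names `dot`, `standardize`, and `ridge_two_predictors` are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def standardize(col):
    """Center a column and scale it to unit length."""
    mean = sum(col) / len(col)
    centered = [v - mean for v in col]
    norm = math.sqrt(dot(centered, centered))
    return [v / norm for v in centered]

def ridge_two_predictors(x1, x2, y, lam):
    """Ridge coefficients on the standardized scale for two predictors.

    Assumption for this sketch: only two predictors, so the normal
    equations can be solved with the explicit 2x2 inverse.
    """
    z1, z2 = standardize(x1), standardize(x2)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    a = dot(z1, z2)                      # correlation between the predictors
    c1, c2 = dot(z1, yc), dot(z2, yc)    # inner products with centered y
    # Solve (Z'Z + lam * I) beta = Z'y, where Z'Z = [[1, a], [a, 1]].
    det = (1.0 + lam) ** 2 - a ** 2
    b1 = ((1.0 + lam) * c1 - a * c2) / det
    b2 = ((1.0 + lam) * c2 - a * c1) / det
    return b1, b2
```

Calling `ridge_two_predictors(x1, x2, y, lam)` over a grid of `lam` values traces a coefficient path: `lam = 0` reproduces the least-squares fit, and larger values shrink the coefficients toward zero without generally setting any of them exactly to zero.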
LASSO
◮ Minimize $\sum_i (Y_i - \hat{Y}_i)^2 + \lambda \sum_j |\hat{\beta}_j|$
◮ Forces small coefficients → 0; gives simpler models.
◮ Smaller penalty on large coefficients: less effect on important terms
◮ Implementation is more complicated and slower
[Figure: two panels, LASSO and Ridge Regression; standardized coefficients versus sum(|beta|) for predictors AGE, SEX, BMI, BP, S1 to S6.]

Forward Stagewise Regression
(Forward Stagewise = Least Squares Boosting)
1. Initialize: standardize predictors, center $y$, $r = y$, $\beta_1 = \dots = \beta_p = 0$
2. Repeat many times
  ◮ Find the predictor $x_j$ most correlated with $r$
  ◮ $\delta = \epsilon \, \mathrm{sign}(r \cdot x_j)$
  ◮ $\hat{\beta}_j \leftarrow \hat{\beta}_j + \delta$
  ◮ $r \leftarrow r - \delta x_j$

Forward Stagewise and LASSO
Are LASSO and infinitesimal forward stagewise identical?
◮ With orthogonal predictors, yes.
◮ Otherwise similar.
Least Angle Regression provides explanation, and fast implementation.

Similarity:
[Figure (Trevor Hastie, Stanford Statistics, March 2003): Prostate Cancer Data; two panels, Lasso (coefficients versus $t = \sum_j |\beta_j|$) and Forward Stagewise (coefficients versus iteration), for predictors lcavol, svi, lweight, pgg45, lbph, gleason, age, lcp.]
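The forward stagewise loop listed earlier (initialize, then repeatedly nudge the coefficient of the predictor most correlated with the residual) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from the talk: predictors are scaled to mean zero and unit length, "most correlated" is taken as the largest absolute inner product with the residual, and the early-stopping rule is an addition of this sketch (the slide only says "repeat many times"). The function names are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def standardize(col):
    """Center a column and scale it to unit length."""
    mean = sum(col) / len(col)
    centered = [v - mean for v in col]
    norm = math.sqrt(dot(centered, centered))
    return [v / norm for v in centered]

def forward_stagewise(columns, y, eps=0.01, n_steps=5000):
    """Return (beta, residual) after forward stagewise with step size eps.

    beta is on the standardized-predictor scale.
    """
    x = [standardize(col) for col in columns]
    y_mean = sum(y) / len(y)
    r = [v - y_mean for v in y]           # r = centered y
    beta = [0.0] * len(x)                 # beta_1 = ... = beta_p = 0
    for _ in range(n_steps):
        # For unit-length predictors, r . x_j is proportional to the correlation.
        dots = [dot(r, col) for col in x]
        j = max(range(len(x)), key=lambda k: abs(dots[k]))
        # Assumption of this sketch: stop once a step of size eps can no
        # longer reduce the residual sum of squares (|r . x_j| <= eps / 2).
        if abs(dots[j]) <= eps / 2:
            break
        delta = math.copysign(eps, dots[j])   # delta = eps * sign(r . x_j)
        beta[j] += delta                      # beta_j <- beta_j + delta
        r = [ri - delta * xi for ri, xi in zip(r, x[j])]  # r <- r - delta x_j
    return beta, r
```

Recording `beta` after every iteration, rather than only at the end, would give a coefficient path of the kind plotted against iteration count in the Forward Stagewise panel.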
Stepwise, Forward Stagewise, Least Angle
Stepwise regression:
◮ Pick predictor most correlated with y
◮ Bring predictor completely into model (full LS fit)
Forward stagewise:
◮ Pick predictor most correlated with y
◮ Increment coefficient for predictor
Least Angle Regression:
◮ Pick predictor most correlated with y
◮ Bring predictor into model only to extent it is better than others
◮ Move in least-squares direction until another variable is as correlated

Least Angle Regression
[Figure: geometry for two predictors X1 and X2, with labeled points O, A, B, C, D, E.]
C = projection of y onto space spanned by X1 and X2.
B = first step for least-angle regression
E = point on stagewise path

LARS - other packages
lars: Efron and Hastie (S-PLUS and R)
◮ Linear regression
glmpath: Park and Hastie (R)
◮ GLM and Cox Proportional Hazards
Methods: plot, print, predict, cv, coef

S+GLARS
◮ S-PLUS and R, open source
◮ Incorporate lars, glmpath
◮ Cleanup, consistent interface
◮ Incorporate future work by others; provide framework
◮ Extensions
  ◮ Numerically-accurate calculations
  ◮ Factors, splines, polynomials, interactions, ...
  ◮ Other models (robust regression, ...), other penalties
  ◮ Missing data
  ◮ Massive data sets
  ◮ Diagnostics, tools for selecting tuning parameter
◮ User-friendly
  ◮ Consistent interface
  ◮ GUI
  ◮ Documentation
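The Least Angle Regression recipe described earlier in this part of the talk ("bring predictor into model only to extent it is better than others") can be illustrated by computing just its first step, the move from O to B in the geometry slide. The sketch below is an illustration under stated assumptions and does not use or reproduce the lars or glmpath interfaces: predictors are scaled to mean zero and unit length, y is centered, and the step length is the smallest positive value at which some other predictor becomes as correlated with the residual as the first one. All function names are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def standardize(col):
    """Center a column and scale it to unit length."""
    mean = sum(col) / len(col)
    centered = [v - mean for v in col]
    norm = math.sqrt(dot(centered, centered))
    return [v / norm for v in centered]

def first_lar_step(columns, y):
    """Return (j, coef, entering) for the first least angle step.

    j        index of the predictor most correlated with y
    coef     its coefficient (standardized scale) at the end of the step
    entering index of the predictor that ties with j there, or None if
             the step runs all the way to the least-squares fit on x_j
    """
    x = [standardize(col) for col in columns]
    y_mean = sum(y) / len(y)
    r = [v - y_mean for v in y]
    c = [dot(col, r) for col in x]             # current correlations
    j = max(range(len(x)), key=lambda k: abs(c[k]))
    big_c = abs(c[j])
    sign = math.copysign(1.0, c[j])
    u = [sign * v for v in x[j]]               # direction of travel
    gamma, entering = big_c, None              # default: full LS step on x_j
    for k in range(len(x)):
        if k == j:
            continue
        a = dot(x[k], u)
        # Along the step, x_j's correlation is big_c - gamma and x_k's is
        # c[k] - gamma * a; solve for equality in absolute value.
        for num, den in ((big_c - c[k], 1.0 - a), (big_c + c[k], 1.0 + a)):
            if den > 1e-12:                    # skip (near-)collinear cases
                g = num / den
                if 0.0 < g < gamma:
                    gamma, entering = g, k
    return j, sign * gamma, entering
```

A full LAR path would repeat this idea with a growing active set, moving along the joint least-squares direction of all active predictors; that generalization is deliberately left out of this sketch.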
Issues
◮ Collaboration
◮ Money
  ◮ NIH funding: require commercial potential
  ◮ Insightful: indirect benefit
◮ Outside contributors
◮ Licensing; ability to ship with S-PLUS, I-Miner.

Insightful Research Department
◮ Turn research into software for wide use
◮ Higher standards than academic software (ease of use, robustness, testing)
◮ Variety: resampling, missing data, group sequential designs, simulation-based econometric software, functional data, stable distributions, proteomics, microarrays, frailty models, causal modeling
◮ External funding: SBIR grants (NIH, NSF, ...)
  ◮ Somewhat easier funding
  ◮ Commercial potential
  ◮ Risk, research element
◮ We’re hiring
◮ We’re looking for good projects and collaborators