
Sparse CCA using Lasso - Anastasia Lykou & Joe Whittaker



  1. Sparse CCA using Lasso. Anastasia Lykou & Joe Whittaker, Department of Mathematics and Statistics, Lancaster University. July 23, 2008.

  2. Outline. 1 Introduction: Motivation. 2 CCA: Definition; CCA as a least squares problem. 3 Lasso: Definition; Lasso algorithms; The Lasso algorithms contrasted. 4 Sparse CCA: SCCA; Algorithm for SCCA; Example. 5 Summary.

  3. Motivation. SCCA aims to improve the interpretation of CCA, in the spirit of sparse principal component analysis (SCoTLASS by Jolliffe et al. (2003) and SPCA by Zou et al. (2004)), and is suited to interesting data sets such as market basket analysis. Sparsity performs shrinkage and model selection simultaneously, may reduce the prediction error, and can be extended to high-dimensional data sets.

  4. CCA: Definition. Canonical correlation analysis. [Diagram: the variables $X_1, \dots, X_p$ feed the variate $S$, and the variables $Y_1, \dots, Y_q$ feed the variate $T$.] We seek linear combinations $S = \alpha^T X$ and $T = \beta^T Y$ such that $\rho = \max_{\alpha, \beta} \mathrm{corr}(S, T)$. $S$ and $T$ are the canonical variates; $\alpha$ and $\beta$ are called the canonical loadings. The standard solution is through an eigen decomposition.
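
For concreteness, the eigen-decomposition route can be sketched in a few lines of numpy. This is a minimal illustration added here, not the authors' code; the function name cca_first_pair and the small ridge term reg are choices made for the sketch.

    import numpy as np

    def cca_first_pair(X, Y, reg=1e-8):
        """First canonical loadings via the standard eigen decomposition.

        X is an (n, p) array, Y an (n, q) array; returns (alpha, beta, rho)
        with var(alpha^T X) = var(beta^T Y) = 1.
        """
        n = X.shape[0]
        Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
        Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])   # var(X)
        Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])   # var(Y)
        Sxy = Xc.T @ Yc / (n - 1)                              # cov(X, Y)
        # Eigen problem: var(X)^-1 cov(X,Y) var(Y)^-1 cov(Y,X) alpha = rho^2 alpha
        M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
        eigvals, eigvecs = np.linalg.eig(M)
        k = np.argmax(eigvals.real)
        alpha = eigvecs[:, k].real
        alpha /= np.sqrt(alpha @ Sxx @ alpha)                  # var(alpha^T X) = 1
        beta = np.linalg.solve(Syy, Sxy.T @ alpha)
        beta /= np.sqrt(beta @ Syy @ beta)                     # var(beta^T Y) = 1
        rho = alpha @ Sxy @ beta                               # first canonical correlation
        return alpha, beta, rho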

  5. CCA as a least squares problem, 1st dimension. Theorem 1. Let $\alpha, \beta$ be $p$- and $q$-dimensional vectors, respectively, and let $(\hat\alpha, \hat\beta) = \arg\min_{\alpha, \beta} \mathrm{var}(\alpha^T X - \beta^T Y)$, subject to $\alpha^T \mathrm{var}(X)\,\alpha = \beta^T \mathrm{var}(Y)\,\beta = 1$. Then $\hat\alpha, \hat\beta$ are proportional to the first-dimension ordinary canonical loadings.
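
The link between the least squares criterion and the correlation criterion follows from expanding the variance under the unit-variance constraints:

$$
\mathrm{var}(\alpha^T X - \beta^T Y)
  = \alpha^T \mathrm{var}(X)\,\alpha + \beta^T \mathrm{var}(Y)\,\beta - 2\,\alpha^T \mathrm{cov}(X, Y)\,\beta
  = 2 - 2\,\mathrm{corr}(\alpha^T X, \beta^T Y),
$$

so minimising the left-hand side over the constraint set is the same as maximising the correlation between the two variates.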

  6. CCA as a least squares problem, 2nd dimension. Theorem 2. Let $\alpha, \beta$ be $p$- and $q$-dimensional vectors, and let $(\hat\alpha, \hat\beta) = \arg\min_{\alpha, \beta} \mathrm{var}(\alpha^T X - \beta^T Y)$, subject to $\alpha^T \mathrm{var}(X)\,\alpha = \beta^T \mathrm{var}(Y)\,\beta = 1$ and $\alpha_1^T \mathrm{var}(X)\,\alpha = \beta_1^T \mathrm{var}(Y)\,\beta = 0$, where $\alpha_1, \beta_1$ are the first canonical loadings. Then $\hat\alpha, \hat\beta$ are proportional to the second-dimension ordinary canonical loadings. The two theorems establish an alternating least squares (ALS) algorithm for CCA.

  7. CCA as a least squares problem: ALS for CCA. Let the objective function be $Q(\alpha, \beta) = \mathrm{var}(\alpha^T X - \beta^T Y)$, subject to $\alpha^T \mathrm{var}(X)\,\alpha = \beta^T \mathrm{var}(Y)\,\beta = 1$. $Q$ is continuous with a closed and bounded domain, so $Q$ attains its infimum. ALS algorithm: given $\hat\alpha$, set $\hat\beta = \arg\min_{\beta} Q(\hat\alpha, \beta)$ subject to $\mathrm{var}(\beta^T Y) = 1$; given $\hat\beta$, set $\hat\alpha = \arg\min_{\alpha} Q(\alpha, \hat\beta)$ subject to $\mathrm{var}(\alpha^T X) = 1$. $Q$ decreases over the iterations and is bounded from below, so $Q$ converges.
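
A minimal numpy sketch of this ALS iteration, assuming sample covariance estimates: each half-step is an ordinary least squares fit rescaled to satisfy the unit-variance constraint. The name als_cca, the initial beta, and the stopping rule are choices made here, not from the talk.

    import numpy as np

    def als_cca(X, Y, n_iter=100, tol=1e-10):
        """Alternating least squares view of CCA (first dimension)."""
        n = X.shape[0]
        Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
        Sxx, Syy = Xc.T @ Xc / (n - 1), Yc.T @ Yc / (n - 1)
        Sxy = Xc.T @ Yc / (n - 1)
        beta = np.ones(Y.shape[1])
        beta /= np.sqrt(beta @ Syy @ beta)            # var(beta^T Y) = 1
        q_old = np.inf
        for _ in range(n_iter):
            # alpha-step: argmin var(alpha^T X - beta^T Y) s.t. var(alpha^T X) = 1
            alpha = np.linalg.solve(Sxx, Sxy @ beta)
            alpha /= np.sqrt(alpha @ Sxx @ alpha)
            # beta-step: the symmetric update for the Y block
            beta = np.linalg.solve(Syy, Sxy.T @ alpha)
            beta /= np.sqrt(beta @ Syy @ beta)
            q = alpha @ Sxx @ alpha + beta @ Syy @ beta - 2 * alpha @ Sxy @ beta
            if abs(q_old - q) < tol:                  # Q decreases, so this converges
                break
            q_old = q
        return alpha, beta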

  8. Lasso: Definition. Lasso (least absolute shrinkage and selection operator), introduced by Tibshirani (1996), imposes the $L_1$ norm on the linear regression coefficients: $\hat\beta^{\,\mathrm{lasso}} = \arg\min_{\beta} \mathrm{var}(Y - \beta^T X)$, subject to $\sum_{j=1}^{p} |\beta_j| \le t$. The $L_1$ norm shrinks the coefficients towards zero, and exactly to zero if $t$ is small enough.
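
As a quick illustration (added here, not from the talk), scikit-learn's Lasso solves the equivalent penalised form, minimising $\|y - X\beta\|^2 / (2n) + \alpha\|\beta\|_1$; the penalty plays the role of the bound $t$, with a one-to-one correspondence between the two. The design below reuses the simulation model from the later slide.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 8))
    beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
    y = X @ beta_true + 3.0 * rng.standard_normal(100)

    # A larger penalty (equivalently, a smaller bound t) sets more
    # coefficients exactly to zero.
    for penalty in (0.1, 0.5, 2.0):
        fit = Lasso(alpha=penalty).fit(X, y)
        print(penalty, np.round(fit.coef_, 2))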

  9. Lasso algorithms available in the literature. Lasso by Tibshirani: expresses the problem as a least squares problem with $2^p$ inequality constraints and adapts the NNLS algorithm. Lars-Lasso: a modified version of the LARS algorithm introduced by Efron et al. (2004); the lasso estimates are calculated such that the angle between the active covariates and the residuals is always equal.

  10. Lasso algorithms: Proposed algorithm, Lasso with positivity constraints. Suppose that the signs of the coefficients do not change during the shrinkage. Positivity Lasso: $\hat\beta^{\,\mathrm{lasso}} = \arg\min_{\beta} \mathrm{var}(Y - \beta^T X)$, subject to $s_0^T \beta \le t$ and $s_{0j}\beta_j \ge 0$ for $j = 1, \dots, p$, where $s_0$ is the sign vector of the OLS estimate. It is a simple but quite general algorithm; it is a restricted version of the Lasso algorithms, since the signs of the coefficients cannot change; at most $p + 1$ constraints are imposed, far fewer than the $2^p$ constraints of Tibshirani's Lasso.

  11. Lasso algorithms: Numerical solution. The solution is given through quadratic programming methods. Positivity Lasso solution: $\hat\beta = b_0 - \lambda\,\mathrm{var}(X)^{-1} s_0 + \mathrm{var}(X)^{-1}\mathrm{diag}(s_0)\,\mu$, where $b_0$ is the OLS estimate, $\lambda$ is the shrinkage parameter (there is a one-to-one correspondence between $\lambda$ and $t$), and $\mu$ is zero for active coefficients and positive for nonactive coefficients. The parameters $\lambda$ and $\mu$ are calculated so as to satisfy the KKT conditions under the positivity constraints.
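
To make the constraints concrete, here is a sketch of the sign-constrained problem solved with a generic scipy optimiser rather than the explicit KKT expression above; positivity_lasso is a name chosen here and the code is only an illustrative stand-in for the authors' algorithm.

    import numpy as np
    from scipy.optimize import minimize

    def positivity_lasso(X, y, t):
        """Sign-constrained ('positivity') lasso solved as a small QP-like problem."""
        n, p = X.shape
        b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
        s0 = np.sign(b_ols)                          # signs of the OLS estimate

        def rss(beta):
            r = y - X @ beta
            return r @ r / n

        cons = [{"type": "ineq", "fun": lambda b: t - s0 @ b},           # s0^T beta <= t
                *({"type": "ineq", "fun": lambda b, j=j: s0[j] * b[j]}   # s0j * beta_j >= 0
                  for j in range(p))]
        res = minimize(rss, x0=np.zeros(p), constraints=cons, method="SLSQP")
        return res.x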

  12. The Lasso algorithms contrasted: Diabetes data set. 442 observations; the covariates are age, sex, body mass index, average blood pressure and six blood serum measurements; the response is disease progression one year after baseline. [Figure: coefficient paths for Lars-Lasso and Positivity Lasso, plotting the coefficients against the shrinkage fraction sum|b| / sum|b_OLS|.]
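
The Lars-Lasso panel of that figure can be reproduced with scikit-learn's lars_path on the same diabetes data; this sketch is added here and is not part of the talk, and the positivity-lasso panel would require the authors' own algorithm.

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import lars_path

    # The diabetes data used on the slide: 442 observations, 10 covariates.
    X, y = load_diabetes(return_X_y=True)

    # Columns of `coefs` are the lasso coefficients along the path; the
    # horizontal axis of the figure is sum|b| / sum|b_OLS|, where the last
    # column of the path is the unpenalised (OLS) fit.
    alphas, active, coefs = lars_path(X, y, method="lasso")
    shrinkage = np.abs(coefs).sum(axis=0) / np.abs(coefs[:, -1]).sum()
    print(np.round(shrinkage, 2))
    print(active)   # covariates in the active set at the end of the path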

  13. The Lasso algorithms contrasted: Simulation studies. We simulate 200 data sets, each consisting of 100 observations, from the model $Y = \beta^T X + \sigma\epsilon$, with $\mathrm{corr}(X_i, X_j) = \rho^{|i - j|}$.

  Dataset  n    p   beta                            sigma  rho
  1        100  8   (3, 1.5, 0, 0, 2, 0, 0, 0)^T    3      0.50
  2        100  8   (3, 1.5, 0, 0, 2, 0, 0, 0)^T    3      0.90
  3        100  8   beta_j = 0.85 for all j         3      0.50
  4        100  8   (5, 0, 0, 0, 0, 0, 0, 0)^T      2      0.50

  Table: Proportions of cases in which the correct model is selected.
  Dataset  Tibs-Lasso  Lars-Lasso  Pos-Lasso
  1        0.06        0.13        0.14
  2        0.02        0.04        0.04
  3        0.84        0.89        0.87
  4        0.09        0.19        0.19

  Table: Proportions of agreement between Pos-Lasso and the other two algorithms.
  Dataset  Tibs-Lasso  Lars-Lasso
  1        0.76        0.83
  2        0.63        0.65
  3        0.95        0.98
  4        0.77        0.78
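
The simulation design translates directly into code; a small sketch, with simulate_dataset a name chosen here:

    import numpy as np

    def simulate_dataset(beta, sigma, rho, n=100, seed=0):
        """One data set from the design Y = beta^T X + sigma * eps,
        with corr(X_i, X_j) = rho^|i - j|."""
        rng = np.random.default_rng(seed)
        beta = np.asarray(beta, dtype=float)
        idx = np.arange(len(beta))
        cov = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1)-type correlation
        X = rng.multivariate_normal(np.zeros(len(beta)), cov, size=n)
        y = X @ beta + sigma * rng.standard_normal(n)
        return X, y

    # Dataset 1 of the table: beta = (3, 1.5, 0, 0, 2, 0, 0, 0), sigma = 3, rho = 0.5
    X, y = simulate_dataset([3, 1.5, 0, 0, 2, 0, 0, 0], sigma=3.0, rho=0.5)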

  14. SCCA: ALS for CCA and Lasso, first dimension. Given the canonical variate $T = \beta^T Y$, find $\hat\alpha = \arg\min_{\alpha} \mathrm{var}(T - \alpha^T X)$, subject to $\mathrm{var}(\alpha^T X) = 1$ and $\|\alpha\|_1 \le t$. We either seek an algorithm that solves this optimization problem directly, or we modify a Lasso algorithm to incorporate the equality constraint.
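
To make the sparse alpha-step concrete, here is a generic-solver sketch of exactly this constrained problem. Everything about the implementation is an assumption made here: it hands the non-smooth L1 constraint to scipy's SLSQP, which only approximates a solution, and that limitation is precisely why the talk develops a dedicated modification of the positivity Lasso instead.

    import numpy as np
    from scipy.optimize import minimize

    def scca_alpha_step(X, T, t):
        """min var(T - alpha^T X)  s.t.  var(alpha^T X) = 1 and ||alpha||_1 <= t."""
        n, p = X.shape
        Xc = X - X.mean(axis=0)
        Tc = T - T.mean()
        Sxx = Xc.T @ Xc / (n - 1)

        def objective(a):
            r = Tc - Xc @ a
            return r @ r / (n - 1)

        cons = [{"type": "eq", "fun": lambda a: a @ Sxx @ a - 1.0},      # var(alpha^T X) = 1
                {"type": "ineq", "fun": lambda a: t - np.abs(a).sum()}]  # ||alpha||_1 <= t
        a0 = np.linalg.solve(Sxx, Xc.T @ Tc / (n - 1))
        a0 /= np.sqrt(a0 @ Sxx @ a0)                                     # start on the unit-variance shell
        res = minimize(objective, a0, constraints=cons, method="SLSQP")
        return res.x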

  15. Algorithm for SCCA: ALS for CCA and Lasso. Tibshirani's Lasso: the NNLS algorithm cannot incorporate the equality constraint. Lars-Lasso: the equality constraint violates the equiangular condition. Positivity Lasso: by additionally imposing the positivity constraints, the above optimization problem can be solved.
