
Structured Association II: 02-715 Advanced Topics in Computational Genomics



  1. Structured Association II. 02-715 Advanced Topics in Computational Genomics

  2. Regression with Regularization
     • Group lasso (Yuan and Lin, 2006), L1/L2 penalty: $\|\beta_j\|_{L_1/L_2} = \sqrt{\sum_k \beta_{jk}^2}$
       – Parameter estimation: pathwise coordinate descent (Friedman et al., 2007)
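
A minimal sketch of the L1/L2 penalty, assuming the coefficients are stored as a matrix whose row j is the group β_j (for example, one input across all outputs). The function names `l1_l2_penalty` and `group_soft_threshold` are illustrative, not from the slides. The second function is the standard proximal step for a single group; it is shown only to make the "whole group shrinks to zero together" behaviour concrete and is not the pathwise coordinate descent procedure cited on the slide.

```python
import numpy as np

def l1_l2_penalty(B):
    """Sum over rows j of the L2 norm of row j: sum_j sqrt(sum_k B[j, k]**2)."""
    return float(np.sum(np.linalg.norm(B, axis=1)))

def group_soft_threshold(b, t):
    """Proximal operator of t * ||b||_2: shrink the whole group toward zero."""
    norm = np.linalg.norm(b)
    if norm <= t:
        return np.zeros_like(b)   # the entire group is dropped at once
    return (1.0 - t / norm) * b   # otherwise every entry is scaled by the same factor

# Toy coefficient matrix (values chosen for illustration only).
B = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [0.3, 0.4]])

penalty = l1_l2_penalty(B)  # row norms are 5, 0 and 0.5
shrunk = np.vstack([group_soft_threshold(row, 1.0) for row in B])
```

With threshold 1.0, the first row keeps its direction and shrinks, while the third row (norm 0.5) is set to zero as a group.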

  3. Regression with Regularization (Group Lasso Penalty) [Figure panels: lasso penalty, group lasso penalty, L2 penalty]

  4. Lasso (Tibshirani, 1996) [Figure: regression coefficients between inputs and outputs]

  5. L1/L2-regularized Multi-task Regression (Obozinski et al., 2008) [Figure: regression coefficients between inputs and outputs]

  6. Hierarchical Selection with Nested Groups (Zhao, Rocha, and Yu, 2009). Prior knowledge on group hierarchies. [Figure panels: one-sided tree, almost-complete tree, regular tree]

  7. Grouped Variable Selection in Structured Association
     • Multi-population lasso (Puniyani et al., 2010)
       – Groups are given as groups of individuals for different populations
     • Tree-guided group lasso (Kim & Xing, 2010)
       – Groups are defined hierarchically according to a hierarchical clustering tree

  8. Population Structure in GWAS
     • Pooled analysis of multiple populations: lasso for all populations
     • Separate analysis of multiple populations: lasso for each population

  9. Population Structure in GWAS Multi-population group lasso

  10. Analysis of Lactose Intolerance Dataset [Figure panels: Eigenstrat, lasso (pooled), single-SNP test, lasso (separate), multi-population lasso]

  11. Tree-Guided Group Lasso
     • Key idea: use overlapping groups in group lasso
     [Figure: regression coefficients between inputs and outputs; a tree over the outputs with internal-node heights h1 and h2 defines the tree-guided group lasso penalty]

  12. Example: Learning Genetic Associations [Figure: inputs (SNPs, e.g. TCGACGTTTTACTGTACAATT), regression coefficients, outputs (genes)]

  13. Tree-Guided Group-Lasso Penalty
     • Hierarchical clustering tree over the outputs (tasks) as prior knowledge
       – Tree structure: clustering at multiple granularities
       – Heights of internal nodes (h1, h2): strength of clustering
     • Group-lasso-like penalty with overlapping groups
       – One group at each node of the tree

  14. Tree-Guided Group Lasso
     • In a simple case of two outputs joined at an internal node of height h:
       – Low height: tight correlation, joint selection
       – Large height: weak correlation, separate selection

  15. Tree-Guided Group Lasso
     • In a simple case of two outputs: select the child nodes (below the internal node of height h) jointly or separately?
     • The tree-guided group lasso penalty combines two terms (cf. elastic net):
       – L1 penalty: lasso penalty, separate selection
       – L2 penalty: group lasso, joint selection

  16. Tree-Guided Group Lasso
     • For a general tree: at each internal node (heights h1, h2), select the child nodes jointly or separately?
     [Figure: tree-guided group lasso, joint selection vs. separate selection]

  17. Tree-Guided Group Lasso
     • For a general tree: at each internal node (heights h1, h2), select the child nodes jointly or separately?
     • Note that the groups overlap!
     [Figure: tree-guided group lasso, joint selection vs. separate selection]

  18. Overlapping Groups in Tree-Guided Group Lasso: balanced penalization

  19. Overlapping Groups
     • Previous work (unbalanced penalization):
       – Arbitrarily overlapping groups (Jenatton, Audibert, Bach, 2009)
       – Overlapping groups over tree-structured inputs (Zhao, Rocha, Yu, 2008)

  20. Tree-Guided Group-Lasso Penalty • Penalty function where
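
The penalty equation itself is not present in this transcript. As a hedged sketch only, the code below assumes a recursive form along the lines of Kim & Xing (2010): each internal node v has a normalized height s_v in [0, 1] and a weight g_v = 1 - s_v, and contributes s_v times the sum of its children's penalties plus g_v times the L2 norm of all coefficients under v, while a leaf contributes the absolute value of its coefficient. The tuple encoding of the tree and the names `leaves` and `tree_penalty` are illustrative assumptions.

```python
import numpy as np

# A node is either a leaf (an int: the output index) or a tuple (s, children),
# where s is the assumed normalized height of the internal node and g = 1 - s.

def leaves(node):
    """Return the output indices under a node."""
    if isinstance(node, int):
        return [node]
    _, children = node
    out = []
    for child in children:
        out.extend(leaves(child))
    return out

def tree_penalty(beta, node):
    """Recursive tree-guided penalty for one input's coefficients across outputs."""
    if isinstance(node, int):
        return abs(float(beta[node]))
    s, children = node
    g = 1.0 - s
    separate = sum(tree_penalty(beta, child) for child in children)  # L1-like term
    joint = float(np.linalg.norm(beta[leaves(node)]))                # L2 term over the group
    return s * separate + g * joint

# Two-output case: the height interpolates between group lasso and lasso.
b = np.array([3.0, 4.0])
low = tree_penalty(b, (0.0, [0, 1]))   # 5.0, pure L2 (joint selection)
high = tree_penalty(b, (1.0, [0, 1]))  # 7.0, pure L1 (separate selection)
mid = tree_penalty(b, (0.5, [0, 1]))   # 6.0, halfway between the two

# General tree with overlapping groups {0, 1} and {0, 1, 2}.
tree = (0.7, [(0.2, [0, 1]), 2])
single = tree_penalty(np.array([0.0, 2.0, 0.0]), tree)
```

Under this assumed form, a coefficient vector with a single nonzero entry is penalized by exactly its absolute value whichever groups contain it (here `single` equals 2.0), which is one way to read "balanced penalization" for overlapping groups.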

  21. Unit Contour Surfaces for Various Penalty Functions [Figure panels: lasso; L1/L2; tree (g1 = 0.5, g2 = 0.5); tree (g1 = 0.2, g2 = 0.7); tree (g1 = 0.7, g2 = 0.2)]

  22. Estimating Parameters
     • Second-order cone program
       – Many publicly available software packages for solving convex optimization problems can be used
     • Also, variational formulation

  23. Proximal Gradient Descent Original Problem: Approximation Problem: Gradient of the Approximation:
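
The three equations on this slide (original problem, approximation problem, gradient of the approximation) are not present in this transcript, and the sketch below does not try to reproduce them. It illustrates plain proximal gradient descent on the simpler, non-overlapping L1/L2-regularized multi-task regression, where the proximal step has a closed form; the overlapping tree-guided penalty needs extra machinery such as the smooth approximation named on the slide. The objective 0.5 * ||Y - XB||_F^2 + lam * sum_j ||B_j||_2, the function names, the simulated data, and the value of `lam` are all assumptions for illustration.

```python
import numpy as np

def prox_row_group(B, t):
    """Row-wise group soft-thresholding: proximal operator of t * sum_j ||B_j||_2."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * B

def objective(X, Y, B, lam):
    """Squared-error loss plus the L1/L2 penalty over the rows of B."""
    return 0.5 * np.sum((Y - X @ B) ** 2) + lam * np.sum(np.linalg.norm(B, axis=1))

def proximal_gradient(X, Y, lam, n_iter=500):
    """Gradient step on the smooth loss, then the proximal step on the penalty."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y)
        B = prox_row_group(B - step * grad, step * lam)
    return B

# Simulated data: 10 inputs, 3 outputs, only inputs 0 and 4 are relevant.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
B_true = np.zeros((10, 3))
B_true[0] = [2.0, -1.0, 1.5]
B_true[4] = [-1.0, 1.0, 2.0]
Y = X @ B_true + 0.1 * rng.standard_normal((100, 3))

B_hat = proximal_gradient(X, Y, lam=5.0)
row_norms = np.linalg.norm(B_hat, axis=1)  # large for relevant inputs, near zero otherwise
```

The step size 1/L, with L the squared spectral norm of X, is a conservative fixed choice; a line search or an accelerated variant would be a natural refinement.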

  24. Geometric Interpretation
     • Smooth approximation
     [Figure panels: uppermost line, nonsmooth; uppermost line, smooth]

  25. Illustration with Simulated Data [Figure: outputs (genes) by inputs (SNPs), shaded from no association to high association; panels: true regression coefficients, lasso, L1/L2-regularized multi-task regression, tree-guided group lasso]

  26. Simulation Study: ROC Curves • Results averaged over 50 simulated datasets

  27. Simulation Study: Prediction Errors • Results averaged over 50 simulated datasets

  28. Experiments
     • Yeast dataset
       – Inputs: 21 genetic variations in chromosome 3 of yeast
       – Outputs: gene expression measurements for 3684 genes
       – Samples: 114 yeast strains
     • Goal: learn input features that perturb the output gene expression levels

  29. Yeast eQTL Analysis [Figure: hierarchical clustering tree for genes (outputs); outputs (genes) by inputs (SNPs), shaded from no association to high association; panels: lasso, L1/L2-regularized multi-task regression, tree-guided group lasso]
