  1. Learning algorithms and statistical software, with applications to bioinformatics
     PhD defense of Toby Dylan Hocking
     toby.hocking@inria.fr
     http://cbio.ensmp.fr/~thocking/
     20 November 2012

  2. Summary of contributions
     ◮ Ch. 2: clusterpath for finding groups in data, ICML 2011.
     ◮ Ch. 3: breakpoint annotations for smoothing model training and evaluation, HAL-00663790.
     ◮ Ch. 4–5: penalties for breakpoint detection in simulated and real signals, under review.
     ◮ Statistical software contributions in R:
       ◮ Ch. 7: direct labels for readable statistical graphics, Best Student Poster at useR 2011.
       ◮ Ch. 8: documentation generation to convert comments into a package for distribution, accepted in JSS.
       ◮ Ch. 9: named capture regular expressions for extracting data from text files, talk at useR 2011, accepted into R-2.14 (see the sketch after this list).
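     The named capture groups of Ch. 9 are part of base R as of version 2.14; here is a minimal sketch, with invented example data, of how regexpr exposes them:

     ## Named capture groups (?<name>...) need perl = TRUE,
     ## available in base R since version 2.14.
     lines <- c("chr1:100-200 gain", "chr2:5000-9000 loss")
     pattern <- "(?<chrom>chr[0-9XY]+):(?<start>[0-9]+)-(?<end>[0-9]+)"
     m <- regexpr(pattern, lines, perl = TRUE)
     ## Group positions and lengths come back as attributes of the match,
     ## in matrices whose columns are named after the capture groups.
     first <- attr(m, "capture.start")
     len <- attr(m, "capture.length")
     substr(lines, first[, "chrom"], first[, "chrom"] + len[, "chrom"] - 1)
     ## [1] "chr1" "chr2"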

  3. Cancer cells show chromosomal copy number alterations
     Spectral karyotypes show the number of copies of the sex chromosomes (X, Y) and autosomes (1-22). Source: Alberts et al. 2002.
     One panel shows a normal cell with 2 copies of each autosome; the other shows a cancer cell with many copy number alterations.

  4. Copy number profiles of neuroblastoma tumors

  5. Ch. 2: clusterpath finds groups in data
     Ch. 3: breakpoint annotations for smoothing model selection
     Ch. 4–5: penalties for breakpoint detection

  6. The clusterpath relaxes a hard fusion penalty
     min over α ∈ R^(n×p) of ||α − X||²_F
     subject to Σ_{i<j} 1[α_i ≠ α_j] ≤ t.
     The constraint counts the number of unfused pairs: combinatorial!
     Relaxation: Σ_{i<j} ||α_i − α_j||_q w_ij ≤ t.
     The clusterpath is the path of optimal α obtained by varying t.
     (Figure: three points X1, X2, X3 with fitted value α1 and the fused cluster α_C = α_2 = α_3.)
     Related work: "fused lasso", Tibshirani and Saunders (2005); "convex clustering shrinkage", Pelckmans et al. (2005); "grouping pursuit", Shen and Huang (2010); "sum of norms", Lindsten et al. (2011).
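     To make the hard penalty and its relaxation concrete, a small R sketch (toy data of our own) evaluating both quantities for a fixed α:

     ## Toy matrix of n = 4 points in p = 2 dimensions;
     ## rows 1 and 2 are already fused.
     alpha <- rbind(c(0, 0), c(0, 0), c(1, 1), c(2, 0))
     pairs <- combn(nrow(alpha), 2)
     ## Hard fusion penalty: the number of unfused pairs (combinatorial).
     hard <- sum(apply(pairs, 2, function(ij)
       any(alpha[ij[1], ] != alpha[ij[2], ])))
     ## Convex relaxation: sum over pairs of the l2 norm of differences,
     ## here with unit weights w_ij = 1.
     relaxed <- sum(apply(pairs, 2, function(ij)
       sqrt(sum((alpha[ij[1], ] - alpha[ij[2], ])^2))))
     c(hard = hard, relaxed = relaxed)  # hard = 5 of the 6 pairs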

  7. Choice of norm and weights alters the clusterpath
     (Slides 7–17 are animation frames of one figure: panels for norm = 1, 2, ∞ and weights γ = 0, 1, with the constraint threshold s stepping from 0 to 1 by 0.1.)
     Take X ∈ R^(10×2) and solve
     min over α of ||α − X||²_F subject to Ω(α)/Ω(X) ≤ s.
     Penalty with ℓ_q norm: Ω(Y) = Σ_{i<j} ||Y_i − Y_j||_q w_ij.
     Weights: w_ij = exp(−γ ||X_i − X_j||²_2).
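     The penalty Ω and the Gaussian weights translate directly into R; a minimal sketch (the function below is our own, not part of the clusterpath package):

     ## Omega(Y): sum over pairs of the lq norm of row differences,
     ## weighted by a Gaussian kernel on the original points X.
     omega <- function(Y, X, q = 2, gamma = 0) {
       n <- nrow(Y)
       total <- 0
       for (i in 1:(n - 1)) for (j in (i + 1):n) {
         d <- Y[i, ] - Y[j, ]
         norm.q <- if (is.finite(q)) sum(abs(d)^q)^(1 / q) else max(abs(d))
         w <- exp(-gamma * sum((X[i, ] - X[j, ])^2))
         total <- total + norm.q * w
       }
       total
     }
     set.seed(1)
     X <- matrix(rnorm(20), 10, 2)    # as on the slides: X in R^(10 x 2)
     omega(X, X, q = 1)               # gamma = 0 gives unit weights
     omega(X, X, q = Inf, gamma = 1)  # nearby pairs get larger weights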

  18. Clusterpath learns a tree, even for odd cluster shapes
     Comparison with other methods for finding 2 clusters.
     Caveat: does not recover overlapping clusters, e.g. the iris data or a Gaussian mixture.
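     For intuition about what odd cluster shapes break, a self-contained R sketch (two simulated half-moons, our own construction) contrasting two standard methods for finding 2 clusters:

     ## Two interlocking half-moons: k-means assumes compact spherical
     ## clusters and mixes the moons; single-linkage hierarchical
     ## clustering follows the curved shapes.
     set.seed(1)
     t1 <- runif(100, 0, pi); t2 <- runif(100, 0, pi)
     moons <- rbind(cbind(cos(t1), sin(t1)),
                    cbind(1 - cos(t2), 0.4 - sin(t2)))
     truth <- rep(1:2, each = 100)
     km <- kmeans(moons, centers = 2)$cluster
     sl <- cutree(hclust(dist(moons), method = "single"), k = 2)
     table(km, truth)  # k-means clusters split across both moons
     table(sl, truth)  # single linkage recovers one cluster per moon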

  19. Contributions in chapter 2, future work
     Hocking et al. Clusterpath: an Algorithm for Clustering using Convex Fusion Penalties. ICML 2011.
     ◮ Theorem: no splits occur in the ℓ1 clusterpath with identity weights w_ij = 1. What about other situations?
     ◮ Convex and hierarchical clustering algorithms:
       ◮ ℓ1 homotopy method, O(pn log n).
       ◮ ℓ2 active-set method, O(pn²).
       ◮ ℓ∞ Frank-Wolfe algorithm.
     ◮ Implementation in R package clusterpath on R-Forge; a usage sketch follows below.
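     A usage sketch for the R-Forge package; the repository URL and the function name are recalled from the package documentation and may differ by version, so treat both as assumptions:

     ## install.packages("clusterpath",
     ##                  repos = "http://r-forge.r-project.org")
     library(clusterpath)  # assumed available from R-Forge
     set.seed(1)
     x <- matrix(rnorm(20), 10, 2)
     ## l1 path with identity weights: the no-splits case of the theorem.
     ## Function name assumed; check ls("package:clusterpath") if it differs.
     path <- clusterpath.l1.id(x)
     plot(path)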

  20. Ch. 2: clusterpath finds groups in data
     Ch. 3: breakpoint annotations for smoothing model selection
     Ch. 4–5: penalties for breakpoint detection

  21. How to detect breakpoints in 23 × 575 = 13,225 signals?
     (575 copy number profiles, each with 23 chromosomes, give 23 × 575 = 13,225 chromosome-specific signals.)

  22. Which model should we use?
     ◮ GLAD: adaptive weights smoothing (Hupé et al., 2004)
     ◮ DNAcopy: circular binary segmentation (Venkatraman and Olshen, 2007)
     ◮ cghFLasso: fused lasso signal approximator with heuristics (Tibshirani and Wang, 2007)
     ◮ HaarSeg: wavelet smoothing (Ben-Yaacov and Eldar, 2008)
     ◮ GADA: sparse Bayesian learning (Pique-Regi et al., 2008)
     ◮ flsa: fused lasso signal approximator path algorithm (Hoefling, 2009)
     ◮ cghseg: pruned dynamic programming (Rigaill, 2010)
     ◮ PELT: pruned exact linear time (Killick et al., 2011); see the sketch after this list
     ... and how to select the smoothing parameter in each model?
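     PELT from this list is available in the R changepoint package; a minimal sketch on simulated data (the signal below is invented, not from the thesis), showing how the penalty acts as the smoothing parameter:

     library(changepoint)  # implements PELT (Killick et al., 2011)
     set.seed(1)
     ## Simulated logratio signal with two true breakpoints.
     y <- c(rnorm(100, mean = 0), rnorm(50, mean = 0.7), rnorm(100, mean = 0))
     ## A larger penalty per changepoint means fewer detected breakpoints.
     for (lambda in c(1, 5, 50)) {
       fit <- cpt.mean(y, method = "PELT", penalty = "Manual",
                       pen.value = lambda)
       cat("lambda =", lambda, "breakpoints at:", cpts(fit), "\n")
     }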

  23. 575 copy number profiles, each annotated in 6 regions

  24. Not enough breakpoints

  25. Too many breakpoints

  26. Good agreement with annotated regions

  27. Select the best model using the breakpoint annotations
     Breakpoint detection training errors for 3 models on data(neuroblastoma, package="neuroblastoma").
     (Figure: percent of annotations incorrectly predicted in the training set versus log10(smoothing parameter λ), in three panels: cghseg.k/pelt.n, flsa.norm, and dnacopy.sd. False positive and false negative rates are shown; the minimum error rates across the panels are 2.2, 4.8, and 11.5 percent. Smaller λ means more breakpoints, larger λ means fewer.)
     Idea: for several smoothing parameters λ, calculate the annotation error function E(λ) (black line), then select the model with least error (black dot):
     λ̂ = arg min over λ of E(λ).
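     The selection rule reduces to a grid search; a sketch in R, where the annotation format is our own simplification (one labeled region per row) and segmentation uses the changepoint package rather than the thesis code:

     library(changepoint)
     ## Hypothetical annotations: each region is labeled as containing at
     ## least one breakpoint ("breakpoint") or none ("normal").
     ann <- data.frame(start = c(80, 200), end = c(130, 240),
                       label = c("breakpoint", "normal"))
     set.seed(1)
     y <- c(rnorm(100, 0), rnorm(50, 0.7), rnorm(100, 0))
     annotation.error <- function(lambda) {
       breaks <- cpts(cpt.mean(y, method = "PELT", penalty = "Manual",
                               pen.value = lambda))
       err <- 0
       for (k in seq_len(nrow(ann))) {
         has.break <- any(breaks >= ann$start[k] & breaks <= ann$end[k])
         err <- err + (has.break != (ann$label[k] == "breakpoint"))
       }
       err
     }
     lambda.grid <- 10^seq(-1, 2, by = 0.25)
     E <- sapply(lambda.grid, annotation.error)
     lambda.hat <- lambda.grid[which.min(E)]  # arg min of E(lambda)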

  28. PELT/cghseg show the best breakpoint detection
     ROC curves for breakpoint detection training errors of each model, obtained by varying the smoothness parameter λ.
     True positive rate = probability(predict breakpoint | breakpoint).
     False positive rate = probability(predict breakpoint | normal).
     (Figure: three panels (optimization-based models, approximate optimization, glad) with curves for models including cghseg.k, pelt.n, pelt.default, cghseg.mBIC, flsa, flsa.norm, gada, haarseg, dnacopy.alpha, dnacopy.sd, dnacopy.default, dnacopy.prune, glad.default, glad.lambdabreak, and glad.MinBkpWeight.)
     Open circle shows the smoothness λ selected using annotations.
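     The two rates on the axes are just conditional proportions over annotated regions; a minimal sketch with hypothetical labels and predictions:

     ## label: ground-truth annotation of each region; pred: whether the
     ## model at a given lambda put at least one breakpoint in the region.
     label <- c("breakpoint", "breakpoint", "normal", "normal", "normal")
     pred <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
     tpr <- mean(pred[label == "breakpoint"])  # P(predict | breakpoint) = 0.5
     fpr <- mean(pred[label == "normal"])      # P(predict | normal) = 1/3
     c(TPR = tpr, FPR = fpr)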

  29. Few annotations required for a good breakpoint detector
     (Figure: percent of correctly predicted annotations on test set profiles, versus the number of annotated profiles in the global model training set, from 1 to 30. cghseg.k and pelt.n plateau near 98%, flsa.norm near 92%, dnacopy.sd near 88%, and glad.lambdabreak near 82%.)

  30. Interactive web site for annotation and model building
     Takita J et al. Aberrations of NEGR1 on 1p31 and MYEOV on 11q13 in neuroblastoma. Cancer Sci. 2011 Sep;102(9):1645-50.
