Extensions of CCA and PLS to unravel relationships between two data sets

  1. Extensions of CCA and PLS to unravel relationships between two data sets
     S. Déjean (1), I. González (2), K-A. Lê Cao (3)
     (1) Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse et CNRS, sebastien.dejean@math.univ-toulouse.fr
     (2) Plateforme Biopuces, Genopôle Toulouse Midi-Pyrénées, Institut National des Sciences Appliquées, ignacio.gonzalez@insa-toulouse.fr
     (3) ARC Centre of Excellence in Bioinformatics, Institute for Molecular Bioscience, University of Queensland, Australia, k.lecao@uq.edu.au
     Conference 2009 – Rennes, July 8-10

  2. History
     Once upon a time in Toulouse, a city in the southwest of France, two groups of scientists lived side by side without talking to each other. But one day, they decided to do so and to work together. They had Ph.D. students, wrote articles and built R packages...
     [Figure: the « Stat » group (formulas) and the « Bio » group (DNA and RNA sequences). Articles: J. Stat. Soft., J. Biol. Syst., BMC Bioinformatics, SAGMB. R packages: CCA, ofw, integrOmics.]
     UseR Conference 2009, Rennes, July 8-10. S. Déjean, I. González, K-A Lê Cao – integrOmics

  3. Why integrOmics?
     ● Each « -omics » data set can be studied separately, but
     ● much of the relevant information can be extracted from the joint analysis of two or more data sets, so
     ⇒ Integrate omics data. Project name in short: integrOmics
     math.univ-toulouse.fr/biostat
     Methodology, Applications, Software
     [Figure: a biological system described by Genomics, Transcriptomics, Proteomics, Metabolomics, Lipidomics, ...-omics data]

  4. Methodology
     Two ways to deal with the 'large p - small n' problem in the classical framework provided by Canonical Correlation Analysis (CCA) and Partial Least Squares (PLS) regression. X is an n × p matrix and Y an n × q matrix.
     CCA / regularization
     ● Maximize correlation between linear combinations of variables in X and Y
     ● Requires inversion of XX' and YY'
     ● Regularization of CCA: $(XX')^{-1} \Rightarrow (XX' + \lambda_X I_n)^{-1}$
     PLS / selection
     ● Maximize covariance between linear combinations of variables in X and Y
     ● Selection obtained through Lasso penalization of loading vectors
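     The two criteria on this slide can be written out as a rough sketch. The notation below is an assumption made for illustration and does not come from the slides: S_XX and S_YY denote the sample covariance matrices of the X and Y variables, S_XY their cross-covariance, I an identity matrix of matching size, and λ_X, λ_Y, μ_u, μ_v tuning parameters. The sparse PLS line is a generic Lasso-penalized form and not necessarily the exact criterion implemented in the package.

```latex
% CCA: maximize the correlation between the linear combinations Xa and Yb
\rho = \max_{a,\,b}\; \operatorname{cor}(Xa,\, Yb)
     = \max_{a,\,b}\; \frac{a^{\top} S_{XY}\, b}
                          {\sqrt{a^{\top} S_{XX}\, a}\,\sqrt{b^{\top} S_{YY}\, b}}

% Regularized CCA: ridge terms added before inversion (assumed notation)
S_{XX} \;\longrightarrow\; S_{XX} + \lambda_X I,
\qquad
S_{YY} \;\longrightarrow\; S_{YY} + \lambda_Y I,
\qquad \lambda_X,\ \lambda_Y > 0

% PLS: maximize the covariance under unit-norm loading vectors u and v
\max_{\|u\|_2 = \|v\|_2 = 1}\; \operatorname{cov}(Xu,\, Yv)

% Sparse PLS (generic sketch): L1 penalties shrink some loadings to zero
\max_{\|u\|_2 = \|v\|_2 = 1}\; \operatorname{cov}(Xu,\, Yv)
      \;-\; \mu_u \|u\|_1 \;-\; \mu_v \|v\|_1
```

     When the number of variables exceeds n, the unregularized covariance matrices are singular, which is why a strictly positive ridge term is needed before inversion. The L1 penalties play the complementary role for PLS: variables whose loadings are shrunk to exactly zero are dropped from the component.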

  5. Applications
     ● E. Yergeau, S.A. Schoondermark-Stolk, E.L. Brodie, S. Déjean, T.Z. DeSantis, O. Gonçalves, Y.M. Piceno, G.L. Andersen and G.A. Kowalchuk (2008). Environmental microarray analyses of Antarctic soil microbial communities. The International Society for Microbial Ecology Journal, 3, 340-351
     ● S. Combes, I. González, S. Déjean, A. Baccini, N. Jehl, H. Juin, L. Cauquil, B. Gabinaud, F. Lebas, C. Larzul (2008). Relationships between sensory and physicochemical measurements in meat of rabbit from three different breeding systems using canonical correlation analysis. Meat Science, 80(3), 835-841
     ● I. González, S. Déjean, P.G.P. Martin, O. Gonçalves, P. Besse, A. Baccini (2009). Highlighting Relationships Between Heterogeneous Biological Data Through Graphical Displays Based On Regularized Canonical Correlation Analysis. Journal of Biological Systems, 17(2), 173-199
     ● K. A. Lê Cao, D. Rossouw, C. Robert-Granié, P. Besse (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology, 7(1), article 35

  6. IntegrOmics R package (Software)

  7. Using integrOmics (Software)
     ● From two matrices X and Y
     ● Preliminary view of the correlation matrices
       R> imgCor(X, Y, type = "separate")
     ● Classical CCA
       R> res.rcc = rcc(X, Y)
     ● Regularized CCA
       R> res.rcc = rcc(X, Y, 0.05, 0.01)
     ● PLS
       R> res.pls = pls(X, Y)
     ● Sparse PLS
       R> res.pls = spls(X, Y, mode=c("regression", "canonical"),
       +               keep.X=c(10, 10, 10), keep.Y=c(10, 10, 10))

  8. Graphical display (Software)
     R> plotVar(res.rcc,
     +          X.label=TRUE, Y.label=TRUE)
     R> plotIndiv(res.rcc,
     +            ind.names=nutrimouse$diet)
     [Figure, left panel: Variables plot, axes Comp 1 and Comp 2, showing the fatty acid and gene variables. Right panel: Individuals plot, axes Dimension 1 and Dimension 2, with individuals labelled by diet (coc, fish, lin, ref, sun).]

  9. Future work (Software)
     ● Provide new graphical displays using graphs
     ● Methodologies to deal with more than 2 data sets
     ● Functional statistics to deal with metabolomics or proteomics data

  10. Summary
      Timeline, 2003 to 2009:
      ● First steps of collaborations between biologists and statisticians in Toulouse around Omics technologies
      ● First article « state of the art »
      ● Ph.D I. González (A. Baccini, J.R. Léon)
      ● Ph.D K-A. Lê Cao (P. Besse, C. Robert-Granié)
      ● R packages: CCA, ofw, integrOmics
      ● Conference, Rennes
