Bayesian Multi-Way Models for Data Translation in Computational Biology


  1. Bayesian Multi-Way Models for Data Translation in Computational Biology
     Tommi Suvitaival
     Doctoral Dissertations, Aalto-DD 171/2014
     Aalto University, School of Science, Department of Information and Computer Science (www.aalto.fi)
     ISBN 978-952-60-5932-7 (printed), ISBN 978-952-60-5933-4 (pdf)
     ISSN-L 1799-4934, ISSN 1799-4934 (printed), ISSN 1799-4942 (pdf)

     Inference of differences between samples is a fundamental problem in computational biology. Molecular measurements of biological organisms produce high-dimensional data but the number of test subjects in experiments is limited. In this thesis, computational methods are presented for finding differences between high-dimensional observations and for extensions of this problem.

     Since the effects and side-effects of new drug treatments are unknown and potentially dangerous, model organisms are used to study human diseases and their treatments. The computational translation of the outcome of an experiment from the model organism to humans is a problem, which is addressed in this thesis. The presented data translation methods identify responses to experimental treatments that are conserved across organisms.

  2. Introduction
     ◮ Molecular measurements of biological organisms to study response to:
        ◮ disease
        ◮ medical treatment
        ◮ environment
     ◮ Measurements can be made:
        ◮ in vivo: cell extracts from humans or model organisms
        ◮ in vitro: cell lines grown in laboratory
     Hilvo et al., Cancer Res. 2011

  3. Molecular activity in a biological cell
     Watson & Crick, Nature 1953
     Joyce & Palsson, Nat. Rev. Mol. Cell Biol. 2006

  4. Machine learning for computational biology
     ◮ Molecular measurements:
        ◮ Large data sets
        ◮ Uncertainty/noise
       ⇒ Automated and robust data-driven analysis tools needed
     ◮ Bayesian approach to probability:
        ◮ Take uncertainty into account
        ◮ Describe the generative process of the data
       ⇒ Integration of multiple measurement sources
        ◮ Incorporate existing knowledge by specifying:
           ◮ the model structure
           ◮ priors
     [Figure: posterior probability density of a covariate effect]
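
The idea of a posterior density over a covariate effect can be illustrated with a minimal sketch. It is not the dissertation's model: it assumes a single measured variable, Gaussian noise with known variance, a flat prior on the baseline level, and a zero-mean Gaussian prior on the effect. The function names and default values are hypothetical.

```python
import math

def posterior_effect(treated, control, noise_var=1.0, prior_var=10.0):
    """Posterior mean and variance of the effect (treated minus control).

    Assumptions (illustrative only): Gaussian noise with known variance
    noise_var, flat prior on the baseline, N(0, prior_var) prior on the effect.
    """
    n_t, n_c = len(treated), len(control)
    diff = sum(treated) / n_t - sum(control) / n_c
    # Sampling variance of the observed difference of group means
    lik_var = noise_var * (1.0 / n_t + 1.0 / n_c)
    # Conjugate normal update: precisions add, the mean is precision-weighted
    post_var = 1.0 / (1.0 / prior_var + 1.0 / lik_var)
    post_mean = post_var * diff / lik_var
    return post_mean, post_var

def prob_positive(mean, var):
    """Posterior probability that the effect is greater than zero."""
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * var)))

mean, var = posterior_effect([1.2, 0.8, 1.0], [0.1, -0.1, 0.0])
print(mean, var, prob_positive(mean, var))
```

With few samples the prior pulls the estimate toward zero and the posterior variance stays large, which is how the uncertainty mentioned on the slide enters the result.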

  7. Computational medicine & contributions
     ◮ Model organisms for studying effects of:
        ◮ genomic mutations
        ◮ new medical treatments, potentially dangerous
     ◮ Dissertation: statistical modeling of effects in molecular measurement data with
        ◮ high-dimensional, noisy measurements
        ◮ multiple measurement types
        ◮ multiple organisms
     Kaski, MLAB 2013

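  8. P I: Multi-Way Model for “n < p”
     [Figure, three panels]
     (1) Data: data space of 100...300 metabolites; covariate a (healthy / diseased) and covariate b (untreated / treated)
     (2) Model: ANOVA-type effects α, β and αβ of covariates a and b on latent variables x^lat; factor analysis (FA) with loadings V and mean µ generating the observed data x for n samples
     (3) Result: [plot not recoverable]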
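
The generative structure named on slide 8 (ANOVA-type effects α, β, αβ on latent variables, mapped to the observed data through factor loadings V and a mean µ) can be sketched as a small simulation. This is an illustrative reading of the diagram, not the dissertation's implementation. The effect sizes, the number of latent variables, the noise levels, and the choice that each observed variable loads on exactly one latent variable are all assumptions made for the example.

```python
import random

K = 2  # number of latent variables (assumed for illustration)

def simulate(n_per_cell=3, p=100, latent_sd=1.0, obs_sd=0.5, seed=0):
    """Simulate a two-way design with n < p under hypothetical effects."""
    rng = random.Random(seed)
    # Hypothetical effects on the latent variables
    alpha = {0: [0.0, 0.0], 1: [2.0, 0.0]}   # covariate a: healthy / diseased
    beta = {0: [0.0, 0.0], 1: [0.0, 1.0]}    # covariate b: untreated / treated
    alpha_beta = {(1, 1): [-1.0, 0.0]}       # interaction: diseased and treated
    # Assumed loadings: variable j belongs to latent variable j mod K
    V = [[1.0 if j % K == c else 0.0 for c in range(K)] for j in range(p)]
    mu = [rng.gauss(0.0, 1.0) for _ in range(p)]
    samples = []
    for a in (0, 1):
        for b in (0, 1):
            inter = alpha_beta.get((a, b), [0.0] * K)
            for _ in range(n_per_cell):
                # Latent level: sum of main effects, interaction and noise
                x_lat = [alpha[a][c] + beta[b][c] + inter[c]
                         + rng.gauss(0.0, latent_sd) for c in range(K)]
                # Observed level: x = mu + V x_lat + noise
                x = [mu[j] + sum(V[j][c] * x_lat[c] for c in range(K))
                     + rng.gauss(0.0, obs_sd) for j in range(p)]
                samples.append((a, b, x))
    return mu, samples

mu, samples = simulate()
print(len(samples), "samples,", len(samples[0][2]), "variables")
```

The design choice illustrated here is that the effects are defined on K latent variables rather than on all p observed variables, so the number of effect parameters does not grow with p when n < p.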

  9. P II–III: Multi-Way Models for Multi-Peak Metabolomics
     a) Peak clustering based on shapes
        [Figure: intensity versus retention time for samples i = 1...4 with covariate levels a_i = 1, 1, 2, 2; the peaks j of each sample are assigned to clusters k = 1, 2]
     b) Inference of covariate effects based on intensity
        [Figure: peak intensities by sample and covariate level; result shown as the posterior probability density of the covariate effect for cluster 1 and cluster 2]
     LIPID MAPS 2014

  10. P IV: Multi-Way Model for Multiple Sources
     (1) Data: data space 1 and data space 2 with no matched variables and different dimensionalities; paired samples; covariate a (healthy / diseased) and covariate b (untreated / treated)
     (2) Model: ANOVA-type effects α, β and αβ on a shared latent variable z; CCA part with W_x, W_y, Ψ_x and Ψ_y linking z to the latent variables x^lat and y^lat; FA part with loadings V_x, V_y and means µ_x, µ_y generating the observed data x and y for n samples
     (3) Result: [Figure: shared, X-specific and Y-specific effects plotted against the number of samples n (20, 50, 100, 200, 500)]

  11. P V: Cross-Organism Toxicogenomics
     Data: three views of the same treatments: human in vitro, rat in vitro, rat in vivo
     Model: observed data ≈ latent variables × factor loadings, with view-specific loadings (real numbers or zero) over the components
     Result: [Figure: components A to H linking organ-level pathological findings (for example hypertrophy, swelling, single cell necrosis, increased mitosis, cellular infiltration) to molecular-level GO terms (for example oxidation-reduction process, small molecule metabolic process, cell cycle DNA replication, cell division, DNA packaging)]
     → Multi-level cross-organism drug responses: organ level (pathological findings), factors, molecular level (GO terms)

  12. P VI–VII: Cross-Organism Multi-Way Model
     a) Data: organism X and organism Y, each with its own data space
        ◮ no matched variables, different dimensionalities
        ◮ no paired samples
        ◮ time series (covariate a): varying lengths, unknown alignments
        ◮ covariate b_X and covariate b_Y: healthy / diseased
     b) Matching clusters based on their effect profiles
        [Figure: time effect (a = 1...5) and disease effect (b = 1...5) profiles of the clusters in each organism]

  13. Summary
     New machine learning models for:
     ◮ P I: Small sample size, high dimensionality (n < p)
     ◮ P II–III: Incorporating prior information about the measurement process
     ◮ P IV–V: Multiple data sources with co-occurring samples
     ◮ P VI–VII: Multiple data sources without co-occurring samples
