Bayesian Decomposition: Expression to Pathways (Michael Ochs)


  1. Bayesian Decomposition: Expression to Pathways
     Michael Ochs
     Bioinformatics Group, Fox Chase Cancer Center

  2. Cancer Biology
     Cancer is Many Diseases but with a Single Theme
     • a cell becomes immortal
     • a cell becomes mobile
     Signalling and Metabolic Pathways Hold the Key
     (image source: www.biosource.com)

  3. Signalling Pathways
     [Figure labels: Stimulus, Signal Transduction, Transcription, mRNA. Source: Downward, Nature, 411, 759, 2001]

  4. Identifying Pathways
     Interacting Pathways Lead to Confusion if All Genes Need to Lie in a Single Cluster
     [Figure labeled mRNA; image source: www.promega.com]

  5. Bayesian Decomposition
     • Data Mining/Pattern Recognition Algorithm
       – Unsupervised Method
       – Create Multiple, Overlapping “Clusters”
         • Each Gene can be in Multiple Patterns
         • Get to Pathways: Key for Cancer Development
     • Methodology
       – Markov Chain Monte Carlo Algorithm
       – Simulated Annealing
       – Integration of Prior Knowledge

  6. BD: Matrix Decomposition
     [Figure: the data matrix (gene 1 to gene N by Exp 1 to Exp M, "Data with different behaviors") equals the "Distribution of Patterns" matrix (gene 1 to gene N by pattern 1 to pattern k) times the "Patterns of Behavior" matrix (pattern 1 to pattern k by Exp 1 to Exp M).]
     The behavior of one gene can be explained as a mixture of patterns
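
The matrix picture on this slide can be tried out on toy numbers. The following is a minimal sketch, not code from the talk: the names `D`, `A`, and `P`, the matrix sizes, and the gamma-distributed entries are all assumptions chosen for illustration. It builds a data matrix as the product of a distribution matrix and a pattern matrix, then checks that one gene's row is a weighted mixture of the pattern rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration.
n_genes, n_experiments, n_patterns = 6, 5, 2

# A: distribution matrix (genes x patterns), how strongly each gene uses each pattern.
A = rng.gamma(shape=1.0, scale=1.0, size=(n_genes, n_patterns))
# P: pattern matrix (patterns x experiments), one behavior across experiments per row.
P = rng.gamma(shape=1.0, scale=1.0, size=(n_patterns, n_experiments))

# D: data matrix (genes x experiments), the product A P plus a little noise.
D = A @ P + rng.normal(scale=0.01, size=(n_genes, n_experiments))

# The behavior of gene 0 is a mixture of the pattern rows, weighted by row 0 of A.
gene0_mixture = sum(A[0, k] * P[k] for k in range(n_patterns))
print(np.round(D[0] - gene0_mixture, 3))  # residual is only the added noise
```

Because every gene has its own row of weights in `A`, a gene can take part in several patterns at once, which is the overlap that a single-cluster assignment cannot express.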

  7. BD: Domains
     [Figure: the A Atomic Domain and the P Atomic Domain are each linked by convolution to A and P in the Model Domain; the product A X P gives the Data in the Data Domain.]

  8. BD: Markov Chain MC
     • Based on Maximum Entropy Data Consultants Massive Inference Sampler
     • Cloud in N-Dimensional Space: Probability Density for the Model
     • Results from Atomic Domain Prior, Model Functions (Prior), and the Likelihood
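
The statement that the probability density comes from the priors and the likelihood is Bayes' rule. As a sketch, with the symbols D (data), A, P, and the per-point uncertainties sigma_ij introduced here as assumptions, and with a Gaussian likelihood assumed rather than taken from the slides:

```latex
p(A, P \mid D) \;\propto\; p(D \mid A, P)\, p(A, P),
\qquad
p(D \mid A, P) \;\propto\; \exp\!\left[-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{M}
\frac{\bigl(D_{ij} - \sum_{k} A_{ik} P_{kj}\bigr)^{2}}{\sigma_{ij}^{2}}\right]
```

Here i runs over the N genes, j over the M experiments, and k over the patterns. The Markov chain draws samples of (A, P) from the left-hand density, which is what forms the cloud in N-dimensional space.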

  9. BD Requirements
     • Data Points > (A + P) Points
     • Atomic Domains (Sibisi and Skilling, J R Stat Soc B, 59, 217, 1997)
       – Positive Additive Distributions
       – Infinitely Divisible Process
     • Model Domains
       – Linked to Atomic Domains by Model Function
       – Correlations between Parameters are Introduced by Model Functions (Atomic > Model)
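
The first requirement can be turned into a quick count. This sketch assumes that "(A + P) Points" means the number of entries in the A and P matrices, which the slide does not spell out; the function name `max_patterns` is hypothetical. The 764 genes and 228 experiments are the filtered data size quoted on the Analysis slide.

```python
def max_patterns(n_genes: int, n_experiments: int) -> int:
    """Largest k with n_genes * n_experiments > k * (n_genes + n_experiments)."""
    data_points = n_genes * n_experiments
    per_pattern = n_genes + n_experiments  # one column of A plus one row of P
    k = data_points // per_pattern
    # The inequality is strict, so step back when the division is exact.
    if k * per_pattern == data_points:
        k -= 1
    return k


print(max_patterns(764, 228))  # prints 175
```

Under this reading the data size leaves a lot of headroom: 764 x 228 = 174,192 data points against 992 parameters per pattern.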

  10. BD Features
      • Basis Vectors (Patterns) are Nonorthogonal
        – Physically Meaningful if Good Model
        – Artifacts Removed if Do Not Fit Model
      • Noise is Treated
        – Noise is Integral Part of Fitting Process
        – Artifacts Often Appear in Residuals (i.e., noise)
      • Markov Chain Sampling Yields
        – Mean of Probable Distributions and Patterns
        – Uncertainties for Distributions and Patterns
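
To illustrate how Markov chain sampling yields both a mean and an uncertainty for every entry of the distributions and patterns, here is a toy sketch. It is a plain random-walk Metropolis sampler with an annealed burn-in, not the Massive Inference sampler used by Bayesian Decomposition, and it swaps in a simple exponential prior for the atomic domain prior. All function names, parameters, and data are assumptions.

```python
import numpy as np


def log_posterior(D, A, P, sigma, lam, beta):
    """Annealed Gaussian log-likelihood plus an exponential prior on A and P."""
    resid = D - A @ P
    log_like = -0.5 * np.sum(resid ** 2) / sigma ** 2
    log_prior = -lam * (A.sum() + P.sum())
    return beta * log_like + log_prior


def sample_decomposition(D, k, sigma=0.1, lam=1.0, n_burn=200, n_keep=200,
                         step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n, m = D.shape
    A = np.full((n, k), 0.5)
    P = np.full((k, m), 0.5)
    kept_A, kept_P = [], []
    for sweep in range(n_burn + n_keep):
        # Simulated annealing: beta ramps up to 1 during burn-in, then stays at 1.
        beta = min(1.0, (sweep + 1) / n_burn)
        for M in (A, P):
            for idx in np.ndindex(M.shape):
                old = M[idx]
                new = old + step * rng.normal()
                if new < 0:
                    continue  # outside the positive support: reject the move
                lp_old = log_posterior(D, A, P, sigma, lam, beta)
                M[idx] = new
                lp_new = log_posterior(D, A, P, sigma, lam, beta)
                # Metropolis rule: keep the move with probability min(1, ratio).
                if np.log(rng.random()) >= lp_new - lp_old:
                    M[idx] = old
        if sweep >= n_burn:
            kept_A.append(A.copy())
            kept_P.append(P.copy())
    kept_A, kept_P = np.array(kept_A), np.array(kept_P)
    return kept_A.mean(0), kept_A.std(0), kept_P.mean(0), kept_P.std(0)


# Toy data: two hidden patterns, six genes, five experiments.
rng = np.random.default_rng(1)
A_true = rng.uniform(0.2, 2.0, size=(6, 2))
P_true = rng.uniform(0.2, 2.0, size=(2, 5))
D = A_true @ P_true + rng.normal(scale=0.1, size=(6, 5))

A_mean, A_std, P_mean, P_std = sample_decomposition(D, k=2)
print(np.round(A_mean, 2))
print(np.round(A_std, 2))
```

The kept samples are the cloud of probable solutions; their per-entry mean and standard deviation play the role of the "Mean" and "Uncertainties" bullets. Note that A and P are only determined up to a rescaling between them, so the sampled means need not match `A_true` and `P_true` entry by entry.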

  11. BD: Gene Expression
      [Figure: the A Atomic Domain maps to the distribution matrix (gene 1 to gene N by pattern 1 to pattern k) and the P Atomic Domain maps to the pattern matrix (pattern 1 to pattern k by Exp 1 to Exp M).]

  12. Rosetta Data Set
      • Filtering
        – Eliminate Genes
          • > 25% Data Missing in Ratios or Uncertainties
          • < 2 Experiments with 3-Fold Change
        – Eliminate Experiments
          • < 2 Genes Changing by 3-Fold
      • Uncertainties
        – Used Values from Rosetta Error Model
        – Missing Data: Log Ratio = 1, Log Unc = 100
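
The filtering rules above could be sketched as follows. Several details are assumptions, since the slide does not give them: missing values are encoded as NaN, ratios are log10 so a 3-fold change means an absolute log ratio of at least log10(3), genes are filtered before experiments, and the function name `filter_expression` is hypothetical. The fill values for missing data are taken as stated on the slide.

```python
import numpy as np

FOLD = np.log10(3.0)  # assumed: ratios are log10, so 3-fold means |log ratio| >= log10(3)


def filter_expression(log_ratio, log_unc, max_missing=0.25, min_hits=2,
                      fill_ratio=1.0, fill_unc=100.0):
    """Apply the gene and experiment filters, then fill the remaining gaps."""
    missing = np.isnan(log_ratio) | np.isnan(log_unc)
    # A missing ratio never counts as a 3-fold change.
    changed = np.abs(np.nan_to_num(log_ratio, nan=0.0)) >= FOLD

    # Genes: drop if more than 25% missing or fewer than 2 experiments changed.
    keep_genes = (missing.mean(axis=1) <= max_missing) & (changed.sum(axis=1) >= min_hits)
    # Experiments: drop if fewer than 2 of the remaining genes changed.
    keep_exps = changed[keep_genes].sum(axis=0) >= min_hits

    sub = np.ix_(keep_genes, keep_exps)
    ratio, unc, gap = log_ratio[sub].copy(), log_unc[sub].copy(), missing[sub]
    # Missing data: fill values as given on the slide (Log Ratio = 1, Log Unc = 100).
    ratio[gap] = fill_ratio
    unc[gap] = fill_unc
    return ratio, unc, keep_genes, keep_exps


# Toy matrix: 5 genes x 4 experiments, NaN marks missing data.
nan = np.nan
log_ratio = np.array([
    [0.6, -0.7, 0.0, 0.1],   # two 3-fold changes: kept
    [0.6, 0.0, 0.0, 0.0],    # one 3-fold change: dropped
    [nan, nan, 0.9, 0.8],    # 50% missing: dropped
    [0.5, nan, 0.6, 0.0],    # 25% missing, two changes: kept
    [-0.5, 0.9, 0.7, 0.0],   # three changes: kept
])
log_unc = np.full(log_ratio.shape, 0.1)

ratio, unc, keep_genes, keep_exps = filter_expression(log_ratio, log_unc)
print(keep_genes, keep_exps)
print(ratio)
```

The very large uncertainty on filled points means they carry almost no weight in a fit that divides residuals by the uncertainty, so the fill value itself matters little.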

  13. Analysis
      • Analyzed Full Experimental Data with PCA
        – Estimate of Dimensionality of Data
      • Bayesian Decomposition
        – Filtered Data: 764 Genes, 228 Experiments
        – Ran Multiple Seeds, Multiple Pattern Numbers
        – Focus on Dimensions Suggested by PCA
      • Data Driven
        – Let Analysis Determine Where to Look
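
Using PCA to estimate dimensionality amounts to looking at how quickly the eigenvalues fall off. A minimal sketch on synthetic data follows; treating experiments as the variables (columns), the function name `pca_eigenvalues`, and the toy matrix are assumptions, not details from the talk.

```python
import numpy as np


def pca_eigenvalues(X):
    """Eigenvalues of the column covariance matrix of X, largest first."""
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values ** 2 / (X.shape[0] - 1)


# Toy data: 200 "genes" x 20 "experiments" built from 3 hidden components plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 20))
X += 0.01 * rng.normal(size=X.shape)

eig = pca_eigenvalues(X)
print(np.round(eig[:5], 4))  # a sharp drop after the third value suggests 3 dimensions
```

On real expression data the drop is rarely this clean, which is presumably why the analysis ran several pattern numbers around the PCA estimate rather than a single one.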

  14. PCA Results
      [Plot: Score (Eigenvalue) versus Principal Component]

  15. Bayesian Decomposition
      • Distributions
        – Assignment of Genes to Patterns
      • Patterns
        – Each Pattern Defines Behavior Across Experiments
      • Experimental Patterns
        – Experiments explained by a single pattern
        – Correlations between experiments
      • Genes in Patterns
        – Identify biological processes
        – Identify correlations in genes

  16. Experiments High in One Pattern • Pattern 1 • Pattern 3 – YHR034C 56% – ssn6 (cyc8) 76% – YER024W 56% – tup1 54% • Pattern 2 – rpd3 89% • Pattern 5 – YJL107C 53% – yap3 51%
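
The slides do not say how the percentage of an experiment explained by a pattern was computed. One plausible reading, sketched here purely as an assumption, is to take each pattern's total contribution to an experiment (summed over genes) and normalize so the shares for that experiment add up to 100%. The function names, the toy matrices, and the experiment names `exp_a` and `exp_b` are hypothetical.

```python
import numpy as np


def pattern_shares(A, P):
    """Share of each experiment (column) attributed to each pattern (row)."""
    # Total contribution of pattern k to experiment j, summed over genes.
    contrib = A.sum(axis=0)[:, None] * P
    return contrib / contrib.sum(axis=0, keepdims=True)


def dominant_experiments(A, P, names, threshold=0.5):
    """Experiments whose largest pattern share exceeds the threshold."""
    shares = pattern_shares(A, P)
    best = shares.argmax(axis=0)
    return {
        name: (int(best[j]) + 1, float(shares[best[j], j]))  # 1-based pattern number
        for j, name in enumerate(names)
        if shares[best[j], j] > threshold
    }


A = np.array([[1.0, 1.0], [1.0, 3.0]])   # genes x patterns
P = np.array([[3.0, 1.0], [0.5, 1.0]])   # patterns x experiments
result = dominant_experiments(A, P, ["exp_a", "exp_b"])
print(result)
```

Weighting by the column sums of `A` makes the shares unchanged when a column of `A` is scaled up and the matching row of `P` is scaled down by the same factor, so the result does not depend on that arbitrary choice of scale. An experiment with no signal in any pattern would divide by zero and needs separate handling.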

  17. Genes in Patterns (Proteome Database Cellular Role)
      • Pattern 1: AA Pattern
        – 403 Genes
        – 22/36 AA metabolism
        – 9 additional metabolism
      • Pattern 2
        – 410 Genes
        – 7/27 metabolism
        – 7/27 DNA/RNA processing
        – 6 transport
      • Pattern 3: Metabolic Pattern
        – 390 Genes
        – 13/26 metabolism
        – 6 transport, 4 Pol II
      • Pattern 4
        – 276 Genes, 30/50 unknown
      • Pattern 5: Carbo Pattern
        – 355 Genes
        – 14/37 carbohydrate metabolism
        – 7/37 cell stress
        – 6 transport
      • Pattern 6
        – 297 Genes, 30/50 unknown
      • Pattern 7: Mating Pattern
        – 223 Genes
        – 13/23 mating response
        – 5/23 meiosis

  18. Metabolic Patterns
      • Patterns 1 and 5
        – yap3 98%
        – YJL107C 98%
        – YHR034C 98%
        – FR901,228 98%
      • Patterns 1, 3, and 5
        – ssn6 100%
        – swi6 99%
        – yap3 98%
        – YJL107C 98%
        – YHR034C 98%
        – FR901,228 98%

  19. Metabolic Pattern
      [Bar chart: Behavior Explained by Pattern (0% to 80%) across the Metab, Mating, AA, and Carbo patterns for ssn6 (haploid), tup1 (haploid), and yer024w.]

  20. Sterile Family Proteins
      [Bar chart: Behavior Explained by Pattern (0% to 45%) across AA, Patt 2, Metab, Patt 4, Carbo, Patt 6, and Mating for ste11, ste12, ste18, ste2, ste24, ste4, ste5, and ste7 (all haploid) and ste20 (**11).]

  21. Ste2
      [Bar chart: Behavior Explained by Pattern (0% to 45%) across AA, Patt 2, Metab, Patt 4, Carbo, Patt 6, and Mating for ste2 (haploid) and yil117c (haploid).]
      YIL117C is prm5, a pheromone-regulated protein of unknown function

  22. Mating Pattern
      [Bar chart: Behavior Explained by Pattern (0% to 40%) across the Metab, Mating, AA, and Carbo patterns for dig1, dig2 (haploid); dig1, dig2; dig1; dig2; ste20 (**11); and fus3 (haploid).]

  23. Mating Pathway
      [Figure source: Posas et al., Curr Opin Microbiology, 1, 175, 1998]

  24. Mating Pattern
      [Bar chart: Behavior Explained by Pattern (0% to 50%) across Metab, Mating, AA, Carbo, and 6 for fus3 (haploid); fus3, kss1 (haploid); ste11 (haploid); ste7 (haploid); ste5 (haploid); and ste12 (haploid).]

  25. Correlations (Fus3/Kss1)
      [Bar chart: Ratio of Expression (0 to 3) for fus3, kss1 (haploid) and fus3 (haploid) across the genes YAL012W, YAL034W, YAL066W, YAR009C, YAR047C, YAR070C, YBL005W, YBL043W, YBL049W, YBL098W, YBL101W, YBR012C, YBR012W, and YBR040W.]

  26. Mating Pattern Correlations
      [Bar chart: Ratio of Expression (0 to 3) for fus3, kss1 (haploid) and fus3 (haploid) across mating-pattern genes, including MEI5, STE2, PES4, FIG1, SPO19, AGA2, FUS1, SST2, BAR1, PRR2, AGA1, HOP2, and TEC1, along with ORFs labeled by systematic name such as YAL018C, YMR082C, and YFR057W. Some gene labels carry one to three asterisks in the original chart.]

  27. Pattern 6
      • Mating Pattern -> Pattern 6 for Ste11, Ste7, Ste5, and Ste12
      • PSI-BLAST and SMART pick up 5 matches among unknown ORFs to transposon and retroposon proteins

  28. Conclusions
      • Life is Very Complex
        – Multiple Pathways and Interactions for Each Protein with Transcription/Translation
        – Natural Stochastic Variations
      • Analysis Tools Must
        – Isolate Areas of Interest without Loss of Knowledge Discovery
        – Incorporate Maximal Prior Knowledge to Reduce “Search Space”
