
Analysis of High-Throughput Biological Data Part II: Computational Bottlenecks and Novel Applications


  1. NZIMA, Napier, 2008. Analysis of High-Throughput Biological Data Part II: Computational Bottlenecks and Novel Applications. Mike Langston, Professor, Department of Electrical Engineering and Computer Science, University of Tennessee, and Collaborating Scientist, Biological Sciences Division, Oak Ridge National Laboratory, USA. 21 February 2008.

  2. Outline of Talk • Foundations • Gene Coexpression Analysis • Data Integration • Application to Human Health • Protein Complex Prediction • Application to Model Organisms

  3. Outline of Talk • Foundations • Gene Coexpression Analysis • Data Integration • Application to Human Health • Protein Complex Prediction • Application to Model Organisms

  4. Foundations: Systems Biology • How do biological entities function in unison and at all levels of scale? • Linkage, communication and networks (graphs!)

  5. Foundations: Systems Biology, Correlation. Here are five mouse genes with Pearson correlations of at least 0.65. What of • noise? • experimental design? • circadian rhythms? • other confounds? • other metrics?
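To make the 0.65 threshold concrete, here is a minimal sketch of how a group of genes could be checked for pairwise Pearson correlation. The function names and the toy expression profiles are assumptions made for illustration; they are not taken from the talk, and the signed (rather than absolute) comparison is likewise an assumption.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither profile is constant (nonzero variance).
    return cov / sqrt(var_x * var_y)


def all_pairs_at_least(profiles, threshold=0.65):
    """True if every pair of genes correlates at or above the threshold."""
    return all(
        pearson(profiles[g], profiles[h]) >= threshold
        for g, h in combinations(sorted(profiles), 2)
    )


# Hypothetical profiles: one expression value per condition.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneD": [1, 2, 3, 5],
}
print(all_pairs_at_least(profiles))
```

A group that passes this check forms a clique in the thresholded correlation graph, which is why clique-finding appears later in the talk.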

  6. Foundations: Systems Biology, Correlation Coefficient Profiles. Sometimes via • Pearson • Spearman • Mutual Information • etc. Other times we need • p-values • Bonferroni corrections • q-values • false discovery rates...
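The multiple-testing corrections named on this slide can be sketched as follows. This is an illustrative sketch only: Benjamini-Hochberg is used here as one common way to control the false discovery rate, which the slide does not specify, and the function names and p-values are made up.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    # Find the largest rank whose p-value lies under the rising BH line.
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


pvals = [0.001, 0.02, 0.03, 0.5]  # hypothetical p-values
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Bonferroni is the stricter of the two, so with many gene pairs it tends to keep far fewer correlations than an FDR-based cutoff.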

  7. Foundations: Systems Biology, Correlation, Omics: key to deciphering complex systems

  8. Foundations: Systems Biology, Correlation, Omics: key to deciphering complex systems. Humans: 10^13+ cells, 200+ cell types

  9. Foundations: Systems Biology, Correlation, Omics: key to deciphering complex systems. Humans: 10^13+ cells, 200+ cell types. Genome (blueprint, 20K+ genes, 10M+ polymorphisms)

  10. Foundations: Systems Biology, Correlation, Omics: key to deciphering complex systems. Humans: 10^13+ cells, 200+ cell types. Genome (blueprint, 20K+ genes, 10M+ polymorphisms). Proteome (functional units, unknown # of proteins)

  11. Foundations: Systems Biology, Correlation, Omics: key to deciphering complex systems. Humans: 10^13+ cells, 200+ cell types. Genome (blueprint, 20K+ genes, 10M+ polymorphisms). Proteome (functional units, unknown # of proteins). Transcriptome: translation (tRNA) via transcription (mRNA); function and signaling (siRNA, miRNA, etc.)

  12. Foundations: Systems Biology, Correlation, Omics: key to deciphering complex systems. Humans: 10^13+ cells, 200+ cell types. Genome (blueprint, 20K+ genes, 10M+ polymorphisms). Proteome (functional units, unknown # of proteins). Transcriptome: translation (tRNA) via transcription (mRNA); function and signaling (siRNA, miRNA, etc.). Other: metabolome, lipidome, interactome, omeome!

  13. Foundations: Systems Biology, Correlation, Omics, Visualization - highly dependent on scale

  14. Foundations: Systems Biology, Correlation, Omics, Visualization - highly dependent on scale - the only omics often seen is a “ridiculome”

  15. Foundations: Systems Biology, Correlation, Omics, Visualization, Computational Tools - focus usually on dense subgraphs

  16. Foundations: Systems Biology, Correlation, Omics, Visualization, Computational Tools, Maximum Clique • must run often • time is a limiting factor • exploit fixed-parameter tractability (FPT)
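One standard way to bring fixed-parameter tractability to bear on maximum clique is to solve vertex cover on the complement graph: a graph on n vertices has a clique of size n - k exactly when its complement has a vertex cover of size k. The sketch below uses the textbook bounded search tree for vertex cover; it is an illustration under that assumption, not the speaker's codes, and the function names are hypothetical.

```python
from itertools import combinations


def vertex_cover(edges, k):
    """Return a vertex cover of size at most k, or None if none exists.

    edges is a set of 2-element frozensets (simple graph, no self-loops).
    Bounded search tree: some endpoint of any edge must be in the cover,
    so branch on both endpoints; the tree has depth at most k.
    """
    if not edges:
        return set()
    if k == 0:
        return None
    u, v = next(iter(edges))
    for w in (u, v):
        remaining = {e for e in edges if w not in e}
        cover = vertex_cover(remaining, k - 1)
        if cover is not None:
            return cover | {w}
    return None


def maximum_clique(vertices, edges):
    """Maximum clique via a minimum vertex cover of the complement graph."""
    vertices = set(vertices)
    present = {frozenset(e) for e in edges}
    complement = {frozenset(p) for p in combinations(vertices, 2)} - present
    # Try covers of increasing size; the first hit is a minimum cover.
    for k in range(len(vertices) + 1):
        cover = vertex_cover(complement, k)
        if cover is not None:
            return vertices - cover


graph_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
print(maximum_clique([1, 2, 3, 4, 5], graph_edges))
```

The running time is exponential only in the cover size k, which is what makes repeated runs on large but suitably structured graphs feasible.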

  17. Foundations: Systems Biology, Correlation, Omics, Visualization, Computational Tools, Maximum Clique, Maximal Clique • huge outputs • various orderings • memory is often the limiting factor
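Because the number of maximal cliques can be enormous, one way to ease the memory pressure is to emit cliques one at a time instead of storing them all. The sketch below uses the classic Bron-Kerbosch algorithm with pivoting, written as a generator. It is an assumption that this algorithm is a fair stand-in; the talk does not say which enumeration method or ordering is used.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of a graph given as {vertex: set_of_neighbors}."""

    def expand(r, p, x):
        # r: current clique, p: candidates, x: vertices already explored.
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
for clique in maximal_cliques(adj):
    print(sorted(clique))
```

A caller can stream the cliques to disk or filter them by size on the fly, so only the recursion stack needs to fit in memory.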

  18. Foundations: Systems Biology, Correlation, Omics, Visualization, Computational Tools, Maximum Clique, Maximal Clique, Biclique • new algorithms • bipartite graphs

  19. Foundations: Systems Biology, Correlation, Omics, Visualization, Computational Tools, Maximum Clique, Maximal Clique, Biclique, Paraclique • noisy data
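The slide does not define the paraclique, so the following is only a simplified sketch of the general idea: start from a clique and absorb any outside vertex that is adjacent to all but a few current members, which tolerates edges lost to noise. The `glom` parameter name and the exact absorption rule are assumptions made for illustration.

```python
def paraclique(adj, clique, glom=1):
    """Grow a clique by adding vertices that miss at most `glom` members.

    adj maps each vertex to its set of neighbors; clique is the seed.
    """
    members = set(clique)
    grew = True
    while grew:
        grew = False
        for v in sorted(set(adj) - members):
            # Count current members that v is NOT adjacent to.
            if len(members - adj[v]) <= glom:
                members.add(v)
                grew = True
    return members


# Hypothetical graph: {1,2,3,4} is a clique, 5 misses only vertex 4,
# and 6 is attached to vertex 1 alone.
adj = {
    1: {2, 3, 4, 5, 6},
    2: {1, 3, 4, 5},
    3: {1, 2, 4, 5},
    4: {1, 2, 3},
    5: {1, 2, 3},
    6: {1},
}
print(sorted(paraclique(adj, {1, 2, 3, 4}, glom=1)))
```

With `glom=0` the seed clique is returned unchanged unless it was not maximal, so the parameter controls how much missing-edge noise is forgiven.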

  20. Outline of Talk • Foundations • Gene Coexpression Analysis • Data Integration • Application to Human Health • Protein Complex Prediction • Application to Model Organisms

  21. Coexpression Analysis. [Figure: toolchain] cDNA or mRNA microarrays → raw data → normalization → gene expression profiles → correlation computation → real-valued matrix → edge-weighted complete graph → high-pass filtering → unweighted incomplete graph. Other labels: unsupervised methods (principal component analysis, graph transforms, k-means clustering, ...); maximum clique (FPT VC codes), maximal clique, k-connected components, HCS, k-cores, clique-centric subgraph methods, biclique, paraclique; NP-complete problems; HPC and novel methods. Axis: increasing edge density (and increasing problem complexity).
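The high-pass filtering step in this toolchain turns an edge-weighted complete graph into an unweighted incomplete one by keeping only the strong edges. A minimal sketch follows, assuming the weights are correlations stored per gene pair and that strength is judged by absolute value; both choices, and the function name, are assumptions rather than details from the slide.

```python
def high_pass_filter(weights, threshold):
    """Keep edges whose absolute weight meets the threshold.

    weights maps (gene_u, gene_v) pairs to a real-valued weight.
    Returns an adjacency dict {gene: set_of_neighbors}.
    """
    adj = {}
    for (u, v), w in weights.items():
        # Every gene stays a vertex even if all of its edges are dropped.
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v and abs(w) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


# Hypothetical correlation values for three genes.
weights = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): -0.7}
print(high_pass_filter(weights, threshold=0.65))
```

Raising the threshold lowers edge density, which trades sensitivity for smaller and easier clique problems downstream.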

  22. Coexpression Analysis. Gene (vertex) comparisons: • differential expression • does not require multiple conditions • compare the two lists of gene expression levels

  23. Coexpression Analysis. Correlate (edge) comparisons: • differential correlation • requires multiple conditions in control versus stimulus • compare two lists of gene-gene correlations
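Comparing the two lists of gene-gene correlations could look like the sketch below, which flags gene pairs whose Pearson correlation shifts between control and stimulus by at least a chosen amount. The `min_change` cutoff, the function names, and the toy data are assumptions for illustration; the talk does not give a specific test.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def differential_correlation(control, stimulus, min_change=0.5):
    """List gene pairs whose correlation differs by at least min_change."""
    genes = sorted(set(control) & set(stimulus))
    changed = []
    for g, h in combinations(genes, 2):
        r_control = pearson(control[g], control[h])
        r_stimulus = pearson(stimulus[g], stimulus[h])
        if abs(r_control - r_stimulus) >= min_change:
            changed.append((g, h, r_control, r_stimulus))
    return changed


# Hypothetical profiles; gene b flips direction under stimulus.
control = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 3, 2, 4]}
stimulus = {"a": [1, 2, 3, 4], "b": [8, 6, 4, 2], "c": [1, 3, 2, 4]}
for g, h, r_c, r_s in differential_correlation(control, stimulus):
    print(g, h, round(r_c, 2), round(r_s, 2))
```

Each correlation needs several measurements per group, which is why this edge-level comparison requires multiple conditions while the vertex-level one does not.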

  24. Coexpression Analysis. Putative network (clique) comparisons: • differential topology • compare cliques, sort by ontology, CREs, etc • consider granularity, for example, with the clique intersection graph
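A clique intersection graph has one node per clique and an edge between two cliques that share members. The sketch below builds it from a list of cliques; the `min_overlap` parameter is a hypothetical knob added here to illustrate how granularity could be tuned, not something stated in the talk.

```python
from itertools import combinations


def clique_intersection_graph(cliques, min_overlap=1):
    """Map each pair of clique indices to the genes the two cliques share.

    Only pairs sharing at least min_overlap members are kept.
    """
    cliques = [frozenset(c) for c in cliques]
    edges = {}
    for (i, a), (j, b) in combinations(enumerate(cliques), 2):
        shared = a & b
        if len(shared) >= min_overlap:
            edges[(i, j)] = shared
    return edges


cliques = [{1, 2, 3}, {3, 4}, {4, 5}]
print(clique_intersection_graph(cliques))
```

Raising `min_overlap` links only cliques with substantial shared membership, giving a coarser view of how putative networks relate.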

  25. Coexpression Analysis. Seven Quantitative Trait Loci. Transcript abundance can be the phenotype! There’s a high probability that somewhere in here is a polymorphism controlling this trait.

  26. Coexpression Analysis. Two Paracliques. Concentrated Parental Alleles.

  27. Outline of Talk • Foundations • Gene Coexpression Analysis • Data Integration • Application to Human Health • Protein Complex Prediction • Application to Model Organisms

  28. Data Integration. Phenotypic Data (e.g., diseased versus healthy patients)

  29. Data Integration. Phenotypic Data (e.g., diseased versus healthy patients). Proteomic Data (e.g., amino acid peaks from mass spec)

  30. Data Integration. Phenotypic Data (e.g., diseased versus healthy patients). Proteomic Data (e.g., amino acid peaks from mass spec). Transcriptomic Data (e.g., gene expression from µarrays)
