
Phylogenomic inference
Hauptseminar Frishman, WS2013/2014
Uli Köhler, February 3rd, 2014

Structure of this talk
◮ Issues of non-phylogenomic function prediction
◮ What is phylogenomic inference?
◮ Phylogenetic tree reconciliation
◮ Phylogenomic inference methodology
◮ Phylogenomic databases and algorithms:
  ◮ SIFTER
  ◮ PhyloFacts
◮ Common problems of phylogenomic predictions
◮ Future of phylogenomics
◮ Seminar conclusion

Non-phylogenomic function prediction
◮ High-throughput sequencing → many proteins, but little information about them: ~90,000 PDB structures vs. 5.1 × 10^6 UniProt/TrEMBL sequences
◮ An alignment score alone does not show which domains actually match
◮ Difficult to separate orthologs from paralogs

What is phylogenomic inference? I
[Diagram: phylogenomic inference combines evolutionary relationships (phylogenetics) with genome analysis in order to infer function]

What is phylogenomic inference? II
◮ Concept to enhance homology-based function prediction
◮ Can be applied to both genes and proteins
◮ Attempts to separate orthologs from paralogs → orthologs have a high probability of similar or identical function
◮ Phylogenetic tree reconciliation: identify speciation and duplication events in phylogenetic trees

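Tree reconciliation
◮ Are B and C orthologs or paralogs with respect to A?
◮ Is each internal node a duplication or a speciation?
◮ Orthologs diverged through a speciation event, paralogs through a duplication event
[Figure: gene tree with leaves A, B and C]

Tree reconciliation (example)
[Figure: the same tree with one internal node labelled Duplication and one labelled Speciation]
◮ B: ortholog of A (A and B diverged at the speciation)
◮ C: paralog of A (A and C diverged at the duplication)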
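
The duplication/speciation labels can be computed with the classic LCA-mapping approach: map every gene-tree node to the most recent species-tree ancestor of the species its genes come from; a node that maps to the same species clade as one of its children cannot be explained by speciation and is labelled a duplication. The sketch below is a minimal illustration, assuming correct, fully resolved gene and species trees written as nested tuples; the species names (human, mouse, fly) and the gene-to-species assignment are hypothetical, and the gene tree is just one topology consistent with the example above.

```python
# Minimal LCA-mapping reconciliation sketch (hypothetical gene and species names).
# Trees are nested 2-tuples; leaves are name strings.

def clades(tree):
    """List every clade of `tree` as a frozenset of leaf names (root clade last)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    left_clades, right_clades = clades(left), clades(right)
    return left_clades + right_clades + [left_clades[-1] | right_clades[-1]]


def lca(species_clades, species):
    """Smallest species-tree clade that contains all of `species`."""
    return min((c for c in species_clades if species <= c), key=len)


def reconcile(gene_tree, gene_to_species, species_tree):
    """Label each internal gene-tree node as 'duplication' or 'speciation'."""
    species_clades = clades(species_tree)
    events = []

    def visit(node):
        # Returns the species-tree clade this gene-tree node maps to.
        if isinstance(node, str):
            return lca(species_clades, frozenset([gene_to_species[node]]))
        left, right = node
        m_left, m_right = visit(left), visit(right)
        m_node = lca(species_clades, m_left | m_right)
        # Same species clade as a child: speciation alone cannot explain it.
        event = "duplication" if m_node in (m_left, m_right) else "speciation"
        events.append((node, event))
        return m_node

    visit(gene_tree)
    return events


# A (human) and B (mouse) split at a speciation; C is a second mouse copy.
species_tree = ("human", "mouse")
gene_tree = (("A", "B"), "C")
gene_to_species = {"A": "human", "B": "mouse", "C": "mouse"}
for node, event in reconcile(gene_tree, gene_to_species, species_tree):
    print(node, event)
# ('A', 'B') speciation
# (('A', 'B'), 'C') duplication
```

Real reconciliation tools must also cope with uncertain gene-tree topologies and gene losses, which this sketch ignores.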

Phylogenomic inference methodology I
1. Cluster homologous proteins
2. Compute a multiple alignment
3. Edit the alignment (remove potential non-homologs)
4. Mask less-conserved regions in the alignment
5. Construct a phylogenetic tree
6. Identify closely related subtrees
7. Overlay with experimental data
8. Differentiate orthologs and paralogs (tree reconciliation)
9. Infer function from orthologs

Phylogenomic inference methodology II
1. Cluster homologous proteins
2. Compute a multiple alignment
3. Edit the alignment
4. Mask less-conserved regions in the alignment
◮ Raw alignments would introduce noise
◮ Retain only high-scoring homologs and highly conserved domains
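
Step 4 can be partly automated with a simple column filter. The minimal sketch below keeps only alignment columns that are mostly gap-free and dominated by a single residue; both thresholds (`min_identity`, `max_gap_fraction`) and the toy alignment are hypothetical illustrative values, not recommendations from this talk.

```python
from collections import Counter


def conserved_columns(alignment, min_identity=0.6, max_gap_fraction=0.3):
    """Indices of columns passing both (hypothetical) thresholds.

    min_identity: the most frequent residue must make up at least this fraction
                  of all sequences (gaps count against it).
    max_gap_fraction: columns with a larger fraction of gaps are dropped.
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        residues = [c for c in column if c != "-"]
        if not residues or (n_seqs - len(residues)) / n_seqs > max_gap_fraction:
            continue  # empty or too gappy
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / n_seqs >= min_identity:
            keep.append(i)
    return keep


def mask_alignment(alignment, **thresholds):
    """Return the alignment restricted to its conserved columns."""
    cols = conserved_columns(alignment, **thresholds)
    return ["".join(seq[i] for i in cols) for seq in alignment]


aln = ["MKV-LA", "MKIGLA", "MRV-LS", "MKVALT"]
print(mask_alignment(aln))  # ['MKVL', 'MKIL', 'MRVL', 'MKVL']
```

The gappy fourth column and the variable last column are removed. Changing the thresholds changes which columns survive, which is one concrete source of the parameter sensitivity discussed later in this talk.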

Phylogenomic inference methodology III
5. Construct a phylogenetic tree
◮ Core problems:
  ◮ No information about the actual ancestors is available
  ◮ High computational complexity (finding the optimal tree is NP-hard!)
◮ Use algorithms such as maximum parsimony or maximum likelihood
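
Maximum parsimony prefers the topology that explains the alignment with the fewest substitutions. Scoring one fixed topology is cheap: Fitch's small-parsimony algorithm, sketched below for unweighted substitutions, does it in one pass per column. The NP-hard part is searching over all possible topologies. The taxon names and the alignment are made up for illustration.

```python
def fitch(tree, column):
    """Fitch's small-parsimony algorithm for one alignment column.

    tree: nested 2-tuples with leaf names as strings.
    column: dict mapping leaf name -> character at this column.
    Returns (candidate ancestral states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, column)
    right_states, right_cost = fitch(right, column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one substitution is needed on one of the two branches.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of substitutions the topology needs over all columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


aln = {"A": "MKVL", "B": "MKIL", "C": "MRIF", "D": "MRVF"}
for topology in [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]:
    print(topology, parsimony_score(topology, aln))
# ((A,B),(C,D)) needs 4 changes; the other two topologies need 6 and 5.
```

With four taxa there are only three unrooted topologies to compare, so exhaustive search is trivial here; for real protein families it is not.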

Phylogenomic inference methodology IV
6. Identify closely related subtrees
7. Overlay with experimental data
◮ More filtering to reduce noise
◮ Given the tree topology, use only closely related subgroups (in addition to filtering out distant homologs in step 1)

Phylogenomic inference methodology V
8. Differentiate orthologs and paralogs
◮ Computational approaches, for example:
  ◮ NCBI COG database: bidirectional top BLAST hits
  ◮ Complex statistical algorithms such as RIO (Resampled Inference of Orthologs), Orthostrapper or BETE
◮ Computationally intensive, requires highly filtered input data
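
The bidirectional (reciprocal) best-hit criterion fits in a few lines. The sketch assumes precomputed similarity scores between the genes of two genomes (in practice, for example, BLAST bit scores); the gene names and scores are invented. The actual COG construction involves more than this pairwise check, so treat this as the core idea only.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict {(query, target): score}, e.g. BLAST bit scores.
    Ties are resolved in favour of the first pair seen.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(x_vs_y, y_vs_x):
    """Pairs (x, y) where y is x's best hit and x is y's best hit."""
    forward = best_hits(x_vs_y)
    backward = best_hits(y_vs_x)
    return {(x, y) for x, y in forward.items() if backward.get(y) == x}


# Hypothetical scores between the genes of two genomes X and Y.
x_vs_y = {("x1", "y1"): 200, ("x1", "y2"): 150, ("x2", "y1"): 180, ("x2", "y2"): 90}
y_vs_x = {("y1", "x1"): 210, ("y1", "x2"): 170, ("y2", "x1"): 140, ("y2", "x2"): 95}
print(reciprocal_best_hits(x_vs_y, y_vs_x))  # {('x1', 'y1')}
```

Here x2's best hit is y1, but y1 prefers x1, so x2 is left unpaired. This is exactly the situation in which paralogs make a simple hit-based criterion ambiguous.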

SIFTER
9. Infer function from orthologs
◮ Statistical Inference of Function Through Evolutionary Relationships
◮ Predicts protein function (homology-based) given a reconciled tree → tree construction and reconciliation remain a problem
◮ Based on Bayesian statistics
◮ Complex mathematics (not shown here)
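
SIFTER's model is more elaborate than a slide can show. As a heavily simplified illustration of the general idea of propagating function over a reconciled tree, the sketch below treats "has function F" as a binary character that evolves down the tree and computes the posterior for an unannotated gene with Felsenstein's pruning algorithm. This is not SIFTER's actual model or parameterisation: the change probabilities in `CHANGE_PROB` (higher below duplications, following the ortholog argument above), the root prior and the tree are all assumptions.

```python
# Toy model, NOT SIFTER's actual model: one binary character
# ("has function F" = 1, "lacks it" = 0) evolving down a reconciled tree.
# Internal nodes are (event, left, right); leaves are gene names.

# Hypothetical probabilities that the character flips along a branch below a
# node of the given type (function assumed to change more often after duplication).
CHANGE_PROB = {"duplication": 0.3, "speciation": 0.05}


def subtree_likelihood(node, states):
    """Felsenstein pruning: [P(leaf data below | node=0), P(leaf data below | node=1)].

    `states` maps leaf names to 0 or 1; leaves not in it are unannotated.
    """
    if isinstance(node, str):
        if node not in states:
            return [1.0, 1.0]  # unannotated leaf contributes no evidence
        return [1.0 if s == states[node] else 0.0 for s in (0, 1)]
    event, left, right = node
    eps = CHANGE_PROB[event]
    result = [1.0, 1.0]
    for child in (left, right):
        child_lik = subtree_likelihood(child, states)
        for parent_state in (0, 1):
            result[parent_state] *= sum(
                ((1 - eps) if child_state == parent_state else eps) * child_lik[child_state]
                for child_state in (0, 1)
            )
    return result


def posterior_has_function(tree, annotations, query, prior=0.5):
    """P(query leaf has F | annotated leaves), with `prior` = P(F) at the root."""
    joint = []
    for query_state in (0, 1):
        # Clamp the query leaf to each state and sum over the root state.
        lik = subtree_likelihood(tree, {**annotations, query: query_state})
        joint.append((1 - prior) * lik[0] + prior * lik[1])
    return joint[1] / (joint[0] + joint[1])


# A tree consistent with the reconciliation example: A and B orthologs, C a paralog.
# B is annotated with F, C is annotated as lacking F.
tree = ("duplication", ("speciation", "A", "B"), "C")
print(round(posterior_has_function(tree, {"B": 1, "C": 0}, "A"), 2))  # 0.89
```

Under these assumptions the unannotated gene A inherits the annotation of its ortholog B rather than that of its paralog C.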

PhyloFacts I
◮ "Encyclopedia" of "books" for known protein (super)families and structural domains
◮ 92,800 families (as of 2013-02-03)
◮ Precomputed phylogenetic trees and phylogenomic family HMMs → reasonably fast, but "Some results can take hours to complete"
◮ Provides structured access to annotated phylogenomic information about protein (super)families

PhyloFacts II
◮ FAT-CAT: PhyloFacts web service to predict protein function using phylogenomic methods
◮ Integrates with Pfam and uses HMMs to find the sequence's position in the precomputed tree

PhyloFacts III

Issues of phylogenomic methods I
◮ In silico, yet several steps involve manual work:
1. Cluster homologous proteins
2. Compute a multiple alignment
3. Edit the alignment
4. Mask less-conserved regions in the alignment
5. Construct a phylogenetic tree
6. Identify closely related subtrees
7. Overlay with experimental data
8. Differentiate orthologs and paralogs
9. Infer function from orthologs

Issues of phylogenomic methods II
1. Cluster homologous proteins
2. Compute a multiple alignment
3. Edit the alignment
4. Mask less-conserved regions in the alignment
◮ Manual annotation and selection → subjective, error-prone, time- and cost-intensive
◮ Information is lost: does the annotator select only what they want to see?
◮ Algorithms are too sensitive: are the results always reliable?

Issues of phylogenomic methods III
5. Construct a phylogenetic tree
◮ Distance-based vs. character-based construction algorithms
◮ Small, highly conserved protein families work better than large (super)families
◮ Lack of consistency across methods
◮ Algorithms scale poorly → cannot be used for large (super)families
◮ Some methods produce millions of equally scored topologies
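
One reason tree construction scales poorly: the number of distinct unrooted binary topologies for n sequences is (2n − 5)!! = 3 · 5 · … · (2n − 5), which grows faster than exponentially, so exhaustive search is out of the question for large (super)families and heuristic searches can end up with many equally good candidates. A short sketch that computes this count:

```python
def unrooted_topologies(n_taxa):
    """Number of distinct unrooted binary trees on n labelled taxa: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count


for n in (4, 10, 20, 50):
    print(n, unrooted_topologies(n))
# 4 taxa give 3 topologies and 10 taxa already give 2027025.
```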

Issues of phylogenomic methods IV
7. Overlay with experimental data
◮ Database = experimental data + inferred data
◮ Experimental datasets available ↔ protein function already known
◮ Protein function unknown ↔ few experimental datasets available

Issues of phylogenomic methods V
◮ Multiple successive filter passes
◮ Huge sets of parameters; optimal values are impossible to select
◮ Requires manual annotation and experimental data
◮ Sometimes even orthology is not sufficient for annotation transfer
◮ Does not work well with distant homologs; requires highly conserved domains


Future of phylogenomic inference
◮ Phylogenomics alone has too many problems and open questions, but...
◮ ... combined with other concepts, it can improve function prediction accuracy
◮ Computational complexity: Moore's law and alternative computing hardware → large-scale application feasible in the future?
◮ Phylogenomic inference for database verification
◮ Can also be applied to other attributes (besides protein function)
◮ PhyloFacts and SIFTER: usable tools, but apparently neither widely adopted nor actively developed


Conclusion (phylogenomic inference)
◮ Powerful concept for enhancing function prediction accuracy by identifying orthologs
◮ ... if it actually worked in practice
◮ Too complex, too manual, too many parameters
◮ Purely in-silico phylogenomics → low-quality results
◮ Manual annotation cannot keep up with high-throughput sequencing (HTS)
◮ PhyloFacts provides a useful database for function prediction using phylogenomic approaches

Conclusion (seminar)
◮ In-silico protein function inference is still an unsolved problem in computational biology
◮ Combine all available information, including:
  ◮ Context-based prediction
  ◮ Alternative splicing
  ◮ SNPs
  ◮ Phylogenomics
  ◮ Experimental results
◮ Only by combining all of this information can sufficient accuracy for in-silico function prediction be achieved

References
◮ Kimmen Sjölander. Phylogenomic inference of protein molecular function: advances and challenges. Bioinformatics, 2004.
◮ Barbara E. Engelhardt et al. Protein molecular function prediction by Bayesian phylogenomics. PLoS Computational Biology, 2005.
◮ Jonathan A. Eisen, Claire M. Fraser. Phylogenomics: intersection of evolution and genomics. Science, 2003.
◮ Duncan Brown, Kimmen Sjölander. Functional classification using phylogenomic inference. PLoS Computational Biology, 2006.
◮ Nandini Krishnamurthy et al. PhyloFacts: an online structural phylogenomic encyclopedia for protein functional and structural classification. Genome Biology, 2006.
◮ Barbara E. Engelhardt et al. A graphical model for predicting protein molecular function. Proceedings of the International Conference on Machine Learning (ICML), 2006.

Web and image sources
◮ http://phylogenomics.berkeley.edu/

Thank you for your attention!
References and sources available at https://github.com/ulikoehler/Hauptseminar
Questions?
