
Gene Prediction with AUGUSTUS

  1. Gene Prediction with AUGUSTUS. Ingo Bulla, Institut für Mathematik und Informatik, Universität Greifswald. Genome annotation: challenges in eukaryotes and consequences for evolutionary genomics, 13 February 2018

  2. About the speaker: Ingo Bulla
     • PhD in mathematics on a non-applied topic; switched to bioinformatics in 2006
     • Main research topic: sequence analysis, phylogeny, evolution, epidemiology and public health of HIV
     • Now working with Mario Stanke (developer of AUGUSTUS) on improving the algorithm used by AUGUSTUS
     • Limited experience in genomics; has applied AUGUSTUS only once in a research project → the speaker will have a Skype call with Mario Stanke or Katharina Hoff (long-time user of AUGUSTUS, implementer of BRAKER) during the lunch talk if questions come up that he cannot answer
     • Ingénieur de recherche in Perpignan from 1 April on, in a wet-lab group (Christoph Grunau, Guillaume Mitta)

  3. Outline: 1 Overview on Gene Prediction; 2 with RNA-Seq (RGASP Assessment, BRAKER1); 3 homology-based

  4. Structural Genome Annotation Problem
     Input
     • genome assembly (or assemblies)
     • extrinsic evidence, e.g. from RNA-Seq, MS/MS, protein database
     Output
     • start and end positions of genes, CDS, exons and introns (.gff)
     Example: 12 600 bp from the green alga Chlamydomonas reinhardtii (with JGI)
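
The output format named on this slide can be illustrated with a minimal sketch. It assumes the standard nine tab-separated GFF columns (sequence, source, feature type, start, end, score, strand, phase, attributes) with 1-based inclusive coordinates. The records in `EXAMPLE_GFF` and the names `Feature` and `parse_gff` are made up for illustration and are not taken from the talk or from real AUGUSTUS output.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str
    attributes: str


def parse_gff(text: str) -> List[Feature]:
    """Parse tab-separated GFF records, skipping comments and malformed lines."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed record
        seqid, source, ftype, start, end, _score, strand, _phase, attrs = cols
        features.append(
            Feature(seqid, source, ftype, int(start), int(end), strand, attrs)
        )
    return features


# Hypothetical records, for illustration only.
EXAMPLE_GFF = "\n".join([
    "# hypothetical example",
    "chr1\texample\tgene\t1001\t2400\t.\t+\t.\tg1",
    "chr1\texample\tCDS\t1001\t1300\t.\t+\t0\tg1.t1",
    "chr1\texample\tCDS\t2101\t2400\t.\t+\t0\tg1.t1",
])

features = parse_gff(EXAMPLE_GFF)
# Total coding length: inclusive coordinates, hence the +1.
cds_length = sum(f.end - f.start + 1 for f in features if f.type == "CDS")
```

Because the coordinates are inclusive at both ends, a feature spanning 1001 to 1300 has length 300, not 299.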

  5. Example Application. iBeetle: RNAi screen for the beetle Tribolium castaneum
     1 predict genes
     2 design primers based on prediction
     3 produce dsRNA for each gene
     4 knock down each gene in larval and pupal stage
     5 observe phenotype
     6 study function for select genes

  6. Major Approaches to Protein-Coding Gene Prediction (approach: extrinsic evidence used; programs)
     • ab initio: none; GeneMark, AUGUSTUS, SNAP, FGENESH
     • transcript-based: transcript sequences, e.g. RNA-Seq; BRAKER, Exonerate, AUGUSTUS, mGene
     • protein homology: protein sequences; AUGUSTUS-PPX, GeneWise, Exonerate
     • comparative (de novo): additional (unannotated) genomes; AUGUSTUS, CONTRAST, N-SCAN
     • proteogenomics: peptides from mass spectrometry; AUGUSTUS
     • combiners/selectors: other gene predictions + transcript sequences + proteins + ?; JIGSAW, GLEAN, MAKER2, PASA
     State of the art usually requires a combination of approaches: for every part of a gene, use all evidence available for that gene or region.

  7. Single species gene-finding: 1-species graph
     Assumptions: no alternative splicing, no gene overlap
     • graph represents all candidate gene structures
     • nodes: exon candidates (EC)
     • edges: introns and intergenic regions
     • each path from s to t is one gene structure
     • single species gene-finding in linear time: longest path algorithm
     [Figure: weighted graph from source s to sink t with exon candidates on the forward and reverse strand, intron states intron+0, intron+1 and intron+2, an explicit intron, and the intergenic region]
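
The longest-path idea on this slide can be sketched as follows. This is a simplified illustration, not the AUGUSTUS implementation: the node and edge scores in `NODES` and `EDGES` are made up, intron phases and strands are ignored, and the function name `longest_path` is hypothetical. It processes the directed acyclic graph in topological order, so the running time is linear in the number of nodes plus edges.

```python
from collections import defaultdict, deque


def longest_path(nodes, edges, source="s", sink="t"):
    """Return (score, path) of the highest-scoring source-to-sink path in a DAG.

    nodes: dict mapping node name -> node score (e.g. exon candidate score)
    edges: dict mapping (u, v) -> edge score (e.g. intron or intergenic score)
    """
    successors = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for (u, v), weight in edges.items():
        successors[u].append((v, weight))
        indegree[v] += 1

    # Topological order (Kahn's algorithm).
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Dynamic programming: best[v] is the best score of any path source -> v.
    best = {n: float("-inf") for n in nodes}
    best[source] = nodes[source]
    previous = {}
    for u in order:
        if best[u] == float("-inf"):
            continue  # not reachable from the source
        for v, weight in successors[u]:
            candidate = best[u] + weight + nodes[v]
            if candidate > best[v]:
                best[v] = candidate
                previous[v] = u
    if best[sink] == float("-inf"):
        raise ValueError("sink is not reachable from source")

    # Backtrack from the sink to recover the gene structure.
    path = [sink]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[sink], path


# Toy graph with made-up scores: e2 and e3 overlap, so no edge joins them.
NODES = {"s": 0, "e1": 6, "e2": 12, "e3": 9, "t": 0}
EDGES = {
    ("s", "e1"): 0, ("s", "e2"): 0,
    ("e1", "e2"): -2, ("e1", "e3"): -2,
    ("e1", "t"): 0, ("e2", "t"): 0, ("e3", "t"): 0,
}

score, path = longest_path(NODES, EDGES)
```

In the toy graph the path through e1 and e2 wins because the intron penalty of 2 is smaller than the score gained by adding e1 in front of e2.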

  8. Gene finder AUGUSTUS
     • in development since 2002 (PI: Mario Stanke)
     • based on a conditional random field (generalization of HMM)
     • probabilistic model of gene structures given signals, CDS, evidence
     • get the most likely gene structure or a sample of likely ones
     Some genome annotation collaborations using AUGUSTUS
     • Aedes aegypti, yellow fever mosquito: dengue fever (Science, 2007)
     • Brugia malayi, parasitic worm, causes elephantiasis (Science, 2007)
     • Tribolium castaneum, red flour beetle, pest and model organism (Nature, 2008)
     • Schistosoma mansoni, parasite causing bilharziosis (Nature, 2009)
     • Coprinus cinereus, fungus (PNAS, 2010)
     • Nasonia vitripennis, wasp (Science, 2010)
     • Amphimedon queenslandica, sponge (Nature, 2010)
     • Culex pipiens, common mosquito (Science, 2010)
     • Ricinus communis, castor bean (Nature Biotechnology, 2010)
     • Chlamydomonas reinhardtii, green algae (Proteomics, 2011)
     • Galdieria sulphuraria, red algae (Science, 2013)
     • Arabidopsis thaliana, plant model organism (PNAS, 2008)
     • Heliconius melpomene, butterfly (Nature, 2012)
     • Apis mellifera, honey bee (BMC Genomics, 2014)

  9. Outline: 1 Overview on Gene Prediction; 2 with RNA-Seq (RGASP Assessment, BRAKER1); 3 homology-based

  10. Three Major Approaches to Gene-Finding with RNA-Seq
     [Figure: RNA-Seq reads are either aligned to the genome, giving coverage (A, e.g. AUGUSTUS) or a genome-guided assembly (B, e.g. Cufflinks), or assembled de novo (C); labels for protein-coding genes and noncoding genes]
     A evidence integration into gene finder (e.g. AUGUSTUS, FGENESH, mGene, GeneID)
       1 align reads to genome first
       2 integrate evidence from coverage and spliced alignments into gene finder
     B purely alignment-based (e.g. Cufflinks)
       1 align reads to genome first
       2 construct transcripts from spliced alignments (no gene finding)
     C de novo assembly of reads (e.g. Trinity, TransDecoder, Velvet + AUGUSTUS)
       1 assemble transcriptome reads into transcript contigs
       2 use contigs for gene finding or just align them

  11. AUGUSTUS using RNA-Seq. Using RNA-Seq only (on human):
     • spliced alignments are used to predict alternative splicing
     • the ab initio model dominates where there is little or no evidence

  12. RGASP: RNA-Seq Genome Annotation Assessment Project
     Assessment of transcript reconstruction methods for RNA-seq, Steijger et al., Nature Methods, Nov. 2013
     • assessed the progress of automatic gene building using RNA-Seq
     • part of ENCODE project
     • 17 participating groups submitted, all on the same data

  13. Excerpt of RGASP assessment results on human
     Calling transcripts and proteins: best results on
     • fly: transcript sensitivity 24%, gene sensitivity 49% (AUGUSTUS)
     • worm: transcript sensitivity 48%, gene sensitivity 61% (Transomics)

  14. Why was the accuracy not better? Problems: intronic transcription, self-similarity of genome

  15. Reminder: RNA-Seq does not give you the protein sequence

  16. BRAKER1: collaboration with former competitor Mark Borodovsky (GeneMark)
     • MAKER2 pipeline uses GeneMark and AUGUSTUS
     • Why not combine GeneMark-ET, which self-trains on RNA-Seq, and AUGUSTUS, which predicts with RNA-Seq, ourselves?
     • easy to use: braker.pl [OPTIONS] -genome=genome.fa -bam=rnaseq.bam
     • fast (1 day for fly on 1 CPU)

  17. GeneMark-ET (2014): unsupervised training of parameters
     • GeneMark does not use RNA-Seq for prediction.
     • Anchors from RNA-Seq for training

  18. BRAKER1 Pipeline [Figure: pipeline diagram, not reproduced]

  19. Comparing BRAKER1 to MAKER2 (using RNA-Seq only) [Plot: difference BRAKER1 − MAKER2 in gene, transcript and exon sensitivity and specificity for C. elegans, D. melanogaster, S. pombe and A. thaliana, shown separately for BRAKER1-GeneMark-ET and BRAKER1-AUGUSTUS]

  20. Accuracy of BRAKER1 [Plot: gene, transcript and exon sensitivity and specificity in % for C. elegans, D. melanogaster, A. thaliana and S. pombe, shown separately for BRAKER1-GeneMark-ET and BRAKER1-AUGUSTUS]
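
The sensitivity and specificity measures named in these plots can be sketched as follows. This assumes the convention common in gene-prediction evaluations, which may differ in detail from the one used for these plots: sensitivity is the fraction of reference features that are predicted exactly, and specificity is the fraction of predicted features that exactly match a reference feature (what other fields call precision). The exon coordinates and the function name `sensitivity_specificity` are hypothetical.

```python
def sensitivity_specificity(predicted, reference):
    """Exact-match accuracy of predicted features against a reference set.

    Both arguments are iterables of hashable features, e.g. (start, end) exons.
    Returns (sensitivity, specificity), where specificity is TP / predicted.
    """
    predicted = set(predicted)
    reference = set(reference)
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Made-up exon coordinates: two of three predictions match the reference exactly.
reference_exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
predicted_exons = [(100, 200), (300, 400), (500, 650)]

sens, spec = sensitivity_specificity(predicted_exons, reference_exons)
```

Here the third predicted exon has a wrong end coordinate, so it counts as neither a true positive for sensitivity nor a correct prediction for specificity.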

  21. Outline: 1 Overview on Gene Prediction; 2 with RNA-Seq (RGASP Assessment, BRAKER1); 3 homology-based

  22. Homology-Based Gene-Finding Approaches
     • genome MSA, simultaneous genome annotation: e.g. AUGUSTUS, GSA-MPSA
     • genome MSA, conservation (conserved non-coding): e.g. N-SCAN, CONTRAST
     • single protein alignment: e.g. Genewise, exonerate
     • protein MSA: e.g. AUGUSTUS-PPX
