
BRAKER2: Incorporating Protein Homology Information into Gene Prediction with GeneMark-EP and AUGUSTUS



  1. BRAKER2: Incorporating Protein Homology Information into Gene Prediction with GeneMark-EP and AUGUSTUS
     A pipeline for fully automated training and prediction
     Katharina J. Hoff, Alexandre Lomsadze, Mario Stanke, Mark Borodovsky
     Plant and Animal Genomes XXVI, January 14th 2018
     Presenting author: katharina.hoff@uni-greifswald.de

  2. Contents
     1 Gene prediction
     2 BRAKER1: RNAseq
     3 BRAKER2: proteins
       • Short evolutionary distance
       • Long evolutionary distance
     4 Summary
     5 References

  3. Structural genome annotation problem
     Input
     • genome assembly
     • extrinsic evidence, e.g. from RNAseq, protein database
     Output
     • protein-coding genes: exon-intron structures (.gff)
     Example (from Chr I in C. elegans)
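To make the exon-intron output format concrete, here is a minimal Python sketch that derives intron coordinates as the gaps between consecutive exons of one transcript. It is illustrative only: the GFF lines, the transcript ID `t1`, and the helper `introns_from_exons` are hypothetical and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical GFF3-style lines (tab-separated, 1-based inclusive coordinates).
EXAMPLE_GFF = "\n".join([
    "chrI\texample\texon\t100\t200\t.\t+\t.\tParent=t1",
    "chrI\texample\texon\t300\t450\t.\t+\t.\tParent=t1",
    "chrI\texample\texon\t600\t700\t.\t+\t.\tParent=t1",
])


def introns_from_exons(gff_text):
    """Return {transcript_id: [(intron_start, intron_end), ...]}."""
    exons = defaultdict(list)
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        start, end = int(fields[3]), int(fields[4])
        # Read the transcript ID from the Parent attribute.
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        exons[attrs.get("Parent", "unknown")].append((start, end))

    introns = {}
    for transcript_id, coords in exons.items():
        coords.sort()
        # An intron runs from one exon's end + 1 to the next exon's start - 1.
        introns[transcript_id] = [
            (prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(coords, coords[1:])
        ]
    return introns


print(introns_from_exons(EXAMPLE_GFF))
```

Real annotation files carry more feature types (gene, mRNA, CDS) and attribute conventions differ between GFF3 and GTF, so a production parser would need to handle those cases.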

  4. BRAKER1: RNAseq integration
     • >4000 downloads
     • 73 citations since 2016 (Google Scholar)

  5. BRAKER1: RNAseq integration
     [Pipeline diagram: RNAseq.bam and genome.fa → GeneMark-ET → genemark.gtf → AUGUSTUS training → AUGUSTUS prediction → augustus.gtf]

  6. BRAKER2: Part I - proteins of closely related species
     [Pipeline diagram: genome.fa and protein.fa → GenomeThreader → AUGUSTUS training → AUGUSTUS prediction → augustus.gtf]

  7. Drosophila melanogaster and relatives
     For a given species,
     • the average number of mutations per genomic site was computed from alignments of ortholog gene sequences (including introns).
     • the protein identity was computed as the average of the identity values of the best Exonerate hit found for each protein of this species against the D. melanogaster genome.
     [Figure: Average Protein Identity (y axis, 0.75 to 0.95) plotted against Average Mutations per Genomic Site (x axis, 0.0 to 1.2) for dsim, dere, dana, dpse, dwil, dvir, dgri]
     Image: S. König, L. Romoth, M. Stanke (2018) Comparative Genome Annotation
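The protein identity measure described on this slide (the average identity of the best hit per protein) can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the authors' code: the hit tuples are made up, the function name is hypothetical, and "best" is assumed to mean the hit with the highest alignment score, which the slide does not specify.

```python
def average_best_hit_identity(hits):
    """Average the identity of each protein's best hit.

    hits: iterable of (protein_id, alignment_score, identity) tuples.
    Assumption: the "best" hit is the one with the highest alignment score.
    """
    best = {}
    for protein_id, score, identity in hits:
        if protein_id not in best or score > best[protein_id][0]:
            best[protein_id] = (score, identity)
    if not best:
        raise ValueError("no hits given")
    return sum(identity for _, identity in best.values()) / len(best)


# Hypothetical hits for two proteins; the values are illustrative only.
example_hits = [
    ("p1", 500, 0.90),
    ("p1", 320, 0.70),
    ("p2", 410, 0.80),
]
print(average_best_hit_identity(example_hits))
```

Proteins without any hit are simply absent from the average here; whether the original analysis counted them as zero identity is not stated on the slide.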

  8. Increasing evolutionary distance leads to decreasing gene prediction accuracy of AUGUSTUS
     [Figure: AUGUSTUS ab initio prediction. Gene F1 (y axis, 40 to 70) for dsim, dere, dana, dpse, dwil, dvir, dgri, drm5 (x axis). Series: BRAKER2 GenomeThreader training, expert training, BRAKER1 RNAseq training]

  9. Increasing evolutionary distance leads to decreasing gene prediction accuracy of AUGUSTUS
     [Figure: AUGUSTUS prediction with training set hints. Gene F1 (y axis, 40 to 70) for dsim, dere, dana, dpse, dwil, dvir, dgri, drm5 (x axis). Series: BRAKER2 GenomeThreader training, BRAKER1 RNAseq training]
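The two plots above report accuracy as Gene F1. The slides do not define it; assuming the usual convention in gene prediction benchmarks, it is the harmonic mean of gene-level sensitivity (correct genes / annotated genes) and specificity (correct genes / predicted genes). A minimal Python sketch with hypothetical counts and a hypothetical function name:

```python
def gene_f1(n_correct, n_predicted, n_annotated):
    """Harmonic mean of gene-level sensitivity and specificity.

    Assumption: "specificity" is used in the gene prediction sense,
    i.e. the fraction of predicted genes that are correct.
    """
    if n_predicted == 0 or n_annotated == 0:
        return 0.0
    sensitivity = n_correct / n_annotated
    specificity = n_correct / n_predicted
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


# Hypothetical counts, not taken from the presentation.
print(f"{gene_f1(n_correct=60, n_predicted=100, n_annotated=120):.3f}")
```

Multiplying the result by 100 gives values on the 40 to 70 scale used on the y axis of the plots.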

  10. Increasing evolutionary distance leads to decreasing gene prediction accuracy of AUGUSTUS
     With increasing distance between query protein and target genome, spliced alignments become
     • less sensitive while keeping a constant level of specificity (e.g. GenomeThreader),
     • or both less sensitive and less specific (e.g. Exonerate).
     Therefore, training AUGUSTUS on spliced alignments is suitable only when a very closely related query species is available!

  11. BRAKER2: Part II - proteins of more remote species
     “Standard mapping approach”: proteins to genome
     [Diagram: genome.fa and proteins.fa → GenomeThreader → CDS, introns, starts, stops (protein.hints)]
     → works well for closely related species, only

  12. BRAKER2: Part II - proteins of more remote species
     GeneMark-EP protein mapping pipeline
     [Pipeline diagram (braker.pl): genome.fa → GeneMark-ES → genemark.gtf → predicted proteins → BlastP against a database of orthologous gene clusters (proteins) → “hits”. For each “hit”: nucleotide sequence of the predicted gene (seed) → ProSplign → introns (protein.hints) → GeneMark-EP → genemark.gtf → AUGUSTUS training → AUGUSTUS prediction → augustus.gtf]

  13. Protein database for gene prediction in D. melanogaster
     Insect portion of EggNOG (inNOG) excluding Drosophila species
     • Acyrthosiphon pisum
     • Aedes aegypti
     • Anopheles darlingi
     • Anopheles gambiae
     • Apis mellifera
     • Atta cephalotes
     • Bombyx mori
     • Culex quinquefasciatus
     • Danaus plexippus
     • Heliconius melpomene
     • Nasonia vitripennis
     • Pediculus humanus
     • Tribolium castaneum

  14. Intron recovery from protein mapping
     Protein mapping with no Drosophila EggNOG (inNOG)
     • 30,996 introns predicted
     • 21,843 matched introns in CDS part of the annotated genes
     [Figure: Introns in CDS. Sensitivity and Specificity in % (y axis, 30 to 90) for Protein mapping and RNAseq mapping]
     Mapping of proteins from remote species recovers ∼45% of introns with specificity of ∼70%.
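The specificity of about 70% follows directly from the two counts on this slide (matched introns divided by predicted introns). The Python sketch below reproduces that arithmetic and shows how both measures could be computed from intron coordinate sets. The function name and the toy coordinates are hypothetical; the number of annotated introns behind the ∼45% sensitivity is not given on the slide, so it is not reconstructed here.

```python
def intron_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) for two sets of introns.

    Each intron is a (seqid, start, end, strand) tuple; an intron counts
    as matched only if all four fields agree exactly.
    """
    matched = predicted & annotated
    sensitivity = len(matched) / len(annotated) if annotated else 0.0
    specificity = len(matched) / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Counts reported on the slide.
predicted_count = 30996
matched_count = 21843
specificity = matched_count / predicted_count
print(f"specificity from slide counts: {specificity:.1%}")

# Toy coordinate sets (hypothetical) to exercise the function.
toy_predicted = {("chrI", 201, 299, "+"), ("chrI", 451, 599, "+"), ("chrI", 800, 900, "+")}
toy_annotated = {("chrI", 201, 299, "+"), ("chrI", 451, 599, "+"),
                 ("chrI", 1000, 1100, "+"), ("chrI", 1200, 1300, "+")}
print(intron_accuracy(toy_predicted, toy_annotated))
```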

  15. Intron recovery from protein mapping
     Protein mapping with some Drosophila species present as external evidence
     • no_Dro: no Drosophila species
     • w_gvw: with D. grimshawi, D. virilis, D. willistoni
     • w_gvwpa: with D. grimshawi, D. virilis, D. willistoni, D. pseudoobscura, D. ananassae
     [Figure: Introns in CDS. Sensitivity and Specificity in % (y axis, 30 to 90) for no_Dro, w_gvw, w_gvwpa and RNAseq]
     → more introns were detected
     → performance of protein mapping with addition of 5 fly proteomes came closer to performance with RNAseq external evidence
