Bacterial Genome Annotation
Lucile Soler
Annotation course, 9th-11th May 2017

Bacterial genome characteristics
• A bacterial genome is typically a single circular DNA molecule of several million base pairs.
• Bacteria can also contain plasmids: small circular DNA molecules that usually carry non-essential genes.
• Genomes contain a few thousand genes.
• Gene density is much higher than in humans: one million base pairs of bacterial DNA contains about 500 to 1000 genes.
  – Bacterial genes have no introns.
  – The average number of codons per gene is lower in bacteria than in humans.
  – Neighboring genes are very close together throughout the genome.
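The gene density figure above is simple arithmetic, and a minimal sketch makes it concrete. The function name and the example genome (4.6 Mb, 4,300 genes) are illustrative assumptions, not values from the slides.

```python
def gene_density_per_mb(gene_count: int, genome_size_bp: int) -> float:
    """Return the number of genes per million base pairs."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return gene_count / (genome_size_bp / 1_000_000)


# Hypothetical genome used only for illustration: 4.6 Mb carrying 4,300 genes.
density = gene_density_per_mb(4300, 4_600_000)
print(f"{density:.0f} genes per Mb")  # prints "935 genes per Mb"
```

The result falls inside the 500 to 1000 genes per million base pairs range quoted on the slide.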

Bacterial feature types
● protein coding genes
  o promoter (-10, -35)
  o ribosome binding site (RBS)
  o coding sequence (CDS)
    ▪ signal peptide, protein domains, structure
  o terminator
● non-coding genes
  o transfer RNA (tRNA)
  o ribosomal RNA (rRNA)
  o non-coding RNA (ncRNA)
● other
  o repeat patterns, operons, origin of replication, ...

Automatic annotation
Two strategies for identifying coding genes:
● sequence alignment
  o find known protein sequences in the contigs
    ▪ transfer the annotation across
  o will miss proteins not in your database
  o may miss partial proteins
● ab initio gene finding
  o find candidate open reading frames
    ▪ build model of ribosome binding sites
    ▪ predict coding regions
  o may choose the incorrect start codon
  o may miss atypical genes, overpredict small genes
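The first ab initio step, finding candidate open reading frames, can be sketched in a few lines. This is a toy six-frame scan under stated assumptions (ATG as the only start codon, the three standard stop codons, no ribosome binding site model); it is not how Prodigal or any other gene finder is implemented, and the function names are made up for illustration.

```python
from typing import List, Tuple

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq: str, min_codons: int) -> List[Tuple[int, int]]:
    """Scan the three frames of one strand; return (start, end) pairs, end-exclusive."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first ATG after the previous stop: the longest candidate
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:  # length includes the stop codon
                    orfs.append((start, i + 3))
                start = None
    return orfs


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) in forward-strand, 0-based, end-exclusive coordinates."""
    seq = seq.upper()
    n = len(seq)
    found = [(s, e, "+") for s, e in orfs_on_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    found += [(n - e, n - s, "-") for s, e in orfs_on_strand(rc, min_codons)]
    return sorted(found)


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))  # prints [(2, 17, '+')]
```

Always taking the first in-frame ATG is the simplest rule, and it shows why a gene finder may choose the incorrect start codon. Lowering `min_codons` shows how easily small genes are overpredicted.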

Some good existing tools

Software       ab initio   alignment   Availability   Speed
RAST           yes         yes         web only       12-24 hours
BG7            no          yes         standalone     >10 hours
PGAAP (NCBI)   yes         yes         email / web    >1 month

Source: Seemann T. Prokka: rapid prokaryotic genome annotation, presentation 2013

Prokka
• Fast: exploits multi-core computers (aim: under 15 min)
• Convenient: does structural and functional annotation in one go
• Standards compliant: GFF3/GBK for viewing, TBL/FSA for GenBank
• Also annotates Archaea, fungi, mitochondria, and viruses

Prokka
• Complicated to install: many dependencies
• Feature prediction tools used by Prokka: see Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014 Jul 15;30(14):2068-9. PMID:24642063

Prokka: method
• Prodigal identifies the coordinates of candidate genes
• Each candidate is compared with a series of databases of known sequences:
  – Small trustworthy database: the user provides a set of annotated proteins (optional)
  – Medium-size domain-specific database: UniProt
  – Curated model of protein families: all proteins from finished bacterial genomes in RefSeq
  – HMM profiles: Pfam, TIGRFAMs (with HMMER)
  – If nothing is found, label as "hypothetical protein"
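The search order above amounts to a "first hit wins" cascade with a fallback label. A minimal sketch of that idea follows. It assumes each database can be modelled as a dictionary and replaces the real similarity searches (BLAST+, HMMER) with an exact key lookup; all identifiers and product names are made up, and this is not Prokka's actual code.

```python
from typing import Dict, List, Tuple

Database = Tuple[str, Dict[str, str]]  # (database name, protein id -> product name)


def annotate(protein_id: str, databases: List[Database]) -> Tuple[str, str]:
    """Return (product, source) from the first database that knows the protein."""
    for db_name, entries in databases:
        product = entries.get(protein_id)  # stand-in for a BLAST+/HMMER search
        if product is not None:
            return product, db_name
    return "hypothetical protein", "none"


# Hypothetical databases, listed from most to least trusted.
databases: List[Database] = [
    ("user proteins", {"p1": "DNA gyrase subunit A"}),
    ("UniProt", {"p1": "gyrase", "p2": "ATP synthase subunit beta"}),
    ("RefSeq", {"p3": "chaperone protein DnaK"}),
    ("Pfam/TIGRFAMs", {"p4": "ABC transporter domain protein"}),
]

for pid in ["p1", "p2", "p5"]:
    print(pid, annotate(pid, databases))
```

Because the list is ordered by trust, `p1` takes its product from the user-supplied set even though UniProt also contains it, and `p5` falls through to "hypothetical protein".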

[Figure: Prokka pipeline (simplified). Input: FASTA contigs. Feature prediction: Aragorn (tRNA), RNAmmer (rRNA), Infernal with Rfam (ncRNA), Prodigal (CDS), SignalP (sig_peptide). CDS annotation: BLAST+ and HMMER3 searches against User, Swiss, Pfam and TIGR, giving protein annotation and protein domains. Output: GFF3, GBK, ASN1. Source: Seemann T. Prokka: rapid prokaryotic genome annotation, presentation 2013]

Prokka options
• Only one parameter is mandatory: the input contigs in FASTA format
  – prokka [options] <contigs.fasta>
• More than 30 options are available
  – prokka --help
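For scripted runs, the command line above can be assembled and launched from Python. This is a minimal sketch: the file name `contigs.fasta`, the output directory `annotation` and the prefix `sample1` are placeholders, and the helper names are hypothetical. The flags `--outdir`, `--prefix` and `--cpus` are among those listed by `prokka --help`.

```python
import shutil
import subprocess
from typing import List


def build_prokka_command(contigs: str, outdir: str, prefix: str, cpus: int = 4) -> List[str]:
    """Assemble the argument list; the contigs file goes last, after the options."""
    return ["prokka", "--outdir", outdir, "--prefix", prefix, "--cpus", str(cpus), contigs]


def run_prokka(command: List[str]) -> None:
    """Run Prokka if it is installed; raise a clear error otherwise."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("prokka was not found on PATH")
    subprocess.run(command, check=True)


# Placeholder inputs; only the command is printed here, nothing is executed.
cmd = build_prokka_command("contigs.fasta", "annotation", "sample1")
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.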

Command line options

Prokka output https://github.com/tseemann/prokka#output-files
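One quick sanity check on the GFF3 output is to count how many features of each type were annotated. The sketch below assumes standard nine-column, tab-separated GFF3 lines and stops at the `##FASTA` marker that precedes the embedded contig sequences. The sample lines are invented for illustration and are not real Prokka output.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Count GFF3 features by their type column (third field)."""
    counts: Counter = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


# Made-up example lines in GFF3 layout.
sample = [
    "##gff-version 3",
    "contig1\tProdigal\tCDS\t100\t400\t.\t+\t0\tID=X_00001;product=hypothetical protein",
    "contig1\tAragorn\ttRNA\t500\t575\t.\t-\t.\tID=X_00002;product=tRNA-Ala",
    "contig1\tProdigal\tCDS\t600\t900\t.\t+\t0\tID=X_00003;product=hypothetical protein",
    "##FASTA",
    ">contig1",
    "ATGC",
]
counts = count_feature_types(sample)
print(dict(counts))
```

On a real run the same function can be fed an open file handle, for example `count_feature_types(open(path))`, where `path` points to the `.gff` file in the output directory.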

Practical 1
• Annotate 3 bacteria
• Use BUSCO to check gene completeness
• Use Prokka to annotate the assemblies
