genome annotation

Genome Annotation - PowerPoint PPT Presentation


  1. Genome Annotation

  2. The steps in genome sequencing ● Generate genome sequence – Assembly – ORF calling – tRNA identification – rRNA identification – Functional annotation

  3. Annotating Genomes ● Identifying which protein performs which function

  4. www.sigmaaldrich.com

  5. Why annotate a genome? ● Catalog what's there ● Identify what's missing – but should be there! – Things you don't know ● In vitro growth – Mycoplasma pneumoniae ● Comparative genomics ● Hypothesis generation

  6. The goals of annotation ● Exchange information with others ● Compare annotations between organisms

  7. How to annotate a genome? ● Sequence ● Assemble ● Identify open reading frames – Putative proteins

  8. Putative protein ● Open Reading Frame (ORF) – A stretch of codons with no in-frame stop codon ● Coding Sequence (CDS) – An ORF that could encode a protein ● Protein encoding gene (PEG) – An ORF that could encode a protein ● Hypothetical protein = putative protein – Something that has not been experimentally shown ● Polypeptide – Short stretch of ~50 amino acids. Often a domain
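
The ORF definition on slide 8 can be illustrated with a small scan for stop-free stretches of codons. This is only a minimal sketch, not a real gene caller: it assumes a forward-strand search, an `ATG` start codon, and the standard stop codons `TAA`, `TAG`, and `TGA`, none of which the slide specifies. The function name `find_orfs` and the `min_codons` parameter are made up for the illustration.

```python
# Illustrative ORF scan on the forward strand only (assumptions: ATG start,
# standard stop codons; real ORF callers also scan the reverse strand).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop stretches.

    start is the 0-based index of the ATG, end is exclusive and includes
    the stop codon, frame is 0, 1, or 2.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Hypothetical toy sequence: ATG AAA TTT GGG TAA embedded in frame 2.
print(find_orfs("CCATGAAATTTGGGTAACC"))
```

Each reported stretch is only a putative protein in the sense of the slide: nothing here shows that it is expressed.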

  9. PEGS ● E. coli – 4,391 genes – 4,288 genes that make proteins (pegs)

  10. ORF Calling

  11. Genome Annotation

  12. The steps in genome sequencing ● Generate genome sequence – Assembly – ORF calling – tRNA identification – rRNA identification – Functional annotation

  13. Traditional genome annotation

  14. Traditional genome annotation BLAST Similarities

  15. Traditional genome annotation BLAST Similarities

  16. Traditional genome annotation BLAST Similarities

  17. Traditional genome annotation BLAST Similarities

  18. Traditional genome annotation BLAST Similarities

  19. Traditional genome annotation BLAST Similarities

  20. Traditional genome annotation BLAST Similarities

  21. Traditional genome annotation BLAST Similarities

  22. Traditional genome annotation BLAST Similarities

  23. Traditional genome annotation BLAST Similarities

  24. Traditional genome annotation BLAST Similarities

  25. Traditional genome annotation BLAST Similarities

  26. Traditional genome annotation BLAST Similarities

  27. Protein Families

  28. Protein Families

  29. Protein Families

  30. Protein Families

  31. Gene Ontology ● Ontology – A “hierarchy” of functions – Does not need to be linear ● Directed Acyclic Graph ● Controlled Vocabulary – Decides which words or phrases to use

  32. GO ● Gene ontology – A eukaryotic focus ● Drosophila ● Mus ● Saccharomyces ● Homo

  33. GO ● Cellular component – The parts of a cell ● Molecular function – e.g. ligand binding ● Biological processes – What things do

  34. GO Terms ● [GO ID, function] ● e.g: – GO:0004743 – Ontology: molecular function – Name: pyruvate kinase activity

  35. GO Terms ● [GO ID, function] ● e.g: – GO:0004743 – Ontology: molecular function – Name: pyruvate kinase activity ● Mainly assigned by BLAST/HMMER/... etc

  36. Directed Acyclic Graph ● Molecular function → Catalytic activity → Transferase activity → Transferase activity, transferring phosphorous → Kinase activity and Phosphotransferase activity, alcohol group as acceptor → Pyruvate kinase activity
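
The point that an ontology "does not need to be linear" can be made concrete by storing the terms from slide 36 as a child-to-parents mapping and walking upward. This is a sketch under assumptions: the edges are read off the slide's layout (pyruvate kinase activity having two parents), the term labels are copied from the slide rather than from the GO database, and the names `PARENTS` and `ancestors` are invented for the example.

```python
# Child -> parents mapping for the terms named on slide 36.
# Edges are an assumption based on the slide's layout; labels are as shown there.
PARENTS = {
    "pyruvate kinase activity": [
        "kinase activity",
        "phosphotransferase activity, alcohol group as acceptor",
    ],
    "kinase activity": ["transferase activity, transferring phosphorous"],
    "phosphotransferase activity, alcohol group as acceptor": [
        "transferase activity, transferring phosphorous"
    ],
    "transferase activity, transferring phosphorous": ["transferase activity"],
    "transferase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "molecular function": [],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:  # a shared parent is visited only once
                seen.add(parent)
                stack.append(parent)
    return seen


for name in sorted(ancestors("pyruvate kinase activity")):
    print(name)
```

Because a term may have more than one parent, a gene annotated with the most specific term is implicitly annotated with every ancestor on both paths.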

  37. Problems ● Annotation by committee ● Eukaryotic focus – Some efforts to counter that ● Owen White ● Arriane Toussaint ● Not very deep ● Strict controlled vocabulary

  38. Alternatives

  39. Basic biology lacI lacZ lacY lacA Jacob & Monod, 1961

  40. Basic biology lacI lacZ lacY lacA

  41. Different types of clustering ● < 80% ● < 80% ● < 80%

  42. Different types of clustering ● < 80% ● < 80% ● < 80%

  43. Purine metabolism

  44. Different types of clustering ● < 80% ● < 80% ● < 80%

  45. Heme / chlorophyll metabolism is conserved ● They are both porphyrins

  46. Occurrence of clustering in different genomes ● [Figure, by genome group. Left axis: fraction of genes in clusters (0 to 0.8). Right axis: number of genomes (0 to 120). Series: clusters of genes w/ maximum 80% identity; genes in subsystems in clusters; total number of genomes in group]

  47. The Subsystems Approach to Annotation ● Subsystem is a generalization of “pathway” – collection of functional roles jointly involved in a biological process or complex ● Functional Role is the abstract biological function of a gene product – atomic, or user-defined, examples: ● 6-phosphofructokinase (EC 2.7.1.11) ● LSU ribosomal protein L31p ● Streptococcal virulence factors ● Should not contain “putative”, “thermostable”, etc ● Populated subsystem is complete spreadsheet of functions and roles
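
The naming rule on slide 47 (functional roles should not contain “putative”, “thermostable”, etc) lends itself to a simple check. The sketch below is hypothetical: the word list holds only the two words the slide names, and the function name `discouraged_words` and the second example role are invented for illustration.

```python
# Words the slide says a functional role should not contain.
# The slide's "etc" implies a longer list; only the two named words are used here.
DISCOURAGED = ("putative", "thermostable")


def discouraged_words(role):
    """Return the discouraged words found in a functional role name."""
    lowered = role.lower()
    return [word for word in DISCOURAGED if word in lowered]


# First role is from the slide; the second is a made-up counterexample.
for role in ("6-phosphofructokinase (EC 2.7.1.11)", "Putative thermostable lipase"):
    print(role, "->", discouraged_words(role))
```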

  48. Histidine Degradation ● Conversion of histidine to glutamate ● Functional roles defined in table ● Inclusion in subsystem is only by functional role ● Controlled vocabulary …

      Subsystem: Histidine Degradation

      #  Abbrev.  Functional role
      1  HutH     Histidine ammonia-lyase (EC 4.3.1.3)
      2  HutU     Urocanate hydratase (EC 4.2.1.49)
      3  HutI     Imidazolonepropionase (EC 3.5.2.7)
      4  GluF     Glutamate formiminotransferase (EC 2.1.2.5)
      5  HutG     Formiminoglutamase (EC 3.5.3.8)
      6  NfoD     N-formylglutamate deformylase (EC 3.5.1.68)
      7  ForI     Formiminoglutamic iminohydrolase (EC 3.5.3.13)

  49. Subsystem Spreadsheet ● Column headers taken from table of functional roles ● Rows are selected genomes or organisms ● Cells are populated with specific, annotated genes ● Functional variants defined by the annotated roles ● Variant code -1 indicates subsystem is not functional ● Clustering shown by color

      Column headers: Organism, Variant, HutH, HutU, HutI, GluF, HutG, NfoD, ForI

      Organism                      Genes (in row order)                            Variant
      Bacteroides thetaiotaomicron  Q8A4B3, Q8A4A9, Q8A4B1, Q8A4B0                  1
      Desulfotalea psychrophila     gi51246205, gi51246204, gi51246203, gi51246202  1
      Halobacterium sp.             Q9HQD5, Q9HQD8, Q9HQD6, Q9HQD7                  2
      Deinococcus radiodurans       Q9RZ06, Q9RZ02, Q9RZ05, Q9RZ04                  2
      Bacillus subtilis             P10944, P25503, P42084, P42068                  2
      Caulobacter crescentus        P58082, Q9A9MI, P58079, Q9A9M0, Q9A9L9          3
      Pseudomonas putida            Q88CZ7, Q88CZ6, Q88CZ9, Q88D00, Q88CZ3          3
      Xanthomonas campestris        Q8PAA7, P58988, Q8PAA6, Q8PAA8, Q8PAA5          3
      Listeria monocytogenes        (none)                                          -1
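
The statement that functional variants are defined by the annotated roles can be sketched as grouping genomes by which role columns are filled. Everything concrete below is an assumption: the genome names and gene IDs are placeholders rather than data from the slide, the choice of which roles each placeholder genome carries is arbitrary, and `role_pattern` and `group_by_pattern` are hypothetical helper names. Assigning numeric variant codes to the groups is left out because the slide does not say how the codes are chosen.

```python
from collections import defaultdict

# Role columns, in the order given by the table of functional roles.
ROLES = ["HutH", "HutU", "HutI", "GluF", "HutG", "NfoD", "ForI"]

# Placeholder spreadsheet: genome -> {role: gene ID}. Not data from the slide.
SPREADSHEET = {
    "genome_A": {"HutH": "a1", "HutU": "a2", "HutI": "a3", "HutG": "a4"},
    "genome_B": {"HutH": "b1", "HutU": "b2", "HutI": "b3", "HutG": "b4"},
    "genome_C": {"HutH": "c1", "HutU": "c2", "HutI": "c3", "GluF": "c4"},
    "genome_D": {},  # no roles annotated, like a row with variant code -1
}


def role_pattern(row):
    """Presence/absence of each role, as a tuple aligned with ROLES."""
    return tuple(role in row for role in ROLES)


def group_by_pattern(spreadsheet):
    """Group genomes that have exactly the same set of annotated roles."""
    groups = defaultdict(list)
    for genome, row in spreadsheet.items():
        groups[role_pattern(row)].append(genome)
    return dict(groups)


for pattern, genomes in group_by_pattern(SPREADSHEET).items():
    present = [role for role, has in zip(ROLES, pattern) if has]
    print(present or "no roles", "->", genomes)
```

Genomes that land in the same group fill the same columns, which is the sense in which a variant is a property of the annotated roles rather than of any single gene.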

  50. “The Populated Subsystem”

      Subsystem: Histidine Degradation

      #  Abbrev.  Functional role
      1  HutH     Histidine ammonia-lyase (EC 4.3.1.3)
      2  HutU     Urocanate hydratase (EC 4.2.1.49)
      3  HutI     Imidazolonepropionase (EC 3.5.2.7)
      4  GluF     Glutamate formiminotransferase (EC 2.1.2.5)
      5  HutG     Formiminoglutamase (EC 3.5.3.8)
      6  NfoD     N-formylglutamate deformylase (EC 3.5.1.68)
      7  ForI     Formiminoglutamic iminohydrolase (EC 3.5.3.13)

      Subsystem Spreadsheet

      Column headers: Organism, Variant, HutH, HutU, HutI, GluF, HutG, NfoD, ForI

      Organism                      Genes (in row order)                            Variant
      Bacteroides thetaiotaomicron  Q8A4B3, Q8A4A9, Q8A4B1, Q8A4B0                  1
      Desulfotalea psychrophila     gi51246205, gi51246204, gi51246203, gi51246202  1
      Halobacterium sp.             Q9HQD5, Q9HQD8, Q9HQD6, Q9HQD7                  2
      Deinococcus radiodurans       Q9RZ06, Q9RZ02, Q9RZ05, Q9RZ04                  2
      Bacillus subtilis             P10944, P25503, P42084, P42068                  2
      Caulobacter crescentus        P58082, Q9A9MI, P58079, Q9A9M0, Q9A9L9          3
      Pseudomonas putida            Q88CZ7, Q88CZ6, Q88CZ9, Q88D00, Q88CZ3          3
      Xanthomonas campestris        Q8PAA7, P58988, Q8PAA6, Q8PAA8, Q8PAA5          3
      Listeria monocytogenes        (none)                                          -1

  51. Nan-operon within Sialic Acid Metabolism ● Microbial sialic acid metabolism has now been firmly established as a virulence determinant in a range of infectious diseases

  52. The nan-operon
