Problems with metagenome annotation



  1. Problems with metagenome annotation

  2. How much has been sequenced? [Figure: number of known sequences by year, with milestones marked: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing.] If the database doubles every 15 months, how often do you need to rerun your sample?
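The question on this slide can be made concrete with a little arithmetic. The sketch below assumes steady exponential growth with the 15-month doubling time quoted on the slide; the function names are illustrative and not part of any tool.

```python
import math

DOUBLING_MONTHS = 15.0  # doubling time stated on the slide


def growth_factor(months_elapsed, doubling_months=DOUBLING_MONTHS):
    """Factor by which the database has grown after months_elapsed months."""
    return 2.0 ** (months_elapsed / doubling_months)


def fraction_unseen(months_elapsed, doubling_months=DOUBLING_MONTHS):
    """Fraction of the current database that did not exist at the last run."""
    return 1.0 - 1.0 / growth_factor(months_elapsed, doubling_months)


def months_until_growth(factor, doubling_months=DOUBLING_MONTHS):
    """Months until the database is `factor` times its current size."""
    return doubling_months * math.log2(factor)


if __name__ == "__main__":
    for months in (6, 15, 30):
        print(months, round(growth_factor(months), 2), round(fraction_unseen(months), 2))
```

Under that assumption, a sample annotated 15 months ago has never been compared against half of today's database, and after 30 months against three quarters of it.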

  3. Long Queues

  4. MG-RAST speed is not dependent on MG size! Days to weeks Minutes to seconds

  5. The SEED database ● Started with a few subsystems

  6. Over 2,000 subsystems ● Unmanageable! ● Needed a solution so the annotators could find their subsystems. ● Created hierarchy

  7. Over 2,000 Subsystems Three level “hierarchy” • Amino Acids and Derivatives – Alanine, serine, and glycine • Serine Biosynthesis • Amino Acids and Derivatives – Lysine, threonine, methionine, and cysteine • Methionine Biosynthesis
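One way to picture the three-level hierarchy is as a nested mapping. This is only a sketch built from the two examples on the slide; it is not the SEED's actual storage format, and `find_subsystem` and `count_subsystems` are hypothetical helpers.

```python
# Level 1 -> level 2 -> list of subsystems (level 3).
# Only the entries shown on the slide are included.
HIERARCHY = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def find_subsystem(name, hierarchy=HIERARCHY):
    """Return the (level 1, level 2) paths under which subsystem `name` is filed."""
    return [
        (level1, level2)
        for level1, groups in hierarchy.items()
        for level2, subsystems in groups.items()
        if name in subsystems
    ]


def count_subsystems(hierarchy=HIERARCHY):
    """Count the subsystems under each top-level classification."""
    return {
        level1: sum(len(subsystems) for subsystems in groups.values())
        for level1, groups in hierarchy.items()
    }


if __name__ == "__main__":
    print(find_subsystem("Serine Biosynthesis"))
    print(count_subsystems())
```

Counting per top-level classification in this way is what a per-classification tally of subsystems amounts to.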

  8. Subsystems (SS) per classification:

     | Classification | # SS | Classification | # SS | Classification | # SS |
     |---|---|---|---|---|---|
     | Experimental Subsystems | 498 | Regulation and Cell signaling | 51 | Motility and Chemotaxis | 11 |
     | Clustering-based subsystems | 352 | Virulence | 49 | Plant cell walls and outer surfaces | 10 |
     | Carbohydrates | 160 | Stress Response | 43 | Phages | 10 |
     | Cofactors, Vitamins, Prosthetic Groups, Pigments | 123 | DNA Metabolism | 41 | Cell Division and Cell Cycle | 10 |
     | Amino Acids and Derivatives | 96 | Aromatic Compounds | 38 | Photosynthesis | 9 |
     | Protein Metabolism | 95 | Phages | 36 | Metabolite damage | 8 |
     | Virulence, Disease, Defense | 70 | Secondary Metabolism | 34 | Phosphorus Metabolism | 7 |
     | Miscellaneous | 70 | Iron acquisition and metabolism | 31 | Potassium metabolism | 4 |
     | RNA Metabolism | 65 | Nucleosides and Nucleotides | 24 | Transcriptional regulation | 2 |
     | Membrane Transport | 65 | Sulfur Metabolism | 20 | Plasmids | 2 |
     | Respiration | 62 | Dormancy and Sporulation | 17 | Central metabolism | 2 |
     | Cell Wall and Capsule | 62 | Plant-prokaryote | 12 | Autotrophy | 2 |
     | Fatty Acids, Lipids, and | 60 | Nitrogen Metabolism | 12 | Arabinose Transport | 1 |

  9. FQ8D8DZ01AWR9I has one hit: xxx07431423 (fig|448385.11.peg.379), DNA-directed RNA polymerase beta' subunit (EC 2.7.7.6), in subsystem “RNA polymerase bacterial”

  10. FQ8D8DZ02G8RSI has two hits: ● xxx02998721 3e-04 “hypothetical protein” ● xxx05921978 4e-03 “Fibrinogen-binding protein”. Fibrinogen-binding protein is in subsystem “Streptococcus pyogenes virulome”
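The two hits for this read show why the choice of hit matters. The sketch below assumes the numbers 3e-04 and 4e-03 are E-values (lower is better) and that “hypothetical protein” belongs to no subsystem; the slide states neither explicitly. The `Hit` type and both selection functions are hypothetical, not MG-RAST code.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    subject_id: str
    evalue: float             # assumed to be an E-value; lower is better
    function: str
    subsystem: Optional[str]  # None when the function is in no subsystem


# The two hits reported for read FQ8D8DZ02G8RSI on the slide.
HITS = [
    Hit("xxx02998721", 3e-04, "hypothetical protein", None),  # no subsystem: an assumption
    Hit("xxx05921978", 4e-03, "Fibrinogen-binding protein",
        "Streptococcus pyogenes virulome"),
]


def best_hit(hits):
    """Return the hit with the lowest E-value, whatever its annotation."""
    return min(hits, key=lambda h: h.evalue)


def best_hit_with_subsystem(hits):
    """Return the lowest E-value hit that has a subsystem, or None if there is none."""
    candidates = [h for h in hits if h.subsystem is not None]
    return min(candidates, key=lambda h: h.evalue) if candidates else None


if __name__ == "__main__":
    print(best_hit(HITS).function)
    print(best_hit_with_subsystem(HITS).subsystem)
```

Under these assumptions the two rules disagree: the first leaves the read without a subsystem, while the second assigns it to a virulence subsystem on the strength of a weaker hit.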

  11. FQ8D8DZ02GF820 has 207 hits: Glutamate synthase [NADPH] large chain (EC 1.4.1.13) ● Ammonia assimilation ● Ammonium metabolism H. pylori ● Glutamine, Glutamate, Aspartate and Asparagine Biosynthesis ● Iron-sulfur experimental

  12. FQ8D8DZ02GF820 has 250 hits:

  13. Does it matter? ● Compare things that are the same! ● Know which version of the database you used ● Recompute if you are not sure!
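The advice on the last slide can be enforced mechanically by storing the database version with every result and refusing to compare runs that differ. This is a minimal sketch; `AnnotationRun`, its fields, and the version labels used with it are hypothetical and do not come from MG-RAST or the SEED.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnnotationRun:
    sample: str
    database: str
    database_version: str  # hypothetical label, for example a release date
    hit_counts: dict = field(default_factory=dict)  # read ID -> number of hits


def comparable(a, b):
    """Two runs are comparable only if database and version both match."""
    return (a.database, a.database_version) == (b.database, b.database_version)


def compare_hit_counts(a, b):
    """Pair up hit counts per read, refusing runs made against different versions."""
    if not comparable(a, b):
        raise ValueError(
            "runs used different database versions; recompute before comparing"
        )
    reads = sorted(set(a.hit_counts) | set(b.hit_counts))
    return {r: (a.hit_counts.get(r, 0), b.hit_counts.get(r, 0)) for r in reads}
```

Raising an error rather than returning a partial result makes "recompute if you are not sure" the default behaviour instead of something to remember.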
