population genomics

Population Genomics Image: Lisa Brown for National Public Radio Rob - PowerPoint PPT Presentation



  1. Population Genomics. Rob Edwards, San Diego State University. Image: Lisa Brown for National Public Radio

  2. Phages in the World's Oceans ● ARC: 56 samples, 16 sites, 1 year ● BBC: 85 samples, 38 sites, 8 years ● SAR: 1 sample, 1 site, 1 year ● GOM: 41 samples, 13 sites, 5 years ● LI: 4 sites, 1 year

  3. Most Marine Phage Sequences are Novel

  4. Marine Single-Stranded DNA Viruses • 6% of SAR sequences are ssDNA phage (Chlamydia-like Microviridae) • 40% of viral particles in SAR are ssDNA phage • Several full-genome sequences were recovered via de novo assembly of these fragments • Confirmed by PCR and sequencing

  5. SAR metagenome and Chlamydia φ4: 12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome [Figure: individual sequence reads, coverage, and concatenated hits shown against the Chlamydia phi 4 (Chl4) genome and its ORF calls]

  6. The phage proteomic tree

  7. Signature sequences ● HECTOR and PARIS – Degenerate primers that amplify T7 DNA polymerase – Tested samples from around the world – Tested by different investigators in different laboratories (Mya Breitbart)

  8. T7 phages are globally distributed: ~10^26 copies of each sequence on the planet = 60 metric tons of this DNA sequence. Breitbart et al, FEMS Micro Lett
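The 60-metric-ton figure can be sanity-checked with a back-of-envelope calculation. The sketch below is not from the slides: it assumes an average mass of about 650 g/mol per double-stranded base pair and a hypothetical amplicon length of 550 bp (the slide does not state the length), and simply converts copies to mass.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one dsDNA base pair (assumption)

def dna_mass_metric_tons(copies: float, length_bp: float) -> float:
    """Mass of `copies` double-stranded DNA molecules of `length_bp` base pairs."""
    grams = copies * length_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams / 1e6  # 1 metric ton = 1e6 g

# Hypothetical amplicon length; the slide gives only the copy number and the mass.
ASSUMED_AMPLICON_BP = 550
mass = dna_mass_metric_tons(1e26, ASSUMED_AMPLICON_BP)
print(f"about {mass:.0f} metric tons")
```

Under these assumptions the result lands close to 60 metric tons, so the slide's number is consistent with a sequence of a few hundred base pairs.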

  9. Some phages are everywhere [Figure: Phage Proteomic Tree v. 5 (Edwards, Rohwer), with ssDNA, λ-like, T4-like, and T7-like groups labeled]

  10. Compare viruses to all metagenomes ● Phage P4 – 11 kb, 10 ORFs ● Blue – individual sequence reads in a metagenome ● Green – coverage across genome

  11. Parts of viruses are everywhere [Figure: number of metagenome hits along the P4 phage genome]

  12. Viruses have lots of unknown genes [Figure: known genes versus unknown genes in microbial and viral metagenomes]

  13. Bas Dutilh

  14. Cross-assembly: metagenome 1, metagenome 2, metagenome 3, metagenome 4

  15. Cross-assembly: metagenome 1, metagenome 2, metagenome 3, metagenome 4 → Assembly

  16. Cross-assembly: contigs directly represent the overlap between samples http://edwards.sdsu.edu/crass/
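To make the cross-assembly idea concrete: once reads from several metagenomes are assembled together, each contig can be described by how many reads each sample contributed to it. The sketch below illustrates that bookkeeping only; it is not the crAss tool's interface, and the function names, read IDs, and sample labels are all hypothetical.

```python
from collections import Counter, defaultdict

def contig_sample_counts(read_to_contig, read_to_sample):
    """Count, for every contig, how many reads each sample contributed."""
    counts = defaultdict(Counter)
    for read, contig in read_to_contig.items():
        counts[contig][read_to_sample[read]] += 1
    return counts

def cross_contigs(counts, min_samples=2):
    """Contigs built from reads of at least `min_samples` different samples."""
    return sorted(c for c, per_sample in counts.items() if len(per_sample) >= min_samples)

# Hypothetical toy data: four reads from three metagenomes, assembled into two contigs.
read_to_sample = {"r1": "metagenome1", "r2": "metagenome2",
                  "r3": "metagenome1", "r4": "metagenome3"}
read_to_contig = {"r1": "contigA", "r2": "contigA",
                  "r3": "contigB", "r4": "contigA"}

counts = contig_sample_counts(read_to_contig, read_to_sample)
shared = cross_contigs(counts)
print(shared)
```

Here contigA draws reads from three samples and so represents overlap between them, while contigB is specific to one sample.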

  17. HMP viruses Reyes et al. Nature 2010

  18. Phages are more variable than microbes [Figure: functions present in samples, microbes versus phages] Reyes et al. Nature 2010

  19. De novo assembly of HMP data: 6,988 de novo cross-contigs [Figure: number of contigs (log scale, 1 to 10,000) versus number of samples contributing reads to contig (1 to 12)] Reyes et al. Nature 2010

  20. Big data – microbiome style [Figure: average depth across samples F1M, F1T1, F1T2, F2M, F2T1, F2T2, F3M, F3T1, F3T2, F4M, F4T1, F4T2]

  21. Complete crAssphage genome

  22. Complete crAssphage genome

  23. How big is the chimerization problem? Assembly algorithms include “chimera protection” ● Break contigs at ambiguities [Figure: contig1, contig2, contig3, contig4, contig5] Investigate the effect of chimerization: ● Use different assembly parameters and assess results ● High stringency → few chimeras ● Low stringency → many chimeras

  24. What are chimeras? Chimerization is more frequent between closely related strains ● Similar sequences ● What are intra-phyla chimeras??? [Image: Venus the chimeric cat, https://www.facebook.com/VenusTheAmazingChimeraCat https://twitter.com/Venustwofacecat] Aziz et al. NAR 2010

  25. What are chimeras? Chimerization is more frequent between closely related strains ● Similar sequences ● What are intra-phyla chimeras??? Evolutionarily conserved entities, abundant and conserved enough to assemble! Aziz et al. NAR 2010

  26. What is the host? 1) Sequence homology between phage and bacterial genes 2) Similarity in CRISPR spacers 3) Oligonucleotide usage profile 4) Co-occurrence across metagenomic samples ● Reads mapped from 152 fecal total community metagenomes ● Reads mapped to phages and bacteria ● Normalize; Spearman rank correlations; cluster ● crAssphage clusters with Bacteroidetes ● Just like two known Bacteroides phages, B40-8 and B124-14 5) Plaques
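The co-occurrence step (normalize, Spearman rank correlations, cluster) rests on one calculation: ranking each organism's abundance across samples and correlating the ranks. The sketch below shows only that correlation, with tied values given their average rank. The abundance vectors are invented for illustration and are not data from the talk.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical normalized abundances across five samples.
phage = [0.10, 0.40, 0.05, 0.80, 0.30]
candidate_host = [0.20, 0.50, 0.10, 0.90, 0.35]
unrelated = [0.90, 0.10, 0.70, 0.20, 0.50]

rho_host = spearman(phage, candidate_host)
rho_unrelated = spearman(phage, unrelated)
print(rho_host, rho_unrelated)
```

A phage whose abundance rises and falls with a bacterial group across many samples gets a high rank correlation with it, which is the signal the clustering step would then group on.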

  27. What is the host? ● Requires correct host strain ● Requires that the phage makes visible plaques ● Often requires correct concentrations of Mg++, Ca++, etc. No PCR hits in at least 100 plaques isolated from 10 pooled viral preparations on Bacteroides fragilis and B. thetaiotaomicron lawns.

  28. % crAssphage found in intestines ● Looked at 2,906 metagenomes ● Only found in 940 metagenomes [Figure axis: genome position, 0 – 97,065 nt]

  29. crAssphage is abundant! Abundance-ubiquity plot

  30. crAssphage by the numbers • Present in 32.3% of sequenced environmental samples (940 / 2,906) – Includes virus metagenomes and total community metagenomes • >6x more abundant than all (1,192) other known phages combined – Corrected for genome size • Present in 73.4% of sequenced human fecal samples (342 / 466) – 99.9% of all crAssphage reads were found in feces (significant) • 1.68% of the reads in all human fecal metagenomes • Estimate: ~6 crAssphage genomes per Bacteroides genome in your gut • >90% of the reads in some of the virus metagenomes from the US twin study • 24% of the reads in an unrelated virus metagenome from Korea • 22% of the reads in total community metagenomes from USA (HMP data) • Found on every continent (where we have data)
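The two prevalence percentages follow directly from the counts on the slide, and the "corrected for genome size" comparison amounts to dividing read counts by genome length. A minimal sketch: the sample counts and the 97,065 nt genome length come from the slides, while the read counts and the second phage are hypothetical and only illustrate the normalization.

```python
def prevalence_percent(positive: int, total: int) -> float:
    """Percentage of samples in which the genome was detected."""
    return 100.0 * positive / total

def reads_per_kb(reads: int, genome_length_nt: int) -> float:
    """Read count corrected for genome size (reads per kilobase of genome)."""
    return reads / (genome_length_nt / 1000.0)

env = prevalence_percent(940, 2906)   # all sequenced environmental samples
fecal = prevalence_percent(342, 466)  # human fecal samples
print(f"{env:.1f}% of all metagenomes, {fecal:.1f}% of fecal metagenomes")

# Hypothetical read counts, for illustration only.
crass_density = reads_per_kb(19413, 97065)  # crAssphage genome: 97,065 nt
other_density = reads_per_kb(4000, 40000)   # a made-up 40 kb phage
print(crass_density / other_density)
```

Without the length correction, a long genome would look more abundant than a short one simply because it offers more sequence for reads to map to.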

  31. Viral database vs crAssphage [Figure: virome reads mapping to crAssphage, virome reads mapping to viral database, and unknown sequences] Reyes et al. Nature 2010

  32. Potential caveats • Phage or contamination? – Highly abundant in viral metagenomes size- and density-filtered for VLPs – ORFs show similarity to bacteriophage and bacterial proteins (no conserved bacterial or archaeal metabolic genes) – Phage-like modularity among functions – Coding structure of the ORFs is typical of a phage genome – Putative prokaryotic promoter patterns – Genome detected in many metagenomes around the world • Amplification skews?

  33. Summary ● crAssphage is everywhere ● everyone has it (rounding up) ● we don't know what it does

  34. Metagenomics ● Metagenomics 1.0: profiling ● Metagenomics 2.0: population genomics

  35. Tools for population genomics ● AbundanceBin ● CompostBin ● CONCOCT ● crAss ● GroopM ● MetaBAT ● mmgenome

  36. Discussion points ● How many genomes would you expect in a population? ● More coverage versus more samples? ● Cutoffs for inclusion (e.g. GC, closeness, etc.)

  37. Discussion points: How do you know the contigs are from the same organism (genotype)? – http://edwards.sdsu.edu/GenomePeek – BLAST hits – GC content or k-mer composition – Single copy genes – Abundance profiles across metagenomes – Paired ends/mate pairs – PCR – PFGE and size comparisons – SIP and metabolically active fraction – Single cell genomics – Culturing / genome sequencing
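Two of the checks listed above, GC content and k-mer composition, are simple to compute per contig. The sketch below is an illustration under simplifying assumptions: it counts tetranucleotides on one strand only (no reverse-complement merging) and compares profiles with plain Euclidean distance. The sequences are made up, and none of the listed tools is claimed to work exactly this way.

```python
from collections import Counter
from itertools import product

def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(1 for b in seq if b in "ACGT")
    gc = sum(1 for b in seq if b in "GC")
    return gc / acgt if acgt else 0.0

def kmer_profile(seq: str, k: int = 4) -> list:
    """Relative frequency of every possible k-mer, in a fixed alphabetical order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)  # skips k-mers containing N, etc.
    return [counts[km] / total if total else 0.0 for km in kmers]

def euclidean(p, q) -> float:
    """Euclidean distance between two equal-length profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

# Hypothetical contigs: a and b share composition, c is AT-rich.
contig_a = "ATGCGCGGCCGATCGC" * 4
contig_b = "GCGCGGCCGATCGCAT" * 4
contig_c = "ATATTTAAATTATAAT" * 4

d_ab = euclidean(kmer_profile(contig_a), kmer_profile(contig_b))
d_ac = euclidean(kmer_profile(contig_a), kmer_profile(contig_c))
print(gc_content(contig_a), gc_content(contig_c), d_ab, d_ac)
```

Contigs from the same genome tend to have similar GC content and k-mer profiles, so a small distance is supporting evidence rather than proof. This is why the slide lists it alongside abundance profiles, paired ends, and the other lines of evidence.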
