Population Genomics Image: Lisa Brown for National Public Radio Rob Edwards San Diego State University
Phages in the World's Oceans ● ARC: 56 samples, 16 sites, 1 year ● BBC: 85 samples, 38 sites, 8 years ● SAR: 1 sample, 1 site, 1 year ● GOM: 41 samples, 13 sites, 5 years ● LI: 4 sites, 1 year
Most Marine Phage Sequences are Novel
Marine Single-Stranded DNA Viruses • 6% of SAR sequences are ssDNA phage (Chlamydia-like Microviridae) • 40% of viral particles in SAR are ssDNA phage • Several full-genome sequences were recovered via de novo assembly of these fragments • Confirmed by PCR and sequencing
SAR metagenome and Chlamydia φ4 [Figure: individual sequence reads, coverage, and concatenated hits plotted along the Chlamydia φ4 genome with Chl4 ORF calls] 12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome
The phage proteomic tree
Signature sequences ● HECTOR and PARIS – Degenerate primers that amplify T7 DNA polymerase – Tested samples from around the world – Tested by different investigators in different laboratories Mya Breitbart
T7 phages are globally distributed ~10^26 copies of each sequence on the planet = 60 metric tons of this DNA sequence Breitbart et al, FEMS Micro Lett
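The copies-to-mass conversion on this slide can be checked with a short calculation. This is a minimal sketch, not the authors' calculation: the fragment length of 500 bp is an assumed value (the slide does not give the amplicon size), and 650 g/mol per base pair is the usual approximation for double-stranded DNA.

```python
AVOGADRO = 6.022e23      # molecules per mole
BP_MOLAR_MASS = 650.0    # g/mol per base pair of dsDNA (common approximation)


def dna_mass_tonnes(copies, fragment_bp):
    """Mass in metric tons of `copies` dsDNA fragments of `fragment_bp` base pairs."""
    grams_per_copy = fragment_bp * BP_MOLAR_MASS / AVOGADRO
    return copies * grams_per_copy / 1e6  # 1 metric ton = 1e6 g


# 1e26 copies comes from the slide; 500 bp is an ASSUMED fragment length.
mass = dna_mass_tonnes(copies=1e26, fragment_bp=500)
print(f"{mass:.0f} metric tons")
```

With these assumptions the result lands in the tens of metric tons, the same order as the 60 metric tons quoted on the slide.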
Some phages are everywhere [Figure: Phage Proteomic Tree v. 5 (Edwards, Rohwer) with ssDNA, λ-like, T4-like, and T7-like groups labeled]
Compare viruses to all metagenomes Phage P4 – 11 kb, 10 ORFs Blue – individual sequence reads in a metagenome Green – coverage across genome
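Coverage across a genome can be derived from the positions of the individual read hits. The following is a minimal sketch, assuming each hit is already available as a (start, end) pair in 0-based, end-exclusive genome coordinates; the function name and the toy hits are hypothetical and not taken from the slides.

```python
def coverage_from_hits(genome_length, hits):
    """Per-base coverage from (start, end) hits; 0-based, end exclusive."""
    # Difference array: +1 where a hit starts, -1 where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in hits:
        start = max(0, start)
        end = min(genome_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # Running sum of the difference array gives the depth at each position.
    coverage = []
    depth = 0
    for i in range(genome_length):
        depth += diff[i]
        coverage.append(depth)
    return coverage


# Toy example: three overlapping hits on a 10 nt "genome" (invented data).
toy_hits = [(0, 4), (2, 6), (5, 9)]
cov = coverage_from_hits(10, toy_hits)
print(cov)
```

The difference-array approach keeps the cost linear in genome length plus number of hits, which matters when millions of reads are mapped.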
Parts of viruses are everywhere [Plot: number of metagenome hits along the P4 phage genome]
Viruses have lots of unknown genes [Chart: known genes vs. unknown genes, microbial vs. viral]
Bas Dutilh
cross Assembly [Diagram: metagenome 1, metagenome 2, metagenome 3, metagenome 4 → Assembly]
cross Assembly Contigs directly represent the overlap between samples http://edwards.sdsu.edu/crass/
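The idea that cross-assembled contigs represent the overlap between samples can be illustrated with a small calculation. This is a sketch only: it uses a simple shared-contig (Jaccard-style) fraction, which is an assumption made here for illustration and not necessarily any of the distance formulas implemented in crAss. The contig table and sample names are invented.

```python
def shared_contig_fraction(contig_reads, sample_a, sample_b):
    """Fraction of contigs (among those hit by either sample) hit by both."""
    in_a = {c for c, counts in contig_reads.items() if counts.get(sample_a, 0) > 0}
    in_b = {c for c, counts in contig_reads.items() if counts.get(sample_b, 0) > 0}
    union = in_a | in_b
    if not union:
        return 0.0
    return len(in_a & in_b) / len(union)


# Hypothetical cross-assembly result: reads contributed per contig per sample.
toy_contigs = {
    "contig1": {"metagenome1": 10, "metagenome2": 7},
    "contig2": {"metagenome1": 3},
    "contig3": {"metagenome2": 5, "metagenome3": 2},
    "contig4": {"metagenome3": 8},
}

overlap_12 = shared_contig_fraction(toy_contigs, "metagenome1", "metagenome2")
overlap_13 = shared_contig_fraction(toy_contigs, "metagenome1", "metagenome3")
print(overlap_12, overlap_13)
```

Samples that contribute reads to many of the same contigs score high; samples with no contigs in common score zero.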
HMP viruses Reyes et al. Nature 2010
Phages are more variable than microbes [Figure: functions present in samples, microbes vs. phages] Reyes et al. Nature 2010
De novo assembly HMP data: 6,988 de novo cross-contigs [Plot: number of contigs (log scale, 1 to 10,000) vs. number of samples contributing reads to contig (1 to 12)] Reyes et al. Nature 2010
Big data – microbiome style [Plot: average depth across samples F1M, F1T1, F1T2, F2M, F2T1, F2T2, F3M, F3T1, F3T2, F4M, F4T1, F4T2]
Complete crAssphage genome
How big is the chimerization problem? Assembly algorithms include “chimera protection” ● Break contigs at ambiguities [Diagram: contig1, contig2, contig3, contig4, contig5] Investigate the effect of chimerization: ● Use different assembly parameters and assess results ● High stringency → few chimeras ● Low stringency → many chimeras
What are chimeras? Chimerization is more frequent between closely related strains ● Similar sequences Venus the chimeric cat https://www.facebook.com/VenusTheAmazingChimeraCat https://twitter.com/Venustwofacecat What are intra-phyla chimeras??? Aziz et al. NAR 2010
What are chimeras? Chimerization is more frequent between closely related strains ● Similar sequences ● What are intra-phyla chimeras??? Evolutionarily conserved entities, abundant and conserved enough to assemble Aziz et al. NAR 2010
What is the host? 1) Sequence homology between phage and bacterial genes 2) Similarity in CRISPR spacers 3) Oligonucleotide usage profile 4) Co-occurrence across metagenomic samples ● Reads mapped from 152 fecal total community metagenomes ● Reads mapped to phages and bacteria ● Normalize; Spearman rank correlations; cluster ● crAssphage clusters with Bacteroidetes ● Just like two known Bacteroides phages B40-8 and B124-14 5) Plaques
What is the host? ● Requires correct host strain ● Requires that the phage makes visible plaques ● Often requires correct concentrations of Mg++, Ca++, etc. No PCR hits in at least 100 plaques isolated from 10 pooled viral preparations on Bacteroides fragilis and B. thetaiotaomicron lawns.
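The co-occurrence approach to host prediction (normalize mapped read counts, then compute Spearman rank correlations across samples) can be sketched as follows. This is an illustrative, standard-library-only version under stated assumptions: all counts are invented toy data, normalization is a plain division by total reads per sample, and the clustering step is replaced by picking the best-correlated taxon.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


def normalize(counts, totals):
    """Convert mapped read counts to fractions of each sample's total reads."""
    return [c / t for c, t in zip(counts, totals)]


# Invented toy data: reads mapped per sample (5 samples).
totals = [1000, 2000, 1500, 1200, 1800]
phage = normalize([10, 80, 30, 18, 90], totals)
bacteria = {
    "Bacteroidetes": normalize([100, 600, 300, 150, 900], totals),
    "Firmicutes": normalize([500, 200, 450, 480, 180], totals),
}

rho = {taxon: spearman(phage, profile) for taxon, profile in bacteria.items()}
best_host = max(rho, key=rho.get)
print(best_host, rho)
```

Rank correlation is used rather than a linear one because phage and host abundances need not scale proportionally, only rise and fall together across samples.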
% crAssphage found in intestines Looked at 2,906 metagenomes Only found in 940 metagenomes Genome position: 0 – 97,065 nt
crAssphage is abundant! Abundance-ubiquity plot
crAssphage by the numbers • Present in 32.3% of sequenced environmental samples (940 / 2,906) – Includes virus metagenomes and total community metagenomes • >6x more abundant than all (1,192) other known phages combined – Corrected for genome size • Present in 73.4% of sequenced human fecal samples (342 / 466) – 99.9% of all crAssphage reads were found in feces (significant) • 1.68% of the reads in all human fecal metagenomes • Estimate: ~6 crAssphage genomes per Bacteroides genome in your gut • >90% of the reads in some of the virus metagenomes from the US twin study • 24% of the reads in an unrelated virus metagenome from Korea • 22% of the reads in total community metagenomes from USA (HMP data) • Found on every continent (where we have data)
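The prevalence figures above follow directly from the sample counts, and "corrected for genome size" can be read as dividing read counts by genome length. A minimal sketch follows; the sample counts and the 97,065 nt genome length come from the slides, while the read counts and the 40 kb comparison genome are hypothetical, and reads per kb is only one plausible way to do the correction.

```python
def prevalence(positive, total):
    """Percentage of samples in which the genome was detected."""
    return 100.0 * positive / total


def reads_per_kb(read_count, genome_length_nt):
    """Read count corrected for genome size (reads per kilobase of genome)."""
    return read_count / (genome_length_nt / 1000.0)


env = prevalence(940, 2906)   # sequenced environmental samples
fecal = prevalence(342, 466)  # sequenced human fecal samples
print(f"{env:.1f}% of environmental samples, {fecal:.1f}% of fecal samples")

# Hypothetical read counts, for illustration only.
crass = reads_per_kb(5000, 97065)  # crAssphage genome length from the slides
other = reads_per_kb(5000, 40000)  # an assumed 40 kb phage genome
print(crass < other)  # equal read counts mean lower abundance for the longer genome
```

Without the size correction, long genomes look more abundant simply because they attract more reads per copy.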
Viral database vs crAssphage [Chart: virome reads mapping to crAssphage, virome reads mapping to the viral database, and unknown sequences] Reyes et al. Nature 2010
Potential caveats • Phage or contamination? – Highly abundant in viral metagenomes size- and density-filtered for VLPs – ORFs show similarity to bacteriophage and bacterial proteins (no conserved bacterial or archaeal metabolic genes) – Phage-like modularity among functions – Coding structure of the ORFs is typical of a phage genome – Putative prokaryotic promoter patterns – Genome detected in many metagenomes around the world • Amplification skews?
Summary ● crAssphage is everywhere ● everyone has it (rounding up) ● we don't know what it does
metagenomics ● metagenomics 1.0: profiling ● metagenomics 2.0: population genomics
Tools for population genomics ● AbundanceBin ● CompostBin ● CONCOCT ● crAss ● GroopM ● Metabat ● mmgenome
Discussion points ● How many genomes would you expect in a population? ● More coverage versus more samples? ● Cutoffs for inclusion (e.g. GC, closeness, etc.)
Discussion points How do you know the contigs are from the same organism (genotype)? – http://edwards.sdsu.edu/GenomePeek – BLAST hits – GC content or k-mer composition – Single copy genes – Abundance profiles across metagenomes – Paired ends/mate pairs – PCR – PFGE and size comparisons – SIP and metabolically active fraction – Single cell genomics – Culturing / genome sequencing
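Two of the criteria listed above, GC content and k-mer composition, are easy to compute directly from contig sequences. The sketch below is illustrative only: the sequences are invented, tetranucleotides (k = 4) and Euclidean distance are assumed choices, and reverse-complement k-mers are not merged as binning tools often do.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among unambiguous bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_profile(seq, k=4):
    """Relative frequency of every ACGT k-mer, in a fixed lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# Invented toy contigs: a and b share composition, c is AT-rich.
contig_a = "ATGCGCGCATGCGCGGCC" * 5
contig_b = "GCGCATGCGCGGCCATGC" * 5
contig_c = "ATATTTAAATTATAATAT" * 5

d_ab = profile_distance(kmer_profile(contig_a), kmer_profile(contig_b))
d_ac = profile_distance(kmer_profile(contig_a), kmer_profile(contig_c))
print(gc_content(contig_a), gc_content(contig_c), d_ab < d_ac)
```

Contigs with similar GC content and a small profile distance are candidates for the same genome; the other criteria on the slide (abundance profiles, single copy genes, paired ends) would then be used to confirm or reject the grouping.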