Highly similar families to blame
Is there any way to improve this?
Statistical/Machine-Learning Correction
DIAMOND-BLASTX output → classifier → AMR gene predictions
Average precision: 0.63
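Average precision (AP), the summary metric quoted for this classifier, condenses the precision-recall curve into one number: AP = sum over thresholds n of (R_n - R_(n-1)) * P_n. A minimal sketch of that standard non-interpolated definition (the same one scikit-learn's `average_precision_score` uses). The input format, one confidence score plus a correct/incorrect label per prediction, is an assumption and not the talk's actual evaluation code.

```python
from itertools import groupby


def average_precision(scores, labels):
    """Non-interpolated AP: sum over thresholds of (R_n - R_(n-1)) * P_n.

    scores: classifier confidence per prediction (higher = more confident).
    labels: 1 if the prediction is a true positive, 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision is undefined with no positives")
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    # Tied scores share a single threshold, so they are processed as one group.
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        for _, label in group:
            if label:
                tp += 1
            else:
                fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


scores = [0.9, 0.8, 0.7, 0.6]  # hypothetical classifier confidences
labels = [1, 0, 1, 0]          # 1 = correct AMR gene call
print(round(average_precision(scores, labels), 3))  # 0.833
```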
Revised classifier structure: exploiting the ARO
DIAMOND-BLASTX output → AMR family classifier → reads split by AMR family (Family 1 reads ... Family N reads) → one classifier per family (Family 1 classifier ... Family N classifier) → AMR gene predictions
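The revised structure is a two-stage (hierarchical) classifier: first assign the read to an AMR family, then hand it to a classifier dedicated to that family. A minimal sketch of the routing only, assuming each read's DIAMOND-BLASTX hits have been reduced to a hypothetical `{gene: bitscore}` dict and that the ARO provides a gene-to-family mapping. The gene and family names are made up, and the max-bitscore rules are placeholders for the trained models, not the models used in the talk.

```python
from collections import defaultdict

# Hypothetical ARO-derived mapping from reference gene to AMR gene family.
GENE_TO_FAMILY = {"geneA1": "family_A", "geneA2": "family_A", "geneB1": "family_B"}


def predict_family(hits, gene_to_family):
    """Stage 1 placeholder: choose the family containing the best-scoring hit."""
    best = defaultdict(float)
    for gene, bitscore in hits.items():
        family = gene_to_family[gene]
        best[family] = max(best[family], bitscore)
    return max(best, key=best.get)


def predict_gene_in_family(hits, family, gene_to_family):
    """Stage 2 placeholder: choose the best-scoring gene within one family."""
    in_family = {g: s for g, s in hits.items() if gene_to_family[g] == family}
    return max(in_family, key=in_family.get)


def classify_read(hits, gene_to_family, family_classifiers):
    """Route a read through the family classifier, then that family's own classifier."""
    family = predict_family(hits, gene_to_family)
    gene = family_classifiers[family](hits, family, gene_to_family)
    return family, gene


# One stage-2 model per family (identical here); in practice each would be
# trained only on reads from genes in that family.
FAMILY_CLASSIFIERS = {fam: predict_gene_in_family for fam in set(GENE_TO_FAMILY.values())}

read_hits = {"geneA1": 250.0, "geneA2": 240.0, "geneB1": 180.0}
print(classify_read(read_hits, GENE_TO_FAMILY, FAMILY_CLASSIFIERS))  # ('family_A', 'geneA1')
```

Splitting the problem this way lets each per-family model focus on the fine distinctions between closely related genes, which is where a single flat classifier struggles.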
Slightly improved family performance
[Figure: family-level test performance (precision and recall), normalised bitscore vs. random forest. Mean precision: 0.995, mean recall: 0.985]
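The family-level figures are means of precision and recall across families. Assuming they are unweighted (macro) averages of per-family scores (the slide does not say how the mean is taken), they could be computed as below. The family labels are hypothetical, and scoring an undefined precision or recall as 0.0 is a convention chosen here.

```python
def per_family_precision_recall(true_families, predicted_families):
    """Per-family (precision, recall) from paired true/predicted labels; 0.0 if undefined."""
    pairs = list(zip(true_families, predicted_families))
    results = {}
    for fam in set(true_families) | set(predicted_families):
        tp = sum(1 for t, p in pairs if t == fam and p == fam)
        fp = sum(1 for t, p in pairs if t != fam and p == fam)
        fn = sum(1 for t, p in pairs if t == fam and p != fam)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[fam] = (precision, recall)
    return results


def macro_mean(results):
    """Unweighted mean of per-family precision and recall."""
    n = len(results)
    mean_precision = sum(p for p, _ in results.values()) / n
    mean_recall = sum(r for _, r in results.values()) / n
    return mean_precision, mean_recall


true = ["family_A", "family_A", "family_B", "family_B"]
pred = ["family_A", "family_B", "family_B", "family_B"]
print(macro_mean(per_family_precision_recall(true, pred)))
```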
Greatly improved gene performance
Gains not evenly distributed
[Figure: median precision and recall within each AMR family, plotted against ordered AMR family index]
• Where a read lacks enough signal to resolve a single gene, output the compatible set of genes
• Some bugs fixed
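One way to output a compatible set when a read cannot separate closely related genes is to report every gene that scores close to the read's best hit. A minimal sketch; the relative-bitscore rule, its 5% default tolerance and the gene names are illustrative assumptions, not the criterion used in the talk.

```python
def compatible_set(hits, tolerance=0.05):
    """Return all genes scoring within `tolerance` (a fraction) of the read's best bitscore.

    One gene comes back when a single hit clearly dominates; several come back
    when the read does not carry enough signal to tell them apart.
    """
    if not hits:
        return set()
    best = max(hits.values())
    cutoff = best * (1 - tolerance)
    return {gene for gene, score in hits.items() if score >= cutoff}


# Hypothetical reads: one ambiguous between two close homologues, one unambiguous.
print(sorted(compatible_set({"geneA1": 250.0, "geneA2": 245.0, "geneA3": 200.0})))  # ['geneA1', 'geneA2']
print(sorted(compatible_set({"geneA1": 300.0, "geneA2": 100.0})))  # ['geneA1']
```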
Metagenomic resistome profile
[Figure: AMR hits related to each ARO drug class across 47 human gut metagenome profiles, coloured by normalised read proportion (log scale, 10^-6 to 10^-3). Rows are ARO drug classes (e.g. beta-lactam antibiotic, tetracycline derivative, aminoglycoside antibiotic, macrolide antibiotic) plus an indeterminate class.]
Great, but...
• Only known AMR genes
• Is one organism resistant to everything?
• Are many organisms each resistant to one thing?
• Have AMR genes been laterally transferred?
Can we get the best of metagenomics and genomics?
Metagenome-Assembled Genomes
MAG binning
Genomes → sequencing → reads → assembly → contigs → binning → metagenome-assembled genomes
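Binners such as MetaBAT 2 and CONCOCT group contigs using sequence composition (tetranucleotide frequencies) together with coverage across samples. A minimal sketch of the composition half only: a canonical tetranucleotide frequency vector per contig, with each 4-mer merged with its reverse complement so both strands give the same profile. This is not any tool's actual implementation, and real binners also use coverage. It does show why elements whose composition differs from their host genome are easy to place in the wrong bin.

```python
from itertools import product

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """A k-mer and its reverse complement come from the same locus on opposite strands."""
    return min(kmer, reverse_complement(kmer))


# The 256 possible 4-mers collapse to 136 canonical ones.
CANONICAL_4MERS = sorted({canonical("".join(p)) for p in product(BASES, repeat=4)})
_INDEX = {k: i for i, k in enumerate(CANONICAL_4MERS)}


def tetranucleotide_frequencies(contig):
    """Normalised canonical 4-mer frequencies for one contig; 4-mers containing N are skipped."""
    counts = [0] * len(CANONICAL_4MERS)
    seq = contig.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if all(base in BASES for base in kmer):
            counts[_INDEX[canonical(kmer)]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


print(len(CANONICAL_4MERS))  # 136
```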
MAGs are popular
Figure from (Parks et al., 2017)
What about plasmids?
Figure from (Antipov et al., 2016)
• Circular or linear extrachromosomal, self-replicating DNA
• A major route for the dissemination of AMR genes
• Repetitive, with variable copy number and a sequence composition that differs from the host chromosome
Or genomic islands?
www.pathogenomics.sfu.ca/islandviewer
• Clusters of genes acquired through lateral gene transfer (LGT)
• Includes integrons, transposons, integrative and conjugative elements (ICEs) and prophages
• Variable copy number and atypical sequence composition (the signal used by GI predictors such as SIGI-HMM and IslandPath-DIMOB)
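Compositional GI predictors rely on recently transferred DNA often differing in composition from the rest of its host genome. SIGI-HMM (codon usage) and IslandPath-DIMOB (dinucleotide bias plus mobility genes) are far more sophisticated. The sketch below only illustrates the underlying idea with a sliding-window GC-content outlier test; the window size, step and 2-SD cutoff are arbitrary assumptions, and the genome is synthetic.

```python
import statistics


def gc_content(seq):
    """Fraction of G/C among unambiguous bases (0.0 if there are none)."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / called if called else 0.0


def atypical_gc_windows(genome, window=5000, step=2500, z_cutoff=2.0):
    """Flag windows whose GC deviates from the mean window GC by more than z_cutoff SDs."""
    starts = range(0, max(len(genome) - window, 0) + 1, step)
    gcs = [gc_content(genome[s:s + window]) for s in starts]
    mean = statistics.fmean(gcs)
    sd = statistics.pstdev(gcs)
    if sd == 0:
        return []
    return [(s, s + window, gc) for s, gc in zip(starts, gcs) if abs(gc - mean) / sd > z_cutoff]


# Synthetic genome: 50% GC background with a 100 bp GC-rich "island" in the middle.
background = "ATGC" * 25
island = "GC" * 50
genome = background * 10 + island + background * 10
print(atypical_gc_windows(genome, window=100, step=100))  # [(1000, 1100, 1.0)]
```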
How well do MAGs recover these sequences?
Time to start simulating again
• Simulate metagenomes (lognormal abundance distribution) from difficult genomes:
  • 10 genomes with many plasmids
  • 10 genomes with a high % of genomic islands (compositional)
  • 10 genomes with a low % of genomic islands
• Assemble with 3 alternative methods: IDBA-UD, metaSPAdes, MEGAHIT
• Bin contigs with 4 different tools: MetaBAT 2, MaxBin 2, CONCOCT, DAS Tool
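Drawing a lognormal abundance profile for the simulated community takes only a few lines. A minimal sketch, assuming 30 hypothetical genome IDs (the three groups of 10) and arbitrary `mu`/`sigma` values. It does not reproduce the parameters or read simulator used in the study, which the slide does not specify.

```python
import random


def lognormal_abundances(genome_ids, mu=1.0, sigma=2.0, seed=None):
    """Draw one lognormal weight per genome and normalise to relative abundances summing to 1."""
    rng = random.Random(seed)
    weights = [rng.lognormvariate(mu, sigma) for _ in genome_ids]
    total = sum(weights)
    return {gid: w / total for gid, w in zip(genome_ids, weights)}


def reads_per_genome(abundances, total_reads):
    """Split a read budget in proportion to abundance (rounded down; remainder dropped)."""
    return {gid: int(a * total_reads) for gid, a in abundances.items()}


# Hypothetical IDs standing in for the plasmid-rich, high-GI and low-GI genome sets.
genome_ids = [f"genome_{i:02d}" for i in range(30)]
abundances = lognormal_abundances(genome_ids, seed=42)
reads = reads_per_genome(abundances, total_reads=1_000_000)
print(max(reads.values()), min(reads.values()))
```

A lognormal profile gives a few dominant genomes and a long tail of rare ones, which is what makes low-abundance genomes (and their plasmids) hard to assemble and bin.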
Chromosomes fairly well binned
26-94.3% median chromosomal coverage
(Preprint draft: github.com/fmaguire/mag_sim_paper)
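A figure like "median chromosomal coverage" depends on measuring how much of each reference replicon ends up in bins. A minimal sketch, assuming binned contigs have already been aligned back to the reference genomes to give 0-based, half-open coordinate intervals. The exact definition used in the preprint may differ, and the numbers below are toy values.

```python
import statistics


def covered_fraction(replicon_length, alignments):
    """Fraction of a reference replicon covered by aligned, binned contigs.

    alignments: (start, end) half-open intervals on the reference.
    Overlapping or adjacent intervals are merged so each base is counted once.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(alignments):
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / replicon_length


# Toy replicons: per-replicon coverage, then the median across them.
fractions = [
    covered_fraction(100, [(0, 30), (20, 50), (70, 80)]),  # 0.6
    covered_fraction(200, [(0, 180)]),                     # 0.9
    covered_fraction(50, [(10, 20)]),                      # 0.2
]
median_coverage = statistics.median(fractions)
print(median_coverage)  # 0.6
```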
Plasmids are not well binned
1.5-29.2% of plasmids binned
Genomic islands fare better, but still poorly
28-42% of GIs binned
What about AMR genes?
24-43% of AMR genes binned
Which AMR genes are lost?
• 30-53% of chromosomal AMR genes binned (n=120)
• 0-45% of genomic island AMR genes binned (n=11)
• 0% of plasmid AMR genes binned (n=20)
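Per-category binning rates come from tallying, for each AMR gene, where it sits and whether it landed in a bin. With small groups each gene moves the percentage a lot: with n=11 genomic island genes, a single gene is about 9 percentage points, and with n=20 plasmid genes it is 5. A minimal sketch with hypothetical toy records (not the study's data):

```python
from collections import Counter


def percent_binned_by_category(genes):
    """genes: iterable of (category, was_binned) pairs; returns {category: % binned}."""
    totals = Counter()
    binned = Counter()
    for category, was_binned in genes:
        totals[category] += 1
        if was_binned:
            binned[category] += 1
    return {c: 100 * binned[c] / totals[c] for c in totals}


toy_genes = [("chromosome", True), ("chromosome", False), ("plasmid", False)]
print(percent_binned_by_category(toy_genes))  # {'chromosome': 50.0, 'plasmid': 0.0}
```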
Be cautious with MAGs
• Regain some context, but with biased data loss
• Disproportionate loss of AMR genes
• Mobile genetic elements (MGEs) poorly recovered
• Cautionary tale: more processing = more data loss
Conclusions
Method                         Strengths                    Weaknesses
Targeted                       Cheap, easy analysis         A priori target choice, stagnation
Genomics                       Context, moderate analysis   Requires isolation, limited throughput
Metagenomics                   Many genomes at once         Fragmented, no context, difficult analysis
Metagenome-assembled genomes   Context for many genomes     Lose key data, complex analysis
• Simulation is fundamental to evaluating approaches
Recommendations
More recommendations