a metagenomic tool for cheese ecosystems

  1. A metagenomic tool for cheese ecosystems
     Anne-Laure Abraham, Quentin Cavaillé, Thibaut Guirimand, Sandra Dérozier, Charlie Pauvert, Mahendra Mariadassou, Bedis Dridi, Valentin Loux, Pierre Renault
     Jouy-en-Josas, France. Sept 9, 2018

  2. Cheesemaking: evolution of the ecosystem during cheese making
     • Inoculated microorganisms: starters, ripening cultures
     • House microbiota: microorganisms from animal milk, water flows, airflows; from salt; from shelves, cellar
     • Microorganisms: bacteria, yeasts, fungi, phages
     RCAM 2018

  3. Properties of cheese microorganisms
     • Production of: lactic acid, carbon dioxide, alcohol, aldehydes, ketones …
     • Organoleptic properties: acid flavor, fruity flavor, formation of bubbles
     • Coat texture, coat color
     • Diagram labels: starters; ripening cultures; microorganisms from animal milk, water flows, airflows; from shelves, cellar; from salt

  4. Knowledge of cheese microorganisms
     • Inoculated microorganisms (starters, ripening cultures):
       • Defined starter cultures: known; more vulnerable to bacteriophage attack
       • Undefined complex starters ("domesticated cultures"): not completely known
     • House microbiota (microorganisms from animal milk, water flows, airflows; from shelves, cellar; from salt): not completely known

  5. Why study cheese ecosystems?
     Major reduction in the diversity of microorganisms due to sanitary pressure & intensification of production
     • Identify origin of organoleptic properties of strains
     • Follow ecosystem during cheese manufacturing
     • Protect functional properties of strains
     • Quality control
     • Study strain diversity
     • Compare production lines

  6. Food microbiomes project
     • Project with academic & dairy industries
     • Use metagenomics to achieve a better understanding of cheese ecosystems: develop a user-friendly tool to analyze cheese samples
     • Characteristics of cheese ecosystems
       • Few species (a few dozen)
       • More than 4000 sequenced dairy genomes (≥ 1 genome for most species)
     • Needs
       • Precise taxonomic assignation (strain level)
       • Low abundant species identification
       • Identification of genes (and their functions)
       • A user-friendly interface for non-bioinformaticians
       • A database with dairy genomes
       • Results easy to understand
       • Public / private genomes & metagenomes

  7. Metagenomic shotgun taxonomic assignation
     • Methods based on k-mer or Burrows-Wheeler transform: fast, large database; limited taxonomic assignation precision
       • Kaiju (Menzel, 2015)
       • CLARK (Ounit, 2015)
       • Kraken (Wood, 2014)
       • Centrifuge (Kim, 2016)
     • Methods based on genomes/contigs mapping and methods based on marker genes: slow, limited database; precise taxonomic assignation; identification of strain-level variation
       • Genomes/contigs mapping: Sigma (Ahn, Bioinformatics, 2015), MicrobeGPS (Lindner, PLoS One, 2015), MetaSNV (Costea, PLoS One, 2017), DESMAN (Quince, Genome Biol, 2017)
       • Marker genes: ConStrains (Luo, Nat Biotech, 2015), metaMLST (Zolfo, NAR, 2017), StrainPhlAn (Truong, Genome Research, 2017)

  8. Metagenomic alignment
     • Diagram: ecosystem => sequencing => alignment against reference genomes
     • Alignment mismatches and unaligned reads: sequencing errors & absence of good reference genome
     • => Choice of alignment parameters

  9. Metagenomic alignment
     • Diagram: ecosystem => sequencing => alignment against reference genomes
     • Repeated regions, transposable elements, conserved regions: regions with high reads coverage
     • Low abundance, high abundance: heterogeneous sequencing depth
     • => Choice of alignment results cleaning

  10. Coverage of genomes (figure panels)
      • Very close strain, high abundance
      • Close strain, intermediate abundance
      • Absent strain

  11. Characteristics of alignment
      • Software: Bowtie (Langmead, Genome Biology 2009)
        • 3 mismatches allowed (-v)
        • If several best hits, choose one randomly (-a --best --strata -M 1)
      • Select reads that align on CDS
      • Filter some CDS:
        • Annotated: integrase, transposases, IS, phage
        • Length < 300 nt
      (Diagram: three CDS, two of them marked "Filtered")
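The CDS filter on this slide can be sketched as follows. This is only an illustration: the slide does not say how annotations are matched, so substring matching on the product field, the `Cds` record, and the `keep_cds` helper are all assumptions (hypothetical names, not the authors' code). Bare "IS" abbreviations such as `IS30` would need an extra pattern that is not attempted here.

```python
from dataclasses import dataclass

# Keywords derived from the slide's list (integrase, transposases, IS, phage).
# Substring matching on the product annotation is an assumption.
MOBILE_KEYWORDS = ("integrase", "transposase", "insertion sequence", "phage")
MIN_CDS_LENGTH = 300  # nucleotides; CDS shorter than this are filtered out


@dataclass
class Cds:
    """Hypothetical minimal CDS record (coordinates are 1-based, inclusive)."""
    name: str
    start: int
    end: int
    product: str

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def keep_cds(cds: Cds) -> bool:
    """Return True if the CDS passes both the annotation and length filters."""
    product = cds.product.lower()
    if any(keyword in product for keyword in MOBILE_KEYWORDS):
        return False
    return cds.length >= MIN_CDS_LENGTH


cds_list = [
    Cds("a", 1, 900, "lactate dehydrogenase"),
    Cds("b", 1, 1200, "IS30 family transposase"),
    Cds("c", 1, 150, "hypothetical protein"),
    Cds("d", 1, 300, "hypothetical protein"),
]
kept = [cds.name for cds in cds_list if keep_cds(cds)]
```

Here `kept` contains only the CDS that are neither annotated as mobile elements nor shorter than 300 nt.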

  12. Characteristics of mapping
      • Compute expected coverage
        • Fraction of genome that should be covered by at least one read if the genome is present
        • Lander & Waterman statistics: $C = 1 - \exp\left(-\frac{\mathrm{ReadLength} \times \mathrm{ReadNumber}}{\mathrm{GenomeLength}}\right)$
      • Expected distribution vs observed distribution (diagram: CDS, some of them filtered)
      • Samtools & bedtools (htslib.org):
        • Identify variant positions
        • VCF file
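As a small worked example of the Lander & Waterman statistic named on this slide, the sketch below computes the expected covered fraction C = 1 - exp(-ReadLength × ReadNumber / GenomeLength). The function name and the example numbers are illustrative assumptions, not values from the talk.

```python
import math


def expected_covered_fraction(read_length: int, read_number: int,
                              genome_length: int) -> float:
    """Expected fraction of the genome covered by at least one read
    (Lander & Waterman), assuming reads fall uniformly on the genome."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    mean_depth = read_length * read_number / genome_length
    return 1.0 - math.exp(-mean_depth)


# Hypothetical example: 940 reads of 100 nt on a 1 Mb genome
# give a mean depth of 0.094 and an expected covered fraction of about 0.09.
low = expected_covered_fraction(100, 940, 1_000_000)

# At a mean depth of 50, essentially every position is expected to be covered.
high = expected_covered_fraction(100, 500_000, 1_000_000)
```

Comparing this expected value with the fraction actually covered is what separates a genome that is present at low abundance from one that only attracts reads on a few conserved regions.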

  13. Schema
      • Inputs: metagenome (fastq); reference genomes database (genbank)
      • From the reference genomes: reference genome indexes creation; gene annotations (GFF)
      • Alignment (bowtie) => reads alignment (BAM)
      • Summary (Samtools, Bedtools) => summary for each genome (CSV); reads for each CDS (GFF)
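The schema can be read as a chain of command-line steps. The sketch below only assembles plausible command lines and does not run them; it is not the authors' pipeline. The bowtie flags are the ones quoted in the talk, while the file names, the `build_pipeline_commands` helper, the SAM intermediate, and the choice of `bedtools coverage` for the per-CDS summary are assumptions. It also assumes the GenBank references were already converted to FASTA and GFF upstream.

```python
from pathlib import Path
from typing import List


def build_pipeline_commands(reference_fasta: str, reads_fastq: str,
                            cds_gff: str, workdir: str) -> List[List[str]]:
    """Assemble (without executing) one command per step of the schema."""
    work = Path(workdir)
    index = str(work / "reference")       # bowtie index prefix (hypothetical)
    sam = str(work / "aln.sam")
    bam = str(work / "aln.sorted.bam")
    return [
        # Reference genome index creation
        ["bowtie-build", reference_fasta, index],
        # Alignment with the flags quoted in the talk; -S requests SAM output
        ["bowtie", "-v", "3", "-a", "--best", "--strata", "-M", "1",
         "-S", index, reads_fastq, sam],
        # Sorted, indexed BAM for downstream tools
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Per-CDS read counts and covered length (written to stdout)
        ["bedtools", "coverage", "-a", cds_gff, "-b", bam],
    ]


commands = build_pipeline_commands("ref.fa", "reads.fastq", "cds.gff", "out")
```

Each entry could then be passed to `subprocess.run(cmd, check=True)`, with the output of the last step redirected to a file.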

  14. Software output
      • Reads for each CDS (GFF):
        • CDS localization
        • CDS name & product
        • CDS length, length covered by reads & number of positions with mismatches
        • CDS coverage
      • Summary for each genome (CSV):
        • Genome name
        • CDS number
        • % CDS with at least 1 read
        • % positions covered by reads
        • Expected % positions covered by reads (Lander & Waterman)
        • Mean, median, sd coverage
        • Number of variant positions
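Several of the per-genome summary fields can be derived from a per-position read depth vector. The sketch below is an assumption about how such values might be computed, not the tool's actual code; `summarize_depth` and the dictionary keys are hypothetical names, and the population standard deviation is an arbitrary choice because the slide does not say which estimator is used.

```python
import statistics
from typing import Dict, Sequence


def summarize_depth(depths: Sequence[int]) -> Dict[str, float]:
    """Summarize per-position read depth for one genome."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth > 0)
    return {
        "pct_positions_covered": 100.0 * covered / len(depths),
        "mean_coverage": statistics.fmean(depths),
        "median_coverage": statistics.median(depths),
        # Population standard deviation (assumption; see lead-in)
        "sd_coverage": statistics.pstdev(depths),
    }


# Toy genome of four positions, two of them covered twice
summary = summarize_depth([0, 0, 2, 2])
```

For this toy input, half of the positions are covered and the mean, median and standard deviation of the depth are all 1.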

  15. A dedicated dairy database
      • Based on organisms known to be in dairy products
      • Database enrichment: sequencing and assembly of new species isolated from dairy products
        • 150 bacterial species & 15 filamentous fungi and yeasts
      • 4000 genomes, manually selected
      • Work in progress:
        • Use text mining to:
          • Identify dairy species in the literature
          • Identify habitat of species found in metagenomics (for example: sea for salt bacteria)
        • Annotation enrichment: genes of technological interest
      Collaboration: C. Nedellec team, MaIAGE
      (Almeida et al. 2014, BMC Genomics)

  16. Web interface & server
      • User-friendly interface
      • Public/private genomes and samples
      • Personalized analyses
      Quentin Cavaillé, Thibaut Guirimand, Sandra Dérozier, Pierre Renault, Valentin Loux

  17. Tchapalo ecosystem
      • Tchapalo: traditional beer in Côte d'Ivoire
        • Mean production: 38,000 t/year
        • Daily family consumption
        • Income-generating economic activity
      • Production process:
        • Sorghum malt goes through a double fermentation:
          • Natural lactic fermentation => sour wort
          • Alcoholic fermentation => Tchapalo
      Racha ZAARIR

  18. Tchapalo ecosystem analysis (figure; species labels not recoverable)
      • Metagenomic analysis: 72.3% and 25.1%
      • Microbiology analysis: 80.2% and 15.9%

  19. Tchapalo ecosystem: abundant species

      Strain | % CDS covered | Mean coverage | % genome coverage | Expected % coverage
      Lactobacillus fermentum S6 | 100 | 54.979 | 99.215 | 100
      Lactobacillus delbrueckii subsp. lactis KCCM 34717 | 95.503 | 150.326 | 91.717 | 100
      Lactobacillus delbrueckii subsp. jakobsenii | 99.669 | 164.759 | 99.119 | 100

      • The strain Lactobacillus fermentum S6 is very close to the strain of the ecosystem
      • Lactobacillus delbrueckii subsp. jakobsenii is closer to the strain of the ecosystem than Lactobacillus delbrueckii subsp. lactis KCCM 34717

  20. Tchapalo ecosystem: low abundant species

      Strain | % CDS covered | Mean coverage | % genome coverage | Expected % coverage | # reads
      Saccharomyces cerevisiae YJM326 | 90.727 | 0.094 | 8.145 | 8.908 | 21405
      Pediococcus acidilactici DSM 20284 | 81.706 | 0.577 | 9.418 | 46.005 | 28706

      • The strain Saccharomyces cerevisiae YJM326 is very close to the strain of the ecosystem
      • Pediococcus acidilactici DSM 20284 is absent from the ecosystem (reads coming from other Lactobacillaceae)
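The two conclusions on this slide follow from comparing observed and expected genome coverage. The sketch below makes that comparison explicit, reading the table as 8.145% observed vs 8.908% expected for Saccharomyces cerevisiae YJM326 and 9.418% observed vs 46.005% expected for Pediococcus acidilactici DSM 20284. The `min_ratio` threshold of 0.8 and both helper names are purely illustrative assumptions; the talk does not state a decision rule.

```python
def coverage_ratio(observed_pct: float, expected_pct: float) -> float:
    """Observed / expected fraction of genome positions covered by reads."""
    if expected_pct <= 0:
        raise ValueError("expected_pct must be positive")
    return observed_pct / expected_pct


def looks_present(observed_pct: float, expected_pct: float,
                  min_ratio: float = 0.8) -> bool:
    """Heuristic presence call; min_ratio is a hypothetical threshold."""
    return coverage_ratio(observed_pct, expected_pct) >= min_ratio


# Values from the slide: (observed % genome coverage, expected % coverage)
yeast_present = looks_present(8.145, 8.908)
pediococcus_present = looks_present(9.418, 46.005)
```

With these inputs the yeast genome is covered about as much as expected for its read count, whereas the Pediococcus genome is covered on roughly a fifth of what its read count predicts, consistent with reads piling up on regions shared with other Lactobacillaceae.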

  21. Conclusion: providing a user-friendly tool for metagenomic analysis
      • Metagenomic software: metagenome (fastq) and reference genomes database (genbank) => reference genome indexes creation, gene annotations (GFF) => alignment (bowtie) => reads alignment (BAM) => summary (Samtools, Bedtools) => summary for each genome (CSV), reads for each gene (GFF)
      • Web interface
      • Reference genome database
      • Will be publicly available for research purposes
      • An account on the INRA migale platform is required
      • The software and database development are still ongoing

  22. Perspectives
      • Genomes pre-selection using a faster method (k-mer or Burrows-Wheeler transform) to speed up computation
      • Allow comparison of metagenome analyses
      • Apply it to the MetaPDOCheese project (next slide)
      • Application to other ecosystems with enough reference genomes (for example: fermented food, animal digestive ecosystems …)
