Some biological questions in bacterial comparative genomics


  1. Some biological questions in bacterial comparative genomics • Meriem El Karoui, INRA, Jouy-en-Josas • Meriem_elkaroui@hms.harvard.edu • MICALIS • 9 December, 2008 • January 15th 2010 • AERES • Alimentation, Agriculture, Environnement

  2. Comparative genomics: definition • The study of relationships between complete genomes of different species/individuals. • Objectives: – Describe and understand genomic diversity • Identify conserved and variable regions • Understand species/individual specificity • Identify functional parts, genes, promoters... – Understand genome evolution MICALIS Marseille, January 2010

  3. Key dates • 1995: complete genomes of Haemophilus influenzae and Mycoplasma genitalium • 1999: first comparison at the strain level (Helicobacter pylori) • 2003: first metagenome • 2004: first complete human genome • 2007: second human genome

  4. A very large amount of data [Figure: complete genomes by domain (series A, B, E), 1996 to 2008, y-axis 0 to 250; GOLD database, September 2009]

  5. Data • Complete genomes • Not-so-complete genomes • Metagenomes • Different sequence quality standards are emerging (draft, high-quality draft, improved high-quality draft, annotation-directed improvement, noncontiguous finished, finished; Chain et al., Science, 2009)

  6. Next generation sequencing (Metzker, Nature Reviews Genetics, 2010)
     Platform: DNA amplification/template; read length; Gb/run
     • Roche 454: Yes/MP; 350; 0.45
     • Illumina: Yes/MP; 75; 18
     • SOLiD: Yes/MP; 35; 30
     • Heliscope: No/MP; 32; 37
     • Pacific Bioscience: No; 965; N/A

  7. Evolutionary scale • Different individuals in the same species • Closely related species • Divergent species

  8. Vertebrate genomes (Margulies and Birney, Nat. Rev. Genetics, 2008)

  9. Bacterial genomes (Wu, Hugenholtz et al., Nature, 2009)

  10. Databases • GenBank • EMBL • DDBJ • Specialized databases (NAR database issue, January 2010)

  11. ANALYSIS OF GENETIC DIVERSITY

  12. Define conserved and variable regions • Objectives: – Understand phenotypic behavior (pathogenicity, susceptibility to diseases). – Find functional information (identify genes, promoters, functional DNA motifs). – Establish the « gene repertoire » of a species. Discover new protein families.

  13. Two different approaches in genome comparison • Comparison of complete proteomes (gene level) • Comparison of complete genomes (nucleotide level)

  14. Analysis at the gene level • Based on the identification of homologous genes • Allows comparisons at various evolutionary time scales • Can be applied to some extent to unfinished genomes • Dependent on genome annotation and on the accuracy of the gene alignment procedure (usually BLAST)
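One common way to turn pairwise alignment hits into candidate homologous gene pairs is the reciprocal-best-hit heuristic. The slide only says that homologs are identified, usually with BLAST, so the heuristic itself, the `(query, subject, bit score)` tuple layout, and the gene identifiers below are assumptions made for illustration. A minimal sketch:

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical hit record: (query gene, subject gene, bit score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """For each query gene, keep the subject gene with the highest bit score."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


# Toy hits between the proteomes of two genomes A and B (made-up values).
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
          ("a2", "b2", 180.0), ("a3", "b1", 60.0)]
b_vs_a = [("b1", "a1", 248.0), ("b2", "a2", 175.0), ("b2", "a1", 88.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(sorted(pairs))
```

Gene `a3` is left unpaired here because its best hit `b1` prefers `a1`, which shows how the result depends on both annotation and alignment scores.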

  15. What is the amount of gene conservation among prokaryotes? (Koonin, E. V. et al., Nucl. Acids Res., 2008)

  16. A high level of horizontal gene transfer (HGT) in prokaryotes (Koonin, E. V. et al., Nucl. Acids Res., 2008)

  17. Ecosystem level: metagenomics • Sampling the genome sequences of a community of organisms inhabiting a common environment • Genomes of dominant species can be fully reconstructed • Most data are short reads (NGS) that can be related to genes.

  18. Analysis of phyla representation (Ley R. et al., Nature Reviews Microbiology, 2008)

  19. « Gene-centric » analysis • Regardless of species content (Hugenholtz and Tyson, Nature, 2008)

  20. Analysis at the species level • Core genome: genes shared by all the strains analysed. Carries basic functions and the species' phenotypic characteristics • Pan genome: core genome + « dispensable genome ». Reflects species diversity and functions related to niche adaptation
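The two definitions above map directly onto set operations: the core genome is the intersection of the strains' gene sets and the pan genome is their union. A minimal sketch, assuming each strain has already been reduced to a set of gene-family identifiers (the strain names and `fam…` identifiers are hypothetical):

```python
from typing import Dict, Set


def core_genome(strains: Dict[str, Set[str]]) -> Set[str]:
    """Gene families present in every strain analysed."""
    if not strains:
        return set()
    return set.intersection(*strains.values())


def pan_genome(strains: Dict[str, Set[str]]) -> Set[str]:
    """Gene families present in at least one strain."""
    return set().union(*strains.values())


# Hypothetical gene-family content of three strains.
strains = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam3", "fam5"},
    "strain_C": {"fam1", "fam2", "fam3"},
}

core = core_genome(strains)
pan = pan_genome(strains)
dispensable = pan - core  # pan genome = core genome + dispensable genome

print(len(core), len(pan), sorted(dispensable))
```

Because the core can only shrink and the pan genome can only grow as strains are added, both depend on which strains are sampled.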

  21. Streptococcus agalactiae (GBS) pan-genome (Tettelin H. et al., PNAS 2005;102:13950-13955)

  22. Pan/core genome of Escherichia coli (Touchon et al., PLoS Genetics, 2009)

  23. Conclusion: gene-based analysis • Very powerful for characterizing genomic diversity at different evolutionary scales. • Shows a surprising level of genetic diversity in prokaryotes, which is in large part due to the « mobilome » (mobile genetic elements) • Dependent on annotation and on the accuracy of the gene comparison method. • Does not take genome structure into account.

  24. Nucleotide-level analysis • Short evolutionary time scale • Takes chromosome organisation into account – Complete multi-genome alignment – Genome « mapping » (NGS)

  25. Complete genome alignment Brudno et al.

  26. Multiple whole-genome alignment 1. Identify local regions of similarity (matches) 2. Chaining of matches (rearrangements) 3. Alignment of gaps • Software: MGA, MAUVE, MAVID, MultiLAGAN… (Dewey and Pachter, Human Molecular Genetics, 2006)
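The chaining step can be illustrated with a simple quadratic dynamic program that selects the highest-scoring set of matches appearing in the same order in both genomes. This is a simplified sketch and not the algorithm of any of the tools named above: it handles two genomes only, forward-strand matches only, scores a chain by total match length, and simply leaves rearranged matches out. The `Match` type and the coordinates are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    start_a: int  # start position in genome A
    start_b: int  # start position in genome B
    length: int   # length of the exact match


def chain_matches(matches: List[Match]) -> List[Match]:
    """Return the collinear, non-overlapping chain with the largest total length."""
    ordered = sorted(matches)  # sorted by start_a, then start_b
    n = len(ordered)
    if n == 0:
        return []
    score = [m.length for m in ordered]  # best chain score ending at each match
    prev = [-1] * n                      # back-pointers for the traceback
    for j in range(n):
        for i in range(j):
            mi, mj = ordered[i], ordered[j]
            # mi may precede mj only if it ends before mj starts in both genomes.
            if (mi.start_a + mi.length <= mj.start_a
                    and mi.start_b + mi.length <= mj.start_b
                    and score[i] + mj.length > score[j]):
                score[j] = score[i] + mj.length
                prev[j] = i
    end = max(range(n), key=score.__getitem__)
    chain = []
    while end != -1:
        chain.append(ordered[end])
        end = prev[end]
    return chain[::-1]


# Toy matches; the third one is out of order in genome B (a rearrangement).
matches = [Match(0, 0, 100), Match(150, 160, 80),
           Match(300, 50, 40), Match(400, 420, 120)]
chain = chain_matches(matches)
print(chain)
```

The regions between consecutive chained matches are the gaps that step 3 then aligns.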

  27. Comparison of two E. coli strains

  28. Chromosome organisation

  29. Chromosome rearrangements in Yersinia pestis (Darling et al., PLoS Genetics, 2008)

  30. Backbone/variable segments [Figure: complete genome alignment partitioned into backbone and variable segments]

  31. Bacterial genome segmentation 1. Genome alignment (MGA, MAUVE) 2. Segmentation • http://genome.jouy.inra.fr/mosaic (Chiapello et al., BMC Bioinformatics, 2005; Chiapello et al., BMC Bioinformatics, 2008)
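Once an alignment is available, the segmentation idea can be sketched on a single genome: aligned intervals are labelled backbone, and everything in between is labelled variable. This is only an illustration under simplifying assumptions (half-open coordinates on one genome, with a hypothetical interval list). The procedure actually used by the cited tool is not described on the slide.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (label, start, end), half-open coordinates


def segment_genome(genome_length: int,
                   aligned: List[Tuple[int, int]]) -> List[Segment]:
    """Label aligned intervals as backbone and the rest as variable segments."""
    segments: List[Segment] = []
    position = 0  # first base not yet assigned to a segment
    for start, end in sorted(aligned):
        if start > position:
            segments.append(("variable", position, start))
        if end > position:
            # Clip the interval if it overlaps an earlier one.
            segments.append(("backbone", max(start, position), end))
            position = end
    if position < genome_length:
        segments.append(("variable", position, genome_length))
    return segments


# Hypothetical aligned intervals on a 1000-base genome.
segments = segment_genome(1000, [(450, 700), (0, 300), (820, 900)])
for label, start, end in segments:
    print(label, start, end)
```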

  32. Robustness of genome comparison • Simulations – Random perturbation of genome and segmentation – Robustness score (H. Devillers, S. Schbath, ANR Cocogen)

  33. Identification of functional motifs • Perform multiple complete genome alignment • Define backbone • Look for motifs that have a particular distribution on the backbone

  34. Identification of functional motifs • DNA repair: Chi (Halpern et al., PLoS Genetics, 2007) • Chromosome segregation: KOPS (Bigot et al., EMBO J., 2005; Val et al., PLoS Genetics, 2008) • Macrodomain organisation: MatS (Mercier et al., Cell, 2008) • Figure landmarks: ori, dif • These motifs are enriched on the backbone
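"Enriched on the backbone" can be made concrete by comparing motif density (occurrences per base) in backbone segments with that in variable segments. The sketch below uses the E. coli Chi sequence GCTGGTGG on a made-up toy sequence with hypothetical segment coordinates. It counts one strand only and performs no statistical test, so it illustrates the comparison rather than the method of the cited papers.

```python
from typing import List, Tuple

CHI = "GCTGGTGG"  # E. coli Chi motif


def count_motif(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, including overlapping ones."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def motif_density(sequence: str, segments: List[Tuple[int, int]],
                  motif: str) -> float:
    """Occurrences per base over a list of half-open (start, end) segments."""
    total_length = sum(end - start for start, end in segments)
    if total_length == 0:
        return 0.0
    hits = sum(count_motif(sequence[start:end], motif)
               for start, end in segments)
    return hits / total_length


# Toy 96-base sequence: two Chi sites in the first 56 bases, none afterwards.
sequence = "AT" * 10 + CHI + "AT" * 10 + CHI + "CG" * 20
backbone = [(0, 56)]   # hypothetical backbone segment
variable = [(56, 96)]  # hypothetical variable segment

backbone_density = motif_density(sequence, backbone, CHI)
variable_density = motif_density(sequence, variable, CHI)
print(backbone_density, variable_density)
```

On real genomes the comparison would also need to account for strand, base composition, and the number of occurrences expected by chance.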

  35. Characterisation of mutants in Bacillus subtilis (Srivatsan et al., PLoS Genetics, 2008; see also Medvedev et al., Nature Methods, 2009)

  36. Analysis at the nucleotide level • Very precise identification of variations (single-nucleotide mutations, indels, etc.) • Complete genome alignment is still an open question • New methods to compare unfinished genomes are developing fast.

  37. UNDERSTANDING GENOME EVOLUTION

  38. Analysis of E. coli genome evolution • 20 high-quality E. coli genomes • 1 complete genome of Escherichia fergusonii (outgroup)

  39. Phylogenetic tree reconstruction: E. coli (Touchon et al., PLoS Genetics, 2009)

  40. A high level of gene variation along the tree

  41. Conclusion • Comparative genomics has revealed an unexpected amount of variability among prokaryotic genomes. • It raises challenging questions about genome evolution, e.g. the bacterial species concept. • It paves the way for other types of comparisons – Comparisons of networks – Comparisons of transcriptomes (RNA-seq) and protein binding regions (ChIP-seq)

  42. Statistics Bioinformatics S. Schbath H. Chiapello C. Caron M-A Petit S. Robin A. Jacquemard D. Halpern MIG, INRA, Jouy MIG, INRA, Jouy OMIP, AgroParisTech H. Devillers F. Touzain MICALIS, INRA Jouy E. Rivals Algorithmics P. Lebourgeois R. Uricaru F. Cornet F.X. Barre Experimental F. Boccard LIRMM, CNRS, Montpellier biology CGM, CNRS, Gif sur Yvette O. Espeli LMGM, CNRS, Toulouse
