Some biological questions in bacterial comparative genomics Meriem El Karoui INRA, Jouy-en-Josas Meriem_elkaroui@hms.harvard.edu MICALIS 9 December, 2008 January 15th 2010 AERES
Comparative genomics: definition • The study of relationships between complete genomes of different species/individuals. • Objectives: – Describe and understand genomic diversity • Identify conserved and variable regions • Understand species/individual specificity • Identify functional parts, genes, promoters ... – Understand genome evolution MICALIS Marseille, January 2010
Key dates • 1995: complete genome of Haemophilus influenzae and Mycoplasma genitalium • 1999: first comparison at the strain level (Helicobacter pylori) • 2003: first metagenome • 2004: first complete human genome • 2007: second human genome
A very large amount of data [Figure: Complete Genomes (by Domain), counts per year from 1996 to 2008, series labelled A, B, E. GOLD database, September 2009]
Data • Complete genomes • Not-so-complete genomes • Metagenomes • Different sequence quality standards are emerging (draft, high-quality draft, improved high-quality draft, annotation-directed improvement, noncontiguous finished, finished; Chain et al., Science, 2009)
Next generation sequencing (Metzker, Nature Reviews Genetics, 2010)
Platform             DNA amplification/template   Read length   Gb/run
Roche 454            Yes/MP                       350           0.45
Illumina             Yes/MP                       75            18
SOLiD                Yes/MP                       35            30
Heliscope            No/MP                        32            37
Pacific Bioscience   No                           965           N/A
Evolutionary scale • Different individuals in the same species • Closely related species • Divergent species
Vertebrate genomes Margulies and Birney, Nat. Rev. Genetics, 2008
Bacterial genomes Wu, Hugenholtz et al., Nature, 2009
Databases • GenBank • EMBL • DDBJ • Specialized databases (NAR database issue, January 2010)
ANALYSIS OF GENETIC DIVERSITY
Define conserved and variable regions • Objectives: – Understand phenotypic behavior (pathogenicity, susceptibility to diseases). – Find functional information (identify genes, promoters, functional DNA motifs). – Establish the « gene repertoire » of a species. Discover new protein families.
Two different approaches in genome comparison • Comparison of complete proteomes (gene level) • Comparison of complete genomes (nucleotide level)
Analysis at the gene level • Based on the identification of homologous genes • Allows comparisons at various evolutionary time scales • Can be applied to some extent to unfinished genomes • Dependent on genome annotation and on the accuracy of the gene alignment procedure (usually BLAST)
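One common way to turn pairwise similarity scores into candidate homologous gene pairs is the reciprocal best hit criterion. The sketch below is illustrative only: the slides do not say which criterion was used, the gene names and scores are made up, and it assumes a single symmetric score per gene pair (for example a BLAST bit score that has already been parsed into a dictionary).

```python
from typing import Dict, List, Tuple

def reciprocal_best_hits(scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Return gene pairs (a, b) where b is a's best hit and a is b's best hit.

    `scores` maps (gene_in_genome_A, gene_in_genome_B) to a similarity score.
    """
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}
    for (a, b), s in scores.items():
        # Keep the highest-scoring partner seen so far for each gene.
        if a not in best_for_a or s > best_for_a[a][1]:
            best_for_a[a] = (b, s)
        if b not in best_for_b or s > best_for_b[b][1]:
            best_for_b[b] = (a, s)
    # A pair is retained only if the best-hit relation holds in both directions.
    return sorted((a, b) for a, (b, _) in best_for_a.items()
                  if best_for_b[b][0] == a)

# Hypothetical scores, for illustration only.
example_scores = {
    ("a1", "b1"): 200.0, ("a1", "b2"): 50.0,
    ("a2", "b1"): 180.0, ("a2", "b2"): 40.0,
    ("a3", "b3"): 90.0,
}
rbh_pairs = reciprocal_best_hits(example_scores)
print(rbh_pairs)
```

Here a2 is dropped because its best hit, b1, is better matched by a1. This shows how the result depends on the accuracy of the underlying gene comparison.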
What is the amount of gene conservation among prokaryotes? Koonin, E. V. et al., Nucl. Acids Res., 2008
A high level of HGT in prokaryotes Koonin, E. V. et al., Nucl. Acids Res., 2008
Ecosystem level: metagenomics • Sampling the genome sequences of a community of organisms inhabiting a common environment • Genomes of dominant species can be fully reconstructed • Most data are short reads (NGS) that can be related to genes
Analysis of phyla representation Ley R. et al., Nature Reviews Microbiology, 2008
« Gene centric » analysis Regardless of species content Hugenholtz and Tyson, Nature, 2008
Analysis at the species level • Core genome: genes shared by all the strains analysed. Basic functions and phenotypic characteristics of the species • Pan genome: core genome + « dispensable genome ». Species diversity and functions related to niche adaptation
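The two definitions above reduce to set operations once each strain is described by the set of gene families it carries. The following is a minimal sketch under that assumption. The strain names and gene-family labels are hypothetical, and the step that groups genes into families (homology detection) is taken as already done.

```python
from typing import Dict, Set, Tuple

def core_and_pan(strains: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (core, pan, dispensable) gene-family sets for a group of strains."""
    if not strains:
        return set(), set(), set()
    gene_sets = list(strains.values())
    core = set.intersection(*gene_sets)   # families present in every strain
    pan = set.union(*gene_sets)           # families present in at least one strain
    return core, pan, pan - core          # dispensable = pan minus core

# Hypothetical strains and gene-family labels, for illustration only.
example = {
    "strain_A": {"dnaA", "recA", "gyrB", "toxin1"},
    "strain_B": {"dnaA", "recA", "gyrB", "phage_int"},
    "strain_C": {"dnaA", "recA", "gyrB", "toxin1", "abc_transporter"},
}
core, pan, dispensable = core_and_pan(example)
print(sorted(core), len(pan), sorted(dispensable))
```

Adding a strain can only shrink the core set and grow the pan set, which is why both estimates depend on how many strains have been sequenced.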
Streptococcus agalactiae pan genome GBS pan-genome Tettelin H et al. PNAS 2005;102:13950-13955
Pan/core genome of Escherichia coli Touchon et al., PLoS Genetics, 2009
Conclusion: gene-based analysis • Very powerful for characterizing genomic diversity at different evolutionary scales. • Shows a surprising level of genetic diversity in prokaryotes, which is in large part due to the « mobilome » (mobile genetic elements). • Dependent on annotation and on the accuracy of the gene comparison method. • Does not take genome structure into account.
Nucleotide level analysis • Short evolutionary time scale • Takes into account chromosome organisation – complete multi genome alignment – Genome « mapping » (NGS)
Complete genome alignment Brudno et al.
Multiple whole-genome alignment 1. Identify local regions of similarity (matches) 2. Chaining of matches (rearrangements) 3. Alignment of gaps Software: MGA, MAUVE, MAVID, MultiLAGAN… Dewey and Pachter, Human Molecular Genetics, 2006
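The first two steps can be illustrated on a pair of sequences with a toy sketch. It assumes that matches are exact k-mers occurring once in each sequence and that chaining means keeping the longest set of matches that is colinear in both sequences. The tools listed above use more elaborate anchors, scoring and rearrangement handling, and step 3 (aligning the gaps between chained matches) is left out. Function names and sequences are illustrative.

```python
from typing import Dict, List, Tuple

Match = Tuple[int, int]  # (position in sequence a, position in sequence b)

def kmer_matches(a: str, b: str, k: int) -> List[Match]:
    """Step 1: shared k-mers that occur exactly once in each sequence."""
    def unique_positions(s: str) -> Dict[str, int]:
        pos: Dict[str, int] = {}
        repeated = set()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if kmer in pos:
                repeated.add(kmer)
            pos[kmer] = i
        return {m: p for m, p in pos.items() if m not in repeated}
    ua, ub = unique_positions(a), unique_positions(b)
    return sorted((ua[m], ub[m]) for m in ua.keys() & ub.keys())

def longest_colinear_chain(matches: List[Match]) -> List[Match]:
    """Step 2: longest chain increasing in both coordinates (O(n^2) dynamic programming)."""
    matches = sorted(matches)
    n = len(matches)
    if n == 0:
        return []
    best = [1] * n    # best[x]: length of the longest chain ending at match x
    prev = [-1] * n   # back-pointer used to rebuild that chain
    for x in range(n):
        for y in range(x):
            if (matches[y][0] < matches[x][0] and matches[y][1] < matches[x][1]
                    and best[y] + 1 > best[x]):
                best[x] = best[y] + 1
                prev[x] = y
    end = max(range(n), key=best.__getitem__)
    chain: List[Match] = []
    while end != -1:
        chain.append(matches[end])
        end = prev[end]
    return chain[::-1]

# Toy sequences: the block "ACGT" sits at the start of a and at the end of b.
toy_matches = kmer_matches("ACGTTGCA", "TTGCACGT", 4)
toy_chain = longest_colinear_chain(toy_matches)
print(toy_matches, toy_chain)
```

Matches left out of the chain, such as the displaced "ACGT" block here, are the candidates for rearrangements.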
Comparison of two E. coli strains
Chromosome organisation
Chromosome rearrangements in Yersinia pestis Darling et al., PLoS Genetics, 2008
Backbone/variable segments [Figure: complete genome alignment, with backbone and variable segments labelled]
Bacterial genome segmentation 1. Genome alignment (MGA, MAUVE) 2. Segmentation http://genome.jouy.inra.fr/mosaic Chiapello et al., BMC Bioinformatics, 2005; Chiapello et al., BMC Bioinformatics, 2008
Robustness of genome comparison Simulations – Random perturbation of genome and segmentation – Robustness Score H. Devillers, S. Schbath, ANR Cocogen
Identification of functional motifs • Perform multiple complete genome alignment • Define backbone • Look for motifs that have a particular distribution on the backbone
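The last step can be sketched as a density comparison, assuming the backbone has already been obtained as a list of non-overlapping half-open intervals on one genome. This is only an illustration: it scans one strand, the enrichment ratio is a simple hypothetical statistic rather than the one used in the cited studies, and the toy genome is made up (only the Chi octamer GCTGGTGG is a real E. coli motif).

```python
from typing import List, Tuple

def find_motif(genome: str, motif: str) -> List[int]:
    """Start positions of all (possibly overlapping) motif occurrences."""
    positions = []
    start = genome.find(motif)
    while start != -1:
        positions.append(start)
        start = genome.find(motif, start + 1)
    return positions

def backbone_enrichment(genome: str, motif: str,
                        backbone: List[Tuple[int, int]]) -> float:
    """Motif density on the backbone divided by its density on the whole genome.

    `backbone` holds half-open (start, end) intervals. A value above 1 means
    the motif is more frequent on the backbone than genome-wide.
    """
    hits = find_motif(genome, motif)
    backbone_len = sum(end - start for start, end in backbone)
    if not hits or backbone_len == 0:
        return 0.0
    # Count occurrences lying entirely inside one backbone interval.
    in_backbone = sum(
        1 for p in hits
        if any(start <= p and p + len(motif) <= end for start, end in backbone)
    )
    return (in_backbone / backbone_len) / (len(hits) / len(genome))

# Toy genome: two Chi occurrences, both inside the assumed backbone intervals.
chi = "GCTGGTGG"
toy_genome = "A" * 10 + chi + "A" * 10 + "T" * 20 + chi + "A" * 4
toy_backbone = [(0, 28), (48, 60)]
ratio = backbone_enrichment(toy_genome, chi, toy_backbone)
print(find_motif(toy_genome, chi), round(ratio, 2))
```

A real analysis would also need to scan the reverse strand and assess whether the observed ratio is statistically significant.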
Identification of functional motifs • DNA repair: Chi (Halpern et al., PLoS Genetics, 2007) • Chromosome segregation: KOPS (Bigot et al., EMBO J., 2005; Val et al., PLoS Genetics, 2008) • Macrodomain organisation: MatS (Mercier et al., Cell, 2008) • These motifs are enriched on the backbone [Figure labels: ori, dif]
Characterisation of mutants in Bacillus subtilis Srivatsan et al., PLoS Genetics, 2008; see also Medvedev et al., Nature Methods, 2009
Analysis at the nucleotide level • Very precise identification of variations (single-nucleotide mutations, indels, etc.) • Complete genome alignment is still an open question • New methods to compare unfinished genomes are developing fast.
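Once two sequences are aligned, listing single-nucleotide differences and indels is a column-by-column scan. The sketch below assumes a pairwise alignment given as two equal-length strings with "-" for gaps. The sequences and the tuple layout of the output are hypothetical, and this is not the procedure of any specific tool cited in these slides.

```python
from typing import List, Tuple

Variant = Tuple[int, str, str, str]  # (reference position, kind, ref base, query base)

def list_variants(ref_aln: str, qry_aln: str) -> List[Variant]:
    """List substitutions, insertions and deletions from a gapped pairwise alignment.

    Positions are 0-based coordinates on the ungapped reference. An insertion
    is reported at the position of the reference base that follows it.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    variants: List[Variant] = []
    ref_pos = 0  # coordinate of the next reference base
    for r, q in zip(ref_aln, qry_aln):
        if r == "-" and q == "-":
            continue  # column with gaps only carries no information
        if r == "-":
            variants.append((ref_pos, "insertion", "-", q))
        elif q == "-":
            variants.append((ref_pos, "deletion", r, "-"))
        elif r != q:
            variants.append((ref_pos, "substitution", r, q))
        if r != "-":
            ref_pos += 1
    return variants

# Hypothetical aligned pair: one substitution, one insertion, one deletion.
toy_variants = list_variants("ACGT-ACGT", "ACCTTAC-T")
print(toy_variants)
```

Each gap column is reported separately here. Merging adjacent gap columns into a single indel event would be a natural refinement.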
UNDERSTANDING GENOME EVOLUTION
Analysis of E. coli genome evolution • 20 high-quality E. coli genomes • 1 complete genome of Escherichia fergusonii (outgroup)
Phylogenetic tree reconstruction: E. coli Touchon et al., PLoS Genetics, 2009
A high level of gene variation along the tree
Conclusion • Comparative genomics has revealed an unexpected amount of variability among prokaryotic genomes. • It raises challenging questions about genome evolution, e.g. the bacterial species concept. • It paves the way for other types of comparisons – Comparisons of networks – Comparisons of transcriptomes (RNA-seq) and protein binding regions (ChIP-seq)
Statistics Bioinformatics S. Schbath H. Chiapello C. Caron M-A Petit S. Robin A. Jacquemard D. Halpern MIG, INRA, Jouy MIG, INRA, Jouy OMIP, AgroParisTech H. Devillers F. Touzain MICALIS, INRA Jouy E. Rivals Algorithmics P. Lebourgeois R. Uricaru F. Cornet F.X. Barre Experimental F. Boccard LIRMM, CNRS, Montpellier biology CGM, CNRS, Gif sur Yvette O. Espeli LMGM, CNRS, Toulouse