EvoluCode: an original view of Human Systems Evolution. Benjamin LINARD. Thesis director: Julie Thompson. Team leader: O. Poch.
Outline
1) General context
2) EvoluCodes: evolutionary barcodes
3) Evolutionary knowledge extraction in human networks
Conclusion & perspectives
1. Context: Species Evolution & Technologies
A historical perspective ...
- Middle Ages: Islamic philosophers
- Lamarck: transmutation of species
- 1859: Darwin, Theory of Evolution
- Mendelian laws, heredity; notions of gene & mutation (pea/Drosophila crossings)
- Population genetics: statistics (game theory)
- 1953: Watson & Crick, DNA structure
- 1977: Sanger sequencing
- 1983: Mullis, PCR
- Reign of molecular biology
- 2000s: NGS, « omics » techniques; systems biology
Gene variation, phenotypic variation: linking both?
1. Context: Species Evolution & Technologies
Image source: http://skepticwonder.fieldofscience.com/
1. Context: Evolutionary Systems biology
1 gene = proteomics, transcriptomics, interactomics, expression data ...
Sept 2008: 1 terabyte!
1 day = 1 sequenced genome
Number of complete genomes in Archaea, Bacteria and Eukarya (figure values: 182; 586; 3490). GOLD Database, March 2012.
Problems of dispersed data and of visualisation: need for summarisation
1. Context: Evolutionary Systems biology
Analysis of large-scale biological parameters
- Phenomic parameters: expression level, network centrality, dispensability ...
- Evolutionary parameters: sequence evolution rate, gene loss, genetic events ...
Observed general trends (Koonin EV, Wolf Y, 2006), between:
CAI: codon adaptation index
EL: expression level
ER: evolutionary rate
GI: genetic interactions
KE: knockout lethal effect
NP: number of paralogs
PA: protein abundance
PGL: propensity for gene loss
PPI: protein-protein interactions
How to trace back to the gene?
2. EvoluCodes: EvoluCodes and systems biology
How to study large-scale biological parameters on a per-gene basis?
- Representing gene variation (i.e. history)
- Summarising multi-scale data
EvoluCodes: Evolutionary Barcodes (Linard et al., Evo Bioinfo 2012)
1 gene, 1 evolutionary history, 1 barcode
2. EvoluCodes: EvoluCodes in vertebrates
Multi-level evolutionary data in vertebrates: 1 EvoluCode for each human protein-coding gene
1. Multiple alignment related parameters (Thompson JD et al., PLoS ONE 2011)
   Human genes compared to 17 vertebrate species: ~20,000 human proteins (1 per coding gene), 500,000 aligned sequences with annotation processes
   Figure labels: residue, domain(s), protein (query, orthologs, paralogs), clades, core blocks
2. EvoluCodes: EvoluCodes in vertebrates
Multi-level evolutionary data in vertebrates
1. Multiple alignment related parameters (Thompson JD et al., PLoS ONE 2011)
2. Ortholog/inparalog relationships (Linard B. et al., BMC Bioinfo 2011)
   OrthoInspector software & database: >11,000,000 orthologous relations (lbgi.igbmc.fr/orthoinspector)
2. EvoluCodes: EvoluCodes in vertebrates
Multi-level evolutionary data in vertebrates
1. Multiple alignment related parameters (Thompson JD et al., PLoS ONE 2011)
2. Ortholog/inparalog relationships (Linard B. et al., BMC Bioinfo 2011); Google “orthoinspector”
3. Synteny data for vertebrates (Prosdocimi F., Linard B. et al., BMC Genomics 2012); image generated with CoGe:GEvo
~20,000 human EvoluCodes, ~280,000 inter-species genome mappings
2. EvoluCodes: EvoluCode examples
Human genes are the reference genes; each barcode covers n vertebrate species x N parameters.
Example: Glucagon Receptor (GLR_HUMAN)
Legend, based on the parameter distribution in a given species: typical value; higher atypical value; lower atypical value
1 barcode = 1 evolutionary scenario
2. EvoluCodes: Several EvoluCode profiles
Example genes:
- Developmental pluripotency-associated protein 3 (DPPA3_HUMAN)
- HERV-K_1q22 provirus ancestral Pol protein (POK12_HUMAN)
- Pogo transposable element with ZNF domain (POGZ_HUMAN)
Evolutionary scenarios read from the barcodes:
- Mammalian innovation, new domain composition
- Recent innovation in primates and rodents
- Variable distribution in vertebrates, viral DNA integration
- Strongly conserved in all mammals since this genetic event
- Fast-evolving gene
Legend: generally observed value; low parameter value; high parameter value; absent in the species
2. EvoluCodes: 1D-EvoluCodes
The barcode (n vertebrate species x N parameters) is reduced to a vector of N values (1 value per parameter): the mean normalized by phylum composition (Amphibia, Reptilia, Teleostei, Mammalia).
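The reduction to a 1D-EvoluCode can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "normalized by phylum composition" means averaging within each phylum first and then across phyla, so that a phylum with many sequenced species does not dominate. The function name, the toy species and the values are hypothetical.

```python
from statistics import mean


def one_d_evolucode(barcode, phylum_of):
    """Collapse a species x parameters barcode into one value per parameter.

    barcode:   dict species -> list of N parameter values (None = gene absent)
    phylum_of: dict species -> phylum name
    Assumption: each phylum contributes equally, whatever its species count.
    """
    n_params = len(next(iter(barcode.values())))
    vector = []
    for j in range(n_params):
        by_phylum = {}
        for species, values in barcode.items():
            if values[j] is not None:
                by_phylum.setdefault(phylum_of[species], []).append(values[j])
        # Mean within each phylum, then mean of the phylum means.
        phylum_means = [mean(v) for v in by_phylum.values()]
        vector.append(mean(phylum_means) if phylum_means else None)
    return vector


# Hypothetical toy barcode: 4 species, 2 parameters.
toy_phyla = {"mouse": "Mammalia", "dog": "Mammalia",
             "frog": "Amphibia", "zebrafish": "Teleostei"}
toy_barcode = {"mouse": [1.0, 0.5], "dog": [3.0, None],
               "frog": [4.0, 1.5], "zebrafish": [6.0, None]}
toy_vector = one_d_evolucode(toy_barcode, toy_phyla)
```

With this weighting, the two mammals together count as much as the single amphibian or the single teleost.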
2. EvoluCodes: Large scale analysis
Human type 1 keratin cluster (Michael Hesse et al., 2003)
[Figure: EvoluCodes along the type 1 keratin cluster region of chromosome 17 (positions 38,811,872; 39,155,446; 39,502,371; 39,780,882). Gene groups: cytokeratins, hair keratins, inner root sheath keratins, keratin-associated proteins (mammal-specific cluster). Parameter groups: sequence, conservation, hydrophobicity, clade.]
2. EvoluCodes: Large scale analysis
Clustering the EvoluCodes
- non-parametric technique, super-paramagnetic clustering
- improved Potts clustering model (Murua et al., 2008)
303 EvoluCode clusters; 1 cluster = similar evolutionary scenario
Functional enrichment analysis (parameter groups shown in the figure: sequence, conservation, hydrophobicity, clade):

genes  GO accession  GO term                                              10log(p)     FDR
55     GO:0022904    respiratory electron transport chain                 -10.894378   0
       GO:0006796    phosphate metabolic process                          -5.162176    0.003
130    GO:0007608    sensory perception of smell                          -69.573133   0
       GO:0007606    sensory perception of chemical stimulus              -66.771345   0
       GO:0007186    G-protein coupled receptor protein signaling pathway -55.368505   0
88     GO:0042742    defense response to bacterium                        -10.156822   0
       GO:0009607    response to biotic stimulus                          -5.232461    0
       GO:0006950    response to stress                                   -4.145167    0.018
129    GO:0030029    actin filament-based process                         -8.190798    0
       GO:0007265    Ras protein signal transduction                      -3.375746    0.015
       GO:0014065    phosphoinositide 3-kinase cascade                    -2.923239    0.031
25     GO:0006414    translational elongation                             -14.67022    0
       GO:0042273    ribosomal large subunit biogenesis                   -5.260087    0
       GO:0016072    rRNA metabolic process                               -4.21555     0
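A functional enrichment analysis of one cluster can be sketched as below. The slide does not say which test or correction was used; this sketch assumes a one-sided hypergeometric test followed by a Benjamini-Hochberg FDR adjustment, which is a common choice. Function names and the counts in the usage lines are hypothetical.

```python
from math import comb, log10


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a cluster of
    n genes, when K of the N background genes carry the GO term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: a 25-gene cluster, 3 GO terms, 20,000 background genes.
# Each tuple is (annotated genes in cluster, annotated genes in background).
toy_terms = [(10, 200), (3, 500), (1, 1000)]
toy_p = [hypergeom_pvalue(k, 25, K, 20000) for k, K in toy_terms]
toy_fdr = benjamini_hochberg(toy_p)
toy_log10_p = [log10(p) for p in toy_p]
```

The exact integer arithmetic in `hypergeom_pvalue` avoids underflow for very small p-values, at the cost of speed on large backgrounds.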
3. Evolutionary knowledge extraction: Reaching system level
Mapping EvoluCodes with biological networks
Gene variation, phenotypic variation: linking both?
Human proteome EvoluCodes + gene network = « Evolutionary map »
• evolutionary context for a network
• allows knowledge discovery approach
Local outlier factor LOF_k(A): « outlier » evolutionary history = outlierness based on multi-scale parameters!
[Figure: schematic representation of vertebrate gene evolutionary histories mapped onto the ER stress network, extracted from KEGG map hsa04141. Compartments: endoplasmic reticulum (ER), cytosol, nucleus. Nodes: PERK, ATF6, eIF2α, NRF2, S1P, S2P, WFS1, GADD34, p50, ATF4, ATF6, CHOP, AARE, ERSE, Bcl2. Outcomes: apoptosis, recovery.]
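The local outlier factor named on this slide follows a standard definition, sketched below in plain Python. This is a generic illustration rather than the code used in the thesis: it assumes each gene of a network is represented by a numeric vector (for instance a 1D-EvoluCode), uses Euclidean distance, takes exactly k neighbours without special handling of ties, and assumes no two points are identical.

```python
from math import dist


def local_outlier_factors(points, k):
    """Return LOF_k for every point; values near 1 mean 'as dense as the
    neighbours', values well above 1 mean 'outlier'."""
    n = len(points)
    neighbours, k_distance = [], []
    for i in range(n):
        ranked = sorted((dist(points[i], points[j]), j)
                        for j in range(n) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
        k_distance.append(ranked[k - 1][0])  # distance to the k-th neighbour

    def local_reachability_density(i):
        # reach-dist_k(i, j) = max(k-distance(j), d(i, j))
        reach = [max(k_distance[j], dist(points[i], points[j]))
                 for j in neighbours[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF = mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i])
            for i in range(n)]


# Hypothetical toy data: four genes with similar histories, one atypical gene.
toy_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
toy_lof = local_outlier_factors(toy_points, k=2)
```

In this toy set the last point sits far from the others, so its score is much larger than 1, while the four clustered points score 1.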
3. Evolutionary knowledge extraction: Pathway-level knowledge discovery
Evolutionary history and network topology
Analysis of 40 human metabolic pathways (KEGG Pathway database, www.genome.jp/kegg/). Total number of pathway reactions: 875.
Topological classes of reactions:
A. Redundancy: multiple genes for the same reaction step
B. Redundancy: alternative path, 2-n paths for 2-n reactions
C. Pathway interface: start/end point of a pathway (link to another pathway)
D. Connectivity: single substrate/product
E. Connectivity: multiple substrates and/or products
F. Other topology, mainly linear paths
3. Evolutionary knowledge extraction: Pathway-level knowledge discovery
[Figure: distribution of outliers across the 6 topological classes (A, B: redundancy; C: pathway interface; D, E: connectivity; F: other topology), shown as a pie chart and as a bar chart of the topological classification of outliers normalised by total number of reactions.]
The evolutionary history of metabolic genes is related to pathway topology!
3. Evolutionary knowledge extraction: Cellular-level knowledge extraction
Global analysis for all human pathways
Example pathways: gastric acid secretion (hsa04971), vascular smooth muscle contraction (hsa04270), salivary secretion (hsa04970), pancreatic secretion (hsa04972), bile secretion (hsa04976)
Widely distributed genes: SLC9A1 (hsa:6548), CFTR (hsa:1080), CA2 (hsa:760), CHRM3 (hsa:1131), GNAS (hsa:277 )
[Figure: inter-pathway graph with nodes A, B, C, D; genes are labelled 3x or 2x. Each graph node represents the differential evolutionary behaviour of a pathway.]
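One way to build such an inter-pathway graph is sketched below. It is only an assumption about the construction: pathways are nodes, two pathways are linked when they share at least one gene, and a gene counts as widely distributed when it belongs to several pathways. The pathway names `P1`..`P3` and gene names `g1`..`g4` are placeholders, not real KEGG memberships.

```python
from itertools import combinations


def inter_pathway_graph(pathway_genes):
    """Map each pair of pathways sharing genes to the set of shared genes."""
    edges = {}
    for a, b in combinations(sorted(pathway_genes), 2):
        shared = pathway_genes[a] & pathway_genes[b]
        if shared:
            edges[(a, b)] = shared
    return edges


def widely_distributed(pathway_genes, min_pathways=2):
    """Genes present in at least min_pathways pathways, with their counts."""
    counts = {}
    for genes in pathway_genes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {g: c for g, c in counts.items() if c >= min_pathways}


# Placeholder memberships, for illustration only.
toy_pathways = {
    "P1": {"g1", "g2"},
    "P2": {"g2", "g3"},
    "P3": {"g2", "g3", "g4"},
}
toy_edges = inter_pathway_graph(toy_pathways)
toy_shared = widely_distributed(toy_pathways)
```

Per-pathway outlier information, such as the fraction of genes with a high local outlier factor, could then be attached to each node as an attribute.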