Protein structure and evolution
GT MASIM, 16 November 2017
Mathilde Carpentier, associate professor (maître de conférences), UPMC
Atelier de BioInformatique
Institut de Systématique, Evolution, Biodiversité
MNHN – CNRS – UPMC – EPHE
Atelier de BioInformatique (ABI), ISYEB (MNHN), since October 2015
Permanent members: G. Achaz, S. Brouillet, C. Bertrand, M. Carpentier, S. Pasek, J. Pothier, M. Boccara
Associated members: B. Billoud, E. Duchaud, G. Sapriel, H. Soldano, I. Lafontaine, P. Brezellec
Atelier de BioInformatique (ABI), ISYEB (MNHN), since October 2015
Research themes: speciation; population dynamics; evolution models; phylogeny; molecular evolution; topology and folding; sequence alignment; molecular modelling; congruence anomalies; similarity graphs; classification; genomics; metagenomics; structure of proteins and RNA, and morphogenesis; motif extraction; data mining
Introduction
Protein structure comparison
- Structural database scanning: YAKUSA (1, 2), www.rpsb.jussieu.fr/Yakusa/
- Multiple structural alignment:
  - Gok (KMR + alpha angles)
  - "m-diagonals" methods
  - Gibbs sampler method
  - Relational motifs (triads) (3, 4)

1 M. Carpentier, S. Brouillet, J. Pothier. YAKUSA: a fast structural database scanning method. Proteins: Structure, Function, and Bioinformatics, 61(1):137-151, 2005.
2 C. Alland, F. Moreews, D. Boens, M. Carpentier, S. Chiusa, M. Lonquety, N. Renault, Y. Wong, H. Cantalloube, J. Chomilier, J. Hochez, J. Pothier, B.O. Villoutreix, J.-F. Zagury, P. Tuffery. RPBS: a web resource for structural bioinformatics. Nucleic Acids Research, 33:W44-W49, 2005.
3 N. Pisanti, H. Soldano, M. Carpentier, J. Pothier. A relational extension of the notion of motifs: application to the common 3D protein substructures searching problem. Journal of Computational Biology, 2009.
  N. Pisanti, H. Soldano, M. Carpentier. Incremental inference of relational motifs with a degenerate alphabet. Lecture Notes in Computer Science, 2005.
4 N. Pisanti, H. Soldano, M. Carpentier, J. Pothier. Implicit and explicit representation of approximated motifs. KCL series book, edited by C. Iliopoulos, K. Park and K. Steinhöfel, 2005.
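Gok combines KMR with alpha angles to describe a protein backbone. A minimal sketch, assuming the usual definition of the alpha angle as the dihedral angle formed by four consecutive Cα atoms; the function names and the plain-tuple coordinate format are illustrative assumptions, not code taken from Gok or YAKUSA.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed dihedral angle in degrees (between -180 and 180) around the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 coincide")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central axis b1.
    k0 = _dot(b0, b1)
    k2 = _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    # atan2 of the two in-plane components gives the signed rotation from v to w.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def alpha_angles(ca: Sequence[Point]) -> List[float]:
    """One alpha angle per window of four consecutive C-alpha atoms (n - 3 values)."""
    return [dihedral(ca[i], ca[i + 1], ca[i + 2], ca[i + 3])
            for i in range(len(ca) - 3)]
```

In practice the Cα coordinates would come from the ATOM records of a PDB file; parsing is left out to keep the sketch short.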
Introduction
Comparison of sequence and structure alignment methods
• Do structure alignment methods detect homology?
• Are they better than sequence alignment methods?
• Is structure really more conserved than sequence?
Methods
Comparison of sequence and structure alignment methods
Are structural alignments really better than sequence alignments?
Data
Reference dataset = manually curated protein multiple alignments with resolved structures:
- BAliBASE 2 (1): 29 alignments
- BAliBASE 3 (2): 38 alignments
- Sisyphus (4): 94 alignments with a "core" alignment
  (161 alignments in total for BAliBASE 2, BAliBASE 3 and Sisyphus)
- Homstrad (3): 365 alignments. Problems: alignment by CE, no manual curation, core = SSE (secondary structure elements)
1 Thompson et al. 1999; 2 Thompson et al. 2005; 3 Mizuguchi et al. 1998; 4 Andreeva et al. 2007
Methods
Comparison of sequence and structure alignment methods
Data
[Figure: distribution of the % identity of the core alignments for BB2, BB3 and Sisyphus; x axis: %Id (0 to 100), y axis: number of alignments]
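The % identity of a core alignment can be defined in several ways, and the slides do not say which one is used. A minimal sketch, assuming that pairwise identity counts identical residues over the columns where neither sequence has a gap, that a multiple alignment is summarised by the mean over all sequence pairs, and that the histogram uses 20-point bins as suggested by the axis ticks; all three choices are assumptions.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    same = sum(1 for x, y in aligned if x.upper() == y.upper())
    return 100.0 * same / len(aligned)


def mean_identity(rows: Sequence[str]) -> float:
    """Mean pairwise percent identity of a multiple alignment (at least two rows)."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def identity_histogram(values: Sequence[float], width: float = 20.0) -> List[int]:
    """Count alignments per %Id bin; an identity of exactly 100 falls into the last bin."""
    nbins = int(100 // width)
    counts = [0] * nbins
    for v in values:
        counts[min(int(v // width), nbins - 1)] += 1
    return counts
```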
Methods
Comparison of sequence and structure alignment methods
Scores (1)
- Sum of pairs (SP): proportion of residue pairs of the reference alignment that are correctly aligned
- Total column (TC): proportion of columns of the reference alignment that are correctly aligned
1 Thompson et al. 1999
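SP and TC compare a test alignment with a reference alignment of the same sequences. A minimal sketch, assuming gaps are written '-', residues are identified by (sequence index, position in the ungapped sequence), a reference column counts for TC when some test column holds exactly the same residues, and columns with fewer than two residues are skipped. The optional `scored_columns` argument is a hypothetical parameter that restricts scoring to a subset of reference columns, such as a core alignment.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional, Sequence, Set, Tuple

Residue = Tuple[int, int]  # (sequence index, position in the ungapped sequence)


def _ungapped(alignment: Sequence[str]) -> List[str]:
    return [row.replace("-", "") for row in alignment]


def column_residues(alignment: Sequence[str]) -> List[FrozenSet[Residue]]:
    """For each column, the set of residues it contains; gaps '-' are skipped."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    position = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        cell = set()
        for s, row in enumerate(alignment):
            if row[col] != "-":
                cell.add((s, position[s]))
                position[s] += 1
        columns.append(frozenset(cell))
    return columns


def residue_pairs(column: FrozenSet[Residue]) -> Set[Tuple[Residue, Residue]]:
    """All residue pairs aligned in one column, in a canonical (sorted) order."""
    return set(combinations(sorted(column), 2))


def sp_tc(reference: Sequence[str], test: Sequence[str],
          scored_columns: Optional[Iterable[int]] = None) -> Tuple[float, float]:
    """Return (SP, TC) of `test` measured against `reference`."""
    if _ungapped(reference) != _ungapped(test):
        raise ValueError("both alignments must contain the same sequences")
    ref_cols = column_residues(reference)
    test_cols = column_residues(test)
    test_pairs: Set[Tuple[Residue, Residue]] = set()
    for column in test_cols:
        test_pairs |= residue_pairs(column)
    test_col_set = set(test_cols)
    if scored_columns is None:
        scored_columns = range(len(ref_cols))
    pairs_total = pairs_ok = cols_total = cols_ok = 0
    for i in scored_columns:
        pairs = residue_pairs(ref_cols[i])
        if not pairs:  # assumption: single-residue columns carry no pair information
            continue
        pairs_total += len(pairs)
        pairs_ok += len(pairs & test_pairs)
        cols_total += 1
        if ref_cols[i] in test_col_set:
            cols_ok += 1
    sp = pairs_ok / pairs_total if pairs_total else 0.0
    tc = cols_ok / cols_total if cols_total else 0.0
    return sp, tc
```

Because residues are tracked by position rather than by letter, two identical amino acids placed in the wrong column are still counted as misaligned.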
Methods
Comparison of sequence and structure alignment methods
Sequence alignment methods
- DIALIGN: Morgenstern et al. 1998
- CLUSTALW: Thompson et al. 1994
- TCOFFEE: Notredame et al. 2000
- MAFFT: Katoh et al. 2002
- MUSCLE: Edgar 2004
- PRANK: Löytynoja et al. 2005
- PROBCONS: Do et al. 2005
- CLUSTALO: Sievers et al. 2011
Methods
Structural alignment methods
Structure+Sequence alignment methods
- CBA: J. Ebert, 2006
- SSAP: C. Orengo & W. Taylor, 1989
- STACCATO: Shatsky et al., 2006
- STAMP: R. Russell and G. Barton, 1992
- STRAP: C. Gille, 2006
- multal: Taylor, Flores and Orengo, 1994
- UCSF Chimera: E. Meng et al., 2006
- CURVE: D. Zhi, 2006
- ProFit: A.C.R. Martin, 1996
- CAALIGN: T.J. Oldfield, 2007
- CE/CE-MC: I. Shindyalov, 2000
- CLEMAPS: W-M. Zheng, 2007
- Matras: K. Nishikawa, 2000
- 3DCOFFEE: Notredame et al., 2007
- PrISM: B. Honig, 2000
- PyMOL: W. L. DeLano, 2007
- MASS: O. Dror and H. Wolfson, 2003
- SALIGN: M.S. Madhusudhan et al., 2007
- MolCom: S.D. O'Hearn, 2003
- SSM: E. Krissinel, 2003
- Vorolign: Birzele, Gewehr, Csaba, 2007
- MALECON: S. Wodak, 2004
- BLOMAPS: W-M. Zheng & S. Wang, 2008
- MultiProt: M. Shatsky and H. Wolfson, 2004
- Matt/Formatt: M. Menke, 2008
- SWAPSC: Mario A. Fares, 2004
- mistral: Micheletti and Orland, 2009
- C-BOP: E. Sandelin, 2005
- SMOLIGN: H. Sun et al., 2010
- MAMMOTH-mult: D. Lupyan, 2005
- EpitopeMatch: S. Jakuschev, 2011
- MUSTANG: A.S. Konagurthu et al., 2005
- 3DCOMB: S. Wang and J. Xu, 2012
- msTALI: P. Shealy & H. Valafar, 2012
- POSA: Y. Ye and A. Godzik, 2005
- mulPBA: A.P. Joseph et al., 2012
- TetraDA: J. Roach, 2005
- Fit3D: F. Kaiser et al., 2015
Results
SP (sum of pairs)
[Figure: boxplots of SP scores (0 to 1) for BB2, BB3 and Sisyphus, one per program]
Number of alignments per program (shown in red on the plot): MUSTANG 128, MAMMOTH 105, 3DCOMB 97, TCOFFEE_TM 141, CE 9, MAFFT_ginsi 142, FORMATT 132, TCOFFEE_SEQ 142, SALIGN 134, CLUSTALO 160, STAMP 70, PRANK 159, MULTIPROT 122, MUSCLE 142, CLUSTALW 158, DIALIGN 159, STACCATO 63
Results
TC (total columns)
[Figure: boxplots of TC scores (0 to 1) for BB2, BB3 and Sisyphus, one per program]
Number of alignments per program (shown in red on the plot): MAMMOTH 105, 3DCOMB 97, MUSTANG 128, CE 9, FORMATT 132, MULTIPROT 122, TCOFFEE_TM 141, MAFFT_ginsi 142, CLUSTALO 160, STAMP 70, TCOFFEE_SEQ 142, SALIGN 134, PRANK 159, MUSCLE 142, CLUSTALW 158, DIALIGN 159, STACCATO 63
Results
Median SP for each program for BB2, BB3 and Sisyphus
[Figure: one panel per program (3DCOMB, MUSTANG, FORMATT, MULTIPROT, SALIGN, CLUSTALO, CLUSTALW, DIALIGN, MAFFT_ginsi, MUSCLE, PRANK, TCOFFEE_SEQ, CE, MAMMOTH, STACCATO, STAMP, TCOFFEE_TM); x axis: %Id (0 to 100), y axis: median SP (0 to 1)]
Results
Median TC for each program for BB2, BB3 and Sisyphus
[Figure: one panel per program (3DCOMB, MUSTANG, FORMATT, MULTIPROT, SALIGN, CLUSTALO, CLUSTALW, DIALIGN, MAFFT_ginsi, MUSCLE, PRANK, TCOFFEE_SEQ, CE, MAMMOTH, STACCATO, STAMP, TCOFFEE_TM); x axis: %Id (0 to 100), y axis: median TC (0 to 1)]
Results
SP, residues in helices
[Figure: boxplots of SP scores (0 to 1) on residues in helices, for BB2, BB3 and Sisyphus]
Number of alignments per program (shown in red on the plot): CE 3, 3DCOMB 55, MAMMOTH 63, MUSTANG 69, PRANK 91, MAFFT_ginsi 85, TCOFFEE_TM 85, SALIGN 75, MULTIPROT 66, CLUSTALO 91, STAMP 44, CLUSTALW 91, FORMATT 73, TCOFFEE_SEQ 85, MUSCLE 85, DIALIGN 91, STACCATO 25
Results
SP, residues in strands
[Figure: boxplots of SP scores (0 to 1) on residues in strands, for BB2, BB3 and Sisyphus]
Number of alignments per program (shown in red on the plot): CE 4, MAMMOTH 70, MUSTANG 80, 3DCOMB 60, TCOFFEE_TM 93, MAFFT_ginsi 94, CLUSTALO 104, PRANK 104, STAMP 47, FORMATT 82, SALIGN 84, MULTIPROT 73, CLUSTALW 104, MUSCLE 94, DIALIGN 104, STACCATO 26
Results
SP, other residues
[Figure: boxplots of SP scores (0 to 1) on residues outside helices and strands, for BB2, BB3 and Sisyphus]
Number of alignments per program (shown in red on the plot): CE 4, 3DCOMB 60, MAMMOTH 68, MUSTANG 78, MAFFT_ginsi 95, CLUSTALO 102, TCOFFEE_TM 95, PRANK 102, TCOFFEE_SEQ 95, CLUSTALW 102, STAMP 47, SALIGN 83, MULTIPROT 73, FORMATT 80, MUSCLE 95, DIALIGN 102, STACCATO 26
Results
SP, buried residues
[Figure: boxplots of SP scores (0 to 1) on buried residues, for BB2, BB3 and Sisyphus]
Number of alignments per program (shown in red on the plot): CE 4, 3DCOMB 59, MUSTANG 78, MAMMOTH 72, MULTIPROT 73, FORMATT 79, TCOFFEE_TM 92, MAFFT_ginsi 93, CLUSTALO 103, SALIGN 83, PRANK 103, STAMP 47, TCOFFEE_SEQ 93, CLUSTALW 103, MUSCLE 93, DIALIGN 103, STACCATO 26
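The per-category results restrict SP to a subset of residues (helices, strands, other, buried). The slides do not say how a residue pair is assigned to a category; this sketch assumes a reference pair is scored in a category only when both residues carry that label, with labels given as one hypothetical character per ungapped residue (for example 'H', 'E', 'C' for secondary structure, or a burial class).

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

Residue = Tuple[int, int]  # (sequence index, position in the ungapped sequence)


def aligned_pairs(alignment: Sequence[str]) -> Set[Tuple[Residue, Residue]]:
    """All residue pairs that share a column; gaps '-' are skipped."""
    position = [0] * len(alignment)
    pairs: Set[Tuple[Residue, Residue]] = set()
    for col in range(len(alignment[0])):
        cell = []
        for s, row in enumerate(alignment):  # rows visited in order, so pairs stay canonical
            if row[col] != "-":
                cell.append((s, position[s]))
                position[s] += 1
        pairs.update(combinations(cell, 2))
    return pairs


def sp_by_category(reference: Sequence[str], test: Sequence[str],
                   labels: Sequence[str]) -> Dict[str, float]:
    """SP restricted to reference pairs whose two residues share a category label.

    labels[s][i] is the (hypothetical) category of residue i of sequence s.
    """
    found = aligned_pairs(test)
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for pair in aligned_pairs(reference):
        (s1, i1), (s2, i2) = pair
        category = labels[s1][i1]
        if labels[s2][i2] != category:
            continue  # assumption: mixed-category pairs are not scored
        totals[category] = totals.get(category, 0) + 1
        if pair in found:
            correct[category] = correct.get(category, 0) + 1
    return {c: correct.get(c, 0) / n for c, n in totals.items()}
```

Calling the function once with secondary-structure labels and once with burial labels would cover the helix, strand, other and buried variants.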