
Interprotein coevolution: bridging scales from residues to genomes

Interprotein coevolution: bridging scales from residues to genomes. Martin Weigt, Laboratoire de Biologie Computationnelle et Quantitative, Université Pierre & Marie Curie Paris, Inria Paris


  1. Interprotein coevolution: bridging scales from residues to genomes. Martin Weigt, Laboratoire de Biologie Computationnelle et Quantitative, Université Pierre & Marie Curie Paris, Inria Paris. 16 Nov 2017

  2. The different scales in protein-protein interaction. Who with whom? Protein-protein interaction networks.

  3. The different scales in protein-protein interaction. How? Protein-protein interfaces and inter-protein residue contacts.

  4. The different scales in protein-protein interaction. Evolution? Conservation and innovation of protein-protein interactions.

  5. Protein sequence data are accumulating… [Figure: growth of the UniProt database, 2004–2016; vertical axis: millions of sequence entries (logarithmic, 0.1 to 100); curves: UniProtKB/TrEMBL (without manual annotation) and UniProtKB/SwissProt (with manual annotation)]

  6. …and are classified into homologous protein families. Homologous proteins:
     • frequently 10^3–10^6 proteins per family
     • common evolutionary ancestry
     • conserved 3D structure and biological function
     • diverged amino-acid sequences (~20-30% sequence identity)
     ‣ sequence variability contains information about structure and function
     • >5000 families without example structures

  7. Statistical physics. From models, via data, to thermodynamic observables:
     Model: $P(S) \sim e^{-\beta H(S)}$, with $H(S) = -\sum_{i<j} J_{ij} S_i S_j - \sum_i h_i S_i$
     Sample from model: $\{S^\mu\}_{\mu=1,\dots,M}$
     Observables: $\langle O_a(S) \rangle_P \simeq \frac{1}{M} \sum_\mu O_a(S^\mu)$, e.g. $\langle S_i \rangle_P$, $\langle S_i S_j \rangle_P$
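To make the direction "model, then samples, then observables" concrete, here is a minimal Python sketch for a tiny Ising-type model. It computes the averages of S_i and S_i S_j exactly by enumeration and estimates them again from Gibbs samples. The function names, the choice of a Gibbs sampler, and all parameter values are assumptions made for illustration; none of them come from the talk.

```python
import itertools
import numpy as np

def energy(S, J, h):
    """H(S) = -sum_{i<j} J_ij S_i S_j - sum_i h_i S_i (J symmetric, zero diagonal)."""
    return -0.5 * S @ J @ S - h @ S

def exact_observables(J, h, beta=1.0):
    """Exact <S_i>_P and <S_i S_j>_P by enumerating all 2^N states (small N only)."""
    N = len(h)
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
    weights = np.exp(-beta * np.array([energy(S, J, h) for S in states]))
    P = weights / weights.sum()
    mean = P @ states                          # <S_i>_P
    corr = states.T @ (states * P[:, None])    # <S_i S_j>_P
    return mean, corr

def gibbs_sample(J, h, M, beta=1.0, sweeps=10, burn_in=200, seed=0):
    """Draw M configurations, keeping one every `sweeps` sweeps after burn-in."""
    rng = np.random.default_rng(seed)
    N = len(h)
    S = rng.choice([-1.0, 1.0], size=N)
    samples = []
    for t in range(burn_in + M * sweeps):
        for i in range(N):
            field = J[i] @ S + h[i]            # local field (J has zero diagonal)
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
            S[i] = 1.0 if rng.random() < p_up else -1.0
        if t >= burn_in and (t - burn_in) % sweeps == 0:
            samples.append(S.copy())
    return np.array(samples)

# Hypothetical toy parameters: three spins, uniform coupling, small fields.
J = 0.3 * (np.ones((3, 3)) - np.eye(3))
h = np.array([0.5, -0.3, 0.0])

mean_exact, corr_exact = exact_observables(J, h)
samples = gibbs_sample(J, h, M=2000)
mean_est = samples.mean(axis=0)                # (1/M) sum_mu S_i^mu
corr_est = samples.T @ samples / len(samples)  # (1/M) sum_mu S_i^mu S_j^mu
```

With enough samples, `mean_est` and `corr_est` should approach `mean_exact` and `corr_exact`, which is the content of the approximate equality on the slide.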

  8. Inverse statistical physics. From data, via observables, to models:
     Model: $P(S) \sim e^{-\beta H(S)}$, with $H(S) = -\sum_{i<j} J_{ij} S_i S_j - \sum_i h_i S_i$
     Data: $\{S^\mu\}_{\mu=1,\dots,M}$
     Observables: $\langle O_a(S) \rangle_P \simeq \frac{1}{M} \sum_\mu O_a(S^\mu)$, e.g. $\langle S_i \rangle_P$, $\langle S_i S_j \rangle_P$

  9. Inverse statistical physics. How to construct $P(S) \sim e^{-\beta H(S)}$ from data?
     • coherence with data: $\langle O_a(S) \rangle_P = \frac{1}{M} \sum_\mu O_a(S^\mu)$
     • maximum entropy principle (least constrained model): $-\sum_S P(S) \log P(S) \to \max$
     ➡ analytical form of model: $H(S) = -\sum_a \lambda_a O_a(S)$
     The selection of observables requires a priori biological knowledge.
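The step from the two requirements (coherence with data, maximum entropy) to the exponential form of the model can be written out with Lagrange multipliers. The derivation below is the standard textbook argument, added here as a sketch; the symbols $\mathcal{L}$, $\nu$ and $Z$ are introduced for this purpose only, and the inverse temperature $\beta$ is assumed to be absorbed into the multipliers $\lambda_a$.

```latex
\begin{align}
\mathcal{L}[P] &= -\sum_S P(S)\log P(S)
  + \sum_a \lambda_a \Big( \sum_S P(S)\, O_a(S) - \frac{1}{M}\sum_\mu O_a(S^\mu) \Big)
  + \nu \Big( \sum_S P(S) - 1 \Big), \\
0 = \frac{\partial \mathcal{L}}{\partial P(S)} &= -\log P(S) - 1 + \sum_a \lambda_a O_a(S) + \nu, \\
\Rightarrow\quad P(S) &= \frac{1}{Z} \exp\Big( \sum_a \lambda_a O_a(S) \Big),
  \qquad Z = \sum_S \exp\Big( \sum_a \lambda_a O_a(S) \Big).
\end{align}
```

Choosing the observables $S_i$ and $S_i S_j$ gives multipliers that play the role of the fields $h_i$ and couplings $J_{ij}$, so the result is the pairwise model with $H(S) = -\sum_{i<j} J_{ij} S_i S_j - \sum_i h_i S_i$.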

  10. Conservation and coevolution in proteins.
      [Figure: example multiple-sequence alignment produced by evolution, with columns labelled as variable residue, conserved residue and active site, and a pair of coevolving residues that form a contact]
        R I D H R L K H N D T
        F L N G R L R H D D T
        H E R Q E T G H E K L
        K Y R T R L T H D D L
        R R A M E V G H N K A
        T Q K E E L A H N K G
      Statistical modeling:
      • Profile model: $P(a_1, \dots, a_L) \sim \exp\left\{ \sum_i h_i(a_i) \right\}$
      • Direct Coupling Analysis (DCA): $P(a_1, \dots, a_L) \sim \exp\left\{ \sum_{i<j} J_{ij}(a_i, a_j) + \sum_i h_i(a_i) \right\}$ [Weigt et al, PNAS ’09] [Morcos et al, PNAS ’11]
      ➡ strong couplings -> residue contacts
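One common way to estimate the couplings J_ij of such a pairwise model is a mean-field approximation, in which the couplings are read off from the inverse of the covariance matrix of the alignment. The Python sketch below follows that idea in a heavily simplified form: it omits sequence reweighting, scores each column pair by the Frobenius norm of its coupling block rather than by direct information, and applies no further corrections. The function name, the pseudocount value and the synthetic alignment are assumptions for illustration, not the implementation behind the cited papers.

```python
import numpy as np

def mean_field_dca_scores(msa, q, pseudocount=0.5):
    """msa: (M, L) integer array with entries in 0..q-1. Returns an (L, L) score matrix."""
    M, L = msa.shape
    # One-hot encode, dropping the last state (a gauge choice) so that C is invertible.
    X = (msa[:, :, None] == np.arange(q - 1)).astype(float).reshape(M, L * (q - 1))
    # Single-site and pair frequencies, mixed with a uniform pseudocount.
    f_i = (1 - pseudocount) * X.mean(axis=0) + pseudocount / q
    f_ij = (1 - pseudocount) * (X.T @ X) / M + pseudocount / q**2
    for i in range(L):
        block = slice(i * (q - 1), (i + 1) * (q - 1))
        f_ij[block, block] = np.diag(f_i[block])   # same-site block is diagonal
    C = f_ij - np.outer(f_i, f_i)                  # connected correlations
    J = -np.linalg.inv(C)                          # mean-field couplings
    J = J.reshape(L, q - 1, L, q - 1)
    scores = np.sqrt((J ** 2).sum(axis=(1, 3)))    # Frobenius norm per column pair
    np.fill_diagonal(scores, 0.0)
    return scores

# Synthetic toy alignment (hypothetical): column 1 is a deterministic function of
# column 0, all other columns are independent and uniform.
rng = np.random.default_rng(0)
q = 4
msa = rng.integers(0, q, size=(500, 5))
msa[:, 1] = (msa[:, 0] + 1) % q

scores = mean_field_dca_scores(msa, q)
i, j = np.unravel_index(np.argmax(np.triu(scores, 1)), scores.shape)
top_pair = (int(i), int(j))
```

In this toy case the strongest coupling should fall on the artificially coupled columns 0 and 1, which mirrors the slide's statement that strong couplings point to residue contacts.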

  11. Interactions between protein families. [Figure: two multiple-sequence alignments in FASTA format shown side by side with a question mark between them. Family 1 entries include >F7XUK6_MIDMI/129-211, >B3SEY6_TRIAD/119-201, >RS3_ORITB/122-204, >RS3_RICPR/123-205, …; Family 2 entries include >RS14_NEOSM/47-100, >I0AI30_IGNAJ/35-88, >I6YSF0_MELRP/36-88, >RS14_CHLT3/35-88, …]

  12. Interactions between protein families. What can we learn from the empirical sequence variability?
      • do the families interact?
      • which specific proteins interact?
      • which residues are in contact?
      ➡ relation between protein structure/function and evolution

  13. Prediction of inter-protein residue contacts. A joint MSA (multiple-sequence alignment) of two protein families (protein 1, protein 2) is analysed with DCA. Strong inter-protein couplings predict contacts [Ovchinnikov et al., eLife ’14]. Example: histidine kinase and response regulator [Weigt et al., PNAS ’09].
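Once a coupling score is available for every pair of columns in the joint alignment, the inter-protein predictions are simply the strongest entries of the off-diagonal block linking the two proteins. The Python sketch below shows that selection step only. It assumes a precomputed symmetric score matrix whose first L1 columns belong to protein 1; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def top_interprotein_pairs(scores, L1, k=10):
    """Return the k strongest (i, j, score) triples with i in protein 1 and
    j in protein 2; j is counted from the first column of protein 2."""
    inter = scores[:L1, L1:]                          # inter-protein block only
    order = np.argsort(inter, axis=None)[::-1][:k]    # flat indices, strongest first
    rows, cols = np.unravel_index(order, inter.shape)
    return [(int(i), int(j), float(inter[i, j])) for i, j in zip(rows, cols)]

# Toy score matrix: protein 1 has 3 columns, protein 2 has 2 columns.
scores = np.zeros((5, 5))
scores[0, 1] = scores[1, 0] = 0.9   # intra-protein pair, ignored by the selection
scores[1, 3] = scores[3, 1] = 0.7   # inter-protein pair
scores[2, 4] = scores[4, 2] = 0.4   # inter-protein pair

top = top_interprotein_pairs(scores, L1=3, k=2)
```

The intra-protein pair has the largest score overall but is excluded, because only the block between the two proteins is ranked.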

  14. In silico prediction of high-resolution structures of transient protein complexes. [Figure: paired SK/RR sequence alignments; DCA identifies residue contacts; protein monomer structures plus these contacts enter guided molecular dynamics simulations] Spo0B/0F: co-crystal [Zapf et al. (2000)] vs. our model [Schug, MW, Onuchic, Hwa, Szurmant, PNAS ’09].

  15. Interactions between protein families. What can we learn from the empirical sequence variability?
      • do the families interact?
      • which specific proteins interact?
      • which residues are in contact?
      ➡ relation between protein structure/function and evolution

  16. Specific interactions and paralog matching. [Figure: protein family 1 and protein family 2, with unknown pairing between their members]
      General idea:
      • correct matching shows inter-protein covariation
      • random matching has no inter-protein covariation
      ➡ maximise inter-protein covariation computationally
      • reaches 80-90% accuracy in test cases
      • simultaneous prediction of interacting paralogs and inter-protein contacts
      [Gueudré, Baldassi, Zamparo, MW, Pagnani, PNAS ’16] [Bitbol, Dwyer, Colwell, Wingreen, PNAS ’16]
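The matching idea can be illustrated with a toy Python sketch. It is not the algorithm of the cited papers: it assumes that a set of already paired reference sequences is available, scores a candidate pair by summed log-odds of the amino-acid co-occurrences seen in that reference set, and then picks, within one species, the permutation of paralogs with the highest total score by brute force. All names and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def pair_score_table(ref_a, ref_b, q, pseudocount=1.0):
    """Log-odds s[i, j, a, b] = log f_ij(a, b) / (f_i(a) f_j(b)) from paired references."""
    La, Lb = ref_a.shape[1], ref_b.shape[1]
    ia, jb = np.arange(La)[:, None], np.arange(Lb)[None, :]
    counts = np.full((La, Lb, q, q), pseudocount / q**2)
    for sa, sb in zip(ref_a, ref_b):
        counts[ia, jb, sa[:, None], sb[None, :]] += 1.0
    f_ab = counts / counts.sum(axis=(2, 3), keepdims=True)
    f_a = f_ab.sum(axis=3, keepdims=True)
    f_b = f_ab.sum(axis=2, keepdims=True)
    return np.log(f_ab / (f_a * f_b))

def pair_score(seq_a, seq_b, table):
    """Inter-protein covariation score of one candidate pair."""
    La, Lb = table.shape[:2]
    ia, jb = np.arange(La)[:, None], np.arange(Lb)[None, :]
    return table[ia, jb, seq_a[:, None], seq_b[None, :]].sum()

def best_matching(paralogs_a, paralogs_b, table):
    """Permutation p maximising sum_i score(a_i, b_p[i]); brute force, small n only."""
    n = len(paralogs_a)
    S = np.array([[pair_score(a, b, table) for b in paralogs_b] for a in paralogs_a])
    best = max(itertools.permutations(range(n)),
               key=lambda p: sum(S[i, p[i]] for i in range(n)))
    return list(best)

# Synthetic reference pairs: each residue of protein B is a fixed function of
# the residue at the same position in protein A.
rng = np.random.default_rng(0)
q, L = 4, 6
ref_a = rng.integers(0, q, size=(200, L))
ref_b = (ref_a + 1) % q
table = pair_score_table(ref_a, ref_b, q)

# One species with three paralogs per family; true_perm[i] is the partner of paralog i.
paralogs_a = np.array([[0] * L, [1] * L, [2] * L])
true_perm = [2, 0, 1]
paralogs_b = np.empty_like(paralogs_a)
for i, p in enumerate(true_perm):
    paralogs_b[p] = (paralogs_a[i] + 1) % q

matching = best_matching(paralogs_a, paralogs_b, table)
```

Brute-force enumeration is only feasible for a handful of paralogs per species; larger cases need an assignment solver or a heuristic search.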

  17. Interactions between protein families. What can we learn from the empirical sequence variability?
      • do the families interact?
      • which specific proteins interact?
      • which residues are in contact?
      ➡ relation between protein structure/function and evolution

  18. Inference of protein-protein interaction networks: bacterial ribosomal proteins.
      Small ribosomal subunit:
      • 20 proteins
      • 21 interactions (11% of 190 pairs)
      Large ribosomal subunit:
      • 29 proteins
      • 29 interactions (7% of 406 pairs)
      ‣ sparse interaction network
      [Feinauer, Szurmant, MW, Pagnani, PLoS ONE ’16]
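The pair counts and percentages on this slide follow from counting unordered protein pairs, n(n-1)/2. A short Python check, with a hypothetical helper name:

```python
from math import comb

def interaction_density(n_proteins, n_interactions):
    """Number of unordered protein pairs and the fraction of them that interact."""
    n_pairs = comb(n_proteins, 2)          # n * (n - 1) / 2
    return n_pairs, n_interactions / n_pairs

small_pairs, small_frac = interaction_density(20, 21)   # small ribosomal subunit
large_pairs, large_frac = interaction_density(29, 29)   # large ribosomal subunit
```

This gives 190 and 406 pairs, with interacting fractions of about 11% and 7%, in line with the sparse network described on the slide.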
