Proteins are social molecules
Structural Systems Biology: Modelling Protein Interactions and Complexes
Patrick Aloy
BWS – Feb '07

[Figure: yeast protein interaction network. Each node is labelled with gene name, ORF identifier and domain family; families shown include CKS, GrpE, HSP70, cyclin, actin, profilin, TBP, pkinase, Sec1, Syntaxin, PBD, RhoGAP, RasGAP, RasGEF, Ran_BP1, ras, PH and Rho_GDI.]

A great tool to study complexes (TAP / MS)

[Figure: mass spectrum, relative intensity (%) against m/z from 1000 to 3000.]

Gavin*, Aloy* et al., Nature (2006).
Genome-wide analysis of the yeast proteome

[Figure: tagging scheme. PCR product (URA3 from Kluyveromyces lactis, TAP) → homologous recombination at the chromosomal ORF → NH2-Protein-TAP-COOH fusion.]

• ORFs processed: 6,466
• 30% with clear human orthologues
• ORFs with positive homologous recombination: 5,474 (85%)
• Selection of strains expressing TAP-fusion proteins: 3,206 (59%)
• Successful TAP-purifications: 1,993 (62%)
• MALDI-TOF samples: 52,000
• Protein IDs: 36,000; 2,760 non-redundant

Extensive re-purification of complexes

• Reverse tagging is a means to validate new interactors
• Screen ran to saturation
• Reproducibility rate of 69% on 139 repeated purifications
• 64% of the known complexes were purified more than once

[Figure: mass spectrum, relative intensity (%) against m/z from 1000 to 3000.]

Capturing complex dynamics

Can we use our complete screen for complexes in yeast to extract general biological principles?

And just for the record: purifications are NOT complexes.
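The percentages on the pipeline slide appear to be stage-to-stage yields rather than fractions of the initial 6,466 ORFs. A minimal sketch to check that reading, using only the counts quoted above (the helper name `stage_yields` is made up for illustration):

```python
# Counts copied from the pipeline slide; the stage names are shortened.
stages = [
    ("ORFs processed", 6466),
    ("positive homologous recombination", 5474),
    ("strains expressing TAP-fusion proteins", 3206),
    ("successful TAP purifications", 1993),
]


def stage_yields(stages):
    """Return (name, count, percent of the previous stage) for every stage after the first."""
    return [
        (name, count, round(100 * count / prev_count))
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]


for name, count, pct in stage_yields(stages):
    print(f"{name}: {count:,} ({pct}% of previous stage)")
```

Under this reading the computed yields match the 85%, 59% and 62% shown on the slide.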
De novo definition of protein complexes

Socio-affinity index

[Figure: affinity purification data for tagged baits and their preys, read under the spoke model (bait-prey pairs) and the matrix model (prey-prey pairs); scores are shaded Low / Med / High.]

Pair   Evidence (S = spoke, M = matrix)
A-B    S   -   M   -
A-C    S   -   S   -
A-D    -   -   M   -
B-C    M   S   S   M
B-D    -   -   S   S
C-D    -   -   S   S

Cons: no direct interactions
Pros: information on biological re-use

\[
A(i,j) = S_{i,j|i=\mathrm{bait}} + S_{i,j|j=\mathrm{bait}} + M_{i,j}
\]

\[
S_{i,j|i=\mathrm{bait}} = \log\left( \frac{n_{i,j|i=\mathrm{bait}}}{f_i^{\mathrm{bait}}\, n_{\mathrm{bait}}\, f_j^{\mathrm{prey}}\, n^{\mathrm{prey}}_{i=\mathrm{bait}}} \right)
\]

\[
M_{i,j} = \log\left( \frac{n^{\mathrm{prey}}_{i,j}}{f_i^{\mathrm{prey}}\, f_j^{\mathrm{prey}} \sum_{\mathrm{all\ baits}} n_{\mathrm{prey}} (n_{\mathrm{prey}} - 1)/2} \right)
\]

Biophysical meaning of socio-affinities

Real affinity? Physical proximity?

[Figure: % of pairs in physical contact (PDB, Y2H) per socio-affinity score bin, from < 5 to > 15.]
[Figure: % of interactions against log(Kd), from -10 to -3, for All, AP and Y2H.]
[Figure: interaction affinity (domain, full-length) against socio-affinity interaction score, 0 to 20; P < 0.08.]

• APs cover a broad range of Kds
• Very good at removing "sticky" proteins (e.g. Vma2 present in 552 purifications but only good scores with Vma5, Vma10, Vma6 & Rav1)
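The socio-affinity index sums two spoke terms (each protein retrieving the other when tagged) and one matrix term (both proteins co-purifying as preys), each a log-ratio of observed to expected counts. The sketch below is one possible reading of it, not the published implementation. Assumptions: each purification is a `(bait, preys)` pair with the bait excluded from its own prey set, `f_bait` is the fraction of purifications using a protein as bait, `f_prey` is a protein's share of all prey observations, the logarithm is natural, and a term with no observations contributes 0. The function name `socio_affinity` and the toy data are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def socio_affinity(purifications):
    """Score every protein pair from a list of (bait, set_of_preys) purifications."""
    n_bait = len(purifications)
    bait_counts = Counter(bait for bait, _ in purifications)
    prey_counts = Counter(p for _, preys in purifications for p in preys)
    total_preys = sum(prey_counts.values())
    f_bait = {p: c / n_bait for p, c in bait_counts.items()}
    f_prey = {p: c / total_preys for p, c in prey_counts.items()}

    preys_per_bait = Counter()  # number of preys retrieved with each protein as bait
    spoke_obs = Counter()       # (bait, prey) -> times the bait retrieved the prey
    matrix_obs = Counter()      # (prey, prey) -> times both were preys of the same bait
    n_prey_pairs = 0            # sum over purifications of n_prey * (n_prey - 1) / 2
    for bait, preys in purifications:
        preys_per_bait[bait] += len(preys)
        n_prey_pairs += len(preys) * (len(preys) - 1) // 2
        for p in preys:
            spoke_obs[(bait, p)] += 1
        for a, b in combinations(sorted(preys), 2):
            matrix_obs[(a, b)] += 1

    def spoke(i, j):
        n = spoke_obs[(i, j)]
        if n == 0:
            return 0.0  # assumption: no observation, no contribution
        expected = f_bait[i] * n_bait * f_prey[j] * preys_per_bait[i]
        return math.log(n / expected)

    def matrix(i, j):
        n = matrix_obs[tuple(sorted((i, j)))]
        if n == 0:
            return 0.0  # assumption: no observation, no contribution
        return math.log(n / (f_prey[i] * f_prey[j] * n_prey_pairs))

    proteins = sorted(set(bait_counts) | set(prey_counts))
    return {(i, j): spoke(i, j) + spoke(j, i) + matrix(i, j)
            for i, j in combinations(proteins, 2)}


# Hypothetical toy screen: three purifications, four proteins.
toy = [("A", {"B", "C"}), ("B", {"A", "C"}), ("C", {"D"})]
scores = socio_affinity(toy)
```

In this toy screen A and B retrieve each other reciprocally, so their score is built from two spoke terms, while A and D are never seen together and score 0.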
De novo definition of protein complexes

• Socio-affinities capture the tendency of two proteins to be together under different conditions and thus can be used to define complexes
• It is known that proteins can belong to multiple complexes
• We need an iterative clustering procedure to disentangle the biological redundancy and versatility of protein complex composition

Clustering strategy

Score matrix → Dendrogram → Complexes

Score matrix, first iteration:

     A   B   C   D   E   F   G   H   I
A    -  10   9   6   5   0   0   0   0
B    -   -  11   6   5   0   0   0   0
C    -   -   -   6   5   0   0   0   0
D    -   -   -   -   0   0   0   0   0
E    -   -   -   -   -   0   0   0   0
F    -   -   -   -   -   -  10   6   4
G    -   -   -   -   -   -   -   4   6
H    -   -   -   -   -   -   -   -  10
I    -   -   -   -   -   -   -   -   -

Score matrix, next iteration:

     A   B   C   D   E   F   G   H   I
A    -   8   7   4   5   0   0   0   0
B    -   -   9   4   5   0   0   0   0
C    -   -   -   4   5   0   0   0   0
D    -   -   -   -   0   0   0   0   0
E    -   -   -   -   -   0   0   0   0
F    -   -   -   -   -   -   8   4   2
G    -   -   -   -   -   -   -   2   4
H    -   -   -   -   -   -   -   -  10
I    -   -   -   -   -   -   -   -   -

[Figure: each score matrix is converted into a dendrogram and cut into complexes; the annotations between the two iterations read "-2", "Iteration" and "Threshold 4".]

Exploring the parameters space

• We explored a sensible range of clustering parameters (number of iterations, penalty values, etc) and generated 1,784 potential sets of protein complexes with varying degrees of stringency
• We compared each set in terms of accuracy and coverage to a hand-curated set of protein complexes (Aloy et al., Science, 2004)
• The best set consisted of 491 complexes with a coverage of 83% and an accuracy of 78%
• Known complexes and/or functional variations were in sets with slightly poorer accuracy and coverage
• We picked all the sets with values of accuracy and coverage above 70% and clustered the similar complexes

Definitive set of protein complexes

• We ended up with 5,488 slightly different variations (isoforms) of 491 complexes
• The procedure increased the coverage to 90%
• We retrieved 61% of the 279 previously known complexes (MIPS + literature mining) and identified, on average, 80% of their components
• 257 out of the 491 complexes are entirely novel
• We found no novel components for only 20 of the 279 complexes in our gold-standard set
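The clustering slide suggests a loop: cluster the score matrix, read off complexes, penalise the scores, and cluster again so that alternative groupings can surface. The sketch below is a loose illustration of that idea and not the published procedure. Assumptions: average-linkage agglomeration stands in for the dendrogram step, merging stops once the best average score drops below `threshold`, and `penalty` is subtracted from every pair inside a reported cluster. The scores are the first toy matrix from the slide; `threshold=5` is chosen for illustration and is not taken from the slide.

```python
from itertools import combinations


def pair_score(scores, a, b):
    """Look up a symmetric score; missing pairs count as 0."""
    return scores.get((a, b), scores.get((b, a), 0.0))


def average_link(scores, c1, c2):
    """Mean pairwise score between two clusters."""
    return sum(pair_score(scores, a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster_once(proteins, scores, threshold):
    """Greedy average-linkage merging until no pair of clusters reaches the threshold."""
    clusters = [frozenset([p]) for p in proteins]
    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=lambda pair: average_link(scores, *pair))
        if average_link(scores, *best) < threshold:
            break
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return [c for c in clusters if len(c) > 1]


def iterative_clustering(proteins, scores, threshold, penalty, iterations):
    """Collect clusters over several rounds, penalising already-clustered pairs each round."""
    scores = dict(scores)  # work on a copy so the caller's matrix is untouched
    found = []
    for _ in range(iterations):
        for cluster in cluster_once(proteins, scores, threshold):
            if cluster not in found:
                found.append(cluster)
            for a, b in combinations(sorted(cluster), 2):
                key = (a, b) if (a, b) in scores else (b, a)
                scores[key] = pair_score(scores, a, b) - penalty
    return found


# Non-zero entries of the first toy score matrix on the slide.
toy_scores = {
    ("A", "B"): 10, ("A", "C"): 9, ("A", "D"): 6, ("A", "E"): 5,
    ("B", "C"): 11, ("B", "D"): 6, ("B", "E"): 5,
    ("C", "D"): 6, ("C", "E"): 5,
    ("F", "G"): 10, ("F", "H"): 6, ("F", "I"): 4,
    ("G", "H"): 4, ("G", "I"): 6, ("H", "I"): 10,
}
complexes = iterative_clustering(list("ABCDEFGHI"), toy_scores,
                                 threshold=5, penalty=2, iterations=2)
```

With these toy settings the first round groups A, B, C with D, and the second round, after the penalty, groups A, B, C with E instead. That is the kind of overlapping membership the iterative procedure is meant to expose.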
Modular organisation of protein complexes

• Core average size 3.1 [1-23]
• Module average size 2.9 [2-9]
• Modules associated on average to 3.3 cores

Evidence supporting the modular organisation

Modularity and cross-talk between functions & compartments

Functional requirements (RNA processing and degradation)

[Figure: cores against modules by functional category: Cell cycle, Cell fate, Cell transport, Defense, Energy, Environment, Metabolism, Prot. fate, Prot. synthesis, mRNA processing, Signaling, Transcription, Unknown.]
Rationalising phenotypes through complex architecture

• Protein networks may provide a molecular frame for the interpretation of "simple" genetic traits: essentiality (only ~20% in yeast)
• Recent phenotypic screens moved beyond essentiality in a single growth condition
• They aim at providing phenotypic profiles for each gene

[Figure: number of complexes (0 to 15) per similarity-score bin (0, ≤ 50, > 50), complex core against random.]

But where are the details?

Hierarchical, dynamical and modular organisation of protein complexes

• 491 complexes (257 novel) with over 5000 isoforms
• 147 functional (??) modules

[Figure: Ras sub-network with ira2 (YOL081W, RasGAP), sdc25 (YLL016W, RasGEF), ras2 (YNL098C, ras), ras1 (YOR101W, ras) and cdc25 (YLR310C, RasGEF).]

Gavin*, Aloy*, et al. (2006) Nature
Bravo & Aloy (2006) Curr Opin Struct Biol
Can we use 3D structures to understand the interaction space?

1. Interface
2. Specificity

[Figure: RhoGAP and ras.]

Aloy et al. (2003) J Mol Biol

Do homologous proteins interact in the same way?

[Figure: interacting pair A-B compared with homologous pairs A'-B' and A''-B''.]

Chothia & Lesk, EMBO J. 1986

iRMSD vs PID

[Figure: iRMSD (Å, low / medium / high) against % sequence identity, with 80th and 90th percentile curves. Labelled examples: SH2 – SH3 (lck, abl), Asp transcarbamylase, Thr deaminase, Ferredoxin-like.]

Aloy et al. (2003) J Mol Biol
http://www.russell.embl.de/simint
Aloy et al. (2005) Curr Opin Struct Biol