
Identification of direct residue contacts in protein-protein interactions from multi-species sequence data



  1. Identification of direct residue contacts in protein-protein interactions from multi-species sequence data
     Martin Weigt, Institute for Scientific Interchange, Torino
     joint work with R.A. White, H. Szurmant, J.A. Hoch, T. Hwa
     [MW et al., PNAS 106, 67 (2009)]

  2. Outline
     • Motivation: coevolution / sequence correlation of interacting proteins
     • Local inference: mutual information
     • Global inference: disentangling direct from indirect coupling
     • Prediction of protein complex structures
     • Outlook

  3. Protein-protein interactions
     [Figure: Mutation]

  4. Protein-protein interactions
     [Figure: Repair / Conservation / Compensatory mutation]
     ‣ compensatory mutations give rise to inter-protein correlations
     ➡ Use the sequence variability of homologous proteins across genomes!

  5. Two-component signal transduction
     [Figure: Signal → membrane-bound Histidine Kinase (SK; ATPase domain, His H, ATP) → phosphotransfer (P) → Response Regulator (RR; Asp D) → Output]
     • most common signaling system in bacteria
     • conservation: most SKs and RRs are described by the same two HMMs
     • amplification: ~O(10) interacting pairs per genome
     • specificity of interaction: cross-talk between TCSs is under negative selection
     • genomic location: interacting pairs are frequently found in the same operon
     How do these proteins interact?

  6. Two-component signal transduction
     • only one known co-crystal structure (Zapf et al., Structure 2000)
       ‣ allows the results of the sequence analysis to be checked
     [Figure: Spo0F / Spo0B co-crystal]

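  7. Data
     • ca. 600 bacterial genomes
     • scanned with the Pfam HMMs HisKA and RR
       ➡ global alignment: $N_{SK} = 87$, $N_{RR} = 117$ positions
       ➡ $M \sim 7000$ SK-RR pairs from the same operon
     [Figure: concatenated SK | RR alignment, rows = SK-RR pairs from species 1, species 2, ...; column i in SK, column j in RR, with frequency counts $f_i(A_i)$, $f_j(A_j)$, $f_{ij}(A_i, A_j)$]
     ➡ do correlations in the frequency counts indicate a contact pair in the dimer?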
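The frequency counts on this slide can be made concrete with a short Python sketch. It assumes the SK and RR sequences are already aligned to their HMMs and paired by operon, and represents each alignment row as a plain string; the helper names (`concatenate_pairs`, `single_site_frequencies`, `pair_frequencies`) and the toy sequences are hypothetical, and no pseudocounts or sequence weights are applied.

```python
from collections import Counter


def concatenate_pairs(sk_aligned, rr_aligned):
    """Join each aligned SK sequence with its operon partner RR sequence.

    Column i < N_SK is an SK position; column N_SK + j is RR position j.
    """
    if len(sk_aligned) != len(rr_aligned):
        raise ValueError("need exactly one RR partner per SK sequence")
    return [sk + rr for sk, rr in zip(sk_aligned, rr_aligned)]


def single_site_frequencies(msa):
    """f_i(A): fraction of the M rows that carry symbol A in column i."""
    m = len(msa)
    return [{a: n / m for a, n in Counter(row[i] for row in msa).items()}
            for i in range(len(msa[0]))]


def pair_frequencies(msa, i, j):
    """f_ij(A_i, A_j): fraction of the M rows with symbol pair (A_i, A_j) in columns i, j."""
    m = len(msa)
    counts = Counter((row[i], row[j]) for row in msa)
    return {pair: n / m for pair, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy data: 2-column SK fragments, 1-column RR fragments
    msa = concatenate_pairs(["AC", "AD", "GC", "GD"], ["K", "K", "R", "R"])
    print(single_site_frequencies(msa)[0])  # {'A': 0.5, 'G': 0.5}
    print(pair_frequencies(msa, 0, 2))      # {('A', 'K'): 0.5, ('G', 'R'): 0.5}
```

In the toy example the SK column 0 and the RR column 2 co-vary perfectly, which is exactly the kind of signal the slide asks about.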

  8. Mutual information as a covariance measure
     $$\mathrm{MI}_{ij} = \sum_{A_i, A_j} f_{ij}(A_i, A_j) \log \frac{f_{ij}(A_i, A_j)}{f_i(A_i)\, f_j(A_j)} - \mathrm{MI}^{(0)}_{ij}$$
     [Figure: (A) MI vs. minimum separation of atoms (Å), compared with the randomized baseline MI_rand; (B) specificity vs. sensitivity of MI-based contact prediction]
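A minimal sketch of the raw mutual information between two alignment columns, plus a hypothetical helper that ranks every SK-RR column pair by it. It assumes unweighted frequencies without pseudocounts and leaves out the baseline term MI(0) that the slide subtracts; the function names are illustrative only.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Raw MI_ij = sum over (A_i, A_j) of f_ij log[f_ij / (f_i f_j)] for two alignment columns.

    The baseline MI^(0)_ij is not subtracted here.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must cover the same sequences")
    m = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    mi = 0.0
    for (a, b), n_ab in Counter(zip(col_i, col_j)).items():
        f_ab = n_ab / m
        # only observed pairs contribute (0 log 0 = 0)
        mi += f_ab * math.log(f_ab / ((f_i[a] / m) * (f_j[b] / m)))
    return mi


def rank_interprotein_pairs(msa, n_sk):
    """Score every (SK column i, RR column j) pair by MI, highest first."""
    cols = list(zip(*msa))
    scores = [(mutual_information(cols[i], cols[j]), i, j)
              for i in range(n_sk) for j in range(n_sk, len(cols))]
    return sorted(scores, reverse=True)
```

Perfectly co-varying two-state columns give MI = ln 2, while statistically independent columns give 0; only the inter-protein block (i in SK, j in RR) is scored, matching the slide's focus on the dimer interface.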

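  9. Direct vs. indirect interaction
     [Figure: residues i and j with high MI_ij, coupled either by a direct i-j link or indirectly through other residues]
     ➡ i and j must be considered in the context of all other residues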
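To see why a large MI_ij need not mean a direct contact, the toy calculation below couples site 0 to site 1 and site 1 to site 2 but leaves sites 0 and 2 uncoupled. Sites 0 and 2 play the role of i and j; the two-state alphabet, the coupling strength 1.5 and the exact enumeration are assumptions chosen only for illustration.

```python
import itertools
import math


def boltzmann(couplings, n_sites, q=2):
    """Exact P(A) ∝ exp(sum over linked pairs of e_ij(A_i, A_j)), no fields, by enumeration."""
    configs = list(itertools.product(range(q), repeat=n_sites))
    weights = [math.exp(sum(e[c[i]][c[j]] for (i, j), e in couplings.items()))
               for c in configs]
    z = sum(weights)
    return {c: w / z for c, w in zip(configs, weights)}


def pair_mi(prob, i, j, q=2):
    """Mutual information between sites i and j under the distribution prob."""
    p_ij = [[0.0] * q for _ in range(q)]
    for c, p in prob.items():
        p_ij[c[i]][c[j]] += p
    p_i = [sum(row) for row in p_ij]
    p_j = [sum(p_ij[a][b] for a in range(q)) for b in range(q)]
    return sum(p_ij[a][b] * math.log(p_ij[a][b] / (p_i[a] * p_j[b]))
               for a in range(q) for b in range(q) if p_ij[a][b] > 0)


# Hypothetical coupling favouring equal states at both ends of a link
same = [[1.5, 0.0], [0.0, 1.5]]
chain = {(0, 1): same, (1, 2): same}  # links 0-1 and 1-2 only: no direct 0-2 link
prob = boltzmann(chain, n_sites=3)
print(f"MI(0,1) = {pair_mi(prob, 0, 1):.3f}  (direct link)")
print(f"MI(0,2) = {pair_mi(prob, 0, 2):.3f}  (indirect, via site 1)")
```

MI(0,2) comes out positive although e_02 = 0, and smaller than MI(0,1): a local, pair-by-pair measure cannot tell this indirect correlation from a weak direct one.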

  10. Statistical model learning
     • model the data via a global distribution whose two-site marginals reproduce the measured frequencies:
       $$P_{ij}(A_i, A_j) = \sum_{\{A_k \,|\, k \neq i,j\}} P(A_1, \dots, A_{N_{SK}+N_{RR}}) = f_{ij}(A_i, A_j)$$
     • maximum-entropy model, subject to these constraints:
       $$\max_{P} \Big[ -\sum_{\{A_i\}} P(A_1, \dots, A_{N_{SK}+N_{RR}}) \ln P(A_1, \dots, A_{N_{SK}+N_{RR}}) \Big]$$
     ➡ disordered 21-state Potts model (20 amino acids + gap):
       $$P(A_1, \dots, A_{N_{SK}+N_{RR}}) \propto \exp\Big\{ \sum_{i<j} e_{ij}(A_i, A_j) + \sum_i h_i(A_i) \Big\}$$
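For a handful of sites, the maximum-entropy Potts distribution and its marginalization constraint can be evaluated by brute force. This sketch assumes couplings stored per pair (i, j) with i < j as q × q nested lists and fields stored per site; it enumerates all q^N configurations, so it only illustrates the formulas and is not how the 21-state model over 87 + 117 = 204 sites would be handled. The function names are hypothetical.

```python
import itertools
import math


def potts_distribution(e, h, n_sites, q):
    """P(A_1..A_N) = exp(sum_{i<j} e_ij(A_i, A_j) + sum_i h_i(A_i)) / Z, by exact enumeration.

    e: dict (i, j), i < j -> q x q nested list; pairs missing from e have e_ij = 0.
    h: list of n_sites lists, each holding q field values.
    Cost grows as q**n_sites, so this is for toy sizes only.
    """
    configs = list(itertools.product(range(q), repeat=n_sites))

    def log_weight(c):
        pair = sum(e_ij[c[i]][c[j]] for (i, j), e_ij in e.items())
        field = sum(h[i][c[i]] for i in range(n_sites))
        return pair + field

    weights = [math.exp(log_weight(c)) for c in configs]
    z = sum(weights)
    return {c: w / z for c, w in zip(configs, weights)}


def pair_marginal(prob, i, j, q):
    """P_ij(A_i, A_j): sum of P over all A_k with k != i, j."""
    p = [[0.0] * q for _ in range(q)]
    for c, pc in prob.items():
        p[c[i]][c[j]] += pc
    return p
```

Without couplings the pair marginals factorize into products of single-site terms; a positive diagonal coupling makes equal states at the two sites more likely than independence would predict.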

  11. Statistical model learning (II)
     Computational problem: inverse Potts problem
     ➡ determine the model parameters consistent with the data:
       $$H = -\sum_{i<j} e_{ij}(A_i, A_j) - \sum_i h_i(A_i)$$
     ➡ solved via an iterative two-step procedure:
       (i) given test parameters, estimate the two-site distributions $P_{ij}(A_i, A_j)$ (MCMC, message passing)
       (ii) update the parameters: $\Delta e_{ij}(A_i, A_j) = \varepsilon \left[ f_{ij}(A_i, A_j) - P_{ij}(A_i, A_j) \right]$
     ➡ introduce the direct information (DI) as a measure of direct coupling (the MI due to a single link)
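The two-step loop can be sketched on a toy system, assuming exact enumeration in place of the MCMC or message-passing estimates named on the slide (feasible only for tiny N and q). The sign of each step is chosen so that, for P ∝ exp(Σ e + Σ h), a coupling grows whenever the data frequency exceeds the model marginal. The analogous field update with single-site frequencies, the step size `eps` and the iteration count `steps` are assumptions of this sketch, and the function names are hypothetical.

```python
import itertools
import math


def model_marginals(e, h, n, q):
    """Exact one- and two-site marginals of P ∝ exp(sum e_ij + sum h_i) (toy sizes only)."""
    configs = list(itertools.product(range(q), repeat=n))
    weights = [math.exp(sum(e_ij[c[i]][c[j]] for (i, j), e_ij in e.items())
                        + sum(h[i][c[i]] for i in range(n)))
               for c in configs]
    z = sum(weights)
    p1 = [[0.0] * q for _ in range(n)]
    p2 = {pair: [[0.0] * q for _ in range(q)] for pair in e}
    for c, w in zip(configs, weights):
        for i in range(n):
            p1[i][c[i]] += w / z
        for (i, j) in e:
            p2[(i, j)][c[i]][c[j]] += w / z
    return p1, p2


def fit_potts(f1, f2, n, q, eps=0.2, steps=2000):
    """Inverse Potts problem by moment matching: repeat (i) marginals, (ii) parameter update."""
    e = {pair: [[0.0] * q for _ in range(q)] for pair in f2}
    h = [[0.0] * q for _ in range(n)]
    for _ in range(steps):
        p1, p2 = model_marginals(e, h, n, q)  # step (i): model marginals
        for i in range(n):                    # step (ii): fields
            for a in range(q):
                h[i][a] += eps * (f1[i][a] - p1[i][a])
        for pair in f2:                       # step (ii): couplings
            for a in range(q):
                for b in range(q):
                    e[pair][a][b] += eps * (f2[pair][a][b] - p2[pair][a][b])
    return e, h
```

Each step is a gradient-ascent step on the log-likelihood of the Potts model, so at convergence the model marginals equal the data frequencies. With real alignments the exact marginals in step (i) are replaced by approximate ones, which is where MCMC or message passing comes in.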

  12. Mutual information vs. direct information
     [Figure: DI vs. MI for SK-RR residue pairs, each point labelled (HK853 residue, Spo0F residue), e.g. (271,18), (267,15), (291,21); inset: HK853 / Spo0F structure with helices α1, α2, α4]
     • high DI ⇔ spatial proximity; these pairs define the interaction surface
     • low DI: far apart in the 3D structure, but important for phosphotransfer (independent evidence from mutation and NMR studies)
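A sketch of the direct information for a single pair, assuming the definition commonly used in direct-coupling analysis: a two-site model $P^{dir}_{ij}(A, B) \propto \exp\{e_{ij}(A, B) + \tilde h_i(A) + \tilde h_j(B)\}$ whose auxiliary fields are fitted so that its marginals equal $f_i$ and $f_j$, with $DI_{ij}$ the mutual information of that model. The fields are found here by alternating row/column rescaling (iterative proportional fitting); that solver, the function name and its parameters are choices of this sketch, not something stated in the talk.

```python
import math


def direct_information(e_ij, f_i, f_j, iters=1000, tol=1e-12):
    """DI_ij from one coupling matrix e_ij (q_i x q_j) and single-site frequencies f_i, f_j.

    P_dir(a, b) ∝ exp(e_ij[a][b] + ht_i[a] + ht_j[b]); ht_i, ht_j are fitted so that the
    marginals of P_dir match f_i and f_j. x = exp(ht_i), y = exp(ht_j) are the rescalings.
    """
    qi, qj = len(f_i), len(f_j)
    w = [[math.exp(e_ij[a][b]) for b in range(qj)] for a in range(qi)]
    x, y = [1.0] * qi, [1.0] * qj
    for _ in range(iters):
        x_new = [f_i[a] / sum(w[a][b] * y[b] for b in range(qj)) for a in range(qi)]
        y_new = [f_j[b] / sum(w[a][b] * x_new[a] for a in range(qi)) for b in range(qj)]
        delta = max(abs(u - v) for u, v in zip(x + y, x_new + y_new))
        x, y = x_new, y_new
        if delta < tol:
            break
    p_dir = [[w[a][b] * x[a] * y[b] for b in range(qj)] for a in range(qi)]
    z = sum(map(sum, p_dir))
    # MI of the single-link model; zero-probability cells contribute nothing
    return sum((p / z) * math.log((p / z) / (f_i[a] * f_j[b]))
               for a, row in enumerate(p_dir) for b, p in enumerate(row) if p > 0)
```

Because the auxiliary fields absorb the single-site biases, a pair with zero coupling has DI = 0 whatever its MI, which is what lets DI separate direct contacts from correlations transmitted through other residues.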

  13. Direct / mutual information vs. distance
     [Figure: (A) MI and DI vs. minimum separation of atoms (Å), with randomized baselines MI_rand and DI_rand; (B) specificity vs. sensitivity of MI- and DI-based contact prediction]
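The specificity/sensitivity curves can be computed from a ranked list of pair scores (MI or DI) and the corresponding minimum atom separations in the co-crystal. This sketch assumes that specificity means the fraction of the top-k pairs that are true contacts and sensitivity the fraction of all true contacts found among them, and uses a hypothetical 8 Å contact cutoff; the talk does not state these definitions, and the function name is illustrative.

```python
def sensitivity_specificity(scores, distances, cutoff=8.0):
    """(sensitivity, specificity) of the top-k ranked pairs, for k = 1..len(scores).

    A pair counts as a true contact when its minimum atom separation is below cutoff (Å).
    """
    n_contacts = sum(d < cutoff for d in distances)
    if n_contacts == 0:
        raise ValueError("no pair lies within the cutoff")
    order = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)
    curve, hits = [], 0
    for k, n in enumerate(order, start=1):
        hits += distances[n] < cutoff  # True counts as 1
        curve.append((hits / n_contacts, hits / k))
    return curve
```

Running it once with MI scores and once with DI scores on the same pairs gives the two curves that panel (B) compares.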

  14. Predicting complexed protein structures
     work in progress with A. Schug (UCSD)
     Input:
     • monomer structures of Spo0B and Spo0F (native-structure-based model)
     • contact residue pairs (attractive pair interactions)
     Output:
     • complex structure
     • 3.3 Å root-mean-square deviation (RMSD) from the known Spo0B/Spo0F co-crystal

     Spo0B  Spo0F  Native distance  Simulation parameter  Distance in prediction
     37     15     7.2 Å            5.5 Å                 5.6 Å
     38     14     6.1 Å            5.5 Å                 5.8 Å
     41     18     7.1 Å            5.5 Å                 5.8 Å
     42     18     6.8 Å            5.5 Å                 9.5 Å
     42     14     8.9 Å            5.5 Å                 9.3 Å
     45     22     8.5 Å            5.5 Å                 11.2 Å
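The accuracy figure for the predicted complex is a deviation between predicted and crystal coordinates. A minimal sketch of the root-mean-square deviation, assuming the two structures have already been optimally superimposed (e.g. by the Kabsch algorithm, not shown) and that atoms are matched one-to-one; the function names are hypothetical, and the per-pair helper corresponds to the residue-residue distances listed in the table.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (Å) between matched atoms of two structures that are ALREADY superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need the same, non-zero number of matched atoms")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def residue_distance(p, q):
    """Euclidean distance (Å) between two representative atoms of a contact pair."""
    return math.dist(p, q)
```

`math.dist` requires Python 3.8 or later; without the superposition step the RMSD would also include rigid-body offsets and overstate the error.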

  15. Outlook
     • Statistical-physics challenges:
       ‣ inverse Ising / Potts model
         - reconstruct the Hamiltonian from microscopic configurations
         - finite-sample fluctuations
         - dilution: describe the data as well as possible with as few non-zero links as necessary
         - correlated input sequences (phylogenetic bias)
     • Biological challenges:
       ‣ interactome scale: detect a computationally efficient signature of domain-domain interactions
       ‣ protein-family scale: predict specific interaction partners in the case of amplified proteins
       ‣ protein scale: DI informs structure prediction
       ‣ amino-acid scale: molecular recognition code
         - influence of mutations, physical interaction mechanisms vs. statistical analysis
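One common remedy for phylogenetic bias, used in later direct-coupling work and not part of this talk, is to reweight sequences: each row gets weight 1 / (number of rows at least θ identical to it, itself included), so clusters of near-duplicate sequences count roughly once. The sketch assumes equal-length aligned rows and a hypothetical θ = 0.8; the function names are illustrative, and the all-against-all comparison costs O(M² L).

```python
def sequence_weights(msa, theta=0.8):
    """w_m = 1 / #{n : identity(m, n) >= theta}; the count includes m itself."""
    length = len(msa[0])

    def identity(s, t):
        return sum(a == b for a, b in zip(s, t)) / length

    return [1.0 / sum(identity(s, t) >= theta for t in msa) for s in msa]


def weighted_site_frequencies(msa, weights, i):
    """Reweighted f_i(A): weights normalised by M_eff = sum(weights)."""
    m_eff = sum(weights)
    freq = {}
    for row, w in zip(msa, weights):
        freq[row[i]] = freq.get(row[i], 0.0) + w / m_eff
    return freq
```

The sum of the weights, M_eff, acts as an effective number of independent sequences; pair frequencies would be reweighted in the same way before MI or DI is computed.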
