Juliana Palma - ICTP Conference - Trieste, March 2017. SEPARATING THE WHEAT FROM THE CHAFF Tips on how to identify and characterize essential movements in frantically shaking proteins
Why do we do MD?
• Originally: to collect data for statistical mechanics.
• Based on the ergodic hypothesis.
• Calculate energies, free energies, diffusion coefficients, etc.
• To see the movements of macromolecules.
• The problem: “Imagine living in a world where a Richter 9 earthquake raged continuously… at the scale of proteins Brownian motions are even more furious than that.” (G. Oster and H. Wang, Molecular motors, Chapter 8. DOI: 10.1002/3527601503.ch8)
Why is that a problem?
• Interesting movements, relevant to protein function, are mixed with noisy, irrelevant movements.
Principal component analysis
• Procedure taken from multivariate statistical analysis.
• Introduced in MD by Karplus and Berendsen.
• Aims to identify a reduced set of coordinates able to describe the relevant movements.
• Does it (always) fulfil its aim?
• Can we improve it?
Outline of the presentation
• PCA:
• Fundamentals.
• Utility / Limitations.
• Consistent PCA.
• Concatenated PCA.
• PCA of inter/intra subunit movements.
• P2X4 as example.
• Conclusions.
What does PCA do? (basically)
• Transforms local coordinates into collective coordinates.
• Just a few collective coordinates explain most of the protein fluctuations.
• Allows a reduction of the dimensionality.
[Figure: local Cartesian axes x, y, z and collective coordinates q_1, q_2, q_3.]
How does it do that?
• Collect coordinates from an MD trajectory:
$$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_{N_s}, \qquad \mathbf{X}_k = \{x_1^k, x_2^k, \ldots, x_N^k\}$$
where $N_s$ is the number of samples, the index $k$ indicates time, and $N$ is the number of coordinates.
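How does it do that?
• Compute the correlation matrix (covariance matrix too):
$$\mathbf{C} = \begin{pmatrix} C_{11} & \cdots & C_{1N} \\ \vdots & \ddots & \vdots \\ C_{N1} & \cdots & C_{NN} \end{pmatrix}, \qquad C_{ij} = \frac{1}{N_s} \sum_{k=1}^{N_s} \left(x_i^k - \langle x_i \rangle\right)\left(x_j^k - \langle x_j \rangle\right)$$
• Reading the matrix elements (ranges apply to the normalized correlation coefficient):
• $C_{ij} = -1$: linear dependence.
• $-1 \le C_{ij} \le -0.7$: anti-correlated.
• $C_{ij} \approx 0$: uncorrelated.
• $0.7 \le C_{ij} \le 1$: correlated.
• $C_{ij} = 1$: linear dependence.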
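How does it do that?
• Diagonalize the correlation matrix:
$$\mathbf{R}^{T}\mathbf{C}\mathbf{R} = \boldsymbol{\Lambda}$$
$$\mathbf{R} = \begin{pmatrix} R_{11} & \cdots & R_{1N} \\ \vdots & \ddots & \vdots \\ R_{N1} & \cdots & R_{NN} \end{pmatrix}, \qquad \boldsymbol{\Lambda} = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_N \end{pmatrix}$$
• The columns of $\mathbf{R}$, $\mathbf{V}_1, \ldots, \mathbf{V}_N$, are the eigenvectors of matrix $\mathbf{C}$ and constitute an orthonormal basis set.
• $\boldsymbol{\Lambda}$ is a diagonal matrix: $\lambda_1$ is the eigenvalue of $\mathbf{V}_1$, ..., $\lambda_N$ is the eigenvalue of $\mathbf{V}_N$.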
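The three steps above (collect coordinates, build the covariance matrix, diagonalize it) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `pca_modes` is hypothetical, the data are synthetic random numbers rather than a real trajectory, and the frames are assumed to be already superposed on a reference structure so that overall rotation and translation have been removed.

```python
import numpy as np

def pca_modes(X):
    """PCA of a trajectory.

    X: array of shape (N_s, N), one row per sampled frame (assumed pre-aligned).
    Returns eigenvalues in descending order and the matching eigenvectors
    as the columns of R.
    """
    dX = X - X.mean(axis=0)            # displacements from the average structure
    C = dX.T @ dX / X.shape[0]         # covariance matrix with 1/N_s normalization
    lam, R = np.linalg.eigh(C)         # eigh returns eigenvalues in ascending order
    order = np.argsort(lam)[::-1]      # reorder so the largest eigenvalue comes first
    return lam[order], R[:, order]

# Synthetic stand-in for a trajectory: 500 frames, 6 coordinates (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2, 0.1])
lam, R = pca_modes(X)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; it returns orthonormal eigenvectors, which is what the slide requires of the basis set.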
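Example in 2D
$$\mathbf{C} = \begin{pmatrix} \dfrac{1}{N_s}\sum_{k=1}^{N_s} \left(x^k - \langle x \rangle\right)^2 & \dfrac{1}{N_s}\sum_{k=1}^{N_s} \left(x^k - \langle x \rangle\right)\left(y^k - \langle y \rangle\right) \\ \dfrac{1}{N_s}\sum_{k=1}^{N_s} \left(x^k - \langle x \rangle\right)\left(y^k - \langle y \rangle\right) & \dfrac{1}{N_s}\sum_{k=1}^{N_s} \left(y^k - \langle y \rangle\right)^2 \end{pmatrix}$$
$$\mathbf{V}_1 = \begin{pmatrix} R_{11} \\ R_{21} \end{pmatrix}, \qquad \mathbf{V}_2 = \begin{pmatrix} R_{12} \\ R_{22} \end{pmatrix}$$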
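Meaning of eigenvalues and eigenvectors
• The i-th eigenvalue measures the mean squared displacement along the direction of eigenvector $\mathbf{v}_i$:
$$\Delta v_i(t_k) = \mathbf{v}_i \cdot \Delta\mathbf{X}(t_k), \qquad \lambda_i = \frac{1}{N_s}\sum_{k=1}^{N_s} \left[\Delta v_i(t_k)\right]^2$$
[Figure: displacement $\Delta\mathbf{X}(t)$ and its projection $\Delta v_i(t)$ onto eigenvector $\mathbf{v}_i$, with a second eigenvector $\mathbf{v}_j$.]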
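The statement that each eigenvalue equals the mean squared projection onto its eigenvector can be checked numerically. The sketch below uses synthetic data (an assumption, not a real trajectory) and the 1/N_s normalization used in the slides.

```python
import numpy as np

# Synthetic correlated data standing in for a trajectory (assumption).
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))

dX = X - X.mean(axis=0)          # Delta X(t_k) for every frame
C = dX.T @ dX / len(X)           # covariance matrix, 1/N_s normalization
lam, R = np.linalg.eigh(C)       # columns of R are the eigenvectors v_i

proj = dX @ R                    # proj[k, i] = v_i . Delta X(t_k)
msd = (proj ** 2).mean(axis=0)   # mean squared displacement along each v_i
```

Here `msd` and `lam` agree to numerical precision, because the mean of the squared projections is the diagonal of R^T C R.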
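The importance of the eigenvalues
$$\mathbf{C} = \begin{pmatrix} C_{11} & \cdots & C_{1N} \\ \vdots & \ddots & \vdots \\ C_{N1} & \cdots & C_{NN} \end{pmatrix}, \qquad \boldsymbol{\Lambda} = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_N \end{pmatrix}$$
$$\mathrm{Tr}\,\mathbf{C} = \sum_{i=1}^{N} C_{ii} = \sum_{i=1}^{N} \langle \Delta x_i^2 \rangle, \qquad \mathrm{Tr}\,\boldsymbol{\Lambda} = \sum_{i=1}^{N} \lambda_i = \sum_{i=1}^{N} \langle \Delta v_i^2 \rangle$$
• Each trace provides the sum of the squared fluctuations; the trace is invariant under the orthogonal transformation, so both sums are equal.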
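A small numerical check of the trace relation, and of how the total fluctuation accumulates over the PC-modes, might look as follows. The data are synthetic (three strong collective directions hidden in 30 coordinates plus weak noise), which is an assumption chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumption): 3 collective motions mixed into 30 coordinates.
mixing = rng.normal(size=(3, 30))
X = 2.0 * (rng.normal(size=(2000, 3)) @ mixing) + 0.1 * rng.normal(size=(2000, 30))

dX = X - X.mean(axis=0)
C = dX.T @ dX / len(X)                    # covariance matrix
lam = np.linalg.eigvalsh(C)[::-1]         # eigenvalues, largest first

total_cartesian = np.trace(C)             # sum of squared Cartesian fluctuations
total_pc = lam.sum()                      # sum of squared fluctuations over PC-modes
cumulative = np.cumsum(lam) / lam.sum()   # accumulated fraction per PC-mode
```

The two totals coincide, while `cumulative` shows how quickly the first few modes account for the total fluctuation in this toy system.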
Cartesian coordinates vs. Principal components
[Figure panels: individual squared fluctuations; accumulated squared fluctuations.]
• Total fluctuations are concentrated in a few PC-modes (< 20).
• Total fluctuations are equally distributed among all Cartesian coordinates (714).
Vectors of the essential space are able to describe important movements
• There are plenty of examples.
J. S. Hub and B. L. de Groot, PLoS Comput. Biol. 5(8): e1000480, 2009.
The essential space (subspace)
• Contains the most important eigenvectors.
• How many are truly “essential”?
• The problem with defining a subspace.
[Figure, two cases: in one, $\{\Delta v_1, \Delta v_2\}$ and $\{\Delta v'_1, \Delta v'_2\}$ span the same subspace; in the other, $\{\Delta v_1, \Delta v_2\}$ and $\{\Delta v'_1, \Delta v'_2\}$ do not span the same subspace.]
Are the main PC-modes reproducible?
• Run equivalent trajectories.
• Compute the PC-modes for each of them.
• Compute the scalar product for the PC-modes of 2 alternative runs. Ideally:
$$\mathbf{V}_i \cdot \mathbf{V}'_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
[Figure: four independent comparisons, each of 50 ns. System: BPTI.]
Are the essential spaces reproducible?
• Run equivalent trajectories.
• Compute the PC-modes.
• Compute the RMSIP for the essential spaces (ES) of alternative runs:
$$\mathrm{RMSIP} = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\sum_{j=1}^{M}\left(\mathbf{V}_i \cdot \mathbf{V}'_j\right)^2}$$
• RMSIP = 1 if they span the same subspace; RMSIP = 0 if the subspaces are orthogonal.
[Figure: results for a huge number of trajectories. System: BPTI.]
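The RMSIP between two essential spaces can be computed directly from the matrix of overlaps between the two sets of eigenvectors. The sketch below uses the standard root-mean-square inner product definition; the function name `rmsip` and the toy 4-dimensional vectors are illustrative assumptions.

```python
import numpy as np

def rmsip(V, W):
    """Root-mean-square inner product of two M-dimensional subspaces.

    V, W: arrays of shape (N, M) whose columns are orthonormal vectors.
    """
    M = V.shape[1]
    overlaps = V.T @ W                      # overlaps[i, j] = V_i . W_j
    return np.sqrt(np.sum(overlaps ** 2) / M)

# Toy example in 4 dimensions (assumption, not protein data).
I4 = np.eye(4)
V = I4[:, :2]                               # plane spanned by axes 0 and 1
theta = 0.3
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
V_rot = V @ rot                             # different vectors, same plane
W_orth = I4[:, 2:]                          # plane spanned by axes 2 and 3

same = rmsip(V, V_rot)                      # subspaces coincide
orthogonal = rmsip(V, W_orth)               # subspaces are orthogonal
```

The rotated pair illustrates why RMSIP is preferred over mode-by-mode scalar products: individual eigenvectors may differ while the subspace they span is identical.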
Increasing time does not solve the problem
A simple way to improve the consistency of the PC-modes
• Concatenate equivalent trajectories!
$$\underbrace{\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_{N_s}}_{\text{traj-1}},\; \underbrace{\mathbf{X}_{N_s+1}, \mathbf{X}_{N_s+2}, \ldots, \mathbf{X}_{2N_s}}_{\text{traj-2}},\; \ldots,\; \underbrace{\ldots, \mathbf{X}_{nN_s}}_{\text{traj-}n}$$
• The full sequence $\mathbf{X}_1, \ldots, \mathbf{X}_{nN_s}$ is the concatenated trajectory.
How to check that it works?
• Estimate the RMSIP values that can be obtained using different numbers of concatenated trajectories.
[Figure: trajectories 1 to 20 grouped into concatenated trajectories Ctraj-1 to Ctraj-4 (first scheme) and into Ctraj-1 and Ctraj-2 (second scheme).]
Number of independent values of RMSIP = $\dfrac{N_{ctraj}\,(N_{ctraj} - 1)}{2}$ = 12
Results for BPTI
[Figure panels: set of 180 trajectories of 5 ns; set of 80 trajectories of 50 ns.]
Results for lysozyme
[Figure panels: set of 180 trajectories of 5 ns; set of 80 trajectories of 50 ns.]
RMSIP distributions
• The previous procedure affords statistically independent RMSIP values.
• But for large n we obtain too few values.
• Too low variability.
• To get more variability:
• Compute an even larger number of trajectories.
• Form alternative pairs of concatenated trajectories by selecting at random from this set.
[Figure: from trajectories 1 to 20, one random selection forms Ctraj-1 and Ctraj-2 to calculate the 1st RMSIP value; another random selection forms a new Ctraj-1 and Ctraj-2 to calculate the 2nd RMSIP value.]
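The random-pairing procedure described above can be sketched as follows. Everything here is illustrative: the function names, the synthetic pool of 20 short "trajectories", and the choice to draw the two groups without overlap (so that each pair of concatenated trajectories shares no frames) are assumptions, not details taken from the slides.

```python
import numpy as np

def essential_space(X, M):
    """Return the M eigenvectors with the largest eigenvalues as columns."""
    dX = X - X.mean(axis=0)
    C = dX.T @ dX / len(X)
    lam, R = np.linalg.eigh(C)          # ascending eigenvalues
    return R[:, ::-1][:, :M]            # reverse columns, keep the top M

def rmsip(V, W):
    """Root-mean-square inner product between two sets of M column vectors."""
    return np.sqrt(np.sum((V.T @ W) ** 2) / V.shape[1])

def rmsip_distribution(trajs, n, M, n_pairs, rng):
    """RMSIP values for random pairs of concatenated trajectories.

    Each pair is built by drawing 2n distinct trajectories from the pool
    and concatenating the first n and the last n (assumed disjoint groups).
    """
    values = []
    for _ in range(n_pairs):
        picked = rng.choice(len(trajs), size=2 * n, replace=False)
        A = np.concatenate([trajs[i] for i in picked[:n]])
        B = np.concatenate([trajs[i] for i in picked[n:]])
        values.append(rmsip(essential_space(A, M), essential_space(B, M)))
    return np.array(values)

# Synthetic pool (assumption): 20 trajectories, 200 frames, 12 coordinates.
rng = np.random.default_rng(4)
scales = np.linspace(3.0, 0.3, 12)
pool = [rng.normal(size=(200, 12)) * scales for _ in range(20)]
values = rmsip_distribution(pool, n=5, M=3, n_pairs=10, rng=rng)
```

Because the pairs are drawn from the same pool, the resulting values are no longer statistically independent; that is the price paid for the extra variability.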
How to assess the convergence?
• If $n/2$ trajectories provide good convergence, $n$ trajectories provide good convergence, too.
[Figure: cumulative probabilities for RMSIPs obtained with $n$ and $n/2$ trajectories.]
Why does it work?
• We need to understand what can be expected from the PC-modes of a concatenated trajectory.
• “The essential dynamic analysis can be performed on a combined trajectory (constructed by concatenating the trajectories). This is a powerful tool to evaluate similarities and differences between the essential motions in different trajectories of the same protein. If the motions are similar, then the eigenvalues (and eigenvectors) coming from separate trajectories and from the combined trajectory should be similar.” van Aalten et al., Proteins: Structure, Function, and Genetics, 22, 45-54, 1995.
The correlation matrix of concatenated trajectories
• Is the average of the individual correlation matrices plus the correlation matrix of the individual average structures:
$$\mathbf{C}^{(2)} = \frac{\mathbf{C}_A + \mathbf{C}_B}{2} + \mathbf{S}^{(2)}$$
where $\mathbf{C}^{(2)}$ is the correlation matrix of the concatenated trajectory, $\mathbf{C}_A$ and $\mathbf{C}_B$ are the individual correlation matrices, and $\mathbf{S}^{(2)}$ is the correlation matrix of the average structures.
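This decomposition can be verified numerically for two trajectories. The sketch assumes that both trajectories have the same number of frames, that all matrices use the 1/N_s normalization, and that S^(2) is the covariance matrix built from the two average structures (following the slide's label); the data are synthetic.

```python
import numpy as np

def cov(X):
    """Covariance matrix of the rows of X with 1/(number of rows) normalization."""
    dX = X - X.mean(axis=0)
    return dX.T @ dX / len(X)

# Two synthetic "trajectories" with different averages and widths (assumption).
rng = np.random.default_rng(3)
N_s, N = 400, 6
XA = rng.normal(size=(N_s, N)) + 1.0
XB = 2.0 * rng.normal(size=(N_s, N)) - 1.0

C2 = cov(np.concatenate([XA, XB]))                     # concatenated trajectory
means = np.stack([XA.mean(axis=0), XB.mean(axis=0)])   # the two average structures
S2 = cov(means)                                        # covariance of the averages
rhs = 0.5 * (cov(XA) + cov(XB)) + S2
```

With equal-length trajectories, `C2` and `rhs` agree to numerical precision. The S2 term shows that a shift between the average structures of the two runs enters the concatenated matrix as extra variance along the direction joining them.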