Protein Structure Analysis with Sequential Monte Carlo Method
Jinfeng Zhang
Computational Biology Lab, Department of Statistics, Harvard University
Introduction
• Structure → Function & Interaction
  – The Protein Structure Initiative (PSI) is speeding up the information flow from sequence to structures.
  – Information does not readily flow from structures to structures.
  – Neither does it readily flow from structures to applications.
• What are the bottlenecks?
  – Sampling method.
  – Potential function.
Sampling Methods -- Folding & Growth
[Figure: Folding Method and Growth Method. From http://www.bioinformatics.buffalo.edu/]
Sequential Monte Carlo (SMC) -- Step by Step
[Figure: step-by-step SMC illustration. Each sample has a weight! Resampling.]
SMC -- Summary
• Short chains:
  – Exhaustive enumeration, useful for evaluation of SMC performance.
• Long chains:
  – Sequential Monte Carlo, estimating interesting properties.
• The main ingredients of SMC are:
  – Sequence of distributions "approaching" the target distribution π(x_1, …, x_n).
  – Sampling distribution g_{t+1}(x_{t+1} | x_1, …, x_t).
  – Resampling scheme.
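The three ingredients listed above can be sketched on a toy problem. The example below is a minimal illustration, not the speaker's implementation: it assumes a 2D square-lattice self-avoiding walk in place of a protein chain, a uniform target over walks, a sampling distribution that picks uniformly among unoccupied neighbor sites, and multinomial resampling triggered by the effective sample size. The names `open_sites`, `smc_count_walks`, and `ess_fraction` are hypothetical.

```python
import random

# Unit moves on a 2D square lattice (toy stand-in for discrete backbone states).
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def open_sites(chain):
    """Return the unoccupied lattice sites adjacent to the last site of the chain."""
    occupied = set(chain)
    x, y = chain[-1]
    return [(x + dx, y + dy) for dx, dy in MOVES if (x + dx, y + dy) not in occupied]


def smc_count_walks(n_steps, n_samples=5000, ess_fraction=0.5, seed=0):
    """Estimate the number of self-avoiding walks with n_steps steps by SMC growth."""
    rng = random.Random(seed)
    chains = [[(0, 0)] for _ in range(n_samples)]
    weights = [1.0] * n_samples
    for _ in range(n_steps):
        # Growth: g_{t+1} is uniform over open sites, so the incremental
        # weight (target / proposal) is the number of open sites.
        for i, chain in enumerate(chains):
            if weights[i] == 0.0:
                continue  # dead end reached earlier
            options = open_sites(chain)
            if options:
                chains[i] = chain + [rng.choice(options)]
                weights[i] *= len(options)
            else:
                weights[i] = 0.0
        total = sum(weights)
        if total == 0.0:
            return 0.0
        # Resampling: triggered when the effective sample size drops too low.
        ess = total * total / sum(w * w for w in weights)
        if ess < ess_fraction * n_samples:
            chains = rng.choices(chains, weights=weights, k=n_samples)
            # Every survivor carries the mean weight, which keeps the
            # estimate of the total count unbiased.
            weights = [total / n_samples] * n_samples
    # The mean weight estimates the number of conformations.
    return sum(weights) / n_samples


if __name__ == "__main__":
    print(smc_count_walks(8))
```

For short walks the estimate can be checked against exhaustive enumeration, which is the evaluation strategy the slide describes for short chains.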
References for SMC
• J.S. Liu and R. Chen (1998). SMC for dynamic systems. J. Amer. Statist. Assoc. 93, 1032-45.
• J.S. Liu (2001). Monte Carlo Strategies in Scientific Computing. Springer-Verlag.
• J. Liang, J. Zhang, and R. Chen (2002). J. Chem. Phys. 117:7, 3511-3521.
• J. Zhang, R. Chen, C. Tang, and J. Liang (2003). J. Chem. Phys. 118:12, 6102-6109.
• J. Zhang, Y. Chen, R. Chen, and J. Liang (2004). J. Chem. Phys. 121:1, 592-603.
Near Native Structures of Proteins
Native State is an Ensemble of Structures
[Figure: Ca2+ ATPase pump; Lac repressor; 2BBN]
• Protein functions and interactions are determined by the near native structures.
Biological Problems
• Stability
  – Probability of NNS under Boltzmann distribution.
• Function
  – Analysis of NNS to detect correlated structural changes.
• Interaction
  – Near native structures with diversified interfaces.
• Difficulty of protein structure prediction
  – Probability of NNS under uniform distribution.
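One hedged way to make the two probabilities above concrete: suppose a sampler returns conformations x^{(1)}, …, x^{(m)} with weights w^{(i)} that are properly weighted with respect to the uniform distribution over conformations. The symbols E (a potential function), k_B T, x_nat (the native structure), and the cutoff d are notation assumed here for illustration and do not come from the slides. Standard self-normalized importance sampling estimates would then read:

```latex
\[
\hat{P}_{\text{uniform}}(\text{NNS})
  = \frac{\sum_{i=1}^{m} w^{(i)}\,
          \mathbf{1}\!\left[\mathrm{RMSD}\!\left(x^{(i)}, x_{\text{nat}}\right) < d\right]}
         {\sum_{i=1}^{m} w^{(i)}}
\]
\[
\hat{P}_{\text{Boltzmann}}(\text{NNS})
  = \frac{\sum_{i=1}^{m} w^{(i)}\, e^{-E(x^{(i)})/k_B T}\,
          \mathbf{1}\!\left[\mathrm{RMSD}\!\left(x^{(i)}, x_{\text{nat}}\right) < d\right]}
         {\sum_{i=1}^{m} w^{(i)}\, e^{-E(x^{(i)})/k_B T}}
\]
```

The only difference between the two is the Boltzmann factor, so the same weighted sample can in principle serve both the stability question and the prediction-difficulty question.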
Methods for Studying NNS
• Experimental methods, such as NMR
  – Study one protein at a time. Limited to certain protein types.
• MD simulation
  – Computationally expensive. Applicable to small proteins.
• MCMC
  – Folding around the constrained native structure template is not efficient.
• NMR combined with MD
  – Vendruscolo M, et al. Nature (2005), 433:128-32
Near Native Structures -- Connecting Experimental Structures and Applications
[Diagram label: SMC]
Representation of Protein Structures
• Optimized discrete state model (ODSM).
• Accuracy of ODSM.
[Figure: backbone representation with labels C_{i-2}, C_{i-1}, C_i, C_{i+1}, SC_{i-1}, SC_i, α_i, τ_i; plot of cRMSD (1.5 to 3.0) vs. Discrete State (3 to 10) for ALA, PRO, GLY, HIS]
Sequential Monte Carlo for Sampling NNS
[Figure: Native structure, SMC, Near Native Structures]
• Definition of NNS:
  – Structures with RMSD < 3 Å to native structure.
  – Other similarity measures are possible.
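The RMSD criterion in the definition above can be sketched as follows. This is a minimal example assuming coordinate RMSD after optimal rigid-body superposition (the Kabsch algorithm), computed with NumPy; the slides do not say which superposition procedure was used. The function names `superposed_rmsd` and `is_near_native` are hypothetical, and the 3 Å default is the cutoff quoted in the definition.

```python
import numpy as np


def superposed_rmsd(coords_a, coords_b):
    """Coordinate RMSD between two (n, 3) point sets after optimal superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("both inputs must have the same shape (n, 3)")
    # Remove the translation by centering both structures.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Kabsch: singular values of the covariance matrix give the best overlap.
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the best fit would be a reflection.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * s.sum()) / len(a)
    return float(np.sqrt(max(msd, 0.0)))  # clamp tiny negative round-off


def is_near_native(coords, native_coords, cutoff=3.0):
    """True if the structure lies within `cutoff` (in Å) of the native structure."""
    return superposed_rmsd(coords, native_coords) < cutoff


if __name__ == "__main__":
    native = np.random.default_rng(0).normal(size=(10, 3))
    shifted = native + np.array([5.0, 0.0, 0.0])  # pure translation
    print(is_near_native(shifted, native))
```

Because the slide notes that other similarity measures are possible, the indicator is kept as a separate function so the measure could be swapped without touching the rest of a pipeline.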
Comparison with Enumeration I. -- Estimation of Number of Conformations
[Figure: ln(Number of Conformations) vs. Length (11 to 15) for 1ail; 5-state enumeration compared with 5-state SMC; annotated values 1.042×10^9 and 1.039×10^9. Sample size: 10,000.]
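The enumeration side of such a comparison can be sketched on a toy model. The example below assumes a 2D square-lattice self-avoiding walk rather than the 5-state protein model in the figure, and counts every conformation by depth-first search; `count_walks` is a hypothetical name. Exact counts of this kind are what a sampling estimate would be checked against for short chains.

```python
# Unit moves on a 2D square lattice (toy stand-in for a discrete-state chain).
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def count_walks(n_steps):
    """Exhaustively count self-avoiding walks with n_steps steps from the origin."""

    def extend(x, y, visited, remaining):
        if remaining == 0:
            return 1  # one complete conformation
        total = 0
        for dx, dy in MOVES:
            site = (x + dx, y + dy)
            if site not in visited:
                visited.add(site)
                total += extend(site[0], site[1], visited, remaining - 1)
                visited.remove(site)  # backtrack
        return total

    return extend(0, 0, {(0, 0)}, n_steps)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, count_walks(n))
```

The cost grows exponentially with chain length, which is why enumeration is limited to short chains and sampling takes over for long ones.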
Comparison with Enumeration II. -- Estimation of NNS
[Figure: ln(Probability) vs. RMSD Bin for L 15; enumeration compared with SMC; annotated values 5.94×10^-8 and 5.60×10^-8. Sample size: 10,000.]
RMSD bins: 1: 1.0 Å - 1.5 Å; 2: 1.5 Å - 2.0 Å; 3: 2.0 Å - 2.5 Å; 4: 2.5 Å - 3.0 Å.
Comparison with Enumeration III. -- Estimation of Native Contacts
[Figure: Probability vs. Native Contact (0 to 35), enumeration compared with SMC. Panel a: 1nkd, RMSD Bin-2: 1.5 Å - 2.0 Å. Panel b: 1nkd, RMSD Bin-4: 2.5 Å - 3.0 Å.]
Probability of NNS -- How Difficult Is Protein Structure Prediction?
Probability of NNS for 70 non-homologous proteins, grouped by length with 5 residues per interval.
[Figure: log10(Probability) vs. Length (60 to 140) for RMSD < 3 Å, RMSD < 4 Å, and RMSD < 5 Å]
Probability of NNS -- Effect of Model Complexity
Average probability of NNS for 8 proteins at partial length and full length.
[Figure: panels a and b showing log10(N) vs. Length and log10(P) vs. Length for the 4-state, 5-state, 6-state, and 8-state models]
• The 4-, 5-, 6-, and 8-state models all have the same probability of NNS.
Probability Under Boltzmann Distribution -- Contact Potentials
Piotr Pokarowski et al., PROTEINS, 59:49–57 (2005)
Probability of NNS Under Boltzmann Distributions
• Probability of NNS for 32 proteins with length from 31 to 90.
[Figure: log10(Probability) vs. Length Bin (30 to 90); uniform distribution of 5-state model, Boltzmann distribution of 5-state model, Boltzmann distribution of 6-state model]
• Pair-wise contact potential functions stabilize NNS poorly.
Summary for NNS
• Sequential Monte Carlo (SMC) for studying near native structures (NNS).
• Probability of NNS is estimated for proteins up to length 150.
• Models with different complexities have the same probability of NNS.
• Rigorous evaluation criterion for potential functions. Contact potentials do not stabilize native structures.
Side Chain Modeling
Introduction
• Side chain modeling is important for protein structure prediction, protein interaction, and protein design.
• Most current methods look for a single conformation with minimum potential energy.
• In structure prediction, the energy of a conformation is normally calculated ignoring the side chain conformational entropy.
Questions
• Do structures with similar compactness have similar side chain conformational entropy?
• Do structures with similar folds have similar side chain conformational entropy?
• Do native structures have higher side chain entropy than random structures with similar compactness or similar folds?
We address these questions with our new side chain modeling method.