

1. Computational Challenges in Computing Nearest Neighbor Estimates of Entropy for Large Molecules
E. James Harner, Harshinder Singh, Shengqiao Li, and Jun Tan
Research supported by: Biostatistics Branch, National Institute for Occupational Safety and Health, Morgantown, WV
September 19, 2003

2. Probabilistic Modelling of Molecular Vibrations
⋆ Modelling random vibrations in molecules is important for studying their properties and functions.
⋆ Entropy is a measure of the freedom of a system to explore its available configuration space.
⋆ Entropy evaluation is important in order to understand the factors involved in the stability of a conformation and the change from one conformation to another.

3. Entropy in Protein Folding
⋆ Proteins are biological molecules that are of primary importance to all living organisms.
⋆ Proteins are made up of many amino acids (called residues) linked together.
⋆ A human body contains over 30,000 different kinds of proteins.
⋆ Protein misfolding is the cause of protein-folding diseases: Alzheimer's disease, mad cow disease, cystic fibrosis, and some types of cancer.
⋆ It is important to study the stability of a protein; the key is to find a small molecule (a drug) that can stabilize the normally folded structure.

4. Insulin Protein [figure]

5. Entropy
⋆ The entropy of a molecular conformation depends on the coordinates of the conformation. These are:
– Bond lengths
– Bond angles
– Torsional angles (dihedral or rotational degrees of freedom)
⋆ Bond lengths and bond angles are rather hard coordinates; entropy is mainly determined by fluctuations in the torsional angles.
⋆ Probability modeling of the torsional angles of a molecular system is important for entropy evaluation.

6. Methanol Molecule [figure]

7. Probabilistic Modelling of Torsional Angles
⋆ In the molecular biology literature, torsional angles are assumed to have a multivariate Gaussian (normal) distribution (Karplus and Kushick (1981), Macromolecules; Levy et al. (1984), Macromolecules). The entropy is then given by

S_c = (m k_B)/2 + (k_B/2) ln[(2π)^m |Σ|]

⋆ S_c is estimated by plugging in the maximum likelihood estimate of the determinant of the variance-covariance matrix Σ, computed from data on the torsional angles of the molecular system.
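A minimal sketch of this plug-in estimate, assuming k_B = 1 so the entropy is reported in units of k_B; the function name and the (n × m) layout of the angle sample are my own choices, not from the talk:

```python
import numpy as np

def gaussian_entropy(angles, k_B=1.0):
    """Quasi-harmonic entropy S_c = m*k_B/2 + (k_B/2) ln[(2*pi)^m |Sigma|]."""
    n, m = angles.shape                      # n conformations, m torsional angles
    sigma = np.atleast_2d(np.cov(angles, rowvar=False))
    _, logdet = np.linalg.slogdet(sigma)     # numerically stable ln|Sigma|
    return m * k_B / 2 + (k_B / 2) * (m * np.log(2 * np.pi) + logdet)
```

Note that np.cov divides by n − 1 rather than n; for large samples the difference from the maximum likelihood estimate is negligible.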

8. Probability Modeling of Torsional Angles
⋆ There are common situations where assuming a Gaussian distribution for torsional angles is not realistic, e.g.,
– modeling a torsional angle which has more than one peak;
– modeling a torsional angle where there is more free movement, e.g., in gases.
⋆ In Demchuk and Singh (2001, Molecular Physics):
– We proposed a circular probability modeling approach for modeling torsional angles.
– The torsional angle of the methanol molecule was modeled by using a von Mises distribution (the most commonly used distribution on the circle).

9. Probability Modeling of Torsional Angles
⋆ A circular random variable Θ follows an l-mode von Mises distribution if its probability density function is given by:

f(θ) = [1/(2π I_0(κ))] e^{κ cos[l(θ − θ_0)]},  −π ≤ θ < π,

where κ = concentration parameter, l = number of modes, I_0 = modified Bessel function of order 0, and θ_0 = position of the first mode. For l ≥ 2, the modes are 2π/l radians apart.
⋆ For l = 1:
– The mean angle is θ_0.
– If κ = 0, it is the uniform distribution.
– For large κ, it is approximately a Gaussian distribution.
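A short sketch of this density; scipy's i0 is the modified Bessel function of order 0, and the function and argument names are illustrative:

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of order 0

def lmode_von_mises_pdf(theta, kappa, l=1, theta0=0.0):
    """l-mode von Mises density on [-pi, pi)."""
    return np.exp(kappa * np.cos(l * (theta - theta0))) / (2 * np.pi * i0(kappa))
```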

10. von Mises Distribution [figure]

11. von Mises Distribution [figure]

12. Probability Modeling of Torsional Angles
We assumed independent von Mises distributions for the torsional angles. Let Θ_i have an l_i-mode von Mises distribution with concentration parameter κ_i, i = 1, 2, …, m. Then the entropy of the system is given by:

S_c = k_B [ m ln(2π) + Σ_{i=1}^m ln I_0(κ_i) − Σ_{i=1}^m κ_i I_1(κ_i)/I_0(κ_i) ],

where I_1 is the modified Bessel function of order 1. From the Boltzmann-Gibbs distribution, the potential energy of the system is given by

V(θ_1, θ_2, …, θ_m) = (1/β) Σ_{i=1}^m κ_i [1 − cos(l_i(θ_i − θ_{i0}))],

where β = 1/(k_B T).
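A minimal sketch of this entropy formula, again with k_B = 1 by default; the array of fitted concentration parameters κ_i is an assumed input (e.g. from maximum likelihood fits):

```python
import numpy as np
from scipy.special import i0, i1  # modified Bessel functions of orders 0 and 1

def von_mises_system_entropy(kappas, k_B=1.0):
    """S_c = k_B [m ln(2*pi) + sum ln I0(k_i) - sum k_i I1(k_i)/I0(k_i)]."""
    kappas = np.asarray(kappas, dtype=float)
    m = kappas.size
    return k_B * (m * np.log(2 * np.pi)
                  + np.log(i0(kappas)).sum()
                  - (kappas * i1(kappas) / i0(kappas)).sum())
```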

13. Modeling the Torsional Angle of Methanol
As a case study, we considered the torsional angle of a methanol molecule. We assumed a 3-mode von Mises distribution for its torsional angle Θ, i.e.:

f(θ) = [1/(2π I_0(κ))] e^{κ cos[3(θ − θ_0)]},  −π ≤ θ < π.

The potential energy is

V(θ) = (κ/β)[1 − cos(3(θ − θ_0))] = (V_0/2)[1 − cos(3(θ − θ_0))],

where V_0 = maximum potential energy.

14. A Bathtub-Shaped Distribution for Potential Energy
For the methanol molecule, the potential energy is

V = (V_0/2)[1 − cos(3(θ − θ_0))].

Assuming Θ has a 3-mode von Mises distribution, we derived the following p.d.f. for V:

g(v) = [1/(π I_0(κ))] e^{κ(1 − 2v/v_0)} v^{−1/2} (v_0 − v)^{−1/2},  0 < v < v_0.

This is a bathtub-shaped probability distribution. For κ = 0, V/V_0 has a beta(1/2, 1/2) distribution.
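A small sketch of the derived density (names are illustrative):

```python
import numpy as np
from scipy.special import i0

def bathtub_pdf(v, kappa, v0):
    """g(v) = exp(kappa(1 - 2v/v0)) / (pi I0(kappa) sqrt(v (v0 - v))) on (0, v0)."""
    v = np.asarray(v, dtype=float)
    return (np.exp(kappa * (1 - 2 * v / v0))
            / (np.pi * i0(kappa) * np.sqrt(v * (v0 - v))))
```

At κ = 0 this reduces to the beta(1/2, 1/2) density of V/V_0, rescaled to (0, v_0).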

15. A Bathtub-Shaped Distribution [figure]

16. Histograms of Torsional Angle and Energy [figure]

17. Fitting von Mises and Bathtub-Shaped Distributions [figure]

18. A Bivariate Circular Model (Singh et al., 2002, Biometrika)
⋆ Let Θ_1 and Θ_2 be two circular random variables. We introduced a joint probability distribution for Θ_1 and Θ_2 with p.d.f. given by

f(θ_1, θ_2) = C exp{κ_1 cos(θ_1 − μ_1) + κ_2 cos(θ_2 − μ_2) + λ sin(θ_1 − μ_1) sin(θ_2 − μ_2)},  −π ≤ θ_1, θ_2 < π,

where κ_1, κ_2 ≥ 0, −∞ < λ < ∞, −π ≤ μ_1, μ_2 < π, and C is a normalizing constant.
⋆ If the fluctuations in Θ_1 and Θ_2 are sufficiently small, then (Θ_1, Θ_2) follows approximately a bivariate normal distribution with

σ_1² = κ_2/(κ_1κ_2 − λ²),  σ_2² = κ_1/(κ_1κ_2 − λ²),  ρ = λ/√(κ_1κ_2).

19. A Bivariate Circular Model
⋆ The normalizing constant C is given by

C⁻¹ = 4π² Σ_{m=0}^∞ (2m choose m) [λ²/(4κ_1κ_2)]^m I_m(κ_1) I_m(κ_2),

where I_m is a modified Bessel function of order m.
⋆ E[sin(Θ_i − μ_i)] = 0, i = 1, 2, implies that μ_i is the circular mean of Θ_i.
⋆ The circular variance of Θ_1 is given by

1 − E[cos(Θ_1 − μ_1)] = 1 − 4Cπ² Σ_{m=0}^∞ (2m choose m) [λ²/(4κ_1κ_2)]^m I_{m+1}(κ_1) I_m(κ_2).
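A sketch evaluating C by truncating this series; the truncation length is an assumption (in practice one would stop once the terms fall below a tolerance):

```python
import numpy as np
from scipy.special import iv, comb  # I_m and the binomial coefficient

def bivariate_vm_norm_const(kappa1, kappa2, lam, terms=50):
    """C from C^{-1} = 4 pi^2 sum_m C(2m, m) [lam^2/(4 k1 k2)]^m I_m(k1) I_m(k2)."""
    m = np.arange(terms)
    series = (comb(2 * m, m) * (lam**2 / (4 * kappa1 * kappa2))**m
              * iv(m, kappa1) * iv(m, kappa2))
    return 1.0 / (4 * np.pi**2 * series.sum())
```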

20. A Bivariate Circular Model
⋆ The conditional distributions of Θ_1 and Θ_2 are von Mises.
⋆ The marginal distribution of Θ_1 is symmetric around θ_1 = μ_1 and unimodal (bimodal) when

A(κ_2) = I_1(κ_2)/I_0(κ_2) ≤ (≥) κ_1κ_2/λ²

(a quick numerical check is sketched after this slide).
⋆ A generalization which allows multiple peaks in the marginal distributions:

f(θ_1, θ_2) = C exp{κ_1 cos(l_1(θ_1 − μ_1)) + κ_2 cos(l_2(θ_2 − μ_2)) + λ sin(l_1(θ_1 − μ_1)) sin(l_2(θ_2 − μ_2))},  −π ≤ θ_1, θ_2 < π,

where l_1, l_2 are positive integers.
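The unimodality condition translates into a one-line check; this helper is illustrative, not from the paper:

```python
from scipy.special import i0, i1

def marginal_unimodal(kappa1, kappa2, lam):
    """True when A(kappa2) = I1(kappa2)/I0(kappa2) <= kappa1*kappa2/lam**2."""
    return i1(kappa2) / i0(kappa2) <= kappa1 * kappa2 / lam**2
```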

21. Nearest Neighbor Estimates of Entropy (Singh et al., 2002)
⋆ Let X_1, X_2, …, X_n be a random sample from a population with p.d.f. f(x) on R^p.
⋆ Let R_{i,k} = the Euclidean distance from X_i to its k-th closest neighbor.
⋆ A reasonable estimate of f(X_i) is obtained by equating the estimated probability mass of the ball of radius R_{i,k} centered at X_i to k/n:

f̂(X_i) · R_{i,k}^p π^{p/2} / Γ(p/2 + 1) = k/n.

⋆ The above equation gives

f̂(X_i) = k Γ(p/2 + 1) / (n R_{i,k}^p π^{p/2}),  i = 1, 2, …, n.
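A minimal sketch of this k-th nearest neighbor density estimate; the k-d tree neighbor search via scipy's cKDTree is my implementation choice, not necessarily the authors':

```python
import numpy as np
from math import gamma, pi
from scipy.spatial import cKDTree

def knn_density(X, k=1):
    """f_hat(X_i) = k * Gamma(p/2 + 1) / (n * R_{i,k}^p * pi^(p/2))."""
    n, p = X.shape
    tree = cKDTree(X)
    # query k+1 neighbors: the nearest point to X_i in the tree is X_i itself
    dist, _ = tree.query(X, k=k + 1)
    R = dist[:, k]                      # distance to the k-th true neighbor
    return k * gamma(p / 2 + 1) / (n * R**p * pi**(p / 2))
```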
