  1. Nested sampling with demons. Michael Habeck, Max Planck Institute for Biophysical Chemistry and Institute for Mathematical Stochastics, Göttingen, Germany. Amboise, September 23, 2014

  2. Bayesian inference
  • Probability rules:
      posterior × evidence = likelihood × prior
      Pr(θ | D, M) × Pr(D | M) = Pr(D | θ, M) × Pr(θ | M)
      p(θ) × Z = L(θ) × π(θ)
  • Inference:
      Evidence   Z = ∫ L(θ) π(θ) dθ
      Posterior  p(θ) = (1/Z) L(θ) π(θ)

  3. Nested sampling
  • The evidence reduces to a one-dimensional integral:
      Z = ∫ L(θ) π(θ) dθ = ∫_0^1 L(X) dX
    summing over prior mass
      X(λ) = ∫_{L(θ) ≥ λ} π(θ) dθ,   X(0) = 1,   X(∞) = 0.

  4. Nested sampling
  • The evidence reduces to a one-dimensional integral:
      Z = ∫ L(θ) π(θ) dθ = ∫_0^1 L(X) dX
    summing over prior mass
      X(λ) = ∫_{L(θ) ≥ λ} π(θ) dθ,   X(0) = 1,   X(∞) = 0.
  • Prior masses can be ordered: X(λ) < X(λ′) if λ > λ′
  • Idea: we can evaluate L exactly and estimate X
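A quick numeric check of the identity on this slide, on a toy problem of my own choosing (not from the talk): a uniform prior π on [0, 1] with likelihood L(θ) = exp(−|θ − 1/2|). Here the prior mass above contour λ is X(λ) = min(1, −2 ln λ), hence L(X) = exp(−X/2), and both integrals equal 2(1 − e^{−1/2}).

```python
import numpy as np

# Toy verification of  Z = ∫ L(θ) π(θ) dθ = ∫_0^1 L(X) dX
# for π uniform on [0, 1] and L(θ) = exp(-|θ - 1/2|).

def midpoint(f, a, b, n=200_000):
    """Simple midpoint-rule quadrature on [a, b]."""
    x = a + (np.arange(n) + 0.5) * (b - a) / n
    return f(x).sum() * (b - a) / n

Z_theta = midpoint(lambda t: np.exp(-np.abs(t - 0.5)), 0.0, 1.0)  # ∫ L π dθ
Z_mass = midpoint(lambda X: np.exp(-X / 2.0), 0.0, 1.0)           # ∫_0^1 L(X) dX

print(Z_theta, Z_mass)  # both ≈ 2 (1 - e^{-1/2}) ≈ 0.78694
```

The point of the toy is that the two parametrizations of the same integral agree even though L is not monotone in θ.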

  5. Estimation of prior masses
  • Nested sequence of truncated priors:
      p(θ | λ) = Θ[L(θ) − λ] π(θ) / X(λ),   where Θ(x) = 0 for x < 0 and Θ(x) = 1 for x ≥ 0
  • Distribution of prior masses at likelihood contour λ:
      X ∼ Uniform(0, X(λ))
  • Order statistics:
      X_max ∼ N X^{N−1} / X(λ)^N
    where X_max is the maximum of N uniformly distributed X_n ∼ Uniform(0, X(λ))
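The order statistic above is what drives nested sampling's compression, and it is easy to check by simulation (a sketch of my own, not from the talk): the largest of N standard-uniform draws has density N t^{N−1}, i.e. t = X_max is Beta(N, 1), with mean N/(N + 1) and E[−ln t] = 1/N.

```python
import numpy as np

# Monte Carlo check of the order statistic for the shrinkage factor t:
# t = max of N Uniform(0, 1) draws is Beta(N, 1).

rng = np.random.default_rng(0)
N, trials = 10, 200_000
t = rng.uniform(size=(trials, N)).max(axis=1)  # one shrinkage factor per trial

print(t.mean())             # ≈ N / (N + 1) = 0.909...
print(np.mean(-np.log(t)))  # ≈ 1 / N = 0.1  (expected log-compression)
```

The second line anticipates slide 7: the average log-compression per iteration is 1/N.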

  6. Nested sampling algorithm
  Require: N (ensemble size), m (number of iterations)
  λ_0 = 0, X_0 = 1, S = {θ_1, ..., θ_N} (states) where θ_n ∼ π(θ)    ▷ Initialize
  for k = 1, ..., m do
      θ_k = argmin{L(θ_n) | n = 1, ..., N}        ▷ Store state with smallest likelihood
      λ_k = L(θ_k)                                ▷ Define new likelihood contour
      X_k = t X_{k−1} where t ∼ N t^{N−1}         ▷ Estimate prior mass
      θ ∼ p(θ | λ_k)                              ▷ Draw new state from truncated prior
      S ← S \ {θ_k} ∪ {θ}                         ▷ Replace current with new state
  end for
  Z = Σ_{k=1}^m λ_k (X_{k−1} − X_k)               ▷ Estimate evidence
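A minimal runnable sketch of this algorithm on a toy problem of my choosing: uniform prior on [0, 1] and Gaussian likelihood L(θ) = exp(−(θ − 1/2)²/(2s²)), whose evidence is Z = s√(2π) up to negligible truncation. Two simplifications are my own: the prior mass is shrunk deterministically by e^{−1/N} (the slide samples t ∼ N t^{N−1}; −1/N is its mean log value), and the truncated prior is sampled directly because the contour is a known interval in this toy.

```python
import numpy as np

rng = np.random.default_rng(1)
s = 0.05
L = lambda t: np.exp(-(t - 0.5) ** 2 / (2 * s ** 2))

N, m = 100, 1500
theta = rng.uniform(size=N)            # S = {θ_1, ..., θ_N},  θ_n ~ π
X_prev, Z = 1.0, 0.0
for k in range(m):
    worst = int(np.argmin(L(theta)))
    lam = L(theta[worst])              # new likelihood contour λ_k
    X = X_prev * np.exp(-1.0 / N)      # deterministic prior-mass estimate
    Z += lam * (X_prev - X)            # accumulate λ_k (X_{k-1} - X_k)
    X_prev = X
    # θ ~ p(θ | λ_k): here the contour is |θ - 1/2| < s √(-2 ln λ),
    # so we can sample the truncated prior directly (toy shortcut)
    half = min(s * np.sqrt(max(-2.0 * np.log(lam), 0.0)), 0.5)
    theta[worst] = 0.5 + rng.uniform(-1.0, 1.0) * half

print(Z, s * np.sqrt(2 * np.pi))       # estimate vs. exact ≈ 0.12533
```

The relative error of the estimate scales roughly like √(H/N), where H is the total compression, so a single run with N = 100 lands within a few tens of percent of the exact value.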

  7. Compression
  • Compression achieved by a single nested sampling iteration:
      H(λ → λ′) = ∫ p(θ | λ′) ln[p(θ | λ′) / p(θ | λ)] dθ = ln[X(λ) / X(λ′)]
  • Average compression is constant:
      ⟨H(λ → λ′)⟩ = −⟨ln t⟩ = 1/N,   t ∼ Beta(N, 1)

  8. Physical analogy
      Bayesian inference                   Statistical physics
      model parameters θ                   configuration/microstate θ
      negative log-likelihood −ln L(θ)     potential energy E(θ)
      log prior mass ln X(λ)               volume entropy S(ϵ) = ln X(ϵ)
      likelihood contour L(θ) > λ          energy bound E(θ) < ϵ
      prior mass X(λ)                      cumulative DOS X(ϵ) = ∫_{−∞}^ϵ g(E) dE
      evidence Z = ∫ L(X) dX               partition function Z(β) = ∫ e^{−βE} g(E) dE
      truncated prior                      microcanonical ensemble

  9. Microcanonical ensemble
  • Density of states (DOS):
      g(E) = ∫ δ[E − E(θ)] π(θ) dθ = ∂_E X(E)
  • Microcanonical entropy and temperature:
      S(E) = ln X(E),   T(E) = 1 / ∂_E S(E)
  • Compression:
      H(ϵ′ → ϵ) = S(ϵ′) − S(ϵ) = ∫_ϵ^{ϵ′} β(E) dE
    where the inverse temperature β = 1/T measures the entropy production
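The relation g(E) = ∂_E X(E) can be checked numerically on a toy case of my own choosing: a uniform prior on [−1, 1] with E(θ) = θ² gives X(ϵ) = √ϵ for 0 ≤ ϵ ≤ 1, hence g(ϵ) = 1/(2√ϵ).

```python
import numpy as np

# Empirical cumulative DOS: the fraction of prior draws with E(θ) ≤ ϵ
# should match X(ϵ) = √ϵ for this toy system.

rng = np.random.default_rng(5)
E = rng.uniform(-1, 1, size=1_000_000) ** 2   # E(θ) = θ²,  θ ~ π

for eps in (0.04, 0.25, 0.81):
    print(np.mean(E <= eps), np.sqrt(eps))    # empirical X(ϵ) vs. √ϵ
```

Differentiating the empirical X(ϵ) then recovers the 1/(2√ϵ) density of states.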

  10. Enter the demon
  • Implement the truncated prior as a microcanonical ensemble with an additional demon absorbing energy D:
      p(θ, D | ϵ) = δ[ϵ − D − E(θ)] Θ(D) π(θ) / X(ϵ)
    and explore constant-energy shells
  • Creutz algorithm:
      Require: ϵ (upper bound on total energy)
      θ ∼ π(θ) with energy E = E(θ) ≤ ϵ, D = ϵ − E     ▷ Initialize
      while not converged do
          θ′ ∼ π(θ) with energy E′ = E(θ′)             ▷ Generate a candidate
          D′ = D − ΔE where ΔE = E′ − E                ▷ Update demon's state
          if D′ ≥ 0 then
              (θ, D) ← (θ′, D′)                        ▷ Accept
          end if
      end while
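A sketch of the Creutz demon for an Ising-type energy E(θ) = Σ θ_i θ_{i+1} on a 1-D chain. One deviation from the slide's pseudocode is my choice: candidates come from single-spin-flip moves rather than fresh prior draws, which is the classic form of the algorithm. The demon pays for uphill moves and absorbs downhill energy, so E + D stays exactly constant.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
spins = np.empty(n, dtype=int)
spins[::2], spins[1::2] = 1, -1       # a ground state of E = Σ s_i s_{i+1}

def energy(s):
    return int(np.sum(s[:-1] * s[1:]))

E, D = energy(spins), 20              # demon starts with a small budget
eps = E + D                           # conserved total energy
demon_trace = []
for step in range(20_000):
    i = rng.integers(n)
    # energy change of flipping spin i (open chain: one or two bonds)
    dE = 0
    if i > 0:
        dE += -2 * spins[i] * spins[i - 1]
    if i < n - 1:
        dE += -2 * spins[i] * spins[i + 1]
    if D - dE >= 0:                   # demon can pay -> accept the flip
        spins[i] *= -1
        E, D = E + dE, D - dE
    demon_trace.append(D)

print(E + D == eps, np.mean(demon_trace))  # exact conservation; ⟨D⟩ ≈ T
```

The trace of demon energies is what slide 13 histograms: its distribution is approximately exponential with mean T(ϵ).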

  11. Sampling the Ising model with a single demon
  [Figure A: Gibbs entropy S_G(E) and estimated ln X_k versus energy E]
  • Nearest-neighbor interaction on a 64 × 64 lattice: E(θ) = Σ_⟨i,j⟩ θ_i θ_j where θ_i = ±1
  • Nested sampling provides a very accurate estimate of the volume entropy S = ln X

  12. Sampling the Ising model with a single demon
  [Figure B: heat capacity and inverse temperature β_G(E) versus energy E; the critical value ln(1 + √2)/2 is marked]
  • H(ϵ′ → ϵ) = S(ϵ′) − S(ϵ) = ∫_ϵ^{ϵ′} β(E) dE
  • ⟨H(ϵ′ → ϵ)⟩ = 1/N, therefore β(ϵ_k)(ϵ_k − ϵ_{k+1}) ≈ 1/N
  • The histogram of energy bounds ϵ_k matches the inverse temperature / entropy production β(E)

  13. Sampling the Ising model with a single demon
  [Figure C: estimated β versus energy E, compared with β_G(E)]
  • The demon's energy distribution is
      p(D | ϵ) = ∫ p(θ, D | ϵ) dθ = g(ϵ − D) / X(ϵ) ≈ exp{−D / T(ϵ)} / T(ϵ)
  • The demon may serve as a thermometer: ⟨D⟩ ≈ T

  14. Properties of nested sampling
  Pros:
  1. Nested sampling is a microcanonical approach: the energy E is the control parameter, rather than the temperature used in thermal approaches
  2. ... constructs an adaptive "cooling" protocol {ϵ_k}
  3. ... progresses at constant thermodynamic speed: ΔS ≈ 1/N
  4. ... provides an estimate of the entropy S
  Cons:
  1. Nested sampling requires efficient sampling from
      p(θ | ϵ) = Θ[ϵ − E(θ)] π(θ) / X(ϵ) = ∫ δ[ϵ − D − E(θ)] Θ(D) π(θ) dD / X(ϵ)

  15. Releasing more demons
  • We would like to preserve nested sampling's adaptive behavior but be more flexible in terms of the ensemble
  • Idea: introduce more demons in order to smooth the ensemble:
      p(θ, D, K | ϵ) = δ[ϵ − D − K − E(θ)] Θ(D) f(K) π(θ) / Y(ϵ)
    where the prior mass of the compound system is
      Y(ϵ) = ∫ Θ(ϵ − H) (f ⋆ g)(H) dH
    involving the convolution (f ⋆ g)(H)
  • Nested sampling tracks Y(ϵ), where ϵ is an upper bound on the total energy H = K + E

  16. Releasing more demons
  • Nested sampling estimates the evidence of the extended system
      Z_H = ∫ e^{−H} (f ⋆ g)(H) dH = Z_K Z_E
    from which we can obtain the evidence of the original system Z_E
  • Marginal distribution of configurations:
      p(θ | ϵ) = ∫ p(θ, D, K | ϵ) dD dK = π(θ) F[ϵ − E(θ)] / Y(ϵ)
    where F(K) = ∫_{−∞}^K f(t) dt is the cdf of the demon's energy distribution
  • Sampling (θ, K):
      θ ∼ p(θ | ϵ) ∝ π(θ) F[ϵ − E(θ)]
      K ∼ p(K | θ, ϵ) ∝ f(K) Θ[ϵ − E(θ) − K]
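The two-step (θ, K) update can be sketched with a concrete demon of my own choosing (not from the talk): an exponential demon f(K) = β e^{−βK}, so F(K) = 1 − e^{−βK} for K ≥ 0. The toy system (uniform prior on [0, 1], E(θ) = θ, ϵ = 1) and the independence-Metropolis θ-step are likewise my assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, eps = 1.0, 1.0
E = lambda t: t
F = lambda K: 1.0 - np.exp(-beta * np.maximum(K, 0.0))  # demon cdf

theta, samples = 0.5, []
for step in range(50_000):
    # θ | ϵ  ~  π(θ) F[ϵ - E(θ)]   (Metropolis with proposals from π)
    prop = rng.uniform()
    if rng.uniform() < F(eps - E(prop)) / F(eps - E(theta)):
        theta = prop
    # K | θ, ϵ  ~  f(K) Θ[ϵ - E(θ) - K]  (inverse-cdf draw on [0, K_hi])
    K_hi = eps - E(theta)
    K = -np.log(1.0 - rng.uniform() * F(K_hi)) / beta
    samples.append((theta, K))

th, K = np.array(samples).T
print(th.mean())   # ≈ 0.359 for this toy target ∝ 1 - e^{θ-1}
```

The K-step is exact here because the truncated exponential has a closed-form inverse cdf; for other demons from slide 21 the same two-step scheme applies with the corresponding f and F.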

  17. Demonic nested sampling of the ten-state Potts model
  • Demon: f(K) ∝ Θ(K_max − K) K^{d/2 − 1} (d-dimensional harmonic oscillator, where d = dimension of configuration space)
      p(θ | ϵ) ∝ Θ[ϵ − E(θ)] π(θ) × { [ϵ − E(θ)]^{d/2}  if ϵ − E(θ) ≤ K_max;  K_max^{d/2}  otherwise }
  [Figure: A: relative accuracy of log Z [%] versus demon capacity K_max for standard and demonic NS; B, C: energy bounds ϵ_k versus iteration k]

  18. Nested sampling in phase space
  • In continuous configuration spaces, it is convenient to unfold the demon and introduce momenta:
      f(K) = ∫ δ[K − K(ξ)] dξ   where   K(ξ) = (1/2) Σ_{i=1}^d ξ_i²   (kinetic energy)
  • The marginal distribution in configuration space is
      p(θ | ϵ) ∝ Θ[ϵ − E(θ)] π(θ) × { [ϵ − E(θ)]^{d/2}  if ϵ − E(θ) ≤ K_max;  K_max^{d/2}  otherwise }
  • Hamiltonian dynamics for exploration: (θ, ξ) →_L (θ′, ξ′), where L is an integrator (e.g. the leapfrog)

  19. Microcanonical Hamiltonian Monte Carlo
  • 2(d + 1)-dimensional phase space: implement the demon D as a harmonic oscillator with energy D = (ξ²_{d+1} + θ²_{d+1})/2
  • Require: ϵ (total energy), configuration θ with E(θ) < ϵ
      θ_{d+1} = 0                          ▷ Initialize demon D
      while not converged do
          ξ ∼ N(0, 1)                      ▷ Draw momenta from (d+1)-dim Gaussian
          ξ ← ξ × √(ϵ − E − D) / ‖ξ‖       ▷ Scale momenta so as to match excess energy
          (θ, ξ) →_L (θ′, ξ′)              ▷ Run the leapfrog algorithm
          H′ = E(θ′) + K(ξ′)               ▷ Compute total energy of candidate
          if H′ < ϵ then
              θ ← θ′, E ← E(θ)             ▷ Accept
          end if
      end while

  20. Application to the GS peptide
  [Figure: panels A, B, C]
  • A: Native structure of the GS peptide
  • B: Evolution of the energy E(θ_k) (goodness of fit) during nested sampling
  • C: Structural accuracy measured by the root-mean-square deviation (RMSD) [Å] from the crystal structure

  21. Other demons
  • Distribution of the system's energy:
      p(E | ϵ) = Θ(ϵ − E) g(E) F(ϵ − E) / Y(ϵ)
  • Demon pdf f(K) and cdf F(K):
      Gauss:                  f(K) = β/√(2π) e^{−β²K²/2},                    F(K) = (1/2)[1 + erf(βK/√2)]
      Logarithmic oscillator: K(ξ_1, ξ_2) = ξ_1²/2 + β^{−1} ln|ξ_2|  ⇒  f(K) = √(8πβ) e^{βK},   F(K) = √(8π/β) e^{βK}
      Fermi:                  f(K) = β e^{−βK} / (1 + e^{−βK})²,             F(K) = 1 / (1 + e^{−βK})
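A small consistency check of the Fermi row as reconstructed in the table above (the reconstruction itself is an assumption; the check only confirms the pdf/cdf pair is internally consistent): the cdf F(K) = 1/(1 + e^{−βK}) should differentiate to f(K) = β e^{−βK}/(1 + e^{−βK})².

```python
import numpy as np

# Central-difference derivative of the Fermi demon cdf vs. its pdf.
beta = 1.3
f = lambda K: beta * np.exp(-beta * K) / (1 + np.exp(-beta * K)) ** 2
F = lambda K: 1.0 / (1 + np.exp(-beta * K))

K = np.linspace(-5, 5, 1001)
h = 1e-6
num_f = (F(K + h) - F(K - h)) / (2 * h)   # numerical F'(K)
print(np.max(np.abs(num_f - f(K))))       # ≈ 0
```

The same check applies to the Gauss row; the logarithmic-oscillator pair is only proportional (it needs the Θ[ϵ − E − K] truncation to be normalizable).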

  22. Application to SH3 domain • Structure determination from sparse distance data measured by NMR spectroscopy • Structure ensemble as accurate and precise as with parallel tempering
