Nested sampling with demons
Michael Habeck
Max Planck Institute for Biophysical Chemistry and Institute for Mathematical Stochastics, Göttingen, Germany
Amboise, September 23, 2014
Bayesian inference

• Probability rules: posterior × evidence = likelihood × prior

  Pr(θ | D, M) × Pr(D | M) = Pr(D | θ, M) × Pr(θ | M)
  p(θ) × Z = L(θ) × π(θ)

• Inference
  – Evidence: Z = ∫ L(θ) π(θ) dθ
  – Posterior: p(θ) = L(θ) π(θ) / Z
Nested sampling

• The evidence reduces to a one-dimensional integral:

  Z = ∫ L(θ) π(θ) dθ = ∫₀¹ L(X) dX

  summing over prior mass

  X(λ) = ∫_{L(θ) ≥ λ} π(θ) dθ,   X(0) = 1,   X(∞) = 0

• Prior masses can be ordered: X(λ) < X(λ′) if λ > λ′

• Idea: we can evaluate L exactly and estimate X
Estimation of prior masses

• Nested sequence of truncated priors:

  p(θ | λ) = Θ[L(θ) − λ] π(θ) / X(λ),   where Θ(x) = { 0, x < 0;  1, x ≥ 0 }

• Distribution of prior masses at likelihood contour λ:

  X ∼ Uniform(0, X(λ))

• Order statistics:

  X_max ∼ N X^{N−1} / X(λ)^N

  where X_max is the maximum of N uniformly distributed X_n ∼ Uniform(0, X(λ))
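The order-statistics result can be checked numerically; a small sketch, assuming NumPy (all names illustrative). The pdf N x^{N−1}/X(λ)^N implies E[X_max] = N/(N+1) · X(λ):

```python
import numpy as np

rng = np.random.default_rng(1)
N, X_lambda = 5, 0.8

# many replicates of N prior masses, each uniform on (0, X(lambda))
X = rng.uniform(0.0, X_lambda, size=(100_000, N))
X_max = X.max(axis=1)

# pdf N x^{N-1} / X(lambda)^N  =>  E[X_max] = N/(N+1) * X(lambda)
print(X_max.mean(), N / (N + 1) * X_lambda)
```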
Nested sampling algorithm

Require: N (ensemble size), m (number of iterations)
  λ₀ = 0, X₀ = 1, S = {θ₁, …, θ_N} (states) where θ_n ∼ π(θ)      ▷ Initialize
  for k = 1, …, m do
      θ_k = argmin{L(θ_n) | n = 1, …, N}       ▷ Store state with smallest likelihood
      λ_k = L(θ_k)                             ▷ Define new likelihood contour
      X_k = t X_{k−1} where t ∼ N t^{N−1}      ▷ Estimate prior mass
      θ ∼ p(θ | λ_k)                           ▷ Draw new state from truncated prior
      S ← S \ {θ_k} ∪ {θ}                      ▷ Replace current with new state
  end for
  Z = Σ_{k=1}^m λ_k (X_{k−1} − X_k)            ▷ Estimate evidence
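As a concrete illustration, here is a minimal Python sketch of the algorithm for a toy problem: a 2-d Gaussian likelihood with a uniform prior on [−5, 5]². The truncated prior is sampled by naive rejection from π (adequate only for toys), and the leftover mass of the live points is added at the end. All names and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 100, 600                      # ensemble size, number of iterations

def log_L(theta):                    # toy likelihood: standard Gaussian (unnormalized)
    return -0.5 * float(theta @ theta)

def sample_prior():                  # uniform prior on [-5, 5]^2
    return rng.uniform(-5.0, 5.0, size=2)

S = [sample_prior() for _ in range(N)]
logLs = np.array([log_L(t) for t in S])

Z, X_prev = 0.0, 1.0
for k in range(m):
    i = int(np.argmin(logLs))        # worst state defines the new contour lambda_k
    lam = logLs[i]
    t = rng.random() ** (1.0 / N)    # shrinkage factor t ~ Beta(N, 1)
    X_k = t * X_prev
    Z += np.exp(lam) * (X_prev - X_k)
    X_prev = X_k
    while True:                      # rejection sampling from the truncated prior
        cand = sample_prior()
        if log_L(cand) > lam:
            S[i], logLs[i] = cand, log_L(cand)
            break

Z += X_prev * np.mean(np.exp(logLs))  # leftover mass of the live points
print(Z)                              # analytic value: 2*pi/100 ~ 0.0628
```

The rejection step is the expensive part; its acceptance rate shrinks like X_k, which is exactly the "Cons" point raised later in the talk.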
Compression

• Compression achieved by a single nested sampling iteration:

  H(λ → λ′) = ∫ p(θ | λ′) ln[p(θ | λ′) / p(θ | λ)] dθ = ln[X(λ) / X(λ′)]

• Average compression is constant:

  ⟨H(λ → λ′)⟩ = −⟨ln t⟩ = 1/N,   t ∼ Beta(N, 1)
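The identity ⟨−ln t⟩ = 1/N for t ∼ Beta(N, 1) is easy to verify by simulation; a quick sketch (NumPy, names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10
t = rng.beta(N, 1, size=200_000)     # shrinkage factors t ~ Beta(N, 1)
mean_H = float((-np.log(t)).mean())  # average compression per iteration
print(mean_H)                        # close to 1/N = 0.1
```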
Physical analogy

  Bayesian inference                    Statistical physics
  model parameters θ                    configuration/microstate θ
  negative log-likelihood −ln L(θ)      potential energy E(θ)
  log prior mass ln X(λ)                volume entropy S(ϵ) = ln X(ϵ)
  likelihood contour L(θ) > λ           energy bound E(θ) < ϵ
  prior mass X(λ)                       cumulative DOS X(ϵ) = ∫_{−∞}^{ϵ} g(E) dE
  evidence Z = ∫ L(X) dX                partition function Z(β) = ∫ e^{−βE} g(E) dE
  truncated prior                       microcanonical ensemble
Microcanonical ensemble

• Density of states (DOS):

  g(E) = ∫ δ[E − E(θ)] π(θ) dθ = ∂_E X(E)

• Microcanonical entropy and temperature:

  S(E) = ln X(E),   T(E) = 1 / ∂_E S(E)

• Compression:

  H(ϵ′ → ϵ) = S(ϵ′) − S(ϵ) = ∫_ϵ^{ϵ′} β(E) dE

  where the inverse temperature β = 1/T measures the entropy production
Enter the demon

• Implement the truncated prior as a microcanonical ensemble with an additional demon absorbing energy D:

  p(θ, D | ϵ) = δ[ϵ − D − E(θ)] Θ(D) π(θ) / X(ϵ)

  and explore constant-energy shells

• Creutz algorithm:

Require: ϵ (upper bound on total energy)
  θ ∼ π(θ) with energy E = E(θ) ≤ ϵ,   D = ϵ − E     ▷ Initialize
  while not converged do
      θ′ ∼ π(θ) with energy E′ = E(θ′)               ▷ Generate a candidate
      D′ = D − ∆E where ∆E = E′ − E                  ▷ Update demon's state
      if D′ ≥ 0 then
          (θ, D) ← (θ′, D′)                          ▷ Accept
      end if
  end while
Sampling the Ising model with a single demon

[Figure A: Gibbs entropy S_G(E) and estimated ln X_k versus energy E]

• Nearest-neighbor interaction on a 64 × 64 lattice: E(θ) = −Σ_{⟨i,j⟩} θ_i θ_j where θ_i = ±1

• Nested sampling provides a very accurate estimate of the volume entropy S = ln X
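The demon sampler can be made concrete with the standard single-spin-flip variant of Creutz's algorithm (the pseudocode above redraws θ′ from the full prior; local spin flips are the usual practical choice). A NumPy sketch on a small lattice with the ferromagnetic convention E = −Σ θ_i θ_j; the lattice size, energy budget, and sweep count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
L = 16
spins = rng.choice([-1, 1], size=(L, L))

def energy(s):
    # nearest-neighbour coupling with periodic boundaries, E = -sum_<ij> s_i s_j
    return -float(np.sum(s * np.roll(s, 1, axis=0)) + np.sum(s * np.roll(s, 1, axis=1)))

E = energy(spins)
eps = E + 200.0          # bound on the total energy; demon starts with D = eps - E
D = eps - E

demon_samples = []
for sweep in range(200):
    for _ in range(L * L):
        i, j = int(rng.integers(L)), int(rng.integers(L))
        nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nb          # energy change of flipping spin (i, j)
        if D - dE >= 0:                      # demon absorbs / supplies the difference
            spins[i, j] *= -1
            E += dE
            D -= dE
    demon_samples.append(D)

# the demon's energy is approximately exponential with mean ~ T(eps)
print(np.mean(demon_samples))
```

Note that E + D = ϵ is conserved exactly, so the pair (θ, D) explores a constant-energy shell as required.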
Sampling the Ising model with a single demon

[Figure B: heat capacity and inverse temperature β_G(E) versus energy E; critical value ln(1 + √2)/2 indicated]

• H(ϵ′ → ϵ) = S(ϵ′) − S(ϵ) = ∫_ϵ^{ϵ′} β(E) dE

• ⟨H(ϵ′ → ϵ)⟩ = 1/N, therefore β(ϵ_k)(ϵ_k − ϵ_{k+1}) ≈ 1/N

• The histogram of energy bounds ϵ_k matches the inverse temperature / entropy production β(E)
Sampling the Ising model with a single demon

[Figure C: estimated β compared with the inverse temperature β_G(E)]

• The demon's energy distribution is

  p(D | ϵ) = ∫ p(θ, D | ϵ) dθ = g(ϵ − D) / X(ϵ) ≈ (1/T(ϵ)) exp{−D / T(ϵ)}

• The demon may serve as a thermometer: ⟨D⟩ ≈ T
Properties of nested sampling

Pros:
1. Nested sampling is a microcanonical approach: energy E is the control parameter rather than the temperature used in thermal approaches
2. … constructs an adaptive "cooling" protocol {ϵ_k}
3. … progresses at constant thermodynamic speed: ∆S ≈ 1/N
4. … provides an estimate of the entropy S

Cons:
1. Nested sampling requires efficient sampling from

  p(θ | ϵ) = Θ[ϵ − E(θ)] π(θ) / X(ϵ) = (1/X(ϵ)) ∫ δ[ϵ − D − E(θ)] Θ(D) π(θ) dD
Releasing more demons

• We would like to preserve nested sampling's adaptive behavior but be more flexible in terms of the ensemble

• Idea: introduce more demons in order to smooth the ensemble:

  p(θ, D, K | ϵ) = δ[ϵ − D − K − E(θ)] Θ(D) f(K) π(θ) / Y(ϵ)

  where the prior mass of the compound system is

  Y(ϵ) = ∫ Θ(ϵ − H) (f ⋆ g)(H) dH

  involving the convolution (f ⋆ g)(H)

• Nested sampling tracks Y(ϵ), where ϵ is an upper bound on the total energy H = K + E
Releasing more demons

• Nested sampling estimates the evidence of the extended system

  Z_H = ∫ e^{−H} (f ⋆ g)(H) dH = Z_K Z_E

  from which we can obtain the evidence of the original system Z_E

• Marginal distribution of configurations:

  p(θ | ϵ) = ∫ p(θ, D, K | ϵ) dD dK = π(θ) F[ϵ − E(θ)] / Y(ϵ)

  where F(K) = ∫_{−∞}^K f(t) dt is the cdf of the demon's energy distribution

• Gibbs sampling of (θ, K):

  θ ∼ p(θ | ϵ) ∝ π(θ) F[ϵ − E(θ)]
  K ∼ p(K | θ, ϵ) ∝ f(K) Θ[ϵ − E(θ) − K]
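The two conditional steps can be sketched with a concrete demon. Below, an exponential demon f(K) = e^{−K} (so F(K) = 1 − e^{−K}) is paired with a standard-normal prior and a quadratic toy energy; the θ-step uses random-walk Metropolis targeting π(θ)F[ϵ − E(θ)] and the K-step is an exact inverse-cdf draw from f truncated to [0, ϵ − E(θ)]. The choice of f, the step size, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def E(theta):            # toy potential energy
    return 0.5 * theta**2

def F(K):                # cdf of the exponential demon f(K) = exp(-K), K >= 0
    return np.where(K > 0, 1.0 - np.exp(-np.minimum(K, 700.0)), 0.0)

def sample(eps, n_steps=20_000):
    theta = 0.0
    thetas = []
    for _ in range(n_steps):
        # theta-step: random-walk Metropolis targeting pi(theta) F[eps - E(theta)]
        prop = theta + 0.5 * rng.standard_normal()
        log_ratio = (-0.5 * prop**2 + 0.5 * theta**2
                     + np.log(F(eps - E(prop)) + 1e-300)
                     - np.log(F(eps - E(theta)) + 1e-300))
        if np.log(rng.random()) < log_ratio:
            theta = prop
        # K-step: inverse-cdf draw from f truncated to [0, eps - E(theta)]
        K_max = eps - E(theta)
        K = -np.log1p(-rng.random() * (1.0 - np.exp(-K_max)))
        thetas.append(theta)
    return np.array(thetas)

samples = sample(eps=2.0)
print(samples.mean(), samples.std())
```

Because F grows smoothly from 0, configurations just above the hard bound E(θ) = ϵ are suppressed gradually rather than cut off, which is the smoothing effect the extra demon is meant to provide.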
Demonic nested sampling of the ten-state Potts model

Demon: f(K) ∝ Θ(K_max − K) K^{d/2−1} (d-dimensional harmonic oscillator, where d = dimension of configuration space)

  p(θ | ϵ) ∝ Θ[ϵ − E(θ)] π(θ) × { [ϵ − E(θ)]^{d/2},  ϵ − E(θ) ≤ K_max;   K_max^{d/2},  ϵ − E(θ) > K_max }

[Figure: A — energy bounds ϵ_k versus iteration k for standard and demonic NS; B — energy histograms; C — relative accuracy of log Z [%] as a function of the demon capacity K_max]
Nested sampling in phase space

• In continuous configuration spaces, it is convenient to unfold the demon and introduce momenta:

  f(K) = ∫ δ[K − K(ξ)] dξ   where   K(ξ) = ½ Σ_{i=1}^d ξ_i²   (kinetic energy)

• The marginal distribution in configuration space is

  p(θ | ϵ) ∝ Θ[ϵ − E(θ)] π(θ) × { [ϵ − E(θ)]^{d/2},  ϵ − E(θ) ≤ K_max;   K_max^{d/2},  ϵ − E(θ) > K_max }

• Hamiltonian dynamics for exploration: (θ, ξ) →_L (θ′, ξ′) where L is an integrator (e.g. the leapfrog)
Microcanonical Hamiltonian Monte Carlo

• 2(d+1)-dimensional phase space: implement the demon D as a harmonic oscillator with energy D = (ξ²_{d+1} + θ²_{d+1})/2

Require: ϵ (total energy), configuration θ with E(θ) < ϵ
  θ_{d+1} = 0                             ▷ Initialize demon D
  while not converged do
      ξ ∼ N(0, 1)                         ▷ Draw momenta from (d+1)-dim Gaussian
      ξ ← ξ × √(ϵ − E − D) / ∥ξ∥          ▷ Scale momenta so as to match excess energy
      (θ, ξ) →_L (θ′, ξ′)                 ▷ Run the leapfrog algorithm
      H′ = E(θ′) + K(ξ′)                  ▷ Compute total energy of candidate
      if H′ < ϵ then
          θ ← θ′, E ← E(θ)                ▷ Accept
      end if
  end while
Application to GS peptide

• A: Native structure of the GS peptide
• B: Evolution of the energy E(θ_k) (goodness-of-fit) during nested sampling
• C: Structure's accuracy measured by the root-mean-square deviation (RMSD [Å]) to the crystal structure
Other demons

• Distribution of the system's energy:

  p(E | ϵ) = Θ(ϵ − E) g(E) F(ϵ − E) / Y(ϵ)

  demon                      pdf f(K)                            cdf F(K)
  Gauss                      (β/√(2π)) e^{−β²K²/2}               ½[1 + erf(βK/√2)]
  Logarithmic oscillator     √(8πβ) e^{βK}                       √(8π/β) e^{βK}
    (K(ξ₁, ξ₂) = ½ξ₁² + β^{−1} ln|ξ₂|)
  Fermi                      β e^{−βK} / (1 + e^{−βK})²          1 / (1 + e^{−βK})
Application to SH3 domain

• Structure determination from sparse distance data measured by NMR spectroscopy

• The structure ensemble is as accurate and precise as with parallel tempering