Stratified Markov Chain Monte Carlo
Brian Van Koten, University of Massachusetts, Amherst, Department of Mathematics and Statistics
with A. Dinner, J. Tempkin, E. Thiede, B. Vani, and J. Weare
June 28, 2019
Sampling Problems
What is the probability of finding a protein in a given conformation? Compute by sampling from the Boltzmann distribution. [Figure from Folding@home]
Bayesian inference for an ODE model of circadian rhythms. Compute by sampling from the posterior distribution. [Figure from Phong, et al, PNAS, 2012]
Markov Chain Monte Carlo (MCMC)
Goal: Compute π(g) := ∫ g(x) π(dx).
MCMC Method: Choose a Markov chain X_n so that
lim_{N→∞} (1/N) Σ_{n=0}^{N−1} g(X_n) = π(g), i.e. "X_n samples π."
[Figure: an MCMC trajectory X_n (position vs. time step) alongside the target density.]
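The ergodic average above can be sketched in a few lines. This is a minimal illustration (not the speaker's code), using a Metropolis random walk; the function names are mine.

```python
import numpy as np

# Minimal sketch: estimate pi(g) with a Metropolis random walk whose
# stationary distribution is pi (specified through its log-density).
def mcmc_average(log_pi, g, x0, n_steps, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x, lp = x0, log_pi(x0)
    total = 0.0
    for _ in range(n_steps):
        y = x + step * rng.standard_normal()   # propose a local move
        lpy = log_pi(y)
        if np.log(rng.random()) < lpy - lp:    # Metropolis accept/reject
            x, lp = y, lpy
        total += g(x)
    return total / n_steps                     # (1/N) * sum_n g(X_n) -> pi(g)

# Example: standard normal target, g(x) = x^2, so pi(g) = 1.
est = mcmc_average(lambda x: -0.5 * x**2, lambda x: x**2, 0.0, 200_000)
```

For a unimodal target like this, the time average converges quickly; the next slide shows where plain MCMC struggles.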
Difficulties with MCMC
[Figure: an MCMC trajectory X_n trapped in one mode of a bimodal target density, with a level M marked in the tail.]
Multimodality: multimodality ⇒ slow convergence.
Tails: need a large sample to compute small probabilities, e.g. π([M, ∞)).
Sketch of Stratified MCMC
1. Choose a family of strata, i.e. distributions π_i whose supports cover the support of the target π.
2. Sample the strata by MCMC.
3. Estimate π(g) from the samples of the strata.
Typical Strata: π_i(dx) ∝ ψ_i(x) π(dx) for "localized" ψ_i.
Why Stratify?
• Strata may be unimodal, even if π is multimodal.
• Can concentrate sampling in the tail.
[Figure: a bimodal target distribution and the corresponding strata π_i.]
History of Stratification
Surveys: [Russian census, late 1800s], [Neyman, 1937]
Bayes factors: [Geyer, 1994]
Selection bias models: [Vardi, 1985]
Free energy: [Umbrella Sampling, 1977], [WHAM, 1992], [MBAR, 2008]
Ion channels: [Berneche, et al, 2001]
Protein folding: [Boczko, et al, 1995]
Problems:
1. WHAM/MBAR are complicated iterative methods.
2. No clear story explaining the benefits of stratification.
3. Stratification is underappreciated as a general strategy.
4. Need good error bars for adaptivity.
BvK, et al: Propose the Eigenvector Method for Umbrella Sampling; develop the story, error bars, and stratification for dynamical quantities.
Eigenvector Method for Umbrella Sampling (EMUS) [BvK, et al]
Bias Functions: {ψ_i}_{i=1}^L with Σ_{i=1}^L ψ_i(x) = 1 and ψ_i(x) ≥ 0.
Note: The user chooses the bias functions.
Weights: z_i = π(ψ_i)
Strata: π_i(dx) = z_i^{−1} ψ_i(x) π(dx)
[Figure: target distribution, bias functions ψ_i, and the resulting strata π_i.]
Eigenvector Method for Umbrella Sampling (EMUS) [BvK, et al]
Goal: Write π(g) in terms of averages over the strata π_i(dx) = z_i^{−1} ψ_i(x) π(dx).
First, decompose π(g) as a weighted sum. Since the ψ_i sum to one,
π(g) = Σ_{i=1}^L ∫ g(x) ψ_i(x) π(dx) = Σ_{i=1}^L z_i ∫ g(x) (ψ_i(x)/z_i) π(dx) = Σ_{i=1}^L z_i π_i(g).
How to express the weights z_i = π(ψ_i) as averages over the strata?
Eigenvector Method for Umbrella Sampling (EMUS) [BvK, et al]
Goal: Write π(g) in terms of averages over the strata π_i(dx) = z_i^{−1} ψ_i(x) π(dx).
First, decompose π(g) as a weighted sum: π(g) = Σ_{i=1}^L z_i π_i(g).
To express the weights z_i = π(ψ_i) as averages over the strata, note
z_j = π(ψ_j) = Σ_{i=1}^L z_i π_i(ψ_j)  ⟺  z^T = z^T F (an eigenproblem), where F_ij = π_i(ψ_j) is the overlap matrix.
Eigenvector Method for Umbrella Sampling (EMUS) [BvK, et al]
Recall: π(g) = Σ_{i=1}^L z_i π_i(g), and the weights solve the eigenproblem z^T = z^T F, where F_ij = π_i(ψ_j) is the overlap matrix.
Why does the eigenproblem determine z?
1. F is stochastic; z is a probability vector.
2. If F is irreducible, z is the unique solution of the eigenproblem.
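Since z is the stationary distribution of the stochastic matrix F, it can be recovered numerically as the left eigenvector of F with eigenvalue 1. A short sketch (the helper name is mine):

```python
import numpy as np

# Sketch: recover the weight vector z from an overlap matrix F by solving
# the eigenproblem z^T = z^T F, i.e. the stationary distribution of F.
def stationary(F):
    vals, vecs = np.linalg.eig(F.T)        # left eigenvectors of F
    k = np.argmin(np.abs(vals - 1.0))      # eigenvalue 1 (F is stochastic)
    z = np.real(vecs[:, k])
    return z / z.sum()                     # normalize to a probability vector

# Toy overlap matrix (rows sum to one, irreducible):
F = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
z = stationary(F)
assert np.allclose(z @ F, z)               # fixed-point property z^T F = z^T
```

Irreducibility of F (i.e., enough overlap between neighboring bias functions) is exactly what makes this eigenvector unique.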
Eigenvector Method for Umbrella Sampling (EMUS) [BvK, et al]
Recall: π(g) = Σ_{i=1}^L z_i π_i(g), and z^T = z^T F for F_ij = π_i(ψ_j).
EMUS Algorithm:
1. Choose bias functions ψ_i and processes X^i_n sampling the strata.
2. Compute ḡ_i := (1/N_i) Σ_{n=1}^{N_i} g(X^i_n) to estimate π_i(g).
3. Compute F̄_ij := (1/N_i) Σ_{n=1}^{N_i} ψ_j(X^i_n) to estimate F.
4. Solve the eigenproblem z̄^T = z̄^T F̄ to estimate the weights z.
5. Output ḡ^EM = Σ_{i=1}^L z̄_i ḡ_i.
Key Point: The simplicity of EMUS enables analysis of stratification.
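Steps 1–5 fit in a short script. The following is an illustrative one-dimensional sketch, not the authors' implementation: a symmetric double-well target, Gaussian bias functions normalized into a partition of unity, and a Metropolis chain per stratum; all parameter choices here are mine.

```python
import numpy as np

# Minimal 1D EMUS sketch. Target: pi ~ exp(-V/eps) with a double-well V.
rng = np.random.default_rng(1)
eps = 0.5
V = lambda x: (x**2 - 1.0)**2
centers = np.linspace(-1.5, 1.5, 7)        # one bias function per center

def psi(x):                                 # partition of unity: sums to one
    w = np.exp(-0.5 * ((x - centers) / 0.4)**2)
    return w / w.sum()

L, N_i = len(centers), 5_000
gbar, Fbar = np.zeros(L), np.zeros((L, L))
g = lambda x: x                             # observable; pi(g) = 0 by symmetry
for i in range(L):
    # Step 1-3: Metropolis chain targeting stratum i, density ∝ psi_i * pi
    x = centers[i]
    logp = lambda y: np.log(psi(y)[i] + 1e-300) - V(y) / eps
    lp = logp(x)
    for _ in range(N_i):
        y = x + 0.5 * rng.standard_normal()
        lpy = logp(y)
        if np.log(rng.random()) < lpy - lp:
            x, lp = y, lpy
        gbar[i] += g(x) / N_i               # estimates pi_i(g)
        Fbar[i] += psi(x) / N_i             # Fbar[i, j] estimates pi_i(psi_j)

# Step 4: solve zbar^T = zbar^T Fbar (left eigenvector, eigenvalue 1)
vals, vecs = np.linalg.eig(Fbar.T)
z = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
z /= z.sum()
g_EM = z @ gbar                             # Step 5: EMUS estimate of pi(g)
```

Note that because each ψ_i sums to one across strata, every row of F̄ is automatically stochastic, so the eigenproblem in step 4 is well posed whenever neighboring strata overlap.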
EMUS Analysis: Outline
1. Sensitivity of ḡ^EM to sampling error.
2. Dependence of sampling error on the choice of strata.
3. Stories involving multimodality and tails.
Quantifying Sensitivity to Sampling Error I
For F irreducible and stochastic, let z(F) be the unique solution of z(F)^T = z(F)^T F.
Let P_i^F[τ_j < τ_i] denote the probability of hitting j before returning to i, conditioned on starting from i, for a Markov chain on {1, …, L} with transition matrix F.
Theorem [BvK, et al]:
1/(2 P_i^F[τ_j < τ_i]) ≤ max_{m=1,…,L} |∂ log z_m/∂F_ij (F)| ≤ 1/P_i^F[τ_j < τ_i].
Led to new perturbation bounds for Markov chains [BvK, et al].
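The hitting probabilities appearing in the bound can be computed exactly by a small linear solve: condition on one step from i, then solve for the probability that intermediate states reach j before i. A sketch (the helper name is mine):

```python
import numpy as np

# Sketch: compute P_i^F[tau_j < tau_i], the probability that a chain with
# transition matrix F, started at i, visits j before returning to i.
def hit_before_return(F, i, j):
    L = F.shape[0]
    others = [k for k in range(L) if k not in (i, j)]
    # For intermediate states k, h[k] = P_k[hit j before i] solves the
    # linear system h[k] = F[k, j] + sum_{l in others} F[k, l] * h[l],
    # with boundary values h[j] = 1 and h[i] = 0.
    A = np.eye(len(others)) - F[np.ix_(others, others)]
    h = np.linalg.solve(A, F[others, j])
    full = np.zeros(L)
    full[j] = 1.0
    full[others] = h
    return F[i] @ full          # one step from i, then reach j before i

F = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
p = hit_before_return(F, 0, 2)  # small p => z is sensitive to errors in F
```

A small hitting probability (poor overlap between distant strata) makes 1/P large, i.e., the weights are then highly sensitive to errors in the corresponding entries of F̄.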
Quantifying Sensitivity to Sampling Error II
Assumption: A CLT holds for the MCMC averages:
√N_i (ḡ_i − π_i(g)) →d N(0, C(ḡ_i)), where C(ḡ_i) is the asymptotic variance.
Theorem [BvK, et al]: √N (ḡ^EM − π(g)) →d N(0, C(ḡ^EM)), where
C(ḡ^EM) ≲ Σ_{i=1}^L z_i^2 [ C(ḡ_i)/κ_i + var_{π_i}(g) Σ_{j≠i, F_ij>0} (1/P_i^F[τ_j < τ_i]^2) × C(F̄_ij)/κ_i ].
Here C(F̄_ij)/κ_i is the error in F̄, and 1/P_i^F[τ_j < τ_i]^2 is the sensitivity to error in F̄.
Notation: N is the total sample size, with N_i = κ_i N samples from π_i.
Dependence of Sampling Error on Strata I
Write π(dx) = Z^{−1} exp(−V(x)/ε) for some potential V.
[Figure: the potential V and the corresponding density π.]
Assume the bias functions ψ_i are piecewise constant.
Assume X^i_t is overdamped Langevin dynamics with reflecting boundaries:
dX^i_t = −∇V(X^i_t) dt + √(2ε) dB^i_t + reflecting BCs,
where the drift term is gradient descent and the dB^i_t term is noise.
Dependence of Sampling Error on Strata II
Let π(dx) = Z^{−1} exp(−V(x)/ε) for some potential V.
[Figure: the potential V and the density π for two values of ε.]
Theorem [BvK, et al]: For overdamped Langevin with reflecting BCs,
C(ḡ_i)/var_{π_i}(g) ≲ (D²/ε) × exp( (max_{supp π_i} V − min_{supp π_i} V)/ε ),
where D²/ε is the diffusion scaling and the exponential is the Arrhenius factor.
Notation: D is the diameter of the support of π_i.