Multi-parameter models: Metropolis sampling
Applied Bayesian Statistics
Dr. Earvin Balderama
Department of Mathematics & Statistics
Loyola University Chicago
October 17, 2017
MCMC: Gibbs sampling

In Gibbs sampling, each parameter is updated by sampling from its full conditional distribution. This is possible with conjugate priors. However, if the prior is not conjugate, it is not obvious how to draw from the full conditional. For example, if Y ∼ Normal(µ, 1) and µ ∼ Beta(a, b), then

    f(µ | Y) ∝ exp(−(Y − µ)²/2) µ^(a−1) (1 − µ)^(b−1).

For some likelihoods, there is no known conjugate prior, so direct sampling from the posterior may not be possible. In these cases we can use Metropolis sampling.
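The key point is that even when we cannot sample from this posterior directly, we can still evaluate its density up to a constant. A minimal sketch in Python (the data value y = 0.7 and hyperparameters a = b = 2 are illustrative choices, not from the slides):

```python
import numpy as np

# Unnormalized log-posterior for the non-conjugate example:
# Y ~ Normal(mu, 1) with prior mu ~ Beta(a, b).
# We can evaluate it up to a constant, but not sample from it directly.
def log_post(mu, y=0.7, a=2.0, b=2.0):
    if mu <= 0.0 or mu >= 1.0:         # Beta prior restricts mu to (0, 1)
        return -np.inf
    log_lik = -0.5 * (y - mu) ** 2     # Normal(mu, 1) log-likelihood, up to a constant
    log_prior = (a - 1) * np.log(mu) + (b - 1) * np.log(1 - mu)
    return log_lik + log_prior
```

This pointwise evaluation is exactly what Metropolis sampling needs, since the algorithm only uses ratios of posterior densities.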
MCMC: Metropolis sampling

Metropolis sampling is a version of rejection sampling. It performs a kind of random walk around the parameter space, and either accepts or rejects each proposed move based on a ratio of posterior densities: it always accepts a move to a location of higher density, but only sometimes accepts a move to a location of lower density.

We can perform the Metropolis sampling algorithm for each parameter, one at a time. To make the algorithm and the following pseudocode easier to read and understand (hopefully), we focus on updating only one parameter, θ.
MCMC: Metropolis algorithm

1. Set initial value θ^(0).
2. For iteration t:
   1. Draw a candidate θ* from a symmetric proposal distribution, J(θ* | θ^(t−1)).
   2. Compute the Metropolis ratio, R = f(θ* | y) / f(θ^(t−1) | y).
   3. Set θ^(t) = θ* with acceptance probability min(R, 1); otherwise set θ^(t) = θ^(t−1).

The sequence θ^(1), θ^(2), ... converges to the target distribution, f(θ | y).
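A minimal random-walk Metropolis sampler, reusing the log_post function sketched earlier; the seed, starting value, step size s, and chain length are all illustrative choices. Working on the log scale avoids numerical underflow in the ratio R:

```python
import numpy as np

rng = np.random.default_rng(42)

def metropolis(log_post, theta0, s, n_iter=10_000):
    """Random-walk Metropolis with a symmetric Normal(theta, s^2) proposal."""
    theta = theta0
    samples = np.empty(n_iter)
    n_accept = 0
    for t in range(n_iter):
        theta_star = rng.normal(theta, s)               # 1. draw candidate
        log_R = log_post(theta_star) - log_post(theta)  # 2. log Metropolis ratio
        if np.log(rng.uniform()) < log_R:               # 3. accept w.p. min(R, 1)
            theta = theta_star
            n_accept += 1
        samples[t] = theta
    return samples, n_accept / n_iter

samples, acc_rate = metropolis(log_post, theta0=0.5, s=0.5)
```

Comparing log(U) against log R for U ∼ Uniform(0, 1) is equivalent to accepting with probability min(R, 1): when R ≥ 1 the move is always accepted.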
MCMC: Metropolis algorithm

The proposal distribution must be symmetric, i.e., it must satisfy J(θ_a | θ_b) = J(θ_b | θ_a). This means that the probability of "jumping" from θ_a to θ_b is the same as if you started at θ_b and used the same jumping rule to jump to θ_a. For example, if you propose a new candidate given the current value by θ* | θ^(t−1) ∼ Normal(θ^(t−1), s_t²), we get the same density in reverse: θ^(t−1) | θ* ∼ Normal(θ*, s_t²).

The standard deviation of the proposal distribution, s_t, is a tuning parameter. What if s_t is too small? What if s_t is too large? Ideally, s_t is tuned to give an acceptance probability between 0.25 and 0.60.
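To see why tuning matters: if s_t is too small, nearly every proposal is accepted but the chain takes tiny steps and explores slowly; if s_t is too large, most proposals land in low-density regions and are rejected, so the chain sticks in place. A quick check using the metropolis sketch above (the three s values are arbitrary):

```python
# Acceptance rate as a function of the proposal standard deviation s.
# Too small: acceptance near 1, but slow exploration and high autocorrelation.
# Too large: acceptance near 0, because most jumps are rejected.
for s in (0.01, 0.5, 5.0):
    _, acc = metropolis(log_post, theta0=0.5, s=s)
    print(f"s = {s:5.2f}   acceptance rate = {acc:.2f}")
```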
MCMC: Metropolis-Hastings algorithm

The Metropolis-Hastings (MH) algorithm generalizes Metropolis to allow for asymmetric proposal distributions. For example, if θ ∈ [0, 1], then a reasonable candidate is

    θ* | θ^(t−1) ∼ Beta(10 θ^(t−1), 10 (1 − θ^(t−1))).

But what is the consequence of using an asymmetric proposal distribution? Now J(θ_a | θ_b) ≠ J(θ_b | θ_a), so we need to account for this asymmetry in the Metropolis ratio.
MCMC: Metropolis-Hastings algorithm

1. Set initial value θ^(0).
2. For iteration t:
   1. Draw a candidate θ* from a proposal distribution, J(θ* | θ^(t−1)).
   2. Compute the Metropolis ratio,
      R = [f(θ* | y) · J(θ^(t−1) | θ*)] / [f(θ^(t−1) | y) · J(θ* | θ^(t−1))].
   3. Set θ^(t) = θ* with acceptance probability min(R, 1); otherwise set θ^(t) = θ^(t−1).

The sequence θ^(1), θ^(2), ... converges to the target distribution, f(θ | y).
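A sketch of the MH update with the Beta proposal from the previous slide, again reusing log_post, np, and rng from the earlier sketches; the concentration c = 10 matches the slide's example, while everything else is an illustrative choice. Note the extra terms correcting for the proposal asymmetry:

```python
from scipy.stats import beta

def metropolis_hastings(log_post, theta0, n_iter=10_000, c=10.0):
    """MH with the asymmetric proposal theta* | theta ~ Beta(c*theta, c*(1-theta))."""
    theta = theta0
    samples = np.empty(n_iter)
    for t in range(n_iter):
        theta_star = rng.beta(c * theta, c * (1 - theta))
        # log R = log f(theta*|y) - log f(theta|y)
        #       + log J(theta | theta*) - log J(theta* | theta)
        log_R = (log_post(theta_star) - log_post(theta)
                 + beta.logpdf(theta, c * theta_star, c * (1 - theta_star))
                 - beta.logpdf(theta_star, c * theta, c * (1 - theta)))
        if np.log(rng.uniform()) < log_R:
            theta = theta_star
        samples[t] = theta
    return samples

draws = metropolis_hastings(log_post, theta0=0.5)
```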
MCMC: Metropolis-Hastings algorithm

1. How is Metropolis similar/different to Metropolis-Hastings?
2. How is Gibbs similar/different to Metropolis?
3. What if we take the proposal distribution to be the full conditional distribution? What would be the Metropolis ratio?
MCMC: Variants

You can combine Gibbs and Metropolis in the obvious way: sample directly from the full conditionals when possible, and use Metropolis updates otherwise.

Adaptive MCMC varies the proposal distribution throughout the chain.

Hamiltonian Monte Carlo (HMC) uses the gradient of the posterior in the proposal distribution, and is the algorithm used in Stan.
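As a hedged illustration of the Gibbs/Metropolis combination (the model, fake data, and hyperparameters below are all invented for the example): take Y_i ∼ Normal(µ, σ²) with µ ∼ Beta(a, b) and σ² ∼ InverseGamma(c, d). The full conditional of σ² is conjugate, so it gets a Gibbs draw, while µ is non-conjugate and gets a Metropolis step:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(0.6, 0.3, size=50)       # fake data, for illustration only
a, b, c, d = 2.0, 2.0, 2.0, 1.0         # illustrative hyperparameters
n, s = len(y), 0.2                      # sample size and Metropolis step size

mu, sigma2 = 0.5, 1.0
draws = np.empty((5_000, 2))
for t in range(draws.shape[0]):
    # Gibbs step: sigma2 | mu, y ~ InvGamma(c + n/2, d + sum((y - mu)^2)/2)
    rate = d + 0.5 * np.sum((y - mu) ** 2)
    sigma2 = 1.0 / rng.gamma(c + n / 2, 1.0 / rate)

    # Metropolis step for mu (non-conjugate Beta prior)
    mu_star = rng.normal(mu, s)
    if 0.0 < mu_star < 1.0:
        def lp(m):  # log full conditional of mu, up to a constant
            return (-0.5 * np.sum((y - m) ** 2) / sigma2
                    + (a - 1) * np.log(m) + (b - 1) * np.log(1 - m))
        if np.log(rng.uniform()) < lp(mu_star) - lp(mu):
            mu = mu_star
    draws[t] = mu, sigma2
```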
MCMC: Blocked Gibbs/Metropolis

If a group of parameters is highly correlated, convergence can be slow. One way to improve Gibbs sampling is a block update. For example, in linear regression we might iterate between sampling the block (β_1, ..., β_p) and σ².

Blocked Metropolis is possible too. For example, the proposal for (β_1, ..., β_p) could be multivariate normal.
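A sketch of a blocked Metropolis update for the regression coefficients, proposing the whole vector at once from a multivariate normal centered at the current value; the design matrix, true coefficients, priors, and proposal covariance below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
p = 3
X = rng.normal(size=(100, p))                       # fake design matrix
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=100)

def log_post_beta(beta, sigma2=1.0, tau2=100.0):
    """Log posterior of beta: Normal likelihood (known sigma2), Normal(0, tau2) priors."""
    resid = y - X @ beta
    return -0.5 * np.sum(resid ** 2) / sigma2 - 0.5 * np.sum(beta ** 2) / tau2

beta = np.zeros(p)
prop_cov = 0.01 * np.eye(p)       # tuned proposal covariance (illustrative)
draws = np.empty((5_000, p))
for t in range(draws.shape[0]):
    beta_star = rng.multivariate_normal(beta, prop_cov)  # symmetric block proposal
    if np.log(rng.uniform()) < log_post_beta(beta_star) - log_post_beta(beta):
        beta = beta_star
    draws[t] = beta
```

Because the multivariate normal proposal is symmetric, the ratio needs no proposal correction; updating the block jointly lets the chain move along directions of high correlation among the β's.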
MCMC: Summary

With the combination of Gibbs, Metropolis, and Metropolis-Hastings sampling, we can fit virtually any model.

In some cases, Bayesian computing is actually preferable to maximum likelihood analysis. In most cases, Bayesian computing is slower; however, in the opinion of many, it is worth the wait for improved uncertainty quantification and interpretability.

In all cases, it is important to carefully monitor convergence.