Gibbs sampling
Dr. Jarad Niemi
Iowa State University
March 29, 2018
Outline
- Two-component Gibbs sampler
- Full conditional distribution
- K-component Gibbs sampler
- Blocked Gibbs sampler
- Metropolis-within-Gibbs
- Slice sampler
- Latent variable augmentation
Two-component Gibbs sampler

Suppose our target distribution is $p(\theta|y)$ with $\theta = (\theta_1, \theta_2)$, and we can sample from $p(\theta_1|\theta_2, y)$ and $p(\theta_2|\theta_1, y)$. Beginning with an initial value $(\theta_1^{(0)}, \theta_2^{(0)})$, an iteration of the Gibbs sampler involves

1. Sampling $\theta_1^{(t)} \sim p(\theta_1 | \theta_2^{(t-1)}, y)$.
2. Sampling $\theta_2^{(t)} \sim p(\theta_2 | \theta_1^{(t)}, y)$.

Thus, in order to run a Gibbs sampler, we need to derive the full conditional distributions for $\theta_1$ and $\theta_2$, i.e. the distribution for each of $\theta_1$ and $\theta_2$ conditional on everything else.
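In R, the structure of a two-component Gibbs sampler is a single loop that alternates the two draws. A minimal sketch, where sample_theta1 and sample_theta2 are hypothetical stand-ins for draws from the two full conditionals:

gibbs_two_component = function(theta0, n_iter, y) {
  theta = matrix(theta0, nrow = n_iter, ncol = 2, byrow = TRUE)
  for (t in 2:n_iter) {
    theta[t, 1] = sample_theta1(theta[t - 1, 2], y)  # theta1 | theta2, y
    theta[t, 2] = sample_theta2(theta[t,     1], y)  # theta2 | theta1, y
  }
  return(theta)
}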
Bivariate normal example

Let our target be
$$\theta \sim N_2(0, \Sigma), \qquad \Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$
Then
$$\theta_1 | \theta_2 \sim N(\rho\theta_2, [1-\rho^2]), \qquad \theta_2 | \theta_1 \sim N(\rho\theta_1, [1-\rho^2])$$
are the conditional distributions. Assuming initial value $(\theta_1^{(0)}, \theta_2^{(0)})$, the Gibbs sampler proceeds as follows:

Iteration | Sample $\theta_1$ | Sample $\theta_2$
1 | $\theta_1^{(1)} \sim N(\rho\theta_2^{(0)}, [1-\rho^2])$ | $\theta_2^{(1)} \sim N(\rho\theta_1^{(1)}, [1-\rho^2])$
$\vdots$ | $\vdots$ | $\vdots$
t | $\theta_1^{(t)} \sim N(\rho\theta_2^{(t-1)}, [1-\rho^2])$ | $\theta_2^{(t)} \sim N(\rho\theta_1^{(t)}, [1-\rho^2])$
R code for bivariate normal Gibbs sampler

gibbs_bivariate_normal = function(theta0, n_points, rho) {
  theta = matrix(theta0, nrow = n_points, ncol = 2, byrow = TRUE)
  v = sqrt(1 - rho^2)  # conditional standard deviation
  for (i in 2:n_points) {
    theta[i, 1] = rnorm(1, rho * theta[i - 1, 2], v)
    theta[i, 2] = rnorm(1, rho * theta[i,     1], v)
  }
  return(theta)
}

# The inline assignments set n = 20 and rho = 0.9 while passing them as arguments.
theta = gibbs_bivariate_normal(c(-3, 3), n <- 20, rho = rho <- 0.9)
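As a quick sanity check (not in the original slides), a longer run should reproduce the target correlation:

# With many draws, the empirical correlation should be close to rho.
theta_long = gibbs_bivariate_normal(c(-3, 3), n_points = 1e4, rho = 0.9)
cor(theta_long[, 1], theta_long[, 2])  # approximately 0.9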
Two-component Gibbs sampling Bivariate normal example 3 2 1 θ 2 0 −1 −2 −3 −3 −2 −1 0 1 2 3 θ 1 Jarad Niemi (Iowa State) Gibbs sampling March 29, 2018 6 / 32
Normal model

Suppose $Y_i \stackrel{ind}{\sim} N(\mu, \sigma^2)$ and we assume the prior
$$\mu \sim N(m, C) \quad \text{and} \quad \sigma^2 \sim \text{Inv-}\chi^2(v, s^2).$$
Note: this is NOT the conjugate prior. The full posterior we are interested in is
$$p(\mu, \sigma^2|y) \propto (\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \right) \exp\left( -\frac{1}{2C}(\mu - m)^2 \right) \times (\sigma^2)^{-(v/2+1)} \exp\left( -\frac{vs^2}{2\sigma^2} \right)$$
To run the Gibbs sampler, we need to derive $\mu | \sigma^2, y$ and $\sigma^2 | \mu, y$.
Derive $\mu | \sigma^2, y$

Recall
$$p(\mu, \sigma^2|y) \propto (\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \right) \exp\left( -\frac{1}{2C}(\mu - m)^2 \right) \times (\sigma^2)^{-(v/2+1)} \exp\left( -\frac{vs^2}{2\sigma^2} \right)$$
Now find $\mu | \sigma^2, y$:
$$\begin{aligned}
p(\mu | \sigma^2, y) &\propto p(\mu, \sigma^2|y) \\
&\propto \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \right) \exp\left( -\frac{1}{2C}(\mu - m)^2 \right) \\
&\propto \exp\left( -\frac{1}{2}\left[\frac{1}{\sigma^2/n} + \frac{1}{C}\right]\left[ \mu^2 - 2\mu \left[\frac{1}{\sigma^2/n} + \frac{1}{C}\right]^{-1}\left[\frac{\bar{y}}{\sigma^2/n} + \frac{m}{C}\right] \right] \right)
\end{aligned}$$
thus $\mu | \sigma^2, y \sim N(m', C')$ where
$$m' = C'\left[ \frac{\bar{y}}{\sigma^2/n} + \frac{m}{C} \right], \qquad C' = \left[ \frac{1}{\sigma^2/n} + \frac{1}{C} \right]^{-1}.$$
Derive $\sigma^2 | \mu, y$

Recall
$$p(\mu, \sigma^2|y) \propto (\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \right) \exp\left( -\frac{1}{2C}(\mu - m)^2 \right) \times (\sigma^2)^{-(v/2+1)} \exp\left( -\frac{vs^2}{2\sigma^2} \right)$$
Now find $\sigma^2 | \mu, y$:
$$p(\sigma^2 | \mu, y) \propto p(\mu, \sigma^2|y) \propto (\sigma^2)^{-([v+n]/2+1)} \exp\left( -\frac{1}{2\sigma^2}\left[ vs^2 + \sum_{i=1}^n (y_i - \mu)^2 \right] \right)$$
and thus $\sigma^2 | \mu, y \sim \text{Inv-}\chi^2(v', (s')^2)$ where
$$v' = v + n, \qquad v'(s')^2 = vs^2 + \sum_{i=1}^n (y_i - \mu)^2.$$
R code for Gibbs sampler

# Data and prior
y = rnorm(10)
m = 0; C = 10
v = 1; s = 1

# Initial values
mu = 0
sigma2 = 1

# Save structures
n_iter = 1000
mu_keep = rep(NA, n_iter)
sigma_keep = rep(NA, n_iter)

# Pre-calculate
n = length(y)
sum_y = sum(y)
vp = v + n
vs2 = v * s^2
R code for Gibbs sampler (continued)

# Gibbs sampler
for (i in 1:n_iter) {
  # Sample mu | sigma2, y
  Cp = 1 / (n / sigma2 + 1 / C)
  mp = Cp * (sum_y / sigma2 + m / C)
  mu = rnorm(1, mp, sqrt(Cp))

  # Sample sigma2 | mu, y: an Inv-chi^2(vp, vpsp2/vp) draw is the
  # reciprocal of a Gamma(vp/2, vpsp2/2) draw
  vpsp2 = vs2 + sum((y - mu)^2)
  sigma2 = 1 / rgamma(1, vp / 2, vpsp2 / 2)

  # Save iterations
  mu_keep[i] = mu
  sigma_keep[i] = sqrt(sigma2)
}
Posteriors

[Figure: trace plots (top) and posterior density estimates (bottom) for mu and sigma over the 1000 iterations.]
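Plots along these lines can be reproduced with base R graphics (a sketch; the original figure was likely produced with ggplot2):

# Trace plots and posterior density estimates from the saved draws.
par(mfrow = c(2, 2))
plot(mu_keep,    type = "l", xlab = "t", ylab = "mu")
plot(sigma_keep, type = "l", xlab = "t", ylab = "sigma")
plot(density(mu_keep),    main = "mu")
plot(density(sigma_keep), main = "sigma")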
K-component Gibbs sampler

Suppose $\theta = (\theta_1, \ldots, \theta_K)$; then an iteration of a K-component Gibbs sampler is
$$\begin{aligned}
\theta_1^{(t)} &\sim p(\theta_1 | \theta_2^{(t-1)}, \ldots, \theta_K^{(t-1)}, y) \\
\theta_2^{(t)} &\sim p(\theta_2 | \theta_1^{(t)}, \theta_3^{(t-1)}, \ldots, \theta_K^{(t-1)}, y) \\
&\vdots \\
\theta_k^{(t)} &\sim p(\theta_k | \theta_1^{(t)}, \ldots, \theta_{k-1}^{(t)}, \theta_{k+1}^{(t-1)}, \ldots, \theta_K^{(t-1)}, y) \\
&\vdots \\
\theta_K^{(t)} &\sim p(\theta_K | \theta_1^{(t)}, \ldots, \theta_{K-1}^{(t)}, y)
\end{aligned}$$
The distributions above are called the full conditional distributions. If some of the $\theta_k$ are vectors, then this is called a block Gibbs sampler.
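In code, one iteration cycles through the K components, always conditioning on the most recent value of every other component. A minimal sketch, assuming full_conditionals is a hypothetical list of functions whose k-th element draws $\theta_k$ given the current $\theta$ vector and the data:

gibbs_k_component = function(theta0, n_iter, y, full_conditionals) {
  K = length(theta0)
  theta = matrix(theta0, nrow = n_iter, ncol = K, byrow = TRUE)
  for (t in 2:n_iter) {
    theta[t, ] = theta[t - 1, ]  # carry forward last iteration's values
    for (k in 1:K) {
      # components 1..k-1 are already updated; k+1..K are from iteration t-1
      theta[t, k] = full_conditionals[[k]](theta[t, ], y)
    }
  }
  return(theta)
}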
Hierarchical normal model

Let
$$Y_{ij} \stackrel{ind}{\sim} N(\mu_i, \sigma^2), \qquad \mu_i \stackrel{ind}{\sim} N(\eta, \tau^2)$$
for $i = 1, \ldots, I$, $j = 1, \ldots, n_i$, $n = \sum_{i=1}^I n_i$, and prior
$$p(\eta, \tau^2, \sigma^2) \propto IG(\tau^2; a_\tau, b_\tau) \, IG(\sigma^2; a_\sigma, b_\sigma).$$
The full conditionals are
$$\begin{aligned}
p(\mu | \eta, \sigma^2, \tau^2, y) &= \prod_{i=1}^I p(\mu_i | \eta, \sigma^2, \tau^2, y_i) \\
p(\mu_i | \eta, \sigma^2, \tau^2, y_i) &= N\left( \left[\frac{1}{\sigma^2/n_i} + \frac{1}{\tau^2}\right]^{-1} \left[\frac{\bar{y}_i}{\sigma^2/n_i} + \frac{\eta}{\tau^2}\right], \ \left[\frac{1}{\sigma^2/n_i} + \frac{1}{\tau^2}\right]^{-1} \right) \\
p(\eta | \mu, \sigma^2, \tau^2, y) &= N\left( \bar{\mu}, \ \tau^2/I \right) \\
p(\sigma^2 | \mu, \eta, \tau^2, y) &= IG\left( a_\sigma + n/2, \ b_\sigma + \sum_{i=1}^I \sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2/2 \right) \\
p(\tau^2 | \mu, \eta, \sigma^2, y) &= IG\left( a_\tau + I/2, \ b_\tau + \sum_{i=1}^I (\mu_i - \eta)^2/2 \right)
\end{aligned}$$
where $n_i \bar{y}_i = \sum_{j=1}^{n_i} y_{ij}$ and $I\bar{\mu} = \sum_{i=1}^I \mu_i$.
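As a sketch (not from the original slides), one scan of this block Gibbs sampler can be written in R as follows, assuming y is a list whose i-th element holds the $n_i$ observations from group i, eta, sigma2, and tau2 hold the current values, and a_sigma, b_sigma, a_tau, b_tau are the prior hyperparameters:

# One scan of the block Gibbs sampler for the hierarchical normal model.
I    = length(y)           # number of groups
ni   = sapply(y, length)   # group sizes n_i
n    = sum(ni)
ybar = sapply(y, mean)     # group means

# mu_i | eta, sigma2, tau2, y: independent normals across groups
Ci = 1 / (ni / sigma2 + 1 / tau2)
mu = rnorm(I, Ci * (ni * ybar / sigma2 + eta / tau2), sqrt(Ci))

# eta | mu, tau2: normal centered at the mean of the mu_i
eta = rnorm(1, mean(mu), sqrt(tau2 / I))

# sigma2 | mu, y and tau2 | mu, eta: inverse gammas drawn via 1/rgamma
sse    = sum(unlist(Map(function(yi, mui) (yi - mui)^2, y, mu)))
sigma2 = 1 / rgamma(1, a_sigma + n / 2, b_sigma + sse / 2)
tau2   = 1 / rgamma(1, a_tau + I / 2, b_tau + sum((mu - eta)^2) / 2)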
Metropolis-within-Gibbs

We have discussed two Markov chain approaches to sample from a target distribution:
- Metropolis-Hastings algorithm
- Gibbs sampling

Gibbs sampling assumes we can sample from $p(\theta_k | \theta_{-k}, y)$ for all $k$, but what if we cannot sample from all of these full conditional distributions? For each $p(\theta_k | \theta_{-k}, y)$ that cannot be sampled directly, a single iteration of the Metropolis-Hastings algorithm can be substituted.
Bivariate normal with ρ = 0.9

Reconsider the bivariate normal example, substituting a Metropolis step in place of a Gibbs step:

gibbs_and_metropolis = function(theta0, n_points, rho) {
  theta = matrix(theta0, nrow = n_points, ncol = 2, byrow = TRUE)
  v = sqrt(1 - rho^2)
  for (i in 2:n_points) {
    theta[i, 1] = rnorm(1, rho * theta[i - 1, 2], v)

    # Now do a random-walk Metropolis step for theta_2
    theta_prop = rnorm(1, theta[i - 1, 2], 2.4 * v)  # optimal proposal scale
    logr = dnorm(theta_prop,      rho * theta[i, 1], v, log = TRUE) -
           dnorm(theta[i - 1, 2], rho * theta[i, 1], v, log = TRUE)
    theta[i, 2] = ifelse(log(runif(1)) < logr, theta_prop, theta[i - 1, 2])
  }
  return(theta)
}

theta = gibbs_and_metropolis(c(-3, 3), n, rho)
length(unique(theta[, 2])) / length(theta[, 2])  # acceptance rate

[1] 0.5

The factor 2.4 is the standard rule-of-thumb scaling for a random-walk proposal on a one-dimensional Gaussian target, which gives an acceptance rate of roughly 0.44; the observed rate of 0.5 here is a noisy estimate based on only 20 draws.
[Figure: Metropolis-within-Gibbs draws for the bivariate normal target, plotted as $\theta_2$ against $\theta_1$.]
Hierarchical normal model

Let
$$Y_{ij} \stackrel{ind}{\sim} N(\mu_i, \sigma^2), \qquad \mu_i \stackrel{ind}{\sim} N(\eta, \tau^2)$$
for $i = 1, \ldots, I$, $j = 1, \ldots, n_i$, $n = \sum_{i=1}^I n_i$, and prior
$$p(\eta, \tau, \sigma) \propto Ca^+(\tau; 0, b_\tau) \, Ca^+(\sigma; 0, b_\sigma).$$
The full conditionals are exactly the same except
$$\begin{aligned}
p(\sigma | \mu, \eta, \tau^2, y) &\propto IG\left( \sigma^2; \ n/2, \ \sum_{i=1}^I \sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2/2 \right) Ca^+(\sigma; 0, b_\sigma) \\
p(\tau | \mu, \eta, \sigma^2, y) &\propto IG\left( \tau^2; \ I/2, \ \sum_{i=1}^I (\mu_i - \eta)^2/2 \right) Ca^+(\tau; 0, b_\tau)
\end{aligned}$$
where $n_i \bar{y}_i = \sum_{j=1}^{n_i} y_{ij}$ and $I\bar{\mu} = \sum_{i=1}^I \mu_i$. These are not standard distributions, so Metropolis steps can be used for $\sigma$ and $\tau$ within the Gibbs sampler.
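A minimal sketch of such a Metropolis update for $\sigma$ (not from the original slides), assuming sse holds $\sum_{i}\sum_{j}(y_{ij} - \mu_i)^2$, n and b_sigma are set, and step_sd is a hypothetical tuning parameter:

# Unnormalized log full conditional for sigma: IG(sigma^2; n/2, sse/2) kernel
# times a half-Cauchy prior (the constant 2 in Ca+ cancels in the MH ratio).
log_target = function(sigma, sse, n, b_sigma) {
  if (sigma <= 0) return(-Inf)  # zero density for non-positive proposals
  -(n / 2 + 1) * log(sigma^2) - (sse / 2) / sigma^2 +
    dcauchy(sigma, 0, b_sigma, log = TRUE)
}

# Random-walk Metropolis update for sigma
sigma_prop = rnorm(1, sigma, step_sd)
logr = log_target(sigma_prop, sse, n, b_sigma) -
       log_target(sigma,      sse, n, b_sigma)
if (log(runif(1)) < logr) sigma = sigma_prop

The update for $\tau$ is analogous, with $I/2$ and $\sum_{i}(\mu_i - \eta)^2/2$ in place of $n/2$ and sse/2.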