Gibbs sampling
Dr. Jarad Niemi
Iowa State University
March 29, 2018
Outline
- Two-component Gibbs sampler
- Full conditional distribution
- K-component Gibbs sampler
- Blocked Gibbs sampler
- Metropolis-within-Gibbs
- Slice sampler
- Latent variable augmentation
Two-component Gibbs sampler

Suppose our target distribution is $p(\theta|y)$ with $\theta = (\theta_1, \theta_2)$, and we can sample from $p(\theta_1|\theta_2, y)$ and $p(\theta_2|\theta_1, y)$. Beginning with an initial value $(\theta_1^{(0)}, \theta_2^{(0)})$, an iteration of the Gibbs sampler involves

1. Sampling $\theta_1^{(t)} \sim p(\theta_1 | \theta_2^{(t-1)}, y)$.
2. Sampling $\theta_2^{(t)} \sim p(\theta_2 | \theta_1^{(t)}, y)$.

Thus, in order to run a Gibbs sampler, we need to derive the full conditional distributions for $\theta_1$ and $\theta_2$, i.e. the distribution for each of $\theta_1$ and $\theta_2$ conditional on everything else.
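In R, the structure of a two-component Gibbs sampler is a single loop that alternates the two draws. A minimal sketch, where sample_theta1 and sample_theta2 are hypothetical stand-ins for draws from the two full conditionals:

gibbs_two_component = function(theta0, n_iter, y) {
  theta = matrix(theta0, nrow = n_iter, ncol = 2, byrow = TRUE)
  for (t in 2:n_iter) {
    theta[t, 1] = sample_theta1(theta[t - 1, 2], y)  # theta1 | theta2, y
    theta[t, 2] = sample_theta2(theta[t,     1], y)  # theta2 | theta1, y
  }
  return(theta)
}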
Bivariate normal example

Let our target be
$$\theta \sim N_2(0, \Sigma), \qquad \Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$
Then
$$\theta_1 | \theta_2 \sim N(\rho\theta_2, [1-\rho^2]), \qquad \theta_2 | \theta_1 \sim N(\rho\theta_1, [1-\rho^2])$$
are the conditional distributions. Assuming initial value $(\theta_1^{(0)}, \theta_2^{(0)})$, the Gibbs sampler proceeds as follows:

Iteration | Sample $\theta_1$ | Sample $\theta_2$
1 | $\theta_1^{(1)} \sim N(\rho\theta_2^{(0)}, [1-\rho^2])$ | $\theta_2^{(1)} \sim N(\rho\theta_1^{(1)}, [1-\rho^2])$
$\vdots$ | $\vdots$ | $\vdots$
t | $\theta_1^{(t)} \sim N(\rho\theta_2^{(t-1)}, [1-\rho^2])$ | $\theta_2^{(t)} \sim N(\rho\theta_1^{(t)}, [1-\rho^2])$
R code for bivariate normal Gibbs sampler

gibbs_bivariate_normal = function(theta0, n_points, rho) {
  theta = matrix(theta0, nrow = n_points, ncol = 2, byrow = TRUE)
  v = sqrt(1 - rho^2)  # conditional standard deviation
  for (i in 2:n_points) {
    theta[i, 1] = rnorm(1, rho * theta[i - 1, 2], v)
    theta[i, 2] = rnorm(1, rho * theta[i,     1], v)
  }
  return(theta)
}

# The inline assignments set n = 20 and rho = 0.9 while passing them as arguments.
theta = gibbs_bivariate_normal(c(-3, 3), n <- 20, rho = rho <- 0.9)
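As a quick sanity check (not in the original slides), a longer run should reproduce the target correlation:

# With many draws, the empirical correlation should be close to rho.
theta_long = gibbs_bivariate_normal(c(-3, 3), n_points = 1e4, rho = 0.9)
cor(theta_long[, 1], theta_long[, 2])  # approximately 0.9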
Two-component Gibbs sampling Bivariate normal example 3 2 1 θ 2 0 −1 −2 −3 −3 −2 −1 0 1 2 3 θ 1 Jarad Niemi (Iowa State) Gibbs sampling March 29, 2018 6 / 32
Normal model

Suppose $Y_i \stackrel{ind}{\sim} N(\mu, \sigma^2)$ and we assume the prior
$$\mu \sim N(m, C) \quad \text{and} \quad \sigma^2 \sim \text{Inv-}\chi^2(v, s^2).$$
Note: this is NOT the conjugate prior. The full posterior we are interested in is
$$p(\mu, \sigma^2|y) \propto (\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \right) \exp\left( -\frac{1}{2C}(\mu - m)^2 \right) \times (\sigma^2)^{-(v/2+1)} \exp\left( -\frac{vs^2}{2\sigma^2} \right)$$
To run the Gibbs sampler, we need to derive $\mu | \sigma^2, y$ and $\sigma^2 | \mu, y$.
Derive $\mu | \sigma^2, y$

Recall
$$p(\mu, \sigma^2|y) \propto (\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \right) \exp\left( -\frac{1}{2C}(\mu - m)^2 \right) \times (\sigma^2)^{-(v/2+1)} \exp\left( -\frac{vs^2}{2\sigma^2} \right)$$
Now find $\mu | \sigma^2, y$:
$$\begin{aligned}
p(\mu | \sigma^2, y) &\propto p(\mu, \sigma^2|y) \\
&\propto \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \right) \exp\left( -\frac{1}{2C}(\mu - m)^2 \right) \\
&\propto \exp\left( -\frac{1}{2}\left[\frac{1}{\sigma^2/n} + \frac{1}{C}\right]\left[ \mu^2 - 2\mu \left[\frac{1}{\sigma^2/n} + \frac{1}{C}\right]^{-1}\left[\frac{\bar{y}}{\sigma^2/n} + \frac{m}{C}\right] \right] \right)
\end{aligned}$$
thus $\mu | \sigma^2, y \sim N(m', C')$ where
$$m' = C'\left[ \frac{\bar{y}}{\sigma^2/n} + \frac{m}{C} \right], \qquad C' = \left[ \frac{1}{\sigma^2/n} + \frac{1}{C} \right]^{-1}.$$
Derive $\sigma^2 | \mu, y$

Recall
$$p(\mu, \sigma^2|y) \propto (\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \right) \exp\left( -\frac{1}{2C}(\mu - m)^2 \right) \times (\sigma^2)^{-(v/2+1)} \exp\left( -\frac{vs^2}{2\sigma^2} \right)$$
Now find $\sigma^2 | \mu, y$:
$$p(\sigma^2 | \mu, y) \propto p(\mu, \sigma^2|y) \propto (\sigma^2)^{-([v+n]/2+1)} \exp\left( -\frac{1}{2\sigma^2}\left[ vs^2 + \sum_{i=1}^n (y_i - \mu)^2 \right] \right)$$
and thus $\sigma^2 | \mu, y \sim \text{Inv-}\chi^2(v', (s')^2)$ where
$$v' = v + n, \qquad v'(s')^2 = vs^2 + \sum_{i=1}^n (y_i - \mu)^2.$$
R code for Gibbs sampler

# Data and prior
y = rnorm(10)
m = 0; C = 10
v = 1; s = 1

# Initial values
mu = 0
sigma2 = 1

# Save structures
n_iter = 1000
mu_keep = rep(NA, n_iter)
sigma_keep = rep(NA, n_iter)

# Pre-calculate
n = length(y)
sum_y = sum(y)
vp = v + n
vs2 = v * s^2
R code for Gibbs sampler (continued)

# Gibbs sampler
for (i in 1:n_iter) {
  # Sample mu | sigma2, y
  Cp = 1 / (n / sigma2 + 1 / C)
  mp = Cp * (sum_y / sigma2 + m / C)
  mu = rnorm(1, mp, sqrt(Cp))

  # Sample sigma2 | mu, y: an Inv-chi^2(vp, vpsp2/vp) draw is the
  # reciprocal of a Gamma(vp/2, vpsp2/2) draw
  vpsp2 = vs2 + sum((y - mu)^2)
  sigma2 = 1 / rgamma(1, vp / 2, vpsp2 / 2)

  # Save iterations
  mu_keep[i] = mu
  sigma_keep[i] = sqrt(sigma2)
}
Posteriors

[Figure: trace plots (top) and posterior density estimates (bottom) for mu and sigma over the 1000 iterations.]
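Plots along these lines can be reproduced with base R graphics (a sketch; the original figure was likely produced with ggplot2):

# Trace plots and posterior density estimates from the saved draws.
par(mfrow = c(2, 2))
plot(mu_keep,    type = "l", xlab = "t", ylab = "mu")
plot(sigma_keep, type = "l", xlab = "t", ylab = "sigma")
plot(density(mu_keep),    main = "mu")
plot(density(sigma_keep), main = "sigma")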
K-component Gibbs sampler

Suppose $\theta = (\theta_1, \ldots, \theta_K)$; then an iteration of a K-component Gibbs sampler is
$$\begin{aligned}
\theta_1^{(t)} &\sim p(\theta_1 | \theta_2^{(t-1)}, \ldots, \theta_K^{(t-1)}, y) \\
\theta_2^{(t)} &\sim p(\theta_2 | \theta_1^{(t)}, \theta_3^{(t-1)}, \ldots, \theta_K^{(t-1)}, y) \\
&\vdots \\
\theta_k^{(t)} &\sim p(\theta_k | \theta_1^{(t)}, \ldots, \theta_{k-1}^{(t)}, \theta_{k+1}^{(t-1)}, \ldots, \theta_K^{(t-1)}, y) \\
&\vdots \\
\theta_K^{(t)} &\sim p(\theta_K | \theta_1^{(t)}, \ldots, \theta_{K-1}^{(t)}, y)
\end{aligned}$$
The distributions above are called the full conditional distributions. If some of the $\theta_k$ are vectors, then this is called a block Gibbs sampler.
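In code, one iteration cycles through the K components, always conditioning on the most recent value of every other component. A minimal sketch, assuming full_conditionals is a hypothetical list of functions whose k-th element draws $\theta_k$ given the current $\theta$ vector and the data:

gibbs_k_component = function(theta0, n_iter, y, full_conditionals) {
  K = length(theta0)
  theta = matrix(theta0, nrow = n_iter, ncol = K, byrow = TRUE)
  for (t in 2:n_iter) {
    theta[t, ] = theta[t - 1, ]  # carry forward last iteration's values
    for (k in 1:K) {
      # components 1..k-1 are already updated; k+1..K are from iteration t-1
      theta[t, k] = full_conditionals[[k]](theta[t, ], y)
    }
  }
  return(theta)
}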
Hierarchical normal model

Let
$$Y_{ij} \stackrel{ind}{\sim} N(\mu_i, \sigma^2), \qquad \mu_i \stackrel{ind}{\sim} N(\eta, \tau^2)$$
for $i = 1, \ldots, I$, $j = 1, \ldots, n_i$, $n = \sum_{i=1}^I n_i$, and prior
$$p(\eta, \tau^2, \sigma^2) \propto IG(\tau^2; a_\tau, b_\tau) \, IG(\sigma^2; a_\sigma, b_\sigma).$$
The full conditionals are
$$\begin{aligned}
p(\mu | \eta, \sigma^2, \tau^2, y) &= \prod_{i=1}^I p(\mu_i | \eta, \sigma^2, \tau^2, y_i) \\
p(\mu_i | \eta, \sigma^2, \tau^2, y_i) &= N\left( \left[\frac{1}{\sigma^2/n_i} + \frac{1}{\tau^2}\right]^{-1} \left[\frac{\bar{y}_i}{\sigma^2/n_i} + \frac{\eta}{\tau^2}\right], \ \left[\frac{1}{\sigma^2/n_i} + \frac{1}{\tau^2}\right]^{-1} \right) \\
p(\eta | \mu, \sigma^2, \tau^2, y) &= N\left( \bar{\mu}, \ \tau^2/I \right) \\
p(\sigma^2 | \mu, \eta, \tau^2, y) &= IG\left( a_\sigma + n/2, \ b_\sigma + \sum_{i=1}^I \sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2/2 \right) \\
p(\tau^2 | \mu, \eta, \sigma^2, y) &= IG\left( a_\tau + I/2, \ b_\tau + \sum_{i=1}^I (\mu_i - \eta)^2/2 \right)
\end{aligned}$$
where $n_i \bar{y}_i = \sum_{j=1}^{n_i} y_{ij}$ and $I\bar{\mu} = \sum_{i=1}^I \mu_i$.
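As a sketch (not from the original slides), one scan of this block Gibbs sampler can be written in R as follows, assuming y is a list whose i-th element holds the $n_i$ observations from group i, eta, sigma2, and tau2 hold the current values, and a_sigma, b_sigma, a_tau, b_tau are the prior hyperparameters:

# One scan of the block Gibbs sampler for the hierarchical normal model.
I    = length(y)           # number of groups
ni   = sapply(y, length)   # group sizes n_i
n    = sum(ni)
ybar = sapply(y, mean)     # group means

# mu_i | eta, sigma2, tau2, y: independent normals across groups
Ci = 1 / (ni / sigma2 + 1 / tau2)
mu = rnorm(I, Ci * (ni * ybar / sigma2 + eta / tau2), sqrt(Ci))

# eta | mu, tau2: normal centered at the mean of the mu_i
eta = rnorm(1, mean(mu), sqrt(tau2 / I))

# sigma2 | mu, y and tau2 | mu, eta: inverse gammas drawn via 1/rgamma
sse    = sum(unlist(Map(function(yi, mui) (yi - mui)^2, y, mu)))
sigma2 = 1 / rgamma(1, a_sigma + n / 2, b_sigma + sse / 2)
tau2   = 1 / rgamma(1, a_tau + I / 2, b_tau + sum((mu - eta)^2) / 2)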
Metropolis-within-Gibbs

We have discussed two Markov chain approaches to sample from a target distribution:
- Metropolis-Hastings algorithm
- Gibbs sampling

Gibbs sampling assumes we can sample from $p(\theta_k | \theta_{-k}, y)$ for all $k$, but what if we cannot sample from all of these full conditional distributions? For each $p(\theta_k | \theta_{-k}, y)$ that cannot be sampled directly, a single iteration of the Metropolis-Hastings algorithm can be substituted.
Bivariate normal with ρ = 0.9

Reconsider the bivariate normal example, substituting a Metropolis step in place of a Gibbs step:

gibbs_and_metropolis = function(theta0, n_points, rho) {
  theta = matrix(theta0, nrow = n_points, ncol = 2, byrow = TRUE)
  v = sqrt(1 - rho^2)
  for (i in 2:n_points) {
    theta[i, 1] = rnorm(1, rho * theta[i - 1, 2], v)

    # Now do a random-walk Metropolis step for theta_2
    theta_prop = rnorm(1, theta[i - 1, 2], 2.4 * v)  # optimal proposal scale
    logr = dnorm(theta_prop,      rho * theta[i, 1], v, log = TRUE) -
           dnorm(theta[i - 1, 2], rho * theta[i, 1], v, log = TRUE)
    theta[i, 2] = ifelse(log(runif(1)) < logr, theta_prop, theta[i - 1, 2])
  }
  return(theta)
}

theta = gibbs_and_metropolis(c(-3, 3), n, rho)
length(unique(theta[, 2])) / length(theta[, 2])  # acceptance rate

[1] 0.5

The factor 2.4 is the standard rule-of-thumb scaling for a random-walk proposal on a one-dimensional Gaussian target, which gives an acceptance rate of roughly 0.44; the observed rate of 0.5 here is a noisy estimate based on only 20 draws.
[Figure: Metropolis-within-Gibbs draws for the bivariate normal target, plotted as $\theta_2$ against $\theta_1$.]
Hierarchical normal model

Let
$$Y_{ij} \stackrel{ind}{\sim} N(\mu_i, \sigma^2), \qquad \mu_i \stackrel{ind}{\sim} N(\eta, \tau^2)$$
for $i = 1, \ldots, I$, $j = 1, \ldots, n_i$, $n = \sum_{i=1}^I n_i$, and prior
$$p(\eta, \tau, \sigma) \propto Ca^+(\tau; 0, b_\tau) \, Ca^+(\sigma; 0, b_\sigma).$$
The full conditionals are exactly the same except
$$\begin{aligned}
p(\sigma | \mu, \eta, \tau^2, y) &\propto IG\left( \sigma^2; \ n/2, \ \sum_{i=1}^I \sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2/2 \right) Ca^+(\sigma; 0, b_\sigma) \\
p(\tau | \mu, \eta, \sigma^2, y) &\propto IG\left( \tau^2; \ I/2, \ \sum_{i=1}^I (\mu_i - \eta)^2/2 \right) Ca^+(\tau; 0, b_\tau)
\end{aligned}$$
where $n_i \bar{y}_i = \sum_{j=1}^{n_i} y_{ij}$ and $I\bar{\mu} = \sum_{i=1}^I \mu_i$. These are not standard distributions, so Metropolis steps can be used for $\sigma$ and $\tau$ within the Gibbs sampler.
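A minimal sketch of such a Metropolis update for $\sigma$ (not from the original slides), assuming sse holds $\sum_{i}\sum_{j}(y_{ij} - \mu_i)^2$, n and b_sigma are set, and step_sd is a hypothetical tuning parameter:

# Unnormalized log full conditional for sigma: IG(sigma^2; n/2, sse/2) kernel
# times a half-Cauchy prior (the constant 2 in Ca+ cancels in the MH ratio).
log_target = function(sigma, sse, n, b_sigma) {
  if (sigma <= 0) return(-Inf)  # zero density for non-positive proposals
  -(n / 2 + 1) * log(sigma^2) - (sse / 2) / sigma^2 +
    dcauchy(sigma, 0, b_sigma, log = TRUE)
}

# Random-walk Metropolis update for sigma
sigma_prop = rnorm(1, sigma, step_sd)
logr = log_target(sigma_prop, sse, n, b_sigma) -
       log_target(sigma,      sse, n, b_sigma)
if (log(runif(1)) < logr) sigma = sigma_prop

The update for $\tau$ is analogous, with $I/2$ and $\sum_{i}(\mu_i - \eta)^2/2$ in place of $n/2$ and sse/2.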