Basics | MCMC | Gibbs Sampling | Auxiliary Variable Samplers

Importance Sampling (Contd.)
- Choose $q(x)$ to minimize the variance of $\hat{I}_n(f)$:
  $\mathrm{var}_q(f(x)\,w(x)) = \mathbb{E}_q[f^2(x)\,w^2(x)] - I^2(f)$
- Applying Jensen's inequality and optimizing, we get
  $q^*(x) = \frac{|f(x)|\,p(x)}{\int |f(x)|\,p(x)\,dx}$
- Efficient sampling focuses on regions where $|f(x)|\,p(x)$ is large
- Super-efficient sampling: the variance can be lower than even with $q(x) = p(x)$
- Exploited to evaluate the probability of rare events, with $q(x) \propto \mathbb{I}_E(x)\,p(x)$
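The rare-event case can be sketched numerically. A minimal sketch, where the event $E = \{x > 4\}$ under $p = N(0,1)$ and the proposal $q = N(4,1)$ are illustrative choices (not from the slides); the proposal concentrates its mass on the event region, approximating the ideal $q(x) \propto \mathbb{I}_E(x)\,p(x)$:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Rare event E = {x > 4} under p = N(0, 1): the true probability is about
# 3.2e-5, so naive Monte Carlo with 1e5 samples sees only a handful of hits
naive = (rng.standard_normal(n) > 4).mean()

# Importance sampling with proposal q = N(4, 1), which puts nearly all of
# its mass on the event region
x = rng.normal(4.0, 1.0, n)
log_w = -0.5 * x**2 + 0.5 * (x - 4.0)**2      # log w(x) = log p(x) - log q(x)
est = np.mean((x > 4) * np.exp(log_w))

exact = 0.5 * math.erfc(4 / math.sqrt(2))     # exact Gaussian tail probability
```

Roughly half of the proposed samples land inside the event, so the importance-sampling estimator attains a small relative error at a sample size where the naive estimate is essentially useless.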
Markov Chains
- Use a Markov chain to explore the state space
- A Markov chain on a discrete space is a process with
  $p(x_i \mid x_{i-1}, \ldots, x_1) = T(x_i \mid x_{i-1})$
- A chain is homogeneous if $T$ is invariant for all $i$
- The chain will stabilize into an invariant distribution if it is:
  1. Irreducible: the transition graph is connected
  2. Aperiodic: it does not get trapped in cycles
- A sufficient condition for $p(x)$ to be the invariant distribution is detailed balance:
  $p(x_i)\,T(x_{i-1} \mid x_i) = p(x_{i-1})\,T(x_i \mid x_{i-1})$
- In MCMC samplers, the invariant distribution is the target distribution
- Samplers are designed for fast convergence
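The detailed-balance condition and convergence to the invariant distribution can be checked on a toy chain. A minimal sketch with a hypothetical 3-state transition matrix, constructed so that detailed balance holds for $p = (0.6, 0.3, 0.1)$:

```python
import numpy as np

# Transition matrix T[i, j] = P(x_next = j | x_cur = i) for a 3-state chain;
# the chain is irreducible (all states communicate) and aperiodic (self-loops)
T = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

# Iterate mu <- mu T from an arbitrary starting distribution; the marginals
# converge to the invariant distribution p = (0.6, 0.3, 0.1)
mu = np.array([1.0, 0.0, 0.0])
for _ in range(500):
    mu = mu @ T

# Detailed balance p_i T(j|i) = p_j T(i|j) holds here, which is sufficient
# (though not necessary) for p T = p
p = np.array([0.6, 0.3, 0.1])
```

Here detailed balance can be verified entry-by-entry: the matrix with entries $p_i\,T(j \mid i)$ is symmetric.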
Markov Chains (Contd.)
- Random walker on the web:
  - Irreducibility: the walker should be able to reach all pages
  - Aperiodicity: the walker should not get stuck in a loop
- PageRank uses $T = L + E$
  - $L$ = link matrix for the web graph
  - $E$ = uniform random matrix, to ensure irreducibility and aperiodicity
- The invariant distribution $p(x)$ represents the rank of webpage $x$
- In continuous spaces, $T$ becomes an integral kernel $K$:
  $\int p(x_i)\,K(x_{i+1} \mid x_i)\,dx_i = p(x_{i+1})$
- $p(x)$ is the corresponding eigenfunction
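A minimal sketch of the PageRank construction on a hypothetical 3-page graph. In practice the two matrices are mixed with a damping weight, $T = dL + (1-d)E$ (the slide's $T = L + E$ reads as this weighted sum); the link structure and $d = 0.85$ below are illustrative:

```python
import numpy as np

# Hypothetical web graph: adj[i, j] = 1 if page i links to page j
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [0, 1, 0]], dtype=float)
L = adj / adj.sum(axis=1, keepdims=True)   # row-stochastic link matrix
n = L.shape[0]
d = 0.85                                   # damping weight
T = d * L + (1 - d) * np.ones((n, n)) / n  # uniform E makes T irreducible, aperiodic

# Power iteration: the rank vector is the invariant distribution r = r T
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = r @ T
```

Page 1 ends up ranked highest: it receives links from both other pages, while page 2 receives only one.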
The Metropolis-Hastings Algorithm
- Most popular MCMC method
- Based on a proposal distribution $q(x^* \mid x)$
- Algorithm: for $i = 0, \ldots, n-1$:
  - Sample $u \sim U(0, 1)$
  - Sample $x^* \sim q(x^* \mid x_i)$
  - Then
    $x_{i+1} = \begin{cases} x^* & \text{if } u < A(x_i, x^*) = \min\left\{1, \frac{p(x^*)\,q(x_i \mid x^*)}{p(x_i)\,q(x^* \mid x_i)}\right\} \\ x_i & \text{otherwise} \end{cases}$
- The transition kernel is
  $K_{MH}(x_{i+1} \mid x_i) = q(x_{i+1} \mid x_i)\,A(x_i, x_{i+1}) + \delta_{x_i}(x_{i+1})\,r(x_i)$
  where $r(x_i) = \int q(x \mid x_i)\,(1 - A(x_i, x))\,dx$ is the term associated with rejection
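The algorithm above fits in a few lines. A minimal sketch: the standard-normal target (known only up to its normalizing constant, which MH never needs) and the Gaussian random-walk proposal with step size 1.0 are illustrative choices:

```python
import numpy as np

def metropolis_hastings(log_p, x0, n, step=1.0, seed=0):
    """Random-walk MH with q(x* | x) = N(x, step^2); q is symmetric,
    so the proposal terms cancel in the acceptance ratio."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n)
    for i in range(n):
        x_star = x + step * rng.standard_normal()
        u = rng.uniform()
        # Accept if u < A(x_i, x*) = min{1, p(x*) / p(x_i)}
        if np.log(u) < log_p(x_star) - log_p(x):
            x = x_star
        samples[i] = x          # on rejection, the current state is repeated
    return samples

# Target: p(x) ∝ exp(-x^2 / 2), an unnormalized standard normal
s = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n=50_000)
```

After discarding an initial burn-in, the sample mean and variance should be close to the target's 0 and 1.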
The Metropolis-Hastings Algorithm (Contd.)
- By construction,
  $p(x_i)\,K_{MH}(x_{i+1} \mid x_i) = p(x_{i+1})\,K_{MH}(x_i \mid x_{i+1})$
- This implies that $p(x)$ is the invariant distribution
- Basic properties:
  - Irreducibility: ensure that the support of $q$ contains the support of $p$
  - Aperiodicity: ensured, since rejection is always a possibility
- Independent sampler: $q(x^* \mid x_i) = q(x^*)$, so that
  $A(x_i, x^*) = \min\left\{1, \frac{p(x^*)\,q(x_i)}{p(x_i)\,q(x^*)}\right\}$
- Metropolis sampler: symmetric $q(x^* \mid x_i) = q(x_i \mid x^*)$, so that
  $A(x_i, x^*) = \min\left\{1, \frac{p(x^*)}{p(x_i)}\right\}$
Simulated Annealing
- Problem: find the global maximum of $p(x)$
- Initial idea: run MCMC, estimate $\hat{p}(x)$, compute the max
- Issue: the Markov chain may never come close to the mode(s)
- Instead, simulate a non-homogeneous Markov chain
- The invariant distribution at iteration $i$ is $p_i(x) \propto p^{1/T_i}(x)$
- The sample update follows
  $x_{i+1} = \begin{cases} x^* & \text{if } u < A(x_i, x^*) = \min\left\{1, \frac{p^{1/T_i}(x^*)\,q(x_i \mid x^*)}{p^{1/T_i}(x_i)\,q(x^* \mid x_i)}\right\} \\ x_i & \text{otherwise} \end{cases}$
- $T_i$ decreases following a cooling schedule with $\lim_{i \to \infty} T_i = 0$
- The cooling schedule needs a proper choice, e.g., $T_i = \frac{1}{C \log(i + T_0)}$
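A minimal sketch of annealing with a symmetric random-walk proposal and the logarithmic schedule above. The bimodal target, the start at the wrong mode, and the schedule constants ($C = 0.5$, $T_0 = 2$) are illustrative assumptions:

```python
import numpy as np

def simulated_annealing(log_p, x0, n=20_000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = best = x0
    for i in range(n):
        T = 1.0 / (0.5 * np.log(i + 2.0))   # schedule T_i = 1 / (C log(i + T_0))
        x_star = x + step * rng.standard_normal()
        # Symmetric proposal: accept with prob min{1, (p(x*) / p(x))^(1/T)}
        if np.log(rng.uniform()) < (log_p(x_star) - log_p(x)) / T:
            x = x_star
        if log_p(x) > log_p(best):          # track the best state visited
            best = x
    return best

# Bimodal target: global mode near x = 2, smaller local mode near x = -2;
# the chain starts at the wrong (local) mode
log_p = lambda x: np.logaddexp(-0.5 * (x - 2.0)**2,
                               np.log(0.5) - 0.5 * (x + 2.0)**2)
best = simulated_annealing(log_p, x0=-2.0)
```

While the temperature is still high the chain crosses the valley between the modes; as $T_i \to 0$ the annealed distribution $p^{1/T_i}$ concentrates on the global mode, so the best state found lies near $x = 2$.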
Monte Carlo EM
- The E-step involves computing an expectation:
  $Q(\theta, \theta_n) = \int \log p(x, z \mid \theta)\,p(z \mid x, \theta_n)\,dz$
- Estimate this expectation using MCMC
- Draw samples using MH with acceptance probability
  $A(z, z^*) = \min\left\{1, \frac{p(x \mid z^*, \theta_n)\,p(z^* \mid \theta_n)\,q(z \mid z^*)}{p(x \mid z, \theta_n)\,p(z \mid \theta_n)\,q(z^* \mid z)}\right\}$
- Several variants:
  - Stochastic EM: draw one sample
  - Monte Carlo EM: draw multiple samples
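A minimal sketch of the Stochastic EM variant on an illustrative problem: a two-component 1-D Gaussian mixture with known weights and variances, estimating only the means. For this model $p(z \mid x, \theta_n)$ is tractable, so each $z_i$ is drawn from it directly rather than via an MH chain (MH is needed when this posterior cannot be sampled exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: two-component mixture N(-2, 1) and N(2, 1), equal weights
x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])

mu = np.array([-0.5, 0.5])      # initial means; weights and variances held fixed
for _ in range(50):
    # E-step (Monte Carlo): draw z_i ~ p(z_i | x_i, theta_n) -- a single
    # sample per data point, i.e. the Stochastic EM variant
    log_r = -0.5 * (x[:, None] - mu[None, :])**2     # log posterior, up to a constant
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)                # responsibilities p(z=k | x)
    z = rng.uniform(size=x.size) < r[:, 1]           # sampled component labels
    # M-step: maximize Q with the sampled completions
    mu = np.array([x[~z].mean(), x[z].mean()])
```

The estimated means fluctuate around the true values (-2, 2) instead of converging exactly, which is the price of replacing the exact E-step expectation by a one-sample estimate.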
Mixtures of MCMC Kernels
- A powerful property of MCMC: samplers can be combined
- Let $K_1$, $K_2$ be kernels with invariant distribution $p$
- The mixture kernel $\alpha K_1 + (1 - \alpha) K_2$, $\alpha \in [0, 1]$, converges to $p$
- The cycle kernel $K_1 K_2$ also converges to $p$
- Mixtures can use global and local proposals:
  - Global proposals explore the entire space (with probability $\alpha$)
  - Local proposals discover finer details (with probability $1 - \alpha$)
- Example: the target has many narrow peaks
  - The global proposal finds the peaks
  - Local proposals explore the neighborhood of each peak (random walk)
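The narrow-peaks example can be sketched directly. A minimal sketch mixing an independent global proposal with a local random walk; the two-peak target, $\alpha = 0.2$, and the proposal widths are illustrative assumptions. Each kernel is a valid MH kernel for $p$ on its own, so their mixture also leaves $p$ invariant:

```python
import numpy as np

def log_p(x):
    # Two narrow peaks (width 0.2) at +/-5, with equal mass
    return np.logaddexp(-0.5 * ((x - 5) / 0.2)**2, -0.5 * ((x + 5) / 0.2)**2)

def log_q_global(x):
    # Independent global proposal q(x*) = N(0, 6^2), wide enough to cover both peaks
    return -0.5 * (x / 6.0)**2

rng = np.random.default_rng(0)
alpha, x = 0.2, 5.0
samples = np.empty(100_000)
for i in range(samples.size):
    if rng.uniform() < alpha:
        # Global move: independent sampler, full MH ratio including q terms
        x_star = 6.0 * rng.standard_normal()
        log_A = (log_p(x_star) + log_q_global(x)) - (log_p(x) + log_q_global(x_star))
    else:
        # Local move: symmetric random walk, q terms cancel
        x_star = x + 0.1 * rng.standard_normal()
        log_A = log_p(x_star) - log_p(x)
    if np.log(rng.uniform()) < log_A:
        x = x_star
    samples[i] = x

frac_right = (samples > 0).mean()   # both peaks should receive about half the mass
```

A pure local random walk started at one peak would almost never cross to the other; the occasional global proposals are what let the chain jump between peaks and weight them correctly.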
Cycles of MCMC Kernels
- Split a multivariate state into blocks
- Each block can be updated separately
- Convergence is faster if correlated variables are blocked together
- The transition kernel is given by
  $K_{MHCycle}(x^{(i+1)} \mid x^{(i)}) = \prod_{j=1}^{n_b} K_{MH(j)}\!\left(x^{(i+1)}_{b_j} \mid x^{(i)}_{b_j},\, x^{(i+1)}_{-[b_j]}\right)$
- Trade-off on block size: if blocks are small, the chain takes a long time to explore the space
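A minimal sketch of a cycle kernel: each sweep performs one random-walk MH update per block, here the two coordinates of a correlated bivariate Gaussian. The target ($\rho = 0.8$) and step size are illustrative assumptions:

```python
import numpy as np

rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
prec = np.linalg.inv(cov)               # precision matrix of the target N(0, cov)

def log_p(x):
    return -0.5 * x @ prec @ x

rng = np.random.default_rng(0)
x = np.zeros(2)
samples = np.empty((20_000, 2))
for i in range(samples.shape[0]):
    # One cycle: a random-walk MH update for each block (here, each coordinate),
    # conditioning on the other block's current value
    for j in range(2):
        x_star = x.copy()
        x_star[j] += 0.8 * rng.standard_normal()
        if np.log(rng.uniform()) < log_p(x_star) - log_p(x):
            x = x_star
    samples[i] = x
```

With $\rho = 0.8$ the one-coordinate moves already mix reasonably; as the correlation grows toward 1 they become tiny effective steps along the ridge, which is exactly why blocking correlated variables together speeds convergence.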