Lecture 12

Lecture 12: Collapsed Gibbs Sampler - PowerPoint PPT Presentation



1. METROPOLIS-HASTINGS (MH)
(Royal Institute of Technology, Stat. Meth. in CS, Collapsed Gibbs Sampler)

We want to compute p*(x) (typically p(x|D)). How? Implicitly construct a Markov chain M with stationary distribution p*(x):
• Use a good or random starting point
• Traverse the chain and sample every k:th visit
• Discard the first l samples
• The remaining samples x_1, …, x_S are an approximation of p*(x): p*(x) ≈ [∑_i I(x = x_i)]/S

GIBBS SAMPLING
• Pick an initial state x_1 = (x_{1,1}, …, x_{1,K})
• For s = 1 to S:
  - Sample k ~ U[K]
  - Sample x_{s+1,k} ~ p(x_{s+1,k} | x_{s,-k})
  - Let x_{s+1} = (x_{s,1}, …, x_{s,k-1}, x_{s+1,k}, x_{s,k+1}, …, x_{s,K})
  - If the thinning interval divides s, record x_{s+1} (thinning)

GIBBS SAMPLER FOR GMM: NOTATION
D = (x_1, …, x_N), H = (z_1, …, z_N), N_k = ∑_i I(z_i = k)
π = (π_1, …, π_K), µ = (µ_1, …, µ_K), λ = (λ_1, …, λ_K), where λ_k = 1/σ_k²
Hyperparameters θ_0 = (µ_0, λ_0, α_0, β_0, α)

MODEL
π ∼ Dir(α), µ_k ∼ N(µ_0, λ_0), λ_k ∼ Ga(α_0, β_0), z_i ∼ Cat(π), and p(x_n | z_n = k) = N(µ_k, λ_k)
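To make the loop concrete, here is a minimal Python sketch of the random-scan Gibbs sampler described above, with burn-in and thinning. The function sample_full_conditional is a hypothetical placeholder for a model-specific sampler of p(x_k | x_-k); it and all parameter names are illustrative, not from the lecture.

```python
import numpy as np

# Minimal sketch of the random-scan Gibbs loop on the slide. The
# full-conditional sampler `sample_full_conditional(k, x)` is a hypothetical
# user-supplied function that draws x_k from p(x_k | x_-k).
def gibbs(x_init, sample_full_conditional, n_steps, burn_in, thin, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x_init, dtype=float)
    K = len(x)
    samples = []
    for s in range(1, n_steps + 1):
        k = rng.integers(K)                    # sample k ~ U[K]
        x[k] = sample_full_conditional(k, x)   # sample x_k ~ p(x_k | x_-k)
        if s > burn_in and s % thin == 0:      # discard burn-in, then thin
            samples.append(x.copy())
    return np.array(samples)
```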

2. A STATE
A state of the chain is (H, π, µ, λ), with hyperparameters θ_0 = (µ_0, λ_0, α_0, β_0, α) and the model as above.

LIKELIHOOD FOR GMM
p(D, H, π, µ, λ) = p(D, H | π, µ, λ) p(π) p(µ, λ)
                 = ∏_{n,k} [π_k N(x_n | µ_k, λ_k)]^{I(z_n = k)} · Dir(π | α) · ∏_k N(µ_k | µ_0, λ_0) Ga(λ_k | α_0, β_0)

COLLAPSING
Integrating out some components of the state is called collapsing. It always improves convergence.
[Figure: log p(x | π, θ) vs. iteration (log scale, 10^0 to 10^3), comparing the Standard Gibbs Sampler with the Rao-Blackwellized Sampler.]
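For reference, a sketch of one sweep of the standard (uncollapsed) Gibbs sampler that this likelihood induces, taking λ_k as a precision and α as a symmetric Dirichlet parameter as on the slides. The full conditionals below follow from standard Normal and Gamma conjugacy; the function and variable names are mine, not the lecture's.

```python
import numpy as np

def gmm_gibbs_sweep(x, z, pi, mu, lam, theta0, rng):
    """One sweep of the standard Gibbs sampler for the GMM.
    x: data (N,); z: integer assignments (N,); pi, mu, lam: parameters (K,).
    theta0 = (mu0, lam0, alpha0, beta0, alpha); lam is precision (1/sigma^2)."""
    mu0, lam0, alpha0, beta0, alpha = theta0
    N, K = len(x), len(pi)

    # z_n | rest ~ Cat with p(z_n = k) proportional to pi_k * N(x_n | mu_k, 1/lam_k)
    for n in range(N):
        logp = np.log(pi) + 0.5 * np.log(lam) - 0.5 * lam * (x[n] - mu) ** 2
        p = np.exp(logp - logp.max())
        z[n] = rng.choice(K, p=p / p.sum())

    Nk = np.bincount(z, minlength=K)

    # pi | z ~ Dir(alpha + N_1, ..., alpha + N_K)
    pi = rng.dirichlet(alpha + Nk)

    for k in range(K):
        xk = x[z == k]
        # mu_k | rest: Normal prior N(mu0, lam0) times Normal likelihood
        prec = lam0 + Nk[k] * lam[k]
        mean = (lam0 * mu0 + lam[k] * xk.sum()) / prec
        mu[k] = rng.normal(mean, 1.0 / np.sqrt(prec))
        # lam_k | rest ~ Ga(alpha0 + N_k/2, beta0 + 0.5 * sum (x - mu_k)^2)
        lam[k] = rng.gamma(alpha0 + Nk[k] / 2,
                           1.0 / (beta0 + 0.5 * ((xk - mu[k]) ** 2).sum()))
    return z, pi, mu, lam
```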

3. COLLAPSED GIBBS SAMPLER FOR GMM
Integrate out π, µ, and λ from the joint p(D, H, µ, λ, π | θ_0), where θ_0 is all the hyperparameters. Only full conditionals on z_n remain. The joint is the same as before, except with a conjugate prior on µ_k and λ_k.

NEW FULL CONDITIONAL
p(z_n | D, H_-n, Θ_0) = p(z_n, D | H_-n, Θ_0) / p(D | H_-n, Θ_0)
  ∝ p(z_n, D | H_-n, Θ_0)
  ∝ p(z_n | H_-n, Θ_0) p(D | z_n, H_-n, Θ_0)
  ∝ p(z_n | H_-n, Θ_0) p(x_n | D_-n, z_n, H_-n, Θ_0) p(D_-n | z_n, H_-n, Θ_0)
  ∝ p(z_n | H_-n, Θ_0) p(x_n | D_-n, z_n, H_-n, Θ_0)
(the factor p(D_-n | z_n, H_-n, Θ_0) does not depend on z_n). The second factor is "easy" with a conjugate prior on µ_k and λ_k, but it varies as we vary H.

THE COLLAPSED FULL CONDITIONAL
Recall, π ∼ Dir(α), µ_k ∼ N(µ_0, λ_0), λ_k ∼ Ga(α_0, β_0), z_i ∼ Cat(π), and p(x_n | z_n = k) = N(µ_k, λ_k). The first factor is the marginal of z_n given the other assignments. So
p(z_n = k | H_-n, Θ_0) = (N_{k,-n} + α_k) / (N - 1 + α), where α = ∑_k α_k

ALGORITHM
[The algorithm appears graphically on the original slide; see the sketch below.]
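A hedged Python sketch of the collapsed update for z_n. It assumes the conjugate Normal-Gamma prior µ_k | λ_k ∼ N(µ_0, (κ_0 λ_k)^{-1}), λ_k ∼ Ga(α_0, β_0) that makes the predictive factor "easy" (a Student-t), plus a symmetric Dirichlet weight α; the extra parameter κ_0 and all names are assumptions, since the slide does not spell out its exact parameterization.

```python
import numpy as np
from scipy.stats import t as student_t

def collapsed_z_update(n, x, z, K, mu0, kappa0, alpha0, beta0, alpha, rng):
    """Resample z_n from p(z_n = k | ...) proportional to
       (N_{k,-n} + alpha) * p(x_n | D_{-n}, z_n = k, H_{-n}).
    Assumes the conjugate Normal-Gamma prior, so the predictive is Student-t;
    the normalizer (N - 1 + K*alpha) cancels and is never computed."""
    logp = np.empty(K)
    for k in range(K):
        mask = (z == k)
        mask[n] = False                       # exclude x_n itself
        xk = x[mask]
        m = len(xk)                           # m = N_{k,-n}
        # Posterior Normal-Gamma parameters given the other points in k
        kap = kappa0 + m
        loc = (kappa0 * mu0 + xk.sum()) / kap
        a = alpha0 + m / 2
        b = beta0
        if m > 0:
            xbar = xk.mean()
            b += 0.5 * ((xk - xbar) ** 2).sum() \
                 + kappa0 * m * (xbar - mu0) ** 2 / (2 * kap)
        # Student-t posterior predictive for x_n
        scale = np.sqrt(b * (kap + 1) / (a * kap))
        logp[k] = np.log(m + alpha) + student_t.logpdf(x[n], df=2 * a,
                                                       loc=loc, scale=scale)
    p = np.exp(logp - logp.max())
    return rng.choice(K, p=p / p.sum())
```

Sweeping this update over n = 1, …, N gives the collapsed sampler; only the assignments H are kept as state, exactly as the slide says ("only full conditionals on z_n remain").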

4. CONVERGENCE: COLLAPSED VS STANDARD
[Figure: two panels of log p(x | π, θ) (y-axis) vs. iteration (x-axis, log scale, 10^0 to 10^3), comparing the Standard Gibbs Sampler with the Rao-Blackwellized Sampler.]
• Blue is standard
• Red is collapsed
• x-axis: iterations
• y-axis: log-likelihood

THE END

5. FIGURE 3. Comparison of the convergence time of PosetSMC and MCMC. We generated coalescent trees of different sizes and data sets of 1000 nucleotides. We computed the L1 distance of the minimum Bayes risk reconstruction to the true generating tree as a function of the running time (in units of the number of peeling recursions, on a log scale). The missing MCMC data points are due to MrBayes stalling on these executions.
