Markov chain Monte Carlo

Iain Murray
http://iainmurray.net/

Reminder

Need to sample large, non-standard distributions:

    θ^(s) ∼ P(θ|D) = P(D|θ) P(θ) / P(D)

    P(x|D) ≈ (1/S) ∑_{s=1}^S P(x|θ^(s))

(a toy sketch of this predictive average appears below)

— Markov chain Monte Carlo (MCMC)
— Gibbs and Metropolis–Hastings
— Slice sampling
— Practical details

Importance sampling weights

[Figure: ten sampled importance weights, ranging from w = 1.1e-51 up to w = 0.371; the weights span many orders of magnitude.]

Metropolis algorithm

• Perturb parameters: propose θ′ from Q(θ′; θ), e.g. N(θ, σ²)
• Accept with probability min(1, P̃(θ′|D) / P̃(θ|D))
• Otherwise keep old parameters

Detail: Metropolis, as stated, requires Q(θ′; θ) = Q(θ; θ′)

[Subfigure from PRML, Bishop (2006).]
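A minimal sketch of the predictive average from the reminder above, P(x|D) ≈ (1/S) ∑_s P(x|θ^(s)). The Gaussian likelihood and the fake "posterior samples" are assumptions for illustration only; in practice θ^(s) would come from MCMC:

```matlab
% Monte Carlo predictive estimate: average the likelihood over
% (pretend) posterior samples of theta.
theta_samples = 1 + 0.3*randn(1, 1000);             % stand-in for MCMC output
x = 1.5;                                            % point to predict at
lik = @(x, th) exp(-0.5*(x - th).^2) / sqrt(2*pi);  % P(x|theta), unit variance
pred = mean(lik(x, theta_samples));                 % (1/S) sum_s P(x|theta^(s))
fprintf('P(x|D) approx %.4f\n', pred);
```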
Target distribution

    P(x) = (1/Z) e^{−E(x)},   e.g. E(x) = −∑_{(i,j)∈edges} W_ij x_i x_j,   x_i ∈ {±1}

Local moves

[Figure: a proposal Q(x′; x) making small local changes to the current configuration.]

Markov chain exploration

Goal: a Markov chain, x^(t) ∼ T(x^(t) ← x^(t−1)), such that:

    P(x^(t)) = e^{−E(x^(t))}/Z   for large t.
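A toy illustration of the Ising-type target above on a small grid, with one "local move": a single-spin-flip Metropolis update. The grid size, the couplings W_ij = 1 on nearest-neighbour edges, and unit temperature are all assumptions for the demo:

```matlab
% E(x) = -sum over grid edges of x_i x_j  (all couplings set to 1 here)
n = 8;
x = 2*(rand(n) > 0.5) - 1;      % spins in {-1,+1}
energy = @(x) -sum(sum(x(1:end-1,:).*x(2:end,:))) ...
              -sum(sum(x(:,1:end-1).*x(:,2:end)));
% One local Metropolis update: propose flipping a single spin.
i = randi(n); j = randi(n);
xp = x; xp(i,j) = -xp(i,j);
% Accept with probability min(1, e^{-E(x')} / e^{-E(x)}):
if log(rand) < (energy(x) - energy(xp))
    x = xp;
end
```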
Quick review

MCMC: biased random walk exploring a target dist.

Markov steps:

    x^(s) ∼ T(x^(s) ← x^(s−1))

MCMC gives approximate, correlated samples:

    E_P[f] ≈ (1/S) ∑_{s=1}^S f(x^(s))

Invariant/stationary condition

If x^(t−1) is a sample from P, then x^(t) is also a sample from P:

    ∑_x T(x′ ← x) P(x) = P(x′)

T must leave the target invariant.

Ergodicity

Unique invariant distribution if the chain 'forgets' its starting point x^(0):
T must be able to get everywhere in K steps (for some K).

Gibbs sampling

Pick variables in turn or randomly, and resample P(x_i | x_{j≠i}):

    T_i(x′ ← x) = P(x′_i | x_{j≠i}) δ(x′_{j≠i} − x_{j≠i})

[Figure: coordinate-wise Gibbs updates zig-zagging across a correlated 2D Gaussian over (z1, z2), with lengthscales L and l.]
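A minimal Gibbs sampler for a correlated bivariate Gaussian, the kind of target in the (z1, z2) figure. The correlation value is an arbitrary choice; the point is that each conditional P(x_i | x_{j≠i}) of a 2D Gaussian is a 1D Gaussian, so it can be resampled exactly:

```matlab
% Gibbs sampling a zero-mean bivariate Gaussian with correlation rho:
% conditionals are x1|x2 ~ N(rho*x2, 1-rho^2) and symmetrically for x2.
rho = 0.9; S = 5000;
x = [0; 0]; samples = zeros(2, S);
for s = 1:S
    x(1) = rho*x(2) + sqrt(1 - rho^2)*randn();  % resample P(x1 | x2)
    x(2) = rho*x(1) + sqrt(1 - rho^2)*randn();  % resample P(x2 | x1)
    samples(:, s) = x;
end
```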
Gibbs sampling correctness

    P(x) = P(x_i | x_{\i}) P(x_{\i})

Simulate by drawing x_{\i}, then x_i | x_{\i}.
Draw x_{\i}: sample x, throw the initial x_i away.

Reverse operators

If T leaves P(x) stationary, define a reverse operator:

    R(x ← x′) = T(x′ ← x) P(x) / ∑_x T(x′ ← x) P(x) = T(x′ ← x) P(x) / P(x′)

A necessary condition: there exists R such that

    T(x′ ← x) P(x) = R(x ← x′) P(x′),   ∀ x, x′.

If R = T, this is known as detailed balance (not necessary).

Balance condition

    T(x′ ← x) P(x) = R(x ← x′) P(x′)

Implies that P(x) is left invariant:

    ∑_x T(x′ ← x) P(x) = P(x′) ∑_x R(x ← x′) = P(x′)

since the reverse operator's probabilities sum to one.

Metropolis–Hastings

Arbitrary proposals ∼ Q do not balance on their own:

    Q(x′; x) P(x) ≠ Q(x; x′) P(x′)

Detailed balance is satisfied by rejecting moves:

    T(x′ ← x) = Q(x′; x) min(1, P(x′) Q(x; x′) / (P(x) Q(x′; x)))   for x′ ≠ x

[Figure: PRML, Bishop (2006).]
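The balance conditions above can be checked numerically on a tiny discrete state space, where the full Metropolis–Hastings transition matrix can be written out. The 3-state target and uniform proposal here are arbitrary choices for the check:

```matlab
% Build the MH transition matrix T(b <- a) and verify both
% invariance (T*P = P) and detailed balance (T(b<-a)P(a) symmetric).
P = [0.2; 0.3; 0.5];            % target distribution over 3 states
K = numel(P);
Q = ones(K) / K;                % uniform (symmetric) proposal
T = zeros(K);
for a = 1:K
    for b = 1:K
        if a ~= b
            T(b, a) = Q(b, a) * min(1, (P(b)*Q(a, b)) / (P(a)*Q(b, a)));
        end
    end
end
T = T + diag(1 - sum(T, 1));    % rejected mass stays at the current state
disp(max(abs(T*P - P)));        % invariance: should be ~0
B = T .* repmat(P', K, 1);      % B(b,a) = T(b <- a) P(a)
disp(max(max(abs(B - B'))));    % detailed balance: should be ~0
```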
Metropolis–Hastings transition operator

• Propose a move from the current state: Q(x′; x), e.g. N(x, σ²)
• Accept with probability min(1, P(x′) Q(x; x′) / (P(x) Q(x′; x)))
• Otherwise the next state in the chain is a copy of the current state

Notes

• Can use P* ∝ P(x); the normalizer cancels in the acceptance ratio
• Satisfies detailed balance (shown below)
• Q must be chosen so the chain is ergodic

    P(x) · T(x′ ← x) = P(x) · Q(x′; x) min(1, P(x′) Q(x; x′) / (P(x) Q(x′; x)))
                     = min(P(x) Q(x′; x), P(x′) Q(x; x′))
                     = P(x′) · Q(x; x′) min(P(x) Q(x′; x) / (P(x′) Q(x; x′)), 1)
                     = P(x′) · T(x ← x′)

Matlab/Octave code for demo

```matlab
function samples = dumb_metropolis(init, log_ptilde, iters, sigma)
D = numel(init);
samples = zeros(D, iters);
state = init;
Lp_state = log_ptilde(state);
for ss = 1:iters
    % Propose
    prop = state + sigma*randn(size(state));
    Lp_prop = log_ptilde(prop);
    if log(rand) < (Lp_prop - Lp_state)
        % Accept
        state = prop;
        Lp_state = Lp_prop;
    end
    samples(:, ss) = state(:);
end
```

Step-size demo

Explore N(0, 1) with different step sizes σ:

```matlab
sigma = @(s) plot(dumb_metropolis(0, @(x)-0.5*x*x, 1e3, s));
```

sigma(0.1)  →  99.8% accepts
sigma(1)    →  68.4% accepts
sigma(100)  →   0.5% accepts

[Figure: three trace plots over 1,000 iterations, one per step size.]

Diffusion time

Generic proposals use Q(x′; x) = N(x, σ²):

σ large → many rejections
σ small → slow diffusion: ∼ (L/σ)² iterations required

[Figure: a target P with lengthscale L and a narrow proposal Q of width σ. Adapted from MacKay (2003).]
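To reproduce the acceptance rates in the step-size demo without reading them off plots, a rough check using the dumb_metropolis function above. Counting repeated states slightly undercounts acceptances (an accepted move to exactly the same state has probability zero for continuous proposals), so this is an estimate only:

```matlab
% Estimate acceptance rates for the N(0,1) target at each step size.
for s = [0.1, 1, 100]
    out = dumb_metropolis(0, @(x) -0.5*x*x, 1e5, s);
    acc = mean(diff(out) ~= 0);     % fraction of iterations that moved
    fprintf('sigma = %5.1f: %4.1f%% accepts\n', s, 100*acc);
end
```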
An MCMC strategy

Come up with good proposals Q(x′; x).

Combine transition operators:

    x^(1) ∼ T_A(· ← x^(0))
    x^(2) ∼ T_B(· ← x^(1))
    x^(3) ∼ T_C(· ← x^(2))
    x^(4) ∼ T_A(· ← x^(3))
    x^(5) ∼ T_B(· ← x^(4))
    . . .

Slice sampling idea

Sample a point (x, u) uniformly under the curve P*(x) ∝ P(x):

    p(u | x) = Uniform[0, P*(x)]

    p(x | u) ∝ 1 if P*(x) ≥ u, 0 otherwise  =  "Uniform on the slice"

Slice sampling

Unimodal conditionals:
• place bracket randomly around point
• sample uniformly within bracket
• shrink bracket if P*(x) < u (off slice)
• accept first point on the slice

Multimodal conditionals:
• bracket slice
• linearly step out until bracket ends are off slice
• sample on bracket, shrinking as before

Satisfies detailed balance, leaves p(x | u) invariant.
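A compact 1D sketch of the recipe above (random bracket placement, linear stepping out, shrinkage on rejection), in the style of Neal (2003). The function name, interface, and step width w are my own choices, and it works on log P*(x) for numerical stability:

```matlab
function x = slice_sample_1d(x, log_pstar, w)
% One slice-sampling update of x. (Save as slice_sample_1d.m.)
logu = log_pstar(x) + log(rand());      % u ~ Uniform[0, P*(x)], in log space
% Place a bracket of width w randomly around the current point:
L = x - w*rand();  R = L + w;
% Linearly step out until both bracket ends are off the slice:
while log_pstar(L) > logu, L = L - w; end
while log_pstar(R) > logu, R = R + w; end
% Sample uniformly on the bracket, shrinking towards x on rejections:
while true
    xp = L + (R - L)*rand();
    if log_pstar(xp) > logu
        x = xp;  return;                % accept first point on the slice
    elseif xp < x
        L = xp;
    else
        R = xp;
    end
end
```

For example, repeated calls of x = slice_sample_1d(x, @(t) -0.5*t*t, 1) explore N(0, 1) with no tunable acceptance rate to worry about.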
Slice sampling

Advantages of slice-sampling:
• Easy — only requires P*(x) ∝ P(x)
• No rejections
• Tweak parameters are not too important

There are more advanced versions; Neal (2003) contains many ideas.

Summary

• We need approximate methods to solve sums/integrals.
• Monte Carlo does not explicitly depend on dimension. Using samples from simple Q(x) only works in low dimensions.
• Markov chain Monte Carlo (MCMC) can make local moves. Sample from complex distributions, even in high dimensions.
• Simple computations ⇒ "easy" to implement (harder to diagnose).

How should we run MCMC?

• The samples aren't independent. Should we thin, only keeping every Kth sample?
• Arbitrary initialization means starting iterations are bad. Should we discard a "burn-in" period?
• Maybe we should perform multiple runs?
• How do we know if we have run for long enough?

Forming estimates

Approximately independent samples can be obtained by thinning.
However, all the samples can be used:
use the simple Monte Carlo estimator on MCMC samples. It is:
— consistent
— unbiased if the chain has "burned in"

The correct motivation to thin: if computing f(x^(s)) is expensive.

In some special circumstances strategic thinning can help:
Steven N. MacEachern and Mario Peruggia, Statistics & Probability Letters, 47(1):91–98, 2000.
http://dx.doi.org/10.1016/S0167-7152(99)00142-X — thanks to Simon Lacoste-Julien for the reference.
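An empirical look at the "all the samples can be used" point, reusing dumb_metropolis from earlier. The chain length and thinning factor are arbitrary; both estimators are consistent for E[x] = 0 under N(0, 1), but the thinned one discards information:

```matlab
% Estimate E[x] under N(0,1) from one Metropolis chain, with and
% without thinning by a factor of 10.
chain = dumb_metropolis(0, @(x) -0.5*x*x, 1e5, 1);
est_all  = mean(chain);            % uses every (correlated) sample
est_thin = mean(chain(1:10:end));  % keeps every 10th sample only
fprintf('all samples: %.4f   thinned: %.4f\n', est_all, est_thin);
```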
Consistency checks

Do I get the right answer on tiny versions of my problem?
Can I make good inferences about synthetic data drawn from my model?

Getting it right: joint distribution tests of posterior simulators,
John Geweke, JASA, 99(467):799–804, 2004.

Getting it right

We write MCMC code to update θ | y.
Idea: also write code to sample y | θ.
Both codes leave P(θ, y) invariant.
Run the codes alternately. Check that the θ's match the prior.
(A toy sketch appears at the end of this section.)

Empirical diagnostics

[Figure from Rasmussen (2000).]

Recommendations

For diagnostics: standard software packages like R-CODA.

For opinion on thinning, multiple runs, burn in, etc.:
Practical Markov chain Monte Carlo, Charles J. Geyer,
Statistical Science, 7(4):473–483, 1992. http://www.jstor.org/stable/2246094

Posterior model checking: Gelman et al., Bayesian Data Analysis textbook and papers.

Summary

Write down the probability of everything.
Condition on what you know, sample everything that you don't.

Samples give plausible explanations:
— Look at them
— Average their predictions
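As promised under "Getting it right" above, a toy sketch of that check. The conjugate Gaussian model (θ ∼ N(0,1), y|θ ∼ N(θ,1)) is an assumption chosen so both conditional updates are exact; in a real test the θ|y update would be the MCMC operator being debugged:

```matlab
% Alternate sampling y|theta and updating theta|y. Both operators leave
% P(theta, y) invariant, so theta's marginal should match its N(0,1) prior.
S = 1e5; th = 0; ths = zeros(1, S);
for s = 1:S
    y  = th + randn();             % sample data:   y | theta ~ N(theta, 1)
    th = y/2 + sqrt(1/2)*randn();  % update params: theta | y ~ N(y/2, 1/2)
    ths(s) = th;
end
fprintf('mean %.3f (want 0), var %.3f (want 1)\n', mean(ths), var(ths));
```

A buggy θ|y update shows up as a mismatch between the sampled θ's and the prior, which is the point of Geweke's joint-distribution test.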