Introduction to MCMC and BUGS Basic recipes, and a sample of some - PowerPoint PPT Presentation

Introduction to MCMC and BUGS ◮ Basic recipes, and a sample of some techniques for getting started. ◮ Two simple worked out examples. ◮ R-code will be provided for those. ◮ No background in MCMC assumed. ◮ Not for experts!

MCMC: Reminder Slide Outline: ◮ Motivation ◮ Monte Carlo integration ◮ Markov chains ◮ MCMC

Bayesian Inference ◮ Data: Y (realisation y ) ◮ Parameters, latent variables: θ = ( θ 1 , θ 2 , . . . , θ k ) Assume: ◮ Likelihood: f ( y | θ ) . Distribution of data given parameters. ◮ Prior: π ( θ ) . Prior distribution, before we observe data. Inference is based on the joint posterior distribution. f ( y | θ ) π ( θ ) π ( θ | y ) = f ( y | θ ) π ( θ ) d θ R f ( y | θ ) π ( θ ) ∝ i . e . Posterior Likelihood × Prior ∝

Example 1: Normal-Cauchy iid ◮ Suppose Y 1 , . . . , Y n ∼ N ( θ, 1 ) . Don’t know θ . ◮ iid=independently and identically distributed. 1 ◮ Assume Cauchy prior distribution: π ( θ ) = π ( 1 + θ 2 ) . Posterior density: � � P n i = 1 ( y i − θ ) 2 π ( θ | y ) 1 ∝ exp − × 1 + θ 2 2 � � − n ( θ − ¯ y ) 2 1 ∝ × exp (can be shown) 1 + θ 2 2 Things of interest to Bayesians: � ∞ ◮ Posterior Mean = E ( θ | y ) = −∞ θ π ( θ | y ) d θ . ◮ Posterior Variance = Var ( θ | y ) = E ( θ 2 | y ) − { E ( θ | y ) } 2 . ◮ Credible interval: { a ( y ) , b ( y ) } for θ such that Pr { a ( y ) < θ < b ( y ) | y } = 0 . 95, for example.

Example 2: One sample normal. iid ◮ Data: Y 1 , . . . , Y n ∼ N ( µ, σ 2 ) . ◮ Let precision = τ = 1 /σ 2 . ◮ Assume a ‘flat’ prior distribution for µ , i.e. π ( µ ) = 1 ◮ and let τ ∼ Gamma ( a , b ) , a , b > 0, has mean a / b . π ( µ, τ ) ∝ τ a − 1 e − b τ . ◮ Joint posterior density: � � ( τ ) n / 2 + a − 1 exp − τ P ( y i − µ ) 2 π ( µ, τ | y ) − b τ ∝ 2 For making inference, we want quantities like: � ∞ � ∞ ◮ Posterior Mean = E ( µ | y ) = µ π ( µ, τ | y ) d µ d τ . −∞ 0 ◮ Posterior Variance = Var ( µ | y ) = Horrible double integrals! Difficult to solve in general. ◮ Credible intervals involve even more of these!

Monte Carlo (MC) integration ◮ General problem: Evaluate the expectation of any general function h ( X ) of the random variable X : � E π [ h ( X )] = h ( x ) π ( x ) dx � | h ( x ) | π ( x ) dx < ∞ . assuming that it exists, i.e. ◮ This can be very difficult, but if we can draw samples X ( 1 ) , X ( 2 ) , . . . , X ( N ) ∼ π ( x ) then we can estimate E π [ h ( X )] by N � � X ( t ) � h N = 1 h ¯ . N t = 1 ◮ This is the basic idea of statistics: “Estimate unknowns by drawing samples.” ◮ Change notation: Parameters = θ ≡ x ; Posterior = π ( θ | y ) = π ( x ) . Replace y by its value and suppress.

Consistency ◮ For independent samples, by Law of Large Numbers (LLN), N � � X ( t ) � 1 h N ¯ h = N t = 1 E π [ h ( X )] as N → ∞ . → (1) ◮ But independent sampling from π ( x ) is usually difficult. ◮ It turns out that (1), LLN, still applies if we generate samples using a Markov chain. But first, some revision of Markov chains.

Markov chains ◮ A Markov chain is generated by sampling X ( t + 1 ) ∼ p ( x | x ( t ) ) , t = 0 , 1 , 2 , . . . . � ✒ � � � p is called the transition kernel. ◮ So X ( t + 1 ) depends only on X ( t ) , not on X ( 0 ) , X ( 1 ) , . . . , X ( t − 1 ) . p ( X ( t + 1 ) | x ( t ) , x ( t − 1 ) , . . . ) = p ( X ( t + 1 ) | x ( t ) ) ◮ For example: X ( t + 1 ) | x ( t ) ∼ N ( 0 . 5 x ( t ) , 1 . 0 ) . ◮ This is called a first order auto-regressive process with lag-1 auto-correlation 0.5.

◮ Simulation of the chain: X ( t + 1 ) | x ( t ) ∼ N ( 0 . 5 x ( t ) , 1 . 0 ) with two different starting points x ( 0 ) (one red and one blue). 4 2 0 –2 –4 0 10 20 30 40 50 60 i ◮ After about 5–7 iterations the chains seemed to have forgotten their starting positions. ◮ Caution: Have p ( X ( t + 1 ) | x ( t ) ) = p ( X ( t + 1 ) | x ( t ) , x ( t − 1 ) ) , it does not mean that X ( t + 1 ) and X ( t − 1 ) are independent marginally.

Stationarity As t → ∞ , the Markov chain converges to its stationary distribution. � ✒ ✻ � � in distribution or invariant ◮ In the above example (it can be proved that), this is X ( t ) | x ( 0 ) ∼ N ( 0 . 0 , 1 . 33 ) , as t → ∞ which does not depend on x ( 0 ) . ◮ Does this happen for all Markov chains?

Irreducibility ◮ Assuming that a stationary distribution exists, it is unique if the chain is irreducible . ◮ Irreducible means any set of states (values) can be reached from any other state (value) in a finite number of moves. ◮ An example of a reducible Markov chain: Suppose p ( x | y ) = 0 for x ∈ A and y ∈ B and vice versa. ✗✔ B ✗✔ ✖✕ A ✖✕ ◮ Here the whole set of values is the rectangle.

Aperiodicity A Markov chain taking only a finite number of values is aperiodic if greatest common divisor (g.c.d.) of return times to any particular state i say, is 1. ◮ Think of recording the number of steps taken to return to the state 1. The g.c.d. of those numbers should be 1. ◮ If the g.c.d. is bigger than 1, 2 say, then the chain will return in cycles of 2, 4, 6, ... number of steps. This is not allowed for aperiodicity. ◮ Definition can be extended to general state space cases (where the Markov chain can take any value in a continuous interval).

Ergodicity Suppose that the Markov chain has the stationary distribution π ( x ) and is aperiodic and irreducible. Then it is said to be ergodic . ◮ Ergodic theorem : N � � X ( t ) � 1 h N h ¯ = N t = 1 E π [ h ( X )] as N → ∞ . → h N is called an ergodic average. ◮ ¯ ◮ If h = Var π [ h ( X )] < ∞ , σ 2 then the central limit theorem holds, i.e, ¯ h N converges to the normal distribution, ◮ and this convergence occurs geometrically fast.

Numerical standard errors (nse) � h N is h N ) , and for large N The nse of ¯ Var π (¯ � � � � N − 1 � � � ¯ � � σ 2 h N h ρ ℓ ( h ) nse ≈ 1 + 2 N ℓ = 1 � � where ρ ℓ ( h ) is the lag- ℓ auto-correlation in h ( X ( t ) ) . ◮ In general no simpler expression exist for the nse. ◮ See Geyer (1992), and Besag and Green (1993) for ideas and further references.

Numerical standard errors (nse) ... � � h ( X ( t ) ) ◮ If can be approximated as a first order auto-regressive process then � � ¯ � σ 2 1 + ρ h N h nse ≈ 1 − ρ, N � � h ( X ( t ) ) where ρ is the lag-1 auto-correlation of . ◮ The first factor is the usual term under independent sampling. ◮ The second term is usually > 1. ◮ And thus is the penalty to be paid for using a Markov chain instead of independent sampling.

Numerical standard errors (nse) ... Moreover, ◮ the nse may not be finite in general. ◮ the nse is finite if the chain converges geometrically. ◮ if the nse is finite, then we can make it as small as we like by increasing N . ◮ it can be proved that the following simple minded estimator of nse is not consistent. � � � � N − 1 � � � ¯ � � σ 2 h N h r ℓ ( h ) ˆ nse = 1 + 2 N ℓ = 1 where r ℓ ( h ) is the sample lag- ℓ auto-correlation in � � h ( x ( t ) ) . Will return to the estimation of nse later.

Markov chains – summary ◮ A Markov chain may have a stationary distribution. ◮ The stationary distribution is unique if the chain is irreducible. ◮ We can estimate nse’s if the chain is also geometrically convergent. Where does this all get us?

MCMC ◮ How do we construct a Markov chain whose stationary distribution is our posterior distribution, π ( x ) ? ◮ Metropolis et al (1953) showed us how. ◮ The method was generalized by Hastings (1970). This is called Markov chain Monte Carlo (MCMC).

Metropolis-Hastings algorithm At each iteration t : � y | x ( t ) � y ∼ q . Step 1 Sample ❅ ■ ❅ � ✒ � “Proposal” distribution � “candidate” point Step 2 Then with probability � � � � q x ( t ) | y 1 , π ( y ) α ( x ( t ) , y ) = min π ( x ( t ) ) q ( y | x ( t ) ) set x ( t + 1 ) = y (acceptance). If not accepted, set x ( t + 1 ) = x ( t ) (rejection).

Metropolis-Hastings algorithm... ◮ The normalising constant in π ( x ) is not required to run the algorithm. It cancels out in the ratio. ◮ If q ( y | x ) = π ( y ) , then we obtain independent samples, α ( x , y ) = 1 always. ◮ Usually q is chosen so that q ( y | x ) is easy to sample from. ◮ Theoretically, any density q ( ·| x ) having the same support (range) should work. However, some q ’s are better than others. See later. ◮ The induced Markov chains have the desirable properties under mild conditions on π ( x ) . ◮ π ( x ) is often called the target distribution . Recall the notation change θ to x .

Implementing MCMC : Reminder Slide ◮ Flavours of Metropolis-Hastings ◮ Gibbs Sampler ◮ Number of Chains ◮ Burn-in and run length ◮ Numerical standard errors

Introduction to MCMC and BUGS Basic recipes, and a sample of some - PowerPoint PPT Presentation

Introduction to MCMC and BUGS Basic recipes, and a sample of some techniques for getting started. Two simple worked out examples. R-code will be provided for those. No background in MCMC assumed. Not for experts! MCMC: Reminder

Defect Detection Thomas Zimmermann The First Bug September 9, 1947 More Bugs More Bugs More

Outline Bugs! 1 Avoiding and Finding bugs 2 Bugs still happen 3 Why do bugs still happen ?!

Additional notes on MCMC sampling Shravan Vasishth March 18, 2020 For more details on MCMC, some

An MCMC library for probabilistic programming Rob Zinkov June 13th, 2014 Rob Zinkov An MCMC

Testing MCMC Samplers Jason M.T. Roos First European Bayesian Summit in Marketing Testing MCMC

IN SCRUM PROJECTS Ramesh Shiraddi Bugs Current sprint bugs -- Created and found in current

Bugs, Bugs, Bugs Uwe Schindler Apache Lucene Committer & PMC Member uschindler@apache.org

BED BUGS HOW TO HELP SOLVE THE PROBLEM WHAT ARE BED BUGS? Bed bugs are parasites that feed on

IST-Pesticides RESEARCH SUPPORTED BY: Osborne Natural Enemies Bugs eating Bugs What

Part I. Hunting for Bugs Vadim Mutilin Institute for System Programming of the Russian Academy of

MCMC and Variational Inference for AutoEncoders Achille Thin 1 , Alain Durmus 2 , Eric Moulines 1 1

Parallel tempering and Interacting MCMC algorithms Gersende FORT / Eric MOULINES Telecom Paris

STAT 339 Markov Chain Monte Carlo (MCMC) 7 April 2017 Some theory and intuition about MCMC

Network determination based on birth-death MCMC inference A. Mohammadi and E. Wit February 4,

Modern Computational Statistics Lecture 8: Advanced MCMC Cheng Zhang School of Mathematical

FOR MCMC OLD HEADQUARTER CONFIDENTIAL BACKGROUND Existing MCMC Old HQ building is occupying

Efficient Use of HPC Resources for Turbulent Mixing Simulations Tulin Kaman (PI) Alaina Edwards

Comparison of Hydrologic Model Performance Statistics Using Rain Gauge and NEXRAD

Non-standard evaluation (NSE) An aside how we used to learn R An aside how we used to learn

tr r t qt

INAF-Astronomical Observatory of Padova IV. Explosive nucleosynthesis and chemical yields Luca

1 NSERC Information Session Scholarships and Fellowships 2018-2019 University of Guelph Sarah

Steady compressible Navier-Stokes equations with inflow boundary condition Tomasz Piasecki 1 ,

& Embedded Linux Philip Tricca philip.b.tricca@intel.com Agenda Background Security

Introduction to MCMC and BUGS Basic recipes, and a sample of some - PowerPoint PPT Presentation

Introduction to MCMC and BUGS Basic recipes, and a sample of some techniques for getting started. Two simple worked out examples. R-code will be provided for those. No background in MCMC assumed. Not for experts! MCMC: Reminder

Defect Detection Thomas Zimmermann The First Bug September 9, 1947 More Bugs More Bugs More

Outline Bugs! 1 Avoiding and Finding bugs 2 Bugs still happen 3 Why do bugs still happen ?!

Additional notes on MCMC sampling Shravan Vasishth March 18, 2020 For more details on MCMC, some

An MCMC library for probabilistic programming Rob Zinkov June 13th, 2014 Rob Zinkov An MCMC

Testing MCMC Samplers Jason M.T. Roos First European Bayesian Summit in Marketing Testing MCMC

IN SCRUM PROJECTS Ramesh Shiraddi Bugs Current sprint bugs -- Created and found in current

Bugs, Bugs, Bugs Uwe Schindler Apache Lucene Committer &amp; PMC Member uschindler@apache.org

BED BUGS HOW TO HELP SOLVE THE PROBLEM WHAT ARE BED BUGS? Bed bugs are parasites that feed on

IST-Pesticides RESEARCH SUPPORTED BY: Osborne Natural Enemies Bugs eating Bugs What

Part I. Hunting for Bugs Vadim Mutilin Institute for System Programming of the Russian Academy of

MCMC and Variational Inference for AutoEncoders Achille Thin 1 , Alain Durmus 2 , Eric Moulines 1 1

Parallel tempering and Interacting MCMC algorithms Gersende FORT / Eric MOULINES Telecom Paris

STAT 339 Markov Chain Monte Carlo (MCMC) 7 April 2017 Some theory and intuition about MCMC

Network determination based on birth-death MCMC inference A. Mohammadi and E. Wit February 4,

Modern Computational Statistics Lecture 8: Advanced MCMC Cheng Zhang School of Mathematical

FOR MCMC OLD HEADQUARTER CONFIDENTIAL BACKGROUND Existing MCMC Old HQ building is occupying

Efficient Use of HPC Resources for Turbulent Mixing Simulations Tulin Kaman (PI) Alaina Edwards

Comparison of Hydrologic Model Performance Statistics Using Rain Gauge and NEXRAD

Non-standard evaluation (NSE) An aside how we used to learn R An aside how we used to learn

tr r t qt

INAF-Astronomical Observatory of Padova IV. Explosive nucleosynthesis and chemical yields Luca

1 NSERC Information Session Scholarships and Fellowships 2018-2019 University of Guelph Sarah

Steady compressible Navier-Stokes equations with inflow boundary condition Tomasz Piasecki 1 ,

&amp; Embedded Linux Philip Tricca philip.b.tricca@intel.com Agenda Background Security

Bugs, Bugs, Bugs Uwe Schindler Apache Lucene Committer & PMC Member uschindler@apache.org

& Embedded Linux Philip Tricca philip.b.tricca@intel.com Agenda Background Security