Advanced Simulation - Lecture 10
Patrick Rebeschini
February 14th, 2018
Outline
Often we have several possible models for the same dataset; sometimes there is an infinity of possible models! How to choose between models?
Reference: Green (1995), Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika.
Motivation: Bayesian model choice
Assume we have a collection of models $\mathcal{M}_k$ for $k \in \mathcal{K}$. With data we can learn the parameters given each model $\mathcal{M}_k$, but we can also learn about the models themselves.
Put a prior on the models $\mathcal{M}_k$. Within each model, put a prior $p(\theta_k \mid \mathcal{M}_k)$ on the parameters.
Joint posterior distribution of interest:
$$\pi(\mathcal{M}_k, \theta_k \mid y) = \pi(\mathcal{M}_k \mid y)\, \pi(\theta_k \mid y, \mathcal{M}_k),$$
which is defined on $\cup_{k \in \mathcal{K}} \{\mathcal{M}_k\} \times \Theta_k \equiv \cup_{k \in \mathcal{K}} \{k\} \times \Theta_k$.
Polynomial regression
Data $(x_i, y_i)_{i=1}^n$ where $(x_i, y_i) \in \mathbb{R} \times \mathbb{R}$. Polynomial regression model:
$$\mathcal{M}_k: \quad y = \underbrace{\sum_{j=0}^{k} \beta_j x^j}_{=\, f(x;\, \beta)} + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2).$$
If $k$ is too large, then
$$f(x; \hat{\beta}) = \sum_{j=0}^{k} \hat{\beta}_j x^j,$$
where $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k)$ is the MLE, will overfit.
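To see the overfitting numerically, here is a minimal sketch (the simulated dataset with a cubic ground truth and the noise level are illustrative assumptions, not from the lecture):

```python
# Sketch: MLE polynomial fits of increasing order k overfit the data.
# The cubic ground truth and the noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = np.linspace(0, 10, n)
y = 0.1 * x**3 - 1.5 * x**2 + 5 * x + rng.normal(0, 3, size=n)

for k in range(8):
    beta_hat = np.polyfit(x, y, deg=k)          # MLE under Gaussian noise
    residuals = y - np.polyval(beta_hat, x)
    print(f"k = {k}: training RSS = {residuals @ residuals:8.2f}")
# The training RSS decreases monotonically in k, even once the fit is
# chasing noise rather than signal: the hallmark of overfitting.
```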
Polynomial regression
[Figure: MLE fits $f(x; \hat{\beta})$ for model orders $M = 0, \dots, 7$. As the order of the model $M = k$ increases, we overfit.]
Bayesian polynomial regression
We select $k \in \{0, \dots, M_{\max}\}$ and
$$\mathbb{P}(\mathcal{M}_k) = p_k = \frac{1}{M_{\max} + 1},$$
with $\Theta_k = \mathbb{R}^{k+1} \times \mathbb{R}_+$ and
$$p_k(\beta, \sigma^2) = \mathcal{N}(\beta; 0, \sigma^2 I_{k+1})\; \mathcal{IG}(\sigma^2; 1, 1).$$
In this case, we have an analytic expression for the evidence
$$p_k(y_{1:n}) = \int_{\Theta_k} \prod_{i=1}^{n} \mathcal{N}(y_i; f(x_i; \beta), \sigma^2)\; p_k(\beta, \sigma^2)\, d\beta\, d\sigma^2.$$
Bayesian model selection automatically prevents overfitting.
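As a sketch of how this evidence can be computed (the closed form below is the standard normal-inverse-gamma marginal likelihood; the function name and the simulated data are my own):

```python
# Sketch: closed-form log evidence p_k(y_{1:n}) for the order-k polynomial
# model with prior beta | sigma^2 ~ N(0, sigma^2 I_{k+1}), sigma^2 ~ IG(1, 1),
# using the standard normal-inverse-gamma marginal-likelihood formula.
import numpy as np
from scipy.special import gammaln

def log_evidence(x, y, k, a0=1.0, b0=1.0):
    n = len(y)
    X = np.vander(x, k + 1, increasing=True)   # design matrix (1, x, ..., x^k)
    Vn_inv = np.eye(k + 1) + X.T @ X           # posterior precision (V0 = I)
    mn = np.linalg.solve(Vn_inv, X.T @ y)      # posterior mean of beta
    an = a0 + 0.5 * n
    bn = b0 + 0.5 * (y @ y - mn @ (Vn_inv @ mn))
    _, logdet = np.linalg.slogdet(Vn_inv)
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * logdet
            + a0 * np.log(b0) - an * np.log(bn)
            + gammaln(an) - gammaln(a0))

# Posterior over models: the uniform prior cancels, so normalise the evidences.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 2 + x - 0.5 * x**2 + rng.normal(0, 2, size=30)   # true order is 2
log_ev = np.array([log_evidence(x, y, k) for k in range(8)])
post = np.exp(log_ev - log_ev.max()); post /= post.sum()
print(np.round(post, 3))   # mass should concentrate near k = 2
```

Working on the log scale avoids overflow in the ratio $b_0^{a_0}/b_n^{a_n}$ and in the Gamma functions.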
Bayesian polynomial regression
[Figure: $f(x; \beta)$ for random draws from $p_M(\beta \mid y_{1:n})$, $M = 0, \dots, 7$, together with the model evidence $P(Y \mid M)$ plotted against $M$.]
Motivation: mixture models
Assume the observations $Y_1, \dots, Y_n$ come from
$$\sum_{k=1}^{K} p_k\, \mathcal{N}(\mu_k, \sigma_k^2), \qquad \text{with } \sum_{k=1}^{K} p_k = 1.$$
For any fixed $K$, the parameters to infer are $(p_1, \dots, p_{K-1}, \mu_1, \dots, \mu_K, \sigma_1^2, \dots, \sigma_K^2)$, of dimension $3K - 1$.
But what about inference on $K$? We can put a prior on $K$, e.g. a Poisson distribution. How do we get the posterior?
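For concreteness, a minimal sketch of the generative model with a prior on $K$ (the Poisson rate and the priors on the component parameters are illustrative assumptions, not from the lecture):

```python
# Sketch: simulate data from the mixture model, with K itself random.
# K ~ 1 + Poisson(2) so that K >= 1; component parameters drawn from
# illustrative priors (these choices are assumptions, not from the slides).
import numpy as np

rng = np.random.default_rng(2)
K = 1 + rng.poisson(2)                  # prior on the number of components
p = rng.dirichlet(np.ones(K))           # mixture weights, sum to 1
mu = rng.normal(0, 5, size=K)           # component means
sigma = rng.gamma(2, 1, size=K)         # component standard deviations

n = 500
z = rng.choice(K, size=n, p=p)          # latent component labels
y = rng.normal(mu[z], sigma[z])         # observations Y_1, ..., Y_n
print(K, np.round(p, 2))
```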
Sampling in transdimensional spaces
Consider a collection of models $\mathcal{M}_k$, for $k \in \mathcal{K} \subset \mathbb{N}$. We want to design a Markov chain taking values in $\cup_{k \in \mathcal{K}} \{k\} \times \Theta_k$, with the correct joint posterior.
Reversible jump MCMC is a generalized Metropolis-Hastings algorithm using a mixture of kernels. For each $k$, a standard MH kernel moves from $\{k\} \times \Theta_k$ to $\{k\} \times \Theta_k$, i.e. standard within-model moves.
How to move from $\{k\} \times \Theta_k$ to $\{k'\} \times \Theta_{k'}$?
Transdimensional moves
We can propose $k'$ from $q(k' \mid k)$. Then we need to propose a move from $\Theta_k$ to $\Theta_{k'}$, of dimensions $d_k$ and $d_{k'}$ respectively.
Dimension matching: extend the spaces with auxiliary variables. Introduce $u_{k \to k'}$ and $u_{k' \to k}$ with distributions $\varphi_{k \to k'}$ and $\varphi_{k' \to k}$ respectively, and such that
$$d_k + \dim(u_{k \to k'}) = d_{k'} + \dim(u_{k' \to k}).$$
Transdimensional moves
Given $\theta_k$, we sample $u_{k \to k'} \sim \varphi_{k \to k'}$ and then apply a deterministic mapping to get
$$(\theta_{k'}, u_{k' \to k}) = G_{k \to k'}(\theta_k, u_{k \to k'}).$$
The distributions $\varphi$ are arbitrary and $G_{k \to k'}$ has to be a diffeomorphism. We now have our proposal from $\Theta_k$ to $\Theta_{k'}$; a concrete instance is sketched below. With what probability do we accept it?
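As a concrete instance (an illustrative sketch, not prescribed by the lecture): take $\Theta_k = \mathbb{R}$, $\Theta_{k'} = \mathbb{R}^2$, $\dim(u_{k \to k'}) = 1$ and $\dim(u_{k' \to k}) = 0$, so that $1 + 1 = 2 + 0$, with $G_{k \to k'}(\theta, u) = (\theta, u)$:

```python
# Sketch: dimension matching between Theta_k = R (d_k = 1) and
# Theta_k' = R^2 (d_k' = 2): take dim(u_{k->k'}) = 1, dim(u_{k'->k}) = 0,
# and the mapping G(theta, u) = (theta, u), a diffeomorphism from R^2 to R^2.
import numpy as np

def G(theta, u):
    """Map (theta, u) in R x R to theta' = (theta, u) in R^2."""
    return np.array([theta, u])

def G_inv(theta_prime):
    """Inverse map: recover (theta, u) from theta' in R^2."""
    return theta_prime[0], theta_prime[1]

# The Jacobian of G is the 2x2 identity, so |det dG/d(theta, u)| = 1.
# Check numerically with forward differences:
eps = 1e-6
theta, u = 0.3, -1.2
base = G(theta, u)
J = np.column_stack([(G(theta + eps, u) - base) / eps,
                     (G(theta, u + eps) - base) / eps])
print(abs(np.linalg.det(J)))   # ~ 1.0
```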
Transdimensional moves
Mimicking Metropolis-Hastings, given $x$ we propose a point $x'$ and accept it or not with probability $\alpha(x \to x')$. We want $P$ to be such that, for all $A, B$:
$$\int_{x, x' \in A \times B} \pi(dx)\, P(x \to dx') = \int_{x, x' \in A \times B} \pi(dx')\, P(x' \to dx)$$
or equivalently
$$\int_{x, x' \in A \times B} \pi(dx)\, q(x \to dx')\, \alpha(x \to x') = \int_{x, x' \in A \times B} \pi(dx')\, q(x' \to dx)\, \alpha(x' \to x).$$
Transdimensional moves
Subtle point: $\pi(dx)\, P(x, dx')$ does not necessarily admit a density with respect to a standard measure; for instance, $P(x, \cdot)$ has an atom at $x$ itself, coming from the rejection probability. We cannot write e.g.
$$\pi(dx)\, P(x, dx') = \pi(x)\, P(x, x')\, dx\, dx'.$$
However, $\pi(dx)\, q(x, dx')$ can be assumed to be dominated, and we write
$$\pi(dx)\, q(x, dx') = \pi(x)\, q(x, x')\, dx\, dx'.$$
Transdimensional moves
The first term is:
$$\int_{x, x' \in A \times B} \pi(x)\, q(x \to x')\, \alpha(x \to x')\, dx\, dx'.$$
Suppose we propose $x'$ by sampling $u \sim \varphi$ and then taking $(x', u') = G(x, u)$ deterministically. We write $x'(x, u)$ and $u'(x, u)$. The expression becomes
$$\int_{x,\, x'(x,u) \in A \times B} \pi(x)\, \varphi(u)\, \alpha(x \to x'(x, u))\, dx\, du.$$
What is the reverse transition from $x'$ to $x$? Sample $u' \sim \varphi'$ and take $(x, u) = G^{-1}(x', u')$.
Transdimensional moves
The second term was:
$$\int_{x, x' \in A \times B} \pi(x')\, q(x' \to x)\, \alpha(x' \to x)\, dx\, dx'.$$
It becomes, with $(x, u) = G^{-1}(x', u')$:
$$\int_{x(x',u'),\, x' \in A \times B} \pi(x')\, \varphi'(u')\, \alpha(x' \to x(x', u'))\, dx'\, du'.$$
Let us do a change of variables to get an integral with respect to $dx\, du$ instead of $dx'\, du'$:
$$\int_{x,\, x'(x,u) \in A \times B} \pi(x'(x, u))\, \varphi'(u'(x, u))\, \alpha(x'(x, u) \to x) \left|\frac{\partial G(x, u)}{\partial (x, u)}\right| dx\, du.$$
Transdimensional moves
We see that the integrals are equal if
$$\pi(x)\, \varphi(u)\, \alpha(x \to x'(x, u)) = \pi(x'(x, u))\, \varphi'(u'(x, u))\, \alpha(x'(x, u) \to x) \left|\frac{\partial G(x, u)}{\partial (x, u)}\right|.$$
Thus a valid choice of $\alpha(x \to x')$ is
$$\alpha(x \to x') = \min\left\{1,\ \frac{\pi(x')\, \varphi'(u')}{\pi(x)\, \varphi(u)} \left|\frac{\partial G(x, u)}{\partial (x, u)}\right|\right\}.$$
Transdimensional moves
We can now answer the initial question: how to move from $\{k\} \times \Theta_k$ to some other $\{k'\} \times \Theta_{k'}$?
We start from some $(k, \theta_k)$. Sample $k' \sim q(k \to k')$, then sample $u_{k \to k'} \sim \varphi_{k \to k'}$. Compute deterministically $(\theta_{k'}, u_{k' \to k}) = G_{k \to k'}(\theta_k, u_{k \to k'})$. Compute
$$\alpha_{k \to k'} = \min\left\{1,\ \frac{q(k' \to k)}{q(k \to k')}\, \frac{\pi(\theta_{k'})\, \varphi_{k' \to k}(u_{k' \to k})}{\pi(\theta_k)\, \varphi_{k \to k'}(u_{k \to k'})}\, J_{k \to k'}(\theta_k, u_{k \to k'})\right\}$$
where
$$J_{k \to k'}(\theta_k, u_{k \to k'}) = \left|\frac{\partial G_{k \to k'}(\theta_k, u_{k \to k'})}{\partial(\theta_k, u_{k \to k'})}\right|.$$
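In code, the acceptance probability can be computed on the log scale for numerical stability (a sketch; the function name and signature are my own):

```python
# Sketch: the between-model acceptance probability alpha_{k->k'} on the
# log scale. jacobian is |det dG_{k->k'} / d(theta_k, u_{k->k'})|. If a move
# has no reverse auxiliary variable (dim(u_{k'->k}) = 0), pass log_phi_rev = 0.0.
import numpy as np

def log_accept_prob(log_pi_new, log_pi_old,    # log pi(theta_k'), log pi(theta_k)
                    log_phi_rev, log_phi_fwd,  # log phi_{k'->k}(u_{k'->k}), log phi_{k->k'}(u_{k->k'})
                    log_q_rev, log_q_fwd,      # log q(k'->k), log q(k->k')
                    jacobian):
    log_ratio = (log_q_rev - log_q_fwd
                 + log_pi_new + log_phi_rev
                 - log_pi_old - log_phi_fwd
                 + np.log(jacobian))
    return min(0.0, log_ratio)                 # log alpha_{k->k'} = min(0, log ratio)

# Accept the proposed jump when log(U) < log_accept_prob(...), U ~ Uniform(0, 1).
```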
Reversible Jump algorithm
Starting from $(k^{(0)}, \theta^{(0)})$, iterate for $t = 1, 2, 3, \dots$:
With probability $\beta$, set $k^{(t)} = k^{(t-1)}$ and do one step of a kernel $K_{k^{(t)}}$ leaving $\pi(\theta_{k^{(t)}} \mid y, \mathcal{M}_{k^{(t)}})$ invariant.
With probability $1 - \beta$, propose $k' \sim q(k' \mid k^{(t-1)})$, draw $u_{k^{(t-1)} \to k'} \sim \varphi_{k^{(t-1)} \to k'}$, and apply the deterministic mapping $G_{k^{(t-1)} \to k'}$ to get $(\theta', u')$. With the "between-models" acceptance probability $\alpha_{k^{(t-1)} \to k'}$ of the previous slide: accept, i.e. set $\theta^{(t)} = \theta'$, $k^{(t)} = k'$; otherwise reject, i.e. set $\theta^{(t)} = \theta^{(t-1)}$, $k^{(t)} = k^{(t-1)}$.
Toy example
Two models, uniform prior on models: $p(\mathcal{M}_1) = p(\mathcal{M}_2) = \frac{1}{2}$.
In model $\mathcal{M}_1$, $\theta \in \mathbb{R}$ and we can evaluate pointwise
$$\text{posterior}_1(\theta) \propto p(\theta \mid \mathcal{M}_1)\, L(\theta \mid \mathcal{M}_1) = \exp\left(-\tfrac{1}{2}\theta^2\right).$$
In model $\mathcal{M}_2$, $\theta \in \mathbb{R}^2$ and we can evaluate pointwise
$$\text{posterior}_2(\theta) \propto p(\theta \mid \mathcal{M}_2)\, L(\theta \mid \mathcal{M}_2) = \exp\left(-\tfrac{1}{2}\theta_1^2 - \tfrac{1}{2}\theta_2^2\right).$$
Toy situation
In terms of model comparison, we should find
$$\frac{p(\mathcal{M}_2 \mid y)}{p(\mathcal{M}_1 \mid y)} = \frac{p(y \mid \mathcal{M}_2)\, p(\mathcal{M}_2)}{p(y \mid \mathcal{M}_1)\, p(\mathcal{M}_1)} = \frac{\int_{\mathbb{R}^2} p(\theta \mid \mathcal{M}_2)\, L(\theta \mid \mathcal{M}_2)\, d\theta \times \frac{1}{2}}{\int_{\mathbb{R}} p(\theta \mid \mathcal{M}_1)\, L(\theta \mid \mathcal{M}_1)\, d\theta \times \frac{1}{2}} = \frac{2\pi}{\sqrt{2\pi}} = \sqrt{2\pi} \approx 2.5066.$$
In terms of parameters, in model $\mathcal{M}_1$, $\theta \sim \mathcal{N}(0, 1)$, and in model $\mathcal{M}_2$, $\theta \sim \mathcal{N}\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right)$.
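A self-contained sketch of reversible jump MCMC on this toy example (the choices $\beta = 1/2$, the random-walk within-model move, $u \sim \mathcal{N}(0, 1)$ and the mapping $G_{1 \to 2}(\theta, u) = (\theta, u)$ with unit Jacobian are illustrative assumptions):

```python
# Sketch: reversible jump MCMC on the toy example.
# Model 1: theta in R; model 2: theta in R^2; target exp(-|theta|^2 / 2) on each.
# Jump 1 -> 2: draw u ~ N(0, 1), set theta' = (theta, u); Jacobian = 1.
# Jump 2 -> 1: drop the second coordinate, which plays the role of u'.
# With only two models, q(1->2) = q(2->1) = 1, so the q ratio cancels.
import numpy as np

rng = np.random.default_rng(3)

def log_post(theta):
    # Unnormalised log posterior of either model; the uniform model
    # prior 1/2 is a common constant and cancels in all ratios.
    return -0.5 * np.sum(theta**2)

def log_phi(u):
    # Log density of the auxiliary variable u ~ N(0, 1).
    return -0.5 * u**2 - 0.5 * np.log(2 * np.pi)

beta = 0.5                          # probability of a within-model move
k, theta = 1, np.array([0.0])
counts = {1: 0, 2: 0}
for t in range(200_000):
    if rng.random() < beta:
        # Within-model random-walk Metropolis-Hastings step.
        prop = theta + 0.5 * rng.normal(size=theta.shape)
        if np.log(rng.random()) < log_post(prop) - log_post(theta):
            theta = prop
    else:
        # Between-model move with G(theta, u) = (theta, u), |J| = 1.
        if k == 1:
            u = rng.normal()
            prop_k, prop = 2, np.array([theta[0], u])
            log_alpha = log_post(prop) - log_post(theta) - log_phi(u)
        else:
            u = theta[1]            # reverse auxiliary variable
            prop_k, prop = 1, theta[:1]
            log_alpha = log_post(prop) + log_phi(u) - log_post(theta)
        if np.log(rng.random()) < log_alpha:
            k, theta = prop_k, prop
    counts[k] += 1

print(counts[2] / counts[1])        # should be close to sqrt(2*pi) ~ 2.5066
```

The fraction of time the chain spends in each model estimates the posterior model probabilities, so the printed ratio should settle near the analytic value $\sqrt{2\pi}$.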