Advanced Simulation - Lecture 10
Patrick Rebeschini
February 14th, 2018
Outline
Often we have several possible models for the same dataset; sometimes there is an infinity of possible models! How to choose between models?
Reference: Green (1995), Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika.
Motivation: Bayesian model choice
Assume we have a collection of models $\mathcal{M}_k$ for $k \in \mathcal{K}$. With data we can learn the parameters given each model $\mathcal{M}_k$, but we can also learn about the models themselves.
Put a prior on the models $\mathcal{M}_k$. Within each model, put a prior $p(\theta_k \mid \mathcal{M}_k)$ on the parameters.
Joint posterior distribution of interest:
$$\pi(\mathcal{M}_k, \theta_k \mid y) = \pi(\mathcal{M}_k \mid y)\, \pi(\theta_k \mid y, \mathcal{M}_k),$$
which is defined on $\cup_{k \in \mathcal{K}} \{\mathcal{M}_k\} \times \Theta_k \equiv \cup_{k \in \mathcal{K}} \{k\} \times \Theta_k$.
Polynomial regression
Data $(x_i, y_i)_{i=1}^n$ where $(x_i, y_i) \in \mathbb{R} \times \mathbb{R}$. Polynomial regression model:
$$\mathcal{M}_k: \quad y = \underbrace{\sum_{j=0}^{k} \beta_j x^j}_{=\, f(x;\, \beta)} + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2).$$
If $k$ is too large, then
$$f(x; \hat{\beta}) = \sum_{j=0}^{k} \hat{\beta}_j x^j,$$
where $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k)$ is the MLE, will overfit.
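To see the overfitting numerically, here is a minimal sketch (the simulated dataset with a cubic ground truth and the noise level are illustrative assumptions, not from the lecture):

```python
# Sketch: MLE polynomial fits of increasing order k overfit the data.
# The cubic ground truth and the noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = np.linspace(0, 10, n)
y = 0.1 * x**3 - 1.5 * x**2 + 5 * x + rng.normal(0, 3, size=n)

for k in range(8):
    beta_hat = np.polyfit(x, y, deg=k)          # MLE under Gaussian noise
    residuals = y - np.polyval(beta_hat, x)
    print(f"k = {k}: training RSS = {residuals @ residuals:8.2f}")
# The training RSS decreases monotonically in k, even once the fit is
# chasing noise rather than signal: the hallmark of overfitting.
```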
Polynomial regression
[Figure: MLE fits $f(x; \hat{\beta})$ for model orders $M = 0, \dots, 7$. As the order of the model $M = k$ increases, we overfit.]
Bayesian polynomial regression
We select $k \in \{0, \dots, M_{\max}\}$ and
$$\mathbb{P}(\mathcal{M}_k) = p_k = \frac{1}{M_{\max} + 1},$$
with $\Theta_k = \mathbb{R}^{k+1} \times \mathbb{R}_+$ and
$$p_k(\beta, \sigma^2) = \mathcal{N}(\beta; 0, \sigma^2 I_{k+1})\; \mathcal{IG}(\sigma^2; 1, 1).$$
In this case, we have an analytic expression for the evidence
$$p_k(y_{1:n}) = \int_{\Theta_k} \prod_{i=1}^{n} \mathcal{N}(y_i; f(x_i; \beta), \sigma^2)\; p_k(\beta, \sigma^2)\, d\beta\, d\sigma^2.$$
Bayesian model selection automatically prevents overfitting.
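As a sketch of how this evidence can be computed (the closed form below is the standard normal-inverse-gamma marginal likelihood; the function name and the simulated data are my own):

```python
# Sketch: closed-form log evidence p_k(y_{1:n}) for the order-k polynomial
# model with prior beta | sigma^2 ~ N(0, sigma^2 I_{k+1}), sigma^2 ~ IG(1, 1),
# using the standard normal-inverse-gamma marginal-likelihood formula.
import numpy as np
from scipy.special import gammaln

def log_evidence(x, y, k, a0=1.0, b0=1.0):
    n = len(y)
    X = np.vander(x, k + 1, increasing=True)   # design matrix (1, x, ..., x^k)
    Vn_inv = np.eye(k + 1) + X.T @ X           # posterior precision (V0 = I)
    mn = np.linalg.solve(Vn_inv, X.T @ y)      # posterior mean of beta
    an = a0 + 0.5 * n
    bn = b0 + 0.5 * (y @ y - mn @ (Vn_inv @ mn))
    _, logdet = np.linalg.slogdet(Vn_inv)
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * logdet
            + a0 * np.log(b0) - an * np.log(bn)
            + gammaln(an) - gammaln(a0))

# Posterior over models: the uniform prior cancels, so normalise the evidences.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 2 + x - 0.5 * x**2 + rng.normal(0, 2, size=30)   # true order is 2
log_ev = np.array([log_evidence(x, y, k) for k in range(8)])
post = np.exp(log_ev - log_ev.max()); post /= post.sum()
print(np.round(post, 3))   # mass should concentrate near k = 2
```

Working on the log scale avoids overflow in the ratio $b_0^{a_0}/b_n^{a_n}$ and in the Gamma functions.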
Bayesian polynomial regression
[Figure: $f(x; \beta)$ for random draws from $p_M(\beta \mid y_{1:n})$, $M = 0, \dots, 7$, together with the model evidence $P(Y \mid M)$ plotted against $M$.]
Motivation: mixture models
Assume the observations $Y_1, \dots, Y_n$ come from
$$\sum_{k=1}^{K} p_k\, \mathcal{N}(\mu_k, \sigma_k^2), \qquad \text{with } \sum_{k=1}^{K} p_k = 1.$$
For any fixed $K$, the parameters to infer are $(p_1, \dots, p_{K-1}, \mu_1, \dots, \mu_K, \sigma_1^2, \dots, \sigma_K^2)$, of dimension $3K - 1$.
But what about inference on $K$? We can put a prior on $K$, e.g. a Poisson distribution. How do we get the posterior?
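For concreteness, a minimal sketch of the generative model with a prior on $K$ (the Poisson rate and the priors on the component parameters are illustrative assumptions, not from the lecture):

```python
# Sketch: simulate data from the mixture model, with K itself random.
# K ~ 1 + Poisson(2) so that K >= 1; component parameters drawn from
# illustrative priors (these choices are assumptions, not from the slides).
import numpy as np

rng = np.random.default_rng(2)
K = 1 + rng.poisson(2)                  # prior on the number of components
p = rng.dirichlet(np.ones(K))           # mixture weights, sum to 1
mu = rng.normal(0, 5, size=K)           # component means
sigma = rng.gamma(2, 1, size=K)         # component standard deviations

n = 500
z = rng.choice(K, size=n, p=p)          # latent component labels
y = rng.normal(mu[z], sigma[z])         # observations Y_1, ..., Y_n
print(K, np.round(p, 2))
```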
Sampling in transdimensional spaces
Consider a collection of models $\mathcal{M}_k$, for $k \in \mathcal{K} \subset \mathbb{N}$. We want to design a Markov chain taking values in $\cup_{k \in \mathcal{K}} \{k\} \times \Theta_k$, with the correct joint posterior.
Reversible jump MCMC is a generalized Metropolis-Hastings algorithm using a mixture of kernels. For each $k$, a standard MH kernel moves from $\{k\} \times \Theta_k$ to $\{k\} \times \Theta_k$, i.e. standard within-model moves.
How to move from $\{k\} \times \Theta_k$ to $\{k'\} \times \Theta_{k'}$?
Transdimensional moves
We can propose $k'$ from $q(k' \mid k)$. Then we need to propose a move from $\Theta_k$ to $\Theta_{k'}$, of dimensions $d_k$ and $d_{k'}$ respectively.
Dimension matching: extend the spaces with auxiliary variables. Introduce $u_{k \to k'}$ and $u_{k' \to k}$ with distributions $\varphi_{k \to k'}$ and $\varphi_{k' \to k}$ respectively, and such that
$$d_k + \dim(u_{k \to k'}) = d_{k'} + \dim(u_{k' \to k}).$$
Transdimensional moves
Given $\theta_k$, we sample $u_{k \to k'} \sim \varphi_{k \to k'}$ and then apply a deterministic mapping to get
$$(\theta_{k'}, u_{k' \to k}) = G_{k \to k'}(\theta_k, u_{k \to k'}).$$
The distributions $\varphi$ are arbitrary and $G_{k \to k'}$ has to be a diffeomorphism. We now have our proposal from $\Theta_k$ to $\Theta_{k'}$; a concrete instance is sketched below. With what probability do we accept it?
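As a concrete instance (an illustrative sketch, not prescribed by the lecture): take $\Theta_k = \mathbb{R}$, $\Theta_{k'} = \mathbb{R}^2$, $\dim(u_{k \to k'}) = 1$ and $\dim(u_{k' \to k}) = 0$, so that $1 + 1 = 2 + 0$, with $G_{k \to k'}(\theta, u) = (\theta, u)$:

```python
# Sketch: dimension matching between Theta_k = R (d_k = 1) and
# Theta_k' = R^2 (d_k' = 2): take dim(u_{k->k'}) = 1, dim(u_{k'->k}) = 0,
# and the mapping G(theta, u) = (theta, u), a diffeomorphism from R^2 to R^2.
import numpy as np

def G(theta, u):
    """Map (theta, u) in R x R to theta' = (theta, u) in R^2."""
    return np.array([theta, u])

def G_inv(theta_prime):
    """Inverse map: recover (theta, u) from theta' in R^2."""
    return theta_prime[0], theta_prime[1]

# The Jacobian of G is the 2x2 identity, so |det dG/d(theta, u)| = 1.
# Check numerically with forward differences:
eps = 1e-6
theta, u = 0.3, -1.2
base = G(theta, u)
J = np.column_stack([(G(theta + eps, u) - base) / eps,
                     (G(theta, u + eps) - base) / eps])
print(abs(np.linalg.det(J)))   # ~ 1.0
```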
Transdimensional moves
Mimicking Metropolis-Hastings, given $x$ we propose a point $x'$ and accept it or not with probability $\alpha(x \to x')$. We want $P$ to be such that, for all $A, B$:
$$\int_{x, x' \in A \times B} \pi(dx)\, P(x \to dx') = \int_{x, x' \in A \times B} \pi(dx')\, P(x' \to dx)$$
or equivalently
$$\int_{x, x' \in A \times B} \pi(dx)\, q(x \to dx')\, \alpha(x \to x') = \int_{x, x' \in A \times B} \pi(dx')\, q(x' \to dx)\, \alpha(x' \to x).$$
Transdimensional moves
Subtle point: $\pi(dx)\, P(x, dx')$ does not necessarily admit a density with respect to a standard measure; for instance, $P(x, \cdot)$ has an atom at $x$ itself, coming from the rejection probability. We cannot write e.g.
$$\pi(dx)\, P(x, dx') = \pi(x)\, P(x, x')\, dx\, dx'.$$
However, $\pi(dx)\, q(x, dx')$ can be assumed to be dominated, and we write
$$\pi(dx)\, q(x, dx') = \pi(x)\, q(x, x')\, dx\, dx'.$$
Transdimensional moves
The first term is:
$$\int_{x, x' \in A \times B} \pi(x)\, q(x \to x')\, \alpha(x \to x')\, dx\, dx'.$$
Suppose we propose $x'$ by sampling $u \sim \varphi$ and then taking $(x', u') = G(x, u)$ deterministically. We write $x'(x, u)$ and $u'(x, u)$. The expression becomes
$$\int_{x,\, x'(x,u) \in A \times B} \pi(x)\, \varphi(u)\, \alpha(x \to x'(x, u))\, dx\, du.$$
What is the reverse transition from $x'$ to $x$? Sample $u' \sim \varphi'$ and take $(x, u) = G^{-1}(x', u')$.
Transdimensional moves
The second term was:
$$\int_{x, x' \in A \times B} \pi(x')\, q(x' \to x)\, \alpha(x' \to x)\, dx\, dx'.$$
It becomes, with $(x, u) = G^{-1}(x', u')$:
$$\int_{x(x',u'),\, x' \in A \times B} \pi(x')\, \varphi'(u')\, \alpha(x' \to x(x', u'))\, dx'\, du'.$$
Let us do a change of variables to get an integral with respect to $dx\, du$ instead of $dx'\, du'$:
$$\int_{x,\, x'(x,u) \in A \times B} \pi(x'(x, u))\, \varphi'(u'(x, u))\, \alpha(x'(x, u) \to x) \left|\frac{\partial G(x, u)}{\partial (x, u)}\right| dx\, du.$$
Transdimensional moves
We see that the integrals are equal if
$$\pi(x)\, \varphi(u)\, \alpha(x \to x'(x, u)) = \pi(x'(x, u))\, \varphi'(u'(x, u))\, \alpha(x'(x, u) \to x) \left|\frac{\partial G(x, u)}{\partial (x, u)}\right|.$$
Thus a valid choice of $\alpha(x \to x')$ is
$$\alpha(x \to x') = \min\left\{1,\ \frac{\pi(x')\, \varphi'(u')}{\pi(x)\, \varphi(u)} \left|\frac{\partial G(x, u)}{\partial (x, u)}\right|\right\}.$$
Transdimensional moves
We can now answer the initial question: how to move from $\{k\} \times \Theta_k$ to some other $\{k'\} \times \Theta_{k'}$?
We start from some $(k, \theta_k)$. Sample $k' \sim q(k \to k')$, then sample $u_{k \to k'} \sim \varphi_{k \to k'}$. Compute deterministically $(\theta_{k'}, u_{k' \to k}) = G_{k \to k'}(\theta_k, u_{k \to k'})$. Compute
$$\alpha_{k \to k'} = \min\left\{1,\ \frac{q(k' \to k)}{q(k \to k')}\, \frac{\pi(\theta_{k'})\, \varphi_{k' \to k}(u_{k' \to k})}{\pi(\theta_k)\, \varphi_{k \to k'}(u_{k \to k'})}\, J_{k \to k'}(\theta_k, u_{k \to k'})\right\}$$
where
$$J_{k \to k'}(\theta_k, u_{k \to k'}) = \left|\frac{\partial G_{k \to k'}(\theta_k, u_{k \to k'})}{\partial(\theta_k, u_{k \to k'})}\right|.$$
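In code, the acceptance probability can be computed on the log scale for numerical stability (a sketch; the function name and signature are my own):

```python
# Sketch: the between-model acceptance probability alpha_{k->k'} on the
# log scale. jacobian is |det dG_{k->k'} / d(theta_k, u_{k->k'})|. If a move
# has no reverse auxiliary variable (dim(u_{k'->k}) = 0), pass log_phi_rev = 0.0.
import numpy as np

def log_accept_prob(log_pi_new, log_pi_old,    # log pi(theta_k'), log pi(theta_k)
                    log_phi_rev, log_phi_fwd,  # log phi_{k'->k}(u_{k'->k}), log phi_{k->k'}(u_{k->k'})
                    log_q_rev, log_q_fwd,      # log q(k'->k), log q(k->k')
                    jacobian):
    log_ratio = (log_q_rev - log_q_fwd
                 + log_pi_new + log_phi_rev
                 - log_pi_old - log_phi_fwd
                 + np.log(jacobian))
    return min(0.0, log_ratio)                 # log alpha_{k->k'} = min(0, log ratio)

# Accept the proposed jump when log(U) < log_accept_prob(...), U ~ Uniform(0, 1).
```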
Reversible Jump algorithm
Starting from $(k^{(0)}, \theta^{(0)})$, iterate for $t = 1, 2, 3, \dots$:
With probability $\beta$, set $k^{(t)} = k^{(t-1)}$ and do one step of a kernel $K_{k^{(t)}}$ leaving $\pi(\theta_{k^{(t)}} \mid y, \mathcal{M}_{k^{(t)}})$ invariant.
With probability $1 - \beta$, propose $k' \sim q(k' \mid k^{(t-1)})$, draw $u_{k^{(t-1)} \to k'} \sim \varphi_{k^{(t-1)} \to k'}$, and apply the deterministic mapping $G_{k^{(t-1)} \to k'}$ to get $(\theta', u')$. With the "between-models" acceptance probability $\alpha_{k^{(t-1)} \to k'}$ of the previous slide: accept, i.e. set $\theta^{(t)} = \theta'$, $k^{(t)} = k'$; otherwise reject, i.e. set $\theta^{(t)} = \theta^{(t-1)}$, $k^{(t)} = k^{(t-1)}$.
Toy example
Two models, uniform prior on models: $p(\mathcal{M}_1) = p(\mathcal{M}_2) = \frac{1}{2}$.
In model $\mathcal{M}_1$, $\theta \in \mathbb{R}$ and we can evaluate pointwise
$$\text{posterior}_1(\theta) \propto p(\theta \mid \mathcal{M}_1)\, L(\theta \mid \mathcal{M}_1) = \exp\left(-\tfrac{1}{2}\theta^2\right).$$
In model $\mathcal{M}_2$, $\theta \in \mathbb{R}^2$ and we can evaluate pointwise
$$\text{posterior}_2(\theta) \propto p(\theta \mid \mathcal{M}_2)\, L(\theta \mid \mathcal{M}_2) = \exp\left(-\tfrac{1}{2}\theta_1^2 - \tfrac{1}{2}\theta_2^2\right).$$
Toy situation
In terms of model comparison, we should find
$$\frac{p(\mathcal{M}_2 \mid y)}{p(\mathcal{M}_1 \mid y)} = \frac{p(y \mid \mathcal{M}_2)\, p(\mathcal{M}_2)}{p(y \mid \mathcal{M}_1)\, p(\mathcal{M}_1)} = \frac{\int_{\mathbb{R}^2} p(\theta \mid \mathcal{M}_2)\, L(\theta \mid \mathcal{M}_2)\, d\theta \times \frac{1}{2}}{\int_{\mathbb{R}} p(\theta \mid \mathcal{M}_1)\, L(\theta \mid \mathcal{M}_1)\, d\theta \times \frac{1}{2}} = \frac{2\pi}{\sqrt{2\pi}} = \sqrt{2\pi} \approx 2.5066.$$
In terms of parameters, in model $\mathcal{M}_1$, $\theta \sim \mathcal{N}(0, 1)$, and in model $\mathcal{M}_2$, $\theta \sim \mathcal{N}\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right)$.
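A self-contained sketch of reversible jump MCMC on this toy example (the choices $\beta = 1/2$, the random-walk within-model move, $u \sim \mathcal{N}(0, 1)$ and the mapping $G_{1 \to 2}(\theta, u) = (\theta, u)$ with unit Jacobian are illustrative assumptions):

```python
# Sketch: reversible jump MCMC on the toy example.
# Model 1: theta in R; model 2: theta in R^2; target exp(-|theta|^2 / 2) on each.
# Jump 1 -> 2: draw u ~ N(0, 1), set theta' = (theta, u); Jacobian = 1.
# Jump 2 -> 1: drop the second coordinate, which plays the role of u'.
# With only two models, q(1->2) = q(2->1) = 1, so the q ratio cancels.
import numpy as np

rng = np.random.default_rng(3)

def log_post(theta):
    # Unnormalised log posterior of either model; the uniform model
    # prior 1/2 is a common constant and cancels in all ratios.
    return -0.5 * np.sum(theta**2)

def log_phi(u):
    # Log density of the auxiliary variable u ~ N(0, 1).
    return -0.5 * u**2 - 0.5 * np.log(2 * np.pi)

beta = 0.5                          # probability of a within-model move
k, theta = 1, np.array([0.0])
counts = {1: 0, 2: 0}
for t in range(200_000):
    if rng.random() < beta:
        # Within-model random-walk Metropolis-Hastings step.
        prop = theta + 0.5 * rng.normal(size=theta.shape)
        if np.log(rng.random()) < log_post(prop) - log_post(theta):
            theta = prop
    else:
        # Between-model move with G(theta, u) = (theta, u), |J| = 1.
        if k == 1:
            u = rng.normal()
            prop_k, prop = 2, np.array([theta[0], u])
            log_alpha = log_post(prop) - log_post(theta) - log_phi(u)
        else:
            u = theta[1]            # reverse auxiliary variable
            prop_k, prop = 1, theta[:1]
            log_alpha = log_post(prop) + log_phi(u) - log_post(theta)
        if np.log(rng.random()) < log_alpha:
            k, theta = prop_k, prop
    counts[k] += 1

print(counts[2] / counts[1])        # should be close to sqrt(2*pi) ~ 2.5066
```

The fraction of time the chain spends in each model estimates the posterior model probabilities, so the printed ratio should settle near the analytic value $\sqrt{2\pi}$.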