Zig-Zag Monte Carlo
Joris Bierkens (Delft University of Technology)
February 7, 2017
Acknowledgements
Collaborators: Andrew Duncan, Paul Fearnhead, Antonietta Mira, Gareth Roberts
Financial support: [logos omitted]
Outline
1 Motivation: Markov Chain Monte Carlo
2 One-dimensional Zig-Zag process
3 Multi-dimensional Zig-Zag process (ZZP)
4 Subsampling
5 Doubly intractable likelihood
Bayesian inference
In Bayesian inference we typically deal with a posterior density
$\pi(x) = \pi(x; y) \propto L(y \mid x)\,\pi_0(x), \quad x \in \mathbb{R}^d,$
where $L(y \mid x)$ is the likelihood of the data $y$ given the parameter $x \in \mathbb{R}^d$, and $\pi_0$ is a prior density for $x$.
Quantities of interest include:
• posterior mean $\int x\,\pi(x)\,dx$,
• posterior variance $\int x^2\,\pi(x)\,dx - \left(\int x\,\pi(x)\,dx\right)^2$,
• tail probability $\int \mathbb{1}_{\{x \ge c\}}\,\pi(x)\,dx$.
All of these involve integrals of the form $\int h(x)\,\pi(x)\,dx$.
Evaluating $\int h(x)\,\pi(x)\,dx$
Possible approaches:
1 Explicit (analytic) integration. Rarely possible.
2 Numerical integration. Suffers from the curse of dimensionality.
3 Monte Carlo. Draw independent samples $(X_1, X_2, \dots)$ from $\pi$ and use the law of large numbers. Requires independent samples from $\pi$.
4 Markov Chain Monte Carlo. Construct an ergodic Markov chain $(X_1, X_2, \dots)$ with invariant distribution $\pi(x)\,dx$ and use Birkhoff's ergodic theorem.
In approaches 3 and 4 the integral is recovered as the limit
$\int h(x)\,\pi(x)\,dx = \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} h(X_k).$
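As a rough illustration of approach 3, the sketch below estimates the three quantities of interest (posterior mean, posterior variance, tail probability) by sample averages. It assumes, purely for illustration, that the posterior is a Gaussian $N(1, 4)$ from which independent samples can be drawn directly; the function name `monte_carlo_estimate` and the threshold `c = 3` are hypothetical choices, not part of the talk.

```python
import numpy as np

def monte_carlo_estimate(h, samples):
    """Approximate the integral of h(x) pi(x) dx by the sample average of h."""
    return float(np.mean(h(samples)))

# Assumption for illustration: the "posterior" is N(1, 2^2), sampled directly.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=200_000)

# Posterior mean: h(x) = x.
post_mean = monte_carlo_estimate(lambda x: x, samples)

# Posterior variance: integral of x^2 minus the squared mean.
post_var = monte_carlo_estimate(lambda x: x**2, samples) - post_mean**2

# Tail probability: h(x) = indicator of {x >= c}, here with c = 3.
tail_prob = monte_carlo_estimate(lambda x: (x >= 3.0).astype(float), samples)

print(post_mean, post_var, tail_prob)
```

For this stand-in posterior the estimates should land near 1, 4 and $P(Z \ge 1) \approx 0.159$ respectively. In approach 4 the same averaging is applied to the states of a Markov chain instead of independent draws.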
One-dimensional Zig-Zag process
Dynamics
• Continuous time.
• Current state $(X(t), \Theta(t)) \in \mathbb{R} \times \{-1, +1\}$.
• Move $X(t)$ in direction $\Theta(t) = \pm 1$ until a switch occurs.
• The switching intensity is $\lambda(X(t), \Theta(t))$.
[Figure: plot with horizontal axis from 0 to 100 and vertical axis from -2.5 to 2, presumably a sample trajectory of $X(t)$.]
Relation between switching rate and potential
Generator:
$L f(x, \theta) = \theta\,\frac{\partial f}{\partial x}(x, \theta) + \lambda(x, \theta)\,\bigl(f(x, -\theta) - f(x, \theta)\bigr), \quad x \in \mathbb{R},\ \theta \in \{-1, +1\}.$
• Potential $U(x) = -\log \pi(x)$.
• $\pi$ is invariant if and only if $\lambda(x, +1) - \lambda(x, -1) = U'(x)$ for all $x$.
• Equivalently, $\lambda(x, \theta) = \gamma(x) + \max(0, \theta U'(x))$, with $\gamma(x) \ge 0$.
Example: Gaussian distribution $N(0, \sigma^2)$
• Density $\pi(x) \propto \exp(-x^2 / (2\sigma^2))$
• Potential $U(x) = x^2 / (2\sigma^2)$ (up to an additive constant)
• Derivative $U'(x) = x / \sigma^2$
• Switching rates $\lambda(x, \theta) = (\theta x / \sigma^2)^+ + \gamma(x)$
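The Gaussian example can be simulated exactly. The sketch below is one possible implementation, assuming $\gamma \equiv 0$ and drawing each switching time by inverting the integrated rate: starting from $(x, \theta)$ with $a = \theta x$, the rate after time $s$ is $(a + s)^+ / \sigma^2$, and setting its integral equal to an Exp(1) variable $E$ gives $\tau = -a + \sqrt{\max(a, 0)^2 + 2\sigma^2 E}$. The function name `zigzag_gaussian` and its parameters are hypothetical, and the time averages are computed by integrating along the piecewise-linear path.

```python
import numpy as np

def zigzag_gaussian(sigma, total_time, x0=0.0, theta0=1.0, seed=0):
    """Simulate the 1D Zig-Zag process for N(0, sigma^2), assuming gamma = 0.

    Returns time averages of X(t) and X(t)^2 and the number of switches.
    """
    rng = np.random.default_rng(seed)
    x, theta, t = x0, theta0, 0.0
    int_x, int_x2 = 0.0, 0.0  # integrals of X and X^2 along the path
    n_switches = 0
    while True:
        a = theta * x
        e = rng.exponential(1.0)
        # Invert the integrated rate of (a + s)^+ / sigma^2 at level e.
        tau = float(-a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * sigma**2 * e))
        remaining = total_time - t
        switch = tau < remaining
        tau = min(tau, remaining)  # truncate the last segment at total_time
        # Exact integrals over the linear segment x + theta * s, 0 <= s <= tau.
        int_x += x * tau + theta * tau**2 / 2.0
        int_x2 += x**2 * tau + x * theta * tau**2 + tau**3 / 3.0
        x += theta * tau
        t += tau
        if not switch:
            break
        theta = -theta
        n_switches += 1
    return int_x / total_time, int_x2 / total_time, n_switches

mean_est, second_moment_est, n_switches = zigzag_gaussian(sigma=1.0, total_time=50_000.0)
print(mean_est, second_moment_est, n_switches)
```

With $\sigma = 1$ the two time averages should be close to the stationary values 0 and 1. A nonzero $\gamma(x)$ would add extra switches and require a different way of drawing $\tau$.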
Proof of invariance of $\pi \propto \exp(-U)$
Recall
$L f(x, \theta) = \theta\,\frac{\partial f}{\partial x}(x, \theta) + \lambda(x, \theta)\,\bigl(f(x, -\theta) - f(x, \theta)\bigr), \qquad \lambda(x, +1) - \lambda(x, -1) = U'(x).$
Markov semigroup: $P(t) f(x, \theta) = \mathbb{E}_{x, \theta}\, f(X(t), \Theta(t))$.
$\pi$ stationary means that
$\sum_{\theta = \pm 1} \int_{\mathbb{R}} P(t) f(x, \theta)\,\pi(x)\,dx = \sum_{\theta = \pm 1} \int_{\mathbb{R}} f(x, \theta)\,\pi(x)\,dx, \quad f \in D(L),\ t \ge 0.$
Differentiating with respect to $t$ gives the equivalent condition:
$\sum_{\theta = \pm 1} \int_{\mathbb{R}} L f(x, \theta)\,\pi(x)\,dx = 0, \quad f \in D(L).$
The switching part of this integral is
$\sum_{\theta = \pm 1} \int_{\mathbb{R}} \lambda(x, \theta)\,\bigl(f(x, -\theta) - f(x, \theta)\bigr)\,\pi(x)\,dx = \int_{\mathbb{R}} \bigl\{ \lambda(x, +1)\,\bigl(f(x, -1) - f(x, +1)\bigr) + \lambda(x, -1)\,\bigl(f(x, +1) - f(x, -1)\bigr) \bigr\}\,\pi(x)\,dx.$
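The computation on this slide stops after expanding the switching term. One way the argument can be completed is sketched below, under the assumption that $f(\cdot, \theta)$ is smooth and that boundary terms vanish at infinity, so that integration by parts applies; it uses only $\pi' = -U' \pi$ and the rate condition $\lambda(x, +1) - \lambda(x, -1) = U'(x)$.

```latex
% Assumption: f(., theta) is smooth and boundary terms vanish at infinity.
\begin{align*}
\sum_{\theta=\pm 1} \int_{\mathbb{R}} \lambda(x,\theta)\,\bigl(f(x,-\theta)-f(x,\theta)\bigr)\,\pi(x)\,dx
  &= \int_{\mathbb{R}} \bigl(\lambda(x,+1)-\lambda(x,-1)\bigr)\bigl(f(x,-1)-f(x,+1)\bigr)\,\pi(x)\,dx \\
  &= -\int_{\mathbb{R}} U'(x)\,\bigl(f(x,+1)-f(x,-1)\bigr)\,\pi(x)\,dx, \\
\sum_{\theta=\pm 1} \int_{\mathbb{R}} \theta\,\frac{\partial f}{\partial x}(x,\theta)\,\pi(x)\,dx
  &= -\sum_{\theta=\pm 1} \theta \int_{\mathbb{R}} f(x,\theta)\,\pi'(x)\,dx \\
  &= \int_{\mathbb{R}} U'(x)\,\bigl(f(x,+1)-f(x,-1)\bigr)\,\pi(x)\,dx.
\end{align*}
```

The transport and switching contributions cancel, so $\sum_{\theta = \pm 1} \int_{\mathbb{R}} L f(x, \theta)\,\pi(x)\,dx = 0$ as required.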