Variational Hamiltonian Monte Carlo via Score Matching

Cheng Zhang - PowerPoint PPT Presentation



  1. Variational Hamiltonian Monte Carlo via Score Matching
Cheng Zhang (joint work with Prof. Shahbaba and Prof. Zhao)
Department of Mathematics, University of California, Irvine
Jan 6, 2017

  2. Outline
1 Background: Bayesian Inference
2 Markov chain Monte Carlo: Metropolis-Hastings Algorithm, Hamiltonian Monte Carlo, Scalable MCMC
3 Fixed-Form Variational Bayes: Lower Bounds and Free Energy, Variational Bayes as Linear Regression
4 Variational Hamiltonian Monte Carlo: Approximation with Random Bases, Variational HMC, Experiments
5 Conclusion

  3. Bayesian Inference
Bayesian inference model:
D = {y_1, ..., y_N}: observed data
θ ∈ R^d: model parameter
p(D|θ): model density (likelihood)
p(θ): prior

Goal: learn the parameter θ from data:
p(θ|D) = p(D|θ) · p(θ) / p(D) ∝ p(D|θ) · p(θ)

Difficulty: p(D) is unknown ⇒ the posterior distribution p(θ|D) is intractable, e.g., in probabilistic graphical models and Bayesian hierarchical models.

Two popular approximations:
- Markov chain Monte Carlo: sample by running a Markov chain; asymptotically unbiased but computationally slow.
- Variational Bayes: approximate via tractable distributions; computationally fast but may result in a poor approximation.
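As a concrete instance of the Bayes-rule update above, here is a conjugate Beta-Bernoulli model in which p(D) never needs to be computed, since the posterior is available in closed form. This example (the Beta(1, 1) prior and the toy data) is an illustration, not taken from the slides.

```python
# Beta(a, b) prior on theta, Bernoulli likelihood for binary observations.
a, b = 1.0, 1.0                   # uniform prior Beta(1, 1)
data = [1, 0, 1, 1, 0, 1, 1, 1]   # observed D = {y_1, ..., y_N}
heads = sum(data)
tails = len(data) - heads

# Conjugacy: p(theta|D) = Beta(a + heads, b + tails), so the normalizing
# constant p(D) cancels and the update is just parameter counting.
a_post, b_post = a + heads, b + tails
posterior_mean = a_post / (a_post + b_post)   # = 7 / 10 = 0.7
```

For non-conjugate models no such closed form exists, which is exactly where the MCMC and variational methods discussed next come in.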

  4. Markov chain Monte Carlo: Metropolis-Hastings Algorithm
Intuitive idea: evolve a Markov chain to sample from a target distribution π(θ) (Metropolis et al. 1953).

Conditions on the transition kernel T(·|·):
- Irreducibility: any state has positive probability of reaching any other state.
- Aperiodicity: the chain should not get trapped in cycles.
- Detailed balance condition (sufficient): π(θ) T(θ′|θ) = π(θ′) T(θ|θ′)

Metropolis-Hastings algorithm (one iteration):
1. Sample θ′ ∼ q(θ′|θ).
2. Update the current state to θ′ with probability
   α(θ, θ′) = min[1, π(θ′) q(θ|θ′) / (π(θ) q(θ′|θ))]

Pros & cons of simple MCMC methods (e.g., random-walk Metropolis and Gibbs sampling):
- Pro: easy to implement and computationally cheap.
- Con: slow mixing due to random-walk behavior, especially in complicated, high-dimensional models.
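The one-iteration MH update above can be sketched as follows. The standard-normal target and the Gaussian random-walk proposal (whose symmetry makes the q-ratio cancel) are illustrative choices, not from the slides.

```python
import math
import random

def metropolis_hastings(log_pi, theta0, n_iters, step=0.5):
    """Random-walk Metropolis: the proposal q is a symmetric Gaussian,
    so the q-ratio in the acceptance probability cancels."""
    theta = theta0
    samples = []
    for _ in range(n_iters):
        # 1. sample theta' ~ q(theta'|theta)
        proposal = theta + random.gauss(0.0, step)
        # 2. accept with probability min[1, pi(theta') / pi(theta)]
        log_alpha = log_pi(proposal) - log_pi(theta)
        if math.log(random.random()) < log_alpha:
            theta = proposal
        samples.append(theta)
    return samples

# Target: standard normal, log pi(theta) = -theta^2 / 2 up to a constant.
random.seed(0)
samples = metropolis_hastings(lambda t: -0.5 * t * t, theta0=0.0, n_iters=20000)
mean = sum(samples) / len(samples)
```

Note that the chain appends the *current* state even on rejection; dropping rejected iterations would bias the sampler.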

  5. Hamiltonian Monte Carlo
Intuition: leverage a Hamiltonian dynamical system to generate trial moves in MCMC samplers (Duane et al. 1987; Neal 2011).

Model-based energy function, the Hamiltonian: H(θ, r) = U(θ) + K(r)
- Potential: U(θ) = −log p(θ, D) = −[log p(θ) + log p(D|θ)]
- Kinetic: K(r) = (1/2) r⊺ M⁻¹ r ⇒ π(r) ∼ N(0, M)

The joint density of z = (θ, r) is
π(z) ∝ exp(−U(θ) − K(r)) ∝ p(θ|D) · π(r)

Hamilton's equations:
dθ/dt = ∇_r H = ∇_r K(r),  dr/dt = −∇_θ H = −∇_θ U(θ)

Hamiltonian flow: φ_H^s : R^{2d} → R^{2d}, z(0) = z ↦ z* = z(s)

Properties: reversibility, volume preservation, and a constant Hamiltonian over time t.
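A minimal HMC step built from Hamilton's equations above, using the standard leapfrog discretization with a Metropolis correction for the integration error. The 1-D standard-normal target, M = I, and the step size/path length (eps = 0.1, L = 20) are illustrative assumptions, not from the slides.

```python
import math
import random

def leapfrog(theta, r, grad_U, eps, L):
    """Integrate dtheta/dt = M^{-1} r, dr/dt = -grad U(theta) with M = I."""
    r = r - 0.5 * eps * grad_U(theta)      # half step for momentum
    for _ in range(L - 1):
        theta = theta + eps * r            # full step for position
        r = r - eps * grad_U(theta)        # full step for momentum
    theta = theta + eps * r
    r = r - 0.5 * eps * grad_U(theta)      # final half step
    return theta, r

def hmc_step(theta, U, grad_U, eps=0.1, L=20):
    r0 = random.gauss(0.0, 1.0)            # resample momentum: r ~ N(0, M)
    theta_new, r_new = leapfrog(theta, r0, grad_U, eps, L)
    # Accept/reject on the change in H = U + K to correct discretization error.
    dH = (U(theta_new) + 0.5 * r_new ** 2) - (U(theta) + 0.5 * r0 ** 2)
    if math.log(random.random()) < -dH:
        return theta_new
    return theta

# Standard-normal target: U(theta) = theta^2 / 2, grad U(theta) = theta.
random.seed(1)
U = lambda t: 0.5 * t * t
grad_U = lambda t: t
theta, chain = 0.0, []
for _ in range(5000):
    theta = hmc_step(theta, U, grad_U)
    chain.append(theta)
```

Because the leapfrog map is reversible and volume-preserving (the properties listed above), the simple min[1, exp(−ΔH)] correction is all that is needed for the chain to leave π(z) invariant.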
