
Adaptive HMC via the Infinite Exponential Family Arthur Gretton - PowerPoint PPT Presentation



  1. Adaptive HMC via the Infinite Exponential Family. Arthur Gretton, Gatsby Unit, CSML, University College London. RegML, 2017. (Slide footer: Arthur Gretton (Gatsby Unit, UCL), Adaptive HMC via the Infinite Exponential Family, 30/03/2016.)

  2. Setting: MCMC for intractable non-linear targets. Using samples to compute expectations. We have a density of the form $p(x) = \pi(x)/Z$, where the normalizer $Z = \int \pi(x)\,dx$ is often impractical to compute. Goal: to compute expectations of functions, $E_p[f(x)] = \int f(x)\,p(x)\,dx$.

  3. Setting: MCMC for intractable non-linear targets. Using samples to compute expectations. We have a density of the form $p(x) = \pi(x)/Z$, where the normalizer $Z = \int \pi(x)\,dx$ is often impractical to compute. Goal: to compute expectations of functions, $E_p[f(x)] = \int f(x)\,p(x)\,dx$. Given samples $\{x_i\}_{i=1}^n$ with distribution $p(x)$, $\hat{E}_p[f(x)] = \frac{1}{n}\sum_{i=1}^n f(x_i)$.
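
The sample-average estimator on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the standard normal target, the test function $f(x) = x^2$, and the name `monte_carlo_expectation` are assumptions chosen so that the true expectation (1) is known.

```python
import random


def monte_carlo_expectation(f, samples):
    """Estimate E_p[f(x)] by the sample average (1/n) * sum_i f(x_i)."""
    samples = list(samples)
    return sum(f(x) for x in samples) / len(samples)


# Illustrative assumption: p is a standard normal, so E_p[x^2] = 1.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
estimate = monte_carlo_expectation(lambda x: x * x, samples)
print(estimate)  # close to 1, with Monte Carlo error of order 1/sqrt(n)
```

Here the samples are drawn exactly; the rest of the talk concerns the case where they must come from a Markov chain because only $\pi(x)$ is available.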

  4. Setting: MCMC for intractable non-linear targets. Metropolis-Hastings MCMC: a visual guide...

  5. Setting: MCMC for intractable non-linear targets. Metropolis-Hastings MCMC. Unnormalized target $\pi(x) \propto p(x)$. Generate a Markov chain with invariant distribution $p$. Initialize $x_0 \sim p_0$. At iteration $t \ge 0$, propose to move to state $x' \sim q(\cdot \mid x_t)$. Accept/reject proposals based on the ratio: $x_{t+1} = x'$ with probability $\min\left\{1, \frac{\pi(x')\,q(x_t \mid x')}{\pi(x_t)\,q(x' \mid x_t)}\right\}$, and $x_{t+1} = x_t$ otherwise.

  6. Setting: MCMC for intractable non-linear targets. Metropolis-Hastings MCMC. Unnormalized target $\pi(x) \propto p(x)$. Generate a Markov chain with invariant distribution $p$. Initialize $x_0 \sim p_0$. At iteration $t \ge 0$, propose to move to state $x' \sim q(\cdot \mid x_t)$. Accept/reject proposals based on the ratio: $x_{t+1} = x'$ with probability $\min\left\{1, \frac{\pi(x')\,q(x_t \mid x')}{\pi(x_t)\,q(x' \mid x_t)}\right\}$, and $x_{t+1} = x_t$ otherwise. What proposal $q(\cdot \mid x_t)$? Too narrow or too broad → slow convergence. Does not conform to support of target → slow convergence.
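
The accept/reject rule above can be sketched as a random-walk Metropolis-Hastings sampler. This is a hedged illustration, not the speaker's code: the 1D Gaussian target, the step sizes, and the function name `metropolis_hastings` are assumptions. Because the Gaussian random-walk proposal is symmetric, the $q$ terms in the ratio cancel.

```python
import math
import random


def metropolis_hastings(log_pi, x0, n_steps, step_size, rng):
    """Random-walk Metropolis-Hastings for a 1D unnormalized log-target.

    The Gaussian proposal is symmetric, so q(x_t | x') / q(x' | x_t) = 1
    and the acceptance ratio reduces to pi(x') / pi(x_t).
    """
    x = x0
    chain = []
    accepted = 0
    for _ in range(n_steps):
        proposal = x + step_size * rng.gauss(0.0, 1.0)
        log_ratio = log_pi(proposal) - log_pi(x)
        # Accept with probability min(1, ratio), evaluated in log space.
        if math.log(1.0 - rng.random()) < log_ratio:
            x = proposal
            accepted += 1
        chain.append(x)
    return chain, accepted / n_steps


def log_pi(x):
    # Illustrative assumption: unnormalized standard normal target.
    return -0.5 * x * x


rng = random.Random(1)
chain, rate = metropolis_hastings(log_pi, 0.0, 50_000, 2.5, rng)
_, narrow_rate = metropolis_hastings(log_pi, 0.0, 50_000, 0.05, rng)
second_moment = sum(x * x for x in chain) / len(chain)
print(rate, narrow_rate, second_moment)
```

The narrow proposal accepts almost every move but takes tiny steps, which is one way the "too narrow" failure on the slide shows up in practice.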

  7. Setting: MCMC for intractable non-linear targets. Adaptive MCMC. Adaptive Metropolis (Haario, Saksman & Tamminen, 2001): update the proposal $q_t(\cdot \mid x_t) = \mathcal{N}(x_t, \nu^2 \hat{\Sigma}_t)$, using estimates of the target covariance.

  8. Setting: MCMC for intractable non-linear targets. Adaptive MCMC. Adaptive Metropolis (Haario, Saksman & Tamminen, 2001): update the proposal $q_t(\cdot \mid x_t) = \mathcal{N}(x_t, \nu^2 \hat{\Sigma}_t)$, using estimates of the target covariance. Locally miscalibrated for strongly non-linear targets: directions of large variance depend on the current location.
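
A one-dimensional sketch of the adaptive proposal follows, where the covariance estimate $\hat{\Sigma}_t$ reduces to a running variance. Everything beyond the proposal form $\mathcal{N}(x_t, \nu^2 \hat{\Sigma}_t)$ is an assumption made for illustration: the default `nu=2.4`, the `warmup` period with a fixed unit variance, the `eps` regularizer, the Welford update, and the Gaussian target with standard deviation 3.

```python
import math
import random


def adaptive_metropolis(log_pi, x0, n_steps, rng, nu=2.4, eps=1e-6, warmup=100):
    """1D adaptive Metropolis: proposal N(x_t, nu^2 * var_t), where var_t is
    a running variance estimate computed from the chain history."""
    x = x0
    mean, m2, count = x0, 0.0, 1  # Welford running mean and squared deviations
    chain = []
    for _ in range(n_steps):
        # Assumed warm-up: use a fixed unit variance until enough history exists.
        var = m2 / (count - 1) if count > warmup else 1.0
        scale = nu * math.sqrt(var + eps)
        proposal = x + scale * rng.gauss(0.0, 1.0)
        if math.log(1.0 - rng.random()) < log_pi(proposal) - log_pi(x):
            x = proposal
        chain.append(x)
        # Update the running statistics with the new chain state.
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return chain, m2 / (count - 1)


def log_pi_wide(x):
    # Illustrative assumption: unnormalized N(0, 3^2) target, variance 9.
    return -0.5 * (x / 3.0) ** 2


rng = random.Random(3)
chain, estimated_variance = adaptive_metropolis(log_pi_wide, 0.0, 50_000, rng)
print(estimated_variance)  # should approach the target variance of 9
```

A single global variance works here because the target is Gaussian. The slide's caveat is that for strongly non-linear targets no single covariance matches the local geometry everywhere.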

  9. Setting: MCMC for intractable non-linear targets. Alternative adaptive sampler: the Kameleon. Idea: fit a Gaussian in feature space, take local steps in directions of max. principal components. D. Sejdinovic, H. Strathmann, M. Lomeli, C. Andrieu, and A. Gretton, ICML 2014.

  10. Setting: MCMC for intractable non-linear targets. Hamiltonian Monte Carlo. HMC: distant moves, high acceptance probability. Potential energy $U(x) = -\log \pi(x)$, auxiliary momentum $p \sim \exp(-K(p))$, simulate for $t \in \mathbb{R}$ along the Hamiltonian flow of $H(p, x) = K(p) + U(x)$, using the operator $\frac{\partial K}{\partial p}\frac{\partial}{\partial x} - \frac{\partial U}{\partial x}\frac{\partial}{\partial p}$. Numerical simulation (i.e. leapfrog) depends on gradient information. [Figure: plot with axes $\theta_2$ and $\theta_7$.]
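
The leapfrog simulation and the resulting accept/reject step can be sketched as follows. This is a minimal 1D illustration under stated assumptions, not the talk's implementation: the kinetic energy is taken to be $K(p) = p^2/2$, the target is a standard normal (so $U(x) = x^2/2$), and the step size, trajectory length, and function names are hypothetical.

```python
import math
import random


def leapfrog(x, p, grad_U, step_size, n_steps):
    """Leapfrog integration of the flow of H(p, x) = p^2 / 2 + U(x)."""
    p = p - 0.5 * step_size * grad_U(x)       # initial half step in momentum
    for i in range(n_steps):
        x = x + step_size * p                 # full step in position
        if i < n_steps - 1:
            p = p - step_size * grad_U(x)     # full step in momentum
    p = p - 0.5 * step_size * grad_U(x)       # final half step in momentum
    return x, p


def hmc_step(x, U, grad_U, step_size, n_steps, rng):
    """One HMC transition: resample momentum, integrate, accept or reject."""
    p0 = rng.gauss(0.0, 1.0)
    x_new, p_new = leapfrog(x, p0, grad_U, step_size, n_steps)
    h_old = U(x) + 0.5 * p0 * p0
    h_new = U(x_new) + 0.5 * p_new * p_new
    # Accept with probability min(1, exp(h_old - h_new)).
    if math.log(1.0 - rng.random()) < h_old - h_new:
        return x_new, True
    return x, False


def U(x):
    return 0.5 * x * x  # illustrative assumption: standard normal target


def grad_U(x):
    return x


rng = random.Random(2)
x, samples, n_accept = 0.0, [], 0
for _ in range(5000):
    x, ok = hmc_step(x, U, grad_U, 0.2, 10, rng)
    samples.append(x)
    n_accept += ok
acceptance_rate = n_accept / len(samples)
second_moment = sum(s * s for s in samples) / len(samples)
print(acceptance_rate, second_moment)
```

Every call to `grad_U` requires the gradient of the log target, which is exactly the quantity that is unavailable when the target is intractable.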

  11. Setting: MCMC for intractable non-linear targets. Intractable & Non-linear Target in GPC. Sliced posterior over hyperparameters of a Gaussian Process classifier on the UCI Glass dataset, obtained using Pseudo-Marginal MCMC. [Figure: sliced posterior, axes $\theta_2$ and $\theta_7$.] Can you learn an HMC sampler?

  12. Setting: MCMC for intractable non-linear targets. Outline for remainder of talk. [Figure: axes $\theta_2$ and $\theta_7$.] Kernel Adaptive Hamiltonian Monte Carlo (Strathmann et al. 2015): global estimate of the gradient of the log target density from prev. samples; mixing performance close to ideal "known density" HMC. Infinite dimensional exponential family (Sriperumbudur et al. 2014): exponential family with RKHS-valued natural parameter; learned via score matching, no log-partition function.

  13. MCMC Kameleon. Infinite dimensional exponential family density estimator. [Figure: axes $\theta_2$ and $\theta_7$.] Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Revant Kumar, and Aapo Hyvarinen, JMLR 2017, to appear (slides adapted from Bharath's talk).

  14. MCMC Kameleon. The Exponential Family of Distributions. Natural form: $p_\theta(x) = q_0(x)\, e^{\theta^T T(x) - A(\theta)}$, where $\theta \in \Theta \subset \mathbb{R}^m$ is the natural parameter; $q_0$ is a probability density defined over $\Omega \subset \mathbb{R}^d$; $A(\theta) = \log \int e^{\theta^T T(x)}\, q_0(x)\,dx$ is the log-partition function; $T(x)$ is the sufficient statistic. Includes many commonly used distributions: Normal, Binomial, Poisson, Exponential, ...

  15. MCMC Kameleon. Infinite Dimensional Generalization. $\mathcal{P} = \left\{ p_f(x) = e^{f(x) - A(f)}\, q_0(x),\ x \in \Omega : f \in \mathcal{F} \right\}$, where $\mathcal{F} = \left\{ f \in \mathcal{H} : A(f) = \log \int e^{f(x)}\, q_0(x)\,dx < \infty \right\}$ (Canu and Smola, 2005; Fukumizu, 2009). $\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS).

  16. MCMC Kameleon. Reproducing kernel Hilbert space. Exponentiated quadratic kernel: $k(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right) = \sum_{i=1}^{\infty} \phi_i(x)\,\phi_i(x')$. Functions in the RKHS: $f(x) = \sum_{i=1}^{\infty} f_i\,\phi_i(x)$, with $\sum_{i=1}^{\infty} f_i^2 < \infty$.

  17. MCMC Kameleon. Reproducing kernel Hilbert space. Function with exponentiated quadratic kernel: $f(x) := \sum_{i=1}^{m} \alpha_i\, k(x_i, x) = \sum_{i=1}^{m} \alpha_i \langle \phi(x_i), \phi(x) \rangle_{\mathcal{H}} = \left\langle \sum_{i=1}^{m} \alpha_i\, \phi(x_i), \phi(x) \right\rangle_{\mathcal{H}}$. [Figure: plot of $f(x)$ against $x$.]

  18. MCMC Kameleon. Reproducing kernel Hilbert space. Function with exponentiated quadratic kernel: $f(x) := \sum_{i=1}^{m} \alpha_i\, k(x_i, x) = \sum_{i=1}^{m} \alpha_i \langle \phi(x_i), \phi(x) \rangle_{\mathcal{H}} = \left\langle \sum_{i=1}^{m} \alpha_i\, \phi(x_i), \phi(x) \right\rangle_{\mathcal{H}} = \sum_{\ell=1}^{\infty} f_\ell\, \phi_\ell(x) = \langle f(\cdot), \phi(x) \rangle_{\mathcal{H}}$, where $f_\ell := \sum_{i=1}^{m} \alpha_i\, \phi_\ell(x_i)$. Possible to write functions of infinitely many features! [Figure: plot of $f(x)$ against $x$.]
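
The finite expansion $f(x) = \sum_{i=1}^{m} \alpha_i\, k(x_i, x)$ can be evaluated without ever forming the infinitely many features $\phi_\ell$. The sketch below assumes scalar inputs; the coefficients `alphas`, the points `centres`, and the helper names are hypothetical values chosen for illustration. It also computes the squared RKHS norm, which for such an expansion is $\sum_{i,j} \alpha_i \alpha_j\, k(x_i, x_j)$.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Exponentiated quadratic kernel k(x, y) = exp(-(x - y)^2 / (2 sigma^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def rkhs_function(alphas, centres, sigma=1.0):
    """Return f(x) = sum_i alpha_i * k(x_i, x)."""
    def f(x):
        return sum(a * gaussian_kernel(c, x, sigma) for a, c in zip(alphas, centres))
    return f


def rkhs_norm_sq(alphas, centres, sigma=1.0):
    """Squared RKHS norm of f: sum_{i,j} alpha_i alpha_j k(x_i, x_j)."""
    return sum(
        ai * aj * gaussian_kernel(ci, cj, sigma)
        for ai, ci in zip(alphas, centres)
        for aj, cj in zip(alphas, centres)
    )


# Hypothetical coefficients and centres, for illustration only.
alphas = [1.0, -0.5, 0.3]
centres = [-2.0, 0.0, 3.0]
f = rkhs_function(alphas, centres)
print(f(0.0), rkhs_norm_sq(alphas, centres))
```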

  19. MCMC Kameleon. RKHS-Based Exponential Family. $\mathcal{H}$ is an RKHS: $\mathcal{P} = \left\{ p_f(x) = e^{\langle f, \phi(x) \rangle_{\mathcal{H}} - A(f)}\, q_0(x),\ x \in \Omega,\ f \in \mathcal{F} \right\}$, where $\mathcal{F} = \left\{ f \in \mathcal{H} : A(f) = \log \int e^{f(x)}\, q_0(x)\,dx < \infty \right\}$. Finite dimensional RKHS: one-to-one correspondence between finite dimensional exponential family and RKHS. A sufficient statistic $T(x)$ corresponds to the kernel $k(x, y) = \langle T(x), T(y) \rangle$. Similarly, a kernel $k(x, y) = \langle \Phi(x), \Phi(y) \rangle$ corresponds to the statistic $\Phi(x)$.
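
The correspondence between a sufficient statistic and a kernel can be checked numerically. The sketch below assumes the usual Normal sufficient statistic $T(x) = (x, x^2)$, for which the inner product gives $k(x, y) = xy + x^2 y^2$; the function names are hypothetical.

```python
def normal_statistic(x):
    """Sufficient statistic of the Normal family (assumed form): T(x) = (x, x^2)."""
    return (x, x * x)


def kernel_from_statistic(T):
    """Build k(x, y) = <T(x), T(y)> from a finite dimensional statistic T."""
    def k(x, y):
        return sum(a * b for a, b in zip(T(x), T(y)))
    return k


k_normal = kernel_from_statistic(normal_statistic)
# <(2, 4), (3, 9)> = 2*3 + 4*9 = 42, which matches xy + x^2 y^2 at x=2, y=3.
print(k_normal(2, 3))
```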

  20. MCMC Kameleon. Examples. Exponential: $\Omega = \mathbb{R}_{++}$, $k(x, y) = xy$. Normal: $\Omega = \mathbb{R}$, $k(x, y) = xy + x^2 y^2$. Beta: $\Omega = (0, 1)$, $k(x, y) = \log x \log y + \log(1 - x) \log(1 - y)$. Gamma: $\Omega = \mathbb{R}_{++}$, $k(x, y) = \log x \log y + xy$. Inverse Gaussian: $\Omega = \mathbb{R}_{++}$, $k(x, y) = xy + \frac{1}{xy}$. Poisson: $\Omega = \mathbb{N} \cup \{0\}$, $k(x, y) = xy$, $q_0(x) = (x!\, e)^{-1}$. Geometric: $\Omega = \mathbb{N} \cup \{0\}$, $k(x, y) = xy$, $q_0(x) = 1$. Binomial: $\Omega = \{0, \ldots, m\}$, $k(x, y) = xy$, $q_0(x) = 2^{-m} \binom{m}{x}$.
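
The Poisson entry can be verified numerically. With $T(x) = x$ and base density $q_0(x) = (x!\,e)^{-1}$, the log-partition function is $A(\theta) = \log \sum_x e^{\theta x} q_0(x) = e^\theta - 1$, and $p_\theta$ is the Poisson distribution with rate $\lambda = e^\theta$. The sketch below is an illustrative check under these definitions; the truncation at `n_terms` and the function names are assumptions.

```python
import math


def log_partition_poisson(theta, n_terms=200):
    """A(theta) = log sum_x exp(theta * x) * q0(x), with q0(x) = 1 / (x! e).

    The infinite sum is truncated at n_terms, which is ample for moderate theta.
    """
    total = sum(
        math.exp(theta * x - math.lgamma(x + 1) - 1.0) for x in range(n_terms)
    )
    return math.log(total)


def p_theta(x, theta):
    """Natural-form density q0(x) * exp(theta * x - A(theta))."""
    log_q0 = -math.lgamma(x + 1) - 1.0
    return math.exp(theta * x + log_q0 - log_partition_poisson(theta))


def poisson_pmf(x, lam):
    """Standard Poisson pmf, lam^x * exp(-lam) / x!."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


lam = 3.0
theta = math.log(lam)
print(log_partition_poisson(theta), lam - 1.0)  # both approximately 2
print(p_theta(2, theta), poisson_pmf(2, lam))   # both approximately 0.224
```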

  21. MCMC Kameleon. Problem: given random samples $X_1, \ldots, X_n$ drawn i.i.d. from an unknown density $p_0 := p_{f_0} \in \mathcal{P}$, estimate $p_0$.
