Sampling, MCMC and Spectral Gaps in Infinite Dimensions




  1. Sampling, MCMC and Spectral Gaps in Infinite Dimensions. Martin Hairer (1), Andrew Stuart (1), Sebastian Vollmer (1). (1) Department of Mathematics, University of Warwick. Sydney, 2012.

  2. Outline.
1 Introduction: Notation; Target Measure; Spectral Gaps.
2 Result: Key result; Dimension Dependent Results for the RWM; Preliminaries & Weak Harris Theorem.
3 Ideas of the proof: d-contracting; Dimensionality.
4 Summary.

  3. Notation. Given a measure $\mu$, generate samples $X_j$ such that
$$S_n(h) = \frac{1}{n}\sum_{j=1}^{n} h(X_j) \to \mathbb{E}_\mu[h(X)].$$
Complexity of an algorithm: number of necessary steps $\times$ cost of a step.

  4. Target Measure. Assumption: the target measure $\mu$ has a density w.r.t. a Gaussian $\gamma$,
$$\mu(dx) = M \exp(-\Phi(x))\,\gamma(dx). \quad (1)$$
Here $\gamma = N(0, C)$ on a separable Hilbert space $H$; let $\{\varphi_n\}_{n \in \mathbb{N}}$ be an orthonormal basis of eigenvectors of $C$ with eigenvalues $\{\lambda_n^2\}_{n \in \mathbb{N}}$. The Karhunen-Loeve expansion then yields
$$\gamma = \mathcal{L}\Big(\sum_{i=1}^{\infty} \lambda_i \varphi_i \xi_i\Big), \quad \text{where } \xi_i \overset{\text{i.i.d.}}{\sim} N(0,1).$$
Example: Brownian motion on $[0,1]$,
$$B_t = \sqrt{2}\sum_{k=1}^{\infty} \frac{\sin\big((k-\tfrac12)\pi t\big)}{(k-\tfrac12)\pi}\,\xi_k.$$
Using the projections $P^m$ onto $\mathrm{span}\{\varphi_i\}_{i=1}^m$, the $m$-dimensional approximations are
$$\gamma^m(dx) = \mathcal{L}\Big(\sum_{i=1}^{m} \lambda_i \varphi_i \xi_i\Big)(dx), \qquad \mu^m(dx) = M_m \exp(-\Phi(P^m x))\,\gamma^m(dx). \quad (2)$$
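The Karhunen-Loeve expansion above can be sampled directly by truncating the series at level $m$, which is exactly the projection $P^m$. A minimal Python sketch (the function name and interface are ours, not from the talk):

```python
import numpy as np

def brownian_motion_kl(t, m, rng):
    """Sample an approximate Brownian path on [0, 1] from the truncated
    Karhunen-Loeve expansion
        B_t ~ sqrt(2) * sum_{k<=m} sin((k - 1/2) pi t) / ((k - 1/2) pi) * xi_k,
    with xi_k i.i.d. N(0, 1); truncating at m corresponds to the projection P^m."""
    k = np.arange(1, m + 1)
    freq = (k - 0.5) * np.pi          # lambda_k = 1 / freq are the sqrt-eigenvalues
    xi = rng.standard_normal(m)
    return np.sqrt(2.0) * np.sin(np.outer(t, freq)) @ (xi / freq)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 201)
path = brownian_motion_kl(t, m=500, rng=rng)   # one approximate Brownian path
```

As a sanity check, the path starts at $B_0 = 0$ exactly, and the empirical variance of $B_1$ over many independent paths approaches 1, since the truncated eigenvalue sum $2\sum_k (k-\tfrac12)^{-2}\pi^{-2}$ converges to 1.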

  5. Metropolis-Hastings algorithms. $\alpha(x,y)$: acceptance probability for a transition from $x$ to $y$ [Tierney, 1998].
Random Walk Metropolis (RWM) algorithm on $\mathbb{R}^m$:
$$Q(x,dy) = \mathcal{L}\big(x + \sqrt{2\delta}\,\xi\big)(dy) \quad \text{with } \xi \sim \gamma^m,$$
$$\alpha(x,y) = 1 \wedge \exp\Big(\Phi(x) - \Phi(y) + \tfrac12\langle x, C^{-1}x\rangle - \tfrac12\langle y, C^{-1}y\rangle\Big).$$
Preconditioned Crank-Nicolson (pCN) algorithm on $H$:
$$Q(x,dy) = \mathcal{L}\big((1-2\delta)^{\frac12} x + \sqrt{2\delta}\,\xi\big)(dy) \quad \text{with } \xi \sim \gamma,$$
$$\alpha(x,y) = 1 \wedge \exp\big(\Phi(x) - \Phi(y)\big).$$
Transition kernel:
$$P(x,dz) = Q(x,dz)\,\alpha(x,z) + \delta_x(dz)\int (1-\alpha(x,u))\,Q(x,du).$$
$P$ and $P^m$ denote the transition kernels on $H$ and $\mathbb{R}^m$ respectively.
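The pCN update is straightforward to implement in the eigenbasis of $C$, where a draw from $\gamma$ has coordinates $\lambda_i \xi_i$. A hedged sketch (function name and interface are ours):

```python
import numpy as np

def pcn_step(x, phi, lam, delta, rng):
    """One pCN step targeting mu(dx) = M exp(-Phi(x)) gamma(dx), written in the
    eigenbasis of C so that a draw from gamma is lam[i] * N(0, 1) coordinatewise.
    The proposal preserves gamma, so alpha involves only Phi (no C^{-1} terms,
    in contrast to the RWM acceptance probability)."""
    xi = lam * rng.standard_normal(x.shape)                        # xi ~ gamma
    y = np.sqrt(1.0 - 2.0 * delta) * x + np.sqrt(2.0 * delta) * xi
    if np.log(rng.uniform()) < phi(x) - phi(y):                    # alpha = 1 ∧ exp(...)
        return y, True
    return x, False
```

With $\Phi \equiv 0$ every proposal is accepted and the chain is an exact autoregression preserving $\gamma$; this is a first hint at why the acceptance probability, and hence the spectral gap bound, can be made uniform in $m$ for pCN.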

  6. Spectral Gaps.
Definition: A Markov transition kernel $P$ with invariant measure $\mu$ has an $L^2_\mu$-spectral gap $1-\beta$ iff
$$\beta = \sup_{f \in L^2_\mu} \frac{\|Pf - \mu(f)\|_2}{\|f - \mu(f)\|_2} < 1.$$
Proposition: If $X_0 \sim \mu$, then for any $f \in L^2_\mu$, $f(X_n)$ satisfies a CLT [Kipnis and Varadhan, 1986] with asymptotic variance
$$\sigma^2_{f,P} \le \frac{2\,\mu(f^2)}{1-\beta}.$$
Proposition: For $X_0 \sim \nu$ with $\nu$ absolutely continuous w.r.t. $\mu$, there is a non-asymptotic result [Rudolf, 2011] of the form
$$\mathrm{MSE} = \mathbb{E}_{\nu,K}\,|S_n(f) - \mu(f)|^2 \lesssim \frac{2}{n(1-\beta)}.$$
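In finite state spaces $\beta$ is computable: for a reversible chain it equals the second-largest eigenvalue modulus of the transition matrix. A toy example (the matrix is ours, chosen only for illustration):

```python
import numpy as np

# A symmetric (hence reversible w.r.t. the uniform measure) transition matrix.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])

# For reversible P, beta = sup ||Pf - mu(f)|| / ||f - mu(f)|| over L^2(mu)
# is the second-largest eigenvalue modulus; the eigenvalues here are 1.0, 0.9, 0.7.
moduli = np.sort(np.abs(np.linalg.eigvalsh(P)))
beta = moduli[-2]          # 0.9, so the L2 spectral gap is 1 - beta = 0.1
```

Plugging $\beta = 0.9$ into the Kipnis-Varadhan bound above gives $\sigma^2_{f,P} \le 20\,\mu(f^2)$ for this chain.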

  7. Key result.
Theorem (Key Result).
1. The RWM algorithm has an $L^2$-spectral gap that decays to zero faster than any negative power of $m$.
2. If $\Phi$ is locally Lipschitz and satisfies a growth assumption, then the transition kernel $P$ of the pCN algorithm has a lower bound on the $L^2$-spectral gap that is uniform in $m$.

  8. Dimension Dependent Results for the RWM.
Conductance:
$$\mathcal{C} = \inf_{\mu(A) \le \frac12} \frac{\int_A P(x, A^c)\,d\mu(x)}{\mu(A)}.$$
Relation to the spectral gap (cf. [Lawler and Sokal, 1988, Sinclair and Jerrum, 1989]):
$$\frac{\mathcal{C}^2}{2} \le 1-\beta \le 2\,\mathcal{C}.$$
Proposition: For any Metropolis-Hastings transition kernel $P$ and $\mu(B) \le \frac12$,
$$1-\beta \le 2 \sup_{x \in B} \alpha(x).$$
Proof: The algorithm started in $B$ can only move to $B^c$ if it accepts the move; hence $P(x, B^c) \le \alpha(x)$.

  9. Dimension Dependent Results for the RWM (continued).
Consider $\mu^m = \gamma^m = \mathcal{L}\big(\sum_{i=1}^m \xi_i e_i\big)$ with $\xi_i$ i.i.d. $N(0,1)$.
Theorem: Let $P^m$ be the Markov kernel of the RWM algorithm applied to $\gamma^m$.

Scaling of $\delta$                        | Upper bound on the spectral gap
$\delta_m \sim m^{-a}$, $a \in [0,1)$      | $1-\beta_m \le K_p m^{-p}$ for any $p$
$\delta_m \sim m^{-a}$, $a \in [1,\infty)$ | $1-\beta_m \le K m^{-a}$
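The mechanism behind these upper bounds is the collapse of the acceptance probability: with $\delta$ fixed (the case $a = 0$), a Monte Carlo estimate of the stationary mean acceptance rate decays rapidly in $m$, and the conductance bound on the previous slide turns small acceptance into a small spectral gap. A small illustration (ours, not from the talk):

```python
import numpy as np

def rwm_accept_rate(m, delta, n=2000, seed=1):
    """Estimate the stationary mean RWM acceptance rate for the target N(0, I_m):
    x ~ N(0, I_m), proposal y = x + sqrt(2 delta) xi,
    alpha(x, y) = 1 ∧ exp((|x|^2 - |y|^2) / 2)."""
    rng = np.random.default_rng(seed)
    acc = 0.0
    for _ in range(n):
        x = rng.standard_normal(m)
        y = x + np.sqrt(2.0 * delta) * rng.standard_normal(m)
        acc += min(1.0, np.exp(0.5 * (x @ x - y @ y)))
    return acc / n
```

For fixed $\delta$ the log acceptance ratio concentrates around $-\delta m$, so the estimated rate drops like $e^{-O(m)}$, consistent with a gap smaller than any power of $m$.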

  10. Preliminaries & Weak Harris Theorem. Why is the small set/minorization approach (Meyn and Tweedie) not applicable?
Definition: A Markov chain $(X_t)$ with kernel $P$ is said to be $\psi$-irreducible if for a non-trivial Borel measure $\psi$ there is an $n \in \mathbb{N}$ s.t. $\psi(A) > 0 \Rightarrow P^n(x,A) > 0$ for all $x$.
Here $P(x,\cdot)$ and $P(y,\cdot)$ are mutually singular for some $x$ and $y$:
$$P(x,dz) = \alpha(x,z)\,N\big((1-2\delta)^{\frac12} x,\, 2\delta C\big)(dz) + r(x)\,\delta_x(dz),$$
$$P(y,dz) = \alpha(y,z)\,N\big((1-2\delta)^{\frac12} y,\, 2\delta C\big)(dz) + r(y)\,\delta_y(dz).$$
In infinite dimensions the two Gaussian parts are mutually singular whenever $(1-2\delta)^{\frac12}(x-y)$ lies outside the Cameron-Martin space of $2\delta C$.

  11. Preliminaries: Weak Harris Theorem.
Definition: $d: H \times H \to \mathbb{R}_+$ is a distance-like function if it is symmetric, lower semi-continuous, and $d(x,y) = 0 \Leftrightarrow x = y$.
Definition: The corresponding Wasserstein distance is given by
$$d(\nu_1, \nu_2) = \inf_{\pi \in \Gamma(\nu_1,\nu_2)} \int_{H^2} d(x,y)\,\pi(dx,dy),$$
with $\Gamma(\nu_1,\nu_2) = \{\pi \in \mathcal{M}(H^2) \mid P^i_* \pi = \nu_i\}$ the set of couplings.
Definition: $P$ has a Wasserstein spectral gap if there exist $\lambda > 0$ and $C > 0$ s.t.
$$d(\nu_1 P^n, \nu_2 P^n) \le C \exp(-\lambda n)\, d(\nu_1, \nu_2) \quad \text{for all } n \in \mathbb{N}.$$

  12. Preliminaries (continued).
Definition: $S \subset H$ is $d$-small if there is $0 < s < 1$ s.t. for all $x, y \in S$
$$d(P(x,\cdot), P(y,\cdot)) \le 1 - s.$$
If $S$ is a small set, then it is also $d$-small for $d(x,y) = \chi_{\{x \ne y\}}(x,y)$, for which the Wasserstein distance coincides with the total variation:
$$\|P(x,\cdot) - P(y,\cdot)\|_{TV} \le 1 - s.$$
Definition: $P$ is $d$-contracting if there is $0 < c < 1$ such that $d(x,y) < 1$ implies
$$d(P(x,\cdot), P(y,\cdot)) \le c\,d(x,y).$$
Definition: $V$ is a Lyapunov function for $P$ if there are $K > 0$ and $0 \le l < 1$ s.t.
$$P^n V(x) \le l^n V(x) + K \quad \text{for all } x \in H \text{ and all } n \in \mathbb{N}.$$

  13. Weak Harris Theorem.
Theorem (Weak Harris Theorem, [Hairer et al., 2011]). Let $\nu_1$ and $\nu_2$ be probability measures on $H$ and $d: H \times H \to [0,1]$ a distance-like function such that
1. $P$ has a Lyapunov function $V$;
2. $P$ is $d$-contracting;
3. the set $S = \{x \in H : V(x) \le 4K\}$ is $d$-small.
Then
$$\tilde d(\nu_1 P^{\tilde n}, \nu_2 P^{\tilde n}) \le \tfrac12\,\tilde d(\nu_1, \nu_2) \quad \text{with } \tilde d(x,y) = \sqrt{d(x,y)\,(1 + V(x) + V(y))},$$
where $\tilde n = \tilde n(l, K, c, s)$ is increasing in $l$, $K$, $c$ and $s$. Moreover, if
- there is a complete metric $d_0$ such that $d_0 \le \sqrt{d}$, and
- $P$ is Feller,
then there exists a unique invariant measure $\mu$ for $P$.

  14. Wasserstein spectral gap: why do we care?
CLT for $\tilde d$-Lipschitz functionals by Komorowski and Walczuk [Komorowski and Walczuk, 2011] (holds in the non-reversible case).
For reversible Markov processes:
Theorem (due to [Wang, 2003]). If $\mathrm{Lip}(\tilde d) \cap L^\infty_\mu$ is dense in $L^2_\mu$, then a Wasserstein spectral gap implies an $L^2$-spectral gap of the same size.

  15. Globally Lipschitz log-density.
Weak Harris Theorem for $d(x,y) = 1 \wedge \frac{\|x-y\|}{\epsilon}$.
Assumption: There are $r > 0$ and $\alpha_l > 0$ s.t.
$$P\big(q_x(\xi) \text{ is accepted} \,\big|\, \|\xi\| \le r\big) \ge \alpha_l.$$
Theorem: Assume that $\Phi$ has a global Lipschitz constant $L$ and the Assumption above is satisfied. Then for $\epsilon$ small enough the pCN algorithm for $\mu$ ($\mu^m$) converges exponentially in
$$\tilde d(x,y) = \sqrt{d(x,y)\,(1 + V(x) + V(y))} \quad \text{with } V(x) = \|x\|^i,\; d(x,y) = 1 \wedge \frac{\|x-y\|}{\epsilon},$$
with an $m$-independent bound on the rate. Moreover, $\mu$ ($\mu^m$) is the unique invariant measure.

  16. Basic coupling.
Recall $d$-contracting: $d(x,y) < 1$ implies $d(P(x,\cdot), P(y,\cdot)) \le c\,d(x,y)$ with $c < 1$.
Proposals from $x$ and $y$ for the same $\xi \sim \gamma$:
$$q_x(\xi) = (1-2\delta)^{\frac12} x + \sqrt{2\delta}\,\xi, \qquad q_y(\xi) = (1-2\delta)^{\frac12} y + \sqrt{2\delta}\,\xi.$$
With $U$ an independent uniform random variable:
$$\tilde x = q_x(\xi)\,\chi_{[0,\alpha(x,q_x)]}(U) + x\,\chi_{(\alpha(x,q_x),1]}(U),$$
$$\tilde y = q_y(\xi)\,\chi_{[0,\alpha(y,q_y)]}(U) + y\,\chi_{(\alpha(y,q_y),1]}(U).$$
Then $P(x,\cdot) = \mathcal{L}(\tilde x)$, $P(y,\cdot) = \mathcal{L}(\tilde y)$, and the basic coupling is $\pi_{\mathrm{Basic}} = \mathcal{L}((\tilde x, \tilde y))$.
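The basic coupling is easy to write down: both chains reuse the same $\xi$ and the same uniform $U$. A sketch under the same eigenbasis convention as the pCN algorithm (names are ours):

```python
import numpy as np

def basic_coupling_step(x, y, phi, lam, delta, rng):
    """One step of the basic coupling for pCN: the chains started at x and y
    share both the proposal noise xi ~ gamma and the acceptance uniform U."""
    xi = lam * rng.standard_normal(x.shape)
    root = np.sqrt(1.0 - 2.0 * delta)
    qx = root * x + np.sqrt(2.0 * delta) * xi
    qy = root * y + np.sqrt(2.0 * delta) * xi
    log_u = np.log(rng.uniform())                   # the same U for both chains
    x_new = qx if log_u < phi(x) - phi(qx) else x
    y_new = qy if log_u < phi(y) - phi(qy) else y
    return x_new, y_new
```

With $\Phi \equiv 0$ both chains always accept, so $\|\tilde x - \tilde y\| = (1-2\delta)^{1/2}\|x - y\|$ exactly; this is the first case in the contraction estimate.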

  17. $d$-contracting.
$$d(P(x,\cdot), P(y,\cdot)) = \inf_{\pi \in \Gamma(P(x,\cdot),P(y,\cdot))} \int d(a,b)\,d\pi \le \int d(a,b)\,d\pi_{\mathrm{Basic}} = \mathbb{E}\,d(\tilde x, \tilde y).$$
Observation: If both algorithms accept the proposal,
$$\|\tilde x - \tilde y\| = \|q_x(\xi) - q_y(\xi)\| = (1-2\delta)^{\frac12}\|x - y\|.$$
If both algorithms reject the proposal, $\|\tilde x - \tilde y\| = \|x - y\|$.
If one accepts and the other rejects, we use $d \le 1$:
$$P(\text{only one accepts}) \le \int |\alpha(x, q_x(\xi)) - \alpha(y, q_y(\xi))|\,d\gamma(\xi) \le \int \big(|\Phi(q_x) - \Phi(q_y)| + |\Phi(x) - \Phi(y)|\big)\,d\gamma(\xi) \le 2L\|x - y\| \le 2L\epsilon\,d(x,y).$$
