Langevin Dynamics
Loucas Pillaud-Vivien
November 7, 2019
Introduction

Sampling distributions over high-dimensional spaces is an important topic in computational statistics and machine learning.

Example of application: Bayesian inference for high-dimensional models.

Problems: most sampling techniques do not scale
1. to high dimension (big $d$);
2. to a large number of data points (big $N$). Recall that HMC needs the full gradient.
Example: Bayesian setting

A Bayesian model is specified by:
1. the sampling distribution of the observed data (the likelihood): $Y \sim L(\cdot \mid \theta)$;
2. a prior distribution $p$ on the parameter space, $\theta \in \mathbb{R}^d$.

The inference is based on the posterior distribution
$$\pi(d\theta) = \frac{p(d\theta)\, L(Y \mid \theta)}{\int L(Y \mid u)\, p(du)}.$$
The normalizing constant is often not tractable (the integral is too high-dimensional), so we can only compute
$$\pi(d\theta) \propto p(d\theta)\, L(Y \mid \theta).$$
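To make the unnormalized posterior concrete, here is a minimal sketch of the associated potential $V(\theta) = -\log p(\theta) - \log L(Y \mid \theta)$ and its gradient. The model (logistic regression with an isotropic Gaussian prior), the function names and the synthetic data are all assumptions chosen for illustration; none of them come from the slides.

```python
import numpy as np

def potential(theta, X, y, prior_var=1.0):
    """V(theta) = -log p(theta) - log L(y | theta), up to an additive constant.

    Assumed model (illustrative only): logistic regression with labels
    y in {0, 1} and an isotropic Gaussian prior N(0, prior_var * I).
    """
    logits = X @ theta
    # Negative log-likelihood: sum_i log(1 + exp(logit_i)) - y_i * logit_i
    neg_log_lik = np.sum(np.logaddexp(0.0, logits) - y * logits)
    neg_log_prior = 0.5 * (theta @ theta) / prior_var
    return neg_log_lik + neg_log_prior

def grad_potential(theta, X, y, prior_var=1.0):
    """Gradient of V; note that it requires a full pass over the N data points."""
    probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (probs - y) + theta / prior_var

# Hypothetical synthetic data: N = 50 observations in dimension d = 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = (rng.random(50) < 0.5).astype(float)
theta = np.zeros(3)
print(potential(theta, X, y), grad_potential(theta, X, y))
```

The normalizing constant never appears: only $V$ and $\nabla V$ are evaluated, and the cost of one gradient grows linearly with $N$.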
Outline

1. Diffusions and their numerical approximation
   - Setting
   - Continuous time Markov process: diffusions
   - Discretized Langevin diffusion
2. Applications of Langevin algorithms
   - Sampling a strongly convex potential
   - Stochastic Gradient Langevin Dynamics
   - Non convex Learning via SGLD
Framework

We want to sample from the following measure, which has a density w.r.t. the Lebesgue measure known up to a normalization factor:
$$d\mu(x) = \frac{e^{-V(x)}\,dx}{\int_{\mathbb{R}^d} e^{-V(y)}\,dy}.$$
We assume that $V$ is $L$-smooth, i.e. continuously differentiable and such that there exists $L > 0$ with
$$\|\nabla V(x) - \nabla V(y)\| \le L\,\|x - y\| \quad \text{for all } x, y \in \mathbb{R}^d.$$
Convergence to equilibrium for Diffusions

Let us consider the overdamped Langevin diffusion in $\mathbb{R}^d$:
$$dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t.$$
- $L$-smoothness of $V$ gives existence and uniqueness of a solution.
- Stationary measure: $d\mu(x) = \dfrac{e^{-V(x)}\,dx}{\int_{\mathbb{R}^d} e^{-V(y)}\,dy}$.
- Semigroup: $P_t(f)(x) = \mathbb{E}[f(X_t) \mid X_0 = x]$ $\longrightarrow$ "law of $X_t$".
- Infinitesimal generator: $\mathcal{L}\phi = \Delta\phi - \nabla V \cdot \nabla\phi$.
- We can verify that the semigroup follows the dynamics $\dfrac{d}{dt} P_t(f) = \mathcal{L} P_t(f)$.

$\longrightarrow$ Question: at what speed does the diffusion converge to equilibrium?
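As a short worked check that $\mu$ is indeed stationary for this generator, the following computation can be done by hand. It is a sketch under an added assumption: $\varphi$ is smooth with compact support, so that the boundary term vanishes; the last line is formal and assumes $P_t(f)$ is regular enough.

```latex
\begin{align*}
e^{-V}\,\mathcal{L}\varphi
  &= e^{-V}\Delta\varphi - e^{-V}\,\nabla V\cdot\nabla\varphi
   = \nabla\cdot\big(e^{-V}\nabla\varphi\big),\\
\int \mathcal{L}\varphi\,d\mu
  &= \frac{1}{Z}\int_{\mathbb{R}^d}\nabla\cdot\big(e^{-V}\nabla\varphi\big)\,dx = 0,
  \qquad Z = \int_{\mathbb{R}^d} e^{-V(y)}\,dy,\\
\frac{d}{dt}\int P_t(f)\,d\mu
  &= \int \mathcal{L}P_t(f)\,d\mu = 0 .
\end{align*}
```

In words: averages against $\mu$ are preserved by the semigroup, which is what stationarity means.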
Convergence to equilibrium for Diffusions

Theorem (Poincaré implies convergence to equilibrium). With the notations above, the following propositions are equivalent:
1. $\mu$ satisfies a Poincaré inequality with constant $\mathcal{P}$;
2. for all smooth $f$ and all $t \ge 0$, $\mathrm{Var}_\mu(P_t(f)) \le e^{-2t/\mathcal{P}}\,\mathrm{Var}_\mu(f)$.

Proof (1 implies 2): Integration by parts formula ($\mu$ is reversible):
$$-\int f\,(\mathcal{L} g)\,d\mu = \int \nabla f \cdot \nabla g\,d\mu = -\int (\mathcal{L} f)\,g\,d\mu.$$
Assume without loss of generality that $\int f\,d\mu = 0$, so that $\int P_t(f)\,d\mu = 0$ for all $t$ by stationarity of $\mu$. Hence,
$$\frac{d}{dt}\mathrm{Var}_\mu(P_t(f)) = \frac{d}{dt}\int (P_t(f))^2\,d\mu = 2\int P_t(f)\,\big(\mathcal{L} P_t(f)\big)\,d\mu = -2\int \|\nabla P_t(f)\|^2\,d\mu \le -\frac{2}{\mathcal{P}}\,\mathrm{Var}_\mu(P_t(f)),$$
and Grönwall's lemma gives the exponential decay.
Poincaré inequalities: definition in modern language

Definition (Poincaré inequality). $\mu \in \mathcal{P}(\mathbb{R}^d)$ satisfies a Poincaré inequality with constant $\mathcal{P} = \mathcal{P}_\mu$ if
$$\mathrm{Var}_\mu(f) \le \mathcal{P}_\mu \int \|\nabla f\|^2\,d\mu$$
for all (bounded) $f : \mathbb{R}^d \to \mathbb{R}$ of class $C^1$.

Recall that:
$$\mathrm{Var}_\mu(f) = \int f^2\,d\mu - \Big(\int f\,d\mu\Big)^2 = \int \Big(f - \int f\,d\mu\Big)^2 d\mu.$$
$\int \|\nabla f\|^2\,d\mu = \mathcal{E}(f)$ is the Dirichlet energy.

Spectral interpretation: $\mathcal{E}(f) = \int \nabla f \cdot \nabla f\,d\mu = \int f\,(-\mathcal{L} f)\,d\mu$ $\longrightarrow$ $1/\mathcal{P} = \lambda_2$, the first non-trivial eigenvalue of $-\mathcal{L}$.
Application to the Ornstein-Uhlenbeck process

The Ornstein-Uhlenbeck process follows the SDE in $\mathbb{R}^d$:
$$dX_t = -X_t\,dt + \sqrt{2}\,dB_t.$$
Denote by $\mathcal{L}$ the operator $\mathcal{L}\phi = \Delta\phi - x \cdot \nabla\phi$. Then:
1. for $d\mu(x) = \dfrac{1}{(2\pi)^{d/2}}\,e^{-\|x\|^2/2}\,dx$, $\mathcal{L}$ is self-adjoint in $L^2_\mu$;
2. $\mu$ is the stationary measure of the O-U process;
3. $\mu$ satisfies a Poincaré inequality with constant 1;
4. for all smooth $f$ and all $t \ge 0$, $\mathrm{Var}_\mu(P_t(f)) \le e^{-2t}\,\mathrm{Var}_\mu(f)$.
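A one-line worked example shows that the rate $e^{-2t}$ cannot be improved. It uses the explicit solution of the O-U SDE, which is a standard fact but is not stated on the slide, and the test function $f(x) = x_1$ (the first coordinate) is chosen here purely for illustration.

```latex
\begin{align*}
X_t &= e^{-t}X_0 + \sqrt{1 - e^{-2t}}\;Z, \qquad Z \sim \mathcal{N}(0, I_d) \text{ independent of } X_0,\\
P_t(f)(x) &= \mathbb{E}\big[(X_t)_1 \,\big|\, X_0 = x\big] = e^{-t}x_1 \qquad \text{for } f(x) = x_1,\\
\mathrm{Var}_\mu\big(P_t(f)\big) &= e^{-2t}\,\mathrm{Var}_\mu(x_1) = e^{-2t} = e^{-2t}\,\mathrm{Var}_\mu(f).
\end{align*}
```

So the inequality in item 4 holds with equality for linear functions, which matches the Poincaré constant being exactly 1.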
Poincaré inequalities

Long story short:
Poincaré inequality $\iff$ Spectral gap for $\mathcal{L}$ $\iff$ Exponential convergence for the diffusion
Poincaré inequalities

For which distributions do they hold?
- When $V$ is $m$-strongly convex: $\mathcal{P} = 1/m$ (compare with the linear convergence of gradient descent).
- When $V$ is only convex: yes, but with no bound on the constant...
- A generic condition for not necessarily convex potentials: $\frac{1}{2}|\nabla V|^2 - \Delta V \ge \alpha$.
- For mixtures of Gaussians, $\mathcal{P}$ explodes exponentially.
Ok, fine. But how do I get back to the real world and draw samples?
Discretized Langevin Diffusion

Idea: sample the diffusion paths of
$$dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t$$
using the Euler-Maruyama scheme
$$X_{k+1} = X_k - \gamma_{k+1}\nabla V(X_k) + \sqrt{2\gamma_{k+1}}\;\xi_{k+1},$$
where
- $(\xi_k)_k$ is i.i.d. $\mathcal{N}(0, I_d)$;
- $(\gamma_k)_k$ is a sequence of stepsizes, either constant or decreasing to 0.

Note the similarity with gradient descent or its stochastic counterpart.
This algorithm is referred to as the Unadjusted Langevin Algorithm, Langevin Monte Carlo or Gradient Langevin Dynamics.
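The scheme above can be sketched in a few lines of NumPy. This is a minimal illustration with a constant stepsize, not a tuned sampler: the function name `ula`, the standard Gaussian target $V(x) = \|x\|^2/2$, the stepsize, the number of iterations and the burn-in length are all assumptions made for the example.

```python
import numpy as np

def ula(grad_V, x0, step, n_iter, rng):
    """Unadjusted Langevin Algorithm with a constant stepsize.

    Iterates X_{k+1} = X_k - step * grad_V(X_k) + sqrt(2 * step) * xi_{k+1},
    with xi_{k+1} ~ N(0, I_d), and returns all iterates, shape (n_iter + 1, d).
    """
    x = np.asarray(x0, dtype=float)
    path = np.empty((n_iter + 1, x.size))
    path[0] = x
    for k in range(n_iter):
        xi = rng.standard_normal(x.size)
        x = x - step * grad_V(x) + np.sqrt(2.0 * step) * xi
        path[k + 1] = x
    return path

# Illustrative target (an assumption): standard Gaussian, so grad V(x) = x.
rng = np.random.default_rng(0)
path = ula(grad_V=lambda x: x, x0=np.zeros(2), step=0.05, n_iter=20000, rng=rng)
samples = path[2000:]          # discard a burn-in period
print(samples.mean(axis=0))    # should be close to 0
print(samples.var(axis=0))     # should be close to 1, up to discretization bias
```

Removing the noise term `np.sqrt(2.0 * step) * xi` turns the loop into plain gradient descent on $V$, which is the similarity noted on the slide.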
Discretized Langevin Diffusion: constant stepsize

When $\gamma_k = \gamma$ for all $k$, $(X_k)_k$ is a homogeneous Markov chain with Markov kernel $R_\gamma$.

Under some mild assumptions, $R_\gamma$ is irreducible and positive recurrent, and hence has an invariant distribution $d\mu_\gamma \neq d\mu$.

Typical questions:
- For a given precision $\epsilon$, how do we choose the stepsize $\gamma$ and the number of iterations $n$ such that $\mathrm{dist}(\delta_x R_\gamma^n, d\mu) \le \epsilon$?
- How do we choose the starting point $x$?
- How do we quantify $\mathrm{dist}(d\mu_\gamma, d\mu)$?
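The gap between $\mu_\gamma$ and $\mu$ can be computed exactly in one simple case, sketched below. The one-dimensional Gaussian target $V(x) = x^2/2$ is an assumption chosen because the chain stays Gaussian; there, one step reads $X_{k+1} = (1-\gamma)X_k + \sqrt{2\gamma}\,\xi_{k+1}$, so the variance follows a linear recursion whose fixed point is $1/(1-\gamma/2)$ for $0 < \gamma < 2$, instead of the target variance 1.

```python
def ula_stationary_variance(step, n_iter=10_000, v0=0.0):
    """Iterate the variance recursion of constant-stepsize ULA for V(x) = x^2 / 2 (d = 1).

    One step is X_{k+1} = (1 - step) * X_k + sqrt(2 * step) * xi_{k+1}, hence
    Var(X_{k+1}) = (1 - step)^2 * Var(X_k) + 2 * step.
    """
    v = v0
    for _ in range(n_iter):
        v = (1.0 - step) ** 2 * v + 2.0 * step
    return v

for step in (0.5, 0.1, 0.01):
    closed_form = 1.0 / (1.0 - step / 2.0)   # fixed point of the recursion
    print(step, ula_stationary_variance(step), closed_form)
```

In this toy case the invariant law is $\mathcal{N}(0, 1/(1-\gamma/2))$: the bias vanishes as $\gamma \to 0$, while the contraction factor $(1-\gamma)^2$ gets closer to 1, so more iterations are needed. This is the trade-off behind the first question above.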