Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for - PowerPoint PPT Presentation

Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization
Thanh Huy Nguyen, Umut Şimşekli, Gaël Richard
LTCI, Télécom Paris, Institut Polytechnique de Paris, France
Non-Asymptotic Analysis of FLMC for Non-Convex Optimization. Thanh Huy Nguyen, Umut Şimşekli, Gaël Richard

Introduction
Non-convex optimization problem: $\min_x f(x)$
Fractional Langevin Algorithm (FLA) (Simsekli, 2017):
$W^{k+1} = W^k - \eta c_\alpha \nabla f(W^k) + (\eta/\beta)^{1/\alpha} \Delta L^\alpha_{k+1}$
- $\{\Delta L^\alpha_k\}_{k \in \mathbb{N}_+}$: $\alpha$-stable random variables
- $\alpha \in (1, 2]$: the characteristic index; $c_\alpha$: a known constant
[Figure: two panels, "$\alpha$-stable Distribution" and "$\alpha$-stable Lévy Motion", each with curves for $\alpha = 1.2, 1.6, 2.0$]
Generalizes Stochastic Gradient Langevin Dynamics ($\alpha = 2$) (Welling and Teh, 2011)
Strong links with SGD for Deep Neural Networks (Simsekli et al., 2019)
Our Goal: Analyze $\mathbb{E}[f(W^k) - f^*]$, where $f^* \triangleq \min_x f(x)$
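The FLA update $W^{k+1} = W^k - \eta c_\alpha \nabla f(W^k) + (\eta/\beta)^{1/\alpha} \Delta L^\alpha_{k+1}$ can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation. It assumes the noise has independent, standard symmetric $\alpha$-stable coordinates, drawn here with the Chambers-Mallows-Stuck method. The names `sym_alpha_stable` and `fla` are hypothetical, and `c_alpha` is passed in by the caller because the slides only call it "a known constant".

```python
import math
import random

def sym_alpha_stable(alpha, rng):
    """Draw one standard symmetric alpha-stable sample (Chambers-Mallows-Stuck).

    For alpha = 2 this reduces to a Gaussian with variance 2.
    """
    u = math.pi * (rng.random() - 0.5)   # uniform on [-pi/2, pi/2)
    w = rng.expovariate(1.0)             # exponential with mean 1
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def fla(grad_f, w0, eta, beta, alpha, c_alpha, n_steps, seed=0):
    """Run n_steps of the FLA recursion and return the final iterate.

    Assumption: noise coordinates are independent (not stated on the slides).
    """
    rng = random.Random(seed)
    w = list(w0)
    scale = (eta / beta) ** (1.0 / alpha)   # noise scale (eta / beta)^(1 / alpha)
    for _ in range(n_steps):
        g = grad_f(w)
        w = [wi - eta * c_alpha * gi + scale * sym_alpha_stable(alpha, rng)
             for wi, gi in zip(w, g)]
    return w

# Toy usage: f(x) = 0.5 * ||x||^2, so grad f(x) = x; c_alpha = 1.0 is a placeholder.
w_final = fla(lambda w: list(w), [5.0, -3.0], eta=0.05, beta=100.0,
              alpha=1.8, c_alpha=1.0, n_steps=500)
```

Smaller `alpha` makes large jumps more frequent, while `alpha = 2` recovers Gaussian noise, which is the sense in which FLA generalizes SGLD.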

Method of Analysis
Define three stochastic processes:
$dX_1(t) = -c_\alpha \nabla f(X_1(t-))\,dt + \beta^{-1/\alpha}\,dL^\alpha(t)$,
$dX_2(t) = -c_\alpha \sum_{j=0}^{\infty} \nabla f(X_2(j\eta))\,\mathbb{I}_{[j\eta,(j+1)\eta[}(t)\,dt + \beta^{-1/\alpha}\,dL^\alpha(t)$,
$dX_3(t) = -\dfrac{\mathcal{D}^{\alpha-2}_{x_i}\big(\phi(X_3(t-))\,\partial f(X_3(t-))/\partial x_i\big)}{\phi(X_3(t-))}\,dt + \beta^{-1/\alpha}\,dL^\alpha(t)$.
- $\mathcal{D}$: Riesz fractional (directional) derivative
- $X_1$ is the continuous-time limit of FLA
- $X_2$ is a linearly interpolated version of $W^k$: $X_2(k\eta) = W^k$, $\forall k \in \mathbb{N}_+$
- $X_3$ admits $\pi \propto \exp(-\beta f(x))\,dx$ as its unique invariant distribution
Decompose the error $\mathbb{E} f(W^k) - f^*$ as:
$[\mathbb{E} f(X_2(k\eta)) - \mathbb{E} f(X_1(k\eta))] + [\mathbb{E} f(X_1(k\eta)) - \mathbb{E} f(X_3(k\eta))] + [\mathbb{E} f(X_3(k\eta)) - \mathbb{E} f(\hat{W})] + [\mathbb{E} f(\hat{W}) - f^*]$
- $\hat{W} \sim \pi \propto \exp(-\beta f(x))\,dx$
- Relate these terms to the Wasserstein distance between the processes

Main Result
Main assumptions:
1) Hölder continuous gradients: $c_\alpha \|\nabla f(x) - \nabla f(y)\| \le M \|x - y\|^\gamma$
2) Dissipativity: $c_\alpha \langle x, \nabla f(x) \rangle \ge m \|x\|^{1+\gamma} - b$
Theorem. For $0 < \eta < m/M^2$, there exists $C > 0$ such that:
$\mathbb{E}[f(W^k)] - f^* \le C \Big\{ k^{1+\max\{\frac{1}{q},\,\gamma+\frac{\gamma}{q}\}} \eta^{\frac{1}{q}} + \dfrac{k^{1+\max\{\frac{1}{q},\,\gamma+\frac{\gamma}{q}\}} \eta^{\frac{1}{q}+\frac{\gamma}{\alpha q}}\, d}{\beta^{\frac{(q-1)\gamma}{\alpha q}}} + \dfrac{\beta b + d}{m} \exp(-\lambda_* k \eta) \Big\} + \dfrac{M c_\alpha^{-1}}{\beta^{\gamma+1}(1+\gamma)} + \dfrac{1}{\beta} \log \dfrac{(2e(b + \frac{d}{\beta}))^{\frac{d}{2}}\, \Gamma(\frac{d}{2}+1)\, \beta^d}{(dm)^{\frac{d}{2}}}$.
- Worse dependency on $\eta$ and $k$ than in the case $\alpha = 2$
- Requires smaller $\eta$
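To see what the two assumptions and the step-size condition ask for, it can help to check them on a toy objective. The worked example below assumes $f(x) = \tfrac{1}{2}\|x\|^2$ and takes $\gamma = 1$ purely for illustration. The slides do not state the admissible range of $\gamma$, so this case may fall outside the theorem's scope.

```latex
% Illustrative check only (assumptions: f(x) = \tfrac12 \|x\|^2 and \gamma = 1).
\begin{align*}
\nabla f(x) &= x, \\
c_\alpha \|\nabla f(x) - \nabla f(y)\| &= c_\alpha \|x - y\| \le M \|x - y\|^{1}
  && \text{with } M = c_\alpha, \\
c_\alpha \langle x, \nabla f(x) \rangle &= c_\alpha \|x\|^{2} \ge m \|x\|^{1+1} - b
  && \text{with } m = c_\alpha,\ b = 0, \\
0 < \eta &< \frac{m}{M^{2}} = \frac{1}{c_\alpha}.
\end{align*}
```

In this toy case the step-size bound depends only on $c_\alpha$.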

Additional Results
Posterior Sampling: sampling from $\pi \propto \exp(-\beta f(x))\,dx$
Stochastic Gradients: $f(x) \triangleq \frac{1}{n} \sum_{i=1}^{n} f^{(i)}(x)$, with $\nabla f \approx \nabla f_k(x) \triangleq \sum_{i \in \Omega_k} \nabla f^{(i)}(x) / n_s$
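The stochastic gradient $\nabla f_k(x) = \sum_{i \in \Omega_k} \nabla f^{(i)}(x) / n_s$ can be sketched as follows. This is a minimal illustration under assumptions: the index set $\Omega_k$ is drawn uniformly without replacement with $|\Omega_k| = n_s$ (the slides do not specify the sampling scheme), and the name `minibatch_grad` and the toy components $f^{(i)}(x) = \tfrac{1}{2}(x - a_i)^2$ are hypothetical.

```python
import random

def minibatch_grad(grads, x, n_s, rng):
    """Average the gradients of n_s components picked uniformly without replacement.

    grads: list of callables, grads[i](x) returns the gradient of f^(i) at x.
    Assumption: Omega_k is sampled without replacement (not stated on the slides).
    """
    omega_k = rng.sample(range(len(grads)), n_s)   # random index subset Omega_k
    total = [0.0] * len(x)
    for i in omega_k:
        g = grads[i](x)
        for j in range(len(x)):
            total[j] += g[j]
    return [t / n_s for t in total]

# Toy components f^(i)(x) = 0.5 * (x - a_i)^2, so grad f^(i)(x) = x - a_i.
a_values = [1.0, 2.0, 3.0, 6.0]
component_grads = [lambda x, a=a: [x[0] - a] for a in a_values]

rng = random.Random(0)
full_grad = minibatch_grad(component_grads, [0.0], n_s=len(a_values), rng=rng)
noisy_grad = minibatch_grad(component_grads, [0.0], n_s=2, rng=rng)
```

With `n_s` equal to `n`, the estimate coincides with the full gradient of $f$. A smaller `n_s` trades accuracy for cost per step.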
