Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond
Xuechen Li (1,2), Denny Wu (1,2), Lester Mackey (3), Murat A. Erdogdu (1,2)
(1) University of Toronto, (2) Vector Institute, (3) Microsoft Research
The Problem and Our Work
Given a smooth potential $f : \mathbb{R}^d \to \mathbb{R}$, sample from the density $p(x) \propto \exp(-f(x))$.
• We study both strongly convex and non-convex potentials.
• Many papers study individual algorithms [1, 2, 3, 4, 5], but a unifying theoretical framework has been missing.
• We provide a theorem that gives the convergence rate of a sampling algorithm obtained by discretizing an exponentially contracting diffusion, based on local properties of the numerical method.
• As a direct consequence, we obtain faster-converging algorithms from the class of stochastic Runge-Kutta (SRK) methods.
Exponential $W_2$-Contraction of Diffusions
A diffusion $X_t$ has exponential $W_2$-contraction if, for some $\alpha > 0$, two instances $X_{t,x}$ and $X_{t,y}$ initiated from $x$ and $y$ respectively satisfy
$$W_2(X_{t,x}, X_{t,y}) \le e^{-\alpha t} \|x - y\|_2, \quad \text{for all } x, y \in \mathbb{R}^d,\ t \ge 0.$$
Informal: The marginals of the continuous-time diffusion become the same very quickly, regardless of the initial state.
Example: When $f$ is strongly convex, the Langevin diffusion characterized by the SDE
$$dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t$$
has exponential $W_2$-contraction.
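The contraction property can be illustrated numerically with a minimal sketch. It assumes the one-dimensional potential $f(x) = x^2/2$ (so $\nabla f(x) = x$ and $\alpha = 1$), and it uses an Euler-Maruyama chain as a stand-in for the continuous diffusion, so it shows the discretized analogue rather than the statement itself. The function name, step size, and starting points are illustrative choices, not taken from the slides. Both chains share the same Gaussian noise (a synchronous coupling), so their gap upper-bounds the $W_2$ distance between their laws.

```python
import math
import random


def coupled_euler_maruyama(x0, y0, step, n_steps, seed=0):
    """Run two Euler-Maruyama chains for dX = -X dt + sqrt(2) dB with shared noise."""
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(n_steps):
        # The same Brownian increment drives both chains (synchronous coupling).
        noise = math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        # grad f(x) = x for the assumed potential f(x) = x^2 / 2.
        x = x - step * x + noise
        y = y - step * y + noise
    return x, y


x0, y0, step, n_steps = 3.0, -1.0, 0.01, 500
x, y = coupled_euler_maruyama(x0, y0, step, n_steps)

gap = abs(x - y)
# Contraction bound e^{-alpha t} |x0 - y0| with alpha = 1 and t = step * n_steps.
bound = math.exp(-step * n_steps) * abs(x0 - y0)
print(f"coupled gap = {gap:.5f}, contraction bound = {bound:.5f}")
```

For this quadratic potential the noise cancels in the difference, so the gap shrinks by a factor $(1 - h)$ per step, which is at most $e^{-h}$.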
Local Deviation
Let $\{\tilde{X}_k\}_{k \in \mathbb{N}}$ be a discretization of $\{X_t\}_{t \ge 0}$, and let $\{X_s^{(k)}\}_{s \ge 0}$ be another instance of the diffusion starting from $\tilde{X}_{k-1}$ at $s = 0$. The local deviation at iteration $k$ is defined as
$$D_h^{(k)} = X_h^{(k)} - \tilde{X}_k.$$
[Figure: one step of the discretization, showing the iterate $\tilde{X}_2$, the diffusion endpoint $X_h^{(3)}$, the local deviation $D_h^{(3)}$, and the next iterate $\tilde{X}_3$.]
Uniform Orders of Local Deviation
Recall the local deviation $D_h^{(k)} = X_h^{(k)} - \tilde{X}_k$. A numerical scheme has uniform mean-square and mean orders of $(p_1, p_2)$ if, for all $k \in \mathbb{N}$,
$$\mathcal{E}_k^{(1)} = \mathbb{E}\left[ \mathbb{E}\left[ \|D_h^{(k)}\|_2^2 \,\middle|\, \mathcal{F}_{t_{k-1}} \right] \right] \le \lambda_1 h^{2 p_1}, \quad (1)$$
$$\mathcal{E}_k^{(2)} = \mathbb{E}\left[ \left\| \mathbb{E}\left[ D_h^{(k)} \,\middle|\, \mathcal{F}_{t_{k-1}} \right] \right\|_2^2 \right] \le \lambda_2 h^{2 p_2}, \quad (2)$$
for constants $\lambda_1$ and $\lambda_2$ independent of $h$.
Remark: Bounds like (1) appeared explicitly in previous works (see e.g. [1]). To the best of our knowledge, (2) did not appear explicitly in previous works.
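A small worked example may help make the two orders concrete. The sketch below assumes the one-dimensional potential $f(x) = x^2/2$, for which the Langevin diffusion $dX_t = -X_t\,dt + \sqrt{2}\,dB_t$ has an explicit solution, and it looks at a single Euler-Maruyama step from a fixed starting point $x$ (so it checks the scaling in $h$ only, not uniformity over iterations). Driving both the exact solution and the Euler-Maruyama step with the same Brownian path gives $D_h = x(e^{-h} - 1 + h) + \sqrt{2}\int_0^h (e^{-(h-s)} - 1)\,dB_s$, whose moments are available in closed form. All function names here are illustrative.

```python
import math


def em_local_deviation_moments(x, h):
    """Closed-form moments of the one-step Euler-Maruyama deviation D_h.

    Assumes dX = -X dt + sqrt(2) dB started at the fixed point x, with the
    exact solution and the Euler-Maruyama step sharing one Brownian path.
    """
    # E[D_h] = x * (e^{-h} - 1 + h); expm1 avoids cancellation for small h.
    mean = x * (math.expm1(-h) + h)
    # Ito isometry: Var[D_h] = 2 * int_0^h (e^{-u} - 1)^2 du.
    var = 2.0 * (-0.5 * math.expm1(-2.0 * h) + 2.0 * math.expm1(-h) + h)
    mean_square = mean ** 2 + var  # E[|D_h|^2], bounded by lambda_1 h^{2 p_1}
    squared_mean = mean ** 2       # |E[D_h]|^2, bounded by lambda_2 h^{2 p_2}
    return mean_square, squared_mean


def estimated_order(small, large, h_small, h_large):
    """Estimate p from two values of a quantity that scales like h^{2p}."""
    return 0.5 * math.log(large / small) / math.log(h_large / h_small)


h_small, h_large, x = 0.01, 0.02, 1.0
ms_small, sm_small = em_local_deviation_moments(x, h_small)
ms_large, sm_large = em_local_deviation_moments(x, h_large)

p1 = estimated_order(ms_small, ms_large, h_small, h_large)
p2 = estimated_order(sm_small, sm_large, h_small, h_large)
print(f"estimated mean-square order p1 = {p1:.3f}")
print(f"estimated mean order        p2 = {p2:.3f}")
```

For this smooth quadratic potential the estimates land close to $(1.5, 2.0)$: the mean-square deviation scales like $h^3$ and the squared mean deviation like $h^4$.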
A General Theorem
Theorem (Informal). Suppose a diffusion has stationary distribution $p(x) \propto \exp(-f(x))$ and exhibits exponential $W_2$-contraction. Then a numerical discretization of this diffusion with uniform mean-square and mean orders of $(p_1, p_2)$, for $p_2 \ge p_1 + 1/2$, has rate $\tilde{O}(\epsilon^{-1/(p_1 - 1/2)})$ in $W_2$.
Remark 1: This connects the numerical SDE and sampling literatures: take any classical SDE discretization method and immediately know its convergence rate when it is used for sampling.
Remark 2: The theorem can also be used for discretizations of the underdamped Langevin diffusion. Check out our examples in the paper.
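The dependence of the rate on $p_1$ can be made concrete with a short calculation. This sketch reads the theorem's condition as $p_2 \ge p_1 + 1/2$ and returns the exponent $1/(p_1 - 1/2)$ of $1/\epsilon$ in the iteration count. The helper name `rate_exponent` is hypothetical, and the extra guard $p_1 > 1/2$ is an assumption added here so that the exponent is finite and positive. Dimension dependence and logarithmic factors are ignored.

```python
from fractions import Fraction


def rate_exponent(p1, p2):
    """Return a such that the iteration count scales like eps^{-a}.

    p1, p2 are the uniform mean-square and mean orders, given as decimal
    strings or integers so that the arithmetic stays exact.
    """
    p1, p2 = Fraction(p1), Fraction(p2)
    half = Fraction(1, 2)
    if p1 <= half:
        # Assumption added for this sketch: needed for a finite, positive exponent.
        raise ValueError("need p1 > 1/2")
    if p2 < p1 + half:
        raise ValueError("need p2 >= p1 + 1/2")
    return 1 / (p1 - half)


for orders in [("1.0", "1.5"), ("1.5", "2.0"), ("2.0", "2.5")]:
    print(orders, "-> eps^(-%s)" % rate_exponent(*orders))
```

Raising the mean-square order from 1.0 to 2.0 lowers the exponent from 2 to 2/3, which is the sense in which higher-order schemes converge faster.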
Convergence Rates for EM and SRK

Result                 Diffusion   Smoothness   Unif. Orders   Rate
EM (Durmus et al.)     Langevin    1st          (1.0, 1.5)     $\tilde{O}(d \epsilon^{-2})$
EM (Ex. 1)             Langevin    1st & 2nd    (1.5, 2.0)     $\tilde{O}(d \epsilon^{-1})$
SRK-LD (This work)     Langevin    1st-3rd      (2.0, 2.5)     $\tilde{O}(d \epsilon^{-2/3})$
EM (Ex. 2)             General     1st          (1.0, 1.5)     $\tilde{O}(d \epsilon^{-2})$
SRK-ID (This work)     General     1st          (1.5, 2.0)     $\tilde{O}(d^{3/4} m^2 \epsilon^{-1})$

Table: Convergence rates in $W_2$, i.e. the number of iterations required to reach $\epsilon$ accuracy to the target in $W_2$. The top three rows are for strongly convex $f$; the bottom two are for non-convex $f$ that admits a uniformly dissipative diffusion.
EM = Euler-Maruyama; SRK = Stochastic Runge-Kutta.
Thanks to you and my coauthors: Denny Wu, Lester Mackey, Murat A. Erdogdu
Our poster: East Exhibition Hall B + C #162

[1] Xiang Cheng, Niladri S Chatterji, Peter L Bartlett, and Michael I Jordan. Underdamped Langevin MCMC: A non-asymptotic analysis.
[2] Arnak S Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities.
[3] Alain Durmus, Eric Moulines, et al. Nonasymptotic convergence analysis for the unadjusted Langevin algorithm.
[4] Yin Tat Lee, Zhao Song, and Santosh S Vempala. Algorithmic theory of ODEs and sampling from well-conditioned logconcave densities.
[5] Santosh S Vempala and Andre Wibisono. Rapid convergence of the unadjusted Langevin algorithm: Log-Sobolev suffices.