Mirrored Langevin Dynamics

Ya-Ping Hsieh, https://lions.epfl.ch
Laboratory for Information and Inference Systems (LIONS)
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

NeurIPS Spotlight, Dec 6th, 2018
Joint work with Ali Kavis, Paul Rolland, Volkan Cevher @ LIONS
Introduction

◦ Task: given a target distribution $d\mu = e^{-V(x)}\,dx$, generate samples from $\mu$.
  ⊲ Fundamental in machine learning, statistics, computer science, etc.

◦ A scalable framework: first-order sampling (assuming access to $\nabla V$).

  Step 1. Langevin Dynamics:
    $dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t \;\Rightarrow\; X_\infty \sim e^{-V}$.

  Step 2. Discretize:
    $x_{k+1} = x_k - \beta_k \nabla V(x_k) + \sqrt{2\beta_k}\,\xi_k$
    ⊲ $\beta_k$ the step size, $\xi_k$ standard normal
    ⊲ strong analogy to the gradient descent method
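To make Step 2 concrete, here is a minimal sketch of the resulting sampler (the unadjusted Langevin algorithm) in Python; the standard-Gaussian target and the constant step size are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative target: V(x) = ||x||^2 / 2, so e^{-V} is N(0, I).
grad_V = lambda x: x

d, beta, n_steps = 2, 0.05, 10_000
x = np.zeros(d)
samples = np.empty((n_steps, d))
for k in range(n_steps):
    # x_{k+1} = x_k - beta * grad V(x_k) + sqrt(2 * beta) * xi_k
    x = x - beta * grad_V(x) + np.sqrt(2 * beta) * rng.standard_normal(d)
    samples[k] = x

print(samples.mean(axis=0), samples.var(axis=0))  # roughly 0 and 1 per coordinate
```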
Recent progress: Unconstrained distributions are easy

◦ State of the art: when $\mathrm{dom}(V) = \mathbb{R}^d$,

  Assumption                         | $W_2$                        | $d_{TV}$                     | KL                           | Literature
  $mI \preceq \nabla^2 V \preceq LI$ | $\tilde{O}(\epsilon^{-2} d)$ | $\tilde{O}(\epsilon^{-2} d)$ | $\tilde{O}(\epsilon^{-1} d)$ | [Cheng and Bartlett, 2017], [Dalalyan and Karagulyan, 2017], [Durmus et al., 2018]
  $0 \preceq \nabla^2 V \preceq LI$  | -                            | $\tilde{O}(\epsilon^{-4} d)$ | $\tilde{O}(\epsilon^{-2} d)$ | [Durmus et al., 2018]

  Note: $W_2(\mu_1, \mu_2) := \inf_{X \sim \mu_1,\, Y \sim \mu_2} \sqrt{\mathbb{E}\|X - Y\|^2}$ and $d_{TV}(\mu_1, \mu_2) := \sup_{A \text{ Borel}} |\mu_1(A) - \mu_2(A)|$ (illustrated numerically below).

◦ What about constrained distributions?
  ⊲ They include many important applications, such as Latent Dirichlet Allocation (LDA).
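As a quick numerical illustration of the $d_{TV}$ definition above, the snippet below compares ULA samples from the earlier Gaussian sketch against their target on a fixed partition; the binning is an illustrative choice, and the binned sum only lower-bounds the true total variation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def ula_samples(n, beta=0.05):
    """1-D ULA chain for the standard Gaussian target (grad V(x) = x)."""
    x, out = 0.0, np.empty(n)
    for i in range(n):
        x = x - beta * x + np.sqrt(2 * beta) * rng.standard_normal()
        out[i] = x
    return out

samples = ula_samples(200_000)
edges = np.linspace(-5.0, 5.0, 101)
emp_mass, _ = np.histogram(samples, bins=edges)
emp_mass = emp_mass / len(samples)
true_mass = np.diff(norm.cdf(edges))

# On a finite partition, sup_A |mu1(A) - mu2(A)| reduces to half the L1 distance.
print(0.5 * np.sum(np.abs(emp_mass - true_mass)))
```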
A challenge: Constrained distributions are hard

◦ When $\mathrm{dom}(V)$ is compact, convergence rates deteriorate significantly.

  Assumption                         | $W_2$ or KL | $d_{TV}$                       | Literature
  $mI \preceq \nabla^2 V \preceq LI$ | ?           | $\tilde{O}(\epsilon^{-6} d^5)$ | [Brosse et al., 2017]
  $0 \preceq \nabla^2 V \preceq LI$  | ?           | $\tilde{O}(\epsilon^{-6} d^5)$ | [Brosse et al., 2017]

  ⊲ cf. when $V$ is unconstrained, $\tilde{O}(\epsilon^{-4} d)$ convergence in $d_{TV}$.
  ⊲ Projection is not a solution: slow rates [Bubeck et al., 2015] and boundary issues (sketched below).
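For contrast, here is what the projection-based baseline looks like on the simplex; a minimal sketch with a hypothetical Dirichlet-style target, using a standard sorting-based Euclidean projection. The gradient clipping needed near the boundary already hints at the issues the slide mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

def project_simplex(v):
    """Euclidean projection onto the probability simplex (sorting-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

# Hypothetical Dirichlet(alpha) target: V(x) = -sum_i (alpha_i - 1) log x_i.
alpha = np.array([2.0, 3.0, 4.0])
grad_V = lambda x: -(alpha - 1.0) / x

beta, n_steps = 1e-4, 20_000
x = np.ones(3) / 3
for _ in range(n_steps):
    # Projection can land iterates exactly on the boundary, where grad V blows up.
    g = grad_V(np.clip(x, 1e-10, None))
    x = project_simplex(x - beta * g + np.sqrt(2 * beta) * rng.standard_normal(3))
print(x)
```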
Unconstrained optimization of constrained problems

◦ Entropic Mirror Descent: unconstrained optimization of problems over the simplex,
    $\min_{x \in \Delta_d} V(x)$.
  ⊲ Choose $h$ to be the entropic mirror map, $h^\star$ its dual.
  ⊲ Mirror vs. primal image: $y = \nabla h(x) \Leftrightarrow x = \nabla h^\star(y)$.
    $y_{k+1} = y_k - \beta_k \nabla V(x_k)$ ⇒ no projection, since $\mathrm{dom}(h^\star) = \mathbb{R}^d$ (sketched below).

◦ A "mirror descent theory" for Langevin Dynamics?
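A minimal sketch of the entropic mirror descent update in Python; the quadratic objective and the $1/\sqrt{k}$ step size are illustrative choices, not from the slides.

```python
import numpy as np

d = 5
A = np.diag(np.arange(1.0, d + 1))  # V(x) = 0.5 * x^T A x (convex, illustrative)
grad_V = lambda x: A @ x

x = np.ones(d) / d                   # start at the center of the simplex
for k in range(1, 1001):
    beta = 1.0 / np.sqrt(k)
    # Dual step: y = grad h(x) - beta * grad V(x); for the entropic map,
    # grad h(x) = log x + 1, and the constant 1 cancels in the normalization.
    y = np.log(x) - beta * grad_V(x)
    x = np.exp(y - y.max())          # back to the primal via grad h* (softmax):
    x /= x.sum()                     # iterates stay inside the simplex, no projection

print(x)  # approaches the minimizer, whose coordinates are proportional to 1/a_i
```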
Mirrored Langevin Dynamics (MLD)

◦ Given $e^{-V}$ and $h$, compute the pushforward $e^{-W} := \nabla h_\# e^{-V}$.

  MLD $\equiv \begin{cases} dY_t = -\nabla W \circ \nabla h(X_t)\,dt + \sqrt{2}\,dB_t \\ X_t = \nabla h^\star(Y_t) \end{cases} \;\Rightarrow\; X_\infty \sim e^{-V}$.

◦ Discretize: $\begin{cases} y_{k+1} = y_k - \beta_k \nabla W(y_k) + \sqrt{2\beta_k}\,\xi_k \\ x_{k+1} = \nabla h^\star(y_{k+1}) \end{cases}$.

◦ The dual distribution $e^{-W}$ can be unconstrained even if $e^{-V}$ is constrained.
  ⊲ Convergence rates for $e^{-W}$ are easy (worked example below).
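By the change-of-variables formula, the dual potential is $W(y) = V(\nabla h^\star(y)) - \log\det\nabla^2 h^\star(y)$. As a worked 1-D example (my choice of target, not from the slides): on $(0,1)$ with the entropic map $h(x) = x\log x + (1-x)\log(1-x)$, we have $\nabla h = \mathrm{logit}$ and $\nabla h^\star = \sigma$ (the sigmoid); for a Beta$(a,b)$ target one can check that $W(y) = -a\log\sigma(y) - b\log(1-\sigma(y))$, which is smooth and convex on all of $\mathbb{R}$ even when $a, b < 1$ make the primal potential non-convex.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))

# Beta(a, b) target on (0, 1); with a, b < 1 the primal potential V is
# non-convex, but the dual potential W is convex and smooth.
a, b = 0.5, 0.5
grad_W = lambda y: (a + b) * sigmoid(y) - a  # derivative of -a log s(y) - b log(1 - s(y))

beta, n_steps, burn_in = 0.01, 100_000, 10_000
y, xs = 0.0, []
for k in range(n_steps):
    # Dual Langevin step, then map back to the primal: x_k = grad h*(y_k).
    y = y - beta * grad_W(y) + np.sqrt(2 * beta) * rng.standard_normal()
    if k >= burn_in:
        xs.append(sigmoid(y))

print(np.mean(xs))  # approaches the Beta(a, b) mean a / (a + b) = 0.5
```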
Benefits of MLD

◦ Improved rates for constrained sampling.
◦ Can turn non-convex sampling problems into convex ones!
  ⊲ We provide the first $\tilde{O}(1/\sqrt{T})$ rate for Latent Dirichlet Allocation.
◦ Works well in practice.
For more details...

Welcome to our poster #43!
References

Brosse, N., Durmus, A., Moulines, É., and Pereyra, M. (2017). Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo. arXiv preprint arXiv:1705.08964.

Bubeck, S., Eldan, R., and Lehec, J. (2015). Sampling from a log-concave distribution with projected Langevin Monte Carlo. arXiv preprint arXiv:1507.02564.

Cheng, X. and Bartlett, P. (2017). Convergence of Langevin MCMC in KL-divergence. arXiv preprint arXiv:1705.09048.

Dalalyan, A. S. and Karagulyan, A. G. (2017). User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. arXiv preprint arXiv:1710.00095.

Durmus, A., Majewski, S., and Miasojedow, B. (2018). Analysis of Langevin Monte Carlo via convex optimization. arXiv preprint arXiv:1802.09188.