Accelerated Flow for Probability Distributions


  1. Accelerated Flow for Probability Distributions. Thirty-sixth International Conference on Machine Learning (ICML), Long Beach, 2019. Amirhossein Taghvaei, joint work with P. G. Mehta, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. June 13, 2019.

  2. Objective and main idea
     - Euclidean space: gradient descent, accelerated methods.
     - Space of probability distributions: Wasserstein gradient flow, accelerated counterpart unknown (?).
     - Objective: construct accelerated flows for probability distributions.
     - Approach: Wibisono et al. (2017) proposed a variational formulation to construct accelerated flows on Euclidean space. Our approach is to extend the variational formulation to probability distributions.
     - Slide footer: Accelerated gradient flow for prob. dist., Amirhossein Taghvaei.

  5. Variational formulation for Euclidean space
     - Vector variables in $\mathbb{R}^d$:
       - Objective function: $f(x)$
       - Gradient flow: $\dot{x}_t = -\nabla f(x_t)$
       - Lagrangian: $t^3 \left( \tfrac{1}{2} |\dot{x}_t|^2 - f(x_t) \right)$
       - Accelerated flow: $\ddot{x}_t = -\tfrac{3}{t} \dot{x}_t - \nabla f(x_t)$
     - Probability distributions in $\mathcal{P}_2(\mathbb{R}^d)$: the objective function, gradient flow, Lagrangian, and accelerated flow are all still open (?).
     - The accelerated flow is obtained by minimizing the action integral of the Lagrangian.
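
The two Euclidean flows on this slide can be compared numerically. The sketch below is illustrative only and is not taken from the talk: the quadratic objective, the matrix `A`, the start time `t0`, the step size `dt`, and both discretization schemes are assumptions chosen for the example.

```python
import numpy as np

def f(x, A):
    """Illustrative objective f(x) = 0.5 * x^T A x (an assumption, not from the slides)."""
    return 0.5 * x @ A @ x

def grad_f(x, A):
    """Gradient of the quadratic objective above, for symmetric A."""
    return A @ x

def gradient_flow(x0, A, t0=0.1, t_end=100.0, dt=1e-2):
    """Forward-Euler discretization of x' = -grad f(x)."""
    x = np.array(x0, dtype=float)
    n_steps = int(round((t_end - t0) / dt))
    for _ in range(n_steps):
        x = x - dt * grad_f(x, A)
    return x

def accelerated_flow(x0, A, t0=0.1, t_end=100.0, dt=1e-2):
    """Semi-implicit Euler for x'' = -(3/t) x' - grad f(x), started at rest at t0 > 0."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)  # zero initial velocity
    t = t0                # t0 > 0 avoids the singular damping term 3/t at t = 0
    n_steps = int(round((t_end - t0) / dt))
    for _ in range(n_steps):
        v = v + dt * (-(3.0 / t) * v - grad_f(x, A))  # update velocity first
        x = x + dt * v                                # then position with the new velocity
        t += dt
    return x

if __name__ == "__main__":
    A = np.diag([1e-3, 1.0])   # ill-conditioned quadratic, chosen for illustration
    x0 = np.array([1.0, 1.0])
    print("gradient flow    f =", f(gradient_flow(x0, A), A))
    print("accelerated flow f =", f(accelerated_flow(x0, A), A))
```

The ill-conditioned `A` is deliberate: the gradient flow decays slowly along the flat direction, which is where the time-dependent damping $3/t$ is expected to help.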

  6. Wasserstein gradient flow
     - Vector variables in $\mathbb{R}^d$:
       - Objective function: $f(x)$
       - Gradient flow: $\dot{x}_t = -\nabla f(x_t)$
       - Lagrangian: $t^3 \left( \tfrac{1}{2} |u_t|^2 - f(x_t) \right)$, where $u_t = \dot{x}_t$
       - Accelerated flow: $\ddot{x}_t = -\tfrac{3}{t} \dot{x}_t - \nabla f(x_t)$
     - Probability distributions in $\mathcal{P}_2(\mathbb{R}^d)$:
       - Objective function: $F(\rho) = D(\rho \,\|\, \rho_\infty)$
       - Gradient flow: $dX_t = -\nabla f(X_t)\, dt + \sqrt{2}\, dB_t$
       - Lagrangian: ?
       - Accelerated flow: ?
     - The Wasserstein gradient flow with respect to relative entropy is the Fokker-Planck equation (Jordan et al., 1998).
     - The Fokker-Planck equation is realized with the Langevin SDE.
     - The goal is to obtain accelerated forms of the SDE.
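
The Langevin SDE on this slide can be simulated with an Euler-Maruyama step. The following is a minimal sketch under stated assumptions: the function name `langevin_samples`, the step size, the number of steps, and the choice $f(x) = \tfrac{1}{2}|x|^2$ (so that the stationary density is standard Gaussian) are all hypothetical and serve only as illustration.

```python
import numpy as np

def langevin_samples(grad_f, x0, dt=1e-2, n_steps=2000, seed=0):
    """Euler-Maruyama discretization of dX = -grad f(X) dt + sqrt(2) dB.

    x0 has shape (n_particles, d); every row is an independent trajectory.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Drift step plus Brownian increment with variance 2 * dt.
        x = x - dt * grad_f(x) + np.sqrt(2.0 * dt) * noise
    return x

if __name__ == "__main__":
    # Assumed objective f(x) = 0.5 * |x|^2, whose gradient is x.
    x0 = np.full((5000, 1), 3.0)
    samples = langevin_samples(lambda x: x, x0)
    print("empirical mean:", samples.mean(), "empirical variance:", samples.var())
```

The particles here do not interact: each one follows its own copy of the SDE, and only the empirical distribution of the ensemble approximates $\rho_t$.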

  9. Summary
     - Vector variables in $\mathbb{R}^d$:
       - Objective function: $f(x)$
       - Gradient flow: $\dot{x}_t = -\nabla f(x_t)$
       - Lagrangian: $t^3 \left( \tfrac{1}{2} |\dot{x}_t|^2 - f(x_t) \right)$
       - Accelerated flow: $\ddot{x}_t = -\tfrac{3}{t} \dot{x}_t - \nabla f(x_t)$
     - Probability distributions in $\mathcal{P}_2(\mathbb{R}^d)$:
       - Objective function: $F(\rho) = D(\rho \,\|\, \rho_\infty)$
       - Gradient flow: $dX_t = -\nabla f(X_t)\, dt + \sqrt{2}\, dB_t$
       - Lagrangian: $\mathbb{E}\left[ t^3 \left( \tfrac{1}{2} |\dot{X}_t|^2 - f(X_t) - \log(\rho_t(X_t)) \right) \right]$
       - Accelerated flow: $\ddot{X}_t = -\tfrac{3}{t} \dot{X}_t - \nabla f(X_t) - \nabla \log(\rho_t(X_t))$
     - The accelerated flow involves a mean-field term $\nabla \log \rho_t(X_t)$, which depends on the distribution of $X_t$.
     - The numerical algorithm involves a system of interacting particles.
     - The mean-field term is approximated in terms of particles.
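
The interacting-particle idea can be sketched as follows. The slide does not say how the mean-field term is approximated, so this example swaps in one simple option as an explicit assumption: treat $\rho_t$ as Gaussian with the particles' empirical mean $m_t$ and covariance $\Sigma_t$, which gives $\nabla \log \rho_t(x) \approx -\Sigma_t^{-1}(x - m_t)$. The function names, the start time `t0`, the step size, and the semi-implicit Euler scheme are likewise hypothetical.

```python
import numpy as np

def gaussian_score(X):
    """Approximate grad log rho_t at each particle, ASSUMING rho_t is Gaussian
    with the particles' empirical mean and covariance. X has shape (N, d)."""
    centered = X - X.mean(axis=0)
    cov = centered.T @ centered / X.shape[0]
    # Solves cov @ s_i = centered_i for every particle, then flips the sign.
    return -np.linalg.solve(cov, centered.T).T

def accelerated_particles(X0, grad_f, t0=0.1, t_end=60.0, dt=1e-2):
    """Semi-implicit Euler for X'' = -(3/t) X' - grad f(X) - grad log rho_t(X).

    The particles interact only through the empirical mean and covariance
    used inside gaussian_score.
    """
    X = np.array(X0, dtype=float)
    V = np.zeros_like(X)  # particles start at rest
    t = t0                # t0 > 0 avoids the singular damping term 3/t at t = 0
    n_steps = int(round((t_end - t0) / dt))
    for _ in range(n_steps):
        accel = -(3.0 / t) * V - grad_f(X) - gaussian_score(X)
        V = V + dt * accel
        X = X + dt * V
        t += dt
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([1.0, -1.0])            # assumed target mean
    target_cov = np.diag([1.0, 0.25])     # assumed target covariance
    precision = np.linalg.inv(target_cov)
    # Target density proportional to exp(-f), with f(x) = 0.5 (x - mu)^T P (x - mu).
    X0 = 3.0 + rng.standard_normal((400, 2))
    X = accelerated_particles(X0, lambda X: (X - mu) @ precision)
    print("empirical mean:", X.mean(axis=0))
    print("empirical covariance:\n", np.cov(X.T, bias=True))
```

Note that no Brownian noise appears in this sketch: the term $-\nabla \log \rho_t$ acts as a deterministic repulsion that spreads the particles. A Gaussian approximation is only reasonable when $\rho_t$ is close to Gaussian; for a multimodal target a different estimator of the mean-field term would be needed.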

  13. Numerical example: Gaussian
      - The target distribution is Gaussian.
      - [Figure: log-log plot of $\mathrm{KL}(\rho_t \,\|\, \rho_\infty)$ versus $t$, with an $O(1/t^2)$ curve labeled for comparison.]
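
When both $\rho_t$ and $\rho_\infty$ are Gaussian, the KL divergence plotted on this slide has a closed form in terms of means and covariances. The sketch below evaluates that standard formula; the function name `kl_gaussian` is hypothetical, and the slides do not state how the plotted values were actually computed.

```python
import numpy as np

def kl_gaussian(m0, S0, m1, S1):
    """KL( N(m0, S0) || N(m1, S1) ) for mean vectors and covariance matrices.

    Uses 0.5 * [ tr(S1^-1 S0) + (m1 - m0)^T S1^-1 (m1 - m0) - d + log(det S1 / det S0) ].
    """
    m0 = np.atleast_1d(m0).astype(float)
    m1 = np.atleast_1d(m1).astype(float)
    S0 = np.atleast_2d(S0).astype(float)
    S1 = np.atleast_2d(S1).astype(float)
    d = m0.shape[0]
    diff = m1 - m0
    trace_term = np.trace(np.linalg.solve(S1, S0))
    quad_term = diff @ np.linalg.solve(S1, diff)
    # slogdet returns (sign, log|det|); covariances are assumed positive definite.
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (trace_term + quad_term - d + logdet1 - logdet0)

if __name__ == "__main__":
    # Unit-variance Gaussians one unit apart in mean.
    print(kl_gaussian([0.0], [[1.0]], [1.0], [[1.0]]))
```

In a particle simulation, `m0` and `S0` could be replaced by the empirical mean and covariance of the particles, which yields a KL estimate that is exact only if the particle distribution is itself Gaussian.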

  14. Numerical example: non-Gaussian
      - The target distribution is a mixture of two Gaussians.
      - [Figure: log-log plot of $\mathrm{KL}(\rho_t \,\|\, \rho_\infty)$ versus $t$, with an $O(1/t^2)$ curve labeled for comparison and the times $t_0$, $t_1$, $t_2$ marked; three further panels are titled $t = t_0$, $t = t_1$, and $t = t_2$.]
      - Thanks for your attention. For more details, come see poster #206.
