Accelerated Flow for Probability Distributions

Amirhossein Taghvaei (joint work with P. G. Mehta)
Coordinated Science Laboratory, University of Illinois at Urbana-Champaign

Thirty-sixth International Conference on Machine Learning, Long Beach, June 13, 2019
Objective and main idea

  Euclidean space:                     gradient descent  -->  accelerated methods
  Space of probability distributions:  Wasserstein gradient flow  -->  ?

- Objective: construct accelerated flows for probability distributions.
- Approach: Wibisono et al. (2017) proposed a variational formulation to construct accelerated flows on Euclidean space; our approach is to extend this variational formulation to probability distributions.
Variational formulation for Euclidean space

                      vector variables, R^d                                   probability distributions, P_2(R^d)
  Objective funct.    f(x)                                                    ?
  Gradient flow       \dot{x}_t = -\nabla f(x_t)                              ?
  Lagrangian          t^3 ( \tfrac{1}{2} |\dot{x}_t|^2 - f(x_t) )             ?
  Accelerated flow    \ddot{x}_t = -\tfrac{3}{t} \dot{x}_t - \nabla f(x_t)    ?

- The accelerated flow is obtained by minimizing the action integral of the Lagrangian.
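For concreteness, here is a minimal numerical sketch of the accelerated ODE above (not from the talk): an explicit Euler discretization with an assumed quadratic objective f and illustrative step size.

```python
import numpy as np

# Explicit Euler integration of the accelerated ODE
#   x''(t) = -(3/t) x'(t) - grad f(x(t)),
# which is the Euler-Lagrange equation of the Lagrangian t^3 (0.5 |x'|^2 - f(x)).
# The quadratic objective f, initial time t0, and step size dt are illustrative.

def grad_f(x):
    # gradient of the illustrative objective f(x) = 0.5 * |x|^2
    return x

def accelerated_flow(x0, t0=1.0, t_end=10.0, dt=1e-3):
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)                 # velocity x'(t)
    t = t0
    while t < t_end:
        a = -(3.0 / t) * v - grad_f(x)   # acceleration from the ODE above
        x = x + dt * v
        v = v + dt * a
        t += dt
    return x

print(accelerated_flow([5.0, -3.0]))     # moves toward the minimizer x* = 0
```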
Wasserstein gradient flow

                      vector variables, R^d                                   probability distributions, P_2(R^d)
  Objective funct.    f(x)                                                    F(\rho) = D(\rho \,\|\, \rho_\infty)
  Gradient flow       \dot{x}_t = -\nabla f(x_t)                              dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t
  Lagrangian          t^3 ( \tfrac{1}{2} |\dot{x}_t|^2 - f(x_t) )             ?
  Accelerated flow    \ddot{x}_t = -\tfrac{3}{t} \dot{x}_t - \nabla f(x_t)    ?

- The Wasserstein gradient flow with respect to relative entropy is the Fokker-Planck equation (Jordan et al., 1998).
- The Fokker-Planck equation is realized by the Langevin SDE.
- The goal is to obtain accelerated forms of the SDE.
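As an illustration of the Langevin SDE above, here is a minimal Euler-Maruyama sampling sketch (not from the talk); the quadratic potential, step size, and particle count are assumptions made for the demo.

```python
import numpy as np

# Euler-Maruyama discretization of the Langevin SDE
#   dX_t = -grad f(X_t) dt + sqrt(2) dB_t,
# whose marginal densities evolve according to the Fokker-Planck equation.
# The potential f, step size, and particle count are illustrative choices.

def grad_f(x):
    # f(x) = 0.5 * x^2, so the stationary density rho_inf is the standard Gaussian
    return x

def langevin_samples(n_particles=2000, n_steps=5000, dt=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    X = 2.0 * rng.normal(size=n_particles)     # spread-out initial condition
    for _ in range(n_steps):
        noise = rng.normal(size=n_particles)
        X = X - dt * grad_f(X) + np.sqrt(2.0 * dt) * noise
    return X

X = langevin_samples()
print(X.mean(), X.var())    # roughly 0 and 1 for the standard Gaussian target
```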
Summary

                      vector variables, R^d                                   probability distributions, P_2(R^d)
  Objective funct.    f(x)                                                    F(\rho) = D(\rho \,\|\, \rho_\infty)
  Gradient flow       \dot{x}_t = -\nabla f(x_t)                              dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t
  Lagrangian          t^3 ( \tfrac{1}{2} |\dot{x}_t|^2 - f(x_t) )             E[ t^3 ( \tfrac{1}{2} |\dot{X}_t|^2 - f(X_t) - \log \rho_t(X_t) ) ]
  Accelerated flow    \ddot{x}_t = -\tfrac{3}{t} \dot{x}_t - \nabla f(x_t)    \ddot{X}_t = -\tfrac{3}{t} \dot{X}_t - \nabla f(X_t) - \nabla \log \rho_t(X_t)

- The accelerated flow involves a mean-field term \nabla \log \rho_t(X_t), which depends on the distribution of X_t.
- The numerical algorithm involves a system of interacting particles.
- The mean-field term is approximated in terms of the particles.
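To make the particle approximation concrete, here is a minimal sketch (not the paper's exact algorithm) in which the mean-field term \nabla \log \rho_t is approximated by a Gaussian kernel density estimate over the particles; the 1-D quadratic potential, bandwidth, and step size are illustrative assumptions.

```python
import numpy as np

# N interacting particles for the accelerated flow
#   X'' = -(3/t) X' - grad f(X) - grad log rho_t(X),
# with the mean-field term grad log rho_t replaced by a Gaussian kernel density
# estimate over the particles. The 1-D quadratic potential, bandwidth h, and
# step size are illustrative; the paper uses its own approximation scheme.

def grad_f(x):
    return x                                    # f(x) = 0.5 * x^2, Gaussian target

def grad_log_rho(X, h=0.3):
    # kernel-density estimate of grad log rho, evaluated at every particle (1-D)
    diff = X[None, :] - X[:, None]              # diff[i, j] = X_j - X_i
    K = np.exp(-diff**2 / (2.0 * h**2))         # Gaussian kernel weights
    return (K * diff).sum(axis=1) / (h**2 * K.sum(axis=1))

def accelerated_particles(N=200, t0=1.0, t_end=10.0, dt=5e-3, seed=0):
    rng = np.random.default_rng(seed)
    X = 2.0 * rng.normal(size=N)                # particle positions
    V = np.zeros(N)                             # particle velocities
    t = t0
    while t < t_end:
        A = -(3.0 / t) * V - grad_f(X) - grad_log_rho(X)
        X = X + dt * V
        V = V + dt * A
        t += dt
    return X

X = accelerated_particles()
print(X.mean(), X.var())    # the particles drift toward the standard Gaussian target
```

The kernel bandwidth controls the bias of the density estimate; this KDE choice is purely illustrative, and the paper develops its own approximation of the mean-field term.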
Numerical example: Gaussian

- The target distribution is Gaussian.

[Figure: KL(\rho_t \| \rho_\infty) versus t on log-log axes, decaying alongside an O(1/t^2) reference line.]
Numerical example: non-Gaussian

- The target distribution is a mixture of two Gaussians.

[Figure: KL(\rho_t \| \rho_\infty) versus t on log-log axes with an O(1/t^2) reference line, together with snapshots of the particle distribution at times t_0, t_1, t_2.]

Thanks for your attention. For more details, come see poster #206.