Machine Learning ↔ Optimal Transport
Sayas Numerics Seminar, October 2020

Lars Ruthotto
Departments of Mathematics and Computer Science, Emory University
lruthotto@emory.edu, @lruthotto
Agenda: Machine Learning meets Optimal Transport

◮ ML → OT: New Tricks from Learning
  ◮ based on relaxed dynamical optimal transport
  ◮ combine macroscopic / microscopic / HJB equations
  ◮ neural networks for the value function
  ◮ combine analytic gradients and automatic differentiation
  ◮ generalization to mean field games and control problems
◮ OT → ML: Learning from Old Tricks
  ◮ variational inference via continuous normalizing flows (CNF)
  ◮ applications: density estimation, generative modeling
  ◮ OT ⇒ uniqueness and regularity of the dynamics
  ◮ HJB, solid numerics, and efficient implementation
  ◮ orders-of-magnitude speedup in training and inference

References:
◮ LR, S Osher, W Li, L Nurbekyan, S Wu Fung: A Machine Learning Framework for Solving High-Dimensional MFG and MFC Problems, PNAS 117 (17), 9183-9193, 2020.
◮ D Onken, S Wu Fung, X Li, LR: OT-Flow: Fast and Accurate Continuous Normalizing Flows via Optimal Transport, arXiv:2006.00104, 2020.
Collaborators and Funding

Collaborators: Stan Osher, Li, Onken, Wu Fung, Nurbekyan

Funding:
◮ DMS 1751636
◮ BSF 2018209
◮ AFOSR FA9550-20-1-0372

Special thanks:
◮ Organizers and staff of the IPAM Long Program MLP 2019
◮ Stan Osher's funding: AFOSR MURI and ONR
Dynamic Optimal Transport (Benamou and Brenier, '00)

(Figure: initial density ρ_0; target density ρ_1; density evolution; ρ(·,1) = push-forward of ρ_0)

Given the initial density ρ_0 and the target density ρ_1, find the velocity v that renders the push-forward of ρ_0 equal to ρ_1 and minimizes the transport costs, i.e.,

$$\min_{v,\rho}\; \int_0^1 \int_\Omega \tfrac{1}{2}\,\|v(x,t)\|^2\,\rho(x,t)\,dx\,dt$$

subject to

$$\partial_t \rho + \nabla\cdot(\rho v) = 0,\qquad \rho(\cdot,0)=\rho_0(\cdot),\qquad \rho(\cdot,1)=\rho_1(\cdot).$$
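As a toy illustration (an assumed example, not part of the talk): for ρ_0 = N(0,1) and ρ_1 = N(m,1) in one dimension, the optimal velocity is the constant v(x,t) = m and the transport cost equals ½m². A minimal NumPy check of the push-forward:

```python
# Toy sanity check for the dynamic OT formulation (assumed example):
# rho_0 = N(0,1), rho_1 = N(m,1); the optimal velocity is the constant v(x,t) = m,
# so particles move along z(x,t) = x + t*m and the cost is 0.5*m**2.
import numpy as np

rng = np.random.default_rng(0)
m = 2.0
x = rng.standard_normal(100_000)   # samples from rho_0
z1 = x + 1.0 * m                   # push-forward at t = 1

print("mean/std of pushed samples:", z1.mean(), z1.std())  # ~ (2.0, 1.0), i.e. rho_1
print("transport cost:", 0.5 * m**2)                        # = 2.0
```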
Relaxed Dynamical Optimal Transport

(Figure: initial density ρ_0; target density ρ_1; density evolution; ρ(·,1) = push-forward of ρ_0)

Given the initial density ρ_0 and the target density ρ_1, find the velocity v that minimizes the sum of the discrepancy between the push-forward of ρ_0 and ρ_1 and the transport costs, i.e.,

$$\min_{v,\rho}\; J_{\rm MFG}(\rho,v) \;\stackrel{\rm def}{=}\; \int_0^1 \int_\Omega \tfrac{1}{2}\,\|v(x,t)\|^2\,\rho(x,t)\,dx\,dt \;+\; \mathcal{G}\bigl(\rho(\cdot,1),\rho_1\bigr)$$

subject to

$$\partial_t \rho + \nabla\cdot(\rho v) = 0,\qquad \rho(\cdot,0)=\rho_0(\cdot). \tag{CE}$$

Examples for the terminal cost $\mathcal{G}$: L², Kullback-Leibler divergence, ...

Side note: the relaxed OT problem is a potential mean field game (MFG).
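For concreteness (a standard identity, not spelled out on this slide): when the terminal cost is the Kullback-Leibler divergence, it can be sampled in Lagrangian coordinates using the push-forward relation ρ(z(x,1),1) det ∇z(x,1) = ρ_0(x) from the Lagrangian slides further below,

$$\mathcal{G}\bigl(\rho(\cdot,1),\rho_1\bigr) = \mathrm{KL}\bigl(\rho(\cdot,1)\,\|\,\rho_1\bigr) = \int_\Omega \rho(y,1)\,\log\frac{\rho(y,1)}{\rho_1(y)}\,dy = \mathbb{E}_{x\sim\rho_0}\Bigl[\log\rho_0(x) - l(x,1) - \log\rho_1\bigl(z(x,1)\bigr)\Bigr],$$

where l(x,1) = log det ∇z(x,1). This is exactly the form evaluated on samples in the Lagrangian method below.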
Relaxed Dynamic Optimal Transport: A Microscopic View

A single agent with initial position x ∈ Ω aims at choosing the velocity v that minimizes

$$J_{x,0}(v) \;=\; \int_0^1 \tfrac{1}{2}\,\|v(s)\|^2\,ds \;+\; G\bigl(z(1),\rho(z(1),1)\bigr),$$

where their position changes according to

$$\partial_t z(s) = v(s),\quad 0\le s\le 1,\qquad z(0)=x.$$

◮ $G(x,\rho) = \dfrac{\delta \mathcal{G}(\rho,\rho_1)}{\delta \rho}(x)$ (variational derivative of $\mathcal{G}$)
◮ the agent interacts with the population through ρ and G
◮ z(·) is the characteristic curve of (CE) starting at x

Useful to define the value of an agent's state (x,t) as $\Phi(x,t) = \inf_v J_{x,t}(v)$.
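To make the microscopic view concrete, here is a short NumPy sketch (a toy example with a hand-picked potential, not from the talk): for Φ(x,t) = ½‖x − μ‖², the strategy v = −∇Φ simply pulls the agent toward the hypothetical point μ while the running cost ½‖v‖² is accumulated along the path.

```python
# Toy single-agent simulation (assumed example): follow v = -grad Phi(z)
# for Phi(x, t) = 0.5 * ||x - mu||^2 and accumulate the running cost 0.5*||v||^2.
import numpy as np

mu = np.array([2.0, 0.0])            # hypothetical attractor of the toy potential
x0 = np.array([-1.0, 1.0])           # agent's initial position

n_t = 100
h = 1.0 / n_t
z, cost = x0.copy(), 0.0
for k in range(n_t):
    v = -(z - mu)                    # v = -grad Phi(z)
    cost += h * 0.5 * np.dot(v, v)   # accumulate 0.5*||v||^2 ds
    z = z + h * v                    # dz/ds = v   (forward Euler)

print("final position z(1):", z)     # contracted toward mu
print("running cost:", cost)
```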
Hamilton-Jacobi-Bellman (HJB) Equation

(Figure: initial density ρ_0; value function; density evolution; target density ρ_1)

Lasry & Lions '06: the first-order optimality conditions of relaxed OT are

$$-\partial_t \Phi(x,t) + \tfrac{1}{2}\,\|\nabla\Phi(x,t)\|^2 = 0, \qquad \Phi(x,1) = G\bigl(x,\rho(x,1)\bigr), \tag{HJB}$$

and the optimal strategy is $v(x,t) = -\nabla\Phi(x,t)$, which gives

$$\partial_t \rho(x,t) - \nabla\cdot\bigl(\rho(x,t)\,\nabla\Phi(x,t)\bigr) = 0, \qquad \rho(x,0) = \rho_0(x). \tag{CE}$$

Challenges: forward-backward structure and high dimensionality of the PDE system.
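The passage from the value function to (HJB) is a one-line dynamic programming computation, filled in here for completeness: the Bellman principle for $\Phi(x,t) = \inf_v J_{x,t}(v)$ gives

$$\partial_t\Phi(x,t) + \inf_{v}\Bigl\{\nabla\Phi(x,t)\cdot v + \tfrac{1}{2}\|v\|^2\Bigr\} = 0,$$

the infimum is attained at $v^*(x,t) = -\nabla\Phi(x,t)$ with value $-\tfrac{1}{2}\|\nabla\Phi(x,t)\|^2$, and substituting it back yields $-\partial_t\Phi + \tfrac{1}{2}\|\nabla\Phi\|^2 = 0$, i.e., (HJB) above.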
Machine Learning for High-Dimensional OT: Overview

Three options for solving the problem:
1. minimize $J_{\rm MFG}$ with respect to (ρ, v), or (ρ, −∇Φ) (variational problem)
2. minimize $J_{x,t}$ with respect to v or −∇Φ for some points x (microscopic view)
3. compute the value function by solving (HJB) and (CE) (high-dimensional PDEs)

Idea: combine the advantages of the above to tackle the curse of dimensionality
◮ formulate as a variational problem: minimize $J_{\rm MFG}(\rho, -\nabla\Phi)$
◮ eliminate (CE) with a Lagrangian PDE solver ⇒ mesh-free, parallel
◮ parameterize Φ with a neural network (see the sketch below) ⇒ universal approximator, mesh-free, cheap(?)
◮ penalize violations of (HJB) ⇒ regularity, global convergence(?)
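A minimal sketch of the neural-network parameterization mentioned above, assuming PyTorch; the cited papers use a more structured architecture, so the plain MLP below is only a placeholder.

```python
# Hedged sketch: parameterize the value function Phi(x, t) with a neural network
# and obtain the strategy v = -grad_x Phi via automatic differentiation.
# The plain MLP is a placeholder, not the architecture used in the papers.
import torch

class PhiNet(torch.nn.Module):
    def __init__(self, d, width=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(d + 1, width), torch.nn.Tanh(),
            torch.nn.Linear(width, width), torch.nn.Tanh(),
            torch.nn.Linear(width, 1),
        )

    def forward(self, x, t):
        # x: (n, d), t: (n, 1)  ->  Phi values of shape (n, 1)
        return self.net(torch.cat([x, t], dim=1))

def velocity(phi, x, t):
    """v(x, t) = -grad_x Phi(x, t), computed with autograd."""
    x = x.detach().requires_grad_(True)
    out = phi(x, t).sum()
    return -torch.autograd.grad(out, x, create_graph=True)[0]

d = 2
phi = PhiNet(d)
x = torch.randn(8, d)
t = torch.zeros(8, 1)
print(velocity(phi, x, t).shape)   # (8, 2)
```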
Lagrangian Method for the Continuity Equation

Assume Φ is given. Then the solution to

$$\partial_t \rho(x,t) - \nabla\cdot\bigl(\rho(x,t)\,\nabla\Phi(x,t)\bigr) = 0, \qquad \rho(x,0) = \rho_0(x)$$

satisfies

$$\rho\bigl(z(x,t),t\bigr)\,\det\nabla z(x,t) = \rho_0(x)$$

along the characteristic curves

$$\partial_t z(x,t) = -\nabla\Phi\bigl(z(x,t),t\bigr), \qquad z(x,0) = x.$$

Instead of computing $\det\nabla z(x,t)$ (cost $O(d^3)$ flops), use

$$l(x,t) \;\stackrel{\rm def}{=}\; \log\det\bigl(\nabla z(x,t)\bigr) = -\int_0^t \Delta\Phi\bigl(z(x,s),s\bigr)\,ds.$$

Hint: compute z and l in one ODE solve (parallelize over x_1, x_2, ...; see the sketch below).
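A hedged sketch of the one-ODE-solve idea, assuming PyTorch: ∇Φ comes from automatic differentiation and ΔΦ from the exact trace of the Hessian (affordable for moderate d; the papers discuss cheaper ways to obtain this trace). The small network below is again only a placeholder, and gradients with respect to its weights are not tracked here.

```python
# Hedged sketch: integrate z and l jointly with forward Euler along characteristics.
import torch

d, n, n_t = 2, 64, 20                      # dimension, batch size, time steps

# Placeholder potential Phi(x, t): a small MLP on (x, t), for illustration only.
Phi = torch.nn.Sequential(
    torch.nn.Linear(d + 1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)

def grad_and_laplacian(z, t):
    """Return grad_x Phi(z, t) and Laplacian of Phi at z for a batch z of shape (n, d)."""
    z = z.detach().requires_grad_(True)
    tt = torch.full((z.shape[0], 1), t)
    phi = Phi(torch.cat([z, tt], dim=1)).sum()
    g = torch.autograd.grad(phi, z, create_graph=True)[0]        # (n, d)
    lap = torch.zeros(z.shape[0])
    for i in range(d):                                           # exact trace of the Hessian,
        lap += torch.autograd.grad(g[:, i].sum(), z,             # fine for small d
                                   retain_graph=True)[0][:, i]
    return g.detach(), lap.detach()

# Lagrangian solve: z follows -grad Phi, l accumulates -Laplacian(Phi) along the path.
z = torch.randn(n, d)          # samples x ~ rho_0 (standard normal as a stand-in)
l = torch.zeros(n)
h = 1.0 / n_t
for k in range(n_t):
    t = k * h
    g, lap = grad_and_laplacian(z, t)
    z = z - h * g              # dz/dt = -grad Phi(z, t)
    l = l - h * lap            # dl/dt = -Laplacian(Phi)(z, t)

print(z.shape, l.shape)        # (n, d), (n,): endpoints and log-determinants
```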
Lagrangian Method for Optimal Transport

$$\min_{\Phi}\; \mathbb{E}_{\rho_0}\Bigl[\, c_L(x,1) + G\bigl(z(x,1)\bigr) + \alpha_1\, c_H(x,1) + \alpha_2\, \bigl|\Phi\bigl(z(x,1),1\bigr) - G\bigl(z(x,1)\bigr)\bigr| \,\Bigr]$$

subject to, for $t \in (0,1]$,

$$\partial_t
\begin{pmatrix} z(x,t)\\ l(x,t)\\ c_L(x,t)\\ c_H(x,t) \end{pmatrix}
=
\begin{pmatrix}
-\nabla\Phi\bigl(z(x,t),t\bigr)\\
-\Delta\Phi\bigl(z(x,t),t\bigr)\\
\tfrac{1}{2}\,\bigl\|\nabla\Phi\bigl(z(x,t),t\bigr)\bigr\|^2\\
\bigl|\partial_t\Phi\bigl(z(x,t),t\bigr) + \tfrac{1}{2}\,\bigl\|\nabla\Phi\bigl(z(x,t),t\bigr)\bigr\|^2\bigr|
\end{pmatrix},$$

with initial conditions $z(x,0)=x$ and $l(x,0)=c_L(x,0)=c_H(x,0)=0$.

Here $c_L$ accumulates the transport cost and $c_H$ the HJB residual along the characteristics.
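A compact sketch of how the four states and the sample-average objective could be accumulated, using an analytic placeholder Φ(x,t) = ½‖x‖² (so ∇Φ = x, ΔΦ = d, ∂_tΦ = 0) and a hypothetical quadratic stand-in for G; in the actual method Φ is the neural network from the previous sketches and G comes from the chosen terminal discrepancy.

```python
# Hedged sketch of the sampled objective (toy Phi and toy G, for illustration only).
# State per sample: z (position), l (log-det), cL (transport cost), cH (HJB residual).
import numpy as np

rng = np.random.default_rng(0)
d, n, n_t = 2, 256, 50
alpha1, alpha2 = 1.0, 1.0
mu1 = np.array([2.0, 0.0])                 # hypothetical target used by the toy G

def grad_phi(z, t):  return z              # Phi(x,t) = 0.5*||x||^2  ->  grad = x
def lap_phi(z, t):   return np.full(z.shape[0], float(d))
def dt_phi(z, t):    return np.zeros(z.shape[0])
def Phi(z, t):       return 0.5 * np.sum(z ** 2, axis=1)
def G(y):            return 0.5 * np.sum((y - mu1) ** 2, axis=1)   # toy terminal cost

x = rng.standard_normal((n, d))            # samples from rho_0
z, l = x.copy(), np.zeros(n)
cL, cH = np.zeros(n), np.zeros(n)
h = 1.0 / n_t
for k in range(n_t):
    t = k * h
    g = grad_phi(z, t)
    cL += h * 0.5 * np.sum(g ** 2, axis=1)                           # transport cost
    cH += h * np.abs(dt_phi(z, t) + 0.5 * np.sum(g ** 2, axis=1))    # HJB residual
    l  -= h * lap_phi(z, t)                                          # log-determinant
    z  -= h * g                                                      # characteristics

# l would enter a KL-type terminal cost; the toy quadratic G here ignores it.
objective = np.mean(cL + G(z) + alpha1 * cH + alpha2 * np.abs(Phi(z, 1.0) - G(z)))
print("sampled objective:", objective)
```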