  1. MK Optimal Transport and entropic relaxations. Soumik Pal, University of Washington, Seattle. Eigenfunctions seminar @ IISc Bangalore, August 30, 2019.

  2. Monge-Kantorovich Optimal Transport problem

  3. Gaspard Monge 1781. Figure: by M. Cuturi. $P, Q$ - probabilities on $X, Y$, respectively, say both $\mathbb{R}^d$. $c(x,y)$ - cost of transport, e.g., $c(x,y) = \|x-y\|$ or $c(x,y) = \frac{1}{2}\|x-y\|^2$. Monge problem: minimize $\int c(x, T(x))\, dP$ among $T : \mathbb{R}^d \to \mathbb{R}^d$ with $T_\# P = Q$.

  4. Leonid Kantorovich 1939. Figure: by M. Cuturi. $\Pi(P,Q)$ - couplings of $(P,Q)$ (joint distributions with the given marginals). (Monge-)Kantorovich relaxation: minimize $\inf_{\nu \in \Pi(P,Q)} \int c(x,y)\, d\nu$.

  5. Duality: cost → price. Among all functions $\varphi(y), \psi(x)$ s.t. $\varphi(y) - \psi(x) \le c(x,y)$, maximize the profit $\sup_{\varphi,\psi} \left[ \int \varphi(y)\, Q(dy) - \int \psi(x)\, P(dx) \right]$. (Kantorovich duality) inf cost = sup profit. For the optimal "Kantorovich potentials", $\varphi_c(y) - \psi_c(x) = c(x,y)$, "optimal coupling" $\nu_c$-almost surely.
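
One half of the duality (weak duality) is immediate: for any $\nu \in \Pi(P,Q)$ and any admissible pair $(\varphi, \psi)$,
$$\int \varphi\, dQ - \int \psi\, dP = \iint \big[\varphi(y) - \psi(x)\big]\, d\nu(x,y) \le \iint c(x,y)\, d\nu(x,y),$$
using only that $\nu$ has marginals $P$ and $Q$. The content of Kantorovich duality is that the inequality is saturated at the optima.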

  6. Quadratic cost: Brenier's theorem. What does the OT map look like? Very special! $c(x,y) = \frac{1}{2}\|x-y\|^2$. Assume $P$ has density $\rho_0$. (Y. Brenier) $\exists$ a convex $F$ s.t. $(X, \nabla F(X))$, $X \sim \rho_0$, solves the MK-OT problem $W_2^2(P,Q) := \inf_{\Pi(P,Q)} \int c(x,y)\, d\nu$. Kantorovich potentials? Let $F^*(y)$ be the Legendre convex dual of $F$. With the convention of the previous slide, $\psi_c(x) = F(x) - \frac{1}{2}\|x\|^2$, $\varphi_c(y) = \frac{1}{2}\|y\|^2 - F^*(y)$, and $\varphi_c(y) - \psi_c(x) = \frac{1}{2}\|x-y\|^2$ for $y = \nabla F(x)$, i.e., $\nu_c$-a.s.
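
The potential identity is just the Fenchel(-Young) equality, which holds exactly when $y \in \partial F(x)$:
$$\varphi_c(y) - \psi_c(x) = \tfrac{1}{2}\|x\|^2 + \tfrac{1}{2}\|y\|^2 - \big(F(x) + F^*(y)\big) = \tfrac{1}{2}\|x\|^2 + \tfrac{1}{2}\|y\|^2 - \langle x, y \rangle = \tfrac{1}{2}\|x-y\|^2 \quad \text{for } y = \nabla F(x).$$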

  7. A generalized notion of convexity (Gangbo-McCann). Figure: by C. Villani. Convex functions lie above their tangents; a $c$-convex function $\psi(x)$ lies above the cost curves $c(\cdot, y)$, $y \in \partial_c \psi(x)$. The optimal Kantorovich potentials are $c$-concave: $\psi_c(x) = \sup_y [\varphi_c(y) - c(x,y)]$, with $\varphi_c(y) - \psi_c(x) = c(x,y)$ for $y \in \partial_c \psi_c(x)$.

  8. Convex cost: Gangbo-McCann '96. $c(x,y) = g(x-y)$, $g$ strictly convex, plus $P$ has density $\rho_0$. $\exists$ a $c$-concave function $\psi_c(x)$ for which $T(x) = x - (\nabla g)^{-1} \circ \nabla \psi_c(x)$ is s.t. $(X, T(X))$, $X \sim \rho_0$, uniquely solves the MK OT problem. $T(x) \in \partial_c \psi_c(x)$. The Monge solution is also the MK solution. Does not cover $g(z) = \|z\|$ or $g(z) = 1_{\{z \neq 0\}}$.
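
As a sanity check (under the sign convention of this slide for $\psi_c$), the quadratic cost recovers Brenier's map: $g(z) = \frac{1}{2}\|z\|^2$ gives $\nabla g = \mathrm{id}$, so
$$T(x) = x - \nabla \psi_c(x) = \nabla\Big(\tfrac{1}{2}\|x\|^2 - \psi_c(x)\Big) = \nabla F(x), \qquad F(x) := \tfrac{1}{2}\|x\|^2 - \psi_c(x),$$
and $F$ is convex precisely when $\psi_c$ is $c$-concave for the quadratic cost.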

  9. Existence of Monge solutions. Sufficient conditions (Bernard-Buffoni, Villani, De Philippis): $X, Y$ bounded, open; $P, Q$ have densities; $c(x,y) \in C^2$; $y \mapsto D_x c(x,y)$ is injective for each $x$ (twist condition); $x \mapsto D_y c(x,y)$ is injective for each $y$. See the book by Villani, Chapter 10. Smoothness of the optimal $T$: Ma-Trudinger-Wang '05, Loeper '09 (see Villani, Chapter 12).

  10. Transport in one dimension. Suppose $X = \mathbb{R} = Y$. For all convex costs $c(x,y) = g(x-y)$ the OT map is well known: the monotone transport, AKA the inverse-c.d.f. transform, $T(x) = G_1^{-1} \circ G_0(x)$, where $G_0, G_1$ are the c.d.f.s of $P, Q$, resp., assumed continuous. Optimal; unique if $g$ is strictly convex. (Homework)
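
A minimal sketch of this map on empirical samples, assuming NumPy; the choice of $P$ (standard normal) and $Q$ (Exp(1)) and the sample size are illustrative:

```python
import numpy as np

# Monotone (inverse-c.d.f.) transport in 1D: T = G_1^{-1} o G_0,
# approximated on samples via empirical c.d.f. and quantiles.
rng = np.random.default_rng(0)
x = np.sort(rng.normal(0.0, 1.0, size=1000))   # samples from P
y = np.sort(rng.exponential(1.0, size=1000))   # samples from Q

def T(t):
    """Empirical monotone transport map G_1^{-1}(G_0(t))."""
    g0 = np.searchsorted(x, t) / len(x)              # empirical c.d.f. G_0(t)
    return y[min(int(g0 * len(y)), len(y) - 1)]      # empirical quantile of Q

print(T(0.0))   # roughly the median of Exp(1), i.e. log 2
```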

  11. Entropic Relaxation or Entropic Regularization

  12. OT and statistics. Goal: fit data to model. Classical: MLE. Recent: minimize $W_2^2(\text{data}, \text{model})$. Better estimates, more stability in high dimension, adversarial network training. The problem is computation: discrete MK-OT.

  13. OT and statistics (cont.). Given two empirical distributions $p = \sum_{i=1}^n p_i \delta_{x_i}$ and $q = \sum_{j=1}^n q_j \delta_{y_j}$, minimize $\langle c, M \rangle := \sum_{i,j} c(x_i, y_j) M_{ij}$ among all $n \times n$ matrices $M \ge 0$ with row sums $p$ and column sums $q$. See the linear-programming sketch below.
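
For concreteness, a minimal sketch of this linear program with NumPy and SciPy's `linprog`; the random point clouds and the problem size are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

# Discrete MK-OT as a linear program: minimize <c, M> over M >= 0
# with row sums p and column sums q. M is flattened row-major to a
# vector of length n*n.
rng = np.random.default_rng(1)
n = 5
x, y = rng.normal(size=(n, 2)), rng.normal(size=(n, 2))
c = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2) ** 2  # c(x_i, y_j)
p = np.full(n, 1.0 / n)
q = np.full(n, 1.0 / n)

# Equality constraints on the flattened M.
A_rows = np.kron(np.eye(n), np.ones(n))   # sum_j M_ij = p_i
A_cols = np.kron(np.ones(n), np.eye(n))   # sum_i M_ij = q_j
A_eq = np.vstack([A_rows, A_cols])
b_eq = np.concatenate([p, q])

res = linprog(c.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
M = res.x.reshape(n, n)
print("optimal cost:", res.fun)
```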

  14. Entropic relaxation, Cuturi '13. Linear programming in $M$: simplex and interior-point methods give complexity $O(n^3 \log n)$. Pretty bad.

  15. Entropic relaxation, Cuturi '13 (cont.). Define $\mathrm{Ent}(M) = \sum_{i,j} M_{ij} \log M_{ij}$, with the convention $0 \log 0 = 0$.

  16. Entropic relaxation, Cuturi '13 (cont.). For $h > 0$, minimize $[\langle c, M \rangle + h\, \mathrm{Ent}(M)]$. The entropy term penalizes degenerate (sparse) solutions $M$; the unregularized optimum is recovered as $h \downarrow 0$. Computational complexity $\approx O(n^2 \log n)$. How?

  17. Entropic relaxation: solution. For $h > 0$, minimize $[\langle c, M \rangle + h\, \mathrm{Ent}(M)]$. Solution (Lagrange multipliers + calculus): $\exists$ positive $u, v \in \mathbb{R}^n$ s.t. $M_c = \mathrm{Diag}(u) \exp\left(-\frac{1}{h} c\right) \mathrm{Diag}(v)$, i.e., $M_c(i,j) = u_i \exp\left(-\frac{1}{h} c(x_i, y_j)\right) v_j$, $1 \le i, j \le n$. Remember this form. Will get back to it in the continuum.
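
The "Lagrange multipliers + calculus" step, sketched: introduce multipliers $\alpha \in \mathbb{R}^n$ for the row-sum constraints and $\beta \in \mathbb{R}^n$ for the column-sum constraints. Stationarity in $M_{ij}$ gives
$$c_{ij} + h\,(1 + \log M_{ij}) - \alpha_i - \beta_j = 0 \quad\Longrightarrow\quad M_{ij} = e^{\alpha_i/h - 1}\, \exp\!\Big(-\frac{c_{ij}}{h}\Big)\, e^{\beta_j/h},$$
which is exactly the stated form with $u_i = e^{\alpha_i/h - 1}$ and $v_j = e^{\beta_j/h}$.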

  18. Sinkhorn algorithm AKA IPFP. $M_c$ can be computed by the Iterative Proportional Fitting Procedure. Start with $M_0 = \exp\left(-\frac{1}{h} c\right)$. Inductively: rescale the rows of $M_k$ to get $M_{k+1}$ with row sums $p$; rescale the columns of $M_{k+1}$ to get $M_{k+2}$ with column sums $q$. Limit $= M_c$. Called Sinkhorn iterations in linear algebra.
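
A minimal NumPy sketch of these iterations, written in the equivalent scaling form $M = \mathrm{Diag}(u)\, K\, \mathrm{Diag}(v)$ with $K = \exp(-c/h)$; the function name and the fixed iteration count are illustrative choices (for small $h$ one would iterate in the log domain for numerical stability):

```python
import numpy as np

def sinkhorn(c, p, q, h=0.1, n_iter=500):
    """Sinkhorn/IPFP iterations for entropic OT (a minimal sketch).

    Alternately rescales rows and columns of K = exp(-c/h) so that
    M = Diag(u) K Diag(v) has row sums p and column sums q.
    """
    K = np.exp(-c / h)
    u = np.ones(len(p))
    v = np.ones(len(q))
    for _ in range(n_iter):
        u = p / (K @ v)      # fix row sums:    u_i = p_i / (K v)_i
        v = q / (K.T @ u)    # fix column sums: v_j = q_j / (K^T u)_j
    return u[:, None] * K * v[None, :]   # M_c = Diag(u) K Diag(v)

# Reusing c, p, q from the linear-programming sketch above:
# M_h = sinkhorn(c, p, q, h=0.05)
# As h -> 0, <c, M_h> approaches the unregularized optimal cost.
```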

  19. Entropic relaxation in continuum. Recall $X, Y \subseteq \mathbb{R}^d$, cost $c(x,y)$; $P, Q$ have densities $\rho_0, \rho_1$. For a density $\nu \in \Pi(\rho_0, \rho_1)$, $\mathrm{Ent}(\nu) = \int \nu(x,y) \log \nu(x,y)\, dx\, dy$. Entropic relaxation: for $h > 0$, minimize $\int c(x,y)\, \nu(x,y)\, dx\, dy + h\, \mathrm{Ent}(\nu)$ over $\nu \in \Pi(\rho_0, \rho_1)$.

  20. Entropic relaxation: continuum solution (Hobby-Pyke '65, Rüschendorf-Thomsen '93). Optimal solution: $\nu_c(x,y) = \exp\left(a(x) + b(y) - \frac{1}{h} c(x,y)\right) = u(x) \exp\left(-\frac{1}{h} c(x,y)\right) v(y)$. Just like the discrete case. Can be computed by IPFP; unfortunately, convergence is very slow.

  21. Entropic duality. Recall duality for MK-OT: $\inf_{\Pi(\rho_0, \rho_1)} \int c(x,y)\, \nu(x,y)\, dx\, dy = \sup_{\varphi(y) - \psi(x) \le c(x,y)} \left[\int \varphi(y) \rho_1(y)\, dy - \int \psi(x) \rho_0(x)\, dx\right]$. Duality for the entropic relaxation: solve $\sup_{\varphi, \psi} \left[\int \varphi(y) \rho_1(y)\, dy - \int \psi(x) \rho_0(x)\, dx - h \iint e^{\varphi(y) - \psi(x) - \frac{1}{h} c(x,y)}\right]$. Optimal solutions: $\varphi(y) = b(y)$, $\psi(x) = -a(x)$. Here $a, b$ are the Schrödinger potentials.

  22. Schrödinger bridges, Large Deviations

  23. Schrödinger's problem: lazy gas experiment. Imagine $N \approx \infty$ independent gas molecules in a cold chamber. Initial configuration of particles: $L_0 = \frac{1}{N} \sum_{i=1}^N \delta_{x_i} \approx P$. Each particle performs an independent Brownian motion with $\sigma^2 \approx 0$. Condition on the terminal configuration $L_1 = \frac{1}{N} \sum_{j=1}^N \delta_{y_j} \approx Q$. (Schrödinger '32) What is the probability of the above event? What is the most likely path followed by an individual gas molecule?

  24. Föllmer's reformulation '88. Relative entropy (RE) of $\mu$ w.r.t. $\nu$: $H(\mu \mid \nu) = \int \log\left(\frac{d\mu}{d\nu}\right) d\mu$. $R$ - law of $\sigma^2$-BM on $C[0,1]$ with initial distribution $P$. Among all probabilities $\mu$ on $C[0,1]$ s.t. $X_0 \sim P$, $X_1 \sim Q$, minimize $H(\mu \mid R)$. The solution is the Schrödinger bridge between $P$ and $Q$. Take $\sigma^2 \downarrow 0$.

  25. Föllmer's disintegration. Brownian transition density: $p_\sigma(x,y) = \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left(-\frac{1}{2\sigma^2}\|y-x\|^2\right)$. (Föllmer) Let $R_{01}$ be the law of $(X_0, X_1)$. Find $\nu \in \Pi(P,Q)$ minimizing $H(\nu \mid R_{01})$. Generate $(X_0, X_1)$ from the minimizer; the Schrödinger bridge is then the $\sigma^2$-Brownian bridge given $X_0 = x_0$, $X_1 = x_1$.

  26. Entropic relaxation and Schrödinger bridge. Minimizing $H(\nu \mid R_{01})$ is the same problem as minimizing $\int \frac{1}{2}\|y-x\|^2\, d\nu + \sigma^2\, \mathrm{Ent}(\nu)$: the entropic relaxation with $h = \sigma^2$ for the quadratic cost. Schrödinger bridge description: solve the entropic relaxation and join by Brownian bridges. What happens when $\sigma^2 \downarrow 0$?
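
The identification is a direct computation (a sketch; constants independent of $\nu$ are dropped). Since $R_{01}(dx, dy) = \rho_0(x)\, p_\sigma(x,y)\, dx\, dy$,
$$H(\nu \mid R_{01}) = \iint \nu \log \frac{\nu}{\rho_0(x)\, p_\sigma(x,y)}\, dx\, dy = \mathrm{Ent}(\nu) + \frac{1}{2\sigma^2} \iint \|y-x\|^2\, d\nu - \int \rho_0 \log \rho_0\, dx + \frac{d}{2} \log(2\pi\sigma^2),$$
where the third term is constant because every $\nu \in \Pi(\rho_0, \rho_1)$ has first marginal $\rho_0$. Multiplying through by $\sigma^2$ gives the entropic objective with $h = \sigma^2$.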

  27. Large deviations. As $h = \sigma^2 \to 0^+$, the optimal entropic coupling converges to the MK-optimal coupling. Recall Brenier: $P(dx) = \rho_0(x)\, dx$, $Q(dy) = \rho_1(y)\, dy$; $\exists F$ such that $y = \nabla F(x)$ gives the Monge solution. The $\sigma^2$-Brownian bridge converges to the constant-velocity straight line joining $x$ and $y$. Can be made precise by large deviation theory. Let $\rho_t$ be the law at time $t$ of this limit: the McCann interpolation between $\rho_0$ and $\rho_1$. Remember this name for later.

  28. $(f,g)$-transform of Markov processes. How to describe the law of Schrödinger bridges? SDE? PDE? Markovian $(f,g)$-transform of the reversible Wiener measure $W$: $d\mu = f(X_0)\, g(X_1)\, dW$, with $\mathbb{E}_W[f(X_0) g(X_1)] = 1$. Similar to Girsanov / Doob's $h$-transform, but on both sides. A Markovian diffusion both forward and backward in time.

  29. Generators for Schrödinger bridges. Let $\mu_t$ be the law of the $\sigma^2 = 1$ Schrödinger bridge. Recall the Schrödinger potentials $a(x), b(y)$. Define the heat flows $b_t(y) = \log \mathbb{E}_W\left[e^{b(X_1)} \mid X_t = y\right]$, $a_t(x) = \log \mathbb{E}_W\left[e^{a(X_0)} \mid X_t = x\right]$. The Schrödinger bridge is a BM with drift $\nabla b_t$ forward in time, and a BM with drift $\nabla a_t$ backward in time. Most properties are poorly understood.

  30. Dynamics and geometry

  31. McCann interpolation. Figure: by M. Cuturi. $\mathcal{P}_2(\mathbb{R}^d)$ - square-integrable probabilities. Recall: $\rho_0$ transported to $\rho_1$, $c(x,y) = \frac{1}{2}\|y-x\|^2$. The square root of the optimal cost, $W_2(\rho_0, \rho_1)$, is a metric. $\rho_t$ = law of $(1-t)X + t\, T(X)$, $X \sim \rho_0$, $0 \le t \le 1$.
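
A minimal sketch, assuming NumPy: in 1D the monotone map pairs the $k$-th order statistics of the two samples, so the displacement interpolant $\rho_t$ can be read off directly (the two Gaussians and the sample size are illustrative):

```python
import numpy as np

# McCann (displacement) interpolation in 1D. With T the monotone
# transport map, rho_t is the law of (1 - t) X + t T(X), X ~ rho_0;
# on sorted samples, T pairs the k-th order statistic of P with the
# k-th order statistic of Q.
rng = np.random.default_rng(2)
x = np.sort(rng.normal(-2.0, 1.0, size=2000))   # samples from rho_0
y = np.sort(rng.normal(+2.0, 0.5, size=2000))   # samples from rho_1

def mccann(t):
    """Samples from rho_t = law of (1 - t) X + t T(X)."""
    return (1.0 - t) * x + t * y

# At t = 0.5 this is a genuinely displaced distribution,
# not the 50/50 mixture of rho_0 and rho_1.
print(mccann(0.5).mean(), mccann(0.5).std())
```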
