Convergence rates for discretized optimal transport

Quentin Mérigot, Université Paris-Sud 11
Based on joint work with F. Chazal and A. Delalande
Workshop on numerical solutions of HJB equations, Paris, January 2020

1. Motivations


Motivation 2: "Linearization" of W₂

◮ We fix a reference measure ρ = Leb_X, with X ⊆ R^d convex compact and |X| = 1. Given µ ∈ Prob₂(R^d), we define T_µ as the unique map satisfying (i) T_µ = ∇φ_µ a.e. for some convex function φ_µ : X → R, and (ii) T_µ#ρ = µ.

◮ The map µ ∈ Prob₂(R^d) ↦ T_µ ∈ L²(X) is injective, with image the space of (square-integrable) gradients of convex functions on X.

◮ W_{2,ρ}(µ, ν) := ‖T_µ − T_ν‖_{L²(ρ)}  → [Ambrosio, Gigli, Savaré '04]

    Riemannian geometry                        | Optimal transport
    point x ∈ M                                | µ ∈ Prob₂(R^d)
    geodesic distance d_g(x, y)                | W₂(µ, ν)
    tangent space T_{x₀}M                      | T_ρ Prob₂(R^d) ⊆ L²(ρ; X)
    inverse exponential map exp⁻¹_{x₀}(x)      | T_µ ∈ T_ρ Prob₂(X)
    tangent distance ‖exp⁻¹_{x₀}(x) − exp⁻¹_{x₀}(y)‖_{g(x₀)} | ‖T_µ − T_ν‖_{L²(ρ)}

◮ Used in image analysis → [Wang, Slepcev, Basu, Ozolek, Rohde '13]

→ Representing a family of probability measures by a family of functions in L²(ρ).
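In dimension one with ρ = Leb_[0,1], the map T_µ is simply the quantile function of µ, so the embedding µ ↦ T_µ and the linearized distance W_{2,ρ} can be sketched in a few lines. A minimal illustration for empirical measures (the helper names are ours, not from the talk):

```python
import numpy as np

def transport_map(sample, grid):
    """For rho = Leb_[0,1] in dimension 1, T_mu is the quantile function
    of mu; here mu is the empirical measure of `sample`."""
    s = np.sort(np.asarray(sample, dtype=float))
    idx = np.minimum((grid * len(s)).astype(int), len(s) - 1)
    return s[idx]

def linearized_W2(sample_a, sample_b, n_grid=10_000):
    """W_{2,rho}(mu, nu) = ||T_mu - T_nu||_{L^2(rho)}, estimated on a
    uniform grid of midpoints discretizing rho = Leb_[0,1]."""
    grid = (np.arange(n_grid) + 0.5) / n_grid
    Ta = transport_map(sample_a, grid)
    Tb = transport_map(sample_b, grid)
    return np.sqrt(np.mean((Ta - Tb) ** 2))
```

In dimension one this linearized distance coincides with W₂ itself; in higher dimension it only dominates it, which is the point of the stability questions below.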

Example: barycenter computation

◮ Barycenter in Wasserstein space: given µ₁, …, µ_k ∈ Prob₂(R^d) and α₁, …, α_k ≥ 0,
    µ̄ := argmin_µ Σ_{1≤i≤k} α_i W₂²(µ, µ_i).
→ Need to solve an optimisation problem every time the coefficients α_i are changed.

◮ "Linearized" Wasserstein barycenters:
    µ̄ := ((1 / Σ_i α_i) Σ_i α_i T_{µ_i})#ρ.
→ Simple expression once the transport maps T_{µ_i} : ρ → µ_i have been computed.

[Figure: supports spt(µ₀) and spt(µ₁), and the interpolated measure (0.8 T_{µ₁} + 0.2 T_{µ₀})#ρ.]

What amount of the Wasserstein geometry is preserved by the embedding µ ↦ T_µ?
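The "simple expression" is literally an average of maps. A one-dimensional sketch with ρ = Leb_[0,1], where each T_{µ_i} is a quantile function (helper names are ours):

```python
import numpy as np

def quantile_map(sample, grid):
    """T_{mu_i} for rho = Leb_[0,1] in 1D: the quantile function of mu_i,
    with mu_i the empirical measure of `sample`."""
    s = np.sort(np.asarray(sample, dtype=float))
    idx = np.minimum((grid * len(s)).astype(int), len(s) - 1)
    return s[idx]

def linearized_barycenter(samples, weights, n_grid=1000):
    """Values on a uniform grid of the averaged map
    (1 / sum_i a_i) * sum_i a_i T_{mu_i}; the empirical measure of these
    values is the push-forward of rho, i.e. the linearized barycenter."""
    grid = (np.arange(n_grid) + 0.5) / n_grid
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    maps = np.stack([quantile_map(s, grid) for s in samples])
    return w @ maps
```

In 1D this actually agrees with the true Wasserstein barycenter (quantile functions average exactly); in higher dimension the two notions differ, which is again the question raised on this slide.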

Motivation 3: numerical analysis of optimal transport

Theorem (Brenier, McCann): Given ρ ∈ Prob_ac(R^d) and µ ∈ Prob(R^d), there exists a unique (up to ρ-a.e. equality) map T_µ : R^d → R^d such that T_µ#ρ = µ and T_µ = ∇φ with φ convex.

To solve numerically an OT problem between ρ ∈ Prob_ac(R^d) and µ ∈ Prob([0,1]^d):

◮ Approximate µ by a discrete measure, for instance
    µ_k = Σ_{1≤i₁,…,i_d≤k} µ(B_{i₁,…,i_d}) δ_{(i₁/k,…,i_d/k)},
  where B_{i₁,…,i_d} is the cube [(i₁−1)/k, i₁/k] × ⋯ × [(i_d−1)/k, i_d/k].
  (Then, W_p(µ_k, µ) ≲ 1/k.)

◮ Compute exactly the optimal transport map T_{µ_k} between ρ and µ_k (using a semi-discrete optimal transport solver).

It is known that T_{µ_k} converges to T_µ, but convergence rates are unknown in general. Indeed, the numerical analysis of optimal transport is virtually nonexistent, whatever the discretization method.
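The grid discretization µ_k above can be sketched for an empirical measure: snap each sample to the corner of its cube, so that no point moves by more than √d/k, which is what gives W_p(µ_k, µ) ≲ 1/k. A minimal sketch (function name is ours):

```python
import numpy as np

def discretize_on_grid(points, k):
    """Snap each sample of mu in [0,1]^d to the corner (i1/k, ..., id/k)
    of the cube B_{i1,...,id} containing it; the empirical measure of the
    output is a sampled version of mu_k."""
    pts = np.asarray(points, dtype=float)
    corners = np.ceil(pts * k) / k          # x in ((i-1)/k, i/k]  ->  i/k
    return np.clip(corners, 1.0 / k, 1.0)   # send x = 0 to the first corner 1/k
```

Since the coupling that moves each cube's mass to its corner is admissible, its cost bounds W_p(µ_k, µ) by √d/k for every p.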

2. Continuity of µ ↦ T_µ

Elementary remarks

◮ The map µ ↦ T_µ is reverse-Lipschitz, i.e. ‖T_µ − T_ν‖_{L²(ρ)} ≥ W₂(µ, ν).
  Indeed: since T_µ#ρ = µ and T_ν#ρ = ν, one has γ := (T_µ, T_ν)#ρ ∈ Γ(µ, ν).
  Thus, W₂²(µ, ν) ≤ ∫ ‖x − y‖² dγ(x, y) = ∫ ‖T_µ(x) − T_ν(x)‖² dρ(x).

◮ The map µ ↦ T_µ is continuous.

◮ The map µ ↦ T_µ is not better than ½-Hölder.
  Take ρ = (1/π) Leb_{B(0,1)} on R², and define µ_θ = (δ_{x_θ} + δ_{x_{θ+π}})/2, with x_θ = (cos θ, sin θ).
  Then T_{µ_θ}(x) = x_θ if ⟨x_θ | x⟩ ≥ 0, and x_{θ+π} if not, so that ‖T_{µ_θ} − T_{µ_{θ+δ}}‖²_{L²(ρ)} ≥ Cδ.
  Since on the other hand W₂(µ_θ, µ_{θ+δ}) ≤ Cδ, we get ‖T_{µ_θ} − T_{µ_{θ+δ}}‖_{L²(ρ)} ≥ C W₂(µ_θ, µ_{θ+δ})^{1/2}.

Local ½-Hölder continuity

Thm: Assume ρ ∈ Prob_ac(X) and µ, ν ∈ Prob(Y), with X, Y ⊆ R^d compact. If T_µ is L-Lipschitz, then ‖T_µ − T_ν‖²_{L²(ρ)} ≤ C W₁(µ, ν) with C = 4L diam(X).

◮ ≃ [Ambrosio, Gigli '09], with a slightly better upper bound. See also [Berman '18].
◮ No regularity assumption on ν → consequences in statistics and numerical analysis.
◮ Let φ_µ : X → R convex s.t. T_µ = ∇φ_µ, and ψ_µ : Y → R its Legendre transform:
    ψ_µ(y) = max_{x∈X} ⟨x | y⟩ − φ_µ(x).

Prop: If T_µ is L-Lipschitz, then ‖T_µ − T_ν‖²_{L²(ρ)} ≤ −2L ∫ (ψ_µ − ψ_ν) d(µ − ν).

◮ Prop ⇒ Thm: by the Kantorovich–Rubinstein theorem, since ψ_µ and ψ_ν are diam(X)-Lipschitz on Y, the right-hand side is at most 4L diam(X) W₁(µ, ν).

Proof of Prop:
  ∫ ψ_ν d(µ − ν) = ∫ ψ_ν d(∇φ_µ#ρ − ∇φ_ν#ρ) = ∫ ψ_ν(∇φ_µ) − ψ_ν(∇φ_ν) dρ
    ≥ ∫ ⟨∇φ_µ − ∇φ_ν | ∇ψ_ν(∇φ_ν)⟩ dρ    (convexity: ψ_ν(y) − ψ_ν(x) ≥ ⟨y − x | ∇ψ_ν(x)⟩)
    = ∫ ⟨∇φ_µ − ∇φ_ν | id⟩ dρ.
  Similarly, ∫ ψ_µ d(ν − µ) ≥ ∫ ⟨∇φ_ν − ∇φ_µ | id⟩ dρ + (1/2L) ‖∇φ_µ − ∇φ_ν‖²_{L²(ρ)},
  (T_µ = ∇φ_µ L-Lipschitz ⟺ ψ_µ = φ*_µ is (1/L)-strongly convex).
  Summing the two inequalities gives the Prop.
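The theorem can be sanity-checked in dimension one, where everything is explicit: take ρ = µ = Leb_[0,1] (so T_µ = id is 1-Lipschitz and diam(X) = 1), let ν be an empirical measure, T_ν its quantile function, and use W₁(µ, ν) = ∫₀¹ |T_µ − T_ν| dt. A sketch under these assumptions (function name is ours):

```python
import numpy as np

def holder_check(sample, n_grid=20_000):
    """For rho = mu = Leb_[0,1] (T_mu = id, L = 1, diam(X) = 1) and nu the
    empirical measure of `sample`, return (||T_mu - T_nu||_2^2, 4 * W_1(mu, nu))."""
    t = (np.arange(n_grid) + 0.5) / n_grid
    s = np.sort(np.asarray(sample, dtype=float))
    # T_nu = quantile function of nu; T_mu = identity
    T_nu = s[np.minimum((t * len(s)).astype(int), len(s) - 1)]
    lhs = np.mean((T_nu - t) ** 2)
    # in 1D, W_1(mu, nu) equals the L^1 distance between the quantile maps
    w1 = np.mean(np.abs(T_nu - t))
    return lhs, 4.0 * w1
```

Here the inequality even holds with constant 1, since ‖T_µ − T_ν‖∞ ≤ diam(X); the constant 4L diam(X) is what survives in general dimension.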

Global Hölder continuity

Thm (Berman '18): Let ρ ∈ Prob_ac(X) and µ, ν ∈ Prob(Y), with X, Y compact. Then,
    ‖∇ψ_µ − ∇ψ_ν‖²_{L²(Y)} ≤ C W₁(µ, ν)^α with α = 1/2^{d−1}.

Corollary: ‖T_µ − T_ν‖²_{L²(ρ)} ≤ C W₁(µ, ν)^α with α = 1/(2^{d−1}(d+2)).

◮ The Hölder exponent is terrible, but the inequality holds without any assumption on µ, ν!
◮ The proof of Berman's theorem relies on techniques from complex geometry.

2. Global, dimension-independent Hölder continuity of µ ↦ T_µ

Main theorem

Thm (M., Delalande, Chazal '19): Let X be convex compact with |X| = 1 and ρ = Leb_X, and let Y be compact. Then there exists C such that for all µ, ν ∈ Prob(Y),
    ‖T_µ − T_ν‖_{L²(X)} ≤ C W₂(µ, ν)^{1/5}.

◮ First global and dimension-independent stability result for optimal transport maps.
◮ There is a gap between the lower and upper bounds on the Hölder exponent: 1/5 < 1/2. The exponent 1/5 is certainly not optimal…
◮ The constant C depends polynomially on diam(X), diam(Y).
◮ The proof relies on the semidiscrete setting: the bound is first established in the case µ = Σ_i µ_i δ_{y_i}, ν = Σ_i ν_i δ_{y_i}, and one concludes using a density argument.

Semidiscrete OT for c(x, y) = −⟨x | y⟩

◮ Let ρ ∈ Prob_ac(R^d), µ ∈ Prob₁(R^d), and let Γ(ρ, µ) denote the couplings between ρ and µ:
    T(ρ, µ) = max_{γ∈Γ(ρ,µ)} ∫ ⟨x | y⟩ dγ(x, y)
            = min_{φ⊕ψ ≥ ⟨·|·⟩} ∫ φ dρ + ∫ ψ dµ    (Kantorovich duality)
            = min_ψ ∫ ψ* dρ + ∫ ψ dµ               (Legendre–Fenchel transform: ψ*(x) = max_y ⟨x | y⟩ − ψ(y))

◮ Let µ = Σ_{1≤i≤N} µ_i δ_{y_i} and ψ_i = ψ(y_i). Then ψ*|_{V_i(ψ)} = ⟨· | y_i⟩ − ψ_i, where
    V_i(ψ) = {x | ∀j, ⟨x | y_i⟩ − ψ_i ≥ ⟨x | y_j⟩ − ψ_j}.

[Figure: the cells V₁(ψ), V₂(ψ), V₃(ψ) partitioning X, with the points y₁, y₂, y₃.]

Thus, T(ρ, µ) = min_{ψ∈R^N} Σ_i ∫_{V_i(ψ)} ⟨x | y_i⟩ − ψ_i dρ(x) + Σ_i µ_i ψ_i.

Optimality condition and economic interpretation

T(ρ, µ) = min_{ψ∈R^N} Φ(ψ) + Σ_i µ_i ψ_i, where Φ(ψ) := Σ_i ∫_{V_i(ψ)} ⟨x | y_i⟩ − ψ_i dρ(x).

◮ Gradient: ∇Φ(ψ) = −(G_i(ψ))_{1≤i≤N} where G_i(ψ) = ρ(V_i(ψ)).

  ψ ∈ R^N is a minimizer of the dual problem ⟺ ∀i, ρ(V_i(ψ)) = µ_i
    ⟺ G(ψ) = µ, with G = (G₁, …, G_N) and µ ∈ R^N
    ⟺ T = ∇ψ* transports ρ onto Σ_i µ_i δ_{y_i}.

◮ Economic interpretation: ρ = density of customers, {y_i}_{1≤i≤N} = product types.
→ Given prices ψ ∈ R^N, a customer x maximizes ⟨x | y_i⟩ − ψ_i over all products.
→ V_i(ψ) = {x | i ∈ argmax_j ⟨x | y_j⟩ − ψ_j} = the customers choosing product y_i.
→ ρ(V_i) = amount of customers for product y_i.
  Optimal transport = finding prices satisfying the capacity constraints ρ(V_i(ψ)) = µ_i.

◮ Algorithm (Oliker–Prussner): coordinate-wise increments. Complexity: O(N³).
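The optimality condition G(ψ) = µ can be turned into a toy solver directly: discretize ρ by a point cloud, compute the cells V_i(ψ) by the argmax rule, and run gradient descent on the dual objective, i.e. raise the price of over-demanded products. This is a sketch with a fixed step, not the Oliker–Prussner or Newton algorithms, and the function names are ours:

```python
import numpy as np

def cells(X, Y, psi):
    """Assign each rho-sample x to its cell: argmax_i <x|y_i> - psi_i."""
    return np.argmax(X @ Y.T - psi, axis=1)

def G(X, Y, psi):
    """G_i(psi) = rho(V_i(psi)), with rho the empirical measure of X."""
    return np.bincount(cells(X, Y, psi), minlength=len(Y)) / len(X)

def solve_dual(X, Y, mu, tau=0.4, iters=3000):
    """Gradient descent on psi -> Phi(psi) + <mu, psi>: the update
    psi_i += tau * (G_i(psi) - mu_i) raises the price of products whose
    cell carries too much mass, and lowers it otherwise."""
    psi = np.zeros(len(Y))
    for _ in range(iters):
        psi += tau * (G(X, Y, psi) - mu)
    return psi
```

For instance, with ρ the uniform grid on [0,1]² and Y = {(0,0), (1,1)}, µ = (1/2, 1/2), the cell boundary ⟨x | y₁⟩ − ψ₁ = ⟨x | y₂⟩ − ψ₂ is the line x₁ + x₂ = ψ₂ − ψ₁, so the solution has ψ₂ − ψ₁ ≈ 1 (the diagonal splits the square in half).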

  70. Hessian of Φ and Newton's algorithm

  (Recall that G_i(ψ) = ρ(V_i(ψ)) and ∇Φ = −(G_1, …, G_N).)

  Proposition: If ρ ∈ C⁰(X) and (y_i)_{1≤i≤N} is generic, then Φ ∈ C²(R^N) and
  ◮ ∀i ≠ j, ∂G_i/∂ψ_j(ψ) = (1/‖y_i − y_j‖) ∫_{Γ_ij(ψ)} ρ(x) dx, where Γ_ij = V_i(ψ) ∩ V_j(ψ);
  ∀i, ∂G_i/∂ψ_i(ψ) = −Σ_{j≠i} ∂G_i/∂ψ_j(ψ).

  Let E = {ψ ∈ R^N | ∀i, G_i(ψ) > 0}.
  ◮ If Ω = {ρ > 0} is connected and ψ ∈ E, then Ker D²Φ(ψ) = R(1, …, 1):
  ◮ consider the matrix L = DG(ψ) and the graph H with (i, j) ∈ H ⟺ L_ij > 0;
  ◮ if Ω is connected and ψ ∈ E, then H is connected;
  ◮ L is the Laplacian of a connected graph ⟹ Ker L = R · cst.

  (Figure: Laguerre cells of the points y_1, y_2, y_4, y_5, highlighting the interface Γ_15(ψ) between the cells of y_1 and y_5.)

  Corollary: Global convergence of a damped Newton algorithm. [Kitagawa, M., Thibert '16]
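
The damped Newton loop is easiest to see in 1D, where the Laguerre cells are intervals, G and DG have closed forms, and DG is exactly a (negated) graph Laplacian as in the proposition. A minimal sketch under those assumptions (the function names and the simplified "stay in E" damping criterion are mine; the actual [KMT '16] algorithm also enforces a decrease of the residual):

```python
import numpy as np

def cells(y, psi):
    """Boundaries of the 1D 'Laguerre' cells for the cost <x|y_i> - psi_i,
    with rho = Leb on [0,1] and y sorted increasing: the interface between
    cells i and i+1 is where the lines x*y_i - psi_i and x*y_{i+1} - psi_{i+1}
    cross, i.e. b_i = (psi_{i+1} - psi_i) / (y_{i+1} - y_i)."""
    b = (psi[1:] - psi[:-1]) / (y[1:] - y[:-1])
    return np.concatenate(([0.0], np.clip(b, 0.0, 1.0), [1.0]))

def G(y, psi):
    b = cells(y, psi)
    return b[1:] - b[:-1]                      # G_i(psi) = rho(V_i(psi))

def DG(y, psi):
    """Jacobian of G: tridiagonal, with dG_i/dpsi_j = 1/|y_i - y_j| at the
    (0-dimensional) interior interfaces -- the 1D case of the formula above."""
    b = (psi[1:] - psi[:-1]) / (y[1:] - y[:-1])
    w = np.where((b > 0.0) & (b < 1.0), 1.0 / (y[1:] - y[:-1]), 0.0)
    L = np.diag(w, 1) + np.diag(w, -1)
    np.fill_diagonal(L, -L.sum(axis=1))        # rows sum to 0: a graph Laplacian
    return L

def damped_newton(y, mu, psi, eps=1e-9, tol=1e-12, max_iter=50):
    """Damped Newton for G(psi) = mu: the step is halved until the iterate
    stays in E = {psi : all G_i(psi) > 0}."""
    for _ in range(max_iter):
        err = G(y, psi) - mu
        if np.linalg.norm(err) < tol:
            break
        # DG is singular along constants; lstsq returns the min-norm solution
        d = np.linalg.lstsq(DG(y, psi), err, rcond=None)[0]
        t = 1.0
        while G(y, psi - t * d).min() <= eps:
            t /= 2.0
        psi = psi - t * d
    return psi
```

Starting from prices giving equal-area cells (so that ψ ∈ E), the iteration drives G(ψ) to any target weight vector µ with positive entries.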

  74. Numerical example

  Source: ρ = uniform on [0,1]². Target: µ = (1/N) Σ_{1≤i≤N} δ_{y_i} with y_i uniform i.i.d. in [0, 1/3]².

  ψ⁰ = ½‖·‖²,  ψ¹ = Newt(ψ⁰),  ψ² = Newt(ψ¹).

  NB: the points do not move. Convergence is very fast when spt(ρ) is convex: 17 Newton iterations for N ≥ 10⁷ in 3D.

  82. Proof ingredients

  Thm (M., Delalande, Chazal '19): Let X be convex compact with |X| = 1 and ρ = Leb_X, and let Y be compact. Then there exists C s.t. for all µ, ν ∈ Prob(Y),
  ‖T_µ − T_ν‖_{L²(X)} ≤ C W₂(µ, ν)^{1/5}.

  ◮ Strategy of proof: let µᵏ = Σ_i µᵢᵏ δ_{y_i} for k ∈ {0, 1}, and assume all µᵢᵏ > 0. Consider ψᵏ ∈ R^Y s.t. G(ψᵏ) = µᵏ, and ψᵗ = ψ⁰ + tv with v = ψ¹ − ψ⁰. Then
  ⟨µ¹ − µ⁰ | v⟩ = ⟨G(ψ¹) − G(ψ⁰) | v⟩ = ∫₀¹ ⟨DG(ψᵗ) v | v⟩ dt.

  a) Control of the eigengap: ⟨DG(ψᵗ) v | v⟩ ≤ −C(X) ‖v‖²_{L²(µᵗ)} if ∫ v dµᵗ = 0, with µᵗ = G(ψᵗ). → [Eymard, Gallouët, Herbin '00]

  b) Control of µᵗ: the Brunn–Minkowski inequality implies µᵗ ≥ (1 − t)^d µ⁰.

  Combining a) and b), we get ‖ψ¹ − ψ⁰‖²_{L²(µ⁰)} ≲ |⟨µ¹ − µ⁰ | ψ¹ − ψ⁰⟩|.
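
The last line follows by chaining a) and b) inside the integral. A sketch of the computation, assuming the zero-mean condition of a) can be arranged along the path (the actual proof handles this normalization with more care):

```latex
\begin{aligned}
-\langle \mu^1 - \mu^0 \mid v\rangle
  &= -\int_0^1 \langle DG(\psi^t)\, v \mid v\rangle \,\mathrm{d}t
   \;\ge\; C(X) \int_0^1 \|v\|^2_{L^2(\mu^t)} \,\mathrm{d}t
   && \text{by a)}\\
  &\ge C(X) \int_0^1 (1-t)^d \,\mathrm{d}t \;\|v\|^2_{L^2(\mu^0)}
   = \frac{C(X)}{d+1}\, \|v\|^2_{L^2(\mu^0)}
   && \text{by b), since } \mu^t \ge (1-t)^d \mu^0,
\end{aligned}
```

so that ‖ψ¹ − ψ⁰‖²_{L²(µ⁰)} ≤ ((d+1)/C(X)) |⟨µ¹ − µ⁰ | ψ¹ − ψ⁰⟩|, with v = ψ¹ − ψ⁰.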
