

Bridging the gap between Optimal Transport and MMD with Sinkhorn Divergences. Aude Genevay (MIT CSAIL), CIRM Workshop, March 2020. Joint work with Gabriel Peyré, Marco Cuturi, Francis Bach, Lénaïc Chizat.

Comparing Probability Measures: three settings, depending on whether the two measures α and β are both continuous, one continuous and one discrete (semi-discrete), or both discrete.

Discrete Setting (Quantization). Figure 1 – min over (x_1, …, x_k) of D( (1/k) Σ_{i=1}^k δ_{x_i} , (1/n) Σ_{j=1}^n δ_{y_j} ): approximate a fixed empirical measure on n points y_j by a smaller cloud of k points x_i.

Semi-discrete Setting (Density Fitting). Figure 2 – min_θ D(α_θ, β): fit a parametric measure α_θ to a target measure β; the frames show α_θ converging to an optimal α_{θ*}.

Outline: 1. Notions of Distance between Measures; 2. Entropic Regularization of Optimal Transport; 3. Sinkhorn Divergences: Interpolation between OT and MMD; 4. Conclusion.

φ-divergences (Csiszár '63). Definition (φ-divergence): let φ be a convex, lower semi-continuous function with φ(1) = 0; the φ-divergence D_φ between two measures α and β is defined by

  D_φ(α | β) := ∫_X φ( dα/dβ (x) ) dβ(x).

Example (Kullback-Leibler divergence): for φ(x) = x log(x),

  D_KL(α | β) = ∫_X log( dα/dβ (x) ) dα(x).
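As a concrete check of the definition, here is a minimal numpy sketch (not from the slides) that evaluates D_KL between two finite discrete distributions; the function name and the convention 0·log 0 = 0 are my own choices:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p | q) = sum_i p_i * log(p_i / q_i).

    Convention: terms with p_i = 0 contribute 0; the divergence is
    infinite as soon as p puts mass where q has none.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    if np.any(q[mask] == 0):
        return float("inf")  # p charges a point that q does not
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two Diracs at different points: KL is infinite, as in the example below.
print(kl_divergence([1.0, 0.0], [0.0, 1.0]))  # inf
# Nearby distributions give a small positive value.
print(kl_divergence([0.5, 0.5], [0.6, 0.4]))
```

This blow-up on mutually singular measures is exactly the weakness the next slides illustrate.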

Weak Convergence of Measures. Example: on ℝ, take α = δ_0 and α_n = δ_{1/n}; then D_KL(α_n | α) = +∞ for every n (the animation shows δ_{1/n} for n = 1, …, 10 sliding toward δ_0), even though α_n gets arbitrarily close to α.

Definition (Weak Convergence): α_n weakly converges to α (denoted α_n ⇀ α) iff ∫ f(x) dα_n(x) → ∫ f(x) dα(x) for all f ∈ C_b(X).

Let D be a distance between measures; D metrizes weak convergence iff ( D(α_n, α) → 0 ⇔ α_n ⇀ α ).

Maximum Mean Discrepancies (Gretton '06). Definition (RKHS): let H be a Hilbert space with kernel k; H is a Reproducing Kernel Hilbert Space (RKHS) iff:
1. ∀ x ∈ X, k(x, ·) ∈ H,
2. ∀ f ∈ H, f(x) = ⟨f, k(x, ·)⟩_H.
Let H be an RKHS with kernel k; the MMD between two probability measures α and β is defined by

  MMD_k²(α, β) := ( sup_{||f||_H ≤ 1} | E_α[f(X)] − E_β[f(Y)] | )²
                = E_{α⊗α}[k(X, X′)] + E_{β⊗β}[k(Y, Y′)] − 2 E_{α⊗β}[k(X, Y)].
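The closed form above makes MMD cheap to estimate from samples. A minimal numpy sketch for 1-D samples (not from the slides; the Gaussian kernel and its bandwidth σ are assumed choices):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-|x_i - y_j|^2 / (2 sigma^2))."""
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Plug-in (V-statistic) estimate of MMD_k^2 between the empirical
    measures of samples x and y, using the three-expectation closed form."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2 * kxy

# MMD^2(delta_{1/n}, delta_0) -> 0 as n grows: unlike KL, MMD sees
# that delta_{1/n} approaches delta_0.
for n in [1, 10, 100]:
    print(n, mmd2(np.array([1.0 / n]), np.array([0.0])))
```

This contrasts with the weak-convergence example above, where D_KL stayed at +∞; for suitable kernels, MMD metrizes weak convergence.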

Optimal Transport (Monge 1781, Kantorovitch '42).
• c(x, y): cost of moving a unit of mass from x to y.
• π(x, y) (coupling): how much mass moves from x to y.

The Wasserstein Distance. Minimal cost of moving all the mass from α to β: let α ∈ M_+^1(X) and β ∈ M_+^1(Y),

  W_c(α, β) = min_{π ∈ Π(α, β)} ∫_{X×Y} c(x, y) dπ(x, y).   (P)

For c(x, y) = ||x − y||^p, W_c(α, β)^{1/p} is the p-Wasserstein distance.
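In one dimension the optimal coupling between two equal-size empirical measures simply matches sorted samples, which gives a closed form for W_p; a small numpy sketch under that assumption (not from the slides):

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size empirical measures on R.

    In 1-D the optimal coupling pairs the i-th smallest point of x with
    the i-th smallest point of y, so the minimum in (P) is explicit:
    W_p = ( mean_i |x_(i) - y_(i)|^p )^(1/p).
    """
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y) ** p) ** (1 / p))

x = np.array([0.0, 1.0, 2.0])
print(wasserstein_1d(x, x + 0.5))  # translating every point by 0.5 costs 0.5
```

In higher dimensions no such shortcut exists, which is where the computational comparison on the next slide comes from.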

Optimal Transport vs. MMD:

                      OT                                 MMD
  sample complexity   O(n^(-1/d)) (curse of dimension)   O(1/√n)
  computation         O(n³ log n)                        O(n²)

Simple example: min over (x_1, …, x_n) of D( (1/n) Σ_{i=1}^n δ_{x_i} , (1/n) Σ_{j=1}^n δ_{y_j} ).

Discrete gradient flow of MMD (figure: evolution of the point cloud).

Discrete gradient flow of OT (figure: evolution of the point cloud).

Another example: min over (x_1, …, x_n) of D( (1/n) Σ_{i=1}^n δ_{x_i} , (1/n) Σ_{j=1}^n δ_{y_j} ).

Discrete gradient flow of MMD (figure: evolution of the point cloud).

Discrete gradient flow of OT (figure: evolution of the point cloud).

Optimal Transport vs. MMD (recap):

                      OT                                 MMD
  sample complexity   O(n^(-1/d)) (curse of dimension)   O(1/√n)
  computation         O(n³ log n)                        O(n²)

OT gives better gradients, as the gradient-flow comparisons above show. Figure: min over (x_1, …, x_k) of D( (1/k) Σ_{i=1}^k δ_{x_i} , (1/n) Σ_{j=1}^n δ_{y_j} ) after 200 steps of gradient descent.

Outline: 1. Notions of Distance between Measures; 2. Entropic Regularization of Optimal Transport (The basics; A magic regularizing tool!; Sample Complexity); 3. Sinkhorn Divergences: Interpolation between OT and MMD; 4. Conclusion.

The basics: Entropic Regularization (Cuturi '13). Let α ∈ M_+^1(X) and β ∈ M_+^1(Y); the entropic OT problem adds a relative-entropy penalty to (P):

  W_{c,ε}(α, β) := min_{π ∈ Π(α, β)} ∫_{X×Y} c(x, y) dπ(x, y) + ε H(π | α⊗β),   (P_ε)

where

  H(π | α⊗β) := ∫_{X×Y} log( dπ(x, y) / (dα(x) dβ(y)) ) dπ(x, y)

is the relative entropy of the transport plan π with respect to the product measure α⊗β.
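On discrete measures, (P_ε) can be solved by Sinkhorn's matrix-scaling iterations, at O(n²) cost per iteration. A minimal numpy sketch (the cost matrix, ε, and iteration count below are my own choices; a practical solver would use log-domain updates for small ε):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=500):
    """Solve the discrete entropic OT problem
        min_pi <C, pi> + eps * H(pi | a x b)
    over couplings pi with marginals a and b, by alternately rescaling
    rows and columns of the Gibbs kernel K = exp(-C / eps).

    Returns the optimal coupling pi = diag(u) K diag(v)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)   # match the column marginals
        u = a / (K @ v)     # match the row marginals
    return u[:, None] * K * v[None, :]

# Two uniform measures on the same 3 points, squared-distance cost.
x = np.array([0.0, 1.0, 2.0])
C = (x[:, None] - x[None, :]) ** 2
a = b = np.ones(3) / 3
pi = sinkhorn(a, b, C, eps=0.05)
print(pi.round(3))   # close to the identity coupling diag(1/3, 1/3, 1/3)
```

For small ε the plan concentrates on the optimal assignment; larger ε spreads it out, which is the effect Figure 3 below illustrates.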

The basics: Entropic Regularization. Figure 3 – influence of the regularization parameter ε on the transport plan π. Intuition: the entropic penalty 'smoothes' the problem and avoids overfitting (think of ridge regression for least squares).

The basics: Dual Formulation. Recall the dual of standard OT, which carries a hard constraint:

  W_c(α, β) = max_{u ∈ C(X), v ∈ C(Y)} ∫_X u(x) dα(x) + ∫_Y v(y) dβ(y)   (D)
  such that u(x) + v(y) ≤ c(x, y) for all (x, y) ∈ X×Y.

Contrary to standard OT, the entropically regularized problem has no constraint on its dual: the hard constraint is replaced by a smooth penalty.
