

  1. Optimal Transport for Machine Learning
  Aude Genevay
  CEREMADE (Université Paris-Dauphine), DMA (Ecole Normale Supérieure), MOKAPLAN Team (INRIA Paris)
  Imaging in Paris - February 2018

  2. Outline
  1. Entropy Regularized OT
  2. Applications in Imaging
  3. Large Scale "OT" for Machine Learning
  4. Application to Generative Models

  3. Shortcomings of OT
  Two main issues when using OT in practice:
  • Poor sample complexity: one needs a lot of samples from µ and ν to get a good approximation of W(µ, ν).
  • Heavy computational cost: solving discrete OT requires solving an LP → network simplex solver, O(n³ log n) [Pele and Werman '09].

  4. Entropy!
  • Basically: adding an entropic regularization smoothes the constraint.
  • Makes the problem easier:
    • yields an unconstrained dual problem
    • the discrete case can be solved efficiently with an iterative algorithm (more on that later)
  • For ML applications, the regularized Wasserstein distance is better than the standard one.
  • In high dimension, it helps avoid overfitting.

  5. Entropic Relaxation of OT [Cuturi '13]
  Add an entropic penalty to the Kantorovich formulation of OT:
  $$\min_{\gamma \in \Pi(\mu,\nu)} \int_{\mathcal{X}\times\mathcal{Y}} c(x,y)\, d\gamma(x,y) + \varepsilon\, \mathrm{KL}(\gamma \,|\, \mu \otimes \nu)$$
  where
  $$\mathrm{KL}(\gamma \,|\, \mu \otimes \nu) \stackrel{\text{def.}}{=} \int_{\mathcal{X}\times\mathcal{Y}} \left( \log\left( \frac{d\gamma}{d\mu\, d\nu}(x,y) \right) - 1 \right) d\gamma(x,y)$$
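To make the definition concrete, here is a minimal numpy sketch (my own illustration, not code from the talk) that evaluates this regularized objective for discrete measures; the function and variable names are assumptions:

```python
import numpy as np

def entropic_ot_objective(gamma, mu, nu, C, eps):
    """Discrete regularized OT objective <gamma, C> + eps * KL(gamma | mu x nu).

    gamma : (n, m) coupling with marginals mu, nu
    C     : (n, m) cost matrix, C[i, j] = c(x_i, y_j)
    """
    # KL(gamma | mu x nu) = sum_ij gamma_ij * (log(gamma_ij / (mu_i nu_j)) - 1),
    # restricted to the support of gamma (0 log 0 = 0)
    mask = gamma > 0
    ratio = gamma[mask] / (mu[:, None] * nu[None, :])[mask]
    kl = np.sum(gamma[mask] * (np.log(ratio) - 1.0))
    return np.sum(gamma * C) + eps * kl
```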

  6. Dual Formulation
  $$\max_{u \in \mathcal{C}(\mathcal{X}),\, v \in \mathcal{C}(\mathcal{Y})} \int_{\mathcal{X}} u(x)\, d\mu(x) + \int_{\mathcal{Y}} v(y)\, d\nu(y) - \varepsilon \int_{\mathcal{X}\times\mathcal{Y}} e^{\frac{u(x)+v(y)-c(x,y)}{\varepsilon}}\, d\mu(x)\, d\nu(y)$$
  The constraint in standard OT, u(x) + v(y) ≤ c(x,y), is replaced by a smooth penalty term.

  7. Dual Formulation
  The dual problem is concave in u and v; the first order condition for each variable yields:
  $$\nabla_u = 0 \iff u(x) = -\varepsilon \log\left( \int_{\mathcal{Y}} e^{\frac{v(y)-c(x,y)}{\varepsilon}}\, d\nu(y) \right)$$
  $$\nabla_v = 0 \iff v(y) = -\varepsilon \log\left( \int_{\mathcal{X}} e^{\frac{u(x)-c(x,y)}{\varepsilon}}\, d\mu(x) \right)$$

  8. The Discrete Case
  Dual problem:
  $$\max_{u \in \mathbb{R}^n,\, v \in \mathbb{R}^m} \sum_{i=1}^{n} u_i \mu_i + \sum_{j=1}^{m} v_j \nu_j - \varepsilon \sum_{i,j=1}^{n,m} e^{\frac{u_i + v_j - c(x_i, y_j)}{\varepsilon}} \mu_i \nu_j$$
  First order conditions for each variable:
  $$\nabla_u = 0 \iff u_i = -\varepsilon \log\left( \sum_{j=1}^{m} e^{\frac{v_j - c(x_i, y_j)}{\varepsilon}} \nu_j \right)$$
  $$\nabla_v = 0 \iff v_j = -\varepsilon \log\left( \sum_{i=1}^{n} e^{\frac{u_i - c(x_i, y_j)}{\varepsilon}} \mu_i \right)$$
  ⇒ Do alternate maximizations!
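These updates can be run directly in the log domain, which stays numerically stable when ε is small. A minimal sketch under my own naming (not the talk's code):

```python
import numpy as np
from scipy.special import logsumexp

def dual_ascent(C, mu, nu, eps, n_iter=200):
    """Alternate maximization of the discrete entropic dual.

    C : (n, m) cost matrix; mu : (n,), nu : (m,) positive weights.
    Returns the dual potentials (u, v).
    """
    n, m = C.shape
    u, v = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # u_i = -eps * log sum_j exp((v_j - C_ij)/eps) nu_j
        u = -eps * logsumexp((v[None, :] - C) / eps, b=nu[None, :], axis=1)
        # v_j = -eps * log sum_i exp((u_i - C_ij)/eps) mu_i
        v = -eps * logsumexp((u[:, None] - C) / eps, b=mu[:, None], axis=0)
    return u, v
```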

  9. Sinkhorn's Algorithm
  • Iterates: (a, b) := (e^{u/ε}, e^{v/ε})
  Sinkhorn algorithm [Cuturi '13]:
    K ← (e^{−c_ij/ε})_{ij} ; initialize b ← 1_m
    repeat
      a ← µ ⊘ Kb
      b ← ν ⊘ K^T a
    return γ = diag(a) K diag(b)
  • each iteration has O(nm) complexity (matrix-vector multiplication)
  • can be improved to O(n log n) on a gridded space with convolutions [Solomon et al. '15]
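A direct numpy transcription of the pseudo-code above, with a toy usage example of my own (for small ε, prefer the log-domain variant sketched earlier):

```python
import numpy as np

def sinkhorn(C, mu, nu, eps, n_iter=500):
    """Sinkhorn matrix scaling for entropy-regularized OT.

    C : (n, m) cost matrix; mu : (n,), nu : (m,) marginal weights.
    Returns the regularized coupling gamma = diag(a) K diag(b).
    """
    K = np.exp(-C / eps)            # Gibbs kernel
    b = np.ones_like(nu)
    for _ in range(n_iter):
        a = mu / (K @ b)            # enforce row marginals
        b = nu / (K.T @ a)          # enforce column marginals
    return a[:, None] * K * b[None, :]

# toy usage: transport between two random point clouds in the plane
rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 2)), rng.normal(size=(7, 2)) + 2.0
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
gamma = sinkhorn(C, np.full(5, 1 / 5), np.full(7, 1 / 7), eps=0.1)
print(gamma.sum(axis=1))   # ≈ mu, up to convergence error
```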

  10. Sinkhorn - Toy Example
  [Figure: marginals µ and ν. Top: evolution of γ with the number of iterations l. Bottom: evolution of γ with the regularization parameter ε.]

  11. Sinkhorn - Convergence
  Definition (Hilbert metric). A projective metric defined for x, y ∈ R^d_{++} by
  $$d_H(x, y) := \log \frac{\max_i (x_i / y_i)}{\min_i (x_i / y_i)}$$
  Theorem. The iterates (a^{(l)}, b^{(l)}) converge linearly for the Hilbert metric.
  Remark: the contraction coefficient deteriorates quickly when ε → 0 (exponentially in the worst case).
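The metric is one line of numpy; the snippet below (my own illustration, with an assumed toy setup) prints the Hilbert-metric gap between successive iterates, which should shrink geometrically per the theorem:

```python
import numpy as np

def hilbert_metric(x, y):
    """Hilbert projective metric on the positive orthant."""
    r = x / y
    return np.log(r.max() / r.min())

# empirical check of linear convergence on a random problem
rng = np.random.default_rng(1)
C = rng.random((30, 30))
mu = np.full(30, 1 / 30); nu = np.full(30, 1 / 30)
K = np.exp(-C / 0.1)
b = np.ones(30); a_prev = mu / (K @ b)
for l in range(10):
    b = nu / (K.T @ a_prev)
    a = mu / (K @ b)
    print(l, hilbert_metric(a, a_prev))   # gaps shrink geometrically
    a_prev = a
```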

  12. Sinkhorn - Convergence
  Constraint violation. We have the following bound on the iterates:
  $$d_H(a^{(l)}, a^\star) \leq \kappa\, d_H(\gamma^{(l)} \mathbf{1}_m, \mu)$$
  So monitoring the violation of the marginal constraints is a good way to monitor convergence of Sinkhorn's algorithm.
  [Figure: ‖γ 1_m − µ‖ for various regularizations]
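In code, this suggests a simple stopping rule; a variant of the earlier loop (again a sketch of mine, with an assumed tolerance):

```python
import numpy as np

def sinkhorn_with_stopping(C, mu, nu, eps, tol=1e-9, max_iter=10_000):
    """Sinkhorn iterations stopped on the marginal-constraint violation."""
    K = np.exp(-C / eps)
    b = np.ones_like(nu)
    for _ in range(max_iter):
        a = mu / (K @ b)
        b = nu / (K.T @ a)
        gamma = a[:, None] * K * b[None, :]
        # after the b-update the column marginal is exact; check the rows
        if np.abs(gamma.sum(axis=1) - mu).sum() < tol:
            break
    return gamma
```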

  13. Color Transfer
  [Image courtesy of G. Peyré]

  14. Shape / Image Barycenters
  Regularized Wasserstein barycenters [Nenna et al. '15]:
  $$\bar{\mu} = \arg\min_{\mu \in \Sigma_n} \sum_k W_\varepsilon(\mu_k, \mu)$$
  [Image from Solomon et al. '15]
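One standard way to compute such barycenters is by iterative Bregman projections; the compact sketch below assumes histograms on a common grid and uniform barycentric weights are the caller's choice, and all names are mine:

```python
import numpy as np

def sinkhorn_barycenter(K, measures, weights, n_iter=200):
    """Entropy-regularized Wasserstein barycenter via iterative Bregman
    projections. K = exp(-C/eps) is the Gibbs kernel on a common grid,
    measures is a list of histograms, weights is a list summing to 1."""
    v = [np.ones_like(m) for m in measures]
    for _ in range(n_iter):
        u = [m / (K @ vk) for m, vk in zip(measures, v)]
        # barycenter = entrywise geometric mean of the column marginals
        log_bar = sum(w * np.log(K.T @ uk) for w, uk in zip(weights, u))
        bar = np.exp(log_bar)
        v = [bar / (K.T @ uk) for uk in u]
    return bar
```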

  15. Sinkhorn loss
  Consider entropy-regularized OT:
  $$\min_{\pi \in \Pi(\mu,\nu)} \int_{\mathcal{X}\times\mathcal{Y}} c(x,y)\, d\pi(x,y) + \varepsilon\, \mathrm{KL}(\pi \,|\, \mu \otimes \nu)$$
  Regularized loss:
  $$W_{c,\varepsilon}(\mu, \nu) \stackrel{\text{def.}}{=} \int_{\mathcal{X}\times\mathcal{Y}} c(x,y)\, d\pi_\varepsilon(x,y)$$
  where π_ε is the solution of the problem above.

  16. Sinkhorn Divergences: interpolation between OT and MMD
  Theorem. The Sinkhorn loss between two measures µ, ν is defined as:
  $$\bar{W}_{c,\varepsilon}(\mu, \nu) = 2 W_{c,\varepsilon}(\mu, \nu) - W_{c,\varepsilon}(\mu, \mu) - W_{c,\varepsilon}(\nu, \nu)$$
  with the following limiting behavior in ε:
  1. as ε → 0, W̄_{c,ε}(µ, ν) → 2 W_c(µ, ν)
  2. as ε → +∞, W̄_{c,ε}(µ, ν) → ‖µ − ν‖_{−c}
  where ‖·‖_{−c} is the MMD distance whose kernel is minus the cost from OT.
  Remark: some conditions are required on c to get an MMD distance when ε → ∞. In particular, c = ‖·‖_p^p with 0 < p < 2 is valid.
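The definition translates directly into three Sinkhorn runs. A minimal sketch for uniform point clouds, assuming a Euclidean cost (which falls in the valid range of the remark); names and defaults are mine:

```python
import numpy as np

def ot_eps(x, y, mu, nu, eps, n_iter=500):
    """W_{c,eps}: transport cost <gamma_eps, C> for Euclidean cost c."""
    C = np.sqrt(((x[:, None, :] - y[None, :, :]) ** 2).sum(-1))
    K = np.exp(-C / eps)
    b = np.ones_like(nu)
    for _ in range(n_iter):
        a = mu / (K @ b)
        b = nu / (K.T @ a)
    gamma = a[:, None] * K * b[None, :]
    return (gamma * C).sum()

def sinkhorn_divergence(x, y, eps):
    """bar{W}_{c,eps} = 2 W(mu,nu) - W(mu,mu) - W(nu,nu), uniform weights."""
    mu = np.full(len(x), 1 / len(x))
    nu = np.full(len(y), 1 / len(y))
    return (2 * ot_eps(x, y, mu, nu, eps)
            - ot_eps(x, x, mu, mu, eps)
            - ot_eps(y, y, nu, nu, eps))
```

The two corrective terms remove the entropic bias, which is what makes W̄ vanish when µ = ν.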

  17. Sample Complexity
  Sample Complexity of OT and MMD. Let µ be a probability distribution on R^d, and µ̂_n an empirical measure from µ:
  $$W(\mu, \hat{\mu}_n) = O(n^{-1/d}) \qquad \mathrm{MMD}(\mu, \hat{\mu}_n) = O(n^{-1/2})$$
  ⇒ the number n of samples you need to get a precision η on the Wasserstein distance grows exponentially with the dimension d of the space!
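This curse of dimensionality is easy to observe numerically. A small experiment of my own devising (not from the talk), using exact OT between equal-size uniform clouds via the assignment problem:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def empirical_w2(x, y):
    """Exact OT cost between two uniform point clouds of equal size
    (assignment problem, squared Euclidean cost)."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    i, j = linear_sum_assignment(C)
    return C[i, j].mean()

rng = np.random.default_rng(0)
for d in (2, 5, 10):
    for n in (100, 400, 1600):
        x, y = rng.random((n, d)), rng.random((n, d))
        print(d, n, empirical_w2(x, y))
# in low dimension the cost shrinks visibly with n;
# in high dimension it barely moves, reflecting the n^{-1/d} rate
```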

  18. Sample Complexity - Sinkhorn loss
  The sample complexity of the Sinkhorn loss seems to improve as ε grows.
  [Plots courtesy of G. Peyré and M. Cuturi]

  19. Generative Models
  [Figure: illustration of density fitting on a generative model]

  20. Density Fitting with Sinkhorn loss
  "Formally": solve min_θ E(θ) where
  $$E(\theta) \stackrel{\text{def.}}{=} \bar{W}_{c,\varepsilon}(\mu_\theta, \nu)$$
  ⇒ Issue: intractable gradient

  21. Approximating Sinkhorn loss
  • Rather than approximating the gradient, approximate the loss itself.
  • Minibatches: Ê(θ)
    • sample x_1, ..., x_m from µ_θ
    • use the empirical Wasserstein distance W_{c,ε}(µ̂_θ, ν̂), where µ̂_θ = (1/m) Σ_{i=1}^m δ_{x_i}
  • Use L iterations of Sinkhorn's algorithm: Ê^{(L)}(θ)
    • compute L steps of the algorithm
    • use this as a proxy for W_{c,ε}(µ̂_θ, ν̂) (see the sketch below)
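A hedged sketch of the resulting estimator with L unrolled Sinkhorn iterations, written with PyTorch autograd; the toy generator, ε, L and all names are my own illustration, not the talk's code:

```python
import torch

def sinkhorn_loss(x, y, eps=0.1, L=20):
    """Ê^{(L)}: transport cost after L Sinkhorn steps on a minibatch.

    x : (m, d) samples from mu_theta (requires grad), y : (m', d) data batch.
    Backpropagation simply unrolls the L iterations."""
    C = torch.cdist(x, y, p=2)             # Euclidean cost
    K = torch.exp(-C / eps)
    mu = torch.full((x.shape[0],), 1.0 / x.shape[0])
    nu = torch.full((y.shape[0],), 1.0 / y.shape[0])
    b = torch.ones_like(nu)
    for _ in range(L):
        a = mu / (K @ b)
        b = nu / (K.T @ a)
    gamma = a[:, None] * K * b[None, :]
    return (gamma * C).sum()

# one optimization step on a toy affine generator g_theta(z) = z @ W + c
W = torch.randn(2, 2, requires_grad=True)
c = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([W, c], lr=1e-2)
z = torch.randn(64, 2)                     # latent minibatch
y = torch.randn(64, 2) + 3.0               # data minibatch
loss = sinkhorn_loss(z @ W + c, y)
loss.backward()
opt.step()
```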
