A Review of Regularized Optimal Transport

Marco Cuturi

Joint work with many people, including: G. Peyré, A. Genevay (ENS), A. Doucet (Oxford), J. Solomon (MIT), J.D. Benamou, N. Bonneel, F. Bach, L. Nenna (INRIA), G. Carlier (Dauphine).

What is Optimal Transport?
A geometric toolbox to compare probability measures supported on a metric space.
Monge, Kantorovich, Dantzig, Wasserstein, Brenier, Otto, McCann, Villani

What is Optimal Transport?
A geometric toolbox to compare probability measures supported on a metric space.
[Figure: examples of probability measures µ, ν: bags of features, brain activation maps, statistical models p_θ and p_θ′, empirical measures, color histograms.]

What is Optimal Transport?
A geometric toolbox to compare probability measures supported on a metric space.
[Figure: p_θ and p_θ′ as points of P(Ω), separated by the Wasserstein distance W(p_θ, p_θ′) and joined by the [McCann’95] interpolant.]

What is Optimal Transport?
A geometric toolbox to compare probability measures supported on a metric space.
[Figure: p_θ, p_θ′ and p_θ″ in P(Ω), with their Wasserstein barycenter [Agueh’11].]

OT and data-analysis
• Key developments in (applied) maths, ~’90s: [McCann’95], [JKO’98], [Benamou’98], [Gangbo’98], [Ambrosio’06], [Villani’03/’09].
• Key developments in TCS / graphics since the ’00s: [Rubner’98], [Indyk’03], [Naor’07], [Andoni’15].
๏ Small to no impact in large-scale data analysis:
  ✦ computationally heavy;
  ✦ Wasserstein distance is not differentiable.

Today’s talk: Entropy Regularized OT
• Very fast compared to usual approaches, GPGPU parallel.
• Differentiable, important if we want to use OT distances as loss functions.
• Can be automatically differentiated: simple iterative process, DL-toolboxes compatible.
• OT can become a building block in ML.

Background: OT Geometry
Consider (Ω, D), a metric space. Let µ, ν be probability measures in P(Ω).
• [Monge’81] problem: find a map T : Ω → Ω achieving
$$\inf_{T_\# \mu = \nu} \int_\Omega D(x, T(x)) \, \mu(dx)$$
[Figure: a point x transported to T(x); a Dirac mass δ_x.]

[Kantorovich’42] Relaxation
• Instead of maps T : Ω → Ω, consider probabilistic maps, i.e. couplings P ∈ P(Ω × Ω):
$$\Pi(\mu, \nu) \overset{\text{def}}{=} \{ P \in \mathcal{P}(\Omega \times \Omega) \mid \forall A, B \subset \Omega,\ P(A \times \Omega) = \mu(A),\ P(\Omega \times B) = \nu(B) \}$$

Couplings
[Figure: two couplings P(x, y), each shown as a joint density with marginals µ(x) and ν(y).]
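For finitely supported measures, the marginal constraints that define Π(µ, ν) can be checked directly. The sketch below builds the independent (product) coupling and verifies both marginals; the weights `a`, `b` and the helper `marginals` are illustrative assumptions, not taken from the slides.

```python
# Hypothetical discrete weights (assumed for illustration; each list sums to 1).
a = [0.5, 0.3, 0.2]   # weights of mu on 3 points
b = [0.25, 0.75]      # weights of nu on 2 points

# Product coupling P[i][j] = a[i] * b[j]; it always belongs to Pi(mu, nu).
P = [[ai * bj for bj in b] for ai in a]

def marginals(P):
    """Return (row sums, column sums) of a coupling stored as nested lists."""
    rows = [sum(row) for row in P]
    cols = [sum(col) for col in zip(*P)]
    return rows, cols

rows, cols = marginals(P)   # rows should match a, cols should match b
```

The product coupling ignores the geometry of Ω entirely; it only shows that Π(µ, ν) is never empty.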

Wasserstein Distance
Def. For p ≥ 1, the p-Wasserstein distance between µ, ν in P(Ω) is
$$W_p(\mu, \nu) \overset{\text{def}}{=} \left( \inf_{P \in \Pi(\mu, \nu)} \mathbb{E}_P\left[ D(X, Y)^p \right] \right)^{1/p}.$$

Wasserstein between 2 Diracs
[Figure: two Dirac masses δ_x and δ_y in (Ω, D).]
$$W_p^p(\delta_x, \delta_y) = D(x, y)^p$$

Wasserstein on Uniform Measures
$$\mu = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}, \qquad \nu = \frac{1}{n} \sum_{j=1}^{n} \delta_{y_j}$$
Cost of a permutation σ ∈ S_n:
$$C(\sigma) = \frac{1}{n} \sum_{i=1}^{n} D(x_i, y_{\sigma_i})^p$$

Optimal Assignment ⊂ Wasserstein
$$W_p^p(\mu, \nu) = \min_{\sigma \in S_n} C(\sigma)$$
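For very small n, the minimum over permutations can be computed by exhaustive search. The sketch below assumes points on the real line with D(x, y) = |x − y|; the function names and sample points are hypothetical. The n! enumeration is only meant to make the definition concrete, not to be a practical solver.

```python
from itertools import permutations

def assignment_cost(x, y, sigma, p=2):
    """C(sigma) = (1/n) * sum_i |x_i - y_sigma(i)|^p for points on the real line."""
    n = len(x)
    return sum(abs(x[i] - y[sigma[i]]) ** p for i in range(n)) / n

def wasserstein_pp_uniform(x, y, p=2):
    """W_p^p between two uniform measures on n points, by brute force over S_n."""
    n = len(x)
    return min(assignment_cost(x, y, s, p) for s in permutations(range(n)))

# Hypothetical sample points (assumed for illustration).
x = [0.0, 1.0, 3.0]
y = [2.0, 0.5, 4.0]
w22 = wasserstein_pp_uniform(x, y, p=2)   # matches 0 -> 0.5, 1 -> 2, 3 -> 4
```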

Wasserstein on Empirical Measures
Consider $\mu = \sum_{i=1}^{n} a_i \delta_{x_i}$ and $\nu = \sum_{j=1}^{m} b_j \delta_{y_j}$.
$$M_{XY} \overset{\text{def}}{=} \left[ D(x_i, y_j)^p \right]_{ij}$$
$$U(a, b) \overset{\text{def}}{=} \{ P \in \mathbb{R}_+^{n \times m} \mid P 1_m = a,\ P^T 1_n = b \}$$
[Figure: the n × m matrix of costs D(x_i, y_j)^p, with rows labelled by (x_i, a_i) and columns by (y_j, b_j); the rows of P sum to a (P 1_m = a) and its columns sum to b (P^T 1_n = b).]
Def. Optimal Transport Problem
$$W_p^p(\mu, \nu) = \min_{P \in U(a, b)} \langle P, M_{XY} \rangle$$
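To make U(a, b) and the objective ⟨P, M_XY⟩ concrete, the sketch below builds one feasible coupling with the north-west corner rule and evaluates its cost. This is a swapped-in textbook construction, not an algorithm from the slides; the names `northwest_corner` and `transport_cost` and the sample data are assumptions. On the real line with sorted supports and p ≥ 1, this monotone coupling is known to be optimal; in general it is only a feasible point of U(a, b).

```python
def northwest_corner(a, b, tol=1e-15):
    """Greedy feasible coupling in U(a, b): ship mass starting from the top-left cell."""
    a, b = list(a), list(b)              # remaining mass to send / receive
    n, m = len(a), len(b)
    P = [[0.0] * m for _ in range(n)]
    i = j = 0
    while i < n and j < m:
        t = min(a[i], b[j])              # largest mass both sides still allow
        P[i][j] += t
        a[i] -= t
        b[j] -= t
        row_done, col_done = a[i] <= tol, b[j] <= tol
        if row_done:
            i += 1                       # source i is exhausted
        if col_done:
            j += 1                       # target j is full
    return P

def transport_cost(P, M):
    """Frobenius inner product <P, M>."""
    return sum(P[i][j] * M[i][j] for i in range(len(P)) for j in range(len(P[0])))

# Hypothetical sorted supports and weights on the real line, with p = 1.
x, a = [0.0, 1.0, 3.0], [0.5, 0.3, 0.2]
y, b = [0.5, 2.0], [0.25, 0.75]
M = [[abs(xi - yj) for yj in y] for xi in x]   # M_XY for D = |x - y|, p = 1
P = northwest_corner(a, b)
cost = transport_cost(P, M)
```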

Discrete OT Problem
[Figure: the polytope U(a, b), the cost direction M_XY, and an optimal solution P*.]
Def. Dual OT problem
$$W_p^p(\mu, \nu) = \max_{\substack{\alpha \in \mathbb{R}^n,\ \beta \in \mathbb{R}^m \\ \alpha_i + \beta_j \le D(x_i, y_j)^p}} \alpha^T a + \beta^T b$$
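One direction of this primal/dual relation (weak duality) can be checked in one line. The sketch below uses only the definitions of U(a, b) and the dual constraint: for any P ∈ U(a, b) and any feasible (α, β), nonnegativity of P together with α_i + β_j ≤ D(x_i, y_j)^p gives

```latex
\langle P, M_{XY} \rangle
  = \sum_{i,j} P_{ij}\, D(x_i, y_j)^p
  \;\ge\; \sum_{i,j} P_{ij}\,(\alpha_i + \beta_j)
  = \sum_{i} \alpha_i \sum_{j} P_{ij} + \sum_{j} \beta_j \sum_{i} P_{ij}
  = \alpha^T a + \beta^T b .
```

Equality between the best primal and best dual values is then the strong duality theorem of linear programming.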

Discrete OT Problem
• A network flow solver is used in practice, with complexity O(n³ log(n)).
• Note: flow/PDE formulations [Beckman’61]/[Benamou’98] can be used for p=1/p=2 for a sparse-graph metric/Euclidean metric.
• The solution P* is unstable and not always unique.
• W_p^p(µ, ν) is not differentiable.
[Figure: the polytope U(a, b) with the cost direction M_XY; the optimum is a vertex P*, or a whole face {P*} when the solution is not unique.]

Entropic Regularization [Wilson’62]
Def. Regularized Wasserstein, γ ≥ 0
$$W_\gamma(\mu, \nu) \overset{\text{def}}{=} \min_{P \in U(a, b)} \langle P, M_{XY} \rangle - \gamma E(P)$$
$$E(P) \overset{\text{def}}{=} -\sum_{i=1}^{n} \sum_{j=1}^{m} P_{ij} \log P_{ij}$$
Note: for γ > 0, the optimal solution is unique because of the strong concavity of the entropy.
[Figure: the regularized coupling P_γ with marginals µ and ν.]

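Fast & Scalable Algorithm
Prop. If
$$P_\gamma \overset{\text{def}}{=} \operatorname*{argmin}_{P \in U(a, b)} \langle P, M_{XY} \rangle - \gamma E(P)$$
then $\exists!\, u \in \mathbb{R}_+^n,\ v \in \mathbb{R}_+^m$ such that
$$P_\gamma = \operatorname{diag}(u)\, K \operatorname{diag}(v), \qquad K \overset{\text{def}}{=} e^{-M_{XY}/\gamma}.$$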
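The factorization P_γ = diag(u) K diag(v) suggests a simple fixed-point scheme: alternately rescale u and v so that the row and column marginals match a and b (Sinkhorn-style matrix scaling). The sketch below is a minimal pure-Python version under stated assumptions: the function name `sinkhorn`, the fixed iteration count, and the sample data are illustrative, and it is not presented as the exact algorithm of the slides.

```python
import math

def sinkhorn(a, b, M, gamma, n_iter=500):
    """Alternately rescale rows and columns of K = exp(-M / gamma)."""
    n, m = len(a), len(b)
    K = [[math.exp(-M[i][j] / gamma) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # u <- a / (K v): enforces the row marginals P 1_m = a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # v <- b / (K^T u): enforces the column marginals P^T 1_n = b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Assemble P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical data on the real line (assumed for illustration), with p = 1.
x, a = [0.0, 1.0, 3.0], [0.5, 0.3, 0.2]
y, b = [0.5, 2.0], [0.25, 0.75]
M = [[abs(xi - yj) for yj in y] for xi in x]
P = sinkhorn(a, b, M, gamma=0.5)
row_sums = [sum(row) for row in P]
col_sums = [sum(col) for col in zip(*P)]
```

Each iteration uses only matrix-vector products with K and elementwise divisions, which is why the scheme parallelizes well. For very small γ the entries of K underflow, so practical implementations usually work in the log domain.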
