
On the Complexity of Approximating Wasserstein Barycenters
Alexey Kroshnin, Darina Dvinskikh, Pavel Dvurechensky, Alexander Gasnikov, Nazarii Tupitsa, César A. Uribe
International Conference on Machine Learning 2019

Wasserstein barycenter
\[ \hat{\nu} = \arg\min_{\nu \in \mathcal{P}_2(\Omega)} \sum_{i=1}^{m} \mathcal{W}(\mu_i, \nu), \]
where $\mathcal{W}(\mu, \nu)$ is the Wasserstein distance between measures $\mu$ and $\nu$ on $\Omega$.
The Wasserstein barycenter (WB) is effective in machine learning problems with geometric data, e.g. template image reconstruction from a random sample.
Figure: Images from [Cuturi & Doucet, 2014]

Motivation
We consider a set of discrete measures $p_1, \dots, p_m \in S_n(1)$, where $S_n(1)$ denotes the probability simplex in $\mathbb{R}^n$.
Main question: how much work is needed to find their barycenter $\hat{q}$ with accuracy $\varepsilon$, i.e. such that
\[ \frac{1}{m} \sum_{l=1}^{m} \mathcal{W}(p_l, \hat{q}) - \min_{q \in S_n(1)} \frac{1}{m} \sum_{l=1}^{m} \mathcal{W}(p_l, q) \le \varepsilon \, ? \]
Further challenges:
- Fine discrete approximation of continuous $\nu$ and $\mu_i$ ⇒ large $n$.
- Large amount of data ⇒ large $m$.
- Data produced and stored in a distributed manner (e.g. by a network of sensors).

Background
Following [Cuturi & Doucet, 2014], we use entropic regularization:
\[ \min_{q \in S_n(1)} \frac{1}{m} \sum_{l=1}^{m} \mathcal{W}_\gamma(p_l, q) = \min_{\substack{q \in S_n(1), \\ \pi_l \in \Pi(p_l, q),\ l = 1, \dots, m}} \frac{1}{m} \sum_{l=1}^{m} \big\{ \langle \pi_l, C_l \rangle + \gamma H(\pi_l) \big\}, \tag{1} \]
where
$H(\pi) = \sum_{i,j=1}^{n} \pi_{ij} (\ln \pi_{ij} - 1) = \langle \pi, \ln \pi - \mathbf{1}\mathbf{1}^T \rangle$,
$\Pi(p, q) = \{ \pi \in \mathbb{R}_+^{n \times n} : \pi \mathbf{1} = p,\ \pi^T \mathbf{1} = q \}$,
and $C_{ij}$ is the transport cost from point $z_i$ to point $y_j$ of the supports.
Cost of finding $\mathcal{W}_0(p, q)$:
- Sinkhorn's algorithm: $O\left( \frac{n^2}{\varepsilon^2} \right)$ [Altschuler, Weed, Rigollet, NeurIPS'17; Dvurechensky, Gasnikov, Kroshnin, ICML'18].
- Accelerated Gradient Descent: $O\left( \min\left\{ \frac{n^{2.5}}{\varepsilon}, \frac{n^2}{\varepsilon^2} \right\} \right)$ [Dvurechensky, Gasnikov, Kroshnin, ICML'18; Lin, Ho, Jordan, ICML'19].
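For a single pair of histograms, the Sinkhorn scheme named above can be sketched as alternating row and column rescalings of the kernel $\exp(-C/\gamma)$. This is a minimal illustration, not the authors' implementation: the function name `sinkhorn`, the fixed iteration count, and the toy data are assumptions introduced here, and no stopping rule or rounding step is included.

```python
import math

def sinkhorn(p, q, C, gamma, iters=200):
    """Approximate the entropy-regularized plan between histograms p and q."""
    n = len(p)
    # Gibbs kernel K_ij = exp(-C_ij / gamma).
    K = [[math.exp(-C[i][j] / gamma) for j in range(n)] for i in range(n)]
    a, b = [1.0] * n, [1.0] * n
    for _ in range(iters):
        # Rescale rows so the plan's row sums equal p ...
        a = [p[i] / sum(K[i][j] * b[j] for j in range(n)) for i in range(n)]
        # ... then rescale columns so its column sums equal q.
        b = [q[j] / sum(K[i][j] * a[i] for i in range(n)) for j in range(n)]
    return [[a[i] * K[i][j] * b[j] for j in range(n)] for i in range(n)]

# Toy data (hypothetical): two support points at unit distance.
C = [[0.0, 1.0], [1.0, 0.0]]
p, q = [0.5, 0.5], [0.25, 0.75]
plan = sinkhorn(p, q, C, gamma=0.5)
transport_cost = sum(plan[i][j] * C[i][j] for i in range(2) for j in range(2))
```

Each iteration costs on the order of $n^2$ operations because it multiplies the $n \times n$ kernel by a vector twice.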

Background
Algorithms for barycenter
\[ \min_{q \in S_n(1)} \frac{1}{m} \sum_{l=1}^{m} \mathcal{W}_\gamma(p_l, q) = \min_{\substack{q \in S_n(1), \\ \pi_l \in \Pi(p_l, q),\ l = 1, \dots, m}} \frac{1}{m} \sum_{l=1}^{m} \big\{ \langle \pi_l, C_l \rangle + \gamma H(\pi_l) \big\}. \]
- Sinkhorn + Gradient Descent [Cuturi, Doucet, NeurIPS'13]
- Iterative Bregman Projections [Benamou et al., SIAM J Sci Comp'15]
- (Accelerated) Gradient Descent [Cuturi, Peyré, SIAM J Im Sci'16; Dvurechensky et al., NeurIPS'18; Uribe et al., CDC'18]
- Stochastic Gradient Descent [Staib et al., NeurIPS'17; Claici, Chen, Solomon, ICML'18]
The question of their complexity was open.

Contributions
We prove that, to find an $\varepsilon$-approximation of the $\gamma$-regularized WB,
- Iterative Bregman Projections (IBP) needs $\frac{1}{\gamma \varepsilon}$ iterations;
- Accelerated Gradient Descent (AGD) needs $\sqrt{\frac{n}{\gamma \varepsilon}}$ iterations.
Setting $\gamma = \Theta(\varepsilon / \ln n)$ yields an $\varepsilon$-approximation of the non-regularized WB with arithmetic complexity
- $\widetilde{O}\left( \frac{m n^2}{\varepsilon^2} \right)$ for IBP,
- $\widetilde{O}\left( \frac{m n^{2.5}}{\varepsilon} \right)$ for AGD.
We propose a proximal-IBP algorithm to address the instability of IBP and AGD caused by small $\gamma$.
We discuss the scalability of the algorithms via their distributed versions: IBP can be run in a distributed fashion in a centralized (master/slaves) architecture, while AGD can be run in a general decentralized architecture.
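The arithmetic complexity stated above can be recovered from the iteration counts by a short calculation. The sketch below assumes that one iteration of either method costs on the order of $m n^2$ operations ($m$ products of an $n \times n$ kernel with a vector); this per-iteration cost is an assumption made here for illustration and is not stated on the slide.

```latex
% Substituting \gamma = \varepsilon / \ln n into the iteration counts,
% with an assumed per-iteration cost of order m n^2:
\[
  \text{IBP:}\quad m n^2 \cdot \frac{1}{\gamma \varepsilon}
  = m n^2 \cdot \frac{\ln n}{\varepsilon^2}
  = \frac{m n^2 \ln n}{\varepsilon^2},
\]
\[
  \text{AGD:}\quad m n^2 \cdot \sqrt{\frac{n}{\gamma \varepsilon}}
  = m n^2 \cdot \frac{\sqrt{n \ln n}}{\varepsilon}
  = \frac{m n^{2.5} \sqrt{\ln n}}{\varepsilon}.
\]
```

Both expressions agree with the stated bounds up to factors logarithmic in $n$.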

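Iterative Bregman Projections
\[ \min_{\substack{\pi_l \mathbf{1} = p_l,\ \pi_l^T \mathbf{1} = \pi_{l+1}^T \mathbf{1}, \\ \pi_l \in \mathbb{R}_+^{n \times n},\ l = 1, \dots, m}} \frac{1}{m} \sum_{l=1}^{m} \big\{ \langle \pi_l, C_l \rangle + \gamma H(\pi_l) \big\} \]
Dual problem:
\[ \min_{\substack{\mathbf{u}, \mathbf{v} \\ \frac{1}{m} \sum_{l=1}^{m} v_l = 0}} f(\mathbf{u}, \mathbf{v}) := \frac{1}{m} \sum_{l=1}^{m} \big\{ \langle \mathbf{1}, B_l(u_l, v_l) \mathbf{1} \rangle - \langle u_l, p_l \rangle \big\}, \]
where $\mathbf{u} = [u_1, \dots, u_m]$, $\mathbf{v} = [v_1, \dots, v_m]$, $u_l, v_l \in \mathbb{R}^n$, and
$B_l(u_l, v_l) := \mathrm{diag}(e^{u_l}) \exp(-C_l / \gamma) \, \mathrm{diag}(e^{v_l})$.
IBP is equivalent to alternating minimization for the dual problem. Writing $K_l := \exp(-C_l / \gamma)$ (elementwise), the iterations alternate between the step
\[ u_l^{t+1} := \ln p_l - \ln K_l e^{v_l^t}, \qquad \mathbf{v}^{t+1} := \mathbf{v}^t \]
and the step
\[ v_l^{t+1} := \frac{1}{m} \sum_{k=1}^{m} \ln K_k^T e^{u_k^t} - \ln K_l^T e^{u_l^t}, \qquad \mathbf{u}^{t+1} := \mathbf{u}^t. \]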
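The two alternating dual updates above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the name `ibp_barycenter`, the fixed iteration count, and the toy histograms are hypothetical, the kernel is taken to be $\exp(-C_l/\gamma)$, all entries of each $p_l$ are assumed strictly positive so the logarithms are defined, and no log-sum-exp stabilization for small $\gamma$ is attempted.

```python
import math

def ibp_barycenter(ps, Cs, gamma, iters=200):
    """Alternate the u-step and v-step of the dual; return the common column marginal."""
    m, n = len(ps), len(ps[0])
    Ks = [[[math.exp(-C[i][j] / gamma) for j in range(n)] for i in range(n)]
          for C in Cs]
    v = [[0.0] * n for _ in range(m)]
    mean_w = [0.0] * n
    for _ in range(iters):
        # u-step: u_l = ln p_l - ln(K_l e^{v_l}).
        u = [[math.log(ps[l][i])
              - math.log(sum(Ks[l][i][j] * math.exp(v[l][j]) for j in range(n)))
              for i in range(n)] for l in range(m)]
        # w_l = ln(K_l^T e^{u_l}), the log column marginal before the v-step.
        w = [[math.log(sum(Ks[l][i][j] * math.exp(u[l][i]) for i in range(n)))
              for j in range(n)] for l in range(m)]
        # v-step: v_l = (1/m) sum_k w_k - w_l, so every plan gets the same column marginal.
        mean_w = [sum(w[l][j] for l in range(m)) / m for j in range(n)]
        v = [[mean_w[j] - w[l][j] for j in range(n)] for l in range(m)]
    # After the v-step the shared column marginal is exp(mean_w).
    return [math.exp(x) for x in mean_w]

# Toy data (hypothetical): three grid points, squared-distance cost, two mirrored histograms.
xs = [0.0, 0.5, 1.0]
C = [[(x - y) ** 2 for y in xs] for x in xs]
ps = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
q = ibp_barycenter(ps, [C, C], gamma=0.5)
```

The returned vector is the entrywise geometric mean of the column marginals, so it sums to one only in the limit; for the mirrored inputs used here it should also be symmetric.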

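Accelerated Gradient Descent
Define a symmetric p.s.d. matrix $\bar{W}$ such that $\mathrm{Ker}(\bar{W}) = \mathrm{span}(\mathbf{1})$. For $W := \bar{W} \otimes I_n$ and $\mathbf{q} = (q_1^T, \dots, q_m^T)^T$ it holds that
\[ q_1 = \dots = q_m \iff \sqrt{W} \mathbf{q} = 0. \]
Equivalent form of problem (1), with $\mathcal{W}_{\gamma, p_l}(q_l) := \mathcal{W}_\gamma(p_l, q_l)$:
\[ \max_{\substack{q_1, \dots, q_m \in S_n(1) \\ \sqrt{W} \mathbf{q} = 0}} -\frac{1}{m} \sum_{l=1}^{m} \mathcal{W}_{\gamma, p_l}(q_l). \]
Dual problem:
\[ \min_{\lambda \in \mathbb{R}^{mn}} \mathcal{W}^*_\gamma(\lambda) := \frac{1}{m} \sum_{l=1}^{m} \mathcal{W}^*_{\gamma, p_l}\big( m \underbrace{[\sqrt{W} \lambda]_l}_{\bar{\lambda}_l} \big). \]
Run (A)GD for the dual and reconstruct the primal solution:
\[ \bar{\lambda}_l^{k+1} = \bar{\lambda}_l^{k} - \frac{\alpha_{k+1}}{m} \sum_{j=1}^{m} W_{lj} \nabla \mathcal{W}^*_{\gamma, p_j}(\bar{\lambda}_j^{k+1}), \]
\[ q_l^{k+1} = \frac{1}{A_{k+1}} \sum_{i=0}^{k+1} \alpha_i \, q_l(\bar{\lambda}_l^{i}), \quad \text{where } q_l(\cdot) = \nabla \mathcal{W}^*_{\gamma, p_l}(\cdot). \]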
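The map $q_l(\cdot) = \nabla \mathcal{W}^*_{\gamma, p_l}(\cdot)$ used in the primal reconstruction can be evaluated in closed form. The sketch below assumes the entropy $H(\pi) = \sum_{i,j} \pi_{ij}(\ln \pi_{ij} - 1)$ from problem (1); under that assumption, the first-order conditions of the conjugate give $[q(\lambda)]_j = \sum_i p_i \, \mathrm{softmax}_j\big((\lambda_j - C_{ij})/\gamma\big)$. This formula does not appear on the slide, and the name `grad_conjugate` and the toy inputs are hypothetical.

```python
import math

def grad_conjugate(lam, p, C, gamma):
    """Gradient of the conjugate at lam: each row i spreads mass p_i by a softmax over j."""
    n = len(p)
    q = [0.0] * n
    for i in range(n):
        scores = [(lam[j] - C[i][j]) / gamma for j in range(n)]
        top = max(scores)  # shift by the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        for j in range(n):
            q[j] += p[i] * weights[j] / total
    return q

# Toy data (hypothetical): two support points at unit distance.
C = [[0.0, 1.0], [1.0, 0.0]]
p = [0.5, 0.5]
q_zero = grad_conjugate([0.0, 0.0], p, C, gamma=1.0)
q_tilt = grad_conjugate([0.0, 1.0], p, C, gamma=1.0)
```

The output always lies in the probability simplex, which is why the weighted average of these gradients in the reconstruction step is itself a valid histogram; increasing one coordinate of the dual variable shifts mass toward that coordinate.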

Thank you! Welcome to poster #203, Pacific Ballroom.
