On the Complexity of Approximating Wasserstein Barycenters



  1. On the Complexity of Approximating Wasserstein Barycenters

     Alexey Kroshnin, Darina Dvinskikh, Pavel Dvurechensky, Alexander Gasnikov, Nazarii Tupitsa, César A. Uribe

     International Conference on Machine Learning 2019

  2. Wasserstein barycenter

     $$\hat{\nu} = \arg\min_{\nu \in \mathcal{P}_2(\Omega)} \sum_{i=1}^{m} W(\mu_i, \nu),$$

     where $W(\mu, \nu)$ is the Wasserstein distance between measures $\mu$ and $\nu$ on $\Omega$.

     WB is efficient in machine learning problems with geometric data, e.g. template image reconstruction from a random sample.

     Figure: Images from [Cuturi & Doucet, 2014]

  3. Motivation

     We consider a set of discrete measures $p_1, \dots, p_m \in S_n(1)$.

     Main question: how much work is needed to find their barycenter $\hat{q}$ with accuracy $\varepsilon$?

     $$\frac{1}{m} \sum_{l=1}^{m} W(p_l, \hat{q}) - \min_{q \in S_n(1)} \frac{1}{m} \sum_{l=1}^{m} W(p_l, q) \le \varepsilon$$

     Beyond that, the challenges are:
     - Fine discrete approximation for continuous $\nu$ and $\mu_i$ ⇒ large $n$,
     - Large amount of data ⇒ large $m$,
     - Data produced and stored distributedly (e.g. produced by a network of sensors).

  4. Background

     Following [Cuturi & Doucet, 2014], we use entropic regularization.

     $$\min_{q \in S_n(1)} \frac{1}{m} \sum_{l=1}^{m} W_\gamma(p_l, q) = \min_{\substack{q \in S_n(1), \\ \pi_l \in \Pi(p_l, q),\ l = 1, \dots, m}} \frac{1}{m} \sum_{l=1}^{m} \left\{ \langle \pi_l, C_l \rangle + \gamma H(\pi_l) \right\} \qquad (1)$$

     - $H(\pi) = \sum_{i,j=1}^{n} \pi_{ij} (\ln \pi_{ij} - 1) = \langle \pi, \ln \pi - \mathbf{1}\mathbf{1}^T \rangle$.
     - $\Pi(p, q) = \{ \pi \in \mathbb{R}_+^{n \times n} : \pi \mathbf{1} = p,\ \pi^T \mathbf{1} = q \}$.
     - $C_{ij}$ is the transport cost from point $z_i$ to $y_j$ of the supports.

     Cost of finding $W_0(p, q)$:
     - Sinkhorn's algorithm: $O\left( \frac{n^2}{\varepsilon^2} \right)$ [Altschuler, Weed, Rigollet, NeurIPS'17; Dvurechensky, Gasnikov, Kroshnin, ICML'18]
     - Accelerated Gradient Descent: $O\left( \min\left\{ \frac{n^{2.5}}{\varepsilon}, \frac{n^2}{\varepsilon^2} \right\} \right)$ [Dvurechensky, Gasnikov, Kroshnin, ICML'18; Lin, Ho, Jordan, ICML'19]
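The entropy-regularized transport problem inside (1) can be illustrated with a minimal Sinkhorn sketch. This is not the authors' code: the function name `sinkhorn`, the toy histograms, the squared-distance cost, and the fixed iteration count are all assumptions made for illustration. It relies on the standard fact that the minimizer of $\langle \pi, C \rangle + \gamma H(\pi)$ over $\Pi(p, q)$ has the form $\mathrm{diag}(a)\, e^{-C/\gamma}\, \mathrm{diag}(b)$.

```python
import numpy as np

def sinkhorn(p, q, C, gamma, n_iter=500):
    """Sketch: approximately minimize <pi, C> + gamma * H(pi) over Pi(p, q)."""
    K = np.exp(-C / gamma)           # Gibbs kernel exp(-C / gamma)
    a = np.ones_like(p)              # row scaling vector
    b = np.ones_like(q)              # column scaling vector
    for _ in range(n_iter):
        a = p / (K @ b)              # enforce row marginals: pi 1 = p
        b = q / (K.T @ a)            # enforce column marginals: pi^T 1 = q
    return a[:, None] * K * b[None, :]

# Toy data (hypothetical): 5 support points on [0, 1], squared-distance cost.
n = 5
z = np.linspace(0.0, 1.0, n)
C = (z[:, None] - z[None, :]) ** 2
p = np.full(n, 1.0 / n)
q = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

pi = sinkhorn(p, q, C, gamma=0.5)
row_err = np.abs(pi.sum(axis=1) - p).max()   # violation of pi 1 = p
col_err = np.abs(pi.sum(axis=0) - q).max()   # violation of pi^T 1 = q
```

A fixed iteration count keeps the sketch short; a practical implementation would stop once the marginal violation falls below a tolerance, and would work in the log domain when $\gamma$ is small.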

  5. Background

     Algorithms for barycenter

     $$\min_{q \in S_n(1)} \frac{1}{m} \sum_{l=1}^{m} W_\gamma(p_l, q) = \min_{\substack{q \in S_n(1), \\ \pi_l \in \Pi(p_l, q),\ l = 1, \dots, m}} \frac{1}{m} \sum_{l=1}^{m} \left\{ \langle \pi_l, C_l \rangle + \gamma H(\pi_l) \right\}.$$

     - Sinkhorn + Gradient Descent [Cuturi, Doucet, NeurIPS'13]
     - Iterative Bregman Projections [Benamou et al., SIAM J Sci Comp'15]
     - (Accelerated) Gradient Descent [Cuturi, Peyré, SIAM J Im Sci'16; Dvurechensky et al., NeurIPS'18; Uribe et al., CDC'18]
     - Stochastic Gradient Descent [Staib et al., NeurIPS'17; Claici, Chen, Solomon, ICML'18]

     The question of complexity was open.

  6. Contributions

     We prove that, to find an $\varepsilon$-approximation of the $\gamma$-regularized WB,
     - Iterative Bregman Projections (IBP) needs $\frac{1}{\gamma \varepsilon}$ iterations;
     - Accelerated Gradient Descent (AGD) needs $\sqrt{\frac{n}{\gamma \varepsilon}}$ iterations.

     Setting $\gamma = \Theta(\varepsilon / \ln n)$ yields an $\varepsilon$-approximation of the non-regularized WB with arithmetic operations complexity
     - $\widetilde{O}\left( \frac{m n^2}{\varepsilon^2} \right)$ for IBP,
     - $\widetilde{O}\left( \frac{m n^{2.5}}{\varepsilon} \right)$ for AGD.

     We propose a proximal-IBP algorithm to address the instability of IBP and AGD caused by small $\gamma$.

     We discuss scalability of the algorithms via their distributed versions:
     - IBP can be realized distributedly in a centralized architecture (master/slaves),
     - AGD can be realized in a general decentralized architecture.
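The step from the iteration counts to the arithmetic complexities can be made explicit. The sketch below assumes that one iteration of either method costs $O(mn^2)$ operations ($m$ matrix-vector products with $n \times n$ kernels); that per-iteration cost is an assumption not stated on the slide, and constants are dropped throughout.

```latex
\begin{align*}
\text{IBP:}\quad
  & \frac{1}{\gamma\varepsilon}\cdot m n^{2}
    \;\overset{\gamma = \varepsilon/\ln n}{=}\;
    \frac{m n^{2}\ln n}{\varepsilon^{2}}
    \;=\; \widetilde{O}\!\left(\frac{m n^{2}}{\varepsilon^{2}}\right), \\
\text{AGD:}\quad
  & \sqrt{\frac{n}{\gamma\varepsilon}}\cdot m n^{2}
    \;\overset{\gamma = \varepsilon/\ln n}{=}\;
    \frac{\sqrt{n\ln n}}{\varepsilon}\cdot m n^{2}
    \;=\; \widetilde{O}\!\left(\frac{m n^{2.5}}{\varepsilon}\right).
\end{align*}
```

Under this reading, IBP has the better dependence on $n$ and AGD the better dependence on $\varepsilon$.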

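  7. Iterative Bregman Projections

     $$\min_{\substack{\pi_l \mathbf{1} = p_l,\ \pi_l^T \mathbf{1} = \pi_{l+1}^T \mathbf{1}, \\ \pi_l \in \mathbb{R}_+^{n \times n},\ l = 1, \dots, m}} \frac{1}{m} \sum_{l=1}^{m} \left\{ \langle \pi_l, C_l \rangle + \gamma H(\pi_l) \right\}$$

     Dual problem:

     $$\min_{\substack{\mathbf{u}, \mathbf{v} \\ \frac{1}{m} \sum_{l=1}^{m} v_l = 0}} f(\mathbf{u}, \mathbf{v}) := \frac{1}{m} \sum_{l=1}^{m} \left\{ \langle \mathbf{1}, B_l(u_l, v_l) \mathbf{1} \rangle - \langle u_l, p_l \rangle \right\},$$

     where $\mathbf{u} = [u_1, \dots, u_m]$, $\mathbf{v} = [v_1, \dots, v_m]$, $u_l, v_l \in \mathbb{R}^n$, and

     $$B_l(u_l, v_l) := \mathrm{diag}(e^{u_l}) \exp(-C_l / \gamma)\, \mathrm{diag}(e^{v_l}).$$

     IBP is equivalent to alternating minimization for the dual problem. With $K_l := \exp(-C_l / \gamma)$, the two alternating steps are

     $$u_l^{t+1} := \ln p_l - \ln K_l e^{v_l^t}, \qquad \mathbf{v}^{t+1} := \mathbf{v}^t,$$

     $$v_l^{t+1} := \frac{1}{m} \sum_{k=1}^{m} \ln K_k^T e^{u_k^t} - \ln K_l^T e^{u_l^t}, \qquad \mathbf{u}^{t+1} := \mathbf{u}^t.$$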
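The alternating updates on this slide can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function name `ibp`, the toy histograms, the shared squared-distance cost, and the fixed iteration count are assumptions, and the histograms are assumed strictly positive so that $\ln p_l$ is finite.

```python
import numpy as np

def ibp(ps, Cs, gamma, n_iter=500):
    """Sketch of the dual alternating minimization (IBP) updates.

    ps: (m, n) array of strictly positive histograms p_l.
    Cs: list of m cost matrices C_l, each of shape (n, n).
    """
    m, n = ps.shape
    Ks = [np.exp(-C / gamma) for C in Cs]    # K_l = exp(-C_l / gamma)
    u = np.zeros((m, n))
    v = np.zeros((m, n))
    for _ in range(n_iter):
        # Step 1: u_l := ln p_l - ln(K_l e^{v_l}), v unchanged.
        for l in range(m):
            u[l] = np.log(ps[l]) - np.log(Ks[l] @ np.exp(v[l]))
        # Step 2: v_l := (1/m) sum_k ln(K_k^T e^{u_k}) - ln(K_l^T e^{u_l}).
        log_Ktu = np.stack([np.log(Ks[l].T @ np.exp(u[l])) for l in range(m)])
        v = log_Ktu.mean(axis=0) - log_Ktu
    # After step 2 every plan B_l has the same column marginal
    # e^{v_l} * (K_l^T e^{u_l}); it serves as the barycenter estimate.
    q = np.exp(log_Ktu.mean(axis=0))
    return q, u, v

# Toy data (hypothetical): two mirror-image histograms on 5 points of [0, 1].
z = np.linspace(0.0, 1.0, 5)
C = (z[:, None] - z[None, :]) ** 2
ps = np.array([[0.5, 0.2, 0.1, 0.1, 0.1],
               [0.1, 0.1, 0.1, 0.2, 0.5]])
q, u, v = ibp(ps, [C, C], gamma=0.5)
```

By construction the $v_l$ sum to zero after every pass, matching the dual constraint. For small $\gamma$ the kernel entries underflow, which is the instability that the proximal-IBP variant mentioned in the contributions is meant to address.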

  8. Accelerated Gradient Descent

     Define a symmetric p.s.d. matrix $\bar{W}$ s.t. $\mathrm{Ker}(\bar{W}) = \mathrm{span}(\mathbf{1})$. For $W := \bar{W} \otimes I_n$ and $\mathbf{q} = (q_1^T, \dots, q_m^T)^T$ it holds

     $$q_1 = \dots = q_m \iff \sqrt{W} \mathbf{q} = 0.$$

     Equivalent form of problem (1):

     $$\max_{\substack{q_1, \dots, q_m \in S_n(1) \\ \sqrt{W} \mathbf{q} = 0}} -\frac{1}{m} \sum_{l=1}^{m} \mathcal{W}_{\gamma, p_l}(q_l).$$

     Dual problem:

     $$\min_{\lambda \in \mathbb{R}^{mn}} \mathcal{W}_\gamma^*(\lambda) := \frac{1}{m} \sum_{l=1}^{m} \mathcal{W}_{\gamma, p_l}^*\big( m \underbrace{[\sqrt{W} \lambda]_l}_{\bar{\lambda}_l} \big).$$

     Run (A)GD for the dual and reconstruct the primal solution:

     $$\bar{\lambda}_l^{k+1} = \bar{\lambda}_l^{k} - \alpha_{k+1} \sum_{j=1}^{m} W_{lj} \nabla \mathcal{W}_{\gamma, p_j}^*(m \bar{\lambda}_j^{k}),$$

     $$q_l^{k+1} = \frac{1}{A_{k+1}} \sum_{i=0}^{k+1} \alpha_i\, q_l(m \bar{\lambda}_l^{i}), \quad \text{where } q_l(\cdot) = \nabla \mathcal{W}_{\gamma, p_l}^*(\cdot).$$
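The consensus constraint $\sqrt{W} \mathbf{q} = 0$ can be checked numerically on a small example. The sketch below takes $\bar{W}$ to be the Laplacian of a path graph on $m$ nodes, which is one admissible choice and an assumption made here for illustration, as are the toy vectors. It tests $W \mathbf{q} = 0$ rather than $\sqrt{W} \mathbf{q} = 0$; the two are equivalent because a symmetric p.s.d. matrix and its square root have the same kernel.

```python
import numpy as np

m, n = 4, 3   # number of measures (nodes) and histogram size

# Assumed choice of W_bar: Laplacian of a path graph 0 - 1 - 2 - 3.
A = np.zeros((m, m))
for i in range(m - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
W_bar = np.diag(A.sum(axis=1)) - A   # symmetric, p.s.d., Ker = span(1)
W = np.kron(W_bar, np.eye(n))        # W = W_bar (Kronecker product) I_n

# Stacked vector q = (q_1^T, ..., q_m^T)^T with all blocks equal.
q_same = np.tile(np.array([0.2, 0.3, 0.5]), m)

# Same vector with the first block changed, so consensus is violated.
q_diff = q_same.copy()
q_diff[:n] = [0.5, 0.3, 0.2]

residual_same = np.abs(W @ q_same).max()   # zero up to rounding
residual_diff = np.abs(W @ q_diff).max()   # strictly positive
```

Because $\bar{W}$ only couples neighbouring nodes, multiplying by $W$ needs communication along graph edges only, which is what makes the decentralized realization of AGD possible.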

  9. Thank you! Welcome to poster #203, Pacific Ballroom.
