Decentralize and Randomize: Faster Algorithm for Wasserstein Barycenters
Pavel Dvurechensky, Darina Dvinskikh, Alexander Gasnikov, César A. Uribe, Angelia Nedić
Conference on Neural Information Processing Systems 2018
Wasserstein barycenter

$$\hat{\nu} = \arg\min_{\nu \in \mathcal{P}_2(\Omega)} \sum_{i=1}^{m} \mathcal{W}(\mu_i, \nu),$$

where $\mathcal{W}(\mu, \nu)$ is the Wasserstein distance between measures $\mu$ and $\nu$ on $\Omega$, and $\mathcal{P}_2(\Omega)$ is the set of probability measures on $\Omega$ with finite second moment. Wasserstein barycenters (WB) are useful in machine learning problems with geometric data, e.g., template image reconstruction from a random sample.

Figure: Images from [Cuturi, 2013]
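On the real line the barycenter has a closed form, which is useful for intuition. The sketch below assumes the following:

- $\Omega \subseteq \mathbb{R}$.
- $\mathcal{W}$ is the squared 2-Wasserstein distance.
- Every $\mu_i$ is a uniform empirical measure with the same number of atoms.

Under these assumptions, the barycenter's quantile function is the average of the quantile functions of the $\mu_i$, so averaging sorted samples is enough. The function names `w2_squared_1d` and `barycenter_1d` are hypothetical, and this is not the method of the paper.

```python
def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between two uniform empirical measures
    on the real line with the same number of atoms (sorted samples are matched)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def barycenter_1d(samples):
    """Minimizer of the sum of squared W2 distances to uniform empirical
    measures of equal size on the real line: average the sorted samples,
    i.e., average the quantile functions."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    sorted_samples = [sorted(s) for s in samples]
    # zip(*...) walks the k-th smallest atom of every measure at once
    return [sum(atoms) / len(samples) for atoms in zip(*sorted_samples)]


nu_hat = barycenter_1d([[2.0, 0.0, 1.0], [4.0, 3.0, 2.0]])
print(nu_hat)                                   # [1.0, 2.0, 3.0]
print(w2_squared_1d([0.0, 1.0, 2.0], nu_hat))   # 1.0
```

In higher dimensions there is no such closed form, which is what motivates the regularized, fixed-support formulation that follows.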
Motivation

We fix the support $z_i$, $i = 1, \dots, n$, of the barycenter: $\nu = \sum_{i=1}^{n} p_i \delta(z_i)$. We add entropic regularization with parameter $\gamma$:

$$\hat{p} = \arg\min_{p \in S_1(n)} \sum_{i=1}^{m} \mathcal{W}_{\gamma,\mu_i}(p),$$

where $S_1(n)$ is the probability simplex in $\mathbb{R}^n$.

Challenges:
- Fine discrete approximation of $\nu$ and $\mu_i$ ⇒ large $n$.
- Large amount of data ⇒ large $m$.
- Data produced and stored in a distributed way (e.g., by a network of sensors).
- Possibly continuous measures $\mu_i$.
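A minimal sketch of the fixed-support, entropically regularized barycenter for discrete $\mu_i$. It uses iterative Bregman projections, in the spirit of the Sinkhorn-type approach of [Benamou et al. '15]. It is not the algorithm of this paper, which also handles continuous $\mu_i$. The sketch assumes the following:

- All $\mu_i$ share one discrete support.
- The regularizer is the KL divergence to the Gibbs kernel $\exp(-C/\gamma)$.
- All measures get equal weights.
- `entropic_barycenter` is a hypothetical name.

```python
import math


def matvec(K, x):
    """K x for K given as a list of rows."""
    return [sum(k * xj for k, xj in zip(row, x)) for row in K]


def matvec_t(K, y):
    """K^T y for K given as a list of rows."""
    return [sum(K[i][j] * y[i] for i in range(len(K))) for j in range(len(K[0]))]


def entropic_barycenter(measures, cost, gamma, iters=200):
    """Fixed-support entropic barycenter by iterative Bregman projections.

    measures: m histograms q_s on a common support of size k (each sums to 1)
    cost:     n x k matrix, cost[i][j] = c(z_i, y_j)
    gamma:    entropic regularization parameter
    Assumption of this sketch: all m measures get the same weight 1/m.
    """
    m, n = len(measures), len(cost)
    K = [[math.exp(-c / gamma) for c in row] for row in cost]  # Gibbs kernel
    u = [[1.0] * n for _ in range(m)]
    v = [[1.0] * len(q) for q in measures]
    p = [1.0 / n] * n
    for _ in range(iters):
        # Projection 1: each plan diag(u_s) K diag(v_s) gets second marginal q_s.
        for s, q in enumerate(measures):
            v[s] = [qj / kj for qj, kj in zip(q, matvec_t(K, u[s]))]
        # Projection 2: all plans get a common first marginal p, the geometric
        # mean of their current first marginals u_s * (K v_s).
        Kv = [matvec(K, v[s]) for s in range(m)]
        p = [math.exp(sum(math.log(u[s][i] * Kv[s][i]) for s in range(m)) / m)
             for i in range(n)]
        for s in range(m):
            u[s] = [p[i] / Kv[s][i] for i in range(n)]
    return p


z = [0.0, 1.0, 2.0, 3.0, 4.0]
C = [[(a - b) ** 2 for b in z] for a in z]
p_hat = entropic_barycenter([[1, 0, 0, 0, 0], [0, 0, 0, 0, 1]], C, gamma=0.5)
print([round(x, 4) for x in p_hat])
```

Take two Dirac masses at 0 and 4 with squared-distance cost. The result concentrates at 2, blurred by the regularization. A smaller $\gamma$ sharpens it but makes $\exp(-C/\gamma)$ underflow, which is why practical implementations often work in the log domain.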
Background and contribution

| Method | Large m, n | Dist. data | Cont. µ_i | Complexity | Paper |
|---|---|---|---|---|---|
| Sinkhorn-type | √ | × | × | ? | [Cuturi & Doucet '14, Benamou et al. '15] |
| Distributed AGD | √ | √ | × | ? | [Scaman et al. '17, Uribe et al. '17, Lan et al. '17] |
| SGD-based | √ | √ | × | 1/ε² | [Staib et al. '17, Claici et al. '18] |
| This paper | √ | √ | √ | 1/ε² | |
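Contributions

- A novel Accelerated Primal-Dual Stochastic Gradient Method (APDSGD) for a general class of stochastic optimization problems with linear constraints,

$$(P):\ \min_{x \in Q \subseteq E} \{ f(x) : Ax = b \}, \qquad (D):\ \min_{\lambda} \left\{ \langle \lambda, b \rangle + \mathbb{E}_{\xi} F^*(-A^T \lambda, \xi) \right\},$$

  with complexity

$$O\left( \max\left\{ \sqrt{\frac{L_D R_D^2}{\varepsilon}},\ \frac{\sigma^2 R_D^2}{\varepsilon^2} \right\} \right)$$

  to obtain $f(\mathbb{E}\hat{x}) - f^* \le \varepsilon$ and $\|A\mathbb{E}\hat{x} - b\|_2 \le \varepsilon$. Here $L_D$ is the Lipschitz constant of the gradient of the dual objective, $R_D$ bounds the norm of a dual solution, and $\sigma^2$ bounds the variance of the stochastic dual gradient.
- A decentralized distributed algorithm for the $\gamma$-regularized Wasserstein barycenter of a set of continuous measures stored over a network with arbitrary topology. Its complexity is $O\left( mn \max\left\{ \frac{1}{\sqrt{\varepsilon\gamma}},\ \frac{m}{\varepsilon^2} \right\} \right)$ arithmetic operations.
- Experiments on the MNIST digit dataset and the IXI Magnetic Resonance dataset.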
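To illustrate how (P) is attacked through (D), here is a plain, non-accelerated stochastic gradient method on the dual, with primal recovery by averaging. It runs on a hypothetical two-dimensional toy problem. This is a simplified stand-in, not APDSGD: the constant step size `alpha`, the simple running average, and the name `stochastic_dual_gradient` are all assumptions of this sketch.

```python
import random


def stochastic_dual_gradient(steps=20000, alpha=0.1, seed=0):
    """Plain stochastic gradient descent on (D) for the hypothetical toy problem
        (P): min_x E_xi 0.5 * ||x - xi||^2  s.t.  x_1 + x_2 = 1,
    with xi uniform on [-0.5, 0.5]^2, so A = [1, 1], b = 1, and the solution
    is x* = (0.5, 0.5) with dual solution lambda* = -0.5.
    """
    rng = random.Random(seed)
    lam = 0.0
    x_sum = [0.0, 0.0]
    for _ in range(steps):
        xi = (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))
        # For F(x, xi) = 0.5||x - xi||^2: grad F^*(y, xi) = xi + y, here y = -A^T lam
        x = (xi[0] - lam, xi[1] - lam)
        # stochastic gradient of <lam, b> + F^*(-A^T lam, xi) is b - A x
        g = 1.0 - (x[0] + x[1])
        lam -= alpha * g
        # primal recovery: running average of the points x(lam_k, xi_k)
        x_sum[0] += x[0]
        x_sum[1] += x[1]
    return [s / steps for s in x_sum], lam


x_hat, lam = stochastic_dual_gradient()
print(x_hat, lam)  # x_hat near (0.5, 0.5), lam fluctuating around -0.5
```

The dual iterate alone never yields a feasible primal point. Averaging the points $x(\lambda_k, \xi_k)$ is what drives both $f(\hat{x}) - f^*$ and the constraint residual $\|A\hat{x} - b\|$ down.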
Distributed optimization framework¹

$$\min_{x \in \mathbb{R}} \sum_{i=1}^{m} f_i(x) \iff \min_{x_1, \dots, x_m \in \mathbb{R}} \sum_{i=1}^{m} f_i(x_i) \quad \text{s.t.} \quad x_1 = \dots = x_m.$$

Laplacian matrix of the communication network:

$$W = \begin{pmatrix} 2 & -1 & 0 & -1 \\ -1 & 3 & -1 & -1 \\ 0 & -1 & 1 & 0 \\ -1 & -1 & 0 & 2 \end{pmatrix}$$

For a connected network, $x_1 = \dots = x_m \iff \sqrt{W} x = 0$, which gives

$$\max_{x \in \mathbb{R}^m:\ \sqrt{W} x = 0} \ -\sum_{i=1}^{m} f_i(x_i).$$

Distributed reformulation through the dual problem:

$$\min_{\lambda \in \mathbb{R}^m} \sum_{i=1}^{m} f_i^*\big([\sqrt{W}\lambda]_i\big) = \min_{\lambda \in \mathbb{R}^m} \sum_{i=1}^{m} \mathbb{E}_{Y_i \sim \mu_i} F_i^*\big([\sqrt{W}\lambda]_i, Y_i\big).$$

¹ [Boyd et al. '11, Jakovetić et al. '15, Scaman et al. '17, Uribe et al. '17, Lan et al. '17]
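The Laplacian above can be rebuilt from the edge list of the network. Reading the matrix gives the edges 1-2, 1-4, 2-3 and 2-4, renumbered from 0 below. This sketch, with hypothetical helper names `laplacian` and `matvec`, checks two things: a consensus vector lies in the kernel of $W$, and a non-consensus vector does not. For a connected graph that kernel is exactly the consensus line, and $\sqrt{W}$ has the same kernel.

```python
def laplacian(m, edges):
    """Laplacian W = D - A of an undirected graph on nodes 0..m-1."""
    W = [[0] * m for _ in range(m)]
    for i, j in edges:
        W[i][j] -= 1      # off-diagonal: -1 for every edge
        W[j][i] -= 1
        W[i][i] += 1      # diagonal: node degree
        W[j][j] += 1
    return W


def matvec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


# Edges read off the slide's matrix (nodes renumbered from 0).
W = laplacian(4, [(0, 1), (0, 3), (1, 2), (1, 3)])
print(W)                           # [[2, -1, 0, -1], [-1, 3, -1, -1], [0, -1, 1, 0], [-1, -1, 0, 2]]
print(matvec(W, [5, 5, 5, 5]))     # [0, 0, 0, 0]  (consensus vector)
print(matvec(W, [1, 0, 0, 0]))     # [2, -1, 0, -1] (not in consensus)
```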
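Distributed stochastic gradient method in the dual

Change the variables $\xi := \sqrt{W}\lambda$. Multiplying a stochastic gradient step in $\lambda$ by $\sqrt{W}$ gives the SGD step for each node $i$:

$$\xi_i^{(k+1)} = \xi_i^{(k)} - \alpha \sum_{j=1}^{m} [W]_{ij} \nabla F_j^*\big(\xi_j^{(k)}, Y_j\big).$$

Since $[W]_{ij} = 0$ unless $j = i$ or $j$ is a neighbor of $i$, each node only needs stochastic gradients from its neighbors.

Our contribution: acceleration and a careful primal-dual analysis for solving the primal problem.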
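Below is a toy simulation of this update, using the plain (non-accelerated) step on the 4-node network from the Laplacian slide. It assumes hypothetical local functions $f_i(x) = \frac{1}{2}(x - a_i)^2$, so that $\nabla F_i^*(\xi, Y_i) = Y_i + \xi$ with $\mathbb{E} Y_i = a_i$. Each node's primal estimate is $x_i = a_i + \xi_i$. The helper names, the Gaussian noise model, and the step size are assumptions of this sketch.

```python
import random


def laplacian(m, edges):
    """Laplacian W = D - A of an undirected graph on nodes 0..m-1."""
    W = [[0.0] * m for _ in range(m)]
    for i, j in edges:
        W[i][j] -= 1.0
        W[j][i] -= 1.0
        W[i][i] += 1.0
        W[j][j] += 1.0
    return W


def decentralized_dual_sgd(a, edges, alpha=0.1, iters=2000, noise=0.0, seed=0):
    """Toy run of xi_i <- xi_i - alpha * sum_j W_ij * grad F_j^*(xi_j, Y_j).

    Node i holds f_i(x) = 0.5 * (x - a_i)^2, so grad f_i^*(y) = a_i + y; the
    sample Y_j = a_j + noise * N(0, 1) replaces a_j when noise > 0.
    """
    m = len(a)
    W = laplacian(m, edges)
    rng = random.Random(seed)
    xi = [0.0] * m  # xi = sqrt(W) lambda with lambda = 0
    for _ in range(iters):
        # local stochastic gradients, one per node
        grads = [a[j] + noise * rng.gauss(0.0, 1.0) + xi[j] for j in range(m)]
        # node i mixes only its own and its neighbours' gradients (W_ij != 0)
        xi = [xi[i] - alpha * sum(W[i][j] * grads[j] for j in range(m) if W[i][j] != 0.0)
              for i in range(m)]
    # local primal estimates x_i = grad f_i^*(xi_i)
    return [a[i] + xi[i] for i in range(m)], xi


x, xi = decentralized_dual_sgd([1.0, 2.0, 3.0, 6.0], [(0, 1), (0, 3), (1, 2), (1, 3)])
print(x)   # every entry is close to the average 3.0
```

Starting from $\xi = 0$ keeps $\sum_i \xi_i = 0$, so all $x_i$ reach the average of the $a_i$, which is the minimizer of $\sum_i f_i$. With `noise > 0` and a constant step, they only fluctuate around it.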
Experiments on MNIST dataset

Figure: panels at iterations k = 0, 10, 20, 30.
Thank you! Welcome to poster #15, Room 210 & 230 AB.