ICML 2019

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
EPFL, Switzerland
mlo.epfl.ch
June 11, 2019

S. U. Stich CHOCO-SGD

Decentralized Stochastic Optimization

\[ \min_{x \in \mathbb{R}^d} \Big[ f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x) \Big] \]

[Figure: network graph. Nodes are devices holding local objectives such as $f_i(x)$ and $f_j(x)$; edges are communication links.]

Each device has oracle access to stochastic gradients $g_i(x)$ with

\[ \mathbb{E}\, g_i(x) = \nabla f_i(x), \qquad \operatorname{Var}[g_i] \le \sigma_i^2 . \]
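
The setup above can be made concrete with a small sketch. It assumes, purely for illustration, that each device holds a local least-squares objective and that its stochastic gradient is the gradient of one uniformly sampled data point; the slides do not fix a particular $f_i$, and all function names here are hypothetical.

```python
import numpy as np

def make_local_problem(rng, m=20, d=5):
    # Hypothetical local data of one device: m samples in d dimensions.
    A = rng.normal(size=(m, d))
    b = rng.normal(size=m)
    return A, b

def local_loss(A, b, x):
    # Assumed local objective f_i(x) = (1 / 2m) * ||A x - b||^2.
    return 0.5 * np.mean((A @ x - b) ** 2)

def local_grad(A, b, x):
    # Exact gradient of the assumed f_i.
    return A.T @ (A @ x - b) / A.shape[0]

def sample_grad(A, b, x, j):
    # Gradient of the j-th sample loss 0.5 * (a_j^T x - b_j)^2.
    return A[j] * (A[j] @ x - b[j])

def stochastic_grad(A, b, x, rng):
    # Oracle g_i(x): pick one sample uniformly at random.
    return sample_grad(A, b, x, rng.integers(A.shape[0]))

def global_loss(problems, x):
    # f(x) = (1/n) * sum_i f_i(x) over all devices.
    return np.mean([local_loss(A, b, x) for A, b in problems])

rng = np.random.default_rng(0)
problems = [make_local_problem(rng) for _ in range(4)]  # n = 4 devices
x = rng.normal(size=5)

# Unbiasedness: averaging g_i over every possible sample gives grad f_i.
A0, b0 = problems[0]
avg_of_samples = np.mean(
    [sample_grad(A0, b0, x, j) for j in range(A0.shape[0])], axis=0
)
g = stochastic_grad(A0, b0, x, rng)
f_value = global_loss(problems, x)
```

Averaging the per-sample gradients over all samples reproduces the exact local gradient, which is the unbiasedness condition $\mathbb{E}\, g_i(x) = \nabla f_i(x)$ for this particular choice of oracle.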

Decentralized Stochastic Optimization

Applications: servers, mobile devices, sensors, hospitals, ...

Advantages:
• no central coordinator
• local communication vs. all-reduce
• data distributed (storage & privacy aspects)

This work: bandwidth-restricted setting where communication is a bottleneck

Data Compression for Efficient Communication

Communication Compression: Compress models/model updates before sending over the network.

This work: Arbitrary compressors, supporting the main SOTA techniques!

General Compressor: $Q : \mathbb{R}^d \to \mathbb{R}^d$ (can be biased!)

\[ \mathbb{E}_Q \| x - Q(x) \|^2 \le (1 - \delta) \| x \|^2 \qquad \forall x \in \mathbb{R}^d \]

Examples: quantization, rounding, sign, top-$k$, rank-$k$
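
As a rough illustration of the compressor condition, the sketch below checks the ratio $\|x - Q(x)\|^2 / \|x\|^2$ for two deterministic compressors. The names `top_k`, `scaled_sign` and `contraction_ratio` are illustrative, and the scaling of the sign compressor by $\|x\|_1 / d$ is an assumption about which sign variant is meant; the slide only lists "sign".

```python
import numpy as np

def top_k(x, k):
    # Keep the k entries of largest magnitude, zero out the rest.
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def scaled_sign(x):
    # Sign compressor rescaled by the mean absolute value (assumed variant).
    return np.linalg.norm(x, 1) / x.size * np.sign(x)

def contraction_ratio(x, qx):
    # ||x - Q(x)||^2 / ||x||^2, which must be at most 1 - delta.
    return np.sum((x - qx) ** 2) / np.sum(x ** 2)

rng = np.random.default_rng(0)
d, k = 100, 10
x = rng.normal(size=d)

ratio_topk = contraction_ratio(x, top_k(x, k))   # at most 1 - k/d
ratio_sign = contraction_ratio(x, scaled_sign(x))  # strictly below 1
```

For top-$k$ the condition holds with $\delta = k/d$, since the $d - k$ dropped entries are the smallest in magnitude. Both compressors are biased, which is exactly the case the general definition is meant to cover.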

Main Contribution: CHOCO-SGD

We propose CHOCO-SGD: a decentralized SGD algorithm with communication compression.

Main result: CHOCO-SGD converges at the rate

\[ f(\bar{x}_T) - f^\star = O\Big( \frac{\bar{\sigma}^2}{\mu n T} + \frac{1}{\mu^2 \delta^2 \rho^4 T^2} \Big) \]

• first term: linear speedup, matches centralized baseline
• second term: higher order term, accounting for topology and compression

Here $f$ is $\mu$-strongly convex, the variance is $\bar{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \sigma_i^2$, and the spectral gap of the topology is $\rho > 0$.

• first scheme with linear speedup for arbitrary compressors
• improves over previous approach [Tang et al., NeurIPS 18]

Key Technique: CHOCO-Gossip

We propose CHOCO-Gossip: a new algorithm with communication compression for the average consensus problem:

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

classic gossip averaging [Xiao & Boyd, 04] + compression with error feedback [Stich et al., NeurIPS 18]

• linear convergence for arbitrary compressors
• all previous gossip schemes with compression did not converge linearly (or not at all) for arbitrary compressors
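
The slide names the two ingredients but does not spell out the update rule, so the following is only a sketch under stated assumptions. `classic_gossip` is plain gossip averaging $X \leftarrow W X$ with a doubly stochastic mixing matrix. `compressed_gossip` is one plausible way to combine gossip with compressed, error-compensated messages: each node keeps a publicly known estimate `X_hat` of its value and transmits only a top-$k$ compressed correction to it. The ring topology with uniform weights $1/3$, the top-$k$ compressor, the conservative untuned step size `gamma = 0.02`, and all function names are assumptions for illustration.

```python
import numpy as np

def ring_mixing_matrix(n):
    # Ring topology: each node averages itself and its two neighbours.
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def top_k_rows(M, k):
    # Apply top-k compression independently to every row (one row per node).
    out = np.zeros_like(M)
    idx = np.argpartition(np.abs(M), -k, axis=1)[:, -k:]
    rows = np.arange(M.shape[0])[:, None]
    out[rows, idx] = M[rows, idx]
    return out

def consensus_error(X):
    # Frobenius distance of the node values from their average.
    return np.linalg.norm(X - X.mean(axis=0))

def classic_gossip(X, W, steps):
    # Uncompressed gossip averaging: X <- W X.
    for _ in range(steps):
        X = W @ X
    return X

def compressed_gossip(X, W, k, gamma, steps):
    # Assumed scheme: gossip on the public estimates X_hat, then update
    # X_hat with a compressed correction towards the new private value.
    X = X.copy()
    X_hat = np.zeros_like(X)
    L = W - np.eye(len(W))
    for _ in range(steps):
        X = X + gamma * (L @ X_hat)                # gossip step on estimates
        X_hat = X_hat + top_k_rows(X - X_hat, k)   # only this is transmitted
    return X

rng = np.random.default_rng(0)
n, d = 9, 10
X0 = rng.normal(size=(n, d))
W = ring_mixing_matrix(n)

X_plain = classic_gossip(X0, W, steps=100)
X_comp = compressed_gossip(X0, W, k=5, gamma=0.02, steps=2000)
```

Because the mixing matrix is symmetric and doubly stochastic, both variants preserve the average $\bar{x}$ exactly at every step; the compressed variant trades a smaller step size, and hence more iterations, for sending only $k$ of $d$ coordinates per message. A larger `gamma` may converge faster in practice but is not guaranteed to be stable.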

Experimental Results

Example: quantization to 4 bits

[Figure: results plotted against epochs and against transmitted data.]

Logistic regression on epsilon dataset, ring topology with n = 9 nodes.