cpSGD: communication-efficient and differentially-private distributed SGD
Naman Agarwal, Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan


  1. cpSGD: communication-efficient and differentially-private distributed SGD Naman Agarwal, Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan

  2. Distributed learning with mobile devices Train a centralized model; data stays on mobile phones. In each iteration...

  3. Server sends model to clients... w ∈ R^d: the model vector

  4. Clients send updates back... w ← w − learning_rate · ∑_i δw_i / n. n: number of clients; δw_i: gradient of the i-th client
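The update rule on this slide can be sketched in a few lines of NumPy. This is an illustrative toy (the model size, client count, and gradient values below are hypothetical), not the paper's implementation:

```python
import numpy as np

def server_round(w, client_gradients, learning_rate=0.1):
    """One round of distributed SGD: average the clients' gradients
    (delta_w_i) and take a step: w <- w - learning_rate * sum_i(dw_i)/n."""
    mean_grad = np.mean(client_gradients, axis=0)  # sum_i dw_i / n
    return w - learning_rate * mean_grad

# toy example (hypothetical values): d = 3 model coordinates, n = 4 clients
w = np.zeros(3)
client_grads = [np.full(3, float(i)) for i in range(4)]  # gradients 0, 1, 2, 3
w_new = server_round(w, client_grads)
```

Each client only ships its gradient δw_i; the raw training data never leaves the phone.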

  5. Challenge I: uplink communication is expensive. w ← w − learning_rate · ∑_i Q(δw_i) / n. Q: quantization

  6. How to design the quantization? ● Convergence of SGD depends on the MSE of the estimated gradient. ● Sufficient to study: bits vs. quantization error in distributed mean estimation. ○ No compression (float): 32 bits per coordinate; 0 MSE. ○ Binary quantization: 1 bit per coordinate; O(d/n) MSE. ○ Variable-length coding: O(1/n) MSE. ○ [Suresh et al., 17] [Alistarh et al., 17] [Wen et al., 17] [Bernstein et al., 18]
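The binary-quantization point above can be made concrete: each client sends 1 bit per coordinate, chosen so the estimate is unbiased, and averaging over n clients drives the error down. A minimal sketch (the unbiased min/max scheme here follows the flavor of [Suresh et al., 17]; the dimensions and seed are arbitrary):

```python
import numpy as np

def binary_quantize(x, rng):
    """1-bit stochastic quantization: each coordinate is sent as either
    min(x) or max(x), with probabilities chosen so that E[Q(x)] = x."""
    lo, hi = x.min(), x.max()
    if hi == lo:                        # constant vector: nothing to quantize
        return x.copy()
    p_hi = (x - lo) / (hi - lo)         # probability of sending hi
    send_hi = rng.random(x.shape) < p_hi
    return np.where(send_hi, hi, lo)

# distributed mean estimation: averaging n independent 1-bit estimates
# shrinks the MSE roughly like 1/n per coordinate (O(d/n) in total)
rng = np.random.default_rng(0)
d, n = 100, 50
clients = [rng.standard_normal(d) for _ in range(n)]
true_mean = np.mean(clients, axis=0)
estimate = np.mean([binary_quantize(x, rng) for x in clients], axis=0)
mse = float(np.mean((estimate - true_mean) ** 2))
```

Besides the 1 bit per coordinate, each client also needs to send the two floats (lo, hi) so the server can decode.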

  7. Challenge II: user privacy is important ● Differential privacy (DP) ○ Removing or changing a single client’s data should not result in a big difference in the estimated mean. ○ Achieved by adding Gaussian noise [Abadi et al., 16]. Goal of this paper: ● both communication efficiency and differential privacy.

  8. Attempt 1: add Gaussian noise on the server. Server computes ∑_i Q(x_i)/n + Gaussian noise. ● DP results readily available ○ assuming the L2 norm of the gradient is bounded (gradient clipping). ● Server has to be trustworthy.
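Attempt 1 can be sketched as follows: clip each update to a bounded L2 norm, sum, add Gaussian noise, and average. This is an illustrative sketch in the style of DP-SGD [Abadi et al., 16]; the parameter name `noise_mult` (sigma divided by the clip norm) is a hypothetical choice, not notation from the paper:

```python
import numpy as np

def clip_l2(x, clip_norm):
    """Scale x down so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(x)
    return x * (clip_norm / norm) if norm > clip_norm else x

def server_gaussian_mean(updates, clip_norm, noise_mult, rng):
    """A *trusted* server clips each client update, sums them, and adds
    Gaussian noise calibrated to the clip norm before averaging."""
    clipped_sum = np.sum([clip_l2(u, clip_norm) for u in updates], axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=clipped_sum.shape)
    return (clipped_sum + noise) / len(updates)

# with noise_mult = 0 this reduces to the plain clipped mean
rng = np.random.default_rng(0)
updates = [np.array([3.0, 4.0]), np.array([0.6, 0.8])]
mean_est = server_gaussian_mean(updates, clip_norm=1.0, noise_mult=0.0, rng=rng)
```

The weakness the slide points out is visible in the code: the server sees the raw (un-noised) updates, so every client must trust it.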

  9. Attempt 2: add Gaussian noise on the client. Server computes ∑_i Q(x_i)/n. ● Noise added after quantization: no communication efficiency (the message is continuous again). ● Noise added before quantization: hard to analyze.

  10. cpSGD: add binomial noise after quantization. Server computes ∑_i Q(x_i)/n.
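The idea on this slide can be sketched end to end: quantize onto an integer grid, add discrete Binomial(m, p) noise on the client, and let the server subtract the known noise mean after averaging. This is an illustrative sketch, not the paper's exact mechanism or parameterization — `levels`, `m`, and `p` are hypothetical parameter names, and coordinates are assumed pre-clipped to [-1, 1]:

```python
import numpy as np

def client_message(x, levels, m, p, rng):
    """Client side (sketch): stochastically round each coordinate onto an
    integer grid with `levels` steps, then add Binomial(m, p) noise.
    Everything stays integer-valued, so the message remains cheap to
    encode, while Binomial(m, p) approximates the Gaussian noise needed
    for differential privacy."""
    scaled = (np.clip(x, -1.0, 1.0) + 1.0) / 2.0 * levels  # map to [0, levels]
    floor = np.floor(scaled)
    q = floor + (rng.random(x.shape) < (scaled - floor))   # unbiased rounding
    return q + rng.binomial(m, p, size=x.shape)            # discrete noise

def server_decode(messages, levels, m, p):
    """Server side: average the integer messages, subtract the known
    noise mean m*p, and undo the grid scaling."""
    avg = np.mean(messages, axis=0) - m * p
    return avg / levels * 2.0 - 1.0

# sanity check with the noise turned off (m = 0): the decoded mean matches
# the true value whenever it lands exactly on the grid
rng = np.random.default_rng(0)
msgs = [client_message(np.full(4, 0.5), levels=64, m=0, p=0.5, rng=rng)
        for _ in range(3)]
decoded = server_decode(msgs, levels=64, m=0, p=0.5)
```

Because both the quantized value and the binomial noise are integers, the sum over clients is also an integer vector, which is what lets secure aggregation and variable-length coding still apply.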

  11. cpSGD ● Maintains communication efficiency ○ Binomial noise is discrete. ● Differentially private ○ Binomial noise approximates Gaussian noise. ○ Analysis extended to d dimensions with an improved bound. ● Works if the server is negligent but not malicious. ● Works even if clients do not trust the server ○ via secure aggregation.

  12. For d variables and n ≈ d clients, cpSGD uses ● O(log log(nd)) bits of communication per client per coordinate ● constant privacy. Tue Dec 4th, 05:00–07:00 PM, Room 210 & 230 AB, #27
