Agnostic federated learning

Mehryar Mohri¹ ², Gary Sivek¹, Ananda Theertha Suresh¹
¹Google Research, ²Courant Institute

June 11, 2019
Federated learning scenario [McMahan et al., '17]

[Diagram: a centralized model communicating with clients A, B, …, Z]

◮ Data comes from a large number of clients (phones, sensors)
◮ Data remains distributed over the clients
◮ A centralized model is trained on the clients' data

What is the loss function?
Standard federated learning

Setting
◮ Merge the samples from all clients and minimize the loss
◮ Domains: clusters of clients
◮ Clients belong to $p$ domains: $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_p$

Training procedure
◮ $\hat{\mathcal{D}}_k$: empirical distribution of $\mathcal{D}_k$, with $m_k$ samples
◮ $\hat{\mathcal{U}}$: uniform distribution over all observed samples,
$$\hat{\mathcal{U}} = \sum_{k=1}^{p} \frac{m_k}{\sum_{i=1}^{p} m_i} \, \hat{\mathcal{D}}_k$$
◮ Minimize the loss over the uniform distribution: $\min_{h \in \mathcal{H}} L_{\hat{\mathcal{U}}}(h)$
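As a minimal sketch of this objective, with hypothetical sample counts and per-domain losses (none of these numbers are from the talk): the uniform distribution $\hat{\mathcal{U}}$ weights each domain in proportion to its sample count, so the uniform loss is a sample-size-weighted average of the per-domain losses.

```python
import numpy as np

# Hypothetical per-domain sample counts m_k for p = 3 domains.
m = np.array([100, 400, 500])

# U-hat weights each domain k by m_k / sum_i m_i.
u = m / m.sum()                        # [0.1, 0.4, 0.5]

# Given hypothetical per-domain losses L_{D_k}(h), the uniform loss
# L_{U-hat}(h) is their weighted average under u.
domain_losses = np.array([0.30, 0.10, 0.20])
uniform_loss = u @ domain_losses       # 0.17
```

Note that a domain with few samples (here the first, with loss 0.30) contributes little to the objective, which motivates the next slides.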
The loss function?

◮ The inference distribution is not the same as the training distribution
◮ E.g., training runs only when a phone is connected to wifi and is charging
◮ Permissions, hardware compatibility, network constraints
Agnostic federated learning

[Diagram: mixture $\mathcal{D}_\lambda$ over domains $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_p$]

◮ Learn a model that performs well over any mixture of the domains
◮ $\mathcal{D}_\lambda = \sum_{k=1}^{p} \lambda_k \cdot \hat{\mathcal{D}}_k$
◮ $\lambda$ is unknown and belongs to $\Lambda \subseteq \Delta_p$
◮ Minimize the agnostic loss: $\min_{h \in \mathcal{H}} \max_{\lambda \in \Lambda} L_{\mathcal{D}_\lambda}(h)$
◮ Fairness implications
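A small sketch of the inner maximization, continuing the hypothetical losses from the earlier example. Since $L_{\mathcal{D}_\lambda}(h)$ is linear in $\lambda$, the maximum over the full simplex sits at a vertex, i.e., on the single hardest domain; a restricted $\Lambda$ interpolates between that worst case and the uniform objective.

```python
import numpy as np

# Hypothetical per-domain losses L_{D_k}(h) for p = 3 domains.
domain_losses = np.array([0.30, 0.10, 0.20])

# If Lambda is the full simplex, the worst-case mixture puts all its
# weight on the hardest domain: agnostic loss = max_k L_{D_k}(h).
agnostic_loss_simplex = domain_losses.max()        # 0.30

# For a restricted Lambda given as a finite set of candidate mixtures
# (hypothetical values), take the maximum of the mixture losses.
Lambda = np.array([[0.6, 0.2, 0.2],
                   [0.2, 0.6, 0.2],
                   [0.2, 0.2, 0.6]])
agnostic_loss = (Lambda @ domain_losses).max()     # 0.24
```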
Theoretical results

Generalization bound
Assume the loss $L$ is bounded by $M$. For any $\delta > 0$, with probability at least $1 - \delta$, for all $h \in \mathcal{H}$ and $\lambda \in \Lambda$,
$$L_{\mathcal{D}_\lambda}(h) \;\le\; L_{\hat{\mathcal{D}}_\lambda}(h) + 2\,\mathfrak{R}_m(\mathcal{G}, \lambda) + M\epsilon + M\sqrt{\frac{s(\lambda \,\|\, m)}{2m} \log \frac{|\Lambda_\epsilon|}{\delta}}$$

◮ $\mathfrak{R}_m(\mathcal{G}, \lambda)$: weighted Rademacher complexity
◮ $s(\lambda \,\|\, m)$: skewness parameter, $1 + \chi^2(\lambda \,\|\, \bar{m})$ with $\bar{m}_k = m_k / m$
◮ Regularization based on the generalization bound

Efficient algorithms?
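To make the skewness term concrete, here is a small sketch (with hypothetical sample counts) using the identity $1 + \chi^2(\lambda \,\|\, \bar{m}) = \sum_k \lambda_k^2 / \bar{m}_k$: the bound is tightest when the target mixture matches the empirical sample proportions and loosens as they diverge.

```python
import numpy as np

def skewness(lam, m):
    """s(lambda || m) = 1 + chi^2(lambda || m-bar), where m-bar_k = m_k / m.

    Computed as sum_k lambda_k^2 / m-bar_k; equals 1 when lambda matches
    the empirical sample proportions and grows as the two diverge.
    """
    m_bar = m / m.sum()
    return np.sum(lam ** 2 / m_bar)

m = np.array([100, 400, 500])                   # hypothetical sample counts
print(skewness(m / m.sum(), m))                 # 1.0: no skew
print(skewness(np.array([1/3, 1/3, 1/3]), m))   # ~1.61: bound loosens
```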
Stochastic optimization as a two-player game

Algorithm: Stochastic-AFL
Initialization: $w_0 \in \mathcal{W}$ and $\lambda_0 \in \Lambda$.
Parameters: step sizes $\gamma_w > 0$ and $\gamma_\lambda > 0$.
For $t = 1$ to $T$:
1. Stochastic gradients: $\delta_w L(w_{t-1}, \lambda_{t-1})$ and $\delta_\lambda L(w_{t-1}, \lambda_{t-1})$
2. $w_t = \text{Project}\big(w_{t-1} - \gamma_w \, \delta_w L(w_{t-1}, \lambda_{t-1}),\; \mathcal{W}\big)$
3. $\lambda_t = \text{Project}\big(\lambda_{t-1} + \gamma_\lambda \, \delta_\lambda L(w_{t-1}, \lambda_{t-1}),\; \Lambda\big)$
Output: $w^A = \frac{1}{T}\sum_{t=1}^{T} w_t$ and $\lambda^A = \frac{1}{T}\sum_{t=1}^{T} \lambda_t$

Results
◮ $O(1/\sqrt{T})$ convergence
◮ Extensions to stochastic mirror descent
◮ Experimental validation of the above results
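As a rough illustration (not the paper's implementation), here is a minimal sketch of the descent-ascent loop for a hypothetical weighted least-squares objective $L(w, \lambda) = \sum_k \lambda_k L_k(w)$, with $\Lambda = \Delta_p$ and $\mathcal{W} = \mathbb{R}^d$ so the projection on $w$ is the identity. Full per-domain gradients stand in for the stochastic estimates $\delta_w L$ and $\delta_\lambda L$; all data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-domain data: L_k(w) = ||X_k w - y_k||^2 / n_k.
p, d = 3, 5
data = [(rng.normal(size=(50, d)), rng.normal(size=50)) for _ in range(p)]

def domain_loss_grad(w, k):
    X, y = data[k]
    r = X @ w - y
    return (r @ r) / len(y), 2 * X.T @ r / len(y)

def project_simplex(v):
    # Euclidean projection onto the probability simplex (Duchi et al., 2008).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v + (1 - css[rho]) / (rho + 1), 0)

# Stochastic-AFL sketch: gradient descent on w, projected ascent on lambda;
# both gradients are evaluated at the same iterate (w_{t-1}, lambda_{t-1}).
w, lam = np.zeros(d), np.ones(p) / p
gamma_w, gamma_lam, T = 0.05, 0.05, 500
w_avg, lam_avg = np.zeros(d), np.zeros(p)
for t in range(T):
    losses, grads = zip(*(domain_loss_grad(w, k) for k in range(p)))
    w = w - gamma_w * sum(l * g for l, g in zip(lam, grads))   # descent on w
    lam = project_simplex(lam + gamma_lam * np.array(losses))  # ascent on lambda
    w_avg += w / T
    lam_avg += lam / T
# (w_avg, lam_avg) are the averaged iterates w^A, lambda^A returned above.
```

The ascent step pushes weight toward the domains with the largest current loss, which is what makes the averaged pair an approximate equilibrium of the min-max game.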
Thank you! More at poster #172.