Differentially-Private Federated Linear Bandits


  1. Differentially-Private Federated Linear Bandits
     Abhimanyu Dubey and Alex Pentland
     Media Lab and Institute for Data, Systems, and Society (IDSS)
     Massachusetts Institute of Technology
     dubeya@mit.edu
     June 2020

     Outline: Introduction (Federated Learning, Contextual Bandits, Summary); Background (Contextual Bandits, Federated Bandits, Optimism, Cooperation, Differential Privacy); Method (Algorithm Design, Algorithm, Regret Guarantees); Conclusion.

  2. Federated Learning

     Figure: Federated Learning (courtesy blogs.nvidia.com).

  3. Federated Learning

     Advantages:
     ◮ Agents have small personal datasets, resulting in weak local models.
     ◮ The federated learning model allows each agent to leverage the stronger joint model trained on data from all agents.
     ◮ Federated learning is designed to be private:
       ◮ No raw data leaves any agent.
       ◮ All messages sent to the server must keep user data private.

     Challenges:
     ◮ Communication-utility tradeoff: frequent communication can be expensive and non-private, but grants higher utility.
     ◮ Performance guarantees are non-trivial to obtain for private algorithms.

  4. Multi-Armed Bandits

     Figure: Multi-armed bandit (courtesy lilianweng.github.io).

  5. Contextual Bandits

     ◮ The most fundamental reinforcement learning problem; a basic framework to study sequential decision-making.
     ◮ Contextual bandits have numerous applications:
       ◮ Recommender systems in e-commerce.
       ◮ Portfolio selection and management.
       ◮ Channel selection in distributed communication systems.
       ◮ Information retrieval and caching.
       ◮ Power schedules for current limiting in electric vehicle batteries.

  6. Summary of Contributions

     ◮ We study the contextual bandit in a differentially-private federated setting.
     ◮ We provide the first differentially-private algorithms for both centralized and decentralized federated learning for the multi-agent contextual bandit.
     ◮ We prove rigorous bounds on the utility of our algorithms, matching near-optimal rates in terms of regret (utility) and only a factor of O(1/ε) from the optimal rate in terms of privacy.
     ◮ We additionally shed some light on the communication-utility tradeoff, and provide design guidelines for practitioners in real-world settings.

  7. Single-Agent Contextual Bandits

     ◮ In each round t, the agent is given a decision set D_t.
     ◮ They select an action x_t ∈ D_t and obtain a reward y_t, such that

           y_t = (θ*)^⊤ x_t + ε_t,

       where ε_t is i.i.d. noise, and θ* is an unknown (but fixed) parameter vector.
     ◮ The objective of the problem is to minimize regret:

           R(T) = Σ_{t=1}^{T} [ (θ*)^⊤ x*_t − (θ*)^⊤ x_t ],  where  x*_t = arg max_{x ∈ D_t} x^⊤ θ*.
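As a concrete illustration of this protocol, here is a minimal NumPy sketch (not from the slides; the toy instance, dimensions, and the deliberately naive random policy are our own choices) that plays T rounds of the linear reward model and accumulates the regret defined above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 3, 200
theta_star = np.array([0.5, -0.2, 0.8])  # unknown (but fixed) parameter vector

regret = 0.0
for t in range(T):
    D_t = rng.normal(size=(5, d))                    # decision set: 5 candidate actions
    x_t = D_t[rng.integers(5)]                       # a naive, purely random policy
    y_t = theta_star @ x_t + rng.normal(scale=0.1)   # reward: <theta*, x_t> + noise
    x_star = D_t[np.argmax(D_t @ theta_star)]        # best action in this round's set
    regret += theta_star @ x_star - theta_star @ x_t # instantaneous regret, always >= 0
```

Since x*_t maximizes x^⊤ θ* over a set that contains x_t, each summand is nonnegative; a good policy drives the per-round regret toward zero.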

  8. Federated Contextual Bandits

     ◮ M agents are each solving the same contextual bandit in parallel.
     ◮ Each agent m ∈ [M] receives their own (unique) decision sets, and selects actions independently of other agents.
     ◮ Agents communicate with each other following fixed protocols:
       ◮ Centralized Setting: agents synchronize via a central server, i.e., they send synchronization requests to the server, and the server acts as an intermediary.
       ◮ Decentralized Setting: agents directly communicate with each other over an undirected network via peer-to-peer messages.
     ◮ The objective of the problem is to minimize group regret:

           R_M(T) = Σ_{m ∈ [M]} Σ_{t=1}^{T} [ (θ*)^⊤ x*_{m,t} − (θ*)^⊤ x_{m,t} ],  where  x*_{m,t} = arg max_{x ∈ D_{m,t}} x^⊤ θ*.
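The group regret is just the per-agent regret summed over both agents and rounds. A compact sketch of the double sum (illustrative names and toy instance, again with a placeholder random policy rather than the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)
M, T, d = 4, 50, 3
theta_star = rng.normal(size=d)          # one shared theta* across all agents

group_regret = 0.0
for m in range(M):                       # sum over agents m in [M]
    for t in range(T):                   # sum over rounds t = 1..T
        D_mt = rng.normal(size=(6, d))   # agent m's own decision set at round t
        x_mt = D_mt[rng.integers(6)]     # placeholder (random) policy
        best = np.max(D_mt @ theta_star) # <theta*, x*_{m,t}>
        group_regret += best - theta_star @ x_mt
```

Note that each agent's optimum x*_{m,t} is taken over its own decision set D_{m,t}, even though θ* is common, which is what makes cooperation useful.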

  9. The Upper Confidence Bound (UCB) Algorithm

     ◮ "Optimism in the face of uncertainty" strategy, i.e., be optimistic about an arm when we are uncertain of its utility.
     ◮ In the multi-armed setting, for each arm k we compute

           UCB_k(t) = ( Σ_{i=1}^{n_k(t−1)} r_i^k ) / n_k(t−1)  +  √( 2 ln(t−1) / n_k(t−1) ),

       where the first term is the empirical mean of arm k's rewards and the second is the exploration bonus (n_k(t−1) is the number of pulls of arm k so far).
     ◮ Choose the arm with the largest UCB_k(t).
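The index above is a one-liner per arm. A small sketch (the function name and interface are ours, not the slides'), assuming every arm has been pulled at least once and t ≥ 2 so the bonus is well defined:

```python
import numpy as np

def ucb_index(rewards_per_arm, t):
    """UCB_k(t) = empirical mean + sqrt(2 ln(t-1) / n_k(t-1)) for each arm k.
    `rewards_per_arm` is a list of nonempty arrays of observed rewards."""
    means = np.array([np.mean(r) for r in rewards_per_arm])
    counts = np.array([len(r) for r in rewards_per_arm])
    bonus = np.sqrt(2.0 * np.log(t - 1) / counts)
    return means + bonus

# Equal pull counts, so the bonus is identical; arm 1's higher mean wins.
idx = ucb_index([np.array([0.1, 0.2]), np.array([0.8, 0.9])], t=10)
chosen = int(np.argmax(idx))
```

The bonus shrinks as 1/√n_k, so rarely pulled arms keep a large index and get re-explored.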

  10. The Upper Confidence Bound (UCB) Algorithm

     ◮ In the contextual bandit case, we construct an analog to the UCB in the form of a confidence set E_t.
     ◮ E_t is a region of R^d that contains θ* with high probability.
     ◮ The action is taken optimistically with respect to E_t, i.e.,

           x_t = arg max_{x ∈ D_t}  max_{θ ∈ E_t} ⟨x, θ⟩.

  11. The Upper Confidence Bound (UCB) Algorithm

     ◮ How do we construct a reasonable E_t?
     ◮ We look to the classic linear prediction problem: linear regression. Given X_{<t} = [x_1 x_2 … x_{t−1}]^⊤ and y_{<t} = [y_1 y_2 … y_{t−1}]^⊤, consider

           θ̂_t := arg min_{θ ∈ R^d}  ‖X_{<t} θ − y_{<t}‖₂² + θ^⊤ H_t θ.

     ◮ The regression solution is given by θ̂_t = (G_t + H_t)^{−1} X_{<t}^⊤ y_{<t}, where G_t = X_{<t}^⊤ X_{<t} is the Gram matrix of actions, and H_t is a regularizer.
     ◮ Since we know the finite-sample behavior of linear regression, we can center E_t around the estimate θ̂_t to obtain a reasonable algorithm.
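The closed-form solution above is a two-line computation. A minimal sketch with a synthetic instance (the toy data and the choice H_t = I, i.e. plain ridge regression, are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 100
theta_star = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(n, d))                  # past actions, rows x_1 .. x_{t-1}
y = X @ theta_star + rng.normal(scale=0.01, size=n)  # observed rewards

H = np.eye(d)                                # regularizer H_t (here: identity)
G = X.T @ X                                  # Gram matrix G_t = X^T X
theta_hat = np.linalg.solve(G + H, X.T @ y)  # (G_t + H_t)^{-1} X^T y
```

With enough well-spread actions, θ̂_t concentrates around θ*, which is exactly the finite-sample behavior the confidence set E_t is built on.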

  12. The Upper Confidence Bound (UCB) Algorithm

     ◮ We can therefore set E_t as follows (for some fixed β_t):

           E_t := { θ ∈ R^d : (θ − θ̂_t)^⊤ (G_t + H_t) (θ − θ̂_t) ≤ β_t }.

     ◮ E_t is an ellipsoid centered at θ̂_t, and β_t determines its "radius".
     ◮ The UCB can be given as

           UCB_t(x) = ⟨θ̂_t, x⟩ + √( β_t · x^⊤ (G_t + H_t)^{−1} x ).
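Because the inner maximization over the ellipsoid E_t has this closed form, optimistic action selection never needs to search over θ. A sketch of the resulting selection rule (function name and interface are ours; V stands for G_t + H_t):

```python
import numpy as np

def linucb_action(D_t, theta_hat, V, beta_t):
    """Pick argmax over actions x in D_t (rows) of
    <theta_hat, x> + sqrt(beta_t * x^T V^{-1} x), the closed form of
    max_{theta in E_t} <x, theta> for the ellipsoidal confidence set."""
    V_inv = np.linalg.inv(V)
    quad = np.einsum('ij,jk,ik->i', D_t, V_inv, D_t)  # x^T V^{-1} x per action
    ucb = D_t @ theta_hat + np.sqrt(beta_t * quad)
    return D_t[np.argmax(ucb)], ucb
```

The second term plays the role of the multi-armed exploration bonus: directions the Gram matrix has rarely seen have large x^⊤(G_t + H_t)^{−1}x and so get an optimistic boost.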
