Cooperative Multi-Agent Bandits with Heavy Tails
Abhimanyu Dubey and Alex Pentland
Media Lab and Institute for Data Systems and Society (IDSS), Massachusetts Institute of Technology
dubeya@mit.edu
ICML 2020
Multi-Armed Bandits
Figure: Multi-armed bandit (courtesy lilianweng.github.io).
Cooperative Bandits
◮ Distributed learning is an increasingly popular paradigm in ML: multiple parties collaborate to train a stronger joint model by sharing data.
◮ An alternative is to let the data remain distributed and assign one ML algorithm (agent) to each data center, i.e., federated learning.
◮ Each agent can communicate with other agents to (securely) share relevant information, e.g., over a network.
◮ The agents therefore collectively cooperate to solve their own learning problems.
Summary of Contributions
◮ In many application areas, observations are heavy-tailed, e.g., in internet traffic analysis and supply chain networks.
◮ Current cooperative bandit algorithms operate largely by distributed consensus, which averages the opinions held by agents.
◮ Consensus protocols are inherently not robust to heavy-tailed reward distributions, and they have inefficient communication complexity.
◮ Summary: In this paper, we propose algorithms for the heavy-tailed cooperative bandit that use an alternative decentralized communication protocol, resulting in efficient and robust multi-agent bandit learning.
Stochastic Multi-Armed Bandits
◮ K actions ("arms") that return rewards r_k sampled i.i.d. from K different distributions, each with mean µ_k.
◮ The problem proceeds in rounds; at each round t, the agent chooses action a_t and obtains a randomly drawn reward r(t) such that E[r(t)] = µ_{a_t}.
◮ The goal is to minimize the regret (for µ* = arg max_{k ∈ [K]} µ_k):

R(T) = T\mu^* - \sum_{k \in [K]} \mu_k\,\mathbb{E}[n_k(T)] = \sum_{k \in [K]} (\mu^* - \mu_k)\,\mathbb{E}[n_k(T)],

i.e., the best possible expected reward minus the expected reward actually obtained, which equals the expected "loss" from picking suboptimal arms.
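As a concrete illustration of this setup, below is a minimal Python sketch of a K-armed bandit with heavy-tailed rewards and the regret computed from realized arm counts. The Pareto reward model, the class name HeavyTailedBandit, and all parameter choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch of a K-armed bandit with heavy-tailed rewards.
# The Pareto reward model and the arm means are illustrative choices.
class HeavyTailedBandit:
    def __init__(self, means, tail_index=1.5, rng=None):
        self.means = np.asarray(means, dtype=float)  # mu_k for each arm
        self.tail_index = tail_index                 # Pareto shape alpha (< 2 => infinite variance)
        self.rng = rng or np.random.default_rng(0)

    def pull(self, k):
        # Classical Pareto with x_m = 1 has mean alpha / (alpha - 1); shift so the arm mean is mu_k.
        x = self.rng.pareto(self.tail_index) + 1.0
        return x - self.tail_index / (self.tail_index - 1.0) + self.means[k]

    def regret(self, counts):
        # R(T) = sum_k (mu* - mu_k) * n_k(T), estimated from the realized pull counts.
        gaps = self.means.max() - self.means
        return float(np.dot(gaps, counts))
```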
Cooperative Multi-Armed Bandits
◮ M agents are each faced with the same K-armed bandit problem.
◮ Agents are connected by a (connected, undirected) graph G.
◮ The agents must cooperate to collectively minimize the group regret:

R_{\mathcal{G}}(T) = \sum_{m \in \mathcal{G}} R_m(T)
The Upper Confidence Bound (UCB) Algorithm
◮ "Optimism in the face of uncertainty" strategy, i.e., be optimistic about an arm when we are uncertain of its utility.
◮ For each arm, we compute

Q_k(t) = \underbrace{\frac{1}{n_k(t-1)} \sum_{i=1}^{n_k(t-1)} r_i^k}_{\text{empirical mean}} + \underbrace{\sqrt{\frac{2\ln(t-1)}{n_k(t-1)}}}_{\text{UCB}(t)}.

◮ Choose the arm with the largest Q_k(t).
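A minimal sketch of this index rule in Python; the function name ucb_arm and the bookkeeping arrays counts (n_k(t-1)) and sums (running reward sums per arm) are illustrative assumptions.

```python
import math

# Sketch of the UCB rule above: empirical mean plus exploration bonus.
# counts[k] = n_k(t-1), sums[k] = sum of rewards observed for arm k so far.
def ucb_arm(counts, sums, t):
    best_k, best_q = None, -math.inf
    for k in range(len(counts)):
        if counts[k] == 0:
            return k                                   # play each arm once before using UCB
        mean = sums[k] / counts[k]                     # empirical mean
        bonus = math.sqrt(2.0 * math.log(max(t - 1, 1)) / counts[k])  # UCB(t)
        q = mean + bonus
        if q > best_q:
            best_k, best_q = k, q
    return best_k
```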
Heavy-Tailed Distributions
◮ A random variable X is light-tailed if it admits a finite moment generating function, i.e., there exists u_0 > 0 such that for all |u| ≤ u_0,

M_X(u) ≜ E[exp(uX)] < ∞.

Otherwise, X is heavy-tailed.
◮ When rewards are sub-Gaussian, the empirical mean and variance are the obvious estimators for the first two moments.
◮ They are asymptotically optimal estimators (in their rate of concentration).
◮ They can be computed in O(1) time in streaming settings.
◮ In the case of heavy-tailed rewards, we require robust estimators to obtain optimal regret.
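To make the concentration point concrete, the short experiment below (an illustrative sketch, not from the paper) compares the deviation of the empirical mean for a sub-Gaussian (standard normal) reward against a Pareto reward with infinite variance, at the same sample size.

```python
import numpy as np

# Illustrative comparison: with the same number of samples, the empirical mean
# concentrates much more slowly for a heavy-tailed (Pareto, infinite variance)
# reward than for a sub-Gaussian one.
rng = np.random.default_rng(0)
n, trials, alpha = 100, 2000, 1.5
pareto_mean = alpha / (alpha - 1.0)            # mean of classical Pareto(alpha), x_m = 1

gauss_err = [abs(rng.normal(0.0, 1.0, n).mean()) for _ in range(trials)]
pareto_err = [abs((rng.pareto(alpha, n) + 1.0).mean() - pareto_mean) for _ in range(trials)]

print("95th percentile error, Gaussian:", np.quantile(gauss_err, 0.95))
print("95th percentile error, Pareto  :", np.quantile(pareto_err, 0.95))
```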
Robust Estimators and the Running Consensus
◮ Distributed consensus works by slowly "averaging" opinions between neighboring agents. This repeated averaging causes information to diffuse throughout the network.
◮ Robust mean estimators, however, are fundamentally incompatible with naive averaging and cannot be updated in O(1) time.
◮ The trimmed-mean and Catoni estimators require O(T) consensus algorithms.
◮ The median-of-means estimator requires O(log T) consensus algorithms.
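For reference, here is a minimal sketch of the median-of-means estimator mentioned above. The function name and the default block count are illustrative; the paper's analysis ties the number of blocks to the desired confidence level.

```python
import numpy as np

# Sketch of a median-of-means estimator: split the (non-empty) sample into
# blocks, average each block, and return the median of the block means.
def median_of_means(rewards, num_blocks=8, rng=None):
    rewards = np.asarray(rewards, dtype=float)
    rng = rng or np.random.default_rng(0)
    num_blocks = max(1, min(num_blocks, len(rewards)))   # never more blocks than samples
    perm = rng.permutation(len(rewards))                 # shuffle before blocking
    blocks = np.array_split(rewards[perm], num_blocks)
    return float(np.median([b.mean() for b in blocks]))
```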
Message Passing Protocol
◮ Instead of running a consensus, each agent communicates its actions and rewards in the form of a tuple (a_t, r_t, d), where d ≤ γ is the life of the message (i.e., the message is dropped after it has been forwarded γ times).
◮ At any time t, each agent:
  ◮ Gathers all messages M(t) from its neighbors and discards stale messages.
  ◮ Chooses an arm following any algorithm and obtains a reward.
  ◮ Adds the action-reward tuple (a_t, r_t, γ) to M(t).
  ◮ Sends each message in M(t) to all its neighbors.
◮ Since we are working with individual rewards, all robust estimators can be applied to this protocol; a sketch of the message life-cycle is given below.
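A sketch of the message life-cycle just described. The Message dataclass and forward_step names are hypothetical, and decrementing the life on each forwarding hop is one natural way to implement the γ-hop limit.

```python
from dataclasses import dataclass

# Each message carries its remaining life d; it is decremented on every hop
# and the message is dropped once the life reaches 0.
@dataclass(frozen=True)
class Message:
    agent: int      # originating agent
    arm: int        # a_t
    reward: float   # r_t
    life: int       # d <= gamma, remaining hops

def forward_step(inbox, own_action, own_reward, agent_id, gamma):
    # Keep only messages that can still travel, with one unit of life spent.
    outbox = [Message(m.agent, m.arm, m.reward, m.life - 1)
              for m in inbox if m.life > 0]
    # Add this agent's fresh action-reward tuple with full life gamma.
    outbox.append(Message(agent_id, own_action, own_reward, gamma))
    return outbox   # to be sent to all neighbors
```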
Robust Message-Passing UCB
At any time t, each agent m:
◮ Gathers all messages M(t) from its neighbors and discards all messages with d = 0.
◮ Filters all unseen messages by arm k and adds the new rewards to the corresponding sets S_k^m(t).
◮ Computes the mean estimate µ̂_k^m(t) for each arm k from S_k^m(t) using any robust mean estimator.
◮ Chooses the arm that maximizes µ̂_k^m(t) + UCB_k^m(t), and obtains reward r_t.
◮ Adds the action-reward tuple (a_t, r_t, γ) to M(t).
◮ Sends each message in M(t) to all its neighbors.
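Putting the pieces together, here is a hedged sketch of one round for a single agent. It reuses the hypothetical forward_step helper and assumes some robust_mean estimator (e.g., the median-of-means sketch above) and a bandit environment with a pull(arm) method; the confidence bonus shown is a generic UCB-style term, not the paper's estimator-specific confidence interval, and deduplication of already-seen messages is omitted for brevity.

```python
import math

# Sketch of one round of a robust message-passing UCB agent.
# reward_sets[k] plays the role of S_k^m(t); robust_mean is any robust estimator.
def agent_round(agent_id, inbox, reward_sets, bandit, t, gamma, robust_mean):
    # 1. Discard dead messages and record the remaining rewards per arm.
    for m in inbox:
        if m.life > 0:
            reward_sets[m.arm].append(m.reward)

    # 2. Robust mean estimate plus a generic confidence bonus for every arm.
    def score(k):
        n = len(reward_sets[k])
        if n == 0:
            return math.inf                 # force initial exploration
        return robust_mean(reward_sets[k]) + math.sqrt(2.0 * math.log(max(t, 1)) / n)

    arm = max(range(len(reward_sets)), key=score)

    # 3. Play the arm, store the reward, and build the outgoing messages.
    reward = bandit.pull(arm)
    reward_sets[arm].append(reward)
    outbox = forward_step(inbox, arm, reward, agent_id, gamma)
    return arm, reward, outbox
```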
Lower Bounds
Lower Bound for the Cooperative Setting
Under suitable assumptions, for any ∆ ∈ (0, 1/4) and ε ∈ (0, 1], there exist K ≥ 2 heavy-tailed distributions such that any consistent algorithm obtains regret of order Ω(K ∆^{−1/ε} ln T) when run on a connected graph G.
◮ This is a generalization of the lower bound for multiple arm pulls to account for delayed feedback over connected graphs.
◮ Existing optimality rates are stated in comparison to a single agent pulling MT arms sequentially, which we demonstrate to be inaccurate with upper bounds that match the above lower bound.