Decentralized Exploration in Multi-Armed Bandits
Raphaël Féraud, Réda Alami, Romain Laroche
raphael.feraud@orange.com, reda.alami@orange.com, Romain.Laroche@microsoft.com
June 2019

Outline
1. Context and Motivation
2. Decentralized Exploration Problem
3. Decentralized Elimination Algorithm
4. Experiments
5. Conclusion

Context and Motivation: Sequential A/B testing use cases

Most digital applications perform sequential A/B testing to optimize their audience. For instance, the Orange web portal runs marketing optimization to promote services: if I want to promote Orange TV, which banner is the best? Should I push Game of Thrones or Sports?

Context and Motivation: (Centralized) Exploration Problem

Definition 1 ($\epsilon$-optimal arm). An arm $k \in \mathcal{K}$ is said to be $\epsilon$-optimal if $\mu_k \ge \mu_{k^*} - \epsilon$, where $k^* = \arg\max_{k \in \mathcal{K}} \mu_k$, $\epsilon \in (0, 1]$, and $\mu_k$ is the mean reward of arm $k$.

Centralized approach: the click stream of users is gathered and processed by a Best Arm Identification algorithm, which chooses an $\epsilon$-optimal arm with high probability.

Do we really need to gather billions of logs containing users' private information to handle sequential A/B testing use cases?
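To make Definition 1 concrete, here is a minimal sketch that computes the set of $\epsilon$-optimal arms when the mean rewards are known. The function name `epsilon_optimal_arms` and the example means are illustrative assumptions, not part of the slides; in a real bandit problem the means are unknown and must be estimated.

```python
from typing import Sequence, Set


def epsilon_optimal_arms(means: Sequence[float], epsilon: float) -> Set[int]:
    """Return the indexes k with mu_k >= mu_{k*} - epsilon (Definition 1)."""
    if not 0.0 < epsilon <= 1.0:
        raise ValueError("epsilon must lie in (0, 1]")
    best = max(means)  # mu_{k*}, the mean reward of the best arm
    return {k for k, mu in enumerate(means) if mu >= best - epsilon}


# Hypothetical mean click rates for three banners (made-up numbers).
means = [0.10, 0.50, 0.45]
print(epsilon_optimal_arms(means, 0.1))  # arms 1 and 2 are 0.1-optimal
```

The best arm is always in the returned set, so the set is never empty.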

Decentralized Exploration Problem: Problem setting

Definition 2 (message). A message is a random variable that is sent by player $n$ to other players.

1. When the event "player $n$ is active" occurs, player $n$ reads the messages received from other players.
2. Player $n$ chooses an arm to play.
3. The reward of the played arm is revealed to player $n$.
4. Player $n$ may update its set of arms and/or send a message to the other players.

Goal: designing an algorithm that samples effectively to find an $\epsilon$-optimal arm for each player, while ensuring privacy and minimizing the number of messages.

Decentralized Exploration Problem: Privacy guarantee

We define the privacy level as the information about the preferred arms of a player that an adversary could infer by intercepting the messages of this player.

Definition 3 ($(\epsilon, \eta)$-private). The decentralized algorithm $A$ is $(\epsilon, \eta)$-private for finding an $\epsilon$-approximation of the best arm if, for any player $n$, $\nexists\, \eta_1$, $0 < \eta_1 < \eta < 1$, such that an adversary that knows $M_n$, the set of messages of player $n$, and the algorithm $A$ can infer what arm is an $\epsilon$-approximation of the best arm for player $n$ with a probability at least $1 - \eta_1$:

$$\forall n \in \mathcal{N}, \quad \mathbb{P}\big(\forall l_n \in \{1, \dots, L\},\ \mathcal{K}_n(l_n) \subseteq \mathcal{K}_\epsilon \,\big|\, M_n, A\big) \ge 1 - \eta_1,$$

where $\mathcal{K}_\epsilon$ is the set of $\epsilon$-optimal arms, $\mathcal{K}_n$ is the set of arms of player $n$, $l_n$ is the number of times $\mathcal{K}_n$ has been updated, and $L \le K$.

$1 - \eta$ is the confidence level associated with the decision of the adversary: the higher $\eta$, the higher the privacy protection.

Decentralized Elimination Algorithm: Decentralized Elimination, the principle

An Arm Selection Subroutine is run on each player.
The players exchange the indexes of the arms that they eliminate with a high probability of failure $\eta$.
The high probability of failure ensures the privacy of messages.
When enough players vote for the elimination of an arm, it is eliminated for all players.

Why does it work? When $M \le N$ players independently eliminate an arm, each with a probability of failure $\eta$, the probability of failure of the group of $M$ voting players is $\delta = \eta^M$.
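The relation $\delta = \eta^M$ can be turned around: given a target group failure probability $\delta$ and a per-player failure probability $\eta$, one can compute how many independent votes are needed. The sketch below does this and adds a small vote counter. The names `votes_needed` and `VoteTally`, and the rule "eliminate once the number of distinct voters reaches the threshold", are assumptions made for illustration; the slides only say that "enough players" must vote.

```python
from collections import defaultdict
from typing import DefaultDict, Set


def votes_needed(eta: float, delta: float) -> int:
    """Smallest M such that eta**M <= delta, assuming independent voters."""
    if not (0.0 < eta < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("eta and delta must lie in (0, 1)")
    m = 1
    while eta ** m > delta:  # loop instead of log ratio to avoid rounding issues
        m += 1
    return m


class VoteTally:
    """Counts distinct players voting to eliminate each arm (illustrative)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.voters: DefaultDict[int, Set[int]] = defaultdict(set)

    def record(self, player: int, arm: int) -> bool:
        """Register a vote; return True once the arm has enough distinct voters."""
        self.voters[arm].add(player)  # a repeated vote by the same player counts once
        return len(self.voters[arm]) >= self.threshold


m = votes_needed(eta=0.5, delta=0.01)
print(m)  # 7, since 0.5**6 > 0.01 >= 0.5**7
tally = VoteTally(threshold=m)
```

With $\eta = 0.5$, each message taken alone is wrong half of the time, which is what limits what an adversary can learn from it, while seven agreeing votes already bring the group failure probability below 1%.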

Decentralized Elimination Algorithm: Decentralized Elimination, a generic algorithm

Definition 4 (Arm Selection Subroutine). An ArmSelection subroutine takes as parameters an approximation factor $\epsilon$, a failure probability $\eta$, and a set of remaining arms $\mathcal{K}_n(l_n)$, where $l_n$ is the number of times $\mathcal{K}_n$ has been updated. It samples a remaining arm in $\mathcal{K}_n(l_n)$ and returns the set of eliminated arms $\mathcal{K}_n(l_n)$. An ArmSelection subroutine satisfies Properties 1 and 2.

Property 1 (remaining $\epsilon$-optimal arm). $\forall l_n \in \{1, \dots, L\}$, $\mathcal{K}_n(l_n) \subset \mathcal{K}_n(l_n - 1)$, and

$$\mathbb{P}\big(\{\mathcal{K}_n(l_n) \cap \mathcal{K}_\epsilon = \emptyset\} \,\big|\, H_{t_n},\ \mathcal{K}_n(l_n - 1) \cap \mathcal{K}_\epsilon \ne \emptyset\big) \le \eta \times f(l_n),$$

where $0 \le f(l_n) \le 1$ and $\sum_{l_n} f(l_n) = 1$, and $H_{t_n}$ is the interaction history.

Property 2 (finite sample complexity). $\exists\, t_n \ge 1$, $\forall \eta \in (0, 1)$, $\forall \epsilon \in (0, 1]$,

$$\mathbb{P}\big(\{\mathcal{K}_n(L) \subset \mathcal{K}_\epsilon\} \,\big|\, H_{t_n}\big) \ge 1 - \eta.$$
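A short worked derivation may help to see why the weights $f(l_n)$ in Property 1 sum to one: they split the failure budget $\eta$ across the at most $L$ updates. The sketch below is not taken from the slides. It assumes that the initial set contains an $\epsilon$-optimal arm (for instance $\mathcal{K}_n(0) = \mathcal{K}$), it drops the conditioning on the history $H_{t_n}$ for readability, and it writes $E_l$ for the event $\{\mathcal{K}_n(l) \cap \mathcal{K}_\epsilon = \emptyset\}$, a notation introduced here.

```latex
% Assumptions (not in the slides): E_0 is empty, i.e. K_n(0) contains an
% epsilon-optimal arm, and the bound of Property 1 holds for every history.
% Because the sets are nested, E_{l-1} implies E_l, so E_L splits into the
% disjoint events "the last epsilon-optimal arm is lost exactly at update l".
\begin{align*}
\mathbb{P}(E_L)
  &= \sum_{l=1}^{L} \mathbb{P}\big(E_l \cap \overline{E_{l-1}}\big) \\
  &\le \sum_{l=1}^{L} \mathbb{P}\big(E_l \mid \overline{E_{l-1}}\big) \\
  &\le \sum_{l=1}^{L} \eta \, f(l) \;=\; \eta .
\end{align*}
```

Under these assumptions, a single player keeps at least one $\epsilon$-optimal arm through all of its updates with probability at least $1 - \eta$. A uniform choice such as $f(l) = 1/L$ is one admissible split.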
