Decentralized Exploration in Multi-Armed Bandits


  1. Decentralized Exploration in Multi-Armed Bandits Raphaël Féraud, Réda Alami, Romain Laroche raphael.feraud@orange.com, reda.alami@orange.com, Romain.Laroche@microsoft.com June 2019 R. Féraud, R. Alami, R. Laroche 1 / 19

  2. Outline: (1) Context and Motivation, (2) Decentralized Exploration Problem, (3) Decentralized Elimination Algorithm, (4) Experiments, (5) Conclusion.

  3. Context and Motivation: Sequential A/B testing use cases. Most digital applications perform sequential A/B testing to optimize their audience. For instance, the Orange web portal performs marketing optimization to promote services: if I want to promote Orange TV, which banner is the best? Should I push Game of Thrones or Sports?

  4. Context and Motivation: (Centralized) Exploration Problem. Definition 1 ($\epsilon$-optimal arm): An arm $k \in \mathcal{K}$ is said to be $\epsilon$-optimal if $\mu_k \geq \mu_{k^*} - \epsilon$, where $k^* = \arg\max_{k \in \mathcal{K}} \mu_k$, $\epsilon \in (0, 1]$, and $\mu_k$ is the mean reward of arm $k$. Centralized approach: the click stream of users is gathered and processed by a Best Arm Identification algorithm to choose, with high probability, an $\epsilon$-optimal arm. Do we really need to gather billions of logs containing users' private information to handle sequential A/B testing use cases?
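To make Definition 1 concrete, here is a minimal Python sketch that lists the $\epsilon$-optimal arms when the mean rewards are known. The function name `epsilon_optimal_arms` and the example means are illustrative assumptions, not part of the slides. In practice the means are unknown and have to be estimated by sampling.

```python
from typing import Sequence, Set


def epsilon_optimal_arms(means: Sequence[float], epsilon: float) -> Set[int]:
    """Return the indices k such that mu_k >= mu_{k*} - epsilon.

    Hypothetical helper: assumes the true mean rewards are known.
    """
    if not 0.0 < epsilon <= 1.0:
        raise ValueError("epsilon must lie in (0, 1]")
    best = max(means)  # mu_{k*}, the mean reward of the best arm
    return {k for k, mu in enumerate(means) if mu >= best - epsilon}


# Hypothetical mean rewards for three banners.
means = [0.5, 0.9, 0.85]
print(sorted(epsilon_optimal_arms(means, 0.1)))  # prints [1, 2]
```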



  8. Decentralized Exploration Problem: Problem setting. Definition 2 (message): A message is a random variable that is sent by player $n$ to other players. (1) When the event "player $n$ is active" occurs, player $n$ reads the messages received from other players. (2) Player $n$ chooses an arm to play. (3) The reward of the played arm is revealed to player $n$. (4) Player $n$ may update its set of arms and/or send a message to the other players. Goal: design an algorithm that samples effectively to find an $\epsilon$-optimal arm for each player, while ensuring privacy and minimizing the number of messages.

  9. Decentralized Exploration Problem: Privacy guarantee. We define the privacy level as the information about the preferred arms of a player that an adversary could infer by intercepting the messages of this player. Definition 3 ($(\epsilon, \eta)$-private): The decentralized algorithm $\mathcal{A}$ is $(\epsilon, \eta)$-private for finding an $\epsilon$-approximation of the best arm if, for any player $n$, $\nexists\, \eta_1$, $0 < \eta_1 < \eta < 1$, such that an adversary that knows $\mathcal{M}_n$, the set of messages of player $n$, and the algorithm $\mathcal{A}$ can infer which arm is an $\epsilon$-approximation of the best arm for player $n$ with probability at least $1 - \eta_1$: $\forall n \in \mathcal{N},\ \forall l_n \in \{1, \dots, L\},\ \mathbb{P}\left(\mathcal{K}_n(l_n) \subset \mathcal{K}_\epsilon \mid \mathcal{M}_n, \mathcal{A}\right) \geq 1 - \eta_1$, where $\mathcal{K}_\epsilon$ is the set of $\epsilon$-optimal arms, $\mathcal{K}_n$ is the set of arms of player $n$, $l_n$ is the number of times $\mathcal{K}_n$ has been updated, and $L \leq K$. $1 - \eta$ is the confidence level associated with the decision of the adversary: the higher $\eta$, the higher the privacy protection.


  11. Decentralized Elimination Algorithm: the principle. An Arm Selection Subroutine is run on each player. The players exchange the indexes of the arms that they eliminate with a high probability of failure $\eta$. The high probability of failure ensures the privacy of messages. When enough players vote for the elimination of an arm, it is eliminated for all players. Why does it work? When $M \leq N$ players independently eliminate an arm, each with a probability of failure $\eta$, the probability of failure of the group of $M$ voting players is $\delta = \eta^M$.
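The relation $\delta = \eta^M$ can be checked numerically. The sketch below computes the group failure probability for a given number of votes and, conversely, the smallest number of votes $M$ that brings the group failure under a target $\delta$. The helper names `group_failure` and `votes_needed` are assumptions made for illustration, and the calculation relies on the independence of the players' errors stated on the slide.

```python
import math


def group_failure(eta: float, votes: int) -> float:
    """Failure probability of `votes` independent voters, each failing w.p. eta."""
    return eta ** votes


def votes_needed(eta: float, delta: float) -> int:
    """Smallest M with eta**M <= delta (up to floating-point rounding)."""
    if not (0.0 < eta < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("eta and delta must lie in (0, 1)")
    m = max(1, math.ceil(math.log(delta) / math.log(eta)))
    while eta ** m > delta:  # guard against rounding in the logarithms
        m += 1
    return m


# Each vote is wrong half of the time, yet five votes fail together
# with probability 0.5**5 = 0.03125, which is below delta = 0.05.
print(votes_needed(0.5, 0.05))  # prints 5
print(group_failure(0.5, 5))    # prints 0.03125
```

A large $\eta$ keeps each individual message weak evidence of a player's preferences, while the vote count $M$ restores the overall confidence.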


  13. Decentralized Elimination Algorithm: a generic algorithm. Definition 4 (Arm Selection Subroutine): An ArmSelection subroutine takes as parameters an approximation factor $\epsilon$, a failure probability $\eta$, and a set of remaining arms $\mathcal{K}_n(l_n)$, where $l_n$ is the number of times $\mathcal{K}_n$ has been updated. It samples a remaining arm in $\mathcal{K}_n(l_n)$ and returns the set of eliminated arms $\mathcal{K}_n(l_n)$. An ArmSelection subroutine satisfies Properties 1 and 2. Property 1 (remaining $\epsilon$-optimal arm): $\forall l_n \in \{1, \dots, L\}$, $\mathcal{K}_n(l_n) \subseteq \mathcal{K}_n(l_n - 1)$ and $\mathbb{P}\left(\{\mathcal{K}_n(l_n) \cap \mathcal{K}_\epsilon = \emptyset\} \mid \mathcal{H}_{t_n},\ \mathcal{K}_n(l_n - 1) \cap \mathcal{K}_\epsilon \neq \emptyset\right) \leq \eta \times f(l_n)$, where $0 \leq f(l_n) \leq 1$, $\sum_{l_n} f(l_n) = 1$, and $\mathcal{H}_{t_n}$ is the interaction history. Property 2 (finite sample complexity): $\exists\, t_n \geq 1$, $\forall \eta \in (0, 1)$, $\forall \epsilon \in (0, 1]$, $\mathbb{P}\left(\{\mathcal{K}_n(L) \subseteq \mathcal{K}_\epsilon\} \mid \mathcal{H}_{t_n}\right) \geq 1 - \eta$.
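The generic scheme can be sketched as a small Python simulation. Everything below is an illustrative assumption rather than the authors' implementation: the `Player` class, the Hoeffding-style confidence radius, the least-pulled-first sampling rule, the shared `ballots` table standing in for exchanged messages, and the guard that never removes the last arm. Each player runs a successive-elimination style ArmSelection with failure probability $\eta$, and an arm is removed for everyone once `m_votes` distinct players have voted against it.

```python
import math
import random


class Player:
    """Hypothetical player running a successive-elimination style ArmSelection."""

    def __init__(self, n_arms, epsilon, eta):
        self.n_arms, self.epsilon, self.eta = n_arms, epsilon, eta
        self.voted = set()            # arms this player already voted against
        self.counts = [0] * n_arms    # number of pulls per arm
        self.sums = [0.0] * n_arms    # cumulated reward per arm

    def _radius(self, k):
        # Hoeffding-style radius with a union bound over arms and pulls (assumed form).
        t = self.counts[k]
        return math.sqrt(math.log(4 * self.n_arms * t * t / self.eta) / (2 * t))

    def arm_selection(self, remaining, pull):
        """Pull one arm; return the arms this player newly votes to eliminate."""
        live = remaining - self.voted
        if len(live) <= 1:
            return set()
        k = min(live, key=lambda a: (self.counts[a], a))  # least-pulled arm first
        self.counts[k] += 1
        self.sums[k] += pull(k)
        if any(self.counts[a] == 0 for a in live):
            return set()
        mean = {a: self.sums[a] / self.counts[a] for a in live}
        if all(self._radius(a) <= self.epsilon / 2 for a in live):
            # Estimates are epsilon-accurate: keep only the empirical best arm.
            votes = live - {max(live, key=lambda a: (mean[a], -a))}
        else:
            # Vote against arms whose upper bound is below the best lower bound.
            floor = max(mean[a] - self._radius(a) for a in live)
            votes = {a for a in live if mean[a] + self._radius(a) < floor}
        self.voted |= votes
        return votes


def decentralized_elimination(means, n_players, m_votes, eta, epsilon,
                              steps, pull=None, seed=0):
    """Simulate the voting scheme; return (remaining arms, messages sent)."""
    rng = random.Random(seed)
    if pull is None:  # default: Bernoulli rewards with the given means
        pull = lambda k: 1.0 if rng.random() < means[k] else 0.0
    players = [Player(len(means), epsilon, eta) for _ in range(n_players)]
    remaining = set(range(len(means)))
    ballots = {k: set() for k in remaining}  # arm -> players who voted against it
    messages = 0
    for _ in range(steps):
        n = rng.randrange(n_players)         # the event "player n is active"
        for k in players[n].arm_selection(remaining, pull):
            ballots[k].add(n)
            messages += 1                    # one arm index is broadcast
            if len(ballots[k]) >= m_votes and len(remaining) > 1:
                remaining.discard(k)         # eliminated for all players
    return remaining, messages


if __name__ == "__main__":
    print(decentralized_elimination([0.9, 0.1, 0.2], n_players=3, m_votes=2,
                                    eta=0.3, epsilon=0.1, steps=3000))
```

Only arm indexes are exchanged in this sketch, never rewards or counts, and the number of messages is bounded by the number of players times the number of arms.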

