The Dueling Bandits Problem Yisong Yue Collaborators - PowerPoint PPT Presentation

The Dueling Bandits Problem
Yisong Yue

Collaborators: Yanan Sui, Vincent Zhuang, Josef Broder, Joel Burdick, Thorsten Joachims, Bobby Kleinberg

Outline
• Brief Overview of Multi-Armed Bandits
  – Sequential Experimental Design
• Dueling Bandits
  – Mathematical properties
  – Connections to other problems
• Recent Results & Ongoing Research

Multi-Armed Bandit Problem (stochastic version)
• K actions (aka arms or bandits)
• Each action has an average reward: $\mu_k$
  – Unknown to us
  – Assume WLOG that $\mu_1$ is largest
• For $t = 1, \dots, T$
  – Algorithm chooses action $a(t)$
  – Receives random reward $y(t)$, with expectation $\mu_{a(t)}$
  – Algorithm only receives feedback on the chosen action
• Goal: minimize the "regret" $T\mu_1 - (\mu_{a(1)} + \mu_{a(2)} + \dots + \mu_{a(T)})$
  – $T\mu_1$: expected reward if we had perfect information to start
  – $\mu_{a(1)} + \dots + \mu_{a(T)}$: expected reward of the algorithm
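A minimal sketch of this setup, assuming Bernoulli (0/1) rewards. The names `expected_regret` and `run_uniform_policy` are hypothetical, and the uniform-random baseline is only there to show what the regret formula measures: a policy that ignores feedback keeps paying a constant cost per round.

```python
import random

def expected_regret(mu, actions):
    """Regret T * mu_best - (mu[a(1)] + ... + mu[a(T)]) of a sequence of chosen arms.

    mu: mean reward of each arm (unknown to a real algorithm; used here only to score it).
    actions: the arm indices a(1), ..., a(T) that the algorithm chose.
    """
    best = max(mu)  # plays the role of mu_1 without requiring arm 0 to be the best
    return len(actions) * best - sum(mu[a] for a in actions)

def run_uniform_policy(mu, T, seed=0):
    """Hypothetical baseline that ignores feedback and picks arms uniformly at random.

    Rewards y(t) are simulated as Bernoulli draws with mean mu[a(t)] (an assumption);
    only the chosen arm's reward is observed in each round.
    """
    rng = random.Random(seed)
    actions, rewards = [], []
    for _ in range(T):
        a = rng.randrange(len(mu))
        y = 1 if rng.random() < mu[a] else 0
        actions.append(a)
        rewards.append(y)
    return actions, rewards

MU = [0.7, 0.5, 0.2]
actions, rewards = run_uniform_policy(MU, T=1000)
print(expected_regret(MU, actions))  # roughly 0.23 per round, i.e. linear in T
```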

Example: Interactive Personalization
Each round the algorithm shows one news category and observes whether the user likes it.

Round 1: show Sports → not liked (total likes: 0)
Round 2: show Politics → liked (total likes: 1)
Round 3: show World → not liked (total likes: 1)
Round 4: show Economy → liked (total likes: 2)
…

State after round 4:
Category        Celebrity   Economy   Politics   Sports   World
Average Likes   --          1         1          0        0
# Shown         0           1         1          1        1
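The bookkeeping in the table above can be sketched as a small per-arm counter. This is an illustrative sketch, not code from the talk: `ArmStats` and `replay` are hypothetical names, and `None` stands in for the "--" entries (no feedback yet).

```python
from dataclasses import dataclass

@dataclass
class ArmStats:
    shown: int = 0
    likes: int = 0

    def average(self):
        # None plays the role of "--" in the table: the category was never shown.
        return None if self.shown == 0 else self.likes / self.shown

def replay(rounds, arms):
    """Replay (category, liked) feedback and return per-category statistics."""
    stats = {arm: ArmStats() for arm in arms}
    for arm, liked in rounds:
        stats[arm].shown += 1
        stats[arm].likes += int(liked)
    return stats

ARMS = ["Celebrity", "Economy", "Politics", "Sports", "World"]
ROUNDS = [("Sports", False), ("Politics", True), ("World", False), ("Economy", True)]

stats = replay(ROUNDS, ARMS)
total_likes = sum(s.likes for s in stats.values())
print({arm: s.average() for arm, s in stats.items()}, total_likes)
```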

What Should Algorithm Recommend?
After more rounds (total likes: 24):
Category        Celebrity   Economy   Politics   Sports   World
Average Likes   --          0.44      0.4        0.33     0.2
# Shown         0           25        10         15       20

• Exploit: Economy
• Explore: Celebrity
• Best: Politics

How to Optimally Balance the Explore/Exploit Tradeoff? Characterized by the Multi-Armed Bandit Problem
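To make "exploit" versus "explore" concrete, here is a sketch that scores the categories in the table. The like counts (11, 4, 5, 4) are reconstructed as average × # shown, rounding 0.33 × 15 to 5, which matches the total of 24. UCB1 is not named on the slide; it is swapped in as one standard way to add an exploration bonus, and the function names are hypothetical.

```python
import math

# (likes, times shown) per category, reconstructed from the table as average x # shown.
COUNTS = {
    "Celebrity": (0, 0),
    "Economy": (11, 25),
    "Politics": (4, 10),
    "Sports": (5, 15),
    "World": (4, 20),
}

def greedy_choice(counts):
    """Exploit: the highest empirical average among categories shown at least once."""
    averages = {arm: likes / n for arm, (likes, n) in counts.items() if n > 0}
    return max(averages, key=averages.get)

def ucb1_choice(counts):
    """UCB1: empirical average plus an exploration bonus sqrt(2 ln t / n).

    A category that was never shown gets an infinite index, so it is tried first.
    """
    t = sum(n for _, n in counts.values())  # total rounds so far

    def index(arm):
        likes, n = counts[arm]
        if n == 0:
            return math.inf
        return likes / n + math.sqrt(2 * math.log(t) / n)

    return max(counts, key=index)

print(greedy_choice(COUNTS))  # Economy
print(ucb1_choice(COUNTS))    # Celebrity
```

If Celebrity is removed from `COUNTS`, `ucb1_choice` returns Politics (index about 1.32 versus about 1.02 for Economy): Politics has been shown only 10 times, so its bonus term is larger.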

Regret over time horizon $T$:
$\mathrm{OPT} = \mu_1 + \mu_1 + \dots + \mu_1$
$\mathrm{ALG} = \mu_{a(1)} + \mu_{a(2)} + \dots + \mu_{a(T)}$
$R(T) = \mathrm{OPT} - \mathrm{ALG}$
• Opportunity cost of not knowing preferences
• "No-regret" if $R(T)/T \to 0$
  – Efficiency measured by convergence rate
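A toy illustration of the "no-regret" condition, using two hypothetical, hand-written choice sequences on a two-arm problem with assumed means 0.5 and 0.3. Playing the worse arm every round keeps $R(T)/T$ constant; playing it only about $\log_2 T$ times drives $R(T)/T$ toward 0.

```python
def average_regret(mu, actions):
    """R(T)/T for a sequence of chosen arms, with R(T) = T * max(mu) - sum of mu[a(t)]."""
    T = len(actions)
    return (T * max(mu) - sum(mu[a] for a in actions)) / T

def always_suboptimal(T):
    # Plays the worse arm (index 1) every round: R(T) grows linearly, R(T)/T stays constant.
    return [1] * T

def rarely_suboptimal(T):
    # Plays arm 1 only when t is a power of two (about log2 T rounds), otherwise the best arm 0:
    # R(T) grows like log T, so R(T)/T -> 0.
    return [1 if (t & (t - 1)) == 0 else 0 for t in range(1, T + 1)]

MU = [0.5, 0.3]
for T in (16, 256, 4096):
    print(T, average_regret(MU, always_suboptimal(T)), average_regret(MU, rarely_suboptimal(T)))
```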

Thompson Sampling
• Maintain distribution over rewards
  – $P(\mu_1, \dots, \mu_K \mid Y)$
• Every round:
  – Sample $\mu_1, \dots, \mu_K$ from $P(\mu_1, \dots, \mu_K \mid Y)$
  – Play arm with highest $\mu_a$
  – Incorporate feedback into $Y$
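The three steps above can be sketched for the common Beta-Bernoulli case. This assumes 0/1 rewards and an independent uniform Beta(1, 1) prior per arm, so that $P(\mu_1, \dots, \mu_K \mid Y)$ is a product of Beta distributions; the conjugate choice and the function name are illustrative assumptions, not details from the slide.

```python
import random

def thompson_sampling(mu, T, seed=0):
    """Beta-Bernoulli Thompson sampling.

    mu: true Bernoulli means, used only to simulate feedback y(t).
    Returns the list of arms played.
    """
    rng = random.Random(seed)
    K = len(mu)
    successes = [0] * K  # posterior for arm k is Beta(1 + successes[k], 1 + failures[k])
    failures = [0] * K
    played = []
    for _ in range(T):
        # Sample one plausible mean per arm from its posterior.
        samples = [rng.betavariate(1 + successes[k], 1 + failures[k]) for k in range(K)]
        # Play the arm whose sampled mean is highest.
        a = max(range(K), key=lambda k: samples[k])
        y = 1 if rng.random() < mu[a] else 0
        # Incorporate the feedback into that arm's posterior.
        if y:
            successes[a] += 1
        else:
            failures[a] += 1
        played.append(a)
    return played

played = thompson_sampling([0.2, 0.5, 0.8], T=2000)
print(played.count(2) / len(played))  # fraction of rounds spent on the best arm
```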

Incentivizing Exploration
Regret bound: $O\left(\frac{K}{\epsilon}\log T\right)$
• $K$: # arms
• $\epsilon$: gap between best & 2nd best
• $T$: time horizon
[Agrawal & Goyal; COLT 2012]
Images from Chu-Cheng Hsieh
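As a rough worked example of how this bound scales, take illustrative values $K = 5$, $\epsilon = 0.1$, $T = 10^6$, and ignore the constant hidden by $O(\cdot)$ (the base of the logarithm only changes that constant):

```latex
\frac{K}{\epsilon}\,\ln T \;=\; \frac{5}{0.1}\,\ln\!\left(10^{6}\right) \;\approx\; 50 \times 13.8 \;\approx\; 691,
\qquad
\frac{691}{T} \;=\; \frac{691}{10^{6}} \;\approx\; 7 \times 10^{-4}
```

Total regret grows only logarithmically in $T$, so the per-round regret $R(T)/T$ shrinks toward zero, which is the "no-regret" property.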

The Motivating Problem
• Slot Machine = One-Armed Bandit
  – Each arm has a different payoff
• Goal: Minimize regret from pulling suboptimal arms
Image source: http://research.microsoft.com/en-us/projects/bandits/

Many Applications
• Online Advertising
• Search Engines
• Recommender Systems
• Sequential Experimental Design
• Personalized Clinical Treatment

What if Rewards aren't Directly Measurable?

Evaluating using Click Data
• Interpretation 1: Result #2 is good. (Absolute)
• Interpretation 2: Result #2 is better than Result #1. (Relative / Preference)
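One common way to turn the relative interpretation into training data is the "clicked > skipped above" heuristic: a clicked result is taken as preferred over every unclicked result ranked above it. The sketch below assumes that heuristic for illustration; it is not necessarily how this talk derives preferences, and the result names are hypothetical.

```python
def click_skip_above_preferences(ranking, clicked):
    """Return (preferred, less_preferred) pairs under the 'clicked > skipped above' heuristic.

    ranking: results in the order they were shown.
    clicked: the results the user clicked.
    """
    clicked = set(clicked)
    prefs = []
    for i, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        # Every unclicked result ranked above a click is assumed to have been seen and skipped.
        for above in ranking[:i]:
            if above not in clicked:
                prefs.append((doc, above))
    return prefs

# The slide's scenario: only result #2 is clicked, so #2 is preferred over #1.
print(click_skip_above_preferences(["result1", "result2", "result3"], ["result2"]))
# [('result2', 'result1')]
```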

Evaluating using Click Data
Retrieval Function A vs. Retrieval Function B: Which is better?

Analogy to Sensory Testing
• (Hypothetical) taste experiment comparing two drinks
  – Natural usage context
• Experiment 1: Absolute Metrics
  – Cans consumed per participant: 3, 3, 3, 2, 1, 5 (one participant is very thirsty!)
  – Totals: 8 cans vs. 9 cans

Analogy to Sensory Testing
• (Hypothetical) taste experiment comparing two drinks
  – Natural usage context
• Experiment 2: Relative Metrics
  – Per-participant counts: 2 - 1, 2 - 1, 3 - 0, 2 - 0, 1 - 0, 4 - 1
  – All 6 prefer Pepsi
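One way to quantify why "all 6 prefer Pepsi" is informative even with only six people is a one-sided sign test. The sign test is swapped in here for illustration (the slide does not use it), and the code assumes the first number in each pair is the Pepsi count.

```python
from math import comb

# Counts from the relative-metrics experiment, one pair per participant.
# Assumption: the first number in each pair is the Pepsi count.
PAIRS = [(2, 1), (2, 1), (3, 0), (2, 0), (1, 0), (4, 1)]

def sign_test_one_sided(wins, n):
    """P(at least `wins` of n participants prefer one drink) if each preference were a fair coin flip."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

wins = sum(1 for pepsi, other in PAIRS if pepsi > other)
ties = sum(1 for pepsi, other in PAIRS if pepsi == other)
p_value = sign_test_one_sided(wins, len(PAIRS) - ties)  # ties are dropped, as in the usual sign test
print(wins, p_value)  # 6 0.015625
```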

Interleaving (Taste Test in Search)

Ranking A:
1. Napa Valley – The authority for lodging... (www.napavalley.com)
2. Napa Valley Wineries - Plan your wine... (www.napavalley.com/wineries)
3. Napa Valley College (www.napavalley.edu/homex.asp)
4. Been There | Tips | Napa Valley (www.ivebeenthere.co.uk/tips/16681)
5. Napa Valley Wineries and Wine (www.napavintners.com)
6. Napa County, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)

Ranking B:
1. Napa County, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
2. Napa Valley – The authority for lodging... (www.napavalley.com)
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...)
4. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com)
5. NapaValley.org (www.napavalley.org)
6. The Napa Valley Marathon (www.napavalleymarathon.org)

Presented Ranking (contributing ranking in brackets):
1. Napa Valley – The authority for lodging... (www.napavalley.com) [A]
2. Napa County, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley) [B]
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...) [B]
4. Napa Valley Wineries – Plan your wine... (www.napavalley.com/wineries) [A]
5. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com) [B]
6. Napa Valley College (www.napavalley.edu/homex.asp) [A]
7. NapaValley.org (www.napavalley.org) [B]

[Radlinski et al. 2008]
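A minimal sketch of team-draft interleaving, one interleaving scheme from this line of work. It is assumed here to be representative, not necessarily the exact procedure behind the slide. The function names are hypothetical, and the domains from the example stand in for document IDs. Each click is credited to the ranking that contributed the clicked result, and the ranking with more credited clicks wins the comparison.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length, rng):
    """Team-draft interleaving.

    At each step the ranking that has contributed fewer results picks next (a coin flip
    breaks ties) and adds its highest-ranked result not yet shown. Returns the presented
    list and a map from each result to the team ("A" or "B") that added it.
    """
    presented, team_of = [], {}
    rankings = {"A": ranking_a, "B": ranking_b}
    while len(presented) < length:
        n_a = sum(1 for t in team_of.values() if t == "A")
        n_b = len(team_of) - n_a
        team = "A" if n_a < n_b or (n_a == n_b and rng.random() < 0.5) else "B"
        doc = next((d for d in rankings[team] if d not in team_of), None)
        if doc is None:
            break  # simplification: stop once the picking ranking runs out of new results
        team_of[doc] = team
        presented.append(doc)
    return presented, team_of

def interleaving_winner(team_of, clicked):
    """Credit each click to the team that contributed the clicked result; more clicks wins."""
    a = sum(1 for d in clicked if team_of.get(d) == "A")
    b = sum(1 for d in clicked if team_of.get(d) == "B")
    return "A" if a > b else "B" if b > a else "tie"

RANKING_A = ["napavalley.com", "napavalley.com/wineries", "napavalley.edu",
             "ivebeenthere.co.uk", "napavintners.com", "en.wikipedia.org"]
RANKING_B = ["en.wikipedia.org", "napavalley.com", "books.google.co.uk",
             "napalinks.com", "napavalley.org", "napavalleymarathon.org"]

presented, team_of = team_draft_interleave(RANKING_A, RANKING_B, 7, random.Random(1))
print(presented)
print(interleaving_winner(team_of, clicked=[presented[1]]))
```

Because the two rankings contribute results in alternating turns, a user who clicks the better results ends up "voting" for the ranking that supplied them, which mirrors the relative-preference taste test.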
