Learning diverse rankings with multi-armed bandits
Radlinski, Kleinberg & Joachims. ICML '08
Overview
a) Problem of diverse rankings
b) Solution approaches
c) Two possible candidates
d) Using multi-armed bandits
e) Theoretical analysis
f) Ranked explore and commit
Ranking search results on the Web
• A key metric used is "Relevance"
• Relevance can differ from user to user
• How do we learn/infer the relevance? Or, put differently, how do we compute rankings?
How to learn diverse rankings?
• What should be used as training data?
• One candidate: expert relevance judgments
Using click-through data
[Figure: a relevant set of documents d_1, d_2, d_3, …, d_n is turned into an ordered result set, e.g. d_2, d_1, d_3]
Two approaches
• Ranked bandit algorithm: think of the ranks as different copies of bandit algorithms running simultaneously
• Ranked Explore and Commit: explore each document at a given rank and assign ranks based on user click data
Ranked bandits algorithm (a sketch in Python follows below)
1. Initialize the k 'bandit algorithms' MAB_1, MAB_2, …, MAB_k
2. For each of the k slots:
   a) select a document according to that slot's bandit algorithm
   b) if the document was already chosen for a higher slot, select an arbitrary document instead
3. Display the ordered set of k documents
   a) assign a reward to a document if the user clicked it and it was the one chosen by the algorithm
   b) assign a penalty otherwise
   c) update the bandit algorithm for that rank
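A minimal Python sketch of one round of this loop, assuming a generic bandit object with `select()` and `update(arm, reward)` methods (these method names and the `user_clicks` interface are illustrative, not from the paper):

```python
import random

def ranked_bandits_round(mabs, documents, user_clicks):
    """One round of the ranked bandits algorithm.

    mabs        : list of k bandit instances, one per rank (MAB_1..MAB_k)
    documents   : list of n candidate document ids (the arms)
    user_clicks : set of document ids the user clicked this round
    """
    shown, suggested = [], []
    for mab in mabs:                       # one bandit per rank
        doc = documents[mab.select()]      # bandit proposes a document for this slot
        suggested.append(doc)
        if doc in shown:                   # already placed at a higher rank:
            doc = random.choice([d for d in documents if d not in shown])
        shown.append(doc)

    # Feedback: reward 1 only if the displayed document was both the bandit's
    # own choice and clicked by the user; reward 0 (penalty) otherwise.
    for rank, mab in enumerate(mabs):
        clicked = shown[rank] in user_clicks and shown[rank] == suggested[rank]
        mab.update(arm=documents.index(suggested[rank]),
                   reward=1.0 if clicked else 0.0)

    return shown
```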
Analysis of the algorithm
Think of this as a maximum k-cover problem: we want to find a collection of k sets whose union has maximum cardinality.
[Figure: Venn diagram of sets S_1, …, S_5 covering the universe U]
• U: user intents expressed as queries
• S_i: the set of user intents satisfied by document d_i
• Submodularity! (A greedy sketch follows below.)
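To make the max-k-cover connection concrete, here is a small greedy-selection sketch: each document d_i is modelled as the set S_i of user intents it satisfies, and the greedy rule repeatedly picks the document with the largest marginal coverage. The (1 − 1/e) guarantee of this offline greedy rule is the benchmark the ranked bandits algorithm approaches online. The document ids and intents in the example are hypothetical.

```python
def greedy_max_k_cover(doc_sets, k):
    """Greedy max-k-cover.

    doc_sets : dict mapping document id -> set of user intents it satisfies (S_i)
    Returns k documents whose union covers as many intents as the greedy rule finds.
    """
    covered, chosen = set(), []
    for _ in range(k):
        remaining = [d for d in doc_sets if d not in chosen]
        # pick the document adding the most not-yet-covered intents (marginal gain)
        best = max(remaining, key=lambda d: len(doc_sets[d] - covered))
        chosen.append(best)
        covered |= doc_sets[best]
    return chosen, covered

# tiny illustrative example
docs = {"d1": {"u1", "u2"}, "d2": {"u2", "u3"}, "d3": {"u4"}}
print(greedy_max_k_cover(docs, 2))   # picks d1 then d2, covering u1, u2, u3
```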
Which bandit algorithm to use?
We want the bandit algorithm to satisfy the following criteria:
1. Makes no assumptions about the distribution of payoffs
2. Allows for an exploration strategy
3. Over T rounds, the expected payoff of the chosen strategies satisfies
   Σ_t E[f_t(y_t)] ≥ max_y Σ_t E[f_t(y)] − R(T)
Which bandit algorithm to use?
• UCB1: has the best performance bound of the two candidates considered, but its major weakness is that it assumes the payoffs of each arm are i.i.d.
• EXP3: an exponential-weight multiplicative update algorithm that maintains and updates the probability of picking each arm based on the payoffs received (a sketch follows below)
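A minimal sketch of the standard EXP3 arm-selection and multiplicative-weight update, assuming rewards in [0, 1]; the exploration parameter `gamma` and its value are illustrative, not the constants analysed in the paper:

```python
import math
import random

class EXP3:
    def __init__(self, n_arms, gamma=0.1):
        self.gamma = gamma
        self.weights = [1.0] * n_arms

    def _probs(self):
        total = sum(self.weights)
        n = len(self.weights)
        # mix the exponential weights with uniform exploration
        return [(1 - self.gamma) * w / total + self.gamma / n for w in self.weights]

    def select(self):
        """Sample an arm index according to the current probabilities."""
        p = self._probs()
        return random.choices(range(len(p)), weights=p)[0]

    def update(self, arm, reward):
        """Multiplicative-weight update using an importance-weighted reward estimate."""
        p = self._probs()
        x_hat = reward / p[arm]              # unbiased estimate for the chosen arm
        self.weights[arm] *= math.exp(self.gamma * x_hat / len(self.weights))
```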
Online maximization of a collection of submodular functions (Streeter & Golovin '07)
[Figure: sets S_1, …, S_5 covering U; a sequence of functions f_1, f_2, f_3, f_4, …, f_n]
We want to minimize regret over the choice of each set S_i, based on the observed payoff f_i(S_i).
Analysis of the algorithm
Theorem: The Ranked Bandits Algorithm achieves a payoff of (1 − 1/e)·OPT − O(k √(T n log n)) after T time steps.
Ranked Explore and Commit (a sketch in Python follows below)
1. Choose parameters ε, δ and an arbitrary initial set of k documents
2. For each rank:
   a) assign each document to that rank for a specified interval and record the clicks
   b) increment the probability of assigning a document to that rank whenever a user chooses it
   c) choose the document with maximum probability and commit it to that rank
3. Display the ordered set of k documents
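A sketch of the explore-then-commit loop, assuming a hypothetical `get_click(rank, doc)` feedback callback; the per-document exploration interval below is chosen only to mirror the shape of the bound on the next slide, not the paper's exact schedule:

```python
import math

def ranked_explore_and_commit(documents, k, epsilon, delta, get_click):
    """Explore each document at each rank, then commit the best one.

    documents : list of n candidate document ids
    k         : number of ranks to fill
    get_click : callback(rank, doc) -> bool, simulating whether the next user
                clicks `doc` when it is shown at `rank` (assumed interface)
    """
    n = len(documents)
    # Hypothetical exploration interval; the paper specifies the exact constant.
    interval = max(1, math.ceil((k ** 2 / epsilon) * math.log(k / delta)))
    ranking = []
    for rank in range(k):
        clicks = {d: 0 for d in documents if d not in ranking}
        for doc in clicks:
            for _ in range(interval):        # show doc at this rank for a while
                if get_click(rank, doc):
                    clicks[doc] += 1
        best = max(clicks, key=clicks.get)   # commit the most-clicked document
        ranking.append(best)
    return ranking
```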
Analysis of the algorithm
Theorem: Ranked Explore and Commit achieves a payoff of (1 − 1/e)·OPT − εT − O(n k³ log(k/δ)/ε) after T time steps, with high probability.