The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits

Authors: John Langford, Tong Zhang
Presented by: Ben Flora

Overview
• Bandit problem
• Contextual bandits
• Epoch-Greedy algorithm

Bandits
• K arms; each arm i
  – Wins (reward 1) with probability p_i
  – Loses (reward 0) with probability 1 − p_i
• Exploration vs. exploitation
  – Exploration is unbiased
  – Exploitation is biased by exploration only
• Regret
  – Regret = max return − actual return
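The regret definition above can be made concrete with a small sketch. It computes the expected regret of a sequence of pulls when the win probabilities p_i are known, which is only possible in a simulation. The function name `expected_regret` and the 0-based arm indices are assumptions made for this illustration, not something from the slides.

```python
def expected_regret(win_probs, pulls):
    """Expected regret of a pull sequence on a Bernoulli bandit.

    win_probs[i] is p_i, the win probability of arm i (0-based here).
    pulls is the list of arm indices that were actually played.
    Max return   = always playing the arm with the largest p_i.
    Actual return = sum of p_i over the arms that were played.
    """
    best = max(win_probs)
    return sum(best - win_probs[arm] for arm in pulls)


# Hypothetical three-armed bandit used only for illustration.
probs = [0.2, 0.5, 0.8]
always_best = expected_regret(probs, [2, 2, 2])    # never leaves the best arm
one_of_each = expected_regret(probs, [0, 1, 2])    # tries every arm once
```

Playing only the best arm gives zero regret; every pull of a worse arm adds the gap between that arm and the best one.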

Web Example
• Some number of ads that can be displayed
  – Each ad translates to an arm
• Each ad can be clicked on by a user
  – If clicked, reward 1; if not, reward 0
• Want to have ads clicked as often as possible
  – This will make the most money

Contextual Bandits
• Add context to the bandit problem
  – Information aiding in arm choosing
  – Helps know which arm is best
• The rest follows the bandit problem
• Want to find optimal solution
• More useful than regular bandits

Web Problem
• Now we have user information
  – A user profile
  – Search query
  – A user's preferences
• Use this information to choose an ad
  – Better chance of choosing an ad that is clicked on

Epoch-Greedy Overview
[Diagram: Exploration (unbiased input) and Context go into a Black Box, which transforms the input to hypotheses (best arm)]
• Similar idea to the papers we saw on Thursday

Exploration
• Look at a fixed time horizon
  – Time horizon is the total number of pulls
• Choose a number of exploration steps
[Diagram: timeline of length T, with n exploration steps followed by T − n exploitation steps]

Minimizing Regret
• No exploration: regret ≈ T
• All exploration: regret ≈ T
• Some minimum between those points
[Plots: regret as a function of the number of exploration steps n, for n from 0 to T]
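The trade-off on this slide can be sketched with a small simulation: explore uniformly at random for n steps, then commit to the arm with the best empirical win rate for the remaining T − n steps. This is a minimal sketch under stated assumptions, not the paper's procedure: the function name, the two-armed example, and the tie-breaking rule (lowest arm index, which is the worse arm here) are all choices made for the illustration.

```python
import random


def explore_then_exploit_regret(win_probs, horizon, n_explore, seed=0):
    """Expected regret of 'explore n steps, then exploit' on a Bernoulli bandit."""
    rng = random.Random(seed)
    k = len(win_probs)
    best = max(win_probs)
    wins = [0] * k
    pulls = [0] * k
    regret = 0.0

    # Exploration: pull a uniformly random arm and record the outcome.
    for _ in range(n_explore):
        arm = rng.randrange(k)
        wins[arm] += 1 if rng.random() < win_probs[arm] else 0
        pulls[arm] += 1
        regret += best - win_probs[arm]

    # Exploitation: commit to the arm with the highest empirical win rate.
    # Arms never pulled get rate 0; ties go to the lowest index (an assumption).
    rates = [wins[a] / pulls[a] if pulls[a] else 0.0 for a in range(k)]
    chosen = max(range(k), key=lambda a: rates[a])
    regret += (horizon - n_explore) * (best - win_probs[chosen])
    return regret


# Hypothetical setup: the worse arm is listed first, T = 1000.
probs = [0.2, 0.8]
regret_no_explore = explore_then_exploit_regret(probs, 1000, 0)
regret_some_explore = explore_then_exploit_regret(probs, 1000, 100)
regret_all_explore = explore_then_exploit_regret(probs, 1000, 1000)
```

With no exploration the learner commits blindly and can pay the full gap on every step; with all exploration it keeps paying for random pulls. Both grow linearly in T, while a moderate n sits in between.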

Creating a Hypothesis
• Simple two-armed case
• Remember binary thresholds
• Want to learn the threshold value t
  – If x < t: pick arm 1
  – If x > t: pick arm 2
[Diagram: number line with the threshold t and a margin of ε on either side]

Creating a Hypothesis (Cont.)
• Want to be within ε of the threshold
  – Need ≈ O(1/ε) exploration samples
• As the function gets more complex
  – Need ≈ O((1/ε) · C) samples
  – C denotes how complex the function is
  – A quick note for those of you who took 156: C is similar to the VC dimension
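One way to picture how a threshold hypothesis is learned from exploration data is the sketch below. It scores each candidate threshold by the reward collected on the exploration samples where the hypothesis agrees with the arm that was actually pulled, then keeps the best one. The names `pick_arm`, `score`, and `best_threshold`, the finite candidate list, and sending x = t to arm 2 are assumptions for the example; the slides leave these details open.

```python
def pick_arm(threshold, x):
    """Threshold hypothesis from the slide: arm 1 below t, arm 2 otherwise."""
    return 1 if x < threshold else 2


def score(threshold, samples):
    """Total reward on exploration samples where the hypothesis agrees with the pull.

    samples is a list of (x, arm, reward) tuples collected with uniformly
    random arms. Under uniform exploration the importance weight is the same
    constant for every sample, so it is dropped here.
    """
    return sum(reward for x, arm, reward in samples
               if pick_arm(threshold, x) == arm)


def best_threshold(candidates, samples):
    """Return the candidate threshold with the highest score."""
    return max(candidates, key=lambda t: score(t, samples))


# Hypothetical data: the true threshold is 0.5, and both arms were tried
# at ten evenly spaced contexts.
samples = []
for i in range(10):
    x = (i + 0.5) / 10
    samples.append((x, 1, 1 if x < 0.5 else 0))
    samples.append((x, 2, 1 if x > 0.5 else 0))

learned = best_threshold([0.1, 0.3, 0.5, 0.7, 0.9], samples)
```

The finer the candidate grid (smaller ε), the more exploration samples are needed to tell neighbouring thresholds apart, which is the intuition behind the O(1/ε) count on the slide.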

Epoch
• Don’t always know the time horizon
• Append groupings of known time horizons
  – Repeat until time actually ends
• This specific paper has chosen a single exploration step at the beginning of each epoch

Epoch-Greedy Algorithm
• Do a single step of exploration
  – Begin creating an unbiased vector of inputs to create the hypothesis
  – Observe context information
• Add the learned information to past exploration and create a new hypothesis
  – This uses the contextual data and exploration
• For a set number of steps, exploit the arm chosen by the hypothesis
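The three steps above can be sketched as a loop. This is a simplified illustration, not the paper's exact procedure: it uses a fixed number of exploitation steps per epoch, a finite list of hypotheses, and caller-supplied `draw_context` and `draw_reward` functions, all of which are assumptions introduced for the example.

```python
import random


def make_threshold(t):
    """Hypothetical hypothesis class: arm 1 if x < t, arm 2 otherwise."""
    return lambda x: 1 if x < t else 2


def epoch_greedy(hypotheses, arms, draw_context, draw_reward,
                 horizon, exploit_steps, seed=0):
    """Run a simplified Epoch-Greedy loop; return (final hypothesis, total reward)."""
    rng = random.Random(seed)
    explored = []            # (context, arm, reward) from exploration steps only
    best = hypotheses[0]     # arbitrary choice until data arrives
    total_reward = 0
    t = 0
    while t < horizon:
        # 1. Single exploration step: observe the context, pull a random arm.
        x = draw_context(rng)
        arm = rng.choice(arms)
        reward = draw_reward(rng, x, arm)
        explored.append((x, arm, reward))
        total_reward += reward
        t += 1

        # 2. Refit: keep the hypothesis that collects the most reward on the
        #    exploration samples where it agrees with the arm that was pulled.
        best = max(hypotheses,
                   key=lambda h: sum(r for cx, ca, r in explored if h(cx) == ca))

        # 3. Exploit the chosen hypothesis for a fixed number of steps.
        for _ in range(min(exploit_steps, horizon - t)):
            x = draw_context(rng)
            total_reward += draw_reward(rng, x, best(x))
            t += 1
    return best, total_reward
```

Only the exploration steps feed the refit in step 2, which keeps the training data unbiased; the exploitation steps collect reward but are not used for learning.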

Review Using Web Example
• Have a variety of ads that can be shown
  – Sports
  – Movie
  – Insurance

Review (Cont.)
• Search Query
  – Golf Club Repair
  – Randomly choose
  – Clicked
• Search Query
  – Car Body Repair
  – See Repair and Car
  – Not clicked

Review (Cont.)
• Search Query
  – Horror Movie
  – Randomly choose
  – Clicked
• Search Query
  – Sheep Movie
  – See Sheep and Movie
  – Clicked
