Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability
Yunzong Xu (MIT)
Joint work with David Simchi-Levi (MIT)
July 18, RealML @ ICML 2020
Stochastic Contextual Bandits
• For round t = 1, …, T:
  • Nature generates a random context x_t according to a fixed unknown distribution D_context
  • Learner observes x_t and makes a decision a_t ∈ {1, …, K}
  • Nature generates a random reward r_t(x_t, a_t) ∈ [0, 1] according to an unknown distribution D_{x_t, a_t} with (conditional) mean E[r_t(x_t, a_t) | x_t = x, a_t = a] = f*(x, a)
• We call f* the ground-truth reward function
• In statistical learning, a function class F is used to approximate f*. Some examples of F:
  • Linear classes / high-dimensional linear classes / generalized linear models
  • Reproducing kernel Hilbert spaces
  • Lipschitz and Hölder spaces
  • Neural networks
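The interaction protocol above can be sketched in a few lines of Python. This is a minimal illustrative simulation, not part of the talk: the context distribution, the number of actions K, and the ground-truth reward function f_star below are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3   # number of actions (assumed for illustration)
T = 1000  # horizon
d = 2   # context dimension (assumed for illustration)

def f_star(x, a):
    # Hypothetical ground-truth mean reward f*(x, a) in [0, 1].
    w = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
    return float(np.clip(w[a] @ x, 0.0, 1.0))

def run_round(policy):
    x = rng.uniform(size=d)      # nature draws context x_t ~ D_context
    a = policy(x)                # learner observes x_t and picks a_t
    r = rng.binomial(1, f_star(x, a))  # stochastic reward with mean f*(x_t, a_t)
    return x, a, r

# Example: average reward of a uniformly random policy over T rounds.
total = sum(run_round(lambda x: int(rng.integers(K)))[2] for _ in range(T))
print(total / T)
```

The learner only ever sees the sampled triples (x_t, a_t, r_t); f_star itself stays hidden inside the environment.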
Challenges
• We are interested in contextual bandits with a general function class F
• Realizability assumption: f* ∈ F
• Statistical challenge: how to achieve the minimax optimal regret for a general function class F?
• Computational challenge: how to make the algorithm computationally efficient?
• Existing contextual bandit approaches cannot simultaneously address both challenges in practice, as they typically:
  • Rely on strong parametric/structural assumptions on F (e.g., UCB variants and Thompson Sampling)
  • Become computationally intractable for large F (e.g., EXP4)
  • Assume computationally expensive or statistically restrictive oracles that are only implementable for specific F (a series of work on oracle-based contextual bandits)
Research Question
• Observation: the statistical and computational aspects of "offline regression with a general F" are very well-studied in ML
• Can we reduce general contextual bandits to general offline regression?
• Specifically, for any F, given an offline regression oracle, i.e., a least-squares regression oracle (ERM with square loss):
    min_{f ∈ F} Σ_{t=1}^{n} (f(x_t, a_t) − r_t(x_t, a_t))²
  can we design an algorithm that achieves the optimal regret via a few calls to this oracle?
• An open problem mentioned in Agarwal et al. (2012), Foster et al. (2018), Foster and Rakhlin (2020)
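For a concrete sense of what such an oracle looks like, here is a sketch of the least-squares ERM oracle for one illustrative choice of F, a linear class f_w(x, a) = w[a] @ x. The class choice is an assumption for this example only; for a linear class, ERM with square loss reduces to per-action ordinary least squares.

```python
import numpy as np

def least_squares_oracle(data, K, d):
    """Return argmin over f in F of sum_t (f(x_t, a_t) - r_t)^2,
    where F is the (assumed) linear class f_w(x, a) = w[a] @ x.

    data: list of (x_t, a_t, r_t) triples with x_t an array of shape (d,).
    """
    W = np.zeros((K, d))
    for a in range(K):
        # Square loss decouples across actions for this class:
        # fit w[a] on the rounds where action a was played.
        pts = [(x, r) for x, act, r in data if act == a]
        if pts:
            X = np.array([x for x, _ in pts])
            y = np.array([r for _, r in pts])
            W[a], *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda x, a: float(W[a] @ x)

# Usage: fit on a tiny synthetic log and query the fitted predictor.
rng = np.random.default_rng(1)
data = [(rng.uniform(size=2), int(rng.integers(2)), float(rng.uniform()))
        for _ in range(50)]
f_hat = least_squares_oracle(data, K=2, d=2)
print(f_hat(np.array([0.5, 0.5]), 0))
```

Any other regression procedure for F (kernel ridge, neural network training, etc.) could stand in for `least_squares_oracle`; the reduction only requires that the oracle approximately minimize the empirical square loss.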
Our Contributions
• We provide the first optimal and efficient offline-regression-oracle-based algorithm for general contextual bandits (under realizability)
  • The algorithm is much simpler and faster than existing approaches to general contextual bandits
• We provide the first universal and optimal black-box reduction from contextual bandits to offline regression
  • Any advances in offline (square loss) regression immediately translate to contextual bandits, statistically and computationally