

  1. Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability Yunzong Xu MIT Joint work with David Simchi-Levi (MIT) July 18 RealML @ ICML 2020

  2. Stochastic Contextual Bandits
  • For round t = 1, ..., T:
    • Nature generates a random context x_t according to a fixed unknown distribution D_X
    • Learner observes x_t and makes a decision a_t ∈ {1, ..., K}
    • Nature generates a random reward r_t(x_t, a_t) ∈ [0, 1] according to an unknown distribution with (conditional) mean E[r_t | x_t = x, a_t = a] = f*(x, a)
  • We call f* the ground-truth reward function
  • In statistical learning, people use a function class F to approximate f*. Some examples of F:
    • Linear class / high-dimensional linear class / generalized linear models
    • Reproducing kernel Hilbert spaces
    • Lipschitz and Hölder spaces
    • Neural networks
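The interaction protocol on this slide can be sketched as a simple loop. Everything concrete below is an illustrative assumption: the particular context distribution, the ground-truth reward function `f_star`, the Bernoulli reward noise, and the uniform-random learner are placeholders, not part of the slides.

```python
import random

K = 3    # number of actions (assumed value)
T = 1000 # horizon (assumed value)

def f_star(x, a):
    """Hypothetical ground-truth mean reward f*(x, a) in [0, 1]."""
    return (x * (a + 1)) % 1.0

def run_protocol(seed=0):
    rng = random.Random(seed)
    total_reward = 0.0
    for t in range(T):
        x_t = rng.random()        # nature draws context x_t ~ D_X
        a_t = rng.randrange(K)    # learner observes x_t, decides a_t (uniform placeholder)
        mean = f_star(x_t, a_t)
        # nature draws reward r_t(x_t, a_t) in [0, 1] with conditional mean f*(x_t, a_t)
        r_t = 1.0 if rng.random() < mean else 0.0
        total_reward += r_t
    return total_reward

total = run_protocol()
```

A real learner would replace the uniform choice of `a_t` with a policy that balances exploration and exploitation; that is exactly what the algorithm discussed later provides.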

  3. Challenges
  • We are interested in contextual bandits with a general function class F
  • Realizability assumption: f* ∈ F
  • Statistical challenge: how to achieve the minimax optimal regret for a general function class F?
  • Computational challenge: how to make the algorithm computationally efficient?
  • Existing contextual bandit approaches cannot simultaneously address the above two challenges in practice, as they typically
    • Rely on strong parametric/structural assumptions on F (e.g., UCB variants and Thompson Sampling)
    • Become computationally intractable for large F (e.g., EXP4)
    • Assume computationally expensive or statistically restrictive oracles that are only implementable for specific F (a series of work on oracle-based contextual bandits)

  4. Research Question
  • Observation: the statistical and computational aspects of "offline regression with a general F" are very well studied in ML
  • Can we reduce general contextual bandits to general offline regression?
  • Specifically, for any F, given an offline regression oracle, i.e., a least-squares regression oracle (ERM with square loss):

        min_{f ∈ F} Σ_{s=1}^{t} ( f(x_s, a_s) − r_s )²,

    can we design an algorithm that achieves the optimal regret via a few calls to this oracle?
  • An open problem mentioned in Agarwal et al. (2012), Foster et al. (2018), Foster and Rakhlin (2020)
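The oracle above is ordinary ERM with square loss over past observations. As a minimal sketch, here it is for the special case where F is a linear class f_θ(x, a) = θᵀφ(x, a); the feature map `phi` and the sample data are illustrative assumptions, and for a general F the oracle would be whatever routine fits that class (e.g., gradient descent for a neural network).

```python
import numpy as np

def phi(x, a, K=3):
    """Hypothetical feature map: the context value placed in action a's slot."""
    v = np.zeros(K)
    v[a] = x
    return v

def regression_oracle(data, K=3):
    """ERM with square loss over the linear class:
    argmin_theta sum_s (theta^T phi(x_s, a_s) - r_s)^2."""
    Phi = np.array([phi(x, a, K) for (x, a, _) in data])
    r = np.array([r_s for (_, _, r_s) in data])
    theta, *_ = np.linalg.lstsq(Phi, r, rcond=None)
    return theta

# Usage: past observations (x_s, a_s, r_s); the reduction calls this
# oracle only a few times over the whole horizon.
data = [(0.5, 0, 0.25), (0.8, 1, 0.4), (0.3, 2, 0.9), (0.9, 0, 0.45)]
theta_hat = regression_oracle(data)
```

The point of the research question is that this offline step is the *only* learning primitive the bandit algorithm is allowed to use.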

  5. Our Contributions β€’ We provide the first optimal and efficient offline- regression-oracle-based algorithm for general contextual bandits (under realizability) β€’ The algorithm is much simpler and faster than existing approaches to general contextual bandits β€’ We provide the first universal and optimal black- box reduction from contextual bandits to offline regression β€’ Any advances in offline (square loss) regression immediately translate to contextual bandits, statistically and computationally
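To make the reduction concrete: this line of work turns the oracle's current estimate f̂ into a randomized action distribution via the inverse-gap-weighting rule of Abe and Long (1999), as revived by Foster and Rakhlin (2020). The sketch below shows only that selection rule; the learning-rate value and the three estimated rewards are illustrative assumptions, and the full algorithm additionally schedules oracle calls and the learning rate γ across epochs.

```python
def igw_probabilities(f_hat_values, gamma):
    """Inverse-gap weighting: given estimated rewards f_hat(x, a) for each
    action a and a learning rate gamma, each non-greedy action a gets
    probability 1 / (K + gamma * gap(a)), and the greedy action gets the rest."""
    K = len(f_hat_values)
    best = max(range(K), key=lambda a: f_hat_values[a])
    p = [0.0] * K
    for a in range(K):
        if a != best:
            gap = f_hat_values[best] - f_hat_values[a]
            p[a] = 1.0 / (K + gamma * gap)
    p[best] = 1.0 - sum(p)  # remaining probability mass on the greedy action
    return p

# Usage: actions with larger estimated gaps are explored less often.
probs = igw_probabilities([0.9, 0.5, 0.2], gamma=10.0)
```

Because the rule only needs f̂-values at the current context, any regression method that can be evaluated pointwise plugs in directly, which is what makes the reduction black-box.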
