Linear Bandits: Rich decision sets (Sham M. Kakade, Machine Learning for Big Data)


  1. Linear Bandits: Rich decision sets. Sham M. Kakade. Machine Learning for Big Data, CSE547/STAT548, University of Washington.

  2. Bandits in practice: two major issues. The decision space is very large (drug cocktails, ad design), and we often have "side information" when making a decision (e.g., the history of a user).

  3. More real motivations...

  4. Linear bandits: an additive effects model. Suppose each round we take a decision $x \in \mathcal{D} \subset \mathbb{R}^d$: $x$ is a path on a graph; $x$ is a feature vector of properties of an ad; $x$ encodes which drugs are being taken. Upon taking action $x$, we get reward $r$ with expectation $\mathbb{E}[r \mid x] = \mu^\top x$, so there are only $d$ unknown parameters (and "effectively" $2^d$ actions). We desire an algorithm $\mathcal{A}$ (mapping histories to decisions) which has low regret: $\mu^\top x^* - \mathbb{E}[\mu^\top x_t \mid \mathcal{A}] \le\ ??$, where $x^*$ is the best decision. A simulation sketch of the reward model follows.
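
A minimal sketch of this reward model, assuming Gaussian noise (the slides only specify the conditional mean); the names `mu` and `pull` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5                         # number of unknown parameters
mu = rng.normal(size=d)       # hidden parameter vector (illustrative)

def pull(x, noise_std=0.1):
    # Reward with E[r | x] = mu^T x; the Gaussian noise is an assumption,
    # since the model only constrains the conditional expectation.
    return mu @ x + noise_std * rng.normal()
```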

  5. Example: Shortest paths...

  6. Algorithm idea: again, let's think of optimism in the face of uncertainty. We have observed rewards $r_1, \ldots, r_{t-1}$ and have taken actions $x_1, \ldots, x_{t-1}$. Questions: what is an estimate of the expected reward $\mathbb{E}[r \mid x]$, and what is our uncertainty? What is an estimate of $\mu$, and what is our uncertainty?

  7. Regression! Define $A_t := \sum_{\tau < t} x_\tau x_\tau^\top + \lambda I$ and $b_t := \sum_{\tau < t} x_\tau r_\tau$. Our estimate of $\mu$: $\hat\mu_t = A_t^{-1} b_t$. Confidence of our estimate: $\|\mu - \hat\mu_t\|_{A_t}^2 \le O(d \log t)$. A sketch of this estimator appears below.
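
A minimal sketch of this (ridge-regression) estimate, assuming the past actions and rewards are stacked into arrays; the function name is illustrative:

```python
import numpy as np

def ridge_estimate(X, r, lam=1.0):
    # A_t = sum_{tau<t} x_tau x_tau^T + lam * I,  b_t = sum_{tau<t} x_tau r_tau
    # X: (t-1, d) past actions as rows; r: (t-1,) past rewards.
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ r
    mu_hat = np.linalg.solve(A, b)   # hat-mu_t = A_t^{-1} b_t
    return mu_hat, A
```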

  8. LinUCB. Again, optimism in the face of uncertainty. Define $B_t := \{\nu : \|\nu - \hat\mu_t\|_{A_t}^2 \le O(d \log t)\}$. (LinUCB) take the action $x_t = \arg\max_{x \in \mathcal{D}} \max_{\nu \in B_t} \nu^\top x$, then update $A_t$, $B_t$, $b_t$, and $\hat\mu_t$. Equivalently, take the action $x_t = \arg\max_{x \in \mathcal{D}} \left( \hat\mu_t^\top x + \sqrt{d \log t}\, \sqrt{x^\top A_t^{-1} x} \right)$. A sketch of one such step follows.
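
A minimal sketch of one LinUCB step over a finite decision set, using the equivalent closed form above; the constant `c` scaling the $O(d \log t)$ width is a tuning assumption, not something the slides specify:

```python
import numpy as np

def linucb_step(decisions, X, r, t, lam=1.0, c=1.0):
    # decisions: (n, d) finite decision set, one candidate x per row.
    # X, r: past actions and rewards; t: current round.
    d = decisions.shape[1]
    A = X.T @ X + lam * np.eye(d)              # A_t
    mu_hat = np.linalg.solve(A, X.T @ r)       # hat-mu_t = A_t^{-1} b_t
    A_inv = np.linalg.inv(A)
    width = np.sqrt(c * d * np.log(t + 1))     # sqrt(O(d log t))
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', decisions, A_inv, decisions))
    ucb = decisions @ mu_hat + width * bonus   # optimistic value of each x
    return decisions[np.argmax(ucb)]           # the chosen action x_t
```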

  9. LinUCB: Geometry

  10. LinUCB: Confidence intervals

  11. LinUCB regret bound: $\mu^\top x^* - \mathbb{E}[\mu^\top x_t \mid \mathcal{A}] \le O^*(d\sqrt{T})$ (this is the best possible, up to log factors). Compare to $O(\sqrt{KT})$: the bound is independent of the number of actions, and the $K$-armed case is a special case. Thompson sampling: this is a good algorithm in practice (a sketch follows below).
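
Since the slide singles out Thompson sampling as a strong practical alternative, here is a minimal sketch of its linear-bandit variant, assuming a Gaussian posterior $N(\hat\mu_t, v^2 A_t^{-1})$; the scale `v` and the function name are illustrative assumptions:

```python
import numpy as np

def thompson_step(decisions, X, r, lam=1.0, v=1.0, rng=None):
    # Linear Thompson sampling: sample nu ~ N(hat-mu_t, v^2 A_t^{-1}),
    # then act greedily on the sampled parameter.
    rng = rng or np.random.default_rng()
    d = decisions.shape[1]
    A = X.T @ X + lam * np.eye(d)
    mu_hat = np.linalg.solve(A, X.T @ r)
    nu = rng.multivariate_normal(mu_hat, v**2 * np.linalg.inv(A))
    return decisions[np.argmax(decisions @ nu)]
```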

  12. Proof idea... Stats: need to show that $B_t$ is a valid confidence region. Geometric lemma: the regret is upper bounded by the log of the ratio of the volume of the posterior covariance to the volume of the prior covariance. Then just find the worst-case log volume change. One standard precise form of the lemma is given below.
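
For reference, one standard way to make the geometric lemma precise is the elliptical potential bound; this formulation is a gloss under the usual setup $A_{t+1} = A_t + x_t x_t^\top$, $A_1 = \lambda I$, since the slide states only the volume-ratio idea:

```latex
% Elliptical potential lemma: the sum of squared "exploration bonuses"
% is controlled by the log ratio of final to initial covariance volumes.
\sum_{t=1}^{T} \min\!\left(1,\; \|x_t\|_{A_t^{-1}}^{2}\right)
  \;\le\; 2 \log \frac{\det A_{T+1}}{\det(\lambda I)}
```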

  13. Dealing with context...

  14. Dealing with context...

  15. Acknowledgements: http://gdrro.lip6.fr/sites/default/files/JourneeCOSdec2015-Kaufman.pdf , https://sites.google.com/site/banditstutorial/ , http://www.yisongyue.com/courses/cs159/lectures/LinUCB.pdf
