Linear Bandits: Rich decision sets (Sham M. Kakade, Machine Learning for Big Data)


  1. Linear Bandits: Rich decision sets. Sham M. Kakade. Machine Learning for Big Data, CSE547/STAT548, University of Washington.

  2. Bandits in practice: two major issues. The decision space is very large (drug cocktails, ad design), and we often have "side information" when making a decision (e.g., the history of a user).

  3. More real motivations...

  4. Linear bandits: an additive effects model. Suppose each round we take a decision $x \in \mathcal{D} \subset \mathbb{R}^d$: $x$ is a path on a graph; $x$ is a feature vector of properties of an ad; $x$ encodes which drugs are being taken. Upon taking action $x$, we get reward $r$ with expectation $\mathbb{E}[r \mid x] = \mu^\top x$, so there are only $d$ unknown parameters (and "effectively" $2^d$ actions). We desire an algorithm $\mathcal{A}$ (mapping histories to decisions) which has low regret: $\mu^\top x^* - \mathbb{E}[\mu^\top x_t \mid \mathcal{A}] \le\ ??$, where $x^*$ is the best decision. A simulation sketch of the reward model follows.
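
A minimal sketch of this reward model, assuming Gaussian noise (the slides only specify the conditional mean); the names `mu` and `pull` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5                         # number of unknown parameters
mu = rng.normal(size=d)       # hidden parameter vector (illustrative)

def pull(x, noise_std=0.1):
    # Reward with E[r | x] = mu^T x; the Gaussian noise is an assumption,
    # since the model only constrains the conditional expectation.
    return mu @ x + noise_std * rng.normal()
```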

  5. Example: Shortest paths...

  6. Algorithm idea: again, let's think of optimism in the face of uncertainty. We have observed rewards $r_1, \ldots, r_{t-1}$ and have taken actions $x_1, \ldots, x_{t-1}$. Questions: what is an estimate of the expected reward $\mathbb{E}[r \mid x]$, and what is our uncertainty? What is an estimate of $\mu$, and what is our uncertainty?

  7. Regression! Define $A_t := \sum_{\tau < t} x_\tau x_\tau^\top + \lambda I$ and $b_t := \sum_{\tau < t} x_\tau r_\tau$. Our estimate of $\mu$: $\hat\mu_t = A_t^{-1} b_t$. Confidence of our estimate: $\|\mu - \hat\mu_t\|_{A_t}^2 \le O(d \log t)$. A sketch of this estimator appears below.
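
A minimal sketch of this (ridge-regression) estimate, assuming the past actions and rewards are stacked into arrays; the function name is illustrative:

```python
import numpy as np

def ridge_estimate(X, r, lam=1.0):
    # A_t = sum_{tau<t} x_tau x_tau^T + lam * I,  b_t = sum_{tau<t} x_tau r_tau
    # X: (t-1, d) past actions as rows; r: (t-1,) past rewards.
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ r
    mu_hat = np.linalg.solve(A, b)   # hat-mu_t = A_t^{-1} b_t
    return mu_hat, A
```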

  8. LinUCB. Again, optimism in the face of uncertainty. Define $B_t := \{\nu : \|\nu - \hat\mu_t\|_{A_t}^2 \le O(d \log t)\}$. (LinUCB) take the action $x_t = \arg\max_{x \in \mathcal{D}} \max_{\nu \in B_t} \nu^\top x$, then update $A_t$, $B_t$, $b_t$, and $\hat\mu_t$. Equivalently, take the action $x_t = \arg\max_{x \in \mathcal{D}} \left( \hat\mu_t^\top x + \sqrt{d \log t}\, \sqrt{x^\top A_t^{-1} x} \right)$. A sketch of one such step follows.
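
A minimal sketch of one LinUCB step over a finite decision set, using the equivalent closed form above; the constant `c` scaling the $O(d \log t)$ width is a tuning assumption, not something the slides specify:

```python
import numpy as np

def linucb_step(decisions, X, r, t, lam=1.0, c=1.0):
    # decisions: (n, d) finite decision set, one candidate x per row.
    # X, r: past actions and rewards; t: current round.
    d = decisions.shape[1]
    A = X.T @ X + lam * np.eye(d)              # A_t
    mu_hat = np.linalg.solve(A, X.T @ r)       # hat-mu_t = A_t^{-1} b_t
    A_inv = np.linalg.inv(A)
    width = np.sqrt(c * d * np.log(t + 1))     # sqrt(O(d log t))
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', decisions, A_inv, decisions))
    ucb = decisions @ mu_hat + width * bonus   # optimistic value of each x
    return decisions[np.argmax(ucb)]           # the chosen action x_t
```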

  9. LinUCB: Geometry

  10. LinUCB: Confidence intervals

  11. LinUCB regret bound: $\mu^\top x^* - \mathbb{E}[\mu^\top x_t \mid \mathcal{A}] \le O^*(d\sqrt{T})$ (this is the best possible, up to log factors). Compare to $O(\sqrt{KT})$: the bound is independent of the number of actions, and the $K$-armed case is a special case. Thompson sampling: this is a good algorithm in practice (a sketch follows below).
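
Since the slide singles out Thompson sampling as a strong practical alternative, here is a minimal sketch of its linear-bandit variant, assuming a Gaussian posterior $N(\hat\mu_t, v^2 A_t^{-1})$; the scale `v` and the function name are illustrative assumptions:

```python
import numpy as np

def thompson_step(decisions, X, r, lam=1.0, v=1.0, rng=None):
    # Linear Thompson sampling: sample nu ~ N(hat-mu_t, v^2 A_t^{-1}),
    # then act greedily on the sampled parameter.
    rng = rng or np.random.default_rng()
    d = decisions.shape[1]
    A = X.T @ X + lam * np.eye(d)
    mu_hat = np.linalg.solve(A, X.T @ r)
    nu = rng.multivariate_normal(mu_hat, v**2 * np.linalg.inv(A))
    return decisions[np.argmax(decisions @ nu)]
```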

  12. Proof idea... Stats: need to show that $B_t$ is a valid confidence region. Geometric lemma: the regret is upper bounded by the log of the ratio of the volume of the posterior covariance to the volume of the prior covariance. Then just find the worst-case log volume change. One standard precise form of the lemma is given below.
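
For reference, one standard way to make the geometric lemma precise is the elliptical potential bound; this formulation is a gloss under the usual setup $A_{t+1} = A_t + x_t x_t^\top$, $A_1 = \lambda I$, since the slide states only the volume-ratio idea:

```latex
% Elliptical potential lemma: the sum of squared "exploration bonuses"
% is controlled by the log ratio of final to initial covariance volumes.
\sum_{t=1}^{T} \min\!\left(1,\; \|x_t\|_{A_t^{-1}}^{2}\right)
  \;\le\; 2 \log \frac{\det A_{T+1}}{\det(\lambda I)}
```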

  13. Dealing with context...

  14. Dealing with context...

  15. Acknowledgements: http://gdrro.lip6.fr/sites/default/files/JourneeCOSdec2015-Kaufman.pdf , https://sites.google.com/site/banditstutorial/ , http://www.yisongyue.com/courses/cs159/lectures/LinUCB.pdf
