An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting

Cem Kalkanlı, Ayfer Özgür
Stanford University
ISIT, June 2020
The Gaussian Linear Bandit Problem

- Compact action set $\mathcal{U}$: $\|u\|_2 \le c$ for any $u \in \mathcal{U}$
- Reward at time $t$: $Y_{u_t} = \theta^T u_t + \eta_t$, where $\theta \in \mathbb{R}^d$, $\theta \sim \mathcal{N}(\mu, K)$, $\eta_t \sim \mathcal{N}(0, \sigma^2)$, $\eta_t \in \mathbb{R}$
- Optimal action and reward: $u^* = \arg\max_{u \in \mathcal{U}} \theta^T u$, $\quad Y_{u^*,t} = \theta^T u^* + \eta_t$
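A minimal simulation sketch of this setup (not part of the talk): the dimension, noise level, and the finite set of directions standing in for the compact action set $\mathcal{U}$ are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, c, sigma = 3, 1.0, 0.5                   # illustrative dimension, norm bound, noise level
mu, K = np.zeros(d), np.eye(d)              # prior: theta ~ N(mu, K)
theta = rng.multivariate_normal(mu, K)      # hidden parameter, drawn once from the prior

# A finite set of directions on the sphere ||u||_2 = c, standing in for the compact set U
U = [c * v / np.linalg.norm(v) for v in rng.standard_normal((50, d))]

def reward(u):
    """Noisy linear reward Y_u = theta^T u + eta, with eta ~ N(0, sigma^2)."""
    return theta @ u + sigma * rng.standard_normal()

u_star = max(U, key=lambda u: theta @ u)    # optimal action for this draw of theta
```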
A Policy and the Performance Criterion

- Past $t-1$ observations: $\mathcal{H}_{t-1} = \{u_1, Y_{u_1}, \ldots, u_{t-1}, Y_{u_{t-1}}\}$, $\quad \mathcal{H}_0 = \emptyset$
- A policy $\pi = (\pi_1, \pi_2, \pi_3, \ldots)$: $\mathbb{P}(u_t \in \cdot \mid \mathcal{H}_{t-1}) = \pi_t(\mathcal{H}_{t-1})(\cdot)$
- The performance criterion for the policy $\pi$, the Bayesian regret: $R(T, \pi) = \sum_{t=1}^{T} \mathbb{E}[Y_{u^*,t} - Y_{u_t}]$
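Continuing the sketch above, the Bayesian regret of a simple baseline policy (picking actions uniformly at random) can be estimated by Monte Carlo. The helper below is hypothetical and reuses the names from the previous sketch; redrawing $\theta$ from the prior in each run reflects the outer expectation in $R(T, \pi)$, and the noise $\eta_t$ cancels inside the expectation.

```python
def bayes_regret_uniform(T, n_runs=2000):
    """Monte-Carlo estimate of R(T, pi) for a policy that picks u_t uniformly from U.

    Reuses mu, K, U and rng from the sketch above. Each run redraws theta from the
    prior (the outer expectation in the Bayesian regret); the noise eta_t cancels
    inside E[Y_{u*,t} - Y_{u_t}], so only theta^T (u* - u_t) is accumulated.
    """
    total = 0.0
    for _ in range(n_runs):
        th = rng.multivariate_normal(mu, K)
        best = max(U, key=lambda u: th @ u)
        picks = (U[i] for i in rng.integers(len(U), size=T))
        total += sum(th @ (best - u) for u in picks)
    return total / n_runs
```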
Posterior of $\theta$

Claim: $\theta \mid \mathcal{H}_t \sim \mathcal{N}(\mu_t, K_t)$ for any non-negative integer $t$, where
$\mu_t = \mathbb{E}[\theta \mid \mathcal{H}_t]$, $\quad K_t = \mathbb{E}[(\theta - \mathbb{E}[\theta \mid \mathcal{H}_t])(\theta - \mathbb{E}[\theta \mid \mathcal{H}_t])^T \mid \mathcal{H}_t]$

- Assume $\theta \mid \mathcal{H}_{t-1} \sim \mathcal{N}(\mu_{t-1}, K_{t-1})$
- $\theta$ is independent of $u_t$ given $\mathcal{H}_{t-1}$
- $(\theta, Y_{u_t})$ is a Gaussian random vector given $\{\mathcal{H}_{t-1}, u_t\}$
- Result: $\theta \mid \mathcal{H}_t \sim \mathcal{N}(\mu_t, K_t)$
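The claim translates into the standard Gaussian (Bayesian linear regression) update. Below is a minimal sketch of one posterior step in rank-one form; the function name and the rank-one formulation are implementation choices, not something prescribed by the talk.

```python
import numpy as np

def posterior_update(mu_prev, K_prev, u, y, sigma2):
    """One Gaussian posterior step: given theta | H_{t-1} ~ N(mu_prev, K_prev) and an
    observation y = theta^T u + eta with eta ~ N(0, sigma2), return (mu_t, K_t)."""
    Ku = K_prev @ u
    s = sigma2 + u @ Ku                            # predictive variance of y given H_{t-1}, u
    mu_t = mu_prev + Ku * (y - u @ mu_prev) / s    # mean shifts toward the observation
    K_t = K_prev - np.outer(Ku, Ku) / s            # rank-one shrinkage of the covariance
    return mu_t, K_t
```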
Thompson Sampling

- Proposed by Thompson (1933)
- Posterior matching: $\mathbb{P}(u_t \in B \mid \mathcal{H}_{t-1}) = \mathbb{P}(u^* \in B \mid \mathcal{H}_{t-1})$
- Significant empirical performance in online services, display advertising, and online revenue management
Thompson Sampling for the Gaussian Linear Bandit

Implementation:
1. Select $u_t$:
   - Sample $\hat{\theta}_t \sim \mathcal{N}(\mu_{t-1}, K_{t-1})$
   - $u_t = \arg\max_{u \in \mathcal{U}} \hat{\theta}_t^T u$
2. Compute the posterior of $\theta$ given $\mathcal{H}_t$:
   - $\mu_t \leftarrow \mathbb{E}[\theta \mid \mathcal{H}_t]$
   - $K_t \leftarrow \mathbb{E}[(\theta - \mathbb{E}[\theta \mid \mathcal{H}_t])(\theta - \mathbb{E}[\theta \mid \mathcal{H}_t])^T \mid \mathcal{H}_t]$

Keywords:
- Thompson sampling: $\pi^{TS}$
- The Bayesian regret of Thompson sampling: $R(T, \pi^{TS})$
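A hedged sketch of the full loop, reusing the environment and the posterior_update helper from the earlier sketches; the horizon and the regret bookkeeping are illustrative only.

```python
def run_thompson_sampling(T=200):
    """Thompson sampling on the environment sketched earlier.

    Reuses theta, U, reward, mu, K, sigma, u_star, rng and posterior_update from the
    previous sketches. Returns the final posterior and the realized cumulative
    pseudo-regret sum_t theta^T (u* - u_t), tracked only for inspection.
    """
    mu_t, K_t = mu.copy(), K.copy()
    regret = 0.0
    for _ in range(T):
        theta_hat = rng.multivariate_normal(mu_t, K_t)   # sample from the current posterior
        u_t = max(U, key=lambda u: theta_hat @ u)        # act greedily on the sample
        y_t = reward(u_t)                                # observe Y_{u_t}
        mu_t, K_t = posterior_update(mu_t, K_t, u_t, y_t, sigma**2)
        regret += theta @ (u_star - u_t)
    return mu_t, K_t, regret
```

Averaging the returned regret over many independent runs (with $\theta$ redrawn from the prior each time) gives a Monte-Carlo approximation of $R(T, \pi^{TS})$.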
Prior Work

- Lower bound: $R(T, \pi) \gtrsim \sqrt{T}$ for any policy $\pi$ in a certain Gaussian linear bandit setting (Rusmevichientong & Tsitsiklis, 2010)
- Thompson sampling:
  1. $R(T, \pi^{TS}) \lesssim \log(T)\sqrt{T}$ (Russo & Van Roy, 2014)
  2. $R(T, \pi^{TS}) \lesssim \sqrt{T}$ when $|\mathcal{U}| < \infty$ (Russo & Van Roy, 2016)
  3. $R(T, \pi^{TS}) \lesssim \sqrt{T\log(T)}$ when $\theta$ and $\mathcal{U}$ are bounded, not including the Gaussian linear bandit (Dong & Van Roy, 2018)
Main Result

Theorem. The Bayesian regret of Thompson sampling in the Gaussian linear bandit setup satisfies
$R(T, \pi^{TS}) \le d\sqrt{T(\sigma^2 + c^2\,\mathrm{Tr}(K))\log\big(1 + \tfrac{T}{d}\big)}$.

- Within $\sqrt{\log(T)}$ of optimality compared with the lower bound of $\Omega(\sqrt{T})$ (Rusmevichientong & Tsitsiklis, 2010)
- Improves the state-of-the-art upper bound by an order of $\sqrt{\log(T)}$ for the case of an action set with infinitely many elements (previous bound: $O(\log(T)\sqrt{T})$ by Russo & Van Roy (2014))
- Same $T$ dependency as the bound of Dong & Van Roy (2018), even though $\theta$ here has unbounded support, unlike in their setting
Cauchy–Schwarz Type Inequality

Proposition. Let $X_1$ and $X_2$ be arbitrary i.i.d. $\mathbb{R}^m$-valued random variables and $f_1, f_2 : \mathbb{R}^m \to \mathbb{R}^d$ measurable maps with $\mathbb{E}[\|f_1(X_1)\|_2^2], \mathbb{E}[\|f_2(X_1)\|_2^2] < \infty$. Then
$|\mathbb{E}[f_1(X_1)^T f_2(X_1)]| \le \sqrt{d\,\mathbb{E}[(f_1(X_1)^T f_2(X_2))^2]}$.

- Reduces to the Cauchy–Schwarz inequality when $d = 1$
- A similar statement holds when $d > 1$
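A quick numerical illustration of the proposition (not from the paper): the maps $f_1, f_2$ below are arbitrary choices, and both sides are Monte-Carlo estimates, so the comparison holds only up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, n = 2, 3, 200_000                      # illustrative dimensions and sample size

# Arbitrary measurable maps f1, f2 : R^m -> R^d, chosen only for illustration
def f1(x):
    return np.stack([x[:, 0], x[:, 1], x[:, 0] * x[:, 1]], axis=1)

def f2(x):
    return np.stack([x[:, 0] + 0.5, x[:, 1] ** 2, x[:, 0] * x[:, 1]], axis=1)

X1 = rng.standard_normal((n, m))             # i.i.d. samples of X_1
X2 = rng.standard_normal((n, m))             # ... and of X_2, independent of X_1

lhs = abs(np.mean(np.sum(f1(X1) * f2(X1), axis=1)))
rhs = np.sqrt(d * np.mean(np.sum(f1(X1) * f2(X2), axis=1) ** 2))
print(f"|E[f1(X1)^T f2(X1)]| ~ {lhs:.3f}  <=  sqrt(d E[(f1(X1)^T f2(X2))^2]) ~ {rhs:.3f}")
```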
Single-Step Regret

Lemma. Let $G > 0$ be such that $G \ge \mathrm{Tr}(K)$. Then
$\mathbb{E}[Y_{u^*,1} - Y_{u_1}] \le \sqrt{d(\sigma^2 + c^2 G)\,\mathbb{E}\big[\log\big(1 + \tfrac{u_1^T K u_1}{\sigma^2 + c^2 G}\big)\big]}$.

- $I(\theta; u_1, Y_{u_1}) = I(\theta; u_1) + I(\theta; Y_{u_1} \mid u_1) = \mathbb{E}_{u \sim u_1}[I(\theta; Y_u)]$, since $u_1$ depends only on the independent sample $\hat{\theta}_1$, so $I(\theta; u_1) = 0$
- $\theta$ and $Y_u$ are jointly Gaussian random variables, so $\mathbb{E}_{u \sim u_1}[I(\theta; Y_u)] = \tfrac{1}{2}\,\mathbb{E}\big[\log\big(1 + \tfrac{u_1^T K u_1}{\sigma^2}\big)\big]$
- Similar to the information ratio concept used by Russo & Van Roy (2016) and Dong & Van Roy (2018)
- Uses the mutual information directly, instead of a discrete entropy term
Proof of the Lemma

1. Use the earlier proposition:
$\mathbb{E}[Y_{u^*,1} - Y_{u_1}] = \mathbb{E}[(\theta - \mu)^T u^*] \le \sqrt{d\,\mathbb{E}[((\theta - \mu)^T u_1)^2]} = \sqrt{d\,\mathbb{E}[u_1^T K u_1]}$

2. Use $u_1^T K u_1 \le \sigma^2 + c^2\mathrm{Tr}(K) \le \sigma^2 + c^2 G$ and $x \le 2\log(1+x)$ for any $x \in [0,1]$:
$u_1^T K u_1 = (\sigma^2 + c^2 G)\,\tfrac{u_1^T K u_1}{\sigma^2 + c^2 G} \le 2(\sigma^2 + c^2 G)\log\big(1 + \tfrac{u_1^T K u_1}{\sigma^2 + c^2 G}\big)$
An Overview of the Main Theorem's Proof

1. Use the lemma:
$\mathbb{E}[Y_{u^*,t} - Y_{u_t} \mid \mathcal{H}_{t-1}] \le \sqrt{d(\sigma^2 + c^2\mathrm{Tr}(K))\,\mathbb{E}\big[\log\big(1 + \tfrac{u_t^T K_{t-1} u_t}{\sigma^2 + c^2\mathrm{Tr}(K)}\big) \,\big|\, \mathcal{H}_{t-1}\big]}$

Jensen's inequality $\Rightarrow$
$\mathbb{E}[Y_{u^*,t} - Y_{u_t}] \le \sqrt{d(\sigma^2 + c^2\mathrm{Tr}(K))\,\mathbb{E}\big[\log\big(1 + \tfrac{u_t^T K_{t-1} u_t}{\sigma^2 + c^2\mathrm{Tr}(K)}\big)\big]}$
An Overview of the Main Theorem's Proof (cont.)

2. Overall bound on the Bayesian regret:
$\sum_{t=1}^{T}\mathbb{E}[Y_{u^*,t} - Y_{u_t}] \le \sqrt{Td(\sigma^2 + c^2\mathrm{Tr}(K))}\;\mathbb{E}\Big[\sqrt{\textstyle\sum_{t=1}^{T}\log\big(1 + \tfrac{u_t^T K_{t-1} u_t}{\sigma^2 + c^2\mathrm{Tr}(K)}\big)}\Big]$

3. Show that $\sum_{t=1}^{T}\log\big(1 + \tfrac{u_t^T K_{t-1} u_t}{\sigma^2 + c^2\mathrm{Tr}(K)}\big) \le d\log\big(1 + \tfrac{T}{d}\big)$:
$1 + \tfrac{u_t^T K_{t-1} u_t}{\sigma^2 + c^2\mathrm{Tr}(K)} \le 1 + \tfrac{u_t^T\big(K^{-1} + \tfrac{1}{\sigma^2 + c^2\mathrm{Tr}(K)}\sum_{i=1}^{t-1} u_i u_i^T\big)^{-1} u_t}{\sigma^2 + c^2\mathrm{Tr}(K)} = \tfrac{\det\big(K^{-1} + \tfrac{1}{\sigma^2 + c^2\mathrm{Tr}(K)}\sum_{i=1}^{t} u_i u_i^T\big)}{\det\big(K^{-1} + \tfrac{1}{\sigma^2 + c^2\mathrm{Tr}(K)}\sum_{i=1}^{t-1} u_i u_i^T\big)}$
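For completeness, a sketch of how step 3 finishes (not reproduced verbatim from the paper): writing $C = \sigma^2 + c^2\mathrm{Tr}(K)$, the determinant ratios telescope over $t$, and an AM–GM bound on the eigenvalues together with $u_i^T K u_i \le c^2\mathrm{Tr}(K) \le C$ gives the claimed bound.

```latex
% Writing C = \sigma^2 + c^2 \operatorname{Tr}(K), telescoping the determinant
% ratios from the previous display over t = 1, ..., T gives
\begin{align*}
\sum_{t=1}^{T}\log\Big(1+\frac{u_t^T K_{t-1} u_t}{C}\Big)
  &\le \log\frac{\det\big(K^{-1}+\frac{1}{C}\sum_{i=1}^{T}u_iu_i^T\big)}{\det(K^{-1})}
   = \log\det\Big(I+\frac{1}{C}K^{1/2}\Big(\sum_{i=1}^{T}u_iu_i^T\Big)K^{1/2}\Big)\\
  &\le d\log\Big(1+\frac{1}{dC}\sum_{i=1}^{T}u_i^T K u_i\Big)
   \le d\log\Big(1+\frac{T c^2\operatorname{Tr}(K)}{dC}\Big)
   \le d\log\Big(1+\frac{T}{d}\Big),
\end{align*}
% where the AM-GM inequality is applied to the eigenvalues of the matrix inside
% the determinant, and the last two steps use u_i^T K u_i <= c^2 Tr(K) <= C.
```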