Announcements
Ø HW 1 draft is slightly updated; see the website for more info
Ø Minbiao's office hour has moved to Thursday 1-2 pm starting this week, at Rice Hall 442
CS6501: Topics in Learning and Game Theory (Fall 2019)
MW Updates and Implications
Instructor: Haifeng Xu
Outline
Ø Regret Proof of MW Update
Ø Convergence to Minimax Equilibrium
Ø Convergence to Coarse Correlated Equilibrium
Recap: the Model of Online Learning
At each time step $t = 1, \cdots, T$, the following occurs in order:
1. Learner picks a distribution $p_t$ over actions $[n]$
2. Adversary picks a cost vector $c_t \in [0,1]^n$
3. Action $i_t \sim p_t$ is drawn and the learner incurs cost $c_t(i_t)$
4. Learner observes $c_t$ (for use in future time steps)
Ø Learner's goal: pick the distribution sequence $p_1, \cdots, p_T$ to minimize the expected cost $\mathbb{E}\left[\sum_{t \in [T]} c_t(i_t)\right]$
• Expectation is over the randomness of the actions
Measure Algorithms via Regret
Ø Regret – how much the learner regrets, had he known the cost vectors $c_1, \cdots, c_T$ in hindsight
Ø Formally, $R_T = \mathbb{E}_{i_t \sim p_t}\left[\sum_{t \in [T]} c_t(i_t)\right] - \min_{i \in [n]} \sum_{t \in [T]} c_t(i)$
Ø The benchmark $\min_{i \in [n]} \sum_t c_t(i)$ is the cost the learner would incur had he known $c_1, \cdots, c_T$ and were allowed to take the best single action across all rounds
• This benchmark is the one mostly used; other benchmarks are also possible
Ø An algorithm has no regret if $R_T / T \to 0$ as $T \to \infty$, i.e., $R_T = o(T)$
Ø Regret is an appropriate performance measure of online algorithms
• It measures exactly the loss due to not knowing the data in advance
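A minimal sketch of how this regret is computed from a recorded run, assuming NumPy; the function name and array layout are my own illustrative choices, not from the lecture.

```python
import numpy as np

def expected_regret(costs, dists):
    """Expected regret R_T of a learner.

    costs: T x n array, costs[t][i] = c_t(i) in [0, 1]
    dists: T x n array, dists[t] = p_t, the learner's distribution at round t
    Returns E[sum_t c_t(i_t)] - min_i sum_t c_t(i).
    """
    costs = np.asarray(costs, dtype=float)
    dists = np.asarray(dists, dtype=float)
    learner_cost = np.sum(costs * dists)      # sum_t <p_t, c_t>
    best_fixed = np.min(costs.sum(axis=0))    # min_i sum_t c_t(i)
    return learner_cost - best_fixed
```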
The Multiplicative Weight Update Algorithm
Parameter: $\epsilon$
Initialize weights $w_1(i) = 1, \forall i = 1, \cdots, n$
For $t = 1, \cdots, T$:
1. Let $W_t = \sum_{i \in [n]} w_t(i)$; pick action $i$ with probability $w_t(i)/W_t$
2. Observe the cost vector $c_t \in [0,1]^n$
3. For all $i \in [n]$, update $w_{t+1}(i) = w_t(i) \cdot (1 - \epsilon \cdot c_t(i))$

Theorem. The MW Update algorithm achieves regret at most $O(\sqrt{T \ln n})$ for the previously described online learning problem.
Ø Last lecture: both the $\sqrt{T}$ and the $\sqrt{\ln n}$ dependence are necessary
Ø Next, we prove the theorem
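A minimal Python sketch of the algorithm above, run against an arbitrary (here random) sequence of cost vectors and scored with the expected_regret helper sketched earlier; the class name and the driver are illustrative assumptions, not lecture code.

```python
import numpy as np

class MWLearner:
    """Multiplicative Weights: w_{t+1}(i) = w_t(i) * (1 - eps * c_t(i))."""

    def __init__(self, n, eps):
        self.eps = eps
        self.w = np.ones(n)              # w_1(i) = 1 for all i

    def distribution(self):
        return self.w / self.w.sum()     # p_t(i) = w_t(i) / W_t

    def update(self, cost):              # cost = c_t in [0, 1]^n
        self.w *= (1.0 - self.eps * cost)

# Driver: T rounds against an adversary (random costs, purely for illustration).
T, n = 10_000, 5
rng = np.random.default_rng(0)
learner = MWLearner(n, eps=np.sqrt(np.log(n) / T))

costs, dists = [], []
for t in range(T):
    p = learner.distribution()
    c = rng.random(n)                    # adversary's cost vector c_t
    learner.update(c)
    costs.append(c)
    dists.append(p)

print("regret:", expected_regret(costs, dists))  # should be O(sqrt(T ln n))
```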
Intuition of the Proof
Parameter: $\epsilon$
Initialize weights $w_1(i) = 1, \forall i = 1, \cdots, n$
For $t = 1, \cdots, T$:
1. Let $W_t = \sum_{i \in [n]} w_t(i)$; pick action $i$ with probability $w_t(i)/W_t$
2. Observe the cost vector $c_t \in [0,1]^n$
3. For all $i \in [n]$, update $w_{t+1}(i) = w_t(i) \cdot (1 - \epsilon \cdot c_t(i))$

Ø The decrease of the weights relates to the expected cost at each round
• Expected cost at round $t$ is $\bar{c}_t = \sum_{i \in [n]} p_t(i) \cdot c_t(i) = \frac{\sum_{i \in [n]} w_t(i) \cdot c_t(i)}{W_t}$
• This is proportional to the decrease of the total weight at round $t$, which is $\sum_{i \in [n]} \epsilon \cdot w_t(i) c_t(i) = \epsilon W_t \cdot \bar{c}_t$
Ø Proof idea: bound how fast the total weight decreases
Proof Step 1: How Fast Does the Total Weight Decrease?
Lemma 1. $W_{t+1} \le W_t \cdot e^{-\epsilon \bar{c}_t}$, where $W_t = \sum_{i \in [n]} w_t(i)$ is the total weight at time $t$ and $\bar{c}_t = \sum_{i \in [n]} p_t(i) c_t(i) = \frac{\sum_{i \in [n]} w_t(i) c_t(i)}{W_t}$ is the expected cost at time $t$.
Proof
Ø Almost immediate from the update rule $w_{t+1}(i) = w_t(i) \cdot (1 - \epsilon \cdot c_t(i))$:
$W_{t+1} = \sum_{i \in [n]} w_{t+1}(i) = \sum_{i \in [n]} w_t(i) \cdot (1 - \epsilon \cdot c_t(i)) = W_t - \epsilon \sum_{i \in [n]} w_t(i) c_t(i) = W_t - \epsilon W_t \bar{c}_t = W_t (1 - \epsilon \bar{c}_t) \le W_t \cdot e^{-\epsilon \bar{c}_t}$, since $1 - x \le e^{-x}, \forall x \ge 0$
Proof Step 1: How Fast Does the Total Weight Decrease?
Lemma 1. $W_{t+1} \le W_t \cdot e^{-\epsilon \bar{c}_t}$, where $W_t = \sum_{i \in [n]} w_t(i)$ is the total weight at time $t$ and $\bar{c}_t = \sum_{i \in [n]} p_t(i) c_t(i) = \frac{\sum_{i \in [n]} w_t(i) c_t(i)}{W_t}$ is the expected cost at time $t$.
Corollary 1. $W_{T+1} \le n \cdot e^{-\epsilon \sum_{t=1}^{T} \bar{c}_t}$.
Proof: apply Lemma 1 repeatedly:
$W_{T+1} \le W_T \cdot e^{-\epsilon \bar{c}_T} \le \left[ W_{T-1} \cdot e^{-\epsilon \bar{c}_{T-1}} \right] \cdot e^{-\epsilon \bar{c}_T} = W_{T-1} \cdot e^{-\epsilon (\bar{c}_T + \bar{c}_{T-1})} \le \cdots \le W_1 \cdot e^{-\epsilon \sum_{t=1}^{T} \bar{c}_t} = n \cdot e^{-\epsilon \sum_{t=1}^{T} \bar{c}_t}$
Proof Step 2: Lower Bounding $W_{T+1}$
Lemma 2. $W_{T+1} \ge e^{-T\epsilon^2} \cdot e^{-\epsilon \sum_{t=1}^{T} c_t(i)}$ for any action $i$.
Proof
$W_{T+1} \ge w_{T+1}(i)$ (weights are nonnegative)
$= w_1(i) \left(1 - \epsilon c_1(i)\right)\left(1 - \epsilon c_2(i)\right) \cdots \left(1 - \epsilon c_T(i)\right)$ (by the MW update rule)
$\ge \prod_{t=1}^{T} e^{-\epsilon c_t(i) - \epsilon^2 [c_t(i)]^2}$ (by the fact $1 - x \ge e^{-x - x^2}$ for $x \in [0, 1/2]$)
$\ge e^{-T\epsilon^2} \cdot e^{-\epsilon \sum_{t=1}^{T} c_t(i)}$ (relaxing each $[c_t(i)]^2$ to 1)
Proof Step 3: Combining the Two Lemmas
Corollary 1. $W_{T+1} \le n \cdot e^{-\epsilon \sum_{t=1}^{T} \bar{c}_t}$.
Lemma 2. $W_{T+1} \ge e^{-T\epsilon^2} \cdot e^{-\epsilon \sum_{t=1}^{T} c_t(i)}$ for any action $i$.
Ø Therefore, for any $i$ we have
$e^{-T\epsilon^2} \cdot e^{-\epsilon \sum_{t=1}^{T} c_t(i)} \le n \cdot e^{-\epsilon \sum_{t=1}^{T} \bar{c}_t}$
$\Leftrightarrow -T\epsilon^2 - \epsilon \sum_{t=1}^{T} c_t(i) \le \ln n - \epsilon \sum_{t=1}^{T} \bar{c}_t$ (take "ln" on both sides)
$\Leftrightarrow \sum_{t=1}^{T} \bar{c}_t - \sum_{t=1}^{T} c_t(i) \le \frac{\ln n}{\epsilon} + T\epsilon$ (rearrange terms)
Ø Taking $\epsilon = \sqrt{\ln n / T}$, both terms on the right equal $\sqrt{T \ln n}$, so
$\sum_{t=1}^{T} \bar{c}_t - \min_{i} \sum_{t=1}^{T} c_t(i) \le 2\sqrt{T \ln n}$
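This choice of $\epsilon$ is exactly the one that balances the two terms of the bound; a short supporting calculation (standard, not from the slides):
$$\frac{d}{d\epsilon}\left(\frac{\ln n}{\epsilon} + T\epsilon\right) = -\frac{\ln n}{\epsilon^2} + T = 0 \;\Longleftrightarrow\; \epsilon^* = \sqrt{\frac{\ln n}{T}}, \qquad \frac{\ln n}{\epsilon^*} + T\epsilon^* = \sqrt{T \ln n} + \sqrt{T \ln n} = 2\sqrt{T \ln n}.$$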
Remarks
Ø Some descriptions of MW use the update $w_{t+1}(i) = w_t(i) \cdot e^{-\epsilon \cdot c_t(i)}$. The analysis is similar, due to the fact that $e^{-\epsilon} \approx 1 - \epsilon$ for small $\epsilon \in [0,1]$
Ø The same algorithm also works for $c_t \in [-\rho, \rho]^n$ (still using the update rule $w_{t+1}(i) = w_t(i) \cdot (1 - \epsilon \cdot c_t(i))$). The analysis is essentially the same
Ø MW update is a very powerful technique – it can also be used to solve, e.g., LPs, semidefinite programs, Set Cover, Boosting, etc.
• Because it works for arbitrary cost vectors
• Next, we show how it can be used to compute equilibria of games, where the "cost vector" will be generated by the other players
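The exponential-update variant mentioned in the first remark differs from the earlier sketch only in the update step; a minimal illustration (my sketch under the same assumptions as before, not lecture code):

```python
import numpy as np

def hedge_update(w, cost, eps):
    """Exponential-weights variant: w_{t+1}(i) = w_t(i) * exp(-eps * c_t(i)).

    For small eps, exp(-eps * c) is approximately 1 - eps * c, so this rule
    behaves like the (1 - eps * c) rule analyzed above.
    """
    return w * np.exp(-eps * cost)
```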
Outline
Ø Regret Proof of MW Update
Ø Convergence to Minimax Equilibrium
Ø Convergence to Coarse Correlated Equilibrium
Online learning – A natural way to play repeated games
Repeated game: the same game played for many rounds
Ø Think about how you play rock-paper-scissors repeatedly
Ø In reality, we play like online learning
• You try to analyze the past patterns, then decide which action to play in response, possibly with some randomness
• This is basically online learning!
Repeated Zero-Sum Games with No-Regret Players
Basic Setup:
Ø A zero-sum game with payoff matrix $A \in \mathbb{R}^{m \times n}$
Ø The row player maximizes utility and has actions $[m] = \{1, \cdots, m\}$
• The column player thus minimizes utility
Ø The game is played repeatedly for $T$ rounds
Ø Each player uses an online learning algorithm to pick a mixed strategy at each round
Repeated Zero-Sum Games with No-Regret Players
Ø From the row player's perspective, the following occurs in order at round $t$:
• She picks a mixed strategy $x_t \in \Delta_m$ over actions in $[m]$
• Her opponent, the column player, picks a mixed strategy $y_t \in \Delta_n$
• Action $i_t \sim x_t$ is drawn and the row player receives utility $A(i_t, y_t) = \sum_{j \in [n]} y_t(j) \cdot A(i_t, j)$
• The row player learns $y_t$ (for future use)
Ø The column player has a symmetric perspective, but thinks of $A(i, j)$ as his cost
Difference from online learning: the utility/cost vector is determined by the opponent, instead of being arbitrarily chosen
Repeated Zero-Sum Games with No-Regret Players
Ø Expected total utility of the row player: $\sum_{t=1}^{T} A(x_t, y_t)$
• Note: $A(x_t, y_t) = \sum_{i,j} A(i,j) \, x_t(i) \, y_t(j) = x_t^{\top} A \, y_t$
Ø Regret of the row player is
$\max_{i \in [m]} \sum_{t=1}^{T} A(i, y_t) - \sum_{t=1}^{T} A(x_t, y_t)$
Ø Regret of the column player is
$\sum_{t=1}^{T} A(x_t, y_t) - \min_{j \in [n]} \sum_{t=1}^{T} A(x_t, j)$
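To make this concrete, here is a minimal simulation (my illustration, not lecture code, assuming NumPy) in which both players of rock-paper-scissors run the MW update against each other. The row player maximizes $A(i, j)$, so her cost vector at round $t$ is taken to be $1 - A y_t$ (entrywise, after rescaling payoffs to $[0,1]$), while the column player's cost vector is $A^{\top} x_t$. The time-averaged strategies approach the minimax equilibrium and the average play approaches the game value.

```python
import numpy as np

# Rock-paper-scissors payoff for the row player, rescaled to [0, 1];
# the value of this rescaled game is 0.5.
A = (np.array([[ 0, -1,  1],
               [ 1,  0, -1],
               [-1,  1,  0]]) + 1) / 2.0

m, n = A.shape
T = 20_000
eps = np.sqrt(np.log(max(m, n)) / T)

wx, wy = np.ones(m), np.ones(n)          # row / column weights
xs, ys = [], []

for t in range(T):
    x = wx / wx.sum()                    # row player's mixed strategy x_t
    y = wy / wy.sum()                    # column player's mixed strategy y_t
    xs.append(x)
    ys.append(y)

    row_cost = 1.0 - A @ y               # row maximizes A(i, y): cost 1 - A(i, y)
    col_cost = A.T @ x                   # column minimizes A(x, j): cost A(x, j)

    wx *= (1.0 - eps * row_cost)         # MW update for the row player
    wy *= (1.0 - eps * col_cost)         # MW update for the column player

x_bar, y_bar = np.mean(xs, axis=0), np.mean(ys, axis=0)
print("average strategies:", x_bar, y_bar)          # both approach (1/3, 1/3, 1/3)
print("value of average play:", x_bar @ A @ y_bar)  # approaches the game value 0.5
```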
From No Regret to Minimax Theorem
Next, we give another proof of the minimax theorem, using the fact that no-regret algorithms exist (e.g., MW update)
From No Regret to Minimax Theorem
Ø Assume both players use no-regret learning algorithms
Ø For the row player, we have
$R_T^{row} = \max_{i \in [m]} \sum_{t=1}^{T} A(i, y_t) - \sum_{t=1}^{T} A(x_t, y_t)$
$\Leftrightarrow \frac{1}{T} \sum_{t=1}^{T} A(x_t, y_t) + \frac{R_T^{row}}{T} = \frac{1}{T} \max_{i \in [m]} \sum_{t=1}^{T} A(i, y_t) = \max_{i \in [m]} A\!\left(i, \frac{\sum_t y_t}{T}\right) \ge \min_{y \in \Delta_n} \max_{i \in [m]} A(i, y)$
From No Regret to Minimax Theorem
Ø Assume both players use no-regret learning algorithms
Ø For the row player, we have
$\frac{1}{T} \sum_{t=1}^{T} A(x_t, y_t) + \frac{R_T^{row}}{T} \ge \min_{y \in \Delta_n} \max_{i \in [m]} A(i, y)$
Ø Similarly, for the column player, $R_T^{col} = \sum_{t=1}^{T} A(x_t, y_t) - \min_{j \in [n]} \sum_{t=1}^{T} A(x_t, j)$ implies
$\frac{1}{T} \sum_{t=1}^{T} A(x_t, y_t) - \frac{R_T^{col}}{T} \le \max_{x \in \Delta_m} \min_{j \in [n]} A(x, j)$
Ø Let $T \to \infty$; no regret implies $\frac{R_T^{row}}{T}$ and $\frac{R_T^{col}}{T}$ tend to $0$. We have
$\min_{y \in \Delta_n} \max_{i \in [m]} A(i, y) \le \max_{x \in \Delta_m} \min_{j \in [n]} A(x, j)$
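This is the nontrivial direction of the minimax theorem; the reverse inequality $\max_{x} \min_{j} A(x, j) \le \min_{y} \max_{i} A(i, y)$ always holds (for any fixed pair $x, y$ we have $\min_j A(x, j) \le A(x, y) \le \max_i A(i, y)$), so combining the two (a standard step, stated here for completeness):
$$\max_{x \in \Delta_m} \min_{j \in [n]} A(x, j) \;=\; \min_{y \in \Delta_n} \max_{i \in [m]} A(i, y).$$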