Announcements
Ø HW 1 deadline is postponed to next Tuesday before class, i.e., 3:30 pm
CS6501: Topics in Learning and Game Theory (Fall 2019)
Swap Regret and Convergence to CE
Instructor: Haifeng Xu
Outline
Ø (External) Regret vs Swap Regret
Ø Convergence to Correlated Equilibrium
Ø Converting Regret Bounds to Swap Regret Bounds
Recap: Online Learning
At each time step $t = 1, \cdots, T$, the following occurs in order:
1. Learner picks a distribution $p^t$ over actions $[n]$
2. Adversary picks cost vector $c^t \in [0,1]^n$
3. Action $i^t \sim p^t$ is chosen and learner incurs cost $c^t(i^t)$
4. Learner observes $c^t$ (for use in future time steps)
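To make the protocol concrete, here is a minimal Python sketch of one run of the interaction loop above. The multiplicative-weights update used for the learner and the random adversary are placeholder assumptions for illustration only; any online learning algorithm fits the same interface.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, eta = 4, 1000, 0.1                       # actions, rounds, assumed learning rate

weights = np.ones(n)                           # placeholder learner state
total_cost = 0.0

for t in range(T):
    p_t = weights / weights.sum()              # 1. learner picks distribution p^t over [n]
    c_t = rng.uniform(0.0, 1.0, size=n)        # 2. adversary picks cost vector c^t in [0,1]^n
    i_t = rng.choice(n, p=p_t)                 # 3. action i^t ~ p^t; learner incurs cost c^t(i^t)
    total_cost += c_t[i_t]
    weights *= np.exp(-eta * c_t)              # 4. learner observes c^t and updates (placeholder rule)

print(total_cost / T)                          # average per-round cost
```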
Recap: (External) Regret
Ø External regret: $R_T = \mathbb{E}_{i^t \sim p^t}\big[\sum_{t\in[T]} c^t(i^t)\big] - \min_{k\in[n]} \sum_{t\in[T]} c^t(k)$
Ø The benchmark $\min_{k\in[n]} \sum_{t\in[T]} c^t(k)$ is the learner's cost had he known $c^1, \cdots, c^T$ and been allowed to take the best single action across all rounds
Ø Describes how much the learner regrets, had he known the cost vectors $c^1, \cdots, c^T$ in hindsight
Recap: (External) Regret
Ø A closer look at external regret:
$R_T = \mathbb{E}_{i^t \sim p^t}\big[\sum_{t\in[T]} c^t(i^t)\big] - \min_{k\in[n]} \sum_{t\in[T]} c^t(k)$
$\;\;\;= \sum_{t\in[T]} \sum_{i\in[n]} c^t(i)\, p^t(i) - \min_{k\in[n]} \sum_{t\in[T]} c^t(k)$
$\;\;\;= \max_{k\in[n]} \big[\sum_{t\in[T]} \sum_{i\in[n]} c^t(i)\, p^t(i) - \sum_{t\in[T]} c^t(k)\big]$
$\;\;\;= \max_{k\in[n]} \sum_{t\in[T]} \sum_{i\in[n]} [c^t(i) - c^t(k)]\, p^t(i)$   (using $\sum_{i\in[n]} p^t(i) = 1$)
Ø The benchmark performs a many-to-one action swap: in external regret, the learner is allowed to swap every action to a single action $k$, and can choose the best $k$ in hindsight
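As a sanity check on the rewriting above, this sketch computes external regret on synthetic cost vectors and distributions both from the definition and from the max-over-$k$ form; the two values coincide. All data here is made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 3, 500
costs = rng.uniform(0.0, 1.0, size=(T, n))          # c^t(i), made-up cost vectors
probs = rng.dirichlet(np.ones(n), size=T)           # p^t(i), made-up learner distributions

expected_cost = (costs * probs).sum()                # E[ sum_t c^t(i^t) ]
best_fixed = costs.sum(axis=0).min()                 # min_k sum_t c^t(k)
regret_def = expected_cost - best_fixed              # definition of R_T

# rewritten form: max_k sum_t sum_i [c^t(i) - c^t(k)] p^t(i)
regret_max = max(((costs - costs[:, [k]]) * probs).sum() for k in range(n))

assert np.isclose(regret_def, regret_max)
print(regret_def)
```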
Swap Regret
Ø Recall the rewritten form of external regret: $R_T = \max_{k\in[n]} \sum_{t\in[T]} \sum_{i\in[n]} [c^t(i) - c^t(k)]\, p^t(i)$
Ø Swap regret allows a many-to-many action swap: replace $c^t(k)$ above by $c^t(s(i))$ for a swap function $s: [n] \to [n]$
  • E.g., $s(1) = 2$, $s(2) = 1$, $s(3) = 4$, $s(4) = 4$
Ø Formally, $\mathit{swR}_T = \max_{s} \sum_{t\in[T]} \sum_{i\in[n]} [c^t(i) - c^t(s(i))]\, p^t(i)$, where the max is over all possible swap functions $s$
Ø There are $n^n$ swap functions: each action $i$ has $n$ choices to swap to
Ø Quiz: how many many-to-one swaps are there?
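For small $n$, the definition can be evaluated directly by enumerating all $n^n$ swap functions; a minimal sketch on synthetic data (the cost vectors and distributions are made up):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, T = 3, 200
costs = rng.uniform(0.0, 1.0, size=(T, n))       # c^t(i), synthetic
probs = rng.dirichlet(np.ones(n), size=T)         # p^t(i), synthetic

def swap_regret_brute_force(costs, probs):
    """max over all n^n swap functions s of sum_t sum_i [c^t(i) - c^t(s(i))] p^t(i)."""
    n = costs.shape[1]
    best = -np.inf
    for s in itertools.product(range(n), repeat=n):   # s[i] is the action that i swaps to
        value = sum(((costs[:, i] - costs[:, s[i]]) * probs[:, i]).sum() for i in range(n))
        best = max(best, value)
    return best

print(swap_regret_brute_force(costs, probs))
```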
Some Facts about Swap Regret
Fact 1. For any algorithm: $\mathit{swR}_T \ge R_T$.
Fact 2. For any algorithm execution $p^1, \cdots, p^T$, the optimal swap function $s^*$ satisfies, for any $i$,
$s^*(i) = \arg\max_{k\in[n]} \sum_{t\in[T]} [c^t(i) - c^t(k)]\, p^t(i)$

Proof: Recall the swap regret $\mathit{swR}_T = \max_{s} \sum_{t\in[T]} \sum_{i\in[n]} [c^t(i) - c^t(s(i))]\, p^t(i)$. The choice of $s(i)$ only affects the term $\sum_{t\in[T]} [c^t(i) - c^t(s(i))]\, p^t(i)$, so it should be picked to maximize this term.

Remarks:
Ø The optimal swap can be decided "independently" for each action $i$
Ø The benchmark of swap regret depends on the algorithm execution $p^1, \cdots, p^T$, whereas the benchmark of external regret does not
Ø This raises a subtle issue: an algorithm that minimizes swap regret does not necessarily minimize the total loss
  • An algorithm may intentionally play only a few actions so that the benchmark does not have many opportunities to swap
Ø The quantity $\max_{i\in[n]} \max_{k\in[n]} \sum_{t\in[T]} [c^t(i) - c^t(k)]\, p^t(i)$, i.e., swapping only the single worst action $i$, is also called the internal regret
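A short sketch, again on made-up data, of how Fact 2 turns the max over $n^n$ swap functions into $n$ independent argmaxes, together with the internal regret from the last remark:

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 3, 200
costs = rng.uniform(0.0, 1.0, size=(T, n))       # c^t(i), synthetic
probs = rng.dirichlet(np.ones(n), size=T)         # p^t(i), synthetic

def swap_regret_via_argmax(costs, probs):
    """Fact 2: pick s*(i) independently for each i by an argmax over k."""
    n = costs.shape[1]
    total = 0.0
    for i in range(n):
        # vector over k of sum_t [c^t(i) - c^t(k)] p^t(i)
        gains = ((costs[:, [i]] - costs) * probs[:, [i]]).sum(axis=0)
        total += gains.max()                      # contribution of s*(i) = argmax_k
    return total

def internal_regret(costs, probs):
    """Swap only the single worst action i to its best alternative k."""
    n = costs.shape[1]
    return max(((costs[:, [i]] - costs) * probs[:, [i]]).sum(axis=0).max()
               for i in range(n))

print(swap_regret_via_argmax(costs, probs), internal_regret(costs, probs))
```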
Outline
Ø (External) Regret vs Swap Regret
Ø Convergence to Correlated Equilibrium
Ø Converting Regret Bounds to Swap Regret Bounds
Recap: Normal-Form Games and CE
Ø $n$ players, denoted by the set $[n] = \{1, \cdots, n\}$
Ø Player $i$ takes action $a_i \in A_i$
Ø Player utilities depend on the outcome of the game, i.e., an action profile $a = (a_1, \cdots, a_n)$
  • Player $i$ receives payoff $u_i(a)$ for any outcome $a \in \prod_{i=1}^{n} A_i$
Ø A correlated equilibrium is an action recommendation policy:
  A recommendation policy $\pi$ is a correlated equilibrium (CE) if
  $\sum_{a_{-i}} u_i(a_i, a_{-i}) \cdot \pi(a_i, a_{-i}) \;\ge\; \sum_{a_{-i}} u_i(a'_i, a_{-i}) \cdot \pi(a_i, a_{-i}), \quad \forall\, a_i, a'_i \in A_i,\ \forall\, i \in [n]$.
Ø That is, for any recommended action $a_i$, player $i$ does not want to "swap" to another action $a'_i$
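A minimal sketch of checking these CE inequalities numerically. The two-player game and candidate policy below (the game of Chicken with its classic 1/3-1/3-1/3 correlated equilibrium) are illustrative examples, not taken from the lecture.

```python
import itertools
import numpy as np

def is_correlated_eq(payoffs, pi, tol=1e-9):
    """payoffs[i][a] = u_i(a); pi[a] = probability of recommending profile a."""
    n_players = len(payoffs)
    action_sets = [range(m) for m in payoffs[0].shape]
    for i in range(n_players):
        for a_i in action_sets[i]:
            for a_dev in action_sets[i]:
                # sum over a_{-i} of [u_i(a'_i, a_-i) - u_i(a_i, a_-i)] * pi(a_i, a_-i)
                gain = 0.0
                for a in itertools.product(*action_sets):
                    if a[i] != a_i:
                        continue
                    a_swapped = a[:i] + (a_dev,) + a[i + 1:]
                    gain += (payoffs[i][a_swapped] - payoffs[i][a]) * pi[a]
                if gain > tol:               # a profitable swap exists
                    return False
    return True

# Made-up example: the game of Chicken with a well-known correlated equilibrium.
u1 = np.array([[6.0, 2.0], [7.0, 0.0]])
u2 = u1.T
pi = np.array([[1/3, 1/3], [1/3, 0.0]])
print(is_correlated_eq([u1, u2], pi))        # expected: True
```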
Repeated Games with No-Swap-Regret Players
Ø The game is played repeatedly for $T$ rounds
Ø Each player uses an online learning algorithm to select a mixed strategy at each round $t$
Ø From any player $i$'s perspective, the following occurs in order at round $t$:
  • Player $i$ picks a mixed strategy $x_i^t \in \Delta^{|A_i|}$ over actions in $A_i$
  • Every other player $j \ne i$ picks a mixed strategy $x_j^t \in \Delta^{|A_j|}$; denote these jointly by $x_{-i}^t$
  • Player $i$ receives expected utility $u_i(x_i^t, x_{-i}^t) = \mathbb{E}_{a \sim (x_i^t, x_{-i}^t)}\, u_i(a)$
  • Player $i$ learns $x_{-i}^t$ (for use in future rounds)
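A small sketch of the expected utility a player receives in one such round, i.e., the expectation of $u_i(a)$ when every player samples independently from their mixed strategy; the payoff matrix and strategies here are placeholders.

```python
import itertools
import numpy as np

def expected_utility(u_i, strategies):
    """u_i(x_1, ..., x_n) = E_{a ~ product of the x_j} [ u_i(a) ]."""
    total = 0.0
    for a in itertools.product(*[range(len(x)) for x in strategies]):
        prob = np.prod([strategies[j][a_j] for j, a_j in enumerate(a)])
        total += prob * u_i[a]
    return total

# Placeholder round-t data for a 2-player game with 2 actions each.
u_1 = np.array([[6.0, 2.0], [7.0, 0.0]])      # player 1's payoffs u_1(a_1, a_2)
x_1_t = np.array([0.5, 0.5])                  # player 1's mixed strategy x_1^t
x_2_t = np.array([0.25, 0.75])                # player 2's mixed strategy x_2^t
print(expected_utility(u_1, [x_1_t, x_2_t]))  # u_1(x_1^t, x_2^t)
```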
From No Swap Regret to Correlated Equilibrium

Theorem. Suppose all players use no-swap-regret learning algorithms, producing the strategy sequence $\{x_i^t\}_{t\in[T]}$ for each player $i$. Then the following recommendation policy $\pi^T$ converges to a CE:
$\pi^T(a) = \frac{1}{T} \sum_{t} \prod_{j\in[n]} x_j^t(a_j), \quad \forall\, a \in A$.

Remarks:
Ø In the mixed strategy profile $(x_1^t, x_2^t, \cdots, x_n^t)$, the probability of outcome $a$ is $\prod_{j\in[n]} x_j^t(a_j)$
Ø So $\pi^T(a)$ is simply the average of $\prod_{j\in[n]} x_j^t(a_j)$ over the $T$ rounds

Proof:
Ø Derive player $i$'s expected utility from $\pi^T$:
$\sum_{a\in A} \frac{1}{T} \sum_{t} \prod_{j\in[n]} x_j^t(a_j) \cdot u_i(a)$
$\;\;\;= \frac{1}{T} \sum_{t} \sum_{a\in A} \prod_{j\in[n]} x_j^t(a_j) \cdot u_i(a)$
$\;\;\;= \frac{1}{T} \sum_{t} u_i(x_i^t, x_{-i}^t)$
$\;\;\;= \frac{1}{T} \sum_{a_i \in A_i} \sum_{t=1}^{T} u_i(a_i, x_{-i}^t) \cdot x_i^t(a_i)$
Ø Player $i$'s expected utility conditioned on being recommended $a_i$ is $\frac{1}{T} \sum_{t=1}^{T} u_i(a_i, x_{-i}^t) \cdot x_i^t(a_i)$ (normalization factor omitted)
Ø The CE condition therefore requires, for every player $i$ and every $a_i \in A_i$,
$\frac{1}{T} \sum_{t=1}^{T} u_i(a_i, x_{-i}^t) \cdot x_i^t(a_i) \;\ge\; \frac{1}{T} \sum_{t=1}^{T} u_i(s(a_i), x_{-i}^t) \cdot x_i^t(a_i), \quad \forall\, s(a_i) \in A_i$
Ø Summed over all $a_i \in A_i$, the largest possible violation of these inequalities is exactly player $i$'s swap regret (in utility terms) divided by $T$, which vanishes as $T \to \infty$ for a no-swap-regret algorithm; hence $\pi^T$ converges to a CE
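A sketch of the construction in the theorem: given each player's sequence of mixed strategies, build the empirical recommendation policy $\pi^T$ and measure its worst-case CE violation. The random strategy sequences here are placeholders standing in for the output of no-swap-regret algorithms; with real no-swap-regret dynamics, the reported violation would shrink as $T$ grows.

```python
import itertools
import numpy as np

def empirical_policy(strategy_seqs):
    """pi^T(a) = (1/T) * sum_t prod_j x_j^t(a_j), given strategy_seqs[j][t] = x_j^t."""
    T = len(strategy_seqs[0])
    sizes = [len(seq[0]) for seq in strategy_seqs]
    pi = np.zeros(sizes)
    for t in range(T):
        outer = np.array([1.0])
        for seq in strategy_seqs:
            outer = np.multiply.outer(outer, np.asarray(seq[t]))
        pi += outer[0] / T                    # drop the leading dummy axis
    return pi

def max_ce_violation(payoffs, pi):
    """Largest amount by which any player's CE inequality is violated under pi."""
    worst = 0.0
    action_sets = [range(m) for m in pi.shape]
    for i, u_i in enumerate(payoffs):
        for a_i in action_sets[i]:
            for a_dev in action_sets[i]:
                gain = 0.0
                for a in itertools.product(*action_sets):
                    if a[i] != a_i:
                        continue
                    a_swapped = a[:i] + (a_dev,) + a[i + 1:]
                    gain += (u_i[a_swapped] - u_i[a]) * pi[a]
                worst = max(worst, gain)
    return worst

# Placeholder strategy sequences (random, NOT produced by a no-swap-regret algorithm).
rng = np.random.default_rng(4)
T = 50
xs = [[rng.dirichlet(np.ones(2)) for _ in range(T)] for _ in range(2)]
u1 = np.array([[6.0, 2.0], [7.0, 0.0]])
pi_T = empirical_policy(xs)
print(max_ce_violation([u1, u1.T], pi_T))
```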