
Announcements: Minbiao’s office hour will be changed to Thursday 1-2 pm

Announcements: Minbiao’s office hour will be changed to Thursday 1-2 pm, starting from next week, at Rice Hall 442. CS6501: Topics in Learning and Game Theory (Fall 2019), Introduction to Game Theory (II). Instructor: Haifeng Xu.


  1. Announcements
     • Minbiao’s office hour will be changed to Thursday 1-2 pm, starting from next week, at Rice Hall 442

  2. CS6501: Topics in Learning and Game Theory (Fall 2019). Introduction to Game Theory (II). Instructor: Haifeng Xu

  3. Outline
     • Correlated and Coarse Correlated Equilibrium
     • Zero-Sum Games
     • GANs and Equilibrium Analysis

  4. Recap: Normal-Form Games
     • $n$ players, denoted by the set $[n] = \{1, \cdots, n\}$
     • Player $i$ takes action $a_i \in A_i$
     • An outcome is the action profile $a = (a_1, \cdots, a_n)$
       • As a convention, $a_{-i} = (a_1, \cdots, a_{i-1}, a_{i+1}, \cdots, a_n)$ denotes all actions excluding $a_i$
     • Player $i$ receives payoff $u_i(a)$ for any outcome $a \in \prod_{i=1}^{n} A_i$
       • $u_i(a) = u_i(a_i, a_{-i})$ depends on other players’ actions
     • $\{A_i, u_i\}_{i \in [n]}$ are public knowledge
     A mixed strategy profile $x^* = (x_1^*, \cdots, x_n^*)$ is a Nash equilibrium (NE) if for any $i$, $x_i^*$ is a best response to $x_{-i}^*$.

  5. NE Is Not the Only Solution Concept
     • NE rests on two key assumptions
       1. Players move simultaneously (so they cannot see others’ strategies before the move)
       2. Players take actions independently
     • Last lecture: sequential moves result in different player behaviors
       • The corresponding game is called a Stackelberg game and its equilibrium is called a Strong Stackelberg equilibrium
     Today: we study what happens if players do not take actions independently but instead are “coordinated” by a central mediator
     • This results in the study of correlated equilibrium

  6. An Illustrative Example: The Traffic Light Game
     Payoffs are listed as (A's payoff, B's payoff):

                   B: STOP       B: GO
       A: STOP     (-3, -2)      (-3, 0)
       A: GO       (0, -2)       (-100, -100)

     Well, we do not see many crashes in reality… Why?
     • There is a mediator, the traffic light, that coordinates the cars’ moves
     • For example, recommend (GO, STOP) for (A, B) with probability 3/5 and (STOP, GO) for (A, B) with probability 2/5
       • GO = green light, STOP = red light
       • Following the recommendation is a best response for each player
       • It turns out that this recommendation policy gives both players the same expected utility, $-6/5$, and thus is “fair”
     This is exactly how traffic lights are designed!
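
The arithmetic behind the traffic light example can be checked with a short sketch. The payoffs and probabilities are taken from the slide; the names `PAYOFF`, `POLICY`, `expected_utilities`, and `obey_vs_switch` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

# Payoffs (u_A, u_B) from the slide, indexed by (A's action, B's action).
PAYOFF = {
    ("STOP", "STOP"): (-3, -2),
    ("STOP", "GO"): (-3, 0),
    ("GO", "STOP"): (0, -2),
    ("GO", "GO"): (-100, -100),
}

# The traffic light's recommendation policy from the slide.
POLICY = {
    ("GO", "STOP"): Fraction(3, 5),
    ("STOP", "GO"): Fraction(2, 5),
}


def expected_utilities(policy):
    """Expected payoffs of A and B when both drivers follow the light."""
    u_a = sum(p * PAYOFF[profile][0] for profile, p in policy.items())
    u_b = sum(p * PAYOFF[profile][1] for profile, p in policy.items())
    return u_a, u_b


def other(action):
    return "GO" if action == "STOP" else "STOP"


def obey_vs_switch(policy):
    """For each recommended profile, list (player, payoff if obeying, payoff if switching).

    In this particular policy a driver's own signal reveals the other
    driver's signal, so comparing payoffs profile by profile is enough.
    """
    rows = []
    for (rec_a, rec_b), p in policy.items():
        if p == 0:
            continue
        rows.append(("A", PAYOFF[(rec_a, rec_b)][0], PAYOFF[(other(rec_a), rec_b)][0]))
        rows.append(("B", PAYOFF[(rec_a, rec_b)][1], PAYOFF[(rec_a, other(rec_b))][1]))
    return rows


u_a, u_b = expected_utilities(POLICY)
print(u_a, u_b)
for player, obey, switch in obey_vs_switch(POLICY):
    print(player, obey, switch)
```

Every row has an obeying payoff at least as large as the switching payoff, which is the "best response" claim on the slide.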

  7. Correlated Equilibrium (CE)
     • A (randomized) recommendation policy $\pi$ assigns probability $\pi(a)$ to each action profile $a \in A = \prod_{i \in [n]} A_i$
       • A mediator first samples $a \sim \pi$, then recommends $a_i$ to player $i$ privately
     • Upon receiving a recommendation $a_i$, player $i$’s expected utility is
       $$\frac{1}{c} \sum_{a_{-i} \in A_{-i}} u_i(a_i, a_{-i}) \cdot \pi(a_i, a_{-i})$$
       • $c$ is a normalization term that equals the probability that $a_i$ is recommended
     A recommendation policy $\pi$ is a correlated equilibrium if
       $$\sum_{a_{-i}} u_i(a_i, a_{-i}) \cdot \pi(a_i, a_{-i}) \ge \sum_{a_{-i}} u_i(a_i', a_{-i}) \cdot \pi(a_i, a_{-i}), \quad \forall a_i, a_i' \in A_i,\ \forall i \in [n].$$
     • That is, any recommended action to any player is a best response
       • CE makes incentive compatible action recommendations
     • $\pi$ is assumed to be public knowledge, so every player can calculate her utility
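
The CE condition translates almost directly into code. The following is a minimal sketch, assuming finite action sets and a policy stored as a dictionary; the function name `is_correlated_equilibrium`, its argument layout, and the tolerance `tol` are hypothetical choices made for illustration. It is tested on the traffic light game from the earlier slide.

```python
def is_correlated_equilibrium(action_sets, utility, policy, tol=1e-12):
    """Check the CE inequalities for every player, recommendation, and deviation.

    action_sets: list of per-player action lists (A_1, ..., A_n)
    utility:     function (player index, action profile tuple) -> payoff
    policy:      dict mapping action profile tuples to probabilities
    """
    for i, actions_i in enumerate(action_sets):
        for rec in actions_i:
            # Profiles in which player i is recommended `rec`.
            support = [(a, p) for a, p in policy.items() if a[i] == rec and p > 0]
            follow = sum(p * utility(i, a) for a, p in support)
            for dev in actions_i:
                # Same profiles, but player i plays `dev` instead of `rec`.
                deviate = sum(
                    p * utility(i, a[:i] + (dev,) + a[i + 1:]) for a, p in support
                )
                if deviate > follow + tol:
                    return False
    return True


# Traffic light game: player 0 is A, player 1 is B.
PAYOFF = {
    ("STOP", "STOP"): (-3, -2),
    ("STOP", "GO"): (-3, 0),
    ("GO", "STOP"): (0, -2),
    ("GO", "GO"): (-100, -100),
}
ACTIONS = [["STOP", "GO"], ["STOP", "GO"]]


def traffic_utility(i, a):
    return PAYOFF[a][i]


light = {("GO", "STOP"): 0.6, ("STOP", "GO"): 0.4}
uniform = {profile: 0.25 for profile in PAYOFF}

print(is_correlated_equilibrium(ACTIONS, traffic_utility, light))
print(is_correlated_equilibrium(ACTIONS, traffic_utility, uniform))
```

The uniform policy fails because a driver told to GO still faces a 50% chance that the other driver was also told to GO, so switching to STOP is better.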

  8. Basic Facts about Correlated Equilibrium
     Fact. Any Nash equilibrium is also a correlated equilibrium.
     • True by definition: a Nash equilibrium can be viewed as an independent action recommendation
     • As a corollary, a correlated equilibrium always exists
     Fact. The set of correlated equilibria forms a convex set.
     • In fact, the distributions $\pi$ satisfy a set of linear constraints
       $$\sum_{a_{-i}} u_i(a_i, a_{-i}) \cdot \pi(a_i, a_{-i}) \ge \sum_{a_{-i}} u_i(a_i', a_{-i}) \cdot \pi(a_i, a_{-i}), \quad \forall a_i, a_i' \in A_i,\ \forall i \in [n].$$

  9. Basic Facts about Correlated Equilibrium
     Fact. Any Nash equilibrium is also a correlated equilibrium.
     • True by definition: a Nash equilibrium can be viewed as an independent action recommendation
     • As a corollary, a correlated equilibrium always exists
     Fact. The set of correlated equilibria forms a convex set.
     • In fact, the distributions $\pi$ satisfy a set of linear constraints
     • This is nice because it allows us to optimize over all CEs
     • This is not true for Nash equilibria
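
To make "optimize over all CEs" concrete, here is one possible linear program, written as a sketch. The objective (expected total utility) is an assumption chosen only for illustration; any objective that is linear in $\pi$ would fit the same template. The variables are the probabilities $\pi(a)$, one per action profile.

```latex
\begin{aligned}
\max_{\pi}\quad & \sum_{a \in A} \pi(a) \sum_{i \in [n]} u_i(a) \\
\text{s.t.}\quad & \sum_{a_{-i}} \bigl[ u_i(a_i, a_{-i}) - u_i(a_i', a_{-i}) \bigr] \, \pi(a_i, a_{-i}) \ge 0,
  && \forall i \in [n],\ \forall a_i, a_i' \in A_i \\
& \sum_{a \in A} \pi(a) = 1, \qquad \pi(a) \ge 0, && \forall a \in A
\end{aligned}
```

The first family of constraints is the CE condition with both sides moved to the left; the last line says that $\pi$ is a probability distribution. Every constraint is linear in $\pi$, which is also why the feasible set is convex.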

  10. Coarse Correlated Equilibrium (CCE)
     • A weaker notion of correlated equilibrium
     • Also a recommendation policy $\pi$, but it only requires that no player has an incentive to opt out of the recommendations
     A recommendation policy $\pi$ is a coarse correlated equilibrium if
       $$\sum_{a \in A} u_i(a) \cdot \pi(a) \ge \sum_{a \in A} u_i(a_i', a_{-i}) \cdot \pi(a), \quad \forall a_i' \in A_i,\ \forall i \in [n].$$
     That is, for any player $i$, following $\pi$’s recommendations is better than opting out of the recommendation and “acting on his own”.
     Compare to the correlated equilibrium condition:
       $$\sum_{a_{-i}} u_i(a_i, a_{-i}) \cdot \pi(a_i, a_{-i}) \ge \sum_{a_{-i}} u_i(a_i', a_{-i}) \cdot \pi(a_i, a_{-i}), \quad \forall a_i, a_i' \in A_i,\ \forall i \in [n].$$

  11. Coarse Correlated Equilibrium (CCE)
     • A weaker notion of correlated equilibrium
     • Also a recommendation policy $\pi$, but it only requires that no player has an incentive to opt out of the recommendations
     A recommendation policy $\pi$ is a coarse correlated equilibrium if
       $$\sum_{a \in A} u_i(a) \cdot \pi(a) \ge \sum_{a \in A} u_i(a_i', a_{-i}) \cdot \pi(a), \quad \forall a_i' \in A_i,\ \forall i \in [n].$$
     That is, for any player $i$, following $\pi$’s recommendations is better than opting out of the recommendation and “acting on his own”.
     Fact. Any correlated equilibrium is a coarse correlated equilibrium.
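
The slide states the last Fact without proof. One short way to see it, sketched here, is to fix a player $i$ and a deviation $a_i'$, write the CE inequality for each possible recommendation $a_i$, and add these inequalities up over all $a_i \in A_i$:

```latex
\begin{aligned}
\sum_{a_i \in A_i} \sum_{a_{-i}} u_i(a_i, a_{-i}) \, \pi(a_i, a_{-i})
  &\ge \sum_{a_i \in A_i} \sum_{a_{-i}} u_i(a_i', a_{-i}) \, \pi(a_i, a_{-i}) \\
\Longleftrightarrow \quad
\sum_{a \in A} u_i(a) \, \pi(a)
  &\ge \sum_{a \in A} u_i(a_i', a_{-i}) \, \pi(a).
\end{aligned}
```

The second line is exactly the CCE condition for player $i$ and deviation $a_i'$. The converse fails in general, because the CCE condition only controls the sum over $a_i$, not each term separately.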

  12. The Equilibrium Hierarchy
     Nash Equilibrium (NE) ⊆ Correlated Equilibrium (CE) ⊆ Coarse Correlated Equilibrium (CCE)
     There are other equilibrium concepts, but NE and CE are the most often used. CCE is not used that often.

  13. Outline
     • Correlated and Coarse Correlated Equilibrium
     • Zero-Sum Games
     • GANs and Equilibrium Analysis

  14. Zero-Sum Games
     • Two players: player 1 takes action $i \in [m] = \{1, \cdots, m\}$, player 2 takes action $j \in [n]$
     • The game is zero-sum if $u_1(i, j) + u_2(i, j) = 0$, $\forall i \in [m], j \in [n]$
       • Models strictly competitive scenarios
       • “Zero-sum” almost always means “2-player zero-sum” games
       • $n$-player games can also be zero-sum, but they are not particularly interesting
     • Let $u_1(x, y) = \sum_{i \in [m], j \in [n]} u_1(i, j)\, x_i\, y_j$ for any $x \in \Delta_m$, $y \in \Delta_n$
     • $(x^*, y^*)$ is a NE for the zero-sum game if: (1) $u_1(x^*, y^*) \ge u_1(i, y^*)$ for any $i \in [m]$; (2) $u_1(x^*, y^*) \le u_1(x^*, j)$ for any $j \in [n]$
     • Condition $u_1(x^*, y^*) \le u_1(x^*, j) \iff u_2(x^*, y^*) \ge u_2(x^*, j)$
     • We can “forget” $u_2$; instead, think of player 2 as minimizing player 1’s utility
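
The two NE conditions can be checked mechanically once player 1's payoff matrix is given. The sketch below assumes the matrix is a list of rows `U[i][j]` $= u_1(i, j)$ with 0-based indices; the helper names `u1`, `pure`, and `is_zero_sum_ne`, and the matching pennies example, are illustrative and not from the lecture.

```python
from fractions import Fraction


def u1(U, x, y):
    """Player 1's expected utility: sum over i, j of U[i][j] * x[i] * y[j]."""
    return sum(U[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))


def pure(k, size):
    """Pure strategy k written as a point mass in the simplex."""
    return [1 if t == k else 0 for t in range(size)]


def is_zero_sum_ne(U, x, y):
    """Check conditions (1) and (2) from the slide for the pair (x, y)."""
    m, n = len(U), len(U[0])
    value = u1(U, x, y)
    # (1) No pure action i of player 1 does better than x against y.
    cond1 = all(value >= u1(U, pure(i, m), y) for i in range(m))
    # (2) No pure action j of player 2 pushes player 1 below the value.
    cond2 = all(value <= u1(U, x, pure(j, n)) for j in range(n))
    return cond1 and cond2


# Matching pennies: player 1 wins 1 on a match and loses 1 otherwise.
PENNIES = [[1, -1], [-1, 1]]
half = [Fraction(1, 2), Fraction(1, 2)]

print(is_zero_sum_ne(PENNIES, half, half))
print(is_zero_sum_ne(PENNIES, [1, 0], half))
```

Only player 1's matrix is needed, which mirrors the slide's remark that $u_2$ can be "forgotten".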

  15. Maximin and Minimax Strategy
     • Previous observations motivate the following definitions
     Definition. $x^* \in \Delta_m$ is a maximin strategy of player 1 if it solves
       $$\max_{x \in \Delta_m} \min_{j \in [n]} u_1(x, j).$$
     The corresponding utility value is called the maximin value of the game.
     Remarks:
     • $x^*$ is player 1’s best action if he were to move first

  16. Maximin and Minimax Strategy
     • Previous observations motivate the following definitions
     Definition. $x^* \in \Delta_m$ is a maximin strategy of player 1 if it solves
       $$\max_{x \in \Delta_m} \min_{j \in [n]} u_1(x, j).$$
     The corresponding utility value is called the maximin value of the game.
     Definition. $y^* \in \Delta_n$ is a minimax strategy of player 2 if it solves
       $$\min_{y \in \Delta_n} \max_{i \in [m]} u_1(i, y).$$
     The corresponding utility value is called the minimax value of the game.
     Remark: $y^*$ is player 2’s best action if he were to move first
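
The inner optimizations in both definitions are easy to evaluate for a given strategy. The sketch below uses the hypothetical helper names `guarantee` (the inner min for player 1) and `cap` (the inner max for player 2), again on a matching pennies matrix chosen for illustration. Restricting the opponent to pure actions loses nothing, because $u_1(x, y)$ is linear in the opponent's mixed strategy.

```python
from fractions import Fraction


def guarantee(U, x):
    """min over player 2's pure actions j of u_1(x, j): what x secures for player 1."""
    m, n = len(U), len(U[0])
    return min(sum(x[i] * U[i][j] for i in range(m)) for j in range(n))


def cap(U, y):
    """max over player 1's pure actions i of u_1(i, y): the most that y concedes."""
    m, n = len(U), len(U[0])
    return max(sum(U[i][j] * y[j] for j in range(n)) for i in range(m))


PENNIES = [[1, -1], [-1, 1]]
half = [Fraction(1, 2), Fraction(1, 2)]
skewed = [Fraction(3, 4), Fraction(1, 4)]

print(guarantee(PENNIES, half))    # mixing evenly secures 0
print(guarantee(PENNIES, skewed))  # any bias can be exploited by player 2
print(cap(PENNIES, half))          # mixing evenly concedes at most 0
```

A maximin strategy is an `x` that maximizes `guarantee`, and a minimax strategy is a `y` that minimizes `cap`.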

  17. Duality of Maximin and Minimax
     Fact. $\max_{x \in \Delta_m} \min_{j \in [n]} u_1(x, j) \le \min_{y \in \Delta_n} \max_{i \in [m]} u_1(i, y)$.
     That is, moving first is no better.
     • Let $y^* = \arg\min_{y \in \Delta_n} \max_{i \in [m]} u_1(i, y)$, so
       $$\min_{y \in \Delta_n} \max_{i \in [m]} u_1(i, y) = \max_{i \in [m]} u_1(i, y^*)$$
     • We have
       $$\max_{x \in \Delta_m} \min_{j \in [n]} u_1(x, j) \le \max_{x \in \Delta_m} u_1(x, y^*) = \max_{i \in [m]} u_1(i, y^*)$$
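
An equivalent way to read the argument, sketched here as a supplement to the slide's proof, is as a sandwich that holds for every pair of mixed strategies $x \in \Delta_m$, $y \in \Delta_n$:

```latex
\min_{j \in [n]} u_1(x, j)
  \;\le\; \sum_{j \in [n]} y_j \, u_1(x, j)
  \;=\; u_1(x, y)
  \;=\; \sum_{i \in [m]} x_i \, u_1(i, y)
  \;\le\; \max_{i \in [m]} u_1(i, y).
```

Both inequalities hold because a weighted average lies between the smallest and largest of the averaged values. Since the left side does not depend on $y$ and the right side does not depend on $x$, taking the max over $x$ on the left and the min over $y$ on the right gives the Fact.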

  18. Duality of Maximin and Minimax
     Fact. $\max_{x \in \Delta_m} \min_{j \in [n]} u_1(x, j) \le \min_{y \in \Delta_n} \max_{i \in [m]} u_1(i, y)$.
     Theorem. $\max_{x \in \Delta_m} \min_{j \in [n]} u_1(x, j) = \min_{y \in \Delta_n} \max_{i \in [m]} u_1(i, y)$.
     • Maximin and minimax can both be formulated as linear programs
       Maximin:
         $$\max\ u \quad \text{s.t.} \quad u \le \sum_{i=1}^{m} u_1(i, j)\, x_i \ \ \forall j \in [n]; \quad \sum_{i=1}^{m} x_i = 1; \quad x_i \ge 0 \ \ \forall i \in [m]$$
       Minimax:
         $$\min\ v \quad \text{s.t.} \quad v \ge \sum_{j=1}^{n} u_1(i, j)\, y_j \ \ \forall i \in [m]; \quad \sum_{j=1}^{n} y_j = 1; \quad y_j \ge 0 \ \ \forall j \in [n]$$
     • These turn out to be a primal and dual pair of LPs. Strong duality yields the equality
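
The equality in the Theorem can be observed numerically on a small game. The sketch below does not call an LP solver. It swaps in a simpler exact method that only works when the optimizing player has two actions: the guaranteed payoff is then a concave piecewise-linear function of one probability `p`, so its maximum sits at an endpoint or where two lines cross. The function names and the example matrix are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def maximin_two_rows(U):
    """Return (value, p) maximizing min_j [p*U[0][j] + (1-p)*U[1][j]] over p in [0, 1].

    Special case for a player 1 with exactly two actions; p is the
    probability of the first row.
    """
    n = len(U[0])
    # Against column j, player 1's payoff is the line intercept[j] + slope[j] * p.
    slope = [Fraction(U[0][j] - U[1][j]) for j in range(n)]
    intercept = [Fraction(U[1][j]) for j in range(n)]

    # Candidate optima: the endpoints and every crossing point inside [0, 1].
    candidates = {Fraction(0), Fraction(1)}
    for j, k in combinations(range(n), 2):
        if slope[j] != slope[k]:
            p = (intercept[k] - intercept[j]) / (slope[j] - slope[k])
            if 0 <= p <= 1:
                candidates.add(p)

    def guarantee(p):
        return min(intercept[j] + slope[j] * p for j in range(n))

    best = max(candidates, key=guarantee)
    return guarantee(best), best


def minimax_two_cols(U):
    """Player 2's problem for a game with two columns, via the negated transpose.

    Returns (value, q), where q is the probability of the first column.
    """
    V = [[-U[i][j] for i in range(len(U))] for j in range(2)]
    value, q = maximin_two_rows(V)
    return -value, q


GAME = [[2, -1], [-1, 1]]
print(maximin_two_rows(GAME))
print(minimax_two_cols(GAME))
```

For this matrix both calls report the value 1/5, in line with the Theorem. For larger games one would solve the two linear programs with an LP solver instead.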
