CSC304 Lecture 6 Game Theory: Zero-Sum Games, The Minimax Theorem CSC304 - Nisarg Shah 1
Zero-Sum Games
• Special case of games
  ➢ Total reward to all players is constant in every outcome
  ➢ Without loss of generality, sum of rewards = 0
  ➢ Inspired terms like “zero-sum thinking” and “zero-sum situation”
• Focus on two-player zero-sum games (2p-zs)
  ➢ “The more I win, the more you lose”
Zero-Sum Games

Zero-sum game: Rock-Paper-Scissors

                      P2
               Rock      Paper     Scissors
P1  Rock       (0, 0)    (-1, 1)   (1, -1)
    Paper      (1, -1)   (0, 0)    (-1, 1)
    Scissors   (-1, 1)   (1, -1)   (0, 0)

Non-zero-sum game: Prisoner’s dilemma

                        John
                    Stay Silent   Betray
Sam  Stay Silent    (-1, -1)      (-3, 0)
     Betray         (0, -3)       (-2, -2)
Zero-Sum Games
• Why are they interesting?
  ➢ Many physical games we play are zero-sum: chess, tic-tac-toe, rock-paper-scissors, …
  ➢ Outcomes: (win, lose), (lose, win), (draw, draw)
  ➢ Rewards: (1, -1), (-1, 1), (0, 0)
• Why are they technically interesting?
  ➢ We’ll see.
Zero-Sum Games
• Reward for P2 = - Reward for P1
  ➢ Only need to write a single entry in each cell (say, the reward of P1)
  ➢ Hence, we get a matrix $A$
  ➢ P1 wants to maximize the value, P2 wants to minimize it

                    P2
               Rock   Paper   Scissors
P1  Rock         0     -1        1
    Paper        1      0       -1
    Scissors    -1      1        0
Rewards in Matrix Form
• Say P1 uses mixed strategy $x_1 = (x_{1,1}, x_{1,2}, \ldots)$
  ➢ What are the rewards for P1 corresponding to different possible actions of P2?
  ❖ Reward of P1 when P2 chooses $s_j$ $= x_1^T A_j$, where $A_j$ is the column of $A$ for action $s_j$:
  $$\begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} & \cdots \end{pmatrix} A_j = x_1^T A_j$$
Rewards in Matrix Form
• Reward for P1 when…
  ➢ P1 uses a mixed strategy $x_1$
  ➢ P2 uses a mixed strategy $x_2$
$$\begin{pmatrix} x_1^T A_1 & x_1^T A_2 & x_1^T A_3 & \cdots \end{pmatrix} \begin{pmatrix} x_{2,1} \\ x_{2,2} \\ x_{2,3} \\ \vdots \end{pmatrix} = x_1^T A\, x_2$$
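The matrix-form reward can be checked numerically. The following is a minimal sketch, assuming the rock-paper-scissors reward matrix from the earlier slide; the function names `expected_reward` and `rewards_per_column` and the example strategy `rock_heavy` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

# Reward matrix for P1 in rock-paper-scissors
# (rows: P1's action, columns: P2's action).
A = [
    [0, -1, 1],   # Rock
    [1, 0, -1],   # Paper
    [-1, 1, 0],   # Scissors
]

def rewards_per_column(x1, A):
    """P1's expected reward against each pure action j of P2 (x1^T A_j)."""
    return [sum(x1[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def expected_reward(x1, A, x2):
    """P1's expected reward x1^T A x2 when both players mix."""
    return sum(r * x2[j] for j, r in enumerate(rewards_per_column(x1, A)))

uniform = [Fraction(1, 3)] * 3
# Hypothetical P2 strategy that over-plays Rock.
rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

print(rewards_per_column(uniform, A))             # reward against each pure action
print(expected_reward(uniform, A, rock_heavy))    # uniform P1 vs rock-heavy P2
print(expected_reward([0, 1, 0], A, rock_heavy))  # always Paper vs rock-heavy P2
```

Exact fractions are used instead of floats so that equalities such as "the reward is exactly 0" can be checked without rounding error.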
How would the two players act in this zero-sum game?
John von Neumann, 1928
Maximin Strategy
• Worst-case thinking by P1…
  ➢ Suppose I don’t know anything about what P2 would do.
  ➢ If I choose a mixed strategy $x_1$, in the worst case, P2 chooses an $x_2$ that minimizes my reward (i.e., maximizes his reward)
  ➢ Let me choose $x_1$ to maximize this “worst-case reward”
$$V_1^* = \max_{x_1} \min_{x_2} x_1^T A\, x_2$$
Maximin Strategy
$$V_1^* = \max_{x_1} \min_{x_2} x_1^T A\, x_2$$
• $V_1^*$: maximin value of P1
• $x_1^*$ (maximizer): maximin strategy of P1
• “By playing $x_1^*$, I guarantee myself at least $V_1^*$”
• P2 can similarly think of her worst case.
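The inner minimization in the maximin value can be sketched in code. Because the expected reward is linear in P2's mixed strategy, the worst case over all mixed strategies of P2 is attained at one of her pure actions, so it suffices to scan the columns of the reward matrix. This is an illustrative sketch on the rock-paper-scissors matrix; `worst_case_reward` is a hypothetical helper name.

```python
from fractions import Fraction

# Reward matrix for P1 in rock-paper-scissors
# (rows: P1's action, columns: P2's action).
A = [
    [0, -1, 1],   # Rock
    [1, 0, -1],   # Paper
    [-1, 1, 0],   # Scissors
]

def worst_case_reward(x1, A):
    """Reward P1 is guaranteed by playing x1: min over P2's pure actions.

    The reward is linear in P2's mixed strategy, so no mixed strategy
    of P2 can do better for her than her best pure action.
    """
    n1, n2 = len(A), len(A[0])
    return min(sum(x1[i] * A[i][j] for i in range(n1)) for j in range(n2))

always_rock = [1, 0, 0]
uniform = [Fraction(1, 3)] * 3

print(worst_case_reward(always_rock, A))  # P2 answers with Paper
print(worst_case_reward(uniform, A))      # no action of P2 hurts a uniform P1
```

Maximizing this quantity over all mixed strategies of P1 gives the maximin value; in rock-paper-scissors, uniform mixing already guarantees 0.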
Maximin vs Minimax

Player 1
  Choose $x_1$ to maximize my reward in the worst case over P2’s strategy
  $$V_1^* = \max_{x_1} \min_{x_2} x_1^T A\, x_2$$
  Maximizer: $x_1^*$

Player 2
  Choose $x_2$ to minimize P1’s reward in the worst case over P1’s strategy
  $$V_2^* = \min_{x_2} \max_{x_1} x_1^T A\, x_2$$
  Minimizer: $x_2^*$

Question: Relation between $V_1^*$ and $V_2^*$?
Maximin vs Minimax
$$V_1^* = \max_{x_1} \min_{x_2} x_1^T A\, x_2 \quad (\text{maximizer } x_1^*) \qquad V_2^* = \min_{x_2} \max_{x_1} x_1^T A\, x_2 \quad (\text{minimizer } x_2^*)$$
• What if (P1, P2) play $(x_1^*, x_2^*)$ simultaneously?
  ➢ P1’s guarantee: P1 must get reward at least $V_1^*$
  ➢ P2’s guarantee: P1 must get reward at most $V_2^*$
  ➢ $V_1^* \le V_2^*$
Maximin vs Minimax
$$V_1^* = \max_{x_1} \min_{x_2} x_1^T A\, x_2 \quad (\text{maximizer } x_1^*) \qquad V_2^* = \min_{x_2} \max_{x_1} x_1^T A\, x_2 \quad (\text{minimizer } x_2^*)$$
• Another way to see this:
$$V_1^* = \min_{x_2} (x_1^*)^T A\, x_2 \;\le\; (x_1^*)^T A\, x_2^* \;\le\; \max_{x_1} x_1^T A\, x_2^* = V_2^*$$
The Minimax Theorem
• John von Neumann [1928]
• Theorem: For any 2p-zs game,
  ➢ $V_1^* = V_2^* = V^*$ (called the minimax value of the game)
  ➢ Set of Nash equilibria = $\{ (x_1^*, x_2^*) : x_1^* = \text{maximin for P1},\ x_2^* = \text{minimax for P2} \}$
• Corollary: $x_1^*$ is a best response to $x_2^*$ and vice versa.
The Minimax Theorem
• An alternative interpretation of maximin strategies
  ➢ $x_1^*$ is the strategy P1 would choose if she were to commit to her strategy first, and P2 were to choose her strategy after observing P1’s strategy
  ➢ Similarly, $x_2^*$ is the strategy P2 would choose if P2 were to commit first
  ➢ However, $x_1^*$ and $x_2^*$ are best responses to each other.
  ➢ Hence, in zero-sum games, it doesn’t matter which player commits first (or if both players commit together).
The Minimax Theorem
• John von Neumann [1928]
“As far as I can see, there could be no theory of games … without that theorem … I thought there was nothing worth publishing until the Minimax Theorem was proved”
Proof of the Minimax Theorem
• Simpler proof using Nash’s theorem
  ➢ But the minimax theorem predates Nash’s theorem
• Suppose $(\tilde{x}_1, \tilde{x}_2)$ is a NE
  ➢ Note: A Nash equilibrium exists due to Nash’s theorem
• P1 gets value $\tilde{v} = \tilde{x}_1^T A\, \tilde{x}_2$
• $\tilde{x}_1$ is a best response for P1: $\tilde{v} = \max_{x_1} x_1^T A\, \tilde{x}_2$
• $\tilde{x}_2$ is a best response for P2: $\tilde{v} = \min_{x_2} \tilde{x}_1^T A\, x_2$

Proof of the Minimax Theorem
$$\begin{aligned}
V_2^* = \min_{x_2} \max_{x_1} x_1^T A\, x_2 &\le \max_{x_1} x_1^T A\, \tilde{x}_2 \\
&= \tilde{v} = \min_{x_2} \tilde{x}_1^T A\, x_2 \\
&\le \max_{x_1} \min_{x_2} x_1^T A\, x_2 = V_1^*
\end{aligned}$$
• But we already saw $V_1^* \le V_2^*$
  ➢ $V_1^* = V_2^*$

Proof of the Minimax Theorem
$$\begin{aligned}
V_2^* = \min_{x_2} \max_{x_1} x_1^T A\, x_2 &= \max_{x_1} x_1^T A\, \tilde{x}_2 \\
&= \tilde{v} = \min_{x_2} \tilde{x}_1^T A\, x_2 \\
&= \max_{x_1} \min_{x_2} x_1^T A\, x_2 = V_1^*
\end{aligned}$$
• When $(\tilde{x}_1, \tilde{x}_2)$ is a NE, $\tilde{x}_1$ and $\tilde{x}_2$ must be maximin and minimax strategies for P1 and P2, respectively.
• The reverse direction is also easy to prove.
Computing Nash Equilibria
• Recall that in general games, computing a Nash equilibrium is hard even with two players.
• For 2p-zs games, a Nash equilibrium can be computed in polynomial time.
  ➢ Polynomial in the number of actions of the two players: $n_1$ and $n_2$
  ➢ Exploits the fact that a Nash equilibrium is simply composed of maximin strategies, which can be computed using linear programming
Computing Nash Equilibria
$$\begin{aligned}
\text{Maximize } \quad & v \\
\text{subject to } \quad & x_1^T A_j \ge v, && j \in \{1, \ldots, n_2\} \\
& x_{1,1} + \cdots + x_{1,n_1} = 1 \\
& x_{1,i} \ge 0, && i \in \{1, \ldots, n_1\}
\end{aligned}$$
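For comparison, P2's minimax strategy can be computed by the mirror-image program. The sketch below writes $x_2$ for P2's mixed strategy, $A$ for P1's reward matrix, and $v$ for the bound on P1's reward; the notation $(A x_2)_i$ for the $i$-th entry of the vector $A x_2$ is an assumption of this sketch rather than the lecture's notation.

```latex
\begin{aligned}
\text{Minimize } \quad & v \\
\text{subject to } \quad & (A x_2)_i \le v, && i \in \{1, \ldots, n_1\} \\
& x_{2,1} + \cdots + x_{2,n_2} = 1 \\
& x_{2,j} \ge 0, && j \in \{1, \ldots, n_2\}
\end{aligned}
```

Each constraint says that no pure action $i$ of P1 earns more than $v$ against $x_2$. By the minimax theorem, this program and P1's maximization program have the same optimal value $V^*$.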
Limitation of Minimax Theorem
• It only makes sense to play your maximin strategy $x_1^*$ if you know the other player is rational enough to choose the best response $x_2^*$
• If the other player is choosing a suboptimal strategy $x_2$, the best response to $x_2$ might be different
• This is what computer programs playing chess exploit when they play against human players
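A small illustration of this limitation, as a sketch on rock-paper-scissors: against a hypothetical P2 who over-plays Rock, P1's pure best response earns strictly more than the game's value of 0, which is all that uniform mixing achieves. The strategy `x2_suboptimal` and the helper `best_response_p1` are made up for this example.

```python
from fractions import Fraction

ACTIONS = ["Rock", "Paper", "Scissors"]

# Reward matrix for P1 (rows: P1's action, columns: P2's action).
A = [
    [0, -1, 1],   # Rock
    [1, 0, -1],   # Paper
    [-1, 1, 0],   # Scissors
]

def best_response_p1(A, x2):
    """Return (row index, expected reward) of a pure best response of P1 to x2."""
    rewards = [sum(A[i][j] * x2[j] for j in range(len(x2))) for i in range(len(A))]
    best = max(range(len(A)), key=lambda i: rewards[i])
    return best, rewards[best]

# Hypothetical suboptimal P2: plays Rock half of the time.
x2_suboptimal = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

i, reward = best_response_p1(A, x2_suboptimal)
print(ACTIONS[i], reward)  # Paper beats the over-played Rock

# The maximin (uniform) strategy earns only the game value 0 against the same x2.
uniform_reward = sum(
    Fraction(1, 3) * A[r][c] * x2_suboptimal[c] for r in range(3) for c in range(3)
)
print(uniform_reward)
```

The maximin strategy is safe but does not exploit the opponent's mistake; the best response does, at the cost of being exploitable itself.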
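Minimax Theorem in Real Life?

                 Goalie
                L       R
Kicker   L    0.58    0.95
         R    0.93    0.70

Kicker:
$$\begin{aligned}
\text{Maximize } \quad & v \\
\text{subject to } \quad & 0.58\, p_L + 0.93\, p_R \ge v \\
& 0.95\, p_L + 0.70\, p_R \ge v \\
& p_L + p_R = 1 \\
& p_L \ge 0,\ p_R \ge 0
\end{aligned}$$

Goalie:
$$\begin{aligned}
\text{Minimize } \quad & v \\
\text{subject to } \quad & 0.58\, q_L + 0.95\, q_R \le v \\
& 0.93\, q_L + 0.70\, q_R \le v \\
& q_L + q_R = 1 \\
& q_L \ge 0,\ q_R \ge 0
\end{aligned}$$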
Minimax Theorem in Real Life?

                 Goalie
                L       R
Kicker   L    0.58    0.95
         R    0.93    0.70

Kicker
  Maximin: $p_L = 0.38$, $p_R = 0.62$
  Reality: $p_L = 0.40$, $p_R = 0.60$

Goalie
  Maximin: $q_L = 0.42$, $q_R = 0.58$
  Reality: $q_L = 0.423$, $q_R = 0.577$

Some evidence that people may play minimax strategies.
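The maximin numbers on this slide can be reproduced without an LP solver. The sketch below swaps in the indifference method for 2×2 games: it assumes the game has no pure-strategy equilibrium, so each player mixes in a way that makes the opponent indifferent between her two actions. The function name `solve_2x2` is illustrative.

```python
def solve_2x2(A):
    """Equilibrium mixtures of a 2x2 zero-sum game via indifference.

    A[i][j] is the row player's reward. Assumes there is no pure-strategy
    equilibrium, so both players mix and the denominator is nonzero.
    Returns (p, q, value): p is the row player's probability of row 0,
    q is the column player's probability of column 0.
    """
    (a, b), (c, d) = A
    denom = a - b - c + d
    p = (d - c) / denom            # makes the column player indifferent
    q = (d - b) / denom            # makes the row player indifferent
    value = (a * d - b * c) / denom
    return p, q, value

# Scoring probabilities: rows are the kicker's side (L, R),
# columns are the goalie's side (L, R).
penalty = [
    [0.58, 0.95],
    [0.93, 0.70],
]

p_L, q_L, value = solve_2x2(penalty)
print(round(p_L, 2), round(1 - p_L, 2))  # kicker:  0.38 0.62
print(round(q_L, 2), round(1 - q_L, 2))  # goalie:  0.42 0.58
print(round(value, 3))                   # scoring probability at equilibrium
```

For larger games the indifference shortcut does not apply directly, and the linear programs are the general tool.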
Minimax Theorem
• We proved it using Nash’s theorem
  ➢ Cheating. Typically, Nash’s theorem (for the special case of 2p-zs games) is proved using the minimax theorem.
• Useful for proving Yao’s principle, which provides lower bounds for randomized algorithms
• Equivalent to linear programming duality
(Pictured: John von Neumann, George Dantzig)