Game Theory: Lecture #9
Outline:
• Zero-sum games
• Security strategies and values
• Value
• Minimax Theorem

Individual Optimization
• Previous focus: single decision-maker $i$
  – A set of actions for the individual, denoted by $A_i$
  – A set of “other things that could happen in the world,” denoted by $A_{-i}$
  – This induces the set of states of the world $A = A_i \times A_{-i}$
  – The individual’s preferences over states are characterized by a function $U_i : A \to \mathbb{R}$
• Terminology:
  – $U_i(\cdot)$ is referred to as the “payoff,” “utility,” or “reward” function
  – The individual $i$ is referred to as an “agent,” “player,” “decision-maker,” or “user”
• Player $i$ prefers state $a$ to state $a'$ if and only if $U_i(a) > U_i(a')$. If $U_i(a) = U_i(a')$, player $i$ is “indifferent.”
• Dominant question: What should the decision-maker do in such scenarios?
• New focus: What if the “other things” are adversarial?

Zero-sum games
• Setup: Two-player zero-sum games
  – Set of players, $N = \{1, 2\}$
  – Sets of actions, $A_1$ and $A_2$
  – This induces the set of action profiles $A = A_1 \times A_2$
  – For each player, preferences over action profiles are characterized by a function $U_i : A \to \mathbb{R}$
  – Zero-sum constraint: for any action profile $a \in A$, $U_1(a) + U_2(a) = 0$
• Matrix form is a convenient representation for two-player strategic games. The first entry is Player 1’s payoff, the second entry is Player 2’s payoff.
• Example: Matching pennies

           H          T
    H    1, −1     −1, 1
    T    −1, 1     1, −1

• Example: Rock-paper-scissors

           R          P          S
    R    0, 0      −1, 1     1, −1
    P    1, −1     0, 0      −1, 1
    S    −1, 1     1, −1     0, 0

Zero-sum games
• Note: The zero-sum distinction is only relevant for two-player games. Why?
• Convention:
  – Represent the game by a single matrix
  – View the row player as the “maximizer”
  – View the column player as the “minimizer” (rather than a maximizer of negative values)
• Example: Rock-paper-scissors

          R     P     S
    R     0    −1     1
    P     1     0    −1
    S    −1     1     0

• Player set: {row, col}
• Action sets: $A_{row} = \{R, P, S\}$, $A_{col} = \{R, P, S\}$
• Action profiles: $A = \{(R, R), (R, P), (R, S), \ldots, (S, R), (S, P), (S, S)\}$
• Payoff functions:
  – $U_{row}(R, R) = 0$ and $U_{col}(R, R) = 0$
  – $U_{row}(R, P) = -1$ and $U_{col}(R, P) = 1$
  – ...
  – $U_{row}(S, S) = 0$ and $U_{col}(S, S) = 0$
• Interpretation: row tries to maximize the cell entry, col tries to minimize the cell entry
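
The zero-sum constraint and the single-matrix convention can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the names `RPS`, `is_zero_sum`, and `to_single_matrix` are hypothetical and chosen only for illustration.

```python
# Rock-paper-scissors as a two-player game: each action profile maps to
# the pair (row player's payoff, column player's payoff).
ACTIONS = ["R", "P", "S"]
RPS = {
    ("R", "R"): (0, 0),  ("R", "P"): (-1, 1), ("R", "S"): (1, -1),
    ("P", "R"): (1, -1), ("P", "P"): (0, 0),  ("P", "S"): (-1, 1),
    ("S", "R"): (-1, 1), ("S", "P"): (1, -1), ("S", "S"): (0, 0),
}


def is_zero_sum(game):
    """True if the two payoffs sum to zero at every action profile."""
    return all(u_row + u_col == 0 for u_row, u_col in game.values())


def to_single_matrix(game, rows, cols):
    """Keep only the row player's payoff; the column player's is its negative."""
    return [[game[(r, c)][0] for c in cols] for r in rows]


if __name__ == "__main__":
    print(is_zero_sum(RPS))
    for row in to_single_matrix(RPS, ACTIONS, ACTIONS):
        print(row)
```

Storing only the row player's payoff loses no information precisely because the zero-sum check passes.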

Worst-case analysis
• What is a reasonable prediction of behavior in zero-sum games?
• Worst-case model: “worst for row is best for col”
  – One such model that seems reasonable in zero-sum games
  – Requires analyzing “what if” scenarios, i.e., if I play T, what would the opponent do?
• Example:

          L     R
    T     3     0
    B     1     2

  – If row plays T: worst-case outcome is 0
  – If row plays B: worst-case outcome is 1
  – row’s security strategy: B (worst case is col = L)
  – Likewise, col’s security strategy is R (worst case is row = B)
• Guaranteed levels:
  – Let $\underline{v}$ denote the guaranteed payoff for row ($\underline{v} = 1$)
  – Let $\overline{v}$ denote the maximum penalty for col ($\overline{v} = 2$)
• Example:

          L     R
    T     3     0
    B     2     1

  – row: B (worst case is col = R)
  – col: R (worst case is row = B)
  – $\underline{v} = \overline{v} = 1$
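
The worst-case reasoning above can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the lecture; the function name `pure_security_levels` is hypothetical, and actions are referred to by index (0 = T or L, 1 = B or R).

```python
def pure_security_levels(M):
    """Pure-action security strategies and levels for a matrix game.

    M[i][j] is the payoff to the maximizing row player.
    Returns (row_index, lower_level, col_index, upper_level).
    """
    n_rows, n_cols = len(M), len(M[0])
    # Row player: the worst case of a row is its smallest entry.
    row_worst = [min(M[i]) for i in range(n_rows)]
    # Column player: the worst case of a column is its largest entry.
    col_worst = [max(M[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Each player picks the action whose worst case is best for them.
    i_star = max(range(n_rows), key=lambda i: row_worst[i])
    j_star = min(range(n_cols), key=lambda j: col_worst[j])
    return i_star, row_worst[i_star], j_star, col_worst[j_star]


if __name__ == "__main__":
    print(pure_security_levels([[3, 0], [1, 2]]))  # first example
    print(pure_security_levels([[3, 0], [2, 1]]))  # second example
```

On the first matrix the two levels differ (1 for row, 2 for col); on the second they coincide at 1.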
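
Maximin and minimax
• Question: How do the row player’s guaranteed payoff $\underline{v}$ and the column player’s maximum penalty $\overline{v}$ compare in general?
  – Is it always the case that $\underline{v} \le \overline{v}$?
  – When is it the case that $\underline{v} = \overline{v}$?
• Fact: Computing $\underline{v}$ and $\overline{v}$ is precisely a “maximin” and a “minimax” computation, respectively
• Let $F : X \times Y \to \mathbb{R}$
• Maximin: $\max_{x \in X} \min_{y \in Y} F(x, y)$
  – $x$ commits
  – $y$ minimizes (as a function of $x$)
  – Equivalently, $\max_{x \in X} F(x, y_{wc}(x))$, where $F(x, y_{wc}(x)) = \min_{y \in Y} F(x, y)$
• Minimax (similar interpretation): $\min_{y \in Y} \max_{x \in X} F(x, y)$
• Claim: “Largest minimum is smaller than smallest maximum”
  $\max_{x \in X} \min_{y \in Y} F(x, y) \le \min_{y \in Y} \max_{x \in X} F(x, y)$
• Proof (sketch for finite $X$ and $Y$, pictured as a matrix with rows $x_1, x_2, x_3$ and columns $y_1, y_2, y_3$): Let $x^*$ be the row attaining $\max_x \min_y F(x, y)$ and $y^*$ the column attaining $\min_y \max_x F(x, y)$, and let $\alpha = F(x^*, y^*)$ be the entry where they intersect. Then $\alpha$ is at least the minimum of its row and at most the maximum of its column, so
  $\max_{x \in X} \min_{y \in Y} F(x, y) \le \alpha \le \min_{y \in Y} \max_{x \in X} F(x, y)$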
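
Security strategies
• Notation:
  – Set of rows: $\mathcal{I}$
  – Set of columns: $\mathcal{J}$
  – Game matrix elements: $m_{ij}$
• Define:
  $\underline{v} = \max_{i \in \mathcal{I}} \min_{j \in \mathcal{J}} m_{ij}$
  $\overline{v} = \min_{j \in \mathcal{J}} \max_{i \in \mathcal{I}} m_{ij}$
• From the prior result: $\underline{v} \le \overline{v}$
• The game has a value of $v^*$ if $\underline{v} = \overline{v} = v^*$
• $i^*$ is a maximizing security strategy (or maximinimizer) if $\underline{v} \le m_{i^* j}$ for all $j$, i.e., $i^*$ assures a payoff of at least $\underline{v}$
• $j^*$ is a minimizing security strategy (or minimaximizer) if $m_{i j^*} \le \overline{v}$ for all $i$, i.e., $j^*$ assures a penalty of at most $\overline{v}$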

Mixed strategies
• Recall the previous example (matching pennies, row player’s payoffs):

          H     T
    H     1    −1
    T    −1     1

  – The row player cannot assure a payoff greater than −1 with either H or T
  – What if the row player randomizes 50/50? Then the row player can assure an expected payoff of at least 0
• Recall the previous example:

          L     R
    T     3     0
    B     1     2

  – The column player cannot assure a penalty of less than 2 with either L or R
  – What if the column player randomizes (1/2, 1/2)? Then the column player can assure an expected penalty of at most 1.5
• How do security levels change when using mixed strategies (i.e., probabilistic strategies) as opposed to pure strategies (i.e., non-probabilistic strategies)?
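
(Mixed) Security strategies
• Discussion parallels that of pure actions
• Notation:
  – Mixed strategy of row player: $p \in \Delta$ (a probability distribution over the rows $\mathcal{I}$)
  – Mixed strategy of column player: $q \in \Delta$ (a probability distribution over the columns $\mathcal{J}$)
• Claim: The expected payoff to the (maximizing) row player is
  $p^T M q = \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} m_{ij} \, p_i \, q_j$
• Define:
  $\underline{v} = \max_{p \in \Delta} \min_{q \in \Delta} p^T M q$
  $\overline{v} = \min_{q \in \Delta} \max_{p \in \Delta} p^T M q$
• As before, $\underline{v} \le \overline{v}$, and the game has a value of $v^*$ if $\underline{v} = \overline{v} = v^*$
• $p^*$ is a maximizing security strategy (or maximinimizer) if $\underline{v} \le p^{*T} M q$ for all $q$
• $q^*$ is a minimizing security strategy (or minimaximizer) if $p^T M q^* \le \overline{v}$ for all $p$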
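
The expected-payoff formula and the level a given mixed strategy assures can be sketched as follows. This Python sketch is an illustration, not part of the slides, and the function names are hypothetical. It relies on the fact that $p^T M q$ is linear in the opponent's strategy, so the opponent's best response is attained at a pure action and only the pure columns (or rows) need to be checked.

```python
def expected_payoff(p, M, q):
    """Expected payoff p^T M q to the maximizing row player."""
    return sum(p[i] * M[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))


def row_guarantee(p, M):
    """Payoff that mixed strategy p assures the row player.

    The minimum over mixed q equals the minimum over pure columns,
    because the expected payoff is linear in q.
    """
    return min(sum(p[i] * M[i][j] for i in range(len(p)))
               for j in range(len(M[0])))


def col_guarantee(q, M):
    """Penalty that mixed strategy q assures the column player (max over pure rows)."""
    return max(sum(M[i][j] * q[j] for j in range(len(q)))
               for i in range(len(M)))


if __name__ == "__main__":
    pennies = [[1, -1], [-1, 1]]
    example = [[3, 0], [1, 2]]
    print(row_guarantee([1, 0], pennies))      # pure H
    print(row_guarantee([0.5, 0.5], pennies))  # 50/50 randomization
    print(col_guarantee([0.5, 0.5], example))  # (1/2, 1/2) for col
```

These calls reproduce the numbers from the two recalled examples: pure H assures only −1 in matching pennies, the 50/50 mix assures 0, and the column player's (1/2, 1/2) mix assures a penalty of at most 1.5.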

Value
• Minimax theorem: With mixed strategies, $\underline{v} = \overline{v} = v^*$
• Proof: Judicious use of the “separating hyperplane” theorem
• Remarks:
  – Every zero-sum matrix game has a value over mixed strategies
  – Mixed security strategies are a reasonable prediction of behavior in zero-sum games
  – It is relatively easy to compute security strategies in zero-sum games
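
One standard way to make the last remark concrete (an addition here, not taken from the slides) is to write the row player's security strategy as a linear program. In the sketch below, $m_{ij}$ are the game matrix entries, $p$ is the row player's mixed strategy, and the auxiliary variable $t$ (a name chosen here for illustration) is the payoff that $p$ assures against every column:

```latex
\begin{aligned}
\max_{p,\, t} \quad & t \\
\text{s.t.} \quad & \sum_{i \in \mathcal{I}} m_{ij}\, p_i \ge t
    \quad \text{for all } j \in \mathcal{J}, \\
& \sum_{i \in \mathcal{I}} p_i = 1, \qquad p_i \ge 0
    \quad \text{for all } i \in \mathcal{I}.
\end{aligned}
```

The optimal $t$ is the row player's mixed security level, which by the minimax theorem equals the value $v^*$. The column player's problem is the analogous minimization.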

Example
• What are the security strategies and value of the following game?

          L     R
    T     3     0
    B     1     2

• Suppose ROW plays a strategy $(p, 1 - p)$, i.e., plays T with probability $p$
  – ROW’s expected utility if COL plays L: $3p + 1(1 - p) = 2p + 1$
  – ROW’s expected utility if COL plays R: $0 \cdot p + 2(1 - p) = 2 - 2p$
• Plot and inspect:
  [Plot: expected utility for ROW versus $p \in [0, 1]$, with one line for COL playing L and one line for COL playing R]
• The two lines cross where $2p + 1 = 2 - 2p$, so the $p$ that maximizes the minimum payoff is $p = 0.25$, with a security level of $\underline{v} = 1.5$.
• Similar analysis for COL demonstrates that (1/2, 1/2) is a security strategy with a security level of $\overline{v} = 1.5$.
• Hence, the game has a value $v^* = \underline{v} = \overline{v} = 1.5$.
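
The "plot and inspect" step can be reproduced exactly for any 2×2 game. The sketch below is an illustration, not the lecture's own method, and the function name `row_security_2x2` is hypothetical. It assumes only that the lower envelope of the two payoff lines is concave and piecewise linear, so its maximum over $p \in [0, 1]$ lies at an endpoint or at the crossing point. Exact rational arithmetic from the standard library avoids rounding.

```python
from fractions import Fraction


def row_security_2x2(M):
    """Mixed security strategy of the row player in a 2x2 matrix game.

    M[i][j] is the payoff to the maximizing row player. Returns (p, level),
    where p is the probability of playing the first row.
    """
    (a, b), (c, d) = [[Fraction(x) for x in row] for row in M]

    def worst_case(p):
        vs_first_col = a * p + c * (1 - p)
        vs_second_col = b * p + d * (1 - p)
        return min(vs_first_col, vs_second_col)

    # Candidate maximizers: the endpoints and, if it exists in [0, 1],
    # the point where the two payoff lines cross.
    candidates = [Fraction(0), Fraction(1)]
    slope_gap = (a - c) - (b - d)
    if slope_gap != 0:
        crossing = (d - c) / slope_gap
        if 0 <= crossing <= 1:
            candidates.append(crossing)

    best_p = max(candidates, key=worst_case)
    return best_p, worst_case(best_p)


if __name__ == "__main__":
    M = [[3, 0], [1, 2]]
    print(row_security_2x2(M))
    # COL's problem is ROW's problem on the negated transpose of M.
    neg_transpose = [[-M[j][i] for j in range(2)] for i in range(2)]
    q, neg_level = row_security_2x2(neg_transpose)
    print(q, -neg_level)
```

Solving the column player's problem through the negated transpose reuses the same routine: minimizing the maximum of $M$ is the same as maximizing the minimum of $-M^T$.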