Lecture 28: Game Theory 3

Lecture 28: Game Theory 3 (Instructor: Gianni A. Di Caro) - PowerPoint PPT Presentation



  1. 15-382 COLLECTIVE INTELLIGENCE – S18. LECTURE 28: GAME THEORY 3. INSTRUCTOR: GIANNI A. DI CARO

  2. MIXED NASH EQUILIBRIUM

     Rock-paper-scissors (row player's payoff listed first). The mixed NE is $(\tfrac13, \tfrac13, \tfrac13)$ for each player:

     |   | R    | P    | S    |
     |---|------|------|------|
     | R | 0,0  | -1,1 | 1,-1 |
     | P | 1,-1 | 0,0  | -1,1 |
     | S | -1,1 | 1,-1 | 0,0  |

     Finding the mixed equilibrium (ME):
     - Let $\pi_R^i, \pi_P^i, \pi_S^i$ be the probabilities of the pure strategy mix for player $i = 1,2$, with $\pi_R^i + \pi_P^i + \pi_S^i = 1$.
     - A mixed strategy equilibrium needs to make player $i$ indifferent among all three of his strategies (i.e. same expected utility).
     - → Find player $i$'s expected utilities as a function of the parameters of the mixed strategy, and set the parameters so that the previous requirement is satisfied.
     - In symmetric zero-sum games the expected utility of the players at equilibrium is zero. → This property can be used to rule out equilibrium candidates (just check if one player has a positive utility!).

     A modified game:

     |   | R    | P    | S    |
     |---|------|------|------|
     | R | 0,0  | -2,2 | 1,-1 |
     | P | 2,-2 | 0,0  | -1,1 |
     | S | -1,1 | 1,-1 | 0,0  |

     (15781 Fall 2016: Lecture 22)
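
The indifference requirement can be checked numerically. The sketch below computes the row player's expected utility for each pure strategy against a given column mix, using the row payoffs from the two tables. The names `pure_utilities`, `RPS`, `MODIFIED` and `candidate` are made up for this illustration, and the candidate mix $(\tfrac14, \tfrac14, \tfrac12)$ for the modified game is an assumption worked out by hand, not something stated on the slide.

```python
from fractions import Fraction as F

def pure_utilities(payoff, opponent_mix):
    """Expected utility of each pure row strategy against a column mix."""
    return [sum(a * q for a, q in zip(row, opponent_mix)) for row in payoff]

# Row player's payoffs, strategies ordered R, P, S (from the two tables).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
MODIFIED = [[0, -2, 1], [2, 0, -1], [-1, 1, 0]]

uniform = [F(1, 3)] * 3
# Assumption: candidate computed by hand for this sketch, not given on the slide.
candidate = [F(1, 4), F(1, 4), F(1, 2)]

print(pure_utilities(RPS, uniform))         # equal entries: indifferent
print(pure_utilities(MODIFIED, uniform))    # unequal entries: uniform is ruled out
print(pure_utilities(MODIFIED, candidate))  # equal entries again
```

When all three entries are equal (and, by the zero-sum symmetry argument, equal to zero), the opponent's mix leaves the player indifferent; unequal entries rule the candidate out.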

  3. GAME OF CHICKEN (http://youtu.be/u7hZ9jKrwvo)
     - Each player, in attempting to secure his best outcome, risks the worst.
     - Every player wants to dare, but only if the other chickens out!
     - A mediator would help…

  4. GAME OF CHICKEN

     |         | Dare | Chicken |
     |---------|------|---------|
     | Dare    | 0,0  | 4,1     |
     | Chicken | 1,4  | 3,3     |

     - Social welfare is the sum of utilities.
     - Optimal social welfare = 6.
     - Pure NE: (C,D) and (D,C), social welfare = 5.
     - Mixed NE: both play $(\tfrac12, \tfrac12)$, social welfare = 4.
     - Can we do better? Players are independent so far …
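
The welfare figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the payoff table of this slide; the dictionary `U` and the helper `welfare` are hypothetical names, and the mixed NE is obtained from the usual indifference condition for a symmetric 2x2 game.

```python
from fractions import Fraction as F

# (row, column) payoffs from the table; D = Dare, C = Chicken.
U = {("D", "D"): (0, 0), ("D", "C"): (4, 1),
     ("C", "D"): (1, 4), ("C", "C"): (3, 3)}

def welfare(profile_probs):
    """Expected sum of both utilities under a distribution over profiles."""
    return sum(pr * (U[s][0] + U[s][1]) for s, pr in profile_probs.items())

# Probability q that the opponent dares, chosen so the row player is indifferent:
#   q*u(D,D) + (1-q)*u(D,C) = q*u(C,D) + (1-q)*u(C,C)
a, b = U[("D", "D")][0], U[("D", "C")][0]
c, d = U[("C", "D")][0], U[("C", "C")][0]
q = F(b - d, (b - d) + (c - a))

# Independent play: the joint distribution is the product of the two mixes.
mixed = {(s1, s2): (q if s1 == "D" else 1 - q) * (q if s2 == "D" else 1 - q)
         for s1 in "DC" for s2 in "DC"}

optimal = max(u1 + u2 for u1, u2 in U.values())
pure_ne = welfare({("C", "D"): 1})
mixed_ne = welfare(mixed)
print(q, optimal, pure_ne, mixed_ne)
```

The product form of `mixed` is exactly what "players are independent" means here; a correlated equilibrium drops that restriction.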

  5. CORRELATED EQUILIBRIUM (Robert Aumann, Nobel prize, 2005)
     - A "trusted" authority / mediator chooses a pair of strategies $(s_1, s_2)$ according to a probability distribution $p$ over $S^2$ (it can be generalized to $n$ players).
     - The mediator "flips a coin" / draws according to the distribution $p(s_1, s_2)$ and, based on the outcome, tells the players which pure strategy to use.

  6. CORRELATED EQUILIBRIUM
     - → The trusted party only tells each player what to do; it does not reveal what the other party is supposed to do!
     - The distribution $p$ is known to the players: each player knows the probability of observing a strategy profile and assumes the other player will follow the mediator's instructions.
     - → The posterior conditional probability is known: $\Pr[s_i \mid s_j]$.
     - It is a Correlated Equilibrium (CE) if no player wants to deviate from the trusted party's instructions, such that choices are correlated.
     - → Find a distribution $p$ that guarantees a CE.

  7. CORRELATED EQUILIBRIUM

     |         | Dare | Chicken |
     |---------|------|---------|
     | Dare    | 0,0  | 7,2     |
     | Chicken | 2,7  | 6,6     |

     - Common knowledge: distribution $p$ (is a CE):
       - (D,D): 0
       - (D,C): 1/3
       - (C,D): 1/3
       - (C,C): 1/3
     - If Player 2 is told to play D, then Player 2 knows that the outcome must be (C,D) and that Player 1 will obey the instructions → Player 1 plays C.
     - ✓ Based on this, Player 2 has no incentive to change from playing D, as given.

  8. CORRELATED EQUILIBRIUM (payoffs and distribution $p$ as on slide 7)
     - If Player 2 is told to play C, then Player 2 knows that the outcome must be (D,C) or (C,C) with equal probability.
     - Player 2's expected utility from playing C, conditioned on being told to play C (and on Player 1 obeying instructions), is:
       $\tfrac12 u_2(D,C) + \tfrac12 u_2(C,C) = \tfrac12 \cdot 2 + \tfrac12 \cdot 6 = 4$
     - If Player 2 deviates from the instructions and plays D:
       $u_2 = \tfrac12 u_2(D,D) + \tfrac12 u_2(C,D) = \tfrac12 \cdot 0 + \tfrac12 \cdot 7 = 3.5 < 4$
     - ✓ It's better to follow the instructions!

  9. CORRELATED EQUILIBRIUM (payoffs and distribution $p$ as on slide 7)
     - Player 2 does not have an incentive to deviate.
     - Since the game is symmetric, Player 1 does not have an incentive to deviate either.
     - → Correlated equilibrium!
     - Expected reward per player: (1/3)*7 + (1/3)*2 + (1/3)*6 = 5
     - Mixed strategy NE: expected reward per player is $4\tfrac23 = 14/3 \approx 4.67$, which is < 5
     - Social welfare: 30/3 = 10
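
The obedience argument for Player 2 can be replayed in code. The following is a small sketch under the assumptions of the example (payoffs 0,0 / 7,2 / 2,7 / 6,6 and probability 1/3 on each of (D,C), (C,D), (C,C)); the function name `p2_conditional_utility` is invented for illustration.

```python
from fractions import Fraction as F

# (row, column) payoffs and the mediator's distribution from the example.
U = {("D", "D"): (0, 0), ("D", "C"): (7, 2),
     ("C", "D"): (2, 7), ("C", "C"): (6, 6)}
p = {("D", "D"): F(0), ("D", "C"): F(1, 3),
     ("C", "D"): F(1, 3), ("C", "C"): F(1, 3)}

def p2_conditional_utility(told, played):
    """Player 2's expected utility when told `told` but playing `played`,
    assuming Player 1 obeys the mediator."""
    marginal = sum(p[(s1, told)] for s1 in "DC")  # Pr[Player 2 is told `told`]
    return sum(p[(s1, told)] / marginal * U[(s1, played)][1] for s1 in "DC")

obey_c = p2_conditional_utility("C", "C")      # follow the recommendation C
deviate_c = p2_conditional_utility("C", "D")   # told C, play D instead
obey_d = p2_conditional_utility("D", "D")      # follow the recommendation D
deviate_d = p2_conditional_utility("D", "C")   # told D, play C instead

per_player = sum(p[s] * U[s][1] for s in p)    # Player 2's expected reward
social = sum(p[s] * sum(U[s]) for s in p)      # expected social welfare
print(obey_c, deviate_c, obey_d, deviate_d, per_player, social)
```

Obeying is at least as good as deviating for both recommendations, and by symmetry the same holds for Player 1.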

  10. CORRELATED EQUILIBRIUM
      - Let $N = \{1,2\}$ for simplicity.
      - A mediator chooses a pair of strategies $(s_1, s_2)$ according to a distribution $p$ over $S^2$.
      - Reveals $s_1$ to player 1 and $s_2$ to player 2.
      - When player 1 gets $s_1 \in S$, he knows that the distribution over strategies of player 2 is

        $$\Pr[s_2 \mid s_1] = \frac{\Pr[s_1 \wedge s_2]}{\Pr[s_1]} = \frac{p(s_1, s_2)}{\sum_{s_2' \in S} p(s_1, s_2')}$$

  11. COMPUTING CE STRATEGY
      - Player 1's strategy $s_1$ is a best response if its expected utility cannot be unilaterally improved based on what he knows:

        $$\sum_{s_2 \in S} \Pr[s_2 \mid s_1]\, u_1(s_1, s_2) \;\ge\; \sum_{s_2 \in S} \Pr[s_2 \mid s_1]\, u_1(s_1', s_2), \quad \forall s_1' \in S$$

      - Equivalently, replacing using Bayes' rule:

        $$\sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1, s_2) \;\ge\; \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1', s_2)$$

      - $p$ is a correlated equilibrium (CE) if both players are best responding.
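
The second form of the condition translates directly into a checker. This is a minimal sketch, not a reference implementation: `is_correlated_equilibrium` is a hypothetical helper that takes the joint distribution and both utility tables as dictionaries keyed by strategy profile, and tests the inequality for every recommendation and every alternative, for both players.

```python
from fractions import Fraction as F
from itertools import product

def is_correlated_equilibrium(p, u1, u2, strategies):
    """Check sum_{s2} p(s1,s2) * [u1(s1,s2) - u1(s1',s2)] >= 0 for all s1, s1',
    and the mirrored condition for player 2."""
    for told, alt in product(strategies, repeat=2):
        gain1 = sum(p[(told, s2)] * (u1[(told, s2)] - u1[(alt, s2)])
                    for s2 in strategies)
        gain2 = sum(p[(s1, told)] * (u2[(s1, told)] - u2[(s1, alt)])
                    for s1 in strategies)
        if gain1 < 0 or gain2 < 0:
            return False
    return True

# Usage with the chicken game from the earlier example (D = Dare, C = Chicken).
S = "DC"
u1 = {("D", "D"): 0, ("D", "C"): 7, ("C", "D"): 2, ("C", "C"): 6}
u2 = {("D", "D"): 0, ("D", "C"): 2, ("C", "D"): 7, ("C", "C"): 6}
thirds = {("D", "D"): F(0), ("D", "C"): F(1, 3),
          ("C", "D"): F(1, 3), ("C", "C"): F(1, 3)}
always_cc = {("D", "D"): 0, ("D", "C"): 0, ("C", "D"): 0, ("C", "C"): 1}

print(is_correlated_equilibrium(thirds, u1, u2, S))
print(is_correlated_equilibrium(always_cc, u1, u2, S))
```

Working with the unconditional form avoids dividing by the marginal probability, which may be zero for recommendations that are never issued.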

  12. CE AS LP
      - Can compute CE via linear programming in polynomial time!

        $$\begin{aligned}
        \text{find } \; & p(s_1, s_2) \\
        \text{s.t. } \; & \forall s_1, s_1' \in S: \quad \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1, s_2) \;\ge\; \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1', s_2) \\
        & \forall s_2, s_2' \in S: \quad \sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2) \;\ge\; \sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2') \\
        & \sum_{s_1, s_2 \in S} p(s_1, s_2) = 1 \\
        & \forall s_1, s_2 \in S: \quad p(s_1, s_2) \in [0,1]
        \end{aligned}$$
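
As a worked instance, the constraints can be written out for the chicken game with payoffs 0,0 / 7,2 / 2,7 / 6,6 used in the earlier example. The shorthand $p_{DC} = p(D, C)$ and so on is introduced only for this illustration.

```latex
\begin{aligned}
\text{find } \; & p_{DD},\ p_{DC},\ p_{CD},\ p_{CC} \quad \text{s.t.} \\
& 7\,p_{DC} \;\ge\; 2\,p_{DD} + 6\,p_{DC} && \text{(player 1 told D, considers C)} \\
& 2\,p_{CD} + 6\,p_{CC} \;\ge\; 7\,p_{CC} && \text{(player 1 told C, considers D)} \\
& 7\,p_{CD} \;\ge\; 2\,p_{DD} + 6\,p_{CD} && \text{(player 2 told D, considers C)} \\
& 2\,p_{DC} + 6\,p_{CC} \;\ge\; 7\,p_{CC} && \text{(player 2 told C, considers D)} \\
& p_{DD} + p_{DC} + p_{CD} + p_{CC} = 1, \qquad p_{DD},\, p_{DC},\, p_{CD},\, p_{CC} \in [0,1]
\end{aligned}
```

With $p_{DD} = 0$ and the other three equal to $\tfrac13$, the four inequalities read $\tfrac73 \ge 2$ and $\tfrac83 \ge \tfrac73$ (each twice), so that distribution is feasible.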

  13. BEST WELFARE CE
      - Adding a (linear) objective function $f$, the best correlated equilibrium (e.g., max welfare) can be found:

        $$\begin{aligned}
        \max \; & f\big(p(s_1, s_2);\, u_1, u_2\big) \\
        \text{s.t. } \; & \forall s_1, s_1' \in S: \quad \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1, s_2) \;\ge\; \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1', s_2) \\
        & \forall s_2, s_2' \in S: \quad \sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2) \;\ge\; \sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2') \\
        & \sum_{s_1, s_2 \in S} p(s_1, s_2) = 1 \\
        & \forall s_1, s_2 \in S: \quad p(s_1, s_2) \in [0,1]
        \end{aligned}$$
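
To get a feel for the welfare-maximizing program without an LP solver, the sketch below does an exact brute-force search over a coarse grid of distributions for the chicken game with payoffs 0,0 / 7,2 / 2,7 / 6,6. This is a stand-in for linear programming, not an LP method: it only finds the optimum if it happens to lie on the grid. The grid size `GRID` and all helper names are assumptions made for this illustration.

```python
from fractions import Fraction as F
from itertools import product

S = "DC"  # D = Dare, C = Chicken
U1 = {("D", "D"): 0, ("D", "C"): 7, ("C", "D"): 2, ("C", "C"): 6}
U2 = {(a, b): U1[(b, a)] for (a, b) in U1}  # the game is symmetric

def is_ce(p):
    """True if no player gains by deviating from any recommendation."""
    for told, alt in product(S, repeat=2):
        gain1 = sum(p[(told, s2)] * (U1[(told, s2)] - U1[(alt, s2)]) for s2 in S)
        gain2 = sum(p[(s1, told)] * (U2[(s1, told)] - U2[(s1, alt)]) for s1 in S)
        if gain1 < 0 or gain2 < 0:
            return False
    return True

def welfare(p):
    """Linear objective: expected sum of both players' utilities."""
    return sum(p[s] * (U1[s] + U2[s]) for s in p)

GRID = 12  # assumption: probabilities restricted to multiples of 1/12
profiles = list(product(S, repeat=2))  # (D,D), (D,C), (C,D), (C,C)
best = None
for k in product(range(GRID + 1), repeat=3):
    if sum(k) > GRID:
        continue
    weights = (*k, GRID - sum(k))  # last weight makes the total equal 1
    p = {s: F(w, GRID) for s, w in zip(profiles, weights)}
    if is_ce(p) and (best is None or welfare(p) > welfare(best)):
        best = p

print(best, welfare(best))
```

Under these assumptions the search settles on probability 1/2 for (C,C) and 1/4 each for (D,C) and (C,D), with welfare 21/2, slightly above the welfare of 10 obtained with probability 1/3 on each of the three profiles.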

  14. IMPLEMENTATION OF CE
      - Instead of a mediator, use a hat!
      - Balls in the hat are labeled "chicken" or "dare"; each blindfolded player takes a ball.
      - Which balls implement the distribution $p$ from before?
        1. 1 chicken, 1 dare
        2. 2 chicken, 1 dare
        3. 2 chicken, 2 dare
        4. 3 chicken, 2 dare
      - E.g., an automatic trusted authority can be implemented using cryptographic algorithms.
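
The hat question can be settled by enumeration. The sketch below assumes that the two players draw one ball each without replacement, that every ordered pair of balls is equally likely, and that the target is the distribution with probability 1/3 on each of (D,C), (C,D), (C,C); `hat_distribution` is a hypothetical helper.

```python
from collections import Counter
from fractions import Fraction as F
from itertools import permutations

def hat_distribution(n_chicken, n_dare):
    """Distribution over (player 1's ball, player 2's ball) when two balls
    are drawn without replacement, all ordered pairs being equally likely."""
    balls = ["C"] * n_chicken + ["D"] * n_dare
    draws = list(permutations(balls, 2))  # balls are distinct by position
    counts = Counter(draws)
    return {pair: F(c, len(draws)) for pair, c in counts.items()}

# Target: (D,D) never occurs, the other three profiles have probability 1/3.
target = {("D", "C"): F(1, 3), ("C", "D"): F(1, 3), ("C", "C"): F(1, 3)}

# The four options from the slide as (number of chicken, number of dare).
options = {1: (1, 1), 2: (2, 1), 3: (2, 2), 4: (3, 2)}
matches = [k for k, (c, d) in options.items() if hat_distribution(c, d) == target]
print(matches)
```

Because neither player sees the other's ball, each one learns only their own recommendation, which is the property the mediator was needed for.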

  15. CE VS. NE
      - What is the relation between CE and NE?
        1. CE ⇒ NE
        2. NE ⇒ CE
        3. NE ⇔ CE
        4. NE ∥ CE
      - For any pure strategy NE, there is a corresponding correlated equilibrium yielding the same outcome.
      - For any mixed strategy NE, there is a corresponding correlated equilibrium yielding the same distribution of outcomes.
      - By Nash's theorem, every finite game has a mixed-strategy NE. Since a NE implies a CE, a CE always exists.
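
A short derivation of why a mixed NE yields a CE, sketched for two players. Here $x_1, x_2$ denote the players' equilibrium mixes (notation assumed for this illustration), and the mediator's distribution is taken to be their product.

```latex
\begin{aligned}
p(s_1, s_2) &= x_1(s_1)\, x_2(s_2) \\
\sum_{s_2 \in S} p(s_1, s_2)\,\big[u_1(s_1, s_2) - u_1(s_1', s_2)\big]
  &= x_1(s_1) \Big[ \sum_{s_2 \in S} x_2(s_2)\, u_1(s_1, s_2)
     - \sum_{s_2 \in S} x_2(s_2)\, u_1(s_1', s_2) \Big] \;\ge\; 0
\end{aligned}
```

The inequality holds because either $x_1(s_1) = 0$, or $s_1$ is in the support of $x_1$ and is therefore a best response to $x_2$. The same argument applies to player 2, so the product distribution satisfies every CE constraint.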

  16. A DIFFERENT TYPE OF GAMES: STACKELBERG GAMES

      |   | L   | R   |
      |---|-----|-----|
      | U | 1,1 | 3,0 |
      | D | 0,0 | 2,1 |

      - Playing up is a dominant strategy for the row player.
      - Row player plays up.
      - Column player would then play left.
      - Therefore, (1,1) is the only Nash equilibrium outcome.

  17. COMMITMENT IS GOOD

      |   | L   | R   |
      |---|-----|-----|
      | U | 1,1 | 3,0 |
      | D | 0,0 | 2,1 |

      - Suppose the game is played sequentially as follows:
        - Row player commits to playing a row
        - Column player observes the commitment and chooses a column
      - Row player can commit to playing Down: the Column player will then play Right, and the Row player now gets a better reward!
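
Both claims about this 2x2 game (a single pure Nash equilibrium, and a better outcome for the row player under commitment) can be checked by enumeration. A minimal sketch with hypothetical helper names `pure_nash` and `commit`:

```python
# (row, column) payoffs from the table.
U = {("U", "L"): (1, 1), ("U", "R"): (3, 0),
     ("D", "L"): (0, 0), ("D", "R"): (2, 1)}
ROWS, COLS = "UD", "LR"

def pure_nash():
    """All pure profiles where neither player gains by deviating alone."""
    equilibria = []
    for r in ROWS:
        for c in COLS:
            row_ok = all(U[(r, c)][0] >= U[(r2, c)][0] for r2 in ROWS)
            col_ok = all(U[(r, c)][1] >= U[(r, c2)][1] for c2 in COLS)
            if row_ok and col_ok:
                equilibria.append((r, c))
    return equilibria

def commit(row):
    """Outcome when the row player commits to `row` and the column player
    then picks a best response to it."""
    col = max(COLS, key=lambda c: U[(row, c)][1])
    return (row, col), U[(row, col)]

print(pure_nash())
print(commit("U"), commit("D"))
```

Committing to Down gives the row player 2 instead of the 1 obtained at the Nash equilibrium, even though Down is a dominated strategy in the simultaneous game.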

  18. COMMITMENT TO MIXED STRATEGY

      |                  | L ($\pi_L = 0$) | R ($\pi_R = 1$) |
      |------------------|-----------------|-----------------|
      | U ($\pi_U = 0.49$) | 1,1           | 3,0             |
      | D ($\pi_D = 0.51$) | 0,0           | 2,1             |

      - By committing to a mixed strategy, the row player can do even better and guarantee a reward of almost 2.5: 0.49×3 + 0.51×2
      - Stackelberg strategy (1934)
      - Rooted in duopoly scenarios
      - Player 1 (Leader) moves at the start of the game. Then use backward induction to find the subgame perfect equilibrium.
      - First, for any output of the leader, find the strategy of the Follower that maximizes its payoff (its expected best reply).
      - Next, find the strategy of the leader that maximizes the leader's utility, given the strategy of the follower.
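
The two backward-induction steps can be sketched for this game as a function of the leader's probability of playing U. The helper `leader_value` is hypothetical, and breaking follower ties in the leader's favour is an assumption of this sketch (the slide sidesteps ties by using 0.49 rather than 0.5).

```python
from fractions import Fraction as F

# (row, column) payoffs from the table; the row player is the leader.
U = {("U", "L"): (1, 1), ("U", "R"): (3, 0),
     ("D", "L"): (0, 0), ("D", "R"): (2, 1)}

def leader_value(p_up):
    """Follower's best reply and leader's expected utility when the leader
    commits to playing U with probability p_up."""
    mix = {"U": p_up, "D": 1 - p_up}

    def follower(c):
        return sum(mix[r] * U[(r, c)][1] for r in mix)

    def leader(c):
        return sum(mix[r] * U[(r, c)][0] for r in mix)

    # Step 1: follower best-responds (assumption: ties go the leader's way).
    reply = max("LR", key=lambda c: (follower(c), leader(c)))
    # Step 2: leader's utility given that reply.
    return reply, leader(reply)

for p_up in (F(0), F(49, 100), F(1, 2), F(51, 100), F(1)):
    print(p_up, leader_value(p_up))
```

The leader's utility rises with the probability of U as long as the follower still prefers R, and drops sharply once the follower switches to L, which is why the commitment sits just below one half.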

  19. COMPUTING STACKELBERG
      - Theorem [Conitzer and Sandholm, 2006]: In 2-player normal form games, an optimal Stackelberg strategy can be found in polynomial time.
      - Theorem: The problem is NP-hard when the number of players is ≥ 3.

  20. TRACTABILITY: 2 PLAYERS
      - For each pure strategy $s_2$ of the follower, we compute via the LP below a mixed strategy $x_1$ for the leader such that:
        - Playing $s_2$ is a best response for the follower
        - Under this constraint, $x_1$ is optimal (for the leader)
      - Choose the $x_1^*$ that maximizes the leader's utility.

        $$\begin{aligned}
        \max \; & \sum_{s_1 \in S} x_1(s_1)\, u_1(s_1, s_2) \\
        \text{s.t. } \; & \forall s_2' \in S: \quad \sum_{s_1 \in S} x_1(s_1)\, u_2(s_1, s_2) \;\ge\; \sum_{s_1 \in S} x_1(s_1)\, u_2(s_1, s_2') \\
        & \sum_{s_1 \in S} x_1(s_1) = 1 \\
        & \forall s_1 \in S: \quad x_1(s_1) \in [0,1]
        \end{aligned}$$
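
For a leader with only two pure strategies, each of these LPs has a single free variable, so it can be solved by intersecting intervals instead of calling a solver. The sketch below does this for the U/D versus L/R game used in the commitment example; it illustrates the one-LP-per-follower-strategy idea in that special case and is not a general LP method. `solve_for_follower` and the variable `t` (the probability of U) are names invented here.

```python
from fractions import Fraction as F

# (leader, follower) payoffs; leader picks U or D, follower picks L or R.
U = {("U", "L"): (1, 1), ("U", "R"): (3, 0),
     ("D", "L"): (0, 0), ("D", "R"): (2, 1)}
COLS = "LR"

def solve_for_follower(target):
    """Maximise the leader's utility over t = Pr[U] in [0, 1], subject to
    `target` being a best response for the follower. Constraints and
    objective are linear in t, so an end of the feasible interval is optimal."""
    lo, hi = F(0), F(1)
    for other in COLS:
        # Best-response constraint rewritten as a*t + b >= 0.
        b = U[("D", target)][1] - U[("D", other)][1]
        a = (U[("U", target)][1] - U[("U", other)][1]) - b
        if a > 0:
            lo = max(lo, F(-b, a))
        elif a < 0:
            hi = min(hi, F(-b, a))
        elif b < 0:
            return None  # constraint can never hold
    if lo > hi:
        return None      # empty feasible interval

    def value(t):
        return t * U[("U", target)][0] + (1 - t) * U[("D", target)][0]

    t = max((lo, hi), key=value)
    return t, value(t)

# One program per follower strategy, then keep the best for the leader.
results = {c: solve_for_follower(c) for c in COLS}
best = max((c for c in COLS if results[c] is not None),
           key=lambda c: results[c][1])
print(results, best)
```

The value 5/2 is attained only at the tie point, where the follower is indifferent between L and R; this is consistent with the "almost 2.5" obtained by committing to 0.49/0.51.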

  21. APPLICATION: SECURITY
      - The attacker monitors the defender and tries to maximize damage, while the defender deploys resources to minimize damage based on knowledge of what the attacker would like to obtain.
      - Airport security: deployed at LAX
      - Federal Air Marshals
      - Coast Guard
      - Idea:
        - Defender commits to a mixed strategy
        - Attacker observes and best responds
