Prisoner’s Dilemma


  1. CS 331: Artificial Intelligence
     Game Theory I

     Prisoner’s Dilemma

     You and your partner have both been caught red-handed near the scene of a burglary. Both of you have been brought to the police station, where you are interrogated separately by the police.

     The police present your options:
     1. You can testify against your partner
     2. You can refuse to testify against your partner (and keep your mouth shut)

     Here are the consequences of your actions:
     • If you testify against your partner and your partner refuses, you are released and your partner will serve 10 years in jail
     • If you refuse and your partner testifies against you, you will serve 10 years in jail and your partner is released
     • If both of you testify against each other, both of you will serve 5 years in jail
     • If both of you refuse, both of you will only serve 1 year in jail

     • Your partner is offered the same deal
     • Remember that you can’t communicate with your partner and you don’t know what he/she will do
     • Will you testify or refuse?

     Game Theory
     • Welcome to the world of Game Theory!
     • Game Theory defined as “the study of rational decision-making in situations of conflict and/or cooperation”
     • Adversarial search is part of Game Theory
     • We will now look at a much broader group of games

  2. Types of games we will deal with today
     • Two players
     • Discrete, finite action space
     • Simultaneous moves (or without knowledge of the other player’s move)
     • Imperfect information
     • Zero sum games and non-zero sum games

     Uses of Game Theory
     • Agent design: determine the best strategy against a rational player and the expected return for each player
     • Mechanism design: Define the rules of the game to influence the behavior of the agents
     Real world applications: negotiations, bandwidth sharing, auctions, bankruptcy proceedings, pricing decisions

     Back to Prisoner’s Dilemma
     Normal-form (or matrix-form) representation
     Players: Alice, Bob
     Actions: testify, refuse

                      Bob: testify       Bob: refuse
     Alice: testify   A = -5, B = -5     A = 0, B = -10
     Alice: refuse    A = -10, B = 0     A = -1, B = -1

     Payoffs for each player (non-zero sum game in this example)

     Formal definition of Normal Form
     The normal-form representation of an n-player game specifies:
     • The players’ strategy spaces $S_1, \dots, S_n$
     • Their payoff functions $u_1, \dots, u_n$, where $u_i : S_1 \times S_2 \times \dots \times S_n \to \mathbb{R}$, i.e. a function that maps from the combination of strategies of all the players and returns the payoff for player i

     Strategies
     • Each player must adopt and execute a strategy
     • Strategy = policy i.e. mapping from state to action
     • Prisoner’s Dilemma is a one move game:
       – Strategy is a single action
       – There is only a single state
     • A pure strategy is a deterministic policy

     Other Normal Form Games
     The game of chicken: two cars drive at each other on a narrow road. The first one to swerve loses.

                  B: Stay              B: Swerve
     A: Stay      A = -100, B = -100   A = 1, B = -1
     A: Swerve    A = -1, B = 1        A = 0, B = 0
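The normal-form definition above can be made concrete with a small sketch. This is only an illustration, not part of the original slides: the names `PAYOFFS`, `payoff`, and `is_zero_sum` are hypothetical, and the dictionary layout is just one convenient way to store the Prisoner’s Dilemma matrix so that `payoff(i, profile)` plays the role of the payoff function $u_i$.

```python
# Normal-form Prisoner's Dilemma, stored as
#   PAYOFFS[(alice_action, bob_action)] = (payoff_alice, payoff_bob)
# Payoffs are negated years in jail, copied from the slide's matrix.
# All names here are illustrative assumptions, not from the slides.
ACTIONS = ("testify", "refuse")

PAYOFFS = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}


def payoff(player, profile):
    """Return u_i(profile) for player 0 (Alice) or player 1 (Bob)."""
    return PAYOFFS[profile][player]


def is_zero_sum(payoffs):
    """A game is zero sum if the payoffs in every cell add up to zero."""
    return all(sum(cell) == 0 for cell in payoffs.values())


if __name__ == "__main__":
    print(payoff(0, ("testify", "refuse")))  # Alice's payoff when only she testifies
    print(is_zero_sum(PAYOFFS))              # whether the game is zero sum
```

A strategy profile is just a tuple with one action per player, which matches the domain $S_1 \times S_2$ of the payoff functions in the two-player case.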

  3. Prisoner’s Dilemma Strategy

                      Bob: testify       Bob: refuse
     Alice: testify   A = -5, B = -5     A = 0, B = -10
     Alice: refuse    A = -10, B = 0     A = -1, B = -1

     • What is the right pure strategy for Alice or Bob?
     • (Assume both want to maximize their own expected utility)

     Other Normal Form Games
     Penalty kick in Soccer: Shooter vs. Goalie. The shooter shoots the ball either to the left or to the right. The goalie dives either left or right. If it’s the same side as the ball was shot, the goalie makes the save. Otherwise, the shooter scores.

                       Goalie: Left       Goalie: Right
     Shooter: Left     S = -1, G = 1      S = 1, G = -1
     Shooter: Right    S = 1, G = -1      S = -1, G = 1

     Prisoner’s Dilemma Strategy
     Alice thinks:
     • If Bob testifies, I get 5 years if I testify and 10 years if I don’t
     • If Bob doesn’t testify, I get 0 years if I testify and 1 year if I don’t
     • “Alright I’ll testify”

     Testify is a dominant strategy for the game (notice how the payoffs for Alice are always bigger if she testifies than if she refuses)

     Dominant Strategies
     Suppose a player has two strategies S and S’. We say S dominates S’ if choosing S always yields at least as good an outcome as choosing S’.
     • S strictly dominates S’ if choosing S always gives a better outcome than choosing S’ (no matter what the other player does)
     • S weakly dominates S’ if there is at least one set of opponent’s actions for which S is superior, and all other sets of opponent’s actions give S and S’ the same payoff.

     Example of Dominant Strategies

     “testify” strictly dominates “refuse”:

                      Bob: testify       Bob: refuse
     Alice: testify   A = -5, B = -5     A = 0, B = -10
     Alice: refuse    A = -10, B = 0     A = -1, B = -1

     “testify” weakly dominates “refuse”:

                      Bob: testify       Bob: refuse
     Alice: testify   A = -5, B = -5     A = 0, B = -10
     Alice: refuse    A = -10, B = 0     A = 0, B = -1
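The two dominance definitions can be checked mechanically against the two example matrices. The following is a minimal sketch under stated assumptions: the function name `dominates`, the dictionary layout, and the choice to test only the row player (Alice) are illustrative and not taken from the slides. Weak dominance is implemented as "never worse, and strictly better against at least one opponent action".

```python
# Payoff tables as {(alice_action, bob_action): (payoff_alice, payoff_bob)}.
# PD is the original matrix; PD_WEAK is the variant from the weak-dominance
# example, where (refuse, refuse) gives Alice 0 instead of -1.
ACTIONS = ("testify", "refuse")

PD = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}

PD_WEAK = dict(PD)
PD_WEAK[("refuse", "refuse")] = (0, -1)


def dominates(payoffs, s, s_prime, opponent_actions, strict=True):
    """Does row action `s` dominate row action `s_prime` for the row player?

    strict=True : s is better against every opponent action.
    strict=False: s is never worse and better against at least one action.
    """
    diffs = [
        payoffs[(s, b)][0] - payoffs[(s_prime, b)][0]
        for b in opponent_actions
    ]
    if strict:
        return all(d > 0 for d in diffs)
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


if __name__ == "__main__":
    print(dominates(PD, "testify", "refuse", ACTIONS))                     # strict, original game
    print(dominates(PD_WEAK, "testify", "refuse", ACTIONS))                # strict, modified game
    print(dominates(PD_WEAK, "testify", "refuse", ACTIONS, strict=False))  # weak, modified game
```

In the modified matrix Alice gets 0 either way when Bob refuses, so the strict test fails there while the weak test still holds.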

  4. Dominated Strategies (The opposite)
     S is dominated by S’ if choosing S never gives a better outcome than choosing S’, no matter what the other players do
     • S is strictly dominated by S’ if choosing S always gives a worse outcome than choosing S’, no matter what the other player does
     • S is weakly dominated by S’ if there is at least one set of opponent’s actions for which S gives a worse outcome than S’, and all other sets of opponent’s actions give S and S’ the same payoff.

     Dominance
     • It is irrational not to play a strictly dominant strategy (if it exists)
     • It is irrational to play a strictly dominated strategy
     • Since Game Theory assumes players are rational, they will not play strictly dominated strategies

     Iterated Elimination of Strictly Dominated Strategies

                      Bob: testify       Bob: refuse
     Alice: testify   A = -5, B = -5     A = 0, B = -10
     Alice: refuse    A = -10, B = 0     A = -1, B = -1

     “refuse” is a strictly dominated strategy for Alice, so the game simplifies to:

                      Bob: testify       Bob: refuse
     Alice: testify   A = -5, B = -5     A = 0, B = -10

     But in this simplified game, “refuse” is also a strictly dominated strategy for Bob, so the game simplifies to:

                      Bob: testify
     Alice: testify   A = -5, B = -5

     This is the game-theoretic solution to Prisoner’s Dilemma (note that both players are worse off than if both refuse)

     Dominant Strategy Equilibrium
     • (testify,testify) is a dominant strategy equilibrium
     • It’s an equilibrium because no player can benefit by switching strategies given that the other player sticks with the same strategy
     • An equilibrium is a local optimum in the space of policies
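The elimination steps above can be sketched as a short loop. This is an illustrative sketch, not the course’s implementation: the function names and data layout are assumptions, it handles two-player games only, and it considers domination by pure strategies only.

```python
# Payoffs as {(row_action, col_action): (payoff_row, payoff_col)}.
PD = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}


def strictly_dominated(payoffs, player, s, own_actions, opp_actions):
    """True if another remaining action of `player` beats `s`
    against every remaining opponent action."""
    def u(mine, theirs):
        # Build the (row, col) profile from this player's point of view.
        profile = (mine, theirs) if player == 0 else (theirs, mine)
        return payoffs[profile][player]

    return any(
        all(u(t, o) > u(s, o) for o in opp_actions)
        for t in own_actions
        if t != s
    )


def iterated_elimination(payoffs, row_actions, col_actions):
    """Repeatedly delete strictly dominated actions until none remain."""
    rows, cols = list(row_actions), list(col_actions)
    changed = True
    while changed:
        changed = False
        for s in list(rows):
            if strictly_dominated(payoffs, 0, s, rows, cols):
                rows.remove(s)
                changed = True
        for s in list(cols):
            if strictly_dominated(payoffs, 1, s, cols, rows):
                cols.remove(s)
                changed = True
    return rows, cols


if __name__ == "__main__":
    actions = ("testify", "refuse")
    print(iterated_elimination(PD, actions, actions))
```

An action is removed only while the action that dominates it is still present, so each player always keeps at least one action and the loop terminates.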

  5. Pareto Optimal
     • An outcome is Pareto optimal if there is no other outcome that all players would prefer
     • An outcome is Pareto dominated by another outcome if all players would prefer the other outcome
     • If Alice and Bob both testify, this outcome is Pareto dominated by the outcome if they both refuse.
     • This is why it’s called Prisoner’s Dilemma

     Iterated Prisoner’s Dilemma
     • Possible to arrive at the Pareto optimal solution
     • Strategies for repeated game:
       – Perpetual punishment: refuse unless opponent has ever played testify
       – Tit-for-tat: start with refuse; then play the opponent’s previous move
     • This situation arose in trench warfare in WWI (see The Evolution of Cooperation by Robert Axelrod for more)

     What If No Strategies Are Strictly Dominated?

              B: S1          B: S2          B: S3
     A: S1    A = 0, B = 4   A = 4, B = 0   A = 5, B = 3
     A: S2    A = 4, B = 0   A = 0, B = 4   A = 5, B = 3
     A: S3    A = 3, B = 5   A = 3, B = 5   A = 6, B = 6

     How do we find these equilibrium points in the game?

     Nash Equilibrium
     • A dominant strategy equilibrium is a special case of a Nash Equilibrium
     • Nash Equilibrium: A strategy profile in which no player wants to deviate from his or her strategy.
     • Strategy profile: An assignment of a strategy to each player e.g. (testify, testify) in Prisoner’s Dilemma
     • Any Nash Equilibrium will survive iterated elimination of strictly dominated strategies

     Nash Equilibrium in Prisoner’s Dilemma

                      Bob: testify       Bob: refuse
     Alice: testify   A = -5, B = -5     A = 0, B = -10
     Alice: refuse    A = -10, B = 0     A = -1, B = -1

     If (testify,testify) is a Nash Equilibrium, then:
     • Alice doesn’t want to change her strategy of “testify” given that Bob chooses “testify”
     • Bob doesn’t want to change his strategy of “testify” given that Alice chooses “testify”

     How to Spot a Nash Equilibrium

              B: S1          B: S2          B: S3
     A: S1    A = 0, B = 4   A = 4, B = 0   A = 5, B = 3
     A: S2    A = 4, B = 0   A = 0, B = 4   A = 5, B = 3
     A: S3    A = 3, B = 5   A = 3, B = 5   A = 6, B = 6
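One way to spot pure-strategy Nash equilibria in a small matrix game is to test every cell against the "no player wants to deviate" condition. The sketch below is an assumption-laden illustration rather than the method from the lecture: the function name `pure_nash_equilibria` and the dictionary layout are hypothetical, it covers two-player games only, and it does not look for mixed-strategy equilibria.

```python
# Payoffs as {(a_action, b_action): (payoff_a, payoff_b)}.
GAME_3X3 = {
    ("S1", "S1"): (0, 4), ("S1", "S2"): (4, 0), ("S1", "S3"): (5, 3),
    ("S2", "S1"): (4, 0), ("S2", "S2"): (0, 4), ("S2", "S3"): (5, 3),
    ("S3", "S1"): (3, 5), ("S3", "S2"): (3, 5), ("S3", "S3"): (6, 6),
}

PD = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}


def pure_nash_equilibria(payoffs, row_actions, col_actions):
    """Return every profile where neither player gains by deviating alone."""
    equilibria = []
    for r in row_actions:
        for c in col_actions:
            u_row, u_col = payoffs[(r, c)]
            # Row player: no other row does better against column c.
            row_ok = all(payoffs[(r2, c)][0] <= u_row for r2 in row_actions)
            # Column player: no other column does better against row r.
            col_ok = all(payoffs[(r, c2)][1] <= u_col for c2 in col_actions)
            if row_ok and col_ok:
                equilibria.append((r, c))
    return equilibria


if __name__ == "__main__":
    s = ("S1", "S2", "S3")
    print(pure_nash_equilibria(GAME_3X3, s, s))
    acts = ("testify", "refuse")
    print(pure_nash_equilibria(PD, acts, acts))
```

For the 3x3 game this check singles out (S3, S3), and for Prisoner’s Dilemma it singles out (testify, testify), consistent with the dominant strategy equilibrium found by elimination.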
