CS 188: Artificial Intelligence
Spring 2006
Lecture 26: Game Theory
4/25/2006
Dan Klein – UC Berkeley

Game Theory
- Game theory: study of strategic situations, usually with simultaneous actions
- A game has:
  - Players
  - Actions
  - Payoff matrix
- Example: prisoner's dilemma

Prisoner's Dilemma (each cell: A's payoff, B's payoff)
              A: Testify   A: Refuse
B: Testify    -5,-5        -10,0
B: Refuse     0,-10        -1,-1

Strategies
- Strategy = policy
- Pure strategy
  - Deterministic policy
  - In a one-move game, just a move
- Mixed strategy
  - Randomized policy
  - Is it ever good to use one?
- Strategy profile: a specification of one strategy per player
- Outcome: each strategy profile results in an (expected) payoff for each player

Two-Finger Morra (each cell: O's payoff, E's payoff)
          O: One   O: Two
E: One    -2,2     3,-3
E: Two    3,-3     -4,4

Dominance and Optimality
- Strategy dominance:
  - A strategy s for A (strictly) dominates s' if it produces a better outcome for A, for any B strategy
- Outcome dominance:
  - An outcome o Pareto dominates o' if all players prefer o to o'
  - An outcome is Pareto optimal if there is no outcome that all players would prefer

Equilibria
- In the prisoner's dilemma:
  - What will A do?
  - What will B do?
  - What's the dilemma?
- Both testifying is a (Nash) equilibrium
  - Neither player can benefit from a unilateral change in strategy
  - I.e., it's a local optimum (not necessarily global)
- Nash showed that every finite game has at least one such equilibrium (possibly in mixed strategies)
  - Note: not every game has a dominant strategy equilibrium
- What do we have to change for the prisoners to refuse?
  - Change the payoffs
  - Consider repeated games
  - Limit the computational ability of the agents
  - How would we model a "code of thieves"?

Coordination Games
- No dominant strategy
- But, two (pure) Nash equilibria
- What should agents do?
  - Can sometimes choose the Pareto optimal Nash equilibrium
  - But there may be ties!
  - Naturally gives rise to communication
  - Also: correlated equilibria

Technology Choice
            A: DVD   A: HD-DVD
B: DVD      5,5      -2,-1
B: HD-DVD   -2,-1    8,8

Driving Direction
           A: Left   A: Right
B: Left    1,1       -1,-1
B: Right   -1,-1     1,1
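The dominance, equilibrium and Pareto definitions above can be checked mechanically for small payoff tables. The sketch below is illustrative only: the dictionary encoding, the function names (`strictly_dominant`, `pure_nash`, `pareto_optimal`) and the player numbering are assumptions, not something from the lecture. It reads each cell as (column player's payoff, row player's payoff), which is the order the slide tables use, and it applies the slide's strict version of Pareto dominance ("all players prefer").

```python
from itertools import product

# Assumed encoding: game[(column action, row action)] = (column payoff, row payoff).
# Player 0 = column player (A, or O in Morra); player 1 = row player (B, or E).
PRISONERS = {
    ("Testify", "Testify"): (-5, -5),
    ("Refuse", "Testify"): (-10, 0),
    ("Testify", "Refuse"): (0, -10),
    ("Refuse", "Refuse"): (-1, -1),
}
TECHNOLOGY = {
    ("DVD", "DVD"): (5, 5),
    ("HD-DVD", "DVD"): (-2, -1),
    ("DVD", "HD-DVD"): (-2, -1),
    ("HD-DVD", "HD-DVD"): (8, 8),
}
DRIVING = {
    ("Left", "Left"): (1, 1),
    ("Right", "Left"): (-1, -1),
    ("Left", "Right"): (-1, -1),
    ("Right", "Right"): (1, 1),
}
MORRA = {  # O is the column player, E the row player
    ("One", "One"): (-2, 2),
    ("Two", "One"): (3, -3),
    ("One", "Two"): (3, -3),
    ("Two", "Two"): (-4, 4),
}


def actions(game, player):
    """Actions available to `player`, in first-seen order."""
    return list(dict.fromkeys(key[player] for key in game))


def payoff(game, player, own, other):
    """Payoff to `player` when it plays `own` and the opponent plays `other`."""
    key = (own, other) if player == 0 else (other, own)
    return game[key][player]


def strictly_dominant(game, player):
    """Return a strictly dominant action for `player`, or None if there is none."""
    own, other = actions(game, player), actions(game, 1 - player)
    for s in own:
        if all(payoff(game, player, s, o) > payoff(game, player, t, o)
               for t in own if t != s for o in other):
            return s
    return None


def pure_nash(game):
    """Pure profiles where neither player gains from a unilateral change."""
    cols, rows = actions(game, 0), actions(game, 1)
    return [
        (c, r) for c, r in product(cols, rows)
        if game[(c, r)][0] >= max(game[(t, r)][0] for t in cols)
        and game[(c, r)][1] >= max(game[(c, t)][1] for t in rows)
    ]


def pareto_optimal(game):
    """Outcomes for which no other outcome is strictly preferred by both players."""
    return [
        k for k in game
        if not any(all(game[j][i] > game[k][i] for i in (0, 1)) for j in game)
    ]


if __name__ == "__main__":
    for name, g in [("Prisoner's Dilemma", PRISONERS), ("Technology Choice", TECHNOLOGY),
                    ("Driving Direction", DRIVING), ("Two-Finger Morra", MORRA)]:
        print(name, strictly_dominant(g, 0), pure_nash(g), pareto_optimal(g))
```

On these tables the sketch finds Testify strictly dominant for both prisoners, with (Testify, Testify) as the only pure equilibrium even though it is not Pareto optimal. Technology Choice has two pure equilibria, of which only (HD-DVD, HD-DVD) is Pareto optimal. Driving Direction has two equilibria that tie, and Morra has no pure equilibrium at all.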
Mixed Strategy Games (Zero-Sum)
- What's the Nash equilibrium of Two-Finger Morra?
  - No pure strategy equilibrium
  - Must look at mixed strategies
- Mixed strategies:
  - Distribution over actions per state
  - In a one-move game, a single distribution
  - For Morra, a single number p specifies the strategy
- How to choose the optimal mixed strategy?

Two-Finger Morra (each cell: O's payoff, E's payoff)
          O: One   O: Two
E: One    -2,2     3,-3
E: Two    3,-3     -4,4

Minimax Strategies
- Idea: force one player to choose and declare a strategy first
- Say E reveals first
  - For each E strategy, O has a minimax response
  - Utility of the root favors O (why?) and is -3 (from E's perspective)
- If O goes first, the root is 2 (for E)
- If these two utilities matched, we would know the utility of the equilibrium
- They don't (-3 vs. 2), so we must look at mixed strategies

[Figure: two minimax trees over the moves 1 and 2, with leaves 2, -3, -3, 4 (E's payoff); the root is -3 when E moves first and 2 when O moves first]

Continuous Minimax
- Imagine a minimax tree:
  - Instead of the two pure strategies, the first player has infinitely many mixed ones
  - Note that the second player should always respond with a pure strategy (why?)
- Here, we can calculate the minimax (and maximin) values
  - Both are 1/12 (from O's perspective), i.e., -1/12 for E
  - They correspond to [7/12: 1, 5/12: 2] for both players

[Figure: E's root move is the mixed strategy [p: 1, (1-p): 2]; O's response 1 gives E (2)(p) + (-3)(1-p), and O's response 2 gives E (-3)(p) + (4)(1-p)]

Repeated Games
- What about repeated games?
  - E.g. repeated prisoner's dilemma
  - Future responses and retaliation become an issue
  - A strategy can condition on past experience
- Repeated prisoner's dilemma
  - A fixed, known number of games causes repeated betrayal
  - If agents are unsure of the number of future games, there are other options
  - E.g. perpetual punishment: stay silent until you're betrayed, then testify thereafter
  - E.g. tit-for-tat: do what was done to you last round
  - It's enough for your opponent to believe you are incapable of remembering the number of games played (it doesn't actually matter whether the limitation really exists)

Partially Observed Games
- Much harder to analyze
  - You have to work with trees of belief states
  - Problem: you don't know your opponent's belief state!
- Newer techniques can solve some partially observable games
  - Mini-poker analysis shows, e.g., that bluffing can be a rational action
- Randomization: not just for being unpredictable, also useful for minimizing what the opponent can learn from your actions

The Ultimatum Game
- Game theory can reveal interesting issues in social psychology
- E.g. the ultimatum game
  - Proposer: receives $x, offers split $k / $(x-k)
  - Accepter: either
    - Accepts: gets $k, proposer gets $(x-k)
    - Rejects: neither gets anything
- Nash equilibrium?
  - Any strategy profile where the proposer offers $k and the accepter will accept $k or greater
  - But that's not the interesting part…
- Issues:
  - Why do people tend to reject offers which are very unfair (e.g. $20 from $100)?
  - Irrationality?
  - Utility of $20 exceeded by utility of punishing the unfair proposer?
  - What about if x is very, very large?
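The Two-Finger Morra numbers from the Minimax Strategies and Continuous Minimax slides can be reproduced exactly with rational arithmetic. The sketch below is illustrative only: the matrix `U` and the function names are assumptions for this example. It works from E's payoffs alone, which is enough because the game is zero-sum. The second mover always answers with a pure action. Against a fixed mix, the responder's expected payoff is linear in its own mixing probability, so some pure action is always at least as good as any mixture. Each mover's payoff is therefore the lower (or upper) envelope of two lines, and the optimum lies at an endpoint of [0, 1] or at the point where the lines cross.

```python
from fractions import Fraction

# E's payoffs in Two-Finger Morra, U[e][o]; index 0 = "one finger", 1 = "two fingers".
# These are the second numbers in the slide's cells; O's payoff is the negation.
U = [[Fraction(2), Fraction(-3)],
     [Fraction(-3), Fraction(4)]]


def pure_first_mover_values(u):
    """Root values (for E) when one player must declare a pure strategy first."""
    e_first = max(min(row) for row in u)                             # O sees E's move
    o_first = min(max(u[e][o] for e in range(2)) for o in range(2))  # E sees O's move
    return e_first, o_first


def _candidates(lines):
    """Endpoints of [0, 1] plus the crossing point of two lines a*x + b, if inside."""
    xs = [Fraction(0), Fraction(1)]
    (a1, b1), (a2, b2) = lines
    if a1 != a2:
        x = (b2 - b1) / (a1 - a2)
        if 0 <= x <= 1:
            xs.append(x)
    return xs


def maximin_for_e(u):
    """E mixes [p: one, (1-p): two]; O answers with a pure best response."""
    # E's expected payoff against O's pure action o, written as a*p + b.
    lines = [(u[0][o] - u[1][o], u[1][o]) for o in range(2)]
    p = max(_candidates(lines), key=lambda x: min(a * x + b for a, b in lines))
    return p, min(a * p + b for a, b in lines)


def minimax_for_o(u):
    """O mixes [q: one, (1-q): two]; E answers with a pure best response."""
    # E's expected payoff when E plays pure action e, written as a*q + b.
    lines = [(u[e][0] - u[e][1], u[e][1]) for e in range(2)]
    q = min(_candidates(lines), key=lambda x: max(a * x + b for a, b in lines))
    return q, max(a * q + b for a, b in lines)


if __name__ == "__main__":
    print(*pure_first_mover_values(U))  # -3 2
    print(*maximin_for_e(U))            # 7/12 -1/12
    print(*minimax_for_o(U))            # 7/12 -1/12
```

With pure strategies, the first-mover values are -3 and 2 for E. With mixing, both players put probability 7/12 on one finger, and the maximin and minimax values coincide at -1/12 for E (1/12 for O). For 2x2 games this endpoint-or-crossing search is sufficient; larger games would normally be solved as a linear program.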
Mechanism Design
- One use of game theory: mechanism design
  - Designing a game which induces desired behavior in rational agents
  - E.g. avoiding tragedies of the commons
- Classic example: farmers share a common pasture
  - Each chooses how many goats to graze
  - Adding a goat gains utility for that farmer
  - Adding a goat slightly degrades the pasture
  - Inevitable that each farmer will keep adding goats until the commons is destroyed (tragedy!)
- Classic solution: charge for use of the commons
  - Prices need to be set to produce the right behavior

Auctions
- Example: auctions
- Consider an auction for one item
  - Each bidder i has value v_i and bids b_i for the item
- English auction: increasing bids
  - How should bidder i bid?
  - What will the winner pay?
  - Why is this not an optimal result?
- Sealed single-bid auction, highest bidder pays their bid
  - How should bidder i bid?
  - Why is bidding your value no longer dominant?
  - Why is this auction not optimal?
- Sealed single-bid second-price auction
  - How should bidder i bid?
  - Bid v_i – why?
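The two sealed-bid formats above can be compared by brute force from bidder i's point of view. This is a minimal sketch under stated assumptions. The function names, the integer grid of values and bids, and the rule that a bid tied with the highest competing bid loses are all hypothetical choices for illustration. Opponents are summarised by their highest bid, since that is the only thing that affects bidder i's payoff in either format.

```python
# Sealed-bid, single-item auction from bidder i's point of view.
# Opponents are summarised by their highest competing bid.
# Hypothetical tie rule for this sketch: a bid equal to the highest competing bid loses.


def utility(value, bid, highest_other, rule):
    """Bidder i's payoff: value minus price if i wins, else 0."""
    if bid <= highest_other:
        return 0
    price = bid if rule == "first-price" else highest_other  # otherwise "second-price"
    return value - price


def truthful_is_dominant(rule, grid=range(11)):
    """Check on an integer grid that bidding v_i is never worse than any other bid."""
    return all(
        utility(v, v, m, rule) >= utility(v, b, m, rule)
        for v in grid for m in grid for b in grid
    )


def best_bid(value, highest_other, rule, grid=range(11)):
    """Best bid on the grid if the highest competing bid were known (ties: lowest bid)."""
    return max(grid, key=lambda b: utility(value, b, highest_other, rule))


if __name__ == "__main__":
    print(truthful_is_dominant("second-price"))  # True
    print(truthful_is_dominant("first-price"))   # False
    print(best_bid(5, 2, "first-price"))         # 3: shade the bid below the value
```

In the second-price format the price does not depend on bidder i's own bid, so raising or lowering the bid only changes whether i wins. Bidding v_i wins exactly when winning is profitable. In the first-price format, bidding v_i always yields zero surplus, so bidding just above the competition does better whenever it is enough to win.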