CMU-Q 15-381 Lecture 20: Game Theory I Teacher: Gianni A. Di Caro

ICE-CREAM WARS
http://youtu.be/jILgxeNBK_8

GAME THEORY
§ Game theory is the formal study of conflict and cooperation in (rational) multi-agent systems
§ Decision-making where several players must make choices that potentially affect the interests of other players: the effects of the agents' actions are interdependent (and the agents are aware of it)
§ Example: auctioning
§ Psychology: theory of social situations

ELEMENTS OF A GAME
§ The players: how many players are there? Does nature/chance play a role? Players are assumed to be rational
§ A complete description of what the players can do: the set of all possible actions

ELEMENTS OF A GAME
§ A description of the payoff / consequences for each player for every possible combination of actions chosen by all players playing the game
§ A description of all players' preferences over payoffs: a utility function for each player

AGENT DESIGN VS. MECHANISM DESIGN
§ Agent strategy design: game theory can be used to compute the expected utility of each decision and to use this to determine the best strategy (and its expected return) against a rational player. Strategy ≡ Policy
§ System-level mechanism design: define the rules of the game such that the collective utility of the agents is maximized when each agent's strategy is designed to maximize its own utility, as in agent strategy design

MAKING DECISIONS: BASIC DEFINITIONS
§ Decision-making can involve one action or a sequence of actions
§ Action outcomes can be certain or subject to uncertainty
§ A set $A$ of alternative actions to choose from is given; it can be either discrete (finite or countable), e.g. $A = \{a_1, a_2, \dots, a_n\}$, or continuous (infinite), e.g. $A = \{a \mid a \in [0, 10]\}$
§ Strategy (= Policy): tells a player what to do in every possible situation (state) throughout the game (a complete algorithm for playing the game). It can be deterministic or stochastic
§ Strategy set $S$: the set of all strategies available for the players to play. $S$ can be finite or infinite
§ Example: sequential game, one player
o States: $\{1, 2, 3, T\}$
o Actions: $A_1 = \{a_1, a_2\}$, $A_2 = \{b_1, b_2\}$, $A_3 = \{c_1, c_2\}$, $A_T = \emptyset$
o [Figure: decision tree with root state 1 (actions $a_1$, $a_2$), state 2 (actions $b_1$, $b_2$) and state 3 (actions $c_1$, $c_2$)]
o $S = \{a_1 b_1, a_1 b_2, a_2 c_1, a_2 c_2\}$
o E.g. strategy: $s = \{a_1 b_1\}$
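
The strategy set of the sequential example can be enumerated mechanically. The sketch below is only illustrative: the `tree` dictionary and the function name `action_sequences` are assumptions, and the transitions (action a1 leads to state 2, a2 to state 3, every other action to the terminal state T) are read off the flattened figure rather than stated in the slide.

```python
# Assumed transitions, inferred from the tree figure:
# state -> {action: next state}; "T" is the terminal state.
tree = {
    1: {"a1": 2, "a2": 3},
    2: {"b1": "T", "b2": "T"},
    3: {"c1": "T", "c2": "T"},
}

def action_sequences(state=1):
    """List every action sequence leading from `state` to the terminal state."""
    if state == "T":
        return [()]  # nothing left to choose at the terminal state
    return [
        (action,) + rest
        for action, next_state in tree[state].items()
        for rest in action_sequences(next_state)
    ]

for sequence in action_sequences():
    print(" ".join(sequence))
```

Under these assumptions the four sequences produced correspond to the four elements of the strategy set listed on the slide.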

MAKING DECISIONS: BASIC DEFINITIONS
§ One-action (static) games
o States: $\{1, 2, 3, T\}$
o Actions: $A_1 = \{a_1, a_2\}$, $A_2 = \{b_1, b_2\}$, $A_3 = \{c_1, c_2\}$, $A_T = \emptyset$
o $S = \{(1, a_1), (1, a_2), (2, b_1), (2, b_2), (3, c_1), (3, c_2)\}$
o E.g. strategy: $s = \{(1, a_1), (2, b_2), (3, c_1)\}$
§ The strategy defines the behavior of an agent
§ The observed behavior of an agent following a given strategy is the outcome of the strategy
§ Pure strategy: a strategy with no randomization; one specific action from the set $A$ is selected with certainty at each state / decision node
§ The strategy set $S$ is also called the pure strategy set

PAYOFFS AND UTILITIES
§ How do we choose the strategy?
§ Rational agents: Principle of Maximum Expected Utility
§ Payoffs ~ Rewards in MDPs: what results from taking an action
§ Payoff (for a single agent): a function that associates a numerical value with every action in $A$: $\pi : A \to \mathbb{R}$
§ Payoff (for a multi-agent scenario): the payoff of action $a$ for agent $i$ depends on the actions of the other players: $\pi : A \times A \times \cdots \times A \to \mathbb{R}$
§ Utility: it can be any convenient additive function $u$ of the payoffs
§ In the following, the payoffs coincide with the utilities of the agents (this makes sense for the static games that we will consider)
§ Notation: we will use $\pi_i$ and $u_i$ interchangeably

INFORMATION AND TYPES OF GAMES
§ Complete information game: utility functions, payoffs, strategies and "types" of players are common knowledge
§ Incomplete information game: players may not possess full information about their opponents (e.g., in auctions, each player knows its own utility but not that of the other players). "Parameters" of the game are not fully known
§ Perfect information game: each player, when making any decision, is perfectly informed of all the events that have previously occurred (e.g., chess) [Full observability]
§ Imperfect information game: not all information is accessible to the player (e.g., poker, prisoner's dilemma) [Partial observability]

TURN-TAKING VS. SIMULTANEOUS MOVES
§ Static games
o All players take actions "simultaneously" (e.g., Morra) → Imperfect information games
o Complete information
o Single-move games
§ Dynamic games
o Turn-taking games
o Fully observable ↔ Perfect information games
o Complete information
o Repeated moves
[Figure: max/min game tree with leaf values 10, 10, 9, 100]

(STRATEGIC-) NORMAL-FORM GAME
§ Let's focus on static games
§ There is a strategic interaction among players
§ Strategy profile: a set of strategies for all players which fully specifies all actions in a game. It must include one and only one strategy for every player
§ A game in normal form consists of:
o Set of players $N = \{1, \dots, n\}$
o Set of actions available to each player, which defines the strategy set $S = \{s_1, s_2, \dots, s_m\}$
o For each $i \in N$, a utility function $u_i$ defined over the set of all possible strategy profiles: $u_i : S^n \to \mathbb{R}$
§ If each player $j \in N$ plays the strategy $s_j \in S$, the utility of player $i$ is $u_i(s_1, \dots, s_n)$, which is the same as player $i$'s payoff when strategy profile $(s_1, \dots, s_n)$ is chosen
[Figure: payoff matrix in a 2-player game]
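
A normal-form game can be stored directly as the three ingredients listed above. The following is a minimal sketch, not a reference implementation: the strategy names `"x"` and `"y"` and all payoff numbers are made up for illustration, and the helper `utility` is a hypothetical name.

```python
from itertools import product

# Hypothetical two-player game; both players share the same strategy set.
players = [1, 2]
strategies = ["x", "y"]

# A strategy profile holds exactly one strategy per player,
# so the set of all profiles is the n-fold Cartesian product of the strategy set.
profiles = list(product(strategies, repeat=len(players)))

# Illustrative payoffs (invented for this sketch): profile -> (u_1, u_2).
payoffs = {
    ("x", "x"): (2, 2),
    ("x", "y"): (0, 3),
    ("y", "x"): (3, 0),
    ("y", "y"): (1, 1),
}

def utility(i, profile):
    """Utility of player i (1-based) when `profile` is played."""
    return payoffs[profile][i - 1]

for profile in profiles:
    print(profile, [utility(i, profile) for i in players])
```

Storing one tuple of utilities per profile is the dictionary analogue of a payoff matrix: for two players, the first strategy in the profile picks the row and the second picks the column.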
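
THE ICE CREAM WARS
§ $N = \{1, 2\}$
§ $S = [0, 1]$
§ $s_i$ is the fraction of beach
§ …..
§ Utility of player $i$:
$$
u_i(s_i, s_j) =
\begin{cases}
\dfrac{s_i + s_j}{2}, & s_i < s_j \\[4pt]
1 - \dfrac{s_i + s_j}{2}, & s_i > s_j \\[4pt]
\dfrac{1}{2}, & s_i = s_j
\end{cases}
$$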
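
The piecewise utility above is easy to explore numerically. The sketch below assumes, as the video suggests but the slide does not spell out, that each $s$ is a vendor's position on a beach of length 1 and that the utility is the vendor's share of customers. The grid search and the names `utility` and `best_reply` are illustrative choices only.

```python
def utility(s_i, s_j):
    """Utility of the player at s_i when the other player is at s_j."""
    if s_i < s_j:
        return (s_i + s_j) / 2
    if s_i > s_j:
        return 1 - (s_i + s_j) / 2
    return 0.5

# Coarse grid over the strategy set [0, 1] (an approximation of the continuum).
grid = [k / 100 for k in range(101)]

def best_reply(s_j):
    """Grid point that maximizes the utility against an opponent at s_j."""
    return max(grid, key=lambda s_i: utility(s_i, s_j))

print(utility(0.25, 0.5), utility(0.5, 0.25))
print(best_reply(0.5))
```

On this grid, the best reply to an opponent standing at 0.5 is 0.5 itself, which is consistent with both vendors ending up in the middle of the beach.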

THE PRISONER'S DILEMMA (1962)
§ Two men are charged with a crime. The police suspect that they committed it but do not have enough evidence
§ They are taken into custody and can't communicate with each other
§ They are told that:
o If one rats out and the other does not, the rat will be freed and the other jailed for 9 years
o If both rat out, both will be jailed for 6 years
§ They also know that if neither rats out, both will be jailed for 1 year
§ $N = \{1, 2\}$
§ $S = \{\text{Confess}, \text{Don't confess}\}$
§ Strategy profiles: $\{(C, C), (C, D), (D, C), (D, D)\}$
§ $u_A(C, C) = 6$, $u_A(C, D) = 0$, $u_A(D, C) = 9$, $u_A(D, D) = 1$ (values are years in jail, so lower is better)
§ Symmetric for $u_B$

PRISONER'S DILEMMA: PAYOFF MATRIX
§ Don't confess = Don't rat out: cooperate with each other
§ Confess = Rat out: don't cooperate with each other, act selfishly!

                        B: Don't confess    B: Confess
    A: Don't confess        -1, -1            -9, 0
    A: Confess               0, -9            -6, -6

    (each cell lists A's payoff first, then B's)

§ What would you do?

PRISONER'S DILEMMA: PAYOFF MATRIX
§ B doesn't confess:
o If A doesn't confess, B gets -1
o If A confesses, B gets -9
§ B confesses:
o If A doesn't confess, B gets 0
o If A confesses, B gets -6
§ Since 0 > -1 and -6 > -9, B does better by confessing whatever A does
§ Rational agent B opts to Confess

PRISONER'S DILEMMA
§ Confess (Defection = acting selfishly) is a dominant strategy for B: no matter what A plays, B's best reply is always to confess
§ (Strictly) dominant strategy: yields a player a strictly higher payoff than any of its other strategies, regardless of which decision(s) the other player(s) choose
§ Weakly dominant strategy: never yields a lower payoff, but ties in some cases
§ Because of symmetry, Confess is also a dominant strategy for A
§ A will reason as follows: B's dominant strategy is to Confess; therefore, given that we are both rational agents, B will also Confess and we will both get 6 years
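
The dominance argument can be checked by brute force over the payoff matrix. This is a small sketch under stated conventions: "C" stands for Confess and "D" for Don't confess (as in the strategy profiles of the earlier slide), player 0 is A and player 1 is B, and the function names are hypothetical.

```python
# Payoffs (A, B) from the matrix; keys are (A's action, B's action).
# "C" = Confess, "D" = Don't confess.
payoff = {
    ("D", "D"): (-1, -1),
    ("D", "C"): (-9, 0),
    ("C", "D"): (0, -9),
    ("C", "C"): (-6, -6),
}
actions = ["C", "D"]

def payoff_of(player, own, other):
    """Payoff to `player` (0 = A, 1 = B) playing `own` against `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return payoff[profile][player]

def is_strictly_dominant(player, s):
    """True if s beats every alternative against every opponent action."""
    return all(
        payoff_of(player, s, o) > payoff_of(player, alt, o)
        for o in actions
        for alt in actions
        if alt != s
    )

for player, name in enumerate("AB"):
    for s in actions:
        print(name, s, is_strictly_dominant(player, s))
```

With these payoffs the check reports Confess as strictly dominant for both A and B, and Don't confess as dominant for neither.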

PRISONER'S DILEMMA
§ But is the dominant-strategy profile $(C, C)$, with payoffs (-6,-6), the best outcome?

PARETO OPTIMALITY VS. EQUILIBRIA
§ Being selfish is a dominant strategy, but the players can do much better by cooperating: (-1,-1), which is a Pareto-optimal outcome
§ Pareto optimality: an outcome such that no other outcome makes some player better off without making at least one other player worse off → Outcome (Don't confess, Don't confess): (-1,-1)
§ A strategy profile forms an equilibrium if no player can benefit by switching strategies, given that every other player sticks with the same strategy, which is the case for (Confess, Confess)
§ An equilibrium is a local optimum in the space of strategies
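
Both definitions can be tested exhaustively on the prisoner's dilemma matrix. The sketch below uses "C" for Confess and "D" for Don't confess, with profiles written as (A's action, B's action); the function names are illustrative, and the equilibrium test only considers unilateral deviations to pure strategies.

```python
from itertools import product

# Payoffs (A, B) from the prisoner's dilemma matrix.
payoff = {
    ("D", "D"): (-1, -1),
    ("D", "C"): (-9, 0),
    ("C", "D"): (0, -9),
    ("C", "C"): (-6, -6),
}
actions = ["C", "D"]
profiles = list(product(actions, repeat=2))

def is_equilibrium(profile):
    """No player gains by switching alone while the other stays put."""
    a, b = profile
    a_ok = all(payoff[(a, b)][0] >= payoff[(alt, b)][0] for alt in actions)
    b_ok = all(payoff[(a, b)][1] >= payoff[(a, alt)][1] for alt in actions)
    return a_ok and b_ok

def is_pareto_optimal(profile):
    """No other profile is at least as good for all and better for someone."""
    u = payoff[profile]
    for other in profiles:
        v = payoff[other]
        no_one_worse = all(x >= y for x, y in zip(v, u))
        someone_better = any(x > y for x, y in zip(v, u))
        if no_one_worse and someone_better:
            return False
    return True

print([p for p in profiles if is_equilibrium(p)])     # [('C', 'C')]
print([p for p in profiles if is_pareto_optimal(p)])  # [('C', 'D'), ('D', 'C'), ('D', 'D')]
```

The only equilibrium, (Confess, Confess), is also the only profile that is not Pareto optimal. Under the definition as stated, the two one-sided outcomes (0,-9) and (-9,0) also pass the Pareto test, since improving the jailed player's payoff necessarily lowers the freed player's.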