LECTURE 6: MULTIAGENT INTERACTIONS
An Introduction to MultiAgent Systems
http://www.csc.liv.ac.uk/~mjw/pubs/imas
What are Multiagent Systems?

A multiagent system contains a number of agents:
- which interact through communication,
- are able to act in an environment,
- have different "spheres of influence" (which may coincide),
- and will be linked by other (organizational) relationships.

Utilities and Preferences

- Assume we have just two agents: Ag = {i, j}
- Agents are assumed to be self-interested: they have preferences over how the environment is
- Assume Ω = {ω1, ω2, …} is the set of "outcomes" that agents have preferences over
- We capture preferences by utility functions:
    u_i : Ω → ℝ
    u_j : Ω → ℝ
- Utility functions lead to preference orderings over outcomes:
    ω ⪰_i ω′ means u_i(ω) ≥ u_i(ω′)
    ω ≻_i ω′ means u_i(ω) > u_i(ω′)

What is Utility?

- Utility is not money (but it is a useful analogy)
- Typical relationship between utility and money: (graph on slide omitted)

Multiagent Encounters

- We need a model of the environment in which these agents will act…
- Agents simultaneously choose an action to perform, and as a result of the actions they select, an outcome in Ω will result
- The actual outcome depends on the combination of actions
- Assume each agent has just two possible actions that it can perform, C ("cooperate") and D ("defect")
- Environment behavior is given by a state transformer function:
    τ : Ac × Ac → Ω
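The definitions above can be made concrete in a few lines of code. This is a minimal sketch: the outcome names (w1..w4) and the utility values are illustrative assumptions of mine, not part of the lecture's definitions.

```python
# Outcomes are plain labels; each agent's utility function is a dict
# mapping an outcome to a real number. The outcome names and utility
# values below are illustrative assumptions, not from the lecture.

u_i = {"w1": 1, "w2": 1, "w3": 4, "w4": 4}  # agent i's utility function
u_j = {"w1": 1, "w2": 4, "w3": 1, "w4": 4}  # agent j's utility function

def weakly_prefers(u, w, w_prime):
    """w is weakly preferred to w'  iff  u(w) >= u(w')."""
    return u[w] >= u[w_prime]

def strictly_prefers(u, w, w_prime):
    """w is strictly preferred to w'  iff  u(w) > u(w')."""
    return u[w] > u[w_prime]

# A state transformer function for the two-action case: it maps the
# pair (i's action, j's action) to the outcome that results.
tau = {
    ("D", "D"): "w1", ("D", "C"): "w2",
    ("C", "D"): "w3", ("C", "C"): "w4",
}
```

Representing τ as a dict makes the "environment sensitive to both agents" case explicit: every distinct action pair maps to a distinct outcome.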

Multiagent Encounters (continued)

- Here is a state transformer function:
    τ(D,D) = ω1   τ(D,C) = ω2   τ(C,D) = ω3   τ(C,C) = ω4
  (This environment is sensitive to the actions of both agents.)
- Here is another:
    τ(D,D) = ω1   τ(D,C) = ω1   τ(C,D) = ω1   τ(C,C) = ω1
  (Neither agent has any influence in this environment.)
- And here is another:
    τ(D,D) = ω1   τ(D,C) = ω2   τ(C,D) = ω1   τ(C,C) = ω2
  (This environment is controlled by j.)

Rational Action

- Suppose we have the case where both agents can influence the outcome, and they have utility functions as follows:
    u_i(ω1) = 1   u_i(ω2) = 1   u_i(ω3) = 4   u_i(ω4) = 4
    u_j(ω1) = 1   u_j(ω2) = 4   u_j(ω3) = 1   u_j(ω4) = 4
- With a bit of abuse of notation:
    u_i(D,D) = 1   u_i(D,C) = 1   u_i(C,D) = 4   u_i(C,C) = 4
    u_j(D,D) = 1   u_j(D,C) = 4   u_j(C,D) = 1   u_j(C,C) = 4
- Then agent i's preferences are:
    C,C ⪰_i C,D ≻_i D,C ⪰_i D,D
- "C" is the rational choice for i, because i prefers all outcomes that arise through C over all outcomes that arise through D.

Payoff Matrices

- We can characterize the previous scenario in a payoff matrix:

                    i defects    i cooperates
    j defects         1, 1          4, 1
    j cooperates      1, 4          4, 4
    (each cell lists i's payoff first, then j's)

- Agent i is the column player; agent j is the row player.

Dominant Strategies

- Given any particular strategy s (either C or D) of agent i, there will be a number of possible outcomes
- We say s1 dominates s2 if every outcome possible by i playing s1 is preferred over every outcome possible by i playing s2
- A rational agent will never play a dominated strategy
- So in deciding what to do, we can delete dominated strategies
- Unfortunately, there isn't always a unique undominated strategy

Competitive and Zero-Sum Interactions

- Where the preferences of agents are diametrically opposed, we have strictly competitive scenarios
- Zero-sum encounters are those where utilities sum to zero:
    u_i(ω) + u_j(ω) = 0 for all ω ∈ Ω
- Zero sum implies strictly competitive
- Zero-sum encounters in real life are very rare… but people tend to act in many scenarios as if they were zero sum

Nash Equilibrium

- In general, we will say that two strategies s1 and s2 are in Nash equilibrium if:
  1. under the assumption that agent i plays s1, agent j can do no better than play s2; and
  2. under the assumption that agent j plays s2, agent i can do no better than play s1.
- Neither agent has any incentive to deviate from a Nash equilibrium
- Unfortunately:
  1. Not every interaction scenario has a Nash equilibrium
  2. Some interaction scenarios have more than one Nash equilibrium
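Both definitions, dominance and pure-strategy Nash equilibrium, can be checked mechanically for a two-action game. A hedged sketch: the payoff numbers reuse the cooperation-dominates example from the payoff-matrix discussion, and the names `pay_i`/`pay_j` are my own choice.

```python
# Two agents, two actions. Payoffs are keyed by (i's action, j's action);
# the values reproduce the example in which cooperating earns 4 and
# defecting earns 1, whatever the other agent does.

ACTIONS = ("C", "D")
pay_i = {("C", "C"): 4, ("C", "D"): 4, ("D", "C"): 1, ("D", "D"): 1}
pay_j = {("C", "C"): 4, ("C", "D"): 1, ("D", "C"): 4, ("D", "D"): 1}

def dominates(pay, s1, s2):
    """s1 dominates s2 (for the first-position agent): every outcome
    possible by playing s1 beats every outcome possible by playing s2."""
    return min(pay[(s1, b)] for b in ACTIONS) > max(pay[(s2, b)] for b in ACTIONS)

def nash_equilibria():
    """All pure-strategy profiles where neither agent can gain by a
    unilateral deviation."""
    eqs = []
    for a in ACTIONS:
        for b in ACTIONS:
            i_happy = all(pay_i[(a, b)] >= pay_i[(a2, b)] for a2 in ACTIONS)
            j_happy = all(pay_j[(a, b)] >= pay_j[(a, b2)] for b2 in ACTIONS)
            if i_happy and j_happy:
                eqs.append((a, b))
    return eqs
```

Here C strictly dominates D for both agents, so deleting dominated strategies leaves only (C, C), which is also the unique Nash equilibrium.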

The Prisoner's Dilemma

- Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating. They are told that:
  - if one confesses and the other does not, the confessor will be freed, and the other will be jailed for three years;
  - if both confess, then each will be jailed for two years.
- Both prisoners know that if neither confesses, then they will each be jailed for one year.

- Payoff matrix for the prisoner's dilemma:

                    i defects    i cooperates
    j defects         2, 2          1, 4
    j cooperates      4, 1          3, 3
    (each cell lists i's payoff first, then j's)

- Top left: if both defect, then both get the punishment for mutual defection
- Top right: if i cooperates and j defects, i gets the sucker's payoff of 1, while j gets 4
- Bottom left: if j cooperates and i defects, j gets the sucker's payoff of 1, while i gets 4
- Bottom right: the reward for mutual cooperation

- The individual rational action is defect: it guarantees a payoff of no worse than 2, whereas cooperating guarantees a payoff of no better than 1 in the worst case
- So defection is the best response to all possible strategies: both agents defect, and get payoff = 2
- But intuition says this is not the best outcome: surely they should both cooperate and each get a payoff of 3!

- This apparent paradox is the fundamental problem of multi-agent interactions. It appears to imply that cooperation will not occur in societies of self-interested agents.
- Real-world examples:
  - nuclear arms reduction ("why don't I keep mine…")
  - free-rider systems: public transport
  - in the UK: television licenses
- The prisoner's dilemma is ubiquitous.
- Can we recover cooperation?

Arguments for Recovering Cooperation

- Conclusions that some have drawn from this analysis:
  - the game-theoretic notion of rational action is wrong!
  - somehow the dilemma is being formulated wrongly
- Arguments to recover cooperation:
  - We are not all Machiavelli! (Hurrah!)
  - The other prisoner is my twin!
  - The shadow of the future…

The Iterated Prisoner's Dilemma

- One answer: play the game more than once
- If you know you will be meeting your opponent again, then the incentive to defect appears to evaporate
- Cooperation is the rational choice in the infinitely repeated prisoner's dilemma
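The claim that defection is the best response to every strategy can be verified directly from the payoff matrix. A small sketch using the payoff values stated above (1 = sucker's payoff, 2 = mutual defection, 3 = mutual cooperation, 4 = free rider):

```python
# Payoff matrix keyed by (i's action, j's action); each value is the
# pair (i's payoff, j's payoff), using the numbers from the slides.
PD = {
    ("C", "C"): (3, 3),  # reward for mutual cooperation
    ("C", "D"): (1, 4),  # i gets the sucker's payoff
    ("D", "C"): (4, 1),  # j gets the sucker's payoff
    ("D", "D"): (2, 2),  # punishment for mutual defection
}

def best_response_i(j_action):
    """i's payoff-maximizing reply to a fixed choice by j."""
    return max(("C", "D"), key=lambda a: PD[(a, j_action)][0])

# Defection is the best response whatever j does...
assert best_response_i("C") == "D"
assert best_response_i("D") == "D"
# ...yet when both agents play it, the result is (2, 2), although the
# outcome (3, 3) was available: the dilemma.
```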

Backwards Induction

- But… suppose you both know that you will play the game exactly n times. On round n − 1, you have an incentive to defect, to gain that extra bit of payoff… But this makes round n − 2 the last "real" round, and so you have an incentive to defect there, too. This is the backwards induction problem.
- Playing the prisoner's dilemma with a fixed, finite, pre-determined, commonly known number of rounds, defection is the best strategy.

Axelrod's Tournament

- Suppose you play the iterated prisoner's dilemma against a range of opponents… What strategy should you choose, so as to maximize your overall payoff?
- Axelrod (1984) investigated this problem with a computer tournament for programs playing the prisoner's dilemma.

Strategies in Axelrod's Tournament

- ALLD: "always defect", the hawk strategy
- TIT-FOR-TAT:
  - On round u = 0, cooperate
  - On round u > 0, do what your opponent did on round u − 1
- TESTER: on the first round, defect. If the opponent retaliated, then play TIT-FOR-TAT. Otherwise intersperse cooperation and defection.
- JOSS: as TIT-FOR-TAT, except periodically defect

Recipes for Success in Axelrod's Tournament

Axelrod suggests the following rules for succeeding in his tournament:
1. Don't be envious: don't play as if it were zero sum!
2. Be nice: start by cooperating, and reciprocate cooperation
3. Retaliate appropriately: always punish defection immediately, but use "measured" force; don't overdo it
4. Don't hold grudges: always reciprocate cooperation immediately

Game of Chicken

- Consider another type of encounter, the game of chicken. (Think of James Dean in Rebel Without a Cause: swerving = cooperate, driving straight = defect.)
- Difference from the prisoner's dilemma: mutual defection is the most feared outcome. (Whereas the sucker's payoff is most feared in the prisoner's dilemma.)
- Strategies (C,D) and (D,C) are in Nash equilibrium.

Other Symmetric 2 × 2 Games

- Given the 4 possible outcomes of (symmetric) cooperate/defect games, there are 24 possible orderings on outcomes, including:
  - CC ≻_i CD ≻_i DC ≻_i DD : cooperation dominates
  - DC ≻_i DD ≻_i CC ≻_i CD : deadlock (you will always do best by defecting)
  - DC ≻_i CC ≻_i DD ≻_i CD : prisoner's dilemma
  - DC ≻_i CC ≻_i CD ≻_i DD : chicken
  - CC ≻_i DC ≻_i DD ≻_i CD : stag hunt
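A round-robin match in the spirit of Axelrod's tournament can be simulated in a few lines. This is only a sketch, not Axelrod's actual tournament code: strategies are modeled as functions from the opponent's move history to "C" or "D", using the prisoner's-dilemma payoff values from earlier.

```python
# ALLD and TIT-FOR-TAT follow the definitions in the lecture; the
# payoff values are the prisoner's-dilemma numbers used earlier.

PAYOFF = {("C", "C"): 3, ("C", "D"): 1, ("D", "C"): 4, ("D", "D"): 2}

def alld(opponent_history):
    return "D"                     # "always defect": the hawk strategy

def tit_for_tat(opponent_history):
    if not opponent_history:       # round u = 0: cooperate
        return "C"
    return opponent_history[-1]    # round u > 0: echo round u - 1

def play(strategy_a, strategy_b, rounds=10):
    """Total payoffs for a and b over an iterated game."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)   # each strategy sees the other's history
        move_b = strategy_b(hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

# Two cooperators score 3 per round; against ALLD, TIT-FOR-TAT pays the
# sucker's payoff once and then mutual defection follows:
# play(tit_for_tat, tit_for_tat) -> (30, 30)
# play(tit_for_tat, alld)        -> (19, 22)
```

Note that TIT-FOR-TAT never outscores any single opponent head-to-head; it won Axelrod's tournament on total payoff across many pairings, which is exactly the "don't be envious" point above.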
