
  1. PENG Session 2. Roland Mühlenbernd, Seminar für Sprachwissenschaft, University of Tübingen

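  2. Review
     Prominent 2-player games (each cell lists the row player's payoff, then the column player's):

     Prisoner's Dilemma (C: Cooperate, D: Defect)
              C      D
         C   3,3    0,5
         D   5,0    1,1

     Stag Hunt (S: Stag, R: Rabbit)
              S      R
         S   2,2    0,1
         R   1,0    1,1

     Bach or Stravinsky (B: Bach, S: Stravinsky)
              B      S
         B   2,1    0,0
         S   0,0    1,2

     Signaling game: SG = ⟨{S, R}, T, Pr, M, A, U⟩
     ◮ Nature (N) draws state t_1 or t_2, each with probability .5
     ◮ The sender S observes the state and sends message m_1 or m_2
     ◮ The receiver R observes the message and chooses action a_1 or a_2
     ◮ The payoff is 1 if a_1 is chosen in t_1 or a_2 in t_2, and 0 otherwise, whichever message was sent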
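The signaling game above can be made concrete with a small sketch. It assumes pure strategies encoded as dictionaries; the names `expected_utility`, `distinct_messages`, `same_message` and `receiver` are illustrative and do not come from the slides.

```python
# Prior over states as on the slide: each state has probability .5.
PRIOR = {"t1": 0.5, "t2": 0.5}

def utility(state, action):
    """1 if the action index matches the state index (a1 in t1, a2 in t2), else 0."""
    return 1 if state[1:] == action[1:] else 0

def expected_utility(sender, receiver):
    """Expected utility of a pure sender strategy (state -> message)
    combined with a pure receiver strategy (message -> action)."""
    return sum(p * utility(t, receiver[sender[t]]) for t, p in PRIOR.items())

# Hypothetical example strategies (not from the slides).
distinct_messages = {"t1": "m1", "t2": "m2"}  # one message per state
same_message = {"t1": "m1", "t2": "m1"}       # always sends m1
receiver = {"m1": "a1", "m2": "a2"}

print(expected_utility(distinct_messages, receiver))  # 1.0
print(expected_utility(same_message, receiver))       # 0.5
```

When the sender uses a different message for each state, the receiver can always pick the matching action; when the message carries no information, the receiver is right only half the time.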

  3. Repeated Games: Decisions
     ◮ From an agent’s perspective, a game is a decision problem
     ◮ The agent has to decide between different moves (e.g. cooperate or defect, m_1 or m_2, a_1 or a_2)
     ◮ An agent’s decision can be guided by
        ◮ update dynamics
        ◮ learning dynamics
        ◮ reasoning
        ◮ beliefs about the other participant
        ◮ best response
        ◮ imitation
        ◮ chance

  4. Repeated Games: Update Dynamics
     ◮ Learning Dynamics: Collecting information from previous encounters
     ◮ Reasoning: Forward induction (a_i expects that a_j expects that ... plays defect)
     ◮ Best response: a_i plays the move that maximizes its utility, given what it knows or believes about the opponent’s move
     ◮ Imitate the Best: Play the move that yielded the highest utility among all neighbours in the last round
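Two of these update rules, best response and imitate the best, can be sketched as follows. This is a minimal illustration using the Prisoner's Dilemma payoffs from the slides; the function names and the data layout (a belief as a probability dictionary, neighbours as `(move, utility)` pairs) are assumptions made for this example.

```python
# Row player's payoffs in the Prisoner's Dilemma: (own move, opponent move) -> utility
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
MOVES = ("C", "D")

def best_response(belief):
    """Return the move with the highest expected utility, given a belief
    that maps each opponent move to its probability."""
    def expected(move):
        return sum(prob * PAYOFF[(move, opp)] for opp, prob in belief.items())
    return max(MOVES, key=expected)

def imitate_the_best(neighbours):
    """Return the move of the neighbour with the highest utility in the last round.
    `neighbours` is a list of (move, utility) pairs."""
    best_move, _ = max(neighbours, key=lambda pair: pair[1])
    return best_move

print(best_response({"C": 0.9, "D": 0.1}))                # D
print(imitate_the_best([("C", 3), ("D", 1), ("C", 0)]))   # C
```

In a single round of the Prisoner's Dilemma, defection is the best response to any belief, which is why the repeated setting and the other update rules matter.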

  5. Repeated Games: The Evolution of Cooperation
     Robert Axelrod’s computer tournament (1979):

              C      D
         C   3;3    0;5
         D   5;0    1;1
     Table: Prisoner’s Dilemma

     ◮ Finding the best strategy for the Iterated Prisoner’s Dilemma (IPD)
     ◮ Game theorists were invited to submit their favourite strategy (decision rule)
     ◮ All submitted strategies play against each other for 200 rounds
     ◮ The strategy with the highest average score wins the tournament
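The tournament procedure can be sketched as a round robin. This is a simplified illustration, not Axelrod's actual setup: the two entries `always_cooperate` and `always_defect` are hypothetical placeholders, and the sketch assumes each entry meets every other entry exactly once (the slide does not say whether entries also played against themselves).

```python
from itertools import combinations

# Payoffs from the slide: (move of A, move of B) -> (score of A, score of B)
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# A strategy maps (own history, opponent history) to the next move.
def always_cooperate(own_history, opp_history):
    return "C"

def always_defect(own_history, opp_history):
    return "D"

def play_match(strategy_a, strategy_b, rounds=200):
    """Play one iterated match and return both total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

def tournament(strategies, rounds=200):
    """Round robin over all pairs; return each entry's average score per match."""
    totals = {name: 0 for name in strategies}
    for name_a, name_b in combinations(strategies, 2):
        score_a, score_b = play_match(strategies[name_a], strategies[name_b], rounds)
        totals[name_a] += score_a
        totals[name_b] += score_b
    matches = len(strategies) - 1  # opponents faced by each entry
    return {name: total / matches for name, total in totals.items()}

entries = {"ALWAYS_C": always_cooperate, "ALWAYS_D": always_defect}
print(tournament(entries))  # {'ALWAYS_C': 0.0, 'ALWAYS_D': 1000.0}
```

With only these two entries, unconditional defection wins trivially; the ranking changes once entries that react to the opponent's history are added to `entries`.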

  6. Repeated Games: The Evolution of Cooperation
     ◮ TIT FOR TAT: Cooperate in the first round, then do what your opponent did in the last round
     ◮ FRIEDMAN: Cooperate until the opponent defects, then defect all the time
     ◮ DOWNING:
        ◮ Estimate the probabilities $p_1 = P(C^O_t \mid C^I_{t-1})$ and $p_2 = P(C^O_t \mid D^I_{t-1})$, i.e. how likely the opponent (O) is to cooperate in round t after the agent itself (I) cooperated or defected in round t−1
        ◮ If $p_1 \gg p_2$, the opponent is responsive: Cooperate
        ◮ Else the opponent is not responsive: Defect
     ◮ TRANQUILIZER:
        ◮ Cooperate for the first moves and observe the opponent’s responses
        ◮ If a pattern of mutual cooperation arises: Defect from time to time
        ◮ If the opponent continues cooperating, defections become more frequent
     ◮ TIT FOR 2 TATS: Play TIT FOR TAT, but respond with defection only if the opponent defected on both of the previous two moves
     ◮ JOSS: Play TIT FOR TAT, but respond with defection to 10% of the opponent’s cooperative moves
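Four of these decision rules are simple enough to sketch directly. The code below is an illustration based only on the slide descriptions, not the original tournament submissions; the function signatures, the `play` and `score` helpers, and the injected random generator for JOSS are assumptions made for this example.

```python
import random

# Row player's payoffs: (own move, opponent move) -> utility
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Every strategy maps (own history, opponent history) to the next move.
def tit_for_tat(own, opp):
    # Cooperate first, then copy the opponent's last move.
    return opp[-1] if opp else "C"

def friedman(own, opp):
    # Cooperate until the opponent has defected once, then defect forever.
    return "D" if "D" in opp else "C"

def tit_for_two_tats(own, opp):
    # Defect only if the opponent defected on both of the previous two moves.
    return "D" if opp[-2:] == ["D", "D"] else "C"

def make_joss(rng, defect_rate=0.1):
    # TIT FOR TAT that answers a cooperative move with defection
    # with probability `defect_rate` (10% on the slide).
    def joss(own, opp):
        if not opp:
            return "C"
        if opp[-1] == "D":
            return "D"
        return "D" if rng.random() < defect_rate else "C"
    return joss

def play(strategy_a, strategy_b, rounds=200):
    """Return the move histories of both players after `rounds` rounds."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
    return hist_a, hist_b

def score(own, opp):
    """Total utility of the player with history `own` against `opp`."""
    return sum(PAYOFF[(a, b)] for a, b in zip(own, opp))

hist_a, hist_b = play(tit_for_tat, friedman)
print(score(hist_a, hist_b))  # 600: both cooperate in all 200 rounds

joss = make_joss(random.Random(0))
hist_a, hist_b = play(tit_for_tat, joss)
print(score(hist_a, hist_b), score(hist_b, hist_a))
```

DOWNING and TRANQUILIZER are left out because the slide does not specify their thresholds or schedules.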

  7. Repeated Games: The Evolution of Cooperation
     Results:
     1. The winner was TIT FOR TAT with 504 points
     2. Success in such a game correlated with the following characteristics:
        ◮ Be nice: cooperate, never be the first to defect.
        ◮ Be provocable: return defection for defection, cooperation for cooperation.
        ◮ Don’t be envious: be fair with your partner.
        ◮ Don’t be too clever: don’t try to be tricky.

  8. Homework
     ◮ Model your own PD-Agent
     ◮ Check it out: www.pgrim.org/pragmatics
