
Multi-agent learning: Repeated games. Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Friday 3rd May, 2019.


Plan for today

■ NE in normal form games that are repeated a finite number of times.
  ● Principle of backward induction.
■ NE in normal form games that are repeated an indefinite number of times.
  ● Discount factor. Models the probability of continuation.
  ● Folk theorem. (Actually many FTs.) Repeated games generally do have infinitely many Nash equilibria.
  ● Trigger strategy. "On-path" vs. "off-path" play, "minmax" as a threat.

This presentation draws heavily on (Peters, 2008).

* H. Peters (2008): Game Theory: A Multi-Leveled Approach. Springer, ISBN: 978-3-540-69290-4. Ch. 8: Repeated games.


Part I: Nash equilibria in normal form games that are repeated a finite number of times


Nash equilibria in playing the PD twice

Prisoners' Dilemma:

                   Other:
                   Cooperate   Defect
You:  Cooperate    (3, 3)      (0, 5)
      Defect       (5, 0)      (1, 1)

■ Even if mixed strategies are allowed, the PD possesses one Nash equilibrium, viz. (D, D) with payoffs (1, 1).
■ This equilibrium is Pareto sub-optimal.
■ Does the situation change if two parties get to play the Prisoners' Dilemma two times in succession?
■ The following diagram (hopefully) shows that playing the PD two times in succession does not yield an essentially new NE.

Nash equilibria in playing the PD twice

[Payoff tree: the first level branches on the round-1 action profile (CC, CD, DC, DD), the second level on the round-2 action profile; each leaf shows the payoffs accumulated over both rounds, e.g. CC followed by CD gives (3, 8), and DD followed by DD gives (2, 2).]

P.S. This is just a payoff tree, not a game in extensive form!


Nash equilibria in playing the PD twice

In normal form:

                Other:
                CC        CD        DC        DD
You:  CC      (6, 6)    (3, 8)    (3, 8)    (0, 10)
      CD      (8, 3)    (4, 4)    (5, 5)    (1, 6)
      DC      (8, 3)    (5, 5)    (4, 4)    (1, 6)
      DD      (10, 0)   (6, 1)    (6, 1)    (2, 2)

■ The action profile (DD, DD) is the only Nash equilibrium.
■ With 3 successive games, we obtain a 2^3 × 2^3 matrix, where the action profile (DDD, DDD) would still be the only Nash equilibrium.
■ Generalise to N repetitions: (DD^(N−1), DD^(N−1)), i.e. both players defecting in every round, is still the only Nash equilibrium in a repeated game where the PD is played N times in succession.
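A quick way to sanity-check these claims is to rebuild the payoff matrix of the N-fold repetition from the stage game and test every pure (open-loop) strategy profile for profitable unilateral deviations, mirroring the 2^N × 2^N matrix above. The sketch below is only illustrative and not part of the slides; names such as repeated_payoff and pure_nash are made up for this example.

```python
from itertools import product

# Stage-game payoffs (You, Other) for the Prisoners' Dilemma used in the slides.
STAGE = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def repeated_payoff(plan_you, plan_other):
    """Sum the stage payoffs over a fixed sequence of rounds."""
    total_you = total_other = 0
    for a, b in zip(plan_you, plan_other):
        u_you, u_other = STAGE[(a, b)]
        total_you += u_you
        total_other += u_other
    return total_you, total_other

def pure_nash(n_rounds):
    """All pure (open-loop) Nash equilibria of the n-fold repetition."""
    plans = ["".join(p) for p in product("CD", repeat=n_rounds)]
    equilibria = []
    for you, other in product(plans, repeat=2):
        u_you, u_other = repeated_payoff(you, other)
        if all(repeated_payoff(dev, other)[0] <= u_you for dev in plans) and \
           all(repeated_payoff(you, dev)[1] <= u_other for dev in plans):
            equilibria.append((you, other))
    return equilibria

print(pure_nash(2))  # expected: [('DD', 'DD')]
print(pure_nash(3))  # expected: [('DDD', 'DDD')]
```

The plans here are fixed action sequences, matching the matrix on the slide; the backward-induction argument mentioned in the plan covers history-contingent strategies as well.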


Part II: Nash equilibria in normal form games that are repeated an indefinite number of times


Terminology: finite, indefinite, infinite

To repeat an experiment . . .
■ . . . ten times. That's hopefully clear.
■ . . . a finite number of times. May mean: a fixed number of times, where the number of repetitions is determined beforehand. Or it may mean: an indefinite number of times. Depends on the context!
■ . . . an indefinite number of times. Means: a finite number of times, but nothing is known beforehand about the number of repetitions.
■ . . . an infinite number of times. When throwing a die this must mean a countably infinite number of times.


Indefinite number of repetitions

■ A Pareto-suboptimal outcome can be avoided in case the following three conditions are met.
  1. The Prisoners' Dilemma is repeated an indefinite number of times (rounds).
  2. A so-called discount factor δ ∈ [0, 1] determines the probability of continuing the game after each round.
  3. The probability to continue, δ, is large enough.
■ Under these conditions suddenly infinitely many Nash equilibria exist. This is sometimes called an embarrassment of richness (Peters, 2008).
■ Various Folk Theorems state the existence of multiple equilibria in games that are repeated an indefinite number of times.
■ Here we discuss one version of "the" Folk Theorem.


Family of Folk Theorems

There actually exist many Folk Theorems.
■ Horizon. The game may be repeated indefinitely (present case) or there may be an upper bound to the number of plays.
■ Information. Players may act on the basis of CKR (present case), or certain parts of the history may be hidden.
■ Reward. Players may collect their payoff through a discount factor (present case) or through average rewards.
■ Subgame perfectness. Subgame perfect equilibria (present case) or plain Nash equilibria.
■ Equilibrium. We may be interested in Nash equilibria (present case), or other types of equilibria, such as so-called ε-Nash equilibria or so-called correlated equilibria.


The concept of a repeated game

■ Let G be a game in normal form.
■ The repeated game G*(δ) is G, played an indefinite number of times, where δ represents the probability that the game will be played another time. Exercise: give P{G*(δ) lasts at least t rounds}. Answer: δ^t.
■ G is called the stage game.
■ A history h of length t of a repeated game is a sequence of action profiles of length t. Example (for the Prisoner's Dilemma):

    Round:          0 1 2 3 4 5 6 7 8 9
    Row player:     C D D D C C D D D D
    Column player:  C D D D D D D C D D
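The exercise can be checked by simulation: if after every round the game continues with probability δ, then the probability that at least t continuations occur (i.e. that round t, counting from 0, is still played) is δ^t. A minimal sketch, not part of the slides; game_length and prob_reach_round are made-up names:

```python
import random

def game_length(delta):
    """Rounds played when, after each round, the game continues
    with probability delta (round 0 is always played)."""
    rounds = 1
    while random.random() < delta:
        rounds += 1
    return rounds

def prob_reach_round(t, delta, trials=100_000):
    """Empirical probability that round t (counting from 0) is played,
    i.e. that at least t continuations occur."""
    return sum(game_length(delta) > t for _ in range(trials)) / trials

delta = 0.9
for t in (1, 5, 10):
    print(t, prob_reach_round(t, delta), delta ** t)  # empirical vs. delta^t
```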


The concept of a repeated game (II)

■ The set of all possible histories (of any length) is denoted by H.
■ A strategy for Player i is a function s_i : H → Δ{C, D} such that
    Pr(Player i plays C in round |h| + 1 | h) = s_i(h)(C).
■ A strategy profile s is a combination of strategies, one for each player.
■ The expected payoff for player i given s can be computed. It is
    Expected payoff_i(s) = ∑_{t=0}^{∞} δ^t · Expected payoff_{i,t}(s).
  Example below.


The concept of a repeated game (II)

Repeated from the previous slide: the expected payoff for player i given s can be computed. It is

    Expected payoff_i(s) = ∑_{t=0}^{∞} δ^t · Expected payoff_{i,t}(s).

Example: Prisoner's Dilemma,
  strategy of Player 1 is s_1 = "always cooperate 80%";
  strategy of Player 2 is s_2 = "always cooperate 70%";
  δ = 1/2.

    Expected payoff_1(s) = ∑_{t=0}^{∞} (1/2)^t · [0.8 (0.7 · 3 + 0.3 · 0) + 0.2 (0.7 · 5 + 0.3 · 1)]
                         = 1/(1 − 1/2) × 2.44
                         = 2 × 2.44 = 4.88.
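To make the arithmetic concrete, here is a small check of the example, not part of the slides, computed two ways: via the geometric sum, and via simulation of the continuation process, whose expected total payoff coincides with the δ-discounted sum because round t is reached with probability δ^t. It assumes the slides' payoffs and independent mixing each round; the names are illustrative.

```python
import random

# Stage-game payoffs to (Player 1, Player 2).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

P_COOP_1, P_COOP_2, DELTA = 0.8, 0.7, 0.5

# Closed form: per-round expectation times the geometric sum 1 / (1 - delta).
per_round = sum(PAYOFF[(a, b)][0]
                * (P_COOP_1 if a == "C" else 1 - P_COOP_1)
                * (P_COOP_2 if b == "C" else 1 - P_COOP_2)
                for a in "CD" for b in "CD")
print(per_round, per_round / (1 - DELTA))  # 2.44 and 4.88

# Monte Carlo: play repeated games that continue with probability DELTA
# after each round and average Player 1's total (undiscounted) payoff.
def simulate(trials=200_000):
    total = 0.0
    for _ in range(trials):
        while True:
            a = "C" if random.random() < P_COOP_1 else "D"
            b = "C" if random.random() < P_COOP_2 else "D"
            total += PAYOFF[(a, b)][0]
            if random.random() >= DELTA:  # game ends with probability 1 - DELTA
                break
    return total / trials

print(simulate())  # close to 4.88
```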


Subgame perfect equilibria of G*(δ): D*

Definition. A strategy profile s for G*(δ) is a subgame-perfect Nash equilibrium if (1) it is a Nash equilibrium of this repeated game, and (2) for every subgame (i.e., tail game) of this repeated game, s restricted to that subgame is a Nash equilibrium as well.

Consider the strategy of iterated defection D*: "always defect, no matter what". [1]

Claim. The strategy profile (D*, D*) is a subgame perfect equilibrium in G*(δ).

Proof. Consider any tail game starting at round t ≥ 0. We are done if we can show that (D*, D*) is a NE for this subgame. This is true: given that one player always defects, it never pays off for the other player to play C at any time. Therefore, everyone sticks to D*.

[1] A notation like D* or (worse) D^∞ is suggestive. Mathematically it makes no sense, but intuitively it does.
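The step in the proof can be illustrated numerically: against an opponent who always defects, playing C in any round yields 0 instead of 1 in that round and has no effect on future play, so it only lowers the deviator's discounted payoff. A minimal sketch, assuming the slides' payoffs (the helper name is hypothetical):

```python
def payoff_vs_always_defect(p, delta):
    """Discounted payoff of a player who cooperates with probability p in
    every round while the opponent plays D* (always defect)."""
    # Stage payoff each round: p * 0 (C vs D) + (1 - p) * 1 (D vs D).
    return ((1 - p) * 1) / (1 - delta)

delta = 0.9
for p in (0.0, 0.25, 0.5, 1.0):
    print(p, payoff_vs_always_defect(p, delta))
# The payoff is maximal at p = 0: never cooperating (D*) is a best reply,
# and the same holds in every tail game, so (D*, D*) is subgame perfect.
```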

Part III: Trigger strategies


Cost of deviating in Round N

Consider the so-called trigger strategy T: "always play C unless D has been played at least once. In that case play D forever".

Claim. The strategy profile (T, T) is a subgame perfect equilibrium in G*(δ), provided the probability of continuation, δ, is sufficiently large.

Proof. Suppose one player starts to defect at Round N.
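The transcript breaks off here, but the standard calculation behind the claim can be sketched for the payoffs used in these slides (this is an illustration, not the slides' own continuation). From Round N onwards, sticking to C earns 3 per round, while deviating earns 5 once and then 1 in every later round, so cooperation is weakly better when 3/(1 − δ) ≥ 5 + δ/(1 − δ), i.e. δ ≥ 1/2. A small numerical check:

```python
def stick_with_T(delta):
    """Discounted payoff from Round N onwards if both players keep cooperating."""
    return 3 / (1 - delta)

def deviate_at_N(delta):
    """Defect at Round N (payoff 5); T then punishes, so mutual defection
    (payoff 1 per round) follows forever."""
    return 5 + delta * 1 / (1 - delta)

for delta in (0.3, 0.5, 0.7, 0.9):
    print(delta, stick_with_T(delta), deviate_at_N(delta))
# Cooperating is (weakly) better exactly when delta >= 1/2 for these payoffs.
```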
