
Multi-agent learning: The replicator dynamic
Gerard Vreeswijk, Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.
Wednesday 10th June, 2020


Hawk vs. Dove

Symmetric normal-form games

Example. Hawk-dove game (share V, or threaten [possibly fight: −C]). Payoffs to the row player in general form:

           H          D
  H    (V−C)/2        V
  D       0          V/2

With V = 2 and C = 6, the bimatrix is:

           H          D
  H    −2, −2       2, 0
  D     0, 2        1, 1

Other instantiations: prisoner's dilemma, chicken (= hawk-dove), matching pennies, stag hunt.

Definition. A game is symmetric when players have equal actions and payoffs:

  u_i(a_1, ..., a_i, ..., a_j, ..., a_n) = u_j(a_1, ..., a_j, ..., a_i, ..., a_n)   for all i and j.

So a 2-player game G = (A, B) is symmetric iff m = n and B = A^T.
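For the V = 2, C = 6 instance above, a minimal NumPy sketch (not part of the original slides) of the symmetry condition B = A^T:

```python
import numpy as np

# Row player's payoffs in hawk-dove with V = 2, C = 6 (the instance on the slide).
A = np.array([[-2.0, 2.0],
              [ 0.0, 1.0]])
# Column player's payoffs, read from the same bimatrix.
B = np.array([[-2.0, 0.0],
              [ 2.0, 1.0]])

# A two-player game (A, B) is symmetric iff B equals the transpose of A.
print(np.array_equal(B, A.T))  # True
```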

Symmetric equilibrium

Definition. Let p be a strategy in an n-player symmetric game. If the n-vector (p, ..., p) is a Nash equilibrium, then p is called a symmetric equilibrium.

■ Symmetric equilibria can thus be identified with strategies.
■ (Theorem.) Every symmetric game has at least one symmetric equilibrium.
■ (Fact.) Symmetric games can have asymmetric equilibria. For example, Hawk-Dove:

           H          D
  H    −2, −2       2, 0
  D     0, 2        1, 1

  has two asymmetric equilibria, (H, D) and (D, H), and one symmetric equilibrium in which each player plays H with probability 1/3, i.e. the mixed strategy (1/3, 2/3).
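A quick numerical check (not part of the original slides) that playing H with probability 1/3 is indeed a symmetric equilibrium: against that strategy, both pure actions earn the same expected payoff, so no unilateral deviation helps.

```python
import numpy as np

A = np.array([[-2.0, 2.0],   # row player's hawk-dove payoffs, V = 2, C = 6
              [ 0.0, 1.0]])

p = np.array([1/3, 2/3])     # candidate symmetric strategy: H with probability 1/3

print(A @ p)                 # expected payoff of H and of D against p: both 2/3
```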

Evolutionary game theory

Evolutionary game theory: the idea

■ There are n, say 5, species. An encounter between individuals of different species yields payoffs for both. For the row species (rows = row species, columns = opponent):

            s1   s2   s3   s4   s5
    s1       6    7    0   −1    0
    s2      −1    5   −1    4    7
  A = s3     9    0    8    9    6
    s4       0   −4   −2    3   −3
    s5       3    0    6    0   −1

■ The population consists of a very large number of individuals, each playing a pure strategy. Individuals interact randomly.
■ We are interested in proportions: p = (p_1, ..., p_5).
■ The fitness of species i is f_i = Σ_{j=1}^{5} p_j A_ij = (Ap)_i.
■ The average fitness is f̄ = Σ_{i=1}^{5} p_i f_i = p^T A p.
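A minimal NumPy sketch (not on the original slide) of these two formulas, using the score matrix above and an assumed uniform proportion vector:

```python
import numpy as np

A = np.array([[ 6,  7,  0, -1,  0],
              [-1,  5, -1,  4,  7],
              [ 9,  0,  8,  9,  6],
              [ 0, -4, -2,  3, -3],
              [ 3,  0,  6,  0, -1]], dtype=float)

p = np.full(5, 0.2)   # assumed proportions; any vector on the simplex would do

f = A @ p             # fitness of each species: f_i = (Ap)_i
f_bar = p @ A @ p     # average fitness: p^T A p

print(f, f_bar)
```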

The replicator equation

History of the replicator equation

■ Defined for a single species by Taylor and Jonker (1978), and named by Schuster and Sigmund (1983): "Several evolutionary models in distinct biological fields—population genetics, population ecology, early biochemical evolution and sociobiology—lead independently to the same class of replicator dynamics."
■ The replicator equation is the first game dynamics studied in connection with evolutionary game theory (as developed by Maynard Smith and Price).

Taylor, P.D. and Jonker, L. "Evolutionarily stable strategies and game dynamics". Math. Biosci. 1978; 40(1), pp. 145-156.
Schuster, P. and Sigmund, K. "Replicator dynamics". J. Theor. Biol. 1983; 100(3), pp. 533-538.

The replicator equation

■ The replicator equation models how n different species grow (or decline) due to mutual interaction.
■ It is assumed that if an individual of species i interacts with an individual of species j, the expected reward for the individual of species i is a constant a_ij. These rewards are summarised in a relative score matrix:

          a_11  ...  a_1n
  A =      ...  ...   ...
          a_n1  ...  a_nn

Proportions

■ The number of individuals of species i is denoted by q_i, or q_i(t).
■ p_j =_def q_j / q is the proportion of species j, where q = q_1 + ... + q_n.
■ So p_i ∝ q_i and p_1 + ... + p_n = 1.
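A tiny sketch (not on the original slide) of this normalisation, with assumed head-counts:

```python
import numpy as np

q = np.array([120.0, 80.0, 250.0, 30.0, 20.0])  # assumed head-counts q_1, ..., q_5

p = q / q.sum()      # proportions p_j = q_j / q, where q = q_1 + ... + q_n
print(p, p.sum())    # p is proportional to q and sums to 1
```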

Fitness

■ The fitness of an individual is its expected reward when it encounters a random individual in the population.
■ Example. Suppose

          1 3 1                0.1
  A =     1 2 3     and   p =  0.4
          4 1 3                0.5

  The fitness vector f can now be computed as follows:

               1 3 1     0.1      1.8
  f = Ap =     1 2 3  ·  0.4  =   2.4
               4 1 3     0.5      2.3

■ Average fitness: f̄(t) = Σ_{i=1}^{3} p_i f_i(t) = p · (Ap) = 2.29.
■ Fitness of species 1: f_1 = Σ_{j=1}^{3} p_j a_1j = (Ap)_1 = 1.8, so species 1 does worse than average.
■ Species 2 and 3 have fitness 2.4 and 2.3, respectively.
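The same computation in a few lines of NumPy (a sketch, not part of the original slides):

```python
import numpy as np

A = np.array([[1.0, 3.0, 1.0],
              [1.0, 2.0, 3.0],
              [4.0, 1.0, 3.0]])
p = np.array([0.1, 0.4, 0.5])

f = A @ p        # fitness vector: [1.8, 2.4, 2.3]
f_bar = p @ f    # average fitness: 2.29 (equivalently p @ A @ p)

print(f, f_bar)
```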

The continuous replicator equation

The continuous replicator equation has an extremely intuitive reading:

  ṗ_i(t) = p_i(t) [ f_i(t) − f̄(t) ],

where ṗ_i(t) is shorthand for the change of p_i in time: ṗ_i(t) = p_i′(t) = dp_i(t)/dt.

Example 1. Suppose the proportion of species 7 at time t is p_7(t) = 0.2, the fitness of species 7 at time t is f_7(t) = 6, and the average fitness at time t is f̄(t) = 9. How fast does p_7 grow at time t?
Answer. ṗ_7(t) = p_7(t)[f_7(t) − f̄(t)] = 0.2 (6 − 9) = −0.6. □

Example 2. Suppose p_5(t) = 0.2, f_5(t) = 6, and f̄(t) = 4. Same question.
Answer. ṗ_5(t) = p_5(t)[f_5(t) − f̄(t)] = 0.2 (6 − 4) = 0.4. □
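As a sketch (not from the slides), the right-hand side of the equation as a one-line function, reproducing both examples:

```python
def replicator_rhs(p_i, f_i, f_bar):
    """Instantaneous change of a proportion: p_i * (f_i - f_bar)."""
    return p_i * (f_i - f_bar)

print(replicator_rhs(0.2, 6.0, 9.0))  # -0.6  (Example 1)
print(replicator_rhs(0.2, 6.0, 4.0))  #  0.4  (Example 2)
```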

The dynamics of the replicator equation

Relative score matrix and start proportions:

          1 3 1                  1/3
  A =     1 2 3 ,        p =     1/3
          4 1 3                  1/3
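A minimal simulation sketch (not part of the slides) of this dynamic using forward Euler integration; the step size and horizon are assumptions made for illustration:

```python
import numpy as np

A = np.array([[1.0, 3.0, 1.0],
              [1.0, 2.0, 3.0],
              [4.0, 1.0, 3.0]])
p = np.array([1/3, 1/3, 1/3])

dt = 0.01
for _ in range(2000):
    f = A @ p                      # fitness of each species
    f_bar = p @ f                  # average fitness
    p = p + dt * p * (f - f_bar)   # Euler step of the replicator equation
    p = p / p.sum()                # guard against numerical drift off the simplex

print(p)   # approximate long-run proportions
```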

Phase space of the replicator on the previous slide

Circled rest points indicate Nash equilibria of the score matrix, interpreted as the payoff matrix of a symmetric game in normal form.

A replicator dynamic in a higher dimension

Rest point, stable point, asymptotically stable point

The continuous replicator equation

  ṗ_i(t) = p_i(t) [ f_i(t) − f̄(t) ]

is a system of differential equations. We have p = (p_1, ..., p_n) ∈ ∆_n and ṗ = (ṗ_1, ..., ṗ_n) ∈ R^n.

Definitions:

■ p is called a rest point if ṗ = 0. ("If at p, then stays at p.")
■ A rest point p is called (Lyapunov) stable if for every neighborhood U of p there is another neighborhood U′ of p such that states in U′, if iterated, remain within U. ("If close to p, then always close to p.")
■ A rest point p is called asymptotically stable if p has a neighborhood U such that all proportion vectors in U, if iterated, converge to p. ("If close to p, then convergence to p.")
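As an illustration of the first definition (a sketch, not from the slides), a numerical rest-point test applied to the hawk-dove score matrix used earlier:

```python
import numpy as np

def replicator_derivative(A, p):
    f = A @ p
    return p * (f - p @ f)          # component-wise p_i * (f_i - f_bar)

def is_rest_point(A, p, tol=1e-12):
    return bool(np.all(np.abs(replicator_derivative(A, p)) < tol))

A = np.array([[-2.0, 2.0],          # hawk-dove with V = 2, C = 6
              [ 0.0, 1.0]])

print(is_rest_point(A, np.array([1/3, 2/3])))  # True: the symmetric equilibrium
print(is_rest_point(A, np.array([1.0, 0.0])))  # True: every vertex of the simplex is a rest point
print(is_rest_point(A, np.array([0.5, 0.5])))  # False
```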

Relation with Nash equilibria

State p is a Nash equilibrium iff

  ∀q: q · (Ap) ≤ p · (Ap)
  ⇔ ∀q: q · f ≤ p · f
  ⇔ ∀q: q_1 f_1 + ... + q_n f_n ≤ f̄.

If f_i ≤ f̄ for all i, then f_i = f̄ for every i with p_i > 0 (check!), which means p is a rest point. A rest point with f_i ≤ f̄ for all i is called saturated.

■ Nash equilibrium ⇔ saturated rest point. Proof. ⇒: take pure q. ⇐: if f_i ≤ f̄ for all i, then no convex combination of the f_i can exceed f̄. □
■ Nash equilibrium ⇒ rest point. (Trivial.)
■ Fully mixed rest point ⇒ Nash equilibrium. (Because fully mixed implies saturated.)
■ Strict Nash equilibrium ⇒ asymptotically stable.
■ Limit point in the interior of ∆_n ⇒ Nash equilibrium.
■ Asymptotically stable in the interior of ∆_n ⇒ isolated trembling-hand perfect Nash equilibrium.
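A small sketch (not from the slides) that checks the saturated-rest-point condition, which by the equivalence above characterises Nash equilibria of the symmetric game with score matrix A:

```python
import numpy as np

def is_saturated_rest_point(A, p, tol=1e-9):
    f = A @ p
    f_bar = p @ f
    rest = np.all(np.abs(p * (f - f_bar)) < tol)   # rest point: p_i (f_i - f_bar) = 0 for all i
    saturated = np.all(f <= f_bar + tol)           # saturated: f_i <= f_bar for all i
    return bool(rest and saturated)

A = np.array([[-2.0, 2.0],    # hawk-dove with V = 2, C = 6
              [ 0.0, 1.0]])

print(is_saturated_rest_point(A, np.array([1/3, 2/3])))  # True: the symmetric Nash equilibrium
print(is_saturated_rest_point(A, np.array([1.0, 0.0])))  # False: a rest point, but not Nash
```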

Not all Nash equilibria are Lyapunov stable

(1, 0, 0) is Nash but not Lyapunov stable. (The picture is merely suggestive, since it only contains a few traces of the dynamics.)
