Multi-agent learning: Teaching strategies, Gerard Vreeswijk - PowerPoint PPT Presentation

Multi-agent learning: Teaching strategies. Gerard Vreeswijk, Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Thursday 18th June, 2020.


  1. Multi-agent learning: Teaching strategies. Gerard Vreeswijk, Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Thursday 18th June, 2020.

  2-11. Plan for Today
  Part I: Preliminaries
  1. Teacher possesses memory of k = 0 rounds: Bully.
  2. Teacher possesses memory of k = 1 round: Godfather.
  3. Teacher possesses memory of k > 1 rounds: {lenient, strict} Godfather.
  4. Teacher is represented by a finite machine: Godfather++.
  Part II: Crandall & Goodrich (2005)
  SPaM: an algorithm that claims to integrate follower and teacher algorithms.
  1. Three points of criticism of Godfather++.
  2. Core idea of SPaM: combine teacher and follower capabilities.
  3. Notion of guilt to trigger switches between teaching and following.

  12-16. Literature
  Michael L. Littman and Peter Stone (2001). “Leading best-response strategies in repeated games”. Research note. One of the first papers, if not the first paper, that mentions Bully and Godfather.
  Michael L. Littman and Peter Stone (2005). “A polynomial-time Nash equilibrium algorithm for repeated games”. Decision Support Systems, Vol. 39, pp. 55-66. Paper that describes Godfather++.
  Jacob W. Crandall and Michael A. Goodrich (2005). “Learning to teach and follow in repeated games”. AAAI Workshop on Multiagent Learning, Pittsburgh, PA. Paper that attempts to combine Fictitious Play and a modified Godfather++ to define an algorithm that “knows” when to teach and when to follow.
  Doran Chakraborty and Peter Stone (2008). “Online Multiagent Learning against Memory Bounded Adversaries”. Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Artificial Intelligence, Vol. 5212, pp. 211-226.

  17. Taxonomy of possible adversaries (taken from Chakraborty and Stone, 2008)
  [Figure: a taxonomy tree. Adversaries are split into joint-action based strategies (dependent on the previous joint action, on the last k joint actions (k-Markov), or on the entire history of joint actions) and joint-strategy based strategies (dependent on the joint-strategy history). Examples placed in the tree include best-response strategies, Fictitious Play, Godfather, Grim opponent, Bully, IGA, WoLF-IGA, WoLF-PHC, no-regret learners, and ReDVaLeR.]
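To make the "joint-action based, memory-bounded (k-Markov)" branch of the taxonomy concrete, here is a small illustrative sketch (not from the slides): such an adversary chooses its next action purely as a function of the last k joint actions. The class name, the policy representation, and the Tit-for-Tat instance are assumptions made for illustration only.

```python
from collections import deque

# Illustrative sketch: a "joint-action based, k-Markov" adversary whose next
# move depends only on the last k joint actions it has observed.
class KMarkovAdversary:
    def __init__(self, k, policy, default_action):
        self.k = k
        self.policy = policy                  # maps a tuple of the last k joint actions to an action
        self.history = deque(maxlen=k)        # stores (my_action, their_action) pairs
        self.default_action = default_action  # played while fewer than k rounds have been observed

    def act(self):
        if len(self.history) < self.k:
            return self.default_action
        return self.policy(tuple(self.history))

    def observe(self, my_action, their_action):
        self.history.append((my_action, their_action))

# Example instance: Tit-for-Tat in the Prisoner's Dilemma is 1-Markov,
# since it simply copies the opponent's previous action ('C' or 'D').
tit_for_tat = KMarkovAdversary(k=1, policy=lambda h: h[-1][1], default_action='C')
print(tit_for_tat.act())        # 'C' on the first round
tit_for_tat.observe('C', 'D')
print(tit_for_tat.act())        # 'D' after the opponent defected
```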

  18-21. Bully
  Play any strategy that gives you the highest payoff, assuming that your opponent is a mindless follower.
  Example of finding a pure Bully strategy:

           L     M     R
      T  3, 6  8, 6  7, 3
      C  8, 1  6, 3  7, 3
      B  3, 5  9, 2  7, 5
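A minimal computational sketch of this example, not taken from the slides: it assumes each cell lists (row player payoff, column player payoff), that the follower best-responds to the row player's committed action, and that the follower breaks ties in the row player's favor; with different tie-breaking the chosen action may differ.

```python
# Payoff matrix from the slide: entry [row][col] = (row payoff, column payoff).
payoffs = {
    'T': {'L': (3, 6), 'M': (8, 6), 'R': (7, 3)},
    'C': {'L': (8, 1), 'M': (6, 3), 'R': (7, 3)},
    'B': {'L': (3, 5), 'M': (9, 2), 'R': (7, 5)},
}

def bully_action(payoffs):
    """Pick the row action that maximises the row player's payoff,
    assuming the column player is a mindless follower who best-responds."""
    best_action, best_value = None, float('-inf')
    for row, cells in payoffs.items():
        # The follower's best responses to this committed row action.
        top = max(col_payoff for _, col_payoff in cells.values())
        responses = [col for col, (_, col_payoff) in cells.items() if col_payoff == top]
        # Assumed tie-break: the follower picks the response best for the row player.
        value = max(cells[col][0] for col in responses)
        if value > best_value:
            best_action, best_value = row, value
    return best_action, best_value

print(bully_action(payoffs))  # ('T', 8) under these assumptions
```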
