Multi-agent learning: Methodology of MAL research
Erik Berbee & Bas van Gijzel, Master Student AT, Utrecht University



  1. Multi-agent learning: Methodology of MAL research
     Erik Berbee & Bas van Gijzel, Master Student AT, Utrecht University

  2. Overview
     Today we will talk about...
     • Formal setting
     • Characteristics of multi-agent learning
     • Classes of techniques
     • Types of results
     • Agendas and criticism
     • Loose ends and questions

  3. The Problem
     • No unified goals/agendas
       – No unified formal setting

  4. Formal setting: Stochastic games
     • Represented as a tuple (N, S, A⃗, R⃗, T)
       – N is the set of agents
       – S is the set of n-agent stage games
       – A⃗ = A_1, ..., A_n, with A_i the set of actions (pure strategies) of agent i
       – R⃗ = R_1, ..., R_n, with R_i : S × A⃗ → ℜ the reward function of agent i
       – T : S × A⃗ → Π(S) is a stochastic transition function
       – Restricted versions: repeated game, MDP
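To make the tuple concrete, here is a minimal sketch of how such a stochastic game could be encoded as data; the type and field names are illustrative assumptions, not from the slides.

      # Minimal sketch of the stochastic-game tuple (N, S, A, R, T); names are illustrative.
      from dataclasses import dataclass
      from typing import Callable, Dict, List, Tuple

      Action = int
      State = int
      JointAction = Tuple[Action, ...]   # one action per agent

      @dataclass
      class StochasticGame:
          n_agents: int                                               # N
          states: List[State]                                         # S
          actions: List[List[Action]]                                 # A_i: action set per agent
          reward: Callable[[int, State, JointAction], float]          # R_i(s, a) for agent i
          transition: Callable[[State, JointAction], Dict[State, float]]  # T(s, a): distribution over S

      # A repeated game is the special case with one state that always transitions to itself;
      # an MDP is the special case with a single agent (n_agents == 1).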

  5. Sidetrack: Replicator Dynamics
     • Represented as a tuple (A, P_0, R)
     • A: the set of possible pure strategies/actions for the agents, indexed 1, ..., m
     • P_0: the initial distribution of agents across the possible strategies, with ∑_{i=1}^{m} P_0(i) = 1
     • R : A × A → ℜ the immediate reward function for each agent
     • Each P_t(a) is adjusted according to how the reward of a compares to the average reward
     • Can be seen as a repeated game between two agents playing the same mixed strategy
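A minimal sketch of a discrete-time replicator update, assuming the standard form in which a strategy's population share grows in proportion to its payoff relative to the average; the game matrix and function names are illustrative.

      import numpy as np

      def replicator_step(p, R):
          """One discrete-time replicator update.
          p: current distribution over the m pure strategies (sums to 1)
          R: m x m immediate reward matrix, R[i, j] = payoff of i against j
          """
          fitness = R @ p            # expected payoff of each pure strategy against the population
          avg = p @ fitness          # population-average payoff
          return p * fitness / avg   # shares grow in proportion to relative payoff

      # Example: a Hawk-Dove-style game (payoffs shifted to be positive);
      # the mixed equilibrium has half the population playing Hawk.
      R = np.array([[1.0, 4.0],
                    [2.0, 3.0]])
      p = np.array([0.9, 0.1])
      for _ in range(200):
          p = replicator_step(p, R)
      print(p)   # approaches (0.5, 0.5)

In this Hawk-Dove-style example the interior mixed equilibrium attracts the dynamics; in other games (e.g., Rock-Paper-Scissors) the shares may cycle instead of converging.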

  6. Formal setting: Available Information
     What information does an agent have?
     • Play is fully observable
     • Game is known
     • Opponent's strategy is not known a priori

  7. Sidetrack: Consequences of Restrictions on Information
     • f_i(z) maps each state z to a probability distribution over i's actions next period
     • f_i(z) is uncoupled if it does not depend on opponents' payoffs
     Theorem 3. Given a finite action space A and positive integer s, there exist no uncoupled rules f_i(z) whose state variable z is the last s plays, such that, for every game G on A, the period-by-period behaviors converge almost surely to a Nash equilibrium of G, or even to an ε-equilibrium of G, for all sufficiently small ε > 0.
     * H.P. Young (2007): The possible and the impossible in multi-agent learning. In: Artificial Intelligence 171, pp. 429-433.

  8. Characteristics of multi-agent learning
     • Learning and Teaching
       – Teaching assumes learning
     • Equilibrium play is not always best: Stackelberg game (You: row player, Other: column player; see the sketch below)

                Left     Right
       Up      (1, 0)   (3, 2)
       Down    (2, 1)   (4, 0)

     • An agent can either learn the opponent's strategy, or learn a strategy that does well without learning the opponent's strategy.
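Why equilibrium play is not always best here: Down strictly dominates Up for the row player, so the stage-game equilibrium is (Down, Left) and gives the row player 2; but a row player who commits to Up teaches a best-responding opponent to play Right, earning 3. A small sketch of that argument, where the opponent is modelled (as an assumption for illustration) as a myopic best-responder to the empirical frequency of the row player's actions:

      import numpy as np

      # Payoffs (row player, column player) for the Stackelberg game on the slide.
      # Rows: Up, Down; columns: Left, Right.
      ROW = np.array([[1, 3],
                      [2, 4]])
      COL = np.array([[0, 2],
                      [1, 0]])

      def play(row_action, rounds=100):
          """Row player always plays `row_action`; the column player best-responds
          to the empirical frequency of the row player's past actions."""
          counts = np.ones(2)                           # uniform prior over Up/Down
          total = 0.0
          for _ in range(rounds):
              freq = counts / counts.sum()
              col_action = int(np.argmax(freq @ COL))   # myopic best response
              total += ROW[row_action, col_action]
              counts[row_action] += 1
          return total / rounds

      print(play(row_action=1))   # "play the equilibrium": Down earns about 2
      print(play(row_action=0))   # "teach": Up earns about 3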

  9. Model-based Learning
     • Learn the opponent's strategy and play a best response
     • General scheme:
       1. Start with some model of the opponent's strategy.
       2. Compute and play the best response.
       3. Observe the opponent's play and update your model of her strategy.
       4. Go to step 2.

  10. Model-based Learning: Fictitious Play
     • The model is a count of the opponent's plays in the past
     • The model after (R, S, P, R, P) is (R = 0.4, P = 0.4, S = 0.2)
     • Other examples are: smooth fictitious play, exponential fictitious play, rational learning
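A minimal sketch of the counting model and best response in fictitious play, using Rock-Paper-Scissors to match the (R, S, P, R, P) example above; the payoff matrix and class name are illustrative.

      import numpy as np

      ACTIONS = ["R", "P", "S"]
      # Row payoff for Rock-Paper-Scissors: PAYOFF[i, j] = reward of playing i against j.
      PAYOFF = np.array([[ 0, -1,  1],    # R vs R, P, S
                         [ 1,  0, -1],    # P
                         [-1,  1,  0]])   # S

      class FictitiousPlay:
          def __init__(self, n_actions=3):
              self.counts = np.zeros(n_actions)      # model: count of the opponent's past plays

          def best_response(self):
              freq = self.counts / self.counts.sum()     # empirical mixed strategy (needs >= 1 observation)
              return int(np.argmax(PAYOFF @ freq))       # action with highest expected payoff

          def observe(self, opponent_action):
              self.counts[opponent_action] += 1

      agent = FictitiousPlay()
      for a in ["R", "S", "P", "R", "P"]:
          agent.observe(ACTIONS.index(a))
      print(agent.counts / agent.counts.sum())   # [0.4, 0.4, 0.2], as on the slide
      print(ACTIONS[agent.best_response()])      # best response to that empirical model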

  11. Model-free Learning
     • Learns how well its own possible actions do
     • Most are based on the Bellman equation
     • Basic algorithm:
       – Initial value function V_0 : S → ℜ for each state
       – V_{k+1}(s) ← R(s) + γ max_a ∑_{s'} T(s, a, s') V_k(s')
       – Optimal policy: for each s select the a that maximizes ∑_{s'} T(s, a, s') V_k(s')
     • Q-Learning: compute the optimal policy with unknown reward and transition functions
     • MAL variants: minimax-Q (zero-sum), joint-action learners and Friend-or-Foe Q (team games), Nash-Q and CE-Q (general-sum games)
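A minimal sketch of the basic value-iteration update from the slide, on a made-up two-state MDP (the transition and reward tables are purely illustrative):

      import numpy as np

      # Toy MDP: 2 states, 2 actions. T[s, a, s'] = transition probability, R[s] = reward.
      T = np.array([[[0.9, 0.1], [0.2, 0.8]],
                    [[0.0, 1.0], [0.5, 0.5]]])
      R = np.array([0.0, 1.0])
      gamma = 0.9

      V = np.zeros(2)                                   # V_0
      for _ in range(100):
          # V_{k+1}(s) <- R(s) + gamma * max_a sum_{s'} T(s, a, s') V_k(s')
          V = R + gamma * np.max(np.einsum("sap,p->sa", T, V), axis=1)

      # Optimal policy: for each s pick the a maximizing sum_{s'} T(s, a, s') V(s')
      policy = np.argmax(np.einsum("sap,p->sa", T, V), axis=1)
      print(V, policy)

Q-learning performs essentially the same maximisation on a table Q(s, a) that is updated from sampled transitions, so it needs neither T nor R in advance; the MAL variants listed above replace the max over own actions with minimax, Nash, or correlated-equilibrium values of the stage games.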

  12. Regret minimization: No-Regret Learning
     • No-regret learning: the regret of agent i at time t for action a_j is
       r_i^t(a_j, s_i | s_{-i}) = ∑_{k=1}^{t} [ R(a_j, s^k_{-i}) − R(s^k_i, s^k_{-i}) ]
     • If regret is positive, the agent selects each of its actions with probability proportional to its regret
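A minimal sketch of regret matching in that spirit: accumulate, for every action, how much better it would have done against the opponent's observed play, then choose actions with probability proportional to positive regret; the payoff matrix and names are illustrative.

      import numpy as np

      class RegretMatcher:
          def __init__(self, payoff):
              self.payoff = payoff                       # payoff[i, j]: my action i vs. opponent action j
              self.regret = np.zeros(payoff.shape[0])    # cumulative regret per action

          def act(self, rng):
              positive = np.maximum(self.regret, 0.0)
              if positive.sum() == 0:                    # no positive regret: play uniformly
                  probs = np.full(len(positive), 1.0 / len(positive))
              else:
                  probs = positive / positive.sum()      # proportional to positive regret
              return rng.choice(len(probs), p=probs)

          def update(self, my_action, opp_action):
              # regret for a_j: R(a_j, s_-i) - R(s_i, s_-i), accumulated over time
              self.regret += self.payoff[:, opp_action] - self.payoff[my_action, opp_action]

      # Usage: matching pennies from the row player's perspective.
      rng = np.random.default_rng(0)
      payoff = np.array([[ 1, -1],
                         [-1,  1]])
      learner = RegretMatcher(payoff)
      for _ in range(1000):
          a = learner.act(rng)
          opp = rng.integers(2)          # stand-in opponent playing uniformly at random
          learner.update(a, opp)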

  13. Types of results from Learning Algorithms
     1. Convergence of the strategy profile to an (e.g., Nash) equilibrium of the stage game in self-play (that is, when all agents adopt the learning procedure under consideration).
     2. Successful learning of an opponent's strategy (or opponents' strategies).
     3. Obtaining payoffs that exceed a specified threshold.
        • Safe: obtains at least the minimax payoff
        • Consistent: does at least as well as the best response to the empirical distribution of play

  14. Discussion
     1. Convergence in self-play to equilibrium play of the stage game
        • Is a Nash equilibrium of the stage game useful?
        • Convergence of play vs. convergence of payoff
        • Is convergence in self-play necessary?
     2. Most work assumes 2 players and 2 actions; why?
     3. Obtaining payoffs that exceed a specified threshold (safe/consistent)
        • Does this exclude teaching?

  15. Agendas of MAL
     Shoham et al. try to make a classification of the agendas in multi-agent learning. What are the possible purposes of the current (and possibly future) research done in MAL?

  16. Introducing the Agenda (Shoham et al.)
     • Computational
     • Descriptive
     • Normative
     • Prescriptive - cooperative
     • Prescriptive - non-cooperative

  17. Computational Agenda
     • Learning algorithms as an iterative way to compute certain properties, such as Nash equilibria, on a certain class of games.
       – Fictitious play calculates Nash equilibria for zero-sum games.
       – Replicator dynamics calculates Nash equilibria for symmetric games.
     • Quick and dirty

  18. Computational Agenda: Addenda by Sandholm
     • Computing properties of games by using direct algorithms.
     • Quick and dirty MAL algorithms as a last resort, when there is no good direct algorithm available.
     • (MAL algorithms can be easier to program, though.)
     * T. Sandholm (2007): Perspectives on multiagent learning. In: Artificial Intelligence 171, pp. 382-391.

  19. Descriptive Agenda
     • MAL as a description for social and economic behaviour.
     • Formal models of learning possibly correspond to people's behaviours.
     • Can be extended to the modelling of large populations.
     • The descriptive agenda corresponds to (most) usage of MAL in the social sciences.

  20. Descriptive Agenda: Addenda by Sandholm
     Problem: humans might not have the required rationality to act according to game-theoretic equilibrium. But this is exactly what the descriptive agenda wants to model!
