Multi-agent learning: Simplified Poker
Yannick Bitane, April 14th, 2011


Slide 1: Title slide. Multi-agent learning: Simplified Poker. Yannick Bitane, April 14th, 2011.

Slide 2: Contents
• Poker in multi-agent learning
• Gilpin & Sandholm*
• Formal mechanics: ordered games, information filters, equilibrium-preserving abstractions
• GameShrink: algorithm sketch, results

* Gilpin & Sandholm (2005): Finding equilibria in large sequential games of imperfect information. Technical Report CMU-CS-05-158, Carnegie Mellon University.

Slide 3: Poker in MAL
• AI testbed: a game of incomplete information
• Texas Hold'em: the game tree is tremendously large; 2-player limit Texas Hold'em has on the order of 10^18 nodes
• How to solve this game?

Slide 4: Gilpin & Sandholm's approach
• Rhode Island Hold'em: strategically similar to Texas Hold'em, but with much less branching (3.1 · 10^9 nodes)
• GameShrink: reduce branching by merging equivalent branches (a minimal sketch follows this slide)
• Proven: Nash equilibria in the reduced game tree correspond to Nash equilibria in the original tree.
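To make the merging step concrete, here is a minimal Python sketch of the idea, not the paper's implementation: a union-find structure merges signals whenever a placeholder equivalence test declares them strategically equivalent. The toy signal set and the `strategically_equivalent` test (which simply ignores suits) are illustrative assumptions; GameShrink's actual test compares utilities game-theoretically.

```python
# Sketch of GameShrink's core idea: merge signals (here: toy cards)
# that are strategically equivalent, shrinking the game tree without
# changing its equilibria. The equivalence test below is a stand-in.
from itertools import combinations

# Toy signal set: rank + suit. Assume, for illustration only, that
# suits never matter, so signals with equal rank are equivalent.
signals = [r + s for r in "AKQ" for s in "shdc"]

def strategically_equivalent(a, b):
    """Placeholder test: equal rank => same utility in every situation."""
    return a[0] == b[0]

# Union-find: merged signals share one representative branch.
parent = {s: s for s in signals}

def find(s):
    while parent[s] != s:
        parent[s] = parent[parent[s]]  # path halving
        s = parent[s]
    return s

def union(a, b):
    parent[find(a)] = find(b)

for a, b in combinations(signals, 2):
    if strategically_equivalent(a, b):
        union(a, b)

classes = {}
for s in signals:
    classes.setdefault(find(s), []).append(s)
print(f"{len(signals)} signals shrink to {len(classes)} branches")
```

Run as written, the 12 toy signals collapse to 3 representative branches; the same kind of reduction is what makes the Rhode Island Hold'em tree tractable.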

Slide 5: Contents (again)
• Poker in multi-agent learning
• Gilpin & Sandholm
• Formal mechanics: ordered games, information filters, equilibrium-preserving abstractions
• GameShrink: algorithm sketch, results
⇒ No introduction to poker, no demo, no proofs.

Slide 6: Ordered games

Slide 7: DEFINITION 1. An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where:
1. I = {1, ..., n} is a finite set of players.
2. G = ⟨G^1, ..., G^r⟩, G^j = (V^j, E^j), is a finite collection of finite directed trees with nodes V^j and edges E^j. Let Z^j ⊂ V^j be the leaf nodes of G^j, and let N^j(v) be the outgoing neighbors of v ∈ V^j.
3. L = ⟨L^1, ..., L^r⟩, where L^j : V^j \ Z^j → I indicates which player is to act in round j.

Slide 8: DEFINITION 1 (continued).
4. Θ is a finite set of signals.
5. κ = ⟨κ_1, ..., κ_r⟩ is the number of public signals revealed in each round, and γ = ⟨γ_1, ..., γ_r⟩ is the number of private signals revealed to each player in each round. The public information revealed in round j is α^j ∈ Θ^{κ_j}, and the public information revealed in all rounds up through j is α̃^j = (α^1, ..., α^j). The private information revealed to player i ∈ I in round j is β_i^j ∈ Θ^{γ_j}, and the private information revealed to player i in all rounds up through j is β̃_i^j = (β_i^1, ..., β_i^j). Each signal θ ∈ Θ may only be revealed once. (In Rhode Island Hold'em, for instance, each player receives one private card in round 1 and one public card is dealt in each of rounds 2 and 3, so γ = ⟨1, 0, 0⟩ and κ = ⟨0, 1, 1⟩.)

Slide 9: DEFINITION 1 (continued).
6. p is a probability distribution over Θ, with p(θ) > 0 for all θ ∈ Θ. Signals are drawn from Θ according to p without replacement, so if A is the set of signals already revealed, then

   p(x | A) = p(x) / ∑_{y ∉ A} p(y)   if x ∉ A,
   p(x | A) = 0                       if x ∈ A.

7. ≽ is a partial ordering of subsets of Θ, and is defined for at least those pairs required by u (coming up in two slides).
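Condition 6 translates directly into code. A small sketch, assuming a uniform four-card deck as the signal set (my choice for illustration), computes p(x | A) exactly as defined:

```python
from fractions import Fraction

# Toy signal set Θ: four cards with a uniform prior p.
theta = ["As", "Ah", "Ks", "Kh"]
p = {x: Fraction(1, len(theta)) for x in theta}

def p_given(x, revealed):
    """p(x | A): probability that x is drawn next, given the set A of
    signals already revealed. Drawing is without replacement, so
    revealed signals get probability 0 and the rest are renormalized."""
    if x in revealed:
        return Fraction(0)
    return p[x] / sum(p[y] for y in theta if y not in revealed)

print(p_given("Ks", {"As"}))  # 1/3: three cards remain, equally likely
print(p_given("As", {"As"}))  # 0: already revealed
```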

Slide 10: DEFINITION 1 (continued). Recall: Z^j are the leaf nodes of G^j.
8. ω : ∪_{j=1}^{r} Z^j → {over, continue} is a mapping of terminal nodes in each round to one of two values: over, in which case the game ends, or continue, in which case the game continues to the next round. Clearly, for all z ∈ Z^r we require ω(z) = over. Let ω^j_over = {z ∈ Z^j | ω(z) = over} and ω^j_cont = {z ∈ Z^j | ω(z) = continue}.

Slide 11: DEFINITION 1 (continued).
9. u = (u^1, ..., u^r),

   u^j : ⨉_{k=1}^{j-1} ω^k_cont × ω^j_over × ⨉_{k=1}^{j} Θ^{κ_k} × ⨉_{i=1}^{n} ⨉_{k=1}^{j} Θ^{γ_k} → ℝ^n,

is a utility function such that, for every j with 1 ≤ j ≤ r, for every i ∈ I, and for every z̃ ∈ [⨉_{k=1}^{j-1} ω^k_cont] × ω^j_over, at least one of the following two conditions holds:
(a) Utility is signal independent, that is, u^j_i(z̃, ϑ) = u^j_i(z̃, ϑ′) for all legal ϑ, ϑ′ ∈ ⨉_{k=1}^{j} Θ^{κ_k} × ⨉_{i=1}^{n} ⨉_{k=1}^{j} Θ^{γ_k}.
(b) See next slide.

Slide 12: DEFINITION 1 (continued).
9. (continued) ... at least one of the following two conditions holds:
(a) Utility is signal independent (previous slide).
(b) ≽ is defined for all legal signals (α̃^j, β̃^j_i) and (α̃^j, β̃′^j_i) through round j, and a player's utility is increasing in her private signals, all else equal:

   (α̃^j, β̃^j_i) ≽ (α̃^j, β̃′^j_i)  ⇒  u_i(z̃, α̃^j, (β̃^j_i, β̃^j_{-i})) ≥ u_i(z̃, α̃^j, (β̃′^j_i, β̃^j_{-i})).
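A small numeric check may make condition (b) tangible. In the toy setup below (my own, not from the paper), private signals are card ranks, ≽ is numeric comparison of ranks, and the terminal sequence z̃ ends in a showdown where the higher rank wins one bet; the assertions verify that this utility is indeed increasing in the player's private signal with everything else held fixed.

```python
# Condition (b), illustrated: with the opponent's signal fixed,
# a higher private signal under ≽ never yields lower utility.
RANK = {"K": 13, "Q": 12, "J": 11}

def succeq(beta, beta_prime):
    """The partial order ≽ on private signals (here: rank comparison)."""
    return RANK[beta] >= RANK[beta_prime]

def u_showdown(beta_i, beta_opp):
    """Player i's utility at a showdown leaf: win, lose, or split one bet."""
    if RANK[beta_i] > RANK[beta_opp]:
        return 1
    if RANK[beta_i] < RANK[beta_opp]:
        return -1
    return 0

# beta ≽ beta'  =>  u_i(beta, beta_opp) >= u_i(beta', beta_opp),
# for every fixed opponent signal ("all else equal").
for beta in RANK:
    for beta_prime in RANK:
        for beta_opp in RANK:
            if succeq(beta, beta_prime):
                assert u_showdown(beta, beta_opp) >= u_showdown(beta_prime, beta_opp)
print("condition (b) holds for the toy showdown utility")
```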

Slide 13: DEFINITION 1, summarized. An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩:
1. I: finite set of players.
2. G^j = (V^j, E^j): the game tree for round j, with nodes V^j and edges E^j. Z^j ⊂ V^j: the leaf nodes of G^j. N^j(v): the outgoing neighbors of v ∈ V^j.
3. L^j: mapping from non-terminal nodes to the player to act in round j.
4. Θ: finite set of signals.
5. κ_j: number of public signals revealed in round j. γ_j: number of private signals revealed to each player in round j.
6. p: probability distribution over Θ.
7. ≽: partial ordering of subsets of Θ.
8. ω: mapping from terminal nodes in each round to {over, continue}.
9. u: utility function.
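As a reading aid, here is a schematic Python container for the tuple Γ. The field names mirror the definition item by item; the types are illustrative assumptions, not an implementation from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Node = int
Signal = str

@dataclass
class RoundTree:
    nodes: Set[Node]                          # V^j
    edges: Set[Tuple[Node, Node]]             # E^j
    leaves: Set[Node]                         # Z^j ⊂ V^j

@dataclass
class OrderedGame:
    players: List[int]                        # 1. I = {1, ..., n}
    trees: List[RoundTree]                    # 2. G = ⟨G^1, ..., G^r⟩
    to_act: List[Dict[Node, int]]             # 3. L^j : V^j \ Z^j → I
    signals: Set[Signal]                      # 4. Θ
    kappa: List[int]                          # 5. public signals per round
    gamma: List[int]                          # 5. private signals per player per round
    p: Dict[Signal, float]                    # 6. prior over Θ
    order: Callable[[tuple, tuple], bool]     # 7. ≽ on signal vectors
    omega: Dict[Node, str]                    # 8. leaf -> "over" / "continue"
    u: Callable[..., List[float]]             # 9. utility, one value per player
```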

Slide 14: Information filters

Slide 15: Let Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩ be an ordered game, and let S^j be the set of legal* signals for one player up through round j.

DEFINITION 2. An information filter for Γ is a collection F = ⟨F^1, ..., F^r⟩, where each F^j is a function F^j : S^j → 2^{S^j}, such that the following conditions hold:
1. Truthfulness. (α̃^j, β̃^j_i) ∈ F^j(α̃^j, β̃^j_i) for all legal (α̃^j, β̃^j_i).
2. Independence. The range of F^j is a partition of S^j.
3. Information preservation. If two values of a signal are distinguishable in round k, then they are distinguishable in each round j > k. That is, let m^j = ∑_{l=1}^{j} (κ_l + γ_l). We require, for all legal* (θ_1, ..., θ_{m^k}, ..., θ_{m^j}) ⊆ Θ and (θ′_1, ..., θ′_{m^k}, ..., θ′_{m^j}) ⊆ Θ:

   if (θ′_1, ..., θ′_{m^k}) ∉ F^k(θ_1, ..., θ_{m^k}),
   then (θ′_1, ..., θ′_{m^k}, ..., θ′_{m^j}) ∉ F^j(θ_1, ..., θ_{m^k}, ..., θ_{m^j}).

Slide 16: Example. Intuition: by passing signals through a filter before revealing them, their informative precision can be reduced while the underlying action space stays intact, thus shrinking the game tree. (A sketch follows below.)
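A sketch of one such filter, under the assumed premise that suits never affect utility: F^j reveals only the ranks of the cards seen so far, i.e. it maps each signal vector to the set of suit-variants with the same ranks. Truthfulness and independence hold by construction, since F partitions vectors by their rank profile; the helper names are mine.

```python
from itertools import product

SUITS = "shdc"

def F(signal_vector):
    """F^j: map a tuple of cards like ('Ks', 'Ah') to its equivalence
    class, i.e. every tuple with the same ranks in the same positions.
    (For a real deck one would also drop vectors that reuse a card;
    omitted here for brevity.)"""
    ranks = [card[0] for card in signal_vector]
    return {
        tuple(r + s for r, s in zip(ranks, suits))
        for suits in product(SUITS, repeat=len(ranks))
    }

cls = F(("Ks", "Ah"))
print(len(cls))             # 16 suit-variants collapse into one class
print(("Ks", "Ah") in cls)  # True: truthfulness, the true vector is in its class
```

Coarser classes mean fewer distinct information sets, which is precisely how a filter shrinks the game tree while leaving the players' actions untouched.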

Slides 17-21: [figures only; the images were not preserved in this transcript. Per the contents, these slides covered the GameShrink algorithm sketch and results.]
