
The Size of Message Set Needed for the Optimal Communication Policy



  1. The Size of Message Set Needed for the Optimal Communication Policy
Tatsuya Kasai, Hayato Kobayashi, and Ayumi Shinohara
Graduate School of Information Sciences, Tohoku University, Japan
The 7th European Workshop on Multi-Agent Systems (EUMAS 2009), Ayia Napa, Cyprus, Dec 17-18, 2009

  2. Background
Multi-agent coordination with communication. Main objective: to find the optimal action policy δ^A and communication policy δ^M. We are interested in an approach based on autonomous learning.
Definition of policies for agent i in our proposed methods (both are based on the multi-agent reinforcement learning framework):
- Signal Learning (SL) [Kasai+ 08]: action policy δ_i^A : Ω_i × M_i^receive → A_i; communication policy δ_i^M : Ω_i → M_i.
- Signal Learning with Messages (SLM) [Kasai+ AAMAS09]: action policy δ_i^A : Ω_i × M_i^receive → A_i; communication policy δ_i^M : Ω_i × M_i^receive → M_i.
Here Ω_i is the set of observations, A_i the set of actions, M_i the set of messages to send to the other agents, and M_i^receive the set of received messages.
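As a concrete reading of these signatures, here is a minimal sketch in Python (not from the paper; the type aliases are illustrative assumptions) showing that an SL agent and an SLM agent differ only in the domain of the communication policy:

```python
from typing import Callable, Tuple

# Illustrative type aliases (assumptions, not from the paper):
# observations, actions, and messages are treated as opaque labels.
Obs = str
Action = str
Msg = str
ReceivedMsgs = Tuple[Msg, ...]  # messages received from the other agents

# Action policy (same form in SL and SLM):
#   delta_i^A : Omega_i x M_i^receive -> A_i
ActionPolicy = Callable[[Obs, ReceivedMsgs], Action]

# Communication policy in SL:   delta_i^M : Omega_i -> M_i
SLCommPolicy = Callable[[Obs], Msg]

# Communication policy in SLM:  delta_i^M : Omega_i x M_i^receive -> M_i
SLMCommPolicy = Callable[[Obs, ReceivedMsgs], Msg]
```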

  3. Motivation
Actual learning results of SL and SLM [Kasai+ AAMAS09]: the performance of cooperation improves as the size of M_i increases. [Figure: performance of cooperation vs. size of M_i, from bad with the SL-form policies (δ_i^M : Ω_i → M_i) to good with the SLM-form policies (δ_i^M : Ω_i × M_i^receive → M_i).]
This raises the question: how large must M_i be to construct the optimal policy?

  4. Scheme of talk
We show the minimum required sizes |M_i| for achieving the optimal policy:
- for Signal Learning on Jointly Fully Observable Dec-POMDP-Com
- for Signal Learning with Messages on Deterministic Dec-POMDP-Com
[Diagram: Jointly Fully Observable and Deterministic Dec-POMDP-Com as subclasses of Dec-POMDP-Com.]

  5. Outline
- Background
- Scheme of talk
- Review: Dec-POMDP-Com [Goldman+ 04]
- Constrained models: Jointly Fully Observable Dec-POMDP-Com [Goldman+ 04] and Deterministic Dec-POMDP-Com (which we define)
- Theoretical analysis
- Conclusion

  6. Dec-POMDP-Com [Goldman+ 04] (1/3)
Dec-POMDP-Com (Decentralized Partially Observable Markov Decision Process with Communication): a decentralized multi-agent system in which agents can communicate with each other and observe only restricted information.
Example model: two agents cooperate to get a treasure in a field. The treasure is locked, and both agents must reach it at the same time to open the lock.

  7. Dec-POMDP-Com [Goldman+ 04] (2/3)
Formulation: Dec-POMDP-Com := <I, S, Ω, A, M, C, P, O, R, T>
One step for agent i on Dec-POMDP-Com:
1. Receive an observation o_i from the environment.
2. Send a message m_i to the other agents.
3. Perform an action a_i in the environment.
These steps repeat until both agents arrive at the treasure. [Figure: the two agents in the field with restricted sights O_1 and O_2, exchanging messages m_1 and m_2; e.g., a_1 = move right, a_2 = move up.]
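The per-step protocol on this slide can be written out directly. The sketch below is a hypothetical rendering; the `env` and `agent` interfaces are assumptions for illustration, not an API from the paper:

```python
def run_episode(env, agents, horizon):
    """One Dec-POMDP-Com episode: at each step, every agent observes,
    sends a message, and then acts, following the three sub-steps above."""
    for t in range(horizon):  # T: the time horizon
        # 1. Each agent i receives an observation o_i from the environment.
        observations = [env.observe(i) for i in range(len(agents))]
        # 2. Each agent sends a message m_i to the other agents.
        messages = [agent.send_message(o) for agent, o in zip(agents, observations)]
        # Collect, for each agent, the messages sent by the others.
        received = [tuple(m for j, m in enumerate(messages) if j != i)
                    for i in range(len(agents))]
        # 3. Each agent performs an action a_i in the environment.
        actions = [agent.act(o, r)
                   for agent, o, r in zip(agents, observations, received)]
        if env.step(actions):  # assumed to return True once the goal is reached
            break
```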

  8. Dec-POMDP-Com [Goldman+ 04] (3/3)
Formulation: Dec-POMDP-Com := <I, S, Ω, A, M, C, P, O, R, T>
I: a set of agents' indices, e.g., I = {1, 2}.

  9. Dec-POMDP-Com [Goldman+ 04] (3/3)
S: a set of global states, e.g., s = (position of agent 1, position of agent 2, position of treasure), s ∈ S.

  10. Dec-POMDP-Com [Goldman+ 04] (3/3)
Ω: a set of joint observations, Ω = Ω_1 × Ω_2, where Ω_i is a set of observations for agent i.
A: a set of joint actions, A = A_1 × A_2.

  11. Dec-POMDP-Com [Goldman+ 04] (3/3)
M: a set of joint messages, M = M_1 × M_2.
C: M → ℝ is a cost function; C(m) represents the total cost of transmitting the messages sent by all agents.

  12. Dec-POMDP-Com [Goldman+ 04] (3/3)
P: a transition probability function.
O: an observation probability function.

  13. Dec-POMDP-Com [Goldman+ 04] (3/3)
R: a reward function, e.g., the reward for the treasure obtained by the agents.
T: a time horizon.
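Collecting the components of the tuple, one might represent a finite Dec-POMDP-Com as a plain container like the following sketch. The field types are illustrative assumptions; the paper defines the tuple only abstractly:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

# Illustrative aliases: states and joint elements are modeled as tuples,
# e.g., a state could be (pos_agent1, pos_agent2, pos_treasure).
State = Tuple
JointObs = Tuple     # (o_1, o_2, ...)
JointAction = Tuple  # (a_1, a_2, ...)
JointMsg = Tuple     # (m_1, m_2, ...)

@dataclass
class DecPOMDPCom:
    I: FrozenSet[int]                                # agents' indices
    S: FrozenSet[State]                              # global states
    Omega: FrozenSet[JointObs]                       # joint observations
    A: FrozenSet[JointAction]                        # joint actions
    M: FrozenSet[JointMsg]                           # joint messages
    C: Callable[[JointMsg], float]                   # message cost C(m)
    P: Callable[[State, JointAction, State], float]  # transition prob. P(s, a, s')
    O: Callable[[State, JointAction, State, JointObs], float]  # obs. prob. O(s, a, s', o)
    R: Callable[[State, JointAction, State], float]  # reward function
    T: int                                           # time horizon
```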

  14. Outline
- Background
- Scheme of talk
- Review: Dec-POMDP-Com [Goldman+ 04]
- Constrained models: Jointly Fully Observable Dec-POMDP-Com [Goldman+ 04] and Deterministic Dec-POMDP-Com (which we define)
- Theoretical analysis
- Conclusion

  15. Jointly Fully Observable Dec-POMDP-Com [Goldman+ 04]
A Dec-POMDP-Com such that the combination of the agents' observations determines the global state: o_1 + o_2 = global state (that is, the system is jointly fully observable). [Figure: the agents' restricted sights O_1 and O_2 together cover the whole field.]
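One way to make the slide's condition concrete is to check that any joint observation that can occur identifies the resulting global state uniquely. This formalization is our reading of "o_1 + o_2 = global state", not the paper's exact definition, and it assumes the hypothetical `DecPOMDPCom` container sketched earlier:

```python
def is_jointly_fully_observable(model) -> bool:
    """Check that every joint observation with positive probability
    arises from a unique resulting state s' (o_1 + o_2 = global state)."""
    state_of_obs = {}  # joint observation -> the unique s' it can arise from
    for s in model.S:
        for a in model.A:
            for s2 in model.S:
                for o in model.Omega:
                    if model.O(s, a, s2, o) > 0:
                        # setdefault records s2 on first sight of o; a later
                        # mismatch means o cannot pin down the global state.
                        if state_of_obs.setdefault(o, s2) != s2:
                            return False
    return True
```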

  16. Deterministic Dec-POMDP-Com
The model in which P and O in the definition are constrained: Dec-POMDP-Com := <I, S, Ω, A, M, C, P, O, R, T>

  17. Deterministic Dec-POMDP-Com
Restriction 1: Deterministic transitions. P is a transition probability function; we require that for any state s ∈ S and any joint action a ∈ A, there exists a state s' ∈ S such that P(s, a, s') = 1. [Figure: instead of a distribution over successors, e.g., P(s, a, s_1') = 0.1, ..., P(s, a, s_n') = 0.4 with n = |S|, a single successor s' has P(s, a, s') = 1.] That is, the next global state is decided uniquely.

  18. Deterministic Dec-POMDP-Com
Restriction 2: Deterministic observations. O is an observation probability function; we require that for any states s, s' ∈ S and any joint action a ∈ A, there exists a joint observation o ∈ Ω such that O(s, a, s', o) = 1. [Figure: instead of a distribution over observations, e.g., O(s, a, s', o_1) = 0.1, ..., O(s, a, s', o_n) = 0.3 with n = |Ω|, a single observation o has O(s, a, s', o) = 1.] That is, the current observation is decided uniquely.

  19. Deterministic Dec-POMDP-Com
Restriction 1 (deterministic transitions): the next global state is decided uniquely.
Restriction 2 (deterministic observations): the current observation is decided uniquely.
When a Dec-POMDP-Com satisfies Restrictions 1 and 2, it is called a Deterministic Dec-POMDP-Com.
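Both restrictions can be phrased as brute-force checks over a finite model. The sketch below again assumes the `DecPOMDPCom` container from earlier, which is itself an illustrative assumption:

```python
def has_deterministic_transitions(model) -> bool:
    """Restriction 1: for every (s, a), some s' has P(s, a, s') = 1."""
    return all(
        any(model.P(s, a, s2) == 1 for s2 in model.S)
        for s in model.S for a in model.A
    )

def has_deterministic_observations(model) -> bool:
    """Restriction 2: for every (s, a, s'), some o has O(s, a, s', o) = 1."""
    return all(
        any(model.O(s, a, s2, o) == 1 for o in model.Omega)
        for s in model.S for a in model.A for s2 in model.S
    )

def is_deterministic(model) -> bool:
    """A Dec-POMDP-Com satisfying both restrictions is Deterministic."""
    return (has_deterministic_transitions(model)
            and has_deterministic_observations(model))
```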

  20. Outline
- Background
- Scheme of talk
- Review: Dec-POMDP-Com [Goldman+ 04]
- Constrained models: Jointly Fully Observable Dec-POMDP-Com [Goldman+ 04] and Deterministic Dec-POMDP-Com (which we define)
- Theoretical analysis
- Conclusion

  21. Main results
Corollary 1: the minimum required size |M_i| for Signal Learning on Jointly Fully Observable Dec-POMDP-Com.
Theorem 2: the minimum required size |M_i| for Signal Learning with Messages on Deterministic Dec-POMDP-Com.
[Diagram: Jointly Fully Observable and Deterministic Dec-POMDP-Com as subclasses of Dec-POMDP-Com.]
