The Size of Message Set Needed for the Optimal Communication Policy
Tatsuya Kasai, Hayato Kobayashi, and Ayumi Shinohara
Graduate School of Information Sciences, Tohoku University, Japan
The 7th European Workshop on Multi-Agent Systems (EUMAS 2009), Ayia Napa, Cyprus, Dec 17-18, 2009
Background
Multi-agent coordination with communication.
Main objective: to find the optimal action policy $\delta^A$ and communication policy $\delta^M$. We are interested in an approach based on autonomous learning.
Definition of the policies for agent $i$ in our proposed methods:
- Signal Learning (SL) [Kasai+ 08]: action policy $\delta_i^A : \Omega_i \times M_i^{\mathrm{receive}} \to A_i$; communication policy $\delta_i^M : \Omega_i \to M_i$.
- Signal Learning with Messages (SLM) [Kasai+ AAMAS09]: action policy $\delta_i^A : \Omega_i \times M_i^{\mathrm{receive}} \to A_i$; communication policy $\delta_i^M : \Omega_i \times M_i^{\mathrm{receive}} \to M_i$.
Here $\Omega_i$ is the set of observations, $A_i$ the set of actions, $M_i^{\mathrm{receive}}$ the set of received messages, and $M_i$ the set of messages to send to the other agents. (SL and SLM are based on the multi-agent reinforcement learning framework.)
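As a quick aside, the two policy shapes can be written down as type signatures; a minimal sketch in Python (the alias names are ours, not from the paper):

```python
from typing import Callable, TypeVar

Obs = TypeVar("Obs")        # an element of Omega_i (agent i's observations)
Act = TypeVar("Act")        # an element of A_i (agent i's actions)
MsgIn = TypeVar("MsgIn")    # an element of M_i^receive (received messages)
MsgOut = TypeVar("MsgOut")  # an element of M_i (messages to send)

# Both methods condition the action on the received message:
#   delta_i^A : Omega_i x M_i^receive -> A_i
ActionPolicy = Callable[[Obs, MsgIn], Act]

# SL chooses the outgoing message from the observation alone:
#   delta_i^M : Omega_i -> M_i
SLCommPolicy = Callable[[Obs], MsgOut]

# SLM also conditions the outgoing message on the received message:
#   delta_i^M : Omega_i x M_i^receive -> M_i
SLMCommPolicy = Callable[[Obs, MsgIn], MsgOut]
```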
Motivation
Actual learning results of SL and SLM [Kasai+ AAMAS09]: the performance of cooperation improves as the size of $M_i$ increases, for both forms of policies (SL, with $\delta_i^A : \Omega_i \times M_i^{\mathrm{receive}} \to A_i$ and $\delta_i^M : \Omega_i \to M_i$, and SLM, with $\delta_i^M : \Omega_i \times M_i^{\mathrm{receive}} \to M_i$).
[Figure: performance of cooperation (bad to good) plotted against the size of $M_i$.]
This raises the question: how large must $M_i$ be to construct the optimal policy?
Scheme of talk
We show the minimum required sizes $|M_i|$ for achieving the optimal policy for:
- Signal Learning on jointly fully observable Dec-POMDP-Com
- Signal Learning with Messages on deterministic Dec-POMDP-Com
[Figure: Venn diagram of Dec-POMDP-Com with its jointly fully observable and deterministic subclasses.]
Outline
- Background
- Scheme of talk
- Review: Dec-POMDP-Com [Goldman+ 04]
- Constrained models
  - Jointly fully observable Dec-POMDP-Com [Goldman+ 04]
  - Deterministic Dec-POMDP-Com (which we define)
- Theoretical analysis
- Conclusion
Dec-POMDP-Com [Goldman+ 04] (1/3)
Dec-POMDP-Com (Decentralized Partially Observable Markov Decision Process with Communication): a decentralized multi-agent system in which agents can communicate with each other and observe only restricted information.
Example: two agents must get a treasure cooperatively. The treasure is locked, and both agents must reach it at the same time to open the lock.
Dec-POMDP-Com [Goldman+ 04] (2/3)
Formulation: Dec-POMDP-Com := $\langle I, S, \Omega, A, M, C, P, O, R, T \rangle$.
One step for agent $i$ in a Dec-POMDP-Com:
1. Receive an observation $o_i$ from the environment.
2. Send a message $m_i$ to the other agents.
3. Perform an action $a_i$ in the environment.
This repeats until both agents arrive at the treasure.
[Figure: the treasure field with restricted sights $O_1, O_2$, messages $m_1, m_2$, and actions $a_1$ = move right, $a_2$ = move up.]
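To make the interaction protocol concrete, here is a minimal simulation-loop sketch in Python; `env`, `agents`, and their methods are illustrative assumptions, not an API from the paper:

```python
def run_episode(env, agents, horizon):
    """One episode of the Dec-POMDP-Com protocol: each step, every
    agent receives an observation, sends a message, then acts."""
    observations = env.reset()  # one observation o_i per agent
    for t in range(horizon):    # the time horizon T bounds the episode
        # Step 2: each agent i sends a message m_i to the others.
        messages = [agent.send(o) for agent, o in zip(agents, observations)]
        # Step 3: each agent i performs an action a_i, chosen from its
        # observation o_i and the messages received from the other agents.
        actions = [
            agent.act(o, [m for j, m in enumerate(messages) if j != i])
            for i, (agent, o) in enumerate(zip(agents, observations))
        ]
        # Step 1 (of the next step): the environment transitions and
        # emits fresh observations.
        observations, reward, done = env.step(actions)
        if done:  # e.g., both agents reached the treasure simultaneously
            break
```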
Dec-POMDP-Com [Goldman+ 04] (3/3)
Formulation: Dec-POMDP-Com := $\langle I, S, \Omega, A, M, C, P, O, R, T \rangle$, where:
- $I$: a set of agents' indices, e.g., $I = \{1, 2\}$.
- $S$: a set of global states, e.g., $s = (\text{position of agent 1}, \text{position of agent 2}, \text{position of treasure}) \in S$.
- $\Omega$: a set of joint observations, $\Omega = \Omega_1 \times \Omega_2$, where $\Omega_i$ is the set of observations for agent $i$.
- $A$: a set of joint actions, $A = A_1 \times A_2$.
- $M$: a set of joint messages, $M = M_1 \times M_2$.
- $C : M \to \mathbb{R}$: a cost function; $C(m)$ represents the total cost of transmitting the messages sent by all agents.
- $P$: a transition probability function, $P(s, a, s')$.
- $O$: an observation probability function, $O(s, a, s', o)$.
- $R$: a reward function, e.g., rewarding the agents for obtaining the treasure.
- $T$: a time horizon.
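As a summary of the tuple, a sketch of the model as a plain container in Python; the concrete types, and in particular the signature assumed for $R$, are illustrative assumptions, since the slides define the components abstractly:

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = Tuple     # a global state s in S
JointObs = Tuple  # a joint observation o in Omega = Omega_1 x Omega_2
JointAct = Tuple  # a joint action a in A = A_1 x A_2
JointMsg = Tuple  # a joint message m in M = M_1 x M_2

@dataclass
class DecPOMDPCom:
    I: Sequence[int]                              # agents' indices
    S: Sequence[State]                            # global states
    Omega: Sequence[JointObs]                     # joint observations
    A: Sequence[JointAct]                         # joint actions
    M: Sequence[JointMsg]                         # joint messages
    C: Callable[[JointMsg], float]                # message cost C(m)
    P: Callable[[State, JointAct, State], float]  # transition prob. P(s, a, s')
    O: Callable[[State, JointAct, State, JointObs], float]  # obs. prob. O(s, a, s', o)
    R: Callable[[State, JointAct, State], float]  # reward function (signature assumed)
    T: int                                        # time horizon
```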
Outline
- Background
- Scheme of talk
- Review: Dec-POMDP-Com [Goldman+ 04]
- Constrained models
  - Jointly fully observable Dec-POMDP-Com [Goldman+ 04]
  - Deterministic Dec-POMDP-Com (which we define)
- Theoretical analysis
- Conclusion
Jointly fully observable Dec-POMDP-Com [Goldman+ 04]
A Dec-POMDP-Com in which the combination of the agents' observations determines the global state: in the treasure field, $o_1 + o_2 =$ the global state (that is, the system is jointly fully observable).
[Figure: the field split into the two agents' sights $O_1$ and $O_2$.]
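One standard way to state the condition formally (our phrasing, following the Dec-POMDP literature):

```latex
% Joint full observability: whenever a joint observation o can occur
% on arriving at state s', that observation determines s' uniquely.
\forall s, s' \in S,\ \forall a \in A,\ \forall o \in \Omega:\quad
  O(s, a, s', o) > 0 \;\Longrightarrow\; \Pr(s' \mid o) = 1 .
% Equivalently, there is a mapping J : \Omega \to S with J(o) = s'
% whenever o can be observed on arriving at s'.
```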
Deterministic Dec-POMDP-Com
The model in which $P$ and $O$ in the definition Dec-POMDP-Com := $\langle I, S, \Omega, A, M, C, P, O, R, T \rangle$ are constrained.
Deterministic Dec-POMDP-Com
Restriction 1 (deterministic transitions): for any state $s \in S$ and any joint action $a \in A$, there exists a state $s' \in S$ such that $P(s, a, s') = 1$.
Whereas $P$ is in general a transition probability function (e.g., $P(s, a, s_1') = 0.1, \ldots, P(s, a, s_n') = 0.4$ over $n = |S|$ possible successors), under Restriction 1 the next global state is decided uniquely.
Deterministic Dec-POMDP-Com
Restriction 2 (deterministic observations): for any states $s, s' \in S$ and any joint action $a \in A$, there exists a joint observation $o \in \Omega$ such that $O(s, a, s', o) = 1$.
Whereas $O$ is in general an observation probability function (e.g., $O(s, a, s', o_1) = 0.1, \ldots, O(s, a, s', o_n) = 0.3$ over $n = |\Omega|$ possible observations), under Restriction 2 the current observation is decided uniquely.
Deterministic Dec-POMDP-Com
When a Dec-POMDP-Com satisfies both Restriction 1 (the next global state is decided uniquely) and Restriction 2 (the current observation is decided uniquely), we call it a deterministic Dec-POMDP-Com.
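For an explicit, finite model the two restrictions can be checked directly; a sketch assuming a `DecPOMDPCom`-like object with enumerable `S`, `A`, and `Omega` (names as in the container sketch above, not an API from the paper):

```python
def is_deterministic(model, eps=1e-9):
    """Check Restrictions 1 and 2 on an explicitly enumerated model."""
    # Restriction 1: for every (s, a), some s' has P(s, a, s') = 1.
    for s in model.S:
        for a in model.A:
            if not any(abs(model.P(s, a, s2) - 1.0) < eps for s2 in model.S):
                return False
    # Restriction 2: for every (s, a, s'), some o has O(s, a, s', o) = 1.
    for s in model.S:
        for a in model.A:
            for s2 in model.S:
                if not any(abs(model.O(s, a, s2, o) - 1.0) < eps
                           for o in model.Omega):
                    return False
    return True
```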
Outline
- Background
- Scheme of talk
- Review: Dec-POMDP-Com [Goldman+ 04]
- Constrained models
  - Jointly fully observable Dec-POMDP-Com [Goldman+ 04]
  - Deterministic Dec-POMDP-Com (which we define)
- Theoretical analysis
- Conclusion
Main results
- Corollary 1: the minimum required size $|M_i|$ for Signal Learning on jointly fully observable Dec-POMDP-Com.
- Theorem 2: the minimum required size $|M_i|$ for Signal Learning with Messages on deterministic Dec-POMDP-Com.
[Figure: Venn diagram of Dec-POMDP-Com with its jointly fully observable and deterministic subclasses.]
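The quantity being bounded can be paraphrased as follows (our notation; the slides state the results informally and we do not reproduce the concrete bounds here):

```latex
% Minimum message-set size needed for optimality: the smallest |M_i|
% such that some pair of policies over M = M_1 x ... x M_n attains
% the optimal expected reward.
|M_i|_{\min} \;=\; \min\bigl\{\, |M_i| \;:\; \exists\, (\delta^A, \delta^M)
  \text{ over } M = M_1 \times \cdots \times M_n
  \text{ achieving the optimal value} \,\bigr\}
```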