Actor-Attention-Critic for Multi-Agent Reinforcement Learning
Shariq Iqbal and Fei Sha
Outline
● Establish a baseline approach to MARL
● Demonstrate how recent approaches improve on this baseline through sharing information between agents during training
● Present our attention-based approach for information sharing
● Demonstrate our approach's improved effectiveness in terms of scalability and overall performance
Baseline Approach to MARL
Learning with a single-agent RL technique (actor-critic) applied to each agent independently
[Diagram: one independent actor-critic pair per agent, trained from a replay buffer; only the actors interact with the environment during execution]
● Each agent considers only its local information: the actor during execution, and both the actor and critic during training
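As a concrete reference point, here is a minimal sketch of this independent baseline in PyTorch. Class names, dimensions, and architecture are illustrative assumptions, not the authors' code; the point is only that each critic conditions on its own agent's observation and action alone.

```python
import torch
import torch.nn as nn

class IndependentCritic(nn.Module):
    """Q(o_i, a_i): conditions only on agent i's own observation and action."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs_i, act_i):
        # Concatenate the agent's local observation and action; no other
        # agent's information enters the network.
        return self.net(torch.cat([obs_i, act_i], dim=-1))

# One critic per agent; nothing is shared between agents (hypothetical sizes).
n_agents, obs_dim, act_dim = 4, 8, 2
critics = [IndependentCritic(obs_dim, act_dim) for _ in range(n_agents)]
```

From each critic's point of view, the other agents are just part of a non-stationary environment, which is the main downside the next slide addresses.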
Centralizing Training
Addressing the downsides of the independent MARL approach
● Centralized training: each agent's critic takes other agents' actions and observations into account when predicting that agent's returns
● Policies remain decentralized
● Pros:
○ Gives more information to each agent, improving performance
● Cons:
○ Requires communication between agents during training
[Diagram: critics share information with each other during training; actors remain decentralized]
[1] Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. Counterfactual multi-agent policy gradients. In AAAI Conference on Artificial Intelligence, 2018.
[2] Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O. P., and Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pp. 6382–6393, 2017.
But, How to Share?
● Existing approaches [1, 2] concatenate all agents' information into one long vector (see the sketch below)
○ The vector can get large as more agents are added
○ Not all of the information is relevant to a given agent
[Diagram: each critic receives the concatenated observations and actions of every agent]
[1] Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. Counterfactual multi-agent policy gradients. In AAAI Conference on Artificial Intelligence, 2018.
[2] Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O. P., and Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pp. 6382–6393, 2017.
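A hedged sketch of this concatenation-style centralized critic, in the spirit of [1, 2]; names and sizes are illustrative assumptions, not taken from those papers' implementations:

```python
import torch
import torch.nn as nn

class ConcatCritic(nn.Module):
    """Q_i(o_1..o_N, a_1..a_N): input width grows linearly with N agents."""
    def __init__(self, n_agents, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * (obs_dim + act_dim), hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs, all_acts):
        # all_obs: (batch, n_agents, obs_dim); all_acts: (batch, n_agents, act_dim)
        # Flatten every agent's observation and action into one long vector.
        x = torch.cat([all_obs, all_acts], dim=-1).flatten(start_dim=1)
        return self.net(x)
```

Even with these toy sizes (4 agents, 8-dim observations, 2-dim actions) the input is 40-dimensional, and both the input width and the relevance problem grow with every agent added.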
Actor-Attention-Critic
Sharing information between agents using an attention mechanism
● Agents "attend" to the information that is important for predicting their returns
● Information about other agents is encoded into a fixed-size vector
[Diagram: a shared attention mechanism connects the agents' critics during training]
Attention Mechanism in Detail
Sharing information between agents using an attention mechanism
● Agents exchange information with all other agents using a query-key system
● Each agent ultimately receives aggregated information from the other agents that is most relevant to predicting its own returns
[Diagram: each agent's query is matched against the other agents' keys to produce attention weights; the agents' values are summed under these weights into one fixed-size vector]
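A minimal single-head sketch of this query-key-value aggregation inside the critic. The shared encoder, the single head, the masking scheme, and all names and shapes are simplifying assumptions for illustration, not the paper's exact architecture:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCritic(nn.Module):
    def __init__(self, n_agents, in_dim, embed=32):
        super().__init__()
        self.encoder = nn.Linear(in_dim, embed)        # e_j = g(o_j, a_j)
        self.query = nn.Linear(embed, embed, bias=False)
        self.key = nn.Linear(embed, embed, bias=False)
        self.value = nn.Linear(embed, embed, bias=False)
        self.q_head = nn.Linear(2 * embed, 1)          # Q_i from (e_i, attended)
        self.scale = math.sqrt(embed)

    def forward(self, obs_acts):
        # obs_acts: (batch, n_agents, in_dim), each agent's (o_j, a_j) concatenated
        e = self.encoder(obs_acts)                     # (B, N, E)
        q, k, v = self.query(e), self.key(e), self.value(e)
        logits = q @ k.transpose(1, 2) / self.scale    # (B, N, N) compatibilities
        # Mask the diagonal so agent i attends only to the *other* agents.
        n = logits.size(1)
        logits = logits.masked_fill(torch.eye(n, dtype=torch.bool), float('-inf'))
        w = F.softmax(logits, dim=-1)                  # attention weights
        attended = w @ v                               # weighted sum of values
        return self.q_head(torch.cat([e, attended], dim=-1))  # (B, N, 1)
```

The attended summary has size `embed` no matter how many agents are present, which is what keeps the critic's input fixed-size (unlike the concatenation approach) as agents are added.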
Environments
[Figure: renderings of Cooperative Treasure Collection (left) and Rover-Tower (right)]
● Cooperative Treasure Collection
○ Agents with different roles cooperate to collect colored "treasure" around the map
○ Challenge: rewards are shared, so agents must perform multi-agent credit assignment
● Rover-Tower
○ Blind "rovers" and stationary "towers" are randomly paired and must cooperate to reach a goal through communication
○ Challenge: rewards are independent per pair, so agents must learn to select relevant information
● Both tasks are easily scalable and require coordination between heterogeneous agent types
Performance
[Figure: learning curves on Cooperative Treasure Collection (left) and Rover-Tower (right)]
● Our method outperforms baseline methods on both cooperative tasks
Scalability
[Figure: performance as the number of agents increases, on Rover-Tower and Cooperative Treasure Collection]
● Compared to the next-best-performing baseline, our method scales well as agents are added
Thank you!
For more details, please come to our poster: 6:30–9:00 PM, Pacific Ballroom