A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
ECCV 2020 (Spotlight)
Unnat Jain 1*, Luca Weihs 2*, Eric Kolve 2, Ali Farhadi 3, Svetlana Lazebnik 1, Aniruddha Kembhavi 2,3, Alexander Schwing 1
* Equal contribution by UJ and LW
Code, data, and pretrained models at: https://unnat.github.io/cordial-sync/

1. Furniture Moving: continuous coordination task for embodied agents

2. Cordial SYNC policies: MARL beyond marginal policies

Preview of contributions 1. Furniture Moving task 2. Decentralized MARL beyond marginal policies

FurnMove Task FurnLift Task Jain* and Weihs* et al. “Two Body Problem: Collaborative Visual Task Completion” in CVPR 2019

FurnMove Task

Centralized MARL

Centralized MARL
Expressive but introduces issues:
• Joint policy and model complexity scale exponentially
• Requires a high-bandwidth communication channel

Decentralized MARL

Decentralized MARL
Previous methods: a single marginal policy per agent, so the effective joint policy is rank-1

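One policy per agent (rank-1)
Central Agent: represent and sample from the joint policy (rank 2)
\[
\Pi^* = \begin{bmatrix} 0.29 & 0 & 0 & 0.43 \\ 0 & 0 & 0 & 0.06 \\ 0.03 & 0 & 0 & 0.05 \\ 0 & 0 & 0 & 0.14 \end{bmatrix}
\]
Marginal Agents: represent marginal policies and sample independently (rank 1)
\[
\pi^1 = \begin{bmatrix} 0.72 \\ 0.06 \\ 0.08 \\ 0.14 \end{bmatrix} \text{ (Agent 1)}, \qquad \pi^2 = \begin{bmatrix} 0.32 & 0 & 0 & 0.68 \end{bmatrix} \text{ (Agent 2)}
\]
\[
\Pi = \pi^1 \otimes \pi^2 = \begin{bmatrix} 0.23 & 0 & 0 & 0.49 \\ 0.02 & 0 & 0 & 0.04 \\ 0.03 & 0 & 0 & 0.05 \\ 0.04 & 0 & 0 & 0.1 \end{bmatrix}
\]
Effective joint policy of the marginal agents: L1 error 0.26 with respect to \(\Pi^*\)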
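The numbers on this slide can be checked with a short sketch. It assumes the 4×4 joint policy shown for the central agent (rows indexed by Agent 1's action, columns by Agent 2's), takes its row and column sums as the two marginal policies, and compares their outer product against the original. The helper names (`marginals`, `outer`, `l1_error`, `rank`) are illustrative and not taken from the released code.

```python
# Joint policy of the central agent as shown on the slide
# (rows: Agent 1 actions, columns: Agent 2 actions).
PI_STAR = [
    [0.29, 0.0, 0.0, 0.43],
    [0.00, 0.0, 0.0, 0.06],
    [0.03, 0.0, 0.0, 0.05],
    [0.00, 0.0, 0.0, 0.14],
]

def marginals(joint):
    """Return (row marginal, column marginal) of a joint distribution."""
    return [sum(row) for row in joint], [sum(col) for col in zip(*joint)]

def outer(p, q):
    """Outer product of two probability vectors."""
    return [[a * b for b in q] for a in p]

def l1_error(a, b):
    """Sum of absolute entrywise differences between two matrices."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rank(matrix, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = max(range(r, len(m)), key=lambda i: abs(m[i][c]), default=None)
        if pivot is None or abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

pi1, pi2 = marginals(PI_STAR)   # Agent 1 and Agent 2 marginal policies
product = outer(pi1, pi2)       # effective joint policy of marginal agents
error = l1_error(PI_STAR, product)

print([round(p, 2) for p in pi1], [round(p, 2) for p in pi2])
print(rank(PI_STAR), rank(product), round(error, 2))
```

Sampling independently from the two marginals can only ever realize the outer product, which is why its rank is 1 regardless of how expressive each agent's network is.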

Many policies per agent (high-rank)
Agent 1 policies: \(\pi^1_1 = [0, 0.3, 0, 0.7]\), \(\pi^1_2 = [0.9, 0, 0.1, 0]\)
Agent 2 policies: \(\pi^2_1 = [0, 0, 0, 1]\), \(\pi^2_2 = [0.4, 0, 0, 0.6]\)
Mixture weights: \(\alpha_1 = 0.2\), \(\alpha_2 = 0.8\)
Mixture-of-Marginals:
\[
\begin{bmatrix} 0.29 & 0 & 0 & 0.43 \\ 0 & 0 & 0 & 0.06 \\ 0.03 & 0 & 0 & 0.05 \\ 0 & 0 & 0 & 0.14 \end{bmatrix}
= \sum_{j=1}^{2} \alpha_j \cdot (\pi^1_j \otimes \pi^2_j)
= \alpha_1 \cdot (\pi^1_1 \otimes \pi^2_1) + \alpha_2 \cdot (\pi^1_2 \otimes \pi^2_2)
\]
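As a rough check of the mixture-of-marginals idea, the sketch below rebuilds a joint policy from the two per-agent policies and the mixture weights listed on this slide. The pairing of vectors and weights is read off the slide; the function name `mixture_of_marginals` is illustrative only.

```python
# Values read off the slide: two policies per agent and two mixture weights.
ALPHA = [0.2, 0.8]
AGENT1_POLICIES = [[0.0, 0.3, 0.0, 0.7], [0.9, 0.0, 0.1, 0.0]]
AGENT2_POLICIES = [[0.0, 0.0, 0.0, 1.0], [0.4, 0.0, 0.0, 0.6]]

def mixture_of_marginals(alpha, policies1, policies2):
    """Weighted sum of outer products: sum_j alpha_j * (pi1_j outer pi2_j)."""
    n1, n2 = len(policies1[0]), len(policies2[0])
    joint = [[0.0] * n2 for _ in range(n1)]
    for weight, p, q in zip(alpha, policies1, policies2):
        for i in range(n1):
            for k in range(n2):
                joint[i][k] += weight * p[i] * q[k]
    return joint

joint = mixture_of_marginals(ALPHA, AGENT1_POLICIES, AGENT2_POLICIES)
rounded = [[round(x, 2) for x in row] for row in joint]
for row in rounded:
    print(row)
```

With only two components the mixture already matches the rank-2 joint policy on the slide to two decimals, which a single pair of marginals cannot do.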

SYNC-Policies Marginal agents

SYNC-Policies Mixture head

SYNC-Policies Generate m policies per agent

SYNC-Policies Use communication symbols

SYNC-Policies Generate mixture weights

SYNC-Policies Synchronized sampling

SYNC-Policies Select the same policy j across agents (high-rank)
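The synchronized sampling step can be sketched as follows. This is a minimal illustration, not the released implementation: it assumes both agents already hold the same mixture weights and that synchronization comes from a shared pseudo-random seed, while each agent samples its action from a private generator. The class name `SyncAgent`, the seeds, and the example policies are all hypothetical.

```python
import random

# Hypothetical example values: shared mixture weights and m = 2 policies per agent.
ALPHA = [0.2, 0.8]
AGENT1_POLICIES = [[0.0, 0.3, 0.0, 0.7], [0.9, 0.0, 0.1, 0.0]]
AGENT2_POLICIES = [[0.0, 0.0, 0.0, 1.0], [0.4, 0.0, 0.0, 0.6]]

class SyncAgent:
    """Agent that picks the policy index j from a shared random stream."""

    def __init__(self, policies, shared_seed, private_seed):
        self.policies = policies
        # Assumption: identical seed on every agent, so every agent draws the same j.
        self.shared_rng = random.Random(shared_seed)
        # Independent stream used only for this agent's own action.
        self.private_rng = random.Random(private_seed)

    def act(self, alpha):
        j = self.shared_rng.choices(range(len(alpha)), weights=alpha)[0]
        policy = self.policies[j]
        action = self.private_rng.choices(range(len(policy)), weights=policy)[0]
        return j, action

agent1 = SyncAgent(AGENT1_POLICIES, shared_seed=0, private_seed=1)
agent2 = SyncAgent(AGENT2_POLICIES, shared_seed=0, private_seed=2)
samples = [(agent1.act(ALPHA), agent2.act(ALPHA)) for _ in range(1000)]

# Both agents always agree on j, although they never exchange their actions.
print(all(s1[0] == s2[0] for s1, s2 in samples))
```

Because j is shared, action pairs are drawn from the mixture rather than from a product of marginals: in this example, whenever Agent 1 picks action 1 or 3, Agent 2 necessarily picks action 3.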

FurnMove Task

FurnMove Task Agents must • Remain near the TV • Move the TV together

Action Space (details in the paper)
• Single-Agent Navigation: MoveAhead, RotateLeft, RotateRight, Pass
• MoveWithObject (MWO): MWOAhead, MWORight, MWOLeft, MWOBack
• MoveObject (MO): MOAhead, MORight, MOLeft, MOBack
• RotateObject: Right
156/169 ≈ 92.3% of action pairs will always fail.
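The 156/169 figure follows from the size of the action space. The sketch below only reproduces that arithmetic: it lists the 13 per-agent actions named on the slide (the identifier `RotateObjectRight` is an assumed spelling of "RotateObject Right") and takes the count of always-failing pairs, 156, from the slide as given. Which specific pairs fail is described in the paper and is not modelled here.

```python
# Per-agent actions as named on the slide.
NAVIGATION = ["MoveAhead", "RotateLeft", "RotateRight", "Pass"]
MOVE_WITH_OBJECT = ["MWOAhead", "MWORight", "MWOLeft", "MWOBack"]
MOVE_OBJECT = ["MOAhead", "MORight", "MOLeft", "MOBack"]
ROTATE_OBJECT = ["RotateObjectRight"]  # assumed spelling

actions = NAVIGATION + MOVE_WITH_OBJECT + MOVE_OBJECT + ROTATE_OBJECT
num_pairs = len(actions) ** 2          # joint actions for two agents
always_fail = 156                      # count stated on the slide
fail_percent = round(100 * always_fail / num_pairs, 1)

print(len(actions), num_pairs, fail_percent)
```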

Qualitative runs (top-down view)
Field of view: triangles denote field of view & orientation
Trajectories:
• Agent 1 trajectory in red
• Agent 2 trajectory in green
• TV trajectory in blue
[Figure: top-down view with the TV and the goal marked]

Marginal Agents [Video panels: top-down view (not available to agents), Agent 1's view, Agent 2's view]

Cordial SYNC Agents [Video panels: top-down view (not available to agents), Agent 1's view, Agent 2's view]

Quantitative results
• Cordial SYNC agents train as well as the central agents
• Generalize well (with scope for improvement)
• Marginal agents train poorly and worsen without communication

Summary

Summary
1. Rank-1 restriction of marginal agents
\[
\Pi = \pi^1 \otimes \pi^2 = \begin{bmatrix} 0.23 & 0 & 0 & 0.49 \\ 0.02 & 0 & 0 & 0.04 \\ 0.03 & 0 & 0 & 0.05 \\ 0.04 & 0 & 0 & 0.1 \end{bmatrix}
\]
Effective joint policy of the marginal agents: rank 1, L1 error 0.26

Summary
1. Rank-1 restriction of marginal agents
2. Mixture-of-marginals
\[
\begin{bmatrix} 0.29 & 0 & 0 & 0.43 \\ 0 & 0 & 0 & 0.06 \\ 0.03 & 0 & 0 & 0.05 \\ 0 & 0 & 0 & 0.14 \end{bmatrix}
= \sum_{j=1}^{2} \alpha_j \cdot (\pi^1_j \otimes \pi^2_j)
= \alpha_1 \cdot (\pi^1_1 \otimes \pi^2_1) + \alpha_2 \cdot (\pi^1_2 \otimes \pi^2_2)
\]

Summary 1. Rank-1 restriction of marginal agents 2. Mixture-of-marginals 3. SYNC policies 4. FurnMove task 5. Qualitative results

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
https://unnat.github.io/cordial-sync/
Join our live QA or zoom sessions
Panels: Interpreting Communication • Mirrored Gridworld • Joint Policy Visualizations • Detailed evaluation
[Figure: communication analysis plotting agents' reply weights against steps in episode, marking when Agent1 or Agent2 took a pass action or attempted a MoveWithObject action; joint policy visualizations for Cordial SYNC and Marginal (prior)]
