A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks ECCV 2020 (Spotlight) Unnat Jain 1*, Luca Weihs 2*, Eric Kolve 2, Ali Farhadi 3, Svetlana Lazebnik 1, Aniruddha Kembhavi 2,3, Alexander Schwing 1 * Equal contribution by UJ and LW Code, data, and pretrained models at: https://unnat.github.io/cordial-sync/
1. Furniture Moving: continuous coordination task for embodied agents
2. Cordial SYNC policies: MARL beyond marginal policies
Preview of contributions 1. Furniture Moving task 2. Decentralized MARL beyond marginal policies
FurnMove Task FurnLift Task Jain* and Weihs* et al. “Two Body Problem: Collaborative Visual Task Completion” in CVPR 2019
FurnMove Task
Centralized MARL
Centralized MARL Expressive, but introduces issues: • Joint policy and model complexity scale exponentially with the number of agents • Requires a high-bandwidth communication channel
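The exponential scaling can be made concrete with a small counting sketch. It assumes 13 actions per agent (the count implied by the 169 action pairs on the action-space slide) and compares only output sizes, not full model complexity; the function names are hypothetical.

```python
# Assumption: 13 actions per agent, as implied by the 13 x 13 = 169 action pairs in FurnMove.
ACTIONS_PER_AGENT = 13


def joint_action_count(num_agents, actions_per_agent=ACTIONS_PER_AGENT):
    """Number of joint actions a central policy must assign probability to."""
    return actions_per_agent ** num_agents


def marginal_output_count(num_agents, actions_per_agent=ACTIONS_PER_AGENT):
    """Total number of outputs when each agent keeps one marginal policy."""
    return actions_per_agent * num_agents


for n in (1, 2, 3, 4):
    print(n, joint_action_count(n), marginal_output_count(n))
```

The joint count grows as a power of the number of agents, while the marginal count grows linearly.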
Decentralized MARL
Decentralized MARL
Decentralized MARL Previous methods: a single marginal policy per agent, which restricts the effective joint policy to rank 1
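One policy per agent (rank-1)
Central agent: represents and samples from the joint policy (rank 2 in this example). Rows index Agent 1's action, columns index Agent 2's action.
\[ \Pi^* = \begin{bmatrix} 0.29 & 0 & 0 & 0.43 \\ 0 & 0 & 0 & 0.06 \\ 0.03 & 0 & 0 & 0.05 \\ 0 & 0 & 0 & 0.14 \end{bmatrix} \]
Marginal agents: represent marginal policies and sample independently, with \pi^1 = (0.72, 0.06, 0.08, 0.14) and \pi^2 = (0.32, 0, 0, 0.68). The effective joint policy is rank 1:
\[ \Pi = \pi^1 \otimes \pi^2 = \begin{bmatrix} 0.23 & 0 & 0 & 0.49 \\ 0.02 & 0 & 0 & 0.04 \\ 0.03 & 0 & 0 & 0.05 \\ 0.04 & 0 & 0 & 0.1 \end{bmatrix} \]
L1 error between \Pi and \Pi^*: 0.26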
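The numbers on this slide can be checked with a short NumPy sketch. It assumes that rows of the joint policy index Agent 1's action and columns index Agent 2's action; the variable names are illustrative and not taken from the released code.

```python
import numpy as np

# Joint policy of the central agent, copied from the slide.
# Assumption: rows index Agent 1's action, columns index Agent 2's action.
pi_star = np.array([
    [0.29, 0.0, 0.0, 0.43],
    [0.00, 0.0, 0.0, 0.06],
    [0.03, 0.0, 0.0, 0.05],
    [0.00, 0.0, 0.0, 0.14],
])

# Marginal policies: each agent only keeps its own action distribution.
pi_1 = pi_star.sum(axis=1)  # row sums
pi_2 = pi_star.sum(axis=0)  # column sums

# Independent sampling from the marginals gives the outer product as the effective joint policy.
pi_marginal = np.outer(pi_1, pi_2)

l1_error = np.abs(pi_star - pi_marginal).sum()
rank_star = np.linalg.matrix_rank(pi_star)
rank_marginal = np.linalg.matrix_rank(pi_marginal)

print("L1 error:", round(l1_error, 2))
print("rank of central joint policy:", rank_star)
print("rank of marginal joint policy:", rank_marginal)
```

The outer product of two vectors always has rank 1, so no choice of marginals can match a rank-2 joint policy exactly.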
Many policies per agent (high-rank)
Agent 1 policies: \pi^1_1 = (0, 0.3, 0, 0.7), \pi^1_2 = (0.9, 0, 0.1, 0)
Agent 2 policies: \pi^2_1 = (0, 0, 0, 1), \pi^2_2 = (0.4, 0, 0, 0.6)
Mixture weights: \alpha_1 = 0.2, \alpha_2 = 0.8
Mixture-of-marginals:
\[ \sum_{j=1}^{2} \alpha_j \cdot \pi^1_j \otimes \pi^2_j = \alpha_1 \cdot \pi^1_1 \otimes \pi^2_1 + \alpha_2 \cdot \pi^1_2 \otimes \pi^2_2 = \begin{bmatrix} 0.29 & 0 & 0 & 0.43 \\ 0 & 0 & 0 & 0.06 \\ 0.03 & 0 & 0 & 0.05 \\ 0 & 0 & 0 & 0.14 \end{bmatrix} \]
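As a sanity check on the mixture-of-marginals example, the sketch below rebuilds the joint policy from the two policies per agent and the mixture weights shown on the slide. The symbol `alpha` for the mixture weights and the row/column convention (rows for Agent 1, columns for Agent 2) are assumptions made for illustration.

```python
import numpy as np

# Values read off the slide; `alpha` is an illustrative name for the mixture weights.
alpha = np.array([0.2, 0.8])
agent1_policies = np.array([
    [0.0, 0.3, 0.0, 0.7],
    [0.9, 0.0, 0.1, 0.0],
])
agent2_policies = np.array([
    [0.0, 0.0, 0.0, 1.0],
    [0.4, 0.0, 0.0, 0.6],
])

# Weighted sum over j of the outer products of the j-th policies of both agents.
mixture = np.einsum("j,ja,jb->ab", alpha, agent1_policies, agent2_policies)

# Target joint policy from the slide (rounded to two decimals there).
pi_star = np.array([
    [0.29, 0.0, 0.0, 0.43],
    [0.00, 0.0, 0.0, 0.06],
    [0.03, 0.0, 0.0, 0.05],
    [0.00, 0.0, 0.0, 0.14],
])

matches_slide = np.allclose(mixture, pi_star, rtol=0.0, atol=0.005)
mixture_rank = np.linalg.matrix_rank(mixture)
print(np.round(mixture, 2))
print("matches slide up to rounding:", matches_slide, "| rank:", mixture_rank)
```

With two mixture components the result can reach rank 2, which is what lets it match the central agent's joint policy in this example.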
SYNC-Policies Marginal agents
SYNC-Policies Mixture head
SYNC-Policies Generate m policies per agent
SYNC-Policies Use communication symbols
SYNC-Policies Generate mixture weights
SYNC-Policies Synchronized sampling
SYNC-Policies Select the same policy j across agents High-Rank
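The synchronized sampling step can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the released implementation: the agents are assumed to agree on the mixture weights and to share a random seed per step, the policy values are reused from the mixture-of-marginals example, and `agent_step` is a hypothetical helper.

```python
import numpy as np

# Toy values reused from the mixture-of-marginals example; in the method these would be
# produced by the agents' networks (assumption: both agents hold identical weights).
alpha = np.array([0.2, 0.8])
policies = {
    1: np.array([[0.0, 0.3, 0.0, 0.7], [0.9, 0.0, 0.1, 0.0]]),
    2: np.array([[0.0, 0.0, 0.0, 1.0], [0.4, 0.0, 0.0, 0.6]]),
}


def agent_step(agent_id, step):
    """One decentralized step for one agent (hypothetical helper)."""
    # Shared stream: same seed for both agents, so both draw the same index j.
    shared_rng = np.random.default_rng([0, step])
    j = int(shared_rng.choice(len(alpha), p=alpha))
    # Private stream: each agent samples its own action from its j-th policy.
    private_rng = np.random.default_rng([agent_id, step])
    action = int(private_rng.choice(4, p=policies[agent_id][j]))
    return j, action


num_steps = 5000
counts = np.zeros((4, 4))
all_synced = True
for step in range(num_steps):
    j1, a1 = agent_step(1, step)
    j2, a2 = agent_step(2, step)
    all_synced = all_synced and (j1 == j2)
    counts[a1, a2] += 1

empirical = counts / num_steps
mixture = np.einsum("j,ja,jb->ab", alpha, policies[1], policies[2])
print("agents always picked the same j:", all_synced)
print(np.round(empirical, 2))
```

Because both agents draw `j` from the same seeded stream, no joint action needs to be communicated, yet the empirical joint distribution follows the high-rank mixture rather than a product of marginals.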
FurnMove Task
FurnMove Task Agents must • Remain near the TV • Move the TV together
FurnMove Task
FurnMove Task
Action Space (details in the paper)
• Single-agent navigation: MoveAhead, RotateLeft, RotateRight, Pass
• MoveWithObject (MWO): MWOAhead, MWORight, MWOLeft, MWOBack
• MoveObject (MO): MOAhead, MORight, MOLeft, MOBack
• RotateObject: RotateObjectRight
156/169 ≈ 92.3% of action pairs will always fail.
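The 169 in the denominator follows from counting the actions listed on this slide. The short sketch below reproduces the arithmetic; the figure of 156 always-failing pairs is taken from the slide as given, and the name `RotateObjectRight` is read from the slide's "RotateObject Right".

```python
# Action names as listed on the slide.
navigation = ["MoveAhead", "RotateLeft", "RotateRight", "Pass"]
move_with_object = ["MWOAhead", "MWORight", "MWOLeft", "MWOBack"]
move_object = ["MOAhead", "MORight", "MOLeft", "MOBack"]
rotate_object = ["RotateObjectRight"]

actions = navigation + move_with_object + move_object + rotate_object
num_pairs = len(actions) ** 2  # every (Agent 1 action, Agent 2 action) combination

always_fail = 156  # stated on the slide, not derived here
fail_percent = round(100 * always_fail / num_pairs, 1)

print(len(actions), "actions per agent,", num_pairs, "action pairs")
print(fail_percent, "% of pairs always fail")
```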
Qualitative runs (top-down view) Field of view: triangles denote field of view and orientation Trajectories: • Agent 1 trajectory in red • Agent 2 trajectory in green • TV trajectory in blue (Figure labels: Goal, TV)
Marginal Agents: Agent 1's view, Agent 2's view, top-down view (not available to agents)
Cordial SYNC Agents: Agent 1's view, Agent 2's view, top-down view (not available to agents)
Cordial SYNC Agents: Agent 1's view, Agent 2's view, top-down view (not available to agents)
Quantitative results • Cordial SYNC agents train as well as the central agents and generalize well (with scope for improvement) • Marginal agents train poorly and worsen without communication
Summary
Summary 1. Rank-1 restriction of marginal agents
Marginal agents' effective joint policy (rank 1, L1 error 0.26):
\[ \Pi = \pi^1 \otimes \pi^2 = \begin{bmatrix} 0.23 & 0 & 0 & 0.49 \\ 0.02 & 0 & 0 & 0.04 \\ 0.03 & 0 & 0 & 0.05 \\ 0.04 & 0 & 0 & 0.1 \end{bmatrix} \]
Summary 1. Rank-1 restriction of marginal agents 2. Mixture-of-marginals
\[ \sum_{j=1}^{2} \alpha_j \cdot \pi^1_j \otimes \pi^2_j = \alpha_1 \cdot \pi^1_1 \otimes \pi^2_1 + \alpha_2 \cdot \pi^1_2 \otimes \pi^2_2 = \begin{bmatrix} 0.29 & 0 & 0 & 0.43 \\ 0 & 0 & 0 & 0.06 \\ 0.03 & 0 & 0 & 0.05 \\ 0 & 0 & 0 & 0.14 \end{bmatrix} \]
Summary 1. Rank-1 restriction of marginal agents 2. Mixture-of-marginals 3. SYNC policies
Summary 1. Rank-1 restriction of marginal agents 2. Mixture-of-marginals 3. SYNC policies 4. FurnMove task
Summary 1. Rank-1 restriction of marginal agents 2. Mixture-of-marginals 3. SYNC policies 4. FurnMove task 5. Qualitative results
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks https://unnat.github.io/cordial-sync/ Join our live QA or zoom sessions. Also in the paper: • Interpreting communication (communication analysis: reply weights over steps in the episode, marking when Agent 1 or Agent 2 took a Pass action or attempted a MoveWithObject action) • Mirrored gridworld • Joint policy visualizations (Cordial SYNC vs. marginal (prior)) • Detailed evaluation