
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks. ECCV 2020 (Spotlight). Unnat Jain¹*, Luca Weihs²*, Eric Kolve², Ali Farhadi³, Svetlana Lazebnik¹, Aniruddha Kembhavi²,³, Alexander Schwing¹

  1. A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks. ECCV 2020 (Spotlight). Unnat Jain¹*, Luca Weihs²*, Eric Kolve², Ali Farhadi³, Svetlana Lazebnik¹, Aniruddha Kembhavi²,³, Alexander Schwing¹ (* equal contribution by UJ and LW). Code, data, and pretrained models are available online.

  2. (1) Furniture Moving: a continuous coordination task for embodied agents

  3. (2) Cordial SYNC policies: MARL beyond marginal policies

  4. Preview of contributions: (1) Furniture Moving task; (2) decentralized MARL beyond marginal policies

  5. FurnMove Task; FurnLift Task (Jain* and Weihs* et al., “Two Body Problem: Collaborative Visual Task Completion”, CVPR 2019)

  6. FurnMove Task

  7. Centralized MARL

  8. Centralized MARL: expressive, but introduces issues: (a) joint policy and model complexity scale exponentially with the number of agents; (b) it requires a high-bandwidth communication channel.
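To make the exponential scaling concrete, the sketch below counts the joint actions a central policy would have to cover. The figure of 13 actions per agent is an assumption inferred from the 169 (= 13 × 13) action pairs quoted on the Action Space slide, and `joint_action_count` is a hypothetical helper name, not something from the talk.

```python
def joint_action_count(actions_per_agent: int, num_agents: int) -> int:
    """Number of joint actions a single central policy must assign probability to."""
    return actions_per_agent ** num_agents


# Assumed: 13 actions per agent (13 * 13 = 169 pairs for two agents).
for num_agents in range(1, 5):
    print(num_agents, joint_action_count(13, num_agents))
```

With two agents the joint space already has 169 entries, and each additional agent multiplies it by 13.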

  9. Decentralized MARL

  10. Decentralized MARL

  11. Decentralized MARL. Previous methods: a single marginal policy per agent (rank-1)

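  12. One policy per agent (rank-1)
      Marginal agents: represent marginal policies and sample independently.
      Central agent: represents and samples from the joint policy.
      Joint policy (rank 2):
      $$\Pi^* = \begin{bmatrix} 0.29 & 0 & 0 & 0.43 \\ 0 & 0 & 0 & 0.06 \\ 0.03 & 0 & 0 & 0.05 \\ 0 & 0 & 0 & 0.14 \end{bmatrix}$$
      Its marginals are $(0.72, 0.06, 0.08, 0.14)$ over rows and $(0.32, 0, 0, 0.68)$ over columns.
      Effective joint policy of the marginal agents (rank 1):
      $$\Pi = \pi^1 \otimes \pi^2 = \begin{bmatrix} 0.23 & 0 & 0 & 0.49 \\ 0.02 & 0 & 0 & 0.04 \\ 0.03 & 0 & 0 & 0.05 \\ 0.04 & 0 & 0 & 0.1 \end{bmatrix}$$
      L1 error between $\Pi$ and $\Pi^*$: 0.26.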
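The numbers on this slide can be checked with a few lines of plain Python. This is a minimal sketch: it takes the rank-2 joint policy from the slide, computes its two marginals, forms their outer product, and sums the absolute differences. The helper names (`marginals`, `outer`, `l1_error`) are hypothetical, and which agent indexes the rows is an assumption that does not affect the error.

```python
# Joint policy as read from the slide (rows: one agent's actions, columns: the other's).
PI_STAR = [
    [0.29, 0.0, 0.0, 0.43],
    [0.00, 0.0, 0.0, 0.06],
    [0.03, 0.0, 0.0, 0.05],
    [0.00, 0.0, 0.0, 0.14],
]


def marginals(joint):
    """Return (row sums, column sums) of a joint distribution."""
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    return rows, cols


def outer(p, q):
    """Outer product of two marginal policies: the joint policy of independent sampling."""
    return [[pi * qj for qj in q] for pi in p]


def l1_error(a, b):
    """Sum of absolute entrywise differences between two matrices."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


row_marginal, col_marginal = marginals(PI_STAR)
rank_one = outer(row_marginal, col_marginal)
error = l1_error(PI_STAR, rank_one)
print(round(error, 2))  # prints 0.26
```

Independent sampling also puts mass on pairs the joint policy never chooses, for example about 0.02 on row 2, column 1, which is zero in the joint policy.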

  13. Many policies per agent (high-rank): mixture-of-marginals
      j = 1: Agent 1 policy $(0, 0.3, 0, 0.7)$, Agent 2 policy $(0, 0, 0, 1)$, mixture weight 0.2
      j = 2: Agent 1 policy $(0.9, 0, 0.1, 0)$, Agent 2 policy $(0.4, 0, 0, 0.6)$, mixture weight 0.8
      $$\sum_{j=1}^{2} \alpha_j \cdot (\pi^1_j \otimes \pi^2_j) = \alpha_1 \cdot (\pi^1_1 \otimes \pi^2_1) + \alpha_2 \cdot (\pi^1_2 \otimes \pi^2_2) = \begin{bmatrix} 0.29 & 0 & 0 & 0.43 \\ 0 & 0 & 0 & 0.06 \\ 0.03 & 0 & 0 & 0.05 \\ 0 & 0 & 0 & 0.14 \end{bmatrix}$$
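As a check on the mixture-of-marginals numbers, the sketch below sums the weighted outer products of the two policy pairs from the slide and prints the result rounded to two decimals. The assignment of the policy vectors to Agent 1 and Agent 2 follows the column order of the slide and is an assumption; `mixture_of_marginals` is a hypothetical helper name.

```python
# Per-agent policies and mixture weights as read from the slide.
AGENT1_POLICIES = [[0.0, 0.3, 0.0, 0.7], [0.9, 0.0, 0.1, 0.0]]
AGENT2_POLICIES = [[0.0, 0.0, 0.0, 1.0], [0.4, 0.0, 0.0, 0.6]]
ALPHA = [0.2, 0.8]


def mixture_of_marginals(policies_1, policies_2, alpha):
    """Joint policy sum_j alpha_j * (pi1_j outer pi2_j)."""
    n1, n2 = len(policies_1[0]), len(policies_2[0])
    joint = [[0.0] * n2 for _ in range(n1)]
    for weight, p1, p2 in zip(alpha, policies_1, policies_2):
        for i in range(n1):
            for k in range(n2):
                joint[i][k] += weight * p1[i] * p2[k]
    return joint


joint = mixture_of_marginals(AGENT1_POLICIES, AGENT2_POLICIES, ALPHA)
for row in joint:
    print([round(x, 2) for x in row])
```

Up to two-decimal rounding, the two-component mixture reproduces the rank-2 joint policy that a single pair of marginals could only approximate.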

  14. SYNC-Policies: marginal agents

  15. SYNC-Policies: mixture head

  16. SYNC-Policies: generate m policies per agent

  17. SYNC-Policies: use communication symbols

  18. SYNC-Policies: generate mixture weights

  19. SYNC-Policies: synchronized sampling

  20. SYNC-Policies: select the same policy j across agents (high-rank)
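One way to picture synchronized sampling is the sketch below. It assumes each agent holds the same mixture weights and a shared random seed, so both draw the same component j without exchanging it, and then each samples its own action independently from its j-th policy. The shared-seed mechanism, the function names, and the example policies (borrowed from the mixture-of-marginals slide) are illustrative assumptions, not the authors' implementation.

```python
import random

ALPHA = [0.2, 0.8]
AGENT1_POLICIES = [[0.0, 0.3, 0.0, 0.7], [0.9, 0.0, 0.1, 0.0]]
AGENT2_POLICIES = [[0.0, 0.0, 0.0, 1.0], [0.4, 0.0, 0.0, 0.6]]


def pick_component(alpha, shared_seed, step):
    """Every agent seeds an identical generator, so every agent draws the same j."""
    rng = random.Random(shared_seed * 1_000_003 + step)
    return rng.choices(range(len(alpha)), weights=alpha, k=1)[0]


def act(policies, alpha, shared_seed, step, private_rng):
    """Pick the shared component j, then sample this agent's action from policy j."""
    j = pick_component(alpha, shared_seed, step)
    action = private_rng.choices(range(len(policies[j])), weights=policies[j], k=1)[0]
    return j, action


rng_1, rng_2 = random.Random(1), random.Random(2)  # private, unsynchronized
for step in range(5):
    j_1, a_1 = act(AGENT1_POLICIES, ALPHA, 42, step, rng_1)
    j_2, a_2 = act(AGENT2_POLICIES, ALPHA, 42, step, rng_2)
    print(step, j_1, j_2, a_1, a_2)
```

Because both agents always use the same j, the sampled action pair is drawn from the mixture and never lands on a pair that every component assigns zero probability.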

  21. FurnMove Task

  22. FurnMove Task. Agents must:
      • Remain near the TV
      • Move the TV together

  23. FurnMove Task

  24. FurnMove Task

  25. Action Space (details in the paper)
      • Single-agent navigation: MoveAhead, RotateLeft, RotateRight, Pass
      • MoveWithObject (MWO): MWOAhead, MWORight, MWOLeft, MWOBack
      • MoveObject (MO): MOAhead, MORight, MOLeft, MOBack
      • RotateObject: Right
      156/169 ≈ 92.3% of action pairs will always fail.
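The 156/169 figure follows from the size of the action space. The sketch below lists the 13 actions named on the slide, squares that count to get the number of two-agent action pairs, and computes the quoted fraction. The identifier `RotateObjectRight` is an assumed spelling of the slide's "RotateObject Right", and the count of 156 failing pairs is taken from the slide rather than derived here.

```python
ACTIONS = [
    # Single-agent navigation
    "MoveAhead", "RotateLeft", "RotateRight", "Pass",
    # MoveWithObject (MWO)
    "MWOAhead", "MWORight", "MWOLeft", "MWOBack",
    # MoveObject (MO)
    "MOAhead", "MORight", "MOLeft", "MOBack",
    # RotateObject (assumed identifier)
    "RotateObjectRight",
]

total_pairs = len(ACTIONS) ** 2   # each agent picks one of 13 actions
failing_pairs = 156               # figure quoted on the slide
fraction = failing_pairs / total_pairs
print(total_pairs, f"{fraction:.1%}")  # prints: 169 92.3%
```

Only 169 − 156 = 13 pairs are not guaranteed to fail, which is why uncoordinated sampling from two marginal policies wastes most steps.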

  26. Qualitative runs: top-down view (showing the goal and the TV)
      Field of view: triangles denote field of view and orientation
      Trajectories:
      • Agent 1 trajectory in red
      • Agent 2 trajectory in green
      • TV trajectory in blue

  27. Marginal Agents (video panels: Agent 1's view, Agent 2's view, top-down view; the top-down view is not available to agents)

  28. Cordial SYNC Agents (video panels: Agent 1's view, Agent 2's view, top-down view; the top-down view is not available to agents)

  29. Cordial SYNC Agents (video panels: Agent 1's view, Agent 2's view, top-down view; the top-down view is not available to agents)

  30. Quantitative results: Cordial SYNC agents train as well as the Central agents and generalize well (with scope for improvement). Marginal agents train poorly and worsen without communication.

  31. Summary

  32. Summary: (1) Rank-1 restriction of marginal agents
      Marginal agents, effective joint policy (rank 1, L1 error 0.26):
      $$\Pi = \pi^1 \otimes \pi^2 = \begin{bmatrix} 0.23 & 0 & 0 & 0.49 \\ 0.02 & 0 & 0 & 0.04 \\ 0.03 & 0 & 0 & 0.05 \\ 0.04 & 0 & 0 & 0.1 \end{bmatrix}$$

  33. Summary: (1) Rank-1 restriction of marginal agents; (2) Mixture-of-marginals
      $$\sum_{j=1}^{2} \alpha_j \cdot (\pi^1_j \otimes \pi^2_j) = \alpha_1 \cdot (\pi^1_1 \otimes \pi^2_1) + \alpha_2 \cdot (\pi^1_2 \otimes \pi^2_2) = \begin{bmatrix} 0.29 & 0 & 0 & 0.43 \\ 0 & 0 & 0 & 0.06 \\ 0.03 & 0 & 0 & 0.05 \\ 0 & 0 & 0 & 0.14 \end{bmatrix}$$

  34. Summary: (1) Rank-1 restriction of marginal agents; (2) Mixture-of-marginals; (3) SYNC policies

  35. Summary: (1) Rank-1 restriction of marginal agents; (2) Mixture-of-marginals; (3) SYNC policies; (4) FurnMove task

  36. Summary: (1) Rank-1 restriction of marginal agents; (2) Mixture-of-marginals; (3) SYNC policies; (4) FurnMove task; (5) Qualitative results

  37. A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks. Join our live QA or zoom sessions.
      • Interpreting communication
      • Mirrored gridworld agents
      • Communication analysis: reply weights over steps in episode, marking where Agent 1 or Agent 2 took a Pass action or attempted a MoveWithObject action
      • Joint policy visualizations: Cordial SYNC vs. Marginal (prior)
      • Detailed evaluation

