QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Hostallero, Yung Yi
School of Electrical Engineering, KAIST
Cooperative Multi-Agent Reinforcement Learning
Examples: drone swarm control, cooperation games, network optimization
• Distributed multi-agent systems with a shared reward
• Each agent has an individual, partial observation
• No communication between agents
• The goal is to maximize the shared reward
Background
• Fully centralized training: Q_jt(τ, u), π_jt(τ, u)
  • Not applicable to distributed systems
• Fully decentralized training: Q_i(τ_i, u_i), π_i(τ_i, u_i)
  • Non-stationarity problem
• Centralized training with decentralized execution
  • Value function factorization [1, 2], actor-critic methods [3, 4]: Q_jt(τ, u) → Q_i(τ_i, u_i), π_i(τ_i, u_i)
  • Applicable to distributed systems
  • No non-stationarity problem

[1] Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V. F., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J. Z., Tuyls, K., and Graepel, T. Value-decomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of AAMAS, 2018.
[2] Rashid, T., Samvelyan, M., Schroeder, C., Farquhar, G., Foerster, J., and Whiteson, S. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of ICML, 2018.
[3] Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. Counterfactual multi-agent policy gradients. In Proceedings of AAAI, 2018.
[4] Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., and Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proceedings of NIPS, 2017.
Previous Approaches
• VDN (additivity assumption)
  • Represents the joint Q-function as a sum of individual Q-functions: Q_jt(τ, u) = Σ_{i=1}^{N} Q_i(τ_i, u_i)
• QMIX (monotonicity assumption)
  • The joint Q-function is monotonic in the per-agent Q-functions: ∂Q_jt(τ, u) / ∂Q_i(τ_i, u_i) ≥ 0
• Both have limited representational complexity: neither can represent the non-monotonic matrix game below, whose optimal joint action is (A, A) (a minimal sketch of the two mixing assumptions follows this slide)

Non-monotonic matrix game (rows: agent 1's action, columns: agent 2's action)
      A     B     C
A     8   -12   -12
B   -12     0     0
C   -12     0     0

[Figure: learned Q_1, Q_2 and reconstructed joint Q-values for VDN and QMIX on this game; neither recovers the optimal joint action (A, A).]
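As a concrete illustration of the two assumptions above, here is a minimal sketch (illustrative only, not the authors' implementation) of how VDN and QMIX combine per-agent utilities into a joint value. The hypernetwork that produces QMIX's non-negative mixing weights from the global state is replaced by fixed placeholder weights.

```python
import numpy as np

# Hypothetical illustration: combining per-agent utilities Q_i(tau_i, u_i)
# into a joint value Q_jt under the VDN and QMIX assumptions.
rng = np.random.default_rng(0)

n_agents = 2
q_i = rng.normal(size=n_agents)            # chosen-action utilities, one per agent

# VDN: additivity assumption -- the joint Q is simply the sum.
q_jt_vdn = q_i.sum()

# QMIX: monotonicity assumption -- the joint Q is a mixing network whose
# weights are constrained to be non-negative, so dQ_jt/dQ_i >= 0.
# (In the real method these weights come from hypernetworks conditioned on
#  the global state; here they are just fixed placeholders.)
w1 = np.abs(rng.normal(size=(n_agents, 8)))   # non-negative first-layer weights
b1 = rng.normal(size=8)
w2 = np.abs(rng.normal(size=(8, 1)))          # non-negative second-layer weights
b2 = rng.normal(size=1)

hidden = np.maximum(q_i @ w1 + b1, 0.0)       # ReLU here for simplicity (QMIX uses ELU)
q_jt_qmix = float(hidden @ w2 + b2)

print("VDN joint Q:", q_jt_vdn, " QMIX joint Q:", q_jt_qmix)
```

Both constructions are monotone in each Q_i, which is exactly why neither can express the non-monotonic payoff structure of the matrix game above.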
QTRAN: Learning to Factorize with Transformation
• Instead of factorizing the joint Q-function directly, we factorize a transformed joint Q-function
• An additional objective function enforces the transformation so that:
  • The original joint Q-function and the transformed joint Q-function have the same optimal policy
  • The transformed joint Q-function is linearly factorizable
  • No argmax operation over the original joint Q-function is required
• Training (a loss sketch follows this slide):
  ① L_td: update the true joint Q-function Q_jt(τ, u) with the TD error on the shared reward
  ② L_opt, L_nopt: make the optimal actions of Q_jt and of the factorized networks coincide

[Figure: architecture — a centralized network estimates the original joint Q-function Q_jt(τ, u); per-agent networks Q_1(τ_1, u_1), …, Q_N(τ_N, u_N) are used for decentralized action selection, and their sum forms the transformed joint Q-function Q'_jt(τ, u).]
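The following is a minimal sketch, under assumed tensor names and shapes, of the three objectives listed above (L_td, L_opt, L_nopt). It follows the slide's simplified description; the state-value correction term used in the full paper is only noted in a comment.

```python
import torch

# Sketch of the three QTRAN objectives described on this slide.
# All inputs are placeholders with assumed shapes: batch size B, N agents.

def qtran_losses(q_jt_taken,        # Q_jt(tau, u) for the executed joint action, [B]
                 q_jt_target_next,  # target-network Q_jt(tau', u') at the next greedy actions, [B]
                 q_jt_at_greedy,    # Q_jt(tau, u_bar) at the current greedy local actions, [B]
                 q_i_taken,         # per-agent Q_i(tau_i, u_i) for the executed actions, [B, N]
                 q_i_max,           # per-agent max_u Q_i(tau_i, u), [B, N]
                 reward, gamma=0.99):
    # (1) L_td: train the true joint Q-function with an ordinary TD error.
    td_target = reward + gamma * q_jt_target_next
    l_td = ((q_jt_taken - td_target.detach()) ** 2).mean()

    # Transformed joint Q-function: linearly factorizable sum of local utilities.
    q_prime_taken = q_i_taken.sum(dim=1)
    q_prime_greedy = q_i_max.sum(dim=1)

    # (2) L_opt: drive Q'_jt - Q_jt to zero at the greedy local actions.
    l_opt = ((q_prime_greedy - q_jt_at_greedy.detach()) ** 2).mean()

    # (3) L_nopt: penalize Q'_jt - Q_jt whenever it goes negative elsewhere, so it
    # stays non-negative for non-greedy actions.
    # (The full paper adds a state-value correction V_jt(tau) to this gap; it is
    #  omitted here to match the slide's summary.)
    gap = q_prime_taken - q_jt_taken.detach()
    l_nopt = (torch.clamp(gap, max=0.0) ** 2).mean()

    return l_td, l_opt, l_nopt
```

Because action selection uses only the per-agent Q_i, execution stays fully decentralized even though Q_jt is trained centrally.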
Theoretical Analysis
• The objective functions make Q'_jt − Q_jt zero for the optimal action (L_opt) and positive for the rest (L_nopt); then the optimal actions coincide (Theorem 1)
• Our theoretical analysis demonstrates that QTRAN handles a richer class of tasks than VDN and QMIX, namely the tasks satisfying the IGM condition:
  argmax_u Q_jt(τ, u) = (argmax_{u_1} Q_1(τ_1, u_1), …, argmax_{u_N} Q_N(τ_N, u_N))

[Figure: QTRAN results on the matrix game — the learned Q_1, Q_2, Q'_jt, Q_jt, and Q'_jt − Q_jt; the joint values are recovered (≈ 8.00, −12.00, 0.00) and the optimal joint action (A, A) is selected.]
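To make the IGM condition concrete, here is a small numeric check on the matrix game from the earlier slide. The local utilities q1 and q2 below are illustrative placeholders, not the values learned by QTRAN.

```python
import numpy as np

# Numeric check of the IGM condition on the non-monotonic matrix game.
q_jt = np.array([[  8, -12, -12],
                 [-12,   0,   0],
                 [-12,   0,   0]], dtype=float)   # true joint payoff, rows = agent 1

q1 = np.array([1.0, 0.0, 0.0])   # placeholder local utilities; any Q_i whose
q2 = np.array([1.0, 0.0, 0.0])   # individual argmax is action A satisfies IGM here

greedy_joint = tuple(int(i) for i in np.unravel_index(q_jt.argmax(), q_jt.shape))
greedy_local = (int(q1.argmax()), int(q2.argmax()))

# IGM holds iff the per-agent greedy actions jointly maximize Q_jt.
print("IGM satisfied:", greedy_joint == greedy_local)   # True: both pick (A, A)

# Note: Q'_jt = Q1 + Q2 need not equal Q_jt anywhere; QTRAN only requires the
# argmax to match, which is why it covers games that VDN and QMIX cannot represent.
```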
Results
• QTRAN outperforms VDN and QMIX by a substantial margin, especially when the game exhibits more severe non-monotonic characteristics
Thank you! Pacific Ballroom #58