D++: Structural Credit Assignment in Tightly Coupled Multiagent Domains
Aida Rahmattalabi, Jen Jen Chung, Kagan Tumer
Autonomous Agents and Distributed Intelligence Lab, OSU Robotics
Problem Definition
[Diagram: a team of agents and the resulting team performance]
Loosely Coupled vs. Tightly Coupled Agents
Loose coupling:
• The task consists of many single-robot tasks
• Each robot uses/requires little knowledge of the other robots to accomplish the task
Tight coupling:
• Multiple robots are required to achieve the task
• The robots are mutually dependent on each other's performance
• The objective function is inherently non-smooth
Learning is Challenging in Tightly Coupled Tasks:
The probability of SUFFICIENT agents, picking the RIGHT ACTION, at the RIGHT TIME is LOW.
How can we devise agent-specific evaluation functions to reward the stepping-stone actions?
Difference Evaluation Function (Agogino and Tumer, 2004)
– Measures an individual agent's contribution to the global team performance
– Removes an agent and replaces it with a "counterfactual" agent
D_i(z) = G(z) - G(z_{-i})
where G(z) is the global system performance ("the world with me") and G(z_{-i}) is the global system performance excluding the effects of agent i ("the world without me")
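A minimal sketch of how this evaluation could be computed, assuming the environment exposes a scalar team evaluation global_reward(joint_state) and that the counterfactual is simply the agent's removal from the joint state (the function name and the dict-based joint state are illustrative assumptions, not from the slides):

```python
def difference_reward(global_reward, joint_state, agent_id):
    """Difference evaluation D_i = G(z) - G(z_-i).

    global_reward: callable mapping a joint state (dict: agent_id -> state)
                   to the scalar team performance G(z).
    joint_state:   joint state of the full team ("the world with me").
    agent_id:      agent whose contribution is being isolated.
    """
    # G(z): the world with agent i
    g_with = global_reward(joint_state)

    # G(z_-i): the world without agent i; here the counterfactual is removal,
    # but a default/null agent could be substituted instead
    without_i = {a: s for a, s in joint_state.items() if a != agent_id}
    g_without = global_reward(without_i)

    return g_with - g_without
```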
D++: An Extension to the Difference Reward (D)
– The reward function evaluates the performance of a "super agent"
– It introduces n additional "counterfactual" copies of agent i
D++_i(z, n) = [G(z_{+n_i}) - G(z)] / n
where G(z_{+n_i}) is the global system performance where "multiple copies of me" are present, and G(z) is the global system performance
– Provides agents with a stronger feedback signal
– Rewards the stepping stones that lead to achieving the system objective
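Under the same illustrative conventions, a sketch of the D++ evaluation for a given number of counterfactual copies n (the copy-naming scheme is a placeholder, not the authors' implementation):

```python
def dpp_reward(global_reward, joint_state, agent_id, n):
    """D++_i(n) = (G(z with n extra copies of agent i) - G(z)) / n."""
    g_base = global_reward(joint_state)

    # Add n counterfactual copies of agent i ("multiple copies of me")
    with_copies = dict(joint_state)
    for c in range(n):
        with_copies[f"{agent_id}_copy{c}"] = joint_state[agent_id]
    g_super = global_reward(with_copies)

    # Normalize by the number of counterfactual agents added
    return (g_super - g_base) / n
```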
Example:
D++: An Extension to the Difference Reward (D)
• How many counterfactual agents should be added? Search over different numbers of counterfactual agents until a non-zero reward is reached.
• What if a sufficient number of agents is already available? Is D++ enough? Calculate both D and D++ and choose the higher of the two.
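One plausible reading of this procedure, as a sketch rather than the authors' exact pseudocode, combines the two helpers sketched above: compute D first, add counterfactual copies one at a time until D++ becomes non-zero, then keep the larger of the two values. max_copies is an assumed cap (e.g. the remaining team size):

```python
def dpp_credit(global_reward, joint_state, agent_id, max_copies):
    """Assign credit to agent_id using D and D++ as described on the slides."""
    d = difference_reward(global_reward, joint_state, agent_id)

    # Search over increasing numbers of counterfactual copies until the
    # "super agent" produces a non-zero change in the global reward.
    dpp = 0.0
    for n in range(1, max_copies + 1):
        dpp = dpp_reward(global_reward, joint_state, agent_id, n)
        if dpp != 0.0:
            break

    # When enough agents are already present, plain D may carry the stronger
    # signal, so the agent is rewarded with the higher of the two values.
    return max(d, dpp)
```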
Cooperative Coevolutionary Algorithm (CCEA)
• Train NN policy weights via a cooperative coevolutionary algorithm (CCEA):
1. Initialize M populations of k NNs
2. Mutate each to create M populations of 2k NNs
3. Randomly select one NN from each population to create team T_i
4. Assess team performance and assign fitness to team members (credit assignment)
5. Retain the k best-performing NNs of each population and repeat from step 2
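A compact sketch of one generation of this loop, assuming M agents, k weight vectors per population, Gaussian weight mutation, and a team_fitness callable that returns one fitness value per team member (e.g. G, D, or D++); all names and the mutation scheme are illustrative assumptions:

```python
import random

def ccea_generation(populations, team_fitness, k, noise=0.1):
    """One generation of the cooperative coevolutionary loop.

    populations:  list of M lists, each holding k weight vectors (one per agent).
    team_fitness: callable(team) -> list of M fitness values (credit assignment).
    """
    # Mutate each population of k NNs to create populations of 2k NNs.
    for pop in populations:
        for net in list(pop):
            pop.append([w + random.gauss(0.0, noise) for w in net])

    # Form 2k random teams, one NN from each population per team.
    for pop in populations:
        random.shuffle(pop)
    fitness = [[0.0] * (2 * k) for _ in populations]
    for t in range(2 * k):
        team = [pop[t] for pop in populations]
        scores = team_fitness(team)          # assess team, assign credit
        for m, s in enumerate(scores):
            fitness[m][t] = s

    # Retain the k best-performing NNs of each population.
    for m, pop in enumerate(populations):
        ranked = sorted(range(2 * k), key=lambda t: fitness[m][t], reverse=True)
        populations[m] = [pop[t] for t in ranked[:k]]
    return populations
```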
Domain: Multi-robot Exploration
• Neural-network controllers
– NN state vector [s_1, s_2], computed per quadrant q around robot i:
s_{1,q,i} = \sum_{j \in I_q} V_j / d(L_j, L_i)   (value-weighted density of POIs in quadrant q)
s_{2,q,i} = \sum_{i' \in N_q} 1 / d(L_{i'}, L_i)   (density of other robots in quadrant q)
– Control actions: [dx, dy]
• Team observation reward:
G = \sum_i \sum_j \sum_k N_{i,j} N_{i,k} V_i / ((1/2)(d_{i,j} + d_{i,k}))
where N_{i,j} indicates that robot j observes POI i, V_i is the value of POI i, and d_{i,j} is the distance from robot j to POI i
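Read concretely, the team observation reward can be sketched as below; the observation radius and the use of unordered robot pairs are assumptions, and the tighter coupling levels used in the experiments (3 or 6 simultaneous observations) would add a minimum-observer requirement on top of this:

```python
import math

def team_observation_reward(poi_positions, poi_values, robot_positions,
                            obs_radius=4.0):
    """Sketch of G = sum_i sum_{j,k} N_ij * N_ik * V_i / (0.5 * (d_ij + d_ik)).

    A POI i contributes through every pair of distinct robots (j, k) that both
    lie within obs_radius of it (N_ij = N_ik = 1), weighted by the POI value
    V_i and the pair's average distance. obs_radius is an assumed parameter.
    """
    g = 0.0
    for (px, py), value in zip(poi_positions, poi_values):
        # Distances from POI i to every robot; keep only those close enough
        dists = [math.hypot(px - rx, py - ry) for rx, ry in robot_positions]
        observers = [d for d in dists if d <= obs_radius]  # N_ij = 1

        # For tighter coupling, this contribution would additionally be gated
        # on len(observers) >= required_observations (e.g. 3 or 6).
        for j in range(len(observers)):
            for k in range(j + 1, len(observers)):
                g += value / (0.5 * (observers[j] + observers[k]))
    return g
```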
Experiments:
Number of robots | Number of POIs | Type          | Required observations
12               | 10             | Homogeneous   | 3
12               | 10             | Homogeneous   | 6
9                | 15             | Heterogeneous | [1,1,1]
9                | 15             | Heterogeneous | [3,1,1]
Homogeneous Agents: Number of observations = 3
Homogeneous Agents: Learned Policies of D++ learners
Homogeneous Agents: Number of observations = 6
Heterogeneous Agents: Number of observations = [1, 1, 1]
[Plot: team performance G(z) vs. calls to G for G, D, and D++ learners]
Heterogeneous Agents: Learned Policies of D++ learners
[Plot: learned robot trajectories in the X-Y plane]
Heterogeneous Agents: Number of observations = [3, 1, 1]
[Plot: team performance G(z) vs. calls to G for G, D, and D++ learners]
Conclusion
• D++ is a new reward structure for tightly coupled multiagent domains
• D++ outperforms both G and D by rewarding the stepping-stone actions required for long-term success
• Robot heterogeneity and tighter coupling challenge G and D learners, while D++ learners can still learn high-reward policies