D++: Structural Credit Assignment in Tightly Coupled Multiagent Domains



  1. D++: Structural Credit Assignment in Tightly Coupled Multiagent Domains. Aida Rahmattalabi, Jen Jen Chung, Kagan Tumer. Autonomous Agents and Distributed Intelligence Lab, OSU Robotics.

  2. Problem Definition (figure relating team behavior to team performance)

  3. Loosely Coupled vs Tightly Coupled Agents. Loose coupling: • Task consists of many single-robot tasks • Each robot uses/requires little knowledge of the other robots to accomplish the task. Tight coupling: • Multiple robots are required to achieve the task • Mutual dependence of the robots on each other's performance • The objective function is inherently non-smooth

  4. Learning is Challenging in Tightly Coupled Tasks:

  5. Learning is Challenging in Tightly Coupled Tasks: The probability of SUFFICIENT agents,

  6. Learning is Challenging in Tightly Coupled Tasks: The probability of SUFFICIENT agents, picking the RIGHT ACTION

  7. Learning is Challenging in Tightly Coupled Tasks: The probability of SUFFICIENT agents, picking the RIGHT ACTION, at the RIGHT TIME

  8. Learning is Challenging in Tightly Coupled Tasks: The probability of SUFFICIENT agents, picking the RIGHT ACTION, at the RIGHT TIME is LOW

  9. Learning is Challenging in Tightly Coupled Tasks: The probability of SUFFICIENT agents, picking the RIGHT ACTION, at the RIGHT TIME is LOW. How can we devise agent-specific evaluation functions to reward the stepping-stone actions?

  10. Difference Evaluation Function (Agogino and Tumer, 2004) – Measures an individual agent's contribution to the global team performance – Removes an agent and replaces it with a "counterfactual" agent – Compares the global system performance ("the world with me") against the global system performance excluding the effects of agent i ("the world without me")
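For reference, the difference evaluation is commonly written as below; the symbols z (joint state-action), z_{-i} (z with agent i removed), and c_i (the counterfactual replacing agent i) are assumed notation, not taken from the slide.

    D_i(z) = G(z) - G(z_{-i} \cup c_i)

The first term is the global system performance with agent i present; the second is the global performance when agent i is replaced by the counterfactual, so D_i isolates agent i's own contribution.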

  11. D++: An Extension to the Difference Reward (D) – The reward function evaluates the performance of a "super agent" – It introduces additional "counterfactual" agents: the global system performance where "multiple copies of me" are present is compared against the actual global system performance – Provides agents with a stronger feedback signal – Rewards the stepping stones that lead to achieving the system objective
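A hedged sketch of the D++ signal with n counterfactual copies of agent i; the notation z_{+n·i} and the normalization by n are assumptions about the general D++ formulation and are not shown on the slide.

    D_i^{++}(n, z) = \frac{G(z_{+n \cdot i}) - G(z)}{n}

Adding copies of agent i lets the evaluation become nonzero even when agent i alone cannot satisfy the coupling requirement, which is what rewards the stepping-stone actions.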

  12. Example:

  13. D++: An Extension to Difference Reward (D) • How many "counterfactual" agents should be added?

  14. D++: An Extension to Difference Reward (D) • How many "counterfactual" agents should be added? Search over different numbers of counterfactual agents until a nonzero reward is reached

  15. D++: An Extension to Difference Reward (D) • How many "counterfactual" agents should be added? Search over different numbers of counterfactual agents until a nonzero reward is reached • What if a sufficient number of agents is already available? Is D++ enough?

  16. D++: An Extension to Difference Reward (D) • How many "counterfactual" agents should be added? Search over different numbers of counterfactual agents until a nonzero reward is reached • What if a sufficient number of agents is already available? Is D++ enough? Calculate both D and D++ and choose the higher of the two (see the sketch below)
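A minimal Python sketch of the decision rule described on slides 14-16, assuming access to a global evaluation G and hypothetical helpers remove_agent and add_copies for building counterfactual joint states; it is an illustration under those assumptions, not the authors' implementation.

    # Hypothetical helpers: global_eval(z) returns G(z); remove_agent(z, i) replaces
    # agent i with a counterfactual; add_copies(z, i, n) adds n counterfactual copies
    # of agent i. All names are assumptions for illustration.

    def difference_reward(z, i, global_eval, remove_agent):
        # D_i = G(z) - G(z with agent i replaced by a counterfactual)
        return global_eval(z) - global_eval(remove_agent(z, i))

    def dpp_reward(z, i, n_agents, global_eval, remove_agent, add_copies):
        d = difference_reward(z, i, global_eval, remove_agent)
        g = global_eval(z)
        dpp = 0.0
        # Search over the number of counterfactual copies until a nonzero
        # (stepping-stone) signal appears.
        for n in range(1, n_agents):
            dpp = (global_eval(add_copies(z, i, n)) - g) / n
            if dpp > 0.0:
                break
        # Slide 16: compute both D and D++ and use whichever is higher.
        return max(d, dpp)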

  17. Cooperative Coevolutionary Algorithm (CCEA) • Train NN policy weights via a cooperative coevolutionary algorithm (CCEA): initialize M populations of k NNs; mutate each to create M populations of 2k NNs; randomly select one NN from each population to create a team T_i; assess team performance and assign fitness to team members (credit assignment); retain the k best performing NNs of each population (see the sketch below)
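A compact Python sketch of the CCEA loop on this slide, assuming hypothetical mutate and evaluate_team callables (the latter is where G, D, or D++ would supply the per-agent fitness); the population-handling details are illustrative, not the authors' code.

    import random

    def ccea(populations, n_generations, mutate, evaluate_team):
        # populations: M lists of k policies (e.g. neural networks).
        # mutate(policy) -> mutated copy; evaluate_team(team) -> per-agent fitnesses.
        k = len(populations[0])
        for _ in range(n_generations):
            # Mutate: each population grows from k to 2k candidate policies.
            populations = [pop + [mutate(p) for p in pop] for pop in populations]
            fitness = {}
            # Randomly select one policy from each population to form 2k teams,
            # so every candidate is evaluated once.
            for pop in populations:
                random.shuffle(pop)
            for t in range(2 * k):
                team = [pop[t] for pop in populations]
                for policy, f in zip(team, evaluate_team(team)):  # credit assignment
                    fitness[id(policy)] = f
            # Retain the k best performing policies in each population.
            populations = [sorted(pop, key=lambda p: fitness[id(p)], reverse=True)[:k]
                           for pop in populations]
        return populations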

  18. Domain: Multi-robot Exploration • Neural-network controllers – NN state vector [s_1, s_2] per sensor quadrant q: s_{1,q,i} = \sum_{j \in I_q} V_j / d(L_j, L_i) (POI sensor) and s_{2,q,i} = \sum_{i' \in N_q} 1 / d(L_{i'}, L_i) (robot sensor) – Control actions [dx, dy] • Team observation reward: G = \sum_i \sum_j \sum_k \frac{N_{i,j} N_{i,k} V_i}{\frac{1}{2}(d_{i,j} + d_{i,k})}
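A rough Python sketch of the team observation reward as reconstructed above; the observation-radius test, the distance floor, and all parameter names (poi_positions, obs_radius, ...) are illustrative assumptions, not the authors' exact domain code.

    import math

    def team_observation_reward(poi_positions, poi_values, robot_positions, obs_radius):
        # For every POI and every pair of robots observing it, add the POI value
        # divided by the pair's mean distance to the POI.
        g = 0.0
        for (px, py), v in zip(poi_positions, poi_values):
            dists = [max(math.hypot(px - rx, py - ry), 1.0)  # floor avoids division by zero
                     for (rx, ry) in robot_positions]
            observers = [d for d in dists if d <= obs_radius]  # N_{i,j} = 1 within radius
            for a in range(len(observers)):
                for b in range(a + 1, len(observers)):
                    g += v / (0.5 * (observers[a] + observers[b]))
        return g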

  19. Experiments:
      Number of robots | Number of POIs | Type          | Required observations
      12               | 10             | Homogeneous   | 3
      12               | 10             | Homogeneous   | 6
      9                | 15             | Heterogeneous | [1, 1, 1]
      9                | 15             | Heterogeneous | [3, 1, 1]

  20.–21. Homogeneous Agents: Number of observations = 3

  22.–25. Homogeneous Agents: Learned Policies of D++ learners

  26.–27. Homogeneous Agents: Number of observations = 6

  28. Heterogeneous Agents: Number of observations = [1, 1, 1] (learning curves: G(z) vs. calls to G for G, D, and D++ learners)

  29. Heterogeneous Agents: Learned Policies of D++ learners (robot trajectory plot in the X–Y plane)

  30. Heterogeneous Agents: Number of observations = [3, 1, 1] (learning curves: G(z) vs. calls to G for G, D, and D++ learners)

  31. Conclusion • D++ is a new reward structure for tightly coupled multiagent domains • D++ outperforms both G and D – It rewards the stepping-stone actions required for long-term success • Robot heterogeneity and tighter coupling challenge G and D learners – D++ learners can still learn high-reward policies

  32. D++: Structural Credit Assignment in Tightly Coupled Multiagent Domains. Aida Rahmattalabi, Jen Jen Chung, Kagan Tumer. Autonomous Agents and Distributed Intelligence Lab, OSU Robotics.
