Communicating with Unknown Teammates

Samuel Barrett (1), Noa Agmon (2), Noam Hazon (3), Sarit Kraus (2, 4), Peter Stone (1)

1 University of Texas at Austin, {sbarrett,pstone}@cs.utexas.edu
2 Bar-Ilan University, {agmon,sarit}@macs.biu.ac.il
3 Ariel University, noamh@ariel.ac.il
4 University of Maryland

ECAI, Aug 21, 2014

Ad Hoc Teamwork
◮ Only in control of a single agent or subset of agents
◮ Unknown teammates
◮ No pre-coordination
◮ Shared goals

Examples in humans:
◮ Pick-up soccer
◮ Accident response

Motivation
◮ Agents are becoming more common and lasting longer
  ◮ Both robots and software agents
◮ Pre-coordination may not be possible
◮ Agents should be robust to various teammates
◮ Past work focused on cases with no communication

Research Question: How can an agent act and communicate optimally with teammates of uncertain types?

Example
[Figure: an ad hoc agent and its teammates, with the message "How long does the first road take?"]

Outline
1 Introduction
2 Problem Description
3 Theoretical Results
4 Empirical Results
5 Conclusions

Problem Description
◮ Multi-armed bandit
  ◮ Two Bernoulli arms
  ◮ Ad hoc agent observes all payoffs
◮ Multi-agent
  ◮ Simultaneous actions
◮ Limited communication
  ◮ Fixed set of messages
  ◮ Has explicit cost
◮ Goal: Maximize payoffs and minimize communication costs

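The setting above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `TwoArmedBandit`, the helper `team_return`, and the single flat `message_cost` are hypothetical and do not come from the slides, which say only that communication has an explicit cost.

```python
import random


class TwoArmedBandit:
    """Two Bernoulli arms pulled simultaneously by several agents (hypothetical helper)."""

    def __init__(self, p0, p1, seed=None):
        self.means = (p0, p1)            # success probability of each arm
        self.rng = random.Random(seed)   # private RNG so runs are reproducible

    def step(self, choices):
        """choices[i] is the arm (0 or 1) pulled by agent i; returns one 0/1 payoff per agent."""
        return [1 if self.rng.random() < self.means[arm] else 0 for arm in choices]


def team_return(payoff_history, messages_sent, message_cost):
    """Total team payoff minus communication cost (assumes one flat cost per message)."""
    total_payoff = sum(sum(round_payoffs) for round_payoffs in payoff_history)
    return total_payoff - messages_sent * message_cost
```

The objective on the slide trades off the two terms in `team_return`: a message is worth sending only if the payoff it is expected to add exceeds its cost.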
Communication
◮ Last observation: the last arm chosen and the resulting payoff
◮ Arm mean: the mean and number of pulls of a selected arm
◮ Suggestion: suggest that your teammates should pull the selected arm

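One way to represent the three message types is as small immutable records. The type and field names below are assumptions made for illustration; the slides specify only what information each message carries.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class LastObservation:
    arm: int      # the last arm chosen
    payoff: int   # the resulting 0/1 payoff


@dataclass(frozen=True)
class ArmMean:
    arm: int      # the selected arm
    mean: float   # observed mean payoff of that arm
    pulls: int    # number of pulls behind that mean


@dataclass(frozen=True)
class Suggestion:
    arm: int      # the arm the teammates are asked to pull


Message = Union[LastObservation, ArmMean, Suggestion]


def describe(msg: Message) -> str:
    """Human-readable rendering of a message (hypothetical helper)."""
    if isinstance(msg, LastObservation):
        return f"arm {msg.arm} just paid {msg.payoff}"
    if isinstance(msg, ArmMean):
        return f"arm {msg.arm} has mean {msg.mean:.2f} over {msg.pulls} pulls"
    return f"please pull arm {msg.arm}"
```

For example, `describe(ArmMean(arm=1, mean=0.75, pulls=8))` renders an arm-mean message as text.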
Teammates
◮ Limited number of types
◮ Continuous parameters
◮ Tightly coordinated
  ◮ Team shares knowledge through communication
  ◮ Do not need to track each agent's pulls

Teammate Behaviors

ε-Greedy
◮ Track arm means
◮ Usually choose greedily
◮ ε: fraction of time spent exploring

UCB(c)
◮ Track arm means and pulls
◮ Choose greedily with respect to bounds
◮ c: weight given to bounds

Both types
◮ Have a probability of following a suggestion sent by the ad hoc agent

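The two teammate behaviors can be sketched as action-selection rules. This is a minimal sketch, not the authors' implementation: the exact bound `mean + c * sqrt(ln(total) / pulls)` is an assumption modeled on the standard UCB1 rule, and the names `epsilon_greedy`, `ucb`, `teammate_action`, and `follow_prob` are hypothetical.

```python
import math
import random


def epsilon_greedy(means, epsilon, rng):
    """With probability epsilon explore uniformly; otherwise pick the best observed mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=lambda arm: means[arm])


def ucb(means, pulls, c):
    """Pick the arm with the highest upper confidence bound (assumed UCB1-style bound)."""
    total = sum(pulls)

    def bound(arm):
        if pulls[arm] == 0:
            return math.inf  # untried arms are always tried first
        return means[arm] + c * math.sqrt(math.log(total) / pulls[arm])

    return max(range(len(means)), key=bound)


def teammate_action(base_choice, suggestion, follow_prob, rng):
    """Follow the ad hoc agent's suggestion with probability follow_prob, else act as usual."""
    if suggestion is not None and rng.random() < follow_prob:
        return suggestion
    return base_choice
```

Under this sketch, ε, c, and `follow_prob` are the continuous parameters that distinguish teammates of the same type.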
Research Question
Can an ad hoc agent approximately plan to communicate optimally with these teammates in polynomial time?

Model
◮ Model as a POMDP (teammates' behaviors)
◮ State:
  ◮ Pulls and successes:
    ◮ Teammates'
    ◮ Ad hoc agent's
    ◮ Communicated

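The state listed above can be written down as three sets of per-arm counters. The sketch below also includes a generic Bayes-rule update over candidate teammate behaviors, which is one common way to handle the hidden part of a POMDP; the slides do not say how the belief is maintained, so `update_belief`, the type labels, and the likelihood values are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Per arm: (pulls, successes). Two arms, as in the problem description.
Counts = Tuple[Tuple[int, int], Tuple[int, int]]


@dataclass(frozen=True)
class State:
    teammates: Counts      # pulls and successes of the teammates
    ad_hoc: Counts         # pulls and successes of the ad hoc agent
    communicated: Counts   # pulls and successes that have been communicated


def record_pull(counts: Counts, arm: int, payoff: int) -> Counts:
    """Return new counters with one more pull of `arm` and its 0/1 payoff added."""
    updated = list(counts)
    pulls, successes = updated[arm]
    updated[arm] = (pulls + 1, successes + payoff)
    return tuple(updated)


def update_belief(belief: Dict[str, float], likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Bayes rule: reweight each candidate behavior by how well it explains the observed action."""
    unnormalized = {name: belief[name] * likelihoods[name] for name in belief}
    total = sum(unnormalized.values())
    if total == 0:
        return dict(belief)  # no candidate explains the observation; keep the prior
    return {name: weight / total for name, weight in unnormalized.items()}
```

Because the teammates are tightly coordinated, the counters are kept per group rather than per agent, which keeps the state small.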