Reasoning about Hypothetical Agent Behaviours and their Parameters
Stefano Albrecht and Peter Stone
Sections: Introduction, Approach, Experiments
Introduction
Motivation: Ad Hoc Teamwork
Design an individual agent which can collaborate effectively with other agents, without pre-coordination
- Flexibility: ability to collaborate with different teammates
- Efficiency: find an effective policy quickly
AAAI 2010 Challenge Paper (Stone et al.)
Multiagent Interaction without Prior Coordination (MIPC)
- JAAMAS Special Issue on MIPC
- AAMAS'17 Workshop on MIPC
- mipc.inf.ed.ac.uk
Type-Based Method
1. Hypothesise possible types of other agents:
   Each type $\theta_j \in \Theta_j$ is a blackbox behaviour specification $P(a_j \mid H^t, \theta_j)$
2. Compute belief over types based on interaction history $H^t$:
   $P(\theta_j \mid H^t) \propto P(H^t \mid \theta_j)\, P(\theta_j)$
3. Plan own action with respect to belief over types
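The belief computation in step 2 can be illustrated with a minimal sketch. It assumes that $P(H^t \mid \theta_j)$ factorises into per-step action likelihoods, so the belief can be updated one observed action at a time. The two types (`mostly_left`, `uniform`), the action names and the function `update_belief` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def update_belief(belief, types, history, action):
    """One Bayesian step: new belief is proportional to
    P(action | history, type) * old belief, then normalised."""
    unnorm = {
        name: types[name](history).get(action, 0.0) * belief[name]
        for name in belief
    }
    z = sum(unnorm.values())
    if z == 0.0:
        # Assumption: if no type explains the action, keep the old belief.
        return dict(belief)
    return {name: w / z for name, w in unnorm.items()}


# Hypothetical blackbox types: each maps a history to action probabilities.
def mostly_left(history):
    return {"left": 0.9, "right": 0.1}


def uniform(history):
    return {"left": 0.5, "right": 0.5}


types = {"mostly_left": mostly_left, "uniform": uniform}
belief = {"mostly_left": 0.5, "uniform": 0.5}  # prior P(theta_j)
history = []
for action in ["left", "left", "left"]:
    belief = update_belief(belief, types, history, action)
    history.append(action)
```

Because the types are only queried for action probabilities, any model could stand behind `mostly_left` or `uniform`, which is the sense in which the types are blackboxes.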
Type-Based Method
- HBA (Albrecht & Ramamoorthy, AIJ'16)
- PLASTIC (Barrett & Stone, AIJ'16)
Type-Based Method and Parameters
Type-based method is useful for ad hoc teamwork:
- Flexible: can hypothesise any types
- Efficient: can learn true type with few observations
But...
Limitation: method does not recognise parameters in types!
- Complex behaviours often have parameters
- To reason about n parameter settings, we have to store n copies of the same type with different parameter settings
⇒ Inefficient, does not scale
Type-Based Method and Parameters
Goal in this work
Devise a method which allows the agent to reason about both:
- Relative likelihood of types, and
- Values of bounded continuous parameters in types
The method should:
- Keep blackbox nature of types (can be any model)
- Work with any continuous parameters in types
Approach
Approach
For each $\theta_j \in \Theta_j$, maintain parameter estimate $p \in [p^{\min}, p^{\max}]^n$
Update estimates after new observations
Updating an estimate incurs two computational costs:
1. Computing the new parameter estimate
   - Types are blackboxes: must sample effects of parameters
   ⇒ Need general, efficient estimation methods
2. Adjusting the internal state of the type
   - May depend on history of observations and parameter values
   ⇒ New estimate may introduce model inconsistency
Approach: Selective Parameter Updating
1. Observe action $a_j^{t-1}$ of agent $j$
2. Select types $\Phi \subset \Theta_j$ for updating
3. For each $\theta_j \in \Phi$, update estimate $p^{t-1} \to p^t$
4. Update beliefs: $P(\theta_j \mid H^t) \propto P(a_j^{t-1} \mid H^{t-1}, \theta_j, p^t)\, P(\theta_j \mid H^{t-1})$
5. Plan own action
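The loop above can be sketched in a few lines, under several assumptions that are not taken from the slides: the selection rule (update only the type with the highest current belief), the estimator (a grid search maximising the likelihood of all observed actions), and the names `ParamType`, `select_types`, `update_estimate` and `step` are all hypothetical placeholders. The planning step is omitted.

```python
import math


class ParamType:
    """Hypothetical blackbox type with one bounded parameter p."""

    def __init__(self, name, p_min, p_max, likelihood):
        self.name = name
        self.p_min, self.p_max = p_min, p_max
        self.likelihood = likelihood  # (action, history, p) -> probability
        self.p = 0.5 * (p_min + p_max)  # assumed initial estimate: midpoint


def select_types(types, belief, k=1):
    # Assumed selection rule: the k types with the highest current belief.
    return sorted(types, key=lambda t: belief[t.name], reverse=True)[:k]


def update_estimate(t, observed, grid=21):
    # Assumed estimator: grid search over [p_min, p_max] for the parameter
    # value that maximises the log-likelihood of all observed actions.
    def log_lik(p):
        return sum(
            math.log(max(t.likelihood(a, observed[:i], p), 1e-12))
            for i, a in enumerate(observed)
        )

    step_size = (t.p_max - t.p_min) / (grid - 1)
    candidates = [t.p_min + step_size * i for i in range(grid)]
    t.p = max(candidates, key=log_lik)


def step(types, belief, observed):
    """observed: all actions of agent j so far, latest action last."""
    for t in select_types(types, belief):
        update_estimate(t, observed)
    action, history = observed[-1], observed[:-1]
    unnorm = {
        t.name: t.likelihood(action, history, t.p) * belief[t.name]
        for t in types
    }
    z = sum(unnorm.values())
    return {n: w / z for n, w in unnorm.items()} if z > 0 else dict(belief)


# Hypothetical types: "biased" plays left with probability p,
# "fixed" ignores its parameter and plays uniformly.
biased = ParamType("biased", 0.0, 1.0,
                   lambda a, h, p: p if a == "left" else 1.0 - p)
fixed = ParamType("fixed", 0.0, 1.0, lambda a, h, p: 0.5)
types = [biased, fixed]
belief = {"biased": 0.5, "fixed": 0.5}
observed = []
for action in ["left", "left", "left", "right"]:
    observed.append(action)
    belief = step(types, belief, observed)
```

Only the selected types pay the cost of re-estimation in each step; the belief update still covers every type, using whatever estimate each type currently holds.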
Updating Parameter Estimates
Given type $\theta_j$, update parameter estimate $p^{t-1} \to p^t$
Type defines action likelihoods $P(a_j^{t-1} \mid H^{t-1}, \theta_j, p)$
[Figure: likelihood surfaces $P(a_j^0 \mid H^0, \theta_j, p_1, p_2)$, $P(a_j^1 \mid H^1, \theta_j, p_1, p_2)$ and $P(a_j^2 \mid H^2, \theta_j, p_1, p_2)$; axes $p_1$ and $p_2$, ticks from -5 to 5]
Approximate Bayesian Updating (ABU)
Idea: construct Bayesian update using polynomials
- Maintain prior $P(p \mid H^{t-1}, \theta_j)$, represented as a polynomial
- Approximate likelihood $f(p) = P(a_j^{t-1} \mid H^{t-1}, \theta_j, p)$ as a polynomial by sampling over $p$
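A rough sketch of the polynomial idea for a single parameter follows. It is an illustration under stated assumptions, not the authors' implementation: the likelihood is fitted by Lagrange interpolation at equally spaced sample points, the posterior is the normalised product of the prior and likelihood polynomials, and no degree reduction is applied, so the degree grows with every update. All function names, the `degree` argument and the example likelihood are hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_eval(c, x):
    """Evaluate a polynomial at x using Horner's rule."""
    acc = 0.0
    for coef in reversed(c):
        acc = acc * x + coef
    return acc


def poly_integral(c, lo, hi):
    """Definite integral of a polynomial over [lo, hi]."""
    anti = [0.0] + [coef / (k + 1) for k, coef in enumerate(c)]
    return poly_eval(anti, hi) - poly_eval(anti, lo)


def interpolate(xs, ys):
    """Coefficients of the Lagrange interpolant through (xs, ys)."""
    result = [0.0] * len(xs)
    for i in range(len(xs)):
        basis, denom = [1.0], 1.0
        for j in range(len(xs)):
            if j != i:
                basis = poly_mul(basis, [-xs[j], 1.0])
                denom *= xs[i] - xs[j]
        for k, coef in enumerate(basis):
            result[k] += ys[i] * coef / denom
    return result


def abu_step(prior, likelihood, p_min, p_max, degree=4):
    """Sample the blackbox likelihood, fit a polynomial, multiply by the
    prior polynomial and normalise the product over [p_min, p_max]."""
    xs = [p_min + (p_max - p_min) * k / degree for k in range(degree + 1)]
    f_hat = interpolate(xs, [likelihood(x) for x in xs])
    product = poly_mul(prior, f_hat)
    z = poly_integral(product, p_min, p_max)
    return [c / z for c in product]


# Hypothetical example: uniform prior on [0, 1], likelihood f(p) = p.
prior = [1.0]
posterior = abu_step(prior, lambda p: p, 0.0, 1.0, degree=2)
posterior_mean = poly_integral(poly_mul([0.0, 1.0], posterior), 0.0, 1.0)
```

A fitted polynomial can dip below zero between sample points, so a practical version would need to guard against negative density values and keep the degree bounded; both are left out of this sketch.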