Planning for Cooperative Multiple Agents with Sparse Interactions
Guy Revach
Supervised by Professor Nahum Shimkin
Technion - Israel Institute of Technology
The Andrew and Erna Viterbi Faculty of Electrical Engineering
November 21, 2017
Guy Revach (Technion IIT) Cooperative Planning November 21, 2017 1 / 28
Overview
We consider the problem of planning for multiple cooperative agents in a deterministic environment under a finite time horizon. We propose a model of interacting agents that partially decouples the agents in the group through the notion of soft cooperation constraints. We present a two-step planning algorithm that breaks down a K-agent problem into K independent single-agent problems, such that the aggregation of the single-agent plans is optimal for the group. We propose an efficient algorithm for computing a response function and a parametric policy under soft and hard constraints, and we use a well-known graphical model for efficient min-sum computation. The planning algorithm is complete, optimal, and efficient when interactions among the agents are sparse.
Cooperative Multi-Agents
A multi-agent system (MAS) is a distributed system composed of multiple interacting intelligent agents within a shared environment. Agents with a common goal work together to carry out tasks or solve problems that are difficult or impossible for a single agent. The decision of each individual agent depends on the actions of the other agents, so agents need to interact and coordinate to ensure that individual decisions result in a jointly optimal decision for the group. Diversity among agents (spatial, temporal, functional) is a major driver for distributing the execution of the main task.
Cooperative Multi-Agent Planning (MAP)
MAP is the process of distributing tasks and coordinating the resources and activities of multiple agents. It applies to many domains, and each domain may call for a different planning strategy according to the problem's assumptions and specifications. Optimizing performance with respect to team objectives is considered to be NP-hard, and the problem resembles well-known hard combinatorial problems such as network flow variants, the multi-depot multiple traveling salesman problem, and the vehicle routing problem. It is a challenging problem that requires considerably more computing resources than the more restrictive single-agent planning problem. Given the growing interest in this problem in both theory and practice, there is a need for an efficient mechanism for coordinating the agents' actions so as to optimize their joint performance measure.
Coupling Level and Sparse Interactions
The inherent complexity of a MAP task is often described by its coupling level (Brafman and Domshlak, 2008), that is, the number of interactions that arise among agents during the resolution of the task. Sparsely coupled tasks require few interactions among agents and are perceived as easier to plan for, whereas tightly coupled tasks require many interactions to obtain a solution plan. Some methods take a general approach and are equally effective regardless of the coupling level of the task; in other approaches, the algorithm's complexity depends on the coupling level. The main question is to what extent planning for a MAS is harder than solving the individual planning problems over each agent's domain in isolation.
Planning Approaches
In the coupled approach, planning is formalized as a global search. It usually incorporates a single centralized decision-maker that plans simultaneously for all agents. Although complete, optimal, and easier to design, it usually leads to a large optimization problem that is computationally intensive and may not scale well in practice. The decoupled approach decouples the decisions to some degree by decomposing the problem into several sub-problems. An advanced planner may leverage the distributed structure of the MAP task to improve efficiency; theoretical optimality, and sometimes even completeness, may be traded off for improved efficiency. In the plan merging approach, each agent first plans locally, leaving some degrees of freedom as free parameters, and then a single centralized entity coordinates and merges the individual solutions into a globally optimal joint solution.
Related Work on Coordination Graphs
Guestrin, Koller, and Parr studied MAP with factored MDPs and presented the idea of cooperative action selection via coordination graphs. A group of cooperative agents, each with its own set of possible actions and its own observations, coordinates to globally select an optimal joint action that achieves a common goal and maximizes the group's joint long-term utility in the MDP.
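To make coordination-graph action selection concrete, here is a minimal Python sketch of variable elimination on a hypothetical three-agent chain g1-g2-g3. The payoff tables `f12` and `f23` are made up for illustration and are not taken from Guestrin, Koller, and Parr; the sketch only checks that eliminating the leaf agents recovers the same joint action as brute-force enumeration.

```python
from itertools import product

ACTIONS = (0, 1)  # hypothetical binary action set for every agent

def f12(a1, a2):
    # Hypothetical payoff on edge g1-g2: reward for matching actions.
    return 2.0 if a1 == a2 else 0.0

def f23(a2, a3):
    # Hypothetical payoff on edge g2-g3: reward for differing actions.
    return 3.0 if a2 != a3 else 1.0

def total(a):
    """Global payoff = sum of the local edge payoffs."""
    return f12(a[0], a[1]) + f23(a[1], a[2])

def brute_force():
    """Enumerate all |A|^K joint actions (the centralized baseline)."""
    return max(product(ACTIONS, repeat=3), key=total)

def variable_elimination():
    """Eliminate the leaves g1 and g3, then maximize over g2."""
    # Best response of g1 (resp. g3) to every action of its neighbour g2.
    best1 = {a2: max(ACTIONS, key=lambda a1: f12(a1, a2)) for a2 in ACTIONS}
    best3 = {a2: max(ACTIONS, key=lambda a3: f23(a2, a3)) for a2 in ACTIONS}
    # g2 maximizes the sum of its edges given the conditional best responses.
    a2 = max(ACTIONS, key=lambda x: f12(best1[x], x) + f23(x, best3[x]))
    return (best1[a2], a2, best3[a2])

print(variable_elimination(), brute_force())
```

On a chain, each elimination step touches a single edge table, so the work grows linearly with the number of agents instead of as |A|^K; on general graphs the cost depends on the elimination order.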
Motivation for MAP Model
The standard formulation is generic but tightly coupled: the complexity of the standard planning algorithm is exponential in the number of agents. Our model is motivated by sparsely coupled real-world problems, and we aim to decouple the agents to obtain an efficient algorithm. For example, the military needs to optimally coordinate tactical units in order to share limited resources, increase safety, and conserve energy.
Multi-Agent Standard Model
$G$ is a group of $K$ agents: $G = \{g_1, \dots, g_k, \dots, g_K\}$.
$\mathcal{T}$ is the time domain, and $T$ is its finite horizon: $t \in \mathcal{T} = \{0, 1, \dots, T\}$.
$S^K$ is a factored state space: $S^K = S_1 \times \dots \times S_K$.
$\vec{\sigma}_I, \vec{\sigma}^* \in S^K$ are the source and target state vectors.
$A^K$ is a factored action space: $A^K = A_1 \times \dots \times A_K$.
$H^K$ is a deterministic factored state transition function for the group:
$$H^K \triangleq H_1 \times \dots \times H_K : S^K \times A^K \to S^K, \qquad \vec{s}_{t+1} = H^K(\vec{s}_t, \vec{a}_t) \tag{1}$$
$C_k$ is a coupled, time-dependent, immediate cost function for $g_k$:
$$C_k : S^K \times A^K \times \mathcal{T} \to \mathbb{R} \cup \{\infty\}, \qquad J_{t,k} = C_k(\vec{s}_t, \vec{a}_t, t) \tag{2}$$
Agents are coupled via the transition function and via the cost function.
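The factored spaces grow multiplicatively: $|S^K| = \prod_k |S_k|$ and $|A^K| = \prod_k |A_k|$. The short Python sketch below, assuming a hypothetical setting where every agent moves on a 10x10 grid (100 cells) with 5 moves, shows how quickly the joint spaces blow up as K increases.

```python
import math

def joint_space_sizes(state_sizes, action_sizes):
    """Return (|S^K|, |A^K|) for factored spaces S_1 x ... x S_K and A_1 x ... x A_K."""
    return math.prod(state_sizes), math.prod(action_sizes)

if __name__ == "__main__":
    # Hypothetical example: each agent lives on a 10x10 grid and has 5 moves.
    for K in (1, 2, 4, 8):
        S, A = joint_space_sizes([100] * K, [5] * K)
        print(f"K={K}: |S^K|={S}, |A^K|={A}, joint state-action pairs={S * A}")
```

Any planner that enumerates joint state-action pairs per time step inherits this product, which is why the standard formulation scales exponentially in K.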
Multi-Agent Policy
Let $\pi^K \in \Pi^K$ be a deterministic factored policy for $K$ agents. Given the current state, $\pi^K$ defines the action:
$$\pi^K : S^K \times \mathcal{T} \to A^K, \qquad \vec{a}_t = \pi^K(\vec{s}_t, t) \tag{3}$$
Let $J_{\pi^K}$ be the cumulative cost of the policy $\pi^K$ under the finite horizon $T$ and the termination constraint $\vec{s}_T = \vec{\sigma}^*$:
$$J_{\pi^K} = \begin{cases} \sum_{t=0}^{T-1} \sum_{k=1}^{K} J_{t,k} = \sum_{t=0}^{T-1} \sum_{k=1}^{K} C_k(\vec{s}_t, \vec{a}_t, t) & \text{for } \vec{s}_T = \vec{\sigma}^* \\ \infty & \text{for } \vec{s}_T \neq \vec{\sigma}^* \end{cases} \tag{4}$$
Given a source state $\vec{s}_0 = \vec{\sigma}_I$, the objective is to find an optimal policy $\pi^{K*}$, one that minimizes $J_{\pi^K}$:
$$\pi^{K*} \in \arg\min_{\pi^K \in \Pi^K} J_{\pi^K} \tag{5}$$
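Equations (3)-(5) can be solved by standard finite-horizon backward induction over the joint state space. The Python sketch below is a generic textbook solver, not the planning algorithm proposed in this work. The toy instance (two agents on a four-cell line, one unit of cost per move, and a hypothetical collision penalty of 10 that couples the costs) is an assumption for illustration. Each backward step loops over all $|S^K| \cdot |A^K|$ joint state-action pairs.

```python
import math
from itertools import product

def solve_finite_horizon(states, actions, H, C, T, sigma_star):
    """Backward induction for eqs. (3)-(5) over joint states.

    H(s, a) -> next joint state; C(s, a, t) -> group cost sum_k C_k(s, a, t).
    Returns V[t][s] (optimal cost-to-go) and policy[t][s] (a joint action).
    """
    V = [{} for _ in range(T + 1)]
    policy = [{} for _ in range(T)]
    # Termination constraint: zero cost at the target, infinite elsewhere.
    for s in states:
        V[T][s] = 0.0 if s == sigma_star else math.inf
    for t in range(T - 1, -1, -1):
        for s in states:
            best_a, best_v = None, math.inf
            for a in actions:
                v = C(s, a, t) + V[t + 1][H(s, a)]
                if v < best_v:
                    best_a, best_v = a, v
            V[t][s], policy[t][s] = best_v, best_a
    return V, policy

# Hypothetical toy instance: two agents on cells 0..3 of a line.
STATES = list(product(range(4), repeat=2))
ACTIONS = list(product((-1, 0, 1), repeat=2))

def H(s, a):
    # Each agent moves left, stays, or moves right; moves off the line are clipped.
    return tuple(min(max(x + d, 0), 3) for x, d in zip(s, a))

def C(s, a, t):
    # One unit per agent that moves, plus a coupling penalty if they share a cell.
    return sum(d != 0 for d in a) + (10 if s[0] == s[1] else 0)

V, policy = solve_finite_horizon(STATES, ACTIONS, H, C, T=4, sigma_star=(2, 3))
```

Rolling the returned policy forward from $\vec{\sigma}_I = (0, 1)$ reaches $\vec{\sigma}^* = (2, 3)$ with $J = 4$: each agent needs two moves, and moving together avoids the collision penalty.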
Our Assumptions
We consider the set of models in which a coordinator explicitly defines the coupling among agents by a set of soft cooperation constraints $\Psi = \{\psi_1, \dots, \psi_\ell, \dots, \psi_L\}$. A constraint $\psi_\ell$ defines a single interaction between an affecting agent $g^+_\ell$ and an affected agent $g^-_\ell$. It is associated with a context $\langle \sigma^+_\ell, \sigma^-_\ell, A^-_\ell \rangle$, a discounted cost $C^-_\ell$, an activation function $f_\ell$, and an interaction variable $\tau_\ell$. A constraint may be satisfied by its designated affecting agent, in which case the discounted cost is applied to the context of the designated affected agent. An independent transition function is assumed: $H_k : S_k \times A_k \to S_k$. Since the group minimizes its joint additive total cost, agents are driven to cooperative behavior only when needed.
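The last point, cooperation only when needed, can be illustrated with a single constraint in isolation. The Python sketch below is a deliberate simplification, not the model's actual semantics: it ignores timing, the activation function, and the interaction variable, and assumes the affecting agent pays a hypothetical extra cost (`extra_cost_affecting`) to satisfy the constraint while the affected agent's cost drops from `base_cost_affected` to the discounted cost.

```python
def cooperate(extra_cost_affecting, base_cost_affected, discounted_cost_affected):
    """Decide whether satisfying one soft cooperation constraint lowers the group cost.

    Toy model (assumption): returns ("satisfy", total) or ("ignore", total), where
    total is the joint additive cost of the two agents' relevant contributions.
    """
    cost_without = base_cost_affected
    cost_with = extra_cost_affecting + discounted_cost_affected
    if cost_with < cost_without:
        return "satisfy", cost_with
    return "ignore", cost_without

print(cooperate(2, 10, 3))  # prints ('satisfy', 5): cheap help lowers the joint cost
print(cooperate(8, 10, 3))  # prints ('ignore', 10): help costs more than it saves
```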
Interaction Variables
A valid assignment to the interaction variable $\tau_\ell$ corresponds to satisfying the constraint at a time within the horizon, while a null assignment $T_\emptyset$ corresponds to the constraint not being satisfied. $\vec{\tau}$ is the interaction vector, and $D$ is its domain, i.e., the cross product of the time domains:
$$\vec{\tau} = (\tau_1, \tau_2, \dots, \tau_L) \in \mathcal{T}_1 \times \mathcal{T}_2 \times \dots \times \mathcal{T}_L \triangleq D \tag{6}$$
Let $\mathcal{L}^+_k$ be the subset of all cooperation constraints that apply to agent $g_k$ as an affecting agent, and let $\vec{\tau}^+_k$ be the respective vector of interaction variables:
$$\mathcal{L}^+_k \triangleq \{\ell \mid g^+_\ell = g_k\}, \qquad \vec{\tau}^+_k \triangleq (\tau_\ell)_{\ell \in \mathcal{L}^+_k} \in \prod_{\ell \in \mathcal{L}^+_k} \mathcal{T}_\ell \triangleq D^+_k \tag{7}$$
The same applies to agent $g_k$ as an affected agent:
$$\mathcal{L}^-_k \triangleq \{\ell \mid g^-_\ell = g_k\}, \qquad \vec{\tau}^-_k \triangleq (\tau_\ell)_{\ell \in \mathcal{L}^-_k} \in \prod_{\ell \in \mathcal{L}^-_k} \mathcal{T}_\ell \triangleq D^-_k \tag{8}$$
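Equations (6)-(8) amount to indexing the interaction vector by the constraints each agent participates in. The Python sketch below assumes each time domain $\mathcal{T}_\ell$ is $\{0, \dots, T\}$ plus the null assignment (represented by `None`), and encodes each constraint only by hypothetical (affecting, affected) agent indices; the helper names are illustrative.

```python
from itertools import product

NULL = None  # stands in for the null assignment: the constraint is not satisfied

def time_domain(T):
    """Assumed T_ell: satisfaction times 0..T plus the null assignment."""
    return list(range(T + 1)) + [NULL]

def constraint_sets(constraints, K):
    """L_plus[k] / L_minus[k]: indices ell with g_ell^+ = g_k / g_ell^- = g_k (eqs. 7-8)."""
    L_plus = {k: [ell for ell, (gp, _) in enumerate(constraints) if gp == k] for k in range(K)}
    L_minus = {k: [ell for ell, (_, gm) in enumerate(constraints) if gm == k] for k in range(K)}
    return L_plus, L_minus

def project(tau, indices):
    """Sub-vector (tau_ell) for ell in indices, i.e. tau_k^+ or tau_k^-."""
    return tuple(tau[ell] for ell in indices)

# Hypothetical instance: K = 3 agents (indices 0..2), L = 3 constraints, horizon T = 3.
K, T = 3, 3
constraints = [(0, 1), (0, 2), (1, 2)]  # (affecting, affected) per constraint
D = list(product(*(time_domain(T) for _ in constraints)))  # eq. (6)
L_plus, L_minus = constraint_sets(constraints, K)

tau = (1, NULL, 3)  # psi_1 satisfied at t=1, psi_2 not satisfied, psi_3 at t=3
print(len(D))                       # (T + 2) ** L = 125
print(project(tau, L_plus[0]))      # affecting side of g_1 (index 0): (1, None)
print(project(tau, L_minus[2]))     # affected side of g_3 (index 2): (None, 3)
```

Each agent only needs its own sub-vectors $\vec{\tau}^+_k$ and $\vec{\tau}^-_k$, whose domains are much smaller than $D$ when each agent takes part in few constraints.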