A Plan Fusion Algorithm for Multi-Agent Systems
Mathijs de Weerdt∗, André Bos†, Hans Tonino, Cees Witteveen
Delft University of Technology Faculty of Information Technology and Systems P.O. Box 356, 2600 AJ Delft, The Netherlands {M.M.deWeerdt, A.Bos, J.F.M.Tonino, C.Witteveen}@its.tudelft.nl
Abstract We introduce an algorithm for cooperative planning in multi-agent systems. The algorithm enables the agents to combine (fuse) their plans in order to increase their joint profits. A computational resources and skills framework is developed for representing the planned activities of an agent under time constraints. Using this resource-skill framework, we present an efficient (polynomial time) algorithm that fuses the plans of a group of agents in such a way that their joint profits improve. The framework and the algorithm are illustrated using a simplified example from the freight transport domain.
1 Introduction
Recently, much attention has been given to the topic of cooperation and cooperative planning in multiagent systems. Usually, the starting point for research on this problem is the observation that there exist classes of problems that cannot be solved by a single agent in isolation, but require several agents to work together in an interactive way, coordinating their plans and sharing their resources. While this observation mentions a necessary reason to cooperate, another, almost equally important, reason to cooperate should not be overlooked: While agents may be able to solve problems on their own, they might prefer to cooperate since coordinating their resources and activities with other agents may save costs. So the cooperative planning problem in multiagent systems can be divided into a task (plan) allocation problem: Which agent should perform which part of the plan in order to reach a given set of goals, together with a profit-optimization problem: How should the tasks be distributed in order to maximize the joint profit of the participating agents.
∗ Supported by research program Seamless Multimodal Mobility (SMM).
† Supported by research program Freight Transport Automation and Multimodality (FTAM). Both programs are carried out within the TRAIL research school for Transport, Infrastructure and Logistics.
Of course these problems cannot be solved independently. An additional source of difficulty is the requirement that, due to several real-time constraints, solutions to both problems should have low complexity. In the most general case¹, both aspects of cooperative planning have been taken care of, as, e.g., in [8], where both task allocation and profit maximization problems are dealt with. In this approach, however, no specific algorithms for joint planning are discussed and a coalition of agents is saddled with the computationally very difficult problem of constructing a joint plan to perform a complex task from scratch. Others, as, e.g., in [9, 10], propose to solve the problem by assuming that the interactive planning part can be neglected and concentrate on efficient approximately optimal task allocation methods by means of which resources are distributed over teams of agents that, given the allocation of resources, do not have an additional computationally difficult planning problem.
¹ We will deal with explicit computation-oriented approaches to cooperative planning only, excluding, e.g., approaches like [7] discussing cooperation by BDI based agents which negotiate through passing (logical) arguments. Here, no explicit plan representation nor detailed algorithm is proposed. Furthermore, the purpose of cooperation is not to obtain more efficient plans, but to enable agents to fulfill their intentions by exchanging resources.
In this paper, we take another approach to the multi-agent cooperative planning problem. Unlike [9, 10], we do take into account the multi-agent planning aspect, but, unlike [8], our agents do not face the difficult problem of constructing a joint plan from scratch. We consider situations in which:
• Each agent or group of agents already has a plan available to perform his/her part of the task. This plan includes the resources needed, the goals to be obtained, a precise plan to perform the necessary operations and the profits to be expected if the goals have been obtained.
• The agents will try to increase their social welfare (i.e., their joint total profit) by fusing their separate plans. This fusion process avoids the
problem of building joint plans from scratch by efficiently applying local adaptations of the individual plans, taking care of an efficient cooperation process as well as guaranteeing goal-realizability of the combined plans.
This approach has some special properties we did not encounter in the approaches mentioned before. First of all, it does not require a complete replanning of the activities once agents are prepared to cooperate. Instead, the cooperation formation process is structured in such a way that the agents involved perform small plan optimization steps by locally adapting (revising) their original plans. The resulting plan is the result of a polynomial number of such polynomial-time revision steps. Secondly, the cooperation algorithm is an any-time algorithm, allowing for a partial optimization of the original plans, without destroying the goal-realizability at any revision step performed.
To give an intuitive idea of the approach and the cooperation processes we want to model, let us first present an example in the field of freight transportation.
Example 1 Consider two transportation agents A and B both living in New York–NY. Assume that the agents independently planned the following activities. Agent A, owning two trucks, has to bring a load l1 to Baltimore–MD before time² 10 ($t_{l_1} \le 10$) and another load l2 to Boston–Mass before time 5 ($t_{l_2} \le 5$). Agent B, also owning two trucks, has to bring a load l3 to Washington–DC before time 6 ($t_{l_3} \le 6$) and has to bring a truck t to Boston–Mass, in favor of another transportation company that desperately needs the truck for a few days, starting at time 8 ($t_u \le 8$). All loads and trucks are available in New York at time 0. Figure 1 shows at the upper level the goals of A and at the lower level the goals of B. We assume traveling to a neighboring city (according to Figure 1) to cost one time unit. Cooperation between agent A and B may lead to a dramatic decrease in costs.
For example, one of the goals of agent A is to bring load l1 to Baltimore for which it needs one of its trucks. Agent B has to go to Washington and, as a side product, also goes to Baltimore. Assuming that agent B’s truck has enough room available and that the time constraints match, bringing load l1 to Baltimore can also be realized by agent B (almost) without incurring extra costs. Also, one of agent B’s goals is to bring a truck to Boston, and agent A has to bring load l2 to Boston. As a side product, agent A’s truck will be in Boston. So again, agent A and B may decide to cooperate by allowing agent A to satisfy one of agent B’s original
² Throughout this paper, we consider only a simple notion of time using integers as the time unit.
[Figure 1: Goals before cooperation. Cities: WAS, BAL, NYO, BOS; goal deadlines: $t_{l_1} \le 10$, $t_{l_3} \le 6$, $t_{l_2} \le 5$, $t_u \le 8$.]
[Figure 2: Goals after cooperation. Cities: WAS, BAL, NYO, BOS; goal labels: u, l2; l3; l1, l3.]
goals. Figure 2 depicts the final arrangement of goal satisfaction by agent A (upper level) and agent B (lower level).
This example shows that by cooperation agents A and B can realize all initial goals with lower production (viz. transportation) costs. Such a reduction can be realized by exchanging goals or by exchanging necessary resources. We analyze the agents’ plans to determine which resources are available for other agents, the so-called side products. The following notions play a central role in this analysis: (i) elementary production processes, in our framework called skills of an agent, that constitute the building blocks for production plans, (ii) the resources needed to “execute” a skill, (iii) the goals to be realized, and (iv) time properties to specify constraints on the goals and skills and to define production times of the resources. The introduction of these time constraints is a first step toward dealing with all kinds of constraints.
In this paper we will start by introducing a formal framework for planning where resources, skills and goals can be distinguished. We assume each of the agents has already been assigned its own distinct set of resources, and also has its own set of skills. First, each agent individually constructs or selects a plan from a plan library, without taking into account the activities of other agents. Next, a (sub)group of cooperative agents investigates whether a mutual approach leads to a more cost-effective plan by taking advantage of so-called side products of an agent’s activities. This is called the fusion of their plans. Finally, the profits of an agent are determined by the costs of using his skills, the costs of the resources needed to execute his plan in time and the value of the goals produced by the plan. Cooperation is analyzed in terms of reallocation of resources and skills in the plans of the agents in order to increase the overall profit.
To model this cooperation process, Section 2 describes a framework to model the planning capabilities of an agent. Here, a plan is analyzed in terms of a (partially-ordered) set of skill-instances, resources and goals. In Section 3 we introduce time constraints and some methods to deal with time constraints in the framework discussed. Finally, in Section 4, we present and analyze a polynomial algorithm for the plan-fusion process. A previous variant of this framework, without the ability to specify constraints over time, was proposed in [5, 11].
Remark. This resource-skill framework shares many similarities with attribute grammar systems (AGs) as specified by Knuth [4]. There is at least a superficial correspondence between production rules of AGs and our skills, and between the set of symbols together with their attributes and our resources and time labels. Also Constraint Logic Programming (CLP) (for example, [12]) is closely related to the resource-skill formalism as described in this paper. However, our use of the formalism is different from formal language parsing as in AGs, and from general computation as in CLP. We are manipulating one or more resource-skill expressions in order to improve the joint plan according to a cost function.
2 The Skill formalism
In this section we propose a framework to describe plans and operations on plans. We start by giving the building blocks: Resources, goals to achieve, and the skills to combine the resources to obtain these goals. Then, we describe a plan to realize goals from basic resources, and define an agent in this context.
2.1 Resources, goals, and skills
Central to our discussion is the concept of producing a set of products from a set of resources. A product itself can be a resource for another product. Therefore, every object, whether used or produced, is called a resource. These resources can be used exactly once, but may result in new resources that refer to the same objects in the real world (but with different properties). The set of all resources will be denoted by $\mathcal{R}$. The functionality of a resource will be given by the so-called type of a resource. In general, two or more resources may belong to the same type, e.g., resources a1 and a2 may be both trucks with the same capacity. In such a case, an agent can use any of these resources to realize a product. In our framework, we will label each resource with its type using a function $type : \mathcal{R} \to \mathcal{T}$, where $\mathcal{T}$ is the set of resource types. The domain of the function type is extended to sets of resources such that the image of a set of resources is the multi-set of resource types corresponding to the resources in the original set.
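As a minimal illustration of the lifted type function, the sketch below maps a set of resources to the multi-set of their types. The class name `Resource`, the function name `type_of`, and the type labels "truck" and "load" are assumptions made for this example only; they are not part of the framework.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str   # unique identifier of the resource (hypothetical field)
    rtype: str  # the resource type, i.e. the value of type(r)


def type_of(resources):
    """Lift `type` to a set of resources: the image is a multi-set of types."""
    return Counter(r.rtype for r in resources)


# Two trucks of the same type and one load (illustrative names).
a1 = Resource("a1", "truck")
a2 = Resource("a2", "truck")
l1 = Resource("l1", "load")
types = type_of({a1, a2, l1})
```

A multi-set is used rather than a plain set because two distinct resources of the same type must both be counted.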
Associated to each resource r we use a unique time interval $I(r) = [c_0, c_1]$, where $c_0$ takes values in the set of all time points $\mathbb{T}$, and $c_1$ is either equal to $c_0$ or to infinity³. Intuitively, $c_0$ denotes the earliest time r is available, while $c_1$ is the latest time r can be used. If $c_1 = \infty$, then it is assumed that r is available from time point $c_0$ on.
³ In general $c_1$ may take any value in the set $\mathbb{T} \cup \{\infty\}$ that is larger than or equal to $c_0$, but in this paper we use a restricted version of the time interval notation that is sufficient for our application domain.
If a resource r is used in a skill, we need to specify constraints on the time interval I(r) and dependencies between time intervals of different resources. Therefore, associated with each resource r, we introduce two time variables $t^f_r$ and $t^l_r$, where $t^f_r$ refers to the earliest (first) time point r becomes available and $t^l_r$ refers to the latest time r is available. These time variables are used to construct time expressions. Time expressions are of the form $t_r = e$, $t_r \le e$, or $t_r \ge e$, where e is an arithmetic expression over time variables. Both time intervals and time variables can be easily lifted to sets of resources: If r is a resource, $t_r$ denotes the set $\{t^f_r, t^l_r\}$ of time variables associated with r. Furthermore, I(R) denotes the set of time intervals I(r) for all r in R.
Definition 1 [Skill scheme] A skill scheme is a rule specifying a transformation of input resource types to output resource types, given conditions on the time intervals of the input resource types and constraining the time intervals of the output resource types as a function of the time intervals of the input resource types. Formally, a skill scheme ss is a rule of the form
$$(y_1 : T'_1,\ y_2 : T'_2,\ \ldots,\ y_m : T'_m;\ \mathit{results}) \leftarrow (x_1 : T_1,\ x_2 : T_2,\ \ldots,\ x_n : T_n;\ \mathit{conditions})$$
where
• $y_1, \ldots, y_m, x_1, \ldots, x_n$ denote unique, arbitrary resources of types $T'_1, \ldots, T'_m, T_1, \ldots, T_n$, respectively;
• results is a set of time expressions: For each output resource $y_i$ exactly two time expressions $t^f_{y_i} = e_1$ and $t^l_{y_i} = e_2$ are specified, where both $e_1, e_2$ are arithmetic expressions over time variables $t_{x_i}$ belonging to the input resources $x_i$;
• conditions is a set of time expressions over the time variables $t_{x_i}$ associated with the input resources $x_i$.
To give a concrete example, the following skill scheme
$$(y_1 : \text{NYO BAL},\ y_2 : \text{NYO BAL},\ y_3 : \text{u BAL};\ t^f_{y_1} = t^f_{x_1},\ t^f_{y_2} = t^f_{x_1},\ t^f_{y_3} = t^f_{x_1} + 1,\ t^l_{y_1} = t^f_{y_1},\ t^l_{y_2} = t^f_{y_2},\ t^l_{y_3} = \infty) \leftarrow (x_1 : \text{u NYO};\ )$$
represents a drive of a truck from New York to Baltimore: From a truck currently being in New York (u NYO), it is possible to “produce” room for two loads from New York to Baltimore (NYO BAL), and, ultimately, a truck in Baltimore (u BAL) at one time unit later (assuming that in our model New York and Baltimore are neighboring cities).
We will use the following notation. If $I(r) = [c_0, c_1]$ is the time interval associated with resource r, then C(r) denotes the set $\{t^f_r = c_0,\ t^l_r = c_1\}$ of conditions corresponding to I(r), and if R is a set of resources, then C(R) denotes the set $\bigcup_{r \in R} C(r)$ of all conditions corresponding to the intervals of the resources in R.
A skill s is an instance of a skill scheme ss. That is, if ss is the skill scheme
$$(y_1 : T'_1,\ y_2 : T'_2,\ \ldots,\ y_m : T'_m;\ \mathit{results}) \leftarrow (x_1 : T_1,\ x_2 : T_2,\ \ldots,\ x_n : T_n;\ \mathit{conditions})$$
then the skill $s : \{r'_1, r'_2, \ldots, r'_m\} \leftarrow \{r_1, r_2, \ldots, r_n\}$ is an instance of ss, if there exists a substitution $\theta : \{x_1, \ldots, x_n, y_1, \ldots, y_m\} \to \mathcal{R}$ such that:
1. $type(\theta(y_i)) = T'_i$ and $type(\theta(x_j)) = T_j$ ($i = 1, \ldots, m$ and $j = 1, \ldots, n$);
2. every time interval $I(r_j)$ of an input resource $r_j$ satisfies the conditions mentioned in conditions under θ, i.e., $C(\{r_1, \ldots, r_n\}) \cup \mathit{conditions}\,\theta$ is satisfiable, where in $\mathit{conditions}\,\theta$ all occurrences of $t^f_{x_j}$ and $t^l_{x_j}$ are replaced by $t^f_{\theta(x_j)}$ and $t^l_{\theta(x_j)}$, respectively;
3. for every time interval $I(r'_i)$ of an output resource $r'_i$, $I(r'_i) = [\mathit{eval}(\theta(e_1^{y_i})),\ \mathit{eval}(\theta(e_2^{y_i}))]$, where $t^f_{y_i} = e_1^{y_i}$ and $t^l_{y_i} = e_2^{y_i}$ are the time expressions associated with $y_i$ occurring in results, and where eval is a function which evaluates its argument expressions.
We will use out(s) to denote the set of output resources $r'_i$, in(s) to denote the input resources $r_j$, conditions(s) to denote the conditions on the input resources, and results(s) to denote the results on the output resources. The set of all possible skills is denoted by $\mathcal{S}$.
An initial set of resources R can be transformed to another set of resources R' using a skill-instance s. To describe exactly how R' and R are related, we use a production relation $\vdash$ and its closure $\vdash^*$:
Definition 2 [Produced from] Let S be a set of (instances of) skills, and let $R_1$ and $R_2$ be sets of resources. We say that $R_2$ can immediately be produced from $R_1$ using S, abbreviated by $R_1 \vdash_S R_2$, if there is a skill instance $s \in S$ and a resource set L such that
1. $R_1 = L \cup in(s)$ and $R_2 = L \cup out(s)$, where $L \cap in(s) = \emptyset$ and $L \cap out(s) = \emptyset$;
2. $I(R_1) = I(L) \cup I(in(s))$;
3. $I(R_2) = I(L) \cup I(out(s))$.
We say that $R_2$ is produced from $R_1$ using S, if $R_1 \vdash^*_S R_2$ holds, where $\vdash^*_S$ denotes the reflexive, transitive closure of $\vdash_S$.
A set R of resources and a set S of skills will be used to realize a set of goals G. A set of goals mentions the types of resources to be obtained and some time constraints they have to satisfy: A goal G is a tuple $(V_G, C_G)$, where $V_G$ is a set of resource variables $g_i$ of type $T_i$, $i = 1, \ldots, p$, and $C_G$ is a set of time constraints, using the variables $g_i$ in $V_G$. It is now easy to express that an agent having resources R and skills S is able to produce goals G:
Definition 3 [Goal realizability] A set of goals $G = (V_G, C_G)$ is realizable from a set of resources $R \subseteq \mathcal{R}$ with time intervals I(R) using a set of skills $S \subseteq \mathcal{S}$, if there exists some set of resources R' and a substitution $\theta : V_G \to \mathcal{R}$ such that
1. $R \vdash^*_S R'$,
2. R' contains all the resources required in $V_G$, i.e., $\theta(V_G) \subseteq R'$ and $type(\theta(g_i)) = T_i$,
3. I(R') satisfies the constraints $C_G$ under θ, i.e., if $C_G\theta$ is the set of constraints obtained by substituting $t^f_{\theta(g)}$ and $t^l_{\theta(g)}$ for the occurrences of $t^f_g$ and $t^l_g$ in $C_G$, respectively, then $C_G\theta \cup C(R')$ is satisfiable.
Finding, however, a sequence of skills from S, i.e., a plan, to realize G from R is NP-hard, as the decision problem “Given a set R of resources, a set S of skills and a goal g, is {g} realizable by R with S”, is NP-complete [11]. Since we want to deal with feasible cooperative planning processes, this result has some important ramifications. First, in order to deal with planning processes, we will assume, for the purpose of this paper, that each of the players, given resources and skills, has a plan available to realize their goals. Furthermore, since finding a plan is NP-hard, finding a (cost)-minimal plan is at least as hard. Therefore, we focus on local improvements. This will enable us to develop algorithms for cooperation that are polynomial, while global cooperation would be NP-hard [1].
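The following sketch illustrates one production step in the sense of Definition 2, using an instance of the truck-drive scheme from New York to Baltimore. It is only a sketch under stated assumptions: the names `Resource`, `drive_nyo_bal` and `produce`, the type labels such as "truck@NYO", and the output names y1, y2, y3 are hypothetical and chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    rtype: str
    first: float  # t^f_r, earliest time the resource is available
    last: float   # t^l_r, equal to `first` or to math.inf


def drive_nyo_bal(truck):
    """Instance of the drive scheme: consumes a truck in New York and yields
    two transport slots NYO->BAL plus a truck in Baltimore one unit later."""
    if truck.rtype != "truck@NYO":
        raise ValueError("input resource has the wrong type")
    t = truck.first
    return {
        Resource("y1", "slot NYO-BAL", t, t),          # t^l = t^f (closed interval)
        Resource("y2", "slot NYO-BAL", t, t),
        Resource("y3", "truck@BAL", t + 1, math.inf),  # available from t + 1 on
    }


def produce(r1, used, outputs):
    """One step R1 |- R2 with R1 = L ∪ in(s) and R2 = L ∪ out(s)."""
    rest = r1 - used  # the untouched set L
    if not used <= r1 or rest & outputs:
        raise ValueError("skill is not applicable to this resource set")
    return rest | outputs


truck = Resource("u", "truck@NYO", 0, math.inf)
load = Resource("l1", "load@NYO", 0, math.inf)
r2 = produce({truck, load}, {truck}, drive_nyo_bal(truck))
```

The input truck is consumed, which reflects the convention that a resource can be used exactly once, while the load stays in the set L and is carried over unchanged.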
2.2 Plans
In this paper we will assume that each player already has a plan available. This plan may be computed by, e.g., a general purpose planning system such as Blackbox [3], or the agent may have taken an appropriate plan from a plan library. We will represent a plan by a bi-partite Directed Acyclic Graph (DAG) $P = \langle N_R \cup N_S, E \rangle$, where $N_R \subseteq \mathcal{R}$ denotes a set of resource nodes, $N_S$ a set of skill nodes $n_s$ where $s \in \mathcal{S}$, and E a set of arcs. The notation “$n_s$” means that $n_s$ is a skill node denoting an application of skill s. For any two nodes $r \in N_R$ and $n_s \in N_S$, $(r, n_s) \in E$ means that resource r is used by skill s, and $(n_s, r) \in E$ means that resource r is produced by s.
We will use the following notational conventions for subsets of nodes in a DAG $P = \langle N_R \cup N_S, E \rangle$: The set of input resources of P will be denoted by $In(P) = \{r \in N_R \mid d^-(r) = 0\}$⁴, whereas $Out(P) = \{r \in N_R \mid d^+(r) = 0\}$ will refer to the set of final products of P.
Definition 4 [Plan] Let S be a set of skills, and $G = (V_G, C_G)$ be a goal set. A plan P for G using S is a bi-partite DAG $P = \langle N_R \cup N_S, E \rangle$, such that:
1. $N_R$ represents the resources, i.e., $N_R \subseteq \mathcal{R}$,
2. $N_S$ the skills, i.e., $\{s \mid n_s \in N_S\} \subseteq S$,
3. the goals are realized by the plan without violating the constraints, i.e., there exists a substitution $\theta : V_G \to Out(P)$ such that $\theta(V_G) \subseteq Out(P)$, $type(\theta(g_i)) = T_i$ for every $g_i$ of type $T_i$ occurring in $V_G$, and $C_G\theta \cup C(Out(P))$ is satisfiable,
4. if $n_s \in N_S$, then $in(s) = \{r \mid (r, n_s) \in E\}$ and $out(s) = \{r \mid (n_s, r) \in E\}$ (only valid skill applications are used), and
5. if $r \in N_R$, then $d^+(r) \le 1$ and $d^-(r) \le 1$ (resources may be used at most once, and be produced by at most one skill application).
An example of a plan can be found in Figure 4. The explanation of all terms used in this figure can be found in Section 4.3.
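To make the graph notions concrete, the sketch below computes In(P) and Out(P) and checks condition 5 of Definition 4 on a tiny plan. The encoding (resource nodes as strings, arcs as pairs) and the names `plan_inputs`, `plan_outputs`, `degrees_ok`, `n_drive` are assumptions for illustration, not the representation used by the authors.

```python
def plan_inputs(resources, arcs):
    """In(P): resource nodes with in-degree 0, i.e. not produced by any skill."""
    produced = {dst for _, dst in arcs}
    return {r for r in resources if r not in produced}


def plan_outputs(resources, arcs):
    """Out(P): resource nodes with out-degree 0, i.e. not used by any skill."""
    used = {src for src, _ in arcs}
    return {r for r in resources if r not in used}


def degrees_ok(resources, arcs):
    """Condition 5: each resource is used at most once and produced at most once."""
    for r in resources:
        d_in = sum(1 for _, dst in arcs if dst == r)
        d_out = sum(1 for src, _ in arcs if src == r)
        if d_in > 1 or d_out > 1:
            return False
    return True


# A one-skill plan: a truck in New York is driven to Baltimore (illustrative names).
resources = {"u_NYO", "slot_a", "slot_b", "u_BAL"}
arcs = {
    ("u_NYO", "n_drive"),   # resource used by the skill node
    ("n_drive", "slot_a"),  # resources produced by the skill node
    ("n_drive", "slot_b"),
    ("n_drive", "u_BAL"),
}
inputs = plan_inputs(resources, arcs)
outputs = plan_outputs(resources, arcs)
```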
2.3 Agents
In this paper we are interested in cooperation of a number of players producing products or services. We will model these players by so-called producing agents, consisting of a set of skills and a cost function for using resources and skills.
Definition 5 [Producing Agent] A producing agent A is a tuple $A = \langle S_A, c_A \rangle$, where $S_A$ is the set of skills and $c_A : \mathcal{S} \cup \mathcal{T} \to \mathbb{N}^+$ is the cost function of agent A.
The function $c_A$ maps a skill $s \in S_A$ to a natural number $c_A(s)$ representing the costs of producing out(s) out of in(s). The expression $c_A(type(r))$, or for short $c_A(r)$, denotes the costs for storage of a resource $r \in \mathcal{R}$. Note that $c_A(r)$ does not include the costs for allocating r, because these costs are incorporated in the skill costs.
⁴ $d^-(n)$ denotes the in-degree of node n; likewise, $d^+(n)$ denotes the out-degree of n.
The state of an agent captures the available resources and needed skills to produce a set of goals.
Definition 6 [State] The state of an agent A is a tuple $ST_A = \langle P_A, G_A \rangle$, where $P_A$ is a plan for the goals $G_A$. The set of all states is denoted by $\mathcal{ST}$.
For a state $ST_A = \langle P_A, (V_{G_A}, C_{G_A}) \rangle$, the set $Free(P_A) = type(Out(P_A)) - type(V_{G_A})$ represents the set of resources that are not allocated yet by the agent. These are available (free) for production of new goals.
In order to denote the value of a (produced) resource, we use a global function $v : \mathcal{T} \to \mathbb{N}$, which is supposed to be identical for every agent. The number v(t) represents the value of a resource type $t \in \mathcal{T}$. Then, the profits of an agent $A = \langle S_A, c_A \rangle$ in state $ST_A = \langle P_A, (V_{G_A}, C_{G_A}) \rangle$ are defined by:
$$\mathit{profits}(ST_A) = \sum_{g \in V_{G_A}} v(g) \;-\; \sum_{r \in In(P_A)} c_A(r) \;-\; \sum_{n_s \in N_S} c_A(s).$$
3 Propagation of time constraints
A plan specifies how a set of goals can be satisfied, starting with some set of initial resources. For our purposes, i.e., fusion, we often want to replace a resource by another (free) resource (see Section 4). This resource needs to be of the same type, but it must also satisfy the imposed time constraints. In this section, we describe how these time constraints can be derived from the constraints on the goals. We distinguish two situations: If only simple time expressions are used, a constraint propagation method exists that derives the time constraints for all input resources from the goal constraints. As an additional advantage, using simple time expressions in a plan, every plan can be represented as a (complex) skill. For more complex expressions, such a propagation can only be done for one input resource at a time, keeping all other resources fixed.
3.1 Simple time constraints
In this section we describe how goal constraints can be propagated to all input resources. First, we define for which time expressions this is possible. Then, we show how constraints can be propagated locally, and finally we show how time constraints can be propagated from the goal conditions through the whole plan to the input resources.
Definition 7 [Simple time expressions] Given a skill scheme ss (as defined in Definition 1), with n input resource variables $x_i$ with $1 \le i \le n$ and m output resource variables $y_j$ with $1 \le j \le m$, we say that all time expressions used in ss are simple if and only if
1. the arithmetic expressions of the results can be described according to the following BNF notation: $TE ::= c \mid t_{r_i} \mid TE + c \mid \max(TE, TE)$, where $c \in \mathbb{T}$, and
2. the time conditions conditions are of the form $t_{x_i} \le c$ with $c \in \mathbb{T}$.
If a skill scheme satisfies these conditions, we say the skill scheme ss is simple. We also say the time conditions of a goal are simple if they have the same form as the conditions of a skill scheme (i.e., $t \le c$ with $c \in \mathbb{T}$), and, finally, we say a plan is simple if for each skill in the plan the corresponding skill scheme is simple.
It is not difficult to see that every simple time expression, due to its restricted structure, can be rewritten to the following canonical form: $TE ::= \max(STE_1, \ldots, STE_n)$ where $STE_i ::= c \mid t_{r_i} \mid t_{r_i} + c$.
Example 2 Suppose we have a skill with input resources $r_1$ and $r_2$, and output resources $r_3$ and $r_4$. And suppose the time intervals of the output variables (results) are defined as follows: $t^f_{r_3} = \max(t^f_{r_1} + 3,\ t^f_{r_2} + 2)$, $t^f_{r_4} = t^f_{r_2} + 4$, $t^l_{r_3} = \infty$, and $t^l_{r_4} = \infty$. (To produce $r_3$ it takes 3 time steps after $r_1$ is received and 2 time steps after $r_2$ is received, and to produce $r_4$ it takes 4 time steps after $r_2$ is received.) Assume the goal states that $t^f_{r_3} \le c_3$ and $t^f_{r_4} \le c_4$, with $c_3$ and $c_4$ natural numbers. It is easy to see that all time expressions are simple. Then, to derive the time constraints on the input variables, we can make the following deductions:
• $t^f_{r_1} \le c_3 - 3$ (from the definition of $t^f_{r_3}$ and the constraint on $r_3$)
• $t^f_{r_2} \le c_3 - 2$ (analogously) and furthermore, $t^f_{r_2} \le c_4 - 4$ (from the definition of $t^f_{r_4}$), and therefore: $t^f_{r_2} \le \mathit{eval}(\min(c_3 - 2,\ c_4 - 4))$
• There are no constraints on $t^l_{r_1}$ and $t^l_{r_2}$ (except that they should be greater than or equal to the respective first values of the time intervals).
Note that the derived constraints on the input variables are again simple. The process described in this example is called local constraint propagation and is defined as follows:
Definition 8 [Local constraint propagation] Given a skill scheme ss (with output resource variables $y_j$, input resource variables $x_i$, and a set of canonical time definitions (results) for the output resources $t^f_{y_j} = \max(STE_{1j}, \ldots, STE_{nj})$ and, depending on the type of $y_j$, time definitions $t^l_{y_j} = t^f_{y_j}$ or $t^l_{y_j} = \infty$⁵), and a set of time conditions on the output $t^f_{y_j} \le c_j$, such that all time constraints are simple, the following operation on ss to derive a time condition for each of the input variables is called local constraint propagation:
1. For each $1 \le i \le n$ and $1 \le j \le m$, we derive $t^f_{r_i} \le e_{ij}$ with $e_{ij}$ defined as follows:
• If $t^f_{r_i}$ occurs in $STE_{ij}$ then $e_{ij}$ is defined as $c_j - (STE_{ij} - t^f_{r_i})$.
• Otherwise $e_{ij}$ is defined as ∞.
2. These inequalities can be combined to derive one inequality for each input resource. For each $1 \le i \le n$: $t^f_{r_i} \le \min_{1 \le j \le m}(e_{ij})$.
3. The end values ($t^l_{y_j}$) of the time intervals are either equal to the first values of the time intervals, or equal to ∞, depending on the type of the resource $y_j$.
Proposition 1 (Constraint propagation) Given a simple plan and goal, the time conditions on the initial resources can be derived from the goal conditions by repeatedly applying local constraint propagation.
This constraint propagation can be implemented by the following function: Given a plan and a set of goals, first derive the constraints for all input resources of those skills that produce a goal, using local constraint propagation on the corresponding skill schemes. These skills can now be removed from the plan and their input resources can be defined as the new set of goals with the derived constraints. This process can be repeated until the plan is empty.
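Local constraint propagation for results in canonical form can be sketched as follows. The encoding is an assumption made for this illustration: each output is mapped to the list of its (input, offset) terms inside the max, and constant-only terms are omitted because they impose no bound on an input. The function name `propagate_local` and the deadlines $c_3 = 10$ and $c_4 = 9$ are hypothetical values chosen to instantiate Example 2.

```python
import math


def propagate_local(results, deadlines):
    """Derive upper bounds on the first-time variables of the inputs.

    results:   output name -> list of (input name, offset) terms, meaning
               t^f_output = max(t^f_input + offset, ...)
    deadlines: output name -> c_j, meaning t^f_output <= c_j
    returns:   input name -> bound b, meaning t^f_input <= b
    """
    bounds = {}
    for out, terms in results.items():
        c = deadlines.get(out, math.inf)  # no deadline means no restriction
        for inp, offset in terms:
            # From t^f_inp + offset <= c we obtain t^f_inp <= c - offset;
            # bounds coming from several outputs are combined with min.
            bounds[inp] = min(bounds.get(inp, math.inf), c - offset)
    return bounds


# Example 2 with assumed deadlines c3 = 10 and c4 = 9.
results = {"r3": [("r1", 3), ("r2", 2)], "r4": [("r2", 4)]}
deadlines = {"r3": 10, "r4": 9}
bounds = propagate_local(results, deadlines)
```

With these values the bound on $r_1$ is $10 - 3 = 7$ and the bound on $r_2$ is $\min(10 - 2,\ 9 - 4) = 5$, matching the deductions of Example 2.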
3.2 More complex constraints
The application domain may be such that we are unable to model all constraints using simple time expressions, for example, because we need to define time constraints involving more than one resource, such as “the package should be here before the transport resource is available”. In this case, we cannot use the method described in the previous section right away. However, we are still able to derive time constraints on one resource at a time, as we need in our plan improvement algorithm (see next section): Keeping the information about all other resources fixed, constraint propagation of complex constraints can be done by replacing each time variable associated with the other resources in the constraints by their given value. This transforms all conditions to simple time constraints, allowing us to use the method described in the previous section.
[Figure 3: The removal of s and r. Nodes: skills s, s2, s4 and resources r, r2, r3; annotations: latest time, latest prod time, earliest time.]
4 Plan optimization and fusion
Until now we have only paid attention to individual plans. We showed how plans can be represented and how time constraints can be propagated. One of the most important advantages of our formalism is that it also can be used to describe cooperation between agents. One special form of cooperation where agents are willing to share all their resources is called fusion. We present a polynomial algorithm that can be used both for optimization of a single agent plan and for fusion of multi agent plans. In this section we first explain how the improvement algorithm works globally, and we discuss the difficulties introduced by the time constraints. In the second part we show how the plan improvement algorithm can be used in the fusion process.
4.1 Plan improvement
Agents will cooperate within a fusion if the common profits will not decrease. As can be seen from the definition of profits in Section 2.3, profits can only be increased by removing skills. In a fusion, agents may exploit resources produced by or originally assigned to other agents. Possible inefficiencies in their joint plan can be removed by local improvements involving skill removal. As these improvements do not necessarily lead to a globally optimal plan, this procedure is a local optimization. We start by describing the plan improvement algorithm (see Algorithm 1). This algorithm tries to optimize the plan of an agent by removing one or more skills. The algorithm keeps trying all skills until none of them can be removed. A skill s will be removed if for each output resource r ∈ out(s), such that r is needed, denoted by needed(r), i.e., r ∈ in(s') for some s' or r is used as a goal, a substitute is found. The function find_subst tries to find a replacement for a resource by checking for all other resources whether they are suitable⁶.

Algorithm 1 The improvement algorithm
find_subst(r, s)
    for each resource r2 with r2 ≠ r do
        if (¬needed(r2) ∨ r2 ∈ in(s))
           ∧ ¬ r2 ∈ out(s)
           ∧ type(r2) = type(r)
           ∧ t^f_{r2} ≤ latest_time(r)
           ∧ (t^l_{r2} = ∞ ∨ latest_prod_time(r2) ≥ earliest_time(r)) then
            return true
    return false
optimize(plan)
    1. found = false
    2. for each skill s until found do
           found = true
           for each r ∈ out(s) do
               if needed(r) then
                   found = found ∧ find_subst(r, s)
           olds = s
    3. if found then
           re-assign_resources(olds)
           delete(olds)
           optimize(plan)

The check of the time constraints is more involved. First of all, we assume the time expressions of the resources in a plan are efficient, meaning all slack is removed and all products are produced as soon as possible⁷. Concerning the time constraints, we distinguish two sorts of resources r: those which have a half open interval [c, ∞] for some $c \in \mathbb{T}$ and those that have a closed interval ([c, c]). In our transport context (see also Example 1), a typical example for the former is a truck: This resource will be available from a point in time until it is used. An example for the latter is a transportation resource: This resource is only available at a specific time, since the time the truck leaves is fixed by the skill producing the transport resource.
Suppose we want to remove a skill s that produces a resource r that is needed, as is shown in Figure 3. We must find a replacement r2 for r. Suppose r is a resource of a type that has a half open time interval, then r2 should have such an interval as well. For our replacement the following must hold: The earliest time r2 is available must be before the latest time r is needed (else, a time condition on one of the goals is violated). This latest time is calculated by a function latest_time. The earliest time is the smallest value in the time interval of r2.
⁵ Note that in this paper we concentrate on a restricted version of time intervals, see also Section 2.
⁶ This approach is analogous to the one presented in [5]; for a more detailed description, see [11].
⁷ This is an invariant of our algorithm.
[Figure 4: State of agent A at begin (plan $P_A$). Skills s1, s2, s3, s4; resources include trucks u at NYO (time 0), BAL and BOS (time 1), loads l1 and l2 at NYO (time 0), and goals l1 at BAL and l2 at BOS (time 2).]

Algorithm 2 The fusion algorithm
fuse(A)
    1. plan = ∪_{A_i ∈ A} P_{A_i}
    2. optimize(plan)

For a resource r with a closed time interval, the check is even more difficult. Not only should r2 be produced before the latest time r is needed, it should also be possible to execute the skill (s4) that needs this resource within the interval r2 is available. This check is done by calculating the earliest time a replacement for r can be of use, called earliest_time (too early can be useless, e.g., if the package to be transported is not ready), and by calculating the possible slack in the production of r2, called latest_prod_time: the latest time r2 can be produced by a skill s2, with still all goals (indirectly) depending on s2 matching their conditions. The functionality of each of these three time functions is summarized below (and also shown in Figure 3):
• latest_time(r) returns the latest time the resource r must be available in the current plan, such that all goals (indirectly) dependent on r still can meet their deadlines. This time can be calculated with the constraint propagation procedure as described in Section 3 that propagates these deadlines from the goals to the initial resources of a plan.
• latest_prod_time(r2) returns the right side ($t^l_{r_2}$) of the latest time interval a resource r2 can be produced by a skill s2, with all goals (indirectly) dependent on s2 still meeting their deadlines. This time is calculated by finding the latest time for all output resources of s2. The returned value is the minimum of these and the latest prod times of all input resources of s2 unless the right sides of their time intervals are ∞.
• earliest time(r) returns the earliest time a replacement for r is useful for the skill that needs it, s4 . This time is derived from the maximum of the first times all other input resources of the skill s4 are available: earliest time(r) = maxr3 ∈in(s),r3 6=r tfr3 . The rest of the functionality of the improvement algorithm (Algorithm 4) is in the condition of the if statement in find subst, determining whether an alternative resource r2 is acceptable as substitute for r. The procedure re-assign resources actually performs the removal of s and the substitution of each output resource r of s that is needed, by their found replacements. It also shifts the execution times of skills if needed.
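The improvement algorithm described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' C implementation: the classes `Resource`, `Skill`, `Plan` and `TimeFns` are hypothetical, the three time functions are passed in by the caller rather than computed by constraint propagation, the shifting of execution times in re-assign_resources is omitted, and the `taken` argument (which prevents two outputs of one skill from claiming the same substitute) is an addition that does not appear in the pseudocode.

```python
from dataclasses import dataclass
from math import inf
from typing import Callable

@dataclass(eq=False)          # identity-based equality: resources are distinct objects
class Resource:
    name: str
    rtype: str                # resource type, e.g. "NYO BOS" or "u BOS"
    tf: float                 # first time the resource is available
    tl: float = inf           # last time; inf models a half-open interval

@dataclass(eq=False)
class Skill:
    name: str
    ins: list                 # input resources
    outs: list                # output resources

@dataclass
class Plan:
    skills: list
    resources: list
    goals: list               # resources that currently realize a goal

    def needed(self, r):
        # r is needed if it realizes a goal or is an input of some skill
        return r in self.goals or any(r in s.ins for s in self.skills)

@dataclass
class TimeFns:                # assumption: supplied by the caller
    latest_time: Callable
    latest_prod_time: Callable
    earliest_time: Callable

def find_subst(plan, r, s, t, taken=()):
    """Return a substitute for output r of skill s, or None if there is none."""
    for r2 in plan.resources:
        if r2 is r or r2 in taken:
            continue
        if ((not plan.needed(r2) or r2 in s.ins)
                and r2 not in s.outs
                and r2.rtype == r.rtype
                and r2.tf <= t.latest_time(r)
                and (r2.tl == inf
                     or t.latest_prod_time(r2) >= t.earliest_time(r))):
            return r2
    return None

def reassign_resources(plan, s, subst):
    """Redirect consumers and goals to the substitutes, then delete skill s."""
    for r, r2 in subst.items():
        for other in plan.skills:
            other.ins = [r2 if x is r else x for x in other.ins]
        plan.goals = [r2 if g is r else g for g in plan.goals]
    plan.skills.remove(s)
    plan.resources = [x for x in plan.resources if x not in s.outs]

def optimize(plan, t):
    for s in list(plan.skills):
        subst = {}
        for r in s.outs:
            if plan.needed(r):
                r2 = find_subst(plan, r, s, t, taken=list(subst.values()))
                if r2 is None:
                    break                 # s cannot be removed
                subst[r] = r2
        else:                             # every needed output has a substitute
            reassign_resources(plan, s, subst)
            return optimize(plan, t)      # restart on the smaller plan
    return plan

# Toy instance: two trucks drive NYO -> BOS, but only one load space is used.
u1, u3, l2 = Resource("u1", "u NYO", 0), Resource("u3", "u NYO", 0), Resource("l2", "l2 NYO", 0)
a1, a2, ua = Resource("a1", "NYO BOS", 0, 0), Resource("a2", "NYO BOS", 0, 0), Resource("ua", "u BOS", 1)
b1, b2, ub = Resource("b1", "NYO BOS", 0, 0), Resource("b2", "NYO BOS", 0, 0), Resource("ub", "u BOS", 1)
l2bos = Resource("l2bos", "l2 BOS", 1)
s3 = Skill("s3", [l2, a1], [l2bos])
s4 = Skill("s4", [u1], [a1, a2, ua])
s6 = Skill("s6", [u3], [b1, b2, ub])
plan = Plan([s3, s4, s6], [u1, u3, l2, a1, a2, ua, b1, b2, ub, l2bos], [l2bos, ub])
times = TimeFns(lambda r: 5, lambda r2: r2.tl, lambda r: 0)  # placeholder time functions
optimize(plan, times)
print([s.name for s in plan.skills])
```

In this toy instance the skill s4 is removed because its only needed output (a load space from New York to Boston) can be replaced by an unused load space produced by s6; s6 itself cannot be removed afterwards, since its remaining alternatives are its own outputs.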
4.2 Fusion
We now show how to make a fusion (see Algorithm 4.1), given the plans and goals of a finite set of agents A. The fused plan $P_A$ of their individual plans must be such that all goals of all agents can be realized. This fusion is the result of (i) a combination of all individual plans and (ii) an optimization of this collective plan (a forest of individual plans). The following proposition can be easily verified:⁸

Proposition 2 Given a finite set of agents A with for each agent i goals $G_{A_i}$ and plans $P_{A_i}$, the fusion algorithm will always find a plan $P_A$ that realizes $\bigcup_{A_i \in A} G_{A_i}$.

With respect to the complexity, let⁹ $n = \sum_{1 \le i \le m} \|P_{A_i}\|$ denote the sum of the sizes of the plans of all agents in A. Each of the functions latest_time, latest_prod_time and earliest_time traverses (part of) the plan exactly once, so their time complexity is $O(n)$; therefore the worst-case time complexity of find_subst is $O(n^2)$. This function is called at most $O(n^2)$ times, since each time a skill is removed at most $O(n)$ skills may have been checked and in total at most $O(n)$ skills can be removed.

⁸ The proofs are easy but somewhat tedious and, due to lack of space, are omitted here.
⁹ The size $\|P\|$ of a plan P equals the size of the DAG representing P, i.e., the number of vertices plus the number of arcs in the plan.
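The counting argument above can be written out as a short derivation. It only restates the bounds given in the text; the symbols $C_{\mathrm{call}}$, $N_{\mathrm{calls}}$ and $T_{\mathrm{fuse}}$ are introduced here for illustration, and the additive $O(n)$ term for building the union of the plans is an assumption.

```latex
\[
\begin{aligned}
C_{\mathrm{call}}  &= \underbrace{O(n)}_{\text{candidate resources } r_2}
                     \cdot \underbrace{O(n)}_{\text{time functions per candidate}}
                   = O(n^2), \\
N_{\mathrm{calls}} &= \underbrace{O(n)}_{\text{skills removed}}
                     \cdot \underbrace{O(n)}_{\text{skills checked per removal}}
                   = O(n^2), \\
T_{\mathrm{fuse}}  &= O(n) + N_{\mathrm{calls}} \cdot C_{\mathrm{call}}
                   = O(n) + O(n^2) \cdot O(n^2)
                   = O(n^4).
\end{aligned}
\]
```

The product of the two quadratic factors gives the $O(n^4)$ bound stated in Proposition 3.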
Figure 5: State of agent B at the beginning (plan $P_B$)
Proposition 3 Fusion of m agents with total plan size n can be performed in $O(n^4)$ time.

The fusion algorithm is implemented in C in the following set-up. First, the set of skills is translated to a STRIPS-like [2] input, which is processed further by the Blackbox [3] planner, producing a plan for each agent. These plans are fused using the described algorithm. To evaluate our algorithm, we compared this process to Blackbox solving the problem globally. Some preliminary tests showed that solving the problems for the agents separately, followed by our fusion process, is about 25% faster for these instances than solving the problem globally, and returns a better solution (in terms of the number of skills used) as well. The better running time is what we expected, since Blackbox has an exponential time complexity, while our algorithm is polynomial.
4.3 Example
In this section we apply the fusion algorithm to the example given in the introduction of this paper. To save space we use the following notation: to indicate that a load resource l (or truck u) is at location XXX and has a time interval [c, ∞], we write l XXX c (or u XXX c). The symbol NYO BAL c represents one transport resource for a load from New York to Baltimore and has a time interval [c, c]. These notations without the time constant denote the types of these resources.

Suppose we have the following initial resources of agents A and B, respectively: $R_A$ = {l1 NYO 0, l2 NYO 0, u1 NYO 0, u2 NYO 0} and $R_B$ = {l3 NYO 0, u3 NYO 0, u4 NYO 0}, and the skill schemes as described in Figure 6. For example, skill s8 tells us that if we have a truck available in New York (u NYO), then we can “produce” two load spaces from New York to Baltimore (NYO BAL) and two load spaces from Baltimore to Washington (BAL WAS). Furthermore, skill s7 tells us that to get a load l3 in Washington (l3 WAS) we need a load l3 in New York (l3 NYO), room for a load during the drive from New York to Baltimore (NYO BAL), and room for a load during the drive from Baltimore to Washington (BAL WAS).

Figures 4 and 5 show the plans $P_A$ and $P_B$ of the agents in the initial situation. Skills are represented by boxes and resources by circles. Black circles represent goals, and grey circles represent side products that could be valuable for another agent. Furthermore, suppose we have the following goals: $G_A$ = {l1 BAL 10, l2 BOS 5} and $G_B$ = {l3 WAS 6, u BOS 8}. The joint plan of A and B that can be achieved by running the fusion algorithm (Algorithm 4.1) is shown in Figure 7.
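To make the time rules of the skill schemes concrete, the following sketch applies skills s8 and s7 of agent B to its initial resources. It is an illustration only: the function names `apply_s8` and `apply_s7` and the dictionary layout are hypothetical, and the time rules are transcribed from Figure 6 as we read them (all times are first times $t^f$).

```python
def apply_s8(t_truck):
    """s8: a truck at NYO (available from t_truck) yields two NYO BAL load
    spaces, two BAL WAS load spaces and a truck at WAS."""
    return {
        "NYO BAL": [t_truck, t_truck],          # t_y1 = t_y2 = t_x1
        "BAL WAS": [t_truck + 1, t_truck + 1],  # t_y3 = t_y4 = t_x1 + 1
        "u WAS": t_truck + 1,                   # t_y5 = t_x1 + 1
    }

def apply_s7(t_load, t_nyo_bal, t_bal_was):
    """s7: move load l3 from NYO to WAS using one NYO BAL and one BAL WAS
    load space. Returns the time l3 is in WAS, or None if the input time
    constraints of the scheme are violated."""
    if not (t_load <= t_nyo_bal and t_nyo_bal <= t_bal_was - 1):
        return None
    return t_bal_was + 1                        # t_y1 = t_x3 + 1

# Initial resources of agent B: l3 NYO 0 and u3 NYO 0.
spaces = apply_s8(0)
t_l3_was = apply_s7(0, spaces["NYO BAL"][0], spaces["BAL WAS"][0])
print(t_l3_was)  # 2, i.e. the resource l3 WAS 2
```

With the truck and the load both available at time 0, the load reaches Washington at time 2, which is the resource l3 WAS 2 in the plan of agent B.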
5 Conclusions and future work
One of the unique features of multi-agent systems is that individual agents can cooperate in order to achieve their goals. One reason to cooperate is that agents cannot realize their goals individually (see, e.g., [1, 6, 13, 14]); another reason is that cooperation leads to a more efficient means to realize their goals. In this paper, we concentrated on the latter. We described a computational framework, consisting of resources and skills, to model cooperation processes between different agents. Central to this framework is that we model side products explicitly, so that other agents can exploit unused resources. Furthermore, it is possible to specify time properties of resources and of skills. Using this time information, it is possible to specify deadlines of goals and to model the duration of skills. Another result of this paper is an algorithm that fuses the individual plans of a set of agents in polynomial time, such that the joint profit of the agents does not decrease.

Despite its simplicity, the current framework is rather general, although it suffers from at least the following shortcomings. First, we have a restriction on the time intervals that is natural for our application domain, but may be limiting in other domains. Second, we cannot specify additional properties of resources (such as the speed of a truck) or relations between resources (such as a truck and its load), nor can we handle conflicting goals (i.e., goals that cannot both be achieved). Third, the current framework does not allow us, for example, to express the fact that in a given state two types of products should not be present at the same time. Finally, we should experiment more with the possibility of representing costs, e.g., we should base the improvement of plans on the cost properties of resources and skills.
$S_A$ =
  s1 : (y1 : l1 BAL; $t^f_{y_1} = t^f_{x_2} + 1$) ← (x1 : l1 NYO, x2 : NYO BAL; $t^f_{x_1} \le t^f_{x_2}$),
  s2 : (y1 : NYO BAL, y2 : NYO BAL, y3 : u BAL; $t^f_{y_1} = t^f_{x_1}$, $t^f_{y_2} = t^f_{x_1}$, $t^f_{y_3} = t^f_{x_1} + 1$) ← (x1 : u NYO; ),
  s3 : (y1 : l2 BOS; $t^f_{y_1} = t^f_{x_2} + 1$) ← (x1 : l2 NYO, x2 : NYO BOS; $t^f_{x_1} \le t^f_{x_2}$),
  s4 : (y1 : NYO BOS, y2 : NYO BOS, y3 : u BOS; $t^f_{y_1} = t^f_{x_1}$, $t^f_{y_2} = t^f_{x_1}$, $t^f_{y_3} = t^f_{x_1} + 1$) ← (x1 : u NYO; ).

$S_B$ =
  s6 : (y1 : NYO BOS, y2 : NYO BOS, y3 : u BOS; $t^f_{y_1} = t^f_{x_1}$, $t^f_{y_2} = t^f_{x_1}$, $t^f_{y_3} = t^f_{x_1} + 1$) ← (x1 : u NYO; ),
  s7 : (y1 : l3 WAS; $t^f_{y_1} = t^f_{x_3} + 1$) ← (x1 : l3 NYO, x2 : NYO BAL, x3 : BAL WAS; $t^f_{x_1} \le t^f_{x_2}$, $t^f_{x_2} \le t^f_{x_3} - 1$),
  s8 : (y1 : NYO BAL, y2 : NYO BAL, y3 : BAL WAS, y4 : BAL WAS, y5 : u WAS; $t^f_{y_1} = t^f_{x_1}$, $t^f_{y_2} = t^f_{x_1}$, $t^f_{y_3} = t^f_{x_1} + 1$, $t^f_{y_4} = t^f_{x_1} + 1$, $t^f_{y_5} = t^f_{x_1} + 1$) ← (x1 : u NYO; ).

For all load (l) and truck (u) resources: $t^l = \infty$, and for all transport resources r: $t^l_r = t^f_r$.

Figure 6: The skills of agents A and B.
Figure 7: Joined plan $P_A$

These shortcomings will be dealt with in future versions of the framework. We will also pursue research to be able to deal with plan fragments, to create robust plans (and plans having alternatives built in or leaving some details to be filled in at run-time), and to do some form of replanning.

References
[1] M. d’Inverno, M. Luck, and M. Wooldridge. Cooperation structures. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, Nagoya, Japan, pages 600–605, 1997.
[2] R. E. Fikes and N. Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 5(2):189–208, 1971.
[3] H. Kautz and B. Selman. BLACKBOX: A new approach to the application of theorem proving to problem solving. In Working notes of the workshop on planning as combinatorial search, held in conjunction with AIPS’98, Pittsburgh, PA, 1998.
[4] Donald E. Knuth. Semantics of context-free languages. Mathematical Systems Theory, 2(2):127–145, June 1968.
[5] B.-J. Moree, A. Bos, H. Tonino, and C. Witteveen. Cooperation by iterated plan revision. In Proceedings of the ICMAS 2000, 2000.
[6] J. P. Müller. The Design of Intelligent Agents: a layered approach. Springer, 1996.
[7] Simon Parsons and J.R. Jennings. Negotiation through argumentation – a preliminary report. In Proceedings of the Second International Conference on Multiagent Systems, 1996.
[8] T. Sandholm and V. Lesser. Coalitions among computationally bounded agents. Artificial Intelligence, 94(1):99–137, 1997.
[9] O. Shehory and S. Kraus. Methods for task allocation via agent coalition formation. Artificial Intelligence, 101(1–2):165–200, 1998.
[10] O. Shehory, S. Kraus, and O. Yadgar. Emergent cooperative goal-satisfaction in large-scale automated-agent systems. Artificial Intelligence, 110(1):1–55, 1999.
[11] H. Tonino, A. Bos, and C. Witteveen. Replanning by revision in collective agent based systems. Technical Report PDS-2000-004, Faculty of Information Technology and Systems, Delft University of Technology, 2000. http://www.pds.twi.tudelft.nl/.
[12] P. van Hentenryck. Constraint logic programming. The Knowledge Engineering Review, 6:151–194, 1992.
[13] E. Werner. Distributed cooperation algorithms. In Y. Demazeau and J.-P. Müller, editors, Decentralized A.I., pages 17–31. Elsevier Science Publishers B.V., 1990.
[14] M. Wooldridge and N.R. Jennings. The cooperative problem solving process. Journal of Logic & Computation, 9(4), 1999.