Plan Merging in Multi-Agent Systems
Mathijs de Weerdt
Illustrations: Welmoed Kreb
Cover illustration: Welmoed Kreb
Cover design: Joke Herstel, Wenk
Plan Merging in Multi-Agent Systems
PhD thesis (Proefschrift)

to obtain the degree of doctor at Delft University of Technology, on the authority of the Rector Magnificus, prof.dr.ir. J.T. Fokkema, chairman of the Board for Doctorates, to be defended in public on Monday 15 December 2003 at 15:30 by Mathijs Michiel DE WEERDT, doctorandus in Computer Science, born in Alkmaar
This thesis has been approved by the promotors: Prof.dr.ir. H.J. Sips and Prof.dr. J-J.Ch. Meyer

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof.dr.ir. H.J. Sips, Delft University of Technology, promotor
Prof.dr. J-J.Ch. Meyer, Utrecht University, promotor
Dr. C. Witteveen, Delft University of Technology, assistant promotor
Prof.dr.ir. P.H.L. Bovy, Delft University of Technology
Prof.dr.ir. R.E.C.M. van der Heijden, Katholieke Universiteit Nijmegen
Prof.dr. C. Roos, Delft University of Technology, reserve member
Dr. K.S. Decker, University of Delaware, USA
Dr.ir. N. Roos, Universiteit Maastricht
TRAIL-Thesis Series T2003/8, The Netherlands TRAIL Research School
SIKS Dissertation Series No. 2003-15. The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Research School for Information and Knowledge Systems.

Published and distributed by: Mathijs M. de Weerdt
E-mail: [email protected]
ISBN: 90-5584-052-1
Keywords: multi-agent systems, AI planning, coordination, resource allocation

Copyright © 2003 by Mathijs M. de Weerdt
All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission of the author.

Printed in The Netherlands
Preface
The research presented in this thesis was performed at the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS), formerly called Information Technology and Systems (ITS) of the Delft University of Technology. Within the Parallel and Distributed Systems (PDS) group, a project on Collective Agent-Based Systems (CABS) was started in 1998. The CABS program aims at developing specification methods, algorithmic techniques, and implementations of large-scale agent-based systems and organizations of agent-based systems. Since these formalizations and algorithms can contribute to “the solution of scientific, business and societal problems in the field of transport, infrastructure and logistics”, this research is carried out within the research school on Transport, Infrastructure and Logistics (TRAIL, 1994–), more specifically within the Delft interdisciplinary research center (DIOC) research program Seamless Multi-Modal Mobility (SMM).
The ‘Collective Agent-Based Systems’ (CABS) projects

The research projects within the CABS program concern two different themes. On the one hand, a theory and methods are being developed to realize inter-organizational coordination. On the other hand, incident management methods are studied to guarantee the execution of coordinated plans in dynamic situations. This work on incident management is carried out by two colleagues. Jonne Zutt (2000) concentrates on the recognition of unexpected events (diagnosis) and the reactions at the operational level, whereas Roman van der Krogt (2000; 2003) is more interested in how a whole plan should be modified at the tactical level to deal with more serious events that affect the applicability of such a plan. The research on inter-organizational coordination in the CABS project focuses on the coordination of the planning of actions. Initially, a distinction is made between pre- and post-planning coordination processes. Jeroen Valk (2001) studies multi-agent task allocation protocols to guarantee that each agent is able to construct its own plans autonomously (pre-planning), while the work in this thesis shows how to coordinate such separately constructed plans afterwards (post-planning). Eventually, the planning and the
coordination processes should be integrated into a single coordinated planning process. Finally, the theme on inter-organizational coordination is supported by the development of a specification language for coordination processes in multi-agent systems. This work uses a combination of game theory and (modal) logic, and is carried out in cooperation with the Intelligent Systems group at Utrecht University (Harrenstein et al., 2001).

The first main contribution of this thesis is a formal framework to express multi-agent planning problems and multi-agent plans in an object-oriented way. This framework can also be used on the operational level to formulate the replanning of actions. A second contribution is a method to improve multi-agent plans. This method is formally introduced and analyzed, as well as subjected to an empirical investigation using a data set from a taxi company. These techniques for solving multi-agent or multi-organizational planning and coordination problems in general will help to solve specific coordination problems in many domains, not least in transportation domains.
Acknowledgments

I would like to thank everyone who supported me while I was writing this thesis. First and foremost, I thank Cees Witteveen, who was and is always available to guide and encourage me. Second, I thank John-Jules Meyer for his enthusiasm, and Henk Sips for trying to speed up the finishing of this thesis. All of them always combined their constructive comments with stimulating statements. Furthermore, I thank Nomi Olsthoorn, who not only helped me with her favorite statistical tool, but also supported me through setbacks and made me aware of my luck (especially compared to hers). I am grateful to Roman van der Krogt, because he almost always offered useful critique and a cup of tea whenever I was stuck. I thank André Bos and Hans Tonino for their ideas and support in the first couple of years of my research, and Jonne Zutt, Jeroen Valk, and Paul Harrenstein for their easy company at every CABS occasion. I am also very grateful to all my other colleagues in the Parallel and Distributed Systems group (Arjan, Dick, Frits, Jan, Johan, Kees, Koen, Leo, Linda, Marcel, to name the most important ones) for creating a friendly and healthy (well, maybe except for the cake on Friday) working environment, and especially Leon Aronson for proof-reading most of my thesis. Besides, I am quite happy that Ard Biesheuvel and Léon Planken finished the CABS planner before I finished my thesis. I appreciated all the (social) events, courses and all kinds of forms industriously provided by TRAIL, and I thank Piet Bovy and Ruud Sommerhalder for making this research school, the SMM project, and more specifically my own research within CABS possible. I thank Wiebe van der Hoek for pointing out the CABS research program and driving me to Delft for my application visit.
More concrete results of people helping and supporting me are the data set used in Chapter 6, which was provided by Pim van der Stoel from Tens B.V. and Ron Kooi from Taxi Zeevang, the help I received from Mirjam Nieman to improve my style of writing, and the illustrations of the planners Pete, Monica and Ralph transforming into a (merged) taxi, made by Welmoed Kreb. I would also like to thank Robbert Kok for his warm welcome in Delft, and, together with Victor van Rijn and Hiske Wegman, for their friendship and inspiring discussions. I thank my parents for their stimulating interest and encouragement, my brother Maarten for his very good friendship ever since I left home, and Barbra for her support when Maarten had a heart operation and my parents were in Ecuador. Finally, I am very happy with the sea scouts from Marco Polo Alkmaar and all my friends there, because whenever I am among them I completely forget all about this research and its unsolved problems.
Contents

1 Introduction
  1.1 Inter-organizational coordination
  1.2 Coordinated planning
  1.3 Research contributions
  1.4 Overview

2 From planning to multi-agent planning
  2.1 The classical planning problem
  2.2 Refinement planning
  2.3 Extended planning problems
  2.4 Multi-agent planning: by and for agents
    2.4.1 Problem definition
    2.4.2 Complexity of multi-agent planning
    2.4.3 Centralized planning for multiple agents
    2.4.4 Explicit distributed planning
    2.4.5 Implicit distributed planning
    2.4.6 Other related work in distributed artificial intelligence
  2.5 Challenges

3 The action resource formalism
  3.1 Resources
  3.2 Actions
  3.3 Planning
  3.4 Operational semantics
  3.5 Constraints

4 Plan reduction and plan merging
  4.1 Plan reduction
    4.1.1 Splitting plans
    4.1.2 Plan joining
    4.1.3 Plan reduction
  4.2 Plan merging
    4.2.1 Multi-agent planning problem
    4.2.2 Parallel composition of plans
    4.2.3 Plan merging

5 Algorithms for merging multiple plans
  5.1 From theorem to algorithm
  5.2 Circular dependencies
  5.3 Ground plan merging
    5.3.1 The problem
    5.3.2 The algorithm
  5.4 Flexible plan merging
    5.4.1 The problem
    5.4.2 The algorithm
  5.5 Computational complexity

6 Empirical results
  6.1 Research questions and expected results
    6.1.1 The performance of the plan merging algorithm
    6.1.2 Utility analysis for taxi companies
  6.2 Data set and assumptions
    6.2.1 Descriptives
    6.2.2 Assumptions
  6.3 Distance-time function approximation
  6.4 Implementation
  6.5 Results
    6.5.1 Influence of the date
    6.5.2 Influence of the capacity of the taxis
    6.5.3 Influence of the allowed detour distance
    6.5.4 Influence of the allowed time margins for passengers
    6.5.5 Influence of the number of agents and taxis
    6.5.6 Run-time analysis
    6.5.7 Improvement of the efficiency
    6.5.8 Greedy behavior
  6.6 Discussion of results

7 Conclusions and perspectives
  7.1 Conclusions
  7.2 Future work
    7.2.1 Extensions of the formalism
    7.2.2 Extensions of the plan merging algorithm
    7.2.3 Further experiments
    7.2.4 More advanced multi-agent planning algorithms

A Mathematical notations and definitions
  A.1 S-ranked alphabet, terms and formulas
  A.2 Semantics of formulas in terms of a many-sorted algebra
  A.3 Resource logic with constraints
  A.4 Plan joining for embedded plans
  A.5 Complexity of the subplan reduction problem
  A.6 Plan-merging algorithms in more detail
    A.6.1 Ground plan merging
    A.6.2 Flexible plan merging

Bibliography
Index
Summary
Samenvatting
Curriculum Vitae
Chapter 1 Introduction
1.1 Inter-organizational coordination

Somewhere in a medium to large city, Pete is working in the office of a moderately small taxi company, called P-Tax. Pete is the planner of this company. His job is to assign transportation tasks to the taxi-cabs. First, he enters transportation requests from customers into a database in his computer. Then a planning algorithm composes a possible schedule. Pete is glad to have such help, because especially planning the shared transportation in vans, also offered by this company, can become quite complicated. During the day new requests come in, and the schedule is updated regularly. Sometimes, when the number of requests is extraordinarily high, Pete phones Monica, a planner at another taxi company (M-Cabs) with whom he is friendly, and asks for help. Usually, M-Cabs is able to take over some of the requests. Once or twice a day a customer requests an unusually long trip. If the cab concerned has to return empty, this is a very inefficient use of resources. Therefore, whenever he can, Pete checks with Monica to see if M-Cabs has a trip in that direction as well, or even better, a trip on the way back. If possible, the planners agree on combining their requests. A similar agreement is sometimes made when passengers need to be picked up in a region or suburb where one company has some cars available nearby, while the cabs of the other are at the other side of town. Both taxi companies, P-Tax and M-Cabs, profit from such cooperation.
Taxi companies can clearly benefit from having their plans coordinated, even without revealing all information about their trips to each other or to a hired third party. Likewise, trucking companies can benefit from exchanging some of their freight (Hengst-Bruggeling, 1999). In fact, all organizations create plans and cooperate with other organizations. Consequently, their plans should be coordinated, both to prevent conflicting use of resources and to exploit the potential benefits of cooperation. The following case exemplifies such a more general situation.
A group of people is working on the planning of the activities of all the subdivisions of a large firm. This firm has many contacts with other companies. Some of these other companies are clients, and others are subcontractors. The planners for this firm do not only need to coordinate the actions of all the subdivisions, but they also have to prevent conflicts with other companies about the use of resources. If possible, they should even try to exploit potential benefits of this cooperation, such as the exchange of unused resources. Unfortunately, their clients and subcontractors are not prepared to send all the details of their plans to these planners, so the planners have to call these other companies to request more relevant information. Using the latest developments in planning and scheduling technology, their computers can find some initial solutions quite fast. Thanks to their experience and their knowledge about the other companies, the planners can then adjust parameters and constraints to come up with a satisfying (almost) conflict-free plan for all subdivisions.
These two stories illustrate both the importance and the omnipresence of inter-organizational coordination. By paying extra attention to coordinating their plans, companies are able to come up with schedules that do not conflict with those of others and that assign the use of (shared) resources more efficiently. Still, most organizations do not want to relinquish all information concerning their private plans, as they are afraid of losing their autonomy. Such situations occur not only between public transport and cargo transport companies, but also, for example, when businessmen try to make a new appointment, when several students are preparing their meals in the same kitchen, when military forces of different countries prepare a joint campaign to fight terrorism (Wilkins and Desimone, 1994), or when different departments within a hospital plan the use of the X-ray machine, the operating rooms, the beds, their personnel and the patients (Decker and Li, 2000). In some of these examples, privacy of plans is not such a big issue as it is when competitors are coordinating their plans. However, privately constructing and coordinating the plans of the participants has another advantage: often planning problems can be naturally divided into subproblems, one for each participant, which can then be solved in parallel and coordinated afterwards. Such a subdivision can reduce the time complexity considerably (cf. Korf, 1987); see Section 3.2.
1.2 Coordinated planning
The problems of planning and coordinating plans of autonomous organizations have recently gained much attention, not only because of the substantial financial interests involved, but also because of the academic interest in these problems. Since the late 60s, researchers (e.g., Fikes and Nilsson, 1971; Sacerdoti, 1975; Penberthy and Weld, 1992; Blum and Furst, 1997) have been studying the problem of creating plans for one organization, called artificial intelligence planning. Since the beginning of the 80s, interaction and synchronization between plans of autonomous entities, usually called agents, have been studied (e.g., Georgeff, 1983; Konolige and Nilsson, 1980; Rosenschein, 1981). In these works plans are coordinated from a central point of view, i.e., a third party analyzes (parts of) the plans and tries to determine and resolve conflicting parts of the plans (e.g., Ephrati and Rosenschein, 1993b), and to take advantage of positive interactions as well (e.g., Von Martial, 1992). Later, distributed algorithms to deal with both conflicts and potential benefits have been studied, and even been implemented for multi-vehicle cases (e.g., Alami et al., 1995; Decker and Lesser, 1992), but these approaches require that large parts of the plans are exchanged. It may be no problem for a mobile robot that others know its plan, but for autonomous organizations such an intensive knowledge exchange is not desirable in most situations. These existing approaches to planning and coordination of plans will be described in more detail in Chapter 2.

All current approaches to specifying plans use languages such as STRIPS (Fikes and Nilsson, 1971) and PDDL (McDermott, 1998). These languages use sets of propositional atoms to model each state of the world and each effect of an action. In these approaches, several propositions are required to describe the properties of one object. Especially in a multi-agent environment, it is quite complex to find the correct subset of propositions describing resources that may be interchanged or shared between agents. The formal basis of planning and coordination should be more object-oriented, i.e., it should be able to represent the real entities (resources), such as a truck, a cab, or a person, more directly. Such a resource fact should be an encapsulation of the properties of the real entity. In this way a resource fact can also have a different significance to different agents: e.g., one agent can try to produce a cab from a set of parts (wheels, engine, frame, etc.), and then give this resource fact to another agent that can use a cab to transport people. Once the cab has worn out, it can be returned to another agent to be disassembled into scrap metal and useful parts. Another example is that of a company that requests a bus resource to transport a group of people. Unfortunately, no one is able to provide such a bus at that time. However, a bus resource can be constructed from a collection of taxi resources. Together, these taxi resources provide a service that is similar to the requested bus resource.
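To make this object-oriented reading of resources concrete, the following fragment is a minimal sketch in Python (our illustration with invented names; it is not part of the thesis or its planner) of how a resource fact could encapsulate the properties of a real entity, so that, for example, a bus-like resource can be composed from several taxi resources:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ResourceFact:
        # One encapsulated real entity, e.g. a cab, a wheel, or a person.
        kind: str
        properties: tuple = ()   # immutable key/value pairs, e.g. (("seats", 4),)

    def compose_bus(taxis):
        # Several taxi resources together provide a bus-like service.
        seats = sum(dict(t.properties).get("seats", 0) for t in taxis)
        return ResourceFact(kind="bus", properties=(("seats", seats),))

    cab1 = ResourceFact("cab", (("seats", 4),))
    cab2 = ResourceFact("cab", (("seats", 8),))
    print(compose_bus([cab1, cab2]))   # a 'bus' resource offering 12 seats

Different agents can attach a different significance to the same fact: one agent produces the cab resource from parts, another uses it to transport people.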
What is really missing in current research on coordinated planning is therefore a formalism that explicitly deals with resources when describing multi-agent planning problems. Furthermore, we need a commonly accepted definition of a simple multi-agent planning problem in which agents are really autonomous. For such a coordination problem, we need an organized set of algorithms that do not require complete knowledge about the plans of other agents. Therefore, in this thesis, we will introduce a formal framework using resources to describe multi-agent plans, and we will present and analyze algorithms that use this framework to coordinate planning agents without revealing all vital information. Such an approach will enable a more sophisticated (and automated) coordination of the plans of organizations.
1.3 Research contributions
This thesis contributes to the research on multi-agent planning and inter-organizational coordination in a number of ways.
• A simple multi-agent planning problem is defined that preserves the autonomy of the involved agents, and the complexity of this multi-agent planning problem is established.
• A formalism is introduced to describe planning problems and plans in a more object-oriented way, and an operational semantics for this action resource formalism is given.
• The action resource formalism is used to derive a reduction property of plans, i.e., the condition under which actions can be removed from a plan without invalidating it.
• A plan merging problem is defined and the complexity of this problem is analyzed.
• A polynomial any-time algorithm to solve this plan merging problem is given, based on the property of plan reduction.
• The time complexity and the performance of this algorithm are analyzed in experiments with a data set from a taxi company.
• The effect of allowing passengers to share taxis is analyzed using the plan merging algorithm and the given data set.
1.4 Overview
In Chapter 2, we will describe existing methods to construct a plan to attain a goal, and we will show why and how these methods have been extended and improved to deal with multiple agents. We will also define a basic multi-agent planning problem, and we will analyze what results have already been obtained and what challenges are left concerning this kind of multi-agent planning problem. Our formal framework to deal with both resources and multiple agents will be presented in Chapter 3, where we will also study the semantics and the properties of this framework. In Chapter 4 the property of plans that can be reduced is analyzed. This property is used in Chapter 5 to deal with multi-agent planning under the assumption that each agent has constructed a fully functional, but possibly very inefficient, plan on its own. The resulting algorithm, called ‘Plan Merging’, analyzes potential inefficiencies that can be resolved by combining the plans of the agents. Finally, in Chapter 6 we will show how this algorithm can be used and how it performs, and we discuss the results of some experiments. In this chapter we also present the effect of taxi sharing for a taxi company.
Chapter 2 From planning to multi-agent planning

Many situations involve decision making: for example, a taxi company that has some transportation tasks to be carried out, or a large firm that has to distribute a lot of complicated tasks among its subdivisions or subcontractors. Often the problem structure is the same: (the relevant part of) the world is in a certain state, but managers or directors would like it to be in another state. The (abstract) problem of how one should get from the current state of the world through a sequence of actions (possibly concurrent) to the desired goal state is a planning problem. Ideally, to solve such planning problems, we would like to have a general planning-problem solver. Since it turned out to be inherently impossible to design an algorithm with a reasonable time complexity that can solve all planning problems correctly, several simplifications of this problem have been made in the past, eventually leading to what may be called ‘the classical planning problem’. Although not all realistic problems can be modeled and solved as such a classical planning problem, they can help to solve more complex problems.

In this chapter, we give an overview of planning techniques for this classical planning problem and techniques for extensions of this problem. We show how a basic search for a plan can be performed: by starting from the initial state (forward planning), or from the goal states (backward planning), and by constructing plans by adding actions in a non-sequential order (such as is done in least-commitment planning). Then we discuss advanced heuristics to guide this search, which is quite useful since in general the planning problem is very hard (see Theorem 2.8). The heuristics are followed by extensions of the classical planning problem, such as dealing with uncertainty, time, and limited resources (fuel, capacity, money, etc.), and we show how they can be dealt with. We also discuss the relation of these planning techniques to optimization techniques like branch-and-bound search that are more often used in industry.

In Section 2.4, we introduce multi-agent planning. Multi-agent planning approaches can roughly be divided into two categories: those that explicitly deal with the interaction of plans of agents and those that use some kind of market or economy to ensure that no conflicts occur and (possibly) to help agents take advantage of positive interactions. Each agent in such a multi-agent system has to solve some form of planning problem. The most studied and accepted ‘basic’ planning problem is the classical planning problem.

Figure 2.1: A sequence of actions (plan) leading from each of the initial states in I to one of the goal states in G.
2.1 The classical planning problem
The classical planning problem can be defined as follows (Weld, 1999). Given
• a description of the known part of the initial state of the world in a formal language, usually propositional logic, denoted by I,
• a description of the goal (i.e., a set of goal states) in a formal language, denoted by G, and
• a description of the possible (atomic) actions that can be performed, modeled as state transformation functions,
determine a plan, i.e., a sequence of actions that transforms each of the states fitting the initial configuration of the world to one of the goal states.

The formal language that was used for STRIPS (Fikes and Nilsson, 1971) is common to most classical planning frameworks. This language is also used in Definition 2.6 to give a more formal definition of the classical planning problem.
Example 2.1. Suppose that initially (i.e., in all states of the world that match the description I), there is a taxi at a location A, represented by a binary state variable taxi(A), and a passenger at a location B, represented by passgr(B). In each of the states described by G the passenger should be at a location C, denoted by passgr(C). Furthermore, suppose that there are three actions that can transform (some part of) the state of the world.
1. The taxi can move from one location to another: move(x, y) with x, y ∈ {A, B, C}. This action requires that a priori taxi(x), and ensures that in the resulting state ¬taxi(x) and taxi(y) hold.
2. The passenger can get in the taxi: load(p). This action requires a priori taxi(x) and passgr(y) and x = y, and in the resulting state both ¬passgr(y) and passgr(taxi) should hold.
3. The passenger can get out of the taxi: unload(). This action requires taxi(x) and passgr(taxi), and results in ¬passgr(taxi) and passgr(x).
A plan that represents a solution to this problem is shown in Figure 2.1.

Remark 2.2. Note that it is not possible nor desirable to completely describe the state of the world and everything that changes. We use the assumption that ‘all that is not explicitly changed by an action remains unchanged’ (Janlert, 1987) to deal with this frame problem (Raphael, 1971). Consequently, we describe (frame) a state of the world only by the relevant literals. Such a set of literals is called a state specification. Under this assumption, the difference between ‘state’ and ‘state specification’ is irrelevant, so often ‘state’ is used while ‘state specification’ is meant.

In STRIPS, the STanford Research Institute Problem Solver (Fikes and Nilsson, 1971), states are described using binary state variables, called propositions. Actions or operators are specified by (i) conditions on propositions, called preconditions, (ii) propositions that are changed to ‘true’ in the new state, called add effects, and (iii) propositions that are changed to ‘false’, called delete effects. Furthermore, goals can be described by conditions on propositions. The propositional STRIPS planning language has been formalized, and the complexity of the problems that can be specified in this language has been analyzed by Lifschitz (1987). The following formal treatment is based on a further analysis of STRIPS planning instances performed by Nebel (2000).

To formally specify a propositional planning formalism, we need the following notations. Given the set of all propositional atoms Σ, the set of all literals over Σ, including ⊥ (bottom, to denote ‘false’) and ⊤ (top, to denote ‘true’), is denoted by Σ̂. For a set of literals L ⊆ Σ̂ we define ¬L to be the set of literals {¬l | l ∈ L}, where ¬l ≡ l′ if l = ¬l′ and l′ ∈ Σ. The set of all formulas is denoted by PROP, and defined as follows: if p ∈ Σ then p ∈ PROP, and if A, B ∈ PROP then ¬A, (A ∨ B), (A ∧ B), (A → B), and (A ↔ B) ∈ PROP.

Definition 2.3. (from Nebel, 2000) An operator o ∈ 2^Σ × 2^Σ̂ is defined by a precondition pre and its effect post, denoted by ⟨pre, post⟩, where pre ⊆ Σ is a set of propositional atoms, and post ⊆ Σ̂ is a set of literals.
We use the notation pre(o) and post(o) to denote the precondition and the effect of an operator o, respectively. The propositional atoms in the precondition of an operator must all be true in the state s ∈ 2^Σ̂ to which the operator is applied. Let O be the set of all operators in a domain.

Definition 2.4. (from Nebel, 2000) The application App : 2^Σ̂ × O → 2^Σ̂ of an operator or action o to a state (specification, see Remark 2.2) s is defined as

    App(s, o) = (s − ¬post(o)) ∪ post(o)   if s ⊨ pre(o), s ⊭ ⊥, and post(o) ⊭ ⊥
    App(s, o) = {⊥}                        otherwise,

where ⊨ is the standard propositional entailment. That is, the negations of the literals in the effect clause will be removed, and the literals themselves will be added to the state s. Using Definition 2.4, we can define the result Res(s, ∆) of applying a sequence of operators ∆ to a state s.

Definition 2.5. The result Res : 2^Σ̂ × O* → 2^Σ̂ of applying a sequence of operators ∆ = ⟨o1, . . . , on⟩ to a state (specification, see Remark 2.2) s is recursively defined by

    Res(s, ⟨⟩) = s
    Res(s, ⟨o1, o2, . . . , on⟩) = Res(App(s, o1), ⟨o2, . . . , on⟩)

Finally, Nebel defines a planning problem in propositional STRIPS as follows.

Definition 2.6. (from Nebel, 2000) A planning problem in the propositional STRIPS formalism is a four-tuple Π = (Σ, O, I, G) where
• Σ is a countably infinite set of propositional atoms, called facts or fluents,
• O ⊆ 2^Σ × 2^Σ̂ is the set of all possible operators to describe state changes in this domain,
• I ⊆ Σ̂ is the initial state, and
• G ⊆ Σ is the goal specification, i.e., the set of propositions that is to be satisfied.
11
SBIC
SIC
SLIC
SBC
SBI
SLC
SLI
SB
SC
SI
SL
S Figure 2.2: The specialization relationships of planning formalisms based on syntactic restrictions (Nebel, 2000). • G ⊆ Σ is the goal specification, i.e., the set of propositions that is to be satisfied. Definition 2.7. A sequence of operators ∆ = ho1 , . . . , on i ∈ O∗ is called a solution or a plan for a planning instance Π = (Σ, O, I, G) iff Res(I, ∆) |= G and Res(I, ∆) 6|= ⊥.2 The propositional STRIPS formalism (denoted by S) requires complete state specifications, unconditional effects, and propositional atoms as the formulas in the precondition list. This formalism can be extended in various ways: (i) state specifications may be incomplete (SI ), (ii) effects can be conditional (SC ), (iii) formulas in preconditions (and effect conditions) can be literals ⊆ Σˆ (SL ), and (iv) the formulas in preconditions (and effect conditions) can be arbitrary boolean formulas ⊆ PROP (SB ). Theorem 2.8. (from Nebel, 2000) Complexity of planning. The problem of deciding whether a plan exists for a given instance (i.e., the plan existence problem) is PSPACEcomplete for specifications in the propositional STRIPS formalism and for all combinations of the four extensions of this language (SI , SC , SL , and SB ) described above. This result is from Nebel (2000), who in turn relies on a proof by Bylander (1994). The idea behind this proof is as follows. Plan existence is in PSPACE because (i) a state is described by a polynomial number (n) of propositions, and thus there are at most 2n states. Therefore (ii) the plan length is at most 2n , (iii) so plan existence of a plan of length p 2 We
use the word ‘iff’ as a short hand for ‘if and only if’.
12
Plan Merging in Multi-Agent Systems
from a state s1 to a state s2 can be determined using an algorithm that uses O (log(p)), hence polynomial, space: for any intermediate state s3 determine plan existence both from s1 to s3 , and from s3 to s2 , recursively. The depth of the recursion of this algorithm is at most O (log(p)) = O(log(2n )) = O(n). The hardness of plan existence in Bylander’s proof is based on the fact that each PSPACE problem can be translated into a planning problem, because the polynomial (Polynomial SPACE) number of Turing machine state transitions can be taken as actions. Nebel also analyzes which instances of the different extensions of the planning problem can be translated into each other, and which cannot. Figure 2.2 shows a graph of these specialization relations between combinations of the problem extensions, indicated by combinations of the subscripts I , C , L, and B. An arrow from an extensionSL (allow literals) to another extension SB (allow arbitrary boolean formulas) means that any planning problem specified in SL can be translated into a problem in SB in polynomial time, while the plan size increases at most linearly in the size of the input. Apart from a specialization relationship of all extensions of the STRIPS formalism, Nebel’s most interesting results are (i) that incomplete state specifications and literals in preconditions can be compiled to the basic STRIPS formalism where the plan size is preserved, and (ii) that incomplete state specifications and literals in preconditions and effect conditions can be compiled to the basic STRIPS formalism with conditional effects preserving plan size exactly, and (iii) that there are no other compilation schemes preserving plan size linearly except those implied by the specialization relationship and results (i) and (ii). Many approaches exist to solve the classical planning problem specified using the STRIPS formalism and its extensions. We use some of these approaches in a multi-agent setting, so in the next section we briefly discuss the most relevant ones. We describe these approaches as variants of a general method based on the refinement of the set of all possible plans. This generalization has been made by Kambhampati (1997).
2.2
Refinement planning
The search for a sequence of actions (a plan) to get from an initial state to a goal state can be seen as a refinement of the set of all possible sequences (Kambhampati, 1997). We can describe most existing (classical) planning algorithms using this unifying view on planning. The unifying concept, representing a set of candidate sequences, is a so-called partial plan. This partial plan is used to describe a set of partial solutions. Planning algorithms are given by defining how a partial plan is modified such that in the end all (partial) solutions represented by the partial plan are complete, feasible solutions.
Chapter 2. From planning to multi-agent planning
13
Definition 2.9. (from Kambhampati, 1997) The syntax of a partial plan, usually denoted by Pi , for a planning problem Π = (Σ, O, I, G) is a directed acyclic graph (V, E, IPC, PTC), where V is a set of the nodes representing the application of actions from O, E ⊆ V ×V is the set of directed edges representing the required precedences of these actions, and IPC and PTC are two kinds of auxiliary constraints: • IPC ⊆ V × V × PROP is a set of interval preservation constraints specifying that a formula should hold during a certain interval (between two actions). Currently, the only used instantiations of such constraints are so-called causal links.3 A causal p link oi → o j specifies that the precondition p of action o j should be an effect of oi that may not be undone between oi and o j . • PTC ⊆ V ×PROP is a set of point truth constraints specifying that a formula should hold at a certain point before an action. In current planners these constraints are only used to specify open preconditions of actions. The semantics of a partial plan is the set of action sequences that (i) contain at least the actions represented by the nodes V , and (ii) fulfill the precedence constraints implied by the directed edges E. Furthermore, these action sequences should satisfy (iii) all interval preservation constraints IPC, and (iv) the point truth constraints PTC. These action sequences are called candidate plans or sequences of a partial plan Pi , denoted by candidates(Pi ). The IPC and PTC are especially useful if plans are not constructed bottom-up or topdown. When actions are added in no particular order, somehow it must be stored why a particular action is added at all (e.g., to meet the precondition of a subsequent action), and, more importantly, it has to be prevented that another action undoes its required result. This kind of information can be stored using the interval preservation constraints. Point truth constraints specify which preconditions still have to be satisfied. They can also be used to indicate at (or before) which point in a plan they need to be fulfilled. For some algorithms, intermediate results cannot be represented by one partial plan. Therefore, we think in terms of sets of partial plans. We define the candidate plans of a set S of partial plans PS as candidates(PS) = Pi ∈P candidates(Pi ). Each subset of candidates is represented by a partial plan Pi ∈ PS, called a component. Minimal candidates are those candidates that contain only actions that are included in a partial plan (i.e., that are represented in V ). Given a partial plan, we need to define when a plan (sequence) is one of the candidates represented by this partial plan. Such a sequence is called a safe linearization. 3 Causal links are used by least-commitment planners (Weld, 1994).
at the end of this section.
These planners are briefly discussed
14
Plan Merging in Multi-Agent Systems
unload(passgr1 ) move(A, B) load(passgr1 ) in(passgr1 ,taxi) (PTC) I
Legend load(passgr2 ) at(taxi, A) (IPC)
move action interval preservation constraint (IPC) point truth constraint (PTC) precedence relation (A)
Figure 2.3: A partial plan describing all candidates that contain at least the four given actions, in the order given by the precedence relation and matching the interval preservation constraints and the point truth constraints. Definition 2.10. (from Kambhampati, 1997) A sequence ∆ = ho1 , . . . , on i ∈ O∗ is called a safe linearization of a partial plan Pi = (V, E, IPC, PTC), if • there exists a bijective function f : V → ∆, such that any v ∈ V represents the application of operator f (v) ∈ ∆ (v ≡ f (v)), • for any v, w ∈ V , v 6= w, if there is a path from v to w in (V, E) then f (v) < f (w) in ∆, • for any (v, w, ϕ) ∈ IPC, if oi = f (v) < f (w) = o j and if Res (I, ho1 , . . . , oi i) |= ϕ then
Res I, o1 , . . . , o j−1 |= ϕ, and • for any (v, ϕ) ∈ PTC, Res (I, ho1 , . . . , f (v)i) |= ϕ. Example 2.11. A taxi has to take two passengers passgr1 and passgr2 from location A to location B. Initially, both the taxi and the passengers are at location A. The partial plan that describes the set of candidate plans after some additional refinements (i.e., the addition of two load actions, a moveaction, and an unload action) is depicted in Figure 2.3. After some more refinements, each of the candidates should be a correct sequence of actions from the initial state to one of the desired goal states (i.e., where at(passgr1 , B) and at(passgr2 , B) hold). A planner can reduce the set of candidate plans represented by a set of partial plans, by (i) adding actions, (ii) adding precedences on these actions, and (iii) adding auxiliary constraints to one or more of the partial plans. A technique to find suitable refinements is called a refinement strategy, denoted by R . Proposition 2.12. (from Kambhampati, 1997) Refinement. A refinement strategy R refines (i.e., reduces) the set of candidate plans, represented by a set of partial plans PS: so, if PS0 = R (PS) then candidates(PS0 ) ⊆ candidates(PS).
Chapter 2. From planning to multi-agent planning
15
To evaluate the effectiveness of such strategies for refining the set of candidate plans, we look at specific properties. For example, it is important to know whether a strategy preserves all possible solutions (i.e., is complete). Definition 2.13. A refinement strategy R to find a set of solutions is called 1. progressive iff candidates(R (PS)) ⊂ candidates(PS) for any set of partial plans P, 2. complete iff solutions ∩ candidates(R (PS)) = solutions ∩ candidates(PS), and 3. systematic iff the candidates(R (PS)).
set
{candidates(Pi ) | Pi ∈ R (PS)}
is
a
partition
of
Although planning algorithms are implemented in many different ways, they can be rewritten in an alternative, uniform way, based on the theory of refining a set of potential solutions (Kambhampati, 1997). The structure they then have in common can be found in Algorithm 2.2. This algorithm describes how a solution result can be obtained from a set of partial plans PS. Each time the function R EFINE is executed, and none of the minimal candidates is a solution, a refinement strategy is selected and applied to one of partial plans P. Then repeatedly an element of the resulting set PS is selected, and this function R EFINE is called recursively using this element. Once a solution has been found, this process stops, and the result is returned. Example 2.14. The planning algorithm Fast Forward (FF) (Hoffmann and Nebel, 2001) starts with the initial state and an empty sequence of actions (plan). Repeatedly, the sequence is extended with actions, always adding to the end of the sequence. For each of the possible extensions of the sequence (first with one action, then with two actions, etc.), a heuristic value is calculated. The first of the possible extensions that leads to a state with a lower heuristic value than the current state is chosen. The heuristic uses the relaxation that no subsequent action has negative effects. Under this assumption, a so-called relaxed plan can be constructed where all actions with satisfied preconditions are added in parallel. The relaxed plan is constructed until the goal state is reached. The heuristic value is the cost of all actions in the relaxed plan that are needed to reach the goal state. Although FF does not use the presented refinement framework, it is in fact a form of refinement planning with the following refinement strategy RFF (which can be used in step 2.3 in Algorithm 2.2). Given one partial plan that represents a set of possible solutions, a new partial plan is constructed by extending this partial plan at the end. The extension is selected using the heuristic described above. See also Algorithm 2.2. Note that this particular refinement strategy uses only a singleton set of partial plans to represent of the set of candidates.
16
Plan Merging in Multi-Agent Systems
(R EFINE (P, Π) ) Input: A partial plan P and a problem Π. Output: A minimal candidate of P that is a solution to Π or ‘fail‘. begin 1. if a minimal candidate c of P is a solution to Π then 1.1. return c 2. else 2.1. result := fail 2.2. select a refinement strategy R 2.3. PS := R (P) 2.4. while PS 6= 0/ and result = fail do 2.4.1. non-deterministically select an element Pi of PS 2.4.2. PS := PS \ {Pi } 2.4.3. result := R EFINE(Pi , Π) 2.5. return result end
(RFF (PS) ) Input: A singleton set of partial plans PS. Output: A singleton set of partial plans that lead to a state with a lower heuristic cost. begin 1. h := the heuristic costs of the current end state 2. ∆ := the action sequence of a minimal candidate of PS 3. h0 := ∞ 4. while h0 ≥ h do 4.1. breadth-first select a sequence of actions ∆0 to extend ∆ 4.2. h0 := the heuristic value of the end state of ∆ + ∆0 5. return the partial plan that represents ∆ + ∆0 end
Chapter 2. From planning to multi-agent planning
17
Refinement strategies such as RFF can be roughly divided into three categories. 1. Progression or forward planning methods construct (partial) plans bottom-up: an action to be executed is selected and added to the end of the partial plan. The action-selection mechanism varies, but usually a heuristic is used to determine a next action to execute. Some methods do not use backtracking: once an action is selected, it will not be removed from the partial plan. Such planners are not complete, but can often find correct plans much faster. Examples of such planners are FF (Hoffmann and Nebel, 2001), HSP (Bonet and Geffner, 2002), and Prodigy (Veloso et al., 1995). 2. Regression refinement, also called backward planning methods, construct (partial) plans top-down: starting from the description of the set of end states, they determine an action to reach such a state from a state s that is presumably ‘closer’ to the initial state (i.e., a shorter sequence of actions is needed to reach this state s). The STRIPS planner (Fikes and Nilsson, 1971) is one of the first planners that use this technique. Many others use some of the ideas from this technique, for example as a heuristic for progression refinement, as in GRT (Refanidis and Vlahavas, 2001). An advantage of both progression and regression refinement is that these methods can describe a partial plan by a sequence of actions and a description of the state reached last, and that they can search in the state space to find what action to add next, and do not need a (more complex) plan space representation. Given such a state-space representation one can easily see whether a plan is valid by comparing the final state of the plan to the requirements of the goal state, or by comparing the first state of a plan to the initial state, respectively. 3. The third category, least-commitment planning (Weld, 1994), really needs the complicated plan space representation of the partial plans as used in the refinement framework, because this type of planning refines plans not only by extending the prefix or the postfix of the plan, but also adds constraints on the possible solutions in many ways. This category is sometimes also called partial order planning, because during the construction of a plan the order of the actions in the plan is partial, while the order of the actions in a partial plan based on forward planning is usually complete. The planners Noah (Sacerdoti, 1975) and POP (Weld, 1994) fit into this category. Some approaches use a slightly different variant of a partial plan, called a disjunctive partial plan. The nodes in the disjunctive partial plan may consist of several actions. The semantics of such a node is that exactly at that point, one of those actions is to be executed. For example Graphplan (Blum and Furst, 1997) uses a form of disjunctive planning.
18
Plan Merging in Multi-Agent Systems
Legend action proposition pre/postcondition arc no-op
Figure 2.4: Graphplan uses a plan graph consisting of proposition layers and action layers. Example 2.15. The Graphplan algorithm consists of two phases. First a planning graph is constructed that represents all possible solutions that reach the goal state in a minimum amount of planning steps. Then a solution is extracted from this planning graph. The planning graph is a directed, layered graph. An example of such a graph is shown in Figure 2.4. In this graph two types of layers are interleaved: proposition layers and action layers. A proposition layer consists of nodes that each represent an atom or the negation of an atom. The first layer is a proposition layer representing the initial state. An action layer contains a node for each possible action. The nodes are connected by three types of arcs. Precondition arcs connect the action to the atoms of its precondition of a previous layer. Add arcs connect an action to the nodes of the next layer, representing an atom that is a direct consequence of the postcondition of the action, and delete arcs connect actions to the nodes representing atoms that are disabled by the action. Propositions are always reproduced in the next layer by so-called “no-op” actions. Except when another action is chosen that has a delete effect for this proposition, this leads to a conflict. Such mutual exclusions (mutexes) are represented explicitly in Graphplan. These mutexes indicate for each action layer which actions cannot be fulfilled at the same time (in the same layer). To find a correct plan, Graphplan builds the graph while searching for a proposition layer that implies the goal state. Backwards from this goal state the mutual exclusion relations are verified. If it is impossible to satisfy all mutexes, the plan graph is extended with another layer and the process is repeated.
2.3
Extended planning problems
Over the last few years many extensions of the classical planning problem have been studied: dealing with time (Do and Kambhampati, 2001; Penberthy and Weld, 1994; Smith
Chapter 2. From planning to multi-agent planning
19
and Weld, 1999), costs (or utility maximization) (Haddawy and Hanks, 1998), limited resources (Koehler, 1998; Wolfman and Weld, 2001), and planning under uncertainty (Boutilier et al., 1999). Of these extensions, planning under uncertainty is maybe the most relevant when multiple agents are acting in the same environment. Such domains introduce several types of uncertainty. Firstly, actions can have probabilistic effects: for example, upon moving to another location by train, we know that we have 85 percent chance of actually reaching our destination in time, 10 percent of getting stuck somewhere, and 5 percent chance of getting involved in an accident. When the outcomes of actions can be (partially) observed, a plan can be constructed including sensing actions and conditional branches. This problem is called contingent planning. This leads to a second type of uncertainty. The sensing actions may fail as well or may not be able to observe the world completely. The unobservable, partially observable, and fully observable domain versions of this problem are EXPTIME-complete, EXPSPACE-complete, and NEXPTIMEcomplete, respectively (Bernstein et al., 2000; Haslum and Jonsson, 1999). The additional complexity over propositional STRIPS plan existence (Theorem 2.8) comes from the uncertain results of actions. Not only is the length of a plan exponential, now each plan can have an exponential number of resulting end states. Thirdly, planning in a domain where we lack even a probability distribution over the possible outcomes of a non-deterministic operator, is called non-deterministic planning or conformant planning. So, conformant planning is the problem of finding, in a nondeterministic domain, a sequence of actions which will achieve the goal for all possible contingencies. The complexity of a conformant planning problem is EXPSPACE-complete, i.e., strictly higher than that of classical planning (Haslum and Jonsson, 1999). Theorem 2.16. (Haslum and Jonsson, 1999) Complexity of conformant planning. Deciding the existence of a conformant plan (sequence) for a problem Π with an unobservable propositional domain and actions is EXPSPACE-complete.4 In a multi-agent system, agents often perform actions unexpectedly and independently of each other. When each agent is planning on its own without communicating and coordinating with the other agents, each agent has to solve some sort of non-deterministic planning problem. Theorem 2.16 and the complexity analysis of contingent planning show that non-determinism in planning is EXPTIME-hard. So we can conclude that such an individual approach to multi-agent planning is EXPTIME-hard. Corollary 2.17. An individual approach to planning in multi-agent systems using either conformant or contingent planning is EXPTIME-hard. 4 The
class EXPSPACE is the class of decision problems that can be solved using an amount of space bounded by 2 p(n) , where p is a polynomial and n is the input length. It is known that the class EXPTIME⊆EXPSPACE contains problems that are intractable (Theorem 7.16 in Garey and Johnson, 1979), even if P=NP.
20
Plan Merging in Multi-Agent Systems
However, by communicating parts of plans of the agents, we can reduce or even in some domains remove the uncertain effects of actions. Such approaches to planning in multi-agent systems are discussed in the remainder of this chapter.
2.4
Multi-agent planning: by and for agents
The term ‘multi-agent planning’ has incidentally been used to denote an approach to a planning problem with complex goals that splits the problem into manageable pieces, and lets each agent deal with such a subproblem (Wilkins and Myers, 1998; Ephrati and Rosenschein, 1993b). Then these results have to be combined somehow to achieve a coherent, feasible solution to the original problem. The idea of using several problem solvers or algorithms to work on one problem (De Souza, 1993) has been applied for example to transportation scheduling, for example (Fischer et al., 1995), in constraint programming (Hentenryck, 1999), and also to combine several planner agents to be able to reach a solution faster. A multi-agent architecture to support the latter is proposed in (Kamel and Syed, 1989) and in (Wilkins and Myers, 1998), among others. Some of the plan merging algorithms described in this section can also easily be used for this purpose in combination with an algorithm to divide the goal into subgoals, one for each agent. Usually, however, multi-agent planning has been interpreted as the problem of finding plans for a group of agents (Briggs, 1996; Ephrati and Rosenschein, 1993a; Rosenschein, 1982). Furthermore, it has also been used to describe the problem of coordinating the operations of a set of independent agents to achieve the goals of each individual agent (Georgeff, 1984; Konolige, 1982; Muscettola and Smith, 1989). The main difference between planning for a group of agents and coordinating agents planning for their individual goals is that in the latter approach, agents usually have their own, private goals, and they do not like to publish their complete plans, since sometimes these agents are competitors. Planning by multiple agents is also called distributed planning (Mali and Kambhampati, 1999), and planning for multiple agents is also called centralized multi-agent planning. In this thesis, we will use the term multi-agent planning if the planning is done both for and by the agents themselves: agents planning and coordinating actions to achieve their own goals. Remark 2.18. All planning methods described in this chapter assume that the world is deterministic. That is, we assume that we know the result of each action. Unfortunately, especially in a multi-agent environment, this is not the case, because for example another agent may have changed the world after the precondition of an action has been established. However, under the assumption that all agents’ actions are coordinated, the deterministic assumption is quite acceptable. This assumption is especially important, since we have seen in Section 2.3 that even in the single-agent case non-deterministic
Chapter 2. From planning to multi-agent planning
21
planning is extremely complicated.
2.4.1 Problem definition
Thus we define the multi-agent planning problem as having self-interested agents (i.e., agents working on their own goals) that try to solve their planning problems in a distributed way. It is very hard to find an algorithm that finds good solutions in a reasonable amount of time for all such problems. Therefore, we look for problems with specific properties, for example the planning of a set of taxi companies. Each taxi company usually has a set of orders and resources to execute these orders. The goal of an agent representing such a company is to carry out as many orders as possible. Often a taxi company is not able to execute all its orders on its own. Most situations involving a couple of such taxi companies have some noticeable properties:

1. The agents are self-interested, i.e., they are primarily focused on attaining their own goals.

2. The (initial) states of the agents are consistent, i.e., all agents base their state specification on the current state of the world, and they are only concerned with their own resources and goals.

3. The goals of the agents are consistent, because they (usually) have different orders (conflict-free).

4. The agents are benevolent. We define benevolent agents as agents that are, in principle, prepared to help each other (sometimes), because by cooperating they are able to carry out more of their own goals as well.

5. No two agents change a specific property (of the environment) at the same time.

Although for each application some exceptions to these properties exist, many applications, such as freight transportation companies and even military forces, usually have all these properties. A problem that has all of the above properties is called a conflict-free benevolent multi-agent planning problem. In such problems the agents are both distributed and self-interested, as in other multi-agent (planning) problems.

The fact that multiple agents act in the same environment simultaneously introduces many complications. The research topic of non-deterministic multi-agent planning deals with situations where the result of an action is not known in advance. However, this approach is computationally very hard (Haslum and Jonsson, 1999), precisely because of the uncertain effects of actions. Another solution would be to introduce a semaphore to guarantee that only one
agent accesses a certain part of the world at a time. Such a semaphore can be seen as another agent controlling only this part of the world. This latter method matches property 5 above.

The parts of the world that are relevant to more than one agent play a special role in multi-agent planning. In a multi-agent setting, the agents need to know which (combinations of) propositions can be used by which agent and which can be exchanged among the agents. In the following example, we have one proposition (free(B)) that can be exchanged, and one that stays private to one of the agents (holding(B)).

Example 2.19. Suppose that we have a world with multiple robots, each supplied with a gripper. We use the proposition holding(B) to denote that the agent has a block B in its gripper and free(B) to denote that no block is on top of B. On the one hand, it makes no sense for an agent a1 to exchange the proposition holding(B) in its plan with an agent a2, because the meaning would change from 'a1 is holding block B' to 'a2 is holding block B', while no action is changing the world. On the other hand, it could be useful to exchange the proposition free(B). Such a proposition typically occurs as a result of an action of one agent a1. According to our properties, two agents cannot change this proposition at the same time. Therefore, a1 can 'transfer' this proposition to a2, so that a2 can pick up B or put something on top of B (both actions have free(B) as a precondition).

Sometimes, only a combination of a number of propositions can be exchanged, for example when they each describe different aspects of the same complex physical object. We call the smallest set of propositions (possibly a singleton) that can be exchanged a resource, usually denoted by r ⊆ Σ. To represent the transfer of these resources, we introduce two actions (or operators):

• A get action, get(a, r), is described by an operator ⟨∅, r⟩ to represent receiving resource r from agent a (see Definition 2.3). This action has no precondition, and produces the propositions belonging to r.

• A put action, put(a, r) = ⟨r, ¬r⟩, represents giving resource r to agent a. This action has the propositions of r as a precondition, and consumes this precondition, i.e., produces the negation of all literals in r.

Note that it should not be possible to include a put(ai, r) action in the (partial) plan of an agent aj without the corresponding get(aj, r) action in the plan of agent ai (or vice versa). Therefore, we require that in the final (consistent, see Definition 2.24) plan for a multi-agent planning problem, these actions only come in synchronized pairs. This way, propositions that are exchanged are deleted from one agent's state and added to another's state. This ensures that the same proposition cannot be used by two different agents at the same time (as required by property 5).
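To make these synchronized exchanges concrete, the following minimal Python sketch models get and put as STRIPS-style ⟨precondition, effect⟩ operators and checks the pairing requirement over a set of plans. The Exchange class and all names are illustrative assumptions, not part of the formalism; matching put and get entries by (sender, receiver, resource) directly mirrors the synchronization requirement above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Exchange:
    kind: str            # "get" or "put"
    partner: str         # the other agent involved in the exchange
    resource: frozenset  # the smallest exchangeable set of propositions

def precondition(x: Exchange) -> frozenset:
    # get(a, r) has an empty precondition; put(a, r) requires r.
    return frozenset() if x.kind == "get" else x.resource

def effect(x: Exchange) -> frozenset:
    # get(a, r) produces r; put(a, r) produces the negated literals of r.
    return x.resource if x.kind == "get" else frozenset("~" + p for p in x.resource)

def synchronized(plans) -> bool:
    """True iff exchange actions come in matched pairs: a put(b, r) in agent
    a's plan needs a get(a, r) in agent b's plan, and vice versa."""
    puts = {(a, x.partner, x.resource) for a, xs in plans.items()
            for x in xs if x.kind == "put"}
    gets = {(x.partner, b, x.resource) for b, xs in plans.items()
            for x in xs if x.kind == "get"}
    return puts == gets

# Example: a1 transfers free(B) to a2.
r = frozenset({"free(B)"})
plans = {"a1": [Exchange("put", "a2", r)],
         "a2": [Exchange("get", "a1", r)]}
assert synchronized(plans)
```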
Formally, the multi-agent planning problem we study can be defined analogously to the single-agent planning problem from Definition 2.6.

Definition 2.20. For a group of agents A = {a1, . . . , an}, the multi-agent planning problem is a tuple ΠA = (Σ, R, {Πa | a ∈ A}), where

• Σ is the set of propositional atoms,

• R ⊆ 2^Σ is the set of resources that can be exchanged, and

• Πa = (Oa, Ia, Ga), with

  – Oa ⊆ 2^Σ × 2^Σ, the set of all operators (also called actions) describing state changes that can be made by agent a ∈ A, including a get(a′, r) and a put(a′, r) action for each resource r ∈ R and agent a′ ∈ A, a′ ≠ a,

  – Ia ⊆ Σ̂, the part of the initial state that can be changed by agent a ∈ A, and

  – Ga ⊆ Σ̂, the goal specification for agent a ∈ A.

Furthermore, we require that for any two agents ai ≠ aj ∈ A: (i) Iai ∩ Iaj = ∅, i.e., each agent knows only about its own part of the world, and (ii) Gai ∩ Gaj = ∅, i.e., no two agents have the same goal (as this could lead to a conflict).

Next, we define the solution to a multi-agent problem. However, when agents produce resources for one another, it is very well possible for agents to get into a situation where there is a chain of agents depending consecutively on each other (by means of traded resources), and where the last one is dependent on the first one. The dependencies within a partial plan PP = (V, E, IPC, PTC) of an agent are given by the precedence relations E (i.e., if (vi, vj) ∈ E, then vi should precede vj, see Definition 2.9). In fact, the inverse of the relation E defines the direct dependencies between action nodes. The dependency of a node in the partial plan of one agent on another node (possibly in the partial plan of another agent) can be defined as follows.

Definition 2.21. Given the partial plans for a set of agents A, and the partial plan for an agent ai ∈ A, denoted by Pai = (Vi, Ei, IPCi, PTCi), we say that an action node v ∈ Vi is dependent on a node vj ∈ Vj, denoted by Dep(v, vj), iff one of the following two conditions holds:

1. either i = j (and thus ai = aj) and a path exists from vj to v in (Vi, Ei), or

2. i ≠ j, and ∃ak ∈ A, vi ∈ Vi, vk ∈ Vk, r ∈ R with k ≠ i, such that

   (a) Dep(v, vi),
   (b) vi ≡ get(ak, r),
   (c) vk ≡ put(ai, r), and
   (d) Dep(vk, vj).
Figure 2.5: The node v is dependent on vj. [Figure: the partial plans PPai, PPak, and PPaj, where PPai contains the nodes v and vi ≡ get(ak, r), PPak contains vk ≡ put(ai, r), and PPaj contains vj; the legend distinguishes dependency relations, resource interchanges, and precedence relations.]
In other words, an action node v is dependent on a node vj if there is an intermediate agent with a put-action node vk that is dependent on vj and there is a corresponding get-action node vi that precedes v (see Figure 2.5), or if v and vj occur in the plan of the same agent and a path exists from vj to v in (Vi, Ei).

Remark 2.22. For every node v, it holds that Dep(v, v). Therefore, it is possible to choose vi = v and vk = vj as a special case of Definition 2.21-2 (see Figure 2.5).

When plans are actually executed, an action can only be started after all actions it is dependent on have been completed. Therefore, an action v1 must not be dependent on an action v2 that, in turn, is dependent on v1.

Definition 2.23. A circular dependency in a set of partial plans exists iff there are (at least) two action nodes v1 and v2, v1 ≠ v2, for which Dep(v1, v2) and Dep(v2, v1).

Circular dependencies within one agent are easily prevented, since the agent has an overview of its complete plan, and can thus detect inconsistencies. Because of the distributed nature of multi-agent planning, however, agents can only see the local part of the total solution, and thus lose the overview required to detect these loops.

A solution to a multi-agent problem ΠA = (Σ, R, {(Oa, Ia, Ga) | a ∈ A}) consists of a set of consistent solutions to all problems (one for each agent) in the problem set.
Definition 2.24. A solution ∆A = {∆a | a ∈ A} to the multi-agent planning instance ΠA is consistent iff:

• every ∆a ∈ ∆A is a solution to (Σ, Oa, Ia, Ga) (according to Definition 2.7),

• for any agents ai, aj ∈ A with i ≠ j and any resource r, there is a get(aj, r) action in ∆ai iff there is a put(ai, r) action in ∆aj as well, and

• the exchange of resources does not introduce circular dependencies.
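For intuition, the following sketch checks the third condition of Definition 2.24: it builds one global graph containing each agent's precedence edges plus an edge from every put node to its matching get node, and then looks for a cycle. This is only an illustration under the simplifying assumption that all partial plans are visible to one checker (which, as noted above, a distributed system does not have); all names are hypothetical.

```python
from collections import defaultdict

def has_circular_dependency(precedence, exchanges):
    """precedence: iterable of (u, v) edges meaning u must precede v, over
    globally unique node ids taken from all agents' partial plans.
    exchanges: iterable of (put_node, get_node) pairs, one per traded resource.
    A put->get edge makes the receiving plan depend on the sending plan, so
    Dep(v1, v2) and Dep(v2, v1) for v1 != v2 shows up as a cycle."""
    graph = defaultdict(list)
    nodes = set()
    for u, v in list(precedence) + list(exchanges):
        graph[u].append(v)
        nodes.update((u, v))

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on the DFS stack / finished
    color = dict.fromkeys(nodes, WHITE)

    def dfs(u):
        color[u] = GREY
        for w in graph[u]:
            if color[w] == GREY or (color[w] == WHITE and dfs(w)):
                return True        # back edge: a circular dependency
        color[u] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in nodes)

# Two agents trading resources in both directions create a cycle:
prec = [("a1:v1", "a1:put_r"), ("a2:get_r", "a2:put_s"), ("a1:get_s", "a1:v1")]
swap = [("a1:put_r", "a2:get_r"), ("a2:put_s", "a1:get_s")]
assert has_circular_dependency(prec, swap)
```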
Given this definition of the conflict-free benevolent multi-agent planning problem, we are interested in the complexity of this problem.

2.4.2 Complexity of multi-agent planning
To analyze the complexity of the multi-agent planning problem (Definition 2.20), we consider the problem of deciding whether a solution exists for a given instance of this problem. This problem is called the multi-agent plan existence problem.

Theorem 2.25. Complexity of multi-agent planning. Given that agents are able to cooperate, the multi-agent plan existence problem is PSPACE-complete, just like single-agent plan existence (Theorem 2.8).

The proof of this theorem is based on the idea that if a plan for all agents exists that is constructed centrally (by a single agent), then this plan can be translated into a set of plans, one for each agent. Hence we know that a multi-agent plan exists.

Proof. First, note that multi-agent planning is at least as hard as single-agent planning, since any single-agent planning problem is a restriction of a multi-agent planning problem. To prove that it is not harder than single-agent planning, we show that it is possible to transform a multi-agent planning problem into a single-agent planning problem and that the solution of the single-agent problem can be translated back, all in polynomial time. We translate the problem ΠA = (Σ, R, {(Oa, Ia, Ga) | a ∈ A}) into a single-agent problem Π = (Σ′, O, I, G), where

• Σ′ = {pa | p ∈ Σ, a ∈ A}, i.e., all propositions in Σ are labeled with the names of the agents,

• O = {⟨pre_a, eff_a⟩ | ⟨pre, eff⟩ ∈ Oa, a ∈ A} ∪ Oputget, i.e., the possible actions are the actions of the individual agents, with the preconditions and postconditions labeled with the name of the agent, and

  Oputget = {putget(rk,a, rk,b) = ⟨rk,a, rk,b ∪ ¬rk,a⟩ | rk ∈ R, a ∈ A, b ∈ A − {a}},

  i.e., putget actions are added that can change the label of the propositions in resources from one agent name to another,

• I = {pa | p ∈ Ia, a ∈ A}, i.e., the initial state is the combination of the initial states of the agents, each labeled with the name of the agent whose initial state it is, and

• G = {pa | p ∈ Ga, a ∈ A}, i.e., the goals of the new problem are the goals of the multi-agent problem, again labeled with the name of the agent that has that goal.

This single-agent problem contains all information about which agent is going to execute which actions, and agents are only able to perform actions on propositions that are labeled with their name. The only way in which agents can help each other in attaining their goals is by introducing putget actions that change the labels. The solution to this problem is a sequence of operators ∆ that translates the initial state I to a goal state that matches G. This single-agent plan can be translated into a set of multi-agent plans ∆a as follows.

1. Substitute each putget(rk,a, rk,b) action by a put(b, rk,a) action followed by a get(a, rk,b) action.

2. Extract for each agent a ∈ A the subsequence ∆a of actions that have preconditions and/or postconditions labeled with a.

3. Remove for each agent a ∈ A all labels a from the sequence.

The result is a sequence that gets agent a from the initial state Ia to a goal state that fits Ga. If no solution to the constructed single-agent planning problem exists, there is also no solution for the multi-agent planning problem instance.

This result may be somewhat remarkable in the light of Corollary 2.17, which declares almost the same problem to be EXPTIME-hard (and, via conformant planning, even EXPSPACE-complete). The difference between these problems is that in the (benevolent) multi-agent planning problem above, agents are allowed to cooperate and communicate (partial) solutions, and do not try to guess what another agent is going to do.
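As a concrete illustration of this proof construction, the sketch below labels propositions and operators with agent names, generates the putget operators, and splits a labeled solution sequence back into per-agent plans. The encoding (labels as 'p@a' strings, putget names parsed by position) is an illustrative assumption, and the single-agent planner that would solve the translated problem is assumed to exist elsewhere.

```python
def label(props, agent):
    """Label every proposition with an agent name: p -> 'p@a'."""
    return frozenset(f"{p}@{agent}" for p in props)

def to_single_agent(resources, problems):
    """problems: {agent: (ops, init, goal)} with ops as {name: (pre, eff)}.
    Returns the labeled single-agent problem (ops, init, goal)."""
    ops = {}
    for a, (ops_a, _, _) in problems.items():
        for name, (pre, eff) in ops_a.items():
            ops[f"{name}@{a}"] = (label(pre, a), label(eff, a))
    for k, r in enumerate(resources):                   # the putget operators
        for a in problems:
            for b in problems:
                if a != b:
                    ops[f"putget(r{k},{a},{b})"] = (
                        label(r, a),                    # consume r labeled with a
                        label(r, b) | {f"~{p}@{a}" for p in r})  # produce r@b, delete r@a
    init = frozenset.union(*(label(i, a) for a, (_, i, _) in problems.items()))
    goal = frozenset.union(*(label(g, a) for a, (_, _, g) in problems.items()))
    return ops, init, goal

def split_solution(plan, agents):
    """Project a labeled operator sequence onto per-agent plans;
    putget(rk,a,b) becomes a put in a's plan and a get in b's plan."""
    delta = {a: [] for a in agents}
    for step in plan:
        if step.startswith("putget"):
            _, a, b = step.rstrip(")").split(",")
            delta[a].append(step.replace("putget", "put"))
            delta[b].append(step.replace("putget", "get"))
        else:
            name, a = step.rsplit("@", 1)
            delta[a].append(name)
    return delta
```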
The remainder of this section describes three approaches. First, we discuss methods that deal with coordination and planning for multiple agents from a global, all-knowing point of view; these approaches assume that all information about all agents is available at one location. Secondly, we discuss methods for distributed planning (i.e., planning by multiple agents) (Mali and Kambhampati, 1999; DesJardins et al., 2000), mostly for multiple agents; in contrast to the methods based on social laws, markets, or auctions, treated in the third and final subsection, these methods explicitly construct and deal with the relations between the plans of the agents.

2.4.3 Centralized planning for multiple agents
In principle, any of the single-agent planning methods discussed in Section 2.2 can be, and sometimes is, used to make plans for multiple agents. This can be done as described in the proof of Theorem 2.25: a trusted third party creates, for each agent, new copies of the actions that this agent is able to execute, labeled with the name of that agent. Once a complete plan has been constructed, each agent should execute exactly those actions that are labeled with its name, in the order given by this global plan. Such research on using the classical planning framework to construct and execute multi-agent plans has been conducted by Pednault (1987) and Katz and Rosenschein (1989). Katz and Rosenschein (1993) also describe how such multi-agent plans can be formally represented by a directed acyclic graph (DAG) of operators, and they give an algorithm to execute such a multi-agent plan using a central coordinator.

These approaches have (at least) two problems: (i) centralized planning requires more computation time, since the problem size becomes the number of agents times the size of an individual planning instance, and the natural division of goals and actions among agents is lost; and (ii) the (third) party constructing all plans has to be trusted by all agents, since it is not possible for an agent to keep part of its plan secret from such a party. Because of these two problems, most of the approaches related to centralized planning for multiple agents are implemented using plan merging methods, where agents construct plans on their own, and a centralized algorithm is used to coordinate these plans.
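The DAG representation suggests a simple coordinator loop: release an action to its agent as soon as all of its predecessors have finished. The sketch below is a minimal, hypothetical rendering of that idea (not the actual algorithm of Katz and Rosenschein), processing the graph in topological order.

```python
from collections import defaultdict, deque

def execute_dag(actions, edges, dispatch):
    """actions: {action_id: agent_name}; edges: (u, v) means u before v.
    dispatch(agent, action) performs the action; for simplicity we assume
    execution is instantaneous, so a released action finishes immediately."""
    indegree = {a: 0 for a in actions}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1
    ready = deque(a for a, d in indegree.items() if d == 0)
    done = 0
    while ready:
        u = ready.popleft()
        dispatch(actions[u], u)          # hand the action to its agent
        done += 1
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if done != len(actions):
        raise ValueError("precedence graph contains a cycle")

execute_dag({"load": "a1", "drive": "a1", "unload": "a2"},
            [("load", "drive"), ("drive", "unload")],
            lambda agent, action: print(agent, "executes", action))
```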
The next section discusses such distributed planning methods.

2.4.4 Explicit distributed planning
Planning methods that explicitly distribute data and control among the agents are called explicit distributed planning methods. In this section, we present three types of such methods. Partial global planning methods describe how a part of each agent's plan should be distributed among all agents, such that the agents can coordinate their actions. Hierarchical planning methods assume that plans can be represented by a hierarchy, and that this hierarchy can be used to determine and solve conflicts between agents. Plan-merging methods specify how independently created plans should be merged.

Planning and merging

In principle, the idea behind multi-agent plan merging is quite simple: each agent creates a plan for its own goals, and afterwards these plans are analyzed to detect and resolve conflicts, and sometimes also to exploit positive interactions. However, such an algorithm, and related methods such as reasoning about the consequences of one's plan for the plans of others, can become quite complicated. Therefore, a structured model of actions, intentions, beliefs, states, and plans is required; one of the first was made by Bruce and Newman (1978). A formalism to describe such knowledge
has been presented by Rosenschein (1982), who used it to deal with conflicts between agents that can be resolved by synchronizing the execution of their plans. He showed what should be communicated to resolve these undesired interactions, and he included these communication actions in the global plan. A similar approach has been taken by Georgeff (1983; 1988). Moreover, Georgeff was one of the first to actually assume a plan merging (or plan-synchronization) process starting with individual plans, as described before. Georgeff defined a so-called process model to formalize the actions open to an agent. Part of a process model is the set of correctness conditions, which are defined on the state of the world, and which must be valid before execution of the plan may succeed. Two agents can help each other by changing the state of the world in such a way that the correctness conditions of the other agent become satisfied. Of course, changing the state of the world may help one agent, but it may also interfere with another agent's current plan. That is, one agent may change the state of the world in such a way that the correctness conditions of another agent become invalid (Georgeff, 1984).

Stuart (1985) uses a propositional temporal logic to specify constraints on plans, such that only feasible states of the environment are guaranteed. These constraints are given to a theorem prover to generate sequences of communication actions (which, in fact, implement semaphores) that guarantee that no event will fail.

To both improve efficiency and resolve conflicts, one can introduce restrictions on individual plans to ensure efficient merging. This line of action is proposed by Yang et al. (1992) and Foulser et al. (1992), and can also be used to merge alternative plans to reach the same goal. Furthermore, they describe approximation algorithms for central plan merging as well. Another approach to merging a set of plans into a global plan deals with problems arising from both conflicts and redundant actions by using A* and a smart cost-based heuristic (Ephrati and Rosenschein, 1993b). It has also been shown that, by dividing the work of constructing subplans over several agents, one can reduce the overall complexity of the merging algorithm (Ephrati and Rosenschein, 1994). In other work on plan merging, Ephrati and Rosenschein (1993a; 1995) propose a distributed polynomial-time algorithm to improve social welfare (i.e., the sum of the benefits of all agents). Through a process of group constraint aggregation, agents incrementally construct an improved global plan by voting about joint actions. They even propose algorithms to deal with insincere agents and to interleave planning, coordination, and execution (Ephrati and Rosenschein, 1995). Unfortunately, interleaving planning and execution is already a very difficult issue in single-agent planning.

Recently, however, Tsamardinos et al. (2000) succeeded in developing a plan merging algorithm that deals with both durative actions and time. They construct a conditional simple temporal network to specify (temporal) conflicts between plans. Based on this specification, a set of constraints is derived that can be solved by a constraint solver. The
solution specifies the required temporal relations between actions in the merged plan. Although this algorithm is designed to merge actions planned for an additional goal of one agent into a pre-existing plan in a dynamic environment, we feel it could be used in a multi-agent context as well, as is done in (Kabanza, 1995), where temporal constraints are also specified. However, Kabanza does not deal with the duration and specific start and end time points of actions. Although some of these approaches have found quite successful solutions, most end up with two problems: they require a trusted third party, and it is sometimes impossible for all agents to construct their plans independently.

Theoretical abstraction

Korf (1987) states that (i) if subgoals are independent, the branching factor is divided (the branching factor being the average number of choices that can be made in each state of the search process), (ii) if subgoals are serializable, certain operators can be ruled out from the branching factor, and (iii) if subgoals are not serializable, they may reduce the search distance (in number of states) to the goal. Especially the first result supports the idea of plan merging. Furthermore, he found that macro-operators can be used to get directly from the subgoals to the goal (if properly dealt with). Thirdly, by abstracting the search space into multiple levels, one can reduce the number of states. These last theoretical results are exploited, for example, by the hierarchical planning methods.

Hierarchical planning

Corkill (1979) studied interleaved planning and merging using task reduction of hierarchical plans. He discovered that, on the one hand, conditions on the abstract plan operators can be formalized such that conflicts can sometimes be detected in an early phase of refinement. On the other hand, he also recognized that choices of refinement or synchronization at more abstract levels could lead to unresolvable conflicts at lower levels, and that therefore backtracking could be necessary.

An approach that considers both conflicts and positive relations is proposed by Von Martial (1989, 1990a,b, 1992). Plans are represented hierarchically, and need to be exchanged among the agents to determine such relations. If possible, relations are solved or exploited at the top level. If this is not possible, a refinement of the plans is made, and the process is repeated. For each specific type of plan relationship, a different solution is presented. Relations between the plans of autonomous agents are categorized; the main aspects are positive/negative relations, (non)consumable resources, requests, and favor relationships.

Clement and Durfee describe an interleaved planning process, where different agents jointly construct a plan to establish a set of goals. They use the class of planners called Hierarchical Task Network (HTN) planners. To avoid the backtracking problem recognized by Corkill, the agents must determine whether their plans interfere before the plans are
fully refined. Clement and Durfee derive so-called plan summary information from the hierarchical plan operators. Using this summary information, they can reason about possible interactions between different agents before the individual plans are completely specified. It would be interesting to analyze how summary information can be derived in a similar way and used to recognize opportunities to optimize the plans of individual agents (Clement and Durfee, 1999a,b). For applications where privacy is not really important, and where all problems tend to have a similar structure, these approaches are quite useful, since hierarchical plans are especially suited for reuse. However, these methods are not really suited to solving more general multi-agent planning problems where agents are truly autonomous and independent.
Partial Global Planning (PGP)

In the PGP framework (Durfee and Lesser, 1987a) and its extension, generalized PGP (Decker and Lesser, 1992, 1994), each agent has a partial picture of the plans of the other agents, maintained in a specialized plan representation. A single agent's plan includes a set of objectives, a long-term strategy (ordered lists of planned actions) for achieving those objectives, a set of short-term details (primitive problem-solving operations), and a number of predictions and ratings. Planning, which can be both the expansion of objectives into planned actions and the ordering of those actions, is based on a set of heuristics. Coordination is achieved by a process of negotiation: when an agent informs another of a portion of its plan, the other merges this information into its partial global plan. It can then try to improve the global plan by, for example, eliminating redundancy. Such an improved plan is shown to other agents, who can either accept, reject, or modify it. This process is assumed to run in parallel with the execution of the local plan. This method was first applied to the distributed vehicle monitoring test bed, but later on an improved version was also shown to work on a hospital patient scheduling problem, where Decker and Li (2000) used a framework for Task Analysis, Environment Modeling, and Simulation (TAEMS) to model such a multi-agent environment. An overview of the PGP-related approaches is given in (Lesser, 1998).

The PGP framework is very general, and can be extended in many ways to solve conflict-free benevolent multi-agent planning problems. For example, the negotiation process can be adapted such that it is no longer necessary for large parts of a plan to become globally accessible. Furthermore, we believe that the encapsulation of information in so-called resources would add significantly to the efficiency of coordinated planning algorithms.
2.4.5 Implicit distributed planning
It is not always necessary, or even desirable, that agents actually send parts of their plans to each other to ensure that no conflicts will arise upon the execution of these plans. Social laws coordinate agents more indirectly, and beforehand.

Social laws

A social law is a convention that each agent has to follow. Such laws restrict the agents, and they can be used to reduce communication costs as well as planning and coordination time. In fact, the work of Yang et al. (1992) and Foulser et al. (1992) on finding restrictions that make the plan merging process easier, as discussed in the previous section, is a special case of this type of coordination. A typical example of 'social' laws in the real world are traffic rules: because everyone drives on the right side of the road (well, almost everyone), virtually no coordination with oncoming cars is required. (Note that it is arbitrary on which side of the road one drives, as long as everyone takes the same side.) Generally, solutions found using social laws are not optimal, but they may be found relatively fast. How social laws can be created in the design phase of a multi-agent system is studied by Shoham and Tennenholtz (1995). Briggs (1996) proposed more flexible laws, where agents first try to plan using the strictest laws, but when a solution cannot be found the agents are allowed to relax these laws somewhat. They also showed how social laws can be learned.

Unfortunately, not all conflicts can be prevented using laws and regulations. Auctions are a method to resolve conflicting interests over, e.g., scarce resources. Moreover, auctions and even complete market models can be used to coordinate agents' plans without (costly) explicit communication of partial plans.

Auctions, markets, and decision theory

For a given resource, each agent usually has a different use, and therefore attaches a different value to such a resource. An auction is a way to make sure that a resource is given (sold) to the agent that attaches the highest value (called the private value) to it (Walsh et al., 2000; Wellman et al., 2001). A Vickrey (1961) auction is an example of an auction protocol that is used quite often. In a Vickrey auction each agent can make one (closed) bid, and the resource is assigned to the highest bidder for the price of the second-highest bid. This auction has some nice properties, such as that bidding agents are stimulated to bid exactly their private value (i.e., exactly what they think it is worth to them).

In (Zlotkin and Rosenschein, 1996), a negotiation protocol for two agents, each having their own goals in so-called state-oriented domains, is proposed: the Unified Negotiation Protocol. This protocol describes how agents should interact such that they agree on a 'deal' that describes a state of the world that is desirable for both agents, plus a joint plan to get to this state. If the goals of the agents conflict, the protocol tries to find a deal
that maximizes the utility for both agents. Other approaches are completely based on an economy of utilities. For example, market simulations can be used to distribute large quantities of resources among agents (Walsh and Wellman, 1999; Wellman, 1993, 1998). The basic principle of such markets is that with more demand the price increases, and with more supply the price decreases. For example, Clearwater (1996) shows how costs and money are turned into a coordination device. Auctions and markets can be used both to allocate resources and to assign tasks. An overview of value-oriented methods to coordinate agents is given in (Fischer et al., 1998). In some of these approaches, game theory is used to reason about the cost decisions; see, for example, the work by Sandholm and Lesser (1997), which is illustrated with results from a vehicle routing problem with multiple dispatch centers.
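To make the Vickrey mechanism discussed above concrete, here is a minimal sketch of a sealed-bid second-price auction for a single resource; the function is illustrative and ignores ties and reserve prices.

```python
def vickrey_auction(bids):
    """bids: {agent: bid}. The highest bidder wins but pays the
    second-highest bid, which makes bidding one's true private value
    a dominant strategy. Assumes at least two bidders."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    (winner, _), (_, second_price) = ranked[0], ranked[1]
    return winner, second_price

# Each taxi company bids its private value for a transportation order:
winner, price = vickrey_auction({"a1": 8.0, "a2": 11.5, "a3": 10.0})
assert (winner, price) == ("a2", 10.0)
```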
In the remainder of this section we first discuss some other aspects of multi-agent systems and distributed artificial intelligence in general that are relevant to this topic. Then we summarize what research has been done and we formulate some remaining research challenges.

2.4.6 Other related work in distributed artificial intelligence
Distributed artificial intelligence (DAI) is the study and modeling of action and knowledge in multi-agent systems (a definition based on Bond and Gasser, 1988, and Weiß, 1999). Multi-agent systems are 'systems in which several interacting, intelligent agents pursue some set of goals or perform some set of tasks' (Weiß, 1999). The construction and application of multi-agent systems is also considered a part of DAI research. The relevance of the subfields to our work is discussed here. (For a more complete overview, the reader is referred to 'Foundations of Distributed Artificial Intelligence' (O'Hare and Jennings, 1996), especially its first chapter (Moulin and Chaib-draa, 1996); a slightly different view on distributed artificial intelligence is given in a more recent work on 'Multi-agent systems' (Weiß, 1999).)

The notion of intelligent agents is omnipresent in distributed artificial intelligence. An agent is an autonomous problem solver (Wooldridge and Jennings, 1995). We use the term agents to denote autonomous entities like humans and companies, but also to denote the software programs that represent humans and companies. Several theories have been developed to support the design and construction of agents, for example theories about reactive agents (Maes, 1990) that have a direct mapping from situation to action, logic-based approaches (Genesereth and Nilsson, 1987, Chapter 13), where decision making is supported by some form of logical deduction, and theories about modeling the beliefs, desires, and intentions (BDI) of an agent (Bratman et al., 1988). Our work focuses on the planning aspect of an agent. Planning can be seen as logic-based reasoning. We could incorporate planning into a BDI model; however, we present the planning and coordination
algorithms independently of the agent architecture.

Coordination and cooperation between agents is studied in the context of multi-agent systems. A first prerequisite for coordination is communication. This involves the study of several agent languages, like speech acts (Searle, 1970), KQML (Finin and Fritzson, 1994), and, more recently, the FIPA Agent Communication Language (FIPA, 2001), and it also includes building relevant ontologies (Gruber, 1993). In this thesis, we assume that the agents use one of these languages, which enables them to specify the messages they need to send to each other as part of the coordination and negotiation protocols these agents use to interact. Negotiation protocols are typically used when competitive agents need to coordinate. This often involves establishing a price for a certain resource, or for a task that one agent performs for another.
2.5 Challenges
As we have seen above, many interesting results have been achieved concerning planning and coordinating agents. Nevertheless, much is missing. To begin with, there is not even a common definition of multi-agent planning.

Problem definition and categorization

For single-agent planning (also called AI planning) there is a commonly accepted definition of the 'basic' problem, i.e., classical planning, see Definition 2.6. Moreover, many extensions of this problem have led to more or less coherent research communities, such as planning with resources, planning with uncertainty and conformant planning, planning with time, etc., see Section 2.3. For AI planning, which has been studied at least since the late 1960s, many of those extensions have still not been completely researched. For multi-agent planning, even such a common definition and categorization of extensions does not exist. We are left with many different approaches to many different problems, and the connection between these problems is often unclear. We need a commonly accepted 'basic' problem definition for multi-agent planning, just like the one we have for single-agent planning. Furthermore, we need a categorization of extensions of this problem, such as multi-agent planning with cost optimization, with non-cooperative or untruthful agents, with learning agents, with coalition forming, and so-called open multi-agent systems, where agents may join or leave the system. Following the single-agent example, we feel that the basic multi-agent planning problem should be as restricted as possible, without being so restrictive that no real problems can be solved. Secondly, the agents in a multi-agent planning system should be really autonomous and possibly heterogeneous, so that we can distinguish multi-agent planning from (distributed) single-agent planning.
In Section 2.4, many definitions of multi-agent planning have been discussed, and in Definition 2.20 we have defined the conflict-free benevolent multi-agent planning problem, which meets the two requirements we just gave. This definition uses the same notation as the definition of single-agent planning, i.e., it uses propositional logic to specify the conditions on and effects of actions. It is not self-evident that the same formalism is the best for multi-agent planning problems, so we need to find a suitable formal language and framework for multi-agent planning.

Formal language and framework

Firstly, we need a language to formally model the multi-agent planning problem and instances of this problem, and to describe algorithms that solve this problem, just as we use propositional logic for single-agent planning. Although coordination problems and solutions can be described by (an extension of) propositional logic, a single proposition usually is not the basic relevant entity for coordination. Most coordination problems are about the conflicting use of resources. Different agents may have different views on such resources, i.e., they would describe such a resource by a different set of propositions. To allow such different views by different agents, we think it is important to encapsulate resources and use those as the atoms of a formal language for multi-agent planning.

Secondly, we need a framework to describe algorithms for multi-agent planning, like the refinement-planning framework used for single-agent planning. Such a framework should supply us with both a generic view on existing multi-agent planning methods and a general approach to multi-agent problems. Moreover, the framework should enable us to plug in many (if not all) existing single-agent planning methods for each (heterogeneous) agent. To provide all this, such a multi-agent planning framework should consist of two parts: on the one hand, we need a mechanism to adjust local (single-agent) planning methods to work in a multi-agent system, and on the other hand we need a method to coordinate these modified local planning methods. An example of such a method could be a market, as described in Section 2.4.5.

Algorithms and applications

A framework itself is not of much use without a clear way to put existing algorithms into this frame. The final goal is to have a generic approach to many different applications that involve multi-agent planning, for example by combining approaches described in this chapter. However, current work on multi-agent planning is difficult to compare and organize, and therefore to reuse. Moreover, it is incomplete with respect to many of the extensions that have interesting applications. For example, often large parts of plans are exchanged among the 'autonomous' agents, and not enough attention is given to applications that involve the use of scarce resources, time windows, optimization of costs, and uncertainty in planning in a multi-agent context.
In this thesis

We cover some of these interesting and sometimes very hard problems in this thesis. We give a formal definition of a basic multi-agent planning problem, we introduce a unique language and a framework to describe multi-agent planning problems and algorithms, and we present a plan merging algorithm that uses this framework to coordinate the plans of multiple independent taxi companies.
Chapter 3

The action resource formalism

To support the production of a plan by a computer, a method to model a plan is required. For coordinating the plans of different companies, some information on their plans needs to be communicated. For example, if a number of taxi companies want to coordinate, they need to be able to comprehend each other's proposals for changes. Hence, in general, in order to cooperate effectively, planners need a language to specify their plans, and they need to be able to communicate. In this chapter, we introduce a detailed formal language, based on the ideas and some of the formalization given in two previous articles (De Weerdt et al., 2003; Tonino et al., 2002). Automated planning systems can use this language to specify their plans and goals, and to specify possible exchanges between agents. In other words, this formalism can be used both for communication (by formalizing the exchange of resources) and for planning. This helps us to describe coordinated planning methods in a generic framework, such as the algorithms presented in the next chapter.

This formal framework consists of plans, actions, and resource facts, and is therefore called the action resource framework (ARF). In Section 2.4.1, a resource fact was defined as a set of propositions that describes a physical object or a product at a specific place and time, such as a taxi at a certain location, a person, capacity on a plane from one location to another, fuel, or a tool. In the ARF, such resources are taken as the atomic objects for planning and the coordination of actions. The operators or actions in this framework are processes that consume and produce resource facts. Generally, an action has a precondition formula to denote the required input resource facts, and a description of the output resource facts. Goals are also specified by a formula, which describes the required set of resource facts.

To attain such goals, one can combine actions into plans. Plans specify the dependencies between actions, but do not necessarily specify a linear order of actions. This means that concurrency can be represented explicitly in a plan, and
one plan may represent many possible execution sequences, even for just one agent. This is a generalization of the plan defined in Definition 2.7, which represents exactly one possible execution sequence. An agent or a group of agents can use a plan to realize some goals, given a set of initial resource facts.

In this chapter, we first define a resource language and its semantics (Section 3.1). This resource language (RL) consists of resource facts and resource formulas. Resource formulas are used to describe actions (Section 3.2), and a planning problem is defined using both actions and resource formulas to describe the initial state and the goal. In Section 3.3, plans are defined and a transition system is given that formalizes the operational semantics of these plans. Furthermore, we describe when a plan can be used to solve a given problem. The resource language cannot be used to describe constraints on specific attributes. Therefore, it is extended in Section 3.5 with a constraint language, CL. This is a many-sorted language that can be used to specify complex goal constraints and preconditions. In that section, the definitions of actions and plans are extended to deal with such complex constraints. In the next chapter, the concepts introduced in this chapter are used to define a multi-agent planning problem and some multi-agent planning algorithms.
3.1 Resources
Objects and products in the (real) world can be divided into two groups: those that are relevant to a planning problem, and those that are not. Irrelevant objects can occur as objects in the terms of our language, or be omitted completely, whereas relevant objects are represented by resource facts. In this way, the world can be modeled as a finite collection of resource facts. Each resource fact represents a particular atomic product or object with all its properties at a specific time and place. This implies that for each object many possible resource facts exist, but only one at a time. Resource facts may also be used to denote more abstract 'products', such as possibilities and availabilities. The properties of a resource are described by the terms in our language, and are defined for all planning agents in a specific domain. A fundamental property of resources is that they can be consumed and produced by actions.

The state of the world can be described by a resource formula in the resource language RL. Such a formula is a conjunction of resource facts, and each resource fact has a truth value. In general, a resource formula describes a set of states, i.e., all states of the world in which these resources occur. Therefore, we also use a resource formula to describe a goal. The resource language is a special many-sorted language; below, we specify the characteristics that distinguish it from standard many-sorted languages. For a definition of a standard many-sorted algebra the reader is referred to Appendix A.1, where some mathematical preliminaries are given (Gallier, 1987).
By a ground formula we mean a formula in which none of the terms contains variables. In this thesis, functions are total unless specified otherwise.

Example 3.1. A resource, for example a taxi at a specific time, has a number of attributes, such as a license plate, a location, a time, and a capacity. The values of these attributes are terms of a specific sort. For example, a location value such as 'Amsterdam' is a constant term of sort 'loc' that has as its domain the set {Amsterdam, Rotterdam, etc.}, and a capacity value of '3' is a constant term of sort 'capacity' that has as its domain the natural numbers (N). We describe a resource fact by a special kind of predicate, i.e., a resource predicate. To describe the resource discussed in this paragraph, we use a notation where the attributes of the resource predicate are of the form 'value:sort', as can be seen here: taxi_{1:identity}(20BX10:plate, Amsterdam:loc, 3:capacity). Each resource fact has a specific type that defines the number and sorts of its attributes. The type is usually denoted by the predicate name, in this case 'taxi'. Attributes such as the one of sort 'plate' can be used to relate this resource fact to a real taxi in a unique way.

The attribute 'identity' has a special function. In a description of a state of the world, or in a plan, it may happen that two ground resource facts occur with exactly the same attributes. If we described all the (possibly infinitely many) attributes of a resource, such as the sequence of actions that produced it and the exact time and location, this would not happen, but describing these detailed attributes is practically impossible. Therefore, we omit such attributes and use one additional attribute instead, viz. 'identity'. The attribute 'identity' is a unique value assigned to each ground resource fact to make sure that whenever two otherwise identical resource facts produced by different (parts of) plans are added to a set of resource facts, this set does contain two of these resource facts. (A similar effect can be accomplished by using multi-sets; however, in a multi-set the resources are truly interchangeable, which is not desirable, because such 'identical' resources have been produced in a different context, and using another of those 'identical' resources could introduce illegal dependencies.) Because of its special nature, this attribute is specified in a special way, as a subscript to the predicate name. Since it is the only attribute that is put as a subscript, we can omit the sort name 'identity'.

The possible attribute values of a resource fact are denoted by terms. Besides constants, terms in a resource fact may also be variables (of the correct sort) or even functions on terms, as defined in Definition 3.4. This allows us to specify groups of resource facts that have a similar form.

Example 3.2. An example of a resource fact in which terms are variables or functions on terms is the resource fact

taxi_2(p:plate, l:loc, 3 − c:capacity)
that specifies a taxi at an unspecified location l with capacity (3 − c). Such resource facts are very useful to define goals and conditions for actions. Note that this resource fact has identity value 2.

To formally define terms and such resource facts, we first introduce all symbols that can be used in our language. As said before, a formula in RL is typically used to describe a state of the world by a conjunction of resource facts. Each resource fact can have several attributes of different sorts. So we need symbols for conjunctions, sorts, variables, constants, and the resource predicates. The resource formulas of this many-sorted resource language are then defined in Definition 3.7. This language is a variant of (standard) many-sorted languages (see Appendix A.1, Definition A.2), where the number of logical connectives is restricted and a special sort identity is introduced.

Definition 3.3. The alphabet of the many-sorted resource language RL consists of the following sets of symbols:

• A special sort identity.

• A countable set S = S′ ∪ {bool} of sorts or types, such that S′ contains neither bool nor identity.

• A logical connective ∧ (and) of rank (⟨bool, bool⟩, bool).

• For every sort s in S, a countably infinite, disjoint set of variables Vs = {x0, x1, . . .}, each variable xi of rank (∅, s). Furthermore, we define V = ⋃_{s∈S} Vs.

• Auxiliary symbols: '(' and ')'.

• An S-ranked alphabet Σ of symbols consisting of:

  – For every sort s ∈ S, a countable set Fs of function symbols f0, f1, . . ., and a rank function r : Fs → S⁺ × S, assigning a pair r(f) = (u, s) to every function symbol f. Furthermore, we define F = ⋃_{s∈S} Fs.

  – For every sort s ∈ S ∪ {identity}, a set Cs of constant symbols c0, c1, . . ., each of rank (∅, s). Furthermore, we define C = ⋃_{s∈S} Cs.

  – A set of resource (predicate) symbols: a countable set R of special predicate symbols r0, r1, . . ., each of rank (⟨identity, S*⟩, bool).

The sets V, C, F, and R must be disjoint.

The difference from a standard many-sorted first-order language (see Definition A.2) is that in the resource language
1. the following symbols are not used: ∨ (or), → (implication), ↔ (equivalence), ¬ (not), ⊥ (bottom), ⊤ (top), and the first-order symbols ∀s (for all) and ∃s (exists) for every sort s ∈ S (although we use the existential quantifier implicitly),

2. for every sort s ∈ S, we do not have the equality symbol =s,

3. a special sort 'identity' is introduced, for which no variables and functions exist, and

4. the set of symbols is extended with a set of resource predicate symbols instead of standard predicate symbols.

This language is rather limited compared to a standard many-sorted first-order language. However, it suffices for describing the most elementary aspects concerning planning, such as the initial state, a goal, actions, and a plan. In Section 3.5 we extend this language to describe more complex properties of goals and actions by constraints on attributes.

Having described the symbols and their rank, we now define how these symbols can be used. In Definition A.3, the set of all terms of sort s that may occur in a constraint or as attributes to a predicate in a many-sorted logic is defined by the set TERM_s^L. The set of terms TERM_s^RL for our resource language is defined in exactly the same way. These terms are used to describe attributes of our resource predicates.

Definition 3.4. For any s ∈ S, the set TERM_s^RL of RL-terms of sort s is defined as follows:

1. Every constant of sort s ∈ S ∪ {identity} is an RL-term of sort s.

2. Every variable of sort s ∈ S is an RL-term of sort s.

3. If t1, . . . , tn are RL-terms, each ti of sort si, and f is a function symbol of rank (u, s) (with u = ⟨s1, . . . , sn⟩), then f(t1, . . . , tn) is an RL-term of sort s.

Example 3.5. The attributes 1, 20BX10, Amsterdam, and 3 in Example 3.1 are all constant terms, whereas p and l from Example 3.2 are variable terms, and (3 − c) is a function on terms; but all attributes are terms.

Resource predicates describe resource facts, i.e., the atomic formulas of our resource language. A predicate is constructed using terms and a resource symbol. This is defined as follows:

Definition 3.6. Given a many-sorted resource language RL with sorts S, if t0, . . . , tn ∈ TERM^RL are terms, t0 is of sort identity, for 1 ≤ i ≤ n each ti is of sort si, and r ∈ R is a resource symbol of rank (u, bool) (with u = ⟨identity, s1, . . . , sn⟩), then r = r_{t0}(t1, . . . , tn) is an atomic resource predicate or resource fact. The identity of a resource, denoted by id(r), is the term t0.
In the remainder of this thesis, we often speak just of a resource, while we actually mean the 'resource fact' that describes this resource. Such resources can be combined with conjunctions into more complicated resource formulas. In Definition A.4 on page 149, the set of all formulas for a common many-sorted language is given. In our language, we only use a variant of these: special resource formulas FORM_RL with only conjunctions, extended with a uniqueness requirement, defined as follows.

Definition 3.7. Given a many-sorted resource language RL with sorts S, and a number n ≥ 1 of atomic resource facts r1, . . . , rn, if the identity attributes are unique, i.e., there do not exist i and j, 1 ≤ i, j ≤ n, i ≠ j, such that id(ri) = id(rj), then (r1 ∧ . . . ∧ rn) is a resource formula. The set FORM_RL of resource formulas is the set of all such resource formulas.

Example 3.8. The resource language of taxis is defined as follows. Let S = {bool, loc, capacity}. The domains of these sorts are defined as

• Dbool = {true, false} and Didentity = N,

• Dloc = {Amsterdam, Rotterdam} and Dcapacity = N.

Furthermore, we have variables Vcapacity = {c}, and a resource predicate symbol R = {taxi}. We know that 'Amsterdam' is a term of sort loc, just as 1 and c are terms of sorts identity and capacity, respectively. Also, we can derive that 'taxi_1(Amsterdam:loc, c:capacity)' is a resource formula (∈ FORM_RL).

Remark 3.9. As in an ordinary many-sorted language, the interpretation of variables of sort s is given by an assignment function vs : Vs → Ds that assigns a value from the domain of s (see Definition A.7 on page 150). The semantics of our resource language is the restriction of that of an ordinary many-sorted language to the resource formulas defined in Definition 3.7. The corresponding definitions can be found in Appendix A.2, particularly Definition A.10 on page 151 for the semantics of a term and Definition A.11 on page 151 for the semantics of a formula.

We introduce the following notational conventions. If a resource fact r = r_{t0}(t1, . . . , tn) is given, the term ti (for 0 ≤ i ≤ n) is denoted by r^i. We say that the type of the resource fact r is r, denoted by type(r), i.e., the predicate symbol or name. The number n of (non-identity) attributes of r is denoted by |r|. Note that if type(r) = type(r′), then it holds that |r| = |r′|, because these resource facts have the same predicate symbol and thus the same rank. However, not every two resource facts r and r′ with the same rank are of the same type; they need to have the same resource predicate symbol as well. The (free) variables
in a formula ϕ are denoted by var(ϕ), and the (free) variables in a set of resources R by var(R). To decide whether a set of (ground) resources satisfies a resource formula, we need to find a subset of resource facts (of the same size as the number of conjuncts in the formula) such that for each resource fact in the set exactly one equivalent conjunct can be found. When, after the variables in the conjunct are substituted by other terms of the same sort, a resource and a conjunct (resource fact) are the same, except for their identity attributes, we say they are equivalent. For a resource r with some variables, a substitution θs supplants variables in r of sort s by terms of sort s. We write r θs to denote a resource where each variable x of sort s is replaced by θs (x). We write even simply r θ to replace the variables of all sorts occurring in r , as defined by θ . Furthermore, ε is the empty substitution. Resources are used in actions. To specify which resources can be potentially used, we need a semantic notion of equality where the identity attribute is not considered. We call two resources equivalent when they are equal except for their identities. This means that they can be interchanged.2 Definition 3.10. Given two resources r and r 0 , we say that r is equivalent to r 0 , denoted by r ≡ r 0 iff type(r ) = type(r 0 ) and that for all 1 ≤ i ≤ |r | holds that r i = (r 0 )i . This equivalence is extended to sets of resources as follows: given two sets of resources R1 and R2 , we say R1 ≡ R2 , iff a bijective function f : R1 → R2 exists, such that for each r1 ∈ R1 it holds that f (r1 ) ≡ r1 . Note that for equivalence we do not require that r 0 = (r 0 )0 . In fact, we require for any two resources r , r 0 , r 6= r 0 that r 0 6= (r 0 )0 , in other words, that each resource has a unique identity. Definition 3.11. Given two resources r and r 0 , we say that r satisfies r 0 , denoted by r |=θ r 0 iff a substitution θ exists, such that r ≡ r 0 θ . A consequence of this definition is that if r |=θ r 0 , that then also r |=ε r 0 θ . In this case we may also omit the empty substitution and just write r |= r 0 θ . Note that r |= r 0 if r ≡ r 0 . Next, we introduce some more notation: given a formula ϕ, we define Rϕ to be the V set of the conjuncts of ϕ. Vice versa, given a set of resources R, we define ϕR = R. As we already mentioned, a desired goal state or a set of desirable world states can be described by a resource formula. Given the definition of one resource fact satisfying another, we now define that a set of resources satisfies such a description of the desired state(s) of the world, when all resource facts can be satisfied. 2 Equivalent
As we already mentioned, a desired goal state or a set of desirable world states can be described by a resource formula. Given the definition of one resource fact satisfying another, we now define that a set of resources satisfies such a description of the desired state(s) of the world when all of its resource facts can be satisfied.
Definition 3.12. A resource formula ϕ ∈ FORM_RL, where ϕ = r1 ∧ ... ∧ rm, is satisfied by a set of resources R′, denoted by R′ |=_RL ϕ, iff a substitution θ and m different resources r′1, ..., r′m ∈ R′ exist, such that for all 1 ≤ i ≤ m it holds that r′i |=_θ ri, and for all 1 ≤ i, j ≤ m with i ≠ j it holds that r′i ≠ r′j. Similarly, we say that the set of resources R_ϕ = {r1, ..., rm} is satisfied by R′, denoted by R′ |=_RL R_ϕ.

It is easy to see that if we define the set R″ as precisely the set {r′1, ..., r′m} ⊆ R′ of m different resources from this definition, then trivially R″ |=_RL ϕ. We introduce a notation for the collection of all sets of resources that satisfy ϕ.

Definition 3.13. Given a formula ϕ, the set of all sets of resources that satisfy ϕ, denoted by [[ϕ]], is defined as {R | R ⊆ FORM_RL and R |=_RL ϕ}.

From Definition 3.12 we know that one set of resources R′ satisfies another set of resources R if for each resource r from R a resource exists in R′ that satisfies it. This property is very useful in proofs about the satisfaction relation.

Proposition 3.14. R′ |=_RL R iff an injective function f : R → R′ exists such that for all r ∈ R it holds that f(r) |=_RL r.

Proof. By Definition 3.12 we know that m different resources r′1, ..., r′m ∈ R′ exist such that for all 1 ≤ i ≤ m it holds that r′i |=_θ ri. So define f : R → R′ by f(ri) = r′i. Note that f is injective because for all 1 ≤ i, j ≤ m with i ≠ j it holds that r′i ≠ r′j.

For ease of notation, we lift the use of the symbol |= for resource facts to sets of resource formulas as defined in Definition 3.12. Consequently, R′ |= R is an alternative notation for R′ |=_RL R. In the following property of this satisfaction operator, this alternative notation is used.

Proposition 3.15. For four sets of resources R1, R2, R3, and R4, such that R1 ∩ R3 = ∅, the following property holds: if R1 |= R2 and R3 |= R4, then R1 ∪ R3 |= R2 ∪ R4.

Proof. Given R1 |= R2 and R3 |= R4, we know (by Proposition 3.14) that injective functions f1 : R2 → R1 and f2 : R4 → R3 exist such that for all r ∈ R2 it holds that f1(r) |= r and for all r ∈ R4 it holds that f2(r) |= r. Now define the function

  f(r) = f1(r) if r ∈ R2, and f(r) = f2(r) if r ∈ R4.

This f is injective, because f1 and f2 are injective and their ranges R1 and R3 are disjoint. By Proposition 3.14 it then follows directly that R1 ∪ R3 |= R2 ∪ R4.
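Before turning to actions, here is a small illustration of how set-level satisfaction (Definition 3.12, in the injective-function reading of Proposition 3.14) could be decided by backtracking search, continuing the sketch above. One simplification to be aware of: this version matches each conjunct with its own substitution, whereas Definition 3.12 requires a single global θ; for variable-disjoint conjuncts the two coincide.

```python
def set_satisfies(R_prime, R) -> bool:
    """R' |= R: find an injective assignment of pairwise different
    resources in R' to the facts in R such that each chosen resource
    satisfies its fact (Definition 3.12 / Proposition 3.14)."""
    R = list(R)
    R_prime = list(R_prime)

    def match(i, used):
        if i == len(R):
            return True
        return any(match(i + 1, used | {j})
                   for j, rp in enumerate(R_prime)
                   if j not in used and satisfies(rp, R[i]) is not None)

    return match(0, frozenset())

# Two ground taxis satisfy a request for a taxi in each city.
R1 = [Resource('taxi', (0, 1), ('Amsterdam', 4)),
      Resource('taxi', (0, 2), ('Rotterdam', 4))]
goal = [Resource('taxi', (0, -1), ('Rotterdam', ('c', 'capacity'))),
        Resource('taxi', (0, -2), ('Amsterdam', ('c', 'capacity')))]
assert set_satisfies(R1, goal)
```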
To specify the dynamics of resources, such as the movement of a taxi, we need to define operators on resources.

[Figure 3.1 diagram: the load() action with its input resources taxi(p, x) and passgr(x) drawn below it and its output resources taxi(p, x) and passgr(f(p)) drawn above it; the legend distinguishes actions, input and output resources, and dependencies.]
Figure 3.1: An action such as ‘load()’ has two input resources displayed below the action, and two output resources displayed above.
3.2 Actions
The basic process of consumption and production of resources is called an action. Actions can be compared to the operators in STRIPS (Fikes and Nilsson, 1971), described as state changes in the previous chapter (Definition 2.3 on page 10). The main difference is that in the resource language the precondition of an action is consumed. In other words, the state change is defined as follows: if an action o is performed in a state S, the input resources are removed from S, resulting in an intermediate state S″, and then its output resources are added to S″, resulting in a state S′. So an action o performed on S results in a state change from S to S′. In STRIPS, the precondition is never automatically removed from the state, and the postcondition (effect) consists of a separate add and delete list.

Example 3.16. (See Figure 3.1.) The action load needs two resources, a taxi and a passenger (passgr), at the same location x. When the load action is executed, these two resources are consumed, and two new resources are produced with different identities. The location attribute of the new passenger resource is the license plate of the taxi. This attribute, together with the attributes of the taxi resource, denotes the location of the passenger. The (term) function f : plate → loc is in fact only a type cast. (In this example the (unique) identity values are irrelevant and therefore omitted; in Definition 3.17 and the discussion about uniqueness at the end of this section, we show how the identity attributes are dealt with and how unique values can be generated.)

  load() : taxi(p : plate, x : loc) ∧ passgr(x : loc) ⇒ taxi(p : plate, x : loc) ∧ passgr(f(p) : loc)

So an action consists of an input resource formula that describes which resources are consumed, and an output resource formula that describes how the output resources are produced. In general, an action also has a set of parameters, and the input resources may contain variables (see, e.g., Example 3.18). The output resources should be uniquely defined given the input resources and the parameters, so the only variables allowed in the output formula are those that occur in the input formula and the parameters. This is a generalization of the range restrictedness as defined for skills (De Weerdt et al., 2003), where actions do not have parameters.
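Continuing the Python sketch, the state change just described is easy to state for a ground action: consume the input resources, then add the output resources. The function name apply_action and the plain-set state representation are illustrative assumptions, not the thesis' own machinery.

```python
def apply_action(state: frozenset, inputs: frozenset, outputs: frozenset) -> frozenset:
    """Perform a ground action o in state S: remove the inputs (they are
    consumed, unlike a STRIPS precondition), then add the outputs."""
    if not inputs <= state:
        raise ValueError('not all input resources are available')
    return (state - inputs) | outputs

# A ground instance of load() from Example 3.16: a taxi with plate p1 at
# Amsterdam plus a passenger at Amsterdam become a taxi at Amsterdam and
# a passenger at f(p1), i.e., inside the taxi (identities are illustrative).
S = frozenset({Resource('taxi', (0, 1), ('p1', 'Amsterdam')),
               Resource('passgr', (0, 2), ('Amsterdam',))})
S2 = apply_action(S,
                  inputs=S,
                  outputs=frozenset({Resource('taxi', (3, 1), ('p1', 'Amsterdam')),
                                     Resource('passgr', (3, 2), ('f(p1)',))}))
```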
In the following definition, we formally define such an action.

Definition 3.17. Action. We introduce a countable set of action symbols Ω, where each symbol has rank (⟨identity, S*, bool*⟩, bool*). The set OPS_RL is the domain of the sort action, where an action is defined as follows. Given an action symbol o ∈ Ω and its rank function r(o), if
1. id is a term of sort identity,
2. x1, ..., xn are variables of sorts s1, ..., sn, respectively,
3. in1, ..., inm are atomic resource facts, such that for all 1 ≤ i ≤ m
   (a) the identity id(ini) is the tuple (id, −i), where i is called the sequence number of ini, and
   (b) the variables var(ini) are labeled with id, and
4. out1, ..., outk are atomic resource facts, such that for all 1 ≤ i ≤ k
   (a) the identity id(outi) is the tuple (id, i), where i is called the sequence number of outi,
   (b) the variables var(outi) are labeled with id, and
   (c) var(outi) ⊆ ⋃_{j=1}^{m} var(inj) ∪ {x1, ..., xn},
then o_id(x1, ..., xn) : ϕ_{{ini | 1 ≤ i ≤ m}} ⇒ ϕ_{{outi | 1 ≤ i ≤ k}} is an operator or action having the following signature: o : identity × S^n × FORM_RL → FORM_RL.

The simplest form of an action, without any parameters and variables, is a ground action. This is a tuple of two sets of ground resources: the input resources and the output resources. The semantics of this tuple is that whenever all ground input resources are available, they can be consumed, and all ground output resources are produced in return.

For an action o = o_id(x1, ..., xn) : ϕ_RL ⇒ ψ_RL, we define the set of input resources in(o) as {ini | 1 ≤ i ≤ m}, and we define the set of output resources out(o) as {outi | 1 ≤ i ≤ k}. All resources in an action are denoted by res(o) = in(o) ∪ out(o). The identity of o is denoted by id(o), and each parameter xi is denoted by parami(o) for 1 ≤ i ≤ n; the set of all parameters is defined by param(o) = {parami(o) | 1 ≤ i ≤ n}. Finally, var(o) is the set of variables in an action o except param(o). We lift these definitions to sets of actions O as follows: in(O) = {in(o) | o ∈ O}, out(O) = {out(o) | o ∈ O}, res(O) = {res(o) | o ∈ O}, id(O) = {id(o) | o ∈ O}, param(O) = {param(o) | o ∈ O}, and var(O) = {var(o) | o ∈ O}. The action o is said to contain the resource r if r ∈ res(o); this is denoted by o(r).
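As a rough counterpart of Definition 3.17 in the running Python sketch, an action occurrence can be represented as follows; the check method verifies the identity scheme for input and output resources and the range restrictedness of condition 4(c). It anticipates Example 3.18 below, and all names are, again, illustrative.

```python
from dataclasses import dataclass
from typing import Tuple

def variables(r: Resource):
    return {t for t in r.attrs if is_var(t)}

@dataclass
class Action:
    """o_id(x1, ..., xn) : in_1 /\ ... /\ in_m => out_1 /\ ... /\ out_k."""
    name: str              # action symbol from Omega, e.g. 'move'
    identity: int          # id, unique per occurrence, id != 0
    params: Tuple          # the parameters x1, ..., xn
    inputs: Tuple          # in(o): resource facts with identity (id, -i)
    outputs: Tuple         # out(o): resource facts with identity (id, i)

    def res(self):
        """res(o) = in(o) ∪ out(o)."""
        return set(self.inputs) | set(self.outputs)

    def check(self) -> bool:
        """Conditions of Definition 3.17: sequence-numbered identities, and
        output variables drawn from input variables and parameters (4c)."""
        ok_in = all(r.identity == (self.identity, -(i + 1))
                    for i, r in enumerate(self.inputs))
        ok_out = all(r.identity == (self.identity, i + 1)
                     for i, r in enumerate(self.outputs))
        allowed = set(self.params).union(*(variables(r) for r in self.inputs))
        return ok_in and ok_out and all(variables(r) <= allowed for r in self.outputs)

# Example 3.18: move_17(y) : taxi_(17,-1)(x) => ride_(17,1)(x, y) /\ taxi_(17,2)(y)
move17 = Action('move', 17, params=(('y', 'loc'),),
                inputs=(Resource('taxi', (17, -1), (('x', 'loc'),)),),
                outputs=(Resource('ride', (17, 1), (('x', 'loc'), ('y', 'loc'))),
                         Resource('taxi', (17, 2), (('y', 'loc'),))))
assert move17.check()
```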
Example 3.18. The action move has one parameter, y, which specifies the destination of the taxi's move. The action consumes a taxi resource at the origin location (x) and produces two resources: a taxi at y (with a different identity), and a ride resource to denote that it is possible for a passenger to get from x to y.

  move_17(y_17 : loc) : taxi_(17,−1)(x_17 : loc) ⇒ ride_(17,1)(x_17 : loc, y_17 : loc) ∧ taxi_(17,2)(y_17 : loc)   (3.1)

In this example, the set of all input resources of the move action is in(move_17) = {taxi_(17,−1)(x_17 : loc)}, the set of all output resources is out(move_17) = {ride_(17,1)(x_17 : loc, y_17 : loc), taxi_(17,2)(y_17 : loc)}, the identity is id(move_17) = 17, the only parameter is param(move_17) = {y_17}, and the used variable is var(move_17) = {x_17}.

Uniqueness of resources

In the previous section we already argued that ground resources need a unique identity to indicate equivalent resources in a description of the state of the world or in a plan. Since an action may be executed many times, it is also important to distinguish between several occurrences of such an action. Furthermore, we require that each (occurrence of an) action in a set of actions O has a unique set of variables and parameters. This latter requirement ensures that when values are assigned to these variables using a substitution, we can simply join the substitutions for all actions to create a substitution for the complete set O. If the same symbol for a variable were used in different actions, such a global substitution could introduce unwanted relations.

The solution used to meet these uniqueness requirements is to specify an identity term for each action (the id of sort identity in Definition 3.17), and to derive the identities of the input and output resources from the identity of the respective action and their sequence number. In this thesis, we use N+ ∪ (N × Z) as the domain for the sort identity, where N+ is used to denote the identity of actions and N × Z is used for the identity of resources. (We conform to the following convention: N = {0, 1, 2, 3, ...} and N+ = {1, 2, 3, ...}.) Furthermore, the definitions above ensure the uniqueness of actions, resources, and variables in a domain in the following way:
1. Each action o in a set of actions (for example, in a planning problem) has a unique identity id ≠ 0.
2. Each input resource of an action o has as identity a tuple (id, −i), where 1 ≤ i ≤ |in(o)| is the sequence number of the input resource (Definition 3.17 of actions).
3. Each output resource of o has as identity a tuple (id, i), where 1 ≤ i ≤ |out(o)| is the sequence number of the output resource (Definition 3.17).
4. Each variable and each parameter in an action is labeled with the identity of the action (Definition 3.17).
5. Each resource rI in an initial state I has as identity a tuple (0, fI(rI)), where fI is a bijective function fI : I → {1, ..., |I|}.
6. Each resource rG in a set of goal resources G has as identity a tuple (0, −fG(rG)), where fG is a bijective function fG : G → {1, ..., |G|}.
Apart from the properties stated in Definition 3.17, these conditions are mainly realized by the definition of a planning problem.
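These conventions can be made operational with a few small helpers in the running sketch: initial-state resources are renumbered to (0, i) and goal resources to (0, −i), matching items 5 and 6, while a simple counter yields fresh action identities for item 1. This is a hypothetical construction, not the thesis' own code.

```python
from itertools import count

_action_ids = count(1)  # item 1: action identities are 1, 2, 3, ... (never 0)

def fresh_action_id() -> int:
    return next(_action_ids)

def initial_state(facts):
    """Item 5: the i-th initial resource gets identity (0, i)."""
    return {Resource(r.rtype, (0, i + 1), r.attrs) for i, r in enumerate(facts)}

def goal_resources(facts):
    """Item 6: the i-th goal resource gets identity (0, -i)."""
    return {Resource(r.rtype, (0, -(i + 1)), r.attrs) for i, r in enumerate(facts)}
```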
3.3 Planning
Planning problem description

Abstractly, planning is the problem of getting from where you are to where you want to be, using a given set of potential actions to change your situation. The most common representation of a situation or state uses propositions, as in STRIPS (Fikes and Nilsson, 1971). A planning problem in STRIPS is thus defined by a set of all propositions that are allowed, a set of propositions describing the initial state, a set describing the goal states, and a set of operators defined by state changes (see Definition 2.6 on page 10). A planning problem can be described in the action resource formalism in a similar way, using resources instead of propositions.

Definition 3.19. Given the language RL, a planning problem is a triple Π = (O, I, G), where
• O ⊆ OPS_RL is a set of possible operators to describe state changes in this domain,
• r1, ..., rm is a sequence of atomic resource facts, and I = {ri | 1 ≤ i ≤ m} is a set of resources that describes the initial state, such that for all 1 ≤ i ≤ m the resource ri is ground and id(ri) = (0, i), and
• r1, ..., rn is a sequence of atomic resource facts, and G = {ri | 1 ≤ i ≤ n} is the goal specification, such that for all 1 ≤ i ≤ n the identity id(ri) = (0, −i).

Consequently, [[ϕG]] denotes the set of all desirable ground states R for which R |= ϕG (see Definition 3.13).
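Putting the pieces of the sketch together, a planning problem Π = (O, I, G) for the taxi domain might be assembled as follows; the PlanningProblem name and the concrete domain data are again illustrative.

```python
from dataclasses import dataclass

@dataclass
class PlanningProblem:
    """Definition 3.19: Pi = (O, I, G)."""
    operators: tuple   # O: the available action specifications
    initial: set       # I: ground resources with identities (0, i)
    goals: set         # G: goal resources with identities (0, -i)

# One taxi in Amsterdam; the goal is to have a taxi in Rotterdam.
I = initial_state([Resource('taxi', None, ('Amsterdam',))])
G = goal_resources([Resource('taxi', None, ('Rotterdam',))])
Pi = PlanningProblem(operators=(move17,), initial=I, goals=G)
assert not set_satisfies(Pi.initial, Pi.goals)   # a plan (a move) is still needed
```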
Actions specify resource transformations. Using actions, we can discuss resource transformation schemes built from a set of action specifications, where the inputs of an action o may depend on the outputs of other actions oi. We call such a structure a plan. The problem is to find such a structure, using actions from O, that transforms the initial state I into a goal state that satisfies ϕG.

Plans

A plan is a specification of a finite collection of actions together with a set of dependency relations. Given a set of resources, such a plan can be used to satisfy a goal. Here, the set of resources given and the goal stated specify the context of the structured set of actions. Often, the same set of actions can be used in different contexts, i.e., we have a plan that can be used for different pairs of given resources and specified goals. Hence, we first concentrate on the structured set of actions separately from the planning context. In this structured set, we distinguish between a partial ordering specifying the (global) dependency relation between actions and a dependency relation specifying the (local) dependencies between the input and output resources of actions that are globally dependent upon each other.

Definition 3.20. An action dependency structure is a partially ordered set (O,