International Conference on Databases and Expert Systems Applications
A Cooperative-Architecture Expert System for Solving Large Time/Travel Assignment Problems
Yves Caseau
Bellcore, 445 South Street, Morristown NJ 07962-1910
[email protected]
Peter Koppstein
Bellcore, 444 Hoes Lane, Piscataway NJ 08854-4182
[email protected]
Abstract: In this paper, we consider the problem of assigning tasks to operators according to a large set of constraints that include time sensitivity and travel optimization. Our practical instance of this problem combines computational complexity (scheduling the tasks for one technician is NP-hard and little is known about getting a solution [CK92]) and size (around 20000 tasks stored in a database). We present a solution that has been successfully implemented and tested, which we describe as an expert system where expertise is applied to constraint satisfaction. By combining a constraint solver and a rule-based domain expert, we have obtained a satisfactory level of efficiency while keeping the flexibility and extensibility of a constraint-based approach.
Keywords: Scheduling, travel optimization, constraints, object-oriented deductive systems.
1. Introduction
Much theoretical and practical work has been done on problems such as resource allocation based on feature matching, scheduling of tasks with time sensitivity and precedence, or travel optimization of a Hamiltonian circuit (TSP: Traveling Salesman Problem [L&al85]). However, real-life applications often combine all these problems at the same time, with a size that makes looking for the optimal solution hopeless. For instance, we have been working on the following problem: a set of tasks must be allocated to a set of technicians, so that each task is executed within a time window by a technician who has the right skills. The goal is to optimize the total quantity of work done, by minimizing travel from one location to another and using technician efficiency profiles (cf. next section).
A solution to such a problem can be evaluated along three axes: quality (how close to optimal is the solution?), performance (how long do we have to wait?) and flexibility (how easy is it to add new specifications?). For instance, an ad hoc solution built as a C program will vary in quality (from good, when we can apply techniques like dynamic or linear programming, to poor, when we simply use heuristics based on cost functions), and will usually offer good performance but poor flexibility. An expert-system approach usually has average quality (expert rules produce solutions that are better than those obtained with simple heuristics but worse than optimal), average performance (the cost of executing rules is compensated by a simplified search tree) and good flexibility. Finding the optimal solution with a constraint-satisfaction tool ([CLP][VH89]) would get top marks in quality and flexibility but poor ones in performance because of the intractability of those problems.
In this paper we describe a system that solves our task/technician assignment problem by combining various techniques to get good marks along all three axes. We use constraint representation for flexibility, constraint resolution for quality, and rules to help the constraint resolution and thereby improve performance. We also use methods (imperative code) to represent expertise from graph theory or operational research, which are linked to the constraint solver with production rules. The paper is organized as follows. Section 2 describes the practical application that we have built using the LAURE hybrid language [Ca91a]. Section 3 discusses the notion of an extended expert system for the resolution of constraint-satisfaction problems. Section 4 presents the issues related to the size of the problem and the database interface. Finally, we give some performance results and directions for future work.
2. An Assignment Expert System
2.1 A Task/Technician Assignment Problem
The problem that we want to solve occurs in the telephone industry, when managing work requests for repair/maintenance; however, our approach is general enough to apply to most task assignment problems. We suppose (see Figure 1) that we have a database of work requests, which is augmented each day by new work requests and by those that were not executed the day before. In our application the database contains 20000 tasks, and we want to consider sizes up to 100000. A task is a tuple containing the task identifier, a duration (how long the task takes to be performed on average), a time window, a location, and a work request type. The time window is made of two times x - y, and means that the task must be performed after x and before y. The location is a reference to a physical location where the task must be performed. A map with all locations and distances is also stored in the database. The work type identifies a group of tasks with common features; it defines the list of skills necessary to perform the task and the setup time required to do the task, unless the prior task in a technician's schedule is of the same type and at the same location.
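[Figure 1: The Work Assignment Process. The diagram shows new tasks, a task database, a technician database, the assignment algorithm, and its outputs (loaded tasks and unloaded tasks). Sample task rows: t1 10 8-15 A wt10; t2 80 8-9 B wt10; t3 100 B wt2. Sample technicians: tech1, tech2, tech3. Sample schedule row for tech1: tasks t10, t3, t11 with times 9am, 9:30, 11:00, …]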
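As an illustration of the task tuple and work type described above, the following minimal Python sketch models them as records and encodes the setup-time rule. It is not the LAURE representation used in the system: the class and field names, the units (minutes), and the skills and setup times attached to wt10 and wt2 are assumptions made for the example; only the three sample rows come from Figure 1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WorkType:
    name: str
    skills: Tuple[str, ...]   # skills needed to perform tasks of this type
    setup_time: int           # setup time, assumed to be in minutes


@dataclass(frozen=True)
class Task:
    task_id: str
    duration: int                      # average duration, assumed to be in minutes
    window: Optional[Tuple[int, int]]  # (x, y): after x and before y; None if absent
    location: str
    work_type: WorkType


def setup_time(previous: Optional[Task], task: Task) -> int:
    """Setup is skipped when the prior task has the same type and location."""
    if (previous is not None
            and previous.work_type == task.work_type
            and previous.location == task.location):
        return 0
    return task.work_type.setup_time


# Rows shown in Figure 1; skills and setup times are invented for illustration.
wt10 = WorkType("wt10", ("s1",), 15)
wt2 = WorkType("wt2", ("s2",), 5)
t1 = Task("t1", 10, (8, 15), "A", wt10)
t2 = Task("t2", 80, (8, 9), "B", wt10)
t3 = Task("t3", 100, None, "B", wt2)
```

With these sample values, scheduling t2 right after t1 still incurs the wt10 setup, because the two tasks share a work type but not a location.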
The goal of the assignment is to load as much work as possible (we will elaborate on this point in the next section) to a set of technicians stored in another database. For each technician, we have his/her starting and ending locations, his/her availability represented as another time window and a set of skills with associated efficiency ratios. These ratios are used to predict how much time a given technician will take to perform a task based on her/his skills, by reference to the average
time stored in the task (footnote 1). The result of the assignment algorithm is a set of schedules, one for each technician, where each loaded task is given with a starting time. The following are the main issues and difficulties in building such a system.
• Flexibility: We need to manage numerous additional constraints (such as a preferred technician for a set of tasks, tasks that must be performed simultaneously, precedence relations over tasks). A complete specification of the problem is beyond the scope of this paper, and we need to keep the door open for new specifications and features. Two related issues are maintenance (coping with new specifications) and explanation (we need some tools to justify how the algorithm has built its solution).
• Quality: Because the algorithm applies to a large number of technicians and will be deployed at many sites, each 1% improvement on the total load is worth a lot of money. Therefore, we want to be as close to optimal as possible.
• Performance: We want to perform the assignment in a few minutes on a workstation. Although it would be possible to run the load algorithm during the night and take a few hours, our goal is to move towards a more reactive system; experience has shown that too many last-minute modifications lower the value of schedules computed too far in advance.
• Robustness: The algorithm has to work for a large domain of problems. For instance, the importance of the time window may vary from none (pure travel optimization) to total (pure scheduling). Another example is the distance between tasks, which can be dominated by travel (nice Euclidean distance) or by setup (non-Euclidean, non-symmetrical).
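The efficiency ratios attached to technician skills, introduced before the list above, can be illustrated with a short Python sketch. It assumes that a ratio of 100 denotes average efficiency, which is inferred from the paper's own example (efficiency 150, duration 30, predicted time 20 minutes) rather than stated explicitly. The record layout and names are hypothetical, and the case of a task requiring several skills is left out because the paper does not specify how ratios combine.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class Technician:
    name: str
    start_location: str
    end_location: str
    availability: Tuple[int, int]  # time window during which the technician works
    efficiency: Dict[str, int] = field(default_factory=dict)  # skill -> ratio

    def has_skills(self, required: Iterable[str]) -> bool:
        """True when the technician holds every required skill."""
        return all(skill in self.efficiency for skill in required)


def predicted_duration(average_duration: float, ratio: int) -> float:
    """Scale the average duration by an efficiency ratio (100 assumed average)."""
    return average_duration * 100 / ratio


# Hypothetical technician holding skill s1 with efficiency 150.
tech1 = Technician("tech1", "A", "A", (8, 17), {"s1": 150})
```

Under this assumption, `predicted_duration(30, tech1.efficiency["s1"])` reproduces the 20 minutes of the paper's example.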
Footnote 1: A technician with skill s1 and efficiency 150 will perform in 20 minutes a task of duration 30 that requires skill s1.
2.2 Two Constraint Satisfaction Problems
Taken as a whole, the assignment problem described above is totally intractable. It is a mix of many NP-hard problems with sizes that are out of reach by many orders of magnitude. Thus, we need to introduce some simplifications and use some heuristics for parts of the problem. We can represent this problem as the combination of two smaller problems, and try to address each of them separately. The first problem is the task-technician matching problem, where the goal is to assign one technician to each task. The second problem is a "scheduling" problem, where the goal is to assign a starting time to each task in one technician's tour. We know that an optimal solution for the global assignment problem implies that each technician has an optimal schedule. The second problem is self-contained, but the first problem relies on the second, since we need to find the optimal schedule to know how many tasks we can "pack" into a given technician's schedule. However, if we use an optimal solution to the scheduling problem inside an exhaustive search of the first problem, we are bound to find the optimal solution (which justifies this decomposition, but is not of practical help).
The task/technician assignment problem is a matching problem, which means that some general techniques are available. However, it is not a transportation problem because of the skill efficiency ratio associated with each pair (technician, skill), and it is not a machine allocation problem because task durations are not additive. Solving the problem by supposing that tasks are additive (in which case we could use an integer programming package) produces solutions that are too far from being realistic to be helpful.
The local scheduling problem is a time-constrained traveling salesman problem (TCTSP) [Sav86]. We may represent each task as a node in a graph where the distance associated with an edge between two tasks is the sum of the duration of the first task, the travel time between the two tasks' locations, and the setup time for the second task, if it is executed just after the first task. Although time constraints make the TSP much more difficult, there is a fair amount of expertise about how to solve such a problem [CK92].
We have assumed so far that we have a well-defined optimization problem. Unfortunately, defining what an optimal solution is has proven to be quite difficult. The issue is to guarantee that optimizing for short-term results (one day's run) will not unduly compromise long-term results. If we optimize the number of tasks loaded, the system will pick the smallest tasks and leave the big ones aside. If we optimize the total duration of loaded tasks, the system will favor large tasks (with less associated travel). A detailed analysis shows that we can associate a value with each task and that we should optimize the total value of loaded tasks. The construction of such an economic function is beyond the scope of this paper and is the subject of a companion paper [Ca92].
2.3 A Practical Solution
Our proposal is to solve the second problem completely, because the usual size of a technician schedule is rather small (an average of 10 to 15 tasks), and to use a heuristic approach for the global assignment problem. The existence of priorities for tasks (higher priority tasks must be loaded first) and efficiency ratios (some technicians are better than others at performing a task) makes heuristic choice a viable solution compared to alternative approaches, as we shall discuss in Section 4.3. The architecture of such a system is described in Figure 2.
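[Figure 2: Architecture of the Assignment Algorithm. The diagram shows a task list (t1, t2, t3, t4), a technician list (tech1, tech2, tech3) and the Dynamic Assignment Map. A task is picked, the map gives its possible technicians (e.g. tech7, tech8), a TCTSP solver computes a schedule for each of them, one schedule is picked, the task is assigned, and the choice is propagated.]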
The TCTSP module is implemented in LAURE as a constraint satisfaction/optimization problem. Constraint resolution is helped with a set of rules and methods that represent expertise from TSP or scheduling problem resolution. This part of the system is thoroughly described in [CK92] and we shall give more details in Section 3.2. We use the open-backtracking architecture of the LAURE constraint solver [Ca91b], which allows one to write rules that interact with the constraint resolution. These rules get triggered whenever the constraint solver tries a new hypothesis and are used to manage additional relevant information. This information is used in turn by the constraint solver to make its next choice. This cooperative architecture allows the extension of the constraint solver with domain-dependent knowledge that is necessary for such a problem. Technically, the original feature of LAURE is the extended backtracking of the constraint solver that removes, when a failure occurs, not only the wrong choice but all its consequences computed by rules and methods. The optimization in the TCTSP module is performed through branch-and-bound and starts with the value of a solution generated by an insertion heuristic. Given a set of tasks for one technician, this module is able to return the optimal schedule (with respect to travel-time and end-time) in a 100-400 ms range on a SUN SPARC 2.
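To make the branch-and-bound idea concrete, here is a minimal Python sketch of an exhaustive search over task orderings with time windows. It is not the LAURE module and omits the rules and methods that guide it. Several simplifications are assumptions of this example: the objective is reduced to the finish time of the tour, setup times are ignored, the search starts without an incumbent instead of using an insertion heuristic, and the bound is simply the remaining work. The function name, the tuple layout and the travel table are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# (task id, duration, earliest start, latest end, location); layout is illustrative.
Task = Tuple[str, int, int, int, str]
Schedule = List[Tuple[str, int]]  # (task id, start time)


def solve_tctsp(tasks: List[Task],
                travel: Dict[str, Dict[str, int]],
                start: str, end: str,
                day_start: int) -> Optional[Tuple[int, Schedule]]:
    """Return (finish time, schedule) with the earliest finish, or None."""
    best_finish = math.inf
    best_order: Optional[Schedule] = None

    def extend(order, remaining, clock, place, work_left):
        nonlocal best_finish, best_order
        # Bound: the remaining work alone already reaches the incumbent.
        if clock + work_left >= best_finish:
            return
        if not remaining:
            finish = clock + travel[place][end]
            if finish < best_finish:
                best_finish, best_order = finish, list(order)
            return
        for i, (tid, duration, earliest, latest, loc) in enumerate(remaining):
            begin = max(clock + travel[place][loc], earliest)
            if begin + duration > latest:
                continue  # time window violated on this branch
            order.append((tid, begin))
            extend(order, remaining[:i] + remaining[i + 1:],
                   begin + duration, loc, work_left - duration)
            order.pop()

    extend([], list(tasks), day_start, start, sum(t[1] for t in tasks))
    if best_order is None:
        return None
    return best_finish, best_order
```

The search is exponential in the number of tasks, which is tolerable only because a technician's schedule is small; a better initial incumbent and tighter bounds would prune more of the tree.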
The task assignment module mimics an arc-consistency [Ma77][VHD91] resolution of a constraint problem without the backtracking part. We dynamically maintain a set of possible technicians for each task, which is used to pick the next task to be examined using the first-fail principle (footnote 2), and this set is updated by propagation whenever an assignment is made. After a candidate task is picked, the system tries to insert it in the schedules of all possible technicians by calling the TCTSP module. The resulting schedules are compared using a cost function that tries to pick the assignment that uses the fewest resources, weighted by the scarcity of these resources. This cost function relies, among other things, on the scarcity of technician skills based on the task-technician assignment map. It should be noted that the whole system is written in LAURE, and that the assignment module makes extensive use of the support for hypothetical reasoning provided by the LAURE language.
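The assignment loop described above (first-fail task selection, comparison of candidate technicians, propagation, no backtracking) can be sketched in Python as follows. This is only an outline under strong assumptions: the call to the TCTSP module is replaced by a simple capacity test on remaining minutes, and the cost function is replaced by a stand-in that prefers the technician requested by the fewest pending tasks. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_assign(durations: Dict[str, int],
                  capacity: Dict[str, int],
                  possible: Dict[str, Set[str]]) -> Tuple[Dict[str, str], List[str]]:
    """Return (loaded: task -> technician, unloaded tasks), without backtracking."""
    remaining = dict(capacity)
    candidates = {task: set(techs) for task, techs in possible.items()}
    loaded: Dict[str, str] = {}
    unloaded: List[str] = []

    def fits(task: str, tech: str) -> bool:
        # Stand-in for inserting the task into the technician's schedule.
        return durations[task] <= remaining[tech]

    # Initial propagation: drop technicians that cannot take the task at all.
    for task in candidates:
        candidates[task] = {t for t in candidates[task] if fits(task, t)}

    while candidates:
        # First-fail: pick the task with the fewest possible technicians.
        task = min(candidates, key=lambda t: (len(candidates[t]), t))
        techs = candidates.pop(task)
        if not techs:
            unloaded.append(task)
            continue

        def cost(tech: str) -> Tuple[int, str]:
            # Stand-in scarcity measure: demand from the tasks still pending.
            demand = sum(1 for others in candidates.values() if tech in others)
            return (demand, tech)

        tech = min(techs, key=cost)
        loaded[task] = tech
        remaining[tech] -= durations[task]
        # Propagate the choice to the sets of possible technicians.
        for other in candidates:
            if tech in candidates[other] and not fits(other, tech):
                candidates[other].discard(tech)
    return loaded, unloaded
```

Because no choice is ever undone, a task can end up unloaded even when a different earlier choice would have left room for it, which is the trade-off discussed later in Section 4.3.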
3. A New Form of Expert System
3.1 A Hybrid System
The system that we have described in the previous section is a real instance of a hybrid system, since we are merging various computation paradigms to solve different problems. It is an expert system, since it uses domain-dependent expertise about its own process, expressed in a declarative form that can be easily consulted or extended. The three main paradigms that we used are constraints, rules, and methods (imperative programming).
The use of constraints is the key to achieving flexibility. By representing the problem to be solved (for instance in the TCTSP module) as a set of constraints and by solving the problem through the constraint solver, we make sure that we can add any new constraint at any time, thus keeping the system open to further specifications. The fact that we leave the control to the constraint solver, although we extend it with rules, dramatically reduces the size and the complexity of the code. Most of the technical aspects of building a search tree are hidden from the designer because we use a constraint resolution language.
We use production rules (footnote 3) to describe expert guidance throughout the process in a declarative manner. The significant difference from usual practice is that rules are subordinated to the constraint solver in our architecture, whereas they are usually used to guide the search as a meta-level tool. Instead of using rules to tell how to build a pseudo-optimal solution, we use rules to help make sure that we find the optimal solution faster.
Methods play an important role in such a system and could not easily be supplanted by production rules. First, we use them to implement cost functions and choice heuristics in the task/technician assignment module. Second, we use them as conclusions to the production rules that are triggered by the constraint solver. Some of the expertise on TCTSP consists of algorithms that come from graph theory and operational research, and is described more easily and more efficiently using methods than production rules.
Footnote 2: The choice function used to select a candidate task is actually more complicated, but the first-fail principle (pick the task with the smallest set of possible technicians) is an important part of it. Other components are the priority of the task, its duration and time window. The priority is important as we mentioned previously, and the two other parameters are used to make sure that the system does not favor "easy" tasks, which would break stability (Section 2.2).
Footnote 3: Forward-chaining rules that are similar to those of an expert system shell.
The natural consequence is that we need a hybrid language to implement such a system. Using an expert system shell without constraint resolution (our first approach was to use a commercially available expert system shell) yields both more complex code (we have to code the search strategy) and much worse performance (an expert system shell is not an efficient low-level implementation language). Most commercially available constraint solvers follow a black-box philosophy, which makes them irrelevant for our problem. A better compromise is the PECOS system [PA91], which embeds an efficient constraint solver in a LISP environment. However, we miss the declarativeness of production rules when using LISP functions to extend the constraint solver.
3.2 A Library of Expertise
The part that makes our LAURE program an expert system is the set of rules used to help the constraint solver. These rules are grouped into sets, with associated methods, and come from various viewpoints [CK92]. For instance, a scheduling approach suggests maintaining a dynamic time window for each task, represented by two relations atleast and atmost, and concentrating on the ordering of tasks. We imported from a previous scheduling application the set of rules that propagates updates on the schedule to the time windows (reducing them) and deduces new ordering relations (thus further reducing the time windows). Looking at our problem from a TSP point of view, we use a lower-bound estimate of travel between tasks according to the tour that is being built, which is maintained by another set of rules. Similarly, we use the difference between the closest neighbor and the next closest neighbor as a heuristic for ordering tasks during constraint resolution. We also check that the graph defined by the possible choices at any time is strongly connected (a consequence of the existence of a tour). The well-known algorithm for doing so is implemented as a method and is triggered by a rule.
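The strong-connectivity check mentioned above can be illustrated with a short Python sketch. The paper does not name the algorithm it uses, so the version below is only one standard option: a directed graph is strongly connected exactly when every node is reachable from an arbitrary start node both in the graph and in its reverse. The graph encoding (a dictionary mapping each task to the set of tasks that may still follow it) and the function names are assumptions of this example.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def reachable(graph: Graph, start: Hashable) -> Set[Hashable]:
    """Nodes reachable from start by an iterative depth-first search."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strongly_connected(graph: Graph) -> bool:
    """Check forward and backward reachability from one arbitrary node."""
    nodes = set(graph) | {n for succs in graph.values() for n in succs}
    if not nodes:
        return True
    reverse: Graph = {n: set() for n in nodes}
    for node, succs in graph.items():
        for nxt in succs:
            reverse[nxt].add(node)
    start = next(iter(nodes))
    return reachable(graph, start) == nodes and reachable(reverse, start) == nodes
```

When the check fails for the graph of remaining possible successions, no tour can exist, so the current branch of the search can be abandoned early.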
Another improvement in the algorithm comes from preventing permutations. We identify pairs of identical tasks and order them arbitrarily using the ordering relation that we introduced previously. This prevents the solver from exploring many equivalent solutions and improves efficiency significantly in some situations.
Depending on the actual data, one of these viewpoints is usually better suited to solve the problem. The reason we want an expert-system approach using rules is the ability to combine those viewpoints easily. A new set of rules can be added at any time in our current system, whereas a more traditional implementation would require a complete change in the algorithm's implementation to add an extra constraint or a new resolution technique.
3.3 A Framework to Experiment with Various Strategies
However, it would be unfair to claim that we easily found the LAURE solution to our problem. There are many ways to formulate the TCTSP with constraints, as we have shown in [CK92], depending on which relations we try to define among the objects. There are similarly many ways to help the resolution of a given set of constraints with additional knowledge. It is usually easy to come up with a solution that works well for a small set of examples, by analyzing how an "expert" would solve them. However, as soon as the system is tested on a very large sample of test data, problems arise. We actually started our project thinking that we were building a constraint satisfaction approach to the assignment problem. After many iterations, and after so many expert rules/methods had been added to get better results, it became clear that our system is more of an expert system, because most of the development effort was spent in synthesizing rules. As for any expert system, a lot of time is needed to extract the relevant knowledge by understanding why such problems occur.
One of the key elements for success is the ability to redesign the system and try new resolution strategies in a very short time. This is the main technology issue as far as we are concerned, and this is the main advantage of LAURE [Ca91a]. We see LAURE as a tool for experimenting with resolution strategies because of the following features:
• LAURE is both an interpreted language (for user-friendliness) and an efficient compiled language. Thus, we can experiment with real sets of data as opposed to toy problems.
• The LAURE constraint solver uses an open world stack as its backtracking mechanism, which allows other software components to participate in the constraint resolution.
• LAURE is a high-level knowledge representation language, which yields elegant and compact programming.
As a result, we spent only 20% of the 6 months necessary to achieve this project developing LAURE code and the rest of the time was spent testing the system and extracting the expertise to enrich the system. A first result is that we have much more confidence about the robustness of our system when used in the field. A second advantage is that we have demonstrated in the early development phase an ability to modify incrementally the system to support new ideas or new features. This is a strong confidence indicator for the ease of maintenance of the future product.
4. Issues and Future Directions
4.1 Volume and Database Issues
In the current system, the data is stored on a mainframe using a hierarchical database. Although changing the database technology is not an option here from our point of view (the load algorithm is just one piece of a much larger set of software components that uses the same database architecture), there are still a lot of open questions in developing such a system. Concurrency/safety/recovery issues are handled by the corporate database architecture, but there is a volume/persistence issue that needs to be solved. We currently use LAURE to perform the processing, which relies on virtual memory to accommodate large sets of data. LAURE is interfaced with the database through a data extraction library. Some filtering is done at this early stage to reduce the number of tasks that need to be considered.
The size issue is very different for the two components of our system. The TCTSP module is extremely data intensive, but only works on small problems. Thus, a LAURE implementation is an excellent match. On the other hand, the task/technician assignment problem relies on a global view of the problem (cf. Figure 2, the assignment map), which causes problems. In the current implementation, we use information such as priority to segment the load problem into many subproblems, loading a smaller set of tasks at a time. Together with the use of virtual memory, this gives satisfactory performance. However, a longer term evolution will incorporate the need to accommodate larger problems and more complex assignment strategies, which argues for the use of an intermediate database. Our current plan is to migrate LAURE's target language (output of the compiler) from C/C++ to a persistent OODBMS. This will give persistence to LAURE objects and transfer the burden of caching/segmenting from the programmer to the database management system. There are other benefits of using a local database (with more advanced technology), which are outside the scope of this paper (e.g., developing a better user interface for querying about the work requests and the technicians).
4.2 Performance
To evaluate the success of our new system, we have some comparison points. First, the attempts to solve the problem using more conventional AI tools failed for lack of scalability. We have also developed various comparison systems to evaluate the relevance of some of our choices. We built a simplified version using heuristics to solve the TCTSP, with a similar task/technician assignment module. We also built a TCTSP module using a smart dynamic programming algorithm to evaluate the performance of the LAURE approach. We tried several designs for the task/technician assignment module to evaluate the benefit of additional complexity. Last, we used the existing system with old data sets to evaluate overall quality and performance (a somewhat unfair test since the old system does not handle time constraints and solves a much simpler problem).
The quality results have been surprisingly good, since our system loads up to 3% more work than the simpler system based on heuristics. The actual improvement depends on how much travel and how many time constraints can be found in an average technician tour. Because of the size of the problem and the work force involved here, each 1% is worth a lot and these results are very good. Similarly, we have found that the new system gives better loads than the old system, even if no time constraints are present.
The current system is able to load 2000 tasks with time constraints in 300 s on a SUN SPARC 2 workstation, with an optimal resolution of TCTSPs for sizes up to 10-12 tasks. When the TCTSP is too large (which is rare), we use a heuristic instead of exhaustive search (the cutoff value depends on the data but is always larger than 10). The actual time spent for the resolution of a 10-node TCTSP is in the 100-500 ms range, which compares well with the dynamic programming algorithm and compares very well with pure constraint resolution approaches.
More details about the performance comparison between the LAURE and the dynamic programming approaches may be found in [CK92].
4.3 Improving our Design
Most of our effort has concentrated on the resolution of the TCTSP and we are now satisfied with the level of performance obtained with the LAURE implementation. Our next step will be to reevaluate the task/technician assignment strategy, with respect to the theoretical analysis described in [Ca92] (cf. Section 2.2). The other direction for future work is to incorporate an object-oriented database, as we mentioned previously.
A first direction to improve the current system is to allow for limited backtracking in the assignment module. Some form of intelligent backtracking would allow one to consider exchanging pairs of tasks between technicians when an "impossible" task is found. A more complex issue is then to balance the trade-offs between performance and exhaustiveness in both constraint resolution modules. Our first findings have shown that heuristics can do a better job for the task/technician assignment problem because of the task priorities that limit the benefit of global strategies. However, more work is needed, and we need to try combinations of smarter assignment modules with simpler TCTSP modules (using heuristics such as local optimization [LK73][Joh90]).
Another direction to improve the assignment module is to perform a preassignment based on some extension of the transportation model. If we had no priority and no efficiency ratio, we could get a good estimate of how to distribute work using a data flow analysis [GM79]. We have tried simpler preassignment strategies but they are subsumed by the dynamic assignment map that we use in our current system. Our next goal is to develop a more complex algorithm that will suggest both a preferred assignment for each task and possible exchange pairs for limited backtracking.
5. Conclusion
In this paper we have presented a practical system and a design method. Although both are in early stages and need further work, they have already proven successful on a "real-world" large problem. The system is successful because the quality of the solution it produces is significantly higher than what we obtained with previous approaches, its performance is adequate according to our specifications and the flexibility is much better than for the system that was used previously. The method for building an expert constraint satisfaction system seems relevant for problems that are hard and for which little is known (so that they do not satisfy requirements for expert systems or integer programming approaches). For such problems, an experimental view (deriving expertise from experience) must be combined with a combinatorial approach (techniques to explore a large search space efficiently). Thus, the method requires a hybrid language to be implemented easily (at least in the prototype phase). As for the application described in [G&al90], the combination of programming paradigms found in LAURE has proven relevant for such a complex problem.
Acknowledgments
This work was done in collaboration with a large team of people involved in the assignment process. In particular, we would like to thank Paul Matthews and Fred Eichelman, who introduced us to the assignment problem and who suggested some useful heuristics and evaluation strategies. We are also thankful to Abie Reifer, Mark Weichselbaum and Winnie Chen, who acted as our domain experts because of their experience with previous versions of the assignment software. We would like to thank Clyde Monma, Bill Cook, Pascal Van Hentenryck and Martin Grötschel for their insight and useful suggestions. Last, we are grateful to all those involved in the development of LAURE, including Diane Hoffoss, Sergiu Simmel, Laurent Perron and Jerry Lutkus.
References
[Ca91a] Y. Caseau. An Object-Oriented Deductive Language. Annals of Mathematics and Artificial Intelligence, special issue on deductive databases, March 1991.
[Ca91b] Y. Caseau. Rule-Aided Constraint Resolution. Proc. of PDK'91, Lecture Notes in Artificial Intelligence, vol. 567, Springer-Verlag, 1991.
[CLP] N. Heintze, et al. Constraint Logic Programming: A Reader. 4th IEEE Symposium on Logic Programming, San Francisco, 1987.
[CK92] Y. Caseau, P. Koppstein. A Rule-Based Approach to a Time-Constrained Traveling Salesman Problem. International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, January 1992, to appear.
[Fo82] C.L. Forgy. RETE: A Fast Algorithm for the Many Pattern/Many Object Pattern Matching Problem. Artificial Intelligence, no. 19, 1982.
[G&al90] M. Ganti, P. Goyal, R. Nassif, P. Sunil. An Object-Oriented Development Environment. COMPCON, Feb 1990.
[GM79] M. Gondran, M. Minoux. Graphes et Algorithmes. Eyrolles, Paris, 1979.
[Joh90] D. S. Johnson. Local Optimization and the Traveling Salesman Problem. Proc. of the 17th Colloquium on Automata, Languages and Programming, Springer-Verlag, 1990.
[L&al85] E. Lawler, J. Lenstra, A. Rinnooy Kan, D. Shmoys (eds.). The Traveling Salesman Problem: a Guided Tour of Combinatorial Optimization. Wiley, Chichester, 1985.
[LK73] S. Lin, B.W. Kernighan. An Effective Heuristic Algorithm for the Traveling-Salesman Problem. Operations Res. 21, 1973.
[Ma77] A. Mackworth. Consistency in Networks of Relations. Artificial Intelligence, vol. 8, 1977.
[PA91] J.F. Puget, P. Albert. PECOS: programmation par contraintes orientée objets. Génie Logiciel et Systèmes Experts, vol. 23, 1991.
[Sav86] M. Savelsbergh. Local search for routing problems with time windows. Ann. Oper. Res. 4, 1986.
[Sav89] M. Savelsbergh. The vehicle routing problem with time windows: minimizing route duration. Report BSR89xx, Centre for Mathematics and Computer Science, Amsterdam, The Netherlands, 1989.
[VH89] P. Van Hentenryck. Constraint Satisfaction in Logic Programming. The MIT Press, Cambridge, 1989.
[VHD91] P. Van Hentenryck, Y. Deville. The Cardinality Operator: A New Logical Connective for Constraint Logic Programming. Proc. of the 8th ICLP, Paris, 1991.