Automatic Composition of Web Services with Contingency Plans
Luiz A. G. da Costa¹, Paulo F. Pires², Marta Mattoso¹
¹COPPE - Systems Engineering and Computer Science Program
²DCC-IM/NCE
Federal University of Rio de Janeiro
P.O. Box 68511, Rio de Janeiro, RJ, 21945-970, Brazil
{gibson, marta, pires}@cos.ufrj.br
Abstract
The semantic Web technology and the extensibility of the Web services description language may be combined to describe services in an unambiguous and machine-interpretable way, automating Web service discovery, selection and invocation. In this paper, we present an algorithm and a prototype for the automatic composition of Web services that implement workflows described in a high level language. Compared to manually programming a simple composition, our approach has several advantages: shorter implementation time and lower cost, reliability through the generation of contingency plans, greater capacity to evolve through dynamic service discovery, and faster execution through the use of heuristics. We use the OWL-S ontology to semantically describe Web service metadata, and indexes to help select services. The proposed algorithm considers that equivalent services may have different interfaces, and it respects user preferences.
1. Introduction
The semantic Web [1,2] and Web services technologies [19,20] may be combined to describe services in an unambiguous and machine-interpretable way, providing the underpinning for automating Web service discovery, selection and composition. Web services technology tells us how to invoke a given Web service, and ontologies tell us the semantic value of Web service metadata. For instance, it is possible to know that one operation must be executed before another because the former generates an effect that is a required pre-condition of the latter. It is also possible to know that an output is a subclass of an input, or contains a class or subclass of it, and may therefore be used when invoking the corresponding operation.
This work was supported in part by the Brazilian funding agencies CNPq and CAPES.
Proceedings of the IEEE International Conference on Web Services (ICWS’04) 0-7695-2167-3/04 $ 20.00 IEEE
The automatic composition of Web services can be viewed as the automatic implementation of workflows based on Web services. An automatically generated program may be superior in several respects to a hard-coded composition of Web service operation calls. The first advantage of automating Web service compositions is that the user, who knows the composition workflow, can execute ad hoc programs without waiting for a programmer to implement the workflow, which is time-consuming and costly. The second advantage is greater robustness through the automatic creation of contingency plans: to create every possible contingency plan by hand, a programmer would have to know every useful and available Web service and imagine every possible path in the program execution flow. Another advantage is the capacity to evolve that comes from the automatic discovery of services. New services may be added, or existing ones may have their interfaces changed over time, with no impact on the original workflow definition; the only consequence is that the workflow implementation logic is regenerated automatically. Finally, an automated implementation of a Web service composition can be tuned with heuristics. A programmer may also use heuristics when choosing services to create a program. However, a computer is much more accurate and, moreover, the quality of a service, which is used to tune the program, may change after the program is implemented. In this paper, we present an algorithm and a prototype for the automatic composition of Web
services based on both the semantic Web and Web services technologies. The main distinguishing feature of our algorithm is the automatic creation of contingency plans, which are important for robustness and make parallel scheduling possible. Moreover, our algorithm allows equivalent Web services (services that perform the same semantic task) to have different interfaces, which is very common in practice, as long as they agree on the common ontology used. The algorithm also supports incomplete workflow descriptions: an input needed to execute an operation may be unavailable, as long as it is an output of another operation whose necessary inputs are all available. In the proposed algorithm, the user specifies a workflow by means of available inputs and a set of activities that must be executed, using a given ontology. There are no explicit references to Web services; all references are made to activities, which are implemented by Web service operations. The inputs used in each activity invocation are not explicitly specified either: it is up to the execution engine to choose among the available inputs. Although our execution engine may choose among several services to implement a workflow, the user may limit its options by being more specific: by specifying an activity, output or effect subclass instead of the superclass, the user restricts the engine's options and expresses preferences. In our prototype, Web services must be described in OWL-S [2] to allow our engine to reason about them. Each input, output, pre-condition, effect and activity of each operation of each Web service refers to a concept in an OWL [1] ontology. This allows our execution engine to know exactly what every operation does and what it needs to execute. The workflow may be described in OWL-S or in a high level language, which is described in Section 5.1. The remainder of this paper is organized as follows. In the next section, we analyze related work.
Then, we describe the proposed algorithm, which automatically composes Web services and generates contingency plans, and we analyze it. Next, we present a prototype that implements the ideas of the proposed algorithm. Finally, we conclude this work and outline future work.
2. Related Work
In the semantic Web area, algorithms and prototypes are being developed that handle the composition and discovery of services in two different ways. There are solutions based on static composition
and dynamic discovery, and solutions based on dynamic composition and dynamic discovery. The compositors in [4], [5] and [9] use static composition and dynamic discovery, whereas [6] and our work use both dynamic composition and dynamic discovery. The Stanford University Knowledge Systems Laboratory proposed the use of a situation calculus-based language called Golog to define generic workflows. According to [4], they are working on a DAML-S [22] to Golog translator to allow the specification of workflows in DAML-S. Their objective in creating a Web services composer is to allow the reuse of generic high level workflow descriptions that the user customizes by adding restriction clauses. With the generic workflow description and the restrictions, the user would use the composer to execute the workflow. In contrast, our objective is to allow the user to implement workflows automatically, obtaining smaller implementation cost and time, better capacity to evolve, robustness and faster execution. The user specifies preferences through activity and parameter subclassing, flow control constructs and different input parameters in later executions. Moreover, their proposal uses a reasoner to generate the composition plan, so they must represent the workflow in logic. Our algorithm uses a reasoner only to index the available services, so we can represent the workflow with a graph abstraction. Their algorithm uses backtracking to compose Web services, with on-the-fly recovery and planning: if a path cannot be executed or raises an execution error, they discover and try to execute another path. We generate all paths in advance, so the workflow can be executed faster when it is executed many times. The University of Georgia Large Scale Distributed Information Systems Lab has developed the METEOR-S Framework for Semantic Web Process Composition [5].
They describe services in DAML-S and workflows in an extended version of BPEL [14]. They added an invoke-activity element to BPEL to describe the invocation of operations through a reference to an ontology concept, a tModel or a port type. For each generic operation invocation, they select a specific operation that implements the activity specified by the ontology concept reference, or that is part of the tModel or the port type. In contrast, we select several operations for each generic operation invocation, because we are interested in contingency plans and because we allow the generic operation call to be described by the activity that must be implemented, the outputs that must be produced and/or the effects that must be generated, and these may not be
produced by one operation but by a pipeline of operations. Moreover, METEOR-S does not compose plans that can be executed automatically: the user must specify which user inputs and which activity outputs are used as inputs for each activity. However, during service discovery, METEOR-S considers the quality of the services and allows the user to establish ranking metrics based on quality and on the semantic matching of the services, a feature we do not currently support. In [6], an algorithm for the composition of Web services is described. This algorithm receives as input a set of outputs and effects that must be generated. Its authors argue that their approach builds plans dynamically from scratch and does not rely on templates for composition. In our view, a set of outputs and effects that must be generated defines only one step of a workflow, which may also contain other steps as well as flow control constructs; that is, a composition algorithm should also be able to implement workflows that are not pipelines. [6] uses DAML-S to describe the available Web services and a reasoner called Jess [18] to generate the composition plan. myGrid [9] is a large, pioneering UK e-Science project to provide open source Grid services middleware for bioinformatics. They implemented a DAML+OIL [21] ontology service to support partial user-driven service matching and composition. They extended DAML-S to support bioinformatics service description, adding elements to specify the task performed by the service, the data source it uses, whether the service is a proxy of an existing application, and the algorithm it uses. They also created a large bioinformatics ontology. To describe workflows, they use WSFL, one of the languages from which BPEL originated. To implement workflows, they use a reasoner and a service matcher and ranker.
For each task in the workflow, one operation is selected. SRMW [3] is an architecture that manages scientific resources through metadata management. Its implementation combines metadata support with Web services within a framework that supports scientific workflows. Its objective is to help bioinformaticians develop in silico experiments, with features such as assisted workflow implementation and the use of partial results in workflow re-executions. Currently, SRMW uses BPEL to formally specify the composition of programs. It is worth noting that none of these works addresses the generation of contingency plans to improve the reliability of generated compositions. We believe that
such a feature is essential in many different application domains.
3. The Composition Algorithm
The proposed algorithm was designed based on the DAML-S description of Web services. A Web service operation has inputs and outputs and implements an activity, which has pre-conditions, effects and subclasses. The input of the algorithm is a high level description of a workflow: the available inputs and pre-conditions, and a set of activities that must be executed. The set of activities is composed of flow control constructs and generic calls to any operation, of any Web service, that produces an output, executes an activity or generates an effect. Figure 1 shows a high level description of a simple workflow to buy a book. Notice that there are no explicit references to Web services, and that the inputs used in each activity invocation are not explicitly specified.

    Input:
      bookTitle: Dreamcatcher
      bankAAccountNumber: 10001-0
      clientEmail: [email protected]
    Activities:
      searchBookPrice;
      if (bookPrice < 20) buyBook;

Figure 1. High level description of a workflow.

    try {
        ISBN = BookstoreA.searchBookISBN(bookTitle);
        bookPrice = BookstoreA.searchBookPrice(ISBN);
        if (bookPrice < 20) {
            bankAuthorization = BankA.authorizePurchase(bookPrice);
            BookstoreA.buyBook(ISBN, bankAuthorization, clientEmail);
        }
    } catch (Exception e) {
        bookPrice = BookstoreB.searchBookPrice(bookTitle);
        if (bookPrice < 20) {
            bankAuthorization = BankA.authorizePurchase(bookPrice);
            BookstoreB.buyBook(bookTitle, bankAuthorization, clientEmail);
        }
    }
Figure 2. Implementation with contingency plans.
The output of the algorithm is a graph whose nodes are flow control constructs or calls to specific Web service operations. Figure 2 shows a possible implementation of the workflow shown in Figure 1. Notice that the input bankAAccountNumber restricts purchase authorization to the Bank A service, which reflects a preference of the user, whereas no input restricts the choice of bookstore. In addition, the Bookstore B service is used as a contingency plan, and the services provided by the two bookstores are slightly
different: the ISBN is necessary to buy a book at Bookstore A but it is not used at Bookstore B. In the next section we provide a set of definitions that are used throughout the description of the proposed algorithm.
3.1 Definitions
An input or pre-condition is considered available if it is provided by the user or if it is an output or effect of an available Web service operation. A Web service operation is available if all of its inputs and pre-conditions are available. An activity is available if at least one available Web service operation implements it. A Web service operation is considered useful if it is available and either (a) generates an output or effect, or implements an activity, specified by the user, or (b) generates an input or pre-condition of a useful Web service operation. For each ontology concept that is an input or output of an activity, there is a variable. Such a variable allows us to associate an output of one activity with an input of another.
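The availability definitions above describe a least fixpoint: start from what the user provides and repeatedly mark operations whose inputs and pre-conditions are all available. A minimal Python sketch follows, assuming concepts are plain strings matched by equality (the OWL subclass matching used by the prototype is omitted). The `Operation` class, the `compute_availability` function and the service catalog are hypothetical, modeled loosely on the bookstore example of Figures 1 and 2.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Operation:
    # Hypothetical descriptor: concepts are plain strings matched by equality,
    # with no OWL subclass reasoning.
    name: str
    inputs: frozenset = frozenset()
    preconditions: frozenset = frozenset()
    outputs: frozenset = frozenset()
    effects: frozenset = frozenset()
    activity: Optional[str] = None

def compute_availability(operations, user_inputs, user_preconditions):
    """Least fixpoint of the 'available' definitions of Section 3.1."""
    inputs, preconds = set(user_inputs), set(user_preconditions)
    available_ops = set()
    changed = True
    while changed:
        changed = False
        for op in operations:
            # An operation becomes available once all its inputs and pre-conditions are.
            if op not in available_ops and op.inputs <= inputs and op.preconditions <= preconds:
                available_ops.add(op)
                inputs |= op.outputs      # its outputs become available inputs
                preconds |= op.effects    # its effects become available pre-conditions
                changed = True
    # An activity is available if at least one available operation implements it.
    activities = {op.activity for op in available_ops if op.activity}
    return available_ops, inputs, preconds, activities

F = frozenset
catalog = [  # hypothetical services, loosely modeled on Figures 1 and 2
    Operation("BookstoreA.searchBookISBN", F({"bookTitle"}), outputs=F({"ISBN"})),
    Operation("BookstoreA.searchBookPrice", F({"ISBN"}), outputs=F({"bookPrice"}),
              activity="searchBookPrice"),
    Operation("BookstoreB.searchBookPrice", F({"bookTitle"}), outputs=F({"bookPrice"}),
              activity="searchBookPrice"),
    Operation("BankA.authorizePurchase", F({"bookPrice", "bankAAccountNumber"}),
              outputs=F({"bankAuthorization"})),
    Operation("BankB.authorizePurchase", F({"bookPrice", "bankBAccountNumber"}),
              outputs=F({"bankAuthorization"})),
    Operation("BookstoreA.buyBook", F({"ISBN", "bankAuthorization", "clientEmail"}),
              activity="buyBook"),
    Operation("BookstoreB.buyBook", F({"bookTitle", "bankAuthorization", "clientEmail"}),
              activity="buyBook"),
]

ops, inputs, _, activities = compute_availability(
    catalog, {"bookTitle", "bankAAccountNumber", "clientEmail"}, set())
print(sorted(op.name for op in ops if op.name.startswith("Bank")))  # ['BankA.authorizePurchase']
print(sorted(activities))  # ['buyBook', 'searchBookPrice']
```

Because the user supplies only a Bank A account number, the hypothetical `BankB.authorizePurchase` never becomes available. This mirrors how an input can express a user preference.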
3.2 The Algorithm Steps
The proposed algorithm is composed of five steps. First, an initial graph is assembled from the workflow description. Then, for each node that represents an activity (a generic operation call), the useful Web service operations are selected. This selection restricts the search space and triggers the third step, which generates all possible execution paths that produce the effects, generate the outputs and execute the activities specified by the user. The fourth step creates the execution graph from the paths generated in the previous step. The last step optimizes the execution graph.
3.2.1 Initial Graph Assembly. In this step, an initial graph is assembled from the workflow description. It is composed of flow control nodes and activity nodes; the latter represent generic calls to any Web service operation that produces a set of outputs, generates a set of effects and/or implements an activity. Child nodes represent alternative execution choices, and a path in the graph represents a possible execution path. The flow control nodes are if/else, while, do while, do until, try/catch exception, split, and split-join. Flow control nodes have a condition expression based on ontology concepts, which must be either user inputs or outputs generated by an operation. The next steps of the algorithm are applied only to operation nodes: flow control nodes need no composition because they do not invoke services; they only access the values of the available variables. Since our algorithm represents workflows with a common graph representation, the user can specify workflows in virtually any language, as long as a translator to the graph representation exists.
3.2.2 Useful Operations Selection. In this step, the search space for the generation of all possible execution paths is restricted. The selection of useful operations has two sub-steps. First, eight sets are created: available inputs, available pre-conditions, available activities, wanted outputs, wanted effects, wanted activities, operations that are useful if available, and useful operations. Initially, the user inputs are added to the set of available inputs and the user pre-conditions are added to the set of available pre-conditions. The outputs and effects generated by the ancestors of the node on the path from the initial node are also added to the sets of available inputs and available pre-conditions, respectively. Finally, the activities, outputs and effects wanted by the user are added to their respective sets. The second sub-step applies five rules repeatedly, until no rule can be applied or a plan is possible. The initial filling of the sets triggers the execution of the rules.
Rule #1: IF there are new elements in the available inputs or available pre-conditions sets THEN the operations that are useful if available and need only available inputs and pre-conditions must be added to the set of useful operations.
Rule #2: IF there are new elements in the useful operations set THEN their outputs, effects and activities must be added to their respective available sets.
Rule #3: IF there are new elements in the wanted outputs, effects or activities sets THEN the operations that produce at least one of them must be added to the set of operations that are useful if available.
Rule #4: IF there are new elements in the set of operations that are useful if available THEN their inputs and pre-conditions must be added to the wanted outputs and wanted effects sets, respectively.
Rule #5: IF all the user outputs, effects and activities are available THEN it is possible to generate a plan.
If an operation produces an output, generates an effect or implements an activity that is an OWL subclass of a wanted output, effect or activity, or has it as a property, the operation is selected by rule #3 and is therefore considered useful. Likewise, if an available input or pre-condition is a subclass of an input or pre-condition of an operation, or has it as a property, it is used to make the operation available by rule #1. The output of the useful operations selection step is the useful operations set if a plan is possible; otherwise, an error is reported. Figure 3 shows which sets are accessed by which rules: arrows that arrive at sets indicate writing, and arrows that leave sets indicate reading.
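Rules #1 to #5 can be read as a small forward-chaining loop over the eight sets. The Python sketch below is one possible reading, not the prototype's code. It matches concepts by name (no subclass reasoning), keeps applying rules #1 to #4 until no set changes, and only then checks rule #5. Running to a fixpoint, rather than stopping as soon as rule #5 holds (which the text also allows), keeps the alternative operations that contingency plans need. All names and services are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Operation:
    # Hypothetical descriptor; concepts matched by name, no subclass reasoning.
    name: str
    inputs: frozenset = frozenset()
    preconditions: frozenset = frozenset()
    outputs: frozenset = frozenset()
    effects: frozenset = frozenset()
    activity: Optional[str] = None

def select_useful(operations, avail_in, avail_pre, want_out, want_eff, want_act):
    """Apply rules #1-#4 until no set changes, then check rule #5."""
    avail_in, avail_pre, avail_act = set(avail_in), set(avail_pre), set()
    wanted_out, wanted_eff, wanted_act = set(want_out), set(want_eff), set(want_act)
    useful_if_available, useful = set(), set()
    while True:
        before = (len(useful_if_available), len(useful), len(wanted_out), len(wanted_eff))
        # Rule #3: producers of wanted outputs, effects or activities become candidates.
        for op in operations:
            if op.outputs & wanted_out or op.effects & wanted_eff or op.activity in wanted_act:
                useful_if_available.add(op)
        # Rule #4: the candidates' inputs and pre-conditions become wanted.
        for op in useful_if_available:
            wanted_out |= op.inputs
            wanted_eff |= op.preconditions
        # Rule #1: candidates needing only available inputs/pre-conditions become useful.
        for op in useful_if_available - useful:
            if op.inputs <= avail_in and op.preconditions <= avail_pre:
                useful.add(op)
                # Rule #2: their outputs, effects and activities become available.
                avail_in |= op.outputs
                avail_pre |= op.effects
                if op.activity:
                    avail_act.add(op.activity)
        if before == (len(useful_if_available), len(useful), len(wanted_out), len(wanted_eff)):
            break
    # Rule #5: a plan is possible only if everything the user asked for is available.
    if set(want_out) <= avail_in and set(want_eff) <= avail_pre and set(want_act) <= avail_act:
        return useful
    raise ValueError("no composition plan is possible")

F = frozenset
catalog = [  # hypothetical services
    Operation("BookstoreA.searchBookISBN", F({"bookTitle"}), outputs=F({"ISBN"})),
    Operation("BookstoreA.searchBookPrice", F({"ISBN"}), outputs=F({"bookPrice"}),
              activity="searchBookPrice"),
    Operation("BookstoreB.searchBookPrice", F({"bookTitle"}), outputs=F({"bookPrice"}),
              activity="searchBookPrice"),
    Operation("BankA.authorizePurchase", F({"bookPrice", "bankAAccountNumber"}),
              outputs=F({"bankAuthorization"})),
]
useful = select_useful(catalog, {"bookTitle", "bankAAccountNumber", "clientEmail"}, set(),
                       set(), set(), {"searchBookPrice"})
print(sorted(op.name for op in useful))
# ['BookstoreA.searchBookISBN', 'BookstoreA.searchBookPrice', 'BookstoreB.searchBookPrice']
```

For a generic searchBookPrice call, both bookstores' price operations are selected, together with Bookstore A's ISBN lookup. The bank operation, which produces nothing wanted at this node, is left out.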
Figure 3. Sets and rules data flow.
3.2.3 Generation of All Possible Execution Paths. In this step, for each operation node, all possible paths composed of specific Web service operations that implement the generic operation call are generated from the selected useful operations. This step has four sub-steps. First, a set is created to store the outputs that should not be produced; the outputs used in if conditions by ancestors of the node on the path from the initial node are added to it. An operation that produces an output that should not be produced is kept only if it produces an output, generates an effect or implements an activity specified by the user; if it was selected only because it produces an input of a useful operation, it is discarded. Next, for each user output, effect or activity, a vector is created and filled with the operations from the useful operations set that produce this output, effect or activity. Then, all possible paths composed of one element of each vector are generated, so every path produces every output, effect or activity requested by the user. Finally, for each generated path that is not complete, vectors are added with operations from the useful operations set that produce the needed inputs or pre-conditions and do not produce elements of the set of outputs that should not be produced, and the previous sub-step is executed again. Figure 4 shows a possible configuration of vectors and the resulting paths: the vector responsible for output #1 contains operations op1 and op2, and the vector responsible for output #2 contains operations op3 and op4, so the paths that generate both outputs are {op1, op3}, {op1, op4}, {op2, op3} and {op2, op4}.
Figure 4. Vectors and possible paths.
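The vector-and-product construction of this step, including the completion of paths whose inputs are still missing, might be sketched as follows. Assumptions: only inputs, outputs and activities are modeled (effects and pre-conditions would be handled the same way), concepts are matched by name, and every identifier is hypothetical. Operations inside a path are not yet in execution order; that is handled when the execution graph is created.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

@dataclass(frozen=True)
class Operation:
    # Hypothetical descriptor: effects and pre-conditions omitted for brevity.
    name: str
    inputs: frozenset = frozenset()
    outputs: frozenset = frozenset()
    activity: Optional[str] = None

def provides(op):
    """Concepts an operation delivers: its outputs plus its activity, if any."""
    return op.outputs | ({op.activity} if op.activity else set())

def complete(path, useful, available, forbidden):
    """Add one vector per still-missing input and recurse until nothing is missing."""
    produced = set(available).union(*(op.outputs for op in path))
    missing = sorted(set().union(*(op.inputs for op in path)) - produced)
    if not missing:
        return [path]
    # Helper operations must not produce an output that should not be produced.
    vectors = [[op for op in useful if m in op.outputs and not (op.outputs & forbidden)]
               for m in missing]
    paths = []
    for extra in product(*vectors):  # an empty vector yields no combination
        paths.extend(complete(path + extra, useful, available, forbidden))
    return paths

def generate_paths(useful, targets, available, forbidden=frozenset()):
    useful = sorted(useful, key=lambda op: op.name)  # deterministic order
    # One vector per user-requested output or activity.
    vectors = [[op for op in useful if t in provides(op)] for t in targets]
    paths = []
    for combo in product(*vectors):  # one element of each vector
        paths.extend(complete(combo, useful, available, forbidden))
    return paths

F = frozenset
useful = [  # hypothetical useful operations for the searchBookPrice node
    Operation("BookstoreA.searchBookISBN", F({"bookTitle"}), F({"ISBN"})),
    Operation("BookstoreA.searchBookPrice", F({"ISBN"}), F({"bookPrice"}), "searchBookPrice"),
    Operation("BookstoreB.searchBookPrice", F({"bookTitle"}), F({"bookPrice"}), "searchBookPrice"),
]
book_paths = generate_paths(useful, ["searchBookPrice"], {"bookTitle"})
print([[op.name for op in p] for p in book_paths])
# [['BookstoreA.searchBookPrice', 'BookstoreA.searchBookISBN'], ['BookstoreB.searchBookPrice']]

fig4 = [Operation(f"op{i}", outputs=F({"out1" if i <= 2 else "out2"})) for i in range(1, 5)]
fig4_paths = generate_paths(fig4, ["out1", "out2"], set())
print([[op.name for op in p] for p in fig4_paths])
# [['op1', 'op3'], ['op1', 'op4'], ['op2', 'op3'], ['op2', 'op4']]
```

The first call shows path completion: Bookstore A's price lookup needs an ISBN, so a vector holding the ISBN search is added. The second call reproduces the four paths of Figure 4 from two vectors of two operations each.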
3.2.4 Execution Graph Creation. In this step, the execution paths are added to a graph. This execution graph contains all possible contingency plans: if a node's operation fails, we can try to execute the path that starts at a sibling of that node. The workflow invocation fails only if an operation of the node fails and there are no siblings or uncles left to try in the execution graph.
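One way to picture the execution graph is as a tree in which paths sharing a prefix share nodes and sibling subtrees are alternatives. The Python sketch below (hypothetical names throughout) merges paths into such a tree and executes it depth-first. When an operation fails, the next sibling subtree is tried; when all children fail, control returns to the parent, which tries its own remaining siblings (the uncles). It optionally orders paths shortest-first, mirroring the heuristic of Section 3.2.5. It does not undo the side effects of a partially executed path, an issue the text does not discuss.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    op: Optional[str]                          # None for the root
    children: List["Node"] = field(default_factory=list)

def build_execution_graph(paths, shortest_first=True):
    """Merge paths into a tree; sibling subtrees are alternative (contingency) plans."""
    if shortest_first:  # mirrors the heuristic of Section 3.2.5
        paths = sorted(paths, key=len)
    root = Node(None)
    for path in paths:
        node = root
        for op in path:
            child = next((c for c in node.children if c.op == op), None)
            if child is None:
                child = Node(op)
                node.children.append(child)
            node = child
    return root

def execute(node, invoke: Callable[[str], None], trace: List[str]) -> bool:
    """Depth-first execution; on failure, fall back to the next sibling subtree."""
    if node.op is not None:
        try:
            invoke(node.op)
        except Exception:
            trace.append(node.op + " failed")
            return False
        trace.append(node.op)
    if not node.children:
        return True
    # any() stops at the first child subtree that completes; if all of them fail,
    # the caller moves on to this node's own siblings.
    return any(execute(child, invoke, trace) for child in node.children)

paths = [["BookstoreA.searchBookISBN", "BookstoreA.searchBookPrice"],
         ["BookstoreB.searchBookPrice"]]

def invoke(op):  # hypothetical invoker: Bookstore A's price service is down
    if op == "BookstoreA.searchBookPrice":
        raise RuntimeError("service unavailable")

graph = build_execution_graph(paths, shortest_first=False)  # Bookstore A first, as in Figure 2
trace = []
ok = execute(graph, invoke, trace)
print(ok, trace)
# True ['BookstoreA.searchBookISBN', 'BookstoreA.searchBookPrice failed', 'BookstoreB.searchBookPrice']
```

With `shortest_first=True`, the one-operation Bookstore B path would be tried first instead; the choice only changes which alternative is primary, not which alternatives exist.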
Figure 5. Paths with repeated operations.
The execution graph creation is composed of four sub-steps that are applied to each path generated in the previous step. First, repeated operations are eliminated. They may occur because an operation may produce more than one desired output or effect, so it may be present in more than one vector in the step that generates all possible execution paths. Figure 5 shows a possible case of vectors that generate a path (op1 op2 op1) with a repeated operation (op1), which appears twice because it generates both output #1 and effect #1. Next, redundant operations are removed from the path. An operation is redundant if all of its outputs that are used as inputs by other operations are also produced by other operations to fulfill the same inputs. To identify the redundant operations, a graph is created in which the path operations are states; if one operation produces an output that is used as an input by another operation, there is a transition from the state of the first operation to the state of the second, labeled with the input name. For each operation, if all the transitions that leave its state arrive at states that have other transitions arriving with the same label, then the operation is redundant and must be removed. Figure 6 shows a possible instance of the graph used to identify redundant operations. Notice that operation op2 is redundant, because it only produces in1, which is also produced by op1, whereas operation op4 is not, because it is the only operation that produces in2.
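The per-path sub-steps of execution graph creation (dropping repeated operations, removing redundant ones with the transition graph just described, and reordering the remaining operations so that each runs only after its inputs are produced, which is the third sub-step) can be sketched together in Python. Two assumptions are ours, not the text's. First, a virtual sink state consumes the user-requested outputs and activities, so operations that deliver them are not flagged as redundant merely for having no outgoing transitions. Second, operations are checked one at a time and the graph is rebuilt after each removal, so two producers of the same input are never both removed. The operation set is hypothetical, in the spirit of Figures 5 and 6.

```python
from dataclasses import dataclass
from typing import Optional

GOAL = "<goal>"  # hypothetical sink state consuming the user-requested outputs/activities

@dataclass(frozen=True)
class Operation:
    name: str
    inputs: frozenset = frozenset()
    outputs: frozenset = frozenset()
    activity: Optional[str] = None

def transitions(path, targets):
    """(producer, consumer, label) edges of the redundancy graph."""
    edges = set()
    for p in path:
        for c in path:
            if c.name != p.name:
                edges |= {(p.name, c.name, x) for x in p.outputs & c.inputs}
        delivered = p.outputs | ({p.activity} if p.activity else set())
        edges |= {(p.name, GOAL, x) for x in delivered & targets}
    return edges

def remove_redundant(path, targets):
    path = list(path)
    while True:
        edges = transitions(path, targets)
        for op in path:
            leaving = [(c, x) for (p, c, x) in edges if p == op.name]
            # Redundant: every (consumer, label) it feeds also has another producer.
            if all(any(p2 != op.name and c2 == c and x2 == x for (p2, c2, x2) in edges)
                   for (c, x) in leaving):
                path.remove(op)
                break  # rebuild the graph after each removal
        else:
            return path

def reorder(path, user_inputs):
    """Dependency order: what user inputs allow first, then what that unlocks, etc."""
    available, ordered, pending = set(user_inputs), [], list(path)
    while pending:
        ready = [op for op in pending if op.inputs <= available]
        if not ready:
            raise ValueError("path has unsatisfiable dependencies")
        for op in ready:
            ordered.append(op)
            pending.remove(op)
            available |= op.outputs
    return ordered

def clean_path(path, targets, user_inputs):
    deduped = list(dict.fromkeys(path))  # sub-step 1: drop repeated operations
    return reorder(remove_redundant(deduped, set(targets)), user_inputs)

F = frozenset
op1 = Operation("op1", F({"x"}), F({"in1", "in3"}))
op2 = Operation("op2", F({"x"}), F({"in1"}))
op3 = Operation("op3", F({"in1", "in2", "in3"}), F({"out1"}))
op4 = Operation("op4", F({"x"}), F({"in2"}))
raw = [op3, op1, op2, op1, op4]  # op1 repeated, as in Figure 5
print([op.name for op in clean_path(raw, {"out1"}, {"x"})])  # ['op1', 'op4', 'op3']
```

As in the text's Figure 6 discussion, op2 is removed because op1 also supplies in1, while op4 survives as the only producer of in2.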
Figure 6. Graph used to identify redundant operations.
The third sub-step reorders the path operations according to their input and pre-condition dependencies. First, a new path is created with the operations that depend only on inputs and pre-conditions provided by the user. Then, their outputs and effects are made available, so that other operations can be added to the new path, until all operations of the original path have been added. Finally, the paths are added to the execution graph, and the operation node in the workflow graph is replaced by this execution graph.
3.2.5 Execution Graph Optimization. In this step, the execution graph is transformed according to a set of heuristics. Currently, the only heuristic reorders the paths so that the shortest paths are executed first; if they fail, longer paths are executed. The objective of this heuristic is to minimize the communication cost. We plan to use the cost function presented in [8] to improve this step.
4. Algorithm Analysis
In this section, we analyze the algorithm informally, to show intuitively its completeness, correctness and complexity.
4.1 Completeness
Suppose that no composition plan was generated although it was possible to generate one with the user inputs. Then some useful operation was not selected. However, if an operation is useful, then it either produces a user-specified output, effect or activity, or produces an output or effect that is an input or pre-condition of a useful operation. In both cases, the operation is selected by rule #3 and then by rule #5.
4.2 Correctness
Suppose that a composition plan was generated incorrectly. Then it does not produce all the outputs, effects or activities requested by the user. However, every path generated by the algorithm contains at least one element of every vector of operations responsible for producing each requested output, effect or activity, so every path produces all of them, which is a contradiction.
4.3 Complexity
The complexity of the algorithm to compose each activity of the workflow described by the user is $O(N_{op}^2 + N_{us}^{N_{us}})$, where $N_{op}$ is the total number of available operations and $N_{us}$ is the number of useful operations for composing the activity. The first term is the complexity of the useful operations selection step, $O(N_{op}^2)$, and the second is the complexity of the step that generates all possible paths, $O(N_{us}^{N_{us}})$.
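The exponential term comes from the number of candidate paths. As a worked count, ignoring the recursive completion of incomplete paths, assume the path-generation step works with $k$ vectors $V_1, \dots, V_k$, each containing at most $N_{us}$ useful operations. A path takes one operation from each vector, so the count below follows. Reading the stated bound as the case $k \le N_{us}$ is our interpretation of the analysis above, not a claim made explicitly in it.

```latex
|P| \;=\; \prod_{i=1}^{k} |V_i| \;\le\; N_{us}^{\,k} \;\le\; N_{us}^{\,N_{us}} \qquad (k \le N_{us})

\text{Figure 4: } |V_1| = |V_2| = 2 \;\Longrightarrow\; |P| = 2 \cdot 2 = 4
```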
5. The Prototype
The prototype of the Web services composition algorithm allows the user to describe a workflow in a high level language; it then creates the composition plan, shows it to the user, and allows the user to execute the plan or save it as a Java program. Since the proposed algorithm uses a graph abstraction to represent compositions internally, its input can be in virtually any workflow specification language. Currently, the prototype supports workflow specifications described in OWL-S, in DAML-S, or in a high level language that we created to make it easier for ordinary users to describe workflows and to overcome some OWL-S limitations. The available Web services are described in OWL-S or in DAML-S. Usually, a Web service is described in OWL-S according to three views: the service profile, the
service model and the service grounding. The service profile describes the inputs, outputs, pre-conditions and effects of the service. The service model describes the inputs, outputs, pre-conditions and effects of each operation of the service. The service grounding is an extended WSDL file that establishes a reference to an ontology concept for each part of each message and for each operation. Using all three files is unnecessary to describe the data we need to compose services, since their information is partly redundant. To make it easier to describe a Web service, we currently do not use the service profile. However, we expect to use it soon, because it may also contain information about the quality of a service, and we plan to use this information in the graph optimization phase. The prototype was implemented using Tomcat as the Web server, Axis [12] as the Web services engine, Jena [10] to interpret the DAML+OIL and OWL files, and WSIF [11] to invoke the Web service operations. All these products are free and open source software. Currently, Web service operation parameters must be of simple types; we plan to support complex types soon.
5.1 The High Level Workflow Description Language
We created a language to describe workflows with try/catch blocks, assign commands, if and while conditions whose expressions are based only on outputs, and generic calls to any operation that produces an output or effect, in addition to calls to any operation that implements an activity. It is not possible to express these constructs in OWL-S. Our high level workflow description language is an XML-based language with the following elements: workflow, sequence, split, split_join, process, output, effect, if_else, while, until, do_while, do_until, try_catch, terminate, empty and assign. Figure 7 shows a description of the workflow of Figure 1 in our high level language. From BPEL, we borrowed the simple syntax, try/catch blocks and assign commands; from OWL-S, we borrowed most of the other elements.
Figure 7. Sample description of a workflow using our high level language.
5.2 Example
As an example, we used the prototype to implement the workflow shown in Figure 7. The generated execution graph is shown in Figure 8 and its interpretation in Figure 9. Notice that a contingency plan was generated through the discovery, selection and composition of the BookstoreA and BookstoreB Web services. Also notice that the incomplete workflow description was implemented with the help of dynamic composition: the BankA Web service was discovered, selected and added to the composition because it was available and was needed to implement a contingency path.