Semantics Based Verification and Synthesis of BPEL4WS Abstract Processes Ziyang Duan, Arthur Bernstein, Philip Lewis SUNY at Stony Brook Department of Computer Science Stony Brook, NY 11790 {art, dziyang, pml}@cs.sunysb.edu
Abstract We introduce a logic model to formally specify the semantics of workflows and their composite tasks described as BPEL4WS abstract processes. Based on the model, we present a set of inference rules to deduce the strongest postcondition and weakest precondition of a workflow and demonstrate that automatic workflow verification is possible due to the restrictions on data manipulation in an abstract process. We then sketch an algorithm that automatically synthesizes a workflow given its specification and a task library.
1. Introduction A workflow is a computational model of a business process. It is a collection of interconnected activities. A workflow management system (WFMS) provides tools to define, manage and execute workflows. Workflows are either specified as flow-graphs, which define control dependencies among tasks as a directed graph, or as structured programs. The emergence of web service technologies provides a new way for applications to communicate with each other. Many standards are proposed on how to specify the coordination of web service communication and execution as workflows. BPEL4WS [5] is one that has gained wide popularity. Its flow specification language supports both structured constructs and flow-graph constructs. It also supports the concept of an abstract process, which serves as an interface definition that hides implementation details. The concept of an abstract process is important because it can be used to describe the protocol used by the process, P , to communicate with other web services. Hence it can be used to construct other web services which communicate with P without revealing P ’s internal organization. We present a model based on program logic[7] to formally specify workflow semantics. It provides inference rules describing control constructs and can be used to reason
Shiyong Lu Wayne State University Department of Computer Science Detroit, Michigan 48202
[email protected]
about the correctness of an abstract process. Program logic does not support fully automatic reasoning since the choice of loop invariants requires human intervention. However, because of the restrictions on data assignment generally recommended for an abstract process, an automatic verification algorithm can be constructed in our model. We sketch an algorithm that automatically synthesizes a workflow given its specification and a library of available tasks with well-defined semantics. We assume the tasks satisfy the ranking assumption. This means that each task performs something “good” and leads closer to a business goal. Ranking holds in processes we have studied from the financial services industry: trade negotiation processes and corporate actions. These processes share common properties making automatic synthesis particularly appropriate. However, their details are different, and can be quite complex. In addition, many of them have to change frequently due to changes in business rules. Constructing workflows from these tasks manually is costly and error-prone. Automatic synthesis can provide a solution for this problem. The main contributions of this paper are: 1. a logical framework in which the semantics of tasks and workflows can be formally specified 2. a set of inference rules for reasoning about the correctness of workflows. 3. a verification algorithm which is both complete and sound. 4. a workflow synthesis algorithm.
2. Process Model Separation of computation from control A major advantage of describing business processes as workflows is to separate the details of computation from control flow. Details should be contained in task implementations, whereas workflows specify the execution order of tasks. Tasks publish only their interfaces; implementation details are hidden. This separation supports interdomain task integration and task re-implementation. Workflows are data-dependent. Thus, a mortgage workflow evaluates a customer based on his credit, and produces
a status. The workflow may subsequently branch on the status. Hence, a workflow language has to have the capability of specifying data-dependent behavior (i.e., conditional statements). However, the computation that determines status complicates the description and need not be visible to communicating web services. Furthermore, it may be proprietary to the organization. To emphasize only the control flow aspect of a workflow, we limit the computational capabilities of the BPEL4WS abstract process. We call our language Workflow Modeling Language (WFML). Similar to an abstract process, a workflow specified in WFML is not executable because it lacks computational power, but it defines the control logic among tasks, and can be used to describe the protocol that should be followed by all parties interacting with the workflow. Tasks and task library A task has a name and sets of input and output parameters. We assume that a task in a business process, P , corresponds to an operation invoked by P and implemented within and exported by a remote web service, which is described in a WSDL document. We focus on the abstract (e.g., interfaces), as opposed to the concrete (e.g., bindings), aspect of WSDL to describe tasks. All tasks used in business process specification form a task library, which corresponds to one or more WSDL definitions. Flow control A workflow specification contains an activity body, and a set of variables. The body specifies the control flow logic using Invoke, a restricted form of Assignment, Sequence, Switch, and While. The model has been extended to include Flow, but we do not discuss that here. The only form of assignment allowed in WFML is opaque assignment defined in BPEL, which assigns to a variable a value that has been nondeterministically chosen from the variable’s domain. The actual value assigned is not specified. The postcondition of opaque assignment can only assert that a value has been assigned. This restricts the language to describing only control flows, and serves as an abbreviation of the computation which might appear in a fully executable description of the workflow. Only the while loop is provided in BPEL. Hence, an assignment to a loop counter variable is needed in a BPEL program to construct a for loop. To eliminate the need for such an assignment statement but to provide for-loop capability, we introduce the FOR loop construct. Since this can be easily simulated in BPEL, this addition does not add any new functionality not present in a BPEL abstract process.
3. Logical Model We describe a simple logical framework on which the semantics of tasks and workflows can be formally specified
in WFML. In this model, we describe the execution state of a workflow using propositional logic. A proposition asserts a basic fact about the workflow state. A literal is either a proposition or its negation. All the literals that are relevant to describing workflows form the universe, U. The universe comes from the workflow specification and the library. For example, a telecom company might have a task that strings a phone line to a home, making the proposition wired true, and another task that installs phone service. Then wired is an element of U, and it is a postcondition of the former task and a precondition of the latter task. Since propositions of this type are the result of task execution, we refer to them as task propositions. If x is a variable, then gen(x) is a proposition in U asserting that a value from x’s domain has been assigned to x (as a result of an opaque assignment or the the execution of a task). Initially, the value of x is undefined, and gen(x) is false. We include gen(x) in the set of task propositions. Conditions on variables are a component of the workflow state. Thus if x is an integer variable and a workflow has a SWITCH with condition x ≤ 1, then x ≤ 1 is part of the workflow state on one branch of the switch. A condition is also a proposition, although not a task proposition. State: At any time, the state s of a workflow execution is a set of true literals in U. A proposition p is defined if either p or p¯ ∈ s, which means p is true or false, respectively. p is undefined if neither p nore p¯ is in s. Hence a state assigns a subset of literals in U Boolean values. Assertion: An assertion is a well-formed formula (wff) defined on U. A well-formed formula (wff) is a sentence in propositional logic. A literal is a wff; A ∧ B, A ∨ B, and ¬A are wffs if A and B are wffs. Thus an assertion can be evaluated to true, false, or undefined on a state based on the assignments to the literals. Action: An action is used to formally describe the operational semantics of a task or workflow by specifying its effect on the workflow state. An action performs an update operation on a state s. The effect of the update can be nondeterministic and conditional, therefore the result is a set of possible new states. Hence an action is a function that maps an input state to a set of possible output states. Computing pre- and postconditions Given the precondition, P , of an action, A, we can compute the strongest postcondition using function Result(P, A). Similarly, given the postcondition, Q of A, we can compute the weakest precondition using the function Residue(A, Q). Semantics of Tasks and Workflows A task, T has a precondition, P , which must evaluate to true when its execution is initiated. T ’s effect is described by an action, A, that specifies the propositions whose value it might change. T ’s frame semantics are given by the triple {P }T [A].
Opaque assignment on variable x makes gen(x) true and nodeterministically updates the value of condition propositions that depend on x, such as x > 0. It can be described as a nondeterministic action: AssignAct(x) = [gen(x) ∧ (x > 0), gen(x) ∧ ¬(x > 0)]. The square brackets indicate a comma list of nondeterministic results. The semantics of an arbitrary activity, A, (including an entire workflow) can be described by the triple {P } Activity {Q}, where P and Q are its pre- and postconditions. Inference rules Given an activity’s precondition, the inference rules of program logic can be used to generate an annotation consisting of assertions describing the state at control points. Whereas the need to determine appropriate loop invariants makes the automatic generation of an annotation impossible for arbitrary programs, the restrictions on assignment in WFML eliminate this problem.
4
Workflow analysis
Given the semantics of each task and the workflow precondition, an annotation algorithm automatically annotates and verifies a workflow in two steps. • Starting from the workflow precondition, get strongest assertion at each control point by recursively determining the pre- and postconditions of all nested activities using Result and the inference rules. • Check that the annotation satisfies all task precondition. If a precondition of an activity is not satisfied, the workflow is incorrect and the procedure returns false, otherwise, it returns true. A workflow is correct if the postcondition produced by the algorithm implies the workflow postcondition contained in its specification. A workflow, W , can be put into the task library and invoked by other workflows in the same manner as a task. This can be done by deriving its frame semantics: we need to describe W as {P }W [A], where P is W ’s precondition and A is its equivalent action. Since, in contrast to a task, a workflow transitions through intermediate states, its frame semantics are somewhat more complex.
5. Automatic synthesis We briefly describe an algorithm to synthesize a workflow. The input is a task library, and a workflow specification: its pre- and postcondition. The algorithm produces a workflow that satisfies the specification. For simplicity, we assume that task preconditions and workflow pre- and postconditions are conjunctions of positive task propositions and that actions are either conjunctive or nondeterministic. Disjunctive preconditions and conditional actions can
be handled by partitioning a task into multiple tasks. The model can be extended to handle conditions and some aspects of negation. We assume that the library is ranked: • Task propositions are associated with ranks, which are positive integers. A task proposition and its negation have the same rank. • A task has at least one positive task proposition in its action, called its primary output, that has higher rank than any task proposition in its precondition and any negative task proposition in its action.
5.1
Finding a path
We first discuss an algorithm that assumes that tasks are all deterministic. A path is a sequence of tasks in a workflow which corresponds to a BPEL sequence construct. We can find a path that satisfies a given postcondition using a closure algorithm that calculates a closure set of literals, CL. Each ∈ CL is associated with a task T , which has as a primary output. can be made true through a path terminating with T . F indLiteralP ath finds a path for a given literal by first locating the task T from the closure set, then finding a path that implements the precondition of T and appends T at the end. Procedure F indP ath finds a path P ath(G) given a conjunctive postcondition G. It first sorts the literals in G from highest rank to lowest as g 1 . . . gm . It then uses F indLiteralP ath to find paths for each gi and concatenates them in index order. Ranking guarantees that the resulting workflow meets the required specifications.
5.2
Handling nondeterministic output
Tasks may have nondeterministic outputs. We sketch an algorithm that generates a workflow with loops and switches to handle this situation. Finding a backbone path Given a task {P }T [A, B], we replace it with task {P }T [A] if A contains the primary output. In this way we get a new task library that satisfies the assumptions in 5.1 and therefore a backbone path can be found using F indP ath. If T is substituted for T the backbone produces the workflow’s postcondition provided that, when nondeterministic choices are made, all tasks generate their primary outputs. Fixing unresolved control points Nondeterministic choices will not always yield primary outputs: In the above example, T might choose B. Let the resulting control point be c1 . To provide for this, • If the state at c1 implies the state at c2 on the backbone and c1 precedes c2 , then join c 1 and c2 using SWITCH.
• If the state at c1 implies the state c2 on the backbone, and c2 precedes c1 , then join c 1 and c2 using LOOP. • Otherwise,find another path to the workflow postcondition starting from c 1 . This utilizes SWITCH. The process is repeated on every unresolved control point.
6. Implementation We implemented a verification system, and are in the process of implementing a synthesis system based on the ideas discussed above.
7. Related work BPEL Abstract Process WFML captures the essential concepts of a BPEL abstract process. Its control flow constructors are equivalent to those in BPEL. However, it has a stronger limitation on the computational power of the language since only opaque assignment is allowed. However, the use of general assignment in abstract processes is discouraged (but not disallowed) since the intent is to describe only the public aspects of the process. Hence, we propose WFML as an appropriate language for describing the protocol of a business process. DAML-S DAML-S [1](DAML-based Web Service Ontology) provides a core set of markup language constructs for formally describing the properties and capabilities of web services. Hence, it can serve as a basis for automatic verification and synthesis. However, DAML-S does not intend to provide solutions to those problems. It was developed in parallel with BPEL. Instead of using the DAML-S process model, we use WSDL and the BPEL4WS abstract process to specify control flow and semantics. Program logic Our model is closely related to program logic. An action specification is a limited form of assignment statement. Hence, the task library essentially replaces the Assignment Axiom Schema of program logic with a finite set of axioms, one per task. Since the model is a subset of program logic, all inference rules in program logic are valid in our model and can be used in verification. Furthermore, the need to generate loop invariants is eliminated. AI planning Automatic workflow generation is closely related to AI planning: it can be treated as a special planning problem and solved by appropriate planner systems. Work has been done on automatic web service synthesis based on AI planning [2, 3, 10, 8]. Though most approaches map web service synthesis into AI planning, the web service domain has its own special characteristics and current AI planning
is unlikely to provide a complete solution [9, 8]. First, a web service (task) often has multiple outcomes and nondeterministic behavior. Second, workflow languages provide a set of control constructs, and the synthesis algorithm needs to choose the constructs wisely in order to produce a wellstructured workflow. Classic planning focuses on generating a path for conjunctive goals and does not consider actions with conditional or nondeterministic effects. Recently, planning algorithms have been developed to handle more complex situations, such as conditions and nondeterminism (i.e., [6, 4]). However, they are not intended to generate structured workflows. We have identified the ranking assumption as generally applicable to tasks used in business processes. The assumption is not applicable in planning since tasks are generally primitive and it is not possible to identify a primary output. However, ranking greatly simplifies the synthesis problem.
References [1] A. Ankolekar, M. Burstein, J. Hobbs, O. Lassila, D. Martin, D. McDermott, S. McIlraith, S. Narayanan, M. Paolucci, T. Payne, and K. Sycara. Daml-s: Web service description for the semantic web. In Proc. 1st Intl. Semantic Web Conf., 2002. [2] D. Berardi, D. Calvanese, D. G. Giuseppe, M. Lenzerini, and M. Mecella. Automatic composition of e-services that export their behavior. In The First International Conference on Service Oriented Computing, Trento, Italy, December 2003. [3] M. Carman, L. Serafini, and P. Traverso. Web service composition as planning. In ICAPS 2003 Workshop on Planning for Web Services, Trento, Italy, June 2003. [4] A. Cimatti, M. Roveri, and P. Traverso. Automatic OBDDbased generation of universal plans in non-deterministic domains. In Proc. of the 15th National Conference on Artificial Intelligence, pages 875–881, Wisconsin, 1998. [5] F. Curbera, Y. Goland, J. Klein, F. Leymann, D. Roller, S. Thatte, and S. Weerawarana. Business Process Execution Language for Web Services, Version 1.1, 2003. http://www106.ibm.com/developerworks/library/ws-bpel/. [6] E. Hansen and S. Zilberstein. Lao*: A heuristic search algorithm that finds solutions with loops. Artificial Intelligence, 129(1-2):35–62, 2001. [7] C. A. R. Hoare. An axiomatic basis for computer programming. Comm. ACM, pages 576 – 580, Oct. 1969. [8] M. Sheshagiri, M. desJardins, and T. Finin. A planner for composing services described in daml-s. In ICAPS 2003 Workshop on Planning for Web Services, Trento, Italy, 2003. [9] B. Srivastava and J. Koehler. Web service composition - current solutions and open problems. In ICAPS 2003 Workshop on Planning for Web Services, Trento, Italy, June 2003. [10] D. Wu, B. Parsia, E. Sirin, J. Hendler, and D. Nau. Automating DAML-S web services composition using SHOP2. In Proc. 2nd Intl. Semantic Web Conf., Florida, 2003.