International Journal of Cooperative Information Systems
© World Scientific Publishing Company
COMPOSING SERVICES INTO STRUCTURED PROCESSES
RIK ESHUIS and PAUL GREFEN
Eindhoven University of Technology, School of Industrial Engineering
P.O. Box 513, 5600 MB Eindhoven, The Netherlands
{h.eshuis|p.w.p.j.grefen}@tue.nl
Service composition languages like BPEL and many enactment tools only support structured process models, but most service composition approaches only consider unstructured process models. This paper defines an efficient algorithm that composes a set of cooperative services with their dependencies into a structured process. The algorithm takes as input a dependency graph and returns a structured process model that orchestrates the services while respecting their dependencies. The algorithm is embedded in a lightweight, semi-automated service composition approach, in which first dependencies between services are derived in a semi-automated way and next the algorithm is used to construct a structured composition. The approach has been implemented in a prototype that supports the dynamic formation and collaboration of dynamic virtual enterprises using cross-organizational service-oriented technology.

Keywords: Service composition; structured processes; collaborative cross-organizational process management
1. Introduction

Today, organizations focus more and more on their core competences, relying on competences of other organizations to deliver requested products or services. The resulting cross-organizational collaborations give rise to networked organizations, in which one organization acts as main contractor and the network partners cooperate by delivering products and services to the main contractor. The market dictates that these networks are highly agile and efficient.1 This typically means that networks are formed ad hoc, depending upon a specific service requested by a customer.

One of the most promising technologies to support this way of working is web services,2,3 which are self-contained functions that are defined in an implementation-independent way, usually by means of WSDL.4 Their descriptions are published in a publicly accessible repository, which is searched by service consumers to find and invoke specific web services, which are offered by service providers. A main contractor can play the role of service consumer. Upon request of a customer, the main contractor then searches the repository for basic services offered by service providers. Next, the main contractor can orchestrate these services into a composite service that meets the customer's request. When the composite service is enacted, the main contractor (service composer) invokes the basic services in the
order specified in the service orchestration.

While BPEL5 has emerged as the standard language for describing service compositions, the actual problem of how to orchestrate a set of given services still remains open. Formal approaches6,7,8,9,10,11 focus on automated service composition and require that the effect of each service is modeled with a pre- and post-condition. This allows the application of techniques from AI planning and program synthesis to orchestrate the services into a composite service. However, these approaches burden the service provider with the task of formally specifying its services and annotating its WSDL specifications with this additional information. Moreover, pre- and post-conditions may be hard to state in real-world business applications.

Other approaches are more pragmatic and derive graph-based compositions in a semi-automated way, by analyzing input/output dependencies between services and translating these directly into control flow.12,13 In the resulting graph-based compositions, services are coordinated through control elements like AND-forks, AND-joins, XOR-forks, and XOR-joins. Though such approaches do not require any annotation of the web services, they suffer from another disadvantage: graph-based process models can contain flaws, for example deadlocks. Executing such a flawed composition could result in failures at run-time, which involves considerable expense to repair. Sometimes such flaws can already be detected at design-time, but such checks can be computationally expensive, and several design iterations may be needed to find a flawless composition.

This problem with graph-based models disappears if the models are block-structured, or structured for short^a, i.e., if the models consist of blocks and each block has a unique start and a unique end point.14,15 Blocks can be nested, but are not allowed to overlap. Models that are not block-structured are similar to programs containing goto statements. Due to the block structure, structured models do not contain any flaws, such as deadlocks. The most important orchestration languages, BPEL5 and OWL-S,16 are structured (BPEL is actually semi-structured: it does allow cross links, but only between parallel services and parallel blocks). Next, process enactment tools, which can be used for executing orchestrations, all support structured process models.17 Thus, expressing service compositions as structured process models is a reasonable choice.

^a Structured process models should not be confused with structured processes, which are processes whose structure can be specified in any model, including graph-based ones. For example, processes supported by groupware are unstructured.

However, composing a structured process model from a set of services and their dependencies is not straightforward, since translating each dependency directly into control flow may lead to an unstructured composition. For example, a direct mapping from dependencies to control flow may result in a composition in which a fork node matches multiple join nodes. This corresponds to a block that has multiple end nodes, which is not allowed in (block-)structured models. Also, there can be services whose dependencies do not correspond to any structured composition.
Consequently, the main problem is to analyze the dependencies in such a way that a structured composition can be derived. Section 2 illustrates these issues in more detail.

This paper defines an algorithm that composes a given set of services with their dependencies into a structured process model. The services and their dependencies are specified in a dependency graph, which generalizes a flow graph. The structured process model orchestrates the services while respecting their dependencies. The algorithm generalizes an existing algorithm for structuring flow graphs18 and runs in almost linear time. Next, the paper defines constraints on dependency graphs under which the algorithm operates correctly.

The composition algorithm is embedded in a semi-automated two-step approach for composing services into structured processes. In the first step, dependency graphs are constructed, while in the second step, a structured process model is constructed from the dependency graph. While the second step is fully automated, the first step requires user input, as argued in Section 3.1. In this paper, dependencies are derived from data flow between services, but dependencies can also be due to events or business logic constraints. We have implemented such a two-step service composition approach in a prototype system in the context of the IST CrossWork project,19 which has developed technology to support the dynamic formation of instant virtual enterprises.20 In this setting, organizations offer their business processes as services to a virtual enterprise. Section 6 gives more details on the CrossWork project and how service composition has been applied there.

Though there is ample related work on service composition (see Section 7), there are but a few service composition approaches that support the construction of structured process models.6,11 However, these approaches are fully automated, so services need to be formally specified using pre- and post-conditions, whereas our semi-automated approach is much more lightweight, since it only requires dependencies between services. Moreover, the actual composition algorithms used in these approaches have a non-polynomial time complexity, so they are much less efficient than the almost linear-time algorithm defined in this paper. In sum, the main contribution of this paper is an efficient algorithm for composing a set of services into a structured process, and a lightweight, semi-automated service composition approach in which the algorithm is embedded. The resulting structured process compositions can be straightforwardly encoded in standard composition languages like BPEL5 and OWL-S.16 Another contribution is the application of well-known techniques from compiler theory18,21 to the field of service composition.

Since the focus of the paper is on the control flow aspects of service composition, we consider here structured process models without explicit data flow. So variables and parameters of services are not modeled. We do model choice conditions to enforce a proper execution order of the services, but these conditions are set in the composition model, not in a service; see Section 3 for more details.

The remainder of this paper is organized as follows. Section 2 introduces the
composition algorithm by showing examples of dependency graphs and the resulting structured process compositions. It also shows that not every dependency graph has a corresponding structured composition. Section 3 defines dependency graphs and structured process models. Section 4 defines the actual composition algorithm, which fully automatically constructs a structured process model from a dependency graph. Next, constraints on dependency graphs are defined under which the algorithm operates correctly, i.e. the returned composition corresponds to the dependency graph. Also the correctness and complexity of the algorithm are analyzed. Section 5 defines a lightweight semi-automated service composition approach in which the algorithm of Section 4 is embedded. Section 6 discusses how the semi-automated service composition approach has been implemented in a prototype system that supports the dynamic formation of instant virtual enterprises in the CrossWork project.19 Next, a medium-sized case study from CrossWork is presented, and the approach is evaluated. Section 7 gives an overview of related work. Finally, Section 8 winds up with conclusions and further work. This paper is partly based on preliminary work reported at BPM 2006.22
2. Overview of composition algorithm

To introduce the salient features of the composition algorithm, we consider a quotation process for a production company. In the process, a client asks for a quotation for an order request. First, the request is checked for completeness and the customer is asked for clarification if needed. Next, the amount of credit of the customer is checked. If the check is not successful, the quotation is canceled and the client is informed through the customer service department. Otherwise, a quotation is sent to the customer, consisting of a cost statement plus a production and shipment plan. The client can decide either to accept the quotation by paying the cost statement, or to reject the quotation. If the quotation is rejected, the customer service department is informed, so that they can do some follow-up inquiries with the customer. Finally, the quotation is archived.

Figure 1 shows the dependency graph of this process, which consists of a number of services and dependencies between them (see Section 3.1 for a formal definition of dependency graphs). In the figure, a dotted arrow from A to B indicates that B depends on A. For example, service Check Request depends on input from Receive Request. This dependency information can, for example, be derived from the signatures of the services (see Section 5). If a service has more than one incoming (outgoing) dependency, then either all the predecessor (successor) services are executed (AND) or only one (XOR). To comply with this constraint, an empty service Dummy has been inserted immediately after Check Credit. Section 3.1 gives more details on this insertion step.

Figure 2 shows the structured composition constructed by the algorithm. The composition is represented as a process tree, in which the leaf nodes are the services and the internal nodes represent the blocks.
Fig. 1. Services for quotation process and their dependencies
g1 = in(Handle Payment) ∨ in(Inform Customer Service); g2 = in(Handle Rejection) ∨ in(Cancel Order)
Fig. 2. Structured composition for Figure 1
Each block specifies an ordering constraint, for example parallelism (AND) or sequence (SEQ). For a sequence node, the ordering of its subnodes is from left to right, so for example Receive Request is done
Fig. 3. Dependency graph not having any corresponding structured composition
before Check Request. Note that each process tree represents a block-structured process, so unstructured processes cannot be expressed in process trees. A structured process language, in which process trees can be expressed textually, is defined in Section 3.

Compared to Figure 1, two additional dummy services, Dummy1 and Dummy2, have been introduced, which correspond to empty else-branches of an if-statement. They are automatically inserted by the algorithm. Service Dummy is not explicitly represented in the composition; it is implicitly represented by the composite AND node containing Make Production Plan, Get Shipment Plan, Get Price Rates and Make Fulfillment Schedule. When constructing the composition, the algorithm ignores empty (dummy) fork services.

The structured composition can specify executions that are not possible according to the dependency graph. For example, according to the control flow skeleton in Figure 2, an execution is possible in which Handle Payment is executed before Inform Customer Service, even though the XOR-join Archive in the dependency graph specifies that only one of these services is done, not both. To rule out such spurious executions, we use guard conditions that state that a service can only be executed if one of its immediate predecessor services, according to the dependency graph, has been executed. For example, in Figure 2 service Inform Customer Service can only be started if service Handle Rejection or service Cancel Order has been executed. The algorithm described in Section 4 generates these guard conditions automatically.

Finally, there are dependency graphs that do not have a corresponding structured composition, for example the one in Figure 3. The AND-fork inside the loop in Figure 3 specifies that each iteration of the loop starts a new instance of service D. No structured composition supports such a dynamic instantiation of service D. In Section 4.2 we define healthiness constraints on dependency graphs which rule out dependency graphs like Figure 3. If a dependency graph satisfies the healthiness constraints, the algorithm returns a corresponding structured composition.
3. Preliminaries

We define dependency graphs and a language for expressing structured processes. The algorithm in the next section takes as input dependency graphs and outputs structured processes that are expressions in this language.
3.1. Dependency graphs

First we define and explain dependency graphs. Next, we describe a semi-automated approach for constructing dependency graphs.

3.1.1. Definition

Let S be a set of services and D a set of empty dummy services. A dependency graph is a tuple (N, E, start, end, forkType, joinType) with
• N = S ∪ D the nodes of the graph,
• E ⊆ N × N the set of edges, representing dependencies,
• start ∈ N the unique start node such that for each node n ∈ N there is a path from start to n,
• end ∈ N the unique final node such that for each node n ∈ N there is a path from n to end,
• forkType : S → {AND, XOR} a partial function that labels a service having more than one outgoing dependency with its branching type,
• joinType : S → {AND, XOR} a partial function that labels a service having more than one incoming dependency with its branching type.

The notion of dependency graph can be seen as an extension of the definition of flow graphs18,21 with parallelism (branching types can have type AND). A minor difference is that nodes in flow graphs model assignments, whereas in dependency graphs they model services. Liang et al.13 also use dependency graphs in the context of service composition, but do not consider structured compositions and do not consider flow graphs.

A node having more than one incoming dependency is called a join, whereas a node with more than one outgoing dependency is called a fork. Functions forkType and joinType assign a single type to all outgoing resp. incoming dependencies of a node, which might seem restrictive. For example, languages like XPDL23 allow some incoming or outgoing arcs to have type AND while others have type XOR. Thus, the dependency graph in Figure 4(a) would be valid in XPDL. However, we rule it out since it is ambiguous: it is not specified whether, for example, B and D can both be executed or are exclusive. This restriction can be overcome by using empty dummy services, which have no implementation but whose sole purpose is to describe dependencies. For example, Figure 4(b) shows a partial dependency graph (without end node) with the same services and dependencies as in Figure 4(a), but now two empty services are included. Now the dependency between, for example, B and D is made precise: either B is executed or D, but not both. In Figure 1, an empty service Dummy has been inserted for the same reason.
Fig. 4. Invalid partial dependency graph (a) and valid partial dependency graph (b) with the same dependencies as (a)
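To make the tuple definition above concrete, the following is a minimal sketch, in Python, of how a dependency graph could be represented. It is purely illustrative (the CrossWork prototype discussed in Section 6 is implemented in Java); the pre and post methods correspond to the auxiliary functions introduced next.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class DependencyGraph:
    """Dependency graph (N, E, start, end, forkType, joinType)."""
    nodes: Set[str]                              # N = S ∪ D (services and dummy services)
    edges: Set[Tuple[str, str]]                  # E ⊆ N × N; (a, b) means b depends on a
    start: str                                   # unique start node
    end: str                                     # unique end node
    fork_type: Dict[str, str] = field(default_factory=dict)  # node -> "AND" | "XOR"
    join_type: Dict[str, str] = field(default_factory=dict)  # node -> "AND" | "XOR"

    def pre(self, n: str) -> Set[str]:
        """Nodes on which n depends (predecessors of n)."""
        return {x for (x, y) in self.edges if y == n}

    def post(self, n: str) -> Set[str]:
        """Nodes that depend on n (successors of n)."""
        return {y for (x, y) in self.edges if x == n}

    def is_fork(self, n: str) -> bool:
        return len(self.post(n)) > 1

    def is_join(self, n: str) -> bool:
        return len(self.pre(n)) > 1
```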
When defining the algorithm in Section 4, we use a few auxiliary functions on dependency graphs. Given a node n, its set of precondition nodes, written pre(n), consists of those nodes on which n depends. Symmetrically, the set of postcondition services of n, written post(n), consists of those services that depend on n:

pre(n) = { x | (x, y) ∈ E ∧ y = n }
post(n) = { y | (x, y) ∈ E ∧ x = n }

3.2. Structured process language

Usually, structured processes are defined as a syntactic restriction of unstructured processes.14,24 However, we take a different approach and explicitly define a language for structured processes. So, unstructured processes cannot be expressed in the language. Let S be a set of services, ranged over by s, and let D be a set of dummy services, ranged over by d. The language of structured processes, ranged over by P, is generated by the following grammar:

P ::= seq | and{seq, seq, .., seq} | xor{grd seq, grd seq, .., grd seq} | repeat seq until grd | atomic
atomic ::= s | d
seq ::= < P, P, .., P >
grd ::= in(atomic) | grd ∨ grd | true

Expression < P, P, .., P > indicates that the elements in the list are executed one by one, according to the order specified in the list; and{seq, seq, .., seq} specifies that the elements in the set are executed in parallel; xor{grd seq, grd seq, .., grd seq} specifies that exactly one of the guarded expressions in the set is executed; and repeat seq until grd specifies that expression seq is executed repeatedly until condition grd holds. For a service x ∈ S ∪ D, guard in(x) is true if x was executed previously. The predicate abbreviates a test of a local variable v_x that is created for each service x. Variable v_x is initially false, set to true immediately after x is executed, and set to false immediately after the test has been done.
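Purely as an illustration, the grammar can be mirrored by a small set of node classes for process trees; the class and field names below are our own and only serve to make the shape of structured process expressions explicit.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Guards: in(x), disjunctions of guards, or the constant true
@dataclass
class In:
    service: str               # in(x): true if service x was executed previously

@dataclass
class Or:
    left: "Guard"
    right: "Guard"

TRUE = None                    # we use None for the guard 'true'
Guard = Union[In, Or, None]


# Process expressions
@dataclass
class Seq:                     # < P, P, .., P >: execute the elements in order
    elements: List["Process"]

@dataclass
class And:                     # and{seq, ..}: execute the branches in parallel
    branches: List[Seq]

@dataclass
class Xor:                     # xor{grd seq, ..}: execute exactly one guarded branch
    branches: List[Tuple[Guard, Seq]]

@dataclass
class Repeat:                  # repeat seq until grd
    body: Seq
    until: Guard

@dataclass
class Atomic:                  # a service s or a dummy service d
    name: str

Process = Union[Seq, And, Xor, Repeat, Atomic]


# Example expression: xor{ in(W) <X>, in(Y) <Z> }
example = Xor([(In("W"), Seq([Atomic("X")])), (In("Y"), Seq([Atomic("Z")]))])
```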
To avoid cluttering the compositions with variable definitions, assignments, and tests, we decided to use a special predicate in, rather than introducing variables and tests explicitly.

In the algorithm, we use some additional functions, which are defined on the parse trees of the structured process expressions. Leaf nodes of a parse tree correspond to services. An internal node of a parse tree is called a block. Given a block b, we denote by children(b) the children of b and by parent(b) the unique parent block of b. Since we consider a hierarchical structure, each block has one parent, except the root of the hierarchy, which has no parent. For example, children(xor{in(W) X, in(Y) Z}) = {in(W) X, in(Y) Z}. Next, services(b) denotes the set of services that are a direct or indirect child of b. For example, services(and{seq[X, Y], seq[Z]}) = {X, Y, Z}.

Structured process expressions are graphically represented by their parse trees; an example is given in Figure 2. Guards of an XOR block are shown at the edges that connect the XOR block to its children, as illustrated in Figure 2. For example, for the structured process in Figure 2, the XOR block in the top right followed by Archive represents a sequence expression consisting of an xor block and the service Archive.

4. Composition algorithm

In this section, we define an algorithm that takes as input a dependency graph meeting specific healthiness constraints and outputs a structured composition that satisfies the dependencies. First, we explain how block structures can be inferred from unstructured dependency graphs. Next, we define healthiness constraints on dependency graphs; if the constraints are satisfied, the algorithm can be applied correctly. We then define two preprocessing steps on dependency graphs. The first preprocessing step helps to simplify the algorithm, while the second one leads to structured compositions with more parallelism. Next, we define the actual algorithm, which extends an existing algorithm by Baker18 for structuring flow graphs in which each loop has a single entry point in two different ways. First, we use guard conditions to ensure proper control flow, whereas Baker uses goto statements for this purpose; goto statements are not expressible in structured process languages. Second, dependency graphs allow branching type AND in addition to type XOR, whereas flow graphs only use type XOR.

4.1. Finding structure in dependency graphs

Since the dependency graph is unstructured, inferring the structure from the dependency graph is not straightforward. The approach sketched here is based on Baker.18 The notions introduced are used in the algorithm.

The first problem is to identify the nesting structure for the loops of the structured process to be constructed. To this end, we use the concept of dominator.18,21
Fig. 5. Dominator tree for Figure 1
For a dependency graph (N, E, start, end, forkType, joinType), node p dominates node q if every path from start to q passes through p. Node p is the immediate dominator of node q if p dominates q and every dominator of q other than p also dominates p. Let DOM(p) denote the immediate dominator of p; DOM(p) is unique for each p.21 Figure 5 represents the dominators for Figure 1 in a dominator tree.21 In the tree, each parent is the immediate dominator of its children.

Next, we identify loops. A back edge is an edge (x, y) ∈ E in the dependency graph such that y dominates x,21 so every path from start to x passes through y. Intuitively, a back edge closes a loop that is started at y. An edge that is not a back edge is called a forward edge. Figure 1 has only one back edge: (Ask For Clarification, Check Request); all the other edges are forward edges. For each back edge (x, y), its natural loop can be computed, which is the set of nodes that can reach x without going through y, plus y itself. Node y is the header of the natural loop. If two natural loops have different headers, they are either disjoint or one is nested inside the other.21 If two different natural loops have the same header, they can be merged.21
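As an illustration only, the following sketch computes dominators, back edges, and natural loops for the dependency-graph representation sketched in Section 3.1. It uses a simple iterative fixpoint rather than the almost linear-time algorithm of Lengauer and Tarjan that is used in the prototype (Section 6), and the function names are ours.

```python
def dominators(graph):
    """Compute dom[n]: the set of nodes that dominate n (simple iterative fixpoint)."""
    nodes, start = set(graph.nodes), graph.start
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            preds = graph.pre(n)
            new = set.intersection(*(dom[p] for p in preds)) if preds else set(nodes)
            new |= {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def back_edges(graph, dom):
    """Edges (x, y) such that y dominates x; each one closes a loop headed by y."""
    return {(x, y) for (x, y) in graph.edges if y in dom[x]}


def natural_loop(graph, x, y):
    """Natural loop of back edge (x, y): nodes that can reach x without going through y, plus y."""
    loop, stack = {y, x}, [x]
    while stack:
        n = stack.pop()
        for p in graph.pre(n):
            if p not in loop:
                loop.add(p)
                stack.append(p)
    return loop
```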
A node that is the target of some back edge is called a loop node. For a non-loop node n, its head, denoted HEAD(n), is the header of the most nested natural loop containing n; in the structured composition, n is contained in the repeat statement constructed for HEAD(n). For a loop node l, HEAD(l) is the header of the most nested loop that strictly contains the natural loops headed by l. If n is not contained in a loop, HEAD(n) is undefined. For convenience, if HEAD(n1) and HEAD(n2) are undefined, then HEAD(n1) = HEAD(n2) by definition. For the dependency graph in Figure 1, we have HEAD(Check Request) = HEAD(Ask For Clarification) = Receive Request. For all other nodes n, HEAD(n) is undefined.

For each node p, DOM and HEAD are used to define the set FOLLOW(p), containing all nodes that come immediately after p at the same level of nesting in the structured composition. For a fork node p, so |post(p)| > 1, let

FOLLOW(p) = { q | q is a join ∧ p = DOM(q) ∧ HEAD(p) = HEAD(q) }.

For a loop node p, so some back edge enters p, define

FOLLOW(p) = { q | DOM(q) is in a natural loop headed by p }.

For each other node p, define

FOLLOW(p) = { q | HEAD(p) = HEAD(q) ∧ p = DOM(q) }.

The following properties of FOLLOW are important for the algorithm.18 For every pair of distinct nodes p, q, the sets FOLLOW(p) and FOLLOW(q) are disjoint. Every node is in a FOLLOW set, except the first nodes at each level of nesting. Conversely, the last node at each level of nesting has an empty FOLLOW set. For Figure 1, FOLLOW(Receive Request) = {Check Credit}, FOLLOW(Check Credit) = {Inform Customer Service, Archive}, and FOLLOW(Dummy) = {Make Fulfillment Schedule, Send Quote}; all other nodes have empty FOLLOW sets. In particular, Archive ∉ FOLLOW(Send Quote), since Send Quote does not dominate Archive.
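Continuing the illustrative sketch, the three cases of the FOLLOW definition can be transcribed directly. The parameters idom (immediate dominators), head (the HEAD function), loop_headers (targets of back edges), and natural_loops (merged natural loops per header) are assumed to have been computed along the lines of the previous sketch and are not repeated here.

```python
def follow(graph, p, idom, head, loop_headers, natural_loops):
    """FOLLOW(p) according to the three cases of its definition.

    idom: node -> DOM(node); head: node -> HEAD(node) or None if undefined;
    loop_headers: set of loop nodes; natural_loops: header -> set of loop members.
    """
    if len(graph.post(p)) > 1:                      # fork node
        return {q for q in graph.nodes
                if graph.is_join(q) and idom.get(q) == p and head.get(q) == head.get(p)}
    if p in loop_headers:                           # loop node
        return {q for q in graph.nodes if idom.get(q) in natural_loops[p]}
    return {q for q in graph.nodes                  # any other node
            if head.get(q) == head.get(p) and idom.get(q) == p}
```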
4.2. Constraints

For the algorithm, we require that each dependency graph satisfies the following constraints:

C1 Each fork has a subsequent join of the same type.
C2 Each loop has a single entry and a single exit point of type XOR.

Constraint C1 may seem restrictive, but it still allows a fork that belongs to multiple joins or a join that belongs to multiple forks, provided the forks and subsequent joins have the same type. For example, in Figure 1 fork Check Credit has two subsequent joins of the same type. Thus, Constraint C1 allows unstructured dependency graphs like Figure 1. We now explain and motivate these constraints. In the next section, we will formalize them.

C1. We require that each pair of fork f and subsequent join j have the same type, so if j ∈ FOLLOW(f), then forkType(f) = joinType(j). An XOR-fork that is followed by an AND-join would imply a deadlock at the join, whereas an AND-fork followed by an XOR-join would imply multiple executions of the XOR-join. Such behaviors cannot be expressed in structured processes, and therefore such dependency graphs are ruled out by C1.

C2. We require that each loop has one unique entry point, i.e., the dependency graph is reducible.21 Formally stated, for each back edge (x, y), y dominates x, so y is the unique entry point to x. We adopt this constraint to simplify the algorithm. Moreover, using techniques like node splitting,21 irreducible dependency graphs can be transformed into equivalent reducible ones. In addition, we require that each loop has a single exit point, so for each loop node l we have FOLLOW(l) = {x} for some node x, and of all the edges entering x, at most one leaves a node that has l as head. That node is the exit point of the loop. The constraint can be relaxed to allow a loop to have multiple exit points. However, that would require the introduction of boolean variables (similar to the constructions used in compiler theory to structure flow graphs). Since such constructs have been well studied in compiler theory,21 we do not consider them here. Zhao et al.25 focus explicitly on how loops with multiple entries and exits can be rewritten into loops with single entries and single exits.

Finally, as explained in Section 2, parallelism is not allowed to cross the border of loops. Such behavior cannot be expressed in structured processes, and therefore the constructed structured process would not correspond to the input dependency graph. We therefore require that the entry point and exit point of a loop have type XOR. For example, Figure 3 violates C2.

4.3. Preprocessing dependency graphs

Before invoking the algorithm on a dependency graph, we preprocess the graph. In the first step, dummy loop and fork nodes are inserted in order to simplify the algorithm. In the second step, forks hidden inside other forks are eliminated, in order to increase the parallelism in the constructed compositions. In both steps, dummy services are inserted. Step 1 is obligatory, while Step 2 is optional. Step 1 is defined for loop nodes by Baker,18 but Step 2 is not, since flow graphs cannot contain parallelism.

Step 1: Inserting loop and fork nodes. To simplify the algorithm, we require that each loop node and each fork node in the dependency graph is a dummy service. To ensure this, we insert before each loop node n ∈ S a fresh dummy service d ∈ D, which represents the repeat statement in the structured composition. The original loop node n is then the first statement of the repeat statement. Replace each edge (x, n) ∈ E by (x, d) and add dependency (d, n) to E. Similarly, we insert after each fork node f ∈ S a fresh dummy service d ∈ D, which represents the XOR or AND composite node. Replace each edge (f, x) ∈ E by (d, x) and add edge (f, d) to E. For the dependency graph in Figure 1, Step 1 inserts a dummy service for every fork and every loop node, except for Dummy, since that is already a dummy service.
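A minimal sketch of Step 1 on the illustrative graph representation used earlier. The dummy-naming scheme is ours, and carrying the branching type over to the new fork dummy reflects our reading of the step.

```python
def insert_dummies_step1(graph, services, loop_nodes):
    """Step 1: make every loop node and every fork node in S a dummy service."""
    # Step 1a: put a fresh dummy service in front of every loop node in S.
    for n in [n for n in loop_nodes if n in services]:
        d = f"dummy_loop_{n}"                       # fresh dummy; naming scheme is illustrative
        graph.nodes.add(d)
        incoming = {(x, y) for (x, y) in graph.edges if y == n}
        graph.edges -= incoming                      # each edge (x, n) becomes (x, d)
        graph.edges |= {(x, d) for (x, _) in incoming}
        graph.edges.add((d, n))                      # n becomes the first statement of the repeat
    # Step 1b: put a fresh dummy service behind every fork node in S.
    for f in [m for m in list(graph.nodes) if m in services and len(graph.post(m)) > 1]:
        d = f"dummy_fork_{f}"
        graph.nodes.add(d)
        outgoing = {(x, y) for (x, y) in graph.edges if x == f}
        graph.edges -= outgoing                      # each edge (f, x) becomes (d, x)
        graph.edges |= {(d, y) for (_, y) in outgoing}
        graph.edges.add((f, d))
        if f in graph.fork_type:                     # our reading: the dummy takes over f's branching type
            graph.fork_type[d] = graph.fork_type.pop(f)
```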
Step 2: Eliminating implicit forks. In a dependency graph, some joins can have implicit forks, i.e., forks that are part (a subset) of another fork. For example, in Figure 1 join Make Fulfillment Schedule has an implicit fork inside Dummy. Since Make Fulfillment Schedule is in FOLLOW(Dummy), Make Fulfillment Schedule will be at the same level of nesting in the structured composition as Dummy and therefore after Get Price Rates. However, it is more natural that Make Fulfillment Schedule is in parallel with Get Price Rates. To allow suitable nesting and increase the parallelism in the returned structured composition, it is desirable to replace implicit AND-forks by explicit ones. This can be done by inserting a dummy service that acts as an explicit fork.

For this preprocessing step, we only consider the forward edges of each fork, not its back edges. For each AND-join j, take its immediate dominator d, which is an AND-fork by C1. Let succ(d) be the set of forward successor nodes of d, so for each x ∈ succ(d), (d, x) is a forward edge. Let X ⊆ succ(d) be the set of successor nodes of d that are postdominated by j. (Node p postdominates node q if every path from q to end passes through p, and the immediate postdominator is the postdominator closest to q. Just as a dominator is the unique entry point of a block in a structured composition, the postdominator is the unique exit point of a block.) If X ≠ post(d), then d has a successor node that is not postdominated by j. In that case, there is an implicit AND-fork that has an incoming dependency from d and outgoing dependencies to the nodes in X. To make this fork explicit, create a dummy service e with fork type AND, replace for each x ∈ X each edge (d, x) ∈ E by (e, x), and add edge (d, e) to E.

4.4. Algorithm

The structured composition algorithm is listed in Figure 6. It requires as input a set N of nodes, a set E of edges, functions forkType and joinType, a partial function grd that maps edges to guard expressions, and the node current that is to be processed. The algorithm returns a sequential block that starts with current. For a dependency graph (N, E, start, end, forkType, joinType), the initial call is StructuredComposition(N, E, forkType, joinType, ∅, start).

The algorithm has two main parts. In the first part (l. 2-26), the node current is processed by creating a sequential block P containing current and all its indirect successors that are not in FOLLOW(current). In the second part (l. 27-34), the nodes in FOLLOW(current) are processed and the resulting block is appended to P. Finally (l. 35), P is returned as the structured composition for current. We now explain these parts in more detail.
1: procedure StructuredComposition(N, E, forkType, joinType, grd, current)
2:   if current is a fork node then
3:     children := ∅
4:     for n ∈ post(current) do
5:       if n is not in any FOLLOW set then
6:         Cn := StructuredComposition(N, E, forkType, joinType, grd, n)
7:       else
8:         Cn := < dummy_{current,n} >
9:       end if
10:      if forkType(current) = XOR then
11:        Cn := grd(current, n) Cn
12:      end if
13:      children := children ∪ {Cn}
14:    end for
15:    if forkType(current) = XOR then
16:      Pcomp := xor children
17:    else
18:      Pcomp := and children
19:    end if
20:    P := < Pcomp >
21:  else if current is a loop node with successor node x and FOLLOW node n then
22:    Px := StructuredComposition(N, E, forkType, joinType, grd, x)
23:    P := < repeat Px until in(dummy_{current,n}) >
24:  else
25:    P := < current >
26:  end if
27:  if FOLLOW(current) ≠ ∅ then
28:    if |FOLLOW(current)| > 1 then
29:      insert unique FOLLOW node after current
30:    end if
31:    next := the unique node following current
32:    Q := StructuredComposition(N, E, forkType, joinType, grd, next)
33:    P := P Q
34:  end if
35:  return P
36: end procedure

Fig. 6. Algorithm for constructing structured compositions
For the first part (processing current), there are three cases. If current is a fork node (l. 2), a composite block Pcomp having a set children of child blocks is created. Each successor n of current, so n ∈ post(current), is processed. If n is not in a FOLLOW set (l. 5), algorithm StructuredComposition is recursively called with n as current and returns a block that contains n (l. 6). If the fork has type XOR (l. 10), the guard for n is added as a precondition (l. 11); the guard has been constructed earlier, by the code in Figure 7. Otherwise (l. 7), n is in FOLLOW(x) for some node x. Then a block for n will be created when node x is being processed in the algorithm, so when x = current. Therefore, for now a block with a dummy service is created (l. 8). To illustrate this, consider Figure 1: if current = Check Request and n = Check Credit, then a dummy node is created at l. 8, since Check Credit ∈ FOLLOW(Receive Request).
1: f := a fresh node not in N
2: N := N ∪ {f}
3: E_follows := { (x, y) | (x, y) ∈ E ∧ x ∈ services(P) ∧ y ∈ FOLLOW(current) }
4: E_new := { (x, f), (f, y) | (x, y) ∈ E_follows }
5: E := (E \ E_follows) ∪ E_new
6: forkType := forkType ∪ {f → forkType(current)}
7: if forkType(current) = XOR then
8:   grd := grd ∪ { (f, y) → formulaPreCondition(y) | (x, y) ∈ E_follows }
9: end if

Fig. 7. Code for inserting a unique FOLLOW node after current (l. 29 of Fig. 6)
In both cases (l. 5 and l. 7), a block Cn is created for n. This block is added to children (l. 13). Next, a composite block Pcomp is created, whose type depends upon the fork type of current (l. 15). Block Pcomp is appended to P (l. 20).

If current is a loop node (l. 21), then by constraint C2 there is a unique node n that follows current, so FOLLOW(current) = {n}. Since current is a loop node, current also has a unique successor node x due to preprocessing Step 1. For x, a block Px is created by invoking StructuredComposition with x as current (l. 22). The last node of block Px is dummy service dummy_{current,n}. To see why, note that since each loop has a single exit point (C2), there is a unique predecessor p of n that is part of the loop. Moreover, node p must be a fork, since p can reach both the start of the loop and n. Since n ∈ FOLLOW(current), the algorithm will create a dummy service dummy_{current,n} when fork p is processed as current. The loop can be left once dummy_{current,n} is reached, so once in(dummy_{current,n}) is true.

If current is not a fork and not a loop node (l. 24), then a sequential block < current > is created (l. 25). Unlike the previous two cases, the successor nodes of current are not processed immediately. In the previous two cases, the successor nodes of current appear at a more nested level than current in the structured composition, while in this case, the unique successor node of current appears at the same level of nesting as current. Consequently, since current is not a fork or a loop node, the unique successor node x of current is in some set FOLLOW(n), where n is a node. If current is the only predecessor of x, then x ∈ FOLLOW(current), so current = n. Otherwise, x is a join node and n is a fork node. In both cases, x is processed in l. 27-34 when n = current.

In the second part (l. 27-34), the nodes in FOLLOW(current) are processed. If current has more than one FOLLOW node, then a node is inserted just before the nodes in FOLLOW(current) (l. 29). The detailed code for this step is listed in Figure 7 and discussed below. Observe that if current has more than one FOLLOW node, then current must be a fork, since each non-fork node and each loop node have singleton FOLLOW sets, by definition and by constraint C2, respectively. After this
step, current has a unique node next that follows current, so FOLLOW(current) = {next} (l. 31). The algorithm is invoked with next as current (l. 32), and the returned block is appended to P (l. 33).

The code in Figure 7 details how the dummy service is inserted at l. 29. A fresh node f not in N is created (l. 1) and added to N (l. 2). Next, each edge (x, y) such that x ∈ services(P), so x is in the block created for current, and y ∈ FOLLOW(current), is split into two edges (x, f) and (f, y) (l. 3-5). The fork type of f is the fork type of current (l. 6). If the fork type of f is XOR (l. 7), then a guard condition is created (l. 8). Since current is a fork node of type XOR and y ∈ FOLLOW(current), node y is a join and by C1 y has type XOR. The guard condition ensures that each successor y of f is only enabled if one of the relevant predecessor nodes of y has been executed in block P. A service p that is a predecessor of y according to the input dependency graph, so (p, y) ∈ E_follows, may itself be a fork; in that case, a statement dummy_{p,y} is created at l. 8 of Figure 6, since y is in the FOLLOW set of current, so not p but dummy_{p,y} is the predecessor of y in the constructed block. For a given service y, its guard condition is therefore defined as

formulaPreCondition(y) = ∨_{p ∈ pre(y)} formula(p),

where formula(p) = in(p) if p is not a fork, and formula(p) = in(dummy_{p,y}) otherwise.

To illustrate the code in Figure 7, consider the running example shown in Figure 1 and let current = Check Credit. Two nodes, Inform Customer Service and Archive, are in the FOLLOW set of Check Credit. Then E_follows becomes {(Handle Rejection, Inform Customer Service), (Cancel Order, Inform Customer Service), (Handle Payment, Archive)}. A fresh node f is inserted immediately before Inform Customer Service and Archive. The guard condition created for (f, Inform Customer Service) is in(Handle Rejection) ∨ in(Cancel Order) (guard g2 in Figure 2), while the guard condition created for (f, Archive) is in(Handle Payment) ∨ in(Inform Customer Service) (guard g1 in Figure 2).
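Purely as an illustration, the guard construction can be transcribed as follows. It reuses the guard classes from the Section 3.2 sketch and the graph helpers sketched in Section 3.1; dummy_name is a hypothetical helper standing in for the dummy service dummy_{p,y} created at l. 8 of Figure 6.

```python
from functools import reduce


def dummy_name(p, y):
    """Illustrative name for the dummy service dummy_{p,y} (naming scheme is ours)."""
    return f"dummy_{p}_{y}"


def formula(graph, p, y):
    """formula(p) = in(p) if p is not a fork, and in(dummy_{p,y}) otherwise."""
    return In(dummy_name(p, y)) if graph.is_fork(p) else In(p)


def formula_precondition(graph, y):
    """formulaPreCondition(y): disjunction of formula(p) over all predecessors p of y."""
    parts = [formula(graph, p, y) for p in sorted(graph.pre(y))]
    if not parts:
        return TRUE                                # no predecessors: guard is simply true
    return reduce(Or, parts)                       # e.g. in(Handle Rejection) ∨ in(Cancel Order)
```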
4.5. Correctness and complexity

As explained in the introduction of this section, for dependency graphs having only XOR types for forks and joins, the algorithm is similar to the algorithm of Baker,18 which structures reducible flow graphs into structured sequential programs. We have made two main extensions to Baker's algorithm. First, we use guard conditions to ensure proper control flow, whereas Baker uses goto statements for this purpose. Second, dependency graphs allow branching type AND in addition to type XOR, whereas flow graphs only use type XOR. We now argue for the correctness of these extensions.

First, the guard conditions are needed when a node x has multiple FOLLOW nodes. To represent the choice, an XOR branching point is inserted at l. 29 of Figure 6 immediately before the FOLLOW nodes of x. Only a bounded number of branching points is inserted, since the FOLLOW sets induce an acyclic order on the nodes of the dependency graph, that is, two different nodes cannot indirectly follow each other. Thus, the composition procedure terminates. Furthermore, guard conditions are only needed after an XOR block is completed, in order to ensure that the right next service s is enabled. The guard conditions created at l. 8 in Figure 7 ensure that s is only reached if its relevant predecessor services have been executed, as argued in the previous subsection.

Second, regarding the correctness of using branching type AND, observe that the algorithm ignores the information from function joinType when constructing blocks. This is allowed under constraints C1 and C2. Constraint C1 ensures that if a join j has type AND, then each preceding fork f, so j ∈ FOLLOW(f), has type AND as well. Constraint C2 ensures that if j is in a loop, then f is in the same loop as well. This guarantees that parallelism does not cross the boundary of a repeat or an XOR node, and can thus be expressed in a structured process.

Regarding complexity, though the algorithm is recursive, we now argue that each node in the dependency graph is processed at most once as current, by examining the three lines in which the algorithm is invoked recursively. Consequently, the algorithm runs in time linear in the size of the dependency graph. For l. 6, suppose there are two different forks f1, f2 such that n ∈ post(f1) ∩ post(f2). Then n is by definition a join node. But each join node is in a FOLLOW set, which contradicts the test in l. 5. Thus, at l. 6, node n is a successor of current only, so the algorithm is invoked for n only once. For l. 22, each loop node has a single successor node by preprocessing Step 1, so the algorithm is invoked for x only once. For l. 32, since the FOLLOW sets induce an acyclic order on the nodes, the algorithm is invoked for next only once. Thus, each node in the dependency graph is processed at most once as current in the algorithm.

Next, the algorithm uses FOLLOW sets. To compute these, dominators and loop headers need to be computed. Dominators can be computed in almost linear time using Lengauer and Tarjan's algorithm.26 Loop headers can also be computed in almost linear time using an algorithm due to Tarjan.27,28 Both computations are done sequentially: first dominators are computed, then loop headers. Thus, the FOLLOW sets can be computed in almost linear time. Therefore, the complete algorithm, including the computation of the FOLLOW sets, runs in almost linear time.

5. Service composition approach

We embed the algorithm in a lightweight service composition approach. Input to the approach is a set of services. Output is a structured composition that orchestrates the services. The approach consists of three steps:

S1 Analyzing dependencies between the input services. The result is an abstract dependency graph with empty fork and join types. This step can be fully automated, as explained below.
S2 Adding branching types to the abstract dependency graph constructed in the previous step. Output is a concrete dependency graph with non-empty
fork and join types. This step needs user input, as explained below.
S3 Executing the structured composition algorithm. Input is the concrete dependency graph constructed in the previous step, and output is a structured composition. This step is fully automated, since it is implemented by the algorithm shown in Figure 6.

We next explain these steps in more detail.

Analyzing dependencies. Dependencies can be due to data flow, events, business logic, etc. We now describe a semi-automated approach for detecting data-flow dependencies between services. Services communicate with each other through messages. Each service has an input message, an output message, or both. Each message consists of a set of typed data items. In line with existing work on ontology-based matching of data types in the context of services,29,30 we consider business types here, not low-level data types. For example, a message could comprise a data item of type order and a data item of type customer. Business types can be derived from the low-level types actually used in web services in a semi-automated way, using input either from ontological tools31 or from a domain expert who annotates services.32

Based on the input/output data types of each service, we can define dependencies between services. If a service a outputs a data item with a certain type and service b needs as input a data item with the same type, then b depends on a. More advanced notions of matching outputs to inputs, for example those based on ontological concepts,29,30 can easily be used instead. If multiple messages refer to the same stateful data item, additional dependencies based on the states need to be defined, but we do not consider that here. In this way, the dependencies of a dependency graph can be derived completely automatically from the signatures of the services.

Labeling branching types. The previous step results in a dependency graph without branching types, i.e., fork and join types are not specified. In this step, these branching types are specified by a user (a domain expert), either manually or semi-automatically. Labeling cannot be done fully automatically since, for example, data dependencies are usually not sufficient to decide on the type of a dependency. For example, consider two services that both have as input an order for some goods. If both services deal with shipping, they would be exclusive and type XOR would be used. If one service deals with picking up the requested goods from the warehouse and another with calculating the total fee to be paid, both services are required and type AND would be useful. For similar reasons, domain knowledge of a user is needed to resolve violations of the healthiness constraints that are defined in Section 4.2.

Executing the structured composition algorithm. This step is implemented by the algorithm described in Section 4.4. The input dependency graph is the output of the previous step.
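The following sketch illustrates the dependency-analysis step (S1), assuming each service is described only by sets of input and output business types. The service names and types in the example are hypothetical, and the exact type matching can be replaced by ontology-based matching.

```python
def derive_dependencies(services):
    """Build the edge set of an abstract dependency graph from service signatures.

    services: dict mapping a service name to a pair (input_types, output_types),
              each a set of business types such as {"order", "customer"}.
    Service b depends on service a if some output type of a matches an input type of b.
    """
    edges = set()
    for a, (_, out_a) in services.items():
        for b, (in_b, _) in services.items():
            if a != b and out_a & in_b:
                edges.add((a, b))                 # b depends on a
    return edges


# Hypothetical example: three quotation services described by business types only
example = {
    "Receive Request": (set(), {"request"}),
    "Check Request":   ({"request"}, {"checked request"}),
    "Check Credit":    ({"checked request"}, {"credit decision"}),
}
print(derive_dependencies(example))
# -> {('Receive Request', 'Check Request'), ('Check Request', 'Check Credit')} (in some order)
```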
If the dependency graph satisfies the healthiness constraints of Section 4.2, the algorithm operates correctly. The algorithm also returns a composition if the healthiness constraints are violated, but then the composition is not guaranteed to be compliant with the dependency graph, as explained in Section 2.

In the next section, we discuss how this approach has been used in the CrossWork project. In Section 7, we compare the approach to other related service composition approaches. There, we show that our approach is much more lightweight than existing ones, since it does not require that services have formal descriptions.

6. CrossWork

We explain how the lightweight service composition approach of the previous section has been applied and implemented in the context of the IST CrossWork project.19,20 First we explain the CrossWork prototype system and its architecture. Next, we discuss a medium-sized case study from the project, which is based on a real-life business scenario.

6.1. Architecture

The goal of CrossWork was to support the dynamic formation of Instant Virtual Enterprises (IVEs) within a Network of Automotive Excellence (NoAE). These networks consist of automotive suppliers that are potential collaborators. Upon request of an Original Equipment Manufacturer (OEM), which produces cars or trucks, an IVE is formed to deliver a requested product or component. Members of the IVE
Fig. 8. CrossWork architecture
are selected from the NoAE. The OEM incorporates the IVE product in its own product. Examples of a product requested by the OEM are the interior of a car or a watertank for a truck. Suppliers need to collaborate with each other since an individual supplier typically does not have the capability to handle a product request of an OEM all by itself. The IVE is formed dynamically, depending upon the specific product request received by one specific supplier (the main contractor). The two main steps in the dynamic formation of an IVE are finding the partner suppliers that can deliver parts of the requested product (team formation) and constructing a global IVE workflow that coordinates and integrates the local workflows of the individual suppliers (workflow formation).

To support this way of working, a prototype system has been developed, the architecture of which is shown in Figure 8. The goal decomposition module receives an order specification from an OEM and decomposes it into a required set of components and services, using a product knowledge base. Next, the team formation module finds for each identified component and service a partner, using a market and infrastructure knowledge base.33 The market knowledge base stores per organization the services and components it delivers and which activities it offers, while the infrastructure knowledge base stores information about the legacy systems of organizations. The team formation module composes the retrieved partners into a team that can cooperate according to the market and infrastructure knowledge base. Next, the workflow composition module queries the team members for their local workflow models (not shown) and composes these into a global process model using the algorithm of Section 4. Each local workflow model is abstracted into a black-box service. The constructed global workflow model can be verified and validated. Finally, the global workflow model can be enacted. The global workflow coordinates the local workflows of the team partners. The process language used to model processes is a structured process language based on XRL.34 For the enactment, this language is mapped to BPEL.5 Finally, the monitoring UI allows the main contractor to view the internal progress of the global workflow. This module uses process views offered by local workflows.

The algorithm as implemented in the workflow composition module assumes that each fork has a unique FOLLOW node. Thus, no guard conditions are created or used in the constructed structured composition. If this assumption is lifted, the output BPEL process uses boolean variables to encode the in predicates, as explained in Section 3.2.

The workflow module is realized in Java. To compute dominators, we use the simple almost linear-time algorithm by Lengauer and Tarjan,26 while we use an almost linear-time algorithm by Tarjan27,28 to compute loop headers. To facilitate integration with the agent-based team formation module, the workflow composition module is implemented on top of JADE, a platform for multi-agent systems. To interface with the BPEL Global Enactment Module, which is based on ActiveBPEL, we implemented an XSLT-based translation that maps the constructed structured XRL process to a BPEL process. More details on the implementation can be found elsewhere.20
Fig. 9. Abstract dependency graph
6.2. Watertank case study

We illustrate the CrossWork composition approach with a medium-sized case study from CrossWork. An OEM requests a watertank from one member of a cluster of automotive suppliers. We assume the member has selected the partners. Figure 9 shows the abstract dependency graph obtained by analyzing the dependencies between the black-box workflows of each partner. Thus, the internal structure of each local workflow is hidden inside a service, but a partner supplier can offer an external process view to the entire network through some additional interfaces.35,20 In reality, the dependency graph is much larger, since a watertank contains over twenty different components, but to ease the presentation, we only show the most important ones.

The abstract dependency graph in Figure 9 violates the second constraint on dependency graphs, since there is, among others, an edge connecting prepare Order and assemble Motorpump. Therefore, the user (a domain expert) has to provide a corrected version, and also needs to annotate the dependencies with types. Figure 10 shows a corrected dependency graph with branching types. Some direct dependencies, e.g. between prepare Order and assemble Motorpump, have been removed, because the user decided they are redundant (e.g. prepare Order delivers input to assemble Motorpump by means of produce Motor and produce Pump). Furthermore, by typing the dependency graph, the user has decided that a motorpump is either bought or produced. Three dummy services have been inserted to type the dependency graph in a correct way.

Finally, the workflow is composed using the algorithm defined in Section 4. The
Fig. 10. Concrete dependency graph for Figure 9
Fig. 11. Structured composition for Figure 10
result of the composition is shown in Figure 11. This structured workflow can be translated directly into BPEL and be enacted using the global enactment module. Note that the presented workflow models the operation of a part of an automotive supply chain. To ease the presentation, the case study has been simplified somewhat: in practice, a watertank has over twenty components. However, the actual branching of the resulting workflow has not been simplified, i.e., all omitted components have corresponding production activities that are done in parallel to the shown
6.3. Evaluation

A main feature of the CrossWork approach is its reliance on gradual automation.20 The user parties have clearly stated that complete automation of knowledge and decision procedures is at this moment not feasible, as many of these procedures have not yet been formalized enough to allow automation. Also, complete automation is at this point not desirable, since the domain is not yet ready to accept fully computer-led business, even if complete domain knowledge were available. Therefore, a service composition approach that requires services to be modeled with pre- and post-conditions was not feasible. Instead, we have chosen to model services in a lightweight manner, using descriptions that are either based on user input or based on lightweight ontological descriptions derived from, for example, a Bill of Materials. From such descriptions, dependencies among services can easily be derived; a minimal illustration of this derivation is given at the end of this subsection. In addition, this approach supports the goal of gradual automation, since initial descriptions can be very coarse-grained, referencing only a few data entities, while later descriptions can be more detailed, up to the level of being formally specified. However, for the automotive application domain that we considered in CrossWork, we expect that complete formal descriptions of services are not feasible, for the reasons mentioned above.

Next, the actual composition models constructed for the CrossWork case studies are not so complex, though their size is greater than that of the model presented above. For example, in practice a watertank has over twenty components, for which production and assembly steps are needed. However, the actual branching is not so complex, since most steps are done in parallel. We conjecture that this simplicity is due to the nature of supply chains: all dependencies between different activities are due to movements of physical goods, and only a few choices are made during a process. Despite their simplicity, the models do capture part of an existing automotive supply chain, and can be used as a starting point for enacting and monitoring Instant Virtual Enterprises that are responsible for supplying car parts to an OEM.20 Since the algorithm has an almost linear time complexity, the approach can also be applied to dependency graphs having more complex branching conditions. Thus, for the CrossWork project, the service composition approach proved to be useful and easy to implement.
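The illustration referred to above is the following sketch, which derives untyped dependency edges from a small excerpt of a Bill of Materials. The assumption that every component is delivered by a "produce" activity and every assembly by an "assemble" activity, as well as the BoM excerpt itself, are hypothetical simplifications.

```java
import java.util.*;

// Hypothetical sketch: deriving (untyped) service dependencies from a Bill of Materials.
final class BomDependencies {

    // parent assembly -> its direct sub-components (a tiny, illustrative excerpt)
    static final Map<String, List<String>> BOM = Map.of(
            "Watertank", List.of("Watertank body", "Motorpump", "Grommet", "Cap"),
            "Motorpump", List.of("Motor", "Pump"));

    static List<String[]> deriveDependencies() {
        List<String[]> deps = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : BOM.entrySet()) {
            String consumer = "assemble " + entry.getKey();
            for (String part : entry.getValue()) {
                // the activity delivering a part must precede the assembly that consumes it
                String producer = BOM.containsKey(part) ? "assemble " + part : "produce " + part;
                deps.add(new String[] { producer, consumer });
            }
        }
        return deps; // edges of a dependency graph, still to be corrected and typed by a domain expert
    }
}
```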
7. Related work

The topic of service composition has attracted the attention of many researchers. Existing approaches can be classified into three categories: manual, semi-automated, or fully automated.12 Approaches in the manual category assume that a user manually designs a service composition, including the binding to concrete web services. This category includes languages like BPEL,5 OWL-S16 and JOpera,36 and concrete composition prototypes.37,38

In approaches in the semi-automated category,12,13,39,40 the user must either provide a template with a composition skeleton which defines the process logic, or give input to a composition procedure that constructs the template. This template can then be instantiated automatically by searching for atomic services that match each of the services specified in the skeleton.39,40 The focus of some approaches39,40 lies on automatically finding substitute services for specified services in a template, while others12,13 consider the problem of constructing a composition skeleton. Fully automated approaches6,7,8,9,10,11,41,42,43 mostly come from the field of AI planning or formal reasoning. These approaches require that web services are specified formally with pre- and post-conditions. This puts a considerable burden on the shoulders of service designers, since WSDL specifications do not require that level of detail and hence need to be annotated with the additional pre- and post-conditions.

Our approach has some features in common with semi-automated approaches12,13 and fully automated approaches.6,11 We now examine more closely the differences and similarities between our approach and these two groups of related work.

Similar to the approaches in the semi-automated category that construct a composition skeleton,12,13 our approach composes services by analyzing dependencies. However, the current semi-automated approaches focus on constructing unstructured process models, by translating dependencies directly into control flow, whereas we focus on constructing structured process models. As explained in Section 2, constructing structured models is more complex, since dependencies cannot be translated directly into structured control flow. Thus, the main difference with these semi-automated approaches is the use of the structured composition algorithm in our approach.

Regarding fully automated approaches, most of them7,8,9,10,11,41,42,43 only consider sequential compositions, whereas we consider compositions containing parallelism. Nevertheless, there are a few automated approaches6,11 that construct structured process models. Duan et al.6 consider structured process models with choice and parallelism, but Sirin et al.11 only consider sequential structured compositions. The time complexity of these approaches is not clearly indicated. However, the approach of Duan et al.6 requires backtracking, which suggests a non-polynomial time complexity, whereas the AI planning technique used by Sirin et al.11 is NP-complete. In contrast, our algorithm is almost linear time and thus polynomial. Our approach is also more lightweight, since we do not require formally specified pre- and post-conditions, and thus users do not have to provide as much input as in the other approaches mentioned. Note that we do use guard conditions in the compositions, but these are generated automatically by the algorithm and are not due to pre- or post-conditions of services. A drawback, however, is that our approach is less precise, since service dependencies are less detailed than pre- and post-conditions. In addition, we have defined healthiness constraints under which the algorithm operates correctly. None of the fully automated approaches uses such constraints; thus it is unclear when a given set of services with pre- and post-conditions can be composed together successfully.
Thus, the main difference with the fully automated approaches is the use of an efficient algorithm and the use of lightweight service descriptions in our approach.

Another way of dealing with the unstructuredness problem would be to transform a dependency graph into an unstructured process model, using e.g. Refs. 12,13, and then to transform that process model into a structured one, using approaches for structuring process models.14,15,25,44,45,46 Most approaches for structuring process models14,15,25,44,45 only sketch how techniques developed in the 1970s and 1980s to structure sequential programs with goto statements can be used to structure process models containing parallelism. Unfortunately, converting an unstructured process model into a structured one has proved to be quite intricate, because process models can contain parallelism while programs are sequential. Consequently, automated transformations exist only for sequential process parts. Ouyang et al.46 focus on recognizing structured patterns in unstructured process models and mapping those to structured BPEL fragments, but do not address how an unstructured process model can be structured. For example, the unstructured process model corresponding to Figure 1 cannot be structured using the approach of Ouyang et al. Our approach is more powerful than all these approaches, since we use an automated structuring algorithm for unstructured dependency graphs that may contain parallelism. This algorithm generalizes an existing algorithm for structuring sequential flow graphs.18 Moreover, we have identified healthiness constraints on dependency graphs under which the structuring algorithm operates correctly; we are unaware of other work defining similar constraints for process graphs that contain parallelism.

Finally, there are approaches47,48,49 that consider the problem of decomposing a global process view into a set of local process models that collaboratively implement the global process view. The problem there is to ensure that the behaviors in the global process view are preserved by the collaborating local processes. Instead, our approach focuses on how to compose a global process view from a set of local services that do not have a behavioral interface. An interesting extension of our approach would be to consider local services that do have a behavioral interface. In that case, the techniques of Refs. 47,48,49 can be useful to show the correctness of the constructed composition, i.e. that the global composition can be realized by the local services.

In summary, the most distinguishing feature of our approach is the use of an efficient algorithm that builds on concepts from compiler theory, whereas the most closely related other service composition approaches are either not algorithmic12,13 or build upon AI planning11 or formal reasoning,6 which results in less efficient algorithms and requires formal annotation. Our approach is especially useful in settings where detailed information about services is lacking and cannot be provided that easily, as is the case in the CrossWork cases.
8. Conclusions and further work

We have presented a simple yet efficient algorithm for composing a set of cooperative services into a structured process model. Given a dependency graph, the algorithm constructs a structured composition satisfying the given dependencies. The algorithm is embedded in a lightweight, semi-automated service composition approach, which needs domain knowledge in the form of user input to annotate dependency graphs with branching types and to resolve constraint violations. However, these activities are less laborious than annotating services with formal pre- and post-conditions, which is required by most other comparable service composition approaches; moreover, our approach remains applicable if services are not formally specified. The overall approach is also more efficient than comparable approaches and therefore useful in business settings where speed is important.20

The main advantage of using structured processes is that such processes only use basic workflow patterns that are supported by virtually every process enactment tool.17 This feature enables the constructed structured compositions to be encoded straightforwardly into any process language, including BPEL,5 OWL-S16 and other standard languages like XPDL.23 However, the restriction to structured models considerably complicates the composition task, since dependencies cannot be translated directly into control-flow links.

The service composition approach has been implemented in a prototype in the context of the CrossWork project.19 This project studied the dynamic formation and collaboration of virtual enterprises. Though the case studies are of medium size, the approach has been applied successfully to construct workflows that specify parts of automotive supply chains. Since the algorithm runs in almost linear time, it scales up to larger models. Though structured process models are less expressive than unstructured ones, complex branching types are not used in supply chains in practice, so structured process models can model supply chain processes satisfactorily.

Further work includes extending the approach to support the composition of local services that have a behavioral interface.35 Another interesting extension is to let quality of service aspects guide decisions made in the composition phase.50
Acknowledgements

Sven Till is thanked for implementing the first version of the composition module of the CrossWork system. All CrossWork partners are acknowledged for their work on the CrossWork project and system. This work has been supported by the IST project CrossWork (No. 507590) and the IST Network of Excellence INTEROP (No. 508011).
References

1. P. Grefen. Towards dynamic interorganizational business process management (keynote talk). In S. Reddy, editor, Proc. WETICE 2006, pages 13–18. IEEE Computer Society, 2006.
2. G. Alonso, F. Casati, H. Kuno, and V. Machiraju. Web Services: Concepts, Architectures and Applications. Springer, 2004.
3. M.P. Papazoglou. Web Services: Principles and Technology. Prentice Hall, 2007.
4. E. Christensen, F. Curbera, G. Meredith, and S. Weerawarana. Web Services Description Language (WSDL) 1.1. http://www.w3.org/TR/wsdl, 2001.
5. A. Alves, A. Arkin, S. Askary, C. Barreto, B. Bloch, F. Curbera, M. Ford, Y. Goland, A. Guzar, N. Kartha, C. K. Liu, R. Khalaf, D. Koenig, M. Marin, V. Mehta, S. Thatte, D. Rijn, P. Yendluri, and A. Yiu. Web Services Business Process Execution Language Version 2.0 (OASIS standard). WS-BPEL TC OASIS, http://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.html, 2007.
6. Z. Duan, A. Bernstein, P. Lewis, and S. Lu. A model for abstract process specification, verification and composition. In Proceedings of the 2nd International Conference on Service Oriented Computing (ICSOC'04), pages 232–241. ACM Press, 2004.
7. T. Madhusudan and N. Uttamsingh. A declarative approach to composing web services in dynamic environments. Decision Support Systems, 41(2):325–357, 2006.
8. M. Matskin and J. Rao. Value-added web services composition using automatic program synthesis. In C. Bussler, R. Hull, S.A. McIlraith, M.E. Orlowska, B. Pernici, and J. Yang, editors, CAiSE'02 Workshop on Web Services, E-Business, and the Semantic Web, Revised Papers, Lecture Notes in Computer Science 2512, pages 213–224. Springer, 2002.
9. S.A. McIlraith and T.C. Son. Adapting Golog for composition of semantic web services. In D. Fensel, F. Giunchiglia, D.L. McGuinness, and M.-A. Williams, editors, Proc. of the 8th International Conference on Principles of Knowledge Representation and Reasoning (KR-02), pages 482–496. Morgan Kaufmann, 2002.
10. S.R. Ponnekanti and A. Fox. SWORD: A developer toolkit for building composite web services. In Proc. of the 11th International World Wide Web Conference, 2002.
11. E. Sirin, B. Parsia, D. Wu, J.A. Hendler, and D.S. Nau. HTN planning for web service composition using SHOP2. Journal of Web Semantics, 1(4):377–396, 2004.
12. A. Brogi and R. Popescu. Towards semi-automated workflow-based aggregation of web services. In F. Casati, P. Traverso, and B. Benatallah, editors, Proceedings of the Third International Conference on Service Oriented Computing (ICSOC'05), Lecture Notes in Computer Science 3826. Springer, 2005.
13. Q. Liang, L. N. Chakarapani, S. Su, R. Chikkamagalur, and H. Lam. A semi-automatic approach to composite web services discovery, description and invocation. Int. Journal on Web Service Research, 1(4):64–89, 2004.
14. B. Kiepuszewski, A.H.M. ter Hofstede, and C. Bussler. On structured workflow modelling. In B. Wangler and L. Bergman, editors, Proc. CAiSE '00, pages 431–445. Springer, 2000.
15. R. Liu and A. Kumar. An analysis and taxonomy of unstructured workflows. In W.M.P. van der Aalst, B. Benatallah, F. Casati, and F. Curbera, editors, Proc. 3rd Conference on Business Process Management (BPM 2005), Lecture Notes in Computer Science 3649, pages 268–284, 2005.
16. D. Martin (editor), M. Burstein, J. Hobbs, O. Lassila, D. McDermott, S. McIlraith, S. Narayanan, M. Paolucci, B. Parsia, T. Payne, E. Sirin, N. Srinivasan, and K. Sycara. OWL-S: Semantic markup for web services, 2004. http://www.daml.org/services/owl-s/1.1/overview.
17. W.M.P. van der Aalst, A.H.M. ter Hofstede, B. Kiepuszewski, and A.P. Barros. Workflow patterns. Distributed and Parallel Databases, 14(1):5–51, 2003.
18. B.S. Baker. An algorithm for structuring flowgraphs. Journal of the ACM, 24(1):98–120, 1977.
19. CrossWork consortium. CrossWork project, IST No. 507590. http://www.crosswork.info.
20. P. Grefen, N. Mehandjiev, G. Kouvas, G. Weichhart, and R. Eshuis. Dynamic business network process management in instant virtual enterprises. Beta Working Paper Series, WP 198, 2007. To appear in Computers in Industry.
21. A.V. Aho, R. Sethi, and J.D. Ullman. Compilers: Principles, Techniques, and Tools. Addison Wesley, 1986.
22. R. Eshuis, P. Grefen, and S. Till. Structured service composition. In S. Dustdar, J.L. Fiadeiro, and A. Sheth, editors, Proc. of the 4th International Conference on Business Process Management (BPM 2006), Lecture Notes in Computer Science 4102, pages 97–112. Springer, 2006.
23. Workflow Management Coalition. Workflow process definition interface – XML Process Definition Language. Technical Report WFMC-TC-1025, Workflow Management Coalition, 2002.
24. M. Reichert and P. Dadam. ADEPTflex: Supporting dynamic changes of workflow without loosing control. Journal of Intelligent Information Systems, 10(2):93–129, 1998.
25. W. Zhao, R. Hauser, K. Bhattacharya, B.R. Bryant, and F. Cao. Compiling business processes: untangling unstructured loops in irreducible flow graphs. Int. Journal of Web and Grid Services, 2:68–91, 2006.
26. T. Lengauer and R.E. Tarjan. A fast algorithm for finding dominators in a flowgraph. ACM Transactions on Programming Languages and Systems, 1(1):121–141, 1979.
27. G. Ramalingam. Identifying loops in almost linear time. ACM Transactions on Programming Languages and Systems, 21(2):175–188, 1999.
28. R.E. Tarjan. Testing flow graph reducibility. Journal of Computer and System Sciences, 9(3):355–365, 1974.
29. J. Cardoso and A. Sheth. Semantic e-workflow composition. Journal of Intelligent Information Systems, 21(3):191–225, 2003.
30. M. Paolucci, T. Kawamura, T.R. Payne, and K.P. Sycara. Semantic matching of web services capabilities. In I. Horrocks and J.A. Hendler, editors, Proc. International Semantic Web Conference (ISWC'02), Lecture Notes in Computer Science 2342, pages 333–347. Springer, 2002.
31. T. Syeda-Mahmood, G. Shah, R. Akkiraju, A.-A. Ivan, and R. Goodwin. Searching service repositories by combining semantic and ontological matching. In Proc. ICWS 2005, pages 13–20. IEEE Computer Society, 2005.
32. A. Hess and N. Kushmerick. Machine learning for annotating semantic web services. In Proc. AAAI Spring Symposium on Semantic Web Services, 2004.
33. M. Carpenter, N. Mehandjiev, and I.D. Stalker. Flexible behaviours for emergent process interoperability. In S. Reddy, editor, Proc. WETICE 2006, pages 249–255. IEEE Computer Society, 2006.
34. W.M.P. van der Aalst and A. Kumar. XML based schema definition for support of inter-organizational workflow. Information Systems Research, 14(1):23–46, 2003.
35. P. Grefen, H. Ludwig, A. Dan, and S. Angelov. An analysis of web services support for dynamic business process outsourcing. Information and Software Technology, 48(11):1115–1134, 2006.
36. C. Pautasso and G. Alonso. The JOpera visual composition language. Journal of Visual Languages & Computing, 16(1-2):119–152, 2005.
37. B. Benatallah, Q.Z. Sheng, and M. Dumas. The Self-Serv environment for web services composition. IEEE Internet Computing, 7(1):40–48, 2003.
38. J. Yang and M. Papazoglou. Service components for managing the life-cycle of service compositions. Information Systems, 29(2):97–125, 2004.
39. F. Casati and M.-C. Shan. Dynamic and adaptive composition of e-services. Information Systems, 26(3):143–162, 2001.
40. B. Medjahed, A. Bouguettaya, and A. Elmagarmid. Composing Web services on the Semantic Web. The VLDB Journal, 12(4):333–351, 2003.
41. D. Berardi, D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Mecella. Automatic service composition based on behavioral descriptions. International Journal of Cooperative Information Systems, 14(4), 2005.
42. P. Traverso and M. Pistore. Automated composition of semantic web services into executable processes. In S.A. McIlraith, D. Plexousakis, and F. van Harmelen, editors, Proc. Third International Semantic Web Conference (ISWC 2004), Lecture Notes in Computer Science 3298, pages 380–394. Springer, 2004.
43. R. Hull, M. Benedikt, V. Christophides, and J. Su. E-services: A look behind the curtain. In Proc. of the 22nd ACM Symposium on Principles of Database Systems, pages 1–14. ACM, 2003.
44. J. Koehler and R. Hauser. Untangling unstructured cyclic flows - a solution based on continuations. In R. Meersman and Z. Tari, editors, Proc. CoopIS/DOA/ODBASE 2004, Lecture Notes in Computer Science 3290, pages 121–138. Springer, 2004.
45. R. Hauser, M. Friess, J.M. Küster, and J. Vanhatalo. Combining analysis of unstructured workflows with transformation to structured workflows. In Proc. Tenth IEEE International Enterprise Distributed Object Computing Conference (EDOC 2006), pages 129–140. IEEE Computer Society, 2006.
46. C. Ouyang, M. Dumas, A. ter Hofstede, and W.M.P. van der Aalst. Pattern-based translation of BPMN process models to BPEL web services. International Journal of Web Services Research, 5(1):42–61, 2008.
47. W.M.P. van der Aalst and M. Weske. The P2P approach to interorganizational workflows. In K.R. Dittrich, A. Geppert, and M.C. Norrie, editors, Proc. of the 13th Int. Conference on Advanced Information Systems Engineering (CAiSE'01), Lecture Notes in Computer Science 2068, pages 140–156. Springer, 2001.
48. R. Khalaf and F. Leymann. Role-based decomposition of business processes using BPEL. In Proc. ICWS 2006, pages 770–780, 2006.
49. J.M. Zaha, M. Dumas, A.H.M. ter Hofstede, A.P. Barros, and G. Decker. Service interaction modeling: Bridging global and local views. In Proc. EDOC 2006. IEEE Computer Society, 2006.
50. D. Ardagna and B. Pernici. Adaptive service composition in flexible processes. IEEE Transactions on Software Engineering, 33(6):369–384, 2007.