A Semantic Based Redesigning of Distributed Workflows
Vijayalakshmi Atluri1
Wei-Kuang Huang2
Elisa Bertino3
1 Center for Information Management, Integration and Connectivity and MS/IS Department, Rutgers University, 180 University Avenue, Newark, NJ 07102
2 Department of Operation and Information Management, University of Connecticut, 368 Fairfield Road, U-41M, Storrs, CT 06269-2041
3 Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy
Abstract
Workflow management systems (WFMS) support the modeling and coordinated execution of processes within an organization. To coordinate the execution of the various activities (or tasks) in a workflow, task dependencies are specified among them. Often, the workflow application domains are such that the workflow is a long-running activity and the various tasks that constitute the workflow need to be executed by systems that are distributed and autonomous in nature, possibly owned by different organizations. In such an environment, it is desirable to minimize the number of communications among the distributed sites and the number of tasks that must wait for tasks at other sites before they can execute. In this paper, we propose an approach that can automatically redesign a workflow in such a way that the interference among sites is minimized. Our approach first performs a semantic classification of task dependencies and categorizes them in several ways: conflicting vs. conflict-free, result-independent vs. result-dependent, strong vs. weak, and abortive vs. non-abortive. Based on this categorization, we apply three techniques, namely split, compensate and substitute, to minimize the number of communications and the amount of interference among sites. Our approach keeps the number of tasks to be compensated to a minimum and, using the presumed commit and presumed abort notions, minimizes the number of tasks that must wait. We then show that the redesigned workflow is equivalent to the original workflow.
Key Words: Workflows, Distributed Systems, Workflow Redesign, Semantic Classification, Workflow Equivalence, Task Dependencies
1 Introduction
Workflow management systems (WFMS) today represent a key technological infrastructure for effectively managing business processes (workflows) in several application domains, including insurance policy/claims processing and healthcare management. A workflow is the specification of the activities (tasks), and of the relations among these activities, that represent a business process of an organization. Relations among tasks in a workflow are typically represented as task dependencies. Such dependencies enable the coordinated execution of tasks by specifying, for example, that a task must be executed only if another given task aborts. An important requirement of many application domains adopting WFMS technology is support for distributed workflows running on possibly heterogeneous, pre-existing systems. Many application domains are inherently distributed because the organizations whose processes the applications implement are geographically dispersed across various sites. Insurance and healthcare management are good examples of nationwide distributed organizations. Moreover, in many organizations, workflows may need to access pre-existing databases or interact with pre-existing applications. Furthermore, workflows are often long-running activities. A distributed workflow can be thought of as a set of tasks, where each task may require accessing data from several sites, together with a set of possibly distributed task dependencies, called inter-site dependencies. A dependency is inter-site when the tasks it relates reside at different sites. Enforcing inter-site dependencies is particularly expensive since it involves communication between sites, so a large number of inter-site dependencies may seriously degrade performance and the degree of concurrency. Another important issue arising in workflows running in heterogeneous systems is that it is not always possible to detect the final state of a task until after the task has entered that state (because no precommit state is made externally available), or it is not possible to delay the completion of a task.
This work of Vijayalakshmi Atluri and Wei-Kuang Huang was partially supported by the National Science Foundation under grant IRI9624222.
The need to delay the completion of a task arises whenever the possible outcome of the task (e.g., abort or commit) depends on the outcome of some other task. These issues are not unique to workflows; they also arise in distributed transaction management over heterogeneous systems. An approach often adopted is to allow tasks to complete and, if necessary, undo their actions through compensating tasks. It is, however, important to keep the number of compensating tasks to a minimum.
In this paper, we provide an approach to enforce the inter-site dependencies of a workflow that reduces the number of communications among sites and the number of tasks that must wait for tasks at other sites before they can execute and complete. Our approach begins with a semantic classification of task dependencies, based on a close examination of their sources. We categorize them in several ways: conflicting versus conflict-free, result-independent versus result-dependent, strong versus weak, and abortive versus non-abortive. Based on this classification, we develop methodologies to automatically redesign the workflow so that inter-site dependencies are minimal. Our approach to reducing inter-site dependencies is based on three techniques: (1) splitting a task, (2) compensating a task, and (3) substituting a dependency. When applying the compensating and substituting techniques, we use the presumed abort and presumed commit notions. Many distributed atomic commit protocols reduce the communication cost among sites by presuming that most tasks commit successfully (presumed commit) or abort (presumed abort) [16]. Specifically, our goal is to enforce the task dependencies so as to:
1. increase concurrent execution among the tasks (as the workflow is a long-running activity, this results in improved response time);
2. minimize the number of inter-site communications when a task has to depend on the execution of tasks at other sites, either due to accesses from other sites or due to semantic relationships;
3. reduce the number of tasks that need to wait at their precommit state before they can commit or abort because of dependencies on tasks at other sites;
4. keep the number of tasks that must be compensated to a minimum; and
5. reduce the number of tasks that are unnecessarily executed and would eventually have to be aborted.
The remainder of the paper is organized as follows. Section 2 reviews the workflow model and outlines our distributed system model. Section 3 provides an approach to categorize the various types of task dependencies based on their semantics. Section 4 presents our notion of equivalence, through which we show that the redesigned workflow is equivalent to the original one. Section 5 presents our approaches to redesigning the workflow. Section 6 presents related research in this area. Finally, Section 7 provides conclusions. Due to space limitations, we omit the proof of the theorem in this paper; the reader may refer to [8].
[Figure 1: Task structure. States in (initial), ex (execution), cm (commit) and ab (abort); transitions labelled b (begin), c (commit) and a (abort).]
2 The Model
In this section, we introduce the definitions and relevant aspects used in the remainder of the paper. We first introduce the workflow model and then present the heterogeneous distributed system model.
2.1 The Workflow Model
A workflow is a set of tasks with task dependencies defined among them. A task in its simplest form consists of a set of data operations (either read or write) and the task primitives {begin, abort, commit}. All data operations in a task must be executed only after the begin primitive is issued, and every task must end with either a commit or an abort. A primitive may move a task from one state to another. A task $t_i$ can be in one of the following states: initial ($in_i$), execution ($ex_i$), commit ($cm_i$) or abort ($ab_i$). We use $b_i$, $a_i$ and $c_i$ to denote the begin, abort and commit primitives of $t_i$. Figure 1 shows the general structure of a task, where the initial and final states are denoted by a filled circle and an intermediate state by an unfilled circle. As the task structure shows, the initial state of a task may also be one of its final states (this is the case when the task does not execute). The following definition formalizes our notion of task.
Definition 1 Let $D$ be the set of all data objects. A task $t_i$ is a partial order with an ordering relation $\prec_i$ where
1. $t_i = DO_i \cup PO_i$, where $DO_i \subseteq \{r_i[x], w_i[x] \mid x \in D\}$ and $PO_i \subseteq \{b_i, a_i, c_i\}$;
2. $c_i \in t_i$ iff $b_i \in t_i \wedge a_i \notin t_i$, and $a_i \in t_i$ iff $b_i \in t_i \wedge c_i \notin t_i$;
3. for any $o_i \in DO_i$, $b_i \prec_i o_i \prec_i$ either $c_i$ or $a_i$ (whichever is in $t_i$); and
4. if $r_i[x], w_i[x] \in t_i$, then either $r_i[x] \prec_i w_i[x]$ or $w_i[x] \prec_i r_i[x]$.
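As an illustration of Definition 1, the following Python sketch checks conditions 2 and 3 on one linearization of a task, that is, a list of operations consistent with $\prec_i$; condition 4 holds trivially because a list is totally ordered. The tuple encoding of operations and the function name `is_well_formed` are our own assumptions, not part of the model.

```python
from typing import List, Tuple

# Hypothetical encoding: ("b",), ("a",), ("c",) are the primitives b_i, a_i, c_i;
# ("r", "x") and ("w", "x") are the data operations r_i[x] and w_i[x].
Op = Tuple[str, ...]

PRIMITIVES = ("b", "a", "c")


def is_well_formed(ops: List[Op]) -> bool:
    """Check conditions 2 and 3 of Definition 1 on one linearization of t_i.

    Condition 4 (a read and a write on the same object are ordered) holds
    automatically, because a list is a total order.
    """
    kinds = [op[0] for op in ops]
    # Each primitive may be issued at most once.
    if any(kinds.count(p) > 1 for p in PRIMITIVES):
        return False
    has_b, has_a, has_c = ("b" in kinds), ("a" in kinds), ("c" in kinds)
    # Condition 2: c_i in t_i iff b_i in t_i and a_i not in t_i (symmetrically for a_i).
    if has_c != (has_b and not has_a) or has_a != (has_b and not has_c):
        return False
    data_positions = [k for k, kind in enumerate(kinds) if kind in ("r", "w")]
    if not data_positions:
        return True  # e.g. a task that never began
    if not has_b:
        return False  # data operations require a begin primitive
    begin = kinds.index("b")
    end = kinds.index("c") if has_c else kinds.index("a")
    # Condition 3: every data operation lies strictly between b_i and c_i (or a_i).
    return all(begin < k < end for k in data_positions)


print(is_well_formed([("b",), ("r", "x"), ("w", "x"), ("c",)]))  # True
print(is_well_formed([("r", "x"), ("b",), ("c",)]))              # False
```

The empty list models a task that stays in its initial state, which Definition 1 permits.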
Definition 2 Two data operations $o_i[d]$ and $o_j[d]$ conflict with each other if they operate on the same data object $d$ and at least one of them is a write.
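Definition 2 reduces to a simple predicate. In this minimal Python sketch, a data operation is encoded as a pair (kind, object); the encoding and the name `conflict` are illustrative assumptions.

```python
from typing import Tuple

# Hypothetical encoding of a data operation o[d]: (kind, d) with kind in {"r", "w"}.
DataOp = Tuple[str, str]


def conflict(o1: DataOp, o2: DataOp) -> bool:
    """Definition 2: same data object, and at least one operation is a write."""
    (kind1, d1), (kind2, d2) = o1, o2
    return d1 == d2 and "w" in (kind1, kind2)


print(conflict(("w", "x"), ("r", "x")))  # True: a write and a read on x
print(conflict(("r", "x"), ("r", "x")))  # False: two reads never conflict
```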
Coordination among different tasks is based on dependencies specified in terms of the above task primitives. Dependencies state relationships that must hold among the various states of the tasks within a workflow. Therefore, dependencies are an important component of a workflow specification. The following definition formalizes our notion of workflow.
Dependency type | Description | Notation
Begin-on-commit | A task $t_j$ cannot begin until $t_i$ commits. | $t_i \xrightarrow{bc} t_j$
Begin | A task $t_j$ cannot begin until $t_i$ has begun. | $t_i \xrightarrow{b} t_j$
Abort | A task $t_j$ must abort if $t_i$ aborts. | $t_i \xrightarrow{a} t_j$
Termination | A task $t_j$ can terminate (either commit or abort) only after the completion (commit or abort) of $t_i$. | $t_i \xrightarrow{t} t_j$
Strong commit | If a task $t_i$ commits, then $t_j$ must commit. | $t_i \xrightarrow{sc} t_j$
Force begin-on-abort (commit/begin) | A task $t_j$ must begin if $t_i$ aborts (commits/begins). | $t_i \xrightarrow{fba} t_j$ ($t_i \xrightarrow{fbc} t_j$ / $t_i \xrightarrow{fbb} t_j$)
Exclusion | Given any two tasks $t_i$ and $t_j$, if $t_i$ commits then $t_j$ must abort, or vice versa. | $t_i \xrightarrow{e} t_j$
Weak begin-on-commit | Given any two tasks $t_i$ and $t_j$, $t_j$ can begin if $t_i$ commits. | $t_i \xrightarrow{wbc} t_j$
Table 1: A list of possible control-flow dependencies
Definition 3 A workflow $W$ can be defined as a directed graph whose nodes are the tasks $t_1, t_2, \ldots, t_n$ in the workflow and whose edges are the task dependencies $t_i \xrightarrow{x} t_j$, where $t_i, t_j \in W$ and $x$ denotes the type of dependency.
Because of the large variety of coordination requirements that a workflow may need to support, various kinds of dependencies have been proposed. A possible classification is presented in [17], where three basic types of task dependencies are identified: control-flow dependencies, value dependencies and external dependencies. Control-flow dependencies may in turn involve explicit transmission of data as part of the result of a task; we call such dependencies control-flow dependencies with data flow. In particular, a control-flow dependency specifies the conditions, expressed as a state $st_i$ of task $t_i$, under which a task $t_j$ can enter state $st_j$. Table 1 reports a list of possible control-flow dependencies for two given tasks $t_i$ and $t_j$. Other control-flow dependencies have been devised, and we refer the reader to [13, 12] for a comprehensive list. Control-flow dependencies with data flow extend control-flow dependencies by additionally requiring the transmission of data values between tasks. Such dependencies are useful when a task needs to wait for data from another task. More specifically, a control-flow dependency with data flow can be defined as follows: a task $t_j$ can enter state $st_j$ only after task $t_i$ enters state $st_i$ and $t_i$ passes values of data objects to $t_j$. Notice that a control-flow dependency with data flow is meaningful only for a limited set of combinations of $st_i$ and $st_j$. For example, $st_i$ and $st_j$ can be “commit” and “begin,” respectively, but cannot be “begin” and “commit.” Value dependencies are based on more general conditions than those used in control-flow dependencies.
More specifically, a value dependency can be defined as follows: a task $t_j$ can enter state $st_j$ only after the outcome of task $t_i$ satisfies a condition $c_i$. The condition can be a logical expression whose value is either 0 or 1, for example, “$t_j$ can begin if $t_i$ is a success (semantically).” Unlike the prior types, external dependencies are caused by parameters external to the system, such as time. An external dependency can be defined as follows: a task $t_i$ can enter state $st_i$ only if a certain condition $c_j$ is satisfied, where the parameters in $c_j$ are external to the workflow. Examples include “task $t_i$ can start its execution only at 9:00am” and “task $t_j$ can start its execution only 24 hours after the completion of task $t_k$.” Since external dependencies require special modeling techniques that can incorporate external events, we do not consider them in this paper. The methodologies developed in this paper can be extended to value dependencies by simply modeling values as states (refer to [3] for more details). Therefore, we do not discuss value dependencies explicitly, but focus on control-flow dependencies only.
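To make the control-flow dependencies of Table 1 concrete, the Python sketch below checks a finished execution trace against several of them. A trace is assumed to be a list of (primitive, task) events in the order they occurred; this encoding, the `RULES` table and the reading of exclusion as "whichever task commits, the other must abort" are our assumptions, not definitions taken from the model.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str]  # (primitive, task), primitive in {"b", "a", "c"}
Trace = List[Event]


def pos(trace: Trace, prim: str, task: str) -> int:
    """Position of the event in the trace, or -1 if it never occurred."""
    return trace.index((prim, task)) if (prim, task) in trace else -1


def term(trace: Trace, task: str) -> int:
    """Position of the commit or abort of task (at most one occurs), or -1."""
    return max(pos(trace, "c", task), pos(trace, "a", task))


Rule = Callable[[Trace, str, str], bool]

RULES: Dict[str, Rule] = {
    # bc: t_j cannot begin until t_i commits.
    "bc": lambda tr, i, j: pos(tr, "b", j) < 0 or 0 <= pos(tr, "c", i) < pos(tr, "b", j),
    # b: t_j cannot begin until t_i has begun.
    "b": lambda tr, i, j: pos(tr, "b", j) < 0 or 0 <= pos(tr, "b", i) < pos(tr, "b", j),
    # a: t_j must abort if t_i aborts.
    "a": lambda tr, i, j: pos(tr, "a", i) < 0 or pos(tr, "a", j) >= 0,
    # t: t_j can terminate only after t_i terminates.
    "t": lambda tr, i, j: term(tr, j) < 0 or 0 <= term(tr, i) < term(tr, j),
    # sc: if t_i commits then t_j must commit.
    "sc": lambda tr, i, j: pos(tr, "c", i) < 0 or pos(tr, "c", j) >= 0,
    # fba: t_j must begin if t_i aborts.
    "fba": lambda tr, i, j: pos(tr, "a", i) < 0 or pos(tr, "b", j) >= 0,
    # e: if either task commits, the other must abort.
    "e": lambda tr, i, j: (pos(tr, "c", i) < 0 or pos(tr, "a", j) >= 0)
    and (pos(tr, "c", j) < 0 or pos(tr, "a", i) >= 0),
}


def satisfies(trace: Trace, deps: List[Tuple[str, str, str]]) -> bool:
    """True if the trace respects every dependency (t_i, x, t_j)."""
    return all(RULES[x](trace, i, j) for i, x, j in deps)


deps = [("t1", "bc", "t2"), ("t2", "bc", "t3")]
ok = [("b", "t1"), ("c", "t1"), ("b", "t2"), ("c", "t2"), ("b", "t3"), ("c", "t3")]
early = [("b", "t1"), ("b", "t2"), ("c", "t1"), ("c", "t2"), ("b", "t3"), ("c", "t3")]
print(satisfies(ok, deps), satisfies(early, deps))  # True False
```

In `early`, $t_2$ begins before $t_1$ commits, so the begin-on-commit rule fails.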
2.2 Heterogeneous Distributed System Model
We assume that the heterogeneous distributed system consists of a set $S$ of participating sites, all connected via communication links. We assume there exists a workflow management system that sends tasks to the appropriate site $s_i$, and each site $s_i$ is equipped with a DBMS to execute the tasks. Each site $s_i$ stores a set of data items $D_i$. We assume that the $D_i$'s are disjoint; that is, every data item $d$ is stored at only one site. We use $s(d)$ to denote the site at which $d$ is stored. In addition, we make the following assumption. A task $t_i$ is allowed to perform write operations on data items that belong to one and only one site, say $s_j$, and we say that $t_i$ belongs to site $s_j$. We use $s(t_i)$ to denote the site to which $t_i$ belongs. However, $t_i$ is allowed to perform read operations on data items stored at more than one site. This is not an unreasonable assumption. For example, consider a travel reservation workflow consisting of three tasks: reserving an airline ticket ($t_1$), reserving a hotel room ($t_2$) and reserving a rental car ($t_3$). While $t_2$ updates the hotel database by reserving a hotel room, it updates neither the airline database nor the car rental database, but it may read the status of the airline database before making a hotel reservation. Many organizations require this property for reasons of security (integrity) and/or autonomy. A workflow may consist of tasks that belong to different sites. For the purpose of our work, we distinguish task dependencies as follows: a dependency between tasks of the same site is referred to as an intra-site dependency, and one between tasks of different sites as an inter-site dependency. Since our aim is to minimize the number of inter-site communications while executing the workflow, we concentrate on inter-site dependencies and examine ways to redesign them in order to achieve this objective.
Example 1 Consider a workflow that computes the weekly pay of all employees at the end of each week at each project location. This process involves the following tasks. Task $t_1$: compute the number of hours $h$ worked by an employee, which is the sum of the regular hours ($n$) and the overtime hours ($o$) worked by the employee during that week. Task $t_2$: calculate the weekly pay ($p$) of the employee by multiplying $h$ by the employee's hourly rate ($r$). Task $t_3$: after computing the pay for the week, reset $h$, $n$ and $o$ to zero. Assume $n$, $o$ and $h$ are maintained at the project location (site A), whereas $p$ and $r$ are maintained at the administrative office (site B). Thus $t_1$ and $t_3$ belong to site A and $t_2$ belongs to site B. The following task dependencies are required for a meaningful workflow: task $t_2$ can begin only after $t_1$ commits, thus $t_1 \xrightarrow{bc} t_2$, and $t_3$ can begin only after $t_2$ commits, thus $t_2 \xrightarrow{bc} t_3$, as shown in Figure 2. Note that both these dependencies are inter-site dependencies.
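Under the system model of Section 2.2, the site of a task is determined by the data it writes, and a dependency is inter-site when its two tasks belong to different sites. The Python sketch below applies this to Example 1. The read and write sets are our reading of the example (for instance, $t_2$ reads $h$ and $r$ and writes $p$), and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Dep = Tuple[str, str, str]  # (t_i, x, t_j)

# Data placement in Example 1: n, o, h at site A; p, r at site B.
SITE_OF_DATA: Dict[str, str] = {"n": "A", "o": "A", "h": "A", "p": "B", "r": "B"}

# Read and write sets of each task, as we read Example 1.
READS: Dict[str, Set[str]] = {"t1": {"n", "o"}, "t2": {"h", "r"}, "t3": set()}
WRITES: Dict[str, Set[str]] = {"t1": {"h"}, "t2": {"p"}, "t3": {"h", "n", "o"}}


def site_of_task(task: str) -> str:
    """s(t_i): the unique site whose data the task writes."""
    sites = {SITE_OF_DATA[d] for d in WRITES[task]}
    if len(sites) != 1:
        # The model requires writes at exactly one site; this sketch also
        # rejects tasks that write nothing, since their site is then undefined.
        raise ValueError(f"{task} writes at sites {sorted(sites)}")
    return next(iter(sites))


def remote_read_sites(task: str) -> Set[str]:
    """Sites other than s(t_i) from which the task reads (allowed by the model)."""
    return {SITE_OF_DATA[d] for d in READS[task]} - {site_of_task(task)}


def split_by_site(deps: List[Dep]) -> Tuple[List[Dep], List[Dep]]:
    """Separate intra-site from inter-site dependencies."""
    intra = [d for d in deps if site_of_task(d[0]) == site_of_task(d[2])]
    inter = [d for d in deps if site_of_task(d[0]) != site_of_task(d[2])]
    return intra, inter


deps = [("t1", "bc", "t2"), ("t2", "bc", "t3")]
intra, inter = split_by_site(deps)
print(intra, inter)  # [] [('t1', 'bc', 't2'), ('t2', 'bc', 't3')]
```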
3 Semantic Classification of Task Dependencies
In this section, we closely examine the semantics of each type of dependency. We argue that only some types of dependencies can be specified in workflows, and that other types do not exist in a correct workflow specification. To reason about the semantics of (control-flow) workflow dependencies, it is important to notice that there are two different categories of dependencies, each arising from different requirements:
1. The first category of dependencies arises to force the order of (conflicting) operations on shared data objects.
2. The second category of dependencies arises to force properties such as atomicity, mutual exclusion, etc.
As an example, consider once again the task dependencies in Example 1. Dependency $t_1 \xrightarrow{bc} t_2$ ensures that the salary is computed only after the computation of the number of hours worked by an employee is completed. The intention of dependency $t_2 \xrightarrow{bc} t_3$ is to prevent $t_3$ from overwriting $h$ before $t_2$ reads it. Thus, the purpose of this dependency is to force a specific order on the conflicting operations, and it therefore belongs to the first category. Dependencies in the second category are specified according to the semantics of the workflow, for example to enforce the mutual exclusion of the commit operations of two tasks.
[Figure 2: The workflow in Example 1: $t_1$ (site A) $\xrightarrow{bc}$ $t_2$ (site B) $\xrightarrow{bc}$ $t_3$ (site A).]
Example 2 Consider a workflow that arranges a travel schedule for a person P. Assume that this involves arranging a trip from New York to Delhi and then from Delhi to Hyderabad. Also assume that the second part of the trip is offered by a local airline company, and therefore two separate airline reservations have to be made. Assume this workflow consists of the following four tasks: reserving a ticket for the first leg of the trip ($t_1$), purchasing the ticket for the first leg ($t_2$), reserving a ticket for the second leg of the trip ($t_3$), and purchasing the ticket for the second leg ($t_4$), where $t_1$ and $t_2$ belong to site A, and $t_3$ and $t_4$ belong to site B. The following task dependencies exist. Purchasing a ticket cannot be started unless reserving the ticket is complete; thus, $t_1 \xrightarrow{bc} t_2$ and $t_3 \xrightarrow{bc} t_4$. Moreover, reserving a flight for the second part of the trip has to be done only after making sure that the flight is available for the first leg of the trip; thus, $t_1 \xrightarrow{bc} t_3$. Furthermore, if purchasing the ticket for the second leg of the trip aborts, purchasing the ticket for the first leg of the trip also needs to be aborted; thus, $t_4 \xrightarrow{a} t_2$. While the first two task dependencies are intra-site dependencies, the latter two are inter-site dependencies, as shown in Figure 3.
[Figure 3: The workflow in Example 2: site A holds $t_1$ and $t_2$, site B holds $t_3$ and $t_4$; edges $t_1 \xrightarrow{bc} t_2$, $t_3 \xrightarrow{bc} t_4$, $t_1 \xrightarrow{bc} t_3$ and $t_4 \xrightarrow{a} t_2$.]
The intention of the inter-site dependencies $t_1 \xrightarrow{bc} t_3$ and $t_4 \xrightarrow{a} t_2$ in the above example is to capture the semantics of the workflow rather than to force an order between conflicting operations. Thus these dependencies fall under the second category. If we examine once again the two sources of dependencies and analyze their effect on the workflow, we can make the following observation. Consider two scenarios: in the first, the inter-site dependency $t_i \rightarrow t_j$ is enforced, whereas in the second it is not. With a dependency of the second category (e.g., $t_4 \xrightarrow{a} t_2$ in Example 2), the result of $t_j$ might differ between the two scenarios; in other words, the result of $t_j$ might be affected if $t_i \rightarrow t_j$ is not enforced. However, the dependency $t_i \rightarrow t_j$ does not affect the result of $t_i$. Thus we call the second category of dependencies result-dependent (RD). On the other hand, consider the first category of dependencies (e.g., $t_2 \xrightarrow{bc} t_3$ in Example 1). If such a dependency is not enforced, the result of $t_j$ is not affected, but that of $t_i$ will be. This can only occur when the two tasks share common data. Thus, we call the first category of dependencies conflicting (CN). In the following, we formally define these categories.
Definition 4 A dependency $t_i \xrightarrow{x} t_j$ between two tasks is said to be conflicting (CN) if there exist at least two conflicting operations $o_i[d]$ and $o_j[d]$ ($i \neq j$); otherwise $t_i \xrightarrow{x} t_j$ is said to be conflict-free (CF).
On the other hand, from the perspective of task results, a dependency can be either result-independent or result-dependent. Formally:
Definition 5 A dependency $t_i \xrightarrow{x} t_j$ is said to be result-dependent (RD) if the result of executing the child differs depending on whether or not the dependency is enforced; otherwise $t_i \xrightarrow{x} t_j$ is said to be result-independent (RI).
The intuitive idea behind this classification is that the result of executing either the child (in the case of RD) or the parent (in the case of CN) must differ depending on whether or not the dependency is enforced. Thus, there cannot be any dependency that is both conflict-free and result-independent. An additional categorization can be made based on the logical nature of a dependency. In general, dependencies can be grouped into strong and weak.
Definition 6 Given a control-flow dependency $t_i \xrightarrow{x} t_j$, if the dependency implies the logical relationship $st_i \Rightarrow st_j$ ($st_i \Leftarrow st_j$), we say that it is weak (strong).
According to the above definition, a strong dependency specifies a logical relationship such that $t_j$ can enter state $st_j$ only if task $t_i$ enters state $st_i$. That is, the strong type specifies the necessary condition to enforce the dependency. In other words, $t_i \xrightarrow{x} t_j$ can be interpreted as “$t_j$ can enter state $st_j$ only if $t_i$ enters $st_i$.” Moreover, in order to enforce the dependency, $st_i$ must precede $st_j$ ($st_i \prec st_j$). An example of the strong type is the begin-on-commit dependency ($t_i \xrightarrow{bc} t_j$), which states that a task $t_j$ cannot begin unless $t_i$ commits (other examples include b, t, gc, etc.). In practice, a strong dependency is used to specify the precondition(s) for a particular event to occur. On the other hand, a weak dependency states that if $t_i$ enters state $st_i$ then $t_j$ can/must enter state $st_j$, but $t_j$ can enter $st_j$ even if $t_i$ has not entered $st_i$. That is, the weak type specifies the sufficient condition to enforce the dependency. For example, the strong commit dependency ($t_i \xrightarrow{sc} t_j$), which states “if a task $t_i$ commits then $t_j$ must commit,” is of the weak type (other examples include a, fba, wba, wbc, c, e, etc.). In a workflow, the weak type can be used in situations where a particular workflow state has to trigger an event. In addition to the above classification, dependencies can be categorized based on the resulting state of the child task, as follows.
Definition 7 Given a control-flow dependency $t_i \xrightarrow{x} t_j$, the dependency is of the abortive type if $st_j$ is $ab_j$; otherwise it is non-abortive.
The classification given by the above definition applies to both strong and weak dependencies. For example, abortive type dependencies include the abort and exclusion dependencies, whereas non-abortive type dependencies include commit, strong commit, begin-on-commit, serial, begin-on-abort, etc.
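Definitions 6 and 7 can be read operationally over a trace of state entries: a strong dependency requires $st_i$ to have been entered before $st_j$ whenever $st_j$ is entered, while a weak one requires $st_j$ to be entered whenever $st_i$ is. The Python sketch below encodes that reading; the `Dependency` class, the trace format and the three sample encodings are our own illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

StateEntry = Tuple[str, str]  # (task, state), state in {"ex", "cm", "ab"}


@dataclass(frozen=True)
class Dependency:
    st_i: str     # state of the parent t_i
    st_j: str     # state of the child t_j
    strong: bool  # True: st_i <= st_j (necessary); False: st_i => st_j (sufficient)

    @property
    def abortive(self) -> bool:
        """Definition 7: abortive iff the child's state is abort."""
        return self.st_j == "ab"


def entered(trace: List[StateEntry], task: str, state: str) -> int:
    """Position at which task entered state, or -1 if it never did."""
    return trace.index((task, state)) if (task, state) in trace else -1


def holds(dep: Dependency, trace: List[StateEntry], ti: str, tj: str) -> bool:
    wi, wj = entered(trace, ti, dep.st_i), entered(trace, tj, dep.st_j)
    if dep.strong:
        # t_j may enter st_j only if t_i entered st_i earlier (st_i precedes st_j).
        return wj < 0 or 0 <= wi < wj
    # Weak: once t_i enters st_i, t_j must also enter st_j.
    return wi < 0 or wj >= 0


BC = Dependency("cm", "ex", strong=True)      # begin-on-commit
SC = Dependency("cm", "cm", strong=False)     # strong commit
ABORT = Dependency("ab", "ab", strong=False)  # abort dependency

trace = [("t1", "ex"), ("t1", "cm"), ("t2", "ex"), ("t2", "ab")]
print(holds(BC, trace, "t1", "t2"), holds(SC, trace, "t1", "t2"))  # True False
print(ABORT.abortive, BC.abortive)  # True False
```

Beginning a task is modelled as entering its execution state `"ex"`, matching the task structure of Section 2.1.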
No-Force-Commit and No-Prevent-Abort Assumptions: When enforcing the dependencies, we make the following two assumptions.
No-Force-Commit (NFC) Assumption: No task execution can be guaranteed to commit. However, a task can be forced to begin or abort.
No-Prevent-Abort (NPA) Assumption: No task execution can be prevented from aborting. However, a task can be prevented from beginning or committing.
Further examination of these dependencies reveals that all abortive type workflow dependencies must be weak, whereas non-abortive type workflow dependencies can be either weak or strong. Strong abortive type dependencies such as the force abort dependency (fa) cannot exist because they would violate the NPA assumption. Additionally, because of the NFC assumption, enforcing some weak non-abortive type dependencies, such as strong commit (sc) and force-commit-on-abort (fca) ($t_j$ must commit if $t_i$ aborts), may require an additional primitive operation such as precommit.
[Figure 4: A semantic categorization of task dependencies along the CN/CF and RD/RI dimensions, with regions labelled strong, weak, abortive and non-abortive.]
Now, let us relate the CN-CF and RD-RI categorizations to the strong-weak and abortive-non-abortive types.
Case CN: Dependencies in this category indicate that the two tasks involved in the dependency access some common data items in conflicting mode. The primary reason for such a dependency is to enforce the order of these conflicting operations. This category depicts a typical conflicting situation where the parent with a read operation is followed by the child with a write operation on the same data object, or vice versa. (No other combination of reads and writes is possible under our distributed system model.) Inter-site CN dependencies can be further classified into result-dependent (RD) and result-independent (RI) dependencies, resulting in CN-RD and CN-RI dependencies, which are briefly discussed below.
– CN-RI: RI dependencies mean that the result of the child does not depend on whether or not the dependency is enforced. However, not enforcing the dependency may produce a different result for the parent task. Therefore, all dependencies falling into this category must be non-abortive, such as begin-on-commit, begin-on-abort, etc., because otherwise the parent could affect the result of the child (i.e., cause it to abort).
– CN-RD: Dependencies such as force abort, termination, exclusion, etc., that may cause the child task to abort (the abortive type) fall under the RD category. Abortive type dependencies are all RD because, without enforcing the dependency, the child task may commit (as opposed to abort); since an abortive dependency can cause the child to abort, it cannot be of type RI.
Case CF: Dependencies in this category can only arise from purely semantic specifications. Although the two tasks do not conflict, such dependencies are sometimes specified to ensure an acceptable termination state of the workflow.
– CF-RD: These dependencies are specified in such a way that the execution of the child depends on the value or outcome of the parent. Therefore, altering the execution order between these two tasks would affect the outcome of the child; in other words, the dependency is result-dependent. These dependencies can be either strong or weak.
– CF-RI: A CF type dependency does not affect the result of the parent task. Thus, there cannot exist any dependency that is both CF and RI, because its presence would affect neither the parent nor the child.
The above categorization of inter-site dependencies (shown in Figure 4) is important because each category needs to be handled using a different approach in a distributed workflow environment. The specific approach used for each category is presented in the next section. We now introduce an algorithm to classify the inter-site dependencies of a given workflow according to the above classification. The algorithm only needs to know the set of data items that may potentially be read and written by each task, and the set of dependencies among tasks. Note, however, that the approach we introduce in the next section still applies even if a task only reads (writes) a subset of that set.
Algorithm 1 [Identifying the Type of Dependency]
for every $t_i \xrightarrow{x} t_j$ in $W$ where $s(t_j) \neq s(t_i)$ do
    if $\exists\, r_i[d] \in t_i$ and $w_j[d] \in t_j$ (where $i \neq j$) then
        if $t_i \xrightarrow{x} t_j$ is abortive then label $x$ with CN-RD
        else label $x$ with CN-RI
    else label $x$ with CF-RD
end for
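Algorithm 1 translates almost line for line into Python. In the sketch below, each task is described by its potential read and write sets and its site, and the set of abortive dependency types is passed in (defaulting to the abort and exclusion dependencies named after Definition 7). The data structures and names are assumptions made for illustration; the test $\exists\, r_i[d] \in t_i, w_j[d] \in t_j$ is implemented as a non-empty intersection of the parent's read set and the child's write set.

```python
from typing import AbstractSet, Dict, List, Set, Tuple

Dep = Tuple[str, str, str]  # (t_i, x, t_j)


def classify(
    deps: List[Dep],
    reads: Dict[str, Set[str]],
    writes: Dict[str, Set[str]],
    site: Dict[str, str],
    abortive: AbstractSet[str] = frozenset({"a", "e"}),
) -> Dict[Dep, str]:
    """Label every inter-site dependency as CN-RD, CN-RI or CF-RD (Algorithm 1)."""
    labels: Dict[Dep, str] = {}
    for ti, x, tj in deps:
        if site[ti] == site[tj]:
            continue  # intra-site dependencies are not classified
        if reads[ti] & writes[tj]:  # some r_i[d] in t_i and w_j[d] in t_j
            labels[(ti, x, tj)] = "CN-RD" if x in abortive else "CN-RI"
        else:
            labels[(ti, x, tj)] = "CF-RD"
    return labels


# Data of Example 1: t2 reads h, which t3 later overwrites.
reads = {"t1": {"n", "o"}, "t2": {"h", "r"}, "t3": set()}
writes = {"t1": {"h"}, "t2": {"p"}, "t3": {"h", "n", "o"}}
site = {"t1": "A", "t2": "B", "t3": "A"}
labels = classify([("t1", "bc", "t2"), ("t2", "bc", "t3")], reads, writes, site)
print(labels)
```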
4 Notion of Equivalence
In the following, we develop the definitions necessary to formalize the execution model for workflows. Given a dependency $t_i \xrightarrow{x} t_j$, we say that $t_i$ is the parent of $t_j$ and $t_j$ the child of $t_i$.
Definition 8 Given two tasks $t_i$ and $t_j$ in $W$, $t_i$ is said to be an ancestor (descendant) of $t_j$ if $t_i$ is a parent (child) of $t_j$, or $t_i$ is a parent (child) of $t_k$ where $t_k$ is an ancestor (descendant) of $t_j$.
Definition 9 Given a workflow $W$, a potential-state-set of $W$ (denoted $PSS(W)$) is a set such that each element (called a potential-state) of $PSS(W)$ represents an allowed combination of final states of all the tasks in $W$.
According to the above definition, $PSS(W)$ is a set of sets. As an example, consider the workflow $W = \{t_1, t_2\}$ and the dependency $t_1 \xrightarrow{a} t_2$. Then $PSS(W) = \{(ab_1, ab_2), (\overline{ab_1}, ab_2), (\overline{ab_1}, \overline{ab_2})\} = \{(ab_1, ab_2), (cm_1, ab_2), (cm_1, cm_2)\}$. Note that a different dependency between $t_1$ and $t_2$ may result in an entirely different $PSS(W)$. We define a workflow history as the set consisting of only the data operations of all the tasks in the workflow, obtained by removing all the primitive operations. Formally, it is defined as follows.
Definition 10 [Workflow History] Given a workflow $W$ comprised of a set of tasks $\{t_1, t_2, \ldots, t_n\}$, the workflow history $WH$ of $W$ is defined as $WH = \bigcup_{i=1}^{n} t_{iH}$, where $t_{iH} = t_i \setminus PO_i$.
Our approach to executing workflows is to redesign the workflow using three mechanisms: splitting a task, compensating a task and substituting a dependency. In the following, we develop the formalism necessary to show that the redesigned workflow is equivalent to the original workflow in the sense of Definition 15.
Definition 11 Given a task $t_i$ and a site $s_j$, we say that there exists a partition $t_i^{s_j}$ of $t_i$ if $t_i^{s_j} = \{o_i[d] \in t_i \mid s(d) = s_j\} \neq \emptyset$.
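Definitions 10 and 11 are simple set constructions. The Python sketch below computes a workflow history by dropping primitives, and the partitions of a task by grouping its data operations by the site of the data they touch; it uses the four-operation task $r_i[x]\,r_i[y]\,r_i[z]\,w_i[z]$ discussed after Definition 11, wrapped in a begin and a commit. The encoding of operations as (kind, object) pairs, with `None` as the object of a primitive, is an assumption.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Set, Tuple

Op = Tuple[str, Optional[str]]  # ("r", "x"), ("w", "x"), or ("b"/"a"/"c", None)


def workflow_history(workflow: Dict[str, List[Op]]) -> Set[Tuple[str, str, str]]:
    """Definition 10: union over tasks of t_i minus its primitives PO_i."""
    return {(t, kind, d) for t, ops in workflow.items() for kind, d in ops if kind in ("r", "w")}


def partitions(ops: List[Op], site_of: Dict[str, str]) -> Dict[str, List[Op]]:
    """Definition 11: group the data operations of t_i by the site s(d) of their object."""
    parts: DefaultDict[str, List[Op]] = defaultdict(list)
    for kind, d in ops:
        if kind in ("r", "w"):
            parts[site_of[d]].append((kind, d))
    return dict(parts)  # only non-empty partitions exist


ti = [("b", None), ("r", "x"), ("r", "y"), ("r", "z"), ("w", "z"), ("c", None)]
site_of = {"x": "s_k", "y": "s_l", "z": "s_j"}
print(partitions(ti, site_of))
# {'s_k': [('r', 'x')], 's_l': [('r', 'y')], 's_j': [('r', 'z'), ('w', 'z')]}
```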
According to our distributed system model, a task $t_i$ is allowed to read and write data at the site to which it belongs, but only to read data at any other site. Thus, if a task reads from two other sites and reads or writes at its own site, it has three partitions. As an example, consider a task $t_i = r_i[x]\, r_i[y]\, r_i[z]\, w_i[z]$, where $s(t_i) = s_j$, $s(x) = s_k$, $s(y) = s_l$ and $s(z) = s_j$, with $s_j$, $s_k$ and $s_l$ pairwise distinct. Accordingly, $t_i$ has three partitions: $t_i^{s_k} = r_i[x]$, $t_i^{s_l} = r_i[y]$ and $t_i^{s_j} = r_i[z]\, w_i[z]$. Thus, every partition accesses data at only one site.
Definition 12 A task $t_i$ is said to be compensatable if the effects of its execution can be semantically undone by executing a compensating task $t_i^{-1}$.¹
¹ We assume that all compensating tasks eventually commit; in other words, $ex_i^{-1} \Rightarrow cm_i^{-1}$.
Definition 13 [Semantic Projection] Given a workflow history $WH$, the semantic projection of $WH$, denoted $S(WH)$, is obtained as follows: (1) for every pair of tasks $t_i$ and $t_i^{-1}$ in $W$, if both $t_i$ and $t_i^{-1}$ commit, then remove all of $DO_i$ and $DO_i^{-1}$ from $WH$; (2) for every $t_i$ in $W$, if $t_i$ aborts, then remove $DO_i$, as well as $DO_i^{s}$ of every existing partition $t_i^{s}$ of $t_i$ for all $s \neq s(t_i)$, from $WH$.
The first item of the above definition states that if a task is compensated, then all its operations, as well as the operations of the compensating task, are removed from the workflow history. The second item states that if a task aborts, all its operations are removed from the history.
Definition 14 [Semantic Transformation] Given a potential-state-set $PSS(W)$, the semantic transformation of $PSS(W)$ is obtained by applying the following transformation rules to every $P \in PSS(W)$: (1) replace every occurrence of the pair $cm_i, cm_i^{-1}$ with $in_i$; (2) remove every occurrence of $in_i^{-1}$; (3) if $cm_i^{s} \in P$ where $s = s(t_i)$, then replace every $cm_i^{s'}$ such that $s' \neq s(t_i)$ with $cm_i$; otherwise, replace every $ab_i^{s}$ and every $cm_i^{s'}$ such that $s' \neq s(t_i)$ with $ab_i$; and (4) replace every occurrence of $in_i$ with $ab_i$.
Since the potential-state-set of a workflow represents the set of all possible final states, our semantic transformation replaces a final state, or a combination of final states, with its semantically equivalent state. More specifically, the first item of the above definition states that if a task executes successfully and is later compensated, then it is treated as if it had not started its execution. The second item removes the states of all unexecuted compensating tasks. The third item states that if a task commits, then all its remote partitions are also considered committed; on the other hand, if it aborts, its remote partitions are considered aborted. Finally, the fourth item treats every unexecuted task as if it had aborted.
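The following Python sketch enumerates a potential-state-set (Definition 9) by brute force and applies rules (1), (2) and (4) of the semantic transformation; rule (3), which concerns partitioned tasks, is not modelled. The encodings are assumptions: a potential state is a frozenset of (task, state) pairs, and the compensating task of `t` is named `t^-1`. As a check, it reproduces the abort-dependency example given after Definition 9.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Set, Tuple

State = FrozenSet[Tuple[str, str]]  # {(task, final state), ...}
INV = "^-1"  # hypothetical naming convention for compensating tasks


def pss(domains: Dict[str, Tuple[str, ...]], allowed: Callable[[Dict[str, str]], bool]) -> Set[State]:
    """Definition 9: every allowed combination of final states of all tasks."""
    tasks = list(domains)
    result: Set[State] = set()
    for combo in product(*(domains[t] for t in tasks)):
        s = dict(zip(tasks, combo))
        if allowed(s):
            result.add(frozenset(s.items()))
    return result


def transform(states: Set[State]) -> Set[State]:
    """Rules (1), (2) and (4) of Definition 14 (rule (3) is not modelled)."""
    out: Set[State] = set()
    for p in states:
        s = dict(p)
        for t in [t for t in s if not t.endswith(INV)]:
            if s[t] == "cm" and s.get(t + INV) == "cm":
                s[t] = "in"        # rule (1): committed, then compensated
                del s[t + INV]
        for c in [c for c in s if c.endswith(INV) and s[c] == "in"]:
            del s[c]               # rule (2): drop unexecuted compensating tasks
        out.add(frozenset((t, "ab" if st == "in" else st) for t, st in s.items()))  # rule (4)
    return out


# t1 -a-> t2: t2 must abort if t1 aborts.
abort_pss = pss({"t1": ("cm", "ab"), "t2": ("cm", "ab")},
                lambda s: not (s["t1"] == "ab" and s["t2"] != "ab"))
print(sorted(sorted(p) for p in abort_pss))
# [[('t1', 'ab'), ('t2', 'ab')], [('t1', 'cm'), ('t2', 'ab')], [('t1', 'cm'), ('t2', 'cm')]]
```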
Definition 15 [Semantic Equivalence] Given two workflows $W$ and $W'$, we say that $W'$ is semantically equivalent to $W$, denoted as $W' \equiv W$, if (1) the semantic projections of $WH$ and $WH'$ consist of the same data operations, i.e., $S(WH) = S(WH')$, (2) for every pair of conflicting operations $o_i[d]$ and $o_j[d]$, if $t_i$ is an ancestor of $t_j$ in $W$, then $t_i$ is an ancestor of $t_j$ in $W'$, and (3) $PSS(W) \equiv PSS(W')$. □
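Condition (1) above compares the semantic projections of two histories. A minimal sketch of Definition 13 under illustrative assumptions: each operation is tagged with its task and, optionally, the remote partition it belongs to, and a compensating task of `t2` is named `t2^-1`, which is a naming convention invented here.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical encoding of one data operation in a workflow history WH:
# (task, partition site or None for the home partition, "r"/"w", data item).
Op = Tuple[str, Optional[str], str, str]


def semantic_projection(history: Iterable[Op], committed: Set[str], aborted: Set[str]) -> List[Op]:
    """Compute S(WH) following the two items of Definition 13."""
    drop: Set[str] = set()
    for t in committed:
        # Item (1): t_i and t_i^{-1} both committed -> drop both tasks' operations.
        if t + "^-1" in committed:
            drop |= {t, t + "^-1"}
    # Item (2): an aborted task loses its operations, including remote partitions.
    drop |= aborted
    return [op for op in history if op[0] not in drop]


wh = [("t1", None, "r", "x"), ("t1", "B", "r", "y"),
      ("t2", None, "w", "z"), ("t2^-1", None, "w", "z"), ("t3", None, "w", "x")]
print(semantic_projection(wh, committed={"t1", "t2", "t2^-1"}, aborted={"t3"}))
# [('t1', None, 'r', 'x'), ('t1', 'B', 'r', 'y')]
```

Checking condition (1) for two workflows would then amount to comparing the two projected operation collections.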
Definition 16 Given a dependency $t_i \xrightarrow{x} t_j$ of type $x$, we define an inverse $x^{-1}$ for $x$ such that $PSS(W) \equiv PSS(W')$, where $W = \{t_i, t_j\}$ with the dependency $t_i \xrightarrow{x} t_j$, and $W' = \{t_i, t_j, t_j^{-1}\}$ with dependencies $t_j \xrightarrow{bc} t_j^{-1}$ and $t_i \xrightarrow{x^{-1}} t_j^{-1}$, in which $t_j^{-1}$ is a compensating task of $t_j$. □
In order to derive such an inverse for each dependency, we have made the following assumptions:
1. $ex_i \Rightarrow$ either $ab_i$ or $cm_i$
2. $ab_i \equiv \neg cm_i$
3. $cm_i \equiv \neg ab_i$
4. $\neg ex_i \equiv in_i$
The first assumption indicates that every task that starts its execution must either commit or abort. Assumptions 2 and 3 state that abort and commit are complements of each other. Assumption 4 states that a task that has not yet begun is functionally equivalent to the state where the task is in its initial state. Table 2 lists some RD dependencies and their inverses, derived based on these assumptions.
| $x$ | $x^{-1}$ |
|-----|----------|
| bc  | fba      |
| a   | fba      |
| e   | fbc      |
| ba  | fbc      |
| sc  | fba      |

Table 2: Some RD type dependencies and their inverses
Lemma 1 The dependency fba is an inverse for bc. (See [6] for a proof.)

Definition 17 Given a dependency $t_i \xrightarrow{x} t_j$, we define a substitute $x'$ for $x$ such that $PSS(W) \equiv PSS(W')$, where $W = \{t_i, t_j\}$ with the dependency $t_i \xrightarrow{x} t_j$ and $W' = \{t_i, t_j\}$ with the dependency $t_i \xrightarrow{x'} t_j$. □

Table 3 lists some dependencies and their substitutes.

| $x$ | $x'$ |
|-----|------|
| a   | bc   |
| e   | ba   |
| ba  | e    |
| bc  | a    |

Table 3: Examples of dependencies and their substitute dependencies

Lemma 2 The dependency bc is a substitute for a. (See [6] for a proof.)
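The entries of Tables 2 and 3 can be applied mechanically to produce the rewritten dependencies of Definitions 16 and 17. The sketch below encodes only the rows listed in the two tables (any other type raises `KeyError`); the tuple representation of dependencies and the `^-1` suffix for compensating tasks are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Dep = Tuple[str, str, str]  # (source task, dependency type, target task)

# Table 2: inverses of some RD dependencies; Table 3: substitutes.
INVERSE: Dict[str, str] = {"bc": "fba", "a": "fba", "e": "fbc", "ba": "fbc", "sc": "fba"}
SUBSTITUTE: Dict[str, str] = {"a": "bc", "e": "ba", "ba": "e", "bc": "a"}


def invert(dep: Dep) -> List[Dep]:
    """Dependencies of W' in Definition 16 that replace t_i -x-> t_j."""
    ti, x, tj = dep
    tj_inv = tj + "^-1"  # illustrative name for the compensating task t_j^{-1}
    return [(tj, "bc", tj_inv), (ti, INVERSE[x], tj_inv)]


def substitute(dep: Dep) -> Dep:
    """Dependency of W' in Definition 17: same tasks, substitute type x'."""
    ti, x, tj = dep
    return (ti, SUBSTITUTE[x], tj)


print(invert(("t1", "bc", "t2")))     # [('t2', 'bc', 't2^-1'), ('t1', 'fba', 't2^-1')]
print(substitute(("t1", "a", "t2")))  # ('t1', 'bc', 't2')
```

Note that the substitutes listed in Table 3 come in mutual pairs (a/bc and e/ba), so applying the substitution twice returns the original type.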
5 Workflow Redesigning

In this section, we first present the split-task and compensate-task approaches that are utilized in the redesign process of workflows. Then we show how a workflow can be redesigned using the three techniques: split, compensate and substitute.
5.1 Split-Task Approach

According to this approach, first all the operations in every task are reordered in such a way that all read operations on remote data items occur before all operations on local data items. (Since all these are read operations, the reordering does not affect the correctness of the task.) Then the task is divided into partitions based on the data items it is accessing. For example, if task $t_2$ in Example 1 is split into two tasks, the first task, $t_2^A$, contains the read operations on $n$ and $o$, and the second task, $t_2^B$ (we refer to this as $t_2$), contains the read/write operations on $r$ and $p$. We introduce a begin-on-commit dependency with data flow $t_2^A \xrightarrow{bc(data)} t_2$ to ensure that the data read by the read operations in $t_2^A$ are in fact carried over to $t_2$ even in the wake of other interfering tasks. Then, to enforce the original dependencies $t_1 \xrightarrow{bc} t_2$ and $t_2 \xrightarrow{bc} t_3$, we convert these two dependencies into $t_1 \xrightarrow{bc} t_2^A$ and $t_2 \xrightarrow{bc} t_3$, as shown in figure 10. Task $t_3$ proceeds only if $t_2$ commits, thus preserving the inter-site dependency between $t_2$ and $t_3$. Thus the redesigned workflow with the split approach contains only one inter-site dependency instead of two.

Note that splitting a task into two may not always reduce the number of inter-site dependencies. Our algorithm identifies those cases and splits tasks only when doing so reduces the number of inter-site dependencies. For example, consider the workflow in figure 5. The split algorithm will not be used if the CN type dependencies are such that $t_1^B$ and $t_3^A$ exist: even if $t_1$ and $t_3$ are split, there would still be the same number of inter-site dependencies (as shown in figure 6). On the other hand, if the CN dependencies are such that $t_2^A$ and $t_3^A$ exist, then the split algorithm can be applied to these dependencies, in which case the two inter-site communications (shown in figure 7) can be bundled together into one (because they are just read-only), as shown in figure 8, thereby reducing the original two inter-site communications to one.

Figure 5: An example
Figure 6: Split-task algorithm will not be used in this case
Figure 7: Split-task algorithm will be used in this case
Figure 8: Bundling of two inter-site communications
Figure 9: An example demonstrating the closest ancestor

In the following, we formally present our approach.

Definition 18 Given two tasks $t_i$ and $t_j$ in $W$, $t_j$ is said to be a closest-$s$-ancestor of $t_i$ if (1) $s(t_j) = s$, (2) $t_j$ is an ancestor of $t_i$, (3) there exists no $t_k$ in $W$ such that $s(t_k) = s$ and $t_k$ is an ancestor of $t_i$ and a descendant of $t_j$, and (4) for every $t_k$ in $W$ such that $t_k$ is an ancestor of $t_i$ and a descendant of $t_j$, $s(t_k) = s(t_i)$. □
For example, in the workflow shown in figure 9, $t_8$ and $t_1$ are the closest-B-ancestors of $t_7$, and $t_3$ is the closest-C-ancestor of $t_7$. By contrast, $t_6$ is not a closest-B-ancestor of $t_7$, because there exists $t_8$ (at site B), which is an ancestor of $t_7$ and a descendant of $t_6$. The following algorithm specifies our approach to task splitting.
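Definition 18 can be checked directly on a task graph. The following Python sketch is a straightforward, unoptimized transcription of its four conditions. The example graph is hypothetical: it only mirrors the situation described for figure 9 (two closest-B-ancestors, one closest-C-ancestor, and a site-B task shadowed by another site-B task), not the figure's actual edges.

```python
from typing import Dict, Iterable, List, Set, Tuple


def _reachable(start: str, nxt: Dict[str, List[str]]) -> Set[str]:
    """All tasks reachable from start along nxt (start itself excluded)."""
    seen: Set[str] = set()
    stack = list(nxt.get(start, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(nxt.get(n, []))
    return seen


def closest_ancestors(edges: Iterable[Tuple[str, str]], site: Dict[str, str],
                      ti: str, s: str) -> Set[str]:
    """Closest-s-ancestors of ti, checking conditions (1)-(4) of Definition 18."""
    children: Dict[str, List[str]] = {}
    parents: Dict[str, List[str]] = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        parents.setdefault(v, []).append(u)
    ancestors = _reachable(ti, parents)
    result: Set[str] = set()
    for tj in ancestors:                                 # condition (2)
        if site[tj] != s:                                # condition (1)
            continue
        between = ancestors & _reachable(tj, children)   # ancestors of ti below tj
        if any(site[tk] == s for tk in between):         # condition (3)
            continue
        if all(site[tk] == site[ti] for tk in between):  # condition (4)
            result.add(tj)
    return result


# Hypothetical workflow: t6 (site B) is shadowed by t8 (site B) on the path to t7.
edges = [("t1", "t2"), ("t2", "t7"), ("t6", "t8"), ("t8", "t5"), ("t5", "t7"), ("t3", "t7")]
site = {"t1": "B", "t2": "A", "t3": "C", "t5": "A", "t6": "B", "t7": "A", "t8": "B"}
print(sorted(closest_ancestors(edges, site, "t7", "B")))  # ['t1', 't8']
print(sorted(closest_ancestors(edges, site, "t7", "C")))  # ['t3']
```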
Algorithm 2 [split task]
for every inter-site dependency $t_i \xrightarrow{x} t_j$ in $W$ where $x$ is of CN-RI type
    if there exists a $t_i^{s(t_j)}$
        for each closest-$s(t_j)$-ancestor $t_k$ of $t_i$
            find the descendant $t_l$ of $t_k$ such that $s(t_l) = s(t_i)$ and $t_l$ is an ancestor of $t_i$
            if there exists a $t_l^{s(t_j)}$
                add $t_i^{s(t_j)}$ and $t_l^{s(t_j)}$ to $W$
                replace $t_i$ with $t_i - t_i^{s(t_j)}$ and $t_l$ with $t_l - t_l^{s(t_j)}$
                add dependencies $t_i^{s(t_j)} \xrightarrow{bc(data)} t_i$, $t_l^{s(t_j)} \xrightarrow{bc(data)} t_l$, $t_k \xrightarrow{bc} t_l^{s(t_j)}$ and $t_i^{s(t_j)} \xrightarrow{x} t_j$
            end{if}
        end{for}
    end{if}
end{for}
□
According to the above algorithm, given a dependency $t_i \xrightarrow{x} t_j$, where $x$ is of CN-RI type, we first split $t_i$ into a number of partitions based on the site to which each data item that $t_i$ accesses belongs. Then we add a dependency of type “bc” (i.e., begin-on-commit) with data flow from every partition $t_i^{s_j}$ to $t_i^{s(t_i)}$, where $s_j \neq s(t_i)$. This ensures that the data read by the $s_j$ partition reach the $s(t_i)$ partition. Next, for each site where there is a partition of $t_i$, we add a dependency of type “bc” from every closest-$s_j$-ancestor $t_k$ of $t_i$ at that site to that partition. This dependency ensures that splitting does not remove any dependency from $t_k$ to $t_i^{s_j}$. We need to take care of the case where $t_k$ and $t_i^{s_j}$ are conflicting. The abortive type need not be enforced, because we need not preserve the order of conflicting operations if $t_i^{s_j}$ is to abort, since $t_i^{s_j}$ is always read-only. Thus the dependency $t_k \xrightarrow{x} t_i^{s_j}$ is always of type “bc.” However, in the above algorithm, we do not add any dependency from the closest-$s_j$-ancestor $t_k$ of $t_i$ if there does not exist a partition $t_i^{s_j}$. For example, in the workflow shown in figure 9, if $t_7^B$ exists and $t_7^C$ does not exist, we need to add the dependencies $t_1 \xrightarrow{bc} t_7^B$ and $t_8 \xrightarrow{bc} t_7^B$, but do not need to add any dependency from $t_3$, though it is a closest-C-ancestor of $t_7$. On the other hand, if $t_7^B$ does not exist but only $t_7^C$ exists, then we need to add only the dependency $t_3 \xrightarrow{bc} t_7^C$. We also add a dependency from the partition $t_i^{s_j}$ of $t_i$ to $t_j$; this dependency is of the same type as the original dependency.
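The dependencies that the paragraph above adds for one CN-RI dependency can be sketched as follows. This is a simplified, hypothetical helper: it handles only the partition at $s(t_j)$, takes the closest ancestors as precomputed input, omits the $t_l$ step of Algorithm 2, and does not model whether the original dependency is removed. Partition names such as `t2^A` are illustrative.

```python
from typing import Dict, List, Set, Tuple

Dep = Tuple[str, str, str]  # (source, dependency type, target)


def split_dependencies(ti: str, x: str, tj: str, site: Dict[str, str],
                       partition_sites: Set[str], closest: Dict[str, Set[str]]) -> List[Dep]:
    """Dependencies added when ti -x-> tj (CN-RI) leads to splitting off t_i^{s_j}.

    partition_sites: sites at which ti has a partition.
    closest: site -> closest-site-ancestors of ti (precomputed, e.g. per Definition 18).
    """
    sj = site[tj]
    if sj == site[ti] or sj not in partition_sites:
        return []                               # intra-site, or no t_i^{s_j}: nothing to add
    part = f"{ti}^{sj}"                         # illustrative name for t_i^{s_j}
    added = [(part, "bc(data)", ti)]            # data read at s_j reach ti's home partition
    for tk in sorted(closest.get(sj, set())):
        added.append((tk, "bc", part))          # closest-s_j-ancestors precede the partition
    added.append((part, x, tj))                 # same type as the original dependency
    return added


print(split_dependencies("t2", "bc", "t3", site={"t1": "A", "t2": "B", "t3": "A"},
                         partition_sites={"A", "B"}, closest={"A": {"t1"}}))
# [('t2^A', 'bc(data)', 't2'), ('t1', 'bc', 't2^A'), ('t2^A', 'bc', 't3')]
```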
5.2 Compensate-task Approach

As noted earlier, weak abortive, strong non-abortive and weak non-abortive dependencies fall into the CF-RD category. In this paper, we propose an approach to enforce an RD type of inter-site dependency by executing a compensating task. We show below that some inter-site dependencies (of strong non-abortive and weak abortive type) can be enforced by compensating a child task. This approach, however, is applicable only to tasks for which a compensating task exists. The formalism developed in section 4 can be used to enforce inter-site dependencies as follows. For example, if there exists an inter-site dependency $t_i \xrightarrow{bc} t_j$, then, since the inverse of “bc” is “fba” and “bc” is strong and non-abortive, we replace the above dependency with $t_i \xrightarrow{fba} t_j^{-1}$, meaning that the compensating task $t_j^{-1}$ cannot begin until $t_i$ aborts. That is, both $t_i$ and $t_j$ can be executed independently; thus $t_j$ need not wait for $t_i$. However, if $t_i$ aborts, a compensating task $t_j^{-1}$ will be started. It is at this point that we make use of the presumed abort and presumed commit notions and replace the original dependency in such a way that the number of times a compensating task must be executed is minimized. We explain our approach further with a simple example. Consider an inter-site dependency as shown in figure 11(a). If $t_1$ is presumed commit, then we redesign this as shown in figure 11(b). Note that both $t_1$ and $t_2$ can be executed simultaneously and there is no
need for $t_2$ to wait until $t_1$ commits. However, if $t_1$ does not commit, which is supposed to occur rarely, then $t_2$ will be compensated by executing $t_2^{-1}$. Effectively, there is no dependency between sites A and B most of the time; it exists only in the rare cases when $t_1$ aborts. Thus inter-site communication is required only rarely. On the other hand, if $t_1$ is presumed abort, then we retain the original dependency while redesigning, and the workflow remains as shown in figure 11(a). Note that, in this case, if $t_2$ does not wait until $t_1$ finishes its execution, it may result in unnecessarily compensating or rolling back tasks. Note also that the redesign depends on the nature of the dependency. The following algorithm shows how the above methodology can be employed to modify an inter-site CF-RD type of dependency.

Figure 10: Modified workflow after executing split task for example 1
Figure 11: An example showing the compensation approach

Algorithm 3 [compensate task]
for every inter-site dependency $t_i \xrightarrow{x} t_j$ in $W$ to be compensated
    remove $t_i \xrightarrow{x} t_j$ from $W$
    add a task $t_j^{-1}$ and dependencies $t_i \xrightarrow{x^{-1}} t_j^{-1}$ and $t_j \xrightarrow{bc} t_j^{-1}$ in $W$
    for every $t_l$ where $t_l$ is the closest-$s(t_j)$-ancestor of $t_i$
        add a dependency $t_l \xrightarrow{bc} t_j$ in $W$
    end{for}
    for every $t_j \xrightarrow{y} t_k$ (parent $j$ and child $k$)
        execute cascaded-compensation
    end{for}
end{for}

cascaded-compensation
for each $t_{parent} \xrightarrow{z} t_{child}$
    if there exists a $t_{child}^{-1}$
        add a task $t_{child}^{-1}$ and dependencies $t_{child} \xrightarrow{bc} t_{child}^{-1}$ and $t_{parent}^{-1} \xrightarrow{fbb} t_{child}^{-1}$ to $W$
    end{if}
end{for}
□
The cascaded-compensation part of the above algorithm ensures that, in the event of $t_j$'s compensation, all its descendants are compensated as well. We accomplish this by adding a force-begin-on-begin (fbb) dependency from $t_j^{-1}$ to $t_k^{-1}$, where $t_k$ is $t_j$'s child, from $t_k^{-1}$ to $t_l^{-1}$, where $t_l$ is $t_k$'s child, and so on. However, we do not need to continue this cascaded compensation until we reach the farthest descendant of $t_j$; we can stop whenever an inter-site dependency is encountered on this path, since that dependency has already been redesigned by Algorithm 2 or 3, based on its type. It is important to note here that, if a compensating task $t_j^{-1}$ has more than one parent (other than $t_j$), $t_j^{-1}$ will be initiated when any one of its parents reaches the triggering state.
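A sketch of Algorithm 3, including the cascaded compensation, on a list of dependencies. Assumptions made for the illustration: dependencies are `(source, type, target)` tuples, compensating tasks are named with a `^-1` suffix, only the inverses of Table 2 are known, and the cascade stops at inter-site dependencies (as the text states) and also at any task that has no compensating task.

```python
from typing import Dict, List, Set, Tuple

Dep = Tuple[str, str, str]  # (source, dependency type, target)
INVERSE = {"bc": "fba", "a": "fba", "e": "fbc", "ba": "fbc", "sc": "fba"}  # Table 2


def inv(t: str) -> str:
    return t + "^-1"  # illustrative name for the compensating task


def compensate(deps: List[Dep], dep: Dep, site: Dict[str, str],
               closest_of_ti: Set[str], compensatable: Set[str]) -> List[Dep]:
    """Sketch of Algorithm 3 for one inter-site dependency dep = (ti, x, tj)."""
    ti, x, tj = dep
    out = [d for d in deps if d != dep]                   # remove ti -x-> tj
    out += [(ti, INVERSE[x], inv(tj)), (tj, "bc", inv(tj))]
    for tl in sorted(closest_of_ti):                      # closest-s(tj)-ancestors of ti
        out.append((tl, "bc", tj))
    # Cascaded compensation: walk down from tj along intra-site dependencies.
    stack, seen = [tj], {tj}
    while stack:
        parent = stack.pop()
        for src, _, child in deps:
            if src != parent or site[child] != site[parent] or child not in compensatable:
                continue
            if child not in seen:                         # add t_child^{-1} only once
                seen.add(child)
                stack.append(child)
                out.append((child, "bc", inv(child)))
            out.append((inv(parent), "fbb", inv(child)))  # one fbb per parent
    return out


deps = [("t1", "bc", "t2"), ("t2", "bc", "t3"), ("t3", "bc", "t4")]
site = {"t1": "A", "t2": "B", "t3": "B", "t4": "A"}
result = compensate(deps, ("t1", "bc", "t2"), site, closest_of_ti=set(), compensatable={"t2", "t3"})
print(result[2:])
# [('t1', 'fba', 't2^-1'), ('t2', 'bc', 't2^-1'), ('t3', 'bc', 't3^-1'), ('t2^-1', 'fbb', 't3^-1')]
```

The cascade reaches $t_3$ because $t_2 \xrightarrow{bc} t_3$ is intra-site, and stops before $t_4$ because $t_3 \xrightarrow{bc} t_4$ crosses from site B to site A.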
5.3 Constructing the Redesigned Workflow

Our redesign process is summarized in the algorithm shown as a flow chart in figure 12. To facilitate the explanation of the rationale behind our approaches, we define favorable and unfavorable as follows:
Definition 19 Given a dependency $t_i \xrightarrow{x} t_j$, $t_i$ is said to be favorable to $x$ if either (1) $st_i \neq ab_i$ and $t_i$ is presumed commit (PC), or (2) $st_i = ab_i$ and $t_i$ is presumed abort (PA). In contrast, $t_i$ is said to be unfavorable to $x$ if either (1) $st_i \neq ab_i$ and $t_i$ is presumed abort (PA), or (2) $st_i = ab_i$ and $t_i$ is presumed commit (PC). □

In the following, we explain how the redesign algorithm works and provide the rationale for these different redesign approaches. If the dependency is of CN type (either CN-RI or CN-RD), we use the split-task approach for the reasons explained in section 5.1. Since a CN-RD type may additionally carry a dependency due to the semantics of the workflow, we label that additional dependency as CF-RD and redesign it as follows. All CF-RD dependencies are either redesigned using the compensate or substitute approaches, or retained.

Consider an inter-site CF-RD dependency $t_i \xrightarrow{x} t_j$. According to our categorization (see figure 4), it belongs to one of the following three types: strong non-abortive, weak abortive and weak non-abortive.

If $x$ is a strong non-abortive type dependency, $t_j$ enters $st_j$ (where $st_j \neq ab_j$) only if $t_i$ enters $st_i$. If $t_i$ is favorable to $x$, then $t_j$ will be able to enter $st_j$ most of the time. Since $st_j \neq ab_j$, it is efficient to execute both $t_i$ and $t_j$ simultaneously, thereby increasing the degree of concurrency, and to compensate $t_j$ if $t_i$ ends in a state other than $st_i$. Note, however, that $t_j$ is compensated rarely, because most of the time $t_i$ will end up in $st_i$. This reduces the required number of inter-site communications. However, if there does not exist a compensating task $t_j^{-1}$ of $t_j$, we retain the dependency. On the other hand, if $t_i$ is unfavorable to $x$, $t_j$ will not enter $st_j$ most of the time; therefore, we retain the dependency.

If $x$ is a weak abortive type dependency, $t_i$ being favorable to $x$ implies that $t_j$ will be forced to abort most of the time.
In order to reduce the situations where $t_j$ executes and eventually needs to be compensated, wasting computational resources, it is prudent to replace $x$ with its substitute dependency $x'$. Thus, in most cases, $t_j$ will never start, as opposed to starting and eventually aborting most of the time if the workflow is not redesigned. $t_j$ will be started only in the rare cases when $t_i$ does not end up in $st_i$. By doing so, we reduce the probability that a task acquires resources and waits at its precommit state only to be aborted eventually. On the other hand, if $t_i$ is unfavorable to $x$, $t_j$ can execute normally and independently of $t_i$, since most of the time $t_i$ does not affect $t_j$'s execution. In the rare cases when $t_i$ enters $st_i$, $t_j$ will be compensated to enforce the dependency, thereby reducing the number of inter-site communications.

If $x$ is a weak non-abortive type dependency and $t_i$ is favorable to $x$, $t_j$ is likely to be forced to enter $st_j$ (where $st_j \neq ab_j$). Although in most cases $t_j$ can be executed normally, the dependency $x$ has to be retained to ensure that $t_j$ enters $st_j$ when $t_i$ enters $st_i$. In the rare cases when $t_i$ does not enter $st_i$, $x$ does not affect $t_j$'s execution at all. On the other hand, if $t_i$ is unfavorable to $x$, $t_j$ can execute independently most of the time, and the dependency $x$ is retained only to handle the rare situation when $t_i$ enters $st_i$. In both cases, neither the compensate nor the substitute approach results in more efficient execution. Therefore, we retain $x$ in its original form.

Consider once again the workflow in example 2. Suppose the execution statistics are known and are as follows: $t_3$ is PC and compensatable, $t_4$ is not compensatable, $t_1$ is PC and $t_2$ is PA. Figure 13 shows the redesigned version of the workflow in figure 3. For simplicity, we have omitted the cascaded compensation part in the figure.

Theorem 1 Let $W$ be a workflow, and let $W'$ be the workflow obtained by applying the algorithm in figure 12 to $W$. Then $W' \equiv W$. □
Figure 12: Algorithm for redesigning workflow dependencies (flow chart)
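The decisions described in section 5.3 can be condensed into a small decision function. This is a hedged sketch, not the authors' implementation: the category strings and parameter names are invented for the illustration, a CN-RD dependency would additionally go through the CF-RD branch as the text describes (not modeled here), and categories the text does not discuss raise an error.

```python
def favorable(trigger_is_abort: bool, presumed_commit: bool) -> bool:
    """Definition 19: (st_i != ab_i and PC) or (st_i == ab_i and PA)."""
    return presumed_commit != trigger_is_abort  # true when exactly one of the two holds


def redesign(category: str, trigger_is_abort: bool, presumed_commit: bool,
             tj_compensatable: bool) -> str:
    """Choose split / compensate / substitute / retain for one inter-site dependency."""
    if category in ("CN-RI", "CN-RD"):
        return "split"
    fav = favorable(trigger_is_abort, presumed_commit)
    if category == "strong non-abortive":
        return "compensate" if fav and tj_compensatable else "retain"
    if category == "weak abortive":
        if fav:
            return "substitute"  # replace x with its substitute x'
        return "compensate" if tj_compensatable else "retain"
    if category == "weak non-abortive":
        return "retain"
    raise ValueError(f"category not covered by this sketch: {category!r}")


# A "bc" dependency is strong non-abortive and triggered by commit (st_i = cm_i).
print(redesign("strong non-abortive", trigger_is_abort=False, presumed_commit=True, tj_compensatable=True))   # compensate
print(redesign("strong non-abortive", trigger_is_abort=False, presumed_commit=False, tj_compensatable=True))  # retain
```

For the single dependency of figure 11, this yields the compensate-task redesign when $t_1$ is presumed commit and keeps the original dependency when $t_1$ is presumed abort, matching figures 11(b) and 11(a).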
6 Related Work

In recent years, research in workflow management has received considerable attention [17, 9, 11, 4, 10, 19, 1, 2, 14]. We review below only the work related to the redesigning of workflows. Recently, Liu and Pu [15] have proposed an approach to restructuring complex workflow activities. As the workflow progresses with execution, new dependencies may crop up or existing dependencies may disappear. Their paper shows how this dynamism can be captured by providing users with a capability to define several restructuring methodologies, including activity-split and activity-join operations. Therefore, in their paper, the workflow is redesigned when new semantics are realized. In contrast, in our work, the workflow semantics remain intact and the dependencies are represented in an equivalent form in order to minimize inter-site communication cost. We provide automatic means to statically redesign a workflow, in advance, into one that is equivalent to the original workflow. Therefore our objective as well as our methodology differ from those of Liu and Pu [15]. Our approach in this paper is closer to the semantic redesign approach proposed by Atluri et al. [7] to execute workflows in a multilevel secure (MLS) environment. In an MLS workflow, tasks may
belong to different security levels. Enforcing task dependencies from tasks at higher security levels to those at lower security levels (high-to-low dependencies) may compromise security. Atluri et al. have proposed an execution model that redesigns the workflow in such a way that all high-to-low dependencies can be executed without compromising security. In this paper, we extend the approach of [7] to suit a distributed workflow environment and to minimize inter-site communications. Another novel aspect of our approach is the use of the presumed commit and presumed abort notions. Our work is also somewhat related to the research by Smith et al. [18] in the area of multilevel transaction processing in the MLS environment. Our split-task approach is similar to the cache scheme in [18]. However, the cache scheme does not use the presumed commit or presumed abort notions and therefore cannot minimize the inter-site communications. Moreover, the cache scheme of [18] can only be employed in the case of conflicting type dependencies.

Figure 13: The redesigned workflow in example 2
7 Conclusions

In a distributed workflow environment, the various tasks that constitute the workflow need to be executed by systems that are distributed and autonomous in nature. In such an environment, it is desirable to minimize the number of communications among the distributed sites and the number of tasks that must wait for tasks at other sites before executing. In this paper, we propose an approach that can automatically redesign a workflow in such a way that the interference among sites is minimized. Our approach is based on a semantic categorization of task dependencies and utilizes the notions of presumed commit and presumed abort. We show that the redesigned workflow is equivalent to the original workflow. We are currently implementing the redesign algorithm, based on a system architecture similar to that in [5], to conduct a performance analysis of our approach.
References

[1] Workflow and Process Automation in Information Systems: State-of-the-Art and Future Directions. http://www.cs.uga.edu/LSDIS/.
[2] Bulletin of IEEE Technical Committee on Data Engineering. Special Issue on Workflow and Extended Transaction Systems, 16(2), 1993.
[3] Nabil R. Adam, Vijayalakshmi Atluri, and Wei-Kuang Huang. Modeling and Analysis of Workflows Using Petri Nets. Journal of Intelligent Information Systems, 10(2), 1998.
[4] Mansoor Ansari, Linda Ness, Marek Rusinkiewicz, and Amit Sheth. Using flexible transactions to support multi-system telecommunication applications. In Proc. 18th Int’l. Conf. on Very Large Data Bases, pages 65–72, British Columbia, Canada, 1992.
[5] Vijayalakshmi Atluri, Wei-Kuang Huang, and Elisa Bertino. A semantic based execution model for multilevel secure workflows. Submitted for publication, January 1998.
[6] Vijayalakshmi Atluri, Wei-Kuang Huang, and Elisa Bertino. A semantic based redesigning of distributed workflows. Technical Report, CIMIC, September 1998.
[7] Vijayalakshmi Atluri, Wei-Kuang Huang, and Elisa Bertino. An Execution Model for Multilevel Secure Workflows. In Proc. of the 11th IFIP WG 11.3 Workshop on Database Security, August 1997.
[8] Vijayalakshmi Atluri, Wei-Kuang Huang, and Elisa Bertino. A Semantic Based Workflow Execution Model for a Heterogenous Distributed Environment. Technical Report TR-98-103, CIMIC, Rutgers University, June 1998.
[9] Paul C. Attie, Munindar P. Singh, Amit Sheth, and Marek Rusinkiewicz. Specifying and enforcing intertask dependencies. In Proc. 19th Int’l. Conf. on Very Large Data Bases, pages 134–145, Dublin, Ireland, 1993.
[10] A. Biliris, S. Dar, N. Gehani, H.V. Jagadish, and K. Ramamritham. ASSET: a system for supporting extended transactions. In Proc. ACM SIGMOD Int’l. Conf. on Management of Data, pages 44–54, Minneapolis, MN, May 1994.
[11] Y. Breitbart, A. Deacon, H.J. Schek, A. Sheth, and G. Weikum. Merging application-centric and data-centric approaches to support transaction-oriented multi-system workflows. SIGMOD Record, 22(3):23–30, 1993.
[12] P.K. Chrysanthis. ACTA, A framework for modeling and reasoning about extended transactions. PhD thesis, Department of Computer and Information Science, University of Massachusetts, Amherst, 1991.
[13] Ahmed K. Elmagarmid. Database Transaction Models for Advanced Applications. Morgan Kaufmann, San Mateo, California, 1992.
[14] Mohan Kamath and Krithi Ramamritham. Failure Handling and Coordinated Execution of Concurrent Workflows. In Proc. IEEE 14th Int’l. Conf. on Data Engineering, Orlando, Florida, February 1998.
[15] Ling Liu and Calton Pu. Methodical Restructuring of Complex Workflows. In Proc. IEEE 14th Int’l. Conf. on Data Engineering, Orlando, Florida, February 1998.
[16] C. Mohan, B. Lindsay, and R. Obermarck. Transaction management in the R* distributed database management system. ACM Transactions on Database Systems, 11(4):378–396, December 1986.
[17] Marek Rusinkiewicz and Amit Sheth. Specification and Execution of Transactional Workflows. In W. Kim, editor, Modern Database Systems: The Object Model, Interoperability, and Beyond. Addison-Wesley, 1994.
[18] K.P. Smith, B.T. Blaustein, S. Jajodia, and L. Notargiacomo. Correctness Criteria for Multilevel Secure Transactions. IEEE Transactions on Knowledge and Data Engineering, 8(1):32–45, February 1996.
[19] D. Wodtke and G. Weikum. A Formal Foundation For Distributed Workflow Execution Based on State Charts. In Proc. International Conference on Database Theory, Delphi, Greece, January 1997.