Distributed Global Transaction Support for Workflow Management

Distributed Global Transaction Support for Workflow Management Applications
Jochem Vonk, Paul Grefen, Erik Boertjes, Peter Apers
Center for Telematics and Information Technology (CTIT), University of Twente
{vonk, grefen, boertjes, apers}@cs.utwente.nl

Abstract

Workflow management systems require advanced transaction support to cope with their inherently long-running processes. The recent trend to distribute workflow executions requires an even more advanced transaction support system that is able to handle distribution. This report presents a model as well as an architecture to provide distributed advanced transaction support. Characteristic of the transaction support system is the ability to deal with arbitrary distribution of business processes over multiple workflow management systems and the support for flexible rollbacks. The modularity of the architecture and its orthogonality with respect to the workflow management system allow the transaction system to be applied in other application areas as well. The high scalability of the architecture allows an arbitrary combination of transaction support systems and workflow management systems, whose locations are irrelevant. In the WIDE project, the developed technology is applied to the FORO workflow management system.



The work presented in this report is supported by the European Commission in the WIDE Project (ESPRIT No. 20280). Partners in WIDE are Sema Group sae and Hospital General de Manresa in Spain, Politecnico di Milano in Italy, ING Bank and University of Twente in the Netherlands.


1. Introduction

As indicated by recent studies [Alon96, Alon97, Jajo97, Cich98], transaction support is a necessary functionality required by workflow management systems. However, the traditional ACID transaction paradigm is too strict for the inherently long-running processes of workflow management systems. Because of the characteristics of workflow management systems, a relaxed notion of atomicity and isolation is required [Hsu93, Gref97, Cich98].

Another functionality required for workflow management systems is the support for distributed workflow executions. Large organizations are usually divided into multiple business units, each operating their own workflow management systems. However, the overall business processes concerning the business goals of the complete organization are performed by some or all of the separate business units, i.e., the business processes are distributed over the multiple business units of the organization. Workflow management systems should therefore support the distributed execution of the workflow processes. This requires the transaction support system to cope with the distributed execution as well.

In the WIDE project, a transaction management system has been designed that consists of two layers [Gref97, Gref99]. The upper layer caters for the requirements of long-running processes by offering global transactions with relaxed ACID properties and is based on the saga model [Garc87]. As in the saga model, a global transaction is divided into steps whose actions are committed to the database after the step finishes. Rolling back already finished steps relies on compensating actions that undo the effects of completed actions. In the WIDE project, the saga model has been extended to support flexible rollbacks. The lower layer of the transaction model is based on the nested transaction model [Moss85], which provides the stricter traditional ACID properties required by the separate global transaction steps.
The lower layer is completely orthogonal to the upper layer and is thus not influenced by the extensions made to the upper layer. The lower layer is therefore not relevant in this report and the reader is referred to [Boer98] for more information. This report discusses the extension to the WIDE upper layer providing support for the distributed execution of workflow processes, i.e. distributed global transactions. The developed distributed transaction model offers different abort modes to allow for flexible rollbacks. It also offers a mechanism to handle concurrent aborts. The architecture of the non-distributed global transaction support system is extended with a communication protocol to coordinate the different systems. The architecture allows for an arbitrary distribution of transaction support systems and workflow management systems.

This report is structured as follows. Section 2 describes related work. Section 3 discusses distribution of workflow executions, illustrated by an example. The distributed global transaction support is described in Section 4, of which the architecture is presented in Section 5 together with implementation issues. Section 6 presents concluding remarks. Appendix A describes the architecture of the distributed global transaction support in more detail.


2. Related Work

Numerous advanced transaction models have been proposed in the past [Elma92, Jajo97, Cich98]. The aim of the WIDE project is not to propose yet another advanced transaction model, but to reuse what is already available. A combination of two models, each supporting different requirements of workflow management systems, is chosen as a basis and both are extended to provide more advanced features. This report discusses the extensions made to the upper layer of the WIDE transaction model with respect to distributed workflow executions and flexible rollbacks.

The transaction model used in the Exotica project [Alon96] is also based on the saga model, but relies on statically computed compensation patterns. The process structure supported by Exotica is more limited than the one supported in WIDE, e.g. cycles are not allowed. Consequently, its functionality is limited compared to the work presented in this report.

[Barb96] describes a way to support distributed workflow execution by using Information Carriers (INCAs). All information necessary (data, control flow, etc.) to execute a workflow is contained in INCAs, so there is no notion of a WFMS. INCAs are passed to the autonomous processing stations involved in the distributed execution. The transactional functionality offered depends on the transactional functionality offered by the processing stations. If a processing station provides the ACID properties, compensating steps are used to undo the completed steps. Which compensating steps need to be executed is specified by rules in the INCAs.

In the Mentor project [Wodt96], distributed workflow execution is addressed. A transaction processing monitor (Tuxedo) is used to provide failure tolerance and reliable message passing. Transactions in Mentor are more restrictive and comply with the strict ACID paradigm, while the model presented in this report allows for relaxed ACID properties.
The Workflow Management Coalition has specified a standard interface to facilitate the interoperability between different WFMSs [WfMC96] making distributed workflow execution possible, even in a heterogeneous WFMS environment. They do not address transactional issues, except for writing an audit log, and the work presented in this report can therefore be seen as complementary to the standard interoperability interface.


Commercial workflow products, like Cosa and Staffware, do not offer much transaction support [Cosa99, Staf99]. For example, Cosa treats every workflow activity as a separate ACID transaction.


3. Distributed Workflows

Large organizations are usually divided into multiple business units which have to work together to perform the business processes that involve the whole organization. The workflow management systems (WFMSs) in use by the separate business units need to interoperate to support the overall business processes, i.e., those business processes that are distributed throughout the organization. The distribution of a process over multiple business units can be seen as the delegation of parts of the process, called subprocesses, from one business unit to another business unit. For example, a billing department performs some actions (the subprocess) for the purchase department. A delegation model is therefore adopted for the distribution of processes over the different business units, and thus over the different WFMSs.

In the delegation model, WFMSs can delegate the execution of subprocesses to other WFMSs. This implies that the distributed workflow execution topology is tree-shaped, i.e., one WFMS can delegate to multiple other WFMSs, but a delegated subprocess cannot originate from more than one other WFMS, see Figure 1, in which the dotted lines represent site boundaries. Note that the workflow execution topology might be different from the WFMS topology. Although the execution topology is tree-shaped, the WFMS topology might not be, e.g. in the case where one WFMS executes multiple delegated subprocesses.

Figure 1: Distributed WFMSs topology

An example of a distributed workflow process specification is shown in Figure 2, which represents the workflow process “selling a holiday trip to a customer” of a large travel agency. The travel agency consists of several business units that are involved in the process: the customer buys the holiday trip at the sales office (left-hand side of the figure), the financial department handles the invoices, and the booking department arranges the travel documents and vouchers. All involved business units have to perform their own subprocess using their own WFMS. The subprocesses together form the complete “selling trip” process.

Figure 2: Distributed workflow process

In the figure, each distributed subprocess is represented by a global transaction, which is a graph consisting of circles and solid arrows. The circles represent actions, called global transaction steps, and the solid arrows indicate the order in which the global transaction steps can be performed. The circle marked with ‘SP’ is a global savepoint, which is explained in the next section. The dotted arrows indicate the delegation of a subprocess to another WFMS.

Figure 3: Distributed WFMSs and global transaction


As each separate subprocess is a global transaction, executed by a certain WFMS, it can be supported at each site by the non-distributed global transaction support (GTS) system described in Section 1, see Figure 3. The connection of the subprocesses, represented by the dotted arrows, implies the order in which the subprocesses are executed. The connected global transactions can therefore be seen as one large distributed global transaction. Only the distribution aspect requires an extension to the global transaction support system to handle the delegation of subprocesses. This extension is described in the next section.


4. Distributed Global Transaction Support

As described before, the upper layer of the transaction model developed in the WIDE project relies on compensating actions to semantically undo the effects of completed global transaction steps in case of failures. Compensating an entire workflow, thereby undoing all the work that has been done so far, is called a complete abort. A complete abort is usually too strict, and undoing only part of the process would suffice. For this purpose the global savepoint concept is introduced to allow for flexible, partial aborts. When a partial abort is initiated because of some failure in the workflow execution, the executed workflow is only rolled back (compensated) to one or more savepoint(s), thereby avoiding rolling back the entire workflow execution [Gref98]. The part of a workflow that needs to be rolled back is dynamically calculated by the global transaction support.

In the distributed workflow execution scenario, as presented in Figure 3, every WFMS is supported by a global transaction system, which can handle aborts of global transactions for that specific site. This means that the extension to the global transaction system to cope with distribution does not change the algorithms to calculate a compensating global transaction for one site, i.e. in a non-distributed scenario. In the distributed scenario, however, an additional communication protocol is required that decides which other sites need to abort as well and in which abort mode (partial or complete abort). It then signals those other sites to actually start an abort in the correct abort mode.
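The difference between a complete and a partial abort at a single site can be illustrated with a small sketch. It is not the algorithm of [Gref98]: it assumes, purely for illustration, that the completed steps form a linear sequence and that a partial abort rolls back to the most recent savepoint only. The names `Step` and `rollback_plan` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """A completed global transaction step (hypothetical structure)."""
    name: str
    is_savepoint: bool = False


def rollback_plan(completed: List[Step], mode: str) -> Tuple[List[str], bool]:
    """Return (steps to compensate, most recent first; savepoint reached?).

    Assumption: steps are a linear sequence, which is a simplification of
    the process graphs handled by the real system.
    """
    if mode not in ("partial", "complete"):
        raise ValueError(f"unknown abort mode: {mode}")
    to_undo: List[str] = []
    for step in reversed(completed):
        if mode == "partial" and step.is_savepoint:
            # A partial abort stops at the savepoint, which is kept.
            return to_undo, True
        to_undo.append(step.name)
    # Either a complete abort, or a partial abort without any savepoint.
    return to_undo, False


steps = [Step("register"), Step("check", is_savepoint=True),
         Step("invoice"), Step("book")]
partial_plan = rollback_plan(steps, "partial")
complete_plan = rollback_plan(steps, "complete")
```

When the second element of the result is `False` for a partial abort, no savepoint exists at the site, which is the situation in which other sites have to become involved.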

Figure 4: Backward and forward abort

4.1. Backward and Forward Aborts

To illustrate the way aborts are propagated through the workflow execution, an example is presented in Figure 4. In the figure, the black circle at site 5 is the workflow activity that fails, thereby causing an abort. Suppose the abort request indicates that the abort should be performed in partial abort mode. The transaction system at site 5 finds no savepoint at that site to roll back to. It signals the parent site to abort, called backward abort, in partial abort mode. The subprocess aborted at the parent site can roll back to a savepoint, so no more backward aborts are necessary. However, since the part of the subprocess that is rolled back has delegated a subprocess to site 4, that subprocess must be aborted as well. The transaction system will therefore signal the child site to abort, called forward abort. A forward abort implies that the site receiving the forward abort request must abort in complete abort mode, as all the actions done at that site are dependent on an action that has been rolled back and are therefore invalid.

In summary, an abort involves multiple sites if the following situations occur:

• A complete abort request is issued by a WFMS that is executing a delegated subprocess; at least one backward abort is issued.

• A complete abort request is issued by a WFMS that has delegated a subprocess; at least one forward abort is issued.

• A partial abort request is issued by a WFMS that has delegated a subprocess and for which the global transaction step that delegated a subprocess has to be compensated; at least one forward abort is issued.

• A partial abort request is issued by a WFMS for which there is no savepoint in the subprocess to roll back to on that site, see example above; at least one backward abort is issued.

If the global transaction support system has decided that other sites need to abort as well, the communication protocol will signal those sites to abort in the abort mode that depends on the following two rules:

1. The parent site: backward abort in the abort mode that was issued by the failing site.

2. The child site: forward abort in complete abort mode.
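The two rules can be sketched as a small propagation routine over a tree of sites. This is an illustrative sketch only: the `Site` structure, the field `rolled_back_children` (children delegated from the part that a partial abort rolls back) and the example topology are assumptions made here, not part of the WIDE design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Site:
    """Hypothetical description of one site in the delegation tree."""
    name: str
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)
    has_savepoint: bool = False
    # Children delegated from steps lying after the savepoint.
    rolled_back_children: List[str] = field(default_factory=list)


def propagate(sites: Dict[str, Site], failing: str,
              mode: str) -> List[Tuple[str, str, str]]:
    """Return the (target site, direction, abort mode) signals that follow."""
    signals: List[Tuple[str, str, str]] = []
    pending = [(failing, mode, None)]  # (site, abort mode, sender)
    while pending:
        name, site_mode, sender = pending.pop(0)
        site = sites[name]
        if site_mode == "partial" and site.has_savepoint:
            affected, notify_parent = site.rolled_back_children, False
        else:
            affected, notify_parent = site.children, site.parent is not None
        for child in affected:
            if child != sender:
                # Rule 2: a forward abort is always in complete abort mode.
                signals.append((child, "forward", "complete"))
                pending.append((child, "complete", name))
        if notify_parent and site.parent != sender:
            # Rule 1: a backward abort keeps the mode of the failing site.
            signals.append((site.parent, "backward", site_mode))
            pending.append((site.parent, site_mode, name))
    return signals


# Assumed topology, loosely following the description of Figure 4.
sites = {
    "site1": Site("site1", children=["site2", "site3"]),
    "site2": Site("site2", parent="site1"),
    "site3": Site("site3", parent="site1", children=["site4", "site5"],
                  has_savepoint=True,
                  rolled_back_children=["site4", "site5"]),
    "site4": Site("site4", parent="site3"),
    "site5": Site("site5", parent="site3"),
}
partial_signals = propagate(sites, "site5", "partial")
complete_signals = propagate(sites, "site5", "complete")
```

For the partial abort at site 5, the sketch yields one backward abort to site 3 and one forward abort to site 4, which matches the example in the text.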


As each WFMS might independently decide that it needs to abort, concurrent aborts can occur. If those aborts trigger forward and/or backward aborts, the concurrent aborts might come together at one site, causing an abort conflict. The way this is handled is described in the next section.

4.2. Concurrent Aborts

Inherent to distributed workflow execution is the possibility that different subprocesses concurrently issue an abort request. Because of the backward and forward aborts, which might follow an abort request, the occurrence of concurrent aborts might lead to abort conflicts. Abort conflicts occur when one site must handle different abort requests, e.g. a subprocess at one site receives a partial abort request from one child site, while another child site requests a complete abort. The way in which the abort conflicts are resolved depends on the state in which a subprocess resides at the time an abort request is received. Figure 5 shows the state diagram of a subprocess. A subprocess can reside in the following states:

• Executing, i.e. the subprocess is being executed normally by the WFMS,

• On Hold, i.e. the subprocess is stopped while a compensating process is being created by the transaction system. This state is not shown in the diagram to keep it readable, as the subprocess is always put on hold whenever an abort request is issued and a compensating global transaction is being calculated,

• Compensating partially, i.e. the WFMS is executing a compensation process that has been created by the transaction system based on a partial abort request from the failing subprocess,

• Compensating completely, i.e. the WFMS is executing a compensation process that has been created by the transaction system based on a complete abort request from the failing subprocess, and

• Finished, i.e. the subprocess has been completely executed.


Figure 5: State diagram for concurrent abort requests (states shown: Executing, Compensating Partially, Compensating Completely, Finished; in the transition labels, xyz denotes a call and [xyz] denotes a condition)

A backward or forward abort request can be received while the subprocess resides in any of the five states. The reaction to the backward or forward abort request can be seen in the state diagram: if a subprocess is in the executing or finished state, no abort conflict can occur and an abort request is handled as described in the previous subsection. However, abort conflicts can occur when the subprocess is in one of the other three states and are handled in the following way, see also Figure 5:

• If the subprocess is in the on hold state and thus a compensating global transaction is being created, the following table shows the occurring conflicts and the reaction of the distributed global transaction support:

| | Receiving abort request in partial mode | Receiving abort request in complete mode |
|---|---|---|
| Creation of compensation in partial abort mode | Finish creation and execution of compensation, then handle the received abort request | Cancel creation, then create and execute the compensation in complete mode |
| Creation of compensation in complete abort mode | Ignore the received abort request | Ignore the received abort request |

• If the subprocess is executing a compensating global transaction, i.e. in the compensating partially or compensating completely states, the received abort request is “remembered” and handled after the compensation process has been executed, i.e. concurrent aborts will be handled one after the other.
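The conflict handling described above amounts to a lookup on the current state of the subprocess and the mode of the incoming request. The following sketch encodes the table and the two bullet points; the enumeration names are hypothetical, and the on hold state is split in two here only to record which compensation is being created.

```python
from enum import Enum


class State(Enum):
    EXECUTING = "executing"
    ON_HOLD_PARTIAL = "on hold, creating partial compensation"
    ON_HOLD_COMPLETE = "on hold, creating complete compensation"
    COMPENSATING_PARTIALLY = "compensating partially"
    COMPENSATING_COMPLETELY = "compensating completely"
    FINISHED = "finished"


class Reaction(Enum):
    HANDLE_NOW = "handle as a normal abort request"
    FINISH_THEN_HANDLE = "finish current compensation, then handle request"
    RESTART_IN_COMPLETE_MODE = "cancel creation, compensate in complete mode"
    IGNORE = "ignore the received request"
    REMEMBER = "handle after the running compensation has been executed"


def react(state: State, request_mode: str) -> Reaction:
    """Return the reaction to an abort request received in the given state."""
    if request_mode not in ("partial", "complete"):
        raise ValueError(f"unknown abort mode: {request_mode}")
    if state in (State.EXECUTING, State.FINISHED):
        return Reaction.HANDLE_NOW  # no abort conflict can occur
    if state is State.ON_HOLD_PARTIAL:
        if request_mode == "partial":
            return Reaction.FINISH_THEN_HANDLE
        return Reaction.RESTART_IN_COMPLETE_MODE
    if state is State.ON_HOLD_COMPLETE:
        return Reaction.IGNORE
    # Compensating partially or completely: requests are serialized.
    return Reaction.REMEMBER
```

Keeping the decision in one pure function makes it straightforward to check every cell of the table independently of any communication code.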

An example of two concurrent abort requests is shown in Figure 6, in which the failing action on site 4 requests a complete abort and the failing action on site 5 requests a partial abort.

Figure 6: Concurrent abort requests

Both failing sites will issue a backward abort request to site 3. Suppose the partial backward abort request is handled first. During the creation of the partial compensation request, a forward abort is issued to site 4. Site 4 has already handled a complete abort request so no additional abort is necessary at site 4. When the backward complete abort request from site 4 is received by site 3, which is executing the partial compensation request, it first finishes the partial compensation (leaving only the savepoint as the rest of the actions are compensated) after which it handles the complete abort request. In this case no forward abort request is issued as site 5 has already been compensated. An architecture has been designed to support the discussed distributed transaction model and is described in the next section.


5. Architecture of the Distributed Global Transaction Support

The architecture of the distributed global transaction support (DGTS) does not differ much from that of the non-distributed global transaction support (GTS). As described before, the necessary communication protocol needs to be added. A non-distributed GTS consists of two parts: a global transaction manager (GTM) and global transaction (GT) objects [Gref97]. Each global transaction, i.e. workflow execution, is represented by one GT object. GT objects are dynamic objects that are created when a global transaction is started and destroyed when a global transaction finishes. They keep track of workflow executions by recording events like the start and end of global transaction steps.

5.1. Distributed Global Transaction Manager

The distributed global transaction manager (DGTM) is composed of multiple modules. The most important module with respect to the transactional support for distributed workflow executions is the communication module. It is responsible for the coordination of the different sites in case of an abort that involves multiple sites, i.e. backward and/or forward aborts as described in the previous section.

The WFMSs involved in a distributed workflow execution have a communication mechanism that allows them to interoperate, e.g. using the interface 4 specification of the Workflow Management Coalition [WfMC96]. This communication mechanism could be used by the DGTS to signal forward and backward aborts to other sites, but that requires the WFMS to handle some of the transactional functionality, reducing the orthogonality between the DGTS and the WFMS. The architectural choice has therefore been made to have a separate and direct communication mechanism between GTS systems. This makes the DGTS and the WFMS as orthogonal as possible, thereby shielding the WFMS from transactional issues and increasing the portability of the transactional support system. The complete architecture with the communication infrastructure is shown in Figure 7. In the figure, the solid arrows represent the communication between the WFMSs and the dotted arrows represent the communication between the different GTS systems.


The logic module of the DGTM contains the algorithms that are used to calculate compensating global transactions for one site. A formal specification of those algorithms, i.e. for non-distributed workflow executions, can be found in [Gref98].

Figure 7: Distributed WFMS and GTS architecture

The other modules of the DGTM contain mechanisms to store and retrieve global transactions and to communicate with the WFMS, either to receive abort requests originating from the WFMS or to pass the compensating global transaction to the WFMS so it can be executed, thereby undoing (part of) the workflow execution it relates to. A more detailed description of the architecture mentioned above is presented in Appendix A.

5.2. Extended Architectural Considerations

The workload of the DGTS depends on the number of failures that occur in workflow executions. If few failures occur, it is not necessary to have a transaction system on every site where a WFMS is running, as this only wastes resources. In case of many failures, it is more efficient if the WFMS can use more than one transaction support system. Both requirements can be satisfied by making it possible for any WFMS to use any GTS system that is available, i.e., one WFMS may use multiple GTS systems and one GTS system can be used by multiple WFMSs.


This can be realized using a middleware system such as a CORBA [OMG95] compliant object request broker. In this case it is possible to arbitrarily distribute the GTS systems over multiple sites. The GTS systems can then be called by any of the WFMSs, which may or may not be on the same site. This way a very flexible and adaptable architecture is offered, of which an example is shown in Figure 8. Whenever more or fewer GTS systems are required, a new GTS system can be instantiated or an existing one can be shut down. For the non-distributed global transaction support system, the arbitrary distribution of GTS systems has already been realized in the WIDE project.

Figure 8: Arbitrary distribution of multiple GTS systems


6. Conclusions and Future Work

This report describes a model and architecture to provide transactional support for distributed workflow executions. The transaction model relies on compensating actions to undo the effects of completed actions. Specifying savepoints in the workflow process allows a workflow to be compensated partially. The WIDE global transaction support system is extended with a communication protocol to provide support for distributing workflow executions over multiple sites. The communication protocol coordinates the calculation of the sites that are involved in an abort, the abort mode in which the compensation for those sites must be performed, and the enactment of the distributed global transaction support system at those sites.

In the WIDE project, the non-distributed version of the global transaction support system has been implemented in a prototype environment to support the commercial workflow management system FORO, marketed by SEMA Group [FORO99]. The environment is based on ILU [Ilu99, Ilu96], which is a CORBA compliant object request broker, and the Oracle database management system [Orac99]. The realization of the extensions to support the distributed execution of workflows as described in this report is currently being considered and can be achieved with little effort. Because of the modularity of the global transaction support system and the orthogonality to the workflow management system, only the communication module needs to be specified and implemented, and it can be easily integrated into the global transaction support system.

The transaction model and architecture are orthogonal to the workflow management system and can therefore be easily applied to other application areas dealing with long-running processes. The work presented in this report is currently under consideration to be used in the CrossFlow project [Cros99].


References

[Alon96] G. Alonso, D. Agrawal et al.; Advanced Transaction Models in Workflow Contexts; Procs. Int. Conf. on Data Engineering, 1996.

[Alon97] G. Alonso, D. Agrawal, A. El Abbadi, C. Mohan; Functionality and Limitations of Current Workflow Management Systems; IEEE Expert; Vol. 12, No. 5; 1997.

[Barb96] D. Barbará, S. Mehrotra, M. Rusinkiewicz; INCAs: Managing Dynamic Workflows in Distributed Environments; Journal of Database Management – Special Issue on Multidatabases 7(1): 5-15, 1996.

[Boer98] E. Boertjes, P. Grefen, J. Vonk, P. Apers; An Architecture for Nested Transaction Support on Standard Database Systems; Procs. 9th Int. Conf. on Database and Expert System Application (DEXA); Vienna, Austria, 1998.

[Cich98] A. Cichocki, A. Helal, M. Rusinkiewicz, D. Woelk; Workflow and Process Automation: Concepts and Technology; Kluwer Academic Publishers, 1998.

[Cosa99] COSA Solutions GmbH; Cosa Solutions: Turning Workflow into Cashflow; http://www.cosa.de.

[Cros99] The CrossFlow Project Web Site: http://www.crossflow.org/

[Elma92] A. Elmagarmid, ed.; Database Transaction Models for Advanced Applications; Morgan Kaufmann; USA, 1992.

[FORO99] SEMA Group sae; FORO Web Site: http://dis.sema.es/projects/FORO.

[Garc87] H. Garcia-Molina, K. Salem; Sagas; Procs. ACM-SIGMOD, California, 1987.

[Gref97] P. Grefen, J. Vonk, E. Boertjes, P. Apers; Two-Layer Transaction Management for Workflow Management Applications; Procs. 8th Int. Conf. on Database and Expert System Application (DEXA); Toulouse, France, 1997.

[Gref98] P. Grefen, J. Vonk, E. Boertjes, P. Apers; Global Transaction Support: Formal Specification and Practical Application; CTIT Technical Report 98-10; University of Twente, 1998. Submitted for publication.

[Gref99] P. Grefen, B. Pernici, G. Sánchez (Eds.); Database Support for Workflow Management – The WIDE Project; Kluwer Academic Publishers, 1999. ISBN 0-7923-8414-8.

[Hsu93] M. Hsu (Ed.); Special Issue on Workflow and Extended Transaction Systems; IEEE Data Engineering Bulletin Vol. 16, No. 2, June 1993.

[Ilu96] B. Janssen, M. Spreitzer; ILU 2.0alpha8 Reference Manual; Xerox Corporation, 1996.

[Ilu99] ILU Web Site: ftp://ftp.parc.xerox.com/pub/ilu/ilu.html.

[Jajo97] S. Jajodia, L. Kerschberg (Eds.); Advanced Transaction Models and Architectures; Kluwer Academic Publishers, 1997.

[Moss85] J. E. B. Moss; Nested Transactions: An Approach to Reliable Distributed Computing; The MIT Press, Cambridge, Massachusetts, 1985.

[OMG95] Object Management Group; The Common Object Request Broker: Architecture and Specification, Version 2.0; 1995.

[Orac99] Oracle Corporation; Oracle Web Site: http://www.oracle.com.

[Staf99] Staffware Web Site: http://www.staffware.com.

[WfMC96] The Workflow Management Coalition; Interoperability Abstract Specification; Document Number WFMC-TC-1012; 1996.

[Wodt96] D. Wodtke, J. Weissenfels, G. Weikum, A. Dittrich; The MENTOR Project: Steps Towards Enterprise-wide Workflow Management; Proc. Int. Conf. on Data Engineering, New Orleans, USA, 1996.


A Detailed DGTS Architecture

The non-distributed Global Transaction Support (GTS) consists of two main parts, i.e. the Global Transaction Manager (GTM) and the Global Transaction (GT) objects, see Figure 9.

Figure 9: GTS - WFMS architecture

The GTM is responsible for the handling of compensation requests and the construction of compensating global transactions. The GT objects perform the administration of running global transactions.

A.1 Global Transaction Manager Architecture

The GTM architecture is depicted in Figure 10. Conforming to the standard software module architecture used in the WIDE project, it consists of the following submodules:

Communication Interface: the communication interface submodule contains methods for two-way communication with Workflow Instances. The methods of the communication module have the character of wrapper methods to ensure independence between GTM internals and GTM context.

Control Logic: the control logic submodule contains the internal GTM process control driving the module. It is responsible for maintaining the internal state of GTM. It links and controls the other submodules.

Kernel Logic: the kernel submodule contains the actual “intelligence” of GTM, i.e. the algorithms for constructing compensating global transactions [Gref98]. It is completely independent from internal states and external communication.

Data Interface: the data interface submodule contains methods for global transaction data retrieval and data storage from the shared workflow database through the global transaction objects. It takes care of necessary translation and assembly/disassembly of information.

Figure 10: Internal GTM architecture

The GTM as a whole is ‘wrapped’ in CORBA/IDL [OMG95], i.e. the external methods of the communication interface and the data access of the data interface take place through methods specified in IDL and routed through an object request broker service [Ilu99, Ilu96], thereby providing location transparency.

A.2 Global Transaction objects

The Global Transaction (GT) objects handle the administration of workflow executions. They have a one-to-one relation to workflow instances, i.e. every workflow instance has one GT object administrating its execution and vice versa. The GT objects place the workflow execution related information in persistent storage, e.g. the start and end of global transaction steps, the order in which global transaction steps are executed, and whether a global transaction step is a savepoint or not. In effect, the GT objects create a graph, like the graph shown in Figure 2, that represents the workflow execution and is used by the GTM to construct compensating global transactions.

When the GTM has constructed a compensating global transaction, it passes it to the GT object relating to the workflow instance for which the compensating global transaction is constructed. The GT object transforms the compensating global transaction into a workflow process specification and places that in persistent storage so it can be executed by the workflow management system.
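The bookkeeping role of a GT object can be sketched as follows. This is a minimal illustration under stated assumptions: persistent storage is replaced by in-memory fields, and the class and method names (`GTObject`, `step_started`, `step_ended`) are hypothetical rather than taken from the WIDE implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class GTObject:
    """Records the execution of one workflow instance as a graph."""
    workflow_instance: str
    events: List[Tuple[str, str]] = field(default_factory=list)
    successors: Dict[str, List[str]] = field(default_factory=dict)
    savepoints: Set[str] = field(default_factory=set)

    def step_started(self, step: str, predecessors: Iterable[str] = (),
                     savepoint: bool = False) -> None:
        """Record the start of a step and its position in the graph."""
        self.events.append(("start", step))
        self.successors.setdefault(step, [])
        for pred in predecessors:
            self.successors.setdefault(pred, []).append(step)
        if savepoint:
            self.savepoints.add(step)

    def step_ended(self, step: str) -> None:
        """Record that a step has finished (its actions are committed)."""
        self.events.append(("end", step))

    def completed_steps(self) -> List[str]:
        """Steps that have ended, in the order in which they ended."""
        return [step for kind, step in self.events if kind == "end"]


gt = GTObject("selling-trip-1")
gt.step_started("register")
gt.step_ended("register")
gt.step_started("check", predecessors=["register"], savepoint=True)
gt.step_ended("check")
gt.step_started("invoice", predecessors=["check"])
```

The recorded graph and savepoint set are the kind of input a GTM would need to construct a compensating global transaction.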

A.3 Distributed Communication Design Issues

The distributed execution of workflow processes as described in this report requires an extension to the communication interface of the GTM (see Figure 10 and Chapter 5). The communication interface extensions deal with global abort requests originating from a different site. This section discusses between which modules the communication can take place and explains why we have chosen a specific communication path.

A.3.1 General Communication Options

The modules involved in a distributed global abort are the workflow instances (WFI), the GTMs and the GT objects. Since each of these three module types could in principle call each of the three, a total of nine communication combinations are possible. The GT objects are only involved in passing information from the workflow instances to the database, thereby storing workflow execution information, and in retrieving and storing graphs. The GT objects themselves cannot decide whether a remote global abort invocation should be issued, which rules out the three combinations in which a GT object is the caller and reduces the possible communication combinations to six, shown in Figure 11. To explain the options for the communication, suppose that WFI i has created WFI j, i.e., the workflow engine at site i has delegated a subprocess to the workflow engine at site j (see Figure 11). When WFI i fails, it issues a local global abort request to GTM i. GTM i computes a compensation graph and finds that WFI j also needs to abort. Therefore a global abort needs to be requested at site j, originating from site i. This can be done in the following ways (numbers below correspond to the numbers on the arrows in Figure 11):

Figure 11: Communication options (modules WFI i, GT i, GTM i and WFI j, GT j, GTM j; arrows numbered 1 to 6 mark the candidate communication paths)

1. WFI to WFI. After the DGTS has constructed a compensating global transaction, it returns this compensating workflow to the WFI together with a list of WFIs that also need to be aborted. The WFI can then signal those WFIs to abort.

2. WFI to GT. Same as above, except that the WFI does not signal the WFIs but the corresponding GT objects. The called GT object should then signal the WFI to abort.

3. WFI to GTM. Same as above, except that the WFI signals neither the WFIs nor the GT objects but the other GTM(s), or itself if there is no other GTM. The called GTM should then signal the WFI to abort.

4. GTM to WFI. When a GTM, while constructing a compensating global transaction, computes that another WFI also needs to be aborted, the GTM immediately signals that WFI to abort.

5. GTM to GT. Same as above. However, instead of signaling another WFI to abort, the GTM signals the GT object corresponding to that case to abort. The GT object then signals the WFI to abort.

6. GTM to GTM. Same as above, except that the GTM signals neither the WFI nor the GT object at the other site but the GTM at that other site. The called GTM should then signal the WFI to abort.

The differences, advantages and disadvantages of the communication options are explained in the next subsections.
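The counting argument behind the six options can be checked with a short enumeration. This is only an illustration; the module labels are taken from the text, while the variable names are chosen here.

```python
from itertools import product

# The three module types involved in a distributed global abort.
MODULES = ("WFI", "GT", "GTM")

# Every module type calling every module type: 3 x 3 = 9 combinations.
all_pairs = list(product(MODULES, repeat=2))

# GT objects cannot decide to issue a remote global abort, so every
# combination with a GT object as the caller is dropped.
viable = [(caller, callee) for caller, callee in all_pairs if caller != "GT"]

# Numbering the remaining pairs in order reproduces options 1 to 6.
options = dict(enumerate(viable, start=1))

for number, (caller, callee) in options.items():
    print(f"{number}. {caller} to {callee}")
```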


A.3.2 Communication from WFI to other Modules

The first three communication options involve the WFI in sending out abort requests to other sites. This means that the WFI performs some transaction management tasks, namely the distribution aspect, which should be handled by the DGTS. It also means that the orthogonality between the workflow system and the transaction management system is reduced significantly. Options 1, 2, and 3 are therefore discarded as viable communication paths for the distributed global transaction support system.

Figure 12: Communication: GTM -> WFI vs. GTM -> GTM (sites i, j and k; client layer: WFIs; transaction server layer: GTMs; arrows: abort)

A.3.3 Communication from GTM to other Modules

Communication option 4 is shown on the left-hand side of Figure 12, while the right-hand side shows communication option 6. In this figure, GTM j is computing a compensation graph and has computed that the other sites also need to abort. Suppose the GTM calls another GTM (option 6), preferably the GTM on the site where the case that has to be aborted is located. If there is no GTM on that site, however, the GTM itself might signal the case to abort, which turns the right-hand side of Figure 12 into the left-hand side, i.e., option 4. The amount of communication between the client layer, i.e., the WFIs, and the transaction server layer, i.e., the GTMs, is the same for both options, but the communication between sites is increased for option 6, which is therefore discarded.

To choose between the remaining two options, i.e., options 4 and 5, we have to look at the actions that the WFI should take when it receives an abort request originating from another site. These actions depend on the state the WFI resides in, e.g., when a WFI is compensating in complete mode and a request for a partial compensation is received, that request can be ignored (see also Chapter 4). To keep the workflow management system and the DGTS as orthogonal as possible, the decision about what action a WFI should take should be made by the DGTS and not by the WFI itself. The DGTS should therefore know the state of the WFI, which it can request from the WFI (option 5) or which can be passed by the WFI (option 4), see Figure 13.
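The state-dependent decision can be sketched as a small function that lives on the DGTS side. Only the first rule (ignore a partial request while compensating in complete mode) is taken from the text; the state names, the action names and the remaining rules are assumptions made for illustration, since the full rule set is given in Chapter 4.

```python
from enum import Enum


class WfiState(Enum):
    RUNNING = "running"
    COMPENSATING_PARTIAL = "compensating_partial"
    COMPENSATING_COMPLETE = "compensating_complete"


class AbortMode(Enum):
    PARTIAL = "partial"
    COMPLETE = "complete"


class Action(Enum):
    IGNORE = "ignore"
    COMPENSATE_PARTIAL = "compensate_partial"
    COMPENSATE_COMPLETE = "compensate_complete"


def decide_action(state: WfiState, request: AbortMode) -> Action:
    """Decide, on behalf of the DGTS, what a WFI should do with an abort request."""
    # Rule stated in the text: a partial request is ignored while the
    # WFI is already compensating in complete mode.
    if state is WfiState.COMPENSATING_COMPLETE and request is AbortMode.PARTIAL:
        return Action.IGNORE
    # Assumption: a second complete request adds nothing either.
    if state is WfiState.COMPENSATING_COMPLETE:
        return Action.IGNORE
    # Assumption: in all other states the requested compensation is started.
    if request is AbortMode.COMPLETE:
        return Action.COMPENSATE_COMPLETE
    return Action.COMPENSATE_PARTIAL
```

Keeping this function outside the WFI reflects the orthogonality argument: the workflow instance only reports its state, and the transaction support decides.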

Figure 13: Communication: GTM -> WFI vs. GTM -> GT object (sites i, j and k; layers: client, transaction server; modules: WFIs, GT objects, GTMs, DGTS; arrows: request status, pass status, abort)

If the global abort request by a GTM is made directly to a WFI at another site (option 4), the WFI simply passes the abort request and its status to an appropriate DGTS, which can then decide what action the WFI should take. In option 5 the WFI is only asked for its status and does not even see the global abort request, which provides more orthogonality between the WFIs and the DGTS. Comparing the left-hand side of Figure 13 (option 5) with the right-hand side (option 4), it is also clear that the transactional distribution aspect is kept inside the global transaction support if the GTM communicates with the GT objects. Option 5, in which the GTM communicates with the GT objects to issue distributed global abort requests, is therefore the preferred and chosen option.


A.3.4 Other Communication Issues

When a GT object is called by a GTM from another site, the GT object requests the status of the WFI that is involved in the distributed compensation. Based on the received state, the GT object then decides which action the WFI has to perform and signals this action to the WFI. Because the state of the WFI can change while the GT object is deciding which action the WFI should perform, the signaled action might no longer be consistent with the state. This problem can be overcome by using a synchronization protocol.

As stated in Chapter 6, implementing the extensions to the already implemented non-distributed GTS requires little effort. A specification of the extensions to the GTS as described in this report has been completed; its implementation is currently under consideration.
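The report does not specify which synchronization protocol is meant. One possible approach, sketched below purely as an assumption, is an optimistic check: the WFI hands out a version number together with its state and accepts a signaled action only if that version is still current, so the GT object retries with a fresh state otherwise. All names (`WorkflowInstance`, `signal_with_retry`, the state and action strings) are hypothetical, and a single process with a lock stands in for the distributed setting.

```python
import threading


class WorkflowInstance:
    """Hypothetical WFI that versions its state for optimistic checks."""

    def __init__(self, state="running"):
        self._lock = threading.Lock()
        self._state = state
        self._version = 0
        self.applied = []  # actions accepted so far

    def status(self):
        # Return the state together with the version it belongs to.
        with self._lock:
            return self._state, self._version

    def change_state(self, new_state):
        # Any state change invalidates decisions based on older versions.
        with self._lock:
            self._state = new_state
            self._version += 1

    def apply_action(self, action, expected_version):
        # Accept the action only if the state has not changed since it was read.
        with self._lock:
            if self._version != expected_version:
                return False
            self.applied.append(action)
            return True


def signal_with_retry(wfi, decide, max_attempts=3):
    """GT object side: read status, decide, signal; retry on a stale decision."""
    for _ in range(max_attempts):
        state, version = wfi.status()
        action = decide(state)
        if wfi.apply_action(action, version):
            return action
    return None  # gave up: the state kept changing
```

A pessimistic alternative would lock the WFI state for the duration of the decision; the optimistic variant is shown only because it keeps the WFI free to proceed while the GT object decides.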
