Communicating Process Architectures 2007 Alistair McEwan, Steve Schneider, Wilson Ifill and Peter Welch (Eds.) IOS Press, 2007 © 2007 The authors and IOS Press. All rights reserved.
Concurrency Control and Recovery Management for Open e-Business Transactions
Amir R. RAZAVI, Sotiris K. MOSCHOYIANNIS and Paul J. KRAUSE
Department of Computing, School of Electronics and Physical Sciences, University of Surrey, Guildford, Surrey, GU2 7XH, UK. {a.razavi, s.moschoyiannis, p.krause}@surrey.ac.uk
Abstract. Concurrency control mechanisms such as turn-taking, locking, serialization, transactional locking mechanism, and operational transformation try to provide data consistency when concurrent activities are permitted in a reactive system. Locks are typically used in transactional models for assurance of data consistency and integrity in a concurrent environment. In addition, recovery management is used to preserve atomicity and durability in transaction models. Unfortunately, conventional lock mechanisms severely (and intentionally) limit concurrency in a transactional environment. Such lock mechanisms also limit recovery capabilities. Finally, existing recovery mechanisms themselves impose a considerable overhead on concurrency. This paper describes a new transaction model that supports release of early results inside and outside of a transaction, decreasing the severe limitations of conventional lock mechanisms, yet still guarantees consistency and recoverability of released resources (results). This is achieved through use of a more flexible locking mechanism and by using two types of consistency graph. This provides an integrated solution for transaction management, recovery management and concurrency control. We argue that these are necessary features for management of long-term transactions within “digital ecosystems” of small to medium enterprises.
Keywords. concurrency control, recovery management, lock mechanism, compensation, long-term transactions, service-oriented architecture, consistency, recoverability, partial results, data dependency, conditional-commit, local coordination, business transactions.
Introduction
This paper focuses on support for long-term transactions involving collaborations of small enterprises within a Digital Business Ecosystem [1]. Although there is significant current work on support for business transactions, we argue that it all relies on central coordination that provides unnecessary (and possibly threatening) governance over a community of collaborating enterprises. To address this, we offer an alternative transaction model that respects the local autonomy of the participants. This paper focuses on the basic transactional model in order to highlight the concurrency issues that are inherent in these kinds of reactive systems. Formal analysis of this model is in hand, and first results are reported in [2]. The conventional definition of a transaction [3] is based on the ACID properties: Atomicity – either all tasks in a transaction are performed, or none of them are; Consistency – data is in a
A.R. Razavi et al. / Concurrency Control and Recovery for Open e-Business
consistent state when the transaction begins, and when it ends; Isolation – all operations in a transaction are isolated from operations outside the transaction; Durability – upon successful completion, the result of the transaction will persist. Several concurrency control mechanisms are available for maintaining consistency of data items such as: turn-taking [4], locking [5], serialization [6], transactional locking mechanism [7][8], and operational transformation [9]. Lock mechanisms, as a widely used method for concurrency control in transaction models [8], provide enough isolation on modified data items (Exclusive lock) to ensure there is no access to any of these data items before a transaction that is accessing or updating them commits [8]. The constraint of atomicity requires that a transaction either fully succeeds, or some recovery management process is in place to ensure that all the data items being operated on in the transaction return to their original state should the transaction fail at any point prior to commitment. Recovery management must use an original copy of the unmodified data to ensure the possibility of recovering the system to a consistent check point (before running the faulty transaction). Recovery management may also use a log system (which works in parallel with the lock mechanism of the concurrency control), to support reversing, or rolling back, the actions of a transaction following failure. However, as we will discuss, if these properties are strictly adhered to in business transactions, they can present unacceptable limitations and reduce performance [10]. In order to design a transaction model suitable for Digital Business Ecosystems, we will focus on three specific requirements which cause problems for conventional transaction models [11], [12], [13], [10]: long-term transactions (also called long-running or long-life transactions); lack of partial results; and omitted results. 
Within the Digital Business Ecosystem (DBE) project [1], the term “Digital Business Ecosystem” is used at a variety of levels. It can refer to the run-time environment that supports deployment and execution of e-services. It can include the “factory” for developing and evolving services. But most importantly it can be expanded to include the enterprises and community that use the ecosystem for publishing and consuming services. It is this last sense that is the most important driver for the underlying technology, since the ability to support a healthy and diverse socio-economic ecosystem is the primary “business goal” of the project. From that comes a specific focus on supporting and enabling e-commerce with Small and Medium-sized Enterprises (SMEs), contributors of over 50% of the EU GDP. The DBE environment, as a service-oriented business environment, tries to facilitate business activities for SMEs in a loosely coupled manner without relying on a centralized provider. In this way, SMEs can provide services and initiate business transactions directly with each other. The environment is highly dynamic, and relatively frequent unavailability and/or change of SME service providers is to be expected. Therefore we can anticipate the following necessary attributes in such an environment:
•
Long-term transactions: a wide range of B2B transactions (business activities [14], [15] or business transactions) have a long execution time. Strictly adhering to ACID properties for such transactions can be highly problematic and can reduce concurrency dramatically. The application of a traditional lock system (as the concurrency control mechanism [3], [8]) for ensuring Isolation (or capturing some version of serializability [3], [8]) reduces concurrency and the general performance of the whole system (many transactions may have to wait for a long-term transaction to commit and release its resources or results). As a side effect, the probability of deadlock is also increased, since long-term holding of locks directly increases the possibility of deadlock. Furthermore, the lack of centralized control in a distributed transactional environment such as DBE hinders the effective application of a deadlock correction algorithm.
•
Partial results: according to business semantics, releasing results from one transaction to another before commitment is another challenge in the DBE environment. According to conventional transaction models, releasing results before transactions commit is not legal, as it can lead the system to an inconsistent state should the transaction be aborted before final commit. Allowing for partial results and ensuring consistency are two aspects that clearly cannot both fit within a conventional lock mechanism for concurrency control and log- or shadow-based recovery management [3], [15]. A wide range of business scenarios, however, demand partial results in specific circumstances. Therefore we need to reconsider this primary limitation but at the same time also provide consistency for the system.
•
Recoverability and failures: in the dynamic environment of distributed business transactions, there is a high probability of failure due to the temporary unavailability of a particular service. Thus, recoverability of transactions is important. Recovering the system in the event of failure or abortion of a transaction needs to be addressed in a way that takes into account the loosely coupled manner of connections. This makes a recoverability mechanism in this context even more challenging. As we cannot interfere with the local state of the underlying services, the recovery has to be done at the deployment level [16], [17] and service realization (which includes the state of a service) has to be hidden during recovery. This is a point which current transactional models often fail to address, as will be further discussed in the sequel.
•
Diversity and alternative scenarios: by integrating SMEs, the DBE provides a rather diverse environment for business transactions. The provision for diversity has been discussed in the literature for service composition [16], [17], [2], [18]. When considered at the transaction model and/or business processes, it can provide a unique opportunity in not only covering a wider range of business processes but also in designing a corresponding recovery system [2], [18]. In conventional concurrency control and recovery management there is no technical consideration of using diversity for improving performance and reliability of the transactions.
•
Omitted results: one point of criticism for recovery systems often has to do with wasting intermediate results during the restart of a transaction after a failure occurs. The open question here is how much of these results can be saved (i.e. not rolled back) and how. In other words, how can we preserve as much progress-to-date as possible? Rising to this challenge within a highly dynamic environment such as DBE can have significant direct benefits for SMEs in terms of saving time and resources.
Similar Approaches for Business Environment: In 2001, a consortium of companies including Oracle, Sun Microsystems, Choreology Ltd, Hewlett-Packard Co., IPNet, SeeBeyond Inc., Sybase, Interwoven Inc., Systinet and BEA Systems began work on the Organization for the Advancement of Structured Information Standards (OASIS) Business Transaction Protocol (BTP), which was aimed at B2B transactions in loosely coupled domains such as Web services. By April 2002 it had reached the point of a
committee specification [19]. At the same time, others in the industry, including Microsoft, Hitachi, IBM, IONA, Arjuna Technologies and BEA Systems, released their own specifications: Web Services Coordination (WS-Coordination) and Web Services Transactions (WS-AtomicTransactions and WS-BusinessActivity) [20], [14]. Recently, Choreology Ltd. has started to make a joint protocol which tries to cover both models and this effort has highlighted the caveats of each as mentioned in [15]. The coordination mechanism of these well-known transaction models for webservices, namely BTP and WS-BusinessActivity, is based on WS-Coordination [21]. A study of this coordination framework however, reported in [22], shows it to suffer from some critical decisions about the internal build-up of the communicating parties; a view also supported in [23]. The Coordinator and Initiator roles are tightly-coupled and the Participant contains both business and transaction logic. These presumptions are against the primary requirements of SOA, particularly loose-coupling of services and local autonomy, and thus are not suitable for a digital business ecosystem, especially when SMEs are involved. A further concern has to do with the compensation mechanism. Behavioural patterns such as “validate-do” and “provisional-final” [23], [2], [15] are not supported while the “do-compensate” pattern, which is supported, results in a violation of local autonomy, since access to the service realisation level is required (see [22] for further details). Prescribing internal behaviour at the realisation level raises barriers for SMEs as it inevitably leads to their tight-coupling with the Coordinator. In previous work [2], [18], [15] we have been concerned with a distributed transaction model for digital business ecosystems. 
We have shown how a thorough understanding of the transaction behaviour, before run-time, can ease the adoption of behaviour patterns and compensation routines necessary to prevent unexpected behaviour (but without breaking local autonomy). In this paper, we present a lock system that provides concurrency control for data items within and between DBE transactions. Further, with the local autonomy of the coordinators in mind, we introduce two additional locks, an internal and a conditional-commit lock, which allow for exchange of data both inside and across transactions. We show how the lock system together with the logs generated by the transaction model can provide full consistency and ultimately lead to automation in this model. In the next section, we provide an overview of our primary log system, which has been introduced in [2]. In Section 3 we describe a mechanism for releasing uncommitted results between subtransactions of a transaction. Section 4 is concerned with the issue of releasing partial results between transactions (to the outside world). Section 5 recapitulates our concurrency model for a full recovery mechanism. The issue of omitted results is addressed in Section 6, which also describes a forward recovery mechanism. The paper finishes with some concluding remarks and a discussion on future extensions of this work.
1. Log System and Provided Graphs for Recoverability
We have seen that in our approach [2] transactions are understood as pertaining to SOC [17] for B2B interactions. Hence, a transaction has structure, comprising a number of subtransactions which need to be coordinated accordingly (and locally), and execution is long-term in nature. In order to relax the ACID properties, particularly Atomicity and Isolation, without compromising Consistency, we need to consider some additional structure that will guarantee the consistency of the transaction model. Maintaining consistency is critically
important within the highly dynamic and purely distributed environment of a Digital Ecosystem. To reach this aim, we divide the solution into two stages: providing recoverability and consistency by introducing a transaction model. In our approach, a transaction is represented by a tree structure. Each node is either a coordinator (a composition type) or a basic service (a leaf). Five different coordinator types are considered, drawing on [16], [2], [18], [15], that allow for various forms of service composition to be expressed in our model.
1.1 Local Coordinators
At the heart of this transactional model are the local coordinators. They have to handle the complexities of the model and control/generate all logs. At the same time, they should have enough flexibility to handle the low bandwidth (and low processing power) limitations of some nodes in the network. Based on the different types of composition [16], we use different types of coordinator. A transaction is therefore split into a nested group of sub-transactions with a tree structure (nested transaction model). The root of this tree is the main composition, which is a coordinator, and each sub-transaction is either a coordinator or a simple service (at a leaf). There are five different coordinator types, plus a delegation coordinator for handling delegation:
•
Data oriented coordinator: This coordinator works specifically on data oriented service composition, including fully atomic and simple service oriented composition. It deals with data items released inside a transaction, or uses partial results released by other transactions.
•
Sequential process oriented coordinator: This coordinator invokes its sub-transactions (services) sequentially. The execution of a sub-transaction is dependent on its previous service, i.e., one cannot begin unless the previous sub-transaction commits. This coordinator handles Sequential process oriented service composition by covering both Sequential with commitment dependency (SCD) and Sequential with data dependency (SDD).
•
Parallel process oriented coordinator: In the Parallel oriented coordinator all the sub-transactions (component services) can be executed in parallel, but different scenarios can be considered, which lead to different situations (implementations) from the transactional point of view: Parallel with data dependency (PDD), Parallel with commit dependency (PCD) and Parallel without dependency (PND).
•
Sequential alternative coordinator: This coordinator indicates that there are alternative sub-transactions (services) to be combined, and they are ordered based on some criterion (e.g., cost, time, etc). They will be attempted in succession until one sub-transaction (service) produces the desired outcome. It supports Sequential alternative composition (SAt) and may be used dynamically for forward recovery.
•
Parallel alternative coordinator: Unlike the previous coordinator, alternative sub-transactions (services) are pursued in parallel. As soon as any one of the sub-transactions (services) succeeds, the other parallel sub-transactions are aborted (clearly, this coordinator relies on a reliable compensation mechanism). The Parallel alternative coordinator handles Parallel alternative composition (PAt).
•
Delegation coordinator: The whole transaction or a sub transaction can be delegated to another platform; delegation can be by sending request specification or service(s) description. Figure 1 shows the DBE transaction model structure [13],[2],[16].
Figure 1. Transaction model structure
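The tree structure described in this section can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the names `Kind`, `Node`, `run` and `service` are hypothetical, services are replaced by callables returning success or failure, alternatives are tried one after another even for the parallel alternative coordinator, and locking, logging and compensation are left out entirely.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Kind(Enum):
    SERVICE = "basic service (leaf)"
    DATA = "data oriented coordinator"
    SEQUENTIAL = "sequential process oriented coordinator"
    PARALLEL = "parallel process oriented coordinator"
    SEQ_ALTERNATIVE = "sequential alternative coordinator"
    PAR_ALTERNATIVE = "parallel alternative coordinator"
    DELEGATION = "delegation coordinator"


@dataclass
class Node:
    name: str
    kind: Kind
    children: List["Node"] = field(default_factory=list)
    # Hypothetical stand-in for invoking a real service; returns True on success.
    action: Optional[Callable[[], bool]] = None


def run(node: Node) -> bool:
    """Return True if the (sub)transaction rooted at `node` succeeds."""
    if node.kind is Kind.SERVICE:
        return node.action() if node.action is not None else True
    if node.kind in (Kind.SEQ_ALTERNATIVE, Kind.PAR_ALTERNATIVE):
        # Alternatives: stop at the first child that succeeds.
        return any(run(child) for child in node.children)
    # Simplification: every other coordinator needs all children to succeed,
    # visited left to right and stopping at the first failure.
    return all(run(child) for child in node.children)


calls: List[str] = []


def service(name: str, ok: bool) -> Node:
    """Build a leaf whose action records the call and reports `ok`."""
    def act() -> bool:
        calls.append(name)
        return ok
    return Node(name, Kind.SERVICE, action=act)


root = Node("T1", Kind.SEQUENTIAL, [
    service("reserve", True),
    Node("pay", Kind.SEQ_ALTERNATIVE,
         [service("card", False), service("invoice", True), service("cash", True)]),
])
outcome = run(root)
```

In this toy run the sequential alternative coordinator falls through the failing first option to the second one and never invokes the third, which is the behaviour the text attributes to SAt composition.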
1.2 Internal Dependency Graph (IDG)
Two different graphs are introduced for: keeping track of data (value) dependencies; providing support for reversing actions; supporting a deadlock control mechanism; and transparency during delegation. These graphs provide important system logs, which are stored locally on a coordinator and take effect both locally (in terms of local faults, forward recovery and contingency plans) and globally (abortion, restart etc). The Internal Dependency Graph (IDG) is a directed graph in which each node represents a coordinator and the direction of an edge shows the dependency between two nodes. Its purpose is to keep logs on value dependencies in a transaction tree. More specifically, when a coordinator wants to use a data item belonging to another coordinator, two nodes have to be created in the IDG (if they do not already exist) and an edge generated between them (the direction of which shows the dependency between the two coordinators). Figure 2 shows an example of an SDD coordinator in which IDSi releases data item(s) to IDSi+1 and IDSi+1 releases data items to IDSi+2. This means that IDSi+2 is dependent on IDSi+1 and IDSi+1 is dependent on IDSi (so if some failure happens at IDSi, the coordinator knows, by traversing the graph, who used the results from IDSi, which are no longer consistent).
Figure 2. Sequential Data Dependency Coordinator and Associated Internal Dependency Graph
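The IDG bookkeeping just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `DependencyGraph` and its methods are hypothetical, and edges are assumed to be stored from the owner of a data item to the coordinator that used it.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Directed graph of value dependencies between coordinators."""

    def __init__(self):
        # owner -> set of coordinators that used a data item released by owner
        self._users = defaultdict(set)

    def add_dependency(self, owner, user):
        """Create both nodes if needed and add the edge owner -> user."""
        self._users[owner].add(user)  # a set, so repeated calls add nothing
        self._users[user]             # touching the key creates the node

    def affected_by(self, failed):
        """Coordinators that depend, directly or indirectly, on `failed`."""
        seen, order, queue = {failed}, [], deque([failed])
        while queue:
            current = queue.popleft()
            for user in sorted(self._users.get(current, ())):
                if user not in seen:
                    seen.add(user)
                    order.append(user)
                    queue.append(user)
        return order


# The SDD chain of Figure 2: IDSi -> IDSi+1 -> IDSi+2.
idg = DependencyGraph()
idg.add_dependency("IDSi", "IDSi+1")
idg.add_dependency("IDSi+1", "IDSi+2")
downstream = idg.affected_by("IDSi")
```

A breadth-first traversal is used here only because it returns dependants in order of distance from the failure; the source does not prescribe a traversal order.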
1.3 Conditional Commit, External Dependency Graph (EDG)
When a subtransaction needs to access a released data item which belongs to another DBE transaction, this dependency is shown by creating a directed link between these two nodes from the owner to the user of that data item. As an example, Figure 3 shows the release of partial results from two subtransactions of IDHC1 to IDHC2. As shown in the figure, the two nodes appear linked in the corresponding EDG; notice that the direction is towards the consumer of data, thus indicating this data item usage.
Figure 3. EDG for releasing partial results
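As a rough illustration of how EDG entries could later identify the transactions that need compensating, the following Python sketch records owner-to-consumer links between subtransactions of different transactions. All names are hypothetical: the class and method names, the subtransaction labels `S1`, `S2`, `S4`, and the third transaction `IDHC3` are assumptions added for the example and do not come from the paper.

```python
from collections import defaultdict


class ExternalDependencyGraph:
    """Links from the owner of a partial result to its consumer."""

    def __init__(self):
        # (owner transaction, owner subtransaction) -> set of consumers
        self.edges = defaultdict(set)

    def record_use(self, owner, user):
        """Log that `user` consumed a partial result released by `owner`."""
        if owner[0] == user[0]:
            raise ValueError("same transaction: this belongs in the IDG")
        self.edges[owner].add(user)  # existing nodes and links are not repeated

    def transactions_to_compensate(self, failed_tx):
        """Other transactions that used results of `failed_tx`, transitively."""
        affected, frontier = set(), {failed_tx}
        while frontier:
            tx = frontier.pop()
            for (owner_tx, _), users in self.edges.items():
                if owner_tx != tx:
                    continue
                for user_tx, _ in users:
                    if user_tx != failed_tx and user_tx not in affected:
                        affected.add(user_tx)
                        frontier.add(user_tx)
        return affected


edg = ExternalDependencyGraph()
edg.record_use(("IDHC1", "S1"), ("IDHC2", "S1"))  # as in Figure 3
edg.record_use(("IDHC1", "S2"), ("IDHC2", "S1"))
edg.record_use(("IDHC1", "S2"), ("IDHC2", "S1"))  # repeated use, no new edge
edg.record_use(("IDHC2", "S1"), ("IDHC3", "S4"))  # hypothetical further consumer
to_compensate = edg.transactions_to_compensate("IDHC1")
```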
If either of these nodes is absent from the EDG, it must be added; if the nodes and a connection between them already exist, there is no need for repetition. The most important usage of this graph is in the creation of compensatory transactions during a failure. By using the IDG and EDG, we have provided a framework which shows dependencies between coordinators and the order of execution in the transaction tree. This gives a foundation for recoverability. But the internal structure of the local coordinator (local coordination framework) has not yet been explained, and the feasibility of the model relies on it. The IDG and EDG can support provision of a routine for recovering the system in a global view, but they show neither the internal behaviour of a coordinator, nor the automated routines of each coordinator for avoiding the propagation of failure or clarifying support for loosely coupled behaviour patterns. We may ask these questions: how can a coordinator release deployed data items to other coordinators of the same transaction, and which safeguards/procedures should be considered in concurrency control (for example on SDD or PDD coordinators)? How, when and based on which safeguards (which lock mechanism in the concurrency control) can deployed data items be released to a coordinator of another transaction (partial results), and which internal structure will support this procedure? How will failure and abortion of a transaction be managed internally in a coordinator, and how can complete failure be minimized and recovery automated? The next section provides answers to these questions.
2. Releasing Data Inside a Transaction
Implementing locks as a conventional mechanism in concurrency control provides a practical mechanism for preserving consistency while allowing for (restricted) concurrency
in the transactional system. However, the traditional two-phase S/X lock model does not give permission for releasing data items before a transaction commits. Based on this model [3], [8], once a Shared lock (S_Lock) is converted to an Exclusive lock (X_Lock), the respective data item can only be accessed by the owner of the lock (who triggered the transition from S_Lock to X_Lock). In this way subtransactions cannot share their modified data items between them (as these have been locked by an Exclusive lock and cannot be released before the transaction commits). In contrast, in our approach deployed data items are made available to other subtransactions of the same transaction (by using the corresponding IDG). We introduce an Internal lock (I_Lock) which in combination with the IDG can provide a convenient practical mechanism for releasing data inside a transaction. When a subtransaction needs to release some results before commitment, it will use the I_Lock as a relaxed version of X_Lock. This has the effect that other subtransactions can use these results by adding an entry to the IDG. For example, in a parallel coordinator each child can not only use S_Lock and X_Lock but can also convert an X_Lock to an I_Lock and release that data item to the other children of the coordinator (applying data dependency). This means that the other subtransactions can (provisionally) read/modify this data item, as well as the owner/generator of the data item. In comparison with the conventional usage of X_Lock, which decreases concurrency dramatically since it isolates deployed data items, I_Lock not only supports a higher level of collaboration inside a transaction, but also allows more concurrent subtransactions to be executed and their results shared. It also provides a proper structure for any compensation during possible failures, as will be discussed in Section 4.
Figure 4. Internal lock (I_Lock) schema
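The I_Lock idea can be illustrated with a small Python sketch of lock modes, permitted conversions and the sibling-access check. It is a sketch under assumptions rather than the scheme of Figure 4 itself: the set of allowed conversions is inferred from the prose (S to X, X to I and back, I to S at final commit), and the names `Mode`, `Lock`, `convert` and `may_access` are hypothetical. The fields `idt`, `idsh` and `ids` correspond to the identifiers IDT, IDSh and IDS used in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    S = "S_Lock"
    X = "X_Lock"
    I = "I_Lock"


# Assumed conversions: upgrade, internal release (and its reverse), final commit.
ALLOWED = {
    (Mode.S, Mode.X),
    (Mode.X, Mode.I),
    (Mode.I, Mode.X),
    (Mode.I, Mode.S),
}


@dataclass
class Lock:
    mode: Mode
    idt: str   # identifier of the main transaction (IDT)
    idsh: str  # identifier of the parent coordinator (IDSh)
    ids: str   # identifier of the owning subtransaction (IDS)

    def convert(self, new_mode: Mode) -> None:
        if (self.mode, new_mode) not in ALLOWED:
            raise ValueError(f"{self.mode.value} -> {new_mode.value} not allowed")
        self.mode = new_mode

    def may_access(self, idt: str, idsh: str, ids: str) -> bool:
        """Can subtransaction (idt, idsh, ids) use the locked data item?"""
        if self.mode is Mode.S:
            return True  # shared: no restriction in this sketch
        if self.mode is Mode.X:
            return (idt, ids) == (self.idt, self.ids)  # owner only
        # I_Lock: any child of the same parent within the same transaction
        return (idt, idsh) == (self.idt, self.idsh)


lock = Lock(Mode.S, idt="T1", idsh="P1", ids="S1")
lock.convert(Mode.X)
sibling_before = lock.may_access("T1", "P1", "S2")   # blocked by X_Lock
lock.convert(Mode.I)
sibling_after = lock.may_access("T1", "P1", "S2")    # released internally
outsider = lock.may_access("T2", "P1", "S9")         # other transaction
```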
2.1 I_Lock as Mechanism for Releasing the Un-committed Results Inside a Transaction
The use of I_Lock (Figure 4) allows for the generation of new logs, which can be used in creating/updating the corresponding IDG [2], [15]. The necessary information from the owner of each I_Lock is the unique identifier of the main transaction (IDT), the identifier of the parent (parallel coordinator), IDSh, and the identifier of the subtransaction (IDS). When another subtransaction needs to access a data item, a validation process will compare IDSh with the parallel scheduler of the requesting subtransaction. In the sequential coordinator with data dependency (SDD), I_Lock is again used for data access with a similar method. When a child modifies any data item, it uses an X_Lock on it, and after the child (subtransaction) commits, the X_Lock will be converted to an I_Lock. Remember that only subtransactions (children) with the same parent id can access that data item and modify it. In the case of a value dependency, a data item will be released when converting the X_Lock to an I_Lock. This means the other children of the same parent (the I_Lock owner) can use the data item. The combination of I_Lock and the IDG shows the chain of dependencies between different coordinators. In the final section of this paper, we will discuss a possible algorithm for deadlock detection and correction that this combination enables us to design. In Figure 4 we show the schema for I_Lock conversion from/to X_Lock and final commit of the transaction. Converting the lock to an S_Lock provides the possibility to share the result or return it to the initiator of the transaction.
3. Partial Results
One of the novel aspects of our transaction model [2] for DBEs has to do with the release of partial results. These are results which are released (to some other transactions) during a long-term transaction before the transaction commits (conditional-commit).
This requires a mechanism for concurrency control and recovery management to be designed to maintain the integrity and consistency of all data.
3.1 Conditional Commit by Using C_Lock (after 1st Phase of Commit)
As we have seen in the previous section, I_Lock in collaboration with the IDG provides for the possibility of releasing data items to the other subtransactions of the same transaction. But another important problem concerns releasing results to other transactions. The inability to do this not only stops transactions from being executed concurrently but also, given the nature of business activities, which may have a long duration or lifetime, can stop a wide range of transactions from reaching their targets. Using a similar approach to that of introducing the internal lock I_Lock, we introduce a conditional-commit lock (C_Lock) which in collaboration with the EDG can provide a safe mechanism for releasing partial results to a subtransaction within another transaction. It works as follows. In the first step, a transaction can release its data items by using a C_Lock on them (before commit). When a data item has a C_Lock, that data item is available, but some logs must be written (in the corresponding EDG) during any usage of the data. The data item is released from a data-oriented coordinator to a data-oriented coordinator of another transaction. If a failure occurs, the compensating mechanism must be run. In this mechanism, transactions that used the released data item must run the same mechanism (rollback/abort). In the process of conditional-commit a data item with an X_Lock can be converted to a C_Lock (or an I_Lock for internal data release) for releasing partial results. The necessary
information from the owner of each C_Lock is the unique identification of a transaction (IDT) and the identification of the compensatory subtransaction (IDS). The combination of the C_Lock lifecycle and the EDG (and IDG) provides a practical mechanism for compensation (recoverability) which guarantees the consistency of the system and will be discussed in the next section.
Figure 5. Conditional commit lock (C_Lock) schema
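The failure-free C_Lock lifecycle (X_Lock, then C_Lock at conditional commit, then S_Lock at final commit) can be sketched as follows. This is an illustrative reading of the text, not the schema of Figure 5: the class `PartialResult`, its methods, and the plain list standing in for EDG log entries are all hypothetical.

```python
class PartialResult:
    """A data item owned by transaction `idt`, released via C_Lock."""

    def __init__(self, idt, value):
        self.idt = idt
        self.value = value
        self.mode = "X_Lock"
        self.edg_log = []  # stand-in for EDG entries: consuming transactions

    def conditional_commit(self):
        """Release the item to other transactions before final commit."""
        if self.mode != "X_Lock":
            raise RuntimeError("only an X_Lock can be converted to a C_Lock")
        self.mode = "C_Lock"

    def read_from(self, consumer_idt):
        """Read the value on behalf of transaction `consumer_idt`."""
        if self.mode == "X_Lock" and consumer_idt != self.idt:
            raise PermissionError("item is exclusively locked by its owner")
        if (self.mode == "C_Lock" and consumer_idt != self.idt
                and consumer_idt not in self.edg_log):
            self.edg_log.append(consumer_idt)  # usage must be logged
        return self.value

    def final_commit(self):
        """C_Lock -> S_Lock; return the transactions that may now commit."""
        if self.mode != "C_Lock":
            raise RuntimeError("final commit expects a C_Lock in this sketch")
        self.mode = "S_Lock"
        return list(self.edg_log)


item = PartialResult("T1", 42)
item.conditional_commit()
seen_by_t2 = item.read_from("T2")
item.read_from("T2")              # second use adds no new log entry
to_signal = item.final_commit()   # dependants allowed to proceed
```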
Figure 5 shows the lifecycle of C_Lock (without considering any failure, which is covered in the following section). In these circumstances, the final commit will be the final stage of C_Lock, which triggers conversion of the C_Lock to an S_Lock. At that point, results can be returned to the initiator of the transaction and a signal can be sent to the other dependent transactions for permission to proceed with their commit.
4. Recovery Management
Recovery management in a Digital Business Ecosystem has to deal with specific challenges which other similar models do not have to handle. One of the most important differences is the purely distributed nature of a DBE and the participation of SMEs. The (necessary) lack of a strong central point for managing the recovery procedure forces the model towards a distributed algorithm which is supposed not only to handle but also to predict failures. Localising recovery leads us to delegate full responsibility to local coordinators. We start by considering loss of communication between two coordinators as the first sign of possible failure. Based on this presumption, we provide a mechanism for applying an effective policy by which a local coordinator rolls back the effects of the transaction on the environment.
The other challenge is the high speed of failure propagation, which can lead the system towards a completely inconsistent situation. By using an analogy from nature (cf. Sections 4.2 and 4.3), we introduce a mechanism for limiting the side effects of a failure and apply the recovery procedure in two phases. Using distributed (and probably replicated) logs (provided by the IDG and EDG) gives more opportunity for generalising our mechanism. The cost of the recovery and the range of waste during the procedure were another motivation for applying an optimization mechanism (5) and trying to avoid a full system rollback during the recovery procedure. There are two established methods for designing recovery management [3]: shadow paging and log-based recovery. As the proposed model is a fully distributed model which can be widely generalised, shadow paging cannot be considered a suitable method because of the global overheads [24]. The structure of our model is similar to a log-based system, with several features that make the method feasible for such a complex environment. Two types of information are released before final commitment, which introduce certain complexities for our recovery management. The first type is the release of results between subtransactions within a transaction. The second type is the release of partial results between different transactions before their commitment. In order to support release of results within a transaction we have introduced an internal log with a graph structure that records the internal dependencies for a recovery routine when a failure occurs (drawn from the IDG). To support release of partial results (release of information between DBE transactions) we can use the other dependency log, which records external dependencies (EDG).
The graph creation, the order of the recovery manager execution, and the routines for the release of results (both within and between transactions) have been analysed so far; conventionally, these are considered to be the responsibility of the concurrency control mechanism. In contrast with the conventional methodology, one of the DBE necessities (given the dynamicity and unpredictable nature of the environment) was to merge concurrency control and recovery management and, as we explain in this section, our design reflects this fact.
4.1 Fully Isolated Recovery and Using R-lock
The nature of business activities and long-term transactions implies that treating the recovery system as a practical mechanism directly attached to the transaction leads to an unacceptably long recovery period. The occurrence of a fault in a DBE transaction does not necessarily mean full abortion of the transaction (because of the nature of a distributed network and the diversity of the DBE environment, it is possible to perform a task in different ways). Rather, it could necessitate the restart of some subtransaction, or repair and/or the choice of some alternative scenario. Additionally, it is important to note that restart/repair mechanisms can become part of an abort/restart chain (in a different transaction). This is why Recovery Management is one of the most crucial and important parts of the transaction model. In order to design this part, we drew analogies from the biochemical diagnosis of an infectious disease; the isolation of enzymes from infected tissue can also provide the basis of a biochemical diagnosis of an infectious disease [25]. Common strategies of public health authorities for containing the spread of a contagious illness rely on isolation and quarantine [26]. This provides further inspiration for designing our recovery model. Overall, Recovery Management in combination with the concurrency control procedure runs in two phases:
1. Preparation phase: a message (abort/restart) is sent to all subtransactions, putting them (and their data) into an isolated mode in preparation for recovery. This helps avoid any propagation of inconsistent data before rollback.
2. Atomic Recovery Transaction routine: the recovery routine runs as an atomic, isolated (quarantine) procedure that can either roll back a subtransaction or pass it without applying any changes.

It can be seen that the first task of Recovery Management in the transaction model is to isolate the failed transaction and related transactions (those using its partial results directly or indirectly), then to determine the damaged part (where the failure occurred), and finally to roll back to a consistent virtual check point. Our system does not rely on determining actual check points; rather, by using the logs and the structural definitions of coordinators, we can roll the system back to the part of the transaction tree in which the corresponding coordinator is working well, and that coordinator can then lead the transaction to the next step. The compensable nature of our model helps determine what can be done by compensating transactions (after applying the preparation phase). Another benefit of two-phase recovery management is the possibility of saving valuable results provided by safe subtransactions until the transaction is restarted.

4.2 Two Phase Recovery Routine

In the first phase, Recovery Management only tries to isolate the damaged (or failed) part of the system, by distributing a message that isolates all data items those subtransactions have worked on. In the transaction model, we have seen that modified data items can be locked by two different locks, I_Lock and C_Lock. As shown earlier, data items locked by I_Lock can be used only internally (IDG).
Therefore, when the transaction is aborted (or restarted) there is no danger of misuse of these data items by other transactions (because they do not have access to these items). These data items can naturally be considered atomic. They will be rolled back (if necessary) by using the IDG. The only issue is whether we need to roll back all data items: only the damaged part of a transaction, and the related data items as determined by the IDG, must be rolled back.

The other modified data items are locked by C_Lock and so are available to all other transactions. Following the EDG, the other transactions that have used these partial results are in danger of abortion (or restart), at least in some of their parts. They must therefore be identified as soon as possible. In fact, this must be done in the preparation phase, because rollback for C_Lock can result in chains of rollback operations, which can take time to commit.

4.3 Solution for Isolation in Recovery

For the critical part of the problem (C_Lock), the lock must be converted to R_Lock (Recovery Lock) by using the EDG and without any processing of the data. The R_Lock must restrict access to the data purely to Recovery Management in a transaction. This stops propagation of the problem (failure) until the Recovery routine has finished.
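The conversion of C_Lock to R_Lock along the EDG can be illustrated as a graph traversal. The following Python fragment is a minimal sketch under stated assumptions, not the implementation used in the DBE: it assumes the EDG is available as a mapping from a data item to the set of data items (possibly in other transactions) that consumed its partial results, and the names `Lock`, `isolate_external`, `edg` and `locks` are hypothetical.

```python
from collections import deque
from enum import Enum


class Lock(Enum):
    C_LOCK = "C_Lock"
    R_LOCK = "R_Lock"


def isolate_external(failed_items, edg, locks):
    """Convert C_Lock to R_Lock on every item reachable from the failed items.

    edg:   dict mapping a data item to the set of items that used its
           partial results (assumed representation of the EDG).
    locks: dict mapping a data item to its current Lock; updated in place.
    Only lock states change; the data values themselves are never touched.
    Returns the set of items visited during the traversal.
    """
    queue = deque(failed_items)
    visited = set()
    while queue:
        item = queue.popleft()
        if item in visited:
            continue
        visited.add(item)
        if locks.get(item) is Lock.C_LOCK:
            locks[item] = Lock.R_LOCK
        # Follow external dependencies, directly or indirectly affected.
        queue.extend(edg.get(item, ()))
    return visited
```

Because the traversal follows dependencies transitively, transactions that used the partial results only indirectly are isolated as well, while unrelated C_Locked items keep their lock.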
Figure 6. Recovery lock (R_Lock) schema
For the I_Lock optimization, we define the T_Lock (Time-out lock), which provides some key abilities in a DBE transaction. The T_Lock acts like a time-out before the rollback of a data item. In addition, access to the data item is limited to Recovery routines (avoiding failure propagation). Before the time-out expires, Recovery Management has the opportunity to reconvert a T_Lock to an I_Lock (if rollback is not necessary). Once the time-out expires, however, the data item is rolled back automatically (Figure 6 shows the effect of Recovery on the locking system).

5. Omitted Results and Forward Recovery

The occurrence of a failure (for example, because of a disconnection between different coordinators) activates recovery, starting with the preparation phase. As we have seen, in the preparation phase C_Locked data items (in all related transactions) are converted to R_Lock by using the EDG, and all I_Locked data items are converted to T_Lock by using the IDG. Therefore Recovery Management in phase two behaves like a full ACID transaction, in that it is fully isolated during the lifetime of a transaction. However, by using a suitable data structure, the recovery manager transaction is optimized: it not only allows special concurrent operations (by introducing the isolated T_Lock structure), but also makes it possible to save key results of some sub-transactions even when the transaction has failed and been restarted.
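The life cycle of a T_Lock described in Section 4.3 (time-out, optional reconversion to I_Lock, automatic rollback) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the `DataItem` record, the `before_image` field standing in for what the log would supply, the explicit `now` argument used instead of a real clock, and the `"released"` state after rollback, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataItem:
    name: str
    value: int
    before_image: int            # value to restore on rollback (assumed to come from the log)
    lock: str = "I_Lock"
    deadline: Optional[float] = None


def to_t_lock(item: DataItem, now: float, timeout: float) -> None:
    """Preparation phase: convert I_Lock to T_Lock and set a rollback deadline."""
    if item.lock == "I_Lock":
        item.lock = "T_Lock"
        item.deadline = now + timeout


def reconvert(item: DataItem, now: float) -> bool:
    """Recovery found rollback unnecessary: T_Lock back to I_Lock before the time-out."""
    if item.lock == "T_Lock" and now < item.deadline:
        item.lock = "I_Lock"
        item.deadline = None
        return True
    return False


def expire(item: DataItem, now: float) -> bool:
    """Once the time-out has passed, roll the item back automatically."""
    if item.lock == "T_Lock" and now >= item.deadline:
        item.value = item.before_image
        item.lock = "released"   # assumption: the lock is dropped after rollback
        item.deadline = None
        return True
    return False
```

An item that is reconverted in time keeps its value and becomes usable again inside the transaction; an item whose time-out passes is restored to its earlier value without any further decision by the recovery manager.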
Figure 7. Creating compensating routines using EDG and IDG
The normal procedure of phase two of the recovery of a transaction is carried out by traversing the EDG and IDG. For the rollback of partial results, traversing the EDG helps to create and execute compensating transactions (Figure 7); the T_Lock provides an automatic rollback operation (once the time-out has passed). However, to revalidate the correct data items before the time-out, the recovery manager transaction traverses the IDG and rechecks the data items. For data items that prove unaffected it then reconverts T_Lock to I_Lock, which can be useful in forward recovery and/or the restart of aborted transactions. In this way, recalculating those data items is unnecessary. This happens only if a data item under T_Lock does not depend on any inconsistent data item in the IDG.

5.1 Forward Recovery

Within a Digital Business Ecosystem, a number of long-running and multi-service transactions take place. Each comprises an aggregation of sub-transactions. There is an increased likelihood that at some point a subtransaction might fail. This may be due to a platform failure, to its coordinator not responding or, simply, because it is a child of a Parallel Alternative coordinator and some alternative sub-transaction has already met the pre-set condition. There must be a way to compensate for such occasions and to refrain from aborting, or even restarting, the whole transaction.

Forward recovery relies on alternative coordinators (SAt and PAt, Section 1.1) and the compensation operation in recovery management (Section 4). When one subtransaction of an alternative coordinator fails, that sub-transaction should be fully rolled back (by some compensation mechanism), and the alternative coordinator then tries to commit the transaction with its other sub-transaction(s). Figure 8 shows an example in which transaction T1 uses a sequential alternative coordinator at the top of its transaction tree; naturally, T1 first tries to run its first sub-transaction (‘T1,B1’ in Figure 8).
If a failure happens (for example, a failure on s1), T1,B1 must be compensated. In this scenario some partial results have been released to d3 of transaction T2, which means that, by using the EDG, those results should be rolled back too; this will be reflected in the compensation tree. After this compensation, the sequential alternative coordinator of T1 tries to run the second sub-transaction, T1,B2, and partial results will be released to transaction T2 from this sub-transaction (this will be reflected in the corresponding EDG). If T1,B2 is not successful either, transaction T1 will be fully aborted (recovered). The interesting part, however, concerns transaction T2 which, even after the abortion of T1, only needs to compensate any results dependent on d3 and will then try s5. This means that the whole transaction will not fail even though T2 has used partial results from the aborted transaction (T1). Only dependent sub-transactions will be rolled back, and T2 will try to continue its execution and commit successfully.
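The behaviour of a sequential alternative coordinator in this example can be sketched in a few lines of Python. This is an illustration only: the function name `run_sequential_alternative`, the use of `RuntimeError` to signal a failed sub-transaction, and the `compensate` callback are assumptions, and the sketch leaves out the EDG-driven compensation in dependent transactions such as T2.

```python
def run_sequential_alternative(branches, compensate):
    """Try each alternative sub-transaction in order.

    branches:   list of (name, action) pairs; an action raises RuntimeError
                to signal failure (assumed convention).
    compensate: callback invoked with the name of a failed branch, so that
                it is fully rolled back before the next alternative runs.
    Returns the name of the branch that succeeded, or None if every
    alternative failed (the transaction is then fully aborted).
    """
    for name, action in branches:
        try:
            action()
            return name
        except RuntimeError:
            compensate(name)
    return None


log = []


def b1():
    raise RuntimeError("failure on s1")


def b2():
    log.append("T1,B2 ran")


committed = run_sequential_alternative(
    [("T1,B1", b1), ("T1,B2", b2)],
    compensate=lambda name: log.append(f"compensated {name}"),
)
```

Here the failed first branch is compensated before the second is attempted, so the transaction moves forward with its alternative instead of restarting as a whole.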
Figure 8. Forward recovery in the transaction model
This example (Figure 8) shows forward recovery at two different levels: firstly, when we have an alternative coordinator and all dependencies are internal (T1,B1 to T1,B2); and secondly, when a transaction (T2) uses a partial result of another transaction (T1) and is dependent on that transaction (T2 still tries to avoid full recovery and only modifies its affected part).

6. Full Lock Schema

In total, there are six different locks for Concurrency Control in our transaction model. Two locks (R_Lock and T_Lock) are related to maintaining atomicity and to optimization during recovery. The S_Lock and X_Lock (eXclusive Lock) behave similarly to their counterparts in a conventional two-phase commit transaction model. However, value dependency and conditional commitment (partial results) can change the S_Lock/X_Lock behaviour (Figure 9 shows the full life cycle of the locking system). By using I_Lock, we relax the X_Lock and increase the support for concurrency inside a long-term transaction. Using C_Lock enables us to provide concurrency even when there are data dependencies between transactions (conventionally this was not possible, as X_Lock gives no permission for sharing data items before a transaction commits). The IDG and EDG, as two types of dependency graph, play complementary roles in providing full recoverability for the transaction model.
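As a compact summary, the six locks and the recovery-related conversions stated in the text can be written down as data. This Python sketch is deliberately partial: it encodes only the three conversions described in Sections 4.3 and 5, and it makes no attempt to reproduce the full life cycle of Figure 9. The names `Lock`, `RECOVERY_TRANSITIONS`, `RECOVERY_ONLY` and `can_convert` are hypothetical.

```python
from enum import Enum


class Lock(Enum):
    S_LOCK = "S_Lock"
    X_LOCK = "X_Lock"
    I_LOCK = "I_Lock"   # results released inside a transaction (IDG)
    C_LOCK = "C_Lock"   # partial results released to other transactions (EDG)
    R_LOCK = "R_Lock"   # recovery lock
    T_LOCK = "T_Lock"   # time-out lock


# Only the conversions stated in the text; other transitions are omitted.
RECOVERY_TRANSITIONS = {
    (Lock.C_LOCK, Lock.R_LOCK),   # preparation phase, via the EDG
    (Lock.I_LOCK, Lock.T_LOCK),   # preparation phase, via the IDG
    (Lock.T_LOCK, Lock.I_LOCK),   # reconversion before the time-out
}

# Locks under which access is restricted to recovery routines.
RECOVERY_ONLY = {Lock.R_LOCK, Lock.T_LOCK}


def can_convert(src: Lock, dst: Lock) -> bool:
    """True if the text describes a direct conversion from src to dst."""
    return (src, dst) in RECOVERY_TRANSITIONS
```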
Figure 9. Full life cycle of the locking system
7. Further Work and Conclusion

The nature of the transactions that take place in a highly dynamic and distributed environment such as a Digital Business Ecosystem (DBE) raises a number of non-trivial issues with respect to defining a consistent transaction model. In this paper, we have presented the fundamentals of a concurrency control mechanism, based on an extended lock implementation, for the DBE transaction model; it addresses a number of issues that arise in providing a collaborative distributed software environment for SMEs.

The long-term nature of business transactions frames the concept of a transaction in Digital Business Ecosystems. Conceptually, support for recoverability and data consistency imposes considerable limitations on concurrency, which are reflected in the limitations of conventional concurrency control mechanisms as applied in a transactional environment [8].

We have described an extended locking mechanism that supports the DBE transaction model. It ensures data consistency and transaction recoverability; at the same time, it maximizes concurrency by relaxing the concurrency control limitations, and it introduces a flexible structure to support this. More specifically, we described the use of two locks, I_Lock and C_Lock, for ensuring consistency between the distributed logs provided by the IDG and EDG and the local concurrency model. We also introduced a lock, the T_Lock, for covering omitted results in common distributed events. Finally, we described a lock for recovery, the R_Lock, which facilitates an isolated two-phase recovery routine.
These different locking schemes, as a part of the concurrency control, can provide mechanisms to support compensation and forward recovery in a way that ensures local progress-to-date is preserved as much as possible. The locking mechanism is set up in such a way that it allows us to introduce a customised three-phase commit (3PC) communication mechanism, where the intermediate phase is used for addressing unexpected failures in the commit state.

7.1 Further Approaches and Future Work

Apart from increasing concurrency, another benefit of our work is that, by relaxing the lock system and relying on logs for consistency and recoverability, the average duration of locks is reduced. (In the conventional model, by comparison, a simple X_Lock could last as long as its transaction, releasing data items to other transactions only after the transaction commits.) It is therefore possible to claim the potential for a dramatic reduction in the probability of deadlock. Our interest for future work is not just in measuring this reduction, but also in designing deadlock detection/prevention algorithms. In the case of deadlock correction, we are interested in reducing the probability of transaction blocking and starvation (abortion of a transaction in order to avoid and/or correct a deadlock scenario).

Our preliminary investigations suggest that by detecting loops in the IDG, the EDG, and a combined graph of both, it is possible to find all possibilities of deadlock. On the other hand, the primary proposed method for avoiding starvation relies on alternative scenarios and forward recovery during prevention of deadlock, instead of restarting the whole transaction. In this way, the specific transaction that causes the loop in an EDG can abort one of its subtransactions (coordinators) and use an alternative subtransaction to avoid creating a loop in the graph (deadlock scenario).
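The loop detection mentioned above can be illustrated with a standard depth-first search for cycles in a directed graph. The Python sketch below is a generic textbook technique, not the deadlock detection algorithm of the model (which is left as future work); it assumes that both graphs are given as mappings from a node to the set of its successors, and the names `find_cycle` and `combine` are hypothetical.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes (first node repeated at the end),
    or None if the directed graph has no cycle. Depth-first search with
    three node states."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:              # back edge: a loop is closed
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def combine(idg, edg):
    """Merge two dependency graphs into one by taking the union of edges."""
    merged = {}
    for g in (idg, edg):
        for node, successors in g.items():
            merged.setdefault(node, set()).update(successors)
    return merged
```

A loop may exist only in the combined graph: with one internal edge and two external ones, neither graph alone contains a cycle, whereas their union does, which is why the combined graph is checked as well.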
As is clear, checking against deadlock and other pathological properties has potential for further integration into this model. Meanwhile, connecting the model with the semantics of particular business processes of SMEs is another area for which sponsors of the Digital Business Ecosystem would like a solution. The minimum requirements for the structural infrastructure of a DBE network that supports this model are another topic for discussion and research in the wider scope of Digital Business Ecosystems.

Acknowledgements

This work was supported by the EU FP6-IST funded projects DBE (Contract No. 507953) and OPAALS (Contract No. 034824).
References

[1] Digital Business Ecosystems (DBE), EU IST Integrated Project No 507953. Available: http://www.digital-ecosystem.org [19 Sep 2006].
[2] A. Razavi, S. Moschoyiannis, P. Krause. A Coordination Model for Distributed Transactions in Digital Business Ecosystems. In Proc. IEEE Int’l Conf on Digital Ecosystems and Technologies (IEEE-DEST’07). IEEE Computer Society, 2007.
[3] C. J. Date. An Introduction to Database Systems (5th edition). Addison Wesley, USA, 1996.
[4] S. Greenberg and D. Marwood. Real time groupware as a distributed system: concurrency control and its effect on the interface. In Proc. ACM Conference on Computer Supported Cooperative Work, pages 207–217. ACM Press, Nov. 1994.
[5] L. McGuffin and G. Olson. ShrEdit: A Shared Electronic Workspace. CSMIL Technical Report, 13, 1992.
[6] C. Sun and C. Ellis. Operational transformation in real-time group editors: Issues, algorithms, and achievements. In Proceedings of ACM Conference on Computer Supported Cooperative Work, pages 59–68. ACM Press, Nov. 1998.
[7] P. Bernstein, N. Goodman, and V. Hadzilacos. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.
[8] J. Gray, A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers, USA, 1993.
[9] C. Sun, X. Jia, Y. Zhang, Y. Yang, and D. Chen. Achieving convergence, causality-preservation, and intention-preservation in real-time cooperative editing systems. ACM Transactions on Computer-Human Interaction, 5(1):63–108, Mar. 1998.
[10] A. Elmagarmid. Database Transaction Model for Advanced Applications. Morgan Kaufmann, 1994.
[11] J. E. B. Moss. Nested Transactions: An Approach to Reliable Distributed Computing. MIT Press, USA, 1985.
[12] T. Kakeshita, Xu Haiyan. Transaction sequencing problems for maximal parallelism. Second International Workshop on Transaction and Query Processing (IEEE), 2-3 Feb. 1992, pp. 215–216, 1992.
[13] M.S. Haghjoo, M.P. Papazoglou. TrActorS: a transactional actor system for distributed query processing. Proceedings of the 12th International Conference on Distributed Computing Systems (IEEE CNF), 9-12 June 1992, pp. 682–689, 1992.
[14] L.F. Cabrera, G. Copeland, W. Cox et al. Web Services Business Activity Framework (WS-BusinessActivity). August 2005. Available: http://www128.ibm.com/developerworks/webservices [19 Sep 2006].
[15] A. Razavi, P.J. Krause and S.K. Moschoyiannis. DBE Report D24.28, University of Surrey, 2006.
[16] J. Yang, M. Papazoglou and W-J. van de Heuvel. Tackling the Challenges of Service Composition in E-Marketplaces. In Proc. 12th RIDE-2EC, pp. 125–133, IEEE Computer Society, 2002.
[17] M.P. Papazoglou. Service-Oriented Computing: Concepts, Characteristics and Directions. In Proc. WISE’03, IEEE, pp. 3–12, 2003.
[18] A. Razavi, P. Malone, S. Moschoyiannis, B. Jennings, P. Krause. A Distributed Transaction and Accounting Model for Digital Ecosystem Composed Services. In Proc. IEEE Int’l Conf on Digital Ecosystems and Technologies (IEEE-DEST’07). IEEE Computer Society, 2007.
[19] P. Furnis, S. Dala, T. Fletcher et al. Business Transaction Protocol, version 1.1.0, November 2004. Available at http://www.oasisopen. org/committes/downaload.php [19 September 2006].
[20] L.F. Cabrera, G. Copeland, J. Johnson and D. Langworthy. Coordinating Web Services Activities with WS-Coordination, WS-AtomicTransaction, and WS-BusinessActivity. January 2004. Available: http://msdn.microsoft.com/webservices/default.aspx [19 September 2006].
[21] L.F. Cabrera, G. Copeland, M. Feingold et al. Web Services Coordination (WS-Coordination). August 2005. Available: http://www-128.ibm.com/developerworks/webservices/library/specification/ws-tx [19 September 2006].
[22] P. Furnis and A. Green. Choreology Ltd. Contribution to the OASIS WS-TX Technical Committee relating to WS-Coordination, WS-AtomicTransaction and WS-BusinessActivity. November 2005.
[23] F.H. Vogt, S. Zambrovski, B. Grushko et al. Implementing Web Service Protocols in SOA: WS-Coordination and WS-BusinessActivity. In Proc. 7th IEEE Conf on E-Commerce Technology Workshops, pp. 21–26, IEEE Computer Society, 2005.
[24] D. van der Meer, A. Datta, K. Dutta, K. Ramamritham, S.B. Navathe. Mobile user recovery in the context of Internet transactions. IEEE Transactions on Mobile Computing, 2(2):132–146, April-June 2003.
[25] Wikipedia, 'Infectious disease', http://en.wikipedia.org/wiki/Infectious_disease (last access: 08/03/2007).
[26] US Department of Health and Human Services, 'Fact Sheet: Isolation and Quarantine', Department of Health and Human Services; Centers for Disease Control and Prevention (last access: 08/03/2007).