Scheduling Transactions on Distributed Systems with the VPL Engine. eva Kühn, Wei Liu and Herbert Pohlai.
In: Proceedings of the Second Biennial European Joint Conference on Engineering Systems Design, ESDA'94, published by The American Society of Mechanical Engineers (ASME) and IEEE, London, England, July 4-7, 1994.
Scheduling Transactions on Distributed Systems with the VPL Engine
eva Kühn
Wei Liu
Herbert Pohlai
University of Technology Vienna, Institute of Computer Languages, Argentinierstr. 8, 1040 Vienna, Austria, Europe
{eva,wei,
[email protected]
Abstract
The coordination of distributed services and the integration of data repositories that are maintained by autonomous distributed databases require flexible transaction management. Recently, numerous advanced transaction models have been proposed that extend traditional transaction models by relaxing one or more of the classical ACID (atomicity, consistency, isolation, durability) properties of transactions. Advanced transaction processing requires a notation for the specification, a high-level communication mechanism to control the traffic between the different sites, and a flexible scheduling mechanism. In previous work we have shown how our parallel-logic-based coordination language VPL can serve as a transaction specification language and support reliable communication. The main concern of this paper is to demonstrate that the VPL control constructs for AND/OR concurrency can be employed directly for the scheduling of compensatable and non-compensatable distributed transactions. We also show some demo programs that we have run with our prototype system and explain some extensions towards more sophisticated scheduling strategies.
Keywords:
Distributed Transactions, Multi Databases, Concurrency Control
The work is supported by the Austrian FWF (Fonds zur Förderung der wissenschaftlichen Forschung), project "Multidatabase Transaction Processing", contract number P9020PHY, in cooperation with the NSF (National Science Foundation).
1 Introduction
The notion of workflow management is frequently used by researchers in the area of multidatabase systems [DEB93]. It has been recognized that the multidatabase problem [BHP92, LMR90] is much more closely related to the coordination of distributed and autonomous services than to traditional homogeneous database management. A main task of distributed transaction processing in open environments, like public networks, is the specification of the data and control flow between these different activities (= services offered by local systems), including constraints and dependencies. Software support to control the execution of a global transaction composed of many such activities or transactions is needed, as are advanced transaction models that extend the classical transaction model [Elm92]. The Flex Transaction model [ELLR90, RELL90] has been proposed for multidatabase transaction processing. It relaxes the isolation property of the classical transaction model [BHG87]. The Flex Transaction model provides a number of novel features such as function replication and semantic compensation [GMS87].
Function replication. The same task can often be accomplished by more than one service in a network. If one service fails or is not available, another one can be tried instead.
Semantic compensation. A global transaction must not lock local (database) systems over a long period of time. The autonomy of the local systems [GMK88] must not be compromised: a local system wants to continue servicing its users even if it is accessed by a global transaction. A possible solution is to allow subtransactions to commit and make their effects globally visible before the global (nested) transaction commits. Thus it can happen that the global transaction fails after some subtransactions have committed, so that all already committed subtransactions must be undone. As other transactions may already have seen the results, only semantic compensation is feasible.
Dependencies between transactions and external
constraints have an influence on transaction execution. Dependencies are either data dependencies (one transaction depends on data computed by another one) or user-defined preferences between several semantically equivalent alternatives. Dependencies can also be introduced by the scheduler for optimization purposes. For example, the scheduler may have information about the costs of transactions. A good scheduling policy is to try cheap alternatives first, so that the start of expensive alternatives depends on the failure of the cheap ones.
For the specification, which is to a large extent done by users and system administrators, a high-level, declarative specification language is required [Kuh93]. A mathematical notation has been proposed for the specification of Flex Transactions. As an example, dependencies between transactions are defined as sets of order relations. For the execution of a Flex Transaction, a scheduler must be designed that understands this notation. The scheduler derives an optimal execution plan from a given specification and controls the execution. This task comprises the synchronization and communication among parallel processes that represent the execution of transactions. Information about the current execution states of transactions must be maintained by the scheduler. The scheduling of Flex Transactions is discussed in [BCDE93, ANRS92, RS93].
Our Vienna Parallel Logic language [KPP93] can be used for such coordination tasks: it provides advanced language constructs (operators) for concurrency control and offers a high-level communication mechanism based on shared data objects, termed communication variables in VPL. The communication aspects in a distributed system are outside the scope of this paper and have been discussed elsewhere [Kuh94].
The focus of this paper is to explain the operational semantics of a subset of VPL, i.e., VPL without ordinary logic variables and without communication variables; only the control flow of the concurrency operators is explained. We demonstrate that VPL's concurrency control mechanisms are sufficient to manage distributed global transactions. In Section 2 we show how transactions at autonomous local systems are called from VPL. In Section 3 a control flow model of VPL's concurrency operators is developed, based on the usual control flow model for sequential Prolog [MC93]. In Section 4 we briefly explain Flex Transactions and define a relevant subclass of Flex Transactions that we call declarative transactions. The automatic generation of a VPL program from a Flex Transaction specification is explained. In Section 5 an example for the scheduling of a transaction over databases containing travel information is given, and some extensions towards more sophisticated scheduling strategies are shown.
2 Representation of Local Transactions
A global transaction (also termed a multidatabase transaction if the local systems executing it are database systems) consists of several transactions that access autonomous software systems distributed at different sites. Let us assume a procedure
    remote_call(system@site,Query,Result)
that executes the query Query as a local transaction t at the specified software system and returns the result in Result. The remote call is responsible for the communication between the site where the global transaction is started and controlled and the site where the local software system resides. In this paper we are not interested in the realization of a remote call and only specify its interface. For details on the realization of remote calls with VPL we refer to [KPE91, BKP93].
If a transaction is non-compensatable, the local system must provide the 2PC (two-phase commit) protocol with a visible prepared-to-commit state [BHG87]. The decision about the commitment of this transaction is delayed until the commitment of the global transaction. A non-compensatable transaction
    t = remote_call2PC(system@site,Query,Result)
can be represented as:
[Figure: control flow box of a non-compensatable transaction t, with the entries do, abort and redo and the exits succ, comm-succ and fail.]
Note that the comm-succ--EXIT (committed success exit) is not needed for this box. The control flow of the execution of a compensatable transaction
    t = remote_call(system@site,Query,Result)
and its associated compensate action
    comp_t = remote_call(system@site,Compensate_Action,Result)
is represented as:
[Figure: control flow box COMMIT(t, comp_t) of a compensatable transaction t, with the entries do, abort and redo and the exits succ, comm-succ and fail.]
A compensate action is assumed to be committed immediately, because its termination does not depend on other transactions. The COMMIT box is explained in Section 3; it serves to motivate the comm-succ--EXIT in the VPL control flow model. The behavior of the execution of a transaction t is as follows:
do. The do--ENTRY causes the start of t: the query and all its arguments are communicated to the local system and the execution is initiated there. Depending on the local system, the query is passed, e.g., by a file or pipe, or issued through the usual interfaces of the local system. If t fails, the fail--EXIT is taken. If t succeeds and is of type compensatable, the comm-succ--EXIT is activated and t is immediately committed at the local system. Otherwise, if t is of type non-compensatable, the local system is now in its "prepared to commit" state, waiting until a commit or an abort is sent, and the succ--EXIT is enabled. It must be guaranteed by the local system that it is able to commit the transaction even if a failure occurs.
When a failure (such as a system failure) occurs during the execution of the remote call, the transaction t is simply resubmitted if it is idempotent, i.e., it can be executed multiple times always producing the same result, or if it had not yet started its execution at the local system. Otherwise, the remote call must check whether the transaction has completed and, if so, re-use the old results. A remote call must prevent multiple executions of the same non-idempotent transaction: for example, the transfer of a certain amount of money from account A to account B must not increase B twice. Depending on the capabilities of the local system, more or less effort is required of the remote call to emulate this behavior. If the failure occurred while the transaction was still running, we assume that the local system is able to perform backward recovery [BHG87], so that the transaction is undone on a system restart without leaving any effects.
abort. Depending on the local software system, a signal at the abort--ENTRY may interrupt and abort the execution of a transaction if it has been started by the local system, or prevent its execution otherwise. In both cases the fail--EXIT is selected. An abort is simply ignored if the local system does not accept an interruption after the submission of the transaction, or if the abort signal arrives too late to catch the transaction at the local system, i.e., the abort arrives after the transaction has already completed and reported its success to its caller by enabling the succ--EXIT.
redo. Activation of the redo--ENTRY indicates that further solutions shall be computed, provided the local system supports indeterminism. For example, in a Prolog system the enabling of the redo--ENTRY causes backtracking and the computation of another solution, if one exists. Generally, the redo--ENTRY can also be used to bridge the tuple-set interfaces between the VPL engine and database systems: in this case, the remote call issues the transaction at the database, receives a result table and subsequently (on each redo) sends the next tuple to the VPL system.
The operational semantics of VPL given in Section 3 is a simplified model of the VPL language that specifies only the language's control flow aspects. The VPL language also provides the possibility to handle data: similar to Prolog, it supports logic variables. A further important feature of VPL is its powerful communication mechanism using communication variables [KPP93]. As we focus here only on the control aspects, i.e., a simplified model of the language, we have to assume that all local transactions that are waiting in their prepared-to-commit state are known at the commitment time of the global transaction. If the decision of the global transaction is to commit, a commit is automatically sent to all these prepared transactions. Thus, no COMMIT--ENTRY of boxes is explained. If the decision of the global transaction is to abort, the abort--ENTRY of each of them is enabled. The sending of commits, or of aborts respectively, to all local transactions must be done in one atomic action: either all local systems are informed, or none is. VPL's communication mechanism, which allows an atomic write of a group of communication variables, can be used to implement the global commitment procedure (see [BKP93]).
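The resubmission rule described for the do--ENTRY can be illustrated with a small sketch. It is not part of VPL: the names LocalSystem, remote_call and the tx_id parameter are hypothetical, and the local system is reduced to an in-memory table of completed transactions.

```python
from typing import Any, Callable, Dict


class LocalSystem:
    """Hypothetical stand-in for an autonomous local system."""

    def __init__(self) -> None:
        # Results of completed transactions, keyed by transaction id.
        self.completed: Dict[str, Any] = {}

    def execute(self, tx_id: str, query: Callable[[], Any]) -> Any:
        result = query()
        self.completed[tx_id] = result
        return result


def remote_call(system: LocalSystem, tx_id: str,
                query: Callable[[], Any], idempotent: bool) -> Any:
    """(Re)submit a transaction without running a non-idempotent one twice."""
    if idempotent or tx_id not in system.completed:
        # Safe to run: repeating is harmless, or the transaction never completed.
        return system.execute(tx_id, query)
    # Completed before the failure: re-use the old result.
    return system.completed[tx_id]


# Example: a money transfer must not increase account B twice.
accounts = {"A": 100, "B": 0}


def transfer() -> int:
    accounts["A"] -= 10
    accounts["B"] += 10
    return accounts["B"]


bank = LocalSystem()
first = remote_call(bank, "tx-1", transfer, idempotent=False)
retry = remote_call(bank, "tx-1", transfer, idempotent=False)  # resubmission
```

How much of this bookkeeping a real remote call has to emulate depends, as noted above, on the capabilities of the local system.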
3 Operational Semantics of VPL's Control Flow
In this section we develop a control flow model that specifies the semantics of VPL's concurrency control operators. We start with simple boxes (without commit operation, compensate action and abort signal) that have two entries (do--ENTRY and redo--ENTRY) and two exits (succ--EXIT and fail--EXIT). The comm-succ--EXIT will be motivated once the COMMIT box has been introduced. If a signal is sent to the entry of a box, it triggers actions that are understood as the behavior of the box. Internally, a box may maintain one or more switches with a state. A switch has a number of entries and exits; depending on the kind of switch, one or more exits can be enabled. The action of the box results in activating exactly one exit. All boxes have the same "interface", i.e., they offer the same entries and exits, so that they can be composed into more complex boxes (nesting). However, every box makes use of only one success exit (see Section 3.6).
The box model is binary, because this simplifies the explanation of its semantics. However, an arbitrary number of processes can be spawned by recursively applying the binary model. In the implementation of the VPL engine, too, the source code is first translated into a binary representation before intermediate code is generated (similar to [Tar91]). E.g., n processes are spawned in AND-parallel as AND(p1,AND(p2,AND(...,pn))). In the following presentation we first show the program in VPL source code, then the transformed (binary) program, and finally the graphical representation in the box model.
3.1 Simple Sequential OR (SOR)
    h :- a.
    h :- b.
    h :- SOR(a,SOR(b,true)).
[Figure: simple SOR box containing the boxes a and b and a switch S, with the entries do and redo and the exits succ and fail.]
When this box is activated via the do--ENTRY, a is started. Upon success of a, the signal is reported to the switch, which then enables the succ--EXIT. If a fails, b is started. If b succeeds, it informs the switch too, and the switch enables the succ--EXIT. If b fails, the simple SOR box fails too (i.e., the fail--EXIT is enabled and the box can be removed). When a signal arrives at the redo--ENTRY, the switch transmits this redo signal to whichever of the alternatives a or b had reported its success.
3.2 Simple Sequential AND (SAND)
    a & b.
    SAND(a,SAND(b,true)).
[Figure: simple SAND box containing the boxes a and b, with the entries do and redo and the exits succ and fail.]
Footnote 1: Forward execution is represented by bold lines.
The semantics of the simple box above are similar to those of sequential Prolog [Sha86] (see footnote 1). The success of a leads to the start of b. The success of b enables the succ--EXIT of the simple SAND box, whereas a fail of b causes a redo of a. If a fails, the simple SAND box chooses the fail--EXIT.
3.3 Simple Parallel OR (POR)
    h ::- a.
    h ::- b.
    h ::- POR(a,POR(b,true)).
[Figure: simple POR box containing the boxes a and b and the switches S1 and S2, with the entries do and redo and the exits succ and fail.]
When the simple POR box is started, both boxes a and b are started. Each box reports its success to switch S2, which selects one of them (usually S2 will choose the faster one) and exits the simple POR box with success. If a redo is received, the switch S2 transmits this signal to the alternative that it had selected previously. Afterwards, again in an indeterministic way, another not yet selected successful alternative can be selected. Failures are reported to switch S1. If both alternatives have sent their fail to S1, the simple POR box is exited with fail.
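The three simple boxes described so far can be mimicked with Python generators. This is only an illustrative sketch under stated assumptions, not the VPL engine: calling a box stands for the do--ENTRY, each produced value for a success, asking for the next value for the redo--ENTRY, and exhaustion for the fail--EXIT. The helper names (leaf, sor, sand, por) are hypothetical, and por simplifies the parallel box by collecting all solutions of an alternative as soon as it finishes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterator

Box = Callable[[], Iterator]  # calling a box = do--ENTRY; next() = redo--ENTRY


def leaf(*solutions) -> Box:
    """A box that succeeds once per given solution and then fails."""
    return lambda: iter(solutions)


def sor(a: Box, b: Box) -> Box:
    """Simple sequential OR: b is started only after a has failed."""
    def box():
        yield from a()
        yield from b()
    return box


def sand(a: Box, b: Box) -> Box:
    """Simple sequential AND: a fail of b causes a redo of a."""
    def box():
        for x in a():
            for y in b():
                yield (x, y)
    return box


def por(a: Box, b: Box) -> Box:
    """Simple parallel OR: both alternatives run; the faster one is served first."""
    def box():
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(lambda alt=alt: list(alt())) for alt in (a, b)]
            for done in as_completed(futures):
                yield from done.result()
    return box


a, b, never = leaf("a1", "a2"), leaf("b1"), leaf()
sor_solutions = list(sor(a, b)())    # all solutions of a, then those of b
sand_solutions = list(sand(a, b)())  # every solution of a paired with b's
por_solutions = sorted(por(a, b)())  # order of arrival is indeterministic
```

Because the order in which por delivers its solutions depends on thread timing, the example sorts them before use.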
3.4 Simple Parallel AND (PAND)
    a && b.
    PAND(a,PAND(b,true)).
[Figure: simple PAND box containing the boxes a and b and a switch S, with the entries do and redo and the exits succ and fail.]
If the do--ENTRY is activated, both a and b are started. The switch waits until both boxes report success and then enables the succ--EXIT (see footnote 2). The redo of the box leads to the redo of b. A fail of b results in the redo of a and the restart of b. This behavior can be understood as sequential backtracking. If a fails, then the simple PAND box fails. Note that there is an obvious defect in this case: b can still be running. In fact, at that time the execution of b is no longer necessary; moreover, b should be aborted before the control exits the box. Thus, we now extend the control flow model by an abort signal, introducing an abort--ENTRY to all boxes. The extension of the boxes described so far is shown in the following.
3.5 Simple Parallel AND (PAND) with Abort
    a && b.
    PAND(a,PAND(b,true)).
[Figure: simple PAND box with abort, containing the boxes a and b and a switch S, with the entries do, abort and redo and the exits succ and fail.]
Footnote 2: In the complete VPL language, the computed data of AND-parallel goals must be checked for compatibility.
The behavior of the simple PAND box is slightly modified: the fail--EXIT of a is no longer directly connected to the fail--EXIT of the whole box but to the switch and the abort--ENTRY of b. If a fails, the switch waits until b terminates (succeeds or fails) before it enables the fail--EXIT. As the extension towards an abort--ENTRY is quite simple, we omit the figures for "Simple SAND with Abort", "Simple SOR with Abort" and "Simple POR with Abort". Generally, the abort is directly sent to a, b, and the switch. It is possible that the abort signal does not catch a and/or b directly if it comes too late, but in any case the switch guarantees that the box does not exit with success if the abort--ENTRY is enabled.
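The forwarding of a's failure to the abort--ENTRY of b can be sketched with two threads and an event flag. The sketch is an assumption-laden illustration, not the engine's implementation: boxes are modelled as hypothetical callables that receive an abort flag and return True (succ) or False (fail), and backtracking via redo is left out.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Box = Callable[[threading.Event], bool]  # True = succ--EXIT, False = fail--EXIT


def pand_with_abort(a: Box, b: Box) -> bool:
    """Start a and b in parallel; a fail of a aborts b before the box exits."""
    abort_b = threading.Event()
    with ThreadPoolExecutor(max_workers=2) as pool:
        run_a = pool.submit(a, threading.Event())
        run_b = pool.submit(b, abort_b)
        a_ok = run_a.result()
        if not a_ok:
            abort_b.set()      # fail of a is forwarded to the abort--ENTRY of b
        b_ok = run_b.result()  # the switch waits until b has terminated
    return a_ok and b_ok


log: List[str] = []


def fails_at_once(abort: threading.Event) -> bool:
    return False


def long_running(abort: threading.Event) -> bool:
    # Stands for interruptible work: returns early once the abort flag is set.
    aborted = abort.wait(timeout=5.0)
    log.append("aborted" if aborted else "finished")
    return not aborted


def succeeds(abort: threading.Event) -> bool:
    return True


outcome = pand_with_abort(fails_at_once, long_running)
```

The box exits with fail only after b has terminated, which mirrors the waiting behavior of the switch described above.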
3.6 COMMIT
    a and compensate(comp_a) | .
    COMMIT(a,comp_a).
[Figure: COMMIT box containing the boxes a and comp_a and a switch S, with the entries do, abort and redo and the exits succ, comm-succ and fail.]
A simple box a, representing a compensatable transaction, can be extended by a commit mechanism in the following way: on activation of the do--ENTRY, a starts and reports its success to the switch. The success of a includes the database commitment to be sent to a (and to all its subtransactions, if a is composed). Note that this cannot be reflected in the simple control flow model of VPL.
We now have to differentiate between committed success and normal success of a box by introducing a new exit, the comm-succ--EXIT. If no abort has arrived in the meantime, the switch passes the success of a to the new comm-succ--EXIT. For convenience we combine the two kinds of boxes into one box: if the box is a simple box, its comm-succ--EXIT is not used; otherwise its succ--EXIT is not used. In the following, the outermost boxes are shown as simple boxes, because it is easy to extend them in the way described above if necessary.
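A minimal sketch of the COMMIT box as a small state machine may help. The class CommitBox and its method names are hypothetical. Two points are assumptions of this sketch rather than statements of the text above: that a redo runs the compensate action and then fails, and that an abort arriving while a is running is handled the same way.

```python
from typing import Callable, List


class CommitBox:
    """COMMIT(a, comp_a) for a compensatable transaction a (control flow only)."""

    def __init__(self, action: Callable[[], bool],
                 compensate: Callable[[], None]) -> None:
        self.action = action
        self.compensate = compensate
        self.committed = False
        self.aborted = False

    def abort(self) -> None:
        self.aborted = True

    def do(self) -> str:
        if self.aborted or not self.action():
            return "fail"
        self.committed = True   # a succeeded and committed at the local system
        if self.aborted:        # assumption: abort arrived while a was running
            return self.redo()
        return "comm-succ"

    def redo(self) -> str:
        if self.committed:
            self.compensate()   # semantic compensation of the committed effects
            self.committed = False
        return "fail"


seats: List[str] = []


def book_klm() -> bool:
    seats.append("klm")
    return True


def cancel_klm() -> None:
    seats.remove("klm")


box = CommitBox(book_klm, cancel_klm)
first_exit = box.do()        # commits the booking
seats_after_do = list(seats)
second_exit = box.redo()     # backtracking compensates the booking
```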
3.7 Parallel AND (PAND)
    a && b.
    PAND(a,PAND(b,true)).
[Figure: PAND box containing the boxes a and b and a switch S, with the entries do, abort and redo and the exits succ and fail.]
The difference to the "PAND with Abort" box is that the switch must differentiate between success and committed success. If a box that has reported a committed success is met on backward execution (it is aborted or redone), a redo must be sent to this box. This causes the execution of the compensate action, the failure of this box and afterwards the failure of the whole PAND box.
3.8 Sequential AND (SAND)
The figure of a SAND box with compensation can easily be derived from the "Simple SAND with Abort" box: the success and the committed success exits need not be differentiated and can be joined into one line.
3.9 Sequential OR (SOR)
    h :- a.
    h :- b.
    h :- SOR(a,SOR(b,true)).
[Figure: SOR box containing the boxes a and b, with the entries do, abort and redo and the exits succ and fail.]
The difference to the simple SOR box is that the fail--EXIT of a must not directly trigger the execution of b. If a was a successful COMMIT box, on backtracking of a (a is compensated and its fail--EXIT is enabled), the fail of a now causes the fail of the whole SOR box. In other words: a committing alternative prevents other ones from execution (pruning).
3.10 Parallel OR (POR)
    h ::- a.
    h ::- b.
    h ::- POR(a,POR(b,true)).
[Figure: POR box containing the boxes a and b and the switches S1 and S2, with the entries do, abort and redo and the exits succ and fail.]
The difference to the simple POR is that if the switch receives a committed success from either one of the alternatives, it aborts the other one. If the alternative to be aborted nevertheless completes with a committed success, the switch activates its redo--ENTRY (to trigger its compensate action).
This completes the explanation of the operational semantics of VPL's control flow. In the next section we show how these control mechanisms are used to express distributed global transactions.
4 Mapping Flex Specifications into VPL Programs
A general Flex Transaction T is a 5-tuple
    (D, SuccDep, FailDep, ExtDep, AccStates)
where
D = {t1,...,tn} is a finite set of typed transactions called the domain of T. Each ti (i = 1,...,n) is either a transaction to be executed at a local software system or again a Flex Transaction (nesting is supported). If ti is compensatable, its type is denoted by c. It is assumed to commit autonomously if it succeeds and to make all its effects globally visible. The corresponding compensate action of ti is denoted by comp_ti. If ti is non-compensatable, its type is denoted by nc, and it must wait in its prepared state until its enclosing transaction terminates.
SuccDep (set of success dependencies) is a set of ordering relations, denoted by ≺S, between transactions in the domain. ti ≺S tj means that tj depends on the success of ti, or in other words: tj must not be started until ti has either reported its prepared-to-commit state (if ti is a non-compensatable transaction) or succeeded and committed its computation (if ti is a compensatable transaction). SuccDep specifies the success order of T. It is a partial order on D: if ti ≺S tj and tj ≺S tk, then ti ≺S tk must hold, and ti ≺S tj and tj ≺S ti cannot both appear in SuccDep.
FailDep (set of failure dependencies) is a set of ordering relations, denoted by ≺F, between transactions in the domain. ti ≺F tj means that tj depends on the failure of ti, or in other words: tj must not be started until the execution of ti has failed. FailDep specifies the failure order of T. Like SuccDep, it is a partial order on D. SuccDep and FailDep must be chosen so that the order of T, with ordering relation ≺, is a partial order on D, where ti ≺ tj if and only if ti ≺S tj or ti ≺F tj (for all ti, tj ∈ D).
ExtDep is a set of predicates {π1,...,πn} on D, called the set of external dependencies of T. Each πi ∈ ExtDep is a predicate on ti ∈ D. For example, πi can be the predicate time < 18:00 and time > 9:00, which constrains ti to be executed during office hours. In contrast to SuccDep and FailDep, predicates in ExtDep do not refer to the execution state of a ti ∈ D.
AccStates = {C1,...,Cm} is the set of all acceptable states. Each Ci ⊆ D (i = 1,...,m) is called a commit-set of T. A commit-set states that if all ti belonging to it have successfully completed (i.e., they are in the prepared state or have committed), then the Flex Transaction T is successful; this commit-set C, recursively united with the commit-sets of all nested Flex Transactions that are elements of C, is called the solution of T. If the cardinality of AccStates is greater than 1, the Flex Transaction can be considered as an indeterministic specification: more than one solution is possible.
In [KPE92] we have shown an algorithm that automatically translates a general Flex Transaction into a VPL program. This algorithm employs VPL's communication variables to control the execution of the local transactions. We have recognized that a relevant subclass of Flex Transactions, termed declarative Flex Transactions, can be represented without explicitly expressing the control flow with communication variables. Additionally, we guarantee that the execution of declarative Flex Transactions retains the same amount of possible parallelism. Declarative Flex Specifications can be considered as declarative logic programs and can be expressed solely by VPL's AND/OR concurrency control constructs. Surprisingly enough, we have found that almost all relevant specifications can be mapped into binary Flex Transactions, and only some contrived ones require communication variables. Thus, it is not really a restriction to reduce our attention to declarative Flex Transactions. In fact, any general Flex Transaction can also be represented by a non-execution-equivalent declarative Flex Transaction in which the parallelism is reduced, or in other words, artificial preferences concerning commit-sets have to be introduced.
First let us define a subclass of Flex Transactions, termed binary Flex Transactions. As the name suggests, a binary Flex Transaction is defined over a set D consisting of two transactions t1 and t2.
Two types of binary Flex Transactions can be identified:
There exists no or exactly one success dependency between t1 and t2 (ti ≺S tj, i,j ∈ {1,2}, i ≠ j), and AccStates consists of only one commit-set C1 containing both transactions t1 and t2; or:
There exists no or exactly one failure dependency between t1 and t2 (ti ≺F tj, i,j ∈ {1,2}, i ≠ j), and AccStates consists of two commit-sets C1 and C2, each of these commit-sets containing exactly one tk (k ∈ {1,2}).
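The two types can be told apart mechanically. The following sketch is an illustration only: the function names and the encoding of dependencies as pairs (ti, tj) are assumptions, and the order check covers only the absence of cycles, not transitive closure.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Iterable, List, Optional, Tuple

Dep = Tuple[str, str]  # (ti, tj): tj depends on the success or failure of ti


def is_acyclic(domain: Iterable[str], succ_dep: List[Dep],
               fail_dep: List[Dep]) -> bool:
    """Check that SuccDep and FailDep together contain no cycle."""
    predecessors = {t: set() for t in domain}
    for before, after in succ_dep + fail_dep:
        predecessors[after].add(before)
    try:
        list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True


def binary_type(t1: str, t2: str, succ_dep: List[Dep], fail_dep: List[Dep],
                acc_states: List[List[str]]) -> Optional[int]:
    """Return 1 or 2 for the two binary types, None for any other specification."""
    if not is_acyclic([t1, t2], succ_dep, fail_dep):
        return None
    commit_sets = {frozenset(c) for c in acc_states}
    if (not fail_dep and len(succ_dep) <= 1
            and commit_sets == {frozenset((t1, t2))}):
        return 1  # one commit-set containing both transactions
    if (not succ_dep and len(fail_dep) <= 1
            and commit_sets == {frozenset((t1,)), frozenset((t2,))}):
        return 2  # two commit-sets with one transaction each
    return None


first_kind = binary_type("t1", "t2", [("t1", "t2")], [], [["t1", "t2"]])
second_kind = binary_type("t1", "t2", [], [], [["t2"], ["t1"]])
mixed = binary_type("t1", "t2", [("t1", "t2")], [("t1", "t2")], [["t1", "t2"]])
```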
The first type of binary Flex Transactions can be represented by sequential or parallel AND, the second type by sequential or parallel OR. The parallel operators can be selected in either case if no dependencies exist. All other types of Flex Transactions that are defined over a set D with exactly two transactions are not meaningful and do not belong to the class of binary Flex Transactions. Examples of contradictory or redundant specifications are (i,j ∈ {1,2}, i ≠ j):
There exists more than one success dependency between ti and tj, i.e., ti ≺S tj and tj ≺S ti (contradiction).
There exists more than one failure dependency between ti and tj, i.e., ti ≺F tj and tj ≺F ti (contradiction).
There exist failure as well as success dependencies between ti and tj: ti ≺S tj and ti ≺F tj (contradiction).
AccStates consists of one commit-set containing only one ti: then tj can be omitted from the specification (redundancy).
AccStates consists of two commit-sets C1 = {ti} and C2 = {ti,tj}: then the commit-set C2 is redundant, because the success of the binary Flex Transaction depends only on the success of ti, and thus tj can be omitted from the specification (redundancy).
We now define the class of declarative Flex Transactions as the class of binary Flex Transactions extended by Flex Transactions with the following four properties:
    D = {t1,t2,t3}
    SuccDep = {t1 ≺S t2}
    FailDep = {t1 ≺F t3}
    AccStates = {{t1,t2},{t3}}
Obviously the semantics of this type of Flex Transaction is:
    IF t1 THEN t2 ELSE t3.
Additionally, the class of declarative Flex Transactions also contains Flex Transactions in which the cardinality of D is 1 and the single transaction t1 ∈ D forms the only commit-set.
A Flex Transaction specifies only the control flow between different local transactions. A VPL program can also denote the data to be accessed, so that the query can be specified. We represent a declarative Flex Transaction in VPL as a structure flex/5 (the number after the "/" denotes the arity), in which the queries are also included. A ti designates the execution of Queryi and activates a remote call at the corresponding local system.
    flex(
      [T1 = Q1, T2 = Q2],   % set of transaction identifiers and corresponding typed
                            % queries (Qi = c(Queryi,CompQueryi) or Qi = nc(Queryi))
      SuccDep,              % set of success dependencies between ti, tj (i,j ∈ {1,2})
                            % or empty set
      FailDep,              % set of failure dependencies between ti, tj (i,j ∈ {1,2})
                            % or empty set
      [ExtDep1,ExtDep2],    % set of external dependencies π1 and π2
      AccStates             % set of acceptable states
    ).
Ti serves as transaction identifier, and Qi specifies the query to be executed by Ti. Qi is typed and has either the form c(Queryi,CompQueryi) or nc(Queryi), depending on whether Ti is a compensatable or non-compensatable transaction. CompQueryi is the compensate action of Queryi. Qi either denotes a transaction at the local software system or is a Flex Transaction (nesting). The VPL program
    flex_to_vpl(Decl_Flex,Start_Goal,VPL_Program)
translates a given declarative Flex Transaction T into a VPL program and returns Start_Goal. The execution of Start_Goal corresponds to an execution of T.
    % map AND/OR concurrency
    flex_to_vpl(flex([T1=Query1,T2=Query2],SuccDep,FailDep,
                     [Pi1,Pi2],AccStates),Goal,Prog) :-
        (   % no dependency, one commit-set with both transactions: parallel AND
            SuccDep = [] &
            ( AccStates = [[T1,T2]] ; AccStates = [[T2,T1]] ) &
            Goal = ((Pi1 & Goal1) && (Pi2 & Goal2)) &
            Prog = Prog3
        ;   % success dependency: sequential AND
            SuccDep = [T1 ≺S T2] &
            ( AccStates = [[T1,T2]] ; AccStates = [[T2,T1]] ) &
            Goal = ((Pi1 & Goal1) & (Pi2 & Goal2)) &
            Prog = Prog3
        ;   % no dependency, two singleton commit-sets: parallel OR
            FailDep = [] &
            ( AccStates = [[T1],[T2]] ; AccStates = [[T2],[T1]] ) &
            Goal = Tid &
            Prog = [(Tid ::- Pi1 & Goal1), (Tid ::- Pi2 & Goal2)|Prog3] &
            new_tid(Tid)
        ;   % failure dependency: sequential OR
            FailDep = [T1 ≺F T2] &
            ( AccStates = [[T1],[T2]] ; AccStates = [[T2],[T1]] ) &
            Goal = Tid &
            Prog = [(Tid :- Pi1 & Goal1), (Tid :- Pi2 & Goal2)|Prog3] &
            new_tid(Tid)
        ) &
        flex_to_vpl(Query1,Goal1,Prog1) &
        flex_to_vpl(Query2,Goal2,Prog2) &
        append(Prog1,Prog2,Prog3).
    % compensatable transaction
    flex_to_vpl(c(Query,CompQuery),Tid,
                [(Tid :- Goal and compensate(CompQuery) | true)|Prog]) :-
        new_tid(Tid) &
        flex_to_vpl(Query,Goal,Prog).
    % non-compensatable transaction
    flex_to_vpl(nc(Query),Tid,[(Tid :- Goal)|Prog]) :-
        new_tid(Tid) &
        flex_to_vpl(Query,Goal,Prog).
    % map IF-THEN-ELSE
    flex_to_vpl(flex([(T1=Query1),(T2=Query2),(T3=Query3)],
                     [Ti ≺S Tj],[Ti ≺F Tk],[Pi1,Pi2,Pi3],
                     [[Ti,Tj],[Tk]]),
                Tid,
                [(Tid :- (Pii & Goali & Pij & Goalj)),
                 (Tid :- (Pik & Goalk))|Prog2]) :-
        map_indices(T1,T2,T3,Ti,Tj,Tk,Pi1,Pi2,Pi3,Pii,Pij,Pik,
                    Query1,Query2,Query3,Queryi,Queryj,Queryk) &
        new_tid(Tid) &
        flex_to_vpl(Queryi,Goali,Progi) &
        flex_to_vpl(Queryj,Goalj,Progj) &
        flex_to_vpl(Queryk,Goalk,Progk) &
        append(Progi,Progj,Prog1) &
        append(Prog1,Progk,Prog2).
    % map a unary flex
    flex_to_vpl(flex([(T=Query)],[],[],[Pi],[T]),(Pi & Goal),Prog) :-
        flex_to_vpl(Query,Goal,Prog).
    % local query
    flex_to_vpl(Query,Query,[]) :- true.
The procedure append/3 defines the relation that the list in the third argument is the concatenation of the lists given in the first two arguments [Sha86]. The procedure new_tid/1 returns a new transaction identifier. The procedure map_indices maps the indices 1,2,3 into i,j,k. The operator ";" has been used to make the program more readable. It has the semantics of sequential OR and can also be implemented by using VPL's sequential OR operator ":-", introducing an extra clause for each alternative. We assume that "≺S" and "≺F" are defined as infix operators.
Example 4.1 (Automatic Derivation of a VPL Start Goal and Program from a Given Declarative Flex Specification).
In this example a declarative Flex Transaction T is specified, consisting of two transactions t3 and t4, where t3 again is a declarative Flex Transaction. To make the program readable, t3 and T are defined outside the call of flex_to_vpl/3 by using the logic variables Flex3 and T. The semantics of T are to book a flight either at the klm or at the twa airline company. The klm flight is preferred. Only after a flight has been booked shall a hotel room reservation be performed as well. Both flight booking transactions are compensatable, whereas the room reservation transaction is non-compensatable. Note that the queries in the local systems are implemented by means of remote calls, for example:
    book_klm(Client_Name,Flight_No) :-
        remote_call(
            [email protected],
            insert(seat_table(Client_Name)),
            result(Flight_No)).
    cancel_klm(Flight_No) :-
        remote_call(
            [email protected],
            delete(seat_table(Flight_No)),
            _).
The VPL goal:
?- (Flex3 = flex([(t1 = c(book_klm(Nm,Fno), cancel_klm(Fno))),
                  (t2 = c(book_twa(Nm,Fno), cancel_twa(Fno)))],
                 [], [t1 F t2], [pi1,pi2], [[t2],[t1]])) &
   (T = flex([(t3 = Flex3), (t4 = reserve_room(Nm,Hotel))],
             [t3 S t4], [], [pi3,pi4], [[t3,t4]])) &
   flex_to_vpl(T,StartGoal,Prog).
returns:
StartGoal = pi3 & t' & pi4 & t''''
Prog = [(t' :- pi1 & t''),
        (t' :- pi2 & t'''),
        (t'' :- book_klm(Nm,Fno) and compensate(cancel_klm(Fno)) | true),
        (t''' :- book_twa(Nm,Fno) and compensate(cancel_twa(Fno)) | true),
        (t'''' :- reserve_room(Nm,Hotel))]
5 Scheduling of Distributed Transactions
The start goal computed in Example 4.1 can be represented in the control flow model as:
SAND1(pi3,
  SAND2(
    SOR3(
      SAND4(pi1, COMMIT5(book_klm(Nm,Fno), cancel_klm(Fno))),
      SAND6(pi2, COMMIT7(book_twa(Nm,Fno), cancel_twa(Fno)))),
    SAND8(pi4, reserve_room(Nm,Hotel))))
and can thus be scheduled directly by the VPL engine as explained below. To be able to name the different boxes, we have added unique indices. During the execution of a Flex Transaction in the proposed model, all possible commit-sets are left open as long as possible (indeterminism), and no transaction in its domain may be started concurrently more than once. However, we will show that in our interpretation of Flex Transactions backtracking is supported, so that a transaction can be compensated and afterwards started again.

If pi3 is fulfilled, SAND2, SOR3, and SAND4 are started. pi1 is tested and, if it is fulfilled, COMMIT5 is started, which calls book_klm and immediately commits the transaction if a flight can be booked at klm. The compensate action of book_klm is specified as cancel_klm. If book_klm fails, or if pi1 is not fulfilled, the second alternative of SOR3 is tried: (*) pi2 is tested, and if it is true, the transaction book_twa is executed. If book_twa completes successfully, it commits and specifies cancel_twa as its compensate action. If the flight could be booked neither at klm nor at twa, SOR3 fails and causes SAND2 to fail, which in turn causes SAND1 and thus T = StartGoal to fail. If SOR3 succeeds, SAND8 is activated, which tests pi4. If pi4 is fulfilled, the reserve_room transaction is executed, and if it succeeds, the computation of T completes. The solution of T can be understood as the commit-set C containing one of the transactions book_klm or book_twa (depending on which alternative of SOR3 has been selected) and the transaction reserve_room. Now the global transaction T is committed, which, according to our assumption, triggers the commits in all local databases where transactions have been executed that are still waiting in their prepared states. In our example, reserve_room is committed.

Let us now assume that no room could be reserved, or that pi4 is not fulfilled. pi4 could, for example, be the test whether the price of the flight is less than $100. Then SAND8 fails and causes a redo of SOR3. If the flight was booked at twa, no alternative exists and SOR3 fails, causing SAND2, SAND1, and T to fail. If the flight was booked at klm, a redo is sent to this transaction, which activates the corresponding compensate action cancel_klm, because book_klm is a compensatable subtransaction. After the compensate action has terminated, the second alternative of SOR3 is tried and the control flow of the execution resumes at the place marked by (*) above. The above description also shows how the VPL scheduler works if backtracking (redo) is necessary.

In [KPE92] we claimed that in the following situation:
  a failure dependency ti F tj exists, and ti has succeeded, but in a subsequent execution a failure occurs that causes a redo of ti, and subsequently, if ti cannot be redone (this is usually the case for database transactions), ti is forced to fail, possibly by calling ti's compensate action,
the following execution is forbidden: tj is executed and selected to be a member of a commit-set. For example, we claimed that an "IF t1 THEN t2 ELSE t3" Flex Transaction t should be executed as (see footnote 3):
t :- t1 ! t2.
t :- t3.
However, as the semantics of Flex Transactions have not yet been formally defined in the literature, we think that the interpretation
t :- t1 & t2.
t :- t3.
where backtracking is not disabled, also has its justification. In the second case, a solution can still be found if t2 fails. As an example, let us assume that in Example 4.1 the flight at twa, tried in a second attempt, can be booked at a lower price than at klm, so that pi4 is now fulfilled and the room booking can be executed, which may result in the success of T. We believe that, especially when dealing with long-living transactions in heterogeneous environments, the success probability should be maximized. Besides function replication, backtracking also contributes to this goal. The same argumentation holds for function replication represented by SOR, where we believe that the second alternative should be left open as a choice even if the first one had succeeded and was later redone and forced to fail.

Example 5.1 (Towards More Parallelism (OR)). Example 4.1 can be relaxed towards more parallelism by omitting some of the dependencies. If, for example, t1 F t2 is dropped, the resulting VPL program will employ OR parallelism:
Prog = [(t' ::- pi1 & t''),
        (t' ::- pi2 & t'''),
        (t'' :- book_klm(Nm,Fno) and compensate(cancel_klm(Fno)) | true),
        (t''' :- book_twa(Nm,Fno) and compensate(cancel_twa(Fno)) | true),
        (t'''' :- reserve_room(Nm,Hotel))]
Example 5.2 (Towards More Parallelism (AND)). If in Example 5.1 the dependency t3 S t4 is also dropped, the resulting VPL start goal will employ AND parallelism:
StartGoal = (pi3 & t') && (pi4 & t'''')

Footnote 3: The "!" has the same control flow effect as the COMMIT, namely to prune all other alternatives, but does not automatically trigger the commit of all prepared database transactions in its scope.
Example 5.3 (Non-compensatable Transactions only).
A further modification to Example 5.2 is to drop all internal success and failure dependencies by changing all transaction types to nc. The resulting VPL start goal and program are:
StartGoal = (pi3 & t') && (pi4 & t'''')
Prog = [(t' ::- pi1 & t''),
        (t' ::- pi2 & t'''),
        (t'' :- book_klm(Nm,Fno)),
        (t''' :- book_twa(Nm,Fno)),
        (t'''' :- reserve_room(Nm,Hotel))]
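As an illustration only, the sequential execution of Example 4.1 described at the beginning of this section (including the redo of SOR3 when SAND8 fails) can be imitated in Python with generators. This is a sketch under assumptions, not the VPL engine: the helper names precondition, compensatable, non_compensatable, sand, sor and solve, the prices 120 and 90, and the shared state dictionary are all hypothetical. A goal yields once per solution; resuming a compensatable goal corresponds to a redo and runs its compensate action.

```python
from typing import Any, Callable, Dict, Iterator, List

Goal = Callable[[], Iterator[None]]  # calling a goal yields once per solution


def precondition(name: str, holds: Callable[[], bool], log: List[str]) -> Goal:
    """A test such as pi1..pi4: at most one solution, nothing to undo."""
    def run() -> Iterator[None]:
        if holds():
            yield
        else:
            log.append(f"{name} not fulfilled")
    return run


def compensatable(name: str, action: Callable[[], bool],
                  compensate: Callable[[], None], log: List[str]) -> Goal:
    """Commit early; if the goal is redone later, run the compensate action."""
    def run() -> Iterator[None]:
        if not action():
            log.append(f"{name} failed")
            return
        log.append(f"{name} committed")
        yield
        compensate()  # reached only when a later goal fails (redo)
        log.append(f"{name} compensated")
    return run


def non_compensatable(name: str, action: Callable[[], bool], log: List[str]) -> Goal:
    """Stay prepared until the global commit; a redo simply fails."""
    def run() -> Iterator[None]:
        if action():
            log.append(f"{name} prepared")
            yield
        else:
            log.append(f"{name} failed")
    return run


def sand(*goals: Goal) -> Goal:
    """Sequential AND: left to right, redoing earlier goals when a later one fails."""
    def run() -> Iterator[None]:
        if not goals:
            yield
            return
        for _ in goals[0]():
            yield from sand(*goals[1:])()
    return run


def sor(*goals: Goal) -> Goal:
    """Sequential OR: alternatives are tried in the given order."""
    def run() -> Iterator[None]:
        for goal in goals:
            yield from goal()
    return run


def solve(goal: Goal) -> bool:
    """Look for the first solution only; closing does not trigger compensation."""
    solutions = goal()
    try:
        next(solutions)
        return True
    except StopIteration:
        return False
    finally:
        solutions.close()


log: List[str] = []
state: Dict[str, Any] = {}  # hypothetical stand-in for the local databases


def book(airline: str, price: int) -> Callable[[], bool]:
    def action() -> bool:
        state.update(airline=airline, price=price)
        return True
    return action


flight = sor(
    sand(precondition("pi1", lambda: True, log),
         compensatable("book_klm", book("klm", 120), state.clear, log)),
    sand(precondition("pi2", lambda: True, log),
         compensatable("book_twa", book("twa", 90), state.clear, log)),
)
room = sand(precondition("pi4", lambda: state["price"] < 100, log),
            non_compensatable("reserve_room", lambda: True, log))
committed = solve(sand(precondition("pi3", lambda: True, log), flight, room))
```

With these assumed prices, the klm booking is committed first, pi4 is not fulfilled, the klm booking is compensated, and the twa alternative then leads to a solution. The sketch follows the backtracking interpretation (t :- t1 & t2. t :- t3.); an interpretation with "!" would stop after the first failure of the room step.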
The scheduling of declarative Flex Transactions with the VPL engine employs the maximum parallelism that the specification allows. The above examples have shown that the control flow supported by the VPL language is powerful enough to correctly execute a given declarative Flex Transaction specification.

However, sometimes information exists about the estimated time a transaction will run or about its costs. Services that can fulfill the same task do not necessarily cause the same costs: function replication allows semantically the same task to be done in completely different ways. Knowledge about costs may introduce preferences between transactions that the scheduler should be able to handle. Information about the run-time of a transaction may be used to delay its start if it is a very short running transaction that is non-compensatable. This can avoid long and unnecessary locking times at the local system. Information about transactions can either be statically asserted, or the scheduler may incrementally generate it by "learning" from previous executions.

We will now briefly show how even more advanced schedulers can be built by using the technique of meta-interpretation. A meta-interpreter [Neu88, Sha86] executes programs written in the same language the meta-interpreter is implemented in. Logic based languages offer this possibility. A meta-interpreter typically re-implements (reifies [Sha89]) some features to improve them, and re-uses (absorbs) all other features from the language. We show the design of a "meta-scheduler" that has access to information about the run-times of transactions: estimated_time/2 is a procedure that gives an estimate of the run-time of a transaction. The meta-scheduler re-implements the AND parallelism of VPL ("&&") so that the shorter running transaction is scheduled later. sleep/1 is a procedure that waits the specified amount of time.
meta_scheduler(T1 && T2) :-
    (estimated_time(T1,Time1) && estimated_time(T2,Time2)) & Time1 > Time2 |
    Delta is Time1 - Time2 & (T1 && (sleep(Delta) & T2)).
meta_scheduler(T1 && T2) :-
    (estimated_time(T1,Time1) && estimated_time(T2,Time2)) & Time2 > Time1 |
    Delta is Time2 - Time1 & ((sleep(Delta) & T1) && T2).
meta_scheduler(T1 && T2) :- | (meta_scheduler(T1) && meta_scheduler(T2)).
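The delay rule of the first two clauses can be paraphrased in Python as follows. This is only a sketch under assumptions: start_delays, delayed, meta_schedule and the two sample transactions are hypothetical names, asyncio stands in for the AND parallelism of VPL, and the case of equal estimates simply starts both transactions at once instead of recursing as the third clause does.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

Transaction = Callable[[], Awaitable[str]]


def start_delays(time1: float, time2: float) -> Tuple[float, float]:
    """Delay the transaction with the shorter estimate by the difference."""
    delta = abs(time1 - time2)
    if time1 > time2:
        return (0.0, delta)  # T2 is shorter: T1 && (sleep(Delta) & T2)
    if time2 > time1:
        return (delta, 0.0)  # T1 is shorter: (sleep(Delta) & T1) && T2
    return (0.0, 0.0)        # no preference: start both immediately


async def delayed(delay: float, transaction: Transaction) -> str:
    """Wait for the given delay, then run the transaction."""
    await asyncio.sleep(delay)
    return await transaction()


async def meta_schedule(t1: Transaction, est1: float,
                        t2: Transaction, est2: float) -> List[str]:
    """Run both transactions concurrently, the shorter one starting later."""
    d1, d2 = start_delays(est1, est2)
    return await asyncio.gather(delayed(d1, t1), delayed(d2, t2))


async def book_flight() -> str:   # hypothetical long-running transaction
    await asyncio.sleep(0.02)
    return "book_flight done"


async def reserve_room() -> str:  # hypothetical short, non-compensatable one
    return "reserve_room done"


results = asyncio.run(meta_schedule(book_flight, 0.02, reserve_room, 0.0))
```

Delaying the short transaction means that it holds its prepared state, and therefore its locks, for a shorter time before the common commit.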
A meta-scheduler that has knowledge about transaction costs can be implemented as shown in the following. The OR parallelism ("::-") of VPL is re-implemented. Let parallel_procedure/2 be a procedure that returns all clauses of a parallel procedure for a given predicate. The costs are evaluated for every clause (= alternative), and the list of clauses is then sorted by increasing costs. The meta-scheduler then executes the alternatives sequentially (try_next/2). Other strategies can also be chosen.
meta_scheduler(T) :-
    parallel_procedure(T,Clauses) & sort_by_costs(Clauses,SClauses) & try_next(T,SClauses).
sort_by_costs(Clauses,SClauses) :-
    append(L1,[(H1 ::- B1),(H2 ::- B2)|L2],Clauses) &
    estimated_cost(B1,Cost1) & estimated_cost(B2,Cost2) & Cost1 > Cost2 &
    append(L1,[(H2 ::- B2),(H1 ::- B1)|L2],SClauses1) &
    sort_by_costs(SClauses1,SClauses).
sort_by_costs(Clauses,Clauses) :- true.
try_next(T,[(H ::- Body)|Clauses]) :- Body.
try_next(T,[_|Clauses]) :- try_next(T,Clauses).
try_next(T,[]) :- true.
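The cost-based strategy can be summarized by a short Python sketch. It is an illustration under assumptions rather than a translation of the VPL program: cheapest_first, the alternative names and the cost figures are hypothetical, Python's built-in sorted replaces the adjacent-swap sorting procedure, and, unlike try_next/2 with an empty clause list, the sketch reports failure (None) when no alternative succeeds.

```python
from typing import Callable, Dict, List, Optional


def cheapest_first(alternatives: Dict[str, Callable[[], bool]],
                   estimated_cost: Dict[str, float],
                   attempts: List[str]) -> Optional[str]:
    """Try the alternatives sequentially in order of increasing estimated cost.

    Returns the name of the first alternative that succeeds, or None.
    """
    for name in sorted(alternatives, key=lambda n: estimated_cost[n]):
        attempts.append(name)       # record the order in which alternatives run
        if alternatives[name]():
            return name
    return None


# Hypothetical alternatives of one parallel procedure and their cost estimates.
alternatives = {
    "book_twa": lambda: True,   # more expensive, but succeeds
    "book_klm": lambda: False,  # cheaper, but no seat available
}
estimated_cost = {"book_twa": 120.0, "book_klm": 80.0}

attempts: List[str] = []
chosen = cheapest_first(alternatives, estimated_cost, attempts)
```

Here the cheaper alternative is tried first and the more expensive one is started only after it has failed, so the scheduler trades OR parallelism for lower expected cost.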
The two examples for meta-schedulers demonstrate that it is very easy to fine-tune different scheduling strategies. This is mainly due to the fact that the VPL engine itself provides a very powerful control flow model. Also the combination of cost and time estimates leads to powerful, yet simple to implement, meta-schedulers.
6 Conclusions
The handling of distributed transactions in a heterogeneous environment requires advanced transaction models. A scheduler for such advanced transactions is much more complicated than a local database scheduler. It has to provide many control features, must be adaptable to changing environments, and should provide a declarative control language for the specification of execution plans.
A scheduler that can fully rely on the coordination features of our logic based VPL programming language fulfills all these requirements. We have shown how specifications that belong to the class of declarative Flex Transactions can automatically be converted to a VPL program, the execution of which by the VPL engine corresponds to an execution of the Flex Transaction. Features like function replication and multiple commit-sets allow indeterminism in the specification and can be represented by logical OR. The composition of a task from several transactions that are required for a solution maps to logical AND. Preferences between transactions that are specified as success or failure dependencies constrain the possible parallelism: sequential AND and OR operators are used instead of the parallel ones. The semantic compensation of transactions allows an early commitment, so that locking times in local systems can be reduced. VPL's built-in transaction mechanism supports the commitment of solutions, which in turn triggers an atomic commitment at several databases. The programmable backtracking mechanism of VPL automatically calls the corresponding compensate action of a compensatable transaction if it has to be redone. Additionally, we have motivated why backtracking helps to increase the success probability of a global (multi database) transaction. Most problems can be specified in VPL in a more declarative way than in the usual mathematical notation used to specify Flex Transactions: the AND/OR VPL operators have a logical reading. The execution states of transactions need not be explicitly and actively tested by the VPL based scheduler, because the computation state is implicitly reflected by the execution state of the VPL program. We have explained the control aspects of the VPL language by means of a control flow model.
The expansion towards more sophisticated scheduling strategies is easily provided by meta-interpretation, where VPL's control operators can be re-used. A prototype of VPL, viz. the runnable formal specification of the language, is available. Although this was not in the scope of the paper, we have to add that the communication aspects, and thus the failure atomicity of transactions, can also be guaranteed by an executor written in VPL: the communication objects of the language are persistent and recoverable after failures and thus provide a high-level, shared data based, reliable communication mechanism.
References
[ANRS92] Mansoor Ansari, Linda Ness, Marek Rusinkiewicz, and Amit Sheth. Using flexible transactions to support multi-system telecommunication applications. In Proceedings of the 18th VLDB Conference, Vancouver, British Columbia, Canada, 1992.
[BCDE93] Omran A. Bukhres, Jiansan Chen, Weimin Du, and Ahmed K. Elmagarmid. InterBase: An execution environment for heterogeneous software systems. IEEE Computer, 26(8), August 1993.
[BHG87] Ph. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.
[BHP92] M. W. Bright, A. R. Hurson, and S. H. Pakzad. A taxonomy and current issues in multidatabase systems. IEEE Computer, March 1992.
[BKP93] O. Bukhres, e. Kuhn, and F. Puntigam. A language multidatabase system communication protocol. In Proceedings of the 9th International Conference on Data Engineering. IEEE Computer Society, April 1993.
[DEB93] IEEE Data Engineering Bulletin, 16(2), June 1993. Special Issue on Workflow Applications.
[ELLR90] A. Elmagarmid, Y. Leu, W. Litwin, and M. Rusinkiewicz. A multidatabase transaction model for InterBase. In Proceedings of the 16th International Conference on Very Large Data Bases, August 1990.
[Elm92] A. K. Elmagarmid, editor. Database Transaction Models for Advanced Applications. Morgan Kaufmann Publishers, 1992.
[GMK88] H. Garcia-Molina and B. Kogan. Node autonomy in distributed systems. In Proceedings of the International Symposium on Databases in Parallel and Distributed Systems, pages 158-166, Austin, Texas, December 1988. IEEE Computer Society Press.
[GMS87] Hector Garcia-Molina and Kenneth Salem. Sagas. In Proceedings of the ACM SIGMOD Annual Conference, San Francisco, May 1987.
[KPE91] e. Kuhn, F. Puntigam, and A. K. Elmagarmid. Multidatabase transaction and query processing in logic. In A. K. Elmagarmid, editor, Database Transaction Models for Advanced Applications, chapter 9. Morgan Kaufmann Publishers, 1991.
[KPE92] e. Kuhn, F. Puntigam, and A. K. Elmagarmid. An execution model for distributed database transactions and its implementation in VPL. In Proceedings of the International Conference on Extending Database Technology, EDBT'92, Vienna, March 1992. Springer Verlag, LNCS.
[KPP93] e. Kuhn, H. Pohlai, and F. Puntigam. Concurrency and backtracking in Vienna Parallel Logic. Computer Languages, 19(3), July 1993.
[Kuh93] e. Kuhn. Multidatabase language requirements. In Proceedings of the 3rd International Workshop on Research Interests in data Engineering, Interoperability in Multidatabase Systems, RIDE-IMS-93. IEEE Computer Society, 1993.
[Kuh94] e. Kuhn. Fault-tolerance for communicating multidatabase transactions. In Proceedings of the 27th Hawaii International Conference on System Sciences (HICSS), Wailea, Maui, Hawaii, January 4-7 1994. ACM, IEEE. accepted.
[LMR90] W. Litwin, Leo Mark, and Nick Roussopoulos. Interoperability of multiple autonomous databases. ACM Computing Surveys, 22(3), 1990.
[MC93] Mats Carlson and Johan Widen. SICStus Prolog User's Manual. Swedish Institute of Computer Science, PO Box 1263, S-164 28 KISTA, Sweden, January 1993.
[Neu88] Gustaf Neumann. Meta-Programmierung und Prolog. Addison-Wesley, Bonn, 1988. (in German).
[RELL90] M. E. Rusinkiewicz, A. K. Elmagarmid, Y. Leu, and W. Litwin. Extending the transaction model to capture more meaning. SIGMOD RECORD, 19(1):3-7, 1990.
[RS93] Marek Rusinkiewicz and Amit Sheth. Specification and execution of transactional workflows. Technical Memorandum TM-STS-023284, Bellcore, August 1993. Also to appear in the Future Directions in Database Systems, W. Kim (ed.), ACM Press.
[Sha86] Ehud Shapiro. The Art of Prolog. The MIT Press, 1986.
[Sha89] E. Shapiro. The family of concurrent logic programming languages. ACM Computing Surveys, 21(3):412-510, September 1989.
[Tar91] P. Tarau. A simplified abstract machine for the execution of binary metaprograms. In Proceedings of the Logic Programming Conference'91, pages 119-128, ICOT, Tokyo, September 1991.
Acknowledgements
We acknowledge the encouragement and support of Manfred Brockhaus, the head of the Department of Computer Languages at the TU Vienna. We would like to thank Charly Sabitzer and Thomas Tschernko for their helpful comments on this text.