Recovery in Heterogeneous System

L. Strigini
IEI-CNR
Pisa, Italy

A. Romanovsky
University of Newcastle
Newcastle, UK

F. Di Giandomenico
IEI-CNR
Pisa, Italy

We discuss backward error recovery for large software systems, where different subsystems may belong to essentially different application areas, like databases and process control. Examples of such systems are found in modern telecommunication, transportation, manufacturing and military applications. Such heterogeneous subsystems are naturally built according to different design "models", viz. the "object-action" model (where the long-term state of the computation is encapsulated in data objects, and active processes invoke operations on these objects) and the "process-conversation" model (where the state is contained in the processes, communicating via messages), which also imply different ways of organising backward error recovery. In the object-action model, backward recovery is naturally organised via atomic transactions; in the process-conversation model, via conversations. We show how checkpointing and roll-back can be co-ordinated between two sets of such heterogeneous subsystems, namely sets of message-passing processes organised in conversations and data servers offering atomic transactions. Our solution involves altering the virtual machine on which the programs run, and programming conventions which seem rather natural and can be automatically enforced. We demonstrate the feasibility of the approach by showing how it would work with the Ada language, and illustrate it with a toy example.

1 Introduction

A general means for providing fault tolerance is backward error recovery. This consists in "rolling back" a computation, in which an error has been detected, to a previous state (a recovery point or checkpoint), assumed to be correct, a copy of which had previously been saved for this purpose. The computation may then be restarted from this correct state, and will proceed correctly, if the cause of the previous error is not encountered again. Backward recovery in a system of multiple processes requires co-ordination between processes to avoid inconsistencies in the overall state of the system after roll-back. Among the various schemes studied for ensuring consistency, we consider in this paper planned conversations (in which a set of communicating processes are rolled back together), which also allow the recomputation after roll-back to use different code from the first computation, so that errors

caused by software design faults may not be repeated in the new execution [Randell 1975], and atomic transactions (in which a sequence of changes on a set of data items are undone together), which allow the designer to manage together error recovery and concurrency control in accessing data [Lynch, Merrit et al. 1993]. Conversations and atomic transactions are in fact dual models of recovery, as discussed in [Shrivastava, Mancini et al. 1993]: they are two ways of describing the same backward recovery philosophy in the two models (or design styles) which the authors of [Shrivastava, Mancini et al. 1993] call the "object-action" model (where the long-term state of the computation is encapsulated in data objects, and active processes invoke operations on these objects), and the "process-conversation" model (where the state is contained in the processes, which communicate via messages).

Conversations have been studied mainly in application scenarios of tightly coupled processes, interacting in a predictable fashion, e.g. control applications. Atomic transactions are characteristic of applications where different, independent agents access shared data at times that they decide autonomously from each other and from the data repositories. It would be wrong to conclude, from the duality of the two models, that one of these two models, or design organisation principles, is sufficient for all applications. Design styles must first of all be natural for the people who have to use them, and there is little doubt that for each of these two programming models there is a sizeable community of designers who feel at ease with it.
Requiring a designer to use an unnatural (for him or her) design style for the sake of language standardisation or savings in development tools would be quite wrong: the designers' intelligence is best spent in making a design right, rather than in adapting to a counterintuitive notation, and high-level design errors, as could be caused by casting an application in an "unnatural" computational model, are certainly a concern even for mature software development organisations. So, both the "object-action" and the "process-conversation" models (as well as others) are here to stay, and so are the styles of backward recovery that best adapt to them.

An interesting aspect of large, complex modern systems is that they may include parts that belong to traditionally separate application domains, and thus are naturally designed according to different approaches. For instance, the "intelligent network" evolution of the telephone network relies on the interaction of real-time control components with vast on-line data bases, with different timing and availability requirements [Gopal and Griffith 1990]. Similar factors of heterogeneity may be expected in other large-scale control systems, e.g. in transportation (air traffic, vehicle-highway systems), and integrated manufacturing control. In all these cases, if backward recovery is employed, difficulties may be expected in co-ordinating such heterogeneous subsystems.

As a consequence of these considerations, we have studied in [Strigini and Di Giandomenico 1991] a scheme for co-ordinating the backward recovery of subsystems of processes using conversations with "data-server" subsystems offering an atomic transaction-based interface (we call these "TDSCs", for transactional data server components). This would allow a clean structuring of the overall system into active processes, whose states are protected via conversations, and passive data, protected by the atomic transaction-based servers.
Our approach imposes some further restrictions on the patterns of interaction that the application designer is allowed to program between the subsystems, and requires some additional facilities in the run-time environment and/or the language compiler 1. The result is to hide from the application programmer some of the complexity of guaranteeing consistent recovery in such complex systems.

It is appropriate to emphasise that we do not propose a monolithic platform offering facilities for every application programmer to use both atomic transactions and conversations. Rather, we assume that the whole application system is divided into data-server subsystems (TDSCs) and client subsystems (possibly running on separate and even different platforms). The former are designed (at the application level, plausibly) to offer their clients the use of atomic transactions. The client subsystems are built according to the "process-conversation" model, and it is therefore natural to give them interpreter-provided support for recovery via conversations. So, the programmers of client processes would use conversations to organise the recovery of groups of communicating client processes. However, when they have to use the data in the TDSCs, they would exploit the atomic transactions offered by the TDSCs to guarantee concurrency control and manage roll-back. This, however, would not prevent some kinds of inconsistency in backward recovery. We then propose enhancements to the interpreter mechanisms which support conversations (which, in a distributed system, may well be limited to that subset of platforms that runs client processes) to preserve consistency across roll-back operations.

In this paper, we present our proposal in some detail, and analyse its application to programming in Ada as an example and as proof of feasibility. In this example, the operation of the conversation-based subsystems and their interactions with the data servers are co-ordinated via standard mechanisms, which are programmed in Ada itself, instead of being hidden in a run-time support.
We use a restricted variety of conversations, limited to side effect-free subprograms, to simplify the implementation of roll-back. We show how a design aid tool could take care of the clerical aspects of using this approach to construct conversations, guaranteeing the correctness of the code that co-ordinates recovery. Checkpointing and recovery operations for the transactional data server components are instead the responsibility of their designers, as we assume that application-specific 2 recovery mechanisms and concurrency control policies will be used. For these, we specify general requirements and show an example of their application.

In Section 2, we give some background about schemes for backward recovery, and specify the flavours of conversations and atomic transactions which we consider. In Section 3, we describe our scheme for their co-ordination. Section 4 specifies our Ada implementation, illustrating it with a toy example. Section 5 discusses what we have achieved, the merits of this proposal and its possible developments.

1 We shall call these, together, the interpreter on which the application runs, using the terminology of [Lee and Anderson 1990]. For the purpose of most of the ensuing discussion, one can picture the functions of the "interpreter" as realised in a complex run-time support, either part of a kernel or of a layer overlaid above the kernel, intercepting calls from the application process. In practice, many of these functions could be realised by code and data structures at the application level, produced by the use of libraries or pre-processing before compilation. This possibility will be discussed later on.

2 Where "application" indicates the programs managing the data, not other programs using those data.

2 Conversations and Atomic Transactions. General Model

2.1 Backward Recovery

We shall briefly justify here our choice of planned conversations and atomic transactions as the two models of backward recovery which we are trying to integrate. As stated before, backward recovery is defined as "rolling back" a computation, in which an error has been detected, to a previous, correct state. By contrast, forward error recovery involves correcting errors so as to move the computation to a new correct state, which it could have reached had the error not happened in the first place. Backward recovery has the appeal that it does not require redundant computation (except the saving of checkpoints) during error-free processing (as opposed, for instance, to running multiple copies of a computation to mask errors by voting), and that it is a general recovery mechanism, so that one general set of tools for backward recovery could be used in many different applications. It has thus been the subject of much research work in the last twenty years or so.

The main complication with backward error recovery is that roll-back must be consistent: if the computation is made up of communicating processes, rolling back only some of them will make them ignore, during re-execution, communications that took place between the saving of the recovery point and the detection of the error: a rolled-back process might wait forever for a message, or read a new value of a shared variable when it should read a previous, now overwritten value, etc. Furthermore, the new execution may produce different outputs from the one that was aborted by the error, e.g. because the error is not reproduced, or the state of the external world has changed, or the designer explicitly provided for different code to be used in the re-execution: the correct processes, if acting upon output from the execution that was interrupted, may find themselves in states inconsistent with that of the rolled-back process.
Inconsistency may be avoided by "propagating" the roll-back operation to more processes, but this may in turn produce a "domino effect" [Randell, Lee et al. 1978] whereby an error causes a large amount of error-free computation to be undone. When providing what we called "general tools" for backward recovery (in the form of interpreter-provided mechanisms, or even of programming conventions and templates), guaranteeing consistency is thus an essential requirement, and limiting the domino effect is an important one.

Researchers have suggested various ways of organising the use of backward recovery so as to avoid inconsistency, simplify the task of application programmers and limit the domino effect. Among these techniques we can quote planned conversations [Randell 1975] and atomic transactions [Lynch, Merrit et al. 1993], which, together, are the topic of this paper and will be detailed in the next two subsections. A third possibility is given by programmer-transparent schemes, where the application programmer may be given the illusion of a completely reliable underlying machine. These last include [Barigazzi and Strigini 1983; Kim and You 1990] and [Strom and Yemini 1985].

The problem of propagating roll-back may be avoided altogether by requiring effective confinement of errors inside individual components, and guaranteeing (in the run-time support) that all the messages received or sent by a failed component (in the example above) after a checkpoint and before an error are respectively replayed to it, and not delivered a second time to their recipients, during its re-execution [Birman, Joseph et al. 1984; Powell, Bonn et al. 1988]. This solution is attractive but requires, besides perfect confinement of errors, deterministic processes: running from a given initial state, and receiving a given sequence of messages, the retry of a process must produce exactly the same message exchanges produced by the previous (aborted) run. These assumptions are often used in the world of distributed data-bases, for designing algorithms that can cope with simple (crash) hardware faults, and can be guaranteed (still for physical faults only) by special design of the hardware. However, it is difficult to make them hold for errors caused by software faults. The programmer-transparent schemes are also limited, in that they risk being less efficient (e.g. in the amount of checkpoint information to be stored) and less effective (in terms of achieved reliability) than those requiring the intervention of the programmer, like conversations and atomic transactions. A programmer can design more effective checks of correctness than those possible for an application-independent interpreter, and can plan for the checkpointing to take place at times when the state information is less and/or easier to verify. Requiring the programmers to specify directives for checkpointing and recovery (or commit of changes) does allow higher effectiveness or efficiency, but also complicates their task. 
Schemes like conversations and atomic transactions facilitate the programmers' task, compared to programmers developing backward recovery on an ad-hoc basis, by giving them a sound template to follow, and standard mechanisms to use.
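The programmer-transparent replay scheme discussed above (roll back one failed component, replay the messages it had received, and suppress re-sent messages) can be illustrated with a small sketch. The following Python fragment is ours, not the paper's; all names are invented, and a real implementation would keep the log and checkpoint in stable storage and handle the suppression of duplicate sends in the run-time support:

```python
class ReplayedProcess:
    """Toy model of checkpoint-plus-message-log recovery for a
    deterministic process: after a crash, restore the checkpoint and
    replay the logged messages; determinism reproduces the old state."""

    def __init__(self, step):
        self.step = step          # deterministic transition function
        self.state = 0
        self.checkpoint = 0       # state saved at the last checkpoint
        self.log = []             # messages delivered since the checkpoint

    def deliver(self, msg):
        self.log.append(msg)      # log before applying, for later replay
        self.state = self.step(self.state, msg)

    def crash_and_recover(self):
        self.state = self.checkpoint          # roll back to the checkpoint
        for msg in self.log:                  # replay logged messages;
            self.state = self.step(self.state, msg)  # outputs are suppressed


p = ReplayedProcess(lambda s, m: s + m)
for m in (1, 2, 3):
    p.deliver(m)
p.crash_and_recover()
assert p.state == 6               # determinism restores the pre-crash state
```

Note how the scheme collapses if the transition function is non-deterministic or the error was caused by a software fault: the replay then either diverges or reproduces the failure, which is exactly the limitation the text points out.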

2.2 Conversations

[Randell 1975] proposed the use of a programming construct called a recovery block which combined checkpointing and backward recovery with retry by a diverse variant of the code. This allowed redundancy and design diversity to be hidden inside program blocks (Fig. 1).

ensure Acceptance Test
by Alternate1
else by Alternate2
. . .
else by Alternaten
else error

Fig. 1: The recovery block construct

The Alternatei modules are diverse implementations of the function of the whole recovery block, so that if an execution (by Alternate1, for instance) fails the Acceptance Test, the following retry, by, for instance, Alternate2, may not repeat the same error. Each retry executes a different Alternatei (and is subject to the same Acceptance Test), and exchanges messages that differ (in their sequence, or in their contents) from those exchanged by the previous alternate. So, co-ordinated roll-back is necessary. To this end, the same paper proposed conversations. A conversation (Fig. 2) can be described as a multi-process recovery block: when two or more processes enter a conversation, each must checkpoint its state, and they may only leave the

conversation (thus committing the results computed during the conversation, and discarding their checkpoints) by consensus that all their acceptance tests are satisfied. Processes can asynchronously enter a conversation, but all must leave it at the same time. During the conversation, they must not communicate with any process outside the conversation itself (violations of this rule are called information smuggling). So, the occurrence of an error in a process inside a conversation requires the roll-back of all and only the processes in the conversation, to the checkpoint established upon entering the conversation. Conversations may be nested freely, meaning that any subset of the processes involved in a conversation of nesting level i may enter a conversation of nesting level i + 1.

[Figure: P, Q, R: processes. The diagram marks each process's entry into a conversation (local checkpoint), its local request to close the conversation, the global close of the conversation (commit/roll-back), and the conversation boundaries.]

Fig. 2: Two nested conversations

[Randell 1975] did not specify an implementation language construct for the conversation scheme. Several proposals have later appeared of language constructs and implementations, differing in the resulting semantics of conversations. A critical survey can be found in [Di Giandomenico and Strigini 1991]. This latter paper concludes that the simpler implementations of conversations, based on static membership and restricted communication mechanisms, appear useful, albeit for restricted classes of applications (e.g., cyclic control systems); extensions to accommodate the needs of a wider class of applications are paid for in terms of a less controllable execution and added complexity in the run-time support, and their usefulness appears questionable.

The results developed in the present paper apply to a rather wide class of implementations of conversations. We only assume that support is provided for some kind of conversation. A simple model is as follows. The interpreter provides application processes with enter_conversation, close_conversation and abort_conversation calls. The enter_conversation call may, in some implementations, carry as a parameter a conversation name or an explicit list of participants; additional controls on the legality of a process entering a given conversation may exist ([Di Giandomenico and Strigini 1991] discusses in some detail the different options for naming and protection). Both local acceptance tests (inside the participating processes) and a global acceptance test could exist; the latter would allow checks on conditions that only hold after all the participants have concluded their respective parts of the

conversation, or that involve the contents of variables which are not all visible to any one participant (how and whether this could be practically implemented depends on details of the operating system architecture and the computational model presented to the application programmer).

For an application process P, the interpreter executes these calls as follows:

enter_conversation(conversation_name): check that it is legal for P to participate in conversation conversation_name; checkpoint the state of P; record P among the participants in conversation_name (this information is also used in preventing communication between the participants in the conversation and processes outside it); activate the first alternate of P in conversation_name.

close_conversation (it is not necessary to specify the conversation name: it is the innermost conversation P is in; let us call it C): record that P agrees to close the conversation C; suspend P until all processes in C have agreed to close C, then run the global acceptance test, if any; if it is passed, then atomically commit the results computed by all participants during the conversation and discard their checkpoints (the conversation is successfully closed), returning control to P after the close_conversation statement; if the global acceptance test fails, proceed as for abort_conversation below.

abort_conversation (as for close_conversation, it is not necessary to specify the conversation name; again let us call it C): record that each participant in C has tried its current alternate and failed; roll back all the variables modified by all the processes in the current alternate of C; if each participant has an alternate that it has not yet tried, activate it, otherwise raise an appropriate exception (or return an appropriate error message) for all participant processes.

This model is fairly general, though it may exclude some of the more complex variations.
The Ada example which we present later applies to a more restrictive model of conversations, although slightly more flexible models may well be practical with languages simpler than Ada.
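The behaviour of the three interpreter calls just described can be sketched in executable form. The following Python class is our own, deliberately simplified illustration (single-threaded, so "suspension" is modelled by a return value; local acceptance tests and retry by alternates are omitted; all names are invented):

```python
class ConversationInterpreter:
    """Toy, sequential model of enter_conversation / close_conversation /
    abort_conversation for a fixed set of processes."""

    def __init__(self, process_states):
        self.states = process_states   # process name -> mutable state dict
        self.checkpoints = {}          # conversation -> {process: saved state}
        self.participants = {}         # conversation -> set of processes
        self.agreed = {}               # conversation -> processes agreeing to close

    def enter_conversation(self, proc, conv):
        self.participants.setdefault(conv, set()).add(proc)
        # checkpoint the process state on entry
        self.checkpoints.setdefault(conv, {})[proc] = dict(self.states[proc])
        self.agreed.setdefault(conv, set())

    def close_conversation(self, proc, conv, global_test):
        self.agreed[conv].add(proc)
        if self.agreed[conv] != self.participants[conv]:
            return "waiting"           # suspend until all participants agree
        if global_test(self.states):
            del self.checkpoints[conv] # commit: discard the checkpoints
            return "committed"
        return self.abort_conversation(conv)

    def abort_conversation(self, conv):
        for p, saved in self.checkpoints[conv].items():
            self.states[p] = dict(saved)   # roll every participant back
        self.agreed[conv] = set()          # (retry by alternates omitted)
        return "rolled back"
```

For example, if two processes P and Q enter conversation C, P modifies its state, and the global acceptance test fails when the last participant closes, both are rolled back to their entry checkpoints, matching the all-or-nothing exit described above.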

2.3 General Description of Atomic Transactions

Atomic transactions [Gray and Reuter 1992; Lynch, Merrit et al. 1993] are a useful means for structuring access to shared data by multiple independent agents or clients. Atomic transactions help in solving the two problems of concurrency control and error recovery, by letting the programmer of each client see only consistent states of the data, as they evolve through a strictly sequential execution of transactions, and by (usually) not letting him/her see the effects of faults. Atomic transactions come in many flavours, which share the general concept but vary in details of their specification as well as their implementation. For our co-ordination scheme, we have assumed a quite general model of atomic transaction (which we have later specialised when specifying the example implementation in Ada), as follows. A transactional data server

component, or "TDSC", encapsulates a set of data, with a set of specific access methods which can be called by client processes by "remote procedure call" (RPC); and it has the additional, special methods start_transaction, abort_transaction and commit_transaction. Any call to methods other than these three must be bracketed between a start_transaction and an abort_transaction or commit_transaction; the whole sequence, including the bracket statements, is called a transaction. start_transaction returns a transaction_identifier; all the subsequent calls in the same transaction must carry this identifier as a parameter. We also assume that the client process which started a transaction can pass the transaction identifier to other processes, which can then operate on the TDSC concurrently with the initiator process. The transaction identifier is effectively a capability [Saltzer and Schroeder 1975] for access to the protected data in the current transaction 3.

This is a more general model of transaction than usually considered, and is motivated by the consideration that programmers of tightly interacting groups of processes may well wish to distribute the operations that are part of the same transaction among multiple processes. In this case, the programmer of the client processes is in practice seeing the set of client processes as a multi-thread process performing the transaction 4. Our proposal for co-ordinated recovery in heterogeneous systems, which can deal with this generalised model of transaction, can of course deal with single-process transactions as well. Guaranteeing serialisability is the responsibility of the programmer of the TDSC, who must implement a policy for accepting or rejecting start_transaction and commit_transaction calls (and for honouring the calls to other methods) that satisfies the consistency invariants established for the data it protects.
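As an illustration of this interface, the following Python sketch is ours, not the paper's: it models only flat, serially executed transactions, with no concurrency control, nesting or durability. It keeps an undo copy per transaction and, echoing the capability remark above, draws identifiers at random from a sparse name space:

```python
import secrets


class TDSC:
    """Minimal transactional data server: every operation must carry a
    valid transaction identifier; abort restores the data as they were
    at start_transaction."""

    def __init__(self, data):
        self.data = data
        self.live = {}                  # transaction id -> undo copy

    def start_transaction(self):
        tid = secrets.token_hex(8)      # sparse random id as a capability
        self.live[tid] = dict(self.data)  # remember the state for undo
        return tid

    def write(self, tid, key, value):
        if tid not in self.live:        # reject calls not bracketed
            raise PermissionError("operation outside any transaction")
        self.data[key] = value

    def commit_transaction(self, tid):
        del self.live[tid]              # the changes become permanent

    def abort_transaction(self, tid):
        self.data = self.live.pop(tid)  # roll back to the undo copy
```

A typical use: start a transaction, perform writes carrying the returned identifier, then either commit (the undo copy is discarded) or abort (the data revert). A call with a stale or guessed identifier is rejected, which is the capability-style protection mentioned above.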
We assume the classical "ACID" properties of transactions: atomicity (with respect to failures); consistency; isolation (or serialisability); and durability (or permanence). Our proposal is limited to a scheme for co-ordination with conversations, so we do not assume any particular way of implementing these properties; a run-time support may take care of atomicity and provide mechanisms for isolation and durability, and the application programmer (of the clients and to some extent of the TDSC) is probably responsible for consistency and any non-standard policies for isolation (application-specific concurrency control policies). Nested transactions are allowed: the start_transaction method can be called from within a transaction. The TDSC guarantees serialisability between transactions at the same level of nesting and between them and other atomic operations performed within the same parent (enclosing) transaction. We do not assume any special mechanism for co-ordinating transactions involving multiple TDSCs: if desired, they can be organised by the application programmer, possibly through an "umbrella" TDSC which takes care of starting and aborting/committing the transactions on all the other TDSCs involved.

3 Choosing the identifier in pseudo-random fashion from a sparsely populated name space can make this protection as effective as required. Of course the designers of the TDSC may also implement additional protection policies at the application level, e.g. to protect data confidentiality.

4 So, different client processes share the data encapsulated in the TDSC either by performing separate transactions, in which case serialisability is guaranteed by the TDSC itself, or by sharing the same transaction, in which case the correct co-ordination of their accesses is their responsibility. The TDSC may provide help in the form of internal consistency-check methods, and an ability to autonomously detect some of its internal errors, in which case it may abort on-going transactions and/or recover from the errors transparently.

3 Our Proposal for Co-ordinated Recovery

3.1 Principles

We now assume a system made up of "conversational" components (processes, tasks, or threads) which synchronise their checkpointing/recovery via conversation-like mechanisms, and of server components (TDSCs) which implement an atomic transaction mechanism: they accept requests (in the form of messages or remote procedure calls) to start, abort and commit transactions and serve them according to some policy that ensures proper concurrency control among competing transactions. We assume that the interpreter on which the "conversational" components run accepts the enter_conversation, close_conversation and abort_conversation requests defined in Section 2.2, and that the division of components into conversational components and TDSCs is known to this interpreter. In addition, we assume that the interpreter is able to recognise and intercept transaction control requests (open, abort, commit) issued by the conversational components towards the TDSCs.

Our proposal is then to realise programmer-transparent co-ordination between the recovery of the conversation-based and transaction-based subsystems. Our rationale is that, while programmers of an individual subsystem should be trusted to use the mechanisms available there properly, it seems excessive to ask them to plan the co-ordination of heterogeneous components, presumably developed separately, which implement different models of computation and interact in possibly complex ways. If a component requests operations on a transactional component while engaged in a conversation, it must be ensured that these operations can be undone if the conversation were to be rolled back. The TDSC only provides an undo operation (the abort_transaction) for sequences of operations starting with a start_transaction (and not including the corresponding commit).
So, a first rule for the programmer of the client component is:

RULE 1: Any operation on a TDSC that is requested by the client process while participating in a conversation must be part of a transaction started within the innermost current conversation (we say that that conversation creates the transaction).

The interpreter must block, and signal as errors, all calls that are not so bracketed. We already assumed that TDSCs always reject requests which are not part of a transaction (notice that transactions may also be started by a process which is not involved in any conversation); what we are adding now is a prohibition of sequences like:

transaction_id := start_transaction(TDSC1);
...
open_conversation;
...
operation(transaction_id, TDSC1);
...
close_conversation;
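A minimal sketch of how the interpreter could enforce Rule 1 follows; the class and method names are ours, purely illustrative, and in practice this check would live in the run-time support intercepting the client's TDSC calls:

```python
class Rule1Error(Exception):
    """Raised when a TDSC call violates Rule 1."""


class Rule1Interceptor:
    """Per-process bookkeeping: a TDSC operation must use a transaction
    created within the innermost conversation currently open (Rule 1)."""

    def __init__(self):
        self.conv_stack = []   # conversations this process is in, outermost first
        self.creator = {}      # transaction id -> conversation that created it

    def enter_conversation(self, conv):
        self.conv_stack.append(conv)

    def start_transaction(self, tid):
        # record the innermost conversation (None if outside any conversation)
        self.creator[tid] = self.conv_stack[-1] if self.conv_stack else None

    def check_operation(self, tid):
        innermost = self.conv_stack[-1] if self.conv_stack else None
        if self.creator.get(tid) != innermost:
            raise Rule1Error(
                "transaction was not started in the innermost current conversation")
```

With this check, the prohibited sequence shown above (start_transaction, then open_conversation, then an operation on the same transaction) raises an error, while a transaction started inside the conversation passes.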

Rule 1 requires a transaction to be started in the TDSC before any operation on its data can be performed (by components in the conversation considered). In practice, it requires the TDSC to "remember" the state of those data before they are changed by these operations, a necessary condition for these operations to be "undoable" if the conversation in which they were performed has then to be rolled back. This ability must be preserved so long as the possibility of the conversation being rolled back exists. So, the interpreter must also delay the commit of transactions, lest a subsequent roll-back of a conversation leave the effects of the transaction standing. A transaction can only be committed (i.e., the changes it made can be made permanent) at the same time as a conversation in which it was created is committed (that is, after it has passed its acceptance test[s]), and must be aborted if the conversation does not pass its acceptance tests and therefore aborts (rolls back) the current alternate. We now consider how long the commit of the transaction is to be delayed:

1. If the transaction is nested inside another transaction on the same TDSC, but both were created by processes in the same conversation, the inner-nested transaction can be committed without delay: its effects can still be undone, if necessary, by aborting its parent transaction.

2. Considering instead the outermost transaction opened inside a conversation (on a given TDSC), the rule for delaying its commit is more complex. If the conversation that created the transaction is the outermost conversation currently open, then the commit of the transaction must be delayed until the conversation itself commits: at that point, it is legal to make all the effects of that conversation visible to components outside it. However, more complex situations may exist. In Fig. 3, transaction T2 is created by conversation C3. If T2 were committed at the end of C3, its results would be visible to component P, outside C2. If C2 then rolled back, this visibility would become a case of information smuggling through the borders of C2. So, T2 must only be committed when C2 commits. Likewise, if T1 did not exist (that is, if T2 were the outermost transaction on component Q), T2 should not be committed until C1 commits. We can say that the commit of a transaction T must be delayed until the commit of its associated conversation, which can be found as follows.

DEFINITION: The conversation C associated with transaction T is the innermost conversation such that: i) C encloses the opening of T, i.e., the process starting T has previously opened, and not yet committed or aborted, C; and ii) either C is a top-level conversation, or the conversation immediately enclosing C has an associated transaction of which T is a sub-transaction (at any level of nesting).

Notice that at run-time the associations between conversations and transactions are easily determined as transactions are opened: an appropriate naming scheme for conversations can provide the nesting information needed.

In the case that multiple processes can take part in the same transaction, as allowed by our generalised model, a second rule must be imposed on the programmers of client components to prevent information smuggling between conversations:

RULE 2: If a transaction has multiple participants, these must belong to the same enclosing (innermost) conversation.

In other words, the participants in a conversation can use the data protected by TDSCs as part of the data modified (but in an "undoable" way) by the conversation, but no process outside the conversation can. This rule can easily be enforced by the interpreter: the transaction must be created when the creator process is already in the conversation (Rule 1), and thereafter, due to the very protection mechanism of conversations, it can only communicate the transaction identifier to other participants in the same conversation.

Last, the fact that the request to commit a transaction is not executed until the commit of its associated conversation implies a third rule for the programmer (not enforceable by the interpreter):

RULE 3: Two transactions at the same level of nesting can be opened within the same conversation only if they are known not to interfere with each other; otherwise deadlock may ensue.

This rule does not address the possibility of deadlock on shared TDSCs in general, but only deadlock arising from the forced delay in committing transactions due to the mechanism for co-ordination with conversation-based components. In this context, "two transactions interfere" means "the TDSC would want (based on its concurrency control policy) to postpone the commitment of one until the other is done", rather than "their order affects the final state of the TDSC" (often called dependency between transactions). Two interfering transactions would cause deadlock: the TDSC would never commit the first transaction until the conversation finished (due to our requirement that the actual commit be deferred), and the second transaction could not be processed until the first was committed, but the initiator of the second transaction would be unlikely to agree to commit the conversation until the second transaction had been processed.
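The deadlock argument for Rule 3 is a cycle in a wait-for relation, which can be made explicit. The sketch below (Python, with invented names; not part of the specification) encodes the three waits described in the text and checks for the cycle.

```python
def waits_for(edges, start):
    """All nodes transitively reachable from `start` in a wait-for graph."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for succ in edges.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen

# The three waits of the Rule 3 scenario, for two interfering transactions
# T1, T2 opened at the same nesting level inside conversation Conv:
edges = {
    "commit(T1)": ["close(Conv)"],   # T1's actual commit is deferred to the close
    "close(Conv)": ["process(T2)"],  # T2's initiator will not close before T2 is done
    "process(T2)": ["commit(T1)"],   # the TDSC postpones T2 until T1 commits
}
deadlock = "commit(T1)" in waits_for(edges, "commit(T1)")
print(deadlock)  # True: a cycle, hence deadlock until a time-out intervenes
```

Removing any edge — for instance the third one, by guaranteeing non-interference as Rule 3 demands — breaks the cycle.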
Only a time-out could break the deadlock. Whether two transactions interfere is thus a matter of how the TDSC is implemented internally. Obeying Rule 3 is the responsibility of the client's programmer. Conditions for interference must be notified to the clients' designers, for instance in the documentation of the TDSC, by providing a special method for this purpose, or by a special return code if the parameters of a start_transaction allow the TDSC to forecast possible interference. A simple solution (not uncommon in databases) is to avoid interference altogether by an optimistic policy in the TDSC: the TDSC accepts all start_transaction calls but, as soon as it perceives that two on-going transactions are interfering, aborts one of them. In this case, the programmers of the clients would never have to worry about Rule 3. Fig. 3 shows some applications of these rules. Rules 1 and 2 are easy to enforce automatically. Rule 3 creates a more serious problem, in that interference between transactions is a function of the internal organisation of the TDSC.

Our model is asymmetric. Conversations (being collections of processes) consist of (and could themselves be seen as) active, or client, entities. Transactions are sets of operations performed by these entities on passive TDSCs. The asymmetry of our model is manifest in that a transaction created by a conversation is affected (e.g., by delaying its commit) by the behaviour of the processes in that conversation. Instead, if a process P starts a conversation while

engaged in a transaction on a TDSC, the processes in the conversation are not affected by that TDSC, except insofar as they attempt to access the TDSC. We shall see that it makes sense to visualise, in our model, conversations as creating or owning transactions, but not vice versa. This asymmetry implies, for us, that the organisation of recovery between clients and TDSCs sees the client as the caller of an "idealised fault-tolerant component" [Lee and Anderson 1990]. Thus, a TDSC detecting an internal error may either treat it internally, hiding it from the client, or abort the transaction and signal the client. For instance, we shall specify that a transaction abort, generated by causes internal to the TDSC, should not automatically abort the conversation of the client. The client is assumed to be better equipped for dealing with errors in the server according to the client's own goals.
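This asymmetric error policy — the server signals a transaction abort, the client reacts according to its own goals without its conversation being aborted automatically — can be sketched as follows (Python; the names and the retry policy are our invention, used only to illustrate one possible client reaction).

```python
class TransactionAborted(Exception):
    """Signalled by the TDSC when it aborts a transaction for internal causes."""

class FlakyTDSC:
    """A toy TDSC whose first transaction attempt aborts internally."""
    def __init__(self):
        self.calls = 0
    def run_transaction(self):
        self.calls += 1
        if self.calls == 1:
            raise TransactionAborted()   # internal cause, signalled to the client
        return "done"

def client_update(tdsc, attempts=2):
    """One possible client reaction: retry a non-essential update locally.
    The enclosing conversation is never aborted by the server's failure."""
    for _ in range(attempts):
        try:
            return tdsc.run_transaction()
        except TransactionAborted:
            continue                     # purely local reaction: try again
    return None                          # give up on this non-essential update

print(client_update(FlakyTDSC()))  # done
```

Other reactions named in the text — attempting a different, equivalent transaction, or giving up entirely — fit the same shape: the decision stays in the client.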

[Figure: time-lines of the processes P and Q of a 'conversational' component interacting with a 'transactional' component. P and Q enter nested conversations C1, C2 and C3; transactions T1 and T2 are opened on the transactional component, and their actual commits are deferred to the commits of their associated conversations. The legend distinguishes: entering a conversation (local checkpoint); conversation boundaries; local close-conversation requests and the global close of a conversation (commit/roll-back); 'open transaction' and 'commit transaction' requests and other requests to the transactional component (replies are not shown); transaction start and commit; the link between a transaction and its creator conversation; the link between a transaction and its associated conversation (which determines the actual commit time of the transaction); and a mark (X) for a forbidden communication.]
Fig. 3: Scenarios of co-ordinated conversations and transactions

The main implementation problem for our co-ordination scheme is that the closing of a conversation must be implemented as an atomic commit of the conversation and all its associated transactions. There are two solutions: restricting each conversation to work on one TDSC only (more precisely, to be associated with only one transaction [5]), or having TDSCs organised for supporting atomic commit, viz. two-phase commit. We assume in the following discussion that TDSCs export a prepare_to_commit operation which can be used for two-phase commit. This allows an additional improvement in efficiency: when the client process issues a commit_transaction, the interpreter translates it into an immediate prepare_to_commit request to the TDSC. This: i) allows the TDSC to signal back to the client any failure to commit [6]; ii) notifies the TDSC that the transaction in question will contain no more operations before its actual commit, which is useful information for the TDSC's concurrency control management; iii) possibly simplifies and speeds up the atomic commit phase for the multiple transactions involved at the time of the close_conversation call, thus recovering some of the performance loss due to the forced delay of the transaction commit operations requested by the client processes.
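The eager translation of commit_transaction into prepare_to_commit can be sketched as follows. This is an illustrative Python sketch with invented interfaces (a real TDSC would run a full two-phase commit protocol, possibly across several distributed servers); it shows only the split between the eager first phase and the deferred second phase.

```python
class DummyTDSC:
    """A toy TDSC exporting the two phases of commit."""
    def __init__(self):
        self.prepared, self.committed = set(), set()
    def prepare_to_commit(self, t):     # phase 1: may fail, detectably
        self.prepared.add(t)
        return True
    def commit(self, t):                # phase 2: only for prepared transactions
        assert t in self.prepared
        self.committed.add(t)

class Interceptor:
    """Stands in for the interpreter mediating between client and TDSC."""
    def __init__(self, tdsc):
        self.tdsc, self.pending = tdsc, []
    def commit_transaction(self, t):
        # phase 1 runs eagerly: commit failures reach the client now, and the
        # TDSC learns that the transaction will issue no further operations
        if self.tdsc.prepare_to_commit(t):
            self.pending.append(t)      # actual commit deferred ("commit requested")
            return True
        return False
    def close_conversation(self):
        for t in self.pending:          # phase 2: all deferred commits together
            self.tdsc.commit(t)
        self.pending.clear()

srv = DummyTDSC()
icp = Interceptor(srv)
icp.commit_transaction("T1")
assert "T1" not in srv.committed        # prepared, but not yet committed
icp.close_conversation()
assert "T1" in srv.committed            # committed atomically with the conversation
```

The client sees an ordinary commit_transaction call; the deferral is entirely the interceptor's business.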

3.2 Detailed Specification of the Operations

We specify here in more detail how the interpreter supporting the conversation model can be extended for the co-ordination of recovery between TDSCs and processes in conversations. This implies intercepting all the operation requests (and the corresponding replies) involved in the interaction between conversational processes and TDSCs. These include those already examined when treating pure conversations in Section 2.2. In addition, the communication primitives regarding calls to TDSCs (remember that we assume communication with TDSCs by RPC) are also intercepted and processed by this extended interpreter. Inter-process communication calls would be intercepted by the conversation interpreter in any case, to prevent information smuggling; here this processing is extended with the actions peculiar to the co-ordination of recovery.

[5] This would not eliminate the need for both the transaction and the conversation to be one atomic operation. It would simply reduce the duration of the window of vulnerability during which a fault could cause one of the two to abort and the other to commit. Any fault-tolerant protocol must of course be designed to tolerate the fault situations which are expected to arise with non-negligible probability. With multiple TDSCs involved, in a distributed environment, the two-phase commit ability which we describe seems absolutely necessary.

[6] If at the time of committing the conversation, a TDSC found itself unable to commit a transaction (for any reason: internal failure, an optimistic concurrency control policy depending on aborting conflicting transactions, ..), the client's conversation would need to abort (roll back) the current alternate. This may be too drastic a reaction, and a waste of computation. The client's programmer may prefer to be able to specify an alternative action locally to the alternate in execution, without aborting it. This reaction could be purely local to the client process, e.g. attempting a new, identical transaction, or a different, equivalent one; or, if the transaction was for desirable but non-essential data updates, the client could just give up on this update. If the problem in the TDSC is only detected after invoking the close_conversation statement, which does not allow an error return code (either the alternate commits or it aborts), the programmer would not have this option.

The state information needed by this interpreter is logically described as an acyclic directed graph, which we call the co-ordination graph (CG), containing two kinds of nodes and three kinds of oriented arcs. There is one CG for each active top-level conversation. The nodes are:

- conversation nodes, each labelled with the name of a conversation, and corresponding to the currently active conversations, i.e., conversations for which a process has issued an enter_conversation and the corresponding close_conversation or abort_conversation has not yet been executed; and

- transaction nodes, each labelled with a pair {TDSC name, transaction identifier} (we shall use the notation "Serv:Trans" to denote transaction Trans on TDSC Serv), and representing the active transactions, i.e., transactions for which a start_transaction has been accepted by the TDSC and a commit_transaction or abort_transaction has not yet been accepted.

An arc in the co-ordination graph may be:

- a parent-child arc, connecting nodes of the same type: the child node denotes a conversation (or a transaction) directly nested inside the conversation (or transaction) corresponding to the parent node. Parent-child arcs group nodes of the same kind into trees (conversation trees and transaction trees);

- a creation arc. A creation arc connecting a conversation node Conv to a transaction node Trans means that Trans is the first transaction started on a given TDSC by any process among those currently executing Conv (note that in our asymmetric model a conversation may create a transaction, but a transaction cannot create a conversation);

- an association arc, which connects a transaction node to its associated conversation node, found according to our definition in Section 3.1. Notice that not all transactions have an associated conversation: for instance, if a process in a conversation Conv starts a transaction Trans1 and then a nested transaction Trans2 on the same TDSC, Trans2 has no associated conversation. Trans2 actually has neither an association arc nor a creation arc, but is connected in the CG by the parent-child arc from its parent transaction.

Both conversation and transaction nodes have associated descriptors (respectively, conversation descriptor and transaction descriptor). A conversation descriptor stores the following information: 1) the list of processes involved in the conversation; 2) information for managing commit and roll-back (e.g. pointers to copies of checkpointed state, information about which processes are ready to commit, and other temporary variables used by the atomic commit protocol); and, optionally, 3) information about which processes are allowed to enter this conversation (in some models of conversations, the set of processes that may enter a conversation is easily determined at compile time, and the information could be stored in run-time data structures for added protection). A transaction descriptor stores: 1) information about the state of the transaction, which can be "ongoing" or "requested to commit" (after a commit has been requested by an application process, but not yet executed: the interpreter will deny all requests to operate in a transaction whose state is "requested to commit"); and 2) information to be used by the multiple atomic commit protocol (for example, information necessary to realise a two-phase commit).

The co-ordination graph is generated, updated or examined by the interpreter when executing the operations involved in the interaction between conversational processes and TDSCs, that is: enter_conversation, close_conversation, abort_conversation (we assume that these three are seen by the application process as procedure calls or system calls), start_transaction, commit_transaction, abort_transaction and all other calls to methods of TDSCs. The high-level specification of these operations, including all the actions performed on the co-ordination graph structure, is given below. The conversation tree subset of the co-ordination graph contains the information necessary for checking the legality of inter-process communication (when the operations send, receive, write operations in shared memory, etc. are invoked by a process inside a conversation). To simplify this description, we now define a few properties and symbols for our entities. Let Proc, Conv and Serv:Trans be a generic process, conversation, and transaction, respectively:

1) Proc in conversation means that Proc is a participant in at least one conversation. In terms of CG, it means that at least one conversation node Conv exists in whose descriptor Proc is registered as a participant;

2) Serv:Trans committable means that Serv:Trans can be immediately committed at the moment the commit_transaction is invoked, that is, Serv:Trans is not associated with any conversation and no nested transactions are open inside Serv:Trans. In terms of CG, it means that the node Serv:Trans is not directly owned by a conversation (that is, no creation arc exists for the node Serv:Trans) and it is a leaf in its transaction tree;

3) Proc has right to abort Serv:Trans is a predicate on Proc and Serv:Trans: it is true if Proc can legally abort Serv:Trans, that is, Proc is a participant in the conversation inside which Serv:Trans was started. This is actually a necessary condition for Proc to know the transaction identifier (capability) Serv:Trans. In terms of CG, it means that there is a directed path from a conversation node in whose descriptor Proc is registered as a participant to the node Serv:Trans, starting with a creation arc possibly followed by one or more parent-child arcs;

4) Conv owns Serv:Trans is a predicate on Conv and Serv:Trans: it is true if the transaction Serv:Trans was started inside conversation Conv, or if a transaction Serv:Trans^ started inside Conv exists such that Serv:Trans is nested inside Serv:Trans^ and all the transactions, if any, that are ancestors of Serv:Trans and descendants of Serv:Trans^ were also started inside Conv. In terms of CG, it means that a creation arc exists directly between nodes Conv and Serv:Trans, or between Conv and the nearest ancestor node (according to parent-child arcs) of Serv:Trans for which a creation arc exists;

5) Proc has right to commit Serv:Trans is a predicate on Proc and Serv:Trans: it is true if Proc can legally commit Serv:Trans, that is, Proc is a participant in the conversation inside which Serv:Trans was started (that is, the conversation owning Serv:Trans) and no nested transactions are open inside Serv:Trans. In terms of CG, it means that Proc has right to abort Serv:Trans and Serv:Trans is a leaf in its transaction tree;

6) CinnerProc represents the innermost active conversation in which Proc is a participant. In terms of CG, it is the last node, on the path starting from the root of the conversation tree, in whose descriptor Proc is registered as a participant.
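Two of these predicates are sketched below over the same simple dictionaries as before (Python, invented data shapes; for brevity the sketch checks only the direct creation arc, not the path through parent-child arcs to the nearest ancestor with a creation arc).

```python
def committable(node, creation, tx_parent):
    """Serv:Trans committable: no creation arc, and a leaf of its transaction tree."""
    has_children = any(p == node for p in tx_parent.values())
    return node not in creation and not has_children

def has_right_to_commit(proc, node, participants, creation, tx_parent):
    """Proc participates in the conversation owning node, and node is a leaf.
    (The path through parent-child arcs is not followed in this sketch.)"""
    has_children = any(p == node for p in tx_parent.values())
    conv = creation.get(node)
    return conv is not None and proc in participants.get(conv, ()) and not has_children

# T1 created by conversation C (participant P); T2 nested inside T1
creation = {("Serv", "T1"): "C"}
tx_parent = {("Serv", "T1"): None, ("Serv", "T2"): ("Serv", "T1")}
participants = {"C": {"P"}}

assert not committable(("Serv", "T1"), creation, tx_parent)   # creation arc, open child
assert committable(("Serv", "T2"), creation, tx_parent)       # inner nested: commits at once
assert not has_right_to_commit("P", ("Serv", "T1"), participants, creation, tx_parent)
```

The second assertion mirrors the observation of Section 3.1 that an inner nested transaction opened within the same conversation can be committed without delay.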

Notes on this specification:

- all operations update the CG in such a way that at every instant it represents the real situation: the conversation and transaction nodes in the CG (with their connection arcs) correspond to the active conversations and transactions in the system;

- for simplicity, we omit the details of those operations required only of the pure conversation-support part of the interpreter (e.g. the checkpointing of processes at the start of conversations), and all redundant checks of the consistency of the CG that could be included in an actual implementation;

- in the pseudo-code below, "Proc" designates the process issuing an operation request;

- comments are bracketed with "/*" and "*/";

- the notation Serv.<operation> means the invocation of the operation <operation> exported by the TDSC called Serv;

- the pseudo-code that implements an operation requested by the application process, say abort_conversation(in conversation_name=Conv*), contains invocations of internal operations of the interpreter, including (in this case) "abort conversation Conv*". This is not a recursive definition. The former operation is seen by the application programmer of the client process, and hides a series of operations by the interpreter, including "abort conversation Conv*"; the latter operation (called in the body of one of our pseudo-code subprograms) is the "standard" conversation abort (undo all memory changes, move the program counter to the beginning of the next alternate or, if the alternates are exhausted, to an exception handler). Likewise, when the interpreter intercepts a call issued by a client to a TDSC, the interpreter's operations will include forwarding the start_transaction call to that TDSC.

enter_conversation(in conversation_name=Conv*)
begin
  if (Conv* is not in CG) then
    if not (Proc in conversation) then
      /* update CG: */
      create a conversation node labelled Conv* and use it as root of a new conversation tree;
      register Proc among the participants in Conv*
      /* operation ends here */
    else /* Proc is in a conversation */
      /* update CG: */
      create a conversation node labelled Conv* and register Proc among the participants in Conv*;
      create a parent-child arc from CinnerProc to Conv*
      /* operation ends here */
  else /* Conv* exists in CG */
    if (Conv* is in CG as child of CinnerProc) then
      /* update CG: */
      register Proc among the participants in Conv*
      /* operation ends here */
    else
      return an error to Proc /* illegal request */
end.

close_conversation( )
begin /* assume CinnerProc=Conv* */
  find the set S of transactions associated with Conv* and the set Q of transactions owned by Conv*;
  if (Q includes transactions in "ongoing" state) then
    /* error: for each transaction there should already have been a request to commit or abort it */
    abort Conv*;
    send abort_transaction requests to TDSCs for all transactions in S and Q [7];
    /* update CG: */ delete nodes in S, Q and node Conv*
    /* operation ends here */
  else /* multiple-atomic commit of Conv* plus transactions in S */
    if (all transactions in S and the conversation Conv* are ready to commit) then
      for each Serv involved in S: Serv.commit_transaction(all transactions in S on TDSC Serv);
      commit conversation Conv*;
      /* update CG: */ delete nodes in S and node Conv*
    else
      for each Serv involved in S or in Q: Serv.abort_transaction(all transactions in S and Q on TDSC Serv);
      abort Conv*;
      /* update CG: */ delete nodes in S, Q and node Conv*
end.

abort_conversation( )
begin
  find the set R of transactions associated with or owned by Conv*;
  send an abort for each of them and abort Conv*;
  /* update CG: */ delete all nodes in R and node Conv*
end.

start_transaction(in TDSC=Serv, in transaction_type=..., in parent_id=.., out transaction_id, [in/out TDSC-specific parameters])
begin
  result := Serv.start_transaction(..., .., transaction_id, [TDSC-specific parameters]);
  if (result ≠ OK) then /* operation ends here */
  /* else: start_transaction was successful */
  if not (Proc in conversation) then /* no co-ordination needed: operation ends here */
  /* else: Proc is in a conversation */
  /* update CG: */
  create a transaction node labelled {Serv:transaction_id} with state="ongoing";
  if (parent_id=Null or not (CinnerProc owns Serv:parent_id)) then
    create a creation arc from CinnerProc to this transaction node;
  create the appropriate association arc for this transaction node;
  if (parent_id=Null) then
    use this transaction node as root of a new transaction tree
  else
    create a parent-child arc from Serv:parent_id to Serv:transaction_id
end.

commit_transaction(in TDSC=Serv, in transaction_id=Trans)
begin
  if not (Proc in conversation) then
    result := Serv.commit_transaction(Trans)
    /* Trans was not recorded in any CG, no need to update a CG */
  /* else: Proc is in a conversation; Trans is recorded in its CG */
  else if not (Proc has right to commit Serv:Trans) then
    return an error to Proc
  /* else: Proc has a right to commit Serv:Trans */
  else if (Serv:Trans committable) then
    result := Serv.commit_transaction(Trans);
    /* update CG: */ delete node Serv:Trans
  else
    result := Serv.prepare_to_commit(Trans);
    if (result=OK) then
      /* update CG: */ set Serv:Trans state to "commit requested"
    else
      return result to Proc [8]
end.

abort_transaction(in TDSC=Serv, in transaction_id=Trans)
begin
  if not (Proc in conversation) then
    result := Serv.abort_transaction(Trans) /* operation ends here */
  if (Serv:Trans is in state "commit requested") then
    /* operation ends here */
  if not (Proc has right to abort Serv:Trans) then
    return an error to Proc
  else
    result := Serv.abort_transaction(Trans);
    if (result="abort executed") then
      /* update CG: */ delete node Serv:Trans and all its descendants
end.

Each request to a method of a TDSC is treated as follows:

method_name(in TDSC=Serv, in transaction_id=Trans, [in/out parameters specific of the method])
begin
  if not (Proc in conversation) then
    result := Serv.method_name(Trans, [parameters specific of the method]) /* operation ends here */
  /* else: Proc is in a conversation */
  if (Serv:Trans is in state "ongoing") then
    result := Serv.method_name(Trans, [parameters specific of the method]) /* operation ends here */
  else
    refuse the request /* Serv:Trans is in state "commit requested" */
end.

[7] We are thus creating a "conservative" interpreter, which requires the application programmer always to close all the transactions he/she has opened. An alternative specification would be that of a friendlier interpreter, which helps the application programmer to "clean up" at the end of a conversation. Such an interpreter would at this point atomically commit Conv* plus all transactions in S or in Q. A caution to be used then is to commit the transactions in the appropriate order (bottom-up along their order of nesting): one could argue that application programmers relying on such a powerful aid would easily lose control of the effects of multiple transactions.

[8] We choose not to automatically abort the conversation. The rationale of this choice is explained at the end of Section 3.2 and in footnote [6].

4 An Ada Implementation: Specification and an Example

4.1 Purposes and General Approach

This section specifies a possible implementation of the ideas described so far. Its purpose is to demonstrate that these ideas could work in practice, and to validate our specification of recovery co-ordination by studying the problems in mapping it to an actual programming language. We doubt that the effort of building a full-fledged interpreter for our proposed scheme could be justified outside a large application project, and we have thus accepted an implementation which probably has limited applicability. However, describing it will provide a demonstration of feasibility, and some insight into what can be achieved with the means we chose. We have chosen the Ada language, for reasons explained later, as the vehicle for specifying an implementation. The general approach is to implement the functions of the interpreter (supporting conversations and co-ordinated recovery) in application code rather than in the run-time support for the language. The operations and data structures required are therefore part of the code and variables of the application programs. The proper manipulations of these variables have to be added to the code of the alternates, following a set of programming conventions which we shall specify in detail. The best solution is then to offer the application programmers an extended Ada "dialect", including the calls for manipulating conversations and transactions, which a pre-processor can translate into Ada (we found it impossible to use just libraries and keep the "dialect" as simple as seemed reasonable). This can be modelled as a layered implementation, in the style of [Brownbridge, Marshall et al. 1982], as in the figure below, where the intermediate layer which intercepts the calls to the basic kernel (here the Ada run-time support) is implemented by the pre-processor rather than by a separate run-time component. To use Ada without changes to its run-time support, we have specified a rather restrictive model of conversations. For atomic transactions, we have allowed a rather general model.

Turning now to our choice of Ada: we would ideally have used existing languages and operating systems already offering facilities for concurrency, atomic transactions and conversations. No such ready-made system is available. For instance, Arjuna [Shrivastava, Dixon et al. 1991] offers support for building atomic transactions, but no support for concurrency outside data sharing through transactions. It appeared necessary at least to choose between having a concurrent language (which would spare us the need to deal with operating system calls directly) and atomic transaction facilities. In the end, we chose Ada as a well-known language with built-in concurrency and inter-process communication.

[Figure: four layers — application code using conversations; the interpreter for conversations and co-ordinated recovery (which knows the identity of the TDSCs and propagates requests to them); application code in the TDSCs (possibly using some support for, e.g., stable memory, or commit protocols for distributed data); and the basic kernel (process activation, inter-process communication), here the Ada run-time support.]

Fig. 4: The structure of the proposed Ada implementation

For extending an existing language the way we are considering, a full-fledged object-oriented language would be the natural candidate: one could build new libraries of classes with the required properties, and the user could use them with very little concern for their internal implementation. However, we could not find a suitable concurrent object-oriented language. There are several approaches to concurrency in object-oriented languages [CACM 1993; Wyatt, Kavi et al. 1992], with Concurrent C++ [Gehani and Roome 1988] at one extreme and the Actor-family languages [Agha 1990] at the other. Generally speaking, such languages support concurrency of either objects or processes. Attempting to build a conversation tool within concurrent objects seemed unnatural, while the languages with concurrency of processes did not meet our practical requirements. Ada [ANSI 1983] has neither conversation nor atomic transaction tools, but it has data encapsulation with abstract data type features which can be used to implement atomic transactions. Its concurrency features are quite powerful, allowing both interacting processes with equal rights (for building a conversation scheme) and processes in client-server relations (for atomic transaction tools). Therefore Ada appeared a good choice for building an understandable demonstration with some chance of being used in practice. We now describe the three steps of the case study: the specifications of the conversation scheme, of the TDSCs, and of the support for their joint recovery.

4.2 Implementation of Conversations in Ada

Most implementations proposed for the conversation approach [Randell 1975] consist of extensions of conventional concurrent languages. We wanted to find an approach that could be used within the standard Ada language. After considering all known schemes, we chose a restrictive scheme very similar to Kim's concurrent recovery block scheme [Kim 1982] and, to some degree, to Anderson and Knight's exchange scheme [Anderson and Knight 1983]. The restriction is that conversations are only allowed among a set of processes spawned together (Fig. 5): a process cannot freely form conversations with arbitrarily chosen other processes in the system. Hence the name "concurrent recovery block": from the outside, an alternate is indistinguishable from a sequential block of execution, like an alternate in a sequential recovery block.

[Figure: a process P reaches a fault-tolerant unit; one alternate spawns the concurrent tasks p1, p2, p3, p4, another the tasks pa, pb, pc, which all terminate before control returns to P.]

Fig. 5: Concurrent recovery block

4.2.1 Conversations in Ada

In our scheme, a fault-tolerant unit of software is an Ada procedure (a similar scheme could be realised for functions), and will be called a "fault-tolerant procedure", or ft-procedure. It is built as a set of alternates, each a procedure with the same interface (formal parameters) as the whole ft-procedure. The caller of the subprogram does not need to be aware of either its fault-tolerant implementation or its internal concurrency. Any concurrency occurs within an alternate, where multiple tasks can be spawned; these tasks communicate, by rendezvous or by data sharing, within the alternate. Different alternates may have different numbers of tasks. An alternate is completed when all of its tasks have terminated. Each task contains "local" acceptance tests, checking variables accessible to that task, and a "global" acceptance test may be specified to be performed after the spawned tasks have terminated but before returning from the fault-tolerant subprogram (i.e., committing the conversation). If any test fails, the next alternate is executed. If all alternates fail, a FAILURE exception is raised. Conversations can be nested, which means that any task can call a subprogram which is in its turn structured as a conversation. Each task can use two statements to terminate an alternate: one to signal that the local acceptance test (any check internal to the task) is not satisfied, and the other that the task completed correctly. Tasks which do not call either statement are eventually aborted via a time-out. The specification of the acceptance tests is of course to be decided based on the specification and details of the ft-procedure. However, it includes, as a minimum, that all exceptions raised during the execution of an alternate must be treated without propagating outside the ft-procedure; a catch-all handler is automatically provided for this purpose, to abort the alternate upon unforeseen exceptions. The treatment of exceptions obviously requires some caution. In each alternate, each task is allowed to have its own handlers for its exceptions and for the FAILURE exceptions arising from its own calls to (nested) ft-procedures: all these are local problems of the given task. If it fails to process an exception, it will be aborted in due time. In addition, we introduce a deadline mechanism: the programmer can bound the execution time of each alternate and therefore of the whole conversation. Within the limitations of the timing primitives of the language (the specification of the timed entry call in Ada does not provide for an actual real-time deadline mechanism), the caller of the fault-tolerant subprogram is thus guaranteed to receive either the correct result, or the notification of a failure, by the time the run-time support signals the expiration of the deadline. This is especially important when a task in a conversation uses another, nested conversation, as the duration of the calling task, and hence of the alternate where it belongs, can still be bounded. We shall specify rules guaranteeing that the fault-tolerant subprogram is side-effect-free. Otherwise, "information smuggling" could not be prevented with an ordinary Ada run-time support. As a bonus, the absence of side effects also makes a checkpointing mechanism unnecessary.

4.2.2 Ada Dialect for Conversations

As explained before, we specify a dialect of Ada, obtained by adding special constructs for conversations and imposing some restrictions on the allowable uses of the standard parts of the language.
This dialect is meant to be paired with an appropriate pre-processor, which would translate a program written in the dialect into conventional Ada. In Section 4.2.3 we will discuss how conversations could be programmed in standard Ada (this will be described as a set of conventions and as a skeleton for programmers to follow) and hence the peculiarities of our pre-processor (which is then specified in Appendix B). Our dialect has statements similar to those often used in the literature to describe conversation schemes (and especially the concurrent recovery block). Ada is quite a complex and flexible language, which gives a designer many choices and opportunities. Our dialect does not offer all these opportunities, because that would require an extremely complex pre-processor. Thus, we propose a special way of developing an ft-procedure. The declaration of an ft-procedure should be given in the following way (we use bold italic for our new constructs, bold for Ada keywords):

ft_procedure Name( declaration of parameters);

An ft-procedure returns its result via its parameters, or else raises the predefined exception FAILURE. The latter signals an error which could not be recovered during the execution of the ft-procedure. The body of an ft-procedure is specified in our dialect as follows: ft_procedure Name( declaration of parameters) is

   .......  -- **
ensure Test( list of actual parameters)
   by Altern1
   else by Altern2
.......
end Name;

The global test for the given conversation is represented by the function Test. The list of actual parameters in the call to Test can be a subset of the list of formal parameters of the ft-procedure. This function should have no side effects and all its parameters should be of in mode. We allow the actual parameters to include out parameters of the ft-procedure, an exception to the Ada rules which is necessary for the global test to be useful (in Section 4.2.3 we discuss how this exception is implemented). The specifications of the Altern1, Altern2, ... and Test subprograms and their bodies can be written either in the declarative part of the ft-procedure (the position marked by the comment ** above) or outside the ft-procedure. These subprograms should be declared as follows:

alternate Altern1(declaration of parameters of ft-procedure);
alternate Altern2(declaration of parameters of ft-procedure);
.....
function Test(declaration of some of the parameters of ft-procedure) return BOOLEAN;

The body of each alternate looks as follows:

alternate Altern1(declaration of parameters of ft-procedure) within time_out1 is
   ...  -- declaration of shared variables
   ...  -- and common types for application tasks (if any)
   task T11 is entry ...; ....; end T11;
   .....
   task T1N is entry ...; ....; end T1N;
   task body T11 is
   begin
      ........
      task_done;
   end T11;
   .......
   task body T1N is
   begin
      ........
      alternate_error;
      .........
      task_done;
   end T1N;
end Altern1;

where
- T11, ..., T1N are application tasks, which are executed concurrently to form an execution of the alternate Altern1;
- time_out1 is a time-out for the execution of the alternate Altern1 (if the time-out expires, all tasks are terminated, this alternate is considered unable to ensure Test, and the next alternate, if any, is executed);
- the statement task_done is to be executed in the body of each task when it completes its execution successfully;
- the statement alternate_error is to be executed in a task when it decides that it is not able to complete its execution successfully; its effect is to abort the alternate, similarly to the effect of a time-out expiring.

These last two statements would typically be used after checking some test (invariant, condition) local to the task or to a set of tasks.

4.2.3 How to Write Conversations in Ada

We now discuss the implementation of a conversation in pure Ada. This would be the result of pre-processing a program written in our dialect. At the same time, our description gives a set of conventions which an application programmer could follow to develop conversations directly in Ada. In Appendix A we give a complete example written in standard Ada, demonstrating the use of these conventions. The details of the operation of the pre-processor, that is, the main steps of code translation and the correspondences between the code of conversations in our Ada dialect and the code in standard Ada, are described in Appendix B.

The structure of an ft-procedure is fixed: it consists of calling the alternates successively and checking their acceptance tests, until one alternate passes all tests or the alternates are exhausted. We now list the conventions that our hypothetical application programmer has to follow in order to develop a conversation in Ada using this approach. The internal structure of an alternate (alternate-procedure), given in the preceding section, is implemented as follows. The alternate consists of a set of tasks, which start when the execution of the alternate body starts. The effect of running these tasks is to assign values to the out parameters of the alternate-procedure. To control the execution of the tasks within an alternate, an auxiliary task, WATCHDOG, has to be added.
This task has a co-ordinating role: i) it knows the deadline assigned (by the programmer) to this alternate, and after this deadline it aborts all the other tasks and signals an error; ii) it accepts from the other tasks, through its entry ACTION, signals of errors detected during their execution; iii) it accepts from the other tasks (through its entry DONE) their signals of successful termination: calling DONE must be the last statement in each of these tasks, after running the local acceptance test. The WATCHDOG task thus waits until all the other tasks have completed successfully, and then terminates. The WATCHDOG task initiates recovery in two cases: i) not all tasks have called the DONE entry within the deadline; or ii) one of the tasks has called the ACTION entry. In these situations the WATCHDOG task aborts all application tasks and signals an error (by setting a flag) so that the next alternate is started. We only use the Ada abort operation in this case, i.e., when an error has been detected.
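The co-ordination protocol of the WATCHDOG task can also be sketched outside Ada. The following Python sketch (the names Watchdog, done, action and wait are ours, and the actual abortion of the application tasks is left out) models the two entries and the deadline: wait succeeds only if every task signals DONE before the deadline and none signals ACTION.

```python
import threading

class Watchdog:
    """Sketch of the WATCHDOG role for one alternate: workers call
    done() after passing their local acceptance test, or action() when
    they detect an error.  wait() plays the part of the WATCHDOG body."""

    def __init__(self, n_tasks, deadline_s):
        self.deadline_s = deadline_s
        self.failed = threading.Event()      # set by the ACTION entry
        self.all_done = threading.Event()    # set when every task called DONE
        self.remaining = n_tasks
        self.lock = threading.Lock()

    def done(self):                          # entry DONE
        with self.lock:
            self.remaining -= 1
            if self.remaining == 0:
                self.all_done.set()

    def action(self):                        # entry ACTION: error detected
        self.failed.set()
        self.all_done.set()                  # wake the watchdog early

    def wait(self):
        """True iff the alternate succeeded: all DONE within the deadline
        and no ACTION; otherwise the next alternate must be started."""
        finished = self.all_done.wait(timeout=self.deadline_s)
        return finished and not self.failed.is_set()
```

In the real scheme a False result would be followed by aborting the application tasks and setting the flag that starts the next alternate.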

Each alternate-procedure must have the same formal parameter list as the ft-procedure, with one additional Boolean parameter, alternatesuccess, for signalling errors detected while executing that alternate. Such errors may be detected in the tasks and explicitly signalled to the WATCHDOG task, or the signalling may consist in raising an exception. The following set of conventions prevents an alternate-procedure from producing side effects (outside its local state), and thus obviates the need for the run-time support to provide a roll-back operation. The alternate-procedure must be side effect-free, returning all its results through out parameters. So, it must:
- have no in out parameters;
- have no parameters of type pointer (access);
- contain no assignments to global variables external to it;
- perform no output operations to files, controlled devices or the operator;
- contain no tasks which rendezvous with tasks outside the alternate-procedure.

If the alternate-procedure calls a procedure from another package, this latter procedure must have no side effects, just like the alternate as a whole (no changes to the global data of any package and no interaction with the outside world). If all these conditions hold, then aborting all tasks in an alternate guarantees that their partial execution has no effect on other parts of the system. Note that these restrictions amount to a strict adherence to well-known structured programming conventions for working with data and parameters in procedures [Dijkstra 1976].

The restrictions imposed by Ada on the modes of parameters cause some further complications. Let us suppose that a programmer wishes to use an out parameter of the ft-procedure as an input to the Test function (as will normally be the case: the Test function is supposed to implement an "acceptance test" on the results of the alternates). This is not allowed by the Ada rules. A seemingly simple solution would be for the pre-processor to transform the declarations (in the ft-procedure) of such parameters from out to in out. This would solve the immediate problem, but would have consequences like the following: a cascade of procedure calls which seems legal in the dialect (passing the same out parameter along the tree of calls) could become illegal once pre-processed. A better solution, detailed in Appendix B, is: the pre-processor creates temporary variables in the body of the ft-procedure for each out parameter that needs to be used as in for the Test function; it uses these as actual parameters when calling the alternate-procedures; and it copies the values of these variables to the corresponding out parameters of the ft-procedure before the latter returns.

Our conversation scheme allows only a "flat" set of tasks as the set of participant processes, meaning that each alternate-procedure must:
- contain no calls to procedures in which an internal task starts (unless such procedures are ft-procedures);
- contain no internal tasks declared within the tasks which form this alternate.
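The fixed control structure of an ft-procedure (try the alternates in turn, run the global test on temporary results, copy the temporaries to the out parameters only on success, and raise FAILURE if all alternates are exhausted) can be sketched as follows. This is an illustrative Python sketch with invented names; in the real scheme each alternate is a set of Ada tasks supervised by WATCHDOG.

```python
class Failure(Exception):
    """Stands in for the predefined FAILURE exception of an ft-procedure."""

def ft_procedure(alternates, test, args):
    """Fixed structure of an ft-procedure: call the alternates
    successively until one both completes successfully and passes the
    global acceptance test.

    Each alternate is a callable returning (success, results); `results`
    plays the role of the temporary variables that the pre-processor
    creates for the out parameters, copied back only on success.
    """
    for alt in alternates:
        ok, results = alt(args)
        if ok and test(results):
            return results          # "copy temporaries to out parameters"
    raise Failure                   # all alternates exhausted
```

For instance, a primary alternate that signals failure would be skipped, and the backup's result would be returned, provided it passes the global test.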

We want to mention some additional restrictions, caused solely by our desire to have a clear syntax for the dialect, a simplified template for the implementation of ft-procedures and a simpler pre-processor. These are: an alternate-procedure must contain no tasks defined as objects of a task type, must not execute an allocator to create a new task, and must have null as its procedure body. It is obvious that our general scheme allows more complex alternates as well.

4.2.4 Discussion of the Scheme Proposed

Let us first discuss the scheme proposed without any regard for its implementation. Kim's "concurrent recovery block" scheme [Kim 1982] is intended for an extended Concurrent Pascal and does not allow a designer to impose deadlines. The exchange scheme in [Anderson and Knight 1983] is discussed at a more abstract level: neither implementation in a programming language, nor the possibility of nesting conversations, is discussed. Rather serious problems are known to be involved in introducing planned backward error recovery schemes into parallel programming languages (see [Di Giandomenico and Strigini 1991; Kim 1982; Randell 1975]). Those most discussed include the possibility of deadlock and the "deserter process" problem. Neither of these is possible under our scheme (all tasks needed in a concurrent recovery block are started concurrently, and the time-out avoids infinite waits).

Comparing our scheme with the colloquy scheme in [Gregory and Knight 1985], we notice that the authors of the colloquy scheme consider the entire system as consisting of a set of long-lived processes which can join one "discuss" (an alternate in our scheme) but are not necessarily expected to participate in the next one (after a fault has been detected in the previous one). This has obvious advantages in flexibility, and disadvantages in allowing extreme complexity in application design.

Note that with our approach there is no need to develop a checkpointing tool. This advantage, and the corresponding limitations on the behaviour of alternate-procedures, are not shared by other implementations of the concurrent recovery block. The Ada conversation scheme proposed in [Clematis and Gianuzzi 1993] is based on introducing a service task - the conversation manager - for every conversation. This task has a special structure; it monitors and synchronises entry to conversations, alternate execution, and the acceptance-test check for all participant processes. This scheme does not allow for deadline constraints, and assumes that the same set of processes takes part in all alternates (in this respect, our scheme is intermediate between [Clematis and Gianuzzi 1993] and [Gregory and Knight 1985]). A "deserter process" can stop the operation of the entire system both at the entry to a conversation and at the acceptance-test check. Programmers have to develop their own recovery point tools and take care not only of saving, discarding or restoring information but also of recovery point nesting.

As correctly pointed out in [Gregory and Knight 1985], Ada is an extremely complex language in which to realise a consistent concept of concurrent backward error recovery incorporating all the opportunities of the language. The authors of [Gregory and Knight 1989] enumerated several problems and unclear situations (process manipulation, shared data) which prevent the direct use of the conversation concept in Ada. Our approach to solving these problems is to specify a restrictive but powerful enough subset of the language, on which a clear conversational semantics can be imposed.

A disadvantage of our approach could be seen in the need to spawn tasks when each alternate starts. There are several counter-arguments. First, this is one of the well-known approaches to designing concurrent software (e.g. fork-join, cobegin-coend) and its suitability depends on the application area. As for the cost, the time overhead of process creation is one of the important characteristics of all modern operating and run-time systems; it is decreasing, and it is always possible to know this time in advance and to decide whether the proposed scheme can be used. Note that Ada tasks are normally implemented as light-weight processes. And, lastly, this style of designing concurrent software is in accordance with top-down, structured engineering and with information hiding.

In conclusion, this implementation of conversations, while restrictive, seems simple enough that it would be useful in practice. However, our purpose in this paper is to use it in demonstrating our recovery co-ordination scheme.

4.3 Atomic Transactions in Ada

In this Section we discuss how TDSCs could be implemented to interface with our scheme of conversations implemented in Ada. The Ada language helps programmers, through its encapsulation mechanism, to design software with the abstract data type approach, and to develop simple schemes for preserving data consistency in concurrent applications. We thus assume that TDSCs are naturally mapped to Ada "packages". In a large system, it is reasonable to expect data servers and client processes to be separate run-time entities (separate processes, possibly running on separate computers), possibly written in different languages and using special-purpose hardware facilities. In this scenario, those subsystems that are written in Ada would be separate programs, and Ada-style concurrency (e.g. inside conversations) would become an implementation detail of these subsystems. However, it seems reasonable to hide data servers (TDSCs), for use by clients written in Ada, under an Ada package interface. This allows us to limit the following discussion to the use of Ada proper.

4.3.1 How to Use Atomic Transactions

The protocol for a client process using a TDSC (a data server software component with concurrency control via atomic transactions), in terms of calls to the TDSC, looks as follows:

Transaction ::= StartTransaction [TransactionOperation]* (AbortTransaction | CommitTransaction)
TransactionOperation ::= TDSCSpecificMethod | Transaction
TDSCSpecificMethod ::= TDSCmethod(id; other parameters)

From the client's point of view, a TDSC in our system is a set of states and a set of operations (methods) to manipulate those states. A TDSC in our approach is an Ada package, and its methods (the only way for clients to operate on the state of the TDSC, i.e., the data it encapsulates) are declared in the package specification. No data must be declared in the visible part of the package specification. As an example, let us consider a TDSC implementing a matrix. The matrix is stored as a two-dimensional array of elements, each element containing a value. The Matrix package (our TDSC) provides methods for setting or reading the value of each entry, Set and Get:

procedure Set(idMatrx: in ...; x,y: in ...; item: in ...;  methodcode: out ...);
procedure Get(idMatrx: in ...; x,y: in ...; item: out ...; methodcode: out ...);

where idMatrx is the transaction_identifier of the transaction in progress, x and y are the coordinates of the element in the matrix, item is the value to be written into the matrix (for the Set method) or the variable which will be assigned the value of the element read from the matrix (for the Get method), and methodcode is a return code. In addition, the atomic transaction tools provide the following interface for transaction control:

procedure StartTransactionMatrix(idMatrx: out ...; matrxLock: in ...; matrxcode: out ...; parentidMatrx: in ...);
procedure AbortTransactionMatrix(idMatrx: in ...; matrxLock: in ...; matrxcode: out ...);
procedure CommitTransactionMatrix(idMatrx: in ...; matrxLock: in ...; matrxcode: out ...);

A transaction must be started by calling the StartTransactionMatrix procedure. matrxLock specifies the type of access to the data (a WRITE or READ transaction); matrxcode is a return code; and parentidMatrx is the identifier of the parent transaction, if any, within which the nested transaction is to be started (a null value here means that we are starting an outermost transaction). Transactions can be nested. Several Ada tasks can of course operate within one transaction by concurrently calling methods of the given TDSC, but in this case they have to receive the transaction_identifier from the task which started the transaction. Of course, the programmer of the client must know in advance whether the transaction will be read-only or not. If there is to be a nested WRITE transaction, then its parent (and recursively its ancestors) have to be WRITE transactions as well. The AbortTransactionMatrix procedure allows the client process to abort a transaction, if this appears necessary for any reason. The reason can be either related to the state of the data or of the transaction, or external to the transaction (e.g. in the client process). The CommitTransactionMatrix procedure must be executed to commit a transaction.

This model of atomic transactions imposes no restriction on the task structure of a client application. One task can start a transaction, receive its identifier, and give it to any other task. Any task holding this identifier can execute any method of the involved TDSC, and this can be done concurrently by different Ada tasks. After operating on the TDSC, one of these tasks has to commit or abort the transaction; co-ordinating the client tasks so that the commit request is only issued when appropriate is the responsibility of the programmer. After this, the transaction identifier is invalid and all method calls with this identifier are rejected by the TDSC. Any number of transactions can be active at one time if they concern different TDSCs, or if they are properly nested and the concurrency control policies of all the TDSCs allow it.
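To make this protocol concrete, here is a minimal sketch of a Matrix-like TDSC, written in Python rather than Ada and with invented names (MatrixTDSC, set, get, the OK/INTERFACE_ERROR codes). Writes are buffered per transaction and applied on commit; abort simply discards them; a nested commit merges its writes into the parent, so effects become visible only when the outermost transaction commits. Real concurrency control (locking, serialisation of WRITE transactions) is omitted.

```python
import itertools

OK, BAD_ID = "OK", "INTERFACE_ERROR"      # method return codes (our names)

class MatrixTDSC:
    """Sketch of a TDSC: a matrix whose elements are read and written
    only within transactions identified by a transaction_identifier."""

    def __init__(self, rows, cols):
        self.data = [[0] * cols for _ in range(rows)]
        self._ids = itertools.count(1)
        self._active = {}                 # id -> (parent id, pending writes)

    def start_transaction(self, parent=None):
        tid = next(self._ids)
        self._active[tid] = (parent, {})
        return tid

    def set(self, tid, x, y, item):       # the Set method
        if tid not in self._active:
            return BAD_ID                 # interface error: bad transaction id
        self._active[tid][1][(x, y)] = item
        return OK

    def get(self, tid, x, y):             # the Get method
        if tid not in self._active:
            return BAD_ID, None
        pending = self._active[tid][1]
        return OK, pending.get((x, y), self.data[x][y])

    def commit_transaction(self, tid):
        parent, writes = self._active.pop(tid)
        if parent is not None:            # nested: merge into the parent
            self._active[parent][1].update(writes)
        else:                             # outermost: make effects visible
            for (x, y), v in writes.items():
                self.data[x][y] = v

    def abort_transaction(self, tid):
        self._active.pop(tid, None)       # discard all pending writes
```

After a commit or abort the identifier is invalid, so any further method call with it is rejected with an interface-error return code, as required by the protocol above.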

4.3.2 Implementation Issues

The developer of a TDSC has to provide the start_transaction, commit_transaction and abort_transaction procedures, and to implement concurrency control on the accesses to data by the methods, so as to preserve the properties of atomic transactions. The task of writing TDSCs can be eased by relying on a library of transaction control packages, and especially generic packages (as these can be instantiated for TDSCs of different types), to handle standard policies for concurrency control and recovery. Error propagation can be structured following the scheme of an "idealised fault-tolerant component" [Lee and Anderson 1990], or two-dimensional propagation [Menon, Dasgupta et al. 1993]. Errors may arise inside the TDSC, and are naturally treated there, if possible, delaying or aborting the servicing of transactions as appropriate in each application; other errors may take the form of calls by the clients that are illegal, or inappropriate given the current state of the TDSC, and are best treated by the clients.

Every method must have a return parameter which can signify the following events. First, an interface error, which is caused by an illegal or incorrect method call. Such an error can be tolerated by the client, while the TDSC is in a correct state after "executing" (that is, refusing to service) the call. There are several possible reasons for this kind of error, and the client should be informed of the actual reason so as to treat the error properly. In our Matrix TDSC, one possible reason for this kind of return code is that the pre-defined CONSTRAINT_ERROR exception is raised during a method execution. This can occur when the actual parameters are not in the subtype of the formal parameters, the index values are out of bounds, etc. An important kind of interface error occurs when a method call bears an incorrect transaction identifier (the ID of a non-active or non-existing transaction).
As opposed to interface errors, internal exceptions may occur during method execution. These exceptions should normally be treated within the TDSC and, if this does not succeed, propagated to the client. Generally speaking, a TDSC that is in an incorrect state needs to be repaired in some way. The designer of the TDSC may improve its robustness by internal consistency checks (run by periodic auditing tasks and/or triggered by the execution of specific methods), which can trigger internal repair actions unrelated to the servicing of methods.

4.3.3 Discussion

The problem of introducing an atomic transaction scheme into Ada has been considered by several authors over some time [Arevalo and Alvarez 1989; Burns and Wellings 1989; Munck, Oberndorf et al. 1988]. It is obvious that Ada lacks this tool. In [Munck, Oberndorf et al. 1988] extensions are proposed to an Ada run-time system, including support for transactions. [Arevalo and Alvarez 1989] discusses the problem of inconsistencies that could occur in the state of a client/server system because of clients failing after starting a transaction. We cannot agree with the authors of [Burns and Wellings 1989] that "atomic transactions ... are not suitable for programming fault-tolerant system". Actually, this tool can easily be used, and has been used in practice, for tolerating faults.

We did not deal with the problems of permanence of effects, mainly because it is not important for our research. This topic should be investigated separately. Schemes for achieving

permanence can be found in the literature on database transaction recovery (e.g. [Bernstein, Hadzilacos et al. 1987; Haeder and Reuter 1983]). The specifics of an implementation depend on the physical implementation of stable memory, the access times of the storage media, the degree of distribution, the faults to be tolerated and other application requirements. We believe that investigating atomic transaction tools for tolerating faults in object-based languages is quite a promising direction. It is interesting to compare our approach with an existing system like Arjuna [Shrivastava, Dixon et al. 1991], which supports fault-tolerant programming in C++. While Arjuna offers several novel and advantageous features relying on the object-oriented properties of C++, an Ada implementation like the one we have outlined would be complicated by Ada's strong typing, but facilitated by Ada's varied and powerful facilities for concurrency.

4.4 Example of How to Use Joint Recovery

4.4.1 The SHARES Matrix Problem

We describe here a toy example of how the co-ordinated recovery approach could be used. Though simplified, this example can be designed and implemented in a natural way as a heterogeneous system with components of two kinds: a set of interacting processes and a TDSC encapsulating data. We consider a system in which a database stores data concerning the allocation of a certain resource in a controlled system (e.g. electric power in a distribution grid, bandwidth in a communication network). Individual "users" are grouped into "sectors", and their shares in the use of the resource are determined by a set of individual "claims", one per user, which are updated periodically. A client process (or group of processes) must update the shares allocated to all the users, under the condition that the total amount of the resource allocated to all the users remains constant. This update transaction must be atomic, as other clients may be querying the database at the same time. To exploit parallelism, the update is performed by multiple tasks, structured in conversations.

The data concerning the shares allocated to users are structured in a matrix, SHARES, encapsulated in the TDSC called ResourceShares (similar to the Matrix TDSC in Section 4.3.1). If this TDSC has a sophisticated concurrency policy, parallel actions on it yield good concurrency and good robustness. The matrix SHARES has as many rows as there are sectors, and as many columns as the maximum number of users in a sector plus 1. Each element of SHARES is a record collecting two items of information: SHARES[i,j].share is the amount of the resource allocated to user j of sector i, and SHARES[i,j].claim is the claim of user j of sector i. SHARES[i,0].share is the sum of all the shares allocated to users belonging to sector i, and SHARES[i,0].claim is the sum of all the claims of users belonging to sector i. The information about the new claims, needed for updating the shares allocated to users, is collected in the matrix CLAIMS, which has as many rows as there are sectors and as many columns as the maximum number of users in a sector; CLAIMS[i,j] is the new claim of user j of sector i. CLAIMS is not a permanent data structure, so it is not encapsulated in a TDSC.

4.4.2 Design and Algorithm

Let D be the number of sectors; Ni (i=1,D) the number of users in sector i; N = max(Ni), i=1,D; and matrices CLAIMS(D*N) and SHARES(D*(N+1)) as defined before. The algorithm is as follows (Fig. 6 shows its general structure, and a detailed pseudo-code description is in Appendix C):
1) The conversational component is an ft-procedure (a conversation, in this case a concurrent recovery block) UPDATE_SHARES, with two alternates.
2) The first alternate has D+1 processes. D processes determine the new allocation of shares (using the CLAIMS matrix), each one for a sector. The (D+1)-th process, SharesUpdateLeader, co-ordinates this operation. SharesUpdateLeader opens a WRITE transaction on ResourceShares and sends the transaction ID to the processes SecUpdateTask1, ..., SecUpdateTaskD.
3) Each SecUpdateTaskj (j=1,D) starts a nested transaction to work on a row of the matrix (their actions can be parallelised, as they concern different rows of SHARES). It: calculates the new total claim for sector j as the sum of all the new claims contained in CLAIMS[j,k], for k=1,N (for sectors with fewer than N users, assume that SHARES.share=0 and SHARES.claim=0 for the non-existing users in the sector, so that all calculations can be done assuming N users in each sector) and writes it to SHARES[j,0].claim; writes CLAIMS[j,k] into SHARES[j,k].claim, for k=1,N (that is, updates the claim of each user in sector j); and commits the nested transaction.
4) Each SecUpdateTaskj sends the new total claim and the total amount of the shares currently allocated to sector j (the latter read from SHARES[j,0]) to the SharesUpdateLeader process. The latter calculates a "usage quota update coefficient" for each sector and sends it back to all SecUpdateTaskj.
5) Each SecUpdateTaskj calculates the new total amount allotted to its sector (by multiplying SHARES[j,0].claim by the "usage quota update coefficient"). At this point, each SecUpdateTaskj has enough information to determine exactly the new shares allocated to all the users in sector j. To update the shares, each SecUpdateTaskj calls a concurrent recovery block UPDATE_SECTOR_SHARE (so we have D inner parallel conversations).
6) In the first alternate of the j-th UPDATE_SECTOR_SHARE conversation, N processes are spawned (one for each user in sector j). Each one of them updates the share allocated to one user. This update is implemented as a transaction.
7) One process (say SectorLeader) in UPDATE_SECTOR_SHARE opens a nested transaction and sends its ID to the other processes (User2, ..., UserN) in the same conversation.

8) Each process in {User2, ..., UserN}, inside UPDATE_SECTOR_SHARE, calculates and sets the new share allocated to one user, and then terminates.
9) SectorLeader updates the sector share SHARES[j,0].share, checks the consistency constraint SHARES[j,0].share = ∑(i=1,N) SHARES[j,i].share, and commits the nested transaction.
10) Each UPDATE_SECTOR_SHARE conversation closes.
11) The SharesUpdateLeader process in UPDATE_SHARES checks that the total share, ∑(j=1,D) SHARES[j,0].share, has not changed, and commits the external transaction.

12) UPDATE_SHARES closes. The execution is completed.

[Figure: the conversational component UPDATE_SHARES contains the SharesUpdateLeader task and the tasks SecUpdateTask1 ... SecUpdateTaskD; each SecUpdateTask enters a nested UPDATE_SECTOR_SHARE conversation containing SectorLeader and User2 ... UserN. SharesUpdateLeader requests the start of an external transaction on the transactional component ResourceShares; the other tasks request the start of sets of parallel nested transactions; the tasks within each conversation interact by inter-process communication.]

Fig. 6: The structure of the SHARES matrix example
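Stripped of the conversation and transaction machinery, the numerical core of the algorithm above can be sketched as follows. This is an illustrative Python sketch (the name update_shares and the tuple representation of records are ours): SHARES records are modelled as (share, claim) pairs, and a single usage quota update coefficient, computed as in steps 4-5 under the assumption that the leader rescales all claims uniformly, keeps the total allocation constant.

```python
def update_shares(shares, claims):
    """Sketch of the SHARES update.  shares[j][0] holds the sector totals
    (share, claim) and shares[j][k] (k >= 1) the per-user records, as in
    the paper; claims[j][k-1] is the new claim of user k of sector j."""
    total_allocated = sum(row[0][0] for row in shares)   # grand total of shares
    # Step 3: new total claim per sector, and new per-user claims
    for j, row in enumerate(shares):
        row[0] = (row[0][0], sum(claims[j]))
        for k in range(1, len(row)):
            row[k] = (row[k][0], claims[j][k - 1])
    # Steps 4-5: usage quota update coefficient (assumed uniform here)
    total_claims = sum(row[0][1] for row in shares)
    coeff = total_allocated / total_claims
    # Steps 6-9: new shares per user, then per sector
    for row in shares:
        for k in range(1, len(row)):
            row[k] = (row[k][1] * coeff, row[k][1])
        sector_total = sum(row[k][0] for k in range(1, len(row)))
        row[0] = (sector_total, row[0][1])
    # Step 11: consistency check - the total share has not changed
    assert abs(sum(row[0][0] for row in shares) - total_allocated) < 1e-9
    return shares
```

In the real design these loops are distributed over the D SecUpdateTask processes and the nested UPDATE_SECTOR_SHARE conversations, each working on its own row of SHARES inside its own nested transaction.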

About alternates. The second alternate of UPDATE_SECTOR_SHARE is a process Single that runs a loop through the entries of the row containing the information for sector j, instead of spawning processes which access all the entries in parallel. The second alternate of UPDATE_SHARES is a process Single which runs loops over the whole matrix. This is a common use of recovery blocks: the primary is fast but complex, the alternate is slower but simpler to understand, test, etc. There is no violation of Rule 3 and there will be no deadlock during the execution of this concurrent recovery block.

In this example, all the concurrent transactions started work on separate parts of SHARES. These transactions are therefore independent, in that the order in which they are serialised does not affect the final state of the TDSC, and the TDSC ResourceShares can easily implement a concurrency control policy which allows them to be serviced in parallel.

Conclusions. This example illustrates a scenario where the need to use both the process and the transactional paradigms may arise, and it demonstrates the application of our approach to joint recovery.

4.5 Directions for Implementing Co-ordinated Recovery in Ada

In the previous sections we have shown a general approach for implementing atomic transactions and a pre-processor for conversations in Ada. In this Section, we briefly discuss the problem of implementing co-ordinated recovery in Ada, so that it can be used for experimental and industrial applications. Here we concentrate on the co-ordination "interpreter": that is, on a source pre-processor and on the code that it produces.

4.5.1 Assumptions

As in the previous sections, we assume that we cannot change the Ada compiler and run-time support. We assume that we have (or know how to implement) data server components offering atomic transactions, and the pre-processor for conversations; that the designers of client processes will use the conversation scheme we have described; and that the designers of TDSCs will give them a package-like interface, in accordance with our general considerations in Section 4.3. We are now looking for a way to implement an interpreter supporting co-ordinated recovery. Again, its code will have to be built into the application processes. Again, as in the case of conversations, it would be desirable to use purely procedural extensions, but we shall need the help of a pre-processor. So, we now discuss what additional application-level code will be needed in the client processes to handle the co-ordination of recovery, and how this code can be built out of added libraries and insertions by a pre-processor.

We also assume that we cannot force the designer of a TDSC to change its design for the purpose of co-ordinated recovery. In the general case, we would like to allow the TDSCs to be already running when the design of a conversational client is started. This is in accordance with the idea of open systems: we would like to be able to add new (conversational) clients into a running system with its data servers.
As noted in Section 3.2, only TDSCs which export a visible prepare_to_commit method will be considered.

A first consideration is that we can proceed by expanding the conversation-support pre-processor, but, because of peculiarities of our implementation of conversations, the Ada source code (after pre-processing) will not contain explicit calls to the conversation interpreter. So, the interpreter of co-ordinated recovery needs to be informed about conversation enter, close and abort operations, and the pre-processor will have to introduce the appropriate "calls" into the source code. It is also impossible to have a TDSC (package) name as a parameter in Ada subprogram calls. If we wished to preserve the syntax that we have used so far for calls to TDSCs, the pre-processor would need to understand some naming conventions, like start_transaction(TDSCname,...) being translated to start_transaction_TDSCname(...). It is possible that a different syntax (naming convention) for our "dialect" would make the implementation easier. At this stage, we have been mostly interested in the feasibility of implementing the scheme, so we kept the natural-looking syntax we assumed in Section 3.2.

4.5.2 Implementation Aspects

Names of conversations. The run-time names of conversations (used, e.g., in the co-ordination graph, CG) can be dynamically created at run-time as unique names. The problem is how a separately compiled unit containing the code of alternates can know the name of the calling conversation. The solution can be to insert into the code, before and after the call to each alternate, statements saving information about this event (within the body of the ft-procedure, without changing the bodies of the alternates), so that the interpreter can know about each alternate execution. These names must then be passed as explicit arguments to be used in manipulating the CG. An important question, because of the possibility of separate compilation, is how the calls to the co-ordination interpreter have to be distributed among the alternates, the body of the ft-procedure and the global test.
It is desirable to limit most of the pre-processing to the bodies of ft-procedures.

Names of TDSCs. Strong typing in Ada, and the impossibility of using procedure and package names as parameters and values, impose many restrictions, and in particular do not allow us to use a pure procedural solution. Besides, the implementation of transaction control is inevitably TDSC-specific, that is, specific to the semantics of the data types involved, the TDSC-specific concurrency control policies and the implementation details of the TDSC (e.g., storage organisation). So, calls like start_transaction(TDSCname, id, ...) are impossible. The pre-processor must collect the names of the TDSCs used, to map the calls to our co-ordination interpreter into TDSC-specific calls.

TDSC-specific information in the CG. So that the code for manipulating the CG can be standard, transaction identifiers must belong to a separate type which is the same for all TDSCs. Likewise, there will be (hidden from the programmers who use the dialect) a type for conversation identifiers, and a type for TDSC names used in the CG.

Protection of data structures. By choosing a very restrictive model of conversations, we were able to build the conversation interpreter without any major data structures. In the case of the co-ordination interpreter, the CG must be implemented in data structures visible to the added application code. It is desirable to keep access to these data structures as restricted as possible: a run-time error (Ada "erroneous execution", a run-time support bug, a hardware transient) could otherwise corrupt the recovery information of a large part of the application. A compromise could be to have only one CG for each top-level conversation, to be shared by the procedures and tasks called by that conversation. The data structures and manipulation code of the CG can be provided as an Ada package, whose procedures are called at each call related to conversations or transactions. A reference to the CG must be passed along to all nested ft-procedures (conversations), in the form of a parameter or shared variable.

Creating the source code of the co-ordination interpreter. As we require all conversational and transactional calls to go through the co-ordination interpreter, this co-ordination interpreter (or, more concretely, our pre-processor) must know the names of all these calls. The programmer declares which TDSCs may be called by a client process by using the with clause. The pre-processor can then complement all calls to TDSCs with the appropriate checks and updates of the CG. There is a general implementation issue of how much of the "interpreter" can consist of Ada packages and procedures, and how much of in-line code added by the pre-processor. Pre-processing is used to give the programmer the power to invoke complex actions through simple statements in the "dialect" source. However, it makes sense for the "dialect" still to look like Ada, and to create, as far as possible, well-structured Ada code using packages and procedures. At this stage, we have not chosen a specific solution, but only pointed out problems and ways to solve them.
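The essential content of the CG can be pictured with a small data structure. The following sketch (Python, purely for illustration; the class and method names are our assumptions, not part of the proposal) keeps, per conversation, the transactions it owns, one CG per top-level conversation as discussed above:

```python
# Illustrative sketch of the co-ordination graph (CG): one CG per
# top-level conversation, recording the transactions (per TDSC) owned
# by each conversation, so that the interpreter can co-ordinate their
# recovery. All names are hypothetical.
class CoordinationGraph:
    def __init__(self, top_conversation):
        self.top = top_conversation
        # conversation name -> list of (tdsc_name, transaction_id)
        self.owned = {top_conversation: []}

    def enter_conversation(self, name):
        # Called on conversation enter (possibly nested).
        self.owned.setdefault(name, [])

    def start_transaction(self, conversation, tdsc, txn_id):
        # Called by the interpreter on every transactional call.
        self.owned[conversation].append((tdsc, txn_id))

    def transactions_of(self, conversation):
        # Used on conversation close or abort to find what to
        # commit or roll back together.
        return list(self.owned[conversation])

cg = CoordinationGraph("Max")
cg.start_transaction("Max", "ResourceShares", 1)
print(cg.transactions_of("Max"))  # [('ResourceShares', 1)]
```

In the Ada setting this would be a package with the corresponding procedures, and the transaction, conversation and TDSC-name types would be the common hidden types described above.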

5 Discussion

We have considered the problem of backward recovery in those software systems that include subsystems designed according to two different computational models: the "object-action" model and the "process-conversation" model. We argue that this is a realistic problem: a computational model is imposed on designers by application peculiarities and design culture that cannot be arbitrarily changed, and large software systems exist where heterogeneous computational models are needed. [Shrivastava, Mancini et al. 1993] showed how techniques developed for each of the two models could be implemented in the other model as well, so broadening the possibilities of both design styles. One might then investigate, for instance, extensions to the process-conversation model to allow more complex and open patterns of communication. By contrast, we have assumed a net separation in the use of the two computational models. In our scenario, a subsystem based on processes and conversations would offer full interpreter support for a rather constrained pattern of conversations, while atomic transaction servers would incorporate great flexibility (in terms, e.g., of concurrency control policies), mostly obtained in their application-level design. This seems to us a realistic expectation (though not the only realistic scenario) about how such facilities could be designed and used in large, heterogeneous systems. On this basis, we have described how the backward recovery of computations involving both types of subsystems can be co-ordinated. An important aspect of this work is that we do not assume that the interpreter for a large application system can or should be built as a monolithic, homogeneous entity, and we allow for "transactional data server components", which could run on virtually any computing platform, to be used in conjunction with the specially enhanced platform which we propose for client processes.

An interesting consideration about the two faces of duality is one of granularity. To co-ordinate an atomic transaction changing the states of data items encapsulated in the same TDSC, a "co-ordinating agent" only needs authority over the set of data in that one TDSC. So, designers can implement an atomic transaction capability as a responsibility of entities, possibly at the application level, with visibility and authority limited to individual TDSCs. However, an atomic transaction involving two such TDSCs could only be managed by an entity with authority over both: either a third application-level entity designed as a "transaction co-ordinator", or some facility in the run-time support. Two levels of granularity are thus apparent; no such distinction usually appears in the study of recovery in the process-conversation model, where if two processes can communicate, then the co-ordination of their recovery must be guaranteed. While we have assumed no facility for transactions on multiple TDSCs, the co-ordination mechanism we have specified automatically allows the building of multi-TDSC transactions: all transactions owned by the same conversation are committed together. It is also interesting to notice that our co-ordination mechanism is an extension of the application-transparent co-ordination schemes studied for the process-conversation model.
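Committing together all the transactions owned by one conversation amounts to a two-phase exchange: on conversation close, the interpreter first asks every involved TDSC to prepare (recall from Section 3.2 that only TDSCs exporting a visible prepare_to_commit method are considered), and commits only if all agree. The following Python sketch is illustrative only; the commit and abort names on the TDSC, and the FakeTDSC used in the demonstration, are our assumptions:

```python
# Illustrative two-phase commit of all transactions owned by one
# conversation. Only prepare_to_commit is taken from the paper's
# assumptions; the rest of the TDSC interface is hypothetical.
def close_conversation(transactions):
    """transactions: list of (tdsc, txn_id); returns True on commit."""
    prepared = []
    for tdsc, txn in transactions:
        if not tdsc.prepare_to_commit(txn):
            # One TDSC refuses: roll everything back.
            for p_tdsc, p_txn in prepared:
                p_tdsc.abort(p_txn)
            tdsc.abort(txn)
            return False
        prepared.append((tdsc, txn))
    for tdsc, txn in prepared:
        tdsc.commit(txn)
    return True

class FakeTDSC:
    """Stand-in transactional data server for demonstration."""
    def __init__(self, ok=True):
        self.ok = ok
        self.committed, self.aborted = [], []
    def prepare_to_commit(self, txn): return self.ok
    def commit(self, txn): self.committed.append(txn)
    def abort(self, txn): self.aborted.append(txn)

print(close_conversation([(FakeTDSC(), 1)]))  # True
```

On abort of the conversation the interpreter would instead roll back all owned transactions unconditionally.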
Three styles of co-ordination can thus coexist in a system like this: i) conversations, where for each multi-process interaction the application programmer must pre-plan the participation of each process, while non-interference and recovery policies are pre-packaged in the interpreter; ii) atomic transactions, where the interaction of processes via data protected by TDSCs can be decided at run-time, is monitored by the TDSC, and non-interference and recovery policies are decided by the programmer of the TDSC; and iii) application-transparent co-ordination between conversations and transactions, where the programmer of a TDSC does not need to pre-plan participation in a conversation, and the interpreter keeps track of which TDSCs are involved in a conversation and co-ordinates their recovery.

We believe that the proposed organisation can successfully hide some of the inherent complexity of planning error recovery in a large system with heterogeneous subsystems. Some of the complexity is avoided by taking care of difficult tasks (co-ordination) in the interpreter, and some by constraining the application designer to use stereotyped design templates. Risks in this style of system construction exist, e.g. the possibility of deadlock between clients attempting multiple, conflicting transactions. These interaction problems need to be studied in each case, and of course their existence must limit the complexity of any attempted design. On the other hand, if an application requires such complex interactions, our scheme seems able to constrain them to patterns amenable to analysis.

We now outline some related research by other authors, and the ways in which we think our proposal improves on the few previous proposals for co-ordinating conversations with server processes. The difficulty of managing concurrent, independent accesses to shared data servers in the framework of conversations was pointed out in [Gregory and Knight 1989].
The problem of joint recovery of client processes (organised in recovery blocks) together with database management systems was independently raised in [Gudes and Fernandez 1990]. The authors consider enhancements to the Recovery Meta Program (RMP) approach [Ancona, Clematis et al. 1990] - a monolithic, interpreter-level solution - to support this kind of recovery.

Another related paper is [Clematis and Gianuzzi 1993], which proposes quite a different approach to combining components developed in the process and the server/client models. Like us, these authors posit that all fault tolerance should be provided within an existing conventional language (they choose Ada, as we did in our example). However, in their approach a special service task is started for every conversation, and each participant process must inform it about entrance into the given conversation. All participants leave a conversation at the same time if they pass the acceptance test; otherwise each of them must start the next alternate. All control of these decisions is through the service task. This task must know a list of all servers needed for the given conversation, and it locks all of them when starting the conversation. Each conversation participant executing an alternate can call any operation on a server (all permissible client operations are implemented by selective wait in the server).

Our atomic transaction model is more general than the client/server model in [Clematis and Gianuzzi 1993]. We do not lock all servers potentially involved in a conversation (and in all possible nested ones) for the whole duration of the conversation's execution: rather, we allow application-specific locking policies. In terms of ease of use, [Clematis and Gianuzzi 1993] requires the programmers to specify in each request a level of nesting, a list of all servers needed at the given level, and a list of all processes in each given conversation; these requirements are absent in our proposal.

We have specified in some detail how support for our co-ordination model could be implemented. This specification seems to us reasonably general, in that it makes few assumptions on the varieties of conversations and of transactions existing in the system. As an additional proof of feasibility, we have endeavoured to specify an implementation in Ada.
We chose Ada mainly to use its concurrency facilities, and because compatibility with a popular language is an important feature for any proposed design scheme. However, this posed an interesting problem: Ada and its run-time support are complex and monolithic, and previous researchers have found it difficult to extend them to support backward recovery schemes. So, we regard the fact that we could outline an implementation strategy for our scheme (albeit in a constrained variant) in Ada as a reasonable indication of the feasibility of the scheme in different contexts. As part of this work, we have also specified a viable implementation of conversations (more specifically, concurrent recovery blocks) in Ada to a level of detail that could be used in practice. Leaving the Ada run-time support untouched, we had to specify the "interpreter" in application code. In practice, this makes it necessary to use a pre-processor to enforce compliance with the necessary programming conventions. This approach may be questioned in terms of the philosophy of Ada, but Ada subsets and dialects, with tools for their support, are already a practical necessity elsewhere (for instance, "safe" or time-predictable subsets). We have not dealt, of course, with the whole problem of building fault-tolerant applications in Ada. This would involve the question of which lower-level fault tolerance mechanisms can be included in an Ada (distributed) run-time support and how these could be made visible to the application programmer. Some of the problems that we have found in specifying an Ada implementation could be solved by using Ada9X [Barnes 1993]: from the syntactic problem of being unable to use the names of packages as arguments to procedure calls, to the more basic concern that Ada lacks provisions for building multi-program applications to be combined with some degree of openness.

Considerations of complexity, however, would lead us to prefer a simpler, object-oriented language over either Ada or Ada9X. This work has a rather speculative nature: it concerns a complex toolset for supporting fault tolerance in large applications, and therefore in projects which are quite outside the reach of academic research. An actual proof of feasibility and usefulness can only come from a complete implementation and use in reasonably realistic projects. This is a shortcoming common to many such proposals, notably in the field of fault tolerance. We would expect some help in reducing the gap between conception and trial use from the ongoing research [Fabre and Randell 1992; Rubira-Calsavara and Stroud 1993] on the use of object-oriented methods to facilitate the construction of toolsets for fault-tolerant programming.

6 Acknowledgements Part of this research was done during Dr. Romanovsky's postdoctoral fellowship, sponsored by the Royal Society. We would like to thank Franco Mazzanti and Robert Stroud for their criticism and advice.

References

[Agha 1990] G. Agha, “Concurrent object oriented programming”, CACM, vol. 33, no. 9, pp. 125-141, 1990.
[Ancona, Clematis et al. 1990] M. Ancona, A. Clematis, G. Dodero, E.B. Fernandez and V. Gianuzzi, “A System Architecture for Fault Tolerance in Concurrent Software”, IEEE Computer, vol. 23, no. 10, pp. 23-31, 1990.
[Anderson and Knight 1983] T. Anderson and J.C. Knight, “A Framework for Software Fault-Tolerance in Real-Time Systems”, IEEE TSE, vol. SE-9, no. 3, pp. 355-364, 1983.
[ANSI 1983] ANSI, ANSI/MIL-Std-1815a Reference Manual for the Ada Programming Language, American National Standards Institute, 1983.
[Arevalo and Alvarez 1989] S. Arevalo and A. Alvarez, “Fault tolerant distributed Ada”, Ada Letters, vol. 9, no. 5, pp. 54-59, 1989.
[Barigazzi and Strigini 1983] G. Barigazzi and L. Strigini, “Application-Transparent Setting of Recovery Points”, in 13th IEEE International Symposium on Fault-Tolerant Computing (FTCS-13), pp. 48-55, Milano, Italy, IEEE Computer Society, 1983.
[Barnes 1993] J. Barnes, Introducing Ada9X. Ada9X Project Report, Intermetrics, Inc., 1993.
[Bernstein, Hadzilacos et al. 1987] P.A. Bernstein, V. Hadzilacos and N. Goodman, Concurrency Control and Recovery in Database Systems, Reading, Mass., Addison-Wesley, 1987.
[Birman, Joseph et al. 1984] K.P. Birman, T.A. Joseph, T. Raeuchle and A.E. Abbadi, “Implementing Fault-Tolerant Distributed Objects”, in Proc. 4th Symposium on Reliability in Distributed Software and Database Systems, pp. 124-133, 1984.

[Brownbridge, Marshall et al. 1982] D.R. Brownbridge, L.F. Marshall and B. Randell, “The Newcastle Connection, or - UNIXes of the World Unite!”, Software Practice and Experience, vol. 12, no. 12, pp. 1147-1162, 1982.
[Burns and Wellings 1989] A. Burns and A.J. Wellings, “Programming Atomic Actions in Ada”, Ada Letters, vol. 9, no. 6, pp. 67-79, 1989.
[CACM 1993] CACM, “Concurrent Object-Oriented Programming”, special issue, vol. 36, no. 9, 1993.
[Clematis and Gianuzzi 1993] A. Clematis and V. Gianuzzi, “Structuring conversation in operation/procedure oriented programming languages”, Computer Languages, vol. 18, no. 3, pp. 153-168, 1993.
[Di Giandomenico and Strigini 1991] F. Di Giandomenico and L. Strigini, “Implementations and Extensions of the Conversation Concept”, in Proc. 5th International GI/ITG/GMA Conference on Fault-Tolerant Computing Systems - Tests, Diagnosis, Fault Treatment, pp. 42-53, Nürnberg, Germany, Springer-Verlag, Informatik-Fachberichte, No. 283, 1991.
[Dijkstra 1976] E.W. Dijkstra, A Discipline of Programming, Englewood Cliffs, N.J., Prentice Hall, 1976.
[Fabre and Randell 1992] J.C. Fabre and B. Randell, “An object-oriented view of fragmented data processing for fault and intrusion tolerance in distributed systems”, in Proc. ESORICS 92, Y. Deswarte, G. Eizenberg, J.J. Quisquater, Eds., Lecture Notes in Comp. Sci., vol. 648, Springer-Verlag, New York, pp. 193-208, 1992.
[Gehani and Roome 1988] N.H. Gehani and W.D. Roome, “Concurrent C++: Concurrent programming with class(es)”, Software Practice and Experience, vol. 18, no. 12, pp. 1157-1177, 1988.
[Gopal and Griffith 1990] G. Gopal and N.D. Griffith, “Software Fault Tolerance in Telecommunication Systems”, in Proc. 4th ACM SIGOPS European Workshop, Bologna, Italy, 1990.
[Gray and Reuter 1992] J. Gray and A. Reuter, Transaction processing: techniques and concepts, Morgan Kaufmann, 1992.
[Gregory and Knight 1985] S.T. Gregory and J.C. Knight, “A New Linguistic Approach to Backward Error Recovery”, in Proc. 15th International Symposium on Fault-Tolerant Computing, pp. 404-409, Ann Arbor, Michigan, IEEE Computer Society, 1985.
[Gregory and Knight 1989] S.T. Gregory and J.C. Knight, “On the provision of Backward Error Recovery in production programming languages”, in Proc. 19th International Symposium on Fault-Tolerant Computing, pp. 507-511, Chicago, Illinois, IEEE Computer Society, 1989.
[Gudes and Fernandez 1990] E. Gudes and E.B. Fernandez, “Combined application data-fault recovery”, in Proc. IEEE Int. Conf. on Computer Systems and Software Engineering, pp. 36-43, Tel Aviv, Israel, IEEE Computer Society, 1990.

[Haeder and Reuter 1983] T. Haeder and A. Reuter, “Principles of transaction-oriented recovery”, ACM Computing Surveys, vol. 15, no. 4, pp. 287-327, 1983.
[Kim 1982] K.H. Kim, “Approaches to Mechanization of the Conversation Scheme Based on Monitors”, IEEE TSE, vol. SE-8, no. 3, pp. 189-197, 1982.
[Kim and You 1990] K.H. Kim and J.H. You, “A Highly Decentralized Implementation Model for the Programmer-Transparent Coordination (PTC) Scheme for Cooperative Recovery”, in Proc. 20th International Symposium on Fault-Tolerant Computing, pp. 282-289, Newcastle-upon-Tyne, U.K., IEEE Computer Society, 1990.
[Lee and Anderson 1990] P.A. Lee and T. Anderson, Fault Tolerance: Principles and Practice, Dependable Computing and Fault-Tolerant Systems, New York, Springer-Verlag, 1990.
[Lynch, Merritt et al. 1993] N.A. Lynch, M. Merritt, W.E. Weihl and A. Fekete, Atomic Transactions, Morgan Kaufmann, 1993.
[Menon, Dasgupta et al. 1993] S. Menon, P. Dasgupta and R.J. LeBlanc, “Asynchronous event handling in distributed object-based systems”, in 13th Int. Conference on Distributed Computing Systems, pp. 383-390, Pennsylvania, IEEE Computer Society, 1993.
[Munck, Oberndorf et al. 1988] R. Munck, P. Oberndorf, E. Ploedereder and R. Thall, “An overview of DOD-STD-1838A (proposed), the common APSE interface set, Revision A”, SIGSOFT Software Engineering Notes, vol. 13, no. 5, pp. 235-247, 1988.
[Powell, Bonn et al. 1988] D. Powell, G. Bonn, D. Seaton, P. Verissimo and F. Waeselynck, “The Delta-4 Approach to Dependability in Open Distributed Computing Systems”, in Proc. 18th International Symposium on Fault-Tolerant Computing, pp. 246-251, Tokyo, Japan, IEEE Computer Society, 1988.
[Randell 1975] B. Randell, “System Structure for Software Fault Tolerance”, IEEE TSE, vol. SE-1, no. 2, pp. 220-232, 1975.
[Randell, Lee et al. 1978] B. Randell, P.A. Lee and P.C. Treleaven, “Reliability Issues in Computing System Design”, Computing Surveys, vol. 10, no. 2, pp. 123-175, 1978.
[Rubira-Calsavara and Stroud 1993] C.M.F. Rubira-Calsavara and R.J. Stroud, “Forward and backward error recovery in C++”, in Predictably Dependable Computing Systems, First Year Report, pp. 147-164, ESPRIT Basic Research Project 6362-PDCS 2, 1993.
[Saltzer and Schroeder 1975] J.H. Saltzer and M.D. Schroeder, “The Protection of Information in Computer Systems”, Proceedings of the IEEE, vol. 63, no. 9, pp. 1278-1308, 1975.
[Shrivastava, Dixon et al. 1991] S.K. Shrivastava, G.N. Dixon and G.D. Parrington, “An Overview of the Arjuna Distributed Programming System”, IEEE Software, vol. 8, no. 1, pp. 66-73, 1991.
[Shrivastava, Mancini et al. 1993] S.K. Shrivastava, L.V. Mancini and B. Randell, “The Duality of Fault-Tolerant System Structures”, Software Practice and Experience, vol. 23, no. 7, pp. 773-798, 1993.

[Strigini and Di Giandomenico 1991] L. Strigini and F. Di Giandomenico, “Flexible schemes for application-level fault tolerance”, in Proc. 10th Symposium on Reliable Distributed Systems, pp. 86-95, Pisa, Italy, IEEE Computer Society, 1991.
[Strom and Yemini 1985] R.E. Strom and S.A. Yemini, “Optimistic recovery in distributed systems”, ACM Transactions on Computer Systems, vol. 3, no. 3, pp. 204-226, 1985.
[Wyatt, Kavi et al. 1992] B.B. Wyatt, K. Kavi and S. Hufnagel, “Parallelism in object-oriented languages: a survey”, IEEE Software, vol. 9, no. 6, pp. 56-66, 1992.

Appendix A. Code for Conversation in Ada

We give an example of a conversation in Ada, developed as described in Section 4.2. Procedure Max finds the maximum element in the large matrix A(N*M). For the sake of this example, let us assume that the matrix elements are scattered over distributed storage elements, that the search is to be completed in a limited time, that the locations of the matrix elements are not known in advance (so the search can take unpredictable time), and that the allocation of tasks is also unknown. With these assumptions, the search is organised as follows. An ft-procedure Max has two alternates: procedures MaxRow and MaxColumn, which are called from the body of MaxElem. In the first alternate, MaxRow, each one of N tasks looks for the maximum element in one row, and afterwards each of them sends the element found to the task T1N, which calculates the maximum element in the entire matrix. If an error occurs or the time-out expires, this alternate is considered to have failed and the second one, procedure MaxColumn, is started. It has M tasks, each of which searches a column of the matrix. The time constraint T given for the entire procedure Max is split equally between the two alternates. Function TestResult checks whether the element found falls within a statically known range (minlimit, maxlimit).

package MaxElem is
    FAILURE: exception;
    procedure Max(A: in big_matrix; elem: out elemtype);
end MaxElem;

package body MaxElem is
    -- real-time constraint for entire procedure Max:
    T: constant := 10.0;
    procedure MaxRow(A: in big_matrix; elem: out elemtype;
                     alternatesuccess: out BOOLEAN);
    procedure MaxColumn(A: in big_matrix; elem: out elemtype;
                        alternatesuccess: out BOOLEAN);
    function TestResult(elem: in elemtype) return BOOLEAN;

    procedure Max(A: in big_matrix; elem: out elemtype) is
        type ALTERNATE_RANGE is range 1..2;
        alternate: ALTERNATE_RANGE := 1;
        alternatesuccess: BOOLEAN;
        temp_elem: elemtype;
    begin
        loop
            case alternate is
                when 1 => MaxRow(A, temp_elem, alternatesuccess);
                when 2 => MaxColumn(A, temp_elem, alternatesuccess);
                when others => raise FAILURE;
            end case;
            exit when alternatesuccess and then TestResult(temp_elem);
            alternate := alternate + 1;
        end loop;
        elem := temp_elem;
    exception
        when others => raise FAILURE;
    end Max;

    procedure MaxRow(A: in big_matrix; elem: out elemtype;
                     alternatesuccess: out BOOLEAN) is
        task T11;
        .... -- N tasks form the first alternate
        task T1N is
            entry NEWMAX (newelem: in elemtype);
        end T1N;
        -- special internal task for conversation
        -- and real-time constraint control:
        task WATCHDOG is
            entry ACTION;
            entry DONE;
        end WATCHDOG;

        task body T11 is -- bodies of application tasks
        begin
            -- .......
            T1N.NEWMAX(myelem);
            WATCHDOG.DONE;
        end T11;
        ...
        task body T1N is
        begin
            -- ........
            WATCHDOG.DONE;
        end T1N;

        task body WATCHDOG is
            -- task number in alternate 1:
            TaskNumber: constant := N;
            -- constraint for 1st alternate:
            TimeConstr: constant := T/2;
            task_count: INTEGER := 0;
            this_alt_deadline: TIME;
        begin
            this_alt_deadline := CLOCK + TimeConstr;
            loop
                select
                    accept ACTION;
                    abort T11, ..., T1N;
                    alternatesuccess := FALSE;
                    exit;
                or -- one task ends:
                    accept DONE;
                    task_count := task_count + 1;
                    if task_count = TaskNumber then exit; end if;
                or -- time constraint
                    delay (this_alt_deadline - CLOCK);
                    abort T11, ..., T1N;
                    alternatesuccess := FALSE;
                    exit;
                end select;
            end loop;
        exception
            when others =>
                abort T11, ..., T1N;
                alternatesuccess := FALSE;
        end WATCHDOG;
    begin -- MaxRow body
        null;
    end MaxRow;

    procedure MaxColumn(A: in big_matrix; elem: out elemtype;
                        alternatesuccess: out BOOLEAN) is
        task T21;
        .... -- M tasks form the second alternate
        task T2M is
            entry NEWMAX (newelem: in elemtype);
        end T2M;
        task WATCHDOG is
            entry ACTION;
            entry DONE;
        end WATCHDOG;

        -- application task bodies:
        task body T21 is
        begin
            -- .......
            T2M.NEWMAX(myelem);
            WATCHDOG.DONE;
        end T21;
        ...
        task body T2M is
        begin
            -- ........
            WATCHDOG.DONE;
        end T2M;

        task body WATCHDOG is
            -- task number in alternate 2:
            TaskNumber: constant := M;
            -- constraint for 2nd alternate:
            TimeConstr: constant := T/2;
            task_count: INTEGER := 0;
            this_alt_deadline: TIME;
        begin
            this_alt_deadline := CLOCK + TimeConstr;
            loop
                select
                    accept ACTION;
                    abort T21, ..., T2M;
                    alternatesuccess := FALSE;
                    exit;
                or -- one task ends:
                    accept DONE;
                    task_count := task_count + 1;
                    if task_count = TaskNumber then exit; end if;
                or -- time constraint
                    delay (this_alt_deadline - CLOCK);
                    abort T21, ..., T2M;
                    alternatesuccess := FALSE;
                    exit;
                end select;
            end loop;
        exception
            when others =>
                abort T21, ..., T2M;
                alternatesuccess := FALSE;
        end WATCHDOG;
    begin -- MaxColumn body
        null;
    end MaxColumn;

    function TestResult(elem: in elemtype) return BOOLEAN is
        minlimit: constant := -10000;
        maxlimit: constant := 10000;

    begin
        return elem > minlimit and elem < maxlimit;
    end TestResult;
end MaxElem;

Appendix B. Translation of the Dialect into Ada

The pre-processor inserts a default catch-all handler (when others => raise FAILURE) into the ft-procedure body. This catches all exceptions and in its turn raises the FAILURE exception, to be interpreted, by the caller of the ft-procedure, as the notification of an error that could not be tolerated within the ft-procedure. The body of an ft-procedure is translated as follows:

ft_procedure Name(declaration of parameters) is
    .......
    ensure Test(list of actual parameters)
    by Altern1
    else by Altern2
    .......
end Name;

procedure Name(declaration of parameters) is
    ......
    type ALTERNATE_RANGE is range 1..M;
    alternate: ALTERNATE_RANGE := 1;
    alternatesuccess: BOOLEAN;
begin
    loop
        case alternate is
            when 1 => Altern1(list of actual parameters, alternatesuccess);
            when 2 => Altern2(list of actual parameters, alternatesuccess);
            .......
            when others => raise FAILURE;
        end case;
        exit when alternatesuccess and then Test(list of actual parameters);
        alternate := alternate + 1;
    end loop;
exception
    when others => raise FAILURE;
end Name;

The body of the ft-procedure is translated into the body of the corresponding Ada procedure by the following steps:

- the number of alternates is calculated and put into the declaration of type ALTERNATE_RANGE as a constant M; the declaration of the type ALTERNATE_RANGE is added;
- the declarations of the variables alternate and alternatesuccess are added; where the placeholder appears, declarations of internal variables are added, one for each out parameter of the ft-procedure that is also an in formal parameter of the Test function;
- the body of the corresponding Ada procedure consists of only one loop and an exception handler;
- within this loop, the case statement calls successively each alternate, given in the declaration of the ft-procedure after the keywords by or else by;
- each alternate call has the same list of actual parameters as the ft-procedure, with two changes:
  - for those out parameters of the ft-procedure for which a "temporary replica" variable has been created, the latter is passed as an actual parameter to the alternate procedure instead of the out parameter of the ft-procedure; if the acceptance test is satisfied then the contents of the "temporary replica" variables are copied to the corresponding out parameters of the ft-procedure;
  - the additional parameter alternatesuccess is added;
- the function Test given after the word ensure in the Ada dialect is called to check a global test; again, its actual parameters include the "temporary replica" variables instead of the actual out parameters of the ft-procedure;
- the exit statement is placed in this loop after the case statement; it checks Test and alternatesuccess;
- before ending the loop, the variable alternate is increased by one.

Note that only one ft-procedure can appear in a declarative part, because its translation declares the exception FAILURE; we do not concentrate on this problem, but our dialect could be extended to allow several ft-procedures to be declared in one block. A solution would be to complicate the pre-processor so that it declares the FAILURE exception only once, or, in a more complex way, to change the dialect and give the developer an opportunity to specify a name for this exception. Each alternate declaration is translated as follows:

alternate Altern1(declaration of parameters of ft-procedure);

procedure Altern1(declaration of parameters of ft-procedure;
                  alternatesuccess: out BOOLEAN);

Each alternate body is translated as follows:

alternate Altern1(declaration of parameters of ft-procedure) is
within time_out1
    task T11 is ....; end T11;
    .....
    task T1N is ....; end T1N;
    task body T11 is
    begin
        ........
        task_done;
    end T11;
    .......
    task body T1N is
    begin
        ........
        alternate_error;
        .........
        task_done;
    end T1N;
end Altern1;

procedure Altern1(declaration of parameters of ft-procedure;
                  alternatesuccess: out BOOLEAN) is
    task T11 is ....; end T11;
    .....
    task T1N is ....; end T1N;
    task WATCHDOG is
        entry ACTION;
        entry DONE;
    end WATCHDOG;

    task body WATCHDOG is
        -- task number in alternate 1:
        TaskNumber: constant := N;
        -- constraint for 1st alternate:
        TimeConstr: constant := time_out1;
        task_count: INTEGER := 0;
        this_alt_deadline: TIME;
    begin
        this_alt_deadline := CLOCK + TimeConstr;
        loop
            select
                accept ACTION;
                abort T11, ..., T1N;
                alternatesuccess := FALSE;
                exit;
            or -- one task ends:
                accept DONE;
                task_count := task_count + 1;
                if task_count = TaskNumber then exit; end if;
            or -- time constraint
                delay (this_alt_deadline - CLOCK);
                abort T11, ..., T1N;
                alternatesuccess := FALSE;
                exit;
            end select;
        end loop;
    exception
        when others =>
            abort T11, ..., T1N;
            alternatesuccess := FALSE;
    end WATCHDOG;

    task body T11 is
    begin
        .......
        -- corresponds to task_done:
        WATCHDOG.DONE;
    end T11;
    .......
    task body T1N is
    begin
        .....
        -- corresponds to alternate_error:
        WATCHDOG.ACTION;
        .....
        -- corresponds to task_done:
        WATCHDOG.DONE;
    end T1N;
begin
    null; -- Altern1 body
end Altern1;

Each alternate written in the dialect is processed following these rules:

- first, the list of the names of all application tasks declared in the alternate (that is, T11, ..., T1N) is collected;
- each statement task_done in the bodies of these tasks is translated into the entry call WATCHDOG.DONE;
- each statement alternate_error in the bodies of these tasks is translated into the entry call WATCHDOG.ACTION;
- the specification and the body of the special WATCHDOG task are inserted in the Ada text after the specifications of all the application tasks: i) the list of all tasks is used in abort statements; ii) the simple expression time_out1 given with the keyword within is used in the declaration of the constant TimeConstr; iii) the number of tasks is calculated during pre-processing and put in the constant TaskNumber; iv) the default catch-all handler is inserted into the ft-procedure body.

Appendix C. The SHARES Matrix Example

Consider that the TDSC called ResourceShares is implemented as discussed in Section 4.3. It includes eight methods to set and to examine the share and the claim of one user and of an entire sector: Set_Claim, Get_Claim, Set_Share, Get_Share, Set_Total_Claim, Get_Total_Claim, Set_Total_Share, Get_Total_Share. In addition, three transaction control procedures are provided: StartTransactionResourceShares, AbortTransactionResourceShares and CommitTransactionResourceShares. We do not assume any particular concurrency control policy. The ft-procedure UPDATE_SHARES, which has two alternates, CONCURRENT and SEQUENTIAL, is declared as follows:

ft_procedure UPDATE_SHARES(CLAIMS: in CLAIMS_matrix);

ft_procedure UPDATE_SHARES(CLAIMS: in CLAIMS_matrix) is
  ensure Test
  by CONCURRENT
  else by SEQUENTIAL
end UPDATE_SHARES;
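The ensure/by/else construct is the classic recovery-block control flow: run the primary alternate inside a transaction, evaluate the acceptance test, and on failure (or when the alternate signals an error) roll the transactional state back and try the next alternate. A minimal, language-neutral Python sketch of that control flow, under our own naming (AlternateError, run_ft_procedure; none of these identifiers appear in the paper's dialect):

```python
class AlternateError(Exception):
    """Raised when an alternate signals alternate_error."""

def run_ft_procedure(alternates, acceptance_test, start, commit, abort):
    """Try each alternate in turn, rolling back on failure (recovery block).

    start/commit/abort stand in for the StartTransaction.../CommitTransaction.../
    AbortTransaction... control procedures of the TDSC.
    """
    for alternate in alternates:
        tid = start()                      # open a transaction
        try:
            alternate(tid)
            if acceptance_test():          # global test ("ensure Test")
                commit(tid)                # make the effects permanent
                return True
            abort(tid)                     # test failed: roll back
        except AlternateError:
            abort(tid)                     # alternate_error: roll back
    return False                           # all alternates exhausted
```

In the paper's scheme the roll-back also involves the conversation machinery and the WATCHDOG time-out, which this sketch omits.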

These alternates and the global test function have the following declarations and bodies:

alternate CONCURRENT(CLAIMS: in CLAIMS_matrix);
alternate SEQUENTIAL(CLAIMS: in CLAIMS_matrix);
function Test return BOOLEAN; -- idle in our example

alternate CONCURRENT(CLAIMS: in CLAIMS_matrix) is
within time1
  task SharesUpdateLeader is
    entry TransactionId(i: out id);
    entry NewSectorClaim(g: in ClaimType);
    entry OldSectorShare(w: in SharesType);
    entry UpdateCoef(u: out SharesType);
    entry ACK;
  end SharesUpdateLeader;
  task SecUpdateTask1;
  ...
  task body SharesUpdateLeader is
    wid: SharesPointer; -- transaction id
    one_share, old_share, new_share: SharesType;
    one_claim, sum_claim: ClaimType;
    coef: SharesType;
    ...
  begin
    StartTransactionResourceShares(wid, WRITE, ...);
    for I in 1 .. MaxSectorNumber loop
      accept TransactionId(i: out id) do
        i := wid;
      end TransactionId;
    end loop;
    sum_claim := 0; old_share := 0; -- for acceptance test
    for I in 1 .. MaxSectorNumber loop
      accept NewSectorClaim(g: in ClaimType) do
        sum_claim := sum_claim + g;
      end NewSectorClaim;
      accept OldSectorShare(w: in SharesType) do
        old_share := old_share + w;
      end OldSectorShare;
    end loop;
    coef := old_share / sum_claim;
    for I in 1 .. MaxSectorNumber loop
      accept UpdateCoef(u: out SharesType) do
        u := coef;
      end UpdateCoef;
    end loop;
    for I in 1 .. MaxSectorNumber loop
      accept ACK;
    end loop;
    new_share := 0; -- for acceptance test
    for I in 1 .. MaxSectorNumber loop
      for J in 1 .. MaxUserNumber loop
        Get_Share(wid, I, J, one_share, ...);
        new_share := new_share + one_share;
      end loop;
    end loop;
    CommitTransactionResourceShares(wid, ...);
    if old_share = new_share then
      task_done; -- local test
    else
      alternate_error;
    end if;
  end SharesUpdateLeader;

  task body SecUpdateTask1 is
    mySector: constant := 1;
    wid, widnest: SharesPointer;
    locsectorshare, newsectorshare: SharesType;
    sectorclaim: ClaimType;
    newcoef: SharesType;
    ...
  begin
    SharesUpdateLeader.TransactionId(wid);
    StartTransactionResourceShares(widnest, WRITE, ..., wid);
    sectorclaim := 0;
    for I in 1 .. MaxUserNumber loop
      sectorclaim := sectorclaim + CLAIMS(mySector, I);
      Set_Claim(widnest, mySector, I, CLAIMS(mySector, I), ...);
    end loop;
    Set_Total_Claim(widnest, mySector, sectorclaim, ...);
    CommitTransactionResourceShares(widnest, ..., wid);
    SharesUpdateLeader.NewSectorClaim(sectorclaim);
    Get_Total_Share(wid, mySector, locsectorshare, ...);
    SharesUpdateLeader.OldSectorShare(locsectorshare);
    SharesUpdateLeader.UpdateCoef(newcoef);
    newsectorshare := newcoef * sectorclaim;
    Nested.UPDATE_SECTOR_SHARE(wid, mySector, newsectorshare, newcoef); -- NESTED CONV
    SharesUpdateLeader.ACK;
    task_done;
  end SecUpdateTask1;
  ....
end CONCURRENT;

alternate SEQUENTIAL(CLAIMS: in CLAIMS_matrix) is
within time2
  task Single;
  task body Single is
    ....
  end Single;
end SEQUENTIAL;
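Stripped of the tasking and transaction machinery, the computation that CONCURRENT distributes between the leader and the sector tasks is simple: sum the new claims, compute a coefficient equal to the old total share divided by the total claim, and give each user a new share proportional to its claim; the local acceptance test checks that the total share is conserved. A sequential Python sketch of that arithmetic (our own simplification of the alternate above, not code from the paper):

```python
def update_shares(claims, shares):
    """claims, shares: dict mapping (sector, user) -> number.

    Returns the new shares matrix; raises if the total share is not
    conserved (the role of the local acceptance test old_share = new_share).
    """
    old_total = sum(shares.values())          # old_share accumulated by the leader
    total_claim = sum(claims.values())        # sum_claim from NewSectorClaim calls
    coef = old_total / total_claim            # coef := old_share / sum_claim
    new_shares = {key: claim * coef for key, claim in claims.items()}
    # local acceptance test: the total share must be conserved
    if abs(sum(new_shares.values()) - old_total) > 1e-9:
        raise ValueError("alternate_error: share total not conserved")
    return new_shares
```

The exact-equality test old_share = new_share in the Ada text presumes exact (e.g. fixed-point) arithmetic; the floating-point sketch uses a tolerance instead.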

The nested ft-procedure UPDATE_SECTOR_SHARE and its alternates (D_CONCURRENT, D_SEQUENTIAL) are declared as follows:

ft_procedure UPDATE_SECTOR_SHARE(id: in SharesPointer;
                                 d: in SectorNumber;
                                 tw: in SharesType; -- new sector share
                                 coef: in SharesType);

alternate D_CONCURRENT(id: in SharesPointer; d: in SectorNumber;
                       coef: in SharesType; tw: in SharesType);
alternate D_SEQUENTIAL(id: in SharesPointer; d: in SectorNumber;
                       coef: in SharesType; tw: in SharesType);
function NestedTest return BOOLEAN; -- idle in the example

ft_procedure UPDATE_SECTOR_SHARE(id: in SharesPointer; d: in SectorNumber;
                                 tw: in SharesType; coef: in SharesType) is
  ensure NestedTest
  by D_CONCURRENT
  else by D_SEQUENTIAL
end UPDATE_SECTOR_SHARE;

alternate D_CONCURRENT(wid: in SharesPointer; d: in SectorNumber;
                       coef: in SharesType; tw: in SharesType) is
within time11
  task SectorLeader is
    entry TransactionId(i: out id);
    entry ACK;
  end SectorLeader;
  task User2;
  ...
  task body SectorLeader is
    widnest: SharesPointer; -- nested transaction id
    one_claim: ClaimType;
    total_share, one_share, sum_share: SharesType;
    ...
  begin
    StartTransactionResourceShares(widnest, WRITE, ..., wid);
    Set_Total_Share(widnest, d, tw, ...);
    for K in 2 .. MaxUserNumber loop
      accept TransactionId(i: out id) do
        i := widnest;
      end TransactionId;
    end loop;
    Get_Claim(widnest, d, 1, one_claim, ...);
    Set_Share(widnest, d, 1, one_claim * coef, ...);
    for K in 1 .. MaxUserNumber loop
      accept ACK; -- entry call from each User task
    end loop;
    sum_share := 0; -- for acceptance test
    for K in 1 .. MaxUserNumber loop
      Get_Share(widnest, d, K, one_share, ...);
      sum_share := sum_share + one_share;
    end loop;
    Get_Total_Share(widnest, d, total_share, ...);
    CommitTransactionResourceShares(widnest, ...);
    if total_share = sum_share then
      task_done; -- local test
    else
      alternate_error;
    end if;
  end SectorLeader;

  task body User2 is
    widnest: SharesPointer; -- transaction id
    one_claim: ClaimType;
    myUser: constant := 2;
    ...
  begin
    SectorLeader.TransactionId(widnest);
    Get_Claim(widnest, d, myUser, one_claim, ...);
    Set_Share(widnest, d, myUser, one_claim * coef, ...);
    SectorLeader.ACK;
    task_done;
  end User2;
  ...
end D_CONCURRENT;

alternate D_SEQUENTIAL(id: in SharesPointer; d: in SectorNumber;
                       coef: in SharesType; tw: in SharesType) is
within time22
  task Single;
  task body Single is
    ...
  end Single;
end D_SEQUENTIAL;
...
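The nested alternate applies the same pattern one level down: the sector's total share is set to tw, each user's share becomes claim * coef, and the local acceptance test compares the stored sector total against the sum of the user shares. A Python sketch of just that arithmetic (our own simplification, with hypothetical names):

```python
def update_sector_share(user_claims, coef, tw):
    """user_claims: dict mapping user -> claim within one sector.

    Returns (sector total, per-user shares); raises if the stored total
    disagrees with the sum of the user shares (the local test
    total_share = sum_share of the SectorLeader task).
    """
    user_shares = {u: c * coef for u, c in user_claims.items()}
    total_share = tw                          # Set_Total_Share(..., tw, ...)
    # local acceptance test: the sector total must equal the sum of shares
    if abs(sum(user_shares.values()) - total_share) > 1e-9:
        raise ValueError("alternate_error: sector total inconsistent")
    return total_share, user_shares
```

The test holds exactly when the caller passes tw = coef * (sum of the claims), which is how newsectorshare is computed in SecUpdateTask1 above.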