Formalization of the CA Action Concept Based on Temporal Logic

D. Schwier and F. von Henke, University of Ulm, Germany
J. Xu, R.J. Stroud, A. Romanovsky, and B. Randell, University of Newcastle upon Tyne, UK

Abstract: This paper presents a formalization of CA actions based on temporal logic. CA actions are a structuring technique for the organization and subdivision of concurrent systems. The patterns of interactions between concurrent CA actions reflect the dynamic behaviour of a system. The main properties of the CA action concept are: atomicity, well-defined boundaries, nesting, concurrency, error recovery and multiple outcomes. Linear time temporal logic is used as a formal framework for the formalization, according to the inherent dynamic and concurrent nature of CA actions. The proposed methods for building concurrent systems from CA actions rely on the atomicity of CA actions as state transitions at some abstraction level. At that level, a CA action can be treated as a single "large" state transition, free from interference from the rest of the system. Thus, only the pre- and post-conditions for the CA action need to be considered. The possible outcomes of a CA action can be divided into four cases: normal, exceptional, aborted and failure.
1 Introduction

Real-life concurrent and distributed systems are often extremely complex. Faults occur and cause errors. Their consequences cannot always be confined to a system component or even a whole computer, nor prevented from affecting the environment of the computer system. One way to control such complexity, and hence facilitate recovery after an error has been detected, is to restrict interaction and communication between concurrent activities. The usual tool employed in both research and practice to achieve this goal is atomic actions.
The activity of a group of components or objects constitutes an atomic action if there are no interactions between that group and the rest of the system for the duration of the activity [LA90]. An atomic action provides an abstraction that allows the programmer to group a set of operations on objects into a logical execution unit. It can also provide a way of gluing multiple execution threads together and enclosing both their normal and their recovery activities. The coordinated atomic (or CA) action concept [XRR+95] is a generalized form of the basic atomic action structure. CA actions present a general technique for achieving fault tolerance by integrating the concepts of conversations (that enclose cooperatively concurrent activities), transactions (that ensure consistent access to external shared objects), and exception handling (for error recovery) into a uniform structuring framework. A CA action is thus a mechanism for coordinating multi-threaded interactions and ensuring consistent access to objects in the presence of concurrency and potential faults. More precisely, a CA action provides a logical enclosure of a group of operations on a collection of objects. These operations are performed cooperatively by one or more roles executing in parallel within the CA action. The interface to a CA action specifies the objects that are to be manipulated by the CA action and the roles that are to manipulate these objects. In order to perform a CA action, a group of execution threads must come together and agree to perform each role in the CA action concurrently, with each thread undertaking its specified role. For a given CA action, there are two kinds of objects to be manipulated by the CA action: external objects and local objects. The effect of a CA action can be observed only through the state of its external objects. The external objects of a CA action can be shared, sequentially or concurrently, with other actions and threads.
If a particular implementation permits only sequential sharing, the execution atomicity of a CA action will always hold. However, even if a particular implementation permits concurrent sharing for some reason (e.g. performance concerns), these external objects must behave atomically with respect to other CA actions and threads, so that the effects of any operations that the roles within a CA action perform on external objects are not visible to other threads or CA actions until that CA action terminates. Local objects that are purely internal to a CA action are used to support interactions and communications between the roles of that action. However, because the CA action concept is recursive and CA actions can be nested, it is important to realize that the notions of both external objects and local objects are relative; for example, external objects of a nested CA action may be the objects local to the enclosing CA action. If it is to support backward error recovery, a CA action must provide a recovery line which coordinates the recovery points of the objects and the threads participating in the action so as to avoid the domino effect [Ran75]. To support forward error recovery, a CA action must provide an effective means of coordinating the use of exception handlers. An acceptance
test can and ideally should be provided in order to determine whether the outcome of the CA action is successful or acceptable. The various threads participating in a given CA action enter and leave the action synchronously. If an error is detected inside a CA action (either by the acceptance test or some other form of error detection), appropriate forward and/or backward recovery measures must be invoked cooperatively in order to reach some mutually consistent conclusion. The desired effect of performing a CA action is usually characterized in terms of a normal outcome and a series of exceptional (or degraded) outcomes. The effects of performing a CA action can be checked by an acceptance test that allows both a normal outcome and one or more exceptional outcomes, with each exceptional outcome signalling a specified exception to the surrounding environment. However, if it is not possible to satisfy the acceptance test at the end of a performance, even by signalling one of the specified exceptions, the CA action is considered to have failed. It is therefore necessary to undo the potentially visible effects of the CA action in this case and signal an abort exception to the surrounding environment. If the CA action is unable to satisfy the "all or nothing" property necessary to guarantee atomicity (e.g. because the undo fails), then an actual "failure exception" must be signalled to the surrounding environment, indicating that the CA action has failed to pass its acceptance test and that its effects have not been undone. The system has probably been left in an erroneous state and this should if possible be dealt with by the enclosing CA action (if any). Each of the threads involved in a given performance of a CA action should receive an indication of the result: a normal outcome, an exceptional outcome, an abort exception, or a failure exception. It is important that all of the threads should agree about the outcome, whether it is normal, exceptional, abort or failure.
If the threads fail to reach such agreement, then the CA action mechanism itself has failed and this failure must be dealt with at a higher level in the system. Enclosure, or atomicity, is one of the most important properties of CA actions. In order to make error recovery effective, a CA action constitutes a logical enclosure of recoverable activities of multiple interacting roles and creates a "time-space boundary" that role interaction and communication must not cross. The boundary of a CA action consists of the entrance, the exit and two side firewalls. Information can be passed only at the entrance and exit of an action, and during the execution of the action a role inside the action cannot interact or communicate in any way with a role or thread that is not in the action (i.e. it cannot cross the side firewalls). In particular, although for a particular implementation the external objects of a CA action may be shared with other CA actions and threads concurrently, these objects must behave atomically so that they cannot be used as a means of interaction and communication that would enable information to be "smuggled" [Kim82] into or out of that action. The following sections first introduce the form of temporal logic that is used for the formalization and then develop the various aspects of CA actions.
2 Temporal Logic

In this paper, a linear-time temporal logic system is used as a specification language for specifying and proving properties of the CA action concept. Lamport [Lam94], Manna and Pnueli [MP91, MP95] and others give a detailed description of the temporal logic framework. In this section, we summarize those aspects needed for formalizing the properties of CA actions. The syntax of temporal logic formulas extends the syntax of formulas in ordinary first-order predicate logic by the temporal operators $\Diamond$ (eventually), $\Box$ (henceforth), and $\Diamond^{-}$ (sometime in the past). The temporal operators have higher binding power than the boolean operators and quantifiers; parentheses may be used whenever needed. The variables occurring in a temporal logic formula may refer (or correspond) to program variables or may be logical variables. Temporal logic formulas (or, for short, temporal formulas) are interpreted over (in general infinite) sequences of program states. Such sequences are generated by the execution of a program (such as a CA action instance $a$). Each state assigns values to the program variables; an execution sequence can thus be viewed as a trace of the values of the program variables. A formula of the form $\Box P$ is true in a state of an execution sequence if $P$ is true in that state and all subsequent states of the sequence. A formula of the form $\Diamond P$ is true in a state of an execution sequence if a successor state of that state in the sequence exists for which $P$ is true; correspondingly, $\Diamond^{-} P$ is true if a predecessor state exists in which $P$ is true. Most formulas to be considered below have the form $\Box P$, i.e. they state properties that are always required to hold (in the literature frequently referred to as safety properties or invariants). The concurrent execution of several programs is modeled by interleaving the execution sequences of the programs.
The interleaving is in general nondeterministic, so that several different interleavings may exhibit the same overall behaviour. A temporal formula describing a property of a concurrent system thus characterizes all execution sequences for which the formula holds. For a CA action instance $a$ we call a temporal formula $F$ a property of $a$ if all execution sequences of $a$ satisfy $F$.
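The semantics of the three temporal operators can be illustrated with a small evaluator. This is only a sketch under stated assumptions: traces are finite lists of states (the paper works with generally infinite sequences), the eventually and past operators are read reflexively (the current state counts), and all function names (`atom`, `henceforth`, `eventually`, `once`, `is_property`) are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]          # maps a proposition name to its truth value
Trace = List[State]              # finite prefix of an execution sequence (assumption)
Formula = Callable[[Trace, int], bool]


def atom(name: str) -> Formula:
    """The proposition `name` holds in state i."""
    return lambda tr, i: bool(tr[i].get(name, False))


def implies(f: Formula, g: Formula) -> Formula:
    return lambda tr, i: (not f(tr, i)) or g(tr, i)


def henceforth(f: Formula) -> Formula:
    """Box: f holds in state i and in all subsequent states."""
    return lambda tr, i: all(f(tr, j) for j in range(i, len(tr)))


def eventually(f: Formula) -> Formula:
    """Diamond: f holds in state i or in some later state (reflexive reading)."""
    return lambda tr, i: any(f(tr, j) for j in range(i, len(tr)))


def once(f: Formula) -> Formula:
    """Past diamond: f holds in state i or in some earlier state."""
    return lambda tr, i: any(f(tr, j) for j in range(0, i + 1))


def is_property(f: Formula, traces: List[Trace]) -> bool:
    """F is a property of an action if every execution sequence satisfies F."""
    return all(f(tr, 0) for tr in traces)


# Example formula: henceforth, "end" implies that "begin" held sometime in the past.
ends_after_begin = henceforth(implies(atom("end"), once(atom("begin"))))
good = [{"begin": True}, {}, {"end": True}]
bad = [{}, {"end": True}]
print(is_property(ends_after_begin, [good]))       # True
print(is_property(ends_after_begin, [good, bad]))  # False
```

Representing a formula as a function of a trace and a position keeps nested operators composable, which is all the later formulas of the form $\Box P$ require.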
3 Fundamentals of CA Actions

In order to formalize the CA action concept, we need a simple model to describe the software systems we are considering. We define a system as a set of interacting objects.
An object is a named entity that combines a data structure (internal state) with some associated operations; these operations determine the externally visible behaviour of the object. A thread is an agent of computation and an active entity that is responsible for executing a sequence of operations on objects. A system is said to be concurrent if it contains multiple threads that behave as though they are all in progress at one time. In a distributed or parallel computing environment, this may literally be true: several threads may execute at once, each on its own processing node. A CA action provides a mechanism for performing a group of operations on a collection of local and external objects. Within the CA action these operations are performed cooperatively by one or more threads undertaking roles that execute in parallel. A role specifies a sequence of operations on a set of objects. In the following, let $C$ always denote the set of instances of CA actions in a given context. When it causes no confusion, we will use the phrase "action $c$" instead of "CA action instance $c$". The following attributes are defined for every action $c$.
$\mathit{roles}(c)$ denotes the set of roles of action $c$.
$\mathit{parent}(c)$ denotes the CA action enclosing $c$; we will use this attribute only in contexts in which we can assume that an enclosing action exists.

There are several important sets of objects associated with an action $c$:
$\mathit{ex\_objects}(c)$ is the set of external objects of $c$; these objects can be shared with other actions and threads.
$\mathit{local\_objects}(c)$ is the set of local objects of $c$.
In addition, $\mathit{objects}(s)$ is the set of objects on which the operation sequence $s$ is performed, where $s$ is a role or a single operation.
For any action $c \in C$, we consider two basic predicates: $\mathit{begin}(c)$ is true in exactly that state in which action $c$ starts, while $\mathit{end}(c)$ is true in the state where action $c$ terminates, i.e. the predicates characterize the initial state and the final state, respectively, in execution sequences of $c$.
4 Interface Properties of CA Actions

This section specifies the basic properties of a CA action as they are observable from the enclosing program. Viewed from the outside, a CA action performs what amounts to an atomic state transition. The properties stated here are intended to be sufficient for proofs involving an action; in turn, they are requirements on the body of the action.
We need to introduce two new predicates in order to specify the interface properties: $\mathit{called}(r)$ is true at the state in an execution sequence when role $r$ is called by a thread, and $\mathit{return}(r)$ is only true at the state at which $r$ returns to its caller. As a simple consequence of the definitions of $\mathit{begin}(c)$ and $\mathit{end}(c)$, it follows that each $\mathit{end}(c)$ must be preceded by a $\mathit{begin}(c)$:
$$\Box(\mathit{end}(c) \Rightarrow \Diamond^{-}\,\mathit{begin}(c))$$

In addition, we require that every CA action will eventually terminate:

$$\Box(\mathit{begin}(c) \Rightarrow \Diamond\,\mathit{end}(c))$$

For a CA action $c$ to begin it is necessary that each of its roles has been called earlier by a thread.

$$\Box(\mathit{begin}(c) \Rightarrow \forall r \in \mathit{roles}(c)\; \Diamond^{-}\,\mathit{called}(r))$$

When a CA action $c$ ends, all of its roles must return in due course.

$$\Box(\mathit{end}(c) \Rightarrow \forall r \in \mathit{roles}(c)\; \Diamond\,\mathit{return}(r))$$

The main effect of a CA action is described by pre- and post-conditions on the values of the objects external to $c$ (i.e. the objects in the set $\mathit{ex\_objects}(c)$) in the initial and final states of $c$. The pre-condition $\mathit{pre}(c)$ serves to specify the condition under which the post-condition is expected to hold; its satisfaction is not required for the action $c$ to start, thus $\mathit{pre}(c)$ is in general independent of $\mathit{begin}(c)$. There are four categories of possible outcomes of an action $c$, that is, normal outcomes, exceptional outcomes, no outcome (when an action is aborted) and outcomes caused by failure. With respect to the different outcome categories the final state $\mathit{end}(c)$ can be further classified into four mutually exclusive sub-states (the combinator $\mathit{one\_of}$ expresses that exactly one of its arguments is true).

$$\mathit{end}(c) := \mathit{one\_of}(\mathit{normal\_end}(c),\ \mathit{exceptional\_end}(c),\ \mathit{aborted\_end}(c),\ \mathit{failed\_end}(c))$$
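The four interface properties and the `one_of` combinator can be checked mechanically on a recorded execution. The following is a minimal sketch, assuming a finite trace in which each state is the set of atomic propositions that hold in it; the proposition spellings (`"begin"`, `"called:r1"`, `"return:r1"`, and the four outcome names) and the function names are hypothetical, and past and future are read reflexively.

```python
OUTCOMES = ("normal_end", "exceptional_end", "aborted_end", "failed_end")


def one_of(*args):
    """True when exactly one argument is true."""
    return sum(1 for a in args if a) == 1


def ended(state):
    """end(c) := one_of(normal_end, exceptional_end, aborted_end, failed_end)."""
    return one_of(*(name in state for name in OUTCOMES))


def began(state):
    return "begin" in state


def satisfies_interface(trace, roles):
    """Check the four interface properties of one action on a finite trace."""
    n = len(trace)

    def earlier(i, pred):          # pred held in state i or before
        return any(pred(trace[j]) for j in range(i + 1))

    def later(i, pred):            # pred holds in state i or after
        return any(pred(trace[j]) for j in range(i, n))

    for i, state in enumerate(trace):
        if ended(state):
            if not earlier(i, began):
                return False       # an end without a preceding begin
            if not all(later(i, lambda s, r=r: f"return:{r}" in s) for r in roles):
                return False       # some role never returns
        if began(state):
            if not later(i, ended):
                return False       # the action never terminates
            if not all(earlier(i, lambda s, r=r: f"called:{r}" in s) for r in roles):
                return False       # some role was not called beforehand
    return True


good = [{"called:r1"}, {"called:r2"}, {"begin"}, set(),
        {"normal_end"}, {"return:r1", "return:r2"}]
missing_call = [{"called:r1"}, {"begin"}, {"normal_end"},
                {"return:r1", "return:r2"}]
two_outcomes = [{"called:r1"}, {"called:r2"}, {"begin"},
                {"normal_end", "failed_end"}, {"return:r1", "return:r2"}]

print(satisfies_interface(good, ["r1", "r2"]))          # True
print(satisfies_interface(missing_call, ["r1", "r2"]))  # False
print(satisfies_interface(two_outcomes, ["r1", "r2"]))  # False
```

In the last trace two outcome propositions hold at once, so `ended` is false in that state and the action is treated as never terminating.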
In order to characterize these sub-states, we distinguish between different types of post-conditions: $\mathit{n\_post}(c)$ is the post-condition for execution with normal outcome, $\mathit{e\_post}(c)$ the post-condition for exceptional outcome.
If a CA action $c$ began in a state that satisfied the pre-condition and ends normally, then its normal post-condition should hold.

$$\Box(\Diamond^{-}(\mathit{begin}(c) \land \mathit{pre}(c)) \land \mathit{normal\_end}(c) \Rightarrow \mathit{n\_post}(c))$$

If a CA action $c$ began in a state that satisfied the pre-condition and ends exceptionally (with a signalled exception), then one of its exceptional post-conditions should hold. We assume that the post-condition $\mathit{e\_post}(c)$ includes all possible cases of exceptional termination.
$$\Box(\Diamond^{-}(\mathit{begin}(c) \land \mathit{pre}(c)) \land \mathit{exceptional\_end}(c) \Rightarrow \mathit{e\_post}(c))$$

If a CA action $c$ ends with abortion, there should be no visible effect on the external objects. We express this by stating that whatever the values of the external objects were when the action started, they are unchanged at termination of the action. In the following formula, $\mathit{e\_objs}$ stands for the list (or vector) of the objects $o \in \mathit{ex\_objects}(c)$, and the (logical) variable $o_{\mathit{init}}$ for a corresponding list of (initial) values.

$$\forall o_{\mathit{init}}\; \Box(\Diamond^{-}(\mathit{begin}(c) \land \mathit{e\_objs} = o_{\mathit{init}}) \land \mathit{aborted\_end}(c) \Rightarrow \mathit{e\_objs} = o_{\mathit{init}})$$
In the case of termination with failure, nothing can be said about the values of external objects, thus no meaningful post-condition can be given for this case.
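The post-condition formulas above can be turned into simple checks over a recorded execution. The sketch below assumes a finite trace whose states are dictionaries carrying the outcome flags and a snapshot of the external objects under the key `"e_objs"`; the account example, the predicates `pre` and `n_post`, and the function names are all hypothetical.

```python
def normal_post_holds(trace, pre, n_post):
    """If the action began in a state satisfying pre and ends normally, n_post must hold."""
    for i, state in enumerate(trace):
        if state.get("normal_end"):
            began_with_pre = any(p.get("begin") and pre(p) for p in trace[: i + 1])
            if began_with_pre and not n_post(state):
                return False
    return True


def abort_leaves_no_effect(trace):
    """On abort, the external objects must equal their values at begin."""
    for i, state in enumerate(trace):
        if state.get("aborted_end"):
            for past in trace[: i + 1]:
                if past.get("begin") and past["e_objs"] != state["e_objs"]:
                    return False
    return True


# Hypothetical action that withdraws 30 from an account.
pre = lambda s: s["e_objs"]["balance"] >= 30
n_post = lambda s: s["e_objs"]["balance"] >= 0

committed = [
    {"begin": True, "e_objs": {"balance": 50}},
    {"e_objs": {"balance": 20}},
    {"normal_end": True, "e_objs": {"balance": 20}},
]
aborted = [
    {"begin": True, "e_objs": {"balance": 50}},
    {"e_objs": {"balance": 20}},
    {"aborted_end": True, "e_objs": {"balance": 50}},   # undo restored the value
]
leaky_abort = [
    {"begin": True, "e_objs": {"balance": 50}},
    {"aborted_end": True, "e_objs": {"balance": 20}},   # effect still visible
]

print(normal_post_holds(committed, pre, n_post))  # True
print(abort_leaves_no_effect(aborted))            # True
print(abort_leaves_no_effect(leaky_abort))        # False
```

No check is written for the failure case, in line with the observation that no meaningful post-condition can be given for it.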
All threads participating in the execution of a CA action are synchronized. All threads leaving a CA action can rely on the fact that the CA action has terminated and the effects of the action have taken place. Depending on the actual outcome of a CA action, either one of the post-conditions is satisfied, the CA action has aborted and no state change has happened, or the CA action has failed. Each role returns from a call whenever the corresponding instance terminates.
5 Enclosure Property

The enclosure property (often referred to as atomicity in the area of transaction processing [LMWF94]) of a CA action can be summarized as follows: 1) Information can be passed only at the entrance and exit of the CA action, and 2) During the execution of the action (i.e. between the begin and end of the action) a role inside that action cannot interact or communicate in any way with a role or thread that is not in the action. For any action $c \in C$, we define a predicate: $\mathit{inf\_pass}(c)$ is true in exactly that state in which information is passed from or to action $c$. Now the enclosure property of a CA action $c$ can be stated formally.

$$\forall c \in C\; \Box(\mathit{inf\_pass}(c) \Rightarrow \mathit{begin}(c) \lor \mathit{end}(c))$$
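One operational reading of this property is a detector that scans a log of reads and writes for information crossing the boundary while the action is executing. This is a sketch under assumptions, not a definition from the paper: the event format `(actor, operation, object)`, the actor name `"c"` for the action, and the rule "a read of a value last written on the other side of the boundary counts as information passing" are all illustrative choices.

```python
def violates_enclosure(events, action="c"):
    """Return True if information passes into or out of `action` while it executes.

    events: iterable of (actor, op, obj) with op in {"begin", "end", "read", "write"}.
    Writes made before the action begins are treated as information passed at entry.
    """
    active = False
    last_writer_inside = {}   # obj -> True if last written by the action since it began
    for actor, op, obj in events:
        inside = (actor == action)
        if inside and op == "begin":
            active, last_writer_inside = True, {}
        elif inside and op == "end":
            active = False
        elif active and op == "write":
            last_writer_inside[obj] = inside
        elif active and op == "read" and obj in last_writer_inside:
            # Reader and last writer are on opposite sides of the boundary.
            if last_writer_inside[obj] != inside:
                return True
    return False


isolated = [("c", "begin", None), ("c", "write", "x"), ("c", "read", "x"),
            ("c", "end", None), ("t", "read", "x")]
leak_out = [("c", "begin", None), ("c", "write", "x"), ("t", "read", "x"),
            ("c", "end", None)]
leak_in = [("c", "begin", None), ("t", "write", "x"), ("c", "read", "x"),
           ("c", "end", None)]

print(violates_enclosure(isolated))  # False
print(violates_enclosure(leak_out))  # True
print(violates_enclosure(leak_in))   # True
```

In the first log the outside thread `t` reads `x` only after the action has ended, which corresponds to information passing at the exit and is therefore allowed.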
For a given CA action its external objects are the only means of passing information from or to the action. This is because the effect of a CA action can be observed only through the state of its external objects. If $\mathit{inf\_pass}(c)$ is true during the execution of $c$, then 1) other actions or outside threads observe the state change of $c$'s external objects made by $c$ after it begins, or 2) $c$ observes the state change of its external objects made by other actions or outside threads after $c$ begins. It follows that the enclosure property of an action $c$ always holds if a particular implementation guarantees that $c$ is executed as a whole with respect to its external objects, and no other actions or outside threads can access those objects, i.e. $\mathit{inf\_pass}(c)$ is false during the execution of $c$. However, this is too restrictive; in practice, by carefully interleaving the operations of other actions and/or threads on the external objects of $c$, it is possible to increase concurrency and improve performance, without violating the enclosure property.

Consider a particular implementation that permits concurrent access, from other actions and/or threads, to the external objects of a given action $c$. We will derive two conditions for action $c$ to keep the enclosure property, i.e. $\mathit{inf\_pass}(c)$ is false during the execution of $c$, despite concurrent access to its external objects. Let $C_p$ be the set of action instances at the same level of nesting within a given enclosing action $p$, i.e. $C_p = \{a \in C \mid p = \mathit{parent}(a)\}$, and let $V_o$ be the set of possible values of an object $o$. We first define three relationships between actions and roles at a given level of nesting. For any actions $c, d \in C_p$, the execution of action $c$ is said to indivisibly precede the execution of action $d$ if the following holds:

$$\Box(\mathit{begin}(d) \Rightarrow \Diamond^{-}\,\mathit{end}(c))$$

$$\forall o \in \mathit{ex\_objects}(c) \cap \mathit{ex\_objects}(d)\;\; \forall o_i \in V_o\;\; \Box(\mathit{end}(c) \land o = o_i \Rightarrow \Diamond(\mathit{begin}(d) \land o = o_i))$$

We use $c \prec d$ to represent the execution sequences where $c$ indivisibly precedes $d$. Two actions $c, d \in C_p$ are said to be concurrent if there is always a point in their execution sequences at which both actions are performing their own operations, that is

$$\Box(\lnot(\Diamond\,\mathit{begin}(c) \lor \Diamond^{-}\,\mathit{end}(c) \lor \Diamond\,\mathit{begin}(d) \lor \Diamond^{-}\,\mathit{end}(d)))$$
We use $c \parallel d$ to represent execution sequences where $c$ and $d$ are concurrent. Let us now consider a more complex relationship at a given level of nesting between an action $c$ and the roles of its parent $p$ that do not participate in $c$, i.e. that bypass $c$. We define $\mathit{bypass}(p, c)$ as the set of such roles of $p$. For any $i \in \mathit{bypass}(p, c)$, if $\mathit{ex\_objects}(c) \cap \mathit{objects}(i) \neq \emptyset$, i.e. if role $i$ has access to some of $c$'s external objects, then we have to ensure that the operations of role $i$ on these external objects do not interfere with $c$. Here we define $\mathit{ex\_op}(i, c)$ as the set of such operations. (For a given operation $\mathit{op}$ of role $i$, predicate $\mathit{called}(\mathit{op})$ is true at the point in an execution sequence when $\mathit{op}$ is called by a thread and $\mathit{return}(\mathit{op})$ is true when $\mathit{op}$ returns to its caller.) In general, we use $c \parallel i$ to represent interleaved execution sequences of $c$ and role $i$. The interleaved execution of $c$ and role $i \in \mathit{bypass}(p, c)$ is said to be proper if action $c$ is executed as a whole (i.e. no operation is interleaved into the execution of $c$) between the operations in $\mathit{ex\_op}(i, c)$, that is if

$$\forall \mathit{op} \in \mathit{ex\_op}(i, c)\;\; \Box((\mathit{begin}(c) \Rightarrow \Diamond^{-}\,\mathit{return}(\mathit{op})) \lor (\mathit{end}(c) \Rightarrow \Diamond\,\mathit{called}(\mathit{op})))$$

We use $c \star i$ to represent properly interleaved execution sequences. Let $s$ be a set of possible execution sequences characterized by an expression of the form $c \prec d$, $c \parallel d$, $c \star i$ or $c \parallel i$. Let $O$ be a set of objects. For each possible execution sequence in $s$, we are interested in the values of $O$ at the end of the appropriate action ($d$ for $c \prec d$, $c$ for $c \star i$ and $c \parallel i$, and $c$ or $d$, whichever ends later, for $c \parallel d$). For any given execution sequence in $s$, we will have a set of end values for the objects in $O$. Define $\mathit{end\_values}(O, s)$ to be the set of all possible sets of end values of $O$ for execution sequences in $s$. Given the relationships defined above, we can now formalize two conditions regarding concurrent access to the external objects of an action.

1. The concurrent (interleaved) execution of an action $c$ and any action $d \in C_p$, where $p = \mathit{parent}(c)$, is in effect equivalent to one of the executions such that $c$ indivisibly precedes $d$ or $d$ indivisibly precedes $c$, that is

$$\mathit{end\_values}(\mathit{ex\_objects}(c), c \parallel d) \subseteq \mathit{end\_values}(\mathit{ex\_objects}(c), c \prec d) \cup \mathit{end\_values}(\mathit{ex\_objects}(c), d \prec c)$$

2. The concurrent (interleaved) execution of a CA action $c$ and a bypass role $i \in \mathit{bypass}(p, c)$ of its parent $p$ is equivalent to one of their proper executions, that is:

$$\mathit{end\_values}(\mathit{ex\_objects}(c), c \parallel i) \subseteq \mathit{end\_values}(\mathit{ex\_objects}(c), c \star i)$$
It follows immediately that the following theorem gives sufficient conditions for the enclosure property of a CA action $c$ to hold despite concurrent access to its external objects.

Theorem: Let $p$ be the parent action of a CA action $c \in C$. For a given implementation, the enclosure property of $c$ holds, i.e. $\mathit{inf\_pass}(c)$ is false during the execution of $c$, if the implementation guarantees:

1. For any $d \in C_p$, any interleaved execution of $c$ and $d$ satisfies Condition 1, and
2. For any $i \in \mathit{bypass}(p, c)$, any interleaved execution of $c$ and $i$ satisfies Condition 2.
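Condition 1 can be tested by brute force for small actions: enumerate every interleaving of the two actions' operations and compare the resulting end values with those of the two serial orders. The sketch below assumes that an action is a list of `(owner, function)` pairs operating on a shared dictionary of external objects, with a per-action scratch dictionary standing in for local objects; every name in it is hypothetical.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's internal order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        slots = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if k in slots else next(ib) for k in range(total)]


def end_values(schedules, initial):
    """Set of final external-object states reachable by the given schedules."""
    results = set()
    for schedule in schedules:
        state, scratch = dict(initial), {}
        for owner, fn in schedule:
            fn(state, scratch.setdefault(owner, {}))   # scratch = the owner's local objects
        results.add(tuple(sorted(state.items())))
    return results


def satisfies_condition_1(c, d, initial):
    """end_values(c || d) is a subset of end_values(c before d) union end_values(d before c)."""
    serial = end_values([c + d], initial) | end_values([d + c], initial)
    return end_values(interleavings(c, d), initial) <= serial


def read_x(state, tmp):
    tmp["x"] = state["x"]


def write_incremented(state, tmp):
    state["x"] = tmp["x"] + 1


def atomic_increment(state, tmp):
    state["x"] += 1


# Increment split into a read and a write: another action can slip in between.
c_split = [("c", read_x), ("c", write_incremented)]
d_split = [("d", read_x), ("d", write_incremented)]
# Increment performed as one indivisible operation on the external object.
c_atomic = [("c", atomic_increment)]
d_atomic = [("d", atomic_increment)]

print(satisfies_condition_1(c_split, d_split, {"x": 0}))    # False
print(satisfies_condition_1(c_atomic, d_atomic, {"x": 0}))  # True
```

The split version fails because the interleaving read, read, write, write ends with `x == 1`, a value that neither serial order can produce.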
6 Nested CA Actions

A CA action has an internal structure that is not visible to the context in which it is executed. Nesting allows a CA action instance to be composed of several, possibly concurrent, CA actions and operations. This section specifies properties that are to hold of nested CA actions. Nesting of CA actions, as expressed by the call hierarchy represented by the attribute $\mathit{parent}$, entails that initial and final states must be properly nested. When an action $d$ is a child of an action $c$ (i.e. $c = \mathit{parent}(d)$), the initial state ($\mathit{begin}(d)$) of $d$ must occur after the initial state of $c$.
$$\forall c, d \in C \mid c = \mathit{parent}(d)\;\; \Box(\mathit{begin}(d) \Rightarrow \Diamond^{-}\,\mathit{begin}(c))$$

Termination of CA actions is similarly restricted. A parent action may terminate only when all of its child actions have already terminated. For all children $d$ of an action $c$ the final state ($\mathit{end}(d)$) must occur before the final state of $c$.

$$\forall c, d \in C \mid c = \mathit{parent}(d)\;\; \Box(\mathit{end}(c) \land \Diamond^{-}\,\mathit{begin}(d) \Rightarrow \Diamond^{-}\,\mathit{end}(d))$$
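Both nesting rules can be checked in a single pass over a recorded execution. In this sketch a trace is a finite list of sets of proposition names such as `"begin:c"` and `"end:d"`, and `parent` maps each child action to its enclosing action; these encodings and the function name are assumptions made for illustration, and the past operator is read reflexively, so a child may begin in the same state as its parent.

```python
def properly_nested(trace, parent):
    """Check that every child begins after its parent and ends before it."""
    seen = set()                       # propositions that have held so far
    for state in trace:
        seen |= state                  # reflexive past: include the current state
        for child, par in parent.items():
            # begin(d) implies that begin(c) held sometime in the past
            if f"begin:{child}" in state and f"begin:{par}" not in seen:
                return False
            # end(c) and a past begin(d) imply that end(d) held sometime in the past
            if (f"end:{par}" in state and f"begin:{child}" in seen
                    and f"end:{child}" not in seen):
                return False
    return True


parent = {"d": "c"}
nested = [{"begin:c"}, {"begin:d"}, {"end:d"}, {"end:c"}]
child_outlives_parent = [{"begin:c"}, {"begin:d"}, {"end:c"}, {"end:d"}]
child_starts_first = [{"begin:d"}, {"begin:c"}, {"end:d"}, {"end:c"}]

print(properly_nested(nested, parent))                 # True
print(properly_nested(child_outlives_parent, parent))  # False
print(properly_nested(child_starts_first, parent))     # False
```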
7 Exception Handling and Fault Tolerance

This section further characterizes the interface properties of an action from a fault tolerance point of view. A CA action $c \in C$ should have a simple and deterministic behaviour, that is, either end normally or signal an appropriate exception to its enclosing action (or its environment). The termination model for exception handling is used here, i.e. when an exception is raised within an action, the corresponding exception handler copes with the exception and completes the action execution. Let $E(c)$ be the set of (internal) exceptions that can occur during the execution of action $c$, and $\epsilon(c)$ be the set of exceptions that $c$ can signal externally. We define two new state predicates: $\mathit{raise}(e, c)$ becomes true in the state where an exception $e$ is raised within $c$, and $\mathit{signal}(e, c)$ becomes true in the state where action $c$ signals an exception $e$ to its enclosing action or its user environment. Whenever an exception $e$ is raised in $c$ or signalled to $c$, the predicate $\mathit{exception}(e, c)$ will become true. The effects of error recovery are indicated by two additional state predicates: $\mathit{full\_recovery}(c)$ is true if all error recovery performed by $c$ is fully successful, and $\mathit{undone}(c)$ is true if the undo operation with respect to the effects of $c$ is fully successful.
The normal end of an action can be reached only if no exception occurs or error recovery is fully successful, that is
$$\Box(\mathit{normal\_end}(c) \Rightarrow (\lnot \exists e \in E(c)\; \Diamond^{-}\,\mathit{exception}(e, c)) \lor \Diamond^{-}\,\mathit{full\_recovery}(c))$$

The exceptional end of an action implies the signalling of an appropriate exception $e$ that specifies an exceptional outcome, that is

$$\Box(\mathit{exceptional\_end}(c) \Rightarrow \exists e \in \epsilon(c)\; \mathit{signal}(e, c))$$

The aborting termination of an action has to remove any effect that the action may have had on its external objects and signals a special abortion exception $\mathit{abort}$:

$$\Box(\mathit{aborted\_end}(c) \Rightarrow \mathit{signal}(\mathit{abort}, c) \land \Diamond^{-}\,\mathit{undone}(c))$$

The failed end of an action is reached if error recovery is not possible. In this worst case, a special failure exception $\mathit{fail}$ is signalled:
$$\Box(\mathit{failed\_end}(c) \Rightarrow \mathit{signal}(\mathit{fail}, c))$$

Within a CA action it is also important to study the relative ordering of the execution states related to exceptions and exception handling in the interests of fault tolerance. We define $\mathit{handling}(e, r)$ as a state predicate that is true in the state where role $r$ of action $c$ starts handling the exception $e$. If an exception is raised within an action and cannot be handled locally by a role, then all the roles of the action must handle the exception cooperatively:

$$\forall e \in E(c)\;\; \Box(\mathit{raise}(e, c) \Rightarrow \Diamond(\forall r \in \mathit{roles}(c)\; \mathit{handling}(e, r)))$$

The signalling of an exception from a nested action will cause all the roles of the enclosing action to handle the exception cooperatively:

$$\forall e \in \epsilon(c)\;\; \Box(\mathit{signal}(e, c) \Rightarrow \Diamond(\forall r \in \mathit{roles}(\mathit{parent}(c))\; \mathit{handling}(e, r)))$$

Note that all the roles within a given action are supposed to be able to handle the same exception. However, in a complex concurrent system, there is a possible complication that
several exceptions can occur at the same time and thus require a process of exception resolution [CR86, RXR96]. We now define a new predicate $\mathit{resolving}(c)$ that is true in the state in which action $c$ starts resolving multiple exceptions raised concurrently, and define $R(e_1, \ldots, e_k)$ as a set function that returns an exception which covers all the exceptions $e_1, \ldots, e_k$ that occurred concurrently. Two or more concurrent exceptions, raised in $c$ or signalled to $c$, will lead to the state that $c$ starts resolving these exceptions using function $R(e_1, \ldots, e_k)$.

$$\forall \mathit{ce}(c) \subseteq E(c)\;\; |\mathit{ce}(c)| \geq 2 \Rightarrow \Box((\forall e \in \mathit{ce}(c)\; \mathit{exception}(e, c)) \Rightarrow \Diamond\,\mathit{resolving}(c))$$

$$\forall \mathit{ce}(c) \subseteq E(c)\;\; |\mathit{ce}(c)| \geq 2 \Rightarrow \Box(\mathit{resolving}(c) \Rightarrow \Diamond(\forall r \in \mathit{roles}(c)\; \mathit{handling}(R(\mathit{ce}(c)), r)))$$
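The text only requires that the resolution function return an exception covering all concurrently raised ones. One possible realization, assumed here purely for illustration, arranges exceptions in a tree and returns the most specific common ancestor of the raised exceptions. The tree contents and the names `ancestors`, `resolve` and `EXCEPTION_TREE` are hypothetical.

```python
def ancestors(exc, parent):
    """Chain from exc up to the root of the tree, most specific first."""
    chain = [exc]
    while exc in parent:
        exc = parent[exc]
        chain.append(exc)
    return chain


def resolve(exceptions, parent):
    """Return the most specific exception that covers every exception in `exceptions`."""
    exceptions = list(exceptions)
    if not exceptions:
        raise ValueError("nothing to resolve")
    candidates = ancestors(exceptions[0], parent)
    for exc in exceptions[1:]:
        covering = set(ancestors(exc, parent))
        candidates = [a for a in candidates if a in covering]   # keeps specific-first order
    if not candidates:
        raise ValueError("no covering exception in the tree")
    return candidates[0]


# Hypothetical exception tree: child -> parent.
EXCEPTION_TREE = {
    "sensor_fault": "device_fault",
    "actuator_fault": "device_fault",
    "device_fault": "universal",
    "timeout": "universal",
}

print(resolve(["sensor_fault", "actuator_fault"], EXCEPTION_TREE))  # device_fault
print(resolve(["sensor_fault", "timeout"], EXCEPTION_TREE))         # universal
print(resolve(["sensor_fault"], EXCEPTION_TREE))                    # sensor_fault
```

Once an exception has been chosen this way, every role of the action would start the handler for that single resolved exception, as the second formula above demands.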
8 Conclusion

The formalization as presented here is still preliminary; in particular, proofs about CA actions involving faults and exception handling have not yet been carried out. Further work will investigate the usefulness of our formalization and suggest refinements.
Acknowledgements

The work has been supported by ESPRIT Long Term Research Project 20072 on "Design for Validation (DeVa)". We would also like to thank other members of the DeVa team for several interesting discussions on the formalization of the CA action semantics, in particular J. Fitzgerald and P. Ezhilchelvan at Newcastle and E. Canver at Ulm.
9 References

[CR86] R. H. Campbell and B. Randell. Error recovery in asynchronous systems. IEEE Trans. on Software Engineering, SE-16(8):811–826, 1986.
[Kim82] K.H. Kim. Approaches to mechanization of the conversation scheme based on monitors. IEEE Trans. on Software Engineering, SE-8(3):189–197, 1982.
[LA90] P. A. Lee and T. Anderson. Fault Tolerance: Principles and Practice. Prentice-Hall, second edition, 1990.
[Lam94] L. Lamport. The temporal logic of actions. Trans. on Programming Languages and Systems, 16(3):872–923, May 1994.
[LMWF94] N. Lynch, M. Merrit, W. Weihl, and A. Fekete. Atomic Transactions. Morgan Kaufmann, 1994.
[MP91] Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems. Springer, 1991.
[MP95] Z. Manna and A. Pnueli. Temporal Verification of Reactive Systems: Safety. Springer, 1995.
[Ran75] B. Randell. System structure for software fault tolerance. IEEE Trans. on Software Engineering, SE-1(2):220–232, 1975.
[RXR96] A. Romanovsky, J. Xu, and B. Randell. Exception handling and resolution in distributed object-oriented systems. In Proc. 16th Intern. Conf. on Distributed Computing Systems, pages 545–552, 1996.
[XRR+95] J. Xu, B. Randell, A. Romanovsky, C. Rubira, R. Stroud, and Z. Wu. Fault tolerance in concurrent object-oriented software through coordinated error recovery. In Proc. 25th Intern. Symp. on Fault-tolerant Computing, pages 499–508, June 1995.