Coordinated Atomic Actions for Dependable Distributed Systems: the Current State in Concepts, Semantics and Verification Means
Barbara Gallina and Nicolas Guelfi
Alexander Romanovsky
Laboratory for Advanced Software Systems University of Luxembourg 6, rue R. Coudenhove-Kalergi, L-1359 Luxembourg {barbara.gallina, nicolas.guelfi}@uni.lu
School of Computing Science Newcastle University Newcastle upon Tyne NE1 7RU, UK
[email protected]
Abstract Coordinated Atomic Actions (CAAs) were introduced about ten years ago as a conceptual framework for developing fault-tolerant concurrent systems. The work done since then has extended the CAA framework with capabilities to model, verify, and implement concurrent distributed systems following pre-defined development methodologies. As a result, CAAs, compared to other available approaches, offer a rich set of means for engineering dependable systems. Nevertheless, it is sometimes difficult to obtain a global and analytical view of all the available features, because the concept provides a number of features that need to be applied in combination. The main contribution of this paper is a complete state-of-the-art overview of the work done around CAAs from three perspectives: the definitions of the fundamental concepts, their various semantics, and the means supporting formal verification. This paper should help potential CAA users avoid misinterpretation when employing the available features. Finally, it should contribute to a better understanding of the likely directions in which the CAA framework may evolve in the near future.
Keywords Coordinated Atomic Actions, Fault Tolerance, System Structuring, Exception Handling, Dependability, Formal Methods, Formal Properties, Verification.
1 Introduction The Coordinated Atomic Action concept was introduced in 1995 for modeling complex fault-tolerant object-oriented concurrent systems, defined as a collection of interacting objects [3]. This concept represented a significant improvement in the research field related to fault tolerance. By integrating exception handling, conversations (cooperative concurrency) [1] and transactions (competitive concurrency) [2], it provided a well-conceived fault-tolerant structuring unit used to tolerate important classes of faults (hardware, software, environmental) and to identify implicit and explicit patterns of interaction pertaining to the system's dynamic structure. Each Coordinated Atomic Action is conceived as a multi-entry unit with roles activated by action participants, which cooperate within the action through internal objects. The cooperation also takes place when exceptions occur: appropriate exception handlers are executed to recover from the erroneous state by going backward (Backward Error Recovery, BER) or forward (Forward Error Recovery, FER). When a CAA starts, an ACID transaction [2], which terminates at the end of the CAA, is started on external objects. A CAA may terminate normally (normal outcome) or exceptionally (exceptional, abort or failure outcome). Whenever a CAA terminates exceptionally, its outcome identifies an interface exception which is signaled to the enclosing environment, which may in turn be another CAA (action nesting). It should be pointed out that a CAA may degenerate into a transaction [2], if only one role is involved, or into a conversation [1], when no external objects are used.
Further work spanning over ten years has enriched the capabilities of the concept in modeling, verifying and implementing concurrent distributed fault-tolerant systems. The concept itself has been extended to provide a structuring unit for specific types of systems such as real-time, service-oriented, mobile and so forth. Achieving a global and analytical understanding of all this work is difficult. The concepts can be misunderstood because a considerable number of works provide slightly different definitions. Through this paper we aim at providing a state-of-the-art description of the CAA framework, focusing on three different perspectives: concepts, semantics and verification means. Concepts, spanning from those belonging to the original definition (like Role, Participant, etc.) to those later added (like Composition) or modified (like External objects and exception resolution graphs), will be informally defined and the properties associated with them explained. This conceptual perspective is used in this paper to make it easy to understand the concept coverage offered by the formal approaches, which provide either a formal semantics for these concepts using known formal methods (like B, Timed CSP, etc.) or verification means using model checking techniques and tools (like the SMV system). This analysis will finally allow us to discuss future trends according to these CAA dimensions.
The rest of the paper is organized as follows. Section 2 defines CAA concepts and associates properties with them. Section 3 describes CAA concept variants. Section 4 analyses the formal semantics provided to represent CAA concepts. Section 5 analyses the formal means provided to verify CAA-based designs. Section 6 outlines some perspectives. Finally, Section 7 presents some concluding remarks and future work.
2 CAA concepts In this section we present the Coordinated Atomic Action concepts. To make our presentation clearer we systematically refer to figure 1. This figure illustrates the informal graphical notation typically used to introduce the CAA concepts. With each concept, written in bold, we associate a set of properties.
Participant represents an active component which may be identified with a process, an active object, a thread, a client, a user and so forth. See, for example, P1 in figure 1.
1. It may sequentially activate different roles.
2. It may only enter one sibling nested CAA at a time (nested CAAs may not overlap).
Role represents a program which is activated by the corresponding participant in a fragment of its life-time inside a CAA. See, for example, role R1 in figure 1.
1. At least one role takes part in a CAA.
2. It is constituted of a behavior and some internal data (internal objects).
3. It may raise, propagate and signal exceptions.
4. It may be in one of the following states: inactive (before being activated by a participant), active (while executing its corresponding program), suspended (after having raised an exception), handling (while executing the handler corresponding to the resolved exception), returning (when able to establish an outcome).
5. At the entry point, it has to establish a checkpoint synchronously or asynchronously with the other roles belonging to the same CAA.
6. At the exit point, it has to pass an acceptance test synchronously with the other roles belonging to the same CAA.
7. It may communicate with other roles belonging to the same CAA (intra-action communication) while being in the same state (active, handling, or returning). It may access external objects (extra-action communication) which enjoy ACID properties (Atomicity, Consistency, Isolation and Durability). These communication rules guarantee information smuggling [23] avoidance, that is, they avoid any uncontrolled inter-action information exchange.
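The role states listed above can be read as a small state machine. The following is a minimal Python sketch under stated assumptions: the transition table `ALLOWED` is our reading of the informal state descriptions (in particular, the direct move from active to handling for a role informed of an exception raised elsewhere is an assumption), and the names `RoleState`, `step` and `may_communicate` are hypothetical.

```python
from enum import Enum


class RoleState(Enum):
    INACTIVE = "inactive"    # before being activated by a participant
    ACTIVE = "active"        # while executing its program
    SUSPENDED = "suspended"  # after having raised an exception
    HANDLING = "handling"    # while executing the handler of the resolved exception
    RETURNING = "returning"  # when able to establish an outcome


# Assumed transition relation, read off the informal state descriptions.
ALLOWED = {
    RoleState.INACTIVE: {RoleState.ACTIVE},
    RoleState.ACTIVE: {RoleState.SUSPENDED, RoleState.HANDLING, RoleState.RETURNING},
    RoleState.SUSPENDED: {RoleState.HANDLING},
    RoleState.HANDLING: {RoleState.RETURNING},
    RoleState.RETURNING: set(),
}

# States in which intra-action communication is allowed.
COMMUNICATING = {RoleState.ACTIVE, RoleState.HANDLING, RoleState.RETURNING}


def step(current: RoleState, target: RoleState) -> RoleState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def may_communicate(a: RoleState, b: RoleState) -> bool:
    """Two roles of the same CAA may communicate only while in the same state."""
    return a == b and a in COMMUNICATING
```

For instance, `may_communicate(RoleState.ACTIVE, RoleState.HANDLING)` is false under this reading, because the two roles are not in the same state.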
Figure 1 CAA concepts informally represented.
Check pointing (also called recovery line) represents a consistent set of checkpoints. At a checkpoint, each role saves its state. The set of checkpoints is consistent if the saved states form a consistent global state¹. A consistent set of checkpoints allows the domino effect [1] to be avoided. See the set of small grey rectangles in figure 1.
Local object represents a passive object which might be non-atomic.
1. It may be accessed concurrently by a set of cooperating roles when used for cooperation (whenever defined inside the CAA).
2. Its access has to be controlled to guarantee the causal relationship (sending before receiving) in inter-process communication.
3. It may be accessed by a single role only (whenever defined inside the role).
¹ According to [17], a consistent global state is a cut dividing the time diagram (describing interactions among processes) into two halves where no arrow starts (message sending) on the right-hand side of the cut and ends (message reception) on the left-hand side of it.
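The notion of a consistent set of checkpoints used in the check pointing definition can be illustrated with a short check. This is a minimal Python sketch, assuming that each role has a local event counter, that a checkpoint is identified by the index of the last local event it records, and that messages are known with their local send and receive indices; `Message` and `is_consistent_cut` are hypothetical names.

```python
from typing import Dict, Iterable, NamedTuple


class Message(NamedTuple):
    sender: str
    send_time: int    # local event index at the sender
    receiver: str
    recv_time: int    # local event index at the receiver


def is_consistent_cut(checkpoints: Dict[str, int], messages: Iterable[Message]) -> bool:
    """Return True if no message is sent after the sender's checkpoint
    (right of the cut) yet received before the receiver's checkpoint
    (left of the cut)."""
    for m in messages:
        sent_after_cut = m.send_time > checkpoints[m.sender]
        received_before_cut = m.recv_time <= checkpoints[m.receiver]
        if sent_after_cut and received_before_cut:
            return False
    return True
```

A set of checkpoints that fails this check records the reception of a message whose sending is not recorded, which is the situation that leads to cascading rollbacks.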
External object represents an atomic passive object.
1. It may be accessed concurrently by different competing actions.
2. It is treated in the same way as concurrent resources are treated in nested transactions [16]: an external object has the ACID properties if it is external to the top level CAA; otherwise it only guarantees ACI properties.
For the sake of clarity, it should be noted that in the literature an object is often defined as internal or external according to the access point of view. An object defined inside an enclosing CAA, e.g. inside CAA1 in figure 1, may in fact be accessed by roles belonging to CAA1 itself or by the roles of the enclosed CAA2. This object is defined as internal in the former case and external in the latter. Objects shared among CAAs belonging to different levels of the CAA tree structure have, however, to be protected as if they were external objects.
Exception represents the detection of an error. If the exception is handled at the level of the CAA in which it has been generated, it is defined as an internal exception (see, for example, E1 in figure 1) and it is said that it has been raised. If it is handled at the level of an enclosing CAA, it is defined as an interface exception and it is said that it has been signaled. A two-level scheme for handling exceptions is adopted.
1. It may have associated parameters.
2. Internal and interface exceptions are elements of the two sets of the same names.
3. Abort and Failure exceptions are default exceptions that have to be present in both exception sets.
4. An Abort or Failure exception has to be signaled in case a role cannot be activated.
5. For any nested CAA, its interface exceptions must be a subset of the internal exceptions set related to its enclosing CAA. Hence if a nested CAA (like CAA2, in figure 1) signals an exception, this exception is propagated to its enclosing environment and is raised at that level (CAA1, in figure 1).
Exception handling represents a strategy of dealing with exceptions. It is carried out during the recovery phase (see recovery in figure 1).
1. Once an exception is raised inside a role, the cooperating roles have to be informed. Information exchange is fundamental to achieve an agreement about the exception to be handled or to be signaled.
2. It is carried out cooperatively using BER (that is, by going backward to the error-free state previously saved during check-pointing) and/or FER (that is, by going forward to a new error-free state).
3. Exceptions raised concurrently at different levels (for example one in an enclosing CAA, and the other in a nested one) are treated following the blocking approach. The blocking approach requires an enclosing CAA to wait for the termination of its nested CAAs before handling the raised exceptions.
Exception handling context represents the boundaries in which an exception may happen. 1.
It may be identified with a CAA, with a role or with a handler.
Exception resolution tree [4] represents a tree which is traversed to find out the appropriate exception (resolved exception) to be handled in case of concurrent exceptions. The tree is built on the internal exceptions which may be raised concurrently by roles inside the CAA (see the tree in figure 1). 1.
The exception to be used to activate the exception handlers is the exception that is root of the smallest sub-tree containing all the concurrently raised exceptions.
2.
Its root is identified by the Universal Exception. This Universal Exception is handled by default exception handlers which carry out FER and signal a failure exception and diagnostics to the enclosing CAA.
3.
In case of unexpected exceptions, the resolved exception corresponds to a default exception, called Other_exceptions. This exception is handled by default exception handlers which carry out BER and signal abort exception to the enclosing CAA.
4.
It defines a partial order (based on priority and damage assessment conditions) among the internal exceptions belonging to a CAA. In general, failure takes precedence over abort, which takes precedence over every other exception which can be raised internally.
5.
It has to be associated with each CAA.
6.
It is traversed in case the number of concurrent exceptions is greater than one.
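The resolution rule stated in the properties above (the resolved exception is the root of the smallest sub-tree containing all concurrently raised exceptions) amounts to finding the deepest common ancestor in the tree. The following is a minimal Python sketch, assuming the tree is given as a child-to-parent mapping rooted at the Universal Exception; the function names and the example exception names are hypothetical, and mapping any exception that is absent from the tree to `Other_exceptions` is our reading of the rule on unexpected exceptions.

```python
from typing import Dict, Iterable, List, Optional

UNIVERSAL = "UniversalException"
OTHER = "Other_exceptions"


def path_to_root(exc: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the list [exc, parent(exc), ..., root]. Assumes a well-formed tree."""
    path = [exc]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def resolve(raised: Iterable[str], parent: Dict[str, Optional[str]]) -> str:
    """Return the root of the smallest sub-tree containing all raised exceptions."""
    raised = list(raised)
    if not raised:
        raise ValueError("no exception was raised")
    # Assumption: an exception not declared in the tree is an unexpected one.
    if any(e not in parent for e in raised):
        return OTHER
    common = set(path_to_root(raised[0], parent))
    for e in raised[1:]:
        common &= set(path_to_root(e, parent))
    # Walking upward from any raised exception, the first node that is an
    # ancestor of all of them is the deepest common ancestor. The root is
    # always common, so the loop always returns.
    for node in path_to_root(raised[0], parent):
        if node in common:
            return node
    raise AssertionError("unreachable for a well-formed tree")


# Hypothetical tree: E1 and E2 are covered by E12; E3 hangs off the root.
tree = {UNIVERSAL: None, "E12": UNIVERSAL, "E1": "E12", "E2": "E12", "E3": UNIVERSAL}
```

With this tree, `resolve(["E1", "E2"], tree)` yields `"E12"`, while `resolve(["E1", "E3"], tree)` yields the Universal Exception. A single raised exception resolves to itself, consistent with the tree being traversed only when more than one exception is raised.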
Exception resolution graph [5] represents a directed acyclic graph which allows specifying a more general exception hierarchy than the resolution trees proposed in [4].
Exception Handler represents the activity to be executed to handle an exception (see, for example, EH1 in figure 1). The handler is automatically executed as soon as the resolved exception is available.
1.
It may need to start nested CAAs.
2.
It may have its own context but it is not allowed to raise exceptions (no nested handling). It may only signal the failure or abort exceptions.
3.
A handler, associated to each role, has to be present for each internal exception and also for each resolved exception of the exception resolution tree. It follows that at least two handlers are always present: one to carry out BER and then to signal the abort exception and the other to signal the failure exception and provide diagnostics for the class of failure exceptions.
Nested CAA represents, recursively, a CAA defined inside an enclosing one (see CAA2, in figure 1).
1. Inside it new objects may be created and destroyed. If objects are still alive after its termination, the enclosing CAA, and only it, has to take care of them.
2. Its participants are a subset of the participants of the enclosing CAA.
3. The enclosing CAA cannot access objects that belong to the nested one until the latter terminates. The nested one may access a subset of external objects belonging to its enclosing CAA.
4. Its recovery line must be established after that of the enclosing CAA and its test line must be set before that of its enclosing CAA.
5. It is indivisible and invisible for the enclosing and sibling CAAs.
Outcome represents the result of the test line (agreement phase). In the test line, the results of the acceptance tests (see the set of small black rectangles in figure 1), related to the roles, are collected and, on the basis of these results, a final outcome is established.
1. Normal, exceptional, abort and failure are the outcomes allowed. The last three outcomes identify the interface exceptions. The failure outcome is the only one for which ACID/ACI properties are not guaranteed (i.e. failure of the transaction system). The exceptional outcome introduces a different semantics for atomicity since partial but consistent results are allowed.
2. Normal outcome (all semantics) is achieved when roles succeed in providing the expected service and they agree on the outcome to be forwarded.
3. A pre-defined Exception (something semantics) has to be signaled if roles agree on the outcome and a degraded but consistent state has been reached.
4. Abort Exception (nothing semantics) has to be signaled either if roles do not agree on the outcome or if an exception is raised during FER.
5. A Failure Exception has to be signaled if roles agree on the outcome and an inconsistent state has been reached.
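The outcome rules above can be summarized as a decision function. The following is a minimal Python sketch, assuming that the agreement phase yields three pieces of information (whether the roles agree, which kind of state was reached, and whether an exception was raised during FER) and that the abort conditions are checked first; the names `Outcome`, `FinalState` and `establish_outcome` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    NORMAL = "normal"            # all semantics
    EXCEPTIONAL = "exceptional"  # something semantics
    ABORT = "abort"              # nothing semantics
    FAILURE = "failure"          # ACID/ACI properties not guaranteed


class FinalState(Enum):
    EXPECTED = "expected service provided"
    DEGRADED_CONSISTENT = "degraded but consistent"
    INCONSISTENT = "inconsistent"


def establish_outcome(roles_agree: bool,
                      state: FinalState,
                      exception_during_fer: bool = False) -> Outcome:
    """Map the results collected at the test line to a CAA outcome."""
    # Assumption: disagreement or an exception during FER leads to abort
    # regardless of the state reached.
    if not roles_agree or exception_during_fer:
        return Outcome.ABORT
    if state is FinalState.EXPECTED:
        return Outcome.NORMAL
    if state is FinalState.DEGRADED_CONSISTENT:
        return Outcome.EXCEPTIONAL
    return Outcome.FAILURE
```

Every outcome other than `Outcome.NORMAL` corresponds to an interface exception signaled to the enclosing CAA.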
3 CAAs variants Since 1998, the CAA conceptual model has been evolving in order to meet the dependability requirements of more complex distributed systems (real-time, mobile agent-based, web-service-based, COTS-based and so forth) or to be used to model the system from the early phases of the software life cycle (COoperation Actions). In the following subsections, among the multitude of variants that have been proposed, we focus on those in which cooperative and competitive concurrency may still be recognized. In particular, the evolution towards real-time, service-oriented and mobile systems is discussed. As in Section 2, we introduce the concepts and their properties.
3.1 Real Time (RT) CAAs CAAs have been extended to meet the dependability requirements of real-time systems [6]. The main issues that pushed towards this variant were role desertion (when roles are not ready, that is, when they are not activated or have not finished in time), time-dependent exception handling, time-triggered CAAs (CAAs started in time by a scheduler or an RT clock) and action-level timing constraints (the CAA duration has to respect time constraints). In Real Time CAAs: 1.
The role entry phase, the role exit phase and the entire CAA have to be accomplished within a precise time interval. Each violation leads to the raising of a time-related exception. To express these constraints, the informal syntax provides the following keywords: start [t0, t1], finish [t2, t3], within T.
2.
An optimized variant of the exception resolution tree [6], based on the pre-emptive approach, is adopted. The pre-emptive approach, differently from the blocking one, allows an enclosing CAA to force the abortion of its nested CAAs (so the full indivisibility and invisibility are not guaranteed anymore). This abortion is forced at the occurrence of an exception at the level of the enclosing one. This variant introduces time constraints for message passing, for resolving concurrent exceptions, for aborting nested transactions, etc.
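The interface timing constraints of a Real Time CAA (start [t0, t1], finish [t2, t3], within T) can be checked as follows. This is a minimal Python sketch, assuming that all times are measured on a single clock and that "within T" bounds the duration between entry and exit; `TimingConstraints` and `timing_violations` are hypothetical names, and the returned labels stand in for the time-related exceptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TimingConstraints:
    start: Tuple[float, float]   # [t0, t1]: allowed interval for the entry phase
    finish: Tuple[float, float]  # [t2, t3]: allowed interval for the exit phase
    within: float                # T: maximum duration of the whole CAA


def timing_violations(c: TimingConstraints, entered_at: float, exited_at: float) -> List[str]:
    """Return the labels of the violated constraints (empty if none)."""
    violations = []
    t0, t1 = c.start
    t2, t3 = c.finish
    if not (t0 <= entered_at <= t1):
        violations.append("start")
    if not (t2 <= exited_at <= t3):
        violations.append("finish")
    if exited_at - entered_at > c.within:
        violations.append("within")
    return violations


# Hypothetical constraints: start [0, 2], finish [5, 8], within 6.
constraints = TimingConstraints(start=(0, 2), finish=(5, 8), within=6)
```

A late entry such as `timing_violations(constraints, 3, 6)` reports only the `start` violation, whereas an execution that enters at 0 and exits at 8 respects both intervals but exceeds the overall duration.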
3.2 Web Service Composition Action Service oriented systems built upon Web Service architecture are characterized by a set of typical non functional properties and typical faults [7]. One of the main issues that pushed the investigation of a CAA variant for Web Service-based systems is Web Service composition. Because of their autonomy and because they interact with a large number of concurrent service requesters that would not stand extensive delays, Web Services are not suited to be composed of ACID transactions. Locking resources until
the termination of the transaction is in fact not appropriate for Web Services. Moreover distributed transactions management requires cooperation among the transactional supports of the individual Web Services, which may not be compliant with each other and may not even be willing to cooperate. CAAs, as explained in Section 2, require for managing external objects transactional support guaranteeing ACID properties. Therefore CAAs need to be extended to be suitable for structuring Web Service-based systems. The variant of CAAs, in the framework of these systems, initially introduced in [18] and later finalized in [7], is called WSCAs (Web Service Composition Actions). WSCA identifies a dependable composite Web Service, an assembly of autonomous services (an assembly which may be static or, by retrieving the adequate services based on their interfaces, dynamic). Therefore, in addition to the nesting relationship among CAAs, it introduces the composition relationship. It proposes a different approach to deal with concurrent exceptions, because of the difficulty in building an efficient exception resolution tree. Moreover, it also proposes a three-level scheme to handle exceptions instead of a two-level one. The added level consists in attempting, first of all, a local handling at the role level (no cooperative exception handling). Originally, as seen in Section 2, an exception was handled always cooperatively at the action level (raising) and then in case at the enclosing action level (signaling). Finally WSCA proposes also a different criterion to define the isolation levels. In database systems, a locking strategy is defined on the basis of the isolation levels. In the Web Service context, since the semantics of operations cannot be determined in general, these levels, instead of locking resources, are used to restrict the access at the client side. In WSCA (see [7] for more details): 1. 
A role that calls a composed CAA enters a waiting state in a way similar to synchronous RPC.
2. The set of external objects accessed by a composed CAA is disjoint with respect to the one accessed by the calling CAA. Therefore, in case of exceptional, abort or failure outcome of the composed CAA, a local exception handling (at the level of the calling role) may be attempted.
3. All the roles of a composed CAA, except one, are activated by participants which do not belong to the calling CAA.
4. External objects identify Web Services and their access is constrained on the basis of three types of isolation levels (none, WSCA-visible and strict).
5. Concurrent exceptions are considered as if they were sequential (in a separate order) and handled separately according to the Guardian model.
3.3 Open mobile Coordinated Atomic Actions
Open mobile CAAs have been foreseen in [19] in the context of mobile agents. Mobile systems are very dynamic and flexible (their structure changes dynamically). The agents that compose these systems are autonomous. The fault tolerance requirements of mobile agents, listed in [19, 20], show that traditional fault tolerance techniques cannot be applied straightaway to handle abnormal situations at the agent level. CAAs provide some interesting features, for example Forward Error Recovery, which make them a potential candidate to achieve fault tolerance in these types of systems. In [20], even though not explicitly indicated, an extension of the CAA conceptual framework, called Context Aware Mobile Agents (CAMA), has been provided, introducing location-aware actions as well as open actions. In CAMA, mobile agent-based systems are always structured on the basis of their activity. Scopes, which are dynamic containers for data (tuples), provide an isolated coordination space for compatible agents. Scopes are analogous to CAAs; in fact, they can be nested. This CAA variant introduces the concept of location as a physical and/or logical container of scopes (CAAs), and it also introduces a modified exception handling policy that guarantees exception propagation among agents while preserving their anonymity. In CAMA (see [20] for more details): 1.
A participant (agent) may play more than one role at the same time in the same scope and/or in different scopes.
2.
A location, which represents the physical and/or logical context, is associated with each participant.
3.
The number of roles which take part in a CAA is not fixed. It must, however, lie inside a desired range [min, max].
4.
A CAA (scope) may start even though not all its roles have been activated. The CAA is in a state called expanding.
5.
The CAA tree is associated to a location.
6.
The root CAA of the activation tree has two predefined roles and a fixed set of operations useful for managing agent names issues and agents cooperation compatibility.
7.
A shared tuple space is available for allowing anonymous inter-agent communication.
8.
The tuple space is divided into normal (internal/external objects) and exceptional tuples (internal exceptions).
9.
Raised exceptions are considered as proto-exceptions. They are processed by a special agent (the guard agent) in order to find out the final exception (resolved exception). This special agent is in charge of establishing to which agents the final exception has to be propagated.
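The open-action properties above (a role count bounded by [min, max] and a scope that may start before all roles are activated) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the source only names the expanding state, so the `pending` and `closed` labels, the rule that a scope starts once the minimum number of roles has joined, and the `Scope` class itself are hypothetical.

```python
class Scope:
    """Illustrative open action with a variable number of roles."""

    def __init__(self, min_roles: int, max_roles: int) -> None:
        if not 0 <= min_roles <= max_roles:
            raise ValueError("expected 0 <= min_roles <= max_roles")
        self.min_roles = min_roles
        self.max_roles = max_roles
        self.joined = set()

    def join(self, role: str) -> None:
        """Activate a role in the scope; joining twice has no effect."""
        if role in self.joined:
            return
        if len(self.joined) >= self.max_roles:
            raise RuntimeError("scope is full")
        self.joined.add(role)

    def state(self) -> str:
        n = len(self.joined)
        if n < self.min_roles:
            return "pending"    # assumption: not enough roles to start
        if n < self.max_roles:
            return "expanding"  # started, still accepting roles
        return "closed"         # assumption: maximum reached
```

For a scope created with `Scope(2, 3)`, the second role to join moves the scope into the expanding state, and the third one closes it.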
4 CAA semantics
Several approaches have been investigated to formalize CAAs. In the following, we discuss and compare the main contributions on the basis of their coverage of the CAA concepts. We are aware that this comparison would be more precise if carried out using a single formal framework; however, doing so would force us to restrict our attention to a small subset of properties, and this would prevent us from achieving our main goal, which is to discover gaps in concept coverage in order to establish useful future research directions.
4.1 COALA A formal language, called COALA (COordinated Atomic action LAnguage), for specifying system design using CAAs, is introduced in [8]. It provides a textual (concrete) syntax and a formal abstract syntax and semantics based on COOPN/2 [9]. COOPN/2 is an object-oriented formal specification language. It is based on Petri nets for a behavioral modeling of the system's concurrent functionalities. Partial order-sorted algebraic specifications (Abstract Data Types, ADT) are provided for describing the data structures used in the Petri nets. The core semantics of CAAs consists of twenty-five ADT modules and three COOPN/2 classes, called CAA, Role and Scheduler. The body of each class represents the algebraic net, which consists of a set of places, some variables, the initial values of the places and a set of behavioral formulae. The ADT modules specify the numerous algebraic types (representing instructions, exceptions, resolution mappings, locks and so forth) required by the three classes. The CAA class module identifies the behavior corresponding to the main coordinating activities which all CAAs are responsible for: starting roles, managing their execution, interrupting them when necessary, resolving simultaneously raised exceptions, coordinating the agreement on the final outcome and so forth. The Role class module identifies the behavior corresponding to the instruction evaluation, to the switching to the handler body, to the provision of the outcome and so forth. The Scheduler class implements a locking policy similar to the one characterizing nested transactions [16] in order to guarantee serializability on external objects. The coverage of the CAA conceptual model is only partial for internal objects, while the concept of participant is completely omitted, since roles belonging to a nested CAA are activated by the corresponding roles of the enclosing CAA, which act as participants.
However, nothing is said about the activation of the roles belonging to the root CAA. Internal objects are essential to carry out inter-role communication inside an action, but no proposal has been investigated to govern coordination and agreement among roles through a well-synchronized access to these objects. A special role called System (introduced to manage internal objects) is mentioned in the case studies but it is supposed to be defined externally. Some non-formal extensions have been proposed: the Async keyword represents asynchronous roles (those roles which may enter an already started CAA); the Spawn keyword allows roles to create new threads and increase concurrency within a CAA, even though this concept is not really part of the CAA concept. These extensions and some others still need to be formalized.
4.2 Temporal Logic of Action In [14], the authors make use of a temporal logic framework in order to define CAA properties and therefore contribute to the formalization of CAA concept. A CAA is formally described as a set. The elements of the set identify all the execution instances (sequences of program states) which may be generated during the CAA execution. Following a property-oriented approach, formulas are recognized as CAA properties when they are satisfied by all the execution instances. Temporal logic formulae specify a series of CAAs properties divided into 4 sets: basic (involving, for example, temporal order among nested and enclosing CAAs), interface (involving, for example, roles synchronization upon entry and upon exit and outcomes specification), fault tolerance (involving, for example, exception handling) and enclosure. This last set has been relevant to capture how CAAs external objects behave in the value domain. A theorem to guarantee information smuggling avoidance, based on serializability, is proposed. This theorem refers to external objects which may be accessed concurrently (in a competitive way) by different CAAs at the same level of nesting or by different CAAs at different level (enclosing and enclosed). Unfortunately no properties have been provided to better specify local objects.
4.3 ERT model Paper [15] introduces a formal model called ERT (Extractions, Refusals and Traces) based on the Communicating Sequential Processes (CSP) divergence-based model. The authors also give some intuitions on how the ERT model may be used to formalize CAAs but unfortunately their proposal is rather poor. A CAA is represented as a process, seen as a network of processes. Processes may represent roles, nested CAAs, internal objects, initialization (the In process takes care of roles activation) and termination (the Out process takes care of the CAA outcomes). Unfortunately among these processes which may compose the CAA process, only In and Out processes have been defined. Exception handling is not considered at all. External objects are not modeled but the authors simply suggest adding input and output channels, ruled with special protocol to guarantee atomicity, to connect the CAA process to further processes representing external objects.
Two alternative solutions are given for modelling roles but no formalization is provided. In the first solution a role identifies a process composing the network, as mentioned before; while in the second one, it identifies an external process.
4.4 B In [12], the authors use B to specify CAAs. B is a complete formal method that supports a large part of the development life cycle (from abstract specification to implementation). Three B abstract machines (one to describe a CAA, one to describe CAA objects and one to describe CAA participants) support the main CAA concepts, and a further auxiliary abstract machine is used to group global declarations. The interesting contribution is that they take the composition relationship into consideration; unfortunately, they deal only with the ACID external objects and not with internal ones. External objects have to provide transactional interfaces to allow performing transactional operations such as begin, commit and abort. Isolation and atomicity properties have been formalized. The formalization of exception handling and of the exception resolution tree is only sketched, and no further refinement step has been provided in this work. The failure outcome has not been formalized. Moreover, the capability of the B method to prove model consistency automatically is not exploited: invariants are specified and operations are pre-conditioned, but no proofs are carried out. The authors of [21] also use B to formalize a variant of CAAs for mobile agent-based systems. In particular, in that work they investigate the formalization of the location concept by proposing a B abstract machine for it.
4.5 Timed CSP In [13], the authors, using Timed CSP, concentrate their attention on time-critical systems structured into CAAs. They introduce time constraints on each message passing, on each event, on each operation mode switching and so forth. These constraints are necessary to satisfy the ones defined at the interface, discussed in Section 3.1. They also make the choice between BER and FER time dependent. FER is often preferred since rolling the state back and retrying takes too much time. A CAA consists of a process with which at least two participants (roles and participants are used throughout the paper as if they were synonyms) and a fixed set of nested CAAs are associated. Participants and nested CAAs are also processes. Error-free and also error-prone (equipped with special modes of operation to handle exceptions) participants may be modeled. This approach constitutes an interesting step towards the formalization of the CAA extension for real-time systems. The CAA formal model depicted, however, moves away
from the original CAA conceptual framework. Since at least two participants are required, a CAA is prevented from degenerating into a transaction. Moreover, a controlled object (we suppose it to be an external object) is manipulated by a unique participant and this excludes competitive concurrency. An exception handling event must not be shared by more than one participant. This restriction is not compatible with the cooperative exception handling mechanism, which requires all processes taking part in the CAA to handle cooperatively the resolved exception (which identifies the common handling event). This model does not capture the difference between external and internal objects. Throughout the paper, the term controlled object is used instead of external object. Internal objects are not mentioned but may be identified through a set of events, called CP, which represents the events undertaken in cooperation with other participants. However, nothing more is said about this CP set. The difference between signaling and raising is not captured in this model. Exceptional outcomes are not well identified. Even though an exception resolution algorithm targeting real-time systems is available, the authors have chosen the algorithm originally proposed in [4]. This choice is probably due to the fact that the process, called XRAISE, which is in charge of exception raising, has a scope limited to the innermost active CAA.
4.6 SMV In [10], the authors provide a formal specification of CAA-based designs in terms of finite state transition systems and corresponding SMV modules. Abstractions are, of course, used to keep the systems finite. Two SMV modules are proposed: one, called Role, represents the generic role behavior, and the other, called callRole, represents role activation/termination behavior. To represent CAAs and participants, the skeletons of two further modules are suggested. For CAAs, thanks to their atomicity, the skeleton takes into consideration only their interface (roles, external objects, pre-conditions and post-conditions, which identify the CAA outcomes). For participants, instead, the skeleton allows expressing the conditions under which the various roles have to be activated by the participants themselves. External objects and the operations on them are also represented by SMV modules, but no skeleton is suggested for them. Moreover, nothing is proposed to link operations with the corresponding roles. Roles have no program to execute; they only have a state value (non active, active, returning) which determines the corresponding CAA activation/termination. In these works, only a top-level view of the CAA-based design is modeled, that is, only those CAAs which are the roots of the CAA trees. No nesting among CAAs is therefore modeled. Finally, no fault tolerance aspect is taken into consideration.
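The role abstraction described above can be pictured as a very small finite state transition system. The following Python sketch is only illustrative: the three state names come from the text, while the transition relation, the rule used in `caa_active`, and the names `RoleState`, `TRANSITIONS` and `reachable` are assumptions made here and are not the SMV modules of [10].

```python
from enum import Enum


class RoleState(Enum):
    NON_ACTIVE = "non active"
    ACTIVE = "active"
    RETURNING = "returning"


# Assumed transition relation: a role is activated, later starts returning,
# and finally becomes non active again. Self-loops model waiting.
TRANSITIONS = {
    RoleState.NON_ACTIVE: {RoleState.NON_ACTIVE, RoleState.ACTIVE},
    RoleState.ACTIVE: {RoleState.ACTIVE, RoleState.RETURNING},
    RoleState.RETURNING: {RoleState.NON_ACTIVE},
}


def reachable(start):
    """Return every role state reachable from `start` under TRANSITIONS."""
    seen = {start}
    frontier = [start]
    while frontier:
        state = frontier.pop()
        for nxt in TRANSITIONS[state]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def caa_active(role_states):
    """Assumption: a CAA counts as active once all of its roles are active."""
    return all(state is RoleState.ACTIVE for state in role_states)
```

Because roles carry only a state value and no program, a model of this kind stays small enough for exhaustive exploration, which is what makes model checking practical here.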
4.7 Alloy
The authors of [11] provide a formal specification of CAAs using Alloy. The Action, Role, Participant and RootException types are defined as the main CAA design elements. In addition, a set of relations attaches further information to these design elements. Roles, NestedActions, ComponentActions and RolesPlayed, for example, are relations that describe the system structure. The exception resolution tree adopted is the one in [4]. Its formalization is, however, incomplete, since neither a universal exception nor an other-exception is provided. Moreover, exceptions do not have any parameters. Local and external objects are not modeled at all.
4.8 Summary
Table 1. CAA concepts coverage in the formal semantics.

Concept       Alloy  B    COALA  TLA  Timed CSP  SMV  ERT model
Participant   y      y    n      n    y          y    y
Role          y      n    y      yn   n          ny   y
Local obj     n      ny   n      ny   ny         n    yn
Ext obj       n      y    y      y    yn         yn   ny
Ex tree [4]   y      n    y      y    y          n    n
Ex handler    y      n    y      y    y          n    n
Nested        ny     y    y      y    y          n    n
Outcome       n      y    y      y    y          y    yn
Ex handling   ny     y    y      y    ny         n    n
RT CAA        n      n    n      n    yn         n    n
WSCA          ny     ny   n      n    n          n    n
CAMA          n      yn   n      n    n          n    n

The analysis of the CAA concepts and their formal representation is summarized in Table 1. This table shows how far the formal semantics proposed so far cover the concepts. A cell containing the letter y means that the concept is covered, and the letter n means that it is not. A cell containing both letters means that the concept is covered, but with slightly different (yn) or significantly different (ny) properties. This summary clearly demonstrates that, so far, nobody has investigated in depth the properties of the exception resolution tree or those related to internal objects. It also shows that very little has been done on the formalization of WSCA and CAMA. The formal semantics of CAMA is also being developed using process algebra [22]; however, no fault tolerance aspects are taken into consideration. We decided to omit the rows related to concepts which have not been investigated at all (for example, the exception resolution tree based on the pre-emptive approach) or which have to be considered together with others (such as exception and exception context, which have been considered together with exception handling).
5 CAA verification means
Some approaches to the formal and automatic verification of CAA-based designs have been investigated. The tools used are the SMV System model checker and the Alloy Analyzer constraint solver. In the following, we briefly discuss these main contributions.
5.1 Alloy Analyzer In [11], the authors focus on a set of basic, desired and application-specific properties (expressed in first-order logic) related to CAA-based designs. The Alloy Analyzer tool is used to verify them. One of the main contributions of this work is the verification of coherence in the exception resolution tree. The third basic property is, however, arguable. This property requires the exception resolution mechanism of an action to resolve all possible combinations of concurrent internal exceptions, unless explicitly stated otherwise. Since, however, no universal exception is considered in the model, if the designer decides not to provide a handler for concurrent exceptions, no resolution may take place.
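The role played by a universal exception can be illustrated with a small Python sketch. It assumes the usual reading of the exception resolution tree of [4], in which concurrently raised exceptions are resolved to the lowest exception that covers all of them. The function names and the example exception names (`sensor_fault`, `actuator_fault`, `sensor_timeout`, `universal`) are hypothetical and are not taken from [11].

```python
def ancestors(tree, exc):
    """Return the path from `exc` up to its root, following parent links.

    `tree` maps each exception to its parent; roots map to None.
    """
    path = [exc]
    while tree.get(exc) is not None:
        exc = tree[exc]
        path.append(exc)
    return path


def resolve(tree, raised):
    """Return the lowest exception covering all raised ones, or None."""
    raised = list(raised)
    if not raised:
        return None
    # Candidates are ordered from the most specific to the most general.
    candidates = ancestors(tree, raised[0])
    for exc in raised[1:]:
        covering = set(ancestors(tree, exc))
        candidates = [c for c in candidates if c in covering]
    return candidates[0] if candidates else None


# A tree rooted in a universal exception: every combination is resolvable.
WITH_UNIVERSAL = {
    "universal": None,
    "sensor_fault": "universal",
    "actuator_fault": "universal",
    "sensor_timeout": "sensor_fault",
}

# The same exceptions without a common root (a forest, not a tree).
WITHOUT_UNIVERSAL = {
    "sensor_fault": None,
    "actuator_fault": None,
    "sensor_timeout": "sensor_fault",
}
```

With `WITH_UNIVERSAL`, raising `sensor_timeout` and `actuator_fault` concurrently resolves to `universal`; with `WITHOUT_UNIVERSAL`, the same pair has no covering exception and `resolve` returns `None`, which is the situation the objection above describes.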
5.2 SMV System In [10] and related technical reports, the authors use model checking to identify flaws in their designs. They also formalize (in CTL) and verify interesting properties related to fault tolerance. Fault tolerance is expressed by stating that, along each execution path, a desired property P is valid if no faults or only tolerable faults occur. In CTL this fault tolerance property becomes:
AG (tolerable → P). P could express the degraded (but still controlled) service guaranteed by the system even in the presence of faults.
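Since an AG property is an invariant over all reachable states, it can be checked on a small explicit-state model by plain reachability. The Python sketch below is a toy illustration under stated assumptions, not the SMV encoding of [10]: the state shape `(faults, service)`, the functions `make_successors`, `tolerable`, `p` and `ft_property`, and the fault bounds are all invented for this example.

```python
from collections import deque


def check_ag(initial, successors, holds):
    """Check AG(holds) by breadth-first search over reachable states.

    Returns (True, None) if every reachable state satisfies `holds`,
    otherwise (False, path) where path leads to a violating state.
    """
    parent = {state: None for state in initial}
    queue = deque(initial)
    while queue:
        state = queue.popleft()
        if not holds(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return False, path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return True, None


def make_successors(masked_faults, max_faults=2):
    """Toy design: service degrades while faults <= masked_faults, then stops."""
    def service(faults):
        if faults == 0:
            return "full"
        return "degraded" if faults <= masked_faults else "none"

    def successors(state):
        faults, _ = state
        nxt = [state]  # no new fault occurs
        if faults < max_faults:
            nxt.append((faults + 1, service(faults + 1)))
        return nxt

    return successors


def tolerable(state):
    return state[0] <= 1          # assumption: at most one fault is tolerable


def p(state):
    return state[1] != "none"     # P: some (possibly degraded) service is given


def ft_property(state):
    return (not tolerable(state)) or p(state)   # tolerable -> P
```

A design built with `make_successors(1)` satisfies the property, whereas `make_successors(0)` yields the counterexample path from `(0, "full")` to `(1, "none")`: a tolerable fault after which no service is delivered.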
6 Perspectives
The summarizing table in Section 4 shows that some fundamental concepts, such as internal objects, have not yet been formalized. This gap between the concepts and the semantics cannot be neglected. The main innovation of the CAA framework lies in combining conversations and transactions. Internal objects constitute the fundamental means of cooperation among interacting roles. Further investigation should, therefore, address the provision of a language in which internal objects are formalized, so that cooperative concurrency is supported as well. Until now, in fact, only competitive concurrency has been tackled in depth. Inspired by the work on nested transactions, properties have been formalized to ensure that competition guarantees serializability (see the scheduler class in [8]). Properties inherent to internal objects should also be thoroughly understood, to provide coordination and agreement means for reasoning about normal and abnormal behavior. A COALA extension could be investigated to cover these aspects. The summarizing table also reveals that no investigation has been carried out concerning the formalization of external objects which satisfy relaxed ACID properties. During our discussion of the CAA variant called WSCA in Section 3.2, we mentioned that in service-oriented systems it is not feasible to fully guarantee the ACID properties. As seen, new types of isolation levels which take the nature of Web Services into consideration have in fact been introduced in [7]. Future work, therefore, should tackle these aspects in more detail and provide, for example, a formal semantics for WSCAL (Web Service Composition Action Language), the declarative XML-based language used to describe the WSCA behavior [7]. According to the general concept of atomicity, a CAA is invisible and indivisible. The blocking approach to exception handling therefore seems to be more appropriate.
A pre-emptive approach is less suitable because (see Section 3.1) it allows an enclosing CAA to force the abortion of its nested CAAs, so that indivisibility and invisibility are no longer guaranteed. If atomicity is relaxed, however, adopting the pre-emptive approach would not only make sense but would also be more efficient. Future research could therefore investigate, besides "relaxing" atomicity, the use of the second version of the exception resolution mechanism [6], in which the pre-emptive approach is adopted. Moreover, since this variant should guarantee better time performance, it could be combined with the investigation of real-time constraints. As far as the CAA extension for mobile agent-based systems (CAMA) is concerned, as discussed, the formalization work is still ongoing and therefore it would be interesting
to investigate whether previous work in B, done around the original CAA concept, can be exploited. Concerning verification, and model checking in particular, it would be interesting to extend the set of safety and liveness properties to be model-checked, covering structural and dynamic as well as application-specific aspects. As is well known, Linear Temporal Logic (LTL) and Computation Tree Logic (CTL) differ in expressive power. For example, in LTL but not in CTL, we can state that if a formula p holds infinitely often, then q will eventually be valid (eventual reliability). This type of formula may be particularly interesting for expressing tolerance of transient faults by retrying. In the future, therefore, a combination of LTL and CTL properties could be investigated in order to express a richer set of properties. Moreover, bounded liveness properties (which are in fact safety properties) may be taken into consideration, to model-check not only that infinite loops are absent but also that something positive happens within a desired time interval.
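As an illustration, the two kinds of property mentioned above could be written as follows, using the usual LTL operators G (always) and F (eventually). The propositions fault and recovered, the bound k, and the bounded operator F with superscript ≤k are notational assumptions introduced here; they are not formulas taken from the cited works.

```latex
% Eventual reliability: if p holds infinitely often, then q eventually holds.
\[
  \mathbf{G}\,\mathbf{F}\, p \;\rightarrow\; \mathbf{F}\, q
\]
% Bounded liveness (in fact a safety property):
% every fault is followed by recovery within at most k steps.
\[
  \mathbf{G}\,\bigl(\mathit{fault} \;\rightarrow\; \mathbf{F}^{\leq k}\, \mathit{recovered}\bigr)
\]
```

The second formula can be refuted by a finite execution prefix (a fault followed by k steps without recovery), which is why it counts as a safety property.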
7 Conclusion and future work
In this paper, we have presented the state of the art related to the Coordinated Atomic Action conceptual framework. We have informally defined the concepts, associating a set of properties with each of them. We have then analyzed the completeness and the correctness of the various formal semantics proposed. As a comparison criterion, we have used the coverage of the properties previously mentioned. From this analysis a summarizing table has been derived. We have also discussed the state of the art concerning the verification means available for the CAA framework. In particular, we have focused on automatic techniques such as model checking. On the basis of the identified omissions and shortcomings, related to the semantics and to the verification means, we have sketched some perspectives in which we suggest and motivate potential research directions. In the future, taking this work as a starting point, we aim to provide our own description of the CAA conceptual framework. We will work towards defining a new formal language to represent a CAA extension for structuring service-oriented systems with soft real-time constraints. In addition, we will investigate means to enhance results in the formal verification of CAA-based designs, starting by considering a richer class of fault tolerance properties to be model-checked. This work on formalization and verification will be integrated into a general software development process supporting the application of the CAA framework.
Acknowledgements We thank our colleagues, A. Berlizev, A. Capozucca, A. Cultrone and J. Kienzle for their helpful comments. A. Romanovsky is partially supported by the ICT RODIN and EPSRC TrAmS projects.
References
[1] B. Randell; System Structure for Software Fault Tolerance. IEEE Trans. on Software Engineering, SE-1 (2), pp. 220-32, 1975.
[2] J. Gray, A. Reuter; Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers, 1993.
[3] J. Xu, B. Randell, A. Romanovsky, C. Rubira, R. Stroud, Z. Wu; Fault tolerance in concurrent object-oriented software through coordinated error recovery. In FTCS-25, California, USA, pp. 499-509, 1995.
[4] R. H. Campbell and B. Randell; Error Recovery in Asynchronous Systems. IEEE Trans. Software Eng., 12 (8), pp. 811-826, Aug. 1986.
[5] J. Xu, A. Romanovsky, B. Randell; Concurrent Exception Handling and Resolution in Distributed Object Systems. In IEEE TPDS, 11 (10), 2000.
[6] A. Romanovsky, J. Xu and B. Randell; Exception Handling in Object-Oriented Real-Time Distributed Systems. In the 1st IEEE Int. Symposium on Object-oriented Real-time Distributed Computing, Kyoto, Japan, April, pp. 32-42, 1998.
[7] F. Tartanoglu; Dependable Composition of Web Services. PhD Thesis Report, University of Paris 6, France, December 2005.
[8] J. Vachon; COALA: a design language for reliable distributed systems. PhD thesis, EPFL, no 2302, 2000.
[9] O. Biberstein; CO-OPN/2: An Object-Oriented Formalism for the Specification of Concurrent Systems. PhD thesis, University of Geneva, July 1997.
[10] J. Xu, B. Randell, A. Romanovsky, R.J. Stroud, A.F. Zorzo, E. Canver, F. von Henke; Rigorous Development of a Safety-Critical System Based on Coordinated Atomic Actions. In FTCS-29, Madison, USA, pp. 68-75, 1999.
[11] F. C. Filho, A. Romanovsky, C. M. F. Rubira; Verification of Coordinated Exception Handling. In Proceedings of the Applied Computing 2006: the 21st annual ACM Symposium on Applied Computing, Dijon, France, Volume 1, pp. 680-685, April 23-27, 2006.
[12] F. Tartanoglu, N. Levy, V. Issarny, A. Romanovsky; Using the B Method for the Formalization of Coordinated Atomic Actions. CS-TR: 865, School of Computing Science, University of Newcastle, Oct 2004.
[13] S. Veloudis and N. Nissanke; Modelling Coordinated Atomic Actions in Timed CSP. In Proceedings, 6th International Symposium on Formal Techniques in Real-Time Fault Tolerant Systems, Pune, India. LNCS 1926, Springer Verlag, pp. 228-239, September 2000.
[14] D. Schwier, F. von Henke, J. Xu, R.J. Stroud, A. Romanovsky, B. Randell; Formalization of the CA Action Concept Based on Temporal Logic. (DeVa) Basic ESPRIT Project, Second Year Report, Volume 2, University of Newcastle, UK, February 1998.
[15] M. Koutny, G. Pappalardo; The ERT Model of Fault-Tolerant Computing and Its Application to a Formalisation of Coordinated Atomic Actions. CS-TR: 636, Department of Computing Science, Newcastle University, 1998.
[16] J. E. B. Moss; Nested transactions: an approach to reliable distributed computing. Massachusetts Institute of Technology, Cambridge, MA, USA, 1985.
[17] S. K. Shrivastava, L. V. Mancini, B. Randell; The Duality of Fault-tolerant System Structures. Softw., Pract. Exper. 23(7): 773-798, 1993.
[18] A.F. Zorzo, P. Periorellis, A. Romanovsky; Using Coordinated Atomic Actions for Building Complex Web Applications: a Learning Experience. The 8th IEEE International Workshop on Object-oriented Real-time Dependable Systems, Guadalajara, Mexico, IEEE CS, 2003.
[19] G. Di Marzo Serugendo, A. Romanovsky; Designing Fault-tolerant Mobile Systems. In Proceedings of the International Workshop on Scientific Engineering for Distributed Java Applications (FIDJI 2002), Luxembourg, Luxembourg, 28-29 November 2002. Guelfi, N., Astesiano, E. and Reggio, G. (eds.), LNCS 2604, pp. 185-201, Springer-Verlag, 2003.
[20] A. Iliasov, A. Romanovsky; CAMA: Structured Coordination Space and Exception Propagation for Mobile Agents. ECOOP 2005, Workshop on Exception Handling in Object Oriented Systems: Developing Systems that Handle Exceptions, Glasgow, UK, July 25, 2005.
[21] A. Iliasov, L. Laibinis, A. Romanovsky, E. Troubitsyna; Towards Formal Development of Mobile Location-based Systems. REFT 2005 Workshop on Rigorous Engineering of Fault-Tolerant Systems, at FME, Newcastle upon Tyne, UK, June 2005.
[22] A. Iliasov, V. Khomenko, M. Koutny, A. Romanovsky; On Specification and Verification of Location-Based Fault Tolerant Mobile Systems. M. Butler, C. B. Jones, A. Romanovsky, E. Troubitsyna (Eds.): Rigorous Development of Complex Fault-Tolerant Systems. LNCS 4157, Springer, 2006.
[23] K.H. Kim; Approaches to Mechanization of the Conversation Scheme Based on Monitors. IEEE TSE-8, pp. 189-197, 1982.