An Approach to Simplifying Formal Verifications of Protocols through Identification of Modular Blocks in Redundancy Management Protocols
Purnendu Sinha, ECE Dept., Boston University, Boston, MA 02215
[email protected]
Neeraj Suri, Dept. of Computer Engineering, Chalmers University, Goteborg, Sweden
[email protected]
Abstract
Dependable system designs typically use redundant resources and redundancy management protocols to deliver reliable and timely services. For such systems, considerable effort is expended in ascertaining the correctness of the system operations. In recent years, formal methods have been extensively used for proving the correctness of fault-tolerant system design and implementation [5]. We have also utilized formal methods for V&V of dependable protocols [34, 35, 37], and have observed that a number of protocols providing for distributed and dependable services can often be formulated using the same few basic functional primitives or their variations. Thus, our perspective from the formal viewpoint is to investigate techniques that could effectively define and reuse basic formal modules in order to simplify the V&V for a spectrum of protocols. To support this outlook, our objectives in this paper are to: (a) identify functional building blocks that can be reused in formulating varied redundancy management protocols, (b) provide guidelines for constructing formal library routines of these functional building blocks, and (c) highlight subtleties in block interactions and inter-dependencies between building blocks which would influence the overall correctness of the composite protocols.
Keywords: Dependability, Protocol Composition, Redundancy Management, Formal Techniques.
1 Introduction
Redundancy, in the physical or temporal domain, is a well-established approach for providing dependable services. The chosen redundancy management protocols also characterize a specific system design. The design and, especially, the testing of these dependable protocols involve investigating the extremely large operational state spaces involved. This state-space issue, along with the arbitrarily large number of possible execution paths to explore in these protocols, constrains the effectiveness of conventional verification and validation (V&V) techniques. As formal methods [5] provide extensive support for automated and exhaustive state exploration over formal-verification-based analysis of the operations of a given protocol, in [37, 34] we introduced a formal-methods-based approach to specifically identify pertinent test cases to guide the validation process. Over these studies, we observed that most protocols that provide for dependable services required us to model just a few basic functional primitives for their testing. From the formal viewpoint, we are investigating whether these functional primitives can be characterized for specifying generic distributed, dependable protocols. Thus, an immediate aim is to identify these functional blocks for their subsequent reuse in formulating generic protocols, and to construct formal library routines for these identified building blocks.
(Supported in part by DARPA Grant DABT63-96-C-0044 and NSF CAREER CCR 9896321.)
Motivation: Typical examples of important building blocks in the construction of fault-tolerant systems include [6]: consensus (or exact agreement), atomic broadcast, and group membership. Formal specifications and verifications of these building blocks exist in the literature [11, 19, 29, 41]. The work reported in [40] combines some of these building blocks in the design of different diagnosis algorithms, and presents formal analyses of the proposed algorithms. It is important to point out that the formal specifications and related verification in [40] have been performed on a progressive or hierarchical basis, covering a complete range of faults from benign faults to arbitrary (or Byzantine) faults. Here, we give the example of an FDIR protocol to present our motivation and outlook towards reusability of formal theories. The FDIR protocol, besides fault detection (and diagnosis), also employs operations for fault isolation and resource reconfiguration. As we will show in Section 3.6, an FDIR protocol can be composed utilizing building blocks of voting, synchronization, agreement, atomic broadcast, and checkpointing. Suppose that there exists a formalization of these different building blocks in some suitable formal languages. For someone wishing to formalize FDIR in that particular semantic framework, it would be beneficial to have a methodology that allows the formal constructs of these building blocks to be utilized efficiently, and that guides the process of establishing the proof of correctness of the FDIR protocol. We believe this to be a promising approach, as it would curtail the time and effort needed to establish the correctness of a new algorithm based on any of these well-founded theories. This appears quite feasible within a formal environment with mechanized support (for example, PVS [24]).
The advantage of this type of formal environment/tool in modular composition is that the built-in typechecker can flag inconsistencies in type definitions appearing in different theories, or in instantiations of imported theories of building blocks, and can thus detect simple errors at an early stage. We have chosen FDIR as a specific example to illustrate this fact. However, the approach is generic in the sense that the formalization of a chosen dependable real-time protocol can be performed utilizing the formal constructs of the identified basic primitives. A key aspect in modularly constructing a new protocol out of these building blocks is that the inherent dependencies among these components need to be identified and studied. To support our philosophy of "reuse of primitives" for formulating and testing redundancy management protocols, our specific objectives here are to develop a modular approach to specifying/verifying dependable protocols through:
- Identifying building blocks within the class of redundancy management protocols,
- Highlighting and addressing issues involved in block interactions and inter-dependencies, and
- Developing guidelines for constructing libraries of formal specifications (and their associated verification) of these categorized building blocks/protocols that can be utilized in establishing the overall correctness of the composite protocol.
It is important to emphasize that, by defining and a priori validating building blocks for dependable protocols, larger and more complex protocols can be readily verified and validated if composed of these building blocks. Modular composition and reusability of modules are general concepts widely used in distributed system design. It is, therefore, important to highlight the distinction between our approach to modularization and other "building block" approaches.
Related Work: Modularization is a well-known technique for simplifying complex software systems. Most approaches [1, 2, 9, 13, 12, 15, 16, 20, 23, 38] to modularizing dependable/distributed protocols focus on developing an implementation of the protocol by combining selected components (or micro-protocols). Some of these approaches [1, 2, 9, 15, 20] provide a formal framework to ascertain configurations against system specifications. An important distinction of our approach from other approaches to modularization is that we iteratively add implementation or parametric details to an abstract specification, and establish the proof of correctness. Our modular composition of dependable distributed protocols is specifically based on formal specification of building blocks (or functional primitives), with a consideration to guide and
supplement the verification and validation process [37]. The main benefit of our approach is that the proof of correctness of the composite protocol specifications can be readily established utilizing the proof constructs developed for these basic building blocks. Instead of defining formal theories from scratch each time a new protocol is tackled, we can utilize reusable and parameterized theories.
The organization of the paper is as follows. Section 2 presents our building block approach to protocol composition, identifies different building blocks or basic primitives, highlights various issues/nuances of modular composition, and then discusses how this approach helps perform V&V of these dependable protocols. Sections 3.1 through 3.5 focus on outlining and developing the basic building blocks of redundancy management protocols and highlight salient features and invariants of these constituent blocks. Utilizing these building blocks, Section 3.6 presents the hierarchical composition of a fault detection, isolation and reconfiguration (FDIR) protocol to illustrate our modular approach. We conclude with a discussion in Section 4. With this background, we now develop our modular approach to protocol composition and identify constituent blocks or functional primitives that are used in formulating protocols, and redundancy management protocols in particular. We then elaborate on each of these basic building primitives to justify their selection as building blocks.
2 The Building Block Approach to Protocol Composition and V&V
In this section, we identify the basic building blocks of redundancy management protocols and outline a formal framework of these modules/primitives which are constituent to the overall protocol development. As agreement, synchronization, and detection are essential functional primitives for developing most dependable applications, we envision that our experience with the modularity of these fault-tolerant services will help us identify the overall effectiveness of our proposed formal-methods-based approach for V&V. At this stage we utilize Figure 1 to depict our general approach for formal-methods-based V&V of dependable distributed protocols.

[Figure 1: A General Framework. The levels, from top to bottom: building blocks (system models, theories of time and failure models, communication primitives, voting/convergence functions); building block specification and verification; consistency of specification across building blocks; synergistic formulation of dependable distributed operations; protocol verification; and protocol validation.]

For the scope of this paper, we restrict our subsequent discussions to the level of "Synergistic Formulation ..." (see Fig. 1). A fundamental issue in our generalized formal approach for V&V is to identify basic building blocks which are constituent to most of the fault-tolerant protocols used in practice. Towards this issue, we have first identified four primary protocol building blocks, namely:
- System models,
- Failure models,
- Communication primitives, and
- Voting/convergence functions.
We acknowledge that these basic blocks may not cover all nuances of redundancy management protocols. However, at this stage, these individual units suffice to demonstrate our composition approach. The important aspects of these building blocks will be elaborated as we discuss individual redundancy management schemes in Section 3. It is important to mention that our proposed general framework is applicable to the composition of a generic protocol out of other system building-block protocols which have themselves already been composed out of these basic primitives. We will elaborate on these aspects in Section 3.6, where we demonstrate our approach through a case study of an FDIR protocol. We begin our discussion with the issues and nuances of our proposed modular composition approach as outlined in Figure 1.
2.1 Modular Composition: Issues and Nuances
As depicted in the topmost level of Figure 1, building blocks, namely system and failure models, communication primitives, and voting/convergence functions, are utilized to compose redundancy management protocols. In order to form a consistent composition out of these building blocks, it is necessary to identify salient features (or invariants) of these basic blocks so that their consistency can be checked across different levels. An important step towards this objective is to identify dependencies among the constituent blocks being used in this composition. It is necessary to ensure that the composition of these building blocks functions properly in the presence of these dependencies and provides the desired services. A building block B1 is said to be dependent on another building block B2 if the correctness of B1 depends on the correctness of B2. Identifying dependencies among different building blocks is just one facet of the correct composition of a given generic protocol. Another facet is to ensure that the building blocks are compatible, i.e., that they are based on the same set of assumptions. Also, for specifications to be non-conflicting across various building blocks, the formalizations of the failure models considered have to be uniform throughout. For example, a basic block designed to handle only omission failures cannot be trivially composed with other Byzantine fault-handling basic blocks to provide an overall Byzantine-resilient service. Other examples include group semantics and the system model (e.g., the assumption of synchrony). An important objective in choosing constituent blocks for composition is to make sure that the inherent properties of these blocks are compatible and do not lead to the possibility of a deadlock condition; such a conflicting condition is a direct reflection of an erroneous composition. Such conflicts get flagged at the "Consistency Check Across Building Blocks" stage of our building block approach (see Fig. 1).
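As an illustration, the compatibility check described above can be sketched as a simple assumption-compatibility test. The sketch below is our own illustrative code (the names, block tuples, and synchrony labels are hypothetical, not from the paper); it flags a composition in which a block assumes a different synchrony model, or tolerates a strictly weaker failure class than the composite service requires.

```python
# Hypothetical sketch of the "Consistency Check Across Building Blocks" stage:
# each block declares the failure class it tolerates and the synchrony model
# it assumes.  Classes run from the strongest (fail-stop) to the weakest
# (Byzantine) semantics; a block tolerating a given class also tolerates
# every stronger one.
FAILURE_ORDER = ["fail-stop", "crash", "omission", "timing", "byzantine"]

def composition_conflicts(blocks, required_failure, required_synchrony):
    """Return (block, reason) pairs that would be flagged as conflicts.

    blocks: list of (name, tolerated_failure_class, synchrony_model).
    """
    conflicts = []
    for name, tolerated, synchrony in blocks:
        if synchrony != required_synchrony:
            conflicts.append((name, "synchrony mismatch"))
        if FAILURE_ORDER.index(tolerated) < FAILURE_ORDER.index(required_failure):
            conflicts.append((name, "weaker failure semantics"))
    return conflicts

# An omission-failure block cannot be trivially composed into a
# Byzantine-resilient service:
blocks = [("voter", "byzantine", "synchronous"),
          ("broadcast", "omission", "synchronous")]
print(composition_conflicts(blocks, "byzantine", "synchronous"))
# -> [('broadcast', 'weaker failure semantics')]
```

Such a check captures only specification-level compatibility, mirroring the distinction drawn below between specification-level and functional consistency.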
We now present a first cut at guiding principles for protocol composition. We acknowledge that these guidelines may not cover all aspects of dependable distributed protocol composition. Based on the expected services of a chosen protocol, we first identify building blocks which would help constitute the protocol and provide the required services in composition. We propose the following guidelines to facilitate a generic protocol composition:
- Only semantically non-conflicting blocks should be composed at a particular level of the hierarchy to achieve the desired service of the protocol.
- To ensure consistency across different levels (either horizontal or vertical), and to ease modification of the specification of building blocks used repeatedly in the protocol composition, these building blocks should be specified independently across the levels.
- The core/basic specification of the building blocks should not be modified while specifying these blocks across different levels. Only those requirements should be modified which would not cause any conflicts across different levels.
- The invariants of all the building blocks at a level higher than a composite protocol (i.e., all the parents) apply to the protocol itself.
At present, our interest is in ensuring consistency across requirements and specifications. Although this is a high-level abstraction for protocols, and specification-level consistency does not necessarily imply functional consistency, we do believe this to be a necessary first step towards providing functional consistency.
We begin our discussion with the individual building blocks at the topmost level in Figure 1, and highlight the salient features (and invariants) and services of these building blocks.
2.2 Basic Building Blocks

System Models
We consider a distributed system framework comprised of a collection of processors (or nodes) connected via a communication network. Each processor has a clock and local memory of its own. The system, in itself, does not inherently possess a global time base across the nodes, and there is no shared memory between processors. Nodes in a system communicate with each other by passing messages over the defined communication network. In general, a distributed system is modeled as having nodes and a communication network as the basic components. The configuration of the communication network can vary from point-to-point networks to local-area broadcast networks. The system model influences the nature of synchrony to be maintained (see footnote 1), i.e., synchronous or asynchronous. The synchrony of the system relates to assumptions made about the time bounds on the performance of the system. We illustrate pertinent aspects of synchrony through characterizing its functions as a module given below. We will utilize this format of "module" throughout the paper.
Module Name: Synchronous System
Module Attributes:
- For a chosen message type, transmission and processing delays are bounded by a constant d; this consists of the time it takes for sending, transporting, and receiving a message over a link.
- Every process p has a local clock C_p with a known bounded rate of drift rho >= 0 with respect to real time. That is, for all p and all t > t':
  (1 + rho)^(-1) (t - t') <= C_p(t) - C_p(t') <= (1 + rho)(t - t'),
  where C_p(t) denotes p's local clock time at real time t.
- The clocks of correct processors are monotone increasing functions of real time, and the resolution of processors' clocks is fine enough that separate clock readings yield different values, i.e., t1 < t2 if and only if C_p(t1) < C_p(t2).
- There are known upper bounds on the time required by a process to execute a step.
A system is asynchronous if there are no bounds on message delays, clock drifts, or the time necessary to execute a step. In general, asynchronous systems do not explicitly utilize the notion of time. As predictability is an important aspect in the design of dependable computer systems deployed for critical applications, we primarily focus on synchronous models, though our approach directly extends to asynchronous, timed-asynchronous, and quasi-synchronous models as well. These system models are further distinguished by the nature of the messages exchanged by nodes. In unauthenticated message protocols, the sender of a message is assumed to be identifiable. In authenticated message protocols, a digital signature (using error detecting codes) is associated with a message, which explicitly characterizes its identity to prevent another node from forging this message.
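The bounded-drift attribute of the synchronous system module can be checked mechanically. The sketch below is our own illustrative code (the function name and argument layout are assumptions, not from the paper); it tests whether a pair of clock readings is consistent with a drift rate rho.

```python
def within_drift_bound(c_t, c_t0, t, t0, rho):
    """Bounded-drift check for a local clock C_p with drift rate rho:
    (1 + rho)**-1 * (t - t0) <= C_p(t) - C_p(t0) <= (1 + rho) * (t - t0).
    """
    real_elapsed = t - t0
    clock_elapsed = c_t - c_t0
    return (real_elapsed / (1 + rho) <= clock_elapsed
            <= (1 + rho) * real_elapsed)

# A clock running 1% fast over 100 time units satisfies a 2% drift bound:
print(within_drift_bound(101.0, 0.0, 100.0, 0.0, 0.02))   # True
# ...but violates a 0.5% bound:
print(within_drift_bound(101.0, 0.0, 100.0, 0.0, 0.005))  # False
```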
Failure Models
This particular block defines general failure models. The service specification of a processing element's acceptable behavior may prescribe both the processing element's response for any initial server state and input, and the real-time interval within which the response should occur [6]. A process is faulty in an execution if it does not behave in a manner consistent with the service specification. A failure model specifies how a faulty process can deviate from its specification, or defines the behavior of a processing element once it has become faulty. For fault diagnosis, the processor fault classes considered in the literature are transient, intermittent, or permanent. These common fault classes characterize errors which occur in the data domain. We can represent various data-domain faults as state transitions by defining the rates at which the fault switches states and a time variable capturing the duration of the fault [18]. Figure 2 depicts the state transition diagram.
[Footnote 1] Other system models such as timed-asynchronous [8] and quasi-synchronous [39] also appear in the literature.
[Figure 2: Data-Domain Fault Models. A state-transition diagram over the states No Fault, Fault Active, and Fault Benign, with transition rates a(t), b(t), c(t), and d(t).]

The rates at which permanent, transient, and intermittent faults switch states are as follows:
- Permanent: a(t) > 0; b(t) = c(t) = d(t) = 0
- Transient: a(t) > 0; b(t) = 0; c(t) > 0; d(t) = 0
- Intermittent: a(t) > 0; b(t) > 0; c(t) = 0; d(t) > 0
Another classification of faults is based on the perturbations that occur in the time domain, and on their being detectable in the time domain. The classes, from strongest to weakest, are fail-stop faults, crash faults, omission faults, timing faults, and Byzantine faults [3]. The fault classes discussed above can be effectively applied to both processors and links.
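The rate conditions above translate directly into a classifier. The following sketch is illustrative only (the edge-to-rate assignment in the comments reflects our reading of the state-transition model, and the rates are sampled at a single instant):

```python
def classify_fault(a, b, c, d):
    """Classify a data-domain fault from its state-transition rates,
    following the rate conditions given for permanent, transient, and
    intermittent faults (a, b, c, d are the rates of Figure 2 sampled
    at some instant t)."""
    if a > 0 and b == 0 and c == 0 and d == 0:
        return "permanent"
    if a > 0 and b == 0 and c > 0 and d == 0:
        return "transient"
    if a > 0 and b > 0 and c == 0 and d > 0:
        return "intermittent"
    return "unclassified"

print(classify_fault(0.1, 0.0, 0.0, 0.0))  # permanent
print(classify_fault(0.1, 0.0, 0.3, 0.0))  # transient
print(classify_fault(0.1, 0.2, 0.0, 0.4))  # intermittent
```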
Communication Primitives
This block outlines the basic assumptions about the services provided by the underlying communication mechanism being used by the protocol. In general, communication protocols utilize a datagram service which allows the transmission of messages along links between a pair of nodes in a point-to-point communication network. To avoid waiting forever for packets that will never arrive, we impose a timeout delay D, such that a datagram message that travels more than D time units between node p and node q is considered lost. The datagram service has omission or performance failure semantics. Let d be a bound on datagram message delay. A datagram omission failure occurs if m is never delivered at q, while a performance failure occurs if m is delivered after d time units. We also assume that the communication network provides a diffusion service, in which a node p diffuses a message to another node q by sending message copies in parallel on all paths between p and q. We assume that the number of communication components (processors and links) that can be faulty during a diffusion is bounded by some arbitrary but fixed constant F. In [7], an implementation of a synchronous atomic broadcast service using a low-level point-to-point datagram service and synchronized clocks is presented.
Module Name: Atomic Broadcast
Module Attributes:
- The network remains connected even upon failures in components (processors and links).
- The clocks of correct processors are approximately synchronized within a maximum allowable deviation.
- Transmission and processing delays of messages are bounded by a constant.
Requirements:
- Atomicity: if any correct processor delivers a message at time T on its clock, then that message was initiated by some processor and is delivered by each correct processor at time T on its clock.
- Order: all correct processors deliver their messages in the same order.
- Termination: every message whose broadcast was initiated by a correct processor at time T on its clock is delivered by all correct processors by time T + D on their own clocks, for a known constant D.
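Two of the services above lend themselves to compact sketches. The code below is our own illustration (the path and delivery-log representations are assumptions, not from the paper): the diffusion service succeeds as long as at least one disjoint path avoids every faulty component, and the Order requirement can be checked by comparing delivery logs.

```python
def diffusion_delivered(paths, faulty):
    """Diffusion service sketch: p sends copies of a message in parallel on
    all disjoint paths to q; delivery succeeds if at least one path contains
    no faulty component.  With F+1 disjoint paths and at most F faulty
    components, some path is always fault-free."""
    return any(all(c not in faulty for c in path) for path in paths)

def order_holds(delivery_logs):
    """Order requirement: all correct processors deliver messages in the
    same order (delivery_logs maps processor -> list of message ids)."""
    logs = list(delivery_logs.values())
    return all(log == logs[0] for log in logs[1:])

# F = 2 faulty links, 3 disjoint single-link paths: still delivered.
print(diffusion_delivered([["l1"], ["l2"], ["l3"]], {"l1", "l2"}))  # True
print(order_holds({"p1": [1, 2, 3], "p2": [1, 2, 3]}))              # True
print(order_holds({"p1": [1, 2, 3], "p2": [1, 3, 2]}))              # False
```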
Voting/Convergence Functions
Redundancy basically implies having multiple system entities execute the same task, and comparing their outputs. The primary redundancy technique employed for detecting errors has been duplication, where two processors perform the same computation and their outputs are compared to detect an error. If at least three processors are involved (e.g., TMR), the comparison can be performed through majority voting (e.g., 2-out-of-3), and thus the effects of errors can be masked. However, in the presence of arbitrary (Byzantine) failures, this simple voting mechanism may not work, which necessitates a Byzantine-resilient mechanism to perform the comparison. In Section 3.1, we will present examples of commonly used voting techniques and fault-tolerant averaging functions.
2.3 Building Blocks -> Protocol Specification
So far, we have outlined the basic primitives for formulating dependable protocols. The next step is to formally specify and verify these basic primitives. Based on these, the crucial issue is to demonstrate that a given dependable protocol can be composed from these building blocks, or to show that a protocol can be decomposed into these identified primitives. The major effort goes into specifying the basic features of the protocol in terms of these basic blocks to achieve the desired FT-RT services. We stress that, prior to composing a protocol with the building blocks, the consistency of the requirements across these blocks should be established to avoid any conflicting specifications. This is an important factor in our selecting specific building blocks that conform as such. It is important to point out that we verify properties of the set(s) of axioms that form the specification, and tackle inconsistencies at the specification level.
2.4 Protocol Verification
Following the formal specification of a chosen protocol, at the protocol verification stage the correctness of various theorems/conjectures reflecting the desired services (or expected behavior) is established. The effort in verifying the correctness of the protocol properties lies in proving that the conjunction of the specifications of the protocol and the basic primitives leads to the desired FT-RT services. It is important to point out that, due to subtle dependencies between the building blocks of fault-tolerant systems, simple parallel composition rules such as the one presented in [14] may not be sufficient to capture all nuances of these building-block interactions. We are investigating existing approaches to modular verification [1, 12, 22] to tackle cases involving inter-block dependencies.
2.5 Protocol Validation
For completeness, we briefly discuss the protocol validation (see footnote 2) aspects of our building block approach. We utilize our developed representation structures [37] to encapsulate protocol attributes generated over the formal specification and verification process, in order to identify system states and design/implementation parameters with which to construct test cases. The permutations and combinations of this identified (possibly minimal) set of variables basically constitute the test cases to validate the protocol. For a detailed discussion of the representation structures and their role in test-case identification, the reader is referred to [37, 34].
3 Redundancy Management Operations
At this stage, we have outlined our modular approach for protocol composition, and discussed the basic primitives needed in formulating any generic redundancy management protocol. We now focus on specific operations which serve as building blocks and are essential in managing redundancy. After a chosen system model and the associated fault/error model are established, the empirical formulation of a generic redundancy management protocol involves the following procedures:
- Specifying the procedures for assimilating the varied information obtained from the redundant system entities, and disseminating (and pruning) it to obtain value(s) usable in subsequent protocol operations. We label these building blocks as "Voting/Convergence functions". We incorporate the requisite error detection operations in this step as well. This step only defines the process of voting on replicated data without specifying the underlying replication process, which leads to the next building block of "redundancy approaches".
- Outlining the "Redundancy Approaches", which define the nature of redundancy and the associated recovery process that utilizes the dissemination of redundant information obtained in the previous step.
[Footnote 2] Validation refers to ascertaining whether the implementation meets the design specifications by testing it in an actual run-time environment.
- Characterizing the specific operational features, within the chosen system models, that are essentially required in formulating any redundancy management procedure. We have chosen, and limited ourselves to, the following three functions (blocks) that we have repeatedly encountered in analyzing dependable protocols, namely:
  - Synchronization function
  - Agreement function
  - Checkpoint establishment
Our intent is to categorize basic, though essential, functions and/or operations that are extensively used across a variety of redundancy management protocols. We do not claim completeness of this categorization. These constitute the building blocks we have encountered most often in composing redundancy management operations. For each of these blocks, we provide a brief description of the block and then detail the formal attributes of each block.
3.1 Building Block 1: Voting/Convergence Functions
In this section, we will look at voting and convergence functions used for redundant systems, and highlight their properties of interest.
Module Name: Voting/Convergence Function
Module Attributes:
- Input: x1, x2, ..., xN, a set of N outputs from N redundant units.
- Output: majority value(s).
- All N units perform identical operations using identical inputs.
- N must be at least 2m + 1 or 3m + 1 for NMR or Byzantine-resilient systems, respectively, where m is the maximum number of faulty units.
- All inputs to the voter must be from the same cycle or round.
- The voter processing time must be less than the inter-arrival time of incoming messages.
- The order of the input sequence must be sustained.
- To prevent inputs from being "stale", all voter inputs must arrive within a specified time window.
Requirements:
- Voting: The value determined by a voting mechanism must always be contained in the set of consensus outputs produced by the formalized majority voter.
- Convergence: Two different evaluations by a convergence function must be within a known precision bound, and the evaluation by a convergence function should not be more than a known accuracy bound from its argument.
As the voting/convergence function is a primary building block for redundancy management operations, we mention some common voting techniques and convergence functions used in practice where approximate agreement is required. Examples of generalized voters [21] include: the formalized majority voter, the generalized median voter, the formalized plurality voter, and the weighted averaging technique. Since a faulty processor may send different values to different processors, it is quite possible that, due to receiving widely ranging values, a processor may differ from the others by an arbitrary amount and may not converge. This situation necessitates a fault-tolerant averaging function. Examples of fault-tolerant averaging functions [30] include: the egocentric average, the fast convergence algorithm, the fault-tolerant midpoint, and the fault-tolerant average or mean of medial extremes (MME).
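Two of the techniques named above can be sketched compactly. The code below is our own minimal rendering (not the formalizations of [21] or [30]): a majority voter that returns a value only when one is backed by a strict majority, and the fault-tolerant midpoint, which discards the m extreme readings on each side.

```python
from collections import Counter

def majority_vote(values):
    """Majority voter sketch: return the value produced by a strict
    majority of the N redundant units, or None if no majority exists."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None

def fault_tolerant_midpoint(values, m):
    """Fault-tolerant midpoint sketch: discard the m smallest and m largest
    readings (bounding the influence of m faulty units), then take the
    midpoint of the extremes of the remaining readings."""
    trimmed = sorted(values)[m:len(values) - m]
    return (trimmed[0] + trimmed[-1]) / 2

print(majority_vote([7, 7, 3]))                            # 7
print(majority_vote([7, 3, 5]))                            # None
print(fault_tolerant_midpoint([10, 11, 12, 13, 1000], 1))  # 12.0
```

Note how the single wild reading (1000) from a faulty unit has no effect on the midpoint once the extremes are trimmed.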
3.2 Building Block 2: Redundancy Approaches
Redundancy is typically manifested [26, 32] in one of two ways: spatial or temporal. Spatial redundancy uses extra hardware or functional modules to guard against the effects of failures. Temporal redundancy, on the other hand, involves extra executions of the same calculation (possibly by different methods) and comparisons of the results to determine if any discrepancies exist. In this section we elaborate on both spatial and temporal redundancy.
3.2.1 Spatial Redundancy
Spatial techniques involve replication of system hardware and include static pairing, N-modular redundancy, and sparing techniques. We characterize the N-modular redundancy technique below.
N-Modular Redundancy (NMR): The basic concept of NMR is to have N identical units and perform a majority vote on their outputs to determine the output of the system.
Module Name: NMR
Module Attributes:
- Maximum number of faulty units at any time: m
- Number of units required: N = 2m + 1
- All N units perform identical operations using identical inputs.
- All N inputs to the voter (the outputs of these N units) must be synchronized.
- For a reconfigurable NMR, spare units must be brought into synchrony with the other existing units.
As a note, for the system to be resilient to up to m arbitrary or Byzantine failures, at least 3m + 1 units are needed.
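The sizing rules above reduce to a one-line computation. The sketch below is illustrative (the function name is our own) and gives the minimum replication needed to mask m simultaneous faulty units:

```python
def required_units(m, byzantine=False):
    """Minimum number of identical units needed to mask m simultaneous
    faulty units: N = 2m + 1 for an NMR majority vote, and N = 3m + 1
    when the faults may be arbitrary (Byzantine)."""
    return 3 * m + 1 if byzantine else 2 * m + 1

print(required_units(1))                  # 3 (TMR: 2-out-of-3 voting)
print(required_units(1, byzantine=True))  # 4
```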
3.2.2 Temporal Redundancy
The forms of redundancy discussed thus far involve extra system entities for the implementation of the various techniques. Temporal or time redundancy attempts to reduce the number of extra units at the expense of using additional time. The basic concept of temporal redundancy is to perform the same computation two or more times and compare the results to detect failures or discrepancies. In case an error is detected, the computation is performed again to see whether the discrepancy persists. This approach can detect errors resulting from transient faults. Time redundancy can be used for implementing backward error recovery. The simplest form of backward error recovery is retry, where the failed instruction is simply repeated. Other forms include rolling the failed computation back to a previous checkpoint (or recovery point) and continuing from there, or possibly restarting the computation from the beginning. We will detail checkpointing in Section 3.5.
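The recompute-and-compare scheme above can be sketched as follows (illustrative code only; `compute` stands in for the replicated computation, and a persistent mismatch is reported as a likely non-transient fault):

```python
def compute_with_retry(compute, retries=2):
    """Temporal-redundancy sketch: run the same computation twice and
    compare; on a mismatch (a detected error), recompute to see whether
    the discrepancy persists.  Transient faults disappear on retry."""
    for _ in range(retries + 1):
        first, second = compute(), compute()
        if first == second:
            return first
    raise RuntimeError("persistent discrepancy; fault may not be transient")

# A transient fault corrupts only the first execution:
calls = {"n": 0}
def flaky_compute():
    calls["n"] += 1
    return 99 if calls["n"] == 1 else 42  # first call hit by a transient

print(compute_with_retry(flaky_compute))  # 42
```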
3.3 Building Block 3: Synchronization
Redundancy requires some form of synchronization among the independent data sources. In systems comprising multiple processors, the objective of the clock synchronization primitive is to establish and maintain a consistent, system-wide time base among the various processors (nodes) in the system. The basic way to achieve fault-tolerant synchronization [27, 30, 36] is for each processor to periodically execute a protocol that involves exchanging clock values with the other processors, computing a reference value, and appropriately adjusting its local clock to reflect the consensus (see footnote 3).
Module Name: Synchronization
Module Attributes:
- The non-faulty clocks are initially synchronized to within some constant quantity.
- A non-faulty clock does not drift away from real time at a rate greater than ρ.
- The period between two re-synchronization signals and the range within which these signals occur must satisfy defined bounds. There is no overlap between synchronization periods.
- A non-faulty processor can read the difference between its own clock and that of another non-faulty processor with at most a small error.

Requirements:
- The maximum skew between any two good clocks must be bounded, i.e., |Cp(t) - Cq(t)| < δ.
- There should be a small bound on the amount by which a non-faulty processor's clock is changed during each resynchronization, i.e., |Cp(t + i) - Cp(t)| < Σ.

Here, Cp(t) denotes the logical clock value of processor p at real time t.

(Footnote 3: In [31], a formal specification and verification of Schneider's formulation of clock synchronization [30] has been conducted.)
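One convergence function compatible with these attributes, in the style of the interactive convergence algorithms cited above ([27, 30, 36]), discards the extreme readings and averages the rest. The sketch below assumes that variant; the function names are ours, and this is an illustration of the resynchronization step, not the paper's specification.

```python
def fault_tolerant_average(readings, m):
    """Convergence-function sketch: discard the m largest and m smallest
    clock readings (which may originate from faulty clocks) and average
    the remaining ones to obtain a reference value.
    """
    if len(readings) <= 2 * m:
        raise ValueError("need more than 2m readings to tolerate m faults")
    trimmed = sorted(readings)[m:len(readings) - m]
    return sum(trimmed) / len(trimmed)

def resynchronize(local_clock, readings, m):
    """Adjust the local clock toward the consensus reference value.
    A real protocol would also bound this correction (the Σ requirement)."""
    return local_clock + (fault_tolerant_average(readings, m) - local_clock)
```

For example, with m = 1 a wildly faulty reading of 250 among `[100, 101, 103, 250]` is trimmed away, and the reference value is the average of the surviving readings.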
3.4 Building Block 4: Exact Agreement
Distributed system functions often require all non-faulty processors to agree mutually on a specific value, even if certain processors in the system are faulty. This agreement is achieved through an agreement protocol that involves several rounds of message exchange among the processors [25]. In the agreement problem, generally known as the Byzantine agreement problem, a single value is initiated by an arbitrary processor and all non-faulty processors have to agree on that value. We now present certain attributes of the Byzantine agreement protocol. A formal treatment of the agreement protocol of [25] is presented in [29].
Module Name: Byzantine Agreement
Module Attributes:
- At least 3m + 1 participants must be present.
- Each participant must be connected to every other participant through at least 2m + 1 disjoint communication paths.
- At least m + 1 rounds of communication among participants must take place.
- All participants must be synchronized to within a known skew of each other.

Requirements:
- Agreement: All non-faulty processors agree on exactly the same value (or course of action).
- Validity: If the transmitter is non-faulty, all non-faulty processors agree on the sender's value.
Protocols implementing Byzantine agreement usually execute in a series of rounds, each round consisting of message exchanges between processors. In the first round, the transmitter sends its value to every processor. In subsequent rounds, each processor sends a copy of every value it has received to every other processor. A processor then takes the majority of the values it receives from the other nodes and the value it received from the transmitter. This majority value is taken to be the value sent by the transmitter.
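The rounds described above can be sketched as the classical recursive Oral Messages algorithm OM(m) of Lamport, Shostak, and Pease [25], which the paper's OMBG specification formalizes later. This Python sketch is ours and only simulates the protocol: the model of a faulty sender (sending a receiver-dependent value) is an assumption made purely for illustration.

```python
from collections import Counter

def majority(values):
    """Deterministic majority over a list of values."""
    return Counter(values).most_common(1)[0][0]

def om(m, commander, lieutenants, value, faulty):
    """Oral Messages algorithm OM(m), sketched as a simulation.

    Returns {lieutenant: decided value}. A sender in `faulty` sends an
    arbitrary (here: receiver-index-dependent) value to each receiver.
    """
    def sent(sender, receiver, v):
        # A faulty sender may tell every receiver something different.
        return (receiver % 2) if sender in faulty else v

    received = {p: sent(commander, p, value) for p in lieutenants}
    if m == 0:
        return received
    # Each lieutenant j relays its received value to the others via OM(m-1);
    # relayed[j][i] is the value lieutenant i settles on for j's relay.
    relayed = {j: om(m - 1, j, [q for q in lieutenants if q != j],
                     received[j], faulty)
               for j in lieutenants}
    # Each lieutenant decides by majority over its own value and the relays.
    return {i: majority([received[i]] +
                        [relayed[j][i] for j in lieutenants if j != i])
            for i in lieutenants}
```

With m = 1 and four processors (the minimum 3m + 1), a faulty commander cannot prevent the three non-faulty lieutenants from agreeing, and a faulty lieutenant cannot prevent the others from adopting a correct commander's value.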
3.5 Building Block 5: Checkpoint Establishment
Checkpointing is a commonly used technique for providing sustained operation in a distributed system in the presence of transient faults, without incurring the high performance cost of restarting tasks/processes from scratch as transients are encountered. Over the process flow, periodic or aperiodic, consistent or inconsistent checkpoints are set up, up to which the consistency of a task execution is assured. In case of a transient, the process needs only to roll back to the past consistent checkpoint and restart, rather than rolling back to the initial stage of the process. For a comprehensive discussion of the various types of checkpointing approaches, we refer the reader to [10, 33]. We now define certain attributes of a generic checkpointing function.
Module Name: Checkpointing
Module Attributes:
- The system consists of multiple processes, and these processes communicate by exchanging messages through communication channels.
- Communication failures do not partition the network.
- A set of checkpoints of different processes forms a consistent system state.

Requirements for a consistent state [17]:
- The set contains exactly one checkpoint for each process.
- There is no event for sending a message in a process X succeeding its checkpoint, whose corresponding receive event in another process Y occurs before the checkpoint of Y that is contained in the set.
- There is no event for sending a message in a process X preceding its checkpoint, whose corresponding receive event in another process Y occurs after the checkpoint of Y that is contained in the set.
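The two consistency conditions of [17] quoted above can be sketched as a direct check. This is an illustrative Python sketch (the representation of checkpoints as local event counts and the name `is_consistent` are our assumptions): the first condition rules out orphan messages (received before the receiver's checkpoint but sent after the sender's), the second rules out messages still in flight across the recovery line.

```python
def is_consistent(checkpoints, messages):
    """Check a candidate recovery line against the conditions of [17].

    `checkpoints[p]` is the local event count at which process p took its
    checkpoint in the set; each message is a tuple
    (sender, send_event, receiver, recv_event).
    """
    for sender, send_ev, receiver, recv_ev in messages:
        orphan = (send_ev > checkpoints[sender] and
                  recv_ev <= checkpoints[receiver])
        in_flight = (send_ev <= checkpoints[sender] and
                     recv_ev > checkpoints[receiver])
        if orphan or in_flight:
            return False
    return True
```

For example, with both processes checkpointing at event 5, a message sent at event 6 by X but received at event 4 by Y is an orphan and invalidates the set.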
After having identified the protocol building blocks (Sections 3.1 through 3.5) and subsequently the high-level framework for protocol composition, we now consider a commonly used distributed dependable protocol to illustrate the applicability of our approach. We mention that this example is used to illustrate the concept; there is no claim of completeness for all possible protocols. This protocol is chosen as it comprises protocols widely used in both theory and practice, and is representative of large classes of redundancy management protocols.
3.6 Modular Composition of Fault Detection, Fault Isolation and Resource Reconfiguration (FDIR) Protocol
A fault-tolerant system employing dynamic redundancy techniques achieves its desired dependability attributes in the presence of faults through a process of Fault Detection (and diagnosis), Fault Isolation and resource Reconfiguration, typically referred to as the FDIR paradigm. Fault diagnosis is a key component of this approach, requiring an accurate determination of the status of the system. The fault isolation component prevents a faulty unit from causing incorrect behavior in a non-faulty unit. We give a functional description of the generic on-line fault diagnosis and FDIR algorithm [40] that we use to demonstrate our approach. Our intent here is to outline the basic aspects of fault diagnosis and FDIR protocols to form the basis for the protocol composition example. An excellent survey of the variety of approaches to system diagnosis appears in [3].
Fault Diagnosis and FDIR Algorithm (for each node): Synchronous/Frame-Based Operations
Fault diagnosis in the system is achieved through constant monitoring and exchange of data between different system nodes. Each node broadcasts its output value at frame end. Each node has a majority voter, which votes on the output data from all the nodes, including its own output, at the end of every frame. The determination of an error in a node (X) is achieved by the other nodes in the system as they receive and analyze the incoming messages from X for errors. The message from round (frame) i - 1 arrives at a node. At this node's round-i frame boundary, the incoming message is checked for any perceived errors to the extent of the implemented error detection mechanisms. Suppose an error is perceived in a message coming from a node (X) and this particular node (X) is found to be in a minority in a voting cycle at the end of the frame; then that node (X) is marked as erroneous by all other functional nodes. Consequently, an assigned penalty count for that particular node (X) is incremented, reflecting error detection. This process is repeated over each frame. As long as the penalty count for a node is lower than a chosen "exclusion threshold" penalty value K, the node is declared to be functional and its data used in subsequent operations. When the penalty count for a particular node reaches the pre-specified exclusion threshold value, the node is declared faulty and is no longer allowed to participate in the computations (or majority voting) in subsequent frames. On the other hand, if the node, after being incriminated for an error and having a penalty count greater than 1, starts functioning properly, then a reward count for that node is incremented for every frame the node is in the majority of the vote. At this point there are two different scenarios by which a suspect node can be re-admitted into the system. One, where a node is immediately re-admitted as soon as it starts functioning properly, in which case the reward threshold value is 1.
At this point its penalty count is reset to zero. This is a simplistic solution for handling transient errors. A more conservative (and realistic) scenario is where the node is re-admitted only when the node functions properly (i.e., the node's output data agrees with the majority vote of the remaining functional nodes) for L consecutive cycles, where L is a reward threshold value assigned for error-free operations. With this brief introduction, we first highlight the basic functions/steps which are essential to the overall correctness of the FDIR protocol. We then relate these functions to the protocol building blocks which constitute the hierarchical composition of the FDIR protocol. Following this, for each building block we discuss its role and various issues/nuances in the overall protocol composition.
Module Name: FDIR
Module Attributes:
- All processors execute the same workload and determine the output value using a voting function.
- At the end of a frame, each node votes on the output data from all the nodes, including its own output, and generates an error report.
- At the beginning of every frame, each node broadcasts an error report message regarding each node in the system.
- Upon receipt of a round of error reports, each node votes and collectively agrees upon the penalty count of every other node.
- Once this exact agreement is reached, the penalty count of an accused node is compared to the pre-specified exclusion threshold. If the penalty count exceeds the threshold, each node then votes on the exclusion of the accused node from the operating set, and collectively (Byzantine agreement) agrees upon this decision.
- If an excluded node exhibits correct behavior, its penalty and reward counts are updated, and all other functional nodes agree upon these values.
- If the reward count of an excluded node reaches a predefined readmission threshold, then its inclusion in the current operating set is collectively (Byzantine agreement) agreed upon by all other functional nodes.
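The per-frame exclusion/readmission bookkeeping described above can be sketched as follows. This is our illustrative Python sketch of the rule, using the exclusion threshold K and reward threshold L from the text; the dictionary-based state representation is an assumption for illustration.

```python
def update_counts(state, in_majority, K, L):
    """One frame of penalty/reward bookkeeping for a monitored node.

    `state` holds the node's "penalty", "reward" and "excluded" fields;
    `in_majority` says whether its output agreed with the majority vote;
    K is the exclusion threshold, L the reward (readmission) threshold.
    """
    if in_majority:
        if state["penalty"] > 0:
            state["reward"] += 1
            if state["reward"] >= L:
                # L consecutive clean frames: readmit and reset the counts.
                state["penalty"], state["reward"] = 0, 0
                state["excluded"] = False
    else:
        state["penalty"] += 1
        state["reward"] = 0  # a disagreement breaks the clean streak
        if state["penalty"] >= K:
            state["excluded"] = True  # exclusion threshold reached
    return state
```

Setting L = 1 yields the simplistic immediate-readmission variant; a larger L gives the more conservative scheme in which L consecutive agreeing frames are required.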
A: Choosing the Building Blocks:
Through these identified functions/steps, it can be inferred that the following basic protocol building blocks are essentially needed. We also discuss their role in the overall operation of fault detection, isolation and reconfiguration. The building blocks being used are:
- Voting/Convergence Function: used to (a) determine the output value of the processor and detect any error in an incoming message, (b) compute the average value during the interactive convergence (synchronization) phase, and (c) compute the majority value(s) during the exact agreement phase.
- Synchronization: used to (a) synchronize the various activities of all processors, and (b) implement the atomic broadcast primitive.
- Communication Primitives (specifically, the atomic broadcast primitive): used to achieve the atomicity, order and bounded communication of message exchanges. For example, if the majority value is taken as a median value, then the order of inputs to the voter is indispensable.
- Agreement Function: used to collectively agree upon (a) the penalty count and/or reward count of an accused node, and (b) the inclusion/exclusion of a node in/from the operating set of nodes.
- Checkpointing Function: used to establish recovery points from which to restart the process upon the initiation of reconfiguration of the operating set of nodes. This function may be implemented implicitly by using the frame boundaries from the synchronization function. We have included it as it is a logical function required in the chosen FDIR protocol.
B: Outlining the Block Interactions:
After having identified the building blocks and their role in achieving the overall objectives of the FDIR process, we present the hierarchical composition of the FDIR protocol using the building blocks of redundancy management protocols, namely voting, broadcast, synchronization, agreement and checkpointing, as shown in Figure 3.

[Figure 3 depicts the building blocks (voting/detection, synchronization, broadcast, agreement, checkpointing) composed across levels: building-block specification and verification; protocol composition from building blocks; consistency of specification across building blocks; and protocol verification & validation.]
Figure 3: Hierarchical Composition of a FDIR Protocol

An important step in composing a protocol is to establish the consistency of specifications across the various constituent blocks by demonstrating that the requirements of these building blocks are non-conflicting. Towards this objective, we identify the inter-dependencies of these basic building blocks in the overall FDIR protocol operation, as depicted in Figure 4.
[Figure 4 shows the influence relations among the building blocks (voting/convergence, atomic broadcast, synchronization, agreement, checkpointing, and error detection routines) and the FDIR protocol; an arrow A → B denotes that A influences B.]
Figure 4: Inter-dependencies of Building Blocks of FDIR Protocol

Voting/Conv → FDIR, Sync, Agrmt:
  Primary Attributes/Conditions: at least N = 3m + 1 units; all N units perform identical tasks; all inputs must be from the same round; voting time < incoming message rate; all inputs must arrive within a time window.
  Requirements: majority value(s).
Sync → At.Bcast, FDIR, Chkptg, Agrmt:
  Primary Attributes/Conditions: initially synchronized clocks; bounded drift rate; bounded clock-reading error; at least 3m + 1 processes.
  Requirements: synchronized clocks; bounded maximum skew; bounded correction amount.
Agrmt → FDIR:
  Primary Attributes/Conditions: at least 3m + 1 participants; at least 2m + 1 disjoint communication paths; at least m + 1 rounds of message exchange; synchronous system model.
  Requirements: agreement and validity.
At.Bcast → FDIR, Chkptg:
  Primary Attributes/Conditions: synchronous system model; bounded communication delays; synchronized clocks of correct processors.
  Requirements: atomicity, order and termination.
Chkptg → FDIR:
  Primary Attributes/Conditions: connected network even upon failures; process co-ordination to establish checkpoints.
  Requirements: a consistent system state.

Table 1: Inter-Block Requirements For Byzantine Fault Coverage

Since the influence paths in Figure 4 outline the interactions, our next step is to identify the exact conditions that need to be maintained, or the conditions a particular block imposes on another block. This helps ascertain the exact requirements, whether conflicting or non-conflicting, that need to be observed across the blocks. As the modularization approach helps us identify the specific attributes of each block, we summarize the conditions/requirements (for Byzantine fault coverage) for each block interaction in Table 1. After having identified the basic inter-block requirements, we now consider an influence path from Figure 4, specifically Voting/Conv → Sync → At.Bcast → FDIR, to elaborate on the nuances of block interactions. As we see from Table 1, the influence of the atomic-broadcast block on the FDIR block is that the underlying system model must be synchronous. The synchronous nature of the system essentially requires the synchronization block. The voting/convergence block primarily influences the system configuration for the given fault coverage. Now, suppose that for the Chkptg → FDIR block interaction, one chooses an asynchronous checkpointing algorithm to establish recovery points. A basic assumption in asynchronous checkpointing algorithms is that the message transmission delay is arbitrary but finite. However, from the correctness viewpoint, a protocol that is designed for an asynchronous system (weaker assumptions) will still be correct when executed in a synchronous system (stronger assumptions).
The most stringent requirement of an FDIR protocol is to identify a faulty unit so as to restrict the effects of faults on the system operations. Fault diagnosis is a key component of the FDIR protocol. Any algorithm to diagnose a faulty processor should satisfy the following two properties: (a) correctness: every processor that is diagnosed by a non-faulty processor as being faulty should indeed be faulty, and (b) completeness: every faulty processor in the system is identified. As per the chosen fault model, processors in the system may exhibit arbitrary behavior. Typically, in the presence of malicious processors, systems use an interactive consistency algorithm (ICA) to ensure agreement among the non-faulty processors. Since each execution of the ICA consists of a number of rounds of message exchange, the messages exchanged during the course of the ICA can be used to diagnose the faulty processors. The basic assumptions made in a diagnosis algorithm are that a non-faulty processor can identify the sender of an incoming message, and can detect the absence of an expected message or its deviation from the specified time window. Moreover, Byzantine agreement is possible only for synchronous distributed systems, where the message delays and the differences in relative speeds of processors are bounded.
Composition Guidelines Revisited:
After having identified these basic requirements, we now refer back to the composition guidelines presented in Section 2.1 to highlight some important issues of this FDIR composition.
First of all, we need to ensure that constituent blocks are semantically non-conflicting. Since Byzantine fault coverage is required for the FDIR protocol, all building blocks, such as atomic broadcast, synchronization, agreement, etc., must have the underlying Byzantine failure semantics. For example, an atomic broadcast protocol must guarantee atomicity even in the presence of Byzantine failures. Furthermore, the synchronization protocol may execute an interactive consistency algorithm to obtain the vector so that every processor applies a convergence function on the same set of values. The assumption of synchrony must be sustained across all constituent blocks. Next, to ensure consistency across different levels, building blocks used repeatedly in the protocol composition must be specified independently across the levels. It is important to mention that the core specification of these building blocks is not changed; only those specifications which are specific to a chosen function are modified. We observe that over this composition the voting/convergence block is used repeatedly in different operations such as majority-output-value evaluation, synchronization, and agreement. Thus, the properties of voting/convergence functions specific to a particular operation get specified separately in each individual operation. We emphasize that this "separation of specifications" does not deviate from our "reuse philosophy" but eases the verification process as well as the modification of specifications, if needed. In most situations it is not necessary for a malicious processor to exhibit its arbitrary behavior m + 1 times before being diagnosed as faulty. It is, however, important to point out that the completeness of the diagnosis procedure cannot be guaranteed if the algorithm performs fewer than m + 1 rounds of message exchange. This emphasizes the fact that the invariants of a building block at a higher level (in this case, the agreement block) apply to the composite protocol itself.
We reiterate that once the basic formal specification and analysis of a particular building block is developed, the associated axioms and formal theories of these blocks remain essentially unchanged over the protocol composition. This highlights the reusability of formal concepts in our building-block approach to protocol composition.
C: Issues in Inter-Block Composition/Interactions:
An important step in constructing a modular composition of a protocol is to identify and separate the required functionality into self-contained modules or building blocks with well-defined interfaces. The effectiveness of the process of isolating and defining these components depends on the ability to identify the underlying direct and indirect dependencies between building blocks. Dependencies between functions can complicate the process of defining the modules. Another limitation is that these dependencies can change as the modules are used in a different way. We illustrate this complication with an example of the timing requirements of the basic functions of the FDIR protocol. For a specific application, the system diagnosis cycle is first decided. It is important to point out that at the start of each frame (or round) all the nodes in the system must be synchronized and performing the same set of computations. This condition imposes requirements on the synchronization block to bring all the functional nodes within a known skew. The frame boundary also dictates the voting cycle time and, in turn, the processing time of the voter being used. As mentioned in Section 3.1, all inputs to the voter must arrive within a specified time window; this requirement in the voter specification actually depends on the "closeness" requirement specified in the synchronization block. Moreover, as the times for establishing checkpoints are approximated based on the predicted behavior of the process, this condition actually defines the bound on failure detection and reconfiguration delays. We note that since most of these basic primitives employ message exchanges, the various timing requirements actually define conditions for the bounded communication provided by the broadcast primitive. To further elaborate on this aspect, let us consider the following scenario. As mentioned, the FDIR protocol employs a non-authenticated message type for information exchanges. Now suppose one selects an atomic broadcast which guarantees atomicity under the Byzantine fault model by assuming an authenticated message type. This would then be inconsistent with the chosen system model, as 3m + 1 redundant units are no longer needed for agreement when authenticated messages are used.
So far, we have only considered semantic-level non-conflicting aspects of the protocol composition, to ensure that the underlying system, fault and communication models are consistent. We also discussed how a timing constraint on a specific block can influence the timing requirements of other basic blocks. In a practical setup, it is quite possible that the attributes and requirements of building blocks, and the services provided by them, can affect the functionality of other blocks if configured in a particular way. In such cases, it is necessary to be able to identify such conditions and a way to configure the blocks to make them work correctly. We explain this issue with an example. For illustration purposes, we focus on the compatibility of three specific building blocks, namely synchronization, atomic broadcast, and checkpointing, being used in the FDIR composition. One of the requirements [17] of a consistent set of recovery points (see the module attributes in Section 3.5) taken by the checkpointing operation is that there are no orphan messages. As will be illustrated, the ability of a checkpointing operation to ensure that there are no orphan messages in a consistent set of recovery points is governed by the properties and correctness of the synchronization and atomic broadcast primitives. As mentioned earlier, the checkpointing operation can be implemented using the frame boundaries from the synchronization function. In reality, the best that can be guaranteed is that at any global time, the clock readings of any two different processors in the system are within β of each other. Suppose that at time instant T, each process is scheduled to take its checkpoint. Since clocks are not perfectly synchronized, the clocks of the different processors will reach T within β of each other. Consider a particular case of a message exchange between two processors, as shown in Figure 5.

[Figure 5 shows processors X and Y, each taking a checkpoint at local time T; Y's message m, sent after Y's checkpoint, is delivered to X within the skew β, before X's own checkpoint.]

Figure 5: Checkpointing Operation: Interaction of Processes
We reiterate that processors are supposed to take checkpoints at time instant T as per their respective local clocks. Here, processor Y sends a message m to X after establishing its checkpoint. Since there is no lower bound imposed on the broadcast delay (it is only bounded from above by a constant), whenever the actual delivery delay is smaller than the skew β, the message m can get delivered at X before X takes its checkpoint at time instant T. Under this situation, the set of checkpoints taken by X at time instant T will contain an orphan message. The point we illustrate here is that by modularizing and identifying the specific requirements and dependencies of each individual building block, the correct working of the composite protocol can be ascertained. This also helps to identify subtle cases (such as the one discussed above) which might occur when building blocks are operating together. For the checkpointing case discussed above, modular composition can give insights into the development of mechanisms which would be needed to avoid the inclusion of an orphan message in the set of recovery points. This situation can be handled in the following two ways: (a) a process does not send any message after establishing its checkpoint until all the processes have established their checkpoints, or (b) a process does not consume any message delivered within the skew window before establishing its checkpoint. The correctness of either of these two approaches then needs to be established with respect to the approximate clock synchronization and broadcast primitives. To achieve the second possibility, all messages sent by a process between its kth and (k+1)th checkpoints are tagged with k to indicate the interval in which they are sent. Only messages with tag k - 1 are included in the kth checkpoint. The correctness of this technique is guaranteed by the existence of a common time base obtained by synchronizing the clocks of all processors in the system.
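The interval-tagging rule just described can be sketched as follows. This Python sketch is ours (the class name and fields are assumptions for illustration): messages sent between checkpoint k and k+1 carry tag k, and checkpoint k records only messages tagged k - 1, so a message delivered early within the clock skew cannot appear as an orphan in checkpoint k.

```python
class Process:
    """Sketch of the checkpoint interval-tagging rule from the text."""

    def __init__(self):
        self.checkpoints_taken = 0   # k: how many checkpoints taken so far
        self.delivered = []          # received (tag, message) pairs

    def send(self, message):
        # Messages sent between the kth and (k+1)th checkpoints carry tag k.
        return (self.checkpoints_taken, message)

    def receive(self, tagged):
        self.delivered.append(tagged)

    def take_checkpoint(self):
        self.checkpoints_taken += 1
        k = self.checkpoints_taken
        # The kth checkpoint records only messages tagged k - 1; a message
        # tagged k (sent after the sender's kth checkpoint) is excluded.
        snapshot = [msg for tag, msg in self.delivered if tag == k - 1]
        return k, snapshot
```

In the scenario of Figure 5, Y's message m sent after Y's checkpoint carries tag 1 and is therefore excluded from X's first checkpoint even if it is delivered before X checkpoints, while a message sent before Y's checkpoint (tag 0) is included.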
Illustration of Functional Composition:
After having discussed the issues and nuances of the FDIR protocol composition, we next present fragments of formal specifications written in PVS (see footnote 4) to illustrate functional composition. We present a general discussion of the diagnosis procedure rather than getting into the details of the various aspects of the FDIR protocol. Let Syndrome capture the property that in round r, i believes j is faulty. For some value type T, with r being the round number, a, b, i, j denoting processors, proc_set denoting a set of processors, and send denoting the value being sent by the sender to the other processors, Syndrome can be formalized as follows:

Syndrome(r, j, i) : T =
  IF Val(r, j, i) /= send(Val(r, j, i), j, i) THEN BAD ELSE GOOD ENDIF
In order for a processor a to declare a processor b faulty, a executes an interactive consistency algorithm (ICA) using Syndrome as follows:

Declare_Faulty(proc_set, r, b, a) : bool =
  ICA(proc_set, LAMBDA i : Syndrome(r, b, a))(a)(b) = BAD
Here, interactive consistency is achieved by letting all the N processors in the system run a copy of the Byzantine agreement protocol as shown below:

ICA(proc_set, (v : value_vector)) : Return_Type =
  (LAMBDA i : OMBG(i, 1, v(i), proc_set))
A fragment of the formal specification [29] of the Oral Message Byzantine Generals (OMBG) algorithm [25] is shown below. It illustrates how the majority function is used to compute the majority value(s).

(Footnote 4: PVS is used here as an example formal specification and verification environment. We have found its expressiveness and higher-order logic basis well suited to specifying protocol-level operations; the use of PVS scripts in this paper is based on this. However, the proposed formal modularization approach of this paper is not constrained by the choice of a specific formal language/tool support.)
OMBG(G, r, t, caucus)(p) : RECURSIVE T =
  IF r = 0 THEN
    IF member(G, caucus) AND member(p, caucus) THEN send(t, G, p) ELSE False ENDIF
  ELSE
    IF member(G, caucus) AND member(p, caucus) THEN
      IF p = G THEN send(t, G, G)
      ELSE majority(remove(G, caucus),
                    (LAMBDA q : OMBG(q, r - 1, send(t, G, q), remove(G, caucus))(p)))
      ENDIF
    ELSE False ENDIF
  ENDIF
MEASURE r
In the above specification, caucus refers to a set of processors, majority computes the majority value(s), member checks for set membership, remove eliminates a particular member from the set, and send denotes the value t being sent from the sender to the other processors. OMBG is a recursive function over the number of rounds r (equal to the maximum number of faulty units, m). We can now combine the above functions to formulate a generic diagnosis procedure as follows:
Diagnosis(proc_set, r)(a, b) : RECURSIVE bool =
  IF r = 0 THEN False
  ELSE Declare_Faulty(proc_set, r, b, a) OR Diagnosis(proc_set, r - 1)(a, b)
  ENDIF
MEASURE r
Formalization of FDIR Protocol:
After having discussed functional compatibility issues, we now outline how the different properties required to ensure the correctness of the FDIR operation are formalized (refer to Table 1). Our aim here is to compose the formal specification of the FDIR protocol out of the formal specifications developed for the basic building blocks, and to utilize the proof constructs of these blocks to establish the overall correctness of the FDIR operation. Ultimately, we would like to utilize the resulting verification information of the FDIR protocol to construct test cases and guide experimental validation (refer to [37] for our proposed formal approaches to V&V). It is thus essential to include formalizations of specific properties, such as synchronized clocks, bounded communication delays, and group membership view, as these basic aspects are inevitable in establishing the overall correctness. These properties can be axiomatized (or declared as lemmas) in the theory developed for the FDIR operation. It is important to point out that the correctness of these properties gets established in their respective theories. As an example, the consistent view of the processor group (termed the "caucus") in the function OMBG is guaranteed by the group membership protocol. In [11], it is shown how a group membership protocol can be formally analyzed utilizing synchronization and broadcast primitives. Similarly, the atomicity and bounded delivery of messages using send is guaranteed by the atomic broadcast primitive. The work reported in [41] presents the formal specification and verification of an atomic broadcast protocol which utilizes synchronization and diffusion primitives. We emphasize that the inclusion (or consideration) of these basic primitives in the formal specifications, and the subsequent formal verifications, is essentially needed to gain insights into the dependencies of the FDIR operations on these basic functions and, most importantly, to validate its implementation at run time.
It is important to point out that in order to include specifications of these basic primitives in the FDIR composition, we are only interested in the specific requirements of these blocks and the services provided by them. If a detailed proof of (or an insight into) any specific property is required, we need to follow a rigorous procedure to establish its correctness. We elaborate on this aspect in our subsequent discussions. We present a fragment of some basic formal constructs of the FDIR operation specified in PVS. This also illustrates how different formal theories can be imported within a new theory being developed. Brief functional descriptions are marked with % in the PVS specification. We acknowledge that the development of complete theories of these building blocks and their usage in actual compositional protocol verification
is an ongoing work. It is important to mention that, to set up the specification of the FDIR protocol, we specify the building-block protocols at a higher abstraction rather than incorporating the complete details. For example, for the clock synchronization block we axiomatize the fact that the clocks of two correct processors are approximately synchronized, and specify the invariants of this block which must be satisfied over this composition. That is, we do not go into the details of specifying the procedure used for reading clock values, the convergence function, or the voting mechanism.

FDIR [Processor : TYPE] : THEORY
BEGIN
  % Note ``;'' separates different declarations (it is used to save space).
  % Importing parameterized theories...
  IMPORTING Clock_Sync[Processor], Broadcast[Processor, Message], Membership[Processor]

  p, q : var Processor
  group : TYPE = setof[Processor]
  clock_time : TYPE = nat ; delta, T, U : clock_time
  time : TYPE = {x : real | x >= 0} containing 0
  t : var time ; broadcast_delay, detection_delay, join_delay : clock_time
  m : var Message ; g : var group
  initiate(p, m)(T) : bool ; deliver(p, m)(T) : bool
  interval : TYPE = setof[time] ; P : VAR pred[time] ; I : VAR interval
  Fail : [Processor -> pred[time]] ; Alive : [Processor -> pred[time]]

  during?(P, I) : bool = (FORALL (t : time) : member(t, I) IMPLIES P(t))
  Correct(r)(t) : bool = EXISTS (t0 : time) : t0 <= t AND during?(Alive(r), close(t0, t))
  C : [Processor, time -> clock_time]
  clock_value : [Processor, time -> clock_time]

  % Synchronized clock assumption (see Module Attributes in Section 3.3)
  Clock_Sync : AXIOM Correct(p)(t) AND Correct(q)(t) IMPLIES
                       abs(clock_value(q, t) - clock_value(p, t)) < delta

  % Atomicity property of broadcast (see Module Attributes in Section 2.2)
  Atomicity : bool =
    (FORALL (p, q, m, U) :
       deliver(p, m)(U) AND
       (EXISTS (t : time) : C(p, t) = U AND Correct(p)(t) AND Correct(q)(t))
     IMPLIES (EXISTS (s : Resource, V : clock_time) :
                initiate(s, m)(V) AND deliver(q, m)(U)))

  % Termination property of broadcast (see Module Attributes in Section 2.2)
  Termination : bool =
    (FORALL (p, m, T) :
       initiate(p, m)(T) AND
       (EXISTS (t : time) : C(p, t) = T AND
          during?(Alive(p), close(T, T + broadcast_delay)))
     IMPLIES (FORALL (q : Resource) :
                during?(Alive(q), close(T, T + broadcast_delay)) AND
                (EXISTS (t : time) : C(q, t) = T)
              IMPLIES deliver(q, m)(T + broadcast_delay)))

  % Bounded failure detection [Group Membership Protocol (GMP)]:
  % there exists a time constant `d' (detection_delay) such that if a processor `p'
  % belonging to a group `g' fails at time `t', then, by `t + d', all members of `g'
  % that stay correct in the interval [t, t+d] will join a new group that does not
  % contain `p'.
  view(p)(T) : group
  Failure_Detection : bool =
    (FORALL p, q, g, T :
       member(p, g) AND member(q, g) AND Fail(p)(T) AND
       during?(Alive(q), close(T, T + detection_delay))
     IMPLIES NOT member(p, view(q)(T + detection_delay)))

  % Includes definitions of OMBG, ICA, Diagnosis and other functions as described earlier...
  % Recall ``caucus'' -- a group of processors being used in OMBG; consistency of the caucus
  % over each round is guaranteed by GMP. ``proc_set'' is computed by GMP before execution
  % of each FDIR round. Agreement on ``proc_set'', or the processors' view, is formalized below.
  joined(p)(T) : bool
  Agreement : bool =
    (FORALL p, q, T :
       Correct(p)(T) AND Correct(q)(T) AND joined(p)(T) AND joined(q)(T)
     IMPLIES view(p)(T) = view(q)(T))

  frame : var interval ; proc_set : var group
  Caucus_AX : AXIOM
    (FORALL (p, t, frame) :
       member(t, frame) AND Correct(p)(t) IMPLIES view(p)(t) = proc_set)

  % Utilizing ICA, we formalize the step where each node collectively agrees upon the
  % exclusion of an accused node.
  count : TYPE = posnat ; penalty_count : count ; round : posnat
  Penalty_Count : [Processor -> count] ; accused_value_vector : TYPE = [proc_set -> T]
  Exclude_Accused?(p : Processor) : bool =
    Penalty_Count(p) > penalty_count AND
    (FORALL (i, j : Processor, r : round) :
       i /= j AND p = j AND ICA(caucus, accused_value_vector)(i)(j) = BAD)

  % Agreement on the inclusion of an accused node after monitoring its correct behavior
  % is formalized below.
  reward_count_threshold : count
  Reward_Count : [Processor -> count] ; approved_value_vector : TYPE = [proc_set -> T]
  Include_Accused?(p : Processor) : bool =
    Reward_Count(p) > reward_count_threshold AND
    (FORALL (i, j : Processor, r : round) :
       i /= j AND p = j AND ICA(caucus, approved_value_vector)(i)(j) = GOOD)

  % If a processor `p' is allowed to join the group, then its i.d. will be included in
  % the membership view.
  Recognition : bool =
    (FORALL p, g, T : joined(p)(T) IMPLIES g = view(p)(T) AND member(p, g))

  % ...
END FDIR
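To make the intent of the Clock_Sync axiom concrete, the following Python sketch gives one executable reading of it: for any two processors that are correct at a real time t, their clock readings must differ by less than delta. The clock model, the skew values, the failure time and the bound are our own illustrative assumptions, not part of the PVS theory.

```python
# Hypothetical executable reading of the Clock_Sync axiom. All names and
# the toy clock/failure models below are assumptions for illustration.

DELTA = 5  # assumed synchronization bound (clock ticks)

def clock_value(proc, t):
    # Toy clock model: each processor reads real time plus a small fixed
    # skew, standing in for a synchronized clock.
    skew = {"p1": 0, "p2": 2, "p3": -1}
    return t + skew[proc]

def correct(proc, t):
    # Toy correctness predicate: p3 stops being correct at t = 50.
    return not (proc == "p3" and t >= 50)

def clock_sync_invariant(procs, t, delta=DELTA):
    """Check |clock_value(q,t) - clock_value(p,t)| < delta for every pair
    of processors that are correct at t (the Clock_Sync axiom)."""
    correct_procs = [p for p in procs if correct(p, t)]
    return all(abs(clock_value(q, t) - clock_value(p, t)) < delta
               for p in correct_procs for q in correct_procs)

print(clock_sync_invariant(["p1", "p2", "p3"], 10))  # True: skews within delta
```

Note that the invariant quantifies only over correct processors: after t = 50, p3's clock no longer constrains the check, mirroring the Correct(p)(t) guards in the axiom.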
With this example, our intent was to illustrate how the various attributes and associated formal theories of different building blocks are utilized in the protocol composition. Specifically, we have shown how the properties of group membership are utilized to ensure a consistent view of the "caucus" over each round of FDIR operation, and how the ICA function is used to formalize the exclusion as well as the inclusion of an accused node. We also formalized the fact that once a processor is included in the operating set, its i.d. must be included in its membership view. Through the building-block approach and the formulation of various aspects of the identified basic primitives, we gain insights into the proof steps needed to establish the overall correctness of the FDIR operations. For example, (a) the consistency of the "caucus" over each round of the FDIR operation can be checked utilizing the boolean predicates Agreement and Recognition, and (b) the inclusion of an accused node which has exhibited correct behavior over several rounds of operation can be ascertained by checking the boolean predicate Recognition. It is important to reiterate that, as higher-level and more complex functions are composed as an aggregation of building blocks, the rigorous matching and conflict resolution we perform over the hierarchical composition, together with the assured correctness of the underlying blocks, makes the design, verification and subsequent validation of the hierarchically/modularly composed protocol systematic and easier. Unlike current ad-hoc approaches to dealing with dependencies and interactions, the proposed formal framework presents a methodical approach.
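As an illustration of the exclusion/inclusion step just described, the following Python sketch gives one possible operational reading of the Exclude_Accused? and Include_Accused? predicates: a node is excluded once its penalty count exceeds a threshold and every other caucus member's ICA-agreed verdict on it is BAD, and re-inclusion mirrors this with reward counts and GOOD verdicts. The threshold values, the GOOD/BAD vote encoding and the unanimity rule are assumptions made for illustration, not the PVS semantics.

```python
# Illustrative operational reading of Exclude_Accused?/Include_Accused?.
# Thresholds, vote encoding and the data layout are assumptions.

PENALTY_THRESHOLD = 3
REWARD_THRESHOLD = 5

def exclude_accused(node, penalty_count, verdicts):
    """verdicts maps each caucus member i to its agreed verdict
    ('GOOD'/'BAD') about `node`, i.e. the ICA output for that node."""
    return (penalty_count[node] > PENALTY_THRESHOLD and
            all(v == "BAD" for i, v in verdicts.items() if i != node))

def include_accused(node, reward_count, verdicts):
    """Re-admit `node` once its reward count passes the threshold and
    every other member agrees its recent behavior was GOOD."""
    return (reward_count[node] > REWARD_THRESHOLD and
            all(v == "GOOD" for i, v in verdicts.items() if i != node))

# Node n3 has accumulated 4 penalties and every other member agrees it is BAD:
penalties = {"n1": 0, "n2": 1, "n3": 4}
bad_votes = {"n1": "BAD", "n2": "BAD", "n3": "GOOD"}  # n3's self-view is ignored
print(exclude_accused("n3", penalties, bad_votes))    # True

# After 6 rounds of observed correct behavior, n3 may rejoin:
rewards = {"n1": 0, "n2": 0, "n3": 6}
good_votes = {"n1": "GOOD", "n2": "GOOD", "n3": "GOOD"}
print(include_accused("n3", rewards, good_votes))     # True
```

The point of feeding the *agreed* verdict vector (rather than raw accusations) into these checks is exactly the role ICA plays in the composition: all correct members base exclusion and inclusion decisions on the same values.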
4 Discussion and Conclusions
Our outlook in this paper has been to identify and utilize protocol building blocks for subsequently composing a variety of dependable distributed protocols based on them, i.e., a reuse philosophy. With our focus being on redundancy management, we have identified basic building blocks which are inherent in providing such services. The key idea is that if a library of system, fault, communication and task models, and of building blocks based on them, can be formulated, then these elements aid in the systematic and hierarchical
development of the formal models of dependable distributed protocols. The basis is that by defining and a priori validating building blocks for dependable distributed protocols, larger and more complex protocols can be easily verified and validated. We have presented formal characterizations of the assumptions and properties of these blocks to help the designer ensure that these blocks can be configured properly, and that the correctness of the composite protocol can be methodically and rigorously established.
Common Themes/Lessons Learnt:
Based primarily on the basic functional and temporal requirements, and the fault coverage of the protocol and/or the system, the first step was to identify building blocks which satisfy such requirements and have uniform timing and failure semantics. Following this, we needed to highlight the invariants/requirements of these building blocks which must be satisfied across different hierarchical levels. In order to obtain a viable composition out of these building blocks, we had to identify the inter-dependencies of these building blocks for a chosen configuration. The next objective was to determine what restrictions or requirements one particular block imposed on another. This part was crucial in the sense that it determined whether there were any conflicts among the building blocks composed/configured together. In the FDIR composition example discussed in this paper, all building blocks had the same fault coverage (i.e., Byzantine faults) and synchrony assumptions (i.e., a synchronous system). For this particular case, the inter-block interactions were relatively straightforward. However, it is important to highlight that particular configurations of protocol building blocks may require formal analyses to determine the weakest properties that can still guarantee overall correctness of the protocol. For maximum benefits, system building blocks should have a simple and concise definition and specification of their behavior and requirements.

We next discuss some of the observations made during these studies. Any interaction between two building blocks defines a dependency between them. While composing a given protocol, building blocks may be stacked on top of each other or may be glued at the same level. With the "Lego" or stack-up style of protocol composition (e.g., Horus, x-Kernel), the interfaces of these components can be systematically defined.
When building blocks are at the same level of the hierarchy, it is important to ensure that they cooperate with each other to implement a set of services. In this respect, it is therefore important to detect interactions between building blocks so as to define the subtle interdependencies between them. Understanding these dependencies and their effects on the composition can help the designer ensure that the composite protocol will provide the desired services. In a hierarchical approach to protocol composition, even if we have shown a composition at a particular level of abstraction to be sound (i.e., free of conflicts), it is quite possible that incorporating new parameters (implementational or temporal) may cause some of the subtle inter-block dependencies or conflicts to surface.

In our opinion, a specification language should support an axiomatic and definitional style of specification. A higher-order logic-based specification language, which provides expressiveness, a set of built-in data types and support for strong typechecking, is best suited for defining formal theories of these building blocks. The language should allow specifications to be structured into parameterized theories. Also, parameters should be allowed to have constraints imposed upon them. Additionally, the formalism should allow instantiations of theories. Building blocks with parameterized theories can then be easily instantiated in the formal theory being developed for the protocol under consideration. The specification language of PVS [24], for example, allows for such constructs. The advantage of this type of specification language in modular composition is that the built-in typechecker can flag inconsistencies in type definitions appearing in different theories, or in instantiations of imported theories of building blocks, and can thus detect simple errors at an early stage.
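A first, mechanical form of such conflict detection, i.e., checking that the blocks being composed declare uniform fault coverage and synchrony assumptions (as in the FDIR example), can be sketched as follows. The block-descriptor format and attribute names are our own assumptions; a real toolchain would derive these attributes from the blocks' formal theories.

```python
# Minimal sketch of a composition-time consistency check: every building
# block declares its fault model and synchrony assumption, and a
# composition is flagged if the blocks disagree. Descriptor format is
# an assumption made for illustration.

def check_composition(blocks):
    """Return a list of conflict messages; an empty list means the blocks
    have uniform fault and timing semantics (as in the FDIR example)."""
    conflicts = []
    for attr in ("fault_model", "synchrony"):
        values = {b[attr] for b in blocks}
        if len(values) > 1:
            names = ", ".join(b["name"] for b in blocks)
            conflicts.append(f"{attr} differs across {names}: {sorted(values)}")
    return conflicts

fdir_blocks = [
    {"name": "Clock_Sync", "fault_model": "byzantine", "synchrony": "synchronous"},
    {"name": "Broadcast",  "fault_model": "byzantine", "synchrony": "synchronous"},
    {"name": "Membership", "fault_model": "byzantine", "synchrony": "synchronous"},
]
print(check_composition(fdir_blocks))   # []: no conflicts

mixed = fdir_blocks + [{"name": "Checkpointing", "fault_model": "crash",
                        "synchrony": "asynchronous"}]
print(len(check_composition(mixed)))    # 2: both attributes conflict
```

Such a syntactic check only catches the direct mismatches discussed above; the subtle indirect dependencies still require the formal analysis described in the text.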
Caveats/Limitations:
Even with these guidelines, selecting an appropriate set of building blocks for a given protocol can sometimes be difficult due to subtleties associated with fault-tolerance and timing attributes. We acknowledge some of the current limitations of our modular approach:
The choice of which building blocks to include when composing a given protocol, and the parameterization of their respective formal theories, are very much an intuitive process. In order to ensure a viable composition, the user needs to selectively choose building blocks which satisfy the varied functional, temporal and dependability requirements. The inability to identify subtle indirect dependencies among building blocks is the major cause of problems here. Although the formal specification and verification of individual building blocks may be easy, the formal verification of the composite protocol may not be straightforward. For example, the formal specification and verification of a majority voter as an individual building block may be relatively easy. However, establishing the correctness properties of different instances of this building block in the FDIR protocol composition may require additional effort and ingenuity. With a better understanding of building-block/component behavior and mathematically well-defined external specifications, this problem can be alleviated to a certain extent.
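To illustrate the voter point above, a simple exact-match majority voter, the kind of block that is easy to specify and verify in isolation, can be sketched as follows. The interface is our own assumption and this is not the paper's PVS formalization; generalized voters (e.g., median or plurality variants, as surveyed in [21]) would replace the comparison rule.

```python
# A simple exact-match majority voter: returns the value reported by a
# strict majority of replicas, or None if no strict majority exists.
# This is a stand-alone illustrative block, not the paper's formal model.
from collections import Counter

def majority_vote(values):
    """Return the value held by more than half the replicas, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None

print(majority_vote([7, 7, 3]))       # 7
print(majority_vote([1, 2, 3]))       # None: no strict majority
print(majority_vote([5, 5, 5, 9]))    # 5
```

Verifying this block in isolation is straightforward (e.g., the output, if not None, is always an input value held by a majority); the difficulty noted above arises when several instances of it are composed with diagnosis and membership in FDIR.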
Perspectives on Future Protocol Composition and Related Issues:
As future systems grow more complex, with stricter real-time and dependability specifications, the design of a protocol and, most importantly, its formal reasoning will necessitate a modular or hierarchical approach to protocol composition. We have introduced initial approaches to identify functional blocks for subsequent reuse in formulating/modularizing distributed dependable protocols. We envision that a modular approach such as the one proposed in this paper will facilitate the modular composition of future protocols and their formal treatment. Such building blocks will be useful to system designers because they provide thoroughly (and rigorously) tested formal theories of required system and component behavior, and will support system design decisions and modifications. One interesting viewpoint is to investigate whether the correctness properties of these guiding principles can themselves be formally defined and established. For the modular composition of large or complex protocols, an immediate need is an automated procedure to ensure the consistency of the varied requirements of building blocks across different levels. Further research in this context will require detailing and defining external specifications of building blocks to facilitate such automated mechanisms.
References
[1] R. Alur, T.A. Henzinger, F.Y.C. Mang, S. Qadeer, S.K. Rajamani, S. Tasiran, "MOCHA: Modularity in Model Checking." LNCS 1427, pp. 521-525, Springer-Verlag, 1998.
[2] A. Arora, M.G. Gouda, "Closure and Convergence: A Foundation of Fault Tolerant Computing." IEEE Trans. on Software Engineering, 19(10), pp. 1015-1027, Oct. 1993.
[3] M. Barborak, M. Malek, A. Dahbura, "The Consensus Problem in Fault-Tolerant Computing." ACM Computing Surveys, 25(2), pp. 171-220, June 1993.
[4] D.M. Blough, H.W. Brown, "The Broadcast Comparison Model for On-Line Fault Diagnosis in Multicomputer Systems: Theory and Implementation." IEEE Trans. on Computers, 48(5), pp. 470-493, May 1999.
[5] E.M. Clarke, J.M. Wing, et al., "Formal Methods: State of the Art and Future Directions." ACM Computing Surveys, 28(4), pp. 626-643, Dec. 1996.
[6] F. Cristian, "Understanding Fault-Tolerant Distributed Systems." CACM, 34(2), pp. 57-78, Feb. 1991.
[7] F. Cristian, H. Aghili, R. Strong, D. Dolev, "Atomic Broadcast: From Simple Message Diffusion to Byzantine Agreement." Information and Computation, 118:158-179, April 1995.
[8] F. Cristian, C. Fetzer, "The Timed Asynchronous Distributed System Model." IEEE Trans. on Parallel and Distributed Systems, 10(6), pp. 642-657, June 1999.
[9] R. De Prisco, et al., "Building Blocks for High Performance and Fault-Tolerant Distributed Systems." Details available at http://www.lcs.mit.edu/research/projects, 1999.
[10] E.N. Elnozahy, L. Alvisi, Y-M. Wang, D.B. Johnson, "A Survey of Rollback-Recovery Protocols in Message-Passing Systems." CMU Technical Report CMU-CS-99-148, June 1999.
[11] Y. Gurevich, R. Mani, "Group Membership Protocol: Specification and Verification." Specification and Validation Methods, pp. 295-328, Oxford University Press, 1995.
[12] T. Henzinger, S. Qadeer, S.K. Rajamani, "You Assume, We Guarantee: Methodology and Case Studies." Proc. of CAV'98, June/July 1998.
[13] M.A. Hiltunen, R.D. Schlichting, "An Approach to Constructing Modular Fault-Tolerant Protocols." Proc. of the 12th IEEE Symposium on Reliable Distributed Systems, pp. 105-114, Oct. 1993.
[14] J. Hooman, Specification and Compositional Verification of Real-Time Systems. LNCS 558, Springer-Verlag, 1991.
[15] G.J. Holzmann, Design and Validation of Computer Protocols. Prentice Hall, 1991.
[16] N. Hutchinson, L. Peterson, "The x-Kernel: An Architecture for Implementing Network Protocols." IEEE Trans. on Software Engineering, 17(1), pp. 64-76, Jan. 1991.
[17] P. Jalote, Fault Tolerance in Distributed Systems. Prentice Hall, 1994.
[18] C.M. Krishna, K.G. Shin, Real-Time Systems. McGraw-Hill, 1997.
[19] L. Lamport, S. Merz, "Specifying and Verifying Fault-Tolerant Systems." Proc. of FTRTFT, LNCS 863, pp. 41-76, 1994.
[20] X. Liu, C. Kreitz, R. van Renesse, J. Hickey, M. Hayden, K. Birman, R. Constable, "Building Reliable, High-Performance Communication Systems from Components." Operating Systems Review, 34(5), pp. 80-92, Dec. 1999.
[21] P.R. Lorczak, A.K. Caglayan, D.E. Eckhardt, "A Theoretical Investigation of Generalized Voters for Redundant Systems." Proc. of FTCS-19, pp. 444-451, 1989.
[22] P. Michel, V. Wiels, "A Framework for Modular Formal Specification and Verification." Proc. of FME'97, 1997.
[23] S. Misra, L. Peterson, R. Schlichting, "Consul: A Communication Substrate for Fault-Tolerant Distributed Programs." Distributed Systems Engineering, 1(2), pp. 87-103, 1993.
[24] S. Owre, J. Rushby, N. Shankar, F. von Henke, "Formal Verification for Fault-Tolerant Architectures: Prolegomena to the Design of PVS." IEEE Trans. on Software Engineering, 21(2), pp. 107-125, Feb. 1995.
[25] M. Pease, R. Shostak, L. Lamport, "Reaching Agreement in the Presence of Faults." Journal of the ACM, 27(2), pp. 228-234, Apr. 1980.
[26] D.K. Pradhan (Ed.), Fault Tolerant Computer System Design. Prentice Hall, 1996.
[27] P. Ramanathan, K.G. Shin, R.W. Butler, "Fault-tolerant Clock Synchronization in Distributed Systems." IEEE Computer, 23(10), pp. 33-42, Oct. 1990.
[28] P. Ramanathan, K.G. Shin, "Use of Common Time Base for Checkpointing and Rollback Recovery in a Distributed System." IEEE Trans. on Software Engineering, 19(6), pp. 571-583, June 1993.
[29] J. Rushby, "Formal Verification of an Oral Message Algorithm for Interactive Consistency." SRI Technical Report CSL-92-01, 1992.
[30] F.B. Schneider, "Understanding Protocols for Byzantine Clock Synchronization." Technical Report 87-859, Dept. of Computer Science, Cornell University, NY, August 1987.
[31] N. Shankar, "Mechanical Verification of a Generalized Protocol for Byzantine Fault Tolerant Clock Synchronization." In FTRTFT, LNCS 571, Springer-Verlag, pp. 217-236, 1992.
[32] D.P. Siewiorek, R.S. Swarz, The Theory and Practice of Reliable System Design. Digital Press, Bedford, 1982.
[33] M. Singhal, N.G. Shivaratri, Advanced Concepts in Operating Systems. McGraw-Hill, 1994.
[34] P. Sinha, N. Suri, "Identification of Test Cases Using a Formal Approach." Proc. of FTCS-29, pp. 314-321, 1999.
[35] P. Sinha, N. Suri, "On the Use of Formal Techniques for Analyzing Dependable Real-time Protocols." Proc. of RTSS-20, pp. 126-135, 1999.
[36] N. Suri, M. Hugue, C. Walter, "Synchronization Issues in Real-Time Systems." Proceedings of the IEEE, 82(1), pp. 41-54, 1994.
[37] N. Suri, P. Sinha, "On the Use of Formal Techniques for Validation." Proc. of FTCS-28, pp. 390-399, 1998.
[38] R. van Renesse, K. Birman, S. Maffeis, "Horus: A Flexible Group Communication System." Communications of the ACM, 39(4), pp. 76-83, April 1996.
[39] P. Verissimo, C. Almeida, "Quasi-synchronism: A Step Away from the Traditional Fault-Tolerant Real-Time System Models." IEEE Bulletin of the TCOS, 7(4), pp. 35-39, Winter 1995.
[40] C. Walter, P. Lincoln, N. Suri, "Formally Verified On-Line Diagnosis." IEEE Trans. on Software Engineering, 23(11), pp. 684-721, Nov. 1997.
[41] P. Zhou, J. Hooman, "Formal Specification and Compositional Verification of an Atomic Broadcast Protocol." Real-Time Systems, 9(2), pp. 119-145, 1995.