Lightweight Verification of Network Protocols: The Case of Chord
Pamela Zave
AT&T Laboratories—Research
[email protected]
ABSTRACT
This paper applies modeling in Alloy and verification with the Alloy Analyzer to the well-known Chord protocol. The results include the first rigorous proof that the join-only case is correct, and many counterexamples showing that the full protocol is not correct. The paper also explains the methods so they can be applied to other protocols, justifies their cost-effectiveness in protocol design, and shows how the methods complement other formal techniques.
Figure 1: Ideal (left) and valid (right) rings.
1. INTRODUCTION
The well-known Chord distributed hash table needs no introduction. According to Chord papers [13, 12], “three features that distinguish Chord from many other peer-to-peer lookup protocols are its simplicity, provable correctness, and provable performance.” Yet the Chord routing protocol is neither proven nor correct. The only published proof of correctness excludes failures from consideration [14]. Even within its scope the proof does not compel belief, due to ill-defined terms and missing or unjustified steps. The full protocol is clearly incorrect, even after bugs with straightforward fixes have been eliminated. Not one of the six properties claimed invariant for the full protocol in [9] is invariantly true. The point of these observations is not to disparage the designers of Chord, whose creativity and insight are impressive. Their verification work was performed without the benefit of formal modeling languages and automated analysis tools. The point of illuminating its flaws is to show that design of network protocols is too difficult to achieve without the help of these formal methods. A further purpose is to motivate the modeling and verification techniques presented in this paper, which yield a straightforward proof of what is provable, and examples of the problems that prevent construction of a full correctness proof. The techniques are termed “lightweight” because they can be applied without deep knowledge of formal methods, and because most of the verification steps are fully automated.
1.1 Overview of verification technology
1.1.1 Invariants
In this paper “verification” has its standard meaning in computer science, which is proving that a system satisfies its specification. Many verification techniques rely on an invariant, which is a concise description of the system’s reachable state space. An invariant is usually an incomplete description, focusing only on the properties that are important for system operation and needed for verification.
For example, a routing protocol reacts to changes in a network. Changes degrade routing information, so the protocol runs frequently across the network to restore routing information to its ideal state. The invariant of a routing protocol tells us how badly corrupted or out-of-date the routing information can become, yet still be restored successfully by the protocol.
Figure 1 shows two Chord network states (successors only). The state on the left is ideal, because all the member nodes are joined in a single cycle that is ordered by node identifier. The state on the right is uncorrupted but out-of-date, because some newly-joined nodes have not been absorbed into the cycle. As this state is both reachable and can be corrected by the Chord ring-maintenance protocol, it should satisfy whatever invariant is employed for verification. A state that
satisfies the invariant is valid.1 A specification of a network protocol could center on a predicate Valid expressing the intended invariant, and a predicate Ideal expressing the ideal state of the network. With such a specification, there are two main proof obligations. First, a safety proof must establish that the network never reaches an invalid state—a state that does not satisfy Valid. Second, a liveness proof must establish that, starting in any valid state, if there are no more disruptive changes, then the protocol will eventually bring the network to an ideal state. Invariants have value beyond their use in verification. For one reason, the network invariant is the key to the success of the protocol, because it provides the guaranteed structure on which the protocol works. It should be designed with the protocol, and featured in its documentation. It follows that if the protocol is subtle, then the invariant is also subtle, and may be difficult to find if the protocol was designed first. This problem is common to program verification in all application domains. Recognizing the importance of invariants, the Chord designers have published invariants for three versions of the protocol [9]. The “pure join” model has only events in which new nodes join the network, and events in which the network stabilizes. Stabilization is the part of the routing protocol that incorporates new nodes into the cycle. The “pure failure” model has only events in which nodes fail silently, and events in which the routing tables are reconciled. Reconciliation is the part of the routing protocol that recovers from failures. Finally, the “full” model includes all of these events. Another reason for the importance of invariants is that a large, rapidly-changing network will actually never be in an ideal state. In other words, its invariant describes its normal operating conditions.
1.1.2 Three approaches to verification
A formal modeling language and its automated analysis tool are an inseparable pair. Such pairs are typically categorized according to the tool. The two well-known categories in common use are theorem provers and model checkers.
Theorem provers tend to have rich, expressive modeling languages. They can prove general theorems for networks of any size. Theorem provers have been applied successfully to path-vector and distance-vector routing protocols [3, 15]. Their biggest disadvantage is that theorem proving is not fully automated, so use of a theorem prover requires considerable effort and fairly deep knowledge of theorem proving.
Model checking is the automated verification technique most often used for concurrent systems, because analysis is fully automated. Model-checkers work only on system models with finite state spaces. The checker constructs an internal representation of the reachability graph of the model. The nodes of this graph are the reachable states of the model, and its edges are the possible state transitions. Although this internal representation of the reachable state space is neither concise nor human-readable, it does play the same role as an invariant in verification, without the need for human ingenuity to figure it out. A user need only supply a predicate specifying the ideal state, and the checker can (in principle) complete a liveness proof.
The best-known disadvantage of model checking is that state explosion prevents analysis of large models. Another disadvantage, less well understood, is that the modeling languages of model checkers are on the inexpressive end of the spectrum. They are usually adequate for expressing network operations, but often inadequate for expressing specifications such as invariants. For example, probably the most popular model-checker is Spin [4]. In using Spin, a designer must supply any but the simplest invariants in the form of a program in a language such as C. Then the checker runs the program on each state to check if it satisfies the invariant. If an invariant is intricate, subtle, or unknown, requiring many cycles of trial-and-error to get right, then each cycle of trial-and-error will entail writing, modifying, or debugging a C program. This is daunting to say the least.
A third alternative, less well-known than theorem proving or model checking, is model enumeration. The use of model enumeration has been pioneered by the language Alloy and the Alloy Analyzer [5].2 Like model checking, model enumeration offers fully automated analysis. Like model checking, model enumeration suffers from state explosion, so that model size is limited. The two approaches differ primarily in language expressiveness and analysis trade-offs.
The particular advantage of using Alloy in this context is that the language is extremely good for modeling network structures and expressing network invariants. The particular disadvantage of using Alloy in this context is that time and events are not built into Alloy. They must be built into models by convention and analyzed without special optimization, so that Alloy analysis is computationally limited to short traces. In contrast, the concept of time (formalized by temporal logic) is fundamental to model checking. Model checkers distinguish new reachable system states from states they have already analyzed. With these concepts built-in and highly optimized, model checkers can analyze long and even infinite event traces.
1 Although the Chord ring-maintenance protocol does not look much like other routing protocols, it has the same purpose. We introduce the term “valid” because “invariant” has the wrong meaning (“unchanging”) when used as an adjective.
2 Also http://alloy.mit.edu/community/. The Alloy library contains versions of the Chord lookup protocol.
   sig Node {
      succ: Node lone -> Time,
      prdc: Node lone -> Time
   } {
      all t: Time | no this.succ.t => no this.prdc.t
   }
Figure 2: Network state of the pure-join model.
1.2 Contributions of this paper
This paper considers the hypothesis that model enumeration is a valuable formal method for network protocols. According to this hypothesis, a modeling language with the expressiveness of Alloy is sometimes necessary, particularly for experimenting with invariants. Also according to this hypothesis, the limitations of Alloy analysis—particularly with respect to temporal properties and traces—can be overcome. To test this hypothesis, we developed an Alloy style for modeling protocols with about the same amount of detail as is used in the pseudocode description of Chord. We also developed new specification techniques for dealing with causality and with redundant state structures used for reliability. Finally, we found techniques for liveness proofs that do not require analysis of long traces or large network states. Although Alloy has been used on a network protocol before, the earlier study by Jackson et al. [6] does not employ any of these techniques. The techniques are presented in Sections 3 and 4. The results of these modeling and verification techniques, applied to Chord, are very interesting. For the pure-join case (Section 3), they result in several new insights and the first rigorous proof of correctness. For the full model (Section 4), they show why it is not correct. Analysis of the full model provides more than just a simple negative. We found some bugs with straightforward fixes, and fixed them. We gave formal definitions for the six properties claimed invariant in [9], and developed examples showing both why each is desirable (when that is not clear) and why each is not invariantly true. Some of the claimed properties are not required for correctness in the narrowest sense, but are desirable for other reasons such as good lookup behavior or sound performance analysis. The presentation concludes with a discussion of implications for protocol design (Section 5). 
One possible criticism of this work is that some of the discovered problems appear unlikely to occur in practice. Although this is true, some of the discovered problems appear fairly probable. Also, this is what we would expect of a protocol with the maturity of Chord. The purpose of this research is to encourage the application of formal methods to new protocols as they are being designed. Experience with formal methods indicates that this is faster and more cost-effective than implementation and testing for finding design flaws, even though testing will eventually find the problems that occur with high-enough probability. Another possible criticism of this work is that it does not yield verified protocol implementations. Compared to implementations, the verified models are so simple
and abstract that they avoid many real problems. However, any problem that shows up in an Alloy model is very likely to show up in a real implementation of the same protocol. It makes sense to eliminate these inherent problems before tackling the additional issues that arise in implementation. As further evidence of this, in Section 4.6 we compare our results to the results of several attempts to verify Chord implementations. This study supports the following conclusions:
• Even well-known protocols can be poorly understood. Lightweight formal methods are easy to use on network protocols, particularly when they are used in the design stage. Their benefits, in terms of early problem diagnosis, better documentation, and reliability assurance, amply repay the modest cost.
• Both model checking and model enumeration are lightweight methods that work well for network protocols. Although model checking is common in this domain and model enumeration is not, model enumeration has advantages that complement those of model checking.
2. GETTING STARTED WITH ALLOY
The Alloy language is a seamless combination of relational algebra and first-order predicate logic. Critical second-order operators such as transitive closure are built in.
To begin with a small fragment, Figure 2 defines the network state of a pure-join model of Chord in Alloy. sig Node means that Node is a basic type. Each individual node has two fields, succ and prdc. A possible value of each of these fields is a binary relation from nodes to times. The modifier lone means “zero or one.” Because of the placement of this modifier in the type declarations, each of these relations must associate each time with zero or one node. In other words, at any time, a node has zero or one successor (predecessor) nodes.
The set brackets after the type declaration of the fields of a node enclose an invariant on each node. In this model the presence or absence of a successor indicates whether a node is or is not a member of the network. The node invariant is stating that if a node has no successor (and therefore is not a member), then it has no predecessor.
To read this invariant it is necessary to understand the Alloy semantics of relations. A field such as succ is actually a global relation of type Node -> Node -> Time, not counting the further constraint imposed by lone. Although the relation declared within the node signature is binary, it becomes ternary when the owning
Left:
   sig Node
      Node0
         field succ
            Node1 -> Time0
            Node1 -> Time1
         field prdc
      Node1
         field succ
            Node0 -> Time1
         field prdc
            Node0 -> Time1
Right:
   sig Node
      Node0
         field succ
            Node0 -> Time0
            Node1 -> Time0
            Node0 -> Time1
            Node1 -> Time1
         field prdc
      Node1
         field succ
            Node0 -> Time1
         field prdc
            Node0 -> Time0
            Node0 -> Time1
Figure 3: An instance of Figure 2 (left), and a non-instance (right).
pre- and post-states will be represented by distinct collections of tuples. For short traces and models of moderate size, the Alloy Analyzer can check an assertion in milliseconds or a few seconds. Because of the expressiveness of the language, subtle behaviors and assertions can be expressed easily and entered quickly. Consequently, a typical exploratory cycle of trial-and-error takes no more than a minute or two, so that it is feasible to use Alloy to develop deep understanding of a network protocol.
3. VERIFICATION OF A PURE-JOIN MODEL
3.1 Model construction
As introduced in Section 1.1.1, a “pure join” model of Chord has only join and stabilization operations, so nodes never fail or leave the network once they have joined it. A state of such a model is a set of nodes with timestamped successor and predecessor fields, as declared in Figure 2. The model is far simpler than any implementation would be. In the state alone we can see two simplifications: (1) the concepts of a node, its IP address, and its hashed identifier are all conflated. The identifier is the most important for verification, so nodes in the model correspond to identifiers. An Alloy library component ordering is invoked to ensure that the members of Node are totally ordered. (2) There are no finger tables, as these are derived from the more basic routing information. The events of the pure-join model are modeled in Figure 4. Each ChordEvent is an event with a node field, the value of which is the single node at which the event takes place. Join, Stabilize, and Notify events are all Chord events. Notify events have an additional field newPrdc. Join and stabilize events can occur at any time. Each notify event, on the other hand, is preceded and caused by a stabilize event. To include these constraints in the model, each event has a cause field which can be empty or can contain a single event which is the cause of the event. Join and stabilize events have no causes, while each notify event has a stabilize event as its cause. In this paper stabilization refers to a composite operation consisting of stabilize and notify events. The fact NonmemberCanJoin places all the necessary constraints on join events. In the fact, j, n, and t are the event, its node, and the time of its pre-state, respectively. The fact uses predefined predicates Member and NonMember, defined according to whether the node has a successor at the stated time. First, the joining node must not already be a member. 
Second, there must be a member node m such that n is between m and its successor at t, and in the post-state n has the same successor as m. Third, n has no predecessor in the post-
node is taken into account. Thus the meaning of a tuple (n1,n2,t) in this relation is that node n1 has as its successor node n2 at time t. this refers to the node being constrained, t is a bound variable of the expression, and dot is relational join. So the value of this.succ.t is the set of all nodes appearing as second elements in tuples of the relation succ having this as their first element and t as their third element. The lone constraint tells us that this set is of size zero or one. no this.succ.t is true if and only if the size of this.succ.t is zero. In Alloy terminology, a typical model has many instances, which are collections of individuals and relations that satisfy the constraints in the model. For example, Figure 3 shows two collections of individuals and relations with two node individuals and two time individuals. The one on the left is an instance of Figure 2, while the one on the right is not. On the right, Node0 has two successors at one time. Also on the right, Node1 has a predecessor at Time0, but no successor at that time. Given a model, the Alloy Analyzer checks all possible instances up to a given scope. For example, the scope of the instances above is 2 Node, 2 Time. If there are no instances, the model is inconsistent. If all instances satisfy a given assertion, then the assertion is verified for that scope. The concept of time is not built into Alloy. To study event traces in Alloy, one uses a convention that both times and events are totally ordered sets, and that each event e has a field containing the time immediately preceding it (e.pre) and a field containing the time immediately following it (e.post). The preconditions of the event must be true in the network state immediately preceding it, consisting of all tuples timestamped e.pre. Similarly, the postconditions of the event must be true in the network state immediately following it, consisting of all tuples timestamped e.post. 
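The instance check described above for Figure 3 can be sketched outside Alloy. The following Python fragment is only an illustration, not part of the Alloy model: the tuple sets are our reading of Figure 3 (consistent with the description that, on the right, Node0 has two successors at one time and Node1 has a predecessor at Time0 without a successor), and the helper names `rel_join` and `is_instance` are hypothetical.

```python
# Each tuple (n1, n2, t) means: node n1 has n2 in that field at time t.
LEFT = {
    "succ": {("Node0", "Node1", "Time0"), ("Node0", "Node1", "Time1"),
             ("Node1", "Node0", "Time1")},
    "prdc": {("Node1", "Node0", "Time1")},
}
RIGHT = {
    "succ": {("Node0", "Node0", "Time0"), ("Node0", "Node1", "Time0"),
             ("Node0", "Node0", "Time1"), ("Node0", "Node1", "Time1"),
             ("Node1", "Node0", "Time1")},
    "prdc": {("Node1", "Node0", "Time0"), ("Node1", "Node0", "Time1")},
}
NODES = ["Node0", "Node1"]
TIMES = ["Time0", "Time1"]

def rel_join(rel, node, time):
    """Analogue of node.rel.time: second elements of the matching tuples."""
    return {n2 for (n1, n2, t) in rel if n1 == node and t == time}

def is_instance(fields, nodes, times):
    """Check the 'lone' multiplicity and the node invariant of Figure 2."""
    for n in nodes:
        for t in times:
            succ = rel_join(fields["succ"], n, t)
            prdc = rel_join(fields["prdc"], n, t)
            if len(succ) > 1 or len(prdc) > 1:  # violates 'lone'
                return False
            if not succ and prdc:  # violates: no succ => no prdc
                return False
    return True
```

Under these assumptions `is_instance(LEFT, NODES, TIMES)` holds and `is_instance(RIGHT, NODES, TIMES)` does not. The Alloy Analyzer does far more than this, since it searches all candidate instances within the scope rather than checking one that is handed to it.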
Even if the event does not change the system state, the event’s
   abstract sig ChordEvent extends Event { node: Node }
   sig Join extends ChordEvent { } { no cause }
   sig Stabilize extends ChordEvent { } { no cause }
   sig Notify extends ChordEvent { newPrdc: Node } { some cause }
either case, a notify event is not the cause of any other event. Figure 6 shows the effects of several stabilize and notify events. Events occur sequentially in a trace, and events can be interleaved in traces in any way that does not violate causality constraints. Each event is localized in the sense that it only changes the state of one node, although it can read the states of adjacent nodes. Both the pure-join model and the full model in Section 4 conform closely to the pseudocode descriptions of Chord in [9] and [13]. The parts of the Alloy model not shown in this paper are so routine that they could be copied from one project to another with no changes or only standardized changes. In other words, the two Alloy models discussed in this paper are templates that could be adapted easily to make skeletons for Alloy models of other network protocols.3
   fact NonmemberCanJoin {
      all j: Join, n: j.node, t: j.pre |
         NonMember[n,t]
         && (some m: Node |
               Member[m,t] && Between[m,n,m.succ.t]
               && n.succ.(j.post) = m.succ.t )
         && no n.prdc.(j.post)
         && no cause:>j
   }
   fact StabilizeMayChangeSuccessor {
      all s: Stabilize, n: s.node, t: s.pre |
         let newSucc = (n.succ.t).prdc.t |
            Member[n,t]
            && ( ( some newSucc && Between[n,newSucc,n.succ.t] )
                 => n.succ.(s.post) = newSucc
                 else n.succ.(s.post) = n.succ.t )
            && (some f: Notify |
                  f.cause = s && f.newPrdc = n && f.node = n.succ.(s.post) )
   }
3.2 The Between predicate
The predicate Between is used in all situations to test the order of node identifiers. Its definition is:
   pred Between [n1, n2, n3: Node] {
      lt[n1,n3] => ( lt[n1,n2] && lt[n2,n3] )
                else ( lt[n1,n2] || lt[n2,n3] )
   }
   fact NotifyMayChangePredecessor {
      all f: Notify, n: f.node, p: f.newPrdc, t: f.pre |
         (no n.prdc.t || Between[n.prdc.t,p,n])
         => (n.prdc.(f.post) = p && no cause:>f)
         else (n.prdc.(f.post) = n.prdc.t && no cause:>f )
   }
Recall that nodes are totally ordered, so they can be compared by library predicates such as “less than” (lt). The definition is nontrivial because it must take identifier wraparound into account. The properties of this definition are critical to the success of an implementation, so it makes sense to document them. For any arguments x, y, and z, Between[x,y,z] is never true if x = y or y = z. For any distinct arguments x and y, Between[x,y,x] is always true, as is Between[y,x,y]. The significance of this fact is that it is impossible to order two members of a cycle—obvious when you think about it, but easy to forget when writing definitions.
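The documented properties of Between can be checked mechanically on a small identifier space. The Python sketch below is an illustration only: it uses integers in place of totally ordered Node atoms, and the 8-identifier range is an arbitrary assumption.

```python
from itertools import product

def between(n1: int, n2: int, n3: int) -> bool:
    """Mirror of the Alloy predicate Between, with wraparound."""
    if n1 < n3:
        return n1 < n2 and n2 < n3
    return n1 < n2 or n2 < n3

IDS = range(8)  # assumed small identifier space

# Between[x,y,z] is never true if x = y or y = z.
never_at_endpoints = all(
    not between(x, y, z)
    for x, y, z in product(IDS, repeat=3)
    if x == y or y == z
)

# For distinct x and y, Between[x,y,x] is always true.
wraps_whole_ring = all(
    between(x, y, x) for x, y in product(IDS, repeat=2) if x != y
)
```

Both flags evaluate to true for this range. The second property reflects the wraparound case: starting at x and going all the way around the ring back to x passes every other identifier.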
Figure 4: Events of the pure-join model.
state. Fourth, this join is not the cause of any other event. In the valid ring in Figure 1, we see the effects of recent joins of nodes 50 and 53. The fact StabilizeMayChangeSuccessor places all the necessary constraints on stabilize events. The let clause creates a temporary variable newSucc whose value is the stabilizing node’s successor’s predecessor, in the pre-state. First, the stabilizing node must be a member in the pre-state. Second, if there is some new successor (as opposed to an empty set), and if it is between n and its current successor, then the successor of n becomes newSucc in the post-state. (Otherwise n’s successor is unchanged.) Third, regardless of whether there is a successor change or not, the event causes a subsequent notify event at n’s successor, with argument newPrdc set to n. The fact NotifyMayChangePredecessor places all the necessary constraints on notify events. If the notified node has no current predecessor, or if the new predecessor is better than its old predecessor, it adopts the new predecessor. Otherwise its predecessor is unchanged. In
3 When the paper is published, the complete models will be available on the Web.
3.3 The Valid and Ideal predicates
The definition of the network invariant Valid is given in Figure 5. The figure also includes the definition of a property OrderedMerges that is not in the invariant, but will be discussed.
OneOrderedCycle is the most important property in the Chord network invariant. At any time, it is true if and only if the members of the network at that time contain exactly one cycle, and that cycle is ordered by node identifiers. Definition of the predicate begins with a let clause introducing a temporary variable cycleMembers containing all the member nodes that are in some cycle. A node is in a cycle if it can be reached by the irreflexive transitive closure of its own successors. The value of succ.t is the binary successor relation on nodes at time t. The caret denotes its irreflexive transitive closure, also a binary relation on nodes. The value of the relational join n.(^ (succ.t)) is the set of all nodes that can be reached from n via this relation. If n itself is in this set, then n is a member of a cycle.
The remainder of the predicate definition consists of three conjuncts. The first conjunct says that there is at least one cycle member, which means that there is at least one cycle. The second conjunct says that any cycle member is reachable from any other, which means that there is at most one cycle. The third conjunct says that it cannot happen that n2 is the direct successor of n1, and a third cycle member n3 falls between them in the identifier order. This means that the cycle is globally ordered by identifiers.
Property AntecedentPredecessors defines for each member n a set antes containing all the nodes having n as their successor at time t. The property says that n’s predecessor field value, if any, must be in this set. This is necessary because the successor structure is built, through stabilization, from predecessors. If predecessors can have arbitrary values, for example point to nodes that are not members, then the ring can degrade quickly.
Property ConnectedAppendages says that for each node na that is a member but not in the cycle, there is a cycle member nc reachable from it by succession. The node na is said to be in the appendage attached to the cycle at nc. Stabilization among new nodes can cause the formation of chains in appendages, as shown in Figure 1. Then joins at appendage nodes can transform these chains into trees. Property OrderedAppendages says that all paths through these trees are ordered by identifiers.
OrderedAppendages is one of those subtle properties mentioned in Section 1.1.1. Getting a formal definition of it exactly right would usually take some trial-and-error, as there are several pitfalls. The definitions of temporary variables members and cycleMembers are familiar by now. For each cycle member n, the property is concerned with all the members in the appendage attached to the cycle at n. From each such appendage member, n is reachable by succession. However, n is actually reachable by succession from any appendage member, because n is on the cycle. Therefore we are only interested in members from which n can be reached by non-cycle successors. The binary relation of non-cycle successors is appendSucc, defined as succ.t - (cycleMembers -> Node), where the subtrahend is the binary relation containing all node pairs starting with a cycle member. If there are three distinct members a1, a2, a3 taken
   pred Valid [t: Time] {
      OneOrderedCycle[t]
      && AntecedentPredecessors[t]
      && ConnectedAppendages[t]
      && OrderedAppendages[t]
   }
   pred OneOrderedCycle [t: Time] {
      let cycleMembers = { n: Node | n in n.(^(succ.t)) } |
         some cycleMembers                     -- at least one cycle
         && (all disj n1, n2: cycleMembers |
               n1 in n2.(^(succ.t)) )          -- not two
         && (all disj n1, n2, n3: cycleMembers |
               n2 = n1.succ.t => ! Between[n1,n3,n2]
               --- cycle is globally ordered
            )
   }
   pred AntecedentPredecessors [t: Time] {
      let members = { n: Node | some n.succ.t } |
         all n: members |
            let antes = (succ.t).n |
               n.prdc.t in antes
   }
   pred ConnectedAppendages [t: Time] {
      let members = { n: Node | Member[n,t] } |
      let cycleMembers = { n: Node | n in n.(^(succ.t)) } |
         all na: members - cycleMembers |
            some nc: cycleMembers | nc in na.(^(succ.t))
   }
   pred OrderedAppendages [t: Time] {
      let members = { n: Node | Member[n,t] } |
      let cycleMembers = { n: members | n in n.(^(succ.t)) } |
      let appendSucc = succ.t - (cycleMembers -> Node) |
         all n: cycleMembers |
            all disj a1, a2, a3: (members - cycleMembers) + n |
               ( n in a1.(^appendSucc)
                 && a2 = a1.appendSucc
                 && ( a1 in a3.(^appendSucc) || a3 in a2.(^appendSucc) )
               ) => ! Between[a1,a3,a2]
   }
   pred OrderedMerges [t: Time] {
      let cycleMembers = { n: members | n in n.(^(succ.t)) } |
         all disj n1, n2, n3: Node |
            ( n1 in cycleMembers && n3 in cycleMembers
              && n2 !in cycleMembers
              && n3 in n1.succ.t && n3 in n2.succ.t
            ) => Between[n1,n2,n3]
   }
Figure 5: The pure-join invariant Valid.
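To make the reading of two of these predicates concrete, the following Python sketch re-expresses OneOrderedCycle and ConnectedAppendages for a single time step. It is an illustration under stated assumptions rather than a translation of the full model: the successor relation is a dictionary from each member to its successor, integers stand in for ordered identifiers, and the example rings are hypothetical.

```python
def between(n1, n2, n3):
    """Mirror of the Alloy predicate Between, with wraparound."""
    if n1 < n3:
        return n1 < n2 and n2 < n3
    return n1 < n2 or n2 < n3

def reachable(succ, n):
    """Analogue of n.(^succ): nodes reached in one or more steps."""
    seen = set()
    cur = succ.get(n)
    while cur is not None and cur not in seen:
        seen.add(cur)
        cur = succ.get(cur)
    return seen

def cycle_members(succ):
    return {n for n in succ if n in reachable(succ, n)}

def one_ordered_cycle(succ):
    cyc = cycle_members(succ)
    if not cyc:  # at least one cycle
        return False
    if any(n1 not in reachable(succ, n2)
           for n1 in cyc for n2 in cyc if n1 != n2):  # not two cycles
        return False
    # No third cycle member falls between a node and its successor.
    return not any(
        succ[n1] == n2 and between(n1, n3, n2)
        for n1 in cyc for n2 in cyc for n3 in cyc
        if len({n1, n2, n3}) == 3
    )

def connected_appendages(succ):
    cyc = cycle_members(succ)
    return all(reachable(succ, na) & cyc for na in set(succ) - cyc)

# Hypothetical example states.
ideal = {10: 16, 16: 30, 30: 48, 48: 10}
with_appendage = {**ideal, 12: 16}            # 12 joined, not yet absorbed
two_cycles = {10: 16, 16: 10, 30: 48, 48: 30}
misordered = {10: 30, 30: 16, 16: 10}
```

With these definitions, `ideal` and `with_appendage` satisfy both predicates, while `two_cycles` and `misordered` fail `one_ordered_cycle`. AntecedentPredecessors and OrderedAppendages are omitted here for brevity.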
Figure 6: Three stages (left to right) creating a counterexample to OrderedMerges. Solid arrows represent successors, while dotted arrows represent predecessors. [Diagram not reproduced. The rings contain nodes 6, 10, 12, and 16; the stages are labeled “10 and 12 join, 10 stabilizes and notifies 16”, “12 stabilizes and notifies 16”, and “6 stabilizes and notifies 12”.]
from the appendage set plus n, and if a2 is the direct successor of a1, and if a3 is in the same succession path, then a3 must not fall between a1 and a2 in the identifier order. If it did, then the path would not be properly ordered.
The Chord invariant for the pure-join model [9] has aspects that do not appear in the Alloy model, such as identifier distribution and finger validity. The two can be compared, however, on the aspects they have in common. The relevant portions of the Chord invariant are as follows. Bracketed portions are substitutions of terminology.
1. “There is a path using successor lists . . . connecting any two nodes.”
2. “The cycle is [ordered by identifiers].”
3. “For every node v in the appendage [attached to the cycle at u], the path of successors from v to u is increasing.”
4. “For every node v, if v is on the cycle, then v.successor is the first cycle node following v.”
5. “For every node v, if v is in the appendage [attached to the cycle at u], then u is the first cycle node following v.”
identifier wraparound could occur within an appendage. No doubt the Chord designers meant “increasing” in the special ring interpretation, but an implementor could easily miss this point and go wrong. The Chord invariant does not include AntecedentPredecessors, which is necessary in some form or another for a proof of correctness. The fifth Chord property is defined in Figure 5 as OrderedMerges, because it says that an appendage merges with the cycle at the right place in the cycle order. This property is the worst problem here, because it is clearly not true of Chord networks. Although a counterexample can be constructed with only 3 nodes total, Figure 6 shows how the property can be violated at any time, in a network of any size. The discussion of OrderedMerges will continue in Section 4. The definition of the protocol goal Ideal is given in Figure 7. A state is Stable if successors and predecessors are all consistent. The predicate AllCycle says that all members belong to a single cycle. Exploration of instances suggests that AllCycle is implied by Valid and Stable. This is asserted and verified by the Analyzer, so that AllCycle can be left out of Ideal.
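The event sequence named in Figure 6 can be replayed in a few lines. The Python sketch below is a simplified illustration, not the Alloy model: it assumes the starting state is a two-node ring of 6 and 16, runs each notify immediately after the stabilize that causes it (the model allows other interleavings), and uses hypothetical function names.

```python
def between(n1, n2, n3):
    """Mirror of the Alloy predicate Between, with wraparound."""
    if n1 < n3:
        return n1 < n2 and n2 < n3
    return n1 < n2 or n2 < n3

def join(succ, prdc, n, m):
    """Nonmember n joins between member m and m's successor."""
    assert n not in succ and m in succ and between(m, n, succ[m])
    succ[n] = succ[m]
    prdc.pop(n, None)  # a joining node has no predecessor

def notify(succ, prdc, n, p):
    """Node n adopts p as predecessor if it has none or p is better."""
    old = prdc.get(n)
    if old is None or between(old, p, n):
        prdc[n] = p

def stabilize(succ, prdc, n):
    """Stabilize at n, then notify n's (possibly new) successor."""
    new_succ = prdc.get(succ[n])
    if new_succ is not None and between(n, new_succ, succ[n]):
        succ[n] = new_succ
    notify(succ, prdc, succ[n], n)

def cycle_members(succ):
    members = set()
    for n in succ:
        cur, seen = succ[n], set()
        while cur in succ and cur not in seen and cur != n:
            seen.add(cur)
            cur = succ[cur]
        if cur == n:
            members.add(n)
    return members

def ordered_merges(succ):
    cyc = cycle_members(succ)
    return all(
        between(n1, n2, n3)
        for n1 in cyc for n3 in cyc for n2 in set(succ) - cyc
        if n1 != n3 and succ[n1] == n3 and succ[n2] == n3
    )

# Assumed starting state: a ring containing only 6 and 16.
succ = {6: 16, 16: 6}
prdc = {6: 16, 16: 6}
join(succ, prdc, 10, 6)
join(succ, prdc, 12, 6)
merges_before = ordered_merges(succ)
stabilize(succ, prdc, 10)   # 10 stabilizes and notifies 16
stabilize(succ, prdc, 12)   # 12 stabilizes and notifies 16
stabilize(succ, prdc, 6)    # 6 stabilizes and notifies 12
merges_after = ordered_merges(succ)
```

In this replay the cycle ends up as 6, 12, 16, while node 10 still points at 16. The appendage therefore merges at 16 even though 12 is the first cycle node following 10, so `merges_before` is true and `merges_after` is false.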
3.4 Proof for the sequential case
With these preliminaries, it is easy to complete the proof of the following theorem: In any reachable state, if there are no subsequent joins, then eventually the network will become ideal and remain ideal. The formal assertions are shown in Figure 8. The proof has several informal steps as well as formal and automated steps. The first step of the proof is to show that Valid is an invariant. This is an automated step in which the Alloy Analyzer checks the assertions InitialIsValid, JoinPreservesValidity, and StabilizationPreservesValidity. The Initial assertion says that the initial
The first four properties correspond roughly to the combination of OneOrderedCycle, ConnectedAppendages, and OrderedAppendages, with some definitional problems. For one example, the first property does not specify whether successors can be followed forward and backward, or forward only. If the latter is meant then the property is violated by any network with two appendages, because nodes in different appendages cannot reach each other by following successors forward only. For another example, with the usual interpretation of “increasing” the third property is also false, because
network with one node is valid. The Preserve assertions say that if a network is valid before an operation, then it is valid after it. Because these assertions concern zero, one, and two events, respectively, they can be checked exhaustively on traces of these lengths. The check commands give the scope for each check. In this proof the assertions are checked for networks with up to 8 nodes; this number will be discussed below. As a result of this proof step, we conclude that the set of valid states contains the set of reachable states. The second step of the proof shows that any time the network is valid and not ideal, some stabilization that will change the state of the network is enabled. The assertion ValidRingIsImprovable uses two crucial predicates that are not defined here. The predicate StabilizationWillChangeSuccessor with arguments n, newSucc, and t, is true if and only if stabilization, occuring at time t at node n, will change n’s successor to newSucc. The predicate StabilizationShouldChangePredecessor with arguments n, nSucc, and t, is true if and only if stabilization, occuring at time t at node n, should notify nSucc and change its predecessor to n.4 It is easy to make mistakes writing these predicates, so it is important to use the Analyzer to check that if a predicate is true and the stabilization really happens, the result is as predicted by the predicate. The second step is completed automatically with the check of ValidRingIsImprovable. The third step of the proof is an informal argument that a valid ring will eventually become ideal, if there are no further joins to disrupt it. The protocol specification says that enabled stabilizations will continue to occur. It can be argued from the definitions of the operations that they change the state only when the new state is closer to ideal that the old one. Because any ring is finite, after a finite number of improvements it must become ideal. 
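The informal third step can be made concrete with a small simulation. The following is only a sketch in Python, not the paper's Alloy model: the class `PureJoinRing`, the helper `between`, and the join and stabilize rules are assumptions modeled on the standard Chord pseudocode, and the identifiers 0, 3, 5, 7 are arbitrary. It joins three nodes to a one-node ring and then repeats stabilization rounds until a round changes nothing.

```python
def between(a, b, c):
    """True if b lies strictly between a and c, travelling clockwise on the ring."""
    if a < c:
        return a < b < c
    # The interval wraps past zero; when a == c it covers every id except a.
    return b > a or b < c


class PureJoinRing:
    """Toy pure-join network: each member has one successor and an optional predecessor."""

    def __init__(self, first):
        self.succ = {first: first}   # the initial one-node ring points at itself
        self.prdc = {first: None}    # and has no predecessor yet

    def join(self, n):
        # Assumed join rule: adopt the successor of a member m with n between m and m.succ.
        m = next(m for m in sorted(self.succ) if between(m, n, self.succ[m]))
        self.succ[n] = self.succ[m]
        self.prdc[n] = None

    def stabilize(self, n):
        """Stabilize at n, then notify n's successor; return True if the state changed."""
        before = (dict(self.succ), dict(self.prdc))
        p = self.prdc[self.succ[n]]
        if p is not None and between(n, p, self.succ[n]):
            self.succ[n] = p             # a closer successor was found
        s = self.succ[n]
        if self.prdc[s] is None or between(self.prdc[s], n, s):
            self.prdc[s] = n             # n is a better predecessor for s
        return (self.succ, self.prdc) != before

    def stabilize_until_quiet(self):
        """Run rounds of stabilization until a whole round changes nothing."""
        rounds = 0
        # The list (not a generator) makes every member stabilize in every round.
        while any([self.stabilize(n) for n in sorted(self.succ)]):
            rounds += 1
        return rounds

    def is_ideal(self):
        """Successors follow identifier order and predecessors mirror them."""
        ids = sorted(self.succ)
        pairs = zip(ids, ids[1:] + ids[:1])
        return all(self.succ[a] == b and self.prdc[b] == a for a, b in pairs)


ring = PureJoinRing(0)
for new_id in (5, 7, 3):
    ring.join(new_id)
ring.stabilize_until_quiet()
print(ring.is_ideal(), ring.succ)
```

A run like this illustrates the argument for one scenario only; it is the exhaustive Analyzer checks, not simulation, that carry the proof.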
It should be possible to formalize these informal arguments and prove them with a theorem prover, given enough motivation and knowledge of theorem proving.
The fourth step of the proof is an automated check of the assertion IdealRingCannotChange. This shows that once the ring is ideal, no further stabilization operation will change its state.
The fifth and final step is an informal argument that if the formal assertions are true for networks with up to 8 nodes then they are true for networks of any size. On the pragmatic side, our experience is that when analysis begins, small rings exhibit an astonishing variety of anomalous behaviors, resulting in assertions that fail when checked. Sometimes the problem is in the model, and sometimes it is in the assertion; in either case the counterexample given by the Analyzer is equally helpful in fixing it. The natural progression is to analyze rings of size one, fix all the problems so that the assertions are true for that scope, then go on to rings of size two, then three, etc. With this method, one finds that, for the pure-join model, all of the problems manifest themselves in rings of size 4 or smaller. Because rings are symmetric and node interactions are local, it is safe to conclude that no new problems will arise in larger networks, except of course for implementation problems that are not represented in this study at all.
On the theoretical side, there is considerable research proving size cutoffs for analysis of rings and other symmetric structures. For example, Emerson and Namjoshi have proved that it is only necessary to check rings up to size 4 to verify assertions relating pairs of nodes [1]. This cutoff does not apply directly to Chord because its ordering assertions constrain all network nodes, and in fact the full model exhibits new counterexamples at size 5. Nevertheless, it supports the argument that if a network structure has symmetries, analysis of relatively small instances is sufficient.
model        last new counterexample    quick exploration    largest feasible
pure-join    4 (3.0 sec)                6 (6.5 min)          8 (53 hr)
full model   5 (4.4 min)                6 (72 min)           7 (15 hr)
The table above shows ring sizes and analysis times for the hardest assertion in each model, on a 2GHz processor. For each model, the first column is the largest size at which a new counterexample was found. The second column is the largest size at which model exploration is relatively quick and easy. The third column is the largest size that is feasible to analyze at this processor speed.
pred Ideal [t: Time] { Valid[t] && Stable[t] }
pred Stable [t: Time] {
   let members = { n: Node | some n.succ.t } |
      all n1, n2: members |
         n2 = n1.succ.t <=> n1 = n2.prdc.t
}
pred AllCycle [t: Time] {
   let members = { n: Node | some n.succ.t } |
      all n1, n2: members | n2 in n1.(^(succ.t))
}
assert IdealImpliesAllCycle {
   all t: Time | Ideal[t] => AllCycle[t]
}
Figure 7: The pure-join goal Ideal.
assert InitialIsValid {
   let members = { n: Node | Member[n,trace/first] } |
      ( one members &&
        members.succ.trace/first = members &&
        no members.prdc.trace/first
      ) => Valid[trace/first]
}
check InitialIsValid for 8 but 0 Event, 1 Time
assert JoinPreservesValidity {
   (some Join && Valid[trace/first]) => Valid[trace/last]
}
check JoinPreservesValidity for 8 but 1 Event, 2 Time
assert StabilizePreservesValidity {
   (some Stabilize && Valid[trace/first]) =>
      (Valid[trace/first.next] && Valid[trace/last])
}
check StabilizePreservesValidity for 8 but 2 Event, 3 Time
assert ValidRingIsImprovable {
   (Valid[trace/first] && ! Ideal[trace/first]) =>
   ( (some n, newSucc: Node |
         StabilizationWillChangeSuccessor [n,newSucc,trace/first] )
  || (some n, nSucc: Node |
         StabilizationShouldChangePredecessor [n,nSucc,trace/first] )
   )
}
check ValidRingIsImprovable for 8 but 0 Event, 1 Time
assert IdealRingCannotChange {
   Ideal[trace/first] =>
   ( (no n, newSucc: Node |
         StabilizationWillChangeSuccessor [n,newSucc,trace/first] )
  && (no n, nSucc: Node |
         StabilizationShouldChangePredecessor [n,nSucc,trace/first] )
   )
}
check IdealRingCannotChange for 8 but 0 Event, 1 Time
Figure 8: Formal components needed for the proof.
4 The difference between “will” and “should” is discussed in the next section.
3.5 Additional verification of the concurrent case
Up to this point we have been assuming that both operations on the network are atomic. Because stabilization entails two different events at different nodes, this is clearly an oversimplification. With the Alloy model, it is easy to explore the concurrent cases. There are three concurrent cases, which can be diagrammed (with time running vertically) as:
stabilize     stabilize1     stabilize1
join          stabilize2     stabilize2
notify        notify1        notify2
              notify2        notify1
Analysis of the concurrent cases is designed to determine whether the results of concurrent operations are always the same as for atomic ones. In fact, they are not. Sometimes a notify event is supposed to change a predecessor (according to StabilizationShouldChangePredecessor) but does not, because a concurrent operation has already installed a better predecessor at the same node. This is the reason for the use of “should” instead of “will”. Fortunately, the unexpected result is closer to ideal than the expected result, so the overall proof still holds.
3.6 Comparison to the original proof
The original, informal proof of the pure-join case [14] is inferior to the Alloy proof in several ways. This comparison is intended to show how helpful a little formality can be in constructing a convincing argument. Section 3.3 has already mentioned definitional problems with the Chord invariant from [9]. The earlier paper [14] has similar definitional problems.
The biggest problem with the proof in [14] is that it relies on an invariant, but neither states a well-formed invariant nor proves that any property is invariantly true. Section 5.1 says, “The invariant states that once node n can reach node r via successor pointers, it always can.” This is not an invariant (a predicate on state snapshots) but rather a temporal formula whose role in induction is unclear. Later in the paper there is a definition of weakly stable (Stable in Figure 7) and strongly stable (weakly stable and well-ordered). This is followed by the statement, “The [pure-join protocols] maintain strong stability in a strongly stable network.” Is strong stability meant to be the pure-join invariant? But of course neither weak nor strong stability is invariantly maintained; both are properties of networks in the ideal state.
Other steps of the proof have missing justifications (Theorem 5.8) or justifications that are difficult to follow (Lemma 5.5). Both proofs rely on the argument that a finite number of improvements is sufficient to transform any valid network into an ideal one. The Chord proof assumes that stabilization is atomic, without discussion. Therefore it does not consider the concurrent case.
4. ANALYSIS OF A FULL MODEL
4.1 Model construction
The full model adds failures and reconciliation to the pure-join model. Failure recovery requires that each node maintain a list of successors, rather than just one. For simplicity, in the Alloy model each node maintains two successors. With these changes, the node declaration is now:
sig Node {
   succ: Node lone -> Time,
   succ2: Node lone -> Time,
   prdc: Node lone -> Time,
   bestSucc: Node lone -> Time
}
The field bestSucc is defined in the node invariant as the first of the two successors that is live (still a member), if any. For example, if a node n’s current successor (at time t) is not a member (has failed), and if the node’s current succ2 is a member, then n.bestSucc.t = n.succ2.t. This field is redundant because it can be derived from the successor fields, but it is convenient to have it in the state because almost all properties must be defined in terms of it. In a real Chord implementation, each node keeps a successor list long enough so that the probability that all its successors will fail is extremely small. For purposes of modeling in Alloy, this probabilistic guarantee must be transformed into a deterministic guarantee. To achieve this, there is a condition on fail events that a failure cannot occur if it would leave some node with no live successors. As a result of these modeling decisions, the relation bestSucc has the same critical property that succ does: each member node has exactly one successor at all times. Because bestSucc has this critical structural property, it can be substituted for succ in property definitions from the pure-join model. For example, in the full model OneOrderedCycle is:
pred OneOrderedCycle [t: Time] {
   let cycleMembers = { n: Node | n in n.(^(bestSucc.t)) } |
      some cycleMembers &&
      (all disj n1, n2: cycleMembers | n1 in n2.(^(bestSucc.t)) ) &&
      (all disj n1, n2, n3: cycleMembers |
         n2 = n1.bestSucc.t => ! Between[n1,n3,n2] )
}
OneOrderedCycle must be changed from its original definition in Figure 2, because the original is not invariantly true in a network with failures. This method of writing assertions about networks with redundant state structures might be quite helpful in analyzing a variety of reliability protocols.
In addition to join, stabilize, and notify events, the full model has fail, flush, update, and reconcile events. The last three correspond in name and function to the pseudocode for handling failures in [9], and fit collectively under the name of reconciliation in this paper. Flush events remove failed predecessor pointers, update events remove failed successor pointers (replacing them with succ2 pointers), and reconcile events improve succ2 pointers (if possible) by replacing them with the current successor’s successor. As with stabilizations, a member can schedule reconciliation events at any time.
4.2 Easily-fixed bugs
Figure 9 shows what can happen if a stabilizing node overwrites its current successor with a new pointer obtained from its current successor, and the new pointer points to a node that has already failed. The stabilization removes a good link and replaces it with a bad one. If this occurs when the stabilizing node does not have a redundant successor, then the cycle is lost and cannot be recovered by stabilization or reconciliation.
[Figure 9 diagram: three snapshots of a ring with nodes 6, 10, and 12; the transitions are labeled “fails” and “6 stabilizes”.]
Figure 9: Three stages (upper left to lower right) creating a counterexample to correctness of the Chord protocol.
In the same general vein, a joining node gets a successor from a member node with the same successor. What if this successor has failed, and the member doesn’t know it yet? Then the joining node thinks it is connected to the network but is not, because its only pointer is to a failed node.
The Chord papers present the protocol in very compact pseudocode that is vulnerable in these situations. The problems are easily fixed with extra checks in the operations. It may be that most potential implementors already know that these extra checks are necessary, but it is much more likely that most do not. For example, the checks are not present in the OverLog implementation of Chord [10].
The Alloy specification of the Chord operations is a bit longer than the pseudocode, but not by much, because like the pseudocode it is very abstract. Its form accommodates extra checks and error cases easily, and counterexamples from analysis show why the extra checks are necessary. It demonstrates that better documentation of routing protocols is valuable and easily within reach.
4.3 Problems of small rings
Each implementation of Chord chooses a length r for the successor lists stored in nodes. With respect to a particular value of r, we can consider a network to be small if it has r or fewer cycle members. Small rings display a wide variety of pathologies, capable of violating almost any invariant property. There are two reasons for these problems, both originating in the fact that successor lists can wrap around the entire cycle. First, successor lists can contain less information than expected, because the same node appears more than once, or a node appears in its own successor list (except for the case of a ring with one member, which must have itself as successor). Second, proper ordering relationships can become confused. For example, in the Alloy model with r = 2, OrderedAppendages can be violated in a network with cycle size 2 and 5 members. In the Alloy full model, most of these problems are avoided by constraining successor lists to have no redundant or wraparound entries. This means that small rings have less redundancy, and are therefore less robust. It also means that a valid small ring might not become ideal by stabilization and reconciliation.
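The wraparound effect is easy to see by writing out successor lists for a few rings. The sketch below is illustrative Python, not part of the Alloy model; the function names and the sample identifiers are assumptions. It builds the list of the r members following a node in identifier order and reports whether that list repeats a node or contains the node itself.

```python
def successor_list(ring_ids, node, r):
    """The r cycle members that follow `node` in identifier order, wrapping around."""
    ring = sorted(ring_ids)
    i = ring.index(node)
    return [ring[(i + k) % len(ring)] for k in range(1, r + 1)]


def has_wraparound(ring_ids, node, r):
    """True if the list repeats an entry or contains the node itself."""
    entries = successor_list(ring_ids, node, r)
    return node in entries or len(set(entries)) < len(entries)


def is_small(ring_ids, r):
    """A ring is small, in the sense used above, if it has r or fewer members."""
    return len(ring_ids) <= r


for ring_ids in ([10], [10, 20], [10, 20, 30], [10, 20, 30, 40]):
    node = ring_ids[0]
    print(ring_ids, successor_list(ring_ids, node, 2), has_wraparound(ring_ids, node, 2))
```

With full-length lists, wraparound occurs exactly when the ring is small, which is why the full model constrains successor lists instead of letting them wrap.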
4.4 Invariant design for the full model
Having disposed of easier problems, it is now time to consider what an invariant for the full model might be. The Chord invariant, in the same form used in Section 3.3, is [9]:
1. “The network is connected.”
2. “The cycle is [ordered by identifiers].”
3. “For every node v in the appendage [attached to the cycle at u], the path of successors from v to u is increasing.”
4. “If v is on the cycle, then v.successor is the first live cycle node following v.”
5. “If v is in the appendage [attached to the cycle at u], then u is the first live cycle node following v.”
6. “If the successor list of u.successor skips over a live node v, then v is not in [the successor list of u].”
Due to confusion born of too little formality, Property 4 is meaningless; it uses the phrase “first live cycle node” when it is impossible to have a dead cycle node (a dead node has no successor, so it cannot be part of any cycle). The correct phrasing would be, “If v is on the cycle, then the first live node in v’s successor list is the first cycle node following v.” Properties 1, 2, and 4 are roughly equivalent to the conjunction of OneOrderedCycle and ConnectedAppendages. Property 3 is roughly equivalent to OrderedAppendages, and Property 5 to OrderedMerges. All five properties are wrong in detail, because they do not distinguish between succ and bestSucc. Property 6 is discussed below.
When designing an invariant for Chord or similar protocols, there are three reasons why a property might appear in it. First, the property might be necessary to make a valid network improvable, i.e., well-structured enough so that it could approach or become ideal. Second, the property might be necessary or desirable for normal operating conditions. Third, the property might be neither of the above, but necessary to preserve another invariant property that is in one of the first two categories.
Exploration of the model with respect to the first goal is done by checking an updated version of the assertion ValidRingIsImprovable. Here the news is good. Stabilization and reconciliation are so powerful in healing disruptions of the structure that only OneOrderedCycle and ConnectedAppendages are necessary to make all valid networks improvable.
Next we consider invariant properties with respect to the second goal, which is desirable normal operating conditions. Both OrderedMerges and OrderedAppendages are desirable for fast, reliable lookups. Property 6 of the Chord invariant is mysterious at first glance. What does it mean? Why is this property important? With bound variable w playing the role of u’s successor, it is formalized in Alloy as:
pred ValidSuccessorList [t: Time] {
   let members = { n: Node | Member[n,t] } |
      all w: members | let antes = (succ.t).w |
         all v: members |
            -- if w’s successors skip over a live node v
            ( Member[w.succ2.t,t] &&
              Between[w.succ.t,v,w.succ2.t] )
            -- then v is not in the successor list of any w antecedent
            => v !in antes.succ2.t
}
The Alloy version takes into account the fact that w might have more than one antecedent. The purpose of ValidSuccessorList can be seen in Figure 10. The third snapshot violates ValidSuccessorList, and the subsequent failure of node 17 causes what we see in the fourth snapshot. Now the former cycle node 20 has become an appendage node, unreachable from all cycle nodes until it becomes part of the cycle again. This is an undesirable degradation in service.
[Figure 10 diagram: four snapshots of a network with nodes 9, 13, 17, 20, and 25; the transitions are labeled “9 stabilizes”, “20 joins, 17 stabilizes, 9 reconciles”, and “17 fails, 13 updates”.]
Figure 10: Three stages (left to right) creating a counterexample to ValidSuccessorList, and its effect (fourth stage) after a failure. Solid arrows represent primary successors, while dashed arrows represent secondary successors.
Finally we consider potential invariants that might help preserve the properties already defined as goals. The model defines four of these. AntecedentPredecessors comes from the pure-join model. DistinctSuccessors says that a node’s succ2 is neither the node’s successor nor the node itself. OrderedSuccessors requires Between[n,n.succ.t,n.succ2.t] for each node n and time t. ReachableSuccessor2 is a complex property saying (roughly) that a node’s succ2 is either reachable from the node, or is well-formed like the succ2 in the last snapshot of Figure 10 (see footnote 5).
5 The complexity lies in defining “well-formed like.”
4.5 Attempts at verification
Analogously to the theorem for the pure-join model, we would like to prove the theorem that if a Chord network ceases to experience new joins and failures, and is given enough time for stabilization and reconciliation, then it will eventually become ideal. As discussed above, any invariant containing OneOrderedCycle and ConnectedAppendages is strong enough to ensure that stabilization and reconciliation can idealize the network. Despite extensive exploration of the model, there was no success in finding an invariant that is strong enough and preserved by all operations of the full model.
Searching for an elusive invariant is hard work. If an operation begins with a valid structure and makes it invalid, the invariant (definition of validity) might be too weak or too strong. If it is too weak, then the structure preceding the operation is ill-formed and could never have arisen in practice. If it is too strong, then the structure following the operation is fine and should not be declared invalid. The trouble is that any change in the definition of validity, made to fix a problem with one operation, typically causes a new problem with another operation.
Fortunately, there is no need to continue the search. Analysis has yielded counterexamples demonstrating that there can be no such invariant, because the theorem is false. There is actually a class of counterexamples, one for each cycle size above 2. The counterexample for cycle size 3 is shown in Figure 11. In the multi-event transition from the first snapshot in Figure 11 to the second, three new members join the cycle. In the multi-event transition from the second snapshot to the third, all three new members fail, and all three old members update to promote their second successors. The result is a cycle that is not ordered by identifiers. In this general class of counterexample, any cycle of odd size will become disordered, and any cycle of even size will break into two unconnected networks. It is well-known from the original Chord papers that the protocol cannot recover from either of these catastrophes.
[Figure 11 diagram: three snapshots labeled “an ideal state”, “three nodes join”, and “new nodes fail, old nodes update”, with old nodes 0, 18, and 40 and new nodes 5, 21, and 49.]
Figure 11: Three stages (left to right) creating a counterexample to correctness of the Chord protocol.
Concerning the six properties claimed invariant for the full protocol, this class of counterexample shows that 1, 2, and 4 are not invariant. Figure 6 shows that Property 5 is not invariant, and Figure 10 shows that Property 6 is not invariant. These problems remain after the fixes in Sections 4.2 and 4.3 have been made. The counterexample to Property 3 is mentioned in Section 4.3; this property may be invariant once the problems of small rings are fixed.
There has been a great deal of work on probabilistic analyses that relate rates of change in Chord networks to their states and performance [8, 9, 13, 14, 12]. Whether it is ever stated or not, all of this work assumes that the protocol maintains some invariant structure, and the soundness of the analysis depends on knowing what that invariant structure is. Figure 6 contains a small example of why some performance analyses may be suspect. The analysis in [8] assumes that every stabilization operation that changes a successor reduces the overall number of wrong successors by one. This is over-optimistic: the stabilization by node 6 in Figure 6 is one of several categories that changes a successor but does not reduce the overall number of wrong successors.
4.6 Analysis of Chord with model checking
Techniques based on model checking can be applied to the actual implementations of network protocols. Because implementations are so much more complex than abstract models, analysis is necessarily incomplete, and the techniques find bugs rather than attempt or approximate verification. Three papers on such techniques use Chord as a case study [7, 16, 17]. Although all found bugs in Chord implementations, they did not find any of the problems described here.
Furthermore, Killian et al.’s work on Chord led them to state and enforce the invariant that “a node’s predecessor is itself if and only if its successor is itself” [7]. However, this property is not an invariant of the version of the Chord protocol modeled here, which is the version they claim to have implemented. In fact it would prevent a one-node ring from ever becoming a two-node ring.
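The odd and even outcomes of the counterexample class in Section 4.5 can be reproduced with a few lines of Python. This is a sketch of the end state only, under an assumption read from the description of Figure 11: when the new members fail, each old member's second successor still points to the old member two positions ahead, and the update event promotes it. The function names are hypothetical.

```python
def promote_second_successors(ring_ids):
    """Successor map after every old member promotes its (stale) second successor."""
    ring = sorted(ring_ids)
    k = len(ring)
    # Assumption: succ2 of the i-th old member is the old member two places ahead.
    return {ring[i]: ring[(i + 2) % k] for i in range(k)}


def cycles(succ):
    """Decompose a successor map that is a permutation into its cycles."""
    seen, result = set(), []
    for start in sorted(succ):
        if start in seen:
            continue
        cycle, cur = [], start
        while cur not in seen:
            seen.add(cur)
            cycle.append(cur)
            cur = succ[cur]
        result.append(cycle)
    return result


def summarize(ring_ids):
    """Return (number of cycles, whether every cycle is ordered by identifiers)."""
    found = cycles(promote_second_successors(ring_ids))
    # Each cycle is listed from its smallest member, so ordered means ascending.
    return len(found), all(c == sorted(c) for c in found)


for ids in ([0, 18, 40], [0, 10, 20, 30], [0, 10, 20, 30, 40]):
    print(ids, summarize(ids))
```

For the ring 0, 18, 40 of Figure 11 the single resulting cycle visits 0, 40, 18, which is not in identifier order; for a ring of four the members split into two separate two-node cycles.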
5. DISCUSSION
As everyone knows, asynchronous concurrency is the principal reason why it is so difficult to design and implement a reliable network protocol. On the concurrency spectrum, the Chord pseudocode and the models presented here occupy the middle ground. For an ideal ring to recover completely after a single join would require two stabilize events and two notify events. For an ideal ring to recover completely after a single failure would require one update event, two reconcile events, one flush event, one stabilize event, and one notify event. Some of these events have to occur in the correct order, as Figure 9 shows. At this level of concurrency, simpler protocols than Chord could probably be proven correct. At this level of concurrency, Chord can be shown to perform efficiently. The trouble with Chord is that it cannot be shown to perform correctly, and there is no obvious way to fix its flaws, or even to determine how frequently they occur in real networks.
At one extreme of the concurrency spectrum, we could regard a join and all subsequent recovery as one atomic transaction, and regard detection of a failure plus all subsequent recovery as one atomic transaction. It would be easy to prove the protocol correct, and it would be very inefficient. This extreme approach would be a valuable beginning, however, if we knew how to refine the protocol successively, with each refinement making it more efficient without sacrificing correctness. Protocol design by refinement would be the best method, if it were mature enough for big challenges. Collecting and expanding the repertoire of design refinements would be an important contribution to research in distributed computing. Already some implementations of Chord have tighter coupling among nodes than the original protocol, to solve practical problems such as non-transitive connections [2]. This is evidence that an implementation can be somewhat less concurrent than the original protocol and still be practical.
In the worst case, design by refinement would get stuck at a stage where there was no choice but to have an inefficient protocol or lose absolute correctness. Even after losing absolute correctness, however, the protocol might have fewer problems and be better understood than Chord. It might be possible to reason that it is correct with high probability. Like Chord, its goals would be stated explicitly. Unlike Chord, there would be no need to search for an invariant that does not exist.
At the other extreme of the concurrency spectrum, there are implementations. In a real implementation some of the events in the Alloy model would require many events in which messages are sent and received or timed out. It is clear that this additional concurrency can only make an implementation more difficult to understand than an abstract model. This is why it makes sense to understand an abstract model before tackling the complexities of a real implementation (see footnote 6).
6 The work on MultiChord [11] is doubly impressive because the protocol is modeled at the implementation level, and the verification is not even partially automated.
An “almost-invariant” as used by Yabandeh et al. [16] is a property of a distributed system that is true unless there is an implementation bug or a design flaw with a low probability of occurrence. The purpose of the Avenger tool is to infer “almost-invariants” from implementations of distributed systems. The inferred properties are then used to monitor live executions or implementation-level model checks to find bugs [16]. There is no doubt that it is important to have invariants or almost-invariants to use for model checking or monitoring of live executions. However, the properties presented in [9] and here are very good candidates for almost-invariants of Chord, and far more powerful than anything inferred by applying Avenger to Chord. It may be that time exploring abstract models of a protocol would be better spent than time applying a tool such as Avenger.
This study is an excellent demonstration of why model checking and model enumeration are complementary. A model checker constructs an efficient internal representation of the entire reachable state space of a system. It is not necessary to provide an explicit invariant, because it is implicit in the set of reachable states. However, an explicit invariant greatly enhances the power of model checking by giving the model checker something to check for. The Alloy Analyzer cannot analyze long traces, because it is not optimized for them. However, if there is an explicit invariant, verification requires short traces only.
Explicit invariants contribute greatly to our ability to understand protocols, design them well, and debug their implementations. The expressiveness of the Alloy language contributes greatly to exploring possible invariants. It would not have been possible to do this work with a model checker such as Spin, because of the difficulty of writing and modifying checkable invariants. The property language of the Mace model checker appears to be considerably better than Spin’s, but still falls far short of the expressiveness of Alloy.
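The remark that an explicit invariant needs only short traces can be illustrated on a toy transition system. The sketch below is Python rather than Alloy, and the system (a counter that advances by 2 modulo 8) and all names are assumptions chosen for illustration. It checks a candidate invariant on initial states and across single steps from every state in scope, reachable or not, which is the shape of the Initial and Preserve assertions used in Section 3.4.

```python
def check_inductive(states, is_initial, step, invariant):
    """Return None if the invariant is inductive, else a (kind, state, next_state) witness."""
    for s in states:
        if is_initial(s) and not invariant(s):
            return ("initial", s, None)
        if invariant(s):
            for nxt in step(s):
                if not invariant(nxt):
                    return ("step", s, nxt)
    return None


# Toy system: a counter that starts at 0 and advances by 2 modulo 8.
states = range(8)
is_initial = lambda s: s == 0
step = lambda s: [(s + 2) % 8]

# "The counter is even" holds initially and is preserved by every step.
print(check_inductive(states, is_initial, step, lambda s: s % 2 == 0))

# "The counter is not 3" is true of every reachable state but is too weak:
# the unreachable state 1 satisfies it and steps to 3.
print(check_inductive(states, is_initial, step, lambda s: s != 3))
```

The second candidate fails in the way Section 4.5 describes for an invariant that is too weak: the state before the step could never have arisen, yet the one-step check cannot know that.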
6. CONCLUSION
This paper demonstrates “lightweight” modeling and verification of a network protocol. Lightweight models are simple and abstract. Lightweight verification is automated and requires minimal knowledge of formal methods. A study of Chord using a lightweight method led to a long list of new findings about its behavior and correctness, despite the fact that Chord was well-known and well-studied before this work began. This justifies the claim that the use of lightweight methods should be considered a normal and even indispensable part of network protocol design.
Both model checking and model enumeration can be lightweight methods (model checking can be applied to abstract models as well as implementations). This paper shows that they are complementary. Model checking is a popular technique for distributed systems, but model enumeration, although almost completely unknown in this domain, can also be extremely useful.
Lightweight methods are particularly successful on ring protocols because of the obvious symmetries among nodes, which guarantee that small networks exhibit all the behaviors of large ones. Other protocols may have fewer symmetries, which means that small networks may not exhibit all the interesting behaviors. Even though lightweight methods may not succeed as completely on all network protocols, they are likely to provide some benefits in every case.
7. REFERENCES
[1] E. A. Emerson and K. S. Namjoshi. Reasoning about rings. In Proceedings of the Symposium on Principles of Programming Languages, pages 85–94. ACM, 1995.
[2] M. J. Freedman, K. Lakshminarayanan, S. Rhea, and I. Stoica. Non-transitive connectivity and DHTs. In Proceedings of the Second Conference on Real, Large, Distributed Systems, pages 55–60. USENIX, 2005.
[3] K. Bhargavan, D. Obradovic, and C. A. Gunter. Formal verification of standards for distance vector routing protocols. Journal of the ACM, 49(4):538–576, 2002.
[4] G. J. Holzmann. The Spin Model Checker: Primer and Reference Manual. Addison-Wesley, 2004.
[5] D. Jackson. Software Abstractions: Logic, Language, and Analysis. MIT Press, 2006.
[6] D. Jackson, Y. Ng, and J. Wing. A Nitpick analysis of Mobile IPv6. Formal Aspects of Computing, 11(6):591–615, 1999.
[7] C. Killian, J. A. Anderson, R. Jhala, and A. Vahdat. Life, death, and the critical transition: Finding liveness bugs in systems code. In Proceedings of the Fourth USENIX Symposium on Networked System Design and Implementation, pages 243–256, 2007.
[8] S. Krishnamurthy, S. El-Ansary, E. Aurell, and S. Haridi. A statistical theory of Chord under churn. In Peer-to-Peer Systems IV. Springer-Verlag LNCS 3640, 2005.
[9] D. Liben-Nowell, H. Balakrishnan, and D. Karger. Analysis of the evolution of peer-to-peer systems. In Proceedings of the 21st ACM Symposium on Principles of Distributed Computing, pages 233–242. ACM, 2002.
[10] B. T. Loo, T. Condie, J. M. Hellerstein, P. Maniatis, T. Roscoe, and I. Stoica. Implementing declarative overlays. In Proceedings of the 20th ACM Symposium on Operating System Principles, pages 75–90. ACM, 2005.
[11] N. Lynch and I. Stoica. Multichord: A resilient namespace management protocol. MIT CSAIL Technical Report 2004-007, February 2004.
[12] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proceedings of SIGCOMM. ACM, August 2001.
[13] I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup protocol for Internet applications. IEEE/ACM Transactions on Networking, 11(1), February 2003.
[14] I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. MIT LCS Technical Report 819, http://www.pdos.lcs.mit.edu/chord/papers/chord-tn, 2001.
[15] A. Wang, P. Basu, B. T. Loo, and O. Sokolsky. Declarative network verification. In Proc. 11th Intl. Symp. on Practical Aspects of Declarative Languages, January 2009.
[16] M. Yabandeh, A. Anand, M. Canini, and D. Kostić. Almost-invariants: From bugs in distributed systems to invariants. Technical report, EPFL NSL-REPORT-2009-007, 2009.
[17] M. Yabandeh, N. Knežević, D. Kostić, and V. Kuncak. CrystalBall: Predicting and preventing inconsistencies in deployed distributed systems. In Proceedings of the Sixth USENIX Symposium on Networked Systems Design and Implementation. USENIX, April 2009.