Roy Friedman, Achour Mostefaoui, Michel Raynal. Theme 1 - Networks and Systems (Thème 1 - Réseaux et systèmes). Projet Adept. Publication interne n°1598, January 2004, 12 pages.
PUBLICATION INTERNE No 1598
INTERSECTING SETS: A BASIC ABSTRACTION FOR ASYNCHRONOUS AGREEMENT PROBLEMS
ISSN 1166-8687
ROY FRIEDMAN, ACHOUR MOSTEFAOUI, MICHEL RAYNAL
IRISA CAMPUS UNIVERSITAIRE DE BEAULIEU - 35042 RENNES CEDEX - FRANCE
http://www.irisa.fr
Intersecting Sets: a Basic Abstraction for Asynchronous Agreement Problems
Roy Friedman, Achour Mostefaoui, Michel Raynal
Theme 1 - Networks and Systems, Projet Adept
Publication interne n°1598, January 2004, 12 pages
Abstract: Defining good abstractions is a central issue when one wants to understand the deep structure and basic principles that underlie computing mechanisms. This paper introduces a basic and particularly simple distributed computing abstraction suited to asynchronous distributed agreement problems. This abstraction, called Intersecting Sets, requires each process to deposit a value and allows each non-faulty process to obtain a subset of these values such that any two such sets have a non-empty intersection. This simple abstraction captures an essential part of distributed agreement problems. After having introduced and motivated this abstraction, the paper investigates its properties, its power and its benefit when solving distributed agreement problems.
Key-words: Asynchronous system, Computing abstraction, Consensus, Distributed algorithm, Failure detector, Message passing, Non-blocking atomic commit.
email: {rfriedma,mostefaoui,raynal}@irisa.fr
Centre National de la Recherche Scientifique (UMR 6074) Université de Rennes 1 – Insa de Rennes
Institut National de Recherche en Informatique et en Automatique – unité de recherche de Rennes
A building block for asynchronous agreement problems: intersecting sets
Summary (translated from the French résumé): To better understand and grasp the foundations and principles of distributed executions, it is worthwhile to define the right abstractions. This paper presents an abstraction called Intersecting Sets. Every process deposits its value, and every process that obtains a subset of the values sees this subset intersect every subset obtained by any other process. This abstraction, simple to state, is essential to the design of distributed agreement algorithms.
Keywords: Abstraction, paradigm, agreement problem, asynchronous system, consensus, message passing, atomic commit, process failure.
1 Introduction
On abstractions in fault-tolerant distributed systems

“Fundamentally, computer science is a science of abstraction - creating the right model for a problem and devising the appropriate mechanizable techniques to solve it” [2]. To attain this goal, computer scientists have to design computation models and paradigms, i.e., “abstractions” with a well-defined behavior (described by a specification) that hide the details not relevant to the problems to be solved. Defining good computation models, good paradigms and good abstractions remains a real challenge [13]. A paradigm is an abstraction that underlies a variety of computing problems. Hence, a main part of our job is to formulate a problem in terms of paradigms, and to use known paradigms whenever possible (we do not have to reinvent the wheel each time we need it! [14]).

A common example of the use of paradigms in fault-tolerant asynchronous distributed computing is the way failure detection is handled. One of the best known and most attractive of these paradigms is the failure detector concept introduced by Chandra and Toueg [4]. It consists in equipping the underlying asynchronous system with devices providing the processes with (possibly unreliable) information on failures. A major advance of this approach lies in the fact that a failure detector is defined by a set of abstract properties (related to failure detection) and not in terms of a particular implementation (involving network topology, message delays, local clocks, etc.). The failure detector approach allows a modular decomposition that not only simplifies protocol design but also provides very general solutions. Moreover, from a theoretical point of view, its greatest advantage lies in the fact that it allows stating the minimal assumptions (with respect to failure detection) that are required to solve a problem (e.g., see [5] for the consensus problem, and [1] for the uniform reliable broadcast problem).
We call these abstractions “failure detection abstractions”. They are characterized by the fact that their outputs depend only on the failure pattern (they do not involve process inputs).

Another type of paradigm often encountered is related to the notion of abstraction layer. As a simple example, let us consider the Reliable Broadcast problem, which requires that (at least) the non-faulty processes deliver the same set of messages despite process failures. This problem can be solved without “additional computational power” (such as the one provided by a failure detector) by using the send and receive primitives offered by the bare underlying system. Nevertheless, it is usual to provide the upper layer with an “abstract” reliable broadcast primitive while hiding the way it is implemented. More sophisticated examples are the use of a consensus abstraction to build atomic broadcast [4], or the use of a lattice agreement abstraction to build a snapshot primitive [3]. Another noteworthy example is the wave abstraction used to solve distributed observation problems (such as termination detection, deadlock detection, or stable properties detection) in asynchronous failure-free systems [12, 15] (see footnote 1). We call these abstractions “distributed computing abstractions”. They are characterized by the fact that the specification of their outputs always involves values provided by the processes.

Content of the paper

This paper introduces and investigates a new fault-tolerant distributed computing abstraction whose aim is to ease the understanding and the design of asynchronous distributed agreement protocols. This abstraction, which we call IS (Intersecting Sets), allows each process to deposit (propose) a value and obtain (decide) a set of values such that any two such sets have a non-empty intersection despite any number of process failures.

Footnote 1: In fact, the whole notion of middleware is centered around providing abstractions that simplify application writing. In middleware terminology, abstractions are typically encapsulated by entities like object brokers, services, and group communication toolkits.
This is a very simple abstraction. In particular, it can be easily implemented without additional assumptions in a pure asynchronous distributed system prone to process crashes when there is a majority of non-faulty processes. Nevertheless, it is very powerful and useful. Actually, it captures the very essence of distributed agreement problems as, in some sense, it factorizes their common part. Intuitively, it states that, for deciding, two processes have to share something.

Promoting IS as a first class abstraction has many advantages. First, as it is at the core of non-deterministic one-shot agreement problems, it allows us to get a better understanding of what unifies and what distinguishes these problems. Then, as it hides implementation details, it allows a relatively easy statement of some problem equivalences in failure detector-augmented asynchronous systems. It also helps clarify the separation between the abstractions that are based on failure detection and the abstractions that are not, thereby providing further insights into the fine structure of one-shot agreement problems.

Roadmap

The paper is made up of six sections. Section 2 presents the computation model and a few definitions. Section 3 defines the intersecting sets problem, and illustrates its benefits by presenting an IS-based consensus protocol. Then, Section 4 addresses the implementation of intersecting sets. A t-resilient (t denoting the maximum number of processes that may crash), failure-detector-based IS protocol is presented, that is shown to be optimal with respect to t, i.e., it works for any t < n, where n is the total number of processes. Then, Section 5 enlarges the picture by providing other distributed computing abstractions suited to agreement problems. Finally, Section 6 concludes the paper.
2 Computation Model and Definitions
2.1 Asynchronous Distributed Systems with Process Crashes
The computation model follows the one described in [4, 8]. The system consists of a finite set Π of n > 1 processes, namely, Π = {p_1, ..., p_n}. A process can fail by crashing, i.e., by prematurely halting. At most t < n processes can fail by crashing. A process behaves correctly (i.e., according to its specification) until it (possibly) crashes. By definition, a correct process is a process that does not crash. A faulty process is one that is not correct. Until it (possibly) crashes, a process is alive.

Processes communicate and synchronize by sending and receiving messages through channels. Every pair of processes is connected by a channel. Channels are assumed to be reliable. There is no assumption about the relative speed of processes nor on message transfer delays: the system is asynchronous.

The primitive broadcast(m) is used as a shortcut for the statement “for each j ∈ [1..n] do send(m) to p_j end do”. This means that, while send() is an atomic operation, broadcast() is not: if p_i crashes while it is executing broadcast(m), only a subset of the processes may receive m.
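The non-atomicity of broadcast() can be illustrated by a minimal Python sketch. The function below, its in-memory inboxes and the crash_after parameter are illustrative assumptions (they are not part of the model of [4, 8]); they only mimic a sender that crashes between two send() operations.

```python
from typing import Dict, List, Optional


def broadcast(m: str, n: int, inboxes: Dict[int, List[str]],
              crash_after: Optional[int] = None) -> None:
    """Send m to p_1..p_n one at a time; stop early if the sender crashes.

    crash_after is a hypothetical knob: the number of send() operations
    completed before the simulated crash (None: the sender does not crash).
    """
    for sent, j in enumerate(range(1, n + 1)):
        if crash_after is not None and sent >= crash_after:
            return  # the sender crashed in the middle of the broadcast
        inboxes[j].append(m)  # send(m) to p_j: atomic, over a reliable channel


inboxes: Dict[int, List[str]] = {j: [] for j in range(1, 5)}
broadcast("m", n=4, inboxes=inboxes, crash_after=2)
receivers = [j for j, box in inboxes.items() if box]
print(receivers)  # only a subset of the processes received m
```

In this run the sender crashes after two sends, so p_3 and p_4 never receive m, which is exactly the situation the protocols of the following sections have to tolerate.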
2.2 The Failure Detector Class ◇S
Each process p_i is provided with a set suspected_i that contains processes suspected to have crashed. If p_j ∈ suspected_i, we say that “p_i suspects p_j”. The class ◇S includes all the failure detectors that satisfy the following properties [4]:
• Strong Completeness: Eventually, every process that crashes is permanently suspected by every correct process.
• Eventual Weak Accuracy: There is a time after which some correct process is never suspected by any correct process.
When there is no ambiguity, the notation ◇S is used to denote either the class of failure detectors it includes, or any of these failure detectors. Naturally, ◇S is a failure detection abstraction.
2.3 The Consensus Problem
As already indicated in the Introduction, in the consensus problem, every correct process p_i proposes a value v_i and all correct processes have to decide on the same value v, which has to be one of the proposed values. More precisely, the consensus problem (denoted CONS) is defined by two safety properties (Validity and Agreement) and a Termination property [4, 8]:
• Validity: If a process decides v, then v was proposed by some process.
• Agreement: No two processes decide differently (see footnote 2).
• Termination: Every correct process eventually decides on some value.
Let us notice that CONS is a distributed computing abstraction.
2.4 Notation for System Models
In the following, AS_{n,t}[D_1, ...][A_1, ...] denotes an asynchronous distributed system made up of n processes (where up to t may crash), satisfying the additional assumptions provided by the abstractions D_1, ..., and A_1, .... As we can see, this is a two-dimensional notation. The first dimension [D_1, ...] is related to the computational power added by failure detection: each D_x is a failure detection abstraction providing information on failures that can be used by the upper layer processes. The second dimension [A_1, ...] concerns the distributed computing abstractions the system is supplied with. It aims at making application design easier by encapsulating basic design paradigms.

As a few examples, AS_{n,t}[∅][CONS] denotes an asynchronous distributed system made up of n processes among which up to t < n can crash, equipped with a distributed computing abstraction, namely, a consensus black box. Although it provides a consensus black box, this system provides no information on failures. More generally, whatever the distributed computing abstraction X, any system AS_{n,t}[∅][X] provides no information on failure detection. Differently, the system AS_{n,t}[◇S][CONS] provides the processes with a consensus black box, plus failure information whose quality is the one provided by the ◇S properties. AS_{n,0}[∅][∅] represents a reliable asynchronous distributed system.

To summarize, in the AS_{n,t}[][] notation, the first dimension is related to the computational power provided by the addition of failure detection mechanisms, while the second dimension is related to the capture of basic programming abstractions, i.e., to the programming comfort.
3 The Intersecting Set Problem
3.1 Definition
The Intersecting Set problem (in short IS) is a distributed computing abstraction defined as follows. Each process p_i deposits (proposes) a value v_i and obtains (decides) a set of values V_i such that the following properties are satisfied:
• Termination: Each correct process decides.
• Agreement: Let V_i (resp., V_j) be the set decided by p_i (resp., p_j). We have V_i ∩ V_j ≠ ∅.
• Validity: If v ∈ V (where V is the set decided by a process), then v is a proposed value.
In the following, we consider that a process p_i invokes the primitive Inter_sets(v_i) to propose v_i. When it terminates, that primitive returns to p_i a set V_i of values.

Footnote 2 (on the Agreement property of consensus, Section 2.3): This property is sometimes called Uniform Agreement as it requires that a faulty process that decides, decides as a correct process.
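The two safety properties of IS can be checked mechanically on a finished execution. The following Python sketch is only an illustration: the function name check_is and the dictionary encoding of proposals and decisions are assumptions, and Termination (a liveness property) is deliberately not covered. Agreement is checked on pairs of distinct processes.

```python
from itertools import combinations
from typing import Dict, Hashable, Set


def check_is(proposed: Dict[int, Hashable],
             decided: Dict[int, Set[Hashable]]) -> bool:
    """Check IS Validity and Agreement on the outcome of one execution."""
    values = set(proposed.values())
    # Validity: every decided value is a proposed value.
    validity = all(V <= values for V in decided.values())
    # Agreement: any two decided sets have a non-empty intersection.
    agreement = all(decided[i] & decided[j]
                    for i, j in combinations(decided, 2))
    return validity and agreement


proposed = {1: "a", 2: "b", 3: "c"}
ok = check_is(proposed, {1: {"a", "b"}, 2: {"b", "c"}, 3: {"a", "c"}})
bad = check_is(proposed, {1: {"a"}, 2: {"b"}})
print(ok, bad)
```

Note that the first outcome is accepted although no single value belongs to all three sets: IS only constrains the sets pairwise.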
3.2 An Example of Use
To illustrate the interest of the IS abstraction, and the modular protocol design it promotes, we present here a consensus protocol for asynchronous systems equipped with ◇S. The protocol skeleton is similar to the one of the ◇S-based consensus protocol introduced in [11]. It is described in Figure 1.

A process p_i starts a consensus execution by invoking Consensus(v_i), where v_i is the value it proposes. This function is made up of two tasks, T1 (the main task) and T2. The statement return(v) terminates the consensus execution (as far as p_i is concerned) and returns the decided value v to p_i.

The processes proceed by consecutive asynchronous rounds. Each process p_i manages two local variables whose scope is the whole execution, namely, r_i (current round number) and est_i (current estimate of the decision value), and two local variables whose scope is the current round, namely, aux_i and a set rec_i. ⊥ denotes a default value which cannot be proposed by processes. Due to the use of the IS abstraction, the protocol is extremely simple. A round is made up of two phases.
• The first phase (lines 4-6) is based on the round coordinator paradigm. More precisely, the current round coordinator (namely, p_c where c = (r mod n) + 1) broadcasts its current estimate v, and each process p_i that receives it keeps it in aux_i; otherwise, p_i sets aux_i to ⊥. It is important to notice that, as there is a single coordinator per round, the following property holds [11] at the end of the first phase: ∀i, j : (aux_i ≠ ⊥ ∧ aux_j ≠ ⊥) ⇒ (aux_i = aux_j = v).
• The second phase (lines 7-11) relies on IS. The processes exchange the values of their aux_i variables using the IS abstraction (see footnote 3). It follows from the IS properties that ∀i, j : (rec_i = {v}) and (rec_j = {⊥}) are mutually exclusive.

Then, in the rest of the second phase, a process p_i either decides (case rec_i = {v}), or adopts v as its new estimate (case rec_i = {v, ⊥}), or keeps its previous estimate (case rec_i = {⊥}). When it does not decide (last two cases), p_i proceeds to the next round.

As a process that decides stops participating in the sequence of rounds, and processes do not necessarily terminate in the same round, it is possible that processes proceeding to round r + 1 wait forever for messages from processes that decided during r. The aim of the second task is to prevent such a deadlock by disseminating the decided value.

The termination property follows from the strong completeness and the eventual weak accuracy properties of the ◇S failure detector. This modular construction provides a clean separation of what is used to ensure the consensus properties: consensus termination relies on the underlying ◇S failure detector, while consensus agreement relies on the IS abstraction.

Footnote 3: Each instance of IS is identified by a round number r. Hence, the invocation Inter_sets(r, aux_i) by p_i (line 7) refers to the r-th instance of IS, and aux_i is the value proposed by p_i to the instance Inter_sets(r, −).
Function Consensus(v_i)
Task T1:
(1)  r_i ← 0; est_i ← v_i;
(2)  while true do
(3)    c ← (r_i mod n) + 1; r_i ← r_i + 1;
       ------- Phase 1 of round r: based on the round coordinator -------
(4)    if (i = c) then broadcast phase1(r_i, est_i) end if;
(5)    wait until (phase1(r_i, v) has been received from p_c ∨ c ∈ suspected_i);
(6)    if (phase1(r_i, v) received from p_c) then aux_i ← v else aux_i ← ⊥ end if;
       % (aux_i ≠ ⊥ ∧ aux_j ≠ ⊥) ⇒ (aux_i = aux_j = v) %
       ------- Phase 2 of round r: reduction to the IS problem -------
(7)    rec_i ← Inter_sets(r_i, aux_i);
       % (rec_i = {v}) ∧ (rec_j = {⊥}) are mutually exclusive %
(8)    case rec_i = {v}    then est_i ← v; broadcast decision(est_i); return(est_i)
(9)         rec_i = {v, ⊥} then est_i ← v
(10)        rec_i = {⊥}    then skip
(11)   end case
(12) end while

Task T2:
     when decision(est) is received: do broadcast decision(est); return(est) end do
Figure 1: A t-Resilient Consensus Protocol for AS_{n,t}[◇S][IS] with t < n

Theorem 1 ∀t < n, the consensus problem can be solved in AS_{n,t}[◇S][IS].

Proof The very existence of the protocol described in Figure 1 constitutes the proof. We give here a sketch of the proof of the agreement and termination properties.

Termination. Let us first observe that, due to the strong completeness of ◇S, no process can block forever at line 5. Let us now consider the time τ after which there is a correct process that is never suspected. Let p_x be that process and r the first round after τ that is coordinated by p_x. During r, every non-crashed process p_i receives the current estimate v of p_x and adopts it in aux_i. It follows that each set rec_i is equal to {v} and consequently each non-crashed p_i decides at line 8.

Agreement. Let r be the first round during which processes decide. The consensus agreement follows from the fact that, due to IS, all the processes that decide during r decide the same value v. Moreover, due to IS and line 9, the processes that start round r + 1 do it with v as their new estimate value. Hence, no other value can be decided in a future round. □ (Theorem 1)
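The case analysis of lines 8-10 of Figure 1 can be sketched as a small pure function. This is an illustrative Python rendering only: the name phase2_step, the use of None to stand for ⊥, and the (decided, estimate) return convention are assumptions; message exchange and the Inter_sets() call itself are left out.

```python
BOTTOM = None  # stands for ⊥, assumed never to be proposed by a process


def phase2_step(rec: frozenset, est):
    """Return (decided, new_est) following the three cases of lines 8-10."""
    values = rec - {BOTTOM}
    # Phase 1 guarantees that at most one non-⊥ value circulates in a round.
    assert len(values) <= 1, "phase 1 guarantees at most one non-⊥ value"
    if not values:           # rec = {⊥}: keep the previous estimate
        return False, est
    (v,) = values
    if BOTTOM in rec:        # rec = {v, ⊥}: adopt v, proceed to next round
        return False, v
    return True, v           # rec = {v}: decide v


print(phase2_step(frozenset({"a"}), "b"))
print(phase2_step(frozenset({"a", BOTTOM}), "b"))
print(phase2_step(frozenset({BOTTOM}), "b"))
```

Because the IS Agreement property forbids one process from obtaining {v} while another obtains {⊥}, a process that does not decide v in a round where some process decides necessarily falls in the second case and adopts v.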
3.3 An Equivalence in AS_{n,t}[◇S][∅]
Reducing the IS problem to the consensus problem is trivial. Indeed, in the consensus problem all the processes decide the same value, which can be seen as the same singleton set containing a proposed value; the invocation Inter_sets(v_i) can thus be directly translated into the invocation Consensus(v_i). Hence the following theorem:

Theorem 2 ∀t < n, the IS problem can be solved in AS_{n,t}[∅][CONS].

Combining this theorem with Theorem 1 (on the consensus protocol of Figure 1), we get:

Theorem 3 ∀t < n, the consensus problem and the IS problem are equivalent in AS_{n,t}[◇S][∅].

An interesting open question is to prove (or disprove) the following conjecture: “AS_{n,t}[◇S][∅] is the weakest failure detector-based model in which consensus and IS are equivalent”.
4 Solving the IS Problem
4.1 When a Majority of Processes are Correct (t < n/2)
The trivial one-step communication protocol described in Figure 2 provides a simple way to implement the IS abstraction in AS_{n,t}[∅][∅] when t < n/2, i.e., when the “majority of correct processes” assumption is satisfied. As two majorities always intersect, two sets of (n − t) proposed values have at least one value in common, and consequently, the protocol satisfies the properties defining IS. Hence, the following theorem:

Theorem 4 ∀t < n/2, the IS problem can be solved in AS_{n,t}[∅][∅].
(1) broadcast phase1(v_i);
(2) wait until (phase1(−) messages have been received from (n − t) processes);
(3) let V_i be the set of values received by p_i at line 2
Figure 2: A One-Step IS Protocol in AS_{n,t}[∅][∅] with t < n/2

Interestingly, the consensus protocol presented in [11] can be seen as a non-modular version of the consensus protocol of Figure 1 where the sub-protocol implementing the IS abstraction is the “majority of correct processes”-based protocol described in Figure 2.
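The counting argument behind Figure 2 (two sets of n − t senders contain 2(n − t) > n elements when t < n/2, so they must share a sender) can be checked exhaustively for small systems. The Python sketch below is only an illustration; the function name quorums_always_intersect is an assumption, and it reasons on sets of senders rather than on the values they proposed.

```python
from itertools import combinations


def quorums_always_intersect(n: int, t: int) -> bool:
    """True if every two sets of (n - t) processes share a process."""
    procs = range(1, n + 1)
    quorums = [set(q) for q in combinations(procs, n - t)]
    return all(a & b for a, b in combinations(quorums, 2))


print(quorums_always_intersect(5, 2))  # t < n/2
print(quorums_always_intersect(4, 2))  # t = n/2
```

With n = 4 and t = 2, the sets {p_1, p_2} and {p_3, p_4} are disjoint, which is why the one-step protocol cannot be used as is when n/2 ≤ t.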
4.2 When n/2 ≤ t < n
The failure detector class P^x

A more crucial question is how to solve the IS problem when the faulty processes are not restricted to be a minority. To answer this question, we first present the class of failure detectors denoted P^t, introduced in [7] to solve the atomic register problem in AS_{n,t} when n/2 ≤ t < n. This problem is a distributed computing abstraction that consists of implementing an atomic shared variable in a message-passing system prone to any number of process crashes. It is shown in [7] that P^t is the weakest class of realistic (see footnote 4) failure detectors that allows solving the atomic register problem in these systems.

The class P^x includes all the failure detectors that satisfy the following properties:
• Strong Completeness: Eventually, every process that crashes is permanently suspected by every correct process.
• x-Accuracy: At any time, no more than n − x − 1 alive processes are suspected.
The reader can check that P^{n−1} is the class of perfect failure detectors (as defined in [4]), i.e., the class of failure detectors that make no mistakes.

Footnote 4: The notion of realistic failure detector has been introduced in [6]. Informally, a failure detector is realistic if it can be implemented in a synchronous system. Among other features, such a failure detector cannot guess the future. This paper considers only realistic failure detectors.
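The x-Accuracy bound can be written as a one-line predicate on a snapshot of one process's suspicions. This Python sketch is an illustration under assumptions: the function name and the set-based encoding are hypothetical, and the bound is applied per observing process, which is how it is used in the proof of Theorem 5.

```python
def satisfies_x_accuracy(suspected: set, alive: set, n: int, x: int) -> bool:
    """At this instant, at most n - x - 1 alive processes are suspected."""
    return len(suspected & alive) <= n - x - 1


alive = {1, 2, 3, 4, 5}  # n = 5, nobody has crashed yet
print(satisfies_x_accuracy({4}, alive, n=5, x=3))  # one false suspicion allowed
print(satisfies_x_accuracy({4}, alive, n=5, x=4))  # x = n - 1: no mistake allowed
```

For x = n − 1 the bound is 0, which matches the remark that P^{n−1} is the class of perfect failure detectors.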
A P^t-based IS protocol

We show here that a simple combination of P^t with a two communication steps pattern provides the desired intersection property. The corresponding IS protocol is described in Figure 3.

(1) broadcast phase1(v_i);
(2) wait until (phase1(−) messages have been received from ≥ (n − t) processes and all processes not locally suspected by P^t);
(3) let rec_i be the set of values received by p_i at line 2;
    ---------------------------------------------------------------
(4) broadcast phase2(rec_i);
(5) wait until (phase2(−) messages received from at least (n − t) processes);
(6) let V_i be the union of the sets of values received at line 5
Figure 3: A Two-Step IS Protocol for AS_{n,t}[P^t][∅] with n/2 ≤ t < n

Theorem 5 ∀t < n, the two-step communication protocol described in Figure 3 solves the IS problem in AS_{n,t}[P^t][∅].

Proof The validity property of IS follows directly from the protocol code. The termination property follows from the strong completeness of P^t, and the fact that t is an upper bound on the number of faulty processes. The rest of the proof concentrates on the agreement property of IS.

Let Q_i (resp., Q_j) be the set of processes from which p_i (resp., p_j) has received a phase2() message (line 5). Let p_x be any process of Q_i (resp., p_y any process of Q_j). As p_x (resp., p_y) has sent a phase2() message to p_i (resp., p_j), it has executed the first phase (lines 1-3). Moreover, as a process waits at line 5 for messages from at least (n − t) processes, we have |Q_i| ≥ n − t and |Q_j| ≥ n − t. Let crash_k denote the event “crash of p_k”.

Assume (by way of contradiction) that V_i ∩ V_j = ∅. We show that this cannot occur. Let e_x be the event “p_x terminates the first phase” (i.e., the event “p_x stops waiting at line 2”). Let us observe that such an event does exist, since (by definition) each process p_x ∈ Q_i starts the second phase. Similarly, let p_y be any process of Q_j and e_y the event “p_y terminates the first phase”.

Due to the t-accuracy property of P^t, when e_x occurs, p_x suspects at most (n − t − 1) processes that are currently alive. If p_x has received a phase1() message from a process p_y ∈ Q_j, we have V_i ∩ V_j ≠ ∅ and the theorem follows. So, let us assume that p_x has not received messages from any process p_y ∈ Q_j when e_x occurs. As (1) p_x falsely suspects at most (n − t − 1) processes when e_x occurs, and (2) Q_j contains at least (n − t) processes, we conclude that at least one process p_{y′} ∈ Q_j has crashed when e_x occurs. So, there is p_{y′} ∈ Q_j such that crash_{y′} happened before e_x.

Similarly, if, for any process p_y ∈ Q_j, p_y has not received messages from any process in Q_i, we can conclude that there is at least one process p_{x′} ∈ Q_i that has crashed before the event e_y, i.e., crash_{x′} happened before e_y.

As (1) p_{x′} ∈ Q_i, (2) p_{y′} ∈ Q_j, and (3) all the processes in Q_i and Q_j start the second communication phase, we have (from the previous observations): crash_{x′} happens before e_{y′} and crash_{y′} happens before e_{x′}. Since e_{x′} happens before crash_{x′}, and e_{y′} happens before crash_{y′}, we get the following cycle: e_{y′} happens before crash_{y′}, that happens before e_{x′}, that happens before crash_{x′}, that happens before e_{y′}. A contradiction. □ (Theorem 5)

Interestingly, the consensus protocol presented in [9] can be seen as a non-modular version of the consensus protocol of Figure 1 where the sub-protocol implementing the IS abstraction is the P^t-based protocol described in Figure 3. Moreover, we have shown in [10] that two communication steps are necessary and sufficient to implement IS in AS_{n,t}[P^t][∅] with n/2 ≤ t < n.
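The effect of the failure detector clause of line 2 can be observed on a tiny hand-built scenario. The Python sketch below is not a message-passing implementation: the function two_step_is and the heard1/heard2 maps (which processes each p_i heard from at lines 2 and 5) are illustrative assumptions that replay the data flow of Figure 3 for a given delivery pattern, here with n = 4, t = 2 and no crash.

```python
from typing import Dict, Hashable, Set


def two_step_is(values: Dict[int, Hashable],
                heard1: Dict[int, Set[int]],
                heard2: Dict[int, Set[int]]) -> Dict[int, Set[Hashable]]:
    # Lines 1-3: rec_i is the set of values p_i received during phase 1.
    rec = {i: {values[j] for j in senders} for i, senders in heard1.items()}
    # Lines 4-6: V_i is the union of the rec_j sets received during phase 2.
    return {i: set().union(*(rec[j] for j in senders))
            for i, senders in heard2.items()}


values = {1: "a", 2: "b", 3: "c", 4: "d"}  # n = 4, t = 2

# With P^t: at most n - t - 1 = 1 alive process is suspected, so each
# process hears from at least 3 processes before leaving line 2.
with_fd = two_step_is(
    values,
    heard1={1: {1, 2, 3}, 2: {2, 3, 4}, 3: {1, 2, 3, 4}, 4: {2, 3, 4}},
    heard2={1: {1, 2}, 4: {3, 4}},
)
# Without the failure detector clause: only n - t = 2 messages are awaited.
without_fd = two_step_is(
    values,
    heard1={1: {1, 2}, 2: {1, 2}, 3: {3, 4}, 4: {3, 4}},
    heard2={1: {1, 2}, 3: {3, 4}},
)
print(with_fd[1] & with_fd[4], without_fd[1] & without_fd[3])
```

In the second pattern the system behaves as two isolated halves and the decided sets are disjoint; the accuracy property of P^t rules this pattern out as long as the processes of the other half are alive.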
5 Towards a Bigger Picture
This section broadens the picture by presenting additional distributed computing abstractions.
5.1 The CO Abstraction
The “collect” problem is defined as follows. Each process p_i deposits (proposes) a value v_i and obtains (decides) a set of values V_i such that the following properties are satisfied:
• Termination: Each correct process decides.
• Agreement: Let V_i be the set decided by p_i. We have |V_i| ≥ n − t.
• Validity: If v ∈ V (where V is the set decided by a process), then v is a proposed value.
This well-known abstraction simply gathers values. It can be trivially implemented in AS_{n,t}[∅][∅]. The reader can easily see that the protocol presented in Figure 3 implementing IS can be restated as two CO invocations separated by the P^t-based statement: “wait until (phase1(−) messages have been received from all processes not locally suspected by P^t)”.
5.2 The CI Abstraction
The “common intersection” problem is defined as follows. Each process p_i deposits (proposes) a value v_i and obtains (decides) a set of values V_i such that the following properties are satisfied:
• Termination: Each correct process decides.
• Agreement: Let D = {p_x, ...} be the set of processes that decide, and V_x the set decided by such a p_x. We have (⋂_{p_x ∈ D} V_x) ≠ ∅.
• Validity: If v ∈ V (where V is the set decided by a process), then v is a proposed value.
This abstraction ensures that all the processes that obtain a set of values have a value in common, but they do not know which value.
5.3 The LA Abstraction
The “lattice agreement” problem [3] is defined as follows. Each process p_i deposits (proposes) a value v_i and obtains (decides) a set of values V_i such that the following properties are satisfied (see footnote 5):
• Termination: Each correct process decides.
• Agreement: Let V_i and V_j be the sets decided by p_i and p_j. We have V_i ⊆ V_j or V_j ⊆ V_i.
• Validity: No decided set is empty. Moreover, if v ∈ V (where V is the set decided by a process), then v is a proposed value.
This abstraction ensures that the sets of values obtained by the processes are ordered by containment.

Footnote 5: This definition is slightly different from the one introduced in [3], which requires that v_i ∈ V_i. This difference is not significant for our purpose.
5.4 Ranking these Abstractions
As the specification of LA is stronger (more constraining) than the specification of CI, which is in turn stronger than the specification of IS, it follows that, without additional computation, LA trivially implements CI, which in turn trivially implements IS. So, LA or CI can replace IS in Figure 1 to solve the consensus problem in AS_{n,t}[◇S][∅]. Hence, we get the following corollaries (where X stands for LA, CI or IS).

Corollary 1 ∀t < n, the consensus problem can be solved in AS_{n,t}[◇S][X].

Corollary 2 ∀t < n, X can be solved in AS_{n,t}[∅][CONS].

Corollary 3 ∀t < n, CONS, LA, CI and IS are equivalent in AS_{n,t}[◇S][∅].

Let V be the set of values that the processes can propose to any of the three previous problems (LA, CI and IS). It is interesting to notice that when |V| = 2, these three problems are the same problem. It is also interesting to observe that the (◇S + IS)-based consensus protocol described in Figure 1 first uses the ◇S oracle to transform the set of values proposed by the processes into a set of two values (namely, the estimate of the current coordinator, plus the default value ⊥), before using the underlying IS abstraction.
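The ranking LA ⇒ CI ⇒ IS, and the collapse of the three Agreement properties on a two-value domain, can be illustrated on small families of decided sets. In the Python sketch below, the predicate names la_ok, ci_ok and is_ok are assumptions; only the Agreement clauses (plus the non-emptiness required by LA) are encoded, Validity and Termination being left aside.

```python
from itertools import combinations


def is_ok(sets):  # IS: decided sets intersect pairwise
    return all(a & b for a, b in combinations(sets, 2))


def ci_ok(sets):  # CI: all decided sets share a common value
    return bool(set.intersection(*sets)) if sets else True


def la_ok(sets):  # LA: non-empty sets, totally ordered by containment
    return all(sets) and all(a <= b or b <= a
                             for a, b in combinations(sets, 2))


chain = [{1}, {1, 2}, {1, 2, 3}]      # satisfies LA, hence CI and IS
star = [{1, 2}, {1, 3}]               # satisfies CI and IS, but not LA
triangle = [{1, 2}, {2, 3}, {1, 3}]   # satisfies IS only

# Two-value domain: every family of non-empty sets accepted by IS is a chain.
nonempty = [{0}, {1}, {0, 1}]
families = [list(f) for r in range(1, 4) for f in combinations(nonempty, r)]
binary_collapse = all(la_ok(f) for f in families if is_ok(f))
print(binary_collapse)
```

The three example families separate the abstractions, while the exhaustive check over the two-value domain (restricted here to non-empty decided sets) shows why no such separating family exists when |V| = 2.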
6 Conclusion
This paper has presented a distributed computing abstraction that captures an essential part of a class of agreement problems. This abstraction, called Intersecting Sets (IS), requires each process to deposit a value and allows each non-faulty process to obtain a subset of these values such that any two such sets have a non-empty intersection. After having introduced and motivated this abstraction, the paper has investigated its properties, its power and its benefit when solving the consensus problem. As has been shown, IS factorizes an important part of consensus protocols. It has allowed the design of a general protocol that does not depend on t, the maximum number of processes that can crash (t does not appear in the protocol text). Then, it has been shown that the implementation of the underlying IS abstraction, which depends on t, can be realized without additional requirement when t < n/2, while it relies on an appropriate failure detector when there is no constraint on t (i.e., when t < n).
Acknowledgments

We would like to thank Carole Delporte-Gallet, Hugues Fauconnier and Rachid Guerraoui for interesting discussions and exchanges on the failure detector class P^t, and on the comparison of the message passing model with the shared memory model.
References

[1] Aguilera M.K., Toueg S. and Deianov B., Revisiting the Weakest Failure Detector for Uniform Reliable Broadcast. Proc. 13th Int. Symposium on DIStributed Computing (DISC’99), Springer-Verlag LNCS #1693, pp. 21-34, 1999.
[2] Aho A.V. and Ullman J.D., Foundations of Computer Science. Computer Science Press, New-York, 765 pages, 1992.
[3] Attiya H., Herlihy M.P. and Rachman O., Atomic Snapshots Using Lattice Agreement. Distributed Computing, 8(3):121-132, 1995.
[4] Chandra T.D. and Toueg S., Unreliable Failure Detectors for Reliable Distributed Systems. Journal of the ACM, 43(2):225-267, 1996.
[5] Chandra T.D., Hadzilacos V. and Toueg S., The Weakest Failure Detector for Solving Consensus. Journal of the ACM, 43(4):685-722, 1996.
[6] Delporte-Gallet C., Fauconnier H. and Guerraoui R., A Realistic Look at Failure Detectors. Proc. Int. Conference on Dependable Systems and Networks (DSN’02), IEEE Computer Society Press, pp. 354-353, Washington D.C., 2002.
[7] Delporte-Gallet C., Fauconnier H. and Guerraoui R., Failure Detection Lower Bounds on Registers and Consensus. Proc. 16th Int. Symposium on Distributed Computing (DISC’02), Springer-Verlag LNCS #2508, pp. 237-251, 2002.
[8] Fischer M.J., Lynch N. and Paterson M.S., Impossibility of Distributed Consensus with One Faulty Process. Journal of the ACM, 32(2):374-382, 1985.
[9] Friedman R., Mostéfaoui A. and Raynal M., A Weakest Failure Detector-Based Asynchronous Consensus Protocol for f < n. Research Report #1557, IRISA, Université de Rennes 1 (France), 2003, 11 pages. http://www.irisa.fr/bibli/publi/pi/2003/1557/1557.html. To appear in Information Processing Letters.
[10] Friedman R., Mostéfaoui A. and Raynal M., Building and Using Quorums despite any Number of Process Crashes. Research Report #1583, IRISA, Université de Rennes 1 (France), 2003, 16 pages. http://www.irisa.fr/bibli/publi/pi/2003/1583/1583.html.
[11] Mostéfaoui A. and Raynal M., Solving Consensus Using Chandra-Toueg’s Unreliable Failure Detectors: a General Quorum-Based Approach. Proc. 13th Int. Symposium on Distributed Computing (DISC’99), Springer-Verlag LNCS #1693, pp. 49-63, 1999.
[12] Raynal M. and Hélary J.-M., Synchronization and Control of Distributed Systems and Programs. Series in Parallel Computing, J. Wiley & Sons, 124 pages, 1990.
[13] Schneider F.B., What Good are Models and What Models are Good? Chapter 2 in Distributed Systems (2d edition), Addison-Wesley and ACM Press, New-York, pp. 17-26, 1993.
[14] Schneider F.B. and Lamport L., Paradigms for Distributed Programs. In Distributed Systems: an Advanced Course, Springer-Verlag LNCS #190, pp. 431-480, 1985.
[15] Tel G., Introduction to Distributed Algorithms. Cambridge University Press, (2d edition), 596 pages, 2000.