Uniform Actions in Asynchronous Distributed Systems Extended Abstract

3 downloads 24 Views 916KB Size Report
qInstitute of Computer. Science,. The. Hebrew. University of. Jerusalem,. Israel. tComputer. Sci. Dept.,. Cornell. University,. Ithaca,. NY, USA. Supported by ONR.
Uniform

Actions

in Asynchronous Extended

Dalia

Ken

Malki*

We devetop necessary

classical

conditions

for

the development

asynchronous

distributed sofiware that will perform (’evenis that if performed by any pro-

problem

to

in

problems

leave and join

the

that

ongoing

asynchronous

Andr6

Ricciardi$

Schiper

~

may not be surprising that they employ similar level mechanisms. In fact, we are not aware

cess, must be performed at all processes). The pauniformity, which differs from per focuses on dynamic ihe

Systems

Abstract Aleta

Birmant

Abstract

of asynchronous uniform actions

Distributed

processes

computation.

continually It relates

Consensus,

and

the

shows

that

distributed

low-level

architecture

ronment:

dynamic

system

that

uses a different

and also provides addition

lowerof any

the same envi-

and removal

of processes,

and non-trivial consistency properties. This observation led us to speculate that asynchronous distributed systems might be subject to conditions that dictate the lowest levels of the system architecture. If one can

Consensus is a harder problem. We provide a rigorous characterization of the framework upon which

identify necessary conditions for the solution of some basic problems, these conditions will also define the software architecture for a wide range of real-world

several existing

distributed

distributed

programming

environments

are based. And, our work shows that progress is sometimes possible in a primary-partition model even when consensus is not.

1

Our work on fault-tolerant (Isis [4], ‘llansis [2], and Horus desire to understand

the theoretical

which

depend.

such systems

●Institute Jerusalem, tComputer Supported ililectrical

distributed systems [18]) gives rise to a foundations

upon

systems

all use

These

synchrony

programming

of Computer

Science,

The

model Hebrew

[3], so it

University

of

Israel Sci. Dept., by ONR

systems. believe

that

(D-Uniformity)

Dynamic

Uniproperty

is the fundamental

required for many applications in asynchronous tings. In D-Uniformity, there are certain actions

setthat,

if t aken by any process in the system, must eventually be taken by every other active process; only failure and forcible removal (forced inactivity) excuse a pro-

Introduction

the virtual

We formity

Cornell

grant

number

& Computer

Eng.,

University,

Ithaca,

NY,

U.Texas

at Austin,

ber 21-32210.91, as part of the ESPRIT Number 0s60 (BROADCAST).

Basic

Research

be propagated

to the remaining

active

The motivation for D-Uniformity distributed system, processes will

processes.

is that in a large often act on behalf

of the system as a whole. While actions locally, they must eventually be known tire membership.

Sup nmn-

propagation of such actions, as well as the obligation of other processes to take these actions, while avoiding notions such as ‘correct/incorrect’ processes. D-Uniformity is trivial to solve if communication channels are reliable and the membership of the sys-

USA

Project

D-Uniformity

captures

may occur to the en-

USA.

NOO014-92-J1866.

~Ecole Polytechnique F6d&ale, Lausarme, Switzerland. ported by the “Fords national suisse” and OFES contract

cess from taking the action. 1 Even actions taken by a process that later crashes or is forcibly removed must

the required

tem is known to all process. When channels are lossy, a process initiating an action must know its that ce

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. PODC 94-8194 Los Angeles CA USA @ 1994 ACM 0-89791 -654-9/94/0008.$3.50

1Systems failures

participation nication under

274

like

fsis,

accurately,

Transis

in the system,

to such a process a new process

and

sa unresponsive

Horus

as if they is reestablished,

identifier.

are unable

processes had

to

detect

are excluded

from

crashed.

If commu-

it rejoins

the system

horts

are aware of the action

it is initiating.

result shows that D-Uniformity that

a majority

is achievable

Our first

and Horus - can be understood

provided

programming with dynamic uniformity. tems can be viewed aa having two layers.

of the processes do not crash.

layer assumes a failstop

We then explore the impact of adding a membership service to our asynchronous environment. We assume that

a membership

service

reports

failures,

model

as practical

[16] with

tools for These sysThe upper

notifications,

upon which abstractions such as group communications and message delivery ordering are implemented.

and pos-

sibly joins, to each process.2 We show that, given a system of size IV and a membership service that can

The lower layer implements a membership service that looks like a failstop failure detector to the upper layer,

make an infinite

using agreement protocols completion, and ‘suspected

number

of mistakes

(i. e., reporting

a functional process as failed), but only t– 1 mistakes at each process event, D-Uniformity can be achieved with low

up to N–t

layer must track membership within tition, an instance of D-Uniformity.

failures.

D-Uniformity,

We then show that a membership service can alprocess-joins. We show that in order dynamic

to achieve resiliency

with

any single

live member

tual

ordination

[13].

For

show that

Consensus

example,

Chandra

is equivalent

to both

fault

Processes

de-

demonstrates model,

rests on a sound

that

the vir-

shared

by these

foundation,

and ex-

Model

The system

and Toueg

a primary parBy formalizing

how virtual synchrony relates to the one within asynchronous consensus haa been studied.

System

2

In much prior work, Distributed Consensus is considered the basic problem in achieving distributed co-

paper

programming

systems,

plains which

model, which we study.

this

synchrony

three

(l-

ThreshoM), the membership service must ensure that process-joins are ordered with respect to regular actions. This property is indeed supported in practice by the virtual synchrony programming is provided by the distributed systems

to react to process join, crash’ events. The lower

consists

of a finite

communicate

with

set S of processes.

each other

by passing

messages. The system is asynchronous in that there is no common global clock, and messages may be ar-

tection and to atomic broadcast [6]. It is well known that Consensus has no solution in asynchronous envi-

bitrarily long in transit. A process p fails by crashing, which we model by the distinct event crashP, but always follows its assigned protocol.3 Messages between processes may be lost, but to compensate for this, processes resend messages until acknowledged. Specifically, if processes p and g both

ronments in which even a single process may crash [9]. As a result, Consensus per se is not the basic problem underlying all dynamic coordination in asynchronous environments. In particular, studying only Consensus ignores problems in which coordinating the activity of only a majority of the processes suffices. One of our goals is to better understand the relationship between

remain

ensuring

be delivered. There are no permanent partitions.4 nally, we assume the network does not corrupt

progress in a pn’mary

sical Consensus problem. Finally, to model the higher

partition layers

and the clasof abstraction

sages. Let

found in the systems we study, we define two subclasses of Uniformity, for future research. The first consists of actions that must be done both uniformly and in the same sequence. This sequence problem has been studied database

extensively

in the context

of distributed

service

reports

several

in addition

types

of membership

to suspected

this

asynchronous

environment.

with the unique event sta~: hP = star$ . e: -.. e;, k >0. A cut is a tuple of finite process histories, one 3The to other the

[6], a membefip events

to

function. We model the change in a process’ local state with an event in the execution of the process. A history for process p, hP, is a sequence of events beginning

upon detecting termination (e.g., re-using entries in a table, or discarding saved copies of messages related to the action). The systems we have developed – Isis, Tkansis,

and leaves)

refer

Fimes-

It is natural to model processes as 1/0 state automata: A process has a local state and a transition

cesses that have not failed). This subclass is important in systems that perform some sort of clean-up action

detector

A

eventu-

may never

crashed.

applies to uniform actions that must also eventually be known to have terminated (i.e., taken by all pro-

role to a juilure

each message from p will

Since messages may be delayed arbitrarily long, no process can determine whether an unresponsive process haa crashed, or whether it only appears to have

systems [1, 10, 11, 17]. The second subclass

zw~,le. similar in

operational,

all y reach q; if p crashes, messages in transit

(e. g., joins,

failures.

275

~ra~~p

event is not

machines

discussion. 4 The r=~ts

failure

model

cation

channels

within prmented

with

only

performed

the system; here minor

are FIFO.

by P itsdf,

nor is it visible

it is introduced

extend rhanges,

to simplify

to the general omission provided

the commum.-

for each p E S. only with

The

initial

system

cut,



co, consists

of all the siar$ events. We assume familiarity inter-event causality and the “happens-before”

relation(e~

e’) [12], and with

consistent

cuts [7]. A

system run is an n-tuple of infinite process histories, one for each process in S.5 A cut, then, is a finite prefix

DIDP (/3) holds on c if and only

if doesP(~) is an

event in p’s history

of c,

component



OWNSP(~) if and only if p is the owner of p,



KP “(~)” holds exactly action /?. Obviously,

of a run.

but

~P “(~)”

when p knows ‘about’ the OWNsP (~) * ~p “(fl)”,

A TOWNSP(@

received

a message naming

CRASHP

holds

holds

only

after

p

~.

Actions ●

special

set of events,

called

actions can redescribed,

actions.

thespecific

While

The

Language

are propositional, and all formulas are evaluated along consistent cuts. When formula p holds on cut c, we write c ~ p. The modalities have the following semantics:

❑Ip

condition

P E S A DIDP(P)

*

Dynamic The

basic

Uniformity 5Histories

A

O (DID&3)

of an action

Knowledge cute

it:

~P

“(~)”

To define

)()

v DISABLEDq

1

+

forces

a process

ODIDP (fl)

DISABLED9

to try

to exe-

V DISABLEDP.

precisely,

we introduce

the no-

of

p has permission within S to execute ~. While we do not specify what constitutes permission (e.g., whether it is granted quorum

only

subset),

by a designated

we do require

process,

or by a

it to satisfy: permission

+

2. Initially

PERMITP(S,

no actions

@

.

are permitted

co ~ V@ (Ap 7PERM1TP($

~))

.

actions are eventually 3. Permitted process does not crash

executed

if a

to its local state at c. (S, P) + O (DIDP(P)

holds exactly Now, DISABLEDq get permission to execute actions

Uniformity propositional

problem of crashed

infinitely

that

permission to execute an action. The need for permission arises from the network assumptions, which force a process intending to execute an action

tion

formulas

of

the

D-

not already

many

are: processes crash

can be made

infinite

V CRASHP . ) when q will never for which it does

have permission:

DISABLEDq pending

states

9CS

PERMITP

3

for D-Uniformity

(we discuss this below in more detail):

DIDP(fl)

of c in any run,

I rN/21. Let S1 and S2 partition S with IS I I = r. Let p be a run of U in which all

Lemma 3.1 Let S be a system of N processes, such that each process owns an infinite number of actions.

processes in S1 crash at some cut c. Because p is live,

If fewer than [N/21 failures occur throughout cution, then D-Uniformity is solvable.

an action ~, owned by a member of S2 and unknown to Sl, such that c’ ~ DIDP(@, for P E %. Now let p’ be another run of U with the same prefix

Proof:

The

protocol

in

Figure

1

there must be a cut c’, also in p and after c, as well as

the exe-

solves

c, except

D-

that

Uniformity (the symbol II denotes concurrent threads). There, “fairly” means no action, owned or buffered, is never selected.

are indistinguishable

faulty finite

The protocol

is live because each non-

process succeeds in initiating time.

an action

It is safe because for each performed

after

in p’ all messages to and from

delayed

a cut in p’.

c. We claim To see this,

that

observe

these processes will therefore

action

both

runs up to cut c’.

61t j~

well ~own

the

two runs

take the same actions

Specifically,

that, unless

tine failures, multi-valued binary-consensus.

277

that

S2 are

above is also

to the processes in S2, and that

within

~, a message announcing 13reaches at least one process that never crashes, guaranteeing that the action will eventually propagate to every other live process. 9

c’ from

the system

consensus

in

action P is also iS prone

to byzan-

can be implemented

given

performed

in p’ with

no communication

involving

The Weakest Membership Solving D-Uniformity

4.2

any

process in SI. Now, at c’ in p’, crash all processes in S2. By asOur

primary

intereat

is

to

Service

define

the

weakest

sumption, ISZI < r, so in this execution fewer than r processes have crashed by cut c’. Thus, p’ is still live in the sense that there is some q c S1 that continues

membership service that will permit solutions to DUniformity in the presence of [ZV/21 failures or more.

performing an infinite number of its actions. However, in our model, no message informing q about /3

We use the definition services.

need ever reach q, precluding ~. This violates

Resiliency

Membership Section whenever

be a solution

~

Increasing

with

Uniformity. Specifically we augment d with bership service. A membership service reports

a memto every

a Membership

is a protocol

that,

(in line 7) only

We

on how to use a member-

from

using MS’,

the

Let MS and MS’ be

requisite

can implement properties

MS)(U operating

MS.

of a weakest we make no asor on how it is

in the presence of memberto D-Uniformity

that

is

a cut

of Z./(MS),

that

while

cannot

Lemma

occur

in any execution

4.2 shows that

waiting

for

‘enough’ acknowledgments (direct or indirect) before executing any action is the only way to prevent such a cut.

Service

ship service, recall the protocol of Figure modify that protocol so process p awaits ments

& Toueg)

resilient to $ failures, for ~ ~ [IV/2]. Let t = N –f be the number of processes that do not crash. Lemma 4.1

Lemma To give some intuition

membership

MS is weaker than MS! if there

ship service MS) be a solution

describes

a local view

LocalViewP (e), which is a list of process identifiers. simply use LocalViewP when the event is clear. Using

services.

Let 24(

In this section, we extend the asynchronous environment A to increase the resiliency of solutions to D-

4.1

1 (Chandra

membership service for D-Uniformity, sumptions on the values it providea, used by the protocol.

that D-Uniformity is solvable of the system is operational.

process p, at every event e in p’s history,

Definition membership

In proving

a

Service

3 showed a majority

[5] to compare

ever performing

safety, and so 2.4cannot

to D-Uniformity.

4

q from

from

processes

4.1

Let

Z.((MS)

be

solution

a

to D-Uniformity that is resilient to f failures. Let t = N–f, and let {p, ql, ..., q~) ~ S. Then in every

1. We might acknowledg-

execution

in LocalViewP.

of U(MS)

there is no consistent

cut c such

that

However, a naive membership service could lead to undesirable results. For example, a membership service based on timeout would remove from LocalVlewP any process that does not respond to p’s messages within some time bound. Without further constraints, this membership service is unsafe for the purposes of DUniformity, even in failure-free runs. Another example membership service might maintain agreement about suspected failures (e.g., the service described in [14]). The advantage of such a service is that the protocol in Figure 1 can make safe progress even when rlV/2] or more of the processes have failed. Informally, the system this of

requires

processes,

mistaken

and

failure

successive

agreement an

among

inopportune

suspicions

reconfiguration

and

true

crashes

will

cause

the

system

of two

of them

remaining

three

may

be fatal,

processes

will

of five processes, for

the

block

real

the f~e

failure

2 Let

p

In our model,

~, violating perform

safety.

an

this qi

~

action

,8.

Let

acks-rcvdP (/3)

remov~

of two

‘=

{9 I &K?

’’(l3)”}

to

Lemma in a system

indefinitely.

of

between

block7. 7For ~x_pIe,

actions

may never learn about

acks-rcvdP (/3) be the set of processes from which p received direct or indirect acknowledgment regarding f? up to the event doesP(P).

a majority

combination

performing

Definition

this is done by successively reconfiguring into decreasing majority sets. However,

approach the

Proof: Assume to the contrary that c can occur in some execution of 24(MS). At c, suppose an adversary crashes all processes except ql, ..., qt. By assumption, the system remains live, and so one of the gi continues

of the

the system

278

4.2

If U (MS)

is an f-resilient

solution

to D-

Uniformity, and doesp (6) occurs in any run, then at least one process in acks-rcvdP (~) never crashes.

Proofi

Suppose

to the contrary

that

by some cut c,

Since at least f processes never crash, there is a set of

Theorem 4.3 WMS(t- 1) is weaker than any mem. bership service extending the asynchronous environment A in which D- Uniformity can be solved with f

processes {gl,...

failures.

every process (except p) in acks-rcvdP (~) has crashed. 9t} G (S\

acks-rcvdP(p)),

such that

Proof:

Consider

Take Let &be

a cut that is pequivalent

to c, and satisfies

& # =~{q, “(/3)”. Now define t to be identical except that p’s history component in i terminates

to c’ with

the event

~

crash.

This cut violates

Lemma

4.1.

the following

LocalViewP(startP)

event

doe% (~), set LocalViewP to acks-rcvdP (/3), but wise, do not change Loca lViewp.

other-

We show this membership service belongs services: class of W MS(t —1) membership

to the

Weak

Membership

Service

that

ery p,q E S.

3. By Lemma

2. Crashed processes are eventually removed from the local view of each active process that remains

LocalVlewp

active:

crashed made). V

DISABLEDP

form. processes are mistak3. At most m non-crashed enly removed from Loca lViewp. We further disstable (i.e.,

non-stable

permanent)

Theorem

of

4.2, if t processes are removed

from

at

have

least

one

of

them

must

(at most t— 1 mistakes can have been We note that, since eventually all the

I 4.4

removal

from

Vniformity

are stable,

then

and, in Figure

Proofi If removals

WMS(rn)

does not make more than

m ac-

WMS(t–

with

1) is suficient

up to f = N–t

LocalViewP, crashed.

removals:

If

removed

process within

for

solving

D-

failures.

processes

to local views, then at any from LocalViewP (e) that

1) membership

This solution is maintained

a finite

service

it for line 7 (waiting

time.

for a

to D-Uniformity is because a live prowith every other live

Since MS reports

every

failure within a finite time, p does not wait for acknowledgments from crashed processes forever. Therefore, p performs within finite time every action it initiates.

event e there are at most m processes curremoved

1, substitute

cess p succeeds in communicating

at least one of these is, in fact,

may be returned

Let MS be a W MS(t–

majority of replies). f-resilient. Liveness

cumulated mistakes for each process. Thus, whenever m+ 1 processes are removed from

rently

number

removal:

Stable-removals:

Non-Stable

the fact that

a finite

processes in the system remove the crashed process from their view, these removals are uni-

) Pdi

tinguish

of only

actions before it crashed. If some p performs an infinite number of actions, there must be a point in its execution after which q was unresponsive to every announcement p made. Thus, q is not in acks-rcvdP (/3) and is then removed from LocalViewP (doesP(P)).

1. The initial view of all processes is identical: for evLocalVlewP (startP) = LocalViewq (startq)

# LocalViewP

process stems from

q can have learned

that:

O(q

of a crashed process q from the view

of every active

makes m mistakes (WMS(rn)) provides each process p at each event e, with a local view Loca lViewP (e) such

A

at

pEs.

2. The removal

~RASHg ~

S;

service:

each

view of some process.

3 A

membership be

1. Agreement on an initial view stems directly from the definition of LocalViewp (startP), for every

As in [6] a membership service makes a mistake if and only if it removes a non-faulty process from the local

Definition

to

The solution

are

performed

not crashed.

maintains

one process that

remains

mation about ~ will live processes. 1

We now show that that a WMS((lV-/)-l) membership service (i. e., WMS(t– 1)) is the weakest for which f-resilient solutions to D-Uniformity exist. Necessity and sufficiency are shown in the next Theorems.

If removals service

279

safety because an action

by process p, is acknowledged alive.

eventually

are stable,

Therefore,

the infor-

propagate

to all the

a WMS(t-1)

can make t– 1 mistakes

/?,

by at least

membership

per process,

or N x

(t–l)

mistakes

globally.

t >1, such a failure solve consensus

In [6] it is shown

detector

is not strong

that

process learns that

for

enough

all the currently

cesses have performed

to

when ~ z [IV/21.*

that

action

this

in more

from

its buffers

detail).

active

some action, (Section

Thus,

pro-

it discards 6 discusses

existing

processes

When removals are not permanent, WMS(t–1) can make infinitely may mistakes both about and to any process. Obviously, this membership service also can-

need not store the accumulated execution history, on the chance that other processes may join

not help in solving

a point

4.3

consensus.

The obligation

N–l-Resiliency For the special

equivalent Lemma

cesses obliged case of ~ = IV– 1, D-Uniformity

obligation

is

to Consensus. 4.5

The

N–1 -resilient

D-

far in the future.

consensus

problem

is reducible

defined

as follows:

set So of processes that

(1) SystView

tJni~orrniiy.

a WMS(0)

crashed

process removal

makes

processes with

no mistakes,

are eventually

a WMS(0)

to join.

Allowing We now

tion.

In

q the permission

processes

arises when the start by some communication In this communication, permission

to join

to join,

that

join

a computa-

via a special

execute

is a common

joining

all relevant

many

mistakes

terminology, as the

for stron~ly

view SystView(c)

by:

and the event

ad~ (q) is in a

p is in SystView(c).

practice

applications. processes

Obliged(/?,

Safety

k-misfaken

failure

the initiation

‘Sf

u

of ac-

SystView(c)

+

A

now becomes: +ID#)

V DISABLEDq

q~Obliged(B)

in fault-

State-transfer

Clearly, a system that starts set of S cannot hope to tolerate

).

up with only a subarbitrary crashes of

a pre-defined, fixed number of processes; resiliency of the system should also be redefined to incorporate system size at various points in the execution. To this end, we define the set NotCrashed(c) to be the subset of SystView(c) that have not crashed. We define resiliency

make u

detector,

p)

for D-Uniformity

DIDp(@)

need not actually

1, W MS(t - 1) may

denote

{.~t c I inii(fl)~~}

events to reach the desired

t >

5 Let inii(~)

in run p is:

event de-

From a practical point of view, it is not feasible to expect processes to maintain information about actions indefinitely. Typically, when a

ak their

c, then

Definition

local state. ●

The system defined

tion @ by the owner of ~; that is the event by which set of ~ the owner of ~ announces ~. The obligation

operation, in which a newly accepts a snapshot of the sys-

distributed

means that

increases

from processes in

We define now Obliged(@) as the set of processes that are in SystView before the initiation of action P propagates in the system:

noted addP (q). In this case, we deem it appropriate to relieve q from the obligation to perform actions that were initiated prior to addP (q). This approach is based on the following observations:

tolerant

the event

SO;

2. if p G SystVlew(c)

with another process, say p. presumably p has granted q

tem upon joining,

start

environment,

the system,

. A staie-transfer joined process

initially

Formally:

1. SystView(cO)d~f

is uniform.

late joining of a process, say q, was preceded

asynchronous

contains have their

and

removed,

Joins consider

an

the

SystView

Definition 4 Let co be the initial system cut, and So the special set of processes that have their start event in co. Let add(q) be a special uniform action that gives

cut

5

to define view

system cut co, and (2) SystView

on a cut c is recursively all

a system

as new processes get the permission

Proofi A solution to N– l-resilient D-Uniformity requires a WMS(0). By definition any WMS(0) is a “perfect failure suspector” of [6], and consensus can be solved using the perfect failure suspector. ~ By definition

~ is the set of pro-

In order

special

SystView

since

/3.

set we introduce

in the initial

to

set of an action

to perform

Definition

with

at c is in terms of NotCrashed(c). 6 A solution

U to D-Uniformity

has a t-

threshold if every cut c of every run of U is live provided lNotCrashed(c)l > t.

k = N x (t-l).

280

5.1

One-Threshold For simplicity,

service

required

we show

the weakest

for a l-Threshold

D-Uniformity. The extension done as in Section 4. Lemma

5.1

solu~ion

to D- Uniformity.

U{MS)

4. Changes to SystVlew are ordered with respect to regular actions: For each process q, and each event init(~), either an event add(q) (by

Solutions membership

(l-T)

Let p, q E S, and let U(MS) Then

is there a consistent

to

some

case is

init(~)

solution

to the general

+

Theorem

be a 1-T

in no execution

process)

5.3

an OWMS(0)

of

extending

cut c such that

DIDP(8) A q E Obliged(/?)

A -ICRASHq

after

the processes

c, suppose an adversary

in SystView(c)

except

q.

safety.

init(~),

processes

joins

or

are permitted,

is weaker than any membership D- Uniformity

environment

A

service in which

has solutions.

Proofi Consider the following, in which and changes to it are defined as follows: Init

crashes all This

can be

Loca lViewP

Let f? be the first action p gets permission local view to execute, and set p’s initial

to be

acks- rcvdP (@;

done if the protocol has a l-Threshold, still leaving the system live. However, q may never learn about /3 (due to network assumptions); since q G Obliged(b), this violates

precedes

-IKq “(P)”

A

Proofi Assume to the contrary that c satisfies the conditions above and occurs in some execution of U. Immediately

When

the asynchronous

1-threshold c #

causally

adL (q) for all r.

Add/Remove At each event doesP(P), set LocalViewP (doesP(@) to acks-rcvdP (/3), but otherwise don’t change Loca lViewP.

~

Recall that acks-rcvdP (B) is the set of processes from about P up to which p received some acknowledgment

We now show that this membership the properties of an OWMS(0):

service satisfies

doesP (,8).

1. If p is active,

Lemma

5.2

In

any

l-Threshold

solution

to

D-

process in Obliged(~)

crashed-

therefore includes not crashed.

Proofi

The proof is similar to that of Lemma 4.2. Let c be any cut on which DIDP(/l) holds. Assume there exists a process q that violates the Lemma. The adversary would choose to crash all processes except q on a cut c on which -IKq “(B)” holds, in contradiction to Lemma 5.1. D

Definition

7 An

supports

(OWMS(0)),

Open

joins, maintains

Weak and

Membership makes

the following

zero

obtains

a finite

number

an initial

p

and

pro4.3.

2. Crashed processes are eventually removed the local view of each active process.

from

non-crashed 3. No LocalViewP.

from

that

in some run p, the

occurs at p concurrently

of a process q, adds(q),

to the ad-

such that

q is un-

to Obliged.

Let p’ be an identical

(modulo

run except

that

in p’ the

process s crashes immediately before the event add, (q). In p’, q @ Obliged(@). We claim that p and p’ are indistinguishable therefore, in one of them, reflect

removed

to the contrary

process q belongs

failures).

is

process

known to p. Immediately after add.(q) let the adversary crash s. Note that, in the run p, the

p and every process in So (that has crashed). In consequence, the initial local

process

every

has not crashed,

2. The proof of eventual removal of crashed cesses is the same as in the proof of Theorem

event inii(~)

LocalViewP,

views of the proceasea of So are identical

that

has

4. Assume

containing

not

for p

Service

properties:

view

exists

3. If a process q is in LocalViewP at some point, /3 initiated later by p, q E then in any action Obliged(p). From Lemma 5.2, if q is removed from LocalViewP, q must have crashed.

mistakes

of events after p starts,

membership

view

in So that

dition 1. Within

an initial

as p obtains permission for an action (say @. By Lemma 5.2, this view contains every

Uniformity, when p obtains permission for an action ~, every process in Obliged(e)\ acks-rcvdP(8) has

that

then

as soon

Lemma

281

Obliged(p) 5.2.

u

correctly,

to the process p, and acks-rcvdP (0) cannot in contradiction

to

1 II endIeaa 2

loop:

select

/“of

~ fairly

out

3

1. @ is a new action;

4

2. ~ ia removed /*

5

Initiate

der.

can occupy

Whenever

Uniformity

out from

pencfing-bvfleq

in the past

add q to Loca lVlewP (unless LVP +

7

aend(announce(~)

q was removed);

8

wait

) to all processes

for ack(~) in LVP

from

each process

n LocalViewp;

performed

by all active

eventually

~ will

11

send(ack(~))

12

if ~ ia received

13

store

Figure

2:

OWMS(0)

f?)) from

q:

A

D-Uniformity

protocol

using

an

by OWMS(0). 7.

Uniformity

This follows

If p initiates

The reason is that

7, then

by property

is ensured

by item

lems.

to remove.

P

global

~

D-Terminated-

safety condition:

v

ODISABLEDP

O(~PDIDq(@)

V&DISAEILED,)

It seems most appropriate

membership

services

with

ware built ship

D-Uniformity

in the higher

over the membership

services

identical,

to define a family

varied

erties, which are then reflected

any

2 of the definition

knowing

an additional

Whereas D-Uniformity can sometimes be solved when Consensus cannot, D-Sequential-Uniformity and D-Terminated-Uniformity are equivalent to Consensus. This raises the question of exactl y how a membership service relates to the various D-Uniformity prob-

process addition haa either preceded init(/3), or will causally follow it. In the former case, the process is incIuded in LVP, and in the latter case, it is precluded Obliged(/.?). Thus, by line 9, every process in from Obliged(o) that has not crashed knows about ~, which guarantees uniformity. Liveness

eventually

xObliged(@)

LVP = Obliged(@). 4 of OWMS(0),

from

Knowing

case in any solution

requires

candidates

~

from item 4 of Def-

in Figure 2 uses an OWMS(0) and has a l-Threshold.

@ at line

.

members.

(the

adds the following

DIDP(@)

Theorem 5.4 An O WillS(O) is sujjlcient for building a l-Threshold solution for D- Uniformity in the face of process joins.

Proofi The protocol to solve D-Uniformity,

which

are reasonable

joins.

We note that, as every correct process initiates infinitely many actions, process additions are done uniformly inition

to D-

state detection. Definitive knowledge of termination is a pragmatic concern and arises whenever information about actions is kept in a buffer. Eventually the buffer must be cleaned, and terminated actions

time

@ in pending-buffe~

and supporting

obliged

differs

has terminated,

to q ; for the first

(fl) , doe$q(~))

terminate

to D-Uniformity) receive(announce(

condition

Even if p performs an action /3, it can take arbithat is, to be trarily long for the action to terminate;

in LVP;

cloesp(,t?);

1011upon

safety

or-

D-Sequential-

doesP(~), doesP(7)) A D1D9(7) *

BEFORE(dOeSq

LocalViewp

6

~, 7 conflict,

Uniformity: BEFORE(

add(g)

the ith space in the delivery

actions

adds the following

~ *\

for each event

9

of them

process p “/

of the following:

layer.

used in the systems

but all provide

of

proplevel soft-

The member-

cited

some combination

here are not of Unifor-

mity, Sequence, and Termination properties. IVote that when a primary partition membership service is actually implemented (e.g., in Isis), then, lacking access to a sufficiently accurate failuredetector, the service might sometimes block, meaning

of

that

it is impossible

OWMS(0) and by line 8: p will never wait indefinitely to get the permission to perform /3. ~

tially

6

ure scenarios from making

violating

to make progress

the safety

conditions

without

poten-

of D-Uniformity.

Since the upper layers then block as well, there are failSequence

and

Termination

D-Uniformity characterizes a problem in which the actions taken do not in any way conflict with each other; this assumption of no contention is not always reasonable. cast services

For example, in totally-ordered multithe delivery of a message conflicts with

the delivery

of all other

messages

in that

only

7

in which these systems progress.

can be prevented

Conclusions D-Uniformity

is suitable

for

modeling

the

lowest

layer in the distributed systems we study. This layer implements a reliable-multicast service, which ensures

one

282

uniform

delivery

of messages multicast

and is used both to maintain within

a primary

partition

cat ions structured

within

of the system,

as process groups.

a group,

[7] K.

shots:

and in appli-

tems.

given

degree of resiliency.

a membership

number event,

layer

of mistakes, the reliable

failures. dynamic

Most

that

layer

have implications

t ation

of

as the

protocols

the

uniform

Transis

reliable in

multi

[15]

system.

for cast

the

and

the

[10]

N–t

Moreover,

[11]

results

[12]

[13]

apply

Acknowledgments

[15]

M.

Syst.,

[16]

Pease,

Replicated

Databases.

Symp.

on Prin.

March

1986.

[2] Y. Amir, In

U’nd

D. Dolev,

[3] K.

S. Kramer,

Computing,

IEEE

Toolkit,

Press,

Computing

for High July

Transis:

[18]

Model.

to appear.

the Isis

D. Chandra,

Weakest proc.

1 Ith

Distributed

V.

Failure

Toolkit.

Reliable IEEE

Hadzilacos,

Detector

annual

ACM

Computing,

[6] T. D. Chandra

Distributed

Press, 1994. to

and

S. Toueg.

for Solving Symposium pages

and S. Toueg.

tectors for Asynchronous Symposium nua! ACM Computing,

Shostak,

on

pages 325-340,

In

on Principles

of

147-158,

1992.

Unreliable

Systems. Principles

The

Consensus.

Failure

In prac. of

10th

Dean-

Distributed

1991.

283

Syst.,

availability:

Atom-

ACM

Trans.

1987.

and the Ordering of Events Comm. A CM, 21(7):558-

and

1980.

Systems.

The

Science,

L.

Group

Cornell

Lamport.

Reaching

Journal of A CM,

of Faults.

Membership

PhD

thesis,

University,

Problem dept.

in

of Com-

November

1992.

92-1313).

A. Schiper

and A. Sandoz.

Uniform

Synchronous Intl.

May

2(2):145-154,

generals

in action:

ACM

Trans.

K. P. Birman, Reliable

R. Cooper,

Multicast

An Efficient Data Man-

symp. on prin. March 1985. B. Glade,

between

In Proceedings oj the USENIX April

ImComp.

1984.

D. Skeen, F. Christian, and A. El Abbadi. Fault-Tolerant Algorithm for Replicated

R. van Renesse,

IEEE

Computing

1993.

processors. May

Multicast In

on Distributed

Byzantine

fail-stop

Reliable

Environment.

Conf.

pages 561-568,

F. B. Schneider.

27-28,

appear. [5] T.

R.

Micro-Kernels with

method

Comp.

data.

August

in the Presence

A. Ricciardi. Asynchronous

kernels.

1992.

Synchrony

versus

replicated

5(3):249-274,

and P. Stephenson.

on Fault-

Computing

Trans.

agement. In ACM SIGA CT- SIGMOD of Database Systems, pages 215–229,

Availability.

Symposium

and R. van Renesse. with

CT-SIGMOD

and D. Malki.

Virtual

[17]

pages 240-251,

Distributed

chapter

1994.

[4] K. P. Birman

SIGA

pages 76-84,

Reliable

in Partitioned

Systems,

International

P. Birman.

the Isis

ACM

SubSystem

Annual

Tolerant

In

of Database

A Communication

for

27(2):228-234,

Syst., Availabtity

J.

1978.

plementing and S. Toueg.

Process.

1986.

L. Lamport. Time, Clocks, in a Distributed System.

M.

Impossibility

replication

ACM

Concurrency

mechanisms

learn.

1985.

types.

Proc. oj the I%h

References El Abbadi

April

February

Herlihy.

processes

One Faulty

data

in a Virtually

for their

with

A quorum-consensus

Systems,

[1] A.

syz-

February

1986.

and M. Paterson.

Consensus

Agreement

[14]

How

1(1):40-52,

N. Lynch,

abstract

(TR

Dolev

snap

of dktributed 3(1):63-75,

J. Mizra.

M. Herlihy.

puter

and Danny

and

Computing,

565, July

of

in this paper.

We thank Fred Schneider many helpful discussions.

Syd.,

for

Comp.

to the reliable multicast protocols employed within the “strong” membership services of Isis and Transis, which go well beyond the weak membership services employed

Distributed

states

Comp.

32:374-382,

icity

such

safe protocols

these

Chandy

4(1):32-53,

implemen-

protocols,

M.

ACM,

achieve l-Threshold resiliency, this membership layer needs to maintain order between process joins and regular actions. results

L. Lamport. global

Trans.

of Distributed

In addition, a membership layer can support process additions. We show that in order to

Our

ACM

[9] M. Fischer,

at each

can tolerate

and

determining

Distributed

an infinite

but at most t– 1 mistakes multicast

[8] K.

interestingly,

can make

Chandy

1985.

Our results show

that a reliable multicast can be implemented within any majority partition of the system, or, given access to a membership service, can be implemented with an even higher

M.

information

membership

Micro-

workshop on and Other Kernel Architectures, pages

1992.