Uniform Actions in Asynchronous Distributed Systems
(Extended Abstract)

Dalia Malki*   Ken Birman†   Aleta Ricciardi‡   André Schiper§

*Institute of Computer Science, The Hebrew University of Jerusalem, Israel.
†Computer Sci. Dept., Cornell University, Ithaca, NY, USA. Supported by ONR grant number N00014-92-J-1866.
‡Electrical & Computer Eng., U. Texas at Austin, USA.
§Ecole Polytechnique Fédérale, Lausanne, Switzerland. Supported by the "Fonds national suisse" and OFES contract number 21-32210.91, as part of the ESPRIT Basic Research Project Number 6360 (BROADCAST).

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. PODC 94-8/94 Los Angeles CA USA © 1994 ACM 0-89791-654-9/94/0008 $3.50

Abstract

We develop necessary conditions for the development of asynchronous distributed software that will perform uniform actions (events that if performed by any process, must be performed at all processes). The paper focuses on dynamic uniformity, which differs from the classical problems in that processes continually leave and join the ongoing computation. It relates the problem of asynchronous uniform actions to asynchronous Consensus, and shows that Consensus is a harder problem. We provide a rigorous characterization of the framework upon which several existing distributed programming environments are based. And, our work shows that progress is sometimes possible in a primary-partition model even when consensus is not.

1 Introduction

Our work on fault-tolerant distributed systems (Isis [4], Transis [2], and Horus [18]) gives rise to a desire to understand the theoretical foundations upon which such systems depend. These systems all use the virtual synchrony programming model [3], so it may not be surprising that they employ similar low-level mechanisms. In fact, we are not aware of any distributed system that uses a different low-level architecture and also provides the same environment: dynamic addition and removal of processes, and non-trivial consistency properties. This observation led us to speculate that asynchronous distributed systems might be subject to conditions that dictate the lowest levels of the system architecture. If one can identify necessary conditions for the solution of some basic problems, these conditions will also define the software architecture for a wide range of real-world systems.

We believe that Dynamic Uniformity (D-Uniformity) is the fundamental property required for many applications in asynchronous settings. In D-Uniformity, there are certain actions that, if taken by any process in the system, must eventually be taken by every other active process; only failure and forcible removal (forced inactivity) excuse a process from taking the action.¹ Even actions taken by a process that later crashes or is forcibly removed must be propagated to the remaining active processes.

¹Systems like Isis, Transis and Horus are unable to detect failures accurately, so unresponsive processes are excluded from participation in the system, as if they had crashed. If communication to such a process is reestablished, it rejoins the system under a new process identifier.

The motivation for D-Uniformity is that in a large distributed system, processes will often act on behalf of the system as a whole. While actions may occur locally, they must eventually be known to the entire membership. D-Uniformity captures the required propagation of such actions, as well as the obligation of other processes to take these actions, while avoiding notions such as 'correct/incorrect' processes.

D-Uniformity is trivial to solve if communication channels are reliable and the membership of the system is known to all processes. When channels are lossy, a process initiating an action must know that its cohorts are aware of the action it is initiating. Our first result shows that D-Uniformity is achievable provided that a majority of the processes do not crash.

We then explore the impact of adding a membership service to our asynchronous environment. We assume that a membership service reports failures, and possibly joins, to each process.² We show that, given a system of size $N$ and a membership service that can make an infinite number of mistakes (i.e., reporting a functional process as failed), but only $t-1$ mistakes at each process event, D-Uniformity can be achieved with up to $N-t$ failures.

²While similar in role to a failure detector [6], a membership service reports several types of membership events (e.g., joins, and leaves) in addition to suspected failures.

We then show that a membership service can also support dynamic process-joins. We show that in order to achieve resiliency with any single live member (1-Threshold), the membership service must ensure that process-joins are ordered with respect to regular actions. This property is indeed supported in practice by the virtual synchrony programming model, which is provided by the distributed systems we study.

In much prior work, Distributed Consensus is considered the basic problem in achieving distributed coordination [13]. For example, Chandra and Toueg show that Consensus is equivalent to both fault detection and to atomic broadcast [6]. It is well known that Consensus has no solution in asynchronous environments in which even a single process may crash [9]. As a result, Consensus per se is not the basic problem underlying all dynamic coordination in asynchronous environments. In particular, studying only Consensus ignores problems in which coordinating the activity of only a majority of the processes suffices. One of our goals is to better understand the relationship between progress in a primary partition and the classical Consensus problem.

Finally, to model the higher layers of abstraction found in the systems we study, we define two subclasses of Uniformity, for future research. The first consists of actions that must be done both uniformly and in the same sequence. This sequence problem has been studied extensively in the context of distributed database systems [1, 10, 11, 17]. The second subclass applies to uniform actions that must also eventually be known to have terminated (i.e., taken by all processes that have not failed). This subclass is important in systems that perform some sort of clean-up action upon detecting termination (e.g., re-using entries in a table, or discarding saved copies of messages related to the action).

The systems we have developed (Isis, Transis, and Horus) can be understood as practical tools for programming with dynamic uniformity. These systems can be viewed as having two layers. The upper layer assumes a failstop model [16] with notifications, upon which abstractions such as group communications and message delivery ordering are implemented. The lower layer implements a membership service that looks like a failstop failure detector to the upper layer, using agreement protocols to react to process join, completion, and 'suspected crash' events. The lower layer must track membership within a primary partition, an instance of D-Uniformity. By formalizing D-Uniformity, this paper demonstrates that the virtual synchrony programming model, shared by these three systems, rests on a sound foundation, and explains how virtual synchrony relates to the one within which asynchronous consensus has been studied.

2 System Model

The system consists of a finite set $S$ of processes. Processes communicate with each other by passing messages. The system is asynchronous in that there is no common global clock, and messages may be arbitrarily long in transit. A process $p$ fails by crashing, which we model by the distinct event $crash_p$, but always follows its assigned protocol.³ Messages between processes may be lost, but to compensate for this, processes resend messages until acknowledged. Specifically, if processes $p$ and $q$ both remain operational, each message from $p$ will eventually reach $q$; if $p$ crashes, messages in transit may never be delivered. There are no permanent partitions.⁴ Finally, we assume the network does not corrupt messages. Let $\mathcal{A}$ refer to this asynchronous environment.

³The $crash_p$ event is not performed by $p$ itself, nor is it visible to other machines within the system; it is introduced here only to simplify the discussion.
⁴The results extend to the general omission failure model with only minor changes, provided the communication channels are FIFO.

Since messages may be delayed arbitrarily long, no process can determine whether an unresponsive process has crashed, or whether it only appears to have crashed.

It is natural to model processes as I/O state automata: A process has a local state and a transition function. We model the change in a process' local state with an event in the execution of the process. A history for process $p$, $h_p$, is a sequence of events beginning with the unique event $start_p$:

$$h_p = start_p \cdot e_p^1 \cdots e_p^k, \quad k \ge 0.$$

A cut is a tuple of finite process histories, one for each $p \in S$. The initial system cut, $c_0$, consists of all the $start_p$ events. We assume familiarity with inter-event causality and the "happens-before" relation ($e \rightarrow e'$) [12], and with consistent cuts [7]. A system run is an $n$-tuple of infinite process histories, one for each process in $S$.⁵ A cut, then, is a finite prefix of a run.
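The definitions of histories, cuts and runs can be made concrete with a small sketch. It is only an illustration under stated assumptions: histories are finite Python lists (the paper's runs are infinite), events are hypothetical `(kind, label)` pairs, and consistency is approximated by the usual requirement that every message received inside the cut was also sent inside it. The names `is_prefix_cut` and `is_consistent` are not from the paper.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str]      # (kind, label), e.g. ("send", "m1"); a hypothetical encoding
History = List[Event]        # finite stand-in for a process history


def is_prefix_cut(cut: Dict[str, History], run: Dict[str, History]) -> bool:
    """A cut has one history per process, each a finite prefix of that process's run history."""
    return cut.keys() == run.keys() and all(
        run[p][: len(h)] == h for p, h in cut.items()
    )


def is_consistent(cut: Dict[str, History]) -> bool:
    """Approximate consistency: no message is received in the cut without being sent in it."""
    sent = {label for h in cut.values() for kind, label in h if kind == "send"}
    received = {label for h in cut.values() for kind, label in h if kind == "recv"}
    return received <= sent


run = {
    "p": [("start", "p"), ("send", "m1")],
    "q": [("start", "q"), ("recv", "m1")],
}
initial_cut = {"p": [("start", "p")], "q": [("start", "q")]}
bad_cut = {"p": [("start", "p")], "q": [("start", "q"), ("recv", "m1")]}
```

Here `initial_cut` plays the role of $c_0$, while `bad_cut` is a prefix of the run that records a receive without its send and is therefore not consistent.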
⁵Histories of crashed processes can be made infinite by appending infinitely many crash states.

Actions

We distinguish a special set of events, called actions. While the specific actions can be described, [...]

The Language

[...] are propositional, and all formulas are evaluated along consistent cuts. When formula $\varphi$ holds on cut $c$, we write $c \models \varphi$. The modalities have the following semantics: $\Box\varphi$ [...]

The basic propositional formulas of the D-Uniformity problem are:

● $\mathrm{DID}_p(\beta)$ holds on $c$ if and only if $does_p(\beta)$ is an event in $p$'s history component of $c$,
● $\mathrm{OWNS}_p(\beta)$ holds if and only if $p$ is the owner of $\beta$,
● $K_p \text{``}(\beta)\text{''}$ holds exactly when $p$ knows 'about' the action $\beta$. Obviously, $\mathrm{OWNS}_p(\beta) \Rightarrow K_p \text{``}(\beta)\text{''}$, but $K_p \text{``}(\beta)\text{''} \wedge \neg\mathrm{OWNS}_p(\beta)$ holds only after $p$ received a message naming $\beta$,
● $\mathrm{CRASH}_p$ holds on $c$ if and only if $crash_p$ is an event in $p$'s history component of $c$.

Knowledge of an action forces a process to try to execute it:

$$K_p \text{``}(\beta)\text{''} \Rightarrow \Diamond\,\mathrm{DID}_p(\beta) \vee \mathrm{DISABLED}_p.$$

3 Dynamic Uniformity

The basic condition for D-Uniformity (we discuss this below in more detail):

$$p \in S \wedge \mathrm{DID}_p(\beta) \Rightarrow \bigwedge_{q \in S} \Diamond\,(\mathrm{DID}_q(\beta) \vee \mathrm{DISABLED}_q).$$

To define $\mathrm{DISABLED}_q$ precisely, we introduce the notion of permission to execute an action. The need for permission arises from the network assumptions, which force a process intending to execute an action [...]

$\mathrm{PERMIT}_p(S, \beta)$ holds when $p$ has permission within $S$ to execute $\beta$. While we do not specify what constitutes permission (e.g., whether it is granted only by a designated process, or by a quorum subset), we do require it to satisfy:

1. Actions are executed only with permission: $\mathrm{DID}_p(\beta) \Rightarrow \mathrm{PERMIT}_p(S, \beta)$.
2. Initially no actions are permitted: $c_0 \models \forall\beta\,\bigl(\bigwedge_p \neg\mathrm{PERMIT}_p(S, \beta)\bigr)$.
3. Permitted actions are eventually executed if a process does not crash: $\mathrm{PERMIT}_p(S, \beta) \Rightarrow \Diamond\,(\mathrm{DID}_p(\beta) \vee \mathrm{CRASH}_p)$.

Now, $\mathrm{DISABLED}_q$ holds exactly when $q$ will never get permission to execute actions for which it does not already have permission.
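As a rough illustration of the basic D-Uniformity condition, the sketch below checks it on a finished, finite execution. This is an assumption-laden simplification: the "eventually" modality is collapsed into "by the end of the recorded execution", and the function name `uniformity_violations` and the dictionary encoding are hypothetical, not part of the paper.

```python
from typing import Dict, Set, Tuple


def uniformity_violations(
    did: Dict[str, Set[str]], disabled: Set[str]
) -> Set[Tuple[str, str]]:
    """Return (action, process) pairs where some process performed the action
    but the given process neither performed it nor is disabled."""
    all_actions: Set[str] = set().union(*did.values())
    violations = set()
    for action in all_actions:
        for q, done in did.items():
            if action not in done and q not in disabled:
                violations.add((action, q))
    return violations


did = {"p": {"a"}, "q": {"a"}, "r": set()}
```

With `r` marked as disabled the recorded execution is uniform; with no disabled processes, the pair `("a", "r")` is reported.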
Lemma 3.1 Let $S$ be a system of $N$ processes, such that each process owns an infinite number of actions. If fewer than $\lceil N/2 \rceil$ failures occur throughout the execution, then D-Uniformity is solvable.

Proof: The protocol in Figure 1 solves D-Uniformity (the symbol $\parallel$ denotes concurrent threads). There, "fairly" means no action, owned or buffered, is never selected. The protocol is live because each non-faulty process succeeds in initiating an action within finite time. It is safe because for each performed action $\beta$, a message announcing $\beta$ reaches at least one process that never crashes, guaranteeing that the action will eventually propagate to every other live process. ∎

[...] $\lceil N/2 \rceil$. Let $S_1$ and $S_2$ partition $S$ with $|S_1| = r$. Let $\rho$ be a run of $\mathcal{U}$ in which all processes in $S_1$ crash at some cut $c$. Because $\rho$ is live, there must be a cut $c'$, also in $\rho$ and after $c$, as well as an action $\beta$, owned by a member of $S_2$ and unknown to $S_1$, such that $c' \models \mathrm{DID}_p(\beta)$, for $p \in S_2$. Now let $\rho'$ be another run of $\mathcal{U}$ with the same prefix $c$, except that in $\rho'$ all messages to and from $S_2$ are delayed after $c$. We claim that $c'$ from $\rho$ above is also a cut in $\rho'$. To see this, observe that the two runs are indistinguishable to the processes in $S_2$, and that these processes will therefore take the same actions within both runs up to cut $c'$. Specifically, action $\beta$ is also performed in $\rho'$ with no communication involving any process in $S_1$. Now, at $c'$ in $\rho'$, crash all processes in $S_2$. By assumption, $|S_2| < r$, so in this execution fewer than $r$ processes have crashed by cut $c'$. Thus, $\rho'$ is still live in the sense that there is some $q \in S_1$ that continues performing an infinite number of its actions. However, in our model, no message informing $q$ about $\beta$ need ever reach $q$, precluding $q$ from ever performing $\beta$. This violates safety, and so $\mathcal{U}$ cannot be a solution to D-Uniformity. ∎

⁶It is well known that, unless the system is prone to byzantine failures, multi-valued consensus can be implemented given binary-consensus.
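The counting argument behind the majority results can be checked exhaustively for small systems. The sketch below assumes, as the later discussion of Figure 1 suggests, that a process waits for a majority of replies before acting; the function names are hypothetical. It shows two things: when fewer than $\lceil N/2 \rceil$ processes crash, every majority of acknowledgers contains a survivor, and once $\lceil N/2 \rceil$ processes have crashed, a majority of replies can no longer be gathered.

```python
from itertools import combinations
from math import ceil


def majority_always_survives(n: int) -> bool:
    """True if no set of fewer than ceil(n/2) crashes can wipe out a whole majority."""
    procs = range(n)
    majority = n // 2 + 1
    max_crashes = ceil(n / 2) - 1          # "fewer than ceil(N/2)" failures
    for ackers in combinations(procs, majority):
        for k in range(max_crashes + 1):
            for crashed in combinations(procs, k):
                if set(ackers) <= set(crashed):
                    return False
    return True


def majority_reachable(n: int, crashes: int) -> bool:
    """True if the surviving processes can still form a majority of all n."""
    return n - crashes >= n // 2 + 1
```

For example, with five processes two crashes leave a reachable majority, while three do not.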
4 Increasing Resiliency with a Membership Service

Section 3 showed that D-Uniformity is solvable whenever a majority of the system is operational. In this section, we extend the asynchronous environment $\mathcal{A}$ to increase the resiliency of solutions to D-Uniformity. Specifically we augment $\mathcal{A}$ with a membership service. A membership service reports to every process $p$, at every event $e$ in $p$'s history, a local view $\mathrm{LocalView}_p(e)$, which is a list of process identifiers. We simply use $\mathrm{LocalView}_p$ when the event is clear.

4.1 Using a Membership Service

To give some intuition on how to use a membership service, recall the protocol of Figure 1. We might modify that protocol so process $p$ awaits acknowledgments (in line 7) only from processes in $\mathrm{LocalView}_p$.

However, a naive membership service could lead to undesirable results. For example, a membership service based on timeout would remove from $\mathrm{LocalView}_p$ any process that does not respond to $p$'s messages within some time bound. Without further constraints, this membership service is unsafe for the purposes of D-Uniformity, even in failure-free runs. Another example membership service might maintain agreement about suspected failures (e.g., the service described in [14]). The advantage of such a service is that the protocol in Figure 1 can make safe progress even when $\lceil N/2 \rceil$ or more of the processes have failed. Informally, this is done by successively reconfiguring the system into decreasing majority sets. However, this approach requires agreement among a majority of the processes, and an inopportune combination of mistaken failure suspicions and true crashes will cause the system to block.⁷

⁷For example, in a system of five processes, the false removal of two of them may be fatal, for the real failure of two of the remaining three processes will block the system indefinitely.

4.2 The Weakest Membership Service for Solving D-Uniformity

Our primary interest is to define the weakest membership service that will permit solutions to D-Uniformity in the presence of $\lceil N/2 \rceil$ failures or more. We use the definition of [5] to compare membership services.

Definition 1 (Chandra & Toueg) Let $MS$ and $MS'$ be membership services. $MS$ is weaker than $MS'$ if there is a protocol that, using $MS'$, can implement $MS$.

In proving the requisite properties of a weakest membership service for D-Uniformity, we make no assumptions on the values it provides, or on how it is used by the protocol.

Let $\mathcal{U}(MS)$ ($\mathcal{U}$ operating in the presence of membership service $MS$) be a solution to D-Uniformity that is resilient to $f$ failures, for $f \ge \lceil N/2 \rceil$. Let $t = N - f$ be the number of processes that do not crash. Lemma 4.1 shows that there is a cut that cannot occur in any execution of $\mathcal{U}(MS)$, while Lemma 4.2 shows that waiting for 'enough' acknowledgments (direct or indirect) before executing any action is the only way to prevent such a cut.

Lemma 4.1 Let $\mathcal{U}(MS)$ be a solution to D-Uniformity that is resilient to $f$ failures. Let $t = N - f$, and let $\{p, q_1, \ldots, q_t\} \subseteq S$. Then in every execution of $\mathcal{U}(MS)$ there is no consistent cut $c$ such that

$$c \models \mathrm{DID}_p(\beta) \wedge \bigwedge_{i=1}^{t} \bigl(\neg\mathrm{CRASH}_{q_i} \wedge \neg K_{q_i} \text{``}(\beta)\text{''}\bigr).$$

Proof: Assume to the contrary that $c$ can occur in some execution of $\mathcal{U}(MS)$. At $c$, suppose an adversary crashes all processes except $q_1, \ldots, q_t$. By assumption, the system remains live, and so one of the $q_i$ continues performing actions. In our model, this $q_i$ may never learn about $\beta$, violating safety. ∎

Definition 2 Let $p$ perform an action $\beta$. Let

$$\mathrm{acks\text{-}rcvd}_p(\beta) \stackrel{\mathrm{def}}{=} \{\, q \mid K_p K_q \text{``}(\beta)\text{''} \,\};$$

that is, $\mathrm{acks\text{-}rcvd}_p(\beta)$ is the set of processes from which $p$ received direct or indirect acknowledgment regarding $\beta$ up to the event $does_p(\beta)$.

Lemma 4.2 If $\mathcal{U}(MS)$ is an $f$-resilient solution to D-Uniformity, and $does_p(\beta)$ occurs in any run, then at least one process in $\mathrm{acks\text{-}rcvd}_p(\beta)$ never crashes.

Proof: Suppose to the contrary that by some cut $c$, every process (except $p$) in $\mathrm{acks\text{-}rcvd}_p(\beta)$ has crashed. Since at least $t$ processes never crash, there is a set of processes $\{q_1, \ldots, q_t\} \subseteq (S \setminus \mathrm{acks\text{-}rcvd}_p(\beta))$, none of which crashes. Let $c'$ be a cut that is $p$-equivalent to $c$, and satisfies $c' \models \bigwedge_{i} \neg K_{q_i} \text{``}(\beta)\text{''}$. Now define $\hat{c}$ to be identical to $c'$ except that $p$'s history component in $\hat{c}$ terminates with the event $crash_p$. This cut violates Lemma 4.1. ∎
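The cut ruled out by Lemma 4.1 can be phrased as a simple predicate over a snapshot of the system. The following is a minimal sketch under stated assumptions: a snapshot is summarized by three hypothetical sets (who performed the action, who knows about it, who has crashed), and `forbidden_cut` is an illustrative name rather than anything defined in the paper.

```python
from typing import Set


def forbidden_cut(
    processes: Set[str],
    did: Set[str],
    knows: Set[str],
    crashed: Set[str],
    t: int,
) -> bool:
    """True if some process has performed the action while at least t
    non-crashed processes still do not know about it."""
    if not did:
        return False
    unaware_alive = {q for q in processes if q not in knows and q not in crashed}
    return len(unaware_alive) >= t


procs = {"a", "b", "c", "d", "e"}   # N = 5, f = 3, so t = N - f = 2
```

With `t = 2`, an action performed after only three of the five processes know about it matches the forbidden pattern; once a fourth process knows, it does not.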
As in [6] a membership service makes a mistake if and only if it removes a non-faulty process from the local view of some process.

Definition 3 A Weak Membership Service that makes $m$ mistakes (WMS($m$)) provides each process $p$ at each event $e$, with a local view $\mathrm{LocalView}_p(e)$ such that:

1. The initial view of all processes is identical: for every $p, q \in S$, $\mathrm{LocalView}_p(start_p) = \mathrm{LocalView}_q(start_q)$.
2. Crashed processes are eventually removed from the local view of each active process that remains active: $\mathrm{CRASH}_q \Rightarrow \Diamond\,(q \notin \mathrm{LocalView}_p \vee \mathrm{DISABLED}_p)$.
3. At most $m$ non-crashed processes are mistakenly removed from $\mathrm{LocalView}_p$. We further distinguish stable (i.e., permanent) and non-stable removal:
   Stable-removals: If removals are stable, then WMS($m$) does not make more than $m$ accumulated mistakes for each process. Thus, whenever $m+1$ processes are removed from $\mathrm{LocalView}_p$, at least one of these is, in fact, crashed.
   Non-Stable removals: If removed processes may be returned to local views, then at any event $e$ there are at most $m$ processes currently removed from $\mathrm{LocalView}_p(e)$ that are not crashed.

We now show that a WMS($(N-f)-1$) membership service (i.e., WMS($t-1$)) is the weakest for which $f$-resilient solutions to D-Uniformity exist. Necessity and sufficiency are shown in the next Theorems.

Theorem 4.3 WMS($t-1$) is weaker than any membership service extending the asynchronous environment $\mathcal{A}$ in which D-Uniformity can be solved with $f$ failures.

Proof: Consider the following membership service: $\mathrm{LocalView}_p(start_p) = S$; at each event $does_p(\beta)$, set $\mathrm{LocalView}_p$ to $\mathrm{acks\text{-}rcvd}_p(\beta)$, but otherwise, do not change $\mathrm{LocalView}_p$. We show this membership service belongs to the class of WMS($t-1$) membership services:

1. Agreement on an initial view stems directly from the definition of $\mathrm{LocalView}_p(start_p)$, for every $p \in S$.
2. The removal of a crashed process $q$ from the view of every active process stems from the fact that $q$ can have learned of only a finite number of actions before it crashed. If some $p$ performs an infinite number of actions, there must be a point in its execution after which $q$ was unresponsive to every announcement $p$ made. Thus, $q$ is not in $\mathrm{acks\text{-}rcvd}_p(\beta)$ and is then removed from $\mathrm{LocalView}_p(does_p(\beta))$.
3. By Lemma 4.2, if $t$ processes are removed from $\mathrm{LocalView}_p$, at least one of them must have crashed (at most $t-1$ mistakes can have been made).

We note that, since eventually all the processes in the system remove the crashed process from their view, these removals are uniform. ∎

Theorem 4.4 WMS($t-1$) is sufficient for solving D-Uniformity with up to $f = N-t$ failures.

Proof: Let $MS$ be a WMS($t-1$) membership service and, in Figure 1, substitute it for line 7 (waiting for a majority of replies). This solution to D-Uniformity is $f$-resilient. Liveness is maintained because a live process $p$ succeeds in communicating with every other live process within a finite time. Since $MS$ reports every failure within a finite time, $p$ does not wait for acknowledgments from crashed processes forever. Therefore, $p$ performs within finite time every action it initiates. The solution maintains safety because an action $\beta$, performed by process $p$, is acknowledged by at least one process that remains alive. Therefore, the information about $\beta$ will eventually propagate to all the live processes. ∎

If removals are stable, a WMS($t-1$) membership service can make $t-1$ mistakes per process, or $N \times (t-1)$ mistakes globally. In [6] it is shown that for $t > 1$, such a failure detector is not strong enough to solve consensus when $f \ge \lceil N/2 \rceil$.⁸ When removals are not permanent, WMS($t-1$) can make infinitely many mistakes both about and to any process. Obviously, this membership service also cannot help in solving consensus.

⁸In the terminology of [6], for $t > 1$, WMS($t-1$) may make as many mistakes as the strongly $k$-mistaken failure detector, for $k = N \times (t-1)$.
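The stable-removal bound of a WMS($m$) can be illustrated by counting accumulated mistakes along a sequence of local views. This is a sketch under simplifying assumptions: views are plain sets, the set of crashed processes is taken at the end of the observation (so a process removed shortly before it crashes is not counted as a mistake), and the function names are hypothetical.

```python
from typing import Iterable, Set


def accumulated_mistakes(
    initial_view: Set[str], views: Iterable[Set[str]], crashed: Set[str]
) -> Set[str]:
    """Processes that were removed from some view although they never crashed."""
    removed: Set[str] = set()
    for view in views:
        removed |= initial_view - view
    return removed - crashed


def within_wms_bound(
    initial_view: Set[str], views: Iterable[Set[str]], crashed: Set[str], m: int
) -> bool:
    """Stable removals: at most m accumulated mistakes for this process."""
    return len(accumulated_mistakes(initial_view, views, crashed)) <= m


S = {"a", "b", "c", "d", "e"}
views_of_a = [{"a", "b", "c"}, {"a", "b", "d"}]   # views after two does events
```

With `e` crashed, the two views above wrongly drop `c` and `d`, which fits a WMS(2) (that is, $t = 3$) but not a WMS(1).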
4.3 $N-1$-Resiliency

For the special case of $f = N-1$, D-Uniformity is equivalent to Consensus.

Lemma 4.5 The consensus problem is reducible to $N-1$-resilient D-Uniformity.

Proof: A solution to $N-1$-resilient D-Uniformity requires a WMS(0). By definition any WMS(0) is a "perfect failure suspector" of [6], and consensus can be solved using the perfect failure suspector. ∎

By definition a WMS(0) makes no mistakes, and crashed processes are eventually removed, so process removal with a WMS(0) is uniform.

5 Allowing Joins

We now consider processes that join a computation. In an asynchronous environment, the late joining of a process, say $q$, arises when the start event of $q$ was preceded by some communication with another process, say $p$. In this communication, presumably $p$ has granted $q$ the permission to join, via a special event denoted $add_p(q)$. In this case, we deem it appropriate to relieve $q$ from the obligation to perform actions that were initiated prior to $add_p(q)$. This approach is based on the following observations:

● From a practical point of view, it is not feasible to expect processes to maintain information about actions indefinitely. Typically, when a process learns that all the currently active processes have performed some action, it discards this action from its buffers (Section 6 discusses this in more detail). Thus, existing processes need not store the accumulated execution history, on the chance that other processes may join at a point far in the future.
● A state-transfer operation, in which a newly joined process accepts a snapshot of the system upon joining, is a common practice in fault-tolerant distributed applications. State-transfer means that joining processes need not actually execute all relevant events to reach the desired local state.

The obligation set of an action $\beta$ is the set of processes obliged to perform $\beta$. In order to define the obligation set we introduce a special system view, SystView, which increases as new processes get the permission to join.

Definition 4 Let $c_0$ be the initial system cut, and $S_0$ the special set of processes that have their start event in $c_0$. Let $add(q)$ be a special uniform action that gives $q$ the permission to join. The system view $\mathrm{SystView}(c)$ on a cut $c$ is recursively defined as follows: (1) SystView contains the set $S_0$ of processes that have their start events in the initial system cut $c_0$, and (2) SystView increases through add events from processes in SystView. Formally:

1. $\mathrm{SystView}(c_0) \stackrel{\mathrm{def}}{=} S_0$;
2. if $p \in \mathrm{SystView}(c)$ and the event $add_p(q)$ is in $c$, then $q \in \mathrm{SystView}(c)$.

Definition 5 Let $init(\beta)$ denote the initiation of action $\beta$ by the owner of $\beta$; that is, the event by which the owner of $\beta$ announces $\beta$. The obligation set of $\beta$ in run $\rho$ is:

$$\mathrm{Obliged}(\beta, \rho) \stackrel{\mathrm{def}}{=} \bigcup_{\{c \,\mid\, init(\beta) \notin c\}} \mathrm{SystView}(c).$$

We define now $\mathrm{Obliged}(\beta)$ as the set of processes that are in SystView before the initiation of action $\beta$ propagates in the system. Safety for D-Uniformity now becomes:

$$\mathrm{DID}_p(\beta) \Rightarrow \bigwedge_{q \in \mathrm{Obliged}(\beta)} \Diamond\,(\mathrm{DID}_q(\beta) \vee \mathrm{DISABLED}_q).$$

Clearly, a system that starts up with only a subset of $S$ cannot hope to tolerate arbitrary crashes of a pre-defined, fixed number of processes; resiliency of the system should also be redefined to incorporate system size at various points in the execution. To this end, we define the set $\mathrm{NotCrashed}(c)$ to be the subset of $\mathrm{SystView}(c)$ that have not crashed. We define resiliency at $c$ in terms of $\mathrm{NotCrashed}(c)$.

Definition 6 A solution $\mathcal{U}$ to D-Uniformity has a $t$-threshold if every cut $c$ of every run of $\mathcal{U}$ is live provided $|\mathrm{NotCrashed}(c)| \ge t$.
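The recursive definition of SystView and the derived obligation set can be sketched in a few lines. The sketch makes a strong simplifying assumption: the run is flattened into a single sequence of hypothetical `(kind, process, argument)` events, so that "cuts not containing $init(\beta)$" become "prefixes before $init(\beta)$". Because SystView only grows, the union over those prefixes equals the view of the longest one. The names `syst_view` and `obliged` are illustrative.

```python
from typing import List, Set, Tuple

Event = Tuple[str, str, str]   # ("add", p, q) or ("init", owner, beta); hypothetical encoding


def syst_view(s0: Set[str], events: List[Event]) -> Set[str]:
    """SystView of the prefix `events`: S0 plus every q added by a current member."""
    view = set(s0)
    for kind, p, x in events:
        if kind == "add" and p in view:
            view.add(x)
    return view


def obliged(s0: Set[str], events: List[Event], beta: str) -> Set[str]:
    """Processes in SystView on the prefixes that do not yet contain init(beta)."""
    for i, (kind, _owner, x) in enumerate(events):
        if kind == "init" and x == beta:
            return syst_view(s0, events[:i])
    return syst_view(s0, events)


s0 = {"p", "r"}
events = [("add", "p", "q"), ("init", "p", "b1"), ("add", "r", "s")]
```

In this example `q` joins before `b1` is initiated and is therefore obliged to perform it, while `s` joins afterwards and is not.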
5.1 One-Threshold (1-T) Solutions

For simplicity, we show the weakest membership service required for a 1-Threshold solution to D-Uniformity. The extension to the general case is done as in Section 4.

Lemma 5.1 Let $p, q \in S$, and let $\mathcal{U}(MS)$ be a 1-T solution to D-Uniformity. Then in no execution of $\mathcal{U}(MS)$ is there a consistent cut $c$ such that

$$c \models \mathrm{DID}_p(\beta) \wedge q \in \mathrm{Obliged}(\beta) \wedge \neg\mathrm{CRASH}_q \wedge \neg K_q \text{``}(\beta)\text{''}.$$

Proof: Assume to the contrary that $c$ satisfies the conditions above and occurs in some execution of $\mathcal{U}$. Immediately after $c$, suppose an adversary crashes all the processes in $\mathrm{SystView}(c)$ except $q$. This can be done if the protocol has a 1-Threshold, still leaving the system live. However, $q$ may never learn about $\beta$ (due to network assumptions); since $q \in \mathrm{Obliged}(\beta)$, this violates safety. ∎

Recall that $\mathrm{acks\text{-}rcvd}_p(\beta)$ is the set of processes from which $p$ received some acknowledgment about $\beta$ up to $does_p(\beta)$.

Lemma 5.2 In any 1-Threshold solution to D-Uniformity, when $p$ obtains permission for an action $\beta$, every process in $\mathrm{Obliged}(\beta) \setminus \mathrm{acks\text{-}rcvd}_p(\beta)$ has crashed; $\mathrm{acks\text{-}rcvd}_p(\beta)$ therefore includes every process in $\mathrm{Obliged}(\beta)$ that has not crashed.

Proof: The proof is similar to that of Lemma 4.2. Let $c$ be any cut on which $\mathrm{DID}_p(\beta)$ holds. Assume there exists a process $q$ that violates the Lemma. The adversary would choose to crash all processes except $q$ on a cut $c$ on which $\neg K_q \text{``}(\beta)\text{''}$ holds, in contradiction to Lemma 5.1. ∎

Definition 7 An Open Weak Membership Service that supports joins, and makes zero mistakes (OWMS(0)), maintains the following properties:

1. Within a finite number of events after $p$ starts, $p$ obtains an initial view, $\mathrm{LocalView}_p$, containing $p$ and every process in $S_0$ (that has not crashed). In consequence, the initial local views of the processes of $S_0$ are identical (modulo failures).
2. Crashed processes are eventually removed from the local view of each active process.
3. No non-crashed process is removed from $\mathrm{LocalView}_p$.
4. Changes to SystView are ordered with respect to regular actions: For each process $q$, and each event $init(\beta)$, either an event $add(q)$ (by some process) causally precedes $init(\beta)$, or $init(\beta)$ causally precedes $add_r(q)$ for all $r$.

Theorem 5.3 When process joins are permitted, an OWMS(0) is weaker than any membership service extending the asynchronous environment $\mathcal{A}$ in which 1-threshold D-Uniformity has solutions.

Proof: Consider the following, in which $\mathrm{LocalView}_p$ and changes to it are defined as follows:

Init: Let $\beta$ be the first action $p$ gets permission to execute, and set $p$'s initial local view to be $\mathrm{acks\text{-}rcvd}_p(\beta)$;
Add/Remove: At each event $does_p(\beta)$, set $\mathrm{LocalView}_p(does_p(\beta))$ to $\mathrm{acks\text{-}rcvd}_p(\beta)$, but otherwise don't change $\mathrm{LocalView}_p$.

We now show that this membership service satisfies the properties of an OWMS(0):

1. If $p$ is active, then as soon as $p$ obtains permission for an action (say $\beta$), an initial membership view exists for $p$. By Lemma 5.2, this view contains every process in $S_0$ that has not crashed.
2. The proof of eventual removal of crashed processes is the same as in the proof of Theorem 4.3.
3. If a process $q$ is in $\mathrm{LocalView}_p$ at some point, then in any action $\beta$ initiated later by $p$, $q \in \mathrm{Obliged}(\beta)$. From Lemma 5.2, if $q$ is removed from $\mathrm{LocalView}_p$, $q$ must have crashed.
4. Assume to the contrary that in some run $\rho$, the event $init(\beta)$ occurs at $p$ concurrently to the addition of a process $q$, $add_s(q)$, such that $q$ is unknown to $p$. Immediately after $add_s(q)$ let the adversary crash $s$. Note that, in the run $\rho$, the process $q$ belongs to Obliged. Let $\rho'$ be an identical run except that in $\rho'$ the process $s$ crashes immediately before the event $add_s(q)$. In $\rho'$, $q \notin \mathrm{Obliged}(\beta)$. We claim that $\rho$ and $\rho'$ are indistinguishable to the process $p$, and therefore, in one of them, $\mathrm{acks\text{-}rcvd}_p(\beta)$ cannot reflect $\mathrm{Obliged}(\beta)$ correctly, in contradiction to Lemma 5.2. ∎
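The ordering requirement on joins (property 4 of an OWMS(0)) can be stated as a small predicate over a happens-before relation. The sketch below is only illustrative: events are hypothetical string labels, and `happens_before` is assumed to be given as an already transitively closed set of ordered pairs. The function name `joins_ordered` is not from the paper.

```python
from typing import Iterable, Set, Tuple


def joins_ordered(
    adds_of_q: Iterable[str],
    init_event: str,
    happens_before: Set[Tuple[str, str]],
) -> bool:
    """Either some add(q) causally precedes init(beta),
    or init(beta) causally precedes every add(q)."""
    adds = list(adds_of_q)
    some_add_first = any((a, init_event) in happens_before for a in adds)
    init_first = all((init_event, a) in happens_before for a in adds)
    return some_add_first or init_first
```

A join that is concurrent with the initiation, as in the contradiction argument above, fails the check because neither ordering is present in the relation.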
        /* Protocol of process p */
 1  ∥ endless loop:
 2      select β fairly out of one of the following:
 3          1. β is a new action;
 4          2. β is removed out from pending-buffer_p;
 5      /* Initiate β */ for each event add(q) in the past, add q to LocalView_p (unless q was removed);
 6      LV_p ← LocalView_p;
 7      send(announce(β)) to all processes in LV_p;
 8      wait for ack(β) from each process in LV_p ∩ LocalView_p;
 9      does_p(β);
10  ∥ upon receive(announce(β)) from q:
11      send(ack(β)) to q;
12      if β is received for the first time
13          store β in pending-buffer_p;

Figure 2: A D-Uniformity protocol using an OWMS(0)
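To make the control flow of Figure 2 easier to follow, here is a deliberately simplified, single-threaded simulation. It is a sketch, not the authors' implementation: message delivery is instantaneous and reliable, joins are omitted, crashes are fixed before the simulation starts, and the OWMS(0) is modeled by simply removing crashed processes from the local view. The class `Process` and the functions `perform` and `run` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple


class Process:
    def __init__(self, pid: str, view: Set[str]):
        self.pid = pid
        self.local_view = set(view)    # kept accurate by the assumed OWMS(0)
        self.pending: List[str] = []   # plays the role of pending-buffer_p
        self.seen: Set[str] = set()
        self.done: List[str] = []      # actions for which does_p occurred

    def on_announce(self, beta: str) -> bool:
        """Lines 10-13: acknowledge, buffering beta on first receipt."""
        if beta not in self.seen:
            self.seen.add(beta)
            self.pending.append(beta)
        return True                    # the acknowledgment


def perform(p: Process, beta: str, procs: Dict[str, Process], crashed: Set[str]) -> None:
    """Lines 6-9 for one action."""
    p.seen.add(beta)
    lv = set(p.local_view)                                   # line 6
    acks = {q for q in lv
            if q not in crashed and procs[q].on_announce(beta)}  # line 7
    p.local_view -= crashed            # the membership service drops crashed processes
    assert lv & p.local_view <= acks   # line 8: every awaited ack has arrived
    p.done.append(beta)                # line 9


def run(procs: Dict[str, Process], crashed: Set[str],
        new_actions: List[Tuple[str, str]]) -> None:
    """Initiate the new actions, then drain all pending buffers."""
    for pid, beta in new_actions:
        perform(procs[pid], beta, procs, crashed)
    progress = True
    while progress:
        progress = False
        for pid, p in procs.items():
            if pid in crashed:
                continue
            while p.pending:
                beta = p.pending.pop(0)
                if beta not in p.done:
                    perform(p, beta, procs, crashed)
                    progress = True


members = {"a", "b", "c"}
procs = {pid: Process(pid, members) for pid in members}
run(procs, crashed={"c"}, new_actions=[("a", "x"), ("b", "y")])
```

After the run, both live processes have performed both actions, while the crashed process has performed none, which is the outcome the uniformity condition permits.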
Theorem 5.4 An OWMS(0) is sufficient for building a 1-Threshold solution for D-Uniformity in the face of process joins.

Proof: The protocol in Figure 2 uses an OWMS(0) to solve D-Uniformity, and has a 1-Threshold. Uniformity is ensured by item 4 of OWMS(0): if $p$ initiates $\beta$ at line 7, then $LV_p = \mathrm{Obliged}(\beta)$. The reason is that any process addition has either preceded $init(\beta)$, or will causally follow it. In the former case, the process is included in $LV_p$, and in the latter case, it is precluded from $\mathrm{Obliged}(\beta)$. Thus, by line 9, every process in $\mathrm{Obliged}(\beta)$ that has not crashed knows about $\beta$, which guarantees uniformity. Liveness is ensured by property 2 of the definition of OWMS(0) and by line 8: $p$ will never wait indefinitely to get the permission to perform $\beta$. ∎

We note that, as every correct process initiates infinitely many actions, process additions are done uniformly. This follows from item 4 of Definition 7.

6 Sequence and Termination

D-Uniformity characterizes a problem in which the actions taken do not in any way conflict with each other; this assumption of no contention is not always reasonable. For example, in totally-ordered multicast services the delivery of a message conflicts with the delivery of all other messages in that only one of them can occupy the $i$th space in the delivery order. D-Sequential-Uniformity adds the following safety condition to D-Uniformity: Whenever actions $\beta, \gamma$ conflict,

$$\mathrm{BEFORE}(does_p(\beta), does_p(\gamma)) \wedge \mathrm{DID}_q(\gamma) \Rightarrow \mathrm{BEFORE}(does_q(\beta), does_q(\gamma)).$$

Even if $p$ performs an action $\beta$, it can take arbitrarily long for the action to terminate; that is, to be performed by all active members. Knowing that $\beta$ has terminated differs from knowing that $\beta$ will eventually terminate (the case in any solution to D-Uniformity) and requires an additional global state detection. Definitive knowledge of termination is a pragmatic concern and arises whenever information about actions is kept in a buffer. Eventually the buffer must be cleaned, and terminated actions are reasonable candidates to remove. D-Terminated-Uniformity adds the following safety condition:

$$\mathrm{DID}_p(\beta) \Rightarrow \bigwedge_{q \in \mathrm{Obliged}(\beta)} \Diamond\,\bigl(K_p \mathrm{DID}_q(\beta) \vee K_p \mathrm{DISABLED}_q\bigr) \vee \Diamond\,\mathrm{DISABLED}_p.$$

Whereas D-Uniformity can sometimes be solved when Consensus cannot, D-Sequential-Uniformity and D-Terminated-Uniformity are equivalent to Consensus. This raises the question of exactly how a membership service relates to the various D-Uniformity problems. It seems most appropriate to define a family of membership services with varied properties, which are then reflected in the higher level software built over the membership layer. The membership services used in the systems cited here are not identical, but all provide some combination of Uniformity, Sequence, and Termination properties.

Note that when a primary partition membership service is actually implemented (e.g., in Isis), then, lacking access to a sufficiently accurate failure detector, the service might sometimes block, meaning that it is impossible to make progress without potentially violating the safety conditions of D-Uniformity. Since the upper layers then block as well, there are failure scenarios in which these systems can be prevented from making progress.
Conclusions D-Uniformity
is suitable
for
modeling
the
lowest
layer in the distributed systems we study. This layer implements a reliable-multicast service, which ensures
one
uniform delivery of messages multicast within a group, and is used both to maintain membership information within a primary partition of the system, and in applications structured as process groups.
Our results show that a reliable multicast can be implemented within any majority partition of the system, or, given access to a membership service, can be implemented with an even higher degree of resiliency. Most interestingly, given a membership layer that can make an infinite number of mistakes, but at most t−1 mistakes at each event, the reliable multicast layer can tolerate N−t failures.
In addition, a membership layer can support dynamic process additions. We show that in order to achieve l-Threshold resiliency, this membership layer needs to maintain order between process joins and regular actions.
Our results have implications for the implementation of protocols, such as the uniform reliable multicast protocols [15] and the safe protocols in the Transis system. Moreover, these results apply to the reliable multicast protocols employed within the "strong" membership services of Isis and Transis, which go well beyond the weak membership services employed in this paper.
Acknowledgments
We thank Fred Schneider and Danny Dolev for their many helpful discussions.
References
[1] A. El Abbadi and S. Toueg. Availability in Partitioned Replicated Databases. In ACM SIGACT-SIGMOD Symp. on Prin. of Database Systems, pages 240-251, March 1986.
[2] Y. Amir, D. Dolev, S. Kramer, and D. Malki. Transis: A Communication Sub-System for High Availability. In 22nd Annual International Symposium on Fault-Tolerant Computing, pages 76-84, July 1992.
[3] K. P. Birman. Virtual Synchrony Model. Chapter in Reliable Distributed Computing with the Isis Toolkit. IEEE Press, 1994. To appear.
[4] K. P. Birman and R. van Renesse. Reliable Distributed Computing with the Isis Toolkit. IEEE Press, 1994. To appear.
[5] T. D. Chandra, V. Hadzilacos, and S. Toueg. The Weakest Failure Detector for Solving Consensus. In Proc. 11th annual ACM Symposium on Principles of Distributed Computing, pages 147-158, 1992.
[6] T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. In Proc. of 10th annual ACM Symposium on Principles of Distributed Computing, pages 325-340, 1991.
[7] K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comp. Syst., 3(1):63-75, February 1985.
[8] K. M. Chandy and J. Misra. How processes learn. Distributed Computing, 1(1):40-52, 1986.
[9] M. Fischer, N. Lynch, and M. Paterson. Impossibility of Distributed Consensus with One Faulty Process. Journal of ACM, 32:374-382, April 1985.
[10] M. Herlihy. A quorum-consensus replication method for abstract data types. ACM Trans. Comp. Syst., 4(1):32-53, February 1986.
[11] M. Herlihy. Concurrency versus availability: Atomicity mechanisms for replicated data. ACM Trans. Comp. Syst., 5(3):249-274, August 1987.
[12] L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Comm. ACM, 21(7):558-565, July 1978.
[13] M. Pease, R. Shostak, and L. Lamport. Reaching Agreement in the Presence of Faults. Journal of ACM, 27(2):228-234, April 1980.
[14] A. Ricciardi. The Group Membership Problem in Asynchronous Systems. PhD thesis, Cornell University, dept. of Computer Science, November 1992. (TR 92-1313).
[15] A. Schiper and A. Sandoz. Uniform Reliable Multicast in a Virtually Synchronous Environment. In Proc. of the 13th Intl. Conf. on Distributed Computing Systems, pages 561-568. IEEE, May 1993.
[16] F. B. Schneider. Byzantine generals in action: Implementing fail-stop processors. ACM Trans. Comp. Syst., 2(2):145-154, May 1984.
[17] D. Skeen, F. Cristian, and A. El Abbadi. An Efficient Fault-Tolerant Algorithm for Replicated Data Management. In ACM SIGACT-SIGMOD Symp. on Prin. of Database Systems, pages 215-229, March 1985.
[18] R. van Renesse, K. P. Birman, R. Cooper, B. Glade, and P. Stephenson. Reliable Multicast between Micro-kernels. In Proceedings of the USENIX workshop on Micro-Kernels and Other Kernel Architectures, pages 27-28, April 1992.