WILLIAM. WEIHL. Massachusetts. Institute of Technology,. Cambridge, ... tory, 1 Kendall Square, Cambridge, MA 02139; N. Lynch, Massachusetts. Institute.
On the Correctness
MAURICE Digital
Co~oration,
Algorithms
Cambridge,
Massachusetts
LYNCH
Massachusetts
Institute
MICHAEL
MERRITT
AT&
Management
HERLIHY
Equipment
NANCY
of Orphan
T Bell Laboratones,
of Technology,
Murray Hill,
Cambridge,
Massachusetts
New Jersey
AND WILLIAM
WEIHL
Massachusetts
Institute
of Technology,
Cambridge,
Massachusetts
Abstract. In a distributed system, node failures, network delays, and other unpredictable occurcomputations—subcomputations that continue to run but whose rences can result in o~han results are no longer needed. Several algorithms have been proposed to prevent such computations from seeing inconsistent states of the shared data. In this paper, two such orphan management algorithms are analyzed. The first is an algorithm implemented in the Argus distributed-computing system at MIT, and the second is an algorithm proposed at CarnegieMellon. The algorithms are described formally, and complete proofs of their correctness are given. The proofs show that the fundamental concepts underlying the two algorithms are very similar in that each ran be regarded as an implementation of the same high-level algorithm. By exploiting M. Herlihy was sponsored by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 4976 (Amendment 20), under Contracts F33615-84-K-1 520 and F33615-87-C-1499, monitored by the Avionics Laboratory, Air Force Wright Aeronautical Laboratories, WrightPatterson AFB. N. Lynch was supported by the National Science Foundation under Gmnts DCR 83-02391 and CCR 86-11442, the Defense Advanced Research Projects Agency (DARPA) under Contract NOO014-83-K-0125, the Office of Naval Research under Contract NOO014-85-K-0168, and the Office of Army Research under Contract DAAG29-84-K-O058. W. Weihl was supported by an IBM Faculty Development Award, the National Science Foundation under grants DCR 85-100014 and CCR 87-16884, and the Defense Advanced Research Projects Agency (DARPA) under Contract NOO014-83-K-0125. Authors’ addresses: M. Herlihy, Digital Equipment Corporation, Cambridge Research Laboratory, 1 Kendall Square, Cambridge, MA 02139; N. Lynch, Massachusetts Institute of Technology, Laboratory for Computer Science, 545 Technology Square, Cambridge, MA 02139; M. Merritt, AT& T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974; W. Weihl, Massachusetts Institute of Technology, Laboratory for Computer Science, 545 Technology Square, Cambridge, MA 02139. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computmg Machinery. To copy otherwme, or to repubhsh, requires a fee and/or specific permission. @ 1992 ACM 0004-5411/92/1000-0881 $01.50 Journal
of the Assocmtmn
for Computmg
Machinery,
Vol
39, No
4, October
1992, pp
881 –930
M. HERLIHY
882
ET AL.
properties of information flow within transaction management systems, the algorithms ensure that orphans only see states of the shared data that they could also see if they were not orphans. When the algorithms are used in combination with any correct concurrency control algorlthm, they guarantee that all computations, orphan as well as nonorphan, see consistent states of the shared data. Categories and Subject Descriptors: —distributed progrrznzrnbzg; D.3.2
D. 1.3 [Programming Techniques]: Concurrent Programming [Programming Languages]: Language Classifications— concurrent, dzstnbuted, and parallel languages; D.4. 1 [Operating Systems]: Process Management— concawerzcy: D.4.5 [operating Systems]: Reliabili~—@lt-@lerarzce: F.3. 1 [Logics and Meaning of Programs]: Specifying and Verifying and Reasoning about Programs—assertions; i)zturimzts; spec@catzon techizlques; H.2.3 [Database Management]: Languages—database ( persistent ) prograrnrnizg languages; H.2.4 [Database Management]: Systems—concurrency: dwti”buted systems; transaction
General
processing
Terms: Algorithms,
Languages,
Additional Key Words and Phrases: tomata, recovery, serializability
Rehability,
Argus,
Theory,
avalon,
Verification
atomic
actions,
camelot,
input–output
au-
1. Introduction Nested
transaction
projects
(e.g.,
computations
systems
see
[1],
[8],
in distributed
have been [9],
systems.
and
in a number
[23])
Nested
tions, provide a simple construct for failures. Nested transactions extend permit degree
explored
[21],
of recent
as a means
transactions,
for
research organizing
like ordinary
transac-
masking the effects of concurrency and the usual notion of transactions [3] to
concurrency within a single transaction. They also provide a greater of fault-tolerance by isolating a transaction from the failures of its
descendants. In distributed
systems,
various
factors,
including
node
crashes
and network
delays, can result in orphan computations—subcomputations that continue to run even though their results are no longer needed. For example, in the Argus system [9], a node partition or some
making a remote request may give up because other problem prevents it from communicating
a network with the
other node. This may leave a process running at the called node; this process is an orphan. The orphan runs as a descendant of the transaction that made the call. Since the caller gives up by aborting the transaction that made the call, the orphan will not have any permanent effects on the observed state of the shared data. As discussed in [10] and [17], even if a system is designed to prevent orphans from
permanently
affecting
shared
data,
orphans
are still
undesirable,
for two
reasons. First, they waste resources: they use processor cycles, and may also hold locks, causing other computations to be delayed. Second, they may see inconsistent states of the shared data. For example, a transaction might read data at two nodes, with some invariant relating the values of the different data objects. If the transaction reads data at one of the nodes and then becomes an orphan, another transaction could change the data at both nodes before the orphan reads the data at the second node. This could happen, for example, because the first node learns that the transaction has aborted and releases its locks. Although the inconsistencies seen by an orphan should not have any permanent effect on the shared data in the system, they can cause strange behavior if the orphan interacts with the external world; this can make programs difficult to design and debug.
Correctness
of Orphan
Several
Management
Algorithms
have
proposed
algorithms
inconsistent
information.
algorithms
for
crashes.
detecting
Nelson’s
been
Early and
work
work
to
prevent
orphans
in the area includes
eliminating
did not
883
orphans
assume
that
arise
an underlying
from
[19], which
seeing
describes
because
transaction
of node
mechanism,
so it was difficult to assign simple semantics to abandoned computations. Recent work [10, 17, 24] has studied orphans in the context of a nested transaction system, in which an abandoned computation can be aborted, preventing
it from
having
any effect
on the state of the system.
algorithms in [10], [17], and [24] is to detect can see inconsistent information. 1.1. ness
NEW
RESULTS.
proofs
for
algorithm
in
completely and
the
algorithms.
proofs
is
yet
in
use
intuitions the
give
regarded
formal
Argus
In
addition,
the
designers
concepts as
an
before
descriptions
the
algorithms
The goal of the
orphans
algorithms
in
that
two
fundamental
be
we
management
straightforward.
the
the
can
paper,
currently
follow
that
each
this
orphan
Although
show
fact,
[10]
In two
rigorous,
the
proofs
the
and eliminate
in system.
the
have to be
underlying
them
implementation
of
[17].
The
proofs
are
presentations
used
in
quite
describing
different,
are very the
correct-
and
1 Our
both
appear
and
[10]
they
our
similar;
same
in
high-level
algorithm. Our
results
relate
management
S’
must
which of
namely,
S‘ must
see
a view
it is not
rency
with
the
invoked
by
control
results
algorithm
imply
consistent sometimes
of
that
views. made
by
a
system,
that
S’,
S in the
sense
it
see
including that
results
provide
that
in
results
orphan
of
and
[13],
Lynch
a~d
Merritt
transaction
designers
that
S in
and
sub-
a concurviews,
as nonorphan, for informal
the
a model
of
is its sequence
see consistent
justification
orphan
execution
operations
as well
formal
develop
each
orphan
no
S includes
combination with any concurrency control algorithm. The formal model used in this paper is based on that [12]
an
system
system
nonorphans
an
S, having
of the
When
transactions,
algorithms’
could the
transaction.) ensures
containing
system,
“view”
all
the
system,
transaction’s
the
S‘,
These
system
(A
that
in
of
of a corresponding “simulate”
the
an orphan.
interactions
transactions
behavior
to that
management; in
the
algorithm
algorithms
our
see claims
work
in
in [4], [12], and [13]. In for
nested
transaction
systems including aborts, and use the model to show that an exclusive locking variation of Moss’s algorithm [18] ensures correctness for nonorphans. The paper [4] contains improvements to the basic model in [12] and [13], plus proofs that Moss’s read–write algorithm and a more general commutativity-based locking algorithm also ensure correctness for nonorphans. In this paper, we use the same model to describe the two orphan management algorithms mentioned above,
to state correctness
1.2. RELATED ment
algorithm
properties,
WORK.
Earlier
appears
in [6]. This
nested transaction systems that general than those presented
work
and to prove on verifying work
is described here, since
is based
the algorithms the Argus on
in [11]. The they apply
correct.
orphan
an earlier
managemodel
for
results in [6] are less only to the specific
1 Our analysis covers only orphans resulting from aborts of transactions that leave running descendants; there is another component of the Argus algorithm that handles orphans that result from node crashes in which the contents of volatile memory are destroyed.
884
M. HERLIHY
concurrency
control
algorithm
(nested
locking)
used by Argus.
ET AL.
Moreover,
the
presentation there is much more complex than the one in this paper. Much of the complexity in [6] arises because the treatments of concurrency control and orphan
management
the two. The
are intermingled,
model
for describing Other work
whereas
here
in [4], [12], and [13] provides
this separation. using the model
we are able
a convenient
to separate
set of concepts
of [4], [12], and [13] includes
[2], [5], and [20];
these papers prove correctness of algorithms for replica management, timeand distributed transaction commit, stamp-based concurrency control, respectively. The fact that it is possible to use the model to explain such a variety of transaction-processing algorithms is strong evidence that it is a useful tool for modeling and analyzing nested transaction systems. In an earlier version of this paper [7], we presented similar results proving the correctness however, modeling
of the two algorithms
were restricted to locking algorithms,
timestamps
and other
the
paper
earlier
analyzed
mechanisms.
to
apply
here.
The
results
in that
to
In this paper,
a wide
range
we generalize
of systems.
1.3. ORGANIZATION as follows:
and a brief
OF THIS PAPER.
Section
description
2 contains of I/O
The
remainder
general
of the
some preliminary
automata,
which
for using
the results
The
we give
paper
mathematical
of
results
described here are somewhat abstract; to make them more concrete, two examples illustrating how they apply to specific kinds of systems.
nized
paper,
a particular class of systems appropriate and did not apply directly to systems
is orga-
definitions
serve as the formal
foundation
for our work. This section may be skipped on first or cursory reading, and is included in order to make the technical presentation of this paper entirely self-contained. Section 3 contains a definition of basic systems, a general class of transaction-processing systems to which our results apply. These are nested transaction systems in which orphans may occur, and for which the problem of managing orphans can be precisely and intuitively stated. A basic system models user
the components
program
system control
system
is said to mange
sen”al correctness
Section among
4 contains
different
in
we
[10]
simple
some
events
and
system,
not
require
the
the
the of
proofs that it
use
of
local
algorithm. of the of the
the
have
as 1/0
automata.
and
the
Each
rest
of
the
principal the
information
properties
already
and
clocked
algorithm.
see
clocked
and
proofs proved algorithms
dependencies
this
We
about
the
consistent
algorithm
only,
a property
the results
of
structure.
orphans
the
the
underlie
management
information
simulation
Argus
orphan
interesting
that
about
contributions
two
The
abstract
results
if it ensures and nonorphans.
these concepts
global
and
correctly orphans,
and
an
uses
ensures
algorithm
reproving
correctness correctness
8 contain
that
abstract
definitions
correctness
Our
Argus
orphans
in a basic system;
algorithm show
the
requires more
the
[17].
and
the The
prove
abstract
formalize that
system automaton,
for all transactions,
the rest of the paper. Sections 5 through which
transaction
as a transaction
(which may include a division into objects, and may include concurrency and recovery algorithms) is modeled as a single basic database automa-
ton. A basic called
of a nested
is modeled
are for
first
that
the
then
[17] each
a
of the We
then
in
a way
simulates
simple,
abstract follows
define
history
views.
quite
in
algorithms
from
show
paper,
in
and
do
algorithm. directly
from
Correctness Each
of Orphan
orphan
transforming these
Management
management
Algorithms
algorithm
an arbitrary
basic
systems
contains
the
each manages
orphans
using
is described
system
same
885
without
transactions
a different
as a system
orphan
obtained
management.
as the
given
basic database.
basic
by
Each system,
The abstract
of but
algorithm
is modeled by the jiltered database, which maintains information about the global history of the system, and uses tests based on this history information to prevent orphans from learning that they are orphans. The Argus database models
the
manages
behavior
orphans
dencies
among
abstract
algorithm,
algorithm even
of
using
the tests
system
Argus
orphan
based
events.
The
introduced
models
restrictive
the orphan
than
the
management
[10];
direct
filtered
database
models
of the
correctness
database.
algorithm
about
the proof on global
filtered
algorithm
information
strictly
to simplify
in [17]; it also uses tests based
more
management
on local
from
history
another of the
information,
Finally,
the
[17]; it manages
it
depen-
clock
and is database
orphans
using
information about logical clocks. Each of these four databases is described as the result of a transformation of the basic database. We prove that the filtered system (the system consisting of the transactions and
the
filtered
transactions, system basic
ensures
system
the Argus
correctness strictly
they
serial
Similarly, which
basic
for
correctness
the filtered
in
turn
they
the
sense
that
It follows
that
transactions, We
and so inherits
if the
then
the
filtered
the
also prove the same
the clock system implements
implements
all
see in the basic
all transactions. system,
that
in
could
orphans.
nonorphan for
we prove
system
that
are not
correctness
system implements system,
the
see a “view”
in which serial
ensures
property.
jiltered
simulates
orphans,
in an execution system
filtered that
database)
including
system,
the thus
showing that the clock system has the same correctness property as the filtered system. Section 9 makes some of the preceding general concepts more concrete by describing two particular types of basic systems, taken from other work using this model.
The
first
kind
of basic
system,
a gerzeric system,
is appropriate
for
describing locking algorithms, while the second kind of basic system, a pseudotime system, is appropriate for describing timestamp-based algorithms. Both kinds
of systems
database each tions
specialize
automaton
into
object in the system and objects together.
the notion two
kinds
of a basic
system
of components:
and a controller The concurrency
by splitting
an object
the basic
automaton
the system is encapsulated within the object systems differ in that they have slightly different
automata. interfaces
The two kinds of between the objects
and the controller. Particular information flow dependencies are described both of these kinds of basic systems. Section 10 contains a summary of our results and some suggestions further work. 2. Formal
for
automaton that links the transaccontrol and recove~ performed by
for for
Preliminaries
An irrejlexiL’e partial order is a binary relation that is irreflexive, antisymmetric, and transitive. The formal subject matter of this paper is concerned with finite and infinite sequences describing the executions of automata. Usually, we are discussing sequences of elements from a universal set of actions. Formally, a sequence ~ of actions is a mapping from a prefix of the positive integers to the
886
M. HERLIHY
set of actions. integers under may
occur
different
We describe the mapping,
several
times
occurrences.
ET AL.
the sequence by listing the images of successive writing ~ = T1 Tz rr~ .”. .Z Since the same action in
a sequence,
Thus,
we refer
a sequence as an euent. Formally, actions is an ordered pair (i, m),
it
is convenient
to a particular an event where i
to
distinguish
occurrence
the
of an action
in
in a sequence P = Wlmz . . . of is a positive integer and n- is
an action, such that w,, the ith action in ~, is T. A set of sequences P is prefix-closed provided that
whenever
is a prefix limit-closed
a set of sequences P is finite prefixes are in P
of ~, it is also the case that -y c P. Similarly, provided that any sequence all of whose
is also in P. We sequences
to any nonempty,
about
INPUT/
OUTPUT
complex
transactions,
it is important
put automaton
model
of concurrent
The
such
model
[15, 16]. This algorithms
to satis~.
MODEL.
systems
to have a simple
computation.
supposed
AUTOMATON
concurrent
for concurrent tions
prefix-closed,
and limit-closed
set of
The
In
that
and clearly
to reason implement
defined
we use for our work
model
allows
careful
and of the correctness
model
order
as those
can serve
formal
model
is the input/out-
and readable conditions
as the basis for
careatomic
descrip-
that
rigorous
they
proofs
is sufficient
for
der properties
of
in
this
finite
paper.
executions
properties. system component
mathematical ever, an 1/0 set. The internal.
use
object
particular,
in
only,
and
not
is modeled
somewhat
automaton
In
like
as an
a traditional
need not be finite-state,
actions of an 1/0 This classification
do 1/0
this
paper
consider
automaton,
finite-state
we
are that
particular algorithms satisfy particular correctness conditions. This subsection contains an introduction to a special case of the model
fairness Each
y
as a safe~ prope~.
2.1. THE fully
refer
f3 e P and
that consi-
liveness which
automaton.
or is a
How-
but can have an infinite-state
automaton are classified as either input, output, or is a reflection of a distinction between events (such
as the receipt of a message) that are caused by the environment, events (such as sending a message) that the component can perform when it chooses and that affect the environment, variable) that a component
and events can perform
(such as changing when it chooses,
the value of a local but that are unde-
tectable by the environment except through their effects on later events. In the model, an automaton generates output and internal actions autonomously, and transmits
output
actions
instantaneously
to its environment.
In contrast,
the
automaton’s input is generated by the environment and transmitted instantaneously to the automaton. The distinction between input and other actions is an fundamental, based cm who determines when the action is performed: automaton can establish restrictions on when it will perform an output or internal action, but it is unable to block the performance of an input action. 2.1.1. Action Signatures. and their classification into action signature. An action
The formal description of an automaton’s actions inputs, outputs, and internal actions is given by its signature S is an ordered triple consisting of three
pairwise-disjoint sets of actions. We write components of S, and refer to the actions 2 We use the symbols indiwdual actions.
/3, y,...
for
sequences
in(S), out(S), and int(S) for the three in the three sets as the input actions,
of actions
and the symbols
m, ~, and
+
for
Correctness
of Orphan
output
actions,
out(S)
and refer
local(S)
and
Management
internal
actions
to the actions
= int(S)
u out(S),
Algorithms of S, respectively.
in ext(S)
and
887
refer
We
as the external
to the
actions
An
external
then
action
signature
is an
that
is, having
no internal
actions,
the
external
action
signature
(in(S), out(S), 0), that is, the removing the internal actions. 2.1.2.
Input/Output
an I/O
automaton
action
of
action
S is the
consisting
action
that
the
set of
signature
local~
controlled
extsig(S) from
An inpzlt/output automaton A (also an automaton) consists of four components:
states
=
S by
called 3
of start states,
need
not
be
finite.
We
as a step of A. The step (s’, T,s) action, and output steps, internal
steps are defined
of
signature,
is obtained
—a transition relation steps(A) c states(A) x acts(sig(A)) property that for every state s’ and input action rr (s’, m-,s) in steps(A).
(s’, m, S) of steps(A) A if T is an input
and
entirely
If S is an action
that
U
locally
U int(S),
Automata. or simply
—an action signature sig(A), —a set states(A) of states, —a nonempty set start(A) c states(A)
Note
= in(S)
as the
u out(S)
signature actions.
signature
ext(S)
of S. AIso, we let
in local(S)
controlled actions of S. Finally, we let acts(S) = in(S) refer to the actions in acts(S) as the actions of S.
external
let
actions
analogously.
x states(A),
with
the
there
is a transition
refer
to
an element
is called an input step of steps, external steps, and
If (s’, rr, s) is a step of A, then
T
is said to be enabled in s’. Since every input action is enabled in every state, automata are said to be input-enabled. The input-enabling property means that an automaton
is not
sometimes write out(A), etc. An
able
to block
input
actions.
If A
is an automaton,
locally controlled, that is, if in(A) = 0. Note that an 1/0 automaton can be nondeterministic, two things: same
state,
different model’s
that
we
acts(A) as shorthand for acts(sig(A)), and likewise for in(A), 1/0 automaton A is said to be closed if all its actions are
more
and that
successor descriptive
than the
one locally same
action,
controlled applied
action in the
states. This nondeterminism power. Describing algorithms
possible tends to make results results about nondeterministic
by which
we mean
can be enabled same
state,
in the
can lead
is an important part as nondeterministically
to
of the as
about the algorithms quite general, since many algorithms apply a fortiori to all algorithms
obtained by restricting the nondeterministic choices. Moreover, the use of nondeterminism helps to avoid cluttering algorithm descriptions and proofs with inessential details. Finally, the uncertainties introduced by asynchrony make nondeterminism an intrinsic property of real concurrent systems, and so an important 2.1.3. an 1/0
property
to capture
Executions,
Schedules,
automaton,
each possible
in our formal
and Behaviors. run
model When
of the system
of such systems. a system is modeled
is modeled
by
by an “execu-
tion,” an alternating sequence of states and actions. The possible activity of the system is captured by the set of all possible executions that can be generated by the automaton.
However,
not
all the information
contained
in an execution
is
3 1/0 automata, as defined in [15], also include a fifth component, which is used for describing fair executions. We omit it here as it is not needed for the results described in this paper.
M. HERLIHY
888 important
to a user of the system,
placed.
We believe
externally
that
visible
events,
on the automaton’s of external
what
nor to an environment
is important
and not
the states
“behaviors’’-the
(i.e., input
about
or internal
subsequences
and output)
actions.
in which
the activity
the system
of a system
events.
Thus,
a system
is
is the
we focus
of its executions
We regard
ET AL.
consisting
as suitable
for a
purpose if any possible sequence of externally visible events has appropriate characteristics. Thus, in the model, we formulate correctness conditions for an 1/0 automaton in terms of properties of the automaton’s behaviors. Formally, or
infinite
A
such
an execution sequence
that
execution denote
(s,, ml+,,
of A is a finite
“””
fragment
beginning
-””
mnsn
s,+ ~) is a step
the set of executions
of A by firzexecs(A). a finite The
fragment
sOmlslm2
of A
with
of for
a start
state
and
and the set of finite
A state is said to be reachable fragment
SOT,s ~V2 states
a
---
Vns ~
actions
of
i for which s,+, exists. An is called an execution. We
every
of A by execs(A),
execution of A. schedule of an execution
sequence
alternating
executions
in A if it is the final
of A
is the
state of
subsequence
of
a
consisting of actions, and is denoted by sched( a ). We say that P is a schedule of A if ~ is the schedule of an execution of A. We denote the set of schedules of A by scheds(A) and the set of finite schedules of A by &wcheds(A). The behalior of a sequence ~ of actions in acts(A), denoted by beh( ~ ), is the subsequence of ~ consisting of actions in ext(A). execution fragment a of A, denoted by beh( a ), is defined We say that denote
f? is a behal)ior
of A if ~ is the behavior
the set of behaviors
of A by behs(A)
The belzavior of an to be beh(sched( a )).
of an execution
and the set of finite
of A. We
behaviors
of A
by &zbehs(A). We say that a finite schedule ~ of A can leale A irz state s if there is some finite execution a of A with final state s and with sched( a ) = ~. Similarly, a finite
behavior
~ of A can leave A in state s if there
is some finite
execution
a of A with final state s and with beh( a ) = ~. An extended step of automaton A is a triple of the form (s’, ~, s), where s’ and s are in states(A),
an ~
is a finite
of
sequence
of actions
in acts(A),
and there
is an execution
fragment
A having s’ as its first state, s as its last state, and p as its schedule. If /3 is any sequence of actions and @ is a set of actions, we write denote
the subsequence
an automaton,
we write
2.2. COMPOSITION. tion in
of several our
automata
operator identically
model, can
of ~ containing (31A for Often,
component we be
connects named
combined
each input
component automata. autonomously by one
a single a
of actions
~ 10 to
in 0. If A is
~ lacts(A).
systems
define
all occurrences
system
interacting
“composition” to yield
can with
also one
operation
a single
1/0
be viewed another. by
automaton.
output action of the component actions of any number (usually
In the resulting component and
which
as a combinaTo
reflect
several
this 1/0
Our composition automata with the one) of the other
system, an output action is generated is thought of as being instantaneously
transmitted to all components having the same action as an input. All such components are passive recipients of the input, and take steps simultaneously with the output step. 2.2.1. Composition of Action Signatures. action signatures. Let I be an index set that
We first is at most
define composition of countable. A collection
Correctness
of O~han
Management
signatures
{S,}, e ~ of action properties hold:
(3)
m
Thus,
action
is in
no action
internal
acts(S,)
for
Moreover,
component
signatures.
we
if the
for all
i, j = I such that
i + j,
i, j = I such that
i #j,
than
do not
do
not
one signature appear
permit
in the collection,
in any other
actions
=
U, ~, in(s~)
–
U,
●
and
signature
in the
infinitely
many
compatible
action
involving
The composition S = HZ. ~ S, of a collection of strongly signatures {S1}, ~ ~ is defined to be the action signature with ‘in(s)
following
i.
many
of more
of any signature
collection.
compatible
for all infinitely
is an output
actions
889
is said to be strong~
= 0 = @
(1) OUt(S1) n O@J) (2) int(Sl) n acts(S,)
Algorithms
1 out($),
= u ~● ~ OUt(si), = U, ● ~ int(S, ).
‘OUt(S) —int(S) Thus,
output
actions
are those
that
signatures,
and similarly
for internal
are inputs
to any of the component
are outputs
actions.
of any of the
Input
signatures,
actions
but
component
are any actions
outputs
that
of no component
signature. 2.2.2.
Composition
be strong~
of Automata.
compatible
if their
A collection
action
{A,},.
signatures
composition A = H,. ~ A, of a strongly {A,},., has the following components:s
~ of automata
are strongly
compatible
is said to
compatible.
collection
of
The
automata
—sig(A) = ~,. ~ sig(A,), —states(A) = H, ● ~ states, —start(A) = Hz ● ~ start(A,), —steps(A) is the set of triples (s’, n-,s) such that for all i c I, (a) if T G acts(Al), (s’[i], w, s[i]) = steps(A1), and (b) if m @ acts(Ai), then s’[i] == s[i].G then Since the automata
Ai
are input-enabled,
so is their
composition,
and hence
their composition is an automaton. Each step of the composition automaton involves all the automata that have a particular action in their action signature performing action
that
in their
action
concurrently,
signature
do nothing.
while
the automata
We often
refer
that
do not
to an automaton
have that formed
by
composition as a “system” of automata. If a! = SO’iTISl “.” is an execution of A, let a /Ai be the sequence obtained by deleting w,s~ when ~, is not an action of A,, and replacing the remaining SJ by SJ[i ]. Recall that we have previously defined a projection sequences. The two projection operators are related sched( a IA, ) = sched( a )IAZ, and similarly In the specifying
course their
of our internal
discussions, actions.
4 A weaker notion called “compatibility” given properties only. For the purposes
beh( a IAl)
we often
To avoid
reason
tedious
in
operator for action the obvious way:
= beh( a )Iil,. about
arguments
automata about
without
compatibil-
is defined in [15], consisting of the first two of the three of this paper, only the stronger notion will be required.
5 Note that the second and third components listed are just ordinary Cartesian products, ~t component uses the previous definition of composition of action signatures. We use the notation s[i] to denote the ith component of the state vector s.
while the
M. HERLIHY
890 ity, henceforth, are unique
we assume
to that
that
automaton,
unspecified
internal
and do not occur
of any of the other automata we discuss. All of the systems that we use for modeling that
is, each
action
component
will
2.2.3.
Propetiies
is an output
be an input
of some
of at most
actions
transactions
one other
of Systems of Automata.
of any automaton
as internal
component.
ET AL.
or external
actions
are closed
systems,
Also,
each
output
of a
component.
Here
we give basic results
relating
executions, schedules, and behaviors of a system of automata to those of the automata being composed. The first result says that the projections of executions of a system onto the components are executions of the components, and similarly
for schedules,
PROPOSITION
andlet
A=
Moreoller, in place
etc.
Let
1.
EI A,.
fl,
{A ,}1~ ~ be a strongly If
a = execs(A),
compatible
then
the same result holds for j7nexecs,
alA,
collection
= execs(A,)
scheds, f%scheds,
of automata, for
all
i = 1.
behs, and jinbehs
of execs.
Converses
can also be proved
for
all the parts
of the preceding
proposition.
The following are most useful. They say that schedules and behaviors of component automata can be “patched together” to form schedules and behaviors of the composition. PROPOSITION
Let
2.
and let A = TIlef
{A,}, ~ ~ be a strongly
(1)
Let P be a sequence of actions i ~ 1, then P e scheds(A ).
(2)
Let ~ be a finite sequence of actions all i = I, then ~ ● @scheds(A).
(3)
Let ~ be a sequence then /3 = behs(A).
(4)
Let ~ be a finite sequence of actions i = 1, then P ● finbehs( A ). The preceding
behavior behaviors
of actions
collection
of automata,
proposition
in acts(A). in acts(A).
in ext(A).
is useful
If
/31A, If
~ IA,
If
~ IA,
in ext(A).
If/3
in proving
that
E scheds(A1) = jlnscheds(A,
● behs(A,)
IA,
for
for
) for
all i =1,
= jinbehs(A1)
a sequence
all
for all
of actions
of a system A: It suffices to show that the sequence’s projections of the components of A and then to appeal to Proposition 2.
2.3. IMPLEMENTATION. automaton
compatible
A,.
by another.
We Let
define
A and
a
notion
B be automata
of with
“implementation” the
same
are
of external
is a
one
action
B if signature, that is, with extsig(A) = extsig(B). Then A is said to implement finbehs(A) G finbehs(B). One way in which this notion can be used is the following: Suppose we can show that an automaton B is correct, in the sense that its finite behaviors all satisfy some specified property. Then, if another automaton A implements B, A is also correct. One can also show that, if A implements B, then replacing B by A in any system yields a new system in which all finite behaviors are behaviors The definition of an implementation exhibit
every
possible
behavior
of the original system. of B by A does not
of B. Rather,
B is viewed
require
that
as describing
A the
acceptable behaviors, and the behaviors of A are constrained to be acceptable. This constraint is easy to satisfy: A could simply never produce any outputs. (Notice
that
A
still
has
behaviors,
since
all
automata
are
required
to
be
Correctness
of Orphan
input-enabled.) something. address using
In
Management
other
Such
One useful
words,
there
requirements
in this paper.
the parts
take
They
891
is no the
that
that
of
that
A
actually
liveness,
which
the 1/0
automaton
within
deal with
for showing
requirement
form
can be handled
of the model technique
Algorithms
we
do
do not model
liveness.
one automaton
implements
another
is
to give a correspondence between states of the two automata. Such a correspondence can often be expressed in the form of a kind of abstraction mapping that we call a possibilities mapping, defined as follows: Suppose A and B are automata with the same external action signature, and suppose f is a mapping from
states(A)
to the power
set of states(B).
That
is, ifs
is a state of A, f(s) is a
set of states of B. The mapping f is said to be a possibilities B if the following conditions hold:
mapping
(1) ~=
eV:O~ start
t ~ of
(2) flet
s’ be a reachable
state SO of A, state
(s’, m,s) a step of A. Then having an empty schedule) (a) ylext(B) (b) t G f(s). The following
PROPOSITION
3.
and
behaviors where
input
A implements
it
safety
such
a possibilities
restrictions
neither properties.
provides
of B, and
mapping
from
As long
does the automaton. O
be
a set
in our model
to restrict inputs
well-fomzedness
is in terms
of behaviors: Let
automata
convenient
obeys certain
for sequences of actions We say that A preserues &-lA G finbehs(A). Thus, if an automaton to violate cumulative
state
that
A to
B.
Although
is often
the environment
a property
the property, for
actions,
in which
of discussing
a reachable
B such
Suppose that A and B are automata with the same external there is a possibilities mapping from A to B. Then A
the environment
preserves
state
there is an extended step (t’, y, t) of B (possibly such that the following conditions are satisfied:
2.4. PRESERVING PROPERTIES. to block
t’ = f(s’)
shows that giving
to show that
action signature implements B.
of A,
is a start
A to
= m-lext(A),
proposition
B is sufficient
there
from
of the
in a sensible notion
Such a notion actions
and
to
those
way, that
restrictions.
A useful
that
as the environment of
are unable
attention
k, way
an automaton does not violate
is primarily
interesting
P a safety
property
in 0. Let Abe an automaton with O fl int(A) = @. P if ~w 10 G P whenever ~ I@ = P, m G out(A), and preserves
P: as long as the behavior satisfies
a property
P, the automaton
environment only provides P, the automaton will only
inputs perform
is not the first such that the outputs such
that the cumulative behavior satisfies P. In many cases of interest, we have @ G ext(A); note that even in this case, the fact that an automaton A preserves P does not imply
that
all of A’s behaviors,
when
restricted
to 0, satisfy
P. It is
possible for a behavior of A to fail to satis~ P if an input causes a violation of P. However, the following proposition gives a way to deduce that all of a system’s behaviors satisfy P. The proposition says that, under certain conditions, if all components of a system preserve P, then all the behaviors of the composition satisfy P.
M. HERLIHY
892 PROPOSITION
Let
4.
{A,},.
presenes
P.
behs(A)l@
Then
A
presen’es
Let
/3 be
a sequence
and &r 1A = finbehs(A). finbehs(A,), by Proposition Now
suppose
In
this
section,
we
to which
ric systems conditions orphans.
if
@ n i~d A)
= 0, and i E I, A, = 0,
actions
= 0,
such
~ I@ G P, m ~ out(A),
that
~ ● behs(A).
and let prefix
then
PI@ G p.
then
Since
pm IA, ●
A preserves
of ~ I@ is in P. Then,
basic
results
to
plus in
invoke can
invoke describing
on
~ I@ e P, by
transaction-processing the
subtransactions algorithms
and include
operations concurrency
the
geneof
user-provided to coordinate
written
by applica-
Transactions
addition,
if nesting
and
receive
responses
are
per-
is allowed, from
the
processing. transaction-processing
making
decisions
on
objects.
control
of
In
of their
data
are
the
correctness
management
designed
language.
objects.
system,
transactions,
consist
transactions
programming
results
both
of correct
algorithms
subtransactions the
transaction-processing
of [2]. We also define
the notion
The
data
of
generalize
systems
transactions. a suitable
class
systems
systems
in particular,
operations
subtransactions
with
the
Basic
transaction-processing
of different
transactions
interact
systems,
apply.
Transaction-processing
code,
programmers
a
of automata
❑
for basic systems,
activities
In
of
every finite
define
our
3.1. OVERVIEW.
mitted
Furthermore,
of [4] and the pseudotime
transaction tion
collection
Systems
systems
the
compatible
Then m E out(A, ) for some i c I, and 2. Since Al preserves P, pm I@ ~ P.
@ n in(A)
that
P, by a simple induction, the limit-closure of P. 3. Basic
P.
~ ● behs(A),
c P. That is, f
PROOF.
~ be a strongly
@ be a set of actions such that @ n int(A) for actions in @. Suppose that for each
and let A = 111. ~ A,. Let let P be a safety property
ET AL.
and
about The
recovery
algorithms
when
to
schedule
transaction-processing
In
algorithms.
many
interesting cases (e.g., for locking algorithms), the transaction-processing algorithms can be naturally divided into a controller and a collection of objects, where each object includes concurrency control and recovery algorithms appropriate
for
transactions
that
object
and
and
objects.
general results. The transaction-processing ~stems,
In the organization
the We
controller
manages
do not,
however,
systems
studied
communication require
in
this
this paper
among
division are
called
for
the our basic
we consider,
the transaction-processing algorithms are represented by a component called a basic database. Each component of a basic system is modeled as an 1/0 automaton. That is, each transaction is an automaton, and the basic database is another automaton. The nested structure of transactions is modeled by describing each transaction and subtransaction in the transaction nesting structure as a separate 1/0 automaton. If a parent transaction T wishes to invoke a child transaction T’, T issues an output action that “requests that T’ be created. ” The basic database receives this request, and at some later time might issue an action that is an input
to the child
transactions cating with
T’ and corresponds
to the creation
of T’. Thus,
the different
in the nesting structure comprise a forest of automata, communieach other indirectly through the basic database. The highest-level
Correctness
of Orphan
transactions, tions,
that
is, those
are the roots
It is actually
Management that
Algorithms are not
893
subtransactions
of any other
transac-
in this forest. structure
as a
tree rather than as a forest. Therefore, we add an extra root automaton dummy transaction, located at the top of the transaction nesting structure.
more
convenient
to model
the transaction
nesting
as a The
highest-level user-defined transactions are considered to be children of this new root. The root can be thought of as modeling the outside world, from which
invocations
of
top-level
transactions
originate
and
about the results of such transactions are sent. h the rest of this section, we define basic systems conditions
that
3.2. SYSTEM to name
the
they are supposed TYPES.
We
transactions
begin
and
A system type consists
subset
by defining
objects
in
a type
a basic
reports
the correctness
structure
that
will
be used
system.
of the following:
TO E T,
accesses of T not containing
—a mapping names into
and state
which
to satisfy.
—a set T of transaction names, —a distinguished transaction name —a
to
To,
parent: T – {TO} + T, which configures the set of transaction a tree, with To as the root and the accesses as the leaves,
—a set X of object names, —a mapping object: accesses —a set V of return ualues.
-+ X,
Each element of the set accesses is called an access transaction name, simply an access. Also, if object(T) = X, we say that T is an access to X. In referring as leaf node,
to the transaction tree, we use standard tree internal node, child, ancestor, and descendant.
we consider
any node
the ancestor
and descendant
a least common The
to be its own
ancestor
transaction
tree
describes
TO as the
name
tree
represents
the name
The
of the
children
transactions.
The
and its own
are reflexive.
descendant,
such case,
that
is,
We also use the notion
of
of two nodes.
with
parent.
ancestor
relations
terminology, As a special
or
the nesting
dummy
root
structure
transaction.
of a subtransaction
of
TO represent
accesses represent
transaction child
of the transaction
names
names
for Each
of
the
named
top-level
for the lowest-level
names,
node
in this by its
user-defined transactions
in
the transaction nesting structure; we use these lowest-level transaction names to model operations on data objects. Thus, the only transactions that actually access data are the leaves of the transaction tree, and these do nothing else. The internal nodes model transactions whose function is to create and manage subtransactions The
tree
all possible tion,
including
structure
transactions
however,
imagine system. ing.
that The
only the
be thought
that
some
tree
tree will,
accesses, but they
should
might
of these
structure in general,
do not access data directly.
of as a predefine
naming
scheme
ever be invoked.
In any particular
transactions
actually
is known
will
in advance
be an infinite
take
steps.
by all components
structure
with
infinite
for
execuWe of a branch-
The set X is the set of names for the objects used in the system. Each access transaction name is assumed to be an access to some particular object, as designated by the object mapping. The set V of return values is the set of
M. HERLIHY
894
ET AL.
Basic
\
Database /
/
\
\
FiG. 1.
possible
values
that
might
be returned
to their parent transactions. For the rest of this paper, 3.3. GENERAL system
type
each nonaccess Later basic
by successfully
we fix a particular
STRUCTURE
is a closed
Basic system.
OF BASIC
system
transaction
name
system A
SYSTEMS.
consisting
completed
transactions
type. basic
system
for
a given
a transaction
of
T and a single
autonzato?z AT basic database automaton
in this section, we give conditions to be satisfied by the transaction database automata. Here, we just describe the signatures of
for B.
and these
automata, in order to explain how the automata are interconnected. Figure 1 depicts the structure of a basic system. The transaction nesting structure is indicated in part by dotted lines between transaction
automata
corresponding
do not have associated
automata,
of accesses. The
connections
indicated
by solid
the basic Figure
direct lines.
database,
Thus,
between
the transaction
but not directly
2 shows the interface
to parent so the diagram
with
and
child.
Access
transactions
does not indicate
automata
the parents
(via shared
automata
actions)
interact
directly
are with
each other.
of a transaction
automaton
in more
detail.
The
automaton for transaction name T has an input action CREATE(T), which is generated by the basic database in order to initiate Ts processing. We do not include explicit arguments to a transaction in our model; rather we suppose that there is a different transaction for each possible set of arguments, so any input to the transaction is encoded in the name of the transaction. T has REQUEST_ CREATE(T’) output actions for each child T’ of T in the transaction
nesting
structure;
these
are requests
for creation
of child
transactions,
and
are communicated directly to the basic database. At some later time, the basic database might respond to a REQUEST_ CREATE(T’) action by issuing a CREATE(T’)
action;
in case T’ is not
an access, this
action
is an input
to the
automaton for transaction T’. T also has REPORT_ COIvlMIT(T’, v) and REPORT_ABORT(T’ ) input actions, by which the basic database informs T about the fate (commit or abort) of its previously requested child T’. In the case of a commit, the report includes a return value v that provides information about the activity of T’; in the case of an abort, no information is returned. Finally, T has a REQUEST_ COMMIT(T, v) output action, by which it announces to the basic database that it has completed its activity successfully, with a particular result that is described by return value v. Figure 3 shows the basic database interface. The basic database in any particular CREATE
basic system and REQUEST_
receives the previously mentioned COMMIT actions as inputs from
the
REQUEST_ transaction
Correctness
of Orphan
Management
Algorithms
895
REQUEST_COMMIT(T,v)
CREATE(T)
T
REPORT.COMMIT(T,V)
REQUEST_CREATE(T)
REPORT_ABORT(T’)
Transaction
FIG. 2.
REQUEST_CREATE
interface.
CREATE
REQ1.JEST_COMMIT
REQUEST.COMMIT
(accesses)
Basic Database
COMMIT ABORT ‘
‘o
REPORT.COMMIT REPORT_ABORT Basic database interface.
FIG. 3.
automata. action
It produces
automata
produces represent
CREATE
or invoking
actions operations
as outputs,
thereby
on objects.
The
awakening
basic
database
transalso
REQUEST_ COMMIT(T, v) output actions for accesses T; these responses to the invocations of operations on objects. The value v in
a REQUEST_ COMMIT(T, tion as part of its response.
v) action is a return value returned by the operaThe basic database also produces COMMIT(T) and
ABORT(T) actions for arbitrary transaction names T + TO, representing decisions about whether the designated transaction commits or aborts. For technical convenience, we classi~ the COMMIT and ABORT actions and the REQUEST–COMMIT actions of the basic system component.7 REPORT–ABORT transactions Different section
of this
actions for access transactions when they are not inputs to
as output any other
The basic database also has REPORT_ COMMIT and actions as outputs, by which it communicates the fates of
to their basic
and CREATE database, even
parents.
databases
paper
may
describes
include
generic
additional databases,
output which
actions. have
The
additional
final out-
7 Classifying actions as outputs even though they are not inputs to any other system component is permissible in the 1/0 automaton model. In this case, h would also be possible to classify these actions as internal actions of the basic database, but then the statements and proofs of the ensuing results would be more complicated.
896
M. HERLIHY
puts
by which
locks
may
contain
the
be
additional
As
all child
any
output
until
In
for most
many
our
such
work,
it
treats
cases,
is convenient
to
activated.
This
separation
occurs
in
results
among
COMMIT
occurs.
that
this
means
use
two
as if it had
A
that
the
to
to perform
consequence
ever
been
action
not
is that
might
are
of “creat-
as an input
constrained
transactions
a system
action
transaction
be
action of
which
data.
of
to the
formally
will
so that
the
of this
system
must
be created
system
in any
will
include
automata.
to describe
is important
components
child
transactions
and CREATE,
and
the
objects, example,
of timestamp
earlier
is treated
creation
CREATE
distinction
the
a CREATE
interesting
to the a second
management
transaction
all possible
are
referred
action
child
transaction
we
model
dynamic
automata
infinitely
automata,
though
CREATE
modeling
execution.
1/0
for
the
actions of
the
the
The
communicated
databases
involving
Even
transaction;
method
In
case
transaction,
along.
the
include
the
are
Pseudotime
statically.
a child
there
of transactions
outputs
is always
determined ing”
fates
released.
ET AL.
our
what
in actual and
separate
happens distributed
proofs.
REQUEST-COMMIT,
actions,
when
systems
Similar
REQUEST_
a subtransaction such
remarks
COMMIT,
hold
and
is
as Argus, for
the
REPORT-
actions.
3.4. SERIAL
ACTIONS.
system type include given system type subsection:
CREATE(T)
transaction
name
COMMIT(T),
The
external
actions
of
a basic
the serial actions for that type. are defined to be the actions and
and v is
REQUEST–COMMIT(T, a return
ABORT(T),
REPORT_
value,
system
v),
and
of
a given
The serial actions for a listed in the preceding where
T
is
any
REQUEST-CREATE(T),
COMMIT(T,
v),
and
REPORT.
ABORT(T), where T # To is a transaction name and v is a return value.x If ~ is a sequence of actions, define sen”al( ~ ) to be the subsequence of ~ containing all the serial
actions
in ~.
In this subsection, we define some simple concepts involving serial actions. All the definitions in this subsection are based on the set of actions only, and not on the specific automata in any particular system. For this reason, we present these definitions here, before going on (in the next subsection) to give more information about the basic system components. We first
present
of well-formedness 3.4.1.
some fundamental for sequences
Terminology.
completion ABORT(T)
The
definitions,
and then
we define
notions
of actions.
COMMIT(T)
and
ABORT(T)
actions for T, while the REPORT_ COMMIT(T, for T. actions are called report actions
actions
are
called
v) and REPORT_
We associate transaction names with some of the serial actions, as follows. Let T be a transaction name. If n is either a CREATE(T) or a REQUEST_ COMMIT(T, v) action, or is a REQUEST. CREATE(T’), REPORT_ COMMIT(T’, v’) or REPORT–ABORT(T’), where T’ is a child of T, then we define transactiorz(m ) to be T. If w is a completion action, then transaction (w) is undefined.
In some
contexts,
we also need
to associate
a transaction
with
8 These actions are called serial actzons because they are exactly the external actions of a serial of the given type. More will be said about serial systems later in the paper.
gstem
Correctness
of Orphan
completion
actions;
Management since
Algorithms
a completion
897
action
for
T can
be
thought
of
as
occurring in between T and parent(T), some of the time we want to associate T with the action, and at other times we want to associate parent(T) with it. Thus, we extend the transaction(w) definition in two different ways. If rr is any serial action, then we define hightransaction(m ) to be transaction(n) if m is not a completion
action,
if n is any serial is not
a completion
particular, rial
and to be parent(T), action,
action,
T for which
if m is a completion
lowtransaction(
action
= lowtransaction(n) transaction(m)
for T. Also,
m ) to be transaction(m)
and to be T, if T is a completion
hightransaction(n)
actions
We whose
we define
action
if m-
for T. In
= transaction(~)
for all se-
is defined.
also require notation for the object associated with any serial action transaction is an access. If n- is a serial action of the form CREATE(T)
or REQUEST_ object(m)
COMMIT(T,
v), where
T is an access to
X,
then
we
define
to be X.
We extend the preceding notation to events as well as actions. For example, if T is an event, then we write transactiorz(n ) to denote the transaction of the action tion,
of which
T is an occurrence.
lowtransaction,
paper
and
object
in the same way, without
Now
we require
execution.
Let
actiue in REQUEST_
~
further
terminology
the definitions
We
extend
of hightransac-
other
notation
in this
explanation.
to describe
/3 be a sequence
provided COMMIT
We extend similarly.
the
of actions.
status
of a transaction
A transaction
that ~ contains event for T. Similarly,
name
during
T is said to be
a CREATE(T) event but no T is said to be live in /3 provided
that ~ contains a CREATE(T) event but no completion event for T. (However, note that /3 may contain a REQUEST–COMMIT for T.) Also, T is said to be an orphan T. We
in ~ if there
have
particular introduce Namely,
already sets
of
another
is an ABORT(U)
used
projection
actions,
and
projection
if ~ is a sequence
action
operators
to
actions
operator, of actions
p IU is defined to be the sequence transaction name, we sometimes write
3.4.2.
automata.
this time
~ I{T: transaction(m) /3 IT as shorthand for name,
and basic
database
we define
transaction
We define
well-formedness.
those
write
event in ~, if any, is a CREATE(T) events,
then
/3 IX to
created, and the been requested.
conditions, which are actions of the transac-
conditions
here.
Let T be any transaction
event,
names.
on the transaction of a basic system. are guaranteed; for
A sequence B of serial actions m with transaction(n) = T is defined transaction well- foimed for T provided the following conditions hold: (1) The first CREATE
to now
G u}. If T is a ~ I{T}. Similarly, if
should not take steps until it has been not create a transaction that has not
automata.
We names,
we sometimes
are captured by well-formedness properties of sequences of external
U of
sequences
to sets of transaction
We place very few constraints and basic database automaton in our definition we want to assume that certain simple properties
Such requirements fundamental safety
First,
action
particular
Well-Forrnedness.
example, a transaction basic database should
tion
to restrict of
and U is a set of transaction
~ is a sequence of actions and X is an object denote ~ I{n: object(n) = X}.
automata However,
in ~ for some ancestor
and there
name. to be
are no other
898
M. HERLIHY
There is at most T’ of T,
(2)
(3) Any report CREATE?(T’) (4) There (5)
one REQU13ST_CREATE(T’)
event for in ~,
is at most
a child
one report
T’
event
of
event
T
is
in p for each child
preceded
in ~ for each child
ET AL.
by
REQUEST_
T’ of T,
If a REQUEST-COMMIT event for T occurs in ~, then it is preceded by a report event for each child T’ of T for which there is a REQUEST_ CREATE(T’) in /3,
(6) If a REQUEST-COMMIT in ~. In particular, well-formed
event for T occurs
if T is an access, then for
T are
the
CREATE(T)REQUEST_ set of transaction
prefixes
COMMIT(T,
well-formed
is defined
conditions
to
be
basic
sequences
two-event
v). For
sequences
it is prefix-closed and limit-closed. Next, we define basic database actions
the only of the
in ~, then
that
are transaction
sequences
of the
form
any T, it is easy to see that
for T is a safety
well-formedness.
database
it is the last event
A
well-foimed
property, sequence
provided
that /3 of the
the
is, that serial
following
hold:
(1) The sequence ~ IT is transaction well-formed, for all transaction names T, (2) If a CREATE(T) event occurs in ~, fG~ T + TO, then there is a preceding REQUEST_ CREATE(T) in ~, (3) If
there
is a COMMIT(T)
Q1-JEsT-COMMIT(T, (4)
If
there
event
v) event
is an ABORT(T)
QuEiST.C~EATE(T)
event
event
in
~,
then
there
is a preceding
RE-
is a preceding
RE-
in p, for some v, in
~,
then
there
in ~,
(5) There is at most one completion event in ~ for each transaction name T, (6) If there is a REPORT_ COMMIT(T, v) event in f3, then there is a preceding REQUEST_ COMMIT(T, v) event in ~ and a preceding COMMIT(T) (7)
event in /3, If there is a REPORT-ABORT(T) ABORT(T) event in ~,
(8) There
is at most
3.5. BASIC
one report
SYSTEMS.
We
event
event are
now
in
~, then
there
in ~ for each transaction ready
to
define
basic
is a preceding name
T.
systems.
Basic
systems are composed of transaction automata, one for each nonaccess transaction name, and a single basic database automaton. We describe the two kinds of components
in turn.
3.5.1. Transaction Automata. A transaction automaton transaction name T (of the given system type) is an 1/0 following external action signature: Input: CREATE(T) REPORT–COMMIT(T’, REPORT_ABORT(T’), output: REQUEST_ REQUEST_
CREATE(T’), COMMIT(T,
AT for a nonaccess automaton with the
v), for every child T’ of T, and return for every child T’ of T for every child T’ of T v), for every return value
v
value
v
Correctness
of O~han
In addition, to preserve
AT
Management
Algorithms
may have an arbitrary
transaction
899
set of internal
well-formedness
for
T,
actions.
as defined
We require in
the
AT
preceding
subsection. As discussed earlier, this does not mean that all behaviors of AT are transaction well-formed, but it does mean that as long as the environment of AT does not violate transaction well-formedness, AT will not do so. Except for that requirement, transaction automata can be chosen if ~ is a sequence of actions, then ~ IT = P lext(AT). Transaction
automata
transactions gramming
example,
complete.)
intended
may impose
Argus
However, Basic
are
to
in any reasonable
languages
ior. (For
3.5.2.
defined
Database
general
enough
restrictions
activity
in transactions
do not require
Automata.
A
Note
to
language.
additional
suspends
our results
be
programming
arbitrarily.
model
Particular
on transaction until
that the pro-
behav-
subtransactions
such restrictions.
basic database
automaton
is also mod-
eled as an 1/0 automaton. A basic database passes requests for the creation of subtransactions to the appropriate recipient, initiates REQUEST_ COMMIT actions for accesses, makes decisions about the commit or abort of transactions, and passes reports about the completion of children back to their parents.
It may also car~
A basic
database
Input: REQUEST_
out other
activity.
has the following
CREATE(T),
actions
v), T a nonaccess
COMMIT(T,
COMMIT(T), ABORT(T),
T + TO T # T(]
REPORT. COMMIT(T, REPORT_ABORT(T),
action
signature:
T + TO
REQUEST–COMMIT(T, output: CREATE(T) REQUEST_
in its external
transaction
v), T an access transaction
name
name
V), T # TO T # TO
In addition, it may have internal actions. Depending
other arbitrary output actions, as well as arbitrary upon the design of the particular basic database
automaton, some of the additional output actions may be associated with particular objects. Hence, each basic database is assumed to come equipped with an extension of the object partial mapping on actions, which may associate some of these additional, nonserial output actions with particular object names. That is, each nonserial output action n may (but need not) have object(n) defined. The REQUEST. to be identified conversely, actions
all
CREATE with
the
for which
the
and REQUEST.
corresponding
CREATE
and
report
COMMIT
outputs outputs
T is an access) are identified
inputs
of transaction
with
(except
those
are intended automata,
and
CREATE(T)
the corresponding
inputs
of
transaction automata. A basic database is required to preserve basic database well-formedness. There are many examples of basic databases in the literature. For example, the composition of the generic controller and generic objects of [4] preserves basic database well-formedness, and so is an example of a basic database. The same is true for the composition of the pseudotime controller and pseudotime
900 objects
of [2]. We present
fact, we claim
that
these
almost
examples
all interesting
be modeled as basic databases. Our notion of basic database algorithms
that
It turns
out
are relevant
that
the
in more
later
in this paper. algorithms
In can
(See [14] for additional examples.) identifies the aspects of transaction-processing
to our analysis
details
detail
transaction-processing
of how
of orphan
management
synchronization
and
mented by a basic database are largely irrelevant. important contributions of this paper: we are able tions dent
ET AL.
M. HERLIHY
algorithms.
recove~
Indeed, to state
are imple-
this is one correctness
of the condi-
for and verify orphan management algorithms in a way that is indepenof the concurrency control and recove~ methods used within the basic
database. 3.5.3.
Basic
Systems.
A
compatible
set of
automata
transaction
names
and the
ated with for
each nonaccess
T. Associated
with
system type. When the particular
basic
system
indexed singleton name
basic
name
BD
system
the
composition
union
of
(for
“basic
set {BD}
transaction
the
B is the
by
the
of
a strongly
set of
nonaccess
database”).
T is a transaction
is a basic
database
B is understood
Associ-
automaton
automaton
from
for
the context,
A ~ the
we call
schedules, and behavits external actions the basic actions, and its executions, basic schedules, and basic behazliors, respectively. The iors the basic executions,
following proposition formedness properties: P~OPOSITION
5.
If
(1)
For elw?y transaction
(2)
The sequence PROOF.
Note
says that
basic
behaviors
p is a basic behauior, name
is basic database
first
the
basic
the
appropriate
then the following
T, BIT is transaction
serial(~) that
have
conditions
well- fon-ned for
well-
hold.
T.
well-formed.
database
preserves
basic
database
well-
formedness, formedness automaton
and this immediately implies that it preserves transaction wellfor every transaction name. Next, note that each transaction preserves transaction well-formedness for the appropriate transac-
tion
Furthermore,
name.
it has in its signature
no actions
and so preserves transaction well -formedness for first part of the proposition foljows by Proposition A simple induction shows that each transaction basic
database
Proposition
4.
well-formedness,
and the
second
of other
transactions,
all transaction names. The 4. automaton also preserves conclusion
follows
also from
❑
3.6. SERIAL CORRECTNESS. In this subsection, we give appropriate notions of correctness for basic systems. These include notions appropriate for systems that manage orphans, as well as notions for systems that do not manage orphans but do carry out concurrency control and recovery. These notions are taken from [4], [12], and [13]. The spirit of our definitions is similar to that of the usual definition of serializability in the database literature. However, the usual notion does not take nesting or aborts into account. We define correctness conditions for basic systems of a given type by relating their behaviors to those of a particular basic system of that type, the serial system. The executions, schedules, and behaviors of a serial system are called serial executions, serial schedules, and serial behaviors, respectively. Serial systems are composed of transaction automata and a serial database, which
Correctness itself
of O~han
is the
automata
Management
composition
of
are identical
Algorithrm
a serial
to those
901
scheduler
in basic
and
objects.
The
serial
systems.
The
transaction
scheduler
controls
the order in which the transactions take steps and in which accesses to objects occur. It permits only one child of a transaction to run at a time. Thus, sibling transactions
execute
sequentially
at every
parent
node
in the transaction
tree,
so that transactions are run in a depth-first traversal of the tree. Also, the serial scheduler aborts a transaction only if it has not been created, and creates a transaction
only if it has not been
transactions
execute
Objects
in
guarantees never take with
sequentially
a serial
aborted.
Thus,
and aborted
system
are
quite
in a serial
transaction
simple.
execution,
take
Since
sibling
no steps.
the
serial
scheduler
that siblings execute sequentially, and that aborted transactions any steps, serial objects do not have to deal with concurrency or
failures.
The
serial
objects
serve
as a specification
of how objects
should
behave in the absence of concurrency and failures. (The serial objects serve the same purpose as the serial specifications [25] and [26].) A detailed description of serial systems may be found in the references [41 and [12– 14]. NOW we give a definition that says that a sequence of actions serial
behavior
actions there
to
and T is a transaction exists a serial
same thing
behavior
in P that
Now we can define that
a particular
a basic
system
it could
transaction. name,
we say that
y such that
~
“looks
like”
is a sequence
f? is serially
correct for
words,
a of
T if
T sees the
behavior.
of correctness
correct
if
y IT = ~ IT. In other
see in some serial
two notions is serially
Namely,
for basic systems.
if each of its finiteg
First,
behaviors
we say
is serially
correct
for all transaction names. 1“ The requirement that every transaction see a serial view is very strong. Without orphan management, in fact, systems may not meet this requirement. (This is true of all published concurrency control algorithms for nested transactions of which we are aware.) Instead, they provide a slightly weaker notion of correctness, namely that nonorphan transactions see serial views. More precisely, we say that a basic system is serially comect for nonophans transaction arbitrary The serially
names
if each of its finite that
are
not
orphans
behaviors in
~.
~ is serially Orphans,
correct
however,
for all can
see
that
are
views. papers correct
[2], [4], [12], and [13] contain for nonorphans.
examples
The basic system
of basic
systems
in [12] and [13] uses exclusive
locking for concurrency control and recovery,ll while more general commutativity-based locking strategy. timestamps for concurrency control and recovery. The orphan management algorithms of this paper
the systems in [4] use a The systems in [2] use ensure
that
the
systems
that use them are serially correct. To ensure this, the orphan management algorithms rely on the basic database to ensure serial correctness for nonorphans; in fact, the algorithms correctness for nonorphans.
work with any basic database that ensures serial In this sense, the orphan management algorithms
‘] Serial correctness is stated in terms of finite behaviors because the corresponding property for infinite behaviors is not satisfied by locking algorithms, in the absence of extra assumptions [22]. 1“ As discussed in [12], this definition of correctness allows different transactions in ~ to “see” different serial behaviors. However, correctness applies to the root transaction T(, as WCI1. so the root must see the same results from the top-level transactions that it could see in some serial ~~havior. There are some minor differences; for example, the completion and report actions are combined into single actions rather than treated as two separate actions.
M. HERLIHY
902 and the concurrency the following
sort
of the system
control for
with
algorithms
each orphan
orphan
are independent.
management
management
and
there exists a behavior y of the underlying and T is not an orphan in y. In other algorithms prevent thing a transaction basic imply
transactions from sees is consistent
system in some execution that if the basic system
corresponding
system
4. Information
Flow
In this
section,
models
the information
n
orders
in an execution,
n- cannot
world”
name,
then
it is not an orphan. These results correct for nonorphans, then the
events
is serially
partial
if an event that
m occurs
correct.
orders,
in behaviors
“know”
in which
of
that they are orphans—everyit could see in the underlying
of irreflexive
affects relations;
then
is a “possible
“knowing” with what
between
a result
~ is a behavior
T is a transaction
management
families
flow
If
basic system such that y IT = ~ IT words, the orphan management
in which is serially
orphan
we define
call these partial there
with
We prove
algorithm:
ET AL.
each
of which
of a basic system.
+ does not affect
@ occurred,
but
We
an event
in the sense that
+ does not.
The
algorithms
described later in this paper ensure that no event of a transaction is affected by the abort of an ancestor; thus, no event of a transaction ever knows that the transaction is an orphan. To make our definitions as general as possible, we define affects relations by describing the constraints they must satisfy. Later in the paper, we give examples of affects relations for particular kinds of systems. 4.1. FAMILIES R is a binary
OF AFFECTS
say that
y is R-closed
contains
any event
Let each
in
in ~, and
~ if, whenever
~ in ~ such that
B be a basic
system,
actions
affects relations
for B provided
that
RP
is an irreflexive
● Rp
then
(2) If -y is a prefix (&n) = RY,
of ~, and
the following order
+ and rr are in y, then
Suppose that (3 is a behavior of B and ABORT(T’) event, and m is a CREATE, REPORT–COMMIT, a REPORT–ABORT(T)
#
is related
(a) If m (b) If n (c) If ~ (d) If m (e) If T (f)
to
= = = = =
+
or
between
of relations,
conditions
of B and Y is an Rfi-closed
transaction,
we
in ~, it also one for
R is said to be a family
on the
(4)
is an event
T
actions,
~, then
of
hold.
events
in
~ such
that
if
m in ~,
If p is a behavior a behavior of B,
a nonaccess
of
an event
be a family
(3)
there
of basic
n-) E R.
of B. Then
partial
@ precedes
is a sequence
y contains
(+,
~ of external
(+, ~)
~
y is a subsequence
and let R = {RB}
sequence
(1) Each
If
RELATIONS.
on events
relation
(+, T)
subsequence
and
v
in
of p, then
y is
(~, T) G RD, where + is an a COMMIT, an ABORT, a of for T # T’, an output
a REQUEST–COMMIT +
E RP if and only if
/3 such
for that
an
access.
( q5, +)
E Rp,
Then where
m as follows:
CREATE(T), then 4 = REQUEST. CREATE(T), COMMIT(T), then * = REQUEST. COMMIT(T, REPORT-COMMIT(T, v), then * = COMMIT(T), REPORT-ABORT(T), then $ = ABORT(T), ABORT(T), then + = REQUEST-CREATE(T),
If w is an output transaction T,
of nonaccess
transaction
T, then
*
v),
is an event
of
Correctness
of O@an
Management
903
Algorithms
(g) If r = REQUEST-COMMIT(T, then object(~) = X.lz The first
two conditions
describes
a partial
v) where
are quite
ordering
T is an access to object
X,
R6
simple;
the first
says that
with
the order
in which
consistent
each relation events
appear
in
/3, and the second says that whether or not one event affects another is determined at the time the second event occurs. The third condition describes a sense in which the relation Rp captures all the dependency i elationships between
events.
behavior
/3, then
The
condition
n cannot
implies
“know”
that
that
if m is not
@ occurred,
affected since
by C$ in some
~ cou!.~ also have
occurred in a different behavior in which + did not occur. The fourth condition is a technical condition that describes certain limitations on the pattern of information strated
flow.
It will
be needed
for the examples
for
in Section
our
later
proofs,
and
can be demon-
9.
When the particular family R = {Rp} of affects relations is understood, we often refer to each RB as an ajfects relation, and we often say “0 affects n in ~“
to mean
that
4.2. FAMILIES a family
of
● RD.
(+, T)
OF DIRECTLY-AFFECTS
affects
family
of
family
of relations,
smaller
is said
to be
relations
can
relations. one
Thus,
for
each
RELATIONS. be
let
B be
transitive
closure
of R~.
In this
many
cases
described
a basic
sequence
of directly-affects a family R = {RP} of affects relations
family
In
conveniently
system
~ of external
by and
R’
actions
of interest,
a generating = {R~}
be
of B. Then
a R’
relations for B, provided that there is a for B such that for each P, RB is the
case, we say that
R’
generates
R. Note
that
there is no requirement that the relations in R’ be minimal. When the particular family R’ = {R~} of directly-affects relations is understood, we often refer to each R~ as a directly -aflects relation; also, we often say that
“+
directly
4.3. USING AGEMENT
the
view
FAMILIES
is that abort
of
it could
get
subsequence that
affect
the
definition
to mean
OF AFFECTS The ensure
that
original
in that
idea
an event
Then
in a behavior
we
behavior of affects
TO
DESCRIBE
behind
show
the
that
it is not
all
resulting
relations,
and
affected
transaction we
events
does
MAN-
management
T is never
every
sequence
ORPHAN
orphan
an orphan;
containing
The
m) ● R~.
(4,
of a transaction
can
in which
behavior.
that
RELATIONS
intuitive
ancestor.
of a family
simply
of T and
gets take all
a
the
events
is a basic
behavior,
not
contain
an abort
can
see
inconsistent
by for
of T, by construction.
following
state
in
with
children
example
a system
the following
they an
of the them
an ancestor The
n in ~”
ALGORITHMS.
algorithms by
affects
without
illustrates orphan
how
an
orphan
TI and T2, both
of which
scenario:
accesses X through
T first
an
Suppose that T is a transaction are accesses to an object X. Consider
management.
the creation of Tz, which will access X again. requested only if TI completes successfully,
its child
Furthermore, and that TI
T1. T then
requests
assume that T2 is modifies X. Now
suppose that T aborts before Tz starts running at X, and X learns of the abort. If T1’s modification of X is undone when X learns that T aborted, then T, will not see the value for X that it expects, since T2 only runs if T1 has modified X successfully. This scenario is captured more precisely by the following fragment 12 Note that
$ can be either
a serial or a nonserial
event.
904
M. HERLIHY
of a schedule of a generic in Section 9.1):
system
CREATE(T) REQUEST.
CREATE(T1
)
CREATE(TL) REQUEST.
COMMIT(T1
, VI )
COMMIT(T1) REPORT. COMMIT(T1, REQUEST.
CREATE(TZ REQUEST_
two
in more
detail
OF(T) , V2 )
event lets X know that T has aborted.) The family of generic systems described in Section 9.1 ensures that the
event affects the INFORM_ABORT–AT(X) OF(T) event, and that at an object is affected by all prior events at the object. Thus, by Tz from running when its events would be affected by the abort of
an ancestor,
The
are described
V, )
) COMMIT(Tz
(The INFORM-ABORT affects relations for
5. Filtered
systems
CREATE(TZ)
ABORT(T) INFORM.ABORT_AT(X)
ABORT(T) an event preventing
(generic
ET AL.
we can prevent
it from
knowing
that
it is an orphan.
Systems orphan
management
different techniques. However, implements the same abstract For Sections
5 through
system B, together with directly-affects relations algorithms that exploit
algorithms
analyzed
in
this
paper
use
each can be proved correct by showing algorithm, described in this section.
8 of this paper,
we fix a particular
quite that
but arbitrary
it
basic
a family R of affects relations for B and a family R’ of for B, where R’ generates R. We describe several the properties of the affects relations to manage
orphans. In Section 9, we illustrate this general development specific basic systems and their affects relations.
by describing
two
One way of ensuring that actions of a transaction T are never affected by the abort of an ancestor of T is to add preconditions to all the actions of the basic database to permit actions of T to occur only if they would this way. It turns out, however, that this approach checks more
frequently
than
necessary.
In
this
section,
we
define
not be affected in for orphans much another
kind
of
that checks for orphans only when filtered system, REQUEST–COMMIT actions occur for access transactions. We then show that this is sufficient to ensure that transactions are never affected by the aborts of ancestors.
system,
called
a
We construct a filtered given family R of affects transaction database
automata automaton
automaton; it “filters” that any transaction, the basic system.
system based on the given basic system B and the relations. The filtered system consists of the given
from B and is obtained
a filtered database automaton. The filtered by slightly modifying the basic database
REQUEST_ COMMIT actions of access transactions orphan or not, sees a view it could see as a nonorphan
so in
Correctness
of (lphan
Management
5.1. THE FILTERED simple transformation the behaviors of the
Algorithms
905
DATABASE. The filtered database is obtained via a from the basic database. The only difference between two databases is that the new database only allows a
R13QUEST_COMMIT
of an access to occur
if it is not affected
by the abort
of
an ancestor. The state
filtered of the
where
database filtered
has the
database
basic_ state is a state
basic
actions.
Initial
equal
to an initial
states state
same
signature
has two
of the basic
database
of the filtered
of the basic
as the
components,
and history
database
database
basic
database.
basic_ state
The
history,
is a sequence
are those
and history
and
with
equal
of
basic–state to the
empty
sequence. A triple conditions
(s’, m,s) is a step of the filtered hold:
database
if and only if the following
(1) (s’.basic-state, n-, s.basic-state) is a step of the basic (2) s.history = s’.historym if n- is a basic action,l~ (3) s.history = s’.histo~ if m- is not a basic action, (4)
If n = REQUEST-COMMIT(T,
v) where
T’ is an ancestor of T, then object(~) = X in s’ history. Thus, occur,
at the point an explicit
same object LEMMA
jiltered
Let
6.
5.2. THE transaction
of the filtered = beh( /3 ).
filtered
SYSTEM.
and the filtered
schedules, behaliors,
and
The
behaviors
the
filtered
system implements
The mapping
LEMMA
above,
Let
8.
ellent
f that
database
is the
length the
First
database of
lemma
~. If
composition
automaton.
executions,
that can Ieaue the
of the
We call
filtered
its execu-
schedules,
assigns to each state s of the filtered
the filtered
is easily
and
P be a jiltered
behavior
(~,
performs
an explicit
note
that
Lemma
~
holds
13Recall that internal
The
is empty, for
P‘.
the
From
7 and proof
result the
test to ensure
by the abort of any actually guarantees
and let T be any transaction p)
=
p) = RP, for any ancestor
well-formed.
system the
seen to be a possibilities
❑
database
in B such that transaction
$ such that
PROOF. basic
to
at the
of the access.
that the REQUEST_ COMMIT of an access is not affected ancestor. The following key lemma shows that this test more: that a similar property holds for all events.
p be an elent
event
the basic system.
set f(s) consisting of s.basic_state Proposition 3 implies the result.
As described
no preceding
system
database
@ with
respectively.
The filtered
7.
PROOF.
schedule
Then s.histoly
X, and if
an event
of an access is about
that
of any ancestor
automata
tions,
singleton mapping.
~ be a jinite
affects
COMMIT
to verify
FILTERED
jlltered
event
the REQUEST_
by the abort
in states.
T is an access to object
no ABORTIT’)
is performed
is affected
database
LEMMA
where
test
database,
T‘
clearly
restrictions
the
Let
of T.
Proposition of
name.
T. Then there is no ABORT(T’)
5 imply
lemma
holds.
is by
Suppose
on affects
actions of the basic database are not classified
that
serial(~)
induction /3 =
relations,
~ ‘m, RF
as basic actions.
on and c
is the that
Rp ~ U
906
M. HERLIHY
{(0, m)l 4 is an action in P ‘}. Thus, lemma holds when p = m. Suppose
that
the lemma
by induction,
does not hold,
that
w in ~, where transaction(n) = T and T’ contradiction. We consider cases. (1) T
is a nonaccess
and
v
is an output
property of affects relations, n- such that ~ affects ~ in
it suffices
is, that
to show
that
~ = ABORT(T’)
is an ancestor
action
ET AL.
of
T.
there is an event /3’. This contradicts
affects
of T. We
Then,
the
derive
by the
a
fourth
* of T between + and the inductive hypothesis.
(2) T is an access to object X and w is a REQUEST_ COi14MITfor T. Then, by the fourth property of affects relations, there is an event Y of object X between ~ and m in ~‘ tion for m in the filtered (3)
such that database
m is CREATE(T). Then, is a REQUEST–CREATE(T)
~ affects is violated,
~ in ~‘. Then, a contradiction.
by the fourth property of affects relations, there event + between # and m such that +
affects * in ~‘. Since REQUEST_ CREATE(T) the inductive hypothesis implies that T’ is not The only possibility is that T’ = T, which implies REQUEST_ basic
CREATE(T)
database
in
well-formed,
P.
But
this
rr such that
~ affects
is an action of parent(T), an ancestor of parent(T). that ABORT(T) precedes
implies
rr is a nonserial ❑ diction.
5.3. SIMULATION following theorem system
ensures when
~ in /3’. Since transaction
behavior
~
discover
is
that
basic action.
OF
h
an orphan.
ment
algorithms.
( ~ ) = T“,
this contradicts
This
orphan, is the
gets
~ IT).
In
the
view
correctness
~ affects @ in hypothesis.
is undefined,
THE
FILTERED
SYSTEM.
that
a view
it could
get
in the
T’s
“view”
other it
words, sees
property
an
the
orphan
is consistent for
the
with
orphan
Let
y be the subsequence
of ~ containing
all actions
in
a
it
not
manage-
Let ~ be a jiltered behavior and let T be a transaction 9. there exists a basic behalior y SLLCh that T is not an oiphun in PIT.
PROOF.
basic cannot
THEOREM
Then ylT=
The filtered
It shows
a transaction
~‘.
a contra-
paper.
(Formally,
since basic
w such that the inductive
BY
of this
transaction
behavior,
), where T is a child of T. there is a REQI_JEsT_
transaction(m)
SYSTEM
result
an orphan.
local
is an
Then,
BASIC
key
every
is not its
being
THE
is the that
it
/? ) is not
hypothesis.
CllEATE(T, v) event + between ~ and Since transaction ~) = T, this contradicts
system
serial(
v), where T“ is a child of fourth property of affects v) event @ between ~ and
(5) T is a n.onaccess and ri is REPORT-ABORT(T” By the fourth property of affects relations,
(6)
that
a contradiction.
(4) T is a nonaccess and n- is REPORT-COMMIT(F3 T. Then T’ is an ancestor of T. By the relations, there is a REQUEST_ COMMIT(T”, the inductive
the precondi-
name. y and
n such that
transaction(w) = T, and all other actions @ that affect, in ~, some action whose transaction is T. Since RD is a transitive relation, y is RP-closed in ~. By the definition of a family of affects relations, y is a basic behavior. It suffices in
y.
to show that Suppose
not;
there that
is no ancestor is,
there
T’ of T for which
exists
an
ancestor
ABORT(T’ T’
of
T
) occurs for
which
Correctness
of Oiphan
ABORT(T’)
occurs
of T such
that
We obtain
PROOF.
Let
Theorem 8 and
of -y, ~ contains
~ affects
is
corollary
T in
an event
rr
8, this
is
P. By Lemma
~
be
a filtered
a behavior
~ IT =
/3 IT. Since
a
of Theorem
9.
serial
correct for nonorphans,
then the
correct.~4
9 yields
behavior
6 of the the
behavior
basic y
and
basic system
with
let
T be
system
is serially
-y IT =
any
such
8 IT;
transaction
that
T is not
correct
this
is
for
name. an orphan
nonorphans,
equal
to
~ IT,
as
❑
needed.
At first,
it might
seem somewhat
REQUEST–COMMIT for all orphans. of the indicate event
event
If the basic system is serial~
10.
system is serially
there
907
by the construction
an ABORT(T’)
an important
COROLLARY
in
in y. Then
Algorithms
❑
impossible.
filtered
Management
events
The reason
surprising
of orphan
that
it is enough
accesses to ensure
it is not necessary
to filter
to prevent
serial
other
the
correctness
actions
is because
assumptions about affects relations. Essentially, these assumptions that an event, say CREATE(T), cannot “know” about an ABORT(T’) unless
“knows”
some earlier
about
event,
in this case REQUEST_
the ABORT(T’).
an affects-closed is not affected
(The
assumption
CREATE(T),
about
affects
subsequence of a behavior is itself a behavior by ~, n cannot know that @ has occurred,
already
relations
that
implies that if n since there is a
behavior of the system in which m occurs and @ does not.) If we assume inductively that REQUEST_ CREATE(T) does not know about the abort of an ancestor
of T, then
CREATE(T)
will
The assumptions
the properties
not know about
assumed
about affects
for affects
relations
guarantee
that
it either. relations
derive
from
the restricted
communi-
cation patterns in typical systems: A nonaccess transaction receives information from its parent when it is created, and from its children when they report, but not from any other source. Access transactions may receive information from other accesses (e.g., accesses to the same object share state), but can only affect nonaccesses through a REPORT_ COMMIT event, which must be preceded by a REQUEST–COMMIT. As long as T does not receive reports from
any accesses that
a state
that
COMMIT
depends
actions
“know”
that
on
abort.
the
for orphan
its ancestor In
effect,
accesses, we isolate
has aborted, by
T cannot
preventing
orphan
observe
REQUEST_
transactions
from
the
objects, ensuring that an orphan transaction never sees that it is an orphan. Sometimes we want to be explicit about the dependency of the filtered system on the particular basic system and family of affects relations it is derived. Thus, we restate the corollary above in a form that dependency. then
If B is a basic
let Filtered(B,
system
and R is a family
R) be the corresponding
the same transaction automata the given basic database.
and the
filtered
filtered
of affects system;
database
from which exhibits this
relations
for B
it is composed
that
corresponds
of to
14It should be easy to see that the filtered system is also a basic system, and so the notion of serial correctness has been defined for filtered systems. Similar comments hold for the rest of the systems described in this paper.
M. HERLIHY
908 COROLLARY
6. Argus In that
comect for nonorphans,
section, system
we
describes
in
database,
simple
construction.
transactions
the
formal the
an
filtered
transactions.
tributed
system.
aborts
access
that
track
of
In
the
actual
sent
from
an
over event
directly-affects implement.
is serially
for
con-ect.
database,
the
Argus
by
the
that
[10].
basic
As
with
the
database
which
the
system
via
is composed
Argus
system
is serially
a of
imple-
correct,
ensure
abort
of
by
if
each later
that make
event
the
so is
ancestor,
the occurs,
h affects.
about
aborts
formally
directly
in
about of an
algorithm
and
we
affects
keeps
propagates
is propagated
ajj$ects. A
poor
describe another
of
a dis-
knowledge
by propagating
algorithm
@ directly
practical
Argus
that
this
of
actions
use of local
that
it
knowledge
REQUEST–COMMIT
event
the
global
is not
events
model
event
one
makes that
knowledge we
could
knowledge
an
uses
IIEQUEST_COMMIT
algorithm
system,
every
database
the
global
to any
network;
filtered
To
“known”
relation However,
show
filtered
filter
of
occurred.
Argus
to
The to
kind
an event
the
the
in the
system,
in the
system.
actions
Thus,
Argus
and
if
from
used
an Argus database
discussed
is obtained the
algorithm
by defining
algorithm
define
Thus.
This
aborts from
sages
of
affected
the
knowledge
Argus
have
is not
the
IIATABASE.
history
access the
of affects relations
B, R]
management
the algorithm
database
then
Argus
6.1. THE ARGUS entire
orphan
terms
system.
corresponding
the
the
Argus We
and
the
analyze
[9, 10]. We describe
filterecl
ments
then Filtered(
$stems
this
Argus
Let B be a basic system and R a famih
11.
B. If B is serially
ET AL.
this in
mes-
knowledge choice here
event
of hard
w
a to
if
only
the two events occur at the same site, or if a message is sent from ~’s site to n-’s after ~ occurs and before n occurs, then it is straightforward to implement the algorithm using information available the information about aborted transactions The construction of the Argus database
locally at each site, by transmitting on messages sent between sites. makes use of the given family R’ of
directly-affects relations for B. The Argus database has the same signature as the basic database. The state of the Argus database has three components: basic_ state, histo~, and known–aborts, where basic_ state is a state of the basic database, history is a sequence of basic actions, and known–aborts is a partial mapping from basic transactions whose
events aborts
to sets of transactions. This mapping records affect each event that has occurred. The
the set
known_ aborts(w ) may actually include more transactions than those whose aborts affect ~. By adding more aborted transactions to this set, an implementation would restrict the behavior of orphans further than is strictly necessary to ensure the correctness conditions. In Argus, for example, each event occurs at some physical node of the network, and each node manages a sipgle set of aborted transactions for the entire set of events that occur at that node. In this case, known_ aborts(~ ) includes at least the transactions whose aborts affect T, as well as those transactions whose aborts affect any other event that has occurred at the same physical node as n. Initial states of the Argus database are those with basic–state equal to an initial state of the basic database, history equal to the empty sequence, and known_
aborts
everywhere
undefined.
A triple
(s’, T,s)
is a step of the Argus
Correctness database
of O@zan if and only
(1) (s’.basic-state, (2)
Management
s.history
if the following
conditions
m-, s.basic_state)
= s’.historym,
909
Algorithms hold:
is a step of the basic
if T is a basic
database,
action,
(3) s.history = s’.history if T is not a basic action, (4) If m is a basic action and (+, m) = R~,~,,,o,Y then
s’ .known-aborts(
d) c
s.known_aborts(m), (5) If (6)
+ # n, then
s.known_aborts(~)
= s’.known-aborts(
~),
If rr is a basic
action and + is an ABORT(T) event in s’.history such that then T = s.known–aborts(m ), (+ T) c Rim,.,,, (7) If m is a REQUEST_ COMMIT(T, v) action for an access T to object X, then there is no ancestor of T in s’ .known–aborts( +), for any event 4 of object X in s‘ history. There basic
are two
database.
require
significant First,
s.known_aborts(~
directly
affects
requires
T
constraints ABORT(T)
differences
the
effects ) to
r. In addition,
to
be
in
between
for
each
in
include
s’ .known_aborts( m that is directly
to ensure
Second, the precondition permits the event to occur
n
an event
known–aborts(~
are enough affects n.
the Argus
action
for only
). As
that
database
the +)
Argus for
affected
Lemma
each
~
that
by ABORT(T)
16 below
s.known_aborts(m-)
and the database
contains
shows,
these
T whenever
the R13QUEiST_COMIVIIT of an access to X if the access does not “know about” the abort
of an ancestor, that is, no ancestor is in s’ .known_aborts( ~) for any event ~ of object X in s’ history. As Lemma 17 below shows, this is enough to ensure that every Argus The
behavior
known_
by the Argus
is a filtered
aborts
mapping
algorithm
behavior.
models
to keep
track
the distributed
information
of actions
abort.
that
maintained
However,
rather
than modeling nodes directly and keeping the information on a per-node basis as is done in the actual algorithm, we maintain the information for each event, propagating it whenever one event directly affects another. The known_ aborts component is managed so as to ensure that at least the minimum
amount
implementation
of
necessary
is permitted
stance.
an implementation
coarser
granularity.
information
to propagate might
keep track
03y maintaining
than
of the Argus
type follows
In describing
the
of the known_
the known–aborts
basis, the implementation this strategy.)
is propagated more
algorithm
at each
step.
minimum;
for
aborts
mapping
in the current
the algorithm,
mapping
An inat a
on a per-node Argus
we have tried
prototo focus
on the behavior necessary for correctness, and to avoid constraining an implementation any more than necessary. Notice also that the Argus database does not put any upper limit on what goes into the known–aborts known_ aborts(~ ) to contain
mapping. For example, it is permissible for a transaction that has not aborted. This might
cause nonorphans to block, but will otherwise not result in incorrect behavior. It would be easy (and intuitively appealing) to add a requirement that known_ aborts(w) only includes aborted transactions, but this is not necessa~ for proving that the algorithm To prove other properties, we would
prevents all orphans from such as that the algorithm
need to add additional
We do not attempt
requirements
to state or prove
seeing inconsistent views. only detects real orphans,
such as the one just mentioned.
such properties
in this paper;
the property
M. HERLIHY ET AL.
910
just described is a special case of more general liveness properties, which are an appropriate subject of further research. Finally, while the Argus database is distinguished from the filtered database by maintaining information about ABORT actions on a more local basis, we have kept the state component s.history, which maintains global knowledge of the past behavior of the system. A practical implementation of this algorithm would maintain less voluminous history information in a distributed fashion. Examination of the additional preconditions imposed by the Argus database on any
action
directly
n
affect
reveals
access T to object of efficient
that
s.history
T, and when
X, to determine
maintenance
is used the events
of sufficient
particular basic database further in this paper.
to
determine
the
n- is a REQUEST–COMMIT(T,
and
6.2. THE ARGUS SYSTEM.
that
history
The
Argus
precede
information
distribution
scheme,
system
events
~
v) action
that
for
an
n at X. The details are dependent
and
are
not
is the composition
on the
addressed
of transac-
tions and the Argus database. Executions, schedules, and behaviors of the Argus system are called Argus executions, Argus schedules, and AGUS behauiors, respectively. LEMMA
The Argt{s
12.
PROOF.
The
LEMMA
proof
Let
13.
system implements
is similar
to that
~ be a finite
Argus
in state s. Then s.krlowtl_aborts(w The
following
once during LEMMA
database
lemma
an Argus 14.
Let
The next
behavior
) is defitled
says that
~‘ p be a finite (s’,
~,s)
v ) is dejined.
lemma
of Lemma
each
that can leave the Argus if and only if v
known–aborts
says that
Argus
schedule,
15.
Let
and
r
database
is an elent
set is defined
is an extended
where
the known_
aborts
~ be a finite
Argus
in ~.
at most
can leave the Ajgus database.
If
) = s.known-aborts(v
set for
behalior
~‘
step of the Argus
then s‘ .knowtz–aborts(rr
events that directly affect m-, and that the known_ m is directly affected by an ABORT(T) event. LEMMA
❑
7.
execution.
in state s‘ and
s‘ .known_abotts{
the basic system.
an event
aborts
T includes
set for w includes
that can leave the Argus
). all T if
database
in state s. (1)
If
4
are events
s.known–aborfs(
(2)
If w is directly aborts(w). PROOF.
Lemmas
affected
Immediate 13 and 14.
The next lemma It says that
in
P such
that
c+) G s.krzown_abo?fs(
by
$
by an ABORT(T)
the
directly
affects
w
in
~,
then
m).
definition
elent
of
the
in
~, then
Argus
T G s.known_
database
steps
and
❑
is the key to the proof
the known–aborts
such that an ABORT(T) event propagates enough information about” (stores in s.known_aborts(
set for
of correctness
an event
for the Argus
m includes
system:
all transactions
affects T. In other words, the Argus database about aborts so that every event w “knows m-)) every abort that affects it.
T
Correctness
of O~han
LEMMA
Let
16.
in state s. If
p be a finite
~ and
ABORT(T)
euent,
PROOF.
The
Algorithms
Argus
911
behavior
that can leave the Argus
T are euents in ~ such that
~ affects
database
T in /3 and
~ is an
then T c s.known_abotis(~).
proof
directly-affects 1, then
Management
proceeds
relation
@ directly
by
by which
affects
induction
on
~ affects
m in
the
m in
~. Then,
length
of the
~. If the
Lemma
length
15 implies
chain
in
of the
that
the
chain
is
T E s.known_
aborts(v).
Now
suppose
that
the length
of the chain
is k + 1, where
k >1.
Then
there
is an event ~ in ~ such that @ affects ~ in ~ by a chain of length at most k and Y directly affects T in /3. By inductive hypothesis, T ~ s.known-aborts( +). Lemma 15 implies that s.known–aborts( ~) c s.known–aborts(n), so that T ~ ❑ s.known_aborts(n ). 6.3. SIMULATION ing
lemma
precondition ensure
LEMMA
the
same
Argus Argus is
except to
m- =
of
a mapping
we
omission
state.
We
must
G f(s’)
the
Argus
claim
element
f that
the
t‘
that
(t’,
the
consisting
for For
known_ actions
implements
of
the
that
system.
The
each
state
only
interesting T is an
of the
Argus that
component mapping.
filtered
to
system,
to
the
Condition and
an object
of the (s’,
check
where
is
of
state
system, case
access
filtered
the
system
s’ is a reachable
of the
to
s of
filtered
s.known_aborts
state
the
enough
system,
of the
that
is
followwith
system.
f is a possibilities
v), where m, t) is a step
to
The
combined
accesses,
filtered
state
2, suppose
is a reachable
for
assigns
SYSTEM.
aborts,
the filtered
of the
show
condition
REQUIX+lT_~OMMIT(T,
case,
BY THE ARGUS in
system implements
set f(s)
check.
system,
a step
system
define
singleton
database
1 is easy
SYSTEM
information
REQI_JEST_COMMIT
The Argus
We
the
the
Argus
17.
PROOF. the
that
on
that
system
OF THE BASIC
shows
m,s)
is when X.
In
t is the
this
single
of f(s).
To show that conditions
(t’, r, t) is a step of the filtered
defining
these
steps. The
first
three
system,
we must
are immediate
show the four
from
tion of the steps of the Argus database. To see the fourth, suppose ancestor of T. We must show that no ABORT(T’) event affects
the definithat T’ is an an event of
object X in t‘ history. Suppose the contrary, that an ABORT(T’) event ~ affects an event * of object X in t‘ history. Then, + also affects @ in s’.history. By Lemma 16, T’ = s.known_aborts( ~). But this violates the precondition for n in the Argus The
database.
following
Therefore,
theorem
shows
m is enabled that
ensure that every nonaccess transaction in which it is not an orphan. THEOREM
Then
there
18.
Let
~ be an Argus
exists a basic
behavior
Argus
in t‘.
ID
systems,
like
gets a view it could
behavior
y such
filtered
systems,
get in an execution
and let T be a transaction
that
T is not
Theorem
9.
an orphan
name.
in
y and
about
serial
ylT=~lT, PROOF.
As
for
correctness.
Immediate
the
filtered
by Lemma
system,
17 and
we obtain
an important
❑
corollary
M. HERLIHY ET AL.
912 COROLLARY
Argus
If the basic system is serial~
19.
system is serially
Again,
correct for nonoiphans,
then the
correct.
we give a version
of the preceding
corollary
in which
the dependency
of the Argus system on the basic system is made explicit. If B is a basic system and R’ is a family of directly-affects relations for B, then let Argus(B, R’) be the corresponding Argus system; it is composed of the same transaction automata and the Argus database that is constructed from the given basic database,
using
COROLLARY
family
Suppose
20.
affects relations serially
the given
of relations
R’.
B is a basic system and R‘
for B. If B is serially
is a family
correct for non orphans,
of directly-
then Argus( B, R‘)
is
correct.
7. Strictly
Filtered
Systems
The orphan management algorithm described in [17] actually ensures a stronger property than does the Argus algorithm. It ensures that REQUEST–COMMIT can never occur for an orphan access, whereas the Argus algorithm merely can occur if the access can ensures that no such REQUEST_ COMMIT “observe”
that
database,
which
no ancestor
it is an orphan. allows
In
this
a REQUEST_
has aborted.
(Compare
section,
we define
COMMIT
to occur
this to the filtered
an access to REQUEST_ COMMIT access is not affected by the abort.)
if an ancestor We then define
which is composed of transactions and the strictly that the strictly filtered system implements the section, we describe formally the algorithm from ments
the strictly
7.1. THE similar triple
STRICTLY
to the (s’,
following
filtered
filtered
rr, s)
FILTERED
is
a step hold.
database,
(3) s.history
= s’ history
of
DATABASE. it has
the
The
the
strictly
if n is not
same
LEMMA PROOF.
if
allows
strictly
filtered and
database
event
the if
database same
and
states.
only
if
is A the
database,
action,
v) where
Thus, at the point where the REQUEST_ occur, an explicit test is performed to verify any ancestor of the access. 7.~. THE
which
filtered controller, and show filtered system. In the next [17] and show that it imple-
actions,
filtered
a basic
If m- = REQUEST–COMMIT(T, ancestor of T, then no ABORT(T’)
composition schedules, executions,
filtered
has aborted as long as the the strictly filtered system,
(1) (s’.basic–state, m-, s.basic_state) is a step of the basic (2) s.history = s’.historyn if n- is a basic action, (4)
strictly
an access only
system.
database;
conditions
the for
T is an access, occurs in s’ history.
COMMIT that there
and
T’
is an
of an access is about is no preceding abort
STRICTLY
to
of
FILTERED SYsTE~. The strictly jlltered system is the of transactions and the strictly filtered database. Executions, and behaviors of the strictly filtered system are strictly filtered schedules, and b~haliors, respectively. 21. The
The strictly filtered proof
is similar
system implements to that
of Lemma
the basic ~stem. 7.
❑
Correctness
of Orphan
7.3. SIMULATION LEMMA PROOF.
We
that
f
condition t’
define
system
show
G f(s’)
BASIC
singleton
a
possibilities
is
that
is a reachable filtered
we
unique
claim
that
element
To
show
of the
m, t)
(t’,
these
tion
of
the
T’
is an ancestor
steps
of
the
state
s of
same
state.
the
the
system,
only
of
the
easy
and
(s’,
access
n-,s)
to
strictly
We
must
check.
For
filtered
case
filtered
the
to
strictly
interesting
T is an
a step
X
tions
for
the
strictly
Therefore,
no
Thus,
filtered
first
system,
is a step
to
check
of the
is when
an object
system,
But
that
X.
where
we
must
immediate To
see
=
the
s’ history, no
event
In
this
t is
the
and
an
the
the
fourth,
affects
by the
of
an
precondi-
event
event
four
definisuppose
event
ABORT(T’)
affects
show from
no ABORT(T’)
database,
ABORT(T’)
is enabled
are
database.
show t‘ history
filtered
system,
three
filtered must
in t‘ history.
~
of the The
strictly
of object in
steps.
of T. We
event
t ‘history.
the
m, t) is a step
that
m
is
each
1 is of
SYSTEM
system.
of
state
FILTERED
the filtered
Condition
filtered
before,
to
consists
v), where
(t’,
defining
s‘ .histo~.
assigns
that
s‘ is a reachable
As
STRICTLY
of f(s).
that
conditions
f that f(s)
mapping.
state
system.
BY THE
system implements
set
REQUEST–COMMIT(T,
case,
SYSTEM
a mapping
the
2, suppose
strictly m =
OF THE
913
Algorithms
The strictly filtered
22.
filtered
Management
occurs object
in
X
in
❑
in t‘.
Strictly filtered systems, like filtered systems orphans from discovering that they are orphans.
and
Argus
systems,
prevent
name.
23. Let ~ be a strictly filtered behavior and let T be a transaction Then there exists a basic behal)ior y such that T is not an orphan in y and
ylT=
~lT.
THEOREM
PROOF.
Immediate
COROLLARY
by Lemma
Theorem
If the basic system is serially
24.
strictly filtered
22 and
system is serially
filtered
COROLLARY
nonorphans 8. Clocked
database
that
correct for nonophans,
then the
be the corresponding transaction automata
strictly and the
correct.
If B is a basic system, let Strictly-Filtered(B) filtered system; it is composed of the same strictly
❑
9.
corresponds
to the given
25. Suppose that B is a basic system. then Strictly-Filtered(B) is serially correct.
basic
database.
If B is serially
correct for
Systems
In this section, we describe formally the orphan management algorithm from [17]. We do this by defining the clocked database, which uses a global clock to ensure that transactions do not abort until all their descendant accesses have stopped running. We then define the clocked system, which is composed of transactions and the clocked database. Finally, we show that the clocked system implements the strictly filtered system, and thus simulates the basic system in the same way as the previously 8.1. THE time
for
access has
not
CLOCRED
each
transaction passed.
access
mentioned
DATABASE.
The
transaction
is allowed Release
times
and
systems clocked
a release
to REQUEST_ are
chosen
do.
database time
for
COMMIT so that
once
maintains every only
a quiesce
transaction. if its
quiesce
a transaction’s
An time release
M. HERLIHY ET AL.
914 time
is reached,
all
its
descendant
accesses
have
quiesced.
A
transaction
is
allowed to abort only if its release time has passed. This ensures that, after a transaction aborts, none of its descendant accesses will request to commit. If quiesce and release times are fixed in advance, some transactions may be forced to abort unnecessarily as their quiesce times expire, and aborts may need to be delayed until release times are reached. It is possible to obtain extra flexibility
by providing
actions
and release times. The action signature
in the
of the clocked
clocked database
database, except that the clocked database internal actions. The new actions are: Internal
database
for
adjusting
is the same as that has
three
quiesce
of the basic
additional
kinds
of
Actions:
TICK ADJUST. ADJUST_ The
TICK
QUIESCE(T), RELEASE(T), action
T an access T any transaction
advances
the clock,
while
the two ADJUST
actions
adjust
quiesce and release times. By adjusting the quiesce time for a transaction to be later than its current value, we cad extend the time during which a transaction is allowed to run. Similarly, by adjusting the release time for a transaction to be earlier
than
its
current
value,
we
waiting as long as would otherwise The state of the clocked database clock,
quiesce,
and
release.
Here,
can
allow
a transaction
be necessary. consists of components basic_ state
is a state
to
abort
basic–state, of the basic
without aborted, database,
initialized at an initial state of the basic database. The component aborted is a set of transactions, initially empty. The component clock is a real number, initialized arbitrarily. The component transaction names to real numbers, mapping from all transaction names
quiesce is a total mapping from access and the component release is a total to real numbers. The initial values of
quiesce and release are arbitrary, subject to the following condition: for all transaction names T and T’, where T is an access and T’ is an ancestor of T, quiesce(T) A triple conditions
< release. (s’, n,s) hold:
is a step of the clocked
(1)
If m is an action of the basic system, is a step of the basic database,
(2)
If m is a TICK, then s.basic–state
(3)
database
then
ADJUST_ QUIESCE, = s’ .basic_state,
If n = ABORNT),
if and only if the following
(s’. basic_ state, n-, s.basic_state) or ADJUST_
RELEASE
action,
then
(a) s.aborted = s’.aborted u {T}, (b) s’:release(T) s s’.clock, (4)
If m 1s not an abort
action,
then
(5) If T = REQUEST-COMMIT(T, s’.quiesce(T), (6)
If m = TICK,
(7) (8)
If If (a) (b) (c)
then
s’.clock
s.aborted v) where
= s’.aborted, T is an access, then
< s.clock,
w is not a TICK action, then s.clock = s’.clock, n- = ADJUST-RELEASE(T), then if T e s’.aborted, then s.release(T) < s’.clock, s’.quiesce(T’) s s.release(T) for all T’ = descendants(T) s.release(T’)
s’.clock
= s’.release(T’)
for all T’ + T,
n
accesses,
s’.interval(T’) for every Effect: s.interval
= s’.interval
u
INFORM_ TIME_AT(X)OF(T, Precondition: (T, P) E s’.interval P = ‘m,.
T’ in siblings(T)
n
domain(s’
{(T, P)} p), T an access to X
interval)
have are
Correctness
of O~han
The following LEMMA
(1)
Management
lemma
Algorithms
927
is straightforward.
39
Let
T be any
transaction
transaction
name.
well- formedness
for
Then
the pseudotime
(2)
Let ~ be any object name. Then the pseudo~ime time object well- formedness for X,
(3)
The pseudotime
9.2.4. strongly
Pseudotime compatible
names with
and
the
associated
name
with
preserves
set {PC}
(for
presemes
presezes
pseudo-
well- fonnedness.
database is the composition of a by the union of the set of object
“pseudotime
X is a pseudotime
the name
controller
basic database
Database. A pseudotime set of automata indexed
singleton
each object
system
controller
controller
T,
object
controller”).
automaton
PC is the pseudotime
Associated
Px for X. Finally,
controller
automaton
for
the
type.
LEMMA
40.
name X,
If
is a behavior
~
/3 IX is pseudotime
LEMMA
of a pseudotirne
object
Let B be a pseudotime
41.
database,
then for every object
well-formed. database.
Then B preserves
basic database
well- formedness. It follows 9.2.5. strongly
the pseudotime
Pseudotime compatible
nonaccess {PC}.
that
automaton object
for
When utions,
Px
9.2.6.
A Fami~ relations
defining transitive
with
Finally,
of a basic
of Affects
names,
transaction each
object
associated
for the system
pseudotime behaviors,
database.
system
the
singleton
set
name
T is a transaction
name
X is a pseudotime
with
the
name
PC
is the
type.
is clear from
the pseudotime
and
of a set of
context,
executions,
we call its exec
pseudotime
sched-
respectively.
Relations.
any particular
Now
we define
pseudotime
system
a family B. We
R = {RD} do this
of
by first
a family R’ = {R~} of directly-affects relations for B, and then taking closures. For a sequence ~ of pseudotime actions, define the relation
R~ to be the relation before
X.
and behaviors
for
set of object
nonaccess
automaton
the particular
ules, and pseudotime
affects
for
controller
schedules,
the
each
T. Associated
automaton
pseudotime
names,
with
AT
is an example
Systems. A pseudotirne system is the composition set of automata indexed by the union of the
transaction
Associated
database
containing
n- in ~, and at least
—transaction(
the pairs
(~, m-) of events
one of the following
+) = transaction(m)
and
n
such that
$ occurs
holds:
is an output
event
of the
transac-
tion, —object( –~
@) = object(m)
and T is a REQUEST–COMMIT(T,
is a REQUEST.CREATE(T) event,
—~ —~ —+ —~ —+ —+
is is is is is is
and
v) event,
T an ASSIGN-PSEUDOTIME(T,
an ASSIGN. PSEUDOTIME(T, P) and T a CREATE(T) event, a REQUEST-COMMIT(T, v) and m a COMMIT(T) event, a REQUEST. CREATE(T) and m an ABORT(T) event, an ASSIGN-PSEUDOTIME(T, P) and w an ABORT(T) event, a COMMIT(T) and m a REPORT_ COMMIT(T, v) event, an ABORT(T) and ~ a REPORT-ABORT(T) event,
P)
928 —+ —~
M. HERLIHY
is a COMMIT(T) is an ABORT(T)
and rr an INFORM-COMMIT.AT(X)OF(T) and n an INFORM-ABORT-AT(X)
—~ is an ASSIGN. TIME–AT(X)OF(T, once
again,
define
easy to see that {RP}
defined
is a family
If
p is a behavior
LEMMA
of affects
{RP}
Applying Orphan easy corollary from
and
nono~hans,
of Lemma
is a family
n
relations
an
INFORM_
closure
We
claim
of R~. It is
that
the family
for G. system B and y is an R@-closed
❑
36.
of affects relations
Algorithms the Argus
be used for
then the following
order.
relations
system,
defined
for B.
to Pseudotirne Systems. The algorithm and the clocked
a pseudotime
Let B be a pseudotime
44.
directly-affects
and
of B.
Management shows that
[17] can both
event
of pseudotime
to the proof
The family
COROLLARY
affects
partial
of ~, then y is a behalior
43.
algorithm
P)
event, event, or
OF(T)
RP to be the transitive
is an irreflexive
Analogous
PROOF.
9.2.7. following
the relation
R6
above
42.
LEMMA
subsequence
PSEUDOTIME(T, p) event.
ET AL.
aboue.
system:
and R and R‘ If
B
the family
is serially
cowect
of for
are true:
—Filtered( B, R) is seriallj correct, —Argus( B, R‘ ) is serially correct, —Strictly-Filtered( —Clocked(
B)
B) is serially is serially
correct,
correct.
10. Conclusions We have defined correctness properties for orphan management algorithms, and have presented precise descriptions and proofs for two algorithms from [10] and [17]. Our proofs are quite simple, and show that the systems exhibit a substantial
degree
of modularity:
the
orphan
management
algorithms
can be
used in combination with any concurrency control protocol (in basic system form) that is serially correct for nonorphans. The simplicity of our proofs is a direct result of this modularity, and is in sharp contrast to earlier work [6], in which were
the orphan not cleanly
management
algorithm
and the concurrency
control
protocol
separated.
Our proofs have an interesting structure. We first define a simple abstract algorithm that uses global information about the history of the system, and show that it ensures that orphans see consistent views. We Argus algorithm and the clocked algorithm in a way that only local information, and show that each simulates algorithm. The simulation proofs are quite simple, and do ing the properties already proved for the abstract algorithm. the Argus
and clocked
algorithms
the abstract algorithm. In this paper, we have transactions. eliminating complicated whether
the
then
analyzed
follows
only
directly
orphans
from
that
then formalize the requires the use of the more abstract not require reprovThe correctness of the correctness
result
from
aborts
of of
Interesting algorithms have also been developed for detecting and orphans arising from crashes [10, 17]. These algorithms seem more than the algorithms for handling aborts. An open question is known
algorithms
for
handling
crash
orphans
can
be analyzed
Correctness
of O~han
Management
using techniques similar find a similar separation
929
Algorithms
to those in this paper. In particular, of concerns for those algorithms,
orphan algorithms can be understood protocols and abort-orphan algorithms.
it would so that
be nice to the crash-
independently of concurrency Whether this will be possible
control is still
unknown.
ACKNOWLEDGMENT. for their
We
comments
thank
on earlier
Alan
Fekete,
versions
Ken
Goldman,
and Sharon
Perl
of this work.
R13FERENCES 1. ALLCHIN, J. E. An architecture for reliable decentralized systems. Tech. Rep. GIT-ICS83/23. Georgia Institute of Technology, Atlanta, Ga., Sept. 1983. M., AND WEIHL, W. A theory of timestamp2. ASPNES, J., FEKETE, A., LYNCH, N., MERRIIT, of the 14th International based concurrency control for nested transactions. In Proceedings Conference on Very Large Data Bases (Aug.), Morgan-Kaufmann, San Mateo, Calif., 1988, pp. 431-444. and Reco[ery in 3. BERNSTEIN, P., HADZILACOS, V., AND GOODMAN, N. Concurrency Control Database Systems. Addison-Wesley, Reading, Mass., 1987. Commutatk@-based locking for 4. FEKETE, A., LYNCH, N., MERRITT, M., AND WEIHL, W. nested transactions. J. Comput. Syst. Sci. 41, 1 (Aug. 1990), 65–156. 5. GOLDMAN. K., AND LYNCH, N. Quorum consensus in nested transaction systems. In Proceedings of 6th An?u~al ACM Symposium on Principles of Distributed Cornpzdation (Vancouver, B. C., Canada, Aug. 10– 12). ACM, New York, 1987, pp. 27–41. (Expanded version available as Tech. Rep. MIT\ LCS/TM-390. Laborato~ for Computer Science. Massachusetts Institute of Technology, Cambridge, Mass., May 1987.) Internal consistency of a distributed transaction system with orphan detection. 6. GOREE, J. A. Tech. Rep. MIT/LCS\TR-286. Massachusetts Institute of Technology, Cambridge, Mass., January 1983. M., LYNCH, N., MERRITT, M., AND WEIHL, W. On the correctness of orphan 7. HERLIHY, of the 17th International ,$ymposiam on Fault-Tolerant elimination algorithms. In Proceedings Computing (Pittsburgh, Pa., July). IEEE, New York, 1987, pp. 8–13. (Extended version available as MIT/LCS/TM-329, Laboratory for Computer Science. Massachusetts Institute of Technology, Cambridge, Mass., May 1987.) M. P., AND WING, J. M. Inheritance of synchronization and 8. DETLEFS, Il. L., HERLIHY, recovery properties in avalon/C + +. IEEE Cornput. 21, 12 (Dec. 1988) 57–69. and actions: Linguistic support for robust, 9. LISKOV, B., AND SCHEIFLER, R. Guardians distributed programs. ACM Trans. Prog. Lang. Syst. 5, 3 (July, 1983), 381-404. Orphan detection. In Proceedings 10. LISKOV, B., SCHEIFLER, R., WALKER. E. F., AND WEIHL, W. of the 17th International Symposium on Fault-Tolerant Computing (July). IEEE, New York, 1987, pp. 2–7. AdL. Comput. Res. 3 Concurrency control for resilient nested transactions. 11. LYNCH, N. A. (1986),
335-373.
12. LYNCH, N. A., AND MERRITT, M. Comput
Sci. 62 (1988),
Introduction
to the theory of nested transactions.
Theoret.
123-185.
to the theory of nested transactions. In ProceedTheory (Rome, Italy, Sept.). 1986, pp. 278–305. 14. LYNCH, N., MERRITT’, M., WEIHL, W., AND FEKETE, A. Atomic Transactions. MorganKaufmann, San Mateo, Calif. To appear, Fall 1992. Hierarchical correctness proofs for distributed algorithms. In 15. LYNCH, N., AND TUTTLE, M. Proceedings of the 6th ACM Symposium on Principles of Distributed Computing (Vancouver, B. C., Canada, Aug. 10–12). ACM, New York, 1987, pp. 137–151. (Expanded version available as Tech. Rep. MIT\ LCS/TR-387. Laboratory for Computer Science. Massachusetts Institute of TechnoloM, Cambridge, Mass., April 198’7.) An introduction to input/output automata. CW7 Quarter@ 2, 3 16. LYNCH, N., AND TUTTLE, M. 13. LYNCH, N., AND MERRHT, ings of the International
(1989),
Introduction
on Database
219-246.
17. MCKENDRY, Softw.
M.
Conference
Eng.
M.,
AND HERLIHY, 1989).
15, 7 (July,
M.
Timestamp-based
orphan
elimination.
IEEE
Trans.
930
M. HERLIHY
ET AL.
Nested Transactions: An Approach to Rehable Distributed Computmg. MIT 18. MOSS, J. E. B. Press, Cambridge, Mass., 1985. 19. NELSON, B. J. Remote procedure call. Tech. Rep. CMU-CS-81-119. Dept. Computer Science. Carnegie-Mellon Univ., Pittsburgh, Pa., May 1981. 20. PERL, S. Distributed commit protocols for nested atomic actions. Tech. Rep. MIT/LCS/TR-431. Massachusetts Institute of Technology, Cambridge, Mass., Sept. 1987. 21. Pu, C. AND NOE, J. D. Nested transactions for general objects: The Eden implementation. Tech. Rep. TR-S5-12-03. Dept. of Computer Science, Univ. of Washington, Seattle, wash., December 1985. System level concurrency cOntrol for ZZ. ROSENKRANTZ, D. J., STEARNS, R. E., AND LEWIS, P. M.
distributed database systems. ACM Tram. Datab. Syst. 3, 2 (June 1978), 178-198. A., ANDSWEDLOW,K. Galde to the Camelot Distnbated Transaction Facd@: Release 23. SPECTOR, 1. Carnegie-Mellon Univ., Pittsburgh, Pa., Oct. 1987. 24. WALKER, E. F. Orphan detection in the argus system. Tech. Rep. MIT/LCS/TR-326. Massachusetts Institute of Technology, Cambridge, Mass., May 1984. W. E. Specification and implementation of atomic data types. Tech. Rep. 25. WEIHL, MIT/LCS\TR-314. Massachusetts Institute of Technology, Cambridge, Mass., 1984. Modular concurrency control for abstract data 26. WEIHL, W. E. Local atomicity properties: types. ACM Trans. Prog. Lang. Syst. 11, 2 (Apr. 1989), 249–283. RECEIVED
JUNE 1987;
REVISED MARCH 1991 ; ACCEPTED APRIL 1991
JOurnd of the AssocIdmII for Cmnputlng hidchl~~m, Vd
39, No 4, oct~bc~ 1992