On the correctness of orphan management algorithms - Research - MIT

2 downloads 0 Views 4MB Size Report
WILLIAM. WEIHL. Massachusetts. Institute of Technology,. Cambridge, ... tory, 1 Kendall Square, Cambridge, MA 02139; N. Lynch, Massachusetts. Institute.
On the Correctness

MAURICE Digital

Co~oration,

Algorithms

Cambridge,

Massachusetts

LYNCH

Massachusetts

Institute

MICHAEL

MERRITT

AT&

Management

HERLIHY

Equipment

NANCY

of Orphan

T Bell Laboratones,

of Technology,

Murray Hill,

Cambridge,

Massachusetts

New Jersey

AND WILLIAM

WEIHL

Massachusetts

Institute

of Technology,

Cambridge,

Massachusetts

Abstract. In a distributed system, node failures, network delays, and other unpredictable occurcomputations—subcomputations that continue to run but whose rences can result in o~han results are no longer needed. Several algorithms have been proposed to prevent such computations from seeing inconsistent states of the shared data. In this paper, two such orphan management algorithms are analyzed. The first is an algorithm implemented in the Argus distributed-computing system at MIT, and the second is an algorithm proposed at CarnegieMellon. The algorithms are described formally, and complete proofs of their correctness are given. The proofs show that the fundamental concepts underlying the two algorithms are very similar in that each ran be regarded as an implementation of the same high-level algorithm. By exploiting M. Herlihy was sponsored by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 4976 (Amendment 20), under Contracts F33615-84-K-1 520 and F33615-87-C-1499, monitored by the Avionics Laboratory, Air Force Wright Aeronautical Laboratories, WrightPatterson AFB. N. Lynch was supported by the National Science Foundation under Gmnts DCR 83-02391 and CCR 86-11442, the Defense Advanced Research Projects Agency (DARPA) under Contract NOO014-83-K-0125, the Office of Naval Research under Contract NOO014-85-K-0168, and the Office of Army Research under Contract DAAG29-84-K-O058. W. Weihl was supported by an IBM Faculty Development Award, the National Science Foundation under grants DCR 85-100014 and CCR 87-16884, and the Defense Advanced Research Projects Agency (DARPA) under Contract NOO014-83-K-0125. Authors’ addresses: M. Herlihy, Digital Equipment Corporation, Cambridge Research Laboratory, 1 Kendall Square, Cambridge, MA 02139; N. Lynch, Massachusetts Institute of Technology, Laboratory for Computer Science, 545 Technology Square, Cambridge, MA 02139; M. Merritt, AT& T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974; W. Weihl, Massachusetts Institute of Technology, Laboratory for Computer Science, 545 Technology Square, Cambridge, MA 02139. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computmg Machinery. To copy otherwme, or to repubhsh, requires a fee and/or specific permission. @ 1992 ACM 0004-5411/92/1000-0881 $01.50 Journal

of the Assocmtmn

for Computmg

Machinery,

Vol

39, No

4, October

1992, pp

881 –930

M. HERLIHY

882

ET AL.

properties of information flow within transaction management systems, the algorithms ensure that orphans only see states of the shared data that they could also see if they were not orphans. When the algorithms are used in combination with any correct concurrency control algorlthm, they guarantee that all computations, orphan as well as nonorphan, see consistent states of the shared data. Categories and Subject Descriptors: —distributed progrrznzrnbzg; D.3.2

D. 1.3 [Programming Techniques]: Concurrent Programming [Programming Languages]: Language Classifications— concurrent, dzstnbuted, and parallel languages; D.4. 1 [Operating Systems]: Process Management— concawerzcy: D.4.5 [operating Systems]: Reliabili~—@lt-@lerarzce: F.3. 1 [Logics and Meaning of Programs]: Specifying and Verifying and Reasoning about Programs—assertions; i)zturimzts; spec@catzon techizlques; H.2.3 [Database Management]: Languages—database ( persistent ) prograrnrnizg languages; H.2.4 [Database Management]: Systems—concurrency: dwti”buted systems; transaction

General

processing

Terms: Algorithms,

Languages,

Additional Key Words and Phrases: tomata, recovery, serializability

Rehability,

Argus,

Theory,

avalon,

Verification

atomic

actions,

camelot,

input–output

au-

1. Introduction Nested

transaction

projects

(e.g.,

computations

systems

see

[1],

[8],

in distributed

have been [9],

systems.

and

in a number

[23])

Nested

tions, provide a simple construct for failures. Nested transactions extend permit degree

explored

[21],

of recent

as a means

transactions,

for

research organizing

like ordinary

transac-

masking the effects of concurrency and the usual notion of transactions [3] to

concurrency within a single transaction. They also provide a greater of fault-tolerance by isolating a transaction from the failures of its

descendants. In distributed

systems,

various

factors,

including

node

crashes

and network

delays, can result in orphan computations—subcomputations that continue to run even though their results are no longer needed. For example, in the Argus system [9], a node partition or some

making a remote request may give up because other problem prevents it from communicating

a network with the

other node. This may leave a process running at the called node; this process is an orphan. The orphan runs as a descendant of the transaction that made the call. Since the caller gives up by aborting the transaction that made the call, the orphan will not have any permanent effects on the observed state of the shared data. As discussed in [10] and [17], even if a system is designed to prevent orphans from

permanently

affecting

shared

data,

orphans

are still

undesirable,

for two

reasons. First, they waste resources: they use processor cycles, and may also hold locks, causing other computations to be delayed. Second, they may see inconsistent states of the shared data. For example, a transaction might read data at two nodes, with some invariant relating the values of the different data objects. If the transaction reads data at one of the nodes and then becomes an orphan, another transaction could change the data at both nodes before the orphan reads the data at the second node. This could happen, for example, because the first node learns that the transaction has aborted and releases its locks. Although the inconsistencies seen by an orphan should not have any permanent effect on the shared data in the system, they can cause strange behavior if the orphan interacts with the external world; this can make programs difficult to design and debug.

Correctness

of Orphan

Several

Management

Algorithms

have

proposed

algorithms

inconsistent

information.

algorithms

for

crashes.

detecting

Nelson’s

been

Early and

work

work

to

prevent

orphans

in the area includes

eliminating

did not

883

orphans

assume

that

arise

an underlying

from

[19], which

seeing

describes

because

transaction

of node

mechanism,

so it was difficult to assign simple semantics to abandoned computations. Recent work [10, 17, 24] has studied orphans in the context of a nested transaction system, in which an abandoned computation can be aborted, preventing

it from

having

any effect

on the state of the system.

algorithms in [10], [17], and [24] is to detect can see inconsistent information. 1.1. ness

NEW

RESULTS.

proofs

for

algorithm

in

completely and

the

algorithms.

proofs

is

yet

in

use

intuitions the

give

regarded

formal

Argus

In

addition,

the

designers

concepts as

an

before

descriptions

the

algorithms

The goal of the

orphans

algorithms

in

that

two

fundamental

be

we

management

straightforward.

the

the

can

paper,

currently

follow

that

each

this

orphan

Although

show

fact,

[10]

In two

rigorous,

the

proofs

the

and eliminate

in system.

the

have to be

underlying

them

implementation

of

[17].

The

proofs

are

presentations

used

in

quite

describing

different,

are very the

correct-

and

1 Our

both

appear

and

[10]

they

our

similar;

same

in

high-level

algorithm. Our

results

relate

management

S’

must

which of

namely,

S‘ must

see

a view

it is not

rency

with

the

invoked

by

control

results

algorithm

imply

consistent sometimes

of

that

views. made

by

a

system,

that

S’,

S in the

sense

it

see

including that

results

provide

that

in

results

orphan

of

and

[13],

Lynch

a~d

Merritt

transaction

designers

that

S in

and

sub-

a concurviews,

as nonorphan, for informal

the

a model

of

is its sequence

see consistent

justification

orphan

execution

operations

as well

formal

develop

each

orphan

no

S includes

combination with any concurrency control algorithm. The formal model used in this paper is based on that [12]

an

system

system

nonorphans

an

S, having

of the

When

transactions,

algorithms’

could the

transaction.) ensures

containing

system,

“view”

all

the

system,

transaction’s

the

S‘,

These

system

(A

that

in

of

of a corresponding “simulate”

the

an orphan.

interactions

transactions

behavior

to that

management; in

the

algorithm

algorithms

our

see claims

work

in

in [4], [12], and [13]. In for

nested

transaction

systems including aborts, and use the model to show that an exclusive locking variation of Moss’s algorithm [18] ensures correctness for nonorphans. The paper [4] contains improvements to the basic model in [12] and [13], plus proofs that Moss’s read–write algorithm and a more general commutativity-based locking algorithm also ensure correctness for nonorphans. In this paper, we use the same model to describe the two orphan management algorithms mentioned above,

to state correctness

1.2. RELATED ment

algorithm

properties,

WORK.

Earlier

appears

in [6]. This

nested transaction systems that general than those presented

work

and to prove on verifying work

is described here, since

is based

the algorithms the Argus on

in [11]. The they apply

correct.

orphan

an earlier

managemodel

for

results in [6] are less only to the specific

1 Our analysis covers only orphans resulting from aborts of transactions that leave running descendants; there is another component of the Argus algorithm that handles orphans that result from node crashes in which the contents of volatile memory are destroyed.

884

M. HERLIHY

concurrency

control

algorithm

(nested

locking)

used by Argus.

ET AL.

Moreover,

the

presentation there is much more complex than the one in this paper. Much of the complexity in [6] arises because the treatments of concurrency control and orphan

management

the two. The

are intermingled,

model

for describing Other work

whereas

here

in [4], [12], and [13] provides

this separation. using the model

we are able

a convenient

to separate

set of concepts

of [4], [12], and [13] includes

[2], [5], and [20];

these papers prove correctness of algorithms for replica management, timeand distributed transaction commit, stamp-based concurrency control, respectively. The fact that it is possible to use the model to explain such a variety of transaction-processing algorithms is strong evidence that it is a useful tool for modeling and analyzing nested transaction systems. In an earlier version of this paper [7], we presented similar results proving the correctness however, modeling

of the two algorithms

were restricted to locking algorithms,

timestamps

and other

the

paper

earlier

analyzed

mechanisms.

to

apply

here.

The

results

in that

to

In this paper,

a wide

range

we generalize

of systems.

1.3. ORGANIZATION as follows:

and a brief

OF THIS PAPER.

Section

description

2 contains of I/O

The

remainder

general

of the

some preliminary

automata,

which

for using

the results

The

we give

paper

mathematical

of

results

described here are somewhat abstract; to make them more concrete, two examples illustrating how they apply to specific kinds of systems.

nized

paper,

a particular class of systems appropriate and did not apply directly to systems

is orga-

definitions

serve as the formal

foundation

for our work. This section may be skipped on first or cursory reading, and is included in order to make the technical presentation of this paper entirely self-contained. Section 3 contains a definition of basic systems, a general class of transaction-processing systems to which our results apply. These are nested transaction systems in which orphans may occur, and for which the problem of managing orphans can be precisely and intuitively stated. A basic system models user

the components

program

system control

system

is said to mange

sen”al correctness

Section among

4 contains

different

in

we

[10]

simple

some

events

and

system,

not

require

the

the

the of

proofs that it

use

of

local

algorithm. of the of the

the

have

as 1/0

automata.

and

the

Each

rest

of

the

principal the

information

properties

already

and

clocked

algorithm.

see

clocked

and

proofs proved algorithms

dependencies

this

We

about

the

consistent

algorithm

only,

a property

the results

of

structure.

orphans

the

the

underlie

management

information

simulation

Argus

orphan

interesting

that

about

contributions

two

The

abstract

results

if it ensures and nonorphans.

these concepts

global

and

correctly orphans,

and

an

uses

ensures

algorithm

reproving

correctness correctness

8 contain

that

abstract

definitions

correctness

Our

Argus

orphans

in a basic system;

algorithm show

the

requires more

the

[17].

and

the The

prove

abstract

formalize that

system automaton,

for all transactions,

the rest of the paper. Sections 5 through which

transaction

as a transaction

(which may include a division into objects, and may include concurrency and recovery algorithms) is modeled as a single basic database automa-

ton. A basic called

of a nested

is modeled

are for

first

that

the

then

[17] each

a

of the We

then

in

a way

simulates

simple,

abstract follows

define

history

views.

quite

in

algorithms

from

show

paper,

in

and

do

algorithm. directly

from

Correctness Each

of Orphan

orphan

transforming these

Management

management

Algorithms

algorithm

an arbitrary

basic

systems

contains

the

each manages

orphans

using

is described

system

same

885

without

transactions

a different

as a system

orphan

obtained

management.

as the

given

basic database.

basic

by

Each system,

The abstract

of but

algorithm

is modeled by the jiltered database, which maintains information about the global history of the system, and uses tests based on this history information to prevent orphans from learning that they are orphans. The Argus database models

the

manages

behavior

orphans

dencies

among

abstract

algorithm,

algorithm even

of

using

the tests

system

Argus

orphan

based

events.

The

introduced

models

restrictive

the orphan

than

the

management

[10];

direct

filtered

database

models

of the

correctness

database.

algorithm

about

the proof on global

filtered

algorithm

information

strictly

to simplify

in [17]; it also uses tests based

more

management

on local

from

history

another of the

information,

Finally,

the

[17]; it manages

it

depen-

clock

and is database

orphans

using

information about logical clocks. Each of these four databases is described as the result of a transformation of the basic database. We prove that the filtered system (the system consisting of the transactions and

the

filtered

transactions, system basic

ensures

system

the Argus

correctness strictly

they

serial

Similarly, which

basic

for

correctness

the filtered

in

turn

they

the

sense

that

It follows

that

transactions, We

and so inherits

if the

then

the

filtered

the

also prove the same

the clock system implements

implements

all

see in the basic

all transactions. system,

that

in

could

orphans.

nonorphan for

we prove

system

that

are not

correctness

system implements system,

the

see a “view”

in which serial

ensures

property.

jiltered

simulates

orphans,

in an execution system

filtered that

database)

including

system,

the thus

showing that the clock system has the same correctness property as the filtered system. Section 9 makes some of the preceding general concepts more concrete by describing two particular types of basic systems, taken from other work using this model.

The

first

kind

of basic

system,

a gerzeric system,

is appropriate

for

describing locking algorithms, while the second kind of basic system, a pseudotime system, is appropriate for describing timestamp-based algorithms. Both kinds

of systems

database each tions

specialize

automaton

into

object in the system and objects together.

the notion two

kinds

of a basic

system

of components:

and a controller The concurrency

by splitting

an object

the basic

automaton

the system is encapsulated within the object systems differ in that they have slightly different

automata. interfaces

The two kinds of between the objects

and the controller. Particular information flow dependencies are described both of these kinds of basic systems. Section 10 contains a summary of our results and some suggestions further work. 2. Formal

for

automaton that links the transaccontrol and recove~ performed by

for for

Preliminaries

An irrejlexiL’e partial order is a binary relation that is irreflexive, antisymmetric, and transitive. The formal subject matter of this paper is concerned with finite and infinite sequences describing the executions of automata. Usually, we are discussing sequences of elements from a universal set of actions. Formally, a sequence ~ of actions is a mapping from a prefix of the positive integers to the

886

M. HERLIHY

set of actions. integers under may

occur

different

We describe the mapping,

several

times

occurrences.

ET AL.

the sequence by listing the images of successive writing ~ = T1 Tz rr~ .”. .Z Since the same action in

a sequence,

Thus,

we refer

a sequence as an euent. Formally, actions is an ordered pair (i, m),

it

is convenient

to a particular an event where i

to

distinguish

occurrence

the

of an action

in

in a sequence P = Wlmz . . . of is a positive integer and n- is

an action, such that w,, the ith action in ~, is T. A set of sequences P is prefix-closed provided that

whenever

is a prefix limit-closed

a set of sequences P is finite prefixes are in P

of ~, it is also the case that -y c P. Similarly, provided that any sequence all of whose

is also in P. We sequences

to any nonempty,

about

INPUT/

OUTPUT

complex

transactions,

it is important

put automaton

model

of concurrent

The

such

model

[15, 16]. This algorithms

to satis~.

MODEL.

systems

to have a simple

computation.

supposed

AUTOMATON

concurrent

for concurrent tions

prefix-closed,

and limit-closed

set of

The

In

that

and clearly

to reason implement

defined

we use for our work

model

allows

careful

and of the correctness

model

order

as those

can serve

formal

model

is the input/out-

and readable conditions

as the basis for

careatomic

descrip-

that

rigorous

they

proofs

is sufficient

for

der properties

of

in

this

finite

paper.

executions

properties. system component

mathematical ever, an 1/0 set. The internal.

use

object

particular,

in

only,

and

not

is modeled

somewhat

automaton

In

like

as an

a traditional

need not be finite-state,

actions of an 1/0 This classification

do 1/0

this

paper

consider

automaton,

finite-state

we

are that

particular algorithms satisfy particular correctness conditions. This subsection contains an introduction to a special case of the model

fairness Each

y

as a safe~ prope~.

2.1. THE fully

refer

f3 e P and

that consi-

liveness which

automaton.

or is a

How-

but can have an infinite-state

automaton are classified as either input, output, or is a reflection of a distinction between events (such

as the receipt of a message) that are caused by the environment, events (such as sending a message) that the component can perform when it chooses and that affect the environment, variable) that a component

and events can perform

(such as changing when it chooses,

the value of a local but that are unde-

tectable by the environment except through their effects on later events. In the model, an automaton generates output and internal actions autonomously, and transmits

output

actions

instantaneously

to its environment.

In contrast,

the

automaton’s input is generated by the environment and transmitted instantaneously to the automaton. The distinction between input and other actions is an fundamental, based cm who determines when the action is performed: automaton can establish restrictions on when it will perform an output or internal action, but it is unable to block the performance of an input action. 2.1.1. Action Signatures. and their classification into action signature. An action

The formal description of an automaton’s actions inputs, outputs, and internal actions is given by its signature S is an ordered triple consisting of three

pairwise-disjoint sets of actions. We write components of S, and refer to the actions 2 We use the symbols indiwdual actions.

/3, y,...

for

sequences

in(S), out(S), and int(S) for the three in the three sets as the input actions,

of actions

and the symbols

m, ~, and

+

for

Correctness

of Orphan

output

actions,

out(S)

and refer

local(S)

and

Management

internal

actions

to the actions

= int(S)

u out(S),

Algorithms of S, respectively.

in ext(S)

and

887

refer

We

as the external

to the

actions

An

external

then

action

signature

is an

that

is, having

no internal

actions,

the

external

action

signature

(in(S), out(S), 0), that is, the removing the internal actions. 2.1.2.

Input/Output

an I/O

automaton

action

of

action

S is the

consisting

action

that

the

set of

signature

local~

controlled

extsig(S) from

An inpzlt/output automaton A (also an automaton) consists of four components:

states

=

S by

called 3

of start states,

need

not

be

finite.

We

as a step of A. The step (s’, T,s) action, and output steps, internal

steps are defined

of

signature,

is obtained

—a transition relation steps(A) c states(A) x acts(sig(A)) property that for every state s’ and input action rr (s’, m-,s) in steps(A).

(s’, m, S) of steps(A) A if T is an input

and

entirely

If S is an action

that

U

locally

U int(S),

Automata. or simply

—an action signature sig(A), —a set states(A) of states, —a nonempty set start(A) c states(A)

Note

= in(S)

as the

u out(S)

signature actions.

signature

ext(S)

of S. AIso, we let

in local(S)

controlled actions of S. Finally, we let acts(S) = in(S) refer to the actions in acts(S) as the actions of S.

external

let

actions

analogously.

x states(A),

with

the

there

is a transition

refer

to

an element

is called an input step of steps, external steps, and

If (s’, rr, s) is a step of A, then

T

is said to be enabled in s’. Since every input action is enabled in every state, automata are said to be input-enabled. The input-enabling property means that an automaton

is not

sometimes write out(A), etc. An

able

to block

input

actions.

If A

is an automaton,

locally controlled, that is, if in(A) = 0. Note that an 1/0 automaton can be nondeterministic, two things: same

state,

different model’s

that

we

acts(A) as shorthand for acts(sig(A)), and likewise for in(A), 1/0 automaton A is said to be closed if all its actions are

more

and that

successor descriptive

than the

one locally same

action,

controlled applied

action in the

states. This nondeterminism power. Describing algorithms

possible tends to make results results about nondeterministic

by which

we mean

can be enabled same

state,

in the

can lead

is an important part as nondeterministically

to

of the as

about the algorithms quite general, since many algorithms apply a fortiori to all algorithms

obtained by restricting the nondeterministic choices. Moreover, the use of nondeterminism helps to avoid cluttering algorithm descriptions and proofs with inessential details. Finally, the uncertainties introduced by asynchrony make nondeterminism an intrinsic property of real concurrent systems, and so an important 2.1.3. an 1/0

property

to capture

Executions,

Schedules,

automaton,

each possible

in our formal

and Behaviors. run

model When

of the system

of such systems. a system is modeled

is modeled

by

by an “execu-

tion,” an alternating sequence of states and actions. The possible activity of the system is captured by the set of all possible executions that can be generated by the automaton.

However,

not

all the information

contained

in an execution

is

3 1/0 automata, as defined in [15], also include a fifth component, which is used for describing fair executions. We omit it here as it is not needed for the results described in this paper.

M. HERLIHY

888 important

to a user of the system,

placed.

We believe

externally

that

visible

events,

on the automaton’s of external

what

nor to an environment

is important

and not

the states

“behaviors’’-the

(i.e., input

about

or internal

subsequences

and output)

actions.

in which

the activity

the system

of a system

events.

Thus,

a system

is

is the

we focus

of its executions

We regard

ET AL.

consisting

as suitable

for a

purpose if any possible sequence of externally visible events has appropriate characteristics. Thus, in the model, we formulate correctness conditions for an 1/0 automaton in terms of properties of the automaton’s behaviors. Formally, or

infinite

A

such

an execution sequence

that

execution denote

(s,, ml+,,

of A is a finite

“””

fragment

beginning

-””

mnsn

s,+ ~) is a step

the set of executions

of A by firzexecs(A). a finite The

fragment

sOmlslm2

of A

with

of for

a start

state

and

and the set of finite

A state is said to be reachable fragment

SOT,s ~V2 states

a

---

Vns ~

actions

of

i for which s,+, exists. An is called an execution. We

every

of A by execs(A),

execution of A. schedule of an execution

sequence

alternating

executions

in A if it is the final

of A

is the

state of

subsequence

of

a

consisting of actions, and is denoted by sched( a ). We say that P is a schedule of A if ~ is the schedule of an execution of A. We denote the set of schedules of A by scheds(A) and the set of finite schedules of A by &wcheds(A). The behalior of a sequence ~ of actions in acts(A), denoted by beh( ~ ), is the subsequence of ~ consisting of actions in ext(A). execution fragment a of A, denoted by beh( a ), is defined We say that denote

f? is a behal)ior

of A if ~ is the behavior

the set of behaviors

of A by behs(A)

The belzavior of an to be beh(sched( a )).

of an execution

and the set of finite

of A. We

behaviors

of A

by &zbehs(A). We say that a finite schedule ~ of A can leale A irz state s if there is some finite execution a of A with final state s and with sched( a ) = ~. Similarly, a finite

behavior

~ of A can leave A in state s if there

is some finite

execution

a of A with final state s and with beh( a ) = ~. An extended step of automaton A is a triple of the form (s’, ~, s), where s’ and s are in states(A),

an ~

is a finite

of

sequence

of actions

in acts(A),

and there

is an execution

fragment

A having s’ as its first state, s as its last state, and p as its schedule. If /3 is any sequence of actions and @ is a set of actions, we write denote

the subsequence

an automaton,

we write

2.2. COMPOSITION. tion in

of several our

automata

operator identically

model, can

of ~ containing (31A for Often,

component we be

connects named

combined

each input

component automata. autonomously by one

a single a

of actions

~ 10 to

in 0. If A is

~ lacts(A).

systems

define

all occurrences

system

interacting

“composition” to yield

can with

also one

operation

a single

1/0

be viewed another. by

automaton.

output action of the component actions of any number (usually

In the resulting component and

which

as a combinaTo

reflect

several

this 1/0

Our composition automata with the one) of the other

system, an output action is generated is thought of as being instantaneously

transmitted to all components having the same action as an input. All such components are passive recipients of the input, and take steps simultaneously with the output step. 2.2.1. Composition of Action Signatures. action signatures. Let I be an index set that

We first is at most

define composition of countable. A collection

Correctness

of O~han

Management

signatures

{S,}, e ~ of action properties hold:

(3)

m

Thus,

action

is in

no action

internal

acts(S,)

for

Moreover,

component

signatures.

we

if the

for all

i, j = I such that

i + j,

i, j = I such that

i #j,

than

do not

do

not

one signature appear

permit

in the collection,

in any other

actions

=

U, ~, in(s~)



U,



and

signature

in the

infinitely

many

compatible

action

involving

The composition S = HZ. ~ S, of a collection of strongly signatures {S1}, ~ ~ is defined to be the action signature with ‘in(s)

following

i.

many

of more

of any signature

collection.

compatible

for all infinitely

is an output

actions

889

is said to be strong~

= 0 = @

(1) OUt(S1) n O@J) (2) int(Sl) n acts(S,)

Algorithms

1 out($),

= u ~● ~ OUt(si), = U, ● ~ int(S, ).

‘OUt(S) —int(S) Thus,

output

actions

are those

that

signatures,

and similarly

for internal

are inputs

to any of the component

are outputs

actions.

of any of the

Input

signatures,

actions

but

component

are any actions

outputs

that

of no component

signature. 2.2.2.

Composition

be strong~

of Automata.

compatible

if their

A collection

action

{A,},.

signatures

composition A = H,. ~ A, of a strongly {A,},., has the following components:s

~ of automata

are strongly

compatible

is said to

compatible.

collection

of

The

automata

—sig(A) = ~,. ~ sig(A,), —states(A) = H, ● ~ states, —start(A) = Hz ● ~ start(A,), —steps(A) is the set of triples (s’, n-,s) such that for all i c I, (a) if T G acts(Al), (s’[i], w, s[i]) = steps(A1), and (b) if m @ acts(Ai), then s’[i] == s[i].G then Since the automata

Ai

are input-enabled,

so is their

composition,

and hence

their composition is an automaton. Each step of the composition automaton involves all the automata that have a particular action in their action signature performing action

that

in their

action

concurrently,

signature

do nothing.

while

the automata

We often

refer

that

do not

to an automaton

have that formed

by

composition as a “system” of automata. If a! = SO’iTISl “.” is an execution of A, let a /Ai be the sequence obtained by deleting w,s~ when ~, is not an action of A,, and replacing the remaining SJ by SJ[i ]. Recall that we have previously defined a projection sequences. The two projection operators are related sched( a IA, ) = sched( a )IAZ, and similarly In the specifying

course their

of our internal

discussions, actions.

4 A weaker notion called “compatibility” given properties only. For the purposes

beh( a IAl)

we often

To avoid

reason

tedious

in

operator for action the obvious way:

= beh( a )Iil,. about

arguments

automata about

without

compatibil-

is defined in [15], consisting of the first two of the three of this paper, only the stronger notion will be required.

5 Note that the second and third components listed are just ordinary Cartesian products, ~t component uses the previous definition of composition of action signatures. We use the notation s[i] to denote the ith component of the state vector s.

while the

M. HERLIHY

890 ity, henceforth, are unique

we assume

to that

that

automaton,

unspecified

internal

and do not occur

of any of the other automata we discuss. All of the systems that we use for modeling that

is, each

action

component

will

2.2.3.

Propetiies

is an output

be an input

of some

of at most

actions

transactions

one other

of Systems of Automata.

of any automaton

as internal

component.

ET AL.

or external

actions

are closed

systems,

Also,

each

output

of a

component.

Here

we give basic results

relating

executions, schedules, and behaviors of a system of automata to those of the automata being composed. The first result says that the projections of executions of a system onto the components are executions of the components, and similarly

for schedules,

PROPOSITION

andlet

A=

Moreoller, in place

etc.

Let

1.

EI A,.

fl,

{A ,}1~ ~ be a strongly If

a = execs(A),

compatible

then

the same result holds for j7nexecs,

alA,

collection

= execs(A,)

scheds, f%scheds,

of automata, for

all

i = 1.

behs, and jinbehs

of execs.

Converses

can also be proved

for

all the parts

of the preceding

proposition.

The following are most useful. They say that schedules and behaviors of component automata can be “patched together” to form schedules and behaviors of the composition. PROPOSITION

Let

2.

and let A = TIlef

{A,}, ~ ~ be a strongly

(1)

Let P be a sequence of actions i ~ 1, then P e scheds(A ).

(2)

Let ~ be a finite sequence of actions all i = I, then ~ ● @scheds(A).

(3)

Let ~ be a sequence then /3 = behs(A).

(4)

Let ~ be a finite sequence of actions i = 1, then P ● finbehs( A ). The preceding

behavior behaviors

of actions

collection

of automata,

proposition

in acts(A). in acts(A).

in ext(A).

is useful

If

/31A, If

~ IA,

If

~ IA,

in ext(A).

If/3

in proving

that

E scheds(A1) = jlnscheds(A,

● behs(A,)

IA,

for

for

) for

all i =1,

= jinbehs(A1)

a sequence

all

for all

of actions

of a system A: It suffices to show that the sequence’s projections of the components of A and then to appeal to Proposition 2.

2.3. IMPLEMENTATION. automaton

compatible

A,.

by another.

We Let

define

A and

a

notion

B be automata

of with

“implementation” the

same

are

of external

is a

one

action

B if signature, that is, with extsig(A) = extsig(B). Then A is said to implement finbehs(A) G finbehs(B). One way in which this notion can be used is the following: Suppose we can show that an automaton B is correct, in the sense that its finite behaviors all satisfy some specified property. Then, if another automaton A implements B, A is also correct. One can also show that, if A implements B, then replacing B by A in any system yields a new system in which all finite behaviors are behaviors The definition of an implementation exhibit

every

possible

behavior

of the original system. of B by A does not

of B. Rather,

B is viewed

require

that

as describing

A the

acceptable behaviors, and the behaviors of A are constrained to be acceptable. This constraint is easy to satisfy: A could simply never produce any outputs. (Notice

that

A

still

has

behaviors,

since

all

automata

are

required

to

be

Correctness

of Orphan

input-enabled.) something. address using

In

Management

other

Such

One useful

words,

there

requirements

in this paper.

the parts

take

They

891

is no the

that

that

of

that

A

actually

liveness,

which

the 1/0

automaton

within

deal with

for showing

requirement

form

can be handled

of the model technique

Algorithms

we

do

do not model

liveness.

one automaton

implements

another

is

to give a correspondence between states of the two automata. Such a correspondence can often be expressed in the form of a kind of abstraction mapping that we call a possibilities mapping, defined as follows: Suppose A and B are automata with the same external action signature, and suppose f is a mapping from

states(A)

to the power

set of states(B).

That

is, ifs

is a state of A, f(s) is a

set of states of B. The mapping f is said to be a possibilities B if the following conditions hold:

mapping

(1) ~=

eV:O~ start

t ~ of

(2) flet

s’ be a reachable

state SO of A, state

(s’, m,s) a step of A. Then having an empty schedule) (a) ylext(B) (b) t G f(s). The following

PROPOSITION

3.

and

behaviors where

input

A implements

it

safety

such

a possibilities

restrictions

neither properties.

provides

of B, and

mapping

from

As long

does the automaton. O

be

a set

in our model

to restrict inputs

well-fomzedness

is in terms

of behaviors: Let

automata

convenient

obeys certain

for sequences of actions We say that A preserues &-lA G finbehs(A). Thus, if an automaton to violate cumulative

state

that

A to

B.

Although

is often

the environment

a property

the property, for

actions,

in which

of discussing

a reachable

B such

Suppose that A and B are automata with the same external there is a possibilities mapping from A to B. Then A

the environment

preserves

state

there is an extended step (t’, y, t) of B (possibly such that the following conditions are satisfied:

2.4. PRESERVING PROPERTIES. to block

t’ = f(s’)

shows that giving

to show that

action signature implements B.

of A,

is a start

A to

= m-lext(A),

proposition

B is sufficient

there

from

of the

in a sensible notion

Such a notion actions

and

to

those

way, that

restrictions.

A useful

that

as the environment of

are unable

attention

k, way

an automaton does not violate

is primarily

interesting

P a safety

property

in 0. Let Abe an automaton with O fl int(A) = @. P if ~w 10 G P whenever ~ I@ = P, m G out(A), and preserves

P: as long as the behavior satisfies

a property

P, the automaton

environment only provides P, the automaton will only

inputs perform

is not the first such that the outputs such

that the cumulative behavior satisfies P. In many cases of interest, we have @ G ext(A); note that even in this case, the fact that an automaton A preserves P does not imply

that

all of A’s behaviors,

when

restricted

to 0, satisfy

P. It is

possible for a behavior of A to fail to satis~ P if an input causes a violation of P. However, the following proposition gives a way to deduce that all of a system’s behaviors satisfy P. The proposition says that, under certain conditions, if all components of a system preserve P, then all the behaviors of the composition satisfy P.

M. HERLIHY

892 PROPOSITION

Let

4.

{A,},.

presenes

P.

behs(A)l@

Then

A

presen’es

Let

/3 be

a sequence

and &r 1A = finbehs(A). finbehs(A,), by Proposition Now

suppose

In

this

section,

we

to which

ric systems conditions orphans.

if

@ n i~d A)

= 0, and i E I, A, = 0,

actions

= 0,

such

~ I@ G P, m ~ out(A),

that

~ ● behs(A).

and let prefix

then

PI@ G p.

then

Since

pm IA, ●

A preserves

of ~ I@ is in P. Then,

basic

results

to

plus in

invoke can

invoke describing

on

~ I@ e P, by

transaction-processing the

subtransactions algorithms

and include

operations concurrency

the

geneof

user-provided to coordinate

written

by applica-

Transactions

addition,

if nesting

and

receive

responses

are

per-

is allowed, from

the

processing. transaction-processing

making

decisions

on

objects.

control

of

In

of their

data

are

the

correctness

management

designed

language.

objects.

system,

transactions,

consist

transactions

programming

results

both

of correct

algorithms

subtransactions the

transaction-processing

of [2]. We also define

the notion

The

data

of

generalize

systems

transactions. a suitable

class

systems

systems

in particular,

operations

subtransactions

with

the

Basic

transaction-processing

of different

transactions

interact

systems,

apply.

Transaction-processing

code,

programmers

a

of automata



for basic systems,

activities

In

of

every finite

define

our

3.1. OVERVIEW.

mitted

Furthermore,

of [4] and the pseudotime

transaction tion

collection

Systems

systems

the

compatible

Then m E out(A, ) for some i c I, and 2. Since Al preserves P, pm I@ ~ P.

@ n in(A)

that

P, by a simple induction, the limit-closure of P. 3. Basic

P.

~ ● behs(A),

c P. That is, f

PROOF.

~ be a strongly

@ be a set of actions such that @ n int(A) for actions in @. Suppose that for each

and let A = 111. ~ A,. Let let P be a safety property

ET AL.

and

about The

recovery

algorithms

when

to

schedule

transaction-processing

In

algorithms.

many

interesting cases (e.g., for locking algorithms), the transaction-processing algorithms can be naturally divided into a controller and a collection of objects, where each object includes concurrency control and recovery algorithms appropriate

for

transactions

that

object

and

and

objects.

general results. The transaction-processing ~stems,

In the organization

the We

controller

manages

do not,

however,

systems

studied

communication require

in

this

this paper

among

division are

called

for

the our basic

we consider,

the transaction-processing algorithms are represented by a component called a basic database. Each component of a basic system is modeled as an 1/0 automaton. That is, each transaction is an automaton, and the basic database is another automaton. The nested structure of transactions is modeled by describing each transaction and subtransaction in the transaction nesting structure as a separate 1/0 automaton. If a parent transaction T wishes to invoke a child transaction T’, T issues an output action that “requests that T’ be created. ” The basic database receives this request, and at some later time might issue an action that is an input

to the child

transactions cating with

T’ and corresponds

to the creation

of T’. Thus,

the different

in the nesting structure comprise a forest of automata, communieach other indirectly through the basic database. The highest-level

Correctness

of Orphan

transactions, tions,

that

is, those

are the roots

It is actually

Management that

Algorithms are not

893

subtransactions

of any other

transac-

in this forest. structure

as a

tree rather than as a forest. Therefore, we add an extra root automaton dummy transaction, located at the top of the transaction nesting structure.

more

convenient

to model

the transaction

nesting

as a The

highest-level user-defined transactions are considered to be children of this new root. The root can be thought of as modeling the outside world, from which

invocations

of

top-level

transactions

originate

and

about the results of such transactions are sent. h the rest of this section, we define basic systems conditions

that

3.2. SYSTEM to name

the

they are supposed TYPES.

We

transactions

begin

and

A system type consists

subset

by defining

objects

in

a type

a basic

reports

the correctness

structure

that

will

be used

system.

of the following:

TO E T,

accesses of T not containing

—a mapping names into

and state

which

to satisfy.

—a set T of transaction names, —a distinguished transaction name —a

to

To,

parent: T – {TO} + T, which configures the set of transaction a tree, with To as the root and the accesses as the leaves,

—a set X of object names, —a mapping object: accesses —a set V of return ualues.

-+ X,

Each element of the set accesses is called an access transaction name, simply an access. Also, if object(T) = X, we say that T is an access to X. In referring as leaf node,

to the transaction tree, we use standard tree internal node, child, ancestor, and descendant.

we consider

any node

the ancestor

and descendant

a least common The

to be its own

ancestor

transaction

tree

describes

TO as the

name

tree

represents

the name

The

of the

children

transactions.

The

and its own

are reflexive.

descendant,

such case,

that

is,

We also use the notion

of

of two nodes.

with

parent.

ancestor

relations

terminology, As a special

or

the nesting

dummy

root

structure

transaction.

of a subtransaction

of

TO represent

accesses represent

transaction child

of the transaction

names

names

for Each

of

the

named

top-level

for the lowest-level

names,

node

in this by its

user-defined transactions

in

the transaction nesting structure; we use these lowest-level transaction names to model operations on data objects. Thus, the only transactions that actually access data are the leaves of the transaction tree, and these do nothing else. The internal nodes model transactions whose function is to create and manage subtransactions The

tree

all possible tion,

including

structure

transactions

however,

imagine system. ing.

that The

only the

be thought

that

some

tree

tree will,

accesses, but they

should

might

of these

structure in general,

do not access data directly.

of as a predefine

naming

scheme

ever be invoked.

In any particular

transactions

actually

is known

will

in advance

be an infinite

take

steps.

by all components

structure

with

infinite

for

execuWe of a branch-

The set X is the set of names for the objects used in the system. Each access transaction name is assumed to be an access to some particular object, as designated by the object mapping. The set V of return values is the set of

M. HERLIHY

894

ET AL.

Basic

\

Database /

/

\

\

FiG. 1.

possible

values

that

might

be returned

to their parent transactions. For the rest of this paper, 3.3. GENERAL system

type

each nonaccess Later basic

by successfully

we fix a particular

STRUCTURE

is a closed

Basic system.

OF BASIC

system

transaction

name

system A

SYSTEMS.

consisting

completed

transactions

type. basic

system

for

a given

a transaction

of

T and a single

autonzato?z AT basic database automaton

in this section, we give conditions to be satisfied by the transaction database automata. Here, we just describe the signatures of

for B.

and these

automata, in order to explain how the automata are interconnected. Figure 1 depicts the structure of a basic system. The transaction nesting structure is indicated in part by dotted lines between transaction

automata

corresponding

do not have associated

automata,

of accesses. The

connections

indicated

by solid

the basic Figure

direct lines.

database,

Thus,

between

the transaction

but not directly

2 shows the interface

to parent so the diagram

with

and

child.

Access

transactions

does not indicate

automata

the parents

(via shared

automata

actions)

interact

directly

are with

each other.

of a transaction

automaton

in more

detail.

The

automaton for transaction name T has an input action CREATE(T), which is generated by the basic database in order to initiate Ts processing. We do not include explicit arguments to a transaction in our model; rather we suppose that there is a different transaction for each possible set of arguments, so any input to the transaction is encoded in the name of the transaction. T has REQUEST_ CREATE(T’) output actions for each child T’ of T in the transaction

nesting

structure;

these

are requests

for creation

of child

transactions,

and

are communicated directly to the basic database. At some later time, the basic database might respond to a REQUEST_ CREATE(T’) action by issuing a CREATE(T’)

action;

in case T’ is not

an access, this

action

is an input

to the

automaton for transaction T’. T also has REPORT_ COIvlMIT(T’, v) and REPORT_ABORT(T’ ) input actions, by which the basic database informs T about the fate (commit or abort) of its previously requested child T’. In the case of a commit, the report includes a return value v that provides information about the activity of T’; in the case of an abort, no information is returned. Finally, T has a REQUEST_ COMMIT(T, v) output action, by which it announces to the basic database that it has completed its activity successfully, with a particular result that is described by return value v. Figure 3 shows the basic database interface. The basic database in any particular CREATE

basic system and REQUEST_

receives the previously mentioned COMMIT actions as inputs from

the

REQUEST_ transaction

Correctness

of Orphan

Management

Algorithms

895

REQUEST_COMMIT(T,v)

CREATE(T)

T

REPORT.COMMIT(T,V)

REQUEST_CREATE(T)

REPORT_ABORT(T’)

Transaction

FIG. 2.

REQUEST_CREATE

interface.

CREATE

REQ1.JEST_COMMIT

REQUEST.COMMIT

(accesses)

Basic Database

COMMIT ABORT ‘

‘o

REPORT.COMMIT REPORT_ABORT Basic database interface.

FIG. 3.

automata. action

It produces

automata

produces represent

CREATE

or invoking

actions operations

as outputs,

thereby

on objects.

The

awakening

basic

database

transalso

REQUEST_ COMMIT(T, v) output actions for accesses T; these responses to the invocations of operations on objects. The value v in

a REQUEST_ COMMIT(T, tion as part of its response.

v) action is a return value returned by the operaThe basic database also produces COMMIT(T) and

ABORT(T) actions for arbitrary transaction names T + TO, representing decisions about whether the designated transaction commits or aborts. For technical convenience, we classi~ the COMMIT and ABORT actions and the REQUEST–COMMIT actions of the basic system component.7 REPORT–ABORT transactions Different section

of this

actions for access transactions when they are not inputs to

as output any other

The basic database also has REPORT_ COMMIT and actions as outputs, by which it communicates the fates of

to their basic

and CREATE database, even

parents.

databases

paper

may

describes

include

generic

additional databases,

output which

actions. have

The

additional

final out-

7 Classifying actions as outputs even though they are not inputs to any other system component is permissible in the 1/0 automaton model. In this case, h would also be possible to classify these actions as internal actions of the basic database, but then the statements and proofs of the ensuing results would be more complicated.

896

M. HERLIHY

puts

by which

locks

may

contain

the

be

additional

As

all child

any

output

until

In

for most

many

our

such

work,

it

treats

cases,

is convenient

to

activated.

This

separation

occurs

in

results

among

COMMIT

occurs.

that

this

means

use

two

as if it had

A

that

the

to

to perform

consequence

ever

been

action

not

is that

might

are

of “creat-

as an input

constrained

transactions

a system

action

transaction

be

action of

which

data.

of

to the

formally

will

so that

the

of this

system

must

be created

system

in any

will

include

automata.

to describe

is important

components

child

transactions

and CREATE,

and

the

objects, example,

of timestamp

earlier

is treated

creation

CREATE

distinction

the

a CREATE

interesting

to the a second

management

transaction

all possible

are

referred

action

child

transaction

we

model

dynamic

automata

infinitely

automata,

though

CREATE

modeling

execution.

1/0

for

the

actions of

the

the

The

communicated

databases

involving

Even

transaction;

method

In

case

transaction,

along.

the

include

the

are

Pseudotime

statically.

a child

there

of transactions

outputs

is always

determined ing”

fates

released.

ET AL.

our

what

in actual and

separate

happens distributed

proofs.

REQUEST-COMMIT,

actions,

when

systems

Similar

REQUEST_

a subtransaction such

remarks

COMMIT,

hold

and

is

as Argus, for

the

REPORT-

actions.

3.4. SERIAL

ACTIONS.

system type include given system type subsection:

CREATE(T)

transaction

name

COMMIT(T),

The

external

actions

of

a basic

the serial actions for that type. are defined to be the actions and

and v is

REQUEST–COMMIT(T, a return

ABORT(T),

REPORT_

value,

system

v),

and

of

a given

The serial actions for a listed in the preceding where

T

is

any

REQUEST-CREATE(T),

COMMIT(T,

v),

and

REPORT.

ABORT(T), where T # To is a transaction name and v is a return value.x If ~ is a sequence of actions, define sen”al( ~ ) to be the subsequence of ~ containing all the serial

actions

in ~.

In this subsection, we define some simple concepts involving serial actions. All the definitions in this subsection are based on the set of actions only, and not on the specific automata in any particular system. For this reason, we present these definitions here, before going on (in the next subsection) to give more information about the basic system components. We first

present

of well-formedness 3.4.1.

some fundamental for sequences

Terminology.

completion ABORT(T)

The

definitions,

and then

we define

notions

of actions.

COMMIT(T)

and

ABORT(T)

actions for T, while the REPORT_ COMMIT(T, for T. actions are called report actions

actions

are

called

v) and REPORT_

We associate transaction names with some of the serial actions, as follows. Let T be a transaction name. If n is either a CREATE(T) or a REQUEST_ COMMIT(T, v) action, or is a REQUEST. CREATE(T’), REPORT_ COMMIT(T’, v’) or REPORT–ABORT(T’), where T’ is a child of T, then we define transactiorz(m ) to be T. If w is a completion action, then transaction (w) is undefined.

In some

contexts,

we also need

to associate

a transaction

with

8 These actions are called serial actzons because they are exactly the external actions of a serial of the given type. More will be said about serial systems later in the paper.

gstem

Correctness

of Orphan

completion

actions;

Management since

Algorithms

a completion

897

action

for

T can

be

thought

of

as

occurring in between T and parent(T), some of the time we want to associate T with the action, and at other times we want to associate parent(T) with it. Thus, we extend the transaction(w) definition in two different ways. If rr is any serial action, then we define hightransaction(m ) to be transaction(n) if m is not a completion

action,

if n is any serial is not

a completion

particular, rial

and to be parent(T), action,

action,

T for which

if m is a completion

lowtransaction(

action

= lowtransaction(n) transaction(m)

for T. Also,

m ) to be transaction(m)

and to be T, if T is a completion

hightransaction(n)

actions

We whose

we define

action

if m-

for T. In

= transaction(~)

for all se-

is defined.

also require notation for the object associated with any serial action transaction is an access. If n- is a serial action of the form CREATE(T)

or REQUEST_ object(m)

COMMIT(T,

v), where

T is an access to

X,

then

we

define

to be X.

We extend the preceding notation to events as well as actions. For example, if T is an event, then we write transactiorz(n ) to denote the transaction of the action tion,

of which

T is an occurrence.

lowtransaction,

paper

and

object

in the same way, without

Now

we require

execution.

Let

actiue in REQUEST_

~

further

terminology

the definitions

We

extend

of hightransac-

other

notation

in this

explanation.

to describe

/3 be a sequence

provided COMMIT

We extend similarly.

the

of actions.

status

of a transaction

A transaction

that ~ contains event for T. Similarly,

name

during

T is said to be

a CREATE(T) event but no T is said to be live in /3 provided

that ~ contains a CREATE(T) event but no completion event for T. (However, note that /3 may contain a REQUEST–COMMIT for T.) Also, T is said to be an orphan T. We

in ~ if there

have

particular introduce Namely,

already sets

of

another

is an ABORT(U)

used

projection

actions,

and

projection

if ~ is a sequence

action

operators

to

actions

operator, of actions

p IU is defined to be the sequence transaction name, we sometimes write

3.4.2.

automata.

this time

~ I{T: transaction(m) /3 IT as shorthand for name,

and basic

database

we define

transaction

We define

well-formedness.

those

write

event in ~, if any, is a CREATE(T) events,

then

/3 IX to

created, and the been requested.

conditions, which are actions of the transac-

conditions

here.

Let T be any transaction

event,

names.

on the transaction of a basic system. are guaranteed; for

A sequence B of serial actions m with transaction(n) = T is defined transaction well- foimed for T provided the following conditions hold: (1) The first CREATE

to now

G u}. If T is a ~ I{T}. Similarly, if

should not take steps until it has been not create a transaction that has not

automata.

We names,

we sometimes

are captured by well-formedness properties of sequences of external

U of

sequences

to sets of transaction

We place very few constraints and basic database automaton in our definition we want to assume that certain simple properties

Such requirements fundamental safety

First,

action

particular

Well-Forrnedness.

example, a transaction basic database should

tion

to restrict of

and U is a set of transaction

~ is a sequence of actions and X is an object denote ~ I{n: object(n) = X}.

automata However,

in ~ for some ancestor

and there

name. to be

are no other

898

M. HERLIHY

There is at most T’ of T,

(2)

(3) Any report CREATE?(T’) (4) There (5)

one REQU13ST_CREATE(T’)

event for in ~,

is at most

a child

one report

T’

event

of

event

T

is

in p for each child

preceded

in ~ for each child

ET AL.

by

REQUEST_

T’ of T,

If a REQUEST-COMMIT event for T occurs in ~, then it is preceded by a report event for each child T’ of T for which there is a REQUEST_ CREATE(T’) in /3,

(6) If a REQUEST-COMMIT in ~. In particular, well-formed

event for T occurs

if T is an access, then for

T are

the

CREATE(T)REQUEST_ set of transaction

prefixes

COMMIT(T,

well-formed

is defined

conditions

to

be

basic

sequences

two-event

v). For

sequences

it is prefix-closed and limit-closed. Next, we define basic database actions

the only of the

in ~, then

that

are transaction

sequences

of the

form

any T, it is easy to see that

for T is a safety

well-formedness.

database

it is the last event

A

well-foimed

property, sequence

provided

that /3 of the

the

is, that serial

following

hold:

(1) The sequence ~ IT is transaction well-formed, for all transaction names T, (2) If a CREATE(T) event occurs in ~, fG~ T + TO, then there is a preceding REQUEST_ CREATE(T) in ~, (3) If

there

is a COMMIT(T)

Q1-JEsT-COMMIT(T, (4)

If

there

event

v) event

is an ABORT(T)

QuEiST.C~EATE(T)

event

event

in

~,

then

there

is a preceding

RE-

is a preceding

RE-

in p, for some v, in

~,

then

there

in ~,

(5) There is at most one completion event in ~ for each transaction name T, (6) If there is a REPORT_ COMMIT(T, v) event in f3, then there is a preceding REQUEST_ COMMIT(T, v) event in ~ and a preceding COMMIT(T) (7)

event in /3, If there is a REPORT-ABORT(T) ABORT(T) event in ~,

(8) There

is at most

3.5. BASIC

one report

SYSTEMS.

We

event

event are

now

in

~, then

there

in ~ for each transaction ready

to

define

basic

is a preceding name

T.

systems.

Basic

systems are composed of transaction automata, one for each nonaccess transaction name, and a single basic database automaton. We describe the two kinds of components

in turn.

3.5.1. Transaction Automata. A transaction automaton transaction name T (of the given system type) is an 1/0 following external action signature: Input: CREATE(T) REPORT–COMMIT(T’, REPORT_ABORT(T’), output: REQUEST_ REQUEST_

CREATE(T’), COMMIT(T,

AT for a nonaccess automaton with the

v), for every child T’ of T, and return for every child T’ of T for every child T’ of T v), for every return value

v

value

v

Correctness

of O~han

In addition, to preserve

AT

Management

Algorithms

may have an arbitrary

transaction

899

set of internal

well-formedness

for

T,

actions.

as defined

We require in

the

AT

preceding

subsection. As discussed earlier, this does not mean that all behaviors of AT are transaction well-formed, but it does mean that as long as the environment of AT does not violate transaction well-formedness, AT will not do so. Except for that requirement, transaction automata can be chosen if ~ is a sequence of actions, then ~ IT = P lext(AT). Transaction

automata

transactions gramming

example,

complete.)

intended

may impose

Argus

However, Basic

are

to

in any reasonable

languages

ior. (For

3.5.2.

defined

Database

general

enough

restrictions

activity

in transactions

do not require

Automata.

A

Note

to

language.

additional

suspends

our results

be

programming

arbitrarily.

model

Particular

on transaction until

that the pro-

behav-

subtransactions

such restrictions.

basic database

automaton

is also mod-

eled as an 1/0 automaton. A basic database passes requests for the creation of subtransactions to the appropriate recipient, initiates REQUEST_ COMMIT actions for accesses, makes decisions about the commit or abort of transactions, and passes reports about the completion of children back to their parents.

It may also car~

A basic

database

Input: REQUEST_

out other

activity.

has the following

CREATE(T),

actions

v), T a nonaccess

COMMIT(T,

COMMIT(T), ABORT(T),

T + TO T # T(]

REPORT. COMMIT(T, REPORT_ABORT(T),

action

signature:

T + TO

REQUEST–COMMIT(T, output: CREATE(T) REQUEST_

in its external

transaction

v), T an access transaction

name

name

V), T # TO T # TO

In addition, it may have internal actions. Depending

other arbitrary output actions, as well as arbitrary upon the design of the particular basic database

automaton, some of the additional output actions may be associated with particular objects. Hence, each basic database is assumed to come equipped with an extension of the object partial mapping on actions, which may associate some of these additional, nonserial output actions with particular object names. That is, each nonserial output action n may (but need not) have object(n) defined. The REQUEST. to be identified conversely, actions

all

CREATE with

the

for which

the

and REQUEST.

corresponding

CREATE

and

report

COMMIT

outputs outputs

T is an access) are identified

inputs

of transaction

with

(except

those

are intended automata,

and

CREATE(T)

the corresponding

inputs

of

transaction automata. A basic database is required to preserve basic database well-formedness. There are many examples of basic databases in the literature. For example, the composition of the generic controller and generic objects of [4] preserves basic database well-formedness, and so is an example of a basic database. The same is true for the composition of the pseudotime controller and pseudotime

900 objects

of [2]. We present

fact, we claim

that

these

almost

examples

all interesting

be modeled as basic databases. Our notion of basic database algorithms

that

It turns

out

are relevant

that

the

in more

later

in this paper. algorithms

In can

(See [14] for additional examples.) identifies the aspects of transaction-processing

to our analysis

details

detail

transaction-processing

of how

of orphan

management

synchronization

and

mented by a basic database are largely irrelevant. important contributions of this paper: we are able tions dent

ET AL.

M. HERLIHY

algorithms.

recove~

Indeed, to state

are imple-

this is one correctness

of the condi-

for and verify orphan management algorithms in a way that is indepenof the concurrency control and recove~ methods used within the basic

database. 3.5.3.

Basic

Systems.

A

compatible

set of

automata

transaction

names

and the

ated with for

each nonaccess

T. Associated

with

system type. When the particular

basic

system

indexed singleton name

basic

name

BD

system

the

composition

union

of

(for

“basic

set {BD}

transaction

the

B is the

by

the

of

a strongly

set of

nonaccess

database”).

T is a transaction

is a basic

database

B is understood

Associ-

automaton

automaton

from

for

the context,

A ~ the

we call

schedules, and behavits external actions the basic actions, and its executions, basic schedules, and basic behazliors, respectively. The iors the basic executions,

following proposition formedness properties: P~OPOSITION

5.

If

(1)

For elw?y transaction

(2)

The sequence PROOF.

Note

says that

basic

behaviors

p is a basic behauior, name

is basic database

first

the

basic

the

appropriate

then the following

T, BIT is transaction

serial(~) that

have

conditions

well- fon-ned for

well-

hold.

T.

well-formed.

database

preserves

basic

database

well-

formedness, formedness automaton

and this immediately implies that it preserves transaction wellfor every transaction name. Next, note that each transaction preserves transaction well-formedness for the appropriate transac-

tion

Furthermore,

name.

it has in its signature

no actions

and so preserves transaction well -formedness for first part of the proposition foljows by Proposition A simple induction shows that each transaction basic

database

Proposition

4.

well-formedness,

and the

second

of other

transactions,

all transaction names. The 4. automaton also preserves conclusion

follows

also from



3.6. SERIAL CORRECTNESS. In this subsection, we give appropriate notions of correctness for basic systems. These include notions appropriate for systems that manage orphans, as well as notions for systems that do not manage orphans but do carry out concurrency control and recovery. These notions are taken from [4], [12], and [13]. The spirit of our definitions is similar to that of the usual definition of serializability in the database literature. However, the usual notion does not take nesting or aborts into account. We define correctness conditions for basic systems of a given type by relating their behaviors to those of a particular basic system of that type, the serial system. The executions, schedules, and behaviors of a serial system are called serial executions, serial schedules, and serial behaviors, respectively. Serial systems are composed of transaction automata and a serial database, which

Correctness itself

of O~han

is the

automata

Management

composition

of

are identical

Algorithrm

a serial

to those

901

scheduler

in basic

and

objects.

The

serial

systems.

The

transaction

scheduler

controls

the order in which the transactions take steps and in which accesses to objects occur. It permits only one child of a transaction to run at a time. Thus, sibling transactions

execute

sequentially

at every

parent

node

in the transaction

tree,

so that transactions are run in a depth-first traversal of the tree. Also, the serial scheduler aborts a transaction only if it has not been created, and creates a transaction

only if it has not been

transactions

execute

Objects

in

guarantees never take with

sequentially

a serial

aborted.

Thus,

and aborted

system

are

quite

in a serial

transaction

simple.

execution,

take

Since

sibling

no steps.

the

serial

scheduler

that siblings execute sequentially, and that aborted transactions any steps, serial objects do not have to deal with concurrency or

failures.

The

serial

objects

serve

as a specification

of how objects

should

behave in the absence of concurrency and failures. (The serial objects serve the same purpose as the serial specifications [25] and [26].) A detailed description of serial systems may be found in the references [41 and [12– 14]. NOW we give a definition that says that a sequence of actions serial

behavior

actions there

to

and T is a transaction exists a serial

same thing

behavior

in P that

Now we can define that

a particular

a basic

system

it could

transaction. name,

we say that

y such that

~

“looks

like”

is a sequence

f? is serially

correct for

words,

a of

T if

T sees the

behavior.

of correctness

correct

if

y IT = ~ IT. In other

see in some serial

two notions is serially

Namely,

for basic systems.

if each of its finiteg

First,

behaviors

we say

is serially

correct

for all transaction names. 1“ The requirement that every transaction see a serial view is very strong. Without orphan management, in fact, systems may not meet this requirement. (This is true of all published concurrency control algorithms for nested transactions of which we are aware.) Instead, they provide a slightly weaker notion of correctness, namely that nonorphan transactions see serial views. More precisely, we say that a basic system is serially comect for nonophans transaction arbitrary The serially

names

if each of its finite that

are

not

orphans

behaviors in

~.

~ is serially Orphans,

correct

however,

for all can

see

that

are

views. papers correct

[2], [4], [12], and [13] contain for nonorphans.

examples

The basic system

of basic

systems

in [12] and [13] uses exclusive

locking for concurrency control and recovery,ll while more general commutativity-based locking strategy. timestamps for concurrency control and recovery. The orphan management algorithms of this paper

the systems in [4] use a The systems in [2] use ensure

that

the

systems

that use them are serially correct. To ensure this, the orphan management algorithms rely on the basic database to ensure serial correctness for nonorphans; in fact, the algorithms correctness for nonorphans.

work with any basic database that ensures serial In this sense, the orphan management algorithms

‘] Serial correctness is stated in terms of finite behaviors because the corresponding property for infinite behaviors is not satisfied by locking algorithms, in the absence of extra assumptions [22]. 1“ As discussed in [12], this definition of correctness allows different transactions in ~ to “see” different serial behaviors. However, correctness applies to the root transaction T(, as WCI1. so the root must see the same results from the top-level transactions that it could see in some serial ~~havior. There are some minor differences; for example, the completion and report actions are combined into single actions rather than treated as two separate actions.

M. HERLIHY

902 and the concurrency the following

sort

of the system

control for

with

algorithms

each orphan

orphan

are independent.

management

management

and

there exists a behavior y of the underlying and T is not an orphan in y. In other algorithms prevent thing a transaction basic imply

transactions from sees is consistent

system in some execution that if the basic system

corresponding

system

4. Information

Flow

In this

section,

models

the information

n

orders

in an execution,

n- cannot

world”

name,

then

it is not an orphan. These results correct for nonorphans, then the

events

is serially

partial

if an event that

m occurs

correct.

orders,

in behaviors

“know”

in which

of

that they are orphans—everyit could see in the underlying

of irreflexive

affects relations;

then

is a “possible

“knowing” with what

between

a result

~ is a behavior

T is a transaction

management

families

flow

If

basic system such that y IT = ~ IT words, the orphan management

in which is serially

orphan

we define

call these partial there

with

We prove

algorithm:

ET AL.

each

of which

of a basic system.

+ does not affect

@ occurred,

but

We

an event

in the sense that

+ does not.

The

algorithms

described later in this paper ensure that no event of a transaction is affected by the abort of an ancestor; thus, no event of a transaction ever knows that the transaction is an orphan. To make our definitions as general as possible, we define affects relations by describing the constraints they must satisfy. Later in the paper, we give examples of affects relations for particular kinds of systems. 4.1. FAMILIES R is a binary

OF AFFECTS

say that

y is R-closed

contains

any event

Let each

in

in ~, and

~ if, whenever

~ in ~ such that

B be a basic

system,

actions

affects relations

for B provided

that

RP

is an irreflexive

● Rp

then

(2) If -y is a prefix (&n) = RY,

of ~, and

the following order

+ and rr are in y, then

Suppose that (3 is a behavior of B and ABORT(T’) event, and m is a CREATE, REPORT–COMMIT, a REPORT–ABORT(T)

#

is related

(a) If m (b) If n (c) If ~ (d) If m (e) If T (f)

to

= = = = =

+

or

between

of relations,

conditions

of B and Y is an Rfi-closed

transaction,

we

in ~, it also one for

R is said to be a family

on the

(4)

is an event

T

actions,

~, then

of

hold.

events

in

~ such

that

if

m in ~,

If p is a behavior a behavior of B,

a nonaccess

of

an event

be a family

(3)

there

of basic

n-) E R.

of B. Then

partial

@ precedes

is a sequence

y contains

(+,

~ of external

(+, ~)

~

y is a subsequence

and let R = {RB}

sequence

(1) Each

If

RELATIONS.

on events

relation

(+, T)

subsequence

and

v

in

of p, then

y is

(~, T) G RD, where + is an a COMMIT, an ABORT, a of for T # T’, an output

a REQUEST–COMMIT +

E RP if and only if

/3 such

for that

an

access.

( q5, +)

E Rp,

Then where

m as follows:

CREATE(T), then 4 = REQUEST. CREATE(T), COMMIT(T), then * = REQUEST. COMMIT(T, REPORT-COMMIT(T, v), then * = COMMIT(T), REPORT-ABORT(T), then $ = ABORT(T), ABORT(T), then + = REQUEST-CREATE(T),

If w is an output transaction T,

of nonaccess

transaction

T, then

*

v),

is an event

of

Correctness

of O@an

Management

903

Algorithms

(g) If r = REQUEST-COMMIT(T, then object(~) = X.lz The first

two conditions

describes

a partial

v) where

are quite

ordering

T is an access to object

X,

R6

simple;

the first

says that

with

the order

in which

consistent

each relation events

appear

in

/3, and the second says that whether or not one event affects another is determined at the time the second event occurs. The third condition describes a sense in which the relation Rp captures all the dependency i elationships between

events.

behavior

/3, then

The

condition

n cannot

implies

“know”

that

that

if m is not

@ occurred,

affected since

by C$ in some

~ cou!.~ also have

occurred in a different behavior in which + did not occur. The fourth condition is a technical condition that describes certain limitations on the pattern of information strated

flow.

It will

be needed

for the examples

for

in Section

our

later

proofs,

and

can be demon-

9.

When the particular family R = {Rp} of affects relations is understood, we often refer to each RB as an ajfects relation, and we often say “0 affects n in ~“

to mean

that

4.2. FAMILIES a family

of

● RD.

(+, T)

OF DIRECTLY-AFFECTS

affects

family

of

family

of relations,

smaller

is said

to be

relations

can

relations. one

Thus,

for

each

RELATIONS. be

let

B be

transitive

closure

of R~.

In this

many

cases

described

a basic

sequence

of directly-affects a family R = {RP} of affects relations

family

In

conveniently

system

~ of external

by and

R’

actions

of interest,

a generating = {R~}

be

of B. Then

a R’

relations for B, provided that there is a for B such that for each P, RB is the

case, we say that

R’

generates

R. Note

that

there is no requirement that the relations in R’ be minimal. When the particular family R’ = {R~} of directly-affects relations is understood, we often refer to each R~ as a directly -aflects relation; also, we often say that

“+

directly

4.3. USING AGEMENT

the

view

FAMILIES

is that abort

of

it could

get

subsequence that

affect

the

definition

to mean

OF AFFECTS The ensure

that

original

in that

idea

an event

Then

in a behavior

we

behavior of affects

TO

DESCRIBE

behind

show

the

that

it is not

all

resulting

relations,

and

affected

transaction we

events

does

MAN-

management

T is never

every

sequence

ORPHAN

orphan

an orphan;

containing

The

m) ● R~.

(4,

of a transaction

can

in which

behavior.

that

RELATIONS

intuitive

ancestor.

of a family

simply

of T and

gets take all

a

the

events

is a basic

behavior,

not

contain

an abort

can

see

inconsistent

by for

of T, by construction.

following

state

in

with

children

example

a system

the following

they an

of the them

an ancestor The

n in ~”

ALGORITHMS.

algorithms by

affects

without

illustrates orphan

how

an

orphan

TI and T2, both

of which

scenario:

accesses X through

T first

an

Suppose that T is a transaction are accesses to an object X. Consider

management.

the creation of Tz, which will access X again. requested only if TI completes successfully,

its child

Furthermore, and that TI

T1. T then

requests

assume that T2 is modifies X. Now

suppose that T aborts before Tz starts running at X, and X learns of the abort. If T1’s modification of X is undone when X learns that T aborted, then T, will not see the value for X that it expects, since T2 only runs if T1 has modified X successfully. This scenario is captured more precisely by the following fragment 12 Note that

$ can be either

a serial or a nonserial

event.

904

M. HERLIHY

of a schedule of a generic in Section 9.1):

system

CREATE(T) REQUEST.

CREATE(T1

)

CREATE(TL) REQUEST.

COMMIT(T1

, VI )

COMMIT(T1) REPORT. COMMIT(T1, REQUEST.

CREATE(TZ REQUEST_

two

in more

detail

OF(T) , V2 )

event lets X know that T has aborted.) The family of generic systems described in Section 9.1 ensures that the

event affects the INFORM_ABORT–AT(X) OF(T) event, and that at an object is affected by all prior events at the object. Thus, by Tz from running when its events would be affected by the abort of

an ancestor,

The

are described

V, )

) COMMIT(Tz

(The INFORM-ABORT affects relations for

5. Filtered

systems

CREATE(TZ)

ABORT(T) INFORM.ABORT_AT(X)

ABORT(T) an event preventing

(generic

ET AL.

we can prevent

it from

knowing

that

it is an orphan.

Systems orphan

management

different techniques. However, implements the same abstract For Sections

5 through

system B, together with directly-affects relations algorithms that exploit

algorithms

analyzed

in

this

paper

use

each can be proved correct by showing algorithm, described in this section.

8 of this paper,

we fix a particular

quite that

but arbitrary

it

basic

a family R of affects relations for B and a family R’ of for B, where R’ generates R. We describe several the properties of the affects relations to manage

orphans. In Section 9, we illustrate this general development specific basic systems and their affects relations.

by describing

two

One way of ensuring that actions of a transaction T are never affected by the abort of an ancestor of T is to add preconditions to all the actions of the basic database to permit actions of T to occur only if they would this way. It turns out, however, that this approach checks more

frequently

than

necessary.

In

this

section,

we

define

not be affected in for orphans much another

kind

of

that checks for orphans only when filtered system, REQUEST–COMMIT actions occur for access transactions. We then show that this is sufficient to ensure that transactions are never affected by the aborts of ancestors.

system,

called

a

We construct a filtered given family R of affects transaction database

automata automaton

automaton; it “filters” that any transaction, the basic system.

system based on the given basic system B and the relations. The filtered system consists of the given

from B and is obtained

a filtered database automaton. The filtered by slightly modifying the basic database

REQUEST_ COMMIT actions of access transactions orphan or not, sees a view it could see as a nonorphan

so in

Correctness

of (lphan

Management

5.1. THE FILTERED simple transformation the behaviors of the

Algorithms

905

DATABASE. The filtered database is obtained via a from the basic database. The only difference between two databases is that the new database only allows a

R13QUEST_COMMIT

of an access to occur

if it is not affected

by the abort

of

an ancestor. The state

filtered of the

where

database filtered

has the

database

basic_ state is a state

basic

actions.

Initial

equal

to an initial

states state

same

signature

has two

of the basic

database

of the filtered

of the basic

as the

components,

and history

database

database

basic

database.

basic_ state

The

history,

is a sequence

are those

and history

and

with

equal

of

basic–state to the

empty

sequence. A triple conditions

(s’, m,s) is a step of the filtered hold:

database

if and only if the following

(1) (s’.basic-state, n-, s.basic-state) is a step of the basic (2) s.history = s’.historym if n- is a basic action,l~ (3) s.history = s’.histo~ if m- is not a basic action, (4)

If n = REQUEST-COMMIT(T,

v) where

T’ is an ancestor of T, then object(~) = X in s’ history. Thus, occur,

at the point an explicit

same object LEMMA

jiltered

Let

6.

5.2. THE transaction

of the filtered = beh( /3 ).

filtered

SYSTEM.

and the filtered

schedules, behaliors,

and

The

behaviors

the

filtered

system implements

The mapping

LEMMA

above,

Let

8.

ellent

f that

database

is the

length the

First

database of

lemma

~. If

composition

automaton.

executions,

that can Ieaue the

of the

We call

filtered

its execu-

schedules,

assigns to each state s of the filtered

the filtered

is easily

and

P be a jiltered

behavior

(~,

performs

an explicit

note

that

Lemma

~

holds

13Recall that internal

The

is empty, for

P‘.

the

From

7 and proof

result the

test to ensure

by the abort of any actually guarantees

and let T be any transaction p)

=

p) = RP, for any ancestor

well-formed.

system the

seen to be a possibilities



database

in B such that transaction

$ such that

PROOF. basic

to

at the

of the access.

that the REQUEST_ COMMIT of an access is not affected ancestor. The following key lemma shows that this test more: that a similar property holds for all events.

p be an elent

event

the basic system.

set f(s) consisting of s.basic_state Proposition 3 implies the result.

As described

no preceding

system

database

@ with

respectively.

The filtered

7.

PROOF.

schedule

Then s.histoly

X, and if

an event

of an access is about

that

of any ancestor

automata

tions,

singleton mapping.

~ be a jinite

affects

COMMIT

to verify

FILTERED

jlltered

event

the REQUEST_

by the abort

in states.

T is an access to object

no ABORTIT’)

is performed

is affected

database

LEMMA

where

test

database,

T‘

clearly

restrictions

the

Let

of T.

Proposition of

name.

T. Then there is no ABORT(T’)

5 imply

lemma

holds.

is by

Suppose

on affects

actions of the basic database are not classified

that

serial(~)

induction /3 =

relations,

~ ‘m, RF

as basic actions.

on and c

is the that

Rp ~ U

906

M. HERLIHY

{(0, m)l 4 is an action in P ‘}. Thus, lemma holds when p = m. Suppose

that

the lemma

by induction,

does not hold,

that

w in ~, where transaction(n) = T and T’ contradiction. We consider cases. (1) T

is a nonaccess

and

v

is an output

property of affects relations, n- such that ~ affects ~ in

it suffices

is, that

to show

that

~ = ABORT(T’)

is an ancestor

action

ET AL.

of

T.

there is an event /3’. This contradicts

affects

of T. We

Then,

the

derive

by the

a

fourth

* of T between + and the inductive hypothesis.

(2) T is an access to object X and w is a REQUEST_ COi14MITfor T. Then, by the fourth property of affects relations, there is an event Y of object X between ~ and m in ~‘ tion for m in the filtered (3)

such that database

m is CREATE(T). Then, is a REQUEST–CREATE(T)

~ affects is violated,

~ in ~‘. Then, a contradiction.

by the fourth property of affects relations, there event + between # and m such that +

affects * in ~‘. Since REQUEST_ CREATE(T) the inductive hypothesis implies that T’ is not The only possibility is that T’ = T, which implies REQUEST_ basic

CREATE(T)

database

in

well-formed,

P.

But

this

rr such that

~ affects

is an action of parent(T), an ancestor of parent(T). that ABORT(T) precedes

implies

rr is a nonserial ❑ diction.

5.3. SIMULATION following theorem system

ensures when

~ in /3’. Since transaction

behavior

~

discover

is

that

basic action.

OF

h

an orphan.

ment

algorithms.

( ~ ) = T“,

this contradicts

This

orphan, is the

gets

~ IT).

In

the

view

correctness

~ affects @ in hypothesis.

is undefined,

THE

FILTERED

SYSTEM.

that

a view

it could

get

in the

T’s

“view”

other it

words, sees

property

an

the

orphan

is consistent for

the

with

orphan

Let

y be the subsequence

of ~ containing

all actions

in

a

it

not

manage-

Let ~ be a jiltered behavior and let T be a transaction 9. there exists a basic behalior y SLLCh that T is not an oiphun in PIT.

PROOF.

basic cannot

THEOREM

Then ylT=

The filtered

It shows

a transaction

~‘.

a contra-

paper.

(Formally,

since basic

w such that the inductive

BY

of this

transaction

behavior,

), where T is a child of T. there is a REQI_JEsT_

transaction(m)

SYSTEM

result

an orphan.

local

is an

Then,

BASIC

key

every

is not its

being

THE

is the that

it

/? ) is not

hypothesis.

CllEATE(T, v) event + between ~ and Since transaction ~) = T, this contradicts

system

serial(

v), where T“ is a child of fourth property of affects v) event @ between ~ and

(5) T is a n.onaccess and ri is REPORT-ABORT(T” By the fourth property of affects relations,

(6)

that

a contradiction.

(4) T is a nonaccess and n- is REPORT-COMMIT(F3 T. Then T’ is an ancestor of T. By the relations, there is a REQUEST_ COMMIT(T”, the inductive

the precondi-

name. y and

n such that

transaction(w) = T, and all other actions @ that affect, in ~, some action whose transaction is T. Since RD is a transitive relation, y is RP-closed in ~. By the definition of a family of affects relations, y is a basic behavior. It suffices in

y.

to show that Suppose

not;

there that

is no ancestor is,

there

T’ of T for which

exists

an

ancestor

ABORT(T’ T’

of

T

) occurs for

which

Correctness

of Oiphan

ABORT(T’)

occurs

of T such

that

We obtain

PROOF.

Let

Theorem 8 and

of -y, ~ contains

~ affects

is

corollary

T in

an event

rr

8, this

is

P. By Lemma

~

be

a filtered

a behavior

~ IT =

/3 IT. Since

a

of Theorem

9.

serial

correct for nonorphans,

then the

correct.~4

9 yields

behavior

6 of the the

behavior

basic y

and

basic system

with

let

T be

system

is serially

-y IT =

any

such

8 IT;

transaction

that

T is not

correct

this

is

for

name. an orphan

nonorphans,

equal

to

~ IT,

as



needed.

At first,

it might

seem somewhat

REQUEST–COMMIT for all orphans. of the indicate event

event

If the basic system is serial~

10.

system is serially

there

907

by the construction

an ABORT(T’)

an important

COROLLARY

in

in y. Then

Algorithms



impossible.

filtered

Management

events

The reason

surprising

of orphan

that

it is enough

accesses to ensure

it is not necessary

to filter

to prevent

serial

other

the

correctness

actions

is because

assumptions about affects relations. Essentially, these assumptions that an event, say CREATE(T), cannot “know” about an ABORT(T’) unless

“knows”

some earlier

about

event,

in this case REQUEST_

the ABORT(T’).

an affects-closed is not affected

(The

assumption

CREATE(T),

about

affects

subsequence of a behavior is itself a behavior by ~, n cannot know that @ has occurred,

already

relations

that

implies that if n since there is a

behavior of the system in which m occurs and @ does not.) If we assume inductively that REQUEST_ CREATE(T) does not know about the abort of an ancestor

of T, then

CREATE(T)

will

The assumptions

the properties

not know about

assumed

about affects

for affects

relations

guarantee

that

it either. relations

derive

from

the restricted

communi-

cation patterns in typical systems: A nonaccess transaction receives information from its parent when it is created, and from its children when they report, but not from any other source. Access transactions may receive information from other accesses (e.g., accesses to the same object share state), but can only affect nonaccesses through a REPORT_ COMMIT event, which must be preceded by a REQUEST–COMMIT. As long as T does not receive reports from

any accesses that

a state

that

COMMIT

depends

actions

“know”

that

on

abort.

the

for orphan

its ancestor In

effect,

accesses, we isolate

has aborted, by

T cannot

preventing

orphan

observe

REQUEST_

transactions

from

the

objects, ensuring that an orphan transaction never sees that it is an orphan. Sometimes we want to be explicit about the dependency of the filtered system on the particular basic system and family of affects relations it is derived. Thus, we restate the corollary above in a form that dependency. then

If B is a basic

let Filtered(B,

system

and R is a family

R) be the corresponding

the same transaction automata the given basic database.

and the

filtered

filtered

of affects system;

database

from which exhibits this

relations

for B

it is composed

that

corresponds

of to

14It should be easy to see that the filtered system is also a basic system, and so the notion of serial correctness has been defined for filtered systems. Similar comments hold for the rest of the systems described in this paper.

M. HERLIHY

908 COROLLARY

6. Argus In that

comect for nonorphans,

section, system

we

describes

in

database,

simple

construction.

transactions

the

formal the

an

filtered

transactions.

tributed

system.

aborts

access

that

track

of

In

the

actual

sent

from

an

over event

directly-affects implement.

is serially

for

con-ect.

database,

the

Argus

by

the

that

[10].

basic

As

with

the

database

which

the

system

via

is composed

Argus

system

is serially

a of

imple-

correct,

ensure

abort

of

by

if

each later

that make

event

the

so is

ancestor,

the occurs,

h affects.

about

aborts

formally

directly

in

about of an

algorithm

and

we

affects

keeps

propagates

is propagated

ajj$ects. A

poor

describe another

of

a dis-

knowledge

by propagating

algorithm

@ directly

practical

Argus

that

this

of

actions

use of local

that

it

knowledge

REQUEST–COMMIT

event

the

global

is not

events

model

event

one

makes that

knowledge we

could

knowledge

an

uses

IIEQUEST_COMMIT

algorithm

system,

every

database

the

global

to any

network;

filtered

To

“known”

relation However,

show

filtered

filter

of

occurred.

Argus

to

The to

kind

an event

the

the

in the

system,

in the

system.

actions

Thus,

Argus

and

if

from

used

an Argus database

discussed

is obtained the

algorithm

by defining

algorithm

define

Thus.

This

aborts from

sages

of

affected

the

knowledge

Argus

have

is not

the

IIATABASE.

history

access the

of affects relations

B, R]

management

the algorithm

database

then

Argus

6.1. THE ARGUS entire

orphan

terms

system.

corresponding

the

the

Argus We

and

the

analyze

[9, 10]. We describe

filterecl

ments

then Filtered(

$stems

this

Argus

Let B be a basic system and R a famih

11.

B. If B is serially

ET AL.

this in

mes-

knowledge choice here

event

of hard

w

a to

if

only

the two events occur at the same site, or if a message is sent from ~’s site to n-’s after ~ occurs and before n occurs, then it is straightforward to implement the algorithm using information available the information about aborted transactions The construction of the Argus database

locally at each site, by transmitting on messages sent between sites. makes use of the given family R’ of

directly-affects relations for B. The Argus database has the same signature as the basic database. The state of the Argus database has three components: basic_ state, histo~, and known–aborts, where basic_ state is a state of the basic database, history is a sequence of basic actions, and known–aborts is a partial mapping from basic transactions whose

events aborts

to sets of transactions. This mapping records affect each event that has occurred. The

the set

known_ aborts(w ) may actually include more transactions than those whose aborts affect ~. By adding more aborted transactions to this set, an implementation would restrict the behavior of orphans further than is strictly necessary to ensure the correctness conditions. In Argus, for example, each event occurs at some physical node of the network, and each node manages a sipgle set of aborted transactions for the entire set of events that occur at that node. In this case, known_ aborts(~ ) includes at least the transactions whose aborts affect T, as well as those transactions whose aborts affect any other event that has occurred at the same physical node as n. Initial states of the Argus database are those with basic–state equal to an initial state of the basic database, history equal to the empty sequence, and known_

aborts

everywhere

undefined.

A triple

(s’, T,s)

is a step of the Argus

Correctness database

of O@zan if and only

(1) (s’.basic-state, (2)

Management

s.history

if the following

conditions

m-, s.basic_state)

= s’.historym,

909

Algorithms hold:

is a step of the basic

if T is a basic

database,

action,

(3) s.history = s’.history if T is not a basic action, (4) If m is a basic action and (+, m) = R~,~,,,o,Y then

s’ .known-aborts(

d) c

s.known_aborts(m), (5) If (6)

+ # n, then

s.known_aborts(~)

= s’.known-aborts(

~),

If rr is a basic

action and + is an ABORT(T) event in s’.history such that then T = s.known–aborts(m ), (+ T) c Rim,.,,, (7) If m is a REQUEST_ COMMIT(T, v) action for an access T to object X, then there is no ancestor of T in s’ .known–aborts( +), for any event 4 of object X in s‘ history. There basic

are two

database.

require

significant First,

s.known_aborts(~

directly

affects

requires

T

constraints ABORT(T)

differences

the

effects ) to

r. In addition,

to

be

in

between

for

each

in

include

s’ .known_aborts( m that is directly

to ensure

Second, the precondition permits the event to occur

n

an event

known–aborts(~

are enough affects n.

the Argus

action

for only

). As

that

database

the +)

Argus for

affected

Lemma

each

~

that

by ABORT(T)

16 below

s.known_aborts(m-)

and the database

contains

shows,

these

T whenever

the R13QUEiST_COMIVIIT of an access to X if the access does not “know about” the abort

of an ancestor, that is, no ancestor is in s’ .known_aborts( ~) for any event ~ of object X in s’ history. As Lemma 17 below shows, this is enough to ensure that every Argus The

behavior

known_

by the Argus

is a filtered

aborts

mapping

algorithm

behavior.

models

to keep

track

the distributed

information

of actions

abort.

that

maintained

However,

rather

than modeling nodes directly and keeping the information on a per-node basis as is done in the actual algorithm, we maintain the information for each event, propagating it whenever one event directly affects another. The known_ aborts component is managed so as to ensure that at least the minimum

amount

implementation

of

necessary

is permitted

stance.

an implementation

coarser

granularity.

information

to propagate might

keep track

03y maintaining

than

of the Argus

type follows

In describing

the

of the known_

the known–aborts

basis, the implementation this strategy.)

is propagated more

algorithm

at each

step.

minimum;

for

aborts

mapping

in the current

the algorithm,

mapping

An inat a

on a per-node Argus

we have tried

prototo focus

on the behavior necessary for correctness, and to avoid constraining an implementation any more than necessary. Notice also that the Argus database does not put any upper limit on what goes into the known–aborts known_ aborts(~ ) to contain

mapping. For example, it is permissible for a transaction that has not aborted. This might

cause nonorphans to block, but will otherwise not result in incorrect behavior. It would be easy (and intuitively appealing) to add a requirement that known_ aborts(w) only includes aborted transactions, but this is not necessa~ for proving that the algorithm To prove other properties, we would

prevents all orphans from such as that the algorithm

need to add additional

We do not attempt

requirements

to state or prove

seeing inconsistent views. only detects real orphans,

such as the one just mentioned.

such properties

in this paper;

the property

M. HERLIHY ET AL.

910

just described is a special case of more general liveness properties, which are an appropriate subject of further research. Finally, while the Argus database is distinguished from the filtered database by maintaining information about ABORT actions on a more local basis, we have kept the state component s.history, which maintains global knowledge of the past behavior of the system. A practical implementation of this algorithm would maintain less voluminous history information in a distributed fashion. Examination of the additional preconditions imposed by the Argus database on any

action

directly

n

affect

reveals

access T to object of efficient

that

s.history

T, and when

X, to determine

maintenance

is used the events

of sufficient

particular basic database further in this paper.

to

determine

the

n- is a REQUEST–COMMIT(T,

and

6.2. THE ARGUS SYSTEM.

that

history

The

Argus

precede

information

distribution

scheme,

system

events

~

v) action

that

for

an

n at X. The details are dependent

and

are

not

is the composition

on the

addressed

of transac-

tions and the Argus database. Executions, schedules, and behaviors of the Argus system are called Argus executions, Argus schedules, and AGUS behauiors, respectively. LEMMA

The Argt{s

12.

PROOF.

The

LEMMA

proof

Let

13.

system implements

is similar

to that

~ be a finite

Argus

in state s. Then s.krlowtl_aborts(w The

following

once during LEMMA

database

lemma

an Argus 14.

Let

The next

behavior

) is defitled

says that

~‘ p be a finite (s’,

~,s)

v ) is dejined.

lemma

of Lemma

each

that can leave the Argus if and only if v

known–aborts

says that

Argus

schedule,

15.

Let

and

r

database

is an elent

set is defined

is an extended

where

the known_

aborts

~ be a finite

Argus

in ~.

at most

can leave the Ajgus database.

If

) = s.known-aborts(v

set for

behalior

~‘

step of the Argus

then s‘ .knowtz–aborts(rr

events that directly affect m-, and that the known_ m is directly affected by an ABORT(T) event. LEMMA



7.

execution.

in state s‘ and

s‘ .known_abotts{

the basic system.

an event

aborts

T includes

set for w includes

that can leave the Argus

). all T if

database

in state s. (1)

If

4

are events

s.known–aborfs(

(2)

If w is directly aborts(w). PROOF.

Lemmas

affected

Immediate 13 and 14.

The next lemma It says that

in

P such

that

c+) G s.krzown_abo?fs(

by

$

by an ABORT(T)

the

directly

affects

w

in

~,

then

m).

definition

elent

of

the

in

~, then

Argus

T G s.known_

database

steps

and



is the key to the proof

the known–aborts

such that an ABORT(T) event propagates enough information about” (stores in s.known_aborts(

set for

of correctness

an event

for the Argus

m includes

system:

all transactions

affects T. In other words, the Argus database about aborts so that every event w “knows m-)) every abort that affects it.

T

Correctness

of O~han

LEMMA

Let

16.

in state s. If

p be a finite

~ and

ABORT(T)

euent,

PROOF.

The

Algorithms

Argus

911

behavior

that can leave the Argus

T are euents in ~ such that

~ affects

database

T in /3 and

~ is an

then T c s.known_abotis(~).

proof

directly-affects 1, then

Management

proceeds

relation

@ directly

by

by which

affects

induction

on

~ affects

m in

the

m in

~. Then,

length

of the

~. If the

Lemma

length

15 implies

chain

in

of the

that

the

chain

is

T E s.known_

aborts(v).

Now

suppose

that

the length

of the chain

is k + 1, where

k >1.

Then

there

is an event ~ in ~ such that @ affects ~ in ~ by a chain of length at most k and Y directly affects T in /3. By inductive hypothesis, T ~ s.known-aborts( +). Lemma 15 implies that s.known–aborts( ~) c s.known–aborts(n), so that T ~ ❑ s.known_aborts(n ). 6.3. SIMULATION ing

lemma

precondition ensure

LEMMA

the

same

Argus Argus is

except to

m- =

of

a mapping

we

omission

state.

We

must

G f(s’)

the

Argus

claim

element

f that

the

t‘

that

(t’,

the

consisting

for For

known_ actions

implements

of

the

that

system.

The

each

state

only

interesting T is an

of the

Argus that

component mapping.

filtered

to

system,

to

the

Condition and

an object

of the (s’,

check

where

is

of

state

system, case

access

filtered

the

system

s’ is a reachable

of the

to

s of

filtered

s.known_aborts

state

the

enough

system,

of the

that

is

followwith

system.

f is a possibilities

v), where m, t) is a step

to

The

combined

accesses,

filtered

state

2, suppose

is a reachable

for

assigns

SYSTEM.

aborts,

the filtered

of the

show

condition

REQUIX+lT_~OMMIT(T,

case,

BY THE ARGUS in

system implements

set f(s)

check.

system,

a step

system

define

singleton

database

1 is easy

SYSTEM

information

REQI_JEST_COMMIT

The Argus

We

the

the

Argus

17.

PROOF. the

that

on

that

system

OF THE BASIC

shows

m,s)

is when X.

In

t is the

this

single

of f(s).

To show that conditions

(t’, r, t) is a step of the filtered

defining

these

steps. The

first

three

system,

we must

are immediate

show the four

from

tion of the steps of the Argus database. To see the fourth, suppose ancestor of T. We must show that no ABORT(T’) event affects

the definithat T’ is an an event of

object X in t‘ history. Suppose the contrary, that an ABORT(T’) event ~ affects an event * of object X in t‘ history. Then, + also affects @ in s’.history. By Lemma 16, T’ = s.known_aborts( ~). But this violates the precondition for n in the Argus The

database.

following

Therefore,

theorem

shows

m is enabled that

ensure that every nonaccess transaction in which it is not an orphan. THEOREM

Then

there

18.

Let

~ be an Argus

exists a basic

behavior

Argus

in t‘.

ID

systems,

like

gets a view it could

behavior

y such

filtered

systems,

get in an execution

and let T be a transaction

that

T is not

Theorem

9.

an orphan

name.

in

y and

about

serial

ylT=~lT, PROOF.

As

for

correctness.

Immediate

the

filtered

by Lemma

system,

17 and

we obtain

an important



corollary

M. HERLIHY ET AL.

912 COROLLARY

Argus

If the basic system is serial~

19.

system is serially

Again,

correct for nonoiphans,

then the

correct.

we give a version

of the preceding

corollary

in which

the dependency

of the Argus system on the basic system is made explicit. If B is a basic system and R’ is a family of directly-affects relations for B, then let Argus(B, R’) be the corresponding Argus system; it is composed of the same transaction automata and the Argus database that is constructed from the given basic database,

using

COROLLARY

family

Suppose

20.

affects relations serially

the given

of relations

R’.

B is a basic system and R‘

for B. If B is serially

is a family

correct for non orphans,

of directly-

then Argus( B, R‘)

is

correct.

7. Strictly

Filtered

Systems

The orphan management algorithm described in [17] actually ensures a stronger property than does the Argus algorithm. It ensures that REQUEST–COMMIT can never occur for an orphan access, whereas the Argus algorithm merely can occur if the access can ensures that no such REQUEST_ COMMIT “observe”

that

database,

which

no ancestor

it is an orphan. allows

In

this

a REQUEST_

has aborted.

(Compare

section,

we define

COMMIT

to occur

this to the filtered

an access to REQUEST_ COMMIT access is not affected by the abort.)

if an ancestor We then define

which is composed of transactions and the strictly that the strictly filtered system implements the section, we describe formally the algorithm from ments

the strictly

7.1. THE similar triple

STRICTLY

to the (s’,

following

filtered

filtered

rr, s)

FILTERED

is

a step hold.

database,

(3) s.history

= s’ history

of

DATABASE. it has

the

The

the

strictly

if n is not

same

LEMMA PROOF.

if

allows

strictly

filtered and

database

event

the if

database same

and

states.

only

if

is A the

database,

action,

v) where

Thus, at the point where the REQUEST_ occur, an explicit test is performed to verify any ancestor of the access. 7.~. THE

which

filtered controller, and show filtered system. In the next [17] and show that it imple-

actions,

filtered

a basic

If m- = REQUEST–COMMIT(T, ancestor of T, then no ABORT(T’)

composition schedules, executions,

filtered

has aborted as long as the the strictly filtered system,

(1) (s’.basic–state, m-, s.basic_state) is a step of the basic (2) s.history = s’.historyn if n- is a basic action, (4)

strictly

an access only

system.

database;

conditions

the for

T is an access, occurs in s’ history.

COMMIT that there

and

T’

is an

of an access is about is no preceding abort

STRICTLY

to

of

FILTERED SYsTE~. The strictly jlltered system is the of transactions and the strictly filtered database. Executions, and behaviors of the strictly filtered system are strictly filtered schedules, and b~haliors, respectively. 21. The

The strictly filtered proof

is similar

system implements to that

of Lemma

the basic ~stem. 7.



Correctness

of Orphan

7.3. SIMULATION LEMMA PROOF.

We

that

f

condition t’

define

system

show

G f(s’)

BASIC

singleton

a

possibilities

is

that

is a reachable filtered

we

unique

claim

that

element

To

show

of the

m, t)

(t’,

these

tion

of

the

T’

is an ancestor

steps

of

the

state

s of

same

state.

the

the

system,

only

of

the

easy

and

(s’,

access

n-,s)

to

strictly

We

must

check.

For

filtered

case

filtered

the

to

strictly

interesting

T is an

a step

X

tions

for

the

strictly

Therefore,

no

Thus,

filtered

first

system,

is a step

to

check

of the

is when

an object

system,

But

that

X.

where

we

must

immediate To

see

=

the

s’ history, no

event

In

this

t is

the

and

an

the

the

fourth,

affects

by the

of

an

precondi-

event

event

four

definisuppose

event

ABORT(T’)

affects

show from

no ABORT(T’)

database,

ABORT(T’)

is enabled

are

database.

show t‘ history

filtered

system,

three

filtered must

in t‘ history.

~

of the The

strictly

of object in

steps.

of T. We

event

t ‘history.

the

m, t) is a step

that

m

is

each

1 is of

SYSTEM

system.

of

state

FILTERED

the filtered

Condition

filtered

before,

to

consists

v), where

(t’,

defining

s‘ .histo~.

assigns

that

s‘ is a reachable

As

STRICTLY

of f(s).

that

conditions

f that f(s)

mapping.

state

system.

BY THE

system implements

set

REQUEST–COMMIT(T,

case,

SYSTEM

a mapping

the

2, suppose

strictly m =

OF THE

913

Algorithms

The strictly filtered

22.

filtered

Management

occurs object

in

X

in



in t‘.

Strictly filtered systems, like filtered systems orphans from discovering that they are orphans.

and

Argus

systems,

prevent

name.

23. Let ~ be a strictly filtered behavior and let T be a transaction Then there exists a basic behal)ior y such that T is not an orphan in y and

ylT=

~lT.

THEOREM

PROOF.

Immediate

COROLLARY

by Lemma

Theorem

If the basic system is serially

24.

strictly filtered

22 and

system is serially

filtered

COROLLARY

nonorphans 8. Clocked

database

that

correct for nonophans,

then the

be the corresponding transaction automata

strictly and the

correct.

If B is a basic system, let Strictly-Filtered(B) filtered system; it is composed of the same strictly



9.

corresponds

to the given

25. Suppose that B is a basic system. then Strictly-Filtered(B) is serially correct.

basic

database.

If B is serially

correct for

Systems

In this section, we describe formally the orphan management algorithm from [17]. We do this by defining the clocked database, which uses a global clock to ensure that transactions do not abort until all their descendant accesses have stopped running. We then define the clocked system, which is composed of transactions and the clocked database. Finally, we show that the clocked system implements the strictly filtered system, and thus simulates the basic system in the same way as the previously 8.1. THE time

for

access has

not

CLOCRED

each

transaction passed.

access

mentioned

DATABASE.

The

transaction

is allowed Release

times

and

systems clocked

a release

to REQUEST_ are

chosen

do.

database time

for

COMMIT so that

once

maintains every only

a quiesce

transaction. if its

quiesce

a transaction’s

An time release

M. HERLIHY ET AL.

914 time

is reached,

all

its

descendant

accesses

have

quiesced.

A

transaction

is

allowed to abort only if its release time has passed. This ensures that, after a transaction aborts, none of its descendant accesses will request to commit. If quiesce and release times are fixed in advance, some transactions may be forced to abort unnecessarily as their quiesce times expire, and aborts may need to be delayed until release times are reached. It is possible to obtain extra flexibility

by providing

actions

and release times. The action signature

in the

of the clocked

clocked database

database, except that the clocked database internal actions. The new actions are: Internal

database

for

adjusting

is the same as that has

three

quiesce

of the basic

additional

kinds

of

Actions:

TICK ADJUST. ADJUST_ The

TICK

QUIESCE(T), RELEASE(T), action

T an access T any transaction

advances

the clock,

while

the two ADJUST

actions

adjust

quiesce and release times. By adjusting the quiesce time for a transaction to be later than its current value, we cad extend the time during which a transaction is allowed to run. Similarly, by adjusting the release time for a transaction to be earlier

than

its

current

value,

we

waiting as long as would otherwise The state of the clocked database clock,

quiesce,

and

release.

Here,

can

allow

a transaction

be necessary. consists of components basic_ state

is a state

to

abort

basic–state, of the basic

without aborted, database,

initialized at an initial state of the basic database. The component aborted is a set of transactions, initially empty. The component clock is a real number, initialized arbitrarily. The component transaction names to real numbers, mapping from all transaction names

quiesce is a total mapping from access and the component release is a total to real numbers. The initial values of

quiesce and release are arbitrary, subject to the following condition: for all transaction names T and T’, where T is an access and T’ is an ancestor of T, quiesce(T) A triple conditions

< release. (s’, n,s) hold:

is a step of the clocked

(1)

If m is an action of the basic system, is a step of the basic database,

(2)

If m is a TICK, then s.basic–state

(3)

database

then

ADJUST_ QUIESCE, = s’ .basic_state,

If n = ABORNT),

if and only if the following

(s’. basic_ state, n-, s.basic_state) or ADJUST_

RELEASE

action,

then

(a) s.aborted = s’.aborted u {T}, (b) s’:release(T) s s’.clock, (4)

If m 1s not an abort

action,

then

(5) If T = REQUEST-COMMIT(T, s’.quiesce(T), (6)

If m = TICK,

(7) (8)

If If (a) (b) (c)

then

s’.clock

s.aborted v) where

= s’.aborted, T is an access, then

< s.clock,

w is not a TICK action, then s.clock = s’.clock, n- = ADJUST-RELEASE(T), then if T e s’.aborted, then s.release(T) < s’.clock, s’.quiesce(T’) s s.release(T) for all T’ = descendants(T) s.release(T’)

s’.clock

= s’.release(T’)

for all T’ + T,

n

accesses,


s’.interval(T’) for every Effect: s.interval

= s’.interval

u

INFORM_ TIME_AT(X)OF(T, Precondition: (T, P) E s’.interval P = ‘m,.

T’ in siblings(T)

n

domain(s’

{(T, P)} p), T an access to X

interval)

have are

Correctness

of O~han

The following LEMMA

(1)

Management

lemma

Algorithms

927

is straightforward.

39

Let

T be any

transaction

transaction

name.

well- formedness

for

Then

the pseudotime

(2)

Let ~ be any object name. Then the pseudo~ime time object well- formedness for X,

(3)

The pseudotime

9.2.4. strongly

Pseudotime compatible

names with

and

the

associated

name

with

preserves

set {PC}

(for

presemes

presezes

pseudo-

well- fonnedness.

database is the composition of a by the union of the set of object

“pseudotime

X is a pseudotime

the name

controller

basic database

Database. A pseudotime set of automata indexed

singleton

each object

system

controller

controller

T,

object

controller”).

automaton

PC is the pseudotime

Associated

Px for X. Finally,

controller

automaton

for

the

type.

LEMMA

40.

name X,

If

is a behavior

~

/3 IX is pseudotime

LEMMA

of a pseudotirne

object

Let B be a pseudotime

41.

database,

then for every object

well-formed. database.

Then B preserves

basic database

well- formedness. It follows 9.2.5. strongly

the pseudotime

Pseudotime compatible

nonaccess {PC}.

that

automaton object

for

When utions,

Px

9.2.6.

A Fami~ relations

defining transitive

with

Finally,

of a basic

of Affects

names,

transaction each

object

associated

for the system

pseudotime behaviors,

database.

system

the

singleton

set

name

T is a transaction

name

X is a pseudotime

with

the

name

PC

is the

type.

is clear from

the pseudotime

and

of a set of

context,

executions,

we call its exec

pseudotime

sched-

respectively.

Relations.

any particular

Now

we define

pseudotime

system

a family B. We

R = {RD} do this

of

by first

a family R’ = {R~} of directly-affects relations for B, and then taking closures. For a sequence ~ of pseudotime actions, define the relation

R~ to be the relation before

X.

and behaviors

for

set of object

nonaccess

automaton

the particular

ules, and pseudotime

affects

for

controller

schedules,

the

each

T. Associated

automaton

pseudotime

names,

with

AT

is an example

Systems. A pseudotirne system is the composition set of automata indexed by the union of the

transaction

Associated

database

containing

n- in ~, and at least

—transaction(

the pairs

(~, m-) of events

one of the following

+) = transaction(m)

and

n

such that

$ occurs

holds:

is an output

event

of the

transac-

tion, —object( –~

@) = object(m)

and T is a REQUEST–COMMIT(T,

is a REQUEST.CREATE(T) event,

—~ —~ —+ —~ —+ —+

is is is is is is

and

v) event,

T an ASSIGN-PSEUDOTIME(T,

an ASSIGN. PSEUDOTIME(T, P) and T a CREATE(T) event, a REQUEST-COMMIT(T, v) and m a COMMIT(T) event, a REQUEST. CREATE(T) and m an ABORT(T) event, an ASSIGN-PSEUDOTIME(T, P) and w an ABORT(T) event, a COMMIT(T) and m a REPORT_ COMMIT(T, v) event, an ABORT(T) and ~ a REPORT-ABORT(T) event,

P)

928 —+ —~

M. HERLIHY

is a COMMIT(T) is an ABORT(T)

and rr an INFORM-COMMIT.AT(X)OF(T) and n an INFORM-ABORT-AT(X)

—~ is an ASSIGN. TIME–AT(X)OF(T, once

again,

define

easy to see that {RP}

defined

is a family

If

p is a behavior

LEMMA

of affects

{RP}

Applying Orphan easy corollary from

and

nono~hans,

of Lemma

is a family

n

relations

an

INFORM_

closure

We

claim

of R~. It is

that

the family

for G. system B and y is an R@-closed



36.

of affects relations

Algorithms the Argus

be used for

then the following

order.

relations

system,

defined

for B.

to Pseudotirne Systems. The algorithm and the clocked

a pseudotime

Let B be a pseudotime

44.

directly-affects

and

of B.

Management shows that

[17] can both

event

of pseudotime

to the proof

The family

COROLLARY

affects

partial

of ~, then y is a behalior

43.

algorithm

P)

event, event, or

OF(T)

RP to be the transitive

is an irreflexive

Analogous

PROOF.

9.2.7. following

the relation

R6

above

42.

LEMMA

subsequence

PSEUDOTIME(T, p) event.

ET AL.

aboue.

system:

and R and R‘ If

B

the family

is serially

cowect

of for

are true:

—Filtered( B, R) is seriallj correct, —Argus( B, R‘ ) is serially correct, —Strictly-Filtered( —Clocked(

B)

B) is serially is serially

correct,

correct.

10. Conclusions We have defined correctness properties for orphan management algorithms, and have presented precise descriptions and proofs for two algorithms from [10] and [17]. Our proofs are quite simple, and show that the systems exhibit a substantial

degree

of modularity:

the

orphan

management

algorithms

can be

used in combination with any concurrency control protocol (in basic system form) that is serially correct for nonorphans. The simplicity of our proofs is a direct result of this modularity, and is in sharp contrast to earlier work [6], in which were

the orphan not cleanly

management

algorithm

and the concurrency

control

protocol

separated.

Our proofs have an interesting structure. We first define a simple abstract algorithm that uses global information about the history of the system, and show that it ensures that orphans see consistent views. We Argus algorithm and the clocked algorithm in a way that only local information, and show that each simulates algorithm. The simulation proofs are quite simple, and do ing the properties already proved for the abstract algorithm. the Argus

and clocked

algorithms

the abstract algorithm. In this paper, we have transactions. eliminating complicated whether

the

then

analyzed

follows

only

directly

orphans

from

that

then formalize the requires the use of the more abstract not require reprovThe correctness of the correctness

result

from

aborts

of of

Interesting algorithms have also been developed for detecting and orphans arising from crashes [10, 17]. These algorithms seem more than the algorithms for handling aborts. An open question is known

algorithms

for

handling

crash

orphans

can

be analyzed

Correctness

of O~han

Management

using techniques similar find a similar separation

929

Algorithms

to those in this paper. In particular, of concerns for those algorithms,

orphan algorithms can be understood protocols and abort-orphan algorithms.

it would so that

be nice to the crash-

independently of concurrency Whether this will be possible

control is still

unknown.

ACKNOWLEDGMENT. for their

We

comments

thank

on earlier

Alan

Fekete,

versions

Ken

Goldman,

and Sharon

Perl

of this work.

R13FERENCES 1. ALLCHIN, J. E. An architecture for reliable decentralized systems. Tech. Rep. GIT-ICS83/23. Georgia Institute of Technology, Atlanta, Ga., Sept. 1983. M., AND WEIHL, W. A theory of timestamp2. ASPNES, J., FEKETE, A., LYNCH, N., MERRIIT, of the 14th International based concurrency control for nested transactions. In Proceedings Conference on Very Large Data Bases (Aug.), Morgan-Kaufmann, San Mateo, Calif., 1988, pp. 431-444. and Reco[ery in 3. BERNSTEIN, P., HADZILACOS, V., AND GOODMAN, N. Concurrency Control Database Systems. Addison-Wesley, Reading, Mass., 1987. Commutatk@-based locking for 4. FEKETE, A., LYNCH, N., MERRITT, M., AND WEIHL, W. nested transactions. J. Comput. Syst. Sci. 41, 1 (Aug. 1990), 65–156. 5. GOLDMAN. K., AND LYNCH, N. Quorum consensus in nested transaction systems. In Proceedings of 6th An?u~al ACM Symposium on Principles of Distributed Cornpzdation (Vancouver, B. C., Canada, Aug. 10– 12). ACM, New York, 1987, pp. 27–41. (Expanded version available as Tech. Rep. MIT\ LCS/TM-390. Laborato~ for Computer Science. Massachusetts Institute of Technology, Cambridge, Mass., May 1987.) Internal consistency of a distributed transaction system with orphan detection. 6. GOREE, J. A. Tech. Rep. MIT/LCS\TR-286. Massachusetts Institute of Technology, Cambridge, Mass., January 1983. M., LYNCH, N., MERRITT, M., AND WEIHL, W. On the correctness of orphan 7. HERLIHY, of the 17th International ,$ymposiam on Fault-Tolerant elimination algorithms. In Proceedings Computing (Pittsburgh, Pa., July). IEEE, New York, 1987, pp. 8–13. (Extended version available as MIT/LCS/TM-329, Laboratory for Computer Science. Massachusetts Institute of Technology, Cambridge, Mass., May 1987.) M. P., AND WING, J. M. Inheritance of synchronization and 8. DETLEFS, Il. L., HERLIHY, recovery properties in avalon/C + +. IEEE Cornput. 21, 12 (Dec. 1988) 57–69. and actions: Linguistic support for robust, 9. LISKOV, B., AND SCHEIFLER, R. Guardians distributed programs. ACM Trans. Prog. Lang. Syst. 5, 3 (July, 1983), 381-404. Orphan detection. In Proceedings 10. LISKOV, B., SCHEIFLER, R., WALKER. E. F., AND WEIHL, W. of the 17th International Symposium on Fault-Tolerant Computing (July). IEEE, New York, 1987, pp. 2–7. AdL. Comput. Res. 3 Concurrency control for resilient nested transactions. 11. LYNCH, N. A. (1986),

335-373.

12. LYNCH, N. A., AND MERRITT, M. Comput

Sci. 62 (1988),

Introduction

to the theory of nested transactions.

Theoret.

123-185.

to the theory of nested transactions. In ProceedTheory (Rome, Italy, Sept.). 1986, pp. 278–305. 14. LYNCH, N., MERRITT’, M., WEIHL, W., AND FEKETE, A. Atomic Transactions. MorganKaufmann, San Mateo, Calif. To appear, Fall 1992. Hierarchical correctness proofs for distributed algorithms. In 15. LYNCH, N., AND TUTTLE, M. Proceedings of the 6th ACM Symposium on Principles of Distributed Computing (Vancouver, B. C., Canada, Aug. 10–12). ACM, New York, 1987, pp. 137–151. (Expanded version available as Tech. Rep. MIT\ LCS/TR-387. Laboratory for Computer Science. Massachusetts Institute of TechnoloM, Cambridge, Mass., April 198’7.) An introduction to input/output automata. CW7 Quarter@ 2, 3 16. LYNCH, N., AND TUTTLE, M. 13. LYNCH, N., AND MERRHT, ings of the International

(1989),

Introduction

on Database

219-246.

17. MCKENDRY, Softw.

M.

Conference

Eng.

M.,

AND HERLIHY, 1989).

15, 7 (July,

M.

Timestamp-based

orphan

elimination.

IEEE

Trans.

930

M. HERLIHY

ET AL.

Nested Transactions: An Approach to Rehable Distributed Computmg. MIT 18. MOSS, J. E. B. Press, Cambridge, Mass., 1985. 19. NELSON, B. J. Remote procedure call. Tech. Rep. CMU-CS-81-119. Dept. Computer Science. Carnegie-Mellon Univ., Pittsburgh, Pa., May 1981. 20. PERL, S. Distributed commit protocols for nested atomic actions. Tech. Rep. MIT/LCS/TR-431. Massachusetts Institute of Technology, Cambridge, Mass., Sept. 1987. 21. Pu, C. AND NOE, J. D. Nested transactions for general objects: The Eden implementation. Tech. Rep. TR-S5-12-03. Dept. of Computer Science, Univ. of Washington, Seattle, wash., December 1985. System level concurrency cOntrol for ZZ. ROSENKRANTZ, D. J., STEARNS, R. E., AND LEWIS, P. M.

distributed database systems. ACM Tram. Datab. Syst. 3, 2 (June 1978), 178-198. A., ANDSWEDLOW,K. Galde to the Camelot Distnbated Transaction Facd@: Release 23. SPECTOR, 1. Carnegie-Mellon Univ., Pittsburgh, Pa., Oct. 1987. 24. WALKER, E. F. Orphan detection in the argus system. Tech. Rep. MIT/LCS/TR-326. Massachusetts Institute of Technology, Cambridge, Mass., May 1984. W. E. Specification and implementation of atomic data types. Tech. Rep. 25. WEIHL, MIT/LCS\TR-314. Massachusetts Institute of Technology, Cambridge, Mass., 1984. Modular concurrency control for abstract data 26. WEIHL, W. E. Local atomicity properties: types. ACM Trans. Prog. Lang. Syst. 11, 2 (Apr. 1989), 249–283. RECEIVED

JUNE 1987;

REVISED MARCH 1991 ; ACCEPTED APRIL 1991

JOurnd of the AssocIdmII for Cmnputlng hidchl~~m, Vd

39, No 4, oct~bc~ 1992

Suggest Documents