Designing Application Software in Wide Area Network Settings*

Mesaac Makpangou
Ken Birman

TR 90-1165
October 17, 1990

Department of Computer Science
Cornell University
Ithaca, NY 14853-7501

*This research was funded in part under DARPA/NASA subcontract NAG 2-593, and in part under DARPA contract MDA-972-88-C-0024.

Abstract

Progress in methodologies for developing robust local area network software has not been matched by similar results for wide-area settings. In this paper, we consider the design of application software spanning multiple local area environments. For important classes of applications, simple design techniques are presented that yield fault-tolerant wide area programs. An implementation of these techniques as a set of tools for use within the Isis system is described.

Keywords and phrases: Process-groups, ISIS, fault-tolerance, multicast protocols, network partitions.

1 Introduction

There is growing recognition of the utility of the process-group paradigm in distributed computing. Many distributed systems implement process groups and group communication facilities (process groups in the V system [5], port groups in Chorus [11], etc). In this approach, distributed software is structured into groups of processes that cooperate to monitor one another, share replicated data, and so forth. The ISIS system [2] is representative, and includes a variety of group multicast protocols (atomic multicast, causal multicast, etc).

Unfortunately, although such mechanisms are widely accepted, most have been implemented for, and restricted to, local area network (LAN) environments. These systems typically assume the communication characteristics of a LAN: low latency, high bandwidth, reliable communication, and that partitions do not occur. Such assumptions do not hold in a wide area network (WAN) environment. For example, messages may be lost, and partitions might occur. Consequently, conventional IPC, RPC, multiRPC, and multicast protocols that give acceptable performance in a LAN environment may perform poorly (or incorrectly) in a WAN environment.

This paper examines techniques to extend process groups into WAN environments. Such applications are constructed by interconnecting individual LAN systems, and present an integrated service by binding the user to a local process group, which responds to requests using local data whenever possible. Our goal is to identify and implement a suitable collection of WAN tools to assist in this process. These consist of mechanisms and protocols that assume that applications will be long-running and will experience such problems as partitions, network crashes, and long haul connection failures.

Because few WAN applications have been developed, we lack a good model for applications of this sort. To overcome this, we begin by examining problems that arise in a WAN application for the capture and analysis of seismic signals. We then turn to the problem of implementing the facilities needed to solve this problem. Finally we discuss a general framework for the support of wide area applications, presenting this in the context of the Isis environment.

The rest of this paper is organized as follows. Section 2 discusses our assumptions about the wide area computing environment. Section 3 discusses the applications we have selected and examines their support requirements. Sections 4, 5 and 6 discuss the mechanisms and long haul protocols that emerge from these case studies and provide performance figures for our initial implementation.

2 Background

2.1 The wide area system model and assumptions

Figure 1 illustrates the overall architecture of a wide area environment. The system is composed of a set of local area networks, interconnected by point-to-point long haul links that comprise the wide area network. The term cluster denotes the set of sites belonging to a single local area network. More than one link may connect two clusters.

Computing within a cluster takes place in processes that communicate via messages. A process group is a set of processes that are cooperating for some purpose. Our work was done in the context of Isis, a system that provides extensive support for process groups and reliable communication. Process groups do not span multiple clusters. We say that groups located in different clusters are related if they communicate with one another. A partitioned wide area application is one composed of related groups. Figure 1 depicts a situation where we have two partitioned wide area applications represented on each cluster by the process group named respectively G1 and G2.

A local multicast protocol designates a protocol used to multicast a message to the members of some process group. A long haul multicast protocol designates a protocol used to multicast a message to the members of a set of related groups.

2.1.1 Failure assumptions

We assume that each LAN system "isolates" the effect of a host crash, local connection failure, and LAN partition. This means that only application components located within the affected cluster are involved in the detection and handling of these events. These assumptions hold for our Isis-based implementation, but might limit the applicability of our work to other LAN-based systems.

With regard to wide-area communication, we assume that long haul connection failures, cluster crash, and WAN partition can all occur. Because clusters may be redundantly connected, we will say that a long haul connection failure occurs when a link connecting two clusters fails, and that a WAN partition occurs when all such links fail. It will be useful to distinguish two subcases of WAN partitioning:

[Figure 1: Overall architecture of a wide area system. Clusters A, B, C and D are interconnected by long haul channels.]

Controlled WAN partitioning

WAN communication lines may be costly or subject to physical constraints that cannot always be satisfied (e.g., a satellite link will need a line-of-sight path to a satellite). For these reasons, many applications use a periodic communication model. As needed (or whenever possible), clusters open communication links. Data is shipped across the links, which are then closed. We will refer to this kind of partitioning as controlled partitioning.

Unplanned partitions

A WAN partition is unplanned if it results from an unpredictable event such as the failure of the only communication line linking two clusters or the failure of a machine managing an endpoint of such a line. Such a partition is indistinguishable from the simultaneous failure of all the machines in one of the clusters. Our work assumes that no failure lasts indefinitely and hence that communication will eventually be reestablished. Accordingly, we focus on wide area applications explicitly designed to tolerate the delay introduced by unplanned partitions.

The following additional terminology is used throughout the rest of this paper. A partition is a WAN partition. An application is a wide-area application, formed of a set of related groups in separate partitions. And, a connection is a single long haul communication channel.
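The distinction between a long haul connection failure and a WAN partition can be made concrete with a small sketch. It is only an illustration of the definitions in this section, not part of the Isis implementation; the class `WideAreaNetwork` and its method names are hypothetical.

```python
from collections import defaultdict


class WideAreaNetwork:
    """Clusters joined by zero or more point-to-point long haul links.

    Hypothetical model for illustration; names are not taken from Isis.
    """

    def __init__(self):
        # unordered cluster pair -> {link_id: True if the link is up}
        self._links = defaultdict(dict)

    @staticmethod
    def _pair(a, b):
        return tuple(sorted((a, b)))

    def add_link(self, a, b, link_id):
        self._links[self._pair(a, b)][link_id] = True

    def set_link_state(self, a, b, link_id, up):
        links = self._links[self._pair(a, b)]
        if link_id not in links:
            raise KeyError(f"unknown link {link_id!r} between {a} and {b}")
        links[link_id] = up

    def connection_failed(self, a, b, link_id):
        """A long haul connection failure: this one link is down."""
        return not self._links[self._pair(a, b)][link_id]

    def partitioned(self, a, b):
        """A WAN partition: every link connecting the two clusters is down.

        Two clusters with no links at all are also reported as partitioned.
        """
        links = self._links.get(self._pair(a, b), {})
        return not any(links.values())
```

With two redundant links between clusters A and B, the failure of one link is a connection failure but not a partition; only the failure of both partitions the pair.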

2.1.2 An impossibility result

There exists a substantial body of work on protocols for environments subject to unplanned partitions [10,6].¹ The result most relevant to systems like Isis is by Skeen, who proves that protocols having the characteristics of a two- or three-phase commit cannot be terminated safely in the presence of possible partitioning failures. The LAN implementation of Isis uses multi-phase commit protocols at its lowest levels, to maintain information about the status (operational/failed) of process-group members. This information drives the higher levels of the system. An implication is that little of the software commonly used by Isis in LAN settings can be modified to work correctly in a WAN environment. In particular, the form of consistency that Isis supports cannot be made tolerant of network partitions without risk of "blocking" when partitions occur. The current version of Isis finesses this issue by shutting down the sites in a "minority" (smaller) partition. Were Isis to be used in a WAN setting, one would sacrifice either consistency (correct, predictable behavior) or availability.

Notice that although Skeen's results preclude any transparent scaling of the existing Isis systems - or any similar system - into a WAN environment, it is possible to make LAN systems highly resilient to failure, and the existing Isis toolkit is quite effective at using state replication for this purpose. This justifies our assumption that LAN services will be highly available (recovering rapidly from crashes) and will not lose "committed" state - the property we referred to as failure isolation, above.

¹Readers familiar with the database literature will be aware of several approaches that yield transactional serializability in the presence of partition failures. Unfortunately, these protocols cannot be extended into protocols for consistent group management and atomic communication, which are the cornerstones of the Isis approach.

2.1.3 Long-haul channels

We initially assume that inter-cluster communication is by a communication-failure free fifo channel. Such a channel has the following properties:

• All messages sent from one cluster to another are received in the order sent.

• Inter-cluster communication is not subject to message duplication or packet loss, even in the presence of connection failures.

These characteristics are stronger than what a general purpose transport protocol like TCP or any of the five ISO transport classes provides, because we require these properties even when multiple physical communication links exist between a pair of clusters and even when links fail or are restarted during the course of execution. In Sec. 5 the implementation of a communication channel with these properties is shown to be feasible using existing Isis facilities.
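One common way to obtain the two channel properties over unreliable, redundant links is to number messages at the sender, retain them until acknowledged, and have the receiver deliver strictly in sequence while discarding duplicates. The sketch below illustrates that general technique only; it is not the Sec. 5 implementation, and the names `ChannelSender` and `ChannelReceiver` are hypothetical.

```python
class ChannelSender:
    """Sending endpoint of one cluster-to-cluster channel (illustrative)."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}  # seq -> payload, retained until acknowledged

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return (seq, payload)  # packet for whichever physical link is open

    def acknowledge(self, up_to):
        """Cumulative ack: the receiver has delivered every seq below up_to."""
        for seq in [s for s in self.unacked if s < up_to]:
            del self.unacked[seq]

    def resend_pending(self):
        """Packets to retransmit after a link fails or is restarted."""
        return sorted(self.unacked.items())


class ChannelReceiver:
    """Receiving endpoint: delivers in send order, exactly once."""

    def __init__(self):
        self.expected = 0
        self.held = {}       # out-of-order packets waiting for a gap to fill
        self.delivered = []  # payloads handed to the application, in order

    def receive(self, packet):
        seq, payload = packet
        if seq >= self.expected and seq not in self.held:
            self.held[seq] = payload
            while self.expected in self.held:
                self.delivered.append(self.held.pop(self.expected))
                self.expected += 1
        # Duplicates fall through untouched; either way, return the ack value.
        return self.expected
```

Because the sequence numbers belong to the channel rather than to any one physical link, a packet may travel over any link, or be resent over a different one after a failure, without affecting delivery order.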

2.2 Impact of WAN characteristics on protocol design

For purposes of protocol design, a wide area network (e.g. ARPANET) differs from a local area network (e.g. ETHERNET) primarily in four respects: higher latency, lower bandwidth, point to point connections, and a higher probability of partition. These differences, together with the assumption that the application components located in different LAN systems are loosely coupled (that is, they interact relatively infrequently and most interaction is asynchronous), have a substantial impact on the implementation of long haul protocols, particularly those involving more than a pair of participants (such as multi-phase commit or reliable multicast):

1. Network partition must receive more attention.

In a LAN environment, the low probability of partition makes it feasible to either ignore these events, or to implement a harsh solution such as the Isis approach cited above. Such a treatment can be justified, at least in moderately small LAN systems, because partitions will be so infrequent and because when LAN failures actually occur, they provoke large numbers of machine failures by separating application programs from resources on which they depend. If large numbers of machines are crippled by a partition failure, simply assuming that these machines have actually failed may not be unreasonable. Isis users have reported little trouble with this restriction.

In a WAN environment, partition will often be the usual state, with clusters contacting each other periodically so as to minimize the cost of maintaining open connections for long periods of time and to maximize the use of connections when they are opened. Moreover, because applications will be loosely coupled, a WAN partition will generally not trigger large numbers of machine failures. These considerations make it important to limit the impact of a partition and to provide mechanisms by which applications can offer some restricted (or autonomous) level of service in partitioned settings.

2. Multicasting only when it is really necessary.

Systems like ISIS often structure applications and services using a collection of small process groups with perhaps 3 or 4 members each. A request on such a group may be implemented as an IPC or RPC to a favored member, or as a multicast² to the full set. In this case, either all members perform the request in parallel, or one member performs the request as the primary server while the others back it up for fault-tolerance. The primary/backup approach is encouraged in Isis because different group members can respond for different requests, providing a form of load sharing. This approach is inexpensive because it benefits from the comparatively high speed of communication and because the backup processes for one request will be working actively on other requests. Moreover, the multicast itself may make use of special LAN hardware facilities.

In a WAN environment, casual use of a "large-scale multicast" could lead to poor performance due to the long latency of WAN communication, lower WAN bandwidths, and possible restrictions on establishing and using WAN communication links. Consequently, the Isis style of programming will not map transparently to WAN applications. Instead, such applications will normally communicate with the WAN application through the group representing that application on the local cluster. As much as possible, this group will respond to requests using local information. If information from a remote server is needed, it will most often request it using some form of point-to-point communication. On the other hand, a long haul multicast might remain useful for asynchronous purposes, such as the diffusion of information to the groups in a partitioned wide-area application.

3 Case studies

This section discusses a series of problems motivated by a set of wide-area seismic monitoring applications collectively called the Nuclear Monitoring Research and Development System, or NMRD, being developed by Science Applications International Corporation under contract to DARPA.³ NMRD includes several knowledge-based applications which collect, analyze and archive seismic data from a geographically dispersed network of seismic sensors, and a rich set of tools for selecting and analyzing data in the archive to address seismological issues. The system is extensively automated with rule-based AI techniques.

The largest and most complex element of NMRD is the Intelligent Monitoring System or IMS which detects, locates, and identifies seismic events using data from a network of stations in Eurasia. IMS is structured as a collection of LAN clusters, initially placed in Washington, Norway, and San Diego. As the system is developed, there are potential requirements for expansion to include several more LAN clusters.

Our group became involved in developing LAN and WAN software for NMRD and IMS in 1989. The LAN aspects of NMRD are concerned with system fault-tolerance and configuration management, communication, LAN resource scheduling, and related issues. All of these aspects are beyond the scope of the present paper. Below, we focus on WAN use of Isis in the current IMS prototype.

Currently, IMS is structured like a wheel, with a central "hub" in Washington, DC, that performs most of the automated data interpretation functions. A set of "spokes" connect this hub to free-standing LANs which acquire the data and do extensive signal processing to select and characterize data segments which may have signals of interest. The central interpretation done at the "hub" plays a crucial role in this selection. The spokes comprise the WAN communication network, and consist of long-distance TCP channels. Most of the WAN communication consists of automatically initiated data selection and transfer operations, with the hub software issuing requests to the remote subsystems. Because the system is automated, the fault-tolerance of these operations is critical to correct function.

In the future, IMS and other NMRD subsystems may grow to include multiple hubs, supporting seismic researchers as well as automated analysis, and this will make it important to support a number of additional WAN services. The discussion that follows examines some of these hypothetical issues after briefly commenting on the file transfer problem.

²We are using multicast in the sense of a software protocol for communicating with the full membership of a dynamically changing group - not in reference to a hardware feature.
³DARPA Contract No. MDA972-88-C-0024

3.1 File transfer and remote notification

The most common of the WAN applications arising in IMS concern inter-LAN event notification and file transfer. The initial signal processing is done close to the data acquisition systems to avoid the requirement that all data be transferred to the hub. All acquired data are processed to detect signals and characterize them in terms of a standard set of parameters which are archived in a local commercial relational database management system (RDBMS). On a regular schedule (e.g., every 15 minutes), the hub initiates a request to transfer data from the remote RDBMS to the central RDBMS at the hub. The automated knowledge-based system (KBS) at the hub analyzes the data from all stations to locate and identify all detected events. Depending on the location and character of the events formed by the KBS, a request is formed for relevant segments of the raw data.

The sequence of steps involved in such a raw data transfer is as follows. First, the ISIS long-haul utility is invoked by an IMS program running on the hub with a message describing the data to be retrieved (station and time interval). The remote portion of IMS receives this message, retrieves the requested data and initiates the file transfer to the hub. When the file transfer takes place, a suitable spooling area is found for the incoming data, and the hub process that initiated the retrieval is notified. Finally, after the transfer has completed successfully, the remote file is deleted.

This procedure is generalized by replication for additional remote sites. Fault-tolerance is key here: errors such as failure to transfer files, lost or duplicate notification messages, and so forth cause problems requiring later human intervention.⁴

3.2 Resource location

Resource location is the problem of mapping resource names into information about the location and contents of the named data objects. This is the problem solved by so-called "white pages" services, and represents an active research topic. Because the current IMS system is centralized, the problem does not yet arise. However, WAN solutions to the resource naming problem would become important if the system expands to include multiple hubs.

Imagine an IMS-like system running with many integrated computational hubs. Each of these hubs would have the ability to request information (new_data) from outer clusters (data that was not provided as part of routine processing). Obtaining and analyzing new_data may involve expensive (in terms of resources) data retrieval and processing operations. For example, it might require that a complex data adaptive beamforming operation be performed; such computations may require hours of CPU time. Clearly, one would not want to perform this sort of operation on hub A when hub B has already performed one. It follows that when a new_data request is made, a service will be needed to determine if the computation has already been performed (or is underway), and if so, whether it would be cheaper to transfer the computational results or to transfer the raw data and repeat the analysis locally.

⁴IMS almost never "crashes" due to software failures - the system tries to handle errors gracefully. However, errors may cause the system to lose things - events, data for the analyst to review, etc. In cases where the lost data may be important, a fairly tedious manual corrective action will eventually be needed.

It is natural to think of such a version of IMS as generating and manipulating a large event-file or database. This file would identify both raw events and the location (and size, and computational cost) of the corresponding processed data file, or the location of any hub currently engaged in such a computation. The problem can thus be reduced to one of locating resources in a WAN.

A number of difficult problems now arise. First, observe that the naming space is a dynamically changing one with several natural forms of hierarchy: physical hierarchy in space (i.e., the set of events known only within some local cluster), logical hierarchy (i.e., the set of raw-data objects associated with some new_data event), and global hierarchy (i.e., a set of events currently under consideration as evidence that a nuclear test has been detected). Operations on the naming space will be search requests, read requests, and update requests. For simplicity of design, one would want this namespace to present a seamless global abstraction. At the same time, information should be maintained close to where it will be generated or manipulated, to avoid excess WAN communication.

Consistency or coherency of such a WAN naming structure will correspond to the property that any update eventually reaches all clusters with a copy of an event descriptor, that read operations preserve the abstraction of a single global namespace, and in particular, that updates appear to be serialized. To see this, consider a computation that reads a descriptor (say, a correlation descriptor). The computation should subsequently see "current" copies of any other event descriptors on which this descriptor depends; otherwise, it would appear that the namespace has somehow become corrupted. Such a relationship is causal, and we will have more to say about mechanisms for enforcing causal orderings shortly.

For brevity, we will not develop a complete solution to this problem here. We observe, however, that the core mechanisms needed here will be ways to form WAN groups and to multicast updates to the group members. Given such tools, the resource management service would be structured into a collection of information domains within which updates would be multicast to all members. To ensure that the namespace presents a causally consistent abstraction, we will need to know that any multicast sent to such a WAN group (eventually) reaches all its members, and that if an update is dependent upon some prior update, then all WAN group members see the two updates in the order they were issued.

Notice also that once a WAN group is formed in this application, its membership remains fairly stable. Only the creation of new hubs or the addition of new sensor clusters would require changes in this part of the system configuration. Both operations will obviously be infrequent. The physical scale of WAN systems suggests that this form of stability should be fairly common. On the other hand, within such a WAN multicast group, one can easily imagine needing to send messages to a subset of the total membership.
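The ordering requirement on dependent updates can be illustrated with a replica that holds back an update until the update it depends on has been applied. This is a minimal sketch under simplifying assumptions that are ours, not the paper's: each update names at most one prior update explicitly, and the names `Update` and `NamespaceReplica` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Update:
    uid: str                          # unique identifier of this update
    descriptor: str                   # name of the event descriptor written
    value: str
    depends_on: Optional[str] = None  # uid of a prior update, if any


class NamespaceReplica:
    """One cluster's copy of the namespace (illustrative only)."""

    def __init__(self):
        self.state = {}     # descriptor -> current value
        self.applied = []   # uids in the order they were applied
        self._seen = set()
        self._waiting = {}  # missing uid -> updates blocked on it

    def receive(self, update):
        if update.uid in self._seen:
            return  # already applied
        dep = update.depends_on
        if dep is not None and dep not in self._seen:
            # The prior update has not arrived yet: hold this one back.
            self._waiting.setdefault(dep, []).append(update)
            return
        self._apply(update)

    def _apply(self, update):
        if update.uid in self._seen:
            return
        self.state[update.descriptor] = update.value
        self.applied.append(update.uid)
        self._seen.add(update.uid)
        # Release anything that was waiting for this update.
        for blocked in self._waiting.pop(update.uid, []):
            self._apply(blocked)
```

Two replicas that receive the same pair of dependent updates in opposite orders end up applying them in the same order, which is the property the namespace needs.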

3.3 Resource scheduling

The above examples show how IMS uses WAN file transfer and WAN multicast. They also hint at the need to support WAN resource allocation and scheduling policies in an extended system. Notice that the existing IMS permits an analysis program or researcher working in Washington to initiate data retrieval requests and computation in Norway. This is not a major issue if there is only one hub. However, with multiple analysis hubs, it would become important to partition computational cycles among the various hub systems contending for database access and signal processing facilities. Otherwise, it would be easy for an IMS component at one location to overload a cluster located halfway around the world, preventing it from accomplishing locally critical tasks such as data compression and event detection, denying local analysis systems a fair share of the computational resources.

We can abstract this problem as one of selling tickets for a periodic event. Only a process holding the appropriate tickets will be granted access to the processor pool on a given LAN. An "event" in this formulation might correspond to one specific hour of activity on the Norway cluster, and a ticket to a permission to perform five minutes of computation during that hour. The ticket sales problem has substantially more structure than the basic file transfer and remote notification problems seen in our first example.

A solution to this problem should address two goals. The first arises from the need to design a loosely coupled scheduling service. It should be possible to sell tickets for a future event on a remote cluster even if communication with that cluster is presently impossible, or even if a connection fails during the interaction, or even if a partitioning or cluster failure occurs. A second goal is that the system should satisfy the maximum number of demands possible (presumably using an application-specific cost function) while also guaranteeing fairness (also an application-specific notion).

Let us ask what can be said about this problem without speculating on the application-specific aspects. Clearly, if the distribution of tickets is static and fixed, a cluster that receives a large number of demands may not be able to satisfy all of them, while some other cluster may fail to sell some of the tickets it holds. This will compromise the second goal, and suggests that the distribution algorithm will either need a central decision making mechanism or a way to dynamically repartition the collection of tickets. A centralized policy would violate our first goal. Thus, we need a dynamic distributed allocation policy. Such an approach might pre-allocate tickets to clusters, but include a mechanism for reallocating unsold tickets as the "event period" approaches. Ideally, we would want this mechanism to make progress even if a communication failure or partition occurs.

3.3.1 Structure of the application

Assume that we have N clusters and that a group of ticket vending processes are active in each cluster. We will partition the pool of tickets into N subsets and pre-allocate each to a specific cluster. Each vending group uses its partition to serve demands from its local workers. Next, we divide the selling period into subperiods. At the end of each subperiod, each server multicasts a state message to its peers. This message reflects recent sales as well as the anticipated needs of the sender. Finally, on the basis of the state messages it receives, each server computes a new partitioning of unsold tickets using some deterministic, well known algorithm.
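The structure just described can be sketched as follows. The text leaves the repartitioning rule open, so the rule used here (redistribute unsold tickets in proportion to anticipated need) is purely an assumption chosen for illustration, as are the names `StateMessage`, `initial_partition` and `repartition`. The sketch also assumes a server has the state messages of all its peers for the subperiod before it recomputes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateMessage:
    server: str  # sender of the state message
    unsold: int  # tickets the sender still holds
    need: int    # tickets the sender anticipates being asked for


def initial_partition(total_tickets, servers):
    """Pre-allocate the pool as evenly as possible, in sorted server order."""
    ordered = sorted(servers)
    share, extra = divmod(total_tickets, len(ordered))
    return {s: share + (1 if i < extra else 0) for i, s in enumerate(ordered)}


def repartition(messages):
    """Redistribute all unsold tickets in proportion to anticipated need.

    Every server runs this on the same set of messages and, because the
    rule is deterministic, computes the same allocation without further
    communication.
    """
    ordered = sorted(messages, key=lambda m: m.server)
    pool = sum(m.unsold for m in ordered)
    weights = [m.need for m in ordered]
    if sum(weights) == 0:
        weights = [1] * len(ordered)  # nobody needs tickets: split evenly
    total_weight = sum(weights)
    alloc = {m.server: pool * w // total_weight
             for m, w in zip(ordered, weights)}
    # Hand out tickets lost to rounding, neediest server first.
    leftover = pool - sum(alloc.values())
    by_need = sorted(ordered, key=lambda m: (-m.need, m.server))
    for m in by_need[:leftover]:
        alloc[m.server] += 1
    return alloc
```

Sorting the messages before computing makes the result independent of the order in which they arrived, so every server derives the same partitioning.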

3.3.2 Classes of ticket repartitioning algorithms

Repartitioning algorithms can be characterized by their sensitivity to the delivery order of state messages, and by the degree to which actions by servers in different partitions are synchronized. We distinguish three classes of such algorithms:

1. Class 1 consists of algorithms that operate asynchronously and are insensitive to the order in which state messages are received from different servers. These are all fixed, well known, and deterministic repartitioning algorithms. For example, suppose that we have five servers. An algorithm in class 1 might assign 1/5 of each lot of unsold tickets carried by each state message to each server. Notice that even if different servers see state messages in different orders, the number of tickets available to a given server in a given round will be the same. Class 1 algorithms are simple and stateless: they require only that the system provide eventual delivery of each state message to its destinations, and that the set of participants be fixed before execution starts. We refer to WAN multicasts satisfying this eventual delivery property as fault-tolerant WAN multicasts.

2. Class 2 algorithms operate by having each server wait for all the round-k state messages before carrying out the repartitioning for round k+1. Such an algorithm has more flexibility than the class 1 algorithms because it operates with full knowledge of ticket sales, availability of unsold tickets, and anticipated demand. Again, the algorithm must be deterministic and well known, so that all servers can execute it in parallel. Class 2 algorithms are thus insensitive to the order in which messages are received but synchronous. Like their counterparts in class 1, these algorithms require that the system provide information about the set of participants and support for fault-tolerant multicasts.

3. Class 3 algorithms are sensitive to the delivery order of state messages and asynchronous. For example, consider a system in which a server needing tickets broadcasts its need, and servers with a surplus broadcast the existence of the surplus. One might imagine a rule under which all servers, in parallel, reallocate tickets as each such message is received. Such a scheme has the advantage of making progress as rapidly as possible, as in the class 1 algorithms, but without requiring the rigid determinism of the class 1 algorithms. However, the order in which messages containing ticket requests are received may affect the way that tickets are repartitioned in this case. In general, servers implementing class 3 algorithms may need to see all state messages in the same order, or at least in a predictable order. We will refer to such multicasts as ordered WAN multicasts.

General remarks

1. Class 1 algorithms will perform poorly if demands are not uniformly distributed within the WAN system as a whole. Typically, for these algorithms to maximize the number of requests satisfied, the selling period will need to be divided into small subperiods. Such division will increase the wide-area network traffic, making the application components more tightly coupled.

2. Class 2 algorithms might reduce availability at certain locations. Suppose that some server has no more tickets to sell. Even if it has already received a state message indicating that unsold tickets exist on some other server, and even if the repartitioning algorithm is such that it will be allocated some of these at the repartition time, it has to wait until it receives all state messages before granting any further requests.

3. Because class 3 algorithms allow servers to operate asynchronously, these are more likely to yield a loosely coupled solution. However, class 3 algorithms need a multicast primitive with known delivery ordering properties, and this may be a more costly primitive than the one used in a class 1 asynchronous algorithm. We return to this issue below.

4. Communication failures will affect all these algorithms by delaying the delivery of state messages.

• For class 1 algorithms, delays impact ticket availability at certain locations. For example, suppose the two subsets of servers {A, B} and {C, D, E} are isolated from one another. Naturally, messages about unsold tickets released by each subset will not reach the other during the partition. Therefore any tickets released by A or B that the algorithm will assign to C, D or E will remain unused during the partition.

• For class 2 algorithms, the delay might completely inhibit ticket repartitioning for the duration of the partition.

• Finally, for class 3 algorithms, delays impact the availability of unsold tickets in certain partitions. Moreover, communication partitions might prevent the algorithm implementing atomic WAN multicast from making progress in certain partitions. For example, if WAN multicast is done using a multi-phase protocol, a partition during the first round could completely inhibit the delivery of WAN messages for the duration of the partition. This suggests that one-phase protocols are strongly preferable to multi-phase protocols in WAN settings.
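The order-insensitivity claimed for class 1 algorithms can be checked on the five-server example. In the sketch below each state message is reduced to a pair (sender, size of the lot of unsold tickets released), and each server takes its fixed share of every lot as the message arrives. The rounding rule for lots that do not divide evenly is our assumption, and the function names are hypothetical.

```python
SERVERS = ("A", "B", "C", "D", "E")  # fixed before execution starts


def share_of_lot(lot_size, server, servers=SERVERS):
    """Tickets from one released lot that the class 1 rule gives `server`."""
    base, extra = divmod(lot_size, len(servers))
    # Assumed tie-break: remainder tickets go to the first `extra`
    # servers in the fixed, well known order.
    return base + (1 if servers.index(server) < extra else 0)


def tickets_after(messages, server):
    """Apply state messages one at a time, in whatever order they arrive."""
    held = 0
    for _sender, lot_size in messages:
        held += share_of_lot(lot_size, server)
    return held
```

Since each message contributes a share that depends only on that message, the total a server holds is the same for every arrival order. During a partition, the messages from the other side are simply absent from the list, so the shares they carry stay unused until communication is reestablished.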

3.4 Summary of WAN communication requirements

The examples discussed above seem representative of a reasonably large class of wide area applications. In this section, we summarize the essential WAN communication requirements that emerge.

An abstraction superimposed upon the concept of group

WAN applications will typically need communication between a set of related groups located in different clusters. This wide area set of groups (wSet) constitutes a new WAN abstraction superimposed upon the existing Isis LAN process group mechanisms. In such a set, each element is a group and there is at most one element on each cluster. It must be possible to transmit messages to individual members of this set of groups as well as to the set as a whole. Unlike groups in LAN settings, it seems reasonable to assume that wSets change infrequently after creation.

Fault-tolerant multicasts

Certain applications need a multicast protocol tolerant of failures. Such a protocol will eventually deliver messages to all its destinations even in presence of partitions, network crashes or connection failures. If a server issues a fault-tolerant multicast and then fails, and the system has "accepted" the message in a sense discussed below, this fault-tolerant multicast must be delivered sooner or later to all its destinations. Conversely, when a server recovers from a crash, it should be able to recover pending fault-tolerant multicasts destined ...
