Teapot: Language Support for Writing Memory ... - Semantic Scholar

9 downloads 10698 Views 1MB Size Report
Teapot: Language Support for Writing Memory Coherence Protocols. Satish Chandra, Brad ... Custom protocols have been used to achieve message-pass-.
Teapot: Language Support for Writing Satish Chandra,

Memory

Brad Richards,

Computer University

Sciences

and James R.Lams Department

of Wisconsin-Madison

1210 West Dayton Madison, {chandra,

exciting possibility

parallel

lams}

COISto lit an application’s

systems offer the A shared-memory

memory coherence pmto-

ming model of a global protocols dlficult

semantics and sharing patterns.

written

address space-and

constructs.

sistent view of memory.

in a conventional

Teapot,

a small,

coherence protocols.

help reduce the complexity

or modlfi.

This paper

protocols.

nearly as fast as hand-written

an application’s

to

nication.

Simple

poorly

C code. A Teapot specifica-

For

between a protocol

and

example,

invalidation

protocols

sharing,

perform

since invalidating

copies forces the consumers to re-request data,

transfer [7]. Recent shared-memory

CO1and to input for a model checking system, which permits

in a higher-level

can deter-

which requires up to four protocol messages for a small data

degrees, the flexibility

the specljication to be venjied. We report our experiences coding and venfiing several protocols written in Teapot, ing a protocol

protocol

sharing pattern leads to excessive commu-

for producer-consumer

outstanding

that run

tion can be compiled both to an executable coherence pmto-

along with measurements of the overhead incurred

sees a con-

a coherence

copies with the new value. A protocol

stores to shared data. A mismatch

much of the

and results in protocols

In general,

mine, to a large extent,”the performance of a shared-memory program, since communication occurs through loads and

language for

Teapot uses continuations of writing

static analysis in the Teapot compiler eliminates overhead of continuations

updating

coherence

language such as C are

domain-specij$c

manages the repliprogram

ensures consistency when a processor writes to a shared location either by invalidating outstanding copies or by

to implement

Unfortunately,

to write, debug, understand,

describes writing

language

coherence protocol

cation of data, to ensure that a parallel

Custom protocols have been used to achieve message-passing performance-while retaining the convenient programhigh-level

USA @cs,wisc.edu

1 Introduction

computer

of customizing

St.

WI 53706

richards,

Abstract Recent shared-memory

Coherence Protocols

architectures

of tailoring

provide,

to varying

a coherence protocol

to

better fit an application’s needs. An example of such an architecture, the one on which this work is based, is the Tempest interface, a specification of the mechanisms neces-

by writ-

language.

sary to implement

shared-memory

coherence protocols

at

user-level, i.e., as unpriviledged code running in an application program [14, 23]. With Tempest, an application programmer can either use a ready-made shared-memory

This work is supported in part by Wright Laboratory Avionics Directorate, Air Force Materiat Command, USAF, under grant #F33615-94-l-1525 and ARPA order no. B550, an NSF NYI Award CCR-9357779, NSF Grant MIP-9225097, DOE Grant DE-FG02-93ER25176, and donations from Digitat Equipment Corporation and Sun Microsystems. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon, The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Wright Laboratory Avionics Directorate or the U.S. Government.

protocol

from a library

memory protocol achieving

or construct

a customized

that suits an application’s

higher performance

shared-

needs, thereby

[11 ]. Compiler

writers

can

exploit custom coherence policies as well. For example, a recent paper showed how a custom protocol (LCM) enables a compiler implement

to use a fine-grain, copy-on-write mechanism to a language’s copy-in, copy-out semantics, with-

Permissionto make digitelhrci copy of part or atl of this work for persondl or classroom use is ranted without fee provided Umtcopies are not mada or distributed for prof!t or commercial advanta e, the copyright notice, fhe title of the ublication and its date appear, an8 notice ia given that copying is ! y parmisdon of ACM, ino. To copy otherwise, to republish, to post on asrwrs, or to redistribuk to lists, raquires prior spedfio perrrrisskm andbr a fee.

out being limited

PLDi ’965’96 PA, USA CI 1996 ACM 0-89791-795-290/0005... $3.50

willing

237

by conservative pointer analysis [18].

Unfortunately, Tempest mechanisms are low level and protocols can be difficult to write, debug, and modify. Writing coherence protocols grams)

artificially

in an ad-hoc manner

limits

or able to develop

the

number

of

custom protocols.

(as C pro-

programmers

This paper

addresses the problem

with anew

that supports both writing

language, called Teapot,

and verifying

coherence proto-

cols. By reducing the complexity of developing new protocols, Teapot increases the attractiveness of systems, such as Stanford Flash [17], Wisconsin [24], that implement protocols

The main difficulty

in writing

that all protocol

response to a protocol incoming

in

protocol

before relinquishing code for other

blocks. Action code, however, frequently blocking

(code executed

event, such as a read miss or an

to execute

communication,

cache

needs to engage in

e.g., sending a request message to

another node and awaiting a reply. To circumvent this problem, protocol programmers introduce intermediate states in their finite state machines. A protocol awaits a reply message by shifting into an intermediate state and relinquishing the processor. The intermediate

state handles the message

on its arrival, and then goes to the destination

state (one that

reflects the overall effect of the protocol

event). The result-

ing proliferation

protocols,

non-atomic

of states complicates

transitions

force a programmer

because

to reason about

the interaction

between all messages and each state. In prac-

tice, protocols

written

in this style are difficult

modify. Detailed discussion of coherence their complications appears in Section 2.

to debug or

protocols

and

Teapot allows a programmer to suspend an action handler and await an asynchronous event by calling a waiting function with the current continuation (Section 3 and Section 4). Teapot also permits the programmer

to concisely

specify how to handIe other messages that arrive while waiting. The Teapot compiler transforms the handler with suspending calls into atomically without

causing deadlock,

Teapot compiler head incurred

executable pieces that can run

Analysis

and optimization

(Section 5) eliminates by writing

protocols

in the

most of the over-

at a higher

level of

abstraction. Section 6 presents our experience write several protocols

with using Teapot to

for the Blizzard

system [24]. With

optimization disabled, our compiler generates protocol code in C that runs application programs within 1370 of the speed of a hand-written performance

protocol.

is (except

With

optimization

for one program)

corresponding hand-written protocol, tive given the vastly superior protocol Even with

this system, as it has reduced the time to develop a complex protocol by an order of magnitude.

2 Why are coherence protocols

a coherence protocol is the actions

message) run to completion

the processor

better programming

enabled,

within

A shared-memory mechanisms.

the system to control

abstractions,

exploration

has emerged

as a viable

debugging

aid

can be built

writing location

access to memory

an invalid

location

or writing

must cause an access fault

ence protocol.

The

second

com-

allows

by permitting

read

a valid but read-only and invoke the coher-

mechanism,

communication,

information

and data

among processors. A protocol comes into play at an access fault. It must satisfy the memory access by bringing the data to the faulting processor’s

memory.

In many

protocols,

each block

shared data has a home node that coordinates

of

accesses to the

block. The accessing processor sends a request to the referenced block’s

home processor, which perfomns bookkeep-

ing and returns the data. Once a processor obtains the data, it caches a copy, which can be subsequently is invalidated.

Many protocols,

accessed until it

for example our Stache pro-

tocol [23], enforce coherence by permitting only a single writer (or multiple readers) to a block. When a home node receives a request for a writable copy of a block, it invalidates the outstanding copies before returning the block. The memory reference and the request and invalidation messages are protocol events, which cause transitions tocol state machine.

in a pro-

Both the home node and caching processors record state for a block. At a protocol event for block B, the protocol consults B’s state to determine

an action, which may send

messages and update both the state and the contents of the block in memory. For example, Figure 1 depicts states in a simple protocol for a block at a non-home node (Figure 2 shows the corresponding home node state machine.) Consider a block that is initially

Invalid.

When the processo~

Readable invalidatemsg (acknowledge)o $A yg:p *Q‘$%= &J +

protocols

for

on two basic

access control,

enables a system to transfer control

the

[8, 9, 22, 28]. In addition to executable C code, the back-end of the Teapot compiler can generate

hard?

and write accesses only for valid, cached data. Reading or

10% of the

which is very attracwriting environment.

system

The first mechanism,

can be difficult to write correctly. Field testing cannot ensure correctness since protocols contain complex timingdependent paths. Although completely verifying a nontrivmodel checking through state space ial protocol is difficult, plex

system (Section 7). Pro-

has been one of the greatest benefits of

Typhoon [23], and Blizzard in software. Protocols writ-

ten in Teapot could also be used for distributed object systems, such as Multipol [6], CRL [16], and Orca [3]. requirement

input for the Mur@ [9] verification tocol verification

Writable Figure

protocols

~

1: Idealized

protocol

state

machine

(non-home

‘7

ReadShared waits for a PutResponse message. Before that message arrives, the home node may also attempt to write the same block and send a WriteFault col programmer

message. The proto-

must consider these possibilities

vide the Excl-To-ReadShared One unattractive

and pro-

state with a suitable action.

approach is to receive these messages

and encode a block’s history, including pending actions, in its state. For example, a block in state Excl-To-ReadShared receives a WriteFault

message and moves to another state

Excl-To-ReadShared-PendingShared

2: Idealized

~Figure

~Transitions / action.

protocol state machine (home side). are labeled with cause and, in parenthesis,

reads the block, the protocol

obtains a read-only

an auxiliary

copy from

the copy with an Invalidate

mes-

sage, which causes a transition to state Invalid. Simultaneously, the block’s access permissions are changed so subsequent reads or writes cause an access fault. In general, actions execute on transitions between states, which are caused by protocol events. Actions may send messages to other processors, protocol-specific ions. the

await their replies,

information, algorithm

and

memory

processes

later processing. An

negatively

acknowledge

messages, or queue unexpected

auxiliary

the

(nack)

messages for

All three approaches have disadvantages,

data

structure

because a programmer

complicates

must remain

programming,

aware that a block’s

state is split between two representations. Nacks can lead to deadlock, so they must be employed carefully. And, queuing requires additional memory. Teapot offers all three options, but advocates queuing unexpected messages. Message reordering complexity,

in a network

further

adds to the

because messages may arrive in an unexpected

order. For example,

consistency

a ReadRequest

from

a processor

that

already has a readable copy cannot be ignored or treated as

model. To illustrate

the complications

in writing

a protocol,

an error. The processor may have returned

con-

PutNoData

sider the home-side state machine of the Stache protocol (Figure 2). Conceptually, the protocol requires only three

message and subsequently

states: Idle, ReadShared, cessor currently

and Exclusive

(a non-home

has the single valid copy.) Although

-—..—=

.

to ReadShared,

{

previous

the

message and

;State

3

a simplification

.

—.--———...”,.. “’?

(Node

R)

Put Request)

; \

(R) ;

: = ReadShared;

State

. . . . .. ... . . . . ...

...”..”.

..- ..,. ... . ”.. .,—

——..

Exclu-sive

Message

Send State

1

ReadRequest (Node PutRequest)

R)

:.

I ;

(Owner,

R;

: = Excl-To-ReadShared;

~

exit;

(b) State

the PutRe-

I

Excl-To-ReadShared Message

PutResponse

(Node

SendSharedCopy

sponse subsequently arrives, the transition completes by changing to state ReadShared (Figure 3b). Other states also require intermediate states for their transitions. Figure 4 shows the new, more complex state machine—which

.

PutResponse;

sharer

ton can wait for a protocol event to cause a transition. Hence, after sending the invalidate message, the ReadRequest handler changes to an intermediate state When

.

exit;

I

completion and terminate. Handlers cannot wait on an asynchronous event, such as a message arrival. Only the automa-

and terminates.

.

SendSharedCopy

~ I 2 .. ,”.,..-

it. Conceptually,

sends an invalidation

.

ReadRequest (Owner,

AWAIT

(a) ~

awaits the block’s new value (Figure 3a). However, to avoid deadlock in a real system, protocol handlers must run to

(Excl-To-ReadShared)

Send

I

can complete only when the

owner relinquishes

action for this transition

Message

which responds to a read request

by a processor. This transition

.

(3’GE’& Exciiis’ive

protransi-

tions appear as atomic, state changes in response to protocol events cannot be atomic. Consider the transition from Exclusive

its copy with a

requested a readable

copy with a ReadRequest. If messages can pass each other,

,~-.

“—.

block’s

When the Read-

the protocol

update

and actions depend on

the

data structure,

unexpected

and change access permiss-

The exact states, transitions, coherence

WriteFault.

completes,

pending WriteFault. Because this approach greatly expands the state space, protocols either encode this information in

the home node and changes its state to Readable. Later, the home node may invalidate

transition

~,

is still

.. . . .

State

:.

exit; .

. ...”.-

(sharer)

R) ;

I

ReadShared; ..

. . .. ... .

. .

Figure 3: The action in box(a) executes the exclusive to synchronous (blocking) using shared transition communication, which could deadlock on a real system. The action in box (b) shows an approach to avoiding it introduces an communication: synchronous intermediate state, which waits for a reply message.

of the actual protocol.

Intermediate states complicate programming because they make transitions non-atomic. While in an intermediate state, the protocol may receive messages other than the expected reply message. For example, the state Excl-To-

239

tation. The second computation tured computation,

can then resume the cap-

often passing along a value.

Consider the example from the previous section. Figure 5 shows how to write the protocol transition with continuations. The Teapot Suspend ReadRequest handler is similar to a cal current continuation the function Resume

statement in the lCC. It passes the

(L, the first argument to Suspend)

that

is its

statement

second

is similar

argument.

to a throw,

does not return a value.3 The Suspend its environment

(the program position,

The

to

Teapot

except that it

statement captures as well as local vari-

ables) at the call point in its first argument, switches to the subroutine state specified by the second argument, and passes the continuation. At this point, the handler yields control (to a dispatch loop), until an action in response to a message to the subroutine state causes the suspended han-

machine (home side) with intermediate states necessary to avoid synchronous communication. Figure

4: State

~eseemingly

gratuitous

ReadRequest must be retained an{

processed after the PutNoData

message. Teapot, by default,

queues such messages for processing at a Iater time, Another important difficult oping

to modify, a

custom

consideration

Consider

adding

a

Com-

pare&Swap primitive to the basic protocol, This primitive is a minor variation of a WriteRequest that also executes the compare and swap operation

at a block’s

states are parameterized

(e.g., the Cent

L parameter

which is part of the environment

is that a protocol can be

although this is a desirable way of develprotocol.

dler to resume. Subroutine continuation

home node once

A subroutine

of all handlers in the state.

state differs from a normal state in two ways:

it is entered synchronously

by a Suspend

Using these mechanisms,

a programmer

cussed in Section 2, many protocol ply a matter of awaiting action.

This section describes our approach to writing handlers. In general, any mechanism that permits multiple execution

an explicit

of synchronous

An attractive benefit of continuations tions bv. .movidin~

and need

intermediate

Using threads, a protocol event handler can be launched in a new thread, which yields control at a synchronous communication point. When the subsequent response message

,@%K”e”ExcY”il-

Suggest Documents