NASA Contractor Report 194904
ICASE Report No. 94-26

RUNTIME SUPPORT FOR DATA PARALLEL TASKS

Matthew Haines
Bryan Hess
Piyush Mehrotra
John Van Rosendale
Hans Zima

(NASA-CR-194904) RUNTIME SUPPORT FOR DATA PARALLEL TASKS Final Report (ICASE) 23 p

N94-35240
Unclas
G3/61 0012636

Contract NAS1-19480
April 1994

Institute for Computer Applications in Science and Engineering
NASA Langley Research Center
Hampton, VA 23681-0001

Operated by Universities Space Research Association

Runtime Support for Data Parallel Tasks*

Matthew Haines†, Bryan Hess†, Piyush Mehrotra†, John Van Rosendale†, Hans Zima‡

†Institute for Computer Applications in Science and Engineering
NASA Langley Research Center, Mail Stop 132C
Hampton, VA 23681-0001
[haines,bhess,pm,jvr]@icase.edu

‡Institute for Software Technology and Parallel Systems
University of Vienna
Brünner Strasse 72, A-1210 Vienna, Austria
zima@par.univie.ac.at

Abstract

We have recently introduced a set of Fortran language extensions that allow for integrated support of task and data parallelism, and provide for shared data abstractions (SDAs) as a method for communication and synchronization among these tasks. In this paper we discuss the runtime support necessary for these extensions, give an abstract design of the underlying runtime system, and discuss the requirements and design issues of such a system. To test the feasibility of this approach, we implement a prototype of the runtime system and use it to support a multidisciplinary optimization (MDO) problem for aircraft design. We discuss initial results of this design and implementation, and our future plans.

*This research was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-19480, while the authors were in residence at ICASE, NASA Langley Research Center, Hampton, VA 23681.

1 Introduction

Most of the recent research on the specification and exploitation of parallelism in scientific codes has concentrated on data parallelism, as exemplified by languages such as High Performance Fortran (HPF) [17] and Vienna Fortran [5]. However, there are a large number of scientific and engineering applications which exhibit multiple levels of parallelism. Multidisciplinary applications, such as weather modeling and aircraft design, have a larger and more complex structure: at the outer level, such applications are composed of a set of interacting discipline codes, while at the inner level each discipline code exploits the data parallelism of the individual discipline. A good example is the multidisciplinary design optimization (MDO) problem for aircraft, in which discipline codes from areas such as aerodynamics, propulsion, and structural analysis execute concurrently and asynchronously, interact with each other, and need to share data. In general, each discipline code is designed independently of the others, and the codes must be coordinated in a plug-compatible manner. Data parallel languages such as HPF concentrate only on the data parallelism within the individual discipline codes, and do not provide adequate support for the task parallelism needed to coordinate such codes.

We have recently introduced a set of extensions to HPF [6] which integrate task and data parallelism by providing both asynchronous tasks and a new mechanism, called Shared Data Abstractions (SDAs), for coordinating these tasks and allowing them to share data. SDAs generalize Fortran 90 modules by including features from both object-oriented databases and monitors in shared memory languages. The resulting language extensions are thus capable of supporting both task and data parallelism, and provide support, in the form of shared data objects, for tasks to interact with each other.

In this paper, we address the design of a runtime system to support these language extensions. In particular, we focus on two specific areas of the design: SDA method invocation and distributed SDA data management. The design for SDA method invocation is based on lightweight, user-level threads that execute SDA method requests as remote service requests [16]. This allows the underlying system to support both inter-processor and intra-processor communication between the tasks and the SDAs. Since data parallel tasks execute independently of each other, the system also supports communication between two sets of SPMD threads executing in a loosely synchronous manner, as is needed, for example, when a data parallel task invokes an SDA method. Distributed data management includes the distribution of SDA data structures, as well as the protocols for interfacing with the distributed data structures internal to the tasks, such as those produced by an HPF compiler. Along with asynchronous communication among tasks, the system design therefore includes support for message passing, remote service requests, and the protocols and semantics for accessing SDA data from distributed tasks.

Other projects that focus on the integration of task and data parallelism include Fortran M [11, 12], the language described in [27], and Fx [28, 29]. The first two provide mechanisms for establishing message plumbing between tasks, based directly on ports and an extension of C file structures, respectively. In Fx, tasks communicate only through arguments at the time of creation and termination. The runtime support systems for these efforts face some of the same issues discussed in this paper. However, unlike the SDA runtime system described here, these systems are not based on lightweight threads.

The remainder of the paper is organized as follows: Section 2 summarizes the Fortran language extensions for supporting both task parallelism and shared data abstractions; Section 3 outlines the runtime support necessary for supporting these extensions, with particular respect to data distribution and method invocation issues; and Section 4 introduces a prototype of the SDA runtime system, developed to test the feasibility of the SDA method invocation design.

2 An Introduction to the Language Extensions

In this section, we provide a short overview of the Fortran language extensions that we have recently introduced to support the integration of task and data parallelism. Full details of these extensions can be found in [6].

In our system, a parallel program is composed of a set of asynchronous, autonomous tasks that execute independently of one another. These tasks embody the task parallelism of the application; each task executes on its own set of resources, and may itself be data parallel or may spawn further, nested tasks. Tasks interact with one another only through SDA objects. An SDA object acts as a data repository, accessible to all tasks that have been given access to it, and a task can access the data within an SDA object only by invoking one of the methods associated with the SDA. The SDA semantics enforce exclusive access by ensuring that, with respect to the data of a particular SDA object, only one method call is active at any given time. This combination of task parallelism and SDAs forms a powerful tool for hierarchically structuring a complex body of parallel code.

The extensions described here presume that HPF [17] is to be used to specify and build the individual data parallel units; that is, the task and SDA constructs form a set of extensions on top of HPF. As an example of their use, we introduce a simplified version of a multidisciplinary optimization (MDO) application for aircraft design, which highlights the need for integrating task parallel units and for supporting their interaction through shared data repositories.

[Figure 1: A sample MDO application: the simplified design optimization of an aircraft configuration.]

2.1 A Sample MDO Application

We now briefly describe a simplified version of an MDO application for aircraft design, with a focus on the interaction of the aerodynamic and structural disciplines. Though a realistic application would require a number of other discipline codes, such as controls and propulsion, and would design the configuration of a full aircraft, we present a simplified version for the sake of brevity. The structure of the application is shown in Figure 1, where the tasks are represented by rectangles and the SDAs by ovals.

The Optimizer is the main task and controls the execution of the entire application. It creates three SDAs (StatusRecord, Sensitivities, and SurfaceGeom) and then initiates the simultaneous execution of the two discipline tasks, FeSolver and FlowSolver, which represent the structural and flow codes, respectively. The geometry of the aircraft is represented by design variables, such as the sweep angle of the wing, and an initial geometry is made available to the tasks through the SurfaceGeom SDA.

The FeSolver task uses some initial forces stored in the SurfaceGeom SDA, along with a finite element model of the aircraft, to determine the structural deflections. These deflections are placed in the SurfaceGeom SDA as a deflected geometry. The FlowSolver, meanwhile, generates an aerodynamics grid based on the initial geometry and performs an analysis of the airflow around the aircraft, producing a new flow solution. In subsequent cycles, the FeSolver uses the forces based on the current flow solution to produce new deflections, while the FlowSolver uses the deformed geometry and the previous flow solution to produce new solutions. This inner iteration process is repeated until the difference between the new and the old deflections is within some specified tolerance.

At this point, both FeSolver and FlowSolver also produce sensitivity derivatives, which represent the behavior of the output variables, such as lift and drag, with respect to the design variables, such as the sweep angle of the wing, and they place these derivatives in the Sensitivities SDA. When the inner iteration is completed, the Optimizer obtains the sensitivity derivatives and, based on some objective function that it is minimizing, decides whether to terminate the program or to produce a modified base geometry. In the latter case, it places the new geometry in the SurfaceGeom SDA and the inner cycle is repeated.
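The coordination pattern in this example, two solver tasks exchanging geometry through a shared repository while an optimizer collects the sensitivities, can be sketched outside of the proposed Fortran extensions. The following Python sketch is ours, not the paper's code: `Repository` is a toy stand-in for an SDA, and the "solves" are placeholder arithmetic.

```python
import threading

class Repository:
    """Toy stand-in for an SDA: a mutually exclusive data store whose
    readers block until the value they want has been deposited."""
    def __init__(self):
        self._cv = threading.Condition()
        self._data = {}

    def put(self, key, value):
        with self._cv:
            self._data[key] = value
            self._cv.notify_all()

    def get(self, key):
        # Blocks until `key` is present, like a guarded SDA method.
        with self._cv:
            self._cv.wait_for(lambda: key in self._data)
            return self._data.pop(key)

def fe_solver(geom, sens, cycles):
    for i in range(cycles):
        forces = geom.get(("forces", i))
        geom.put(("deflected", i), forces * 0.5)      # fake structural solve
    sens.put("fe", "fe-derivatives")

def flow_solver(geom, sens, cycles):
    for i in range(cycles):
        deflected = geom.get(("deflected", i))
        geom.put(("forces", i + 1), deflected + 1.0)  # fake flow solve
    sens.put("flow", "flow-derivatives")

def optimizer(cycles=3):
    geom, sens = Repository(), Repository()
    geom.put(("forces", 0), 8.0)                      # initial forces
    tasks = [threading.Thread(target=fe_solver, args=(geom, sens, cycles)),
             threading.Thread(target=flow_solver, args=(geom, sens, cycles))]
    for t in tasks:
        t.start()
    result = (sens.get("fe"), sens.get("flow"))       # gather sensitivities
    for t in tasks:
        t.join()
    return result
```

Calling `optimizer()` runs three inner cycles and returns both sets of (fake) sensitivity derivatives; the blocking `get` calls give the same loose inner-iteration synchronization described above.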

2.2 Task Management

We now describe the extensions required for the management of task parallelism. Tasks are units of coarse-grain parallelism, each executing in its own address space. Tasks are embodied by task code programs, which are syntactically similar to a Fortran subroutine (except for the keyword TASK CODE in place of SUBROUTINE). A task is created by the explicit activation of a task code program using the spawn command. An example of such a spawn is depicted in Figure 2, where the spawning task creates the FlowSolver task, passing the SDAs SurfaceGeom and Sensitivities as arguments.

    SPAWN FlowSolver(SurfaceGeom, Sensitivities, ...) ON ...

    Figure 2: Code fragment for the spawning of the FlowSolver task.

The spawn command is non-blocking: the spawning task continues its own execution after the spawn. The spawned task executes until either it reaches the end of its code or it is explicitly killed. Arguments, which may include SDA objects, are passed to the spawned task at the time of the spawn, and SDA arguments become accessible to both the spawning and the spawned task.

The spawn command has two optional parts: a resource-request specification and an on clause. The resource request specifies the machine resources required for the execution of the task, either as an explicit machine specification or as a machine class specification. If a class is specified, the system is free to select an appropriate set of machines from that class; for example, the user could request a set of machines such as Sun workstations. The on clause specifies the set of processors on which the spawned task should execute, using a processor specification similar to the HPF PROCESSORS directive. Thus, the user can either choose the resources for a task explicitly or leave the job of allocating appropriate resources to the system. In either case, the spawned task's data can then be distributed across the specified processors using HPF distribution directives, and HPF compilers can be used to produce the code for the data parallel parts of the task.

2.3 Shared Data Abstractions

An SDA type, similar to a Fortran 90 module [2], consists of a set of data structures and the methods (procedures) that manipulate this data. Both the data and the methods of an SDA can be public or private: public data and methods can be accessed by other tasks which have access to an instance of the SDA type, while private data and methods cannot be accessed from the outside. In this respect, SDAs are similar to C++ class objects.

SDA TYPE SGeomType(StatusRecord)
  SDA (StatRecType) StatusRecord
  TYPE (surface) base
  TYPE (surface) deflected
  LOGICAL DeflectFull = .FALSE.
  PRIVATE base, deflected, DeflectFull
CONTAINS
  SUBROUTINE PutBase(b)
    TYPE (surface) b
    base = b
    deflected = b
    DeflectFull = .TRUE.
  END
  SUBROUTINE GetDeflected(d) WHEN DeflectFull
    TYPE (surface) d
    DeflectFull = .FALSE.
    d = deflected
  END
END

Figure 3: Code fragment for the declaration of the SurfaceGeom SDA type.

Figure 3 presents a code fragment which depicts a portion of the declaration of the SDA type for the SurfaceGeom SDA. The declaration consists of two parts: the first part (before the keyword CONTAINS) consists of the internal data structure declarations, and the second part (after the keyword CONTAINS) consists of the method declarations, specified as a set of procedures, which constitute the only external interface to the data of the SDA.

As stated before, access to SDA data is exclusive, thus ensuring that there are no data conflicts due to asynchronous method calls. That is, only one method of an SDA can be active at any time; other method requests are delayed until the currently executing method call completes. In this respect, an SDA is similar to a monitor [8]. Each method of an SDA can also have an optional condition clause, similar to Dijkstra's guarded commands [7], specified with the keyword WHEN. The condition clause "guards" the execution of the associated method, and a call to a guarded method is blocked until the condition clause is satisfied.
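The exclusive, guarded method semantics just described correspond closely to a monitor with condition synchronization. As a rough illustration (a Python sketch of ours, not the proposed Fortran extensions; the class and method names are invented), the WHEN DeflectFull guard of GetDeflected can be modeled with a condition variable:

```python
import threading

class SurfaceGeomSketch:
    """Toy model of the SurfaceGeom SDA: one lock gives exclusive
    method execution; wait_for() plays the role of a WHEN clause."""
    def __init__(self):
        self._cv = threading.Condition()   # monitor lock plus wait queue
        self._deflected = None
        self._deflect_full = False

    def put_base(self, b):
        with self._cv:                     # exclusive access
            self._deflected = b
            self._deflect_full = True
            self._cv.notify_all()          # blocked guards may now hold

    def get_deflected(self):
        with self._cv:
            # WHEN DeflectFull: block until the guard holds
            self._cv.wait_for(lambda: self._deflect_full)
            self._deflect_full = False
            return self._deflected

geom = SurfaceGeomSketch()
out = []
reader = threading.Thread(target=lambda: out.append(geom.get_deflected()))
reader.start()                 # blocks: DeflectFull is still false
geom.put_base("wing-v1")       # satisfies the guard
reader.join()
```

Releasing the lock while waiting, then re-checking the guard under the lock, is exactly what lets the blocked caller resume only after another method call has changed the internal data.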

SDA (StatRecType) StatusRecord
SDA (SGeomType) SurfaceGeom
SDA (SensType) Sensitivities

CALL StatusRecord%INIT
CALL SurfaceGeom%INIT(StatusRecord)
CALL Sensitivities%INIT

Figure 4: Declaration and initialization of SDA variables.

A condition clause consists of a logical expression, comprised of the internal data structures and the arguments to the procedure. A method call is executed only if the associated condition clause is true at the moment of evaluation. If the condition clause evaluates to false, the method call is enqueued until the clause evaluates to true, as a result of the internal data being modified by another method call. A method declared without a condition clause is given a default condition clause that always evaluates to true. For example, in the code of Figure 3, the method GetDeflected has a condition clause based on the variable DeflectFull, which is set to true only by a call to PutBase. Thus, if a task calls GetDeflected before a call to PutBase has been made, the former task is blocked until the corresponding PutBase call assigns the variable deflected an appropriate value and sets DeflectFull to true. Condition clauses therefore provide a way to synchronize task interactions based on data dependencies.

Similar to HPF procedure declarations, each SDA type may have an optional processors directive, along with distribution directives, which allows the internal data structures of the SDA to be distributed across these processors. The distribution rules applicable in our extensions are the same as those of HPF.

An SDA variable is declared using the SDA keyword, with a syntax similar to that used for Fortran 90 derived types, as shown in Figure 4. As with other objects, an SDA variable has to be initialized before it can be used. This is done using the predefined method INIT, which allocates the internal data structures of the SDA from the global heap and initializes them (as shown in Figure 4). The INIT method may take other SDA variables as arguments, for example when one SDA, such as SurfaceGeom, requires access to another, such as StatusRecord. An optional resource request, similar to that of the spawn command, can be used to specify the resources on which the SDA executes; otherwise the selection is left to the system.

SDAs also allow their data to be declared persistent. The predefined methods SAVE and LOAD allow the internal state of an SDA to be saved to, and later loaded from, secondary storage. This is useful for saving the state of a computation for later use and for sharing data among applications, which is an important consideration for most large MDO applications.

Having provided an overview of the task and SDA extensions, we now examine the runtime support necessary for executing such programs. A detailed description of the task and SDA extensions, including sample code for a similar MDO application, can be found in [6].

3 SDA Runtime Support

If we take an abstract view of the problem, we see that there are two basic types of "tasks" that must be supported by the runtime system: computation tasks, which are responsible for executing the actual computations of the application, and SDA tasks, which are responsible for performing the SDA operations. Execution is based on mapping each of these tasks to a set of physical processors. Each computation task will be assigned at least one processor, and each SDA will be assigned at least one processor responsible for storing its data and executing its methods. Since multiple tasks and multiple SDAs may execute simultaneously, any given processor may be responsible for the computations of several tasks and for the methods of several SDAs at the same time. The resulting view is that multiple, independent computations may be mapped to each processor, and each processor must execute its assigned computations in some interleaved (or overlapping) fashion.

One approach is to use processes, such as Unix processes, to represent each of the independent computations assigned to a processor. However, this process-based approach has several drawbacks, including:

• the inability to control scheduling decisions, since Unix processes are scheduled by the operating system with little input from the user;

• the inability to share a single address space, since each Unix process executes within its own addressing space;

• the costly context switching between computations, due to the large amount of context associated with a (heavyweight) Unix process, and the need to involve the operating system in delivering data to the appropriate task; and

• the inability to execute on systems that do not fully support Unix processes, such as the nCUBE/2 [24], or on systems running special microkernels, such as SUNMOS for the Paragon.

In light of these disadvantages, our approach is instead based on lightweight, user-level threads, with one thread representing each of the independent computations assigned to a processor. A lightweight, user-level thread is a unit of computation with minimal context that executes within the domain of a kernel-level entity, such as a Unix process or Mach thread.

Lightweight threads are becoming increasingly useful in supporting both parallel and sequential systems. They are used in simulation systems and to support coroutines, Ada tasks, and C++ method invocations in language implementations [13, 15, 32]; to provide a level of concurrency within a kernel-level process that can be scheduled on parallel machines [14, 26]; and to provide support for fine-grain parallelism, parallel events, and generic multithreading capabilities in parallel language runtime systems [20, 22, 25].

    +--------------------------------------------------+
    |               SDA Runtime Interface              |
    |  Distributed Data        |  Method Invocation    |
    |  Structure Support       |  Support              |
    +--------------------------------------------------+
    |          Chant: Communicating Threads            |
    +-------------------------+------------------------+
    | Lightweight Thread      | Communication Library  |
    | Library (e.g. pthreads, | (e.g. MPI, p4, PVM,    |
    | cthreads, ...)          |  ...)                  |
    +-------------------------+------------------------+

    Figure 5: Runtime support system layers.

Additionally,

the POSIX committee has adopted a thread standardization effort [1, 3, 9, 19, 23, 30], and many lightweight thread libraries now offer very fast thread management and synchronization primitives within a shared memory process.

The main drawbacks of using threads have been a lack of standardization, affecting portability, which has been significantly addressed by the standard adopted by the POSIX committee; and a lack of support for threads in distributed memory environments, where threads belonging to heavyweight processes in different addressing spaces may need to communicate. In our system, tasks and SDAs will require exactly this ability: a computation task may need to invoke a method of an SDA located in another addressing space, and two sets of SPMD threads may need to exchange data. Supporting the SDA extensions will therefore require point-to-point communication among threads, regardless of where the threads are located.

Our solution is to implement the SDA runtime support on top of a runtime interface, called Chant [16], that we are currently developing for workstations and distributed memory multiprocessors. Chant provides point-to-point communication among lightweight threads, both within the same address space and in distinct addressing spaces. Chant supports the POSIX thread primitives, and adds point-to-point message passing primitives (such as send/receive, with the syntax specified by the MPI standard interface [10]) as well as remote service request primitives (as in Active Messages [18]), which can be used to fetch a piece of remote data or to request a remote computation. A description of Chant, including its current status, can be found in [16].

The SDA runtime interface, as depicted in Figure 5, is composed of two portions atop Chant: one for distributed data structure management and one for SDA method invocation. In the next two sections we examine these two portions in more detail.
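Chant's actual interface is described in [16]; as a rough illustration of what a remote service request does (one thread asks another thread, which owns some data, to run a handler on its behalf and ship back the result), here is a queue-based Python sketch of ours, with invented names:

```python
import queue
import threading

class ServiceThread(threading.Thread):
    """Toy 'remote' thread: it serves requests to fetch data it owns or
    to run a computation, in the spirit of a remote service request."""
    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.data = {"flow": 3.14}        # data owned by this thread

    def run(self):
        while True:
            handler, args, reply = self.inbox.get()
            if handler is None:           # shutdown request
                break
            reply.put(handler(self, *args))

    def rpc(self, handler, *args):
        # Issue a request and block for the reply (a remote fetch).
        reply = queue.Queue()
        self.inbox.put((handler, args, reply))
        return reply.get()

server = ServiceThread()
server.start()
value = server.rpc(lambda self, key: self.data[key], "flow")  # remote fetch
server.inbox.put((None, (), None))                            # stop server
server.join()
```

The requesting thread never touches the server's data directly; it only sends a request and waits for the reply, which is the essential shape of method invocation on an SDA that lives in another addressing space.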

3.1 Distributed Data Structures

In addition to the two types of threads (computation and SDA), there are two types of data structures mapped to each processor: computation data, which is any data structure defined within a computation task, and SDA data, which is any data structure defined within an

PROGRAM main
!HPF$ PROCESSORS P(M)
SDA (SType) S
INTEGER A(1000)
!HPF$ DISTRIBUTE A(BLOCK)
  ...
CALL S%put(A)
  ...
END main

SDA TYPE SType
!HPF$ PROCESSORS P(N)
CONTAINS
  SUBROUTINE put(B)
    INTEGER B(:)
    ...
  END put
END

Figure 6: SDA code excerpt with distributed data.

SDA. Since each of these may be distributed across a set of processors with distinct memories, we must have a mechanism for transferring data between the two. To illustrate the issues that arise when values from a distributed computation data structure are used to update a distributed SDA data structure, consider the code excerpt in Figure 6, which declares an HPF main program, main, executing on M processors, and an SDA, S, of type SType, distributed over N processors. The program main contains an array, A, whose elements are distributed by BLOCK over the M processors, and the SDA method put updates an array, B, distributed over the N processors of S. At some point, main calls the put method of S to update the values of B with the values of A.

If M and N are both greater than 1, then both A and B may be independently distributed. We will assume that main and S are each represented by a set of threads, one for each processor in their respective processor sets, and that each set contains a designated "master" thread responsible for coordination within the group. To execute the put method, transferring the values of A to B, we have three options:

1. The master thread of main collects the elements of A into a local scratch array and sends it to the master thread of S, which collects the values into a scratch array of its own and then distributes them among S's remaining threads. This provides the simplest solution in terms of scheduling data transfers, since only one transfer occurs, from the master thread of main to the master thread of S. However, two scratch arrays and two gather/scatter operations are required, consuming both time and space.

2. The master thread of main negotiates with the master thread of S to determine how the array is to be distributed among the threads of S, considering B's distribution over the N processors. Following the negotiation, main's master thread collects the elements of A into a local scratch array and distributes the scratch array directly to S's threads, according to B's distribution. This approach eliminates one scratch array and the scatter operation, but introduces a negotiation phase that is required to discern B's distribution.

3. The master thread of main again negotiates with the master thread of S, but then informs the other threads of main of B's distribution, and each thread of main sends the portion of the array it owns directly to the appropriate threads of S. As in the previous scenario, a negotiation is required, but this approach eliminates both scratch arrays and both gather/scatter operations, at the expense of requiring all computation threads, rather than just the master thread, to understand HPF distributions.

[Figure 7: Sample distributions of an HPF array and an SDA array: an array of 8 elements distributed by BLOCK among the computation threads, and an array of 8 elements distributed by CYCLIC among the SDA threads.]

To understand the ways in which corresponding HPF and SDA array elements can be located on different processors, consider the situation in Figure 7, where an HPF array of 8 elements is distributed by BLOCK and an SDA array of 8 elements is distributed by CYCLIC, so that, for example, element 6 of the HPF array is located on a different processor than element 6 of the SDA array. When a put operation is executed, each computation thread must know which SDA threads own the elements corresponding to the portion of the array it owns; after the negotiations, each thread learns which elements to send where, and can send its array elements directly.

We can also restrict the general case to the special cases of having M or N (or both) equal to 1. When M is 1, the computation array is stored on a single processor, so a single computation thread will distribute the array among the SDA processors. When N is 1, the SDA array is stored on a single processor, so each computation thread can send a single message to the single SDA thread. When both M and N are 1, at most one message is needed to update the SDA data structure.

We can summarize the support for distributed data structures as follows:

• Each computation task will be represented by a group of computation threads, one for each processor on which the task executes, and the task's data may be distributed over this set of processors according to HPF distribution directives.

• Each SDA will be represented by a group of SDA threads, and its data may also be distributed over some set of processors according to HPF distribution directives. One thread in each group is a designated "master" thread, which can act as a negotiator for the group and will identify the distribution of the data belonging to the group.

• Any given processor may host several computation and SDA threads, which the processor will execute in an interleaved fashion according to some scheduling policy. Priorities can be used to influence the scheduling decisions; for example, an SDA thread may be given a higher priority than the computation threads on the same processor, allowing it to handle method requests at the next opportunity.

In addition to the issues arising from data distribution, some modules may wish to view the same data structure in a different format than the one in which it is stored, for example with a different dimensionality or distribution. The predefined methods of an SDA will accommodate such changes with a set of filter methods, which can be used to change the view of a data structure from one format or distribution to another; the data structure management outlined above will accommodate all of these filters.

3.2 SDA Method Invocation

Referring once again to Figure 5, we now discuss the runtime issues involved with SDA method invocation. We can view an SDA as being comprised of two components: a control structure, which enforces the SDA semantics, and a set of distributed data structures. In the previous section we addressed the issues regarding distributed SDA data; we now address the issues regarding the control structure. The semantics of SDAs place two restrictions on method invocation:
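The negotiation in options 2 and 3 amounts to evaluating the two distribution functions: each computation thread must learn, for every element it owns, which SDA thread owns the corresponding element. A small Python sketch of ours (1-indexed elements; the sizes follow the Figure 7 example of 8 elements over 4 threads on each side, and the function names are invented):

```python
def block_owner(i, n, p):
    """Owner of 1-indexed element i under a BLOCK distribution of
    n elements over p processors (block size = ceil(n / p))."""
    block = -(-n // p)                 # ceiling division
    return (i - 1) // block

def cyclic_owner(i, p):
    """Owner of 1-indexed element i under a CYCLIC distribution."""
    return (i - 1) % p

def transfer_plan(n, p_hpf, p_sda):
    """For each computation (HPF) thread, the list of
    (element, destination SDA thread) pairs it must send:
    the per-thread outcome of the negotiation in option 3."""
    plan = {t: [] for t in range(p_hpf)}
    for i in range(1, n + 1):
        plan[block_owner(i, n, p_hpf)].append((i, cyclic_owner(i, p_sda)))
    return plan

# 8 elements, BLOCK over 4 computation threads, CYCLIC over 4 SDA
# threads: element 6 lives on computation thread 2 but belongs to
# SDA thread 1, so corresponding elements sit on different processors.
plan = transfer_plan(8, 4, 4)
```

Once each thread holds its entry of `plan`, option 3's direct sends need no scratch arrays at all, which is precisely the trade-off described above.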

with regarding

only

one

SDA

method

issues SDA

is an expression

a control

restrictions, the

with

invocation:

and

2. execution of each method is guarded by a condition, clause, which must evaluate to true before the method code can be executed.

structures.

and

that

same

arrives

be required

The

method

for a given

executes

dis-

presence

data

scheduling

computation

in data

Method

invocation.

I. each

on

a message

other

thread

threads

In addition some

respective

tile

set of distributed

will

requests.

SDA

Referring

SDA

of the

may

where

dis-

task

opportunity.

which

or distribution.

another,

threads, to the

to their

of processors,

as determining

when

of complexity

independently

set

according

by some

Priorities

the

scheduling

some

threads.

example,

structures

of processors,

a "master" such

according

of the

data

data.

and

fashion,

"readiness"

set

is evidenced

on issues

computation

Any

over

will identify

group

an interleaved

same

for that

group

thread

the

processor

responsible

• Each

tile

set

and arising

control

that

structure,

which

a set of SDA due

structure

data

to distributed and

method

At this point, our design only supports a centralized SDA control structure, represented by a single master thread on a specified processor. All remaining SDA processors will host worker

threads, which take care of the manipulation of the SDA data located on their respective processors. Allowing each SDA thread to act as a master for the SDA data on its processor is a point of interest for future research.

Having established the SDA control structure, we can now describe how the semantics of SDA method invocation are enforced. Exclusive access to the SDA data structures is guaranteed by the master thread, which provides a monitor-like mechanism, such as [21], for executing methods. Each SDA method has a condition expression, specified by the programmer, which is compiled into a boolean condition function. When a request for a method invocation arrives, the master first evaluates the associated condition function to see if the method can be executed. If the condition function evaluates to true, the method is executed; if not, the request is enqueued. Whenever a method is executed, the state of the SDA may change, so the condition functions associated with any enqueued requests are reevaluated, and any enqueued method whose condition function now evaluates to true is executed. Fairness is ensured by examining the enqueued requests in the order in which they arrived, and starvation is prevented by the fact that no new method request is processed before the conditions of the enqueued requests have been reexamined.

4 A Prototype Implementation

To test the feasibility of our design, we have implemented a prototype SDA runtime system using C++ and the p4 [4] portable message-passing primitives, which is currently running on a cluster of workstations. The prototype supports SDA method invocation with a centralized control structure, in which all SDA data structures are allocated to a single processor. Using this prototype, we have successfully executed two SDA test programs: a simple stack and a skeleton of the MDO application for aircraft design introduced in Section 2.1. We now describe the details of the prototype implementation, using the stack

SDA as a running example. The stack SDA provides several public methods, such as push, pop, and top. Each method is guarded by a condition clause to ensure that the stack does not overflow or underflow. For example, push may only be invoked so long as the stack is not at its maximum size, while the pop and top methods may be executed only when the stack is non-empty. If a task thread sends a request to pop an empty stack, the method request will be enqueued until the condition clause can be satisfied.

    TYPE SDA-stack-pop-stub ()
    {
        send-message-to-SDA (POP)
        receive-message-from-SDA (result)
        return (result)
    }

    bool SDA-stack-pop-condition ()
    {
        return (stack-height > 0)
    }

    generic SDA-stack-pop-method ()
    {
        result = pop-from-private-stack ()
        return (result)
    }

Figure 8: Three functions needed to implement the SDA stack pop

When, perhaps at a later time, another task thread sends a push request to the same stack, and assuming the stack is not full, its condition clause will evaluate to true, the push method will be executed, and an acknowledgment returned to the caller. The condition for the blocked pop will now evaluate to true, since the stack is now non-empty, and the pop method will be executed and the resulting value returned to its caller in an acknowledgment message.

Each SDA method embodies three functions:

1. the method function, which is the method code itself as specified by the programmer,

2. the condition function, which is a boolean function that evaluates the guarded condition and is used to determine when the method can be invoked, and

3. the stub function, which provides the public interface for the task threads to access the SDA method.

Figure 8 depicts the pseudocode for the three functions for the SDA stack method pop. When a task thread wishes to invoke an SDA method, it is actually invoking the stub routine, which makes a remote method call to the SDA master: it marshals the arguments into a message buffer, sends the message to the SDA processor, and awaits a reply. Each message sent from a task thread to the SDA consists of (1) the name of the method to be invoked, (2) the caller's address (used for the reply message), and (3) the method's arguments.
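The pairing of a condition function with a method function, as in Figure 8, can be rendered as runnable C++. The class below is our own single-process sketch (no p4 messaging, and the enqueueing done by the master is omitted); it shows only how a guard protects each method body:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-process sketch of the stack SDA: each public operation pairs a
// condition function (the guard) with a method function (the body).
class GuardedStack {
public:
    explicit GuardedStack(std::size_t max) : max_size_(max) {}

    // Guards, analogous to SDA-stack-pop-condition / push-condition.
    bool pop_condition() const { return !data_.empty(); }
    bool push_condition() const { return data_.size() < max_size_; }

    // Method bodies, analogous to SDA-stack-pop-method. Callers must
    // check the guard first; the real SDA master enqueues the request
    // instead when the guard is false.
    void push_method(int v) { data_.push_back(v); }
    int pop_method() {
        int v = data_.back();
        data_.pop_back();
        return v;
    }

private:
    std::size_t max_size_;
    std::vector<int> data_;
};
```

In the prototype these two functions live on the SDA master processor, while the stub function runs on the caller's processor and communicates with the master by message.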

Figure 9: The method queue list data structure for the stack SDA

Each SDA master is a thread that waits for messages sent from the task thread stubs and takes the action appropriate to each method invocation request. The prototype SDA master incorporates a method queue list data structure, shown in Figure 9 for the stack SDA, which consists of a list of queues, one for each method. Along with a queue of outstanding invocation requests, each method queue contains pointers to the associated condition and method functions. Each queue entry represents a single method invocation request and contains the argument structure that was created by the stub code, as specified by the message received from the stub routine. Both the condition function and the method function are called using generic function pointers: the condition function always takes a single (void *) argument structure and returns a boolean, whereas the generic method function takes the argument structure and returns a pointer value that the master sends back to the calling stub in the reply message.

The algorithm in Figure 10 depicts the main loop of the SDA master. On receiving a message from a task stub routine, the SDA master executes the associated condition function to determine if the method can be executed. If the condition function returns false, the method is enqueued in the appropriate list. Otherwise, the associated method function is executed and the results returned to the caller through the stub routine. Since the execution of any method may change the SDA state, the condition functions associated with any enqueued methods are then reevaluated, and the methods whose conditions evaluate to true are executed. This reevaluation of condition functions is repeated until no further methods can be executed, at which time the master continues waiting for further messages from stub routines.
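The reevaluation loop described above can be sketched as runnable C++. This is our own single-process rendering of the idea (the messaging, per-method queues, and reply path are collapsed into one queue of guard/body pairs), intended to show why the loop must rescan until a full pass executes nothing:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// One pending invocation: its guard and its body.
struct Request {
    std::function<bool()> condition;  // guard over the current SDA state
    std::function<void()> method;     // method body; may change the state
};

// Reevaluation loop of the SDA master: execute every enqueued request
// whose condition holds, repeating until a pass executes no method
// (since each executed method may enable previously blocked requests).
inline int drain(std::deque<Request>& queue) {
    int executed = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < queue.size();) {
            if (queue[i].condition()) {
                queue[i].method();             // may enable other requests
                queue.erase(queue.begin() + i);
                ++executed;
                progress = true;               // state changed: rescan
            } else {
                ++i;
            }
        }
    }
    return executed;
}
```

Note that a request enqueued behind another can still run first if only its guard is true, which is exactly the behavior of the blocked pop in the stack example.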

    Sda-master ()
    {
        do forever {
            m = wait-for-message ();
            if (m.condition (m.arg) == true) {
                result = m.method (m.arg);
                return-result-to-caller (result, m.caller);
                repeat {
                    for each method queue i do {
                        while condition of queue i is true {
                            m = get-first-method-in-queue (i)
                            result = m.method (m.arg)
                            return-result-to-caller (result, m.caller)
                        }
                    }
                } until no methods are executed in a single pass
            }
            else
                enqueue-message (m);
        }
    }

Figure 10: Pseudocode for the main loop of an SDA master

4.1 Preliminary Results

In order to quantify the overhead of our method invocation design, we performed the following experiment:

1. We first measured the round-trip delay for a single message using p4. This represents a lower bound on our method execution time, since in all cases a message must be passed from the calling thread to the SDA master and back to the calling stub, and it provides the basis for comparing our other results.

2. Next, we encoded a version of our SDA stack routines using a simple remote procedure call mechanism, but without the SDA method queue data structures and without checking for underflow/overflow conditions. This is done for comparison, since our method invocation algorithm must do at least this much work.

3. Next, we timed our SDA stack performing a sequence of alternating pushes and pops, where the conditions are designed so that they always evaluate to true. This test evaluates a condition function for each method invocation, but never manipulates the method queues, since no request is ever enqueued.

4. Finally, we timed a similar sequence of alternating pushes and pops in which each pop request arrives when the stack is empty, so that its condition first evaluates to false and the request is enqueued; the subsequent push then forces a re-evaluation of the method queues to retrieve the enqueued pop. This test highlights the overhead of manipulating the method queue data structures.

Each of the four tests consisted of 10,000 trials of a complete round-trip method invocation (including the basic message time and any associated condition and queue processing), and the trials were averaged to achieve confidence intervals of 95%. All tests were performed on a set of Sun Sparcstation 10 workstations connected by a network under low traffic. The results, presented in Figure 11, yield the following empirical overheads, where the overhead of test #k relative to test #j is computed as (t_test#k - t_test#j)/t_test#j. The basic RPC mechanism (test #2 vs. test #1) adds only 1.7% to the basic round-trip message time; evaluating the condition functions (test #3 vs. test #1) adds only 3.4%; and the entire method invocation mechanism (test #4 vs. test #1) adds only 5.8%. The remaining overheads are computed similarly: manipulating the method queues (test #4 vs. test #3) adds only 2.3%, and the complete mechanism compared to the basic RPC mechanism (test #4 vs. test #2) adds only 4.0%.

We acknowledge that these relative overheads may be obscured by the large message passing latency of the p4 package,
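The overhead formula can be made concrete in a few lines of C++. The timing values used below are illustrative stand-ins, not the measured Sparcstation numbers:

```cpp
#include <cassert>

// Overhead of test k relative to a base test, in percent:
// (t_test_k - t_test_base) / t_test_base * 100.
inline double overhead_pct(double t_test_k, double t_test_base) {
    return (t_test_k - t_test_base) / t_test_base * 100.0;
}
```

For instance, a complete-mechanism time 5.8% above the base round-trip time corresponds to t_test#4 = 1.058 * t_test#1.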

and that a leaner message passing interface, such as that of the Paragon, to which we are currently porting the prototype, may yield higher relative overheads. We plan to report results on these other platforms in the final version of this paper.

Figure 11: Time for each test, with 95% confidence intervals. Test 1: p4 round-trip; Test 2: simple RPC using p4; Test 3: SDA without method queue; Test 4: SDA with method queue.

5 Conclusions and Future Research

We have recently introduced a set of primitives for defining task and data parallelism by extending HPF, and for controlling parallel tasks and the shared data abstractions (SDAs) that provide communication and synchronization among them. In this paper we describe the design of a runtime system capable of supporting such an extension, focusing on the details for the two segments of the runtime support: distributed SDA data structures, and SDA method invocation using monitor and guarded condition semantics.

To test the feasibility of our design, we have implemented a prototype with centralized SDA control, implemented as a C++ library employing p4 as the communication system. Using this prototype, we have executed two task codes: a simple stack SDA performing randomized pushes and pops, and a skeleton of the multidisciplinary design optimization (MDO) application for aircraft design. Both codes execute on a cluster of workstations. Using the stack SDA, we have measured the expected overhead of our method invocation design and have found that our design adds little overhead to a stripped-down RPC version of the same test, which we feel is a realistic lower bound for remote method invocation. We plan to expand our results for the final version of the paper.

We are currently in the process of incorporating the Chant runtime system into our prototype, which will give us the capability to exercise our distributed SDA data structure designs, as well as full support of our parallel tasks and portability to a large number of platforms, including the Paragon and KSR-1. From the experiences and results of our prototype, we are making modifications to the SDA syntax and semantics, and are working on a source-to-source translator that will take HPF programs augmented with our SDA syntax and produce code that will execute on distributed memory multiprocessors and workstation clusters. Finally, we are working with several applications groups to develop realistic MDO codes that will provide the true test of our designs.

References

[1] Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and Henry M. Levy. Scheduler activations: Effective kernel support for the user-level management of parallelism. In ACM Symposium on Operating Systems Principles, pages 95-109, 1991.

[2] S. Benkner, B. Chapman, and H. Zima. Vienna Fortran 90. In Scalable High Performance Computing Conference, Williamsburg, VA, April 1992.

[3] Brian N. Bershad, Edward D. Lazowska, and Henry M. Levy. An open environment for building parallel programming systems. Technical Report 88-01-03, Department of Computer Science, University of Washington, January 1988.

[4] Ralph Butler and Ewing Lusk. User's guide to the p4 parallel programming system. Technical Report ANL-92/17, Argonne National Laboratory, October 1992.

[5] Barbara M. Chapman, Piyush Mehrotra, and Hans P. Zima. Programming in Vienna Fortran. Scientific Programming, 1(1):31-50, 1992.

[6] Barbara M. Chapman, Piyush Mehrotra, John Van Rosendale, and Hans P. Zima. A software architecture for multidisciplinary applications: Integrating task and data parallelism. ICASE Report 94-18, Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, Hampton, VA, March 1994.

[7] E. W. Dijkstra. Guarded commands, non-determinacy and formal derivation of programs. Communications of the ACM, 18(8):453-457, August 1975.

[8] Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual. Addison-Wesley, 1990. ISBN 0-201-51459-1.

[9] Edward W. Felton and Dylan McNamee. Improving the performance of message-passing applications by multithreading. In Proceedings of the Scalable High Performance Computing Conference, pages 84-89, Williamsburg, VA, April 1992.

[10] Message Passing Interface Forum. Document for a Standard Message Passing Interface, draft edition, November 1993.

[11] I. Foster, B. Avalani, A. Choudhary, and M. Xu. A compilation system that integrates High Performance Fortran and Fortran M. In SHPCC, May 1994.

[12] I. T. Foster and K. M. Chandy. Fortran M: A language for modular parallel programming. Technical Report MCS-P327-0992 Revision 1, Mathematics and Computer Science Division, Argonne National Laboratory, June 1993.

[13] Ian Foster, Carl Kesselman, Robert Olson, and Steven Tuecke. Nexus: An interoperability layer for parallel and distributed computer systems. Technical Report Version 1.3, Argonne National Labs, December 1993.

[14] Dirk Grunwald. A users guide to AWESIME: An object oriented parallel programming and simulation system. Technical Report CU-CS-552-91, Department of Computer Science, University of Colorado at Boulder, November 1991.

[15] Matthew Haines and Wim Böhm. On the design of distributed memory Sisal. Journal of Programming Languages, 1:209-240, 1993.

[16] Matthew Haines, David Cronk, and Piyush Mehrotra. On the design of Chant: A talking threads package. ICASE Report 94-25, Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, Hampton, VA 23681, April 1994.

[17] High Performance Fortran Forum. High Performance Fortran Language Specification, version 1.0 edition, May 1993.

[18] IEEE. Threads Extension for Portable Operating Systems (Draft 7), February 1992.

[19] David Keppel. Tools and techniques for building fast portable threads packages. Technical Report UWCSE 93-05-06, University of Washington, May 1993.

[20] Jenq Kuen Lee and Dennis Gannon. Object oriented parallel programming experiments and results. In Proceedings of Supercomputing '91, pages 273-282, Albuquerque, NM, November 1991.

[21] Mamoru Maekawa. A √N algorithm for mutual exclusion in decentralized systems. ACM Transactions on Computer Systems, 3(2):145-159, May 1985.

[22] Frank Mueller. A library implementation of POSIX threads under UNIX. In Winter USENIX, pages 29-41, San Diego, CA, January 1993.

[23] Bodhisattwa Mukherjee, Greg Eisenhauer, and Kaushik Ghosh. A machine independent interface for lightweight threads. Technical Report GIT-CC-93/53, College of Computing, Georgia Institute of Technology, Atlanta, Georgia, 1993.

[24] nCUBE Corporation. nCUBE/2 Technical Overview. nCUBE, Beaverton, OR, 1990.

[25] Carl Schmidtmann, Michael Tao, and Steven Watt. Design and implementation of a multithreaded Xlib. In Winter USENIX, pages 193-203, San Diego, CA, January 1993.

[26] H. Schwetman. CSIM Reference Manual (Revision 9). Microelectronics and Computer Technology Corporation, 9430 Research Blvd, Austin, TX, 1986.

[27] B. Seevers, M. J. Quinn, and P. J. Hatcher. A parallel programming environment supporting multiple data-parallel modules. In Workshop on Languages, Compilers and Run-Time Environments for Distributed Memory Machines, October 1992.

[28] J. Subhlok and T. Gross. Task parallel programming in Fx. Technical Report CMU-CS-94-112, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, 1994.

[29] J. Subhlok, J. Stichnoth, D. O'Hallaron, and T. Gross. Exploiting task and data parallelism on a multicomputer. In Proceedings of the 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.

[30] Sun Microsystems, Inc. Lightweight Process Library, sun release 4.1 edition, January 1990.

[31] Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, and Klaus Erik Schauser. Active messages: A mechanism for integrated communications and computation. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 256-266, May 1992.

[32] Mark Weiser, Alan Demers, and Carl Hauser. The portable common runtime approach to interoperability. In ACM Symposium on Operating Systems Principles, pages 114-122, December 1989.
