NASA Contractor Report 194904
ICASE Report No. 94-26

RUNTIME SUPPORT FOR DATA PARALLEL TASKS

Matthew Haines
Bryan Hess
Piyush Mehrotra
John Van Rosendale
Hans Zima

Contract NAS1-19480
April 1994

Institute for Computer Applications in Science and Engineering
NASA Langley Research Center
Hampton, VA 23681-0001

Operated by Universities Space Research Association
Runtime Support for Data Parallel Tasks*

Matthew Haines(t)  Bryan Hess(t)  Piyush Mehrotra(t)  John Van Rosendale(t)  Hans Zima(tt)

(t) Institute for Computer Applications in Science and Engineering,
NASA Langley Research Center, Mail Stop 132C, Hampton, VA 23681-0001
[haines,bhess,pm,jvr]@icase.edu

(tt) Institute for Software Technology and Parallel Systems,
University of Vienna, Brünner Strasse 72, A-1210, Vienna, Austria
zima@par.univie.ac.at
Abstract

We have recently introduced a set of Fortran language extensions that allow for integrated support of task and data parallelism, and provide for shared data abstractions (SDAs) as a method for communication and synchronization among these tasks. In this paper we discuss the design of the runtime system necessary to support these language extensions, and the issues that underlie this design. To test the feasibility of this approach, we implement a prototype of the runtime system and use it to support a multidisciplinary optimization (MDO) problem for aircraft design. We discuss initial results and future plans.

*This research was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-19480, while the authors were in residence at ICASE, NASA Langley Research Center, Hampton, VA 23681.
1 Introduction

Most of the recent research on the specification and exploitation of parallelism in scientific codes has concentrated on data parallelism, exemplified by languages such as High Performance Fortran (HPF) [17] and Vienna Fortran [5]. However, a large number of scientific and engineering applications exhibit multiple levels of parallelism. Multidisciplinary applications, such as weather modeling and multidisciplinary design optimization (MDO) systems, are a good example of this larger and more complex problem. For example, optimizing an aircraft design requires the interaction of discipline codes from areas such as aerodynamics, propulsion, and structural analysis. These individual discipline codes, which are often data parallel codes designed independently of each other, need to execute concurrently and interact with each other to solve the complex design problem. Such applications exhibit an outer level of task parallelism, in which the individual discipline codes execute asynchronously and in parallel, while each discipline code may itself exhibit internal data parallelism. However, data parallel languages such as HPF and Vienna Fortran do not provide adequate support for this form of task parallelism.

We have recently introduced a set of Fortran language extensions [6] which allow independently designed data parallel tasks to interact with each other in a plug-compatible manner. Along with the task parallel features, these extensions introduce a new mechanism, called Shared Data Abstractions (SDAs), to allow such tasks to share data. SDAs generalize Fortran 90 modules by including features from both object-oriented data management systems and monitors in shared memory languages, and provide a high-level, clean interface through which data parallel tasks can interact with each other.

In this paper, we concentrate on the design of a runtime system capable of supporting both the task parallelism and the shared data abstractions introduced by these extensions. In particular, we address two specific areas of runtime support: SDA method invocation and distributed data structure management. Method invocation includes the protocols and semantics for accessing SDA data in the form of shared objects, based on user-level threads, remote service requests, and the condition clauses that can guard the execution of SDA methods (see Figure 5). Distributed data structure management includes the distribution of SDA data among the processors executing the SDA methods, as well as the protocols for interfacing with the distributed data structures of the individual tasks, such as those that might be produced by an HPF compiler. Since the design is based on lightweight, user-level threads [16], the underlying system supports parallel tasks that execute independently and interact asynchronously; it also supports sets of SPMD threads that execute in a loosely synchronous manner and require both inter-processor and intra-processor communication, as needed for data parallel codes.

Other projects also focus on the integration of task and data parallelism, including Fortran-M [11, 12] and Fx [27, 28, 29]. Fortran-M and Fx provide mechanisms for establishing message plumbing between tasks directly, based on ports and an extension of C file structures, respectively. In Fx, tasks communicate only through arguments at the time of creation and termination. The runtime support systems for these efforts face some of the same issues discussed in this paper. However, unlike the SDA runtime system described here, these systems are not based on lightweight threads.

The remainder of the paper is organized as follows: Section 2 summarizes the Fortran language extensions for supporting both task parallelism and shared data abstractions; Section 3 outlines the runtime support necessary for supporting these extensions, with particular respect to data distribution and method invocation issues; and Section 4 introduces a prototype of the SDA runtime system, developed to test the feasibility of the SDA method invocation design.
2 An Introduction to Tasks and Shared Data Abstractions

In this section, we provide a short overview of the Fortran extensions that we have designed to integrate task and data parallelism; full details of these extensions can be found in [6]. In our system, a program is composed of a set of asynchronous, autonomous tasks that execute independently of one another. Each task can itself embody nested data parallelism, for example by executing HPF code, and the extensions described here are designed to build on top of HPF [17]. Tasks interact with one another only through shared data abstractions (SDAs), which act as data repositories accessible to all tasks. An SDA object is created as an instance of a particular SDA type, and a task can access the data within an SDA object only by invoking one of the methods associated with that type. The SDA semantics enforce exclusive access, ensuring that only one method of a particular SDA object is active at any given time. Asynchronous tasks and SDAs thus form a natural combination for structuring a complex body of code: the tasks constitute the units of coarse-grain parallelism, formed into a single application by the data interactions they specify through SDAs.

These concepts presume the existence of a data parallel language, such as HPF, in which the individual discipline codes are written; the extensions concentrate on the management of tasks and on the data interactions between them. As an example, we now introduce a simplified MDO application for aircraft design, which will highlight the need for task parallel units interacting through shared data repositories (SDAs).
2.1 A Sample MDO Application

We now briefly describe a simplified MDO application for aircraft design, with a focus on the simultaneous aerodynamic and structural design and optimization of an aircraft configuration. Though a realistic MDO application for the design of a full aircraft would require a number of other discipline codes, we present a simplified version for the sake of brevity. The structure of this application is shown in Figure 1, where tasks are represented by rectangles and SDAs by ovals.

Figure 1: A sample MDO application.

The Optimizer is the main task and controls the execution of the entire application. It creates three SDAs (StatusRecord, SurfaceGeom, and Sensitivities) and then initiates the two tasks, FlowSolver and FeSolver, which represent the aerodynamics and structural analysis discipline codes, respectively. The Optimizer makes an initial geometry available to the other tasks by storing it in the SurfaceGeom SDA. The FlowSolver generates an aerodynamics grid based on the initial geometry and performs an analysis of the airflow around the aircraft, producing a new flow solution. The FeSolver uses the forces stored in SurfaceGeom to determine the structural deflections, and places the result back in the SurfaceGeom SDA as a deflected geometry. The FlowSolver, meanwhile, uses the current deformed geometry along with the previous flow solution to produce new solutions. This inner iteration is repeated until FeSolver determines that the difference between the new and the old deflections is within some specified tolerance. At this point, both FlowSolver and FeSolver produce sensitivity derivatives, which represent the behavior of their output variables, such as lift and drag, with respect to the design variables, such as the angle of sweep of the wing, and place them in the Sensitivities SDA. The Optimizer then obtains the sensitivity derivatives and, based on some objective function that it is minimizing, decides whether to terminate the program or to produce a modified base geometry. In the latter case, it places the new geometry in the SurfaceGeom SDA and the inner cycle is repeated.
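The coordination pattern just described, two solvers refining a shared geometry until successive deflections agree, can be re-enacted sequentially. The following is a minimal sketch only: the dict stands in for the SurfaceGeom SDA, and the numeric "solvers" are invented placeholders, not the actual aerodynamic or structural analyses.

```python
def run_mdo(initial_geometry, tolerance):
    """Toy sequential re-enactment of the inner FlowSolver/FeSolver cycle.

    The real application runs the two solvers as concurrent tasks that
    communicate through SDAs; here SurfaceGeom is a plain dict and the
    solver bodies are numeric stand-ins chosen only to converge.
    """
    surface_geom = {"base": initial_geometry, "deflected": initial_geometry}
    old_deflection = surface_geom["deflected"]
    while True:
        # "FlowSolver": produce a flow solution from the current geometry.
        flow = surface_geom["deflected"] * 0.5
        # "FeSolver": forces from the flow yield new structural deflections.
        new_deflection = surface_geom["base"] + flow * 0.1
        surface_geom["deflected"] = new_deflection
        # Iterate until successive deflections agree within the tolerance.
        if abs(new_deflection - old_deflection) < tolerance:
            return new_deflection
        old_deflection = new_deflection
```

With `initial_geometry = 1.0` the iteration contracts geometrically toward the fixed point `1 / 0.95`, illustrating why the inner cycle terminates for a well-behaved coupling.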
SPAWN FlowSolver(SurfaceGeom, Sensitivities, ...) ON ...

Figure 2: Code fragment for spawning the FlowSolver task.

2.2 Task Management

We now describe the extensions for the management of tasks, the units of coarse-grain parallelism in our system. The code for a task is embodied in a task program, which is syntactically similar to a Fortran subroutine (with the keyword TASK CODE used in place of SUBROUTINE). A new task is created by explicit activation of a task program using the spawn statement, as shown in Figure 2, where the Optimizer task spawns the FlowSolver task. The spawn is non-blocking: the spawning task continues its execution after the spawn, and the spawned task executes until it either reaches the end of its code or is killed. Tasks operate in their own address space; data structures are passed to a spawned task only through the arguments of the spawn.

The spawn statement has two optional parts: a resource-request specification and an on clause. The resource-request specification is used to specify the resources, such as a set of processors, that the new task requires for its execution. The on clause is used to specify where the spawned task should execute, either as an explicit machine specification or as a class of machines, such as Sun workstations. If a class is specified, the system is free to select an appropriate machine from the specified class; in either case, the system may choose the physical resources assigned to the task. A spawned data parallel task can then distribute its data across the allocated processors; for example, an HPF task could use a PROCESSORS directive along with distribution directives to map its data structures onto the set of processors allocated to it by the system.
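The non-blocking spawn semantics described above can be sketched with ordinary threads. This is a hedged illustration only: the names `spawn`, `flow_solver`, and the `on` keyword argument are invented stand-ins for the SPAWN statement, the TASK CODE unit, and the ON clause, and a thread is a much weaker isolation unit than a real task with its own address space.

```python
import threading

def spawn(task_code, *args, on=None):
    """Sketch of the non-blocking SPAWN statement.

    `task_code` plays the role of a TASK CODE program unit; `on` stands in
    for the ON machine clause (ignored here, since we only have local
    threads). The call returns immediately: the spawned task runs to the
    end of its code independently of the spawning task.
    """
    t = threading.Thread(target=task_code, args=args, daemon=True)
    t.start()
    return t  # handle so the parent can observe task lifetime if it wants

# Example task program; results stands in for data shared through an SDA.
results = []

def flow_solver(geom, out):
    out.append(("flow", geom))

handle = spawn(flow_solver, "initial-geometry", results, on="sun-cluster")
# ... the spawning task continues here without blocking ...
handle.join()
```

The handle returned by `spawn` mirrors the fact that a spawned task has an identity of its own, even though the spawning task never needs to wait for it.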
2.3 Shared Data Abstractions

An SDA type, modeled after the Fortran 90 module [2], consists of a set of data structures and an associated set of methods (procedures) that manipulate this data. The data and methods of an SDA can be declared public or private: public data and methods can be accessed by any task that has access to an instance of the SDA type, while private data and methods can be used only within the SDA and cannot be accessed directly from the outside.
SDA TYPE SGeomType(StatusRecord)
    SDA (StatRecType) StatusRecord
    TYPE (surface) base
    TYPE (surface) deflected
    LOGICAL DeflectFull = .FALSE.
    PRIVATE base, deflected, DeflectFull
CONTAINS
    SUBROUTINE PutBase (b)
        TYPE (surface) b
        base = b
        deflected = b
        DeflectFull = .TRUE.
    END SUBROUTINE
    SUBROUTINE GetDeflected (d) WHEN DeflectFull
        TYPE (surface) d
        DeflectFull = .FALSE.
        d = deflected
    END SUBROUTINE
END SGeomType
Figure 3: Code fragment depicting a portion of the type declaration for the SurfaceGeom SDA.

Figure 3 presents a code fragment depicting a portion of the SDA type declaration for the SurfaceGeom SDA of our example. The declaration consists of two parts. The first part (before the keyword CONTAINS) specifies the internal data structures of the SDA type, while the second part (after the keyword CONTAINS) consists of the procedure declarations, which constitute the methods of the SDA type. Data structures and methods declared with the keyword PRIVATE cannot be accessed directly from the outside; in this respect, SDAs are similar to C++ classes [7].

As stated before, access to an SDA object is exclusive, thus ensuring that there are no data conflicts due to asynchronous method calls. That is, only one method of an SDA object can be active at any time; other method requests are delayed until the currently executing method call completes, and the calling task is blocked until its call is executed. Each method can have an optional condition clause (specified here with the keyword WHEN), which "guards" the execution of the method, similar to Dijkstra's guarded commands [8]. The condition clause

    SDA (StatRecType) StatusRecord
    SDA (SGeomType) SurfaceGeom
    SDA (SensType) Sensitivities
    CALL StatusRecord%INIT
    CALL SurfaceGeom%INIT(StatusRecord)
    CALL Sensitivities%INIT

Figure 4: Declaration and initialization of SDA variables.

consists of a logical expression, comprised of the internal data structures and the arguments to the procedure. A method call is executed only if the associated condition clause is true at the moment of evaluation. If the condition clause evaluates to false, the method call is enqueued until the condition evaluates to true, as a result of the SDA data being modified by another method call. A method declared without a condition clause is assigned a default condition clause that always evaluates to true. Condition clauses therefore provide a way to synchronize task interactions based on data dependencies. For example, in the code above, a call to the method GetDeflected will be executed only when the variable DeflectFull evaluates to true. Thus, if a task calls GetDeflected before another task has called PutBase, the former is blocked until the latter call is executed, ensuring that the variable deflected has an appropriate value.

Similar to HPF procedure declarations, each SDA type may have an optional PROCESSORS directive, which allows the internal data structures of the SDA to be distributed across these processors. The distribution rules applicable to HPF dummy arguments also apply to the internal data structures of an SDA. An SDA variable is declared using a syntax similar to that for Fortran 90 derived types, as shown in Figure 4, where the Optimizer declares the three SDA variables used in the MDO application of Figure 1. As with other variables, an SDA variable has to be initialized before it can be used. This is done using the predefined method INIT (as shown in Figure 4), which initializes the internal state of the SDA, allocating storage for the internal data structures from the global heap and accepting any arguments necessary for the initial state. The INIT method can also be provided with an optional resource-request specification, similar to the one used when spawning tasks, which the runtime system uses for resource selection when allocating the SDA. This is an important consideration for most MDO applications, since the internal data structures of an SDA might comprise a large portion of the storage used by the application.

An SDA variable can also be declared to be persistent, which is useful for saving the state of a computation for later use. The corresponding predefined methods SAVE and LOAD transfer SDA data to and from secondary storage, using a syntax similar to that of the INIT method.

Having provided an overview of the task and SDA extensions, we now examine the issues involved in providing runtime support for these extensions. A detailed description of the task and SDA extensions, including sample code for a similar MDO application, can be found in [6].
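The guarded-method semantics of Figure 3, exclusive access plus a WHEN clause on GetDeflected, map naturally onto a monitor. The following is a minimal sketch under that reading, using a Python condition variable; the class and method names mirror the example but are not part of the proposed extensions.

```python
import threading

class SurfaceGeom:
    """Sketch of the SurfaceGeom SDA: exclusive methods plus a WHEN guard.

    A single condition variable provides the monitor-style exclusive
    access, and waiters re-evaluate the DeflectFull guard whenever the
    SDA data changes, mimicking the re-evaluation of condition clauses.
    """
    def __init__(self):
        self._cond = threading.Condition()  # exclusive access + guard wait
        self._base = None
        self._deflected = None
        self._deflect_full = False          # mirrors LOGICAL DeflectFull

    def put_base(self, b):
        # PutBase has no WHEN clause: its default condition is always true.
        with self._cond:
            self._base = b
            self._deflected = b
            self._deflect_full = True
            self._cond.notify_all()         # enqueued guarded calls re-check

    def get_deflected(self):
        # GetDeflected(d) WHEN DeflectFull: block until the guard holds.
        with self._cond:
            while not self._deflect_full:   # the condition clause
                self._cond.wait()
            self._deflect_full = False
            return self._deflected
```

A consumer thread calling `get_deflected` before any `put_base` blocks exactly as the text describes for a task calling GetDeflected before PutBase.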
3 SDA Runtime Support

If we take an abstract view of the problem, we see that there are two basic types of computation being performed: tasks, each executing the actual computations of a data parallel code, and SDAs, each performing the method invocations that access its data. Each task will be assigned a set of processor resources, and each SDA will be assigned at least one processor, which may overlap with the processors executing any of the tasks. As a result, any given processor may be responsible for the computation of at least one task and for the method executions of any number of SDAs, and the resulting computations must be interleaved in some fashion.

On Unix-based systems, this execution model can be implemented by mapping each task and each SDA to a process, where each processor participating in the computation executes multiple independent processes. However, this process-based approach has several drawbacks, including: the inability to control scheduling decisions, since each Unix process is scheduled by the operating system with little input from the user; the costly context switching associated with (heavyweight) Unix processes; the inability to share address spaces, since each Unix process executes within its own address space, making it impossible to deliver data to a task without involving the Unix kernel; and the inability to utilize this approach on systems that do not fully support Unix processes, such as the nCUBE/2, or systems running special microkernels, such as SUNMOS [24] for the Intel Paragon. In light of these disadvantages, our approach is based on lightweight, user-level threads to represent the various independent tasks.
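The scheduling-control drawback above is the one user-level threads remove most directly. As a hedged sketch of that idea, the following round-robin scheduler interleaves "threads" written as Python generators; the function and names are invented for illustration, and a real thread package would of course preserve stacks and registers rather than rely on generator state.

```python
from collections import deque

def run_tasks(tasks):
    """Round-robin user-level scheduler.

    Each 'thread' is a generator that yields at points where the runtime
    may switch. The scheduling policy is entirely under user control, and
    a switch costs no kernel context switch, which is the property the
    process-based approach lacks.
    """
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))  # run the task until it yields
            ready.append(task)        # still runnable: requeue it
        except StopIteration:
            pass                      # task ran to the end of its code
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"

print(run_tasks([worker("task", 2), worker("sda", 2)]))
# round-robin interleaving: ['task:0', 'sda:0', 'task:1', 'sda:1']
```

Replacing the FIFO `ready` queue with a priority queue would give exactly the kind of policy control (e.g. favoring SDA threads) discussed later in this section.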
A
lightweight, user-level thread is a unit of computation with minimal context that executes within the domain of a kernel-level entity, such as a Unix process or Mach thread. Lightweight threads
are becoming increasingly useful in supporting language implementations and runtime systems for both parallel and sequential machines by providing a level of concurrency within a kernel-level process. Threads can be scheduled on a single processor to support coroutines, Ada tasks, or C++ method invocations, and on parallel machines to support fine-grain parallelism. Lightweight threads are used in simulation systems [13, 15, 32], parallel language implementations [20, 22, 25], and generic runtime systems [14, 26] to provide parallel events, remote method invocations, and multithreading capabilities.

Figure 5: Runtime support layers. The SDA Runtime Interface rests on Distributed Data Structure Support and Method Invocation Support, which are built on Chant (communicating threads), which is in turn implemented atop a lightweight thread library (e.g. pthreads, cthreads, ...) and a communication library (e.g. MPI, p4, PVM, ...).
Additionally, the POSIX committee has adopted a threads standard, and many lightweight thread libraries offer very fast primitives for thread creation, scheduling, and context switching [1, 3, 9, 19, 23, 30]. The main drawbacks of using lightweight threads have been a lack of standardization, which limits portability, and a lack of support for communication among threads located in different addressing spaces. Our solution to these drawbacks is to implement a runtime interface, called Chant [16], that supports communication among lightweight threads, both within the same process and in remote address spaces. Chant provides point-to-point communication primitives between threads (such as those defined in the MPI interface standard [31]) as well as remote service request primitives (such as Active Messages [18]). A remote service request causes a piece of computation, specified either by the programmer or inserted by the compiler, to be executed on behalf of the requesting thread, for example to fetch a piece of remote data. Chant is implemented atop a standard lightweight thread library (such as pthreads or cthreads) and a standard communication library (such as MPI, p4, or PVM), as depicted in Figure 5. A description of Chant, which we are currently developing, including its current status, can be found in [16].
Figure 5 depicts the SDA runtime support system as being composed of two portions: one for distributed data structure management and one for method invocation. In the next two sections we examine each of these two portions in more detail.

3.1 Distributed Data Structures

In addition to the computation being mapped to each processor, there are potentially two types of data structures that must be mapped to each processor: computation data, which is any data structure defined within a computation task, and SDA data, which is any data structure defined within an
PROGRAM main
!HPF$ PROCESSORS P(M)
    SDA (SType) S
    INTEGER A(1000)
!HPF$ DISTRIBUTE A(BLOCK)
    ...
    CALL S%put(A)
    ...
END main

SDA TYPE SType
!HPF$ PROCESSORS P(N)
CONTAINS
    SUBROUTINE put ( B )
        INTEGER B(:)
        ...
    END put
END SType
SDA.

Figure 6: SDA code excerpt.

Since each computation task and each SDA may be distributed over a different set of processors, each with their own local memories, we must have a mechanism for transferring data between the two. To illustrate the issues that arise when data from a distributed computation data structure is used to update a distributed SDA data structure, consider the code excerpt in Figure 6. The main program declares an array, A, that is distributed by BLOCK over the M processors executing main, and an SDA, S, of type SType, which declares a method put whose dummy argument B may be distributed over the N processors of the SDA. We will assume that main and S are each represented by a set of threads: main by a set of computation threads, one for each of its M processors, and S by a set of SDA threads, one for each of its N processors, where one thread in each group is designated the "master" thread and coordinates data transfers for the group. At some point in the computation, main invokes the put method to update the values of B with the values of A.

If M and N are both greater than 1, then both arrays are distributed, and we have several options for transferring the data from A to B:

1. The master thread of main collects the elements of A into a local scratch array and sends it to the master thread of S, which then distributes the values of the scratch array among S's remaining threads, using its knowledge of B's distribution. This provides the simplest approach in terms of scheduling data transfers, since only one transfer occurs, from the master thread of main to the master thread of S. However, two scratch arrays and two gather/scatter operations are required, consuming both time and space.

2. The master thread of main collects the elements of A into a local scratch array, then negotiates with the master thread of S to determine how the scratch array is to be distributed among the threads of S, considering B's distribution. After the negotiation, main's master thread distributes the scratch array directly to S's threads. This approach eliminates one scratch array and the scatter operation, but introduces a negotiation phase that is required to discern B's distribution.
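The distribution knowledge exchanged during such a negotiation can be sketched as ownership maps. The following is a simplified illustration, not the paper's protocol: the two owner functions cover only even BLOCK chunks and plain CYCLIC dealing, and `transfer_plan` merely lists which source processor must message which destination processor for each element.

```python
def block_owner(i, n, p):
    """Processor owning element i of an n-element BLOCK-distributed array
    over p processors (contiguous chunks of ceil(n/p) elements)."""
    chunk = -(-n // p)  # ceiling division
    return i // chunk

def cyclic_owner(i, p):
    """Processor owning element i under a CYCLIC distribution: elements
    are dealt out round-robin."""
    return i % p

def transfer_plan(n, m, p):
    """Sketch of the outcome of a negotiation phase: for an n-element
    array BLOCK-distributed over m computation processors and
    CYCLIC-distributed over p SDA processors, list one
    (element, source, destination) message per element."""
    return [(i, block_owner(i, n, m), cyclic_owner(i, p)) for i in range(n)]

# 8 elements, BLOCK over 2 computation processors, CYCLIC over 2 SDA
# processors: element 6 lives on computation processor 1 but belongs to
# SDA processor 0, so the two sides disagree on ownership.
plan = transfer_plan(8, 2, 2)
```

A real implementation would coalesce the per-element entries into one message per (source, destination) pair, but the owner computation is the information the negotiation must convey.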
Figure 7: Sample HPF and SDA array distributions.

3. The master thread of main again negotiates with the master thread of S to discern B's distribution, but then informs the other threads of main of this distribution. Each thread of main then sends the portion of A that it owns directly to the appropriate threads in S. This approach eliminates both scratch arrays and both gather/scatter operations, but again requires a negotiation phase, and the single message of the previous approaches is replaced by a set of messages from the threads of main to the threads of S.

We can also consider the special cases in which M or N (or both) are restricted to 1. When N is 1, the SDA array is stored on a single processor, so each computation thread can send the array elements it owns directly to the single SDA thread, and no negotiation is needed. Likewise, when M is 1, the computation array is stored on a single processor, and a single computation thread will distribute the array to the SDA processors. When both M and N are 1, at most a single message is needed to update the SDA data structures.

To understand the complications that arise in passing data between differently distributed arrays, consider the situation depicted in Figure 7, where an HPF array of 8 elements with a BLOCK distribution in main corresponds to an SDA array of 8 elements with a CYCLIC distribution in S. For example, element 6 of the HPF array and element 6 of the SDA array are located on different processors, so a computation thread that wishes to update a particular element of the SDA array must learn, as a result of the negotiations, on which SDA processor that element is located before it can send the corresponding values.

We can now summarize the runtime support required for distributed data structures.
Each computation task will be representedby a group of
computation threads, executing on some set of processors, with the computation data structures distributed over the thread group according to the HPF distribution directives. Each SDA will likewise be represented by a group of threads, executing on some set of processors, with the SDA data structures distributed over the group according to its distribution directives. On any given processor, multiple threads may be executed in an interleaved fashion, scheduled according to some predefined system policy. Priorities can be used to influence the scheduling decisions; for example, an SDA thread may be given higher priority than the computation threads, allowing it to handle method invocation requests at the next scheduling opportunity. For each distributed data structure, a "master" thread will be identified, which can act as a negotiator for data transfers involving that structure.

An additional level of complexity arises because the modules of an MDO application are typically developed independently, and one module may wish to view the same data structure in a different format than another, for example to change the dimensionality of a data structure or to view it with a different distribution. The data structure management outlined above can accommodate this by passing the data through a predefined filter that changes its format or distribution.

3.2 Method Invocation

Referring once again to Figure 5, we now discuss the runtime issues involved in SDA method invocation. The SDA semantics place two restrictions on method invocation:

1. only one method of a given SDA can be active at any time (i.e. methods execute with exclusive access to the SDA data structures), and
2. execution of each method is guarded by a condition clause, which must evaluate to true before the method code can be executed.
In the previous section we addressed the issues regarding the distribution of SDA data structures; we now address the issues involved in executing SDA methods in accordance with the stated restrictions. We can view an SDA as being comprised of two components: a control structure, which receives method invocation requests from tasks and schedules them for execution in accordance with these restrictions, and a set of SDA threads, distributed over some set of processors, which store the SDA data and execute the method code when a request arrives.
At this point, our design only supports a centralized SDA control structure, represented by a single master thread on a specified processor. All remaining SDA processors will host worker
threads, which take instructions from the master thread.

We can now describe the semantics of SDA control under our design. Each SDA method has an associated condition, specified by the programmer as a boolean guard expression. When the master receives a request for a method invocation, the condition function for that method is first evaluated. If the condition evaluates to true, the method function is invoked and the request is processed; if not, the request is enqueued. Whenever a method is executed, the state of the SDA may have changed, so the condition functions of any enqueued requests are examined to see if their conditions now evaluate to true, and any enqueued method whose condition is true is executed. This continues until no more enqueued methods can be executed, at which point the master waits for a new request. Fairness is guaranteed by examining enqueued requests in the order in which they arrived, and starvation is prevented by ensuring that an enqueued method is executed as soon as its condition function evaluates to true. Mutually-exclusive execution of SDA methods is enforced by the fact that all method invocations are executed by a single master thread, which provides the monitor-like semantics of an SDA.
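The monitor-with-guarded-conditions discipline just described can be sketched in C++, the prototype's implementation language. This is an illustrative shared-memory analogue built on a mutex and condition variable, not the prototype's message-based master; the class and member names are ours.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Illustrative shared-memory analogue of an SDA's monitor-like control:
// each method runs under mutual exclusion, and its guard (condition
// clause) must hold before the method body executes. Blocked callers are
// re-examined whenever a method changes the SDA state.
class GuardedStack {
public:
    explicit GuardedStack(std::size_t max) : max_(max) {}

    // push: guarded by "stack is not at its maximum size"
    void push(int v) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return data_.size() < max_; });
        data_.push_back(v);
        cv_.notify_all();  // state changed: blocked guards are re-evaluated
    }

    // pop: guarded by "stack is non-empty"
    int pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !data_.empty(); });
        int v = data_.back();
        data_.pop_back();
        cv_.notify_all();
        return v;
    }

private:
    std::size_t max_;
    std::vector<int> data_;
    std::mutex m_;
    std::condition_variable cv_;
};
```

A pop on an empty GuardedStack blocks until some other thread's push makes its guard true, mirroring the enqueue-and-re-evaluate behavior of the SDA master.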
Having the execution of an SDA controlled by a single master thread on one processor is a restriction of our current design. Allowing SDA methods to be executed by worker threads located on different processors would require a mechanism for ensuring mutually-exclusive method execution, such as the algorithms for distributed mutual exclusion in [21], and is part of our future research.

4 A Prototype Implementation of SDAs

To test the feasibility of our design, we have implemented a prototype of the SDA runtime support using C++ and the p4 [4] portable message passing primitives, and have successfully executed two SDA programs on a cluster of workstations: a simple stack manipulation program, and the MDO application for aircraft design introduced in Section 2.1. The prototype currently supports only a centralized SDA control structure, in which all SDA data is allocated to a single processor and method invocation is controlled by a single master thread on that processor. To describe the details of the prototype, we present a simple stack SDA, which has
several public methods, such as push, pop, and top. Each method is guarded by a condition clause to ensure that the stack does not overflow or underflow. For example, the pop and top methods may be invoked only when the stack is non-empty, while a push may be executed only so long as the stack is not at its maximum size.

Each SDA method embodies three functions:

1. the method function, which is the method code itself, as specified by the programmer,

2. the condition function, which is a boolean function that evaluates the guarded condition clause to determine when the method can be executed, and

3. the stub function, which provides the interface for the task threads to access the SDA method. The stub function makes a remote method invocation by sending a message to the SDA master and awaiting a reply. Each message consists of (1) the name of the method to be invoked, (2) the caller's address (used for the reply message), and (3) the arguments for the specified method, which the stub marshals into a buffer.

Figure 8 depicts the pseudocode for the three functions needed to implement the SDA stack method pop.
TYPE SDA-stack-pop-stub()
{
    send-message-to-SDA(POP)
    receive-message-from-SDA(result)
    return(result)
}

bool SDA-stack-pop-condition()
{
    return (stack-height > 0)
}

generic SDA-stack-pop-method()
{
    result = pop-from-private-stack()
    return(result)
}
Figure 8: Three functions needed to implement the SDA stack method pop

If a task thread sends a request to pop an empty stack, the condition clause for the pop method will evaluate to false, and the method will be enqueued until the condition clause can be satisfied. When (perhaps at a later time) another task thread sends a push request to the same stack and, assuming the stack is not full, its condition clause will evaluate to true, the push method will be executed, and an acknowledgment returned to the caller. The condition for the blocked pop will now evaluate to true, since the stack is now non-empty; the pop method will then be executed and the resulting value returned to its caller in a reply message.
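Taken together, the three functions and the enqueue-until-satisfied behavior can be sketched in C++. The toy master below dispatches requests by direct call rather than over p4 messages; Request, ToyMaster, and the guard lambdas are our illustrative names, not the prototype's API.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

// Single-process sketch of the condition/method split for SDA requests.
// The "transport" is a direct call into a toy master loop; in the real
// prototype the stub function would send a p4 message instead.
struct Request {
    std::function<bool()> condition;  // guard: may the method run now?
    std::function<void()> method;     // the method body itself
};

class ToyMaster {
public:
    // Deliver a request: run it if its guard holds, else enqueue it.
    void invoke(Request r) {
        if (r.condition()) {
            r.method();
            drain();  // state changed: retry any blocked requests
        } else {
            pending_.push_back(std::move(r));
        }
    }

private:
    // Re-evaluate blocked guards until a full pass executes nothing.
    void drain() {
        bool progress = true;
        while (progress) {
            progress = false;
            for (auto it = pending_.begin(); it != pending_.end();) {
                if (it->condition()) {
                    it->method();
                    it = pending_.erase(it);
                    progress = true;
                } else {
                    ++it;
                }
            }
        }
    }
    std::deque<Request> pending_;
};
```

Invoking a pop against an empty stack enqueues the request; a later push executes immediately and then wakes the blocked pop, just as in the scenario above.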
[Figure 9 shows the qlist array with one entry per method (PUSH, TOP, ...). Each entry holds a method pointer (e.g. qlist[0].method = int push(void*)), a condition function pointer (e.g. qlist[0].cond_ptr = int push_condition()), and a queue of qentry records, each containing the caller's address (addr) and the marshaled arguments (void* args).]

Figure 9: The method queue data structure used by the SDA master for the stack SDA
Each SDA master is a thread that waits for messages sent from the task stubs. Along with a generic message structure, the prototype incorporates the data structure shown in Figure 9 for the stack SDA, which consists of a list of method queues, one for each method in the SDA. Each queue contains pointers to the method and condition functions associated with that method, together with a list of outstanding invocation requests, each represented by the message received from the stub routine. Both the method and condition functions are called using a generic function prototype that takes a single pointer to the argument structure [void* args], which was created by the stub code as specified by the method. Each condition function returns a boolean, whereas each method function returns a value that the master sends back to the appropriate task in a reply message.
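A C++ rendering of the qlist structure of Figure 9 might look as follows. The field and type names are ours; the generic signature mirrors the single void*-argument convention described above, with a condition returning non-null to signal true.

```cpp
#include <deque>
#include <string>
#include <vector>

// Sketch of the master's per-method queue table from Figure 9. Both the
// method and condition functions share one generic signature taking a
// pointer to the marshaled argument buffer; names here are illustrative.
using generic_fn = void* (*)(void* args);

struct QEntry {                  // one outstanding invocation request
    int caller;                  // caller's address, used for the reply
    std::vector<char> args;      // arguments marshaled by the stub
};

struct MethodQueue {             // one slot of qlist, e.g. PUSH or TOP
    std::string name;
    generic_fn method = nullptr;     // pointer to the method function
    generic_fn condition = nullptr;  // guard; non-null result means true
    std::deque<QEntry> waiting;      // requests whose guard was false
};

// qlist: one queue per public method of the SDA.
using QList = std::vector<MethodQueue>;
```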
The algorithm in Figure 10 depicts the main loop of the SDA master. On receiving a message from a task stub routine, the SDA master executes the associated condition function to determine if the method can be executed. If the condition function returns false, the method is enqueued in the appropriate list. Otherwise, the associated method function is executed and the results returned to the caller through the stub routine. Since the execution of any method may change the SDA state, the condition functions associated with any enqueued methods are reevaluated, and the methods whose conditions evaluate to true are executed. This reevaluation of condition functions is repeated until no further methods can be executed, at which time the master continues waiting for further messages from stub routines.
sda-master()
{
    do forever
    {
        m = wait-for-message();
        if (m.condition(m.arg) == true)
        {
            result = m.method(m.arg);
            return-result-to-caller(result, m.caller);
            repeat
            {
                for each method queue i do
                {
                    while condition of queue i is true
                    {
                        m = get-first-method-in-queue(i)
                        result = m.method(m.arg)
                        return-result-to-caller(result, m.caller)
                    }
                }
            } until no methods are executed in a single pass
        }
        else
            enqueue-message(m)
    }
}
Figure 10: Pseudocode for the main loop of an SDA master

4.1 Preliminary Results

In order to quantify the overhead of our method invocation design, we performed the following experiment:
1. We first measured the round-trip message delay from the calling thread to a master thread and back to the calling stub, using p4. Since in all cases a message must be passed over the network, and since our method invocation algorithm must do at least this much work, this represents a lower bound for our mechanism and serves as the basis for comparing our other results.

2. Next, we encoded a version of our SDA stack using a simple remote procedure call mechanism, without checking for underflow/overflow conditions. This performs the work of executing the SDA stack routines, but without the more complicated method queue structures of our design.
3. Next, we timed a sequence of pushes and pops to the SDA stack, where the stubs and master are designed as in our prototype, but the condition functions always evaluate to true, so we never enqueue a method request. This measures the basic overhead of our method invocation mechanism, including evaluating the condition functions and manipulating the marshaled arguments, but without the method queue manipulation.

4. Finally, we timed the entire prototype, in which the conditions associated with each method are evaluated and arriving requests may be enqueued. To highlight the cost of the method queues, we simulate the case where each arriving request must first be enqueued: its condition function initially evaluates to false, forcing the master to enqueue the request, re-evaluate its condition after the next method executes, and then retrieve and execute it from the queue. This measures the time for the entire process, including any queueing delays.

All tests were performed on a set of Sun Sparcstation 10 workstations under low network traffic, using p4 as the message passing package. Each test consisted of 10,000 trials of alternating pushes and pops, and the times were averaged to achieve confidence intervals of 95%. The measured round-trip times for each test, with 95% confidence intervals, are shown in the accompanying figure.

The tests yield the following empirical results, where the overhead of test #i with respect to test #j is computed as (t_test#i - t_test#j)/t_test#j. The basic RPC mechanism adds only 1.7% to the basic round-trip message time (test #2 vs. test #1). Our basic method invocation mechanism adds only 3.4% to the round-trip message time (test #3 vs. test #1), which is 1.7% overhead relative to the RPC mechanism (test #3 vs. test #2). The complete mechanism, with its method queue manipulation, adds only 5.8% to the round-trip message time (test #4 vs. test #1), which is 4.0% relative to the RPC mechanism (test #4 vs. test #2) and 2.3% relative to the basic invocation mechanism (test #4 vs. test #3).

We acknowledge that some of these overheads may be obscured by the large message passing latency of the p4 package on a workstation network, and that a leaner communication mechanism, or a platform with lower message latency such as the Paragon, to which we are currently porting the prototype, may yield higher relative overheads. We plan to report results on these and other platforms in the final version of this paper.
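The relative overheads quoted above all follow a single formula, which can be expressed directly; the timings in the usage check below are hypothetical round-trip times chosen only to exercise the formula, not our measured values.

```cpp
#include <cassert>
#include <cmath>

// Relative overhead as used in Section 4.1:
//   overhead = (t_i - t_base) / t_base
// e.g. a mechanism taking 103.4 time units against a 100-unit baseline
// carries a 3.4% overhead.
double overhead(double t_i, double t_base) {
    return (t_i - t_base) / t_base;
}
```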
[Figure: Time for each test, with 95% confidence intervals. Test 1: p4 round-trip message; Test 2: simple RPC using p4; Test 3: SDA without method queue; Test 4: SDA with method queue.]

5 Conclusions and Future Research

We have introduced a set of primitives for defining and controlling parallel tasks and shared
data abstractions (SDAs), integrating task and data parallelism by extending the HPF system and providing shared data structures. In particular, we describe the design details for the two segments of the runtime support necessary for a system capable of executing such codes: distributed data structures, and method invocation using monitor and guarded condition semantics.

To test the feasibility of this design, we have implemented a prototype of the runtime support, with centralized SDA control, as a C++ library employing p4 as the communication system. Using this prototype, we have implemented two SDA codes: a simple stack program, which performs randomized pushes and pops, and a multidisciplinary optimization (MDO) application for aircraft design. Both codes executed as expected on a cluster of workstations. We focus on the overheads of method invocation in our prototype implementation: we have measured the overhead of our method invocation design and have found that it adds little overhead to a stripped-down RPC version of the same test, which we feel is a realistic lower bound for remote method invocation. We plan to expand our results for the final version of the paper.
We are currently in the process of incorporating the Chant runtime system into our prototype, which will give us the capability to exercise our distributed SDA data structure designs, as well as full support of our parallel tasks and portability to a large number of platforms, including the Paragon and KSR-1. From the experiences and results of our prototype, we are making modifications to the SDA syntax and semantics, and are working on a source-to-source translator that will take HPF programs augmented with our SDA syntax and produce code that will execute on distributed memory multiprocessors and workstation clusters. Finally, we are working with several applications groups to develop realistic MDO codes that will provide the true test of our designs.
References

[1] Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and Henry M. Levy. Scheduler activations: Effective kernel support for the user-level management of parallelism. In ACM Symposium on Operating Systems Principles, pages 95-109, 1991.

[2] S. Benkner, B. M. Chapman, and H. P. Zima. Vienna Fortran 90. In Scalable High Performance Computing Conference, pages 51-59, Williamsburg, VA, April 1992.

[3] Brian N. Bershad, Edward D. Lazowska, Henry M. Levy, and David B. Wagner. An open environment for building parallel programming systems. Technical Report 88-01-03, Department of Computer Science, University of Washington, January 1988.

[4] Ralph Butler and Ewing Lusk. User's guide to the p4 parallel programming system. Technical Report ANL-92/17, Argonne National Laboratory, October 1992.

[5] Barbara Chapman, Piyush Mehrotra, and Hans Zima. Programming in Vienna Fortran. Scientific Programming, 1(1):31-50, 1992.

[6] Barbara M. Chapman, Piyush Mehrotra, John Van Rosendale, and Hans P. Zima. A software architecture for multidisciplinary applications: Integrating task and data parallelism. ICASE Report 94-18, Institute for Computer Applications in Science and Engineering, Hampton, VA, March 1994.

[7] E. W. Dijkstra. Guarded commands, nondeterminacy and formal derivation of programs. Communications of the ACM, 18(8):453-457, 1975.

[8] Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual. Addison-Wesley, 1990. ISBN 0-201-51459-1.

[9] Edward W. Felten and Dylan McNamee. Improving the performance of message-passing applications by multithreading. In Proceedings of the Scalable High Performance Computing Conference, pages 84-89, Williamsburg, VA, April 1992.

[10] Message Passing Interface Forum. Document for a Standard Message Passing Interface, draft edition, November 1993.

[11] I. Foster, M. Xu, B. Avalani, and A. Choudhary. A compilation system that integrates High Performance Fortran and Fortran M. In SHPCC, May 1994.

[12] I. T. Foster and K. M. Chandy. Fortran M: A language for modular parallel programming. Technical Report MCS-P327-0992 Revision 1, Mathematics and Computer Science Division, Argonne National Laboratory, June 1993.

[13] Ian Foster, Carl Kesselman, Robert Olson, and Steven Tuecke. Nexus: An interoperability layer for parallel and distributed computer systems. Technical Report Version 1.3, Argonne National Labs, December 1993.

[14] Dirk Grunwald. A users guide to AWESIME: An object oriented parallel programming and simulation system. Technical Report CU-CS-552-91, Department of Computer Science, University of Colorado at Boulder, November 1991.

[15] Matthew Haines and Wim Böhm. On the design of distributed memory Sisal. Journal of Programming Languages, 1:209-240, 1993.

[16] Matthew Haines, David Cronk, and Piyush Mehrotra. On the design of Chant: A talking threads package. ICASE Report 94-25, Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, Hampton, VA 23681, April 1994.

[17] High Performance Fortran Forum. High Performance Fortran Language Specification, version 1.0 edition, May 1993.

[18] IEEE. Threads Extension for Portable Operating Systems (Draft 7), February 1992.

[19] David Keppel. Tools and techniques for building fast portable threads packages. Technical Report UWCSE 93-05-06, University of Washington, May 1993.

[20] Jenq Kuen Lee and Dennis Gannon. Object oriented parallel programming experiments and results. In Proceedings of Supercomputing 91, pages 273-282, Albuquerque, NM, November 1991.

[21] Mamoru Maekawa. A √N algorithm for mutual exclusion in decentralized systems. ACM Transactions on Computer Systems, 3(2):145-159, May 1985.

[22] Frank Mueller. A library implementation of POSIX threads under UNIX. In Winter USENIX, pages 29-41, San Diego, CA, January 1993.

[23] Bodhisattwa Mukherjee, Greg Eisenhauer, and Kaushik Ghosh. A machine independent interface for lightweight threads. Technical Report GIT-CC-93/53, College of Computing, Georgia Institute of Technology, Atlanta, Georgia, 1993.

[24] nCUBE, Beaverton, OR. nCUBE/2 Systems Technical Overview, 1990.

[25] Carl Schmidtmann, Michael Tao, and Steven Watt. Design and implementation of a multithreaded Xlib. In Winter USENIX, pages 193-203, San Diego, CA, January 1993.

[26] H. Schwetman. CSIM Reference Manual (Revision 9). Microelectronics and Computer Technology Corporation, 9430 Research Blvd., Austin, TX, 1986.

[27] B. Seevers, M. J. Quinn, and P. J. Hatcher. A parallel programming environment supporting multiple data-parallel modules. In Workshop on Languages, Compilers, and Run-Time Environments for Distributed Memory Machines, October 1992.

[28] J. Subhlok and T. Gross. Task parallel programming in Fx. Technical Report CMU-CS-94-112, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, 1994.

[29] J. Subhlok, J. Stichnoth, D. O'Hallaron, and T. Gross. Exploiting task and data parallelism on a multicomputer. In Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993.

[30] Sun Microsystems, Inc. Lightweight Process Library, Sun release 4.1 edition, January 1990.

[31] Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, and Klaus Erik Schauser. Active messages: A mechanism for integrated communication and computation. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 256-266, May 1992.

[32] Mark Weiser, Alan Demers, and Carl Hauser. The portable common runtime approach to interoperability. In ACM Symposium on Operating Systems Principles, pages 114-122, December 1989.