A Specification for a Highly. Parallel Computer. System. RND-91-015. David. Bailey,. John Barton,. Russell. Carter,. Thomas. Lasinski,. Horst. Simon. Numerical.
A
Specification
David
for
a Highly
Bailey, Thomas
Parallel
Computer
System
John Barton, Russell Carter, Lasinski, Horst Simon 1
Report RND-91-015,
December
1991
NAS Systems Division NASA Ames Research Center Mail Stop 258-6 Moffett
1 NASA
Ames
Research
Field,
Center,
CA 94035-1000
Moffett
Field,
CA 94035-1000
A
David
Specification
Bailey,
John
for a Highly Parallel RND-91-015
Barton, Numerical NASA Moffett
Russell
Carter,
Aerodynamic Ames Research Field,
CA
Computer
Thomas
Lasinski,
System
Horst
Simon
Simulation Center
94035,
USA
ABSTRACT A specification specification computer
for a highly parallel is designed to facilitate
system
system specification level supercomputer performance performance; support, and
through
the
use
computer system is presented. The acquisition of the highly parallel of a competitive
procurement.
The
is designed to cover all aspects of a production system, including floating point architecture
issues; main memory and mass storage capacity operating system and programming languages network hardware and software interface and
performance requirements. The evaluated primarily on the basis Parallel Benchmarks.
floating point of performance
Page 1
and
and software
computational rate is achieved on the NAS
INTRODUCTION The purpose
of this
specification
is to define
the
requirements
necessary to acquire the NAS HPCCP Testbed 1 (NHT-1) computer system, which will be installed in the Numerical Aerodynamic Simulation (NAS) system at NASA Ames Research Center, Moffett Field,
California.
requirements
It is mandatory defined
that
in Sections
offerors
2 through
offeror cannot meet these requirements, the information describing what can be provided the desired feature would be available.
1.1
all of the In the
event
an
offeror must supply and projections of when
Background The NAS
Program
is a large
in computational
centers
computational industry.
world
to a wide
NASA, The
other
program
capable
of performing
highly
parallel
floating 2000.
The
complex
computer
point
installation
The NAS Cray
Y-MP
provides of local with
scientific
remote
users
academia
and
research
and Communications of computer systems
computations
capable
on a sustained of the NHT-1
at a sustained
of achieving CFD
system will
premier
scientific
providing
Computing development
state-of-the-art
of the
and
agencies,
tasked
computation rate of 3-5 Gigaflops attaining this objective.
Description
the
is one
generation supercomputers. The is to make available to its users system
operations
and
government
for the High Performance (HPCCP) leading to the than current NAS Program
to advance facility
range
has been
support Program
rate greater plan of the
effort This
in the
services
representing
scale
aerodynamics.
supercomputer
1.2
meet
7 below.
one
calculation, capable
long term an advanced, trillion by the
year
of a sustained
be an important
milestone
in
consists
of a
of Environment Processing
System
system,
a Cray-2
Network
(NPSN)
system,
a Connection
currently
Machine-2
system,
an
Intel iPSC/860 system, as well as numerous other auxiliary systems that provide for interactive usage, networking and mass storage. Through its long-haul communication system, scientists across the United States access
1.3
the
Definitions In the capable
NPSN
via high
speed
data
for Hardware/Software
following,
the
of executing
multiplication
term both
instructions.
links.
Specifications
"processor" floating The
is defined point
"local
Page 2
addition memory"
as a hardware and
floating
of a processor
unit point refers
to less
randomly accessible memory that can be accessed by that processor than one microsecond. The term "main memory" refers to the
combined
local
memory
of all processors.
shared by all processors than one microsecond. randomly
accessible
processor
within
hardware
unit
memory,
which
the
storage
of one
is logically The
computation.
term
offeror's
term
proposed
response
system
versa. nodes
"service
must
shall
nodes. nodes
nodes, be met
memory
and
processors
unit
on the nodes" refers
one
is defined
plus
network
to high-speed
nodes"
by at least
node"
refers
their
local
that
connects
to those
floating to those
point
processing
to system operations, including compilation, with external computers over a network. specifically
state
of computational
which nodes
part and
of the
which
part
These two subsystems need not be mutually may also be computational nodes, and vice
However, the requirements must be met by the subsystem
computational nodes
or more
devoted
consists
consists of service exclusive; system
any
can be accessed
A "processing
"computational
primarily
The
that
a single
nodes primarily devoted linking and communication The
media
milliseconds.
consisting
nodes
includes
that can be accessed by each processor in less The term "mass storage" refers to non-volatile
forty
processors.
processing
This
in
the
listed below for computational identified by the offeror as
requirements
by the subsystem
Page 3
identified
listed
below
as service
for service nodes.
as a
2
NHT-1
SPECIFICATION
The offeror
is requested
corresponding approximately will
be denoted
a "Half-Size
approximately will be denoted not explicitly Testbed The
proposals
targeted
implicitly
apply
shall
computation
the
to both,
provide rate
The second
dollars and Testbed."
to either
generically
a testbed
on certain per
floating
point
operations
for a Full-Size
Half-Size
memory Size
and
Testbed
Testbed
will
a minimum will
have
second
have
denoted
the
with
"NHT-I."
benchmarks
of 1.5 billion
of 16 billion
of 4 billion
bytes
of mass
of 8 billion
bytes
storage.
bytes
Details A.
2.1.1
The
Half-Size
Testbed
must
include
at least
64 computational
The
Full-Size
Testbed
must
include
at least
128
of main The Full-
of main
General
2.1.3
or 3 billion
Testbeds.
2.1
2.1.2
floating
Testbed,
and a minimum of 32 billion bytes of mass storage. performance requirements are given in Appendix
Hardware
is
Full-Size
a sustained
for a Half-Size
a minimum
a minimum
level
or the
CFD-based
operations
funding
Testbed
system
point
level is at this level
a system proposed at this level In the following, specifications
Half-Size
floating
The
for Testbeds
The first funding a system proposed
Testbed."
$10 million a "Full-Size
contractor
point
to prepare
to two funding levels. $5 million dollars and
memory of the
Specifications
nodes.
computational
nodes.
Each computational node must be capable of performing floating-point arithmetic on numbers in the 64 bit IEEE-754 "double" format, as specified
in [8.16].
On the
computational
multiplication least 52 bits, The hardware In the
event
be capable the result software
nodes,
the
results
of 64 bit floating point as specified in [8.16]. shall
be capable
of an underflow
of addition,
numbers
of reporting
must
a processor
on a computational
interrupt.
Page 4
and to at
all detected
processor
node,
of either flagging an error, or at the user's to zero and continuing without intervention or causing
subtraction, be accurate
the
node
errors. shall
discretion, setting by system
2.2
Main
2.2.1
The
Memory
size
Specifications
of main
memory
for the
computational
Testbed shall be at least 4 billion bytes. the computational nodes of the Full-Size billion 2.2.2
The
offeror
shall
the Full-Size
provide
Testbed
An individual
the
bit error
must
be capable
any double
option
to at least
in data
to increase
Testbed to at least to increase the size
computational
single
16 billion node
fetched
of detecting
processor network data communicated
must
from
any double
2.3
Mass
2.3.1
The
Storage
the
Half-Size
be capable bit error.
The
any
memory
and
computational
of correcting any single bit error and must be capable of detecting
the
capability of reporting communication errors,
Testbed
shall
Full-Size
have
Testbed
at least
shall
have
in
all detected both
16 billion
bytes
at least
32 billion
2.3.3
There
be a reliable
system.
Transfer or exceed
main
of correcting
to its local
The mass storage system shall have the capability detected mass storage errors, both correctable and shall
of the
8 billion bytes. The offeror of the main memory for
2.3.2
storage
size
Specifications
storage. The mass storage.
The
for 8
bit error.
The memory system shall have main memory and interprocessor correctable and uncorrectable.
Data
Half-Size
bytes.
or stored
must be capable between nodes
2.2.4
2.4
of the
of main memory shall be at least
bytes.
memory for the Half-Size shall provide the option
2.2.3
nodes
The size Testbed
Backup
backup
mechanism
across
an external
rate from the mass storage 8 megabytes per second.
bytes
of reporting uncorrectable.
for data network
system
of mass
stored is not
to backup
of
all
in the
mass
satisfactory.
media
must
meet
Communication NHT-1
shall
include
support the simultaneous described above, at least least four (4) HiPPI/Ultra
a data
communication
capacity
sufficient
operation of the mass storage system four (4) Ethernet network interfaces, and network interfaces.
Page 5
to at
2.4.1 The NHT-1 shall provide the capability of attaching at least four HiPPI interfaces [8.15]. Each of these to an Ultra network [8.8] (HiPPI/Ultra).
interfaces These
must also HiPPI/Ultra
(4)
provide access interfaces
shall be capable of transferring at least 40 million bytes per second per pair of adapters in a sustained loopback mode over the transmission media. The source and destination files shall be resident on the mass storage
system
performed requirement
2.4.2
The
in Section
2.3.
shall
provide
interfaces.
transferring sustained
at least loopback
These
the
capability
Ethernet
this
requirement
test shall
The details A.5.3.
for testing
of attaching
per second transmission
6
shall
at least be capable
be this
four
(4)
of
per pair of adapters in a media. The source
on the service node file system. pairs of adapters. The details for
are in Appendix
Page
loopback
interfaces
0.250 megabytes mode over the
and destination files shall be resident This shall be performed between two testing
This
between one pair of adapters. are in Appendix A, Section
NHT-1
Ethernet
required
A, Section
A.5.3.
3
NHT-1
SOFTWARE
3.1
System
Software
3.1.1
Source
Requirements
Code
At installation, tapes containing Vendor at any
will time.
Operating The
the contractor shall provide, if requested, magnetic all of the source codes for the current system software.
also
allow
access
System
and
Utilities
operating
conformance
system with
reference Validation
interface
UNIX
terminate,
The operating errors
suspend,
detected
system
3.1.2.4.1
UNIX System on the service resume
nodes
shall
request
be in
4 as defined
in
to lock
adjust
nodes
including
interface, This
gethostid,
gethostname,
as described
includes
the
log
single
in Sections calls
getpeername,
and
bit and
nodes, errors.
a number [8.2]) shall
system
priority
job into
shall
on the computational errors, and disk
required.
the
a computational
on the service
The socket
as
a specific
report
routines
described
double
required.
Page
7
bit
of Berkeley UNIX also be included as
2, 3, and 4 of [8.2] is
accept,
bind,
be implemented
in Section
all
interprocessor
getsockname,
shall
to
of a
connect, getsockopt,
recvfrom, select, send, sendto, sendmsg, socket. In addition, modifications to read
write to allow socket operations described in Section 2 of [8.2]. The library
V system commands, nodes to allow a user
and
On nodes that provide network access, 4.3 commands (as defined in reference follows:
recv, recvmsg, shutdown, and
3.1.2.4.2
service
V, Release
by the hardware,
main memory errors network communication 3.1.2.4.
on Government
in [8.1].
computational job, and node or set of nodes. 3.1.2.3
source
on the
System
In addition to the standard facilities shall be available initiate,
to on-line
[8.1]. This operating system shall pass the System V Suite (SVVS) published by AT&T Bell Laboratories,
specified 3.1.2.2
SPECIFICATIONS
3n of reference
and
as
[8.2]
are
3.1.2.4.3
The "special
files"
described
ethernet
Ultra
support,
the 3.1.2.4.4
arp,
inet,
ip, pty,
and
in Section
4 of [8.2.]
as well
as the
tcp manual
pages
shall
that
interfaces shall
The programs hostid, hostname, optimize, telnet, and telnetd, as described in Sections [8.2],
3.1.2.5
and
pertain
to
described
be
in
supported.
tip, ftpd, sendmail, 1 and 8 of reference
be supported.
The socket interface shall support the Internet protocols IP, ICMP, TCP, FTP, SMTP, UDP, and TELNET, as described in reference [8.3] for all nodes networking
that
provide
network
functionality
access.
set forth
In addition
in references
to this,
[8.9]
and
all
[8.10],
shall
be provided. 3.1.2.6
3.1.2.7
3.1.2.8
Subnetting:
The
access
comply
shall
NFS:
The
RPC,
with
XDR,
NFS
in references
Xl1:
a client
Either
The
implementation
operating
The
system
system
users shall
must
shall
have
allow
the
shall
run
have
operating
system.
computational
be provided shall
support
the
allows node,
on
more
than
it must
under
facility
the
It
job sizes
system. to prevent
one
to share
user
twenty
nodes.
different
to another
Page 8
which
at least
assigned
include
operating
computational between
the operating
nodes
This
to allow
the
of nodes
an efficient
of jobs
is run.
facility
jobs
be
in
all computational
to that
Test
restarting
computational
system
shall
by
[8.13].
emulation
memory.
be identical
allocation
without
system
and
SUN
nodes
as specified
job to utilize
an efficient
to simultaneously also
from
shall
Xll
nodes
of main
Performance
accessing
If the
a single
80 percent
Processor
be changed The
allow
at least
The system
[8.12],
by the vendor,
[8.6].
service
of X-windows
on the service
configuration
Multiple
code
on the
with
network
in reference
server
[8.11],
of NeWS nodes
provide
System (NQS) to allow submissions nodes as specified in reference [8.5].
shall
and
and
be provided
implementation
on the service [8.14].
that
as described
client
as specified
system
3.1.2.13
and
the vendor,
nodes
3.1.2.12
RFC 950,
shall
Network Queueing from other NPSN
3.1.2.11
on nodes
File System
provided reference
3.1.2.10
software
Networked
or a client
3.1.2.9.
system
an efficient
a user user
from
or the
a single facility
to prevent
to
one user from accessingthe data of another user or the operating system. 3.1.2.14 The maximum file size allowed for a job on the computational nodes must be at least twice the size of main memory. This operating system configuration must be identical to that under which
the
3.2
Programming
3.2.1
Compilers
for the
be supplied
run the
on the service computational offeror
Fortran-77
language,
on the service
shall
nodes, nodes. certify
that
to be part
Compilers
for
[8.7],
be supplied
in one.
the
The
ANSI draft the current include ANSI Any
draft
document
necessary
for the
entire
that
and
one
standard
using
nodes constructs
percent
of main the
the
to
Fortran-77 in this
memory.
compile
compile may
compilers
programs be
combined
conform
to the
language, as specified The C compilers shall as described
language code
in
in the
in reference
is considered
[8.7].
in this
compiler. features,
shall
support
or extensions.
to utilize
computational
shall
library,
C source
in reference
shall
compilers
these
C subroutine
to analyze
to these
or C program
programs
programs to run on be combined in one.
One
These
for the C programming used
Fortran between
certify
of the
computational
compile
as defined
nodes.
nodes,
nodes.
to be part
In addition the
language,
service
shall
standard
shall
[8.4],
Software Testing Service. Any Fortran source code is considered
on the service
on the
preprocessor
in reference
pass
standard for the C programming draft standard in reference [8.7].
support
is run.
of the compiler.
computational offeror
One
compilers
the C programming
to run on
as defined
nodes.
these
document
to run
Test
and one shall compile These compilers may
suite of the Federal used to analyze
shall
Performance
Support
validation preprocessor
programs
3.2.3
Processor
Software
shall
The
3.2.2
Multiple
the
Fortran
parallel They
all computational
Additionally, nodes
and
high-speed the mass
and
C compilers
computation, must
allow
nodes
a single
and
at least
asynchronous storage
for
if
system
80
I/O must
be
supported. 3.2.4
The maximum file size supported by the Fortran and C compilers for a single job on the computational nodes shall be at least twice the size of main
3.2.5
memory.
Codes written computational
in assembly nodes shall
language or other languages for the be able to access Fortran and C subroutines,
Page 9
and vice versa. 3.2.6
and
vice
The
vendor
that will nodes.
3.2.7
Fortran
programs
shall work
The same
supply
a symbolic
on Fortran
loader
shall
and
be used
be able
shall
include
debugger,
C codes
to call C subroutines,
with
running
for all languages
running on the service nodes. all produce AT&T ELF object and
shall
versa.
a dbx-like
on the
interface,
computational
designed
for programs
The compilers for these languages shall modules as defined in reference [8.19],
all ELF debugging
information
as a compile
time
option.
3.3
NHT-1
Support
3.3.1
Handling
Requirements
of System-Related
The contractor
a clearly
defined
system
problems
mechanism problems
shall reported
be provided for prompt by the Government.
these
and
mechanism
shall
the
of the
severity The
(24)
hours
point
per
of contact
day.
for
A
handling of system-related This mechanism shall provide
Government
contractor's
provide
single
plan
of both
for dealing
for response
the
with
escalation
status
them.
of The
depending
on
problems. shall
reports
on
provide
access
the NHT-1
to the
Government
of hardware
system.
Support
On-Site The
to the
the
also
contractor
Personnel
twenty-four
communications
problems
monitor
Support
contractor
FORTRAN systems (exclusive On-Call
shall
analyst analyst.
consultation
3.3.2.2
provide
reporting
for regular
3.3.1.1
shall
Problems
from
provide and
These 08:00
of Government
at least
at least two
one
analysts
to 17:00
local
one
full-time
full-time will
on-site
on-site
be available
time,
Monday
software for through
Friday
holidays).
Support
The contractor shall provide a hardware customer engineer on-call twenty-four (24) hours per day. The offeror also shall provide a software
systems
analyst
on-call
during
all the
hours
software
systems
analyst
is not
on duty
at NAS.
Both
Page 10
that these
the
on-site
analysts
will
be available
for service
on-site
within
two
(2) hours
when
requested. 3.3.3
3.3.4
Training The
contractor
site
training
shall
Documentation
hundred
(200)
instructor
hours
of on-
as required.
and
provide to the Government complete the operating system, languages and shall be provided for frequently utilities, and shall be available
Government
shall
NPSN
including
users,
two
personnel,
Support
The offeror shall of the hardware, documentation commands
provide
of NAS
have
the those
right
to make
off-site.
Page
ll
copies
documentation libraries. On-line
used system to users. The of documentation
for
4
NHT-1
4.1
Floating The
PERFORMANCE
Point
Half-Size
billion
SPECIFICATIONS
Computation Testbed
shall
specified
Government.
per
second
the Government.
The
rate exceeding 3 billion on a suite of benchmarks
exceeding
on a suite Full-Size
1.5
of
Testbed
shall
64 bit floating-point specified by the
are approximations.
The
of the wall clock Details of these
times between requirements
Mass
in Appendix
Storage
from
test
shall
in Appendix network
node that
A, Section
A.5.3.
Definitions
government
Note
that
this
test
to pass
and
and
are all components
A, Section
for Availability
"Inoperative"
requires
FORTRAN requirements
the
in detail
certain
with of this
to and
4.2.2.
is described the mass
mass storage
program to be for this test are
A.5.4.
or "down"
the
"down"
if it at any
specified
when
refers
Government's
installation
reports
Specifications
"available, .... operational, .... operative" can correctly execute the Government's
execute
and
4.2.1
data
Availability
In this section, that the system
begins
sufficient
to be run simultaneously
These
in Appendix
4.3.1
at a rate
by the
tests
of transferring
in Sections
shall consist of a benchmark by the offeror. The detailed
described
System
be capable
memory
be provided
storage data transfers. benchmark test.
4.3
shall
are required
benchmark
This test supplied
Performance
subsystem
computational tests
actual
A.
Benchmark
storage
benchmark This
performance
rate
rates
The mass
4.2.2
a performance
test requirements are specified in terms initiation and completion of execution. are given
These
Performance
operations
by
achieve a performance operations per second
4.2.1
achieve
64 bit floating-point
benchmarks
4.2
Benchmark
the that
time
to a system
workload. fails
tests
Government the system
any
required
notifies is down.
Page 12
cannot
The system
to complete
at the
that
the
or "up" means workload. correctly
shall
of the
be
Government's
performance. vendor's
considered "Downtime"
point
of contact
4.3.1.1
During
the
begins return
executing the system
approved 4.3.1.2
After
by the
completion
vendor
Acceptance During
the
(beginning downtime advance
when
the
workload. The status according
notifies
Acceptance the
Test,
"uptime"
Government
contractor must to a procedure
that
begins
the
Acceptance
Test,
up to four
(4) hours
time
the system per
may
be allocated
twenty-four
each
day
Acceptance
shall
to the
(24) hour
Test
be considered
the Government's workload over a 30-day test period.
Production
day
period,
the
scheduled
system
at least
must 80%
completion
inoperative malfunction
uptime. be available
of scheduled
of the
Acceptance
Test,
if the
system
to uptime
remains
during scheduled uptime due to a hardware or software for a total of four (4) hours (whether or not consecutive)
during a twenty-four contractor shall grant
(24) hour day (beginning at midnight), a downtime credit to the Government.
downtime
credit
accumulate
each
the
hour
counted
This
Requirements
the
hours.
the
is ready
Requirements
execute averaged
regarded
when
system
status.
the
credit
system
Government.
During
After
begins
at midnight) for the purposes of system maintenance. must be scheduled at least twenty-four (24) hours in and must occur between 18:00 and 08:00 local time. The
remaining
4.3.3
"uptime"
the Government's to an operational
to operational Test
contractor
Test,
of the
officially
to return 4.3.2
Acceptance
will
will
system
be prorated as operative
At that
time,
at the rate
is inoperative
in excess
for fractional until the
hours.
it has been
previous
two
as uptime.
Page 13
of 1/20
of a day
of four The
system
up for two (2) hours
the The (5%)
(4) hours. will
not
for The be
(2) consecutive will
be retroactively
5
FACILITY PLANNING AND SITE PREPARATION The contractor shall provide a plan for facility planning and site preparation. The contractor shall be responsible for providing a site preparation equipment, including
plan
for the
such
as connections
vendor
equipment,
provided
installation from
motor
MAINTENANCE
6.1
Responsibilities
building
contractor
and
installed
condition.
shall and
The
all delivered
provide
shall
Preventive
shall
shall
reserves
preventive
Preventive
specify
The Contractor
for
shall
of preventive
provided
established,
sources,
to the
computer
room
equipment
delivered
operating services
for
manufacturer.
Equipment
in writing
right
the
in good
the
frequency
and
duration
If a mutually agreed cannot be established,
to specify
the
schedule
of
upon the
for performance
maintenance.
Maintenance
mutually
of the
for maintenance
of the
for Leased
the
for
arrange
regardless
Maintenance
Government
that
sets
the equipment
the preventive maintenance required. schedule for preventive maintenance
quality
power
generator
maintenance
keep
contractor
hardware
The Contractor
6.3
operation
of the Contractor
The
of
and
etc.
6
6.2
proper
for Purchased specify
in writing
maintenance.
by the Contractor
agreed the
performance
upon
of preventive
the
The
frequency,
quality
for identical
schedule
Government
Equipment
leased
for preventive
reserves
the
maintenance.
Page 14
right
duration,
shall
and
be comparable
equipment.
maintenance to specify
If a cannot
the
to
schedules
be
7
FEDERAL
INFORMATION
PROCESSING
STANDARDS
PUBLICATIONS (FIPS-PUBS) In accordance Regulations Standards specification FIPS
1-2
with
(FIRMR), Publications and
Federal
Information
the following (FIPS-PUBS)
are
applicable
Data
Encryption
FIPS
Recorded
to the extent
Magnetic
FIPS 60-2
I/O
Channel
FIPS 61-1
Channel
FIPS
62
Operational
FIPS
69
Federal
FIPS
79
Magnetic
specified
Interchange,
Standard
CPI (246 cpmm),
Resources
Management
Federal Information Processing shall constitute a part of this
Code for Information Subsets and Extensions
FIPS 46 50
the
Its Representations,
(DES)
Tape Group
herein:
for Information Coded
Interchange,
6250
Recording
Interface
Level
Power
Control
Specifications
Standard Tape
Interface
for Magnetic
Tape
Subsystems
FORTRAN Labels
and
File Structure
for Information
Exchange FIPS 81
Data
FIPS
Additional
86
Encryption
Standard
Controls
Standard
Code Practices
FIPS
112
Accepted
FIPS
113
Data
FIPS
130
Intelligent
FIPS
149
Government
FIPS
151-1
POSIX: Portable Environments
(DES)
for use
for Information
Authentication Peripheral Open
Modes
with
of Operation
American
National
Interchange
for Password
Usage
(DES) Interface Systems
(IPI) Interconnection
Operating System (IEEE 1003.1-1988)
Page l5
Interface
Profile
(GOSIP)
for Computer
8
REFERENCES
8.1
System
V Interface
#320.135,
American
Information 8.2
Unix
The
Internet
Protocol
New
Standard American
from
Cosmic
Athens,
8.9
Requirements
Network
Berkeley
of California,
Request
Standard
1989, Internet SRI International,
Data
on the Sun 2550 Garcia
Park,
CA
Fortran, 1430
ANSI
Broadway,
for (by
Annex,
Comment Mogul,
950,
J. and
11750.
of
"Internet
Postel, CA,
Standards
Institute,
Drive,
-
Application
1430
Specification"
Workstation, part #800-1324-03, Avenue, Mountain View, CA
Page 16
Sun 94043.
CA, 95134,
RFC
1122,
Information
Support,
Engineering Task Force. Network Menlo Park, CA, 94025. Protocol
Broadway,
Layers,
and
1985.
C, Document
San Jose,
-- Communication
J.).
August
Language
101 Daggett
Hosts
ARC
University
Los Angeles,
Programming
Hosts
Number
Engineering Task Force. Network Menlo Park, CA, 94025.
Representation
94025.8.4.
Language Institute,
Services
National
for Internet
1989, Internet SRI International,
Network
Submission
Institute,
Technologies,
1982
542-3265.
Procedures,"
for Internet
Requirements
Software.
Sciences
National
Ultra USA
"External
4.3
10018.
Group
Working
8.8
8.11
IN 46219.
Programming Language FORTRAN, National Standards Institute, 1430
(404)
Subnetting
October Center,
Customer
Manual,
Menlo
Standards
Computer
#X3.159-1989, American New York, NY, 10018.
8.10
AT&T
Indianapolis,
University
Programming
GA 30602,
Standard
American
NY,
System
Network
October Center,
1986,
I, II, III, IV, Order
Inc.,
Programmer's
Handbook,
National
York,
USC/Information 8.7
Standard
Queueing
Available
8.6
Volume
Road,
SRI International,
National
American National ANSI X3.9 - 1978,
Georgia,
Unix
Transition
- 1978, American York, NY 10018.
Network
Telegraph,
Franklin
1, November
Center,
Broadway, 8.5
and
North
System,
Volume
American
8.4
Edition,
CA 94708.
Information X3.9 New
2833
Time-Sharing
Berkeley,
Third
Telephone
Center,
Distribution,
8.3
Definition
RFC
1123,
Information
in Networking
Microsystems,
Inc.,
8.12
"Remote
Procedure
Sun Workstation, Garcia Avenue, 8.13
"ONC/NFS 3084-10, Avenue,
8.14
Call
Protocol
Protocol
Specifications
and
Rev A, of 26 August 1988. Sun Mountain View, CA 94043.
X Windows
Version
X Window System Xlib - C Language X Toolkit
11, Release
Intrinsics
Compound
Text
X Logical X Display
C Language
Encoding,
Nonrectangular
on
Inc.,
Manual",
Microsystems,
the
2550
Part
No.
Inc., 2550
11, Release 11, Release
Interface,
800-
Garcia
4 4
X Version
11, Release
Manual,
Version
4 1.0
1.1
Conventions, Version Protocol, Version 1.0
Window
Specifications available MIT X Consortium
Services
Version 2.1 Conventions Version
Font Description Manager Control
Laboratory
in Networking
4
Protocol, X Version X Interface, X Version
Bitmap Distribution Format, Inter-Client Communication
Xll
Specification"
part #800-1324-03, Sun Microsystems, Mountain View, CA 94043.
Shape
Extension,
1.3 Version
1.0
from:
for Computer
Science
545 Technology Square Cambridge, MA 02139 8.15
High
Performance
Draft Draft
proposed proposed
American 10018.
Parallel HiPPI-PH HiPPI-DF
National
Interface ANSI ANSI
Standards
(HiPPI)
Standard Standard Institute,
X3T9.3/88-023 X3T9.3/89-013 1430
Broadway,
8.16
IEEE Standard for Binary Floating Point Numbers, Standard 754-1985, IEEE, New York, 1985.
8.17
PCF
Fortran
Extensions
1990; PCF, c/o Illinois 61820. 8.18
D. Bailey,
J Barton,
Benchmarks, Research 8.19
UNIX
Kuck
Revision Center,
System
Programming
-- Draft and
Moffett V Release
Support
Associates,
T. Lasinski, 2.
Document,
and
Field,
CA
Report
Tools.
1990,
Guide:
Prentice-Hall,
Page 17
NY,
eds. July
March
18,
Champaign,
The NAS
RNR-91-002,
94035-1000,
4 Programmers
2.11,
Fox Drive,
H. Simon,
Technical
York,
ANSI/IEEE
Revision
1906
New
Parallel
NASA
2, 1991.
ANSI Inc.
C and New
Jersey.
Ames
Page 18
&P_Y_Fd_II212f_ DETAILS
A.1
INTRODUCTION
A.1.1
General
A establishes
requirements "Benchmark
for the Package"
sample
output
the
files
implementations
the
The benchmark
for single
these
tests
proposal
will
as the
are to be included is selected
be
provided
proposals "RFP
in response
Response
should
concerning
be directed
the
running
reimburse
other tests. The
offerors
expenses
benchmark
not to be used demonstrations. disseminated programs Research shall
benchmark offeror performs such
programs
and
same
Every
and accompanying Center upon notice a Contracting
final
decisions
programs in their
the
any
proposal,
as proposed.
additional
and
the
for the
will
Page
19
these
or any
benchmark
of NASA
and
are
be of the to Ames Officer
Representative of the
(COTR)
results
NASA
of the
may
require
or software of verifying
be given
requirements.
efforts,
shall be returned The Contracting
hardware purpose
be made
specified
demonstrations.
proposed
will
or data may and all copies
Technical
evaluation
The offeror
demonstration
than
programs parties,
Officer's
programs
effort
performing
documentation of non-selection.
regarding
offeror's set of
manner. Questions and Ames Research Center will
are the property
other
of
prior to shipment. The vendor is
programming
and
data
for any purpose
to demonstrate
described
in preparing
No copies of these in any form to outside
appoint
to make
incurred
time,
results
benchmark
Officer.
for machine
These
that the proposed system in the response to the RFP.
to provide answers to questions in a timely answers will be disseminated to all offerors. not
to
are to be If the
the exact
of the
to the Contracting
below. the
proposal.
then
required to demonstrate prior to shipment meets or exceeds the performance claimed questions
in writing
Test," and
benchmark tests is to be repeated at the vendor's site This set of tests is denoted the "Pre-Shipment Test."
All
The data,
to potential
are described
offeror's
award,
tests. input
package
to the RFP,
in the
for contract
performance
scaled-down
the benchmark
for Testbed
offeror
and
benchmark source code,
processor,
request
identified
by the
procedures
of the NHT-1 containing the
benchmarks)
tests
collectively
performed
specific
execution (a tape
offerors who specifically Ames Research Center.
tests,
TESTS
Information
Appendix
and
OF THE BENCHMARK
adequate No
the
specifically that
the system
notice
reimbursement
of any will
be made and
for these
the
Government
the benchmark Government.
A.1.2
Testbed
Type
Half-Size are required explicitly section,
Benchmark
Passage
must
be verified
selection
A.3
The
in the
by the
of successful
INSTRUCTIONS Passage
results
proposed
be performed
intended
for the
that
the levels
differences
the
offeror
specified
offeror's The
are
demonstrate
in Sections
RFP Response
A.4,
Test
RFP Response
as specified
PRE-SHIPMENT Test
Test
in Section
are
results
A.4
prior
that
the offeror
RFP Response.
actual
to NAS.
by the Government
requires
hardware The
The and
the
Pre-Shipment
software
Pre-Shipment
as specified
demonstrate
requirements meets or exceeds
Test
in Section
A.4
Test
configuration results
must
prior
be
to
shipment.
A.4
GENERAL TESTS
INSTRUCTIONS
FOR
RUNNING
THE
BENCHMARK
The time, location, and participants demonstrations must be mutually
in the Benchmark Test agreed to by both NASA
offeror.
The
a suitable
for use
by the
provided
offeror Ames
by the
shall
provide
Research
offeror
shall
Center include
private
participants. persons
and
the
conference The
room
participants
knowledgeable
in both
the hardware and software systems to answer questions that arise the course of the demonstration. For all the tests described in this section
that
require
system
software,
execution the
to
TEST
on the benchmark A.5.1-4, A.6.1-3 that
offeror's
on the
for delivery
verified
these
tests
of the
Government
Pre-Shipment
must
for both
performance
TEST
requires
response.
prior-to-shipment performance specified in Sections A.4, A.5, level
of
of the
proposals.
FOR THE
of the
demonstration
interest
to proposals
of Testbed,
Test
offeror's
request,
explicitly stated in the appropriate type of Testbed are identical.
of the benchmark
A.6.1-3.
may
the
best
different
RFP RESPONSE
RFP Response
to be included
apply
Where
otherwise for each
completion
A.5.1-4,
below
on the type
FOR THE
to delay
to be in the
Testbeds.
depending
of the
right,
Offerors
Tests
presented
Full-Size
noted. Unless the requirements
successful
the
and the
INSTRUCTIONS
A.5,
reserves
if it is judged
tests
and
demonstrations.
tests,
The benchmark
A.2
additional
compilation
of source and
Page 20
programs linking
other
of this
code
than
in
standard
must
be
performed, as a prelude to
the
benchmark representatives, NHT-1. The same version
on the service of the compiler
all tests. Once signal is given representative. Where
the
test,
the test is ready by the officially
performance
in the
requirement
jobs
may
be executing
it shall NASA
is stated
being performed, may occur during interruption
in the
must,
of a test
run due
after
completion
upon
to an operating
consultation
of any
request,
with
source
3.
listing
A paper
A disk
file copy
The offeror
a
clock from test. No test
is
Any rerun shall of the test.
test
to NASA
of the
actual
of the output
files
is also
1, 2, 3 and
strongly
encouraged
NASA will
at the COTR's
demonstrations,
the
representatives
the
software
including
used,
programs
for the
of items
failure,
participants,
In the case of an interruption offeror's control (such as a
of all system
listing
system
offeror
following:
demonstrated
in the
test.
4. A separate paper listing of the differences between the test and the reference output files, if any, provided 5.
of "wall
for
after
a demonstration
offeror's
test. the
of these
provide
1. A complete description version numbers. 2. A paper test.
in terms
when
outage), a rerun will be allowed. be performed from the beginning the
be started only benchmark
continuous time-of-day the completion of the
system
decide whether or not to rerun the due to circumstances totally beyond
After
of the be used
and no system tuning or other human intervention the execution of the test. In the case of an
representatives,
power option
of NASA
node subsystem and linker must
to execute, designated
time" this time is defined as the elapsed the beginning of the test execution until other
presence
the output by NASA.
files
of
4. to provide
a description
of not
more than one page length detailing with references the algorithm used, and a description of not more than one page detailing the data layout scheme used for each benchmark. Verification
of the
Benchmark
made based on a technical demonstrations.
A.5
OPERATING NETWORKING There are four A.5. The tests A.5.1),
the
SYSTEM TESTS separate are the
Functional
Test
analysis
results
by the
of the
Benchmark
VALIDATION,
DATA
Government
TRANSFER
AND
tests involved in passing Appendix System V Verification Suite (SVVS) Performance
Page
Test
21
(Section
A.5.2),
will
Test
the
A Section (Section
be
Throughput Performance Storage File Tests (Section modification configuration
System The
System
This
The
Definition
Package
Suite. must
Performance
Test
form
are
by successful
Test
of a shell
to the
run.
be run on the
failures by the
The
service
completion
is included
archive.
use the Bourne shell to unpack to set up the test, and then use
must
network
be approved
Performance
in the
Tests
[8.1]
dedicated
Compatibility
allowed
Functional
Benchmark
(SVID)
and
be verified
V Verification
Functional
Mass
(SVVS)
system
will
exceptions
all other
Suite
operating
functionality
any
which
V Interface
node
System
A.5.2
under
V Verification
service
A.5.3) and the FORTRAN test must be run without
and successfully completed. The operating system under which these tests are run must be identical
configuration
A.5.1
Test (Section A.5.4). Each
of the
must
be noted,
and
Government.
in the offeror
the archive, the run shell
nodes.
Benchmark
shall
be required
to
use the install shell script script to run the test to
completion. The install shell script asks the vendor to supply the pathname to the networking I/O library and the hostnames to be used in the ethernet HiPPI/Ultra functional tests. It also asks for the number of files to include in the each file. The script finally asks to correctly components
compile for the
throughput
test.
install
script
verifies available the
install
script
has completed,
that a Berkeley and functional,
the
A.5.3
test
must
at the
Throughput This A.5.2 Section script source
test
be run
network
above.
This
A.5.2. asks and
test
for the
addresses
uses
hostnames
in the actual throughput hardware configuration
or failure.
shall
(BSD) socket to transfer
specified.
that
provide
the
Once
be run.
This
the test
library is files using
In addition
of the echo protocol will run scripts reports success
Bourne
shell
to ftp,
be used to or failure.
network
files.
archive
the functionality
has an install
destination
test
to install I/O
access.
Test
in the
It also
success
functional
on all nodes
Performance is included
reports
Standard Distribution and that it is possible
a Government implementation verify BSD functionality. The This
as the pathname of information needed
the functional test. It then attempts functional performance test and the
The
ftp command
run test, as well for configuration
and
to use This
performance provided will
Page 22
demonstrated
run shell
as well
script
then
referenced script.
as pathnames creates
in Section by the
The
to use
a database
test
install
in
shell
for file
used
test. It is likely that the target not support this test. This test
may be modified with the hardware and
provided
required
4.2.2
will
so as to run at the
benchmark The
by the Government
in Section
vendor
the permission In addition
provided.
run
test. shell
test
in Section
at this point
When
all of the
Section
4.2.2,
the
time test.
shall
followed
by the
system
files.
"READ
"WRITE
is once
again
portion
printed
difference
first
time
this
has
between
the
the start of are then
of the
of the
including
without interval
clock
COTR.
TEST" portion
transfers must be successfully completed 120 second wall clock time interval. This in wall
by the
of this
a background I/O benchmark
TEST"
completed,
test
by the
portion
When
by creating FORTRAN
the
have
time
test
the
is printed out indicating The required file transfers
begin
transfers
this
networking
be approved
source
the same time The vendor's
4.2.2
file
must
Package,
into
as the
all of the
system of the
at approximately for each transfer.
required
time
in order to run on test described below
Benchmark
be incorporated
same
builds
completed, the current the benchmark portion started process
in the
All modifications
script
of the COTR to the network
the
out.
All
from
of these
error within is measured
printing
test.
test
a single by the
of the system
time and the final printing of the system time. A log file for each transfer is also created by the run shell script. These logs will be examined to help determine the success of the benchmark. File transfer program.
through
the
network
shall
be done
with
the
vendor's
Transfer
Number
Type
each type to pass
I
of
Source
Streams
file
tip
The required benchmark rates are given below in Table 1 for of file transfer. Each individual rate must be met or exceeded the above benchmark wallclock time limit.
Table
File
Size per Stream
(billion bytes) HiPPI/Ultra
1
4.8
Ethernet
2
0.30
Minimum Required Transfer Rate per Stream (million bytes per second) 40.0 0.25
The run shell script shall be executed the ethernet and HiPPI/Ultra network
twice. During the first execution, interfaces shall be disconnected
from
the
system
the
wire
loop
back.
in the log. then
the
networks in any
to demonstrate This
run
If it appears run
is considered
shall way,
and
shall
that generate
to succeed,
the run shell
connection
is through
vendor-dependent
or if the diagnostics
a failure.
be reconnected,
Immediately
without script
Page
file
shall
23
do not appear,
afterwards,
modifying be executed
the
diagnostics
the
the
running again.
This
system time
it shall
complete
within
the
time.
The
network
of the ftp one as output
and
ftp program
one
as input.
Simultaneously,
pair of HiPPI/Ultra to disk in all three
one
Mass
Storage
File Test
The
shall
prepare
a FORTRAN
offeror storage perform
files totaling the "READ
into
computational
data
in memory).
The
"READ
TEST"
portion
clock time or less. The complete in 60 seconds complete
in 120 seconds
A.6
MULTIPLE
A.6.1
Introduction The
memory
Processor
program
Throughput
that
shall
first
is
create
shall
then
bytes
perform
(overwriting the
"WRITE
the TEST"
is to write out at least 10 billion bytes of data, 10 billion bytes of data. For this test, the
of the
test
shall
complete
in 60 seconds
"WRITE TEST" portion wall clock time or less. wall
clock
time
Performance
of the test The entire
wall
shall test shall
or less.
PERFORMANCE
TEST
Test
Benchmarks" [8.18]. The performance this test differs depending on whether Full-Size Testbed.
Language
of the
occur.
all 10 billion
program
PROCESSOR
Multiple
shall
at least 10 billion bytes. The program shall TEST" portion of the test which is to read
node
portion of the test which overwriting the original
execution
interfaces cases.
FORTRAN
mass then
A.6.2
specified
portion of the run shell script consists of two executions program, each using a different pair of ethernet adapters, using one measured
A.5.4
successfully
is based
on the
level required the proposal
"NAS
Parallel
of a Testbed for is for a Half-Size or
Rules
The Multiple Processor Performance Test must be programmed using the Fortran-77 language, with a number of approved extensions. The complete language rules are given in [8.18], Section 1. All of these rules apply
to this
programming A.6.3
Multiple The
test,
Processor
Multiple
with
language
the exception is not
Performance
Processor
"Size"
gives
Test
Performance
defined in the NAS Parallel increased to the sizes listed headed
that coding
allowed
the
the
problems
consist
of the
in the C
here.
Details Test
will
problems
Benchmarks [8.18], with the problem sizes in Table 2 below. In Table 2, the column
values
of the key
Page
24
problem
parameters.
The
column
headed
in millions
"Memory"
of 64-bit
words,
implementations.
The
counts of floating implementations. Package
approximate
based
column
on one
headed
instructions
programs
may
on how
NHT-1
Multiple
Benchmark
Processor
Simplified Conjugate
4 3-D
Parallel
FFT PDE
5 Integer
LU Solver
7
Pentadiagonal
8
Block
by the
compiled
and
vendor
Pre-Shipment
in
Test
Memory
FP Ops
3.2 x 1010 6.0 x 109
512 x 2562
500
3.2 x 1010
225
128
1.0 x 1010
1023
26
4.5 x 1011
1023
22
4.9 x 1011
1023
22
7.5 x 1011
to review and verification tests are performed, the
for all of the using
listed
8.9 x 10 l°
Solver
linked,
sizes
one
1
The results of this test are subject COTR. Before any of these kernel produced
in these
480 50
Solver
Tridiagonal
computer Benchmark
5122x 256 7.4 x 106
Sort
6
processor in the
problem
23o
Solver
approximate
2
Size
Multigrid Gradient
computer
parameters
to the
requirements
gives
on one included
Performance
Description
1 Embarrassingly 2 3
processor
the
be increased
Table
memory
"FP Ops"
point operations, based The sample code files
contain
processor Table 2.
gives
benchmark
a single
version
by the source code
programs
of the
must
Fortran
be
compiler
and linker for the computational nodes. Compiling and linking the entire set of Multiple Processor Performance Test source files must be completed within 30 minutes wall clock time. The individual computational
tests
executables operating execution
files
described
resulting
below
from
system must remain of all the benchmark
The performance benchmarks will
must
this
be performed
compile-link
up during codes.
with
operation.
compilation,
Pi of the proposed system be computed as follows:
on each
the The
linking,
of the eight
and
kernel
Fi Pi =_ where
Fi is the
number
i= 1,2,3,...,8
of floating
point
kernel listed in Table 2, and seconds of the i th kernel.
ti is the
The
requirement
minimum
Performance
performance Test
is that
for each
of the
Page 25
operations
measured
for eight
(FP Ops)
elapsed
the
Multiple
kernels
for the
runtime
the
in
Processor performance
ith
pi must exceed 1.5 billion FP Ops per second for the Half-Size or 3.0 billion FP Ops per second for the Full-Sized Testbed.
Testbed,
Proposals that meet this minimum performance requirement will then be evaluated by the rank of their overall multiple processor figure of merit P. P is computed as follows: t2 + T-3 t3 +_4 t4 +____s)+51.t6(_66 +_7t7 +_88 ta ) P=25 2_2,,tl (_- + T-2 where h, t2, ..., ta are defined the following manner.
above,
and T1, T2, ..., Ts are computed
in
The proposals will be divided at the time of evaluation into two groups: one for proposals for Half-Size Testbeds and the other for proposals for Full-Size Testbeds. For each benchmark kernel and group, the corresponding Ti is defined to be the minimum elapsed runtime measured in seconds for that kernel in any of the verified vendor proposals in the group. Thus proposals for Half-Size Testbeds will have an overall multiple processor figure of merit P computed on the basis of the group of Half-Size Testbed proposals, and likewise the proposals for Full-Size Testbeds will have P computed on the basis of the group of Full-Size Testbed proposals.
Page 26