A High-Performance Distributed Computing Environment for the NYNET ATM Wide Area Network

Salim Hariri, Geoffrey C. Fox and JongBaek Park
Northeast Parallel Architectures Center, Syracuse University, Syracuse, NY 13244

Stuart Elby and Joel Morrow
NYNEX Science and Technology, White Plains, NY 10604
Abstract

Current advances in processor technology, networking technology, and software tools have made high-performance distributed computing (HPDC) over local as well as wide area networks promising. However, these advances cannot be exploited fully unless some networking and communication issues are resolved. Emerging ATM technology may resolve many of the existing network limitations. The current implementation of communication software represents another problem that demands further research. In this paper, we present an approach to developing an HPDC environment that will harness the computing potential of existing, cost-effective heterogeneous computers and ATM networks. This environment supports two modes of operation: Normal Speed Mode, where standard transport protocols are used, and High-Speed Mode, which utilizes a light-weight communications protocol. In this environment, we develop a message passing interface that efficiently implements the commands/primitives supported by most software tools for parallel and distributed computing (e.g., PVM, EXPRESS, ISIS, etc.). The message passing interface utilizes the services provided by a high-speed communications protocol (HCP) that itself runs over the ATM Adaptation Layer (AAL). We also present the architecture of the NYNET wide area ATM network that will serve as the experimental platform for development of the HPDC software and communications protocols. Applications that will be developed on top of the HPDC environment over the NYNET testbed are also discussed.
1 Introduction

Current advances in processor technology, networking technology, and software tools have made high-performance distributed computing (HPDC) over local as well as wide area networks promising. A cluster of 1024 DEC Alpha workstations would provide a combined computing power of 150 Gigaflops, while the same size configuration (1024 nodes + control computer + I/O computers) of the CM5 from Thinking Machines has a peak rating of only 128 Gigaflops. Also, a recent report from the IBM European Center for Scientific and Engineering Computing [1] stated that a cluster of 8 RS/6000 Model 560 workstations connected with the IBM serial optical channel converter achieved a performance of 0.52 Gigaflops on the Dongarra benchmarks for massively parallel systems. This result outperforms a number of existing parallel computers such as a 24-node Intel iPSC/860, a 16-node Intel Delta, a 256-node nCube2, or a 24-node Alliant CAMPUS/800.

The current trend in computer networks is away from shared channel architectures with aggregate capacities of 10-100 Mbps towards switched channel architectures with aggregate capacities reaching multi-Gbps. Several software tools have been introduced as well to help distributed programming by providing message passing primitives, e.g., PVM, Express, ISIS, and so on. However, these advances cannot be exploited fully to provide a wide area heterogeneous HPDC environment unless some hardware and software issues are resolved. Wide area networks are typically limited to tens of megabits per second per connection, one to two orders of magnitude less than the aggregate bandwidth of emerging computer LANs. This bottleneck precludes the seamless interconnection of computers across the wide area. Furthermore, most existing protocols were designed in the 1970s, when the available communication bandwidths were in the kbits/sec range and the existing computing nodes had limited computing power. Since these protocols regarded the communication bandwidth as a scarce resource and the communication medium as inherently unreliable, they were designed to be very robust and to handle complex failure scenarios, which resulted in complicated and inefficient protocols.

In this paper, we present an approach to developing a high-performance distributed computing environment that will harness the computing potential of existing, cost-effective heterogeneous computers and ATM (Asynchronous Transfer Mode) based wide area networks. Emerging ATM technology, with large bandwidth and low-latency switching, may resolve many of the existing network limitations by providing seamless connectivity between computers across vast distances. We also present a high-speed communication protocol that will efficiently implement the HPDC environment. The HPDC environment to be implemented on NYNET uses high speed ATM
switches interconnected by fiber-optic SONET (Synchronous Optical Network) links to integrate the parallel computers and supercomputers available at NYNET sites into one virtual computing environment, as shown in Figure 1. NYNET will also provide its users with direct access to the national supercomputer center network.
ATM platform from the bottom up.
2.1 NYNET Network Architecture

A generalized ATM network topology for a heterogeneous computing environment is shown in Figure 3. The specific layout of the NYNET testbed is shown in Figure 1. The backbone of the network is comprised of broadband ATM switches interconnected via SONET trunks. In the NYNET testbed, these trunks will range from OC-3 (155 Mbps) to OC-48 (2.488 Gbps) data rates. To provide unified network management and control capabilities across the public network, the switches are interfaced with the ATM Forum specified Public NNI (Node-to-Node Interface), and an OSI CMIP-based network management system is operated. The NYNET participants are provided access to the ATM backbone via two OC-3c SONET links per site (the Syracuse Museum of Science and Technology will attach to the network via a single OC-3c link). The purpose of providing two links per site is as follows:
- To provide each site with larger than OC-3c bandwidth prior to the availability of standards-compliant OC-12c public network equipment;
- To permit a user to run experiments in which traffic runs through the network and back to his/her own site, which eases the development of protocols by permitting experimenters to observe and control both ends; and
- To test techniques of inverse multiplexing to achieve throughputs greater than any single link can provide.
The interface between the participant sites and the public network will conform to the ATM Forum specified UNI (User-to-Network Interface), and an SNMP-based customer network management system will be developed. This management system will allow the users to monitor and collect network configuration and traffic statistics, as well as providing limited end-to-end management and control capabilities. Interconnecting the New York upstate participants to the downstate participants (different LATAs) requires NYNET to interface with a third party's network. This is accomplished by means of the ATM Forum specified B-ICI (Broadband Inter-Carrier Interface). The same interface will also be used to connect NYNET to the national computing environment network.

The NYNET public network provides permanent virtual circuit (PVC) cell relay service (CRS). In the NYNET PVC CRS, virtual circuits are provisioned based upon the peak user burst rate, which may be specified to be as large as the line access rate (OC-3 in this case). However, the gain due to statistical multiplexing and the very bursty nature of data traffic permits many PVCs, each with the line rate bandwidth, to share a single link with minimal cell loss. Studies based upon Poisson traffic statistics indicate that an ATM statistical multiplexing gain of 2-4 can be achieved with cell loss rates as low as 10^-9. Each NYNET user link will be assigned enough PVCs to provide full mesh logical connectivity between all sites. Additionally, each traffic mode supported on a path may be provided its own PVC; in this way, HSM traffic would traverse PVCs allotted the full 155 Mbps, while NSM traffic would traverse PVCs allotted no more than 45 Mbps.

At the NYNET participant sites, three general local topologies are considered:

1. Direct, dedicated CRS access. A port on the public ATM switch is dedicated to a single device such as a supercomputer, MPP machine, or a video/image server. Currently, this type of access is not available with commercial products, but it is expected that direct OC-3c SONET and HiPPI/OC-12c interfaces to high-performance computers and servers will be available in 1994.

2. Direct CRS access via local ATM networks. An ATM LAN switch is used to provide direct 100-150 Mbps ATM access to workstations via TAXI (now) and OC-3c (1994). This architecture allows legacy networks to gain access via routers using the LAN bridge interfaces (Ethernet, FDDI) available on many LAN ATM products.

3. Indirect CRS access through routers. Routers equipped with DS-3 (45 Mbps) HSSI interfaces can use the ATM DXI protocol to route legacy LAN traffic across the ATM network. This architecture requires an ATM DSU, which provides the DXI-to-ATM protocol conversion. DSUs equipped with a concentrating (muxing) function permit several routers to share the OC-3c link.

Any particular NYNET participant site is likely to have a hybrid of these topologies, and the explosive growth of ATM products leads to continually changing topologies.
2.2 The HPDC-ATM Interface

The HPDC environment described above supports two classes of traffic. Both of these are best supported by a connection-oriented variable bit rate data transport service. The ATM Adaptation Layer type 5 (AAL 5) was designed to support this type of service, and NYNET uses AAL 5 exclusively. Because routing is performed at the ATM layer (refer to Figure 2), all information required for meeting Quality of Service (QOS) objectives at intermediate switching nodes must be present in the ATM header. To accomplish this, data passing through the HCP layer (HSM traffic) will be placed in ATM cells in which the ATM header CLP (cell loss priority) bit is set to 1. Since the normal state of the CLP bit is 0, the NSM protocol stack is not required to affect the state of this bit. Hence, cells carrying HSM traffic will have a higher priority than cells carrying NSM traffic. At contention points within the network, primarily at switch output buffers, cells with CLP 1 will always be served prior to those with CLP 0. By appropriately sizing the CLP 1 and CLP 0 buffers² and using a priority service policy, the QOS objectives in terms of maximum cell latency and cell loss rate may be met for both traffic classes. The QOS throughput objectives are addressed by provisioning a different PVC bandwidth to each traffic class as described above.
² The switch output buffers may be physically separated into two FIFOs or logically separated into two virtual FIFOs, depending upon the switch vendor's implementation.
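As an illustration of the tagging described above, the sketch below (not from the paper; the enum and function names are hypothetical) shows where the CLP bit sits in the standard 5-octet ATM UNI cell header and how a driver might set it per traffic class.

    /* Illustrative sketch only: the UNI header layout is standard, but the
     * traffic-class enum, tag_cell(), and the choice of which class touches
     * the bit follow the scheme described in the text, not any NYNET code. */
    #include <stdint.h>
    #include <stdio.h>

    enum traffic_class { NSM_TRAFFIC = 0, HSM_TRAFFIC = 1 };

    /* Octet 4 of the UNI header carries the low VCI bits, the 3-bit payload
     * type field, and the CLP bit in its least significant position. */
    static void tag_cell(uint8_t header[5], enum traffic_class tc)
    {
        if (tc == HSM_TRAFFIC)
            header[3] |= 0x01;            /* HSM cells: CLP = 1            */
        else
            header[3] &= (uint8_t)~0x01;  /* NSM cells: leave default CLP 0 */
    }

    int main(void)
    {
        uint8_t hsm_cell[5] = {0}, nsm_cell[5] = {0};
        tag_cell(hsm_cell, HSM_TRAFFIC);
        tag_cell(nsm_cell, NSM_TRAFFIC);
        printf("HSM CLP=%d, NSM CLP=%d\n", hsm_cell[3] & 1, nsm_cell[3] & 1);
        return 0;
    }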
3 Software Support for Parallel and Distributed Computing

In order to identify the HCP services for HPDC, we first study the primitives provided by some current parallel/distributed programming tools. These software tools are a subset of existing tools, and our study is by no means comprehensive; however, it serves our purpose in the sense that it provides us with a list of services that are required to support HPDC. The software tools studied include EXPRESS [5], PICL [3], PVM [6], ISIS [2], the iPSC communication library [4], and the CM5 communication library (CMMD) [7]. These tools were selected because of their availability at the Northeast Parallel Architectures Center at Syracuse University and also for the following two reasons: (1) they cover most potential computing environments, i.e., parallel, homogeneous, and heterogeneous distributed systems; and (2) they include both portable tools (EXPRESS, PICL, and PVM) and hardware-dependent tools (CMMD and the iPSC communication library). There is an increased interest in the standardization of the message-passing primitives supported by software tools for parallel/distributed computing [8]. The characterization provided in this section can be viewed as a step in this direction.
3.1 Characterization of Message Passing Primitives for HPDC

The communication primitives supported by existing libraries can be characterized into five classes, viz., point-to-point communication, group communication, synchronization, configuration/control/management, and exception handling.
3.1.1 Point-to-Point Communication

Point-to-point communication is the basic message passing primitive for any parallel/distributed programming tool. To provide efficient point-to-point communication, most systems provide a set of function calls rather than just the simplest send and receive primitives.
Synchronous and Asynchronous Send/Receive: The choice between synchronous and asynchronous primitives depends on the nature and requirements of the application. As a result, most tools support both asynchronous and synchronous send/receive primitives. For example, EXPRESS provides exread/exwrite for synchronous message passing and exreceive/exsend for asynchronous message passing. To provide asynchronous message processing, additional supporting functionality must be provided in the tools, for example: 1) polling/probing for the arrival and/or information of incoming messages, e.g., extest, probe, or CMMD_msg_pending in EXPRESS, PVM, or CMMD, respectively; 2) installing a user-specified handler for incoming messages, e.g., exhandle or hrecv in EXPRESS or iPSC, respectively; and 3) installing a user-specified handler for outgoing messages, e.g., hsend in iPSC.
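The toy sketch below, with entirely hypothetical names and an in-process queue standing in for the transport layer, illustrates the calling pattern just described: an asynchronous send that returns immediately, a probe for pending messages, a blocking receive, and a user-installed handler.

    /* Hypothetical illustration of the send/receive calling pattern; none of
     * these functions belongs to EXPRESS, PVM, CMMD, or the iPSC library. */
    #include <stdio.h>
    #include <string.h>

    #define MAXMSG 16

    static char   queue[MAXMSG][64];                /* stand-in transport   */
    static int    pending = 0;
    static void (*handler)(const char *) = NULL;

    static void async_send(const char *msg)         /* returns immediately  */
    {
        strncpy(queue[pending], msg, 63);
        queue[pending][63] = '\0';
        pending++;
        if (handler) handler(msg);                  /* handler-style path   */
    }

    static int msg_pending(void) { return pending; } /* probe / poll        */

    static void recv_blocking(char *buf)             /* synchronous receive */
    {
        while (!pending) { /* a real tool would block or spin here */ }
        pending--;
        strcpy(buf, queue[pending]);
    }

    static void on_message(const char *msg) { printf("handler got: %s\n", msg); }

    int main(void)
    {
        char buf[64];
        handler = on_message;     /* analogous to installing exhandle/hrecv */
        async_send("hello");      /* analogous to an asynchronous exsend    */
        if (msg_pending())        /* analogous to extest / probe            */
            recv_blocking(buf);   /* analogous to a synchronous receive     */
        printf("received: %s\n", buf);
        return 0;
    }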
Synchronous/Asynchronous Data Exchange: There are at least two advantages to providing such primitives. First, the user is freed from having to decide which node should read first and which node should write first. Second, it allows optimizations to be made for both speed and reliability. Examples of synchronous data exchange are exchange and CMMD_swap in EXPRESS and CMMD, respectively. iPSC provides csendrecv for the synchronous case and isendrecv for the asynchronous case.
Non-contiguous or Vector Data: One example of transferring a non-contiguous message is sending a row (or column) of a matrix that is stored in column-major (or row-major) order. Examples are exvsend/exvreceive and CMMD_send_v/CMMD_receive_v in EXPRESS and CMMD, respectively.
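The sketch below, a plain C illustration rather than EXPRESS or CMMD code, shows what such a vector primitive does internally: it gathers a strided row of a column-major matrix into a contiguous buffer that can then be handed to an ordinary contiguous send.

    /* Gathering a strided row of a column-major matrix before transmission. */
    #include <stdio.h>

    #define ROWS 3
    #define COLS 4

    int main(void)
    {
        /* column-major storage: element (i,j) lives at a[j*ROWS + i] */
        double a[ROWS * COLS], rowbuf[COLS];
        int i, j;

        for (j = 0; j < COLS; j++)
            for (i = 0; i < ROWS; i++)
                a[j * ROWS + i] = 10.0 * i + j;

        /* gather row 1: the stride between consecutive elements is ROWS */
        for (j = 0; j < COLS; j++)
            rowbuf[j] = a[j * ROWS + 1];

        /* rowbuf would now be handed to an ordinary contiguous send */
        for (j = 0; j < COLS; j++)
            printf("%g ", rowbuf[j]);
        printf("\n");
        return 0;
    }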
3.1.2 Group Communication

Group communication for parallel or distributed computing can be further classified into three categories, 1-to-many, many-to-1, and many-to-many, based on the number of senders and receivers.
1-to-Many Communication: Broadcasting and multicasting are the most important examples of this category; for example, exbroadcast and bcast in EXPRESS and ISIS, respectively. Some systems do not explicitly provide a separate broadcast or multicast function call. Instead, a wild card character used in the destination address field of the point-to-point communication primitives provides the multicasting function. It is important to note that in ISIS, broadcast primitives with different types and orderings are available to users, who can choose the proper broadcast primitive according to the application.
Many-to-1 Communication: In many-to-1 communication, one process collects the data distributed across several processes. Usually, such a function is referred to as a reduction operation and must be an associative, commutative function, such as addition, multiplication, maximum, minimum, logical AND, logical OR, or logical XOR. Examples are g[op]0 and g[type][op] in PICL and iPSC, where op denotes a function and type denotes its data type. Furthermore, the reduction operation is more useful if a user-specified routine can be supplied; examples are the global operations excombine, gcomb0, and gopf in EXPRESS, PICL, and iPSC, respectively.
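A minimal sketch of this idea, using a hypothetical global_combine routine in place of any of the tools' primitives, is shown below; the user supplies an associative, commutative combining function, just as the text describes.

    /* Hypothetical reduction with a user-supplied combining routine. */
    #include <stdio.h>

    typedef double (*combine_fn)(double, double);

    static double my_max(double x, double y) { return x > y ? x : y; }

    /* Stand-in for a global combine across nprocs processes: here the
     * per-process contributions are simply held in a local array. */
    static double global_combine(const double *contrib, int nprocs, combine_fn op)
    {
        double acc = contrib[0];
        for (int p = 1; p < nprocs; p++)
            acc = op(acc, contrib[p]);
        return acc;
    }

    int main(void)
    {
        double contrib[4] = {3.0, 7.5, 1.2, 5.9};   /* one value per process */
        printf("global max = %g\n", global_combine(contrib, 4, my_max));
        return 0;
    }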
Many-to-Many Communication: There are several different types of many-to-many communication. The simplest example is the case where every process needs to receive the result produced by a reduction operation. From an implementation point of view, such an operation can be realized as a many-to-one operation (i.e., one process combines the data from the other processes) followed by a one-to-many operation (i.e., that process broadcasts the result to the other processes).
3.1.3 Synchronization

A parallel/distributed program can be divided into several different computational phases. To prevent asynchronous messages from different phases from interfering with one another, it is important to synchronize all processes or a group of processes. Usually, a simple command without any parameters, such as exsync, sync0, or gsync in EXPRESS, PICL, and iPSC, provides a transparent mechanism to synchronize all the processes. However, there are several options that can be adopted to synchronize a group of processes. In PVM, barrier, which requires the two parameters barrier name and num, blocks the caller until a certain number of calls with the same barrier name have been made. In PICL, barrier0 synchronizes the node processors currently in use. In iPSC, waitall and waitone allow the caller to wait for specified processes to complete. Another type of synchronization is one in which a process is blocked until a specified event occurs. In PVM, ready and waituntil provide event synchronization by passing a signal. In ISIS, the order of events is used to define virtual synchrony, and a set of token tools (e.g., t_sig, t_wait, t_holder, t_pass, t_request, etc.) is available to handle it. Event detection is also a very powerful mechanism for exception handling, debugging, and performance measurement.
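The counting logic behind a named barrier of the kind PVM provides can be sketched as follows; the structure and function below are hypothetical, single-address-space stand-ins used only to show that callers are released once num arrivals on the same barrier name have occurred.

    /* Toy, hypothetical sketch of a named counting barrier. */
    #include <stdio.h>
    #include <string.h>

    struct barrier { char name[32]; int num; int arrived; };

    /* Returns 1 when the barrier opens, 0 while callers are still awaited. */
    static int barrier_arrive(struct barrier *b, const char *name, int num)
    {
        if (strcmp(b->name, name) != 0) {          /* first use of this name */
            strncpy(b->name, name, 31);
            b->name[31] = '\0';
            b->num = num;
            b->arrived = 0;
        }
        b->arrived++;
        return b->arrived >= b->num;
    }

    int main(void)
    {
        struct barrier b = {"", 0, 0};
        for (int p = 0; p < 3; p++)
            printf("caller %d: %s\n", p,
                   barrier_arrive(&b, "phase1", 3) ? "released" : "waiting");
        return 0;
    }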
3.1.4 Configuration, Control, and Management

The tasks of configuration, control, and management differ considerably from system to system. A subset of the configuration, control, and management primitives supported by the studied software tools includes calls to allocate and deallocate a processor or a group of processors; to load, start, terminate, or abort programs; to perform dynamic reconfiguration; to process concurrent or asynchronous file I/O; and to query the status of the environment.
3.1.5 Exception Handling

In a parallel or distributed environment, it is important that network, hardware, and software failures be reported to the user's application or system kernel in order to start a special procedure to handle the failure. In traditional operating systems such as UNIX, exception handling is processed by an event-based approach, where a signal is used to notify a process that an event has occurred, after which a signal handler is invoked to take care of the event. Basically, an event can be a hardware condition (e.g., a bus error) or a software condition (e.g., an arithmetic exception). For example, in the iPSC library, a user can attach a user-specified routine to respond to a hardware exception via the handler primitive. In ISIS, a set of monitor and watch tools is available to users. EXPRESS supports tools for debugging and performance evaluation. PICL supports tools for event tracing.
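The same event/handler pattern can be shown in plain UNIX terms: a user-specified routine is attached to a signal and invoked when the corresponding event is raised. The parallel tools above layer their own handler mechanisms on top of facilities like this; the example below uses only the standard C signal interface.

    /* Install a handler for an arithmetic-exception signal and raise it. */
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void on_fault(int sig)
    {
        /* a real application would checkpoint, notify the tool/kernel, etc. */
        printf("caught signal %d, cleaning up\n", sig);
        exit(1);
    }

    int main(void)
    {
        signal(SIGFPE, on_fault);    /* attach a user-specified routine   */
        raise(SIGFPE);               /* simulate an arithmetic exception  */
        return 0;                    /* not reached */
    }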
3.2 HCP Primitives

The primitives of the software tools studied are shown in Table 1. Based on this characterization, the HCP services that support most software tools for parallel and distributed computing are shown in Table 2. The services can be broadly classified as data transfer services, synchronization services, system management/configuration services, and error handling services. Data transfer services include point-to-point services for sending, receiving, and exchanging messages, and group communication services for broadcasting and multicasting data (hcp_Send, hcp_Recv, hcp_Exch, hcp_Mcast/Bcast). Synchronization services allow a processor to lock resources so that no other processor can access them (hcp_Lock, hcp_Barrier); this service enables mutually exclusive access to resources shared between processors. The hcp_Barrier primitive enables a specified number of processors to synchronize at a logical barrier before proceeding. System management/configuration services (hcp_Probe, hcp_MsgStat, etc.) include calls to monitor transmitted and arriving messages and the current status of the network and hosts, to configure the hosts into logical groups, and to add/delete hosts to/from these logical groups. Special error handling services include the hcp_Signal primitive, which sends a high-priority message to all hosts to propagate any error status, and the hcp_Log/ChkPt primitive, which enables checkpointing and logging of previously specified data for debugging purposes. When the hcp_Log/ChkPt signal is sent, all processors dump this data into a log file and proceed with their computation. In what follows, we describe how some of the services shown in Table 2 are implemented in HCP.
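To make the service list concrete, the header sketch below gives one possible C binding for the Table 2 primitives. Only the primitive names come from the paper; every type, every signature, and the hcp_Unlock call added for symmetry are assumptions made purely for illustration.

    /* hcp.h: a hypothetical C binding for the HCP services named in Table 2. */
    #ifndef HCP_H
    #define HCP_H

    #include <stddef.h>

    typedef int hcp_group_t;                 /* logical group of hosts (assumed) */
    typedef int hcp_host_t;                  /* host / node identifier (assumed) */

    /* data transfer */
    int hcp_Send (hcp_host_t dst, const void *buf, size_t len, int tag);
    int hcp_Recv (hcp_host_t src, void *buf, size_t len, int tag);
    int hcp_Exch (hcp_host_t peer, void *sendbuf, void *recvbuf, size_t len);
    int hcp_Bcast(hcp_group_t grp, const void *buf, size_t len);
    int hcp_Mcast(const hcp_host_t *dsts, int ndst, const void *buf, size_t len);

    /* synchronization */
    int hcp_Lock   (int resource_id);
    int hcp_Unlock (int resource_id);        /* added for symmetry (assumed) */
    int hcp_Barrier(hcp_group_t grp, int nprocs);

    /* management / configuration */
    int hcp_Probe  (hcp_host_t src, int tag);          /* message pending?   */
    int hcp_MsgStat(int msg_id);                       /* status of a message */
    int hcp_AddHost(hcp_group_t grp, hcp_host_t host);
    int hcp_DelHost(hcp_group_t grp, hcp_host_t host);

    /* error handling */
    int hcp_Signal (hcp_group_t grp, int error_code);  /* high-priority notify */
    int hcp_ChkPt  (const char *logfile);              /* checkpoint / log     */

    #endif /* HCP_H */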
3.3 HCP Implementation Issues

3.3.1 HCP Operation

HCP is a connection-oriented transport protocol. We distinguish between two transfer schemes depending on the message size: long message transfer and short message transfer. A message with a length of less than 40 bytes is designated as a short message; otherwise, it is regarded as a long one. Each long message is transferred as a sequence of data frames in two phases: connection setup and data transfer. The connection release step is merged into the data transfer phase; the successful reception of the last frame triggers the connection release. Note that the conventional network layer is not maintained in HSM mode, since the network routing function is performed in the ATM layer. By doing so, the protocol processing overhead on the host computer is reduced. In the short message transfer case, the data is transmitted within the connection request frame.

During the connection setup phase, HCP requests the underlying AAL to set up a virtual connection (AAL-CREATE.request). The ATM layer establishes a corresponding virtual channel connection over the ATM network and returns an AAL CEI (AAL Connection Endpoint Identifier) to HCP. The length of the data units is forwarded to the AAL during this step (max_SDU_send_length, e.g., 64 Kbytes). Once the connection is set up, HCP sends one or more data units to the AAL (AAL-UNITDATA.request), which eventually activates the AAL at the destination to indicate the arrival of the data (AAL-UNITDATA.indicate). For messages larger than max_SDU_send_length, HCP breaks them into several data units and transfers them as a sequence of data units (the AAL further segments the data units into cells). max_SDU_send_length can be chosen depending on the network conditions, for example, network reliability and system buffer capacity. Connection parameters are unchanged during multiple transmissions of data units.
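The send-side logic just described can be sketched as follows. The 40-byte short-message cutoff, the 64-Kbyte max_SDU_send_length, and the CR-frame-then-data-units sequence follow the text; the C function, its types, and the stubbed calls standing in for AAL-CREATE.request and AAL-UNITDATA.request are assumptions made for illustration.

    /* Hypothetical sketch of the HCP send path over AAL 5. */
    #include <stdio.h>
    #include <string.h>

    #define SHORT_MSG_MAX      40
    #define MAX_SDU_SEND_LEN   (64 * 1024)

    /* stand-ins for the AAL 5 service primitives named in the text */
    static int  aal_create_request(int dst)            { (void)dst; return 7; } /* dummy CEI */
    static void aal_unitdata_request(int cei, const void *sdu, size_t len)
    {
        printf("AAL-UNITDATA.request on CEI %d, %zu bytes\n", cei, len);
        (void)sdu;
    }

    static void hcp_send(int dst, const char *msg, size_t len)
    {
        if (len <= SHORT_MSG_MAX) {
            /* short message: data rides in the connection-request frame */
            char cr_frame[SHORT_MSG_MAX + 8];        /* 8-byte header assumed */
            memcpy(cr_frame + 8, msg, len);
            int cei = aal_create_request(dst);
            aal_unitdata_request(cei, cr_frame, len + 8);
            return;
        }

        /* long message: connection setup, then a sequence of data units;
         * reception of the last unit implicitly releases the connection. */
        int cei = aal_create_request(dst);
        for (size_t off = 0; off < len; off += MAX_SDU_SEND_LEN) {
            size_t chunk = len - off;
            if (chunk > MAX_SDU_SEND_LEN) chunk = MAX_SDU_SEND_LEN;
            aal_unitdata_request(cei, msg + off, chunk);
        }
    }

    int main(void)
    {
        static char big[200000];
        hcp_send(1, "ping", 4);          /* short path                    */
        hcp_send(2, big, sizeof big);    /* long path: several data units */
        return 0;
    }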
3.3.2 Error and Flow Control

Depending on the application requirements, HCP can provide either reliable or unreliable message transfer service to its user. In the case of unreliable transfer, the error control function of HCP is not activated. The error control function of HCP relies on the error detection capability of the ATM network; the AAL provides HCP with an indication of any erroneous data reception (AAL-ERROR.indicate). The error and flow control function in HCP is based on a blocked selective retransmission scheme. Since the underlying network medium is reliable, the receiving node need not acknowledge every data unit it receives. Instead, the receiver sends a positive acknowledgment (PACK) for every N error-free data units and a negative acknowledgment (NACK) for any data unit received in error. Once the sender receives a PACK, it releases the buffer, stores the next N frames, and retransmits only the data acknowledged with a NACK. This reduces the unnecessary network traffic caused by frequent PACKs, taking advantage of the reliable network environment.
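A toy simulation of this blocked selective retransmission scheme is sketched below; the structure is hypothetical, but it follows the rule above: the receiver PACKs every N error-free data units and NACKs an individual unit received in error, so only that unit is retransmitted.

    /* Toy, single-process simulation of blocked selective retransmission. */
    #include <stdio.h>

    #define N 4                       /* block size: one PACK per N data units */

    static void send_frame(int seq)   { printf("send   frame %d\n", seq); }
    static void resend_frame(int seq) { printf("resend frame %d\n", seq); }

    int main(void)
    {
        int total = 8;
        int bad_seq = 5;              /* pretend frame 5 arrives corrupted */

        for (int base = 0; base < total; base += N) {
            /* sender transmits a block of N frames and keeps them buffered */
            for (int seq = base; seq < base + N && seq < total; seq++)
                send_frame(seq);

            /* receiver side (simulated): NACK the bad frame, PACK the block */
            if (bad_seq >= base && bad_seq < base + N) {
                printf("NACK   frame %d\n", bad_seq);
                resend_frame(bad_seq);        /* selective retransmission */
            }
            printf("PACK   block starting at %d -> sender frees its buffer\n",
                   base);
        }
        return 0;
    }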
3.3.3 Frame Formats

Figure 4 shows the four types of frames used during HSM: a CR (Connection Request) frame with short data, a CR frame with long data, a Data frame, and an ACK frame. These frames are classified as Type 1 (frames requiring less than 40 bytes) and Type 2 (frames containing more than 40 bytes). A CR frame with short data, a CR frame with long data, and an ACK frame are all Type 1, while a Data frame is either Type 1 or Type 2 depending on the length of its data. A long data message requires the transmission of a Type 1 frame followed by a sequence of Type 2 frames. The type field is used to distinguish between the different kinds of frames. The Source (SRC) and Destination (DST) fields in the CR frame indicate the addresses of the application processes in the source and destination nodes. The length field indicates the number of bytes to be transmitted. The frame size field indicates the number of bytes of each data frame to be transmitted. The status field in the ACK frame distinguishes acknowledgments of connection confirm (CC) and disconnect (DC), as well as positive (PACK) and negative (NACK) acknowledgments of data frames. The sequence field (SEQ) in the Data frame and ACK frame represents the sequence number of data frames. Figure 5 depicts the mapping of HCP messages into ATM cells via AAL 5. Forty bytes is chosen as the maximum message length for Frame Type 1 because this permits the entire message to be encapsulated in a single AAL 5 PDU. The AAL provides the end-to-end error checking, so no additional overhead is required for this purpose in the HCP layer for Frame Type 1 messages. The AAL SAR (Segmentation and Reassembly) sublayer uses a lookup table to map the HCP source and destination addresses into the appropriate PVC VCI for the ATM cell header. Encapsulating the HCP source and destination addresses, rather than stripping them, permits bridging across NYNET to end devices not directly terminated on the NYNET WAN. The maximum length of a Frame Type 2 is 65535 bytes, which corresponds to the maximum AAL SDU supported by AAL 5. Type 2 frames are segmented sequentially into AAL PDUs. As with Type 1, the AAL provides the error checking across the entire Type 2 frame at the receiving node. For HCP messages longer than the maximum AAL SDU length, multiple Data frames must be sent.
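One possible C rendering of the frame layouts of Figure 4 is sketched below. The field names come from the text; all field widths, orderings, and the size of the short-data area are assumptions chosen so that a Type 1 frame fits in a single AAL 5 PDU as described.

    /* Hypothetical HCP frame layouts; widths and ordering are assumed. */
    #include <stdint.h>
    #include <stdio.h>

    struct hcp_cr_frame {            /* Type 1: connection request            */
        uint8_t  type;               /* CR-short / CR-long                    */
        uint16_t src, dst;           /* source / destination process address  */
        uint32_t length;             /* total message length in bytes         */
        uint16_t frame_size;         /* bytes per data frame (long data only) */
        uint8_t  data[40];           /* short data carried in the CR frame    */
    };

    struct hcp_ack_frame {           /* Type 1: acknowledgment                */
        uint8_t  type;
        uint8_t  status;             /* CC, DC, PACK, or NACK                 */
        uint16_t seq;                /* sequence number being acknowledged    */
    };

    struct hcp_data_frame {          /* Type 1 or Type 2 depending on length  */
        uint8_t  type;
        uint16_t seq;                /* sequence number of this data frame    */
        uint8_t  data[65528];        /* sized to stay within the 65535-byte
                                        AAL 5 SDU limit                       */
    };

    int main(void)
    {
        printf("CR=%zu ACK=%zu DATA(max)=%zu bytes\n",
               sizeof(struct hcp_cr_frame), sizeof(struct hcp_ack_frame),
               sizeof(struct hcp_data_frame));
        return 0;
    }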
4 NYNET Applications

Once NYNET is deployed and becomes operational, the research activities of its participants will focus on developing NYNET applications and enabling technologies. The applications that are currently being considered span several important areas such as industry, education, and military technology transfer. Listed below is an outline of some of these and other future applications:
- Industry: a real-time artificial intelligence approach for communications network management, agile intelligent manufacturing, distributed database applications, financial management, and medical applications.
- Education: digital libraries and electronic publishing, K-12 supercomputer/parallel processing access, and classroom-to-classroom teleconferencing.
- Dual-use technology transfer: ATM switching, network management and technical control, command and control information services, security, and global grid.
- Grand challenges: this set of applications deals with problems of a grand-challenge or global scale whose accurate solutions would have a significant impact on industrial competitiveness, as well as providing important advances in environmental management, medicine, and other areas. Such challenges require supercomputing capability, and the NYNET parallel computers and supercomputers available at NPAC of Syracuse University and the Theory Center at Cornell University enable expanded access to these scarce and costly resources.
In addition to applications, high-speed computing and communications are in themselves major commercial opportunities. The aim of our effort in this area is the development of software tools that will control, manage, and build applications exploiting the continuing rapid advances in processing and network technologies. Areas of focus include:
- network R&D: network management, protocols and routing, and network simulation
- real-time artificial intelligence
- multimedia conferencing
- distributed simulation languages
- environments for high-performance distributed computing
In what follows, we describe a multimedia medical collaborative environment as an application example.
4.1 A Multimedia Cardiologists' Collaborative System

The main objective of this application is to use the proposed HPDC environment to develop a real-time environment for cardiologists that facilitates the referral of patients for cardiovascular surgery or coronary angioplasty. We are currently collaborating with the cardiologists and information chiefs at the Veterans Affairs Medical Centers (VAMC) at Syracuse, Albany, and Buffalo to allow the cardiologists at Syracuse and/or Albany to confer directly with the cardiovascular surgeons and interventional cardiologists in Buffalo. An important function of such conferences will be the ability to interactively review the angiographic images of the patients being discussed. Such interaction would provide a much more efficient and comprehensive exchange of information than do the existing methods of transmitting clinical information to distant sites (currently carried out using regular mail services). In this effort we will be collaborating with NYNEX Science and Technology to integrate the Media Broadband Services (MBS), developed by NYNEX, into the proposed HPDC environment. MBS supports multi-user exchanges of high-resolution images as well as audio, video, and textual information. These multi-user sessions allow for fully interactive and collaborative conferencing between participants and information sources based in geographically dispersed locations. This application can easily be extended to other important medical areas such as teaching and research. In teaching, a multimedia collaborative system would allow the staff of multiple hospitals to benefit from lectures given at a particular hospital without spending the time and money needed to travel to the site of the lecture. In research, a multimedia collaborative system would allow collaborative research among institutions by enabling investigators at several hospitals to interactively plan research and review data from existing projects. Among other things, such collaboration will increase the size of the pool of data on which research is based.
5 Conclusion

In this paper, we presented an HPDC environment that we are currently developing and that will be deployed over the NYNET testbed. This environment capitalizes on current advances in processing and networking technology and software tools to provide cost-effective parallel and distributed computing over a wide area network. In such an environment, there are two modes of operation: Normal Speed Mode, in which standard communication protocols are used and operate at speeds of 45 Mbps or less, and High Speed Mode, which is used for parallel/distributed computing over NYNET and operates at 155 Mbps. We analyzed the primitives supported by existing parallel and distributed software tools and characterized them into five categories, i.e., point-to-point communication, group communication, synchronization, configuration/control/management, and exception handling. We also presented the design of a high-speed communication protocol (HCP) that provides the needed bandwidth and services for HPDC applications, and showed how some of the HCP primitives can be implemented using the services provided by the AAL 5 protocol. We briefly discussed the applications that are currently being developed and/or will be developed to run over the NYNET testbed. Further research is needed to analyze the performance of the proposed environment in providing and guaranteeing the required bandwidth and QOS for NYNET applications.
References

[1] IBM European Center for Scientific and Engineering Computing. Usenet news item, 1992.
[2] K. Birman, R. Cooper, T. Joseph, K. Kane, and F. Schmuck. The ISIS System Manual.
[3] G. A. Geist, M. T. Heath, B. W. Peyton, and P. H. Worley. A user's guide to PICL, a portable instrumented communication library. Technical Report ORNL/TM-11616, Oak Ridge National Laboratory, Oak Ridge, Tennessee, Oct. 1991.
[4] Intel Supercomputer Systems Division, Beaverton, Oregon. iPSC/2 and iPSC/860 User's Guide, 1991.
[5] Parasoft Corporation. Express Reference Manual, 1988.
[6] V. S. Sunderam. PVM: A framework for parallel distributed computing. Technical report, Oak Ridge National Laboratory, Oak Ridge, Tennessee, 1991.
[7] Thinking Machines Corporation, Cambridge, Massachusetts. CMMD Reference Manual, Version 1.1, 1992.
[8] D. W. Walker. Standards for message-passing in a distributed memory environment. Technical Report ORNL/TM-12147, Oak Ridge National Laboratory, Aug. 1992.
[9] H. T. Kung. Gigabit local area networks: a systems perspective. IEEE Communications Magazine, pp. 79-89, April 1992.
[10] G. Chesson. The Protocol Engine project. Proceedings of the Summer 1987 USENIX Conference, pp. 209-215, June 1987.
[11] J. B. Park and S. Hariri. Architectural support for a high-performance distributed system. Proceedings of the 12th Annual IEEE International Phoenix Conference on Computers and Communications (IPCCC-93), pp. 319-325, March 1993.