To appear in J. Supercomputing. Technical Note DHPC-079.
Asynchronous Transfer Mode and other Network Technologies for Wide-Area and High-Performance Cluster Computing

K.A. Hawick and H.A. James
Distributed & High Performance Computing Group
Department of Computer Science, University of Adelaide, Australia SA 5005
Tel +61 8 8303 4519, Fax +61 8 8303 4366
Email [email protected]

December 20, 1999

Abstract

We review fast networking technologies for both wide-area and high performance cluster computer systems. We describe our experiences in constructing ATM-based local-area and wide-area clusters and the tools and technologies this experience led us to develop. We discuss our experiences using Internet Protocol on such systems as well as native ATM protocols, and the problems facing wide-area integration of cluster systems. We are presently constructing Beowulf-class computer clusters using a mix of Fast Ethernet and Gigabit Ethernet technology, and we anticipate how such systems will integrate into a new local-area Gigabit Ethernet network and what technologies will be used for connecting shared HPC resources across wide areas. High latencies on wide-area cluster systems led us to develop a metacomputing problem-solving environment known as DISCWorld. We summarise our main developments in this project as well as the key features and research directions for software to exploit computational services running on fast networked cluster systems.

Keywords: cluster computing; ATM; Gigabit Ethernet; Fast Ethernet; DISCWorld; metacomputing
1 Introduction

Cluster computing has become an important part of the high-end computing world. Many of the applications traditionally run on high-end supercomputers are now successfully run on computer clusters. In this paper we describe our experiences in
building high performance computer clusters and in linking them together as clusters of clusters over wide areas. This approach towards an Australian “National Computer Room” was started in 1996 and led us to two important conclusions about wide-area computing. Firstly, from a technical standpoint, the limitations of latency and of network reliability are likely to be more important than bandwidth limitations in the future, and successful wide-area systems therefore need to be constructed accordingly. Secondly, the political and administrative issues behind supercomputer resource location and ownership mean that in the future it will be attractive to loosely cluster compute resources nationally and even internationally. Institutions have strong reasons to wish to retain ownership and control of their own computer resources. Clusters of local resources, each of which may itself be a cluster of some sort, are therefore an attractive architecture to target for software development.

In this article we try to divide cluster computing systems issues into those relevant to achieving high performance (typically on local-area compute clusters) and those pertaining to the successful sharing of resources between institutions (typically wide-area issues).

The size and utilisation of computer clusters varies enormously. The recent IEEE Workshop on Cluster Computing [3] demonstrated this, as well as the different uses of the term cluster computing. We will use cluster in its broader sense to describe a collection of networked compute resources that may be heterogeneous in hardware and in operating systems software, but which are capable of cooperating in some way on a distributed application. This may be through any of the parallel computing software models such as PVM [8], MPI [25] or HPF [17], or through some higher-granularity object- or service-based system like Globus [7], Legion [10], CORBA [27] or DISCWorld [16]. We use the term Beowulf-class cluster in the sense now widely adopted, to describe a cluster that is dedicated in some sense and is not just a random collection of networked computers. The term “constellation” is now being adopted to describe a compute cluster that uses shared memory or bus-based technology. We focus on clusters and Beowulf-class systems in describing our work in this paper.

At the time of writing the watershed sizes for clusters are limited by the number of ports on commercially available network switches. For example, many groups report work on modest clusters of 8 or 16 processors. We continue to operate our prototype 8-node Beowulf system for student experiments. Modern switches are typically capable of interconnecting 16, 24 or even 48 compute nodes using Fast Ethernet at 100 MBit/s bandwidth between all nodes. The size limit imposed by a single switch might therefore be used to distinguish large from small computer clusters. This number is at present larger than the number of processors that can be economically linked using shared memory technology. High-end systems such as the ASCI systems and the Tera computer architecture do achieve considerably higher numbers of nodes interlinked with shared memory, but can hardly be described as commodity economical architectures. Switches such as the Intel 510 series can be stacked using high bandwidth links between individual switches to preserve the full point-to-point bandwidth between compute nodes.
We have constructed a 120-node Beowulf-class system using this flat switch architecture and dual-processor nodes. The Intel 510 series is limited to linking 7 of these 24-port switches in this manner, and this limit of 168 networked nodes represents well the limitations of current network technology. Fast Ethernet is used to link individual nodes, with a Gigabit Ethernet uplink between the switch backplane and the file server node. Beowulf systems larger than this are being built, but they must use some sort of hierarchical network architecture and are no longer “as commodity” in nature. It is likely that network technology will move forward with economic and technical improvements and that commodity switchgear will soon allow bigger systems. Cluster systems will therefore continue to encroach on the traditional supercomputing market.

Scalable software to run on such systems is available for some application problems, but our experience is that some effort is still needed to ensure that the operating system kernel or communications software does not remain the major contributor to latency in parallel applications running on clusters. Achieving low latency in sending messages between nodes in a Beowulf system appears to be a more important limitation at present than achieving high bandwidth.
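To quantify this with a standard first-order model (an illustration rather than a measurement from this work), the time to deliver a message of n bytes between nodes is approximately

\[
T(n) \approx t_{\mathrm{lat}} + \frac{8n}{B},
\]

where t_lat is the fixed per-message latency and B is the link bandwidth in bits per second. For a 1 kB message on 100 MBit/s Fast Ethernet the transmission term 8n/B is only about 80 µs, so a fixed software and kernel latency of the order of a millisecond (see table 3) dominates the delivery time, however much bandwidth is added.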
We describe our experiences in constructing ATM-based clusters in section 2. We summarise our metacomputing approach to cluster management in section 3 and describe how some of the latency and distributed computing problems that are so important to wide-area clusters can be tackled. We review the networking technologies actively used for cluster computing systems in section 4 and discuss their relative technical and economic advantages.
2 Wide Area ATM Clusters

In this section we describe experiences from an experimental wide-area system we ran from 1996 to 1998, employing local- and wide-area ATM technology to connect high performance computing clusters around Australia. The Research Data Networks (RDN) Cooperative Research Centre (CRC) was set up as a joint venture involving our own University of Adelaide, the Australian National University in Canberra, Monash University in Melbourne, the University of Queensland and Telecom Australia (Telstra). The map in figure 1 shows the initial broadband network we were able to set up and use. It connected Adelaide, Melbourne, Sydney, Canberra and Brisbane; some of its latency and performance aspects are discussed below. We trialled links of 34 MBit/s and 155 MBit/s over various branches of the network and experimented with various combinations of: cross-mounted long distance filesystems; video conferencing; simple shared whiteboard style collaborative software; and cross-connected computer clusters. We describe the Asynchronous Transfer Mode (ATM) technology used for these experiments in section 4.

The main aim of our experiments was to analyse the effects of having relatively high wide-area bandwidths available for cluster computing while having perceptibly high latencies arising from the long distances. We were able to construct well-connected compute clusters having the peculiar bisection bandwidth and latency properties that arise from having part of the cluster at each geographically separate site. Although a number of wide-area broadband networks have been built in the USA [4, 26], it is unusual to have a network that is fully integrated over very long distances rather than local-area use of ATM technology. Telstra built the Experimental Broadband Network (EBN) [21] to provide the foundation for Australian broadband application development. Major objectives were to provide a core network for service providers and customers to collaborate in the development and trial of new broadband applications, and to allow Telstra and developers to gain operational experience with public ATM-based broadband services.
Figure 1: Telstra’s Experimental Broadband Network (EBN). The EBN is a 34 MBit/s ATM network connecting research and commercial partner sites.

Figure 2 shows how our prototype storage and processing hardware were arranged at Adelaide and Canberra (separated by some 1200 km) and connected by the EBN. The Adelaide and Canberra cells consisted of a number of DEC AlphaStations, interconnected locally by 155 MBit/s ATM, with the cells at the two sites connected across the EBN at 34 MBit/s. We experimented extensively with cluster computing resources at Adelaide and Canberra in assessing the capabilities of a distributed high performance computing system operating across long distances.

It is worth considering the fundamental limitations involved in these very long distance networks. Although we employed OC-3c (155 MBit/s) multi-mode fibre for local-area networking, we were restricted to an E-3 (34 MBit/s) interface card to connect Adelaide to Melbourne and hence to Canberra. In practice this was not a severe limitation, as for realistic applications we were unable to fully utilise a 155 MBit/s link over wide areas except in the most contrived circumstances. This would not be the case for a shared link, however. The line-of-sight distances involved in parts of the network were: Adelaide/Melbourne 660 km; Melbourne/Canberra 467 km; Melbourne/Sydney 710 km; and Sydney/Brisbane 732 km.
Figure 2: DHPC Project Hardware Resources at Adelaide and Canberra connected via Telstra’s EBN. While traffic can be sent between sites via the ’ordinary’ Internet, bandwidth-intensive experiments were carried out by specifically routing traffic over the EBN.

Consequently the effective network distances between Adelaide and the other cities are as shown in table 1. The light-speed-limited latencies shown in table 1 are calculated on the basis of the vacuum light speed (2.9978 × 10^5 km/s). It should therefore be noted that this is a fundamental physics limitation and does not take into consideration implementation details.

EBN City     Network Distance from Adelaide (km)   Light-speed-Limited Latency (ms)
Melbourne    660                                   2.2
Canberra     660 + 467 = 1127                      3.8
Sydney       660 + 710 = 1370                      4.6
Brisbane     660 + 710 + 732 = 2102                7.0

Table 1: Inter-city distances (from Adelaide) and light-speed-limited latencies for the Experimental Broadband Network.
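The figures in table 1 follow directly from dividing the accumulated path length by the vacuum light speed. A minimal sketch of the calculation, using the line-of-sight segment distances quoted above, is given below (Java is used here purely for illustration):

    // Reproduces the light-speed-limited one-way latencies of Table 1.
    public class LightSpeedLatency {
        static final double C_KM_PER_S = 2.9978e5;  // vacuum light speed in km/s

        // One-way latency in milliseconds for a path of the given length.
        static double latencyMs(double distanceKm) {
            return distanceKm / C_KM_PER_S * 1000.0;
        }

        public static void main(String[] args) {
            // Line-of-sight segment distances (km) from the text.
            double adlMel = 660, melCbr = 467, melSyd = 710, sydBne = 732;
            System.out.printf("Melbourne: %.1f ms%n", latencyMs(adlMel));
            System.out.printf("Canberra:  %.1f ms%n", latencyMs(adlMel + melCbr));
            System.out.printf("Sydney:    %.1f ms%n", latencyMs(adlMel + melSyd));
            System.out.printf("Brisbane:  %.1f ms%n", latencyMs(adlMel + melSyd + sydBne));
        }
    }

A round trip is simply twice these figures, which gives the 7.6 ms Adelaide–Canberra round-trip limit referred to below.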
We made some simple network-performance measurements using the Unix ping utility, which uses the Internet Control Message Protocol (ICMP), for various packet sizes initiated at Adelaide and bounced back from a process running at the other networked sites. These are shown in table 2. By varying the packet sizes sent it is possible to derive crude latency and bandwidth measurements. It should be emphasized that these measurements are approximations of what is achievable and are intended only for comparison with the latency limits in table 1. The times in table 2 were averaged over 30 pings and represent round-trip times. Measurements are all to a precision of 1 ms, except for those to Syracuse, which suffered significant packet loss and variations that suggest an accuracy of ±20 ms is appropriate.
Ping Packet    Mean Time,        Mean Time,            Mean Time,           Mean Time,
Size (Bytes)   Canberra via      Syracuse USA via      local machine via    local machine via
               EBN (ms)          Internet (ms)         Ethernet (ms)        ATM switch (ms)
64             15                340                   0                    0
1008           16                367                   3                    0
2008           17                387                   6                    1
4008           18                405                   9                    1
6008           20                428                   14                   1
8008           22                441                   18                   2

Table 2: Approximate performance measurements using ping.

The ping-measured latency between Adelaide and Canberra appears to be approximately 15 ms. This is to be compared with the theoretical limit for a round trip of 7.6 ms. The switch technology has transit delays of approximately 10 µs per switch. Depending upon the exact number of switches in the whole system this could approach a measurable effect, but it is beyond the precision of ping to resolve. We believe that, allowing for non-vacuum light speeds in the actual limitation, our measured latency is close (within better than a factor of two) to the best achievable. We believe that variations caused by factors such as the exact route the EBN takes, the slower signal propagation speed over terrestrial copper cables, router and switch overheads, and small overheads in initiating the ping, all combined, satisfactorily explain the discrepancy in latencies. The EBN appears to provide close to the best reasonably achievable latency.

Also of interest is the bandwidth that can be achieved. The actual bandwidth achieved by a given application will vary depending upon the protocols, buffering layers and other traffic on the network, but these ping measurements suggest an approximate value of 2 × 8 kB / (22 − 15) ms = 2900 kB/s ≃ 22.7 MBit/s. This represents approximately 84% of the 27 MBit/s of bandwidth available to us, on what was an operational network.
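The bandwidth estimate above comes from differencing two ping measurements, on the simplifying assumption that the per-packet overhead is independent of payload size:

\[
B \;\approx\; \frac{2\,\Delta s}{\Delta t_{\mathrm{RTT}}},
\]

where Δs is the difference in ping payload size between the two measurements, Δt_RTT is the corresponding difference in mean round-trip time, and the factor of two accounts for the payload traversing the link in both directions.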
The Unix utility ttcp was useful for determining bandwidth performance measurements that are beyond the resolution possible with ping. A typical achievable bandwidth between local machines on the operational 155 MBit/s fibre network is 110.3 MBit/s, compared with a typical figure of 6.586 MBit/s on local 10 MBit/s Ethernet. Both figures are representative of a busy network carrying other user traffic.

In 1998 and 1999 an additional link connecting Australia and Japan was made available to us, with a dedicated 1 MBit/s of bandwidth for experimentation. We carried out some work in collaboration with the Real World Computing Partnership (RWCP) [28] in Japan and found an effective single-trip latency of around 200 ms between Adelaide and Tsukuba City in Japan over this ATM network. All these experiences suggest that the major limitations for wide-area cluster computing in the future are more likely to come from latency than from bandwidth.
The Telstra Experimental Broadband Network was closed at the end of 1998, and an alternative ATM network supplied by Optus Pty Ltd was made available to us in 1999. This network has similar latency characteristics but has the advantage that we can apply to burst up to a full 155 MBit/s between Adelaide and Canberra at will. In practice, we have found that 8 MBit/s is entirely adequate for our day-to-day wide-area cluster computing operations.

It is interesting to reflect on our experiences in setting up and using these networks. Staff at Telstra’s research laboratories were instrumental in setting up the initial circuits and allocating bandwidth to our experiments. As the trials proceeded this became a well-automated process with little human intervention required, except when we needed to split bandwidth allocations between, for example, video, cluster and file system circuits. One of the problems with the experimental setup was that only permanent virtual circuits (PVCs) were available to us, and the network configuration (on the routing hosts) needed to be changed manually each time a reconfiguration was required. One of the promises of the ATM standard is that of switched virtual circuits (SVCs), which could be manipulated in software to reallocate bandwidth to different applications. Although we were able to experiment with pseudo-SVCs using the proprietary facilities of our ATM switchgear, it was not possible to enable proper SVCs across the different vendors’ switchgear at the different sites in our collaboration. To our knowledge this is still not available on wide-area ATM networks and is, we believe, a disappointing limitation of this technology in practice. We expended considerable time and effort in carrying out these experiments, which was particularly difficult when we had to reconfigure routers, operating systems and driver software every time we needed to carry out a different experiment.

ATM technology has matured since the start of our experiments, but we believe it will not be the technology of choice at the cluster computing level. ATM is likely to still find a place as a network backbone technology, certainly for long distance networks and perhaps even in local-area networks. Driver support and administration tools for ATM-based cluster computing do not appear to be forthcoming, and consequently we believe it much more likely that Internet Protocols implemented on top of ATM or other technologies will be more useful for cluster computing.
3 Metacomputing Cluster Management

Our vision at the start of our wide-area network trials was of a “National Computer Room”. We imagined this as a network and software management infrastructure that would enable institutions to share access to scarce high-performance computing resources. This seemed important given the scarcity of supercomputing systems in Australia in particular. We have now revised this vision, however. While there are still some areas in which a computer sharing mechanism is important, the whole supercomputing industry has continued to be shaken up. The global reduction in the number of supercomputer vendors still in business can be attributed to a number of factors, not least of which is the greater availability of cluster computing resources. The need for institutions to own and control their own resources appears to be a strong one, and relatively cheap cluster computing systems enable this. We believe therefore that although there is still a need for software to enable compute resource sharing, the resources are more likely to be clusters themselves, and this affects the characteristics of software and applications that can integrate them.
Software such as Netsolve, Globus, Legion, Ninf and our own DISCWorld system are all aimed at enabling the use of distributed computing resources for applications. Our favoured approach is to encapsulate applications (including parallel ones) as services that can run in a metacomputing environment, and we have focused our efforts on building Java middleware to allow this. Other approaches have taken experience from the parallel computing era to allow wide-area systems to behave as a more tightly coupled parallel system, running multi-processor applications across the wide-area networks. The choice is a matter of granularity and of how to group parts of an application together. The wide-area parallel approach is attractive for its simplicity, in that the tools and technologies from parallel computing can be redeployed with only software re-engineering effort. The metacomputing approach recognizes that networks (especially wide-area ones) will remain unreliable, that long distance latencies will not improve (short of some new physics breakthrough), and that the fundamental distributed computing problems need to be considered when building wide-area cluster systems.

Distributed Information Systems Control World (DISCWorld) [16] is a metacomputing model or framework with a series of prototype systems developed to date. The basic unit of execution in DISCWorld is the service. Services are pre-written software components or applications, either written in Java or legacy codes that have been provided with a Java wrapper. Users can compose a number of services together to form a complex processing request, and jobs can be scheduled across the participating nodes [19]. An example DISCWorld application is a land planning system [5], where a client application requires access to land titles information at one site, digital terrain map data at another, and aerial photography or satellite imagery stored at another site. DISCWorld itself does not build on parallel computing technology, but can embed parallel programs as services. A support module known as JUMP provides integration with message-passing parallel programs, which might run on a conventional supercomputer or on a cluster [13].

An environment such as DISCWorld is ideal for use across a cluster or a distributed network of clusters. The nodes within the cluster can be used as federated hosts connected via a high-speed, dedicated network, and some or all of the nodes can be used as a parallel processor farm, running those services which are implemented as parallel programs. We employed both these approaches in our multi-cluster experiments across our ATM network. The ability to make intelligent decisions on where to schedule services in the DISCWorld environment, in order to minimize either execution time or total resource cost, relies on the ability to characterise both the services and the nodes in the environment.
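To make the service notion concrete, the fragment below sketches how a legacy code might be wrapped for invocation as a service. The Service interface and the method names are purely illustrative; they are not the actual DISCWorld API.

    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;

    // Illustrative service abstraction (not the real DISCWorld interfaces).
    interface Service {
        byte[] execute(byte[] request) throws Exception;
    }

    // Wraps a legacy command-line code so that it can be composed with other services.
    class LegacyCodeService implements Service {
        private final String executablePath;   // path to the legacy binary (hypothetical)

        LegacyCodeService(String executablePath) {
            this.executablePath = executablePath;
        }

        public byte[] execute(byte[] request) throws Exception {
            Process p = Runtime.getRuntime().exec(executablePath);
            p.getOutputStream().write(request);       // pass the request data to the legacy code
            p.getOutputStream().close();

            ByteArrayOutputStream result = new ByteArrayOutputStream();
            InputStream out = p.getInputStream();
            byte[] buffer = new byte[8192];
            for (int n; (n = out.read(buffer)) != -1; ) {
                result.write(buffer, 0, n);            // collect the legacy code's output
            }
            if (p.waitFor() != 0) {
                throw new Exception("legacy code exited with an error");
            }
            return result.toByteArray();               // returned to the requesting client
        }
    }

A request is then a composition of such services, and the scheduler is free to place each service invocation on whichever participating node holds the relevant data or has spare capacity [19].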
4 Fast Networking Technologies

In this section we review the major networking technologies and describe their roles for wide-area and high-performance cluster computing. We discuss Asynchronous Transfer Mode (ATM); Fast Ethernet; Gigabit Ethernet; Scalable Coherent Interconnect (SCI); and Myrinet. Each of these technologies may have a different role to play in wide-area and cluster computing over the next few years.

Table 3 summarises the networking technologies we consider here. The data in the table has been combined from our own benchmarking experiments as well as reports in the literature [6, 22, 23, 29]. It can be seen that for problems which do not require gigabit-order bandwidth, Fast Ethernet seems a logical choice. If gigabit-order bandwidth is required, the decision is not so clear-cut: there is a complex trade-off between the bandwidth and latency attainable, the approximate cost per node, and whether the technology is considered commodity. When we use the term commodity, we mean that the technology is available to the mass market and from more than a few specialist vendors.

Technology         Theoretical Bandwidth   Measured TCP/IP Bandwidth   Latency   Approx Cost Per Node ($US)
Ethernet           10 MBit/s               6.58 MBit/s                 1.2 ms    80
Fast Ethernet      100 MBit/s              68 MBit/s                   1 ms      150
ATM                155 MBit/s              110.3 MBit/s                1 ms      2500
Gigabit Ethernet   1000 MBit/s             950 MBit/s                  12 ms     1200
SCI                1600 MBit/s             106 MBit/s                  4 µs      1400
Myrinet            1200 MBit/s             1147 MBit/s                 117 µs    1700

Table 3: Interconnection network technology characteristics as measured and as reported in the open literature.

We were able to trial our ATM network of clusters using sponsorship from the Australian Commonwealth government. Table 3 shows that ATM is still not an especially economic choice. It is our belief that Gigabit Ethernet technology will be widely adopted and that its price will be correspondingly driven down.
4.1 ATM

Asynchronous Transfer Mode (ATM) [1] is a collection of communications protocols for supporting integrated data and voice networks. ATM was developed as a standard for wide-area broadband networking but also finds use as a scalable local-area networking technology. ATM is a best-effort delivery system, sometimes known as bandwidth-on-demand, whereby users can request and receive bandwidth dynamically rather than at a fixed, predetermined (and paid-for) rate. ATM guarantees that the cells transmitted in a sequence will be received in the same order. ATM provides cell switching and multiplexing, and combines the advantages of packet switching, such as flexibility and efficiency for intermittent traffic, with those of circuit switching, such as constant transmission delay and guaranteed capacity. ATM uses a point-to-point, full-duplex transmission medium and provides connection-oriented protocols. In addition to supporting native ATM protocols such as ATM Adaptation Layers (AAL) 3/4 and 5, the use of LAN Emulation (LANE) allows the ATM network to be viewed as part of a TCP/IP network (to allow multicast and broadcast). There have been a number of studies that consider the use of the AALs for high-speed interconnections in parallel computing [20]. We employ a 155 MBit/s local-area ATM network [14, 15] as well as a 34 MBit/s broadband network (the EBN).

ATM allows guaranteed bandwidth reservation across a circuit (a link or a number of links with defined end-points). There are three different types of bandwidth reservation: constant bit rate (CBR), variable bit rate (VBR) and available bit rate (ABR). CBR is used when the traffic between sites is constant and will rarely, if ever, change in bandwidth requirements. VBR is used to characterise traffic whose mean bandwidth requirement can change slightly; traffic along a VBR circuit can operate at the circuit's maximum bandwidth only for short periods, and intermediate switches may drop cells that exceed the VBR parameters. Finally, ABR exists for bursty traffic which cannot easily be characterised. ABR traffic uses any available link capacity; if there is not enough capacity the traffic is queued until the buffers fill, at which point cells are dropped.

ATM has been popular with the telecommunications industry [21] for broadband networks, where CBR and VBR traffic is used for dedicated (statically- and dynamically-allocated) customer links carrying voice and constant-rate data. As typical data traffic is bursty, it does not make sense to reserve bandwidth in a CBR or even a VBR capacity; ABR, which uses any available bandwidth on an ATM link, must be used to avoid wasting capacity. Few sites have adopted ATM as a local-area network; the major factor prohibiting widespread adoption is the cost of ATM switches and of ATM interface cards for individual nodes.
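The choice among these service categories can be summarised as a simple decision rule. The sketch below is purely illustrative (the type and method names are ours, not part of any ATM signalling API) and merely encodes the criteria described above.

    // Illustrative only: encodes the CBR/VBR/ABR selection criteria described in the text.
    enum AtmServiceCategory { CBR, VBR, ABR }

    class TrafficProfile {
        double meanMbps;    // long-run mean bandwidth requirement
        double peakMbps;    // short-term peak bandwidth requirement
        boolean bursty;     // true if the traffic cannot be characterised by a stable mean

        AtmServiceCategory chooseCategory() {
            if (bursty) {
                return AtmServiceCategory.ABR;   // take whatever capacity happens to be spare
            } else if (peakMbps > meanMbps) {
                return AtmServiceCategory.VBR;   // mean rate with bounded short excursions
            } else {
                return AtmServiceCategory.CBR;   // constant, rarely changing requirement
            }
        }
    }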
4.2 Other Notable Networking Technologies

High Performance Parallel Interface (HiPPI) is a point-to-point link that uses twisted-pair copper cables to connect hosts via crossbar switches. HiPPI gets its name because data is transmitted in parallel: the connecting cables have 50 cores, 32 of which are used to transmit data, one bit per line. The standard allows for transmission rates of 800 MBit/s and 1600 MBit/s, but the maximum length of copper cables is 25 metres. A serial version of HiPPI is available, using fibre optic media, which allows a maximum distance of 10 km between ends. HiPPI has been successfully used for networking supercomputer systems, but now seems likely to be overtaken by other, cheaper technologies.

Fibre Channel (FC), defined by the Fibre Channel Standard, is a circuit-switching and packet-switching technology that allows transmission at multiple rates. Data is sent in frames of 2148 bytes (2048 bytes of data). FC is successfully used in interconnecting hosts but is also likely to be overtaken by the economics of other technologies.
4.3 Internet or Native Technology Protocols

Many networking technologies have proprietary communications protocols, such as the ATM Adaptation Layers (AAL). While these proprietary protocols provide optimised communication, there is usually a trade-off against code portability, and using them can be fraught with pitfalls. Because most users do not wish to hand-optimise their code, and because users often work with heterogeneous mixtures of networking hardware, the custom protocols are not widely used. For example, experiments with ATM's AAL5 uncovered a bug in the implementation that the manufacturer's engineers said could only be fixed by purchasing a much faster machine [20]. We experimented with implementing message-passing communications software on raw ATM Adaptation Layers and conclude that, since this approach has not been widely supported by ATM vendors, it will not be feasible in the long term.

We conclude from our experiments that using a well-known, standard protocol such as IP is the easiest and most portable approach. Most manufacturers provide for the encapsulation of IP packets within their proprietary protocols (such as ATM's LAN Emulation). IP promises to be around for a long time with the advent of IPv6, which features IP tunneling, allowing standard IP traffic to be encapsulated within IPv6 packets [18].

In summary, we have found that while 10 MBit/s desktop connections are still very common, most current machines are capable of effectively utilising a 100 MBit/s connection. For clusters of workstations with medium bandwidth requirements, 100 MBit/s Fast Ethernet is a viable and affordable solution. If high bandwidth is required then a trade-off between cost and performance is necessary: SCI provides the best latency, but its measured bandwidth is nowhere near as high as Myrinet's; Myrinet has the largest bandwidth of all the systems we consider, and its latency is still under a millisecond. If a larger latency can be tolerated, and bandwidth is not critical, then we believe the open standards of Gigabit Ethernet may be an appropriate choice. We reiterate that we believe the economics of large markets will ensure Gigabit Ethernet technology becomes widespread for cluster computing.
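As a small illustration of the portability argument above, the sketch below times a TCP round trip using only the standard Java socket API. It assumes an echo-style service is already listening on the given host and port (our own choice for illustration); the same code runs unchanged whether the underlying link is Fast Ethernet, Gigabit Ethernet, or ATM carrying IP via LAN Emulation.

    import java.io.DataInputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.Socket;

    // Times a TCP round trip over whatever link layer happens to be carrying IP.
    public class RoundTripTimer {
        public static void main(String[] args) throws IOException {
            String host = args[0];                  // remote node, e.g. at the far end of a WAN link
            int port = Integer.parseInt(args[1]);   // port of an assumed echo service
            byte[] payload = new byte[8008];        // same size as the largest ping packet used earlier

            Socket socket = new Socket(host, port);
            try {
                long start = System.currentTimeMillis();
                OutputStream out = socket.getOutputStream();
                out.write(payload);
                out.flush();
                // Wait for the full payload to come back before stopping the clock.
                new DataInputStream(socket.getInputStream()).readFully(new byte[payload.length]);
                System.out.println("round trip: " + (System.currentTimeMillis() - start) + " ms");
            } finally {
                socket.close();
            }
        }
    }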
5 Summary and Conclusions

In this article we have reviewed a number of fast networking technologies for cluster computing and related our experiences with ATM in particular. Preliminary experiments, and the experiences now being reported elsewhere, lead us to believe that the combination of Fast Ethernet and Gigabit Ethernet will be the chosen route for most general purpose Beowulf-class cluster computing systems. With the technical criteria fairly finely balanced, and only factor-of-two advantages of one technology over another, we believe the economics of the mass market will dictate which technology becomes most widespread. This is of course a feedback phenomenon, as cheaper switches and network cards will be adopted even more widely. We believe ATM may still have a role to play in wide-area cluster computing systems, but it is preferable that it be transparent to cluster users and cluster system software. Internet Protocol will surely be implemented on top of ATM, and users and cluster software will continue to interface to that.

We believe there are still many interesting developments to be made in software to manage wide-area clusters as well as high-performance clusters. It seems important to recognise that while message-passing level technology can play a useful part in high performance systems, considerable research remains to be done to address the latency and reliability issues for wide-area cluster systems.
6 Acknowledgements

Thanks to J.A. Mathew for assistance in carrying out some of the measurements reported in this work. It is also a pleasure to thank all those who helped in conducting trials of the EBN: K.J. Maciunas, D. Kirkham, S. Taylor, M. Rezny, M. Wilson and M. Buchhorn.
References

[1] ATM Forum. Available at http://www.atmforum.com Last visited December 1999.

[2] N. Boden, R. Felderman, A. Kulawik, C. Seitz, J. Seizovic and W. Su. Myrinet: A Gigabit-per-Second Local Area Network. IEEE Micro, 15(1):29–36, February 1995.

[3] Rajkumar Buyya, Mark Baker, Ken Hawick and Heath James, eds. Proc. First IEEE Computer Society International Workshop on Cluster Computing (IWCC'99), IEEE Computer Society, December 1999. ISBN 0-7695-0343-8.

[4] CASA Project. The CASA Gigabit Testbed Project. Available at http://www.nlanr.net/CASA/casa.html Last visited December 1999.

[5] P. D. Coddington, K. A. Hawick and H. A. James. Web-Based Access to Distributed, High-Performance Geographic Information Systems for Decision Support. Technical Report DHPC-037, Advanced Computational Systems CRC, Department of Computer Science, University of Adelaide, June 1998.

[6] William J. Cronin, Jerry D. Hutchinson, K. K. Ramakrishnan and Henry Yang. A Comparison of High Speed LANs. In Proc. IEEE 9th Conf. on Local Computer Networks (LCN), October 1994.

[7] Ian Foster and Carl Kesselman. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, Inc., 1999. ISBN 1-55860-475-8.

[8] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek and V. Sunderam. PVM: Parallel Virtual Machine. A Users' Guide and Tutorial for Networked Parallel Computing. MIT Press, 1994.

[9] Gigabit Ethernet Alliance. Available at http://www.gigabit-ethernet.org/ Last visited December 1999.

[10] Andrew S. Grimshaw and Wm. A. Wulf. Legion – A View From 50,000 Feet. In Proc. Fifth IEEE Int. Symp. on High Performance Distributed Computing, Los Alamos, California, August 1996. IEEE Computer Society Press.

[11] IEEE Standards On-line. IEEE Standard 802.3u Fast Ethernet Standard. Available at http://standards.ieee.org Last visited December 1999.

[12] IEEE Standards On-line. IEEE Standard 1596 Scalable Coherent Interface (SCI). Available at http://standard.ieee.org Last visited December 1999.

[13] K. A. Hawick. Meta-Cluster Computing. Technical Report DHPC-074, Department of Computer Science, The University of Adelaide, November 1999.

[14] K. A. Hawick, H. A. James, K. J. Maciunas, F. A. Vaughan, A. L. Wendelborn, M. Buchhorn, M. Rezny, S. R. Taylor and M. D. Wilson. Geographic Information Systems Applications on an ATM-Based Distributed High Performance Computing System. In Proc. HPCN'97, Vienna, Austria, August 1997.

[15] K. A. Hawick, H. A. James, K. J. Maciunas, F. A. Vaughan, A. L. Wendelborn, M. Buchhorn, M. Rezny, S. R. Taylor and M. D. Wilson. Geostationary-Satellite Imagery Applications on Distributed, High-Performance Computing. In Proc. HPC Asia'97, August 1997.

[16] K. A. Hawick, H. A. James, A. J. Silis, D. A. Grove, K. E. Kerry, J. A. Mathew, P. D. Coddington, C. J. Patten, J. F. Hercus and F. A. Vaughan. DISCWorld: An Environment for Service-Based Metacomputing. Future Generation Computing Systems (FGCS), 15:623–635, 1999.

[17] High Performance Fortran Forum (HPFF). High Performance Fortran Language Specification. Available from http://www.crpc.rice.edu/HPFF/

[18] Internet Engineering Task Force. IPv6 Standard. Available from http://www.ietf.org

[19] Heath A. James. Scheduling in Metacomputing Systems. PhD Thesis, Department of Computer Science, The University of Adelaide, July 1999.

[20] Heath A. James. Using ATM in Distributed Applications. Honours Thesis, Department of Computer Science, The University of Adelaide, 1995.

[21] D. Kirkham. Telstra's Experimental Broadband Network. Telecommunications J. Australia, 45(2), 1995.

[22] C. Krumann and T. Stricker. A Comparison of Two Gigabit SAN/LAN Technologies: Scalable Coherent Interface and Myrinet. In Proc. SCI Europe, EMMSEC'98, September 1998.

[23] Jens Mache. An Assessment of Gigabit Ethernet as Cluster Interconnect. In Proc. First IEEE Computer Society International Workshop on Cluster Computing (IWCC'99), December 1999.

[24] J. A. Mathew and K. A. Hawick. ATM Performance Characteristics on Distributed High Performance Computers. Technical Report DHPC-016, Department of Computer Science, University of Adelaide, 1997.

[25] Message Passing Interface Forum. MPI: A Message Passing Interface Standard. Available from http://www.mpi-forum.org

[26] Salim Hariri and Geoffrey Fox. Applications and Enabling Technology for NYNET Upstate Corridor. Technical Report 642, Northeast Parallel Architectures Center, Syracuse University, November 1994.

[27] Object Management Group. CORBA/IIOP 2.2 Specification. Available from http://www.omg.org/corba/cichpter.html July 1998.

[28] Satoshi Sekiguchi, Mitsuhisa Sato, Hidemoto Nakada and Umpei Nagashima. Ninf: Network-based information library for globally high performance computing. In Proc. Parallel Object-Oriented Methods and Applications (POOMA), February 1996.

[29] Bruce Tolley, 3Com Corporation. The Lowdown on High Speed: Gigabit Ethernet. Available at http://www.3com.com/technology/layer3sw2/layer3sw2c.html