Proc. HPCN Europe '97, Vienna, 28-30 April 1997, Technical Note DHPC-002.
An ATM-based Distributed High Performance Computing System
November 1996
K.A.Hawick*, H.A.James, K.J.Maciunas, F.A.Vaughan, A.L.Wendelborn
Department of Computer Science, University of Adelaide, SA 5005, Australia
and
M.Buchhorn, M.Rezny, S.R.Taylor, M.D.Wilson
Australian National University, Canberra, ACT 0200, Australia
The RDN CRC Distributed High Performance Computing Project
* Author for correspondence. Email: [email protected], Fax: +61 8 8303 4366, Tel: +61 8 8303 4519.
Abstract. We describe the distributed high performance computing system
we have developed to integrate a heterogeneous set of high performance computers, high capacity storage systems and fast communications hardware. Our system is based upon Asynchronous Transfer Mode (ATM) communications technology, and we routinely operate between the geographically distant sites of Adelaide and Canberra (separated by some 1100 km) using Telstra's ATM-based Experimental Broadband Network (EBN). We discuss some of the latency and performance issues that result from running day-to-day operations across such a long distance network. In addition to reviewing the hardware and systems software we have used, we relate some of the experience we have gained from integrating this technology into our working environment. We believe this type of distributed computing system has great potential for a range of distributed applications, which we also present.
Introduction

Distributed High Performance Computing (DHPC) is still a novel and challenging technology which is only now becoming feasible over very long distance networks. In this paper we discuss some of the issues of using such networks in a day-to-day project environment and of integrating this new technology with the traditional distributed computing equipment found in present-day university departments and industrial environments. We have built a distributed high performance computing system with computational and storage resources sited at the University of Adelaide and at the Australian National University in Canberra. These resources are connected using the ATM-based Experimental Broadband Network (EBN) provided by Telecom Australia (Telstra) [15], and are illustrated in figure 1. Central to our system are the ATM switches and routers which interface with the Telstra network and which provide a star-shaped interconnection to our compute and storage platforms. Major
computational resources include the 64-processor Thinking Machines Connection Machine (CM5) and the 20-processor Silicon Graphics Power Challenge. We also operate a 20 GB Redundant Array of Inexpensive Disks (RAID) storage capability which is closely integrated with a Q47 DLT tape silo at Adelaide and provides the basis for a hierarchical file store; the RAID can be used as a cache for the tape silo. Our combined system is designed to be capable of processing and storing datasets of terabyte size.
[Figure 1 shows the DHPC hardware at Adelaide and Canberra, connected locally over OC-3c (155 Mbps) multi-mode fibre through a FORE ASX-1000 ATM switch and a DEC Gigaswitch, and between sites by E-3 (34 Mbps) links on Telstra's EBN via Melbourne, Sydney, Canberra and Brisbane: DEC Alpha workstation farms, SUN compute and file servers, an SGI Power Challenge, the TMC CM5, StorageWorks RAID arrays, the Q47 DLT tape library and X-terminals.]
Fig. 1. DHPC Project Hardware Resources at Adelaide and Canberra

Our day-to-day project operations are based around two farms of DEC Alpha platforms, one each in Adelaide and Canberra. These platforms have various roles as compute, file, security, time and cell directory servers; some also act as user workstations. The Alpha farms are arranged as two cells running the Distributed Computing Environment (DCE) [17] and, although each site can operate as a separate cell, they are closely integrated, with some disks cross-mounted using the Network File System (NFS) and other services such as the Distributed File System (DFS).

We believe one of the most important success criteria for DHPC technology is that it should be integrable with existing infrastructure in general, and with heterogeneous hardware platforms and operating systems in particular. Our present system is based around platforms which use Unix-like operating systems: IRIX (SGI), Digital Unix (DEC) or Solaris (Sun), and we are presently experimenting with integrating PC platforms running Windows NT. Our compute resources are dual-interfaced so that they can communicate using conventional 10 Mbps ethernet as well as the 155 Mbps ATM connection on optical fibre. This has given us a degree of flexibility in experimenting with services running solely on the ATM interconnect that would have been difficult without being able to serve files by some other means. For instance, our system has some user file stores cross-mounted from a conventional Unix environment on machines maintained by the Adelaide University Department of Computer Science.
ATM and the EBN

Asynchronous Transfer Mode (ATM) [12] is a collection of communications protocols for supporting integrated data and voice networks. ATM was developed as a standard for wide-area broadband networking but also finds use as a scalable local area networking technology. ATM is a best-effort delivery system, sometimes known as bandwidth-on-demand, whereby users can request and receive bandwidth dynamically rather than at a fixed, predetermined (and paid-for) rate. ATM guarantees that cells transmitted in a sequence will be received in the same order. ATM technology provides cell switching and multiplexing, and combines the advantages of packet switching, such as flexibility and efficiency for intermittent traffic, with those of circuit switching, such as constant transmission delay and guaranteed capacity.

Although a number of wide area broadband networks have been built in the USA [16], [13], it is unusual for ATM technology to be fully integrated over very long distances rather than used for local area networking. Telstra have built the Experimental Broadband Network (EBN) [15] to provide the foundation for Australian broadband application development. Major objectives are to provide a core network on which service providers and customers can collaborate in the development and trial of new broadband applications, and to allow Telstra and developers to gain operational experience with public ATM-based broadband services.
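The cell-based transport underlying these properties can be illustrated with a small sketch (a conceptual illustration only, not FORE's or any other vendor's API): an ATM cell carries a 48-byte payload behind a 5-byte header, and a larger message is segmented into a sequence of such cells that the network delivers in order.

  # Conceptual illustration of ATM cell segmentation: every ATM cell is 53 bytes,
  # a 5-byte header plus a 48-byte payload. Real AAL5 adds padding and an 8-byte
  # trailer carrying a length field and CRC; this sketch ignores those details.

  CELL_PAYLOAD = 48

  def segment(data: bytes, vpi: int = 0, vci: int = 100) -> list[tuple[int, int, bytes]]:
      """Split data into (vpi, vci, payload) cells, padding the final cell."""
      cells = []
      for offset in range(0, len(data), CELL_PAYLOAD):
          chunk = data[offset:offset + CELL_PAYLOAD]
          chunk = chunk.ljust(CELL_PAYLOAD, b"\x00")   # pad a short final cell
          cells.append((vpi, vci, chunk))
      return cells

  print(len(segment(b"x" * 1000)))   # 1000 bytes -> 21 cells of 48-byte payload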
Fig. 2. Telstra's Experimental Broadband Network (EBN)

The current layout of the EBN is shown in figure 2, and some of its latency and performance aspects are discussed below.
Latency and Performance

We have experimented extensively with DHPC resources at Adelaide and Canberra in assessing the capabilities of a distributed high performance computing system
operating across long distances. It is worth considering the fundamental limitations involved in such very long distance networks. Telstra's EBN, shown in figure 2, currently connects Adelaide, Melbourne, Sydney, Canberra and Brisbane. Although we employ OC-3c (155 Mbps) multi-mode fibre for local area networking, we are restricted to an E-3 (34 Mbps) interface card to connect Adelaide to Melbourne and hence to Canberra. The line-of-sight distances involved in parts of the network are: Adelaide/Melbourne 660 km; Melbourne/Canberra 467 km; Melbourne/Sydney 710 km; and Sydney/Brisbane 732 km. The resulting effective network distances between Adelaide and the other cities are shown in table 1.

EBN City     Network Distance from Adelaide (km)   Light-speed-Limited Latency (ms)
Melbourne    660                                   2.2
Canberra     660 + 467 = 1127                      3.8
Sydney       660 + 710 = 1370                      4.6
Brisbane     660 + 710 + 732 = 2102                7.0

Table 1. Inter-city network distances (from Adelaide) and one-way light-speed-limited latencies for the Experimental Broadband Network.
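The arithmetic behind table 1 is simply route distance divided by the speed of light; the short sketch below (in Python, purely as a worked check of the quoted figures) also prints the corresponding round-trip limits used later in the text.

  # Light-speed-limited latency over the EBN route distances quoted in the text,
  # a back-of-the-envelope check of table 1; it ignores the fibre refractive
  # index, switching delays and the true cable route.

  C_VACUUM_KM_PER_MS = 2.9978e5 / 1000.0  # speed of light: ~299.78 km per ms

  route_km_from_adelaide = {
      "Melbourne": 660,
      "Canberra": 660 + 467,
      "Sydney": 660 + 710,
      "Brisbane": 660 + 710 + 732,
  }

  for city, km in route_km_from_adelaide.items():
      one_way_ms = km / C_VACUUM_KM_PER_MS
      print(f"{city:10s} {km:5d} km  one-way {one_way_ms:4.1f} ms  round-trip {2 * one_way_ms:4.1f} ms")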
The light-speed-limited latencies shown in table 1 are calculated on the basis of the light-speed in vacuo (2.9978 × 10^5 km/s). It should therefore be noted that this is a fundamental physics limitation and does not take into consideration implementation details, the most important of which are that the EBN does not necessarily use fibre over its entire length, that the actual route used is almost certainly longer than the 'city route-map' distances quoted here, and that electrically carried signals propagate more slowly than light-speed. The important point is that over a network of this length some significant latency is unavoidable, and applications intended to run over such distances must be developed with this in mind.

We have made some crude network-performance measurements using the Unix ping utility, which uses the Internet Control Message Protocol (ICMP), for various packet sizes initiated at Adelaide and bounced back from processes running at other networked sites. These are shown in table 2. By varying the packet sizes sent it is possible to derive crude latency and bandwidth measurements. It should be emphasized that these measurements are approximations of what is achievable and are only for comparison with the latency limits in table 1. The times in table 2 are averaged over 30 pings and represent round-trip times. Measurements are all to a precision of 1 ms, except those for Syracuse, which suffered significant packet loss and variations suggesting an accuracy of 20 ms is more appropriate.

Ping Packet   Mean Time           Mean Time            Mean Time             Mean Time
Size          Canberra via EBN    Syracuse USA         local machine         local machine
(Bytes)       (ms)                via Internet (ms)    via ethernet (ms)     via ATM switch (ms)
64            15                  340                  0                     0
1008          16                  367                  3                     0
2008          17                  387                  6                     1
4008          18                  405                  9                     1
6008          20                  428                  14                    1
8008          22                  441                  18                    2

Table 2. Approximate performance measurements using ping.

The ping-measured latency between Adelaide and Canberra appears to be approximately 15 ms, to be compared with the theoretical round-trip limit of 7.6 ms. The switch technology has transit delays of approximately 10 µs per switch; depending upon the exact number of switches in the whole system this could approach a measurable effect, but it is beyond the precision of ping to resolve. We believe that, allowing for slower-than-vacuum propagation speeds in the actual limit, our measured latency is within a factor of two of the best achievable. Variations caused by factors such as the exact route the EBN takes, the slower signal propagation over terrestrial copper cables, the router and switch overheads, and small overheads in initiating the ping, all combined, satisfactorily explain the discrepancy in latencies. The EBN appears to provide close to the best reasonably achievable latency.

Also of interest is the bandwidth that can be achieved. The actual bandwidth achieved by a given application will vary depending upon the protocols, buffering layers and other traffic on the network, but these ping measurements suggest an approximate value of 2 × 8 kB / (22 − 15 = 7 ms) ≈ 2900 kB/s ≈ 22.7 Mbps. This represents approximately 84% of the 27 Mbps of bandwidth available to us on what is an operational network. The Unix utility ttcp is useful for bandwidth measurements that are outwith the resolution possible with ping. A typical achievable bandwidth between local machines on the operational 155 Mbps fibre network is 110.3 Mbps, compared with a typical figure of 6.586 Mbps on local 10 Mbps ethernet. Both figures are representative of a busy network carrying other user traffic.
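The difference method behind this estimate attributes the extra round-trip time of a larger ping packet to serialisation of the extra bytes. A minimal sketch follows (in Python, as an illustration only; the exact figure depends on whether nominal 8 kB packets or the precise 64- and 8008-byte sizes are used, so it will not reproduce the quoted 22.7 Mbps exactly).

  # Crude latency and bandwidth estimate from two ping round-trip times: the
  # extra time taken by a larger packet is attributed to serialisation of the
  # extra (round-trip) bytes, and the small packet's time to pure latency.

  def latency_and_bandwidth(size_small, rtt_small_ms, size_large, rtt_large_ms):
      """Return (one-way latency estimate in ms, bandwidth estimate in Mbps)."""
      extra_bytes = 2 * (size_large - size_small)        # data travels out and back
      extra_seconds = (rtt_large_ms - rtt_small_ms) / 1e3
      bandwidth_mbps = extra_bytes * 8 / extra_seconds / 1e6
      latency_ms = rtt_small_ms / 2.0
      return latency_ms, bandwidth_mbps

  # Adelaide-Canberra figures from table 2: 64-byte and 8008-byte pings.
  print(latency_and_bandwidth(64, 15, 8008, 22))   # roughly (7.5 ms, ~18 Mbps)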
Systems Software

Our DHPC system links together a range of heterogeneous computing resources. In addition to the Distributed Computing Environment (DCE) operating-system-level software mentioned earlier, we are investigating systems software technologies that provide good integration of services and information access. A number of management systems for clustered computing have been developed [3], and several are available through the US National HPCC Software Exchange [5]. These include low-level message passing tools of various degrees of robustness, scheduling tools, and batching environments. An interesting development is systems software to manage the distribution of user simulations over a distributed computing system; we are investigating the Nimrod [1] system as a possible mechanism for this (the sketch at the end of this section illustrates the underlying pattern). A number of user-controlled message passing environments have been available for some time and provide mechanisms to distribute work over a DHPC system. Of particular interest are systems such as Isis [4] and Horus [18] that provide a degree of robustness and reliability when managing distributed applications. Another system, developed specifically to cope with wide area communications between application programs, is the Nexus [9] software, which was developed for the Globus project [8] in the USA.
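The parameter-sweep pattern that tools such as Nimrod automate can be sketched as follows; this is a hypothetical illustration in Python, not Nimrod's interface, and the host names and the simulate executable are placeholders.

  # A hypothetical parameter-sweep dispatcher, illustrating the pattern that
  # tools such as Nimrod automate (this is not Nimrod's interface): independent
  # simulation runs, one per parameter set, are farmed out to a pool of hosts
  # via remote execution.

  import itertools
  import subprocess
  from concurrent.futures import ThreadPoolExecutor

  hosts = ["alpha1.example.edu.au", "alpha2.example.edu.au"]   # placeholder host names
  parameters = [{"nx": nx, "dt": dt} for nx, dt in itertools.product([64, 128], [0.1, 0.01])]

  def run(job):
      host, params = job
      args = " ".join(f"--{k}={v}" for k, v in params.items())
      # Launch one simulation remotely; 'simulate' is a placeholder executable.
      return subprocess.run(["ssh", host, f"./simulate {args}"],
                            capture_output=True, text=True).returncode

  with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
      jobs = zip(itertools.cycle(hosts), parameters)
      print(list(pool.map(run, jobs)))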
Applications Implications

Wide area ATM-based distributed computing offers a range of possibilities for applications development in areas crucial to industrial and commercial interests [11]. Many existing applications can be made to operate across such a network using existing communications protocols layered on top of the Internet Protocol (IP), which can itself be provided on top of the ATM technology. This is the preferred route in many cases since it preserves software portability. It is also possible to develop applications using proprietary and other application programming interfaces that communicate directly with the ATM technology. For example, FORE Systems provide an API that can communicate directly with the ATM adaptation layer (AAL5) [12].

There has been considerable recent interest in the concept of meta-computing [7], driven in part by the possibilities of distributed applications running from World Wide Web interfaces [10]. We are currently prototyping applications that will build on some of the systems software packages discussed above to provide users with access to a number of information processing, storage and access services operating over a distributed computing system [14]. One such is the distributed geographic information system we are building to archive, process and provide user access to geostationary satellite data [14].

A significant application that we have demonstrated across the EBN employs the Mathserver package [19]. Mathserver is a client-server package using remote procedure calls (RPC) which enables an application program executing on one computer to remotely execute numerical routines on other computers within a network; a conceptual sketch of this pattern is given at the end of this section. We have also carried out video conferencing experiments using Multicast Backbone (MBONE) [6] tools such as vic, vat and nv.

Our DHPC project contributed to a demonstration of the potential of distributed high performance computing technology for exchanging and processing Earth observation data. The demonstration was given at the 10th Plenary of the international Committee on Earth Observation Satellites (CEOS), held in Canberra in November 1996 under the chairmanship of Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO). The EBN was used to transfer satellite datasets from Adelaide to Canberra, and the Alpha farm in Canberra was used to perform a distributed warp transformation and 3D terrain fly-through. Further details can be found at [20].
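The remote-evaluation pattern that Mathserver provides can be sketched generically; the example below uses Python's standard xmlrpc library purely as a stand-in RPC layer, and the routine solve_linear_system is a toy placeholder rather than anything from Mathserver's actual API.

  # A generic sketch of the remote-procedure-call pattern used for remote
  # numerical evaluation: the client invokes a routine as if it were local,
  # while the computation runs in a server process. Python's xmlrpc library
  # stands in for the real RPC layer; the routine is a hypothetical example.

  from xmlrpc.server import SimpleXMLRPCServer
  from xmlrpc.client import ServerProxy
  import threading

  def solve_linear_system(a, b):
      """Toy numerical routine executed on the server (2x2 Cramer's rule)."""
      det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
      return [(b[0] * a[1][1] - b[1] * a[0][1]) / det,
              (b[1] * a[0][0] - b[0] * a[1][0]) / det]

  server = SimpleXMLRPCServer(("localhost", 8800), logRequests=False)
  server.register_function(solve_linear_system)
  threading.Thread(target=server.serve_forever, daemon=True).start()

  # The client calls the routine as if it were local; the work runs remotely.
  client = ServerProxy("http://localhost:8800")
  print(client.solve_linear_system([[2, 1], [1, 3]], [3, 5]))   # -> [0.8, 1.4]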
Discussion

We have reviewed some of the technologies appropriate for a distributed high performance computing (DHPC) system and have described some aspects of the hardware and software of the system we have built using Telstra's Experimental Broadband Network. We have presented some of the performance issues, and we conclude by noting that a DHPC system may be viewed not simply as a collection of hardware and systems software but as a set of information processing, storage and delivery services that can be built on specialist computing resources.
We are proceeding towards major applications-oriented demonstrations of our system, which is constructed using the following layers:
- compute service-agent and broker;
- data resource service-agent and broker;
- infrastructure layer.
Service-agents provide a mechanism by which very complex services are encapsulated and presented in such a manner that they may be combined into high-level actions. In particular, service-agents utilize remote execution protocols, expressed either in the Java language [2] or in other software technologies appropriate for interfacing with the World Wide Web. We are building our implementation on top of existing software technologies such as those discussed earlier.
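A deliberately simplified sketch of this layering follows; all class, method and endpoint names are hypothetical illustrations rather than our actual interfaces, which are layered over DCE, Web and Java technologies.

  # A deliberately simplified sketch of the service-agent / broker layering
  # described above. All names are hypothetical placeholders.

  class ServiceAgent:
      """Encapsulates one complex service behind a single high-level action."""
      def __init__(self, action, endpoint):
          self.action, self.endpoint = action, endpoint

      def invoke(self, request):
          # A real agent would perform remote execution at self.endpoint.
          return f"{self.action} handled at {self.endpoint}: {request}"

  class Broker:
      """Matches requested high-level actions to registered service-agents."""
      def __init__(self):
          self.agents = {}

      def register(self, agent):
          self.agents[agent.action] = agent

      def submit(self, action, request):
          return self.agents[action].invoke(request)

  broker = Broker()
  broker.register(ServiceAgent("warp-transform", "compute.adelaide"))     # hypothetical
  broker.register(ServiceAgent("archive-retrieve", "storage.adelaide"))   # hypothetical
  print(broker.submit("warp-transform", {"dataset": "satellite scene"}))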
Acknowledgments

Distributed High Performance Computing (DHPC) is a project of the Research Data Networks Cooperative Research Centre (RDN CRC); it is managed by the Advanced Computational Systems CRC and is a joint activity of the University of Adelaide and the Australian National University. Some of the computational resources used in this work are owned by the South Australian Centre for Parallel Computing (SACPC), located in Adelaide. We thank Telstra for the provision of the Experimental Broadband Network.
References

1. "Nimrod: A Tool for Performing Parametised Simulations using Distributed Workstations", D. Abramson, R. Sosic, J. Giddy, B. Hall, Proc. 4th IEEE Symposium on HPDC, Virginia, August 1995.
2. "Java Sourcebook - A Complete Guide to Creating Java Applets for the Web", Ed Anuff, Pub. Wiley, 1996, ISBN 0-471-14859-8.
3. "A Review of Commercial and Research Cluster Management Software", Mark A. Baker, Geoffrey C. Fox and Hon W. Yau, National HPCC Software Exchange Technical Note, June 1996.
4. "The ISIS Project: Real experience with a fault tolerant programming system", Kenneth Birman and Robert Cooper, Operating Systems Review, April 1991, pp. 103-107; ACM/SIGOPS European Workshop on Fault-Tolerance Techniques in Operating Systems, Bologna, Italy, 1990.
5. "Prototype of the National High Performance Software Exchange", Shirley Browne, Jack Dongarra, Stan Green, Keith Moore, Tom Rowan, Reed Wade, Geoffrey Fox, Ken Hawick, Ken Kennedy, Jim Pool, Rick Stevens, Bob Olson, Terry Disz, IEEE Computational Science & Engineering, Summer 1995, pp. 62-69.
6. "First Internet Engineering Task Force Internet Audiocast", S. Casner and S. Deering, ACM SIGComm Computer Communications Review, July 1992, pp. 92-97.
7. "A Scalable Paradigm for Effectively-Dense Matrix Formulated Applications", G.Cheng, G.C.Fox and K.A.Hawick, Proc. HPCN 1994, Munich, Germany, April 1994, Volume 2, p. 202.
8. "Globus: A Metacomputing Infrastructure Toolkit", Ian Foster and Carl Kesselman, http://www.globus.org, 1996.
9. "The Nexus Approach to Integrating Multi-threading and Communication", Ian Foster, Carl Kesselman, Steven Tuecke, MCD Technical Note, Argonne National Laboratory, 1996.
10. "The Electronic InfoMall - HPCN enabling Industry and Commerce", G.C.Fox, K.A.Hawick, M.Podgorny, K.Mills, Proc. HPCN Europe 1995, The International Conference on High Performance Computing and Networking, Milan, 3-5 May 1995.
11. "Characteristics of HPC Scientific and Engineering Applications", G.C.Fox, K.A.Hawick and A.B.White, Report of Working Group 2: Proc. Second Pasadena Workshop on System Software and Tools for High Performance Computing Environments, January 10-12, 1995.
12. "ATM Networks - Concepts, Protocols, Applications", R. Handel, M.N. Huber, S. Schroder, Pub. Addison-Wesley, 1994, ISBN 0-201-42274-3.
13. "Applications and Enabling Technology for NYNET Upstate Corridor", Salim Hariri, Geoffrey Fox, Northeast Parallel Architectures Center (NPAC), Technical Report SCCS-642, November 1994.
14. "Geographic Information Systems Applications on an ATM-Based Distributed High Performance Computing System", K.A.Hawick, H.A.James, K.J.Maciunas, F.A.Vaughan, A.L.Wendelborn, M.Buchhorn, M.Rezny, S.R.Taylor and M.D.Wilson, Submitted to HPCN '97.
15. "Telstra's Experimental Broadband Network", D.Kirkham, Telecommunications Journal of Australia, Vol 45, No 2, 1995.
16. "CASA Gigabit Network Testbed: Final Report", Paul Messina and Tina Mihaly-Pauna, Editors, CASA Technical Report 123, July 1996.
17. "Introduction to OSF DCE", The Open Software Foundation, Pub. Prentice Hall, 1995, ISBN 0-13-185810-6.
18. "Horus, a Flexible Group Communication System", Robbert van Renesse, Kenneth P. Birman and Silvano Maffeis, Communications of the ACM, April 1996.
19. "Aspects of High Performance Computing", Michael Rezny, PhD Thesis, The University of Queensland, June 1995.
20. The distributed high performance computing technology demonstration to the Committee on Earth Observation Satellites (CEOS) is described at the URL: http://acsys.anu.edu.au/special/CEOS/demos.