A Distributed Computing Experiment in a Metropolitan Network - Another Attempt to Implement Grid Computing

Ping Chen
Network Center, Peking University, Beijing, P.R. China, 100871
[email protected]

John H. Hine
School of Mathematical and Computing Sciences, Victoria University of Wellington, Wellington, New Zealand
[email protected]
Abstract

Grid Computing is a wide-area computing infrastructure viewed by many as a promising next-generation platform for solving large-scale problems. Most current research projects concentrate on constructing middleware to provide pervasive, dependable, and consistent access to geographically distributed computational resources connected by a wide-area network. As an alternative, we are studying the feasibility of sharing computational resources connected to a high-speed city-wide network using mature local network technologies such as PVM (Parallel Virtual Machine). This paper describes an experiment to run a distributed meteorological computing model over Wellington's metropolitan network.

Keywords: Grid Computing, Metropolitan Network, PVM

1. Introduction

Advances in network technology have increasingly blurred the distinction between local area networks, metropolitan area networks and wide-area networks. Common local network technologies, such as fiber optics, have moved out of enterprises, into the city, and increasingly into wide area networks. The bandwidth of some well-developed metropolitan-area networks is now equal to or greater than that of local networks, and current research suggests that this trend will continue.

This rapid development of network technology makes new and unique applications possible. Wide-area distributed computing is one of these. Computations that used to be run on parallel supercomputers first migrated to local networks in the early 1990s. Supporting middleware such as MPI and PVM matured during this period; its principal goal was the support of distributed computing over local area networks. The rapid increase in the bandwidth of wide area networks in recent years has made sharing computing resources over a wider network attractive, and grid computing describes efforts to extend large parallel computations to wide area networks. Of course, bandwidth is not the only attribute of a network: while the bandwidth of wide area networks has increased, connections must still suffer the latency introduced by the switching fabric. In this paper we assess how well a local area product, PVM, works in a metropolitan environment.

The organization of this paper is as follows. In the next section, we briefly introduce the idea of Grid Computing and the typical infrastructures implementing this concept. In the third section, we describe the setting of our experiment: Wellington's CityLink network, the PVM middleware we used, and the ARPS computing application. Next, the experiment and its results are described. The last section presents our conclusions.
2. Grid Computing

Grid Computing is an infrastructure focusing on wide-area computing. A grid-based computational infrastructure is a promising next-generation computing platform for solving large-scale, resource-intensive problems [1]. It couples a wide variety of geographically distributed computational resources (such as PCs, workstations, and clusters), storage systems, data sources, databases, computational kernels, and special-purpose scientific instruments, and presents them as a unified, integrated resource. Its ultimate goal is to securely and affordably link people to global resources [3].

According to the papers indexed under Grid Computing (Distributed Systems Online [5], IEEE Computer Society), the concept of wide-area computing first appeared in the early 1990s [3] and became popular in 1997. Currently, more than fifty projects are underway in the United States, Europe and the Asia-Pacific region. These projects can be classified under one or more of Application Community Initiatives, General Purpose Grid Infrastructure Initiatives, and General-Purpose Grid Technologies and Projects [10]. Under the third category, GLOBUS [2] grew out of I-WAY [8], one of the earliest wide-area computing models; most of the major grid-related projects have adopted core grid technologies from its Toolkit [9]. EcoGrid [7], which is being built at Monash University, Australia, emphasizes "computational economy": grid users interact with a Grid Resource Broker (GRB) that discovers geographically distributed resources, negotiates service costs, and selects the most economical resources for the computation [1]. Other well-known grid technologies include Condor [11][12] and Legion [15][16].

This paper describes a different approach to wide-area computing problems that involve parallel computation. Rather than develop new, complex middleware, we have used a mature local network distributed computing technology, Parallel Virtual Machine (PVM). Compared to typical grid computing middleware, PVM is a simple and widely used technology. We investigate PVM's ability to solve distributed computing problems on metropolitan area networks.

3. The Experiment

Wellington has had a fiber optic network through its central business district for nearly a decade. The network now extends beyond the city center, connecting a large number of organizations at a minimum of 10 Mbps. We chose to use this network to test the hypothesis that local network distributed computing technology could be extended to a metropolitan area network. Our goal was to demonstrate that mature, stable software was already available for selected forms of grid computing.

3.1 Wellington city network - CityLink

CityLink is a 1 Gbps fiber optic backbone covering the Wellington Central Business District and extending into other parts of the city. It provides a permanent high-bandwidth connection for business users, who can choose to lease either 10 Mbps or 100 Mbps ports to connect to the Internet. CityLink also forms a major exchange point for the Internet service providers located in Wellington. Victoria University has a 10 Mbps connection to CityLink. A schematic diagram of the Wellington city and Victoria University of Wellington networks is shown in Figure 1. The bold highlighting indicates those parts of the network used in our experiments.
Figure 1

3.2 LAN of SMCS, VUW

To provide a benchmark for the experiments conducted over the metropolitan network, we also ran the same experiment on the local area network of the School of Mathematical and Computing Sciences (SMCS). The relevant network is shown in Figure 2. The SMCS LAN is a fully switched 100 Mbps network. All servers are directly connected to the 100 Mbps Ethernet ports of the central switch; desktop workstations provided to staff and students are uplinked through second-level switches. In contrast to Wellington CityLink, there is no latency caused by routing. To use CityLink we placed two workstations on CityLink, two hops from the SMCS LAN. While all systems were connected to 100 Mbps LANs, the bandwidth of the connection between SMCS and CityLink was only 10 Mbps.
Figure 2

3.3 Advanced Regional Prediction System

The Advanced Regional Prediction System (ARPS) [14] is a comprehensive regional storm-scale atmospheric modeling and prediction system developed at the Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma. It is a complete system that includes real-time data analysis and assimilation systems, a forward prediction model and a post-analysis package. ARPS was designed for the explicit representation of convective and cold-season storms. It represents the atmosphere as a three-dimensional grid of points, each holding a number of values describing the atmospheric conditions at that point at the current time. Each time step requires that new values for each point be computed from the previous values of the surrounding five-by-five neighbourhood of points, and the entire grid must be recomputed for each time step of typically 6 seconds. A sketch of this style of computation appears below.

The ability of ARPS to successfully predict storms depends critically upon the effective use of high-performance computing and communication systems. Because thunderstorms have relatively short lifetimes (a few hours) compared to larger-scale weather systems (a few days), their associated forecasts must be generated and disseminated to the public very quickly (5-10 times faster than the weather evolves) to be of practical value. It is clear that computers with sufficient sustained performance will be needed if operational storm-scale prediction is to be successful. ARPS was selected as an application that would stress test our configuration.
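To make the structure of this computation concrete, the following is a minimal sketch, not ARPS code: a single scalar field on the SMALL-sized grid is advanced with a placeholder 5x5 horizontal averaging stencil. The two-point-deep border that each interior point reads is exactly the data that must be exchanged between subdomains when the grid is partitioned across processors.

```c
/* Minimal sketch of ARPS-style explicit time stepping (not ARPS code).
 * One scalar field is advanced with a placeholder 5x5 horizontal stencil;
 * the real model updates many coupled fields with physical equations. */
#include <stdio.h>
#include <string.h>

#define NX 63
#define NY 63
#define NZ 35

static double u[NX][NY][NZ];      /* field at the current time step */
static double u_new[NX][NY][NZ];  /* field at the next time step    */

static void step(void)
{
    /* Each interior point is recomputed from the previous values of the
     * surrounding 5x5 horizontal neighbourhood.  The two outermost rows
     * and columns would be supplied by neighbouring subdomains (or by
     * boundary conditions) in the parallel version. */
    for (int i = 2; i < NX - 2; i++)
        for (int j = 2; j < NY - 2; j++)
            for (int k = 0; k < NZ; k++) {
                double sum = 0.0;
                for (int di = -2; di <= 2; di++)
                    for (int dj = -2; dj <= 2; dj++)
                        sum += u[i + di][j + dj][k];
                u_new[i][j][k] = sum / 25.0;   /* placeholder update rule */
            }
    memcpy(u, u_new, sizeof u);
}

int main(void)
{
    /* One simulated hour at a 6-second time step is 600 iterations. */
    for (int t = 0; t < 600; t++)
        step();
    printf("done\n");
    return 0;
}
```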
3.4 Parallel Virtual Machine

Parallel Virtual Machine (PVM) [13] is a parallel processing system that has gained widespread acceptance in the local network user community and is a de facto standard. PVM permits a heterogeneous collection of computers connected by a network to be used as a single large parallel computer. PVM runs on Unix and NT as well as a number of specialised parallel computers. Its support for heterogeneity makes it possible to use a wide variety of networked resources, allowing large computational problems to be solved more cost-effectively. It enables organisations to exploit existing computer hardware to solve much larger problems at minimal additional cost.

The characteristics of the computational model will influence the performance of PVM. In particular, the nature of the communication amongst the distributed components can have a significant impact. PVM is built on message passing: by sending and receiving messages, multiple tasks of an application, situated at different computing resources belonging to the same virtual machine, cooperate to solve a problem in parallel. The PVM developers have tried to provide fast and reliable message-passing and synchronisation mechanisms, and the software provides libraries tuned for specific architectures and protocols [6]. A minimal example of this message-passing style appears after the list below.

As stated, the target network of our experiment is a metropolitan network rather than a local network. Its architecture is similar to that of a local area network, but connections over this network are likely to cross a greater number of routers than connections on a local area network. If PVM runs effectively over CityLink it will introduce the possibility of running large distributed applications over resources throughout the city.

The meteorological model we chose is communication intensive; communication plays an important role in its computation, making it a good test of the communication efficiency of PVM. If we can obtain acceptable results from this communication-intensive model, we can expect even better results from less communication-intensive models in the same environment.

In conclusion, PVM was chosen for the experiment because:

• PVM has been proven on local area networks with architectures similar to Wellington CityLink.
• PVM is mature, and a PVM-ready version of ARPS meant that minimal software development was required.
• Applications can be distributed over a reasonably heterogeneous mix of architectures and system software.
• Wide-area computing resources can be fully used with minimal costs.
• Legacy code on PVM can be reused.
• PVM provides a variety of supporting communication primitives. Parallel computation benefits significantly from primitives such as group broadcast and scatter/gather.
• PVM has a graphical monitor, XPVM, which makes it easy to support, monitor and manage parallel applications.
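The following is a minimal sketch of the message-passing style that PVM supports; it is not code from ARPS-PVM, and the worker executable name "pvm_worker" is a placeholder. Only standard libpvm3 routines are used.

```c
/* Minimal PVM master sketch (not ARPS-PVM): spawn workers, send each its
 * rank, and collect one integer reply from each.  "pvm_worker" is a
 * placeholder executable name; link with -lpvm3. */
#include <stdio.h>
#include "pvm3.h"

#define NWORKERS 4

int main(void)
{
    int tids[NWORKERS];

    printf("master tid = %d\n", pvm_mytid());  /* enrol in the virtual machine */

    int started = pvm_spawn("pvm_worker", NULL, PvmTaskDefault, "",
                            NWORKERS, tids);
    if (started < NWORKERS)
        fprintf(stderr, "only %d of %d workers started\n", started, NWORKERS);

    for (int i = 0; i < started; i++) {
        pvm_initsend(PvmDataDefault);   /* XDR encoding: portable across hosts */
        pvm_pkint(&i, 1, 1);
        pvm_send(tids[i], 1);           /* message tag 1: work assignment */
    }

    for (int i = 0; i < started; i++) {
        int result;
        pvm_recv(-1, 2);                /* any sender, tag 2: result */
        pvm_upkint(&result, 1, 1);
        printf("received %d\n", result);
    }

    pvm_exit();                         /* leave the virtual machine */
    return 0;
}
```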
4. Our Experiment and Results

The goal of our experiment is to gain experience with PVM and a real application, the meteorological model ARPS, on a metropolitan area network. We compare the performance on a metropolitan network with the performance on a local area network within SMCS. We originally planned to carry out four sets of experiments:

1. Run ARPS on four identical Sun workstations on the local network of SMCS. We were confident that both ARPS and PVM would run on the Solaris OS, and this would allow us to address the problems associated with running ARPS in parallel mode. It would also establish a performance benchmark for later experiments.

2. Run ARPS on up to 30 identical NetBSD machines on the SMCS LAN. This experiment was designed to demonstrate the scalability of PVM.
3. Run ARPS on a mixture of Solaris and NetBSD systems on the SMCS LAN to demonstrate PVM's capabilities in a heterogeneous environment.

4. Run ARPS on four identical Sun workstations, two located in SMCS and two on Wellington CityLink.
4.1 Experiment 1: ARPS on four Sun Solaris workstations on the SMCS LAN

ARPS has a PVM-aware version, ARPS-PVM. Essentially, ARPS-PVM required a data file to be created and then partitioned into component parts corresponding to each of the processors being used and the desired processor matrix; ARPS-PVM could then be run under the control of PVM. (A sketch of this kind of partitioning follows the tables below.) This experiment was successfully completed using two data sets. The "SMALL" data set was based on a three-dimensional grid of 63x63x35 points and the "LARGE" data set on a three-dimensional grid of 127x127x35 points. The experiment simulated one hour of real time.

Measurements were made with each data set. A sequential run of ARPS on a single processor was used as a benchmark. The final two measurements were made with the four processors partitioned into four slices (1x4x1) and into four blocks (2x2x1). The machines used were two Sun Ultra 5s and two Ultra 10s. The results of these experiments are:

SMALL, 63x63x35 (61x61x35)
  1 processor, sequential run     72 minutes    1.00
  4 processors, 1x4 partition     36 minutes    0.50
  4 processors, 2x2 partition     44 minutes    0.65

LARGE, 127x127x35
  1 processor, sequential run    324 minutes    1.00
  4 processors, 1x4 partition    120 minutes    0.37
  4 processors, 2x2 partition    172 minutes    0.53
The four machines used in the experiment were:

Hostname    Configuration
sun1        Sun Ultra 10, 440 MHz, 640 MB, Solaris 8
lido        Sun Ultra 10, 440 MHz, 640 MB, Solaris 8
debretts    Sun Ultra 10, 333 MHz, 1 GB, Solaris 8
tahi        Sun Ultra 5, 333 MHz, 384 MB, Solaris 8
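As a rough illustration of the partitioning step, the sketch below divides the 63x63 horizontal grid among a px x py processor layout, covering the 1x4 and 2x2 arrangements used here. It is only an assumption about how a block decomposition could be carried out; ARPS-PVM's own splitting tool defines the actual subdomain layout and any overlap regions.

```c
/* Sketch of a block decomposition of the SMALL 63x63 horizontal grid over
 * a px x py processor layout (an illustration, not ARPS-PVM's splitter). */
#include <stdio.h>

/* Split n points into `parts` blocks, spreading any remainder over the
 * first blocks; returns the inclusive range [lo, hi] owned by block idx. */
static void block_range(int n, int parts, int idx, int *lo, int *hi)
{
    int base = n / parts, rem = n % parts;
    *lo = idx * base + (idx < rem ? idx : rem);
    *hi = *lo + base + (idx < rem ? 1 : 0) - 1;
}

int main(void)
{
    const int nx = 63, ny = 63;
    const int layouts[2][2] = { {1, 4}, {2, 2} };  /* partitions used here */

    for (int l = 0; l < 2; l++) {
        int px = layouts[l][0], py = layouts[l][1];
        printf("%dx%d partition:\n", px, py);
        for (int pj = 0; pj < py; pj++)
            for (int pi = 0; pi < px; pi++) {
                int xlo, xhi, ylo, yhi;
                block_range(nx, px, pi, &xlo, &xhi);
                block_range(ny, py, pj, &ylo, &yhi);
                printf("  process (%d,%d): x %2d..%2d, y %2d..%2d\n",
                       pi, pj, xlo, xhi, ylo, yhi);
            }
    }
    return 0;
}
```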
The ARPS application on these systems is communication intensive. The XPVM control console allowed us to view the execution, as shown by the snapshot in Figure 3. The pattern of computing over a portion of the grid, followed by an exchange of information at the boundaries, is clear. Significant waiting times are visible whenever data is exchanged.
Figure 3

4.2 Experiment 2: ARPS on four NetBSD machines on the SMCS LAN

The 1x4 configuration used in the first experiment was repeated on four NetBSD systems, each a 533 MHz Pentium III with 128 MB of memory. The results for the SMALL (63x63x35) data set were:

  1 processor, sequential run    302 minutes    1.00
  4 processors, 1x4 partition    235 minutes    0.78
As the results show, the elapsed times on these systems were significantly longer than those on the Suns. This may be due to the speed of the Ethernet channel, bus or file server used by these systems. Again it was clear that communication was the limiting factor.

Running ARPS and PVM on the NetBSD boxes proved troublesome: the virtual machines controlled by PVM often could not be started up properly. From this we learned that while PVM is designed to handle heterogeneous environments, ARPS-PVM was not. ARPS uses a binary format for data storage, which prevented us from performing the third set of experiments with heterogeneous systems. (The sketch below illustrates the distinction: PVM's own messages are encoded portably, but ARPS's raw binary files are not.)
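The distinction can be sketched as follows (an illustration, not ARPS-PVM code; the message tag and the peer task id are placeholders). Data packed with PVM's default encoding is converted automatically between the big-endian SPARC/Solaris and little-endian x86/NetBSD hosts, whereas the raw binary files ARPS reads and writes receive no such conversion, presumably because of byte-order and record-format differences.

```c
/* Sketch of why PVM messages cross architectures safely while raw binary
 * files do not (illustration only; tag 10 and the peer tid are placeholders).
 * PvmDataDefault packs values in XDR, so a double sent from a big-endian
 * SPARC/Solaris host is unpacked correctly on a little-endian x86/NetBSD
 * host; a file written with raw fwrite() gets no such conversion. */
#include "pvm3.h"

void send_field(int dest_tid, double *field, int n)
{
    pvm_initsend(PvmDataDefault);   /* XDR: architecture-independent encoding */
    pvm_pkdouble(field, n, 1);
    pvm_send(dest_tid, 10);
}

void recv_field(int src_tid, double *field, int n)
{
    pvm_recv(src_tid, 10);
    pvm_upkdouble(field, n, 1);     /* converted to native format on unpack */
}
```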
4.3 Experiment 4: ARPS on four Sun Solaris workstations over Wellington CityLink

For this experiment, two Sun Ultra 5 workstations were placed at a location on CityLink, and one Sun Ultra 10 and one Ultra 5 workstation were located on the SMCS LAN. Figure 1 highlights the network used in the experiment. The four machines were:

Hostname    Configuration                               Location
sun1        Sun Ultra 5, 360 MHz, 512 MB, Solaris 8     CityLink
sun2        Sun Ultra 5, 360 MHz, 512 MB, Solaris 8     CityLink
debretts    Sun Ultra 10, 333 MHz, 1 GB, Solaris 8      SMCS LAN
tahi        Sun Ultra 5, 333 MHz, 384 MB, Solaris 8     SMCS LAN

The minimum network capacity between the machines located at CityLink and those at SMCS was 10 Mbps. We used the LARGE data set from experiment one and a 2x2 processor partition. The following table compares all results obtained with this data set and partition against the single-processor benchmark.

LARGE, 127x127x35
  1 processor, sequential run               324 minutes    1.00
  4 processors, 2x2 partition (SMCS LAN)    183 minutes    0.56
  4 processors, 2x2 partition (SMCS LAN)    172 minutes    0.53
  4 processors, 2x2 partition (SMCS LAN)    171 minutes    0.53
  4 processors, 2x2 partition (CityLink)    342 minutes    1.06
  4 processors, 2x2 partition (CityLink)    270 minutes    0.83
  4 processors, 2x2 partition (CityLink)    202 minutes    0.62
As the virtual machine controlled by PVM is transparent to users, we had no control over the machines that PVM actually chose to use for a particular run. In the first experimental run on CityLink, sun1, sun2 and debretts were used, with two processes on debretts. In the second and third runs one process ran on each of the four machines. Figure 4 shows the communications in this experiment. Compared with Figure 3, there is more waiting time when the same experiment runs on the metropolitan network; this is presumed to arise from delays in communication.
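For reference, PVM's API does allow a program to request a particular host when spawning a task, although ARPS-PVM apparently performs its own default spawning and did not give us that control. A minimal sketch follows; the task name "arps_worker" is a placeholder and "debretts" is used purely as an example host from our virtual machine.

```c
/* Sketch of host-directed spawning in PVM (illustration only; the task
 * name "arps_worker" is a placeholder and "debretts" is just an example
 * host from our virtual machine). */
#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int tid;

    /* PvmTaskHost asks the PVM daemon to start the task on the named host;
     * PvmTaskDefault would leave the placement decision to PVM. */
    int started = pvm_spawn("arps_worker", NULL, PvmTaskHost, "debretts",
                            1, &tid);
    if (started != 1)
        fprintf(stderr, "spawn on requested host failed\n");

    pvm_exit();
    return 0;
}
```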
Figure 4

5. Conclusion

We have successfully run a complex task on four processors and achieved speed-ups of up to 2 on a local network and up to 1.6 on a metropolitan area network. The metropolitan network results had a higher variance, reflecting the different traffic patterns on the network at different times of day; the best results were achieved when the network was quiet. ARPS is relatively communication intensive, and we expect to be able to achieve better results with tasks that have a higher ratio of computing to communication.

While we have demonstrated the feasibility of using local network middleware on a metropolitan network, we still have a good deal to learn with respect to achieving effective distributed computing over such a network. Consider the SMALL data set as an example. The 2x2 processor partition communicates 1120 data values across 4 separate interfaces on each iteration, while the 1x4 partition communicates 2205 data values across 3 interfaces. Although the latter communicates more data, it uses fewer messages, which enables it to achieve greater speedup. Message switching delays appear to have a greater impact than bandwidth.
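These figures can be checked with a small calculation, assuming one value per grid point and a one-point-deep exchange at each internal interface (an assumption made for illustration; the actual ARPS exchange may be wider):

```c
/* Rough check of the boundary-exchange figures for the SMALL 63x63x35 grid,
 * assuming one value per point and a one-point-deep exchange per interface
 * (an assumption for illustration; the real ARPS halo may be wider). */
#include <stdio.h>

int main(void)
{
    const int nx = 63, ny = 63, nz = 35;

    /* 1x4 partition: 3 internal interfaces, each a full x-z plane. */
    printf("1x4: %d interfaces of %d values each\n", 4 - 1, nx * nz);

    /* 2x2 partition: 4 internal interfaces, each spanning roughly half of
     * the horizontal extent (ceil(63/2) = 32 points) by nz levels.       */
    printf("2x2: %d interfaces of %d values each\n", 4, ((ny + 1) / 2) * nz);

    return 0;
}
```

Running this prints 2205 values per interface for the 1x4 partition and 1120 for the 2x2 partition, matching the figures quoted above.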
The high performance of Wellington's metropolitan network contributed much to the success of our experiment. In the future, it will be possible to carry out grid computing over this sort of metropolitan area network with local network technologies such as PVM. This will provide a cost-effective way to use distributed computing resources.

6. References

[1] Rajkumar Buyya, Jonathan Giddy, David Abramson, "An Economy Grid Architecture for Service-Oriented Grid Computing", 10th IEEE International Heterogeneous Computing Workshop (HCW 2001), in conjunction with IPDPS 2001, San Francisco, California, USA, April 2001.
[2] I. Foster, C. Kesselman, "The Globus Project: A Status Report", IPPS/SPDP'98 Heterogeneous Computing Workshop, pp. 4-18, 1998.
[3] A.S. Grimshaw, W.A. Wulf, J.C. French, A.C. Weaver and P.F. Reynolds, "Legion: The Next Logical Step Toward a Nationwide Virtual Computer", Technical Report CS-94-21, University of Virginia, 1994.
[4] Shani Murray, "Grid Technology Takes on the 21st Century".
[5] http://www.computer.org/dsonline/gc
[6] Honbo Zhou, Al Geist, "Faster Message Passing in PVM", 1995.
[7] Economy Grid - http://www.csse.monash.edu.au/~rajkumar/ecogrid
[8] I. Foster, J. Geisler, W. Nickless, W. Smith, and S. Tuecke, "Software Infrastructure for the I-WAY Metacomputing Experiment", Concurrency: Practice & Experience, to appear.
[9] The Globus Project - http://www.globus.org
[10] Global Grid Forum - http://www.gridforum.org
[11] Condor High Throughput Computing - http://www.cs.wisc.edu/condor
[12] Miron Livny, Jim Basney, Rajesh Raman, and Todd Tannenbaum, "Mechanisms for High Throughput Computing", SPEEDUP Journal, Vol. 11, No. 1, June 1997.
[13] PVM (Parallel Virtual Machine) - http://www.epm.ornl.gov/pvm/pvm_home.html
[14] Advanced Regional Prediction System (ARPS) - http://www.caps.ou.edu/ARPS/
[15] Legion World Wide Virtual Computer - http://www.cs.virginia.edu/~legion/
[16] Andrew S. Grimshaw, William A. Wulf, James C. French, Alfred C. Weaver, Paul F. Reynolds Jr., "A Synopsis of the Legion Project", UVa CS Technical Report.