An Efficient Network Information Model Using NWS for Grid Computing Environments*

Chao-Tung Yang1,**, Po-Chi Shih1,2, Sung-Yi Chen1, and Wen-Chung Shih3

1 High-Performance Computing Laboratory, Department of Computer Science and Information Engineering, Tunghai University, Taichung, 40704 Taiwan, R.O.C.
[email protected]
2 Department of Computer Science, National Tsing Hua University, Hsinchu, 30013 Taiwan, R.O.C.
[email protected]
3 Department of Computer and Information Science, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C.
[email protected]
Abstract. Grid computing technologies enable large-scale aggregation and sharing of resources via wide-area networks, combining computational, data, and other resources to form general-purpose services for users. In this paper, we address network information gathering and focus on providing approximate measurement models for network-related information using the Network Weather Service (NWS) for future scheduling and benchmarking. We propose a network measurement model for gathering network-related information, including bandwidth, latency, forecasting, error rates, etc., without generating excessive system overhead. We consider inaccuracies in real-world network values when generating approximation values for future use.

Keywords: Network information, NWS, Globus, Grid computing, Bandwidth.
1 Introduction

Grid computing is commonly used by scientists and researchers to solve complex problems in parallel and distributed paradigms [1, 2, 3, 4, 5, 12, 13]. A key issue in Grid computing environments is providing a centralized interface that enables users to make use of various resources easily. Grid computing technologies include many elements, such as user authentication, job description, information gathering, job scheduling, and resource dispatching. In this work, we are concerned with information gathering: without precise information, no scheduling strategy or algorithm will work well.

*
The authors would like to acknowledge the National Center for High-Performance Computing for sponsoring the Taiwan UniGrid project, under the national project "Taiwan Knowledge Innovation National Grid". This work is supported in part by the National Science Council, Taiwan, under grants no. NSC93-2213-E-029-026, NSC94-2213-E-029-002, and NSC93-2119-M-002-004.
** Corresponding author.
H. Zhuge and G.C. Fox (Eds.): GCC 2005, LNCS 3795, pp. 287-299, 2005. © Springer-Verlag Berlin Heidelberg 2005
In this paper we report on using the Globus Toolkit [6, 7, 10] to build a grid environment. Globus is a commonly used middleware for constructing grid environments and providing development tools. Its main advantage is that it supplies a good security model with a provision for hierarchically collecting information about the grid.

Regardless of grid type, bandwidth management is a question of manipulating numerous variables to support systems and maximize grid performance [15]. As grid computing becomes popular, there is a need to manage and monitor available resources worldwide, as well as a need to convey these resources to people for everyday use. Network bandwidth is crucial when work is distributed around groups of machines, especially when jobs require heavy communication. To understand why managing bandwidth is so critical, we need to analyze what affects overall grid performance.

The Globus Information Service, the Monitoring and Discovery Service (MDS), provides good system-related information on CPU speeds, CPU loading, memory utilization, etc., but no network-related information [5, 12]. Therefore, we make use of the open-source Network Weather Service (NWS) to provide network information. NWS can measure point-to-point network bandwidth and latency, which are important for grid scheduling and load balancing [9, 11]. NWS measures all network states during time periods selected by the user. Because this kind of site-to-site measurement results in N(N−1) network measurement processes, the time complexity is O(N²). Our network model focuses on reducing this time complexity without losing too much precision. In this paper, we focus on providing approximate measurement models for network-related information using NWS for future scheduling and benchmarking.
We first propose a network measurement model for gathering network-related information including bandwidth, latency, forecasting, error rates, etc., without generating excessive system overhead. We then consider inaccuracies in real-world network values in generating approximation values for future use. This paper is organized as follows. We give a background review of NWS and Globus in Section 2. In Section 3 we describe our measurement model and research questions. In Section 4 we report on a grid computing environment constructed in three Taiwan schools using the Globus Toolkit, and experimental results and discussion are presented. We conclude this study in Section 5.
2 Background Review

2.1 Globus Project

The Globus Toolkit is an open-source software toolkit used for building Grid systems and applications [10]. It is being developed by the Globus Alliance and many others all over the world. A growing number of projects and companies are using the Globus Toolkit to unlock the potential of grids for their causes. The Globus Toolkit has become a de facto standard for Grid middleware, handling four kinds of services:
• Resource management: Grid Resource Allocation & Management (GRAM)
• Information Services: Monitoring and Discovery Service (MDS)
• Security Services: Grid Security Infrastructure (GSI)
• Data Movement and Management: Global Access to Secondary Storage (GASS) and GridFTP

GRAM is designed to provide a single common protocol and API for requesting and using remote system resources, by providing a uniform, flexible interface to local job scheduling systems. The Grid Security Infrastructure (GSI) provides mutual authentication of both users and remote resources using GSI (Grid-wide) PKI-based identities. GRAM provides a simple authorization mechanism based on GSI identities and a mechanism to map GSI identities to local user accounts.

MDS is designed to provide a standard mechanism for publishing and discovering resource status and configuration information. It provides a uniform, flexible interface to data collected by lower-level information providers. It has a decentralized structure that allows it to scale, and it can handle static data (e.g., OS, CPU types, and system architectures) or dynamic data (e.g., disk availability, memory availability, and loading). A project can also restrict access to data by combining GSI credentials with the authorization features provided by MDS.

GridFTP [3] is a high-performance, secure, and reliable data transfer protocol optimized for high-bandwidth wide-area networks. The GridFTP protocol is based on FTP, the highly popular Internet file transfer protocol.

2.2 Network Weather Service

The Network Weather Service, though not targeted at clusters, is a distributed system that periodically monitors and dynamically forecasts the performance that various network and computational resources can deliver over a given time interval. The service operates a distributed set of performance sensors (network monitors, CPU monitors, etc.) from which it gathers system condition information. It then uses numerical models to generate forecasts of what the conditions will be for a given time period.
It also uses mathematical models to forecast each condition, reporting the Mean Absolute Error (MAE) and Mean Square Error (MSE) rates. NWS is a widely used measurement tool for Grid environments; studies on topics such as load balancing, scheduling, brokering, and replica selection are available [9, 11].
3 Network Information Model

We constructed a network measurement model to solve the complete point-to-point network measurement problem. Consider the twelve-node grid environment shown in Figure 1. The lines linking the nodes represent site-to-site network measurements. A "node" or "site" here represents a single machine, such as a personal computer. This model is often used for local grids or cluster environments when the scale is not too large; in large-scale grid environments this kind of architecture results in excessive bandwidth overhead. In order to reduce the total number of NWS measurements, we propose the "domain" concept shown in Figure 2 to partition the network measurement environment.
Fig. 1. Network measurement model
Fig. 2. Domain-based network measurement model
In our domain-based model, we define several hosts as a domain ("host" means the same as "node" and "site"). Figure 2 shows three domains: one with 5 hosts at the left, one with 3 hosts at the top, and one with 4 hosts at the bottom-right. They are linked by "borders" that form a central domain. We thus need only pairwise measurements within domains; domain-to-domain data is measured by the central domain, considerably reducing the number of measurements required.

Our domain-based model superficially resembles the NWS clique, but the two operate at different levels. The NWS clique is a low-level component built into NWS that coordinates measurements via token passing, whereas our model sits on top of NWS; its advantage is that it reduces the number of measurements conducted. Our main idea is to predict the links that are missing from our model; we discuss several cases of this prediction technique later.

We can now extend existing group models and consider how to construct domain-based models from geographical and real-bandwidth perspectives. We separate schools or organizations into domains on the basis of geography. Hosts in each domain are thought of as tightly coupled, and it is convenient for each domain to control and maintain its hosts. The domains may each use a different network infrastructure: Fast Ethernet, Gigabit Ethernet, or InfiniBand. This design ensures that local fluctuations will not affect the entire grid system. Some questions about the model remain:
• How do we select a representative host in each domain to form a central domain without loss of generality?
• How do we accurately evaluate the host-to-host network information lost in the domain model?

Regarding the first question, we must consider what makes a border representative of its domain. In our model, a job may be submitted across different domains. In this case, we consider the bandwidth between each pair of borders to decide where we should submit our jobs (we focus on parallel jobs).
The host with the worst bandwidth will dominate the total execution time. Therefore, the representative border must be the host with the worst bandwidth.
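To make the savings concrete, a small back-of-the-envelope sketch compares the number of directed site-to-site measurements in the full mesh against the domain-based model; the domain sizes (5, 3, and 4 hosts) are taken from Figure 2, and the function names are our own illustrative choices:

```python
def full_mesh_measurements(n):
    """Directed point-to-point measurements for n sites: N(N-1)."""
    return n * (n - 1)

def domain_based_measurements(domain_sizes):
    """Pairwise measurements inside each domain, plus the central
    domain formed by one border host per domain."""
    inner = sum(n * (n - 1) for n in domain_sizes)
    central = len(domain_sizes) * (len(domain_sizes) - 1)
    return inner + central

# The twelve-node example of Figures 1 and 2: domains of 5, 3, and 4 hosts.
print(full_mesh_measurements(12))            # 132
print(domain_based_measurements([5, 3, 4]))  # 44
```

Even at this small scale, the domain-based model needs roughly a third of the measurements; the gap widens quadratically as sites are added.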
Here we discuss how to select borders. One way is to conduct an all-pair network test and select the worst host in each domain. However, this may not work in a real grid environment, because the organizations owning the domains may each control their hosts according to different policies. Another way is to let domain administrators select borders according to network topology or architecture (perhaps selecting the host deepest in the topology, farthest from the router that connects to the WAN). Neither method is smart or scalable, so we propose an alternative way to select borders.
1. When the Grid is first built, pick some domains (perhaps 2~4, if the total number of domains is larger than that) to start with. Then apply one of the methods above to select their borders and save them in a border list.
2. Each remaining domain (not selected in step one) selects a border, one domain at a time, either according to network topology or by testing all hosts in the domain against each border in the border list. Take Figure 3 as an example: the top and right domains have already selected their borders in step 1. We now take the left domain and perform step 2, running network tests to obtain the bandwidths R1, R2, …, R5 and T1, T2, …, T5, then selecting the host with the minimal bandwidth (minimize Rn + Tn for n = 1 to 5).
3. Repeat step 2 until every domain has found a best border.
This reduces construction complexity by avoiding all-pair network testing and makes the grid environment scalable for adding new domains: when a new domain joins the existing grid, only step 3 need be performed.
Fig. 3. Border selection diagram
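Step 2 above can be sketched as follows. The function and host names here are illustrative assumptions; `bandwidth(a, b)` stands for whatever NWS probe is run between two hosts. For each candidate host we sum its measured bandwidth to every border already in the list (the Rn + Tn terms of Figure 3) and pick the host with the minimal total:

```python
def select_border(candidate_hosts, border_list, bandwidth):
    """Pick the host whose summed bandwidth to the existing borders
    is minimal: the worst-connected host becomes the border."""
    def total_bw(host):
        return sum(bandwidth(host, border) for border in border_list)
    return min(candidate_hosts, key=total_bw)

# Toy example: two borders already selected, three candidate hosts.
bw_table = {
    ("h1", "b1"): 80, ("h1", "b2"): 70,
    ("h2", "b1"): 30, ("h2", "b2"): 25,
    ("h3", "b1"): 90, ("h3", "b2"): 85,
}
bw = lambda a, b: bw_table[(a, b)]
print(select_border(["h1", "h2", "h3"], ["b1", "b2"], bw))  # h2
```

Choosing the minimum matches the reasoning above: since the worst link dominates a parallel job's execution time, the border should be the host that exposes that worst case.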
Regarding the second question, in order to obtain the all-pair network values missing from our model, we use a few measured values from NWS to estimate the missing values. Figure 4 shows an example of the network estimation model. The line connecting alpha1 and lz01 is part of the central domain, which we call the Bridge in our experimental environment. The solid lines mean that our domain model has obtained the
network information; the dotted lines are examples of the many links our model has not measured, so we use an evaluation model to calculate them. The following notation is used throughout this paper:
• B_in_avg: average inner-domain bandwidth (Mbits/sec)
• B_out_avg: average outer-domain bandwidth (Mbits/sec)
• P_flu: amplitude of bandwidth fluctuation (%)
• N_flu: number of bandwidth fluctuations detected
• P_vaflu: valid fluctuation rate (%)
• Use: Boolean value signifying whether bandwidth has been used
• L_ij(k): the kth inner-domain measurement value, counted backward, from node i to node j
B_in_avg and B_out_avg are obtained by averaging historical bandwidth values. N_flu traces network fluctuations over given time periods, ignoring pulses and bandwidth noise. Our algorithm examines the bandwidth values of the last N_flu measurements, similar to a sliding window. P_vaflu is the percentage of the N_flu measurements in which bandwidth fluctuation must occur for the fluctuation to be treated as actual bandwidth use (Use). L_ij(k) is the kth-latest value measured by NWS. Throughout our experiments we use default values of P_flu = 30%, N_flu = 10, time period = 5 seconds (achieved by setting the NWS sensor period), and P_vaflu = 80%. We employ our algorithm in three separate cases to cover possible bandwidth usage patterns.

Case 1. Assume the inner-domain bandwidth use shown in Figure 5, occurring between alpha2 and alpha3. We want to investigate how it affects our target bandwidth. This is complex because the usage between alpha2 and alpha3 may not affect the bridge bandwidth much. We use an algorithm to calculate the target bandwidth. The left-domain bandwidth fluctuation is examined first, ignoring pulses and bandwidth noise:
Fig. 4. Network estimation model
Fig. 5. Bandwidth usage in Case 1
Use = CountIf( |L_ij(k) − B_in_avg| / B_in_avg > P_flu ) > N_flu × P_vaflu,   k = 1, …, N_flu,   ∀ ij in the left domain.   (1)

The function CountIf counts how many of the last N_flu samples satisfy the bracketed condition, where (L_ij(k) − B_in_avg) / B_in_avg is the percentage change in bandwidth. We set N_flu × P_vaflu = 8 as our default. For example, Use is true if the bandwidth dropped by more than 30% in more than 8 of the last 10 network values measured by NWS. The remaining bandwidth is then calculated by sorting the list L_ij and ignoring the maximal and minimal bandwidth values:

B_rem = ( Σ_{k=2}^{N_flu−1} L_ij(k) ) / (N_flu − 2)   (2)

Finally, the target bandwidth is calculated as follows:

B_tar = ( B_rem / B_in_avg ) × B_out_avg × α   (3)
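Equations (1)-(3) can be sketched as follows, using the default parameters quoted above (P_flu = 30%, N_flu = 10, P_vaflu = 80%). This is a minimal illustration, not the authors' implementation; the conversion factor alpha is supplied by the caller, since the paper determines it experimentally:

```python
def detect_use(samples, b_in_avg, p_flu=0.30, p_vaflu=0.80):
    """Equation (1): Use is true when the relative deviation from the
    inner-domain average exceeds p_flu in more than n_flu * p_vaflu
    of the last n_flu samples."""
    n_flu = len(samples)
    hits = sum(1 for l in samples
               if abs(l - b_in_avg) / b_in_avg > p_flu)
    return hits > n_flu * p_vaflu

def remaining_bandwidth(samples):
    """Equation (2): average after discarding the min and max samples."""
    trimmed = sorted(samples)[1:-1]
    return sum(trimmed) / len(trimmed)

def target_bandwidth(samples, b_in_avg, b_out_avg, alpha):
    """Equation (3): scale the remaining inner bandwidth to the bridge."""
    return remaining_bandwidth(samples) / b_in_avg * b_out_avg * alpha

# Ten hypothetical NWS samples (Mbits/sec): nine show a >30% drop
# from the historical average of 90, so Use is detected.
samples = [60, 55, 58, 62, 50, 57, 61, 59, 88, 56]
print(detect_use(samples, b_in_avg=90.0))  # True
```

Once `detect_use` fires, `target_bandwidth` scales the trimmed inner-domain average into an estimate for the unmeasured bridge-side link.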
The symbol α here denotes a conversion factor from Internet bandwidth to LAN bandwidth and is used throughout this paper.

Case 2. We assume that bandwidth use occurs within the organization but not with other members of the domain (machine pc1), as shown in Figure 6. Figure 7 shows the general topology of the network architecture. The network transfer goes through the top switch, so we claim that the target bandwidth closely follows the bridge bandwidth.
Fig. 6. Bandwidth usage in Case 2
Fig. 7. Topology for Case 2
Case 3. We assume that bandwidth use occurs between two domains, as shown in Figure 8. Bandwidth use between alpha2 and lz03 affects the available bridge bandwidth, so we claim that the target bandwidth closely follows the bridge bandwidth, just as in Case 2.
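The three cases can be summarized in one hypothetical estimator (a sketch; `case1_estimate` stands for the Equation (1)-(3) procedure, and `bridge_bandwidth` for the NWS value measured on the bridge). Cases 2 and 3 simply substitute the bridge bandwidth for the unmeasured target link:

```python
def estimate_link(case, bridge_bandwidth, case1_estimate=None):
    """Return an estimate for a link the domain model did not measure.

    case 1: bandwidth use inside the remote domain -> run the
            Equation (1)-(3) algorithm (passed in as case1_estimate).
    cases 2 and 3: use within the organization or between domains ->
            the target link follows the bridge bandwidth.
    """
    if case == 1:
        return case1_estimate()
    if case in (2, 3):
        return bridge_bandwidth
    raise ValueError("unknown case")

print(estimate_link(2, bridge_bandwidth=85.0))               # 85.0
print(estimate_link(1, 85.0, case1_estimate=lambda: 29.25))  # 29.25
```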
Fig. 8. Bandwidth usage in Case 3
4 Experimental Results

We set up the two domains shown in Figure 9 for experimentation using our grid testbed. The alpha-domain, with four hosts, is situated at the THU HPC laboratory, and the lz-domain, also with four hosts, is at the LZ senior high school. We installed and set up NWS on each node and configured the domains appropriately. The servers alpha1 and lz01 were used as border nodes and connected to the other nodes in their respective domains. Both domains have 100 Mbit/s Fast Ethernet capacity.
Fig. 9. Our experimental environment overview
Name Server, Memory Server, and Sensor were installed on lz01 and alpha1 in order to use NWS for network measurement. Sensor is used to monitor which nodes communicate with other nodes. For example, the lz-domain sensor measured link information between nodes in the lz-domain and stored the measurements in the lz01 Memory Server. The lz01-toall sensor measured link information between the border nodes lz01 and alpha1 and stored the results in the lz01 Memory Server. We used the Java CoG Kit [8] to implement the GUI as shown in Figures 10, 11, and 12. Our application can connect with the grid system by using the Java CoG kit. The key characteristics include:
• GridProxyInit, which creates a limited-lifetime proxy for authorized users to access grid resources.
• GridConfigureDialog, which uses the CoG Kit UITool to enable users to configure the number of processes and the grid server host name.
• GridJob, which creates GramJob instances. This class represents a simple GRAM job, allowing jobs to be submitted to a gatekeeper and canceled, signal commands to be sent, and callbacks to be registered and unregistered.
• GetRSL, which combines RSL strings. RSL provides a common interchange language for describing resources; the various Globus Resource Management architecture components manipulate RSL strings to perform their management functions in cooperation with the other system components.
• JobMonitor, which takes two parameters, GridJob and RSL, to start GlobusRun and monitor the job process.
• GlobusRun, a factory method for creating and exporting credentials, submitting jobs to the grid server, and receiving its responses.
We also developed some APIs for our system: for example, ProxyDestroy, which destroys CA files to protect the grid system. From the application site, we can also configure the machinefile for the grid system.
Fig. 10. Domain information is used to show network information
Fig. 11. lz-domain is selected
Fig. 12. Detailed information provided by the GUI
The symbol "~" is used below to indicate connection speed; for example, lz01~alpha1 is the connection speed between lz01 and alpha1. We transferred an 800MB file from alpha2 to alpha3 to simulate bandwidth usage. We observed five links: three links in the alpha-domain, the bridge lz01~alpha1, and lz01~alpha2; the last is not part of our model and was added for experimental comparison. Figure 13 clearly shows that alpha2~alpha3 did not affect lz01~alpha1. Therefore, to evaluate lz01~alpha2 (not present in our network model), we had to observe the connection speed inside the alpha-domain. Figure 14 shows the
Fig. 13. Experimental result for transferring an 800MB file from alpha2 to alpha3 (Case 1 bandwidth traces over 5-second time stamps)
Fig. 14. Experimental results comparing actual bandwidth with our algorithm's prediction
prediction result of our algorithm. The curve of the target bandwidth fits the actual bandwidth as we expected, but there is some delay, caused by the N_flu-sample sliding-window detection period. We conclude that our algorithm can predict the network behavior with a small delay.

Case 2. In this case, the 800MB file was transferred from alpha2 to an external server called pc1, as shown in Figure 15. The connection speeds shown for lz01~alpha1 and alpha1~alpha2 were obtained by NWS; in addition, lz01~alpha2 was measured for comparison. In this case, the curve of lz01~alpha1 is almost the same as that of lz01~alpha2. This result fits our claim that the target bandwidth follows the border bandwidth.

Case 3. In this case, the 800MB file was transferred from alpha2 to lz02. The connection speed of lz01~alpha1 shown in Figure 16 was obtained using NWS; the others were measured for experimental comparison. By the experimental result, lz01~alpha2 can be substituted by lz01~alpha1; by the two observations above, alpha1~lz02 can be substituted by alpha1~lz01, and lz02~alpha2 can be substituted by lz01~alpha1. The experimental results confirm this inference.
Fig. 15. Experimental results for transferring an 800MB file from alpha2 to pc1
Fig. 16. Experimental results for transferring an 800MB file from alpha2 to lz02
From the above three experiments, we can summarize the results for the three cases. Case 1: the target bandwidth can be predicted by our algorithm within acceptable error and delay. Case 2: the bridge bandwidth can be substituted for the target bandwidth. Case 3: almost the same as Case 2.

The second experiment tested NWS measurement values for various packet (frame) sizes. NWS has an inaccuracy problem, and we want to know whether we can improve its accuracy by adjusting NWS settings. The most effective parameter is the packet size used by the NWS sensor. The results for varying packet sizes, shown in Figures 17 and 18, indicate that the measured bandwidth increased as packet size increased and approached the maximum bandwidth. Our test indicated that a 512-Kbit packet size is most suitable, avoiding wasting too much bandwidth in our grid environment.
Fig. 17. NWS measurement value testing from alpha1 (bandwidth vs. packet size, 64-4096 Kbits)
Fig. 18. NWS measurement value testing from pc1 (bandwidth vs. packet size, 64-4096 Kbits)
5 Conclusion

We constructed a domain-based model to reduce the number of network measurements, and proposed an algorithm to estimate point-to-point network bandwidth without direct measurement. Finally, we accounted for inaccuracies in real-world network values by generating approximation values for future use. The results show that our algorithm was useful in calculating target bandwidths in Cases 1, 2, and 3. We also compared the influence of different packet sizes; the experimental results showed that a packet size of 512 Kbits provided good accuracy without causing too much overhead. In future work, we will test various kinds of network usage and organizations with different network architectures. We will also test various parameters and NWS inaccuracy measures to provide more precise network information.
References
1. B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnal, S. Tuecke, "Data Management and Transfer in High Performance Computational Grid Environments," Parallel Computing, 28(5):749-771, May 2002.
2. B. Allcock, S. Tuecke, I. Foster, A. Chervenak, and C. Kesselman, "Protocols and Services for Distributed Data-Intensive Science," ACAT2000 Proceedings, pp. 161-163, 2000.
3. W. Allcock, J. Bester, J. Bresnahan, A. Chervenak, L. Liming, S. Meder, S. Tuecke, "GridFTP Protocol Specification," GGF GridFTP Working Group Document, September 2002.
4. I. Foster, C. Kesselman, S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," Int. J. of Supercomputer Applications and High Performance Computing, 15(3):200-222, 2001.
5. I. Foster, C. Kesselman, "Globus: A Metacomputing Infrastructure Toolkit," Intl. J. Supercomputer Applications, 11(2):115-128, 1997.
6. Global Grid Forum, http://www.ggf.org
7. IBM Redbooks, "Introduction to Grid Computing with Globus," http://www.redbooks.ibm.com/redbooks/pdfs/sg246895.pdf
8. Java CoG Kits, http://www.cogkit.org/
9. Network Weather Service, http://nws.cs.ucsb.edu/
10. The Globus Alliance, http://www.globus.org/
11. R. Wolski, N. Spring, and J. Hayes, "The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing," Future Generation Computer Systems, 15(5-6):757-768, 1999.
12. Chao-Tung Yang, Chuan-Lin Lai, Po-Chi Shih, and Kuan-Ching Li, "A Resource Broker for Computing Nodes Selection in Grid Environments," Grid and Cooperative Computing - GCC 2004: Third International Conference, Lecture Notes in Computer Science, vol. 3251, Springer-Verlag, Hai Jin, Yi Pan, Nong Xiao (Eds.), pp. 931-934, Oct. 2004.
13. Chao-Tung Yang, Po-Chi Shih, Kuan-Ching Li, "A High-Performance Computational Resource Broker for Grid Computing Environments," Proceedings of the International Conference on Advanced Information Networking and Applications (AINA 2005), The First International Workshop on Information Networking and Applications, vol. 2, pp. 333-336, Tamkang University, Taipei, Taiwan, March 28-30, 2005.
14. X. Zhang, J. Freschl, and J. Schopf, "A Performance Study of Monitoring and Information Services for Distributed Systems," Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing (HPDC-12 '03), pp. 270-282, August 2003.
15. H. Zhuge, "The Future Interconnection Environment," IEEE Computer, 38(4):27-33, 2005.