A Jobs Allocation Strategy for Multiple DRBL Diskless Linux Clusters with Condor Schedulers*

Chao-Tung Yang†, Ping-I Chen, Sung-Yi Chen, Hao-Yu Tung
High-Performance Computing Laboratory
Department of Computer Science and Information Engineering
Tunghai University, Taichung, 40704, Taiwan
{ctyang, g932834, g942803}@thu.edu.tw

Abstract

In this paper, we construct four DRBL-based diskless Linux PC clusters and install the Globus Toolkit and the Condor scheduler on the master node of each cluster. All diskless clusters can be merged into one powerful computing environment. To utilize this computing power, we propose a jobs allocation strategy for application execution. We use HPL to conduct performance evaluations and analyze the results to derive a formula for job allocation. The information server calculates the computing power value of each cluster with this formula and selects the most appropriate cluster to execute the job, so that the whole environment executes jobs more efficiently.

1. Introduction

A diskless Linux PC cluster differs from a traditional one in that the network provides not only inter-processor communication but also the medium for booting and for transmitting a live file system [5]. Generally, it consists of one or more servers, which provide the bootstrap service as well as the related network services, and many clients without hard disk drives that boot from the network. Each diskless node boots through a floppy disk, a NIC boot ROM holding a small bootstrap program, or the NIC's PXE support [9]; it sends a broadcast packet to a DHCP [4] server and is then assigned an IP address. After a node has received a valid IP address, it sends a request to the TFTP [10] server to obtain the boot image, i.e., the Linux kernel, and starts the booting process. During the booting process, all the necessary system files are transmitted through the network. After the remote file system is mounted as the root file system (NFS_ROOT) and system initialization is done, the node is ready to work.

In this paper, we construct four DRBL diskless Linux PC clusters to form a large computing environment. DRBL stands for Diskless Remote Boot in Linux [5, 11]. The solution was designed and implemented by the National Center for High-performance Computing, Taiwan. DRBL uses PXE/Etherboot, NFS, and NIS to provide services to client machines [5]. Once the server has been set up as a DRBL server, the client machines can boot via PXE/Etherboot.

In this work, we install Globus Toolkit 4.0.1 [6] and Condor 6.6.10 [3] on the master node of each cluster. All the clusters can be merged into one powerful computing environment. We use JSP and RRDtool to build a web-based monitoring system [12], with which we can monitor all the clusters in our environment as well as detailed information about each node, such as CPU, memory, network, and I/O status. We also build a web-based job submission system, which helps first-time Condor users submit and execute jobs successfully without knowing the details. The only restriction is that a job can be submitted to a single cluster only; it cannot run over two or more PC clusters simultaneously.

To utilize the computing power, we propose a jobs allocation strategy for application execution. We use HPL [7] to perform performance evaluations and analyze the results to derive a formula for job allocation. Users only need to decide how many processors to use. The information server then calculates the computing power value of each cluster with our formula and finds the most appropriate cluster to execute the job, so that the whole environment executes jobs more efficiently.

* This work is supported in part by the National Science Council, Taiwan, ROC, under grants NSC94-2622-E-029-002-CC3 and NSC95-2622-E-029-003-CC3.
† The corresponding author.


2. Jobs Allocation Strategy

2.1. Main Steps

The Condor scheduler can successfully make a cluster join the grid environment. However, it cannot obtain information from other clusters, such as how many jobs are in their Condor queues or how many CPUs are in use; Condor can only monitor its local resources. Therefore, we created our own job submission and monitoring system, which is very similar to the eMinerals clusters built by Bruin et al. [1]. Still, it is not easy for a user to decide which cluster is the best one to submit a job to so that it is executed immediately. To make our DRBL cluster computing environment work more efficiently, HPL (High Performance Linpack) is used to conduct performance tests and find the relationship between a machine's performance and its hardware [9]. We want to establish a job allocation mechanism that ensures the user's job is executed on the most suitable machine [2]. The job submission procedure has three main steps (a sketch of how the information server can query each cluster appears after this list):
1. The user logs in to the job submission portal on the information server.
2. The information server determines the computing power value of each cluster.
3. The job is submitted to the cluster with the largest computing power value.
The following sections describe in detail how the jobs allocation strategy works and how the performance values are obtained.
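As a concrete illustration of the information step 2 relies on, the following Python sketch shows one way the information server could ask each head node how many CPUs are currently idle. The host names, the use of ssh, and the parsing of the condor_status summary line are assumptions for illustration, not details given in the paper.

import subprocess

HEAD_NODES = ["condor1", "condor2", "amd", "zeus"]   # assumed host names

def query_cluster(head_node):
    """Ask one cluster head node how many CPUs are currently unclaimed."""
    # condor_status -total prints a summary table whose last row starts with
    # "Total"; the column positions assumed below may differ between
    # Condor versions.
    out = subprocess.run(
        ["ssh", head_node, "condor_status", "-total"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = [line.split() for line in out.splitlines()
             if line.strip().startswith("Total")][-1]
    return {"host": head_node,
            "total_cpus": int(total[1]),
            "unclaimed_cpus": int(total[4])}

def gather_status():
    """Collect the per-cluster information used in step 2."""
    return [query_cluster(h) for h in HEAD_NODES]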

2.2. Determining the Computing Power Value

2.2.1. Total Computing Power (TP)

Our DRBL grid environment is composed of several clusters, and the cluster nodes cannot directly accept jobs submitted through Globus. Instead, each cluster's head node accepts jobs from the information server and resubmits them to its nodes. The Condor scheduler in each cluster therefore selects a group of machines and forms a virtual organization (VO) from them when a job needs to be executed. We use TP to represent the computing power of the machines the user needs. TP is divided into three main parts: CPU, memory, and network. The user inputs the number of CPUs (NP) to be used for job execution. The information server then checks how many CPUs are available in each cluster along with each node's hardware information (CPU speed, memory size, and network speed). Afterwards, the information server calculates TP and chooses the best VO in one of the clusters.

2.2.2. Performance Value

PCij and PMij denote the performance values of the jth machine in the ith cluster with respect to its CPU and memory, respectively. We analyze the execution results of the HPL application statistically. First, we fix the memory size and vary the number of CPUs to conduct HPL performance tests. Then, we fix the number of CPUs and vary the HPL problem size to measure the effect of the amount of memory being used. Finally, based on these tests, we assign a performance value to each type of CPU and memory size in our environment. Because an SMP machine shares its total memory, each of its CPUs has less memory available than a single-processor machine with the same memory size; we use the NW value to represent this situation. The memory size is very important for a diskless PC cluster.

2.2.3. Performance Effect Ratio

There are three kinds of performance effect ratios in our formula. The ME value is based on the correlation between the CPU and the HPL results, and the (1-ME) value represents the performance effect ratio of the memory size with respect to the HPL results. The square bracket in our formula represents the inner effect of the machine. The ME value is computed as

  ME = Cov(CPU, HPL) / (Cov(CPU, HPL) + Cov(memory, HPL)).

We then repeat the HPL performance test on one of the clusters, changing the switch from gigabit to 10/100, to find the effect of different network speeds on the performance test. There are two NE ratios, one for gigabit and one for 10/100. The NE value of gigabit is computed as

  NE(gigabit) = Cov(gigabit, HPL) / (Cov(gigabit, HPL) + Cov(10/100, HPL)),

and the NE value of 10/100 is computed analogously.
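A minimal sketch of how these ratios can be computed from the HPL measurements, assuming the raw numbers are held in plain Python lists; the variable names are illustrative and not taken from the paper.

import numpy as np

def effect_ratio(x_a, hpl_a, x_b, hpl_b):
    """Cov(A, HPL) / (Cov(A, HPL) + Cov(B, HPL)), as used for ME and NE."""
    cov_a = np.cov(x_a, hpl_a)[0, 1]
    cov_b = np.cov(x_b, hpl_b)[0, 1]
    return cov_a / (cov_a + cov_b)

# ME: HPL measured while varying the CPU count (fixed memory) versus
#     while varying the problem size, i.e. the memory actually used.
# ME = effect_ratio(cpu_counts, hpl_vs_cpu, memory_sizes, hpl_vs_memory)
# NE: the same cluster benchmarked on a gigabit switch and on 10/100.
# NE = effect_ratio(gigabit_runs, hpl_gigabit, fast_eth_runs, hpl_fast_eth)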

2.3. The Formula

The terminology is listed as follows:
- TP: total computing power
- NP: number of CPUs the user needs
- PCij: performance value of the jth machine in the ith cluster with respect to its CPU
- ME: performance effect ratio of the CPU
- NW: N-way processor
- PMij: performance value of the jth machine in the ith cluster with respect to its memory
- 1-ME: performance effect ratio of the memory
- NE: network effect ratio
Our formula is listed as follows:

  TP = NP × Σ(i=1..n) Σ(j=1..n) [ (PCij / NW) × ME + (Σ(i=1..n) Σ(j=1..n) PMij / NM) × (1/NP) × (1 − ME) ] × NE
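The following Python sketch evaluates this formula for the machines of one candidate VO. It assumes the double sum runs over those machines only and that NM is the number of machines in the VO; neither assumption is spelled out in the paper, so treat this as an illustration rather than the authors' exact implementation.

def total_power(machines, np_requested, ME, NE):
    """Compute TP for one candidate VO.

    machines     : list of dicts with keys 'PC', 'PM' and 'NW' (Table 1 values)
    np_requested : NP, the number of CPUs the user asked for
    """
    NM = len(machines)                                # taken as the VO size
    mem_term = sum(m["PM"] for m in machines) / NM    # averaged memory value
    tp = 0.0
    for m in machines:                                # the double sum collapses here
        tp += (m["PC"] / m["NW"]) * ME \
              + mem_term * (1.0 / np_requested) * (1 - ME)
    return np_requested * tp * NE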

The values are ranked from 0 to 1: the most powerful equipment is set to 1, and the other values are calculated according to the variation ratio of the performance. For example, the AMD Athlon MP 2600+ is the most powerful CPU in our environment, so its performance value is set to 1, and the AMD Athlon MP 2000+ is set to 0.6 according to the variation ratio of the performance. The values used in our formula are shown in Table 1.

Table 1: The values of our formula

  PC (CPU)                      PM (memory)      NE (network)     Machine effect
  AMD Athlon MP 2000+   0.6     1 GB     1       Gigabit   1      ME     0.6
  AMD Athlon MP 2600+   1       256 MB   0.6     10/100    0.8    1-ME   0.4
  P4-1.8G               0.45
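Plugging the Table 1 values into the earlier total_power() sketch gives a feel for the magnitudes involved; the node configuration below is hypothetical and only serves as a usage example.

# Four 2-way AMD Athlon MP 2600+ nodes with 1 GB RAM behind a gigabit switch,
# with the user requesting 8 CPUs.
node = {"PC": 1.0, "PM": 1.0, "NW": 2}
cluster = [node] * 4
print(round(total_power(cluster, np_requested=8, ME=0.6, NE=1.0), 2))  # 11.2 under this sketch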

3. Experimental Results

3.1. Machine Selection Procedure

We install four cluster sites, as shown in Figure 1; the total number of CPUs is 72. As mentioned in the previous section, the information server calculates the TP value of each cluster in our environment and then decides which cluster the job is submitted to. First, it compares the TP values of the four cluster sites and chooses the cluster with the largest TP value, which means the job is sent to the most powerful machines given the clusters' situation at that time. Second, it checks whether several clusters have the same TP value. If the TP values are the same, it chooses the cluster with the most unused CPUs, so that the job is submitted to the cluster with the largest number of free CPUs. For example, if we choose 16 CPUs to run a job, the information server will find that the TP values of condor1 and condor2 are the same, because the machines of condor1 and condor2 are identical except for the number of nodes. If we submit the job to condor2 and the job turns out to be very time consuming, condor2 will be monopolized for a long time and cannot accept other jobs until this job finishes. So we submit the job to condor1; in this way, condor1 and condor2 each still have 16 CPUs of computing power available and can accept other jobs.

Figure 1. DRBL grid architecture

Third, if both the TP values and the numbers of unused CPUs of the clusters are the same, the information server chooses the cluster with the fewest jobs executing on it. The idea is the same as in the second step: we want the DRBL grid environment to accept and execute as many jobs at once as possible, to reduce the waiting time in the global queue. After a job has been submitted and successfully executed, the information server records the program name, the HPL value for this job, and the execution time. The next time the same program is submitted to the information server and the TP values are calculated again, the information server checks the record and compares it with the new TP value; the new TP value should be equal to or larger than the recorded one. In some cases, for example when all the powerful clusters are running jobs and cannot accept any more, the job is sent to the worst machine.
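The selection and tie-breaking rules above, together with the recorded-TP check, can be summarized in a short sketch; the data structures and the exact fallback behaviour are assumptions made for illustration, not the authors' code.

def select_cluster(clusters, program, np_requested, history):
    """Pick a target cluster following the three rules above.

    clusters : list of dicts with 'name', 'tp', 'idle_cpus', 'running_jobs'
    history  : dict mapping (program, np_requested) -> previously recorded TP
    """
    # Rule 1: largest TP; Rule 2: most unused CPUs; Rule 3: fewest running jobs.
    ranked = sorted(clusters,
                    key=lambda c: (-c["tp"], -c["idle_cpus"], c["running_jobs"]))
    best = ranked[0]
    # Learning step: if this program ran before with the same CPU count, prefer
    # a cluster whose TP is at least the recorded value; if none qualifies, the
    # job simply goes to whatever (possibly the worst) cluster remains.
    recorded = history.get((program, np_requested))
    if recorded is not None:
        good_enough = [c for c in ranked if c["tp"] >= recorded]
        if good_enough:
            best = good_enough[0]
    return best["name"]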

3.2. Simulation Results

The execution results mentioned in the prior section are as follows. Table 2 shows the order of job submission; the order of the jobs in our simulation was decided according to a random sample table. We assume all 28 jobs are submitted to the information server, and the server handles them first come, first served while following our jobs allocation strategy. Figure 2 shows that zeus is the least powerful cluster in our environment. It gets the two most time-consuming jobs (nr with 4 CPUs and nr with 8 CPUs), so it can only execute two jobs during the whole simulation. After the simulation, the information server uses these execution records to modify the TP values for the time-consuming jobs. In this case, the highest TP value for 4 CPUs is 4; the next time a user wants to execute a job that uses the nr database and 4 CPUs, the TP value should be equal to 4 or higher. Using this kind of learning model helps the whole environment work in a more efficient way.


The simulation results (Table 3) show that finishing the twenty-eight jobs with the jobs allocation strategy is more efficient than before: it saves about 1000 seconds on the time for our whole environment to finish the 28 jobs, and about 3000 seconds of total execution time for our four clusters.

4. Conclusions

In this paper, we combined four DRBL-based diskless Linux PC clusters and installed the Globus Toolkit and Condor on the master node of each cluster. All diskless clusters could be merged into one powerful computing environment. To utilize the computing power, we proposed a jobs allocation strategy for application execution. We used HPL to conduct performance evaluations and analyzed the results to derive a formula for job allocation. The information server calculates the computing power value of each cluster with this formula and finds the most appropriate cluster to execute the job. The experimental results showed that our method can reduce the total execution time.

References
[1] R.P. Bruin, M.T. Dove, M. Calleja, and M.G. Tucker, "Building and Managing the eMinerals Clusters: A Case Study in Grid-Enabled Cluster Operation," IEEE Computing in Science and Engineering, 7(6), pp. 30-37, Nov. 2005.
[2] V. Berten, J. Goossens, and E. Jeannot, "On the distribution of sequential jobs in random brokering for heterogeneous computational grids," IEEE Transactions on Parallel and Distributed Systems, 17(2), pp. 113-124, Feb. 2006.
[3] Condor, http://www.cs.wisc.edu/condor/
[4] DHCP, http://www.dhcp.org/
[5] DRBL, http://drbl.sourceforge.net/
[6] Globus Toolkit, http://www.globus.org/toolkit/
[7] HPL, http://www.netlib.org/benchmark/hpl/
[8] L. Peng, S. See, Y. Jiang, J. Song, A. Stoelwinder, and H.K. Neo, "Performance Evaluation in Computational Grid Environments," Proceedings of IEEE HPCAsia'04, pp. 54-62, July 2004.
[9] PXE, http://www.ltsp.org/documentation/pxe.howto.html
[10] TFTP, http://www.faqs.org/rfcs/rfc1350.html
[11] C.T. Yang, P.I. Chen, and Y.L. Chen, "Performance Evaluations of SLIM and DRBL Diskless PC Clusters on Fedora Core 3," Proceedings of the 6th IEEE PDCAT'05, pp. 479-482, Dec. 2005.
[12] C.T. Yang and C.S. Liao, "On Construction and Performance Evaluation of Cluster of Linux PC Clusters Environments," Proceedings of the 6th IEEE CCGrid'06, pp. 53, May 2006.

Figure 2: Jobs execution status diagram using our strategy

Figure 3: Jobs execution status diagram using round-robin

Table 2: Job submission procedure

  Order  Job               Order  Job               Order  Job               Order  Job
  1      mm2048, 32 CPUs   8      mm2048, 8 CPUs    15     mm1024, 4 CPUs    22     mm256, 32 CPUs
  2      mm256, 8 CPUs     9      mm256, 16 CPUs    16     mm1024, 16 CPUs   23     mm512, 8 CPUs
  3      mm512, 4 CPUs     10     yeast, 8 CPUs     17     yeast, 4 CPUs     24     yeast, 16 CPUs
  4      mm512, 16 CPUs    11     mm512, 32 CPUs    18     nr, 16 CPUs       25     yeast, 32 CPUs
  5      mm2048, 4 CPUs    12     nr, 4 CPUs        19     env-nr, 16 CPUs   26     nr, 32 CPUs
  6      env-nr, 32 CPUs   13     nr, 8 CPUs        20     mm256, 4 CPUs     27     env-nr, 4 CPUs
  7      env-nr, 8 CPUs    14     mm2048, 16 CPUs   21     mm1024, 32 CPUs   28     mm1024, 8 CPUs

Table 3: Total execution jobs and time on each cluster

                            Job allocation strategy            Round robin
  Hostname                  Jobs       Execution time (s)      Jobs       Execution time (s)
  condor1                   11         4078                    9          3431
  condor2                   8          2862                    7          5427
  amd                       7          1902                    6          1370
  zeus                      2          11708                   6          12810
  Total execution time                 20550                              23038
  Time to finish 28 jobs               11708                              12810