Int. J. Web and Grid Services, Vol. 3, No. 1, 2007
Batch mode scheduling in grid systems Fatos Xhafa Department of Languages and Informatics Systems Polytechnic University of Catalonia Campus Nord, Ed. Omega, C/Jordi Girona 1–3 08034 Barcelona, Spain Fax: +34–93–413–7833 E-mail:
[email protected]
Leonard Barolli* Department of Information and Communication Engineering Faculty of Information Engineering Fukuoka Institute of Technology (FIT) 3–30–1 Wajiro-higashi, Higashi-ku Fukuoka 811–0295, Japan Fax: +81–92–606–4970 E-mail:
[email protected] *Corresponding author
Arjan Durresi Department of Computer Science Louisiana State University 298 Coates Hall, Baton Rouge, LA 70803, USA Fax: +1 (225) 578–1465 E-mail:
[email protected]

Abstract: Despite recent advances, grid and P2P systems remain difficult for many users to bring to real-world applications. One difficulty is the lack of schedulers for such systems. In this work, we consider the allocations of jobs to resources using batch mode methods. These methods are able to provide fast planning by exploring characteristics of distributed and highly heterogeneous systems. In evaluating these methods, four parameters of the system are measured: makespan, flowtime, resource utilisation and matching proximity. These methods were tested using the benchmark model of Braun et al. (2001) for distributed heterogeneous systems. Based on the computational results, we evaluate the performance of these methods with regard to the four considered metrics. Also, we evaluate the usefulness of batch methods when grid characteristics, such as degree of consistency of computing and heterogeneity of jobs and resources, are known in advance. We observe that batch mode methods are beneficial to grid scheduling services, for adaptively providing these services according to the grid infrastructure characteristics.

Keywords: batch mode; scheduling; computational grids; ETC simulation model; resource allocation; adaptive scheduling; grid services.
Copyright © 2007 Inderscience Enterprises Ltd.
F. Xhafa, L. Barolli and A. Durresi Reference to this paper should be made as follows: Xhafa, F., Barolli, L. and Durresi, A. (2007) ‘Batch mode scheduling in grid systems’, Int. J. Web and Grid Services, Vol. 3, No. 1, pp.19–37. Biographical notes: Fatos Xhafa received his PhD in Computer Science from the Polytechnic University of Catalonia (Barcelona, Spain) in 1998. He joined the Department of Languages and Informatics Systems of the Polytechnic University of Catalonia as an Assistant Professor in 1996 and is currently an Associate Professor and a member of the ALBCOM Research Group of this department. His current research interests include parallel algorithms, approximation and meta-heuristics, distributed programming, and grid and P2P computing. Leonard Barolli is a Professor in the Department of Information and Communication Engineering, Fukuoka Institute of Technology (FIT), Japan. He received his BE and PhD degrees from Tirana University and Yamagata University in 1989 and 1997, respectively. He has published more than 200 papers in refereed journals and international conference proceedings. He has served as a Guest Editor for many journals. He was PC Chair of IEEE AINA-2004 and ICPADS-2005 and General Co-Chair of IEEE AINA-2006. At present, he is the Workshops Chair of iiWAS-2006 and Workshops Co-Chair of IEEE AINA-2007 and ARES-2007. His research interests include high-speed networks, grid computing, P2P, and ad hoc and sensor networks. He is a member of IEEE, IPSJ and SOFT. Arjan Durresi is an Assistant Professor of Computer Science at Louisiana State University. He received his BE, MS and PhD from the Polytechnic University of Tirana, Albania, and a Superior Specialisation Diploma in Telecommunications from La Sapienza University in Rome, Italy, and the Italian Telecommunication Institute. He has 20 years of experience in industry and academic research, in the areas of network architecture, heterogeneous wireless networks, security, grid computing, etc. 
He has published more than 40 journal and 70 conference papers. He also has over 30 contributions to standardisation organisations such as the IETF, ATM Forum, ITU, ANSI and TIA. He is on the editorial boards of Ad Hoc Networks Journal (Elsevier) and is a senior member of the IEEE.
1 Introduction
Computational Grids are a type of parallel and distributed system that enable dynamic sharing, selection and aggregation of geographically distributed autonomous resources, depending on their availability, capability, performance, cost and users’ QoS requirements (Foster and Kesselman, 1998; Foster et al., 2001). The rapid development of the internet and other new technologies enabled the construction and deployment of grid and Peer-to-Peer (P2P) applications, which benefited from the large computing capacity offered by such large-scale distributed systems. Thus, numerous projects from optimisation, engineering, medicine, biochemistry, physics, weather forecasting, simulations, etc. used grid computing to solve large instances of real-world problems. Examples of such projects are NetSolve by Casanova and Dongarra (1998), applications
for stochastic programming and optimisation (Linderoth and Wright, 2003; Wright, 2001), data-intensive applications (Beynon et al., 2001; Newman et al., 2003) and eCollaborative platforms by Caballé et al. (2004). The usefulness of a grid system largely depends, among other factors, on the efficiency of the system in allocating jobs to grid resources. The resource allocation problem in large systems is known to be computationally hard and much more difficult than its standard version for sequential or Local Area Network (LAN) computation environments. Indeed, not only are the grid systems heterogeneous and dynamic, but usually a large number of jobs originated by different users and applications also have to be allocated to grid nodes. Furthermore, schedulers of the computational grids must provide load balancing of the resources, as the resource utilisation is an important parameter of the grid system, especially when owners are concerned about the use of their resources. At present, the scheduling problem in grid and P2P systems is being tackled from different perspectives, such as queueing systems and resource management, e.g., Condor-G, Nimrod/G (Frey et al., 2001; Abraham et al., 2000), optimisation approaches, e.g., genetic algorithms and other meta-heuristic approaches (Abraham et al., 2000; Buyya et al., 2000; Di Martino and Mililotti, 2004; Carretero and Xhafa, 2006; Xhafa, 2006), and economic models (Buyya, 2002; Buyya et al., 2002). In this work, we address the scheduling problem that arises when jobs have to be periodically scheduled. This type of scheduling is common in many applications, such as Networking by Ross and Bambos (2007) or Monte-Carlo simulations by Casanova et al. (2000), in which many jobs with almost no interdependencies are to be submitted to the grid system. More precisely, we present several batch methods that provide fast planning of jobs to grid resources. 
Batch mode scheduling methods are simple and yet powerful heuristics that are distinguished by their efficiency, in contrast to more sophisticated scheduling methods such as meta-heuristics, economic models or game theory, which could need longer execution times to provide high-quality allocation of resources. Moreover, batch mode methods can take better advantage of job and resource characteristics in deciding which job to allocate to which resource, since they have at their disposal the time interval between two successive activations of the batch scheduler. Several batch mode scheduling methods have been proposed in the literature (Maheswaran et al., 1999; Abraham et al., 2000; Braun et al., 2001; Wu and Shu, 2001), mainly with the objective of using them as part of other approaches. In this work, we considered the following batch mode methods: Min-Min, Max-Min, Sufferage, Relative Cost and Longest Job to Fastest Resource–Shortest Job to Fastest Resource. Our main focus here is, on the one hand, to implement and evaluate these methods for the case of grid systems and, on the other, to address the usefulness of such methods in grid scheduling services. It should be noted that in previous work the performance of these methods was measured using only the makespan of the system, while we study (to the best of our knowledge, for the first time) their performance with regard to four parameters: makespan, flowtime, resource utilisation and matching proximity. Once implemented, these methods were tested and their performance evaluated using the benchmark simulation model by Braun et al. (2001), which is based on the so-called Expected Time to Compute (ETC) model and allows the simulation of distributed heterogeneous systems. From this simulation model, a benchmark of static
instances was generated, which is currently considered one of the most demanding benchmarks for the problem. We have conducted an extensive experimental study to analyse the performance of the methods presented with respect to the four metrics considered. Moreover, the experimental study was aimed at revealing which of the batch methods better takes into account known scheduling characteristics, such as the degree of consistency of computing grid nodes, heterogeneity of resources and heterogeneity of submitted jobs.

Batch mode methods have been widely explored for many computing environments and different types of applications. Owing to their potential use in emergent computational systems, such as grid, P2P and web computing, batch mode methods are attracting special attention from researchers (Casanova et al., 1999; Doerr et al., 1999; Dail, 2002). Thus, we are dealing with grid applications needing long-running parallel computations, which could generate large numbers of jobs, or batch servers to which clients could submit their particular jobs/applications. Providing efficient batch service for these systems is nowadays essential in order to ensure QoS requirements. Though considerable efforts are being made in this respect, scheduling services in emergent computational systems are still open research issues. This motivated our work to address the usefulness of the presented batch methods for batch service for scheduling in grid and P2P applications. Because mathematical models for batch methods are usually difficult to handle for practical purposes, we address this issue from an empirical perspective, that is, using computational results from a simulation model.

The rest of the paper is organised as follows. We give in Section 2 a description of the scheduling problem in computational grids. The batch mode methods considered in this work are given in Section 3. We give in Section 4 some computational results, which are evaluated in Section 5.
We address the usefulness of the presented batch methods as part of batch servicing systems in Section 6 and end in Section 7 with some conclusions and indications for further work.
2 Problem description
Job scheduling in grids consists in efficiently allocating jobs to resources in a global, heterogeneous and dynamic environment. Efficiency means that we are interested in allocating jobs as fast as possible and in optimising several (conflicting) criteria, such as makespan, flowtime, resource utilisation and matching proximity. Jobs have the following characteristics: they originate from different users/applications, each has to be completed on a single resource (non-preemptive mode), they are independent of one another and they have requirements over resources. Resources, on the other hand, can be dynamically added to or dropped from the grid, and each can process one job at a time. In order to formalise the problem definition, we could use existing distributed and heterogeneous systems. The diversity of such systems, however, makes it difficult to capture and generalise the most relevant characteristics of the job scheduling for grid systems. Moreover, fair comparison among different scheduling methods in existing distributed systems would be rather difficult. Therefore, we based our study on a benchmark simulation by Braun et al. (2001), the ETC model, in which the expected running time of jobs on resources is known or can be predicted. Using the ETC model, the scheduling problem consists of:
• a number of independent (user/application) jobs to be scheduled
• a number of heterogeneous candidate machines to participate in the planning
• the workload of each job (in millions of instructions)
• the computing capacity of each machine (in Million Instructions Per Second or MIPS)
• the ready time ready[m], i.e., when machine m will have finished the previously assigned jobs; this parameter measures the previous workload of machine m
• the Expected Time to Compute matrix ETC (nb_jobs × nb_machines), in which the component ETC[i][j] is the expected execution time of job i in machine j.
Interestingly, the ETC model is also able to express possible incompatibilities among jobs and resources by taking the corresponding ETC value equal to ∞. The ETC matrix can be computed in different ways; for instance, by dividing the workload of job i by computing capacity of machine j, we obtain the ETC[i][j] value. Note, however, that this version of the problem does not include other characteristics such as local policies of resources and possible job dependencies. Nonetheless, as described in the Introduction, this version arises in many grid-based applications that can be partitioned into independent parts. For instance, an application of the intensive use of CPUs can be thought of as an application composed of sub-jobs, each one capable of being executed in a different machine of the computational grid.
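The construction of the ETC matrix described above can be illustrated with a minimal Python sketch. It is not the authors' implementation; the function name `build_etc`, the `incompatible` argument and the numeric values are assumptions made for illustration only.

```python
import math

def build_etc(workloads, capacities, incompatible=frozenset()):
    """Return ETC[i][j] = workload of job i / computing capacity of machine j.

    workloads    -- job sizes in millions of instructions
    capacities   -- machine speeds in MIPS
    incompatible -- hypothetical set of (job, machine) pairs that cannot be matched
    """
    etc = []
    for i, workload in enumerate(workloads):
        row = []
        for j, capacity in enumerate(capacities):
            # An incompatible pair is expressed with an infinite expected time.
            row.append(math.inf if (i, j) in incompatible else workload / capacity)
        etc.append(row)
    return etc

# Three jobs, two machines; job 2 is assumed not to run on machine 0.
etc = build_etc([600.0, 200.0, 900.0], [100.0, 300.0], incompatible={(2, 0)})
```

Because incompatible pairs carry an infinite entry, any method that picks machines by smallest expected or completion time avoids them without special handling.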
Optimisation criteria

Several metrics can be used for measuring the quality of a given schedule, such as:

• Makespan: the finishing time of the latest job, defined as:

$$\min_{\text{schedule}} \max\{F_j : j \in \text{Jobs}\} \tag{1}$$

• Flowtime: the sum of the finishing times of jobs, that is:

$$\min_{\text{schedule}} \sum_{j \in \text{Jobs}} F_j \tag{2}$$

• Resource utilisation: expresses the degree of utilisation of resources with respect to the schedule. In fact, we consider the average resource utilisation, detailed next.
The resource utilisation is defined using the completion time of a machine, which indicates the time at which machine m will finalise the processing of the previously assigned jobs as well as those already planned for the machine. Formally, it is defined as follows:

$$\text{completion}[m] = \text{ready}[m] + \sum_{j \in \text{schedule}^{-1}(m)} \text{ETC}[j][m] \tag{3}$$
With the values of the completion time for the machines (Equation (3)), we can define the local makespan (Equation (1)), which is the makespan when considering only the machines involved in the current schedule:

$$\text{local\_makespan} = \max\{\text{completion}[i] \mid i \in \text{Machines}'\} \tag{4}$$

Then, we define:

$$\text{local\_avg\_utilisation} = \frac{\sum_{i \in \text{Machines}} \text{completion}[i]}{\text{local\_makespan} \cdot \text{nb\_machines}} \tag{5}$$
Moreover, we also consider the matching proximity as an additional performance parameter of batch mode methods. Matching proximity indicates the degree of proximity of a given schedule to the schedule produced by the Minimum Execution Time (MET) method, which assigns a job to the machine having the smallest execution time for that job. A large value for matching proximity means that a large number of jobs is assigned to the machine that executes them faster. Formally, this parameter is defined as follows:

$$\text{matching\_proximity} = \frac{\sum_{i \in \text{Jobs}} \text{ETC}[i][\text{schedule}[i]]}{\sum_{i \in \text{Jobs}} \text{ETC}[i][\text{MET}[i]]} \tag{6}$$
It should be noted that these parameters are among the most important parameters of a grid system. Makespan measures the throughput of the grid system, flowtime measures the QoS of the grid system, and resource utilisation indicates the quality of a schedule with respect to the utilisation of resources involved in the schedule, with the aim of reducing the idle time of resources.
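As an illustration of how the four metrics fit together, the following Python sketch evaluates a given schedule. It is a sketch under stated assumptions rather than the authors' code: the function name `evaluate` is hypothetical, jobs on a machine are assumed to run in the order in which they appear in the assignment list, and all machines are taken as the set Machines′ of Equation (4). Equation (6) is implemented as written; since MET picks the smallest entry of each ETC row, that ratio is never below 1, so the values below 1 reported in the results presumably correspond to its reciprocal.

```python
def evaluate(etc, ready, assignment):
    """Return (makespan, flowtime, utilisation, proximity) for a schedule.

    assignment -- list of (job, machine) pairs, in execution order (assumption)
    """
    nb_machines = len(ready)
    completion = list(ready)                  # Equation (3), built incrementally
    flowtime = 0.0
    for job, machine in assignment:
        completion[machine] += etc[job][machine]   # finishing time of this job
        flowtime += completion[machine]            # summand of Equation (2)
    makespan = max(completion)                     # Equation (4), all machines
    utilisation = sum(completion) / (makespan * nb_machines)   # Equation (5)
    scheduled = sum(etc[job][machine] for job, machine in assignment)
    met = sum(min(etc[job]) for job, _ in assignment)
    proximity = scheduled / met                    # Equation (6) as written
    return makespan, flowtime, utilisation, proximity

# Illustrative 3-job, 2-machine instance (values are made up).
etc = [[4.0, 8.0], [6.0, 3.0], [2.0, 5.0]]
result = evaluate(etc, [0.0, 0.0], [(0, 0), (1, 1), (2, 0)])
```

In this small instance every job runs on its fastest machine, so the schedule coincides with the MET schedule and the ratio of Equation (6) equals 1.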
3 Batch mode methods
In this work, we consider Min-Min, Max-Min, Sufferage, Relative Cost and Longest Job to Fastest Resource–Shortest Job to Fastest Resource batch mode methods.
3.1 Min-Min

The Min-Min method starts by computing a matrix of values completions[i][j] for every job i and machine j, based on the ETC[i][j] and ready[j] values:

$$\text{completions}[i][j] = \text{ETC}[i][j] + \text{ready}[j] \tag{7}$$

For every job i, the machine $m_i$ yielding the earliest completion time is computed by traversing the i-th row of the completions matrix. Then the job $i_k$ with the earliest completion time is chosen and mapped to the corresponding machine $m_k$ (previously computed). Next, the job $i_k$ is removed from Jobs and the values completions[i][$m_k$] are updated for every job i remaining in Jobs. The process is repeated as long as there remain jobs to be assigned.
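The steps of Min-Min can be sketched in Python as follows. This is a minimal illustration, not the authors' C++ implementation; the function name `min_min`, the tie-breaking rule (lowest job and machine index) and the sample values are assumptions.

```python
def min_min(etc, ready):
    """Return (assignment, ready): (job, machine) pairs in mapping order
    and the final ready time of each machine."""
    ready = list(ready)
    pending = set(range(len(etc)))
    assignment = []
    while pending:
        best = None                     # (completion time, job, machine)
        for i in pending:
            # Machine giving job i its earliest completion time, Equation (7).
            m = min(range(len(ready)), key=lambda j: etc[i][j] + ready[j])
            candidate = (etc[i][m] + ready[m], i, m)
            # Keep the job whose earliest completion time is smallest.
            if best is None or candidate < best:
                best = candidate
        finish, job, machine = best
        assignment.append((job, machine))
        ready[machine] = finish         # this updates completions[.][machine]
        pending.remove(job)
    return assignment, ready

etc = [[4.0, 8.0], [6.0, 3.0], [2.0, 5.0]]
assignment, ready = min_min(etc, [0.0, 0.0])
```

Recomputing each row at every iteration keeps the sketch short; it performs nb_jobs iterations over at most nb_jobs rows of nb_machines entries each.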
3.2 Max-Min

The Max-Min method is similar to Min-Min. The difference is that, once the machine $m_i$ yielding the earliest completion time has been computed for every job i, the job $i_k$ with the latest of these completion times is chosen and mapped to its corresponding machine. Note that this method is appropriate when most of the jobs arriving in the grid system are short ones. In that case, Max-Min places the few long jobs first and runs the short jobs alongside them, while Min-Min schedules the shortest jobs first and the longest ones afterwards, which tends to yield a larger makespan.
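A minimal Python sketch of Max-Min follows, using an instance with one long job and four short ones on two identical machines. The function name `max_min`, the tie-breaking rule and the data are assumptions for illustration, not taken from the authors' implementation.

```python
def max_min(etc, ready):
    """Return (assignment, ready) produced by the Max-Min rule."""
    ready = list(ready)
    pending = set(range(len(etc)))
    assignment = []
    while pending:
        chosen = None                   # (completion time, job, machine)
        for i in pending:
            # Earliest completion time of job i over all machines.
            m = min(range(len(ready)), key=lambda j: etc[i][j] + ready[j])
            candidate = (etc[i][m] + ready[m], i, m)
            # Unlike Min-Min, keep the job with the LATEST such time.
            if chosen is None or candidate > chosen:
                chosen = candidate
        finish, job, machine = chosen
        assignment.append((job, machine))
        ready[machine] = finish
        pending.remove(job)
    return assignment, ready

# One long job (10 time units) and four short jobs (2 units each).
etc = [[10.0, 10.0], [2.0, 2.0], [2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]
assignment, ready = max_min(etc, [0.0, 0.0])
```

Here the long job is placed first and the short jobs fill the other machine, giving a makespan of 10; applying the Min-Min rule by hand to the same instance places the long job last and ends at 14.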
3.3 Sufferage

The idea behind the Sufferage method is that better scheduling could be obtained if we assign to a machine the job that would 'suffer' most if it were assigned to any other machine. To implement this method, the sufferage parameter of a job is defined as the difference between its second-earliest completion time, obtained on machine $m_l$, and its earliest completion time, obtained on machine $m_i$. The method starts by labelling all machines as available. Then, in each iteration a pending job j is chosen to be scheduled. To this end, machines $m_i$ and $m_l$ and the sufferage value are computed for job j. If machine $m_i$ is available, then job j is assigned to $m_i$. If $m_i$ is already busy with a job j′, then j and j′ compete for machine $m_i$; the winner is the job with the largest sufferage value. The job losing the competition will be considered for scheduling after all pending jobs have been analysed.
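The competition between jobs can be sketched in Python as below. This is one possible reading of the description, not the authors' code: the function name `sufferage` and the `holder` dictionary are hypothetical, ties are resolved in favour of the job analysed first, and finite ETC values are assumed.

```python
def sufferage(etc, ready):
    """Return (assignment, ready) produced by the Sufferage rule."""
    ready = list(ready)
    nb_machines = len(ready)
    pending = list(range(len(etc)))
    assignment = []
    while pending:
        holder = {}     # machine -> (sufferage value, job) for this pass
        for job in pending:
            times = sorted((etc[job][m] + ready[m], m) for m in range(nb_machines))
            best_machine = times[0][1]
            # Second-earliest minus earliest completion time of the job.
            value = times[1][0] - times[0][0] if nb_machines > 1 else 0.0
            # The job takes the machine if it is free or if it suffers more
            # than the job currently holding it; the loser stays pending.
            if best_machine not in holder or value > holder[best_machine][0]:
                holder[best_machine] = (value, job)
        for machine, (_, job) in sorted(holder.items()):
            ready[machine] += etc[job][machine]
            assignment.append((job, machine))
            pending.remove(job)
    return assignment, ready

etc = [[4.0, 8.0], [6.0, 3.0], [2.0, 5.0]]
assignment, ready = sufferage(etc, [0.0, 0.0])
```

In the first pass jobs 0 and 2 both prefer machine 0; job 0 has the larger sufferage value (4 against 3) and wins, so job 2 is reconsidered in the next pass.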
3.4 Relative cost

In allocating jobs to machines, the Relative Cost method takes into account both the load balancing of machines and the execution times of jobs in machines. Therefore, for a given job, this method finds the machine that best matches the job's execution time. In fact, this last criterion is known as matching proximity and is used, apart from makespan, flowtime and resource utilisation, as another metric for measuring the performance of the (batch) scheduling methods. Note that load balancing and matching proximity are conflicting criteria. In order to find a good trade-off between them, the method uses two parameters, namely, static relative cost and dynamic relative cost, defined next. Given a job i and a machine j, the static relative cost $\gamma^{s}_{ij}$ is defined as:

$$\gamma^{s}_{ij} = \frac{\text{ETC}[i][j]}{\text{etc\_avg}_i}, \quad \text{where} \quad \text{etc\_avg}_i = \frac{\sum_{j \in \text{Machines}} \text{ETC}[i][j]}{\text{nb\_machines}} \tag{8}$$

This static parameter is computed only once at the beginning of the execution of the method. The dynamic relative cost, on the other hand, is computed at the beginning of each iteration k, as follows:

$$\gamma^{d}_{ij} = \frac{\text{completions}^{(k)}[i][j]}{\text{completion\_avg}_i^{(k)}} \tag{9}$$
where:

$$\text{completion\_avg}_i^{(k)} = \frac{\sum_{j \in \text{Machines}} \text{completions}^{(k)}[i][j]}{\text{nb\_machines}} \tag{10}$$
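At each iteration k, the best job $i_{best}$ is the one that minimises the expression:

$$\left(\gamma^{s}_{i,m_i^*}\right)^{\alpha} \cdot \gamma^{d}_{i,m_i^*}, \quad \forall i \in \text{Jobs} \tag{11}$$

where:

$$m_i^* = \operatorname{argmin}\{\text{completions}^{(k)}[i][m] \mid m \in \text{Machines}\} \tag{12}$$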
The value of α is fixed to 0.5 for the purpose of the computational results.
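Equations (8) to (12) can be combined into the following Python sketch. It is an illustration under assumptions, not the authors' implementation: the function name `relative_cost` is hypothetical, the chosen job is assumed to be mapped to its machine $m_i^*$, ties go to the lowest job index, and ETC values are assumed finite and positive.

```python
def relative_cost(etc, ready, alpha=0.5):
    """Return (assignment, ready) produced by the Relative Cost rule."""
    ready = list(ready)
    nb_machines = len(ready)
    # Static relative cost, Equation (8): computed once.
    static = []
    for row in etc:
        etc_avg = sum(row) / nb_machines
        static.append([t / etc_avg for t in row])
    pending = set(range(len(etc)))
    assignment = []
    while pending:
        best = None                                      # (score, job, machine)
        for i in pending:
            completions = [etc[i][m] + ready[m] for m in range(nb_machines)]
            m_star = min(range(nb_machines), key=completions.__getitem__)  # Eq. (12)
            completion_avg = sum(completions) / nb_machines                # Eq. (10)
            dynamic = completions[m_star] / completion_avg                 # Eq. (9)
            score = static[i][m_star] ** alpha * dynamic                   # Eq. (11)
            if best is None or (score, i, m_star) < best:
                best = (score, i, m_star)
        _, job, machine = best
        ready[machine] += etc[job][machine]
        assignment.append((job, machine))
        pending.remove(job)
    return assignment, ready

etc = [[4.0, 8.0], [6.0, 3.0], [2.0, 5.0]]
assignment, ready = relative_cost(etc, [0.0, 0.0])
```

Only the dynamic term changes between iterations, which is why the static matrix is built before the loop.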
3.5 Longest Job to Fastest Resource–Shortest Job to Fastest Resource (LJFR-SJFR)

The last method differs substantially from the rest of the presented methods since it tries to simultaneously minimise both the makespan and flowtime values of a schedule. It can be seen as being composed of two methods: the Longest Job to Fastest Resource (LJFR), which tries to minimise makespan, and the Shortest Job to Fastest Resource (SJFR), which tries to minimise flowtime. We note that this method uses the workloads of jobs and computing capacity of resources. Essentially, the LJFR is alternated with the SJFR. The method starts by sorting the jobs in ascending order of their workloads. At the beginning, the nb_machines longest jobs are assigned to the nb_machines idle machines (the longest job to the fastest machine and so on). Then, for the rest of the jobs, at each step, the fastest of the machines that have finished their jobs is chosen and is alternately assigned either the shortest job (SJFR) or the longest job (LJFR) from the remaining jobs.
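The alternation can be sketched in Python as follows. Several details are assumptions, since the description leaves them open: the function name `ljfr_sjfr` is hypothetical, the next machine is taken to be the one that becomes free first (ties going to the faster machine), and the alternation is assumed to start with SJFR after the initial LJFR phase.

```python
def ljfr_sjfr(workloads, capacities):
    """Return (assignment, finish) for jobs given by workload (millions of
    instructions) and machines given by computing capacity (MIPS)."""
    jobs = sorted(range(len(workloads)), key=lambda i: workloads[i])   # ascending
    machines = sorted(range(len(capacities)), key=lambda m: -capacities[m])
    finish = [0.0] * len(capacities)
    assignment = []

    def run(job, machine):
        finish[machine] += workloads[job] / capacities[machine]
        assignment.append((job, machine))

    # Initial LJFR phase: longest job to fastest machine, and so on.
    for machine in machines:
        if jobs:
            run(jobs.pop(), machine)
    shortest_next = True            # assumption: alternation starts with SJFR
    while jobs:
        # Machine that becomes free first; ties go to the faster machine.
        machine = min(machines, key=lambda m: (finish[m], -capacities[m]))
        run(jobs.pop(0) if shortest_next else jobs.pop(), machine)
        shortest_next = not shortest_next
    return assignment, finish

assignment, finish = ljfr_sjfr([600.0, 200.0, 900.0, 300.0, 100.0], [100.0, 300.0])
```

The sketch works directly on workloads and capacities rather than on an ETC matrix, in line with the remark that this method uses a different input from the other four.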
3.6 Implementation

To approach the batch scheduling numerically, the presented batch methods have been implemented in C++. From a computational complexity perspective, it can be shown that the time complexity of the batch methods being considered is $O(\text{nb\_jobs}^2 \cdot \text{nb\_machines})$. This fact is useful in deciding the length of the batch according to the computational and execution time requirements.
4 Computational results
In this section, we present the computational results obtained with the implementations of the batch mode methods. In experimentally evaluating scheduling methods, one has basically two alternatives: either use a real grid environment or use a simulation model that simulates as much as possible a real grid environment. In this paper, we present results obtained by using the benchmark simulation model by Braun et al. (2001), which
enables the high configuration flexibility needed at the first phases of the study. In the future, directed also by the results, we plan to experiment with real grid environments, which are expensive and do not allow high configuration flexibility.
4.1 Benchmark description

The instances of this benchmark are classified into 12 different types of ETC matrices, each of them consisting of 100 instances, according to three metrics: job heterogeneity, machine heterogeneity and consistency. Instances are labelled as u_x_yyzz.k, where:

• u means uniform distribution (used in generating the matrix).
• x means the type of consistency (c – consistent, i – inconsistent and s – semi-consistent). An ETC matrix is considered consistent when, if a machine $m_i$ executes job t faster than machine $m_j$, then $m_i$ executes all the jobs faster than $m_j$. Inconsistency means that a machine is faster for some jobs and slower for others. An ETC matrix is considered semi-consistent if it contains a consistent sub-matrix.
• yy indicates the heterogeneity of the jobs (hi means high, and lo means low).
• zz indicates the heterogeneity of the resources (hi means high, and lo means low).
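The labelling scheme and the consistency property can be made concrete with a short Python sketch. The helper names `parse_label` and `is_consistent` are hypothetical; reading the suffix k as an instance index and treating equal execution times as compatible with consistency are assumptions, since the text does not state either.

```python
import re

LABEL = re.compile(r"^u_([cis])_(hi|lo)(hi|lo)\.(\d+)$")
CONSISTENCY = {"c": "consistent", "i": "inconsistent", "s": "semi-consistent"}

def parse_label(label):
    """Split a label of the form u_x_yyzz.k into its components."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a benchmark label: {label!r}")
    x, yy, zz, k = match.groups()
    return {
        "distribution": "uniform",
        "consistency": CONSISTENCY[x],
        "job_heterogeneity": yy,
        "resource_heterogeneity": zz,
        "index": int(k),            # assumption: k numbers the instance
    }

def is_consistent(etc):
    """True if, for every pair of machines, one is at least as fast as
    the other on all jobs."""
    nb_machines = len(etc[0])
    for a in range(nb_machines):
        for b in range(a + 1, nb_machines):
            a_faster = all(row[a] <= row[b] for row in etc)
            b_faster = all(row[b] <= row[a] for row in etc)
            if not (a_faster or b_faster):
                return False
    return True

info = parse_label("u_c_hilo.0")
```

For example, `is_consistent([[4.0, 8.0], [2.0, 5.0]])` holds because machine 0 is faster on both jobs, while swapping the entries of one row breaks the property.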
We note that this benchmark is considered one of the most demanding for the scheduling problem in heterogeneous environments, and is the main reference in the literature (Ritchie and Levine, 2003; Ritchie, 2003; Giersch et al., 2004; Song et al., 2005; Kwok et al., 2005). Note that for all the instances, the number of jobs is 512 and the number of machines is 16. We have selected different types of instances and, for each instance considered, we compute the makespan, flowtime, resource utilisation and matching proximity. To the best of our knowledge, this is the first time computational results are reported for all of these methods together and, in particular, for the LJFR-SJFR method, which is used in the literature for initialising other heuristic approaches. For this method, each ETC input instance is first transformed into the input format used by LJFR-SJFR, which differs from the ETC format.
4.2 Results for makespan

We present in Table 1 the computational results obtained from batch methods for makespan. An illustrative graphical representation of the makespan values is shown in Figure 1 (the figure is split into two panels owing to the scale of makespan values for different types of instances). Ritchie and Levine (2003) used the Min-Min method in combination with Local Search procedures. We present in Table 2 the comparison of the makespan results from our implementation with their results (Ritchie and Levine, 2003). We can observe small differences in the reported makespan values, though our implementation yields better results for 8 out of the 12 considered instances.
Table 1 Makespan values for Braun et al. benchmark (in arbitrary time units)

| Instance | Min-Min | Max-Min | Sufferage | Relative cost | LJFR-SJFR |
|---|---|---|---|---|---|
| u_c_hihi.0 | 8 460 675.003 | 12 385 671.828 | 10 908 697.836 | 9 576 838.988 | 14 665 588.00 |
| u_c_hilo.0 | 161 805.434 | 204 054.588 | 167 483.273 | 163 200.214 | 213 423.25 |
| u_c_lohi.0 | 275 837.356 | 392 566.686 | 349 746.074 | 309 192.737 | 485 590.92 |
| u_c_lolo.0 | 5441.428 | 6945.362 | 5649.899 | 5542.551 | 7112.78 |
| u_i_hihi.0 | 3 513 919.281 | 8 018 378.071 | 3 391 758.359 | 3 447 651.421 | 25 777 789.21 |
| u_i_hilo.0 | 80 755.679 | 151 923.834 | 78 828.278 | 76 471.530 | 278 568.11 |
| u_i_lohi.0 | 120 517.708 | 251 528.847 | 125 688.604 | 126 002.417 | 816 216.91 |
| u_i_lolo.0 | 2785.645 | 5177.709 | 2673.828 | 2677.048 | 9276.56 |
| u_s_hihi.0 | 5 160 342.819 | 9 208 811.495 | 5 574 357.799 | 5 068 011.463 | 19 429 749.81 |
| u_s_hilo.0 | 104 375.164 | 172 822.698 | 103 400.813 | 101 739.593 | 243 432.46 |
| u_s_lohi.0 | 140 284.488 | 282 085.731 | 153 094.025 | 143 491.192 | 630 140.64 |
| u_s_lolo.0 | 3806.827 | 6232.241 | 3727.971 | 3679.586 | 8688.05 |
Figure 1 Graphical representation of makespan values for all but 'u_x_hihi' instances (top) and for 'u_x_hihi' instances (bottom). Top panel: makespan (vertical axis) per instance (horizontal axis), one series per method (Min-Min, Max-Min, Sufferage, Relative Cost, LJFR-SJFR). Bottom panel: makespan (vertical axis) per method (horizontal axis), one series per instance (u_c_hihi.0, u_i_hihi.0, u_s_hihi.0).
Table 2 Comparison of makespan values (in arbitrary time units) obtained with our implementation and Ritchie and Levine's implementation

| Instance | Min-Min (our implementation) | Min-Min (Ritchie and Levine) |
|---|---|---|
| u_c_hihi.0 | 8 460 675.003 | 8 428 258.43 |
| u_c_hilo.0 | 161 805.434 | 162 745.18 |
| u_c_lohi.0 | 275 837.356 | 283 083.4 |
| u_c_lolo.0 | 5441.428 | 5460.25 |
| u_i_hihi.0 | 3 513 919.281 | 3 632 360.64 |
| u_i_hilo.0 | 80 755.679 | 82 413.3 |
| u_i_lohi.0 | 120 517.708 | 122 044.94 |
| u_i_lolo.0 | 2785.645 | 2777.16 |
| u_s_hihi.0 | 5 160 342.819 | 4 897 763.92 |
| u_s_hilo.0 | 104 375.164 | 105 157.39 |
| u_s_lohi.0 | 140 284.488 | 163 927.91 |
| u_s_lolo.0 | 3806.827 | 3527.45 |
4.3 Results for flowtime

We present in Table 3 the computational results for flowtime obtained from batch methods. Their graphical representation is given in Figure 2.

Table 3 Flowtime values for Braun et al. benchmark (in arbitrary time units)

| Instance | Min-Min | Max-Min | Sufferage | Relative cost | LJFR-SJFR |
|---|---|---|---|---|---|
| u_c_hihi.0 | 1 047 290 764.1 | 1 677 778 133.7 | 1 728 201 829.0 | 1 440 713 010.0 | 2 025 822 398.6 |
| u_c_hilo.0 | 27 678 395.8 | 35 444 353.0 | 32 149 457.3 | 30 338 311.0 | 35 565 379.5 |
| u_c_lohi.0 | 34 764 848.0 | 55 217 956.2 | 58 754 668.5 | 47 700 966.3 | 66 300 486.2 |
| u_c_lolo.0 | 922 005.6 | 1 187 820.4 | 1 052 744.8 | 1 008 866.8 | 1 175 661.3 |
| u_i_hihi.0 | 352 768 066.2 | 888 119 419.4 | 428 139 885.7 | 367 881 880.2 | 3 665 062 510.3 |
| u_i_hilo.0 | 12 520 217.4 | 22 258 825.2 | 13 312 013.5 | 12 923 076.1 | 41 345 273.2 |
| u_i_lohi.0 | 12 369 160.9 | 28 373 487.6 | 14 624 047.6 | 12 774 845.8 | 118 925 452.9 |
| u_i_lolo.0 | 437 571.4 | 784 299.4 | 463 245.6 | 444 104.2 | 1 385 846.1 |
| u_s_hihi.0 | 508 290 217.0 | 1 124 763 070.1 | 768 589 776.8 | 631 516 071.2 | 2 631 459 406.5 |
| u_s_hilo.0 | 16 345 314.2 | 26 872 788.0 | 18 614 808.8 | 17 596 804.6 | 35 745 658.3 |
| u_s_lohi.0 | 14 995 170.1 | 33 400 144.1 | 22 130 034.8 | 19 033 082.9 | 86 390 552.3 |
| u_s_lolo.0 | 596 097.1 | 977 828.5 | 680 682.7 | 637 505.3 | 1 389 828.7 |
Figure 2 Graphical representation of flowtime values for all but 'u_x_hihi' instances (top) and for 'u_x_hihi' instances (bottom). Top panel: flowtime (vertical axis) per instance (horizontal axis), one series per method (Min-Min, Max-Min, Sufferage, Relative Cost, LJFR-SJFR). Bottom panel: flowtime (vertical axis) per method (horizontal axis), one series per instance (u_c_hihi.0, u_i_hihi.0, u_s_hihi.0).
4.4 Results for resource utilisation

We present in Table 4 the computational results for resource utilisation obtained from batch methods and the respective graphical representations in Figure 3.

Table 4 Average resource utilisation values for Braun et al. benchmark

| Instance | Min-Min | Max-Min | Sufferage | Relative cost | LJFR-SJFR |
|---|---|---|---|---|---|
| u_c_hihi.0 | 0.898 | 0.9991 | 0.916 | 0.994 | 0.948 |
| u_c_hilo.0 | 0.948 | 0.9997 | 0.988 | 0.995 | 0.944 |
| u_c_lohi.0 | 0.887 | 0.9992 | 0.938 | 0.988 | 0.926 |
| u_c_lolo.0 | 0.950 | 0.9996 | 0.989 | 0.989 | 0.948 |
| u_i_hihi.0 | 0.834 | 0.996 | 0.951 | 0.863 | 0.961 |
| u_i_hilo.0 | 0.915 | 0.9994 | 0.965 | 0.972 | 0.963 |
| u_i_lohi.0 | 0.857 | 0.998 | 0.898 | 0.819 | 0.981 |
| u_i_lolo.0 | 0.920 | 0.9991 | 0.984 | 0.957 | 0.965 |
| u_s_hihi.0 | 0.796 | 0.997 | 0.917 | 0.940 | 0.97 |
| u_s_hilo.0 | 0.924 | 0.998 | 0.987 | 0.992 | 0.924 |
| u_s_lohi.0 | 0.888 | 0.998 | 0.968 | 0.986 | 0.946 |
| u_s_lolo.0 | 0.916 | 0.9991 | 0.987 | 0.977 | 0.966 |

Note: See Equation (5)

Figure 3 Graphical representation of resource utilisation values. Resource utilisation (vertical axis) per instance (horizontal axis), one series per method (Min-Min, Max-Min, Sufferage, Relative Cost, LJFR-SJFR).
4.5 Results for matching proximity

The computational results for matching proximity are given in Table 5 and the graphical representation in Figure 4.

Table 5 Matching proximity values for Braun et al. benchmark

| Instance | Min-Min | Max-Min | Sufferage | Relative cost | LJFR-SJFR |
|---|---|---|---|---|---|
| u_c_hihi.0 | 0.390 | 0.239 | 0.296 | 0.311 | 0.213 |
| u_c_hilo.0 | 0.482 | 0.363 | 0.447 | 0.455 | 0.367 |
| u_c_lohi.0 | 0.370 | 0.231 | 0.276 | 0.297 | 0.201 |
| u_c_lolo.0 | 0.478 | 0.356 | 0.442 | 0.451 | 0.366 |
| u_i_hihi.0 | 0.966 | 0.354 | 0.878 | 0.952 | 0.114 |
| u_i_hilo.0 | 0.980 | 0.477 | 0.952 | 0.974 | 0.270 |
| u_i_lohi.0 | 0.964 | 0.396 | 0.882 | 0.965 | 0.124 |
| u_i_lolo.0 | 0.981 | 0.486 | 0.955 | 0.981 | 0.280 |
| u_s_hihi.0 | 0.678 | 0.303 | 0.545 | 0.585 | 0.147 |
| u_s_hilo.0 | 0.755 | 0.422 | 0.714 | 0.722 | 0.324 |
| u_s_lohi.0 | 0.659 | 0.291 | 0.554 | 0.581 | 0.137 |
| u_s_lolo.0 | 0.750 | 0.420 | 0.711 | 0.727 | 0.311 |

Note: See Equation (6)

Figure 4 Graphical representation of matching proximity values. Matching proximity (vertical axis) per instance (horizontal axis), one series per method.
5 Performance evaluation
In this section, we evaluate the computational results obtained from batch mode methods regarding the four parameters: makespan, flowtime, resource utilisation and matching proximity. Note that the presented methods are deterministic. Their execution times are determined by their time complexity, so the hardware/software configuration used has little bearing on the comparison. For the benchmark instances considered, consisting of 512 jobs and 16 machines, the running time would be less than one second on any basic configuration of today's computers. As can be seen from the description of the methods considered, they use concrete strategies; therefore the objective is to evaluate their performance, from which we can deduce which method to use for certain grid system characteristics (configurations), especially if such characteristics are known in advance. From Tables 1, 3, 4 and 5, we can easily analyse the performance of the presented batch methods with respect to the four metrics studied. Surprisingly, the LJFR-SJFR method, which is aimed at simultaneously minimising both makespan and flowtime, obtained very poor results for makespan, flowtime and matching proximity, although its resource utilisation is comparable to that of the other methods (Table 4). We have therefore excluded it from the analysis given in the next subsections.
5.1 Makespan
Min-Min and Relative Cost obtain the best makespan values. As can be seen from Table 1, Min-Min is appropriate for consistent matrices and for grid scenarios with low heterogeneity of jobs and high heterogeneity of resources, whereas Relative Cost works better than Min-Min for semi-consistent and inconsistent matrices. Unlike the results reported in Maheswaran et al. (1999), the Sufferage method does not outperform the Min-Min and Relative Cost methods. One possible explanation is that Sufferage yields better results for smaller matrices; in fact, when it was applied to some sub-matrices of the considered instances, better performance was observed, even outperforming Min-Min in a few cases. Max-Min shows the worst performance (see Table 1 and Figure 1).
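To make the contrast between Min-Min and Max-Min concrete, the following Python sketch follows the standard formulation of the two heuristics (as in Braun et al., 2001). It is a minimal illustration and not the implementation evaluated in this paper; the names `batch_schedule`, `etc` and `pick` are hypothetical, and ties are broken by job and machine index, which is an assumption.

```python
def batch_schedule(etc, pick=min):
    """Map every job of a batch to a machine.

    etc[j][m] is the expected time to compute job j on machine m.
    pick=min gives Min-Min, pick=max gives Max-Min.
    Returns the (job, machine) pairs in assignment order and the
    final completion time of each machine.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines            # time at which each machine becomes free
    unassigned = set(range(len(etc)))
    order = []
    while unassigned:
        # For each unassigned job, find its earliest completion time and machine.
        best = []
        for job in sorted(unassigned):
            ct, machine = min((ready[m] + etc[job][m], m) for m in range(n_machines))
            best.append((ct, job, machine))
        # Min-Min commits the job with the smallest such time, Max-Min the largest.
        ct, job, machine = pick(best)
        ready[machine] = ct
        unassigned.remove(job)
        order.append((job, machine))
    return order, ready


if __name__ == "__main__":
    etc = [[4, 6], [3, 5], [8, 2]]        # toy instance: 3 jobs, 2 machines
    print(batch_schedule(etc, pick=min))  # Min-Min
    print(batch_schedule(etc, pick=max))  # Max-Min
```

Both variants recompute the completion times after every assignment, so the only difference is whether short or long jobs are committed first. On the toy instance the two orders differ but the makespan is 7 in both cases.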
5.2 Flowtime
Min-Min performs best for this parameter, followed by Relative Cost. Again, Max-Min is the worst, and Sufferage seems to perform well only for consistent matrices with high heterogeneity of resources (see Table 3 and Figure 2).
5.3 Average resource utilisation
The Max-Min method achieves the best resource utilisation, although it performs poorly for makespan. Min-Min, on the other hand, shows the worst performance, as expected, since it does not balance the load well. Relative Cost and Sufferage perform quite well; the former for consistent and semi-consistent matrices and the latter for inconsistent matrices (see Table 4 and Figure 3). We notice, however, that all methods achieve a high rate of resource utilisation and therefore seem not to be much affected by the type of instance.
5.4 Matching proximity
Min-Min achieves the best proximity, closely followed by Relative Cost and Sufferage. Max-Min performs poorly, especially for inconsistent and semi-consistent matrices (see Table 5 and Figure 4). The difference in matching proximity among the methods is thus considerable for inconsistent matrices and, to a lesser extent, for semi-consistent matrices.
6 Batch servicing systems
In this section, based on the results obtained for the batch methods, we address their usefulness in batch servicing. The presented methods can be used either in classical batch systems/batch servers or as part of middleware for emergent computational systems, such as grid scheduling services. As shown by our empirical analysis, the presented batch methods have several properties that make them an interesting option for both classical and emergent computational systems. We discuss this issue next.
In a batch service system, individual jobs arrive at the system at random. Typically, jobs are queued until a batch of a pre-specified size is formed, which is then passed to the scheduler for processing. The efficiency of batch service systems depends on the batch processing rate, which is, in general, difficult to compute because it must take into account many parameters, such as the computing capacity of the system. The presented batch methods have the advantage of taking into account the computing capacity of resources. Moreover, traditional batch systems take job characteristics into account to a lesser degree. Thus, if the batch system is informed about the computing capacity of resources, their computing consistency and the job characteristics, the scheduler can choose the batch method that yields the best performance.
Observe that, by using the presented batch methods, we could also indicate which parameter of the system (makespan, flowtime, resource utilisation or matching proximity) we are most interested in optimising. Traditional batch systems try to optimise only the throughput of the system, which is reasonable for LANs and clusters but is not sufficient for emergent computational systems such as grid and P2P systems. Indeed, in such systems, we could also be interested in:
• Obtaining a high QoS: this requires optimising the flowtime of the system.
• Maximising the user benefits: many of the emergent computational systems are being built up by joining the resources of institutions, enterprises or persons, which could expect benefits owing to the use of their resources by the grid enterprise. This would require maximising the resource utilisation, which is one of the metrics studied for the presented batch methods.
• Minimising the grid users’ payment: grid systems could be used in a pay-per-use mode; that is, users who submit their jobs/applications to the grid system could be required to pay for using it. It is reasonable to expect that the system would map the users’ jobs to the machines that best fit them; this is precisely what the matching proximity metric measures.
From the above observations, we deduce that the presented batch methods are very useful for traditional and emergent computational systems, as they allow the optimisation of several parameters and take into account characteristics of the underlying infrastructure, user requirements and job characteristics. Regarding this last feature, the workload of jobs can be computed either from the user’s description/information or from historical data. Examples of such workload data are available from the Cornell Theory Center (Hotovy, 1996) or the Parallel Workload Archive (The Hebrew University Parallel Systems Laboratory, 2006).
In the case of computational grids, the presented batch methods would be used as part of a grid middleware or, to be precise, as part of grid scheduling services. One current approach for scheduling in such systems is the design of super-schedulers or brokers, which are informed about the state of the network and adaptively allocate the user jobs to resources. Such a super-scheduler could then use the different scheduling methods presented in this work according to the state of the network.
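The selection step that such a super-scheduler would perform can be sketched from the findings of Section 5. The following Python fragment is a minimal illustration and not part of the evaluated implementation. The names `BEST_BY_METRIC`, `BEST_MAKESPAN_BY_CONSISTENCY` and `choose_method` are hypothetical, and the tables encode only the observations reported for the benchmark instances (Sections 5.1 to 5.4), not a general rule.

```python
# Hypothetical lookup tables distilled from Sections 5.1-5.4 of this paper.
BEST_MAKESPAN_BY_CONSISTENCY = {
    "consistent": "Min-Min",
    "semi-consistent": "Relative Cost",
    "inconsistent": "Relative Cost",
}
BEST_BY_METRIC = {
    "flowtime": "Min-Min",
    "resource utilisation": "Max-Min",
    "matching proximity": "Min-Min",
}


def choose_method(metric, consistency):
    """Return the batch method that performed best for the requested metric."""
    if consistency not in BEST_MAKESPAN_BY_CONSISTENCY:
        raise ValueError(f"unknown consistency type: {consistency!r}")
    if metric == "makespan":
        # Only for makespan did the best method depend on consistency.
        return BEST_MAKESPAN_BY_CONSISTENCY[consistency]
    if metric not in BEST_BY_METRIC:
        raise ValueError(f"unknown metric: {metric!r}")
    return BEST_BY_METRIC[metric]


if __name__ == "__main__":
    print(choose_method("makespan", "inconsistent"))
    print(choose_method("resource utilisation", "consistent"))
```

A real broker would refine this table with job and resource heterogeneity and with the current state of the network; the sketch only shows where the empirical results would plug in.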
7 Conclusions and future work
In this work, we considered a family of batch methods for job scheduling in computational grids. Five batch mode methods (Min-Min, Max-Min, Sufferage, Relative Cost and LJFR-SJFR) were presented and studied. The implementations of these methods were tested using the benchmark simulation model for distributed heterogeneous systems by Braun et al. (2001). The computational results show that no single batch method performs best in all cases; rather, performance depends on the grid scenario, that is, on the heterogeneity of the jobs (high, low), the heterogeneity of the resources (high, low) and the consistency (consistent, inconsistent and semi-consistent) of the computing resources. It should be noted that consistency allows the simulation of real grid environments having job resource restrictions.
The experimental study revealed the usefulness of applying certain batch methods when the characteristics of the grid system are known in advance. Thus, schedulers based on these methods could be designed to achieve an effective and efficient allocation of resources depending on the desired system metric (makespan, flowtime, resource utilisation or matching proximity). In our future work, we plan to implement a scheduler that uses the presented batch methods and to measure its performance experimentally using a grid simulator prototype, which would allow us to easily change the configuration of grid nodes and job characteristics in order to test the scheduler against different grid scenarios.
Acknowledgements
This research is partially supported by Projects ASCE TIN2005-09198-C02-02, FP6-2004-ISO-FETPI (AEOLUS) and MEC TIN2005-25859-E. The authors would like to thank Javier Carretero for his programming work and experimental setup during the realisation of his Final Project in Informatics Degree, Faculty of Informatics of Barcelona.
References
Abraham, A., Buyya, R. and Nath, B. (2000) ‘Nature’s heuristics for scheduling jobs on computational grids’, The 8th IEEE Int. Conf. on Advanced Computing and Communications (ADCOM 2000), India.
Beynon, M.D., Sussman, A., Catalyurek, U., Kure, T. and Saltz, J. (2001) ‘Optimization for data intensive grid applications’, Third Annual International Workshop on Active Middleware Services, pp.97–106.
Braun, T.D., Siegel, H.J., Beck, N., Boloni, L.L., Maheswaran, M., Reuther, A.I., Robertson, J.P., et al. (2001) ‘A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems’, J. of Parallel and Distr. Comp., Vol. 61, No. 6, pp.810–837.
Buyya, R. (2002) ‘Economic-based distributed resource management and scheduling for grid computing’, PhD thesis, Monash University, Melbourne, Australia.
Buyya, R., Abramson, D. and Giddy, J. (2000) ‘Nimrod/G: an architecture for a resource management and scheduling system in a global computational grid’, The 4th Int. Conf. on High Performance Computing, Asia-Pacific Region, China.
Buyya, R., Abramson, D., Giddy, J. and Stockinger, H. (2002) ‘Economic models for resource management and scheduling in grid computing’, Concurrency and Computation: Practice and Experience, Vol. 14, Nos. 13–15, pp.1507–1542.
Caballé, S., Xhafa, F., Daradoumis, T. and Marqués, J.M. (2004) ‘Towards a generic platform for developing CSCL applications using grid infrastructure’, Cluster Computing and the Grid, CD Proceedings, USA: IEEE.
Carretero, J. and Xhafa, F. (2006) ‘Using genetic algorithms for scheduling jobs in large scale grid applications’, Journal of Technological and Economic Development – A Research Journal of Vilnius Gediminas Technical University, Vol. 12, No. 1, pp.11–17.
Casanova, H. and Dongarra, J. (1998) ‘Netsolve: network enabled solvers’, IEEE Computational Science and Engineering, Vol. 5, No. 3, pp.57–67.
Casanova, H., Kim, M., Plank, J.S. and Dongarra, J. (1999) ‘Adaptive scheduling for task farming with grid middleware’, International Journal of High Performance Computing, Sage Science Press, Vol. 13, No. 3, pp.231–240.
Casanova, H., Legrand, A., Zagorodnov, D. and Berman, F. (2000) ‘Heuristics for scheduling parameter sweep applications in grid environments’, Heterogeneous Computing Workshop, pp.349–363.
Dail, H.J. (2002) ‘A modular framework for adaptive scheduling in grid application development environments’, Master’s thesis, University of California, San Diego.
Di Martino, V. and Mililotti, M. (2004) ‘Sub optimal scheduling in a grid using genetic algorithms’, Parallel Computing, Vol. 30, pp.553–565.
Doerr, B.S., Venturella, T., Jha, R., Gill, C.D. and Schmidt, D.C. (1999) ‘Adaptive scheduling for real-time, embedded information systems’, 18th IEEE/AIAA DASC, St. Louis.
Foster, I. and Kesselman, C. (1998) The Grid – Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers.
Foster, I., Kesselman, C. and Tuecke, S. (2001) ‘The anatomy of the grid’, International Journal of Supercomputer Applications, Vol. 15, No. 3.
Frey, J., Tannenbaum, T., Foster, I., Livny, M. and Tuecke, S. (2001) ‘Condor-G: a computation management agent for multi-institutional grids’, Proc. of the 10th IEEE Symposium on HPDC.
Giersch, A., Robert, Y. and Vivien, F. (2004) ‘Scheduling tasks sharing files on heterogeneous master-slave platforms’, 12th Euromicro Conference on Parallel, Distributed and Network Based Processing (PDP’2004), A Coruña, Spain.
Hotovy, S. (1996) ‘Workload evolution on the Cornell Theory Center IBM SP2’, Job Scheduling Strategies for Parallel Proc. Workshop, IPPS’96, pp.27–40.
Kwok, Y., Song, S. and Hwang, K. (2005) ‘Selfish grid computing: game-theoretic modeling and NAS performance results’, Proceedings of CCGrid 2005, Cardiff, UK.
Linderoth, L. and Wright, S.J. (2003) ‘Decomposition algorithms for stochastic programming on a computational grid’, Computational Optimization and Applications, Vol. 24, pp.207–250.
Maheswaran, M., Ali, S., Siegel, H.J., Hensgen, D. and Freund, R.F. (1999) ‘Dynamic mapping of a class of independent tasks onto heterogeneous computing systems’, J. of Parallel and Distr. Comp., Vol. 59, No. 2, pp.107–131.
Newman, H.B., Ellisman, M.H. and Orcutt, J.A. (2003) ‘Data-intensive e-science frontier research’, Communications of ACM, Vol. 46, No. 11, pp.68–77.
Ritchie, G. (2003) ‘Static multi-processor scheduling with ant colony optimisation & local search’, Master’s thesis, School of Informatics, University of Edinburgh.
Ritchie, G. and Levine, J. (2003) ‘A fast, effective local search for scheduling independent jobs in heterogeneous computing environments’, Technical Report, Centre for Intelligent Systems and their Applications, School of Informatics, University of Edinburgh.
Ross, K. and Bambos, N. (2007) ‘Adaptive batch scheduling for packet switching with delays’, in Elhanany and Hamdi (Eds.) High-performance Packet Switching Architectures, Springer.
Song, S., Kwok, Y. and Hwang, K. (2005) ‘Security-driven heuristics and a fast genetic algorithm for trusted grid computing’, Proceedings of IPDPS 2005, Denver, Colorado, 4–8 April.
The Hebrew University Parallel Systems Laboratory (2006) ‘Parallel workload archive’, http://www.cs.huji.ac.il/labs/parallel/workload/ (as of September 2006).
Wright, S.J. (2001) ‘Solving optimization problems on computational grids’, Optima, Vol. 65.
Wu, M. and Shu, W. (2001) ‘A high-performance mapping algorithm for heterogeneous computing systems’, Proceedings of the 15th International Parallel & Distributed Processing Symposium, p.74.
Xhafa, F. (2006) ‘An experimental study on GA replacement operators for scheduling on grids’, The 2nd International Conference on Bioinspired Optimization Methods and Their Applications (BIOMA 2006), Ljubljana, Slovenia, 9–10 October, pp.121–130.
Note
1 Here, ‘short’ and ‘long’ refer to small and large ETC values, respectively.