SCHOOL OF INFORMATION TECHNOLOGIES
SIMULTANEOUS JOB AND DATA ALLOCATION IN GRID ENVIRONMENTS TECHNICAL REPORT 671
JAVID TAHERI, YOUNG CHOON LEE AND ALBERT Y. ZOMAYA
MARCH, 2011
Simultaneous Job and Data Allocation in Grid Environments
Javid Taheri, Young Choon Lee, and Albert Y. Zomaya
Centre for Distributed and High Performance Computing
School of Information Technologies
The University of Sydney, Sydney, NSW 2006, Australia
Emails: [javid.taheri, [email protected]], [[email protected]]
Abstract
This paper presents a novel competition-based approach, called BestMap, to simultaneously schedule jobs and assign data files to the different entities of a Grid system. Using two independent but collaborating mechanisms, the schedulers of the system allocate jobs to computational nodes to minimize the overall makespan, and assign data files to storage nodes to minimize the overall delivery time of all data files to their dependent jobs. The performance of BestMap has been measured using several benchmarks ranging from small- to large-sized systems, and its results are compared against those of other algorithms to show its superiority in many working scenarios. The results also provide valuable insights into scheduling and disseminating dependent jobs and data files, as well as their performance-related issues, for various Grid environments.
Keywords Data file migration policies, Grid environments, Scheduling, Task Scheduling.
1 Introduction
Grid computing has emerged as an essential technology that enables the effective exploitation of diverse distributed computing resources to deal with large-scale and resource-intensive applications, particularly in science and engineering. A grid typically consists of a large number of heterogeneous resources spanning multiple administrative domains, and the effective coordination of these resources plays a vital role in achieving performance objectives. Grids can be broadly classified into two main categories (computational and data) based on their application focus. In recent years, the distinction between these two classes has become increasingly blurred due to the ever-increasing data processing demand in many scientific, engineering, and business applications, such as drug discovery, economic forecasting, seismic analysis, back-office data processing in support of e-commerce, and Web services [1]. In a typical scientific environment such as High-Energy Physics (HEP), hundreds of end-users may individually or collectively submit thousands of jobs to access petabytes of distributed HEP data. Given the large number of sub-jobs/tasks resulting from splitting these bulk-submitted jobs and the amount of data they use, their optimal scheduling, along with allocating the necessary data files, becomes the most serious bottleneck in grids, where jobs compete for scarce compute and storage resources among the available nodes. Compact Muon Solenoid (CMS) data analysis is a well-known case study for such applications and is used as a motivation for the design of many systems, including the algorithm in this article. In CMS analysis [2], versions of event feature extraction algorithms and event selection
functions are iteratively refined until their effects are well understood; "Run this version of the system to identify Higgs events" and "Create a plot of particular parameters that have been selected to determine the characteristics of this version" are two examples of such analyses [2]. CMS analyses are usually submitted as hundreds or thousands of parallel jobs to access many shared data files. In such systems, each job is an acyclic data flow of tens of thousands of sub-jobs/tasks in which CMS executables (with runtimes of seconds up to hours) must be started and run in parallel [3]. The data flow inside a job is known to the grid schedulers and its execution services so that they can correctly schedule and sequence sub-job/task executions and data movements within a job. In this case, the grid scheduler unpacks every received job before passing it to the grid-wide execution services. Although sub-jobs/tasks can be multithreaded, they are always executed on a single CPU. Also, because sub-jobs/tasks do not directly communicate with each other through an inter-process communication layer (such as MPI), their communicating data is passed asynchronously through datasets. Thus, each sub-job/task needs one or more datasets as its input, and will create or update at least one dataset to store its output. The bulk of the CMS job output remains inside the grid, either as a new or an updated dataset. One or more sub-jobs/tasks in a CMS grid job may also deliver outputs (normally in the form of data files) directly to the user who started the job; in this case, output delivery is asynchronous and should be supported by a grid service. Table 1 shows the typical number of jobs from users and their computation- and data-related requirements for CMS jobs [4].

Many approaches based on greedy algorithms are designed to selfishly submit a job or allocate/replicate a data file to the best available resource without considering the consequent global costs. Grid schedulers are therefore mainly divided into two types: (1) job-oriented and (2) file-oriented systems. In job-oriented systems, data files are fixed and jobs are scheduled, usually with respect to a few objectives, such as the power-aware scheduling in [2, 5], or inspired by a heuristic such as game theory in [6]. In this case, the aim is to schedule jobs inside a batch-of-jobs (bulk) among Computational Nodes (CNs) to minimise the overall makespan of the whole system. The speed and number of available computing resources in different CNs and the network capacity among CNs and Storage Nodes (SNs) are typical considerations taken into account in such systems. In file-oriented systems, on the other hand, jobs are fixed and data files are moved around the system so that their accessibility to their dependent jobs is increased. As a result, jobs need less time to download their associated data files before executing, and therefore the total makespan of the system is again reduced. The available storage in SNs and the capacity of the interconnecting network links between CNs and SNs are typical considerations in such allocations. From a practical point of view, neither of these two systems is adequate to deal with cases in which both computational jobs and data files are equally influential factors for efficient system utilisation.
In fact, these algorithms share two main drawbacks: (1) they do not handle the frequency of jobs so as to treat clusters of jobs as atomic units, as required in bulk job scheduling, and (2) they do not consider network characteristics when scheduling extremely data-intensive or computationally-intensive jobs. Therefore, inappropriate distribution of resources, large queues, reduced performance, and throughput degradation for the remainder of the jobs are some of the drawbacks of such algorithms.

Table 1: Typical job characteristics in CMS
Number of simultaneously active users                 100-1000
Number of jobs submitted per day                      250-10000
Number of jobs being processed in parallel            50-1000
Job turnaround time for tiny jobs                     30s-60s
Job turnaround time for huge jobs                     0.2s-300s
Number of datasets that serve as input to a sub-job   0-10
Average number of datasets accessed by a job          250K-10000K
Average size of the dataset accessed by a job         30GB-3TB
In order to solve these problems, the European Data Grid (EDG) project created a resource broker under its workload management system, based on an extended and derived version of Condor [7]. The problem of bulk scheduling has also been addressed through shared sandboxes in the most recent versions of gLite from the EGEE project [8]. Nevertheless, these approaches only consider priority and policy controls rather than addressing the whole co-allocation and co-scheduling issue for bulk jobs. In another approach for data-intensive applications, data transfer time was considered in the process of scheduling jobs [9]; this deadline-based scheduling approach, however, could not be extended to cover bulk scheduling. In the Stork project [10], data placement activities in grids were considered as important as computational jobs; therefore, data-intensive jobs were automatically queued, scheduled, monitored, managed, and even check-pointed in this system. Condor and Stork were also combined to handle both job and data-file scheduling to cover a number of scheduling scenarios/policies; this approach also lacks the ability to cover bulk scheduling. In another approach [11], jobs and data files are linked together by binding CNs and SNs into I/O communities. The communities then participate in the wide-area system, where the ClassAd framework is used to express relationships among the stakeholders in such communities. This approach, however, does not consider policy issues in its optimisation procedure; therefore, although it covers co-allocation and co-scheduling, it cannot deal with bulk scheduling and its related management issues such as reservation, priority, and policy. The approach presented in [12] defines an execution framework that links CPUs and data resources in grids for executing applications that require access to specific datasets; similar to Stork, bulk scheduling is also left uncovered in this approach. In more complete works, such as the Maui Cluster Scheduler in [13], all jobs are queued and scheduled based on their calculated priority. In this approach, which is only applicable to local environments, weights are assigned based on various objectives to manipulate priorities in scheduling decisions. The data-aware approach of the MyGrid project [14] schedules jobs close to the data files they require. However, this traditional approach is not always cost effective, as available bandwidths are increasing rapidly, making it less compelling for transferring large data files; it also results in long job queues and adds undesired load on sites when several jobs could instead be moved to other, less loaded sites. The GridWay scheduler [15] provides dynamic scheduling and opportunistic migration, although its information collection and propagation mechanism is not very robust; furthermore, it has not yet been exposed to bulk scheduling of jobs. The Gang scheduling approach [16] provides some form of bulk scheduling by allocating similar jobs to a single location; it is specifically tailored toward parallel applications running on a cluster. XSufferage was designed as an extension to the well-known Sufferage scheduling algorithm to consider the location of data files during the scheduling process [17]. This algorithm, however, only uses such information for better scheduling of jobs, not to (re)allocate/replicate the data files.
Data Intensive and Network Aware (DIANA) scheduling [2] is the most complete approach for simultaneous job and data-file scheduling. In this approach, jobs are first assessed to determine their execution class. Data-intensive jobs are migrated to the best available CN with the minimum access (download) time to their required data files; for computationally-intensive jobs, on the other hand, data files are migrated/replicated to the best available SN with the minimum access (upload) time to their dependent jobs. In both cases, the decision is made based on: (1) the capacity of SNs, (2) the speed and number of computers in CNs, and (3) the interconnecting network links among SNs and CNs. In this article, another complete approach is presented that also simultaneously assigns jobs and data files to CNs and SNs so that the overall execution time of a batch of jobs as well as the total
delivery time of all data files to their dependent jobs are minimized. Toward this end, Section 2 presents the framework assumed in this article. Section 3 presents the problem statement. Section 4 presents BestMap, our novel proposed algorithm to solve the stated problem. Section 5 shows the performance of BestMap in comparison with other approaches. Discussion and analysis are presented in Section 6, followed by the conclusion in Section 7.
2 Framework
Grids are large-scale distributed systems that consist of: (1) CNs, (2) SNs, (3) an interconnecting network, (4) schedulers, (5) users, (6) jobs, and (7) data files (Fig. 1).

Computational Nodes: In this framework, computer centres with heterogeneous computing elements are modelled as a collection of CNs, $CN_1, CN_2, \ldots, CN_{N_{CN}}$; each CN (1) consists of several homogeneous computing elements with identical characteristics, and (2) is equipped with a local storage capability. Fig. 2 shows a sample computer centre consisting of four CNs with such storage capability. CNs are characterised by (1) their processing speed, and (2) their number of processors. Processing speed for each CN is a relative number between 1 and 10 that reflects the processing speed of a CN as compared with other CNs in the system. For example, if a sample job needs 100 seconds to run on the slowest CN in the system, it can be executed/finalised in only 10 seconds on the fastest CN instead. It is worth mentioning that the ratio of the fastest CN to the slowest one has been chosen as 10 in our framework, as suggested by other approaches too [2]; our framework is, however, very flexible and can operate with any other value. The number of processors of each CN determines its capability to execute jobs with a certain degree of parallelism. For example, jobs with 12 degrees of parallelism can only be executed on CNs with 12 or more computing elements.

Storage Nodes: SNs, $SN_1, SN_2, \ldots, SN_{N_{SN}}$, are storage elements in the system that host data files required by jobs disseminated among CNs. Two types of SNs exist in this framework: independent and dependent. Independent SNs are individual entities in the system that are only responsible for hosting data files and delivering them to requesting CNs. Dependent SNs, on the other hand, are storage capacities that are attached to CNs to host
Fig. 1. Framework
Fig. 2. A sample computer center
their local data files as well as to provide them to other CNs if requested. Although from the optimisation point of view there is no difference between the two and they are treated equally in a grid system, independent SNs usually have more capacity than dependent ones; a dependent SN, the one attached to a CN, has the highest bandwidth to its attached CN and can therefore upload data files to it almost instantly.

Interconnection Network: CNs and SNs are connected through an interconnection network comprised of individual links $L_1, L_2, \ldots, L_{N_L}$. Each link in this system has its own characteristics and is modelled using two parameters: delay and bandwidth. Delay is set based on the average waiting time for a data file to start flowing from one side of the link to the other; bandwidth is set based on the average bandwidth between the two sides of the link. Although this formulation differs from reality, in which delay and bandwidth among nodes vary significantly based on a system's traffic, our extensive simulations showed that this difference is negligible when the number of jobs and data files in a system increases. Furthermore, the simulation time is significantly decreased using the proposed simple link model, as has also been endorsed by other works, such as DIANA [2].

Schedulers: Schedulers, $Sch_1, Sch_2, \ldots, Sch_{N_{Sch}}$, are independent entities in the system that accept jobs and data files from users and schedule/assign/replicate them to the CNs and SNs under their control. Schedulers, which can be connected to all CNs/SNs or only to a subset of them, are in fact the decision makers of the whole system that decide where each job and/or data file should be executed or stored/replicated, respectively. Schedulers can be either sub-entities of CNs/SNs or individual job/data-file brokers that accept jobs and data files from users. In this work, to cover both cases, the more general case in which schedulers are treated as individual job/data-file brokers is assumed.

Users: Users, $U_1, U_2, \ldots, U_{N_U}$, generate jobs with specific characteristics. Each user is connected to only one scheduler to submit his/her jobs. Although the majority of users only use pre-existing data files in a system, they can also generate their own data files should they want to.

Jobs: Jobs, $J_1, J_2, \ldots, J_{N_J}$, are generated by users and are submitted to schedulers to be executed by CNs. Each job consists of several dependent tasks, described by a DAG, with specific characteristics such as: (1) execution time, and (2) number of processors. Execution time determines the number of seconds a particular task needs to be executed/finalised on the slowest CN in the system; the actual execution time of a task can be significantly reduced if it is assigned to a faster CN instead. Number of processors determines a task's degree of parallelism; using this factor, a scheduler excludes CNs with too few processors from executing jobs that contain tasks with a higher processor requirement. Based on the tasks' characteristics and the way they are related to each other, each job has the following characteristics: (1) width, (2) height, (3) number of processors, (4) time to execute, (5) shape, and (6) a list of required data files. Fig. 3 and Table 2 show sample jobs and their characteristics. Width of a job is the maximum number of tasks that can be run concurrently inside a job.
Height is the number of levels/stages a job has. Number of processors of a job is the maximum number of processors its constituent tasks need to be run. Time to execute determines the minimum time a job needs to run on a CN with a relative speed of '1' when all its tasks' dependencies are satisfied. List of required data files determines the data files that must be downloaded by a CN before executing a job. Jobs are generated with different shapes to reflect different classes of operations, as outlined by TGFF [18].
Fig. 3. Jobs' shapes: (a) Series-Parallel, (b) Homogeneous-Parallel, (c) Heterogeneous-Parallel, and (d) Single-Task

Table 2. Tasks' characteristics for the jobs in Fig. 3
Shape                    Width   Height   Num of Tasks   Num of Processors   Time to Execute
Series-Parallel          6       12       62             7                   491
Homogeneous-Parallel     7       12       53             8                   260
Heterogeneous-Parallel   9       14       65             6                   470
Single-Task              1       1        1              4                   20
These shapes are: (1) series-parallel, (2) homogeneous-parallel, (3) heterogeneous-parallel, and (4) single-task (Fig. 3 and Table 2).

Data files: Data files, $F_1, F_2, \ldots, F_{N_F}$, are characterised by: (1) their sizes and (2) the list of their dependent jobs. These characteristics are used by schedulers to replicate them on different SNs of a system so that data files can be downloaded faster. Each data file is assumed to be owned by a specific SN and can also have up to a predefined number of replicas in a system. Schedulers can only delete or move replicas; the original copies are always kept untouched.
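To make the above framework more concrete, the following Python sketch shows one possible in-memory representation of these entities; all class and field names are illustrative choices of ours rather than identifiers from the actual simulator.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ComputeNode:              # CN
    name: str
    num_processors: int         # degree of parallelism the CN can host
    speed: float                # relative speed, a value between 1 and 10
    local_storage_gb: float     # size of the dependent SN attached to this CN

@dataclass
class StorageNode:              # SN
    name: str
    capacity_gb: float
    attached_cn: Optional[str] = None   # None for independent SNs

@dataclass
class Link:
    delay_s: float              # average waiting time before data starts flowing
    bandwidth_mbit_s: float     # average bandwidth between the two ends

@dataclass
class Job:
    name: str
    exec_time_s: float          # execution time on the slowest CN (speed = 1)
    num_processors: int         # maximum degree of parallelism among its tasks
    datafiles: List[str]        # names of the data files it needs

@dataclass
class DataFile:
    name: str
    size_mb: float
    origin_sn: str              # SN holding the non-removable original copy
    dependent_jobs: List[str]   # jobs that need this file
```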
3 Problem Statement: Data Aware Job Scheduling
Data Aware Job Scheduling (DAJS) is a multi-objective optimisation problem defined as assigning jobs to CNs and data files to SNs so that the overall makespan of executing a batch of jobs, as well as the overall transfer time of all data files to their dependent jobs, is minimised. These two objectives are usually independent of each other, and in many cases even conflicting; mainly because achieving lower makespans requires scheduling jobs onto powerful CNs, whereas achieving lower transfer times requires using powerful links with higher bandwidths in a system. As a result, achieving one objective (e.g. reducing the makespan) usually results in compromising the other (e.g. reducing the total transfer time).

To formulate this problem, let $Jobs = \{J_1, J_2, \ldots, J_{N_J}\}$ be a batch of $N_J$ jobs and $Datafiles = \{F_1, F_2, \ldots, F_{N_F}\}$ be a collection of $N_F$ data files that are required to execute $Jobs$. Also let $JobSets = \{JSet_1, JSet_2, \ldots, JSet_{N_{CN}}\}$ be $N_{CN}$ partitions of job indices to be disseminated among CNs. A partition of a set is defined as a decomposition of the set into disjoint subsets whose union is the original set. For example, if $N_J = 9$ and $N_{CN} = 3$, then $JobSets = \{\{1,5,7\}, \{2,4,8,9\}, \{3,6\}\}$ means that jobs $\{J_1, J_5, J_7\}$, $\{J_2, J_4, J_8, J_9\}$, and $\{J_3, J_6\}$ are assigned to $CN_1$, $CN_2$, and $CN_3$, respectively. Similarly, let $Replications = \{RSet_1, RSet_2, \ldots, RSet_{N_{SN}}\}$ be $N_{SN}$ sets of data-file indices to be replicated on SNs. Furthermore, let $JSet_k = \{j_1, j_2, \ldots, j_{n_k}\}$ be the $n_k$ job indices assigned to be executed by $CN_k$. Thus, the total execution time of all jobs assigned to $CN_k$ and the total transfer time of all data files to these jobs are:
$$Makespan(JSet_k) = \max_{i=1}^{n_k} \left[ J_{j_i}.StartTime(CN_k) + J_{j_i}.Exec(CN_k) \right]$$

$$TransferTime(JSet_k) = \sum_{i=1}^{n_k} Transfer(J_{j_i}.Datafiles, CN_k)$$
where $J_{j_i}.StartTime(CN_k)$ is the time $J_{j_i}$ starts executing on $CN_k$, $J_{j_i}.Exec(CN_k)$ is the execution time of job $J_{j_i}$ on $CN_k$, and $Transfer(J_{j_i}.Datafiles, CN_k)$ is the total transfer time for all data files required by $J_{j_i}$ to be downloaded from their respective SNs. $J_{j_i}.StartTime(CN_k)$ and $J_{j_i}.Exec(CN_k)$ greatly depend on the capacity of $CN_k$ and can therefore significantly affect the performance of the whole system; for example, the execution of $J_{j_i}$ can be greatly delayed if its required resources, such as the number of free processors, are not available, and/or the speeds of the available free processors are very low. $Transfer(J_{j_i}.Datafiles, CN_k)$, on the other hand, greatly depends on the quality of the links between $CN_k$ and the SNs hosting $J_{j_i}.Datafiles$, and can therefore also affect the performance of the system; for example, it can be significantly increased if $CN_k$ is connected via slow links to the hosting SNs of $J_{j_i}.Datafiles$. In this case, DAJS is defined as finding the elements of $JobSets = \{JSet_1, JSet_2, \ldots, JSet_{N_{CN}}\}$ and $Replications = \{RSet_1, RSet_2, \ldots, RSet_{N_{SN}}\}$ that minimise the following two objective functions:
1. $\min \; \max_{k=1}^{N_{CN}} \; Makespan(JSet_k)$

2. $\min \; \sum_{k=1}^{N_{CN}} TransferTime(JSet_k)$

subject to:

1. $JSet_k.NumProc \le CN_k.NumProc, \quad k = 1, \ldots, N_{CN}$
2. $RSet_k.Size \le SN_k.Size, \quad k = 1, \ldots, N_{SN}$

where $JSet_k.NumProc$ is the maximum number of processors needed to perform the jobs addressed by $JSet_k$, $CN_k.NumProc$ is the number of processors in $CN_k$, $RSet_k.Size$ is the total size of all data files addressed by $RSet_k$, and $SN_k.Size$ is the maximum capacity of $SN_k$. Here, the first constraint guarantees that all CNs are capable of executing their assigned jobs, and the second constraint guarantees that the total size of all data files each SN hosts is less than its total capacity.
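As a small, self-contained illustration of the two objective functions, the Python sketch below evaluates a candidate $JobSets$ assignment on a hypothetical toy instance; all dictionaries, values, and the simplified download-time model are assumptions made for this example only, not part of the formal problem definition.

```python
# Hypothetical toy instance: two CNs, three jobs, two data files.
exec_time  = {("J1", "CN1"): 50, ("J1", "CN2"): 10,   # seconds on each CN
              ("J2", "CN1"): 30, ("J2", "CN2"): 6,
              ("J3", "CN1"): 20, ("J3", "CN2"): 4}
start_time = {("J1", "CN1"): 0,  ("J1", "CN2"): 0,    # earliest start reported by the local scheduler
              ("J2", "CN1"): 0,  ("J2", "CN2"): 10,
              ("J3", "CN1"): 30, ("J3", "CN2"): 16}
job_files  = {"J1": ["F1"], "J2": ["F1", "F2"], "J3": []}
download   = {("F1", "CN1"): 8, ("F1", "CN2"): 3,     # download time from the closest replica
              ("F2", "CN1"): 5, ("F2", "CN2"): 12}

def makespan(job_set, cn):
    """Makespan(JSet_k): latest finish time among the jobs assigned to one CN."""
    return max((start_time[j, cn] + exec_time[j, cn] for j in job_set), default=0)

def transfer_time(job_set, cn):
    """TransferTime(JSet_k): total download time of the data files its jobs need."""
    return sum(download[f, cn] for j in job_set for f in job_files[j])

def evaluate(job_sets):
    """Return (objective 1, objective 2) for a JobSets-style assignment."""
    obj1 = max(makespan(js, cn) for cn, js in job_sets.items())
    obj2 = sum(transfer_time(js, cn) for cn, js in job_sets.items())
    return obj1, obj2

print(evaluate({"CN1": ["J1"], "CN2": ["J2", "J3"]}))  # -> (50, 23)
```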
4 BestMap Algorithm in Solving the DAJS Problem
BestMap is an iterative competition-based approach in which jobs and data files are scheduled and replicated to CNs and SNs, respectively. The following two rounds of mechanisms are performed in each scheduler's optimization cycle/iteration: (1) schedule jobs to CNs, and (2) replicate data files to SNs. During the first round, data files are assumed fixed and jobs are scheduled to minimize the overall makespan of the system; in the second round, jobs are assumed disseminated and data files are replicated to provide the fastest uploading times to their dependent jobs. Fig. 4 shows the overall flowchart of our algorithm, BestMap. Here, the scheduler collects the execution status of the CNs of a system to find which CN is able to execute a job faster; after that, it starts replicating data files to provide the fastest uploading times to jobs. This procedure is repeated for several iterations until either no better configuration is found or a maximum number of iterations is reached.
Fig. 4. BestMap’s flowchart
4.1 Scheduler: Schedule Jobs
In this procedure, data files are assumed fully distributed/replicated and each scheduler aims to efficiently schedule its jobs to the CNs under its control so that their overall makespan is minimized. Here, each scheduler asks each CN to provide the earliest finish time at which it can perform a given job. Upon receiving all finish times, the scheduler decides which CN should perform the job. This procedure is finalized once all jobs are scheduled among CNs, and its details are as follows.
Step 1: For each scheduler, $Sch_i$
Step 2:   Sort the jobs inside $Sch_i$ based on a given criterion
Step 3:   For each job in $Sch_i$, namely $J_j$
Step 4:     For each CN, $CN_k$
Step 5:       Find the earliest finish time at which $CN_k$ can perform $J_j$; store these times in a vector named $AllFinishTimes$
Step 6:     Find the minimum finish time in $AllFinishTimes$
Step 7:     Assign $J_j$ to the CN that provides the minimum finish time
Step 8: Repeat Steps 1-7 until all jobs from all schedulers are disseminated
In Step 2, jobs are sorted based on a criterion, such as Longest Jobs First, before allocation. In Step 5, only CNs that are able to perform $J_j$ are asked to report their finish times; i.e., CNs with too few parallel computing elements to satisfy $J_j$'s degree of parallelism are excluded.
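The sketch below mirrors Steps 1-8 under simplifying assumptions: a single scheduler, a Longest Jobs First sorting criterion, and a finish_time(job, cn) callable standing in for the CN-side calculation of Section 4.3; the function and field names are our own, not the simulator's.

```python
def schedule_jobs(jobs, cns, finish_time):
    """Greedy earliest-finish-time scheduling of a batch of jobs (Steps 1-8).

    jobs        : list of dicts with 'name', 'exec_time', and 'num_processors'
    cns         : list of dicts with 'name' and 'num_processors'
    finish_time : callable(job, cn) -> earliest finish time of job on cn
    """
    assignment = {}
    # Step 2: sort the jobs by a criterion, here Longest Jobs First.
    for job in sorted(jobs, key=lambda j: j["exec_time"], reverse=True):
        # Step 5: only CNs with enough parallel computing elements are asked to bid;
        # we assume every job has at least one eligible CN.
        eligible = [cn for cn in cns if cn["num_processors"] >= job["num_processors"]]
        # Steps 5-7: collect the finish times and assign the job to the minimum one.
        best_cn = min(eligible, key=lambda cn: finish_time(job, cn))
        assignment[job["name"]] = best_cn["name"]
    return assignment
```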
4.2 Scheduler: Replicate Datafiles
In contrast to the previous procedure, jobs are assumed fully disseminated in this procedure and schedulers try to replicate data files to their associated SNs. Here, each scheduler asks every SN under its control to report the total upload time of a given data file to all its dependent jobs should the data file be replicated there. In this procedure, each SN assumes that it will hold the sole replica of the data file and calculates only its own uploading time, regardless of other replicas. Upon collecting such uploading times, each scheduler chooses the SNs that must hold replicas of the data file. This procedure is finalized once either all data files are replicated or no capacity is left on the SNs to replicate further data files. Its details are as follows.
Step 1: Sort data files based on a given criterion
Step 2: For each data file, $F_i$
Step 3:   For each SN, $SN_k$
Step 4:     Find the total upload time of $F_i$ to all its dependent jobs if it is replicated on $SN_k$
Step 5:     Store these uploading times in an array called $AllUploadTimes$
Step 6:   Sort $AllUploadTimes$ in ascending order
Step 7:   For $k = 1$ to $MaxNumReplicas$
Step 8:     If $AllUploadTimes(k) / MinUploadTime(F_i) < 2$ then replicate $F_i$ onto $SN_k$
Step 9:   Next $k$
Step 10: Repeat Steps 2-9 until all data files are replicated
In the above procedure, Step 1 sorts data files with respect to a criterion, such as Largest Datafiles First, to prioritise the replication of data files with respect to each other. In Step 7, $MaxNumReplicas$ is the maximum number of replicas each data file can have in a system; i.e., up to $MaxNumReplicas + 1$ instances of each data file can exist in a system: one non-removable original copy that is hosted by a specific SN, and up to $MaxNumReplicas$ removable replicas on other SNs. Step 8 prevents replicating data files to SNs that probably will not help in reducing the total transfer time of a data file to its dependent jobs; $MinUploadTime(F_i)$ returns the minimum uploading time of $F_i$ if it were replicated on any SN.
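A corresponding sketch of the replication round (Steps 1-10) is shown below; here upload_time(f, sn) stands in for the SN-side calculation of Section 4.4, and the dictionary-based data model, the names, and the default of two replicas are assumptions of this example only.

```python
def replicate_datafiles(datafiles, sns, upload_time, max_num_replicas=2):
    """Greedy replication of data files onto SNs (Steps 1-10).

    datafiles   : list of dicts with 'name' and 'size'
    sns         : list of dicts with 'name' and remaining 'capacity'
    upload_time : callable(f, sn) -> total upload time of f to its dependent jobs
                  if f were replicated solely on sn
    """
    replicas = {}
    # Step 1: sort the data files by a criterion, here Largest Datafiles First.
    for f in sorted(datafiles, key=lambda d: d["size"], reverse=True):
        # Steps 3-5: every SN with enough free capacity reports an upload time.
        candidates = [(upload_time(f, sn), sn) for sn in sns
                      if sn["capacity"] >= f["size"]]
        if not candidates:
            continue                          # no capacity left anywhere for this file
        candidates.sort(key=lambda ts: ts[0])  # Step 6: ascending upload times
        min_upload = candidates[0][0]          # MinUploadTime(f)
        chosen = []
        # Steps 7-9: keep at most max_num_replicas SNs whose upload time is
        # within a factor of two of the best one.
        for t, sn in candidates[:max_num_replicas]:
            if min_upload == 0 or t / min_upload < 2:
                sn["capacity"] -= f["size"]
                chosen.append(sn["name"])
        replicas[f["name"]] = chosen
    return replicas
```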
4.3 Computational Nodes: Calculate Earliest Finish Time
Upon receiving a job from a scheduler, each CN calculates the earliest time it can finish the submitted job based on the status of its local scheduler. The local scheduler in each CN is the unit responsible for scheduling jobs and their constituent tasks among several homogeneous computing resources (e.g. desktop PCs) while satisfying the tasks' dependencies. The speed and number of computing resources in a CN, as well as the characteristics of the network links connecting the CN to SNs, are the other factors that determine the finish time of a given job. In this case:

$$FinishTime(J_j, CN_k) = StartTime(J_j, CN_k) + Exec(J_j, CN_k) + Transfer(J_j.Datafiles, CN_k)$$

where $StartTime(J_j, CN_k)$ denotes the earliest time $CN_k$ can start executing $J_j$, $Exec(J_j, CN_k)$ denotes the execution time of all tasks inside $J_j$ on $CN_k$ while satisfying their interdependencies, and $Transfer(J_j.Datafiles, CN_k)$ denotes the total transfer time of all data files required by $J_j$ before executing it on $CN_k$. It is worth mentioning that if more than one
job in a CN needs the same data file, the requested data file is downloaded only once and stored in a local depository to be used for further requests. As a result, as CNs perform more jobs inside a bag-of-jobs, their depositories become richer, and therefore $Transfer(J_j.Datafiles, CN_k)$ for many jobs can be significantly less than their original transfer times.
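A minimal sketch of this CN-side calculation, including the local depository that avoids re-downloading files already fetched for earlier jobs, could look as follows; the callable helpers and names are illustrative assumptions, not the simulator's interfaces.

```python
def earliest_finish_time(job, cn, start_time, exec_time, download_time, depository):
    """FinishTime(J, CN) = StartTime(J, CN) + Exec(J, CN) + Transfer(J.Datafiles, CN).

    start_time, exec_time : callable(job, cn) -> seconds, from the CN's local scheduler
    download_time         : callable(f, cn) -> time to fetch file f from its best replica
    depository            : set of file names already cached locally at this CN
    """
    # Files already held in the CN's depository are served locally at no extra cost.
    transfer = sum(download_time(f, cn) for f in job["datafiles"] if f not in depository)
    return start_time(job, cn) + exec_time(job, cn) + transfer
```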
4.4 Storage Nodes: Calculate Uploading Time
Upon receiving a data file from a scheduler, each SN with enough space to replicate it calculates the total uploading time of the given data file to its dependent jobs. Here, each SN assumes that it holds the only instance of the data file and calculates its delivery time; such a calculation is based purely on its own link characteristics to the CNs that are responsible for executing the jobs dependent on the given data file, and is performed as follows:

$$TotalUploadTime(F_i, SN_k) = \sum_{j=1}^{N_{F_i}} Transfer(F_i, SN_k, J_j.CN) + Transfer(F_i.SN_{org}, SN_k)$$

where $N_{F_i}$ denotes the total number of jobs that need $F_i$ to execute, $J_j.CN$ denotes the CN that is responsible for executing $J_j$, $Transfer(F_i, SN_k, J_j.CN)$ denotes the transfer time of $F_i$ between $SN_k$ and $J_j.CN$, $F_i.SN_{org}$ addresses the SN that hosts the original instance of $F_i$, and $Transfer(F_i.SN_{org}, SN_k)$ determines the total time needed to transfer $F_i$ from its original SN to be replicated on $SN_k$.
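Under the same simplified model, the SN-side calculation above can be sketched as follows; transfer(f, src, dst) is an assumed helper returning the link delay plus size over bandwidth between the two nodes, and all names are illustrative.

```python
def total_upload_time(f, sn, job_cn, transfer):
    """TotalUploadTime(F, SN): serve every dependent job of F from this single replica,
    plus the one-off cost of copying F from its original SN onto this SN.

    f        : dict with 'name', 'origin_sn', and 'dependent_jobs'
    job_cn   : dict mapping each job name to the CN scheduled to execute it
    transfer : callable(f, src, dst) -> transfer time of f between the two nodes
    """
    upload = sum(transfer(f, sn, job_cn[j]) for j in f["dependent_jobs"])
    return upload + transfer(f, f["origin_sn"], sn)
```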
5 Simulation
To evaluate the performance of the proposed algorithm (BestMap) in assigning jobs to CNs and data files to SNs to achieve the lowest makespan as well as the lowest transfer time of all data files to their dependent jobs, nine test grids were generated to represent different working scenarios; they were investigated through our exclusively designed simulator. These grids are generated based on direct observations from [2] and [18], where the only exclusive algorithm in this area, DIANA, is designed for a problem similar to ours. Tables 3 and 4 show the general shape and detailed characteristics of these systems, respectively. Here, all jobs and data files are generated by an arbitrary number of 100 users. To gauge the performance of our approach, it is compared against two other methods with the same objectives (DIANA and FLOP) as well as two greedy algorithms (MinTrans and MinExe). In summary, DIANA categorizes the submitted jobs as either computationally intensive or data intensive. It migrates a computationally-intensive job to the CN with the lowest execution time, regardless of its data files' download times; for a data-intensive job, on the contrary, DIANA either migrates the job to the CN with the fastest data-file download time, or replicates the data files to SNs with faster upload times. In other words, DIANA migrates jobs in the data-intensive case and replicates data files in the computationally-intensive case. FLOP targets CNs that can start executing jobs straight away; i.e., it always migrates jobs to the CNs that can start executing them sooner than others. Although FLOP does not initially consider data files' download times during its scheduling process, it always replicates data files upon scheduling jobs to provide the fastest upload times to them. MinTrans and MinExe are two extreme algorithms that each try to minimise only one of the aforementioned objectives (either transfer time or makespan) of the stated DAJS problem. MinTrans focuses solely on minimizing the overall data-file transfer time and replicates data files so that they provide the lowest upload times to jobs. On the contrary, MinExe's sole focus is on
Table 3. Overall system characteristics
Entity           Attribute                        Mean    StDev   Min     Max
CN               Number of Processors             32      32      8       128
CN               Processors' Speed                4       4       1       10
CN               Attached Storage Size (GByte)    5       5       1       20
SN               Storage Size (GByte)             100     100     20      500
Users            Job Generation                   100     100     10      1000
Datafile         Size (MByte)                     50      50      1       200
Network Links    Link Latency (sec)               0.2     0.2     0       1
Network Links    Bandwidth (Mbit/sec)             1.024   1.024   0.128   12.8
Jobs: Tasks      Execution Time (sec)             30      30      10      1000
Jobs: Tasks      Processors' Dependency           2       2       1       10
Jobs: Shape      Series-Parallel (30%), Homogeneous-Parallel (30%), Heterogeneous-Parallel (20%), Single-Task (20%)
Jobs: Shape      Width                            4       4       1       10
Jobs: Shape      Height                           10      4       1       15
Jobs: Datafiles  Datafiles' Dependency            5       5       0       20
Table 4. Detailed characteristics of the tailor-made grids
Test grid name            Num of CNs   Num of SNs   Num of Prcs [in CNs]   Num of Jobs   Num of Tasks [in Jobs]   Num of Datafiles
Test-Grid-Small-5-2       5            7            136                    1066          24510                    84
Test-Grid-Small-5-5       5            10           120                    1059          24983                    128
Test-Grid-Small-5-10      5            15           176                    1049          24079                    248
Test-Grid-Medium-10-5     10           15           328                    1052          24790                    300
Test-Grid-Medium-10-10    10           20           304                    1047          25567                    379
Test-Grid-Medium-10-20    10           30           288                    1043          22543                    572
Test-Grid-Large-20-10     20           30           680                    1015          24540                    456
Test-Grid-Large-20-20     20           40           712                    1028          25327                    765
Test-Grid-Large-20-40     20           60           656                    1072          25265                    910
minimizing a system's makespan: it schedules jobs to achieve the lowest possible makespan. Achieving a lower makespan and a lower transfer time are the second priority for MinTrans and MinExe, respectively. MinTrans' transfer time and MinExe's makespan/resource utilisation are used as benchmarks to measure the uploading and computational performance of the other approaches, mainly because they probably provide the lowest transfer time and makespan achievable for each test grid. Although many other methods exist to solely schedule jobs or solely replicate data files in a grid environment, only the aforementioned methods (DIANA and FLOP) simultaneously consider both objectives in their optimization procedures, and they were therefore selected to gauge the performance of our novel algorithm in this work. MinTrans and MinExe are only listed here to show the best that can be achieved if only one objective is targeted. Note that both of these problems, i.e. scheduling and replication, are NP-complete and no optimal solution can be found for them in polynomial time.
5.1 Test-Grid-Small-X-Y
These test grids represent small-sized grids with a very limited number of CNs and SNs; here, X and Y represent the total number of CNs and independent SNs in a system, respectively (the total number of SNs in the system is therefore X+Y). Fig. 5 shows the overall shape of Test-Grid-Small-5-2; the others are not shown here as their structures are very similar to the presented one. Test-Grid-Small-5-2 and -5-10 represent small-sized grids where jobs are usually computationally-intensive and
Fig. 5. Test-Grid-Small-5-2
data-intensive, respectively. Test-Grid-Small-5-5 represents a grid containing a balanced number of jobs from both categories. Table 4 provides more information on these test grids. Table 5 and Fig. 6 reflect the performance of the different algorithms in scheduling jobs and replicating data files for these systems. In this figure, schedulers sort their jobs based on LJF (Longest Jobs First) and prioritise their data files based on their sizes; the number of replicas for these systems is set to '2' and '3'. In other words, schedulers try to schedule their bulkier jobs and data files first, and each data file is allowed to have at most '2' or '3' replicas in a system. The measuring criteria are: (1) Makespan, (2) Transfer-Time, and (3) Resource-Utilisation. Makespan reflects the latest time by which all CNs finish their allocated/scheduled jobs. Transfer-Time represents the total amount of upload time to deliver all data files to their requesting jobs. Resource-Utilisation represents how well CNs are deployed.

Table 5. Results for Test-Grid-Small-X-Y
                                       BestMap   DIANA    FLOP     MinTrans   MinExe
Test-Grid-Small-5-2
  Makespan      Rep=2                  5601      8503     6025     15619      5602
                Rep=3                  5604      8503     6025     14961      5602
  Trans-Time    Rep=2                  850       780      740      150        920
                Rep=3                  750       580      690      150        760
  Res-Util      Rep=2                  95.6%     59.7%    90.2%    26.3%      95.6%
                Rep=3                  95.5%     59.7%    90.2%    25.5%      95.6%
Test-Grid-Small-5-5
  Makespan      Rep=2                  3724      4519     5594     8746       3721
                Rep=3                  3722      4519     5594     8746       3721
  Trans-Time    Rep=2                  3180      3050     2860     0          3280
                Rep=3                  3010      2860     2820     0          3330
  Res-Util      Rep=2                  96.4%     80.2%    66.1%    51.2%      96.4%
                Rep=3                  96.3%     80.2%    66.1%    51.2%      96.4%
Test-Grid-Small-5-10
  Makespan      Rep=2                  2304      3887     4827     4516       2306
                Rep=3                  2305      3887     4827     4164       2306
  Trans-Time    Rep=2                  2000      1880     1520     60         1960
                Rep=3                  1420      1440     990      50         1640
  Res-Util      Rep=2                  96.1%     57.2%    49.3%    40.0%      96.0%
                Rep=3                  96.1%     57.7%    49.3%    43.5%      96.0%
Fig. 6. Makespan plus transfer time for Test-Grid-Small-X-Y: (a) 5-2, (b) 5-5, (c) 5-10 with '2' replicas; (d) 5-2, (e) 5-5, (f) 5-10 with '3' replicas
The following example shows how the Resource-Utilisation criterion is calculated for the sample heterogeneous environment in Table 6.

Table 6. A sample calculation for Resource Utilisation
Item   Processors   Makespan   Num of Used Processors up to Time = Makespan   All Available Processors
CN1    16           95         1520                                           16 x 105 = 1680
CN2    8            100        800                                            8 x 105 = 840
CN3    4            105        420                                            4 x 105 = 420

$$ResUtil = \frac{1520 + 800 + 420}{1680 + 840 + 420} = 93.2\%$$
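The same calculation can be reproduced in a few lines of Python, with the numbers taken directly from Table 6.

```python
# Processors, per-CN finish time, and busy processor-seconds from Table 6.
cns = {"CN1": {"procs": 16, "finish": 95,  "used": 1520},
       "CN2": {"procs": 8,  "finish": 100, "used": 800},
       "CN3": {"procs": 4,  "finish": 105, "used": 420}}

makespan  = max(c["finish"] for c in cns.values())            # 105
available = sum(c["procs"] * makespan for c in cns.values())  # 1680 + 840 + 420
used      = sum(c["used"] for c in cns.values())              # 1520 + 800 + 420
print(f"Resource utilisation: {used / available:.1%}")        # 93.2%
```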
5.2 Test-Grid-Medium-X-Y
These test grids represent medium-sized grids with a moderate number of CNs and SNs. Table 7 and Fig. 7 show the performance of the different algorithms on these systems.
5.3 Test-Grid-Large-X-Y
These test grids represent large-sized grids with a large number of CNs and SNs. Table 8 and Fig. 8 show the performance of the different algorithms on these systems.
6 Discussion and Analysis
Figs. 6-8 and Tables 5, 7, and 8 show the results of the applied algorithms in minimizing the overall makespan as well as the transfer time of the whole system for the tailor-made systems in Table 4.

Table 7. Results for Test-Grid-Medium-X-Y
                                       BestMap   DIANA    FLOP     MinTrans   MinExe
Test-Grid-Medium-10-5
  Makespan      Rep=2                  1271      1980     2273     4237       1274
                Rep=3                  1273      2224     2273     4082       1274
  Trans-Time    Rep=2                  942       836      841      27         1029
                Rep=3                  805       703      767      37         880
  Res-Util      Rep=2                  95.3%     58.9%    61.4%    25.3%      95.0%
                Rep=3                  95.2%     52.7%    61.4%    29.4%      95.0%
Test-Grid-Medium-10-10
  Makespan      Rep=2                  1225      2104     3199     8019       1225
                Rep=3                  1224      2104     3199     5069       1225
  Trans-Time    Rep=2                  3760      3000     3715     110        4390
                Rep=3                  3765      3445     3835     105        4080
  Res-Util      Rep=2                  95.4%     54.0%    38.8%    16.8%      95.4%
                Rep=3                  95.5%     54.0%    38.8%    24.0%      95.4%
Test-Grid-Medium-10-20
  Makespan      Rep=2                  1735      2583     4441     4701       1733
                Rep=3                  1731      2672     4441     3826       1733
  Trans-Time    Rep=2                  2880      2880     2481     600        2676
                Rep=3                  2346      2500     2382     476        2444
  Res-Util      Rep=2                  94.8%     61.9%    40.1%    33.4%      95.0%
                Rep=3                  95.1%     62.7%    40.1%    40.0%      95.0%
Fig. 7. Makespan plus transfer time for Test-Grid-Medium-X-Y: (a) 10-5, (b) 10-10, (c) 10-20 with '2' replicas; (d) 10-5, (e) 10-10, (f) 10-20 with '3' replicas
The following sections explain more about different aspects of these results.
6.1 Makespan
This criterion, which represents the time at which the latest CN in the system finalizes its assigned jobs, is one of the two objectives that must be minimized in a system. Figs. 6-8(a,d) show the overall makespans of the different systems using the aforementioned techniques. For all test grids, varying from small to large, BestMap achieves the closest makespan to MinExe, which is specifically designed only to perform all jobs faster and thus achieves the lowest makespan. This in fact vouches for BestMap's excellent performance in efficiently executing both classes of jobs, computationally and data intensive, in a grid environment. For the other methods, the following results were observed in different scenarios.

For small-sized grids (Test-Grid-Small-X-Y), when the number of SNs is lower than the number of CNs (Fig. 6(a,d)) and the system is mainly designed to handle computationally-intensive jobs, the techniques that are more inclined to execute jobs faster (FLOP, MinExe) achieve the lowest makespans. Furthermore, the techniques that are more inclined to minimize the total transfer time of data files (DIANA and MinTrans) achieve higher makespans, especially MinTrans, whose makespan is almost three times that of MinExe.

Table 8. Results for Test-Grid-Large-X-Y
                                       BestMap   DIANA    FLOP     MinTrans   MinExe
Test-Grid-Large-20-10
  Makespan      Rep=2                  743       1564     2861     4677       751
                Rep=3                  748       1198     2861     3069       751
  Trans-Time    Rep=2                  1141      997      940      148        1104
                Rep=3                  1071      1139     1214     198        1441
  Res-Util      Rep=2                  91.8%     48.1%    29.2%    28.2%      91.8%
                Rep=3                  92.1%     41.2%    29.2%    30.3%      91.8%
Test-Grid-Large-20-20
  Makespan      Rep=2                  868       1654     1982     3982       870
                Rep=3                  870       1926     1982     3839       870
  Trans-Time    Rep=2                  727       640      598      55         573
                Rep=3                  755       752      825      60         770
  Res-Util      Rep=2                  89.9%     44.8%    50.8%    21.0%      89.6%
                Rep=3                  89.6%     38.7%    50.8%    21.9%      89.6%
Test-Grid-Large-20-40
  Makespan      Rep=2                  1037      1722     4632     3032       1037
                Rep=3                  1034      2116     4632     3775       1037
  Trans-Time    Rep=2                  729       604      534      132        582
                Rep=3                  1007      928      999      129        1090
  Res-Util      Rep=2                  91.8%     48.1%    29.2%    28.2%      91.8%
                Rep=3                  92.1%     41.2%    29.2%    30.3%      91.8%
Fig. 8. Makespan plus transfer time for Test-Grid-Large-X-Y: (a) 20-10, (b) 20-20, (c) 20-40 with '2' replicas; (d) 20-10, (e) 20-20, (f) 20-40 with '3' replicas
Here, when the number of CNs and SNs is balanced (Fig. 6(b,e)), the differences between the aforementioned techniques diminish. Unlike the previous case, DIANA, which is more inclined to reduce the total transfer time of data files, achieves a better makespan than FLOP; MinExe achieves the best makespan and MinTrans the worst. When the nature of the system is shifted toward small-sized grids with more SNs than CNs (Fig. 6(c,f)), mainly emphasizing data-intensive jobs, the aforementioned techniques showed different results: DIANA showed very similar results to MinTrans and both achieved lower makespans than FLOP; as expected, MinExe still achieved the lowest makespan.

For medium-sized grids (Test-Grid-Medium-X-Y), achieving lower makespans was more challenging for the aforementioned techniques. Here, when the number of CNs was dominant (Fig. 7(a,d)), DIANA and FLOP, which emphasise two different aspects of the stated DAJS problem, achieved very similar makespans. When the numbers of CNs and SNs are balanced (Fig. 7(b,e)), FLOP's greedy approach of starting jobs at the earliest time failed and resulted in a higher makespan than DIANA's. For larger grids with more SNs (Fig. 7(c,f)), the results were more intriguing: FLOP achieved the highest makespan of all techniques, even higher than MinTrans, while DIANA still performed reasonably acceptably when compared with MinExe and BestMap. In all cases of medium-sized grids, MinExe achieved the lowest makespan, while the makespans of DIANA, FLOP, and MinTrans varied between 1.5-2 times, 2-3 times, and 3-4 times that of the benchmark (MinExe), respectively.

For large-sized grids (Test-Grid-Large-X-Y), unlike the previous cases, when the number of CNs was dominant (Fig. 8(a,d)), the difference between FLOP and DIANA became very distinct: FLOP's makespan was only slightly better than MinTrans', and DIANA's makespan was only slightly worse than MinExe's. For the same system size, but with a balanced number of CNs and SNs (Fig. 8(b,e)), the makespans of DIANA and FLOP became very similar again, both almost twice that of MinExe and half that of MinTrans. For the SN-dominant case (Fig. 8(c,f)), DIANA was able to maintain its almost reasonable makespan; whereas, similar to the medium-sized grids, FLOP performed even worse than MinTrans. In all cases of this system size, the makespans of DIANA, FLOP, and MinTrans were always between 2-2.5, 2-4.5, and 3.5-4.5 times that of the benchmark (MinExe), respectively.
6.2 Transfer Time
This criterion reflects the quality of the replication policy of each of the aforementioned techniques. It is worth mentioning that although the replication procedure is identical for all methods, their total transfer times are vastly different, as the transfer time also depends highly on the scheduling policy. Figs. 6-8 also show the transfer time for all test grids for '2' and '3' replicas. As expected, MinTrans, which is specifically designed to achieve the lowest transfer time, always achieved the lowest transfer time for all test grids, and DIANA is ranked second after it. The following details describe the results further. For the small-sized grids (Test-Grid-Small-X-Y), when the number of SNs is lower than the number of CNs (Fig. 6(a,d)), MinTrans could transfer all data files in almost one fifth of the time of the other methods; when a balanced number of SNs and CNs exists, or when the number of SNs is dominant, MinTrans could transfer all data files almost instantly, while the other methods achieved fairly similar transfer times (Fig. 6(b,c,e,f)). For the medium-sized and large-sized test grids, after MinTrans, which always transferred data files in significantly less time than the other methods, DIANA transferred data files slightly faster than the others, followed by BestMap, FLOP, and MinExe (Figs. 7-8).
6.3 Resource Utilisation
This criterion measures the microscopic behaviour of the scheduling techniques in disseminating jobs. Unlike the macroscopic criterion of makespan, which only measures the overall outcome of a system, Resource-Utilisation reflects the exact percentage of the computing units actually used during the whole process of executing a bag-of-jobs. Resource-Utilisation is directly related to the makespan of a system, where a lower makespan usually means better resource utilisation, and vice versa. For the small-sized grids (Table 5), BestMap's and MinExe's Resource-Utilisation were almost identical and significantly higher than the others'. When the number of SNs is lower than the number of CNs, FLOP also managed to utilise the system only 5%-6% lower than BestMap and MinExe; when a balanced number of SNs and CNs exists, DIANA utilised the system better than FLOP and MinTrans; and when the number of SNs is higher than the number of CNs, the Resource-Utilisation of DIANA, FLOP, and MinTrans never exceeds 60%, almost 30% lower than BestMap and MinExe. For the medium-sized grids (Table 7), where these algorithms are more challenged, BestMap and MinExe could still achieve the highest resource utilisation, approximately 95% at all times; the other methods could never utilise such systems by more than 62%. For the large-sized grids (Table 8), in which the aforementioned methods are pushed to their limits, BestMap and MinExe could maintain their performance at approximately 90% at all times, while the other methods could barely utilise such systems above 50%. In summary, BestMap and MinExe always utilise systems much better than the others. After them, FLOP utilised systems better than the rest when the number of SNs is lower than the number of CNs; DIANA marginally outperformed the rest when the number of SNs is either balanced with or higher than the number of CNs in a system.
6.4 BestMap vs. DIANA vs. FLOP vs. MinExe vs. MinTrans
Each of the aforementioned algorithms has its own pros and cons. MinExe and MinTrans, which were specifically designed to optimize/minimize only one objective of the stated DAJS problem, always outperformed the others on their focused objective. On the other objective, however, they performed significantly poorly; e.g., MinTrans' makespan was always 4-5 times greater than MinExe's, and MinExe's transfer time was always significantly greater than MinTrans'. These results in fact demonstrate the truly multi-objective nature of the stated problem; they also clearly show that optimizing/minimizing these objectives are largely independent and sometimes even conflicting goals. Figs. 6-8 also show the overall makespan plus transfer time of a bag-of-jobs in these test grids. Based on these results, BestMap, our proposed algorithm in this work, always achieved the closest makespan to MinExe, and sometimes even a slightly better one; its transfer time, however, was always better than MinExe's. As a result, BestMap always outperformed the other techniques in reducing the overall execution time plus transfer time in all test grids. FLOP, on the other hand, showed mixed performance for different test grids. For test grids with a dominant number of CNs (Figs. 6-8(a,d)) and a balanced number of SNs and CNs (Figs. 6-8(b,e)), FLOP usually performed better than MinTrans, while showing mixed results against the other methods. For test grids with a dominant number of SNs (Figs. 6-8(c,f)), FLOP always showed the poorest performance, even poorer than MinTrans and MinExe, which are focused on only one objective. DIANA also showed a mixed performance profile for different systems. For test grids with either a dominant number of CNs or a balanced number of SNs and CNs, DIANA always outperformed both FLOP and MinTrans. For test grids with a dominant number of SNs, however, it usually showed a poorer performance profile than MinTrans. This shows that although migrating jobs to data files for data-intensive jobs and data files to jobs for computationally-intensive jobs usually results in better deployment of a system, ignoring the aforementioned greedy approach and only migrating jobs to data
Fig. 9. Effect of the number of replicas for Test-Grid-Small-5-10: (a) Makespan, (b) Transfer Time, (c) Resource Utilisation
files can also result in better deployment of a system when data-intensive jobs are heavily dominant. In summary, these figures show that the greedy approach of FLOP in executing jobs as soon as possible can only be effective for small grids where computationally-intensive jobs are dominant. DIANA, with its intelligent classification of job types, is usually effective only when a balanced number of job types (computationally and data intensive) exists. For systems with a dominant number of SNs, where jobs are mostly data intensive, DIANA does not perform better than MinTrans, which only tries to minimize/optimize the transfer time. BestMap's strategy of scheduling jobs and replicating their dependent data files to adjacent SNs is probably the best strategy to concurrently reduce both equally important objectives of the stated DAJS problem. This is also somewhat confirmed by other approaches that try to group similar jobs/data files to be scheduled/replicated on the same CNs/SNs; for example, Gang Scheduling [16] also hypothesised this in scheduling its jobs.
6.5 The Effect of the Number of Replicas
For all previous simulations, data files were limited to at most '2' or '3' replicas in a system, i.e. three or four instances in total. In this section, this limitation is relaxed to study the effect of the number of replicas on the overall performance of the aforementioned techniques. Here, the three test grids that are focused on data-intensive jobs, and are therefore more sensitive to the number of replicas, are selected again and the performance of the aforementioned algorithms is measured for them. Figs. 9-11 show these results when 0-7 replicas, i.e. 1-8 instances of each data file in total, were allowed in a system. The results for the other test grids are not shown here as they were very similar to the presented ones. These figures show that the general impression that more replicas always reduce the overall transfer time and improve the utilisation of a system is not correct; in fact, in many cases, more replicas can waste precious resources in a system and sometimes result in severe under-utilisation. These figures also show that both very low and very high numbers of replicas result in system under-utilisation; therefore, setting a proper number of replicas can be as important as efficiently scheduling jobs on CNs.
6.6 Convergence Speed
Table 9 shows the convergence time for only one iteration of each of the algorithms. The actual number of iterations each algorithm needs to converge to an answer greatly depends on the problem size as well as the computer running it (always fewer than 5 iterations for our
Fig. 10. Effect of the number of replicas for Test-Grid-Medium-10-20: (a) Makespan, (b) Transfer Time, (c) Resource Utilisation
Fig. 11. Effect of the number of replicas for Test-Grid-Large-20-40: (a) Makespan, (b) Transfer Time, (c) Resource Utilisation
tailor-made grids). Nevertheless, just to provide an overview of their convergence speeds, these algorithms were run on an ordinary dual-core 2.1 GHz desktop with 4 GB of RAM and their results are reported here. Table 9 clearly shows that for small-sized grids, MinExe and MinTrans, with their smaller amount of computation, usually converge faster than the others; for medium-sized grids, BestMap finalizes marginally faster than the others and DIANA is ranked the slowest; for large-sized grids, BestMap maintains its lead and performs faster than the others, while MinTrans and FLOP are the slowest techniques. In summary, BestMap was always either the fastest or the marginal runner-up; DIANA was usually relatively fast too, with occasional slowdowns in some cases; FLOP was relatively fast for small-sized grids and slow for medium- and large-sized grids; MinTrans was only fast for small-sized grids and significantly slow otherwise; and MinExe, with no data replication policy, was either the fastest for small-sized grids or a marginal runner-up otherwise. These results also emphasize the fact that replication policies are probably more time consuming than job scheduling for medium- to large-sized grids. BestMap, however, with a reasonable amount of calculation for data replication, was always among the fastest in all cases, as its job scheduling policy always had a positive effect in simplifying the decision making
Table 9. Convergence times [in seconds] of the algorithms for the tailor-made grids
Test grid name            BestMap   DIANA   FLOP   MinTrans   MinExe
Test-Grid-Small-5-2       137       159     140    100        124
Test-Grid-Small-5-5       39        48      39     67         37
Test-Grid-Small-5-10      42        51      41     34         42
Test-Grid-Medium-10-5     137       159     140    100        124
Test-Grid-Medium-10-10    39        48      39     67         37
Test-Grid-Medium-10-20    42        51      41     34         42
Test-Grid-Large-20-10     157       169     202    374        165
Test-Grid-Large-20-20     232       241     323    330        244
Test-Grid-Large-20-40     322       323     628    586        324
process of its data replication policy. Therefore, it can be concluded that fast data replication policies can be by-products of job scheduling policies if designed properly.
7 Conclusion
This paper presented a novel greedy approach to simultaneously schedule jobs to CNs and replicate data files to SNs. Grid schedulers in our proposed approach (BestMap) run two independent but collaborating mechanisms to find the best CN to perform a job and the best SN to replicate a data file to. A number of distinct observations were made that articulate the results of the detailed simulations developed in this work. Using several criteria to measure the performance of different algorithms on a variety of small- to large-sized grids, BestMap showed its superiority when compared with other algorithms proposed in the literature. The results presented here also disclose new avenues of research that can be targeted in the future.
References
[1] F. Berman, et al., Grid Computing: Making the Global Infrastructure a Reality, 2008.
[2] A. Anjum, et al., "Bulk Scheduling With the DIANA Scheduler," IEEE Transactions on Nuclear Science, vol. 53, p. 3829, 2006.
[3] K. Holtman, "CMS Data Grid System Overview and Requirements," The Compact Muon Solenoid (CMS) Experiment Note 2001/037, 2001.
[4] K. Holtman, "HEPGRID2001: A Model of a Virtual Data Grid Application," in Proceedings of the 9th International Conference on High-Performance Computing and Networking (HPCN Europe 2001), Springer-Verlag, 2001.
[5] R. Subrata, et al., "Cooperative power-aware scheduling in grid computing environments," Journal of Parallel and Distributed Computing, vol. 70, p. 91, 2010.
[6] R. Subrata, et al., "A Cooperative Game Framework for QoS Guided Job Allocation Schemes in Grids," IEEE Transactions on Computers, vol. 57, p. 1422, 2008.
[7] J. Frey, et al., "Condor-G: A Computation Management Agent for Multi-Institutional Grids," Cluster Computing, vol. 5, Springer Netherlands, 2002.
[8] P. Andreetto, et al., "Practical Approaches to Grid Workload & Resource Management in the EGEE Project," in CHEP04, Interlaken, Switzerland, 2004.
[9] H. Jin, et al., "An adaptive meta-scheduler for data-intensive applications," International Journal of Grid and Utility Computing, vol. 1, Inderscience Publishers, 2005.
[10] T. Kosar and M. Livny, "A framework for reliable and efficient data placement in distributed computing systems," Journal of Parallel and Distributed Computing, vol. 65, 2005.
[11] D. Thain, et al., "Gathering at the Well: Creating Communities for Grid I/O," in SC2001: High Performance Networking and Computing, Denver, CO, November 10-16, 2001.
[12] J. Basney, et al., "Utilizing widely distributed computational resources efficiently with execution domains," Computer Physics Communications, vol. 140 (2), p. 252, 2001.
[13] B. Bode, et al., "The Portable Batch Scheduler and the Maui Scheduler on Linux Clusters," in Proceedings of the 4th Annual Showcase & Conference (LINUX-00), USENIX Association, 2000.
[14] W. Cirne, et al., "Running Bag-of-Tasks Applications on Computational Grids: The MyGrid Approach," in Proceedings of the 2003 International Conference on Parallel Processing (ICPP'03), IEEE Computer Society, 2003.
[15] E. Huedo, et al., "The GridWay Framework for Adaptive Scheduling and Execution on Grids," Scalable Computing: Practice and Experience, vol. 6 (3), 2005.
[16] P. E. Strazdins and J. Uhlmann, "A comparison of local and gang scheduling on a Beowulf cluster," in Proc. 2004 IEEE International Conference on Cluster Computing (CLUSTER'04), IEEE Computer Society, 2004.
[17] H. Casanova, et al., "Heuristics for Scheduling Parameter Sweep Applications in Grid Environments," in HCW '00: Proceedings of the 9th Heterogeneous Computing Workshop, IEEE Computer Society, 2000.
[18] R. P. Dick, et al., "TGFF: Task Graphs for Free," in Proceedings of the Sixth International Workshop on Hardware/Software Codesign (CODES/CASHE '98), 1998.
ISBN 978-1-74210-228-3