Multi-job Meta-Brokering in Distributed Computing Infrastructures using Pliant Logic

Attila Kertesz∗‡, Gergo Maros‡, Jozsef Daniel Dombi‡
∗ Institute for Computer Science and Control, MTA SZTAKI, H-1111 Budapest, Kende u. 13-17, Hungary
‡ Software Engineering Department, University of Szeged, 6720 Szeged, Dugonics ter 13, Hungary
[email protected], [email protected], [email protected]
Abstract—The ever growing number of computation-intensive applications calls for the interoperation of distributed infrastructures such as Clouds, Grids and private clusters. The European SHIWA and ER-flow projects have been initiated to enable the combination of heterogeneous scientific workflows, and to execute them in a large-scale system consisting of multiple Distributed Computing Infrastructures including Grids and Clouds. In this paper we focus on one of the resource management challenges of these projects, called multi-job scheduling. A parameter study job of a workflow having a large number of input files to be consumed by independent job instances is called a multi-job. In order to cope with the high uncertainty and unpredictable load of these infrastructures and with the simultaneous submissions of multi-job instances, we use statistical historical job allocation data gathered from real-world workflow archives and propose an adaptive meta-brokering approach for the management of this unified system based on the Pliant logic concept, which is a specific part of fuzzy logic theory. We argue that this novel scheduling technique produces better performance scores; hence the overall load of the system can be better balanced.

Keywords-Meta-brokering; Interoperability; Distributed infrastructures; Workflows; Pliant system
I. INTRODUCTION

Researchers of various disciplines, ranging from Life Sciences and Astronomy to Computational Chemistry, create and use scientific applications that produce large amounts of complex data and rely heavily on compute-intensive modelling, simulation and analysis. The ever growing number of such computation-intensive applications calls for the interoperation of distributed infrastructures including private and public Clouds, Grids and clusters. Scientific workflows have become a key paradigm for managing complex tasks and have emerged as a unifying mechanism for handling scientific data. Workflow applications capture the essence of the scientific process, providing means to describe it via logical data- or control-flows. During the execution of a workflow, its jobs are mapped onto resources of concrete Distributed Computing Infrastructures (DCIs) to perform large-scale experiments. Reusing already existing workflows, however, is still challenging because workflows typically have their own user interfaces, description languages and enactment
engines, which are not standardized and do not interoperate; therefore the proliferation of workflows in scientific practice is limited. The European SHIWA [24] project and its successor, the ER-flow [25] project, have been initiated to enable the combination of heterogeneous scientific workflows, and to execute them in a large-scale system consisting of multiple DCIs including academic and public Clouds. Managing these heterogeneous DCIs and scheduling the jobs of these scientific workflows are also difficult problems and require sophisticated approaches.

Many of these workflows apply the parameter study construct to sweep large input sets with specific algorithms [18], [19]. A parameter study job of a workflow having a large number of input files to be consumed by independent job instances is called a multi-job. The state-of-the-art approach for executing these parameter study job instances is to submit all instances simultaneously, one by one, to the scheduling component of the workflow management system. This may cause significant overheads in service response time and bottleneck problems. To avoid these problems, we propose in this paper a multi-job construct and a multi-job scheduling solution to enable 'group' submission of these instances and a more efficient distribution of job instances among the available computing resources.

As job scheduling in these heterogeneous distributed systems is NP-hard [10], there is a need for solutions that apply some sort of heuristics and approximations. Our current approach tries to find alternative scheduling solutions by gathering data from real-world workflow archives to base allocation on statistical resource utilization properties. Some of these methods are based on runtime estimates, and the inaccuracy of these estimates is a perennial problem mentioned in the job scheduling literature. Even if users are required to provide these values, there is no substantial improvement in the overall average accuracy [14]. To cope with the high uncertainty and unpredictable load present in these heterogeneous large-scale systems, we apply the Pliant system approach [7], which is similar to a fuzzy system [6], to our proposed multi-job scheduling solution, relying on statistical historical job allocation data gathered from real-world public traces.
Therefore the main contributions of this paper are: (i) the design of a multi-job Pliant-based scheduling algorithm for managing resources of multiple DCIs at a meta-brokering level, and (ii) the evaluation of this approach in a simulation environment using real-world traces. The remainder of this paper is organized as follows: Section II presents the related resource management approaches; Section III introduces the SHIWA and ER-flow projects, highlighting the challenges and the problem area we target; Section IV describes our proposed meta-brokering solution for multi-job management; Section V introduces the advanced scheduling algorithms and discusses how this approach enables better DCI management. Finally, Section VI discusses the performed evaluation using real-world traces, and the contributions are summarized in Section VII.

II. RELATED WORK

Meta-brokering denotes a higher-level brokering approach that schedules user jobs among various distributed infrastructures. This approach can also be regarded as a federation management solution. Current state-of-the-art works usually target one or two infrastructure types to form a federation. For example, in Grid infrastructures, the InterGrid approach [1] promotes interlinking of Grid systems through peering agreements to enable inter-Grid resource sharing. Regarding Cloud systems, Buyya et al. [2] suggest a Cloud federation-oriented, opportunistic and scalable application services provisioning environment called InterCloud. They envision utility-oriented federated IaaS systems that are able to predict application service behaviour for intelligent down- and up-scaling of infrastructures. Some works investigated federating Grid and Cloud systems [3], but managing multiple types of infrastructures together is still rarely studied.

Our proposed multi-job construct is similar to the so-called bag-of-tasks (BoT) applications, where the application consists of a large number of independent tasks that need to be executed. Such a BoT application is not necessarily part of a workflow, and mapping these tasks to multiple infrastructures is also rare. Nevertheless, GridBot [17] represents an approach for the execution of bags of tasks on multiple Grids, clusters, and volunteer computing Grids. It has a Workload Manager component that is responsible for brokering among these environments, which is similar to our approach, but they focus on tasks more suitable for volunteer Grids. They also do not support Clouds, and do not address the high dynamicity and uncertainty inherent in these heterogeneous large-scale systems. Oprescu et al. [15] propose a budget constraint-based resource selection approach for Cloud applications. In this work they present a budget-constrained scheduler called BaTS, which can schedule bags of tasks onto multiple clouds with different CPU performance and cost, minimizing completion time within a given budget. Their scheduler learns to estimate task completion times at run time, while
we use estimations obtained by statistical calculations from real-world trace files. Ramirez-Alcaraz et al. [16] have analyzed different Grid allocation strategies depending on the type and amount of information they require, and they found that information about users' runtime estimates and local schedules does not help to significantly improve the outcome of the allocation strategies. They concluded that quite simple schedulers with minimal information requirements can provide good performance.

In our previous work [8] we have already investigated Pliant-based meta-brokering for broker selection among different Grids by evaluating the performance of our meta-broker for submitting single jobs within a simulation environment, compared to random broker selection. These results clearly indicated the performance gains of the Pliant system approach in Grid meta-brokering. Our current approach tries to find alternative scheduling solutions for multi-jobs by gathering data from real-world trace files and basing allocation on statistical properties.

III. EUROPEAN PROJECTS FOR WORKFLOW INTEROPERABILITY
User communities worldwide use many kinds of workflow management and execution systems, which raises interoperation problems among these working groups. Workflow development, testing and validation is a time-consuming process that requires specific expertise. These tasks hinder the growth of the number of available workflows and slow down the production of research results, so it is important to reuse them. Communities using similar workflow engines can benefit from sharing previously developed and tested applications, e.g. using the myExperiment collaborative environment [5]. Besides sharing, interconnecting and combining these workflows can lead to additional research gains and save time. Unfortunately, workflows developed for one workflow enactment system are normally not compatible with workflows of other systems. In the past, if two user communities using different workflow systems wanted to collaborate, they had to redesign the application from scratch for the desired workflow execution platform. To overcome these difficulties, new workflow interoperability technologies must be developed, which is the goal of the European SHIWA project [24]. By using the new SHIWA technologies, publicly available workflows can be used by different research communities working with different workflow systems, and they can be run on multiple distributed computing infrastructures. As a result, workflow communities are no longer locked to their selected workflow enactment systems and their supported computing infrastructure. This project has developed and operates the SHIWA Simulation Platform to offer users production-level services supporting workflow interoperability.
Figure 1. Default scheduling for a multi-job
As part of this platform, the SHIWA Repository facilitates publishing and sharing workflows, and the SHIWA Portal enables their actual enactment and execution in most DCIs available in Europe. This platform is built on the gUSE portal technology [11], which uses brokers and infrastructure-specific submitters to access and manage these DCIs. Several portal installations have been performed so far [26], and numerous DCIs are supported by these portals, including Virtual Organizations of the European Grid Infrastructure [20], EGI Cloud testbeds [21], national local clusters, private Clouds and European supercomputers.

The European ER-flow project [25] has been initiated to disseminate the achievements of the SHIWA project and to use these achievements to build workflow user communities across Europe. It provides application support to research communities within and beyond the project consortium to develop, share and run workflows with the SHIWA Simulation Platform. Both projects support the execution of multi-job workflows among a growing number of DCIs; therefore it is crucial to develop resource allocation algorithms that are capable of efficiently distributing a high number of simultaneously submitted parameter study jobs among the resources of these heterogeneous infrastructures.

IV. META-BROKERING APPROACH

To efficiently manage multiple, heterogeneous distributed infrastructures within the gUSE portal environment, a meta-brokering approach has been developed [13] to coordinate and manage the available brokers by scheduling the jobs of the submitted workflows to them. A tool of this approach has been developed as an open source project called the Generic Meta-Broker Service (GMBS), available at [22]. This meta-brokering service has five major components. The Meta-Broker Core is responsible for managing the interaction with the other components and handling user interactions. The MatchMaker component performs the scheduling of the jobs by selecting a suitable broker. This decision making is based on aggregated static and dynamic data stored by the Information Collector component in a local database. The Information System Agent is implemented as a listener
service of GMBS, and it is responsible for regularly updating static and dynamic information on resource availability from the interconnected infrastructures. The Invoker component forwards the jobs to the selected broker and receives the results. Each job submitted for matchmaking is supplied with a standard description document containing its quality of service attributes. More information on these components and the utilized description language can be found in [13].

Besides managing a much larger resource pool with an increased number and diversity of heterogeneous distributed infrastructures, our meta-brokering solution has to face a newly emerged scheduling problem of a new job type called a multi-job. This multi-job exploits the parameter study construct: it has multiple input sets to execute with multiple job instances of the same program binary. Hence, a parameter study job having a large number of input files to be consumed by independent job instances is called a multi-job. (This job type can be denoted in the supplied description document of the job.) This new job type has been introduced within the SHIWA project, and it is planned to be supported in future releases of the SHIWA Simulation Platform. By using this construct a parameter study job of a workflow can be defined as a multi-job and submitted to the meta-broker once, similarly to single jobs, to be scheduled among the available DCIs, instead of the former one-by-one submissions [12]. On the one hand, this approach can save transfer and service call turnaround times, avoid bottleneck problems and enhance the throughput of GMBS; on the other hand, its scheduling process has to be extended to manage multi-jobs, and the scheduled but not yet submitted job instances need to be taken into account for later schedules.

Figure 1 exemplifies the idea behind the default meta-brokering scheduling algorithm for multi-jobs by depicting job instance distribution among 5 DCIs with different resource pool sizes. In short, in the case of a multi-job the scheduling process surveys the number of free resources available in the participating DCIs, and distributes job instances to these DCIs according to the ratio of their
free resources compared to each other. The proposed default scheduling algorithm (called scheduleMultiJobInstances) receives as input parameters a vector of the participating DCIs that are capable of executing instances of the multi-job, and the number of job instances to be scheduled. This vector of DCIs is the result of a pre-filtering process of GMBS that rules out DCIs that do not match the quality of service attributes of the multi-job described by the user. The output is a vector called MappedInstances containing the number of job instances to be submitted to the candidate DCIs accordingly. The first loop of the function counts the total number of free resources by summing up the available free resources minus the previously mapped but not yet submitted instances (each supposed to reserve one resource) in all the candidate DCIs. In this loop the DCI having the highest number of free resources is also saved. In the second loop, an amount of job instances is mapped to each candidate DCI according to the ratio of its free resources compared to the total number of free resources. Finally, the number of possibly remaining instances is added to the DCI having the highest number of free resources.
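The following Python sketch illustrates this default proportional allocation policy. It is only our illustration of the procedure described above, not the GMBS source code; the DCI representation (a dictionary with free_resources and pending_instances entries) and the fallback used when no free resource is reported are assumptions made for the example.

def schedule_multi_job_instances(dcis, num_instances):
    # Distribute multi-job instances among the candidate DCIs proportionally
    # to their currently free resources (sketch of the default policy).
    # Each DCI is assumed to be a dict with 'free_resources' (currently free
    # resources) and 'pending_instances' (mapped but not yet submitted).
    effective_free = []
    total_free = 0
    best_dci = 0
    # First loop: effectively free resources per DCI and the DCI with the
    # highest number of free resources.
    for index, dci in enumerate(dcis):
        free = max(dci["free_resources"] - dci["pending_instances"], 0)
        effective_free.append(free)
        total_free += free
        if free > effective_free[best_dci]:
            best_dci = index

    mapped_instances = [0] * len(dcis)
    if total_free == 0:
        # Assumed fallback (not specified in the paper): if no free resource
        # is reported, map everything to the first candidate DCI.
        mapped_instances[0] = num_instances
        return mapped_instances

    # Second loop: map instances according to the free-resource ratio.
    for index, free in enumerate(effective_free):
        mapped_instances[index] = (num_instances * free) // total_free

    # Remaining instances (due to integer rounding) go to the DCI with the
    # highest number of free resources.
    mapped_instances[best_dci] += num_instances - sum(mapped_instances)
    return mapped_instances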
V. PLIANT SCHEDULING APPROACH FOR MULTI-JOBS

The Pliant system [7] is similar to a traditional fuzzy system [6]. The difference between the two systems lies in the choice of operators. The Pliant system has a strict, monotonously increasing t-norm and t-conorm, and the following expression is valid for the generator functions:

$f_c(x) \, f_d(x) = 1$,    (1)
where $f_c(x)$ and $f_d(x)$ are the generator functions of the conjunctive and disjunctive logical operators, respectively. This system is defined on the [0,1] interval.

In our previous paper [8], we developed a scheduling component that uses the Pliant system to select a well-performing Grid broker for a user's job even under conditions of high uncertainty. The algorithm we developed calculates a score for each broker using the broker's properties. The calculation includes a normalization step, where we apply a linear function. Regarding normalization, it should be mentioned that a normalized value close to one denotes a more valuable property, while a normalized value close to zero denotes a less prioritized property. For example, if the number-of-jobs counter is high, the normalization algorithm should give a value close to zero. In this previous work [8] we found that we can achieve better results if we use the aggregation operator to calculate the score. Here, we improve the scheduling algorithm in order to handle the multi-job case with a similar approach. There are additional, different types of Resources to manage, but they can all be described with the same three properties, namely the number of jobs, the estimated time of jobs, and the number of processors:

• The number of jobs gives the count of running and waiting jobs on the Resource.
• The estimated time of jobs gives the time that is required to execute all the jobs on the Resource.
• The number of processors gives the available number of processors of the Resource.

We have developed a Pliant decision making algorithm that takes into account the above-mentioned properties and decides to which DCI a job instance should be submitted. First we start with a normalization step, where we apply different kinds of linear functions to normalize the DCIs' property values. We can obtain the parameters of a linear function by searching for the minimum and maximum value of the property. After the normalization step we modify the normalized value to emphasize the importance of the result. To achieve this we modify the normalized value using the Kappa function, shown in Figure 2, with parameters ν = 0.5 and λ = 2:

$\kappa_{\nu}^{\lambda}(x) = \dfrac{1}{1 + \left( \dfrac{\nu}{1-\nu} \, \dfrac{1-x}{x} \right)^{\lambda}}$    (2)

Figure 2. The kappa function with the parameter values ν = 0.5 and λ = 1, 2, 4 and 8
Finally to calculate a DCI’s score number for a given job instance, we use the aggregation operator: aν,ν0 (x1 , · · · , xn ) =
1 1+
1−ν0 ν ν0 1−ν
Qn
i=1
1−xi xi
,
(3)
where $\nu$ is the neutral value and $\nu_0$ is the threshold value of the corresponding negation. Here we do not want to threshold the result, so both parameters have the same value, 0.5. The result of the calculation is always a real number that lies in the [0,1] interval.
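To make the scoring step concrete, the sketch below computes a Pliant score for a DCI from its three properties using equations (2) and (3), and then picks the DCI with the highest score for a job instance, as described next. This is our own illustrative Python code, not the GMBS implementation; the property names, the min/max normalization bounds and the clamping away from 0 and 1 are assumptions of the example.

NU = 0.5        # neutral value of the Pliant operators
NU0 = 0.5       # threshold value (no thresholding when equal to the neutral value)
LAMBDA = 2      # sharpness parameter of the kappa function

def kappa(x, nu=NU, lam=LAMBDA):
    # Kappa function of Eq. (2), defined for x in (0, 1).
    return 1.0 / (1.0 + (nu / (1.0 - nu) * (1.0 - x) / x) ** lam)

def aggregate(values, nu=NU, nu0=NU0):
    # Aggregation operator of Eq. (3) over already kappa-modified values.
    product = 1.0
    for x in values:
        product *= (1.0 - x) / x
    return 1.0 / (1.0 + (1.0 - nu0) / nu0 * nu / (1.0 - nu) * product)

def normalize(value, low, high, higher_is_better):
    # Linear normalization to (0, 1), clamped away from 0 and 1 so that the
    # kappa function and the product in Eq. (3) stay well defined.
    x = (value - low) / (high - low) if high > low else 0.5
    if not higher_is_better:   # e.g. many queued jobs should score close to zero
        x = 1.0 - x
    return min(max(x, 1e-6), 1.0 - 1e-6)

def pliant_score(dci, bounds):
    # Score one DCI from its three properties (dict keys are assumptions).
    xs = [
        normalize(dci["num_jobs"], *bounds["num_jobs"], higher_is_better=False),
        normalize(dci["est_time"], *bounds["est_time"], higher_is_better=False),
        normalize(dci["num_procs"], *bounds["num_procs"], higher_is_better=True),
    ]
    return aggregate([kappa(x) for x in xs])

def select_dci(dcis, bounds):
    # The job instance is mapped to the DCI with the highest Pliant score.
    return max(dcis, key=lambda dci: pliant_score(dci, bounds))

# Hypothetical usage with assumed bounds (min/max of each property):
# dcis = [{"num_jobs": 12, "est_time": 3400.0, "num_procs": 8},
#         {"num_jobs": 3, "est_time": 900.0, "num_procs": 4}]
# bounds = {"num_jobs": (0, 50), "est_time": (0.0, 10000.0), "num_procs": (1, 16)}
# best = select_dci(dcis, bounds)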
In order to use this Pliant decision maker algorithm, we revised the previously introduced default scheduling algorithm (scheduleMultiJobInstances). In the first loop of the algorithm we gather the previously mentioned properties of all DCIs into a vector, and in a second loop we call the function implementing the Pliant decision making algorithm with this input vector for each job instance. For a given job instance the algorithm computes a score for each available DCI, and the one with the highest score is selected for submission. The threshold values of the applied normalization functions have initially been set based on the current average resource availability numbers in the SHIWA Simulation Platform. We have already evaluated the performance of this approach for purely Grid-based systems (detailed in the next section), and our future work addresses the evaluation of the presented Pliant scheduling solution for multi-job submissions in a real-world production system provided by SHIWA. During this evaluation we will tune the threshold values of the normalization functions to improve the scheduling efficiency.

VI. EVALUATION AND DISCUSSION

In order to evaluate our multi-job meta-brokering solution, we created a general simulation environment based on the GridSim toolkit [4], which is a fully extendible, widely used and accepted Grid simulation tool. In this simulated evaluation we did not explicitly differentiate the available DCIs. We considered all resources of the participating DCIs equally; therefore, in the simulations each resource represents a 'virtual' DCI. In this way we could simulate 20 DCIs, represented by 'resources' in the simulation environment, and the goal of the proposed allocation algorithms is to distribute the multi-jobs among these resources.

As we stated in the introduction, our current approach tries to find alternative scheduling solutions by gathering data from trace files and basing allocation on statistical properties. In order to address universality, we use real user application execution traces, gathered both in parallel and production Grid environments and published in an online archive [23], as background workload. We have investigated the workload trace files of the Grid Workloads Archive (GWA) [9], and have chosen the GWA-T-1 DAS2 trace file containing records for almost two years, provided by the Advanced School for Computing and Imaging, the owner of the DAS-2 system. The main reasons for choosing this trace were that it contains the largest number of jobs, and it includes all three group identities for the logged jobs: Group, User and Execution ids. These categories mean that the considered job belongs to the same group of users, to the same user, or to an instance of the same application,
respectively. In the considered DAS2 trace file the GROUP category has 12, the USER category has 333, and the EXE category has 9070 members. These traces have been partitioned according to the PartitionID fields, which denote the machines the jobs had been submitted to. We used these partitioned files to feed the simulated resources with background (workload) jobs. In this way we created 5 files, which we divided into 4 equal parts based on the logged time, to get 20 files to be used for simulating 20 resources.

Within GridSim, resources consist of one or more machines, to which workloads can be assigned. As an extension of the GridSim classes, we have developed the 'GridSimStatQueueBroker', 'SimWorkload' and 'SimulatorSetup' entities in order to enable the simulation of the previously mentioned scenarios. On top of these simulated Grid infrastructures we can use the broker entities for setting up brokers with various allocation policies, the 'SimWorkload' entities are used for submitting the workload jobs, while the 'SimulatorSetup' component is responsible for parameterizing and executing each experiment. We have also developed a scripting layer for the simulator in order to automate and easily parameterize the simulation runs, and to generate plots.

We have created a statistical job runtime categorization based on the ids and the exact runtimes found in the traces. For each id category (i.e. execution, user and group) we have created a database that contains the calculated mean runtime for all jobs of the considered trace file with the same id. We used these values in the simulations to estimate the runtime of a given job by the meta-broker. In this way the meta-broker (i.e. the 'GridSimStatQueueBroker' entity in the simulator) knows an estimated waiting time for all available resources in the system based on these statistics at a given time in the simulation, while the resources execute the jobs based on the exact runtimes read from the original traces. During the simulations the runtime of the parameter study jobs is given either explicitly or with an id that refers to the statistically calculated execution time.

First, as a pre-evaluation, we considered n resources in the system, each having m machines with 1 processor. The workload present in these resources was submitted from trace files containing jobs having the original run times given in the traces. The jobs in the traces are marked by group identifiers, and in the simulations the scheduler knows for each group the run time estimation represented by the mean value of all jobs belonging to that group. We submitted k jobs at a given time to the system, having the same, predefined run time given by a group id (determining its mean run time) or by a concrete run time value. From these simulations we could see that our allocation algorithms relying on the statistical job runtime categorization read from trace files can perform significantly better than random resource selection. This proved that the categorization is valid in the trace files. On the other hand, there were no significant differences in performance among the different groups.
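The per-id runtime statistics described above can be derived from a trace with a few lines of code. The sketch below is only an illustration under assumptions: it treats the trace as a whitespace-separated file with a header row and columns named RunTime, UserID, GroupID and ExecutableID, whereas the actual GWA file format and column names may differ.

from collections import defaultdict
import csv

def build_runtime_means(trace_path, id_field, runtime_field="RunTime"):
    # Compute the mean runtime for every id of one category (group, user
    # or executable) found in a trace file. Column names are assumptions.
    sums = defaultdict(float)
    counts = defaultdict(int)
    with open(trace_path) as trace:
        reader = csv.DictReader(trace, delimiter=" ", skipinitialspace=True)
        for row in reader:
            runtime = float(row[runtime_field])
            if runtime < 0:  # negative values typically mark missing data
                continue
            sums[row[id_field]] += runtime
            counts[row[id_field]] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical usage: estimate the runtime of a new job from its user id.
# user_means = build_runtime_means("das2_trace.txt", id_field="UserID")
# estimated_runtime = user_means.get(job_user_id, default_estimate)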
Table I
EVALUATION PARAMETERS.

Jobs:                           100, 5000
Job run time:                   DAS2: USER
Pre-load arrival and run time:  as in DAS2 traces
Start time:                     6909476, 12274286, 23077573
Delay time:                     1000, 69120
Table II
AVERAGE RUNTIME OF 100 JOB INSTANCES IN THE SIMULATIONS.

EVAL  RES. TYPE  NUM. JOBS  ID   CUTTIME   DELAY  RND AVG   STAT AVG  PLIANT AVG
1     R1         100        117  12274286  1000   340.16    260.95    126.14
2     R1         100        11   12274286  1000   239.71    126.00    123.11
3     R1         100        117  12274286  69120  139.31    127.19    130.17
4     R1         100        11   12274286  69120  131.07    124.07    130.17
5     R1         100        117  23077573  1000   128.14    128.63    130.18
6     R1         100        11   23077573  1000   149.35    127.29    130.18
7     R1         100        117  23077573  69120  12220.74  126.05    132.18
8     R1         100        11   23077573  69120  10850.00  131.06    131.17
9     R1         100        117  6909476   1000   120.07    125.64    126.14
10    R1         100        11   6909476   1000   138.23    125.64    126.14
11    R1         100        117  6909476   69120  35083.48  125.03    125.13
12    R1         100        11   6909476   69120  28094.22  124.99    123.11
13    R2         100        117  12274286  1000   177.44    155.62    106.95
14    R2         100        11   12274286  1000   146.02    105.12    105.94
15    R2         100        117  12274286  69120  10644.25  105.13    106.95
16    R2         100        11   12274286  69120  31737.83  105.13    106.95
17    R2         100        117  23077573  1000   111.99    105.84    107.96
18    R2         100        11   23077573  1000   106.94    105.67    106.95
19    R2         100        117  23077573  69120  2878.26   158.52    108.95
20    R2         100        11   23077573  69120  3451.47   108.11    107.94
21    R2         100        117  6909476   1000   106.94    105.84    105.94
22    R2         100        11   6909476   1000   107.94    105.84    105.94
23    R2         100        117  6909476   69120  22496.75  106.10    104.93
24    R2         100        11   6909476   69120  9699.54   105.12    105.94

Second, we considered n resources in the system, each having m machines with more processors. The workloads present in these resources were also submitted from trace files containing jobs having the original run times given in the traces. The jobs in the traces are marked by group identifiers, and in the simulations the scheduler knows for each group the run time estimation represented by the mean value of all jobs belonging to that group. We submitted k jobs at a given time to the system, having the same, predefined run time (given by a group id or a concrete run time value). The exact simulation parameters are shown in Table I. As we stated before, we used the DAS2 trace file that contains utilization records for almost two years. We partitioned this trace file in order to reduce the simulated time interval and the number of jobs arriving to the system. The Start time depicted in Table I (called CUTTIME in Tables II and III) denotes the submission time of the first background job to the simulation environment according to the original trace file. By using three different start times we could pick different time intervals to simulate different background loads. The Delay time depicted in the tables shows the applied delay (in seconds) in the simulations before the multi-jobs are submitted to the meta-broker. Within this delay time
the arrived background jobs can generate some load on the available resources. For these realistic simulations we used jobs with user ids 11 and 117, which have mean execution times of 462 and 5801 seconds, respectively (computed from the whole trace file). In these cases the simulated Grid environment consisted of resources having various numbers of machines and processors. For resource setup R1 we used 20 resources, out of which 5-5 resources had 1, 2, 3 and 4 machines, respectively, and each machine had 2 processors. Regarding resource setup R2 we used 20 resources, out of which 5-5 resources had 2, 4, 8 and 10 machines, respectively, and each machine had 2 processors in this case, too. We performed the evaluations for multi-jobs having 100 and 5000 job instances. We gathered the average runtime of the job instances of the performed measurements in Table II and Table III to better exemplify the differences between the random, the default statistics-based and the Pliant-based allocations.

Our proposed default and Pliant-based statistical brokering approaches performed better than randomized resource selection most of the time. In some cases, when almost no background load was present in the system, we experienced that the random, the default and the Pliant algorithms
performed around the same (e.g. see the 3rd-5th and 21st-22nd rows of Table II, and the 1st-2nd rows of Table III). Nevertheless, we experienced significant differences in the makespan (e.g. see the 7th-8th, 11th-12th, 15th-16th and 23rd rows of Table II, and the 4th, 8th and 12th rows of Table III) when we enlarged the resource heterogeneity by varying the number of processors within the machines of a resource (or DCI). Although our default statistics-based algorithm provided better performance than random selection, the Pliant-based algorithm resulted in significantly better performance in cases where the default algorithm could not achieve additional performance gains, generally for the higher number of job instances (e.g. see the 4th-5th, 7th, 9th, 11th, 21st and 23rd rows of Table III). Overall, we can state that although the values of the job runtimes sometimes vary highly within the same group categories, estimations statistically reusing this information from historical trace files can provide reliable enough information to perform better allocations than randomized algorithms with no a priori knowledge of job runtimes.

Table III
AVERAGE RUNTIME OF 5000 JOB INSTANCES IN THE SIMULATIONS.

EVAL  RES. TYPE  NUM. JOBS  ID   CUTTIME   DELAY  RND AVG   STAT AVG  PLIANT AVG
1     R1         5000       117  12274286  1000   3360.12   3175.04   3465.31
2     R1         5000       11   12274286  1000   3333.20   3233.22   3465.30
3     R1         5000       117  12274286  69120  9744.83   9649.20   4765.13
4     R1         5000       11   12274286  69120  10076.59  9672.71   4765.13
5     R1         5000       117  23077573  1000   7440.80   7852.84   3512.19
6     R1         5000       11   23077573  1000   7880.09   6909.25   3510.86
7     R1         5000       117  23077573  69120  15581.84  15737.07  3518.89
8     R1         5000       11   23077573  69120  16996.82  7036.65   3517.94
9     R1         5000       117  6909476   1000   31236.43  31954.28  3336.21
10    R1         5000       11   6909476   1000   31790.35  25895.59  3336.21
11    R1         5000       117  6909476   69120  37842.87  36558.30  3339.54
12    R1         5000       11   6909476   69120  38504.54  9330.02   3337.99
13    R2         5000       117  12274286  1000   18883.42  16965.20  1606.30
14    R2         5000       11   12274286  1000   17178.39  14795.08  1606.58
15    R2         5000       117  12274286  69120  2941.11   2857.95   2095.74
16    R2         5000       11   12274286  69120  2798.33   2925.62   2097.25
17    R2         5000       117  23077573  1000   2424.22   2164.27   1661.10
18    R2         5000       11   23077573  1000   2307.30   2129.79   1662.15
19    R2         5000       117  23077573  69120  3546.33   3475.74   1651.70
20    R2         5000       11   23077573  69120  3631.57   2734.73   1652.05
21    R2         5000       117  6909476   1000   14549.51  15136.15  1574.77
22    R2         5000       11   6909476   1000   16964.12  13734.96  1573.78
23    R2         5000       117  6909476   69120  15639.61  16310.67  1573.84
24    R2         5000       11   6909476   69120  16241.94  10680.17  1573.71

VII. CONCLUSION

User communities continuously develop computation-intensive workflow applications, which results in a growing demand for resources and a more efficient management of the available computing infrastructures. The European SHIWA and ER-flow projects provide ways to combine and interoperate these scientific workflows, and to execute them in a large-scale system of multiple heterogeneous Distributed Computing Infrastructures.
In this paper we proposed a novel meta-brokering algorithm for scheduling multi-jobs over multiple DCIs to reduce submission bottlenecks and workflow execution times. We have applied the Pliant approach to cope with the high uncertainty and unpredictable load present in these heterogeneous large-scale systems, and used statistical historical job allocation data gathered from real-world public traces. The evaluation results prove that relying on statistics-based application data can result in better multi-job allocation, and the applied Pliant-based scheduling solution can provide significant performance gains in reducing the execution time of scientific workflow applications. Our future work aims to involve academic Clouds among the managed DCIs, and to evaluate our approach in a heterogeneous real-world environment provided by the SHIWA and ER-flow projects.

ACKNOWLEDGEMENT

The research leading to these results has received funding from the ER-Flow project of the European Commission's FP7 INFRASTRUCTURES-2012-1 call under grant agreement 312579, and it was supported by the European Union and the State of Hungary, co-financed by the European Social Fund in the framework of the TAMOP 4.2.4. A/2-111-2012-0001 National Excellence Program.

REFERENCES

[1] M. D. Assuncao, R. Buyya and S. Venugopal. InterGrid: A Case for Internetworking Islands of Grids. Concurrency and Computation: Practice and Experience (CCPE), Jul. 16 2007.
[2] R. Buyya, R. Ranjan, and R. N. Calheiros. InterCloud: Utility-Oriented Federation of Cloud Computing Environments for Scaling of Application Services. Lecture Notes in Computer Science: Algorithms and Architectures for Parallel Processing, Volume 6081, 2010.
[3] R. Buyya, R. Ranjan. Special section: Federated resource management in grid and cloud computing systems. Future Generation Computer Systems, vol. 26, pp. 1189-1191, 2010.
[4] R. Buyya, M. Murshed, and D. Abramson. GridSim: A toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing. Concurrency and Computation: Practice and Experience, pp. 1175-1220, 2002.
[5] C. A. Goble, J. Bhagat, S. Aleksejevs, D. Cruickshank, D. Michaelides, D. Newman, M. Borkum, S. Bechhofer, M. Roos, P. Li, and D. De Roure. myExperiment: a repository and social network for the sharing of bioinformatics workflows. Nucl. Acids Res., 2010.
[6] J. Dombi. A general class of fuzzy operators, the De Morgan class of fuzzy operators and fuzziness measures induced by fuzzy operators. Fuzzy Sets and Systems 8, 1982.
[7] J. Dombi. Pliant system. IEEE International Conference on Intelligent Engineering Systems Proceedings, Budapest, Hungary, 1997.
[8] J. D. Dombi, A. Kertesz. Advanced Scheduling Techniques with the Pliant System for High-Level Grid Brokering. Communications in Computer and Information Science (CCIS), Vol. 129, Springer-Verlag Berlin Heidelberg, pp. 173-185, 2011.
[9] A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. H. J. Epema. The Grid Workloads Archive. Future Generation Computer Systems, Volume 24, Issue 7, pp. 672-686, July 2008.
[10] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, USA, 1979.
[11] P. Kacsuk. P-GRADE portal family for Grid infrastructures. Concurrency and Computation: Practice and Experience, Volume 23, Issue 3, pp. 235-245, 2011.
[12] P. Kacsuk, Z. Farkas, G. Hermann. Workflow-level parameter study support for production grids. Computational Science and Its Applications (ICCSA'07), LNCS 4707, pp. 872-885, 2007.
[13] A. Kertesz, P. Kacsuk. GMBS: A New Middleware Service for Making Grids Interoperable. Future Generation Computer Systems, vol. 16, pp. 542-553, 2010.
[14] C. B. Lee, Y. Schwartzman, J. Hardy, and A. Snavely. Are user runtime estimates inherently inaccurate? Springer LNCS, Volume 3277, pp. 253-263, 2005.
[15] A. Oprescu and T. Kielmann. Bag-of-Tasks Scheduling under Budget Constraints. CloudCom 2010, pp. 351-359.
[16] J. M. Ramirez-Alcaraz, A. Tchernykh, R. Yahyapour, U. Schwiegelshohn, A. Quezada-Pina, J. L. Gonzalez-Garcia, A. Hirales-Carbajal. Job allocation strategies with user runtime estimates for online scheduling in hierarchical Grids. Journal of Grid Computing 9(1), pp. 95-116, 2011.
[17] M. Silberstein, A. Sharov, D. Geiger, and A. Schuster. GridBot: execution of bags of tasks in multiple grids. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC '09), 2009.
[18] Alessandro Constantini. RWavePR workflow at GASuC. Online: http://www.lpds.sztaki.hu/gasuc/index.php?m=7&s=12, 2012.
[19] Andrea Wiggins. Success-Abandonment-Classification workflow at myExperiment. Online: http://www.myexperiment.org/workflows/140.html, 2012.
[20] European Grid Infrastructure. Online: http://www.egi.eu/, 2012.
[21] EGI Cloud testbeds. Online: https://wiki.egi.eu/wiki/Fedcloud-tf:Testbed, 2012.
[22] Generic Meta-Broker Service at SourceForge. Online: http://sourceforge.net/projects/gmbs/, 2012.
[23] The Grid Workloads Archive website. Online: http://gwa.ewi.tudelft.nl, 2010.
[24] SHaring Interoperable Workflows for large-scale scientific simulations on Available DCIs (SHIWA) EU FP7 project. Online: http://www.shiwa-workflow.eu/, 2012.
[25] Building a European Research Community through Interoperable Workflows and Data (ER-flow) EU FP7 project. Online: http://www.erflow.eu/, 2013.
[26] gUSE portal website. Online: http://www.guse.hu/?m=installations, 2012.