Proposal and Development of a Reconfigurable Parallel Job Scheduling Algorithm

Luís Fabrício Wanderley Góes, Carlos Augusto Paiva da Silva Martins
Graduate Program in Electrical Engineering – PUC Minas
{lfwgoes,capsm}@pucminas.br

Abstract. In parallel computers, jobs and user requirements change dynamically, posing a great challenge to parallel job schedulers. Parallel job scheduling is the problem of deciding how to allocate jobs to a parallel computer over time. Ideally, a parallel job scheduling algorithm therefore needs a flexible behavior to adapt to workload and environment variations. We showed that a fixed-behavior parallel job scheduling algorithm cannot provide the best performance for all workloads and parallel computers, so we proposed an algorithm that is capable of dynamically changing its structure (configuration), and consequently its behavior, according to environment and workload variations. In this research in particular, we applied reconfiguration concepts to a specific class of parallel job scheduling called gang scheduling. Our results showed that the performance of our algorithm (Reconfigurable Gang Scheduling Algorithm - RGSA) was up to around 40% (upper bound) better than that of the other, fixed gang scheduling algorithms.

1. Introduction

Nowadays, the service quality requirements of users and institutions have increased. Thus, computer systems that provide many services (particularly parallel computers) need to be highly utilized and provide short response times for users' jobs. Parallel job schedulers should match requirements and workload (jobs) with resource availability (processors, memory etc.) in order to maximize the system's performance. The main problem is that workload, requirements and resources may change continuously, while parallel computers use job scheduling algorithms with a fixed behavior, which treat all these situations in the same way. To deal with this problem, many works have been developed to make job scheduling algorithms more flexible and adaptable [1], [4], [6], [14]. Up to now, a poorly explored solution is the use of reconfiguration in parallel job scheduling algorithms. Reconfigurable computing emerged as a paradigm to fill the gap between hardware and software, reaching better performance than software and more flexibility than hardware [2], [3], [4], [16]. Reconfigurable devices, including FPGAs (Field Programmable Gate Arrays), contain an array of computing elements or building blocks whose functionalities are determined by programming configuration bits. Thus, an FPGA can implement different behaviors not established at design time. Because of this, reconfigurable devices (hardware) are improving the solutions to problems from different areas [2], [3], [4], [16].

Our main hypothesis is that a parallel job scheduling algorithm with a fixed (inflexible) behavior cannot be the best one for all situations. To verify this hypothesis, we propose a set of experiments showing that the use of a fixed-behavior algorithm is not efficient. We then propose a solution that provides the required flexibility to a parallel job scheduling algorithm: we use reconfiguration concepts, extending the ideas applied in reconfigurable devices to the algorithm level. With this feature, a parallel job scheduling algorithm can assume different configurations according to input parameters such as performance metrics (utilization, mean response time of jobs etc.) and workload characteristics (mean execution time of jobs, mean parallelism degree of jobs etc.). A reconfiguration thus makes the algorithm assume a good configuration for a particular situation, considering the system's state at a given moment. In an extensive bibliographic review presented in [14], we found research that applies reconfiguration in software, but no previous work that applied it to algorithms. In [9], we presented a first approach to building a reconfigurable version of a static parallel job scheduling algorithm, which we have since improved to reach the present stage.

2. Reconfigurable Parallel Job Scheduling Algorithm

Among parallel job scheduling algorithms, we highlight gang scheduling algorithms. They have been intensely studied in the last decade [1], [5], [14] and have demonstrated many advantages over other parallel job scheduling algorithms; for instance, they provide interactive response times for short jobs through preemption, prevent long jobs from monopolizing processors, and maximize the system's utilization [1], [4], [6], [14]. In this research, we proposed a model of a gang scheduling algorithm composed of at least four parts: a packing scheme, a re-packing scheme, a queue policy and a multiprogramming level [14].

Figure 1. (a) Reconfigurable Algorithm Architecture; (b) The Frameset of the Basic Layer of RGSA.

A reconfigurable algorithm is composed of building blocks and frames, which make it possible to change its behavior by altering its configuration (structure). It is organized in three layers: the Basic Layer (BL), the Reconfigurable Layer (RL) and the Configuration Control Layer (CCL), as shown in Fig. 1 (a). The BL is a frameset composed of data structures (for storage) and frames (action and control). An action frame represents a part or phase of an algorithm, while a control frame controls a specific characteristic of a data structure. The RL is a configuration, or instance, of the BL, in which every frame is filled out with one compatible building block at a certain moment. A building block is a possible implementation of a frame. The CCL is responsible for selecting and loading the building blocks that fill out the frames at a given moment. Those decisions may be made based upon input parameters, dynamic workload information, commands from the operating system, the user's choice etc. Regarding the reconfiguration overhead, there are three important aspects: the CCL layer update, the selection of a configuration and the configuration loading. In the experimental results presented in [14], we showed that the configuration loading overhead can be neglected when compared to the high performance gains provided by reconfigurable algorithms. However, the selection of the best configuration is not a trivial task and presents a tradeoff between performance and complexity. The CCL design, which deals with the selection process, is a key problem that can be addressed using artificial intelligence and statistical techniques to analyze past information and predict which configuration should be used. To design a reconfigurable algorithm, we must execute the following steps: i) choose a set of traditional algorithms that solve a certain problem; ii) identify the common parts (functionalities and data structures); iii) model each part of the algorithm as a frame; iv) identify and specify the possible building blocks for each frame; v) create the CCL layer. In our proposed solution, called Reconfigurable Gang Scheduling Algorithm (RGSA), shown in Fig. 1 (b), each part is a different frame with its own building blocks. The Packing Schemes Frame may be filled out with different capacity-based packing schemes: first fit or best fit.
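The two capacity-based packing schemes named above can be sketched as follows. This is a minimal illustration, not the thesis' implementation: it assumes an Ousterhout-style scheduling matrix reduced to a list of free-processor counts per time slot, and the names `first_fit`, `best_fit` and `free_per_slot` are our own.

```python
# Hypothetical sketch of the two packing schemes (first fit and best
# fit) over a scheduling matrix reduced to one free-processor count
# per time slot.

def first_fit(free_per_slot, job_size):
    """Return the index of the first slot with enough free processors."""
    for i, free in enumerate(free_per_slot):
        if free >= job_size:
            return i
    return None  # no slot fits; a new slot would have to be opened

def best_fit(free_per_slot, job_size):
    """Return the slot that leaves the least spare capacity."""
    candidates = [(free - job_size, i)
                  for i, free in enumerate(free_per_slot)
                  if free >= job_size]
    return min(candidates)[1] if candidates else None
```

For a job needing 2 processors and slots with `[4, 2, 3]` free processors, first fit places it in slot 0, while best fit places it in slot 1, where it leaves no spare capacity.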
The Re-Packing Schemes Frame may be filled out with the slot unification and/or alternative scheduling re-packing schemes. The Queue Policies Frame can use the First Come First Served (FCFS) or Shortest Job First (SJF) policies. Finally, the Multiprogramming Levels Frame can be filled out with the unlimited or limited multiprogramming level building blocks. In RGSA, the CCL is implemented as a selection structure that chooses the best configuration according to some workload parameters: execution time, parallelism degree, predominance degree and performance metric. The CCL evaluates these parameters and dynamically reconfigures RGSA to the best configuration.
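The frame/building-block structure and the CCL's role can be sketched as below. This is a toy model under stated assumptions: the frame names mirror the four frames above, but the dictionary layout and the selection rule (a single `homogeneous` flag) are illustrative inventions, far simpler than the parameter evaluation the CCL actually performs.

```python
# Toy sketch of the RGSA structure: each frame is a named slot filled
# with exactly one building block, and the CCL picks a configuration
# from workload parameters. The selection rule here is a deliberate
# simplification for illustration only.

BUILDING_BLOCKS = {
    "packing":          {"first_fit", "best_fit"},
    "repacking":        {"slot_unification", "alternative_scheduling"},
    "queue_policy":     {"fcfs", "sjf"},
    "multiprogramming": {"limited", "unlimited"},
}

def ccl_select(workload):
    """Toy CCL: map a workload description to one configuration."""
    config = {
        "packing": "first_fit",
        "repacking": "slot_unification",
        # Loosely echoes the results section: SJF and the unlimited
        # level tend to win on homogeneous workloads, FCFS and the
        # limited level on heterogeneous ones.
        "queue_policy": "sjf" if workload["homogeneous"] else "fcfs",
        "multiprogramming": ("unlimited" if workload["homogeneous"]
                             else "limited"),
    }
    # Every frame must be filled with a compatible building block.
    assert all(config[f] in blocks for f, blocks in BUILDING_BLOCKS.items())
    return config
```

A reconfiguration then amounts to swapping the value bound to a frame; the rest of the algorithm keeps calling the same frame by name.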

3. Experimental Results

In this research, we defined, proposed, developed, implemented and analyzed the performance of RGSA. Moreover, we developed a simulation library called JSDESLib [15] and a cluster simulation tool called ClusterSim [7], [8] to provide our experimental environment. To validate our hypothesis and to simulate and analyze RGSA [6], [14], we compared each frame of RGSA with 12 traditional and proposed gang scheduling algorithms using 12 different workloads (with 10 simulation seeds each) on a 16-node cluster, for a total of 1440 simulations. From the analysis of the RGSA frames (Fig. 2 (a)), we can remark: i) Packing Schemes Frame: considering all metrics, on average, both packing schemes (first fit and best fit) presented equivalent performance, which suggests that other building blocks may be used; ii) Re-Packing Schemes Frame: considering all metrics, on average, both re-packing schemes (slot unification and alternative scheduling) presented equivalent performance, which suggests that other building blocks may be used or developed; iii) Multiprogramming Levels Frame: considering the utilization and simulation time metrics, the unlimited multiprogramming level performed better for homogeneous workloads, and the limited one for heterogeneous workloads. For the reaction time and slowdown metrics, the unlimited multiprogramming level presented the best performance in all cases. Finally, considering the response time metric, on average, the limited multiprogramming level was the best; iv) Queue Policies Frame: considering the utilization and simulation time metrics, the SJF policy was always better than FCFS. For the reaction time and slowdown metrics, on average, the SJF policy presented better performance, but in some specific cases FCFS was better. Finally, considering the response time metric, the SJF policy performed better for homogeneous workloads and FCFS for the heterogeneous ones [5], [6], [14].
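The advantage of SJF on response-time-style metrics can be illustrated with a deliberately tiny model. This is not the simulated cluster: it assumes a single queue on a single processor with all jobs arriving at time zero, which is just enough to show why running short jobs first lowers the mean response time.

```python
# Illustrative single-queue, single-processor sketch of the two queue
# policies compared above. SJF orders jobs by execution time, FCFS by
# arrival order; response time here is completion time since arrival.

def mean_response_time(exec_times, policy):
    order = sorted(exec_times) if policy == "sjf" else list(exec_times)
    finish, clock = [], 0
    for t in order:
        clock += t
        finish.append(clock)
    return sum(finish) / len(finish)

jobs = [10, 1, 2]                          # arrival order; arbitrary times
fcfs = mean_response_time(jobs, "fcfs")    # (10 + 11 + 13) / 3
sjf  = mean_response_time(jobs, "sjf")     # (1 + 3 + 13) / 3
```

The long job delays everything behind it under FCFS, which also hints at why FCFS can win on other metrics or workloads, as observed above: SJF buys its mean-response-time gain by postponing long jobs.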

Figure 2. (a) Mean utilization considering each frame; (b) RGSA speedup over other gang scheduling algorithms.

One of the most important results is shown in Fig. 2 (b): the performance of RGSA is, on average, up to 40% (upper bound) better than that of the other gang scheduling algorithms over all tested workloads. These results show that a parallel job scheduling algorithm with a fixed behavior cannot be the best in all situations, and that the use of reconfiguration can lead to a high speedup. Considering the slowdown and reaction time metrics, the RGSA performance gain is nearly 100% over 8 different algorithms. If we consider only the utilization metric, the speedup of RGSA over Alg05 increases from 18.83% to 42.32%; in this case, Alg05 (the best on average) would be worse than Alg03, which is considered the worst algorithm. In our specific case, the longest simulation took about 13000 seconds (3 hours and 36 minutes), so using RGSA we managed to reduce the simulation time by 40% (1 hour and 26 minutes). In real systems, however, a workload may execute for a week; in that case, a 40% reduction would shorten the workload execution time by about 3 days.
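The time figures quoted above can be checked directly (our own arithmetic, restating the numbers in the text):

```python
# Sanity check of the quoted times: the longest simulation took about
# 13000 s, and a 40% reduction saves roughly 1 h 26 min.
longest = 13_000                  # seconds, about 3 h 36 min (12960 s)
saved = 0.40 * longest            # 5200.0 seconds saved by RGSA
hours, rest = divmod(saved, 3600)
minutes = rest // 60              # 5200 s -> 1 h 26 min (and 40 s)
```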

4. Conclusions

In this paper, we summarized the main results and contributions obtained in the master's thesis [14]. The most important contributions were: i) the proposal and performance analysis of RGSA, which showed that the use of reconfiguration can provide a flexible behavior to a scheduling algorithm and lead the parallel computer to high performance; ii) showing that a fixed-behavior parallel job scheduling algorithm cannot be the best one for all situations.

More information¹, results, tools and documentation about this research are available at: http://www.ppgee.pucminas.br/lfwg/.

References

[1] Feitelson, D., Rudolph, L., Schwiegelshohn, U., Sevcik, K., Wong, P., "Theory and Practice in Parallel Job Scheduling", 3rd Workshop on Job Scheduling Strategies for Parallel Processing, pp. 1-34, 1997.
[2] DeHon, A., "The Density Advantage of Configurable Computing", IEEE Computer, Vol. 33, 2000.
[3] Martins, C. A. P. S., Ordonez, E. D. M., Corrêa, J. B. T., Carvalho, M. B., "Computação Reconfigurável: Conceitos, Tendências e Aplicações" (Reconfigurable Computing: Concepts, Trends and Applications), Jornada de Atualização em Informática, 2003.
[4] Wiseman, Y., Feitelson, D., "Paired Gang Scheduling", IEEE Transactions on Parallel and Distributed Systems, pp. 581-592, 2003.
[5] Góes, L. F. W., Martins, C. A. P. S., "Escalonamento Paralelo de Tarefas: Conceitos, Simulação e Análise de Desempenho" (Parallel Job Scheduling: Concepts, Simulation and Performance Analysis), WSCAD, pp. 234-254, 2004.
[6] Góes, L. F. W., Martins, C. A. P. S., "Reconfigurable Gang Scheduling Algorithm", 10th Workshop on Job Scheduling Strategies for Parallel Processing, LNCS, pp. 34-45, 2004.
[7] Pousa, C. V., Ramos, L. E. S., Góes, L. F. W., Martins, C. A. P. S., "Extending ClusterSim with Message-Passing and Distributed Shared Memory Modules", ISHPCSE, Kluwer Publishers, 2004.
[8] Góes, L. F. W., Ramos, L. E. S., Martins, C. A. P. S., "ClusterSim: A Java Parallel Discrete Event Simulation Tool for Cluster Computing", IEEE International Conference on Cluster Computing, 2004.
[9] Góes, L. F. W., Martins, C. A. P. S., "RJSSim: A Reconfigurable Job Scheduling Simulator for Parallel Processing Learning", 33rd ASEE/IEEE Frontiers in Education Conference, pp. F3C3-8, 2003.
[10] Pousa, C. V., Góes, L. F. W., Ramos, L. E. S., Penha, D. O., Martins, C. A. P. S., "A Comparative Performance Analysis of Parallel Algorithm Models", 3rd CSiTeA, Rio de Janeiro, 2003.
[11] Góes, L. F. W., Ramos, L. E. S., Martins, C. A. P. S., "Performance Analysis of Parallel Programs Using Prober as a Single Aid Tool", IEEE 14th SBAC-PAD, pp. 204-211, 2002.
[12] Góes, L. F. W., Ramos, L. E. S., Martins, C. A. P. S., "Parallel Image Filtering Using WPVM in a Windows Multicomputer", 2nd CSiTeA, Foz do Iguaçu, 2002.
[13] Ramos, L. E. S., Góes, L. F. W., Martins, C. A. P. S., "Teaching and Learning Parallel Processing Through Performance Analysis Using Prober", 32nd IEEE Frontiers in Education Conference, 2002.
[14] Góes, L. F. W., Martins, C. A. P. S., "Proposal and Development of a Reconfigurable Parallel Job Scheduling Algorithm", Master's Thesis, PUC Minas, May 2004.
[15] Góes, L. F. W., et al., "JSDESLib: A Library for the Development of Discrete-Event Simulation Tools of Parallel Systems", Workshop on Java for Parallel Computing, IPDPS, 2005 (to be published).
[16] Pousa, C. V., Góes, L. F. W., Martins, C. A. P. S., "Reconfigurable Object Consistency Model", Workshop on Advances in Parallel Computational Models, IPDPS, 2005 (to be published).

¹ This work started as an undergraduate research project, in which the author won the best paper award in the technology area at PUC Minas. The master's thesis resulted in three open source software tools (Prober [11], ClusterSim [5], [7], [8] and JSDESLib [15]), the publication of two book chapters [5], [7], papers in several national and international conferences [8], [9], [10], [11], [12], [13], [15] and a paper in JSSPP [6], the world's best conference in parallel job scheduling and the 19th-ranked conference in CiteSeer. Other works have applied the concepts of reconfigurable algorithms to different problems [16].
