International Journal of Algorithms, Computing and Mathematics, Volume 3, Number 4, November 2010. © Eashwar Publications

A Survey: Dynamic Task Scheduling in Multiprocessor and the Swift Embryonic World of Parallel Computing

Dr. D.I. George Amalarethinam
Director - M.C.A., Dept. of Computer Science, Jamal Mohamed College, Trichy, Tamilnadu.
[email protected]

G.J. Joyce Mary
Research Scholar, PRIST University, Thanjavur; Principal & HOD of Computer Science, Dr. Nallikuppusamy Arts College (for Women), Thanjavur - 3, Tamil Nadu.
[email protected]

Abstract

This paper surveys the impact of modern high-performance computing paradigms on dynamic task scheduling in multiprocessor systems. Our aim is to help the multiprocessor scheduling community become more comfortable with the evolving parallel paradigms, and to mark out areas of research for High-Performance Computing (HPC); this is the main motivation behind the survey. The most important issue in parallel computing systems is the development of effective techniques for assigning the tasks of a parallel program to multiple processors so as to minimize the program's execution time. Scheduling strategies are essential for assigning tasks to a multiprocessor. One such strategy is Dynamic Task Scheduling, or Dynamic Load Balancing, which attempts to maximize the utilization of the processors in the system at all times and poses complex challenges. Dynamic scheduling reallocates tasks to processors during execution. These challenges are discussed in the paper, and potential solutions are given.

1. Introduction

Parallel processing [1] is information processing that emphasizes the concurrent manipulation of data elements belonging to one or more processes solving a single problem. A parallel computer is a multiple-processor computer capable of parallel processing. Parallel processing is a combined field of study that requires broad knowledge of, and experience with, all aspects of algorithms, languages, hardware, software, performance evaluation and computing alternatives. It is effective only if all the processors are utilized at the same time; it is essential to utilize every processor without leaving any idle. To assign tasks evenly to all the processors, it is necessary to schedule dynamically. When the processing speed is high, the multiprocessor system can be used as an effective real-time system for dynamic scheduling.


Parallel algorithms can be categorized as either data parallel or control parallel.
i. Data Parallelism: the use of multiple functional units to apply the same operation to different elements of a data set.
ii. Control Parallelism: achieved through the simultaneous application of different operations to different data elements.

2. Classification of Scheduling

Multiprocessor operating systems are more complex because of the complexity of parallel hardware and, more importantly, because of the performance requirements of user programs. Performance requirements include resource management, i.e., the assignment of processes to processors. Resource management should not contribute significantly to the overhead of the system. There are two generic types of scheduling strategies: static and dynamic. When task scheduling [2] is dynamic, processor utilization is more advantageous. Dynamic scheduling may be based on the earliest finish time, the earliest deadline, shared memory with Uniform Memory Access (UMA), a central queue that holds the tasks and dispatches them to processors in FIFO order, or a subtask list ordered by computation time, among other schemes. This paper analyses a few such dynamic scheduling algorithms and suggests a new concept for implementing them in a multiprocessor system.

3. Challenges for the High Performance Computing (HPC) Community

Many technical challenges are involved in an efficient implementation of dynamic scheduling algorithms on modern parallel computing models. Algorithm developers face many challenges in implementing them over modern computing paradigms. A developer will be reluctant to adopt a very complex programming model that is difficult to design with and error-prone. The HPC community has a responsibility to provide an easy way to develop such algorithms.
The challenges faced by the HPC community are:
(a) Afford a way to deal with heterogeneous resources.
(b) Provide simulators.
(c) Grant an integrated way to program hybrid parallel environments.
(d) Provide easy and reliable methods to access remote archival and real-time data sources.
(e) Supply a certain Quality of Service (QoS) to the application developer despite all the uncertainties involved.
(f) Present advanced reservation of resources.
(g) Propose an easy-to-use environment.
(h) Provide security, fault tolerance, scalability, load balancing, message passing, data storage, communication protocols and architecture, accounting, economic models, debugging tools, etc.

4. Different Parallel Paradigms of Dynamic Scheduling

Dynamic scheduling has many different parallel paradigms for implementation on parallel machines. This section presents a survey of previous work in the area of dynamic scheduling for multiprocessor systems.
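Before the individual algorithms are surveyed, the simplest dynamic strategy mentioned in Section 2 — a shared central queue whose tasks are dispatched in FIFO order to whichever processor frees up first — can serve as a baseline. The sketch below is a minimal illustration under assumed task costs; the function name and task times are hypothetical, not from any surveyed paper.

```python
import heapq
from collections import deque

def fifo_dynamic_schedule(task_times, num_procs):
    """Dispatch tasks from a shared central queue in FIFO order:
    each task goes to whichever processor becomes free earliest."""
    queue = deque(enumerate(task_times))          # central FIFO task queue
    free = [(0.0, p) for p in range(num_procs)]   # (time processor is free, id)
    heapq.heapify(free)
    schedule = []                                  # (task, proc, start, finish)
    while queue:
        tid, cost = queue.popleft()                # FIFO: oldest waiting task
        at, p = heapq.heappop(free)                # earliest-free processor
        schedule.append((tid, p, at, at + cost))
        heapq.heappush(free, (at + cost, p))
    makespan = max(t for t, _ in free)
    return schedule, makespan

# Example: six tasks with hypothetical run times on 2 processors.
sched, span = fifo_dynamic_schedule([4, 2, 3, 1, 5, 2], 2)
```

Because assignment happens at run time as processors become free, the load stays balanced even when task costs are unknown in advance — the property that motivates all the dynamic schemes surveyed below.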


4.1 Dynamic Scheduling with a Heuristic Function

Most parallel algorithm implementations are based on a heuristic function. Ramamritham et al. propose the Myopic Algorithm [2], the best-known heuristic algorithm, used in the Spring System [2]. In the Spring System, tasks can arrive dynamically at any node. The local scheduler on a node tries to guarantee that a task will complete before its deadline. It does so by determining whether the new task, plus all previously guaranteed tasks on the node, can be scheduled to complete before their deadlines. The new task is guaranteed only if such a schedule exists; otherwise it is not guaranteed. In either case, previously guaranteed tasks remain guaranteed, and a non-guaranteed task can be sent to another node if appropriate. The Myopic Algorithm focuses on the local scheduler, specifically the component that dynamically determines whether a feasible schedule can be found for a set of tasks. A task can begin execution at its earliest start time T_est, the time when the resources required by the task are available. The authors:
- Evaluated the heuristic approach when tasks with deadlines and resource requirements are scheduled on a multiprocessor.
- Allowed multiple instances of a resource type.
- Evaluated two kinds of multiprocessor model: the shared memory model and the local memory model.
The Myopic Algorithm initially considered a real-time system with resource constraints. Its heuristic function is used to actively direct the search for a feasible schedule. The search starts at the root of the search tree, which is an empty schedule. The algorithm tries to extend the schedule by moving to one of the vertices at the next level of the search tree until a full feasible schedule is derived. At each level of the search, a heuristic function H is applied to at most k tasks (known as the feasibility check window) that remain to be feasibly scheduled.
The task with the minimum value of H is selected to extend the current partial schedule. If the current vertex is not strongly feasible, the algorithm backtracks to the previous search point and extends the schedule using the task with the next-smallest heuristic value. The heuristic function is an integrated heuristic that captures the deadline and resource requirements of task Tk, with W as an input parameter. Among the many candidate H functions in the Myopic Algorithm, the one with the best performance is used. The advantage of this H function is that it considers two factors: deadline and resources.

4.2 A Comprehensive Dynamic Processor Allocation Scheme

Iffat et al. proposed a comprehensive dynamic processor allocation scheme [3] for shared-memory multiprocessor systems with a multiprogrammed workload; it allows parallel application programs to adjust dynamically to both the program's varying behaviour and the system load. Based on the program's behaviour, the dynamic processor allocation system decides the number of processors a parallel code region can beneficially use. In this scheme, pipelined execution of concurrent threads in a superthreaded processor with the Java Speculative Multithreading (JavaSpMT) parallelization model is used. The parallel

International Journal of Algorithms, Computing and Mathematics

execution decision heuristic uses the sequential execution time (Ts) and the parallelization overhead (Toh) of the speedup on P processors. The parallel execution time on P processors is Tp = Toh + Ts / P. Parallelization is profitable only when Tp < Ts, i.e. (Toh + Ts / P) < Ts. The existing implementation of the dynamic processor allocation scheme allows dynamic parallelization only of well-structured loops (e.g. for loops), as discussed in the paper. The authors planned to extend the run-time system to include profiling and parallelization of do-while-type loops that have non-deterministic loop termination conditions. This extension will allow the run-time system to dynamically control the parallel execution of a wider variety of application programs.

4.3 Dynamic Task Scheduling Using Online Optimization

The Self-Adjusting Dynamic Scheduling (SADS) class of algorithms was proposed by Babak Hamidzadeh et al.; it uses a unified cost model [4] to explicitly account for issues such as processor load balance, memory locality, and scheduling overhead at runtime. A dedicated processor performs scheduling in phases by maintaining a tree of partial schedules and incrementally assigning tasks to the least-cost schedule. A scheduling phase terminates whenever any processor becomes idle, at which time partial schedules are distributed to the processors. The SADS family of online optimization techniques dynamically overlaps the scheduling and execution phases to mask the overhead of scheduling at runtime [5], [6]. SADS is an online optimization algorithm similar to the branch-and-bound algorithm [7]. An extension of the basic SADS algorithm, called Depth-Bound SADS (DBSADS), controls the scheduling overhead by giving higher priority to partial schedules with more task-to-processor assignments. These algorithms are compared to two distributed scheduling algorithms within a database application on an Intel Paragon distributed-memory multiprocessor system.
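The core SADS idea — a dedicated scheduler expanding a tree of partial schedules in least-cost-first, branch-and-bound fashion — can be sketched in a much-simplified form. This sketch ignores phase timing and memory locality, uses the makespan (load of the busiest processor) as the only cost, and all names are illustrative assumptions rather than the paper's actual formulation.

```python
import heapq
from itertools import count

def least_cost_schedule(task_times, num_procs):
    """Best-first search over partial schedules, SADS-style:
    repeatedly pop the least-cost partial schedule and branch by
    assigning the next task to each processor in turn. Because the
    cost (makespan) never decreases along a branch, the first
    complete schedule popped has the minimal makespan."""
    tie = count()  # unique tie-breaker so the heap never compares lists
    heap = [(0.0, next(tie), 0, (0.0,) * num_procs, [])]
    while heap:
        cost, _, k, loads, assign = heapq.heappop(heap)
        if k == len(task_times):       # complete schedule: optimal by heap order
            return assign, cost
        for p in range(num_procs):     # branch: place task k on processor p
            new_loads = list(loads)
            new_loads[p] += task_times[k]
            heapq.heappush(heap, (max(new_loads), next(tie), k + 1,
                                  tuple(new_loads), assign + [p]))
    return [], 0.0

# Four tasks on 2 processors; an optimal split pairs 3+2 on each.
assign, makespan = least_cost_schedule([3, 3, 2, 2], 2)
```

The exhaustive tree grows exponentially, which is exactly why SADS bounds each scheduling phase and why DBSADS biases the search toward deeper (more complete) partial schedules.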
SADS performs partial task scheduling in repeated phases, using a novel online tuning technique to determine the duration of each scheduling phase. The time allocated to a single scheduling phase is self-adjusted based on the loads of the working processors. During a scheduling phase, the algorithm continues to incrementally build a schedule of available tasks until at least one working processor has completed executing all of its previously assigned tasks [4]. The performance of the basic SADS and DBSADS algorithms is compared through execution-time measurements from implementation and experimentation. The Distributed Scheduling with Load balancing (DSL) algorithm distributes an equal number of tasks to each processor's local queue. When a processor becomes idle, it removes 1/P of the tasks from the most loaded processor and executes them. This receiver-initiated algorithm is effective in balancing the load among processors in a distributed fashion. The Distributed Scheduling with Affinity (DSA) algorithm [3] also initially distributes an equal number of tasks to each processor's local queue. However, with


this sender-initiated algorithm, a processor executes a task only if it has affinity with the task; if it does not, it sends the task to a processor that does have affinity with it. The authors compared the algorithms under different locality (affinity) patterns among tasks and processors to demonstrate the capability of SADS to adjust to these different patterns of locality. They also compared the algorithms under different average task execution times to test the effect of task grain size on the performance of the various algorithms. The comparison of the SADS algorithms with the distributed scheduling algorithms provides insight into the quality of schedules produced by dedicating a processor to scheduling. There is a one-to-one mapping between the set of queries and the set of relations in the system. The distribution of the relations creates a notion of affinity between a query and the local memory of the processor holding the relation that contains the answer to that query. Once a query is available, its affinity with a processor can be found in a table, or it can be calculated by matching the attributes used in the query predicates against the relation definitions in the database schema. The interconnection network of the Intel Paragon machine is a 2D mesh with a cut-through (wormhole) routing mechanism. Memory cost functions were derived from a communication model of such a network and routing mechanism. The cost function is

t_comm = t_s + l*t_h + m*t_w,

where t_comm is the total communication time for a message of size m words traversing l links, t_s is the startup time, t_h the per-hop time, and t_w the per-word transfer time. The parameters of this cost function were calculated through benchmarking: messages of different sizes were sent between different source-destination pairs of nodes to measure the startup, per-word, and per-hop transfer times.
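The wormhole-routing cost model described above translates directly into a small helper. The parameter values below are illustrative placeholders chosen for the example, not the benchmarked Paragon figures, which the paper obtained empirically.

```python
def comm_time(m_words, l_links, t_s, t_h, t_w):
    """Cut-through (wormhole) routing cost: t_comm = t_s + l*t_h + m*t_w,
    for a message of m_words words traversing l_links links."""
    return t_s + l_links * t_h + m_words * t_w

# Hypothetical parameters (times in microseconds):
# startup 50, per-hop 2, per-word 0.5; a 1024-word message over 4 links.
t = comm_time(m_words=1024, l_links=4, t_s=50.0, t_h=2.0, t_w=0.5)
```

Note that with cut-through routing the per-hop term grows with the link count but not with the message size, so for large messages the m*t_w term dominates — which is why the benchmarking varies both message size and source-destination distance to separate the three parameters.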
SADS performs well for a small number of processors, but it does not scale well when the mean task execution time is small; its scalability improves as the mean task execution time increases. DBSADS performs quite well compared to the other candidate algorithms and scales well as the number of processors increases. This work demonstrates that performing a sophisticated scheduling technique dynamically on a dedicated processor can substantially improve total execution times.

4.4 An Improved Dynamic Scheduling Algorithm

Zhu Xiangbin et al. [8] proposed an improved heuristic algorithm with a new heuristic function. The improved algorithm considers not only the deadlines and resource requirements of a task but also its processing time. The most important metric for real-time scheduling algorithms is the scheduling success ratio. The Myopic Algorithm [2] uses many heuristic functions, and the H function with the best performance is the one considered. The advantage of that H function is that it considers two factors: deadline and resource. But the H function H = (TD + W*ESPT(T))


does not consider the task processing time. When the algorithm must select one of k tasks, it may choose a task with a very short processing time when it should select another task whose processing time is longer; yet the task whose processing time is short may have less laxity. Zhu Xiangbin therefore proposed using LAXITY(T) to affect the heuristic value H. The approach is based on the Myopic Algorithm, but the new heuristic function used in this Improved Dynamic Scheduling Algorithm, which includes the task processing time in the H value, is

H(T) = TD + W1*IEST(T) + W2*LAXITY(T)

It gives better performance than the integrated heuristic proposed for the Myopic Algorithm, TD + W1*IEST(T).

4.5 A New Dynamic Scheduling Algorithm

YANG YuHai et al. [9] scrutinize several potential processor selection policies for non-preemptive scheduling of dynamically arriving real-time tasks in heterogeneous multiprocessor systems. They propose that the P_IEFT policy, which selects the processor that minimizes the earliest finish time of a task, is best. Most dynamic scheduling algorithms [2][10][11][12] are designed for homogeneous multiprocessor systems, while few [13][14] are proposed for heterogeneous multiprocessor systems. The Minimal Earliest Finish Time (MEFT) algorithm suggested by Yang YuHai and Shengsheng [9] uses the P_IEFT processor selection policy, whereby a task is assigned to the processor on which its finish time is least. Under the P_IEST policy, the schedule length is not guaranteed to be the least, since executing a task on the processor with the minimum earliest available time does not imply that its finish time will be the least. Under the P_IEFT policy, the schedule length will be less, since both the earliest available time and the differing execution times of a task on the processors are taken into account.
Under the P_minSpe policy, the schedule length is not necessarily the least, because the processor with the fastest computation speed will receive a lot of tasks, so the finish time of a task on it might be larger than on another processor. Under the P_maxSpe policy, the schedule length is clearly not the least. Using the P_lowest policy, the workload of the processors will be balanced, but this is not related to the real-time characteristics of the tasks, so the schedule length is not guaranteed to be the least. From the above analysis, the P_IEFT policy is believed to outperform the others; accordingly, the processor that minimizes the earliest finish time of a task is selected. Based on this analysis, a new algorithm, Minimal Earliest Finish Time (MEFT), using the P_IEFT policy, was proposed by YANG YuHai.
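The P_IEFT selection rule can be sketched as follows, assuming each arriving task carries a vector of execution times, one per heterogeneous processor. The function names and the task set are hypothetical illustrations of the policy, not the MEFT algorithm as published.

```python
def select_processor_ieft(avail, exec_time):
    """P_IEFT policy: choose the processor whose earliest finish time
    (its available time plus the task's execution time on it) is minimal."""
    finish = [a + e for a, e in zip(avail, exec_time)]
    p = min(range(len(avail)), key=finish.__getitem__)
    return p, finish[p]

def meft_sketch(tasks, num_procs):
    """Assign each dynamically arriving task with P_IEFT.
    `tasks` is a list of per-processor execution-time vectors."""
    avail = [0.0] * num_procs          # earliest available time per processor
    plan = []                           # (chosen processor, finish time)
    for exec_time in tasks:
        p, f = select_processor_ieft(avail, exec_time)
        plan.append((p, f))
        avail[p] = f
    return plan

# Two heterogeneous processors; each row is a task's run time on P0, P1.
plan = meft_sketch([[4, 2], [3, 6], [2, 2]], 2)
```

In the trace above, the first task runs faster on P1, the second on P0, and the third goes to P1 even though both run it equally fast, because P1 frees up earlier — the policy weighs availability and heterogeneous speed together, exactly the property argued for in the analysis above.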



4.6 Modified Dynamic Scheduling Algorithm

The improved scheduling algorithm [8] proposed by Zhu Xiangbin and Tu Shiliang is based on a heuristic function; it considers not only the deadlines and resource requirements of a task but also its processing time. George Amalarethinam et al. proposed the Modified Dynamic Scheduling algorithm [15], which has a new heuristic function. The proposed algorithm considers the computation time and shows that it also affects the heuristic function. The heuristic function H has been changed in the modified algorithm, which performs better than the improved algorithm. This is because the new H function is affected by the computation time and by the weight of the LAXITY(T) function. The performance can also be improved by scheduling a large number of tasks.

5. Conclusion

This paper gives a brief survey of dynamic scheduling algorithms for multiprocessor systems. We discuss the impact of modern parallel paradigms on dynamic scheduling algorithms and survey a number of different dynamic scheduling algorithms for multiprocessor systems. In the world of parallel computing, scheduling tasks dynamically must consider the utilization of all processors, leaving none idle, along with processing time, deadline, computation time, the scheduler, scheduler queue overflow, task waiting time, and so on. We believe that the dynamic scheduling algorithms surveyed are capable of dealing with the challenges posed to algorithm development by modern parallel computing paradigms. However, the challenges faced by the HPC community are enormous and will need much research in the coming years. The application of dynamic scheduling algorithms is a quickly growing research area, yet many problems remain unsettled. Practical applications are also important for researchers studying dynamic scheduling algorithms.
Since adaptation to dynamic environments is a difficult problem, algorithms and implementations should be studied taking the features of practical problems into consideration.

References:
1. V. Rajaraman, C. Siva Ram Moorthy, Parallel Computers - Architecture and Programming, 2006.
2. K. Ramamritham, J.A. Stankovic, "Efficient scheduling algorithms for real-time multiprocessor systems", IEEE Transactions on Parallel and Distributed Systems, 1990.
3. Iffat H. Kazi, David J. Lilja, "A Comprehensive Dynamic Processor Allocation Scheme for Multiprogrammed Multiprocessor Systems", IEEE, 2000.
4. Babak Hamidzadeh, Lau Ying Kit, David J. Lilja, "Dynamic Task Scheduling Using Online Optimization", IEEE, November 2000.
5. B. Hamidzadeh, D.J. Lilja, "Self-Adjusting Scheduling: An On-Line Optimization Technique for Locality Management and Load Balancing", Proc. Int'l Conf. Parallel Processing, 1994.
6. B. Hamidzadeh, D.J. Lilja, "Centralized Scheduling Strategies for Shared-Memory Multiprocessors", Proc. IEEE Int'l Conf. Distributed Computing Systems, 1996.
7. N.J. Nilsson, Principles of Artificial Intelligence. Palo Alto, Calif.: Tioga, 1980.
8. Zhu Xiangbin, Tu Shiliang, "An Improved Dynamic Scheduling Algorithm for Multiprocessor Real-Time Systems", IEEE, 2003.
9. "A New Dynamic Scheduling Algorithm for Real-Time Heterogeneous Multiprocessor Systems",



YANG YuHai, YU Shengsheng, BIN XueLian, IEEE, 2007.
10. Qiao Ying, Wang Hong-an, "Developing a new dynamic scheduling algorithm for real-time multiprocessor systems", Journal of Software (in Chinese), 2002.
11. G. Manimaran, C.S.R. Murthy, "An efficient dynamic scheduling algorithm for multiprocessor real-time systems", IEEE Transactions on Parallel and Distributed Systems, 1998.
12. A. Mittal, G. Manimaran, C.S.R. Murthy, "Integrated dynamic scheduling of hard and QoS degradable real-time tasks in multiprocessor systems", Proceedings of the 5th International Conference on Real-Time Computing Systems and Applications, Los Alamitos, CA: IEEE Press, 1998.
13. Wang Kun, Qiao Ying, Dai Guo-Zhong, "Study of a dynamic scheduling algorithm for real-time heterogeneous systems", Journal of Computer Research and Development (in Chinese), 2002.
14. Qiao Ying, Zou Bing, et al., "Design and evaluation of an algorithm for integrated dynamic scheduling in real-time heterogeneous systems", Journal of Software (in Chinese), 2003.
15. D.I. George Amalarethinam, G.J. Joyce Mary, "Modified Dynamic Scheduling Algorithm for Multiprocessor System", International Journal of Engineering and Technology, Vol. 2, No. 4, December 2009.
