To appear in the Proc. of the 1997 International Conference on Parallel Processing, August 1997

Hindsight Helps: Deterministic Task Scheduling with Backtracking

Yueh-O Wang, Nancy M. Amato, D. K. Friesen
Department of Computer Science
Texas A&M University
College Station, Texas 77843-3112

fyuehow,amato,[email protected]

Abstract

This paper considers the problem of scheduling a set of precedence-related tasks on a nonpreemptive homogeneous message-passing multiprocessor system in order to minimize the makespan, that is, the completion time of the last task relative to the start time of the first task. We propose a family of scheduling algorithms, called IPR for immediate predecessor rescheduling, which utilize one level of backtracking. We also develop a unifying framework to facilitate the comparison between our results and the various models and algorithms that have been previously studied. We show, both theoretically and experimentally, that the IPR algorithms outperform previous algorithms in terms of both time complexity and the makespans of the resulting schedules. Moreover, our simulation results indicate that the relative advantage of the IPR algorithms increases as the communication constraint is relaxed.

1 Introduction

This paper considers the problem of scheduling a set of precedence-related tasks on a nonpreemptive homogeneous message-passing multiprocessor system in order to minimize the makespan, that is, the completion time of the last task relative to the start time of the first task. This problem is NP-complete, and few polynomial time scheduling algorithms are known even if strong restrictions are placed on the problem. For example, Rayward-Smith [12] showed that the problem of optimally scheduling a set of tasks whose precedence relation forms a directed acyclic graph (DAG) on $m > 1$ processors is NP-complete even when all tasks have unit time execution cost and unit time communication delays. Moreover, if we assume that communication delays are zero, then the scheduling problem is NP-complete even if the execution cost of each task is either one or two time units. If we further assume that all tasks have identical execution costs, then polynomial time algorithms are known if either there are only two processors in the system, or the precedence relation of the tasks forms a tree [9]. Nevertheless, the complexity of the scheduling problem varies according to the constraints we place on the following factors: (i) the relative magnitudes of the execution costs and the communication delays (costs), (ii) the structure of the precedence graph, and (iii) the number of processors in the system. Due to the difficulty of efficiently obtaining optimal schedules, recent research has emphasized heuristic approaches which produce near-optimal solutions in polynomial time. Another strategy is to restrict the problem so that an optimal makespan for the restricted problem can be found in polynomial time. Both restriction and heuristic approaches are studied in this research.

1.1 Preliminaries

In this work we consider a system with an arbitrary number of nonpreemptive homogeneous processors that communicate via message passing. The precedence relationships between tasks are known beforehand, as are the execution costs and communication delays. If two tasks are scheduled on different processors, the communication delay between them is independent of the processors on which they are scheduled. The system is assumed to be collision-free, so that no messages are lost and all messages are sent in a finite amount of time. The system is contention-free, that is, the channel processors are independent of the task processors, and all processors may be executing tasks at the same time that communication is taking place.

In the task scheduling problem we are given a set $T = \{1, 2, \ldots, n\}$ of $n$ tasks, each with a processing time $u_i$. The tasks in $T$ can be arranged in a directed acyclic graph, called a precedence graph (DAGPG), in which each edge represents a temporal relationship between two tasks. Task $j$ is an immediate predecessor of task $i$ if the edge $(j, i)$ is present in the DAGPG; such an edge implies that the execution of task $i$ cannot be initiated until after task $j$ has completed both its execution and its communication to task $i$. Each edge $(j, i)$ has a weight $c_{ji}$ representing the communication delay from task $j$ to task $i$. The graph is called an in-forest precedence graph (IFPG) if each node has at most one outgoing edge; a connected IFPG is called an in-tree precedence graph (ITPG).
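As an illustration of the task model of Section 1.1, the following minimal Python sketch stores the execution costs $u_i$ and the edge delays $c_{ji}$, tests the in-forest property, and computes the makespan of a given schedule. The names `PrecedenceGraph` and `schedule_makespan` are hypothetical, and the sketch assumes that no communication delay is incurred when two tasks run on the same processor.

```python
from dataclasses import dataclass, field


@dataclass
class PrecedenceGraph:
    """Tasks with execution costs u[i] and edge delays c[(j, i)]."""
    u: dict                                  # task -> execution cost u_i
    c: dict = field(default_factory=dict)    # (j, i) -> communication delay c_ji

    def is_in_forest(self):
        # IFPG: every node has at most one outgoing edge.
        out_degree = {t: 0 for t in self.u}
        for (j, _i) in self.c:
            out_degree[j] += 1
        return all(d <= 1 for d in out_degree.values())


def schedule_makespan(g, start, proc):
    """Validate a schedule (start[i], proc[i]) and return its makespan.

    Assumption: the delay c_ji applies only when tasks j and i are
    placed on different processors.
    """
    for (j, i), delay in g.c.items():
        finish_j = start[j] + g.u[j]
        arrival = finish_j if proc[i] == proc[j] else finish_j + delay
        if start[i] < arrival:
            raise ValueError(f"task {i} starts before data from task {j} arrives")
    # Nonpreemptive processors: tasks on one processor must not overlap.
    by_proc = {}
    for t in g.u:
        by_proc.setdefault(proc[t], []).append((start[t], start[t] + g.u[t]))
    for intervals in by_proc.values():
        intervals.sort()
        for (_s1, f1), (s2, _f2) in zip(intervals, intervals[1:]):
            if s2 < f1:
                raise ValueError("overlapping tasks on one processor")
    # Completion of the last task relative to the start of the first task.
    return max(start[t] + g.u[t] for t in g.u) - min(start.values())


# Two leaves (tasks 1 and 2) feeding task 3; task 3 shares a processor with task 1.
g = PrecedenceGraph(u={1: 2, 2: 3, 3: 1}, c={(1, 3): 4, (2, 3): 1})
makespan = schedule_makespan(g, start={1: 0, 2: 0, 3: 4}, proc={1: 0, 2: 1, 3: 0})
```

In this small example, placing task 3 with task 1 hides the larger delay $c_{13} = 4$, so task 3 only waits for the message from task 2.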

1.2 A unifying framework for task scheduling

It is sometimes difficult to make meaningful comparisons between task scheduling algorithms since they are often described and analyzed with respect to different (restricted) versions of the problem. Versions of the task scheduling problem, henceforth referred to as models, may vary according to the constraints placed on the structure of the precedence graph, the number of processors available, and the (relative) magnitudes of the computation cost and the communication delay. Accordingly, we suggest representing a model by three parameters $M(G, P, C)$, where $G$ denotes the structure of the precedence graph, $P$ denotes the number of processors available, and $C$ denotes the constraints placed on the communication delay. The models considered by most previous work, and in this paper, can be categorized within this framework by selecting appropriate values for the parameters. (See Table 1.)
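One way to make the three-parameter representation concrete is to encode each parameter by its index in Table 1 and to compare models parameter by parameter. The sketch below is only illustrative: the names `Model` and `compare` are hypothetical, a larger index is taken to mean a more restricted parameter (the convention used for the ordering in this section), and the ordering on models is assumed to be the componentwise one.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    g: int  # 0 = DAGPG, 1 = ITPG
    p: int  # 0 = insufficient processors, 1 = sufficient processors
    c: int  # 0..4, index of the communication constraint C0..C4


def compare(a: Model, b: Model) -> Optional[int]:
    """Return -1 if a < b (a is less restricted), 1 if a > b,
    0 if the models are equal, and None if they are incomparable."""
    le = all(x <= y for x, y in zip(a, b))
    ge = all(x >= y for x, y in zip(a, b))
    if le and ge:
        return 0
    if le:
        return -1
    if ge:
        return 1
    return None  # neither model is at least as restricted as the other


most_general = Model(g=0, p=0, c=0)      # M(G0, P0, C0)
in_tree_c1 = Model(g=1, p=1, c=1)        # M(G1, P1, C1)
in_tree_c2 = Model(g=1, p=1, c=2)        # M(G1, P1, C2)
```

Under these assumptions, `compare(Model(0, 1, 1), Model(1, 0, 1))` returns `None`: one model is more restricted in its graph structure and the other in its processor count, which is why the framework yields a partial order rather than a total one.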

 Supported in part by the NSF under CAREER Grant CCR-9624315 and Grant CDA-9529442 and by NATO under Grant CRG 961243. Copyright 1997 IEEE. Published in the Proceedings of ICPP’97, August 11-15, 1997 in Bloomingdale, Illinois. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.


Table 1: Representative Parameters for $M(G, P, C)$

G: Structure of Precedence Graph
  G0   DAGPG (directed acyclic graph)
  G1   ITPG (in-tree)
P: Number of Processors Available
  P0   insufficient (limited or bounded)
  P1   sufficient (as many as needed or unbounded)
C: Communication Constraints
  C0   arbitrary (not constrained)
  C1   comm. delay less than exec. cost of sender ($c_{ij} \le u_i$)
  C2   coarse-grain ($\max_j \{c_{ji}\} \le \min_j \{u_j\}$, $\max_j \{c_{ij}\} \le \min_j \{u_j\}$)
  C3   comm. delay less than exec. cost of smallest task
  C4   unit execution cost and unit communication delay

The most general, NP-complete version of the problem is represented by model $M(G0, P0, C0)$. If $C$ is more restricted than $C_{d1}$ and less restricted than $C_{d2}$, we say $C_{d1} < C < C_{d2}$; similarly for $G$ and $P$. For example, $C0 < C1 < C2 < C3 < C4$. The natural extension of this ordering to models enables us to draw meaningful comparisons between them. For example, $M(G0, P0, C0) < M(G1, P1, C1) < M(G1, P1, C2)$. Note, however, that comparisons between parameters are not necessarily meaningful, that is, this framework defines a partial order for the various models that have been studied.

1.3 Previous work

The problem of scheduling a set of $n$ tasks with precedence relationships on a nonpreemptive multiprocessor system with $m > 1$ identical processors has been studied by many researchers. Most previous work can be classified as either list scheduling algorithms [12, 5, 6, 17, 1, 10, 2] or cluster scheduling algorithms [3, 4, 7, 8, 9, 11, 13, 15, 16]. Due to space constraints, we do not describe the various approaches here; details can be found in [14]. Using the $M(G, P, C)$ framework, Table 2 gives a summary of the previous work most relevant to our research, where $OPT(I)$ denotes the makespan of the optimal schedule.

1.4 Our results

In this work, we study possible optimal algorithms and heuristic solutions for models which have $G \in \{G0, G1\}$, $P \in \{P0, P1\}$, and $C0 \le C \le C2$. Most previous work has been on more restricted models with $C \ge C2$, that is, models with more restrictive communication requirements. We propose a family of algorithms called IPR, for immediate predecessor rescheduling. We analyze the running time and prove worst-case bounds on the makespans of the resulting schedules for the models $M(G1, P1, C)$, $M(G1, P0, C)$, and $M(G0, P1, C)$, where $C1 \le C \le C2$. For these models, we show that IPR finds schedules with smaller makespans than previous algorithms while maintaining the same or only slightly larger running times.$^1$ We also present simulation results for the models $M(G1, P1, C0)$ and $M(G1, P1, C1)$, and compare the resulting makespans with those obtained by some previously proposed scheduling algorithms. In our experiments, IPR consistently obtained smaller makespans than all of the previous algorithms with which we compared it. A summary of our results is contained in Table 3, where $OPT(I)$ denotes the makespan of the optimal schedule. The diagram shows the relationships between the models; an arrow from a to b indicates a is more restricted than b.

$^1$ Details omitted here due to space constraints can be found in [14].

2 The IPR algorithms

In addition to the makespan of the schedule produced, the time required to compute the schedule is also a critical concern. Most of the algorithms mentioned above do not apply any backtracking or look-ahead scheme since that would complicate the algorithm and increase the time complexity. In this paper we propose a set of scheduling algorithms which utilize one level of backtracking. These algorithms obtain improved schedules without significantly increasing the scheduling time (see Table 3). We call these algorithms IPR, for immediate predecessor rescheduling.

A study of previous scheduling algorithms shows that optimal makespans cannot be obtained when the communication constraint is less restrictive unless multiple predecessors of a task are allowed to be scheduled on the same processor. A naive approach which considers scheduling all possible combinations of predecessors on a single processor would have unacceptable time complexity. IPR balances the conflicting requirements of minimizing both the makespan and the scheduling time by considering only the immediate predecessors of a task. It has a running time of $O(n \log n)$. The description of IPR given below is general and applies to any model $M(G1, P1, C)$. However, how close the makespan obtained is guaranteed to be to optimal depends on the communication constraint $C$.

algorithm: IPR

1. schedule each leaf of the IFPG on a different processor
2. while (there exists an unscheduled task)
   (a) $i \leftarrow$ an unscheduled task with scheduled predecessors
   (b) $\Pi \leftarrow \{t_1, t_2, \ldots, t_{e_i}\}$, task $i$'s immediate predecessors, sorted in nonincreasing order of $f_{t_j} + c_{t_j i}$
   (c) find $U \subseteq \Pi$ and processor $p_U$ on which to schedule task $i$ and $U$ that minimizes the start time of task $i$

The only non-trivial step is 2(c). Using the facts stated below, we can devise a scheme for it that runs in $O(e_i + \log e_i)$ time. Let the set $\Pi$ be defined as in Step 2(b).

Fact 1: If task $i$'s start time cannot be reduced by scheduling some $t_r \in \Pi$ on the same processor as task $i$, then task $i$'s start time cannot be reduced by scheduling any $t_j$, $j > r$, on the same processor as task $i$.

Fact 2: For those predecessors which are to be scheduled on the same processor as task $i$, the best order to schedule them is according to their start times.

These facts can be exploited to compute the subset $U \subseteq \Pi$ and the processor $p_U$ as follows. By Fact 1, we can consider the tasks in $\Pi$ in order and the rescheduling process can be terminated as soon as it is determined that the start time of task $i$ cannot be improved. Moreover, the best solution will be obtained when $U$ consists of a consecutive set of the immediate predecessors $\{t_1, t_2, \ldots, t_k\}$, for some $k \le e_i$, and, by Fact 2, these tasks should be scheduled in order of their start times. Finally, whenever a new predecessor $t_j$ is considered for inclusion in the set $U$, it is simple to verify that the only processors that need to be considered for scheduling $U \cup \{t_j, \text{task } i\}$ are the processor on which $t_j$ is currently scheduled and the processor $p_U$ selected for $U \cup \{\text{task } i\}$ (which is initially the processor on which $t_1$ was scheduled). The correctness of this approach follows directly from Facts 1 and 2.
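The choice of the co-located set in Step 2(c) can be illustrated with a deliberately simplified sketch. It is not the $O(e_i + \log e_i)$ scheme of [14]: it tries every prefix of the sorted predecessor list instead of stopping early as Fact 1 allows, and it assumes that a predecessor moved to the shared processor can start no earlier than its current start time and that the shared processor is otherwise idle. The function name `best_prefix` and its arguments are hypothetical.

```python
def best_prefix(preds, u, start, comm):
    """Pick how many of task i's immediate predecessors to co-locate with it.

    preds    : list of predecessor ids
    u[j]     : execution cost of predecessor j
    start[j] : current start time of j (its finish time is start[j] + u[j])
    comm[j]  : communication delay from j to task i
    Returns (k, start_i, U): the first k predecessors in sorted order form U.
    """
    finish = {j: start[j] + u[j] for j in preds}
    # Step 2(b): nonincreasing order of finish time plus communication delay.
    order = sorted(preds, key=lambda j: finish[j] + comm[j], reverse=True)

    # k = 0: every predecessor stays remote and task i waits for all messages.
    best_k = 0
    best_start = max((finish[j] + comm[j] for j in order), default=0)

    for k in range(1, len(order) + 1):
        # Fact 2: run the co-located predecessors in order of their start times.
        t = 0
        for j in sorted(order[:k], key=lambda j: start[j]):
            t = max(t, start[j]) + u[j]
        # Remaining predecessors still pay their communication delay.
        remote = max((finish[j] + comm[j] for j in order[k:]), default=0)
        candidate = max(t, remote)
        if candidate < best_start:
            best_k, best_start = k, candidate
    return best_k, best_start, order[:best_k]


# Three predecessors, all currently starting at time 0 on their own processors.
result = best_prefix(
    preds=[1, 2, 3],
    u={1: 2, 2: 3, 3: 1},
    start={1: 0, 2: 0, 3: 0},
    comm={1: 6, 2: 4, 3: 1},
)
```

In this example, leaving all predecessors remote gives a start time of 8, co-locating the first one gives 7, the first two give 5, and all three give 6, so the sketch selects the two-task prefix. Recomputing each prefix from scratch costs more than the incremental scheme described in the text; the sketch trades efficiency for readability.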


Table 2: Summary of Most Relevant Previous Work

Algorithm    Model              Time                   Makespan
ETF [6]      $M(G0, P0, C0)$    $O(n^2 m)$             $(2 - \frac{1}{m}) OPT(I) + C_l$
JLP [1]      $M(G1, P1, C3)$    $O(n)$                 $OPT(I)$
JLP/D [1]    $M(G0, P1, C3)$    $O(n^2)$               $OPT(I)$
T [10]       $M(G1, P1, C2)$    $O(n)$                 $OPT(I)$
TDUP [10]    $M(G0, P1, C2)$    $O(n^2)$               $OPT(I)$
DSC [16]     $M(G1, P1, C2)$    $O((e + n) \log n)$    $OPT(I)$

Table 3: Summary of the IPR Algorithms

Algorithm    Model              Time                   Makespan
IPR          $M(G1, P1, C2)$    $O(n \log n)$          $OPT(I)$
IPR          $M(G1, P1, C1)$    $O(n \log n)$          $\frac{6}{5} OPT(I)$
IPR/S        $M(G1, P1, C2)$    $O(n)$                 $OPT(I)$
IPR/S        $M(G1, P1, C1)$    $O(n)$                 $\frac{6}{5} OPT(I)$
IPR/D        $M(G0, P1, C2)$    $O(n^2 \log n)$        $OPT(I)$
IPR/D        $M(G0, P1, C1)$    $O(n^2 \log n)$        $\frac{6}{5} OPT(I)$
IPR/LP       $M(G1, P0, C2)$    $O(n^2)$               $\frac{3}{2} OPT(I)$
IPR/LP       $M(G1, P0, C1)$    $O(n^2)$               $\frac{9}{5} OPT(I)$

Relationships Between Models (diagram; each model is listed with the algorithms analyzed for it)

Model              Algorithms
$M(G0, P0, C0)$    ETF (least restricted)
$M(G0, P1, C1)$    IPR/D
$M(G0, P1, C2)$    IPR/D, TDUP
$M(G0, P1, C3)$    JLP/D
$M(G1, P0, C1)$    IPR/LP
$M(G1, P0, C2)$    IPR/LP
$M(G1, P1, C1)$    IPR (IPR/S)
$M(G1, P1, C2)$    IPR (IPR/S), T, DSC
$M(G1, P1, C3)$    JLP (most restricted)

Table 4: Simulation Results

                   Model $M(G1, P1, C0)$       Model $M(G1, P1, C1)$
Algorithms         better   worse   same       better   worse   same
IPR vs JLP         396      0       104        51       0       449
IPR vs DSC         396      0       104        51       0       449
IPR vs MCCP        447      6       47         344      1       155
IPR vs DCP         440      3       57         359      0       141
JLP vs DSC         0        0       500        0        0       500
JLP vs MCCP        403      43      54         336      3       161
JLP vs DCP         339      86      75         339      3       158
DSC vs MCCP        403      43      54         336      3       161
DSC vs DCP         339      86      75         339      3       158
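The test cases behind Table 4 are generated as described in Section 3: a random number of tasks in $[1, 500]$, execution costs and communication delays in $[1, 100]$, and, for constraint C1, each delay reduced to its remainder modulo the sender's execution cost. A minimal sketch of such a generator follows. How the in-tree shape itself was drawn is not stated in the paper, so attaching each task to a uniformly chosen earlier task is purely an assumption here, as are the names `random_in_tree` and `group_of`.

```python
import random


def random_in_tree(rng, max_tasks=500, max_cost=100, enforce_c1=False):
    """Return (u, c) for a random in-tree with root task 1.

    u[i]      : execution cost of task i, drawn from [1, max_cost]
    c[(j, i)] : communication delay on the single outgoing edge of task j
    """
    n = rng.randint(1, max_tasks)
    u = {i: rng.randint(1, max_cost) for i in range(1, n + 1)}
    c = {}
    for j in range(2, n + 1):
        i = rng.randint(1, j - 1)        # assumed shape: successor is an earlier task
        delay = rng.randint(1, max_cost)
        if enforce_c1:
            delay %= u[j]                # remainder modulo the sender's execution cost
        c[(j, i)] = delay
    return u, c


def group_of(n, group_size=50):
    """Group index used in the figures: 1-50 tasks -> 0, 51-100 -> 1, and so on."""
    return (n - 1) // group_size


# 500 test cases for the model with constraint C1, one seed per case.
cases = [random_in_tree(random.Random(seed), enforce_c1=True) for seed in range(500)]
```

Seeding each case separately keeps individual test cases reproducible, which makes it easier to rerun a single case on which two algorithms disagree.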

Using an appropriate data structure, Step 2(c) can be implemented in $O(e_i + \log e_i)$ time. Details can be found in [14]. The running time of IPR is dictated by the time spent in Steps 2(b) and 2(c) of the while loop. Recall that $e_i$ is the number of immediate predecessors of task $i$, the task being considered in the current iteration. The sorting required in Step 2(b) takes $O(e_i \log e_i)$ time for each iteration, or $O(n \log n)$ time overall since $\sum_i e_i = O(n)$. As mentioned above, Step 2(c) takes $O(e_i + \log e_i)$ time per iteration, or $O(n)$ time overall. Thus, the running time of IPR is dominated by the sorting in Step 2(b) and is $O(n \log n)$. It can be shown that for $M(G1, P1, C)$, IPR constructs schedules with optimal makespans when $C \ge C2$, and when $C1 \le C < C2$, it constructs schedules with makespans at most $\frac{6}{5} OPT(I)$, that is, at most $\frac{6}{5}$ the length of the optimal makespan [14].

2.1 Other versions of the IPR algorithm

We have designed and analyzed several variations of the basic IPR algorithm discussed above (see Table 3). One version, called IPR/S for simple IPR, is a modification of IPR that avoids the sorting in Step 2(b) and reduces the running time to $O(n)$. It has essentially the same worst-case makespan bounds as IPR; the only exception is when the IFPG has height less than three. If task duplication is allowed, a modification of IPR, called IPR-DAG/D (listed as IPR/D in Table 3), can schedule DAGPGs; the basic idea is that a task is duplicated if more than one of its successors would be scheduled on the same processor as that task. The running time of IPR-DAG/D is $O(n^2 \log n)$ and it provides the same performance guarantees as IPR. If the number of processors is limited ($P0$), then IPR can be used as a subroutine in an algorithm we call IPR/LP which runs in time $O(n^2)$ and achieves a worst-case performance bound of $\frac{9}{5} OPT(I)$ if $C1 \le C \le C2$.

3 Experimental comparison

Viewed in our model framework, it is clear from the theoretical results presented in Tables 2 and 3 that the IPR algorithms are superior to previous methods for models with communication constraints in the range $C1 \le C < C2$. We now examine IPR's behavior using simulation and compare its performance to some previous algorithms. None of the other algorithms have been theoretically analyzed for models with $C < C2$.

The simulation was performed for two models, $M(G1, P1, C0)$ and $M(G1, P1, C1)$. The algorithms we selected to compare with IPR were JLP [1], DSC [16], DCP [8], and MCCP [16]. We chose DSC and JLP as representatives of the cluster and list scheduling algorithms, respectively. DSC was selected since it is the only cluster scheduling algorithm that has been both theoretically and experimentally analyzed. JLP was selected since it can easily be compared with DSC and because the other list scheduling algorithms achieve either the same, or only slightly improved, makespans. In addition, we chose the cluster scheduling algorithm MCCP since it can optimally schedule worst cases for JLP, DSC, and IPR.

All algorithms were coded in C++ and run on a Unix system. We generated 500 test cases for each model. The number of tasks in each test case was randomly generated in the range $[1, 500]$, and the execution costs and communication delays were randomly generated in the range $[1, 100]$. The communication constraint C1 was enforced by setting the communication delay to be the remainder after dividing by the sender's execution cost. The results for each model are given in Table 4, and graphically in Figures 1 and 2 for models $M(G1, P1, C0)$ and $M(G1, P1, C1)$, respectively. In the graphs, the test cases are partitioned into 10 groups according to the number of tasks: group 0 for test cases with 1-50 tasks, group 1 for test cases with 51-100 tasks, etc. The average makespan for each group is shown in the graph.

Our simulations show that IPR achieves the smallest makespans among the algorithms studied. The next best algorithms are JLP and DSC, followed
by DCP and then MCCP. Thus, our simulation results reinforce the theoretical analysis and, moreover, indicate that the relative advantage of IPR increases when the communication constraint is unrestricted ($C0$).

Figure 1: Scheduling algorithm comparison for $M(G1, P1, C0)$. (Average makespan versus group for IPR, JLP, DSC, MCCP, and DCP.)

Figure 2: Scheduling algorithm comparison for $M(G1, P1, C1)$. (Average makespan versus group for IPR, JLP, DSC, MCCP, and DCP.)

4 Conclusion

This paper introduces the IPR family of scheduling algorithms which utilize one level of backtracking. To facilitate the comparison between the various models and algorithms that have been studied we proposed the $M(G, P, C)$ model framework. We have seen, both theoretically and experimentally, that the IPR algorithms outperform previous algorithms in terms of both time complexity and the makespan of the resulting schedules. Our results indicate that the relative advantage of the IPR algorithms increases as the communication constraint is relaxed; most previous algorithms have been proposed and analyzed for models with more restricted communication constraints.

The improved makespans obtained by the IPR algorithms result from the use of backtracking. Indeed, without backtracking, schedules with acceptable makespans can only be guaranteed in a more restricted model, such as those with much smaller communication delays. However, backtracking potentially increases the scheduling time significantly, which is not desirable since it would waste resources making the decision. The IPR algorithms balance these conflicting requirements by using only one level of backtracking.

References

[1] F. Anger, J. Hwang, and Y. Chow. Scheduling with sufficient loosely coupled processors. J. of Parallel and Distributed Computing, 9:87–92, 1990.
[2] S. Darbha and D. P. Agrawal. SDBS: A task duplication based optimal scheduling algorithm. In Proc. of Scalable High Performance Computing Conference, pages 756–763, May 1994.
[3] H. El-Rewini and T. G. Lewis. Scheduling parallel program tasks onto arbitrary target machines. J. of Parallel and Distributed Computing, 9(2):138–153, June 1990.
[4] A. Gerasoulis and T. Yang. A comparison of clustering heuristics for scheduling directed acyclic graphs on multiprocessors. J. of Parallel and Distributed Computing, 16:276–291, 1992.
[5] R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM J. Appl. Math., 17:416–429, 1969.
[6] J. Hwang, Y. Chow, F. Angers, and C. Lee. Scheduling precedence graphs in systems with interprocessor communication times. SIAM J. Comput., 18(2):244–257, April 1989.
[7] A. A. Khan, C. L. McCreary, and M. S. Jones. A comparison of multiprocessor scheduling heuristics. In Proc. of the 1994 International Conference on Parallel Processing, volume 2, pages 243–250, August 1994.
[8] Y.-K. Kwok and I. Ahmad. A static scheduling algorithm using dynamic critical path for assigning parallel algorithms onto multiprocessors. In Proc. of the 1994 International Conference on Parallel Processing, volume 2, pages 155–159, August 1994.
[9] T. G. Lewis and H. El-Rewini. Parallax: A tool for parallel program scheduling. IEEE Parallel and Distributed Technology, pages 62–72, May 1993.
[10] D. R. Lopez. Models and Algorithms for Task Allocation in a Parallel Environment. PhD thesis, Computer Science Department, Texas A&M University, 1992.
[11] C. McCreary and H. Gill. Automatic determination of grain size for efficient parallel processing. Communications of the ACM, 32(9):1073–1078, September 1989.
[12] V. J. Rayward-Smith. UET scheduling with interprocessor communications delays. Technical Report SYS-C86-06, School of Information Systems, University of East Anglia, Norwich, NR4 7TJ, 1986.
[13] V. Sarkar. Partitioning and Scheduling Parallel Programs for Multiprocessors. MIT Press, Massachusetts, 1989.
[14] Y. Wang, N. M. Amato, and D. K. Friesen. Hindsight helps: A backtracking scheme for scheduling concurrent tasks. Technical Report 97005, Department of Computer Science, Texas A&M University, May 1997.
[15] M. Wu and D. D. Gajski. Hypertool: A programming aid for message-passing systems. IEEE Trans. on Parallel and Distributed Systems, 1(3):330–343, July 1990.
[16] T. Yang and A. Gerasoulis. A fast static scheduling algorithm for DAGs on an unbounded number of processors. In Proc. of Supercomputing '91, pages 633–642, 1991.
[17] C. Yen, S. S. Tseng, and C.-T. Yang. Scheduling of precedence constrained tasks on multiprocessor systems. In Proc. IEEE First International Conference on Algorithms and Architectures for Parallel Processing, pages 379–382, April 1995.