Local Search for DAG Scheduling and Task Assignment
Min-You Wu and Wei Shu
Department of Computer Science
State University of New York at Buffalo
wu,
[email protected]alo.edu
Jun Gu
Dept. of Electrical and Computer Engineering
University of Calgary
[email protected]
Abstract
Scheduling DAGs to multiprocessors is one of the key issues in high-performance computing. Local search can be used to effectively improve the quality of a scheduling algorithm. In this paper, we present a fast local search algorithm, based on topological ordering, that can improve the quality of DAG scheduling algorithms. This low-complexity algorithm can effectively reduce the length of a given schedule.

1 Introduction
Scheduling computations onto processors is one of the crucial components of a parallel processing environment. In this paper, we consider static scheduling algorithms that schedule an edge-weighted directed acyclic graph (DAG) to a set of homogeneous processors to minimize the completion time. The static scheduling problem is NP-complete in its general forms [2]. As a result, there has been considerable research effort in this area, and many heuristic algorithms have been proposed [28, 1, 29, 21]. In this paper, instead of suggesting a new scheduling algorithm, we present an algorithm that uses a fast local search technique to improve the scheduling quality of existing scheduling algorithms. This algorithm, called TASK (Topological Assignment and Scheduling Kernel), systematically compacts a given schedule by visiting its nodes in a topological order. In each move, the dynamic cost (i.e., tlevel) of a node is used to quickly determine the search direction. In this way, TASK can effectively reduce the length of a given schedule.

2 DAG Scheduling
A directed acyclic graph (DAG) consists of a set of nodes $\{n_1, n_2, \ldots, n_n\}$ connected by a set of edges, each of which is denoted by $e_{i,j}$. Each node represents a task, and the weight of node $n_i$, $w(n_i)$, is the execution time of the task. Each edge represents a message transferred from one node to another node, and its weight, $w(e_{i,j})$, is equal to the transmission time of the message. The communication-to-computation ratio (CCR) of a parallel program is defined as its average communication cost divided by its average computation cost on a given system. In a DAG, a node that does not have any parent is called an entry node, whereas a node that does not have any child is called an exit node. A node cannot start execution before it has received all of the messages from its parent nodes. In static scheduling, the number of nodes, the number of edges, the node weights, and the edge weights are assumed to be known before program execution. The weight of an edge between two nodes assigned to the same processing element (PE) is assumed to be zero. The objective in static scheduling is to assign the nodes of a DAG to PEs such that the schedule length, or makespan, is minimized without violating the precedence constraints. Many algorithms can be employed in static scheduling, including MCP [28], DSC [29], and DLS [21]. Although these algorithms produce relatively good schedules, they are usually not optimal, and sometimes the generated schedule is far from optimal. In this paper, we propose a fast local search algorithm, TASK (Topological Assignment and Scheduling Kernel), to improve the quality of schedules generated by an initial scheduling algorithm.
Presently on leave at: Dept. of Computer Science, Hong Kong Univ. of Science and Technology, Kowloon, Hong Kong, [email protected]
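The model above can be captured in a small data structure. The sketch below assumes a dictionary-based layout that is not taken from the paper. It computes the entry and exit nodes and the CCR, and it applies the rule that an edge costs nothing when both of its endpoints share a PE. The class name `DAG` and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DAG:
    """Hypothetical edge-weighted DAG: w_node[n] is w(n_i), the execution time;
    w_edge[(i, j)] is w(e_ij), the transmission time of the message i -> j."""
    w_node: dict[str, float]
    w_edge: dict[tuple[str, str], float] = field(default_factory=dict)

    def parents(self, n):
        return [i for (i, j) in self.w_edge if j == n]

    def children(self, n):
        return [j for (i, j) in self.w_edge if i == n]

    def entry_nodes(self):
        # An entry node has no parent.
        return [n for n in self.w_node if not self.parents(n)]

    def exit_nodes(self):
        # An exit node has no child.
        return [n for n in self.w_node if not self.children(n)]

    def ccr(self):
        # Average communication cost divided by average computation cost.
        if not self.w_edge:
            return 0.0
        avg_comm = sum(self.w_edge.values()) / len(self.w_edge)
        avg_comp = sum(self.w_node.values()) / len(self.w_node)
        return avg_comm / avg_comp

    def comm_cost(self, i, j, pe):
        # pe maps node -> PE; a message between nodes on the same PE costs zero.
        return 0 if pe[i] == pe[j] else self.w_edge[(i, j)]
```

For example, for four tasks a, b, c, d with weights 2, 3, 3, 1 and edges a→b (4), a→c (1), b→d (1), c→d (2), the average communication cost is 2 and the average computation cost is 2.25, so the CCR is 8/9.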
Proceedings of the 1997 International Conference on Parallel Processing (ICPP'97) 0-8186-8108-X/97 $10.00 © 1997 IEEE

3 Local Search for Scheduling and Task Assignment
Local search was one of the early techniques for combinatorial optimization. The principle of local search is to refine a given initial solution point in the solution space by searching through the neighborhood of that point. The development of local search has gone through two major periods. Early greedy local search methods were able to solve small unconstrained path-finding problems such as the TSP [14, 19]. During the middle and late eighties, more powerful techniques for randomized local search were developed. These include conflict minimization, random variable selection, and pre-, partial, and stochastic variable selection [3, 5, 6, 22, 23, 25, 24, 26]. These randomized local search algorithms can efficiently handle large constraint satisfaction problems (CSPs) and constrained optimization problems.
The n-queen problem is a benchmark for constraint satisfaction problems. Analytical solutions for the n-queen problem exist, but they cannot solve general search problems and are of no use in practice, where a search algorithm is used instead. The satisfiability (SAT) problem is a binary CSP. Scheduling and task assignment problems are well known to be expressible as CSP/satisfiability problems. The SAT model formulates these problems precisely, and the CSP model expressively characterizes scheduling and task assignment operations.
There have been several significant local search solutions to scheduling and task assignment problems. The SAT1 algorithm, developed during the late eighties, was the first local search algorithm for the satisfiability problem [3, 4, 5, 7]. These local search solutions to the SAT problem were applied to several large-scale industrial scheduling and task assignment problems. The n-queen problem served as a model for the early study of local search algorithms for scheduling and task assignment. During the late eighties, IBM and NASA were working on a number of important scheduling and task assignment projects. The underlying structure of the n-queen problem, represented by a complete constraint graph, gives a relational model with fully specified constraints among multiple objects [3]. Variations on the dimension, the objects' relative positions, and the weights on the constraints led to a hyper-queen problem model, which consists of several simple and basic models:
n-queen problem: the base model. N queens are indistinguishable, and the constraints among queens are specified by binary values (i.e., 1 or 0).
w-queen problem: the weighted n-queen model. N queens are distinguishable (each is associated with a cost), and the constraints among queens are specified by weights.
3d-queen problem: queens are to be placed in a 3-dimensional (l × m × n) rectangular cuboid. A special case, the nm-queen problem, is to place queens on an n by m rectangle.
q+-queen problem: more than one queen is allowed to be placed on the same row or the same column.
Based on the n-queen problem, the hyper-queen problem can model the objects/tasks, the performance criteria, and the timing, spatial, and resource constraints of a wide range of scheduling and task assignment problems. This made the n-queen problem a general model for many industrial scheduling and task assignment problems. By a remarkable coincidence, the models of several difficult scheduling projects at that time were either the n-queen or the hyper-queen problem [12, 27]. All of them required efficient solutions to these problems. Many practical applications based on hyper-queen models have been developed [8, 26]. These include task scheduling (static or dynamic), real-time systems, task assignment, computer resource management, VLSI circuit design, air traffic control, communication system design, and so on. Scheduling problems modeled by the various queen models have specific performance criteria and are known to be NP-hard. When scheduling computational tasks onto multiprocessors, for example, one can use a hyper-queen model in which q+ weighted queens are to be placed on a t by p rectangle. Let t denote the execution time, p the number of processors, $q_i$ the execution time of the ith task, and $c_{ij}$ the communication time from the ith task to the jth task. The goal is to place the task queens onto the t by p table so that the longest execution path is minimized, subject to the given topological constraints.
Following local conflict minimization [3], the QS1 algorithm was developed in late 1987 and implemented in early 1988. It was the first local search algorithm developed for the n-queen problem [3, 22, 23]. Three improved local search algorithms for the n-queen problem were developed between 1988 and 1990 [16, 13].
QS2 is a near linear-time local search algorithm with an efficient random variable selection strategy [24]. QS3 is a near linear-time local search algorithm with efficient pre- and random variable selection and assignment [24]. QS4 is a linear-time local search algorithm with efficient partial and random variable selection and assignment techniques [25, 26]. Compared to the first local search algorithm [3], partial and random variable selection/assignment heuristics have improved search efficiency by orders of magnitude. QS4, for example, was able to solve 3,000,000 queens in a few seconds.
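Conflict minimization with random variable selection can be illustrated on the n-queen problem. The sketch below is a generic min-conflicts search and is not a reproduction of QS1 to QS4. Each conflict count is computed naively in O(n), so every step costs O(n²). The published algorithms reach near-linear time with data structures that are not shown here. The function names and the random-restart policy are assumptions made for illustration.

```python
import random


def conflicts(cols, row, col):
    """Number of other queens attacking square (row, col).
    cols[r] is the column of the queen in row r (exactly one queen per row)."""
    return sum(
        1
        for r, c in enumerate(cols)
        if r != row and (c == col or abs(c - col) == abs(r - row))
    )


def min_conflicts(n, max_steps=500, restarts=100, seed=0):
    """Hypothetical min-conflicts local search; returns cols or None."""
    rng = random.Random(seed)
    for _ in range(restarts):
        # Random initial solution point.
        cols = [rng.randrange(n) for _ in range(n)]
        for _ in range(max_steps):
            conflicted = [r for r in range(n) if conflicts(cols, r, cols[r]) > 0]
            if not conflicted:
                return cols
            # Random variable selection: pick any queen that is in conflict.
            row = rng.choice(conflicted)
            scores = [conflicts(cols, row, c) for c in range(n)]
            best = min(scores)
            # Move it to a least-conflicted column, breaking ties randomly.
            cols[row] = rng.choice([c for c in range(n) if scores[c] == best])
    return None
```

The random restarts are a simple way to escape states in which every conflicted queen already sits in its best column. QS-style algorithms rely on cleverer variable selection instead.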
Since local search solutions for scheduling and task assignment applications were first developed in the late eighties, more than one hundred industrial companies worldwide have developed such scheduling software systems for various applications. In 1990, three years after the QS1 algorithm was released, Minton et al. independently reported a similar local search algorithm for the n-queen problem [15]. A major difference between Minton's algorithms and Sosic and Gu's algorithms is that Minton's first algorithm was a one-dimensional local search that did not use random heuristics.
Recently, local search solutions have been introduced for DAG scheduling and task assignment [8, 26, 9, 10, 11]. A typical randomized local search algorithm for DAG scheduling was implemented in a recent Applied Optimization course project and found to be efficient [11, 17] (see also Figure 1 and [18]). In this algorithm, a node is randomly picked from the blocking node list, where a blocking node is defined as a node that has the potential to block critical path nodes. The node is then moved to a randomly selected PE. If the schedule length is reduced, the move is accepted; otherwise, the node is moved back to its original PE. Each move, successful or not, takes O(e) time to compute the schedule length, where e is the number of edges in the graph. To reduce the complexity, a constant Max_iteration is defined to limit the number of steps, so that only Max_iteration nodes are inspected. The time taken by the algorithm is therefore proportional to e · Max_iteration. Moreover, randomly selected nodes and PEs may not significantly reduce the length of a given schedule. Even if Max_iteration is equal to the number of nodes, giving a complexity of O(en), the random search algorithm still cannot provide satisfactory performance.
4 Local Search with Topological Ordering for Scheduling
We propose a fast local search algorithm that uses topological ordering for DAG scheduling. The algorithm is called TASK (Topological Assignment and Scheduling Kernel). In this algorithm, the nodes in the DAG are inspected in a topological order. In this order, it is not necessary to visit every edge to determine whether the schedule length is reduced. The time spent on each move can therefore be drastically reduced, so that inspecting every node in a large graph becomes feasible. This order also allows the given schedule to be compacted systematically. To describe the TASK algorithm succinctly, several terms are defined for a given graph as follows:
tlevel($n_i$): the largest sum of communication and computation costs at the top level of node $n_i$, i.e., from an entry node to $n_i$, excluding its own weight $w(n_i)$.
blevel($n_i$): the largest sum of communication and computation costs at the bottom level of node $n_i$, i.e., from $n_i$ to an exit node, including its own weight $w(n_i)$.
The critical path, CP, is the longest path in a DAG. The length of the critical path of a DAG is
$$L_{CP} = \max_{n_i \in V} \{ L(n_i) \},$$
where $L(n_i) = \mathrm{tlevel}(n_i) + \mathrm{blevel}(n_i)$ and V is the node set of the graph.
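The definitions of tlevel, blevel, and $L_{CP}$ translate into two linear passes over a topological order. The first pass runs parents before children and computes tlevel. The second runs children before parents and computes blevel. The sketch below uses Python's standard `graphlib.TopologicalSorter`. The function name `levels` and the dictionary input format are assumptions, not the paper's code.

```python
from graphlib import TopologicalSorter


def levels(w_node, w_edge):
    """Hypothetical helper returning tlevel, blevel, L(n_i), and L_CP.
    w_node[n] = w(n_i); w_edge[(i, j)] = w(e_ij)."""
    parents = {n: [] for n in w_node}
    children = {n: [] for n in w_node}
    for (i, j) in w_edge:
        parents[j].append(i)
        children[i].append(j)
    # static_order() yields every node after all of its parents.
    order = list(TopologicalSorter(parents).static_order())

    tlevel = {}
    for n in order:
        # Longest entry-to-n path, excluding w(n); 0 for entry nodes.
        tlevel[n] = max(
            (tlevel[p] + w_node[p] + w_edge[(p, n)] for p in parents[n]), default=0
        )
    blevel = {}
    for n in reversed(order):
        # Longest n-to-exit path, including w(n).
        blevel[n] = w_node[n] + max(
            (w_edge[(n, c)] + blevel[c] for c in children[n]), default=0
        )
    L = {n: tlevel[n] + blevel[n] for n in w_node}
    return tlevel, blevel, L, max(L.values())
```

For the four-task graph with weights a=2, b=3, c=3, d=1 and edges a→b (4), a→c (1), b→d (1), c→d (2), the critical path is a→b→d. Its length is $L_{CP}$ = 2 + 4 + 3 + 1 + 1 = 11, and every node on it has L = 11.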
If the given graph has already been scheduled, more terms are defined. Node $n_i$ has been scheduled on PE pe($n_i$). Let p($n_i$) be the predecessor node that has been scheduled immediately before node $n_i$ on PE pe($n_i$); if $n_i$ is the first node scheduled on the PE, p($n_i$) is null. Let s($n_i$) be the successor node that has been scheduled immediately after node $n_i$ on PE pe($n_i$); if $n_i$ is the last node scheduled on the PE, s($n_i$) is null.
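Given a schedule stored as the ordered list of nodes on each PE (an assumed representation), pe($n_i$), p($n_i$), and s($n_i$) can be read off in a single pass. In this sketch the function name `schedule_links` is hypothetical, and Python's `None` stands in for null.

```python
def schedule_links(schedule):
    """schedule maps a PE id to the list of its nodes in execution order.
    Returns pe(n), p(n), s(n) as dicts; None plays the role of null."""
    pe, pred, succ = {}, {}, {}
    for proc, nodes in schedule.items():
        for k, n in enumerate(nodes):
            pe[n] = proc
            # First node on a PE has no predecessor; last node has no successor.
            pred[n] = nodes[k - 1] if k > 0 else None
            succ[n] = nodes[k + 1] if k + 1 < len(nodes) else None
    return pe, pred, succ
```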
    construct the blocking node list
    searchstep = 0
    do {
        pick node n_i randomly from the blocking node list
        pick a PE P randomly
        move n_i to PE P
        if schedule length does not improve
            move n_i back to its original PE
    } while (searchstep++ < MAXSTEP)

Figure 1: RAND: a randomized local search algorithm for DAG scheduling.
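A runnable sketch of the RAND loop in Figure 1 follows. It rests on several assumptions. First, the paper does not say how a moved node is ordered on its new PE, so here every PE runs its nodes in one fixed global topological order, which keeps each schedule valid. Second, the blocking node list defaults to all nodes, because its exact construction is not given. Third, the names `topo_order`, `schedule_length`, and `rand_search` are hypothetical. Each call to `schedule_length` visits every node and edge, which matches the O(e)-per-move cost noted in the text.

```python
import random
from graphlib import TopologicalSorter


def topo_order(w_node, w_edge):
    """One fixed topological order of the DAG (parents before children)."""
    preds = {n: [] for n in w_node}
    for (i, j) in w_edge:
        preds[j].append(i)
    return list(TopologicalSorter(preds).static_order())


def schedule_length(w_node, w_edge, order, pe):
    """Makespan when every PE executes its nodes in the global `order`."""
    preds = {n: [] for n in w_node}
    for (i, j) in w_edge:
        preds[j].append(i)
    finish, pe_free = {}, {}
    for n in order:
        # A node waits for all parent messages; same-PE messages cost zero.
        ready = max(
            (finish[p] + (0 if pe[p] == pe[n] else w_edge[(p, n)]) for p in preds[n]),
            default=0,
        )
        start = max(ready, pe_free.get(pe[n], 0))
        finish[n] = pe_free[pe[n]] = start + w_node[n]
    return max(finish.values())


def rand_search(w_node, w_edge, pe, num_pes, blocking=None, max_step=100, seed=0):
    rng = random.Random(seed)
    order = topo_order(w_node, w_edge)
    pe = dict(pe)                          # do not mutate the caller's schedule
    candidates = list(blocking or w_node)  # assumption: all nodes if no list given
    best = schedule_length(w_node, w_edge, order, pe)
    for _ in range(max_step):              # plays the role of MAXSTEP
        n = rng.choice(candidates)         # pick node n_i randomly
        old = pe[n]
        pe[n] = rng.randrange(num_pes)     # move n_i to a random PE P
        length = schedule_length(w_node, w_edge, order, pe)
        if length < best:
            best = length                  # keep an improving move
        else:
            pe[n] = old                    # move n_i back to its original PE
    return pe, best
```

Because a move is kept only when it strictly shortens the schedule, the returned length never exceeds the initial one. Even so, the search may stall, since nodes and PEs are chosen blindly.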
One of the characteristics of the TASK algorithm is that it does not depend on the algorithm used to generate the initial schedule. Suppose the initial schedule is correct and pe($n_i$), p($n_i$), and s($n_i$) are available for every node $n_i$. Then applying TASK guarantees that the new schedule of the graph is no longer than the initial one. The TASK algorithm is shown in Figure 2.
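The initialization phase of Figure 2 first constructs a "scheduled DAG" and then computes L($n_i$) and $L_{CP}$. The paper does not spell out that construction in this section. The sketch below assumes a common interpretation: edges between nodes on the same PE get weight zero, and a zero-weight pseudo-edge p($n_i$) → $n_i$ is added to capture the execution order on each PE. Under this assumption, $L_{CP}$ of the scheduled DAG equals the length of the as-soon-as-possible schedule. The names `scheduled_dag` and `initialize` are hypothetical.

```python
from graphlib import TopologicalSorter


def scheduled_dag(w_edge, schedule):
    """Assumed construction: zero same-PE edges, add p(n_i) -> n_i pseudo-edges.
    schedule maps a PE id to its nodes in execution order."""
    pe = {n: proc for proc, nodes in schedule.items() for n in nodes}
    edges = {(i, j): (0 if pe[i] == pe[j] else w) for (i, j), w in w_edge.items()}
    for nodes in schedule.values():
        for prev, node in zip(nodes, nodes[1:]):
            edges.setdefault((prev, node), 0)  # pseudo-edge, unless a real edge exists
    return edges


def initialize(w_node, w_edge, schedule):
    """Initialization of Figure 2: L(n_i) for every node and L_CP."""
    edges = scheduled_dag(w_edge, schedule)
    preds = {n: [] for n in w_node}
    succs = {n: [] for n in w_node}
    for (i, j) in edges:
        preds[j].append(i)
        succs[i].append(j)
    # Raises graphlib.CycleError if the per-PE order contradicts the DAG.
    order = list(TopologicalSorter(preds).static_order())
    tlevel, blevel = {}, {}
    for n in order:
        tlevel[n] = max((tlevel[p] + w_node[p] + edges[(p, n)] for p in preds[n]), default=0)
    for n in reversed(order):
        blevel[n] = w_node[n] + max((edges[(n, c)] + blevel[c] for c in succs[n]), default=0)
    L = {n: tlevel[n] + blevel[n] for n in w_node}
    return L, max(L.values())
```

Take the four-task example with a, b, d on one PE and c on another. The scheduled DAG gives $L_{CP}$ = 9, which is the makespan of that schedule.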
procedure TASK (DAG Schedule)
begin
    /* initialization */
    Construct a scheduled DAG;
    for node i := 0 to n − 1 do
        L(n_i) := tlevel(n_i) + blevel(n_i);
    L_CP := max{ L(n_i) : 0 ≤ i ≤ n − 1 };  /* the longest path in the DAG */