Scheduling Jobs with Communication Delays: Using Infeasible Solutions for Approximation Extended Abstract

Rolf H. Möhring

Markus W. Schäffter

Andreas S. Schulz

Abstract. In the last few years, multi-processor scheduling with interprocessor communication delays has received increasing attention, owing to the more realistic constraints in modeling parallel processor systems. Most research in this vein is concerned with the makespan criterion. We contribute to this work by presenting a new and simple $(2 - \frac{1}{m})$-approximation algorithm for scheduling to minimize the makespan on identical parallel processors subject to series-parallel precedence constraints and both unit processing times and communication delays. This meets the best known performance guarantee for the same problem without communication delays. For the same problem but with (non-trivial) release dates, arbitrary precedence constraints, arbitrary processing times, and "locally small" communication delays, we obtain a simple $\frac{7}{3}$-approximation algorithm, compared with the involved $(\frac{7}{3} - \frac{4}{3m})$-approximation algorithm by Hanen and Munier for the case with identical release dates. Another quite important goal in real-world scheduling is to optimize average performance. Very recently, there have been significant developments in computing nearly optimal schedules for several classic processor scheduling models to minimize the average weighted completion time. In this paper, we study for the first time scheduling with communication delays to minimize the average weighted completion time. Specifically, based on an LP relaxation we give the first constant-factor polynomial-time approximation algorithm for scheduling identical parallel processors subject to release dates and locally small communication delays. Moreover, the optimal LP value provides a lower bound on the optimum with the same worst-case performance guarantee. The common underlying idea of our algorithms is to first compute a schedule that regards all constraints except for the processor restrictions.
This schedule is then used to construct a provably good feasible schedule for a given number of processors, and it serves as a tool in the analysis of our algorithms. Complementing our approximation results, we also show that minimizing the makespan on an unrestricted number of identical parallel processors subject to series-parallel precedence constraints, unit-time jobs, and zero-one communication delays is NP-hard.

Technische Universität Berlin, Fachbereich Mathematik, Sekr. MA 6-1, Straße des 17. Juni 136, 10623 Berlin, Germany, e-mail: {moehring, shefta, schulz}@[email protected]. This work was supported by the DFG under grant MO 446/3-1.

1 Introduction

Many real-world scheduling problems bear three important characteristics: the tasks to be scheduled arrive over time, data has to be transferred from completed tasks to others that are scheduled subsequently, and one wishes to optimize some function of average or total performance. In multi-processor systems, the transfer of data between different processors working on a chain of tasks is particularly important. This is why scheduling problems with communication delays have received much attention from the theoretical computer science and operations research communities since about 1987 (cf., e.g., [RS87, VLL90, Law93, LVV96, HVL94, CP95]). With the exception of [Ver95], where the maximum lateness is considered, all studies deal with the makespan criterion.

We focus on both the makespan and the average weighted completion time criterion and obtain new and simple approximation algorithms with constant performance guarantee for a variety of problems that have not been attacked before. A $\rho$-approximation algorithm is a polynomial-time algorithm that always finds a solution of objective function value within a factor of $\rho$ of the optimum; $\rho$ is also referred to as the performance guarantee of the algorithm. Two of our main results are a $\frac{10}{3}$-approximation algorithm for minimizing the average weighted completion time on identical parallel processors subject to (non-trivial) release dates, arbitrary precedence constraints, 0/1 communication delays, and unit processing times, and a 6.143-approximation algorithm for the same problem but with arbitrary processing times and "locally small" communication delays. Nothing was known about constructing good approximations in polynomial time for these cases.

The perhaps most interesting approximation result for communication delays so far is a $(\frac{7}{3} - \frac{4}{3m})$-approximation algorithm of Hanen and Munier [HM95] for the makespan criterion on $m$ identical parallel processors subject to arbitrary precedence constraints and processing times, and "small" communication delays. Their rather complicated algorithm is based on the idea of omitting the processor restrictions and constructing a feasible schedule on an unrestricted number of processors in a first step. Interestingly enough, Picouleau [Pic95] showed that minimizing the makespan on an unrestricted number of processors is NP-hard, even in the case of unit processing times and unit-time communication delays. Hanen and Munier thus use an approximate solution of this relaxation with an unrestricted number of processors in order to construct a schedule respecting the processor restrictions.

The other significant progress in the design of approximation algorithms concerns minimizing the average weighted completion time for a variety of classic scheduling problems [PSW95, HSW96, Sch95, Sch96, CPS+96, HSSW96]. For problems involving precedence constraints, the progress essentially comes from the use of appropriate linear programming relaxations, from a decomposition of schedules into intervals of geometrically increasing size, and from the use of randomness. In this paper, we combine and extend these techniques to develop and analyze new approximation algorithms for scheduling problems with communication delays. We exploit the same ideas of neglecting the processor restrictions and using LP relaxations in several ways. We first develop a "master" algorithm that, given a schedule with performance guarantee $\alpha$ for the corresponding problem with an unrestricted number of processors, and a priority list, constructs a feasible schedule for the given number of processors with a performance guarantee of $\alpha + \beta$. The master algorithm turns out to be a suitable extension of Graham's list scheduling [Gra66]. The question is then how good $\alpha$ and $\beta$ can be made. Roughly speaking, we distinguish the general case, with arbitrary processing times and locally small communication delays, and the unit-time case, with unit processing times and unit-time (or zero-one) communication delays. We obtain $\beta = 1$ for the makespan in the general case, while for the average completion time $\beta = 2$ in the unit-time case. The general case here requires a more complicated randomized analysis in which $\beta = 2$ is achieved only "locally". Concerning $\alpha$, we obtain $\alpha \le \frac{4}{3}$ in the general case. For the special class of series-parallel orders, we can even show $\alpha = 1$ in the case of unit-time communication delays, i.e., we derive a polynomial-time algorithm to minimize the makespan in the model with an unrestricted number of identical parallel processors and series-parallel orders. In several of these results, we construct from an optimal solution to an appropriate LP relaxation a closely related schedule for the corresponding problem with an unrestricted number of processors. We then use this schedule to analyze the performance of list scheduling in an order dictated by the optimal solution to the linear program. In particular, we make use of the choice of so-called favored successors by a schedule on an unrestricted number of processors which is itself obtained by rounding an LP solution.
This demonstrates that scheduling jobs with communication delays on an unrestricted number of processors is an important theoretical and practical tool, not only for understanding the difficulty of problems involving communication delays — witness the NP-hardness result of [Pic95] as well as our NP-hardness result for series-parallel precedence constraints, unit-time jobs, and zero-one communication delays — but also in the design of approximation algorithms with good worst-case performance guarantees. Due to space limitations, some details are omitted from this paper. A complete version can be obtained from the authors [MSS96].

2 Preliminaries

The class of identical parallel processor scheduling problems with communication delays is denoted by $P|prec, r_j, p_j, c_{ij}|\kappa$. An instance $I$ of this class is given by a tuple $I = (m, N, \Theta, r, p, c, \kappa)$, where $m$ is the number of available identical parallel processors and $N = \{1, \dots, n\}$ denotes the set of jobs. Each processor can process at most one job at a time, and each job $j$ requires one processor for an uninterrupted period of $p_j$ time units. Jobs are assumed to arrive over time, i.e., job $j$ is available from time $r_j \ge 0$. The partial order of precedence constraints among the jobs is denoted by $\Theta$; an individual precedence constraint is denoted by $i\,\Theta\,j$, or simply by $i \prec j$. The set of successors of a job $i$ is $Succ(i) = \{j \in N \mid i \prec j\}$. Similarly, $Pred(j)$ denotes the set of predecessors of $j$. If $i \prec j$ and there is no job $k$ with $i \prec k \prec j$, then we write $i \to j$ and call $j$ a direct successor of $i$ and $i$ a direct predecessor of $j$. The sets of all direct predecessors and direct successors of $j$ are denoted by $Pred^!(j)$ and $Succ^!(j)$, respectively. For every pair of jobs $i$, $j$ such that $i$ is a direct predecessor of $j$ in $\Theta$, there is a communication delay $c_{ij} \ge 0$. This delay will only occur if job $j$ is scheduled on a different processor than $i$. It models the time that is required to transfer data from the processor processing $i$ to the one processing $j$. The vector of these communication delays is denoted by $c$. Notice that the communication delays only depend on the tasks, that any two processors may communicate, and that task replication is not allowed.

Communication delays are said to be small if the largest communication delay does not exceed the smallest processing time. This is a reasonable assumption when processing times do not differ too much, since the time needed to communicate information from a job should be smaller than the time needed to process it. Hanen and Munier [HM95] need this assumption for their $(\frac{7}{3} - \frac{4}{3m})$-approximation algorithm for $P|prec, c_{ij}\ \text{small}|C_{max}$. We will pose a weaker assumption, however, namely locally small communication delays. That is, we assume that for every pair $i \to j$ of jobs the communication delay $c_{ij}$ between them is no larger than the processing time of any job that is either a direct predecessor of $j$ or a direct successor of $i$. This better reflects the natural assumption that we cannot transfer more information from job $i$ to job $j$ than is created by job $i$ or processed by job $j$. We define

$$\rho_{ij} = \frac{c_{ij}}{\min\{p_k \mid k \in Pred^!(j) \cup Succ^!(i)\}} \quad \text{for } i \to j, \qquad \rho = \max_{i \to j} \rho_{ij}.$$

We will assume $\rho \le 1$ throughout the paper. Finally, $\kappa$ is a regular performance measure that measures the quality of a schedule. Here, $\kappa$ will either be the makespan ($\kappa = C_{max}$) or the weighted sum of completion times ($\kappa = \sum w_j C_j$). A schedule $S$ for an instance $I$ is a function assigning a starting time $S_j$ to each job $j \in N$. Then $C_j = S_j + p_j$ is the completion time of job $j$ in $S$, and $C$ denotes the vector of completion times associated with the schedule. Depending on what is needed, we will use $S$ and $C$ interchangeably to denote a schedule. The makespan of a schedule $S$ is then given by $C_{max}(S) = \max\{C_j \mid j \in N\}$, while the total weighted completion time of $S$ is given by $\sum_{j \in N} w_j C_j$, where $w_j \ge 0$ is a weight associated with job $j$. A schedule $S$ for an instance $I$ is feasible if it satisfies the following constraints:

(I) $|\{j \in N \mid S_j \le t < C_j\}| \le m$ for every $t$, $0 \le t \le C_{max}(S)$,

(II) $S_j \ge \max\big(\{r_j\} \cup \{C_i \mid i \in N,\ i \prec j\}\big)$ for all jobs $j$,

(IIIa) $S_j < C_i + c_{ij}$ for at most one job $j \in N$ with $i \to j$ and $c_{ij} > 0$, for all jobs $i$,

(IIIb) $C_i + c_{ij} > S_j$ for at most one job $i \in N$ with $i \to j$ and $c_{ij} > 0$, for all jobs $j$.

Condition (IIIa) ensures that, for every job $i$, at most one direct successor $j$ is scheduled without communication delay on the same processor as job $i$. If such a job exists, it is called the favored successor of job $i$. Similarly, (IIIb) ensures the existence of at most one favored predecessor $i$ of $j$. Notice that this definition of feasibility relies on our assumption of locally small communication delays. For a feasible schedule $S$, one can easily construct in linear time a feasible processor assignment such that any two jobs $i \prec j$ assigned to different processors satisfy $S_j \ge C_i + c_{ij}$. We will often consider the case that the number of processors is unrestricted (though $n$ processors suffice). The associated class of problems is denoted by $P\infty|prec, r_j, p_j, c_{ij}|\kappa$.
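The feasibility conditions (I)-(III) are straightforward to check mechanically. The following Python sketch is our own illustration (all function and variable names are hypothetical, not from the paper); it verifies the conditions for a schedule given as a map from jobs to start times:

```python
def is_feasible(S, p, r, c, dpred, m):
    """Check conditions (I)-(III) for a schedule S (job -> start time).

    p, r: processing times and release dates; dpred[j]: direct predecessors
    of j; c[i, j]: communication delay for each direct precedence i -> j.
    """
    C = {j: S[j] + p[j] for j in S}                       # completion times
    # (I) at most m jobs run at any time (checking all start times suffices)
    if any(sum(1 for j in S if S[j] <= t < C[j]) > m for t in S.values()):
        return False
    # (II) release dates and precedence constraints (direct predecessors
    # suffice, since the condition propagates along chains)
    if any(S[j] < r[j] or any(S[j] < C[i] for i in dpred[j]) for j in S):
        return False
    # (IIIa) every job i has at most one favored successor
    for i in S:
        succ = [j for j in S if i in dpred[j]]
        if sum(1 for j in succ if c[i, j] > 0 and S[j] < C[i] + c[i, j]) > 1:
            return False
    # (IIIb) every job j has at most one favored predecessor
    for j in S:
        if sum(1 for i in dpred[j] if c[i, j] > 0 and S[j] < C[i] + c[i, j]) > 1:
            return False
    return True
```

For example, with unit-time jobs $a \to b$, $a \to c$ and unit delays, starting $b$ at time 1 (as the favored successor of $a$) and $c$ at time 2 is feasible, whereas starting both $b$ and $c$ at time 1 violates (IIIa).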

3 Generalized List Scheduling

In this section, we present and analyze our "master" algorithm, which will be used in all of our approximation algorithms, either explicitly or implicitly. It can be seen as an extended list scheduling algorithm that, in contrast to ordinary list scheduling, also takes the communication delays into account. Graham's original list-scheduling rule is a $(2 - \frac{1}{m})$-approximation algorithm for $P|prec|C_{max}$. In this algorithm, the jobs are ordered in some list, and whenever one of the $m$ identical parallel processors becomes idle, the next available job in the list is started, where a job is available if all its predecessors have completed processing. Graham showed that the resulting schedule $C$ satisfies

$$C_j \le \sum_{k \in \mathcal{C}_j} p_k + \frac{\sum_{k \in N \setminus \mathcal{C}_j} p_k}{m}, \qquad (3.1)$$

where $\mathcal{C}_j$ denotes the set of jobs that form a longest chain (with respect to processing times) of jobs ending with job $j$.

We shall analyze a variant of Graham's list-scheduling rule that takes communication delays into account. Given an instance $I = (m, N, \Theta, r, p, c, \kappa)$, its input consists of a priority list of the jobs and of a schedule $S^\infty$ that is feasible for the corresponding problem where the restriction on the number of processors is neglected. We define $x^\infty_{ij} = 1$ if, for a pair of jobs $i \to j$, $j$ is the favored successor of $i$ in $S^\infty$, i.e., if $S^\infty_j < C^\infty_i + c_{ij}$, and $x^\infty_{ij} = 0$ otherwise. The schedule produced by the extended list-scheduling rule is denoted by $C^H$. We call job $j$ available at time $t$ if

$$t \ge r_j \quad \text{and} \quad t \ge C^H_i + (1 - x^\infty_{ij})\, c_{ij} \quad \text{for all } i \to j.$$

Our algorithm proceeds as follows. Whenever we enter a time period in which fewer than $m$ jobs are scheduled, we start the first available job in the list. Observe that, in the absence of (non-trivial) release dates and communication delays, this algorithm coincides with Graham's rule. Moreover, whenever job $j$ is the favored successor of job $i$ in $C^H$, then $j$ is the favored successor of job $i$ in $S^\infty$. The same holds for favored predecessors, of course. Note that the generalized list scheduling algorithm does not explicitly determine a processor assignment, but the resulting schedule fulfills the conditions (I), (II), and (III) of feasibility, from which a processor assignment can easily be obtained in linear time. The next theorem is an extension of (3.1) to the generalized list scheduling algorithm.

Theorem 3.1. For $P|prec, r_j, p_j, c_{ij}|\kappa$, the generalized list scheduling algorithm produces a feasible schedule $C^H$ such that, for each $j = 1, \dots, n$,

$$C^H_j \le C^\infty_j + \frac{\sum_{k \in N \setminus \mathcal{C}_j} p_k}{m}, \qquad (3.2)$$

where $m$ is the given number of processors, and $\mathcal{C}_j$ is a longest chain (with respect to processing times, communication delays, and release dates) of jobs ending with job $j$.
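The list-scheduling rule just described can be sketched in code as follows. This is our own illustration, not the paper's implementation; all names are hypothetical, integer data is assumed, and time is advanced from one event to the next rather than optimized:

```python
def generalized_list_schedule(L, m, r, p, c, dpred, x):
    """Generalized list scheduling: start the first available job of the
    priority list L whenever fewer than m jobs are running.

    x[i, j] = 1 iff j is the favored successor of i in the schedule on an
    unrestricted number of processors; c[i, j] is the communication delay;
    dpred[j] is the set of direct predecessors of j.
    """
    def available(j, t, S, C):
        return (t >= r[j] and all(i in C for i in dpred[j])
                and all(t >= C[i] + (1 - x.get((i, j), 0)) * c.get((i, j), 0)
                        for i in dpred[j]))

    S, C, pending, t = {}, {}, list(L), 0
    while pending:
        started = True
        while started:                       # start jobs in priority order
            started = False
            running = sum(1 for j in S if S[j] <= t < C[j])
            for j in pending:
                if running < m and available(j, t, S, C):
                    S[j], C[j] = t, t + p[j]
                    pending.remove(j)
                    started = True
                    break
        # advance time to the next relevant event: a completion, a release
        # date, or the end of a communication delay
        events = [C[j] for j in C if C[j] > t]
        events += [r[j] for j in pending if r[j] > t]
        events += [C[i] + (1 - x.get((i, j), 0)) * c.get((i, j), 0)
                   for j in pending for i in dpred[j] if i in C]
        t = min(v for v in events if v > t)
    return S
```

On the small instance $a \to b$, $a \to c$ with unit times, unit delays, and $b$ favored, the sketch starts $a$ at 0, $b$ at 1, and $c$ at 2, regardless of whether one or two processors are given.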

Proof. Let us focus on a particular job $j$. We show that the time interval from 0 to $C^H_j$ can be partitioned into two sets of intervals as follows. Let $t_0 = C^H_j$ and $j_1 = j$. We first derive a chain of jobs $j_s \to j_{s-1} \to \cdots \to j_1$ in the following way. Inductively, for $k = 1, \dots, s$, define $t_k$ as the time at which job $j_k$ becomes available; if $t_k = r_{j_k}$, then set $s = k$, and the construction is complete. Otherwise, let $j_{k+1}$ denote a predecessor of job $j_k$ such that $t_k = C^H_{j_{k+1}} + (1 - x^\infty_{j_{k+1}, j_k})\, c_{j_{k+1}, j_k}$. Let $\mathcal{C}_j$ denote the set of jobs in this chain. Clearly,

$$r_{j_s} + p_{j_s} + \sum_{\ell=1}^{s-1} \Big( (1 - x^\infty_{j_{s-\ell+1}, j_{s-\ell}})\, c_{j_{s-\ell+1}, j_{s-\ell}} + p_{j_{s-\ell}} \Big) \le C^\infty_j.$$

We can think of this lower bound as the total length of the union of time intervals in which some job in this chain is processed, together with the induced communication delays, and together with the interval $[0, r_{j_s}]$. So to compute an upper bound on $C^H_j$, we need only consider the complementary set of time intervals within $[0, C^H_j]$. Consider any point $t$ in such an interval, contained, say, in $(t_k, t_{k-1}]$. At this point in time, job $j_k$ is available. Since it is not being processed, no processor is idle; hence, each processor is processing some job in $N \setminus \mathcal{C}_j$.

For unit-time jobs, zero-one communication delays, and no (non-trivial) release dates, the algorithm and its analysis can be improved. Notice that, due to the choice of favored successors, some jobs might actually become available earlier than our definition of availability suggests. This can be exploited by a kind of post-processing. We scan the schedule delivered by the generalized list scheduling algorithm from its beginning to its end, and whenever a job can be moved to an earlier time period on an idle processor, we move this job as far forward as possible such that all communication delays are still obeyed. This has the following consequence for the analysis given in the proof of Theorem 3.1.
Whenever communication delays contribute to the length of the considered chain, there is actually at least one other job which does not belong to the chain but enforces this communication delay. Since those jobs do not contribute to the second term of the right-hand side of (3.2), we obtain the following lemma.

Lemma 3.2. For $P|prec, p_j = 1, c_{ij} \in \{0,1\}|\kappa$, the generalized list scheduling algorithm followed by post-processing produces a feasible schedule $C^H$ that satisfies, for each job $j = 1, \dots, n$,

$$C^H_j \le \Big(1 - \frac{1}{m}\Big) C^\infty_j + \frac{\sum_{k \in N \setminus \mathcal{C}_j} p_k}{m}, \qquad (3.3)$$

where m is the given number of processors, and C j is a longest chain (with respect to processing times and communication delays) of jobs ending with job j.
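For the unit-time case, the post-processing pass can be sketched as follows. This is our own hypothetical illustration, not the paper's code; it simply re-checks the feasibility conditions (I)-(III) for every attempted move and keeps the earliest feasible start:

```python
def post_process(S, p, r, c, dpred, m):
    """Scan jobs in order of start time and move each one as early as
    possible while conditions (I)-(III) remain satisfied."""
    def feasible(T):
        C = {j: T[j] + p[j] for j in T}
        if any(sum(1 for j in T if T[j] <= t < C[j]) > m for t in T.values()):
            return False                   # (I) too many jobs in parallel
        if any(T[j] < r[j] or any(T[j] < C[i] for i in dpred[j]) for j in T):
            return False                   # (II) release date / precedence
        for i in T:                        # (IIIa) at most one favored succ.
            succ = [j for j in T if i in dpred[j]]
            if sum(1 for j in succ if c[i, j] > 0 and T[j] < C[i] + c[i, j]) > 1:
                return False
        for j in T:                        # (IIIb) at most one favored pred.
            if sum(1 for i in dpred[j] if c[i, j] > 0 and T[j] < C[i] + c[i, j]) > 1:
                return False
        return True

    S = dict(S)
    for j in sorted(S, key=S.get):
        for t in range(S[j]):              # earliest feasible slot wins
            trial = {**S, j: t}
            if feasible(trial):
                S = trial
                break
    return S
```

For instance, if the list-scheduling rule placed both successors of a job $a$ at time 2 because neither was favored in $S^\infty$, the pass moves one of them forward to time 1 (making it favored after all), while the other must stay behind the communication delay.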

4 The LP Relaxation

In the previous section, we introduced an algorithm that needs a feasible schedule on an unrestricted number of processors as input. We now propose an LP relaxation of the considered scheduling problem that provides such a schedule $S^\infty$. It can be seen as a combination of the LP relaxations of Munier and König [MK93] and of Hanen and Munier [HM95] for scheduling to minimize the makespan subject to small communication delays but with an unrestricted number of processors, and the LP relaxation of Schulz [Sch95, Sch96] (see also [HSSW96]) for scheduling to minimize the total weighted completion time subject to release dates and precedence constraints (and a given number $m$ of processors). We use variables $C_j$, which denote the completion time of job $j$, and 0/1 variables $x_{jk}$ for jobs $j \to k$ to indicate whether $k$ is the favored successor of $j$ ($x_{jk} = 1$) or not ($x_{jk} = 0$). The relaxation is as follows:

$$\text{minimize} \quad \sum_{j=1}^{n} w_j C_j \qquad (4.1)$$

subject to

$$\sum_{j \in A} p_j C_j \ge \frac{1}{2m} \Big( \sum_{j \in A} p_j \Big)^2 + \frac{1}{2} \sum_{j \in A} p_j^2 \qquad \text{for all } A \subseteq N, \qquad (4.2)$$

$$C_j \ge r_j + p_j \qquad \text{for all } j \in N, \qquad (4.3)$$

$$C_k \ge C_j + p_k + c_{jk}\,(1 - x_{jk}) \qquad \text{for all } j \to k, \qquad (4.4)$$

$$\sum_{k \in Succ^!(j)} x_{jk} \le 1 \qquad \text{for all } j \in N, \qquad (4.5)$$

$$\sum_{j \in Pred^!(k)} x_{jk} \le 1 \qquad \text{for all } k \in N, \qquad (4.6)$$

$$x_{jk} \ge 0 \qquad \text{for all } j \to k. \qquad (4.7)$$

For our purposes, it is important to note that this linear program is solvable in polynomial time. Since Schulz [Sch95, Sch96] showed that the separation problem associated with the inequalities (4.2) is solvable in polynomial time, this is indeed possible by use of the ellipsoid method [GLS88]. Henceforth, we denote by $(C^{LP}, x^{LP})$ an optimal LP solution. The following lemma is due to Schulz [Sch96].

Lemma 4.1. Let $C \in \mathbb{R}^N$ be a point satisfying the inequalities (4.2). Let $A \subseteq N$ and $j \in A$, and assume that $C_k \le C_j$ for all jobs $k \in A$. Then $C$ also satisfies $\frac{\sum_{k \in A} p_k}{m} \le 2\,C_j$.

Given a solution $(C, x)$ to the linear programming relaxation (4.1)-(4.7), it is easy to obtain a solution $(C^\infty, x^\infty)$ that is feasible when the limited number of processors is neglected: if $x_{jk} > \frac{1}{2}$, we set $x^\infty_{jk} := 1$, and we define $x^\infty_{jk} := 0$ otherwise. For each job $j$, it follows from inequality (4.5) that at most one variable $x_{jk}$ satisfies $x_{jk} > \frac{1}{2}$. Hence, every job $j$ has at most one favored successor $k$ in the schedule $C^\infty$. (And every job has at most one favored predecessor, of course.) The completion times of the implied schedule are then recursively defined by setting

$$C^\infty_k := \begin{cases} r_k + p_k & \text{if } Pred(k) = \emptyset, \\[4pt] \max\Big( r_k + p_k,\ \displaystyle\max_{j \in Pred^!(k)} C^\infty_j + p_k + (1 - x^\infty_{jk})\, c_{jk} \Big) & \text{otherwise.} \end{cases} \qquad (4.8)$$
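The rounding step and the recursion (4.8) are straightforward to implement once an LP solution is given. A sketch in Python (our own illustration; names are hypothetical, and jobs are assumed to be supplied in topological order):

```python
def round_and_schedule(x_lp, p, r, c, dpred, topo):
    """Round the LP variables x_jk and evaluate recursion (4.8).

    x_lp[j, k]: LP value for each direct precedence j -> k; topo: the jobs
    in topological order.  Returns (C_inf, x_inf).
    """
    # by LP constraints (4.5)/(4.6), at most one x-value per job exceeds 1/2
    x = {e: (1 if v > 0.5 else 0) for e, v in x_lp.items()}
    C = {}
    for k in topo:
        C[k] = r[k] + p[k]
        for j in dpred[k]:
            # favored predecessor (x = 1): no communication delay is paid
            C[k] = max(C[k], C[j] + p[k] + (1 - x[j, k]) * c[j, k])
    return C, x
```

For example, with $a \to b$, $a \to c$, unit times and delays, and LP values $x_{ab} = 0.8$, $x_{ac} = 0.2$, job $b$ becomes the favored successor and completes at time 2, while $c$ pays the delay and completes at time 3.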

The following theorem extends a result of Hanen and Munier [HM95] for the makespan.

Theorem 4.2. Let $(C, x)$ be a point that satisfies the inequality system (4.3)-(4.7). Let $C^\infty$ be the schedule defined by (4.8). Then, for each $j = 1, \dots, n$,

$$C^\infty_j \le \frac{2(1+\rho)}{2+\rho}\, C_j \le \frac{4}{3}\, C_j,$$

where ρ is the upper bound on the ratio of communication delays to locally relevant processing times as defined earlier. Theorem 4.2 leads directly to performance guarantees for the schedule S∞ obtained by rounding a solution of the LP–relaxation defined by (4.1) and (4.3) – (4.7).

Corollary 4.3. For both problems, $P\infty|prec, r_j, c_{ij}|C_{max}$ and $P\infty|prec, r_j, c_{ij}|\sum w_j C_j$, there exists a $\frac{2(1+\rho)}{2+\rho}$-approximation algorithm. Moreover, the ratio of the optimum for these problems to the optimum value of the respective LP defined by (4.1) and (4.3)-(4.7) is bounded by the same factor.

5 Minimizing the Makespan

Minimizing the makespan on a restricted number of processors is NP-hard, even for unit processing times, unit-time communication delays, and trees as precedence constraints [LVV96]. We derive in this section a simple $\frac{7}{3}$-approximation algorithm for arbitrary precedence constraints, arbitrary processing times, release dates, and locally small communication delays, and a 2-approximation algorithm for the case without (non-trivial) release dates and with unit-time communication delays, if, in addition, the precedence constraints form a series-parallel order.

5.1 The General Case

In this subsection, we show how the previous results can be used to derive the best known approximation algorithm for $P|prec, r_j, c_{ij}|C_{max}$. First, we introduce a dummy job which is a successor of all maximal jobs in the given partial order $\Theta$; the respective communication delays are zero. Then we solve the LP relaxation (4.1)-(4.7), where all jobs have weight zero with the exception of the new job, which obtains weight one. We then apply the generalized list scheduling algorithm to the schedule $C^\infty$ defined by (4.8) (and with an arbitrary priority list). Theorem 3.1 implies that the resulting schedule $C^H$ satisfies $C^H_{max} \le C^\infty_{max} + \frac{\sum_{i \in N} p_i}{m}$. Since the last term is clearly a lower bound on the makespan, we obtain with Corollary 4.3 the following performance guarantee.

Theorem 5.1. For $P|prec, r_j, c_{ij}|C_{max}$, there exists a $\big(1 + \frac{2(1+\rho)}{2+\rho}\big)$-approximation algorithm. Notice that $1 + \frac{2(1+\rho)}{2+\rho} \le \frac{7}{3}$.
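The reduction of the makespan objective to a weighted completion time objective via the dummy job can be sketched as follows. This is our own illustration with hypothetical names; in particular, giving the dummy job processing time zero is our assumption, not something the paper makes explicit:

```python
def makespan_as_weighted_completion(jobs, dpred, p, r, c):
    """Append a dummy sink job succeeding all maximal jobs, with zero
    processing time and zero communication delays.  With weight 1 on the
    sink and 0 elsewhere, sum(w_j * C_j) equals the makespan."""
    maximal = [i for i in jobs if all(i not in dpred[j] for j in jobs)]
    sink = 'sink'                       # hypothetical name for the dummy job
    jobs = jobs + [sink]
    dpred = {**dpred, sink: set(maximal)}
    p = {**p, sink: 0}
    r = {**r, sink: 0}
    c = {**c, **{(i, sink): 0 for i in maximal}}
    w = {j: 0 for j in jobs}
    w[sink] = 1
    return jobs, dpred, p, r, c, w
```

The transformed instance can then be fed to the LP relaxation (4.1)-(4.7), and the optimal LP value lower-bounds the optimal makespan.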

It follows from Lemma 3.2 that this bound can be improved to $\big(\frac{7}{3} - \frac{4}{3m}\big)$ in the absence of (non-trivial) release dates and in the case of unit-time jobs and zero-one communication delays. This is exactly the performance guarantee that Hanen and Munier [HM95] derived for the same problem but with small communication delays. However, both the description and the analysis of our algorithm seem to be simpler.

5.2 Scheduling Series-Parallel Orders on an Unrestricted Number of Processors

We now show how to construct an optimal schedule for an unrestricted number of processors in the case of series-parallel precedence relations and unit-time communication delays. This means $\alpha = 1$ in our master algorithm. Applying the generalized list scheduling algorithm to the resulting schedule $S^\infty$ together with an arbitrary priority list $L$ yields a performance guarantee of 2 (or even $(2 - \frac{1}{m})$ in the case of unit-time jobs) when only $m$ machines are available. This is, perhaps surprisingly, the same performance guarantee as the best known for the same problem without communication delays.

Let us first recall some definitions and notation. A partial order $\Theta$ is called series-parallel if it can be obtained recursively from singletons by two operations, the series composition $\Theta_1 * \Theta_2$ and the parallel composition $\Theta_1 \cup \Theta_2$ of two (series-parallel) suborders $\Theta_1, \Theta_2$. According to this definition, every series-parallel order can be described by a (not necessarily unique) binary decomposition tree whose nodes correspond to series-parallel sub-orders, called the (series or parallel) blocks of the composition. For details, we refer to [Möh89].

We now compute the schedule $S^\infty$ along a binary decomposition tree in a bottom-up fashion. In doing so, we must consider how to obtain the starting times of the jobs in a parallel or series composition from the already computed starting times of the jobs in the corresponding two parallel or series blocks. It turns out that the sub-schedules for the different blocks have to be chosen carefully, since a combination of two optimal sub-schedules for blocks $B_1$ and $B_2$ need not give an optimal schedule for the series composition $B = B_1 * B_2$. (Consider $\{a\} * (\{b_1\} \cup \{b_2\}) * \{c\}$ as an example.) To capture this formally, we introduce the following notation. For a given schedule $S_B$ on a block $B$, define $left(S_B)$ as the number of jobs that start at time 0 in $S_B$, and $right(S_B)$ as the number of jobs completed at time $C_{max}(S_B)$. We will only consider schedules $S$ with $left(S) > 0$ and $right(S) > 0$.
These values capture the different ways in which schedules of two blocks can be put together in a parallel or series composition. The crucial case arises when one has to compose schedules $S_1, S_2$ of two series blocks $B_1, B_2$, and there are two or more jobs in the last time slot of $S_1$ or in the first time slot of $S_2$. In this case, one must insert an extra time slot for the communication delays. The construction along the decomposition tree is such that this bad situation is avoided as much as possible. The problem of finding an optimal schedule for a series-parallel order $\Theta$ is thus reduced to the problem of finding an optimal composition of feasible schedules for the blocks of $\Theta$. To this end, however, not only optimal schedules have to be considered; it may be necessary to select sub-optimal schedules for some of the blocks. Constructing the schedule from left to right, the decision whether to take an optimal schedule $S_B$ with two or more jobs in the first time slot or a sub-optimal schedule with exactly one first job cannot be made locally when block $B$ is considered, but only when a schedule for $B$ is used in the next series or parallel composition involving $B$. So we have to store both possibilities in order to choose the right one in the later computation. A schedule $S$ on a block $B$ of the decomposition tree is called strongly optimal if $S$ has minimum makespan among all feasible schedules and $right(S)$ is minimum among all optimal schedules.

The algorithm follows the binary series-parallel decomposition tree in a bottom-up fashion, starting with all singletons, i.e., blocks $B$ that consist of only one job. For each block $B$, the algorithm maintains a strongly optimal schedule $S_B$ and a strongly optimal schedule $\bar{S}_B$ among all schedules with $left(\bar{S}_B) = 1$. Our main result is that the knowledge of these two schedules for two blocks $B_1, B_2$ is sufficient to construct the schedules $S_B$ and $\bar{S}_B$ for the block $B$ resulting from a composition of $B_1$ and $B_2$.

For two feasible schedules $S'$ and $S''$ on two blocks $B_1$ and $B_2$, let $S' \cup S''$ denote the straightforward parallel composition of the schedules $S'$ and $S''$: $(S' \cup S'')_j$ equals $S'_j$ if $j \in B_1$, and $S''_j$ if $j \in B_2$. Clearly, $S' \cup S''$ is a feasible schedule on $B = B_1 \cup B_2$. Similarly, $(S' * S'')_j$ is defined as $S'_j$ if $j \in B_1$, and as $C_{max}(S') + S''_j$ if $j \in B_2$. Notice that $(S' * S'')$ does not necessarily respect the communication delays between jobs of $B_1$ and jobs of $B_2$. For a schedule $S$, let $(S + 1)$ denote the schedule obtained by $(S + 1)_j = S_j + 1$.

With this notation, we can describe the algorithm as follows. For each block $B$, select the combination of the possible schedules $(S_1, \bar{S}_1)$ and $(S_2, \bar{S}_2)$ stored for the children blocks $B_1$ and $B_2$ of $B$ that leads to a strongly optimal schedule $S_B$ and a strongly optimal schedule $\bar{S}_B$ with left interface 1, respectively. In the case of a parallel composition $B = B_1 \cup B_2$, $S_B = S_1 \cup S_2$ is the best combination, while for $\bar{S}_B$, either $\bar{S}_1 \cup (S_2 + 1)$ or $\bar{S}_2 \cup (S_1 + 1)$ can be the best choice. For a series composition $B = B_1 * B_2$, the situation is much simpler. If $right(S_1) \ge 2$, an additional time slot has to be introduced between the schedules on $B_1$ and $B_2$; hence, $S_B = S_1 * (S_2 + 1)$ is strongly optimal. In the case that $right(S_1) = 1$, the schedule $S_B = S_1 * \bar{S}_2$ is the best choice. For $\bar{S}_B$, just consider $\bar{S}_1$ instead of $S_1$ and proceed as before. A thorough analysis of the described algorithm yields the following theorem.

Theorem 5.2. For $P\infty|prec = \text{series-parallel}, c_{ij} = 1|C_{max}$, the above algorithm constructs an optimal schedule in linear time.
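To illustrate the bottom-up computation, here is a simplified sketch of our own (hypothetical names, unit processing times assumed) that tracks only the profile (makespan, left, right) of the two maintained schedules $S_B$ and $\bar{S}_B$, choosing among the compositions described above by makespan first and $right()$ second:

```python
def sp_makespan(block):
    """Return profiles (cmax, left, right) of a strongly optimal schedule S
    and of a strongly optimal schedule with left == 1, for a series-parallel
    block given as 'job' | ('P', b1, b2) | ('S', b1, b2).
    Unit processing times and unit communication delays assumed."""
    key = lambda prof: (prof[0], prof[2])   # makespan first, then right()

    def par(a, b):                          # side-by-side parallel composition
        cmax = max(a[0], b[0])
        right = (a[2] if a[0] > b[0] else b[2] if b[0] > a[0] else a[2] + b[2])
        return (cmax, a[1] + b[1], right)

    def shift(a):                           # same schedule, delayed one slot
        return (a[0] + 1, a[1], a[2])

    if isinstance(block, str):              # singleton job
        return (1, 1, 1), (1, 1, 1)
    op, b1, b2 = block
    S1, Sb1 = sp_makespan(b1)
    S2, Sb2 = sp_makespan(b2)
    if op == 'P':
        S = par(S1, S2)
        # for the left == 1 schedule, one of the blocks starts one slot later
        c1, c2 = par(Sb1, shift(S2)), par(Sb2, shift(S1))
        Sbar = min((c1[0], 1, c1[2]), (c2[0], 1, c2[2]), key=key)
    else:                                   # series composition
        def series(first):
            # always possible: insert an extra slot for communication
            opts = [(first[0] + 1 + S2[0], first[1], S2[2])]
            if first[2] == 1:               # single last job: no gap needed
                opts.append((first[0] + Sb2[0], first[1], Sb2[2]))
            return min(opts, key=key)
        S, Sbar = series(S1), series(Sb1)
    return S, Sbar
```

On the example $\{a\} * (\{b_1\} \cup \{b_2\}) * \{c\}$ from above, the sketch reports makespan 4: only one of $b_1, b_2$ can be the favored successor of $a$, so the other starts a slot later and delays $c$ in turn.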

In the case of forests and single-source/single-target series-parallel orders, the occurring interfaces $left(S), right(S)$ are much simpler. In the case of (in-)forests, $right(S) = 1$ for all sub-schedules $S$, and hence also zero-one communication delays can be treated (see [MS96]). For single-source/single-target series-parallel orders, for each block a feasible schedule contains exactly one job in its first and its last time slot, respectively. The presented approach can be extended to 0/1 communication delays if all communication delays are locally identical, i.e., one of the following conditions holds:

– $c_{i,j} = c_{i,k}$ for all jobs $i, j, k \in V$ with $i \to j$, $i \to k$,
– $c_{i,j} = c_{k,j}$ for all jobs $i, k, j \in V$ with $i \to j$, $k \to j$,

or if $\Theta$ is a single-source/single-target series-parallel order, i.e., a series-parallel order where each block contains exactly one minimal and one maximal element. In contrast, the problem with arbitrary 0/1 communication delays is NP-hard.

Theorem 5.3. The problem $P\infty|prec = \text{series-parallel}, p_j = 1, c_{ij} \in \{0,1\}|C_{max} \le 6$ is NP-complete.

Proof. The proof follows from a modification of the proof by Hoogeveen, Veltman, and Lenstra [HVL94] for the problem $P\infty|prec, p_j = 1, c_{ij} = 1|C_{max} \le 6$. We omit the details here.

Consequently, for both problems, $P\infty|prec = \text{series-parallel}, p_j = 1, c_{ij} \in \{0,1\}|C_{max}$ and $P\infty|prec, p_j = 1, c_{ij} = 1|C_{max}$, there does not exist an approximation algorithm with performance guarantee better than $\frac{7}{6}$, unless P = NP. Notice, however, that for both problems the best algorithms we are aware of are $\frac{4}{3}$-approximation algorithms (see Corollary 4.3).

5.3 A 2-Approximation Algorithm for Series-Parallel Orders

Since, for series-parallel precedence constraints and unit-time communication delays, minimizing the makespan on an unrestricted number of processors is easy, the application of the generalized list scheduling algorithm to an optimal solution to this problem yields, by Lemma 3.2, the following result.

Theorem 5.4. For $P|prec = \text{series-parallel}, c_{ij} = 1|C_{max}$, let $S^\infty$ be an optimal schedule for the corresponding problem with an unrestricted number of processors. If $S^\infty$ serves as input for the generalized list scheduling algorithm, then its performance guarantee is 2. It is $(2 - \frac{1}{m})$ if all processing times are one.

A similar approach leads to a $(2 - \frac{1}{m})$-approximation algorithm for minimizing the makespan on identical parallel processors when there are zero-one communication delays and the precedence constraints form an out-forest. The computed solution does, at the same time, not differ by more than $\frac{m-1}{2}$ from the optimal one [MS96]. Lenstra, Veldhorst, and Veltman [LVV96] showed that the latter problem is NP-hard, even if we have both unit-time communication delays and unit processing times.

6 Minimizing the Average Weighted Completion Time

In this section, we discuss the problem of minimizing the average weighted completion time or, equivalently, the total weighted completion time on identical parallel processors subject to locally small communication delays. First, we use the LP relaxation of this problem to construct a feasible schedule for the corresponding problem without restrictions on the number of processors. Second, we use this solution to apply the generalized list scheduling algorithm to P|prec, r_j, p_j = 1, c_jk ∈ {0,1}| ∑ w_j C_j. The use of this algorithm for arbitrary processing times and locally small communication delays requires a more intricate decomposition of the LP solution into intervals. This is described in the second subsection.

6.1 Unit-Time Jobs

In this and the next subsection, we show how to use the LP relaxation to derive approximation algorithms for identical parallel processor problems with locally small communication delays and release dates. We start with the somewhat simpler case of unit-time jobs and zero-one communication delays, i.e., we assume in this subsection that p_j = 1 for all jobs j ∈ N and that c_jk ∈ {0,1} for all j ≺ k. The reason is that we can then directly apply the generalized list scheduling algorithm, which does not work well with arbitrary processing times; this leads to much better performance guarantees.

We now define the approximation algorithm, called CD-NATURAL. Let (C^LP, x^LP) be an optimal solution to the LP relaxation (4.1) – (4.7) and index the jobs such that C_1^LP ≤ C_2^LP ≤ … ≤ C_n^LP. This defines the list L which serves, together with the schedule C^∞ defined by (4.8) (with respect to C^LP), as the input of the generalized list scheduling algorithm. The output C^H of this algorithm is the output of CD-NATURAL, too.

The following observation is crucial for the analysis of CD-NATURAL.

Lemma 6.1. Let L = (1, …, n) be the list of all jobs in N which is used by the algorithm CD-NATURAL. Let C^H be the output of this algorithm, and let C^∞ denote the schedule defined by (4.8) with respect to C^LP. Then, for each job j = 1, …, n,

    C_j^H ≤ C_j^∞ + (∑_{i=1}^{j} p_i)/m.    (6.1)
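The skeleton of CD-NATURAL is simply to order the jobs by their LP completion times and hand this list to the generalized list scheduler. The sketch below shows only that ordering step; lp_completion stands in for the C^LP values of relaxation (4.1) – (4.7), and solving the LP itself is omitted.

```python
# Sketch of the list-building step of CD-NATURAL: index the jobs so that
# C_1^LP <= C_2^LP <= ... <= C_n^LP.  Ties may be broken arbitrarily.
def cd_natural_list(lp_completion):
    """lp_completion: dict mapping each job to its LP completion time
    C_j^LP.  Returns the job list L in nondecreasing C^LP order."""
    return sorted(lp_completion, key=lp_completion.get)
```

The returned list L, together with the infinite-processor schedule C^∞, is then the input of the generalized list scheduling algorithm.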

Proof. Notice that, in contrast to inequality (3.2), the numerator of the second term on the right-hand side of (6.1) only sums over the jobs that precede job j in the input list L. This is caused by the unit processing times and can be seen by a careful look at the proof of Theorem 3.1.

Theorem 6.2. For P|prec, r_j, p_j = 1, c_ij ∈ {0,1}| ∑ w_j C_j, let w(CD-NATURAL) be the value of the schedule produced by algorithm CD-NATURAL, and let w(OPT) be the value of an optimal schedule. Then

    w(CD-NATURAL) ≤ (10/3) w(OPT).

In case of identical release dates, the performance guarantee is actually 10/3 − 4/(3m).

Proof. The proof follows directly from Lemma 6.1, Lemma 4.1, and Theorem 4.2.

We should mention that the best known approximation algorithm for the same problem but without communication delays (P|prec, r_j, p_j = 1| ∑ w_j C_j) has a performance guarantee of 3 [Sch95, HSSW96], i.e., it is not much better. Indeed, the difference of 1/3 can be attributed to the fact that optimal scheduling without communication delays on an unrestricted number of processors can be done efficiently.

6.2 The General Case

In this section, we present the first approximation algorithm with constant worst-case performance guarantee for scheduling to minimize the average weighted completion time on identical parallel processors, for arbitrary processing times, release dates, precedence constraints, and locally small communication delays. Our algorithm combines the ideas presented before with the idea of partitioning the jobs according to an optimal LP solution and then scheduling the resulting job subsets in successive intervals of geometrically increasing size. The latter idea was successfully used to design approximation algorithms for scheduling problems without communication delays; see [HSSW96, CPS+96].

So let (C^LP, x^LP) denote an optimal solution to the linear programming problem (4.1) – (4.7). Our algorithm, which we call CD-PARTITION, works as follows. Given the LP solution, it partitions the time horizon into intervals. Define τ_0 = 0 and τ_ℓ = γ 2^ℓ, ℓ = 1, …, T, where γ ∈ [0.5, 1] will be determined later. We partition the time horizon into intervals (τ_{ℓ−1}, τ_ℓ], ℓ = 1, …, T, where T is chosen so that each C_j^LP is contained in some interval. Accordingly, we partition the jobs into sets such that, for each ℓ = 1, …, T, the subset N_ℓ is the set of jobs j for which C_j^LP lies within (τ_{ℓ−1}, τ_ℓ]. We now construct disjoint schedules for each set N_ℓ by applying the generalized list scheduling algorithm to each set, but such that communication delays between jobs of successive sets are obeyed. That is, we redefine the availability of minimal jobs j in N_ℓ such that the communication delays from their direct predecessors in N_{ℓ−1} are taken into account. To be more precise, we set τ̄_ℓ = δ τ_{ℓ+1} + ∑_{k=1}^{ℓ} t_k for ℓ = 1, …, T, where t_k = (∑_{i∈N_k} p_i)/m and δ = 2(1+ρ)/(2+ρ). We schedule the job set N_ℓ between τ̄_{ℓ−1} and τ̄_ℓ, ℓ = 1, …, T.

For each job j ∈ N_ℓ it follows from a refinement of the analysis of the generalized list scheduling algorithm that its completion time C_j^H is bounded by τ̄_{ℓ−1} + t_ℓ + C_j^∞. In fact, the completion time C_j^H can be bounded by a chain Γ of jobs in N_1 ∪ … ∪ N_ℓ that ends with job j, obtained by successively considering the predecessor a of the current job b as in the proof of Theorem 3.1. Let Γ stop if a job k ∈ N_α for some α ≤ ℓ is reached such that S_k^H = τ̄_{α−1}. From the proof of Theorem 3.1 we obtain

    C_j^H ≤ τ̄_{α−1} + ∑_{ν=α}^{ℓ} t_ν + C_j^∞ ≤ δ τ_α + ∑_{ν=1}^{ℓ} t_ν + C_j^∞ ≤ τ̄_{ℓ−1} + t_ℓ + C_j^∞.
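The interval step of CD-PARTITION can be sketched as follows: τ_0 = 0, τ_ℓ = γ·2^ℓ, and job j goes to set N_ℓ when C_j^LP falls in (τ_{ℓ−1}, τ_ℓ]. The function below is illustrative only; it assumes all C_j^LP are positive and chooses T just large enough to cover the largest LP completion time.

```python
import math

# Sketch of CD-PARTITION's geometric interval partition.
# lp_completion: dict job -> C_j^LP (assumed > 0); gamma in [0.5, 1].
def partition_jobs(lp_completion, gamma):
    c_max = max(lp_completion.values())
    # smallest T with gamma * 2**T >= max C_j^LP
    T = max(1, math.ceil(math.log2(c_max / gamma)))
    tau = [0.0] + [gamma * 2 ** l for l in range(1, T + 1)]
    sets = {l: [] for l in range(1, T + 1)}
    for j, c in lp_completion.items():
        for l in range(1, T + 1):
            if tau[l - 1] < c <= tau[l]:   # c lies in (tau_{l-1}, tau_l]
                sets[l].append(j)
                break
    return tau, sets
```

The sets N_ℓ are then scheduled one after another by the generalized list scheduling algorithm, with the shifted interval limits τ̄_ℓ accounting for cross-interval communication delays.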

Thus, it follows from the definition of the interval limits and Lemma 4.1 that the completion time of job j ∈ N_ℓ is bounded by (4 + 2δ) τ_{ℓ−1} + C_j^∞. This leads to a (4 + 3δ)-approximation algorithm when we choose γ = 0.5. But we can choose γ better by incorporating an idea of Goemans and Kleinberg [GK96], as in the absence of communication delays (see [CPS+96]). Let B_j denote the start of the interval in which C_j^LP occurs, i.e., B_j = τ_{ℓ−1}. If we choose γ at random, B_j becomes a random variable. We choose X uniformly in the interval [0, 1] and set γ = 2^{−X}. A routine calculation shows that the expected value of B_j is equal to C_j^LP/(2 ln 2). This implies that the expected ratio between the total weighted completion time of the schedule determined by algorithm CD-PARTITION and the optimum is at most (2 + δ)/ln 2 + δ < 6.14232. It is quite simple to derandomize this algorithm. This leads to the following result.

Theorem 6.3. There is a 6.14232-approximation algorithm for minimizing the average weighted completion time on identical parallel processors subject to precedence constraints, release dates, and locally small communication delays. Moreover, the ratio between the optimum and the value obtained from the LP relaxation (4.1) – (4.7) is at most 6.14232, too.
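The constant 6.14232 can be reproduced numerically. This check assumes the garbled expression for δ in the text reads δ = 2(1+ρ)/(2+ρ), which for locally small delays with ρ ≤ 1 is maximized at δ = 4/3, and that the expected ratio is (2 + δ)/ln 2 + δ.

```python
import math

# Evaluate the bound (2 + delta)/ln 2 + delta at delta = 4/3, the
# largest value of 2(1 + rho)/(2 + rho) for rho in [0, 1]
# (reconstruction of the garbled formula, not verified from the source).
delta = 4.0 / 3.0
ratio = (2 + delta) / math.log(2) + delta   # approx 6.1423
```

The match with the stated guarantee supports the reconstruction.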

7 Concluding Remarks

In the analysis of the approximation algorithms for the average weighted completion time, we have actually shown that C_j^H ≤ (α + β) C_j^LP for each job j. This implies that these algorithms can be used in a straightforward manner to approximate other objective functions as well. For instance, we obtain (α + β)-approximation algorithms to minimize the weighted sum of the makespan and the average weighted completion time, or (α + β)²-approximation algorithms to minimize the weighted sum of the squares of the completion times. Moreover, the ratios between the respective optima and the corresponding LP values are again the same.

Furthermore, we have shown in Lemma 3.2 that the performance guarantee of generalized list scheduling can be improved by an additive term of −1/m in the case of unit-time jobs. This remains true for arbitrary processing times and locally small communication delays, with the obvious implications for the performance guarantees of our algorithms. The details will be given in the full version of this paper [MSS96].

References

[CP95] P. Chrétienne and C. Picouleau, Scheduling with communication delays: a survey, in: Scheduling Theory and its Applications (P. Chrétienne, E. G. Coffman Jr, J. K. Lenstra, and Z. Liu, eds.), John Wiley & Sons, 1995, pp. 65–90.

[CPS+96] S. Chakrabarti, C. A. Phillips, A. S. Schulz, D. B. Shmoys, C. Stein, and J. Wein, Improved scheduling algorithms for minsum criteria, 1996. To appear in Lecture Notes in Computer Science, Springer, Proceedings of the 23rd ICALP Conference.

[GK96] M. X. Goemans and J. Kleinberg, An improved approximation ratio for the minimum latency problem, Proceedings of the 7th ACM–SIAM Symposium on Discrete Algorithms, 1996.

[GLS88] M. Grötschel, L. Lovász, and A. Schrijver, Geometric Algorithms and Combinatorial Optimization, Algorithms and Combinatorics, vol. 2, Springer, Berlin, 1988.

[Gra66] R. L. Graham, Bounds for certain multiprocessing anomalies, Bell System Tech. J. 45 (1966), 1563–1581.

[HM95] C. Hanen and A. Munier, An approximation algorithm for scheduling dependent tasks on m processors with small communication delays, Preprint, Laboratoire Informatique Théorique et Programmation, Institut Blaise Pascal, Université Pierre et Marie Curie, 1995.

[HSSW96] L. A. Hall, A. S. Schulz, D. B. Shmoys, and J. Wein, Scheduling to minimize average completion time: Off-line and on-line approximation algorithms, Preprint 516/1996, Department of Mathematics, University of Technology, Berlin, Germany, 1996. Submitted. Available from ftp://ftp.math.tu-berlin.de/pub/Preprints/combi/Report-516-1996.ps.Z.

[HSW96] L. A. Hall, D. B. Shmoys, and J. Wein, Scheduling to minimize average completion time: Off-line and on-line algorithms, Proceedings of the 7th ACM–SIAM Symposium on Discrete Algorithms, 1996, pp. 142–151.

[HVL94] J. A. Hoogeveen, B. Veltman, and J. K. Lenstra, Three, four, five, six, or the complexity of scheduling with communication delays, Operations Research Letters 16 (1994), 129–137.

[Law93] E. L. Lawler, Scheduling trees on multiprocessors with unit communication delays, presented at the First Workshop on Models and Algorithms for Planning and Scheduling Problems, unpublished manuscript, June 1993.

[LVV96] J. K. Lenstra, M. Veldhorst, and B. Veltman, The complexity of scheduling trees with communication delays, Journal of Algorithms 20 (1996), 157–173.

[MK93] A. Munier and J.-C. König, A heuristic for a scheduling problem with communication delays, Preprint 871, Laboratoire de Recherche en Informatique, Université de Paris, France, 1993. To appear in Operations Research, 1996.

[Möh89] R. H. Möhring, Computationally tractable classes of ordered sets, in: Algorithms and Order (I. Rival, ed.), NATO Advanced Study Institutes Series, D. Reidel Publishing Company, Dordrecht, 1989, pp. 105–193.

[MS96] R. H. Möhring and M. W. Schäffter, A simple approximation algorithm for scheduling forests with unit processing times and zero-one communication delays, Preprint 506/1996, University of Technology, Berlin, 1996. Available from ftp://ftp.math.tu-berlin.de/pub/Preprints/combi/Report-506-1995.ps.Z.

[MSS96] R. H. Möhring, M. W. Schäffter, and A. S. Schulz, Scheduling jobs with communication delays: Using infeasible solutions for approximation, Preprint 517/1996, Department of Mathematics, University of Technology, Berlin, Germany, 1996.

[Pic95] C. Picouleau, New complexity results on scheduling with small communication delays, Discrete Applied Mathematics 60 (1995), 331–342.

[PSW95] C. Phillips, C. Stein, and J. Wein, Scheduling jobs that arrive over time, Proceedings of the Fourth Workshop on Algorithms and Data Structures, Lecture Notes in Computer Science, vol. 955, Springer, Berlin, 1995, pp. 86–97.

[RS87] V. J. Rayward-Smith, UET scheduling with unit interprocessor communication delays, Discrete Applied Mathematics 18 (1987), 55–71.

[Sch95] A. S. Schulz, Polytopes and scheduling, Ph.D. thesis, University of Technology, Berlin, Germany, 1995.

[Sch96] A. S. Schulz, Scheduling to minimize total weighted completion time: Performance guarantees of LP-based heuristics and lower bounds, in: Integer Programming and Combinatorial Optimization (W. H. Cunningham, S. T. McCormick, and M. Queyranne, eds.), Lecture Notes in Computer Science, vol. 1084, Springer, Berlin, 1996, pp. 301–315. Proceedings of the 5th International IPCO Conference.

[Ver95] J. Verriet, Scheduling UET, UCT dags with release dates and deadlines, Preprint UU-CS-1995-31, Department of Computer Science, Utrecht University, 1995.

[VLL90] B. Veltman, B. Lageweg, and J. K. Lenstra, Multiprocessor scheduling with communication delays, Parallel Computing 16 (1990), 173–182.