Improved Results for Data Migration and Open Shop Scheduling∗

Rajiv Gandhi†    Magnús M. Halldórsson‡    Guy Kortsarz†    Hadas Shachnai§

Abstract  The data migration problem is to compute an efficient plan for moving data stored on devices in a network from one configuration to another. We consider this problem with the objective of minimizing the sum of completion times of all storage devices. It is modeled by a transfer graph, where vertices represent the storage devices, and the edges indicate the data transfers required between pairs of devices. Each vertex has a non-negative weight, and each edge has a release time and a processing time. A vertex completes when all the edges incident on it complete, under the constraint that two edges incident on the same vertex cannot be processed simultaneously. The objective is to minimize the sum of weighted completion times of all vertices. Kim (Proc. ACM-SIAM Symposium on Discrete Algorithms, 97–98, 2003) gave a 9-approximation algorithm for the problem when edges have arbitrary processing times and are released at time 0. We improve Kim's result by giving a 5.06-approximation algorithm. We also address the open shop scheduling problem, O|r_j|∑ w_j C_j, and show that it is a special case of the data migration problem. Queyranne and Sviridenko (Journal of Scheduling, 5:287–305, 2002) gave a 5.83-approximation algorithm for the non-preemptive version of the open shop problem. They state as an obvious open question whether there exists an algorithm for open shop scheduling that gives a performance guarantee better than 5.83. Our 5.06-approximation algorithm for data migration proves the existence of such an algorithm. Crucial to our improved result is a property of the linear programming relaxation for the problem. Similar linear programs have been used for various other scheduling problems; our technique may be useful in obtaining improved results for these problems as well.

Key Words and Phrases: Data migration, open shop, scheduling, linear programming, LP rounding, approximation algorithms.

∗ A preliminary version of this paper appears in the proceedings of the 31st International Colloquium on Automata, Languages and Programming (ICALP'04).
† Department of Computer Science, Rutgers University, Camden, NJ 08102. E-mail: {rajivg,guyk}@camden.rutgers.edu.
‡ Department of Computer Science, University of Iceland, IS-107 Reykjavik, Iceland. E-mail: [email protected].
§ Department of Computer Science, The Technion, Haifa 32000, Israel. E-mail: [email protected]. Part of this work was done while the author was on leave at Bell Laboratories, Lucent Technologies, 600 Mountain Ave., Murray Hill, NJ 07974.


1 Introduction

The data migration problem arises in large storage systems, such as Storage Area Networks [13], where a dedicated network of disks is used to store multimedia data. As the data access pattern changes over time, the load across the disks needs to be rebalanced so as to continue providing efficient service. This is done by computing a new data layout and then "migrating" data to convert the initial data layout to the target data layout. While migration is being performed, the storage system runs suboptimally; therefore, it is important to compute a data migration schedule that converts the initial layout to the target layout quickly. The problem can be modeled by a transfer graph [14], in which the vertices represent the storage disks and an edge between two vertices u and v corresponds to a data object that must be transferred from u to v, or vice versa. Each edge has a processing time (or length) that represents the transfer time of a data object between the disks corresponding to the endpoints of the edge. An important constraint is that any disk can be involved in at most one transfer at any time.

Several variations of the data migration problem have been studied. These variations arise either from different objective functions or from additional constraints. One common objective function is to minimize the makespan of the migration schedule, i.e., the time by which all migrations complete. Coffman et al. [6] introduced this problem. They showed that when edges may have arbitrary lengths, a class of greedy algorithms yields a 2-approximation to the minimum makespan. In the special case where the edges have equal (unit) lengths, the problem reduces to edge coloring of the transfer (multi)graph of the system, for which an asymptotic approximation scheme is now known [21]. Hall et al. [9] studied the data migration problem with unit edge lengths and capacity constraints; that is, the migration schedule must respect the storage constraints of the disks. The paper gives a simple 3/2-approximation algorithm for the problem. The papers [9, 1] also present approximation algorithms for the makespan minimization problem with the following constraints: (i) data can only be moved, i.e., no new copies of a data object can be created, (ii) additional nodes can assist in data transfers, and (iii) each disk has a unit of spare storage. Khuller et al. [13] solved a more general problem, where each data object can also be copied; they gave a constant factor approximation algorithm for the problem.

Another objective function is to minimize the average completion time over all data migrations. This corresponds to minimizing the average edge completion time in the transfer graph. For the case of unit edge lengths, Bar-Noy et al. [3] showed that the problem is NP-hard and gave a simple 2-approximation algorithm. For arbitrary edge lengths, Halldórsson et al. [11] gave a 12-approximation algorithm for the problem. This was improved to 10 by Kim [14].

In this paper, we study the data migration problem with the objective of minimizing the average completion time over all storage disks. Indeed, this objective favors the individual storage devices, which are often geographically distributed over a large network. It is therefore natural to try to minimize the average amount of time that each of these (independent) devices is involved in the migration process. For the case where vertices have arbitrary weights and the edges have unit length, Kim [14] proved that the problem is NP-hard and showed that Graham's list scheduling algorithm [7], when guided by an optimal solution to a linear programming relaxation, gives an approximation ratio of 3. She also gave a 9-approximation algorithm for the case where edges have arbitrary lengths. We show that the analysis of the 3-approximation algorithm is tight, and for the case where edges have release times and arbitrary lengths, we give a 5.06-approximation algorithm.

A problem related to the data migration problem is non-preemptive open shop scheduling, denoted by O|r_j|∑ w_j C_j in the standard three-field notation [15]. In this problem, we have a set of jobs, J, and a set of machines M_1, ..., M_m. Each job J_j ∈ J consists of a set of m operations: o_{j,i} has processing time p_{j,i} and must be processed on M_i, 1 ≤ i ≤ m. Each machine can process a single operation at any time, and two operations that belong to the same job cannot be processed simultaneously. Also, each job J_j has a positive weight, w_j, and a release time, r_j, which means that no operation of J_j can start before r_j. The objective is to minimize the sum of weighted completion times of all jobs. This problem is MAX-SNP hard [12]. Chakrabarti et al. [5] gave a (5.78 + ε)-approximation algorithm for the case where the number of machines, m, is some fixed constant. They also gave a (2.89 + ε)-approximation algorithm for the preemptive version of the problem with a fixed number of machines. For an arbitrary number of machines, Queyranne and Sviridenko [19] presented algorithms that yield approximation factors of 5.83 and 3 for the non-preemptive and preemptive versions of the problem, respectively. The approximation factor for the preemptive version was subsequently improved to (2 + ε) by the same authors [18].

Our Contribution  Since the open shop scheduling problem is a special case of the data migration problem, all of our positive results for data migration apply to open shop scheduling. Note that the MAX-SNP hardness of the data migration problem follows from the MAX-SNP hardness of open shop scheduling [12]. Our main result is a 5.06-approximation algorithm for the data migration problem with arbitrary edge lengths. Our algorithm is based on rounding a solution of a linear programming (LP) relaxation of the problem. The general idea of our algorithm is inspired by the work of Halldórsson et al. [11], in that the edges have to wait before they are actually processed (i.e., before data transfer begins). Even though the high-level idea is similar, there are subtle differences that are crucial to the improved results that we present here. Our method combines solutions obtained by using two different wait functions. It is interesting to note that each solution (when all edges are released at time 0) is a 5.83-approximate solution, which is the approximation ratio obtained by Queyranne and Sviridenko [19]. In their algorithm, artificial precedence constraints are defined, after which a schedule is obtained by a greedy algorithm; in our work, an artificial wait function for the edges (each edge has to wait for some time before being processed) is defined, after which a schedule is obtained using a greedy approach. To obtain an approximation ratio better than 5.83, we crucially use a property of the LP relaxation that we prove in Lemma 3.1. Although this LP relaxation has been used earlier [23, 17, 20, 10, 14, 19], we are not aware of any previous work that uses such a property of the LP. Our technique may be useful for deriving improved results for other shop scheduling problems.

For the case where edges have unit lengths, we show, by giving a tight example, that the list scheduling analysis of Kim [14] is tight. This illustrates the limitations of the LP relaxation. Finally, we study the open shop problem under the operation completion time criterion (cf. [19]); that is, we sum the completion times of all operations of every job. For the special case of unit length operations with arbitrary non-negative weights, we show that an algorithm of [11] yields a 1.796-approximation algorithm for the problem.

2 Relation of Data Migration and Open Shop Scheduling

In this section, we formally state the data migration and open shop scheduling problems and show that the latter is a special case of the former.

Data Migration Problem  We are given a graph G = (V, E). The vertices represent storage devices, and the edges correspond to data transmissions among the devices. We denote by E(u) the set of edges incident on a vertex u. Each vertex v has a weight w_v and processing time 0. Each edge e has a length, or processing time, p_e. Moreover, each edge e can be processed only after its release time r_e. All release times and processing times are non-negative integers. The completion time of an edge is simply the time at which its processing is completed. Each vertex v can complete only after all the edges in E(v) are completed. Since each vertex v has processing time 0, the completion time, C_v, of v is the latest completion time of any edge in E(v). The crucial constraint is that two edges incident on the same vertex cannot be processed at the same time. The objective is to minimize ∑_{v∈V} w_v C_v.

Open Shop Scheduling Problem  We are given a set of jobs J = {J_1, ..., J_n}, to be scheduled on a set of machines M = {M_1, ..., M_m}. Each job J_j has a non-negative weight w_j; also, J_j consists of a set of m operations o_{j,1}, ..., o_{j,m}, with corresponding processing times p_{j,i}, 1 ≤ i ≤ m; the operation o_{j,i} must be processed on the machine M_i. Each machine can process at most one operation at any time, and no two operations belonging to the same job can be processed simultaneously. The completion time C_j of each job J_j is the latest completion time of any of its operations. The objective is to minimize ∑_{J_j∈J} w_j C_j.

The open shop scheduling problem is a special case of the data migration problem, as shown by the following reduction. Given an instance of the open shop scheduling problem, construct a bipartite graph B = (J, M, F) as follows. Each vertex j_ℓ ∈ J represents a job J_ℓ ∈ J, and each vertex m_i ∈ M represents a machine M_i ∈ M. The edge (j_ℓ, m_i) ∈ F with processing time p_{ℓ,i} corresponds to the operation o_{ℓ,i}, 1 ≤ i ≤ m. Assign w_{m_i} = 0 to each vertex m_i ∈ M, and w_{j_ℓ} = w_ℓ (i.e., the weight of the job J_ℓ) to each vertex j_ℓ ∈ J. It is now easy to verify that any data migration schedule for B is a valid solution for the corresponding open shop problem. In the remainder of the paper, we consider only the data migration problem, with the understanding that all of our results apply to open shop scheduling.
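The reduction is easy to implement. The following Python sketch (ours, for illustration only; the container names OpenShopInstance and TransferGraph are hypothetical) constructs the transfer graph B = (J, M, F) from an open shop instance. Release times, which the reduction above leaves implicit, are carried over from the jobs to their edges.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical containers for the two problems; names are illustrative only.
@dataclass
class OpenShopInstance:
    weights: List[float]       # w_j, one weight per job
    proc: List[List[int]]      # proc[j][i] = p_{j,i}, processing time of operation o_{j,i}
    release: List[int]         # r_j, one release time per job

@dataclass
class TransferGraph:
    vertex_weight: Dict[str, float] = field(default_factory=dict)
    # each edge is (u, v, processing time, release time)
    edges: List[Tuple[str, str, int, int]] = field(default_factory=list)

def open_shop_to_data_migration(inst: OpenShopInstance) -> TransferGraph:
    """Build the bipartite transfer graph B = (J, M, F) of Section 2."""
    n, m = len(inst.proc), len(inst.proc[0])
    g = TransferGraph()
    for j in range(n):                     # job vertices j_l carry the job weights
        g.vertex_weight[f"j{j}"] = inst.weights[j]
    for i in range(m):                     # machine vertices m_i get weight 0
        g.vertex_weight[f"m{i}"] = 0.0
    for j in range(n):
        for i in range(m):
            # edge (j_l, m_i) corresponds to operation o_{l,i};
            # its release time is taken to be the job's release time (our assumption)
            g.edges.append((f"j{j}", f"m{i}", inst.proc[j][i], inst.release[j]))
    return g
```

Any feasible migration schedule for the resulting graph obeys both open shop constraints, since operations sharing a machine and operations sharing a job both appear as edges sharing a vertex.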

3 A Linear Programming Relaxation

The linear programming relaxation for the data migration problem (without release times) was given by Kim [14]. Such relaxations have been proposed earlier by Wolsey [23] and Queyranne [17] for single machine scheduling problems and by Schulz [20] and Hall et al. [10] for parallel machines and flow shop problems. For the sake of completeness, we state below the LP relaxation for the data migration problem. For an edge e (vertex v), let C_e (C_v) be the variable that represents the completion time of e (resp., v) in the LP relaxation. For any set of edges S ⊆ E, let p(S) = ∑_{e∈S} p_e and p(S²) = ∑_{e∈S} p_e².

(LP)  minimize  ∑_{v∈V} w_v C_v                                                (1)

      subject to:
          C_v ≥ r_e + p_e,                            ∀v ∈ V, e ∈ E(v)         (2)
          C_v ≥ p(E(v)),                              ∀v ∈ V                   (3)
          C_v ≥ C_e,                                  ∀v ∈ V, e ∈ E(v)         (4)
          ∑_{e∈S(v)} p_e C_e ≥ (p(S(v))² + p(S(v)²))/2,  ∀v ∈ V, S(v) ⊆ E(v)   (5)
          C_e ≥ 0,                                    ∀e ∈ E                   (6)
          C_v ≥ 0,                                    ∀v ∈ V.                  (7)

The set of constraints represented by (2), (3), and (4) is due to the different lower bounds on the completion time of a vertex. The justification for constraints (5) is as follows. By the problem definition, no two edges incident on the same vertex can be scheduled at the same time. Consider any ordering of the edges in S(v) ⊆ E(v). If an edge e ∈ S(v) is the j-th edge to be scheduled among the edges in S(v) then, setting C_j = C_e and p_j = p_e, we get

    ∑_{j=1}^{|S(v)|} p_j C_j ≥ ∑_{j=1}^{|S(v)|} p_j ∑_{k=1}^{j} p_k = ∑_{j=1}^{|S(v)|} ∑_{k=1}^{j} p_j p_k = (p(S(v))² + p(S(v)²))/2.

Although there are exponentially many constraints, the above LP can be solved in polynomial time via the ellipsoid algorithm [17].
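As a concrete illustration (ours, not part of the original paper), the following Python sketch checks constraints (2)–(5) for a candidate fractional solution. For constraint (5) it only examines, for each vertex, the prefix sets obtained by sorting the incident edges in non-decreasing order of C_e; this is a natural separation-style heuristic and is not claimed to detect every violated set.

```python
def check_lp_solution(E_of, C_edge, C_vertex, p, r, tol=1e-9):
    """Partial feasibility check for the LP of Section 3.

    E_of[v]   : list of edges incident on vertex v
    C_edge[e] : candidate value of C_e;  C_vertex[v] : candidate value of C_v
    p[e], r[e]: processing time and release time of edge e
    Constraint (5) is only checked on prefix sets in non-decreasing C_e order,
    so a True answer is a necessary, not a sufficient, condition.
    """
    for v, inc in E_of.items():
        # (2) C_v >= r_e + p_e and (4) C_v >= C_e for every incident edge
        for e in inc:
            if C_vertex[v] + tol < r[e] + p[e] or C_vertex[v] + tol < C_edge[e]:
                return False
        # (3) C_v >= p(E(v))
        if C_vertex[v] + tol < sum(p[e] for e in inc):
            return False
        # (5) on prefixes of the incident edges sorted by C_e
        lhs, p_sum, p_sq = 0.0, 0.0, 0.0
        for e in sorted(inc, key=lambda e: C_edge[e]):
            lhs += p[e] * C_edge[e]
            p_sum += p[e]
            p_sq += p[e] ** 2
            if lhs + tol < (p_sum ** 2 + p_sq) / 2:
                return False
    return True
```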

3.1 A Property of the LP

In this section, we state and prove a property of the LP that plays a crucial role in the analysis of our algorithm. Let X(v, t_1, t_2) ⊆ E(v) denote the set of edges that complete in the time interval (t_1, t_2] in the LP solution (namely, their fractional value belongs to this interval). Hall et al. [10] showed that p(X(v, 0, t)) ≤ 2t. In Lemma 3.1 we prove a stronger property of a solution given by the above LP. Intuitively, our property states that if too many edges complete early, then the completion times of other edges must be delayed. For example, as a consequence of our property, for any t > 0, if p(X(v, 0, t/2)) = t then p(X(v, t/2, t)) = 0, which means that no edge in E(v) \ X(v, 0, t/2) completes before t in the LP solution. We now formally state and prove the lemma.

Lemma 3.1  Consider a vertex v and times t_1 > 0 and t_2 ≥ t_1. If p(X(v, 0, t_1)) = λ_1 and p(X(v, t_1, t_2)) = λ_2, then λ_1 and λ_2 are related by

    λ_2 ≤ t_2 − λ_1 + √(t_2² − 2λ_1(t_2 − t_1)).

Proof: Using constraint (5) of the LP relaxation for vertex v with S(v) = X(v, 0, t_2), and the fact that C_e ≤ t_1 for every e ∈ X(v, 0, t_1) and C_e ≤ t_2 for every e ∈ X(v, t_1, t_2), we get

    p(X(v, 0, t_2))²/2 ≤ ∑_{e∈X(v,0,t_1)} p_e C_e + ∑_{e∈X(v,t_1,t_2)} p_e C_e ≤ p(X(v, 0, t_1)) t_1 + p(X(v, t_1, t_2)) t_2.

Therefore,

    (λ_1 + λ_2)² ≤ 2λ_1 t_1 + 2λ_2 t_2,
    (λ_1 + λ_2 − t_2)² ≤ t_2² − 2λ_1(t_2 − t_1),
    λ_2 ≤ t_2 − λ_1 + √(t_2² − 2λ_1(t_2 − t_1)).

The following result of [10] follows from Lemma 3.1 by substituting t_1 = 0, λ_1 = 0, and t_2 = t.

Corollary 3.2  For any vertex v and time t ≥ 0, p(X(v, 0, t)) ≤ 2t.
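The algebraic step from (λ_1 + λ_2)² ≤ 2λ_1 t_1 + 2λ_2 t_2 to the bound of Lemma 3.1 can be sanity-checked numerically; the short script below (ours, purely illustrative) samples random values and asserts the implication.

```python
import math
import random

random.seed(0)
for _ in range(100_000):
    t1 = random.uniform(0.01, 10.0)
    t2 = random.uniform(t1, 20.0)
    l1 = random.uniform(0.0, 2 * t2)
    l2 = random.uniform(0.0, 2 * t2)
    if (l1 + l2) ** 2 <= 2 * l1 * t1 + 2 * l2 * t2:      # hypothesis of the step
        bound = t2 - l1 + math.sqrt(max(0.0, t2 ** 2 - 2 * l1 * (t2 - t1)))
        assert l2 <= bound + 1e-9, (t1, t2, l1, l2)
print("no counterexample found")
```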

4 Algorithm

Note that if an edge has processing time 0, it can be processed as soon as it is released, without consuming any time steps. Hence, without loss of generality, we assume that the processing time of each edge is a positive integer.

The algorithm is parameterized by a wait function W : E → R_+. The idea is that each edge e must wait for W_e (W_e ≥ r_e) time steps before it can actually start processing. The algorithm processes the edges (that have waited enough time) in non-decreasing order of their completion times in the LP solution. When e is being processed, we say that e is active. Once it becomes active, it remains active for p_e time steps, after which it is finished. A not-yet-active edge can be waiting only if none of its neighboring edges are active; otherwise, it is said to be delayed. Thus, at any time, an edge is in one of four modes: delayed, waiting, active, or finished. When adding new active edges, among those that have done their waiting duty, the algorithm uses the LP completion time as priority. The precise rules are given in the pseudocode in Fig. 1. Let wait(e, t) denote the number of time steps that e has waited until the end of time step t, and let Active(t) be the set of active edges during time step t. Let C̃_e (C̃_u) be the completion time of edge e (vertex u) in our algorithm. The algorithm in Fig. 1, implemented as is, would run in pseudopolynomial time; however, it is easy to implement it in strongly polynomial time, by increasing t in each iteration by the smallest remaining processing time of an active edge. One property of our processing rules that distinguishes them from the wait functions used in [11] for the sum of edge completion times is that multiple edges touching the same vertex can wait at the same time.

We run the algorithm for two different wait functions W and choose the better of the two solutions. For any vertex v (edge e), let C*_v (C*_e) be its completion time in the optimal LP solution. In the first wait function, for each edge e we choose W_e = ⌊β_1 C*_e⌋, β_1 ≥ 1; in the second, we choose W_e = ⌊β_2 max{r_e, p(S_e(u)), p(S_e(v))}⌋, where S_e(u) = {f | f ∈ E(u), C*_f ≤ C*_e} and β_2 ≥ 1. Note that the choice of the wait functions ensures that edges become active only after they are released. When all release times are 0, it suffices to choose any β_1 > 0 and β_2 > 0.
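For concreteness, here is an illustrative Python rendering (ours) of the pseudopolynomial time-step simulation described above and made precise in Fig. 1; the input format (a list of edges with endpoints, processing times, and LP completion times, plus a wait value per edge) is an assumption made for this sketch.

```python
def schedule(edges, W):
    """Greedy schedule of Section 4 (pseudopolynomial time-step simulation).

    edges : list of dicts with keys 'u', 'v', 'p' (length), 'Cstar' (LP completion time);
            list indices identify the edges.
    W     : W[e] = prescribed waiting time of edge e (assumed to be at least r_e).
    Returns C_tilde[e], the completion time of each edge in the schedule.
    """
    n = len(edges)
    order = sorted(range(n), key=lambda e: edges[e]['Cstar'])  # LP-value priority
    wait = [0] * n                       # wait(e, t) of Fig. 1
    remaining = [edges[e]['p'] for e in range(n)]
    active, finished = set(), set()
    busy = set()                         # vertices touched by a currently active edge
    C_tilde = [None] * n
    t = 0
    while len(finished) < n:
        t += 1
        # edges that completed their p_e active steps finish at the end of step t-1
        done = {e for e in active if remaining[e] == 0}
        for e in done:
            C_tilde[e] = t - 1
            finished.add(e)
            busy.discard(edges[e]['u'])
            busy.discard(edges[e]['v'])
        active -= done
        # activate edges that have waited enough, scanning in LP order
        for e in order:
            if e in active or e in finished:
                continue
            u, v = edges[e]['u'], edges[e]['v']
            if u not in busy and v not in busy and wait[e] == W[e]:
                active.add(e)
                busy.add(u)
                busy.add(v)
        # remaining edges are either waiting (free neighborhood) or delayed
        for e in range(n):
            if e in active or e in finished:
                continue
            u, v = edges[e]['u'], edges[e]['v']
            if u not in busy and v not in busy:
                wait[e] += 1             # e is waiting
        # one unit of processing for each active edge
        for e in active:
            remaining[e] -= 1
    return C_tilde
```

With W_e = ⌊β_1 C*_e⌋ or W_e = ⌊β_2 max{r_e, p(S_e(u)), p(S_e(v))}⌋ this realizes the two wait functions above; the strongly polynomial variant mentioned in the text would advance t by the smallest remaining processing time of an active edge instead of by one.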

5 Analysis

Consider a vertex x and an edge e = (x, y), and recall that C*_x and C*_e are their completion times in the optimal LP solution. Let B_e(x) = {f | f ∈ E(x), C*_f > C*_e, C̃_f < C̃_e}, i.e., the edges in E(x) that finish after e in the LP solution, but finish before e in our algorithm. Recall that S_e(x) = {f | f ∈ E(x), C*_f ≤ C*_e}; note that e ∈ S_e(x). Let S̄_e(x) = S_e(x) \ {e}. By constraint (3), we have p(S̄_e(x)) + p(B_e(x)) ≤ C*_x.

Schedule(G = (V, E), W)
    Solve the LP relaxation for the given instance optimally.
    t ← 0;  Finished ← Active(t) ← ∅
    for each e ∈ E do wait(e, t) ← 0
    while Finished ≠ E do
        t ← t + 1
        Active(t) ← {e | e ∈ Active(t − 1) and e ∉ Active(t − p_e)}
        for each edge e ∈ Active(t − 1) \ Active(t) do
            C̃_e ← t − 1;  Finished ← Finished ∪ {e}              // e is finished
        for each edge e = (u, v) ∈ E \ (Active(t) ∪ Finished), in non-decreasing order of LP completion time, do
            if Active(t) ∩ (E(u) ∪ E(v)) = ∅ and wait(e, t − 1) = W_e then
                Active(t) ← Active(t) ∪ {e}                       // e becomes active
        for each edge e = (u, v) ∈ E \ (Active(t) ∪ Finished) do
            if Active(t) ∩ (E(u) ∪ E(v)) = ∅ then
                wait(e, t) ← wait(e, t − 1) + 1                   // e is waiting
            else wait(e, t) ← wait(e, t − 1)                      // e is delayed
    return C̃

Figure 1: Algorithm for Data Migration

We analyze our algorithm separately for the two wait functions defined in Section 4. In each case, we analyze the completion time of an arbitrary but fixed vertex u ∈ V. Without loss of generality, let e_u = (u, v) be the edge that finishes last among the edges in E(u). By constraint (4), we have C*_{e_u} ≤ C*_u; since the LP solution we are considering is optimal, we can assume that C*_{e_u} = C*_u. We analyze our algorithm for the case where all edges in S_{e_u}(u) ∪ S_{e_u}(v) finish before e_u in our algorithm; if this is not the case, our results can only improve. Let

    p(S_{e_u}(v)) = λ_{e_u} C*_{e_u} = λ_{e_u} C*_u,   0 ≤ λ_{e_u} ≤ 2.    (8)

The upper bound on λ_{e_u} follows from Corollary 3.2. For ease of reference, Table 1 summarizes the key notation used in the rest of this section. Lemma 5.1 gives an upper bound on the completion time of any vertex in our algorithm, for any wait function. Lemmas 5.2 and 5.3 use Lemma 5.1 to give upper bounds for the two wait functions.


Symbol          Description
C*_e (C*_v)     completion time of edge e (vertex v) in the LP solution
C̃_e (C̃_u)      completion time of edge e (vertex u) in our algorithm
p(Z)            ∑_{e∈Z} p_e
r_e             release time of edge e
E(u)            set of edges incident on vertex u
S_e(u)          {f | f ∈ E(u) and C*_f ≤ C*_e}
S̄_e(u)          S_e(u) \ {e}
B_e(u)          {f | f ∈ E(u), C*_f > C*_e and C̃_f < C̃_e}, i.e., edges in E(u) that finish
                after e in the LP solution, but before e in our algorithm
W_e             waiting time of edge e
e_u             edge that finishes last in our algorithm among all edges in E(u)

Table 1: Summary of notation

Lemma 5.1  C̃_u ≤ W_{e_u} + C*_u + p(S̄_{e_u}(v)) + p(B_{e_u}(v)).

Proof: The edge e_u waits for W_{e_u} time steps and is active for p_{e_u} time steps, and whenever it is delayed, some edge in S̄_{e_u}(u) ∪ B_{e_u}(u) ∪ S̄_{e_u}(v) ∪ B_{e_u}(v) must be active. Hence, we have

    C̃_{e_u} ≤ W_{e_u} + p(S_{e_u}(u)) + p(B_{e_u}(u)) + p(S̄_{e_u}(v)) + p(B_{e_u}(v)).

Since p(S_{e_u}(u)) + p(B_{e_u}(u)) ≤ p(E(u)) ≤ C*_u by constraint (3), it follows that

    C̃_u ≤ W_{e_u} + C*_u + p(S̄_{e_u}(v)) + p(B_{e_u}(v)).

Define

    f(β_1, λ) = β_1 + 1 + (β_1 + 1)/β_1 + √( ((β_1 + 1)/β_1)² − 2λ/β_1 ).

Lemma 5.2  If W_{e_u} = ⌊β_1 C*_{e_u}⌋, then C̃_u ≤ f(β_1, λ_{e_u}) C*_u.

Proof: Let e_b ∈ B_{e_u}(v) be the edge with the largest completion time in the LP solution among all the edges in B_{e_u}(v). Note that when e_b is in waiting mode, either e_u is waiting or an edge in S̄_{e_u}(u) ∪ B_{e_u}(u) is active (i.e., e_u is delayed). Thus, we get W_{e_b} ≤ W_{e_u} + p(S̄_{e_u}(u)) + p(B_{e_u}(u)). Hence, we have that

    ⌊β_1 C*_{e_b}⌋ ≤ ⌊β_1 C*_{e_u}⌋ + p(S_{e_u}(u)) − p_{e_u} + p(B_{e_u}(u)).

Since p_{e_u} ≥ 1, it follows that β_1 C*_{e_b} − 1 ≤ β_1 C*_u + C*_u − 1, and C*_{e_b} ≤ ((β_1 + 1)/β_1) C*_u.

Note that in the LP solution, each edge in S_{e_u}(v) finishes at or before C*_u, and each edge in B_{e_u}(v) finishes after C*_u and at or before C*_{e_b}. Thus, substituting t_1 = C*_u, t_2 = ((β_1 + 1)/β_1) C*_u, λ_1 = p(S_{e_u}(v)) = λ_{e_u} C*_u, and λ_2 = p(B_{e_u}(v)) in Lemma 3.1, we get

    p(B_{e_u}(v)) ≤ [ (β_1 + 1)/β_1 − λ_{e_u} + √( ((β_1 + 1)/β_1)² − 2λ_{e_u}((β_1 + 1)/β_1 − 1) ) ] C*_u
                 = [ (β_1 + 1)/β_1 − λ_{e_u} + √( ((β_1 + 1)/β_1)² − 2λ_{e_u}/β_1 ) ] C*_u.

Then, using (8), we have that

    p(S_{e_u}(v)) + p(B_{e_u}(v)) ≤ [ (β_1 + 1)/β_1 + √( ((β_1 + 1)/β_1)² − 2λ_{e_u}/β_1 ) ] C*_u.

The lemma now follows from Lemma 5.1 and the fact that C*_{e_u} = C*_u.

Define

    h(β_2, λ) = (β_2 + 1) max{1, λ} + (β_2 + 1)/β_2.

Lemma 5.3  If W_{e_u} = ⌊β_2 max{r_{e_u}, p(S_{e_u}(u)), p(S_{e_u}(v))}⌋, then C̃_u ≤ h(β_2, λ_{e_u}) C*_u.

Proof: By constraints (2) and (4), r_{e_u} ≤ C*_{e_u} = C*_u, and by constraint (3), p(S_{e_u}(u)) ≤ C*_u. Also, recall that p(S_{e_u}(v)) = λ_{e_u} C*_u, 0 ≤ λ_{e_u} ≤ 2. Hence,

    W_{e_u} ≤ ⌊β_2 max{C*_u, λ_{e_u} C*_u}⌋.                                      (9)

In the following we upper bound p(S̄_{e_u}(v)) + p(B_{e_u}(v)). Let z ∈ S̄_{e_u}(v) ∪ B_{e_u}(v) be the edge with the largest waiting time, i.e., W_z = max_{f ∈ S̄_{e_u}(v) ∪ B_{e_u}(v)} W_f. When z is in waiting mode, either e_u is waiting or an edge in S̄_{e_u}(u) ∪ B_{e_u}(u) is active. Thus, using (9), we get

    W_z ≤ W_{e_u} + p(S̄_{e_u}(u)) + p(B_{e_u}(u))
        ≤ ⌊β_2 max{C*_u, λ_{e_u} C*_u}⌋ + p(S_{e_u}(u)) − p_{e_u} + p(B_{e_u}(u))
        ≤ β_2 max{C*_u, λ_{e_u} C*_u} + C*_u − 1.                                 (10)

Let l be the edge with the largest completion time in the LP solution among the edges in S̄_{e_u}(v) ∪ B_{e_u}(v), i.e., C*_l = max_{f ∈ S̄_{e_u}(v) ∪ B_{e_u}(v)} C*_f. Since S̄_{e_u}(v) ∪ B_{e_u}(v) ⊆ S_l(v) and W_l ≤ W_z, we have

    ⌊β_2 (p(S̄_{e_u}(v)) + p(B_{e_u}(v)))⌋ ≤ ⌊β_2 · p(S_l(v))⌋ ≤ W_l ≤ W_z.        (11)

Combining (10) and (11), we get ⌊β_2 (p(S̄_{e_u}(v)) + p(B_{e_u}(v)))⌋ ≤ β_2 max{C*_u, λ_{e_u} C*_u} + C*_u − 1. Hence,

    p(S̄_{e_u}(v)) + p(B_{e_u}(v)) ≤ ( max{1, λ_{e_u}} + 1/β_2 ) C*_u.

Now, from Lemma 5.1 we have that

    C̃_u ≤ W_{e_u} + C*_u + ( max{1, λ_{e_u}} + 1/β_2 ) C*_u,

and from (9),

    C̃_u ≤ β_2 max{1, λ_{e_u}} C*_u + C*_u + ( max{1, λ_{e_u}} + 1/β_2 ) C*_u = h(β_2, λ_{e_u}) C*_u,

which gives the statement of the lemma.

Combining the two solutions  To obtain an improved performance ratio, we run our algorithm with each of the two wait functions and return the better result. By Lemmas 5.2 and 5.3, the cost of that solution is bounded by

    ALG ≤ min( ∑_{v∈V} w_v f(β_1, λ_v) C*_v , ∑_{v∈V} w_v h(β_2, λ_v) C*_v ).

This expression is unwieldy for performance analysis, but can fortunately be simplified. The following lemma allows us to assume, without loss of generality, that there are only two different λ-values: 0, or a constant λ̂ in the range [1, 2]. To analyze the performance ratio, we then numerically optimize a function of three variables: β_1, β_2, and λ̂.

We first introduce some notation. Let OPT* = ∑_{v∈V} w_v C*_v be the cost of the optimal LP solution. Partition V into V_0 = {v ∈ V : λ_v ∈ [0, 1]} and V_1 = {v ∈ V : λ_v ∈ (1, 2]}. Let w = (∑_{v∈V_0} w_v C*_v)/OPT* be the contribution of V_0 to the LP solution. Given an instance with values λ_v, LP solution vector C*_v, and the notation above, we define two functions: F(t) = w · f(β_1, 0) + (1 − w) · f(β_1, t) and H(t) = w · h(β_2, 0) + (1 − w) · h(β_2, t).

Lemma 5.4  There is a λ̂ ∈ [1, 2] such that ALG ≤ min(H(λ̂), F(λ̂)) · OPT*.

Proof: Since h(β_2, ·) is constant on [0, 1] and f(β_1, ·) is decreasing, h(β_2, λ_v) = h(β_2, 0) and f(β_1, λ_v) ≤ f(β_1, 0) for every v ∈ V_0. Let λ̂ = (∑_{v∈V_1} λ_v w_v C*_v)/(∑_{v∈V_1} w_v C*_v) be the average λ-value of the vertices in V_1, weighted by their contribution to the LP solution. Since h(β_2, ·) is linear on [1, 2], we have ∑_{v∈V_1} w_v h(β_2, λ_v) C*_v = ∑_{v∈V_1} w_v h(β_2, λ̂) C*_v, and since f(β_1, ·) is concave, ∑_{v∈V_1} w_v f(β_1, λ_v) C*_v ≤ ∑_{v∈V_1} w_v f(β_1, λ̂) C*_v. It follows that

    ALG ≤ ∑_{v∈V} w_v f(β_1, λ_v) C*_v ≤ ∑_{v∈V_0} w_v f(β_1, 0) C*_v + ∑_{v∈V_1} w_v f(β_1, λ̂) C*_v = F(λ̂) · OPT*.

The same type of bound holds for H, establishing the lemma.

The worst case occurs when the two bounds coincide, i.e., when F(λ̂) = H(λ̂); this determines the value of w from the other variables. Namely, defining g(x) = g_{β_1,β_2}(x) = h(β_2, x) − f(β_1, x), we have w = g(λ̂)/(g(λ̂) − g(0)). The performance ratio of the algorithm is then bounded by

    ρ ≤ max_{λ̂∈[1,2]} [ h(β_2, λ̂) + g(λ̂) · (h(β_2, 0) − h(β_2, λ̂)) / (g(λ̂) − g(0)) ].

We find the best choice of the parameters β_1 and β_2 numerically. When the release times can be non-zero, we need to ensure that no edge e begins executing before its release time. This is satisfied if the β-values are at least 1, ensuring that W_e ≥ r_e. Setting β_1 = 1.177 and β_2 = 1.0, the worst case is then attained at about λ̂ = 1.838, giving a ratio of ρ ≤ 5.0553.
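The numerical search is straightforward to reproduce. The following Python sketch (ours) evaluates the bound on ρ for given β_1 and β_2 by maximizing over λ̂ on a grid; with β_1 = 1.177 and β_2 = 1.0 it returns a value close to 5.0553.

```python
import math

def f(b1, lam):
    """f(beta_1, lambda) of Lemma 5.2."""
    return b1 + 1 + (b1 + 1) / b1 + math.sqrt(((b1 + 1) / b1) ** 2 - 2 * lam / b1)

def h(b2, lam):
    """h(beta_2, lambda) of Lemma 5.3."""
    return (b2 + 1) * max(1.0, lam) + (b2 + 1) / b2

def rho_bound(b1, b2, steps=200_000):
    """max over lambda_hat in [1,2] of h + g*(h(0)-h)/(g-g(0))."""
    def g(x):
        return h(b2, x) - f(b1, x)
    best = 0.0
    for i in range(steps + 1):
        lam = 1.0 + i / steps
        val = h(b2, lam) + g(lam) * (h(b2, 0.0) - h(b2, lam)) / (g(lam) - g(0.0))
        best = max(best, val)
    return best

print(round(rho_bound(1.177, 1.0), 4))   # approximately 5.0553
```

Evaluating rho_bound(1.125, 0.8) in the same way gives roughly 5.03, matching the zero-release-time bound stated after Theorem 5.5.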

[Plots omitted: (a) the functions h(λ) and f(λ); (b) the performance ratio as a function of λ̂.]

Figure 2: The functions h and f and the bound on the performance function, when β_1 = 1.177 and β_2 = 1.0

Theorem 5.5  There exists a 5.06-approximation algorithm for the data migration problem, as well as for the open shop scheduling problem.

When all release times are zero, we can widen the search to all non-zero values of β_1 and β_2. We then obtain a slightly improved ratio of 5.03, choosing β_1 = 1.125 and β_2 = 0.8.

6 Unit Processing Times

It is known that the open shop scheduling problem is polynomially solvable when all operations have unit length (cf. [2]). On the other hand, Kim [14] showed that the more general data migration problem is NP-hard already in the case of unit edge lengths. For such instances, where all edges are released at time 0, it is shown in [14] that Graham's list scheduling algorithm [7], guided by an optimal solution to the LP relaxation of Section 3, gives a 3-approximate solution; the algorithm is called Ordered List Scheduling (OLS). The problem of obtaining a better than 3-approximate solution remained open. In Section 6.1 we give a tight example showing that OLS cannot achieve a ratio better than 3. The tight example also illustrates the limitations of the LP solution.

For the sake of completeness, we describe the OLS algorithm and its analysis here. The edges are sorted in non-decreasing order of their completion times in the LP solution. At any time, an edge e = (u, v) is scheduled iff no edge in S_e(u) ∪ S_e(v) is scheduled at that time. (Recall that S_e(u) = {f | f ∈ E(u), C*_f ≤ C*_e}.) For any vertex u, if e_u is the edge that finishes last among the edges in E(u), and if C̃_u is the completion time of u in OLS, then C̃_u ≤ p(S_{e_u}(u)) + p(S_{e_u}(v)). Combining the fact that p(S_{e_u}(u)) ≤ C*_u with p(S_{e_u}(v)) ≤ 2C*_{e_u} ≤ 2C*_u (Corollary 3.2), we get C̃_u ≤ 3C*_u, and hence a 3-approximation ratio.
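For unit edge lengths, OLS is simply first-fit list scheduling in LP order; a compact Python sketch (ours, with ties in LP value broken arbitrarily) is:

```python
def ordered_list_scheduling(edges, C_star):
    """OLS for unit edge lengths: greedily place edges in non-decreasing
    order of their LP completion times, one unit-length slot per edge.

    edges  : list of (u, v) pairs
    C_star : C_star[e] = LP completion time of edge e
    Returns slot[e], the time slot (1, 2, ...) in which edge e is processed.
    """
    order = sorted(range(len(edges)), key=lambda e: C_star[e])
    slot = [None] * len(edges)
    busy = {}                          # vertex -> set of occupied time slots
    for e in order:
        u, v = edges[e]
        t = 1
        # earliest slot in which neither endpoint is already busy
        while t in busy.get(u, set()) or t in busy.get(v, set()):
            t += 1
        slot[e] = t
        busy.setdefault(u, set()).add(t)
        busy.setdefault(v, set()).add(t)
    return slot
```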

6.1 A tight example

Consider a tree rooted at vertex r. Let S = {s_1, s_2, ..., s_k} be the children of r. For each vertex s_i, let L_i = {l_1^i, l_2^i, ..., l_h^i} be the children of s_i, and let L = ∪_{i=1}^k L_i. Let k = (n + 1)/2 and h = n − 1. For each vertex u ∈ S, let w_u = ε; for each vertex v ∈ L, let w_v = 0; and let w_r = M. For each edge e, let C*_e = (n + 1)/2 be its completion time in the LP solution. For each vertex v ∈ L ∪ {r}, C*_v = (n + 1)/2, and for each vertex v ∈ S, C*_v = n. The completion times of the vertices in L do not matter, as the weights of all those vertices are zero. It is easy to verify that this is an optimal LP solution. The cost of the LP solution equals

    w_r (n + 1)/2 + ∑_{i=1}^{k} w_{s_i} n = M (n + 1)/2 + ε k n = M (n + 1)/2 + ε n(n + 1)/2.

OLS could process the edges in the following order. At any time t, 1 ≤ t ≤ n − 1, OLS processes all the edges in {(s_1, l_t^1), (s_2, l_t^2), ..., (s_k, l_t^k)}. At time t = n + z, 0 ≤ z < (n + 1)/2, OLS processes the edge (r, s_{z+1}). The cost of the OLS solution is at least

    w_r (n − 1 + (n + 1)/2) + ∑_{i=1}^{k} w_{s_i} n = M (3n − 1)/2 + ε n(n + 1)/2.

For large n, if M ≫ ε, the ratio of the cost of the OLS solution to the cost of the LP solution approaches 3.
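The two cost expressions above are easy to evaluate; the small script below (ours, purely illustrative) shows the ratio approaching 3 as n grows with M ≫ ε.

```python
def lp_cost(n, M, eps):
    return M * (n + 1) / 2 + eps * n * (n + 1) / 2

def ols_cost(n, M, eps):
    return M * (3 * n - 1) / 2 + eps * n * (n + 1) / 2

M, eps = 1e6, 1e-6
for n in (9, 99, 999):                 # n odd, so that k = (n+1)/2 is an integer
    print(n, round(ols_cost(n, M, eps) / lp_cost(n, M, eps), 4))
# ratios approach 3 as n grows and M >> eps
```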


6.2 Open shop and sum of operation completion times

Consider now the open shop problem where each operation has unit processing time and a non-negative weight, and the objective is to minimize the weighted sum of completion times of all operations. Even though the problem of minimizing the sum of job completion times is polynomially solvable for open shop scheduling, minimizing the weighted sum of completion times of unit operations is NP-hard. This holds even for uniform weights, as recently proved in [16].

We relate this problem to a result of [11] for the sum coloring problem. The input to sum coloring is a graph G, where each vertex corresponds to a unit length job. We need to assign a positive integer (color) to each vertex (job) so as to minimize the sum of the colors over all vertices, under the constraint that adjacent vertices receive distinct colors. In the weighted case, each vertex (job) is associated with a non-negative weight, and the goal is to minimize the weighted sum of the vertex colors. In the maximum k-colorable subgraph problem, we are given an undirected graph G and a positive integer k; we need to find a maximum size subset U ⊆ V such that G[U], the graph induced by U, is k-colorable. In the weighted version, each vertex has a non-negative weight and we seek a maximum weight k-colorable subgraph. The following theorem is proved in [11].

Theorem 6.1  The weighted sum coloring problem admits a 1.796-ratio approximation algorithm on graphs for which the maximum weight k-colorable subgraph problem is polynomially solvable.

We can relate this theorem to the above variant of the open shop problem by defining the bipartite graph B = (J, M, F) (see Section 2) and setting G = L(B), i.e., G is the line graph of B. Recall that in L(B) the vertices are the edges of B; two vertices are neighbors if the corresponding edges in B share a vertex. In order to apply Theorem 6.1, we need to show that the maximum weight k-colorable subgraph problem is polynomial on L(B). Note that this is the problem of finding a maximum weight collection of edges in B that is k-colorable (i.e., can be decomposed into k disjoint matchings in B). Observe that, on bipartite graphs, this problem is equivalent to the well-known weighted b-matching problem, in which we seek a maximum weight set of edges that induces a subgraph of maximum degree at most k. Recall that a bipartite graph always admits a matching touching every vertex of maximum degree (cf. [8]); it follows that the chromatic index of a bipartite graph is equal to its maximum degree, so any edge set of maximum degree at most k can indeed be decomposed into k matchings. Since weighted b-matching is solvable in polynomial time (cf. [4]), the same holds for the weighted k-colorable subgraph problem on L(B). Hence, we have shown:

Theorem 6.2  Open shop scheduling of unit jobs, under the weighted sum of operation completion times criterion, admits a 1.796-ratio approximation.
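The equivalence used above — on bipartite graphs, a maximum weight k-edge-colorable edge set coincides with a maximum weight edge set of maximum degree at most k — can be verified by brute force on tiny instances. The following sketch (ours, purely illustrative) does so for a small example.

```python
from itertools import combinations, product

def max_degree_ok(edge_subset, k):
    """Does the edge set induce a subgraph of maximum degree at most k?"""
    deg = {}
    for u, v in edge_subset:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return all(d <= k for d in deg.values())

def k_edge_colorable(edge_subset, k):
    """Brute force: can the edges be partitioned into k matchings?"""
    edge_list = list(edge_subset)
    for colors in product(range(k), repeat=len(edge_list)):
        used = set()
        ok = True
        for (u, v), c in zip(edge_list, colors):
            if (u, c) in used or (v, c) in used:
                ok = False
                break
            used.add((u, c))
            used.add((v, c))
        if ok:
            return True
    return False

def best_weight(edges, weights, k, feasible):
    """Maximum total weight of a feasible edge subset (exhaustive search)."""
    best = 0
    for size in range(len(edges) + 1):
        for subset in combinations(range(len(edges)), size):
            if feasible([edges[i] for i in subset], k):
                best = max(best, sum(weights[i] for i in subset))
    return best

# tiny bipartite example: jobs a, b, c vs. machines x, y
edges = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x"), ("c", "y")]
weights = [5, 1, 2, 4, 3, 3]
k = 2
print(best_weight(edges, weights, k, max_degree_ok),
      best_weight(edges, weights, k, k_edge_colorable))
# both print 15: the two optima coincide, as guaranteed on bipartite graphs
```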


Acknowledgments  The first author would like to thank Yoo-Ah Kim for introducing him to the data migration problem, and Samir Khuller, Yoo-Ah Kim, Aravind Srinivasan, and Chaitanya Swamy for useful discussions.

References

[1] E. Anderson, J. Hall, J. Hartline, M. Hobbes, A. Karlin, J. Saia, R. Swaminathan, and J. Wilkes. An Experimental Study of Data Migration Algorithms. Workshop on Algorithm Engineering, pages 145–158, 2001.
[2] P. Brucker. Scheduling Algorithms. 3rd edition, Springer Verlag, 2001.
[3] A. Bar-Noy, M. Bellare, M. M. Halldórsson, H. Shachnai, and T. Tamir. On Chromatic Sums and Distributed Resource Allocation. Information and Computation, 140:183–202, 1998.
[4] W. J. Cook, W. H. Cunningham, W. R. Pulleyblank, and A. Schrijver. Combinatorial Optimization. John Wiley and Sons, 1998.
[5] S. Chakrabarti, C. A. Phillips, A. S. Schulz, D. B. Shmoys, C. Stein, and J. Wein. Improved Scheduling Algorithms for Minsum Criteria. Proc. of the 23rd International Colloquium on Automata, Languages, and Programming, LNCS 1099, pages 646–657, 1996.
[6] E. G. Coffman, M. R. Garey, D. S. Johnson, and A. S. LaPaugh. Scheduling File Transfers. SIAM Journal on Computing, 14(3):744–780, 1985.
[7] R. Graham. Bounds for Certain Multiprocessing Anomalies. Bell System Technical Journal, 45:1563–1581, 1966.
[8] H. Gabow and O. Kariv. Algorithms for Edge Coloring Bipartite Graphs and Multigraphs. SIAM Journal on Computing, 11(1):117–129, 1982.
[9] J. Hall, J. Hartline, A. Karlin, J. Saia, and J. Wilkes. On Algorithms for Efficient Data Migration. Proc. of the 12th ACM-SIAM Symposium on Discrete Algorithms, pages 620–629, 2001.
[10] L. Hall, A. S. Schulz, D. B. Shmoys, and J. Wein. Scheduling to Minimize Average Completion Time: Off-line and On-line Approximation Algorithms. Mathematics of Operations Research, 22:513–544, 1997.
[11] M. M. Halldórsson, G. Kortsarz, and H. Shachnai. Sum Coloring Interval Graphs and k-Claw Free Graphs with Applications for Scheduling Dependent Jobs. Algorithmica, 37:187–209, 2003.
[12] H. Hoogeveen, P. Schuurman, and G. Woeginger. Non-approximability Results for Scheduling Problems with Minsum Criteria. Proc. of the 6th International Conference on Integer Programming and Combinatorial Optimization, LNCS 1412, pages 353–366, 1998.
[13] S. Khuller, Y. Kim, and Y. C. Wan. Algorithms for Data Migration with Cloning. Proc. of the 22nd ACM Symposium on Principles of Database Systems, pages 27–36, 2003.
[14] Y. Kim. Data Migration to Minimize the Average Completion Time. Proc. of the 14th ACM-SIAM Symposium on Discrete Algorithms, pages 97–98, 2003.
[15] E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys. Sequencing and Scheduling: Algorithms and Complexity. In S. C. Graves et al., eds., Handbooks in Operations Research and Management Science, Vol. 4: Logistics of Production and Inventory, pages 445–522, 1993.
[16] D. Marx. Complexity Results for Minimum Sum Edge Coloring. Manuscript, 2004. See http://www.cs.bme.hu/~dmarx/publications.html.
[17] M. Queyranne. Structure of a Simple Scheduling Polyhedron. Mathematical Programming, 58:263–285, 1993.
[18] M. Queyranne and M. Sviridenko. A (2 + ε)-Approximation Algorithm for the Generalized Preemptive Open Shop Problem with Minsum Objective. Journal of Algorithms, 45:202–212, 2002.
[19] M. Queyranne and M. Sviridenko. Approximation Algorithms for Shop Scheduling Problems with Minsum Objective. Journal of Scheduling, 5:287–305, 2002.
[20] A. S. Schulz. Scheduling to Minimize Total Weighted Completion Time: Performance Guarantees of LP-based Heuristics and Lower Bounds. Proc. of the 5th International Conference on Integer Programming and Combinatorial Optimization, LNCS 1084, pages 301–315, 1996.
[21] P. Sanders and D. Steurer. An Asymptotic Approximation Scheme for Multigraph Edge Coloring. Proc. of the 16th ACM-SIAM Symposium on Discrete Algorithms, 2005.
[22] V. G. Timkovsky. Identical Parallel Machines vs. Unit-Time Shops, Preemptions vs. Chains, and Other Offsets in Scheduling Complexity. Technical report, Dept. of Computer Science and Systems, McMaster University, Hamilton.
[23] L. Wolsey. Mixed Integer Programming Formulations for Production Planning and Scheduling Problems. Invited talk at the 12th International Symposium on Mathematical Programming, MIT, Cambridge, 1985.

