Journal of Combinatorial Optimization, 5, 233–247, 2001. © 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.
Approximation Algorithms for Bounded Facility Location Problems
PIOTR KRYSTA
ROBERTO SOLIS-OBA∗
Max-Planck-Institut für Informatik, Saarbrücken, Germany
Received December 1, 1999; Revised February 24, 2000; Accepted February 25, 2000
Abstract. The bounded k-median problem is to select in an undirected graph G = (V, E) a set S of k vertices such that the distance from any vertex v ∈ V to S is at most a given bound d and the average distance from the vertices in V\S to S is minimized. We present randomized algorithms for several versions of this problem and we prove some inapproximability results. We also study the bounded version of the uncapacitated facility location problem and present extensions of known deterministic algorithms for the unbounded version.
Keywords: facility location, approximation algorithms, randomized algorithms, clustering
1. Introduction
The bounded k-median problem is to select in an undirected graph G = (V, E) a set S of k vertices (called centers) such that the distance from any vertex v ∈ V\S to S is at most a given bound d and the average distance from the vertices in V\S to S is minimized. This is a natural generalization of the well-known k-median problem (Arora et al., 1998), in which it is desired to choose k centers so as to minimize the average distance from vertices to centers. Consider, for example, the following situation. In some city it is desired to locate a set of k fire departments so that the average distance that a fire crew must travel to get to any building in the city is minimized. Moreover, for safety reasons, it is also required that the maximum time needed to move a fire crew to any point in the city does not exceed a certain bound. Otherwise, when the fire crew finally reaches a fire site, the fire might have already destroyed everything there. Let G = (V, E) be a graph with a minimum dominating set of size k. It is not difficult to see that when all edge lengths are equal to 1 and d = 1, the bounded k-median problem is equivalent to the minimum dominating set problem. The k-center problem consists of choosing k centers so that the maximum distance from a vertex to its nearest center is minimized. The bounded k-median problem is also a generalization of the k-center problem. Another related problem is the bounded uncapacitated facility location problem. Here, given a graph G = (V, E), there is a set F ⊆ V of possible locations for service facilities.
∗ Present address: Department of Computer Science, The University of Western Ontario, London, ON N6A 5B7, Canada.
KRYSTA AND SOLIS-OBA
Each site i ∈ F has a cost $f_i$ for opening a service facility there. The cost of servicing a request from a vertex v ∈ V\F is the distance from v to its nearest facility. The placement of facilities must be such that the maximum cost of servicing a request is at most a given bound d. The goal is to find locations for the facilities so that the total cost of servicing the vertices v ∈ V\F plus the total cost of opening the facilities is minimized.
We can think of the above two problems as multi-criteria optimization problems (Lin and Xue, 1999; Marathe et al., 1998) in which it is desired to minimize the total cost of servicing the vertices, the maximum service cost, and (in the case of the bounded k-median problem) the number of centers. This is the view that we adopt in this paper, since we present algorithms that might use more than the specified number of centers or that violate the bound on the maximum service cost.
Although the bounded k-median problem has been studied before, all previous work focuses on exact algorithms that solve the problem in exponential time (Berman and Yang, 1991; Choi and Chaudhry, 1993; Chaudhry et al., 1995; Khumawala, 1973; Toregas et al., 1971). The k-median problem is a classical clustering problem with a large number of applications (see, e.g., Jain and Dubes, 1981). Lin and Vitter (1992a) showed that the problem does not admit a polynomial time approximation scheme unless P = NP. They also designed an algorithm that, for any value ε > 0, finds a solution of value at most (1 + ε) times the optimum, but using (1 + 1/ε)(ln n + 1)k centers. For metric spaces, Lin and Vitter (1992b) gave an algorithm that finds a solution of value at most 2(1 + ε) times the optimum using (1 + 1/ε)k centers. Arora et al. (1998) designed a randomized algorithm for the problem when the vertices lie in the plane.
For any value ε > 0 this algorithm finds, with high probability, a solution of value at most (1 + ε) times the optimum in $O(n^{O(1+1/\varepsilon)})$ time. Very recently, Charikar et al. (1999) designed the first constant-factor approximation algorithm for the metric k-median problem.
The uncapacitated facility location problem is a classical problem of Operations Research (Cornuejols et al., 1990). Guha and Khuller (1998) have shown that the metric version of the problem is MAX SNP-hard and that it cannot be approximated within a factor smaller than 1.463 of the optimum unless NP ⊆ DTIME($n^{\mathrm{poly}\log n}$). Shmoys et al. (1997) presented an algorithm for the problem in metric spaces with performance ratio 3.16. Algorithms with better ratios of 2.41 and 1.74 were later given by Guha and Khuller (1998) and Chudak (1998). When the vertices of the graph lie in the plane, Arora et al. (1998) designed a randomized algorithm with performance ratio 1 + ε and running time $O(n^{O(1+1/\varepsilon)})$, for any ε > 0. The bounded version of the uncapacitated facility location problem has also been studied by the Operations Research community, and several exponential time algorithms for solving the problem exactly are known (Berman and Yang, 1991; Drezner, 1995).
The service cost of a vertex is defined as the distance from the vertex to its closest center. We prove that the bounded k-median problem is MAX SNP-hard even when all edge lengths are 1 and the bound on the service cost is d = 2. Moreover, by extending ideas of Guha and Khuller (1998), we prove that the optimum solution of the problem cannot be approximated in polynomial time within a factor smaller than 1.367 unless NP ⊆ DTIME($n^{O(\log\log n)}$). Even if we allow the use of 1.2784k centers, we show that the problem is still inapproximable within 1.2784, unless NP ⊆ DTIME($n^{O(\log\log n)}$).
BOUNDED FACILITY LOCATION PROBLEMS
We present a technique that allows us to design randomized approximation algorithms for several versions of the bounded k-median problem on graphs with unit edge lengths, a minimum dominating set of size k, and d = 1: (i) an approximation algorithm with expected performance ratio (4e + 6)/(4e + 1) ≈ 1.4211 that uses at most 2k centers and has maximum service cost 2d; (ii) an approximation algorithm with expected performance ratio (e + 6)/(e + 2) ≈ 1.8478 that uses at most k centers but has maximum service cost 3d; (iii) an algorithm with expected performance ratio (e + 5)/(e + 1) ≈ 2.076 and maximum service cost 3d when the vertices have weights {1, +∞}. For the bounded k-median problem we also give a deterministic algorithm that produces a solution of value at most 1.5 times the optimum, uses at most 2k centers, and has maximum service cost 2d. We also give a 1.4489-approximation algorithm for a fault tolerant version of the bounded k-median problem: the bounded 2-neighbor k-median problem. This algorithm improves on average the approximation ratio of the algorithm in Khuller et al. (1999) for the unbounded 2-neighbor k-center problem in the case of unit edge lengths. For the case of arbitrary edge lengths, we extend algorithms of Chudak (1998), Shmoys et al. (1997), and Lin and Vitter (1992b) for the k-median and the capacitated and uncapacitated facility location problems. These algorithms have the same performance guarantees as the original ones, but they also bound the maximum service cost of every vertex.
The rest of the paper is organized as follows. In Section 2 we give precise definitions of the problems that we study. In Section 3 we present inapproximability results for the bounded k-median problem. In Section 4 we present some approximation algorithms for the bounded k-median problem and for the fault tolerant version of the problem when all edge lengths are equal to one.
Finally, in Section 5 we describe approximation algorithms for the bounded uncapacitated facility location and for the bounded k-median problems with arbitrary edge lengths.
2. Preliminaries
Let G = (V, E) be an undirected graph with n vertices. Every edge (i, j) ∈ E has a nonnegative length $c_{ij}$. We assume that the edge lengths satisfy the triangle inequality. Without loss of generality we may assume that G is a complete graph and that, for every pair of vertices i and j, $c_{ij}$ is equal to the length of a shortest path from i to j. Given an integer value k ≤ n and a value d > 0, the bounded k-median problem is to find a set S of k vertices (called centers) such that
1. for every vertex v ∈ V, the distance to its nearest center, c(v, S), is at most d, and
2. the sum of distances from vertices in V\S to their nearest center, $\sum_{v \in V} c(v, S)$, is minimized.
In the bounded uncapacitated facility location problem there is a set F of vertices that can be selected as centers. Each vertex i ∈ F has a cost $f_i$ for selecting it as a center. The problem consists of choosing a set S of centers such that
1. for every vertex v ∈ V, c(v, S) ≤ d, and
2. the total cost of the solution, $\sum_{v \in V} c(v, S) + \sum_{u \in S} f_u$, is minimized.
We say that a vertex v ∈ V is serviced at distance $d'$ if the distance from v to its nearest center is at most $d'$. An (α, β, γ)-approximation algorithm for the bounded k-median problem is a polynomial time algorithm that finds a solution S of cost at most α times the optimum, with maximum service cost βd, and that uses at most γk centers. An (α, β)-approximation algorithm for the bounded uncapacitated facility location problem is defined in a similar way.
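To make the two objectives concrete, here is a minimal Python sketch (our illustration, not part of the algorithms in this paper) that evaluates a candidate center set. It assumes vertices are numbered 0..n−1 and edges are given as (i, j, length) triples; the function names are hypothetical, and the exhaustive search is exponential in k, so it is only meant for checking tiny examples.

```python
from itertools import combinations

INF = float("inf")


def metric_closure(n, edges):
    """Shortest-path lengths c[i][j] (Floyd-Warshall) for an undirected graph.

    edges: iterable of (i, j, length) with vertices numbered 0..n-1.
    """
    c = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, w in edges:
        c[i][j] = c[j][i] = min(c[i][j], w)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if c[i][m] + c[m][j] < c[i][j]:
                    c[i][j] = c[i][m] + c[m][j]
    return c


def bounded_kmedian_cost(c, S, d):
    """Sum of c(v, S) over all v, or None if some vertex is farther than d from S."""
    costs = [min(c[v][u] for u in S) for v in range(len(c))]
    if max(costs) > d:
        return None  # violates condition 1 (maximum service cost d)
    return sum(costs)


def bounded_ufl_cost(c, S, f, d):
    """Bounded UFL objective: service costs plus the opening costs f[u] of the centers."""
    service = bounded_kmedian_cost(c, S, d)
    return None if service is None else service + sum(f[u] for u in S)


def brute_force_bounded_kmedian(c, k, d):
    """Exhaustive search over all k-subsets; exponential, for tiny examples only."""
    best = None
    for S in combinations(range(len(c)), k):
        cost = bounded_kmedian_cost(c, S, d)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, S)
    return best
```

On the unit-length path 0-1-2-3-4 with k = 2 and d = 1, `brute_force_bounded_kmedian` returns cost 3 = n − k with centers (0, 3), which form a dominating set, as expected from the discussion in the Introduction.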
3. Inapproximability results
In this section we prove that the optimum solution of the bounded k-median problem cannot be approximated within a factor smaller than 1.367 in polynomial time unless NP ⊆ DTIME($n^{O(\log\log n)}$). This result holds even when all edge lengths are equal to 1.
Let X = {$x_1, \ldots, x_p$} be a finite set of elements and S ⊆ X. We say that set S covers all elements $x_i \in S$. Given a family $\mathcal{S} = \{S_1, \ldots, S_\ell\}$ of sets such that $\bigcup_{i=1}^{\ell} S_i = X$, the minimum set covering problem asks for the smallest number of sets in $\mathcal{S}$ that cover X. Such a collection of sets is called a set cover. The following lemma follows from results of Guha and Khuller (1998).
Lemma 3.1. Let (X, $\mathcal{S}$) be an instance of the set covering problem with minimum cover of size k. If there is a polynomial time algorithm $A_{SC}$ that, for some positive constant β, picks βk sets of $\mathcal{S}$ covering more than $(1 - e^{-\beta})|X|$ elements of X, then NP ⊆ DTIME($|X|^{O(\log\log |X|)}$).
Let G = (V, E) be an undirected graph with unit length edges. Given two vertices u, v ∈ V, we say that u dominates v if either u = v or u and v are adjacent. A dominating set of G is a set that dominates every vertex in V. Let the minimum size of a dominating set of G be k.
Lemma 3.2. If there is a polynomial time algorithm $A_{Dom}$ that, for any constants β, ε > 0, selects a set of βk vertices that dominates c′n vertices of G, with $c' > 1 + \varepsilon - e^{-\beta}$, then NP ⊆ DTIME($n^{O(\log\log n)}$).
Proof: We show that if algorithm $A_{Dom}$ exists then we can build algorithm $A_{SC}$ as described in Lemma 3.1. Let (X, $\mathcal{S}$) be an instance of the set cover problem with X = {$x_1, \ldots, x_p$} and $\mathcal{S} = \{S_1, \ldots, S_\ell\}$. We build a graph G = (V, E) that contains one vertex $S'_i$ for each set $S_i \in \mathcal{S}$ and $t = \lceil (e^{-\beta} - \varepsilon)\ell/(\varepsilon p) \rceil$ vertices $x'_j$ (copies) for each element $x_j \in X$. There is an edge in E from $S'_i$ to every copy of $x'_j$ whenever $x_j \in S_i$, and there is also an edge from $S'_i$ to every vertex $S'_j$, j ≠ i.
It is not hard to see that G has a dominating set of size k if and only if X has a set cover of size k. Assume that algorithm $A_{Dom}$ exists and that it chooses a set U of βk vertices dominating at least c′|V| vertices of G. If U contains a vertex $x'_i$, we can replace it by any vertex $S'_j$ such that $x_i \in S_j$. This change does not decrease the number of vertices dominated by U. Hence, we may assume that U contains only vertices $S'_j$ and that it dominates at least $c'(pt + \ell) - \ell$ element vertices. This means that the corresponding subsets $S_j$ cover at least $(c'(pt + \ell) - \ell)/t = c'p - (1 - c')\ell/t$ elements of X. But by Lemma 3.1, $c'p - (1 - c')\ell/t \le (1 - e^{-\beta})p$, and thus, by the choice of t, $c' \le 1 + \varepsilon - e^{-\beta}$. □
Theorem 3.1. The bounded k-median problem cannot be approximated within a factor α < 1 + 1/e of the optimum unless NP ⊆ DTIME($n^{O(\log\log n)}$). This is true even when all edge lengths are equal to 1 and d = 2.
Proof: Let G = (V, E) be a graph with unit length edges and minimum dominating set of size k. Assume that there is an α-approximation algorithm $A_\alpha$ for the bounded k-median problem that keeps all service costs no larger than d = 2. Run the algorithm on G and let S ⊆ V be the set of centers that it selects. Let $V_1 \subseteq V$ be the set of vertices at distance 1 from S and let the number of vertices dominated by S be cn, i.e., $|S \cup V_1| = cn$, where c < 1. Since G has a dominating set of size k, there exists a solution to the bounded k-median problem of cost n − k. The above α-approximation algorithm finds a solution of cost at most α(n − k). In this solution there are cn − k vertices at distance 1 from S and n − cn vertices at distance 2 from S. Thus the cost of this solution is $cn - k + 2(n - cn) \le \alpha(n - k)$, and hence $\alpha \ge \frac{(2 - c)n - k}{n - k}$. Since, by Lemma 3.2 applied with β = 1, c ≤ 1 − 1/e, then α ≥ 1 + 1/e. □
Even if we allow an algorithm to use more than k centers, it cannot approximate the value of the optimum solution within an arbitrary precision.
Corollary 3.1. If an algorithm for the bounded k-median problem is allowed to use up to (1 + ε)k centers, where 0 ≤ ε ≤ 0.2784, then the solution obtained by that algorithm cannot always be smaller than $1 + e^{-1-\varepsilon}$ times the optimum unless NP ⊆ DTIME($n^{O(\log\log n)}$).
Proof: The proof follows the proof of Theorem 3.1. □
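The construction in the proof of Lemma 3.2 is mechanical, so it can be sketched in a few lines of Python. In this hypothetical encoding, ("S", i) stands for the set vertex $S'_i$ and ("x", e, r) for the r-th copy of element $x_e$; the brute-force dominating-set search is only there to check, on a toy instance, that the minimum dominating set has the same size as the minimum set cover.

```python
from itertools import combinations
from math import ceil, exp


def reduction_graph(p, sets, beta, eps):
    """Build the graph of Lemma 3.2 from a set-cover instance.

    p: number of elements 0..p-1; sets: list of Python sets of element indices.
    Returns (adjacency dict, t), where t is the number of copies per element.
    """
    ell = len(sets)
    t = ceil((exp(-beta) - eps) * ell / (eps * p))
    adj = {("S", i): set() for i in range(ell)}
    for e in range(p):
        for r in range(t):
            adj[("x", e, r)] = set()
    for i, Si in enumerate(sets):
        for j in range(ell):  # the set vertices form a clique
            if j != i:
                adj[("S", i)].add(("S", j))
        for e in Si:  # S'_i is adjacent to every copy of each element it contains
            for r in range(t):
                adj[("S", i)].add(("x", e, r))
                adj[("x", e, r)].add(("S", i))
    return adj, t


def dominates(adj, U):
    """Vertices dominated by U: the members of U and all their neighbours."""
    dom = set(U)
    for u in U:
        dom |= adj[u]
    return dom


def min_dominating_set_size(adj):
    """Exhaustive search over vertex subsets; only for tiny illustrative instances."""
    V = list(adj)
    for k in range(1, len(V) + 1):
        if any(len(dominates(adj, U)) == len(V) for U in combinations(V, k)):
            return k
    return 0
```

For X = {0, 1, 2} and the sets {0, 1}, {1, 2}, {2}, with β = 1 and ε = 0.1, the sketch creates t = 3 copies per element and finds a minimum dominating set of size 2, matching the minimum set cover.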
4. The uniform cost bounded k-median problem
Given any instance of the bounded k-median problem, we may assume without loss of generality that the maximum service cost d is equal to the length of some edge in the graph. This is because for any set S of k centers, the vertex v farthest from S is at distance $c_{vu}$ for some vertex u ∈ S. Given a graph G = (V, E), the k-center problem is to find the smallest distance $d'$ for which there is a set S of at most k vertices such that $c(v, S) \le d'$ for all vertices v ∈ V. Let $d_k^*$ be the value of the optimum solution to the k-center problem. The following result by Hochbaum and Shmoys (1986) shows that we should consider only those instances of the bounded k-median problem in which the bound d is at least $2d_k^*$.
Lemma 4.1. No algorithm can approximate in polynomial time the value of the optimum solution for the k-center problem within a factor smaller than 2 unless P = NP.
In this section we restrict our attention to the bounded k-median problem in which the maximum service cost is d = 1 and the input graph is connected, has unit edge lengths, and has a minimum dominating set of size k. We call this problem the uniform cost bounded k-median problem.
4.1. A simple algorithm
Hochbaum and Shmoys (1986) designed a simple 2-approximation algorithm for the k-center problem. The algorithm, which we call $A_{HS}$, finds the smallest edge length $c_{ij}$ for which a maximal set S of centers at distance larger than $2c_{ij}$ from each other has size |S| ≤ k. Note that every vertex v is at distance at most $2c_{ij}$ from its nearest center, since otherwise v could be added to S, contradicting maximality. For a proof that $c_{ij}$ is no larger than the value of the optimum solution, the reader is referred to Hochbaum and Shmoys (1986).
Lemma 4.2. Algorithm $A_{HS}$ is a $(2 - \frac{k}{n-k}, 2, 1)$-approximation algorithm for the uniform cost bounded k-median problem.
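For unit edge lengths the selection rule of $A_{HS}$ can be sketched as below. This is our reading of the description above, not Hochbaum and Shmoys' original presentation: candidate radii are the distinct pairwise distances, the maximal set is built greedily in increasing vertex order (any order gives a maximal set), and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, src):
    """Hop distances from src in an unweighted graph given as {v: set(neighbours)}."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def hochbaum_shmoys(adj, k):
    """Sketch of A_HS on a connected graph with unit edge lengths.

    For each candidate radius r (smallest first), greedily build a maximal set of
    centers pairwise more than 2r apart; return the first r whose maximal set has
    at most k centers. By maximality, every vertex is within 2r of some center.
    """
    dist = {v: bfs_distances(adj, v) for v in adj}
    radii = sorted({dd for row in dist.values() for dd in row.values() if dd > 0})
    for r in radii:
        centers = []
        for v in sorted(adj):  # fixed scan order; any order yields a maximal set
            if all(dist[v][u] > 2 * r for u in centers):
                centers.append(v)
        if len(centers) <= k:
            return r, centers
    return 0, sorted(adj)[:1]  # reached only when the graph has a single vertex
```

On the path with vertices 0, ..., 5 and k = 2 the sketch returns radius 1 and centers [0, 3]; the resulting service cost is 5, within the bound 2n − 3k = 6 used in the proof below.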
Proof: The centers selected by the algorithm are at distance at least 3 from each other and the graph is connected, so every center has at least one unique non-center vertex at distance 1 from it. Hence, the value of the solution found by the algorithm is at most k + 2(n − 2k) = 2n − 3k, while the optimum solution has value at least n − k. Therefore, the performance ratio of the algorithm is $\frac{2n - 3k}{n - k} = 2 - \frac{k}{n-k}$. □
4.2. A randomized algorithm
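In this section we describe a randomized algorithm for the uniform cost bounded k-median problem. Let $N_j$ be the set of vertices (including j) at distance at most one from vertex j. We can describe the problem as the following integer linear program, IP.
$$
\begin{aligned}
\text{Min} \quad & \sum_{i,j \in V} c_{ij} x_{ij} && (1)\\
\text{s.t.} \quad & \sum_{i \in N_j} x_{ij} = 1 && \forall j \in V \qquad (2)\\
& x_{ij} \le y_i && \forall j \in V,\ \forall i \in N_j \qquad (3)\\
& \sum_{i \in V} y_i \le k && (4)\\
& x_{ij}, y_i \in \{0, 1\} && \forall i, j \in V \qquad (5)
\end{aligned}
$$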
The meaning of the variables is as follows: $y_i = 1$ if and only if vertex i is chosen as a center, and $x_{ij} = 1$ if and only if vertex i is a center, $i \in N_j$, and i is the closest center to j. Let LP be the linear program obtained by relaxing constraint (5) to $0 \le x_{ij}, y_i \le 1$. Let λ ∈ [0, 1) be a fixed constant whose value will be specified later. Our algorithm is as follows. If $k > \frac{\lambda}{1+\lambda} n$, we use algorithm $A_{HS}$ to select a set of centers. Otherwise, we solve LP and then round the solution using the following rounding procedure based on ideas from
Shmoys et al. (1997) and Chudak (1998). Let (x, y) be an optimal solution of LP. Without loss of generality (see Chudak, 1998) we may assume that (x, y) is complete, i.e., if $x_{ij} > 0$, then $x_{ij} = y_i$, for every i, j ∈ V. Our algorithm chooses each vertex i independently as a center with probability $y_i$. Let $S_1 = \{i \mid y_i > 0$ and i is selected as a center$\}$. This set of centers might not induce a feasible solution for the problem, since some vertex might be far away from its closest center. To ensure that the maximum service cost is bounded, we run algorithm $A_{HS}$ on the input graph. Let $S_2$ be the set of centers that it chooses. The final set of centers is $S_1 \cup S_2$. Note that in this solution all vertices are serviced at distance at most 2d. For every vertex j, let $C_j = \sum_{i \in V} c_{ij} x_{ij}$ denote the fractional cost of servicing it. Let E(cost(j)) be the expected cost of servicing j in the rounded solution.
Lemma 4.3. If $k \le \frac{\lambda}{1+\lambda} n$, then for each j ∈ V, $C_j + x_{jj} = 1$, and $E(\mathrm{cost}(j)) \le C_j + q(C_j + x_{jj})$, where $q \le 1/e$.
Proof: We show first that $C_j + x_{jj} = 1$. Note that for any vertex i ≠ j, if $x_{ij} > 0$ then $c_{ij} = d = 1$. This follows from constraint (2) of LP and from the objective function of LP. Hence, $C_j + x_{jj} = \sum_{i \in V} c_{ij} x_{ij} + x_{jj} = \sum_{i \in V:\, x_{ij} > 0} x_{ij}$, since $c_{ij} = 1$ for all i ≠ j with $x_{ij} > 0$, and $c_{jj} = 0$. This last expression is equal to 1 because of constraint (2) of LP.
Next, we show that $E(\mathrm{cost}(j)) \le C_j + q$ for some value $q \le 1/e$. For simplicity let the neighbors of vertex j be N(j) = {1, 2, ..., g, j}. Then $c_{1j} = \cdots = c_{gj} = 1$ and $c_{jj} = 0$. The probability that no vertex from N(j) is selected as a center is $q = (1 - y_j)\prod_{i=1}^{g}(1 - y_i)$; this is the probability that vertex j is serviced at distance at most 2 by one of the centers selected in the second phase of the algorithm. The probability that j is chosen as a center is $y_j$, and the probability that at least one neighbor of j is a center is $1 - \frac{q}{1 - y_j}$. Hence, the expected service cost of vertex j is
$$E(\mathrm{cost}(j)) \le 0 \cdot y_j + 1 \cdot (1 - y_j)\Big(1 - \frac{q}{1 - y_j}\Big) + 2q = 1 - y_j - q + 2q = C_j + x_{jj} - y_j + q = C_j + q,$$
because the solution of the linear program is complete, and so $x_{jj} = y_j$. Since $1 - x \le e^{-x}$ for all x > 0, and $y_i = x_{ij}$ for all i ∈ N(j), then $q = (1 - y_j)\prod_{i=1}^{g}(1 - y_i) \le \prod_{i \in \{1, \ldots, g, j\}} e^{-x_{ij}} = e^{-\sum_{i} x_{ij}} = 1/e$, where the last equality follows from constraint (2) of LP. □
Theorem 4.1. There is a randomized algorithm for the uniform cost bounded k-median problem that produces a solution of expected value no larger than (4e + 6)/(4e + 1) times the value of the optimum solution. This solution uses, with high probability, at most 2k centers, and services each vertex at distance at most 2.
Proof: We first argue for the bound on the number of centers. In the first phase the algorithm chooses each i ∈ V independently as a center with probability $y_i$, and so by constraint (4) of LP, $E(|S_1|) = \sum_{i \in V} y_i \le k$.
Note that this selection process constitutes independent Poisson trials. Using Chernoff bounds we can prove that the probability $\Pr[|S_1| > k]$ is at most $\Big[e \big/ \big(1 + \tfrac{1}{k+1}\big)^{k+2}\Big]^{k/(k+1)}$ (see, e.g., Theorem 4.1 in Motwani and Raghavan, 1995). Since $\big(1 + \tfrac{1}{k+1}\big)^{k+2} > e\big(1 + \tfrac{1}{k+1}\big)$, then $\Pr[|S_1| > k] < \big(1 - \tfrac{1}{k+2}\big)^{k/(k+1)}$. If we repeat our algorithm $[(k + 1)(k + 2)\ln n]/k$ times, then this probability is at most 1/n. In the second phase of our algorithm, $A_{HS}$ selects at most k centers.
We now prove the bound on the approximation ratio. We consider two cases. If $k \le \frac{\lambda}{1+\lambda} n$, then by Lemma 4.3 the expected cost of the solution is $E(\mathrm{cost}) \le \sum_{j \in V} [C_j + q(C_j + x_{jj})]$. Let $L = \sum_{j \in V} C_j$ be the value of an optimal solution of LP. Then $E(\mathrm{cost}) \le (1 + q)L + q\sum_{j \in V} x_{jj} = (1 + q + \lambda q)L + q\sum_{j \in V}(x_{jj} - \lambda C_j)$. Let $r = \sum_{j \in V}(x_{jj} - \lambda C_j)$. By Lemma 4.3, $C_j = 1 - x_{jj}$, so $r = \sum_{j \in V}((1 + \lambda)x_{jj} - \lambda)$. From constraint (3), $x_{jj} \le y_j$ for each vertex j, thus $r \le \sum_{j \in V}((1 + \lambda)y_j - \lambda)$. Now by constraint (4), $r \le (1 + \lambda)k - \lambda n$. Since $k \le \frac{\lambda}{1+\lambda} n$, we get r ≤ 0. Thus, $E(\mathrm{cost}) \le (1 + q + \lambda q)L$, and the expected performance ratio of the algorithm is $r_2 = 1 + q + \lambda q \le 1 + \frac{1 + \lambda}{e}$.
On the other hand, if $k > \frac{\lambda}{1+\lambda} n$, we use algorithm $A_{HS}$. The value of the solution that the algorithm finds is at most 2n − 6k, and so the performance ratio of the algorithm in this case is $r_1 = \frac{2n - 6k}{n - k} < 2 - 4\lambda$. By choosing λ so that $r_1 = r_2$, that is, $\lambda = \frac{e - 1}{4e + 1}$, the performance ratio of the overall algorithm is (4e + 6)/(4e + 1) ≈ 1.421. □
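The per-vertex bound in the proof of Lemma 4.3 can be checked on small neighbourhoods by brute-force enumeration. The sketch below is ours, not the authors': it assumes a complete LP solution (so vertex i is opened independently with probability $y_i = x_{ij}$) and charges exactly 2 when no vertex of N(j) is opened, which is the worst case allowed by the $A_{HS}$ fallback. The function names are hypothetical.

```python
from itertools import product
from math import prod


def expected_cost_exact(y_j, y_nbrs):
    """Exact E[cost(j)] under independent selection, by enumerating all outcomes.

    y_j: probability that j itself is opened; y_nbrs: probabilities y_i of the
    other vertices of N(j). Cost is 0 if j is opened, 1 if some neighbour is
    opened, and 2 otherwise (the distance guaranteed by the A_HS fallback).
    """
    probs = [y_j] + list(y_nbrs)
    total = 0.0
    for outcome in product((True, False), repeat=len(probs)):
        p = prod(pi if opened else 1 - pi for pi, opened in zip(probs, outcome))
        if outcome[0]:
            cost = 0
        elif any(outcome[1:]):
            cost = 1
        else:
            cost = 2
        total += p * cost
    return total


def lemma_4_3_bound(y_j, y_nbrs):
    """Closed form 1 - y_j + q, where q is the probability that no vertex of N(j) opens."""
    q = (1 - y_j) * prod(1 - yi for yi in y_nbrs)
    return 1 - y_j + q, q
```

With $y_j = 0.2$ and neighbour values 0.3 and 0.5 (so constraint (2) holds with equality), both functions give 1.08, and q = 0.28 < 1/e. With the fallback charged exactly 2 the enumeration matches $1 - y_j + q$ exactly; the inequality in the lemma allows for the fallback center being closer.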
4.3. A greedy algorithm
Now we describe a simple greedy 1.5-approximation algorithm for the uniform cost bounded k-median problem that uses at most 2k centers. Let S be a set of vertices and N(S) be the neighbors of S. The algorithm has two phases, and in each one of them it selects k centers. The first phase is as follows.
S ← ∅
while |S| < k do
    Add to S the vertex with the largest number of neighbors not in N(S).
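The first phase of the greedy algorithm translates directly into Python. The sketch below is illustrative: it reads N(S) as the set of vertices adjacent to some vertex of S (an assumption about the notation), breaks ties by the iteration order of the adjacency dictionary, and covers only the first phase spelled out above. The function name is hypothetical.

```python
def greedy_first_phase(adj, k):
    """First phase of the greedy algorithm of Section 4.3 (illustrative sketch).

    adj: {v: set of neighbours} for a graph with unit edge lengths.
    N(S) is taken to be the set of vertices adjacent to some vertex of S.
    Returns the chosen list S and the set N(S).
    """
    S, nbrs_of_S = [], set()
    for _ in range(k):
        # vertex with the largest number of neighbours not yet in N(S);
        # max() keeps the first such vertex in adj's iteration order
        best = max((v for v in adj if v not in S),
                   key=lambda v: len(adj[v] - nbrs_of_S))
        S.append(best)
        nbrs_of_S |= adj[best]
    return S, nbrs_of_S
```

On two stars centered at 0 (leaves 1, 2, 3) and 4 (leaves 5, 6) joined by the edge 3-4, with k = 2, the sketch picks [0, 3] and reaches |N(S)| = 5 ≥ (n − k)/2 = 2.5, consistent with Lemma 4.4.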
Lemma 4.4. Let G = (V, E) be a graph with a dominating set S* of size k. The above algorithm finds a set S such that |N(S)| ≥ (n − k)/2.
Proof: Partition the vertices V\S* into k disjoint groups $N_j$, j ∈ S*, such that every vertex in $N_j$ is adjacent to vertex j. Index the groups in non-increasing order of size. Let $S_i$, i = 0, ..., k, be the set formed by the first i vertices chosen by the algorithm. We show by induction on i that $|N(S_i)| \ge \frac{1}{2}\big|\bigcup_{j \le i} N_j\big|$ for all i. The claim trivially holds for i = 1. Assume that it also holds for all ℓ < i. If $|N(S_{i-1}) \cap N_j| \ge \frac{1}{2}|N_j|$ for all j = 1, ..., i, then clearly the claim follows. Otherwise, there must be a vertex v ∈ S*, v ≤ i, such that $|N(S_{i-1}) \cap N_v| \le \frac{1}{2}|N_v|$. Hence, by the induction hypothesis, in the i-th iteration the algorithm selects a vertex u with $|N(\{u\}) \cup N(S_{i-1})| \ge \frac{1}{2}\,\big|\bigcup_{\ell} \cdots$ …
… Otherwise, find an optimal complete solution (x, y) of LP. For each vertex j ∈ V, let N(j) = {i ∈ V | $x_{ij} > 0$}. Partition the set of vertices in the following manner. Choose any vertex i ∈ V and create a cluster $Q_i$ that includes all vertices j ∈ V such that N(i) ∩ N(j) ≠ ∅. Remove the vertices in $Q_i$ from V, and repeat this process until V is empty. Let C be the set of vertices that serve as indices for the clusters $Q_i$. Note that for every pair of vertices i, j ∈ C, i ≠ j, N(i) ∩ N(j) = ∅. Now we divide the vertices into two groups: $A = \bigcup_{j \in C} N(j)$ and B = V\A. By constraint (2) of LP, $\sum_{i \in N(j)} x_{ij} = 1$. For every cluster $Q_j$ the algorithm randomly chooses a vertex in N(j) as a center, selecting vertex i ∈ N(j) with probability $x_{ij}$. Since the solution (x, y) is complete, $x_{ij} = y_i$ for all i ∈ N(j), and thus every vertex i ∈ N(j) is chosen as a center with probability $y_i$. Next, the algorithm chooses every vertex i ∈ B independently as a center with probability $y_i$.
By linearity of expectation and by constraint (4) of LP, the expected number of centers that this algorithm selects is at most k. Note that the probability that this algorithm chooses more than k centers is no larger than the probability that the algorithm of Section 4.2 chooses more than k centers during the first phase. Hence, using Chernoff bounds one can show that the probability that the algorithm chooses more than k centers is at most $\big[e\,(1 + 1/(k + 1))^{-k-2}\big]^{k/(k+1)}$. Hence, by performing $[(k + 1)(k + 2)\ln n]/k$ independent runs of the algorithm, the probability of choosing more than k centers is at most 1/n.
Consider a vertex j ∈ $Q_i$. If some vertex u ∈ N(j) is selected by the algorithm as a center, then vertex j is serviced at distance at most 1. But if none of the vertices in N(j) is a center, then j is serviced at distance at most 3. To see why, let $i' \in N(i)$ be the center that the algorithm chooses for cluster $Q_i$. Since $i'$ is a neighbor of i, and j is at distance at most 2 from i, the claim follows. Let $B_j$ be a random variable denoting the distance from j to the center chosen in cluster $Q_i$. Let $D_j$ denote the event that none of the vertices in N(j) are centers.
Lemma 4.5. For each vertex v, $E(\mathrm{cost}(v)) \le C_v + 2q$, where $q \le 1/e$.
Proof: For every vertex j ∈ C let $N_{vj} = N(j) \cap N(v)$, and let $E_j$ be the event that one vertex in $N_{vj}$ is selected by the algorithm as a center. Note that for any two vertices i, j ∈ C, i ≠ j, $N_{vj} \cap N_{vi} = \emptyset$, and thus events $E_j$ and $E_i$ are independent. The probability of event $E_j$, j ∈ C, is $\sum_{i \in N_{vj}} y_i = \sum_{i \in N_{vj}} x_{iv}$.
For notational simplicity, for every vertex i ∈ N(v) such that i ∉ N(j) for all j ∈ C, let $E_i$ be the event that vertex i is selected as a center. The probability of $E_i$ is $y_i = x_{iv}$. Because of the way in which the centers are selected, all the above events $E_i$ are independent, and by linearity of expectation $\sum_i \Pr[E_i] = \sum_{i \in N(v)} x_{iv} = 1$. The probability that none of the vertices in N(v) is a center is $q = \prod_i (1 - \Pr[E_i]) \le \prod_i e^{-\Pr[E_i]} = e^{-\sum_i \Pr[E_i]} = 1/e$. Therefore, the expected cost of servicing vertex v is at most $0 \cdot \Pr[E_v] + 1 \cdot (1 - \Pr[E_v])\big(1 - \frac{q}{1 - \Pr[E_v]}\big) + 3q$, because if no vertex from N(v) is a center then there is some center at distance at most 3 from v. Simplifying, we get $E(\mathrm{cost}(v)) \le 1 - y_v - q + 3q = C_v + 2q$ by Lemma 4.3. □
Theorem 4.3. The above algorithm for the uniform cost bounded k-median problem finds a solution of expected value no larger than (e + 6)/(e + 2) times the optimum, in which every vertex is serviced within distance 3. The number of centers in this solution is, with high probability, at most k.
Proof: If $k \le \frac{\lambda}{1+\lambda} n$, then by Lemmas 4.5 and 4.3 the expected cost of the solution is $E(\mathrm{cost}) \le \sum_{j \in V}(C_j + 2q) = \sum_{j \in V}(C_j + 2q(C_j + x_{jj}))$. Let $L = \sum_{j \in V} C_j$ be the value of an optimum solution of linear program LP. Then $E(\mathrm{cost}) \le (1 + 2q)L + 2q\sum_{j \in V} x_{jj} = (1 + 2q + 2\lambda q)L + 2q\sum_{j \in V}(x_{jj} - \lambda C_j) \le (1 + 2q + 2\lambda q)L$, because by the proof of Theorem 4.1, $\sum_{j \in V}(x_{jj} - \lambda C_j) \le 0$. Hence, the expected performance ratio of the algorithm in this case is $r_1 \le 1 + (1 + \lambda)\frac{2}{e}$.
If $k > \frac{\lambda}{1+\lambda} n$, then algorithm $A_{HS}$ is used. This algorithm has performance ratio $r_2 \le 2 - \frac{k}{n-k} < 2 - \lambda$. Choosing $\lambda = \frac{e - 2}{e + 2}$, the expected performance ratio of the algorithm is $\frac{e + 6}{e + 2} \approx 1.848$. □
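The clustering-and-sampling step of Section 4.4 is sketched below in Python. It is an illustration under assumptions: the LP support N(j) and the values $x_{ij}$, $y_i$ are given as dictionaries, the solution is complete, and "choose any vertex" is implemented as "choose the smallest remaining label" so that runs are reproducible. The function names are hypothetical.

```python
import random


def build_clusters(support):
    """Greedy clustering: support maps j to N(j) = {i : x_ij > 0}.

    Returns a list of (index vertex i, cluster Q_i); by construction the index
    vertices have pairwise disjoint supports.
    """
    remaining = set(support)
    clusters = []
    while remaining:
        i = min(remaining)  # "choose any vertex": smallest label, for determinism
        Q = {j for j in remaining if support[i] & support[j]}
        clusters.append((i, Q))
        remaining -= Q
    return clusters


def round_clusters(support, x, y, rng):
    """One center per cluster index j (vertex i drawn with probability x_ij),
    plus independent draws with probability y_i for vertices outside A.

    x: {(i, j): x_ij}; y: {i: y_i}; assumes sum_{i in N(j)} x_ij = 1 for every j.
    """
    clusters = build_clusters(support)
    C = [i for i, _ in clusters]
    A = set().union(*(support[j] for j in C))
    centers = set()
    for j in C:
        cand = sorted(support[j])
        centers.add(rng.choices(cand, weights=[x[(i, j)] for i in cand])[0])
    for i in y:
        if i not in A and rng.random() < y[i]:
            centers.add(i)
    return centers, clusters
```

Because the index vertices have pairwise disjoint supports, exactly one center is drawn from each N(j), j ∈ C. In a toy instance with N(0) = N(1) = {0, 1} and N(2) = N(3) = {2, 3}, every run opens exactly one of 0, 1 and one of 2, 3.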
4.5. Weights on the vertices
Consider a graph G = (V, E) with unit length edges and weights $w_i$ on the vertices. The weights can be either 1 or ∞. The weighted bounded k-median problem is to find a set S of vertices such that $\sum_{i \in S} w_i \le k$, d(v, S) ≤ d for each v ∈ V, and $\sum_{v \in V} d(v, S)$ is minimized. This problem is interesting because it allows us to exclude some vertices as possible centers. Here we study the case when d = 3. Our results improve on average the 3-approximation algorithm of Hochbaum and Shmoys (1986) for the weighted k-center problem in the special case of weights {1, +∞} on the vertices. Let k be the minimum weight $\sum_{i \in S} w_i$ of a dominating set S of G. In this section, we extend the techniques of Section 4.4 to design a randomized (2.076, 3, 1)-approximation algorithm for the weighted bounded k-median problem. Let $A_{WHS}$ be the 3-approximation algorithm of Hochbaum and Shmoys (1986) for the weighted k-center problem. This algorithm first chooses a set of centers just like algorithm $A_{HS}$, and then it modifies the solution by replacing every center with its smallest weight neighbor.
Lemma 4.6. $A_{WHS}$ is a $(3 - \frac{k}{n-k}, 3, 1)$-approximation algorithm for the weighted bounded k-median problem.
Proof: It is not difficult to see that every vertex is at distance at most 3 from its nearest center and that every center can be assigned at least one unique vertex at distance one from it. Then, the solution obtained by the algorithm has value at most 3(n − 2k) + k = 3n − 5k, while the optimum solution has value at least n − k. Therefore, the performance ratio of the algorithm is at most $\frac{3n - 5k}{n - k} = 3 - \frac{2k}{n-k} \le 3 - \frac{k}{n-k}$. □
We can describe the weighted bounded k-median problem as an integer linear program that we call IP2. This is the same as integer program IP, but with constraint (4) replaced by $\sum_{i \in V} w_i y_i \le k$. For convenience we write IP2 here.
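$$
\begin{aligned}
\text{Min} \quad & \sum_{i,j \in V} c_{ij} x_{ij} \\
\text{s.t.} \quad & \sum_{i \in N_j} x_{ij} = 1 && \forall j \in V\\
& x_{ij} \le y_i && \forall j \in V,\ \forall i \in N_j \qquad (6)\\
& \sum_{i \in V} w_i y_i \le k && (7)\\
& x_{ij}, y_i \in \{0, 1\} && \forall i, j \in V \qquad (8)
\end{aligned}
$$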
Let LP2 be the linear program relaxation of IP2 obtained by relaxing constraint (8). The algorithm for the weighted bounded k-median problem is like the algorithm of Section 4.4. If $k > \frac{\lambda}{\lambda+1} n$, for some value λ to be specified later, then we use algorithm $A_{WHS}$. Otherwise, we solve LP2 and find a complete solution (x, y). Then we define clusters as before, but this time we create clusters only around vertices i with $y_i > 0$. Notice that $y_i = 0$ if $w_i = +\infty$. Using arguments similar to those in Section 4.4, we can show that every vertex is serviced at distance at most 3 and that the expected total weight of the solution is at most $\sum_{i \in V} w_i y_i \le k$. Lemma 4.5 also holds for the weighted bounded k-median problem. If $k \le \frac{\lambda}{\lambda+1} n$, then the expected cost of the solution is $E(\mathrm{cost}) \le \sum_{j \in V}(C_j + 2q(C_j + x_{jj})) = (1 + 2q + 2\lambda q)L_2 + 2q\sum_{j \in V}(x_{jj} - \lambda C_j)$, where $L_2$ is the value of an optimal solution of LP2, and λ ∈ [0, 1] is some constant to be defined later. By Lemma 4.3, $C_j = 1 - x_{jj}$, so $E(\mathrm{cost}) \le (1 + 2(1 + \lambda)q)L_2 + 2q\sum_{j \in V}((1 + \lambda)x_{jj} - \lambda)$. Since the weight of every vertex is at least one, $\sum_{j \in V} x_{jj} \le \sum_{j \in V} y_j \le \sum_{j \in V} w_j y_j \le k$, by constraints (6) and (7). Thus, $E(\mathrm{cost}) \le (1 + 2(1 + \lambda)q)L_2 + 2q((1 + \lambda)k - \lambda n) \le (1 + 2(1 + \lambda)q)L_2$, because $k \le \frac{\lambda}{\lambda+1} n$. By Lemma 4.3, $q \le 1/e$, so the performance ratio of the algorithm in this case is $r_1 \le 1 + 2(1 + \lambda)\frac{1}{e}$.
If $k > \frac{\lambda}{\lambda+1} n$, then by Lemma 4.6 the performance ratio of the algorithm is $r_2 \le 3 - 2\lambda$. By choosing $\lambda = \frac{e - 1}{e + 1}$, the performance ratio of the algorithm is $\frac{e + 5}{e + 1} \approx 2.076$.
Theorem 4.4. There is a randomized algorithm for the weighted bounded k-median problem, when the vertex weights are either 1 or ∞, that finds a solution of expected value at most (e + 5)/(e + 1) times the optimum. The total cost of the centers in this solution is
with high probability at most k. In this solution each vertex is serviced at distance at most 3. Moreover, for each vertex v, the probability that v is serviced at distance 3 is at most 1/e.
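The constants in Theorems 4.1, 4.3 and 4.4 all come from the same step: pick λ so that the LP-rounding bound (which increases with λ) equals the bound of the combinatorial fallback (which decreases with λ). The small Python sketch below (our own check, using bisection rather than the closed forms) reproduces them from the two bounds as stated in the respective proofs; the names `balance` and `CASES` are hypothetical.

```python
from math import e


def balance(lp_bound, fallback_bound, lo=0.0, hi=1.0, iters=200):
    """Bisection for the lambda at which the two case bounds coincide.

    lp_bound increases with lambda and fallback_bound decreases, so
    lp_bound - fallback_bound changes sign once on [lo, hi].
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if lp_bound(mid) < fallback_bound(mid):
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    return lam, lp_bound(lam)


# (LP-rounding bound, fallback bound) as functions of lambda, as stated in the proofs
CASES = {
    "Theorem 4.1": (lambda lam: 1 + (1 + lam) / e, lambda lam: 2 - 4 * lam),
    "Theorem 4.3": (lambda lam: 1 + 2 * (1 + lam) / e, lambda lam: 2 - lam),
    "Theorem 4.4": (lambda lam: 1 + 2 * (1 + lam) / e, lambda lam: 3 - 2 * lam),
}

for name, (lp, fallback) in CASES.items():
    lam, ratio = balance(lp, fallback)
    print(f"{name}: lambda = {lam:.4f}, ratio = {ratio:.4f}")
```

Solving the linear equations by hand gives λ = (e − 1)/(4e + 1), (e − 2)/(e + 2) and (e − 1)/(e + 1), which yield the ratios (4e + 6)/(4e + 1), (e + 6)/(e + 2) and (e + 5)/(e + 1) quoted above.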
4.6. Fault tolerant problem
The bounded p-neighbor k-median problem is, given a value d, to find a set S of k centers such that for each vertex v ∈ V\S there are at least p centers in S within distance d of v, and $\sum_{v \in V} d(v, S)$ is minimized, where d(v, S) is the distance from v to its closest center. We can also handle the case of d(v, S) being the sum of the distances from v to its p closest centers. In this section, we are interested in the case when all edge lengths are 1 and d = 1. The p-neighbor k-center problem consists of finding the smallest distance d for which there is a set S of k vertices such that every vertex v ∈ V\S is at distance at most d from p vertices in S. Khuller et al. (1999) designed a 2-approximation algorithm for the problem, and when p = 2 it is not difficult to modify the solution that this algorithm finds so that the number of vertices within distance d from their closest centers is at least k/2. The idea is that if the set $S_d$ of vertices within distance d from S has size smaller than k/2, then some of these vertices can be exchanged with vertices in S to get a new solution in which the set $S_d$ is larger. Since the algorithm involves analyzing several cases and its description is somewhat complicated, we omit it. We call this algorithm $A_N$. The bounded p-neighbor k-median problem can be modeled as the integer program IP of Section 4.2, with constraint (2) replaced by $\sum_{i \in N_j} x_{ij} = p$, ∀j ∈ V. If $k \le \frac{1-q}{3.5} n$, where q is as defined in Lemma 4.3, then we solve the linear program relaxation LP of this integer program to obtain a complete solution, and then we round the solution as in Section 4.2 but using algorithm $A_N$ instead of algorithm $A_{HS}$. If $k > \frac{1-q}{3.5} n$, we use algorithm $A_N$ to choose the centers. Using arguments similar to those of Section 4.2, we can show that when $k \le \frac{1-q}{3.5} n$, every vertex j has expected service cost $E(\mathrm{cost}(j)) \le 1 - y_j + q$, where $q \le 1/e$. The service cost of a vertex is the distance to its closest center.
We note that the objective function of the linear program does not reflect this definition of service cost, which is very difficult to express as a linear function. However, this linear program can be used to get a good approximation for the solution of the problem, as we show now. With high probability, the number of centers chosen by the algorithm is at most 2k. Proceeding as in the proof of Theorem 4.1, we can show that

$$E(\mathrm{cost}) = \sum_{j \in V} E(\mathrm{cost}(j)) \le \sum_{j \in V} (1 - y_j + q) = (1 + q)n - k.$$

Since every vertex of $V \setminus S$ is at distance at least 1 from S, the optimum is at least n − k. Therefore, the performance ratio of the algorithm in this case is at most $1 + \frac{qn}{n-k}$.

If $k > \frac{1-q}{3.5}\,n$, then we use algorithm $A_N$ to find a solution for the problem, but we allow $A_N$ to choose 2k centers. The cost of this solution is $0 \cdot 2k + \frac{k}{2} + 2(n - 2.5k) = 2n - 4.5k$. Therefore, the performance ratio of the algorithm in this case is at most $\frac{2n - 4.5k}{n-k} = 2 - \frac{2.5k}{n-k}$. The two bounds coincide when $k = \frac{1-q}{3.5}\,n$, and the overall performance ratio of the algorithm is approximately 1.4489.

Theorem 4.5. There is a randomized algorithm for the uniform cost bounded 2-neighbor k-median problem that finds a solution of expected value at most 1.4489 times the optimum. This solution has, with high probability, at most 2k centers, and every vertex is at distance at most 2 from 2 of the centers.
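The case analysis leading to Theorem 4.5 can be checked numerically. The Python sketch below evaluates the two ratio bounds as functions of x = k/n. It assumes q = 1/e, the largest value permitted by q ≤ 1/e, and it only scans k/n up to 0.4 so that the count n − 2.5k in the cost of the $A_N$ solution stays non-negative. The function names are illustrative, not from the paper. Because the first bound increases with k/n and the second decreases, the worst case sits at the switching threshold, where both equal $1 + \frac{3.5q}{2.5+q} \approx 1.44897$.

```python
import math


def ratio_lp_rounding(x: float, q: float) -> float:
    """LP-rounding branch: E(cost)/OPT <= 1 + q*n/(n - k), with x = k/n."""
    return 1.0 + q / (1.0 - x)


def ratio_a_n(x: float) -> float:
    """A_N branch with 2k centers: (2n - 4.5k)/(n - k) = 2 - 2.5k/(n - k)."""
    return 2.0 - 2.5 * x / (1.0 - x)


def combined_ratio(x: float, q: float) -> float:
    """Choose the branch as in the text: LP rounding iff k <= (1 - q) n / 3.5."""
    threshold = (1.0 - q) / 3.5
    return ratio_lp_rounding(x, q) if x <= threshold else ratio_a_n(x)


q = 1 / math.e                            # assumed worst case of q <= 1/e
threshold = (1 - q) / 3.5                 # value of k/n where the branches switch
closed_form = 1 + 3.5 * q / (2.5 + q)     # either bound evaluated at the threshold
# Scan k/n over [0, 0.4]; past 0.4 the count n - 2.5k would be negative.
worst = max(combined_ratio(i / 10000, q) for i in range(4001))

print(f"switch at k/n = {threshold:.4f}")
print(f"closed form   = {closed_form:.5f}")
print(f"grid maximum  = {worst:.5f}")
```

The closed form follows by substituting $1 - k/n = \frac{2.5 + q}{3.5}$ into $1 + \frac{q}{1 - k/n}$; the grid maximum approaches it from below.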
5. Facility location problems with arbitrary edge lengths
In this section, we consider the bounded uncapacitated facility location problem and the bounded k-median problem with arbitrary edge lengths. Given a graph G = (V, E), let $f_i$ be the cost of selecting vertex i as a center. The bounded uncapacitated facility location problem can be stated as the following integer program, which we call $IP_3$:

$$
\begin{aligned}
\text{Min} \quad & \sum_{i,j \in V} c_{ij} x_{ij} + \sum_{i \in V} f_i y_i \\
\text{s.t.} \quad & \sum_{i \in N^d(j)} x_{ij} = 1 \quad \forall j \in V \qquad (9) \\
& x_{ij} \le y_i \quad \forall i, j \in V \\
& x_{ij}, y_i \in \{0, 1\} \quad \forall i, j \in V
\end{aligned}
$$
where $N^d(j) = \{i \in V \mid c_{ij} \le d\}$ and $c_{ij}$ is the distance from i to j. We solve the linear programming relaxation $LP_3$ of $IP_3$, and when $f_i = 1$ for all vertices i, we can use ideas from Chudak (1998) and Guha and Khuller (1998) to round this solution and obtain an algorithm with the following performance ratio.

Theorem 5.1. If $f_i = 1$ for every vertex $i \in V$, then there is a deterministic (3, 2)-approximation algorithm for the bounded uncapacitated facility location problem.

Proof: We first solve $LP_3$ optimally. We can assume that the solution is complete, i.e., $x_{ij} > 0 \Rightarrow x_{ij} = y_i$. Let $\alpha \in [0, 1]$ be a fixed constant to be specified later. Let $N(j) = \{i \in V \mid x_{ij} > 0\}$ and let $N_\alpha(j)$ be the smallest set of vertices $i \in N(j)$ that are closest to j and whose $x_{ij}$ values add up to at least α. For a given vertex $j \in V$, let $i_0 \in N_\alpha(j)$ be the vertex farthest from j, and let $c_j(\alpha) = c_{i_0 j}$.

After solving the linear program we round the solution as follows. Find a vertex j with the smallest $c_j(\alpha)$ value and select it as a center. All vertices in the set $J_j = \{i \mid N_\alpha(j) \cap N_\alpha(i) \neq \emptyset\}$ are serviced by vertex j. Remove j and all the vertices in $J_j$ from the graph and repeat the process until all vertices have been deleted. Notice that by the triangle inequality the service cost of each vertex i is at most $2c_i(\alpha)$: if $\ell \in N_\alpha(j) \cap N_\alpha(i)$, then $c_{ij} \le c_{\ell j} + c_{\ell i} \le c_j(\alpha) + c_i(\alpha) \le 2c_i(\alpha)$, because j was chosen with the smallest $c_j(\alpha)$ value. Since $\sum_{i \in N_\alpha(j)} y_i \ge \sum_{i \in N_\alpha(j)} x_{ij} \ge \alpha$ and the selected centers j have disjoint sets $N_\alpha(j)$, the selection of these centers increases the total cost of the centers by at most a factor of $1/\alpha$. So the overall cost of the solution is at most

$$\frac{1}{\alpha} \sum_{i \in V} y_i + 2 \sum_{j \in V} c_j(\alpha).$$

Observe that for every vertex j,

$$\sum_{i \in V} c_{ij} x_{ij} \ge \sum_{i \in V:\, c_{ij} \le c_j(\alpha)} c_{ij} x_{ij} + \sum_{i \in V:\, c_{ij} > c_j(\alpha)} c_j(\alpha)\, x_{ij} \ge (1 - \alpha)\, c_j(\alpha),$$

where the last step uses that, by the minimality of $N_\alpha(j)$, the vertices i with $c_{ij} \ge c_j(\alpha)$ have $x_{ij}$ values summing to more than $1 - \alpha$. Hence, the value of the solution is at most

$$\max\left\{\frac{1}{\alpha}, \frac{2}{1-\alpha}\right\} \left(\sum_{i \in V} y_i + \sum_{i,j \in V} c_{ij} x_{ij}\right).$$

By choosing α so that $\frac{1}{\alpha} = \frac{2}{1-\alpha}$, i.e., $\alpha = 1/3$, we obtain a 3-approximation algorithm.
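The rounding procedure described in this proof can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes uniform facility costs $f_i = 1$, a symmetric metric distance matrix `c`, and a complete fractional solution `x` of $LP_3$ given as a dense matrix (solving the LP itself is not shown). The function names are hypothetical, and ties in $c_j(\alpha)$ are broken by vertex index, a choice the text leaves open.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]


def alpha_neighborhoods(c: Matrix, x: Matrix, alpha: float) -> Tuple[List[Set[int]], List[float]]:
    """Return N_alpha(j) and c_j(alpha) for every vertex j.

    N_alpha(j) collects the vertices i with x[i][j] > 0 in order of increasing
    distance c[i][j] until their x-values sum to at least alpha; c_j(alpha) is
    the distance from j to the last (farthest) vertex added.
    """
    n = len(c)
    neighborhoods: List[Set[int]] = []
    radii: List[float] = []
    for j in range(n):
        support = sorted((c[i][j], i) for i in range(n) if x[i][j] > 0)
        chosen: Set[int] = set()
        total, radius = 0.0, 0.0
        for dist, i in support:
            chosen.add(i)
            total += x[i][j]
            radius = dist
            if total >= alpha - 1e-12:  # small tolerance for floating-point sums
                break
        neighborhoods.append(chosen)
        radii.append(radius)
    return neighborhoods, radii


def round_solution(c: Matrix, x: Matrix, alpha: float = 1 / 3) -> Tuple[List[int], Dict[int, int]]:
    """Greedy clustering from the proof of Theorem 5.1."""
    nbhd, radius = alpha_neighborhoods(c, x, alpha)
    remaining = set(range(len(c)))
    centers: List[int] = []
    assignment: Dict[int, int] = {}
    while remaining:
        # Open the remaining vertex with the smallest c_j(alpha); ties go to the lowest index.
        j = min(remaining, key=lambda v: (radius[v], v))
        # J_j: remaining vertices whose alpha-neighborhood meets that of j (j itself included).
        cluster = {i for i in remaining if nbhd[j] & nbhd[i]}
        centers.append(j)
        for i in cluster:
            assignment[i] = j
        remaining -= cluster
    return centers, assignment


def solution_cost(c: Matrix, centers: List[int], assignment: Dict[int, int]) -> float:
    """One unit per open center (f_i = 1) plus the total service cost."""
    return len(centers) + sum(c[j][i] for i, j in assignment.items())


# Toy instance: four points on a line, forming two well-separated pairs.
positions = [0, 1, 10, 11]
c = [[float(abs(a - b)) for b in positions] for a in positions]
# Hand-made complete fractional solution: y_i = 0.5 for every i, and each vertex
# is split equally between the two vertices of its own pair.
x = [[0.0] * 4 for _ in range(4)]
for pair in ([0, 1], [2, 3]):
    for i in pair:
        for j in pair:
            x[i][j] = 0.5

centers, assignment = round_solution(c, x, alpha=0.6)
print(centers, assignment, solution_cost(c, centers, assignment))
```

With α = 0.6 each $N_\alpha(j)$ is the whole pair containing j, so the sketch opens one center per pair (vertices 0 and 2) at total cost 4. With the default α = 1/3 each $N_\alpha(j)$ shrinks to $\{j\}$, and every vertex becomes its own center.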
Since $c_j(\alpha) \le d$ for every vertex j, the cost of servicing any vertex j is at most $2c_j(\alpha) \le 2d$.

If we relax the constraint on the maximum distance from a vertex to its closest center, we can use ideas of Section 4.4 to get an algorithm for the bounded uncapacitated facility location problem with arbitrary costs on the centers. The idea is to first build the dual of the linear program relaxation of $IP_3$ and solve it optimally. Let $v_j$ be the dual variable corresponding to the primal constraint (9). We solve the linear program relaxation of $IP_3$
and perform the clustering step of Section 4.4, but we create clusters around vertices i with the smallest value $v_i + C_i$. The rest of the algorithm is as in Section 4.4.

Theorem 5.2 (Chudak). There is a deterministic (1.736, 3)-approximation algorithm for the bounded uncapacitated facility location problem.

We can formulate the bounded k-median problem with arbitrary edge lengths as the linear program $LP_1$ with $N_j$ replaced by $N^d(j)$. To design an approximation algorithm for this general problem we first solve the linear program, and then round the solution as we did in the proof of Theorem 5.1. The analysis of this algorithm is very similar to that of Theorem 5.1, hence we omit it. This algorithm finds a solution of value no more than $2/(1 - \alpha)$ times the optimum, and it uses at most $(1/\alpha)k$ centers. Choosing $\alpha = \frac{1}{1+\varepsilon}$ for some value $\varepsilon > 0$ gives $\frac{2}{1-\alpha} = \frac{2(1+\varepsilon)}{\varepsilon} = 2\left(1 + \frac{1}{\varepsilon}\right)$ and $\frac{1}{\alpha} = 1 + \varepsilon$, so we obtain the following result.

Theorem 5.3. There is a deterministic $\left(2\left(1 + \frac{1}{\varepsilon}\right), 2, 1 + \varepsilon\right)$-approximation algorithm for the metric bounded k-median problem for any value $\varepsilon > 0$.

6. Conclusions
We have presented randomized and deterministic algorithms for the bounded k-median and the bounded facility location problems. These problems are natural generalizations of the classical k-median and facility location problems. We view these problems as multi-criteria optimization problems where the objective functions are: (1) minimize the total service cost, (2) minimize the maximum service cost, and (3) minimize the number of centers. Our results are summarized in the following table, where each tuple lists the approximation ratios for objectives (1), (2), and (3), in this order.

| Problem | Randomized algorithm | Deterministic algorithm |
|---|---|---|
| Uniform bounded k-median | (1.4211, 2, 2), (1.8478, 3, 1) | $(2 - \frac{k}{n-k}, 1, 1)$, (1.5, 2, 2) |
| Bounded k-median with {0, ∞} weights | (2.076, 3, 1) | $(3 - \frac{k}{n-k}, 3, 1)$ |
| Fault tolerant bounded k-median | (1.4489, 2, 2) | |
| Metric bounded k-median | | $(2(1 + \frac{1}{\varepsilon}), 2, 1 + \varepsilon)$ |
| Bounded facility location with $f_i = 1$ | | (3, 2) |
| Metric bounded facility location | | (1.736, 3) |
Acknowledgment

The first author was supported by the Deutsche Forschungsgemeinschaft (DFG) as a member of the Graduiertenkolleg Informatik, University of Saarland, Germany.
References

S. Arora, P. Raghavan, and S. Rao, “Approximation schemes for Euclidean k-medians and related problems,” in Proceedings of the 30th Annual ACM Symposium on Theory of Computing, 1998, pp. 106–113.

O. Berman and E.K. Yang, “Medi-center location problems,” Journal of the Operational Research Society, vol. 42, pp. 313–322, 1991.

M. Charikar, S. Guha, É. Tardos, and D.B. Shmoys, “A constant-factor approximation algorithm for the k-median problem,” in Proceedings of the 31st ACM Symposium on Theory of Computing, 1999.

S. Chaudhuri, N. Garg, and R. Ravi, “The p-neighbor k-center problem,” Information Processing Letters, vol. 65, pp. 131–134, 1998.

S.S. Chaudhry, I.C. Choi, and D.K. Smith, “Facility location with and without maximum distance constraints through the p-median problem,” International Journal of Operations and Production Management, vol. 15, pp. 75–81, 1995.

I.C. Choi and S.S. Chaudhry, “The p-median problem with maximum distance constraints: A direct approach,” Location Science, vol. 1, pp. 235–243, 1993.

F. Chudak, “Improved approximation algorithms for uncapacitated facility location,” in Integer Programming and Combinatorial Optimization, R.E. Bixby, E.A. Boyd, and R.Z. Rios-Mercado (Eds.), Lecture Notes in Computer Science, vol. 1412, 1998, pp. 180–194.

G. Cornuejols, G.L. Nemhauser, and L.A. Wolsey, “The uncapacitated facility location problem,” in Discrete Location Theory, P.B. Mirchandani and R.L. Francis (Eds.), Wiley: New York, 1990, pp. 119–171.

Z. Drezner (Ed.), Facility Location: A Survey of Applications and Methods, Springer-Verlag: New York, 1995.

S. Guha and S. Khuller, “Greedy strikes back: Improved facility location algorithms,” in Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, 1998, pp. 649–657.

D.S. Hochbaum and D.B. Shmoys, “A unified approach to approximation algorithms for bottleneck problems,” Journal of the ACM, vol. 33, pp. 533–550, 1986.

A.K. Jain and R.C. Dubes, Algorithms for Clustering Data, Prentice Hall: Englewood, NJ, 1981.

S. Khuller, R. Pless, and Y.J. Sussmann, “Fault tolerant k-center problems,” Theoretical Computer Science, vol. 242, pp. 237–245, 2000.

B.M. Khumawala, “An efficient algorithm for the p-median problem with maximum distance constraint,” Geographical Analysis, vol. 5, pp. 309–321, 1973.

G. Lin and G. Xue, “Balancing shortest-path trees and Steiner minimum trees in the rectilinear plane,” in Proceedings of the 1999 IEEE International Symposium on Circuits and Systems (ISCAS’99), vol. VI, 1999, pp. 117–120.

J.H. Lin and J.S. Vitter, “ε-Approximations with minimum packing constraint violation,” in Proceedings of the 24th ACM Symposium on Theory of Computing, 1992a, pp. 771–782.

J.H. Lin and J.S. Vitter, “Approximation algorithms for geometric median problems,” Information Processing Letters, vol. 44, pp. 245–249, 1992b.

M.V. Marathe, R. Ravi, R. Sundaram, S.S. Ravi, D.J. Rosenkrantz, and H.B. Hunt III, “Bicriteria network design problems,” Journal of Algorithms, vol. 28, pp. 142–171, 1998.

R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press: New York, 1995.

D.B. Shmoys, É. Tardos, and K. Aardal, “Approximation algorithms for facility location problems,” in Proceedings of the 29th ACM Symposium on Theory of Computing, 1997, pp. 265–274.

C. Toregas, R. Swain, C. ReVelle, and L. Bergman, “The location of emergency service facilities,” Operations Research, vol. 19, pp. 1363–1373, 1971.