ISSN 1990-4789, Journal of Applied and Industrial Mathematics, 2008, Vol. 2, No. 3, pp. 421–425. © Pleiades Publishing, Ltd., 2008.
Original Russian Text © V.V. Shenmaier, 2007, published in Diskretnyi Analiz i Issledovanie Operatsii, Ser. 1, 2007, Vol. 14, No. 2, pp. 95–104.

An Approximate Solution Algorithm for the One-Dimensional Online Median Problem

V. V. Shenmaier
Sobolev Institute of Mathematics, pr. Akad. Koptyuga 4, Novosibirsk, 630090 Russia
Received September 11, 2006; in final form, January 9, 2007

Abstract—The online median problem consists in finding a sequence of incremental solutions of the k-median problem with k increasing. A particular case of the problem is considered: the clients and facilities are located on the real line. The best algorithm available for the one-dimensional case has competitive ratio 8. We give an improved 5.83-competitive algorithm. DOI: 10.1134/S1990478908030125

INTRODUCTION

The online median problem differs from the usual k-median problem in that the total number k of facilities to be placed is unknown. Instead, we must place the facilities one at a time so that at each step the facilities already placed constitute an exact or approximate solution to the median problem of the corresponding size.

Assume that we are given: a finite set C of clients; a finite set F of facilities; a distance $d(u, f) \ge 0$ defined for each client u and each facility f; a weight $w(u) \ge 0$ defined for each client u. The cost of an arbitrary set X of facilities is the weighted sum
$$\mathrm{cost}(X) = \sum_{u \in C} w(u)\, d(u, X),$$

where the distance d(u, X) from the client u to X is the distance from u to the nearest facility in X:
$$d(u, X) = \min_{f \in X} d(u, f).$$
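For concreteness, the cost function can be evaluated directly on the real line. The sketch below is only an illustration under our own conventions: clients are (position, weight) pairs, a facility set is a list of positions, and d(u, f) = |u − f|. The names `Client` and `cost` are ours, not the paper's.

```python
from typing import Iterable, Sequence, Tuple

Client = Tuple[float, float]  # (position on the line, weight w(u) >= 0)


def cost(clients: Iterable[Client], facilities: Sequence[float]) -> float:
    """cost(X) = sum over clients u of w(u) * d(u, X), where d(u, X) = min_f |u - f|."""
    if not facilities:
        raise ValueError("the facility set X must be non-empty")
    return sum(w * min(abs(u - f) for f in facilities) for u, w in clients)


# Three clients, one facility at 2.0: 1*2 + 3*0 + 1*3 = 5
print(cost([(0.0, 1.0), (2.0, 3.0), (5.0, 1.0)], [2.0]))  # 5.0
```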

In the k-median problem, we are also given a positive integer k and must find a set X of k facilities with minimal cost among all sets of size k. The online median problem consists in finding a sequence $F_1 \subset F_2 \subset \cdots \subset F_n$ of sets of facilities, where n = |F|, such that for all k = 1, 2, ..., n we have $|F_k| = k$ and the cost of $F_k$ is minimal among all sets of size k. Since, apparently, in most cases no nested sequence of exact medians exists, it makes sense to consider approximate solutions. We say that a solution $F_1, F_2, \ldots, F_n$ to the online median problem has competitive ratio δ if, for every k = 1, 2, ..., n and every set X of k facilities, $\mathrm{cost}(F_k) \le \delta\, \mathrm{cost}(X)$. In other words, the cost of each $F_k$ exceeds the cost of an optimal solution to the k-median problem by at most a factor of δ.

The first algorithm with a constant competitive ratio for the metric online median problem was obtained in [4]. The best available algorithm has competitive ratio 8c, where c is the approximation ratio of the algorithm used for the k-median problem [2]. Therefore, in the class of polynomial algorithms, the problem
can be solved with competitive ratio 24 + ε, because the best available polynomial algorithm for the k-median problem has performance guarantee 3 + ε [1]. As for negative results, we mention the inapproximability bound 1 + 2/e ≈ 1.74 for the k-median problem in the class of polynomial algorithms [3] and the lower bound 2 − 1/(n − 1) on the competitive ratio of any deterministic algorithm for the online median problem [4].

Consider the particular case of the online median problem in which both clients and facilities lie on the real line. Although this case is the simplest model of location problems, it remains roughly as hard to solve as the general case. The best algorithm available for the one-dimensional problem is 8-competitive [2] (in the one-dimensional case, c = 1). We propose a polynomial algorithm that solves the one-dimensional online median problem with competitive ratio $(1 + \sqrt{2})^2 \approx 5.83$.

1. DESCRIPTION OF THE ALGORITHM

The basic idea of the algorithm is the same as that of the algorithm in [2]. The algorithm first finds solutions $F_1^*, F_2^*, \ldots, F_n^*$ to the usual k-median problem for k = 1, 2, ..., n. Here these are exact solutions, because the one-dimensional k-median problem can be solved in polynomial time.

Next, the algorithm determines a set K consisting of an inextensible sequence of indices $i_1, i_2, \ldots, i_t$ (that is, one to which no further index can be appended) in which $i_1 = 1$ and, for k = 1, 2, ..., t − 1, the index $i_{k+1}$ is the minimal index such that
$$\mathrm{cost}(F^*_{i_{k+1}}) \le \mathrm{cost}(F^*_{i_k})/\alpha, \qquad \alpha = 1 + \sqrt{2}.$$
Note that in the algorithm of [2] the analogous constant is equal to 2.

We then construct a partial solution to the online median problem corresponding to the indices in K. Namely, we find an increasing sequence $F_{i_1} \subset F_{i_2} \subset \cdots \subset F_{i_t}$ of sets constructed recursively in reverse order: first we determine $F_{i_t}$, then $F_{i_{t-1}}$, and so on.
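The first two steps (computing all exact medians $F_1^*, \ldots, F_n^*$, then choosing the index set K) can be sketched in Python as follows. The paper does not spell out its dynamic program, so the one below is an assumption: a textbook formulation that splits the sorted clients into at most k contiguous blocks, each served by its cheapest facility of F, and pads the result with arbitrary facilities to size exactly k. The names `exact_medians` and `select_indices` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Client = Tuple[float, float]  # (position, weight)


def exact_medians(clients: Sequence[Client], facilities: Sequence[float]) -> List[List[float]]:
    """Return [F*_1, ..., F*_n] (n = |F|) via a textbook 1-D dynamic program.

    On a line, the clients served by one facility form a contiguous block of the
    sorted clients, so an optimal k-median is an optimal split into <= k blocks.
    """
    pts = sorted(clients)
    m, n = len(pts), len(facilities)
    # best[i][j], arg[i][j]: cheapest single facility for sorted clients i..j-1
    best = [[0.0] * (m + 1) for _ in range(m + 1)]
    arg = [[None] * (m + 1) for _ in range(m + 1)]
    for i in range(m):
        running = [0.0] * n
        for j in range(i + 1, m + 1):
            u, w = pts[j - 1]
            for t, f in enumerate(facilities):
                running[t] += w * abs(u - f)
            t_best = min(range(n), key=running.__getitem__)
            best[i][j], arg[i][j] = running[t_best], facilities[t_best]
    # g[k][j]: minimal cost of the first j sorted clients using at most k blocks
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cut = [[None] * (m + 1) for _ in range(n + 1)]  # None: reuse the (k-1)-block value
    g[0][0] = 0.0
    for k in range(1, n + 1):
        for j in range(m + 1):
            g[k][j] = g[k - 1][j]
            for i in range(j):
                cand = g[k - 1][i] + best[i][j]
                if cand < g[k][j]:
                    g[k][j], cut[k][j] = cand, i
    medians = []
    for k in range(1, n + 1):
        chosen, kk, j = set(), k, m
        while j > 0:  # walk the cut points back to the first client
            i = cut[kk][j]
            if i is not None:
                chosen.add(arg[i][j])
                j = i
            kk -= 1
        for f in facilities:  # pad to exactly k facilities; cost cannot increase
            if len(chosen) == k:
                break
            chosen.add(f)
        medians.append(sorted(chosen))
    return medians


def select_indices(opt_costs: Sequence[float], alpha: float = 1 + math.sqrt(2)) -> List[int]:
    """K = [i_1, ..., i_t] (1-based): i_1 = 1, i_{k+1} = least j with cost(F*_j) <= cost(F*_{i_k})/alpha."""
    K = [1]
    while True:
        threshold = opt_costs[K[-1] - 1] / alpha
        nxt = next((j for j in range(K[-1] + 1, len(opt_costs) + 1)
                    if opt_costs[j - 1] <= threshold), None)
        if nxt is None:  # the sequence cannot be extended
            return K
        K.append(nxt)


clients = [(0, 1), (1, 1), (10, 1), (11, 1)]
meds = exact_medians(clients, [0, 1, 10, 11])
costs = [sum(w * min(abs(u - f) for f in X) for u, w in clients) for X in meds]
print(costs)                  # [20, 2, 1, 0]
print(select_indices(costs))  # [1, 2, 4]
```

The quadratic-per-level recurrence is chosen for readability rather than speed; all $F_k^*$ fall out of a single table, in line with the complexity remark below.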
We put $F_{i_t} = F^*_{i_t}$ and, for k = t − 1, ..., 1, define $F_{i_k}$ as follows. For each facility f in $F^*_{i_k}$, consider the set C(f) of clients for which f is the nearest facility in $F^*_{i_k}$ (this set is called the cluster of f). Determine the facilities of $F_{i_{k+1}}$ nearest to f on the left and on the right; at least one of them exists. Of these, pick the one that minimizes the weighted sum of distances to the clients in C(f). The facilities chosen in this way for all $f \in F^*_{i_k}$ constitute the set $F_{i_k}$. Henceforth we denote this construction by Γ, i.e., $F_{i_k} = \Gamma(F_{i_{k+1}}, F^*_{i_k})$. The number of points in $F_{i_k}$ is at most $|F^*_{i_k}| = i_k$; if it is smaller than $i_k$, we add to $F_{i_k}$ the missing number of points of $F_{i_{k+1}} \setminus F_{i_k}$, chosen arbitrarily.

This yields a partial solution $F_{i_1}, F_{i_2}, \ldots, F_{i_t}$ to the online median problem. It can be extended arbitrarily to a complete solution: the sets $F_s$ with $i_k < s < i_{k+1}$ are obtained by adding to $F_{i_k}$ points of $F_{i_{k+1}} \setminus F_{i_k}$, and the sets $F_s$ with $s > i_t$ by adding to $F_{i_t}$ points of $F \setminus F_{i_t}$.

Theorem. The above algorithm has competitive ratio $\delta = (1 + \sqrt{2})^2$.

Note that in constructing $F_{i_k}$ we could take a set that is certainly no worse, namely an optimal solution to the $i_k$-median problem with facility set $F_{i_{k+1}}$. This option is not only better with respect to the objective function but also more universal, since it makes sense for an arbitrary metric problem, where the notions of "nearest on the left/right" are absent. Note also that in the algorithm of [2], when $F_{i_k}$ is constructed, the replacement for a facility f in $F^*_{i_k}$ is the facility of $F_{i_{k+1}}$ nearest to f.

The complexity of the algorithm is determined by the complexity of finding the initial sequence $F_1^*, F_2^*, \ldots, F_n^*$, which in turn coincides with the complexity of finding a single k-median: the dynamic programming method used to solve the one-dimensional k-median problem reduces it to a series of k-median problems for various k, so solving it once yields all of them.
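The backward construction with Γ and the completion to a full nested sequence can be sketched as follows. To keep the block self-contained, the exact medians are found by brute-force enumeration of all k-subsets. This is a hypothetical stand-in for the dynamic program and is usable only on tiny inputs. Clusters break distance ties in favour of the facility listed first; the function names are ours.

```python
import itertools
import math
from typing import Dict, List, Sequence, Set, Tuple

Client = Tuple[float, float]  # (position on the line, weight)
ALPHA = 1 + math.sqrt(2)      # alpha from the text


def cost(clients: Sequence[Client], X: Sequence[float]) -> float:
    """cost(X) = sum of w(u) * distance from u to its nearest facility in X."""
    return sum(w * min(abs(u - f) for f in X) for u, w in clients)


def brute_force_medians(clients, facilities) -> List[Tuple[float, ...]]:
    """Hypothetical stand-in for the exact 1-D solver: try every k-subset (tiny inputs only)."""
    return [min(itertools.combinations(facilities, k), key=lambda X: cost(clients, X))
            for k in range(1, len(facilities) + 1)]


def gamma(S: Sequence[float], S_star: Sequence[float], clients) -> Set[float]:
    """Gamma(S, S*): replace each f in S* by the better of its nearest neighbours in S."""
    chosen: Set[float] = set()
    for f in S_star:
        # cluster C(f): clients whose nearest facility in S* is f (ties go to the earlier one)
        cluster = [(u, w) for u, w in clients
                   if min(S_star, key=lambda g: abs(u - g)) == f]
        left = [s for s in S if s <= f]
        right = [s for s in S if s >= f]
        candidates = ([max(left)] if left else []) + ([min(right)] if right else [])
        chosen.add(min(candidates, key=lambda s: cost(cluster, [s])))
    return chosen


def online_medians(clients, facilities, alpha: float = ALPHA):
    """Return ([F_1, ..., F_n], [F*_1, ..., F*_n]) following the construction in the text."""
    n = len(facilities)
    opt = brute_force_medians(clients, facilities)
    opt_cost = [cost(clients, X) for X in opt]
    K = [1]  # inextensible index sequence i_1 < i_2 < ... < i_t (1-based)
    while True:
        thr = opt_cost[K[-1] - 1] / alpha
        nxt = next((j for j in range(K[-1] + 1, n + 1) if opt_cost[j - 1] <= thr), None)
        if nxt is None:
            break
        K.append(nxt)
    sol: Dict[int, List[float]] = {K[-1]: list(opt[K[-1] - 1])}  # F_{i_t} = F*_{i_t}
    for ik, inext in zip(reversed(K[:-1]), reversed(K[1:])):   # reverse order
        bigger = sol[inext]
        core = gamma(bigger, opt[ik - 1], clients)
        padding = [f for f in bigger if f not in core][: ik - len(core)]
        sol[ik] = sorted(core) + padding                        # |F_{i_k}| = i_k
    seq: List[List[float]] = []
    for idx, ik in enumerate(K):  # fill in F_s for i_k < s < i_{k+1}, and for s > i_t
        last = idx + 1 == len(K)
        upper = list(facilities) if last else sol[K[idx + 1]]
        stop = n + 1 if last else K[idx + 1]
        extra = [f for f in upper if f not in sol[ik]]
        for s in range(ik, stop):
            seq.append(sol[ik] + extra[: s - ik])
    return seq, opt


if __name__ == "__main__":
    clients = [(0, 3), (4, 1), (9, 2), (10, 2), (20, 1)]
    seq, opt = online_medians(clients, [u for u, _ in clients])
    for k, (Fk, Fk_star) in enumerate(zip(seq, opt), start=1):
        print(k, sorted(Fk), cost(clients, Fk), cost(clients, Fk_star))
```

Each printed line compares $\mathrm{cost}(F_k)$ with $\mathrm{cost}(F_k^*)$. By the theorem, their ratio should never exceed $(1 + \sqrt{2})^2$.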


2. PROOF OF THE UPPER BOUND

Let us introduce the following concept: a configuration is a triple consisting of an arbitrary weighted set C of clients and two sets S and S* of facilities, the second of which is an exact solution to the k-median problem with k = |S*| and client set C. The proof of the upper bound on the competitive ratio of the algorithm rests on the following lemma.

Lemma. For an arbitrary configuration C, S, S*, we have the inequality
$$\mathrm{cost}(\Gamma(S, S^*)) \le \beta\,\bigl(\mathrm{cost}(S) + \mathrm{cost}(S^*)\bigr), \qquad \beta = \frac{1 + \sqrt{2}}{2}. \tag{1}$$
We prove this lemma below.

Proof of the theorem. Applying the lemma recursively, we see that for k = 1, 2, ..., t the cost of $F_{i_k}$ is at most the sum of the series $\beta\,\mathrm{cost}(F^*_{i_k}) + \beta^2\,\mathrm{cost}(F^*_{i_{k+1}}) + \cdots$. Since $\mathrm{cost}(F^*_{i_{k+j}}) \le \mathrm{cost}(F^*_{i_k})/\alpha^j$ by the choice of the indices $i_1, i_2, \ldots, i_t$, we have
$$\mathrm{cost}(F_{i_k}) \le \beta\,\mathrm{cost}(F^*_{i_k})\Bigl(1 + \frac{\beta}{\alpha} + \Bigl(\frac{\beta}{\alpha}\Bigr)^2 + \cdots\Bigr) = \mathrm{cost}(F^*_{i_k})\,\frac{\beta\alpha}{\alpha - \beta}.$$
This gives an upper bound on the competitive ratio of the partial solution $F_{i_1}, F_{i_2}, \ldots, F_{i_t}$.

Let us now estimate the cost of an arbitrary set $F_s$ with $s \in \{1, 2, \ldots, n\}$ and $s \notin K$. Take the maximal index $i_k$ in K with $i_k < s$. Then $\mathrm{cost}(F^*_s) > \mathrm{cost}(F^*_{i_k})/\alpha$ by the construction of K. Since the objective function does not increase as facilities are added and $F_s \supseteq F_{i_k}$, this implies
$$\mathrm{cost}(F_s) \le \mathrm{cost}(F_{i_k}) \le \mathrm{cost}(F^*_{i_k})\,\frac{\beta\alpha}{\alpha - \beta} < \mathrm{cost}(F^*_s)\,\frac{\beta\alpha^2}{\alpha - \beta}.$$
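The bound above still contains the free constant α, which must exceed β for the geometric series to converge. A short calculation shows how its value is chosen:
$$\frac{d}{d\alpha}\,\frac{\beta\alpha^2}{\alpha - \beta} = \frac{\beta\alpha(\alpha - 2\beta)}{(\alpha - \beta)^2}.$$
This derivative is negative for $\beta < \alpha < 2\beta$ and positive for $\alpha > 2\beta$, so the minimum over $\alpha > \beta$ is attained at $\alpha = 2\beta = 1 + \sqrt{2}$. There
$$\frac{\beta\alpha}{\alpha - \beta} = 2\beta = 1 + \sqrt{2} \approx 2.414, \qquad \frac{\beta\alpha^2}{\alpha - \beta} = 4\beta^2 = (1 + \sqrt{2})^2 = 3 + 2\sqrt{2} \approx 5.83.$$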

Note that α = 2β (this is the optimal value of the constant α). Consequently, the competitive ratio of the algorithm is at most $4\beta^2 = (1 + \sqrt{2})^2$. The proof of the theorem is complete.

Proof of the lemma. Suppose that, on the contrary, for some configuration C, S, S* we have
$$\mathrm{cost}(\Gamma(S, S^*)) > \beta\,\bigl(\mathrm{cost}(S) + \mathrm{cost}(S^*)\bigr). \tag{2}$$

We will arrive at a contradiction in two stages. First, we construct a substantially simpler configuration in which (2) still holds. Second, we use the simplicity of this configuration to derive a contradiction. The first stage consists of the six simplifications of a configuration described below.

Simplification 1. The cost of Γ(S, S*) is at most the sum, over $c \in S^*$, of the costs of serving each cluster C(c) by the facility chosen for c. Hence (2) implies that there is at least one facility $c \in S^*$ for which this cost exceeds β times the contribution of the clients of C(c) to the right-hand side of (2). Consequently, if we put $S^* = \{c\}$ and $C = C(c)$, then (2) still holds. Observe that C, S, {c} is a configuration, because the point c is a 1-median of its cluster.

Let us introduce some notation. Denote the nearest neighbors of c in S on the left and on the right by a and b, respectively. We may assume that both neighbors exist (otherwise we take a point at infinity as the missing neighbor). Without loss of generality, c does not lie to the right of the midpoint d = (a + b)/2; the case c > d is symmetric. Denote the costs of the sets S, {c}, {a}, and {b} by $f$, $f^c$, $f^a$, and $f^b$, respectively. Then inequality (2) takes the form
$$\min(f^a, f^b) > \beta\,(f^c + f). \tag{3}$$

Simplification 2. Consider the part of the real line to the left of a. Move the clients located there to the point a, and remove from S the facilities located there. Then none of the quantities in (3) increases, while $f^c$, $f^a$, and $f^b$ decrease by the same amount. Since β > 1, the right-hand side of (3) decreases at least as much as the left-hand side, so (3) still holds. Repeat this for the clients and facilities located to the right of b. After these simplifications, S contains only the facilities a and b.


[Fig. 1: plot of $f^a$, $f^b$, and $\beta(f^c + f)$ against x, for x from 0 to 0.5.]

Simplification 3. The weighted sum of distances from a facility to clients located on one side of it equals the distance from the facility to the center of mass of these clients multiplied by their total weight. Hence the quantities $f^a$, $f^b$, $f^c$, and $f$ do not change if the clients located in any of the intervals (a, c), (c, d), and (d, b) are replaced by another set of clients with the same total weight and the same center of mass. Using this observation, replace the clients in each of these intervals with two clients located at the endpoints of the interval, with the same total weight and center of mass. Now only four clients remain, located at a, c, d, and b.

Simplification 4. Move the client at d to the point c. Then $f^b$ does not decrease, while $f^a$, $f^c$, and $f$ decrease by the same amount. So the left-hand side of (3) decreases by at most this amount and the right-hand side by 2β times it; consequently, (3) still holds. The number of clients decreases by one.

Simplification 5. Decrease the weights of the clients at a and b by
$$\Delta w = \min\bigl(w(a), w(b)\bigr).$$
Here f does not change, since the facilities of S are located at a and b, while $f^a$, $f^b$, and $f^c$ decrease by the same amount $\Delta w\,(b - a)$. Consequently, (3) still holds. Observe that the resulting collection C = {a, c, b}, S = {a, b}, {c} is still a configuration. Indeed, by a well-known property of 1-medians, a point on the line is a 1-median if and only if the total weight of the clients on each side of it is at most the total weight of the clients on the other side plus those at the point itself [5]. Since Simplifications 2–5 preserve this property, c remains a 1-median of the client set C.

Simplification 6. Normalize the client weights and the coordinates of all points in the resulting configuration. After Simplification 5, one of the clients a and b has zero weight. Hence, by the 1-median property, the weight of the other one does not exceed the weight of c, which is therefore the maximal weight.
Consequently, normalization yields the configuration consisting of three clients $a = 0$, $c = x \le 1/2$, and $b = 1$ with weights y, 1, and z, where $y, z \le 1$ and one of y and z equals 0. This completes the first and more complicated stage of the argument. In the second stage, we use the simplicity of the resulting configuration to show that (3) cannot hold. Written out explicitly, the terms of the inequality are
$$f^a = x + z, \qquad f^b = 1 - x + y, \qquad f = x, \qquad f^c = xy + (1 - x)z.$$
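Before the case analysis, the reduced inequality can be checked numerically on a grid. The sketch below is our own illustration; the grid resolution and the names `terms` and `max_violation` are arbitrary choices. It evaluates the four terms for the normalized configuration and reports the largest value of $\min(f^a, f^b) - \beta(f^c + f)$ over $0 \le x \le 1/2$, $0 \le y, z \le 1$ with $yz = 0$. For (3) to hold somewhere, this value would have to be positive. The block also evaluates the discriminant $-4\beta^2 + 4\beta + 1$ used later in the proof.

```python
import math

BETA = (1 + math.sqrt(2)) / 2


def terms(x: float, y: float, z: float):
    """f^a, f^b, f and f^c for clients a = 0 (weight y), c = x (weight 1), b = 1 (weight z)."""
    f_a = x + z
    f_b = 1 - x + y
    f = x                      # S = {0, 1}; the client at c = x <= 1/2 is served by 0
    f_c = x * y + (1 - x) * z
    return f_a, f_b, f, f_c


def max_violation(steps: int = 200) -> float:
    """Largest min(f^a, f^b) - beta*(f^c + f) on a grid with 0 <= x <= 1/2,
    0 <= y, z <= 1 and y*z = 0; (3) would need this value to be positive."""
    worst = -math.inf
    for i in range(steps + 1):
        x = 0.5 * i / steps
        for j in range(steps + 1):
            t = j / steps
            for y, z in ((0.0, t), (t, 0.0)):
                f_a, f_b, f, f_c = terms(x, y, z)
                worst = max(worst, min(f_a, f_b) - BETA * (f_c + f))
    return worst


print(max_violation())                # never positive; 0 is attained at x = z = 0
print(-4 * BETA ** 2 + 4 * BETA + 1)  # the discriminant, zero up to rounding
```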

The right-hand side of (3) is a linear function of x, while the left-hand side is piecewise linear with a single corner. Hence the difference between them attains its maximum on the interval [0, 1/2] either at an endpoint or at the corner point of the left-hand side (see Fig. 1), and it suffices to check that (3) fails at these points. For x = 0, (3) fails: here $\min(f^a, f^b) = z$, $f = 0$, and $f^c = z$, so (3) would read z > βz, which is impossible.

Similarly, for x = 1/2 we have
$$f^a = 1/2 + z, \qquad f^b = 1/2 + y, \qquad f = 1/2, \qquad f^c = y/2 + z/2.$$
Consequently,
$$\min(f^a, f^b) \le (f^a + f^b)/2 = 1/2 + y/2 + z/2 = f + f^c,$$
i.e., (3) also fails, since β > 1.

The corner point of the left-hand side of (3) is determined by the equation $f^a = f^b$, i.e., $x + z = 1 - x + y$, which gives $x = (1 + y - z)/2$. Since $x \le 1/2$, this implies $y \le z$. Since one of y and z vanishes, it follows that y = 0. Consequently, $z = 1 - 2x$ and $f^c = (1 - x)z = 1 - 3x + 2x^2$. This reduces (3) to the form
$$1 - x > \beta\,(x + 1 - 3x + 2x^2), \quad\text{or}\quad 2\beta x^2 + (1 - 2\beta)x + \beta - 1 < 0.$$
The discriminant of this quadratic is
$$(1 - 2\beta)^2 - 8\beta(\beta - 1) = 1 - 4\beta + 4\beta^2 - 8\beta^2 + 8\beta = -4\beta^2 + 4\beta + 1.$$
However, β is chosen precisely so that this expression vanishes. Since the leading coefficient 2β is positive, the parabola $2\beta x^2 + (1 - 2\beta)x + \beta - 1$ is tangent to the x-axis and never enters the lower half-plane. Therefore, (3) fails at the corner point of the left-hand side as well. The resulting contradiction completes the proof of the lemma.

ACKNOWLEDGMENTS

The author was supported by the Russian Foundation for Basic Research (project no. 05–01–00395).

REFERENCES

1. V. Arya, N. Garg, R. Khandekar, K. Munagala, and V. Pandit, "Local Search Heuristic for k-Median and Facility Location Problems," in Proceedings of the 33rd Symposium on Theory of Computing (STOC) (ACM Press, New York, 2001), pp. 21–29.
2. M. Chrobak, C. Kenyon, J. Noga, and N. Young, "Online Medians via Online Bribery," arXiv: cs.DS/0504103 (2005) (see http://arxiv.org/abs/cs.DS/0504103).
3. K. Jain, M. Mahdian, and A. Saberi, "A New Greedy Approach for Facility Location Problems," in Proceedings of the 34th Symposium on Theory of Computing (STOC) (ACM Press, New York, 2002), pp. 731–740.
4. R. Mettu and G. Plaxton, "The Online Median Problem," SIAM J. Comput. 32 (3), 816–832 (2003).
5. Discrete Location Theory, Ed. by P. Mirchandani and R. Francis (Wiley, New York, 1990).
