Ann Oper Res DOI 10.1007/s10479-008-0344-z

Aggregation error for location models: survey and analysis

R.L. Francis · T.J. Lowe · M.B. Rayco · A. Tamir

© Springer Science+Business Media, LLC 2008

Abstract Location problems occurring in urban or regional settings may involve many tens of thousands of "demand points," usually individual private residences. In modeling such problems it is common to aggregate demand points to obtain tractable models. We survey aggregation approaches to a large class of location models, consider and compare various aggregation error measures, identify some effective (and ineffective) aggregation error measures, and discuss some open research areas.

Keywords Location · Aggregation · p-median · p-center · Covering

R.L. Francis, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA; e-mail: [email protected]
T.J. Lowe, Tippie College of Business, University of Iowa, Iowa City, IA 52242, USA; e-mail: [email protected]
M.B. Rayco, Modular Mining Systems, Inc., 3289 E. Hemisphere Loop, Tucson, AZ 85706, USA; e-mail: [email protected]
A. Tamir, Department of Statistics & Operations Research, School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel; e-mail: [email protected]

1 Introduction

Many location problems involve locating services (called servers) with respect to customers of some sort (called demand points, abbreviated as DPs). Usually there is travel between servers and DPs, so travel distance, or (more generally) travel cost, is of interest. Location models represent this travel cost, and solutions to the models provide locations of the servers of (nearly) minimal cost. For books on location models and modeling, see Daskin


(1995), Drezner (1995a), Drezner and Hamacher (2002), Francis et al. (1992), Handler and Mirchandani (1979), Love et al. (1988), Mirchandani and Francis (1990), and Nickel and Puerto (2005).

A common difficulty with modeling location problems that occur in urban or regional areas is that the number of DPs may be quite large, since each private residence might be a DP. In this case it may be impossible, and also unnecessary, to include every DP in the corresponding model. Further, the models may be NP-hard to optimize (Kariv and Hakimi 1979). For problems as diverse as locating branch banks (Chelst et al. 1988), tax offices (Domich et al. 1991), network traffic flow (Sheffi 1985), and vehicle exhaust emission inspection stations (Francis and Lowe 1992), a popular aggregation approach is to suppose that every DP in each postal code area or zone of the larger urban area is at the centroid of that area or zone, and to compute distances accordingly. The result is a smaller model to deal with, but one with an intrinsic error. If the modeler wishes the aggregation to have both a small number of aggregate demand points (abbreviated as ADPs) and a small error, then how to aggregate becomes a nontrivial matter.

For earlier reviews of aggregation for location models, see Francis et al. (1999) and Francis et al. (2002b). For a general review of aggregation and disaggregation in optimization, see Rogers et al. (1991). For Ph.D. dissertations on DP aggregation, see Rayco (1996), Zhao (1996), and Emir-Farinas (2002). For an overview of location theory research, see Hale and Moberg (2003). The literature on DP aggregation has been growing more rapidly in recent years; of the DP aggregation journal publications we reference, about 65% appeared after 1995. It is important to note that DP data is becoming widely available (although at some cost). For example, a well-known CD-ROM telephone book for the USA includes the latitude and longitude of every street address having a street number. Latitudes and longitudes are found using the Geographic Information System capability called address matching.

It is tempting to ask: how many ADPs are enough? There is no general answer to this question, because there are important tradeoffs in doing aggregation. Aggregation often decreases (1) data collection cost, (2) modeling cost, (3) computing cost, (4) confidentiality concerns, and (5) data statistical uncertainty. The first four items seem self-explanatory; item (5) occurs because aggregation leads to pooled data, which provides larger samples and thus smaller sample standard deviations. The price paid for aggregation is the increase in model error: instead of working with the actual location model, we work with some approximating location model. How to trade off the benefits and costs of aggregation is still an open question, in part because there is no general agreement on how to measure error, and also because there is no accepted way to attach a cost to model aggregation error. As best we know, professional judgment is generally used to make the tradeoffs.

One can categorize location models as strategic, tactical, or operational in scope. As pointed out by Bender et al. (2001), planar distances are often used for strategic-level location models, and network distances for tactical-level location models.
Such models are often converted to equivalent mixed integer programming (MIP) models for solution purposes, using some finite dominating set principle to reduce the set of possible locations of interest to a finite set (Hooker et al. 1991). Thus results to follow for these planar and network models also apply to their MIP representations, including those for the covering, p-center, and p-median location models. Operational-level location models are not too common (mobile servers are one example), but for such models it may be best not to aggregate at all. Note that the scope of the location model may well indicate the appropriate degree of aggregation; a more detailed scope requires a more detailed aggregation.


Consider now some terminology and several aggregation examples. We suppose that servers and DPs are either all points in the plane, or all on some travel network. In either case there is some well-defined set, say S, and a distance d(x, y) between any two points x, y in S. If S is a travel network (assumed undirected), then d(x, y) is usually the length of a shortest path between x and y. For planar problems with S = R², x = (χ1, χ2), y = (ψ1, ψ2), d(x, y) is often the ℓp-distance, ‖x − y‖p = [|χ1 − ψ1|^p + |χ2 − ψ2|^p]^(1/p), with p ≥ 1. Taking p = 1 or 2 gives the well-known rectilinear or Euclidean distance, respectively. The limiting case of the ℓp-distance as p goes to infinity, ‖x − y‖∞ = max{|χ1 − ψ1|, |χ2 − ψ2|}, is called the Tchebyshev distance. The Tchebyshev distance in R² is sometimes analytically convenient because it is known (Francis et al. 1992) to be equivalent to the planar rectilinear distance under a 45-degree rotation of the axes. We define the diameter of S by diam(S) = sup{d(x, y) : x, y ∈ S}, with the understanding that possibly diam(S) = +∞. More generally, S can be a metric space (Goldberg 1976) with metric d, but no loss of insight occurs by considering the network and planar cases for S.

Suppose we have n DPs, aj ∈ S, j = 1, . . . , n. Denote the list (or vector) of DPs by A = (a1, . . . , an). When we aggregate, we replace each DP aj by some ADP a′j in S, obtaining an ADP list A′ = (a′1, . . . , a′n). While the DPs are usually distinct, the ADPs are not, since otherwise there is no computational advantage to the aggregation. When we wish to model m distinct ADPs, we let Á (upper case alpha) denote the set of m distinct ADPs, say Á = {α´1, . . . , α´m}. We use the former (latter) ADP notation when the correspondence between DPs and ADPs is (is not) of interest. Usually we have m ≪ n.

For any positive integer p, let X = {xk : k = 1, . . . , p} denote any p-server, the set of locations of the p servers, X ⊂ S. (This symbol p is distinct from the one defining the ℓp-distance.) Denote the location model with the given original DPs by f(X : A), and the one with the aggregate DPs by f(X : A′). (The notation f(X : A) and f(X : A′) captures a key idea: an aggregation is a replacement of A by A′, with the entries of A′ not all distinct.) For the large class of location models with similar or indistinguishable servers, where only the server closest to each DP is assumed to serve it, for any p-server X ⊂ S and DP a ∈ S we denote by D(X, a) ≡ min{d(xk, a) : k = 1, . . . , p} the distance between a and a closest element of X. Then define the closest-distance vectors D(X, A) ≡ (D(X, aj)) and D(X, A′) ≡ (D(X, a′j)) in R^n_+. Suppose g is some "costing" function with domain R^n_+ attaching a cost to D(X, A) and D(X, A′). This gives original and approximating location models f(X : A) ≡ g(D(X, A)) and f(X : A′) ≡ g(D(X, A′)), respectively.

Important and perhaps best-known instances of g are the p-median and p-center costing functions, g(U) = w1 u1 + · · · + wn un and g(U) = max{w1 u1, . . . , wn un}, respectively; the wj are positive constants, often called "weights," and may be proportional to the number of trips between servers and DPs. Thus f(X : A) is either the p-median model, w1 D(X, a1) + · · · + wn D(X, an), or the p-center model, max{w1 D(X, a1), . . . , wn D(X, an)}. These models originate with Hakimi (1965) (each is called unweighted if all wj = 1). They are perhaps the two best-known models in location theory.
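As a concrete illustration of these definitions, the following minimal Python sketch (ours; the function names and toy instance are illustrative, not from the paper) computes ℓp distances, closest distances D(X, a), and the p-median and p-center objectives:

```python
import math

def lp_distance(x, y, p=2.0):
    """lp distance in the plane: p = 1 rectilinear, p = 2 Euclidean,
    p = float('inf') Tchebyshev. (This p is the distance parameter,
    not the number of servers.)"""
    if math.isinf(p):
        return max(abs(x[0] - y[0]), abs(x[1] - y[1]))
    return (abs(x[0] - y[0]) ** p + abs(x[1] - y[1]) ** p) ** (1.0 / p)

def D(X, a, p=2.0):
    """Closest distance from DP a to the server set X."""
    return min(lp_distance(x, a, p) for x in X)

def pmm(X, A, w):
    """p-median objective: sum of weighted closest distances."""
    return sum(wj * D(X, aj) for wj, aj in zip(w, A))

def pcm(X, A, w):
    """p-center objective: maximum weighted closest distance."""
    return max(wj * D(X, aj) for wj, aj in zip(w, A))

# Toy instance: four DPs, unit weights, and a 2-server X.
A = [(0, 0), (1, 0), (4, 3), (5, 3)]
w = [1, 1, 1, 1]
X = [(0.5, 0.0), (4.5, 3.0)]
print(pmm(X, A, w), pcm(X, A, w))
```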
Subsequently, we refer to the p-center, p-median, and covering location models as PCM, PMM, and CLM, respectively. These models are NP-hard to minimize (Kariv and Hakimi 1979; Megiddo and Supowit 1984).

Consider several aggregation examples which serve to illustrate our notation and basic aggregation ideas. Let N = {1, . . . , n} denote the set of all DP indices. Suppose, for example, that the DPs will be aggregated into two postal code area centroids. Let Ni denote the subset of indices of the DPs in postal area i = 1, 2, and let α´i denote the centroid of postal area i. Clearly the Ni form a partition of N. To aggregate the DPs in the postal code areas into the centroids, we replace each aj with j ∈ Ni by α´i; thus a′j = α´i


for j ∈ Ni and i = 1, 2. Hence A′ is now the n-vector of ADPs, and Á = {α´1, α´2} is the ADP set.

Example aggregation (PMM) f(X : A) = Σ{wj D(X, aj) : j ∈ N}. Let ω1 = Σ{wj : j ∈ N1} and ω2 = Σ{wj : j ∈ N2}. We then have f(X : A′) = Σ{wj D(X, a′j) : j ∈ N} = Σ{wj D(X, α´1) : j ∈ N1} + Σ{wj D(X, α´2) : j ∈ N2} = ω1 D(X, α´1) + ω2 D(X, α´2). This example illustrates how aggregation error can occur. If only p-servers are of interest (with p ≥ 2), then taking X to be {α´1, α´2} minimizes f(X : A′) with minimal value of 0, giving a useless underestimate of min{f(X : A) : X}. If there is only one server, X = {x}, and the ℓp-distance is used, then it is known that this 1-median under-approximation holds for all x: letting ω = Σ{wj : j ∈ N} and α´ = Σ{(wj/ω) aj : j ∈ N} be the centroid of the DPs, so that f(x : A′) = ω‖x − α´‖p, it is known (Francis and White 1974) that f(x : A) ≥ f(x : A′) for all x. This is an important reason why underestimation can occur for PMM aggregation when few centroid ADPs are used. It is also known for ℓp distances (Plastria 2001) that the difference f(x : A) − f(x : A′) goes to zero as x gets farther from α´ along an infinite ray with one end point at α´. There are good theoretical reasons, due to self-canceling error (Plastria 2000, 2001; Francis et al. 2003), for using centroids as ADPs for the PMM, but none that we know of for the PCM and CLM. Indeed, better choices than centroids are available for the latter two models.

Example aggregation (PCM) f(X : A) = max{wj D(X, aj) : j ∈ N}. Let w1+ = max{wj : j ∈ N1} and w2+ = max{wj : j ∈ N2}. We then have f(X : A′) = max{wj D(X, a′j) : j ∈ N} = max{max{wj D(X, α´1) : j ∈ N1}, max{wj D(X, α´2) : j ∈ N2}} = max{w1+ D(X, α´1), w2+ D(X, α´2)}. Again, if only p-servers (p ≥ 2) are of interest, then taking X to be {α´1, α´2} minimizes f(X : A′) with minimal value of 0, giving an underestimate of f(X : A).

Example aggregation (CLM) Minimize |X| subject to D(X, aj) ≤ rj, j ∈ N, X ⊂ S. All but two covering constraints for the aggregated model are redundant. Define ρ1 = min{rj : j ∈ N1} and ρ2 = min{rj : j ∈ N2}. The aggregated model thus has constraints D(X, α´1) ≤ ρ1, D(X, α´2) ≤ ρ2, X ⊂ S. This means it takes at most two servers to solve the aggregated model. CLMs and PCMs are known to be closely related (Kolen and Tamir 1990); we shall see that aggregation results developed for one model often also apply to the other.

These examples illustrate two equivalent approaches for representing n DPs by an aggregation with m ADPs. Either we have a partition of the DP index set N into m sets N1, . . . , Nm with one ADP per set, or for each aj there is a replacing ADP a′j, with each a′j in the set Á of m distinct ADPs. In either case, three aggregation decisions (Francis et al. 1999) must be made: (1) the number of ADPs; (2) the locations of the ADPs; (3) the replacement rule: for each aj, what is a′j? The (reasonable) replacement rule often used is to replace each DP by a closest ADP. Further, for the aggregation to be computationally useful we require the number of ADPs, m, to be less (usually much less) than the number of DPs, n; it is also reasonable to have p ≪ m. These authors note that versions of these three aggregation decisions occur in location modeling. Hence results in location theory help in doing DP aggregation, so DP aggregation is a sort of "second-order" location problem to solve prior to solving the original or "first-order" problem.
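The three examples can be mirrored in a few lines of code. The sketch below (ours; the four-DP instance and all names such as centroid, omega, w_plus, and rho are illustrative) aggregates two "postal areas" to their weighted centroids and builds the aggregated PMM weights ωi, PCM weights wi+, and CLM radii ρi; note how placing the servers at the centroids drives the aggregated PMM objective to zero:

```python
import math

def pmm(X, A, w):
    """p-median objective with Euclidean distances."""
    return sum(wj * min(math.dist(x, a) for x in X) for wj, a in zip(w, A))

def centroid(A, w, idx):
    """Weighted centroid of the DPs indexed by idx."""
    W = sum(w[j] for j in idx)
    return tuple(sum(w[j] * A[j][k] for j in idx) / W for k in range(2))

A = [(0, 0), (1, 0), (4, 3), (5, 3)]   # four DPs in two "postal areas"
w = [2, 1, 1, 3]                       # weights w_j
r = [1.0, 1.2, 0.9, 1.1]               # covering radii r_j for the CLM
N1, N2 = [0, 1], [2, 3]                # partition of the DP indices

alpha  = [centroid(A, w, Ni) for Ni in (N1, N2)]      # centroid ADPs
omega  = [sum(w[j] for j in Ni) for Ni in (N1, N2)]   # PMM: summed weights
w_plus = [max(w[j] for j in Ni) for Ni in (N1, N2)]   # PCM: max weights
rho    = [min(r[j] for j in Ni) for Ni in (N1, N2)]   # CLM: min radii

X = [(0.0, 0.0), (5.0, 2.0)]
f_aggr = sum(o * min(math.dist(x, a) for x in X) for o, a in zip(omega, alpha))
print(pmm(X, A, w), f_aggr)   # original vs. aggregated PMM values differ
print(pmm(alpha, A, w))       # at X = {alpha_1, alpha_2}: f(X:A') = 0, yet f(X:A) > 0
```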
These three examples may also suggest that as more ADPs are used, the aggregation error decreases; ideally, if we could use m = n ADPs, then we need have no DP aggregation error at all. In fact there are classes of location models where the law of diminishing returns applies: aggregation error decreases at a decreasing rate as m increases.


Thus a very small value of m may cause a very high aggregation error, while a large value of m might give little less error than an appreciably smaller value of m.

It is the purpose of this paper to survey aggregation approaches to a large class of location models, consider and compare various error measures, identify some effective (and ineffective) aggregation error measures, and discuss some areas where more work is needed to achieve agreement in order to further research in the area. We now give an overview of our paper. In Sect. 2 we consider various approaches to aggregation error measurement. In Sect. 3 we provide a general literature discussion of DP aggregation. In Sect. 4 we consider several specific aggregation topics in more detail, including limitations and interrelationships of various aggregation error measurements, and present some new results related to measuring location error.

2 Aggregation error measurements

While there can be other types of error in location models, the one we focus on is demand point aggregation error, which arises from replacing DPs by ADPs: instead of actual distances we obtain approximating ones. It is thus important for the location modeler who does the aggregation to be aware of the aggregation error being created. The modeler who does DP aggregation intentionally introduces error into the model. The use of ADPs is the cause of the aggregation error, but there are also error effects, including inaccurate values of the objective function and of the server locations, due to using the approximating distances. It is important to consider both cause and effects in order to get the whole picture. There are a number of ways to measure error effects; further, the magnitude of aggregation effects can depend on model structure: for the same aggregation, some models can have more error than others. What is clear, in any case, is that the way to minimize DP aggregation error is not to aggregate DPs, and certainly this is what we recommend when it is feasible; the ideal way to aggregate DP data is not to aggregate it. If DP data must be aggregated, then we need to consider aggregation error measures. We list and summarize ten such measures in Table 1; all of these error measures have an ideal value of zero.

One simple way to measure aggregation error is to consider ADP-DP distances. If these distances are all zero, then ADPs and DPs are identical, so there is no error. Later we establish a relationship between ADP-DP distances and other error measures, including the distance difference error. For the PMM, the distance difference error leads to an error we call DP error. Like the difference error, the DP error can be negative or positive. Still considering the PMM, note that the total DP error e(X) in Table 1 satisfies e(X) = f(X : A′) − f(X : A), the difference between the aggregated PMM and the original model. Even if no single DP error is zero, the total DP error can be zero or nearly zero, since negative errors can cancel positive errors; this is self-canceling error. Unfortunately, self-canceling error applies only to models with an additive cost structure.

Next, consider ABC errors for the PMM, due to Hillsman and Rhoda (1978). ABC errors are sums of the DP errors, organized by ADP. Suppose we represent an aggregation by a partition of N = {1, . . . , n}, say N1, . . . , Nm, such that for i = 1, . . . , m, every DP aj with j ∈ Ni is aggregated into the ADP α´i; that is, a′j = α´i for j ∈ Ni. Thus Σ{wj D(X, aj) : j ∈ Ni} is replaced in the aggregated model by Σ{wj D(X, α´i) : j ∈ Ni} = ωi D(X, α´i), where ωi ≡ Σ{wj : j ∈ Ni}. In the parlance of Hillsman and Rhoda, the ABC error illustrates their Source A error, which they in fact define as ωi D(X, α´i): using ωi D(X, α´i) instead of Σ{wj D(X, aj) : j ∈ Ni} is a source of error.

Table 1 Various demand point aggregation error measures for a location model f(X : A). Ideal error measures have value zero for all j and all X.

1. ADP-DP distances: d(aj, a′j), j ∈ N.
2. Distance difference error: D(X, a′j) − D(X, aj), j ∈ N, all X.
3. DP error, PMM: ej(X) = wj[D(X, a′j) − D(X, aj)], j ∈ N, all X.
4. Total DP error, PMM: e(X) = Σ{ej(X) : j ∈ N}, all X.
5. ABC error for PMM (N1, . . . , Nm is a partition of N = {1, . . . , n}; ωi ≡ Σ{wj : j ∈ Ni} for i = 1, . . . , m): abci(X) = ωi D(X, α´i) − Σ{wj D(X, aj) : j ∈ Ni}, all X.
6. Absolute error, any location model: ae(X) = |f(X : A′) − f(X : A)|, all X.
7. Relative error (for all X with f(X : A) > 0): rel(X) = ae(X)/f(X : A), all X.
8. Maximum absolute error: mae(f′, f) = max{ae(X) : X ⊂ S, |X| = p}.
9. Error bound eb: a number eb with ae(X) ≤ eb for all X; equivalent ratio error bounds (when f(X : A), f(X : A′) > 0): |f(X : A′)/f(X : A) − 1| ≤ eb/f(X : A) and |f(X : A)/f(X : A′) − 1| ≤ eb/f(X : A′) for all X.
10. Location error: some measure, diff(X′, X∗), of the "difference" between p-servers X′ and X∗.

The special case of Source A error when α´i ∈ X, so that ωi D(X, α´i) = 0, is their Source B error; if ωi D(X, α´i) = 0, then it is useless as an estimate of Σ{wj D(X, aj) : j ∈ Ni}. Source C error is a related sort of allocation error involving closest-distance definitions. Suppose xk ∈ X is closest in X to α´i; we might then assume that every aj with j ∈ Ni is closest to xk. In reality, however, some aj with j ∈ Ni may be closer to another element of X than to xk; in effect, we would allocate some DPs to a wrong server location, one that is not closest to them. Note that abci(X) = Σ{ej(X) : j ∈ Ni} for all i, so the total ABC error is e(X) = f(X : A′) − f(X : A). ABC error can be negative or positive, again resulting in possible self-cancellation effects. Hillsman and Rhoda recognize and discuss both total error and error self-cancellation.

Now consider any location model f(X : A) with p-server X and its approximation f(X : A′). A difficulty with error measures that can be negative or positive is that a smaller error (e.g., −3000) can be worse than a bigger error (e.g., +3). We can avoid this difficulty by using the (nonnegative) absolute error, ae(X) ≡ |e(X)| = |f(X : A′) − f(X : A)|, defined for all X. This measure is familiar from calculus for measuring how well one function approximates another. Related to ae(X) is the idea of an error bound: a number eb for which ae(X) ≤ eb for all X. An equivalent way to define an error bound, using f′ and f to denote the functions f(X : A′) and f(X : A) respectively, is based on the maximum absolute error, mae(f′, f), a number which may well be quite difficult to compute. Any error bound is then an upper bound on the maximum absolute error. Good error bounds may be much easier to compute than the maximum absolute error itself.

Relative error can be defined when f(X : A) is always positive: rel(X) ≡ ae(X)/f(X : A), perhaps converted to percent. Depending on model structure, ae(X) may be large while rel(X) is still small, due to the magnitude of f(X : A). Relative error is not affected by the measurement scale chosen, whereas the preceding error measures are.
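Continuing the toy instance from the earlier sketches, the code below (ours; the centroid coordinates and all names are illustrative) evaluates error measures 3 through 7 of Table 1 for the PMM and exhibits self-cancellation: individual DP errors of mixed sign partly cancel in the total.

```python
import math

def D(X, a):
    """Closest Euclidean distance from DP a to the server set X."""
    return min(math.dist(x, a) for x in X)

A = [(0, 0), (1, 0), (4, 3), (5, 3)]
w = [2, 1, 1, 3]
N1, N2 = [0, 1], [2, 3]
alpha = [(1.0 / 3, 0.0), (4.75, 3.0)]      # weighted centroids of N1, N2
A_prime = [alpha[0]] * 2 + [alpha[1]] * 2  # replacement rule: a'_j = alpha_i

X = [(0.0, 0.0), (5.0, 2.0)]
e = [w[j] * (D(X, A_prime[j]) - D(X, A[j])) for j in range(4)]  # measure 3
abc = [sum(e[j] for j in Ni) for Ni in (N1, N2)]                # measure 5
total = sum(e)                                                  # measure 4
f_X_A = sum(w[j] * D(X, A[j]) for j in range(4))
ae, rel = abs(total), abs(total) / f_X_A                        # measures 6, 7
print(e)                    # individual DP errors have mixed signs ...
print(abc, total, ae, rel)  # ... and partly self-cancel in the total
```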

Table 2 Various types of optimality errors for any location model f(X : A). Ideal error measures are zero.

1. Total error at X′: e(X′) = f(X′ : A′) − f(X′ : A).
2. Opportunity cost error: f(X∗ : A) − f(X′ : A′).
3. Optimality error: f(X∗ : A) − f(X′ : A).

Assuming f(X : A) > 0 and f(X : A′) > 0 for all X ⊂ S, the relative error idea suggests other, equivalent ways of expressing the error bound:

|f(X : A′)/f(X : A) − 1| ≤ eb/f(X : A) for all X ⊂ S ⇔ |f(X : A)/f(X : A′) − 1| ≤ eb/f(X : A′) for all X ⊂ S.

If the model f(X : A) is on a national scale, but aggregation is done on a city/town scale (e.g., eb = 10 miles, f(X : A) = 500 miles), we could have relatively small ratios eb/f(X : A) and eb/f(X : A′), in which case the model ratios would be nearly one and we would have a good aggregation. By contrast, if the model is on a city/town scale and the aggregation is also on a city/town scale, we might have a poor aggregation. We need the aggregation scale to be substantially smaller than the model scale in order to have a good aggregation. This is one reason that aggregation may be of more interest for problems of city/town/regional scope than for those of national or international scope.

There is another way to view the use of an aggregation error bound: it allows us to draw conclusions about a family of original models, instead of just one. If the actual location model is F(X : A) instead of f(X : A), but the error bound applies to both, that is,

|f(X : A′) − F(X : A)| ≤ eb and |f(X : A′) − f(X : A)| ≤ eb for all X,

then whatever conclusions we draw about the function f using the error bound inequality also apply to the function F. While we lose accuracy when we aggregate, we gain the ability to draw approximate conclusions about a family of original functions. As a general example of the function F, suppose that instead of the DP set {aj : j ∈ N} we have a different DP set, say {bj : j ∈ N}, defining F, while all other model data is the same as for f(X : A). If each DP bj is aggregated into a′j, then each of the functions F and f will be aggregated into the same approximating model, denoted by f′. Further, if also d(aj, a′j) = d(bj, a′j) for j ∈ N, then the methods we present later would provide both F and f′, and f and f′, with the same error bound. The data for F and f differ, but are sufficiently similar that the aggregation does not detect the differences.

Denote (globally) minimizing solutions to any original and approximating location models f(X : A) and f(X : A′) by X∗ and X′, respectively. While we usually cannot expect to find X∗ if we must aggregate DPs, we can still obtain some information about X∗ if we know an error bound eb and X′. Geoffrion (1977) proves that |f(X′ : A′) − f(X∗ : A)| ≤ eb and |f(X′ : A) − f(X∗ : A)| ≤ 2eb. Supposing f(X′ : A) > 0, we thus have |1 − f(X∗ : A)/f(X′ : A)| ≤ 2eb/f(X′ : A). Hence, if 2eb is small relative to f(X′ : A), we may reasonably accept X′ as a good substitute for X∗. We assume henceforth that we can compute X′ but not X∗. Note that if we wish to use X′ to approximate X∗, then it makes no sense to allow p ≥ m, for then we can place a new facility at every ADP and may achieve a minimal approximating function value of f(X′ : A′) = 0. Certainly it is desirable to have p ≪ m.
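To illustrate how an error bound certifies the quality of X′, here is a small numeric sketch of the Geoffrion-type bound above; the numbers (eb = 10, f(X′ : A) = 500, echoing the miles example earlier) are hypothetical.

```python
# Hypothetical numbers: an error bound eb, and the computable value
# f(X':A) of the aggregated optimum X' evaluated in the original model.
eb = 10.0
f_Xprime_A = 500.0

# Geoffrion (1977): |f(X':A) - f(X*:A)| <= 2*eb, hence
# |1 - f(X*:A)/f(X':A)| <= 2*eb/f(X':A).
rel_gap = 2.0 * eb / f_Xprime_A
print(f"f(X':A) is within {100 * rel_gap:.1f}% of the unknown "
      f"optimal value f(X*:A)")   # here: within 4.0%
```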


Various authors, cited in Sect. 3, have proposed different types of optimality errors, which we list in Table 2. The first error can be computed, and indicates how well the approximating function estimates the original function at X′. For large models, the other two errors cannot be computed without knowing X∗. They can be computed for smaller models where X∗ can be found without the need to aggregate, or for larger models if one assumes the algorithm used to solve the original problem provides X∗. Unless one can be certain that X∗ is known, or that some properties of X∗ are known, the latter two measures do not seem useful.

Although it is reasonable to use some measure of the difference between f(X : A′) and f(X : A) to represent aggregation error, doing so results in what may well be called the paradox of aggregation (Francis and Lowe 1992). Often our principal reason to aggregate is that we cannot afford, computationally, to make many function evaluations of f(X : A). We want to aggregate to make the error small; however, algorithms to do this typically require numerous function evaluations of f(X : A) and thus cannot be used for this purpose. Usually it is practical, however, to compute error measures for at least a few X, and we certainly recommend doing so whenever possible. For example, given we know A and A′, we can take a random sample of K p-servers, say X1, . . . , XK, compute f(Xk : A′) and f(Xk : A) for each sample element Xk, and then compute a sample estimate of any error measure of interest.

Location error (Casillas 1987; Daskin et al. 1989) involves some comparison of the p-server locations X∗ and X′. There are several difficulties with using this concept. First, if we really knew X∗ we would not need to do the aggregation. Second, when |X∗| ≥ 2, there appears to be no accepted way to define the difference between X∗ and X′. Third (assuming we do know X∗), the function f(X : A), particularly if it is the PMM function, may well be relatively flat in the neighborhood of X∗, as pointed out by Erkut and Bozkaya (1999). This means we could have some X′ with f(X′ : A) only a little larger than f(X∗ : A), but with X′ "far" from X∗. Fourth, X′ and X∗ may not be unique global minima.

Why are comparisons made of X′ with X∗? We speculate they are made in part due to unstated subjective evaluation criteria, or known but unstated supplementary evaluation criteria. As another possible example of the use of location error, we might solve the approximating model with three different levels of aggregation (numbers of ADPs), obtaining three corresponding optimal p-servers, say X′, X′′ and X′′′. Differences between successive pairs of these p-servers might then be of interest; we might want to know how stable the optimal server locations are as we change the level of aggregation (Murray and Gottsegen 1997).

Subjective or unstated aggregation error criteria may well be important, but are not well-defined. Thus two analysts can study the same DP aggregation and not agree on whether it is good or not. Further, if a subjective evaluation derives from some visual representation of DPs and ADPs, such an analysis may single out some relatively simple visual error feature that is inappropriate for the actual model structure. For example, a visual analysis could not evaluate the (computationally intense) absolute error for the PMM.
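Since the absolute and relative errors cannot be evaluated for all X, the sampling approach described above is often the practical recourse. A minimal sketch (ours; the instance, bounding box, and function names are illustrative):

```python
import math
import random

def pmm(X, A, w):
    """p-median objective with Euclidean distances."""
    return sum(wj * min(math.dist(x, a) for x in X) for wj, a in zip(w, A))

def sampled_errors(A, A_prime, w, K=100, p=2, box=(0.0, 6.0), seed=0):
    """Evaluate ae(Xk) and rel(Xk) on K random p-servers Xk in a box;
    return sample estimates of the worst absolute and relative errors."""
    rng = random.Random(seed)
    worst_ae = worst_rel = 0.0
    for _ in range(K):
        Xk = [(rng.uniform(*box), rng.uniform(*box)) for _ in range(p)]
        f, fp = pmm(Xk, A, w), pmm(Xk, A_prime, w)
        worst_ae = max(worst_ae, abs(fp - f))
        if f > 0:
            worst_rel = max(worst_rel, abs(fp - f) / f)
    return worst_ae, worst_rel

A = [(0, 0), (1, 0), (4, 3), (5, 3)]
w = [2, 1, 1, 3]
A_prime = [(1 / 3, 0.0)] * 2 + [(4.75, 3.0)] * 2
print(sampled_errors(A, A_prime, w))  # sample estimates of mae and max rel
```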
Some generally accepted way to measure location error is desirable. How should we measure the location error diff(X, Y), the "difference" between any two p-servers X and Y? The answer is not simple, because the numbering of the elements of X and of Y is arbitrary, and we must find a way to match up corresponding elements. Further, X and Y are not vectors, but sets. We propose the use of a method discussed by Francis and Lowe (1992). For motivation, consider the case where for each element xk of X there is only one "nearby" element of Y, say ŷk. In this case we might use either max{d(xk, ŷk) : k = 1, . . . , p} or Σ{d(xk, ŷk) : k = 1, . . . , p} as diff(X, Y). More generally, define the p × p matrix C = (cij) with cij = d(xi, yj). Define an assignment (permutation matrix) to be any 0/1 p × p matrix Z = (zij) having a single nonzero entry of one in each row and a single nonzero entry of one in each column, and let P denote the set of all p! such assignments (permutation matrices).


Define the objective function value v(Z) for every assignment Z ∈ P by v(Z) ≡ max{cij zij : i, j = 1, . . . , p}, so that v(Z) is the largest entry in C for which the corresponding entry in Z is one. Define ∆(X, Y) = min{v(Z) : Z ∈ P}, so that ∆(X, Y) is the minimal objective function value of the min-max assignment problem with cost matrix C. We propose using ∆(X, Y) for diff(X, Y).

There are several good reasons for using ∆(X, Y). One reason is that it has all the properties of a distance (see Goldberg 1976): symmetry, ∆(X, Y) = ∆(Y, X); nonnegativity, ∆(X, Y) ≥ 0 with ∆(X, Y) = 0 ⇔ X = Y; and the triangle inequality, ∆(X, Y) ≤ ∆(X, Z) + ∆(Z, Y) for any p-servers X, Y and Z. Another reason, further explored in Sect. 4, is that it is related to absolute error. (We could also use the optimal value of the conventional min-sum assignment model for diff(X, Y). This optimal value also has all the properties of a distance, but we know of no useful relationship between it and absolute error.) We call the distance ∆ the min-max distance. Note that, for any two p-servers X, Y ⊂ S, ∆(X, Y) ≤ diam(S). Further, when p = 1 the min-max distance is just the usual distance, d(x1, y1).

Both min-max and min-sum assignment models are well studied and are efficiently solvable in low polynomial order for any set of real coefficients (Ahuja et al. 1993). In the assignment models we study, the coefficients typically correspond to distances between points in some geometric space, e.g., the planar Euclidean or rectilinear cases. For these geometric models significantly more efficient algorithms have become available (Agarwal et al. 1999; Agarwal and Varadarajan 1999; Efrat et al. 2001; Varadarajan 1998).

There are a number of relationships between the error measures of Table 1. These relationships, some of which may not be obvious, will be a subject of discussion in Sect. 4, where we also give numerical examples of many of the error measures. It also seems worth pointing out that error measures 2 through 7 of Table 1 are local error measures, since they depend on X. By contrast, measures 1, 8 and 9 may be considered global error measures.

There is no general agreement on which aggregation error measure is best. Until the research community agrees on one or more error measures, progress in comparing various aggregation approaches, and in building a cumulative body of knowledge, will necessarily be limited. The lack of agreement on error measures also limits progress in trading off aggregation advantages and disadvantages. Further, because comparisons of various aggregation algorithm results should all be based on the same error measures, there is currently little point in developing for the profession a test data base of DPs to aggregate. For reasons discussed further in Sect. 4, we personally recommend the use of relative error based on absolute error and/or error bounds, together with ADP-DP distances. The bound in the inequality |1 − f(X∗ : A)/f(X′ : A)| ≤ 2eb/f(X′ : A) seems particularly promising.

An alternative to using some low computational order approach to aggregate the original demand point set, and then solving the resulting aggregated location model to optimality, is to use some low computational order metaheuristic approach (Pardalos and Resende 2002; Reeves 1993; Resende and de Sousa 2004) to approximately minimize the original, unaggregated location model. The first approach gives bounds on optimality for the original model. The second approach introduces an additional source of error, since a heuristic is used, but may possibly result in a better solution.
Given the current state of the art, which approach is best is not known. Indeed, “best” may not even be well-defined, since there is no generally accepted measure of aggregation error.
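For small p, the min-max distance ∆(X, Y) defined above can be computed by brute force over all p! assignments (for larger p one would use a bottleneck assignment algorithm). A minimal Euclidean sketch (ours, with illustrative data):

```python
import math
from itertools import permutations

def minmax_distance(X, Y):
    """Min-max (bottleneck) assignment distance between p-servers X, Y:
    minimize, over all matchings, the largest matched Euclidean distance."""
    p = len(X)
    return min(max(math.dist(X[i], Y[perm[i]]) for i in range(p))
               for perm in permutations(range(p)))

X = [(0.0, 0.0), (5.0, 3.0)]
Y = [(5.2, 3.1), (0.4, -0.2)]
print(minmax_distance(X, Y))  # matches each server with its nearby twin
```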

3 Literature discussion

We now provide an overview of some of the principal papers dealing with aggregation error, although we discuss some papers in other sections. The review is organized in two main categories, median problems and center/covering problems, but there is some overlap since some authors considered both types of problems. Also, unless it is clear, we indicate whether each study was for planar data (P), data on a network (N), or discrete data (D). Some of these references have been mentioned earlier in the paper but are discussed here for completeness. Refer to Tables 3 and 4, and Sect. 3.3, for a summary of our findings. The tables also give supplementary information about the papers, including whether the papers are primarily computational (C) or theoretical (T).

Table 3 p-median aggregation literature classification. [The table body could not be fully recovered from the source. For each paper it records: year and reference; emphasis (computational (C) and/or theoretical (T)); setting (planar (P), network (N), or discrete (D)); the aggregation error measures used (keyed to Tables 1 and 2); the ranges of DPs, ADPs, and p; whether real data was used; the distance measure; whether X∗ and X′ were computed (exactly or by heuristics); and whether diff(X∗, X′) was computed. Papers classified: Hillsman and Rhoda (1978), Goodchild (1979), Bach (1981), Mirchandani and Reilly (1986), Current and Schilling (1987), Casillas (1987), Ohsawa et al. (1991), Francis and Lowe (1992), Hodgson and Neuman (1993), Ballou (1994), Fotheringham et al. (1995), Francis et al. (1996), Hodgson et al. (1997), Murray and Gottsegen (1997), Andersson et al. (1998), Bowerman et al. (1999), Erkut and Bozkaya (1999), Zhao and Batta (1999), Francis et al. (2000), Plastria (2000, 2001), Zhao and Batta (2000), Hodgson (2002), Hodgson and Hewko (2003), Francis et al. (2003), Har-Peled and Mazumdar (2004), and Har-Peled and Kushal (2007).]

Table 4 p-center and covering aggregation literature classification. [Same column structure as Table 3; the table body could not be fully recovered from the source. Papers classified: Daskin et al. (1989), Current and Schilling (1990), Rayco (1996), Francis and Rayco (1996), Rayco et al. (1999), Fortney et al. (2000), Agarwal and Procopiuc (2002), Agarwal et al. (2002), Francis et al. (2004a), Har-Peled (2004a), and Har-Peled (2004b).]

3.1 Median problems

Most authors agree that Hillsman and Rhoda (1978) proposed the first formal method for classifying and measuring aggregation error for the median problem. As discussed in Sect. 2, they identified three error sources, labeled A, B and C. Source A essentially involves the inaccuracy of measuring average distance from a server location to a set of DPs via the distance to a single ADP. Source B, a special case of A, occurs when a server coincides with an ADP. Source C occurs when there is more than one server and some of the represented points are closer to one server while others are closer to another. To investigate aggregation error, they placed a grid of regular polygons (hexagons, squares, or triangles) over a planar distribution of DPs, and aggregated the total demand in each polygon to its centroid. They then placed servers at regularly spaced points on a lattice. By varying the spacing between servers they were able to vary the number of ADPs assigned to each server. They concluded that aggregation error was larger when few ADPs were assigned to each server.

Goodchild (1979), in a planar setting, studied the problem of aggregating a zone into one ADP so that the average distance to any point in the zone from any arbitrary facility location equals the distance to the ADP. He concluded that this is not possible. He then conducted an empirical study using 900 uniformly distributed DPs in the plane, aggregating them into different numbers of ADPs, and also conducted an empirical study of aggregation effects on a network. The primary focus was to study location errors due to aggregation. He compared solutions to the resulting problems graphically and concluded that ". . . the effects of aggregation error on median problems are substantial." He also noted that ". . . aggregation tends to produce much more dramatic effects on location than on the values of the objective function."

Bach (1981) studied aggregation effects for the discrete PMM, PCM and CLM using data sets for the cities of Dortmund, Kleve and Emmerich, Germany. Using these data sets, Bach studied both location error and objective function error by examining several levels of aggregation and different distance measures, e.g., Euclidean distance in the plane, travel time by car on a road network, etc. Bach states, "Thus it is possible to conclude that the level of aggregation exerts a strong influence on the optimal locational patterns as well as on the values of the locational criteria." Similar conclusions were stated for variations in distance measures.

Mirchandani and Reilly (1986) studied the problem of approximating distances to points in a region by the distance to a single point representing the region. The paper has a good review of the pre-1986 literature on aggregation. The authors give an algorithm that partially compensates for the distance error discussed above.
Current and Schilling (1987) proposed a method for eliminating error sources A and B, and tested it on real discrete data. Using n = 681 DPs, they solved problems for p = 5, 7, and 10, with 30 and 70 ADPs. Four different aggregation schemes were used in their experiments. In addition to using Σ{wj D(X, α´i) : j ∈ Ni} for aggregated models, they experimented with a modified distance matrix, as follows.


The cost of assigning ADP α´i to facility xk was taken to be Σ{wj d(aj, xk) : j ∈ Ni} when xk ≠ α´i, and Σ{wj d(aj, α´i) : j ∈ Ni} when xk = α´i. For any given aggregation scheme, use of this modified distance matrix eliminated error sources A and B, but C still remained. Letting X∗AB be the optimal solution to the aggregated problem using the modified distance matrix, Current and Schilling reported the following results on their test problems: (f(X′ : A′) − f(X′ : A))/f(X′ : A), (f(X∗AB : A′) − f(X∗AB : A))/f(X∗AB : A), (f(X′ : A) − f(X∗ : A))/f(X∗ : A), and (f(X∗AB : A) − f(X∗ : A))/f(X∗ : A). Error was reduced by using the modified distances, but some remained due to the presence of error source C.

Casillas (1987) proposed that error sources A, B and C can result in two types of error: (1) error as a result of wrong server locations (location error), and (2) error associated with an inaccurate objective function value (a comparison of f(X∗ : A) and f(X′ : A′)). He studied aggregation effects using randomly generated planar demand data. The DP data base consisted of 500 points generated using the negative exponential distribution, thus favoring the placement of points near the origin. He then ran experiments using m = 200, 150, 100 and 50 ADPs. For each level of aggregation m, he randomly selected m of the original DPs to be seeds. Then for each seed he created a zone by clustering each non-seed point with its closest seed. Next, the centroid of each of the m zones was chosen as an aggregate point. He then solved a PMM over these m ADPs for p = 1, 2, 4 and 6. He concluded that optimality error (the difference f(X∗ : A) − f(X′ : A)) was small for small p, but larger for higher levels of aggregation and larger values of p. Casillas plotted the locations of optimal facilities for some of the problems solved.

Ohsawa et al. (1991) considered the continuous min-sum and min-max (1-median, 1-center) problems in one-dimensional space and the effect of aggregating DPs into the midpoints of intervals of equal width; the authors thus refer to aggregated data as "rounded data." As the aggregate data points are determined by an origin Z and the specified interval width ω, the authors studied the case where Z is a random variable uniformly distributed between −ω/2 and ω/2, and analyzed the expected location error E{(X∗(Z) − X′(Z))²} and expected cost error E{(f(X∗(Z) : A) − f(X′(Z) : A′))²} for each of the models. They concluded that: (1) rounding appears to exert a more serious influence on the median problem than on the center problem, and (2) for both models, the circumstance that maximizes cost error minimizes location error.

As discussed in Sects. 2 and 4, Francis and Lowe (1992) developed bounds on the absolute error due to aggregation of demand data. Error bounds were developed for both the PCM and PMM. For each problem, the minimal error bound is obtained as the least objective function value of a location problem that has the same structure as, but more variables than, the original problem.

Hodgson and Neuman (1993) proposed a method for eliminating error source C by spatially disaggregating data "as needed" during the solution of a p-median problem. Their method made use of Thiessen (Voronoi) overlay polygons, where every point within a given polygon is closer to that polygon's centroid than to the centroid of any other polygon. The disaggregation process was based on membership in a polygon. Their method made use of a geographic information system (GIS) and appears to be computationally intensive.
Ballou (1994) studied costing error, (f(X′ : A) − f(X′ : A′))/f(X′ : A), for the PMM using U.S. population data, with the initial DPs being the centroids of 900 3-digit zip code areas. The weights corresponded to population. Transport cost was approximated by the formula rate = a + b × (Euclidean distance), with a and b obtained from freight rate tables. The computed rate is multiplied by population to obtain the cost of serving a given DP. Different rate functions were also tested. Ballou examined cost errors by varying m and p. Facilities were located using Cooper's (1967) location/allocation heuristic.


He found that his error measure increased as p increased (for fixed m), and that error decreased as m increased; this latter relationship displayed diminishing returns. Computational work indicated that to keep the costing error below one percent, m should be approximately 5 to 10 times p, while if costing errors in the five to ten percent range are acceptable, then m can be 3 to 5 times p.

Fotheringham et al. (1995) studied the sensitivity of both the objective function value and the optimal locations of the discrete problem to different aggregation schemes for a real data set. They examined different "zone definitions" involving 871 Buffalo, New York census blocks, aggregating them into different sized aggregate units. Using p = 10, they considered aggregation into 800, 400, 200, 100, 50 and 25 ADPs. For each aggregation level, they randomly selected m seeds from the 871 original units and then clustered the remaining (871 − m) units to the seeds, creating m ADP-regions. The centroid of each ADP-region was then computed and a 10-median problem was solved. At each aggregation level this process was repeated 20 times, and the resulting objective function values were plotted. To measure location error, they first found X∗ (the optimal solution to the original problem) and arbitrarily indexed the elements of X∗ as x∗1, x∗2, . . . , x∗10. Then, at a given aggregation level and for each of the 20 solutions, letting X′ be one of these solutions, they calculated G = Σ{d(x∗i, x′[i]) : i = 1, . . . , 10}, where x′[i] is the closest member of X′ to x∗i; in this procedure, x′[1] is found first and removed from the set of x′ values, and so on. Note that this approach amounts to a heuristic solution of a min-cost assignment problem, as discussed in Sect. 2. The 20 G values for a given aggregation level were then plotted. The authors concluded that the objective function values did not seem to depend much on the level of aggregation, but that the optimal facility locations, i.e., location error, did seem to be quite sensitive to changes in the level of aggregation.

Francis et al. (1996) considered DP aggregation for the planar PMM with rectilinear distances and weights normalized (scaled) to total 1. Their approach was to find an aggregation giving a small value of the error bound of Francis and Lowe (1992). The authors showed that the error bound they studied, while worst-case, is attainable, and gave necessary and sufficient conditions for attainability. They also gave a discussion of various error measures, and an argument, based on work of Geoffrion (1977), that their error bound was a "natural" error measure to use. The authors considered a collection of row-column aggregations with one ADP per cell. Their approach, which built on work of Hassin and Tamir (1991), minimized the error bound over a (restricted) class of row-column aggregations; rows (and columns) need not have equal spacings. Adjusting the number of rows and columns adjusts the total number of ADPs. For problems with real data, the authors found it helped to add an extra "touch-up" step of low computational order to obtain a better bound. Their procedure, predominantly of O(n log n), was tested extensively using both randomly generated data and real data. Error bounds, as well as various sample error estimates, were computed. The law of diminishing returns occurred in all the testing; each error measure decreased at a decreasing rate as the number of ADPs increased.

Hodgson et al. (1997) studied different types of aggregation error using Edmonton, Canada census data.
In addition to error sources A, B and C identified by Hillsman and Rhoda (1978), they also defined and studied Source D error, the error introduced by aggregating potential server locations: an error might arise when a potential facility site is not available because it has been aggregated with one or more other potential sites. Computational work was performed on problems of sizes 668 × 668, 158 × 158, 158 × 668, and 668 × 158, where the first entry is m (smaller values are the result of aggregation) and the second entry is the number of potential facility locations. The problem was solved on these data. The authors reported computational estimates of (f(X′ : A′) − f(X′ : A))/f(X′ : A) (called measurement error) and (f(X′ : A) − f(X∗ : A))/f(X∗ : A) (called location error) for various values of p. They found that measurement error decreased with p, while location error increased with p. They also studied isolating the various error sources into types A through D.


Murray and Gottsegen (1997) studied the stability of facility locations and objective function values for the planar PMM with respect to various levels of demand data aggregation, and with respect to various aggregation schemes at a fixed level of aggregation. They used U.S. Bureau of the Census block group data from the 1990 census for the Buffalo, New York metropolitan area. The original DPs consisted of 913 block groups of elderly citizens, totaling 138,515 people, with demands for senior services centers. A total of 10 servers were located using the DPs, giving f(X∗ : A). Then f(X′ : A′) was found for aggregation levels of m = 100, 200, 400 and 800. Furthermore, for each level of aggregation, various clustering methods were used to create the aggregate sets. They found that although the server locations varied quite a bit depending on the level of aggregation and the aggregation method used, for a fixed value of m the corresponding objective function values of the various location sets, when evaluated against the original data, i.e., f(X′ : A), did not seem to vary that much. Comparing f(X∗ : A) with f(X′ : A), they found that more aggregation (smaller m values) produced poorer results. They concluded that ". . . more intelligent and sophisticated methods of aggregating data do appear to be worth the additional effort in terms of reducing error."

Andersson et al. (1998) studied DP aggregation methods for network versions of the PMM and PCM. A coarse aggregation structure was first obtained by partitioning the DPs according to a variable-spacing grid imposed over the demand region, using "row-column" aggregation algorithms of Francis et al. (1996) or of Rayco et al. (1997). A second step located an ADP on each of the subnetworks induced by the cells of the grid; each ADP is either a network 1-median or a network 1-center, depending on the problem. Optionally, the ADP set initialized an iterative network location-allocation method to find improved ADPs. The concept of a network Voronoi diagram (see Hakimi et al. 1992) was used extensively to find closest ADPs to DPs. The network data sets (not aggregated) were computerized maps from the TIGER/Line database of the U.S. Census Bureau. The maps were of large U.S. cities, including Jacksonville, Florida and Chicago, Illinois. Hypothetical DPs were spaced equally along network arcs. In addition to the four sample error measures discussed in Sect. 2, a "relative error bound" eb/f(X : A), an upper bound on the relative error, was considered. Error measures obeyed the law of diminishing returns with increasing numbers of ADPs. For the 5-median problem and the city of Jacksonville, with n = 206,761, for example, the sample maximum relative error was at most 1% with at least 500 ADPs. While the aggregations were sensitive to the street network structure, the error estimates were not, and they were not too sensitive to the value of p. Computed error bounds largely overestimated the sample errors for both models, but the sample errors closely tracked the error bound values. Error estimates based on rectilinear distances were consistently less than the more accurate estimates based on network distances. Overall, the approach worked well for the PMM, while results were mixed for the PCM.

Bowerman et al. (1999) proposed a data partitioning method to eliminate error sources A, B and C for the PMM.
This iterative method essentially combined the approaches developed by Current and Schilling (1987) to eliminate sources A and B, and by Hodgson and Neuman (1993) to eliminate source C. The method was tested using network data from the Central Valley of Costa Rica. The disaggregated data set had 2,630 demand nodes along with 731 potential sites for facilities. They tested their method using 209 ADPs with p ranging from 5 to 50, and reported results for error measures 1 and 3 of Table 2. They found that their method was more effective than that of Current and Schilling (1987) in reducing error, but the computation times were higher.

Erkut and Bozkaya (1999) provided an excellent review of the pre-1995 literature on aggregation error analysis for planar problems.


In addition to the review, Erkut and Bozkaya asserted that Source A–C errors result in three types of PMM output perturbations: (a) location error (see error 10, Table 1), (b) cost error, f(X∗ : A) versus f(X′ : A′), and (c) optimality error, i.e., f(X∗ : A) versus f(X′ : A). They made a case that some of the error analysis reported in the literature was exaggerated because of the selection of test data, e.g., using uniformly distributed data as opposed to a more realistic data distribution. They also gave "dos" and "don'ts" for spatial aggregation.

Zhao and Batta (1999) presented a theoretical analysis of aggregation error for the planar PMM with Euclidean distance and centroid ADPs. They studied both worst-case and average-case error. Among other theoretical results, they developed bounds on the total DP error (error measure 4, Table 1) for the 1-median problem evaluated at any point X. They applied their analytical results to housing location data from Buffalo, New York, and Ontario, California. The Buffalo problem involved 5032 housing locations and m = 50 ADPs. In their experiments they measured both total error and optimality error (error measure 3, Table 2). The Ontario experiments involved a sort of reverse procedure: they started with 76 zip code areas and then randomly generated n > 76 data points in the zones, measuring the same kinds of errors as for the Buffalo problem.

We discuss material in Francis et al. (2000) in Sect. 4.

Plastria (2000, 2001) studied aggregation error for the planar problem. The papers are primarily theoretical, and have good literature discussions. He considered various distance measures derived from gauge functions. He made a strong case for aggregating data at the centroid of the data set. An important finding of his is that aggregation error for the 1-median problem goes to zero as the distance between the server and the centroid increases along a ray originating at the centroid. Plastria (2001) contains some results on computational experiments with a DP set consisting of 2000 randomly generated points in the plane. He measured error sources A and B in his experiments, with m = 400 ADPs and p = 1, 3 and 5.

Zhao and Batta (2000) considered a seldom-studied type of aggregation error. Perhaps due to budgeting constraints, it may be that only a subset of all feasible solutions to a location problem can be considered. They studied a problem of this type on a network where demands could be on links of the network and could be discrete or continuous. They found that, by restricting the search for a solution to the nodes of the network, the resulting error involved only the demands on a single link of the network. Based on this result, they also suggested a "link aggregation" scheme to further reduce the error.

Hodgson (2002) defined data surrogation error for a PMM as the error which occurs when an inappropriate parameter is used to stand in for a target population's demand. An example of this error was illustrated for 25 Canadian cities where general population data is substituted for target data: either small children or elderly citizens. In a follow-up paper, Hodgson and Hewko (2003) investigated the relative sizes of total DP error (error measure 4, Table 1) and surrogation errors using Edmonton, Canada data. In their limited testing, they found that for the data they used, surrogation error was a much more serious problem than ABC error.
Francis et al. (2003) presented theory and algorithms for aggregations for the 1-median rectilinear distance problem on the plane. The authors first studied properties of an Aligned Row-Column Aggregation (ARC) algorithm that minimizes, over all aligned row-column aggregations, the maximal absolute error for the 1-median rectilinear distance problem. The Centroid Row-Column (CRC) aggregation then is based on the partitioning defined by the ARC aggregation, using the centroids of the sets as the aggregate points. The row-column
partitioning is itself determined by solving two contiguous DP aggregation problems on a line. Two algorithms were presented for solving these problems on a line: a bisection search method, and a dynamic programming procedure. While the development of the CRC was based on the 1-median problem, the authors posited that the aggregate sets so obtained can be quite effective for PMM aggregations for p ≥ 2. In their computational testing they used values of p = 1, 3, 5. They found that the error measures examined with the CRC aggregation could be well-modeled by a power function of the form a/m^b, where m is the number of aggregate points, a is a positive constant, and b ≥ 1.

Finally, we refer to the work of Har-Peled and Mazumdar (2004) and Har-Peled and Kushal (2007), who studied the existence and construction of small ADP sets for weighted Euclidean p-median problems defined in R^k. Their main motivation was to improve on and obtain more efficient approximation algorithms for weighted Euclidean p-median problems. They use the relative error measure, rel(X), defined in Sect. 2. Specifically, given a weighted set A of n DPs in R^k, and a parameter ε, Har-Peled and Mazumdar (2004) prove that there is a weighted set A′, A′ ⊂ A, of size O((p/ε^k) log n), such that for the weighted p-median models, denoted by f(X|A) and f(X|A′) respectively, we have (1 − ε)f(X|A) ≤ f(X|A′) ≤ (1 + ε)f(X|A) for any subset X of p points in R^k. Equivalently, with our terminology, when f(X|A) > 0, rel(X) ≤ ε. They call such a set A′ an ε-coreset for the p-median problem on A, and show that it can be constructed in time O(n + poly(p, log n, 1/ε)), where poly(·) is a polynomial. Har-Peled and Kushal (2007) improved the above result and showed that one can construct a coreset whose size is independent of n, the size of A. In particular, they construct an ε-coreset A′ for the weighted Euclidean p-median problem of size O(p^2/ε^k). However, the coreset A′ that they construct is not necessarily a subset of A. We note that with the above results, if we let X∗ and X′ denote optimal solutions for f(X|A) and f(X|A′), respectively, then one can show that

0 ≤ f(X′|A) − f(X∗|A) ≤ (2ε/(1 − ε)) f(X∗|A).

3.2 Center and cover problems

Daskin et al. (1989) studied aggregation error for discrete planar maximum covering problems. Using n = 355 DPs representing demand areas in the U.S., they measured three error types with three different aggregation schemes. In their experiments, they aggregated both DPs as well as candidate facility site nodes. For each aggregation scheme, they measured a value they call “optimality error,” a number computed using (a) the optimal objective value of the model using all 355 DPs, and (b) the facility locations determined via an aggregated model but evaluated using all DPs, i.e., error measure 3 in Table 2. A second error measure, which they called “coverage error,” used (b) above in comparison with the objective value of the aggregated problem, i.e., error measure 1 in Table 2. They also measured location error using a rather interesting definition of this metric. To measure location error, for each aggregate candidate site α′_i they identified the set of DPs that are closest to α′_i. Then, after solving both the aggregated and disaggregated versions of the problem, they measured location error as follows. If a node is chosen in the disaggregated solution and it is associated with an α′_i chosen in the aggregated solution, then no location error occurs. However, if α′_i is not chosen
in the aggregated solution, then a location error has occurred. They found that the percentage of location error tended to decrease as m increased. They also found that location error exceeded both optimality error and coverage error for all of the problems examined. From this observation, they “qualitatively confirmed” Goodchild’s finding that “aggregation has a greater effect on location decisions than on the values of the objective function.”

For planar problems, Current and Schilling (1990) defined CLM counterparts to the Hillsman and Rhoda (HR) source A, B and C errors. They also presented three aggregation rules which, when applied during the data aggregation process, will reduce these errors. The analogue of the HR source A error applied to the covering problem is that once an aggregated problem is formulated and solved, an original DP may be reported as covered (because its ADP is covered), when in actuality it is not. Also, it may be that the original DP is covered, but the aggregate model indicates that it is not. Current and Schilling noted that there is no analogue of error source C in the covering problem. Their three aggregation rules were tested using an n = 681 node data set representing Baltimore City, Maryland. They found that their aggregation rules were effective in reducing problem size as well as aggregation error.

Francis and Rayco (1996) considered aggregation schemes for the unweighted PCM in the plane with rectilinear distances. They noted that the error bound function for the PCM developed by Francis and Lowe (1992) might be difficult to minimize when the number of aggregate points is large. Thus they developed an over-approximation to the error bound function and studied its behavior. They also developed a lower bound for the minimal value of this error bound, and gave necessary and sufficient conditions for the lower bound to be attained. They then gave a constructive aggregation algorithm that attains the lower bound asymptotically as the number of ADPs increases. Their results parallel the results derived by Zemel (1985) for Euclidean distance PCMs and PMMs; however, Zemel did not focus on the implications of the results for designing good aggregation schemes.

Rayco et al. (1999) presented an aggregation algorithm for the rectilinear distance PCM. The aggregate set identified imposed a grid structure on the plane. The cells comprising the grid structure were diamond-shaped and all of the same user-specified dimensions. The positioning of the grid was determined by minimizing an upper bound eb on the objective function error, where |f(X : A′) − f(X : A)| ≤ eb for all X. Minimizing this error bound was shown to decompose into 1-center problems on cycles. In their computational testing, the authors applied the algorithm to computer-generated data sets, as well as a real-world data set instance; they found a decreasing rate of improvement in the error measures as the number of ADPs increased.

In a study of access to health services, Fortney et al. (2000) examined metrics related to both center and covering objectives for planar data from the state of Arkansas. They studied aggregation of location data for 435 patients in need of care, as well as aggregation of over 4000 health care provider locations. Since all data was available beforehand, no optimization was involved. Aggregation was done at the zip code level with centroids taken as locations. Results of coverage using aggregate data were then compared with coverage using street address encoding.
They found that for the center-type objectives (called accessibility in their paper), when compared with zip code aggregation, a large increase in accuracy can be obtained by using a GIS to encode both patient and provider locations at the street level. For the covering-type objective (called availability), they found that street-level geocoding did not provide a substantial accuracy improvement.

Francis et al. (2004a) considered aggregation issues for location problems that involve nearest distances of DPs to servers, where these distances are in the objective function as well as in a set of constraints. They used and extended error bound results developed by Francis
et al. (2000) for the constrained problems, to be able to analyze the effect of aggregation on the problem constraints. The paper made use of penalty functions as a method for collapsing the constraints into the objective function. An application of the method was illustrated via the CLM. (See Sect. 4 for more details.)

Finally, we refer to the work by Agarwal and Procopiuc (2002), Agarwal et al. (2002), Har-Peled (2004a, 2004b) and Agarwal et al. (2005), who studied the existence and construction of small ADP sets for unweighted Euclidean p-center problems defined in R^k. Using our terminology, they use the relative error measure defined in Sect. 2. Specifically, given a set A of n DPs in R^k, and a parameter ε, they prove that there is a set A′, A′ ⊂ A, of size O(p/ε^k), such that for the unweighted p-center models f(X|A) and f(X|A′) we have f(X|A) ≤ (1 + ε)f(X|A′) for any subset X of p points in R^k. With our terminology, when f(X|A) is not zero, rel(X) ≤ ε. (Note that for this unweighted model we clearly have f(X|A′) ≤ f(X|A).) They call such a set A′ an additive ε-coreset for the p-center problem on A, and show that it can be constructed in time O(n + p/ε^k). (It should be pointed out that the additive ε-coreset that they define is more general; we use a simplified version sufficient for our presentation and discussion.) We note that with the above results, if we let X∗ and X′ denote optimal solutions for f(X|A) and f(X|A′), respectively, then one can show that

0 ≤ f(X′|A) − f(X∗|A) ≤ εf(X∗|A).

3.3 Other related work, and literature conclusions

In this section we discuss a few papers that do not fit into our classification scheme. We also draw some conclusions from our study of the literature.

Webber (1980) considered a spatial interaction model, which expressed the idea that the proportion of flows p_ij from origin i to destination j depends on the attractiveness of j, typically measured as the “size” m_j of j, and some function of the distance d_ij between i and j: m_j^γ f(d_ij), where γ is a parameter reflecting the significance of the size of the destination as an attractive force. The author argued that the use in spatial interaction models of parameter estimates that are derived using the assumption that all individuals in a zone are located at the center of the zone is flawed. He indicated that computations that equate interzonal distance to the distance between the zone centers give rise to biased estimators. Instead, Webber used maximum likelihood methodology to propose the use of estimators that eliminate such bias.

Rodriguez-Bachiller (1983) looked into errors introduced with the use of intercentroid distances for measuring separation between discrete zones, especially for spatial models involving a power or exponential function of distance. He proposed an error-correcting approach which utilized the following formula: I_ij = √(d_ij^2 + d_ii^2 + d_jj^2), where d_ii (d_jj) is the average distance between all the points in zone i (j) and its centroid, and d_ij is the intercentroid distance between zones i and j. Modifications to this transformation formula, depending on whether the underlying network of interest exhibited low connectivity or contained so-called “privileged” links, were also provided. The error correction transformation proposed was tested on one particular urban area, using three levels of aggregation (119, 29, and 4 zones), and a reduction in the distortion was reported. Rodriguez-Bachiller characterized his approach as “purely intuitive.”

Rushton (1989) focused on the degree to which locational complexity and geographical complexity in a location model influence solution quality. The former refers to the identification and evaluation of a large number of potential locations; the latter refers to the
representation of the geography of an area. The level of spatial aggregation employed in a model falls within the latter concern. In particular, Rushton considered the effect of employing discrete structures to represent continuously distributed data. He questioned Goodchild’s (1979) statement that “aggregation tends to produce much more dramatic effects on location than on the values of the objective function,” and asserted that optimal locations are sensitive at some level of data aggregation. He identified three ways to avoid error due to aggregation, namely: (1) data disaggregation, (2) the removal of Type A and B errors (as outlined by Current and Schilling 1987), and (3) modeling and solving the problem in continuous space. Still, these suggestions do not come with clear-cut methodologies; for instance, the disadvantages of having large data sets with disaggregated data remain. Other geographical representation issues Rushton analyzed were the representation of distance measures, the effect of representing boundaries on solution quality, and the accuracy of modeling interaction patterns. In the end, Rushton argued for more explicit attention to the tradeoffs in the level of detail employed for capturing complexity, locational and geographical, in location problems.

Drezner (1995b) and Drezner and Drezner (1997) considered a market share location problem for a single server. There are competitors with known locations competing for market share, and demand points providing the market. While the approach is not demand point aggregation, it is related to it. It is assumed, effectively, that there are an infinite number of DPs, so that the original (continuous) model, say f(X), is one involving integrals (instead of sums) and the location X of a single server. The continuous model is compared to a discrete one involving sums, say f′(X). The discrete model is constructed using what amounts to DP aggregation to obtain a finite number of ADPs. The authors compare graphs of surfaces of f(X) and f′(X), and find f′(X) has local optima that f(X) does not have. With the discrete model they also consider location error graphically, and find as well that “calculation of market share is inaccurate when two facilities are located close to one another and one is located near a (aggregate) DP whereas the other is not.” Thus errors similar to Type A/B errors for the PMM can occur. All the computational experience is for the case when the DPs are uniformly distributed over a given planar set of known area. Also for this uniform case, the authors reduce A/B errors by using a modified distance measure in f′(X) based on a “distance correction” approach.

Hale and Hale (2000) considered solving a single facility planar location problem with one-dimensional barriers to travel. The motivating problem was to locate a facility in a city divided by a river spanned by bridges. Travel from one side of the city to the other was thus by a bridge that yielded a shortest travel distance among all the bridges. The river represented a barrier to travel that could be “penetrated” at the bridge points. The bridges effected a simplification of the problem of computing the average travel distance between a point on one side of a river and any subregion on the opposite river side. Connections were made to a substantial literature involving the use of integrals to compute average travel distances between regions of various shapes, and there was a good literature discussion.
The use of integrals can be viewed as a sort of DP aggregation. There was no computational experience, and no aggregation error measure was used. One integral was evaluated to illustrate the average travel distance idea assuming uniformly distributed DPs.

Hale et al. (2000) considered a planar region with DPs for which a PMM is to be solved. They assumed the region was partitioned in some way into N̂ subregions. Assuming a DP density function was given, and making an independence assumption, they computed the average distance between each pair of subregions using an integral. They then constructed a network with N̂ nodes, where the length of arc (i, j) was the average distance between subregions i and j. Each node i representing a subregion of positive area had a self-loop
representing the average distance between any two points in the subregion. They then used the network information to construct an uncapacitated facility network location model to represent an aggregated approximation to the original PMM for the planar region. One numerical example was given of a network with 5 nodes, assuming uniformly distributed DPs and rectilinear distances. The resulting uncapacitated facility location model was solved by complete enumeration for p = 2. One facility location, instead of being pinpointed, was localized only to a subregion. There was no computational experimentation, and no error measure was reported.

We draw the following conclusions from our literature survey:

1. There is much more aggregation literature for median than for center, covering and other models;
2. The work of Hillsman and Rhoda is widely recognized and influential; in particular, self-canceling error is a helpful concept for models with additive structure;
3. Justifying theory for using centroids as ADPs is limited to median models;
4. Aggregation error bounds can be useful, particularly for center and covering models;
5. Location modelers should be aware of the law of diminishing returns for aggregation error;
6. Although there are clear tradeoffs in doing aggregation, there are no analytical models for addressing them;
7. There is little average-case analysis of aggregation error;
8. Progress is definitely being made in understanding aggregation error;
9. DP data for aggregation algorithm testing is often computer-generated instead of being real data;
10. Sometimes the number of DPs used in testing aggregation methods is so small that aggregation is not really necessary, although this can depend on the computational state of the art at the time the work was done;
11. There appears to be little or no theoretical basis for the sometimes implicit assumption that conclusions for aggregation of “small” problems scale up to similar conclusions for “large” problems;
12. The variety of findings for location error, and the use of heuristics to compute X∗ and X′, support statements in Sect. 2 that this error is difficult to deal with;
13. Aggregation error measures used vary greatly, and there is no agreement on how to measure error; hence it is pointless to ask which aggregation algorithm is best, since “best” is not defined.

4 ADP-DP distances, SAND functions and aggregation error bounds, constraint aggregation, big problems versus small ones

This section deals with specific aggregation concepts in more detail. In Sect. 4.1 we point out worst-case failings of some aggregation error measures and suggest a repair based on ADP-DP distance bounds. In Sect. 4.2 we consider the prevalent case when the real-valued costing function g with domain R_+^n has special structure. We say that g is subadditive (SA) if for every U, V ∈ R_+^n we have g(U + V) ≤ g(U) + g(V). We say that g is nondecreasing (ND) if for every U, V ∈ R_+^n with U ≤ V we have g(U) ≤ g(V). We say that g is SAND if it is both subadditive and nondecreasing. When g is SAND, there are easily computed error bounds, as well as a Lipschitz-like property for the location function f(X : A) (with possible implications for location error) that we believe to be new. In Sect. 4.3 we consider aggregation for constraints of location models, concentrating on the CLM. In Sect. 4.4 we discuss DP aggregation for PCMs and CLMs where big models are not like small ones, thus raising the question of the validity of drawing conclusions about aggregation for large models based on the study of aggregation for small ones.

4.1 Error measure comparisons

Table 5 presents a small aggregation example. The example serves to illustrate some basic aggregation ideas, as well as worst-case limitations of some aggregation error measures. There are five DPs, N = {1, . . . , 5}; N_1 = {1, 2}, N_2 = {3, 4, 5} is the partition of N defining the aggregation. DPs are aggregated into two ADPs α′_1 and α′_2, as shown; each ADP is a closest one to its DP. For the given X, closest distances D(X, a_j) and D(X, a′_j) are also shown.

Table 5 Example of aggregation and aggregation errors for the unweighted planar 2-median X = {(0, 4), (0, 2)} using closest rectilinear distances, with N_1 = {1, 2}, N_2 = {3, 4, 5}

j | a_j     | a′_j          | D(X, a_j) | D(X, a′_j) | e_j(X) | abc_i(X), i = 1, 2 | d(a_j, a′_j)
1 | (−r, 1) | (r, 2) = α′_1 | r + 1     | r          | −1     | abc_1(X) = −1      | 2r + 1
2 | (−r, 2) | (r, 2) = α′_1 | r         | r          | 0      |                    | 2r
3 | (−r, 3) | (r, 4) = α′_2 | r + 1     | r          | −1     | abc_2(X) = −2      | 2r + 1
4 | (−r, 4) | (r, 4) = α′_2 | r         | r          | 0      |                    | 2r
5 | (−r, 5) | (r, 4) = α′_2 | r + 1     | r          | −1     |                    | 2r + 1

f(X : A) = 5r + 3, f(X : A′) = 5r, e(X) = −3, ae(X) = 3, eb = 10r + 3

Since the example model is unweighted, e_j(X) = D(X, a′_j) − D(X, a_j) for j ∈ N, so abc_1(X) = e_1(X) + e_2(X) = −1 and abc_2(X) = e_3(X) + e_4(X) + e_5(X) = −2. Distance difference errors are identical to DP errors, and are all nearly zero. Total DP error e(X) is also nearly zero, so absolute error is nearly zero, as are the ABC errors. We have rel(X) = |f′(X) − f(X)|/f(X) = 3/(5r + 3), which is small for large r. Other choices of 2-medians placed on the y-axis would lead to similar conclusions. Note the value of r is arbitrary, so that (last column) the ADPs can be arbitrarily far from the DPs. Hence the error measures of the last paragraph, which have small values, are misleading for this X. This example demonstrates that local error measures may be misleading, and that we really need the error measures to be nearly zero for all X. This need for all the error measures to be small will lead (below) to a relationship between the ADP-DP distance error measures and the others in Table 5.

If we use Table 5 to illustrate the unweighted 2-center model, then f(X : A) = max{D(X, a_j) : j ∈ N} = r + 1, f(X : A′) = r, ae(X) = 1, and rel(X) = 1/(r + 1), which is small for large r. Except for the error bound measure, eb = 2r + 1, the error measures that are defined for this model are unreliable for the given X.

The above example is a small one, but it is possible to construct examples with many DPs where the error measures still fail in a similar manner.
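The arithmetic behind Table 5 is easy to check numerically. The following Python sketch is ours (with r fixed at 10 for concreteness; any r > 0 gives the same pattern) and recomputes the quantities shown in the table.

```python
# Numerical check of the Table 5 example (a sketch; r = 10 for concreteness).
def rect(p, q):                        # rectilinear (l1) distance
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def D(X, a):                           # closest distance from point a to server set X
    return min(rect(x, a) for x in X)

r = 10
X  = [(0, 4), (0, 2)]
A  = [(-r, 1), (-r, 2), (-r, 3), (-r, 4), (-r, 5)]   # DPs
Ap = [(r, 2), (r, 2), (r, 4), (r, 4), (r, 4)]        # their ADPs

f  = sum(D(X, a) for a in A)                         # 5r + 3 = 53
fp = sum(D(X, ap) for ap in Ap)                      # 5r     = 50
eb = sum(rect(a, ap) for a, ap in zip(A, Ap))        # 10r + 3 = 103
print(f, fp, fp - f, eb)                             # 53 50 -3 103
print(max(rect(a, ap) for a, ap in zip(A, Ap)))      # 2r + 1 = 21: ADPs are far away
```

Note how the error e(X) = −3 stays small while every ADP-DP distance is of order 2r, which is the failure the text describes.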
The following observation is a basis for finding an aggregation criterion that holds for all X.

ADP-DP distance bounds
(a) For any DP a_j and its ADP a′_j, j ∈ N, and any p-server X ⊂ S, the ADP-DP distance differences satisfy
−d(a_j, a′_j) ≤ D(X, a_j) − D(X, a′_j) ≤ d(a_j, a′_j), i.e., |D(X, a_j) − D(X, a′_j)| ≤ d(a_j, a′_j).
(b) We have max{|D(X, a_j) − D(X, a′_j)| : X a p-server, X ⊂ S} = d(a_j, a′_j).

Part (a) above is presented in Francis and Lowe (1992), and is due to the triangle inequality for distances. Part (b) follows from part (a) and the following:

ADP-DP observation If a′_j ∈ X and is the closest point in X to a_j, or a_j ∈ X and is the closest point in X to a′_j, then |D(X, a_j) − D(X, a′_j)| = d(a_j, a′_j).

Let ε be any small positive number. We now relate ADP-DP distances to distance difference errors.

ADP-DP distance bound claim For any ε > 0, the following are equivalent:
−ε ≤ D(X, a_j) − D(X, a′_j) ≤ ε for all X ⊂ S, j ∈ N;
d(a_j, a′_j) ≤ ε for all j ∈ N;
max{d(a_j, a′_j) : j ∈ N} ≤ ε.

Proof The following are all clearly equivalent:
−ε ≤ D(X, a_j) − D(X, a′_j) ≤ ε for all X ⊂ S, j ∈ N;
|D(X, a_j) − D(X, a′_j)| ≤ ε for all X ⊂ S, j ∈ N;
max{|D(X, a_j) − D(X, a′_j)| : X ⊂ S} ≤ ε, j ∈ N;
d(a_j, a′_j) ≤ ε, j ∈ N (using (b) above);
max{d(a_j, a′_j) : j ∈ N} ≤ ε. □
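A quick numerical illustration of part (b), using our own toy data: placing a server at the ADP itself makes the distance difference attain d(a_j, a′_j).

```python
# Sketch illustrating part (b): with a'_j in X and closest to a_j, the
# distance difference |D(X, a_j) - D(X, a'_j)| equals d(a_j, a'_j).
def rect(p, q):                       # rectilinear (l1) distance
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def D(X, a):                          # closest distance from a to server set X
    return min(rect(x, a) for x in X)

a, ap = (-10, 1), (10, 2)             # DP 1 and its ADP from Table 5 with r = 10
X = [ap, (100, 100)]                  # ap is in X and is the point of X closest to a
print(abs(D(X, a) - D(X, ap)), rect(a, ap))   # both print 21 = d(a, a')
```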

The above claim leads to a rationale to consider ADP-DP distances when doing aggregation; having them all small guarantees that other error measures are also small for all X. Note, since max{d(a_j, a′_j) : j ∈ N} ≤ ε, the smallest that ε can be for the given ADPs is max{d(a_j, a′_j) : j ∈ N}.

ADP-DP error measure guarantee Consider the PMM, and let W = w_1 + · · · + w_n. If d(a_j, a′_j) ≤ ε for all j ∈ N, then
(0) (a) whenever D(X, a′_j) ≤ r_j, then D(X, a_j) ≤ r_j + ε, j ∈ N; (b) whenever D(X, a′_j) > r_j, then D(X, a_j) > r_j − ε, j ∈ N,
(1) |e_j(X)| ≤ w_j ε for j ∈ N and all X ⊂ S,
(2) |e(X)| ≤ |e_1(X)| + · · · + |e_n(X)| ≤ Wε for all X ⊂ S,
(3) |abc_i(X)| ≤ ω_i ε for all X ⊂ S, i = 1, . . . , m (ω_i ≡ Σ{w_j : j ∈ N_i}),
(4) ae(X) ≤ Wε for all X ⊂ S,
(5) mae(f′, f) ≤ Wε,
(6) eb = Wε is a valid error bound.

The conclusion we draw from the example of Table 5, and the above guarantee, is that if we keep all ADP-DP distances small in doing a PMM aggregation, then other aggregation measures will also be small. If we do not, then Table 5 illustrates that other aggregation measures can be small but we can have a bad aggregation. In effect, if we omit from our set of aggregation error measures the condition that all ADP-DP distances are small, then the remaining error measures (2 through 8 of Table 1) can be incomplete and unreliable.

A similar observation and guarantee apply to the PCM, f(X : A) = max{w_j D(X, a_j) : j ∈ N}. Define W_max = max{w_j : j ∈ N}. To obtain the observation from the one above, replace W by W_max in items (4) through (6) above; omit items (2) and (3) (which do not apply).

4.2 SAND costing functions, error bounds, location error, and the Lipschitz condition

The topic of using ADP-DP distances for aggregation evaluation is closely related to aggregation error bounds. These bounds avoid the limitations of some aggregation error measures by using ADP-DP distances to define the error bounds, thus implicitly using the ADP-DP distance aggregation error measure. They also incorporate the model cost structure. It is worth considering these error bounds since the bounds are useful, easy to compute, and apply to a large class of models (see Francis et al. 2000, 2004a, 2004c). The error bounds lead naturally to aggregation algorithms and have been found to be computationally useful for PMMs (Francis et al. 1996), PCMs (Rayco et al. 1997, 1999), and CLMs (Emir-Farinas and Francis 2005).

Our location model of interest is given by f(X : A) = g(D(X, A)), with D(X, A) = (D(X, a_j)) ∈ R_+^n the n-vector of closest distances, and g a costing function. From Sect. 1, taking g to be the sum-function or the max-function gives the PMM or PCM respectively. Each of these g functions is an instance of the larger class of SAND functions considered here. These SAND costing functions provide a unifying aggregation theory as well as a sort of Lipschitz condition relating location error Δ(X′, X∗) and optimality error |f(X′ : A) − f(X∗ : A)|.

Carrizosa et al. (2000) studied a model related to the one we discuss. They consider only single-facility models. They assume g is nondecreasing and quasiconcave. Their work, like ours, involves using approximating distance functions, but these are defined by general gauges. Also their work, like ours, involves Lipschitz conditions, which we consider in more detail in the Appendix. There is a substantial location literature involving Lipschitz costing functions; for a review see Hansen et al. (1995); for specific uses see Plastria (1992) and Romero-Morales et al. (1997).

Let g be a real-valued function defined on R_+^n. We assume that g is SAND. It is known that g being SAND implies 0 ≤ g(U) for all U ∈ R_+^n. Often it is also reasonable to assume g(0) = 0, giving f(X : A) = 0 if D(X, A) = 0. If g is SAND with g(0) = 0 and g(U) > 0 for every nonzero U ∈ R_+^n, we say that g is super-SAND, and write g is SSAND. The max function and the sum function are SSAND.

Basic SAND inequalities Suppose g is SAND, with domain R_+^n. For any U_1, U_2, U_3 ∈ R_+^n with U_1 ≤ U_2 + U_3 and U_2 ≤ U_1 + U_3, we have g(U_1) ≤ g(U_2) + g(U_3) and g(U_2) ≤ g(U_1) + g(U_3), using the ND and SA properties in turn. Thus |g(U_2) − g(U_1)| ≤ g(U_3).
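As a small sanity check (ours, with randomly generated vectors), the basic SAND inequality can be spot-checked numerically for the sum and max costing functions:

```python
# Spot-check of the basic SAND inequality |g(U2) - g(U1)| <= g(U3) for the
# sum- and max-costing functions, on vectors with U1 <= U2 + U3 and
# U2 <= U1 + U3 componentwise (an illustrative sketch, not a proof).
import random

def g_sum(U):
    return sum(U)

def g_max(U):
    return max(U)

random.seed(1)
for _ in range(1000):
    U2 = [random.uniform(0, 10) for _ in range(5)]
    U3 = [random.uniform(0, 2) for _ in range(5)]
    # Perturb U2 by at most U3 componentwise, so both premises hold.
    U1 = [max(0.0, u2 + random.uniform(-u3, u3)) for u2, u3 in zip(U2, U3)]
    for g in (g_sum, g_max):
        assert abs(g(U2) - g(U1)) <= g(U3) + 1e-9
print("basic SAND inequality held on all trials")
```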
We now use the fact that closest-distances satisfy several triangle inequalities (Zemel 1985; Francis and Lowe 1992) and the SAND inequalities, to get several useful bounds.
Closest-distance DP triangle inequalities Given the DPs a_j and ADPs a′_j, j ∈ N, define the vector of ADP-DP distances δ(A′, A) = δ(A, A′) = (d(a_j, a′_j)) ∈ R_+^n. We have D(X : A) ≤ D(X : A′) + δ(A′, A) and D(X : A′) ≤ D(X : A) + δ(A, A′) for every p-server X ⊂ S.

Basic SAND error bound Applying the basic SAND inequalities to the above triangle inequalities, with eb ≡ g(δ(A′, A)), gives |g(D(X, A′)) − g(D(X, A))| ≤ eb for every p-server X ⊂ S.

For examples of error bounds, when either g(U) = w_1 u_1 + · · · + w_n u_n or g(U) = max{w_1 u_1, . . . , w_n u_n}, we have either eb = w_1 d(a_1, a′_1) + · · · + w_n d(a_n, a′_n) or eb = max{w_1 d(a_1, a′_1), . . . , w_n d(a_n, a′_n)}, giving error bounds for the PMM or PCM respectively. For the 2-median example of Table 5 with all w_j = 1, eb = 10r + 3, which is large if r is large. If r = 0, then eb = 3 = ae(X), illustrating that the error bound can be tight. For the related 2-center example also based on the data of Table 5, f(X : A) = r + 1, f(X : A′) = r, ae(X) = 1, and eb = 2r + 1, suggesting a bad aggregation. If r = 0 then ae(X) = eb, so the error bound is tight. Note the error bound g(δ(A′, A)) is necessarily worst-case, since it is valid for all X ⊂ S. The PCM error bound does not capture the self-canceling error concept that occurs with additive models like the PMM. When g is SSAND it is known that g(δ(A′, A)) defines a distance on the set of all possible pairs of n-tuples of DPs.

There are several costing functions g determined to be SAND (Francis et al. 2000), which include many others as special cases. The convex ordered median costing function discussed below unifies the PMM, PCM, p-centrum and p-centdian location models. This function was introduced independently by Nickel and Puerto (1999), Rodriguez-Chia et al. (2000), and Puerto and Fernandez (2000). A second instance of g of interest is the p-norm costing function (Francis et al. 2000) studied by Shier and Dearing (1983) and Zemel (1985). We consider these two functions after stating the following property.

Basic SAND scaling property Suppose g is SAND with domain R_+^n, and B is any n × n matrix with all nonnegative entries. Then the function g_B(U) ≡ g(BU) with domain R_+^n is SAND.

The property gives an easy way to build weights into a SAND costing function. For example, suppose B = (b_ij) is a diagonal weight matrix, defined as a diagonal matrix with b_jj = w_j for all j ∈ N. The p-norm function g(U) = ((u_1)^p + · · · + (u_n)^p)^{1/p} is known to be SAND for p ≥ 1. Therefore g_B(U) = ((w_1 u_1)^p + · · · + (w_n u_n)^p)^{1/p} is also SAND. Hence the p-norm location model g_B(D(X, A)) is a SAND location model.

Now consider the convex ordered median model. Given any U ∈ R_+^n, denote its components in decreasing order by u_[1] ≥ u_[2] ≥ · · · ≥ u_[n]. Let λ_1 ≥ · · · ≥ λ_n ≥ 0 be any sequence of real constants. The function g(U) = Σ_{j=1,...,n} λ_j u_[j] is known to be a SAND function with domain R_+^n. Suppose again B is a diagonal weight matrix, and we modify g to obtain g_B. Then g_B(D(X : A)) is a SAND location model. If λ_1 = 1 and λ_j = 0 otherwise, the result is the weighted PCM. If λ_j = 1 for all j ∈ N, the result is the weighted PMM. If λ_1 = 1 and λ_j = α otherwise for some α with 0 < α < 1, we get the p-centdian model. The case λ_1 = · · · = λ_k = 1 and λ_j = 0 otherwise gives the k-centrum model.
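To make the unification concrete, here is a small Python sketch of our own (the function names and data are illustrative, not from the paper) evaluating the weighted ordered median costing function for the λ choices just listed:

```python
# Sketch of the convex ordered median costing function g(U) = sum_j lambda_j u_[j]
# (components sorted so u_[1] >= ... >= u_[n]), with weights built in via a
# diagonal matrix B = diag(w) as in the basic SAND scaling property.
def ordered_median(lams, U):
    # lams should be nonincreasing and nonnegative for g to be SAND
    u_sorted = sorted(U, reverse=True)
    return sum(l * u for l, u in zip(lams, u_sorted))

def g_B(lams, w, U):                 # gB(U) = g(BU), B = diag(w)
    return ordered_median(lams, [wj * uj for wj, uj in zip(w, U)])

w = [2.0, 1.0, 1.0, 3.0]             # DP weights
U = [1.0, 4.0, 2.0, 0.5]             # closest distances D(X, a_j); w*U = [2, 4, 2, 1.5]
n = len(U)
print(g_B([1] + [0] * (n - 1), w, U))    # weighted PCM: max w_j u_j = 4.0
print(g_B([1] * n, w, U))                # weighted PMM: sum w_j u_j = 9.5
print(g_B([1, 1, 0, 0], w, U))           # 2-centrum: two largest terms = 6.0
print(g_B([1] + [0.5] * (n - 1), w, U))  # p-centdian with alpha = 0.5: 6.75
```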
Consider now a Lipschitz-like result for the SAND location function. Francis and Lowe (1992) prove the following result.

Closest-distance and min-max distance p-server triangle inequalities Let e be the n-vector of ones, and A the DP vector. With the definition of Δ(X, Y) from Sect. 2, for any two p-servers X and Y ⊂ S we have
(1) D(X, A) ≤ Δ(X, Y)e + D(Y, A),
(2) D(Y, A) ≤ Δ(Y, X)e + D(X, A).

Applying the basic SAND inequalities to (1) and (2), we obtain the following:

Lipschitz-like bound For any two p-servers X and Y ⊂ S we have |f(X : A) − f(Y : A)| ≤ g(Δ(X, Y)e).

If g(0) = 0 we note the bound is tight with X = Y. We remark that the bound also applies to the approximating model f(X : A′). Using the Lipschitz-like bound, we conclude that |f(X′ : A) − f(X∗ : A)| ≤ g(Δ(X′, X∗)e), thus obtaining a new relationship between absolute optimality error and location error. If an aggregation algorithm can compute X′ satisfying Δ(X′, X∗) ≤ r, then we conclude (since g is ND) that |f(X′ : A) − f(X∗ : A)| ≤ g(re).

For our purposes, an actual Lipschitz inequality is of the form |f(X : A) − f(Y : A)| ≤ LΔ(X, Y) for some positive Lipschitz constant L, and holds for all p-servers X and Y. Consider now some conditions on the function g for which we obtain this inequality or something quite similar.

Comment (a) If g is also convex, then (Rockafellar 1970, Thm. 4.7) it is known to be positively homogeneous of degree 1. This means g(Δ(X, Y)e) ≤ Δ(X, Y)g(e), which gives the Lipschitz condition with Lipschitz constant L = g(e). (b) If g is SAND and also positively homogeneous of degree t, t ≥ 1 (Rockafellar 1970, p. 135), then g(Δ(X, Y)e) ≤ (Δ(X, Y))^t g(e).

We next note that if g is SSAND then the function g(Δ(X, Y)e) can be interpreted as a distance.

Lipschitz bound distance property For any two p-servers X and Y ⊂ S, define β(X, Y) = g(Δ(X, Y)e). If g is SSAND, then β(X, Y) is a distance defined on the set of all pairs of p-servers.

Examples with β(X, Y) = LΔ(X, Y) for some Lipschitz constant L For each example, g is homogeneous of degree 1.

(1) g_B is the p-norm costing function with diagonal weight matrix B, for p ≥ 1: we have L = (w_1^p + · · · + w_n^p)^{1/p}.
(2) g_B is the convex ordered median costing function with diagonal weight matrix B, and with ordered nonnegative DP weights w_[1] ≥ w_[2] ≥ · · · ≥ w_[n]: we have L = λ_1 w_[1] + · · · + λ_n w_[n]. Note an appropriate choice of the λ's from {0, 1} gives either L = W or L = W_max for the PMM and PCM respectively.

Following Francis et al. (2004c), the Lipschitz results may be useful if we wish to aggregate some large set of feasible solutions into a smaller set. Let FS be some set of potential p-servers and denote by FSagg some aggregation of FS, possibly finite, with the property that FSagg ⊂ FS. We say that FSagg is r-close to FS if for each X ∈ FS there exists a Y_X in FSagg with Δ(X, Y_X) ≤ r. Suppose X∗ and Y∗ are global minimizers of f(X : A) over FS and FSagg respectively.

Table 6 Alternative relaxations and restrictions of both the original and aggregated covering location models assuming all δ_j < r_j: approach I is tighter than approach II

1. Definitions:
   α′_1, . . . , α′_m: the m distinct ADPs; δ_j ≡ d(a_j, a′_j), j ∈ N, with δ_j < r_j, j ∈ N.
   I: β_i ≡ min{r_j − δ_j : a′_j = α′_i}, γ_i ≡ min{r_j + δ_j : a′_j = α′_i}, i = 1, . . . , m.
   II: f(X) ≡ max{D(X, a_j)/r_j : j ∈ N}, f′(X) ≡ max{D(X, a′_j)/r_j : j ∈ N}, eb ≡ max{δ_j/r_j : j ∈ N}.
2. Original covering constraints:
   I: D(X, a_j) ≤ r_j, j ∈ N, all X.    II: f(X) ≤ 1, all X.
3. Aggregate constraints:
   I: D(X, a′_j) ≤ r_j, j ∈ N, all X.    II: f′(X) ≤ 1, all X.
4. Restrictions of both original and aggregate constraints:
   I: D(X, a′_j) ≤ r_j − δ_j, j ∈ N, all X ⇔ D(X, α′_i) ≤ β_i, i = 1, . . . , m, all X.
   II: f′(X) ≤ 1 − eb, all X ⇔ D(X, a′_j) ≤ r_j(1 − eb), all X, j ∈ N.
5. Relaxations of both original and aggregate constraints:
   I: D(X, a′_j) ≤ r_j + δ_j, j ∈ N, all X ⇔ D(X, α′_i) ≤ γ_i, i = 1, . . . , m, all X.
   II: f′(X) ≤ 1 + eb, all X ⇔ D(X, a′_j) ≤ r_j(1 + eb), all X, j ∈ N.
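As an illustration of the Table 6 quantities, the following Python sketch (our own toy numbers, not from the paper) computes the β_i, γ_i and eb values and checks the tightness comparison between approaches I and II discussed in Sect. 4.3 below.

```python
# Sketch of the Table 6 quantities for a toy CLM instance (approach I vs II).
r     = [5.0, 4.0, 6.0, 5.0]   # covering radii r_j
delta = [1.0, 0.5, 2.0, 1.0]   # ADP-DP distances delta_j (all < r_j)
adp   = [0, 0, 1, 1]           # index i of the distinct ADP alpha'_i for each DP j

beta, gamma = {}, {}
for j, i in enumerate(adp):
    beta[i] = min(beta.get(i, float("inf")), r[j] - delta[j])
    gamma[i] = min(gamma.get(i, float("inf")), r[j] + delta[j])
eb = max(d / rj for d, rj in zip(delta, r))

# Approach I is at least as tight as approach II, in both directions:
for j in range(len(r)):
    assert r[j] * (1 - eb) <= r[j] - delta[j]    # restrictions
    assert r[j] + delta[j] <= r[j] * (1 + eb)    # relaxations
print(beta, gamma, round(eb, 4))   # {0: 3.5, 1: 4.0} {0: 4.5, 1: 6.0} 0.3333
```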

It is direct to use the Lipschitz inequality, with the definitions of X∗ and Y∗, to conclude

f(Y∗ : A) − g(re) ≤ f(X∗ : A) ≤ f(Y∗ : A), equivalently, 1 − g(re)/f(Y∗ : A) ≤ f(X∗ : A)/f(Y∗ : A) ≤ 1.

For example, FS might consist of all p-servers contained in a network, while FSagg might be the set of all p-servers with each server a vertex. If 2r is the length of the longest arc in the network, then FSagg is r-close to FS and is contained in FS. Thus if g(re)/f(Y∗ : A) is small, then Y∗ may be an acceptable approximation to X∗.

4.3 Constraint aggregation

Aggregation can occur in constraints (Francis et al. 2004a, 2004b), instead of in the objective. For example, consider the CLM with n constraints whose aggregation was illustrated in Sect. 1. If only m of the ADPs in the aggregated CLM are distinct, then n − m of the aggregated CLM constraints are redundant, and may be deleted as in the example of Sect. 1 or as shown in Table 6.

Let us now develop a basic error bound idea for constraints. Generally, we have location constraints of the form f_j(X) ≤ r_j, j ∈ N, X ⊂ S. Suppose each function f_j(X) is replaced by some approximating function, say f′_j(X), giving the non-distinct constraints f′_j(X) ≤ r_j, j ∈ N, X ⊂ S, for the aggregated model. If we now define functions f(X) and f′(X) by f(X) ≡ max{(1/r_j)f_j(X) : j ∈ N} and f′(X) ≡ max{(1/r_j)f′_j(X) : j ∈ N}, then the constraints for the two models are equivalent to f(X) ≤ 1 and f′(X) ≤ 1 respectively.
Hence we can view f′(X) as an aggregated version of the function f(X), and apply whatever function error measures are of interest. It is known (Francis et al. 2004a), for example, that if f_j(X) and f′_j(X) have error bound b_j (= d(a_j, a′_j) for the CLM) for j ∈ N, then f(X) and f′(X) have the (unitless) error bound eb = max{b_j/r_j : j ∈ N}. For the CLM, the resulting error bound is identical in form to that for the PCM; hence aggregation methods providing small PCM error bounds can also provide small CLM error bounds, and vice-versa. Note also, since the constraints of the CLM may be written as f(X : A) ≡ max{(1/r_j)D(X, a_j) : j ∈ N} ≤ 1, that a Lipschitz bound applies to this function f.

When f(X) and f′(X) are any original and aggregated functions with some error bound eb, it follows directly that f′(X) ≤ 1 − eb ⇒ f(X) ≤ 1, and f(X) ≤ 1 ⇒ f′(X) ≤ 1 + eb. Thus the constraint f′(X) ≤ 1 − eb gives a restriction of the original constraint, while f′(X) ≤ 1 + eb gives a relaxation. Each can be easier to deal with than the original constraint, and may be used to compute lower and upper bounds on the optimal objective function value of the original model. Supposing eb ≪ 1 (which is clearly desirable), feasibility conclusions about one model thus allow us to draw feasibility or “near-feasibility” conclusions about the other model.

Following Francis et al. (2004c), Table 6 illustrates the use of error bounds as discussed to obtain relaxations and restrictions of the aggregated CLM that are also relaxations and restrictions of the original model. We can aggregate the individual constraints (approach I of Table 6), or deal with the combined constraint (approach II). There is a resulting interesting modeling insight: even though I and II give equivalent formulations of the original CLM, I gives at least as tight a relaxation and at least as tight a restriction as II. This is because we have r_j + δ_j ≤ r_j(1 + eb) and r_j(1 − eb) ≤ r_j − δ_j for all j ∈ N. This insight is very much in the same spirit as that of Cornuejols et al. (1977) for alternative formulations of the PMM as a mathematical program.

Francis et al. (2004c) used approach I of Table 6. They solved to optimality a CLM with almost 70,000 original CLM constraints by solving several aggregated CLMs, each with fewer than 1,000 covering constraints. Their computational experience was usually that the minimal objective function value of the original model was underestimated when solving the approximating model without enough ADPs, which is consistent with the discussion in Sect. 1. For a successful use of this aggregation approach to a CLM application see Dekle et al. (2005).

The error bound max{w_j d(a_j, a′_j) : j ∈ N} for the PCM and CLM, for some choice of the w_j including w_j = 1/r_j, is quite robust. It applies to an obnoxious facility location model (Francis et al. 2000; Erkut and Neuman 1989) and, when doubled, to a p-center hub location model (Gavriliouk 2003; Ernst et al. 2002a, 2002b). It also occurs in the ADP-DP distance bound claim and in a penalty function approach we next consider.

Emir-Farinas and Francis (2005) considered maximum violation error for the CLM, a special type of constraint penalty function. For constraint j, D(X, a_j) ≤ r_j, the quantity (D(X, a_j) − r_j)+ ≡ max{D(X, a_j) − r_j, 0} is the amount by which the constraint is violated, and can be scaled by dividing by r_j.
Let mve(X) ≡ max{(D(X, a_j) − r_j)+/r_j : j ∈ N} and mve′(X) ≡ max{(D(X, a′_j) − r_j)+/r_j : j ∈ N} denote the maximum (scaled) constraint violations for the original and aggregated CLMs. Each of the penalty functions mve(X) and mve′(X) is always nonnegative, and is zero if and only if X is feasible to the underlying problem. Ideally, we want each error measure to be zero if and only if the other is zero. More realistically, if one measure is zero we guarantee the other is small by having a small error bound on the quantity |mve′(X) − mve(X)|. We summarize this discussion in column 2 of Table 7. The result of Emir-Farinas and Francis appears in the last row of column 2 of Table 7.
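The following Python sketch (our own toy data, illustrative only) computes mve(X) and mve′(X) for a tiny CLM instance and checks the composite error bound from the last row of Table 7:

```python
# Sketch: maximum (scaled) violation error for original vs aggregated CLM
# constraints, and the composite bound ceb of Table 7 (toy data).
def rect(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def D(X, a):
    return min(rect(x, a) for x in X)

A  = [(-2, 1), (-2, 2), (-2, 3)]   # DPs
Ap = [(-2, 2)] * 3                 # one ADP standing in for all three
r  = [2.0, 2.0, 2.0]               # covering radii
X  = [(0, 2)]                      # a candidate server

mve  = max(max(D(X, a) - rj, 0.0) / rj for a, rj in zip(A, r))
mvep = max(max(D(X, ap) - rj, 0.0) / rj for ap, rj in zip(Ap, r))
ceb  = max(rect(a, ap) / rj for a, ap, rj in zip(A, Ap, r))
print(mve, mvep, ceb)              # 0.5 0.0 0.5
assert abs(mvep - mve) <= ceb + 1e-12
```

Note how the aggregated model reports X as feasible (mve′(X) = 0) while the original model does not; the bound ceb quantifies how badly such a conclusion can mislead.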

Table 7 Illustration of constraint penalty approach to covering constraints, and to general SAND constraints with any SSAND penalty costing function g

Item name | Covering example penalty analysis | General SAND constraints penalty analysis
Original constraint j | D(X, a_j) ≤ r_j, j ∈ N | c_j(X) ≤ r_j, j ∈ N
Constraint j in standard form | D(X, a_j)/r_j ≤ 1, j ∈ N | f_j(X) ≡ c_j(X)/r_j ≤ 1, j ∈ N
Aggregated constraint j, standard form | D(X, a′_j)/r_j ≤ 1, j ∈ N | f′_j(X) ≡ c′_j(X)/r_j ≤ 1, j ∈ N
Error bound inequality for constraint j | |D(X, a′_j)/r_j − D(X, a_j)/r_j| ≤ d(a_j, a′_j)/r_j, j ∈ N, all X | |f′_j(X) − f_j(X)| ≤ eb_j, j ∈ N, all X
Constraint error bound vector | EB ≡ (d(a_j, a′_j)/r_j) ∈ R_+^n | EB ≡ (eb_j) ∈ R_+^n
Infeasibility violation error/penalty, constraint j | ve_j(X) ≡ max{D(X, a_j)/r_j − 1, 0}, j ∈ N | pen(X, f_j) ≡ max{f_j(X) − 1, 0}, j ∈ N
Vector of constraint violation errors/penalties | VE(X) ≡ (ve_j(X)) ∈ R_+^n | P(X : F) ≡ (pen(X, f_j)) ∈ R_+^n
Infeasibility violation error/penalty, agg. constraint j | ve′_j(X) ≡ max{D(X, a′_j)/r_j − 1, 0}, j ∈ N | pen(X, f′_j) ≡ max{f′_j(X) − 1, 0}, j ∈ N
Vector of agg. constraint violation errors/penalties | VE′(X) ≡ (ve′_j(X)) ∈ R_+^n | P(X : F′) ≡ (pen(X, f′_j)) ∈ R_+^n
Maximum violation error/composite constraint penalty, original constraints | mve(X) ≡ max{ve_j(X) : j ∈ N} | π(X) ≡ g(P(X : F)), g is SSAND
Maximum violation error/composite constraint penalty, aggregate constraints | mve′(X) ≡ max{ve′_j(X) : j ∈ N} | π′(X) ≡ g(P(X : F′))
Composite error bound on absolute error of maximum violation error/composite constraint penalty | ceb ≡ max{d(a_j, a′_j)/r_j : j ∈ N} | ceb ≡ g(EB)
Error bound inequality | |mve′(X) − mve(X)| ≤ ceb, all X | |π′(X) − π(X)| ≤ ceb, all X
Emir-Farinas and Francis (2005) presented several algorithms for aggregating DPs for CLMs assuming rectilinear distances. They considered four different infeasibility measures, including maximum violation error and related average violation error. They found both theoretically and computationally that the choice of an infeasibility measure had a quite substantial effect on the best choice of an aggregation method. Their algorithms exploit CLM structure and allow the user to control the aggregation error by specifying a maximum
allowable error bound value as an input to the algorithm. Their computational experience indicated that the error bound was a good surrogate for the maximum absolute error.

Following Francis et al. (2004a), consider a generalization of the CLM and of the above penalty approach to constraint aggregation. Denote the original constrained location model as follows: (Pr_orig): min f_0(X) subject to F(X) ≤ e, X ⊂ S. Here e is a vector of ones, and each entry f_j(X) in F(X) ∈ R_+^n is SAND, with n ≡ |N|. The functions f_j(X) are distinct. The model is in standard form, in the sense that any constraint of the form c_j(X) ≤ r_j with c_j(X) a SAND function is rewritten equivalently as f_j(X) ≡ (1/r_j)c_j(X) ≤ 1. Column 3 of Table 7 illustrates how to extend the penalty function approach of Emir-Farinas and Francis to the more general context of any SAND constraints and any SSAND function g, including the max-function and the sum-function. Results in column 2 are a special case of those in column 3. The principal conclusion from Table 7 is in the bottom row of column 3. Note, if X is feasible to the aggregation of (Pr_orig), then |g(P(X : F)) − g(P(X : F′))| ≤ ceb becomes g(P(X : F)) ≤ ceb (since then P(X : F′) = 0 and g(0) = 0), thus giving a bound on how “infeasible” X can be to (Pr_orig). There is a direct generalization of this penalty approach if the SAND objective function f_0(X) is also replaced by some approximating SAND objective function f′_0(X) and each penalty function pen(X, f_j) is multiplied by some appropriate positive weighting constant k_j.

4.4 Big aggregation problems may not be like small ones

DP aggregation may be needed only when there are many DPs. While we may obtain insight by aggregating small models, as Table 5 illustrates, there is some evidence that big aggregation models may not be like small ones. For some large aggregation models we can obtain simple closed-form “square root” formulas for minimal aggregation error bound values, whereas this has not been the case for small models. This finding necessarily raises substantial questions about the validity of extrapolating from aggregation analysis of small models to analysis of big models.

Frieze (1980), Marchetti-Spaccamela and Talamo (1983) and Zemel (1985) considered the case when the n DPs are uniform and iid random variables distributed on some planar set T of area a. The set T must satisfy a (weak) boundary regularity condition. These authors studied the PMM with each weight 1/n and/or the unweighted PCM with Euclidean distances. Given some rather technical assumptions, they derived asymptotically accurate (for large values of p and n) “square root” approximation formulas of the form c√(a/p) for the model minimal objective function values. The formulas are based on tiling T with p regular hexagons. The constant c is given by c = 0.3772 and c = 0.6204 for the PMM and PCM respectively.

For the rectilinear distance unweighted PCM, Francis and Rayco (1996) derived an approximating minimal objective function value of c√(a/p) with c = 0.7071. Again the set T must satisfy a weak boundary regularity condition (different from Zemel’s). This value is asymptotically accurate as p and n increase, and is a valid upper bound on the actual minimal objective function value even for smaller p values. It does not require DPs to be modeled as uniform iid random variables, but substitutes instead the assumption that every point in T is a DP.
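The following Python sketch (ours; the constants are those quoted above) evaluates the square-root approximations and illustrates the diminishing-returns behavior they imply:

```python
# Sketch of the "square root" approximations c * sqrt(a / p) discussed above,
# using the constants reported in the text (illustrative values for a and p).
import math

def sqrt_formula(c, a, p):
    return c * math.sqrt(a / p)

a = 100.0   # area of the set T (a hypothetical value)
for c, label in [(0.3772, "Euclidean PMM"),
                 (0.6204, "Euclidean PCM"),
                 (0.7071, "rectilinear PCM")]:
    print(label, [round(sqrt_formula(c, a, p), 3) for p in (10, 40, 160)])
# Each quadrupling of p only halves the approximate value: diminishing returns.
```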
The authors observed, since the error bound for a PCM with m ADPs is an m-center model, that 0.7071√(a/m) gives an approximate value for the error bound. The formula is based on tiling T with m “diamonds” (squares rotated 45 degrees relative to the axes). Francis et al. (2004b) used the square root formulas for aggregation error bound analysis of a family of both PCMs and CLMs that decompose in a natural way into smaller “separate community” models. They also provided computational evidence, assuming rectilinear distance, that the formula is a good approximation to the error bound value for an
aggregation algorithm presented by Emir-Farinas and Francis (2005) for m ≥ 900 (and overestimates it for smaller m). It is critically important in using the square root formula to have an accurate estimate of the set T and its area a. For a DP set with about n = 70,000 from Palm Beach County, Florida, their best choice of the area a of the set T was about 31% of the land area of the entire county. Major portions of the county have no DPs, and must be excluded from consideration to obtain an accurate estimate. This exclusion can be viewed as beginning with a DP data set where the DPs are far from being uniformly distributed. Then the data set is reformulated to one where the uniform approximation is reasonable, thus improving the accuracy of the approximating model. If no such exclusion is made, then what one obtains is likely to be an overestimate of the error bound.

How many ADPs are enough? The square root formulas imply that the foregoing question is meaningless, since we must know not only m but also the area a in order to use the formulas. Even if we know both a and m, the aggregation tradeoffs discussed in Sect. 2 should be carefully considered in order to provide a reasoned response to the question. If instead we can answer the question “How much error is acceptable?”, then the square root formulas can answer the original question. Given a and an acceptable error (bound) value e_a, the inequality e_a ≥ c√(a/m) gives m ≥ ⌈(c/e_a)² a⌉, where ⌈Z⌉ is the smallest integer greater than or equal to Z. (For instance, with the illustrative values c = 0.7071, a = 100 and e_a = 0.5, we need m ≥ ⌈(0.7071/0.5)² · 100⌉ = 200 ADPs.)

The square root formulas provide some theoretical basis for the law of diminishing returns for DP aggregation for a class of PCMs and CLMs: DP aggregation error decreases at a decreasing rate as the number of ADPs increases. While the formulas are only approximate, using them may be better than relying on professional judgment. Unfortunately the law of diminishing returns for error bounds as the number of ADPs m increases does not always hold. See Francis et al. (2002a) for details for the PCM and PMM.

Acknowledgements We are happy to thank Xiuli Chao for an insightful discussion of ABC error. We thank the operators of the EWGLA and SOLA web sites, Frank Plastria and Trevor Hale respectively, for distributing our reference queries, as well as the colleagues who responded to these queries. In particular, we thank Emilio Carrizosa, Sanjay Melkote, Stefan Nickel and Frank Plastria for their help with references involving the Lipschitz condition. We thank a referee for careful editing. A portion of this material was given in invited presentations at the Seville ISOLDE X meeting in June, 2005.

Appendix: Lipschitz conditions, SAND, and convexity

We need some general definitions. Let g be a real function defined on some nonempty set S ⊂ R^n. We say that g is positively homogeneous (homogeneous of degree 1) if for any U ∈ S and nonnegative r such that rU ∈ S, g(rU) = rg(U); g is subhomogeneous if for any U ∈ S and nonnegative r such that rU ∈ S, g(rU) ≤ rg(U); g is quasi-homogeneous if for any U ∈ S and real r ≥ 1 such that rU ∈ S, g(rU) ≤ rg(U); g is Lipschitzian relative to S if there exists a finite constant L such that |g(U) − g(V)| ≤ L‖U − V‖ for any U, V ∈ S. (‖U − V‖ is the Euclidean norm.) In this paper we apply the above only in the case when S = R_+^n. The following is a basic result directly derived from the Mean Value Theorem.

Theorem 1 Suppose that S is compact and g is differentiable on S. If the gradient of g is uniformly bounded on S, then g is Lipschitzian.

Consider the location model f(X : A) = g(D(X, A)), where g satisfies the conditions of Theorem 1 over the compact set S = {(u_1, . . . , u_n) : 0 ≤ u_i ≤ D′}, where D′ is the diameter
of the network, or, in the planar case, the diameter of the convex hull of the DPs. We now obtain the following Lipschitzian condition for the function f(X : A):

|f(X : A) − f(Y : A)| = |g(D(X, A)) − g(D(Y, A))| ≤ L‖D(X, A) − D(Y, A)‖ ≤ L‖Δ(X, Y)e‖ = √n L Δ(X, Y).

The two inequalities above are due respectively to Theorem 1 and to the fact that the closest-distance p-server min-max distance triangle inequalities give |D(X, a_j) − D(Y, a_j)| ≤ Δ(X, Y) for j ∈ N and any two p-servers X and Y. (Note that if the DPs belong to some metric space T∗, then f(X : A) is a mapping from the Cartesian space T∗ × T∗ × · · · × T∗ to the real line.)

The above Lipschitzian result is quite general and does not depend on any subadditivity or homogeneity properties of the costing function g. However, when we consider general SAND costing functions, differentiability is no longer guaranteed. For example, the center costing objective function g(U) = max{u_1, . . . , u_n} is not differentiable. Nevertheless, this convex function is Lipschitzian with L = 1 on R_+^n. More generally, Theorems 10.4 and 24.7 in Rockafellar (1970) ensure that if g is a finite and convex function and S is a compact set in the relative interior of the domain of g, then g is Lipschitzian. In particular, the convex ordered median and the p-norm costing functions are Lipschitzian. Note that these functions are positively homogeneous, and by Theorem 1.4.7 of Rosenbaum (1950) and Theorem 4.7 of Rockafellar (1970), every homogeneous subadditive function is convex. (In fact, a necessary and sufficient condition that a finite convex function is subadditive is that it is quasi-homogeneous; Theorem 1.4.6 in Rosenbaum 1950.)

Focusing on Lipschitz conditions of SAND functions, the remaining question is whether all SAND functions are Lipschitzian on compact domains. The answer is no. Let g be defined on the real line by g(u) = 1 if u < 1, and g(u) = 2 if u ≥ 1. Then g is SAND but it is not Lipschitzian since it is not continuous at u = 1. (If we omit the requirement that g is ND and assume that it is only SA, such a function may even be everywhere discontinuous, as illustrated by the following Example 1.1.8 from Rosenbaum 1950: let g be defined on the real line by g(u) = 0 if u is rational, and g(u) = 1 otherwise.)

Finally, we note that the following set of conditions ensures a Lipschitz property on the location model f(X : A) = g(D(X, A)), without assuming explicitly that g is convex or differentiable. Suppose that g is SAND. As in Sect. 4.2 (Lipschitz-like bound), we obtain |f(X : A) − f(Y : A)| ≤ g(Δ(X, Y)e), where e = (1, . . . , 1) ∈ R_+^n. If we further assume that the SAND function g is subhomogeneous of degree 1 along the direction e = (1, . . . , 1) (i.e., for any nonnegative r such that re ∈ S, g(re) ≤ rg(e)), we get the Lipschitzian inequality |f(X : A) − f(Y : A)| ≤ Δ(X, Y)g(e).

References

Agarwal, P. K., & Procopiuc, C. M. (2002). Exact and approximation algorithms for clustering. Algorithmica, 33, 201–226.
Agarwal, P. K., & Varadarajan, K. R. (1999). Approximation algorithms for bipartite and nonbipartite matchings in the plane. In 10th ACM-SIAM symposium on discrete algorithms (SODA) (pp. 805–814).
Agarwal, P. K., Efrat, A., & Sharir, M. (1999). Vertical decomposition of shallow levels in 3-dimensional arrangements and its applications. SIAM Journal on Computing, 29, 912–953.
Agarwal, P. K., Procopiuc, C. M., & Varadarajan, K. R. (2002). Approximation algorithms for k-line center. In Proceedings 10th annual European symposium on algorithms (ESA 2002) (pp. 54–63).

Agarwal, P. K., Har-Peled, S., & Varadarajan, K. R. (2005). Geometric approximation via coresets. In J. E. Goodman, J. Pach, & E. Welzl (Eds.), Combinatorial and computational geometry. New York: Cambridge University Press.
Ahuja, R. K., Magnanti, T. L., & Orlin, J. B. (1993). Network flows: theory, algorithms, and applications. Englewood Cliffs: Prentice–Hall. (Exercise 12.23 on page 505 describes an O(n^2.5 log n) algorithm for the bottleneck assignment problem.)
Andersson, G., Francis, R. L., Normark, T., & Rayco, M. B. (1998). Aggregation method experimentation for large-scale network location problems. Location Science, 6, 25–39.
Bach, L. (1981). The problem of aggregation and distance for analysis of accessibility and access opportunity in location-allocation models. Environment and Planning A, 13, 955–978.
Ballou, R. H. (1994). Measuring transport costing error in customer aggregation for facility location. Transportation Journal, 33, 49–54.
Bender, T., Hennes, H., Kalcsics, J., Melo, T., & Nickel, S. (2001). Location software and interface with GIS and supply chain management. In Z. Drezner & H. Hamacher (Eds.), Facility location: applications and theory. Berlin: Springer.
Bowerman, R. L., Calamai, P. H., & Hall, B. (1999). The demand partitioning method for reducing aggregation errors in p-median problems. Computers and Operations Research, 26, 1097–1111.
Carrizosa, E., Hamacher, H. W., Nickel, S., & Klein, R. (2000). Solving nonconvex planar location problems by finite dominating sets. Journal of Global Optimization, 18, 195–210.
Casillas, P. A. (1987). Data aggregation and the p-median problem in continuous space. In A. Ghosh & G. Rushton (Eds.), Spatial analysis and location-allocation models (pp. 227–244). New York: Van Nostrand Reinhold Publishers.
Chelst, K. R., Schultz, J. P., & Sanghvi, N. (1988). Issues and decision aids for designing branch networks. Journal of Retail Banking, X(2), 5–17.
Cooper, L. (1967). Solutions of generalized location equilibrium models. Journal of Regional Science, 1, 334–336.
Cornuejols, G., Fisher, M. L., & Nemhauser, G. L. (1977). Location of bank accounts to optimize float: an analytical study of exact and approximate algorithms. Management Science, 23, 789–810.
Current, J. R., & Schilling, D. A. (1987). Elimination of source A and B errors in p-median problems. Geographical Analysis, 19, 95–110.
Current, J. R., & Schilling, D. A. (1990). Analysis of errors due to demand data aggregation in the set covering and maximal covering location problems. Geographical Analysis, 22, 116–126.
Daskin, M. S. (1995). Network and discrete location: models, algorithms, and applications. New York: Wiley.
Daskin, M. S., Haghani, A. E., Khanal, M., & Malandraki, C. (1989). Aggregation effects in maximum covering models. Annals of Operations Research, 18, 115–139.
Dekle, J., Lavieri, M., Martin, E., Emir-Farinas, H., & Francis, R. L. (2005). A Florida county locates disaster recovery centers. Interfaces, 35(2), 133–139.
Domich, P. D., Hoffman, K. L., Jackson, R. H. F., & McClain, M. A. (1991). Locating tax facilities: a graphics-based microcomputer optimization model. Management Science, 37, 960–979.
Drezner, Z. (Ed.). (1995a). Facility location: a survey of applications and methods. Berlin: Springer.
Drezner, Z. (1995b). Replacing discrete demand with continuous demand. In Z. Drezner (Ed.), Facility location: a survey of applications and methods. Berlin: Springer.
Drezner, T., & Drezner, Z. (1997). Replacing discrete demand with continuous demand in a competitive facility location problem. Naval Research Logistics, 44, 81–95.
Drezner, Z., & Hamacher, H. W. (Eds.). (2002). Facility location: theory and algorithms. Berlin: Springer.
Efrat, A., Itai, A., & Katz, M. J. (2001). Geometry helps in bottleneck matching and related problems. Algorithmica, 31, 1–28.
Emir-Farinas, H. (2002). Aggregation of demand points for the planar covering location problem. Ph.D. Dissertation, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL.
Emir-Farinas, H., & Francis, R. L. (2005). Demand point aggregation for planar covering location models. Annals of Operations Research, 136, 175–192.
Erkut, E., & Bozkaya, B. (1999). Analysis of aggregation errors for the p-median problem. Computers and Operations Research, 26, 1075–1096.
Erkut, E., & Neuman, S. (1989). Analytical models for locating undesirable facilities. European Journal of Operational Research, 40, 275–291.
Ernst, A., Hamacher, H. W., Jiang, H. W., Krishnamorthy, M., & Woeginger, G. (2002a). Uncapacitated single and multiple allocation p-hub center problems. Report CSIRO, Melbourne, Australia.
Ernst, A., Hamacher, H. W., Jiang, H. W., Krishnamorthy, M., & Woeginger, G. (2002b). Heuristic algorithms for the uncapacitated hub center single allocation problem. Report CSIRO, Melbourne, Australia.
Fortney, J., Rost, K., & Warren, J. (2000). Comparing alternative methods of measuring geographic access to health services. Health Services and Outcomes Research Methodology, 1, 173–184.
Fotheringham, A. S., Densham, P., & Curtis, A. (1995). The zone definition problem in location-allocation modeling. Geographical Analysis, 27, 60–77.
Francis, R. L., & Lowe, T. J. (1992). On worst-case aggregation analysis for network location problems. Annals of Operations Research, 40, 229–246.
Francis, R. L., & Rayco, M. B. (1996). Asymptotically optimal aggregation for some unweighted p-center problems with rectilinear distances. Studies in Locational Analysis, 10, 25–36.
Francis, R. L., & White, J. A. (1974). Facility layout and location: an analytical approach. Englewood Cliffs: Prentice–Hall (Homework problem 7.25, p. 324).
Francis, R. L., McGinnis, L. F., & White, J. A. (1992). Facility layout and location: an analytical approach (2nd ed.). Englewood Cliffs: Prentice–Hall.
Francis, R. L., Lowe, T. J., & Rayco, M. B. (1996). Row-column aggregation for rectilinear p-median problems. Transportation Science, 30, 160–174.
Francis, R. L., Lowe, T. J., Rushton, G., & Rayco, M. B. (1999). A synthesis of aggregation methods for multi-facility location problems: strategies for containing error. Geographical Analysis, 31, 67–87.
Francis, R. L., Lowe, T. J., & Tamir, A. (2000). On aggregation error bounds for a class of location models. Operations Research, 48, 294–307.
Francis, R. L., Lowe, T. J., & Tamir, A. (2002a). Worst-case incremental analysis for a class of p-facility location problems. Networks, 39, 139–143.
Francis, R. L., Lowe, T. J., & Tamir, A. (2002b). Demand point aggregation of location models. In Z. Drezner & H. Hamacher (Eds.), Facility location: applications and theory. Berlin: Springer.
Francis, R. L., Lowe, T. J., Rayco, M. B., & Tamir, A. (2003). Exploiting self-canceling demand point aggregation error for some planar rectilinear median problems. Naval Research Logistics, 50, 614–637.
Francis, R. L., Lowe, T. J., & Tamir, A. (2004a). Demand point aggregation analysis for a class of constrained location models: a penalty function approach. IIE Transactions, 36, 601–609.
Francis, R. L., Lowe, T. J., Tamir, A., & Emir-Farinas, H. (2004b). Aggregation decomposition and aggregation guidelines for a class of minimax and covering location models. Geographical Analysis, 36, 332–349.
Francis, R. L., Lowe, T. J., Tamir, A., & Emir-Farinas, H. (2004c). A framework for demand point and solution space aggregation analysis for location models. European Journal of Operational Research, 159, 574–585.
Frieze, A. M. (1980). Probabilistic analysis of some Euclidean clustering problems. Discrete Applied Mathematics, 10, 295–309.
Gavriliouk, E. O. (2003). Aggregation in hub location models. M. Sc. Thesis, Department of Mathematics, Clemson University, Clemson, SC.
Geoffrion, A. (1977). Objective function approximations in mathematical programming. Mathematical Programming, 13, 23–37.
Goldberg, R. (1976). Methods of real analysis (2nd ed.). New York: Wiley.
Goodchild, M. F. (1979). The aggregation problem in location-allocation. Geographical Analysis, 11, 240–255.
Hakimi, S. L. (1965). Optimum location of switching centers and the absolute centers and medians of a graph. Operations Research, 12, 450–459.
Hakimi, S. L., Labbé, M., & Schmeichel, E. (1992). The Voronoi partitioning of a network and its implications in network location theory. ORSA Journal on Computing, 4, 412–417.
Hale, T., & Hale, L. (2000). An aggregation technique for location problems with one-dimensional forbidden regions. International Journal of Industrial Engineering, 7, 133–139.
Hale, T., & Moberg, C. (2003). Location science research: a review. Annals of Operations Research, 123, 21–35.
Hale, T., Wysk, R., & Smith, D. (2000). A gross aggregation technique for the p-median location problem. International Journal of Operations and Quantitative Management, 6, 129–135.
Handler, G. Y., & Mirchandani, P. B. (1979). Location on networks: theory and algorithms. Cambridge: MIT Press.
Hansen, P., Jaumard, B., & Tuy, H. (1995). Global optimization in location. In Z. Drezner (Ed.), Facility location: a survey of applications and methods. Berlin: Springer.
Har-Peled, S. (2004a). Clustering motion. Discrete & Computational Geometry, 31, 545–565.
Har-Peled, S. (2004b). No coreset, no cry. In Proceedings 24-th conference on foundations of software technology and theoretical computer science, pp. 324–335.
Har-Peled, S., & Kushal, A. (2007). Smaller coresets for k-median and k-means clustering. Discrete & Computational Geometry, 37, 3–19.
Har-Peled, S., & Mazumdar, S. (2004). Coresets for k-means and k-median clustering and their applications. In Proceedings 36-th annual ACM symposium on theory of computing, pp. 291–300.
Hassin, R., & Tamir, A. (1991). Improved complexity bounds for location problems on the real line. Operations Research Letters, 10, 395–402.
Hillsman, E. L., & Rhoda, R. (1978). Errors in measuring distances from populations to service centers. Annals of Regional Science, 12, 74–88.
Hodgson, M. J. (2002). Data surrogation error in p-median models. Annals of Operations Research, 110, 153–165.
Hodgson, M. J., & Hewko, J. (2003). Aggregation and surrogation error in the p-median model. Annals of Operations Research, 123, 53–66.
Hodgson, M. J., & Neuman, S. (1993). A GIS approach to eliminating source C aggregation error in p-median models. Location Science, 1, 155–170.
Hodgson, M. J., Shmulevitz, F., & Körkel, M. (1997). Aggregation error effects on the discrete-space p-median model: the case of Edmonton, Canada. The Canadian Geographer, 41, 415–428.
Hooker, J. N., Garfinkel, R. S., & Chen, C. K. (1991). Finite dominating sets for network location problems. Operations Research, 39, 100–118.
Kariv, O., & Hakimi, S. L. (1979). An algorithmic approach to network location problems: part 1, the p-centers; part 2, the p-medians. SIAM Journal on Applied Mathematics, 37, 513–560.
Kolen, A., & Tamir, A. (1990). Covering problems. In P. B. Mirchandani & R. L. Francis (Eds.), Discrete location theory (pp. 263–304). New York: Wiley–Interscience.
Love, R., Morris, J., & Wesolowsky, G. (1988). Facility location: models and methods. Amsterdam: North-Holland.
Marchetti-Spaccamela, A., & Talamo, M. (1983). Probabilistic analysis of two Euclidean location problems. R.A.I.R.O. Informatique Théorique/Theoretical Informatics, 17, 387–395.
Megiddo, N., & Supowit, K. J. (1984). On the complexity of some common geometric location problems. SIAM Journal on Computing, 13, 182–196.
Mirchandani, P. B., & Francis, R. L. (Eds.). (1990). Discrete location theory. New York: Wiley–Interscience.
Mirchandani, P. B., & Reilly, J. M. (1986). Spatial nodes in discrete location problems. Annals of Operations Research, 10, 329–350.
Murray, A. T., & Gottsegen, J. M. (1997). The influence of data aggregation on the stability of p-median location model solutions. Geographical Analysis, 29, 200–213.
Nickel, S., & Puerto, J. (1999). A unified approach to network location problems. Networks, 34, 283–290.
Nickel, S., & Puerto, J. (2005). Location theory: a unified approach. Berlin: Springer.
Ohsawa, Y., Koshizuka, T., & Kurita, O. (1991). Errors caused by rounded data in two simple facility location problems. Geographical Analysis, 23, 56–73.
Pardalos, P. M., & Resende, M. (Eds.). (2002). Handbook of applied optimization. Oxford: Oxford University Press.
Plastria, F. (1992). GBSSS: the generalized big square small square method for planar single-facility location. European Journal of Operational Research, 62, 163–174.
Plastria, F. (2000). New error bounds in continuous minisum location for aggregation at the gravity centre. Studies in Locational Analysis, 14, 101–119.
Plastria, F. (2001). On the choice of aggregation points for continuous p-median problems: a case for the gravity center. TOP, 9, 217–242.
Puerto, J., & Fernandez, F. R. (2000). Geometrical properties of the symmetrical single facility location problem. Journal of Nonlinear and Convex Analysis, 1, 321–342.
Rayco, M. B. (1996). Algorithmic approaches to demand point aggregation for location models. Ph. D. Dissertation, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL.
Rayco, M. B., Francis, R. L., & Lowe, T. J. (1997). Error-bound driven demand point aggregation for the rectilinear distance p-center model. Location Science, 4, 213–235.
Rayco, M. B., Francis, R. L., & Tamir, A. (1999). A p-center grid-positioning aggregation procedure. Computers and Operations Research, 26, 1113–1124.
Reeves, C. (1993). Modern heuristic techniques for combinatorial problems. Oxford: Blackwell Scientific Press.
Resende, M. G. C., & de Sousa, J. P. (Eds.). (2004). Metaheuristics: computer decision-making. Boston: Kluwer Academic.
Rockafellar, R. T. (1970). Convex analysis. Princeton: Princeton University Press.
Rodriguez-Bachiller, A. (1983). Errors in the measurement of spatial distances between discrete regions. Environment and Planning A, 15, 781–799.
Rodriguez-Chia, A. M., Nickel, S., Puerto, J., & Fernandez, F. R. (2000). A flexible approach to location problems. Mathematical Methods of Operations Research, 51, 69–89.
Rogers, D. F., Plante, R. D., Wong, R. T., & Evans, J. R. (1991). Aggregation and disaggregation techniques and methodology in optimization. Operations Research, 39, 553–582.
Romero-Morales, D., Carrizosa, E., & Conde, E. (1997). Semi-obnoxious location models: a global optimization approach. European Journal of Operational Research, 102, 295–301.
Rosenbaum, R. A. (1950). Sub-additive functions. Duke Mathematical Journal, 17, 227–247.
Rushton, G. (1989). Applications of location models. Annals of Operations Research, 18, 25–42.
Sheffi, Y. (1985). Urban transportation networks: equilibrium analysis with mathematical programming models (pp. 14–16). Englewood Cliffs: Prentice–Hall.
Shier, D. R., & Dearing, P. M. (1983). Optimal locations for a class of nonlinear, single-facility location problems on a network. Operations Research, 31, 292–303.
Varadarajan, K. R. (1998). A divide and conquer algorithm for min-cost perfect matching in the plane. In Proceedings 38-th annual IEEE symposium on foundations of computer science, pp. 320–331.
Webber, M. J. (1980). A theoretical analysis of aggregation in spatial interaction models. Geographical Analysis, 12, 129–141.
Zemel, E. (1985). Probabilistic analysis of geometric location problems. SIAM Journal on Algebraic and Discrete Methods, 6, 189–200.
Zhao, P. (1996). Analysis of aggregation effects in location problems. Ph. D. Dissertation, Department of Industrial Engineering, University at Buffalo (SUNY), Buffalo, NY.
Zhao, P., & Batta, R. (1999). Analysis of centroid aggregation for the Euclidean distance p-median problem. European Journal of Operational Research, 113, 147–168.
Zhao, P., & Batta, R. (2000). An aggregation approach to solving the network p-median problem with link demands. Networks, 36, 233–241.