Geometric Pattern Matching in d-Dimensional Space*
L. Paul Chew(1), Dorit Dor(2), Alon Efrat(2), Klara Kedem(3)
(1) Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
(2) School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel
(3) Department of Math and CS, Ben Gurion University, Beer-Sheva 84105, Israel
Abstract. We show that, using the $L_\infty$ metric, the minimum Hausdorff distance under translation between two point sets of cardinality $n$ in $d$-dimensional space can be computed in time $O(n^{(4d-2)/3} \log^2 n)$ for $d > 3$. Thus we improve the previous time bound of $O(n^{2d-2} \log^2 n)$ due to Chew and Kedem. For $d = 3$ we obtain a better result of $O(n^3 \log^2 n)$ time by exploiting the fact that the union of $n$ axis-parallel unit cubes can be decomposed into $O(n)$ disjoint axis-parallel boxes. We prove that the number of different translations that achieve the minimum Hausdorff distance in $d$-space is $\Theta(n^{\lfloor 3d/2 \rfloor})$. Furthermore, we present an algorithm which computes the minimum Hausdorff distance under the $L_2$ metric in $d$-space in time $O(n^{\lceil 3d/2 \rceil + 1} \log^3 n)$.
* The work of the first and fourth authors was partly supported by AFOSR Grant AFOSR91-0328. The first author was also supported by ONR Grant N00014-89-J-1946, by ARPA under ONR contract N00014-88-K-0591, and by the Cornell Theory Center which receives funding from its Corporate Research Institute, NSF, New York State, ARPA, NIH, and IBM Corporation.

1 Introduction
We consider the problem of finding the resemblance, under translation, of two point sets in $d$-dimensional space for $d \ge 3$. In many matching applications, objects are described by $d$ parameters; thus a single object corresponds to a point in $d$-dimensional space. One would like the ability to determine whether two sets of such objects resemble each other. A 3D example comes from molecular matching, where a molecule can be described by its atoms, represented as points in 3-space.

The tool that we suggest here for measuring resemblance is the well-researched minimum Hausdorff distance under translation. The distance function we use is the $L_\infty$ metric. One advantage of using the Hausdorff distance is that it does not assume equal cardinality of the point sets. It measures the maximal mismatch between the sets when one point set is allowed to translate in order to minimize this mismatch. Two point sets are considered to be similar if this mismatch is small. We assume that the cardinalities of the sets are $n$ and $m = O(n)$ and we express our results in terms of $n$.

There have been several papers on the subject of point set resemblance using the minimum Hausdorff distance under translation. Huttenlocher et al. [11, 12] find the minimum Hausdorff distance for point sets in the plane in time $O(n^3 \log n)$ under the $L_1$, $L_2$, or $L_\infty$ metrics. For point sets in 3-dimensional space their algorithm, using the $L_2$ metric, runs in time $O(n^{5+\varepsilon})$. The method used in [12] cannot be extended to work under $L_\infty$. Chew and Kedem [7] show that, when using the $L_\infty$ metric in the plane, the minimum Hausdorff distance can be computed in time $O(n^2 \log^2 n)$. This is a somewhat surprising result, since there can be $\Omega(n^3)$ different translations that achieve the (same) minimum [7, 18]. They [7] further extend their technique to compute the minimum Hausdorff distance between two point sets in $d$-dimensional space, using the $L_\infty$ metric, achieving a time bound of $O(n^{2d-2} \log^2 n)$ for a fixed dimension $d$.

We show in this paper that, using the $L_\infty$ metric, the minimum Hausdorff distance between two point sets can be found in time $O(n^3 \log^2 n)$ for $d = 3$, and in time $O(n^{(4d-2)/3} \log^2 n)$ for any fixed $d > 3$. To estimate the quality of the time complexity of our algorithms, it is natural to seek the number of different translations that achieve the minimum Hausdorff distance between two point sets in $d$-dimensional space. More precisely, we seek the number of connected components of feasible translations in the $d$-dimensional translation space. We show that this number is $\Theta(n^{\lfloor 3d/2 \rfloor})$ in the worst case. Note that, as for the planar case solved in [7], the runtime of the algorithms which we present for a fixed $d \ge 3$ is significantly lower than the number of connected components in the $d$-dimensional translation space.

Our algorithms can be easily extended within the same time bounds to tackle the problem of pattern matching in the presence of spurious points, sometimes called outliers. In this problem we seek a small subset $X \subseteq A \cup B$ of points (whose existence is a result of noise) containing at most a pre-determined number $k$ of points, such that $A \setminus X$ can be optimally matched, under translation, to $B \setminus X$.
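To make the comparison of bounds concrete, the following sketch tabulates the exponents of $n$ quoted above for small $d$, ignoring logarithmic factors. The function name `exponents` and the dictionary keys are illustrative choices, not notation from the paper.

```python
from fractions import Fraction


def exponents(d):
    """Exponents of n (log factors ignored) for a fixed dimension d >= 3."""
    if d < 3:
        raise ValueError("the bounds quoted here are stated for d >= 3")
    # Previous bound O(n^(2d-2) log^2 n) of Chew and Kedem [7].
    previous = Fraction(2 * d - 2)
    # Bound of this paper: n^3 for d = 3, n^((4d-2)/3) for d > 3.
    new = Fraction(3) if d == 3 else Fraction(4 * d - 2, 3)
    # Worst-case number of connected components: n^floor(3d/2).
    components = Fraction((3 * d) // 2)
    return {"previous": previous, "new": new, "components": components}


if __name__ == "__main__":
    for d in range(3, 7):
        e = exponents(d)
        print(d, e["previous"], e["new"], e["components"])
```

For every $d \ge 3$ the new exponent is below both the previous exponent and the exponent of the number of connected components, which is the sense in which the running time is lower than the number of components.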
Many optimization problems are solved parametrically by finding an oracle for a decision problem and then using this oracle in some parametric optimization scheme. In this paper we follow this line by developing an algorithm for the Hausdorff decision problem (see the definition in the next section) and then using it as an oracle in the Frederickson and Johnson [10] optimization scheme.

For the oracle in 3-space we prove that a set of $n$ unit cubes can be decomposed into $O(n)$ disjoint axis-parallel boxes. We then apply the orthogonal partition trees (OPTs) described by Overmars and Yap [16] to find the maximal depth of disjoint axis-parallel boxes. We show that this suffices to answer the Hausdorff decision problem in 3-space. For $d > 3$ there is a super-linear lower bound on the number of boxes obtained by disjoint decomposition of a union of boxes (see [4]); thus we cannot use a disjoint decomposition of unit cubes. Instead, we build a decision-problem oracle by developing and using a modified, nested version of the OPT.

When using $L_2$ as the underlying metric we show that there can be $\Omega(n^{\lfloor 3d/2 \rfloor})$ connected components in the translation space, and that the complexity of the space of feasible translations is $O(n^{\lceil 3d/2 \rceil})$. We present an algorithm which computes the minimum Hausdorff distance under the $L_2$ metric in $d$-space in time $O(n^{\lceil 3d/2 \rceil + 1 + \delta} \log^3 n)$ for any $\delta > 0$.

The paper is organized as follows: In Section 2 we define the minimum Hausdorff distance problem, and describe the Hausdorff distance decision problem. In Section 3 we show that the union of $n$ axis-parallel unit cubes in 3-space can be decomposed
into $O(n)$ disjoint axis-parallel boxes, and use the orthogonal partition trees of [16] to solve the Hausdorff distance decision problem in 3-space. For $d > 3$, our algorithm is more involved and hence its description is separated into two sections; Section 4 contains a relaxed version of our data structures and an oracle which runs in time $O(n^{3d/2-1} \log n)$. In Section 5 we modify the data structures of the relaxed version and obtain an $O(n^{(4d-2)/3} \log n)$ runtime oracle. In Section 6 we show briefly how we plug the decision algorithm into the Frederickson and Johnson optimization scheme. Bounds on the number of translations that minimize the Hausdorff distance are presented in Section 7. The algorithm for the minimum Hausdorff distance under the $L_2$ metric is discussed in Section 8. Open questions appear in Section 9.

Since most of the spatial objects we deal with in this paper are axis-parallel cubes, axis-parallel boxes and axis-parallel cells, we will omit from now on the words "axis-parallel" and talk about cubes, boxes and cells.
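The oracle-plus-search pattern described in the introduction can be sketched as follows. This is a minimal illustration that swaps in a plain binary search over an explicit sorted list of candidate values for the Frederickson and Johnson scheme; the names `smallest_feasible`, `candidates` and `oracle` are hypothetical, and the oracle is assumed to be monotone (once true for a value, true for all larger values), as a decision problem of the form "is the minimum at most $\varepsilon$?" is.

```python
def smallest_feasible(candidates, oracle):
    """Return the smallest candidate value accepted by a monotone oracle.

    `oracle(value)` must be False below some threshold and True from the
    threshold on. Returns None if no candidate is accepted.
    """
    values = sorted(set(candidates))
    lo, hi = 0, len(values)
    # Invariant: every index < lo is rejected, every index >= hi is accepted.
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(values[mid]):
            hi = mid
        else:
            lo = mid + 1
    return values[lo] if lo < len(values) else None
```

Each search step costs one oracle call, so the number of calls is logarithmic in the number of candidates; the sorted-matrix search used in the paper serves the same purpose without listing all candidates explicitly.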
2 The Hausdorff Distance Decision Problem
The well-known Hausdorff distance between two point sets $A$ and $B$ is defined as
$$H(A, B) = \max(h(A, B), h(B, A)),$$
where the one-way Hausdorff distance from $A$ to $B$ is
$$h(A, B) = \max_{a \in A} \min_{b \in B} \rho(a, b).$$
Here, $\rho(\cdot, \cdot)$ represents a familiar metric on points: for instance, the standard Euclidean metric (the $L_2$ metric) or the $L_1$ or $L_\infty$ metrics. In this paper, unless otherwise noted, we use the $L_\infty$ metric. In dimension $d$, an $L_\infty$ "sphere" (i.e., a set of points equidistant from a given center point) is a $d$-cube.

The minimum Hausdorff distance between two point sets is the Hausdorff distance minimized with respect to all possible translations of the point sets. Huttenlocher and Kedem [11] observe that the minimum Hausdorff distance is a metric on point sets (and more general shapes) that is independent of translation. Intuitively, it measures the maximum mismatch between two point sets after the sets have been translated to minimize this mismatch. For the minimum Hausdorff distance the sets do not have to have the same cardinality, although, to simplify our presentation, we assume that both point sets are of size $\Theta(n)$.

As in [7] we approach this optimization problem by using the Hausdorff decision problem with parameter $\varepsilon > 0$ as a subroutine for a search in a sorted matrix of $\varepsilon$ values. We define the Hausdorff decision problem for a given $\varepsilon$ to be the question of whether the minimum Hausdorff distance under translation is bounded by $\varepsilon$. We say that the Hausdorff decision problem for sets $A$ and $B$ and for $\varepsilon$ is true if there exists a translation $t$ such that the Hausdorff distance between $A$ and $B$ shifted by $t$ is less than or equal to $\varepsilon$.

We follow the approach taken in [7], solving the Hausdorff decision problem by solving a problem of the intersection of unions of cubes in the ($d$-dimensional) translation space. Let $A$ and $B$ be two sets of points as above, let $\varepsilon$ be a positive real value, and let $C_\varepsilon$ be a $d$-dimensional cube, with side size $2\varepsilon$ and with the
origin at its center (the $L_\infty$ "sphere" of radius $\varepsilon$). We define the set $A_\varepsilon$ to be $A \oplus C_\varepsilon$, where $\oplus$ represents the Minkowski sum. Consider the set $A_\varepsilon \oplus -b$, where $-b$ represents the reflection of the point $b$ through the origin. This set is the set of translations that map $b$ into $A_\varepsilon$. The set of translations that map all points $b \in B$ into $A_\varepsilon$ is then $\bigcap_{b \in B} (A_\varepsilon \oplus -b)$; we denote this set by $S(A, \varepsilon, B)$. It can be shown [7] that the Hausdorff decision problem for point sets $A$ and $B$ and for $\varepsilon$ is true iff $S(A, \varepsilon, B) \cap -S(B, \varepsilon, A) \neq \emptyset$. We restrict our attention to the problem of determining whether $S(A, \varepsilon, B)$ is empty; extending our method to determining whether the intersection of this set with $-S(B, \varepsilon, A)$ is empty is reasonably straightforward.

Another way to look at the Hausdorff decision problem is to assign a different color to each $b \in B$ and to paint all the cubes of $A_\varepsilon \oplus -b$, for a particular $b$, in its assigned color. Now we can look at $A_\varepsilon \oplus -b$ as a union of cubes of one color, which we call a layer. We thus have $n$ layers in $n$ different colors, one layer for each point $b \in B$. A point $p \in \mathbb{R}^d$ is covered by a color $i$ if there exists a cube in the layer of that color containing $p$ (i.e., if $p \in A_\varepsilon \oplus -b_i$). The color-depth of $p$ is the number of layers containing $p$. Our aim is thus to determine if there is a point $p$ of color-depth $n$.
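The definitions of this section can be checked on tiny inputs with an exhaustive sketch. It is not the algorithm of this paper: it simply tries every candidate translation, which takes time exponential in $d$. The candidate set is an assumption of the sketch, based on the observation that a non-empty intersection of unions of closed axis-parallel cubes contains a point each of whose coordinates is the lower-face coordinate $a_k - b_k - \varepsilon$ of some cube. All function names are illustrative.

```python
from itertools import product


def linf(p, q):
    """L-infinity distance between two points given as tuples."""
    return max(abs(x - y) for x, y in zip(p, q))


def one_way(A, B):
    """h(A, B) = max over a in A of min over b in B of the distance."""
    return max(min(linf(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(one_way(A, B), one_way(B, A))


def translate(B, t):
    return [tuple(x + s for x, s in zip(b, t)) for b in B]


def decision(A, B, eps, tol=1e-9):
    """Is there a translation t with H(A, B + t) <= eps? (brute force)

    Candidate coordinates along axis k are a_k - b_k - eps, the lower faces
    of the cubes of side 2*eps centered at a - b. `tol` absorbs rounding.
    """
    d = len(A[0])
    axes = [sorted({a[k] - b[k] - eps for a in A for b in B}) for k in range(d)]
    return any(hausdorff(A, translate(B, t)) <= eps + tol for t in product(*axes))
```

For example, with `A = [(0, 0), (4, 0)]` and `B = [(0, 0), (2, 0)]` the two points of `B` are 2 apart while those of `A` are 4 apart, so the best translation leaves a mismatch of 1: `decision(A, B, 1)` holds and `decision(A, B, 0.5)` does not.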
3 The Decision Problem in 3 Dimensions
Overmars and Yap [16] address the question of determining the volume of a union of $N$ $d$-boxes. Using a data structure they call an orthogonal partition tree, which we abbreviate as OPT, they achieve a runtime of $O(N^{d/2} \log N)$. They also observe that their data structure can be used to report other measures within the same time bound. One problem that can be easily solved using their data structure is the maximum coverage for a set $S$ of $N$ $d$-boxes. Defining the coverage of a point $p \in \mathbb{R}^d$ to be the number of (closed) $d$-boxes that contain $p$, the maximum coverage is $\max\{\mathrm{coverage}(p) \mid p \in \mathbb{R}^d\}$.

Maximum coverage is almost what we need for the Hausdorff decision problem, but instead we need the maximum color-depth. The difference is that maximum coverage counts the number of different boxes while we need to count the number of different colors (layers). These two concepts are the same if, for each color, all the boxes of that color are disjoint. To achieve this, we first decompose each layer into $O(n)$ disjoint boxes; then we apply the OPT method to compute the maximum coverage (which will now equal the maximum color-depth).

Theorem 1. The union of $n$ unit cubes in $\mathbb{R}^3$ can be decomposed, in time $O(n \log n)$, into $O(n)$ boxes whose interiors are disjoint.

Proof: For space limitations we just outline the proof. We slice the 3-dimensional space by planes perpendicular to the $z$-axis at $z = 0, 1, 2, \ldots$ (wlog all cubes have nonnegative coordinates). For each integer $i$, $n_i$ denotes the number of cubes intersected by the plane $z = i$, $n = \sum_i n_i$, and $E_i$ is the portion of the union of cubes that lies within the slab bounded by $z = i$ and $z = i + 1$. Since the complexity of the boundary of the union of $n$ unit 3-cubes is linear in the number of cubes (e.g., [4]), the complexity of the boundary of $E_i$ is $O(n_i + n_{i+1})$. To end the proof, we show that $E_i$ can be decomposed into $O(n_i + n_{i+1})$ boxes with disjoint interiors. □
Applying this theorem to each of the $n$ colors, we decompose each layer (recall that a layer is the union of all cubes of a single color) into $O(n)$ disjoint boxes, getting a total of $N = O(n^2)$ boxes, where boxes of the same color do not overlap. We can now apply the Overmars and Yap algorithm on these boxes, getting an answer to the Hausdorff decision problem in time $O(n^3 \log n)$. This gives us the following theorem.
Theorem 2. For point sets in 3-space the Hausdorff decision problem can be answered in time $O(n^3 \log n)$.
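The distinction between maximum coverage and maximum color-depth used in this section can be seen in a small brute-force sketch. It is not the OPT method: it evaluates both quantities at every grid point formed by lower box coordinates, which suffices for closed boxes because the lowest corner of a common intersection has such coordinates. Boxes are written as hypothetical `(color, (lo, hi))` pairs, and the disjoint example uses strictly disjoint same-color boxes so that shared faces play no role.

```python
from itertools import product


def contains(box, p):
    """Closed axis-parallel box given as (lo, hi) corner tuples."""
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))


def candidate_points(boxes):
    """Grid of lower-face coordinates; a deepest point can be found on it."""
    d = len(boxes[0][0])
    axes = [sorted({lo[k] for lo, _ in boxes}) for k in range(d)]
    return list(product(*axes))


def max_coverage(colored_boxes):
    """Largest number of boxes containing a common point."""
    boxes = [box for _, box in colored_boxes]
    return max(sum(contains(b, p) for b in boxes) for p in candidate_points(boxes))


def max_color_depth(colored_boxes):
    """Largest number of distinct colors containing a common point."""
    boxes = [box for _, box in colored_boxes]
    return max(
        len({c for c, b in colored_boxes if contains(b, p)})
        for p in candidate_points(boxes)
    )
```

With two overlapping boxes of one color and one box of another color meeting at a common point, the coverage is 3 but the color-depth is only 2; once the same-color boxes are disjoint the two values agree, which is what the decomposition of Theorem 1 provides.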
4 Higher Dimensions: the Relaxed Version
The decomposition method used for 3-space cannot be extended efficiently to work for $d > 3$, since, as Boissonnat et al. [4] have recently shown, the complexity of the union of $n$ $d$-dimensional unit cubes is $\Theta(n^{\lfloor d/2 \rfloor})$; thus a single layer (the union of cubes for a single color) cannot be decomposed into $O(n)$ disjoint boxes. Note that we cannot use the Overmars and Yap data structure and algorithm directly for the set of $n^2$ colored cubes because of the possible overlapping of cubes of the same color. Our method is therefore to augment the OPT [16], adding capabilities that efficiently handle the overlapping of same-color cubes.

We describe very briefly the OPT of Overmars and Yap [16]. To use the OPT for computing the measure of $N$ $d$-boxes, Overmars and Yap [16] treat this static $d$-dimensional problem as a dynamic $(d-1)$-dimensional problem: they sweep $d$-space using a $(d-1)$-dimensional hyperplane $h$. During this sweep, cubes enter $h$ (start intersecting $h$) and leave it afterwards. Each entering cube causes an insertion update to the OPT and each leaving cube causes a deletion update. Both insertions and deletions involve some computation concerning the required measure.

Let $Q = \{q_1, \ldots, q_N\}$ be a set of $N$ boxes contained in $(d-1)$-space. An OPT $T$ defined for $Q$ is a binary tree such that each node $\nu$ is associated with a box-like $(d-1)$-cell $C_\nu$ that contains some part of the $(d-1)$-space. For each node $\nu$, the cell $C_\nu$ is the disjoint union of the cells associated with its children, $C_{left_\nu}$ and $C_{right_\nu}$. The cell $C_{root}$ is the whole $(d-1)$-space. We say that a cell $C_\nu$ is an ancestor (resp. descendant) of a cell $C_{\nu'}$ if $\nu$ is an ancestor (resp. descendant) of $\nu'$ in $T$. Note that this is just the containment relation between cells. We say that a box $q$ is a pile with respect to a cell $C_\nu$ if (1) $C_\nu \not\subseteq q$ and (2) for at least $d - 2$ of the axes, the projection of $q$ on the axis contains the projection of $C_\nu$. If $C_\nu \subseteq q$ we say that $q$ covers $C_\nu$.
The tree structure of the OPT is fixed, though the data at the nodes change with the insertion and deletion of cubes throughout the algorithm. Some attributes of the OPT structure and the measure algorithm are ([16]):
(A1) Each cell $C_\nu$ stores those boxes of $Q$ that cover $C_\nu$, but do not cover the parent of $C_\nu$. (In this way, the OPT is an extension of the well-known segment tree.)
(A2) Every leaf-cell also stores all boxes partially covering it (as piles), and there are $O(\sqrt{N})$ such boxes.
(A3) Each box $q$ partially covers $O(N^{d/2-1})$ leaf-cells (as piles with respect to the corresponding leaf cells).
(A4) The height of the tree is $O(\log N)$.
(A5) The number of nodes in the tree is $O(N^{(d-1)/2})$.
(A6) The measure is computed in time $O(N^{d/2} \log N)$.
As mentioned above we would like to implement these trees for $N = O(n^2)$ colored cubes ($n$ cubes in each of the $n$ colors), and find whether there is a point covered by $n$ colors. Two problems arise due to the possible overlap of cubes of the same color: (i) cubes of the same color might appear several times on a path from the root to a leaf, and (ii) there might be an overlap of piles or cubes of the same color in a leaf cell.

Let us first explain how we efficiently handle problem (ii). For a given leaf-cell $C_\nu$ denote the set of piles with respect to $C_\nu$ by $P$. We define $colors(P)$ to be the set of colors of all the piles of $P$. At each deletion or insertion of a pile to $C_\nu$ (and to $P$), we want to determine quickly whether there exists a point in $C_\nu$ covered by all colors of $colors(P)$. Note that for every color $i \in colors(P)$ the region in $C_\nu$ consisting of points not covered by color $i$ is a set of disjoint boxes $G_i$. Consider an addition of a pile of color $i$ to $C_\nu$. If the pile is contained in another pile of that color, then the addition does not contribute anything to $G_i$. Otherwise, the addition may decrease the size of an existing box in $G_i$ or add one box to $G_i$ (because a pile is part of an input unit cube and therefore slicing a cell into two uncovered boxes can be done in at most one axis). Thus at every instance the number of boxes in $G_i$ for a given color $i$ is linear in the number of piles of that color at that instance.

Thus determining whether there is a point in $C_\nu$ which is covered by all colors of $P$ is equivalent to determining whether there is a point in $C_\nu$ which is outside all boxes of the sets $G_i$, $i \in colors(P)$. This question, in its turn, is equivalent to determining whether the volume of the union $\bigcup_i G_i$ is equal to the volume of $C_\nu$. Notice that by posing the question in this way we handle the overlapping of piles of the same color and convert this question into a volume problem in $(d-1)$-space which can be answered by applying the volume algorithm in [16] on each leaf cell separately, as we explain below. This means that we have to define and maintain an OPT $T_\nu$ for each leaf cell $C_\nu$. We call the OPTs constructed on leaves of $T$ leaf trees. These trees maintain the boxes $G_i$ which arise in the course of the algorithm. In the next section we will describe in detail the implementation and analysis of this step. The intuition however is that with each insertion of a pile $q$ of color $i$ to a leaf-cell $\nu$, we first delete the set of boxes $G_i$ from $T_\nu$, then we recompute $G_i$ and insert the new boxes of $G_i$ into $T_\nu$, computing the volume on the fly. The operations for deleting a pile are symmetric.

Regarding (i), we do not want to count a color twice (for the depth) when two cells along a path from the root of $T$ to a leaf have two cubes of that color. Therefore, for each node $\nu$, we maintain the "Active" colors in this node, that is, colors that cover this node and do not cover any of its ancestors along the path from $\nu$ to the root of $T$. The active colors will be the ones taken into account in the computation of the depth, as we show below.

We now turn to the description of the fields of each node $\nu$ in our augmented OPT $T$. Notice that for some fields we make a distinction between leaf nodes and internal nodes.
$\mathit{L\_TOT}_\nu$ stores, for a leaf $\nu$, all cubes covering $C_\nu$ (either partially or completely) but not covering its parent. If $\nu$ is an internal node of $T$, then $\mathit{L\_TOT}_\nu$ stores all cubes covering $C_\nu$, but not covering its parent. The field $\mathit{L\_TOT}_\nu$ is organized as a balanced search tree, sorted by colors and lexicographically by cubes, so that determining whether a color $i$ appears in $\mathit{L\_TOT}_\nu$, as well as inserting a cube into or deleting a cube from $\mathit{L\_TOT}_\nu$, is done in time $O(\log n)$.

$\mathit{Active}_\nu$ stores, for a leaf $\nu$, all cubes covering $C_\nu$ (either partially or completely) whose color does not completely cover an ancestor cell of $C_\nu$. For an internal node $\nu$, $\mathit{Active}_\nu$ stores all colors $i$ which cover $C_\nu$ such that no ancestor cell of $C_\nu$ is covered by color $i$. The field $\mathit{Active}_\nu$ is a balanced search tree sorted by colors. For leaf nodes the search tree is also sorted lexicographically by cubes, so that determining whether any cube of color $i$ appears in $\mathit{Active}_\nu$, as well as inserting or deleting a color (or a cube, if $\nu$ is a leaf), is done in time $O(\log n)$.

$\mathit{N\_Active}_\nu$ is a counter which always contains zero if $\nu$ is a leaf. If $\nu$ is an internal node, then $\mathit{N\_Active}_\nu$ contains the number of different colors in $\mathit{Active}_\nu$. This counter is updated whenever an element is inserted into or deleted from $\mathit{Active}_\nu$.

$\mathit{Depth}_\nu$ is the maximal depth "achieved" by cubes stored at the $\mathit{Active}$ fields of $\nu$ and its descendant nodes. Formally, if $\nu$ is a leaf, then if there is a point in $C_\nu$ covered by all colors in $\mathit{Active}_\nu$ then $\mathit{Depth}_\nu$ is the number of different colors in $\mathit{Active}_\nu$; otherwise $\mathit{Depth}_\nu = 0$. If $\nu$ is an internal node, then $\mathit{Depth}_\nu$ is defined recursively by:
$$\mathit{Depth}_\nu = \max\{\mathit{Depth}_{left_\nu} + \mathit{N\_Active}_{left_\nu},\ \mathit{Depth}_{right_\nu} + \mathit{N\_Active}_{right_\nu}\}$$

When a (colored) cube $q$ is inserted into $T$ or deleted from $T$, we first update recursively the relevant fields $\mathit{L\_TOT}_\nu$. This task is simple, and details are omitted. Next, we update the $\mathit{Depth}$ fields of the leaves affected by $q$ (see below).
Updating the $\mathit{Active}$ fields of all nodes of $T$ affected by $q$ is done in the following way: when a new cube $q$ of color $i$ is inserted, and we note that it completely covers a cell $C_\nu$ and that no cell on the path from the root of $T$ to $\nu$ is covered by color $i$, then we insert color $i$ into $\mathit{Active}_\nu$ and delete all appearances of color $i$ from the $\mathit{Active}$ fields of all nodes in the subtree rooted at $\nu$ (by scanning all the nodes in the subtree rooted at $\nu$). When a cube $q$ of color $i$ is deleted from $T$, we find the nodes $\nu$ that have $q$ in their $\mathit{Active}$ field, and for each such node $\nu$ we search the subtree rooted at $\nu$ for its highest nodes that contain color $i$ in their $\mathit{L\_TOT}$ field. The cubes found in $\mathit{L\_TOT}$ are then inserted into the respective $\mathit{Active}$ fields of these nodes. Details of these operations are presented in procedures Insert_To_Active and Delete_From_Active in the Appendix. The $\mathit{Depth}$ fields for each internal node are updated on the fly (see Appendix) by the recursive definition of $\mathit{Depth}$.

Recall that we find the depth $\mathit{Depth}_\nu$ in a leaf cell using the method of [16] for volume computation of the union of boxes $\bigcup_i G_i$. This requires maintaining $(d-1)$-dimensional OPT trees $T_\nu$ for each leaf $\nu$. When a pile is added to (or deleted from) $T_\nu$ we update the nodes of $T_\nu$ that are affected by the box and, tracking the path from these nodes to the root of $T_\nu$, we update the volume of $C_\nu$. An underlying assumption in constructing an OPT is that all the input boxes have to be known in advance. Clearly this is not the case with the boxes $G_i$, since these boxes are constructed in the course of the algorithm. In order to construct all the boxes $G_i$ in advance we first perform a "dummy" execution of the algorithm, in which we maintain the list of all boxes $G_i$ stored at each instance in each of the leaves. For the actual running of the algorithm we can assume that the boxes for the trees $T_\nu$ for all leaves $\nu$ are known in advance.
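The leaf-cell test of this section (is there a point of the cell outside every $G_i$?) can be illustrated in one dimension, where cells and piles are intervals. This is a minimal sketch and not the leaf-tree machinery: it recomputes everything from scratch, all names are hypothetical, and, like the volume formulation itself, it only detects common regions of positive length, so piles that merely touch at an endpoint are reported as having no common point.

```python
def uncovered_gaps(cell, piles):
    """Parts of cell = (lo, hi) not covered by the union of the intervals `piles`.

    This plays the role of G_i for one color.
    """
    lo, hi = cell
    gaps, cursor = [], lo
    for a, b in sorted(piles):
        a, b = max(a, lo), min(b, hi)      # clip the pile to the cell
        if a > b:                          # pile lies outside the cell
            continue
        if a > cursor:
            gaps.append((cursor, a))
        cursor = max(cursor, b)
    if cursor < hi:
        gaps.append((cursor, hi))
    return gaps


def union_length(intervals):
    """Total length of a union of intervals (the 1-D 'volume')."""
    total, end = 0, None
    for a, b in sorted(intervals):
        if end is None or a > end:
            total += b - a
            end = b
        elif b > end:
            total += b - end
            end = b
    return total


def some_point_has_all_colors(cell, piles_by_color):
    """True iff the union of all G_i fails to fill the cell."""
    gaps = [g for piles in piles_by_color.values() for g in uncovered_gaps(cell, piles)]
    return union_length(gaps) < cell[1] - cell[0]
```

With cell `(0, 10)`, a pile `(0, 6)` of one color and a pile `(4, 10)` of another, the gaps are `(6, 10)` and `(0, 4)`, their union has length 8, and the region between 4 and 6 is covered by both colors.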
The correctness of the algorithm follows from the next claim:

Claim 3. After the $\mathit{Depth}$ fields are updated, there is a point of depth $n$ on the hyperplane $h$ if and only if $n = \mathit{Depth}_{root} + \mathit{N\_Active}_{root}$.

Proof omitted in this version for lack of space.

It follows that in order to answer the Hausdorff decision problem we only have to compare $\mathit{Depth}_{root} + \mathit{N\_Active}_{root}$ to $n$ after each insertion or deletion of a cube.
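The depth recursion and the root test of Claim 3 can be illustrated on the one-dimensional ancestor of the OPT, the segment tree. The sketch below is a simplification under stated assumptions: it is uncolored (each interval counts once, so the per-node counter plays the role of $\mathit{N\_Active}$), leaves have depth 0 because there are no piles in one dimension, and coverage is measured on elementary intervals of positive length. The class and method names are hypothetical.

```python
from bisect import bisect_left


class CoverTree:
    """Segment tree over the elementary intervals between sorted endpoints."""

    def __init__(self, endpoints):
        self.xs = sorted(set(endpoints))
        self.m = len(self.xs) - 1             # number of elementary intervals
        self.count = [0] * (4 * self.m)       # intervals covering node, not parent
        self.depth = [0] * (4 * self.m)       # max over children of depth + count

    def _update(self, node, lo, hi, a, b, delta):
        if b <= lo or hi <= a:                # no overlap with [lo, hi)
            return
        if a <= lo and hi <= b:               # node fully covered: store here
            self.count[node] += delta
            return
        mid = (lo + hi) // 2
        left, right = 2 * node + 1, 2 * node + 2
        self._update(left, lo, mid, a, b, delta)
        self._update(right, mid, hi, a, b, delta)
        self.depth[node] = max(self.depth[left] + self.count[left],
                               self.depth[right] + self.count[right])

    def _apply(self, x1, x2, delta):
        a, b = bisect_left(self.xs, x1), bisect_left(self.xs, x2)
        self._update(0, 0, self.m, a, b, delta)

    def insert(self, x1, x2):
        self._apply(x1, x2, +1)

    def delete(self, x1, x2):
        self._apply(x1, x2, -1)

    def max_depth(self):
        """Analogue of Depth_root + N_Active_root."""
        return self.depth[0] + self.count[0]
```

As in the text, all endpoints must be known before the tree is built, and each update only touches the nodes on a logarithmic number of root-to-node paths.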
Time and space analysis
We now bound the time required by the algorithm. Since operations on leaf-cells consume the larger chunk of the running time of our algorithm, we can afford, when inserting or deleting a cube $q$ of color $i$, to scan all nodes in $T$, and to update all the internal nodes of $T$ affected by $q$. There are $N = n^2$ input cubes and $O(N^{(d-1)/2}) = O(n^{d-1})$ nodes in $T$ (Attr. A5), therefore the total time spent on inner nodes of $T$ is $n^2 \cdot O(n^{d-1}) \cdot O(\log n) = O(n^{d+1} \log n)$, where the log factor is caused by searching the trees associated with the fields in each node.

In bounding the leaf-cell operations, we first analyze the time required to insert a cube $q$ of color $i$ into, or delete it from, one of the leaf trees $T_\nu$, while maintaining $\mathit{Depth}_\nu$. As we said above, each insertion of a pile into $\nu$ might create at most two new boxes in $G_i$ and cause the removal of at most two old such boxes. So, at insertion (or deletion) we recompute $G_i$ by deleting up to two boxes from $T_\nu$ and inserting up to two boxes into $T_\nu$, while maintaining the volume of $\bigcup_i G_i$. By Attribute A2, the number of cubes stored at a leaf of $T$ is $O(\sqrt{N}) = O(n)$; this implies that the number of boxes of $G_i$ for all $i$ is $O(n)$. Combining this fact with Theorem 4.4 of [16], we get that every leaf tree $T_\nu$ has $O(n^{(d-1)/2})$ nodes (Attr. A5) and that the time required for one insertion or deletion of a box to $T_\nu$, including updating the volume information at the root of $T_\nu$, is only (Attr. A3)
$$O(n^{d/2-1} \log n). \qquad (1)$$

Let us now bound the time required to perform all insertion and deletion operations on all the trees $T_\nu$. We apply an amortized-type argument to get a good bound for the total running time of the algorithm. Let $q$ be a cube, and $\nu$ a leaf of $T$. We claim that during the course of the algorithm, $q$ is inserted into $\mathit{Active}_\nu$ at most once, and deleted at most once. This is true since $q$ is deleted when either the sweeping hyperplane $h$ stops intersecting it or when $h$ intersects a new cube $q'$ of the same color that completely covers $C_\nu$.
In both cases $q$ will not reappear. By Attribute A3, a cube partially covers $O(n^{d-2})$ leaf-cells of $T$. Hence the total number of leaf trees affected by the $n^2$ input cubes is
$$n^2 \cdot O(n^{d-2}). \qquad (2)$$
Multiplying Equations 1 and 2 we get:

Theorem 4. For $d > 3$ we can answer the Hausdorff decision problem in time $O(n^{3d/2-1} \log n)$.
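Written out, the product behind Theorem 4 and its comparison with the time spent on the inner nodes of $T$ is the following short calculation; the only step added here is the explicit inequality at the end.

```latex
\begin{align*}
\underbrace{n^{2}\cdot O\!\left(n^{d-2}\right)}_{(2)}
\cdot
\underbrace{O\!\left(n^{d/2-1}\log n\right)}_{(1)}
  &= O\!\left(n^{\,d + d/2 - 1}\log n\right)
   = O\!\left(n^{3d/2-1}\log n\right), \\[4pt]
\tfrac{3d}{2} - 1 \;\ge\; d + 1
  &\iff d \ge 4 .
\end{align*}
```

So for every $d > 3$ the leaf-tree work dominates the $O(n^{d+1} \log n)$ spent on inner nodes, which is why scanning all of $T$ on each update does not affect the bound.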
Turning now to space requirements, by Theorem 4.1 of [16], the space required for each of the trees $T_\nu$ is $O(n^{(d-1)/2})$. Since in $T$ there are $O(n^{d-1})$ leaves, the total space required for the trees is $O(n^{3(d-1)/2})$. The space requirement can be reduced to $O(n^2)$, using ideas similar to the ones in Section 5 of Overmars and Yap [16]. Analysis of this claim will appear in the full version.
5 Higher Dimensions: the Improved Version
In this section we improve the relaxed algorithm, where we had two phases of OPTs. The first-phase OPT, $T$, stored the decomposition of the $(d-1)$-space. The second-phase OPTs $T_\nu$ described the leaf-cells of $T$. In order to distinguish between the cells in the trees of the two phases we call the cells of the second phase sub-cells. The main idea for the improved oracle is to try to decrease the number of operations spent on the leaf trees by decreasing the number of these trees and storing more information in each of them.

In order for the presentation of the improved oracle to be self-contained we first review briefly the decomposition of the $(d-1)$-space as described in [16], using similar notation (for more details see [16]). Given $N$ $(d-1)$-boxes $Q = \{q_1, \ldots, q_N\}$ we wish to decompose the $(d-1)$-space into $O(N^{(d-1)/2})$ cells such that attributes A1–A5 from Section 4 hold. We define an $i$-boundary of a box $q$ to be a $(d-2)$-face of $q$ orthogonal to the $i$-th axis. For axis 1 we split the $(d-1)$-space into $2\sqrt{N}$ slabs, such that each slab contains at most $\sqrt{N}$ 1-boundaries of the input boxes. We repeat the process on axes $j$, $j = 2, \ldots, d-1$. For the $j$-th axis we consider each slab $s$ of level $j-1$ in the following way. We define $V_s^1$ to be the boxes that have a $j'$-boundary intersecting $s$ for some $j' < j$, and $V_s^2$ to be the boxes that intersect $s$ but do not have any $j'$-boundary intersecting $s$ for $j' < j$; we now split the slab $s$ by a $(d-2)$-flat at every $j$-boundary of a box in $V_s^1$ and at every $\sqrt{N}$-th $j$-boundary of a box in $V_s^2$. As a result, each slab of level $j-1$ is split into up to $2j\sqrt{N}$ slabs of level $j$. This is, in essence, the cell decomposition and the OPT of Overmars and Yap, which we have used in the relaxed version.

We now show how we construct our OPT $T$ so that it will have fewer nodes. This implies that we load more information on the leaf cells.
The invariant that each leaf-cell in $T$ is intersected just by piles is not maintained any more. We define non-piles as follows: if a box $q$ intersects $C_\nu$ and is not a pile with respect to $C_\nu$, nor does it cover $C_\nu$ completely, then we call $q$ a non-pile with respect to $\nu$. The tree $T$ of phase one is generated by the following procedure. Recall that we have $N = n^2$ cubes.
Cell Decomposition
Split the $(d-1)$-space into $2N^{1/3} = 2n^{2/3}$ slabs in the first axis, each of whose interiors contains at most $N^{2/3} = n^{4/3}$ 1-boundaries of boxes.
Repeat for $j = 2, \ldots, d-1$. Define $V_s^1$ and $V_s^2$ as before. For the $j$-th axis, we consider a slab $s$ of level $j-1$. Split $s$ with a $(d-2)$-flat at every $N^{1/3}$-th $j$-boundary of a box in $V_s^1$ and at every $N^{2/3}$-th $j$-boundary of a box in $V_s^2$.
Notice that in the process of Cell Decomposition, we do not put $(d-2)$-flats at every $j$-boundary of boxes from $V_s^1$, but at every $n^{2/3}$-th of them. We denote by $F_i$ the set of non-pile boxes of color $i$ in $C_\nu$, for $i = 1, \ldots, n$.
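The basic step of the Cell Decomposition procedure, cutting at every $k$-th boundary along one axis, can be sketched in a few lines. This is a one-axis illustration only, with hypothetical function names; it assumes distinct boundary coordinates and ignores the distinction between $V_s^1$ and $V_s^2$, which only changes the value of $k$ used.

```python
def cuts_every_kth(boundaries, k):
    """Coordinates of the cutting flats: every k-th boundary in sorted order."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    ordered = sorted(boundaries)
    return ordered[k - 1::k]


def max_boundaries_inside_a_slab(boundaries, cuts):
    """Largest number of boundaries strictly inside one slab between cuts."""
    edges = [float("-inf")] + sorted(cuts) + [float("inf")]
    return max(
        sum(lo < x < hi for x in boundaries)
        for lo, hi in zip(edges, edges[1:])
    )
```

For instance, with $n = 8$ there are $N = n^2 = 64$ boxes and $2N = 128$ boundaries on the first axis; cutting at every $n^{4/3} = 16$-th boundary produces $8 = 2n^{2/3}$ cuts and leaves fewer than 16 boundaries inside each slab.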
Lemma 5. In the modified orthogonal partition tree generated by the Cell Decomposition procedure on $N = n^2$ boxes, each leaf-cell contains at most $O(n^{2/3})$ non-pile boxes $F_i$ and at most $O(n^{4/3})$ pile boxes.
Proof: A non-pile box $F \in F_i$, with respect to the leaf-cell $C_\nu$, contains at least two boundaries of $F$ in $C_\nu$ (say a $b_1$-boundary and a $b_2$-boundary where $b_1 < b_2$). Consider level $b_2$ in the Cell Decomposition and let $s$ be the slab of level $b_2 - 1$ that contains $C_\nu$. As $b_1 < b_2$, the $b_1$-boundary is contained in the slab $s$ and the box $F$ is in $V_s^1$. At level $b_2$ we split the slab $s$ at each $n^{2/3}$-th $b_2$-boundary of boxes from $V_s^1$. Therefore, $C_\nu$ contains at most $n^{2/3}$ non-pile boxes such that their largest boundary in $C_\nu$ is $b_2$, and the number of non-pile boxes in $C_\nu$ does not exceed $O(n^{2/3})$. □

Using the Cell Decomposition procedure we obtain the following properties:
(i) The number of cells in $T$ is $O(n^{2(d-1)/3})$.
(ii) Each $(d-1)$-box partially covers $O(n^{2(d-2)/3})$ cells of $T$.
(iii) Each leaf cell $C_\nu$ contains at most $n^{4/3}$ boxes partially covering $C_\nu$.
(iv) Each leaf cell intersects at most $n^{2/3}$ non-pile boxes with respect to that leaf.

To verify these invariants, observe that each slab was split into $2n^{2/3}$ slabs and, as we are dealing with $(d-1)$-space, the number of cells is $O(n^{2(d-1)/3})$. At the moment we split the slabs generated for axis $j-1$ with respect to the $j$-th axis, there are $O(n^{2(j-1)/3})$ slabs. A box that partially covers the cell has at least one axis in which it does not cut through the slabs. Therefore, each box cuts through at most $O(n^{2(j-1)/3} \cdot n^{2(d-1-j)/3}) = O(n^{2(d-2)/3})$ cells. The last two properties follow immediately from the way that the slabs are split.

We now modify the second-phase trees. For each leaf-cell $\nu$, we calculate as before the boxes $G_i$ consisting of points that are not covered by any pile of color $i$ at a certain instance (ignoring the non-pile boxes). If we now construct the standard OPT for the $N' = n^{4/3}$ boxes ($G_i$), each slab at level $j$ is split into $\sqrt{N'} = n^{2/3}$ slabs. As the total number of non-piles in each leaf-cell is only $O(n^{2/3})$, we may also split each slab at each non-pile boundary without increasing the complexity of the leaf-cells. Note that the value $n^{2/3}$ was selected as the smallest number such that each leaf-cell will contain no more than $O(\sqrt{n^{4/3}})$ non-pile boxes. The following procedure describes the construction of the augmented OPT $T_\nu$ for a leaf-cell $\nu$ of $T$.
Sub Cell Decomposition
Calculate the piles $G = \{G_i\}$ and the non-piles $F = \{F_i\}$ of the leaf-cells, for $i = 1, \ldots, n$. Let $N' = |G| = O(n^{4/3})$. Split the first axis at every $\sqrt{N'}$-th 1-boundary of boxes in $G$ and at each 1-boundary of a non-pile box. For $j = 2, \ldots, d-1$ consider the slab $s$ generated at axis $j-1$ and let $V_s^1$ be the boxes of $G$ that have a $j'$-boundary in $s$ for some $j' < j$. Let $V_s^2$ be the boxes of $G$ that intersect $s$ but do not have any $j'$-boundary in $s$ for $j' <$ [...] $> 0$.)

Theorem 10. Let $A$ and $B$ be as above. A translation $t$ that minimizes $H_2(A, t + B)$ can be found in expected time $O(n^{\lceil 3d/2 \rceil + 1} \log^3 n)$.

Proof: Due to space limitations we present only a brief sketch of the proof here. As in the $L_\infty$ case, we develop an algorithm for the Hausdorff decision problem and then combine it with the parametric search paradigm of Megiddo [15] to create an algorithm for finding the minimum Hausdorff distance. To solve the decision problem, we show that there are just $O(n^{\lceil 3d/2 \rceil})$ critical points that are potential candidates for vertices of regions with color-depth $n$. Our counting method is analogous to that used in the proof of Theorem 7. Each such critical point must be tested to see if it is covered by all the colors. To do this we use a result of [14] to build in time $O(n^{\lceil d/2 \rceil} \log^{O(1)} n)$ a data structure of size $O(n^{\lceil d/2 \rceil} \log^{O(1)} n)$ which can answer a coverage query in time $O(\log n)$. Initially, it looks as though such a data structure is needed for each color, but since the set of spheres for one color is just a translation of the set of spheres for each other color, we really need just one copy of this data structure. A single critical point must be tested for every color, taking time $O(n \log n)$. Thus the total time for solving the Hausdorff decision problem is $O(n^{\lceil 3d/2 \rceil + 1} \log n)$. The extra log factors in the theorem are due to the overhead of using parametric search to solve the corresponding optimization problem. □
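The observation in the proof sketch of Theorem 10 that one coverage structure serves all colors can be illustrated as follows. A point $p$ is covered by the color of $b$ exactly when $p + b$ lies within distance $\varepsilon$ of some point of $A$, so every color is a query against the same set $A$ at a shifted location. In this sketch a brute-force scan over $A$ stands in for the data structure of [14], and the function names are hypothetical.

```python
from math import dist


def covered_by_color(p, b, A, eps):
    """Is the translation p covered by the layer of b under the L2 metric?

    Equivalent to asking whether the shifted query point p + b is within
    eps of some point of A, so the same set A answers every color.
    """
    q = tuple(x + y for x, y in zip(p, b))
    return any(dist(q, a) <= eps for a in A)


def color_depth(p, A, B, eps):
    """Number of colors (points of B) whose layer contains p."""
    return sum(covered_by_color(p, b, A, eps) for b in B)
```

Testing one critical point against all colors then costs one query per point of $B$, which matches the per-point cost of $n$ queries stated in the proof sketch.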
9 Open Problems
Are $L_\infty$ Hausdorff decision problems inherently easier than the ones for $L_2$? Is there a single technique that solves both types of problems efficiently?
Given a $d$-dimensional box $C$ and $n$ boxes $G_1, \ldots, G_n$ lying inside $C$, is it possible to determine if $\bigcup_{i=1}^{n} G_i = C$ in time $o(n^{d/2})$? Finding such a technique would immediately improve the running time of our $L_\infty$ algorithm.
Finally, our techniques do not take advantage of the fact that the set of cubes
for one color is really just a translation of the set of cubes for each other color. Is there some way to use this information to design a better algorithm?
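To make the box-covering question above concrete, here is a naive baseline that decides whether a set of closed axis-parallel boxes covers a box $C$. It is a sketch for illustration only: it enumerates the grid induced by all box boundaries, so it runs in roughly $O(n^{d+1})$ time and comes nowhere near the $o(n^{d/2})$ bound asked for. The representation of a box as a list of `(lo, hi)` intervals and the name `boxes_cover` are assumptions made here, not notation from the paper.

```python
from itertools import product
from typing import List, Sequence, Tuple

Box = Sequence[Tuple[float, float]]  # one closed (lo, hi) interval per axis


def boxes_cover(C: Box, boxes: List[Box]) -> bool:
    """Return True if the union of the closed boxes contains all of C."""
    d = len(C)
    # Per axis, collect the box endpoints that fall strictly inside C.
    # Together with C's own endpoints they cut C into grid cells.
    cuts = []
    for axis in range(d):
        lo, hi = C[axis]
        pts = {lo, hi}
        for g in boxes:
            for x in g[axis]:
                if lo < x < hi:
                    pts.add(x)
        cuts.append(sorted(pts))
    # The interior of a grid cell is either inside or outside each box,
    # so testing the cell midpoint decides the whole cell.
    for idx in product(*(range(len(c) - 1) for c in cuts)):
        mid = [(cuts[a][i] + cuts[a][i + 1]) / 2 for a, i in enumerate(idx)]
        if not any(all(g[a][0] <= mid[a] <= g[a][1] for a in range(d))
                   for g in boxes):
            return False  # found an uncovered cell
    return True
```

Any faster method would have to avoid visiting all grid cells, which is where the open question lies.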
Acknowledgments.
We would like to thank Micha Sharir for useful discussions and help in the proof of Theorem 7. Thanks also to Reuven Bar-Yehuda for contributing to the algorithm of Section 4.
References
[1] P. K. Agarwal, B. Aronov and M. Sharir, Computing Envelopes in Four Dimensions with Applications, Proc. 10th Annual ACM Symposium on Computational Geometry 1994, 348-358.
[2] P. K. Agarwal, M. Sharir and S. Toledo, Applications of Parametric Searching in Geometric Optimization, Proc. 3rd ACM-SIAM Symposium on Discrete Algorithms, 1992, 72-82.
[3] N. M. Amato, M. T. Goodrich and E. R. Ramos, Parallel Algorithms for Higher-Dimensional Convex Hulls, Proceedings 6 Annual ACM Symposium on Theory of Computing 1994, 683-694.
[4] J. Boissonnat, M. Sharir, B. Tagansky and M. Yvinec, Voronoi Diagrams in Higher Dimensions under Certain Polyhedral Distance Functions, Proc. 11th Annual ACM Symposium on Computational Geometry 1995, 79-88.
[5] H. Brönnimann, B. Chazelle and J. Matoušek, Product Range Spaces, Sensitive Sampling, and Derandomization, Proc. 34th Annual IEEE Symposium on the Foundations of Computer Science 1993, 400-409.
[6] B. Chazelle, H. Edelsbrunner, L. Guibas and M. Sharir, A Singly Exponential Stratification Scheme for Real Semi-algebraic Varieties and Its Applications, Theoretical Computer Science 84 (1991), 77-105.
[7] L. P. Chew and K. Kedem, Improvements on Geometric Pattern Matching Problems, Algorithm Theory - SWAT '92, edited by O. Nurmi and E. Ukkonen, Lecture Notes in Computer Science #621, Springer-Verlag, July 1992, 318-325.
[8] H. Edelsbrunner, Algorithms in Combinatorial Geometry, Springer-Verlag, Heidelberg, Germany, 1987.
[9] A. Efrat and M. Sharir, A Near-linear Algorithm for the Planar Segment Center Problem, Discrete and Computational Geometry, to appear.
[10] G. N. Frederickson and D. B. Johnson, Finding kth Paths and p-Centers by Generating and Searching Good Data Structures, Journal of Algorithms, 4 (1983), 61-80.
[11] D. P. Huttenlocher and K. Kedem, Efficiently Computing the Hausdorff Distance for Point Sets under Translation, Proceedings of the Sixth ACM Symposium on Computational Geometry 1990, 340-349.
[12] D. P. Huttenlocher, K. Kedem and M. Sharir, The Upper Envelope of Voronoi Surfaces and its Applications, Discrete and Computational Geometry, 9 (1993), 267-291.
[13] J. Matoušek, Reporting Points in Halfspaces, Comput. Geom. Theory Appls. 2 (1992), 169-186.
[14] J. Matoušek and O. Schwarzkopf, Linear Optimization Queries, Proc. 8th Annual ACM Symposium on Computational Geometry 1992, 16-25.
[15] N. Megiddo, Applying Parallel Computation Algorithms in the Design of Serial Algorithms, J. ACM 30 (1983), 852-865.
[16] M. Overmars and C. K. Yap, New Upper Bounds in Klee's Measure Problem, SIAM Journal of Computing, 20:6 (1991), 1034-1045.
[17] F. P. Preparata and M. I. Shamos, Computational Geometry, Springer-Verlag, New York, Berlin, Heidelberg, Tokyo, 1985.
[18] W. Rucklidge, Lower Bounds for the Complexity of the Hausdorff Distance, Proc. 5th Canadian Conference on Computational Geometry 1993, 145-150.
[19] W. Rucklidge, Private Communication.
[20] R. Seidel, Small-dimensional Linear Programming and Convex Hulls Made Easy, Discrete and Computational Geometry, 6 (1991), 423-434.
Appendix
The operations for inserting a cube $q$ into, or deleting $q$ from, a leaf-cell $C_\nu$ are done by the procedures Insert to Leaf and Delete from Leaf. They have the side effect of updating the field $\mathit{Active}_\nu$. In addition, they return $|\mathit{colors}(\mathit{Active}_\nu)|$ if there is a point in $C_\nu$ covered by all colors of $\mathit{Active}_\nu$, or 0 otherwise. The following procedures update the $\mathit{Active}$ field in each node of $T$, and the $\mathit{depth}$ fields of the leaves of $T$.

Insert To Active($q$, $\nu$)
    if $q$ is disjoint from $C_\nu$ then return;
    if $\nu$ is a leaf then $\mathit{depth}_\nu \leftarrow$ Insert to Leaf($q$, $\nu$);
    else /* $\nu$ is an internal node */
        if $q$ does not cover $C_\nu$ completely then
            Insert To Active($q$, $\mathit{left}_\nu$); Insert To Active($q$, $\mathit{right}_\nu$)
        /* $q$ covers $C_\nu$ completely */
        else if color($q$) is in $\mathit{Active}_\nu$ or in $\mathit{Active}_\mu$ for $\mu$ an ancestor of $\nu$ then return
        else /* No ancestor of $\nu$ is covered by color($q$) */
            insert color($q$) into $\mathit{Active}_\nu$
            for every descendant $\mu$ of $\nu$
                if $\mu$ is an internal node and color($q$) $\in \mathit{Active}_\mu$ then delete color($q$) from $\mathit{Active}_\mu$
                if $\mu$ is a leaf then for every $q'$ of color color($q$) in $\mathit{Active}_\mu$ set $\mathit{depth}_\mu \leftarrow$ Delete from Leaf($q'$, $\mu$)

Delete From Active($q$, $\nu$)
    if $q$ is disjoint from $C_\nu$ then return;
    if $q$ partially covers $C_\nu$ and $\nu$ is an internal node then
        Delete From Active($q$, $\mathit{left}_\nu$); Delete From Active($q$, $\mathit{right}_\nu$)
    else if $\nu$ is a leaf then $\mathit{depth}_\nu \leftarrow$ Delete from Leaf($q$, $\nu$)
    else /* $\nu$ is an internal node, covered completely by $q$ */
        if no ancestor of $\nu$ contains color($q$) in its Active list then Recover(color($q$), $\nu$)
        else /* Some ancestor of $\nu$ contains color($q$) */ return

Recover(color($q$), $\nu$)
    if color($q$) $\in$ L_TOT$_\nu$ then
        if $\nu$ is an internal node then
            insert color($q$) into $\mathit{Active}_\nu$; return
        else /* $\nu$ is a leaf containing cubes of color($q$) in L_TOT */
            for each box $q'$ of color color($q$) in L_TOT$_\nu$
                $\mathit{depth}_\nu \leftarrow$ Insert to Leaf($q'$, $\nu$)
    else /* no box of color color($q$) in Active */
        if $\nu$ is an internal node then
            Recover(color($q$), $\mathit{left}_\nu$); Recover(color($q$), $\mathit{right}_\nu$)