Geometric Pattern Matching: A Performance Study

Martin Gavrilov, Piotr Indyk, Rajeev Motwani†, Suresh Venkatasubramanian‡
Department of Computer Science, Stanford University. {martinga, indyk, rajeev, [email protected]}
This work was supported by a Stanford Graduate Fellowship and NSF Award CCR-9357849, with matching funds from IBM, Mitsubishi, Schlumberger Foundation, Shell Foundation, and Xerox Corporation.
† Supported by an IBM Faculty Partnership Award, an ARO MURI Grant DAAH04-96-1-0007, and NSF Young Investigator Award CCR-9357849, with matching funds from IBM, Mitsubishi, Schlumberger Foundation, Shell Foundation, and Xerox Corporation.
‡ Supported by ARO MURI Grant DAAH04-96-1-0007 and NSF Award CCR-9357849, with matching funds from IBM, Mitsubishi, Schlumberger Foundation, Shell Foundation, and Xerox Corporation.

Abstract
In this paper, we undertake a performance study of some recent algorithms for geometric pattern matching. These algorithms cover two general paradigms for pattern matching: alignment and combinatorial pattern matching. We present analytical and empirical evaluations of these schemes. Our results indicate that a proper implementation of an alignment-based method outperforms other (often asymptotically better) approaches.

1 Introduction
This paper is concerned with the geometric pattern matching problem: given two $d$-dimensional point sets, the pattern $P$ and the image $Q$, determine a rigid transformation that brings $P$ close to $Q$ under some specified distance measure. A popular measure is the Hausdorff metric (denoted by $d_H$), where the distance from $P$ to $Q$ is defined as the maximum over all points $p \in P$ of the distance between $p$ and its nearest neighbor in $Q$. This problem has many applications in areas such as model-based object recognition [Wol90], pharmacophore identification [FKL+97], vehicle tracking [GL96], and image registration [Bro92].

In this paper, we consider two-dimensional pattern matching. A comprehensive study of these problems was initiated by Alt, Mehlhorn, Wagener, and Welzl [AMWW88], and was followed by other work [CGH+93, HKK92, Ruc93, Box96, IR96]. Unfortunately, the algorithms presented in this body of work turn out to be fairly impractical. In fact, as noted in the survey by Alt and Guibas [AG96] and in the paper by Goodrich et al. [GMO94], these algorithms are likely to be "difficult to implement and numerically unstable due to the necessary computation of intersections of complex algebraic surfaces." Worse still, they have unacceptably high running times: for example, for $|P| = k$ and $|Q| = n$, the running time of the algorithm of Chew et al. [CGH+93] is $\tilde{O}(k^3 n^2)$; in […] $> 0$ and point sets $P, Q$ in the plane, let $\varepsilon^* = d_H(P, Q)$. If $\varepsilon^* \le \varepsilon$, then return a transformation $T$ such that $d_H(T(P), Q) \le (1+\delta)\varepsilon$; if $\varepsilon^* > (1+\delta)\varepsilon$, then return none. Otherwise, when $\varepsilon^* \in (\varepsilon, (1+\delta)\varepsilon]$, return any transformation $T$. For simplicity, we will fix $\delta = 2$ in our algorithms. Our algorithms are also enumerative, in that they report all valid matches.

3 Alignment-based schemes
The three alignment-based algorithms that we consider all conform to the same basic template:

Algorithm ALIGNMENT:
1. Compute a diameter pair $p_1, p_2$ of $P$, that is, a pair such that $\mathrm{diam}(P) = \|p_1 - p_2\| = \Delta$.
2. Find the set $A$ of all pairs $(q_1, q_2)$ of points in $Q$ such that $\Delta - 2\varepsilon \le \|q_1 - q_2\| \le \Delta + 2\varepsilon$.
3. Construct a set $\mathcal{T}$ of (possibly many) transformations mapping $(p_1, p_2)$ close to $(q_1, q_2)$ (the details will be given shortly).
4. Search for $T \in \mathcal{T}$ such that $d_H(T(P), Q) \le (1+\delta)\varepsilon$.

Step 1 of the algorithm can be performed in $O(n \log n)$ time [PS85]. We now discuss in detail the implementation of Steps 2 to 4.

3.1 Step 2: Extracting Candidate Pairs
The three schemes differ in their implementation of Step 2.

Basic Alignment (BA): This is essentially the scheme employed by Goodrich, Mitchell and Orletsky [GMO94] that yields a constant-factor approximation to $d_H(P, Q)$. We enumerate all pairs $(q, q')$, $q \ne q' \in Q$, retaining only those pairs that satisfy the condition of Step 2.

Multiple Grids (MGRID): This is the scheme outlined in [IMV99]. Let $G$ denote a planar grid consisting of rectangular cells of width $4\varepsilon$ and height $\sqrt{\Delta\varepsilon}$. For each $i = 0, \ldots, \sqrt{\Delta/\varepsilon}$ we define $G_i$ to be the grid $G$ rotated by an angle $2\pi i\sqrt{\varepsilon/\Delta}$. Notice that for any point $q$ the ring $R(q, \Delta - 2\varepsilon, \Delta + 2\varepsilon)$ (the annulus centered at $q$ with inner radius $\Delta - 2\varepsilon$ and outer radius $\Delta + 2\varepsilon$) can be covered by $O(\sqrt{\Delta/\varepsilon})$ grid cells from $G_0, \ldots, G_{\sqrt{\Delta/\varepsilon}}$.
During the preprocessing step, we store each $q \in Q$ in the cell of every $G_i$ that contains $q$. Now, to compute the set $A$ satisfying the condition of Step 2, we must compute $Q \cap R(q, \Delta - 2\varepsilon, \Delta + 2\varepsilon)$ for each point $q$ in $Q$. To do so, we retrieve the contents of the $O(\sqrt{\Delta/\varepsilon})$ buckets covering $R(q, \Delta - 2\varepsilon, \Delta + 2\varepsilon)$, eliminating all $q'$ such that $q' \notin R(q, \Delta - 2\varepsilon, \Delta + 2\varepsilon)$. The total complexity of this step (over all $q \in Q$) can be bounded by the total number of points retrieved, plus the total number of buckets accessed (which is $O(n\sqrt{\Delta/\varepsilon})$), plus the cost of preprocessing (which is again $O(n\sqrt{\Delta/\varepsilon})$).
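The BA filter of Step 2 is simple enough to state directly. The following is a minimal Python sketch, not the authors' SGI C++ implementation: points are assumed to be distinct `(x, y)` tuples, the diameter pair of Step 1 is found by brute force in $O(k^2)$ time rather than with the $O(n \log n)$ method of [PS85], and the names `diameter_pair` and `candidate_pairs_ba` are hypothetical.

```python
import itertools
import math

# Hypothetical helpers illustrating Steps 1 and 2 of ALIGNMENT (BA variant).
# Points are distinct (x, y) tuples.

def diameter_pair(P):
    """Step 1 by brute force: return (p1, p2, Delta) with |p1 - p2| = diam(P)."""
    p1, p2 = max(itertools.combinations(P, 2), key=lambda pair: math.dist(*pair))
    return p1, p2, math.dist(p1, p2)

def candidate_pairs_ba(Q, Delta, eps):
    """Step 2, BA style: test every ordered pair (q1, q2), q1 != q2, against
    Delta - 2*eps <= |q1 - q2| <= Delta + 2*eps. Ordered pairs are kept because
    the transformation maps p1 to q1, so both orientations must be tried."""
    lo, hi = Delta - 2 * eps, Delta + 2 * eps
    return [(q1, q2) for q1, q2 in itertools.permutations(Q, 2)
            if lo <= math.dist(q1, q2) <= hi]
```

For example, the pattern `[(0, 0), (3, 4), (1, 1)]` has diameter pair `(0, 0)`, `(3, 4)` with $\Delta = 5$, and for the image `[(0, 0), (5, 0), (0, 1)]` with `eps = 0.1` the filter keeps four ordered pairs. The filter always performs $\Theta(n^2)$ distance tests, which is the quadratic behavior that MGRID and GRID are designed to avoid.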
Single Grid (GRID): In this scheme, we use a single square grid with cells of side $\gamma$. We place all points of $Q$ in this grid, and with every point $q \in Q$ we store the set of all grid cells covering $R(q, \Delta - 2\varepsilon, \Delta + 2\varepsilon)$. To compute the set $A$ in Step 2 of the algorithm, we traverse this set of cells, eliminating points as before. As we shall see in Section 5, this step runs in time $O(\sqrt{\Delta}\, n^{5/4} / k^{1/4})$ (ignoring polynomial terms in $1/\varepsilon$).

3.2 Step 3: Constructing the transformation set
In order to solve the $\delta$-approximate version of the problem, we can construct a set of $O(1/\delta^3)$ transformations mapping the pair $(p_1, p_2)$ close to $(q_1, q_2)$, as described in [HS94]. For simplicity, in this paper we merely compute the single transformation $T_0$ that maps $p_1$ to $q_1$ and aligns $\overrightarrow{p_1 p_2}$ with $\overrightarrow{q_1 q_2}$. Since $p_1, p_2$ is a diameter pair, the error incurred in doing this is no more than $2\varepsilon$, implying that the approximation is within a factor of 2 of the minimum value.

3.3 Step 4: Verifying the match
In Step 4 of the algorithm we need to verify whether, for a given transformation $T$, the point set $T(P)$ is close to $Q$. This is done by checking, for each $p \in P$, whether there is a point of $Q$ within distance $\varepsilon$ of $T(p)$. One way of implementing this check is described in [IMV99]: we subdivide the plane into a grid of cell width $\varepsilon/\sqrt{2}$ and, for each grid cell, make a list of all points of $Q$ contained within it. Now, for each point $p \in T(P)$, we find the cell $c$ containing it and check whether this cell (and all adjacent cells) are empty or not. As it turns out, a more efficient implementation can be obtained by observing that we only need to test whether a grid cell is empty or not. Therefore, instead of maintaining (for each cell) the set of points contained within it, we need only maintain one bit for each cell, which is set if there is some point of $Q$ within distance $\varepsilon$ of it.
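The single-grid filter, the transformation $T_0$ and the one-bit-per-cell check can be sketched together. This is a minimal Python illustration under stated assumptions, not the authors' code: all helper names are hypothetical, a cell is tested against the ring using the closest and farthest distance from $q$ to that square, GRID's precomputed per-point list of covering cells is replaced (for brevity) by a scan of the ring's bounding box, and the bitmap is stored as a Python set of cell indices. With bitmap cell side `side` equal to $\rho\varepsilon$ for some $\rho \le 1$, `verify` accepts whenever every $T(p)$ lies within $\varepsilon$ of $Q$, and only if every $T(p)$ lies within $(1 + \sqrt{2}\rho)\varepsilon$ of $Q$.

```python
import math
from collections import defaultdict

# Hypothetical sketch of GRID's Step 2, the transformation T0 of Step 3,
# and the one-bit-per-cell check of Step 4. Points are (x, y) tuples.

def cell_of(x, y, side):
    """Index of the square grid cell of the given side that contains (x, y)."""
    return (math.floor(x / side), math.floor(y / side))

def cell_distance_range(q, cell, side):
    """Smallest and largest distance from point q to the closed square `cell`."""
    x0, y0 = cell[0] * side, cell[1] * side
    x1, y1 = x0 + side, y0 + side
    near = math.hypot(max(x0 - q[0], 0.0, q[0] - x1), max(y0 - q[1], 0.0, q[1] - y1))
    far = math.hypot(max(abs(q[0] - x0), abs(q[0] - x1)),
                     max(abs(q[1] - y0), abs(q[1] - y1)))
    return near, far

def candidate_pairs_grid(Q, Delta, eps, side):
    """Step 2, GRID style: only points stored in cells that meet the ring
    R(q, Delta - 2*eps, Delta + 2*eps) are retrieved and tested."""
    grid = defaultdict(list)
    for q in Q:
        grid[cell_of(*q, side)].append(q)
    lo, hi = Delta - 2 * eps, Delta + 2 * eps
    reach = math.ceil(hi / side) + 1
    pairs = []
    for q in Q:
        cx, cy = cell_of(*q, side)
        # Scan the ring's bounding box; GRID itself stores the covering cells per q.
        for i in range(cx - reach, cx + reach + 1):
            for j in range(cy - reach, cy + reach + 1):
                if (i, j) not in grid:  # membership test does not insert
                    continue
                near, far = cell_distance_range(q, (i, j), side)
                if near > hi or far < lo:
                    continue
                pairs += [(q, q2) for q2 in grid[(i, j)]
                          if q2 != q and lo <= math.dist(q, q2) <= hi]
    return pairs

def align_transform(p1, p2, q1, q2):
    """T0: rotate about p1 so that p1->p2 points along q1->q2, then move p1 to q1."""
    theta = (math.atan2(q2[1] - q1[1], q2[0] - q1[0])
             - math.atan2(p2[1] - p1[1], p2[0] - p1[0]))
    c, s = math.cos(theta), math.sin(theta)

    def T(p):
        x, y = p[0] - p1[0], p[1] - p1[1]
        return (q1[0] + c * x - s * y, q1[1] + s * x + c * y)
    return T

def build_bitmap(Q, eps, side):
    """Step 4 preprocessing: the set of cells (one 'bit' each) meeting some B(q, eps)."""
    bits = set()
    for q in Q:
        for i in range(math.floor((q[0] - eps) / side), math.floor((q[0] + eps) / side) + 1):
            for j in range(math.floor((q[1] - eps) / side), math.floor((q[1] + eps) / side) + 1):
                if cell_distance_range(q, (i, j), side)[0] <= eps:
                    bits.add((i, j))
    return bits

def verify(P, T, bits, side):
    """Step 4 check: accept T if every T(p) falls into a set cell."""
    return all(cell_of(*T(p), side) in bits for p in P)
```

On the image `[(0, 0), (5, 0), (0, 1)]` with $\Delta = 5$, `eps = 0.1` and unit cells, `candidate_pairs_grid` returns the same four ordered pairs as a brute-force scan, while touching only the cells that meet each ring.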
This bitmap can be constructed in advance by spreading each point $q \in Q$ to all cells that intersect $B(q, \varepsilon)$. If the cell side is $\rho\varepsilon$ with $\rho \le 1$, then, given a point $p \in T(P)$, this procedure returns YES whenever there is a point of $Q$ within distance $\varepsilon$ of $p$, and only if there is a point of $Q$ within distance $(1 + \sqrt{2}\rho)\varepsilon$ of $p$. This scheme yields a significant improvement in running time over the first method.²

² For the sake of brevity, we will not present performance comparisons for these two schemes.

4 The Combinatorial Pattern Matching Scheme
We briefly describe this scheme (CPM); a more detailed outline can be found in Appendix A. Fix a point $p \in P$. We partition the plane into $O(\log(\Delta/\varepsilon))$ concentric rings centered around $p$, of radius $\varepsilon, 2\varepsilon, \ldots, 2^{\lceil \log(\Delta/\varepsilon) \rceil}\varepsilon$. Each ring is partitioned radially into cells such that the diameter of each cell is $O(\delta\varepsilon)$. In each ring, the sequence of non-empty cells constitutes a pattern string, defined over the alphabet of canonical numberings of the cells. Now, for each image point $q \in Q$, we perform the same procedure, yielding a set of $O(\log(\Delta/\varepsilon))$ text strings. These are the strings on which we perform subset matching. Note that there exists a match from $P$ to $Q$ if and only if there exist some $q \in Q$ and some rotation such that each of the pattern strings matches its corresponding text string. The subset matching algorithm involves computing convolutions of binary vectors over $(\times, +)$ [Ind98]. We use the
NTL package developed by Victor Shoup [Sho], which implements various operations on finite fields, including fast polynomial multiplication (which we use to implement convolution).

5 Running Time Analysis
In this section we present analytical estimates of the running time of GRID. We present two estimates: a worst-case bound in terms of the ratio between the diameter and the closest-pair distance, and an upper bound under probabilistic assumptions about the input.

5.1 Worst-case Analysis
Recall that the algorithm proceeds as follows. First, a grid of a specified cell side $\gamma = \lambda\varepsilon$, $\lambda \ge 1$, is imposed on the image. Then the algorithm computes a diameter pair $p, p'$ of $P$; let $r = |p - p'|$. For each point $q \in Q$ we select the cells $C_q$ which intersect the annulus centered at $q$ with inner radius $r - 2\varepsilon$ and outer radius $r + 2\varepsilon$. Finally, for all pairs $q$ and $q'$ with $q'$ in a cell of $C_q$, we align $(p, p')$ to $(q, q')$ and verify the match. Assume (as in [IMV99]) that the minimum interpoint distance is 1, and let $\Delta$ denote the diameter.

Theorem 5.1 GRID runs in time $O(n^{5/4} k^{3/4} \sqrt{\Delta})$.

Proof: By the analysis of [IMV99] we know that the total number of pairs found is $O(n^{4/3} (\Delta\gamma)^{1/3})$. Also, the total number of cells visited is $O(n\Delta/\gamma)$, as $r \le \Delta$. Since each pair costs $O(k)$ to verify, the total running time is $O(k\, n^{4/3} (\Delta\gamma)^{1/3} + n\Delta/\gamma)$. Setting $\gamma = \sqrt{\Delta} / (k^{3/4} n^{1/4})$ balances the two terms (assuming this number is greater than $\varepsilon$), and we get the total running time bound of $O(n^{5/4} k^{3/4} \sqrt{\Delta})$.

5.2 Probabilistic Analysis
We consider the following two random input models. In both cases the pattern is generated by choosing $k$ points uniformly at random from the square $[0, t]^2$, for $t$ such that $t^2 = k/n$; note that $t \le 1$. This choice guarantees that the density of the points in the image and in the pattern is roughly the same. The models differ in the way we select the image. In the first case (which we call model A) the image consists of $n$ points chosen independently at random from the unit square $[0, 1]^2$. In the second case (called model B) the first $n - k$ points are chosen at random from the unit square. The last $k$ points are chosen by picking a random vector from $[0, 1 - t]^2$, translating $P$ by this vector, and adding the resulting point set to $Q$.

The first model has the advantage of being simple: the pattern and the image are chosen independently, which simplifies the analysis. On the other hand, for small values of $\varepsilon$ the probability that a match exists is very small, which makes the model unrealistic in that regime. Model B does not have this drawback (by definition); however, it is more difficult to analyze because the pattern and the image are not independent. The estimates which we give below are valid for both models; however, for the sake of simplicity, we give proofs only for model A. We show that if the number of occurrences of the pattern in the image is small, then the expected running time of GRID is subquadratic (more specifically, $O(n^{1.5})$). On the other hand, if the number of matches is large, then GRID runs in superquadratic time (roughly $O(kn^{1.5})$). Thus, depending on the characteristics of the input data, the algorithm is expected to run either faster or slower than CPM, which runs in quadratic time independently of the input properties.
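Both input models are easy to simulate, which makes the definitions above concrete. A minimal sketch, assuming Python's `random.Random` as the source of uniform samples; the function names are hypothetical and the code is not meant to reproduce the authors' experiments. Because model B plants a translated copy of $P$, at least one exact occurrence of the pattern is guaranteed.

```python
import math
import random

# Hypothetical generators for the random input models A and B.
# t is chosen so that t^2 = k/n, as in the text.

def random_pattern(k, n, rng):
    """k points uniform in [0, t]^2 with t = sqrt(k/n); returns (P, t)."""
    t = math.sqrt(k / n)
    return [(rng.uniform(0, t), rng.uniform(0, t)) for _ in range(k)], t

def image_model_a(n, rng):
    """Model A: n points uniform in the unit square, independent of P."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def image_model_b(P, t, n, rng):
    """Model B: n - k uniform points, then P translated by a random vector in [0, 1 - t]^2."""
    k = len(P)
    Q = [(rng.random(), rng.random()) for _ in range(n - k)]
    vx, vy = rng.uniform(0, 1 - t), rng.uniform(0, 1 - t)
    Q += [(x + vx, y + vy) for x, y in P]
    return Q, (vx, vy)
```

With `k = 16` and `n = 100`, for instance, $t = 0.4$, so the pattern occupies a $0.4 \times 0.4$ square and has the same expected point density as the unit-square image.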
Theorem 5.2 If $\varepsilon = O(1/\sqrt{n})$, then GRID runs in expected time $O(n^{1.5})$. If $\varepsilon > c\sqrt{\log n / n}$ for large enough $c > 1$, then the expected running time of GRID is $O(kn^{1.5})$.

Proof: In order to estimate the running time of the procedure, we first observe that the total area $A_q$ of the union of cells from $C_q$ is $O(r\gamma)$, which is at most $at\gamma$ for some constant $a > 0$. Consider now a specific point $q \in Q$. From the way we generate the random points, it is easy to see that the expected number of points of $Q$ falling into the cells from $C_q$ is $A_q n \le at\gamma n$. On the other hand, we know that $|C_q| = O(t/\gamma)$. Thus, if we denote the expected cost of verifying an alignment by $C$, then the expected cost $C_A$ of the algorithm is of the order of

$$C_A = n\left(\frac{t}{\gamma} + C\gamma t n\right) = tn\left(\frac{1}{\gamma} + C\gamma n\right).$$

If $\varepsilon$ is small (formally $O(1/\sqrt{n})$), then it is not difficult to verify that $C = O(1)$; note that in this case the number of matches is 0 with high probability. Then the cost of the algorithm is bounded by

$$C_{\text{small}} = O\left(nt\left(\frac{1}{\gamma} + \gamma n\right)\right) = O(n^{3/2} t) \quad \text{for } \gamma = \frac{1}{\sqrt{n}}.$$

On the other hand, when $\varepsilon$ is large (formally, greater than $c\sqrt{\log n / n}$ for large enough $c > 1$), the cost of the algorithm is

$$C_{\text{large}} = O\left(nt\left(\frac{1}{\gamma} + k\gamma n\right)\right) = O(kn^{3/2}\log n) \quad \text{for } \gamma = \varepsilon.$$

Notice that in the latter case many matches exist.

Note that a similar analysis yields an expected running time of $O(n^2)$ for BA and $O(n^{1.5})$ for MGRID when $\varepsilon = O(1/\sqrt{n})$. For large $\varepsilon$ the above bounds are multiplied by $k$.

6 Experiments
We now describe the experiments that we performed on these algorithms. All of the above algorithms were implemented using SGI C++ without any optimization flags. The machine used was an SGI Indigo running IRIX 6.2 with a 195 MHz MIPS R10000 processor, 384 MB of RAM, and a 32 KB data cache.

6.1 Data Sets
Random Data: The first data set consists of random sets of points. Specifically, given two parameters $n$ and $\Delta$, we generate a random image consisting of $n$ points inside a bounding box of side $\Delta$. It is easy to see that for such an image, the expected ratio of the largest distance to the smallest distance is $O(n)$. To generate the pattern, we use a window parameter $\alpha$ and two perturbation parameters: a perturbation radius and a rotation angle. To create a pattern, we place a bounding box of side $\alpha\Delta$ randomly in the image and extract the set $P_0$ of all points within it. We then perturb each point in $P_0$ by a uniformly distributed random value (in the range from 0 to the perturbation radius) and then rotate the result by the rotation angle to obtain $P$. This construction ensures that undoing the rotation brings $P$ within Hausdorff distance (perturbation radius) of $Q$. In all the experiments that follow, we set the rotation angle to 50. We also set the perturbation radius to 2 in all cases where a fixed perturbation is called for.
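The random-window construction can be sketched as follows, mainly to make the claim checkable that undoing the rotation brings every point of $P$ back within the perturbation radius of its source point in $P_0 \subseteq Q$. The names are hypothetical, and several details are assumptions not fixed by the text: each point moves in a uniformly random direction by a distance drawn uniformly from $[0, \text{radius}]$, the rotation is about the origin, and the angle is given in radians (the value 50 used in the experiments is not converted here).

```python
import math
import random

def random_image(n, Delta, rng):
    """n points uniform in a square bounding box of side Delta."""
    return [(rng.uniform(0, Delta), rng.uniform(0, Delta)) for _ in range(n)]

def extract_pattern(Q, Delta, alpha, radius, angle, rng):
    """Hypothetical random-window construction; returns (P, P0).

    Assumptions: random perturbation direction with length uniform in
    [0, radius]; rotation about the origin; `angle` in radians."""
    side = alpha * Delta
    x0, y0 = rng.uniform(0, Delta - side), rng.uniform(0, Delta - side)
    P0 = [(x, y) for x, y in Q if x0 <= x <= x0 + side and y0 <= y <= y0 + side]
    c, s = math.cos(angle), math.sin(angle)
    P = []
    for x, y in P0:
        d, phi = rng.uniform(0, radius), rng.uniform(0, 2 * math.pi)
        px, py = x + d * math.cos(phi), y + d * math.sin(phi)  # perturbed point
        P.append((c * px - s * py, s * px + c * py))            # then rotated
    return P, P0
```

Rotating $P$ back by the negative angle recovers the perturbed copy of $P_0$, whose points are each within the perturbation radius of their originals in $Q$.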
Satellite Data: Mount et al. [MNL98] consider data sets obtained by feature extraction from satellite images. We consider two of these sets, one drawn from a satellite image taken over Haifa, Israel, and the other from an image taken over South Africa. We construct patterns from these image sets in the same way as above. The first set (S1) has 1020 points and diameter 173.9. The second set (S2) has 927 points and diameter 188.6. It is interesting to note that the behavior of the algorithms on these sets closely mirrors their behavior on random data sets. As we shall see, these data sets share certain key properties which enable such behavior.

6.2 Running Times
Our first suite of experiments studies the variation in running time of the algorithms as various input parameters are changed. For this paper, we keep $\delta$ fixed, so the four parameters that determine the running time of the algorithms are $k$ (the pattern size), $n$ (the image size), $\Delta$, and $\varepsilon$. In reality, $n$, $\Delta$ and $\varepsilon$ are not really independent of each other. Although $\varepsilon$ can in principle vary arbitrarily, typical values of $\varepsilon$ will be close to the average closest-point distance, which for random data sets can be shown to be $\Theta(\Delta/\sqrt{n})$.

Running time vs image size. The first experiment compares the running time of the four schemes as a function of $n$, the size of the image. For this experiment, we used random instances of cardinality varying from 100 to 1000 points. The diameters of the sets were chosen so as to keep the ratio $\Delta/\sqrt{n}$ constant. To generate the patterns, we set $\alpha = 0.6$ in the choice of the random window, and used a perturbation radius of 2 and a rotation angle of 50 to perturb the points. For each instance, we set $\delta = 2$ and $\varepsilon = 1$. For GRID, we recorded the optimal running time achieved by varying the mesh cell size. Figure 1 shows how the four schemes compare. Notice that CPM performs orders of magnitude worse than the alignment-based schemes. Hence, in Figure 2, we focus on the three alignment-based schemes.
We see that BA performs the worst of the three as $n$ increases, while GRID consistently beats the other two schemes.

          S1        S2
GRID      0.412     0.577
BA        0.815     0.860
MGRID     0.869     1.265
CPM       119.792   87.510
Table 1: Times for Satellite Data

In Table 1 we show the running times for S1 and S2 using the same pattern extraction parameters as above. As we can see, the relative ordering of the algorithms is the same as above.

Running time vs distance threshold. Next, we investigate the behavior of the schemes as we vary the noise parameter $\varepsilon$. We do this by varying the perturbation radius while keeping the input sizes fixed. For this experiment, we fix the (random) image size at 1000 and select the pattern using $\alpha = 0.5$. As $\varepsilon$ increases, the number of solutions increases, and this is reflected in the increased running time of all the alignment-based schemes, as they generate solutions one at a time. However, as the asymptotic complexity of CPM is independent of the number of solutions, its performance is relatively independent of $\varepsilon$. This enables it to outperform GRID (and BA) for large $\varepsilon$. In Figure 4, we compare the three alignment-based schemes. Once again, we observe that GRID outperforms the other two algorithms. Note that as $\varepsilon$ increases, more and more candidate pairs $(q_1, q_2)$ are chosen in Step 2 of the basic alignment algorithm, and so the performance of GRID will tend towards that of BA.

Figure 1: Running time vs n. [Plot of running time (secs) against size of image (n) × 1000 for CPM, MGrid, Grid and BA.]
Figure 2: Running times vs n (Alignment). [Plot of running time (secs) against size of image (n) × 1000 for MGrid, Grid and BA.]
Figure 3: Running time vs noise. [Plot of running time (secs) against epsilon for CPM, MGrid, Grid and BA.]

6.3 The Quality of Filtering
The previous suite of experiments establishes the superior performance of GRID for most values of $\varepsilon$ and $n$. The difference between GRID and the other schemes lies in its filtering procedure. In this section, we study the filtering performed by GRID more closely.

Distance pairs vs distance. One of the key parameters governing the running time of GRID is the number of candidate pairs $(q_1, q_2)$ retrieved in Step 2 of Algorithm ALIGNMENT. For a given point set, we calculate the number of pairs whose distance lies in the range $[i, i+1]$ for all $i < \Delta$. We scale the ranges so that $\Delta = 100$, and then normalize each such plot so that the area under each curve is 1. In Figure 5, we plot the resulting distance distributions for a random point set of 1000 points and for S1 and S2.
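The normalized distance distribution plotted in Figure 5 can be computed directly from its definition. A minimal sketch with a hypothetical function name: it enumerates all $\binom{n}{2}$ pair distances, rescales them so that the diameter becomes 100, counts how many fall into each unit range, and divides by the number of pairs so that the unit-width bars have total area 1.

```python
import itertools
import math

def distance_distribution(Q, bins=100):
    """Fraction of point pairs whose distance, rescaled so the diameter becomes
    `bins`, falls in each unit range [i, i + 1). With unit-width ranges the
    fractions sum to 1, i.e. the area under the curve is 1."""
    dists = [math.dist(a, b) for a, b in itertools.combinations(Q, 2)]
    diam = max(dists)
    hist = [0] * bins
    for d in dists:
        # The diameter itself (d == diam) is counted in the last range.
        hist[min(int(d / diam * bins), bins - 1)] += 1
    return [h / len(dists) for h in hist]
```

For three collinear points `(0, 0)`, `(1, 0)`, `(2, 0)`, two of the three pairs lie at half the diameter and one at the full diameter, so the histogram has mass 2/3 at range 50 and 1/3 at range 99.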
Figure 4: Running time vs noise (Alignment). [Plot of running time (secs) against epsilon for MGrid, Grid and BA.]

One can immediately observe similarities between the distributions. One common property they all share is that the number of large pairs is very small. This implies that if the pattern and image sizes are comparable, GRID should perform very well. Another observation is that the distribution curves are quite smooth, without significant peaks at any specific distance. This means that even for smaller patterns the filtering mechanism of GRID should work well. Our next graph demonstrates this behavior. We fix a point set and, using the random-window approach described earlier, generate patterns of varying diameters. We then plot the running time of GRID against the ratio of pattern diameter to image diameter. In Figure 6 we present plots for a random set of 1000 points with diameter 179.2, and for S1 and S2. In all cases, the running time decreases as the pattern diameter increases.

Mesh cell size dependence. Recall that GRID uses a uniform grid of side $\lambda\varepsilon$, where we refer to $\lambda$ as the mesh factor. Notice that as $\lambda$ tends to $\Delta/\varepsilon$, GRID behaves more and more like BA, becoming identical to it when $\lambda\varepsilon = \Delta$. Therefore, the effectiveness of GRID can be seen by examining the value of $\lambda$ which yields the best performance. We consider three point sets: S1, S2 and a random set of size 1000. For each set we extract a pattern (with $\alpha = 0.5$), perturbing it as before. We then run GRID on this instance with varying values of $\lambda$, keeping all other parameters fixed. In Figure 7, we plot running time against mesh factor for each of the point sets. Notice that in each case, there is a well-defined non-trivial point at which GRID performs fastest.

Figure 5: Distance Distributions. [Plot of number of pairs (normalized) against edge length (normalized, 0 to 100) for Random, S1 and S2.]
8 Acknowledgements
We would like to thank David Mount for providing us with the data that was used in [MNL98]. We also thank Victor Shoup for providing us with the NTL package for finite field arithmetic, Dragomir Angelov for helpful discussions, and the anonymous reviewers for helping clarify the presentation of the paper.
Figure 6: Running Time vs. Pattern Size. [Plot of running time (s) against pattern diameter / text diameter for S1, Random and S2.]

7 Discussion
The main conclusion of our study is that for typical values of $\varepsilon$, GRID is the best choice. Its simplicity allows it to beat the (much more complex) CPM, while its filtering of candidate matches avoids the quadratic behavior of BA. Moreover, for small $\varepsilon$ the match verification runs effectively in constant time, and thus the pattern size has an insignificant effect on the running time of the algorithm, making it subquadratic (or even close to linear when the diameter of the pattern is close to the image diameter). On the other hand, the CPM scheme, although better in the worst case, is orders of magnitude slower than the other algorithms for small values of $\varepsilon$. However, this ratio changes as the value of $\varepsilon$ increases. This is due to a rapid increase in the number of matches or potential matches, which makes the match verification of alignment schemes more costly. Thus, for sufficiently large $\varepsilon$, CPM achieves the lowest running time.

Figure 7: Running Time vs Mesh Size. [Plot of running time (secs) against mesh size / epsilon for Random, S1 and S2.]

References

[AG96] H. Alt and L. Guibas. Discrete geometric shapes: Matching, interpolation, and approximation. A survey. Technical Report B96-11, Freie Universität Berlin, December 1996.
[AMWW88] H. Alt, K. Mehlhorn, H. Wagener, and E. Welzl. Congruence, similarity, and symmetries of geometric objects. Discrete & Computational Geometry, 3:237-256, 1988.
[Box96] L. Boxer. Point set pattern matching in 3-D. Pattern Recognition Letters, 17:1293-1297, 1996.
[Bro92] L. G. Brown. A survey of image registration techniques. ACM Computing Surveys, 24:325-376, 1992.
[CGH+93] L. Chew, M. T. Goodrich, D. P. Huttenlocher, K. Kedem, J. M. Kleinberg, and D. Kravets. Geometric pattern matching under Euclidean motion. In Proceedings of the Fifth Canadian Conference on Computational Geometry, pages 151-156, 1993.
[CH97] R. Cole and R. Hariharan. Tree pattern matching and subset matching in randomized $O(n \log^3 m)$ time. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing, 1997.
[CS98] D. Cardoze and L. Schulman. Pattern matching for spatial point sets. In Thirty-ninth Annual Symposium on the Foundations of Computer Science. IEEE, November 1998.
[FKL+97] P. Finn, L. E. Kavraki, J. C. Latombe, R. Motwani, C. Shelton, S. Venkatasubramanian, and A. Yao. RAPID: Randomized pharmacophore identification for drug design. In Proceedings of the Thirteenth Annual ACM Symposium on Computational Geometry, 1997.
[GL96] W. F. Gardner and D. T. Lawton. Interactive model-based vehicle tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(11), November 1996.
[GMO94] M. T. Goodrich, J. B. Mitchell, and M. W. Orletsky. Practical methods for approximate geometric pattern matching under rigid motions. In Proceedings of the Tenth Annual ACM Symposium on Computational Geometry, pages 103-113, 1994.
[HKK92] D. P. Huttenlocher, K. Kedem, and J. M. Kleinberg. On dynamic Voronoi diagrams and the minimum Hausdorff distance for point sets under Euclidean motion in the plane. In Proceedings of the Eighth Annual ACM Symposium on Computational Geometry, pages 110-120, 1992.
[HR93] D. P. Huttenlocher and W. T. Rucklidge. A multi-resolution technique for comparing images using the Hausdorff distance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 705-706. IEEE, June 1993.
[HS94] P. J. Heffernan and S. Schirra. Approximate decision algorithms for point set congruence. Computational Geometry: Theory and Applications, 4(3):137-156, 1994.
[HV97] M. Hagedoorn and R. C. Veltkamp. Reliable and efficient pattern matching using an affine invariant metric. Technical Report RUU-CS-97-33, Dept. of Computing Science, Utrecht University, 1997.
[IMV99] P. Indyk, R. Motwani, and S. Venkatasubramanian. Geometric matching under noise: Combinatorial bounds and algorithms. In 10th Annual SIAM-ACM Symposium on Discrete Algorithms, 1999.
[Ind97] P. Indyk. Deterministic superimposed coding with applications to matching. In Proceedings of the 38th IEEE Symposium on Foundations of Computer Science, pages 127-136, 1997.
[Ind98] P. Indyk. Faster algorithms for string matching problems: matching the convolution bound. In Proceedings of the 39th IEEE Symposium on Foundations of Computer Science, pages 166-173, 1998.
[IR96] S. Irani and P. Raghavan. Combinatorial and experimental results for randomized point matching algorithms. In Proceedings of the Twelfth Annual ACM Symposium on Computational Geometry, pages 68-77, 1996.
[MNL98] D. Mount, N. Netanyahu, and J. LeMoigne. Improved algorithms for robust point pattern matching and applications to image registration. In Fourteenth ACM Symposium on Computational Geometry, June 1998.
[PS85] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, New York, 1985.
[Ruc93] W. T. Rucklidge. Lower bounds for the complexity of the Hausdorff distance. In Proceedings of the Fifth Canadian Conference on Computational Geometry, pages 145-150, 1993.
[Sho] Victor Shoup. NTL: A Library for doing Number Theory. http://www.cs.wisc.edu/shoup/ntl/.
[Wol90] H. Wolfson. Model-based object recognition by geometric hashing. In Proceedings of ECCV 90: First European Conference on Computer Vision, pages 526-536, 1990.
A Appendix
In this section, we provide for completeness a detailed sketch of CPM, which appeared in [IMV99].³ The algorithm proceeds in four steps:

(1) Choose an arbitrary point (say $p$) from $P$. For each $q \in Q$, align $p$ to $q$, and let this translation be denoted by $T$. Now transform the points in $P$ according to $T$; for simplicity we still refer to $T(P)$ as $P$.

(2) Split the plane into $l = O(\log \Delta)$ concentric rings $R_1, \ldots, R_l$ centered at $p$; note that $R_1$ is a full disk. The inner radius of the $i$-th ring (for $i \ge 2$) is equal to $2^{i-1}$, and the outer radius (for $i \ge 1$) is $r_i = 2^i$.

(3) Set $\delta_i = \delta/2^i$, for some small $\delta > 0$. Partition each $R_i$ into $2\pi/\delta_i$ sectors, the sectors being partitioned further by $2^i/\delta$ uniformly placed concentric circles. We denote the set of grid cells obtained from the ring $R_i$ by $G_i$; the union of all $G_i$'s (i.e., the whole partition) is denoted by $G$. For any point $x$, let $G(x)$ be the cell of $G$ to which $x$ belongs (ties are broken arbitrarily); the function $G$ can be extended to sets of points in a natural way. Each grid cell has diameter $c\delta$, for some $c > 0$ with value near $\sqrt{2}$.

(4) Let $Q_i = Q \cap (R_{i-1} \cup R_i \cup R_{i+1})$. Further, let $P_i = P \cap R_i$ and $S_i = \bigcup_{q' \in Q_i} B(q', \varepsilon + \delta)$. Now, for each angle $j\delta_i$, check whether $G(P'_i) \subseteq G(S_i)$ for the set $P'_i$ obtained by rotating $P_i$ by the angle $j\delta_i$.

This check is implemented using the subset matching algorithm of Cole and Hariharan [CH97] with the binary-vector convolution scheme used by Indyk [Ind98]. Define the signature of a grid cell to be its distance to the origin point $p$; note that all cells from one sector of a ring $R_i$ have different signatures, while the signatures of cells from different sectors can be equal. Define the pattern $p$ to be a sequence of sets $p[0], p[1], \ldots$ such that each set $p[j]$ contains the signatures of grid cells from $G(P_i)$ belonging to the $j$-th sector of $R_i$ (the first sector is chosen arbitrarily). The text $t$ is constructed analogously from $G(S_i)$. It is easy to check that the subset matching algorithm finds the desired match if one exists.
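The core of step (4), finding every rotation $j$ for which the occupied cells of $P_i$ form a subset of the cells of $S_i$ sector by sector, can be illustrated with a brute-force scan. This is a sketch under stated assumptions, not the convolution-based algorithm of [CH97] and [Ind98] that the paper uses: the names are hypothetical, $p$ is placed at the origin, the band index of a cell within its sector stands in for its signature, the sector count $2\pi/\delta_i$ is rounded up to an integer so that rotating by one sector shifts the sequence by one position, and constructing the text from $G(S_i)$ (the cells meeting the balls $B(q', \varepsilon + \delta)$) is left out.

```python
import math

# Hypothetical sketch of the ring/sector grid (p at the origin) and a
# brute-force stand-in for subset matching over cyclic shifts.

def polar_cell(x, y, delta):
    """Return (ring i, sector, band) for (x, y): R_1 is the disk of radius 2 and
    R_i (i >= 2) spans radii 2^(i-1) to 2^i. Assumption: the sector count
    2*pi/delta_i is rounded up to an integer."""
    r = math.hypot(x, y)
    i = 1 if r < 2 else math.floor(math.log2(r)) + 1
    inner, outer = (0.0 if i == 1 else 2.0 ** (i - 1)), 2.0 ** i
    sectors = math.ceil(2 * math.pi * 2 ** i / delta)  # about 2*pi/delta_i
    bands = math.ceil(2 ** i / delta)                  # about 2^i/delta circles
    angle = math.atan2(y, x) % (2 * math.pi)
    sector = math.floor(angle / (2 * math.pi) * sectors) % sectors
    band = min(math.floor((r - inner) / (outer - inner) * bands), bands - 1)
    return i, sector, band  # band plays the role of the cell's signature

def ring_sets(points, ring, delta):
    """Sequence p[0], p[1], ...: signatures of occupied cells per sector of one ring."""
    sectors = math.ceil(2 * math.pi * 2 ** ring / delta)
    seq = [set() for _ in range(sectors)]
    for x, y in points:
        i, sector, band = polar_cell(x, y, delta)
        if i == ring:
            seq[sector].add(band)
    return seq

def cyclic_subset_matches(pattern, text):
    """All shifts j with pattern[s] a subset of text[(s + j) % m] for every s.
    This quadratic scan replaces the fast subset matching algorithm."""
    m = len(text)
    return [j for j in range(m)
            if all(pattern[s] <= text[(s + j) % m] for s in range(m))]
```

For example, with `delta = 1.0` the point `(3, 0)` falls in ring 2, sector 0, band 2, and ring 2 has 26 sectors. Shift `j` corresponds to rotating $P_i$ by `j` sectors, so each reported shift is one candidate rotation angle.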
³ Copyright © 1999 by the Association for Computing Machinery, Inc. and the Society for Industrial and Applied Mathematics.