The Approximation of Two-Dimensional Spatial ...

0 downloads 0 Views 392KB Size Report
possible void rectangles, as they may touch the profile at an arbitrary ..... is to find a corner and its two neighbours with touching void rectangles – the last void ...
The Approximation of Two-Dimensional Spatial Objects by Two Bounding Rectangles Jörg Roth Univ. of Applied Sciences Nuremberg Kesslerplatz 12, 90489 Nuremberg, Germany [email protected]

Abstract. Minimal bounding rectangles are a simple and efficient tool for approximating geometries, particularly for accelerating spatial queries. If a spatial object fills a rectangular shape to a certain extent, then the minimal bounding rectangle is a reasonable approximation. Unfortunately some geo objects, such as streets or rivers, have a small area but large bounding rectangles. In this paper we suggest an approximation with two bounding rectangles instead of a single one. Since the corresponding shape provides a better approximation, we get a greater average benefit. We present an efficient algorithm that computes the two rectangles with the theoretically smallest combined area that encloses a given geometry in O(nlog n) steps. Following a detailed description of the algorithm, we present an analysis with approximately 4 million real geo objects.

1 Introduction Bounding boxes are a simple and efficient tool to speed up geometric or spatial operations. For two dimensions, a bounding box (often called Minimal Bounding Rectangle) is the smallest rectangle that encloses a geometry such as a polygon or line string. Although bounding boxes may have more dimensions, two dimensions are sufficient for typical spatial queries for geo objects on the Earth's surface such as reverse geocoding, map painting or typical queries in location-based services. Even though bounding boxes provide only a rough approximation, there exist efficient a priori checks for spatial properties. For example, if two geometries overlap, their bounding boxes overlap as well. Obviously, the other direction is not always true. Thus, we can use bounding boxes to check quickly whether or not some properties could potentially be true. Bounding box comparisons can even be performed by tools that are not built to process geometries. For example, a non-spatial SQL database can check if a position is inside a box or if two boxes overlap using traditional comparisons of column values inside standard SQL expressions. If the bounding boxes pass the check, the corresponding exact geometry checks may fail, so bounding boxes can only reduce the set of candidates. Since an exact geometric test (e.g. if polygons overlap) may be time consuming, one goal is to have a small candidate set. Unfortunately for some spatial objects the bounding box area is very large compared to the original geometry. Line objects (e.g. rivers or roads), in particular, may generate very large bounding boxes. One way to reduce the candidate set is to reduce the surface area of the approximating construction. The better the approximation of a geometry, the lower is the number of false positive hits by a priori checks. To both retain the idea of bounding boxes and to simultaneously reduce the candidate set, we suggest Double Bounding Boxes (DBBs) that have the following properties:  They consist of two simple bounding boxes, which may overlap.  The area of the double bounding boxes fully encloses the geometry.  The total area of the two boxes is minimal.

The last point is crucial: we determine the minimal combined area of an infinite number of possible box configurations that encloses a given geometry. If we are interested not only in a small, suboptimal combined area, but in the theoretical minimum area, this task is non-trivial. An example of a traditional bounding box and the corresponding DBB is presented in Fig. 1.

Fig. 1: Traditional and double bounding boxes for a given geometry In this paper we present an efficient approach to generate theoretically optimal DBBs. To the author's best knowledge this is a new approach. Note that, depending on the actual usage, DBBs might have overlapping boxes as presented in Fig. 1, but in other scenarios the boxes should not overlap. For example, the spatial index R-tree (Guttman, 1984) requires that each part of a geometry belongs to exactly one box. Incorporating DBBs into R-trees thus produces non-overlapping boxes. The Extended Split Index (Roth, 2009), on the other hand, allows parts of a geometry to belong to both boxes, so that we can use overlapping boxes, since they usually provide a better approximation. Our approach can be configured to produce overlapping or non-overlapping boxes as desired. In the following, we first present related work. Following a detailed description of our DBB approach, we then discuss performance considerations.

2 Related Work Approximating structures such as bounding boxes are used mainly for three purposes: first they can be used as bounding shapes to speed up spatial comparisons; second, they are used to create metadata of spatial objects; and third, they may represent objects inside spatial indexes. Bounding shapes of the first kind are, for example, rectangles, circles, ellipses or convex hulls (Gartner & Schonherr, 1997, Barequet & Har-Peled, 1999, Hill, 2006, Yang et al., 2008, Welzl, 1991). The main goal is to provide a quick a priori test for a spatial condition. If this a priori test is true then it makes sense to perform an additional check which is more exact but also more costly. If the a priori test was not true, then no further checks are needed. The simplest bounding shape is the bounding box, also called Axis-Aligned Bounding Box (AABB) or Minimal Bounding Rectangle (MBR). To distinguish the single bounding box from the double bounding boxes introduced in this paper we call the traditional box Single Bounding Box (SBB). Note that a bounding box does not necessarily have to have aligned axes. If a box is rotated, it may be smaller and still enclose a geometry. This leads to the construction of Oriented Bounding Boxes (Yuan et al., 2006), often used in the area of game engines (for collision detection) or Optical Character Recognition (OCR). Such boxes can be computed using socalled rotating calipers (Toussaint, 1983). We did not follow that approach in this paper because it does not suit our intended usage. To give an impression of how an aligned bounding box may speed up a test, consider a case where we have stored two polygonal geometries A and B. In addition to the polygon's vertices pi, the bounding box is also stored, defined by two corners c1, c2 where c1x  min( pix ) , c1y  min( piy ) , c 2 x  max( pix ) , c 2 y  max( piy )

(1)

We now could query whether or not A and B overlap, whether A is completely inside B (or vice versa), whether A and B only touch (only share a line or point), or whether A and B are disjoint. For each of these geometric properties there exists a quick bounding box test. To test whether the geometries overlap, for example, we first check whether c A1x  c B 2 x  c B1x  c A 2 x  c A1y  c B 2 y  c B1 y  c A 2 y

(2)

Only if this expression is true does it make sense to perform the more complex check of whether the polygons overlap exactly. The drawback of this approach is the number of successful box checks where the exact test fails (false positives). This happens particularly when the actual geometry's area is small compared to the box area. It is important to note that the a priori check for aligned bounding boxes (in contrast to oriented bounding boxes) can also be processed inside a standard database query. Thus, one approach to use non-spatial databases for spatial objects is to store the box parameters in table columns and the exact geometry as a binary large object. Only if the database passes the a priori check, and only then, the geometry is deserialised from the table and the exact check is performed (Roth, 2009). Instead of a bounding box we could also use a bounding circle, defined by centre cx, cy and radius r. To compute an optimal centre (that leads to a minimal r) is not trivial (Elzinga, 1972) and requires a longer computation than a bounding box, but there exist approximations (Welzl, 1991, Hearn & Vijay, 1982). Once the bounding circle is computed, the overlap a priori test is

c Ax  c Bx 2  c Ay  c By 2  rA  rB 2

(3)

which can also be processed inside a standard database query. A third approximation shape is the convex hull, which is the minimal convex polygonal area containing the original geometry. A convex hull provides a better approximation than a rectangle or a circle. There exist algorithms with O(nlog n) complexity to create convex hulls for a given geometry with n vertices, but there are also some drawbacks. First, the time to check geometric properties increases with the number of convex hull vertices whereas the time for rectangles and circles is always constant. Second, the number of convex hull vertices is limited only by the number of vertices of the given geometry and thus may by very high. Hastings presents an approach to split convex hulls into multiple convex hulls (called multihulls) to provide a better approximation (Hastings, 2009). His main idea is to identify the vertex of the original geometry with the largest distance to the convex hull, which is then used to cut the convex hull into two parts. As a result, the union of hulls provides an increasingly accurate approximation of the original geometry, with the drawback that the time to check geometric properties tends to the time for exact checks. The second purpose of bounding boxes is to have a short representation of a spatial object used, for example, in metadata records. There often exists a simple string representation of this box that can be used, for example, in web pages for URLs or sent by email to identify spatial objects. Some examples:  The Dublin Core Metadata Initiative (DCMI) (Kunze & Baker, 2007) is an open organization that defines metadata standards. DCMI provides a box encoding scheme for spatial objects (Cox et al., 2006).  The vertical and horizontal spatial extents (which are our bounding boxes) are part of the ISO 19115 Metadata Standard for geographic information (ISO 19115, 2003).  The bounding box (called envelope) is part of the Geography Markup Language (GML) (Portele, 2007). The last significant role of bounding boxes is to represent a geometry inside spatial indexes. A common approach to quickly access geometries is to first approximate the shape of an object

by a bounding box that is inserted into a tree structure. Common spatial indexes are Quadtrees (Finkel & Bentley, 1974) or variations of R-Trees (Guttman, 1984). They mainly differ in how a tree is efficiently built and maintained when objects are inserted, changed or removed. A query goes down through the tree until an appropriate tree node is found. The corresponding bounding box then can be used to identify a (hopefully small) set of candidates that must then undergo further geometric checks. The major benefit of bounding boxes for all of the functions presented above is that they are very simple to compute and provide a quick a priori check for several kinds of spatial queries. The drawback is that the shapes of spatial objects usually do not fill axis-aligned rectangles, thus there is a significant loss of shape information. In the following we suggest an approach that approximates arbitrary shapes with the help of two bounding boxes.

3 Generating Double Bounding Boxes 3.1 Basic considerations There exist several types of two-dimensional geometries such as multipolygons with holes, line strings, or points (Herring, 2006). Some are not of interest for our bounding box approximations: points can be trivially mapped to a degenerated bounding box by using four identical corners. If polygons have holes, these holes do not change the bounding boxes, and can thus be ignored. For multiple geometries, e.g. multipolygons, box approaches can be repeatedly applied to each part of the geometry. In this paper, we will thus only consider the geometry types closed polygon without holes and line string. Fig. 2a shows a traditional SBB. In the following, north (N), south (S), east (E) and west (W) denote the four main directions. Our goal is to find two axis-aligned bounding boxes that enclose the geometry and have a minimal combined area. The corresponding algorithm is parameterized with a flag overlap that indicates whether the two boxes may overlap (Fig. 2b), or not (Fig. 2c). Note that if an overlap is allowed, the algorithm may also produce non-overlapping boxes, but not vice versa. a) single bounding box

b) two bounding boxes

c) non-overlapping boxes

N

W

E

S

d) corner void rectangles

e) box borders

f) corner profile

Fig. 2: DBB basic considerations To compute a DBB it is reasonable to look at the 'inverse' areas. So, we do not look at the covered area of a geometry, but rather at the empty area in the corners. For this, we introduce

so-called void rectangles (Fig. 2d): a void rectangle is aligned to the DBB's axes and one corner is an SBB's corner. In addition, it touches the geometry, but it does not overlap. If we respect some additional conditions listed below, we can infer DBBs from the void rectangles. In this example the DBB and void rectangles have aligned borders. For example, the north/west rectangle's east border is aligned to the north/east void rectangle's west border (Fig. 2e). However, to actually generate DBBs is more complex, as we present in the next sections. Fig. 2f demonstrates the idea of generating a void rectangle. The north/east corner of all south/west void rectangles reside on a special line string we call the corner profile. To find appropriate void rectangles, we only have to check corners that touch their corner profile. Note that even if corner profiles have a finite number of vertices, we get an infinite number of possible void rectangles, as they may touch the profile at an arbitrary position inside a line segment. The approach to generate DBBs from void rectangles is to find those void rectangles that are maximal under certain constraints. We start with a classification of DBB cases and present the generation of corner profiles in a later section.

3.2 General cases To get a complete list of DBB types, we must answer the following question: Which layout of void rectangles allows the completion of the surrounding rectangle by exactly two more rectangles (our DBB)? We start with examples where this completion fails (Fig. 3).

Fig. 3: Void rectangles that cannot be filled by two more rectangles In these examples, the remaining area cannot be filled by exactly two rectangles but requires an additional rectangle, marked by 'X'. Fig. 4 illustrates 5 cases in which two bounding rectangles can be successfully created. B: 2 void rectangles (diagonal, independent)

A: 1 void rectangle

x2

x1 y1

y

y2

y x1 < x y1 < y

x

x

x1 + x2 < x y 1 + y2 < y

y1 x1

C: 2 void rectangles (sideways aligned) x1

D: 2 void rectangles (diagonally aligned)

y1 x

x1 + x2 < x y2 y1 = y2 y1 < y or x1 = x2 x1 < x y1 + y2 < y

x1

x2

x2

x2 y

E: 4 void rectangles y1

y x y1 x1

x1 = x3 x2 = x4

x

y1 = y2 y 3 = y4

x1 + x2 < x y1 + y3 < x y4

y3 x3

Fig. 4: Void rectangle cases

y2

y

y2 x1 + x2 = x y1 + y2 < y or x1 + x2 < x y1 + y2 = y

x4

The systematic way to list all cases is to iterate through the possible numbers of void rectangles:  1 void rectangle: For any single void rectangle, two more rectangles can always be added to fill the entire space (case A).  2 void rectangles: There exist three cases (cases B, C, and D) with different restrictions (described below).  3 void rectangles: There is no way to generate DBBs from three given void rectangles.  4 void rectangles: The only chance to generate DBBs from 4 given void rectangles requires every pair of neighbour void rectangles to have the same height (east-west neighbours) or the same width (north-south neighbours). This is case E. For two void rectangles we can further distinguish two general cases:  The void rectangles are neighbours at the same border (case C): In order to generate a DBB, these void rectangles must have the same height (north or south border) or the same width (east or west border).  The void rectangles are at opposite corners (cases B and D): If the void rectangles do not overlap in the x- or y-direction, we can create DBBs. In case B the DBB rectangles overlap, in case D they share a border. The boxes could also only share a corner, but this case is subsumed under case D. Note that minimal DBBs based on case A occur only if the three other corners are part of the geometry, for example, if a line starts exactly in the corner. If not, DBBs that are based on two void corners always have a lower combined area. As a result, case A rarely occurs in real data and we can delete corresponding tests from our algorithm. We only document case A for completeness. For each case, there are a number of sub-cases, depending on the corresponding corners:  Case A has 4 sub-cases: the single void corner can be at NW, NE, SW, SE;  Case B has 2 sub-cases: the corners can be at NW/SE, NE/SW;  Case C has 4 sub-cases: the common border can be at N, S, E, W;  Case D has 4 sub-cases: the corners can be at NW/SE, NE/SW and they can share the border in direction NS or EW;  Case E cannot be distinguished into further sub-cases. A preliminary algorithm to create a DBB from a given geometry is: Algorithm 1: DBB creation (preliminary version) iterate through all cases and sub-cases. For each sub-case { generate the maximum void rectangle(s) that fulfil(s) the conditions related to this case; sum up the void rectangle areas; if (sum of areas > area of the previous best solution) store these void rectangles as the new best solution; } return the best solution;

If the algorithm should only produce non-overlapping DBBs, then simply skip cases B and E. We looked up examples for these cases in real spatial data. For this, we used approx. 4 million records from the file 'Germany" of Open Street Map (Ramm & Topf, 2010). Table 1 shows examples for each case found in the data. We distinguish closed polygons and line strings.

Table 1: Real examples for void rectangle cases A: 1 void rectangle

B: 2 void rectangles (diagonal, independent)

Overlap

no

yes

Closed polygon example

no real example found

Line string example

no real example found

C: 2 void rectangles (aligned sideways) no

D: 2 void rectangles (aligned diagonally) no

E: 4 void rectangles

yes

3.3 Generate a corner profile To compute void rectangles we need the auxiliary construction of corner profiles. We only need to compute these profiles once in order to generate all of the cases A to E introduced above. The definition of a corner profile is:  It is a line string that has start and end positions at two different SBB borders (e.g. it connects the west and south borders). No inner position of the line string touches a border.  The line is monotonic. Either monotonically increasing (west to north, south to east) or monotonically decreasing (north to east, west to south).  The region between the corner profile and the corresponding SBB corner has a maximum size and does not overlap the given geometry (i.e. it has to touch the geometry).

Fig. 5: Corner profile construction examples Fig. 5 shows construction examples. Only the right construction is a corner profile; the other examples break one of the rules above. The actual construction requires three steps as presented in Fig. 6 for a south-west corner profile.

west

Fig. 6: Steps to construct a corner profile The following steps must be performed: 1. We use a plane sweep approach to add new vertices to the original geometry. For any xcoordinate of a vertex, we add a new vertex on every line segment that covers this coordinate, i.e. contains a point with that x-coordinate. The new vertices reside on an existing line segment, i.e. the actual shape of the geometry does not change. After this, each vertical line either only cuts vertices or only cuts inner line points. 2. After the integration of new vertices we now check whether multiple line segments have the same projection on the x-axis. In this case, we only keep the one with the smallest ycoordinates. Note that after step 1 line segments have either equal or fully disjoint projections on the x-axis. Further note that after step 1 line segments cannot intersect, thus we can use any line point (e.g. the starting point) for comparison. 3. We iterate through all remaining line segments from west to south. We maintain a variable miny that holds the minimal y-coordinate so far. To get a monotonic decreasing line, miny must not increase as we iterate through the lines. Thus, if endpoints have y-coordinates larger than miny, they must be moved downwards. In some cases, a new point must be artificially added. Fig. 7 presents the possible constructions.

Fig. 7: Ensuring monotony of corner profiles The figure shows how a new line is integrated into the profile. We can distinguish six situations. We have to check whether:  the first end point is exactly at y=miny (a, b),  both end points are higher than miny (c),  the first end point is higher, the second lower than miny (d), or  the first end point is lower than miny (e, f). In situations d, e, and f, an additional vertex must be integrated into the line string. Some words about the complexity. Let n denote the number of geometry vertices. It requires O(nlog n) steps to generate additional vertices using the plane sweep algorithm and remove those lines that do not have the smallest y-values (steps 1 and 2 in Fig. 6). This can be motivated as follows: the plane sweep algorithm sorts all line segments by their x-coordinate of the left vertex. Efficient sorting requires O(nlog n) steps. After this, the sorted list can be iterated line by line and checked for overlaps which requires O(n) steps. It requires an additional O(n) steps to ensure monotony (step 3 in Fig. 6) since the sorted line list can be iterated ele-

ment by element again. In summary, it requires O(nlog n) steps to generate the four corner profiles.

3.4 Compute maximum void rectangles Once we have computed the four corner profiles, we must generate void rectangles corresponding to the cases A to E (section 3.2). We already specified that void rectangles must respect certain conditions; in addition, the rectangles should have a maximum area to get minimum DBB areas. For void corner generation we can group the following cases:  Cases A and B: a void rectangle can have the maximum area in its respective corner without considering the other corners.  Cases C and D: two void rectangles should have a maximum combined area, and either the height or the width is related (e.g. equal heights or the sum of the widths equals the SBB width).  Case E: four void rectangles should have a maximum size, and both the height and the width are related to the neighbouring rectangles. We now present algorithms to obtain the maximum void rectangles. Cases A and B For a given corner profile, what is the rectangle that touches the corner profile and has the maximum area? For this we iterate through all profile line segments and check whether the largest void rectangle touches this line. The overall maximum rectangle for all line segments is the required solution. For a specific line segment the problem is sketched in Fig. 8, on the left.

Fig. 8: Computation of maximum void rectangles We are given a line segment (x1, y1)-(x2, y2), the SBB's lower-left corner (x0, y0), and a void rectangle spanned between the corner (x0, y0) and (xr, yr). Since (xr, yr) is point on the given line, there exists a p with x r  x1  ( x 2  x1 )  p , y r  y1  ( y 2  y1 )  p and 0  p  1

(4)

The area A of the rectangle is A  ( x 1  x 0  ( x 2  x 1 )  p)  ( y 1  y 0  ( y 2  y 1 )  p)

(5)

We further introduce dx0  x1  x0 , dy 0  y1  y 0 , dx1  x 2  x1 , dy1  y 2  y1

(6)

and express the area A  (dx 0  dx1  p)  (dy 0  dy1  p)

 dx 0 dy 0  p  dx1dy 0  dx 0 dy1   p 2  dx1dy1 

(7)

We need to find the value pmax that maximizes the value A, which gives pmax 

 dx1dy 0  dx 0 dy1 2dx1dy1

(8)

In order to have (xr, yr) on the given line, pmax has to be mapped to 0 if pmax < 0 and mapped to 1 if pmax > 1. The algorithm that computes the maximum independent void rectangle can be sketched as follows. Algorithm 2: Maximum independent void rectangle for each line x1,y1,x2,y2 of the corner profile { dx0=x1-x0; dy0=y1-y0; dx1=x2-x1; dy1=y2-y1; pmax=(-dx1*dy0-dx0*dy1)/2/dx1/dy1; if (pmax1) pmax=1; A=(dx0+dx1*pmax)*(dy0+dy1*pmax); if (A > previous largest A) { xr=x1+dx1*pmax; yr=y1+dy1*pmax; store A, xr, yr as the new largest void rectangle } }

Cases C and D These cases can be handled in a similar manner, except that the two rectangles cannot be chosen independently. We demonstrate the approach for case C, sub-case west (Fig. 8, in the middle). We are given two lines (xA1,  yA1)-(xA2,  yA2), (xB1,  yB1)-(xB2,  yB2), the SBB's lower-left corner (xA0, yA0), and the upper-left corner (xB0, yB0). We introduce dAx 0  x A1  x A 0 , dAy 0  y A1  y A 0 , dAx1  x A 2  x A1 , dAy1  y A 2  y A1 , dBx 0  x B1  x B0 , dBy 0  y B1  y B0 , dBx1  x B 2  x B1 , dBy1  y B 2  y B1

(9)

For a given p ( 0  p  1 ) the combined area is A  B  (dAx 0  dAx1  p)  (dAy 0  dAy1  p)  (dBx 0  dBx1  p)  (  dBy 0  dBy1  p)            dAx 0 dAy 0  dBx 0 dBy 0  p  (dAx1 dAy 0  dAx 0 dAy1  dBx 0 dBy1  dBx1 dBy 0 )  p 2  (dAx1 dAy 1  dBx1 dBy1 )

(10)

thus, the value that maximizes A+B is p

max



 dAx dAy  dAx dAy  dBx dBy  dBx dBy 1 0 0 1 0 1 1 0 2(dAx dAy  dBx dBy ) 1 1 1 1

(11)

Again, pmax must be mapped to 0 if pmax < 0 and mapped to 1 if pmax > 1. Note that, according to Step 1 in Section 3.3, the two corresponding lines of the two corner profiles are always vertically aligned, i.e. xA1=  xB1 and xA2=  xB2. It thus is reasonable to create references from each corner profile line to the aligned lines of the neighbour profile. The algorithm sketched for cases A and B can then be easily adapted for the cases C and D.

Case E Whereas in the former cases all of the maximum void rectangles touch the respective corner profile, case E is different. The problem is presented in Fig. 9.

Fig. 9: Problem with case E This figure shows the maximum aligned void rectangles for the given geometry, but only three of the four rectangles touch their corner profiles (the exception is the north/east one). It is very unlikely that all four aligned void rectangles touch their corner profile. So, any naïve approach that simply iterates through the lines of a specific corner profile may fail if the wrong corner is chosen. However, looking at geometries where the DBB falls under case E, at least three of the four corners touch their respective corner profile. The concept to compute optimal void rectangles is to find a corner and its two neighbours with touching void rectangles – the last void rectangle is then fully defined by the height and width of its neighbours. In the example in Fig. 9 we begin by choosing the SW corner as the starting corner. We can then construct a maximal area where the SE and NW void rectangles also touch their corner profiles. In the NE corner, the remaining void area cannot freely be chosen as its neighbours define the height and width. However, when starting at a certain corner, it may also happen that the entire construction fails: One of the neighbour rectangles or the opposite rectangle may overlap with the geometry. In this case, the resulting four rectangles obviously do not represent a valid maximum. Thus, we must also check whether a rectangle overlaps with the geometry. Since it is not clear which starting corner will end up successfully constructing the maximal void area, we must iterate through all four corners. To compute the maximum area, we have to sum up all four void rectangles (Fig. 8, on the right). In addition to formula (10) for cases C and D, we must also add the width marked with d  to the widths of the rectangles A and B. The analogous definition for the variables dAx0,  dAy0… is  dAy 0  dCy 0  p  dAy1   d  dCx 0  dCx1   dCy1  

(12)

A BC  D  (dAx0  dAx1  p  d)  (dAy0  dAy1  p)  (dBx0  dBx1  p  d)  ( dBy0  dBy1  p)

(13)

and

We can again compute pmax. In this case we get a large formula not presented here, but it is important to note that we again get a closed equation for the maximum combined area. Taking all of the above into consideration, we can formulate an approach to calculate four maximally aligned void rectangles for case E.

Algorithm 3: Four maximally aligned rectangles for each corner X in {NW, NE, SE, SW} for each line of the corner profile X { compute pmax that takes into account X and the two neighbour corners compute the sum of four areas according pmax check if the resulting void rectangle of the opposite corner overlaps with the respecting corner profile if (does not overlap AND area > previous largest area) store this choice as new best choice }

Note that any of the four corners could be the one where the profile and void rectangle do not touch. Thus, we must iterate through all four corners to get the overall maximum. Some words about the algorithmic complexity: The basic idea of all of the algorithms in this section is to iterate through all of the corner profile lines with costs of O(n). In the latter two cases we must also take the aligned lines of multiple profiles into account. As mentioned earlier, we create references between aligned lines of neighbour corners during the corner profile creation. For case E it takes O(n) to check whether the opposite corner overlaps with the geometry. Thus the overall complexity to compute maximum void rectangles for all cases is O(n).

3.5 Putting it all together Now that we have all of the steps and algorithms we need, we can combine them into the final solution. An algorithm to create a DBB from a given geometry is: Algorithm 4: DBB creation (final version) generate all four corner profiles; // see section 3.3 if (overlapping DBBs are allowed) list all cases and sub-cases of {A, B, C, D, E}; // see section 3.2 else list all cases and sub-cases of {A, C, D}; // see section 3.2 for each case/sub-case in the list { generate the maximum void rectangle(s) that fulfil(s) the conditions related to this case; // see section 3.4 sum up the void rectangle areas; if (sum of areas > area of the previous best solution) store these void rectangles as the new best void rectangles; } return the DBB based on the best void rectangles;

It is important to note that this algorithm returns the theoretical optimum, i.e. the two DBB rectangles have the smallest combined area that encloses the geometry. To understand why this is true, consider the following:  Cases A to E include all possibilities for creating a DBB. Thus, for any specific geometry, the theoretically optimal DBB must be one of these cases.  For each case we can compute the maximal void rectangles that respect the specific case conditions.  As a result, the theoretically optimal DBB must be based on the maximum void rectangles of all cases. This is computed by algorithm 4. One note about case A: as mentioned earlier, case A virtually never occurs in real data. This is because in real data, even a very small additional void rectangle can be constructed in a sec-

ond corner (covered by cases B, C, or D). Even though we can construct artificial geometries that exactly touch three of four corner points, we can easily ignore case A. In the algorithm above the cases B, C, and D would then create a second void rectangle with zero area. As a result, the appropriate DBB configuration would be returned without explicitly checking case A. The complexity of the algorithm above is as follows:  The generation of the corner profiles and creating the references between corresponding lines requires O(nlog n) steps (see Section 3.3).  To compute maximum void rectangles for all cases requires O(n) steps (see Section 3.4).  The number of main loop iterations is constant: 11 for overlapping rectangles and 8 for non-overlapping rectangles (ignoring case A). Thus the main loop does not affect the runtime complexity. As a result, the algorithm requires O(nlog n) steps.

4 Evaluation and Performance Considerations The main advantage of using DBBs rather than SBBs is to speed up spatial queries. So, it must be possible to compute the DBB for a given geometry quickly, or the increased lookup speed is minimal. In some scenarios, however, we can tolerate a longer time for index creation. Consider a large database with all of the static geo objects of Europe. It may take hours to create an indexed database of these objects, but it will only rarely change in the future. Thus we can tolerate the time for database creation, if it significantly reduces the later query time. Nevertheless, we demonstrate in this section that time to create DBBs is small. Moreover, DBBs significantly reduce the set of candidates that must undergo an exact test. For this we implemented the DBB algorithm and used a large set of spatial data to perform performance measurements. The geo data were imported from the OpenStreetMap database. We used the file "Germany" which contains  920 754 polygons,  3 144 120 line strings and  1 286 066 point-like objects (not used for our measurements). We thus processed 4.06 million objects. The average number of vertices per objects was  11.712 for polygons,  7.905 for line strings. The first step was to measure the creation time for DBBs. We implemented the DBB algorithm in Java (SE 1.6) and generated DBBs for the approx. 4 million objects on a Windows PC with 2.2 GHz. Fig. 10 shows the results.

300-319

280-299

260-279

240-259

220-239

200-219

180-199

160-179

140-159

120-139

100-119

80-99

60-79

40-59

20-39

0-19

Time to compute a DBB in ms

18 16 14 12 10 8 6 4 2 0

Number of Geometry Points

Fig. 10: Processing time to create DBBs The time measurements roughly follow O(nlog n), but deviate slightly due to Java runtime effects related to memory allocation. Even though we get a longer (but still acceptable) time for large geometries, the average time is small. More than 4.018 million (98.97%) of all objects have less than 15 vertices with a DBB generation time of less than 0.13 ms. In the second step we wanted to determine how the cases A to E are distributed in real data. Table 2 shows the distribution of cases that lead to maximum void rectangles. Table 2: Case distribution in real data and area ratios Closed Polygons May No Overlap Overlap Case A (single void rectangle)

0.00%

Case B (two diagonally independent void rectangles)

35.59%

-

26.27%

Case C (two sideways aligned void rectangles)

24.76%

36.64%

3.11%

3.32%

Case D (two diagonally aligned void rectangles)

34.74%

63.36%

70.56%

96.68%

Case E (four void rectangles)

4.91%

-

0.06%

Average SBB/real area

0.00%

Line Strings May No Overlap Overlap

2.047

0.00%

0.00% -

-

-

-

Average DBB/real area

1.459

1.471

-

-

Average SBB/DBB area

1.367

1.357

2.147

2.147

We first looked for cases that only rarely occur. If the occurrences were low, we could consider skipping the respective case check to speed up the DBB generation. This however, would very rarely lead to a sub optimal result, because:  As discussed earlier, we could skip case A since it virtually never occurs.  Case E appears only rarely in real line string geometries. That is because case E occurs for geometries with cross-like shapes similar to the characters '+' or '#'. Real line string objects only rarely have these shapes. For line string geometries, case E checks could thus be skipped without significant drawbacks.  Case C also only rarely appears in line strings, but with 3% should not be skipped. Other cases have a significant representation in the data set. We can also compare the areas of SBBs, DBBs and real geometries (Table 2, in the last three rows). Closed polygons have an area, as opposed to line strings, which do not. On average,

the SBB size is more than double the polygon area, whereas the DBB size is only approximately 50% larger. This is a great benefit. The benefit is not significantly different between overlapping and non-overlapping DBBs. Line string geometries do not have an area, so we only can compare SBB and DBB sizes. Here the benefit is also significant: on average, SBBs have more than double the size of the corresponding DBBs. We further want to check, whether different object types have different characteristics regarding DBB approximation. For this, we iterated through the 4 million geo objects and computed the average SBB/DBB area ratio (overlapping DBBs). Table 3: Object type distribution of SBB/DBB ratios Closed Polygons cnt/total Avg. cnt SBB/DBB Virtual borders 0.03% Buildings 13.70% Traffic 1.29% Open areas 1.45% Water 1.40% Vegetation 3.16% Geological objects 0.10% Others 1.54%

1.320 1.338 1.400 1.341 1.339 1.407 1.531 1.318

Line Strings cnt/total Avg. cnt SBB/DBB 0.01% 0.17% 71.64% 0.00% 1.40% 0.01% 0.01% 4.10%

2.031 2.404 2.150 1.868 2.076 1.861 2.008 2.084

Table 3 shows the results. We distinguished eight main object classes: virtual borders (e.g. city borders), buildings (including industrial buildings, power plants etc.), traffic (streets, rails, bridges, parking etc.), open areas (sports grounds, parks etc.), water (rivers, lakes etc.) vegetation (forests, cornfields etc.), geological objects (volcanoes, mountains etc.), and other (unclassified) objects. The majority in our database are linear traffic objects with 71.64%, followed by 13.7% polygonal building objects. In summary, different geo object classes with the same geometry type do not show significantly different values. On average, line string objects produce a higher ratio of 2.15 compared to 1.35 of polygonal objects. In a last evaluation we want to check the benefit of the DBB approach for typical spatial queries. Our test queries checked whether stored geo objects overlap with randomly generated geometries. We generated four different types of random geometries (Fig. 11 upper left):  (1) line segments with a size up to 5000m;  (2) triangles with largest edge up to 5000m;  (3) rotated rectangles with largest edge up to 5000m;  (4) circles with diameters up to 5000m. The center, size and orientation of the geometries were randomly generated using a uniformly distributed random number. Fig. 11 shows the results. For each of the four geometry types we produced 100,000 random geometries. Each of them undergoes an overlap check against all geo objects stored in the database.

Random query shapes

SBB/DBB false hits ratios 2,8 2,6 false_hit_ratio

2,4 2,2

Line Triangle Rectangle Circle

2,0 1,8 1,6 1,4 1,2 5000

4500

4000

3500

3000

2500

2000

1500

1000

500

0

1,0

size in m

(1) Line segments

(2) Triangles

4500

4500

3000 2500

4500 4500

5000

4000 4000

(4) Circles

3500

(3) Rectangles

3500

size in m

3000

length in m

3000

2500

0

5000

4500

4000

3500

3000

2500

0

2000

500

0 1500

1000

500 1000

1500

1000

2000

2000

1500

1500

hits

2000

500

SingleBB DoubleBB exact

3500

2500

0

hits

3000

4000

500

3500

1000

SingleBB DoubleBB exact

4000

4500

6000 5000

5000

2500

5000

4500

4000

3500

3000

2500

2000

0

1500

1000

0 1000

2000

500 500

3000

1000

2000

4000

1500

1500

2000

1000

hits

2500

0

SingleBB DoubleBB exact

7000

500

3000

0

3500

hits

8000

SingleBB DoubleBB exact

4000

diameter in m

size in m

Fig. 11: Results of random query tests For each geometry type we plot the average number of hits against the query geometry size  using SBBs for query and geo object geometries,  using (overlapping) DBBs for query and geo object geometries, and  using an exact overlap check that compares the original geometries. Line segments produce the lowest numbers of exact hits with a considerable amount of SBB and DBB hits. As expected, line segments usually have a poor box approximation, however DBBs show significantly better values than SBBs. For circles the difference between SBB, DBB and exact hits is small. This is because an SBB area is only 4/1.27 and a DBB is only (42-2)/1.16 (case E shape) larger than the corresponding circle area. Thus boxes already provide a good approximation for circles. To explore the overall benefit of DBBs compared to SBBs we introduce the false hit ratio

false _ hit _ ratio 

SBB _ hits  exact _ hits DBB _ hits  exact _ hits

(14)

that describes the amount of false SBB candidates compared the false DBB candidates. A false hit ratio of, e.g., 2.0 means: we get twice as much false SBB candidates compared to DBBs. Fig. 11 upper right presents the false hit ratios for the four geometry types. The improvement ranges from 1.6 (circles) up to approx. 2.5 (line segments). This is a great benefit, as DBBs significantly reduced the set of false candidate compared to SBBs. We roughly get half as much false candidates.

5 Conclusion In this paper we presented an approach to approximate two-dimensional geometries with the help of two bounding rectangles. The approach provides the theoretical optimum, i.e. the combined area of the two bounding boxes is minimal. The algorithm has a complexity of O(nlog n) steps for n geometry vertices. Performance evaluations show that the algorithm runs at an acceptable speed in real scenarios. Further evaluations show that, for polygonal objects, the combined area of two boxes is on average only 50% larger than the object's actual size, while a single box is 100% larger. For line strings, a single box has double the size of two bounding boxes. Random queries using two boxes reduce the number of false candidates roughly by 1/2, thereby significantly reducing the number of exact spatial comparisons.

Acknowledgments I would like to thank Kim Trommler and Peter Trommler for their helpful comments.

References Barequet, G. & Har-Peled, S. (1999). Efficiently Approximating the Minimum-Volume Bounding Box of a Point Set in 3D, Proc. 10th ACM-SIAM Sympos. Discrete Algorithms (1999), 82-91 Cox, S., Powell, A., Wilson, A. & Johnston, P. (2006). DCMI Box Encoding Scheme: specification of the spatial limits of a place, and methods for encoding this in a text string, http://dublincore.org Elzinga, J. & Hearn D.W. (1972). The Minimum Covering Sphere Problem, Management Science, Vol. 19, 96–104 (1972) Finkel, R. & Bentley J.L. (1974). Quad Trees: A Data Structure for Retrieval on Composite Keys. Acta Informatica 4 (1): 1974, 1–9 Gartner, B. & Schonherr. S. (1997). Smallest enclosing ellipses - fast and exact, Tech. Report B 97-03, Free Univ. Berlin, Germany (1997) Guttman, A. (1984). R-Trees: A Dynamic Index Structure for Spatial Searching. Proc. of the 1984 ACM SIGMOD International Conference on Management of Data, Boston, Massachusetts, June 1984, 47-57 Hastings, J. T. (2009). Multihulls: A Technique for Spatial Representation and Processing with Variable Shape Fidelity, Applied to Gazetteers, Transactions in GIS, 2009, 13(5-6), 465-480 Hearn D.W. & Vijay J. (1982). Efficient Algorithms for the (Weighted) Minimum Circle Problem, Operations Research, Vol. 30, No. 4, 777–795 (1982) Herring, J. R. (ed.) (2006). OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture, Open Geospatial Consortium, 2006

Hill, L. L. (2006) Georeferencing: The Geographic Associations of Information MIT Press, Cambridge, MA ISO 19115 (2003). Geographic Information - Metadata. International Organization for Standardization (ISO), 2003 Kunze, J. & Baker, T. (2007). The Dublin Core Metadata Element Set, Request for Comments 5013 (IETF), Aug. 2007 Portele C. (ed.) (2007). OpenGIS Geography Markup Language (GML) Encoding Standard, Open Geospatial Consortium, 2007 Ramm, F. & Topf, J. (2010). OpenStreetMap: Using and Enhancing the Free Map of the World, UIT Cambridge Ltd. 2010 Roth, J. (2009). The Extended Split Index to Efficiently Store and Retrieve Spatial Data With Standard Databases. IADIS International Conference Applied Computing 2009, Rome (Italy), Nov. 19-21, 2009, Vol. I, 85-92 Toussaint, G. T. (1983). Solving geometric problems with the rotating calipers. Proceedings of IEEE MELECON'83, Athens, Greece, May 1983 Welzl, E. (1991). Smallest enclosing disks (balls and ellipsoids) in New Results and New Trends in Computer Science, Lecture Notes in Computer Science, Vol. 555 (1991), 359370 Yang, M., Kpalma, K. & Ronsin, J. (2008). A Survey of Shape Feature Extraction Techniques. In Pattern Recognition Techniques, Technology and Applications, Peng-Yeng Yin (ed.) , Nov. 2008, I-Tech, Vienna, Austria, pp. 626 Yuan, B., Kwoh, L. K., Tan, C. L. (2006). Finding the Best-Fit Bounding Boxes, Springer LNCS 3872, 268-279

Suggest Documents