Multimedia Systems (2013) 19:151–161 DOI 10.1007/s00530-012-0286-9

REGULAR PAPER

Approximate range query processing in spatial network databases

Haidar AL-Khalidi • Zainab Abbas • Maytham Safar



Published online: 20 July 2012 © Springer-Verlag 2012

Abstract Spatial range query is one of the most common queries in spatial databases, where a user invokes a query to find all the surrounding interest objects. Most studies in range search use Euclidean distances, which retrieve the result at low cost but with poor accuracy (the Euclidean distance is always less than or equal to the network distance). Researchers have shown that range search in network distance retrieves the results with high accuracy, but at the price of a vast number of network distance computations. Both techniques, however, retrieve all objects in a given radius with a high number of false hits. In many situations, retrieving all objects is not necessary, especially when there are already enough objects closer to the query point. Moreover, performance degrades as the search radius increases. Hence, approximate results can be just as valuable as the exact result, while being faster and less costly to obtain. In this paper, we propose two approximate range search methods for spatial road networks, namely approximate range Euclidean restriction and approximate range network expansion, which considerably reduce the number of false hits and the number of network distance computations. Our evaluation shows that these two methods are robust and accurate.

H. AL-Khalidi (✉) · Z. Abbas
Clayton School of Information Technology, Monash University, Melbourne, Australia
e-mail: [email protected]

Z. Abbas
e-mail: [email protected]

M. Safar
Computer Engineering, Kuwait University, Kuwait, Kuwait
e-mail: [email protected]

Keywords Range search · Approximate range search · Road network · Search query processing · Spatial databases

1 Introduction

During the past decade, spatial databases have received increasing interest due to their important role in many modern applications, such as geographic information systems (GIS), multimedia databases, navigation systems, urban planning, and traveller information systems [7, 12, 16, 22, 23, 27, 29]. Most of the studies consider Euclidean spaces, where the distance between two objects is determined by their relative position in space. However, in practice, the trajectory between two objects is specified by the underlying network (such as roads and railways) [32]. Thus, measuring the actual network distance between two objects (the length of the shortest road connection between them) is more important than measuring their Euclidean distance.

In spatial network databases (SNDB), research [17, 21, 25, 26, 33] has focused on developing efficient algorithms that extend spatial query processing methods by integrating connectivity and location information. On this basis, two methods were developed, Euclidean restriction and network expansion, for processing the most common spatial queries in SNDB, i.e., range search, nearest neighbour, closest pairs, point location, and distance joins [6, 17]. In this paper, we focus on processing range search queries in SNDB because of their common use in various applications, such as the global positioning system (GPS).

Compared with Euclidean distance (dE), network distance (dN) computations are significantly more expensive because they entail shortest path algorithms in large graphs [18, 22]. Consequently, implementing range search queries in SNDB, namely range Euclidean restriction (RER) and


range network expansion (RNE), requires lengthy calculation times to obtain the essential results. Also, when a search returns a massive number of objects, or no results at all, users need to submit another query, which means that another search is performed. Furthermore, repeated false hits (i.e., irrelevant objects returned as candidates) that occur during the search should be considered, since each false hit represents wasted search time. Therefore, an essential prerequisite for solving these problems is to refine the processing of spatial range queries in SNDB.

RER and RNE are applied to find all interest objects within the given region or radius. The range query can be defined as follows: given a query point q (the user's location), a radius e (the range of the search specified by the user) and a set P of interest objects (e.g., post offices or petrol stations), find all objects in P whose network distance dN from q is at most e. To give an example of range search in a GPS map, consider Fig. 1, where interest objects (i.e., petrol stations) are labelled with the numbers 1–17. The user is asking for all of the objects within 2.5 km of where he stands. The red objects (i.e., 1–6) represent interest objects that will be returned to the user because their network distance is less than 2.5 km. The blue objects (i.e., 7–17) represent objects out of range, whose distance to q is more than 2.5 km; consequently, they are excluded from the result list.

In this paper we propose two novel query-processing techniques for range search in SNDB, approximate range Euclidean restriction (ARER) and approximate range network expansion (ARNE), based on our previous work on the approximate range search (ARS) technique in Euclidean

Fig. 1 Range search query in spatial network databases using an online map


distance [2]. In these two techniques, we introduce a lowerbound, which shrinks the actual search range so that internal nodes falling outside the lowerbound are excluded. We also improve the selectivity of the filter step to reduce the number of candidate objects and, consequently, the number of communications between the mobile device and the database server. The resulting techniques achieve a better running time and deliver better performance, with few false hits and a reasonable number of false misses. To the best of our knowledge, this is the first work dealing with the efficient processing of approximate spatial range search queries in SNDB.

The rest of the paper is structured as follows: Sect. 2 reviews the relevant existing work and the basic theoretical background, and Sect. 3 illustrates the adopted ARS query. Section 4 describes our proposed techniques, ARER and ARNE, and their algorithms. Section 5 presents the performance evaluation of the approaches, and finally, Sect. 6 concludes the paper with further insights.
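The exact range query defined in this section (find all objects in P whose network distance from q is at most e) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a small undirected road graph stored as an adjacency dictionary, assumes that interest objects sit exactly on graph nodes, and uses hypothetical names (`build_graph`, `network_range_query`) chosen only for this illustration. The expansion is Dijkstra's algorithm cut off at distance e.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency dict from (u, v, length) triples."""
    graph = {}
    for u, v, length in edges:
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))
    return graph


def network_range_query(graph, q, objects, e):
    """Return {object_id: network distance} for objects within distance e of q.

    objects maps an object id to the node it sits on (an assumption of this
    sketch; real road data would place objects along edges).
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            # Never expand beyond the radius e: nodes farther away are ignored.
            if nd <= e and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return {oid: dist[node] for oid, node in objects.items() if node in dist}


# Toy network: q - a - b - c along a road, plus a long direct road q - c.
graph = build_graph([("q", "a", 1.0), ("a", "b", 1.0),
                     ("b", "c", 1.0), ("q", "c", 5.0)])
objects = {"p1": "a", "p2": "b", "p3": "c"}
within = network_range_query(graph, "q", objects, 2.5)
print(within)
```

With the radius of 2.5 used in the Fig. 1 example, the object on node c is excluded because its shortest network path has length 3.0, even though the expansion never needs to compute that path in full.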

2 Related work

This section reviews previous work on range search in spatial road networks and on approximate search.

2.1 Range search in spatial road network

Range search is one of the most common types of queries in spatial and mobile databases. Traditional range search query processing depends on Euclidean distance. The main


disadvantage of using Euclidean distance is its inability to accurately capture the relative position of spatial objects. In reality, the location of spatial objects is specified by the underlying network and not by Euclidean space. Also, measuring in network distance is more accurate than in Euclidean distance. Thus, researchers [13, 17, 22, 28] suggested using network distance instead of Euclidean distance. The main disadvantage of range search approaches, whether in network space or Euclidean space, is the retrieval of many redundant false hits; in addition, range search in the road network is more complicated.

Range search, in general, requires a filter step and a refinement step [17, 19, 21, 31] to obtain the essential results. The filter step selects objects whose minimum bounding rectangle (MBR) overlaps with the search range, to obtain candidates that fall within a specific range e. The refinement step, on the other hand, sequentially scans the objects that pass the filter step and performs the spatial test on the actual geometries of the objects whose MBR satisfies the filter step. The refinement step is done on-line [17, 21]. The objects that pass the filter step but fail the refinement step are called false hits. Many false hits lead to extra search time and an enormous amount of communication between the mobile device and the database server, which in turn drains the mobile device's battery [20, 30]. False hits also cause extra input/output access (i.e., extra traversal in the R-tree). Input/output access is considered to be the most expensive operation in computer systems, as it requires mechanical movement (i.e., disk arm movement), which is very slow in comparison with CPU operations. Therefore, minimizing input/output access is the ultimate objective of any query processing algorithm [24].

In 2003, Papadias et al. [17] proposed two algorithms, range Euclidean restriction (RER) and range network expansion (RNE), taking advantage of location and connectivity to efficiently prune the search space. These two algorithms gave a solution for range queries in network databases by introducing an architecture that integrates the network and Euclidean information and captures spatial entities. However, the initial problems have not been solved. Furthermore, the number of communications is fairly high when performing these two methods, in addition to their costly pre-computations. Hence, using such methods in real-life applications is not reliable.

The RER algorithm is based on Dijkstra's algorithm for finding the shortest path. Using an R-tree, the RER method first performs a range query on the entity dataset and returns the set of objects S′ within (Euclidean) distance e from q. S′ is guaranteed to avoid false misses (since dE(q, p) ≤ dN(q, p), every object with dN(q, p) ≤ e also satisfies dE(q, p) ≤ e, where dN and dE refer to the network and Euclidean distance, respectively), but it may contain a large number of false hits. The RNE algorithm first computes the


set QS of qualifying segments within network range e from q and then retrieves the data entities falling on these segments. In other words, all paths (segments) that may contain interest objects are placed in QS, and the paths in QS are then evaluated to find the interest objects. This is done by traversing an index tree such as an R-tree and, while visiting each node in the index tree, joining the node with the qualifying segments QS to check whether the node (e.g., an MBR if it is a non-leaf node, or an interest object if it is a leaf node) is located in QS or not.

2.2 Approximate search

The exhaustive processing needed to find the exact answer to a query can be prohibitively expensive, depending on the nature of the query and the properties of the data. This is generally due to the exponential nature of the search problem. The relatively high complexity of range searching has led researchers to consider the problem in the context of approximation. A geometric way to do this is to consider the range shape to be "fuzzy" and allow points that are close to the range boundary to either be counted or not. The user supplies an approximation parameter ε > 0 (i.e., the distance relative error), where the range shape R is bounded and points lying within distance ε · diam(R) of the boundary of the range may or may not be included [11].

Many researchers have proposed improvements to the performance of approximate search; however, most of this work relates to approximate nearest neighbour search in Euclidean space and its drawbacks. The approximate nearest neighbour problem (i.e., the closest-point problem) has been considered by Bern [6]. He proposed a data structure based on quadtrees, which uses Euclidean space and provides logarithmic query time. However, the approximation error factor for his algorithm is a fixed function of the dimension. Arya and Mount [4] proposed a data structure which achieves poly-logarithmic query time in the expected case and nearly linear space.
In their algorithm the approximation error factor is an arbitrary positive constant, fixed at preprocessing time. This algorithm finds the nearest neighbour in moderately large dimensions significantly faster than existing practical approaches. In another study, Arya et al. [5] strengthened the results of that algorithm significantly by proposing the ε-approximate method, which guarantees that the quality of the result is bounded by some constant in terms of distance error, given a real ε (ε ≥ 0) as the maximum distance relative error to be tolerated. However, in the ε-approximate method, the trade-off between the cost and the accuracy of the result cannot be controlled easily, since it depends on the value of ε, which is an unbounded positive real. Corral et al. [9, 10], on the other hand,


proposed a new version of approximate nearest neighbour search, called the α-allowance method, to bound the relative distance error c. In another type of study, Chow et al. [8] proposed an algorithm for approximate range nearest neighbour queries, which returns an answer set that includes the approximate nearest object(s) to every point in a range area. Their algorithm minimizes the number of objects returned as interest objects in order to minimize the time needed to transmit the answer to the user. They used a Voronoi diagram to index the objects, which partitions the space into a set of polygons so that each polygon contains exactly one object. Another study was conducted by Arya et al. [3], who modified the index structure of the objects to obtain the result of the range search. The result is approximate, depending on the relative error given by the user. Subsequently, Fonseca and Mount [11] proposed a similar data structure to retrieve the approximate result, but depending on the absolute error.

In our previous works, we proposed a novel ARS query processing method, which is the first work dealing with approximate search in range queries [1, 2]. Our approach addresses the problems of range search mentioned in Sect. 2.1. The number of false hits is reduced, as are the number of communications with the database server and the number of network distance computations. This is done by increasing the number of MBRs that are pruned. We introduced two boundaries, lowerbound and upperbound, to surround the search range depending on the distance relative error c. The lowerbound is used to reduce the range of the search and also to reduce the number of false hits by pruning the MBRs that fall outside it, while the upperbound is used to retrieve some objects that lie just outside the search boundary but within the upperbound. This gives the user alternative options without performing another query or increasing the number of computations or the search time.
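The filter and refinement steps of RER reviewed in Sect. 2.1, and the false hits they produce, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm of [17] as implemented: a linear scan over object coordinates stands in for the R-tree range query, objects are assumed to sit on graph nodes, one full Dijkstra run from q replaces the per-candidate network distance computation, and the names `shortest_distances` and `rer_range_query` are hypothetical.

```python
import heapq
import math


def shortest_distances(graph, source):
    """Dijkstra's algorithm: network distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def rer_range_query(coords, graph, q, objects, e):
    """Return (result, false_hits) for a range query of radius e around q."""
    qx, qy = coords[q]
    # Filter step: keep objects within Euclidean distance e (stand-in for the
    # R-tree range query). Since dE <= dN, no true result is lost here.
    candidates = [oid for oid, node in objects.items()
                  if math.hypot(coords[node][0] - qx, coords[node][1] - qy) <= e]
    # Refinement step: check the network distance of every candidate.
    d_n = shortest_distances(graph, q)
    result = [oid for oid in candidates
              if d_n.get(objects[oid], math.inf) <= e]
    false_hits = [oid for oid in candidates if oid not in result]
    return result, false_hits


# Square road layout q - a - b - c with no direct road between q and c.
coords = {"q": (0, 0), "a": (2, 0), "b": (2, 2), "c": (0, 2)}
graph = {"q": [("a", 2.0)],
         "a": [("q", 2.0), ("b", 2.0)],
         "b": [("a", 2.0), ("c", 2.0)],
         "c": [("b", 2.0)]}
objects = {"p1": "a", "p2": "c", "p3": "b"}
result, false_hits = rer_range_query(coords, graph, "q", objects, 2.5)
print(result, false_hits)
```

In this toy layout the object on node c is close to q in Euclidean terms but far away along the road, so it passes the filter step and is rejected only after a network distance computation. That wasted computation is exactly the cost of a false hit discussed above.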


time of search is affected by the number of retrieved objects and the size of the search range e. The range query retrieves all objects that are within a certain range e from a given query point q. To reduce the search time and to find more results, we proposed the approximate range-based query method (ARS), which uses different factors to bound the range search. The approximate result can thus be obtained much faster than the exact one, once the users state the distance relative error c they are willing to accept. With our ARS method, we introduced two factors to bound the search. First, the lowerbound = e × (1 − c), to reduce the search range and finish the search faster. Second, the upperbound = e × (1 + c), to increase the search range and retrieve alternative options. In our modification, we use the lowerbound factor only, to reduce the number of pre-computations and the number of MBRs that intersect the search range. We call this method lowerbound approximate range search (LARS). Given any positive real c (0 ≤ c ≤ 1) as the maximum distance relative error to be tolerated, the result of an ARS query within distance e is a (1 − c)-approximate range. When the pruning heuristic is applied on a branch-and-bound algorithm 1. 2.

Note that for c = 0, the approximate algorithm behaves as the exact version and outputs the precise solution. For c = 1 the approximate algorithm retrieves all objects within distance e from q only if their parents intersect with the query point q. Let P be a point dataset (P ≠ ∅) in E^d, e the range search e ∈
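The two ARS factors given in this section, lowerbound = e × (1 − c) and upperbound = e × (1 + c), are simple to compute. The Python sketch below shows the arithmetic and a deliberately simplified use of the lowerbound. The function names `ars_bounds` and `lars_candidates` are hypothetical, and treating LARS as a plain distance cut-off over precomputed distances is an assumption of this sketch; the method described in the paper prunes MBRs during index traversal.

```python
def ars_bounds(e, c):
    """Return (lowerbound, upperbound) for search range e and relative error c."""
    if e < 0:
        raise ValueError("the search range e must be non-negative")
    if not 0.0 <= c <= 1.0:
        raise ValueError("the relative error c must satisfy 0 <= c <= 1")
    return e * (1.0 - c), e * (1.0 + c)


def lars_candidates(distances, e, c):
    """Keep the objects whose distance does not exceed the lowerbound.

    distances maps object ids to precomputed distances from q. This is a
    simplification for illustration only.
    """
    lowerbound, _ = ars_bounds(e, c)
    return [oid for oid, d in distances.items() if d <= lowerbound]


# A 2.5 km range with a tolerated relative error of 0.2.
lower, upper = ars_bounds(2.5, 0.2)
print(lower, upper)

# With c = 0 both bounds collapse to e, matching the exact query.
print(ars_bounds(2.5, 0.0))

# Objects beyond the shrunken range are no longer examined.
print(lars_candidates({"p1": 1.0, "p2": 2.2, "p3": 2.4}, 2.5, 0.2))
```

For e = 2.5 and c = 0.2 the search range shrinks to about 2.0, so objects at distances 2.2 and 2.4 would be reported by the exact query but are skipped by the lowerbound-only variant. These are the false misses that the approximate methods accept in exchange for fewer distance computations.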
