MULTI-POINT QUERIES IN LARGE SPATIAL DATABASES
Hoong Kee Ng
Department of Computer Science, National University of Singapore
3 Science Drive 2, Singapore 117543
[email protected]
Hon Wai Leong
Department of Computer Science, National University of Singapore
3 Science Drive 2, Singapore 117543
[email protected]
work [1], an efficient algorithm called path-based range query (PRQ) to answer MPRQ was introduced. Regardless of the number of query points in the input set, PRQ processes multiple query points using one pass through the spatial index, usually an R-tree, which is widely used for 2-d spatial data. We had previously shown that PRQ performs better than RRQ in main memory. The focus of this paper is to show that PRQ also solves the MPRQ efficiently when the database is in secondary memory (disks) and the cost of a query is dominated by disk accesses. Henceforth, we distinguish the disk algorithms by referring to them as PRQ-Disk and RRQ-Disk. As least recently used (LRU) page buffering schemes are widely used in real database systems, it is interesting to examine how much RRQ-Disk might gain from its repeated accesses to the disk spatial index. We show how PRQ-Disk and RRQ-Disk perform in the presence of an LRU buffer.
ABSTRACT This paper revisits the multi-point range query (MPRQ) for 2-d spatial databases. In a previous paper, we introduced an efficient algorithm, PRQ, to answer the query for the case where the database resides in main memory. This paper extends the algorithm to the general case in which the database is large and has to reside on disk. The MPRQ is defined as: Given a set of query points, P = {p1, p2, …, pn}, and a search distance d, report all points in the spatial database that are within a distance d of some point pi in P. The simple method of performing Repeated Range Query (RRQ), i.e. running the standard range query for each query point pi (1 ≤ i ≤ n) and combining the results, is inefficient as it involves multiple searches on the database. We show that PRQ-Disk still achieves better results and outperforms RRQ-Disk, as in the case of main memory. Extensive experiments using various real-life datasets, different R-tree variants (including bulk-loaded ones), different query paths P, search distances d, and LRU buffering show that PRQ-Disk outperforms RRQ-Disk in terms of both query time and I/Os.
2.
An algorithm called PRQ-MinMax was introduced in [1] to efficiently process MPRQ in one single step, using only one pass through the spatial index while simultaneously pruning the search space and the query points. The algorithm makes use of two geometric pruning rules, NodeIn and PointOut, to identify a minimal subset of the query points to be considered at any level of the spatial index. As the search focuses on different parts of the spatial index, the minimal subset of the query points differs from node to node. Because NodeIn and PointOut rely on the widely used notion of minimum and maximum possible distance among MBRs (commonly called MinDist and MaxDist [2, 3, 4]), this approach is called PRQ-MinMax. The MBR (minimum bounding rectangle) is the basic representation of spatial objects in hierarchical R-trees. At the leaf level of the R-tree, it is the smallest box that encloses a spatial object, while at internal levels it is the smallest box that encloses all its children (also MBRs). For simplicity, PRQ-MinMax is the default algorithm and is called PRQ and PRQ-Disk in the remainder of this paper. Based on the idea that the performance of multiple queries can be improved if they share common data (subsequent nearby queries retrieve a lot of the same data), [5] presented an algorithm that sorts its queries (of
KEY WORDS query processing, spatial range queries, algorithm
1. Introduction
Range queries are basic operations used daily in geographical information systems, spatial information databases and many other applications. Recent technological advancements and declining costs have introduced many applications that provide quick, informative services in route planning, in particular for a general traveler or driver on the road. The user of a car equipped with GPS who needs to make a few stops in one journey might want to know all possible establishments within a certain range of the stops. He can then make informed decisions on when to do his chores, either earlier or later along his journey. This is an example of a multiple range query: given a few points (stops), find out what is within a predefined distance of any of the points. We call this a multi-point range query (MPRQ). Given an input set of query points, a simple, straightforward way to process MPRQ is to find the objects near the first query point, then repeat the search for the second query point, the third query point and so on, and finally combine the results. We call this method the repeated range query (RRQ). In our previous
Related Work
straightforward way to solve MPRQ, as mentioned in a previous section, is to apply the standard range query to each point pi (1 ≤ i ≤ m) and combine the results, i.e. $\bigcup_{p_j \in P} RQ(p_j, d)$. However, this method is inefficient as it
rectilinear rectangles), groups them together so that they are spatially close, and finally passes them on for processing. Results were shown for R-trees built on Hilbert-curve sorted objects. Although the queries are similar to MPRQ, the main differences are (i) they perform inter-query optimization, while PRQ is a single query; and (ii) the combined results obtained from joining the queries raise another issue, namely the separation of results: extra processing is needed to determine which objects belong to a specific range query. PRQ generates cumulative results that answer the query as a whole. There are many variants of the multiple queries problem. One recent example is group NN queries [6], where two sets of points P (database) and Q (multiple input) are given and the aim is to find a point p from P that minimizes the sum of distances |pqi| for all qi ∈ Q. In [7], for the same sets of points, the aim is to find the nearest neighbor from P for each and every point in Q. Three algorithms were given. The first is multiple NN (MNN), which is similar to RRQ in [1], except that the latter returns all points within range instead of only the nearest one w.r.t. the query point in Q. This approach is straightforward and was already shown to be very slow in both papers (see Section 4). The second is batched NN (BNN), which is designed for cases where Q cannot fit in memory. BNN breaks the points in Q into arbitrary groups (bounded by two thresholds: the maximum number of points per group and the MBR size of the grouped points) to be processed together against P. The third approach is hash-based NN (HANN), where the points in P and Q are hashed to a grid; pairs of buckets (HP ∈ P, HQ ∈ Q) covering the same region are then loaded and searched to find, for each point in HQ, its NN in HP (with consideration for points near grid borders that might have their NN in an adjacent region).
3. 3.1
searches the database multiple times and requires an expensive post-processing step to eliminate duplicates. The key idea behind PRQ(P, d) is to use only one pass of the R-tree while simultaneously processing all the points in the query path P. At each node R in the R-tree, the algorithm processes all the children of R against all the query points in P. A straightforward method, RRQ, requires a processing time of O(m*f), where m is the number of query points and f is the fan out of node R; i.e. a processing time of O(f) per node, but it may need to visit the node m times (once for each query point). An important observation is that as the search proceeds down the R-tree, the number of query points to be processed at each node decreases rapidly (since the MBR is much smaller). PRQ not only ensures that all the relevant nodes are visited only once for the entire input regardless of its size, it also prunes the query path for each node. This is called the PointOut rule. At a certain point during traversal, if the search distance is sufficiently large, it is possible that it covers an entire MBR (e.g., p7 against R44). In this case, PRQ quickly reports all points found in the MBR without further computation. This is called the NodeIn rule. For more details, see [1].
3.2 Extending PRQ to PRQ-Disk
In our previous study [1], performance was largely dominated by the effect of pruning the nodes visited during the search traversal. The performance speed-up of the PRQ algorithm is more or less directly proportional to the number of nodes visited during the search process. However, when extending PRQ to disk, several different performance issues need to be studied: disk block reads are typically much slower than main memory accesses, so the number of disk I/Os becomes the critical factor in performance. Since each disk block contains a node of the R-tree, issues such as block size and disk buffering greatly affect the performance of PRQ-Disk. The Transparent Parallel I/O Environment (TPIE) library [8] was used to implement the PRQ algorithm on disk for our study. TPIE is a set of templated classes and functions that facilitates the implementation of external memory algorithms. We used the block-oriented TPIE data structure to obtain an I/O-efficient implementation of PRQ, each block representing a typed view of a logical block, which is the unit of data transferred between disk and main memory. Each node R is given a block id, and its block holds up to the bucket size (fan out) number of children Rc, each with its own block id. Largely, the handling of block ids is analogous to the handling of pointers in main
PRQ Definitions
PRQ and RQ were first mentioned in [1]. They are defined as follows. The standard range query (RQ): Given a two-dimensional spatial database of objects SDB, a query point q, and a search distance d, the range query RQ(q, d) is the set of all the points in SDB that are within a distance d from point q, i.e. RQ(q, d) = { p : p ∈ SDB, p ∈ C(q, d) }, where C(q, d) is the circle with centre q and radius d, defined by C(q, d) = { p : distance(p, q) ≤ d }. Here we shall assume that distance(x, y) is the Euclidean (L2) distance, although our algorithms will work for all reasonable distance measures. The path-based range query (PRQ): Given SDB, a set of query points P = {p1, p2, …, pm}, and a search distance d, the path-based range query PRQ(P, d) is the set of all the points in SDB that are within a distance d from some query point pj in P. More formally, PRQ(P, d) = { p : p ∈ SDB, and p ∈ C(pj, d) for some pj ∈ P }. The
memory. In the PRQ-Disk algorithm, nodes are stored in blocks that contain links (block ids) to other nodes (blocks). For the disk case, let N be the size of SDB and B the size of a disk block. At any node R in the R-tree, PRQ-Disk now incurs O(m*B) time for each node, where m is the number of query points and f ≤ B is the fan out of node R. The former, m, is mostly internal CPU computation, where the pruning rules NodeIn and PointOut take place. Since disk accesses are generally 2-3 orders of magnitude slower than CPU computations, B becomes dominant and contributes the bulk of the query time and the total number of I/Os. In comparison, RRQ has a processing time of O(f) per node in main memory, but O(B) on disk. In general, RRQ-Disk answers MPRQ in O(m*(log_B N + k/B)) using bulk-loaded R-trees (such as STR and KDTopDown), which have a bounded height of O(log N), where k is the number of results found. There is also a post-processing cost of O(K log K) to remove duplicates, where K = $\sum_{i=1}^{m} k_i$ and k_i is the number of results returned for query point pi
4. Experimental Results
4.1 Experiment Setup
We conducted an extensive experimental study to evaluate the performance of PRQ-Disk with large spatial databases that reside on disk. All implementations are in C++, compiled with gcc version 3.4.4 on a Pentium IV 2.0GHz Linux machine with 512MB RAM, with TPIE used for the disk implementations. The disk page size is 4096 bytes. We consider the following factors when evaluating PRQ-Disk performance: search distance d, different query paths P, different R-tree variants and LRU buffering.
Synthetic dataset (Singapore dataset): We used a database of 160,000 uniformly distributed 2-d points in all our experiments. All the points are randomly placed (Fig. 2(a)-(b)). We have more extensive results (not reported here due to space constraints) with different numbers of points: 10K, 20K, 40K and 80K.
Real-life dataset (TIGER/Line Dataset): We included benchmark data from the TIGER/Line datasets [14], namely the map for New Jersey (331,544 objects representing road segment intersections).
Different query paths: A uniformly spaced horizontal route (called H-path) with 80 query points (at regular intervals of 500m) is used across all experiments, i.e. |H-path| = 80. We also have a vertical query path V-path with 38 points and a diagonal query path D-path with 45 points.
Real-life query paths: For one set of experiments, four real-life routes that are output by the multi-criteria shortest-path algorithm of [15] are used. The paths contain 34, 78, 120 and 123 query points respectively. The paths exhibit many aspects of real-life travel, which consists of taking buses (query points very near each other, so overlapping is heavy), the subway (points far apart, so there are fewer incidents of overlapping), and combinations of the two.
Varying search distances (d): The search distances (100m, 200m, …, 1000m) are used. For the H-path, when the search distance d < 250m, none of the query regions overlap.
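The 250m threshold follows from simple geometry: two query circles of radius d whose centres are s apart share area only when s < 2d, and the H-path uses s = 500m. The short Python check below illustrates this; the function name and the constant are hypothetical names chosen for this sketch and are not part of the authors' implementation.

```python
def query_regions_overlap(spacing_m: float, d_m: float) -> bool:
    """Two circles of radius d_m with centres spacing_m apart share area iff spacing_m < 2 * d_m."""
    return spacing_m < 2 * d_m


# Regular interval between consecutive H-path query points, as stated in the text.
H_PATH_SPACING_M = 500

for d in (100, 200, 250, 300, 500):
    # Distances below 250m give disjoint regions; at exactly 250m the circles only touch.
    print(d, query_regions_overlap(H_PATH_SPACING_M, d))
```

For the two main distances used in the experiments, d=200m falls in the non-overlapping regime and d=500m in the overlapping one.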
Most of the results reported in this paper use two main distances, d=200m and d=500m.
Data structures: We implemented both PRQ-Disk and RRQ-Disk, as well as several variants of the R-tree: the R*-tree, Hilbert curve, KDTopDown and STR. Several other R-tree variants were also implemented but are not reported here since their performance was worse
. In comparison,
PRQ-Disk answers MPRQ in O(log_B N + k/B).
PRQDiskSearch(Bid, P, d, Obj)
// Input: disk block ID Bid, query set P, search distance d
// Output: Obj, the set of objects within distance d of some point in P
begin
  Access block Bid for node R;
  if (R is a leaf-node) then
    Process objects in R wrt path P;
  else
    for each Rc of node R do
      // Pruning rule PointOut
      PointOut-Rule(Rc, P, d, Pnew);
      if (Pnew not empty) then
        // Pruning rule NodeIn
        if NodeIn-Rule(Rc, P, d) then
          // report all objects under Rc
          FastReport(Rc.Bid);
        else
          PRQDiskSearch(Rc.Bid, Pnew, d, Obj);
    endfor
  endif
end; // PRQDiskSearch
Fig. 1. Algorithm for PRQ-Disk
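The traversal in Fig. 1 can be illustrated with a small in-memory Python sketch. It is a reading of the two pruning rules under stated assumptions, not the authors' TPIE implementation: PointOut is taken to drop every query point whose MinDist to the child MBR exceeds d, and NodeIn is taken to fire when some remaining query point has MaxDist to the child MBR of at most d. The `Node` class, the helper names and the three-leaf tree are hypothetical, and the `visits` counter stands in for block reads.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    mbr: tuple                                     # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)   # child nodes (internal node)
    points: list = field(default_factory=list)     # data points (leaf node)


def min_dist(p, mbr):
    """Smallest possible distance from point p to any location inside mbr."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - p[0], 0.0, p[0] - xmax)
    dy = max(ymin - p[1], 0.0, p[1] - ymax)
    return math.hypot(dx, dy)


def max_dist(p, mbr):
    """Largest possible distance from point p to any location inside mbr."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(abs(p[0] - xmin), abs(p[0] - xmax))
    dy = max(abs(p[1] - ymin), abs(p[1] - ymax))
    return math.hypot(dx, dy)


def fast_report(node, out, stats):
    """Report everything under node without any distance computation."""
    stats["visits"] += 1
    if node.children:
        for child in node.children:
            fast_report(child, out, stats)
    else:
        out.update(node.points)


def prq_search(node, P, d, out, stats):
    """One-pass search: each relevant node is read once for all query points."""
    stats["visits"] += 1
    if not node.children:                          # leaf: test objects against P
        for obj in node.points:
            if any(math.dist(obj, p) <= d for p in P):
                out.add(obj)
        return
    for child in node.children:
        # Assumed PointOut rule: keep only query points that can reach the child.
        p_new = [p for p in P if min_dist(p, child.mbr) <= d]
        if not p_new:
            continue                               # child pruned entirely
        # Assumed NodeIn rule: some query circle covers the whole child MBR.
        if any(max_dist(p, child.mbr) <= d for p in p_new):
            fast_report(child, out, stats)
        else:
            prq_search(child, p_new, d, out, stats)


# Hypothetical three-leaf tree used only for illustration.
leaves = [
    Node((0, 0, 2, 2), points=[(0, 0), (1, 1), (2, 2)]),
    Node((10, 10, 12, 12), points=[(10, 10), (12, 12)]),
    Node((50, 50, 51, 51), points=[(50, 50), (51, 51)]),
]
root = Node((0, 0, 51, 51), children=leaves)

found, stats = set(), {"visits": 0}
prq_search(root, [(1, 1), (11, 9)], 2.0, found, stats)
print(sorted(found), stats["visits"])
```

In this toy run the first leaf is reported through the NodeIn branch, the second is searched with a single surviving query point, and the third is never read, so three nodes are visited in total.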
Since the spatial database under consideration is two-dimensional, the most commonly used data structure is the well-known R-tree. A comprehensive survey of R-trees can be found in [9]. In this paper, we only mention the R-tree variants used in our study. As map databases are generally static, we further consider R-trees that use offline bulk-loading algorithms such as Hilbert curve [10], STR [11] and KDTopDown-build of [12] (which is extended from [13]). For comparison, we have also considered the R*-tree, which is a dynamic data structure.
Fig. 2. (a) Uniform random dataset and query paths H-path, V-path, and D-path; (b) the 8-clustered dataset with two real-life routes; (c) New Jersey dataset from TIGER/Line. Panels (a)-(b) show the spatial database (using the Singapore map as outline) with 40K generated points for illustration; all figures not drawn to scale.
Fig. 3. Comparison of PRQ and RRQ for query path H-path and d=500m, on disk (external memory): (a) query time vs. number of query points; (b) number of I/Os vs. number of query points. Plotted series: RRQ-Disk and PRQ-Disk at d=500m.
Fig. 4. Comparison of PRQ-Disk and RRQ-Disk for NJ dataset, query path V-path and d=75: (a) query time vs. number of query points; (b) number of I/Os vs. number of query points. Plotted series: RRQ-Disk and PRQ-Disk at d=75.
Fig. 5. PRQ-Disk performance with small number of query points (when m ≤ 10), d=500m: (a) query time vs. number of query points; (b) number of I/Os vs. number of query points. Plotted series: RRQ-Disk and PRQ-Disk at d=500m.
incurs 2.5 times more I/Os for m = 80. In main memory, the speed-up is significant as the PRQ pruning rules greatly reduce the amount of expensive distance computation. On disk, however, the savings in computation are negligible as the cost of an I/O (a few orders of magnitude larger) eclipses them. In spite of this, PRQ still performs well because it minimizes the I/Os by not visiting a node unnecessarily. Fig. 4 shows the results comparing both the query times and the number of I/Os for PRQ and RRQ on the real-life New Jersey dataset, where the data is non-uniform. We chose d = 75 such that the query returns about 20% of the total points when m = 35. The query time speed-up is 7 times for m = 35. In general, the query time speed-up increases with the number of query points. The reduction in the number of I/Os for PRQ-Disk versus RRQ-Disk is also significant. For the case of query path H-path, the number of I/Os rises linearly with the
than those from the representatives above. For all of these comparisons, we measure the overall query time and the number of I/Os (disk accesses) to evaluate the performance of PRQ-Disk. We ran each query 100 times and took the average running time.
4.1 PRQ Performance Evaluation
4.1.1 Baseline Comparison of PRQ and PRQ-Disk
We start with the baseline case, where we compare the results for main memory (studied in detail in [1]) with those for disk. Due to space limitations, the main memory results are not shown here. As expected, we found that the speed-up trend is similar for query time, and that the number of nodes visited (for main memory) correlates with the number of I/Os (for disk). For the case on disk, Fig. 3(a), the speed-up of PRQ over RRQ is 7.93 times for m = 80 and 2.46 times for m = 10. As for the number of I/Os in Fig. 3(b), RRQ
Fig. 6. PRQ-Disk performance for varying distances d with H-path 80 query points: (a) query time vs. search distance; (b) number of I/Os vs. search distance. Plotted series: PRQ-Disk and RRQ-Disk at m=80, with the non-overlapping and overlapping ranges of d marked.
Fig. 7. PRQ-Disk performance for real-life paths (route1-4): (a) query-time(RRQ) / query-time(PRQ) vs. search distance; (b) # I/O(RRQ) / # I/O(PRQ) vs. search distance. Plotted series: route1, route2, route3 and route4.
RRQ-Disk grows similarly to PRQ-Disk (in terms of I/Os). However, when overlapping occurs, RRQ-Disk issues additional I/O requests that only retrieve duplicates; these are entirely redundant and contribute to its long query time. In fact, PRQ-Disk growth is linear because excessive overlapping in search areas does not add to the algorithm’s running time.
Performance of real-life routes: Real-life routes provide an insight into how the PRQ-Disk algorithm fares when deployed for use. Our target application is a route advisory system that helps a user plan a route combining the transportation modes of bus, walking and subway [15]. The performance of the four real-life routes (route1-4) is shown in Fig. 7, which shows clear advantages of PRQ-Disk over RRQ-Disk. In Fig. 7(a), the query time speed-ups for real paths are generally similar to those for the synthetic H-path (shown in Fig. 6). The reduction in the number of I/Os for PRQ-Disk also widens with the search distance.
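The effect of overlapping search areas on RRQ can be seen in a small Python sketch. It uses a linear scan in place of an R-tree, so it only illustrates the duplicate results that RRQ must remove, not the I/O behaviour; the database of points on a line, the three-point path and the function names are hypothetical.

```python
import math


def range_query(db, q, d):
    """Standard range query by linear scan: all points within distance d of q."""
    return [p for p in db if math.dist(p, q) <= d]


def rrq(db, P, d):
    """Repeated range query: one independent search per query point, then de-duplication."""
    raw = []
    for q in P:
        raw.extend(range_query(db, q, d))
    total_with_duplicates = len(raw)      # K: sum of the per-query result counts
    unique = sorted(set(raw))             # post-processing step that removes duplicates
    return unique, total_with_duplicates


# Hypothetical data: 21 points on a line, 100m apart, and a path with 500m spacing.
db = [(x, 0) for x in range(0, 2001, 100)]
path = [(0, 0), (500, 0), (1000, 0)]

for d in (200, 500):
    unique, K = rrq(db, path, d)
    print(d, len(unique), K, K - len(unique))
```

With d=200 the three search areas are disjoint and no duplicates arise; with d=500 the same path yields 28 raw results for only 16 distinct points.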
number of query points for both PRQ-Disk and RRQ-Disk. Fig. 3(b) and Fig. 4(b) show that, on average, the number of I/O requests by PRQ-Disk is about 41.5% and 69.1% of that for RRQ-Disk for the Singapore dataset and the NJ dataset, respectively.
4.1.2 PRQ-Disk Performance Study
Small number of query points (m ≤ 10): In general, we expect PRQ-Disk to perform better as the number of points in the query path P increases. As a stringent test we have also zoomed into the cases where m = 1, 2, …, 10. Fig. 5(a) shows that PRQ-Disk runs slightly faster even for the special case of just one query point, m = 1 (a normal single-point range query), because the PointOut pruning rule, which generally costs PRQ-Disk extra computation compared with RRQ-Disk, does not fire. This is by design: the rule only fires when m ≥ 2. Meanwhile, the NodeIn rule fires when the index traversal reaches a point where the search distance covers an entire MBR, which causes all of its children to be reported without further computation. This makes PRQ-Disk faster than RRQ-Disk even when there is just one query point. As for the number of I/Os, Fig. 5(b) reveals that at m = 1, both PRQ-Disk and RRQ-Disk incur exactly the same number of I/Os. This is because even if the NodeIn rule fires, the search still has to traverse down to the leaf level to report all results, although it does not do any further calculations. Additional results for the Rhode Island and Montgomery datasets (both from [14]) and for V-path and D-path (in Fig. 2(a)) also show identical trends with respect to performance. Therefore, for the remainder of this paper, it suffices to report results for H-path. Henceforth, we focus only on the performance of PRQ-Disk, with the understanding that it is superior to RRQ-Disk.
Size of the search distance: We now investigate the performance of PRQ-Disk across different search distances d. Given any set of query points, when d is large, overlapping search areas will cause RRQ-Disk to obtain many duplicate results (since each query point is a standard range query, independent of the rest of the points in the same query point set), which in turn results in a longer post-processing time to remove duplicates.
Recall that for d < 250m, there is no overlapping of search areas because the H-path is made up of query points at regular intervals of 500m. Fig. 6 shows that for non-overlapping areas, where no redundant results are present,
4.1.3 Data Structures and LRU Buffering
Underlying data structures: The underlying spatial index affects the performance of PRQ-Disk because objects that are spatially close and indexed as such will result in fewer I/Os and improved query time. We ran similar sets of experiments on the R*-tree, the Hilbert-built R-tree, the STR-built R-tree and our KDTopDown R-tree. The results tend to be similar to the results above for both PRQ-Disk and RRQ-Disk. To give a more detailed comparison of the different R-tree variants, Fig. 8 shows the performance of only PRQ-Disk on each of them. The dominance of I/O costs in the overall query time for the different data structures shows clearly. KDTopDown has a better packing algorithm for objects than the rest. For this reason, we used it as the underlying data structure for all experiments in this paper.
Effect of LRU buffering: Using KDTopDown to build the 160K dataset, we designed a query consisting of a real-life path of 34 points and d = 500m. We vary the LRU buffer size over 10%, 11%, …, 19%, 20%, 30%, …, 90% of the total internal nodes. Previous studies of LRU buffering [16, 17] suggest that as little as a 10% buffer size (i.e. a buffer of n*p/100 nodes, given n nodes in total and p percent) could halve the number of I/Os required. Our aim is to present a fair case for RRQ-Disk (as
Fig. 8. PRQ-Disk performance on different R-tree data structures: R*-tree, Hilbert curve, STR and KDTopDown for search distance d=500m
even a straightforward implementation benefits from an LRU buffer of some sort in modern databases) against PRQ-Disk. We observe that an LRU buffer as small as 10% cuts down I/Os by approximately 68.9% for RRQ-Disk (with 91.63% buffer utilization), mostly because the spatial index is traversed repeatedly for each query point pi, and down only a slightly different path the next time if pi+1 is near. RRQ-Disk benefits when nodes from the previous search are retained in the LRU buffer. PRQ-Disk does not show any effect, as it accesses only the nodes that are relevant, once, for all query points in P. In Fig. 9, an LRU buffer ≥ 17% improves the performance of RRQ-Disk only marginally. Our experiments run all the way to 90% (although in practice this is not feasible unless SDB is small), which showed that PRQ-Disk still requests 12.96% fewer I/Os than RRQ-Disk.
Fig. 9. PRQ-Disk and RRQ-Disk under different buffer sizes
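Why an LRU buffer helps RRQ-Disk but leaves PRQ-Disk unchanged can be illustrated with a small Python simulation. This is a sketch under assumptions, not the buffering used in the experiments: the `LRUBuffer` class, the `count_ios` helper and the two block-access traces are hypothetical, with the RRQ-style trace re-reading the root and an internal node for each query point and the PRQ-style trace reading each relevant block once.

```python
from collections import OrderedDict


class LRUBuffer:
    """Fixed-capacity page buffer that evicts the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.io_count = 0                 # number of buffer misses, i.e. disk reads

    def access(self, block_id):
        if block_id in self.pages:
            self.pages.move_to_end(block_id)      # hit: mark as most recently used
            return
        self.io_count += 1                        # miss: block must be read from disk
        if self.capacity > 0:
            self.pages[block_id] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)    # evict least recently used block


def count_ios(trace, capacity):
    """Replay a sequence of block accesses and return the number of disk reads."""
    buf = LRUBuffer(capacity)
    for block_id in trace:
        buf.access(block_id)
    return buf.io_count


# Hypothetical traces: RRQ restarts from the root for each of three query points,
# while PRQ touches every relevant block exactly once.
rrq_trace = ["root", "A", "a1", "root", "A", "a2", "root", "B", "b1"]
prq_trace = ["root", "A", "a1", "a2", "B", "b1"]

for capacity in (0, 3):
    print(capacity, count_ios(rrq_trace, capacity), count_ios(prq_trace, capacity))
```

With no buffer the RRQ-style trace costs 9 reads against 6 for the PRQ-style trace; with three buffered blocks the repeated reads of the root and node A become hits and both traces cost 6, while the PRQ-style count does not change.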
5. Conclusion
In this paper, we revisited the MPRQ and the efficient PRQ algorithm which we proposed for solving MPRQ. More often than not, spatial databases contain more data than can fit into main memory. Hence this paper addresses the case where the spatial database is large and external memory must be used for processing. Our extensive experimental results show that PRQ-Disk performs well in answering MPRQ in terms of query time as well as number of I/Os, even for the case where RRQ-Disk benefits from an implicit buffer of the kind that is the norm in database systems. As expected, the speed-up increases with the number of query points as well as with the search distance. In addition, this speed-up holds for a large variety of problem parameters: different numbers of query points in the query path P (even for very small queries), different search distances d, as well as different spatial representations of the spatial database.
References
[1] Ng H.K., Leong H.W. & Ho N.L., 2004, “Efficient Algorithm for Path-Based Range Query in Spatial Databases”, 8th International Database Engineering & Applications Symposium (IDEAS 2004), pp. 334-343.
[2] Roussopoulos N., Kelley S. & Vincent F., 1995, “Nearest Neighbor Queries”, Proceedings of the 1995 ACM SIGMOD Conference, pp. 71-79.
[3] Ferhatosmanoglu F., Stanoi I., Agrawal D. & Abbadi A.E., 2001, “Constrained Nearest Neighbor Queries”, Proceedings of the 7th Int’l Symposium on Advances in Spatial and Temporal Databases, pp. 257-278.
[4] Shan J., Zhang D. & Salzberg B., 2003, “On Spatial-Range Closest-Pair Query”, 8th International Symposium on Spatial and Temporal Databases, pp. 252-269.
[5] Papadopoulos A. & Manolopoulos Y., 1998, “Multiple Range Query Optimization in Spatial Databases”, Proceedings 2nd East-European Conference on Advanced Databases and Information Systems (ADBIS), pp. 71-82.
[6] Papadias D., Shen Q., Tao Y. & Mouratidis K., 2004, “Group Nearest Neighbor Queries”, Proceedings of the 20th IEEE International Conference on Data Engineering (ICDE), pp. 301-312.
[7] Zhang J., Mamoulis N., Papadias D. & Tao Y.F., 2004, “All-Nearest-Neighbors Queries in Spatial Databases”, 15th IEEE Conf. on Scientific and Statistical DB Management (SSDBM), pp. 297-306.
[8] TPIE: A Transparent Parallel I/O Environment. http://www.cs.duke.edu/TPIE/
[9] Gaede V. & Gunther O., 1997, “Multidimensional Access Methods”, ACM Computing Surveys.
[10] Kamel I. & Faloutsos C., 1994, “Hilbert R-Tree: An Improved R-Tree Using Fractals”, Proceedings of the 20th VLDB Conference, pp. 500-509.
[11] Leutenegger S., Lopez M.A. & Edgington J., 1997, “STR: A Simple and Efficient Algorithm for R-Tree Packing”, Proceedings of the IEEE International Conference on Data Engineering (ICDE 97).
[12] Ho N.L., 2000, “Proximity Search for Route Advisory Systems”, Honours Year Project Report, Dept. of Computer Science, National University of Singapore.
[13] Garcia Y.J., Lopez M.A. & Leutenegger S.T., 1998, “A Greedy Algorithm for Bulk Loading R-Trees”, Proceedings of 6th ACM Symposium on Geographic Information Systems, pp. 163-164.
[14] TIGER/Line Files, 2002 Technical Documentation, U.S. Census Bureau. http://www.census.gov/geo/www/tiger/
[15] Foo H.M., Leong H.W., Lao Y.Z. & Lau H.C., 1999, “A Multi-Criteria, Multi-Modal Passenger Route Advisory System”, Proceedings of 1999 IES-CTR International Symposium.
[16] Theodoridis Y. & Sellis T., 1996, “A Model for The Prediction of R-tree Performance”, 8th International Symposium on Principles of Database Systems (PODS), pp. 161-171.
[17] Leutenegger S. & Lopez M.A., 1998, “The Effect of Buffering on the Performance of R-Trees”, Proceedings of the IEEE International Conference on Data Engineering (ICDE 98), pp. 33-44.