Pers Ubiquit Comput (2011) 15:845–856 DOI 10.1007/s00779-011-0371-7

ORIGINAL ARTICLE

Optimized skyline queries on road networks using nearest neighbors
Maytham Safar • Dalal El-Amin • David Taniar

Received: 10 December 2010 / Accepted: 5 March 2011 / Published online: 20 April 2011
© Springer-Verlag London Limited 2011

Abstract Skyline queries are used in data-intensive applications, such as mobile location-based services, to support multi-criteria decision-making and to prune the data space by returning the most “interesting” data points. The most interesting data points are those that are not dominated by any other point. The spatial network skyline query is a special case of the skyline query problem in which the data points are nodes in a road network and the attributes of the data points are network distances relative to a set of query points. The difficulty of the spatial network skyline query is that the attributes must be calculated with an expensive distance computation. Previous works (Deng et al. Proceedings of the 23rd international conference on data engineering, 796–805, 2007; Sharifzadeh et al. Proceedings of the 32nd international conference on very large databases, 751–762, 2009) that addressed this problem involved extensive network distance calculations between the query points and the data points. This work proposes a new algorithm that requires considerably fewer network distance calculations. Our approach uses a progressive nearest neighbor algorithm to minimize the set of candidates and then evaluates those candidates by comparing them only to a subset of the discovered skyline points. Experiments showed the effectiveness of our algorithm compared to previous works.

Keywords Skyline query · Road networks · Network Voronoi diagrams · Mobile computing · Nearest neighbor

M. Safar (✉) · D. El-Amin
Kuwait University, Kuwait City, Kuwait
e-mail: [email protected]

D. Taniar
Monash University, Melbourne, Australia

1 Introduction

Mobile location-based services and applications are becoming widely available in the market and are used in many different ways. These applications are usually coupled with publicly accessible online maps (e.g., Google Maps), route planning, etc., that answer the users' spatial queries. Mobile navigation has grown to be one of the most popular applications these days [9, 13, 23–25, 27–34, 37, 39, 41, 42]. Besides navigation, mobile technologies have been used extensively to identify the relations between interest points in road networks [46–49]. Skyline queries are an important application in mobile computing. Whenever a data space needs to be evaluated with respect to multiple criteria, the skyline query is the first choice. It returns the set of points that are not dominated by any other point [2]. A point p is dominated by a point q if q is better than or as good as p in all attributes and strictly better than p in at least one attribute. The skyline problem has attracted great interest in the literature since it was introduced by Börzsönyi et al. [1], and it has since been studied extensively in numerous works. Early works [1, 6, 8, 14] solved the problem under the assumption that all the attributes are available at the beginning of the algorithm. The traditional example in the literature is the evaluation of hotels based on price and distance to the beach. Both of these attributes are static and available at the beginning of the algorithm [4]. In this paper, we address the dynamic version of the skyline problem, where the attributes are calculated by an expensive network distance function. In our method, network Voronoi diagram is used
to enrich the content of our mobile navigation system and to give more benefits to mobile users as well. This problem belongs to the class of spatial skyline queries. When a skyline query evaluates some locations based on their distances to another set of locations, the problem becomes a spatial skyline query (SSQ) [15]. Spatial skyline queries have applications in emergency services, sales, tourism, and the military [11]. For instance, SSQ can be used to determine the best locations for the branches of a retail shop. In such a problem, the query points are the locations of the targeted customers (for instance residential blocks, offices, schools, and clinics), while the data points are the candidate shop locations. The result of the query is the set of skyline data points that represent the best retail shop locations. The evaluation function can be either the Euclidean distance or the network distance from the candidate shop locations to the targeted customers' locations. Euclidean distances are only an approximation, since the distance that really matters is the road network distance, and this is what we address. The few previous works in this domain [4, 12] have major drawbacks. Both tried to minimize access to the network space; however, their techniques involve many network distance calculations from a large number of points in the space to the query points. The algorithm that we present computes nearest neighbors progressively from each query point and then processes every reported nearest neighbor to check whether it is part of the skyline set. Our technique minimizes access to the network space by considering only a set of candidates. A bulk network distance calculation is then performed from each query point to the whole set of consecutive candidates.
In order to clarify the theoretical concept of our algorithm, it is presented first in its static version and then in its dynamic network version. The static version requires the dataset to be sorted according to every attribute independently. All the points that follow the point that is common to all sorted lists are pruned from the dataset. The remaining points in the sorted lists are tested in order. It will be proved that it is sufficient to perform the domination test only against the skyline points that are in the same sorted list. The static concept is mapped to the dynamic network concept in the following way:

1. Sorting points is mapped to finding the nearest neighbors of the query point.
2. Reaching a common point in a sorted list is analogous to reaching a point that is reported as a nearest neighbor of every query point.
3. The domination test requires that the attributes of the tested points are known. That is why the algorithm calculates the distance between every query point and every nearest neighbor of the other query points.

The rest of the paper is organized as follows. Section 2 presents the problem definition in detail. Section 3 briefly reviews related work. Section 4 presents our algorithm, first on static data and then on network data. Section 5 presents the implementation of our algorithm with a complexity analysis and evaluates it experimentally against VSNS2 [12] to demonstrate its effectiveness in reducing the number of expensive distance computations. Section 6 concludes our work.

2 Problem definition

The spatial network skyline query is a special case of the skyline query problem in which the evaluation function that decides whether a data point is better than another point with respect to an attribute is the network distance between the data point and a query point. The components of the problem are interest points, query points, and the road network in which those points exist. An interest point is part of the skyline query result, and is called a skyline point, if no other point is at least as close to every query point and strictly closer to at least one of them. The spatial network skyline problem is defined as follows:

Definition 1 (Domination). Given a set of query points $Q = \{q_1, q_2, \ldots, q_n\}$, $i = 1, \ldots, n$, in a road network, a point $p$ dominates a point $p'$ if and only if $\forall i,\ d(q_i, p) \le d(q_i, p') \ \wedge\ \exists j,\ d(q_j, p) < d(q_j, p')$.

Definition 2 (Skyline point). Given a set of query points $Q$ in a road network $N$, a point $p$ in $N$ is a skyline point if and only if $\forall p' \in N$, $p'$ does not dominate $p$.

An application of the problem would be a tourism agency that needs to identify the hotels that are near a number of must-see places in a city. The must-see places are the query points, while the hotels are the interest points. A hotel is part of the skyline query result if no other hotel dominates it with respect to the distances to the must-see places. Consider the example in Fig. 1, where the query points are the park and downtown. In this example, the skyline points are “Hotel 2” and “Hotel 4”, since no other hotel is closer to both downtown and the park than they are. “Hotel 1” is dominated by both “Hotel 2” and “Hotel 4”. “Hotel 3” is dominated by “Hotel 2”. Now the question is how network distance differs from any other attribute, such as hotel prices or ratings. It is

(a) Sample road network

(b) Distance results

| Hotel | Park | Downtown |
|---|---|---|
| Hotel 1 | 22 | 29 |
| Hotel 2 | 17 | 10 |
| Hotel 3 | 27 | 12 |
| Hotel 4 | 10 | 17 |

(c) Plot of the distance results

Fig. 1 Spatial network skyline example
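
The distances in Fig. 1b can be checked directly against Definitions 1 and 2. The following is a minimal sketch rather than part of the proposed algorithm: it assumes that all network distances are already known and uses a naive pairwise comparison. The names `DISTANCES`, `dominates`, and `skyline` are illustrative.

```python
from typing import Dict, List, Sequence

# Network distances from Fig. 1b: hotel -> (distance to park, distance to downtown).
DISTANCES: Dict[str, Sequence[int]] = {
    "Hotel 1": (22, 29),
    "Hotel 2": (17, 10),
    "Hotel 3": (27, 12),
    "Hotel 4": (10, 17),
}


def dominates(p: Sequence[float], p_prime: Sequence[float]) -> bool:
    """Definition 1: p is no farther from every query point than p_prime
    and strictly closer to at least one query point."""
    no_worse = all(a <= b for a, b in zip(p, p_prime))
    strictly_better = any(a < b for a, b in zip(p, p_prime))
    return no_worse and strictly_better


def skyline(points: Dict[str, Sequence[float]]) -> List[str]:
    """Definition 2: keep every point that no other point dominates."""
    return [
        name
        for name, vec in points.items()
        if not any(
            dominates(other_vec, vec)
            for other, other_vec in points.items()
            if other != name
        )
    ]


print(skyline(DISTANCES))  # ['Hotel 2', 'Hotel 4']
```

This brute-force check needs every distance up front, which is exactly the cost that a network skyline algorithm tries to avoid.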

different because prices and ratings are static data that are already defined and given, while network distance needs to be computed with an expensive function (as illustrated in the previous example). This means that, if skyline algorithms designed for static data are to be used, we need to compute the distances between every query point and every interest point before applying any skyline algorithm. Network distance computation has a high complexity, which makes this solution completely impractical. A practical solution should select a near-minimal set of candidates from the network space and perform distance calculations only for that set. A special data structure for the road network can be used to pre-compute some network distances.

3 Related work

The skyline query has been addressed by many previous works. The skyline operator was first introduced in [1], which presented skyline query algorithms that can be used in

databases. The problem with these techniques is that they are not progressive: the whole dataset has to be visited before the first skyline point is returned. The work in [14] addressed the progressiveness problem. Its binary bitwise algorithm represents the points as bits: for each dimension, the distinct values are identified, and the bits of a point are set to one starting from the position of its value in the order of the distinct values. The algorithm is only suitable for dimensions with discrete distinct values. Another truly progressive algorithm was presented in [6]. It uses the nearest neighbor technique to get the first skyline point. That first nearest neighbor partitions the space into 3 regions, one that is dominated and others that need further investigation. As the algorithm continues, the number of partitions (the to-do list) may grow, so the approach is to return only the “big picture” of the skyline points and not the whole set. Another algorithm that can be used for spatial skyline queries is the branch and bound algorithm [8]. It uses an R-tree structure to partition the data. The R-tree is traversed in order of the minimum distance of the lower-left corner of each minimum bounding rectangle. Every entry is inserted into a heap and then expanded by deleting it and inserting its children. Every entry is checked for dominance against the already found skyline points before it is inserted into the heap and before it is expanded. This algorithm is progressive and does not face a memory problem. It can also be used to find spatial skyline points; however, it does not use any geometric properties to further prune the search space. The spatial skyline query was introduced in [12], which was the first to use the term for a set of algorithms that use geometric properties to prune the search space. Two algorithms were designed, B2S2 and VS2.
B2S2 is an enhancement of BBS [8]; it starts by computing the convex hull of the query points and then proceeds in the same way, traversing the R-tree in order of minimum distance. VS2 uses Voronoi diagrams (Delaunay diagrams) instead of an R-tree. This algorithm saves some dominance checks by using geometric properties. The traversal starts at the nearest neighbor of one of the query points and proceeds to its neighbors. Before the dominance check of a point against each of the skyline points found so far, the point is checked to see whether it lies inside the MBR (minimum bounding rectangle) of the skyline points or intersects it. None of the previously presented algorithms can be used to return the skyline points in a road network. They assume that the attributes of all the points are known, which is not the case in a network skyline query. In addition, the spatial skyline algorithms use geometric properties that cannot be applied to a network space [12]. To the best of our knowledge, the only works that have considered skyline queries in a network space are [12] and [4]. Other works
[3, 16] were introduced later to solve the skyline problem in metric and graph spaces, respectively. The problem of skyline queries in road networks was first introduced in [4]. The authors considered that the goal of an efficient network skyline algorithm is to minimize the portion of the network to be accessed, rather than simply to minimize the number of dominance tests. They introduced three algorithms: collaborative expansion (CE), the Euclidean distance constraint algorithm (EDC), and the lower bound constraint algorithm (LBC). CE finds the next nearest neighbor for all the query points until it finds the first data point visited by all query points. That point is the first skyline point; all the data points that have not been visited up to this point are pruned from the search, since they are dominated by it. Then each next nearest neighbor is found and tested against all the skyline points to check whether it is dominated. CE involves many network distance computations. EDC can work only offline, because the skyline results are returned at the last step. The algorithm calculates the Euclidean distances for all data points and then runs a Euclidean skyline algorithm; the points it outputs form the initial candidate set. Then the network distances between each point in the candidate set and the query points are calculated. Next, the candidate set is expanded by shifting every point in the initial candidate set by its network distance. All the points within the hypercube formed by shifting the data points are included in the expanded candidate set. Network distances are then recomputed between all the points of the candidate set and the query points. Finally, the network skyline is reported by pairwise comparison. The third algorithm, LBC, is the same as CE but uses Euclidean distance computations as an optimization to save some network distance calculations.
The second group of work that addressed skyline queries in network space is presented in [12]. It presents two algorithms, SNS2 and VSNS2. SNS2 runs a parallel version of Dijkstra's algorithm from each query point. From each query point, a step is performed to cover an equal distance, and the skyline points are collected progressively. At the end of each step, it is guaranteed that the data points within a certain distance from each query point have been visited. The domination test of every data point p includes two steps. When p is visited for the first time, it is checked against the skyline points found so far; if it is dominated, it is not added to the candidates. Later, once the algorithm has visited all the points that can dominate p, p is checked again. The data point is a skyline point if it passes both dominance tests.

VSNS2 uses the same computational steps as SNS2, but rather than traversing the network node by node, it uses the network Voronoi diagram structure. That approach speeds up the network distance calculation. However, VSNS2 still requires many expensive network distance calculations to perform the domination tests, a number that our new algorithm reduces considerably. Table 1 summarizes part of the previous work on skyline algorithms.

4 Network nearest neighbor skyline algorithm

The first algorithm that observed the relation between nearest neighbor and skyline queries was proposed in [6], where it was proved that the first nearest neighbor in a 2-dimensional plane is a skyline point. In [4], the first nearest neighbor that is visited by all the query points is reported as the first skyline point, and that point is considered to dominate all the interest points that have not been visited before it. Our approach uses the relation between the skyline query and the nearest neighbor query to return the whole query result. In this section, our new algorithm N3S is presented and proved to work on static data and on network data.

4.1 Skyline query on static data

We consider a dataset with n dimensions. Every data point in that dataset is represented by $\langle d_1, d_2, \ldots, d_i, \ldots, d_n \rangle$, where $d_i$ is the value of the data point in dimension i. The dataset is sorted in each dimension, which generates n sorted lists. The first data point that appears in all the sorted lists is a skyline point, and all the data points after it in all the lists are pruned from the skyline search. For example, suppose there is a dataset of hotels with 3 dimensions (price, distance to the beach, and capacity) that has been sorted in each dimension, generating 3 sorted lists. Consider a data point p (a hotel) with the tuple of values $\langle p_3, d_4, c_2 \rangle$, where $p_3$ is the third value in the list of prices, $d_4$ is the fourth value in the list of distances, and $c_2$ is the second value in the list of capacities. Suppose that no other data point has a lower rank in all three lists, that is, no data point ranks better than third in price, fourth in distance to the beach, and second in capacity. Then p is a skyline point, and every point that appears after it in a sorted list is dominated by it, provided that this point does not appear before p in another dimension. The only points left to be considered for the skyline query are the first 2 points in the list of prices, the first 3 points in the list of distances, and the first point in the list of capacities. It can be proved by contradiction that

Table 1 Existing skyline algorithms

| Algorithm | Addressed problem | Comments |
|---|---|---|
| Skyline operator | Static skyline | Not progressive |
| Binary bitwise | Static skyline | Not progressive |
| Online algorithm | Static skyline | Progressive based on nearest neighbor |
| Branch and bound | Static skyline | Progressive and can be used for spatial skyline |
| Z order curve [7] | Dynamic skyline | Concentrates on updating the skyline result when data changes |
| B2S2 | Spatial skyline | Uses geometric properties to prune search space |
| VS2 | Spatial skyline | Same as B2S2 but traverses the spatial space by using Voronoi diagrams |
| CE | Network spatial | Depends on finding the next nearest neighbor by all query points |
| EDC | Network spatial | Uses Euclidean distance to control the network expansion in all directions, not progressive |
| LBC | Network spatial | Uses the same technique of EDC but solves the progressiveness problem |
| SNS2 | Network spatial | Parallel Dijkstra to cover the network space |
| VSNS2 | Network spatial | Same as SNS2 but it uses network Voronoi diagram to improve network traversal |
| DSG | Graph skyline | Uses graph properties to prune the search space, requires expensive index structure computation |

a point such as p is a skyline point. Suppose that p is not a skyline point; then there must be a data point that dominates it. By definition, that point is better than or equal to p in all dimensions and strictly better than p in at least one dimension. If that point existed, it would appear before p in all the dimensions, which contradicts our assumption.

Define $SL_i^{k_i}$ to be the sorted list of data points with respect to dimension i, where $k_i$ is the size of the list. Also define $a(p,i)$ to be the value of dimension i at point p. In $SL_i^{k_i}$, point p is before q if and only if $a(p,i) \le a(q,i)$, where “$\le$” means better than or as good as with respect to dimension i, “$<$” means strictly better than, and “$=$” means as good as.

Lemma 1 A point p is a skyline point if $\forall i,\ p \in SL_i^{k_i}$ and $p \notin SL_i^{k_i - 1}$, and there exists no point r such that $\forall i,\ r \in SL_i^{g_i}$ and $g_i < k_i$.

Proof (by contradiction) Suppose that p is not a skyline point; then there exists a point r that dominates it. A point r dominates p when $\forall i:\ a(r,i) \le a(p,i)$ and $\exists i:\ a(r,i) < a(p,i)$. This implies that r is before p in all dimensions i, that is, $\forall i,\ r \in SL_i^{g_i}$ with $g_i < k_i$, which contradicts our assumption.

All the points that are after p in all sorted lists are not considered for further investigation, since they are all dominated by p.

Lemma 2 If a point q does not belong to any of the sorted lists that contain p (as defined in Lemma 1), then q is dominated by at least p.

Proof Since p is before q in every sorted list, p dominates q. □

After we have found point p, we have to investigate all the points that appear before it in the sorted lists. We call those points candidates. Starting from the first point in each list, it is sufficient to perform the domination test against the skyline points that have already been found in the same sorted list and the points that have the same rank in that list. Back to our example, the first hotel in the list of prices is a skyline point because no other skyline point appears before it in the list of prices. However, the second point has to be tested against the first point. If hotel q has the lowest price (first in the list) and hotel r has the second lowest, the other attributes of r (distance and capacity) have to be examined. If hotel r is better than q in any of these dimensions, then it is a skyline point.

Lemma 3 $\forall i,\ p \in SL_i^{k_i}$: to report p as a skyline point, it is sufficient to prove that no skyline point s dominates p, where $s \in SL_i^{k_i}$ and s is before p in $SL_i^{k_i}$.

Proof Suppose that $p_0$ is the first point in $SL_i^{k_i}$. Then $p_0$ is a skyline point: since no other point is better than it in dimension i, no point can dominate it (assuming that all the points have distinct values in all dimensions). Let $p_1$ be the second point in $SL_i^{k_i}$. If $p_1$ is not dominated by $p_0$, then $p_1$ is a skyline point, because it has at least one attribute that is better than that of $p_0$, and that attribute is definitely not $a_i$; at the same time, in dimension i, $p_1$ is better than all the other points in the dataset. This argument applies to any point p in $SL_i^{k_i}$: if p is not dominated by any prior skyline point, then p is better than every prior point in at least one dimension. For the rest of the points in the dataset (points that are after p in $SL_i^{k_i}$ and points that are not in the list), p is better than them in dimension i. □
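
The static procedure behind Lemmas 1 to 3 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: smaller values are treated as better in every dimension, values are distinct within each dimension, and the common point is found by scanning all sorted lists in lockstep. The dataset `HOTELS` and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p is at least as good as q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def sorted_list_skyline(data: Dict[str, Sequence[float]]) -> List[str]:
    names = list(data)
    n_dims = len(data[names[0]])
    # One sorted list per dimension (best value first).
    lists = [sorted(names, key=lambda name, i=i: data[name][i]) for i in range(n_dims)]

    # Scan the lists in lockstep until some point has appeared in every list.
    seen_count = {name: 0 for name in names}
    common = None
    for depth in range(len(names)):
        for i in range(n_dims):
            name = lists[i][depth]
            seen_count[name] += 1
            if common is None and seen_count[name] == n_dims:
                common = name
        if common is not None:
            break

    # Lemma 2: points after the common point in every list are pruned.
    # Lemma 3: test each remaining candidate only against skyline points
    # found earlier in the same sorted list.
    result: List[str] = []
    for i in range(n_dims):
        cutoff = lists[i].index(common) + 1
        found_in_list: List[str] = []
        for name in lists[i][:cutoff]:
            if not any(dominates(data[s], data[name]) for s in found_in_list):
                found_in_list.append(name)
                if name not in result:
                    result.append(name)
    return result


# Hypothetical hotels: (price, distance to beach, distance to airport).
HOTELS = {
    "a": (1, 9, 5), "b": (2, 4, 6), "c": (3, 3, 2), "d": (4, 7, 7),
    "e": (5, 8, 8), "f": (6, 1, 9), "g": (7, 6, 1), "h": (8, 5, 4),
}
print(sorted(sorted_list_skyline(HOTELS)))  # ['a', 'b', 'c', 'f', 'g']
```

In this example the common point is `c`, so only five of the eight hotels are ever tested for dominance, and each test involves at most the skyline points of one list.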

From this lemma, we conclude that applying this investigation to each sorted list returns the whole set of skyline points.

4.2 Skyline query algorithm on road network

We again consider the hotels example but change the attributes to the road network distances to the beach, the airport, and downtown. The problem is now a network skyline query problem. Applying the static version of N3S directly would mean first sorting the hotels in 3 lists, which requires calculating the distances between every hotel and every query point (beach, airport, and downtown). That approach is definitely not feasible. A better solution is to calculate the nearest neighbors progressively from each query point and to stop once we reach a data point (hotel) that is reported by all the query points as a nearest neighbor. Suppose again that hotel p is the third nearest neighbor of the beach, the fifth nearest of the airport, and the seventh nearest of downtown. In order to find the rest of the skyline points and perform the dominance tests, we need to calculate the network distances between the candidates and the query points. This operation is very expensive if we loop through every data point and compute the network distance between that point and every query point. Our approach is to perform a bulk distance calculation between every query point and the nearest neighbor lists of the other query points. In our example, this means calculating the distances between the beach and the nearest neighbor lists of the airport and downtown, and applying the same operation to the rest of the query points. One such bulk calculation is about as expensive as a single point-to-point distance calculation, because a nearest neighbor list always consists of points that are near each other. The steps of the algorithm are:

1. From every query point, run steps of the nearest neighbor algorithm until the first skyline point is found (lines 1–9).
2. Calculate the distances from each query point to the nearest neighbors of the other query points (lines 10–14).
3. Loop through the neighbors of each query point; for every point, perform the dominance test against all the skyline points that have been found among the neighbors of the current query point. Points that are not dominated by that set of skyline points are added immediately to the set of skyline points (lines 15–21).
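
The three steps can be sketched as follows. This is a simplified illustration and not the paper's Algorithm 1: a lazy Dijkstra expansion stands in for the incremental nearest neighbor search, and step 2 uses a plain Dijkstra run per query point instead of the bulk distance calculation. The edge weights in `EDGES` are assumptions chosen to agree with the distances quoted in Table 2; they are not a copy of Fig. 2. All function names are hypothetical.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]

# Hypothetical undirected road network (weights assumed, see the note above).
EDGES = [("A", "B", 10), ("A", "C", 12), ("B", "D", 12), ("C", "D", 10),
         ("C", "E", 5), ("D", "E", 12), ("D", "H", 17), ("E", "F", 12),
         ("F", "H", 10)]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def nearest_neighbors(graph: Graph, source: str, targets: Set[str]) -> Iterator[Tuple[str, float]]:
    """Lazily yield (node, distance) for target nodes in increasing network distance."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    settled: Set[str] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u in targets:
            yield u, d
        for v, w in graph[u].items():
            if d + w < best.get(v, float("inf")):
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))


def dominates(p, q) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def n3s_sketch(graph: Graph, queries: List[str], interest: Set[str]) -> List[str]:
    # Step 1: advance one nearest neighbor per query point, round robin,
    # until some interest point has been reported by every query point.
    streams = [nearest_neighbors(graph, q, interest) for q in queries]
    nn_lists: List[List[str]] = [[] for _ in queries]
    hits: Dict[str, int] = {}
    common, exhausted = None, False
    while common is None and not exhausted:
        exhausted = True
        for i, stream in enumerate(streams):
            item = next(stream, None)
            if item is None:
                continue
            exhausted = False
            nn_lists[i].append(item[0])
            hits[item[0]] = hits.get(item[0], 0) + 1
            if hits[item[0]] == len(queries):
                common = item[0]
                break

    # Step 2: distances from every query point to all candidates.
    candidates = {p for nn in nn_lists for p in nn}
    per_query = [dict(nearest_neighbors(graph, q, candidates)) for q in queries]
    vectors = {c: tuple(d.get(c, float("inf")) for d in per_query) for c in candidates}

    # Step 3: test each candidate only against skyline points found earlier
    # in the nearest neighbor list of the same query point.
    result: List[str] = []
    for nn in nn_lists:
        local: List[str] = []
        for c in nn:
            if not any(dominates(vectors[s], vectors[c]) for s in local):
                local.append(c)
                if c not in result:
                    result.append(c)
    return result


graph = build_graph(EDGES)
print(sorted(n3s_sketch(graph, ["A", "D", "F"], {"B", "C", "E", "H"})))  # ['B', 'C', 'E', 'H']
```

With query points A, D, and F, the expansion stops as soon as E has been reported by all three query points. The static lemmas assume distinct values; the tie between B and E at distance 12 from D does not matter here because neither point dominates the other.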

The advantage of N3S is that it involves few network distance computations. In addition, a candidate does not need to be evaluated against the whole skyline set found so far; it only needs to be evaluated against the skyline points that were found relative to the query point of which the candidate is a neighbor. The number of network distance computations equals the number of query points ($n_q$): from every query point, the distances to the neighbor lists of the other query points have to be calculated. Since the neighbor list of a query point is always a list of adjacent points, the cost of this distance computation is only the cost of computing the distance between the query point and the farthest point in the nearest neighbor list. Another advantage of N3S is that the user can choose which skyline points are most interesting to her. Suppose that the user is only interested in the points that are nearest to point q3; then the algorithm only needs to run
PINE from q3 and then perform the dominance tests only in this branch. Consider the road network in Fig. 2, where the query points are A, D, and F. Table 2 presents a walkthrough example of Algorithm 1.

5 Implementation and experimentations

Fig. 2 Sample road network

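Table 2 Nearest neighbor network skyline query example

| q | Reached point | Explored points | Candidates (point, order, distance) |
|---|---|---|---|
| A | A | (B,10), (C,12) | |
| A | B | (D,22) | |
| A | C | (E,17) | |
| D | D | (C,10), (B,12), (H,17), (E,12) | |
| D | C | (A,22) | |
| D | B | | |
| F | F | (H,10), (E,12) | |
| F | H | (D,27) | |
| F | E | (C,17), (D,24) | (H,1,10) |
| A | E | (F,29) | (C,2,12) |
| D | E | (F,24) | (B,2,12), (E,2,12) |
| D | H | (F,27) | |
| F | C | (A,27) | (E,2,12) |
| A | D | (H,39) | (E,3,17) |
| | | | (B,1,10) |
| | | | (C,1,10) |

Step 1 is done

| q | Reached point | Explored points | Distance to candidate |
|---|---|---|---|
| A | F | | |
| A | H | | (H,-,39) |
| D | H | (F,27) | (H,-,17) |
| F | C | (A,27) | (C,-,17) |
| F | D | (B,36) | |
| F | B | | (B,-,36) |

Step 2 is done

| q | Candidate | Domination tests | Partial skyline |
|---|---|---|---|
| A | B | | B |
| A | C | B | B,C |
| D | C | | C |
| D | B | E,C | C |
| F | H | | H |

Step 3 is done, skyline result is: B, C, H, E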

N3S needs an incremental network nearest neighbor algorithm to find the list of candidates from each query point. Progressive incremental network expansion (PINE) [10] is used for this purpose. PINE partitions the network into smaller, more manageable regions by generating a network Voronoi diagram (NVD). Partitioning the network and exploring it incrementally have proven successful in numerous related works [35, 36, 38, 40, 44, 45]. An NVD is a special case of a Voronoi diagram [17] in which the distances between points are road network distances. The interest points that generate the network Voronoi polygons reside in a network plane, where the location of points is defined by the links between them. The whole plane is represented by a graph with weighted links. Each intersection of at least two links is a node in the graph, and the links represent roads. A network Voronoi polygon (NVP) contains all the locations in the road network plane that are closest to one interest point. In a road network plane, the links between nodes define the locations that are nearest to an interest point [5]. That is, the region that contains all the links that are closest to an interest point is called a network Voronoi polygon, and the set of all those regions defines the network Voronoi diagram. Network Voronoi diagrams have been used extensively in the literature to minimize online network distance computation [9, 10, 19–21, 26]. PINE distinguishes between two sets of nodes: the inner nodes, which are inside a network Voronoi polygon, and the border points, which lie on the borders of the polygons. A border-to-border distance computation is performed for each NVP separately. The algorithm starts by finding the network polygon that contains the query point. The generator of that polygon is the first nearest neighbor.
Second, the nodes that are connected to the query point q are explored and added to a priority queue ordered by their network distances to q. When the nearest node is explored, it is deleted from the queue and replaced by all the nodes that can be reached from it. This process continues inside the polygon until a border point is reached. Since the border-to-border distances are pre-computed, all the border points of the current polygon are added to the queue as nodes to be reached
and the generator of the current polygon is added to the current set of candidate nearest neighbors. The process continues until the requested number of nearest neighbors is reached. PINE accesses the road network gradually, since it accesses every network polygon separately and uses pre-computed values to save the online computation of distances among polygons. N3S runs PINE from each query point; PINE gradually finds the next nearest neighbor from each query point until an interest point is found that is reported by all query points as a nearest neighbor. Next, the distance computation from each query point is resumed from the place where PINE stopped, so that the network expansion already performed by PINE is reused efficiently. The distance computation uses the same progressive technique to expand the shortest distance search across network Voronoi polygons until it reaches the nearest neighbors of the other query points. The complexity of N3S depends mainly on the cost of the next nearest neighbor computation and the number of times the next nearest neighbor has to be computed. A property of a Voronoi polygon is that the number of its edges cannot exceed 6. This property was proved in [17]. That means

Fig. 3 N3S versus VSNS2 under different query set sizes

that, in order to find the next nearest neighbor in N3S, at most 5 network Voronoi polygons are examined before a new nearest neighbor is reported. In N3S, k nearest neighbors have to be found from each query point, where k is determined when a point that is reached by all query points is found. Finding that point costs $5k|Q|$, where $|Q|$ is the number of query points. Fagin's algorithm [18] answers database queries that involve fuzzy conjunctions. The first part of the algorithm reads elements from each sorted data source until a record that has been seen by all data sources is found. Fagin proved that the cost of this operation is $O(N^{(m-1)/m} k^{1/m})$, where N is the number of objects, m is the number of queries and k is the number of required top objects. If we consider our query points to be the data sources, each sorting the interest points according to their distances to it, then Fagin's result can be applied to our problem. N3S only requires the first such object (k = 1). It can therefore be stated that the maximum candidate size of our online algorithm is $N^{(m-1)/m}$. Since every candidate found has to be reported as a nearest neighbor of one of the query points, the cost is $O(5N^{(m-1)/m})$.
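
The stopping rule described above (advance the nearest neighbor search of every query point until some interest point has been reported by all of them) can be sketched in Python as follows. This is an illustration under assumptions rather than the N3S implementation: a plain Dijkstra expansion stands in for PINE, the query points are assumed to be served in round-robin order, and the names `incremental_nn` and `collect_candidates` are hypothetical.

```python
import heapq


def incremental_nn(graph, interest, source):
    """Yield (interest_point, distance) in increasing network distance.

    Plain Dijkstra expansion, used here as a stand-in for PINE.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node in interest:
            yield node, d
        for nbr, weight in graph[node]:
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))


def collect_candidates(graph, interest, queries):
    """Read the per-query streams in turn until one interest point has been
    reported by every query point (the first phase of Fagin's algorithm).

    Returns (candidates in discovery order, the common point or None).
    """
    streams = [incremental_nn(graph, interest, q) for q in queries]
    seen = {}        # interest point -> indices of queries that reported it
    candidates = []
    while True:
        progressed = False
        for i, stream in enumerate(streams):
            item = next(stream, None)
            if item is None:
                continue  # this query point has no more neighbors
            progressed = True
            point, _ = item
            if point not in seen:
                seen[point] = set()
                candidates.append(point)
            seen[point].add(i)
            if len(seen[point]) == len(queries):
                return candidates, point
        if not progressed:
            return candidates, None


# Hypothetical network: p1 - q1 - p2 - q2 - p3 with weights 1, 3, 2, 1.
graph = {
    "q1": [("p1", 1), ("p2", 3)],
    "p1": [("q1", 1)],
    "p2": [("q1", 3), ("q2", 2)],
    "q2": [("p2", 2), ("p3", 1)],
    "p3": [("q2", 1)],
}
candidates, common = collect_candidates(graph, {"p1", "p2", "p3"}, ["q1", "q2"])
print(candidates, common)
```

In this example each query point reports two nearest neighbors before p2 is seen by both, so the candidate set contains three interest points and the search never has to expand to the farthest neighbor of either query point.
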

5.1 Experimental setup

Experiments were conducted on a real dataset obtained from NAVTECH; it represents a network of approximately 110,000 links and 79,800 nodes of the road system in downtown Los Angeles. N3S was tested against VSNS2. VSNS2 progressively reaches the nodes that are nearest to the query point and, whenever the next node is reached, calculates the distance between that node and every query point. The algorithm stops when the distance between the next reached node and the query point is greater than the distance between the farthest node and a query point. VSNS2 was chosen for the comparison because it uses the same data structure as N3S, namely network Voronoi diagrams. The database engine used to store the dataset is Microsoft SQL Server 2008. The experiments were implemented in Java and conducted on a Core 2 Duo 2.4 GHz machine with 3 GB of RAM running Windows 7. The tested query sets were chosen using the method described in [16]. The first query point is chosen randomly. Then its K nearest neighbors are calculated. The value of K equals
the maximum of the number of query points to be chosen and the number of interest points in the dataset multiplied by k, where k is a factor that determines the diameter of the query point set. A larger value of k means a larger diameter. The experiments evaluate the query performance of VSNS2 and N3S for different numbers of query points, values of k, and dataset densities. In theory, N3S is expected to outperform VSNS2 in response time, number of domination tests, candidate set size, and number of disk accesses. N3S has a clear method for reusing the distances calculated during the algorithm, while VSNS2 covers the network in parallel and requires the computation of distances between the discovered points and the query points. Using a pre-computed structure reduces the cost of distance calculation, but this benefit is shared by the two algorithms since both use the same NVD data structure. N3S clearly performs fewer domination tests because it never compares a candidate to the whole discovered skyline set: it always compares a candidate to a partial skyline set and applies the domination test to each candidate only once. VSNS2 compares the candidate to the

Fig. 4 N3S versus VSNS2 under different k values

whole discovered skyline set and performs the domination test twice, in order to decide in advance whether to include the discovered point in the candidate set. Candidate sets for N3S are generally smaller because N3S stops collecting candidates at the last skyline point, while VSNS2 stops the search only when it discovers a point that cannot dominate the discovered points, and this last point is clearly found after the last skyline point. However, the candidate set of N3S may exceed that of VSNS2 because VSNS2 applies a domination test before adding a point to the candidate set.
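
For readers unfamiliar with the domination test that both algorithms count, the sketch below shows it in Python for vectors of network distances to the query points (smaller is better). It is only an illustration of the test itself: the naive `skyline` function compares every point with every other point and therefore does not reproduce the partial-skyline comparison of N3S or the pre-check of VSNS2. The function names and the sample distances are hypothetical.

```python
def dominates(a, b):
    """Return True if distance vector a dominates b.

    a dominates b when a is no farther than b from every query point
    and strictly closer to at least one query point.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points):
    """Naive skyline: keep every point that no other point dominates.

    points: dict mapping a point name to its tuple of distances,
    one distance per query point.
    """
    result = {}
    for name, vec in points.items():
        dominated = any(
            dominates(other_vec, vec)
            for other, other_vec in points.items()
            if other != name
        )
        if not dominated:
            result[name] = vec
    return result


# Hypothetical distances of four interest points to two query points.
points = {
    "p1": (1.0, 6.0),
    "p2": (3.0, 2.0),
    "p3": (6.0, 1.0),
    "p4": (4.0, 3.0),
}
print(sorted(skyline(points)))
```

Here p4 is farther than p2 from both query points, so p2 dominates it and p4 is excluded; p1, p2 and p3 each win on at least one distance and remain in the skyline.
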

5.2 Query performance under different query set sizes

N3S and VSNS2 were tested under different query set sizes: 3, 5, 7, and 9. The density was fixed at 0.058 and the value of k at 0.003. The results showed that N3S outperforms VSNS2 in dominance tests, considered candidates, total response time, and number of disk accesses (Fig. 3).

Fig. 5 N3S versus VSNS2 under different density values

5.3 Query performance under different k values

N3S and VSNS2 were tested under different k values: 0.003, 0.004, 0.006, 0.008, and 0.01. The density was fixed at 0.058 and the query set size at 3. The results showed that N3S outperforms VSNS2 in dominance tests, considered candidates, total response time, and number of disk accesses (Fig. 4).

5.4 Query performance under different density

The density of a dataset is determined by the number of interest points in the road network. The real test dataset contained several types of interest points, each with a different number of points on the same road network. N3S and VSNS2 were tested under different density values: 0.0004, 0.0016, 0.0115, 0.0326, and 0.058. The value of k was fixed at 0.003 and the query set size at 3. The results showed that N3S outperforms VSNS2 in dominance tests, considered candidates, total response time, and number of disk accesses (Fig. 5).

5.5 Conclusions and future work

In this work, we presented an efficient algorithm that returns the skyline points in a road network. The algorithm first sorts the interest points as nearest neighbors of the query points and then performs distance computations and dominance tests. It operates in three main phases: nearest neighbor computation, distance computation, and dominance tests. Performing the first and second phases reduces the number of distance calculation operations to $O(n_q)$, where $n_q$ is the number of query points. This number was not achieved by any of the previous road network skyline query algorithms. In all previous works, the number of network distance calculations approximately equals the number of considered candidates multiplied by the number of query points. Experiments showed that N3S outperforms VSNS2 in number of dominance tests, considered candidates, and response time. Future work will include techniques to solve the skyline query for moving query points [22]. Studying the most interesting skyline query problem would also be applicable, since our algorithm supports such user preferences. We will also incorporate intelligent techniques and context awareness in mobile navigation and mobile query processing [19, 27]. Performance in mobile query processing is always an issue. We plan to examine the performance issues of mobile query processing more thoroughly, including the use of dynamic data structures.

References

1. Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430
2. Bhattacharya B, Bishnu A, Cheong O, Das S, Karmakar A, Snoeyink J (2010) Computation of non-dominated points using compact Voronoi diagrams. Walcom Algo Comput LNCS 5942:82–93
3. Chen L, Lian X (2008) Dynamic skyline queries in metric spaces. In: Proceedings of the 11th international conference on extending database technology: advances in database technology, pp 333–343
4. Deng K, Zhou X, Shen HT (2007) Multi-source skyline query processing in road networks. In: Proceedings of the 23rd international conference on data engineering, pp 796–805
5. Kolahdouzan MR, Shahabi C (2004) Voronoi-based K nearest neighbor search for spatial network databases. In: Proceedings of the 30th international conference on very large data bases, pp 840–851
6. Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data bases, pp 275–286
7. Lee K, Zheng B, Li H, Lee W (2007) Approaching the skyline in Z order. In: Proceedings of the 33rd international conference on very large data bases, pp 23–27
8. Papadias D, Tao Y, Fu G, Seeger B (2003) An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD international conference on management of data, pp 467–478
9. Safar M (2008) Spatial queries in road networks based on PINE. J Univ Comput Sci 14(4):590–611
10. Safar M (2005) K nearest neighbour search in navigation systems. Mob Inf Syst 1(3):207–224
11. Sharifzadeh M, Shahabi C (2006) The spatial skyline queries. In: Proceedings of the 32nd international conference on very large databases, pp 751–762
12. Sharifzadeh M, Shahabi C, Kazemi L (2009) Processing spatial skyline queries in both vector spaces and spatial network databases. ACM Trans Database Syst 14(3)
13. Son W, Lee M, Ahn H, Hwang S (2009) Spatial skyline queries: an efficient geometric algorithm. Adv Spatial Temp Databases LNCS 5644:247–264
14. Tan K, Eng P, Ooi BC (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data bases, pp 301–310
15. Zheng B, Lee K, Lee W (2008) Location-dependent skyline query. In: Proceedings of the 9th international conference on mobile data management, pp 148–155
16. Zou L, Chen L, Özsu MT, Zhao D (2010) Dynamic skyline in large graphs. Database Syst Adv App LNCS 5982:62–78
17. Okabe A, Boots B, Sugihara K, Chiu SN (2000) Spatial tessellations, concepts and applications of Voronoi diagrams, 2nd edn. Wiley
18. Fagin R (1996) Combining fuzzy information from multiple systems. In: Proceedings ACM SIGMOD/SIGACT conference on principle of database systems (PODS), pp 216–226
19. Xuan K, Zhao G, Taniar D, Srinivasan B, Safar M, Gavrilova M (2009) Network Voronoi diagram based range search. IEEE 23rd international conference on advanced information networking and applications (AINA), pp 741–748
20. Zhao G, Xuan K, Taniar D, Safar M, Gavrilova M, Srinivasan B (2009) Multiple object types KNN search using network Voronoi diagram. In: Proceedings of IEEE international conference on computational science and its applications (ICCSA), 5593/2009, pp 819–834
21. Zhao G, Taniar D, Rahayu W, Safar M, Srinivasan B (2010) Path branch points in mobile navigation. In: Proceedings of the 8th international conference on advances in mobile computing & multimedia (MoMM)
22. Xuan K, Taniar D, Safar M, Srinivasan B (2010) Time constrained range search queries over moving objects in road networks. In: Proceedings of the 12th international conference on information integration and web-based applications & services (iiWAS)
23. Al-Khalidi H, Taniar D, Safar M (2011) Approximate static and continuous range search in mobile navigation. 5th ACM international conference on ubiquitous information management and communication (ICUIMC)
24. Safar M (2008) Group K-nearest neighbour's queries in spatial network databases. J Geo Sys 10(4):407–416, published by Springer
25. Zhao G, Xuan K, Rahayu W, Taniar D, Safar M, Gavrilova M, Srinivasan B (2009) Voronoi based continuous k nearest neighbor search in mobile navigation. IEEE Trans Ind Electron (IEEE TIE) 56(12)
26. Xuan K, Zhao G, Taniar D, Srinivasan B, Safar M, Gavrilova M (2009) Continuous range search based on network Voronoi diagram. Int J Grid Utility Comput (IJGUC) 1(4), Inderscience Publisher
27. Xuan K, Zhao G, Taniar D, Safar M, Srinivasan B (2009) Voronoi-based multi-level range search in mobile navigation. Int J Multimed Tools App, Springer. doi:10.1007/s11042-010-0498-y, online first
28. Xuan K, Zhao G, Taniar D, Rahayu W, Safar M, Srinivasan B (2011) Voronoi-based range and continuous range query processing in mobile databases. Pub J Comput Syst Sci (JCSS), Elsevier Science 77(4):637–651
29. Jayaputera J, Taniar D (2005) Data retrieval for location-dependent queries in a multi-cell wireless environment. Mob Inf Syst 1(2):91–108
30. Borgy Waluyo A, Srinivasan B, Taniar D (2005) Research in mobile database query optimization and processing. Mob Inf Syst 1(4):225–252
31. Xuan K, Zhao G, Taniar D, Srinivasan B (2008) Continuous range search query processing in mobile navigation. In: Proceedings of the 14th international conference on parallel and distributed systems (ICPADS 2008), IEEE, pp 361–368
32. Waluyo BA, Srinivasan B, Taniar D (2005) Research on location-dependent queries in mobile databases. Int J Comput Syst Sci Eng 20(2)
33. Thai Tran Q, Taniar D, Safar M (2010) Bichromatic reverse nearest-neighbor search in mobile systems. IEEE Syst J 4(2):230–242
34. Waluyo BA, Taniar D, Rahayu W, Srinivasan B (2009) Mobile service oriented architectures for NN-queries. J Net Comput App 32(2):434–447
35. Safar M, Ebrahimi D, Taniar D (2009) Voronoi-based reverse nearest neighbor query processing on spatial networks. Multimed Syst 15(5):295–308
36. Thai Tran Q, Taniar D, Safar M (2009) Reverse k nearest neighbor and reverse farthest neighbor search on spatial networks. Trans Large Scale Data Knowl Cent Syst 5740(2009):353–372
37. Zhao G, Xuan K, Taniar D, Srinivasan B (2010) Look ahead continuous KNN mobile query processing. Comput Syst Sci Eng 25(3):205–217
38. Zhao G, Xuan K, Taniar D, Srinivasan B (2008) Incremental k-nearest-neighbor search on road networks. J Int Net 9(4):455–470
39. Borgy Waluyo A, Taniar D, Rahayu W, Srinivasan B (2011) Mobile broadcast services with MIMO antennae in 4G wireless networks. World Wide Web J. doi:10.1007/s11280-011-0113-9, online for early access
40. Zhao G, Xuan K, Rahayu W, Taniar D, Safar M, Gavrilova M, Srinivasan B (2010) Voronoi-based continuous k nearest neighbor search in mobile navigation. IEEE Trans Ind Electron. doi:10.1109/TIE.2009.2026372, online for early access
41. Waluyo BA, Rahayu W, Taniar D, Srinivasan B (2010) A novel structure and access mechanism for mobile broadcast data in digital ecosystems. IEEE Trans Ind Electron. doi:10.1109/TIE.2009.2035457, online for early access
42. Taniar D, Safar M, Thai Tran Q, Rahayu W, Hyuk Park J (2010) Spatial network RNN queries in GIS. Comput J. doi:10.1093/comjnl/bxq068, online for early access
43. Xuan K, Zhao G, Taniar D, Rahayu W, Safar M, Srinivasan B (2010) Voronoi-based range and continuous range query processing in mobile databases. J Comput Syst Sci. doi:10.1016/j.jcss.2010.02.005, online for early access
44. Xuan K, Zhao G, Taniar D, Safar M, Srinivasan B (2010) Voronoi-based multi-level range search in mobile navigation. Multimed Tools App. doi:10.1007/s11042-010-0498-y, online for early access
45. Mammeri Z, Morvan F, Hameurlain A, Marsit N (2009) Location-dependent query processing under soft real-time constraints. Mob Inf Syst 5(3):205–232
46. Borrego-Jaraba F, Luque Ruiz I, Ángel Gómez-Nieto M (2010) A NFC-based pervasive solution for city touristic surfing. Pers Ubiquit Comput. doi:10.1007/s00779-010-0364-y, online for early access
47. Rashid O, Coulton P, Edwards R (2008) Providing location based information/advertising for existing mobile phone users. Pers Ubiquit Comput 12(1):3–10
48. Choi J, Jang B, Kim G (2010) Organizing and presenting geospatial tags in location-based augmented reality. Pers Ubiquit Comput. doi:10.1007/s00779-010-0343-3, online for early access
49. Kenteris M, Gavalas D, Economou D (2009) An innovative mobile electronic tourist guide application. Pers Ubiquit Comput 13(2):103–118
