2013 IEEE International Conference on Pervasive Computing and Communications (PerCom), San Diego (18–22 March 2013)
Bichromatic Reverse Nearest Neighbors in Mobile Peer-to-Peer Networks
Thao P. Nghiem∗, Kiki Maulana†, Agustinus Borgy Waluyo∗, David Green∗, David Taniar∗
Faculty of Information Technology, Monash University, Melbourne, Australia
Email∗: {phuong.thao.nghiem, agustinus.borgy.waluyo, david.green, david.taniar}@monash.edu
Email†: [email protected]
Abstract-The increasing use of mobile communications has raised many issues of decision support and resource allocation. A crucial problem is how to answer Reverse Nearest Neighbor (RNN) queries. An RNN query returns all objects that consider the query object as their nearest neighbor. Existing methods mostly rely on a centralized base station. However, mobile P2P systems offer many benefits, including self-organization, fault tolerance and load balancing. In this study, we propose two P2P algorithms for bichromatic RNN queries, in which mobile query objects and static objects of interest belong to two different categories; both algorithms are based on a boundary polygon around the mobile query object. The Exhaustive Search Algorithm makes use of all information from the peers to achieve a high accuracy rate, while the Optimized Search Algorithm reduces the number of queried peers. The algorithms are evaluated in the MiXiM simulation framework with a real dataset. The results show the practical feasibility of the P2P approach for solving bichromatic RNN queries in mobile networks.
Keywords-RNN Queries; P2P Spatial Queries; Mobile Networks; Collaborative Caching.

I. INTRODUCTION
The growing importance of mobile communication systems has highlighted the need for solutions to many problems of geographic searching. One of these is the Reverse Nearest Neighbor (RNN) problem, in which a query returns all objects that consider the query object as their nearest neighbor. RNN queries were first introduced in 2000 by Korn and Muthukrishnan [1]. They have since attracted a growing number of studies in a wide range of applications, such as decision support systems, mobile navigation systems and resource allocation. The problem is posed from the objects' point of view: instead of finding the objects nearest to the query point q, it asks which objects consider q as their nearest neighbor. There are two types of RNN queries: monochromatic RNN, in which query points and objects of interest are of the same category, and bichromatic RNN, in which they are of different categories. In the literature there is a wide variety of research on RNN queries; however, query processing is based on a centralized base station (BS), as in Fig. 1(a) [2]. Scalability, bottlenecks and low fault tolerance are critical issues of those centralized approaches, especially in large-scale systems. In particular, those systems contain a single point of failure, which is likely to fail in several scenarios. For example, on a battlefield or in a natural disaster, the headquarters is vulnerable to unavailability or traffic congestion [3].

In response to the limitations of centralized query processing systems and the advance of mobile technologies, mobile peer-to-peer (P2P) query processing systems have emerged as a promising solution. A typical mobile P2P network consists of a collection of moving objects, as in Fig. 1(b), which are equipped with a Global Positioning System (GPS) receiver and are able to communicate with each other in a P2P manner to share common interests via short-range wireless technology standards such as IEEE 802.11, Bluetooth, or Ultra Wide Band (UWB) [4], [5]. When a moving object q invokes an RNN query, it sends a query message to the surrounding peers within its communication range instead of performing wide-range communication to the BS. To reply to the query from q, neighbor peers return their cached data, for example, their kNNs retrieved two minutes ago. From the objects of interest received from its peers, q selects the answer to its RNN query. While most recent research has proposed index structures or query processing in centralized database systems, there are very few studies based on P2P approaches [3], [6], [7]. To the best of our knowledge, there is no previous work on bichromatic RNNs for mobile P2P networks. Therefore, we have developed two P2P algorithms for bichromatic RNN queries based on a boundary polygon around the mobile query object. The Exhaustive Search Algorithm makes use of all information from the peers, while the Optimized Search Algorithm reduces the number of queried peers. In this paper, we introduce a novel mobile P2P system with the following overall contributions:
1) We introduce a new direction in mobile P2P query processing for solving bichromatic RNN queries.
2) We propose two algorithms to search cached spatial data of objects of interest from mobile peers.
3) Our experimental study on a real dataset shows that the proposed system is substantially more energy-efficient and reduces latency by up to 43% compared with the centralized system.
4) The simulation also shows that the accuracy rate of the Exhaustive Search Algorithm is higher than that of the Optimized Search Algorithm in most scenarios. Conversely, the Optimized Search Algorithm reduces query response time because the number of queried peers is pruned.

Figure 1. Centralized Systems versus P2P Systems. Circles represent moving objects; stars: objects of interest; dashed lines: wide-range communication; continuous lines: P2P communication; dots: the BS network range.

978-1-4673-4575-0/13/$31.00 ©2013 IEEE

Table I. Notations
$q$: the query node
$p_i$: a peer node with ID $i$
$P = \{p_1, \ldots, p_H\}$: a priority queue of the peers of $q$, with $|P| = H$
$IO_{p_i}$: the set of sorted objects of interest cached in $p_i$
$r$: the communication range of a moving object
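To make the bichromatic definition concrete, a brute-force evaluation over all moving objects gives the exact answer against which a cache-based P2P answer can be scored. The Python sketch below is our own illustration, not part of the proposed system: it assumes planar Euclidean distance, resolves ties in favour of q, and uses hypothetical names (`bichromatic_rnn`, `movers`, `objects`).

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def bichromatic_rnn(q: str, movers: Dict[str, Point],
                    objects: Dict[str, Point]) -> List[str]:
    """IDs of objects of interest whose nearest moving object is q (ties favour q)."""
    qx, qy = movers[q]
    answer = []
    for oid, (ox, oy) in objects.items():
        d_q = math.hypot(ox - qx, oy - qy)
        # o is excluded as soon as another moving object is strictly closer to it.
        if all(math.hypot(ox - x, oy - y) >= d_q
               for mid, (x, y) in movers.items() if mid != q):
            answer.append(oid)
    return sorted(answer)


if __name__ == "__main__":
    movers = {"q": (0.0, 0.0), "p1": (4.0, 0.0), "p2": (0.0, 4.0)}
    objects = {"hotel_a": (1.0, 1.0), "hotel_b": (3.5, 0.5), "hotel_c": (-2.0, -2.0)}
    print(bichromatic_rnn("q", movers, objects))  # prints ['hotel_a', 'hotel_c']
```

Here hotel_b is excluded because p1 is closer to it than q is. The cost is O(|objects| × |movers|), and the method needs the positions of every moving object, which is exactly the global knowledge a single mobile peer lacks.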
II. RELATED WORK
A. Reverse Nearest Neighbors in Centralized Database Systems
Snapshot RNN query processing was introduced in 2000 by Flip Korn and S. Muthukrishnan [1]. It is based on an R-tree and pre-calculates a circle C(o, N(o)) for each object of interest o, where the radius N(o) is the distance from o to its nearest neighbor. The use of perpendicular bisectors of the line segment between q and its peer to answer RkNN queries was presented in [5], based on a filter-refinement framework. A number of other studies also address RkNN queries; for example, the research in [8] uses a convex polygon obtained from the intersection of bisectors. The Voronoi diagram was used in [9] to solve bichromatic RNN queries by determining the shortest route to the query point. All of the above approaches are efficient in terms of the cost and time of RNN search. However, they require a BS to deploy an index tree. In addition, they are not suitable for mobile P2P systems because of the short battery life and limited memory of mobile devices.

B. Mobile Collaborative Caching and P2P Query Processing
Advances in collaborative caching techniques and mobile devices have inspired the emergence and rapid development of mobile P2P systems in a wide range of applications, from everyday life to military environments. Consequently, many researchers have investigated query processing in this context. The first on-demand P2P data sharing algorithm for kNN queries was introduced in [10]. Although the approach in [10] can alleviate server workload in query processing and reduce communication to the BS, it is still a hybrid approach that requires both P2P and centralized implementation. A distributed multi-dimensional index structure, called P2PRdNN, was introduced in [6] to solve RNN queries. However, this research focused on monochromatic queries only and was based on a super-peer overlay. In addition, the work in [3], [7] proposed frameworks to find approximate answers to spatial queries in mobile P2P environments, but they only deal with range and kNN queries. Our work differs from all previous research in that it is the first to investigate bichromatic RNN queries in mobile P2P networks solely by harnessing peer collaboration, without any central supervision.

III. MOBILE P2P QUERY PROCESSING: PROPOSED MODEL
A. System Model and Assumptions
We consider a mobile network without any central supervision or base station, in which query objects and their peers are dynamic. It is a symmetric system in which each moving object, such as a smartphone or a PDA, acts both as a query node and as a peer of other nodes. Moving objects are aware of their own location through an embedded GPS receiver. The locations of moving objects and objects of interest in this paper are actual physical locations. In addition, objects of interest are distributed randomly in the network. To enhance P2P query processing, each moving object has a cache memory that stores spatial data of objects of interest from its previous queries. Moving objects are also equipped to conduct ad-hoc communication with neighboring moving objects via Bluetooth or Wireless Local Area Networks (WLANs).

B. Notations and Definitions
Table I lists the notations used in this paper.
Definition 1: Let $q$ be the query node and $p_i$ one of its peers. The boundary line $(b_i)$ is the perpendicular bisector of the line segment $\langle q, p_i \rangle$.
Definition 2: Given the boundary line $(b_i)$ corresponding to $p_i$, the positive half plane, denoted $H^{+}(p_i)$, is defined as
$$H^{+}(p_i) = \{x \in \mathbb{R}^2 \mid dist(x, q) \le dist(x, p_i)\} \qquad (1)$$
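Definition 2 lends itself to a direct test, and expanding the squared distances shows that $H^{+}(p_i)$ is an ordinary linear half plane: $dist(x, q)^2 \le dist(x, p_i)^2$ is equivalent to $2\,x \cdot (p_i - q) \le \|p_i\|^2 - \|q\|^2$. The Python sketch below checks membership both ways; it is our own illustration (the helper names are hypothetical) and assumes planar Euclidean coordinates.

```python
from typing import Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def boundary_line(q: Point, p: Point) -> Tuple[float, float, float]:
    """Coefficients (a, b, c) of the bisector of <q, p>, written a*x + b*y = c.

    Points with a*x + b*y <= c lie in H+(p): at least as close to q as to p.
    """
    a = 2.0 * (p[0] - q[0])
    b = 2.0 * (p[1] - q[1])
    c = (p[0] ** 2 + p[1] ** 2) - (q[0] ** 2 + q[1] ** 2)
    return a, b, c


def in_positive_half_plane(x: Point, q: Point, p: Point) -> bool:
    """Direct test of Definition 2: dist(x, q) <= dist(x, p)."""
    return sq_dist(x, q) <= sq_dist(x, p)


def in_positive_half_plane_linear(x: Point, q: Point, p: Point) -> bool:
    """Equivalent test using the line coefficients from boundary_line()."""
    a, b, c = boundary_line(q, p)
    return a * x[0] + b * x[1] <= c
```

For q = (0, 0) and p1 = (4, 0), `boundary_line` returns (8.0, 0.0, 16.0), i.e. the vertical line x = 2, and every point with x ≤ 2 lies in $H^{+}(p_1)$. The linear form is what makes an intersection of such half planes a convex region.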
The negative half plane, denoted $H^{-}(p_i)$, is defined analogously as
$$H^{-}(p_i) = \{x \in \mathbb{R}^2 \mid dist(x, q) \ge dist(x, p_i)\} \qquad (2)$$
The boundary line $(b_i)$ separates $H^{-}(p_i)$ and $H^{+}(p_i)$, because every point $x \in (b_i)$ satisfies $dist(x, q) = dist(x, p_i)$. Fig. 2 illustrates the boundary line $(b_1)$ between $q$ and $p_1$: every object of interest in $H^{+}(p_1)$ is at least as close to $q$ as to $p_1$.

Definition 3: Let $P$ be the set of peers around the query node $q$. The boundary region is the intersection of the positive half planes of the peers:
$$B = \bigcap_{p_i \in P} H^{+}(p_i) \qquad (3)$$

Definition 4: If $B$ is closed, it is called a boundary polygon, $B = \langle V, E \rangle$, where $V$ is the set of its vertices and $E$ is the set of its edges. The following properties of $B$ are easy to derive: $\forall v_i \in V, \exists m, n$ such that $v_i \in (b_m) \cap (b_n)$; and $\forall e_i \in E, \exists j$ such that $e_i$ is a line segment on $(b_j)$. Polygon $\langle v_1, v_2, v_3, v_4 \rangle$ in Fig. 3 is the boundary polygon of $q$.

Definition 5: The boundary polygon $B$ is called a tight polygon iff every object of interest $o_i$ inside $B$ regards $q$ as its closest moving object.

Figure 2. The boundary line $(b_1)$ separates the positive half plane $H^{+}$ and the negative half plane $H^{-}$.
Figure 3. Boundary polygon $\langle v_1, v_2, v_3, v_4 \rangle$ of $q$.

C. System Details
1) Overview: Our ultimate aim is to harness the collaborative power of peers in pure P2P RNN query processing in mobile environments. The core of our system is to eliminate the BS by collecting results from selected peers with time- and energy-efficient boundary-polygon-based algorithms. Overall, the proposed system is divided into three primary phases: 1) Initialization and Peer Discovery, 2) Constructing a Boundary Polygon and Sending Queries, and 3) Pruning Objects of Interest. Each phase is described in detail below.

2) Initialization and Peer Discovery: Each moving object maintains a default map and the associated objects in its cache. Since moving objects move frequently, their peers also change. As a result, before sending queries, the query node $q$ needs to discover which moving objects are within its communication range, which it does by sending a one-hop broadcast message. Each moving object that receives the broadcast replies with an acknowledgment message carrying its ID and location. The query node $q$ collects all acknowledgment messages from the surrounding nodes. From the location attached to each acknowledgment, $q$ calculates its distance to the peer and inserts the peer into the priority queue $P$, which is ordered by ascending distance.

3) Constructing a Boundary Polygon and Sending Queries: The query node $q$ starts query processing by popping the first peer from the queue $P$ and begins building the boundary polygon by drawing the boundary line $(b_i)$ for each peer $p_i$ taken from $P$. The construction process relies on Lemma 1 below. Assume there is a polygon $B = \langle V, E \rangle$ around $q$ and $v_j$ is the vertex of $B$ furthest from $q$, i.e. $\forall v_m \in V, dist(q, v_m) \le dist(q, v_j)$.

Lemma 1: If $\exists p_i \in P$ such that $dist(q, p_i) \ge 2\,dist(q, v_j)$, then $B$ is a tight polygon.

Put another way, once this condition holds we do not need to consider the peers remaining in the queue $P$ and can stop creating the polygon; since $P$ is sorted by ascending distance, every remaining peer also satisfies the condition.

Proof: Let $B = \langle V, E \rangle$ be the polygon constructed so far, and let $p_i$ be a peer of $q$ such that
$$dist(q, p_i) \ge 2\,dist(q, v_j) \qquad (4)$$
Let $(b_i)$ be the boundary line of $p_i$, and let $m$ be the intersection point of the line segment $\langle q, p_i \rangle$ and $(b_i)$, as in Fig. 4. Assume that $(b_i)$ meets an edge of $B$:
$$\exists e_i \in E: (b_i) \cap e_i \neq \emptyset \qquad (5)$$
and let $i$ be a point of $(b_i) \cap e_i$. By assumption, $v_j$ is the vertex of $B$ furthest from $q$, i.e. $\forall v_m \in V, dist(q, v_m) \le dist(q, v_j)$. Both endpoints of $e_i$ are vertices of $B$, so every point on $e_i$ lies inside the circle $C(q, dist(q, v_j))$, and hence
$$dist(q, i) \le dist(q, v_j) \qquad (6)$$
Since the segment $\langle q, m \rangle$ is perpendicular to $(b_i)$ and $i \in (b_i)$,
$$dist(q, m) \le dist(q, i) \qquad (7)$$
From (6) and (7),
$$dist(q, m) \le dist(q, i) \le dist(q, v_j) \qquad (8)$$
Hence,
$$2\,dist(q, m) \le 2\,dist(q, v_j) \qquad (9)$$
Since $m$ is the midpoint of the line segment $\langle q, p_i \rangle$, $dist(q, p_i) = 2\,dist(q, m)$, and we finally obtain $dist(q, p_i) \le 2\,dist(q, v_j)$. Together with (4), this forces equality throughout (8): then $i = m$ lies on the circle $C(q, dist(q, v_j))$, so $i$ can only be a vertex of $B$ at maximal distance from $q$, and $(b_i)$ touches $B$ without cutting it. Hence the boundary line of $p_i$ cannot change $B$.

Figure 4. The illustration of Lemma 1's proof.

As long as the stop condition is not satisfied, $q$ continues constructing the boundary polygon and sending queries to the next peer in the queue $P$. We develop two different algorithms: Algorithm 1, the Exhaustive Search Algorithm, and Algorithm 2, the Optimized Search Algorithm.

Algorithm 1: Exhaustive Search Algorithm
Data: query node q; peer list P; boundary polygon B; list V of intersection nodes (vertices) of B
Result: priority queue IOq
begin
    stopHit = false
    foreach pi ∈ P do
        if !stopHit and B is a polygon then
            find the furthest vertex vj of B
            if dist(q, pi) ≥ 2·dist(q, vj) then
                stopHit = true
        if !stopHit then
            calculate a boundary line (bi); B = B ∪ (bi)
            find intersection nodes; update V
        send query msg to pi
        IOq = IOq ∪ IOpi
end

To limit the number of peers contacted by $q$, we propose Algorithm 2, in which only the peers that contribute to building the boundary polygon $B$ are sent queries. As a trade-off, this can lower the accuracy of the RNN answer, since part of the answer may be cached by other peers.

Algorithm 2: Optimized Search Algorithm
Data: query node q; peer list P; boundary polygon B; list V of intersection nodes (vertices) of B
Result: priority queue IOq
begin
    stopHit = false
    while !stopHit and P ≠ ∅ do
        pi = P.pop()
        if B is a polygon then
            find the furthest vertex vj of B
            if dist(q, pi) ≥ 2·dist(q, vj) then
                stopHit = true
        if !stopHit then
            calculate a boundary line (bi); B = B ∪ (bi)
            find intersection nodes; update V
            send query msg to pi
            IOq = IOq ∪ IOpi
end

Table II. Simulation parameters
Playground: 87.1 km²
No. of objects of interest: 550
No. of moving objects: 7600
Expected number of queries, λ: 2
Cache size: 50
Simulation time: 30 s

4) Pruning Objects of Interest: After collecting cached objects of interest from its peers, the query node $q$ prunes the objects of interest that do not consider $q$ as their nearest neighbor, i.e. those that are not part of the RNN answer. Based on the definition of a tight polygon, all objects outside $B$ are pruned.
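The three phases and the two search variants fit in a short Python sketch. It is our own illustration under several assumptions, not the authors' implementation: the initially unbounded region is replaced by a large square of hypothetical half-width `box_half` centred on q, B is treated as a closed polygon once none of its vertices lies on that square, message exchange is simulated by reading a hypothetical `caches` dictionary, and all names are ours. Each boundary line is applied by clipping B against $H^{+}(p_i)$ (a single Sutherland-Hodgman clipping step), the furthest vertex supplies $dist(q, v_j)$ for the Lemma 1 test, and pruning keeps an object only if it lies in every $H^{+}(p_i)$ used to build B. With `exhaustive=True` every peer in P is queried, as the text describes for Algorithm 1, while B stops growing at the first stop hit; with `exhaustive=False` querying stops at that point, as in Algorithm 2.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def clip(poly: List[Point], q: Point, p: Point) -> List[Point]:
    """Intersect convex polygon `poly` with H+(p) (one Sutherland-Hodgman pass)."""
    a, b = 2 * (p[0] - q[0]), 2 * (p[1] - q[1])
    c = (p[0] ** 2 + p[1] ** 2) - (q[0] ** 2 + q[1] ** 2)

    def side(v: Point) -> float:
        return c - (a * v[0] + b * v[1])  # >= 0 iff v lies in H+(p)

    out: List[Point] = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]  # for i == 0 this wraps around to the last vertex
        s_prev, s_cur = side(prev), side(cur)
        if (s_prev >= 0) != (s_cur >= 0):  # edge crosses the boundary line (b_i)
            t = s_prev / (s_prev - s_cur)
            out.append((prev[0] + t * (cur[0] - prev[0]),
                        prev[1] + t * (cur[1] - prev[1])))
        if s_cur >= 0:
            out.append(cur)
    return out


def rnn_search(q: Point, peers: Dict[str, Point],
               caches: Dict[str, Dict[str, Point]], exhaustive: bool,
               box_half: float = 1e4) -> Tuple[List[str], List[str]]:
    """Return (RNN answer, queried peer IDs); exhaustive=True mimics Algorithm 1."""
    # Phase 1: priority queue P, ascending distance from q.
    queue = sorted(peers, key=lambda pid: math.dist(q, peers[pid]))
    # Phase 2: B starts as a large box standing in for the unbounded plane.
    lo_x, hi_x = q[0] - box_half, q[0] + box_half
    lo_y, hi_y = q[1] - box_half, q[1] + box_half
    poly: List[Point] = [(lo_x, lo_y), (hi_x, lo_y), (hi_x, hi_y), (lo_x, hi_y)]
    builders: List[str] = []  # peers whose boundary lines shaped B
    queried: List[str] = []
    collected: Dict[str, Point] = {}  # IO_q
    stop_hit = False
    for pid in queue:
        # B counts as a closed polygon once no vertex lies on the artificial box.
        closed = all(lo_x < x < hi_x and lo_y < y < hi_y for x, y in poly)
        if not stop_hit and closed:
            far = max(math.dist(q, v) for v in poly)  # dist(q, v_j)
            stop_hit = math.dist(q, peers[pid]) >= 2 * far  # Lemma 1
        if stop_hit and not exhaustive:
            break  # Algorithm 2: the remaining peers are not queried
        if not stop_hit:
            poly = clip(poly, q, peers[pid])
            builders.append(pid)
        queried.append(pid)  # stands in for "send query msg to p_i"
        collected.update(caches.get(pid, {}))  # merge IO_{p_i} into IO_q
    # Phase 3: prune objects outside B, i.e. outside some H+(p) of a builder.
    answer = sorted(oid for oid, o in collected.items()
                    if all(math.dist(o, q) <= math.dist(o, peers[pid]) for pid in builders))
    return answer, queried


if __name__ == "__main__":
    q = (0.0, 0.0)
    peers = {"p1": (2.0, 0.0), "p2": (-2.0, 0.0), "p3": (0.0, 2.0),
             "p4": (0.0, -2.0), "p5": (5.0, 5.0)}
    caches = {"p1": {"a": (0.5, 0.5), "b": (1.5, 0.0)}, "p5": {"c": (0.2, -0.3)}}
    print(rnn_search(q, peers, caches, exhaustive=False))  # Algorithm 2
    print(rnn_search(q, peers, caches, exhaustive=True))   # Algorithm 1
```

In the example, four peers at distance 2 on the axes make B the square with vertices (±1, ±1), so dist(q, vj) = √2 ≈ 1.41. The fifth peer, about 7.07 away, exceeds 2√2 ≈ 2.83, so the stop condition fires: the Algorithm 2 variant returns ['a'] after querying p1 to p4, whereas the Algorithm 1 variant also queries p5 and additionally finds object c, which only p5 has cached. Object b is pruned in both cases because it is closer to p1 than to q.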
In summary, although Algorithm 1 makes the most of the information from peers to answer RNN queries, the query node must send the query to all of its peers; the resulting communication overhead makes it inefficient when the density of moving objects is high, which is what Algorithm 2 addresses.

IV. PERFORMANCE EVALUATION
A. Simulation Model, Assumptions and Benchmarks
MiXiM [11], a powerful OMNeT++-based framework, is used as our simulation environment to model and analyze our Mobile P2P Query Processing System. It is assumed that the system is applied in a network where the density
of moving objects is high enough to guarantee that each moving object has at least one other moving object within its communication range to which it can send its queries. We assume ideal network conditions, in which no network partitioning or connection failure occurs. At the initial stage, each moving object is assigned kNN objects of interest in its cache. The simulation is run with both real and synthetic datasets for a large-scale network, with the parameters listed in Table II. More specifically, for the real dataset we collected statistical data for inner Melbourne, Australia, in 2009¹. The number of objects of interest is the number of tourist accommodation establishments, and the number of moving objects is based on the number of registered vehicles: we expect that at least 10% of people who registered a vehicle install the application on their mobile device. The effect of each parameter is investigated by varying its value. To compare the performance of the P2P system with a benchmark, we model the basic R-tree-based centralized system [1]. Communication to the BS is conducted via 3G (WCDMA) band I (2100 MHz), which is used by Vodafone and Optus in Australia. The data rate for high-speed moving objects in this network is 128 kbps², and the power consumption in the connected state is 365.6 mW³.

Figure 5. Effect of the expected number of queries, λ, generated by each moving object on the performance of the systems. (Panels: communication cost (mW), mean latency (ms), accuracy rate, and no. of stop hits and peers pruned, versus λ from 1 to 5; series: Centralized, Optimized, Exhaustive.)
Figure 6. Effect of the number of objects of interest on the performance of the systems. (Panels as in Fig. 5, versus the number of objects of interest from 550 to 2750.)
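The 3G figures in Section IV.A allow a quick estimate of what a single exchange with the BS costs a moving object: the airtime is the message size divided by the data rate, and the energy is the connected-state power multiplied by that airtime. The Python sketch below is our illustration only; the 1 KB message size is a hypothetical value (the message size used in the simulation is not stated here), and the estimate ignores connection set-up, retransmissions and the radio's idle tail.

```python
RATE_3G_BPS = 128_000  # 128 kbps data rate for high-speed moving objects
POWER_3G_MW = 365.6    # connected-state power consumption


def transmission_time_s(message_bytes: int, rate_bps: float) -> float:
    """Airtime in seconds for message_bytes at rate_bps (no protocol overhead)."""
    return message_bytes * 8 / rate_bps


def transmission_energy_mj(message_bytes: int, rate_bps: float, power_mw: float) -> float:
    """Energy in millijoules: power (mW) multiplied by airtime (s)."""
    return power_mw * transmission_time_s(message_bytes, rate_bps)


if __name__ == "__main__":
    size = 1024  # hypothetical 1 KB query or reply message
    t = transmission_time_s(size, RATE_3G_BPS)
    e = transmission_energy_mj(size, RATE_3G_BPS, POWER_3G_MW)
    print(f"{t * 1000:.1f} ms, {e:.1f} mJ")  # prints 64.0 ms, 23.4 mJ
```

Because both time and energy scale linearly with message size and with the number of BS exchanges, a node that answers from its neighbours' caches over a short-range link avoids this cost entirely for every query it does not send to the BS.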
Figure 7. Effect of the speed of moving objects on the performance of the systems. (Panels: communication cost (mW), mean latency (ms), accuracy rate, and no. of stop hits and peers pruned, versus the speed of moving objects.)
Figure 8. Effect of the number of moving objects on the performance of the systems. (Panels as in Fig. 7, versus the number of moving objects from 3800 to 76000.)

¹ National Regional Profile: Inner Melbourne (Statistical Subdivision), http://www.abs.gov.au
² Silicon-Press, "Third generation (3G) wireless technology brief," http://www.silicon-press.com/briefs/brief.3g/index.html
³ Option®, "Power considerations for 2G & 3G modules in mid designs," http://www.option.com/en/newsroom/media-center/white-papers/

B. Simulation Results
1) Effect of λ: Fig. 5 compares the performance of the systems with respect to the expected number of queries generated by each moving object, λ. In general, the results show that the P2P system is substantially more energy-efficient than the centralized system, which needs to exchange messages with the BS via wide-range communication such as 3G. Even though the mean latency of the P2P system increases slightly as λ increases, it is always lower than that of the centralized system, by up to 43%. Within the P2P system, the accuracy rate of the Exhaustive Search Algorithm is slightly higher than that of the Optimized Search Algorithm; as a trade-off, the mean latency of the Exhaustive Search Algorithm is also slightly higher, because it involves more peer communication. As shown in Fig. 5, a number of peers are pruned thanks to the stop condition in the Optimized Search Algorithm. Moreover, this number rises as the number of queries increases, which suggests that the algorithm suits busy networks with heavy communication.

2) Effect of Number of Objects of Interest: Fig. 6 illustrates the cost, the accuracy, and the numbers of stop hits and pruned peers of the systems with respect to the number of objects of interest. The communication cost and mean latency of the P2P system are fairly stable regardless of this parameter. However, the accuracy rate of the P2P system gradually decreases as the number of objects of interest increases, whilst the number of peers pruned by the Optimized Search Algorithm fluctuates around 400.

3) Effect of Speed of Moving Objects: Fig. 7 plots the performance of the systems for varying speeds of moving objects. The communication cost of the P2P system is almost constant regardless of speed. Although the accuracy rate of the P2P system falls slightly as some cached data become invalid, the P2P system becomes more time-efficient as the speed of moving objects increases.

4) Effect of Number of Moving Objects: Fig. 8 shows the performance of the systems with respect to the number of moving objects. In general, the P2P system remains robust in large-scale networks. Although the mean latency of the P2P system rises slightly, its communication cost and mean latency remain far better than those of the centralized system.
In addition, the difference between the Optimized Search Algorithm and the Exhaustive Search Algorithm becomes clearer as the number of moving objects increases. While the Exhaustive Search Algorithm achieves higher accuracy, it spends more time processing queries. Fig. 8 also shows that the number of peers pruned by the Optimized Search Algorithm rises sharply as the density of moving objects increases.
V. CONCLUSIONS
In this paper, we confirmed the potential of a pure P2P query processing system in mobile networks for solving RNN queries. Based on two novel search algorithms, the system harnesses the collaboration of peers to answer queries without any support from the server side. As a consequence, the limitations of centralized approaches, such as the single point of failure and bottleneck problems, are eliminated. The simulation results show that our P2P system significantly saves communication cost via the short-range network compared with the centralized system, regardless of changes in network parameters. In addition, the mean latency of query processing in our proposed system is up to 43% lower than that of the benchmark system. The Exhaustive Search Algorithm makes use of all information from the peers to achieve a high accuracy rate, while the Optimized Search Algorithm reduces the number of queried peers and, in turn, the response time. In general, the P2P system is a practically feasible option for large-scale and busy networks.

REFERENCES
[1] F. Korn and S. Muthukrishnan, "Influence sets based on reverse nearest neighbor queries," in Proc. of the 2000 ACM SIGMOD International Conference on Management of Data, 2000.
[2] Y. Tao, D. Papadias, X. Lian, and X. Xiao, "Multidimensional reverse kNN search," The VLDB Journal, vol. 16, no. 3, pp. 293–316, 2007.
[3] T. P. Nghiem, A. Waluyo, and D. Taniar, "A pure peer-to-peer approach for kNN query processing in mobile ad hoc networks," Personal and Ubiquitous Computing, pp. 1–13, 2012. [Online]. Available: http://dx.doi.org/10.1007/s00779-012-0545-y
[4] Y. Luo, O. Wolfson, and B. Xu, "The Mobi-Dik approach to searching in mobile ad hoc network databases," in Handbook of Peer-to-Peer Networking, 2010, vol. 9, pp. 1105–1123.
[5] Y. Tao, D. Papadias, and X. Lian, "Reverse kNN search in arbitrary dimensionality," in Proc. of the Thirtieth International Conference on Very Large Data Bases, 2004.
[6] D. Chen, J. Zhou, and J. Le, "Reverse nearest neighbor search in peer-to-peer systems," in Flexible Query Answering Systems, 2006, vol. 4027, pp. 87–96.
[7] C. Chow, M. Mokbel, and H. Leong, "On efficient and scalable support of continuous queries in mobile peer-to-peer environments," IEEE Transactions on Mobile Computing, vol. 10, pp. 1473–1487, 2011.
[8] W. Wu, F. Yang, C. Chan, and K. Tan, "FINCH: Evaluating reverse k-nearest-neighbor queries on location data," in Proc. of the VLDB Endowment, 2008, pp. 1056–1067.
[9] Q. Tran, D. Taniar, and M. Safar, "Bichromatic reverse nearest-neighbor search in mobile systems," IEEE Systems Journal, vol. 4, no. 2, pp. 230–242, 2010.
[10] W. Ku and R. Zimmermann, "Nearest neighbor queries with peer-to-peer data sharing in mobile environments," Pervasive and Mobile Computing, vol. 4, no. 5, pp. 775–788, 2008.
[11] A. Köpke, M. Swigulski, K. Wessel, D. Willkomm, P. T. K. Haneveld, T. E. V. Parker, O. W. Visser, H. S. Lichte, and S. Valentin, "Simulating wireless and mobile networks in OMNeT++: the MiXiM vision," in Proc. of the 1st International Conference on Simulation Tools and Techniques for Communications, Networks and Systems & Workshops, 2008.