Shared Execution Approach to -Distance Join Queries in ... - MDPI

International Journal of

Geo-Information Article

Shared Execution Approach to ε-Distance Join Queries in Dynamic Road Networks Hyung-Ju Cho Department of Software, Kyungpook National University, 2559, Gyeongsang-daero, Sangju-si, Gyeongsangbuk-do 37224, Korea; [email protected]; Tel.: +82-54-530-1455 Received: 30 April 2018; Accepted: 6 July 2018; Published: 10 July 2018

Abstract: Given a threshold distance ε and two object sets R and S in a road network, an ε-distance join query finds object pairs from R × S that are within the threshold distance ε (e.g., find passenger and taxicab pairs within a five-minute driving distance). Although this is a well-studied problem in the Euclidean space, little attention has been paid to dynamic road networks where the weights of road segments (e.g., travel times) are frequently updated and the distance between two objects is the length of the shortest path connecting them. In this work, we address the problem of ε-distance join queries in dynamic road networks by proposing an optimized ε-distance join algorithm called EDISON, the key concept of which is to cluster adjacent objects of the same type into a group, and then to optimize shared execution for the group to avoid redundant network traversal. The proposed method is intuitive and easy to implement, thereby allowing its simple integration with existing range query algorithms in road networks. We conduct an extensive experimental study using real-world roadmaps to show the efficiency and scalability of our shared execution approach. Keywords: ε-distance join query; shared execution; dynamic road network

1. Introduction Recent advances in mobile technologies and map-based applications enable users to access a wide range of location-based services such as shortest path queries [1–7], distance queries [2,3,5,8,9], range queries [3,10], and k-nearest neighbor (kNN) queries [3,7,10–12]. Owing to the popularity of map-based applications among users, processing spatial queries in road networks efficiently has become an important research area in recent years. In this work, we address ε-distance join queries in road networks. Given a threshold distance ε and two object sets R and S, an ε-distance join query explores all object pairs (r, s) ∈ R × S and returns a set of object pairs (r, s) that are within the threshold distance ε. The ε-distance join queries are useful in real-life applications such as data mining and similarity joins [13–18]. Therefore, many studies have evaluated the efficiency of ε-distance join queries in the Euclidean space [19,20] and in the metric space [13–18]. These studies focus primarily on designing elegant indexing techniques to avoid scanning the entire dataset repeatedly, and on pruning as many distance computations as possible. Figure 1 presents an example of dynamic road networks where objects r1 through r5 denote passengers (represented by rectangles), and objects s1 through s3 denote taxicabs (represented by triangles). In this example, an ε-distance join query for a taxicab company involves sending an available taxicab to each passenger who is located within a five-minute driving distance from the taxicab. The travel time should be frequently updated depending on traffic conditions. For example, at time t1 , taxicab s2 is closer to passenger r4 than taxicab s3 , as shown in Figure 1a. However, as shown in Figure 1b, because the taxicab s2 is in a traffic-congested area, it cannot reach the passenger r4 faster than the taxicab s3 at time t2 .

ISPRS Int. J. Geo-Inf. 2018, 7, 270; doi:10.3390/ijgi7070270

www.mdpi.com/journal/ijgi

ISPRS Int. J. Geo-Inf. 2018, 7, 270

2 of 23

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

2 of 23

passenger taxicab Traffic-congested area

(a)

(b)

Example of of dynamic dynamic road road networks networks (a) (a) traffic traffic condition condition at at time time t1; (b) traffic condition at Figure 1. Example time t2 ..

Owing Owing to to the the dynamic dynamic nature nature of of road road networks, networks, many many studies studies have have recently recently addressed addressed aa variety variety of spatial queries in dynamic road networks such as shortest path queries [1,6,21], distance queries of spatial queries in dynamic road networks such as shortest path queries [1,6,21], distance queries [8], [8], kNN queries Although there have been a few studies [10,23] -distance queries andand kNN queries [22].[22]. Although there have been a few studies [10,23] onon ε-distance joinjoin queries in in road networks, they proposed sophisticated index structures and algorithms to boost -distance road networks, they proposed sophisticated index structures and algorithms to boost ε-distance join join queries in road staticnetworks, road networks, where the of weights of road are segments are rarely updated. queries in static where the weights road segments rarely updated. Specifically, Specifically, the state-of-the-art solution to queries in road thealgorithm distance the state-of-the-art solution to ε-distance join -distance queries injoin road networks is thenetworks distance is join join algorithm proposed in Reference [23], based on the spatially induced linkage cognizance (SILC) proposed in Reference [23], based on the spatially induced linkage cognizance (SILC) framework framework introduced in Reference [3]. The algorithm pre-computes the shortest path between each introduced in Reference [3]. The algorithm pre-computes the shortest path between each pair of pair of vertices in a road network and stores all paths. This approach is not suitable for dynamic road vertices in a road network and stores all paths. This approach is not suitable for dynamic road networks networks because because in in these these road road networks, networks, the the weights weights of of road road segments segments are are frequently frequently updated, updated, and and the pre-computed distances between vertices often become obsolete. Various techniques the pre-computed distances between vertices often become obsolete. Various techniques [24–34] [24–34] for for preserving location privacy in road networks have recently been developed, which focus on moving preserving location privacy in road networks have recently been developed, which focus on moving objects users, and and therefore therefore cannot cannot be be applied applied to to the the problem problem studied studied here. here. objects such such as as vehicles vehicles and and users, Therefore, -distance join join algorithm algorithm called called EDISON Therefore, in in this this study, study, we we propose propose an an optimized optimized ε-distance EDISON to to efficiently support -distance join queries in dynamic road networks, where materialized efficiently support ε-distance join queries in dynamic road networks, where materialized structures structures such such as as the the SILC SILC framework framework cannot cannot be be used used to to dynamically dynamically compute compute the the shortest shortest path path and and distances distances from a source point to one or more destination points. Intuitively, a simple solution involves from a source point to one or more destination points. Intuitively, a simple solution involves computing computing the distance between every object pair from two datasets. This simple solution is very the distance between every object pair from two datasets. This simple solution is very inefficient, inefficient, because for each element in an object set , the algorithm must traverse another object set because for each element in an object set R, the algorithm must traverse another object set S, which can , which can lead tonetwork redundant network traversal. The proposed is to cluster adjacent objects lead to redundant traversal. The proposed solution is tosolution cluster adjacent objects of the same of the same type into a group and then optimize shared execution for the group to avoid redundant type into a group and then optimize shared execution for the group to avoid redundant network network the shared execution queries hasmuch received much[21,35–38], attention [21,35–38], traversal.traversal. AlthoughAlthough the shared execution of queriesofhas received attention no shared no shared execution strategy has been applied to evaluate -distance join queries in dynamic road execution strategy has been applied to evaluate ε-distance join queries in dynamic road networks to networks to date. The contributions of this study are as follows. date. The contributions of this study are as follows. • •

We propose an efficient -distance join algorithm called EDISON for dynamic road networks We propose an efficient ε-distance join algorithm called EDISON for dynamic road networks that optimizes the shared execution paradigm to accelerate query response times. To the best of that optimizes the shared execution paradigm to accelerate query response times. To the best of our knowledge, this is the first attempt to evaluate -distance join queries effectively in dynamic our knowledge, this is the first attempt to evaluate ε-distance join queries effectively in dynamic road networks. road networks. • We design algorithms that reduce the number of range queries required to evaluate -distance • join We design the number of range queries requiredand to evaluate ε-distance queriesalgorithms in dynamicthat roadreduce networks. The proposed method is intuitive easy to implement, join queries in dynamic road networks. The proposed method is intuitive and easy to implement, thereby allowing its simple integration with existing range query algorithms (e.g., [10]) in road thereby allowing its simple integration with existing range query algorithms (e.g., [10]) in networks. roadconduct networks. • We extensive experiments under different setups to demonstrate the efficiency and • scalability We conduct experiments under different setups to demonstrate the efficiency and of extensive our approach over conventional solutions. scalability of our approach over conventional solutions. The remainder of this paper is organized as follows. In Section 2, we review related studies. In Section we provideofsome background information. We present a simple solution the naive The3,remainder this paper is organized as follows. In Section 2, we reviewcalled related studies. EDISON based on shared executioninformation. in Section 4.We In present Section 5, we present an optimized In Sectionmethod 3, we provide some background a simple solution called the solution that avoids redundant network traversal called the EDISON method. In Section 6, we evaluate our proposed solutions experimentally under different setups by comparing them to a conventional solution. We conclude the paper in Section 7.


3 of 23

naive EDISON method based on shared execution in Section 4. In Section 5, we present an optimized solution that avoids redundant network traversal called the EDISON method. In Section 6, we evaluate our proposed solutions experimentally under different setups by comparing them to a conventional solution. We conclude the paper in Section 7. 2. Related Work 2.1. Dynamic Road Networks The geographic information system (GIS) community has shown interest in processing spatial queries in road networks, which form an integral aspect of geospatial applications, such as location-based services and locational analysis [1–3,5–7,10,11,22,23,39,40]. Previous works have studied a variety of spatial queries in road networks, which include shortest path queries [1–7], distance queries [2,3,5,8,9], range queries [3,10], and kNN queries [3,7,10,11]. The previous studies focused primarily on adopting sophisticated index structures to achieve good performance under the assumption that road networks are stable. For example, Sankaranarayanan et al. [3,4] proposed the SILC and path-coherent pairs decomposition frameworks based on spatial path coherence to determine the shortest path and the shortest distance between every pair of vertices. In particular, the SILC framework can support various queries in road networks including path queries, distance queries, range queries, and kNN queries. Because these frameworks have high pre-computational overhead and space complexity even in static road networks, they are not suitable for processing spatial queries in dynamic road networks where the weights of road segments are often updated. Recently, distance queries [8] and shortest path queries [1,6,21] have been actively studied in dynamic road networks. Techniques for distance queries and shortest path queries cannot be applied directly to processing ε-distance join queries in dynamic road networks owing to the inconsistent problem requirements. Recently, Arain et al. [24–26], Domenic et al. [27], Gustav et al. [28], Kamenyi et al. [29], and Memon et al. [30–34] developed novel techniques for location privacy preservation over road networks. These techniques focus on protecting the location privacy of moving objects in road networks. However, it is inappropriate to extend these location privacy techniques to solve the ε-distance join problem in dynamic road networks owing to differences in the problem definition. 2.2. ε-Distance Join Queries Given a query point q, a threshold distance ε, and a set of objects S, a range query retrieves objects that are within the threshold distance ε from the query point q. Several algorithms have been proposed to support range queries in road networks. Papadias et al. [10] proposed the range Euclidean restriction (RER) and range network expansion (RNE) algorithms for processing range queries in road networks. The ε-distance join query is another important query in road networks. Papadias et al. [10] proposed join Euclidean restriction (JER) and join network expansion (JNE) algorithms to support ε-distance join queries in road networks. The JER algorithm uses the Euclidean distance as a heuristic to search for object pairs within the threshold distance ε based on their network distance. Therefore, it performs poorly when the Euclidean distance and the network distance are very different, which is a common scenario in practice. If the edge weight is defined as the travel time in road networks, the Euclidean distance cannot confine the search space unless additional assumptions are made, such as assuming maximum speed. Therefore, JER is not appropriate for processing ε-distance join queries in dynamic road networks. Sankaranarayanan et al. [23] proposed a distance join algorithm that can support ε-distance join queries in road networks. The distance join algorithm, based on the SILC framework introduced in [3], exploits the pre-computation of the shortest path between each pair of vertices in a road network and stores all paths in a quad-tree. However, the distance join algorithm in [23] is not suitable for dynamic road networks as it suffers from excessive storage costs for large road networks. The ε-distance join queries are well studied in the Euclidean space [19,20] and in the metric space [13–18]. Nonetheless, all approaches are unsuitable for processing ε-distance join queries,


4 of 23

because they utilize some geometric properties (e.g., MBR [20], plane-sweep [20], and space-filling curve [41]) that are not available for dynamic road networks. As a means to process a large number of queries in database systems, the shared execution of queries has recently received much attention [35–38,42]. The key idea of shared query execution is to cluster similar queries (i.e., those that share some common execution path) into a group and then execute the group as a single query in the system. These shared execution methods are found to be effective in many applications involving high load conditions. In this study, we optimize the shared execution strategy to boost ε-distance join queries in dynamic road networks. Our proposed solution differs from existing studies in several aspects. First, our solution represents the first attempt to evaluate ε-distance join queries efficiently in dynamic road networks, where materialized index structures (e.g., SILC [3]) cannot be used. Second, our solution considers applying a shared execution strategy to avoid redundant network traversal while processing ε-distance join queries. Finally, our solution can be implemented easily using the best-known solutions (e.g., RNE [10]) to range queries in road networks, which is considered a desirable property in practice. 3. Preliminaries Section 3.1 defines the terms and notations that are used in this paper. Section 3.2 defines the ε-distance join query applied to road networks. 3.1. Definition of Terms and Notations Road network: We represent a road network by an undirected weighted graph h G = V, E, W i, where V, E, and W indicate the vertex set, the edge set, and the edge distance matrix, respectively. Each edge vi v j has a nonnegative weight representing the network distance, such as the travel time. Classification of vertices: We divide vertices into three categories based on their degree. (1) If the degree of a vertex is larger than or equal to 3, the vertex is referred to as an intersection vertex; (2) If the degree is 2, the vertex is an intermediate vertex; (3) If the degree is 1, the vertex is a terminal vertex. Vertex sequence and segment: A vertex sequence vl vl +1 . . . vm denotes a path between two vertices vl and vm , such that vl (vm ) is either an intersection vertex or a terminal vertex, and the other vertices in the path, vl +1 , . . . , vm−1 , are intermediate vertices. The length of a vertex sequence is the total weight of the edges in the vertex sequence. A part of a vertex sequence is referred to as a segment. By definition, a vertex sequence is also a segment. Table 1 summarizes the symbols used in this paper. Note that to simplify presentation, we use ri r j to denote ri ri+1 · · · r j , where adjacent outer objects ri , ri+1 , · · · , r j are located in the same vertex sequence. In this study, we employ the basic concept of clustering adjacent outer objects in a vertex sequence into an outer segment. The advantage of this clustering method is that if we evaluate two range queries at the two end points, ri and r j , of an outer segment ri r j , we can retrieve inner objects within distance ε from each outer object r ∈ ri r j without duplicating the network traversal. In other words, the two range queries at ri and r j are sufficient to retrieve inner objects within distance ε from all outer objects in ri r j , which is proved in Lemma 1. Figure 2 shows the difference between the distance and the segment length between two objects r and s in a road network, where the numbers at the edges indicate the distance between two adjacent points (e.g., dist(v1 , v2 ) = 6). The shortest path from r to s is denoted by r → v2 → v5 → s where the distance between r and s is dist(r, s) = 6 in this case. The segment connecting r and s in the same vertex sequence v2 v3 v4 v5 becomes rv3 v4 s with length equal to len(r, s) = 9. We recall that len(r, s) is defined for two objects located in the same vertex sequence.


5 of 23

Table 1. Symbols and their meaning. Symbol

Definition

ε Threshold distance εp Query distance at a point p such that ε p ≤ ε r Outer object r ∈ R s Inner object s ∈ S of the shortest path connecting two points p1 and p2 in the road network 5 of 23 p2 ) ISPRSdist Int.( pJ.1 ,Geo-Inf. 2018, 7, x FORLength PEER REVIEW Length of the segment connecting two points p1 and p2 , such that both p1 and p2 are len( p1 , p2 ) located in the same vertex sequence Figure 2 shows the difference between the distance and the segment length between two objects Vertex sequence where vl and vm are not intermediate vertices and the other vertices, v v . . . v m l l + 1 and in a road network, the numbers at vertices the edges indicate the distance between two vl +where intermediate 1 , are 1 , . . . , vm− ri ri+1 points . . . rj thatshortest consists ofpath outerfrom objects ri ,to ri+1 , . is . . ,denoted r j in a vertex ( Outer = 6). The bysequence → adjacent (e.g., , )segment → → Set of inner objects within distance ε from a point p, i.e., ε S ( ) wherep the distance betweenSε =and is , = 6 in this case. The segment connecting and {s|dist( p, s) ≤ ε for s ∈ S} p ( ,∀s)∈=S}9. We recall in the becomes top1 p2 for in a segment p1 pwith = {s|s ∈ p1 p2 ) vertex sequenceSet of inner objects S( p1 p2 ) equal S( same 2 , i.e., length ε of inner objects within distance ε from a point p query that returnsin a set that RNQ((ε,, p)) is defined forRange two objects located theS psame vertex sequence. Ω(v x ) Number of outer segments in vertex sequences adjacent to an intersection vertex v x 3 2

4

1

6

2

3

4

Figure 2. Example where dist(r, s) = 6 and len(r, s) = 9. ( , ) = 6 and ( , ) = 9. Figure 2. Example where

3.2. ε-Distance 3.2. -Distance Join Join Query Query We consider consider two two object object sets sets R and to to as We andS ininthe theroad roadnetwork networkG. For . Forconvenience, convenience,R is referred is referred the set of outer objects and S is referred to as the set of inner objects. Accordingly, r ∈ R and s ∈ S are as the set of outer objects and is referred to as the set of inner objects. Accordingly, ∈ and ∈ referred to as outer and inner objects, respectively. are referred to as outer and inner objects, respectively. Definition 1.1.n( -distance (ε-distance query): a threshold distance ε object and two R and = S, Definition joinjoin query): Given a threshold distance and two sets object andsets , where nGiven o o R, = r1 , r2 , = · · ·{ , r,| R| , ⋯and , s2 , · · · , join s|S| query, , the ε-distance query, denotedthe by set R ./ S, {where , ,⋯ , | |S}, = the s1-distance denoted byjoin ⋈ , returns of εall | | } and returns the set of all possible pairs of objects from R × S that are within the threshold distance ε: possible pairs of objects from × that are within the threshold distance :

( , ) ≤ for ∀( , ) ∈ × }. = {( , )| R ./⋈ ε S = {(r, s )| dist (r, s ) ≤ ε for ∀(r, s ) ∈ R × S }. is a subset of × . The -distance join query is commutative, i.e.,

⋈ = Naturally, ⋈ ⋈ . Naturally, R ./ε S is a subset of R × S. The ε-distance join query is commutative, i.e., R ./ε S = S ./Shared ε R. 4. Naive Execution for -Distance Join Queries in Road Networks 4. Naive Shared Execution for ε-Distance Join Queries in Road Networks 4.1. Grouping of Objects in a Vertex Sequence 4.1. Grouping Objects in Vertex Sequence Figure 3 of presents ana example of an -distance join query in a road network, which will be discussed throughout therejoin are query five outer through Figure 3 presentsthis an section. exampleInofthis anfigure, ε-distance in aobjects, road network, which, and will six be inner objects, through . Given a threshold distance = 5 and two object sets = discussed throughout this section. In this figure, there are five outer objects, r1 through r5 , and six {inner , , objects, , , }s and = { , , , , , } , we consider an -distance join query ⋈ of the 1 through s6 . Given a threshold distance ε = 5 and two object sets R = {r1 , r2 , r3 , r4 , r5 } ( , ) ≤an following form: )| consider 5 for ∀( , ) ∈join×query }. R ./ S of the following form: and S = {s , s , s ⋈ , s , s =, s{(},, we ε-distance 1

2

3

4

5

6

ε

R ./ε S = {(r, s)|dist(r, s) ≤ 5 for ∀(r, s) ∈ R × S}. 4

2

7 1

2 1

2

2 1

2

∈

inner object

∈

3

3 2

outer object

1

2

4

Figure 3. Example of an -distance join query in a road network.

4.1. Grouping of Objects in a Vertex Sequence Figure 3 presents an example of an -distance join query in a road network, which will be discussed throughout this section. In this figure, there are five outer objects, through , and six inner objects, through . Given a threshold distance = 5 and two object sets = { ISPRS , , Int. , , J. Geo-Inf. } and 2018, = {7, 270 , , , , , } , we consider an -distance join query ⋈ of the ( , ) ≤ 5 for ∀( , ) ∈ × }. following form: ⋈ = {( , )| 4

2

7 1

2

outer object

∈

inner object

∈

1

2

2 3

3

1

2

4

2

1

2

6 of 23

Figure 3. Example of an ε-distance join query in a road network.

Figure 3. Example of an -distance join query in a road network.

Figure 4 shows thethe sample grouping of of outer objects in in a avertex and Figure 4 shows sample grouping outer objects vertexsequence sequence andthe thesample sample grouping grouping inner objects in a vertex sequence. As shown in Figure twoouter outer objects objects r1 and and r2 in a vertex of innerofobjects in a vertex sequence. As shown in Figure 4a,4a,two insequence a vertex sequence are grouped an outer segment , and three outerobjects objects r ,, r ,, and r in a v1 v2 v3 are grouped into aninto outer segment r1 r2 , and three outer 3 4 5 and in a vertex sequence are grouped into another outer segment . Similarly, as vertexISPRS sequence v v v are grouped into another outer segment r r r . Similarly, as shown in Figure 4b, 3 7, x FOR PEER REVIEW 3 5 4 1 42018, Int. J. Geo-Inf. shown in Figure 4b, three inner objects , , and in a vertex sequence are grouped into 6 of 23

three inner objects s1 , s4 , and s5 in a vertex sequence v1 v2 v3 are grouped into an inner segment s1 s5 s4 , an inner , and two inner objects a vertex sequence grouped and two innersegment objects s2 and s3 in a vertex sequenceand v1 v4 v3inare grouped into anotherare inner segment another inner . Therefore, objects = { , , , into , } Ris = transformed s2 s3 . into Therefore, a set segment of outer objects R = {ar1set , r2of, router , r , r is transformed r r } { 3 4 5 1 2 , r3 r5 r4 }, into = { , } , where is the set of outer segments generated from the outer objects. where R is the set of outer segments generated from the outer objects. Similarly, a set of inner objects Similarly, a set of inner objects = { , , , , , } is transformed into ̅ = { , , }, S = {s1 , s2 , s3̅ , s4 , s5 , s6 } is transformed into S = {s1 s5 s4 , s2 s3 , s6 }, where S is the set of inner segments generality, we where is the set of inner segments generated from the inner objects. Without loss of generated from the inner objects. Without loss of generality, we assume that R ≤ S , because assume that | | ≤ | ̅| , because ⋈ = ⋈ . If | | > | ̅| , then ⋈ is evaluated instead of R ./ε S ⋈ = S. ./ε R. If R > S , then S ./ε R is evaluated instead of R ./ε S. 2

7 1

7

2

4 2

1

3

3

2

4

1

2

2

2 1

2

1

2

2

1

2

1

4

3

3 2

2

1

2

4

(b)

(a)

Figure 4. Grouping outer and inner objects (a) R = {r1 r2 , r3 r5 r4 } and R = 2; (b)̅ S = {s1 s5 s4 , s2 s3 , s6 } 4. Grouping outer and inner objects (a) = { , } and | | = 2; (b) = { , , } Figure and S and=| 3.̅| = 3.

4.2. Shared Execution Processing of GroupedOuter OuterObjects Objects 4.2. Shared Execution Processing of Grouped Our shared execution strategy for processing -distance join queries in networks road networks is Our shared execution strategy for processing ε-distance join queries in road is motivated motivated by the observation that at most two range queries are sufficient for retrieving inner objects by the observation that at most two range queries are sufficient for retrieving inner objects within the within the threshold distance from each outer object in an outer segment. This observation is threshold distance ε from each outer object in an outer segment. This observation is formalized in formalized in Lemma 1, which states a simple but important fact regarding the shared execution of Lemma 1, which states a simple but important fact regarding the shared execution of ε-distance join -distance join queries in road networks—if we evaluate two range queries at outer objects and , queries in road networks—if we evaluate two range queries at outer objects r and r j , which correspond which correspond to the two end points of an outer segment … , theni we can retrieve inner to theobjects two end points of an outer ri ri+1 . . .,r⋯j , ,then without we can retrieve inner within distance within distance fromsegment outer objects duplicating theobjects network traversal. ε from outer ri+1queries , · · · , rat duplicating theretrieving networkinner traversal. two range j−1 without Thus, theobjects two range and are sufficient for objects Thus, within the distance queries at the ri and are sufficient for inner objects within distance ε from the other outer from otherr jouter objects , ⋯ retrieving , . objects ri+1 , · · · , r j−1 . Lemma 1. For every outer object

Lemma 1. objects For every outer object inner within distance

∈

, it holds that

⊆

∪

( rfrom ∈ ran it holds that (Srε),⊆and Srεi ∪ object i r j ,outer

), where

∪ (

(

) is the set of

Srε)j ∪ r j of, where Srεi (Srεlocated ) is theinset is S theriset inner objects j

of

outerwithin segmentdistance (e.g., ∈ an outer in Figure inner the objects ε from object 4). ri (r j ), and S ri r j is the set of inner objects located in the

outer segment ri r j (e.g., s2 ∈ r3 r5 r4 in Figure 4).

Proof. We prove Lemma 1 by contradiction. We assume that

⊆

∪

∪ (

) does not hold,

Then this implies therethat is an ⊈ ∪ ∪ (1 by).contradiction. ∉ i.e., Proof.i.e., We prove Lemma We that assume Srε inner ⊆ Srεi object ∪ Srεj ∪ S ∈ri r j such doesthat not hold, ), i.e., ∉ , ∉ , and ∉ ( ). Clearly, ∉+ means that ε ε ∪ ε∪ ( ε +( , ε) > ε

Sr * Sri ∪ Sr j ∪ S ri r j . Then this implies that there is an inner object s ∈ Sr such that s ∈ / Sr i ∪ Sr j ∪ . Similarly, ∉ + means that + ( , )> . Clearly,+ ∉ ε ( ) means that is not located + ε ε + S ri r j , i.e., s ∈ / Sr i , s ∈ / Sr j , and s ∈ / S ri r j . Clearly, s ∈ / Sri means that dist(ri , s ) > ε. Similarly, in . Therefore, the shortest passes either or and the distance path from +to through + ε + + s ∈ / from Sr j means distdetermined r j , s > ε. / ( , ) +that (s , is not ), located ( , ) +in ri r(j . , Therefore, )} . ( , s )∈ to that is = S ri{r j means byClearly, ( , ) > and ( , ) > , which leads to ( , )> From the previous condition, both . Thus, cannot belong to , which contradicts the assumption that an inner object ∈ exists such that ⊈ ∪ ∪ ( ). Consequently, it holds that ⊆ ∪ ∪ ( ) for ∀ ∈ .□


7 of 23

the shortest path from r to s+ passes through either ri or r j and the distance from r to s+ is determined by dist(r, s+ ) = min len(r, ri ) + dist(ri , s+ ), len r, r j + dist r j , s+ . From the previous condition, both dist(ri , s+ ) > ε and dist r j , s+ > ε, which leads to dist(r, s+ ) > ε. Thus, s+ cannot belong to Srε , which contradicts the assumption that an inner object s+ ∈ Srε exists such that Srε * Srεi ∪ Srε j ∪ S ri r j . Consequently, it holds that Srε ⊆ Srεi ∪ Srε j ∪ S ri r j for ∀r ∈ ri r j . We determine the inner objects within distance ε from an outer object r ∈ ri r j among inner objects in ∪ Srεj ∪ S ri r j . First, we compute the distance from an outer object r ∈ ri r j to an inner object s ∈ Srεi ∪ Srε j ∪ S ri r j . If there exists a path r → ri → s (i.e., s ∈ Srεi ), then the distance from r to s is dist(r, s) = len(r, ri ) + dist(ri , s), as shown in Figure 5a. Similarly, if there exists a path r → r j → s (i.e., s ∈ Srε j ), then the distance from r to s is dist(r, s) = len r, r j + dist r j , s , as shown in Figure 5b. If theISPRS inner s is located ri r jREVIEW (i.e., s ∈ ri r j ), then the distance from r to s is dist(r, s) =7 oflen Int.object J. Geo-Inf. 2018, 7, x FORin PEER 23(r, s ), as shown in Figure 5c. Because dist(r, s) is the length of the shortest path among multiple paths Figure 5b. s, If dist the (inner is located in (i.e., ∈ ), then the distance from to is between r and r, s) isobject computed as follows. ( , )= ((, ), asshown in Figure 5c. Because ( , ) is the length of the shortest path min len(r, ri ) and + dist, (ri , s(), ,len r j + distasr jfollows. ,s if s ∈ / ri r j ) isr,computed among dist(multiple r, s) = paths between min len(r, ri ) + dist(ri , s), len r, r j + dist r j , s , len(r, s) otherwise Srεi

{ {

( , )=

( , )+ ( , )+

( , )+ ( , )+

( , ), ( , ),

( , )+ ( , )+

( , )

( , )} ( , ),

if ∉ ( , )} otherwise

( , )+

( , )

( , )+

( , )

( , )

( , )

( , )

(b)

(a) ( , ) ( , )

( , )

(c) Figure 5. Determination of the distance from r to s (a) If s ∈ Srεi , then dist(r, s) = len(r, ri ) + dist(ri , s); distance to (a) If ∈ , then ( , )= ( , )+ Figure 5. Determination of the from (b) If s ∈ (Srε,j , )then len(r, s(). , ) = ) = r j ,(s , ; )(c) ; (b)dist If (r,∈s) =, len then r, r j (+, dist + If s (∈ r, i r)j;, then (c) If dist∈(r, s), = then ( , ).

Table 2 explains how to compute the distance from r to s, where r ∈ ri r j and s ∈ Srεi ∪ Srε j ∪ S ri r j . Table 2 explains how to compute the distance from to , where ∈ and ∈ ∪ ∪ The inner object s belongs to a combination of Srεi , Srε j , and S ri r j , so seven possible cases are considered ), ( ). The inner object belongs to a combination ( so seven possible cases , and of , in total. Clearly, it is to retrieve a set to S retrieve ri r j ofainner that are located in rlocated i r j compared ) of inner are considered in trivial total. Clearly, it is trivial set (objects objects that are in ε (Sε ) of inner objects within distance ε from an outer object r (r ). with retrieving a set S j rj compared withri retrieving a set ( ) of inner objects within distance from ani outer object ( ).

Table 2. Computation of dist(r, s) for r ∈ ri r j and s ∈ Srεi ∪ Srεj ∪ S ri r j . Table 2. Computation of

Condition

Condition ε ε ∩S r r r i ∩ Sr j ∩ ( i j ) ∈s ∈ S∩ ε ε −S r r r i ∩ Sr j − ∈s ∈ S∩ ( i j ) ε − Srεj ri ∩ S ( ri r )j − ∈s ∈ S∩ s ∈ Srεj ∩ S ri r j − Srεi ∩ ( ) − ∈ ε ε ri − )) ∈s ∈ S− ( Srj∪∪ S( ri r j ε s ∈ Sr j − Srεi ∪ S ri r j )) ∈ − ( ∪ ( s ∈ S r r − Sε ∪ Sε ∈ ( i) j− ( ri∪ rj )

( ( ( ( ( ( ( (

( , ) for

∈

and

∈

∪

∪ (

).

dist(r, s)

, , , , , , ,

, ) n o , len(r, s) ) =dist(r, {s) = (min, nlen ) +(r, ri ) +(dist, (r),i , s), len ( , r, r)j ++ dist(r j , ,so ), ( , )} ) =dist(r, {s) = (min, len ( , r, r)j ++ dist (r j , ,s )} ) +(r, ri ) +(dist, (r),i , s), len dist(r, s) = min{len(r, ri ) + dist(ri , s), len(r, s)} )= { ( , n ) + ( , ), ( , )} o r , s , len(r, s) ) +r, r j +( dist ) =dist(r, {s) = (min, len , ), j ( , )} dist(r, s) = len(r, ri ) + dist(ri , s) )= ( , ) + ( , ) dist(r, s) = len r, r j + dist r j , s ( , ) )= ( , )+ dist(r, s) = len(r, s) )= ( , )

4.3. Naive EDISON Algorithm Algorithm 1 describes the naive EDISON algorithm, which employs the grouping of outer objects and shared execution to reduce query processing time. This algorithm involves two steps. In the first step, adjacent outer objects in a vertex sequence are grouped into an outer segment. Similarly,


8 of 23

4.3. Naive EDISON Algorithm Algorithm 1 describes the naive EDISON algorithm, which employs the grouping of outer objects and shared execution to reduce query processing time. This algorithm involves two steps. In the first step, adjacent outer objects in a vertex sequence are grouped into an outer segment. Similarly, adjacent inner objects in a vertex sequence are also grouped into an inner segment. In other words, R and S are transformed into R and S, respectively. For simplicity, we assume that R ≤ S . If R > S , we simply evaluate S ./ε R, because R ./ε S = S ./ε R. In the second step, the naive EDISON algorithm explores each outer segment ri r j sequentially to retrieve inner objects that are within distance ε from each outer object in the outer segment ri r j . Depending on the number of outer objects in ri r j , there are three possible cases, which the naive EDISON algorithm handles differently: (1) ri r j = 1; (2) ri r j = 2; and (3) ri r j > 2. If ri r j = 1, a range query RNQ(ε, ri ) is evaluated at ri because ri r j includes an outer object ri (= r j ) only. Then, a partial join result Φ(ri ) is simply obtained n o from the range query result Srεi at ri , i.e., Φ(ri ) = (ri , s) s ∈ Srεi . The partial join result Φ(ri ) is added to the ε-distance join query result Φ( R), i.e., Φ( R) = Φ( R) ∪ Φ(ri ) (lines 8-10). If ri r j = 2, two range queries, RNQ(ε, ri ) and RNQ ε, r j , are evaluated at ri and r j , respectively, because ri r j includes only two outer objects ri and r j . Then, two partial join results Φ(ri ) and Φ r j are simply n o obtained from the range query results Srεi and Srε j at ri and r j , respectively, i.e., Φ(ri ) = (ri , s) s ∈ Srεi o n and Φ r j = r j , s s ∈ Srε j . These partial join results Φ(ri ) and Φ r j are added to the ε-distance join query result Φ( R), i.e., Φ( R) = Φ( R) ∪ Φ(ri ) ∪ Φ r j (lines 11-14). Finally, if ri r j > 2, only two range queries, RNQ(ε, ri ) and RNQ ε, r j , are evaluated at ri and r j , respectively, similar to the case of ri r j = 2. According to Lemma 1, a partial join result Φ ri r j can be obtained from Srε ∪ Srε ∪ S ri r j i j using the shared execution method (lines 15-19), which is detailed in Lemma 2. This is because the partial join result Φ ri r j is the set of object pairs (r, s) that are within distance ε, where r ∈ ri r j n o and s ∈ Srεi ∪ Srε j ∪ S ri r j , i.e., Φ ri r j = (r, s) dist(r, s) ≤ ε for r ∈ ri r j and s ∈ Srεi ∪ Srε j ∪ S ri r j . Finally, the ε-distance join query result Φ( R) is returned after all outer segments have been processed S (line 20), i.e., Φ( R) = Φ ri r j for each outer segment ri r j ∈ R. In Lemma 2, we prove the correctness of the naive EDISON algorithm. Algorithm 1: Naive_EDISON (ε, R, S) Input: ε: threshold distance, R: set of outer objects, S: set of inner objects Output: R ./ε S: set of object pairs (r, s) ∈ R × S such that dist(r, s) ≤ ε 1 Φ( R) ← ∅ // Step 1: neighboring outer objects are grouped and neighboring 2 inner objects are also grouped. 3

R ← group_objects( R)

4

S ← group_objects(S) // For simplicity, we assume that R ≤ S . if R > S , we simply evaluate S ./ε R. // Step 2: ri r j ./ε S is evaluated for each outer segment ri r j ∈ R. for each segment ri r j ∈ R do outer if ri r j = 1 then

5 6 7 8 9 10 11

← RNQ(ε, ri ) Φ ( R ) ← Φ ( R ) ∪ Φ (ri ) else if ri r j = 2 then Srεi

Srεi ← RNQ(ε, ri ) Srεj ← RNQ ε, r j

12 13 14 15

Φ ( R ) ← Φ ( R ) ∪ Φ (ri ) ∪ Φ r j else if ri r j > 2 then Sri ← RNQ(ε, ri ) Srεj ← RNQ ε, r j Φ ri r j ← DJQ ε, ri r j , Srεi ∪ Srεj ∪ S ri r j Φ ( R ) ← Φ ( R ) ∪ Φ ri r j ε

16 17 18 19 20

return Φ( R)

// Φ( R) is the set of object pairs (r, s) ∈ R × S such that dist(r, s) ≤ ε.

// Outer objects in a vertex sequence are grouped into an outer segment. // Inner objects in a vertex sequence are grouped into an inner segment.

// ri r j = 1 means that ri r j contains an outer object ri only. // A range query is evaluated at ri and its result is Srεi . ε // Note that Φ(ri ) = (ri , s) s ∈ Sri . // ri r j = 2 means that ri r j contains two outer objects ri and r j . // A range query is evaluated at ri and its result is Srεi . // Another range query is evaluated at r j and its result is Srεj . o n // Note that Φ(ri ) = (ri , s) s ∈ Srεi and Φ r j = r j , s s ∈ Srεj . // ri r j ≥ 2 means that ri r j contains more than two outer objects. // A range query is evaluated at ri and its result is Srεi . // Another range query is evaluated at r j and its result is Srεj . // DJQ is explained in Algorithm 3. n o // Φ ri r j = (r, s) dist(r, s) ≤ ε for r ∈ ri r j , s ∈ Srεi ∪ Srεj ∪ S ri r j . // Φ( R) stores the result of R ./ε S.


9 of 23

Lemma 2. The naive EDISON algorithm is correct. Proof. We prove the correctness of the naive EDISON algorithm by cases. As shown in Algorithm 1, the naive EDISON algorithm handles the following three cases differently depending on the number of outer objects in an outer segment ri r j , namely, (1) ri r j = 1, (2) ri r j = 2, and (3) ri r j > 2. In the first two cases (i.e., ri r j = 1 and ri r j = 2), a range query is used to retrieve inner objects s within distance ε from each outer object r ∈ ri r j . This is same as a straightforward method used to answer ε-distance join queries by issuing a range query for each outer object r ∈ R. For ri r j = 1, the naive EDISON algorithm evaluates a range query RNQ(ε, ri ) for only an outer object ri , and adds an object pair (ri , s) to the ε-distance join query result for each inner object s ∈ Srεi . Therefore, the naive EDISON r r = 1 because RNQ ε, r simply retrieves inner objects s within distance ε9 of 23 algorithm is correct ISPRS Int. J. Geo-Inf.for 2018, i 7,j x FOR PEER REVIEW ( i ) from ri . Similarly, for ri r j = 2, the naive EDISON algorithm evaluates two range queries, RNQ(ε, ri ) ( , r) to algorithm addsouter an object pair the -distance query resultalgorithm for each inner and RNQ ε, r j then , for two objects ri and Thejoin naive EDISON then object adds ∈ j , respectively. ε ( ) to , and an object pair , the -distance join query result for each inner object ∈ . an object pairadds to the ε-distance join query result for each inner object s ∈ S , and adds an object r , s ( i ) ri ε pair Therefore, r j , s to thethe ε-distance join query result foriseach innerfor object s = ∈S . Therefore, the EDISON( , ) naive algorithm correct 2r jbecause ( ,naive ) and EDISON algorithm is correct for ri r j within = 2 because RNQfrom RNQ ε, r j retrieve inner objects s within (ε, ri ) and retrieve inner objects distance and , respectively. distance ε from ri and r j , respectively. The proof for the correctness of the naive EDISON algorithm for > 2 differs from that for r r j >shared The proof for the correctness of the naive EDISON algorithm for 2 differs from that execution strategy = 1 and = 2 because the naive EDISON algorithm exploitsi the for rfor and ri r j that = 2the because the naive algorithm EDISON algorithm exploits the shared execution( , ) > 2. Note naive EDISON evaluates only two range queries i rj = 1 strategy 2. Note >that the key naive EDISON algorithm and for r(i r j, > ) for idea of the proof of the evaluates correctnessonly for two range > 2 isqueries to compute 2. The RNQthe ε, r j anfor ri r jobject > 2. The idea of the proof the correctness for ri r j >a 2range (ε, rdistance i ) and RNQ between outer andkey each candidate inner of object without evaluating is to query compute distance ∈ between and ∪ each innertoobject s without ) . According ( candidate for the, where − { an , }outer and object ∈ r∪ Lemma 1, we can ε ∪ Sε ∪ S r r . According to evaluating a range query for r, where r ∈ r r − r , r and s ∈ S i j i j outer object ri ∈ r j among i j the candidate inner retrieve inner objects within distance from every Lemma 1, we can retrieve inner objects within distance ε from every outer object r ∈ ri r j among the an ( , ) , determines objects ∈ between and , whether ∪ ∪ ( ) . The distance candidate inner objects s ∈ Srεi ∪ Srε j ∪ S ri r j . The distance between r and s, dist(r, s), determines object pair ( , ) is included in the -distance join query result. Let us compute the distance between whether an object pair (r, s) is included in the ε-distance join query result. Let us compute the distance an outer object and candidate inner object . To do so, we consider the two cases, ∉ and ∈ between an outer object r and candidate inner object s. To do so, we consider the two cases, s ∈ / ri r j , as shown in Figure 6a,b, respectively. If ∉ , the distance from to is evaluated as and s ∈ ri r j , as shown in Figure 6a,b, respectively. If s ∈ / ri r j , the distance from r to s is evaluated ( , )= { ( , )+ ( , ), ( ( , ) + , )} because there are two possible paths as dist(r, s) = min len(r, ri ) + dist(ri , s), len r, r j + dist r j , s because there are two possible paths from to , i.e., → → and → → ; otherwise, the distance from to is evaluated as from r to s, i.e., r → ri → s and r → r j → s ; otherwise, the distance from r to s is evaluated as ( , ), ( , )} because there are three ( , ) = { ( , )+ ( , ), ( , ) + dist(r, s) = min len(r, ri ) + dist(ri , s), len r, r j + dist r j , s , len(r, s) because there are three possible ( , ) ≤ , an object pair paths from , i.e., → → , → → , and → . If pathspossible from r to s, i.e., r → rto i → s , r → r j → s , and r → s . If dist (r, s ) ≤ ε, an object pair (r, s ) is ( , )inisthe included in the joinotherwise, query result; pair ( , ) isTherefore, not included. included ε-distance join -distance query result; theotherwise, object pairthe included. (r, sobject ) is not is correct for Therefore, the naive EDISON algorithm > 2. Consequently, the naive EDISON the naive EDISON algorithm is correct for ri r j > 2. Consequently, the naive EDISON algorithm is algorithm is correct for = 1, = 2, and > 2. □ correct for ri r j = 1, ri r j = 2, and ri r j > 2.

( , )

( , )

(a)

(b)

Computation of the distance from r ∈ r r to s (a) If s ∈ / ri r j , n o i j dist(r,Figure s) = 6. min len(r, ri ) + r, r j + ∈dist rto ; If (b) ∉ If s , ∈ (ri r, j , ) = dist(r, {s) (=, ) + Computation ofdist the(rdistance i , s ), len from j , s (a) n o ), ( ) ( )} ( ) ( ) ( , ), ( , )+ ( (b) rIf, s , ∈len(r,, s) . , = { , + min len(r, r,i ) + dist(,ri , s+ ), len r, r,j +; dist j ( , ), ( , )}.

Figure 6.

4.4. Evaluation of an Example -Distance Join Query Using the Naive EDISON Algorithm We discuss how to evaluate the -distance join query in Figure 3 using the naive EDISON algorithm. Recall that = 5 , = { , , , , } , and = { , , , , , } are given and that outer objects through are grouped into outer segments and , as shown in Figure 4. Table 3 summarizes the computation of ⋈ for the naive EDISON algorithm.


10 of 23

4.4. Evaluation of an Example ε-Distance Join Query Using the Naive EDISON Algorithm We discuss how to evaluate the ε-distance join query in Figure 3 using the naive EDISON algorithm. Recall that ε = 5, R = {r1 , r2 , r3 , r4 , r5 }, and S = {s1 , s2 , s3 , s4 , s5 , s6 } are given and that outer objects r1 through r5 are grouped into outer segments r1 r2 and r3 r5 r4 , as shown in Figure 4. Table 3 summarizes the computation of R ./ε S for the naive EDISON algorithm. Table 3. Computation of R ./ε S using the naive EDISON algorithm. ri rj

ri

rj

S”ri

S”rj

S ri rj

n

r1 r2 r3 r5 r4

r1 r3

r2 r4

Srε1 = {s1 , s5 } Srε3 = {s6 }

Srε2 = {s1 , s6 } Srε4 = {s2 , s3 }

S (r1 r2 ) = ∅ S (r3 r5 r4 ) = { s2 }

∅ {r5 }


ri+1 , . . . , rj−1

o

S”ri ∪S”rj ∪S ri rj

{ s1 , s5 , s6 } { s2 , s3 , s6 }

10 of 23

Because Algorithm 1 processes outer segments one by one, one, we we determine determine inner inner objects objects within within the threshold distance ε from each outer object r ∈ r r followed by r r r . Two range queries threshold distance from each outer object ∈ 1 2 followed by 3 5 4 . range queries ε RNQ(5, (5, r1)) and RNQ(5, (5, r2)) are evaluated for an outer outer segment segment r1 r2 ,, whose whoseresults resultsare areSr1 =={{s1 ,,s5 }} ε and respectively. Because Because |r1 r2 | = }, respectively. and Sr2 = = 2 holds, holds, we we can can simply simply obtain obtain the the partial partial join join result result = {{s1 , s6 }, , r , s . , r , s , r , s ∪ Φ r = r , s Φ r r for r r , i.e., Φ r r = Φ r ) ( )} ) ( ) ( ) ( ) ( ) ( ) ( {( Φ( 1 2 ) for 1 2 , i.e., Φ(1 2 ) = Φ( 1 ) ∪ Φ( 2 ) = {( 1, 1), ( 1, 5), ( 2, 1), ( 2, 6)}. Similarly, (5, r3 )) and (5, r4)) are Similarly, two two range range queries queries RNQ(5, and RNQ(5, are evaluated evaluated for for an an outer outer segment segment ε = { s }, r3 r5 r4 ,, whose whoseresults resultsare are Srε3=={ {}s6 and = s , s , respectively. Because we have S } and Srε= { } 2 3 }, respectively. Because we have = {r3 }, 6 = 4 { , S{ rε4 , =},{and s2 , s3 }(, and )S= r4 we , we can compute the distance ε from r ∈ r each (r{3 r5 }, ) =can {scompute } 2 5 3 r5 r4 to inner the distance from ∈ to each candidate ε ε candidate Sr 4 ∪ S(r3 r5 rinner can retrieve inner objects 4 ) and r3 ∪ can object ∈ inner retrieve objects within distance fromwithin ∪ object ∪ ( s ∈ )Sand . Fordistance this, we εneed fromtor5compute . For this, we need to compute the distance from r to each candidate inner object 5 to each candidate inner object ∈ { , , }. Because ∈ the distance from s ∈∩ {s(2 , s3 , s)6 }−. Because s2 ∈ to Srε4 ∩ S(r3 r53, r4 ) the − Srε3distance according to Table to 3, the distance 5 (from according Table from is , ) r= to s2{ is (dist r , s = min len r , r + dist r , s , len r , s = min 10, 4 = 4, as shown in ( ) ( ) ( ) ( )} { { } 5 2 5 2 5 2 4 4 ( , ), ( , )} = min{10,4} = 4, as , )+ shown in Figure 7a. Similarly, because Figure s3 ∈ Srε4to− Table Srε3 ∪ S3,(r3the r5 r4 )distance according to Table )) according )= ( , from from ∈ 7a. − ( Similarly, ∪ ( because to 3, theis distance r5 to s is dist r , s = len r , r + dist r , s = 10, as shown in Figure 7b. Finally, because ( ) ( ) ( ) 3 5 3 5 3 4 4 ( , ) = 10, as shown in Figure 7b. Finally, because )) ( , )+ ∈ −( ∪ ( s6 ∈ Srε3 − Srε4 ∪ S(r1 r5 r4 ) according to Table 3, the distance = (r)5 ,=s66) , as ) +s6 is (dist ( , ) =from( r5, to according to Table 3, the distance from to is , in Figure 7c.( Thus, Sε5 = {s(2 } given that dist(r5 , (s2 ) =) 4, len(r5 , r3 ) + dist(r3 , s6 ) = 6, as shown shown in Figure 7c. Thus, = = { } given that , ) = r4, , ) = 10, and , dist(r5 , s3 ) = 10, and dist(r5 , s6 ) = 6. The partial join result Φ(r3 r5 r4 ) for r3 r5 r4 is obtained by ) for 6. The partial join result Φ( is obtained by taking the union of the three range query taking the union of the three range query results at r3 , r4 , and r5 , i.e., Φ(r3 r5 r4 ) = Φ(r3 ) ∪ Φ(r4 ) ∪ ) = Φ( ) ∪ Φ( ) ∪ Φ( ) = {( , ), ( , ), ( , ), ( , )} results at , , and , i.e., Φ( Φ(r5 ) = {(r3 , s6 ), (r4 , s2 ), (r4 , s3 ), (r5 , s2 )} where Φ(r3 ) = {(r3 , s6 )}, Φ(r4 ) = {(r4 , s2 ), (r4 , s3 )}, where Φ( ) = {( , )}, Φ( ) = {( , ), ( , )}, and Φ( ) = {( , )}. Consequently, we have the and Φ(r5 ) = {(r5 , s2 )}. Consequently, we have the result Φ( R) of the ε-distance join query, i.e., ) ∪ Φ( )= result Φ( ) of the -distance join query, i.e., Φ( ) = Φ( Φ( R) = Φ(r1 r2 ) ∪ Φ(r3 r5 r4 ) = {(r1 , s1 ), (r1 , s5 ), (r2 , s1 ), (r2 , s6 ), (r3 , s6 ), (r4 , s2 ), (r4 , s3 ), (r5 , s2 )}. {( , ), ( , ), ( , ), ( , ), ( , ), ( , ), ( , ), ( , )}. 13

12

12

10

10

6

6 4

3 2

4

(a)

3

3

4

7

2

(b)

7

2

(c)

Figure 7. Computation of the distance from r5 to s ∈ {s2 , s3 , s6 } (a) dist(r5 , s2 ) = 4; (b) dist(r5 , s3 ) = 10; ( , ) = 4; (b) ( , )= Figure of the distance from to ∈ { , , } (a) (c) dist(7. r5 ,Computation s6 ) = 6. ( , ) = 6. 10; (c)

5. Optimal Shared Execution for -Distance Join Queries in Road Networks 5.1. Extending Shared Execution Processing to Adjacent Outer Segments We can extend the shared execution processing to adjacent outer segments. We call this shared execution processing the EDISON method. To this end, we investigate the number of outer segments that belong to adjacent vertex sequences of an intersection vertex. Let Ω( ) be the number of outer


11 of 23

5. Optimal Shared Execution for ε-Distance Join Queries in Road Networks 5.1. Extending Shared Execution Processing to Adjacent Outer Segments We can extend the shared execution processing to adjacent outer segments. We call this shared execution processing the EDISON method. To this end, we investigate the number of outer segments that belong to adjacent vertex sequences of an intersection vertex. Let Ω(v x ) be the number of outer segments in vertex sequences adjacent to an intersection vertex v x . If Ω(v x ) = 0, as shown in Figure 8a, no range query is issued at v x . If Ω(v x ) = 1, as shown in Figure 8b, a range query RNQ(ε, ri ) is issued at ri . If Ω(v x ) ≥ 2, as shown in Figure 8c, a range query RNQ(ε vx , v x ) is issued at v x , where Ω(v x ) = n and ε vx = ε − min len v x , ri1 , len v x , ri2 , · · · , len(v x , rin ) . Naturally, if ε vx ≤ 0, no range query at v x isInt. issued. In 2018, summary, Ω(vREVIEW ISPRS J. Geo-Inf. 7, x FORifPEER 11 of 23 x ) = 1, the same shared execution as the naive EDISON algorithm is applied to an outer segment, and if Ω v ≥ 2, an extended shared execution is applied to outer ( ) x ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 11 of 23 segments adjacent to v x .

(a)

(b)

(c)

(a)

(b)

(c)

Figure 8. Investigating the number of outer segments adjacent to an intersection vertex (a) Figure Investigating the(c)number outer segments adjacent to an intersection vertex v x (a) Ω(v x ) = 0; ) = 8. ) = 1; ) of Ω( 0; (b) Ω( Ω( ≥ 2. Figure 8. Investigating the number of outer segments adjacent to an intersection vertex (a) (b) Ω(v x ) = 1; (c) Ω(v x ) ≥ 2. Ω( ) = 0; (b) Ω( ) = 1; (c) Ω( ) ≥ 2.

Returning to the example in Figure 3, we reevaluate the -distance join query to demonstrate the effectiveness ofthe considering theFigure shared3, of adjacent outer segments. To to answer the Returning to to the example in in Figure 3,execution we reevaluate reevaluate the ε-distance join query query to demonstrate Returning example we the -distance join demonstrate distance join query, the naive EDISON method would evaluate four range queries (5, ) ,the effectiveness of considering the shared execution of adjacent outer segments. To answer the effectiveness of considering the shared execution of adjacent outer segments. To answer thethe (5, ), (5, ),the and ) whereas the would EDISON methodfour only range evaluates the range query ε-distance joinquery, query, thenaive naive(5, EDISON method would evaluate evaluate four range queries queries RNQ(5, (5, r1)),, distance join EDISON method (4, ) at the intersection vertex . This occurs because there are two intersection vertices RNQ 5, r , RNQ 5, r , and RNQ 5, r whereas the EDISON method only evaluates the range query ( ) ( ) ( ) 2 (5, ), (5, ),3 and (5, ) 4whereas the EDISON method only evaluates the range query ) ) and = 2 , Ω( = 2 , , as shown in Figure 4a and we have Ω( = − RNQ(4, at the the intersection there areare twotwo intersection vertices v1 and (4, v3)) at intersection vertex vertexv3 . .This Thisoccurs occursbecause because there intersection vertices ), ( , 4a)}and {12,7} (, len ( ,, as = − −22, ,Ω 5 −− − , (v),1), r=1 )2 v3 , {as shown in Figure have Ω4a(v1= = v3 )have = 2,=ε vΩ( = )ε − r= = )and 1, = 4 )} 1 and = {min 2 , ({len Ω( ,, (v)} shown in 5we Figure we(so 5 −{2,1} min 12, 7 = − 2, so ε = ε − min len v , r , len v , r = 5 − min 2, 1 = 4. ( ) ( )} { } { { } 2 , so 3 3 = − ( , )}v3= 5 − {12,7} =3 −2 ( , )} = 5 − { (= 4. { ( , ), , ), We present a simple heuristic where no range queries close to terminal vertices are issued. We present a simple heuristic where no range queries close to terminal vertices are issued. As {2,1} = 4. As shown in Figure 9, the example network has one intersection vertex v and three terminal vertices shown in Figure 9, the example network has one intersection vertex and three terminal vertices x We present a simple heuristic where no range queries close to terminal vertices are issued. As v , v , and v this figure, the naive EDISON method evaluates two range queries RNQ r)i )and and , , and . In this figure, the naive EDISON method evaluates two range queries ( (,ε,vertices 3 5 4 shown in Figure 9, the example network has one intersection vertex and three terminal RNQ ε, r to answer the ε-distance join query. However, the EDISON method evaluates only the ( , ) to answer the -distance join query. However, the EDISON method evaluates only the j , , and . In this figure, the naive EDISON method evaluates two range queries ( , ) and ε ε ∪S r r = range query RNQ ε, r to answer the same ε-distance join query. This is because S S ( ) ) ( range query ( , = ) to answer the same -distance join query. This is because ∪ i i j the ri rj ( , ) to answer the -distance join query. However, the EDISON method evaluates only ε Sri∪∪ S(query ri r j v)3 holds, holds, and sufficient to evaluate thejoin range query rather than the two isissufficient to the range query ( (,ε, r)i )rather )= ( two range ( , and ) toititanswer the same -distance query. ThisRNQ is because ∪ than ∪ the evaluate range queries RNQ ε, r and RNQ ε, r . ( ) range queries ( , ) and ( , ). i j ) holds, and it is sufficient to evaluate the range query ( , ) rather than the two ∪ ( range queries

( , ) and

( , ).

Figure 9. Simple heuristic Srεi ∪ Srεj ∪ S ri r j = Srεi ∪ S(ri v3 ).

Figure 9. Simple heuristic

5.2. EDISON Algorithm

Figure 9. Simple heuristic

∪ ∪

∪ ( ∪ (

)= )=

∪ ( ∪ (

). ).

5.2. EDISON Algorithm Algorithm 2 describes the EDISON algorithm based on shared execution to avoid redundant | | ≤ | ̅| , rangeAlgorithm queries while processing -distance join queries. we assume that redundant 2 describes the EDISON algorithm basedFor onsimplicity, shared execution to avoid ), the number of outer because ⋈ while = ⋈processing . The EDISON algorithm examines | | ≤ | ̅| , range queries -distance join queries. For Ω( simplicity, we assume that segments


12 of 23

5.2. EDISON Algorithm Algorithm 2 describes the EDISON algorithm based on shared execution to avoid redundant range queries while processing ε-distance join queries. For simplicity, we assume that R ≤ S , because R ./ε S = S ./ε R. The EDISON algorithm examines Ω(v x ), the number of outer segments adjacent to each intersection vertex v x , and applies the extended shared execution to outer segments adjacent to v x if Ω(v x ) ≥ 2. If Ω(v x ) < 2, then no range query is issued at v x . If Ω(v x ) ≥ 2 and ε ε vx > 0, then a range query RNQ(ε vx , v x ) is issued and its result is saved to Svvxx , where ε vx = ε − min len v x , ri1 , len v x , ri2 , · · · , len(v x , rin ) , assuming that outer segments ri1 r j1 , ri2 r j2 , · · · , rin r jn are adjacent to v x and that rik is closer to v x than r jk for 1 ≤ k ≤ n (lines 8-11). Next, for each outer segment ri r j ∈ R, we retrieve a set of object pairs (r, s) within distance ε from each outer object r ∈ ri r j . We assume that ri r j is close to vl (vm ) for an outer segment ri r j ∈ vl vm . We consider the following four cases depending on the values of Ω(vl ) and Ω(vm ), where vl vm is a vertex sequence containing an outer segment ri r j : (1) Ω(vl ) = Ω(vm ) = 1; (2) Ω(vl ) = 1 and Ω(vm ) ≥ 2; (3) Ω(vl ) ≥ 2 and Ω(vm ) = 1; and (4) Ω(vl ) ≥ 2 and Ω(vm ) ≥ 2. If Ω(vl ) = 1 and Ω(vm ) = 1, then inner objects within distance ε from each outer object r ∈ ri r j are retrieved among the candidate inner objects in Srεi ∪ Srε j ∪ S ri r j (lines 13-16). If Ω(vl ) = 1 and Ω(vm ) ≥ 2, then inner objects within distance ε from each outer object r ∈ ri r j are retrieved among the candidate inner objects ε in Srεi ∪ Svvmm ∪ S(ri vm ) (lines 17-19). If Ω(vl ) ≥ 2 and Ω(vm ) = 1, then inner objects within distance ε εv from each outer object r ∈ ri r j are retrieved among the candidate inner objects in Svl l ∪ Srε j ∪ S vl r j (lines 20-22). Finally, if Ω(vl ) ≥ 2 and Ω(vm ) ≥ 2, then inner objects within distance ε from each outer εv ε object r ∈ ri r j are retrieved among the candidate inner objects in Svl l ∪ Svvmm ∪ S(vl vm ) (lines 23-24). A partial join result Φ ri r j for ri r j is added to the query result Φ( R). Finally, the ε-distance join query result Φ( R) is returned after all outer segments have been processed (line 26), i.e., Φ( R) = ∪Φ ri r j for each outer segment ri r j ∈ R. Algorithm 2: EDISON (ε, R, S) Input: R: set of outer objects, S: set of inner objects Output: R ./ε S: set of object pairs (r, s) ∈ R × S such that dist(r, s) ≤ ε 1 Φ( R) ← ∅ // Φ( R) is the set of object pairs (r, s) ∈ R × S such that dist(r, s) ≤ ε. 2 // Step 1: neighboring outer objects are grouped and neighboring inner objects are also grouped. 3 R ← group_objects( R) // Outer objects in a vertex sequence are grouped into an outer segment. S ← group_objects(S) // Inner objects in a vertex sequence are grouped into an inner segment. 4 5 // For simplicity, we assume that R ≤ S . if R > S , we simply evaluate S ./ε R. // Step 2: ri r j ./ε S is evaluated by extending shared execution processing to adjacent outer segments. 6 7 for each intersection vertex v x do 8 if Ω(v x ) ≥ 2 then // Ω(v x ) ≥ 2 means that more than two outer segments are adjacent to v x . 9 ε vx ← ε − min len v x , ri1 , len v x , ri2 , · · · , len v x , rin // assume that Ω(v x ) = n. 10 if ε vx > 0 then // If ε vx ≤ 0, then no range query is evaluated at v x . ε 11 Svvxx ← RNQ(ε vx , v x ) // Otherwise, a range query RNQ(ε vx , v x ) is evaluated at v x . for each outer segment ri r j ∈ vl vm // Assume that ri (r j ) is close to vl (vm ). 12 do 13 if Ω(vl ) = 1 and Ω(vm ) = 1 then 14 Srεi ← RNQ(ε, ri ) // A range query is evaluated at ri because no range query is issued at vl . 15 16 17 18 19 20 21 22 23 24 25 26

Srεj ← RNQ ε, r j Φ ri r j ← DJQ ε, ri r j , Srεi ∪ Srεj ∪ S ri r j

else if Ω(vl ) = 1 and Ω(vm ) ≥ 2 then ← RNQ Srε (ε, ri ) i

Φ ri r j ← DJQ ε, ri r j , Srεi ∪ Svvmm ∪ S(ri vm ) ε

// A range query is evaluated at r j because no range query is issued at vm .

else if Ω(vl ) ≥ 2 andΩ(vm ) = 1 then

Srεj ← RNQ ε, r j εv Φ ri r j ← DJQ ε, ri r j , Svl l ∪ Srεj ∪ S vl r j

else ifΩ(vl ) ≥ 2 and Ω(vm ) ≥ 2 then εv

Φ ri r j ← DJQ ε, ri r j , Svl l ∪ Svvmm ∪ S(vl vm ) Φ ( R ) ← Φ ( R ) ∪ Φ ri r j

return Φ( R)

ε

// A range query is evaluated at ri because no range query is issued at vl . ε

// Svvmm is reused by outer segments adjacent to vm . // A range query is evaluated at r j because no range query is issued at vm . εv

// Svl l is reused by outer segments adjacent to vl .

εv

ε

// Svl l (Svvmm ) is reused by outer segments adjacent to vl (vm ). // A partial join result Φ ri r j for ri r j is added to Φ( R). // Φ( R) stores the result of R ./ε S.

Algorithm 3 retrieves a set Φ ri r j of object pairs (r, s) within distance ε, where r ∈ ri r j and εβ s ∈ Sαε α ∪ S β ∪ S αβ . First, Φ ri r j is initialized to the empty set. According to the condition of an


13 of 23

εβ inner object s ∈ Sαε α ∪ S β ∪ S αβ , the distance from r to s is computed (lines 4-11). Note that dist(r, s) in Algorithm 3 may not necessarily be the length of the shortest path from r to s, as discussed in Section 5.3. If dist(r, s) ≤ ε, then the object pair (r, s) is added to the partial join result Φ ri r j (lines 12-14). Clearly, Φ ri r j is returned after all candidate object pairs (r, s) have been examined (line 15). In Lemma 3, we prove the correctness of the EDISON algorithm. ε Algorithm 3: DJQ ε, ri r j , Sαε α ∪ S ββ ∪ S αβ Input: ε: threshold distance, ri r j : outer segment, ε Sαε α ∪ S ββ ∪ S αβ : set of candidate inner objects for outer objects in ri r j

Output: Φ ri r j : set of object pairs (r, s) such that ε dist(r, s) ≤ ε for r ∈ ri r j and s ∈ Sαε α ∪ S ββ ∪ S αβ // Φ ri r j is the set of object pairs Φ ri r j ← ∅(r, s) within distance ε where 1 ε r ∈ ri r j and s ∈ Sαε α ∪ S ββ ∪ S αβ . 2 3

4 5 6 7 8 9 10 11 12 13 14 15

for each outer object r ∈ ri r j do ε for each inner object s ∈ Sαε α ∪ S ββ ∪ S αβ do // dist(r, s) is computed according to the condition of s. ε if s ∈ Sαε α ∩ S ββ ∩ S αβ then dist(r, s) ← min{len(r, α) + dist(α, s), len(r, β) + dist( β, s), len(r, s)} ε else if s ∈ Sαε α ∩ S ββ − S αβ then dist(r, s) ← min{len(r, α) + dist(α, s), len(r, β) + dist( β, s)} εβ εα else if s ∈ Sα ∩ S αβ − S β then dist(r, s) ← min{len(r, α) + dist(α, s), len(r, s)} ε else if s ∈ S ββ ∩ S αβ − Sαε α then dist(r, s) ← min{len(r, β) + dist( β, s), len(r, s)} ε β εα else if s ∈ Sα − S β ∪ S αβ then dist(r, s) ← len(r, α) + dist(α, s) ε else if s ∈ S ββ − Sαε α ∪ S αβ then dist(r, s) ← len(r, β) + dist( β, s) ε else if s ∈ S αβ − Sαε α ∪ S ββ then dist(r, s) ← len(r, s) // If dist(r, s) ≤ ε then an object pair (r, s) is involved in the query result. if dist(r, s) ≤ ε then

Φ ri r j ← Φ ri r j ∪ {(r, s)}

return Φ ri r j

// A partial join result Φ ri r j for ri r j is returned.

Lemma 3. The EDISON algorithm is correct. Proof. We prove the correctness of the EDISON algorithm by cases. As shown in Algorithm 2, the EDISON algorithm handles the following four cases differently depending on the values of Ω(vl ) and Ω(vm ) of a vertex sequence vl vm containing an outer segment ri r j , namely, (1) Ω(vl ) = Ω(vm ) = 1; (2) Ω(vl ) = 1 and Ω(vm ) ≥ 2; (3) Ω(vl ) ≥ 2 and Ω(vm ) = 1; and (4) Ω(vl ) ≥ 2 and Ω(vm ) ≥ 2. For Ω(vl ) = Ω(vm ) = 1, two range queries, RNQ(ε, ri ) and RNQ ε, r j , are evaluated at ri and r j , respectively. According to Lemma 1, we can retrieve the inner objects within distance ε from every outer object r ∈ ri r j among the candidate inner objects s ∈ Srεi ∪ Srε j ∪ S ri r j . The EDISON algorithm computes the distance dist(r, s) from an outer object r to each candidate inner object s ∈ Srεi ∪ Srε j ∪ S ri r j . Because dist(r, s) is the length of the shortest path among three possible paths (i.e., r → ri → s , r → r j → s , and r → s if s ∈ ri r j ), it is determined simply depending on the conditions listed in Algorithm 3. Specifically, if s ∈ / ri r j , then dist(r, s) = min len(r, ri ) + dist(ri , s), len r, r j + dist r j , s ; otherwise, dist(r, s) = min len(r, ri ) + dist(ri , s), len r, r j + dist r j , s , len(r, s) . If dist(r, s) ≤ ε, an object pair (r, s) is included in the ε-distance join query result; otherwise, the object pair (r, s) is not included. Therefore, the EDISON algorithm is correct for Ω(vl ) = Ω(vm ) = 1. For Ω(vl ) = 1 and Ω(vm ) ≥ 2, two range queries, RNQ(ε, ri ) and RNQ(ε vm , vm ), are evaluated at ri and vm , respectively, where ε vm = ε − min len vm , r j1 , len vm , r j2 , · · · , len vm , r jn , assuming that the outer segments ri1 r j1 , ri2 r j2 , · · · , rin r jn are adjacent to vm , and that r jk is closer to vm than rik ε for 1 ≤ k ≤ Ω(vm ). Because ri r j ∈ ri1 r j1 , ri2 r j2 , · · · , rin r jn and ε vm ≥ ε0 , set Svvmm of the inner objects 0 within distance ε vm from vm contains set Svε m of the inner objects within distance ε0 from vm , where 0 ε ε0 = ε − len vm , r j , i.e., Svε m ⊆ Svvmm . According to Lemma 1, we can retrieve inner objects within ε distance ε from every outer object r ∈ ri r j among the candidate inner objects s ∈ Srεi ∪ Svvmm ∪ S(ri vm ) ε because Srεi ∪ Svvmm ∪ S(ri vm ) contains Srεi ∪ Srε j ∪ S ri r j . The EDISON algorithm computes the distance


14 of 23

ε

dist(r, s) from an outer object r to each candidate inner object s ∈ Srεi ∪ Svvmm ∪ S(ri vm ). Because dist(r, s) is the length of the shortest path among three possible paths (i.e., r → ri → s , r → vm → s , and r → s if s ∈ ri vm ), it is determined simply depending on the conditions listed in Algorithm 3. Specifically, if s ∈ / ri vm , then dist(r, s) = min{len(r, ri ) + dist(ri , s), len(r, vm ) + dist(vm , s)}; otherwise, dist(r, s) = min{len(r, ri ) + dist(ri , s), len(r, vm ) + dist(vm , s), len(r, s)}. If dist(r, s) ≤ ε, an object pair (r, s) is included in the ε-distance join query result; otherwise, the object pair (r, s) is not included. Therefore, the EDISON algorithm is correct for Ω(vl ) = 1 and Ω(vm ) ≥ 2. Without loss of generality, the proof of the correctness of the EDISON algorithm for Ω(vl ) ≥ 2 and Ω(vm ) = 1 can be simply obtained by interchanging the roles of vl and vm , as well as the roles of ri and r j , in the proof for Ω(vl ) = 1 and Ω(vm ) ≥ 2, which is omitted. Finally, for Ω(vl ) ≥ 2 and Ω(vm ) ≥ 2, two range queries, RNQ ε vl , vl and RNQ(ε vm , vm ), are evaluated at vl and vm , respectively, where ε vl = ε − min{len(vl , r a1 ), · · · , len(vl , r an )} and ε vm = ε − min len vm , rd1 , · · · , len(vm , rdn ) . It is assumed that the outer segments r a1 rb1 , · · · , r an rbn rc1 rd1 , · · · , rcn rdn are adjacent to vl (vm ), and that r a k rdk is closer to vl (vm ) than rbk rck for 1 ≤ k ≤ Ω(vl ) (1 ≤ k ≤ Ω(vm )). Because ri r j ∈ r a1 rb1 , · · · , r an rbn (ri r j ∈ rc1 rd1 , · · · , rcn rdn ) εv ε and ε vl ≥ ε − len(vl , ri ) ε vm ≥ ε − len vm , r j , set Svl l Svvmm of the inner objects within distance 0 00 ε vl (ε vm ) from vl (vm ) contains set Svε l Svε m of the inner objects within distance ε00 (ε0 ) from vl 0 εv 00 ε (vm ), where ε00 = ε − len(vl , ri ) ε0 = ε − len vm , r j , i.e., Svε l ⊆ Svl l Svε m ⊆ Svvmm . According to Lemma 1, we can retrieve the inner objects within distance ε from every outer object r ∈ ri r j εv

εv

ε

ε

among the candidate inner objects s ∈ Svl l ∪ Svvmm ∪ S(vl vm ) because Svl l ∪ Svvmm ∪ S(vl vm ) contains Srεi ∪ Srε j ∪ S ri r j . The EDISON algorithm computes the distance dist(r, s) from an outer object εv

ε

r ∈ ri r j to each candidate inner object s ∈ Svl l ∪ Svvmm ∪ S(vl vm ). Because dist(r, s) is the length of the shortest path among three possible paths (i.e., r → vl → s , r → vm → s , and r → s if s ∈ vl vm ), it is determined simply depending on the conditions listed in Algorithm 3. Specifically, if s∈ / vl vm , then dist(r, s) = min{len(r, vl ) + dist(vl , s), len(r, vm ) + dist(vm , s)}; otherwise, dist(r, s) = min{len(r, vl ) + dist(vl , s), len(r, vm ) + dist(vm , s), len(r, s)}. If dist(r, s) ≤ ε, an object pair (r, s) is included in the ε-distance join query result; otherwise, the object pair (r, s) is not included. Therefore, the EDISON algorithm is correct for Ω(vl ) ≥ 2 and Ω(vm ) ≥ 2. Consequently, the EDISON algorithm is correct for Ω(vl ) = Ω(vm ) = 1, Ω(vl ) = 1 and Ω(vm ) ≥ 2, Ω(vl ) ≥ 2 and Ω(vm ) = 1, and Ω(vl ) ≥ 2 and Ω(vm ) ≥ 2. 5.3. Evaluation of an Example ε-Distance Join Query Using the EDISON Algorithm We discuss how to evaluate the ε-distance join query in Figure 3 using the EDISON algorithm. As shown in Figure 4, R and S are grouped into R = {r1 r2 , r3 r5 r4 } and S = {s1 s5 s4 , s2 s3 , s6 }, respectively. Because R < S , we evaluate R ./ε S. There are two intersection vertices, v1 and v3 , both of which are adjacent to r1 r2 and r3 r5 r4 . Therefore, to determine whether range queries at v1 and v3 are evaluated, the EDISON algorithm computes the distances ε v1 and ε v3 for the range queries at v1 and v3 , respectively. Because ε v1 = ε − min{len(v1 , r1 ), len(v1 , r4 )} = 5 − min{12, 7} = −2 and ε v3 = ε − min{len(v3 , r2 ), len(v3 , r3 )} = 5 − min{2, 1} = 4, the EDISON algorithm evaluates the range query RNQ(4, v3 ) only. Clearly, the range query RNQ(−2, v1 ) returns the empty set. Table 4 summarizes the computation of R ./ε S for the EDISON algorithm. Table 4. Computation of R ./ε S using the EDISON algorithm. ri rj

fffi

ff

fi

r1 r2

v1 v2 v3

v1

v3

r3 r5 r4

v1 v4 v3

v1

v3

S”ffff Sv−12 = ∅ Sv−12 = ∅

S”fifi

S fffi

Sv43 = {s6 }

S ( v1 v2 v3 ) = { s1 , s4 , s5 } S ( v1 v4 v3 ) = { s2 , s3 }

Sv43 = {s6 }

S”ffff ∪S”fifi ∪S fffi

{ s1 , s4 , s5 , s6 } { s2 , s3 , s6 }


15 of 23

We retrieve inner objects within distance ε from each outer object r ∈ r1 r2 among the candidate inner objects, followed by inner objects within distance ε from each outer object r ∈ r3 r5 r4 . As shown in Table 4, {s1 , s4 , s5 , s6 } is the set of candidate inner objects for r1 r2 , and {s2 , s3 , s6 } is the set of candidate inner objects for r3 r5 r4 . The EDISON algorithm computes the distance between an outer object r and each candidate inner object s and finds all qualifying object pairs (r, s) such that dist(r, s) ≤ ε. We compute the distance from r1 to each candidate inner object s ∈ {s1 , s4 , s5 , s6 }. Because s1 ∈ S(v1 v2 v3 ) − (Sv−12 ∪ Sv43 ) according to Table 4, the distance from r1 to s1 is dist(r1 , s1 ) = len(r1 , s1 ) = 2, as shown in Figure 10a. Because s4 ∈ S(v1 v2 v3 ) − Sv−12 ∪ Sv43 , the distance from ISPRS r1 toInt.s4J. Geo-Inf. is dist lenREVIEW (r , s ) =PEER (r1 , s4 ) = 10, as shown in Figure 10b. Similarly,15 because 2018,1 7, x4 FOR of 23 s5 ∈ S(v1 v2 v3 ) − Sv−12 ∪ Sv43 , the distance from r1 to s5 is dist(r1 , s5 ) = len(r1 , s5 ) = 3, as shown Table 4.s6Computation ⋈ S (vusing algorithm. in Figure 10c. Finally, because ∈ Sv43 − Sofv−12 ∪ according to Table 4, the distance from ) EDISON 1 v2 v3the r1 to s6 is dist(r1 , s6 ) = len(r1 , v3 ) + dist(v3 , s6 ) = 5 + 3 = 8, as shown in Figure 10d. Consequently, ∪ ( ) ) (r1 , s6 ) = 8, and the ∪ ( dist Srε1 = {s1 , s5 } given that dist(r1 , s1 ) = 2, dist(r1 , s4 ) = 10, dist(r1 , s5 ) = 3, and ( ) = r∅1 , s1 ),=(r{ 1 ,}s5 )}. { , , , } generated partial join result is Φ(r1 ) = {( ={ , , } =∅

={ }

(

)={ ,

}

{ ,

,

} 15

10

10 7

2

2 1 2

10

2

2

1 2 2

10

2

(a)

(b) 20

9

8

8 3

3 3 1 2 2

9

(c)

1 2 2

12

(d)

Figure 10. Computation of the distance from r1 to s ∈ {s1 , s4 , s5 , s6 } (a) dist(r1 , s1 ) = 2; (b) dist(r1 , s4 ) = ( , ) = 2 ; (b) Figure 10. Computation of the distance from to ∈ { , , , } (a) 10; (c) dist((r1, , s)5 )== 3; (d) dist r , s ) 3; = (d) 8. ( ( , 1 )6= ( , ) = 8. 10; (c)

We compute the distance from each candidate inner inner object We compute the distance from r2 to to each candidate object s∈∈{ {s, 1 , ,s4 ,,s5 ,}.s6Because }. Because∈ s1 ∈ ( ) ) ( − 2 4 − ( = ∪ ) according to Table 4, the distance from to is , S(v1 v2 v3 ) − (Sv1 ∪ Sv3 ) according to Table 4, the distance from r2 to s1 is dist(r2 , s1 ) = len( (r, 2 , s)1=) = 5, 5 , asinshown Figure 11a.s Because ∈ ( ∪ ) , the distance from to is −2) − (4 as shown Figure in 11a. Because 4 ∈ S ( v1 v2 v3 ) − ( Sv1 ∪ Sv3 ), the distance from r2 to s4 is dist (r2 , s4 ) = ( , ) = 13, as shown in Figure 11b. ( , )= In fact, the shortest path from to is → len(r2 , s4 ) = 13, as shown in Figure 11b. In fact, the shortest path from r2 to s4 is r2 → v3 → v1 → s4 , ( , ) = 9. However, this deviation from the shortest distance → → , whose length is whose length is dist(r2 , s4 ) = 9. However, this deviation from the shortest distance does not affect )−( does not affect the query result. Similarly, because ∈ ( ∪ ), this distance from 2 ∪ S4 ), this distance from r to s is the query Similarly, − (Sv−111c. (v1 v2 vin 3 ) Figure v3 ( , ) =s56, ∈as Sshown ( , ) = because Finally, because to result. is ∈ − (2 ∪ 5 4 − S −2 ∪ S ( v v v ) dist(r2 ,(s5 ) = )) len(according r2 , s5 ) = 6, as shown in Figure 11c. Finally, because s ∈ S ( v3, ) = v1 ( , 1) +2 3 to Table 4, the distance from to is 6 according( to, Table from r2into Figure s6 is dist 3 = 5, (r2 , sConsequently, ) = 2+ 6 ) = len (r2 , v3 ) +=dist ) = 24, + the 3 = distance 5 , as shown 11d. { (, v3}, s6given that ε as shown( in, Figure , s6 } given , s1 )generated = 5, distpartial (r2the (r2 , s4join ) = 13, ) = 5, 11d.( Consequently, ( S,r2 )== {6,s1and ( , that ) = dist 5, and , ) = 13, ) =dist )}.and the generated partial join result is Φ(r2 ) = {(r2 , s1 ), (r2 , s6 )}. is 6, Φ(and {( (,r2 ,),s6() = , 5, dist(r2result , s5 ) = We compute the distance from each candidateinner inner object object s ∈∈{ {s, 2 , s, 3 , }. We compute the distance from r3 to to each candidate s6 Because }. Because∈s2 ∈ − 2 4 ( ) ( (r, , s) = ( ∪ S ∪ according ) according to Table 4,distance the distance from to (r , s is) = len S(v1 v4 v3 ) − S− to Table 4, the from r to s is dist 3 2 3 2 3 2 ) = 6, v1 v3 ( , ) = 6, as shown in Figure 12a. Similarly, because )−( ∈ ( ∪ ), the distance ( , ) = 12, as shown in Figure 12b. In fact, the shortest path from ( , )= to is from ( , ) = 10. However, this deviation from the to is → → → , whose length is )), the shortest distance does not affect the query result. Finally, because ∈ −( ∪( ( , )+ ( , ) = 1 + 3 = 4, as shown in Figure 12c. ( , )= distance from to is ( , ) = 12, and ( , ) = 4, and the ( , ) = 6, Consequently, = { } given that


16 of 23

as shown in Figure 12a. Similarly, because s3 ∈ S(v1 v4 v3 ) − (Sv−12 ∪ Sv43 ), the distance from r3 to s3 is dist(r3 , s3 ) = len(r3 , s3 ) = 12, as shown in Figure 12b. In fact, the shortest path from r3 to s3 is r3 → v3 → v1 → s3 , whose length is dist(r3 , s3 ) = 10. However, this deviation from the shortest distance does not affect the query result. Finally, because s6 ∈ Sv43 − Sv−12 ∪ (v1 v4 v3 ) , the distance from r3 to s6 is dist(r3 , s6 ) = len(r3 , v3 ) + dist(v3 , s6 ) = 1 + 3 = 4, as shown in Figure 12c. Consequently, Srε3 = {s6 } given that dist(r3 , s2 ) = 6, dist(r3 , s3 ) = 12, and dist(r3 , s6 ) = 4, and the generated partial join result is Φ(r3 ) = {(r3 , s6 )}. We compute the distance from r4 to each candidate inner object s ∈ {s2 , s3 , s6 }. Because s2 ∈ −2 Geo-Inf. ISPRS 7, x FOR PEER of ( 23r4 , s2 ) = 3, S ( v1 v4 v3 ) − SInt. Sv43 2018, according to REVIEW Table 4, the distance from r4 to s2 is dist(r4 , s2 ) = 16len v1 J.∪ ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 16 of 23 as shown in Figure 13a. Similarly, because s3 ∈ S(v1 v4 v3 ) − Sv−12 ∪ Sv43 , the distance from r4 to s3 is We compute the distance from to each candidate inner object ∈ { , , 4 }. Because ∈ }. − { , S Because We()rcompute from to 4, each candidate inner because object ∈(v1 v4 v3 ) , dist(r4 , s3 ) = 3,)distance as shown Figure Finally, S(v−12, ∪) S= 4−, (s3 ) = ( len ∪the according toin Table the 13b. distance from to is ∈s6 (∈ ,, v)3 = ( ( , in )= ( 13, ( = shown ∪ ) according to len Table , )as the distance r)4−to v3 ∈)distance +( dist(from v)3−, s(6 ) to =∪ 10is 3= (rSimilarly, (r44,, the 3, from as shown in sFigure 13a. ),+the distance from to Figure 13c. 6 is dist 4 , s6 ) = because )−( ( 3, as shown in Figure 13a. Similarly, because ∈ ∪ ), the distance from to ) = from ( , r) = )), However, ( , path Finally, becauselength ∈ is−dist ( (r∪4 , s( 6 ) = 9. In fact, the isshortest to3,s asisshown r4 →invFigure s6 , whose 4 → v13b. 1 → ( , 4) = 3, 6as shown )), ( , )= in)Figure(13b. Finally, because ∈ −( ∪ ( is ε ) ( ) ( = + = 10 + 3 = 13 , as shown the distance from to is , , , this deviation from the shortest query Sr in= {s2 , s3 } ) not ( , )the ( , result. ) = 10 +Consequently, (does = affect , as shown the distance from to distance is , from Figure 13c. In fact, the shortest path to + is → → → 3 =, 13 whose length 4in is given that Figure dist((r,413c. , s)2=) In = fact, 3, dist , s3deviation 3, and dist , s6 ) = and thenotgenerated partial join (r4this ) = path (r4to the shortest from is 13, → → →affect , whose length is result is 9. However, from the shortest distance does the query result. deviation result. Φ(r4 ) = {(Consequently, r4 , (s2 ), , ()r= s3However, )}.= { , this ( , ) does ) = 13, ( the 4 , 9. = 3, distance = 3, not andaffect(the and } given that from , )shortest , query ( , ( , ) = 3, Consequently, = { , } given that the generated partial join result is Φ( ) = {( , ), ( , )}. the generated partial join result is Φ( ) = {( , ), ( , )}.

) = 3, and

( ,

) = 13, and 15 15

13 13

10 10

7 7

5 5

2 2 2 2

2 1 2 2 2 1 2 2

10 10

1 2 2 1 2 2

10 10

(a) (a)

(b) (b) 20 20

9 9 6 6

8 8 5 5 3 1 2 2 3 1 2 2

9 9

1 2 2 1 2 2

12 12

(c) (c)

3 3

(d) (d)

Figure 11. Figure Computation of the distance r2 to s ∈ {sto (r2 , s1() = ) =(b) 1 , s4 ,∈s5{ , s, 6 }, (a) 11. Computation of the from distance from , dist } (a) , 5; 5 ; dist (b) (r2 , s4 ) = ( , ) = 5 ; (b) to ∈ { , , , } (a) (from 13; = ,s6 ))the , ) = 5. 13; (c) dist(Figure r2 , (s5 ), 11. =) =6;Computation (d)(c)dist(r(2 ,of =6;distance 5.(d) ( ,

) = 13; (c)

( ,

) = 6; (d)

( ,

) = 5.

20 20

13 13

12 12

10 10

7 7

6 6 6 6

1 3 1 3

(a) (a)

4 4

2 1 2 1

4 4 4 4

2 1 2 1

7 7

(b) (b)

2 1 2 1

4 4

6 6

1 1

7 7

3 2 1 3 2 1

(c) (c)

( 6; ) =dist 12. Computation of the from distance , dist , }(r(a) , (b) 6 ; ((b) Figure 12. Figure Computation of the distance r3 tofrom s ∈ {s2 ,tos3 , s6∈}{ (a) r3 , s3 ) = 12; 3 , s2 ) = ( , ) = 6 ; (b) Figure to ∈ { , , } (a) ) =Computation ( ,of )the ( , 12. 12; (c) = 4.distance from (c) dist(r3 , s6 ) (= , 4. ) = 12; (c) ( , ) = 4.


17 of 23


17 of 23


17 of 23

20 20

13 10

13

13 10

7 7

3 63

13

1 3

4 1 3

6

(a)

4 3

2 1 2 1

4

3

4 3 4 4

2 1

7

2 1

7

2 1

1

6

2 1

1

6

(b)

(a)

2 1

7

(c)

(b)

32 1

7

(c)

Figure 13. Computation of the distance from r to s ∈ {s2 , s3 , s6 } (a) dist(r4 , s2 ) = 3; (b) dist(r4 , s3 ) = 3; ( , ) = 3 ; (b) Figure 13. Computation of the distance 4 from to ∈ { , , } (a) ( , ) = 3 ; (b) Figure Computation of the distance from to ∈ { , , } (a) (c) dist = 13. (r4 , s6 )13. ( , () = 13.13. ) =(c)3; (c) ( ,( ,) =) = , 3;

thethe distance from to eachcandidate candidate inner object s {,∈ ,,{s}. s3}., sBecause We compute the distance inner object ∈ 2, , Because 6 }. Because We compute distance from r5 to toeach each candidate inner object ∈ {∈ ∈ from 4 s2( ∈ (S()v1−v4()v− according to Table 4, the distance from r to s dist r , s = ( ) ) ( ) ( = ) Saccording to Table 4, distance from to is , ∪ S∪v−)12 ∪ according to Table 4, the distance from to is , 3 )(− 5 2 5 2 v3 −2 ∪ S4 , the ) 4, =as4,as asshown shown in 14a. Similarly, because , =4, ∈∈ ( s(3 ∈ )S−)(v(−1 v(4 v∪3 ) ∪ len(r5 , s2() = inFigure Figure 14a. Similarly, because −), the S), shown in Figure 14a. Similarly, because the distance v1 distance v3 ) =(r10, shown Finally, because tois is s(3 is,( dist ,) =()r= ∈ because fromto distance from r5 to , s )( =, len = 10, in as shown in Figure 14b. Finally, ) shown 5 , sas 3as inFigure Figure14b. 14b. Finally, because ∈− − from 5 3( , ) = 10, 4 − 2 )), ) ( ) ( ) ( )), ( the distance from = + = 3 + 3 = ∪ S to is , , , (Sv (− s(6 ∈ ∪ ∪ S v v v , the distance from r to s is dist r , s = len r , v + dist v = ( ) ( ) ( ) ( ) 6, 4 3 v1 the1distance from 3s6= to is 5 ( 6, ) = 5 ( 6 , ) + 5 ( 3 , ) = 3 +3 ,6, 3 ε ) ( ) ( as shown in Figure 14c. Consequently, = 4, = 10, and = { } given that , , 3 +shown 3 = in 6, Figure as shown Figure 14c. Consequently, Sr5 = {(s2 }, given 4, ) =and ) = 4, that (dist as 14c. in Consequently, = { } given that , (r)5 ,=s210, ) = {( join = 6,and anddist the(rgenerated partial join generated result is Φ(partial Finally, we obtain the ,= )10, , )}. dist((r5 ,, s3()) = , s = 6, and the result is Φ r = r , s ) ( ) )} {( 5 6 5 5 2 6, and the generated partial join result is Φ( ) = {( , )}. Finally, we obtain the. complete query result Φ( ) = Φ( ) ∪ Φ( ) ∪ Φ( ) ∪ Φ( ) ∪ Φ( ) = Finally, we obtain the complete query result Φ R = Φ r1 ) ∪) Φ r2 ) ∪) ∪ ΦΦ( ( ) (r3 ) ∪ (r4 ) )∪∪ΦΦ( (r5 )) = ) ∪ΦΦ( complete =(Φ( ∪ (Φ( = ), ( , ), ( , ), Φ( {( , ), ( , ), (query ( , ))}. , ), ( , ), ( , result r , s , r , s , r , s , r , s , r , s , r , s , r , s , r , s . ) ( ) ( ) ( ) ( ) ( ) ( ) ( )} {( 5 2 2 6 3 6 2 3 5 2 1 1 1 1 4 4 ), ( ), ( ), ( ), ( ), ( ), ( ), {( , ( , )}. , , , , , , 20

20 13 10

10 7

10 4

4 6

10

1 3

1 3

(a)4

4

2 1

2 1

6 4

7 6

13

4

4

2 1

6

2 1

7

3 1

6

2 1

7

3 4

2 1 (b) 7

(a)14. Computation of the distance from(b) Figure ( , ) = 6. ( , ) = 10; (c)

2 1

to

∈{ ,

6

,

} (a)

( ,

(c)1

7

2 1

) =(c) 4 ; (b)

Figure 14. Computation of the distance from r5 to s ∈ {s2 , s3 , s6 } (a) dist(r5 , s2 ) = 4; (b) dist(r5 , s3 ) = 10; ( , ) = 4 ; (b) Figure Computation of the distance from to ∈ { , , } (a) 6. Performance (c) dist(r14. 6. 5 , s6 ) = Study ( , ) = 6. ( , ) = 10; (c) In this section, we report on an empirical analysis of our proposed solution. We present our 6. Performance Study experimental settings in Section 6.1, followed by our experimental results in Section 6.2. 6. Performance Study

In this section, we report on an empirical analysis of our proposed solution. We present our 6.1.this Experimental In section, Settings we report on an empirical analysis of our proposed solution. We present our experimental settings in Section 6.1, followed by our experimental results in Section 6.2. experimental settings in Section 6.1,wefollowed our experimental results Section 6.2. For the performance study, use threeby real-world road networks [43],in which are described in Table 5. These real-world road networks have different sizes and are part of the US’s road network. 6.1. Experimental Settings 6.1. Experimental Table 6 showsSettings the range of each variable used in the experiments with defaults indicated in bold. For For the performance study, of wethe use three real-world road networks [43], which described convenience, each dimension data universe is normalized independently to unit are length, such in the performance study, we use three real-world roadand networks [43], which areroad described in TableFor 5. These real-world road networks have different sizes are part of the US’s that the threshold distance is in the range of [0, 1]. The positions of both the outer objects andnetwork. the Table 5. These real-world road networks have different sizes and are part of the US’s road network. Table 6 shows the range of each variable used in the experiments with defaults indicated in bold. For Table 6 showseach the dimension range of each variable used in the experiments with defaultstoindicated in bold. For convenience, of the data universe is normalized independently unit length, such that convenience, each dimension of the data universe is normalized independently to unit length, such the threshold distance ε is in the range of [0, 1]. The positions of both the outer objects and the inner that the threshold distance is in the range of [0, 1]. The positions of both the outer objects and the


18 of 23

objects follow either centroid or uniform distributions. The centroid dataset is generated to resemble the real-world data. First, 10 centroids are selected randomly. The objects around each centroid follow a normal distribution, where the mean is set to the centroid and the standard deviation is set to 1% of the side length of the data universe. In each experiment, we vary one or two of the parameters within the range shown in Table 6, while keeping other parameters at default values. The outer objects and the inner objects follow the centroid distribution unless stated otherwise. Table 5. Real-world roadmaps. Name CAL FLA COL

Description California and Nevada Florida Colorado

Number of Vertices

Number of Edges

Number of Vertex Sequences

1,890,815

2,315,222

1,794,708

1,070,376 435,666

1,343,951 521,200

1,100,675 374,355

Table 6. Experimental parameter settings. Parameter

Range

Threshold distance (ε)

0.005, 0.01, 0.03, 0.05, 0.1 1, 5, 10, 20, 30 (×103 ) for CAL and FLA 1, 3, 5, 7, 10 (×103 ) for COL (C)entroid, (U)niform (C)entroid, (U)niform CAL, FLA, COL

Numbers of outer objects (| R|) and inner objects (|S|) Distributions of outer objects Distributions of inner objects Real-world roadmaps

We implement and evaluate two versions of our proposed solution, i.e., the naive EDISON and EDISON methods. As a benchmark for our proposed method, we use a baseline method that computes the range query of every outer object using the RNE algorithm [10]. A comparison with the pre-computed distance-based solution [23] and the Euclidean distance-based solution (e.g., JER [10]) is beyond the scope of this study, because these methods cannot support frequent network traffic updates. All algorithms are implemented in C++ in Microsoft Visual Studio 2015, and they use common subroutines for similar tasks. We conduct experiments on a desktop computer running Windows 10 with a 4 GHz processor and 32 GB of memory. We believe that indexing structures of all techniques should be memory resident to ensure responsive query processing, which is assumed in many recent studies [5,11] and is crucial to online map services and commercial navigation systems. We determine the average values based on 10 repetitions of the experiments for each algorithm. 6.2. Experimental Results Figure 15 compares the query processing times using the baseline, naive EDISON, and EDISON methods to evaluate ε-distance join queries in the CAL roadmap, where each chart illustrates the effect of changing one or two of the parameters in Table 6. The first and the second values in parentheses indicate the number of range queries that are evaluated by the naive EDISON and EDISON methods, respectively. The numbers of range queries evaluated by the baseline method are omitted, because these numbers become min{| R|, |S|}, i.e., the cardinality of the smaller dataset between | R| and |S|. Figure 15a shows the query processing time as a function of the threshold distance ε. Although the EDISON method shows the worst performance for ε = 0.005, the processing times using the baseline and the naive EDISON methods increase more rapidly with the value of ε than those using the EDISON method. This implies that the shared execution of the EDISON method is more effective for a large threshold distance ε. The baseline, naive EDISON, and EDISON methods evaluate a total of 10,000 range queries, 959 range queries, and 448 range queries, respectively. Figure 15b shows the query processing time as a function of the number | R| of outer objects, while the number of inner objects is fixed at |S| = 10, 000. The EDISON method is less sensitive to variations in | R| than the other methods, although it shows the worst performance at | R| = 1000. Owing to the benefit of the shared execution


19 of 23

processing, the numbers of range queries evaluated by the naive EDISON and EDISON methods increase with | R7,|.x FOR PEER REVIEW ISPRSslightly Int. J. Geo-Inf. 2018, 19 of 23

(959, 448) (959, 448)

(959, 448) (2758, 1181) (1902, 844)

0.1 (959, 448)

(667, 346)

(959, 448)

(959, 448)

| |

(294, 178)

(a)

(b) (10000, 10016)

(944, 440) (1902, 844) (667, 346)

(2758, 1181) (959, 448)

(959, 448)

(1089, 539)

| |, | |

(294, 178)

(c)

(d)

Figure 15. Comparison of the baseline, naive EDISON, and EDISON methods for CAL (a) varying ε; Figure 15. Comparison of the baseline, naive EDISON, and EDISON methods for CAL (a) varying ; (b) varying | R|; (c) varying | R| and |S|; (d) varying the distributions of objects. (b) varying | |; (c) varying | | and | |; (d) varying the distributions of objects.

| | of Figure shows queryprocessing processing time time as ofof both thethe number objects Figure 15c 15c shows thethe query asaafunction function both number objects | Router | of outer | | and the number of inner objects. The naive EDISON method outperforms the EDISON method and the number |S| of inner objects. The naive EDISON method outperforms the EDISON method for |, |S| |, |S| ≤ 10,000, latter outperforms thefor former ≤ This 1000 for ≤ | R1000 000, whereaswhereas the latterthe outperforms the former 20, 000for≤ 20,000 000. |, |S|≤≤| 10, | R|, |S|≤≤| 30, 30,000. This indicates that the EDISON method optimizes the shared execution processing more indicates that the EDISON method optimizes the shared execution processing more effectively than the effectively than the naive EDISON method for large datasets. Clearly, the baseline method shows the naive EDISON method for large datasets. Clearly, the baseline method shows the worst performance in worst performance in most cases. Figure 15d shows the query processing time for various mostdistributions cases. Figureof15d shows the query processing time for various distributions of outer objects and outer objects and inner objects, where each ordered pair (i.e., (C, C), (C, U), (U, C), innerand objects, where each ordered pairof(i.e., U, C)objects , and (and U, Uinner a combination (C, C), (C, Uof ), (outer )) denotes (U, U)) denotes a combination the distributions objects. Because of theshared distributions of outer objects and inner objects. Because shared execution processing is favorable execution processing is favorable for non-uniform distributions of objects, the naive EDISON for non-uniform distributions of objects, the naive EDISON and EDISON methods significantly and EDISON methods significantly outperform the baseline method for (C, C), (C, U), and (U, C). However, processing times for of the EDISON and methodsthe for processing (U, U) are very outperform thethebaseline method C), (C, U), and C). However, times of (C, naive (U,EDISON similarEDISON to the baseline method. This is because outer objects and the inner objects are widely This the naive and EDISON methods for both very similar to the baseline method. (U, Uthe ) are scattered, which shared and execution processing. is because both the hinders outer objects the inner objects are widely scattered, which hinders shared Figure 16 compares the query processing times using the baseline, naive EDISON, and EDISON execution processing. methods to evaluate -distance join queries in the FLA roadmap. Figure 16a shows the query Figure 16 compares the query processing times using the baseline, naive EDISON, and EDISON processing time as a function of , when varies between 0.005 and 0.1. The EDISON method methods to evaluate ε-distance join queries in the FLA roadmap. Figure 16a shows the query processing achieves the best performance for 0.01 ≤ ≤ 0.1, because it evaluates the smallest number of range time queries as a function ε, three whenmethods. ε varies between 0.005naive and EDISON, 0.1. The EDISON method achieves the best amongof the The baseline, and EDISON methods evaluate performance for 0.01 ≤ ε ≤ 0.1, because it evaluates the smallest number of range queries among 10,000 range queries, 1320 range queries, and 665 range queries, respectively. Figure 16b shows the the threequery methods. The baseline, naive EDISON, and| EDISON methods evaluate 10,000 queries, | varies between processing time as a function of | |, when 1000 and 30,000. Therange EDISON | ≤ 30,000. clearly and outperforms thequeries, other methods for 5000 ≤ | 16b Duequery to shared execution 1320 method range queries, 665 range respectively. Figure shows the processing time as | | processing, the naive EDISON and EDISON methods are less sensitive to changes in than the a function of | R|, when | R| varies between 1000 and 30,000. The EDISON method clearly outperforms | |,naive | | baseline method. Figure the 000. queryDue processing timeexecution as a function of | | andthe whenEDISON the other methods for 500016c ≤ |shows R| ≤ 30, to shared processing, (| |) varies between 1000 and 30,000. The EDISON method clearly outperforms the other methods for and EDISON methods are less sensitive to changes in | R| than the baseline method. Figure 16c shows 5000 ≤ | |, | | ≤ 30,000. Figure 16d shows the query processing time for various distributions of the query processing time as a function of | R| and |S|, when | R| (|S|) varies between 1000 and 30,000. outer objects and inner objects. The naive EDISON and EDISON methods significantly outperform The EDISON method clearly outperforms the other methods for 5000 ≤ R|, |S| ≤ 30, 000. Figure 16d the baseline method for (C, C) , (C, U) , and (U, C) , whereas the former| methods show similar shows the query processing time for various distributions of outer objects and inner objects. The naive performance to the latter method for (U, U) for the same reason as in the CAL case. EDISON and EDISON methods significantly outperform the baseline method for (C, C), (C, U), and


20 of 23

(U, C), whereas the former methods show similar performance to the latter method for (U, U) for the sameISPRS reason in the CAL case. Int. J.as Geo-Inf. 2018, 7, x FOR PEER REVIEW 20 of 23

(1320, 665)

(3827, 2075) (2497, 1324)

(1320, 665) (1320, 665)

(1320, 665) (792, 430) (233, 159)

(1320, 665)

| |

(1320, 665)

(a)

(b) (10000, 10021)

(3827, 2075) (2497, 1324) (792, 430)

(1302, 651) (1320, 665)

(1428, 755)

(1320, 665)

(233, 159)

| |, | |

(c)

(d)

Figure 16. Comparison of the baseline, naive EDISON, and EDISON methods for FLA (a) varying ε; Figure 16. Comparison of the baseline, naive EDISON, and EDISON methods for FLA (a) varying ; (b) varying | R|; (c) varying | R| |and varying the distributions of objects. |S|;| (d) | |; | |; (b) varying

(c) varying

and

(d) varying the distributions of objects.

Figure 17 compares query processingtimes times using naive EDISON, and and EDISON Figure 17 compares thethe query processing usingthe thebaseline, baseline, naive EDISON, EDISON methods to evaluate -distance join queries in the COL roadmap. Figure 17a shows the query methods to evaluate ε-distance join queries in the COL roadmap. Figure 17a shows the query processing processing time as a function of , when varies between 0.005 and 0.1. The naive EDISON and time as a function of ε, when ε varies between 0.005 and 0.1. The naive EDISON and EDISON methods EDISON methods significantly outperform the baseline method in all cases. Specifically, the query significantly outperform the baseline method in all cases. Specifically, the query processing time processing time of the EDISON method is up to 11 times shorter than the baseline method at = 0.1. of theThe EDISON method is up to 11 times shorter than the baseline method at ε = 0.1. The naive naive EDISON method significantly outperforms the EDISON method for 0.005 ≤ ≤ 0.01, EDISON method significantly outperforms EDISON method for 0.005 ε EDISON ≤ 0.01, whereas whereas the latter outperforms the former forthe 0.03 ≤ ≤ 0.1. This indicates that≤the method the latterisoutperforms former for ε ≤ 0.1. EDISON This indicates the 17b EDISON is less less sensitivethe to changes in 0.03 than≤the naive method.that Figure shows method the query sensitive to changes than the EDISON method. Figure 17b 10,000. showsThe thenaive query processing | |, when | | varies processing time asina εfunction of naive between 1000 and EDISON significantly outperform baseline method in naive all cases and the former time and as a EDISON function methods of | R|, when between the 1000 and 10,000. The EDISON and EDISON | R| varies | | methods are less sensitive to changes in than the latter method. This indicates that methods significantly outperform the baseline method in all cases and the former methods the are less performance difference thelatter EDISON methodThis and the baselinethat method rapidlydifference with sensitive to changes in | Rbetween method. indicates theincreases performance | than the | |. the query processing time of the EDISON method is up to 155 times shorter than the betweenSpecifically, the EDISON method and the baseline method increases rapidly with | R|. Specifically, the baseline method at | | = 10,000. Figure 17c shows the query processing time as a function of | | query processing time of the EDISON method is up to 155 times shorter than the baseline method and | |, when | | (| |) varies between 1000 and 10,000. The naive EDISON method outperforms the at | R|EDISON Figure processing timeoutperforms as a function = 10, 000. | R| and |, when | | shows method at 17c = | | = the 1000query , whereas the latter theofformer for|S5000 ≤ | R| (|S|) varies andperformance 10,000. Thedifference naive EDISON outperforms the with EDISON method | |, | | between | | and ≤ 10,000, 1000 and the betweenmethod the two methods increases at | R| = 1000, whereas the lattermethod outperforms the former 5000 000, and the S| =implies ≤ 10,EDISON | R|,the |S| naive |. |This | and | |≤than that the EDISON scales better with | for performance difference between two methodstime increases withdistributions that the | R|. and |Sof |. This method. Figure 17d shows the the query processing for various outerimplies objects and innermethod objects. The naive EDISON methods outperform baseline method EDISON scales better withand and |S| than the significantly naive EDISON method.theFigure 17d shows the | R| EDISON (C, C), (C, U), and C). However, all methods showobjects similar and performance when the objects queryfor processing time for(U, various distributions of outer inner objects. Theouter naive EDISON and the inner objects follow uniform distributions (U, U). This is expected because both the outer and EDISON methods significantly outperform the baseline method for (C, C), (C, U), and (U, C). objects and inner objects are widely scattered for (U, U), which obstructs shared execution processing However, all methods show similar performance when the outer objects and the inner objects follow of the naive EDISON and EDISON methods. uniform distributions (U, U). This is expected because both the outer objects and inner objects are widely scattered for (U, U), which obstructs shared execution processing of the naive EDISON and EDISON methods.


21 of 23


21 of 23

(720, 381) (720, 381) (720, 381)

0.1

0.1

(305, 215)

(912, 485) (1097, 521) (555, 321) (720, 381)

(720, 381)

| |

(720, 381)

(a)

(b) (4999, 5024)

(715, 377) (555, 321) (720, 381)

0.1

(912, 485) (1097, 521)

(305, 215) (720, 381)

(836, 463)

| |, | |

(c)

(d)

Figure 17. Comparison of the baseline, naive EDISON, and EDISON methods for COL (a) varying ε; Figure 17. Comparison of the baseline, naive EDISON, and EDISON methods for COL (a) varying ; (b) varying | R|; (c) varying | R| and |S|; (d) varying the distributions of objects. (b) varying | |; (c) varying | | and | |; (d) varying the distributions of objects.

7. Conclusions 7. Conclusions In In this study, we investigated of ε-distance joinqueries queriesinindynamic dynamic road this study, we investigatedthe theefficient efficient processing processing of -distance join road networks. We proposed a cost-effective solution called EDISON that optimizes the shared execution networks. We proposed a cost-effective solution called EDISON that optimizes the shared execution method to avoid redundant network traversal. WeWe implemented and evaluated a baseline method and method to avoid redundant network traversal. implemented and evaluated a baseline method twoand versions of EDISON, i.e., the i.e., naive and EDISON methods. The experiments are based two versions of EDISON, theEDISON naive EDISON and EDISON methods. The experiments areon basedreal-world on several real-world and involve wide range of parameter values. results The several roadmaps androadmaps involve a wide range of aparameter values. The experimental are summarized follows: and (1) the naive methods EDISON significantly and EDISONoutperform methods areexperimental summarizedresults as follows: (1) the naiveasEDISON EDISON outperform baseline method; (2) the naive EDISON and EDISON methods are thesignificantly baseline method; (2) thethe naive EDISON and EDISON methods are typically comparable in terms typically comparable in terms of query processing time; (3) the EDISON method scales better with of query processing time; (3) the EDISON method scales better with increasing number of objects increasing number of objects and naive threshold distance than the In naive EDISON In future work,the and threshold distance than the EDISON method. future work,method. we plan to extend we plan to extend the shared execution approach used here to the problems of processing shared execution approach used here to the problems of processing sophisticated spatial queries sophisticated spatial queries over road distance networks,join such as multi-way joink-farthest queries [44] and over road networks, such as multi-way queries [44] anddistance aggregate neighbor aggregate k-farthest neighbor queries [45,46]. These problems have not been adequately addressed queries [45,46]. These problems have not been adequately addressed with regard to road networks with regard to road networks despite their importance. despite their importance. Author Contributions: H.-J.C. designed the study, performed the experiments, and wrote the paper. Author Contributions: H.-J.C. designed the study, performed the experiments, and wrote the paper. Funding: This work was supportedby bythe theNational National Research Research Foundation funded byby thethe Funding: This work was supported FoundationofofKorea Korea(NRF) (NRF)grant grant funded Korean government (MSIP) (NRF-2016R1A2B4009793). Korean government (MSIP) (NRF-2016R1A2B4009793). Conflicts of Interest: The authordeclares declaresno noconflict conflictof ofinterest. interest. Conflicts of Interest: The author

References References Delling, Goldberg, A.V.; Pajor, Werneck, R.F. Customizable route planningininroad roadnetworks. networks.Transp. Transp.Sci. 1. 1. Delling, D.;D.; Goldberg, A.V.; Pajor, T.;T.; Werneck, R.F. Customizable route planning Sci. 2017, 51, 566–591. 2017, 51, 566–591. [CrossRef] 2. Samet, H.; Sankaranarayanan, J.; Alborzi, H. Scalable network distance browsing in spatial databases. In 2. Samet, H.; Sankaranarayanan, J.; Alborzi, H. Scalable network distance browsing in spatial databases. Proceedings of the International Conference on Management of Data, Vancouver, BC, Canada, 10–12 June In Proceedings of the International Conference on Management of Data, Vancouver, BC, Canada, 2008. 10–12 June 2008.


3. 4. 5. 6. 7. 8.

9. 10. 11. 12. 13. 14.

15. 16. 17. 18.

19. 20. 21.

22.

23. 24. 25.

22 of 23

Sankaranarayanan, J.; Alborzi, H.; Samet, H. Efficient query processing on spatial networks. In Proceedings of the International Workshop on Geographic Information Systems, Bremen, Germany, 4–5 November 2005. Sankaranarayanan, J.; Samet, H.; Alborzi, H. Path oracles for spatial networks. PVLDB 2009, 2, 1210–1221. [CrossRef] Wu, L.; Xiao, X.; Deng, D.; Cong, G.; Zhu, A.D.; Zhou, S. Shortest path and distance queries on road networks: An experimental evaluation. PVLDB 2012, 5, 406–417. [CrossRef] Zhang, D.; Yang, D.; Wang, Y.; Tan, K.-L.; Cao, J.; Shen, H.T. Distributed shortest path query processing on dynamic road networks. VLDB J. 2017, 26, 399–419. [CrossRef] Zhong, R.; Li, G.; Tan, K.-L.; Zhou, L.; Gong, Z. G-tree: An efficient and scalable index for spatial search on road networks. IEEE Trans. Knowl. Data Eng. 2015, 27, 2175–2189. [CrossRef] D’Angelo, G.; D’Emidio, M.; Frigioni, D. Distance queries in large-scale fully dynamic complex networks. In Proceedings of the International Workshop on Combinatorial Algorithms, Helsinki, Finland, 17–19 August 2016. Sankaranarayanan, J.; Samet, H. Query processing using distance oracles for spatial networks. IEEE Trans. Knowl. Data Eng. 2010, 22, 1158–1175. [CrossRef] Papadias, D.; Zhang, J.; Mamoulis, N.; Tao, Y. Query processing in road network databases. In Proceedings of the International Conference on Very Large Data Bases, Berlin, Germany, 9–12 September 2003. Abeywickrama, T.; Cheema, M.A.; Taniar, D. k-Nearest neighbors on road networks: A journey in experimentation and in-memory implementation. PVLDB 2016, 9, 492–503. [CrossRef] Luo, S.; Kao, B.; Li, G.; Hu, J.; Cheng, R.; Zheng, Y. TOAIN: A throughput optimizing adaptive index for answering dynamic knn queries on road networks. PVLDB 2018, 11, 594–606. Chaudhuri, S.; Ganti, V.; Kaushik, R. A primitive operator for similarity joins in data cleaning. In Proceedings of the International Conference on Data Engineering, Atlanta, GA, USA, 3–8 April 2006. Deng, D.; Li, G.; Hao, S.; Wang, J.; Feng, J. MassJoin: A mapreduce-based method for scalable string similarity joins. In Proceedings of the International Conference on Data Engineering, Chicago, IL, USA, 31 March–4 April 2014. Metwally, A.; Faloutsos, C. V-SMART-Join: A scalable mapreduce framework for all-pair similarity joins of multisets and vectors. PVLDB 2012, 5, 704–715. [CrossRef] Sarma, A.D.; He, Y.; Chaudhuri, S. ClusterJoin: A similarity joins framework using map-reduce. PVLDB 2014, 7, 1059–1070. Vernica, R.; Carey, M.J.; Li, C. Efficient parallel set-similarity joins using mapreduce. In Proceedings of the International Conference on Management of Data, Indianapolis, IN, USA, 6–10 June 2010. Wang, Y.; Metwally, A.; Parthasarathy, S. Scalable all-pairs similarity search in metric spaces. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013. Hjaltason, G.R.; Samet, H. Incremental distance join algorithms for spatial databases. In Proceedings of the International Conference on Management of Data, Seattle, WA, USA, 2–4 June 1998. Shin, H.; Moon, B.; Lee, S. Adaptive and incremental processing for distance join queries. IEEE Trans. Knowl. Data Eng. 2003, 15, 1561–1578. [CrossRef] Mahmud, H.; Amin, A.M.; Ali, M.E.; Hashem, T.; Nutanong, S. A group based approach for path queries in road networks. In Proceedings of the International Symposium on Spatial and Temporal Databases, Munich, Germany, 21–23 August 2013. Mouratidis, K.; Yiu, M.L.; Papadias, D.; Mamoulis, N. Continuous nearest neighbor monitoring in road networks. In Proceedings of the International Conference on Very Large Data Bases, Seoul, Korea, 12–15 September 2006. Sankaranarayanan, J.; Alborzi, H.; Samet, H. Distance join queries on spatial networks. In Proceedings of the International Symposium on Geographic Information Systems, Arlington, VA, USA, 10–11 November 2006. Arain, Q.A.; Deng, Z.; Memon, I.; Zubedi, A.; Jiao, J.; Ashraf, A.; Khan, M.S. Privacy protection with dynamic pseudonym-based multiple mix-zones over road networks. China Commun. 2017, 14, 89–100. [CrossRef] Arain, Q.A.; Memon, I.; Deng, Z.; Memon, M.H.; Mangi, F.A.; Zubedi, A. Location monitoring approach: Multiple mix-zones with location privacy protection based on traffic flow over road networks. Multimedia Tools Appl. 2018, 77, 5563–5607. [CrossRef]


26.

27.

28.

29. 30. 31. 32. 33. 34. 35. 36. 37.

38. 39. 40.

41. 42. 43. 44. 45.

46.

23 of 23

Arain, Q.A.; Uqaili, M.A.; Deng, Z.; Memon, I.; Jiao, J.; Shaikh, M.A.; Zubedi, A.; Ashraf, A.; Arain, U.A. Clustering based energy efficient and communication protocol for multiple mix-zones over road networks. Wirel. Pers. Commun. 2017, 95, 411–428. [CrossRef] Domenic, M.K.; Wang, Y.; Zhang, F.; Memon, I.; Gustav, Y.H. Preserving users’ privacy for continuous query services in road networks. In Proceedings of the International Conference on Information Management, Innovation Management and Industrial Engineering, Xi’an, China, 23–24 November 2013. Gustav, Y.H.; Wang, Y.; Domenic, M.K.; Zhang, F.; Memon, I. Velocity similarity anonymization for continuous query Location based services. In Proceedings of the International Conference on Computational Problem-Solving, Jiuzhai, China, 26–28 October 2013. Kamenyi, D.M.; Wang, Y.; Zhang, F.; Memon, I. Authenticated privacy preserving for continuous query in location based services. J. Comput. Inform. Syst. 2013, 9, 9857–9864. Memon, I. Distance and clustering-based energy-efficient pseudonyms changing strategy over road network. Int. J. Commun. Syst. 2018, 31, 1–22. [CrossRef] Memon, I. Authentication user’s privacy: An integrating location privacy protection algorithm for secure moving objects in location based services. Wirel. Pers. Commun. 2015, 82, 1585–1600. [CrossRef] Memon, I.; Arain, Q.A. Dynamic path privacy protection framework for continuous query service over road networks. World Wide Web 2017, 20, 639–672. [CrossRef] Memon, I.; Arain, Q.A.; Zubedi, A.; Mangi, F.A. DPMM: Dynamic pseudonym-based multiple mix-zones generation for mobile traveler. Multimedia Tools Appl. 2017, 76, 24359–24388. [CrossRef] Memon, I.; Chen, L.; Arain, Q.A.; Memon, H.; Chen, G. Pseudonym changing strategy with multiple mix zones for trajectory privacy protection in road networks. Int. J. Commun. Syst. 2018, 31, 1–44. [CrossRef] Ali, M.E.; Tanin, E.; Zhang, R.; Kulik, L. A motion-aware approach for efficient evaluation of continuous queries on 3D object databases. VLDB J. 2010, 19, 603–632. [CrossRef] Giannikis, G.; Alonso, G.; Kossmann, D. SharedDB: Killing one thousand queries with one stone. PVLDB 2012, 5, 526–537. [CrossRef] Thomsen, J.R.; Yiu, M.L.; Jensen, C.S. Effective caching of shortest paths for location-based services. In Proceedings of the International Conference on Management of Data, Scottsdale, AZ, USA, 20–24 May 2012. Zhang, D.; Chow, C.-Y.; Li, Q.; Zhang, X.; Xu, Y. SMashQ: Spatial mashup framework for k-nn queries in time-dependent road networks. Distrib. Parall. Databases 2013, 31, 259–287. [CrossRef] Brinkhoff, T.; Kriegel, H.-P.; Seeger, B. Efficient processing of spatial joins using r-trees. In Proceedings of the International Conference on Management of Data, Washington, DC, USA, 26–28 May 1993. Chen, C.; Sun, W.; Zheng, B.; Mao, D.; Liu, W. An incremental approach to closest pair queries in spatial networks using best-first search. In Proceedings of the International Conference on Database and Expert Systems Applications, Toulouse, France, 29 August–2 September 2011. Koudas, N.; Sevcik, K.C. High dimensional similarity joins: Algorithms and performance evaluation. IEEE Trans. Knowl. Data Eng. 2000, 12, 3–18. [CrossRef] Makreshanski, D.; Giannikis, G.; Alonso, G.; Kossmann, D. MQJoin: Efficient shared execution of main-memory joins. PVLDB 2016, 9, 480–491. [CrossRef] 9th DIMACS Implementation Challenge: Shortest Paths. Available online: http://www.dis.uniroma1.it/ challenge9/download.shtml (accessed on 15 June 2018). Corral, A.; Manolopoulos, Y.; Theodoridis, Y.; Vassilakopoulos, M. Multi-way distance join queries in spatial databases. GeoInformatica 2004, 8, 373–402. [CrossRef] Gao, Y.; Shou, L.; Chen, K.; Chen, G. Aggregate farthest-neighbor queries over spatial data. In Proceedings of the International Conference on Database Systems for Advanced Applications, Hong Kong, China, 22–25 April 2011. Wang, H.; Zheng, K.; Su, H.; Wang, J.; Sadiq, S.; Zhou, X. Efficient aggregate farthest neighbour query processing on road networks. In Proceedings of the Australasian Database Conference, Brisbane, Australia, 14–16 July 2014. © 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Shared Execution Approach to -Distance Join Queries in ... - MDPI

Shared Execution Approach to -Distance Join Queries in ... - MDPI

Suggest Documents

Multi-Way Distance Join Queries in Spatial Databases - Delab

Optimization of Parallel Execution for Multi-Join Queries - Knowledge ...

Optimization of Parallel Execution for Multi-Join Queries - Knowledge ...

Shared Execution of Path Queries on Road Networks

Processing Distance Join Queries with Constraints - Semantic Scholar

VA-Files vs. R*-Trees in Distance Join Queries - Semantic Scholar

Processing Top-k Join Queries - VLDB Endowment

FREQUENCY-ADAPTIVE JOIN FOR SHARED ...

The customized-queries approach to CBIR - CiteSeerX

Hierarchical Indexing Approach to Support XPath Queries

A Knowledge-Based Approach to Facilitate Queries

Optimizing Multi-Join Queries in Parallel Relational ... - CiteSeerX

Rank Join Queries in NoSQL Databases - VLDB Endowment

Consistent Join Queries in Cloud Data Stores - Semantic Scholar

Cost Models for Join Queries in Spatial Databases - Semantic Scholar

Scalable Join Queries in Cloud Data Stores - Globule

The Threshold Join Algorithm for Top-k Queries in ... - CiteSeerX

Consistent Join Queries in Cloud Data Stores - Globule

Scalable Join Queries in Cloud Data Stores - Globule

Cost Models for Join Queries in Spatial Databases - Semantic Scholar

An Execution Backtracking Approach to Program ...

Structured Approach to Project Execution, Monitoring

Harissa: A Hybrid Approach to Java Execution

HP Distance via Double Cut and Join Distance - Semantic Scholar