Detour Queries in Geographical Databases for Navigation and Related Algorithm Animations

Tetsuo Shibuya, Hiroshi Imai
Department of Information Science, University of Tokyo, Tokyo 113, Japan

Shigeki Nishimura, Hiroshi Shimoura and Kenji Tenmoku
Sumitomo Electric Industries, Ltd., Osaka, Japan

Abstract

In geographical databases for navigation, users raise various types of queries concerning route guidance. The most fundamental is the shortest-route query, but as dynamic traffic information becomes available and the static geographical database of roads itself grows, more flexible queries are required to realize a user-friendly interface that meets current needs. One important query among them is the detour query, which provides information about detours, for example by listing several candidates for useful detours. We have proposed efficient algorithms for enumerating meaningful detours [6, 14]. In this paper, we first review our algorithms for the static case and discuss their extensions to incorporate dynamic information efficiently. Also, in connection with the user interface, the proposed algorithm is animated, and a prototype version is made public via the WWW. In a more general setting, we discuss data mining of this rapidly growing geographical database as an interesting way to derive useful information from vast geographical data. Applications of this data mining cover a broad class of real-world problems: urban planning, environmental assessment, social welfare, facility management, disaster prevention, etc., from the governmental standpoint, and marketing, customer management, etc., in the business world. This paper investigates the road network for navigation in the geographical database from this point of view, and proposes how to obtain a good collection of candidates satisfying user requirements by our enumeration approach for detours, and how to present them to users through visual interfaces.

1 Introduction

In geographical databases, various types of queries arise, especially spatial and topological queries. One typical query is the shortest-path query. This query is very important in mobile computing environments for car navigation based on geographical databases. In such environments, computing power may be limited, and under these restricted circumstances route navigation systems must show the shortest route to the destination as fast as possible. Thus, realizing a shortest-path query on the road network database is crucial, and its algorithmic issues have been investigated extensively for a long time. We have investigated many algorithms and proposed the sophisticated use of A* and bidirectional search techniques for this query in [6].

Recently, as dynamic traffic information becomes available, such as ATIS (Advanced Traffic Information Service; Traffic Information Service Co.) and VICS (Vehicle Information & Communication System; VICS Promotion Council) in Japan, more sophisticated queries have come to be required. Also, the static geographical database of roads itself has grown further, and in this respect too advanced types of queries are necessary to realize a user-friendly interface that meets the current circumstances. One important query among them is the detour query, which provides information about detours; for example, enumerating several candidates for useful detours. We have proposed an efficient algorithm for enumerating meaningful detours [14].

In this paper, we first discuss the problem of executing data mining in geographical databases, especially for navigation. Since geographical databases grow rapidly, this is a very interesting target, and it generalizes the current mainstream of data mining in relational databases [1, 13] and extensions to data mining in spatial databases [3, 4, 11]. We next review our algorithms in [6, 14] so that we can meet the new demands of these advanced multimedia applications. Among the useful techniques summarized here, the use of the A* technique is then investigated so that available dynamic information can be incorporated effectively in the detour query. Enumerating a set of good candidates, so that users may choose interesting ones according to their own requirements, is a key step in data mining, and our approach provides an efficient tool for this purpose.
Also, in connection with the user interface, the proposed algorithm is animated, and a prototype version is made public on the WWW at http://naomi.is.s.u-tokyo.ac.jp/ in its algorithm animation library. The current version also contains animations of related problems utilizing geographical databases, such as the k-mean, used in spatial data mining, and the vehicle routing problem.

2 Data Mining in Geographical Databases

Data mining, or Knowledge Discovery in Databases (KDD), aims to find interesting, previously unknown and useful information in large databases. There have been many studies of data mining in relational as well as transaction databases, the first targets of this field; e.g., see [1, 13]. Data mining has since been extended to other types of databases. In this section, we first mention extensions to spatial databases [3, 4, 11] and our related results in this setting [8]. Then, we discuss issues to investigate for data mining in geographical databases, especially topological geographical data. This leads to the target problem of this paper: listing an appropriate candidate set of solutions (or knowledge).

2.1 Spatial data mining

Data mining in spatial databases, or spatial data mining, has been proposed; see the survey paper [11] and also [3, 4]. Spatial data mining refers to the extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases. Existing spatial data mining seems to have investigated mainly the part of geographic information strongly connected with remote sensing and image database exploration, and a clustering approach is adopted to derive knowledge. The main algorithmic tools used in this approach are the k-mean, the k-medoid and their extensions. The basic algorithms for them are well known and have been used in many areas. In connection with geographical databases in particular, the so-called geographical optimization approach provides a general algorithmic framework in terms of mathematical programming and computational geometry [9]. To give an idea of these, we show an example in Fig. 1, taken from [8], of applying the k-mean algorithm to about 20,000 points corresponding to major crossings of the road network in the Kanto district in Japan.
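To make the k-mean step concrete, the following is a minimal sketch of the standard Lloyd-style iteration (often called k-means). It is not the implementation used for Fig. 1; the function name `kmean`, the requirement that the caller supplies the initial centers, and the rule that an empty cluster keeps its previous center are assumptions made only for this illustration.

```python
import math


def kmean(points, centers, max_iter=100):
    """Lloyd-style k-mean iteration on 2-D points (illustrative sketch).

    points  -- list of (x, y) tuples
    centers -- list of k initial centers, chosen by the caller
    Returns (centers, labels); labels[i] is the cluster index of points[i].
    """
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are stable too
        labels = new_labels
        # Update step: move each center to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centers, labels
```

The result depends on the initial centers, which is why the quality of the initial solution matters for this kind of clustering.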
The clustering in Fig. 1 is intended for experimental use rather than for any specific data mining task, yet it illustrates how large the amount of geographical data is, even in this restricted area, and its geometric nature. Our research results on spatial data mining will be reported elsewhere.

2.2 Data mining in topological structures of geographical databases

As seen from the above arguments and examples, existing spatial data mining has treated some geometric features of spatial databases. In geographical databases, besides geometric/spatial data, there are topological data representing incidence relations among objects in geographical information. For example, suppose that boundary polygonal lines of administrative districts are represented just as a sequence of line segments, that is, just as pictorial data. Although incidence relations among segments are treated even in this case, this representation does not maintain incidence relations about administrative districts, which makes spatial queries hard to implement. This type of topological data is sometimes called vector data in geographical information systems.

[Figure 1: Application of the k-mean algorithm for 20726 road crossing points with k = 100: (a) initial random solution, (b) solution obtained by the k-mean]

Among them, road networks have topological structures in themselves, and, due to recent technological advances in traffic information systems, as mentioned in the introduction, road network databases are becoming huge and dynamic. Topological data such as road networks implicitly represent transitive relations; that is, even if two vertices in the network are not connected by an edge, they may be connected by a path, a sequence of adjacent edges. Since road networks are almost planar, the number of edges is linear in the number of vertices, while the number of transitive relations among vertices that they represent is quadratic in the number of vertices. Furthermore, if we distinguish transitive adjacency relations between two vertices generated by distinct paths, the original network data represent an exponential number of relations (the number of paths is basically exponential). Thus, the topological part of geographical databases generates an exponential-size search space, and in this huge space there is interesting and useful information for data mining. When combined with other qualitative data such as statistics of geographical information systems, more useful and currently implicit facts may be found.

In geographical databases, various types of queries arise. Among topological queries in particular, a typical one is the shortest-path query. This query is very important in mobile computing environments for car navigation based on geographical databases. In such environments, computing power may be limited, and under these restricted circumstances route navigation systems must show the shortest route to the destination as fast as possible, further utilizing available dynamic traffic information. Viewed as a network/combinatorial optimization problem, this is just the ordinary shortest-path problem, but, combined with such new computing environments, it becomes a problem in the integration of database systems, data visualization and information technology, including communications, and algorithmics. In fact, standardization of geographical and road databases has started, and in it the information technology aspect is much stressed (e.g., see [7]). Finding useful information related to shortest-path queries, as data mining for these data, thus deserves investigation.

For geographical databases, shortest-path queries are only one of various types of queries, and these also create a need for data mining in application fields in the governmental and business worlds. In this paper, we consider detour queries in geographical databases. This is a good target requiring data mining, because in most cases users have multiple criteria for a `nice' route, and one path that is shortest with respect to one criterion is not sufficient. Therefore the second best, third best, etc., are required, but here we need care in defining `nice' suboptimal paths, since there are too many suboptimal paths for the user to check them all. Similar problems occur in the standard data mining of association rules in relational databases [1]. In the standard case, the transitive relation above corresponds to subset relations among the sets of items in each transaction; in finding large itemsets, many candidates exist, and we have to select meaningful ones to finally obtain good association rules. This paper provides practically useful solutions to this part of the problem for detour queries through our way of manipulating suboptimal candidates.

2.3 Visual interface for data mining results in geographical databases

Visual interfaces are crucial in presenting data mining results in geographical databases, since these results are pictorial in nature. Representing a path as a series of crossing names is not sufficient at all; the path should be depicted on a map of appropriate size. In particular, to show many candidates for detours so that the user can choose appropriate ones among them, animation is necessary. The use of visual interfaces in data mining has been proposed (e.g., [5, 10] and others), yet new interfaces would be necessary for geographical databases. In this paper, we do not go into details of the visual interface, but show related animations as algorithm animations on the WWW.

3 Detour-Finding Algorithms

In this paper, let the graph under consideration be a directed graph $G = (V, E)$, in which $l(v, w)$ is the length of the edge $(v, w)$, which is always non-negative, and let $s$ be the source and $t$ the destination when we consider the shortest path problem or the $k$ shortest paths problem. In addition, let $d(u, v)$ be the shortest path length from $u$ to $v$, and let $n = |V|$ and $m = |E|$. We here mainly summarize the modified A* algorithm and the suboptimal techniques.

3.1 A* algorithm and edge-length transformation

The A* algorithm finds the shortest path from source $s$ to destination $t$ efficiently, using a heuristic estimate $h(v)$ for the length of the shortest path from vertex $v$ to the destination $t$, which is not longer than the actual shortest path to the destination. The estimator is called dual feasible if $h$ satisfies the following constraint:

$$ l(u, v) + h(v) \ge h(u) \quad ((u, v) \in E) \tag{1} $$

For example, the Euclidean distance between $v$ and $t$ can be used as a dual feasible estimator in a graph of a road network. If the estimator is dual feasible, the A* algorithm can easily be translated into the ordinary Dijkstra method by modifying the lengths of the edges, as stated in [6] and fully used in [14]:

Theorem 1 Let $h$ be a dual feasible estimator for $s$. The Dijkstra method on a graph in which the length $l(u, v)$ of each edge $(u, v)$ is replaced by $l'(u, v)$ as follows is equivalent to the A* algorithm on the original graph.

$$ l'(u, v) = l(u, v) + h(v) - h(u) \tag{2} $$

In [6], furthermore, another method, called the bidirectional search method, is also examined. In Fig. 2 and Fig. 3, we show the areas searched by the respective algorithms; the sizes of these areas affect the efficiency of each algorithm.
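The edge-length transformation of Theorem 1 can be illustrated with a small sketch: run the plain Dijkstra method on the transformed lengths and add back $h(s) - h(t)$ to recover the original distance. The grid test network, the helper names (`dijkstra`, `astar_via_dijkstra`, `grid`) and the clamping of tiny negative values caused by floating-point rounding are assumptions of this illustration, not part of the algorithms in [6, 14].

```python
import heapq
import math


def dijkstra(adj, s, t):
    """Plain Dijkstra; returns (distance s->t, number of settled vertices)."""
    dist = {s: 0.0}
    settled = set()
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == t:
            return d, len(settled)
        for v, length in adj[u]:
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf, len(settled)


def astar_via_dijkstra(adj, h, s, t):
    """Dijkstra on l'(u,v) = l(u,v) + h(v) - h(u), as in equation (2)."""
    reduced = {
        # max(0.0, ...) only guards against rounding; for a dual feasible h
        # the exact value is already non-negative by constraint (1).
        u: [(v, max(0.0, l + h(v) - h(u))) for v, l in edges]
        for u, edges in adj.items()
    }
    d_reduced, settled = dijkstra(reduced, s, t)
    # Every s-t path is shifted by h(t) - h(s), so undo that shift.
    return d_reduced + h(s) - h(t), settled


def grid(n):
    """Hypothetical test network: n x n grid with unit-length edges."""
    adj = {}
    for x in range(n):
        for y in range(n):
            adj[(x, y)] = [
                ((x + dx, y + dy), 1.0)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < n and 0 <= y + dy < n
            ]
    return adj
```

With the Euclidean distance to $t$ as the estimator, for example `h = lambda v: math.dist(v, t)`, both searches return the same distance, while the transformed search typically settles far fewer vertices because edges pointing toward $t$ become short.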

[Figure 2: The distribution of searched edges on finding the shortest path from Machida (lower) to Tokorozawa (upper), where the shortest path is shown by bold lines: (a) the Dijkstra method, (b) the A* algorithm, (c) the bidirectional Dijkstra method, (d) the bidirectional A* algorithm]

[Figure 3: The distribution of searched edges on finding the shortest path from Machida (left) to Ichihara (right) by the bidirectional Dijkstra method after the edge length transformation corresponding to A*]

3.2 Suboptimal technique

Eppstein [2] proposed an algorithm which finds the $k$ shortest paths implicitly, regardless of cycles, in time $O(m + n \log n + k)$, or $O(m + n \log n + k \log k)$ if the output paths are sorted. We briefly summarize this algorithm here.

First, we define $\delta(u, v)$ for the edge $(u, v)$ as follows:

$$ \delta(u, v) = l(u, v) + d(v, t) - d(u, t) \tag{3} $$

This $\delta(u, v)$ measures how much longer the route becomes, compared with the optimal one, if we take the edge $(u, v)$; it is therefore always non-negative. If we search by the Dijkstra method from the destination $t$, a shortest path tree to $t$ is obtained. If an edge $(u, v)$ is on this tree, $\delta(u, v)$ is zero. An edge that is not on the shortest path tree is called a sidetrack. If we go along an $s$-$t$ path $p$ other than the shortest path, there must be sidetracks on the path, and we define sidetrack($p$) as the sidetrack nearest to $t$ among them. We can conceptually consider a heap in which the parent of a path $p$ is the path that is the same as $p$ up to sidetrack($p$) and then follows the shortest path instead of taking sidetrack($p$). We denote this parent of $p$ by parent($p$). The root of the heap is the shortest path, and every path from $s$ to $t$ appears in the heap exactly once. In this heap, $p$ is $\delta$(sidetrack($p$)) longer than parent($p$).

The basic idea of Eppstein's algorithm is to transform this path heap into a 4-heap, i.e., a heap in which each node has at most 4 children. Once the 4-heap has been constructed, we can get the $k$ shortest paths in $O(k)$ time, or $O(k \log k)$ time if we sort the output paths. This algorithm is very efficient for finding the $k$ shortest paths from all the vertices to one destination, or from one source to all other vertices. For the 2-terminal problem, however, this algorithm may search many more vertices than necessary; such issues are resolved in [14]. Furthermore, incorporating A* into this suboptimal technique is discussed in [14].
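As a small illustration of equation (3), the sketch below computes $d(v, t)$ by running the Dijkstra method from $t$ on the reversed graph and then evaluates the sidetrack cost of every edge. It covers only this first step, not Eppstein's heap construction; the dict-of-dicts graph format and the names `distances_to` and `sidetrack_costs` are assumptions made for the example.

```python
import heapq
import math


def distances_to(graph, t):
    """d(v, t) for every vertex v that can reach t (Dijkstra on reversed edges)."""
    rev = {}
    for u, edges in graph.items():
        rev.setdefault(u, [])
        for v, length in edges.items():
            rev.setdefault(v, []).append((u, length))
    dist = {t: 0}
    done = set()
    heap = [(0, t)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        for u, length in rev[v]:
            if d + length < dist.get(u, math.inf):
                dist[u] = d + length
                heapq.heappush(heap, (d + length, u))
    return dist


def sidetrack_costs(graph, t):
    """delta(u, v) = l(u, v) + d(v, t) - d(u, t) for every edge that can reach t."""
    dist = distances_to(graph, t)
    return {
        (u, v): length + dist[v] - dist[u]
        for u, edges in graph.items()
        for v, length in edges.items()
        if u in dist and v in dist
    }
```

Summing these costs along any $s$-$t$ path gives the amount by which that path exceeds $d(s, t)$, which is the property the path heap relies on.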
If a dual feasible estimator is given, transforming the length $l(u, v)$ of an edge to $l'(u, v)$ as described in (2) of Theorem 1 does not change the $k$ shortest paths:

Theorem 2 The $k$ shortest paths from $s$ to $t$ on a graph in which the length $l(u, v)$ of each edge $(u, v)$ is replaced by $l'(u, v)$ as in (2) are the same as those on the original graph.
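The reason behind Theorem 2 can be seen from a short telescoping argument. The path notation $p = (v_0, \dots, v_k)$ and the shorthand $l(p)$, $l'(p)$ for the total original and transformed lengths are introduced only for this sketch.

```latex
% For any path p = (v_0 = s, v_1, ..., v_k = t), with or without cycles:
\begin{align*}
l'(p) &= \sum_{i=0}^{k-1} \bigl( l(v_i, v_{i+1}) + h(v_{i+1}) - h(v_i) \bigr) \\
      &= l(p) + h(v_k) - h(v_0) \\
      &= l(p) + h(t) - h(s).
\end{align*}
```

Every $s$-$t$ path is shifted by the same constant $h(t) - h(s)$, so the ordering of the paths by length is unchanged.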

Thus we can use the A* algorithm implicitly by changing the lengths of the edges.

3.3 Answering the detour query in a clever way

A `detour' is a path which is short but overlaps little with the shortest path. Finding it is very important in route navigation systems, ATM networks, and so on. We discuss how to obtain such a detour based on the above algorithms.

In searching for a detour, the length of the overlap of a detour with the shortest path is an important factor. This length can be computed very fast for every path encountered while searching the path heap graph. For details, see [14].

`Detour' is not a clearly defined concept, so we must define it precisely. The simplest definition would be as follows: a `detour' is the shortest path whose overlap with the shortest path is shorter than half of the shortest path length. But this definition requires searching the path heap in order until a desired path is found, which means it takes $O(k \log k)$ time to check $k$ paths, and setting $\Delta$ is difficult. Furthermore, the detour obtained by this definition may branch off and rejoin the shortest path many times. In a car navigation system, such a detour is not desirable. Taking these things into consideration, we define `detour' as follows [14]:

Definition 1 A `detour' is a path that is at most $\Delta$ longer than the shortest path, branches off and joins the shortest path only once, and has the smallest overlap with the shortest path among such paths. If several paths satisfy these constraints, the shortest one is chosen.
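Definition 1 can be checked on tiny graphs with a brute-force sketch that enumerates all cycle-free $s$-$t$ paths. This is only an executable reading of the definition, not the path-heap method of [14], and it is exponential in general. The following are assumptions of this sketch: the function names, the dict-of-dicts graph format, and the edge-based interpretation of "branches off and joins only once" (a shared prefix, then edges off the shortest path, then a shared suffix).

```python
def simple_paths(graph, u, t, prefix=None):
    """Yield every cycle-free path from u to t as a list of vertices."""
    prefix = [u] if prefix is None else prefix
    if u == t:
        yield list(prefix)
        return
    for v in graph.get(u, {}):
        if v not in prefix:
            prefix.append(v)
            yield from simple_paths(graph, v, t, prefix)
            prefix.pop()


def edges_of(path):
    return list(zip(path, path[1:]))


def length(graph, edges):
    return sum(graph[u][v] for u, v in edges)


def find_detour(graph, s, t, delta):
    """Return the detour of Definition 1 by exhaustive search, or None."""
    paths = list(simple_paths(graph, s, t))
    if not paths:
        return None
    p_opt = min(paths, key=lambda p: length(graph, edges_of(p)))
    opt = edges_of(p_opt)
    opt_set = set(opt)
    limit = length(graph, opt) + delta
    best = None
    for p in paths:
        e = edges_of(p)
        total = length(graph, e)
        if total > limit:
            continue  # more than delta longer than the shortest path
        i = 0  # number of leading edges shared with the shortest path
        while i < min(len(e), len(opt)) and e[i] == opt[i]:
            i += 1
        j = 0  # number of trailing edges shared with the shortest path
        while j < min(len(e), len(opt)) - i and e[-1 - j] == opt[-1 - j]:
            j += 1
        middle = e[i:len(e) - j]
        if not middle or any(x in opt_set for x in middle):
            continue  # the shortest path itself, or it rejoins more than once
        overlap = length(graph, e[:i]) + length(graph, e[len(e) - j:])
        key = (overlap, total)  # smallest overlap first, then shortest
        if best is None or key < best[0]:
            best = (key, p)
    return None if best is None else best[1]
```

A larger `delta` admits longer candidates, so the detour with the smallest overlap may change as `delta` grows.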

[Figure 4: Explanation of the definition of `detour' (labels: s, t, shortest path)]

In Fig. 4, there are two `branchings', and each may be used separately in our detours, but the path using both of them is disregarded in our approach, since it can be generated from the previous two paths.

We now discuss how to obtain the detour defined in Definition 1. The method to search the path heap for paths which branch off and join the shortest path only once is as follows: if $u$ of $(u, v)$ = sidetrack($p$) is on the shortest path tree to $t$ and parent($p$) is not the shortest path $p_{opt}$, we only have to search the children of $p$ in $H_T$. This technique can also be applied in the bidirectional method, and note that listing paths which branch off and join the shortest path $i$ times can be done with a similar method. Also, notice that, if the overlap from $s$ to sidetrack($p$) along $p$ is longer than the current shortest overlap, we likewise only have to search the children of $p$ in $H_T$. To make this technique more efficient, we should not search the children of $p_{opt}$ in $H_T(s)$ from its root, but search $H_{out}(v)$ and its children from $s$ to $t$ along $p_{opt}$.

[Figure 5: Two representations of crossings by networks]

It should be noted that our algorithm is based on Eppstein's algorithm, and hence still considers suboptimal paths that contain cycles. Using the technique mentioned above, we can remove such paths efficiently; moreover, the ratio of paths containing cycles is small, and hence our approach results in a practical algorithm to enumerate suboptimal paths without cycles. In connection with this, how to represent a road network in computers becomes very important for our detour algorithm. In a naive representation, the network is constructed by representing each crossing by a vertex and connecting two adjacent crossings by two directed edges with opposite directions (Fig. 5, left). However, with this representation, short cycles are formed by going to an adjacent crossing and returning to the current crossing. This decreases the efficiency of the detour algorithm. On the other hand, in our test data, the network is in principle represented as in Fig. 5, right (note that in Japan cars keep left). With this representation, there still exist cycles around each block in towns, but they are long enough for the detour algorithm to ignore them. Also, with this representation, small and large penalties may be put on left and right turns by assigning small and large conceptual lengths to the corresponding edges, respectively. In this respect, our detour algorithm fits practical representations of road networks.

3.4 An example

Figure 6 shows the detours obtained from Sayama to Matsudo on a real road network database of the Kanto district in Japan, when $\Delta$ is 100 seconds and 120 seconds. In the figure, the thickest line is the shortest path, and the relatively thinner line which branches off from it is the obtained detour. Note that the left terminal is Sayama and the other is Matsudo.

[Figure 6: Detours between Sayama and Matsudo: (1) $\Delta = 100$, (2) $\Delta = 120$]

In this road network data, intersections are not directly represented as nodes in the graph (cf. Fig. 5), since the costs of turning left or right or going straight at intersections differ, U-turns cannot be made at most intersections, and so on. Accordingly, we let a node in the graph be one side of a road segment between intersections. Hence, in a road network, the detour defined in Definition 1 can cross the shortest path at intersections in the map.

[Figure 7(a): Animation of listing naive suboptimal paths, where only small differences can be seen]

4 Incorporating Dynamic Information in Query Processing

[Figure 7(b): Animation of listing our detours, where at the 120th stage a nice detour is detected and only this type of detour can be reported by using user criteria. Figure 7: Animation of detour algorithms]

Dynamic traffic information provides detailed information on traffic jams. If we have computed shortest paths under ideal traffic conditions, incorporating dynamic information in our framework of the A* and suboptimal techniques is rather direct, based on the techniques we have developed so far. First, the ideal values are a dual feasible estimator for A*, since congestion can only lengthen travel times relative to the ideal ones. Hence, by executing the edge-length transformation, all our methods can be applied in this case.

One remaining problem is how to maintain the ideal values. As long as the destination is fixed, by solving a shortest-path problem to the destination using some A* estimator, say distance information, for the ideal fastest value, the computed ideal values can be used as a very good estimator in the subsequent computation. We have to compute these values within a somewhat wider region than the region necessary for the computation of ideal shortest paths, and, for this issue, our approach of a suboptimal region $E_\Delta$ given in [6, 14] can be used, where $E_\Delta$ is the set of edges contained in a path from the source to the destination having length at most $\Delta$ longer than the shortest one. This $\Delta$ may be set to an expected delay from the ideal time in the preprocessing stage; then, we need not compute this good estimator for the A* application in the process of computation. Also, the value of $\Delta$ may be set in connection with the memory size and other computing environments (note that, in mobile car navigation systems, computing power is not so high).
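One way to compute a suboptimal region of this kind is sketched below. It assumes that paths may revisit vertices, as in the Eppstein-style enumeration; under that assumption an edge $(u, v)$ lies on some $s$-$t$ path of length at most $d(s, t) + \Delta$ exactly when $d(s, u) + l(u, v) + d(v, t) \le d(s, t) + \Delta$. The function names and the dict-of-dicts graph format are hypothetical, and the actual procedure in [6, 14] may differ.

```python
import heapq
import math


def shortest_distances(adj, source):
    """Dijkstra from source; returns {vertex: distance} for reachable vertices."""
    dist = {source: 0}
    done = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, length in adj.get(u, {}).items():
            if d + length < dist.get(v, math.inf):
                dist[v] = d + length
                heapq.heappush(heap, (d + length, v))
    return dist


def reverse(adj):
    """Return the graph with every edge direction flipped."""
    rev = {u: {} for u in adj}
    for u, edges in adj.items():
        for v, length in edges.items():
            rev.setdefault(v, {})[u] = length
    return rev


def suboptimal_region(adj, s, t, delta):
    """Edges (u, v) with d(s,u) + l(u,v) + d(v,t) <= d(s,t) + delta."""
    from_s = shortest_distances(adj, s)
    if t not in from_s:
        return set()  # t is unreachable, so the region is empty
    to_t = shortest_distances(reverse(adj), t)
    limit = from_s[t] + delta
    return {
        (u, v)
        for u, edges in adj.items()
        for v, length in edges.items()
        if u in from_s and v in to_t and from_s[u] + length + to_t[v] <= limit
    }
```

The region grows monotonically with `delta`, which matches the remark that $\Delta$ can be tuned against memory size and computing power.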

[Figure 8: k-mean animation: (a) k-mean animation for real data, (b) k-mean animation for a bad initial solution for points uniformly distributed in a square]

5 Algorithm Animation and User Interface

Now that it has become possible to obtain meaningful detours, our next problem is how to show them to users. Just presenting all candidates is not enough. A version of the ATIS service mentioned in the introduction provides three routes, including the shortest one, as candidates. However, just providing three candidates may not be enough to meet various types of user requirements. By our method, we can very easily provide many more candidates which differ greatly from one another, and users may then set some parameters based on their preferences to select appropriate candidates among them. Also, presenting this information as a real-time animation may be helpful. Furthermore, from the viewpoint of algorithm explanation, this animation is very useful, and we are planning to incorporate expository aspects of algorithmic issues in computing shortest and suboptimal paths.

[Figure 9: Animation of a tabu search process for the Vehicle Routing Problem (VRP), which is a very common real-world problem]

With these in mind, we have made an algorithm animation of our method. The current version is available at http://naomi.is.s.u-tokyo.ac.jp/ in its algorithm animation library. The current version does not contain an explanation, and this will be improved soon. If you open the page above, proceed to the algorithm animation library, where other animations are also provided. These animations are encoded in MPEG and can be viewed in various environments.

We have also made an animation (Fig. 8) for the k-mean discussed in section 2.1, and, as another promising application of geographical databases, an animation of the vehicle routing problem (Fig. 9) is shown (cf. [12]). These animations should give insight into user interfaces for presenting query results in geographical databases.

Acknowledgment

The authors would like to thank members of Imai Laboratory at University of Tokyo, especially Ms. Mary Inaba and Messrs. Ken-Ichi Asai, Motoki Nakade and Narihiro Park, for their cooperation in this research project. This work was supported in part by the Grant-in-Aid for Scientific Research on Priority Areas "Advanced Databases" of the Ministry of Education, Science, Sports and Culture of Japan.

References

[1] R. Agrawal, T. Imielinski and A. Swami: Mining Association Rules Between Sets of Items in Large Databases. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993, pp. 207-216.

[2] D. Eppstein: Finding the k Shortest Paths. Proceedings of the 35th IEEE Annual Symposium on Foundations of Computer Science, 1994, pp. 154-165.

[3] M. Ester, H.-P. Kriegel, J. Sander and X. Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD96), 1996.

[4] M. Ester, H.-P. Kriegel and X. Xu: Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification. Proceedings of the 4th International Symposium on Large Spatial Databases, 1995, pp. 67-82.

[5] T. Fukuda, Y. Morimoto, S. Morishita and T. Tokuyama: Data Mining Using Two-dimensional Optimized Association Rules: Scheme, Algorithms, and Visualization. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, 1996, pp. 13-23.

[6] T. Ikeda, M.-Y. Hsu, H. Imai, S. Nishimura, H. Shimoura, K. Tenmoku and K. Mitoh: A Fast Algorithm for Finding Better Routes by AI Search Techniques. Proceedings of the International Conference on Vehicle Navigation & Information System (VNIS'94), 1994, pp. 90-99.

[7] H. Imai, K.-I. Asai and M. Inaba: Trends in Standardization in Geographical Databases and Geographical Information Systems (in Japanese). Proceedings of the Matsue Workshop on Advanced Databases, Vol. 2, 1996, pp. 311-316.

[8] H. Imai and M. Inaba: Geometric Clustering with Applications. Zeitschrift für Angewandte Mathematik und Mechanik (ZAMM), ICIAM95 Proceedings, Issue 3, "Applied Stochastics and Optimization", 1996, pp. 183-186.

[9] M. Iri, K. Murota and T. Ohya: A Fast Voronoi-Diagram Algorithm with Applications to Geographical Optimization Problems. Proceedings of the 11th IFIP Conference on System Modelling and Optimization, Lecture Notes in Control and Information Science, Vol. 59, Springer-Verlag, 1984, pp. 273-288.

[10] D. A. Keim, H.-P. Kriegel and T. Seidl: Supporting Data Mining of Large Databases by Visual Feedback Queries. Proceedings of the 10th International Conference on Data Engineering, 1994, pp. 302-313.

[11] K. Koperski, J. Adhikary and J. Han: Knowledge Discovery in Spatial Databases Progress and Challenges. Proceedings of Workshop on Research Issues on Data Mining and Knowledge Discovery, 1996.

[12] M. Nakade, N. Park, H. Imai, S. Nishimura, H. Shimoura and K. Tenmoku: Tabu Search Approach to the Vehicle Routing Problem and Its Experimental Analysis (in Japanese). IPSJ SIG Notes SIGAL, IPSJ, 1997, to appear.

[13] G. Piatetsky-Shapiro and W. J. Frawley, eds.: Knowledge Discovery in Databases. AAAI/MIT Press, 1991.

[14] T. Shibuya, T. Ikeda, H. Imai, S. Nishimura, H. Shimoura and K. Tenmoku: Finding a Realistic Detour by AI Search Techniques. Proceedings of the 2nd Intelligent Transportation Systems (ITS'95), 1995, pp. 2037-2044.
