
Mining moving objects trajectories in Location-based services for spatio-temporal database update

Danhuai Guo∗a, Weihong Cuia
a Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing 100101

ABSTRACT

Advances in wireless transmission and mobile technology applied to LBS (Location-based Services) flood us with large amounts of moving-object data. The vast amounts of data gathered from the position sensors of mobile phones, PDAs, or vehicles hide interesting and valuable knowledge and describe the behavior of moving objects. So far, the correlation between the temporal movement patterns of moving objects and the spatio-temporal attributes of geo-features has been ignored, and the value of spatio-temporal trajectory data has not been fully exploited either. Urban expansion and frequent town-plan changes produce a large amount of outdated or imprecise data in the spatial databases of LBS, and such data cannot be updated promptly and efficiently by manual processing. In this paper we introduce a data mining approach to extracting the movement patterns of moving objects, build a model to describe the relationship between the movement patterns of LBS mobile objects and their environment, and put forward a spatio-temporal database update strategy for LBS databases based on spatio-temporal mining of trajectories. Experimental evaluation indicates that the proposed model and strategy perform well. Our original contributions include the formulation of a model of the interaction between a trajectory and its environment, the design of a spatio-temporal database update strategy based on moving-object data mining, and the experimental application of spatio-temporal database update by mining moving-object trajectories.

Keywords: LBS (Location-based Services), spatio-temporal data mining, mining moving trajectory objects, spatiotemporal database, update strategy

1. INTRODUCTION

With the increasing popularity of LBS (Location-based Services), more and more researchers and businesses concentrate on how to improve its QoS (Quality of Service), which is constrained by data quality. There are four classical questions that LBS users are concerned with: Where am I currently? What and where are the nearest locations of interest? How to get there? [1] These questions become spatio-temporal ones if we extend them, for example: Where am I now, twenty minutes after boarding bus No. 803 at Tiananmen station? What and where is the nearest parking lot with a vacancy? How do I get there while avoiding traffic jams? Current LBS systems, constrained by information-collection technology and data-transmission infrastructure, can hardly answer these questions to users’ satisfaction. The services that LBS currently provide lack context.

Human movement and other behavior are constrained by the surroundings. The environment imprints its spatio-temporal character, to a greater or lesser degree, on the trajectory that records an LBS consumer’s movement. A large set of moving-object trajectories therefore implies a considerable quantity of spatio-temporal information about the external environment. An LBS spatio-temporal database updated by traditional methods can hardly keep pace with changes in the spatial and temporal attributes of the environment. On the one hand, the amount of trajectory data is increasing explosively; on the other hand, spatio-temporal knowledge of the trajectories’ environment is scarce [2].

“Data mining”, which is often also referred to as “Knowledge Discovery in Databases” (KDD) and “Pattern recognition”, is a young sub-discipline of computer science aiming at the automatic interpretation of large datasets. Knowledge discovery in databases (KDD) is the non-trivial extraction of implicit, previously unknown, and potentially useful information from databases [3] [4]. Over the last decade, a wealth of research articles on new data mining techniques has been published, and the field keeps growing, both in industry and in academia. Spatio-temporal data mining is an emerging research area dedicated to the development and application of novel computational techniques for the analysis of large spatio-temporal databases. It is not merely an extension of general data mining to spatio-temporal datasets, as both the spatial and the temporal dimension add substantial complexity to data mining tasks. Spatio-temporal data mining offers a new way to extract hidden and useful knowledge from moving-object trajectories.

∗ [email protected]; phone: (86) 10 64889568; fax: (86) 10 64889211
Geoinformatics 2008 and Joint Conference on GIS and Built Environment: Geo-Simulation and Virtual GIS Environments, Lin Liu, Xia Li, Kai Liu, Xinchang Zhang, Aijun Chen, Eds., Proc. of SPIE Vol. 7143 71432M · © 2008 SPIE · CCC code: 0277-786X/08/$18 · doi: 10.1117/12.812625

Downloaded From: http://proceedings.spiedigitallibrary.org/ on 10/18/2013 Terms of Use: http://spiedl.org/terms

As the prices of positioning systems and data-transfer devices fall and wireless communication technology advances, massive numbers of moving-object trajectories can be obtained easily. A trace is a trajectory generated by the motion of a single moving object, as recorded by GPS receivers or other positioning devices. LBS users’ traces are recorded and transmitted to a trajectory database while they use the services. Because the trajectories are imprinted by their environment, the objective of our mining is to find environment patterns and use them to update the LBS spatio-temporal database. This paper attempts to mine trajectories in order to update a spatio-temporal database. The rest of the paper is organized as follows. Section 2 introduces and summarizes related work on moving-object mining. Section 3 describes the mining algorithms and their implementation. Section 4 proposes spatio-temporal database update strategies. The conclusion and future work are discussed in Section 5.

2. RELATED WORK

Many researchers have contributed to moving-object mining in recent years. Schroedl S. et al employed differential GPS receivers and a spatial clustering algorithm to extract centerlines, lane positions, intersection structure and road segments[5]. This method is limited by the accuracy of GPS receivers, as most GPS receivers embedded in vehicles are not differential receivers with decimeter-level accuracy but navigational ones with an accuracy of a few tens of meters. Laube, P. et al (2005) and Gudmundsson, J. et al (2007) provided REMO to detect four spatio-temporal patterns, namely flock, leadership, convergence, and encounter[6] [7]. Verhein, F. and Chawla, S. (2006) defined STARs, which describe how objects move between regions over time, and presented two useful patterns in mobility data: stationary regions and high-traffic regions consisting of sources, sinks and thoroughfares[8]. Li, X. (2007) proposed a density-based algorithm, namely FlowScan, to mine hot routes in a road network[9]. The association rule mining method based on database projection described in [10] is efficient at extracting long, sharable frequent routes. The spatio-temporal similarity calculations provided in [11] [12] [13] investigated STDist and provided different calculators for different situations; these algorithms are adopted in our experiment. Lee, J. et al (2004) proposed a new data mining technique and algorithms for identifying temporal patterns from series of locations of moving objects in LBS, employed spatial operations to generalize the locations of moving objects, and applied time constraints between locations of moving objects to form valid moving sequences[14]. Yang, J. and Hu, M. (2006) proposed a model of trajectory patterns and a novel measure to represent the expected occurrences of a pattern in a set of imprecise trajectories, and mined sequential patterns from imprecise trajectories of mobile objects[15]. Jin, M. et al (2007) proposed an approach to predict the future locations of moving objects from their long-term movement history[16]. Andersson, M. (2007) proposed leader and follower patterns and developed mining methods to extract those patterns from trajectories[17]. Compieta, P. et al (2007) investigated the visualization of discovered knowledge and employed the Google Earth APIs and Java3D to visualize mining results[18]. Other research contributes to spatio-temporal data mining in clustering, mining grouped data and mining at different granularities[19] [20] [21] [22].

3. MOVING OBJECTS TRAJECTORY MINING

Moving objects are geometries (points, lines, regions or volumes) that change their position over time. Objects may move or change at a specific instant in time, without any existence, or any knowledge of their existence, in between. Movement is then perceived as neither continuous nor stepwise, but as a collection of separate instants or intervals. A person and a car are obvious examples of potential moving objects. A moving object is an object that is not constrained to keep the same position during its whole existence. Objects that move become particularly interesting when we record their trajectories[23]. In this paper, we restrict the term moving object to denote an object with which we can associate a trajectory. Although a trajectory is naturally thought of as the record of the position of the object at every instant, in reality a trajectory has to be built from a set of sample points, i.e. the positions of the moving object at different instants. Trajectory data modeling has received a lot of attention from researchers, who either apply existing spatio-temporal data models to trajectory data or propose new models specifically dedicated to moving objects and their trajectories.

The main methods for reconstructing a trajectory from sample points are linear interpolation and Bezier-curve interpolation. The former connects the sample points with straight lines; the linearity is expressed in the fact that equal jumps in time (between the same sample points) lead to equal jumps in space. This method assumes that the moving object has constant speed and direction between sample points, that this speed is the minimal average speed needed to cover the distance between (x_i, y_i) and (x_{i+1}, y_{i+1}) in the time interval [t_i, t_{i+1}], and that changes in speed and direction at sample points are abrupt and discontinuous. The advantage of this method is that it is fast and easy to construct and handle. The latter connects the sample points with a Bezier curve in which each spatial coordinate is a third-degree polynomial of the time coordinate, and the vectors given at the sample points are exactly the velocity vectors of the curve at those points. Bezier curves have a nice, smooth shape and are fast to construct but hard to handle.

There are three categories of approach for modeling trajectories, namely spatio-temporal data models, constraint data models and moving-object models. The spatio-temporal data models are off-the-shelf and supported by many GIS standards and models such as ISO TC 211, STER, Perceptory and MADS, but are not accepted by the database community. In the constraint database model, which was introduced and developed by Kanellakis, Kuper and Revesz [24], a trajectory can be seen as a collection of infinitely many points connecting a finite number of sample points. Representative constraint models are the linear constraint model, the differential geometry model [25] and the equations-of-motion model [26]. The moving-object model focuses on keeping track of object locations and supporting queries. The approach widely used in industry is to model a moving point by periodically generating a location-time point of the form (l, t), indicating that the object is at location l at time t.

3.1. Trajectories are constrained by environment

Human movement is a combined outcome of subjective will and the influence of the environment. Human moving behavior is constrained by the surroundings: pedestrians walk on sidewalks, and drivers drive on the road network and obey traffic rules and traffic signals. Take a simple example: Tom is going to drive to Wal-Mart for shopping at 14:00. He plans to drive along Road A and then make a U-turn at the intersection of Road A and Road 2, namely Intersection 2. As he approaches Intersection 2, he notices that Intersection 2 is “No U-turn” in the time interval 7:00~22:00.
Two shortest driving routes are then left for him to choose from. On the first route, he drives straight through Intersection 2, makes a U-turn at the intersection of Road A and Road 1, namely Intersection 3, then drives straight back to Intersection 2, and finally reaches the destination. On the other route, he turns left at Intersection 2 onto Road 2, makes a U-turn at the intersection of Road 2 and Road B, namely Intersection 1, then turns right at Intersection 2 onto Road A, and finally reaches the destination (Fig 1). In this example, going to Wal-Mart is the subjective will; the road network and its constraints are the environment. Moving-object trajectories are the mixed effect of the moving objects’ “subjective will” and the constraints of the environment. The “subjective will” is explicit and can easily be derived from the start point and end point of trajectories or trajectory segments. The constraint of the environment is implicit and mixes many factors, but it is interesting and valuable. The environment constraints include topological constraints and temporal constraints. The topological constraints include the restrictions of the road network, obstacles formed by immovable objects, etc. The temporal constraints include time-dependent rules, accidents, etc.

[Figure 1 labels: Road A, Road B, Start Point, Destination]

Fig 1 Tom’s driving route options. (Tom is going to drive to Wal-Mart. The red and blue solid lines are the optional routes. Dash-dot lines represent the road centerlines. Intersection 2 is “No U-turn”.)

A single trajectory blends the moving object’s subjective will with composite topological constraints and composite temporal constraints. It can be written as:

$$\mathit{traj} = \mathit{will} + \sum \mathit{constr\_topo} + \sum \mathit{constr\_temp} \qquad (1)$$

The topological constraint can be decomposed into many sub-constraint factors, such as speed limits on specific roads or road parts and turn prohibitions at specific intersections, while the temporal constraint can be decomposed into many sub-constraint factors, such as traffic control in a specific time interval, traffic jams in a specific time interval, and road closures at a specific instant:


$$\sum \mathit{constr\_topo} = \mathit{constr\_topo}_1 + \mathit{constr\_topo}_2 + \dots + \mathit{constr\_topo}_m \qquad (2)$$

And:

$$\sum \mathit{constr\_temp} = \mathit{constr\_temp}_1 + \mathit{constr\_temp}_2 + \dots + \mathit{constr\_temp}_n \qquad (3)$$

Different trajectories have different topological constraints and different temporal constraints. But two different trajectories may share one or more topological sub-constraint factors if they pass through the same area and are influenced by the same constraint factors. Likewise, when two trajectories pass through the same area in the same time interval, they may be influenced by one or more common temporal sub-constraint factors. It follows naturally that specific topological and temporal sub-constraint factors are associated with certain moving behaviors of people, which are recorded as trajectories. For a single trajectory, the effect of a single sub-constraint factor is weak; it is often masked by the person’s subjective will and mixed with other constraints. It is hard to extract the environment constraints of a single trajectory by dynamic-systems methods, as the mechanism by which the environment acts is too complex. Many applications have shown that data mining can find association rules among large numbers of phenomena that appear unrelated. The challenge is to mine the constraints of the environment, and their sub-parts, from a large number of trajectories.

3.2. Moving objects data mining

Moving-object mining is knowledge discovery applied to a trajectory database, i.e. the process of converting raw trajectory data into useful knowledge. In this paper, we focus on discovering live traffic information, which is described as constraints of the environment in Section 3.1. Trajectory mining consists of a series of transformation steps:

• Trajectory preprocessing, which transforms the raw trajectories into an appropriate form for the subsequent analysis;

• Trajectory selection, which selects the candidate trajectories that meet the conditions of the mining task;

• Actual data mining, which transforms the prepared trajectories into patterns or models: classification models, clustering models, association patterns, etc.;

• Post-processing of the trajectory mining results, which assesses the validity and usefulness of the extracted patterns and models, presents interesting knowledge to the final users by appropriate methods, and integrates the knowledge into database updating.

3.2.1. Data preprocessing

The road network spatial data in most current commercial traffic maps are organized in the Arc-Node model, which is not sufficient for precise road and lane representation. The trajectories collected by GPS receivers and other positioning devices store locations as 2-D or 3-D geo-coordinates, which require a spatial reference system. In traffic applications, it has been verified that a road linear referencing system is more efficient for road indexing and location querying than a spatial reference system [27] [28] [29]. The raw trajectories are therefore transformed to a linear referencing system and road segments: (x_i, y_i, t_i) ⇒ (R_m, l_n, D_i, t_i). In this transformation, R_m and l_n are the road number and lane number where the moving object is located at time instant t_i; D_i is the length along the road between the road start point and the foot of the perpendicular from the trajectory sample point to the road. All road numbers and lane numbers are assigned by spatial indexing, and each road and lane is numbered uniquely.

The coordinates recorded by GPS sometimes drift because of satellite signal blockage, multipath effects and other causes. When the GPS position drifts too far, it is difficult to determine the road and lane numbers. There are two methods for dealing with these imprecise trajectories: the first is to discard those trajectories or trajectory parts; the other is to transform the imprecise trajectory sample points to their corresponding most probable locations. The former can raise mining efficiency but loses some useful data. On urban road segments surrounded by high buildings, if the trajectories that fail the accuracy requirement are filtered out by the first method, the trajectories at those places will always have gaps. The latter tries to move the imprecise data to its most probable position according to the moving object’s inertia and to models that predict its moving trend. Yang J. and Hu M. (2006) provided an improved probable-position algorithm based on predict_loc = last_loc + v × t [15]. We adopt that algorithm and extend it as follows:


• If the moving object is on road i at time t1, and the time interval ∆t is small enough, then at the moment t2 = t1 + ∆t the moving object will still be on the same road i;

• Adjustment based on offset: suppose trajectory sample points A, B, C, … (ordered by offset distance from the road centerline) lie in different lanes of the same road, in the same direction and at the same moment. If any sample point (e.g. A) falls outside the range of reasonable lane positions, the lane number of the innermost sample point is set to the innermost lane number and the other trajectory points are rearranged according to their offset distances, and vice versa (Fig 2). The velocity trajectories are adjusted similarly.
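The position prediction and the same-road rule above can be illustrated with a minimal Python sketch. It applies predict_loc = last_loc + v × t to the along-road distance D of a linearly referenced sample. The names `LinearSample`, `predict_sample`, `max_dt` and `road_length` are hypothetical, and the 5-second default gap, the clamping to the road extent and the unchanged lane are assumptions made for illustration, not part of the algorithm in [15].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinearSample:
    road: int    # R_m, road number
    lane: int    # l_n, lane number
    dist: float  # D_i, metres along the road from its start point
    t: float     # t_i, seconds


def predict_sample(last: LinearSample, speed: float, t: float,
                   road_length: float, max_dt: float = 5.0) -> Optional[LinearSample]:
    """Estimate the position at time t from the last reliable sample.

    Returns None when the gap is negative or longer than max_dt, because the
    same-road rule is only stated for a small time interval.
    """
    dt = t - last.t
    if dt < 0 or dt > max_dt:
        return None
    # predict_loc = last_loc + v * t, applied to the along-road distance.
    dist = last.dist + speed * dt
    # Assumption: clamp to the road extent instead of moving to the next road.
    dist = min(max(dist, 0.0), road_length)
    # Assumption: the lane is carried over unchanged from the last sample.
    return LinearSample(last.road, last.lane, dist, t)
```

A caller would replace an out-of-tolerance GPS fix with the returned sample, and fall back to discarding the fix when the function returns None.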


Fig 2 The adjustment of imprecise trajectory. (a) The recorded position of raw trajectory. (b) The adjusted position of trajectory
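The transformation (x_i, y_i, t_i) ⇒ (R_m, l_n, D_i, t_i) used in this subsection can be sketched as a projection of each sample point onto road centerlines. The following Python sketch is an illustration only: it assumes planar coordinates in metres, represents each road as a hypothetical polyline of (x, y) vertices, and picks the road with the smallest perpendicular offset. It returns the offset rather than a lane number, since deriving l_n from the offset needs lane-width data that the paper does not specify.

```python
import math


def project_to_road(point, centerline):
    """Return (D, offset) for a point and a road centerline polyline.

    D is the distance along the centerline from its start to the foot of the
    perpendicular; offset is the distance from the point to that foot.
    """
    px, py = point
    best = None  # (offset, D) of the closest segment seen so far
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(centerline, centerline[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicated vertices
        # Fraction along the segment of the perpendicular foot, kept in [0, 1].
        u = ((px - x1) * dx + (py - y1) * dy) / (seg_len ** 2)
        u = min(1.0, max(0.0, u))
        fx, fy = x1 + u * dx, y1 + u * dy
        offset = math.hypot(px - fx, py - fy)
        if best is None or offset < best[0]:
            best = (offset, travelled + u * seg_len)
        travelled += seg_len
    if best is None:
        raise ValueError("centerline needs at least one non-degenerate segment")
    offset, d = best
    return d, offset


def to_linear_reference(sample, roads):
    """Map (x, y, t) to (road_id, D, offset, t) using the nearest road."""
    x, y, t = sample
    candidates = ((rid, project_to_road((x, y), line)) for rid, line in roads.items())
    road_id, (d, offset) = min(candidates, key=lambda item: item[1][1])
    return road_id, d, offset, t
```

A large returned offset is a natural signal that a sample is imprecise and should be either discarded or moved to its most probable position, as discussed above.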

3.2.2. Similar trajectories retrieving and data selecting

The investigation of trajectories in Section 3.1 shows that only trajectories with spatio-temporal similarity (passing through the same regions in the same time interval) can be influenced by the same constraints of the environment. It is therefore critical to retrieve trajectories with spatio-temporal similarity before the actual mining. Trajectory retrieval based on spatio-temporal similarity is in fact spatio-temporal clustering and classification modeling. There are two main measures of spatio-temporal similarity, i.e. Euclidean distance and spatio-temporal distance (STDist) [11] [12]. The method based on Euclidean distance uses spatial similarity as its only measure and is inappropriate for retrieving similar trajectories constrained by a road network. The method based on STDist takes the spatial measure, the temporal measure and the spatio-temporal measure into consideration, and performs better in retrieving similar trajectories [11]. The STDist of two trajectories is the product of two factors, one built from the spatial distance (SD) and one from the temporal distance (TD):

$$\mathit{STDist} = \left(\frac{SD}{\delta} + 1\right) \times (TD + 1) \qquad (4)$$
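As a small illustration of formula (4), the Python sketch below computes STDist from already-computed SD and TD values and ranks candidate trajectories in ascending order of STDist, which is how the measure is used for retrieval in this section. The function names and the `(trajectory_id, SD, TD)` tuple layout are hypothetical, and the units of SD, TD and δ are whatever the application chooses.

```python
def stdist(sd: float, td: float, delta: float) -> float:
    """Spatio-temporal distance of formula (4): (SD / delta + 1) * (TD + 1)."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return (sd / delta + 1.0) * (td + 1.0)


def most_similar(candidates, delta: float, m: int):
    """Rank (trajectory_id, SD, TD) tuples by ascending STDist and keep m.

    Returns a list of (trajectory_id, STDist) pairs, most similar first.
    """
    scored = sorted((stdist(sd, td, delta), tid) for tid, sd, td in candidates)
    return [(tid, score) for score, tid in scored[:m]]
```

With non-negative SD and TD, the smallest possible value is 1, reached when both distances are zero. A larger δ shrinks the spatial factor, so the ranking is then driven more by the temporal distance.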

SD between two trajectories is defined as the sum of the distances between edge pairs from the two trajectories; the SD between two edges is calculated by averaging the distance between the start nodes of the edges and the distance between their end nodes. TD between two trajectories is defined as a weighted sum of the time range distance (TRD), day distance (DD) and week distance (TWD), each with its own coefficient. In formula (4), “δ” is the spatio-temporal weight and is assigned different values in different mining applications. The algorithm to retrieve similar spatio-temporal trajectories based on STDist can be designed as:

Input: spatial positions, time intervals
Output: similar trajectories with their spatio-temporal distance (STDist)
1. Create an ideal trajectory satisfying all the input conditions;
2. Using the ideal trajectory, find all trajectories that match at least one segment of the ideal trajectory;
3. For all members of the result of step 2, calculate SD, TD and STDist;
4. Sort the trajectories in ascending order of STDist;
5. Select the m most similar trajectories (according to a user-specified threshold).

Thus the candidate trajectories are chosen and prepared for the next step, the actual data mining. The candidate trajectories cover the vast majority of the trajectories influenced by the constraints to be mined.

3.2.3. Trajectories mining

Data preprocessing based on road linear referencing systems and road segmentation, together with trajectory retrieval based on spatio-temporal similarity, simplifies the trajectories and reduces the number of objects to be mined. But trajectory data mining is still a grand challenge from the spatio-temporal point of view. During the last five years, attempts have been made to extend many ordinary data mining techniques for classical relational or transactional data, such as association rule mining, frequent pattern discovery, clustering, classification, prediction and time-series analysis, to knowledge discovery in data with spatial and temporal tags[30]. For moving-object trajectories, the main spatio-temporal constraints are the road conditions and other traffic conditions. So the obvious tasks we would like to perform concern live road conditions and road information at fine granularity. Other information could be mined from trajectories in similar ways. The following sections focus on how to extract traffic conditions from trajectories. The algorithms for extracting knowledge near a road intersection and away from a road intersection are different, so the following sections describe the data mining algorithms for the two situations separately.

3.2.3.1. Traffic condition extraction near road intersection

When driving toward a road intersection, drivers are concerned with whether the coming intersection allows going straight through, a left turn, a right turn or a U-turn and, if turns are allowed, which lanes are the turn lanes.
In way-finding and other transport planning applications, the turn information of a road intersection in different time intervals is one of the critical factors. In data mining from trajectories near a road intersection, the emphasis of the mining task is to find the correlations between different roads and lanes and to calculate their support. The first step is to select all trajectories, and clip the trajectory segments, that are related to the road intersection Road_Inti in the time interval [t1, t2]. Select trajectories and clip trajectory segments from trajectories where trajectory.interpoint is near Road_Inti and t1
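The idea of relating incoming roads and lanes to outgoing roads and computing their support can be sketched in Python as follows. This is a hedged illustration, not the paper's algorithm: it assumes the trajectory segments near one intersection have already been clipped and reduced to hypothetical `(in_road, in_lane, out_road, t)` tuples, and it defines support as the fraction of segments in [t1, t2] that show a given transition, which is an assumption since the text does not give a formula.

```python
from collections import Counter


def turn_support(segments, t1: float, t2: float) -> dict:
    """Support of each (in_road, in_lane, out_road) transition in [t1, t2].

    segments: iterable of (in_road, in_lane, out_road, t) tuples clipped
    around a single intersection. Returns an empty dict if no segment falls
    inside the time interval.
    """
    counts = Counter(
        (in_road, in_lane, out_road)
        for in_road, in_lane, out_road, t in segments
        if t1 <= t <= t2
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {transition: n / total for transition, n in counts.items()}
```

Under these assumptions, a transition such as a U-turn whose support stays at or near zero during one time interval but not during another hints at a time-dependent turn restriction like the one in the example of Section 3.1.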
