Temporal Pattern Mining of Moving Objects for Location-Based Service

Jae Du Chung¹, Ok Hyun Paek², Jun Wook Lee¹, and Keun Ho Ryu¹

¹ Department of Computer Science, Chungbuk National University, San 48, Gaesin-dong, Cheongju, Chungbuk, Republic of Korea {chungjaedu,junux,khryu}@dblab.chungbuk.ac.kr
² Agency for Defense Development, Republic of Korea [email protected]

Abstract. LBS (Location-Based Service) is generally described as an information service that provides location-based information to its mobile users. Since conventional data mining studies do not consider the spatial and temporal aspects of data simultaneously, these techniques have limited application to the moving objects of LBS, whose spatial attributes change over time. In this paper, we propose a new data mining technique and algorithms for identifying temporal patterns in series of locations of moving objects that have temporal and spatial dimensions. For this purpose, we use a spatial operation to generalize the location of a moving point and apply time constraints between locations of moving objects to form valid moving sequences. Finally, we show that our technique generates the temporal patterns found in frequent moving sequences.

This work was supported in part by KOSEF RRC (Cheongju Univ. ICRC) and KISTI Bioinformatics Research Center.

R. Cicchetti et al. (Eds.): DEXA 2002, LNCS 2453, pp. 331–340. © Springer-Verlag Berlin Heidelberg 2002

1 Introduction

LBS aims to identify individuals' locations accurately and, by applying this information to various marketing and service tasks, to provide more personalized and satisfying mobile services to its users. The service is particularly applicable to entities whose locations change over time, such as PDAs, mobile telephones, automobiles, and airplanes. Such entities, which change in location and pattern over time, are defined as "moving objects" [6,11]. The temporal changes of moving objects tend to follow a unique, regular pattern. This pattern can be traced by using temporal data mining techniques [14]. The patterns of moving objects discovered by data mining can be quite useful to location-based information services for identifying users' moving paths [14].

Prior studies, however, have paid little attention to the location data of moving objects. Temporal pattern mining for location-based moving objects is similar to the analysis of transaction databases [2,10,15] and the study of consumer behavior on the web [3,8,9]. Nonetheless, since this prior research does not take spatial attributes into account, it is not sufficient to discover spatial patterns of moving objects and is therefore of limited use in specific sectors such as LBS.

In this paper, we adopt an approach that consists of several distinct stages. First, location information is generalized by applying a spatial operation to moving objects in a two-dimensional coordinate system, which transforms raw coordinates into knowledge about the location of moving objects. Second, a time constraint is imposed between the locations of a moving pattern in order to transform an uncertain moving sequence into an effective transaction. This step is needed because the components of a sequence are not clearly delimited in moving object mining. By imposing a maximum time constraint between the two areas that constitute a sequence, a sequence is generated only when the time span between two locations satisfies that constraint. Finally, an algorithm that discovers significant patterns from the moving sequences of moving objects is presented. This algorithm is an extension and application of Apriori [1] that addresses the issues of moving pattern mining.

The paper proceeds as follows. Section 2 explores the definition of moving objects. Section 3 discusses the discovery of moving patterns. Section 4 presents a new method of moving pattern mining. The proposed method is tested and evaluated through experiments in Section 5. Section 6 summarizes the work, provides concluding remarks, and offers some suggestions for future research.

2 Description of Moving Objects

The existing models of data mining are too static to properly identify the location of moving objects, which continues to change over time. Many studies have since addressed the tracking of moving objects in temporal and spatial dimensions [5,7,8,12,13]. A location change of a moving object may occur in a discrete or continuous pattern, and thus it can be described at a point in time or over time periods. Each description has its merits as well as deficiencies and, thus far, there is no consistent and well-developed definition of the moving object. Here, we abstract the essential definitional components of the term with respect to moving pattern mining.

As seen in Fig. 1, the topological attributes of moving objects change sequentially on the two-dimensional coordinate plane with x- and y-axes. Since we cannot properly describe the continuous changes of moving objects in a real-world setting, we draw the locations of moving objects using discrete points. Each point marks the start or end of a time span. The moving object description adopted in this paper therefore embraces only the basic components, for generality, and implies that the location is sampled at specific points in time. The spatial attributes of moving objects are described using a plane coordinate system with x- and y-axes. Mpoint, an abstracted type of moving objects, is defined as follows:

Definition 1. $Mpoint = \langle oid, \{(VT_1, L_1), (VT_2, L_2), \ldots, (VT_n, L_n)\} \rangle$, where $oid$ is a unique discriminator (identifier) of the object, $VT_i$ is the effective time, and $L_i$ is the location of the sampled object, denoted by $(x, y)$.

Fig. 1. Location change of the moving objects (sampled points (t_i, x_i, y_i) plotted against the x, y, and t axes)

Table 1 shows descriptive examples of such moving objects in the form of a relational database table.

Table 1. Example of the moving objects

oid   vt                 x         y
100   2001/10/10/13/10   3321000   -233100
100   2001/10/10/13/20   3397000   -463600
100   2001/10/10/13/25   3385000   -523600
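As a rough illustration of Definition 1, the Mpoint type can be sketched as a small Python data class holding an identifier and a time-ordered list of (effective time, location) pairs. The class name `MPoint`, the method `add_sample`, and the reading of the `vt` column as year/month/day/hour/minute are assumptions made for this sketch; the paper does not prescribe an implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

# Assumed reading of the vt column in Table 1: year/month/day/hour/minute.
VT_FORMAT = "%Y/%m/%d/%H/%M"


@dataclass
class MPoint:
    """A moving object: an identifier plus time-stamped sampled locations."""

    oid: int
    samples: List[Tuple[datetime, Tuple[float, float]]] = field(default_factory=list)

    def add_sample(self, vt: str, x: float, y: float) -> None:
        """Append one (VT, L) pair and keep the samples ordered by effective time."""
        self.samples.append((datetime.strptime(vt, VT_FORMAT), (x, y)))
        self.samples.sort(key=lambda s: s[0])


# Rows of Table 1 as (oid, vt, x, y).
rows = [
    (100, "2001/10/10/13/10", 3321000, -233100),
    (100, "2001/10/10/13/20", 3397000, -463600),
    (100, "2001/10/10/13/25", 3385000, -523600),
]

obj = MPoint(oid=100)
for _, vt, x, y in rows:
    obj.add_sample(vt, x, y)

for vt, location in obj.samples:
    print(obj.oid, vt.isoformat(), location)
```

Keeping the samples sorted by effective time means later stages can walk the list once to compare consecutive timestamps.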

3 Problem Definition of Moving Pattern Mining

Let $L = \{l_1, l_2, \ldots, l_m\}$ denote a finite set of coordinates that represent the spatial location attributes of moving objects, where $l_i = (x_i, y_i)$ and $x_i$, $y_i$ are the coordinate values of a moving object in a two-dimensional coordinate system. Also, let $A = \{a_1, a_2, \ldots, a_n\}$ denote a set of areas that represent the values of the spatial location attributes of moving objects, where for $1 \le j \le n$, $a_j = (l_1, l_2, \ldots, l_k)$ and $l_k = (x_k, y_k)$. This allows an area to be described by representative coordinate values, and the spatial attributes of moving objects, given as coordinate values, can be transformed into an area through a spatial operation. This transformation of the coordinate values that represent the spatial information of moving objects into an area that includes those coordinates is called "generalization of the location".

A sequence $S = \langle s_1 s_2 \cdots s_k \rangle$ is an ordered list of areas, where $k$ denotes the length of the sequence, $s_j = (t_j, a_j)$, $t_j$ denotes the time at which the moving object was sampled, and $a_j \in A$. Time constraints restrict the time gap between two consecutive areas that constitute a sequence. The occurrence times of consecutive movements are denoted by $t_j$ and $t_{j-1}$. The maximal time gap, max_gap, is defined as follows:

$t_j - t_{j-1} \le max\_gap, \quad 2 \le j \le k \qquad (1)$

A sequence of length $k$, i.e., a sequence composed of $k$ areas, is called a $k$-sequence. An area may appear several times in a given sequence. For a given moving object, the areas arranged in time order are referred to as its "moving sequence". If there exist indices $i_1 < i_2 < \cdots < i_n$ such that $a_1 = b_{i_1}, a_2 = b_{i_2}, \ldots, a_n = b_{i_n}$, then the sequence $\langle a_1 a_2 \cdots a_n \rangle$ is a partial sequence of $\langle b_1 b_2 \cdots b_m \rangle$. For example, $\langle 2\ 5\ 9\ 8 \rangle$ is a partial sequence of $\langle 6\ 2\ 1\ 5\ 9\ 3\ 8 \rangle$. Let $S = \{s_1, s_2, \ldots, s_m\}$ denote a set of moving sequences, where each $s_i$ ($1 \le i \le m$) is a moving sequence. If a sequence $s$ is a partial sequence of $s'$, then $s'$ is said to contain $s$. The support of a sequence $s$ is the proportion of all moving sequences (including $s$ itself) that contain $s$, i.e., $sup(s) = |\{s_i \mid s \subseteq s_i\}| / m$.

The user-specified minimum support threshold, denoted min_sup, is the lowest support value that every frequent sequence must reach. If a sequence $s$ has $sup(s) \ge min\_sup$, it is defined as a frequent sequence. Although an area may appear several times in a moving sequence, it is counted no more than once per sequence when computing support.

Definition 2. Given a moving objects database (D), a user-assigned minimum support (min_sup), and a user-assigned time constraint between areas (max_gap), moving pattern mining involves searching for all frequent moving sequences that satisfy the minimum support.
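The partial-sequence relation and the support measure defined above can be sketched in a few lines of Python. The function names `is_partial_sequence` and `support`, and the four toy moving sequences, are hypothetical and serve only to illustrate the definitions.

```python
from typing import List, Sequence


def is_partial_sequence(s: Sequence, t: Sequence) -> bool:
    """True if s can be obtained from t by deleting elements while keeping order."""
    remaining = iter(t)
    # "item in remaining" advances the iterator, so each match must come
    # strictly after the previous one.
    return all(item in remaining for item in s)


def support(s: Sequence, sequences: List[Sequence]) -> float:
    """Fraction of moving sequences that contain s; each sequence counts at most once."""
    return sum(is_partial_sequence(s, t) for t in sequences) / len(sequences)


# The example from the text.
print(is_partial_sequence([2, 5, 9, 8], [6, 2, 1, 5, 9, 3, 8]))  # True

# Hypothetical moving sequences over areas A to D.
toy_sequences = [["A", "B", "C"], ["A", "C"], ["B", "C"], ["D"]]
min_sup = 0.5
sup_ac = support(["A", "C"], toy_sequences)
print(sup_ac, sup_ac >= min_sup)  # 0.5 True
```

Because containment is a yes/no test per moving sequence, repeated occurrences of a pattern inside one sequence do not inflate its support.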

4 Temporal Pattern Mining of Moving Objects

In this section, based upon the definitions given in Section 3, we present a technique for discovering moving patterns. The algorithm for moving pattern mining consists of four stages: database arrangement, location generalization, moving sequence extraction, and frequent moving pattern mining.

4.1 Database Arrangement Stage

The database to be mined should be sorted with the object discriminator as the primary key and the effective time as the secondary key. This ordering allows the later transformation stages to process each moving object's records in time order. Table 2 shows an example of a database arranged by discriminator and time. Time is measured in minutes.

4.2 Location Generalization Stage

We transform the location values of the moving objects into areas with fixed boundaries using a spatial operation. In this process, the spatial operation tests whether an object's (x, y) point lies within a specified area. A spatial area is represented by a polygon. The Contains spatial operation [14] takes a coordinate point and a specified area as input and tests whether the area includes the point. It returns 'true' if the point is inside the area and 'false' otherwise. Table 3 shows the result of applying the Contains operation to the spatial attribute values of each moving object in the arranged database of Table 2, transforming them into generalized areas.

Table 2. Example of an arranged database

oid   vt                 x    y
1     2001/10/30/13/10   15   10
1     2001/10/30/13/15   38   15
1     2001/10/30/13/25   55   8
1     2001/10/30/13/38   65   19
2     2001/11/01/12/30   5    17
2     2001/11/01/12/38   7    35
2     2001/11/01/12/45   35   16
2     2001/11/01/12/56   51   18
3     2001/10/30/14/11   23   15
3     2001/10/30/14/17   59   19
3     2001/10/30/14/23   77   12
3     2001/10/30/14/58   78   35

Table 3. Location after Contains operation

oid   vt                 Location
1     2001/10/30/13/10   A
1     2001/10/30/13/15   B
1     2001/10/30/13/25   C
1     2001/10/30/13/38   D
2     2001/11/01/12/30   A
2     2001/11/01/12/38   E
2     2001/11/01/12/45   B
2     2001/11/01/12/56   C
3     2001/10/30/14/11   B
3     2001/10/30/14/17   C
3     2001/10/30/14/23   D
3     2001/10/30/14/58   H
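The arrangement and generalization stages can be sketched as follows. The paper's Contains algorithm is given in [14] and is not reproduced here; this sketch substitutes a standard ray-casting point-in-polygon test. The area geometry is also an assumption: the paper does not list the polygons, so eight hypothetical 20 x 20 squares (A to D in the bottom row, E to H in the top row) are used because they happen to be consistent with Tables 2 and 3. The names `contains`, `square`, `AREAS`, and `generalize` are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def contains(polygon: Polygon, point: Point) -> bool:
    """Ray-casting point-in-polygon test (points exactly on an edge are not treated specially)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def square(x0: float, y0: float, size: float = 20.0) -> Polygon:
    """Axis-aligned square with lower-left corner (x0, y0)."""
    return [(x0, y0), (x0 + size, y0), (x0 + size, y0 + size), (x0, y0 + size)]


# Hypothetical area layout (not given in the paper): A-D bottom row, E-H top row.
AREAS: Dict[str, Polygon] = {
    name: square(20.0 * col, 20.0 * row)
    for row, names in enumerate(["ABCD", "EFGH"])
    for col, name in enumerate(names)
}


def generalize(point: Point) -> Optional[str]:
    """Return the name of the first area that contains the point, or None."""
    for name, polygon in AREAS.items():
        if contains(polygon, point):
            return name
    return None


# Rows of Table 2 as (oid, vt, x, y).
rows = [
    (1, "2001/10/30/13/10", 15, 10), (1, "2001/10/30/13/15", 38, 15),
    (1, "2001/10/30/13/25", 55, 8), (1, "2001/10/30/13/38", 65, 19),
    (2, "2001/11/01/12/30", 5, 17), (2, "2001/11/01/12/38", 7, 35),
    (2, "2001/11/01/12/45", 35, 16), (2, "2001/11/01/12/56", 51, 18),
    (3, "2001/10/30/14/11", 23, 15), (3, "2001/10/30/14/17", 59, 19),
    (3, "2001/10/30/14/23", 77, 12), (3, "2001/10/30/14/58", 78, 35),
]

# Database arrangement: oid is the primary key, effective time the secondary key.
# The zero-padded vt strings sort chronologically as plain text.
arranged = sorted(rows, key=lambda r: (r[0], r[1]))

# Location generalization: replace each (x, y) by the name of its area.
generalized = [(oid, vt, generalize((x, y))) for oid, vt, x, y in arranged]

for row in generalized:
    print(row)
```

With this assumed layout the printed locations follow the Location column of Table 3; a real deployment would load the actual area polygons instead.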

4.3 Moving Sequence Extraction Stage

In this stage, the moving sequence of each moving object is extracted. That is, the transactions for moving pattern mining are created in this stage. While a sequence as the object of pattern mining is clearly defined in a transaction database, a sequence as the object of moving pattern mining is not. In order to generate significant moving sequences, we impose a maximum time constraint between the areas that constitute a sequence. A sequence is produced only when the time between two consecutive locations stays within the maximum time constraint. In addition, the effective time related to the spatial attributes of the objects is examined during this process. If the duration of an object's stay at a location exceeds max_gap, the moving sequence is split into the part before the gap and the part after it. Table 4 shows an example of the moving sequences of each object, extracted from the database in Table 3. In this example, we assume that max_gap is 30 minutes.

Table 4. Moving sequences

Oid   Moving Sequences
1     ⟨A B C D⟩
2     ⟨A E B C⟩
3     ⟨B C D⟩, ⟨H⟩
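The extraction stage can be sketched as a single pass over the generalized records of Table 3, starting a new moving sequence whenever the gap to the previous sample exceeds max_gap. The function name `extract_moving_sequences` and the timestamp format are assumptions of this sketch, not part of the paper.

```python
from datetime import datetime, timedelta
from itertools import groupby
from typing import Dict, List, Tuple

# Assumed reading of the vt column: year/month/day/hour/minute.
VT_FORMAT = "%Y/%m/%d/%H/%M"


def extract_moving_sequences(
    rows: List[Tuple[int, str, str]], max_gap: timedelta
) -> Dict[int, List[List[str]]]:
    """Split each object's time-ordered areas wherever t_j - t_(j-1) > max_gap."""
    result: Dict[int, List[List[str]]] = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for oid, group in groupby(ordered, key=lambda r: r[0]):
        sequences: List[List[str]] = []
        previous = None
        for _, vt, area in group:
            current = datetime.strptime(vt, VT_FORMAT)
            # Start a new sequence at the first sample or after too long a gap.
            if previous is None or current - previous > max_gap:
                sequences.append([])
            sequences[-1].append(area)
            previous = current
        result[oid] = sequences
    return result


# Rows of Table 3 as (oid, vt, area).
table3 = [
    (1, "2001/10/30/13/10", "A"), (1, "2001/10/30/13/15", "B"),
    (1, "2001/10/30/13/25", "C"), (1, "2001/10/30/13/38", "D"),
    (2, "2001/11/01/12/30", "A"), (2, "2001/11/01/12/38", "E"),
    (2, "2001/11/01/12/45", "B"), (2, "2001/11/01/12/56", "C"),
    (3, "2001/10/30/14/11", "B"), (3, "2001/10/30/14/17", "C"),
    (3, "2001/10/30/14/23", "D"), (3, "2001/10/30/14/58", "H"),
]

moving_sequences = extract_moving_sequences(table3, max_gap=timedelta(minutes=30))
for oid, seqs in moving_sequences.items():
    print(oid, seqs)
```

For object 3, the 35 minutes between 14:23 and 14:58 exceed the 30-minute limit, so its records split into two moving sequences, while objects 1 and 2 each yield one.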

4.4 Frequent Moving Pattern Mining Stage

This stage mines, from the moving sequences, the frequent moving patterns whose support reaches the threshold assigned by the user. For this purpose, we use a modified version of the Apriori [1] algorithm, the representative association rule algorithm that effectively reduces candidate sets.

$F_k$ represents the frequent $k$-sequences, and $C_k$ represents the candidate $k$-sequences. $C_k$ is the self-join of $F_{k-1}$, i.e., $F_{k-1} * F_{k-1}$. Two moving sequences $\langle s_1, \ldots, s_{k-1} \rangle$ and $\langle s'_1, \ldots, s'_{k-1} \rangle$ in $F_{k-1}$ are joined if $\langle s'_1, \ldots, s'_{k-1} \rangle$ includes $\langle s_1, \ldots, s_{k-2} \rangle$, or if $\langle s_1, \ldots, s_{k-1} \rangle$ includes $\langle s'_1, \ldots, s'_{k-2} \rangle$. Next, any candidate in $C_k$ that contains a $(k-1)$-sequence not included in $F_{k-1}$ is eliminated. This pruning is based on the observation that a supersequence of an infrequent sequence cannot be frequent. We also use a hash tree to store the candidate sets and to check efficiently which candidates occur in a moving sequence. Assuming that min_sup corresponds to two sequences, Fig. 2 shows how the frequent moving patterns are extracted from the moving sequences in Table 4. Thus far, we have divided the process of moving pattern mining into four stages. The entire MP (Moving Pattern mining) algorithm is given in [14].

Fig. 2. Example of candidate sequence generation (panels F1, C2, F2, C3, F3 with columns Sequence and Support; each candidate set is counted by a scan of D)
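A level-wise, Apriori-style loop over moving sequences can be sketched as below. This is not the MP algorithm of [14]: the join used here is the common suffix/prefix join for sequences (two (k-1)-sequences are joined when the first without its head equals the second without its tail), candidates are counted by a direct scan rather than a hash tree, and all function names are hypothetical. The input is the set of four moving sequences that the 30-minute constraint yields from Table 3, with min_sup = 0.5 standing for two of four sequences.

```python
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def is_partial_sequence(s: Sequence[str], t: Sequence[str]) -> bool:
    """True if s occurs in t in order, with gaps allowed."""
    remaining = iter(t)
    return all(item in remaining for item in s)


def support(s: Sequence[str], sequences: List[Sequence[str]]) -> float:
    """Fraction of moving sequences that contain s."""
    return sum(is_partial_sequence(s, t) for t in sequences) / len(sequences)


def generate_candidates(frequent: Set[Pattern]) -> Set[Pattern]:
    """Self-join F(k-1), then prune candidates having an infrequent (k-1)-subsequence."""
    joined = {a + b[-1:] for a in frequent for b in frequent if a[1:] == b[:-1]}
    return {
        c for c in joined
        if all(c[:i] + c[i + 1:] in frequent for i in range(len(c)))
    }


def mine_moving_patterns(
    sequences: List[Sequence[str]], min_sup: float
) -> Dict[Pattern, float]:
    """Return every frequent pattern with its support, found level by level."""
    candidates: Set[Pattern] = {(area,) for seq in sequences for area in seq}
    patterns: Dict[Pattern, float] = {}
    while candidates:
        # One scan of the sequences counts the support of every candidate.
        frequent = {
            c: sup for c in candidates
            if (sup := support(c, sequences)) >= min_sup
        }
        patterns.update(frequent)
        candidates = generate_candidates(set(frequent))
    return patterns


sequences = [
    ["A", "B", "C", "D"],
    ["A", "E", "B", "C"],
    ["B", "C", "D"],
    ["H"],
]

patterns = mine_moving_patterns(sequences, min_sup=0.5)
for pattern, sup in sorted(patterns.items(), key=lambda p: (len(p[0]), p[0])):
    print(" ".join(pattern), sup)
```

The pruning step is what keeps the candidate sets small: a 3-sequence is counted only if all three of its 2-subsequences were already found frequent. Because the join and containment rules here are assumptions, the output should not be read as a reproduction of Fig. 2.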

5 Experimental Results

In this section, we analyze the performance of the proposed algorithm using experimental data.

5.1 Experimental Design and Data Generation

The experiment was conducted using random data generated by a modified version of the data generator used in [1]. The modification is that, when the coordinate-based location properties of moving objects are mapped into areas, |E| represents the average number of lines that form an area. The data generation parameters are defined in Table 5 and their values are presented in Table 6.

Table 5. Data generating parameters

Input Variables   Description
|D|               The number of total moving sequences to be input into the database.
|C|               The average number of areas per moving sequence.
|E|               The average number of lines constituting an area.
|S|               The average length of potential frequent sequences.
Ns                The number of potential frequent sequences.
N                 The number of areas.

Table 6. Parameter values for data generation

Generated Data   |C|   |E|   |S|
C5.E4.S4         5     4     4
C5.E6.S4         5     6     4
C10.E4.S4        10    4     4
C10.E4.S8        10    4     8
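The dataset names in Table 6 encode their own parameter values, which a small helper can make explicit. The function `parse_dataset_label` is a hypothetical convenience for reading such labels and is not part of the data generator.

```python
import re
from typing import Dict


def parse_dataset_label(label: str) -> Dict[str, int]:
    """Decode a label such as 'C10.E4.S8' into the parameters |C|, |E| and |S|."""
    match = re.fullmatch(r"C(\d+)\.E(\d+)\.S(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized dataset label: {label!r}")
    c, e, s = (int(group) for group in match.groups())
    return {"|C|": c, "|E|": e, "|S|": s}


for label in ["C5.E4.S4", "C5.E6.S4", "C10.E4.S4", "C10.E4.S8"]:
    print(label, parse_dataset_label(label))
```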

The data generation process for the experiment was designed to vary the number of areas that constitute a sequence, the number of lines that constitute an area, and the average length of the resulting patterns.

5.2 Performance Evaluation

Using the generated data, the performance of the algorithm is evaluated with respect to changes in the minimum support, the number of total moving sequences, and the average number of areas per moving sequence. Fig. 3 shows the execution time for each generated dataset as the minimum support decreases from 1% to 0.35%.

Fig. 3. Execution time in response to changes in minimum support level (x-axis: minimum support (%), from 1 down to 0.35; y-axis: time in seconds; one curve per generated dataset)

Fig. 4. Execution time in response to changes in number of total moving sequences (x-axis: number of moving sequences in thousands, 10 to 100; y-axis: time in seconds; curves labeled 2%, 1.5%, and 1%)

Fig. 5 shows the execution time as the number of areas per moving sequence is increased from 5 to 20. We let |D| = 20000, Ns = 4000, and N = 2000.

In summary, several conclusions can be drawn from the performance evaluation. First, over the entire course of the algorithm, the two steps of location generalization and frequent moving pattern mining do not influence each other's performance. Second, the support level entered by the user greatly influences the performance of the algorithm. Because lower support levels generate more candidate sets, the cost of scanning the database rises. Third, as the length of the moving sequences grows, especially when the length exceeds 10, the execution time rises, which indicates the need for more efficient algorithms and improved storage capacity. Finally, as the size of the input database increases, the execution time of the algorithm gradually increases.

Fig. 5. Execution time in response to changes in number of areas per moving sequence (x-axis: number of areas per sequence, 5 to 20; y-axis: time in seconds; curves labeled 2%, 1.5%, and 1%)

6 Conclusion and Future Works

In this research, we define individual users as moving objects and present a mining technique, useful for LBS, that discovers significant patterns in users' location information as it changes over time. The algorithm for moving pattern mining consists of four stages. First, the database is arranged by object discriminator and effective time. Second, spatial operators are applied to the location information of moving objects to generalize locations, transforming raw coordinates into areas from which significant information can be discovered. Third, a time constraint is imposed to extract effective moving sequences. Finally, the frequent moving patterns are extracted from the generated moving sequences.

The results of the experiment show that the execution time of the algorithm increases as the support level decreases and as the length of the moving sequences and the size of the database increase. This mining technique for the spatial locations of moving objects differs from the existing techniques used in web-log analysis and transaction analysis. By adopting a simultaneous spatio-temporal approach, the technique can provide useful knowledge for LBS. Future research on pattern mining in this area should consider not only the location information of moving objects but also information such as speed and direction, as well as the duration of a moving object's stay in a given area.

References

1. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Proc. of Int. Conf. on VLDB, Santiago, Chile (1994)
2. Agrawal, R., Srikant, R.: Mining Sequential Patterns. In: Proc. of Int. Conf. on Data Engineering (1995)
3. Chen, M.S., Park, J., Yu, P.S.: Efficient Data Mining for Path Traversal Patterns. IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 2 (1998)
4. Erwig, M., Guting, R.H., Schneider, M., Vazirgiannis, M.: Spatio-Temporal Data Types: An Approach to Modeling and Querying Moving Objects in Databases. GeoInformatica, Vol. 3, No. 3 (1999)
5. Forlizzi, L., Guting, R.H., Nardelli, E., Schneider, M.: A Data Model and Data Structures for Moving Objects Databases. In: Proc. of the ACM-SIGMOD Int. Conf. on Management of Data (2000)
6. Garofalakis, M.N., Rastogi, R., Shim, K.: SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. In: Proc. of Int. Conf. on VLDB (1999)
7. Guting, R.H., Bohlen, M.H., Erwig, M., Jensen, C.S., Lorentzos, N.A., Schneider, M., Vazirgiannis, M.: A Foundation for Representing and Querying Moving Objects. ACM Transactions on Database Systems (2000)
8. Borges, J., Levene, M.: A Fine Grained Heuristic to Capture Web Navigation Patterns. SIGKDD Explorations, Vol. 2, No. 1 (2000)
9. Pei, J., Han, J., Mortazavi-Asl, B., Zhu, H.: Mining Access Patterns Efficiently from Web Logs. In: Proc. of PAKDD (2000)
10. Srikant, R., Agrawal, R.: Mining Sequential Patterns: Generalizations and Performance Improvements. In: Proc. of Int. Conf. on Extending Database Technology, Springer-Verlag (1996)
11. Wolfson, O., Sistla, A.P., Xu, B., Zhou, J., Chamberlain, S.: DOMINO: Databases fOr MovINg Objects tracking. In: Proc. of the ACM-SIGMOD Int. Conf. on Management of Data (1999)
12. Ryu, K., Ahn, Y.: Application of Moving Objects and Spatiotemporal Reasoning. A TimeCenter Technical Report TR-58 (2001)
13. Park, S., Ahn, Y., Ryu, K.: Moving Objects Spatiotemporal Reasoning Model for Battlefield Analysis. In: Proceedings of Military, Government and Aerospace Simulation part of ASTC2001 (2001)
14. Paek, O.: A Temporal Pattern Mining of Moving Objects for Location Based Service. Master thesis, Dept. of Computer Science, Chungbuk National University (2002)
15. Yun, H., Ha, D., Hwang, B., Ryu, K.: Mining Association Rules on Significant Rare Data using Relative Support. Journal of Systems and Software, 2002 (accepted)