
Mobile Sequential Pattern Mining in Location-Based Service

International Journal of Electronics and Computer Science Engineering Available Online at www.ijecse.org


ISSN- 2277-1956

MOBILE SEQUENTIAL PATTERN MINING IN LOCATION-BASED SERVICE ENVIRONMENT

Ms. V. C. Belokar 1, Mr. P. S. Kulkarni 2
1,2 Lecturer, Information Technology Dept., K. K. Wagh Polytechnic, Nashik, India
EMAIL ID- [email protected], [email protected]

Keywords: Data mining, transportation, mining methods and algorithms, mobile environments

1-INTRODUCTION
The advancement of wireless communication techniques and the popularity of mobile devices such as mobile phones, PDAs, and GPS-enabled cellular phones have contributed to a new business model. This section briefly explains the basic terms involved in mining mobile transactions.

1.1 Mobile Communication: Mobile users can request services through their mobile devices via an Information Service and Application Provider (ISAP) from anywhere at any time. This business model, known as Mobile Commerce (MC), provides Location-Based Services (LBS) through mobile phones. The communication coverage of each base station in a mobile network is called a cell and serves as a location area. The average distance between two base stations is hundreds of meters, and the number of base stations in a city is usually more than 10,000. When users move within the mobile network, their locations and service requests are stored in a centralized mobile transaction database.

Fig. 1 shows an MC scenario, where a user moves in the mobile network and requests services in the corresponding cell through a mobile device.

Fig. 1. An example of a mobile transaction sequence. (a) Moving sequences. (b) Service sequences.

Fig. 1a shows a moving sequence of a user, where cells are underlined if services are requested there. Fig. 1b shows the record of service transactions; for example, service S1 was requested when the user moved to location A at time 5. These data carry useful information about the movement and transaction behaviors of mobile users. Mining mobile transaction data can therefore support various applications, such as data prefetching and service recommendation.

ISSN 2277-1956/V1N3-1328-1339

1.2 Location Based Services: A Location-Based Service (LBS) is an information or entertainment service that is accessible with mobile devices through the mobile network and makes use of the geographical position of the mobile device. LBS include services to identify the location of a person or object, such as discovering the nearest banking cash machine or the whereabouts of a friend or employee. LBS also include parcel tracking and vehicle tracking services, and can include mobile commerce in the form of coupons or advertising directed at customers based on their current location. Some examples of location-based services are:
• Requesting the nearest business or service, such as an ATM or restaurant
• Turn-by-turn navigation to any address
• Locating people on a map displayed on the mobile phone
• Receiving alerts, such as notification of a sale on gas or warning of a traffic jam
• Location-based mobile advertising
• Real-time Q&A revolving around restaurants, services, and other venues

1.3 Clustering Mobile Transactions: Clustering mobile transaction data helps in the discovery of social groups, which are used in applications such as targeted advertising, shared data allocation, and personalization of content services. In previous studies, users are typically clustered according to their personal profiles (e.g., age, sex, and occupation). However, in real applications of mobile environments, it is often difficult to obtain users' profiles; only the users' mobile transaction data may be available. To cluster users without user profiles, the similarities of mobile transaction sequences (MTSs) must be evaluated. Although many clustering algorithms have been studied in the literature, they are not applicable in the LBS scenario for the following reasons:
1) Most clustering methods can only process data with spatial similarity measures, while LBS environments require clustering methods with non-spatial similarity measures.
2) Most clustering methods require the user to set some parameters. In real applications, however, it is difficult to determine the right parameters manually for a clustering task. Hence, an automated clustering method is required.
Although many non-spatial similarity measures exist, most of them measure string similarity. The mobile transaction sequences discussed in this report, however, include multiple heterogeneous kinds of information, such as time, location, and services.

1.4 Time Segmentation: The time interval segmentation method helps find the different user behaviors in different time intervals. For example, users may request different services at different times (e.g., day or night) even in the same location. If the time interval factor is not taken into account, behaviors that occur only during specific time intervals may be missed. To find complete mobile behavior patterns, a time interval table is required. Although some studies used a predefined time interval table to mine mobile patterns, the data characteristics and data distribution vary in real mobile applications, so it is difficult for users to predefine a suitable interval table. Automatic time segmentation methods are thus required to segment the time dimension in a mobile transaction database. A Genetic Algorithm (GA) can serve as such an automatic time segmentation method and produces a more suitable time interval table.

1.5 Mining Mobile Transactions: A mobile transaction database is complicated, since a huge amount of mobile transaction logs is produced by users' mobile behaviors. Data mining is a widely used technique for discovering valuable information in a complex data set, and a number of studies have discussed mobile behavior mining. However, mobile behaviors vary among different user clusters and at different time intervals. The prediction of mobile behavior will be more precise if the corresponding mobile patterns can be found for each user cluster and time interval. To provide precise location-based services for users, effective mobile behavior mining systems are urgently required.
1.5.1 CTMSP Mine: A novel data mining algorithm named Cluster-based Temporal Mobile Sequential Pattern Mine (CTMSP-Mine) is proposed to efficiently mine the Cluster-based Temporal Mobile Sequential Patterns (CTMSPs) of users. To mine CTMSPs, a transaction clustering algorithm named Cluster-Object-based Smart Cluster Affinity Search Technique (CO-Smart-CAST) is proposed, which builds a cluster model for mobile transactions based on the proposed Location-Based Service Alignment (LBS-Alignment) similarity measure. Mining and predicting mobile sequential patterns while considering user clusters and temporal relations in LBS environments simultaneously gives more precise predictions. The main contributions of this work are a novel algorithm for mining CTMSPs and two nonparametric techniques for increasing the predictive precision of mobile users' behaviors. In addition, the proposed CTMSPs provide information on both user clusters and temporal relations. Meanwhile, user profiles such as personal information are not needed for the proposed clustering and time segmentation methods.

IJECSE, Volume 1, Number 3, V. C. Belokar and P. S. Kulkarni

LITERATURE SURVEY
This section briefly describes previous work on clustering mobile transactions, temporal pattern mining, mobile pattern mining, and mobile behavior prediction. In recent years, a number of studies have discussed the use of data mining techniques to discover useful rules and patterns from:
• WWW
• Transaction databases
• Mobility data

Sequential pattern mining was first introduced to search for time-ordered patterns, known as sequential patterns, within transaction databases. SMAP-Mine was proposed by Tseng and Lin for efficiently mining users' sequential mobile access patterns; it is based on the FP-Tree and discovers both user movements and service requests. Lee et al. proposed T-MAP, based on SMAP, to efficiently find mobile users' mobile access patterns in distinct time intervals that are predefined by users. Yun and Chen proposed the Mobile Sequential Pattern (MSP), which takes moving paths into consideration and adds the moving path between the left-hand side and the right-hand side of a rule to its content. However, no existing work considers user clusters and temporal relations simultaneously in mobile pattern mining.

Clustering analysis can be roughly divided into two categories:
1. The first category is on similarity measures, which may directly affect the final clustering results.
2. The second category is on clustering methods.

For density-based clustering methods, Ben-Dor and Yakhini proposed the Cluster Affinity Search Technique (CAST), which requires an affinity threshold t, where 0 < t < 1. The algorithm guarantees that the average similarity in each generated cluster is higher than the threshold t. Tseng and Kao proposed the Smart Cluster Affinity Search Technique (Smart-CAST).
The main ideas of Smart-CAST are as follows. First, the method uses CAST as the basic clustering method. Second, it uses a quality validation method, Hubert's Γ (gamma) statistic, to find the best clustering result.

The genetic algorithm was proposed by Holland. It requires a fitness function to evaluate the quality of a chromosome and starts from a randomly generated population. Through the evolution processes of 1) selection, 2) crossover, and 3) mutation, the chromosomes of the population repeatedly create new generations, and the weakest chromosomes are discarded.

Mobile behavior prediction can be roughly divided into two categories:
1. Time series-based prediction, which can be divided into two types: 1) linear models and 2) nonlinear models. The nonlinear models capture the object's movements with more sophisticated regression functions, so their prediction accuracy is higher than that of the linear models. The Recursive Motion Function (RMF) is the most accurate prediction method in the literature based on regression functions.
2. Pattern-based prediction. Ishikawa et al. derived a Markov Model (MM) that generates Markov transition probabilities from one cell to another for predicting the next cell of the object. The HPM method can only predict the next spatial locations of objects. SMAP-Mine was first proposed to discover sequential mobile access rules and predict the user's next locations and services. Monreale et al. proposed a prediction model, namely WhereNext, that utilizes trajectory patterns to predict the next locations of moving objects.

METHODOLOGY
Four main issues are addressed in mining mobile patterns:
1. Clustering of mobile transaction sequences
2. Time segmentation of mobile transaction sequences
3. Discovery of CTMSPs
4. Mobile behavior prediction for mobile users using the combined approach


The system framework for mining mobile patterns is:

Fig. 2. System framework.

Fig. 2 shows the proposed system framework. The system has an “offline” mechanism for CTMSP mining and an “online” engine for mobile behavior prediction. When mobile users move within the mobile network, information that includes time, locations, and service requests is stored in the mobile transaction database. Table 1 shows an example of a mobile transaction database that contains seven records.

The offline data mining mechanism consists of two design techniques and the CTMSP-Mine algorithm. First, the CO-Smart-CAST algorithm is proposed to cluster the mobile transaction sequences; within this algorithm, LBS-Alignment is proposed to evaluate the similarity of mobile transaction sequences. Second, a GA-based time segmentation algorithm is proposed to find the most suitable time intervals. After clustering and segmentation, a user cluster table and a time interval table are generated, respectively. Third, the CTMSP-Mine algorithm is proposed to mine the CTMSPs from the mobile transaction database according to the user cluster table and the time interval table.

In the online prediction engine, a behavior prediction strategy is proposed to predict subsequent behaviors according to the mobile user's previous mobile transaction sequences and the current time. The main purpose of this framework is to provide mobile users with a precise and efficient mobile behavior prediction system.

TABLE 1 An Example of Mobile Transaction Database


3.1 Clustering mobile transaction database: In a mobile transaction database, users in different user groups may have different mobile transaction behaviors. The first task is therefore to cluster the mobile transaction sequences. A parameter-less clustering algorithm, CO-Smart-CAST, is proposed. Before CO-Smart-CAST is performed, a similarity matrix S is generated from the mobile transaction database. The entry Si,j in matrix S represents the similarity of mobile transaction sequences i and j in the database, with a value in the range [0, 1]. A mobile transaction sequence can be viewed as a sequence string, where each element of the string is a mobile transaction. The major challenge is to measure the content similarity between mobile transactions. For this purpose, LBS-Alignment is proposed, which obtains the similarity based on the concept of DNA alignment.

3.1.1 Location Based Service Alignment: LBS-Alignment is based on the consideration that two mobile transaction sequences are more similar when the orders and timestamps of their mobile transactions are more similar. Based on this concept, LBS-Alignment uses a time penalty (TP) and a service reward (SR). The base similarity score is set to 0.5. Two mobile transactions can be aligned if their locations are the same. Otherwise, a location penalty is applied to decrease the similarity score. The location penalty is defined as 0.5/(|s1|+|s2|), where |s1| and |s2| are the lengths of sequences s1 and s2, respectively. When two sequences are totally different, their similarity score is 0.
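The last statement can be checked with a short calculation. It assumes, which the text does not state explicitly, that every transaction left unaligned costs exactly one location penalty p, so that two sequences with no common location incur |s1| + |s2| penalties:

```latex
p = \frac{0.5}{|s_1| + |s_2|}, \qquad
\text{score} = 0.5 - \bigl(|s_1| + |s_2|\bigr)\,p = 0.5 - 0.5 = 0 .
```

For two sequences of length 5 each, this gives p = 0.5/10 = 0.05.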


Fig. 3. The LBS-Alignment algorithm.

When two mobile transactions are aligned, their time penalty and service reward are measured. TP focuses on their time distance: the farther apart the two transactions are in time, the larger the time penalty. TP, which decreases the similarity score, is defined as |s1.time - s2.time|/len, where len is the time length. SR focuses on the similarity of the service requests: the more similar the service requests, the larger the service reward. SR, which increases the similarity score, is defined as |s1.services ∩ s2.services|/|s1.services ∪ s2.services|.

Fig. 3 shows the procedure of the LBS-Alignment measure. The input data are two mobile transaction sequences (line 1). The output is the similarity between the two mobile transaction sequences, with a value in the range from 0 to 1 (line 2). Some parameters are initialized (line 4 to line 7), and the base similarity score is set to 0.5 (line 5). Dynamic programming is used to calculate Mi,j (line 8 to line 18), where Mi,j is the value of matrix M in column i and row j, and M is the score matrix of LBS-Alignment. In this procedure, if the locations of two transactions are the same (line 10), both the time penalty (line 11) and the service reward (line 12) are calculated to measure the similarity score (line 13). Otherwise (line 14), the location penalty is applied to decrease the similarity score (line 15). Finally, the entry of M at position (s.length, s’.length) is returned as the similarity score of the two mobile transaction sequences (line 19).

After the similarity matrix is obtained, the mobile transaction sequences are clustered by the proposed CO-Smart-CAST. Fig. 5 shows the procedure of CO-Smart-CAST. The input is an N-by-N similarity matrix S (line 1). The output is the clustering result (line 2). CO-Smart-CAST can automatically cluster the data according to the similarity matrix without any user-input parameter.
The main ideas of CO-Smart-CAST are as follows. First, the CAST method, which takes a parameter named the affinity threshold t, is used as the basic clustering method. Second, a quality validation method, Hubert's Γ statistic, is used to find the best clustering result. Third, a hierarchical concept is used to reduce sparse clusters. Hubert's Γ statistic measures the quality of a clustering result by taking the similarity matrix and the clustering result as input. For each clustering result, Γobj and Γclu represent the clustering qualities measured against the original object similarity matrix S and the last cluster similarity matrix S’, respectively.

Example (LBS-Alignment):
• Let s and s’ be two mobile transaction sequences.
• s = {(1,A,S1), (4,B,∅), (6,C,S2), (8,E,∅), (17,G,S4)}
• s’ = {(3,A,∅), (5,D,S1), (8,C,∅), (19,E,∅), (20,G,{S4,S5})}
• Time length = 20 and location penalty = 0.05
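The alignment described above can be sketched as a small dynamic program. This is an illustrative sketch, not the authors' exact procedure from Fig. 3. Several details are assumptions made here: TP is scaled by the location penalty p, SR is scaled by 2p (so that identical sequences score 1), the service reward of two empty service sets is taken as 0, and a matching pair may still be skipped if a gap scores higher. The names `lbs_alignment` and `jaccard` are hypothetical. With these assumptions the example sequences give 0.405, which agrees with the value reported for this example.

```python
from typing import FrozenSet, List, Tuple

# One mobile transaction: (time, cell, set of requested services).
Transaction = Tuple[int, str, FrozenSet[str]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """|a ∩ b| / |a ∪ b|; taken as 0 when both sets are empty (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lbs_alignment(s: List[Transaction], t: List[Transaction],
                  time_length: float) -> float:
    """Similarity of two mobile transaction sequences under the stated assumptions."""
    if not s and not t:
        raise ValueError("at least one sequence must be non-empty")
    p = 0.5 / (len(s) + len(t))  # location penalty from the text
    m = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    m[0][0] = 0.5  # base similarity score
    for i in range(1, len(s) + 1):
        m[i][0] = m[i - 1][0] - p
    for j in range(1, len(t) + 1):
        m[0][j] = m[0][j - 1] - p
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            # Leaving one transaction unaligned costs one location penalty.
            best = max(m[i - 1][j], m[i][j - 1]) - p
            (time_i, cell_i, serv_i), (time_j, cell_j, serv_j) = s[i - 1], t[j - 1]
            if cell_i == cell_j:
                tp = p * abs(time_i - time_j) / time_length  # assumption: scaled by p
                sr = 2 * p * jaccard(serv_i, serv_j)         # assumption: scaled by 2p
                best = max(best, m[i - 1][j - 1] - tp + sr)
            m[i][j] = best
    return m[len(s)][len(t)]


fs = frozenset
s = [(1, "A", fs({"S1"})), (4, "B", fs()), (6, "C", fs({"S2"})),
     (8, "E", fs()), (17, "G", fs({"S4"}))]
s_prime = [(3, "A", fs()), (5, "D", fs({"S1"})), (8, "C", fs()),
           (19, "E", fs()), (20, "G", fs({"S4", "S5"}))]
print(round(lbs_alignment(s, s_prime, 20), 3))  # 0.405
```

Under these assumptions, the optimal alignment matches A, C, E, and G and leaves B and D unaligned, so the score is 0.5 - 2(0.05) - 0.045 + 0.05 = 0.405.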

Fig. 4. (a) The similarity matrix. (b) The LBS-Alignment result.

Using the above algorithm, the similarity between s and s’ is 0.405.

3.1.2 CO-Smart-CAST Algorithm:
• CO-Smart-CAST stands for Cluster-Object-based Smart Cluster Affinity Search Technique.
• CO-Smart-CAST clusters the data without any user-input parameter.
• Steps of the clustering method:
  1. Use CAST, which takes an affinity threshold t, as the basic clustering method.
  2. Use a quality validation method, Hubert's Γ statistic, to find the best clustering result.
  3. Use a hierarchical concept to reduce sparse clusters.

Input to the CO-Smart-CAST algorithm:

TABLE 2: The Similarity Matrix

Algorithm:

Figure 5. The CO-Smart-CAST algorithm.

The initial values of S’ and S are the same, since every object is initially an independent cluster (line 4). ΓCO is defined as the F1 score, i.e., the harmonic mean, of Γobj and Γclu. A higher value of ΓCO represents better clustering quality. The easiest way to determine the most suitable t is to vary t with a fixed increment and run CAST repeatedly to find the clustering result with the highest ΓCO. The main drawback of this approach is that many iterations of computation are required. For this reason, unnecessary executions are eliminated to reduce the amount of computation while still obtaining a “near-optimal” clustering result. The main idea is to narrow down the range of t effectively. The testing range R for t initially runs from 0 to 1 (line 5). R is divided equally by five points P0, P1, P2, P3, and P4, where P0 < P1 < P2 < P3 < P4. The value of each Pi (line 8) is taken in turn as the affinity threshold for the CAST algorithm (line 9), and the ΓCO of the clustering result for each Pi is obtained (line 10 to line 12). When one run of clustering is completed (line 7 to line 13), the clustering at the point Pb that produces the highest ΓCO is considered to be the best clustering (line 14). The testing range R is then limited to the new range [Pb-1, Pb+1], which contains the point Pb (line 15). This process is repeated until the testing range R is smaller than the threshold ε (line 16), where ε is a very small value, i.e., less than 10^-5.
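The range-narrowing search for t described above can be sketched as follows. This is a minimal illustration rather than the algorithm of Figure 5: the CAST run and the Hubert's Γ computation are abstracted into a caller-supplied `quality` function (a hypothetical stand-in that maps a threshold t to ΓCO), the indices b-1 and b+1 are clamped at the ends of the range (an assumption), and `harmonic_mean` assumes non-negative inputs.

```python
from typing import Callable, Tuple


def harmonic_mean(gamma_obj: float, gamma_clu: float) -> float:
    """Combine the two quality scores into one F1-style value (non-negative inputs assumed)."""
    total = gamma_obj + gamma_clu
    return 2 * gamma_obj * gamma_clu / total if total > 0 else 0.0


def narrow_threshold(quality: Callable[[float], float],
                     lo: float = 0.0, hi: float = 1.0,
                     eps: float = 1e-5) -> Tuple[float, float]:
    """Return (best t, best quality) found by repeatedly shrinking the range [lo, hi]."""
    best_t, best_q = lo, float("-inf")
    while hi - lo >= eps:
        step = (hi - lo) / 4
        points = [lo + k * step for k in range(5)]  # P0 .. P4, equally spaced
        scores = [quality(p) for p in points]       # one clustering run per point
        b = max(range(5), key=scores.__getitem__)   # index of the best point Pb
        if scores[b] > best_q:
            best_t, best_q = points[b], scores[b]
        # New range [P(b-1), P(b+1)], clamped at the ends (assumption).
        lo, hi = points[max(b - 1, 0)], points[min(b + 1, 4)]
    return best_t, best_q


# Stand-in quality curve that peaks at t = 0.3; a real run would cluster with
# CAST at threshold t and return the combined Gamma statistic instead.
t, q = narrow_threshold(lambda x: -(x - 0.3) ** 2)
print(round(t, 3))
```

Each pass at least halves the range, so starting from [0, 1] with ε = 10^-5 the loop needs about 17 passes of five evaluations each, compared with roughly 100,000 evaluations for a fixed increment of the same resolution.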
If the ΓCO statistic produced by point PBest is higher than the best ΓCO statistic so far (line 17), the best clustering result is recorded (line 18 and line 19), and all entries in the similarity matrix S’ are set to the average similarities between all pairs of the corresponding clusters. The whole process is repeated until no better ΓCO statistic is generated (line 07 to line 26). Finally, the clustering result with the highest quality found during the tests is returned.

Example: The best affinity threshold t is found using the highest ΓCO. Output after the first clustering:

Figure 6. Clustering after the first iteration.

Similarity matrix for newly formed clusters:

Figure 7. Table formed after clustering.

The final clustering result:

Fig. 8. Finally formed clusters.

3.2 Segmentation of Mobile Transaction: In a mobile transaction database, similar mobile behaviors exist within certain time segments. Hence, it is important to choose suitable settings for time segmentation so as to distinguish the characteristics of mobile behaviors in different time segments. A GA-based method is proposed to automatically obtain the most suitable time segmentation table with common mobile behaviors. Fig. 9 shows the procedure of the proposed time segmentation method, named the Get Number of Time Segmenting Points (GetNTSP) algorithm. The input data are a mobile transaction database D and its time length T (line 01). The output is the number of time segmenting points (line 02). For each item, the total number of occurrences at each time point is accumulated (line 07 to line 11), and the time point with the largest change rate is obtained (line 13). The change rate is defined as (C[i+1] - C[i])/(1 + C[i]), where C[i] is the total number of occurrences of the item at time point i. The occurrences of all these time points are then counted (line 15). The time points whose counts are larger than or equal to the average count are selected and taken as the time point sequence (TPS) (line 17). In the time point sequence, the average time distance α between two neighboring time points is calculated (line 18).
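The counting steps above, together with the final step of GetNTSP (counting the neighboring time points that lie farther apart than the average distance), can be sketched in Python. This is a simplified illustration, not the algorithm of Fig. 9: it starts from the list of per-item time points with the largest change rate rather than from a transaction database, and the function names are hypothetical. The input list is the one used in the worked example of this section.

```python
from collections import Counter
from typing import List


def change_rate(counts: List[int], i: int) -> float:
    """Change rate between time points i and i+1: (C[i+1] - C[i]) / (1 + C[i])."""
    return (counts[i + 1] - counts[i]) / (1 + counts[i])


def count_segmenting_points(peak_points: List[int]) -> int:
    """peak_points holds, for each item, its time point with the largest change rate."""
    if not peak_points:
        return 0
    occurrences = Counter(peak_points)
    average = len(peak_points) / len(occurrences)  # average count per distinct point
    # Time point sequence (TPS): points occurring at least as often as the average.
    tps = sorted(t for t, n in occurrences.items() if n >= average)
    gaps = [b - a for a, b in zip(tps, tps[1:])]
    if not gaps:
        return 0
    alpha = sum(gaps) / len(gaps)  # average distance between neighboring points
    return sum(1 for g in gaps if g > alpha)


points = [5, 10, 13, 30, 5, 28, 10, 7, 20, 30, 25, 28]
print(count_segmenting_points(points))  # 1
```

For this input the TPS is {5, 10, 28, 30}, the gaps are 5, 18, and 2, and only the gap of 18 exceeds the average of 25/3, which gives one segmenting point.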

Algorithm:

Figure 9. The GetNTSP algorithm.

The number of neighboring time point pairs whose time distance is greater than α is then counted (line 19 to line 23). The result is the number of time segmenting points (line 24). After the number of time segmenting points is obtained, a genetic algorithm is used to discover the most suitable time intervals.

Example:
• The time points with the largest change rates are 5, 10, 13, 30, 5, 28, 10, 7, 20, 30, 25, and 28.
• These time points can be sorted as 5(2), 7(1), 10(2), 13(1), 20(1), 25(1), 28(2), and 30(2), where t(n) indicates that time point t occurs n times.
• The average number of occurrences is 12/8 = 1.5, so the Time Point Sequence (TPS) is {5, 10, 28, 30}.
• The average time distance is α = 25/3 ≈ 8.33.
• Only the interval between 10 and 28 is larger than 8.33, so the number of time segmenting points is 1.

The genetic algorithm uses the fitness function:

$$
\mathrm{Fitness}(X) = \sum_{i=1}^{\mathrm{Len}(X)+1} \left( \frac{1}{N_c \times N_s} \times \sum_{c=1}^{N_c} \sum_{s=1}^{N_s} \left( T_i[c,s] - \overline{T_i} \right)^2 \right)
$$

In this fitness function, N_c is the total number of cells, N_s is the total number of services, T_i[c, s] is the request count of cell c and service s in the i-th time interval, and T̄_i is the average service request count in that interval.

3.3 Discovery of CTMSPs: The entire procedure of the CTMSP-Mine algorithm can be divided into three main steps: 1) Frequent-Transaction Mining, 2) Mobile Transaction Database Transformation, and 3) CTMSP Mining.

3.3.1 Frequent Transaction Mining: Frequent transactions (F-Transactions) are mined using a modified Apriori algorithm. First, the support of each cell and service in each user cluster and time interval is counted, and the frequent 1-transactions that satisfy the minimal support threshold TSUP are kept. A candidate 2-transaction is generated by joining two frequent 1-transactions if their user clusters, time intervals, and cells are the same. The same procedure is repeated until no candidate transaction is generated. A service mapping table is then constructed to transform services into F-Transactions and reduce processing time.

TABLE 3 Frequent Transactions

The frequent transactions are shown in Table 3. A service mapping table is constructed to transform the services into the F-Transactions of Table 3. Each service set is represented by a contiguous and unique symbol LSi (Large Service i). This mapping reduces the time required to check whether a mobile sequential pattern is contained in a mobile transaction sequence.

3.3.2 Mobile Transaction Database Transformation: The main objectives and advantages are: 1) service sets can be represented by symbols for efficient processing, and 2) transactions whose support is less than the minimal support threshold can be eliminated to reduce the size of the database.

3.3.3 CTMSP Mining: In this phase, the frequent 1-CTMSPs are those obtained in frequent-transaction mining. A two-level tree named the CTMSP-Tree is used. The internal nodes store the frequent mobile transactions, and the leaf nodes store the corresponding paths. Every parent node of a leaf node is designed as a hash table that stores the combinations of user clusters and time intervals. The resulting CTMSP-Tree is:

Fig. 10. CTMSP-Tree. (a) The part of frequent 2-CTMSPs. (b) 3-CTMSPs. (c) 4-CTMSPs.

3.4 Prediction Strategies: Three prediction strategies are used to select patterns:
• Patterns are selected only from the cluster to which the user belongs.
• Patterns are selected only from the time interval corresponding to the current time.
• Patterns are selected only from those that match the user's recent mobile behavior.
If more than one pattern satisfies the above conditions, the one with the maximal support is selected.

SUMMARY AND FUTURE SCOPE
A novel method, CTMSP-Mine, is proposed for discovering CTMSPs in LBS environments, together with prediction strategies that use the CTMSPs, combining user clusters and time intervals, to predict users' subsequent mobile behaviors. This method has not yet been applied to real data.

Future Scope: Since the above techniques have not yet been implemented on real data, the next step is to implement all of the above algorithms on real data and obtain results. The efficiency of the mining can then be tested by applying different strategies.

BIBLIOGRAPHY
[1] Eric Hsueh-Chan Lu, Vincent S. Tseng, and Philip S. Yu, “Mining Cluster-Based Temporal Mobile Sequential Patterns in Location-Based Service Environment,” IEEE Trans. Knowledge and Data Engineering, vol. 23, no. 6, June 2011.
[2] J. Han and M. Kamber, Data Mining: Concepts and Techniques, second ed., Morgan Kaufmann, Sept. 2000.
[3] A. Ben-Dor and Z. Yakhini, “Clustering Gene Expression Patterns,” J. Computational Biology, vol. 6, no. 3, pp. 281-297, July 1999.
[4] V.S. Tseng and C. Kao, “Efficiently Mining Gene Expression Data via a Novel Parameterless Clustering Method,” IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 2, no. 4, pp. 355-365, Oct.-Dec. 2005.
[5] C.H. Yun and M.S. Chen, “Mining Mobile Sequential Patterns in a Mobile Commerce Environment,” IEEE Trans. Systems, Man, and Cybernetics, Part C, vol. 37, no. 2, pp. 278-295, Mar. 2007.
[6] V.S. Tseng, H.C. Lu, and C.H. Huang, “Mining Temporal Mobile Sequential Patterns in Location-Based Service Environments,” Proc. 13th IEEE Int’l Conf. Parallel and Distributed Systems, pp. 1-8, Dec. 2007.
[7] V.S. Tseng and W.C. Lin, “Mining Sequential Mobile Access Patterns Efficiently in Mobile Web Systems,” Proc. 19th Int’l Conf. Advanced Information Networking and Applications, pp. 867-871, Mar. 2005.
[8] A. Dubey and S.K. Shandilya, “Exploiting Need of Data Mining Services in Mobile Computing Environments,” Int’l Conf. Computational Intelligence and Networks, 2010.
