Location-aware Data Mining for Mobile Users Based on Neuro-Fuzzy System*

Romeo Mark A. Mateo¹, Marley Lee², Su-Chong Joo³, and Jaewan Lee¹

¹ School of Electronic and Information Engineering, Kunsan National University, 68 Miryong-dong, Kunsan, Chonbuk 573-701, South Korea, {rmmateo, jwlee}@kunsan.ac.kr
² School of Electronic and Information Engineering, Chonbuk National University, 664-14, DeokJin-dong, Jeonju, Chonbuk 561-756, South Korea
³ School of Electrical, Electronic and Information Engineering, Wonkwang University, South Korea
Abstract. Data mining tools generally deal with highly structured and precise data. However, classical methods fail to handle imprecise or uncertain information. This paper proposes a neuro-fuzzy data mining approach that provides a means to deal with the uncertainty of data. It presents a location-based service collaboration framework and uses a neuro-fuzzy algorithm for data mining. It also introduces the user-profile frequency count (UFC) function to determine the relevance of information to mobile users. Using the neuro-fuzzy system yields comprehensible and highly accurate rules.
1 Introduction

Ubiquitous and mobile technologies that provide location-awareness and information through location-based services (LBS) have seen a dramatic increase in the world market [1]. Such technologies include radio frequency identification (RFID), smart personal devices, and global positioning systems. Researchers investigate methods of acquiring information using these distributed technologies [2], and identifying patterns and rules of locations by data mining is a challenging area for them. Classical methods provide meaningful information and are used for prediction [3]. The rules extracted by classification mining predict the next event from the history of transactions [4]. The knowledge represented by rules is useful for analyzing patterns in data, especially for allocating resources in LBS [3]. However, classical methods avoid imprecise or uncertain information because it is not considered useful for processing the data, and the goal of obtaining understandable results is often neglected. In location-based services, imprecision arises from the errors and inaccuracies of measuring devices.

* This research was supported by grant R01-2006-000-10147-0 from the Basic Research Program of the Korea Science and Engineering Foundation.
Fuzzy systems are used to handle uncertainty in data that classical methods cannot handle. They use fuzzy sets, a suitable mathematical tool for modeling imprecision and vagueness [5]. Pattern classification with fuzzy classifiers provides a means to extract fuzzy rules for information mining, which yields a comprehensible method for extracting knowledge from various information sources [6]. Fuzzy algorithms are also popular tools for information retrieval [7]. The fuzzy c-means (FCM) algorithm uses an iterative procedure that starts with an initial random allocation of the objects to be classified to c clusters. FCM is a popular tool for knowledge extraction in distributed environments, as in Ko et al. [8]. The output of FCM can be substantially improved by preprocessing with filtering [9]: filtering removes unnecessary data, which increases the processing speed of FCM and improves the quality of the extracted rules. In neuro-fuzzy systems, a fuzzy system represents knowledge in an interpretable manner, and the algorithm borrows the learning ability of neural networks to determine the membership values. Neuro-fuzzy systems are among the most popular data mining techniques in recent research [10, 11]. In this paper, we propose a location-aware data mining approach for mobile users based on a neuro-fuzzy system. We present a framework that enhances location-based services (LBS) by filtering with the user-profile frequency count (UFC) function to select the most relevant object services. To demonstrate the framework, we perform data mining with the neuro-fuzzy algorithm on the location information obtained from the LBS, and we compare the proposed system with classical methods.
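To make the FCM iteration concrete, the following is a minimal pure-Python sketch of the standard fuzzy c-means procedure: random initial memberships, then alternating centre and membership updates with fuzzifier m. The function name, the default m = 2, and the stopping rule are our assumptions for illustration; this is not the implementation used in [8] or in Weka.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means on `points` (a list of equal-length tuples).

    Returns (centers, u) where u[k][i] is the membership of point k in
    cluster i; every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Initial random allocation: random memberships, normalised per point.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            wsum = sum(w)
            centers.append(tuple(
                sum(w[k] * points[k][d] for k in range(n)) / wsum
                for d in range(dim)))

        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, v) for v in centers]
            if min(dists) == 0.0:
                # Point sits exactly on a centre: share membership among such centres.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([
                    1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                    for dk in dists])

        # Stop when no membership changed by more than `tol`.
        shift = max(abs(a - b) for old, new in zip(u, new_u)
                    for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

For example, `fuzzy_c_means([(1.0,), (1.2,), (0.8,), (8.0,), (8.3,), (7.9,)], c=2)` should give the three points near 1 one dominant cluster and the three points near 8 the other, while every point keeps a small, nonzero membership in the far cluster; that partial membership is what distinguishes FCM from crisp clustering.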
2 Related Works

2.1 Data Mining using Mobile Devices

In location-based services, data mining is used to reveal patterns of services and to predict locations. A sequential mining approach for location prediction is used to allocate resources in a PCS network [3]. This technique can effectively allocate resources to the cells the user is most likely to move to, instead of blindly allocating excessive resources in the neighborhood of a mobile user's cell. Lee et al. [4] proposed a location-aware agent that uses data mining: a mobile agent is sent to the LBS, where it performs classification mining on the database. A guide system [12] combines positioning techniques and location-aware services to provide users with information about their surroundings. The guide system not only accepts the user's search query to find a target but also receives information from other users who took notes during the tour.

2.2 Neuro-Fuzzy Systems

Fuzzy classification is based on the concept of fuzzy sets, which were conceived by Lotfi Zadeh [14]. Fuzzy sets process data by allowing partial set membership rather than crisp membership or non-membership. Typical fuzzy data analysis discovers rules in large data sets, and these rules can be used to describe the dependencies within the data and to classify new data [6]. Neuro-fuzzy systems are fuzzy classifiers that use neural networks for learning by inducing the rule structure and adapting the connection weights [10, 11]. There are many types of neuro-fuzzy rule generation algorithms [15]. FuNE-I is a neuro-fuzzy model based on a five-layer feed-forward neural network that uses only rules with one or two variables in the antecedent [16]. A TSK-type (Sugeno-type) fuzzy neural network has been used to control an n-link robot manipulator for high-precision position tracking [17]. Banerjee et al. [18] encode crude domain knowledge in a fuzzy multilayer perceptron using rough set-theoretic concepts. NEFCLASS (neuro-fuzzy classification) is a fuzzy classifier that creates fuzzy rules from data in a single run through the data set [11]. Figure 1 shows a NEFCLASS system with two input nodes, five hidden nodes that represent the linguistic rules, and two output nodes.
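The three-layer structure of Figure 1 (inputs, rule units, class units) can be illustrated with a small forward pass. This sketch follows a common formulation of NEFCLASS inference, in which a rule unit takes the minimum of its antecedent memberships and a class unit takes the maximum activation of the rules that point to it; the triangular linguistic terms, the five rules, and the class names `c1`/`c2` are hypothetical, chosen only to match the shape of Figure 1.

```python
def triangular(a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


# Hypothetical linguistic terms shared by both inputs (partition of [0, 1]).
TERMS = {
    "small": triangular(-0.5, 0.0, 0.5),
    "medium": triangular(0.0, 0.5, 1.0),
    "large": triangular(0.5, 1.0, 1.5),
}

# Hypothetical rule layer: five rules over two inputs, two output classes.
RULES = [
    (("small", "small"), "c1"),
    (("small", "large"), "c2"),
    (("medium", "medium"), "c1"),
    (("large", "small"), "c2"),
    (("large", "large"), "c1"),
]


def forward(x):
    """Propagate an input pattern x = (x1, x2) through the three layers.

    Rule units take the minimum of their antecedent memberships; each
    class unit takes the maximum activation of the rules pointing to it.
    """
    scores = {}
    for antecedent, cls in RULES:
        activation = min(TERMS[term](xi) for term, xi in zip(antecedent, x))
        scores[cls] = max(scores.get(cls, 0.0), activation)
    return max(scores, key=scores.get), scores
```

For the input (0.1, 0.9), the rule "small, large → c2" fires with degree 0.8 while the best `c1` rule fires only with degree 0.2, so the pattern is assigned to `c2`. Because each input belongs to several terms to some degree, more than one rule fires at once, which is the partial membership described above.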
Fig 1. A NEFCLASS system with two inputs, five rules and two output classes
3 Collaborative Framework for Location-based Services
The proposed location-based service (LBS) framework consists of four interacting components, namely the location-aware agent, the location information service, the mapping service, and the object group services. These components collaborate to produce the necessary location information and use a neuro-fuzzy system to classify the data and extract rules. Figure 2 illustrates the registration of object services and the processing of a mobile request in the LBS framework. The LBS is implemented in the Common Object Request Broker Architecture (CORBA) to support communication among distributed objects and to provide an efficient service.
[Figure 2 diagram labels: Location-aware Agent, Mobile request, Location Information Service, Maps the request, Mapping Service (GPS), registration, Object Group Service, Object Service (Restaurant, etc.), information from OGS, LOCATION-BASED SERVICES]
Fig 2. Location-based services collaboration framework
The interaction of these components starts when a client requests information, such as the 30 nearest restaurants. The request is sent to the location information service, which invokes the services that match the client's request. The mapping service estimates the location of the mobile user relative to the services. The information extracted from the different services is collected by the location information service. After the information is collected, the location-aware agent processes the data using the neuro-fuzzy algorithm. The outputs are a table of location information and a set of fuzzy rules. The components of the proposed architecture are described in more detail in the next subsection.
3.1 Components of the Location-based Services

Location-aware Agent. The query of the mobile user is first processed by the location information service. Before the request is sent, the user profile, such as the mobile user's interests, is also sent. The request is forwarded to the mapping service to locate the services and to the object services to provide information. Finally, the gathered location information is returned to the location-aware agent, which performs data mining.

Location Information Service (LIS). The location information service collects the information provided by location-based services. For example, when a mobile user searches for the nearest restaurants, the LIS invokes the mapping service to map them. The location information is collected and sent to the location-aware agent, which processes it with the neuro-fuzzy algorithm.

Mapping Service. The mapping service handles mobile users' queries by selecting the nearest services. To be mapped, an object service must first register with the mapping service. The mapping service accepts the membership of the object service, and the object service agrees to collaborate with the system. Whenever the location of an object service changes, the object service notifies the mapping service to update its location.
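The register, update, and nearest-service behaviour of the mapping service can be sketched as an in-memory registry. This is a hypothetical stand-in: the class and method names and the use of great-circle (haversine) distance are our assumptions, and in the paper the services communicate over CORBA rather than through direct Python calls.

```python
import heapq
import math


class MappingService:
    """In-memory stand-in for the mapping service (hypothetical API)."""

    def __init__(self):
        self._locations = {}  # service name -> (lat, lon) in degrees

    def register(self, name, lat, lon):
        # An object service must register before it can be mapped.
        self._locations[name] = (lat, lon)

    def update_location(self, name, lat, lon):
        # A registered object service notifies the mapping service when it moves.
        if name not in self._locations:
            raise KeyError(f"{name} is not registered")
        self._locations[name] = (lat, lon)

    def nearest(self, lat, lon, k):
        """Return the names of the k registered services closest to (lat, lon)."""
        return heapq.nsmallest(
            k, self._locations,
            key=lambda name: haversine_km(lat, lon, *self._locations[name]))


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))
```

A request such as "the 30 nearest restaurants" would then correspond to `nearest(user_lat, user_lon, 30)`. `heapq.nsmallest` avoids sorting the whole registry when k is much smaller than the number of registered services.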
Object Group Services (OGS). The object group service provides a customizable implementation of objects, which poses challenges for developers. This is done by implementing membership schemes, such as binding other objects to a group, defined through IDL so that the object services are accessible [19]. In our proposed system, the object group services are represented by location-based services or simply by web servers. Object services allow information retrieval through the location information service.
4 Neuro-Fuzzy Data Mining Model
Conventional data mining methods are geared towards structured and precise data. Lee et al. [4] proposed a location-aware agent that mines a library database with a J48 classifier to predict the availability of books for borrowing. However, it does not deal with imprecise data, which can be a factor in time estimation, and its results are not easy to comprehend. In our study, we use a neuro-fuzzy system to manage imprecise data. Figure 3 presents the overall procedure of the location-aware data mining approach. First, the mobile user sends a request to the LIS. The LIS then applies the UFC function to each object service, and the collected data produce the location information (LI). Finally, the LI is sent to the location-aware agent, which performs data mining.
[Figure 3 diagram labels: Object Services, UFC function, Location Information Service, LI, Wireless transmission, Location-aware agent, Neuro-fuzzy, Fuzzy Rules]
Fig 3. Neuro-fuzzy data mining using location-aware agent
Two common problems in information retrieval are information overflow and information relevance. Both can be addressed by preprocessing, such as filtering based on the user profile. We use the user-profile frequency count (UFC) function to select the most relevant object services. Let P be the collected user profile and p_i a single profile. A single profile of a mobile user contains a subset of words that describe p_i. In our example, we use American food as a single profile, described by the words burgers, steaks, and ice cream. The collection P is given in Equation 1.
$P = \{p_1, p_2, \ldots, p_n\}$   (1)
P is used by the user-profile frequency count (UFC) function over the object services. Before executing UFC, the LIS requests the mapping service to query the location of the object services that contain the possible information. The object services then allow information retrieval from their web pages by using UFC. In information retrieval, term frequency [8, 20] is used for classifying documents. This study uses UFC to determine information relevance, where the frequency of p_i is counted over all words from every object service (o_j). Equation 2 shows the calculation of UFC.

$UFC(p_i, o_j) = \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}$   (2)
This computation counts the frequency of p_i in each o_j. It is therefore performed after the LIS has already invoked the location information from o_j; UFC is a procedure separate from that invocation. Repetitions of a profile word are not counted at this stage. The UFC value of each object service is then compared with a threshold value θ for filtering: location information whose UFC value is less than the threshold is filtered out. That is, if UFC(p_i, o_j) < θ, the object service is not relevant to the mobile user and can be ignored. Moreover, we collect all UFC(p_i, o_j) values to generate a new attribute of the location information. Equation 3 gives the collected information I_j from the object services, where C iterates through the k attributes.

$I_j = \sum_{k=1}^{m} C_k$   (3)
$LI = I_j + P_j(UFC)$   (4)
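The UFC filtering and merging steps (Equations 2 to 4) can be sketched in a few lines. This is an illustrative sketch under our own assumptions: profile terms are matched case-insensitively as whole words or phrases in each object service's page text; "repetition ... will not be counted" is read as counting each profile term at most once per service (`count_repeats=False`); and the record layout and function names are hypothetical.

```python
import re


def ufc(profile_terms, page_text, count_repeats=False):
    """Count occurrences of profile terms (words or phrases) in a service's text.

    Assumption: with count_repeats=False each profile term contributes at
    most 1, one reading of "repetition ... will not be counted".
    """
    text = page_text.lower()
    total = 0
    for term in profile_terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        hits = len(re.findall(pattern, text))
        total += hits if count_repeats else min(hits, 1)
    return total


def filter_and_merge(records, profile_terms, theta):
    """Drop services whose UFC < theta and append UFC as a new attribute.

    `records` maps a service name to (attribute dict, page text); this
    layout is hypothetical.
    """
    merged = {}
    for name, (attrs, text) in records.items():
        score = ufc(profile_terms, text)
        if score < theta:
            continue  # not relevant to the mobile user, so it is filtered out
        merged[name] = {**attrs, "UFC": score}  # LI = I_j + P_j(UFC), Eq. 4
    return merged
```

With the American-food profile `["burgers", "steaks", "ice cream"]` from the text and θ = 1 (the setting reported in Section 5.1), a page mentioning burgers, steaks, and ice cream keeps its attributes plus `UFC: 3`, while a page about pasta and pizza scores 0 and is dropped before data mining.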
Equation 4 merges the previous location information attributes with the attribute obtained from the UFC value. Here, LI is the merged attribute of I_j, the previous attributes from the location information, and P_j(UFC), the new attribute obtained by the UFC function. We assume that this relevant information from the web pages is an important factor in generating fuzzy rules. The LI is transmitted from the LIS to the user's mobile device, whose location-aware agent performs the data mining. The neuro-fuzzy algorithm is shown below.

1. Fuzzy Classification. Fuzzy rules of the form shown in Equation 5 are used to classify patterns, with a small number of linguistic terms shared by all rules. These linguistic terms are readable and easy to interpret.

$\text{if } a_1 \text{ is } A_{i,1} \text{ and } a_2 \text{ is } A_{i,2} \text{ and } \ldots \text{ and } a_k \text{ is } A_{i,k} \text{ then } B_i$   (5)

2. Learning Fuzzy Classification Rules. The domain attributes are mapped to the units of the input layer, and the output layer of the neural network contains one unit for each possible value of the class attribute. This procedure creates fuzzy rules and adapts the fuzzy sets appearing in the rules during learning.
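The two steps above can be sketched as a simplified, NEFCLASS-style rule learner. Assumptions, stated plainly: attributes are scaled to [0, 1] and share a hypothetical three-term triangular partition (L, M, H); a rule antecedent is built in one pass from each pattern's best-matching term per attribute, and its consequent is the majority class among the patterns that produced it; classification follows Equation 5 with min for rule activation and max per class. The fuzzy-set adaptation step of step 2 is omitted, so this is not the full NEFCLASS algorithm.

```python
from collections import Counter, defaultdict

TERMS = ("L", "M", "H")
# Hypothetical fuzzy partition of [0, 1]: peaks at 0, 0.5 and 1.
PEAKS = {"L": 0.0, "M": 0.5, "H": 1.0}
WIDTH = 0.5


def membership(term, x):
    """Triangular membership of x in `term` (peak PEAKS[term], half-width WIDTH)."""
    return max(0.0, 1.0 - abs(x - PEAKS[term]) / WIDTH)


def learn_rules(patterns, labels):
    """One pass over the data: build each pattern's best-matching antecedent,
    then give every antecedent its most frequent class as consequent."""
    votes = defaultdict(Counter)
    for x, y in zip(patterns, labels):
        antecedent = tuple(max(TERMS, key=lambda t: membership(t, xi)) for xi in x)
        votes[antecedent][y] += 1
    return {ante: counts.most_common(1)[0][0] for ante, counts in votes.items()}


def classify(rules, x):
    """Eq. 5 inference: rule activation = min over antecedent memberships,
    class score = max activation among rules predicting that class."""
    scores = {}
    for ante, cls in rules.items():
        activation = min(membership(t, xi) for t, xi in zip(ante, x))
        scores[cls] = max(scores.get(cls, 0.0), activation)
    return max(scores, key=scores.get)
```

On five two-attribute training patterns such as (0.1, 0.9) → A and (0.9, 0.1) → B, the learner produces one rule per distinct antecedent, for example "if a1 is L and a2 is H then A". This illustrates how a single pass yields a small set of readable rules like those in Table 1.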
5 Experimental Evaluation
The components of the proposed location-based service collaboration framework were designed using Java. To evaluate the data mining algorithms, we used Weka for the classical methods (J48 classifier, FCM, and MLP) and NEFCLASS for the neuro-fuzzy system. The following subsections describe the simulation environment, the performance analysis, and the results.

5.1 Data Mining Environment

The operating system platforms used were Windows, Red Hat Linux, and Sun Solaris 8, to simulate the heterogeneity of location-based services. The data were gathered from the web pages of restaurants in California, and Google's mapping service was used to estimate the location points of the restaurants. These data were processed by the location information service using the UFC function. We used the favorite foods in the user profile to compute UFC and set θ to 1 for filtering the object services. The data mining procedure was executed after the information was sent to the mobile users.

5.2 Performance Analysis

Precision and recall are two typical measures for evaluating the performance of information retrieval systems [20]. Given a discovered cluster γ and the associated reference cluster Γ, precision (P_γΓ) and recall (R_γΓ) are applied to evaluate the performance of clustering algorithms. For classification algorithms, precision and recall are computed by cross-validation of the classified instances. We used these measures of precision and recall to evaluate our experiment. The average precision is calculated by Equation 6, where AvgP is the sum of the precisions (P_i) of the classes divided by the number of classes n.
$AvgP = \frac{1}{n} \sum_{i=1}^{n} P_i$   (6)
The average recall is computed by Equation 7, where AvgR is the sum of the recalls (R_i) of the classes divided by the number of classes n.

$AvgR = \frac{1}{n} \sum_{i=1}^{n} R_i$   (7)
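Equations 6 and 7 are unweighted (macro) averages over classes. The sketch below computes per-class precision and recall from a list of actual and predicted labels and then averages them. The function names are ours, and treating an undefined ratio (no predicted or no actual instances of a class) as 0.0 is our assumption; in the experiment the per-class values came from cross-validation.

```python
def per_class_precision_recall(actual, predicted):
    """Per-class precision (TP / predicted count) and recall (TP / actual count)."""
    classes = sorted(set(actual) | set(predicted))
    precision, recall = {}, {}
    for c in classes:
        tp = sum(1 for a, p in zip(actual, predicted) if a == c and p == c)
        predicted_c = sum(1 for p in predicted if p == c)
        actual_c = sum(1 for a in actual if a == c)
        # Assumption: an undefined ratio is reported as 0.0.
        precision[c] = tp / predicted_c if predicted_c else 0.0
        recall[c] = tp / actual_c if actual_c else 0.0
    return precision, recall


def macro_average(values):
    """Eq. 6 / Eq. 7: sum of per-class scores divided by the number of classes."""
    values = list(values)
    return sum(values) / len(values)
```

As a check against the reported results, `macro_average([0.85, 0.33, 0.72])` rounds to 0.63 and `macro_average([0.60, 0.70, 0.80])` to 0.70, the neuro-fuzzy averages for precision and recall.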
Higher precision and recall indicate a more accurate classification method. Processing time is measured to assess the time cost of each algorithm, and the number of correctly classified instances is used to determine accuracy. The performance of the neuro-fuzzy system and the classical methods is discussed in Section 5.3. The classical methods compared are fuzzy c-means (FCM), the J48 classifier, and the multilayer perceptron (MLP).
5.3 Results of Data Mining

We used the data environment described in Section 5.1 and ran the simulation on computer networks. The location-aware agent processed the location information with the neuro-fuzzy algorithm and generated fuzzy rules. The fuzzy rules, shown in Table 1, consist of 9 patterns classified from 30 restaurant records.

Table 1. Fuzzy rules generated by neuro-fuzzy data mining.
Fuzzy rules
if nSea is N and nPark is F and sCrowd is S and fPref is M then American
if nSea is N and nPark is M and sCrowd is S and fPref is H then Seafood
if nSea is F and nPark is M and sCrowd is S and fPref is H then American
if nSea is N and nPark is M and sCrowd is S and fPref is M then Seafood
if nSea is N and nPark is M and sCrowd is L and fPref is H then American
if nSea is N and nPark is N and sCrowd is S and fPref is M then Italian
if nSea is N and nPark is M and sCrowd is S and fPref is L then Seafood
if nSea is M and nPark is N and sCrowd is S and fPref is M then Italian
if nSea is F and nPark is N and sCrowd is S and fPref is M then Italian
The attributes nSea and nPark give the distance of the restaurant from the sea and from the park, respectively, with the linguistic terms N (near), M (medium near), and F (far). The attribute sCrowd indicates how many people are at the place, with the terms S (small), M (medium), and L (large). The attribute fPref captures the relevant information from the mobile user's favorite foods, with the terms L (less favorite), M (medium favorite), and H (highly favorite). These rules classify restaurants into three types: Italian, American, and Seafood. The first rule can be read as: a restaurant that is near the sea, far from parks, and not crowded, and whose food is a medium favorite of the mobile user, is American. The second rule reads: a restaurant that is near the sea, at a medium distance from parks, and not crowded, and whose food is a high favorite of the mobile user, is a seafood restaurant.
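To make the linguistic reading of Table 1 mechanical, the nine rules can be stored as data and queried. This sketch only transcribes Table 1; the helper names are hypothetical, and exact matching on linguistic labels is a crisp simplification. In the fuzzy system, a restaurant fires several rules to partial degrees rather than matching exactly one.

```python
# The nine rules of Table 1: (nSea, nPark, sCrowd, fPref) -> restaurant class.
TABLE1_RULES = [
    (("N", "F", "S", "M"), "American"),
    (("N", "M", "S", "H"), "Seafood"),
    (("F", "M", "S", "H"), "American"),
    (("N", "M", "S", "M"), "Seafood"),
    (("N", "M", "L", "H"), "American"),
    (("N", "N", "S", "M"), "Italian"),
    (("N", "M", "S", "L"), "Seafood"),
    (("M", "N", "S", "M"), "Italian"),
    (("F", "N", "S", "M"), "Italian"),
]
ATTRIBUTES = ("nSea", "nPark", "sCrowd", "fPref")


def describe(antecedent, cls):
    """Render a rule in the 'if ... then ...' form used in Table 1."""
    conditions = " and ".join(f"{a} is {t}" for a, t in zip(ATTRIBUTES, antecedent))
    return f"if {conditions} then {cls}"


def match(labels):
    """Return the classes of all Table 1 rules whose antecedent equals `labels`,
    a dict mapping each attribute name to a linguistic term."""
    key = tuple(labels[a] for a in ATTRIBUTES)
    return [cls for ante, cls in TABLE1_RULES if ante == key]
```

Read this way, the table exposes structure that is easy to miss in prose. For instance, the three rules with nSea N, nPark M, and sCrowd S all predict Seafood, whatever the value of fPref.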
[Figure 4 bar graphs for NF, FCM, J48, and MLP: (a) Processing Time; (b) Accuracy (Percent %)]
Fig 4. Bar graphs showing the comparison of processing time and accuracy of neuro-fuzzy and classical methods.
Figure 4 compares the performance of the neuro-fuzzy system with that of the classical methods. In Figure 4a, the neuro-fuzzy system is much faster than MLP, while FCM and the J48 classifier have almost the same processing time. In terms of accuracy, the neuro-fuzzy system performs better than the classical methods: even though it is slower than FCM and J48, it classifies patterns more accurately, as shown in Figure 4b. Table 2 shows the precision results and Table 3 the recall results.

Table 2. Precision of each class
Classes        American   Seafood   Italian   Average
Neuro-fuzzy    0.85       0.33      0.72      0.63
FCM            0.29       0.36      0.50      0.38
J48            0.71       0.46      0.50      0.56
MLP            0.44       0.43      0.57      0.48

Table 3. Recall of each class

Classes        American   Seafood   Italian   Average
Neuro-fuzzy    0.60       0.70      0.80      0.70
FCM            0.25       0.56      0.25      0.35
J48            0.50       0.50      0.60      0.53
MLP            0.40       0.60      0.40      0.47
The precision (Equation 6) and recall (Equation 7) results are presented in Tables 2 and 3, respectively. The neuro-fuzzy system has the highest average precision (0.63) and average recall (0.70), compared with J48 (0.56, 0.53), MLP (0.48, 0.47), and FCM (0.38, 0.35). The classical methods misclassified between 14 and 16 test patterns, while the neuro-fuzzy system misclassified only 9. In addition, the neuro-fuzzy system yields comprehensible rules from the data, so its results are simpler to interpret than those of the classical methods.
6 Conclusion

In this paper, we proposed a location-aware data mining approach for mobile users based on a neuro-fuzzy system. We presented a collaborative framework for location-based services and enhanced it with filtering based on the user-profile frequency count (UFC) function to select the most relevant object services. A neuro-fuzzy algorithm is used for data mining of the location information. The proposed system was compared with classical methods and provides comprehensible rules from the location information. The neuro-fuzzy system obtained more correctly classified instances, which indicates that its classification accuracy is higher than that of the classical methods. We presented our experiment on restaurant services; in future work, we will apply this approach in an ongoing project on intelligent home healthcare services.
References

1. Schiller, J., and Voisard, A.: Location-Based Services. Morgan Kaufmann Publishers, Elsevier Inc. (2004) pp. 10-15
2. Patterson, C., Muntz, R., and Pancake, C.: Challenges in Location-Aware Computing. IEEE Pervasive Computing, Vol. 2, No. 2 (April-June 2003) pp. 80-89
3. Yavas, G., Katsaros, D., Ulusoy, O., and Manolopoulos, Y.: A Data Mining Approach for Location Prediction in Mobile Environments. Data & Knowledge Engineering, Vol. 54 (2005) pp. 121-146
4. Lee, J. W., Mateo, R. M., Gerardo, B. D., and Go, S.: Location-aware Agent using Data Mining for the Distributed Location-based Services. LNCS Vol. 3894, Glasgow, Scotland, U.K. (May 2006) pp. 867-876
5. Dumitrescu, D., Lazzerini, B., and Jain, L. C.: Fuzzy Sets and Their Application to Clustering and Training. The CRC Press International Series on Computer Intelligence, CRC Press LLC (2000)
6. Kruse, R., Borgelt, C., and Nauck, D.: Fuzzy Data Analysis: Challenges and Perspectives. In Proceedings of the 8th IEEE International Conference on Fuzzy Systems, IEEE Press, Piscataway, NJ, USA (1999)
7. Mendes Rodrigues, M. E. S., and Sacks, L.: A Scalable Hierarchical Fuzzy Clustering Algorithm for Text Mining. In Proceedings of the 5th International Conference on Recent Advances in Soft Computing, Nottingham, U.K. (December 2004)
8. Ko, J., Gerardo, B. D., Lee, J. W., and Hwang, J.: The Information Search System Using Neural Network and Fuzzy Clustering Based on Mobile Agents. LNCS Vol. 3481, Singapore (May 2005) pp. 205-214
9. Yi, S., Gerardo, B. D., Lee, Y. S., and Lee, J. W.: Intelligent Information Search Mechanism using Filtering and NFC based on Multi-agents in the Distributed Environment. LNCS Vol. 3982, Glasgow, Scotland, U.K. (May 2006) pp. 867-876
10. Klose, A., Nürnberger, A., Nauck, D., and Kruse, R.: Data Mining with Neuro-Fuzzy Models. In: Data Mining and Computational Intelligence, Springer-Verlag (2001) pp. 1-36
11. Nauck, D., and Kruse, R.: NEFCLASS - A Neuro-Fuzzy Approach for the Classification of Data. In Proceedings of the ACM Symposium on Applied Computing, Nashville (1995)
12. Huang, Y. P., and Chuang, W. P.: Improving the Museum's Service by Data Mining and Location-aware Approach. In Proceedings of Systems, Man and Cybernetics (2004)
13. Bellavista, P., Corradi, A., and Stefanelli, C.: Mobile Agent Middleware for Mobile Computing. Computer (March 2001) pp. 73-81
14. Zadeh, L. A.: Fuzzy Sets. Information and Control (1965) pp. 338-353
15. Mitra, S., and Hayashi, Y.: Neuro-Fuzzy Rule Generation: Survey in Soft Computing Framework. IEEE Trans. Neural Networks, Vol. 11 (2000) pp. 748-768
16. Halgamuge, S. K., and Glesner, M.: Neural Networks in Designing Fuzzy Systems for Real World Applications. Fuzzy Sets and Systems, Vol. 65 (1994) pp. 1-12
17. Wai, R. J., and Chen, P. C.: Intelligent Tracking Control for Robot Manipulator Including Actuator Dynamics via TSK-type Fuzzy Neural Network. IEEE Trans. Fuzzy Systems, Vol. 12 (2004) pp. 552-560
18. Banerjee, M., Mitra, S., and Pal, S. K.: Rough Fuzzy MLP: Knowledge Encoding and Classification. IEEE Trans. Neural Networks, Vol. 9 (1998) pp. 1203-1216
19. Felber, P., Guerraoui, R., and Schiper, A.: Evaluating CORBA Portability: The Case of an Object Group Service. In Proceedings of the Second International Enterprise Distributed Object Computing Workshop (1998) pp. 164-173
20. Baeza-Yates, R., and Ribeiro-Neto, B.: Modern Information Retrieval. Addison Wesley, ACM Press, New York (1999)