Implementation of Personalized Recommendation System using k-means Clustering of Item Category based on RFM
Young Sung Cho1, Song Chul Moon2, Si Choon Noh2, Keun Ho Ryu1
1 Department of Computer Science, Chungbuk National University, Cheongju, Korea
2 Department of Computer Science, Namseoul University, Cheonan-city, Korea
Abstract - This paper proposes a recommendation system based on a new method that uses k-means clustering of item categories based on RFM (Recency, Frequency, Monetary) for u-commerce in a ubiquitous computing environment, which requires real-time accessibility and agility. Searching for wallpaper images with a mobile device such as a cell phone or PDA is inconvenient and complex in this environment. The explicit method of collaborative filtering, which uses the user's profile for rating, cannot reflect the exact attributes of items and still suffers from sparsity and scalability problems, although it has been used in practice and refined to reduce these defects. This paper instead uses an implicit method that requires no onerous question-and-answer sessions with users and does not use the user's profile for rating, in order to reduce customers' search effort. RFM analysis is retained so that the attributes of items are reflected and items with high purchasability can be found. As pre-processing, the proposed system performs k-means clustering for each item category on purchase history data so that items can be recommended efficiently. The proposed system is compared with an existing system in an experiment on a dataset collected from a cosmetics internet shopping mall and is evaluated according to the criteria of logicality.
Keywords: RFM, Collaborative filtering, Clustering, k-means algorithm
1. Introduction
With the advent of the ubiquitous networking environment, using the wireless internet through intelligent portable devices such as smart phones, anytime and anywhere, is becoming part of everyday life, and the demand for it keeps increasing. In this trend, personalization becomes a very important technology for finding exactly the information that users need. Users need a recommendation system that can recommend, on their behalf, the items they really want; if the recommended content is correct, users are also satisfied. Possessing an intelligent recommendation system is becoming part of a company's business strategy. Research on personalized recommendation systems that use the RFM segmentation analysis technique to meet customers' needs has been carried out [1,2,3,4,5]. The accuracy of recommendation can be improved by using the RFM method for item segmentation and clustering by item category, so that the attributes of items are reflected. We therefore propose a personalized recommendation system using k-means clustering of item category based on RFM. Chapter 2 briefly reviews the related literature. Chapter 3 describes the new method in detail, including the system architecture with its sub-modules, the recommendation procedure, and the algorithm of the proposed system. Chapter 4 describes the implementation and the experiment used to evaluate the system against the criteria of logicality and efficiency. Chapter 5 concludes the paper and outlines directions for further research.
2. Related Works
2.1 RFM
RFM is generally used in database marketing and direct marketing and has received particular attention in retail. RFM is an acronym of three terms. R means recency: "How recently has a customer purchased?". F means frequency: "How often does she purchase?". M means monetary: "How much does she spend?". To create an RFM analysis, we create categories (bins) for each attribute. For example, with three bins per attribute, the resulting matrix would have twenty-seven (3 × 3 × 3) possible combinations. One well-known commercial approach uses five bins per attribute, each covering an exact quintile (20%) of the data, which yields 125 segment cells. The following expression gives the RFM score used in the RFM analysis. The variables A, B and C are weights, and each of the categories R, F and M has five bins.
$$ RFM = A \times R + B \times F + C \times M \tag{1} $$
The RFM score is correlated with the customer's interest in e-commerce [4].
2.2 COLLABORATIVE FILTERING
Collaborative filtering is a method of filtering for the interests of a user by collecting preference or taste information from many users. The term collaborative filtering comes from the fact that the method is based on other users' preferences. There are two types of the method. One is the explicit method, which uses the user's profile for rating and applies weights on a five-point Likert-type scale. The other is the implicit method, which does not use the user's profile for rating but uses the user's web log patterns or purchase history data, which show the user's buying patterns, so as to reflect the user's preferences. Current recommendation methods in data mining include collaborative filtering, demographic filtering, rule-based filtering, content-based filtering, hybrid filtering, which combines these techniques, and association rules. The explicit method cannot reflect the exact attributes of items and still suffers from sparsity and scalability problems, although it has been used in practice and refined to reduce these defects. In this paper, which uses an implicit method, RFM analysis is retained so that the attributes of items are reflected and items with high purchasability can be found.
2.3 CLUSTERING
Clustering can be defined as the process of organizing objects in a database into clusters/groups such that objects within the same cluster have a high degree of similarity, while objects belonging to different clusters have a high degree of dissimilarity. Clustering techniques [6,7] fall into a group of undirected data mining tools. The goal of undirected data mining is to discover structure in the data as a whole.
Clustering techniques are used for combining observed examples into clusters (groups) that satisfy two main criteria:
- each group or cluster is homogeneous; examples that belong to the same group are similar to each other.
- each group or cluster should be different from other clusters, that is, examples that belong to one cluster should be different from the examples of other clusters.
2.4 K-MEANS ALGORITHM
K-means is one of the simplest clustering algorithms. It takes as input a predefined number of clusters, which is the k in its name. Mean stands for an average: the average location of all the members of a particular cluster. The Euclidean norm is often chosen as the natural distance measure between customer a and customer k in the k-means algorithm, where a_i denotes the preference of customer a for attribute i.
$$ d(a, k) = \sqrt{\sum_{i} (a_i - k_i)^2} \tag{2} $$
In this paper, we use the k-means algorithm as follows [8]. The steps of the k-means algorithm are given in Table 1.
Table 1. K-means algorithm
Input: D = {d1, d2, ..., dn} // set of n data items
       k // number of desired clusters
Output: A set of k clusters.
Steps:
1. Randomly select k points (they can also be examples) from D as the seeds for the centroids of the k clusters.
2. Assign each example (item di) to the centroid closest to it, forming k exclusive clusters of examples.
3. Calculate the new centroids of the clusters by averaging all attribute values of the examples belonging to the same cluster.
4. Check whether the cluster centroids have changed their "coordinates". If yes, start again from step 2. If not, cluster detection is finished and all examples have their cluster memberships defined.
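The four steps of Table 1, together with the Euclidean distance, can be sketched in plain Python as follows. This is a minimal illustration and not the implementation used in the experiment; the function names, the `seed` and `max_iter` parameters, the handling of empty clusters and the sample points are assumptions of this sketch.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kmeans(data, k, seed=0, max_iter=100):
    """Return (labels, centroids) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick k examples at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every example to its closest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in data]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(data, labels) if label == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Step 4: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical two-attribute examples forming two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The result depends on the random seeds chosen in step 1, so in practice the algorithm is often run several times with different seeds.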
3. Proposing System
3.1 SYSTEM ARCHITECTURE
The proposed system is a personalized recommendation system using k-means clustering of item category based on RFM in a ubiquitous computing environment, which requires real-time accessibility and agility. Searching for wallpaper images with a mobile device such as a cell phone or PDA is inconvenient and complex in this mobile computing environment. The proposed system is convenient and simple to use and reduces customers' search effort, because it uses an implicit method without complicated questions and answers, unlike the explicit method, which depends on rating data. The system is composed of four agent modules as subsystems in the internet shopping mall environment: the analytical agent, the recommendation agent, the learning agent, and the data mining agent, which can be added as a separate module. The features of the subsystems are as follows. The analytical agent clusters the users' information and purchased data using the k-means algorithm and manages the clustered data. The recommendation agent creates the recommendation list with the TOP-N items of highest preference so that items with high purchasability are recommended efficiently. It also cross-compares the list with purchase history data in order to avoid recommending items that the user has already purchased. The learning agent manages the users' scores, the items' scores and the database tables to provide the real-time accessibility and agility required in a ubiquitous computing environment. We followed web standards in development, so the web interface supports full browsing on mobile devices. The recommendation system can of course be used from a web browser on the wired internet. It can be used under WAP in the mobile web environment with a feature phone. It also supports the Safari browser on iPhone and the Google Chrome browser on Android so that the system can be used with a smart phone.
Fig. 1. The system configuration for recommendation system
3.2 CLUSTERING ALGORITHM USING CUSTOMER INFORMATION & PURCHASED DATA
We create clusters of neighborhood user groups (user_group) from customer information, that is, the customer's score and the classification code built from the demographic variables age, gender and occupation. We then create clusters of purchased data sorted by item category, using the neighborhood user groups. The following algorithm is written in pseudocode using k-means clustering.
Table 2. Clustering algorithm using customer information & purchased data
Input: member, sale, point, gp
    // customer information with customer's score, the classification code reflecting demographic variables, purchased data, item category data
Output: user_group(n), sale_group(n)
    // user_group: neighborhood group created from customer information
    // sale_group: neighborhood group created from purchased data
while (!EOF(member_rec)) {
    scan member_rec, sale_rec, point_rec;
    // demographic variables: set the subscripts from score, age, gender, occupation
    if (score != null) {
        if      (score > 89) i = 1;
        else if (score > 79) i = 2;
        else if (score > 59) i = 3;
        else if (score > 39) i = 4;
        else                 i = 5;
    }
    i as Num_class      ← # of score in UserID;
    j as Num_age        ← # of Age;
    k as Num_gender     ← # of Gender;
    l as Num_occupation ← # of occupation;
    // form the customer's feature vector
    set n(4 : 1) = i;
    set n(4 : 2) = j;
    set n(4 : 3) = k;
    set n(4 : 4) = l;
    user_group_rec ← member_rec;
    create user_group(n) using K-means;  // neighborhood grouping by K-means clustering
    sale_group_rec ← sale_rec sorted by gp;
    create sale_group(n) using K-means;  // neighborhood grouping by K-means clustering
}
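The pre-processing part of Table 2, which maps the score to a class and builds the four-element customer feature vector, might look as follows in Python. The score thresholds follow the pseudocode; the code tables for gender and occupation, the decade-based age code, the dictionary field names and the helper names are hypothetical, since the paper does not list the actual classification codes.

```python
from collections import defaultdict

# Hypothetical code tables; the real classification codes are not given.
GENDER_CODE = {"F": 1, "M": 2}
OCCUPATION_CODE = {"student": 1, "office": 2, "other": 3}


def score_class(score):
    """Map a customer score to class 1..5 using the thresholds of Table 2."""
    if score > 89:
        return 1
    if score > 79:
        return 2
    if score > 59:
        return 3
    if score > 39:
        return 4
    return 5


def age_code(age):
    """Hypothetical age code: the decade of the customer's age, capped at 9."""
    return min(age // 10, 9)


def feature_vector(member):
    """Build the vector n = (i, j, k, l) for one customer record."""
    return (
        score_class(member["score"]),
        age_code(member["age"]),
        GENDER_CODE[member["gender"]],
        OCCUPATION_CODE[member["occupation"]],
    )


def build_user_vectors(members):
    """Return {user_id: vector}, skipping members whose score is missing."""
    return {m["user_id"]: feature_vector(m)
            for m in members if m.get("score") is not None}


def sales_by_category(sales):
    """Group purchase records by item category (gp in Table 2)."""
    groups = defaultdict(list)
    for sale in sales:
        groups[sale["category"]].append(sale)
    return dict(groups)


members = [
    {"user_id": "u1", "score": 85, "age": 34, "gender": "F", "occupation": "office"},
    {"user_id": "u2", "score": None, "age": 22, "gender": "M", "occupation": "student"},
]
print(build_user_vectors(members))
```

The resulting vectors, and the purchase records of each category, would then be passed to a k-means routine to form user_group and sale_group.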
3.3 ALGORITHM OF PROPOSING SYSTEM
When a user logs in, the system reads the user's information and identifies the classification code and the user's score. The system searches the cluster selected by the classification code and the user's score. It scans the preferences for item categories in this cluster and suggests brand items in the item category with the highest probability of preference. The system creates the recommendation list with the TOP-N items of highest preference so that items with high purchasability are recommended efficiently. The system cross-compares the list with purchase history data in order to avoid recommending items that the user has already purchased. The following algorithm is the procedure for the personalized recommendation system using k-means clustering of item category based on RFM.
Table 3. Procedure algorithm for Personalized recommendation System using k-means clustering of Item Category based on RFM
Step 1 : When the user joins as a member, the user's information is created, and the classification code reflecting demographic variables such as age, gender, occupation and the propensity of the customer is managed through the user's social data.
Step 2 : When the user logs in, the system reads the user's information, identifies the classification code, and selects the cluster using the classification code reflecting the demographic variables and the user's score.
Step 3 : The system takes the data of frequently purchased items whose RFM score is more than 79 points and searches the preference of small item categories in the data of the selected cluster.
Step 4 : The system selects the item category with the highest preference based on purchased data and creates the candidate items for recommendation in descending order of brand item preference.
Step 5 : The system creates the recommendation list with the TOP-N items of highest preference so that items with high purchasability are recommended efficiently.
Step 6 : The system cross-compares the list with purchase history data in order to avoid recommending items that the user has already purchased.
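Steps 3 to 6 of Table 3 can be sketched in Python as below. This is only an illustration under stated assumptions: preference is approximated by purchase counts within the user's cluster, the function name `recommend` and its parameters are hypothetical, and already purchased items are removed before the list is cut to TOP-N so that the list keeps N entries where possible.

```python
from collections import Counter


def recommend(cluster_purchases, item_rfm, user_history, top_n=3,
              rfm_threshold=79):
    """Return up to top_n item ids for one user.

    cluster_purchases: list of (item_id, category) purchases in the user's cluster
    item_rfm: {item_id: RFM score}
    user_history: set of item ids the user has already purchased
    """
    # Step 3: keep purchases of items whose RFM score exceeds the threshold.
    eligible = [(item, cat) for item, cat in cluster_purchases
                if item_rfm.get(item, 0) > rfm_threshold]
    if not eligible:
        return []
    # Step 4: choose the most preferred category (most purchases, by assumption)
    # and rank its items in descending order of purchase count.
    category_counts = Counter(cat for _, cat in eligible)
    best_category = category_counts.most_common(1)[0][0]
    item_counts = Counter(item for item, cat in eligible
                          if cat == best_category)
    # Step 6: drop items the user already bought (cross comparison).
    ranked = [item for item, _ in item_counts.most_common()
              if item not in user_history]
    # Step 5: keep the TOP-N items.
    return ranked[:top_n]


# Hypothetical cluster data.
purchases = [("lip1", "lip"), ("lip1", "lip"), ("lip1", "lip"),
             ("lip2", "lip"), ("lip2", "lip"), ("lip3", "lip"),
             ("skin1", "skin"), ("skin1", "skin"), ("lip4", "lip")]
rfm = {"lip1": 95, "lip2": 88, "lip3": 82, "skin1": 91, "lip4": 50}
print(recommend(purchases, rfm, user_history={"lip1"}, top_n=2))
```

Because clustering and preference counts can be computed in advance, only the final filtering and ranking need to run when the user logs in.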
4. The Environment of Implementation and Experiment & Evaluation
4.1 EXPERIMENTAL ENVIRONMENT
This system proposes a new method using k-means clustering of item category based on RFM in a ubiquitous computing environment. To evaluate it, we implemented a prototype of an internet shopping mall that specializes in cosmetics and carried out an experiment. The environment of the implementation and experiment is as follows.
- OS: Windows XP
- Web Server: Apache HTTP Server Version 2.2.8 / WAP 2.0
- XML/WML2.0/HTML/XHTML/CSS2/JAVASCRIPT
- Server-Side Application: JSP / PHP 5 Version 5.2.5
- Database: MySQL Version 4.0.26
- Java 2 SDK, SE 1.4.2_08 - http://java.sun.com/
- Tomcat 5.0.28 - http://jakarta.apache.org/
- http://www.mysql.com/products/connector/j/
We designed and implemented the proposed system as a prototype recommendation system and carried out the experiment on it. The proposed system was compared with the existing system using MAE and the metrics precision, recall and F-measure.
4.2 EXPERIMENTAL DATA FOR EVALUATION
To evaluate the proposed system, we used 319 users who have bought items in the e-shopping mall, 580 cosmetic items currently on the market, and 1,600 records of purchased data. The recommendation system was evaluated per cluster with MAE, precision, recall and F-measure. The experiment used a learning data set covering 12 months and a testing data set covering 3 months from a cosmetics cyber shopping mall.
4.3 EXPERIMENT & EVALUATION
The overall performance of the proposed system was evaluated in two ways. The first evaluation is the mean absolute error (MAE) between the predicted ratings and the actual ratings of users within the test set. The mean absolute error is computed by the following expression (3) over all data sets generated from purchased data.
$$ MAE = \frac{\sum_{i=1}^{N} |\varepsilon_i|}{N} \tag{3} $$
N represents the total number of predictions, ε_i represents the error between the forecast and the actual value, and i indexes each prediction.
Table 4. MAE of the proposal system compared with the existing system

| P_count | Proposal | Existing |
|---------|----------|----------|
| 50      | 0.47     | 0.65     |
| 100     | 0.23     | 0.32     |
| 300     | 0.07     | 0.08     |
| 500     | 0.05     | 0.06     |
Fig. 2. The result for the graph of MAE by comparing proposal system with existing system
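The MAE of expression (3) is the average of the absolute differences between predicted and actual values. A minimal Python sketch follows; the function name and the sample ratings are hypothetical and are not taken from the experimental data.

```python
def mean_absolute_error(predicted, actual):
    """MAE = (sum of |predicted_i - actual_i|) / N."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical predicted and actual ratings for four predictions.
print(mean_absolute_error([4, 3, 5, 2], [5, 3, 4, 4]))  # errors 1, 0, 1, 2
```

A lower MAE means that the predictions are closer to the users' actual ratings.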
The second evaluation uses precision, recall and F-measure for the proposed system in each cluster. It was performed to verify the validity of the recommendations and to evaluate the system's overall performance. These evaluation metrics for recommendation systems are commonly used in the field of information retrieval [9].
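For reference, the standard information retrieval definitions of these metrics can be sketched in Python as follows. The paper does not state its exact formulas, so the balanced F-measure (F1) used here is an assumption, and the function name and sample item ids are hypothetical.

```python
def precision_recall_f(recommended, relevant):
    """Return (precision, recall, F1) for one recommendation list.

    recommended: items the system recommended
    relevant: items the user actually purchased in the test period
    """
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: 2 of 4 recommended items were purchased,
# out of 3 purchased items in total.
p, r, f = precision_recall_f(["a", "b", "c", "d"], ["b", "d", "e"])
print(p, r, f)
```

Precision measures how many recommended items were purchased, recall measures how many purchased items were recommended, and the F-measure balances the two.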
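Table 5. Precision, recall and F-measure for recommendation ratio by each cluster

| Cluster | Proposal Precision | Proposal Recall | Proposal F-measure | Existing Precision | Existing Recall | Existing F-measure |
|---------|--------------------|-----------------|--------------------|--------------------|-----------------|--------------------|
| C1      | 56.98              | 81.11           | 62.72              | 52.10              | 50.89           | 47.18              |
| C2      | 100                | 18.89           | 31.78              | 42.08              | 16.07           | 22.34              |
| C3      | 48.79              | 48.73           | 45.38              | 48.02              | 31.32           | 35.3               |
| C4      | 49.36              | 51.27           | 47.52              | 47.82              | 29.54           | 34.39              |
| C5      | 52.49              | 46.94           | 46.84              | 51.67              | 34.98           | 39.18              |
| C6      | 50.41              | 53.06           | 47.86              | 48.60              | 43.21           | 42.02              |
| C7      | 50.93              | 49.40           | 46.59              | 48.56              | 36.60           | 38.26              |
| C8      | 43.60              | 50.60           | 44.47              | 42.63              | 36.60           | 37.08              |
| C9      | 46.68              | 31.40           | 34.58              | 44.78              | 25.19           | 29.71              |
| C10     | 67.23              | 68.60           | 63.91              | 61.93              | 55.34           | 53.87              |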
Table 5 shows the results of the evaluation metrics (precision, recall and F-measure) for the recommendation system. The evaluation rates of the proposed system are improved in comparison with the existing system. The proposed system is 18.3% higher in precision, 14.03% higher in recall and 9.23% higher in F-measure than the existing system. As a result, the performance of the proposed system is better than that of the existing system. Figure 6 shows cosmetic items recommended on the web page of the personalized recommendation system using k-means clustering of item category based on RFM, which can also be displayed on a smart phone. This system can be used immediately in u-commerce in a ubiquitous computing environment, which requires real-time accessibility and agility, because particular tasks such as clustering and calculating the probability of preference are completed in pre-processing to reduce the processing time.
Fig. 3. The result of recommending ratio for recommendation each cluster by precision
Fig. 4. The result of recommending ratio for recommendation each cluster by recall
Fig. 5. The result of recommending ratio for recommendation each cluster by F-measure
Fig. 6. The result of recommending items of cosmetics
5. Conclusion
Recently, u-commerce has been in the limelight as an application field of the ubiquitous computing environment, which requires real-time accessibility and agility. Searching for wallpaper images with a mobile device such as a cell phone or PDA is inconvenient and complex in this environment. We proposed a personalized recommendation system using k-means clustering of item category based on the RFM method, which reflects the attributes of items in order to find items with high purchasability. The proposed system is convenient and simple to use and reduces customers' search effort, because it uses an implicit method without complicated questions and answers, unlike the explicit method used by many existing systems, which depends on rating data. The system was evaluated according to the criteria of logicality through the experiment and showed improvement over the existing system. It is meaningful in that it presents a new framework of personalized recommendation for u-commerce. Future research will look for personalized u-commerce recommendation techniques that use RFM in a variety of approaches to increase efficiency and scalability.
Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-0001044) and by the grant of the Korean Ministry of Education, Science and Technology (CBITRC). This paper was supported by funding of Namseoul University.
6. References
[1] Young Sung Cho, Moon Haeng Heo, Keun Ho Ryu, "Implementation of Personalized recommendation System using RFM method in Mobile Internet Environment", KSCI, 13th-2 Vol, pp 1-5, Mar, 2008
[2] Young Sung Cho, Keun Ho Ryu, "Implementation of Personalized recommendation System using Demographic data and RFM method in eCommerce", 2008 IEEE International Conference on Management of Innovation & Technology Publication, 2008.
[3] Jin Byeong Woon, Young Sung Cho, Keun Ho Ryu, "Personalized e-Commerce Recommendation System using RFM method and Association Rules", KSCI, 15th-12 Vol, pp 227-235, Dec, 2010
[4] Young Sung Cho, Seon-phil Jeong, Keun Ho Ryu, "Implementation of Personalized u-Commerce Recommendation System using Preference of Item Category based on RFM", the 6th International Conference on Ubiquitous Information Technologies & Applications, pp 109-114, Dec, 2011
[5] Young Sung Cho, Keun Ho Ryu, "Personalized Recommendation System using FP-tree Mining based on RFM", KSCI, 17th-2 Vol, Feb., 2012
[6] Collier K., Carey B., Grusy E., Marjaniemi C., and Sautter D. (1998), "A Perspective on Data Mining", Northern Arizona University.
[7] Hand D., Mannila H., Smyth P. (2001), "Principles of Data Mining", The MIT Press.
[8] MacQueen J.B. (1967), "Some methods for classification and analysis of multivariate observations", Proc. 5th Symp. Mathematical Statistics and Probability, Berkeley, CA, Vol. 1, pp. 281–297.
[9] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl, "An Algorithmic Framework for Performing Collaborative Filtering", Proceedings of the 1999 Conference on Research and Development in Information Retrieval, 1999