Int. J. Business Information Systems, Vol. 14, No. 4, 2013
An effective recommendation based on user behaviour: a hybrid of sequential pattern of user and attributes of product

Mojtaba Salehi
Industrial Engineering Department, K.N. Toosi University of Technology, 1999143344, Tehran, Iran
Fax: +982188674858

Abstract: Recommender systems are a promising technology for companies to present personalised offers to their customers. However, this technology suffers from the sparsity problem. In addition, most research is based on explicit ratings, although most users do not spend time rating products. Therefore, this research proposes an effective recommendation method based on user behaviour. Since users express their opinions implicitly through specific attributes of products, we introduce a preference matrix that collects user preferences based on the attributes of products. In addition, since there are sequential patterns in the purchasing of products, we use weighted association rules to discover these patterns and improve the quality of recommendation. The method outperforms current algorithms and alleviates the sparsity problem. The main contribution is the implementation of a user behaviour-based recommendation method that discovers the interests of users from implicit ratings of product attributes. In addition, this approach uses sequential patterns of purchasing to improve the quality of recommendation.

Keywords: personalisation; content-based filtering; CBF; personalised recommender; sparsity; product attributes; attribute-based; sequential-based; association rules; information overload; collaborative filtering.

Reference to this paper should be made as follows: Salehi, M. (2013) ‘An effective recommendation based on user behaviour: a hybrid of sequential pattern of user and attributes of product’, Int. J. Business Information Systems, Vol. 14, No. 4, pp.480–496.

Biographical notes: Mojtaba Salehi received his BSc degree from Shahid Bahonar University in 2004, his MSc degree from Tehran University in 2006 and his PhD degree from Tarbiat Modares University in 2013. During 2012, he was a Researcher in the Mathematics and Computer Science Department of TU/e. He is currently working as an Assistant Professor at K.N. Toosi University of Technology (KNUT). His research areas of interest include soft computing, data mining, combinatorial optimisation, recommender systems and applied multivariate analysis.
Copyright © 2013 Inderscience Enterprises Ltd.
1 Introduction
Recommender systems help users to find appropriate content, products, or services such as books, digital products, movies, music, TV programmes, and websites by analysing the content of items previously visited by the user or by using suggestions from other users (Frias-Martinez et al., 2009, 2006; Kim et al., 2010). With such a system, customers of online retail stores spend less time and find better products to purchase. Because they help users deal with information overload and provide personalised recommendations, recommender systems have become an important research area (Adomavicius and Tuzhilin, 2005; Mathew, 2012). Recommender systems use three strategies for recommendation: content-based filtering (CBF) (Adomavicius and Tuzhilin, 2005; Cheung et al., 2003; Cho and Kim, 2004), collaborative filtering (CF) (Adomavicius and Tuzhilin, 2005; Boucher-Ryan and Bridge, 2006; Cheung et al., 2003; Cho and Kim, 2004; Karypis, 2001; Leung et al., 2006; Liu and Shih, 2005; Sarwar et al., 2001; Shih and Liu, 2008; Weng and Liu, 2004; Palanivel and Sivakumar, 2011; Salehi et al., 2012) and hybrid approaches (Adomavicius and Tuzhilin, 2005; Cho and Kim, 2004; Choi et al., 2006; Kim et al., 2006; Semeraro et al., 2005; Shih and Liu, 2008; Salehi and Nakhai Kamalabadi, 2012).

The most successful technology for building recommender systems is nearest neighbour-based CF, which is used in many commercial recommender systems. This method, also called memory-based CF, uses the user rating data to calculate the similarity between users or items and makes predictions or recommendations according to these similarity values. Most research uses explicit user ratings to generate recommendations. The explicit rating approach requires users to provide explicit information about their preferences and needs. However, according to Nielsen’s 90-9-1 principle (Nielsen, 2006), more people lurk in a virtual community than participate, so explicit ratings are scarce.
Therefore, in this research, we use implicit ratings and propose a method based on user behaviour to generate recommendations. The implicit rating approach gathers information from the online behaviour and activities (i.e., implicit feedback) of the user. The most important task of a recommender system is modelling the user’s preferences and computing the degree of relevance between a massive number of items and the target user. Currently, the vector space model is used for user preference modelling in most recommendation algorithms (Adomavicius and Tuzhilin, 2005). These vectors hold the ratings of a user on items. In these approaches, the most relevant items are recommended to users according to the similarity between vectors, i.e., between the ratings of users. However, these approaches are not sufficiently accurate, because items usually have several kinds of attributes with different values, and two users who give similar ratings to a specific product may place different emphases on its attributes. Therefore, similar ratings by two users on a product do not necessarily imply similarity between the users. For example, in an online retail store, products have attributes such as price, guarantee, brand and packaging, and each attribute has values; for the price attribute we have ‘very high, high, middle, low, very low’. The rating of a user on a product reflects an overall rating of the values of the product’s attributes. On the other hand, if we use only rating information, the similarity values between users or items will be unreliable when data are sparse and common items are therefore few. To address these limitations and achieve better recommendations, the proposed approach implements an attribute-based approach. According to implicit user ratings on products, the attribute
values of products are transferred to the user profile. With this approach, the importance of a specific attribute value for each user can be determined from the number of the user’s purchased products that have this attribute value. On the other hand, purchasing processes (sequences of product purchases) usually have time-dependency relationships and exhibit repeatability and periodicity. Therefore, we can mine a user’s historical purchasing records to discover sequential purchase patterns and use them to predict the product that the user is most likely to purchase in the near future, further improving the quality of recommendations (Luo et al., 2010).

With these issues in mind, this paper proposes an effective recommendation method based on user behaviour. Since users express their opinions implicitly through specific attributes of products, this paper introduces a preference matrix that collects user preferences based on the attributes of products. In addition, weighted association rules are used to discover the sequential patterns of product purchasing and improve the quality of recommendation. The main contribution of this paper is an attribute-based method that transfers implicit ratings of product attributes to the user profile in order to alleviate the sparsity problem and improve the quality of recommendations. In addition, this approach uses sequential patterns of purchasing to improve the quality of recommendation. Using this recommender system, companies can provide one-to-one personalisation and at the same time capture customer loyalty.

In the rest of the paper, Section 2 describes related works. Section 3 introduces the overall system framework and describes the proposed mechanism step by step. Section 4 applies the proposed approach to a dataset to evaluate and analyse its performance. Finally, the conclusion section provides the concluding remarks.
2 Related works
In this section, we briefly review prior methods related to recommender systems, which can be categorised into three groups: content filtering, CF and hybrid methods. We also discuss the techniques of recommendation and highlight the difference between our method and other existing works.
2.1 Content-based recommendation
In CBF, recommendations are based only on a profile built by analysing the content of the objects that the user has evaluated in the past. This technique is effective for applications that locate textual documents relevant to a topic. Content-based recommendation systems are mainly used to recommend documents, web pages, publications, jokes or news. This method has been used by many researchers (Leung et al., 2006; Shih and Liu, 2008; Hung, 2005). However, CBF approaches suffer from multiple drawbacks, e.g., strong dependence on the availability of content and ignoring the contextual information of recommendation (Adomavicius and Tuzhilin, 2005).
2.2 Collaborative filtering
Based on the assumption that users with similar past behaviours have similar interests, a CF system recommends items that are liked by other users with similar interests. Recommendation is therefore achieved by finding common characteristics in the preferences of other users. Some approaches in this category are k-nearest-neighbour (k-NN) (Chen and Yin, 2006), matrix factorisation (Koren, 2008; Salakhutdinov and Mnih, 2008), and semi-supervised learning (Ding et al., 2007). In addition, data mining techniques have provided a good source of customer data that can be incorporated with other information for recommendation (Hemalatha, 2012). Piramuthua et al. (2012) considered a specific type of bias that is introduced in online product reviews by the sequence in which these reviews are written. The method proposed by Kim et al. (2011a) first predicts actual ratings and subsequently identifies prediction errors for each user. From this error information, pre-computed models, collectively called the error-reflected model, are built. The models are then applied to new predictions. In other research, Kim et al. (2011b) first discovered useful and meaningful user patterns, and then enriched the personal model with collaboration from other similar users. Niknafs and Shiri (2008) presented an algorithm that takes into account information about changes in the selected item while a user is buying.

However, CF requires many ratings over items, and therefore suffers from the rating sparsity problem (Adomavicius and Tuzhilin, 2005). Most collaborative recommendation systems require explicit expression of personal interest in items. Although methods for obtaining ratings implicitly have been investigated to add more ratings and reduce sparsity, sparsity is still a critical drawback for recommender systems because of the rapidly growing number of items (Ahn et al., 2010).
Therefore, this research tries to tackle this problem by implementing an attribute-based approach.
2.3 Hybrid recommendation
Combining several recommendation strategies can be expected to provide better results than any single strategy alone (Mahdavi and Shepherd, 2004). Hybrid approaches combine the aforementioned techniques in different ways to improve recommendation performance and tackle the shortcomings of the underlying methods. Most hybrids work by combining several input data sources or several recommendation strategies. Robin (2002) reviewed several hybrid recommender methods developed to combine attributes and historical rating data for higher prediction accuracy. According to the reported experimental results, both attributes and historical ratings are of great value in estimating the prediction function for recommendation. In addition, in a real environment, users have unlimited and unpredictable desires, and their preferences may vary across product categories (Choi et al., 2006; Leung et al., 2006). For example, a user may be interested in buying inexpensive, pocket-size books, while the same user may be interested in buying expensive, big toys. Therefore, it is better to discover user preferences based on product attributes separately for each product category. This approach reduces the dimension of the problem, increases scalability and improves the quality of recommendations.
3 Proposed recommendation approach
In this section, the system framework is presented and the proposed recommendation mechanism is described step by step. Figure 1 shows the framework of the proposed recommender system. The input data consist of web server log files, a product database, a customer database and a purchase database. The output is the personalised product recommendation list. After extracting the attributes of products, the proposed framework uses two main approaches to generate recommendations. In the attribute-based approach, the product profile is defined, according to the defined attributes, as a vector in which the values of attributes are assigned to the product. Then, according to user ratings on products, the attribute values of products are transferred to the user profile. Finally, the system builds a personal preference matrix for each user and generates recommendations using content and CF. In the sequential-based approach, the latent patterns in the purchasing of products are discovered using weighted association rules. These patterns are then used for recommendation. The final recommendations are generated by combining the results of the two approaches.

Figure 1 System framework of the proposed recommender system (see online version for colours)
[Figure components: Product database; User database; Purchase database; Web server log files; Preference matrix construction; Content-based recommendation (R2); Finding sequential pattern using association rules; Sequential recommendation (R1); Final recommendation]
3.1 Attributes extraction of products
In this research, attributes of products are extracted for each category separately in three steps.

• Product taxonomy formation: Product taxonomy is practically represented as a tree that classifies a set of products at a low level into a more general product at a higher level. The leaves of the tree denote the product instances (stock keeping units in retail jargon), and non-leaf nodes denote product classes obtained by combining several nodes at a lower level into one parent node (Cho and Kim, 2004). Several online retailers employ product taxonomy to give a clear view of their product lines in a tree structure. In this phase, the products of the online store should be placed in the product taxonomy structure. The marketing manager or domain expert should contribute substantially to forming the product taxonomy.

• Product category formation: In this stage, similar products are identified and grouped together using the product taxonomy so that the next steps can be conducted in the reduced product space. This approach makes it possible to handle products in a reduced dimensional space and increases scalability and the quality of recommendations. The marketing manager or domain expert categorises all products in the database by specifying the level of product aggregation on the product taxonomy. These categories from the product taxonomy are proposed as a flexible way for domain experts to apply multiple rules at a time by grouping similar rules together.

• Attribute extraction for each category: It is very important to implement an efficient approach for discovering the attributes of items. According to the domain of the item (product), an expert panel should be held. This panel should consist of experts in the fields of marketing and distribution, and even the main producers of the products. At the end of this phase, following the preliminary steps of specialised marketing research methods, the key attributes of products are determined together with their levels and values. For example, attributes can be determined in three steps. First, experts create a basic list of attributes (Kleija and Musters, 2003; Kuhfeld, 2005). Second, this list is modified by asking users to express their opinions about the importance of items. Finally, experts generate the final list according to the preliminary list and users’ opinions (Oppewal and Louviere, 2000). Experts also determine the levels and possible values of the final attributes.
3.2 Attribute-based approach
After extraction of attributes for products, we make a profile for products and users.
3.2.1 Product profiling
According to the specified attributes, the product profile is defined as a vector in which the values of the category attributes are assigned to a product. When a product is registered in the online retail store, its profile should be created, with values assigned to attributes according to expert judgement. The product’s attribute-based model is therefore defined as a multidimensional vector $I = (A_1, A_2, \ldots, A_K)$, where $A_k$ indicates the name of the $k$th attribute of the category. In addition, according to expert opinions, we can assign a weight to each attribute to indicate its importance, where $AW_k$ denotes the weight of attribute $A_k$ and $\sum_{k=1}^{K} AW_k = 1$. For example, in the book category one may select subject, education level, price, author and cover as the attributes of the category. Each book then has values for these attributes, for example:

$$\text{Book}(i) = \begin{bmatrix} (\text{Subject} = \text{Neural network}),\ (\text{Education level} = \text{PhD}), \\ (\text{Price} = \text{low}),\ (\text{Author} = A_2),\ (\text{Cover} = \text{simple}) \end{bmatrix}$$
3.2.2 User profiling
The user profile describes the interests or preferences of the user in each attribute. According to user ratings on products, the attribute values of products are transferred to the user profile. Since the importance of a specific attribute for each user can be determined from the ratings of the user’s accessed products that have this attribute, we use a matrix to transfer the user’s ratings to the user profile. For each user, the system builds a personal preference matrix (shown in Figure 2) with $K$ rows corresponding to the attributes of the category and $T$ columns corresponding to the $T$ visited products in the category. In this matrix, $s_{ikj}$ is the score of attribute $A_k$ for user $i$ in the observation of product $j$, calculated as follows:

$$s_{ikj} = AW_k \cdot r_{ij} \tag{1}$$

where $r_{ij}$ is the rating of user $i$ for the visited product $j$.

Figure 2 Personal preference matrix

        r_i1     r_i2     r_i3     …     r_iT
A_1     s_i11    s_i12    s_i13    …     s_i1T
A_2     s_i21    s_i22    s_i23    …     s_i2T
A_3     s_i31    s_i32    s_i33    …     s_i3T
A_4     s_i41    s_i42    s_i43    …     s_i4T
…       …        …        …        …     …
A_K     s_iK1    s_iK2    s_iK3    …     s_iKT
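As a minimal sketch of equation (1), the following Python fragment builds the rows of a personal preference matrix from a list of ratings. The attribute names, the weights and the ratings are hypothetical values chosen for illustration; they are not taken from the experiments.

```python
# Hypothetical attribute weights AW_k for one category; they must sum to 1.
ATTRIBUTE_WEIGHTS = {
    "subject": 0.5,
    "education_level": 0.25,
    "price": 0.125,
    "author": 0.125,
}


def preference_matrix(ratings, weights=ATTRIBUTE_WEIGHTS):
    """Return {attribute: [s_ik1, ..., s_ikT]} with s_ikj = AW_k * r_ij.

    ratings holds r_i1 ... r_iT, the ratings of one user for T visited products.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("attribute weights must sum to 1")
    return {attr: [aw * r for r in ratings] for attr, aw in weights.items()}


# One user, three visited products with (assumed) implicit ratings 5, 3 and 4.
matrix = preference_matrix([5, 3, 4])
for attr, row in matrix.items():
    print(attr, row)
```

Each row of the result corresponds to one attribute $A_k$ and each column to one visited product, mirroring the layout of Figure 2.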
3.2.3 Recommendation
In this work, the similarity between user behaviour and each product is calculated using the following equation:

$$\text{sim}(u_i, p_p) = \frac{\sum_{j=1}^{T} w_{ij} \sum_{k=1}^{K} s_{ikj}\, m_k(p_p, p_{ij})}{T \cdot K} \tag{2}$$

in which $w_{ij}$ is a weighting value for the observation of product $j$ by user $i$, normalised with the 1-norm. Since a user’s preference for recently purchased products plays an important role in future purchases, the relative importance of each observation is determined as follows:

$$w_{ij} = e^{-\lambda \left( t(p_{ij}) - T \right)} \tag{3}$$

where $t(p_{ij})$ is the order of product $j$ in the recent observations of user $i$ and $\lambda$ is an adjustable parameter that describes the rate of change of the user’s preference. This formula gives more weight to recently visited products. $m_k(p_p, p_{ij})$ is a matching function between the $k$th attribute of products $p_p$ and $p_{ij}$, calculated as follows:

$$m_k(p_p, p_{ij}) = \begin{cases} 1 & \text{if } \text{value}(A_k, p_p) = \text{value}(A_k, p_{ij}) \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Therefore, if the value of the $k$th attribute is the same for $p_p$ and $p_{ij}$, $m_k(p_p, p_{ij})$ is 1, otherwise 0. Products are ranked and recommended according to their similarity to the preference matrix of the user.
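Equations (2) to (4) can be illustrated with a short Python sketch. It rests on two assumptions that the text does not fix: $t(p_{ij}) = 1$ denotes the most recent observation, so that the weights decay for older ones, and every product profile holds a value for every attribute. All data below are hypothetical.

```python
import math


def recency_weights(num_obs, lam):
    """Equation (3) weights, normalised to sum to 1 (1-norm).

    Assumption: t = 1 is the most recent observation, so weights decay with t.
    """
    raw = [math.exp(-lam * (t - num_obs)) for t in range(1, num_obs + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def similarity(candidate, observed, ratings, attr_weights, lam):
    """Equation (2): similarity between a user's behaviour and a candidate product.

    observed[0] is assumed to be the most recently visited product; each
    profile is a dict that holds a value for every attribute.
    """
    T, K = len(observed), len(attr_weights)
    weights = recency_weights(T, lam)
    total = 0.0
    for w_ij, product, r_ij in zip(weights, observed, ratings):
        # s_ikj * m_k: an attribute score counts only when the values match.
        total += w_ij * sum(
            aw * r_ij
            for attr, aw in attr_weights.items()
            if candidate[attr] == product[attr]
        )
    return total / (T * K)


# Hypothetical data for illustration only.
attr_weights = {"subject": 0.5, "price": 0.5}
observed = [
    {"subject": "neural network", "price": "low"},
    {"subject": "mathematics", "price": "low"},
]
ratings = [4, 2]
same = {"subject": "neural network", "price": "low"}
other = {"subject": "history", "price": "high"}

print(similarity(same, observed, ratings, attr_weights, lam=0.83))
print(similarity(other, observed, ratings, attr_weights, lam=0.83))
```

A candidate that shares no attribute value with any visited product scores zero, while shared values on highly rated, recent products raise the score.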
3.3 Sequential-based recommendation approach
Purchase processes (sequences of purchased products) usually have time-dependency relationships and exhibit repeatability and periodicity. For example, a person who buys a camera may then buy a memory card and other camera accessories. Therefore, the sequence of products in a purchasing process can reflect the user’s latent purchasing pattern and preference (Luo et al., 2010), and we can mine the user’s historical purchasing records to discover sequential purchase patterns. Using these patterns, we can predict the product that a user is most likely to purchase in the near future, which further improves the quality of recommendations and addresses the new user problem. In this section, weighted association rules are introduced and adapted for sequential pattern mining and for the recommendation of products in the purchase process.
3.3.1 Weighted association rule mining
Association rules are a very useful form of data mining that describes the probabilistic co-occurrence of certain events within a database. To further investigate whether a user would choose a potential product, association rule mining was used to extract the prediction rule set from the background of the user. An association rule $r$ is an expression of the form $A \Rightarrow B$, where $A$ and $B$ are two sets of items; $A$ is the body and $B$ is the head of the rule. The support of the association rule $A \Rightarrow B$ is the percentage of transactions (sequences in this study) that contain both $A$ and $B$ among all transactions. The confidence of the rule $A \Rightarrow B$ is the percentage of transactions that contain $B$ among the transactions that contain $A$. The support represents the usefulness of the discovered rule and the confidence represents its certainty. The confidence is computed as follows:

$$\text{Confidence}(A \rightarrow B) = \frac{\text{support}(A \cup B)}{\text{support}(A)} \tag{5}$$
A rule $r$ must satisfy a minimum confidence and a minimum support threshold to be included in the set of discovered rules. To improve the efficiency of association rules, we introduce a weighted version. The traditional association rules model only considers whether an item is present in a sequence or not. It assumes that all items have the same importance, does not take into account the weight of an item within a sequence, and treats all items in a sequence uniformly. Weighted association rule mining is useful because, by considering the rating of an item both in rule mining and in the recommendation process, we can take the user’s interest into account and improve the quality of recommendation. Inspired by Tao et al. (2003), we implement weighted association rule mining by associating a weight parameter with each item in an association rule. In the remainder of this section, we modify the measures of the Apriori algorithm to reflect the weighting approach. In these definitions, each sequence is considered as $S = \{(p_1, r_1), (p_2, r_2), \ldots, (p_m, r_m)\}$, where $p_i$ represents product $i$, $r_i$ represents the implicit rating for product $i$ defined by equation (1) and $m$ is the total number of products. For each sequence in which item $i$ is absent, we have $r_i = 0$.

Definition 1 Item weight: The item weight represents the significance of an item. In this research, the weight scheme for item $i$ is defined as follows:

$$W(p_i) = r_i \tag{6}$$
Definition 2 Weight of an itemset in a sequence: The itemset sequence weight is the minimum of the weights of the items of the itemset that are present in a single sequence. The itemset sequence weight for an itemset $X$ can be calculated as:

$$W(X, s) = \begin{cases} \min_{\forall i \in X} W(p_i) & X \subseteq s \\ 0 & X \not\subseteq s \end{cases} \tag{7}$$
Definition 3 Weighted support: The weighted support (WSP) is defined as the average of the itemset sequence weight over all the sequences in the database as follows:

$$WSP(X) = \frac{\sum_{s=1}^{S} W(X, s)}{S} \tag{8}$$
where $S$ is the number of sequences in the database. The problem of frequent pattern mining is to find the complete set of itemsets that satisfy a minimum support threshold in the database. In our model, an itemset is considered a frequent itemset if its WSP is above a given minimum WSP threshold.

Definition 4 Weighted confidence: The weighted confidence of a weighted association rule is formulated as follows:

$$WC(X \rightarrow Y) = \frac{WSP(X \cup Y)}{WSP(X)} \tag{9}$$
Weighted association rules are extracted from server logs. The generated rules express behaviour characteristics of user’s purchasing process.
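The measures of Definitions 2 to 4 can be sketched in Python as follows. The sketch follows the minimum form of equation (7); each sequence is represented as a dictionary from product to implicit rating, and a product that is absent from a sequence is simply missing from the dictionary. The products and ratings are hypothetical.

```python
def itemset_weight(itemset, sequence):
    """Equation (7): minimum item weight if the itemset is contained in the sequence, else 0."""
    if not set(itemset) <= sequence.keys():
        return 0.0
    return float(min(sequence[p] for p in itemset))


def weighted_support(itemset, sequences):
    """Equation (8): average itemset weight over all sequences."""
    return sum(itemset_weight(itemset, s) for s in sequences) / len(sequences)


def weighted_confidence(body, head, sequences):
    """Equation (9): WSP(body ∪ head) / WSP(body); 0 when the body never occurs."""
    body_support = weighted_support(body, sequences)
    if body_support == 0:
        return 0.0
    return weighted_support(set(body) | set(head), sequences) / body_support


# Hypothetical purchase sequences: product -> implicit rating.
sequences = [
    {"camera": 4, "memory": 2},
    {"camera": 2},
    {"memory": 4, "tripod": 2},
    {"camera": 4, "memory": 4, "tripod": 2},
]

print(weighted_support({"camera"}, sequences))
print(weighted_confidence({"camera"}, {"memory"}, sequences))
```

Because the weights here are raw ratings rather than values in [0, 1], the weighted support is not bounded by 1; a minimum WSP threshold would have to be chosen on the same scale.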
3.3.2 Recommendation mechanism
To produce recommendations, we search for the weighted association rules whose left-hand sides are most similar to the product sequence accessed by the active user, using a similarity degree. We use a similarity measure to find the most similar rules instead of requiring an exact match between the active user and the rules. The weighted association rules and the accessed product sequences of the active user are represented as sets of item-weight pairs. This allows us to consider both the sequence of the active user and the association rules as $m$-dimensional vectors over the space of items. Thus, the left-hand side of each rule can be represented as a vector $r_L = \{w_1(r_L), w_2(r_L), \ldots, w_m(r_L)\}$ where

$$w_i(r_L) = \begin{cases} \bar{W}(p_i) & \text{if } p_i \in r_L \\ 0 & \text{otherwise} \end{cases} \tag{10}$$
where $\bar{W}(p_i)$ is the mean of the item weight defined by equation (6) over the sequences that contain the left-hand side of the rule. An active user sequence is also represented as a vector $s_a = \{w_1(s_a), w_2(s_a), \ldots, w_m(s_a)\}$, where $w_i(s_a)$ is the significance weight associated with item $p_i$, defined by equation (6) for the accessed sequence of the user if the user has accessed $p_i$, and $w_i(s_a) = 0$ otherwise. Then, inspired by cosine similarity, the matching score between the current active sequence and an association rule, which indicates relationships among items based on their co-occurrence in the accessed patterns of users, is defined as:

$$\text{MatchScore}(s_a, r_L) = \frac{\sum_{i=1}^{m} w_i(s_a)\, w_i(r_L)}{\sqrt{\sum_{i=1}^{m} w_i(s_a)^2}\ \sqrt{\sum_{i=1}^{m} w_i(r_L)^2}} \tag{11}$$

where $s_a$ and $r_L$ represent the active user sequence and the left-hand side of the weighted association rule, respectively. Using this measure, the algorithm tries to find rules that are similar to the active user sequence. Finally, a recommendation score is calculated for each item not yet visited by the active user. In this research, three factors are used in calculating the recommendation score: the matching score of the active sequence to the weighted rule, the weighted confidence of the rule and the weighted support of the rule:

$$\text{RecScore}(s_a, r_L \rightarrow p_i) = \text{MatchScore}(s_a, r_L) \times WC(r_L \rightarrow p_i) \times WSP(r_L) \tag{12}$$

Finally, the $N$ items with the highest recommendation scores are chosen as the recommendation set. By using the matching measure between the weighted rules and the current sequence instead of an exact match, and by using the weighted confidence, weighted support and matching score instead of the confidence value alone, this approach tries to improve the recommendation results.
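The scoring step can be sketched in Python under stated assumptions: the matching score is taken to be the standard cosine between the two sparse weight vectors, each rule is given as a tuple (left-hand-side weights, head item, WC, WSP) whose values are hypothetical, and when several rules predict the same item the highest score is kept, an aggregation choice that the text does not specify.

```python
import math


def match_score(active, rule_body):
    """Cosine-style match between two sparse item -> weight dictionaries."""
    dot = sum(w * rule_body.get(item, 0.0) for item, w in active.items())
    norm_a = math.sqrt(sum(w * w for w in active.values()))
    norm_r = math.sqrt(sum(w * w for w in rule_body.values()))
    if norm_a == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_a * norm_r)


def recommend(active, rules, n):
    """Return the n unvisited head items with the highest recommendation score.

    rules is a list of (body_weights, head_item, wc, wsp) tuples.
    Assumption: if several rules share a head item, the best score is kept.
    """
    scores = {}
    for body, head, wc, wsp in rules:
        if head in active:
            continue  # only unvisited items are recommended
        score = match_score(active, body) * wc * wsp
        scores[head] = max(scores.get(head, 0.0), score)
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical active sequence and rules for illustration only.
active = {"camera": 4.0}
rules = [
    ({"camera": 3.0}, "memory", 0.6, 2.5),
    ({"camera": 3.0, "tripod": 2.0}, "bag", 0.9, 1.0),
    ({"memory": 2.0}, "camera", 0.5, 1.0),
]

print(recommend(active, rules, n=2))
```

The third rule is skipped because its head item has already been visited by the active user.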
3.4 Recommendation
We proposed two recommendation approaches: attribute-based recommendation (ABR) and sequential-based recommendation (SBR). To generate the recommendation list for the active user in a given category, candidate products can be derived separately from the two components, or the results can be combined as follows:

1 Mixed of ABR and SBR (M-ABR-SBR): Recommendations from the ABR and SBR recommenders are presented together.

2 Cascade of ABR and SBR (C-ABR-SBR): The recommendation results of SBR are ranked using the ABR method. In other words, we first produce the top-N recommendations using the SBR method, and then rank them using the ABR method.
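The two combination strategies can be sketched as follows. The text only states that mixed recommendations are "presented together", so the interleaving used in `mixed` is an assumption made for illustration; `abr_score` stands for any function returning the ABR similarity of a product, and the item names and scores are hypothetical.

```python
from itertools import zip_longest


def mixed(abr_ranked, sbr_ranked, n):
    """M-ABR-SBR sketch: interleave both ranked lists, skipping duplicates."""
    merged = []
    for pair in zip_longest(abr_ranked, sbr_ranked):
        for item in pair:
            if item is not None and item not in merged:
                merged.append(item)
    return merged[:n]


def cascade(sbr_ranked, abr_score, n):
    """C-ABR-SBR sketch: take the top-n SBR items, then re-rank them by ABR score."""
    return sorted(sbr_ranked[:n], key=abr_score, reverse=True)


# Hypothetical ranked lists and ABR scores.
print(mixed(["a", "b", "c"], ["b", "d"], n=3))
print(cascade(["x", "y", "z"], {"x": 0.1, "y": 0.9, "z": 0.5}.get, n=2))
```

In the cascade, SBR decides which items enter the list and ABR only decides their order.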
4 Experiments
We have conducted a set of experiments to examine the effectiveness of our proposed recommender system in terms of sparsity and recommendation accuracy and quality.
4.1 Evaluation metrics and dataset
To check the performance of the proposed algorithm, a real-world dataset is used in our simulations. We used web log data and purchase data from a large e-commerce company. According to experts’ opinions, the product taxonomy must be formed and categories must be determined. In this research, we consider the book category only. The key attributes of this category were obtained in three steps. In the first step, experts determined some attributes for grain categories. In the second step, the importance of the attributes was determined by some users through an e-mail questionnaire. In the third step, experts determined the final attributes and attribute values according to their knowledge and users’ opinions. The final attributes and their values for the book category are shown in Table 1.

Table 1 Selected attributes and their values for book category

Attribute          Values
Subject            Mathematics, information technology, …
Sub subject        Neural network, e-learning, …
Education level    Bachelor, Master, PhD
Author             Don Passey, Tom Forester, …
Cover type         Paperback, hardcover, …
Price              Very low, low, average, high, very high
Users’ ratings on books were obtained from web log files. In our experiment, data pre-processing was applied to the log files. This process includes data cleaning, user identification, session identification, and path completion (Cho and Kim, 2004). After pre-processing, a transaction database containing 2,502 users and 23,045 transactions on books was obtained. In the experiments, the transaction data are ordered by the user’s access timestamp and then divided into a training set and a test set. The algorithm is trained on the training set and the top-N items are predicted for each user. The items that appear in both the top-N set and the user’s test set form a special set called the hit set.
4.2 Performance measure
Precision and recall are the most popular metrics for evaluating information retrieval systems. They have been used by various researchers for the evaluation of recommender systems (Pazzani and Billsus, 2007; Herlocker, 2000). Precision is a measure of exactness and recall is a measure of completeness. There are several ways to evaluate precision and recall. For recommender systems, recall can be defined as follows:

$$\text{Recall} = \frac{|\text{test} \cap \text{top-}N|}{|\text{test}|} \tag{13}$$

where top-$N$ denotes the recommendation set and test denotes the test set. Precision for recommender systems can be defined as follows:

$$\text{Precision} = \frac{|\text{test} \cap \text{top-}N|}{N} \tag{14}$$

where $N$ denotes the number of recommendations. Since increasing the size of the recommendation set leads to an increase in recall but at the same time a decrease in precision, we can use the F1 measure (Cho and Kim, 2004; Karypis, 2001; Sarwar et al., 2001; Shih and Liu, 2008), a well-known combination metric with the following formula:

$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{15}$$
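Equations (13) to (15) can be computed for a single user with the short Python sketch below. The item identifiers are hypothetical, and the size of the top-N list is taken as N.

```python
def evaluate(top_n, test_items):
    """Return (precision, recall, f1) for one user's top-N list and test set."""
    hits = len(set(top_n) & set(test_items))  # size of the hit set
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(test_items) if test_items else 0.0
    if hits == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four recommendations, three items in the test set.
precision, recall, f1 = evaluate(["b1", "b2", "b3", "b4"], {"b2", "b4", "b9"})
print(precision, recall, f1)
```

Here two of the four recommended items are hits, so precision is 2/4 and recall is 2/3.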
Users interact with the recommendation list as a whole, and accuracy metrics cannot capture this because they are designed to judge the accuracy of individual item predictions; they do not judge the contents of entire recommendation lists. Since the recommendation list should be judged for its usefulness as a complete entity, not just as a collection of individual items, in this research we also define an intra-list similarity metric inspired by Herlocker et al. (2004) as follows:

$$\text{ISM}(\text{List}) = \frac{\sum_{I_i} \sum_{I_j,\, i \neq j} f(I_i, I_j)}{\binom{|\text{List}|}{2}} \tag{16}$$

where

$$f(I_i, I_j) = \frac{\text{mat}(I_i, I_j)}{m} \tag{17}$$

where mat indicates the number of matching attributes between items $I_i$ and $I_j$, and $m$ is the number of attributes considered for an item. Higher similarity denotes lower diversity. This measure is used to evaluate the quality of recommendation.
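A minimal Python sketch of the intra-list similarity is given below. As an assumption, each unordered pair of items is counted once, so that dividing by the number of pairs keeps the score between 0 and 1; the items and attributes are hypothetical.

```python
from itertools import combinations


def intra_list_similarity(items, attributes):
    """Average fraction of matching attribute values over all unordered item pairs."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    total = sum(
        sum(a[k] == b[k] for k in attributes) / len(attributes)
        for a, b in pairs
    )
    return total / len(pairs)


# Hypothetical recommendation list described by two attributes.
books = [
    {"subject": "neural network", "price": "low"},
    {"subject": "neural network", "price": "high"},
    {"subject": "mathematics", "price": "high"},
]
print(intra_list_similarity(books, ["subject", "price"]))
```

A list of identical items scores 1 (no diversity) and a list in which no two items share an attribute value scores 0.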
4.3 Performance comparison
An experiment was performed to compare the relative performance of the ABR, SBR, M-ABR-SBR and C-ABR-SBR methods in recommendation generation. Comparisons were produced for the optimal values of λ and T (λ = 0.83 and T = 15), which were obtained by trial and error. The comparison of the F1 measure for different numbers of recommendations is presented in Figure 3. The results demonstrate that the C-ABR-SBR method has the best performance, and indicate that the cascade combination performs better than the mixed combination in the proposed attribute-based method. The relative performance of these methods varies with the number of recommendations; generally, the best performance is obtained with 20 to 25 recommendations.

Figure 3 Comparison of different recommendation methods (see online version for colours)
A comparative study was carried out to evaluate the proposed methods. Table 2 compares the recommendation quality of our proposed methods with the best-tuned configuration of five different algorithms: user-based CF using Pearson correlation with default voting (DV) (Breese et al., 1998), item-based CF using adjusted cosine similarity (Sarwar et al., 2001), the two hybrid recommendation algorithms of Pazzani (1999) and Melville et al. (2002), and the personality diagnosis algorithm (Pennock et al., 2000) for making probabilistic recommendations. As can be seen, C-ABR-SBR generates better recommendations than the other algorithms, since attributes of a product can still be used for finding similar products.
Table 2
A comparison of prediction accuracy of various methods

| Method | Recall | Precision | F1 |
|---|---|---|---|
| C-ABR-SBR | 0.382 | 0.677 | 0.488 |
| M-ABR-SBR | 0.365 | 0.593 | 0.452 |
| ABR | 0.358 | 0.581 | 0.443 |
| SBR | 0.327 | 0.543 | 0.408 |
| User-based with DV | 0.344 | 0.541 | 0.421 |
| Item-based | 0.311 | 0.512 | 0.387 |
| Pazzani | 0.352 | 0.571 | 0.436 |
| Melville et al. | 0.363 | 0.637 | 0.462 |
| Personality diagnosis | 0.343 | 0.572 | 0.429 |
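
As a quick consistency check of Table 2 against equation (15), the F1 value reported for C-ABR-SBR can be recomputed from its recall (0.382) and precision (0.677):

$$F1 = \frac{2 \times 0.677 \times 0.382}{0.677 + 0.382} = \frac{0.517228}{1.059} \approx 0.488$$

which agrees with the tabulated value.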
4.3.1 Performance evaluation for different sparsity levels
To illustrate that the proposed method can alleviate the sparsity problem, we increased the sparsity level of the training set by dropping randomly selected entries, while keeping the test set the same for each sparse training set. The performance of the C-ABR-SBR algorithm was then compared with that of the other algorithms. Figure 4 shows that the performance of the proposed algorithm does not degrade rapidly. This is because attributes of an item can still be used for finding similar items. Furthermore, the algorithm enriches item and user profiles by combining attribute information and sequential pattern information.
Figure 4 Performance of algorithms under different sparsity levels (see online version for colours)
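The sparsification step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the experimental code: the training set is assumed to be a list of (user, item, implicit rating) triples, sparsity is assumed to mean the fraction of empty user-item cells, and the names `sparsify` and `sparsity_level` are hypothetical. The test set is simply left untouched.

```python
import random
from typing import List, Tuple

Entry = Tuple[str, str, float]  # (user, item, implicit rating), an assumed layout


def sparsify(train: List[Entry], drop_fraction: float, seed: int = 0) -> List[Entry]:
    """Return a copy of the training set with a random fraction of entries dropped."""
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must lie in [0, 1]")
    rng = random.Random(seed)  # fixed seed so each run drops the same entries
    n_keep = len(train) - int(round(drop_fraction * len(train)))
    kept = sorted(rng.sample(range(len(train)), n_keep))
    return [train[i] for i in kept]


def sparsity_level(entries: List[Entry], n_users: int, n_items: int) -> float:
    """Fraction of empty cells in the user-item matrix."""
    filled = len({(user, item) for user, item, _ in entries})
    return 1.0 - filled / (n_users * n_items)


# Toy training set: 5 users, a catalogue of 4 items, 10 observed entries.
train = [(f"u{u}", f"i{i}", 1.0) for u in range(5) for i in range(2)]
sparse_train = sparsify(train, drop_fraction=0.3)
print(sparsity_level(train, 5, 4), sparsity_level(sparse_train, 5, 4))
```

In the toy data, dropping 30% of the 10 entries leaves 7, which raises the sparsity level from 0.5 to 0.65. Repeating the call with increasing values of `drop_fraction` gives a series of progressively sparser training sets that can all be evaluated against the same test set.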
4.3.2 Performance evaluation for intra-list similarity
In the final experiment, to evaluate the quality of the recommendations produced by C-ABR-SBR, it is compared with the other algorithms on the intra-list similarity metric defined above. As shown in Figure 5, C-ABR-SBR has a lower ISM than any other algorithm, which means higher diversity. Item-based CF and user-based CF have the lowest diversity, and the diversity of the algorithms of Pazzani (1999) and Melville et al. (2002) is approximately equal. As the number of recommendations increases, diversity decreases for all algorithms.
Figure 5 The ISM of algorithms with respect to N (number of recommendations) (see online version for colours)
5 Conclusions
Companies use recommender systems to present personalised offers to their customers, sparing them much of the time and effort of information search. To address sparsity and provide good recommendations to users, this paper presents a novel personalised recommender system that utilises attributes and sequential patterns of purchased products in a unified model. After product taxonomy construction, category extraction and attribute determination for each category, a preference matrix was introduced that models the interests of a user based on implicit ratings of products in the attribute-based approach. In the sequential-based approach, we use weighted association rules to find sequential patterns of purchased products. The method outperforms the compared algorithms and alleviates the sparsity problem. In future research, similarity between users could be derived from the similarity between their preference matrices and used to implement a collaborative approach.
Acknowledgements
The author has benefited considerably from the valuable, constructive and helpful comments of the reviewers and expresses sincere thanks to them.
References
Adomavicius, G. and Tuzhilin, A. (2005) ‘Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions’, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 6, pp.734–749.
Ahn, H.J., Kang, H. and Lee, J. (2010) ‘Selecting a small number of products for effective user profiling in collaborative filtering’, Expert Systems with Applications, Vol. 37, No. 4, pp.3055–3062.
Boucher-Ryan, P.D. and Bridge, D. (2006) ‘Collaborative recommending using formal concept analysis’, Knowledge-Based Systems, Vol. 19, No. 5, pp.309–315.
Breese, J.S., Heckerman, D. and Kadie, C. (1998) ‘Empirical analysis of predictive algorithms for collaborative filtering’, UAI’98 Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pp.43–52.
Chen, J. and Yin, J. (2006) ‘Recommendation based on influence sets’, Proceedings of the Workshop on Web Mining and Web Usage Analysis, Citeseer.
Cheung, K-W., Kwok, J.T. and Law, M.H. (2003) ‘Mining customer product ratings for personalized marketing’, Decision Support Systems, Vol. 35, No. 2, pp.231–243.
Cho, Y.H. and Kim, J.K. (2004) ‘Application of web usage mining and product taxonomy to collaborative recommendations in e-commerce’, Expert Systems with Applications, Vol. 26, No. 2, pp.233–246.
Choi, S.H., Kang, S. and Jeon, J.Y. (2006) ‘Personalized recommendation system based on product specification values’, Expert Systems with Applications, Vol. 31, No. 3, pp.607–616.
Ding, C., Simon, H.D., Jin, R. and Li, T. (2007) ‘A learning framework using Green’s function and kernel regularization with application to recommender system’, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.260–269.
Frias-Martinez, E., Chen, S.Y. and Liu, X. (2009) ‘Evaluation of a personalized digital library based on cognitive styles: adaptivity vs. adaptability’, International Journal of Information Management, Vol. 29, No. 1, pp.48–56.
Frias-Martinez, E., Magoulas, G., Chen, S.Y. and Macredie, R. (2006) ‘Automated user modeling for personalized digital libraries’, International Journal of Information Management, Vol. 26, No. 3, pp.234–248.
Hemalatha, M. (2012) ‘Market basket analysis – a data mining application in Indian retailing’, International Journal of Business Information Systems, Vol. 10, No. 1, pp.109–129.
Herlocker, J.L. (2000) Understanding and Improving Automated Collaborative Filtering Systems, Doctoral dissertation, Citeseer.
Herlocker, J.L., Konstan, J.A., Terveen, L.G. and Riedl, J. (2004) ‘Evaluating collaborative filtering recommender systems’, ACM Transactions on Information Systems, Vol. 22, No. 1, pp.5–53.
Hung, L.P. (2005) ‘A personalized recommendation system based on product taxonomy for one-to-one marketing online’, Expert Systems with Applications, Vol. 29, No. 2, pp.383–392.
Karypis, G. (2001) ‘Evaluation of item-based top-n recommendation algorithms’, Proceedings of the ACM CIKM Conference, pp.247–254.
Kim, B.M., Li, Q., Park, C.S., Kim, S.G. and Kim, J.Y. (2006) ‘A new approach for combining content-based and collaborative filters’, Journal of Intelligent Information Systems, Vol. 27, No. 1, pp.79–91.
Kim, H.N., Ji, A.T., Ha, I. and Jo, J.S. (2010) ‘Collaborative filtering based on collaborative tagging for enhancing the quality of recommendation’, Electronic Commerce Research and Applications, Vol. 9, No. 1, pp.73–83.
Kim, H-N., El-Saddika, A. and Job, G-S. (2011a) ‘Collaborative error-reflected models for cold-start recommender systems’, Decision Support Systems, Vol. 51, No. 3, pp.519–531.
Kim, H-N., Hab, I., Leeb, K-S., Job, G-S. and El-Saddika, A. (2011b) ‘Collaborative user modeling for enhanced content filtering in recommender systems’, Decision Support Systems, Vol. 51, No. 4, pp.772–781.
Kleija, F.T. and Musters, P.D.A. (2003) ‘Text analysis of open-ended survey responses: a complementary method to preference mapping’, Journal of Food Quality and Preference, Vol. 14, No. 1, pp.43–52.
Koren, Y. (2008) ‘Factorization meets the neighborhood: a multifaceted collaborative filtering model’, Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.426–434.
Kuhfeld, W.F. (2005) ‘Marketing research methods in SAS’, Marketing Research Methods in the SAS System: A Collection of Papers and Handouts, pp.21–46.
Leung, C.W-K., Chan, S.C-F. and Chung, F-L. (2006) ‘A collaborative filtering framework based on fuzzy association rules and multiple-level similarity’, Knowledge and Information Systems, Vol. 10, No. 3, pp.357–381.
Liu, D.R. and Shih, Y.Y. (2005) ‘Integrating AHP and data mining for product recommendation based on customer lifetime value’, Information and Management, Vol. 42, No. 3, pp.387–400.
Luo, J., Dong, F., Cao, J. and Song, A. (2010) ‘A context-aware personalized resource recommendation for pervasive learning’, Cluster Computing, Vol. 13, No. 2, pp.213–239.
Mahdavi, M. and Shepherd, J. (2004) ‘Enabling dynamic content caching in web portals’, Research Issues on Data Engineering: Web Services for E-Commerce and E-Government Applications, Proceedings. 14th International Workshop on, pp.129–136.
Mathew, S.K. (2012) ‘Adoption of business intelligence systems in Indian fashion retail’, International Journal of Business Information Systems, Vol. 9, No. 3, pp.261–277.
Melville, P., Mooney, R.J. and Nagarajan, R. (2002) ‘Content boosted collaborative filtering for improved recommendations’, Eighteenth National Conference on Artificial Intelligence, pp.187–192.
Nielsen, J. (2006) Participation Inequality: Lurkers vs. Contributors in Internet Communities [online] http://www.useit.com/alertbox/participation_inequality.html (accessed 25 November 2012).
Niknafs, A.A. and Shiri, M.E. (2008) ‘A new restoration-based recommender system for shopping buddy smart carts’, Int. J. of Business Information Systems, Vol. 3, No. 3, pp.284–299.
Oppewal, H. and Louviere, J.J. (2000) ‘Modifying conjoint methods to model managers’ reactions to business environmental trends: an application to modeling retailer reactions to sales trends’, Journal of Business Research, Vol. 50, No. 3, pp.245–257.
Palanivel, K. and Sivakumar, R. (2011) ‘A study on collaborative recommender system using fuzzy-multicriteria approaches’, Int. J. of Business Information Systems, Vol. 7, No. 4, pp.419–439.
Pazzani, M. and Billsus, D. (2007) ‘Content-based recommendation systems’, The Adaptive Web, pp.325–341.
Pazzani, M.J. (1999) ‘A framework for collaborative, content-based and demographic filtering’, Artificial Intelligence Review, Vol. 13, Nos. 5–6, pp.393–408.
Pennock, D., Horvitz, E., Lawrence, S. and Giles, C. (2000) ‘Collaborative filtering by personality diagnosis: a hybrid memory and model-based approach’, Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pp.473–480.
Piramuthua, S., Kapoora, G., Zhoub, W. and Mauwd, S. (2012) ‘Input online review data and related bias in recommender systems’, Decision Support Systems, Vol. 53, No. 3, pp.418–424.
Robin, B. (2002) ‘Hybrid: recommender systems: survey and experiments’, Journal of User Modeling and User-Adapted Interaction, Vol. 12, No. 4, pp.331–370.
Salakhutdinov, R. and Mnih, A. (2008) ‘Probabilistic matrix factorization’, Proc. Advances in Neural Information Processing Systems 20 (NIPS 07), ACM Press, pp.1257–1264.
Salehi, M. and Nakhai Kamalabadi, I. (2012) ‘A hybrid attribute based recommender system for e-learning material recommendation’, IERI Procedia, Vol. 2, pp.565–570.
Salehi, M., Nakhai Kamalabadi, I. and Ghaznavi Ghoushci, M.B. (2012) ‘A new recommendation approach based on implicit attributes of learning material’, IERI Procedia, Vol. 2, pp.571–576.
Sarwar, B., Karypis, G., Konstan, J. and Riedl, J. (2001) ‘Item-based collaborative filtering recommendation algorithms’, Proceedings of the international World Wide Web Conference (WWW’ 10), pp.285–295.
Semeraro, G., Lops, P. and Degemmis, M. (2005) ‘Word net-based user profiles for neighborhood formation in hybrid recommender systems’, Proceedings of the 5th International Conference on Hybrid Intelligent Systems, pp.291–296.
Shih, Y.Y. and Liu, D.R. (2008) ‘Product recommendation approaches: collaborative filtering via customer lifetime value and customer demands’, Expert Systems with Applications, Vol. 35, Nos. 1–2, pp.350–360.
Tao, F., Murtagh, F. and Farid, M. (2003) ‘Weighted association rule mining using in weighted support and significance framework’, Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining, pp.661–666, Washington DC, USA.
Weng, S.S. and Liu, M.J. (2004) ‘Feature-based recommendations for one-to-one marketing’, Expert Systems with Applications, Vol. 26, No. 4, pp.493–508.