Expert Systems with Applications 36 (2009) 6181–6186

An iterative semi-explicit rating method for building collaborative recommender systems

Buhwan Jeong (a), Jaewook Lee (b,*), Hyunbo Cho (b)

(a) Data Mining Team, Daum Communications Corp., 1730-8 Odeung, Jeju 690-150, South Korea
(b) Department of Industrial and Management Engineering, Pohang University of Science and Technology (POSTECH), San 31 Hyoja, Pohang, Kyungbuk 790-784, South Korea

* Corresponding author. E-mail address: [email protected] (J. Lee).

doi:10.1016/j.eswa.2008.07.085

Keywords: Collaborative filtering; Data sparsity; Explicit rating; Recommender system; Semi-explicit rating

Abstract

Collaborative filtering plays the key role in recent recommender systems. It uses a user-item preference matrix rated either explicitly (explicit rating) or implicitly (implicit feedback). Although explicit rating captures user preferences better, it often results in a severely sparse matrix. This paper presents a novel iterative semi-explicit rating method that extrapolates unrated elements in a semi-supervised manner. Each extrapolation is simply an aggregation of neighbor ratings, and iterating the extrapolation yields a dense preference matrix. Preliminary simulation results show that recommendation using the semi-explicit rating data outperforms recommendation using the pure explicit data only. © 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Recommender systems have gained more importance than ever before with the increasing popularity of the Internet and social networking, e.g., electronic commerce, Web 2.0, and web personalization. Over the last decade, they have been among the most successful applications in both academia and industry. Success stories include recommending books and CDs at Amazon.com (Linden, Smith, & York, 2003), movies by MovieLens (Miller, Albert, Lam, Konstan, & Riedl, 2003), news by GroupLens (Konstan et al., 1997) and by MONERS (Lee & Park, 2007), ESL reading lessons (Hsu, 2008), and so forth. Nonetheless, the current state of the art shows that they require further improvements to become more effective and applicable to a broader range of real-life applications: better methods for representing user behavior and the information about the items to be recommended, more advanced recommendation methods that incorporate contextual information and utilize multi-criteria ratings, and less intrusive and more flexible recommendation procedures (Adomavicius & Tuzhilin, 2005). This paper concentrates in particular on capturing user behavior better, i.e., on rating the user preference.

Rating for recommender systems (or collaborative filtering in particular) produces a user-item preference matrix by means of either explicit rating or implicit rating.

In explicit rating, each user examines items and assigns them rating values on a rating scale, while in implicit rating the rating values are presumed from the user's behaviors, such as purchase of the item, access to the information content, time spent reading the content, and actions (e.g., save, print, delete) applied to the content. It is reported that explicit rating captures user preferences more accurately than implicit rating does (Nichols, 1998). However, the latent problem of explicit rating, data sparsity (usually more severe than that of implicit rating), makes it hard to manipulate the rating matrix, i.e., to recommend items to an active user, in a pragmatic sense.

This paper proposes a novel rating method, namely semi-explicit rating (SER), to overcome the sparsity problem. The proposed method extrapolates the rating scores of unrated elements on the principle of semi-supervised learning (Jeong, Lee, Cho, & Lee, 2008; Lee & Lee, 2005, 2006, 2007), in that the many unlabeled/unrated elements are estimated by mathematically manipulating the few labeled/rated elements. To enhance recommendation accuracy, the proposed method iteratively updates the user-item preference matrix until it becomes stable.

The remainder of the paper is organized as follows: Section 2 addresses previous work on recommender systems, especially collaborative filtering. Section 3 presents the details of the proposed method, followed by preliminary validations via numerical experiments in Section 4. Finally, concluding remarks and future work are given in Section 5.

2. Related works

Due to the massive diversity of algorithms and applications, this section briefly reviews only the key research branches of recommender systems and collaborative filtering relevant to this paper.


For more comprehensive reviews and comparisons, see Adomavicius and Tuzhilin (2005), Deshpande and Karypis (2004), and Candillier, Meyer, and Boullé (2007).

The recommendation problem is to maximize an active user's satisfaction by suggesting him/her a set of items from many. According to the definition by Adomavicius and Tuzhilin (2005), user satisfaction can be formulated as a utility function u that measures the usefulness of an item g to a user c, i.e., u: C × G → R, where C is the set of all users, G is the set of all possible items that can be recommended, and R is a totally ordered set of nonnegative real numbers within a certain range. Note that the sizes of both C and G can be very large, up to more than millions in some cases. Then, for each user c ∈ C, the objective is to choose the item g'_c ∈ G that maximizes the user's utility; more formally, $\forall c \in C,\; g'_c = \arg\max_{g \in G} u(c, g)$.

Recommender systems are commonly classified into three types according to how recommendations are made: content-based recommendations, in which the user is recommended items similar to the ones he/she preferred in the past; collaborative recommendations, in which the user is recommended items that people with similar preferences liked in the past; and hybrid approaches, in which collaborative and content-based recommendations are mixed. First, content-based methods utilize user profiles that contain information about users' tastes, preferences, and needs, together with item profiles, i.e., sets of attributes characterizing an item g. Techniques from information retrieval and text mining, such as the vector space model and term frequency/inverse document frequency (TF-IDF), are used for these recommender systems. Second, collaborative methods (or collaborative filtering) predict the utility of items for a particular user based on the items previously rated by other users; the underlying assumption is that similar users have similar preferences. A user-item rating matrix R ∈ ℝ^{|C|×|G|} is constructed for collaborative filtering systems. According to Breese, Heckerman, and Kadie (1998), algorithms of this type can be classified into memory-based and model-based ones. Memory-based algorithms, whose mathematical details are provided in the next section, estimate the unknown rating r_cg for user c and item g as an aggregate of the ratings of some other users for the same item g. Model-based algorithms, on the other hand, train a classifier from the collection of ratings and then predict future ratings. Finally, hybrid methods are an integration of collaborative and content-based methods that avoids the limitations of each. See Adomavicius and Tuzhilin (2005) for a detailed survey and exemplary recommender systems.

Collaborative methods can be further categorized into user-based and item-based approaches according to the search order. The user-based approach, more popular at present, first finds a small group of users having similar preferences (i.e., the nearest neighbors of the active user) and then suggests the items the group commonly shares (e.g., purchases, accesses, reads). Despite its popularity, the user-based approach has some problems in practice: data sparsity, scalability, and real-time performance (Grcar, Mladenic, Fortuna, & Grobelnik, 2006; Herlocker, Konstan, Terveen, & Riedl, 2004; Sarwar, Karypis, Konstan, & Riedl, 2001). The more recent item-based approach, on the other hand, directly looks for a set of items similar to an active item.
It roughly consists of measuring the similarity between items and then predicting a recommendation item. The item similarity is often computed in terms of cosine, correlation, or conditional probability, as with the user similarity, whereas the prediction employs a weighted sum or regression (Herlocker et al., 2004; Lee, Jun, Lee, & Kim, 2005; Sarwar et al., 2001).

One of the most important issues in collaborative filtering, with regard to recommendation accuracy, is how to prepare the user-item preference matrix. The matrix can be filled either explicitly or implicitly, and hybrid rating is also possible.

The explicit rating constructs the user-item matrix from users' explicit rating scores on a certain rating scale, so it can exactly express users' tastes and preferences. However, it has some crucial weaknesses: ambiguity in the use of appropriate scales, difficulty in motivating and incentivizing evaluators, detecting biased and malicious evaluators, and achieving a critical mass of users to avoid data sparsity (Nichols, 1998). Users tend to rate an item more frequently if they feel it is good, and not to rate it otherwise. The implicit rating, on the other hand, constructs the user-item matrix by observing users' behaviors: whether an action (e.g., purchase, access, save, print, reply) is performed on the item, how long they spend reading the item, how many times they have browsed the item, and so on (Lee et al., 2005; Nichols, 1998). The resulting matrix is usually less sparse, but the scores are presumed rather than stated, and thereby less informative. The explicit rating provides a better user-item matrix for plausible predictions about the interests of a user, provided that every user is even-handed, rational, unbiased, and correct.

The focus of this paper is to overcome the data sparsity problem in the user-item matrix. Widely used ways to deal with this problem are dimension reduction techniques, ranging from a naïve method that selects only relevant users and/or items (e.g., eliminating sparse rows/columns from the user-item matrix) to more sophisticated methods based on linear algebra and statistical analysis, such as singular value decomposition (SVD, also known as LSA/LSI (latent semantic analysis/indexing) in many applications) and principal components analysis (PCA) (Grcar et al., 2006). These dimension reduction techniques not only mitigate the data sparsity and scalability problems, but also improve recommendation accuracy. In addition, item-based collaborative filtering is known to be very effective in dealing with such sparse data (Grcar et al., 2006; Sarwar et al., 2001). Other approaches include horting, clustering, and Bayesian networks (Grcar et al., 2006). Nonetheless, the original matrix still remains sparse.

3. Semi-explicit rating and recommendation prediction

This section presents a novel extrapolation method, namely semi-explicit rating (SER), that estimates the unrated elements in the user-item preference matrix. The method is based on the semi-supervised learning principle, in that the many unrated elements are filled by numerical inference from the few (sparse) explicit ratings.

3.1. Basic idea to extrapolate unrated elements

The user-item preference matrix R (= [r_ij]) ∈ ℝ^{N×M} contains N users' preferences for M items, i.e., an element r_ij represents user i's rating of item j, as shown in Fig. 1. To extrapolate an unrated element r_ij, we employ the memory-based approach, which infers the rating from neighbor users' ratings r_lj by a formulation r_ij = f(r_lj, Sim^U(i, l)), where l (≠ i, ≤ N) indexes the users who rated the active item j, f(·) is an aggregation function, and Sim^U(i, l) is the similarity between users i and l. Some examples of the aggregation function are:

$r_{ij} = \kappa_c \sum_{l} \mathrm{Sim}^U(i, l) \cdot r_{lj}$  (1)

$r_{ij} = \bar{r}_{i\cdot} + \kappa_c \sum_{l} \mathrm{Sim}^U(i, l) \cdot (r_{lj} - \bar{r}_{i\cdot})$  (2)

where the multiplier κ_c serves as a normalizing factor, usually selected as $\kappa_c = 1 / \sum_{l} |\mathrm{Sim}^U(i, l)|$, and $\bar{r}_{i\cdot}$ in (2) is the average rating of the active user i. Eq. (1) is the most common aggregation function, with the similarity measure Sim^U(i, l) used as a weight, but it has a shortcoming in that different users may use different rating scales (Adomavicius & Tuzhilin, 2005). The aggregation function in Eq. (2) overcomes this by using deviations from the average rating of the corresponding user (Resnick, Iacovou, Suchak, Bergstorm, & Riedl, 1994).

An important factor in the equations above is how to measure the similarity Sim^U(i, l) between users. Popular similarity measures, based on the ratings of the items that both users have rated, include correlation-based, cosine-based, adjusted cosine-based, and conditional probability-based measures. First, the correlation-based measure is defined in terms of the Pearson correlation coefficient, which evaluates the degree of linear relationship between two users (Grcar et al., 2006). Second, the cosine-based measure treats the two users as M-dimensional vectors and computes the cosine of the angle between them. Third, the adjusted cosine-based measure incorporates the difference in rating scale between different users (Sarwar et al., 2001). Finally, the conditional probability-based measure simply uses the ratio of common items the two users share (Deshpande & Karypis, 2004).

Fig. 1. Notation and extrapolation of an unrated element.
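For concreteness, a minimal Python sketch of this memory-based extrapolation follows. It is our own illustration, not the authors' code: it assumes the preference matrix is a NumPy array with 0 marking unrated entries, uses the cosine user similarity described above, and implements the mean-offset aggregation of Eq. (2); all function names are hypothetical.

```python
import numpy as np

def cosine_user_sim(R, i, l):
    """Cosine similarity Sim^U(i, l) over the items both users have rated (0 = unrated)."""
    mask = (R[i] > 0) & (R[l] > 0)
    if not mask.any():
        return 0.0
    u, v = R[i, mask], R[l, mask]
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def extrapolate_item_perspective(R, i, j):
    """Eq. (2): estimate r_ij from neighbor users' ratings of item j (column-oriented)."""
    rated_by_i = R[i][R[i] > 0]
    r_bar_i = rated_by_i.mean() if rated_by_i.size else 0.0  # average rating of active user i
    num = denom = 0.0
    for l in range(R.shape[0]):
        if l == i or R[l, j] == 0:        # only users l (!= i) who rated item j
            continue
        s = cosine_user_sim(R, i, l)
        num += s * (R[l, j] - r_bar_i)
        denom += abs(s)                   # kappa_c = 1 / sum_l |Sim^U(i, l)|
    return r_bar_i + num / denom if denom > 0 else r_bar_i

# Toy usage: 4 users x 3 items, 0 = unrated.
R = np.array([[5, 0, 3], [4, 2, 3], [1, 5, 2], [4, 1, 3]], dtype=float)
print(extrapolate_item_perspective(R, 0, 1))  # extrapolated rating of user 0 for item 1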

3.2. Various views of rating extrapolation

As described above, the memory-based collaborative method computes an unknown rating r_ij of item j for the active user i by solving either Eq. (1) or Eq. (2). That is, the solution r_ij is a weighted average of the ratings r_lj by the neighbors who have already rated the item, where the weight is the user similarity Sim^U(i, l) between the active user and his/her neighbors, measured over the portion of common items they have rated. This can be viewed as item-perspective extrapolation because the estimated rating is an aggregation of the ratings of the item; since the item is expressed as a column vector, it is also called column-oriented computation, as shown in Fig. 2A. Another insight is that the rating r_ij may be written as a weighted aggregation of the ratings the active user previously gave, which sets the unknown rating to a default value for that user. In this case, the weight in the equations above is replaced with the item similarity Sim^I(j, k), with the same computation, as follows:

$r_{ij} = \kappa_g \sum_{k} \mathrm{Sim}^I(j, k) \cdot r_{ik}$  (3)

$r_{ij} = \bar{r}_{\cdot j} + \kappa_g \sum_{k} \mathrm{Sim}^I(j, k) \cdot (r_{ik} - \bar{r}_{\cdot j})$  (4)

where $\bar{r}_{\cdot j}$ is the average rating of the active item j, and $\kappa_g = 1 / \sum_{k} |\mathrm{Sim}^I(j, k)|$ is the normalizing factor. Alternatively, we may use κ_g = 1/K_i, where K_i is the number of items rated by user i, to penalize unrated elements. This approach is illustrated in Fig. 2B and named user-/row-perspective extrapolation.

Fig. 2. Item-/column-perspective and user-/row-perspective extrapolation.

Our proposed method combines the item-perspective and user-perspective extrapolations, as depicted in Fig. 3. Specifically, we take a weighted average as in Eq. (5). Since the user-perspective extrapolation r^U_ij represents the default value for a user, we can also modify Eq. (2) into Eq. (6) as follows:

$r_{ij} = \alpha\, r^U_{ij} + (1 - \alpha)\, r^I_{ij}$  (5)

$r_{ij} = r^U_{ij} + \kappa_c \sum_{l} \mathrm{Sim}^U(i, l) \cdot (r_{lj} - r^U_{ij})$  (6)

where r^U_ij and r^I_ij denote the estimates of r_ij by the user- and item-perspective extrapolations, respectively, and α ≤ 1 is the weight control parameter.

Fig. 3. Mixture of item- and user-perspective extrapolations.
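The user-perspective counterpart and the mixture of Eq. (5) can be sketched analogously (again our own hedged illustration under the same 0-means-unrated convention; helper names are hypothetical):

```python
import numpy as np

def cosine_item_sim(R, j, k):
    """Cosine similarity Sim^I(j, k) over the users who rated both items (0 = unrated)."""
    mask = (R[:, j] > 0) & (R[:, k] > 0)
    if not mask.any():
        return 0.0
    u, v = R[mask, j], R[mask, k]
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def extrapolate_user_perspective(R, i, j):
    """Eq. (4): estimate r_ij from the active user's ratings of similar items (row-oriented)."""
    raters_of_j = R[:, j][R[:, j] > 0]
    r_bar_j = raters_of_j.mean() if raters_of_j.size else 0.0  # average rating of active item j
    num = denom = 0.0
    for k in range(R.shape[1]):
        if k == j or R[i, k] == 0:        # only items k (!= j) that user i rated
            continue
        s = cosine_item_sim(R, j, k)
        num += s * (R[i, k] - r_bar_j)
        denom += abs(s)                   # kappa_g = 1 / sum_k |Sim^I(j, k)|
    return r_bar_j + num / denom if denom > 0 else r_bar_j

def combine(r_user, r_item, alpha=0.5):
    """Eq. (5): weighted mixture of user- and item-perspective estimates (alpha <= 1)."""
    return alpha * r_user + (1 - alpha) * r_item
```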

3.3. Graphical representation of extrapolation

Before reaching the final stage of the proposed method, we mention another intuition about the extrapolation from a graphical view. The extrapolation of an unrated element can be represented by the graphical model depicted in Fig. 4; abstractly, the extrapolation is expressed in a typical sum-product form. A full graphical treatment of the rating extrapolation remains a topic for further study.

Fig. 4. Graphical representation of rating extrapolation.

3.4. Iterative procedure of semi-explicit rating

The unrated elements filled by the extrapolation method described above are neither complete nor stable: as unrated elements are filled with new extrapolated values, both the user similarity Sim^U(i, l) and the item similarity Sim^I(j, k) change and so must be adjusted. This makes completing a user-item preference matrix a complex process, because (i) every extrapolation requires its own costly similarity computations, and (ii) the extrapolation procedure is recursive, i.e., later extrapolations affect earlier extrapolation results. To facilitate this process, we envision an iterative semi-explicit rating procedure, in which one iteration fills all the unrated elements using the initial similarity matrices, and the next iteration starts with re-computed similarity matrices. The procedure terminates when the user-item matrix becomes stable. Note that periodic updates are required to incorporate new users, items, and explicit ratings. The iterative extrapolation procedure is detailed below:

Step 0: Given a user-item preference matrix, compute both the user similarity matrix S^c ∈ ℝ^{N×N} and the item similarity matrix S^g ∈ ℝ^{M×M}.

Step 1: Extrapolate the user-item preference matrix using the similarity matrices and the explicit ratings. Note that the ratings the users explicitly gave never change.

Step 2: Check the stability, i.e., the difference between the previous preference matrix and the resulting matrix. Terminate the procedure if the matrix is stable; otherwise, go to Step 1 with newly computed user and item similarity matrices.
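The loop can be sketched in a few lines of Python (a minimal illustration, not the authors' implementation; it reuses the hypothetical extrapolation helpers from the earlier snippets, assumes at least one unrated entry, and uses the mean absolute change of the extrapolated entries as the stability test):

```python
import numpy as np

def semi_explicit_rating(R, alpha=0.5, tol=1e-3, max_iter=10):
    """Iterative SER sketch: fill all unrated (0) entries, then repeat with the
    similarities re-computed from the updated matrix, until stable."""
    explicit = R > 0                       # user-given ratings: never overwritten
    R_cur = R.astype(float).copy()
    for _ in range(max_iter):
        R_next = R_cur.copy()
        for i in range(R.shape[0]):
            for j in range(R.shape[1]):
                if explicit[i, j]:
                    continue               # Step 1: explicit ratings never change
                r_u = extrapolate_user_perspective(R_cur, i, j)  # Eqs. (3)-(4)
                r_i = extrapolate_item_perspective(R_cur, i, j)  # Eqs. (1)-(2)
                R_next[i, j] = combine(r_u, r_i, alpha)          # Eq. (5)
        # Step 2: stability check on the extrapolated entries only
        if np.abs(R_next - R_cur)[~explicit].mean() < tol:
            return R_next
        R_cur = R_next                     # next iteration sees updated similarities
    return R_cur
```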

3.5. Recommendation

It is now straightforward to suggest recommendations (either a single item or the top-N items) to the active user using the preference matrix generated by the proposed method. Since the proposed method extends the memory-based approach, we simply select the N items with the highest ratings (excluding those the user already has). Alternatively, we may build a classifier (or a prediction model) from the rating matrix, as the model-based approach does; item-based approaches are also applicable. The advantages of the proposed method are that we not only cope with the data sparsity problem of the explicit rating, but also gain scalability and real-time performance: the final rating matrix has no null element, i.e., it is dense, and since it is constructed off-line, real-time applications incur no extra computation beyond selecting recommendations.
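As a small illustration of this selection step (our own sketch, not the paper's code; R_dense stands for the SER-completed matrix and explicit_mask for the user-given entries, both hypothetical names):

```python
import numpy as np

def top_n_items(R_dense, explicit_mask, user, n=5):
    """Indices of the n highest-rated items for `user`, excluding
    items the user has already rated explicitly."""
    scores = R_dense[user].astype(float).copy()
    scores[explicit_mask[user]] = -np.inf   # never re-recommend known items
    return np.argsort(scores)[::-1][:n]     # descending by predicted rating
```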

4. Preliminary experiments

4.1. Simulation setting

Preliminary simulations were conducted to validate the underpinning concept of the proposed method. The simulations are limited in scope because they are intended only to show the validity of the method. The dataset used is the MovieLens (ML) data, which contain 100,000 explicit ratings (on a 1-5 rating scale) from 943 users on 1682 items (Sarwar et al., 2001). Note that the ML data are very sparse: the sparsity level, defined as $1 - \frac{\text{nonzero entries}}{\text{total entries}}$, is about 93.7% (i.e., $1 - \frac{100{,}000}{943 \times 1682}$).

For the performance evaluation metric, we use the mean absolute error (MAE) between ratings and predictions. The MAE is a widely used metric that measures the deviation of recommendations from their true user-specified values. For the pairs of true rating and prediction $(t_v, p_v)$, the MAE is defined as $\mathrm{MAE} = \frac{1}{V} \sum_{v=1}^{V} |t_v - p_v|$, where V is the total number of rating-prediction pairs. The lower the MAE, the more accurately the recommender system predicts user ratings. We also use the correlation coefficient as a supplementary metric.

In this experiment, we use the cosine similarity, the user-based recommendation algorithm (Eq. (1)), and the modified weighted aggregation function (Eq. (6)). For validation, we resort to 10-fold cross validation. Many of the other simulation parameters (e.g., training/test data ratio = 0.8) follow the pre-test results in Sarwar et al. (2001) unless specified otherwise. Recall that the simulation aims to show the validity of the rating method, not to compare the performance of recommendation algorithms, even though these are critical in a recommender system. For this reason, we feel free to choose specific simulation settings.
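For concreteness, both evaluation quantities are one-liners (a sketch with made-up toy numbers, not the paper's data):

```python
import numpy as np

def mae(true_ratings, predictions):
    """Mean absolute error over corresponding rating-prediction pairs."""
    t, p = np.asarray(true_ratings, float), np.asarray(predictions, float)
    return float(np.abs(t - p).mean())

def sparsity(R):
    """Fraction of unrated (zero) entries in a preference matrix."""
    return 1.0 - np.count_nonzero(R) / R.size

print(mae([4, 3, 5], [3.6, 3.2, 4.1]))  # 0.5
```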




4.2. Experiment results using ML data

The simulation results are depicted in Fig. 5. First, we measured the rating stability by means of the mean difference of the unrated elements. The mean differences at the second and third iterations are very high because the similarities between users and between items are computed with updated ratings; after that, the proposed method gradually converges, as shown in Fig. 5A. The second observation is that recommendation using semi-explicit ratings performs better, in terms of the recommendation accuracy MAE, than recommendation using explicit ratings only. In Fig. 5B, iteration 0 indicates the MAE of recommendation by the memory-based recommendation algorithm (Eq. (1)) using the pure explicit data. Except at iteration 1, when the rating matrix is very unstable, the recommendations using the SER data (MAE = 0.7398) are better than those using the explicit data only (iteration 0, MAE = 0.8089). Note that the best MAE using the cosine similarity in Sarwar et al. (2001) was over 0.83 under a similar simulation environment, e.g., the same ML data set and training/test data ratio. This does not imply the superiority of the user-based recommendation (as used in our simulation) over the item-based one, but rather shows the baseline performance of the cosine similarity and the extra improvement obtained from using the SER data. In addition, the correlation coefficient also increases, from 0.4393 (iteration 0) to 0.5593 (iteration 7), although the extrapolation takes considerable time owing to the numerous unrated elements. In particular, the computation time per iteration is around 7 s (on an IBM-compatible PC with a 2.4 GHz CPU and 1 GB of RAM, implemented in Matlab). This does not pose a problem, however, since the extrapolation prepares better rating data off-line and is not part of on-line recommendation prediction. The slight fluctuation at the stable tail (iterations 3-7) is negligible, as each point averages the 10-fold validations.

Fig. 5. (A) Rating stability and (B) recommendation accuracy.

We conducted additional simulations to show the effect of the mixture ratio between the item-perspective and user-perspective extrapolations in Eq. (5). The aggregation weight α is varied from 0 (user-perspective) to 1 (item-perspective) in steps of 0.1. As shown in Fig. 6, the best accuracy, MAE = 0.7355, is achieved near α = 0.5. We omit redundant results from other simulation settings, e.g., the use of different similarity measures, as they give similar results.

Fig. 6. Sensitivity of recommendation accuracy according to α.

4.3. Additional experiments using BX data

The Book-Crossing (BX) data set, collected from the Book-Crossing community, contains 1,149,780 ratings (433,681 explicit ratings on a 1-10 scale and 716,109 implicit ratings) by 278,858 users on 271,379 books (Ziegler, McNee, Konstan, & Lausen, 2005).


Due to the unwieldy size of the full data set, we picked out two smaller data sets, BX1 and BX2, which contain 19,886 and 40,620 explicit ratings, respectively. We randomly selected users who rated more than 15 books and items rated by at least three users. The sizes of the BX1 and BX2 matrices are 1441 × 616 and 1880 × 2142, respectively; accordingly, their sparsity levels are 97.76% and 98.99%. Note that we re-scaled the ratings to the 1-5 scale. In addition, we used Eq. (5) with α = 0.5 only, as it gave the best performance in the previous simulations.

The simulation results are summarized in Table 1, in which 'Initial (ER)' denotes the recommendations using the explicit rating data and 'Final (SER)' those using the stable SER data. As with the ML data, the BX1/BX2 SER matrices became stable after iteration 4, and the procedure was terminated at iteration 7. Similar to the results for the ML data, the SER method improved the recommendation accuracy from approximately 0.60 to 0.46 in MAE and from 0.28 to 0.60 in correlation. However, the method is observed to be very sensitive to the size of the rating matrix in terms of extrapolation time.

4.4. Discussion

During the simulations, we recognized an averaging effect, by which every unrated element tends to be filled with a value around the average rating of the user and/or the item. Moreover, the large sparsity, i.e., the great number of unrated elements, demands much computation time, even though the computation is performed off-line. An initial naïve implementation produced the results in more than 1 h (for the ML data and a total of seven iterations), but after optimizing the process we could obtain them in minutes. To expedite it further, we may design a hybrid rating method in which the proposed method is applied only to elements having corresponding implicit ratings, i.e., the situation in which a user has purchased/accessed an item but not rated it; in this case, it is rational to rate such items with a default rating value. However, experiments using the BX1/BX2 data showed that this approach is not as effective as expected. In addition, although the simulations clearly show that the proposed method is effective on the cleansed ML/BX data (sparsity = 94-99%), the method may fail to obtain a robust rating matrix for extremely sparse data. Remarkably, in our simulations the sparser BX data achieved the better improvements.



Table 1
Brief descriptions of the ML/BX1/BX2 data sets and summary of simulation results

                                        ML           BX1          BX2
No. of explicit ratings                 100,000      19,886       40,620
Matrix size (users × items)             943 × 1682   1441 × 616   1880 × 2142
Sparsity (%)                            93.69        97.76        98.99
MAE, Initial (ER)                       0.8089       0.5830       0.6019
MAE, Final (SER)                        0.7398       0.4645       0.4573
Correlation coefficient, Initial (ER)   0.4393       0.2846       0.2645
Correlation coefficient, Final (SER)    0.5593       0.6101       0.5840
Computation time per iteration (s)      7            3            25

5. Conclusion

Recommender systems, and collaborative filtering in particular, have become omnipresent in various applications such as product recommendation, spam filtering, and web personalization. As the amount of information content grows, the importance of accurate recommender systems increases, and the availability of correct user-item preference matrices is critical to building a better system. The explicit rating method usually gives a better preference matrix than the implicit rating method does; however, the preference matrix produced by explicit rating is often much sparser.

We have proposed a generative rating method, namely iterative semi-explicit rating (SER), that extrapolates unrated elements from neighbor ratings. The underlying computation of the extrapolation is the same as that of the memory-based algorithm. By visiting all the unrated elements and iteratively extrapolating them, we finally construct a full preference matrix. The preliminary simulations show that recommendation using the data constructed by the proposed method outperforms recommendation using the pure explicit data only.

To validate the proposed method, we have experimented with relatively small data sets only; exhaustive experiments with much larger data from real applications need further investigation. We also expect diverse further studies on the semi-supervised rating approach. For example, instead of using all the related ratings r_{i·} and r_{·j}, we may extrapolate using partial ratings from a group of similar users and items obtained by co-clustering (e.g., Araujo et al., 2006).

Acknowledgements

Thanks to Shyong Lam and Jon Herlocker for cleaning up and generating the MovieLens (ML) data set, and to Cai-Nicolas Ziegler and Ron Hornbaker for the Book-Crossing (BX) data set. This work was supported partially by the Korea Research Foundation under Grant No. KRF-2008-314-D00483 and partially by the KOSEF under Grant No. R01-2007-000-20792-0.

References

Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749.


Araujo, R., Trielli, G., Orair, G., Meira, W., Jr., Ferreira, R., & Guedes, D. (2006). ParTriCluster: A scalable parallel algorithm for gene expression analysis. In Proceedings of the 18th international symposium on computer architecture and high performance computing (SBAC-PAD'06) (pp. 3–10).
Breese, J. S., Heckerman, D., & Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th conference on uncertainty in artificial intelligence (UAI-98) (pp. 43–52).
Candillier, L., Meyer, F., & Boullé, M. (2007). Comparing state-of-the-art collaborative filtering systems. Machine Learning and Data Mining in Pattern Recognition, LNCS 4571, 548–562.
Deshpande, M., & Karypis, G. (2004). Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1), 143–177.
Grcar, M., Mladenic, D., Fortuna, B., & Grobelnik, M. (2006). Data sparsity issues in the collaborative filtering framework. Advances in Web Mining and Web Usage Analysis, LNAI 4198, 58–76.
Herlocker, J. L., Konstan, J. A., Terveen, L. G., & Riedl, J. T. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1), 5–53.
Hsu, M.-H. (2008). A personalized English learning recommender system for ESL students. Expert Systems with Applications, 34(1), 683–688.
Jeong, B., Lee, D., Cho, H., & Lee, J. (2008). A novel method for measuring semantic similarity for XML matching. Expert Systems with Applications, 34(3), 1651–1658.
Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L. R., & Riedl, J. (1997). GroupLens: Applying collaborative filtering to Usenet news. Communications of the ACM, 40(3), 77–87.
Lee, J.-S., Jun, C.-H., Lee, J., & Kim, S. (2005). Classification-based collaborative filtering using market basket data. Expert Systems with Applications, 29(3), 700–704.
Lee, J., & Lee, D. (2005). An improved cluster labeling method for support vector clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3), 461–464.
Lee, J., & Lee, D. (2006). Dynamic characterization of cluster structures for robust and inductive support vector clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1869–1874.
Lee, D., & Lee, J. (2007). Equilibrium-based support vector machine for semisupervised classification. IEEE Transactions on Neural Networks, 18(2), 578–583.
Lee, H., & Park, S. (2007). MONERS: A news recommender for the mobile web. Expert Systems with Applications, 32(1), 143–150.
Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 76–80.
Miller, B., Albert, I., Lam, S., Konstan, J., & Riedl, J. (2003). MovieLens unplugged: Experiences with an occasionally connected recommender system on four mobile devices. In Proceedings of the 17th annual human–computer interaction conference (HCI 2003) (pp. 263–266).
Nichols, D. (1998). Implicit rating and filtering. In Proceedings of the fifth DELOS workshop on filtering and collaborative filtering (pp. 31–36).
Resnick, P., Iacovou, N., Suchak, M., Bergstorm, P., & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM conference on computer supported cooperative work (pp. 175–186).
Sarwar, B. M., Karypis, G., Konstan, J. A., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web (pp. 285–295).
Ziegler, C.-N., McNee, S. M., Konstan, J. A., & Lausen, G. (2005). Improving recommendation lists through topic diversification. In Proceedings of the 14th international World Wide Web conference (WWW'05) (pp. 22–32).
