2009 Fourth ChinaGrid Annual Conference
A Collaborative Filtering Recommendation Model Using Polynomial Regression Approach
Houkun Zhu, Yuan Luo, Chuliang Weng, Minglu Li Computer Science and Engineering Department Shanghai Jiao Tong University, Shanghai, China, 200240
* This research was supported by Development Plan of the State Key Fundamental Research (973) under Grant 2007CB310900
978-0-7695-3818-1/09 $26.00 © 2009 IEEE DOI 10.1109/ChinaGrid.2009.34
Abstract—In a grid environment, collaborative filtering (CF) can be used for security recommendation when grid users face many grid services of unknown security. CF recommender systems can also be employed in a virtual machine management platform to measure the credibility of each virtual machine. In this study, a polynomial regression based recommendation model is built on top of typical user-based CF to make security recommendations. In the model, a cluster of recommendation algorithms based on polynomial regression is derived by varying the regression order and the dataset size. Our experiments lead to three conclusions. First, algorithms with lower regression orders make better predictions. Second, among the algorithms with a fixed regression order, the best one is generally the one whose dataset size is matched to its regression order. Third, selecting an appropriate regression order and dataset size can enhance recommendation quality.
Keywords-security recommendation; polynomial regression; collaborative filtering
I. INTRODUCTION
Grid refers to systems and applications that integrate and manage resources and services distributed across multiple control domains [9]. With the growing demand for coordinated work and resource sharing, grid technology is increasingly applied, and its security problems are becoming more serious. To safeguard a grid environment, a recommender system can be employed to recommend more secure grid services to grid users by analyzing their history records. The recommender system works by collecting grid users' security ratings for grid services and matching grid users who share the same information requirements. Likewise, on a platform that provides and manages virtual machines, a recommender system can be employed to measure the credibility of each virtual machine. In this scenario, the degrees of satisfaction between virtual machines during their interactions are collected, and then, through cross recommendation, the platform is equipped with the capability of evaluating every virtual machine. Recommender systems can therefore play a significant role in safeguarding grid environments and virtual machine systems.
As an important branch of recommender systems, collaborative filtering recommender systems can be divided into two main categories: user-based and item-based [1], [10]. User-based CF systems generate a prediction by utilizing rating information from similar user profiles. By measuring similarity, a group of users whose tastes are similar to those of the active user (the user for whom the prediction is made) can be found; these users are called neighbors. The recommender system combines the preferences of the neighbors and outputs predictions or recommendations to the active user [3], [5], [8]. This approach is widely deployed in practice.
In item-based CF systems, recommendations are produced by analyzing the similarities among items. Item-based CF originates from the observation that people are willing to buy items similar to those they have already bought. A linear regression-based approach built on typical item-based CF was first put forward in [4]. Instead of directly using the ratings of similar items, this approach uses an approximation of the ratings based on a regression model. In practice, the similarities computed by cosine or correlation measures may be misleading in the sense that two rating vectors may be distant (in the Euclidean sense) yet have very high similarity. The poor predictions generated by item-based CF in such cases can be improved through the linear regression-based approach, since the ratings of similar items are not used directly. In addition, experiments on a movie database showed that the regression-based approach built on item-based CF provides significantly better accuracy and is two orders of magnitude faster than the item-based alternatives that directly use the ratings of similar items [2].
In this study, we propose a new recommendation model using user-based CF and polynomial regression. To characterize the model, two parameters, regression order and dataset size (see 3.1), are introduced. A cluster of recommendation algorithms can thus be derived from different regression orders and dataset sizes. The main process is as follows. First, a group of similar users is selected from the database. For each user in the group, an expert, which is
actually a functional relationship based on polynomial regression between that user and the active user, is created to make an individual prediction (a prediction based on a single neighbor). Second, all the individual predictions are combined according to their weights. To create an expert, a new approach to selecting the samples (the dataset) used to estimate the coefficients of the polynomial model is presented. To measure the similarities between users, two options are given: the similarities used in typical user-based CF, and the mean residual error (MRE) based similarity that we put forward.
The remainder of the paper is organized as follows. Section 2 provides a brief introduction to collaborative filtering: we first formally describe user-based CF and then discuss two widely used similarity measures. Section 3 describes the recommendation model using the polynomial regression approach in detail. Section 4 describes our experimental work, including the evaluation criterion, the methodology, and the results of the experiments together with a discussion. Finally, Section 5 provides concluding remarks and directions for future research.
II. BACKGROUND
In this section, the typical user-based CF approach is introduced briefly. For K users and M items, the user profiles are represented in a K×M user-item matrix R. Each element $r_{k,m}$ is the rating given by user k to item m and satisfies $0 \le r_{k,m} \le v$, where v is a positive integer. If $r_{k,m} = 0$, the rating is unknown, i.e., item m has not been rated by user k; otherwise, item m has been rated by user k. The user-item matrix can be decomposed into row vectors,
$$R = [u_1, \ldots, u_K]^T, \qquad u_k = [r_{k,1}, \ldots, r_{k,M}]^T, \quad k = 1, \ldots, K,$$
corresponding to the K user profiles $u_k$; each row vector represents a particular user's item ratings.
A. Typical User-based CF
User-based CF predicts an active user's interest in an item based on rating information from similar user profiles [1, 3, 7]. The prediction mainly comprises two processes: searching for a set of similar users, named "neighbors", and combining the ratings of the neighbors according to their similarities. Thus, ratings by more similar users contribute more to predicting the item rating. The set of similar users can be identified by employing a correlation threshold or by selecting the top-N users. Consequently, the predicted rating of item m by user k is computed as (see [1, 3, 7])
$$\hat{r}_{k,m} = \bar{r}_k + \frac{\sum_{a \in SU} (r_{a,m} - \bar{r}_a) \times \mathrm{sim}(a,k)}{\sum_{a \in SU} \mathrm{sim}(a,k)} \tag{1}$$
where $r_{a,m}$ denotes the rating of user a on item m, $\bar{r}_a$ is the average rating of user a, SU denotes the neighbors of the active user, and $\mathrm{sim}(a,k)$ denotes the similarity between user a and user k. Two popular methods for measuring the similarity of users are Pearson Correlation and Cosine Similarity. This paper adopts the cosine similarity measure, comparing two user profiles by the cosine of the angle between the corresponding row vectors.
Pearson Correlation: The Pearson Correlation between user k and user a can be calculated by
$$\mathrm{sim}(a,k) = \frac{\sum_{i \in I} (r_{a,i} - \bar{r}_a)(r_{k,i} - \bar{r}_k)}{\sqrt{\sum_{i \in I} (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_{i \in I} (r_{k,i} - \bar{r}_k)^2}} \tag{2}$$
where $r_{a,i}$ and $r_{k,i}$ denote the ratings of user a and user k on item i respectively, $\bar{r}_a$ and $\bar{r}_k$ are the average ratings made by user a and user k respectively, and I denotes the set of items co-rated by user k and user a.
Cosine Similarity: In this case, two users are viewed as two vectors in the M-dimensional item space. The similarity between them is measured by calculating the cosine of the angle between the two vectors:
$$\mathrm{sim}(a,k) = \frac{u_a \cdot u_k}{\|u_a\|\,\|u_k\|} = \frac{\sum_{i \in I} r_{a,i}\, r_{k,i}}{\sqrt{\sum_{i \in I} r_{a,i}^2}\,\sqrt{\sum_{i \in I} r_{k,i}^2}} \tag{3}$$
where $r_{a,i}$ denotes the rating of user a on item i, and $r_{k,i}$ denotes the rating of user k on item i.
III. RECOMMENDATION MODEL USING POLYNOMIAL REGRESSION
A. The Model of Polynomial Regression
$$\begin{cases} y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p + \varepsilon \\ D,\ \text{dataset of sample points} \end{cases} \tag{4}$$
where p is the regression order, $\beta_0, \beta_1, \ldots, \beta_p$ are the regression coefficients, $\varepsilon$ is the error term, and D is the set of sample values used to estimate the regression coefficients. Obviously, when p = 1, the model degenerates into a linear one. Two main techniques are used to determine the parameter vector: one is transforming the polynomial regression model into a multiple linear one, and the other is the least squares method.
B. Recommendation Algorithm Using Polynomial Regression Approach
For the typical user-based CF, the approach mainly depends on the adjusted weighted sum of the neighbors' ratings. If each neighbor's rating on the item to be recommended is equal to or only slightly different from its average rating,
predictions may become worse, because the neighbors make little contribution to the final predictions; that is, the final prediction contains little information about the neighbors. To alleviate the poor predictions generated by directly using the neighbors' ratings, a polynomial regression based recommendation model is proposed in this study. In the model, a cluster of recommendation algorithms based on polynomial regression is derived in terms of the polynomial regression degree and the dataset size selected for estimating the parameters of the algorithms. These algorithms use an approximation of the ratings based on the polynomial regression model instead of directly using the neighbors' ratings. Before stating the main process, we list the symbols involved in the model.
$U$ is the whole user space;
$U_i$ is the set of users having rated item i;
$U \setminus U_i$ is the set of remaining users;
$I_u$ is the set of items user u has rated;
$I_u \cap I_j$ is the set of items co-rated by users u and j.
1) Main Process
The prediction process can be approached by learning a nonlinear mapping
$$\phi_u(U_i): \mathbb{R}^{|U_i|} \to \mathbb{R}$$
and using this mapping to optimally predict the preference of the active user with respect to item i. We use polynomial approximation and forecast combination to approach the mapping. Thus the mapping, which can also be viewed as a function, can be thought of as an adjusted combination of several child mappings:
$$\phi_u(U_i) = \bar{r}_u + \sum_{j \in U_i} w_{u,j} \cdot \left( \varphi(I_u \cap I_j) - \bar{r}_j \right) \tag{5}$$
where $w_{u,j}$ is the corresponding weight. The child mapping
$$\varphi(I_u \cap I_j): \mathbb{R}^{|I_u \cap I_j|} \to \mathbb{R}$$
can be regarded as an expert from which the individual prediction is obtained once the $|I_u \cap I_j|$ rating pairs $\langle r_{j,k}, r_{u,k} \rangle$, $k \in I_u \cap I_j$, and the rating $r_{j,i}$ are input. To implement the child mapping, an approach based on the polynomial regression model of 3.1 is introduced. The algorithm using the polynomial regression approach can thus be described formally as
$$\hat{r}_{u,i} = \bar{r}_u + \sum_{j \in SU_i} w(u,j) \cdot \left( \varphi_{u,j}(r_{j,i}) - \bar{r}_j \right) \tag{6}$$
where $\varphi_{u,j}$ is an expert giving a prediction based on the regression relationship between user u and user j,
$$\varphi_{u,j}(x) = \beta_{u,j,0} + \beta_{u,j,1} x + \beta_{u,j,2} x^2 + \cdots + \beta_{u,j,p} x^p, \qquad p \in \mathbb{N}, \quad \beta_{u,j,0}, \beta_{u,j,1}, \ldots, \beta_{u,j,p} \in \mathbb{R},$$
and $w(u,j)$ is the corresponding weight, which will be explained in the following.
Remark 1. If $\beta_{u,j,1} = 1$ and $\beta_{u,j,0} = \beta_{u,j,2} = \cdots = \beta_{u,j,p} = 0$, the user-based approach using polynomial regression is equivalent to the typical user-based CF approach.
Remark 2. The success of the model mainly depends on the dataset size and the user-item matrix. If the dataset size is too small, the model cannot estimate the regression coefficients. Moreover, if the user-item matrix is too sparse, the recommendation algorithms may also fail.
Remark 3. Since using the whole set $U_i$ costs too much computing resource and may sometimes lead to poor predictions, we define a set $SU_i$ that is a subset of $U_i$. By setting a correlation threshold, $SU_i$ is represented as
$$SU_i = \{\, u' \mid \mathrm{sim}(u,u') \ge t,\ r_{u',i} \ne 0,\ u' \in U_i \,\}, \qquad 0 \le t \le 1,\ t \in \mathbb{R}, \tag{7}$$
where user u is the active user and t is the correlation threshold used for selecting neighbors.
2) Determining the dataset size
Unlike previous research, a new method of selecting the regression dataset (the set of sample values) is presented to reduce the computation incurred by selecting all the points in the rating pairs (see 3.2.1). For an arbitrary expert in a given prediction, the dataset D is obtained from the rating pairs as follows.
a) The sample values (point set) S are selected from the rating pairs $\langle r_{j,k}, r_{u,k} \rangle$, $k \in I_u \cap I_j$, simply by converting the pairs into points and adding the points to the point set. For example, if six rating pairs take only the distinct values (1,2), (2,3) and (3,2), the point set S is {(1,2), (2,3), (3,2)}.
b) For each point in S, the frequency with which the point appears among the pairs is counted.
c) The points are sorted by frequency, with the most frequent points first.
d) The first n points are selected as the sample values. If n > size(S), all the points are selected to estimate the regression coefficients of the recommendation algorithm.
Note: In this study, every expert in an algorithm has a corresponding dataset, and the dataset size of each expert is equal to that of every other expert in the algorithm.
3) Weight strategy
The weight strategies involved in the recommendation are given as follows.
a) Similarity Based Weight: Similarity based weighted combination is widely used in collaborative filtering and can be expressed as
$$w(u,j) = \frac{\mathrm{sim}(u,j)}{\sum_{j \in SU_i} |\mathrm{sim}(u,j)|} \tag{8}$$
b) Mean Residual Error Based Weight: The mean residual error (MRE) based weight is put forward with an eye to the performance of each expert. It is based on the assumption that experts with a smaller MRE perform better. Under this assumption, good experts are given more weight and poor experts less. The formula can be designed as
$$w(u,j) = \frac{|I_u \cap I_j|}{\sum_{k \in I_u \cap I_j} \left| r_{u,k} - \varphi_{u,j}(r_{j,k}) \right|} \tag{9}$$
IV. EXPERIMENTAL EVALUATION
A. Evaluation Criterion
In our experiments, Mean Absolute Error (MAE) is used to evaluate the performance of our algorithms. MAE is one of the most widely used criteria for evaluating the quality of recommender systems. Let $\langle p_i, r_i \rangle$ represent a prediction-rating pair, where $p_i$ is the prediction of a rating and $r_i$ is the actual rating. Over N such pairs, MAE can be expressed as
$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} |p_i - r_i|}{N} \tag{10}$$
B. Notes on the Experiment
Because recommendation for virtual machines or grid systems and recommendation for movies are by and large the same in essence, we use the MovieLens database (see http://movielens.org) as a substitute to simulate the recommendation algorithms in this study. We start our experiments by dividing the data set into a training set and a test set. The training set contains 800 users and the test set contains the remaining users. As to the similarity measure and weight strategy, Cosine Similarity is used to compute the similarity between users, and the selected weight strategy is the similarity based weight. In addition, the two significant parameters mentioned in 3.2, regression order p and dataset size n, are varied in the experiment to study our recommendation model. By varying p from 1 to 3 in steps of 1 and n from 5 to 15 in steps of 5, we obtain 9 specific recommendation algorithms in the model. For each fixed p and n, a specific algorithm (as shown in (12)) can be identified in the model. The algorithms can be symbolized as
$$\{\, RA(p,n) \mid p \in \{1,2,3\},\ n \in \{5,10,15\} \,\} \tag{11}$$
where RA is short for recommendation algorithm, p is the regression order, and n is the dataset size. For instance, RA(1,5) stands for the linear regression based algorithm with dataset size 5.
C. Results and Discussion
1) Coping with overflow
In normal cases, the predictions lie between 1 and 5, but owing to the nature of our regression based recommendation model, they may exceed 5 or fall below 1. We set $\hat{r}_{u,i} = 5$ if $\hat{r}_{u,i} > 5$ and $\hat{r}_{u,i} = 1$ if $\hat{r}_{u,i} < 1$, where $\hat{r}_{u,i}$ is the predicted rating.
2) Experimental conclusions and discussion
In this section, the experimental conclusions are presented in two main parts.
1. For each fixed dataset size, the performances of the recommendation algorithms with different regression orders are compared.
2. For each fixed regression order, the performances of the recommendation algorithms with different dataset sizes are compared.
a) Experiment with regression order
Fig.1 shows that RA(1,5) performs best, RA(2,5) is second, and RA(3,5) gives the worst performance. Fig.2 and Fig.3 show similar trends. The regression order therefore has a significant influence on the performance of the algorithms: to some extent, the larger the polynomial degree, the poorer the prediction made by the polynomial regression.
b) Experiment with dataset size
Fig.4~6 show, for each fixed regression order p, how the performances of RA(p,5), RA(p,10) and RA(p,15) vary as the correlation threshold increases from 0.1 to 0.9. When p=1, RA(1,5) performs best (see Fig.4). When p=2, RA(2,10) makes the best predictions among the algorithms with regression order 2 (see Fig.5). When p=3, RA(3,15) is the best algorithm among RA(3,5), RA(3,10) and RA(3,15). This analysis leads directly to the conclusion that, among the algorithms with a fixed regression order, the best one is generally the one whose dataset size is matched to its regression order (n = 5p in these experiments). Namely, when p is 1, 2 and 3, the best algorithms are RA(1,5), RA(2,10) and RA(3,15), respectively.
c) Experiment with RA(1,5) and typical CF
From Fig.7, when the correlation threshold is around 0.4 (mainly in the interval 0.3~0.5), RA(1,5) makes slightly better recommendations than the typical CF. With a correlation threshold between 0.1 and 0.3, the user-based CF is slightly better than RA(1,5).
Therefore, selecting appropriate values of regression order and dataset size can enhance the recommendation quality. Finally, an unexpected phenomenon appears in our experiments: for nearly all the algorithms in our model, the MAE increases rapidly when the correlation threshold is larger than 0.7. Two key reasons explain this phenomenon. The first is that the neighbors become fewer and fewer as the correlation threshold grows, and the second is that the experts formed by the active user and the neighbors most similar to it have small point sets. Both reasons are closely related to the sparsity of the rating matrix. Therefore, the sparsity problem of recommender systems is becoming an urgent topic in current research.
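The evaluation criterion of Eq. (10), together with the rule that forces out-of-range predictions back onto the 1 to 5 rating scale, can be illustrated with a minimal Python sketch. The function names and the sample numbers are hypothetical and are not taken from the paper's experiments.

```python
def clamp_rating(pred, low=1.0, high=5.0):
    """Overflow rule: force a prediction back into the rating scale [low, high]."""
    return max(low, min(high, pred))


def mean_absolute_error(pairs):
    """Eq. (10): average of |p_i - r_i| over N prediction-rating pairs."""
    if not pairs:
        raise ValueError("at least one prediction-rating pair is required")
    return sum(abs(p - r) for p, r in pairs) / len(pairs)


# Hypothetical raw predictions paired with actual ratings (illustrative only).
raw = [(5.6, 5), (0.4, 2), (3.5, 4)]
clamped = [(clamp_rating(p), r) for p, r in raw]
print(mean_absolute_error(clamped))  # prints 0.5
```

Clamping before computing the error keeps a regression that overshoots the scale from being penalized for the part of the error that no valid rating could ever match.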
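For readers who want to reproduce the typical user-based CF baseline that RA(1,5) is compared against, the following is a rough Python sketch of Eq. (1) with the cosine similarity of Eq. (3) and a correlation threshold t. It assumes that user means are taken over each user's known (non-zero) ratings and that the cosine is computed over co-rated items only; the function names and the small matrix are hypothetical.

```python
import math


def cosine_similarity(u_a, u_k):
    """Eq. (3) over the items co-rated by both users (0 marks an unknown rating)."""
    co_rated = [(x, y) for x, y in zip(u_a, u_k) if x != 0 and y != 0]
    if not co_rated:
        return 0.0
    dot = sum(x * y for x, y in co_rated)
    norm_a = math.sqrt(sum(x * x for x, _ in co_rated))
    norm_k = math.sqrt(sum(y * y for _, y in co_rated))
    return dot / (norm_a * norm_k)


def mean_rating(u):
    """Average over the known (non-zero) ratings of one user profile."""
    known = [x for x in u if x != 0]
    return sum(known) / len(known) if known else 0.0


def predict_user_based(R, k, m, t):
    """Eq. (1): mean rating of user k plus the similarity-weighted, mean-centred
    ratings of the neighbours that rated item m and reach threshold t."""
    num = den = 0.0
    for a, u_a in enumerate(R):
        if a == k or u_a[m] == 0:
            continue  # skip the active user and users who did not rate item m
        s = cosine_similarity(u_a, R[k])
        if s >= t:
            num += (u_a[m] - mean_rating(u_a)) * s
            den += s
    base = mean_rating(R[k])
    return base if den == 0 else base + num / den


# Hypothetical 3-user, 4-item rating matrix; 0 means "not rated".
R = [[5, 3, 0, 1],
     [4, 0, 4, 1],
     [1, 1, 5, 5]]
print(predict_user_based(R, k=0, m=2, t=0.3))
```

When no neighbour passes the threshold, the sketch falls back to the active user's mean rating, which is one possible convention and not something the paper specifies.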
Fig.1: MAE of Algorithms with n=5
Fig.2: MAE of Algorithms with n=10
Fig.3: MAE of Algorithms with n=15
Fig.4: MAE of Algorithms with p=1
Fig.5: MAE of Algorithms with p=2
Fig.6: MAE of Algorithms with p=3
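The algorithms RA(p, n) whose MAE curves are plotted in the figures above combine three steps from Section III: selecting the n most frequent rating points, fitting a polynomial expert of order p by least squares, and combining the experts with the similarity based weights of Eq. (8) as in Eq. (6). The Python sketch below shows one way these steps might fit together. It is an illustration under assumptions: the normal equations are solved by plain Gauss-Jordan elimination, and all function names and sample values are hypothetical.

```python
from collections import Counter


def select_points(rating_pairs, n):
    """Steps a)-d): distinct (r_j, r_u) points, most frequent first, at most n."""
    counts = Counter(rating_pairs)
    return [point for point, _ in counts.most_common(n)]


def fit_polynomial(points, p):
    """Least-squares coefficients beta_0..beta_p via the normal equations,
    solved with Gauss-Jordan elimination and partial pivoting."""
    size = p + 1
    # Augmented matrix [X^T X | X^T y] for the basis 1, x, ..., x^p.
    aug = [[sum(x ** (r + c) for x, _ in points) for c in range(size)]
           + [sum(y * x ** r for x, y in points)] for r in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("too few distinct points for this regression order")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(size):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def expert(beta):
    """Return phi_{u,j}(x) = beta_0 + beta_1 x + ... + beta_p x^p."""
    return lambda x: sum(b * x ** d for d, b in enumerate(beta))


def predict(mean_u, neighbours):
    """Eq. (6) with the weights of Eq. (8).
    neighbours: (sim, phi, r_ji, mean_j) tuples, one per user in SU_i."""
    total = sum(abs(sim) for sim, _, _, _ in neighbours)
    if total == 0:
        return mean_u
    return mean_u + sum(sim / total * (phi(r_ji) - mean_j)
                        for sim, phi, r_ji, mean_j in neighbours)


# Hypothetical co-rated pairs (r_j, r_u) for one neighbour j of the active user u.
pairs = [(1, 2), (2, 3), (1, 2), (3, 2), (2, 3), (1, 2)]
points = select_points(pairs, n=2)   # the two most frequent distinct points
phi = expert(fit_polynomial(points, p=1))
print(predict(3.0, [(0.8, phi, 4, 2.5)]))
```

Passing the identity expert `expert([0, 1])` to `predict` reproduces the mean-centred weighted sum of typical user-based CF, which mirrors Remark 1.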
V. CONCLUSION AND FUTURE WORK
To improve the security of virtual machines and grid systems, CF techniques can be used for security recommendation. In this study, we proposed a polynomial regression based recommendation model in order to make better recommendations. From the model, a cluster of recommendation algorithms is derived by varying the regression order and the dataset size. In addition, we investigated in detail how the performance of the recommendation algorithms changes with these two important parameters of the model. Experiments show that, if an appropriate polynomial degree and dataset size are selected, our algorithms can enhance recommendation quality. With regard to future work, other collaborative filtering methods, including a probabilistic relational model and a maximum entropy model, could be incorporated into our model for better recommendation.
REFERENCES
[1] Gediminas Adomavicius and Alexander Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions", IEEE, 2005.
[2] Slobodan Vucetic and Zoran Obradovic, "A Regression-Based Approach for Scaling-Up Personalized Recommender Systems in E-Commerce", WEBKDD'00, 2000.
[3] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl, "An Algorithmic Framework for Performing Collaborative Filtering", SIGIR'99, 1999.
[4] Badrul Sarwar, George Karypis, Joseph Konstan and John Riedl, "Item-Based Collaborative Filtering Recommendation Algorithms", WWW10, 2001.
[5] J. L. Herlocker, J. A. Konstan, L. G. Terveen and J. T. Riedl, "Evaluating collaborative filtering recommender systems", ACM Transactions on Information Systems, vol. 22, no. 1, 2004.
[6] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews", Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work, 1994.
[7] Jun Wang, Arjen P. de Vries, Marcel J. T. Reinders, "Unifying User-based and Item-based Collaborative Filtering Approaches by Similarity Fusion", SIGIR'06, 2006.
[8] George Karypis, "Evaluation of Item-Based Top-N Recommendation Algorithms", 10th Conference on Information and Knowledge Management, 2001.
[9] I. Foster, C. Kesselman, J. M. Nick, and S. Tuecke, "Grid Services for Distributed System Integration", IEEE Computer, vol. 35, no. 6, June 2002.
[10] J. S. Breese, D. Heckerman, C. Kadie, "Empirical analysis of predictive algorithms for collaborative filtering", 14th Conf. on Uncertainty in Artificial Intelligence, 1998.