Recommender systems based on ranking performance optimization

Front. Comput. Sci., 2016, 10(2): 270–280 DOI 10.1007/s11704-015-4584-1

Richong ZHANG, Han BAO, Hailong SUN, Yanghao WANG, Xudong LIU

State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2015

Abstract  The rapid development of online services and information overload have inspired the fast development of recommender systems, among which collaborative filtering algorithms and model-based recommendation approaches are widely exploited. For instance, matrix factorization (MF) has demonstrated successful achievements and advantages in assisting Internet users in finding information of interest. These existing models focus on predicting users' ratings on unknown items, and performance is usually evaluated by the root mean square error (RMSE). However, good performance in terms of RMSE does not always guarantee good ranking performance. Therefore, in this paper, we advocate treating recommendation as a ranking problem. Normalized discounted cumulative gain (NDCG) is chosen as the optimization target for evaluating ranking accuracy. Specifically, we present three ranking-oriented recommender algorithms: NSMF, AdaMF and AdaNSMF. NSMF builds an NDCG-approximated loss function for matrix factorization. AdaMF adaptively combines component MF recommenders with a boosting method. To combine the advantages of both algorithms, we propose AdaNSMF, a hybrid of NSMF and AdaMF, and show its superiority in both ranking accuracy and model generalization. In addition, we compare our proposed approaches with state-of-the-art recommendation algorithms. The comparison studies confirm the advantage of our proposed approaches.

Keywords  recommender system, matrix factorization, learning to rank

Received December 24, 2014; accepted April 15, 2015

E-mail: [email protected]

1 Introduction

Recently, recommender systems have been exploited in various domains to assist users in discovering items of potential interest. Most existing recommendation approaches focus on predicting missing ratings of specific users or user groups on unobserved items. They are typically deployed in e-commerce stores or Internet-delivered promotional and marketing lists, and recommend products and services to serve the growing needs of today's consumers. Collaborative filtering approaches [1–3] work well for recommending ranked products by making use of users' profiles and purchasing histories. In reality, most items have been rated by only a small fraction of the users. As a consequence, when the observed ratings are organized in a rating matrix, a large fraction of the matrix entries are missing. The objective of collaborative filtering in this setting is to predict these missing entries based on the observed ratings. From the model-building perspective, matrix factorization (MF) [4], also known as the latent factor model, is one of the most influential models in collaborative filtering. In general, MF assumes a latent space of fixed dimension, in which each user is associated with a user feature vector and each item with an item feature vector. The rating given by a user to an item is then modelled as the inner product of the two vectors. Numerous MF-based approaches have been proposed to improve recommendation accuracy. The advantages of the MF-based models include their


well-principled formulation and their ability to reduce overfitting by introducing regularization terms. These models, however, treat every rating as equally important. In practice, not only do users' feature distributions vary, but so does the number of ratings each user provides. The models are therefore not specifically designed to capture the differences between users, and they assume no dependency between model performance and the observed ratings. Moreover, with the introduction of regularization terms, the parameters controlling the extent of regularization are usually not easy to set.

From the perspective of performance evaluation, root mean square error (RMSE) is usually chosen not only as the objective function to be minimized but also as the metric used to judge system performance. In practice, however, the top-K precision may reflect the performance of recommender systems more accurately. The uncertainty of prediction in rating-oriented recommender systems introduces instability into the RMSE-based loss function. Moreover, other factors, such as the position information that determines ranking quality, are not taken into account. Cremonesi et al. showed that a lower RMSE does not always guarantee a better ranking result [5]. Consequently, optimizing the relevance of the ranking list, rather than pursuing better rating prediction alone, can offer a better user experience.

To focus on the ranking performance of learning models, researchers have investigated building learning models around ranking measures. For instance, learning to rank (LTR) is one of the most commonly used machine learning techniques for solving ranking problems, predicting the ranked list for unobserved data. LTR is categorized into point-wise, pair-wise and list-wise approaches by Liu et al. [6]. Owing to the scalability limitation of pair-wise approaches (which consider all rating pairs of each user) and the calibration problem [7] of point-wise methods, we focus in this study on the list-wise approach, which directly optimizes the ranking performance metric of the generated permutation. A few recommender systems [8, 9] already take ranking performance into account when building their objective functions. However, the overfitting problem of these algorithms is not well solved, and their performance depends heavily on the chosen parameters. To overcome these limitations and combine the advantages of LTR and MF, in this paper we propose LTR- and MF-based recommendation models that incorporate the ranking performance into the optimization process. Specifically, we choose normalized


discounted cumulative gain (NDCG) as the measure of the overall relevance of the recommended list. As NDCG is a non-continuous function, it cannot be used directly as the objective function. One option is to use NDCG as the performance measure in the adaptive boosting approach AdaRank [10] and to build each component recommender by MF; this is the model of our previous work, AdaMF [11]. As an extension of that work, in this study we introduce two further solutions that improve the ranking performance of recommender systems. One is to make use of a continuous smoothed approximation of the NDCG measure [12], which we denote NDCG-smoothed matrix factorization (NSMF). The other is a hybrid model that uses NSMF as the component recommender, denoted AdaNSMF. In summary, the contributions of this work are fourfold:

• We investigate the difference between the evaluation metrics of recommender systems and suggest exploiting ranking performance to judge recommendation results.
• Two solutions for introducing a ranking performance metric into the recommendation model, NSMF and AdaMF, are discussed.
• To combine the advantages of NSMF and AdaMF, we propose AdaNSMF, which ensembles component recommenders built with the NDCG-smoothed matrix factorization model and ensures the ranking performance.
• We conduct empirical studies on two real-world datasets, where the results confirm the effectiveness of our models and their superiority over existing approaches.

2 Related work

In this section, we introduce existing studies related to recommender systems, MF and LTR techniques, and then discuss the comparative advantages and limitations of these approaches.

2.1 Recommender system

The general goal of recommender systems is to assist potential buyers in discovering items of interest, such as products or information. Collaborative filtering approaches [1, 2] have been successfully exploited by many systems to predict ratings by aggregating the experience of other users who are similar to the current user in terms of interest or other aspects. The collaborative filtering recommendation approach was first proposed by Goldberg et al. [1]. GroupLens [2] is an


automatic collaborative filtering recommender system that works on users' ratings and can generate recommendations for movies, music or news. Sarwar et al. [3] show that item-based collaborative filtering algorithms can produce recommendations for millions of users and items in seconds, and that the mean absolute error of item-based algorithms is lower than that of user-based algorithms, indicating that item-based algorithms provide higher quality recommendations. Item-based [3], user-based [13] and hybrid [14] algorithms have all shown the capability of providing high quality recommendations in various domains. In addition, model-based collaborative filtering establishes a model from user behaviors and generates recommendations from it. Matrix factorization, or SVD, has become another popular modeling technique because of its success in the Netflix challenge [4]. Moreover, traditional neighbor-based collaborative filtering has been combined with the MF model [15] to improve recommendation accuracy.

2.2 Learning to rank

A traditional ranking algorithm generates a list by sorting on a relevance function of query and document. With the development of linguistic models and the accumulation of labeled data from search engines, researchers have tried to make machines "learn" the ranking model. LTR is a family of supervised learning algorithms that learn better ranking models from labeled documents and user feedback. It can be categorized into point-wise, pair-wise and list-wise models [6]. Point-wise approaches predict the score of a document for a query by regression. Pair-wise ranking models use classification to learn the relative preference within each item pair, and classification methods such as SVM [16] can be adapted to the problem. The key point of list-wise LTR is to define the loss function over the whole result list of each query. The intuition is to use ranking performance metrics, such as mean average precision (MAP) and NDCG, as the learning objective. However, these metrics depend on the ranks of documents and are therefore discontinuous in the model parameters, so different smoothing techniques for the metrics lead to different approaches [17]. Building on direct optimization of the performance metric, AdaRank [10] uses a boosting technique to solve the problem. The basic idea of AdaRank is to employ an exponential loss function of the performance metric. It generates a strong

ranker that linearly combines weak rankers, where the coefficients are computed from the performance of each ranker. Consequently, the weak rankers are weighted according to their individual performance.

Several ranking-based models have been proposed for making better recommendations. In Ref. [8], Balakrishnan et al. propose to take the trained latent factors of each user and each item as feature vectors; extra parameters are also considered for building point-wise and pair-wise models. List-wise LTR algorithms have also been used to build MF models. ListRank-MF [9] optimizes a loss function based on the top-one probability of the recommended items. The disadvantage of these approaches is that model training may take longer, as the model parameters are learned by EM-like algorithms; the ranking-based loss function in ListRank-MF also increases the complexity of the gradient descent algorithm. Moreover, such methods do not take a ranking metric, which reflects the ranking performance of recommender systems, into account.

2.3 Evaluation measures

Since information retrieval (IR) emerged in the middle of the last century, many evaluation measures have been proposed to judge how well an IR system performs. Many IR metrics address the case of binary relevance, in which every document is known to be either relevant or non-relevant to a particular query. Precision and recall are widely used in IR and statistics, and the F-measure is derived by combining precision and recall [18]. Precision and recall can also be computed at every position in the ranked sequence of documents, yielding precision p(r) as a function of recall r; the area under the curve of p(r) is the average precision, and MAP is the mean of the average precision over N queries. In addition, precision@n evaluates precision only on the top n documents, since what a user cares about is the top n (commonly 10–20) results for a query. Another notable metric for the binary relevance case is the mean reciprocal rank (MRR) [19]: the reciprocal rank of a query is the reciprocal of the position of the first correct answer, and MRR is the average of the reciprocal ranks over N queries. Since queries may have many correct results with different degrees of relevance, Järvelin and Kekäläinen propose normalized discounted cumulative gain (NDCG) [20], an evaluation method based on non-dichotomous relevance judgements in IR experiments, which we describe in detail in Section 3. Furthermore, Chapelle et al. extend the classical reciprocal rank to the graded relevance


case, obtaining the expected reciprocal rank (ERR) [21], which measures the expected effort required for a user to satisfy their information need. ERR supports graded relevance judgements and reduces to the reciprocal rank under binary relevance.
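To make the binary-relevance metrics above concrete, the short Python sketch below computes precision@n, average precision and the reciprocal rank for a single ranked list; averaging the latter two over N queries gives MAP and MRR. The function names and the example list are illustrative only.

```python
def precision_at_n(relevance, n):
    """Fraction of the top-n ranked documents that are relevant (binary labels)."""
    return sum(relevance[:n]) / n

def average_precision(relevance):
    """Mean of precision@k over every rank k that holds a relevant document."""
    hits, precisions = 0, []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def reciprocal_rank(relevance):
    """1 / position of the first relevant document, or 0 if none is relevant."""
    for k, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / k
    return 0.0

# Example: relevant documents at positions 1, 3 and 6 of the ranked list.
ranked = [1, 0, 1, 0, 0, 1]
print(precision_at_n(ranked, 3))   # 2/3
print(average_precision(ranked))   # (1/1 + 2/3 + 3/6) / 3 ≈ 0.722
print(reciprocal_rank(ranked))     # 1.0
```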

3 Preliminaries

In this section, we give the problem definition and introduce the general MF method and the NDCG metric, which are the fundamentals of this study.

3.1 Problem definition

The objective of recommender systems is to learn a preference model from users' rating histories so as to generate a top-K recommendation list. Suppose that there are N ratings given by U users to I items. For each user u, R_u is the set of rated items {R_ui}_{i=1}^{N_u}, where N_u is the size of R_u. The rating given by user u to item i is denoted R_ui. The goal of our algorithm is to build a function f(u, i) and rank items for each user according to the value of this function.

3.2 Matrix factorization

As mentioned in the introduction, MF is one of the most popular techniques for building recommender systems. In MF, the prediction of the unobserved rating of item i by user u is formulated as

f(u, i) = \sum_{k=0}^{K} P_{uk} Q_{ik},   (1)

where P_u is the latent factor vector of user u, which represents the interests of user u; similarly, Q_i represents the relevance of item i to the latent factors, and K is the dimension of the latent space. The traditional MF model is built for the prediction of missing ratings. To measure prediction accuracy, RMSE is one of the most commonly used metrics. The RMSE-based loss function is defined as

L = \frac{1}{2} \sum_{u=1}^{U} \sum_{i \in R_u} (R_{ui} - f(u, i))^2 + \frac{\lambda}{2} \left( \|P\|_F + \|Q\|_F \right),   (2)

where f(u, i) denotes the predicted rating given by Eq. (1). In the loss function, \|\cdot\|_F is the Frobenius norm, used as the regularization term to improve the generalization of the model. Once the latent vectors have been learned, the unobserved ratings of each user can be estimated by f(u, i), and the recommendation list is generated by sorting f(u, i).
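As a concrete illustration of Eqs. (1) and (2), the following sketch trains the latent factors P and Q by stochastic gradient descent on the regularized squared-error loss and then ranks unrated items by f(u, i). It is a minimal sketch under assumed hyperparameter values, not the authors' implementation.

```python
import numpy as np

def train_mf(ratings, num_users, num_items, K=10, lam=0.1, eta=0.005, epochs=50):
    """ratings: list of (u, i, r) triples of observed ratings."""
    rng = np.random.default_rng(0)
    P = 0.1 * rng.standard_normal((num_users, K))   # user latent factors
    Q = 0.1 * rng.standard_normal((num_items, K))   # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                   # R_ui - f(u, i), Eq. (1)
            # SGD steps on the regularized squared error of Eq. (2)
            P[u] += eta * (err * Q[i] - lam * P[u])
            Q[i] += eta * (err * P[u] - lam * Q[i])
    return P, Q

def recommend(P, Q, u, rated_items, top_n=10):
    """Rank the items user u has not rated by the predicted score f(u, i)."""
    scores = P[u] @ Q.T
    candidates = [i for i in range(Q.shape[0]) if i not in rated_items]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:top_n]
```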


3.3 NDCG

From the definition in Eq. (2), it is clear that every observed rating contributes equally to the original MF loss function. Therefore, when the number of rated items varies greatly across users, the overall loss function is dominated by the users with a large amount of feedback. In addition, ranking performance, which is important for evaluating recommender systems, is not taken into consideration in the loss function. In this study, our objective is to define a ranking-based loss function: we propose to directly optimize performance metrics that measure the overall ranking accuracy. To measure recommendation performance, information retrieval metrics such as precision, recall, MAP and NDCG can be utilized. In recommender systems, user ratings are usually given on a 1–5 star scale, so we need a metric that accounts for both multi-level relevance and ranking position as our optimization objective. Among the metrics mentioned above, NDCG is the most appropriate. For a recommender system, the NDCG@n of the recommendation list of user u is defined as

NDCG(u)@n = Z_u \sum_{j=1}^{n} \frac{G_{uj}}{\log(1 + j)},   (3)

where the top n items with the highest predicted scores are chosen. In Eq. (3), G_uj is the gain of the item ranked at the j-th position for user u, defined as G_ui = 2^{R_ui} − 1, and Z_u is the normalization constant for user u, chosen so that a perfect ranking yields an NDCG of 1. From the definition of NDCG, it is clear that when items with higher relevance appear at the top of the list, the NDCG value is higher. This metric therefore reflects the overall relevance of the top-N recommendation list.
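Under the gain G_uj = 2^{R_uj} − 1 defined above, NDCG@n for a single user can be computed as in the sketch below, where Z_u is obtained from the ideal ordering of the user's true ratings. Base-2 logarithms are assumed; because the same constant appears in the DCG and its normalizer, the choice of base does not change the NDCG value. Function names are illustrative.

```python
import math

def dcg_at_n(ratings_in_rank_order, n):
    """Discounted cumulative gain of the top-n items with gain 2^r - 1, Eq. (3)."""
    return sum((2 ** r - 1) / math.log2(1 + j)
               for j, r in enumerate(ratings_in_rank_order[:n], start=1))

def ndcg_at_n(predicted_scores, true_ratings, n=10):
    """NDCG@n for one user: rank items by predicted score, normalize by the ideal DCG."""
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked_true = [true_ratings[i] for i in order]
    idcg = dcg_at_n(sorted(true_ratings, reverse=True), n)   # 1/Z_u: DCG of the perfect ranking
    return dcg_at_n(ranked_true, n) / idcg if idcg > 0 else 0.0
```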

4 Adaptive boosting matrix factorization

In this section, we present our model AdaMF, which combines MF with AdaRank [10] and provides a boosting technique for building a ranking-oriented recommender system.

4.1 AdaMF

Similar to AdaBoost, AdaMF maintains a weight distribution over the users in the training set. The training process consists of T rounds. In the t-th round, the algorithm builds a component recommender P_u^t, Q_i^t by MF according to the current weight distribution over users. During the learning process, the weights of users with low performance are increased, so that the component recommender of the next round is forced


to focus on those users. The coefficient α_t of the component recommender is calculated according to the performance metric, and the newly trained recommender is then integrated into the ensemble recommender f(u, i). A new distribution over users is computed after evaluating the performance of the ensemble recommender f(u, i) on the training set. The AdaMF procedure is outlined in Algorithm 1.

Algorithm 1  AdaMF
Input: rating history {r_ui}
Output: MF ranker f(u, i)
1. Initialize D_1(u) = 1/U, f_0(u, i) = 0
2. for t = 1, ..., T do
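To make the procedure concrete, the sketch below spells out the boosting loop begun in Algorithm 1 and described above. It is an illustrative reconstruction rather than the authors' implementation: the coefficient α_t and the exponential re-weighting follow the usual AdaRank form [10], which is assumed here, and the weighted MF trainer simply scales each user's SGD updates by that user's weight.

```python
import numpy as np

def ndcg_per_user(score, data, n=10):
    """Per-user NDCG@n (Eq. (3), gain 2^r - 1) of the ranking induced by score(u, i)."""
    out = np.zeros(len(data))
    for u, items in enumerate(data):                      # items: list of (item_id, rating)
        pred = np.array([score(u, i) for i, _ in items])
        rels = np.array([r for _, r in items], dtype=float)
        k = min(n, len(items))
        disc = 1.0 / np.log2(np.arange(2, k + 2))
        dcg = np.sum((2.0 ** rels[np.argsort(-pred)][:k] - 1.0) * disc)
        idcg = np.sum((2.0 ** np.sort(rels)[::-1][:k] - 1.0) * disc)
        out[u] = dcg / idcg if idcg > 0 else 0.0
    return out

def weighted_mf(data, D, num_items, K=10, eta=0.005, epochs=30, seed=0):
    """Component MF trained by SGD, with user u's updates scaled by its weight D(u)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((len(data), K))
    Q = 0.1 * rng.standard_normal((num_items, K))
    for _ in range(epochs):
        for u, items in enumerate(data):
            for i, r in items:
                err = r - P[u] @ Q[i]
                P[u] += eta * len(data) * D[u] * err * Q[i]   # D(u)*U = 1 under uniform weights
                Q[i] += eta * len(data) * D[u] * err * P[u]
    return P, Q

def adamf(data, num_items, rounds=20):
    """AdaMF: boost component MF recommenders into an ensemble ranker f(u, i)."""
    U = len(data)
    D = np.full(U, 1.0 / U)                                   # weight distribution over users
    comps, alphas = [], []

    def f(u, i):                                              # f(u, i) = sum_t alpha_t * P_u^t . Q_i^t
        return sum(a * (P[u] @ Q[i]) for a, (P, Q) in zip(alphas, comps))

    for t in range(rounds):
        P, Q = weighted_mf(data, D, num_items, seed=t)        # component under current weights
        E = ndcg_per_user(lambda u, i: P[u] @ Q[i], data)     # component NDCG@10 per user
        alpha = 0.5 * np.log(np.dot(D, 1.0 + E) / max(np.dot(D, 1.0 - E), 1e-12))
        comps.append((P, Q)); alphas.append(alpha)
        E_ens = ndcg_per_user(f, data)                        # ensemble performance per user
        D = np.exp(-E_ens); D /= D.sum()                      # boost weights of poorly served users
    return f, comps, alphas
```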

6.1 AdaNSMF

Following the construction of the AdaMF recommender, we extend the modified NSMF method to obtain an adaptive NSMF recommender, which we call AdaNSMF. In the t-th round of training, we build the current component recommender P_u^t, Q_i^t by NSMF under the current weight distribution over users, which reflects the critical users for the component recommender. Then, according to the definition of NDCG, we compute the performance of the component recommender, from which the weight α_t of the current component recommender is calculated and the weight distribution over users for the next round is updated. Finally, the new component recommender is integrated into the ensemble recommender f(u, i).

From f(u, i) = \sum_{t=1}^{T} \alpha_t P_u^{tT} Q_i^t, we obtain an ensemble rating recommender by linearly combining the component recommenders. The final ranking list is created by sorting f(u, i) over the unrated items of each user and is evaluated by the NDCG@10 metric. Since the chosen performance metric is NDCG@10, our hope is that, on the one hand, NDCG@10 increases with the number of rounds and, on the other hand, RMSE at least does not deteriorate as the rounds proceed.

6.2 Theoretical analysis

To simplify the derivation, we set E^e(u, t) = NDCG(u)@m_e^t, E^c(u, t) = NDCG(u)@m_c^t, \delta_t = \sum_{u=1}^{U} \delta_u^t, \delta_u^t = E^e(u, t) - E^e(u, t-1), and \varphi(t) = \sum_{u=1}^{U} D_t(u) \cdot E^c(u, t), where 0 \le \varphi(t) \le 1. It is clear that if every \delta_u^t \ge 0, then \delta_t \ge 0. We have

\delta_u^t = E^e(u, f_t) - E^e(u, f_{t-1})
          = Z_u \sum_{j=1}^{n} \frac{2^{f_t(u, j)} - 1}{\log(1 + j)} - Z_u \sum_{j=1}^{n} \frac{2^{f_{t-1}(u, j)} - 1}{\log(1 + j)}
          = Z_u \sum_{j=1}^{n} \frac{2^{f_t(u, j)} - 2^{f_{t-1}(u, j)}}{\log(1 + j)}.   (17)

Since f_t(u, j) = f_{t-1}(u, j) + \alpha_t P_u^{tT} Q_j^t and \alpha_t P_u^{tT} Q_j^t \ge 0, where j denotes the item at position j, it holds that \delta_u^t \ge 0 and thus \delta_t \ge 0. We therefore conclude that the NDCG@10 metric does not decrease as the training rounds proceed.
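The argument hinges on each round adding only the nonnegative quantity α_t P_u^{tT} Q_j^t to every predicted score, so the weighted sum of gains in Eq. (17) cannot decrease. A small numeric sanity check of this claim, with made-up nonnegative factors, is sketched below; Z_u is omitted because a positive constant does not affect the sign of the change.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10                                            # list length of the NDCG-style sum
discount = 1.0 / np.log2(np.arange(2, n + 2))     # 1 / log(1 + j) for j = 1..n

def gain_sum(scores):
    """Sum of (2^score - 1) / log(1 + j) over the n positions."""
    return np.sum((2.0 ** scores - 1.0) * discount)

f_prev = rng.uniform(0, 5, size=n)                # f_{t-1}(u, j) for the n ranked items
alpha_t = 0.3
P_u = rng.uniform(0, 1, size=8)                   # nonnegative user factors
Q = rng.uniform(0, 1, size=(n, 8))                # nonnegative item factors
f_curr = f_prev + alpha_t * Q @ P_u               # f_t(u, j) = f_{t-1}(u, j) + alpha_t * P_u^T Q_j

delta = gain_sum(f_curr) - gain_sum(f_prev)
print(delta >= 0)                                 # True: the added term is nonnegative
```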

7 Experiment

In this section, empirical studies are conducted to evaluate our three proposed approaches. In comparison with baseline methods, we confirm the effectiveness of our proposed AdaMF, NSMF and AdaNSMF optimization techniques.

7.1 Experimental setup and datasets

The datasets used in the experiments are from MovieLens (http://www.grouplens.org/node/73). MovieLens-100K contains 100K ratings on a scale of 1 to 5 given by 943 users to 1 682 movies, which we abbreviate as ML-100K. The other dataset, ML-1M, is composed of 1M ratings from 6 040 users on 3 883 items, also on a scale of 1 to 5. To sample the training and testing sets, following the experimental setup of related work [9], we randomly choose N ratings from each user to build the training set and use the remaining ratings for testing, where N takes a value in {10, 20, 50}. For each choice, a user must have at least N+10 ratings, so that after choosing N ratings for training we can still evaluate NDCG@10 on the testing set. In addition, we build a mixed dataset in which the number of training ratings per user is


10, 20 or 50. In the following experiments, we use 10-set, 20-set, 50-set and mix-set to denote these datasets, respectively. The statistics of the datasets are shown in Table 1.

Table 1  Dataset statistics

                                  ML-100K     ML-1M
Number of users                   943         6 040
Number of items                   1 682       3 883
Number of ratings                 100 000     1 000 000
Avg. number of ratings per user   106.0       165.6
Sparsity/%                        93.70       95.74
Rating scale (interval)           1–5 (1)     1–5 (1)
Avg. rating value                 3.53        3.58
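The per-user sampling protocol described in Section 7.1 (keep only users with at least N+10 ratings, put N randomly chosen ratings of each user into the training set and the rest into the testing set) could be implemented along the following lines; this is an illustrative sketch, not the authors' script.

```python
import random
from collections import defaultdict

def split_by_user(ratings, N, seed=42):
    """ratings: iterable of (user, item, rating). Returns (train, test) lists."""
    by_user = defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((u, i, r))
    rng = random.Random(seed)
    train, test = [], []
    for u, items in by_user.items():
        if len(items) < N + 10:          # keep at least 10 ratings for NDCG@10 testing
            continue
        rng.shuffle(items)
        train.extend(items[:N])
        test.extend(items[N:])
    return train, test

# Example usage: build the 20-set
# train_20, test_20 = split_by_user(all_ratings, N=20)
```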

The basic MF model is used as the baseline in this study. Both RMSE and NDCG, the latter as defined in Eq. (3), are chosen as the evaluation metrics for performance comparison. The approaches compared in this study are as follows:

• MF: the traditional rating-oriented matrix factorization [4], whose loss function minimizes the rating prediction error, as shown in Eq. (2).
• AdaMF: as introduced in Section 4, AdaMF linearly combines weak component recommenders into a strong recommender.
• NS: the NS recommender minimizes the NDCG-based objective function.
• NSMF: as defined in Section 5, NSMF linearly integrates NS and MF.
• AdaNSMF: as defined in Section 6, AdaNSMF is based on the AdaBoost idea, with each component being an NSMF recommender.


7.2 Effectiveness

To obtain more reliable experimental results, for each condition in the following experiments we run the experiment five times and report the mean of NDCG@10 and RMSE.

First, we test the effectiveness of AdaMF. AdaMF integrates weak recommenders that perform well on different distributions of the training data into a strong recommender. One experiment evaluates how the NDCG and RMSE metrics vary with the number of training rounds; the other shows the effectiveness of AdaMF in mitigating the over-fitting problem. For each weak recommender, the parameters are chosen as follows: the latent factor dimension K is 10, the regularization coefficient λ is 0.0, the learning rate η is 0.001 and the iteration upper bound T_w is 200. The upper bound on the number of rounds T is set to 20.

Figure 1 shows the performance of AdaMF on the testing sets. As can be observed, after combining the weak recommenders, the original MF model is enhanced in terms of ranking accuracy: NDCG@10 increases with the number of rounds in Fig. 1(a), while the variance of RMSE stays within 0.005, a small range, in Fig. 1(b). Moreover, Fig. 1(a) shows that NDCG@10 reaches its bound after about 10 training rounds. Finally, both figures show that RMSE is lowest and NDCG is highest on 1m-mix, suggesting that the performance of AdaMF improves as the number of ratings increases.

Fig. 1  The performance curve of AdaMF on the 1m dataset. (a) NDCG; (b) RMSE

In each weak recommender, the regularization coefficient λ controls the degree of over-fitting on the training set. For a single MF model, good performance can be obtained after choosing the parameters by cross-validation, which may require considerable effort. However, as illustrated in Fig. 2, AdaMF converges for different values of λ within a certain range.

Fig. 2  The performance curve of AdaMF with λ. (a) 100k-mix; (b) 1m-20; (c) 1m-50; (d) 1m-mix

Second, we examine the training effectiveness of NS. Since an approximation function is chosen to substitute for the original NDCG in NS, we verify whether minimizing Eq.


(11) really improves recommendation in terms of ranking performance. The parameters are chosen as follows: the latent factor dimension K is 40, the regularization coefficient λ is 0.1, the learning rate η is 0.004, the smoothness α is 5 and the iteration upper bound T_w is 100. As shown in Fig. 3, NDCG@10 reaches its bound after about 50 iterations, so the NS recommendation model clearly captures the users' ranking preferences in the dataset. However, the model does not perform well on the RMSE metric: as shown in Fig. 3, RMSE converges to 3.40 on 1m-20, 3.27 on 1m-50 and 3.20 on 1m-mix. This is expected, since the NS method directly optimizes NDCG as its loss function. The experiment nevertheless indicates that the NS method is effective, as the NDCG@10 metric performs well.

Fig. 3 The performance curve of NS. (a) NDCG; (b) RMSE

Third, we report the results of NSMF. NSMF linearly combines the NDCG-based and MF loss functions, which may integrate the advantages of both. The parameters are set as follows: the latent factor dimension K is 40, the regularization coefficient λ is 0.1, the learning rate is 0.0005, the smoothness α is 5, the weight β is 0.3 and the iteration upper bound T_w is 200. Figure 4 shows that NSMF performs well on the testing sets: it retains excellent NDCG@10 performance while decreasing the RMSE metric by a large margin. Both the NDCG and RMSE metrics approximately converge after 100 iterations.

Fig. 4  The performance curve of NSMF. (a) NDCG; (b) RMSE

In Fig. 5, NSMF shows its convergence with different values of λ, so the parameter controlling the regularization term can be identified easily. Moreover, NSMF converges not only on the NDCG@10 metric but also on the RMSE metric after around 50 iterations.

Fig. 5 The performance curve of NSMF with different λ. (a) 1m-50; (b) 1m-mix

Fourth and last, we test AdaNSMF on the 1m dataset. This experiment further confirms that the AdaBoost idea improves the performance of the component recommender. The parameters are: the latent factor dimension K is 40, the regularization coefficient λ is 0.1, the learning rate is 0.0005, the smoothness α is 5, the weight β is 0.3, the iteration upper bound T_w is 200 and the round upper bound T is 20. From Fig. 6(a), NDCG@10 increases with the round and the model converges after 10 rounds. Figure 6(b) indicates that RMSE fluctuates within less than 0.005, a small range, similar to AdaMF. The AdaNSMF method therefore achieves good performance not only on NDCG@10 but also on RMSE, which accords with our expectation.

Fig. 6  The performance curve of AdaNSMF. (a) NDCG; (b) RMSE

7.3 Ranking result comparison

In this subsection we evaluate the ranking performance by comparing our models with the baseline MF, as introduced in Section 2. For the basic MF model, the best RMSE is obtained at K = 10, λ = 0.1, η = 0.001, T_w = 200. The AdaMF model takes the MF model as the component recommender, so the parameters of the component recommender are the same as for the MF model, with 20 boosting rounds. For the NS model, the parameters are K = 40, λ = 0.1, η = 0.000 5, α = 5, T_w = 200 on the 20 and 50 datasets. For the NSMF model, the parameters are the same as for the NS model, except that the linear combination weight β is 0.3. Finally, for AdaNSMF, the parameters of the component recommender are


the same as for NSMF, with 20 boosting rounds. From the results presented in Table 2, it is clear that our three approaches consistently outperform the baseline on the NDCG@10 metric. AdaMF achieves a clear performance improvement over MF. In addition, for NSMF, combining our proposed NS method with the MF method further improves the performance compared with the NS method alone.

Table 2  NDCG and RMSE results on the ML-1M dataset

                UPL        20         50         mix
NDCG@10         MF         0.712 1    0.713 0    0.758 3
                AdaMF      0.744 7    0.763 2    0.800 3
                NS         0.715 0    0.736 0    0.777 5
                NSMF       0.737 5    0.750 7    0.776 5
                AdaNSMF    0.743 7    0.766 0    0.801 8
RMSE            MF         0.978 3    0.973 6    0.962 7
                AdaMF      0.967 5    0.944 9    0.937 9
                NS         3.401 2    3.313 4    3.218 1
                NSMF       0.994 3    0.942 8    0.928 3
                AdaNSMF    0.993 7    0.948 3    0.937 4

As shown in Table 2, as the rating matrix becomes denser, higher NDCG@10 and lower RMSE translate into better ranking results. AdaMF exploits different component models that fit different users better, and so enhances the performance of a single model, while NSMF constructs its objective by linearly combining the NDCG and RMSE terms. NSMF retains a higher NDCG@10 while successfully decreasing RMSE. Furthermore, the results of AdaNSMF further support the idea that AdaBoost can promote the performance of a recommender system. Specifically, on the 20-set, AdaMF improves over the basic MF model by 3.26% on NDCG and 1.08% on RMSE, which reveals the disadvantage of MF on a sparse rating matrix. NSMF improves over the basic MF model by 2.54% on NDCG but is 1.6% worse on RMSE, which confirms that AdaMF performs better than NSMF when dealing with the sparseness problem in real-world scenarios. On the 50-set and the mix-set, AdaMF improves over the basic MF model by 4.61% on NDCG and 2.675% on RMSE on average, and NSMF exceeds the basic MF model by 2.795% on NDCG and 3.26% on RMSE on average. These results all indicate that the AdaMF and NSMF methods perform more effectively than the basic MF method. Finally, comparing AdaMF and AdaNSMF, the ranking accuracy of AdaNSMF is better than that of AdaMF, with improvements of 0.28% on the 50 dataset and 0.15% on the mix dataset, indicating that AdaNSMF surpasses AdaMF on larger training datasets.
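The improvement figures quoted here are consistent with reading them as absolute differences of the Table 2 values multiplied by 100 (an interpretation, not stated explicitly in the text); for example, for AdaMF versus MF:

```latex
% 20-set, AdaMF vs. MF (values from Table 2)
\Delta_{\mathrm{NDCG@10}} = 0.7447 - 0.7121 = 0.0326 \Rightarrow 3.26\%
\Delta_{\mathrm{RMSE}}    = 0.9783 - 0.9675 = 0.0108 \Rightarrow 1.08\%
% 50-set and mix-set average, AdaMF vs. MF
\tfrac{1}{2}\big[(0.7632 - 0.7130) + (0.8003 - 0.7583)\big] = 0.0461 \Rightarrow 4.61\%
```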

8 Conclusion and future work

In this paper, we have introduced two ranking-oriented recommender systems based on NDCG optimization. The first approach develops an NDCG approximation-based loss function, whose parameters are learned by gradient descent. The second approach is a boosting algorithm for recommender systems, in which the NDCG optimization is carried out by boosting component recommenders trained under different weight distributions over users. The two approaches can be combined to further improve the ranking performance by using NSMF as the component recommender in the boosting framework. Our approaches offer several advantages: ease of implementation, low training complexity, high performance and less uncertainty in ranking. In comparison with the basic MF model, the effectiveness of our proposed models has been confirmed. In the future, we will focus on distributed algorithms for ranking-oriented recommender systems. As the gradient descent algorithms have polynomial complexity and scale linearly with the number of ratings, parallel and scalable implementations will also be studied.

Acknowledgements  This work was supported partly by the China 973 Program (2014CB340305) and partly by the National Natural Science Foundation of China (Grant Nos. 61300070, 61370057).

References

1. Goldberg D, Nichols D, Oki B M, Terry D. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 1992, 35(12): 61–70
2. Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J. GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work. 1994, 175–186
3. Sarwar B M, Karypis G, Konstan J A, Reidl J. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web. 2001, 285–295
4. Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer, 2009, 42(8): 30–37
5. Cremonesi P, Koren Y, Turrin R. Performance of recommender algorithms on top-n recommendation tasks. In: Proceedings of the 4th ACM Conference on Recommender Systems. 2010, 39–46
6. Liu T Y. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 2009, 3(3): 225–331
7. Hacker S, Von Ahn L. Matchin: eliciting user preferences with an online game. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2009, 1207–1216
8. Balakrishnan S, Chopra S. Collaborative ranking. In: Proceedings of the 5th ACM International Conference on Web Search and Data Mining. 2012, 143–152
9. Shi Y, Larson M, Hanjalic A. List-wise learning to rank with matrix factorization for collaborative filtering. In: Proceedings of the 4th ACM Conference on Recommender Systems. 2010, 269–272
10. Xu J, Li H. AdaRank: a boosting algorithm for information retrieval. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2007, 391–398
11. Wang Y, Sun H, Zhang R. AdaMF: adaptive boosting matrix factorization for recommender system. In: Proceedings of the 15th International Conference on Web-Age Information Management. 2014, 43–54
12. Valizadegan H, Jin R, Zhang R, Mao J. Learning to rank by optimizing NDCG measure. In: Proceedings of the 2009 Conference on Advances in Neural Information Processing Systems. 2009, 1883–1891
13. Sarwar B, Karypis G, Konstan J, Riedl J. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web. 2001, 285–295
14. Guan Y, Cai S, Shang M S. Recommendation algorithm based on item quality and user rating preferences. Frontiers of Computer Science, 2014, 8(2): 289–297
15. Koren Y. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, 426–434
16. Herbrich R, Graepel T, Obermayer K. Large margin rank boundaries for ordinal regression. Advances in Neural Information Processing Systems, 1999: 115–132
17. Chapelle O, Wu M. Gradient descent optimization of smoothed information retrieval metrics. Information Retrieval, 2010, 13(3): 216–235
18. Baeza-Yates R, Ribeiro-Neto B. Modern Information Retrieval. New York: ACM Press, 1999
19. Voorhees E M. The TREC-8 question answering track report. In: Proceedings of TREC. 1999, 77–82
20. Järvelin K, Kekäläinen J. IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2000, 41–48
21. Chapelle O, Metlzer D, Zhang Y, Grinspan P. Expected reciprocal rank for graded relevance. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 621–630
22. Qin T, Liu T Y, Li H. A general approximation framework for direct optimization of information retrieval measures. Information Retrieval, 2010, 13(4): 375–397

Richong Zhang received his BS and MAS from Jilin University, China in 2001 and 2004, respectively. He received his MS from Dalhousie University, Canada in 2006, and his PhD from the School of Information Technology and Engineering, University of Ottawa, Canada. He is currently an associate professor in the School of Computer Science and Engineering, Beihang University, China. His research interests include recommender systems, knowledge graph and crowdsourcing.

Han Bao received his BS from Beihang University (BUAA), China in 2014. He is currently a PhD student at BUAA. His research interests include recommender systems, data mining, and machine learning.

Hailong Sun received his BS in computer science from Beijing Jiaotong University, China in 2001, and his PhD in computer software and theory from Beihang University (BUAA), China in 2008. He is currently an associate professor in the School of Computer Science and Engineering, BUAA. His research interests include software systems, crowdsourcing and distributed computing.

Yanghao Wang received his BE from Beihang University (BUAA), China in 2012. He is currently a PhD student at BUAA. His research interests include crowdsourcing, web services, and machine learning.

Xudong Liu received his PhD in computer application technology from Beihang University (BUAA), China. He is a professor and doctoral supervisor at BUAA. His research interests mainly include middleware technology and applications, service-oriented computing, trusted network computing, and network software development.