Collaborative Filtering
Interaction-Rich Transfer Learning for Collaborative Filtering with Heterogeneous User Feedback Weike Pan and Zhong Ming, Shenzhen University
A novel and efficient transfer learning algorithm called interaction-rich transfer by collective factorization extends the efficient collective matrix factorization algorithm by providing more interactions between the user-specific latent features.

Facing the information flood in our daily lives, search engines mainly respond to our submitted queries passively, while recommender systems aim to discover and meet our needs in a more active way. Collaborative filtering techniques1–4 have been applied in various recommendation-embedded applications. However, the lack of users' accurate preference data, for example, five-star numerical ratings, might limit this approach's applicability in real deployment. On the other hand, a real recommender system can usually make use of additional types of user feedback, for example, binary ratings of likes and dislikes.5 Hence, collaborative filtering with different types of user feedback provides a potential way to address the data sparsity problem of accurate graded ratings. Here, we focus on this new research problem of collaborative filtering with heterogeneous user feedback, which few prior works have addressed. A recent work proposed a transfer learning algorithm called transfer by collective factorization (TCF) that exploits such heterogeneous user feedback.5 TCF addresses the data sparsity problem by simultaneously sharing data-independent knowledge and modeling the data-dependent effect of
two types of feedback. However, TCF is a batch algorithm that updates model parameters only once after scanning the whole data, which might not be applicable for large datasets. In contrast, some stochastic methods such as regularized singular value decomposition (RSVD)3 and collective matrix factorization (CMF)6 are empirically much more efficient than alternative batch-style algorithms like probabilistic matrix factorization (PMF)7 and TCF. However, the prediction accuracy of RSVD and CMF might not be adequate when compared with that of TCF, especially when the users' feedback is heterogeneous. There are also some efficient distributed or online collaborative filtering algorithms such as distributed stochastic gradient descent8 and online multitask collaborative filtering,9 but they're designed for homogeneous user feedback instead of the heterogeneous feedback studied in this article.
1541-1672/14/$31.00 © 2014 IEEE Published by the IEEE Computer Society
IEEE INTELLIGENT SYSTEMS
Related Work in Transfer Learning in Collaborative Filtering

Transfer learning in collaborative filtering (TLCF)1,2 is an emerging interdisciplinary topic that aims to design transfer learning3 solutions to address the challenges in collaborative filtering,4 for example, rating sparsity. Parallel to transfer learning in text mining, TLCF has developed a family of new algorithms: model-, instance-, and feature-based transfer, which answer the question of "what to transfer" from the perspective of shared knowledge; and adaptive, collective, and integrative algorithms, which answer the question of "how to transfer" from the perspective of algorithm styles. We can categorize the proposed interaction-rich transfer by collective factorization (iTCF) algorithm as a feature-based (what to transfer), collective (how to transfer) transfer learning method. The works most closely related to our iTCF are transfer by collective factorization5 and collective matrix factorization,6 because they're also feature-based collective algorithms.
In this work, we aim to achieve a good balance between the accuracy of TCF and the efficiency of CMF (see the related sidebar for information on others' efforts). We extend the CMF algorithm by introducing richer interactions between the user-specific latent features, and design a corresponding algorithm called interaction-rich transfer by collective factorization (iTCF). In particular, we assume that the predictability with regard to the same user's rating behaviors in the related numerical ratings and binary ratings is likely to be similar. With this assumption, we design update rules by sharing not only the item-specific latent features, as in CMF, but also the user-specific latent features in a smooth manner. The iTCF algorithm thus introduces more interactions between user-specific latent features. Experimental results on three real-world datasets show the effectiveness of our iTCF over RSVD and CMF.
Background

The studied problem setting is exactly the same as that of TCF. We have n users and m items in a target numerical rating matrix R = {r_ui}_{n×m} ∈ {1, 2, 3, 4, 5, ?}^{n×m} and an auxiliary binary rating matrix R̃ = {r̃_ui}_{n×m} ∈ {0, 1, ?}^{n×m}, where ? denotes a missing value. The users and
References

1. B. Li, Q. Yang, and X. Xue, "Transfer Learning for Collaborative Filtering via a Rating-Matrix Generative Model," Proc. 26th Ann. Int'l Conf. Machine Learning, 2009, pp. 617–624.
2. W. Pan, E.W. Xiang, and Q. Yang, "Transfer Learning in Collaborative Filtering via Uncertain Ratings," Proc. 26th AAAI Conf. Artificial Intelligence, 2012.
3. S.J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 10, 2010, pp. 1345–1359.
4. D. Goldberg et al., "Using Collaborative Filtering to Weave an Information Tapestry," Comm. ACM, vol. 35, no. 12, 1992, pp. 61–70.
5. W. Pan and Q. Yang, "Transfer Learning in Heterogeneous Collaborative Filtering Domains," Artificial Intelligence, vol. 197, Apr. 2013, pp. 39–55.
6. A.P. Singh and G.J. Gordon, "Relational Learning via Collective Matrix Factorization," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2008, pp. 650–658.
Figure 1. Problem setting and solutions. (a) Illustration of the studied problem setting, with target data (five-star graded ratings R) and auxiliary data (binary ratings of likes/dislikes R̃). Two transfer learning solutions: (b) collective matrix factorization (CMF) and (c) interaction-rich transfer by collective factorization (iTCF).
items in the two data types are the same, and a one-to-one mapping is given. Our goal is to transfer knowledge from R̃ to help predict the missing values in R. We illustrate the problem setting in Figure 1a. Note that in this article, we aim to design an efficient transfer learning algorithm, because the lack of efficiency is a major limitation of TCF.
PMF

PMF models the preference data via two latent feature matrices,

R ≈ UVᵀ,  (1)

where the target numerical rating matrix R is factorized into a user-specific latent feature matrix U ∈ ℝ^{n×d} and an item-specific latent feature matrix V ∈ ℝ^{m×d}. Once we have obtained the latent feature matrices, we can predict the rating located at (u, i) via r̂_ui = U_u· V_i·ᵀ, where U_u· ∈ ℝ^{1×d} and V_i· ∈ ℝ^{1×d} are user u's and item i's latent feature vectors, respectively.
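To make Equation 1 concrete, here's a minimal PMF-style prediction sketch in Python with NumPy; the dimensions and randomly initialized factors are illustrative assumptions, not values from the article.

```python
import numpy as np

# PMF (Equation 1): approximate the rating matrix as R ≈ U V^T.
# Sizes and random factors below are toy assumptions for illustration.
n, m, d = 4, 5, 3                       # users, items, latent dimensions
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n, d))  # user-specific latent features
V = rng.normal(scale=0.1, size=(m, d))  # item-specific latent features

def predict(u, i):
    """Predicted rating of user u on item i: the inner product U_u. V_i.^T."""
    return float(U[u] @ V[i])

R_hat = U @ V.T                         # full predicted rating matrix
```

In a trained model, U and V would come from minimizing the squared error on the observed ratings rather than from random initialization.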
CMF

CMF uses item-side auxiliary data via sharing the same item-specific latent features. We use matrix notation to illustrate its idea of knowledge sharing,

R ≈ UVᵀ, R̃ ≈ WVᵀ,  (2)
where the auxiliary binary rating matrix R̃ is decomposed into a user-specific latent feature matrix W ∈ ℝ^{n×d} and an item-specific latent feature matrix V ∈ ℝ^{m×d}. The knowledge encoded in the item-specific latent feature matrix V is shared between the two factorization systems, and the two user-specific latent feature matrices W and U aren't shared. For our problem with the same users and same items in the target and auxiliary data, we reach the following factorization systems:

R ≈ UVᵀ, R̃ ≈ WVᵀ, s.t. W = U,  (3)
which means that CMF reduces to PMF on a pool of both target and auxiliary data, R ∪ R̃. However, such a reduced approach will cause performance degradation, because it ignores the heterogeneity of the users' feedback in R and R̃. Obviously, the semantic meaning of likes and dislikes in the auxiliary data is different from that of graded ratings in the target data.
Our Solution

Now let's look at the solution we propose.

iTCF
We can see that the PMF in Equation 1 doesn't make use of auxiliary data, CMF in Equation 2 only makes use of item-side auxiliary data, and CMF in Equation 3 reduces to PMF without distinguishing the heterogeneity of user feedback. The question we ask in this article is whether we can transfer more knowledge besides sharing the item-specific latent features in CMF, as shown in Equation 2 and Figure 1b. There's some potential that
we can exploit because the users in both data types are the same. For a typical user, a model's prediction accuracy trained on the target data of numerical ratings or the auxiliary data of binary ratings is likely to be similar, because a user's preference variation and a model's ability to capture the user's preference usually don't change much across two related data. With this assumption, we reach the following factorization systems:

R ≈ UVᵀ, R̃ ≈ WVᵀ, s.t. E = Ẽ,  (4)
where E and Ẽ denote the corresponding errors of the prediction model on the two data types, representing the predictability of user preferences. We can see that the main difference between Equations 2 and 4 comes from the shared predictability in Equation 4, denoted as E = Ẽ. We expand the matrix formulation in Equation 4 as follows:

min_{W_u·, U_u·, V_i·, b_u, b_i, μ}  Σ_{u=1}^{n} Σ_{i=1}^{m} f_ui,  s.t. e_ui = ẽ_ui,  (5)

where f_ui = y_ui [(1/2)e_ui² + ℛ_ui] + λ ỹ_ui [(1/2)ẽ_ui² + ℛ̃_ui] is a balanced loss function on the two data with λ > 0. Note that e_ui = r_ui − r̂_ui and ẽ_ui = r̃_ui − r̃̂_ui are the errors of the prediction model on the ratings in the target data and auxiliary data, respectively, where r̂_ui = μ + b_u + b_i + U_u· V_i·ᵀ and r̃̂_ui = W_u· V_i·ᵀ are the estimated preferences, μ is the global average, b_u is the user bias of user u, and b_i is the item bias of item i. The variables y_ui and ỹ_ui indicate whether the entry located at (u, i) is observed in the target data and auxiliary data, respectively. ℛ_ui = (α_u/2)‖U_u·‖² + (α_v/2)‖V_i·‖² + (β_u/2)b_u² + (β_v/2)b_i² and ℛ̃_ui = (α_w/2)‖W_u·‖² + (α_v/2)‖V_i·‖² + (β_u/2)b_u² + (β_v/2)b_i² are regularization terms used to avoid overfitting when learning the latent variables.
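As a sketch of how one term f_ui of the loss in Equation 5 is assembled for a cell observed only in the target data (y_ui = 1, ỹ_ui = 0); every numeric value below is a toy assumption for illustration.

```python
import numpy as np

# One term f_ui of Equation 5 for a target-only cell (y_ui = 1, ỹ_ui = 0).
rng = np.random.default_rng(3)
d = 3
U_u, V_i = rng.normal(size=d), rng.normal(size=d)
mu, b_u, b_i = 3.0, 0.1, -0.2           # global average and biases (toy values)
alpha_u = alpha_v = 0.01                # regularization weights
beta_u = beta_v = 0.01

r_ui = 4.0                              # observed target rating (toy value)
r_hat = mu + b_u + b_i + U_u @ V_i      # r̂_ui = μ + b_u + b_i + U_u. V_i.^T
e = r_ui - r_hat                        # e_ui = r_ui - r̂_ui
reg = (alpha_u / 2) * (U_u @ U_u) + (alpha_v / 2) * (V_i @ V_i) \
      + (beta_u / 2) * b_u**2 + (beta_v / 2) * b_i**2
f_ui = 0.5 * e**2 + reg                 # y_ui = 1, so only the target term remains
```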
Learning the iTCF
To solve the optimization problem in Equation 5, we start from the perspective of gradient descent, which will be used in the stochastic gradient descent framework.

Learning parameters using the target data. Given a rating from the target data, r_ui with y_ui = 1 and ỹ_ui = 0, we
have the gradients ∇U_u· = −e_ui V_i· + α_u U_u·, ∇V_i· = −e_ui U_u· + α_v V_i·, ∇b_u = −e_ui + β_u b_u, ∇b_i = −e_ui + β_v b_i, and ∇μ = −e_ui for U_u·, V_i·, b_u, b_i, and μ, respectively. Besides using these gradients to update the target parameters, we can also make use of the auxiliary variables W_u· to update the target item-specific latent feature vector V_i·, because the predictability is assumed to be similar and can be shared, that is, ẽ_ui = e_ui. Given e_ui, we have the gradient of V_i· in the auxiliary data, ∇V_i· = −e_ui W_u· + α_v V_i·.
We combine the two gradients for the item-specific latent feature vector V_i·,

∇V_i· = ρ(−e_ui U_u· + α_v V_i·) + (1 − ρ)(−e_ui W_u· + α_v V_i·)
      = −e_ui (ρU_u· + (1 − ρ)W_u·) + α_v V_i·,
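The combined gradient can be checked numerically; the sketch below (toy values, not the paper's data) also confirms that ρ = 1 drops the auxiliary features W_u· and recovers CMF's gradient for V_i·.

```python
import numpy as np

# Combined gradient for V_i.: ∇V_i. = -e_ui (ρ U_u. + (1-ρ) W_u.) + α_v V_i.
rng = np.random.default_rng(1)
d = 3
U_u, W_u, V_i = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
e_ui, alpha_v = 0.5, 0.01               # target error and regularization weight

def grad_V(rho):
    return -e_ui * (rho * U_u + (1 - rho) * W_u) + alpha_v * V_i

# ρ = 1 recovers the CMF gradient -e_ui U_u. + α_v V_i. (no W_u. involved).
assert np.allclose(grad_V(1.0), -e_ui * U_u + alpha_v * V_i)
```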
where 0 ≤ ρ ≤ 1 is a parameter used to linearly integrate the two gradients. Comparing ρU_u· + (1 − ρ)W_u· with U_u· in the gradient ∇V_i·, we can see that more interactions between the user-specific latent features U_u· and W_u· are introduced, which is also illustrated via graphical models in Figure 1c. For this reason, we call ρ an interaction parameter between the user-specific latent features. We can see that the shared predictability will introduce more interactions between the user-specific
Input: the target user-item numerical rating matrix R and the auxiliary user-item binary rating matrix R̃.
Output: the user-specific latent feature vector U_u· and bias b_u, the user-specific latent feature vector W_u·, the item-specific latent feature vector V_i· and bias b_i, and the global average μ, where u = 1, …, n, i = 1, …, m.

For t = 1, …, T
    For iter = 1, …, q + q̃
        Step 1. Randomly pick a rating from R and R̃;
        Step 2. Calculate the gradients as shown in Eqs. 6–10 if y_ui = 1, or Eqs. 10–11 if ỹ_ui = 1;
        Step 3. Update the parameters as shown in Eq. 12.
    End
End

Figure 2. The iTCF algorithm.
latent feature matrices U and W via ρU_u· + (1 − ρ)W_u· in ∇V_i·. And for this reason, we call the proposed approach iTCF, representing interaction-rich transfer by collective factorization.

Learning parameters using the auxiliary data. Similar to that of the target data, given a rating from the auxiliary data, r̃_ui with ỹ_ui = 1 and y_ui = 0, we have the following gradients:

∇W_u· = −λẽ_ui V_i· + λα_w W_u·,
∇V_i· = −λẽ_ui (ρW_u· + (1 − ρ)U_u·) + λα_v V_i·,

where 0 ≤ ρ ≤ 1 is again an interaction parameter to combine two gradients. Similarly, more interactions are introduced between the user-specific latent features W_u· and U_u· in ρW_u· + (1 − ρ)U_u·.

The algorithm. We thus have the gradients given a target numerical rating (y_ui = 1) or an auxiliary binary rating (ỹ_ui = 1) as follows:

∇b_u = −e_ui + β_u b_u, if y_ui = 1;  (6)

∇b_i = −e_ui + β_v b_i, if y_ui = 1;  (7)

∇μ = −e_ui, if y_ui = 1;  (8)

∇U_u· = −e_ui V_i· + α_u U_u·, if y_ui = 1;  (9)

∇V_i· = −e_ui Z_u· + α_v V_i·, if y_ui = 1; ∇V_i· = −λẽ_ui Z̃_u· + λα_v V_i·, if ỹ_ui = 1;  (10)

∇W_u· = −λẽ_ui V_i· + λα_w W_u·, if ỹ_ui = 1;  (11)

where Z_u· = ρU_u· + (1 − ρ)W_u· and Z̃_u· = ρW_u· + (1 − ρ)U_u·. We can see that when ρ = 1, we have Z_u· = U_u· and Z̃_u· = W_u·, which are exactly the same as in CMF. CMF is thus a special case of iTCF that only shares the item-specific latent feature matrix V, with ρ = 1. When 0 < ρ < 1, the term Z_u· = ρU_u· + (1 − ρ)W_u· in iTCF is considered a smooth version of U_u· alone in CMF, which is likely to be more stable in the stochastic gradient descent (SGD) framework. Finally, we have the update rule,

θ ← θ − γ∇θ,  (12)

where θ can be b_u, b_i, μ, U_u·, and V_i· when y_ui = 1, and V_i· and W_u· when ỹ_ui = 1. Note that γ > 0 is the learning rate. Figure 2 describes the complete algorithm with the previously discussed update rules, where it goes over all of both target and auxiliary data
T times. The time complexity of iTCF and CMF is O(T(q + q̃)d), and that of RSVD is O(Tqd), where q and q̃ are the numbers of ratings in the target and auxiliary data, respectively. The learning algorithm of iTCF is much more efficient than that of TCF, because iTCF is a stochastic algorithm while TCF is a batch one. Note that TCF can't use similar stochastic update rules because of the orthonormal constraints on the user-specific and item-specific latent feature matrices in the adopted matrix trifactorization model, and its time complexity is O(K max(q, q̃)d³ + Kd⁶), with K as the iteration number.5 The difference between TCF and our iTCF can be identified from the two fundamental questions in transfer learning.10 To answer the question of "what to transfer," TCF shares latent features, while our iTCF shares both latent features and the predictability; for "how to transfer," TCF adopts matrix trifactorization and a batch-style implementation, while our iTCF uses the more efficient matrix bifactorization and a stochastic-style implementation.
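For concreteness, here's a condensed Python sketch of the iTCF training loop in Figure 2 with the updates of Equations 6 through 12. The sizes, hyperparameters, and toy ratings are assumptions for illustration, not the article's experimental settings; binary likes/dislikes are mapped to 5 and 1, as done for the stochastic baselines later in the article.

```python
import numpy as np

# Condensed sketch of the iTCF stochastic updates (Eqs. 6-12); all sizes,
# hyperparameters, and ratings below are toy assumptions.
rng = np.random.default_rng(7)
n, m, d = 6, 8, 4
alpha_u = alpha_v = alpha_w = 0.01      # α_u, α_v, α_w
beta_u = beta_v = 0.01                  # β_u, β_v
lam, rho, gamma = 1.0, 0.5, 0.01        # λ, ρ, and learning rate γ

U = rng.normal(scale=0.1, size=(n, d))  # target user features
W = rng.normal(scale=0.1, size=(n, d))  # auxiliary user features
V = rng.normal(scale=0.1, size=(m, d))  # shared item features
b_u, b_i, mu = np.zeros(n), np.zeros(m), 3.0

target = [(0, 1, 4.0), (2, 3, 2.0), (5, 7, 5.0)]     # (u, i, r_ui)
auxiliary = [(0, 2, 5.0), (3, 3, 1.0), (5, 0, 5.0)]  # likes/dislikes as 5/1

def target_mae():
    return float(np.mean([abs(r - (mu + b_u[u] + b_i[i] + U[u] @ V[i]))
                          for u, i, r in target]))

def sgd_step(u, i, r, is_target):
    global mu
    if is_target:                       # target rating: Eqs. 6-10 (y_ui = 1)
        e = r - (mu + b_u[u] + b_i[i] + U[u] @ V[i])
        Z = rho * U[u] + (1 - rho) * W[u]
        U[u] -= gamma * (-e * V[i] + alpha_u * U[u])              # Eq. 9
        V[i] -= gamma * (-e * Z + alpha_v * V[i])                 # Eq. 10
        b_u[u] -= gamma * (-e + beta_u * b_u[u])                  # Eq. 6
        b_i[i] -= gamma * (-e + beta_v * b_i[i])                  # Eq. 7
        mu -= gamma * (-e)                                        # Eq. 8
    else:                               # auxiliary rating: Eqs. 10-11 (ỹ_ui = 1)
        e = r - W[u] @ V[i]
        Z = rho * W[u] + (1 - rho) * U[u]
        W[u] -= gamma * (-lam * e * V[i] + lam * alpha_w * W[u])  # Eq. 11
        V[i] -= gamma * (-lam * e * Z + lam * alpha_v * V[i])     # Eq. 10

mae_before = target_mae()
for t in range(50):                     # T scans over all ratings
    for u, i, r in target:
        sgd_step(u, i, r, True)
    for u, i, r in auxiliary:
        sgd_step(u, i, r, False)
mae_after = target_mae()
```

Note how a target rating also updates V[i] through Z, which mixes U[u] with the auxiliary features W[u]; this mixing is exactly the "richer interaction" that distinguishes iTCF from CMF.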
Experimental Results

Next, we tested the algorithm to determine its performance.
Table 1. Description of the Netflix subset (n = m = 5,000), MovieLens10M (n = 71,567, m = 10,681), and Flixter (n = 147,612, m = 48,794) data used in the experiments.

Dataset        Data               Form                  Sparsity (%)
Netflix        Target (training)  {1, …, 5, ?}          0.8
Netflix        Target (test)      {1, …, 5, ?}          11.3
Netflix        Auxiliary          {dislike, like, ?}    2
MovieLens10M   Target (training)  {0.5, …, 5, ?}        0.52
MovieLens10M   Target (test)      {0.5, …, 5, ?}        0.26
MovieLens10M   Auxiliary          {dislike, like, ?}    0.52
Flixter        Target (training)  {0.5, …, 5, ?}        0.046
Flixter        Target (test)      {0.5, …, 5, ?}        0.023
Flixter        Auxiliary          {dislike, like, ?}    0.046
Table 2. Prediction performance of iTCF and other methods on the Netflix subset data.*

Style       Algorithm                                         Mean absolute error (MAE)   Root mean square error (RMSE)
Batch       Probabilistic matrix factorization (PMF)          0.7642 ± 0.0003             0.9691 ± 0.0007
Batch       CMF-link                                          0.7295 ± 0.0003             0.9277 ± 0.0004
Batch       TCF (CMTF)**                                      0.6962 ± 0.0009             0.8884 ± 0.0007
Batch       TCF (CSVD)**                                      0.6877 ± 0.0007             0.8809 ± 0.0005
Stochastic  Regularized singular value decomposition (RSVD)   0.7236 ± 0.0003             0.9201 ± 0.0004
Stochastic  CMF                                               0.7054 ± 0.0002             0.9020 ± 0.0003
Stochastic  iTCF                                              0.7014 ± 0.0005             0.8966 ± 0.0004

* For stochastic algorithms, the interaction parameter ρ is fixed at 0.5, and the number of iterations is fixed at 50. Batch algorithm results are from other work.5
** CMTF = collective matrix trifactorization; CSVD = collective singular value decomposition.
Datasets and Evaluation Metrics
We extracted the first dataset from Netflix (see www.netflix.com) in the same way as that used in other work.5 The data contains three copies of numerical ratings and binary ratings assigned by 5,000 users on 5,000 items. Note that we used this small dataset for empirical studies among iTCF, TCF, and other methods, because TCF might not scale well to large datasets. We extracted the second dataset from MovieLens10M (see www.grouplens.org/node/73/) in the same way as that used in other work.11 The data contains five copies of target, auxiliary, and test data. For each copy of auxiliary data, we convert ratings smaller than four to “dislike,” and ratings 52
larger than or equal to four to "like," to simulate the binary feedback. We extracted the third dataset from Flixter (see www.cs.ubc.ca/~jamalim/datasets).12 This data contains 8.2 × 10⁶ ratings given by 1.5 × 10⁵ users on 4.9 × 10⁴ products. We preprocess the Flixter rating data in the same way as the MovieLens10M data to generate five copies of target, auxiliary, and test data. Table 1 shows the detailed statistics of the datasets used in the experiments. For iTCF, RSVD, and CMF, "dislike" and "like" are replaced with the numerical values 1 and 5, respectively, to put both target data and auxiliary data in the same rating range. We adopt two commonly used evaluation metrics in recommender
systems: mean absolute error (MAE) and root mean square error (RMSE),

MAE = Σ_{(u,i,r_ui)∈T_E} |r_ui − r̂_ui| / |T_E|,

RMSE = √( Σ_{(u,i,r_ui)∈T_E} (r_ui − r̂_ui)² / |T_E| ),
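As a sanity check, both formulas can be transcribed directly into code; the toy test set T_E below is an illustrative assumption.

```python
import math

# Direct transcription of the MAE and RMSE formulas over a test set T_E,
# given here as (true rating, predicted rating) pairs (toy values).
def mae(pairs):
    return sum(abs(r - r_hat) for r, r_hat in pairs) / len(pairs)

def rmse(pairs):
    return math.sqrt(sum((r - r_hat) ** 2 for r, r_hat in pairs) / len(pairs))

T_E = [(4.0, 3.5), (2.0, 2.5), (5.0, 4.0)]
# MAE = (0.5 + 0.5 + 1.0) / 3 = 2/3; RMSE = sqrt((0.25 + 0.25 + 1.0) / 3) = sqrt(0.5)
```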
where r_ui and r̂_ui are the true and predicted ratings, respectively, and |T_E| is the number of test ratings.

Baselines and Parameter Settings
We compare our iTCF algorithm with some batch algorithms5 on the small dataset, because batch algorithms aren't very efficient. We also compare iTCF with two stochastic algorithms, RSVD and CMF, on the aforementioned two large datasets. For iTCF, RSVD, and CMF, the model parameters μ, b_u, b_i, U_ik, V_ik, and W_ik, k = 1, …, d, are initialized exactly the same as in previous work.11 The tradeoff parameters are set similarly to those used by Yehuda Koren: α_u = α_v = α_w = 0.01 and β_u = β_v = 0.01.3 The learning rate is initialized as γ = 0.01 and decreased via γ ← γ × 0.9 over every scan of both the target data and auxiliary data.3 For the Netflix subset data, we set the number of latent features as d = 10;5 for the MovieLens10M data, we use d = 20;13 and for the Flixter data, we use d = 10.12 To study the effectiveness of interactions between user-specific latent features, we report the results of using different values of ρ ∈ {0, 0.2, 0.4, 0.6, 0.8, 1}. Note that when ρ = 1, iTCF reduces to CMF. The value of λ is fixed at 1, with the same weight on auxiliary and target data, for the MovieLens10M and Flixter data, and is fixed at 10 for the Netflix subset data.

Results

Now, let's study the results of the algorithm's performance.
Table 3. Prediction performance of iTCF and other methods on MovieLens10M and Flixter data.*

Data           Metric   RSVD              CMF               iTCF
MovieLens10M   MAE      0.6438 ± 0.0011   0.6334 ± 0.0012   0.6197 ± 0.0006
MovieLens10M   RMSE     0.8364 ± 0.0012   0.8273 ± 0.0013   0.8091 ± 0.0008
Flixter        MAE      0.6561 ± 0.0007   0.6423 ± 0.0009   0.6373 ± 0.0005
Flixter        RMSE     0.8814 ± 0.0010   0.8710 ± 0.0012   0.8636 ± 0.0010

* The interaction parameter ρ is fixed at 0.5. The number of iterations is fixed at 50.

Comparison with batch algorithms. From Table 2, we can see that the batch algorithm TCF performs better than the proposed stochastic algorithm iTCF, because TCF is able to capture the data-dependent effect and transfer the data-independent knowledge simultaneously in a principled way. The iTCF algorithm aims for efficiency and large data, and it beats the other batch algorithms, PMF and CMF-link, as well as the stochastic algorithms RSVD and CMF. The results of the batch algorithms PMF, CMF-link, and TCF shown in Table 2 are from other research.5

Comparison with stochastic algorithms. From Table 3, we can see that iTCF is again better than RSVD and CMF, which shows the effect of the richer interactions introduced between auxiliary and target data in the proposed transfer learning solution. We can also see that the transfer learning methods CMF and iTCF are both better than RSVD, which shows the usefulness of the auxiliary data and the effectiveness of the knowledge transfer mechanisms in CMF and iTCF.

Impact of the interaction parameter (ρ). From Figure 3, we can see that iTCF performs best when 0.2 ≤ ρ ≤ 0.4, which shows that a relatively strong interaction is useful. Note that when ρ = 1, iTCF reduces to CMF, with no interactions between user-specific latent features.

Figure 3. Prediction performance of iTCF on (a) MovieLens10M data and (b) Flixter data with different ρ values. The number of iterations is fixed at 50.

In this article, we propose a novel and efficient transfer learning algorithm, iTCF, for collaborative filtering with heterogeneous user feedback. Our iTCF aims to transfer knowledge from auxiliary binary ratings of likes and dislikes to improve the target numerical rating prediction performance in an efficient way. Our iTCF algorithm achieves this via introducing richer interactions by sharing both the item-specific
latent features and the predictability in two heterogeneous data in a smooth manner. Our iTCF is more efficient than a recent batch algorithm—that is, TCF—and performs better than two state-of-the-art stochastic algorithms— that is, RSVD and CMF. For future work, we’re interested in generalizing the idea of introducing rich interactions in heterogeneous user feedback to the problem of collaborative filtering with auxiliary information of social context and implicit feedback.14
Acknowledgments
We thank the National Natural Science Foundation of China (no. 61170077 and no. 61272303), NSF GD (no. 10351806001000000), GDS&T (no. 2012B091100198), S&T project of SZ (no. JCYJ20130326110956468), and the National Basic Research Program of China (973 Plan, no. 2010CB327903) for their support. Zhong Ming is the corresponding author for this work.
References

1. G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 6, 2005, pp. 734–749.
2. D. Goldberg et al., "Using Collaborative Filtering to Weave an Information Tapestry," Comm. ACM, vol. 35, no. 12, 1992, pp. 61–70.
3. Y. Koren, "Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2008, pp. 426–434.
4. S. Rendle, "Factorization Machines with libFM," ACM Trans. Intelligent Systems and Technology, vol. 3, no. 3, 2012, pp. 57:1–57:22.
5. W. Pan and Q. Yang, "Transfer Learning in Heterogeneous Collaborative Filtering Domains," Artificial Intelligence, vol. 197, Apr. 2013, pp. 39–55.
6. A.P. Singh and G.J. Gordon, "Relational Learning via Collective Matrix Factorization," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2008, pp. 650–658.
7. R. Salakhutdinov and A. Mnih, "Probabilistic Matrix Factorization," Proc. Ann. Conf. Neural Information Processing Systems, 2008, pp. 1257–1264.
8. R. Gemulla et al., "Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent," Proc. 17th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2011, pp. 69–77.
9. J. Wang et al., "Online Multi-Task Collaborative Filtering for On-the-Fly Recommender Systems," Proc. 7th ACM Conf. Recommender Systems, 2013, pp. 237–244.
10. S.J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Trans. Knowledge and Data Eng., vol. 22, no. 10, 2010, pp. 1345–1359.
11. W. Pan, E.W. Xiang, and Q. Yang, "Transfer Learning in Collaborative Filtering via Uncertain Ratings," Proc. 26th AAAI Conf. Artificial Intelligence, 2012.
12. M. Jamali and M. Ester, "A Matrix Factorization Technique with Trust Propagation for Recommendation in Social Networks," Proc. 4th ACM Conf. Recommender Systems, 2010, pp. 135–142.
13. T.C. Zhou et al., "TagRec: Leveraging Tagging Wisdom for Recommendation," Proc. 2009 Int'l Conf. Computational Science and Eng., 2009, pp. 194–199.
14. N.N. Liu, L. He, and M. Zhao, "Social Temporal Collaborative Ranking for Context Aware Movie Recommendation," ACM Trans. Intelligent Systems and Technology, vol. 4, no. 1, 2013, pp. 15:1–15:26.

The Authors

Weike Pan is a lecturer with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include transfer learning, recommender systems, and statistical machine learning. Pan has a PhD in computer science and engineering from the Hong Kong University of Science and Technology, Kowloon, Hong Kong. Contact him at [email protected].

Zhong Ming is a professor with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include software engineering and Web intelligence. Ming has a PhD in computer science and technology from Sun Yat-Sen University, Guangzhou, Guangdong, China. He is the corresponding author. Contact him at [email protected].