SLIM: Sparse Linear Methods for Top-N Recommender Systems
Xia Ning and George Karypis
Computer Science & Engineering, University of Minnesota, Minneapolis, MN
Email: {xning, karypis}@cs.umn.edu
December 14, 2011
Outline
1. Introduction: Top-N Recommender Systems; Definitions and Notations; The State-of-the-Art Methods
2. Methods: Sparse LInear Methods (SLIM) for Top-N Recommendation; Learning W for SLIM; SLIM with Feature Selection
3. Materials
4. Experimental Results: SLIM on Binary Data (Top-N Recommendation Performance, Long-Tail Distribution, Regularization Effects); SLIM on Rating Data
5. Conclusions
Top-N Recommender Systems
- Top-N recommendation
  - E-commerce: huge numbers of products
  - Recommend a short ranked list of items to each user
- Top-N recommender systems
  - Neighborhood-based collaborative filtering (CF)
    - Item-based [2]: fast to generate recommendations, but low recommendation quality
  - Model-based methods [1, 3, 5]
    - Matrix factorization (MF) models: slow to learn the models, but high recommendation quality
- SLIM: Sparse LInear Methods
  - Fast, with high recommendation quality
Definitions and Notations

Table 1: Definitions and Notations

Def     Description
u_i     user
t_j     item
U       the set of all users (|U| = n)
T       the set of all items (|T| = m)
A       the user-item purchase/rating matrix, size n × m
W       the item-item similarity matrix / coefficient matrix
a_i^T   the i-th row of A: the purchase/rating history of u_i on T
a_j     the j-th column of A: the purchase/rating history of U on t_j

- Row vectors carry the transpose superscript ^T; all other vectors are column vectors by default.
- Matrix/vector notation is used in place of user/item purchase/rating profiles.
The State-of-the-Art Methods: Item-based Collaborative Filtering (1)
- Item-based k-nearest-neighbor (itemkNN) CF
  - Identify, for each item, a set of similar items (its nearest neighbors)
  - Item-item similarity:
    - calculated from the columns of A
    - cosine similarity measure
(Figure: the binary user-item matrix A and the item-item similarity matrix W; each column of W keeps only the similarity values s of that item's nearest neighbors, e.g., its 1st and 2nd nearest neighbors.)
The State-of-the-Art Methods: Item-based Collaborative Filtering (2)

(Figure: the predicted score vector ã_i^T computed as the sparse vector-matrix product a_i^T × W.)
- itemkNN recommendation
  - Recommend items similar to those the user has already purchased: ã_i^T = a_i^T W
  - Fast: sparse item neighborhoods
  - Low quality: no knowledge is learned from the data
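The scoring step above is just a sparse vector-matrix product. Below is a minimal itemkNN sketch, not the authors' code: it assumes a SciPy sparse user-item matrix, builds W from cosine similarities between item columns, and keeps k neighbors per item; the function name itemknn_model is this sketch's own.

```python
# itemkNN sketch: cosine item-item similarities from the columns of A,
# sparsified so each item keeps only its k nearest neighbors.
import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize

def itemknn_model(A, k=20):
    """A: sparse n x m user-item matrix; returns a sparse m x m matrix W."""
    A_norm = normalize(A, norm="l2", axis=0)          # unit-length item columns
    S = np.asarray((A_norm.T @ A_norm).todense())     # m x m cosine similarities
    np.fill_diagonal(S, 0.0)                          # an item is not its own neighbor
    W = np.zeros_like(S)                              # dense here for clarity only
    for j in range(S.shape[1]):                       # keep k largest entries per column
        nn = np.argpartition(S[:, j], -k)[-k:]
        W[nn, j] = S[nn, j]
    return sparse.csc_matrix(W)

# recommendation for user i: scores = A[i] @ W, then rank the
# top-N items the user has not purchased yet
```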
The State-of-the-Art Methods: Matrix Factorization (1)
- Latent factor models
  - Factorize A into low-rank user factors U and item factors V^T
  - U and V^T represent user and item characteristics in a common latent space
- Formulated as an optimization problem:

  \min_{U, V^T} \frac{1}{2}\|A - UV^T\|_F^2 + \frac{\beta}{2}\|U\|_F^2 + \frac{\lambda}{2}\|V^T\|_F^2

(Figure: the n × m matrix A factorized into the n × k user-factor matrix U times the k × m item-factor matrix V^T.)
The State-of-the-Art Methods: Matrix Factorization (2)

(Figure: a user's predicted score vector computed as the product of the user's row of latent factors u_*^T in U with the item-factor matrix V^T.)
- MF recommendation
  - Prediction: dot product in the latent space, ã_ij = U_i^T · V_j
  - Slow: U and V^T are dense
  - High quality: user tastes and item properties are learned
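As a companion to the objective above, here is a hedged sketch that optimizes it with alternating least squares, one standard choice among several (the slides do not prescribe a solver); mf_als is this sketch's name, not the authors'.

```python
# Alternating least squares for the regularized MF objective above:
# each factor matrix has a closed-form update while the other is held fixed.
import numpy as np

def mf_als(A, k=32, beta=0.1, lam=0.1, iters=20, seed=0):
    """A: dense n x m matrix; returns U (n x k) and V (m x k, so V^T is V.T)."""
    n, m = A.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.01, size=(n, k))
    V = rng.normal(scale=0.01, size=(m, k))
    I = np.eye(k)
    for _ in range(iters):
        U = A @ V @ np.linalg.inv(V.T @ V + beta * I)   # update user factors
        V = A.T @ U @ np.linalg.inv(U.T @ U + lam * I)  # update item factors
    return U, V

# predicted score a~_ij = U[i] @ V[j]; because U and V are dense,
# this prediction step is what makes MF recommendation slow
```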
SLIM for Top-N Recommendation
- Motivations:
  - recommendations generated fast
  - high-quality recommendations
  - "have my cake and eat it too"
- Key ideas:
  - retain the nature of itemkNN: a sparse W
  - optimize for recommendation performance: learn W from A
    - both the sparsity structure and the coefficient values of W are learned
Learning W for SLIM
- The optimization problem:

  \min_{W} \frac{1}{2}\|A - AW\|_F^2 + \frac{\beta}{2}\|W\|_F^2 + \lambda\|W\|_1
  \text{subject to } W \ge 0, \; \mathrm{diag}(W) = 0    (1)

- Computing W:
  - The columns of W are independent: easy to parallelize
  - The decoupled per-column problems:

  \min_{w_j} \frac{1}{2}\|a_j - A w_j\|_2^2 + \frac{\beta}{2}\|w_j\|_2^2 + \lambda\|w_j\|_1
  \text{subject to } w_j \ge 0, \; w_{j,j} = 0    (2)
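Problem (2) is an elastic-net regression with a non-negativity and a zero-diagonal constraint, so a minimal sketch can lean on scikit-learn's coordinate-descent ElasticNet; the mapping between (β, λ) and ElasticNet's (alpha, l1_ratio), up to the 1/n scaling of its loss, and all names below are this sketch's assumptions, not the authors' implementation.

```python
# SLIM sketch: fit each column w_j of W with a non-negative elastic net.
# The w_{j,j} = 0 constraint is enforced by zeroing column j of A during the fit.
import numpy as np
from scipy import sparse
from sklearn.linear_model import ElasticNet

def slim_train(A, l1_reg=1e-3, l2_reg=1e-3):
    """A: sparse n x m user-item matrix; returns the sparse m x m matrix W."""
    A = sparse.csc_matrix(A, dtype=np.float64)
    m = A.shape[1]
    alpha = l1_reg + l2_reg
    model = ElasticNet(alpha=alpha, l1_ratio=l1_reg / alpha,
                       positive=True,            # enforces w_j >= 0
                       fit_intercept=False, max_iter=100)
    cols = []
    for j in range(m):                            # independent, parallelizable fits
        a_j = A[:, j].toarray().ravel()
        saved = A.data[A.indptr[j]:A.indptr[j + 1]].copy()
        A.data[A.indptr[j]:A.indptr[j + 1]] = 0.0   # temporarily remove t_j from A
        model.fit(A, a_j)
        A.data[A.indptr[j]:A.indptr[j + 1]] = saved  # restore column j
        cols.append(sparse.csc_matrix(model.coef_.reshape(-1, 1)))
    return sparse.hstack(cols, format="csc")
```

Because the column fits are independent, a joblib.Parallel loop over j is a natural drop-in for the serial loop above.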
SLIM with Feature Selection: Reducing Model Learning Time
- fsSLIM: SLIM with feature selection
  - Prescribe the potential non-zero structure of w_j
  - Select a subset of columns from A, using the itemkNN item-item similarity matrix, to form a smaller matrix A'
- The reduced per-column problem:

  \min_{w_j} \frac{1}{2}\|a_j - A' w_j\|_2^2 + \frac{\beta}{2}\|w_j\|_2^2 + \lambda\|w_j\|_1

(Figure: the columns of A most similar to a_j are selected to form A'.)
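A hedged fsSLIM sketch under the same assumptions as the SLIM sketch above: cosine similarity picks each column's candidate neighbors, and the elastic net is fit only on those columns; fs_slim_train is this sketch's name.

```python
# fsSLIM sketch: restrict each regression to the k most similar columns of A.
import numpy as np
from scipy import sparse
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import normalize

def fs_slim_train(A, k=100, l1_reg=1e-3, l2_reg=1e-3):
    A = sparse.csc_matrix(A, dtype=np.float64)
    m = A.shape[1]
    A_norm = normalize(A, axis=0)
    S = np.asarray((A_norm.T @ A_norm).todense())    # itemkNN cosine similarities
    alpha = l1_reg + l2_reg
    model = ElasticNet(alpha=alpha, l1_ratio=l1_reg / alpha,
                       positive=True, fit_intercept=False, max_iter=100)
    W = sparse.lil_matrix((m, m))
    for j in range(m):
        S[j, j] = 0.0                                # t_j cannot predict itself
        feats = np.argpartition(S[:, j], -k)[-k:]    # the k most similar items
        model.fit(A[:, feats], A[:, j].toarray().ravel())
        W[feats, j] = model.coef_
    return W.tocsc()
```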
Datasets, Evaluation Methodology and Metrics

Table 2: The Datasets Used in Evaluation

dataset   #users   #items   #trns        rsize    csize    density   ratings
ccard     42,067   18,004   308,420      7.33     17.13    0.04%     -
ctlg2     22,505   17,096   1,814,072    80.61    106.11   0.47%     -
ctlg3     58,565   37,841   453,219      7.74     11.98    0.02%     -
ecmrc     6,594    3,972    50,372       7.64     12.68    0.19%     -
BX        3,586    7,602    84,981       23.70    11.18    0.31%     1-10
ML10M     69,878   10,677   10,000,054   143.11   936.60   1.34%     1-10
Netflix   39,884   8,478    1,256,115    31.49    148.16   0.37%     1-5
Yahoo     85,325   55,371   3,973,104    46.56    71.75    0.08%     1-5

(#trns: number of transactions; rsize and csize: average transactions per user and per item, i.e., #trns/#users and #trns/#items; density: #trns/(#users × #items). The first four datasets are purchase data; the last four carry ratings.)
- Datasets: 8 real datasets of 2 categories (purchase data and rating data)
- Evaluation methodology: leave-one-out cross-validation
- Evaluation metrics:
  - Hit Rate:
    \mathrm{HR} = \frac{\#\text{hits}}{\#\text{users}}
  - Average Reciprocal Hit-Rank (ARHR) [2]:
    \mathrm{ARHR} = \frac{1}{\#\text{users}} \sum_{i=1}^{\#\text{hits}} \frac{1}{p_i}
    where p_i is the position of the hidden item in the ranked top-N list for the i-th hit
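A minimal evaluation sketch of the two metrics above, assuming one held-out item per user (leave-one-out) and a hypothetical topn(user) callable that returns the ranked top-N list; evaluate is this sketch's name.

```python
# Leave-one-out evaluation: HR counts users whose held-out item appears in
# their top-N list; ARHR additionally rewards hits ranked nearer the top.
def evaluate(topn, held_out):
    """held_out: dict mapping each user to their single left-out item."""
    hits, reciprocal_ranks = 0, 0.0
    for user, item in held_out.items():
        ranked = topn(user)                  # ranked list of N item ids
        if item in ranked:
            hits += 1
            reciprocal_ranks += 1.0 / (ranked.index(item) + 1)  # 1-based p_i
    n_users = len(held_out)
    return hits / n_users, reciprocal_ranks / n_users           # HR, ARHR
```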
SLIM on Binary Data: Top-N Recommendation Performance

(Figures 1-4: HR, ARHR, model learning time (s), and testing time (s) on the ccard, ecmrc, and Netflix datasets, comparing itemkNN, itemprob, userkNN, PureSVD, WRMF, BPRMF, BPRkNN, SLIM, and fsSLIM.)
SLIM on Binary Data: SLIM for Long-Tail Distribution

(Figure 5: the rating distribution in ML10M, with a short head of popular items accounting for most purchases/ratings and a long tail of unpopular items. Figures 6-7: HR and ARHR on the ML10M tail, comparing itemkNN, itemprob, userkNN, PureSVD, WRMF, BPRMF, BPRkNN, SLIM, and fsSLIM.)

- SLIM outperforms the other methods on the long tail.
SLIM on Binary Data: SLIM Recommendations for Different Top-N

(Figures 8-9: HR as a function of N (5 to 25) on BX and Netflix, comparing itemkNN, itemprob, userkNN, PureSVD, WRMF, BPRMF, BPRkNN, and SLIM.)

- The performance difference between SLIM and the best of the other methods is larger for smaller values of N.
- SLIM tends to rank the most relevant items higher than the other methods do.
SLIM on Binary Data: SLIM Regularization Effects

(Figure 10: HR and recommendation time (s) on BX as β and λ each vary over 0.0, 0.5, 1.0, 2.0, 3.0, 5.0 in the objective below.)

  \min_{W} \frac{1}{2}\|A - AW\|_F^2 + \frac{\beta}{2}\|W\|_F^2 + \lambda\|W\|_1

- As stronger ℓ1-norm regularization (larger λ) is applied, recommendation time drops, indicating that the learned W becomes sparser.
- The best recommendation quality is achieved when both regularization parameters β and λ are non-zero.
- The recommendation quality changes smoothly as β and λ change.
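For concreteness, a sweep like the one behind Figure 10 could look like the sketch below; it reuses slim_train and evaluate from the earlier sketches, and A_train, held_out, and topn_from stand in for a leave-one-out split and a scoring helper, so treat it as an illustrative harness rather than the authors' experiment code.

```python
# Grid over (beta, lambda), recording HR, recommendation time, and the
# sparsity of the learned W (fewer nonzeros -> faster recommendation).
import time
import numpy as np

def topn_from(A, W, user, N=10):
    """Score one user's row against W and return the top-N unseen items."""
    scores = np.asarray((A[user] @ W).todense()).ravel()
    scores[A[user].toarray().ravel() > 0] = -np.inf   # drop already-seen items
    return list(np.argsort(-scores)[:N])

for beta in (0.5, 1.0, 2.0, 3.0, 5.0):
    for lam in (0.5, 1.0, 2.0, 3.0, 5.0):
        W = slim_train(A_train, l1_reg=lam, l2_reg=beta)
        start = time.time()
        hr, arhr = evaluate(lambda u: topn_from(A_train, W, u), held_out)
        print(f"beta={beta} lambda={lam} HR={hr:.3f} "
              f"rec_time={time.time() - start:.1f}s nnz(W)={W.nnz}")
```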
SLIM on Rating Data: Top-N Recommendation Performance

(Figure 11: the rating distribution on Netflix and per-rating hit rates (rHR) for ratings 1-5, comparing PureSVD, WRMF, BPRkNN, and SLIM, each trained on the ratings themselves (-r) and on binarized data (-b).)

- Evaluation metric: per-rating Hit Rate (rHR)
- All the -r methods produce higher hit rates on items with higher ratings.
- The -r methods outperform the -b methods on highly rated items.
- SLIM-r consistently outperforms the other methods on items with higher ratings.
Conclusions
- SLIM: a Sparse LInear Method for top-N recommendation
- The recommendation score for a new item is calculated as an aggregation of the user's other items
- A sparse aggregation coefficient matrix W is learned, which makes the aggregation very fast
- W is learned by solving an ℓ1-norm and ℓ2-norm regularized optimization problem, so that sparsity is introduced into W
- SLIM is both fast and effective
References

[1] P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-N recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys '10), pages 39-46, New York, NY, USA, 2010. ACM.
[2] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22:143-177, January 2004.
[3] Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the Eighth IEEE International Conference on Data Mining (ICDM '08), pages 263-272, Washington, DC, USA, 2008. IEEE Computer Society.
[4] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI '09), pages 452-461, Arlington, VA, USA, 2009. AUAI Press.
[5] V. Sindhwani, S. S. Bucak, J. Hu, and A. Mojsilovic. One-class matrix completion with low-density factorizations. In Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM '10), pages 1055-1060, Washington, DC, USA, 2010. IEEE Computer Society.
[6] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267-288, 1996.
Thank You!