SLIM: Sparse Linear Methods for Top-N Recommender Systems
Xia Ning and George Karypis
Computer Science & Engineering, University of Minnesota, Minneapolis, MN
Email: {xning, karypis}@cs.umn.edu
December 14, 2011
Outline
1. Introduction: Top-N Recommender Systems; Definitions and Notations; The State-of-the-Art Methods
2. Methods: Sparse LInear Methods (SLIM) for Top-N Recommendation; Learning W for SLIM; SLIM with Feature Selection
3. Materials
4. Experimental Results: SLIM on Binary Data (Top-N Recommendation Performance, Long-Tail Distribution, Regularization Effects); SLIM on Rating Data
5. Conclusions
Top-N Recommender Systems
- Top-N recommendation
  - E-commerce: huge numbers of products
  - Recommend a short ranked list of items to each user
- Top-N recommender systems
  - Neighborhood-based collaborative filtering (CF)
    - Item-based [2]: fast to generate recommendations, but low recommendation quality
  - Model-based methods [1, 3, 5]
    - Matrix factorization (MF) models: slow to learn the models, but high recommendation quality
- SLIM: Sparse LInear Methods
  - Fast, with high recommendation quality
Definitions and Notations

Table 1: Definitions and Notations

Def     Description
u_i     user
t_j     item
U       the set of all users (|U| = n)
T       the set of all items (|T| = m)
A       the user-item purchase/rating matrix, size n × m
W       the item-item similarity matrix / coefficient matrix
a_i^T   the i-th row of A: the purchase/rating history of u_i on T
a_j     the j-th column of A: the purchase/rating history of U on t_j

- Row vectors carry the transpose superscript ^T; all other vectors are column vectors by default.
- Matrix/vector notation is used in place of user/item purchase/rating profiles.
The State-of-the-Art Methods: Item-based Collaborative Filtering (1)
- Item-based k-nearest-neighbor (itemkNN) CF
  - Identify, for each item, a set of similar items (its nearest neighbors)
  - Item-item similarity:
    - calculated from the columns of A
    - cosine similarity measure
(Figure: the binary user-item matrix A and the item-item similarity matrix W; each column of W keeps only the similarity values s of that item's nearest neighbors, e.g., its 1st and 2nd nearest neighbors.)
The State-of-the-Art Methods: Item-based Collaborative Filtering (2)

(Figure: the predicted score vector ã_i^T computed as the sparse vector-matrix product a_i^T × W.)
- itemkNN recommendation
  - Recommend items similar to those the user has already purchased: ã_i^T = a_i^T W
  - Fast: sparse item neighborhoods
  - Low quality: no knowledge is learned from the data
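The scoring step above is just a sparse vector-matrix product. Below is a minimal itemkNN sketch, not the authors' code: it assumes a SciPy sparse user-item matrix, builds W from cosine similarities between item columns, and keeps k neighbors per item; the function name itemknn_model is this sketch's own.

```python
# itemkNN sketch: cosine item-item similarities from the columns of A,
# sparsified so each item keeps only its k nearest neighbors.
import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize

def itemknn_model(A, k=20):
    """A: sparse n x m user-item matrix; returns a sparse m x m matrix W."""
    A_norm = normalize(A, norm="l2", axis=0)          # unit-length item columns
    S = np.asarray((A_norm.T @ A_norm).todense())     # m x m cosine similarities
    np.fill_diagonal(S, 0.0)                          # an item is not its own neighbor
    W = np.zeros_like(S)                              # dense here for clarity only
    for j in range(S.shape[1]):                       # keep k largest entries per column
        nn = np.argpartition(S[:, j], -k)[-k:]
        W[nn, j] = S[nn, j]
    return sparse.csc_matrix(W)

# recommendation for user i: scores = A[i] @ W, then rank the
# top-N items the user has not purchased yet
```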
The State-of-the-Art Methods: Matrix Factorization (1)
- Latent factor models
  - Factorize A into low-rank user factors U and item factors V^T
  - U and V^T represent user and item characteristics in a common latent space
- Formulated as an optimization problem:

  \min_{U, V^T} \frac{1}{2}\|A - UV^T\|_F^2 + \frac{\beta}{2}\|U\|_F^2 + \frac{\lambda}{2}\|V^T\|_F^2

(Figure: the n × m matrix A factorized into the n × k user-factor matrix U times the k × m item-factor matrix V^T.)
The State-of-the-Art Methods: Matrix Factorization (2)

(Figure: a user's predicted score vector computed as the product of the user's row of latent factors u_*^T in U with the item-factor matrix V^T.)
- MF recommendation
  - Prediction: dot product in the latent space, ã_ij = U_i^T · V_j
  - Slow: U and V^T are dense
  - High quality: user tastes and item properties are learned
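As a companion to the objective above, here is a hedged sketch that optimizes it with alternating least squares, one standard choice among several (the slides do not prescribe a solver); mf_als is this sketch's name, not the authors'.

```python
# Alternating least squares for the regularized MF objective above:
# each factor matrix has a closed-form update while the other is held fixed.
import numpy as np

def mf_als(A, k=32, beta=0.1, lam=0.1, iters=20, seed=0):
    """A: dense n x m matrix; returns U (n x k) and V (m x k, so V^T is V.T)."""
    n, m = A.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.01, size=(n, k))
    V = rng.normal(scale=0.01, size=(m, k))
    I = np.eye(k)
    for _ in range(iters):
        U = A @ V @ np.linalg.inv(V.T @ V + beta * I)   # update user factors
        V = A.T @ U @ np.linalg.inv(U.T @ U + lam * I)  # update item factors
    return U, V

# predicted score a~_ij = U[i] @ V[j]; because U and V are dense,
# this prediction step is what makes MF recommendation slow
```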
SLIM for Top-N Recommendation
- Motivations:
  - recommendations generated fast
  - high-quality recommendations
  - "have my cake and eat it too"
- Key ideas:
  - retain the nature of itemkNN: a sparse W
  - optimize for recommendation performance: learn W from A
    - both the sparsity structure and the coefficient values of W are learned
Learning W for SLIM
- The optimization problem:

  \min_{W} \frac{1}{2}\|A - AW\|_F^2 + \frac{\beta}{2}\|W\|_F^2 + \lambda\|W\|_1
  \text{subject to } W \ge 0, \; \mathrm{diag}(W) = 0    (1)

- Computing W:
  - The columns of W are independent: easy to parallelize
  - The decoupled per-column problems:

  \min_{w_j} \frac{1}{2}\|a_j - A w_j\|_2^2 + \frac{\beta}{2}\|w_j\|_2^2 + \lambda\|w_j\|_1
  \text{subject to } w_j \ge 0, \; w_{j,j} = 0    (2)
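Problem (2) is an elastic-net regression with a non-negativity and a zero-diagonal constraint, so a minimal sketch can lean on scikit-learn's coordinate-descent ElasticNet; the mapping between (β, λ) and ElasticNet's (alpha, l1_ratio), up to the 1/n scaling of its loss, and all names below are this sketch's assumptions, not the authors' implementation.

```python
# SLIM sketch: fit each column w_j of W with a non-negative elastic net.
# The w_{j,j} = 0 constraint is enforced by zeroing column j of A during the fit.
import numpy as np
from scipy import sparse
from sklearn.linear_model import ElasticNet

def slim_train(A, l1_reg=1e-3, l2_reg=1e-3):
    """A: sparse n x m user-item matrix; returns the sparse m x m matrix W."""
    A = sparse.csc_matrix(A, dtype=np.float64)
    m = A.shape[1]
    alpha = l1_reg + l2_reg
    model = ElasticNet(alpha=alpha, l1_ratio=l1_reg / alpha,
                       positive=True,            # enforces w_j >= 0
                       fit_intercept=False, max_iter=100)
    cols = []
    for j in range(m):                            # independent, parallelizable fits
        a_j = A[:, j].toarray().ravel()
        saved = A.data[A.indptr[j]:A.indptr[j + 1]].copy()
        A.data[A.indptr[j]:A.indptr[j + 1]] = 0.0   # temporarily remove t_j from A
        model.fit(A, a_j)
        A.data[A.indptr[j]:A.indptr[j + 1]] = saved  # restore column j
        cols.append(sparse.csc_matrix(model.coef_.reshape(-1, 1)))
    return sparse.hstack(cols, format="csc")
```

Because the column fits are independent, a joblib.Parallel loop over j is a natural drop-in for the serial loop above.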
SLIM with Feature Selection: Reducing Model Learning Time
- fsSLIM: SLIM with feature selection
  - Prescribe the potential non-zero structure of w_j
  - Select a subset of columns from A, using the itemkNN item-item similarity matrix, to form a smaller matrix A'
- The reduced per-column problem:

  \min_{w_j} \frac{1}{2}\|a_j - A' w_j\|_2^2 + \frac{\beta}{2}\|w_j\|_2^2 + \lambda\|w_j\|_1

(Figure: the columns of A most similar to a_j are selected to form A'.)
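A hedged fsSLIM sketch under the same assumptions as the SLIM sketch above: cosine similarity picks each column's candidate neighbors, and the elastic net is fit only on those columns; fs_slim_train is this sketch's name.

```python
# fsSLIM sketch: restrict each regression to the k most similar columns of A.
import numpy as np
from scipy import sparse
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import normalize

def fs_slim_train(A, k=100, l1_reg=1e-3, l2_reg=1e-3):
    A = sparse.csc_matrix(A, dtype=np.float64)
    m = A.shape[1]
    A_norm = normalize(A, axis=0)
    S = np.asarray((A_norm.T @ A_norm).todense())    # itemkNN cosine similarities
    alpha = l1_reg + l2_reg
    model = ElasticNet(alpha=alpha, l1_ratio=l1_reg / alpha,
                       positive=True, fit_intercept=False, max_iter=100)
    W = sparse.lil_matrix((m, m))
    for j in range(m):
        S[j, j] = 0.0                                # t_j cannot predict itself
        feats = np.argpartition(S[:, j], -k)[-k:]    # the k most similar items
        model.fit(A[:, feats], A[:, j].toarray().ravel())
        W[feats, j] = model.coef_
    return W.tocsc()
```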
Datasets, Evaluation Methodology and Metrics

Table 2: The Datasets Used in Evaluation

dataset   #users   #items   #trns        rsize    csize    density   ratings
ccard     42,067   18,004   308,420      7.33     17.13    0.04%     -
ctlg2     22,505   17,096   1,814,072    80.61    106.11   0.47%     -
ctlg3     58,565   37,841   453,219      7.74     11.98    0.02%     -
ecmrc     6,594    3,972    50,372       7.64     12.68    0.19%     -
BX        3,586    7,602    84,981       23.70    11.18    0.31%     1-10
ML10M     69,878   10,677   10,000,054   143.11   936.60   1.34%     1-10
Netflix   39,884   8,478    1,256,115    31.49    148.16   0.37%     1-5
Yahoo     85,325   55,371   3,973,104    46.56    71.75    0.08%     1-5

(#trns: number of transactions; rsize and csize: average transactions per user and per item, i.e., #trns/#users and #trns/#items; density: #trns/(#users × #items). The first four datasets are purchase data; the last four carry ratings.)
- Datasets: 8 real datasets of 2 categories (purchase data and rating data)
- Evaluation methodology: leave-one-out cross-validation
- Evaluation metrics:
  - Hit Rate:
    \mathrm{HR} = \frac{\#\text{hits}}{\#\text{users}}
  - Average Reciprocal Hit-Rank (ARHR) [2]:
    \mathrm{ARHR} = \frac{1}{\#\text{users}} \sum_{i=1}^{\#\text{hits}} \frac{1}{p_i}
    where p_i is the position of the hidden item in the ranked top-N list for the i-th hit
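A minimal evaluation sketch of the two metrics above, assuming one held-out item per user (leave-one-out) and a hypothetical topn(user) callable that returns the ranked top-N list; evaluate is this sketch's name.

```python
# Leave-one-out evaluation: HR counts users whose held-out item appears in
# their top-N list; ARHR additionally rewards hits ranked nearer the top.
def evaluate(topn, held_out):
    """held_out: dict mapping each user to their single left-out item."""
    hits, reciprocal_ranks = 0, 0.0
    for user, item in held_out.items():
        ranked = topn(user)                  # ranked list of N item ids
        if item in ranked:
            hits += 1
            reciprocal_ranks += 1.0 / (ranked.index(item) + 1)  # 1-based p_i
    n_users = len(held_out)
    return hits / n_users, reciprocal_ranks / n_users           # HR, ARHR
```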
SLIM on Binary Data: Top-N Recommendation Performance

(Figures 1-4: HR, ARHR, model learning time (s), and testing time (s) on the ccard, ecmrc, and Netflix datasets, comparing itemkNN, itemprob, userkNN, PureSVD, WRMF, BPRMF, BPRkNN, SLIM, and fsSLIM.)
SLIM on Binary Data: SLIM for Long-Tail Distribution

(Figure 5: the rating distribution in ML10M, with a short head of popular items accounting for most purchases/ratings and a long tail of unpopular items. Figures 6-7: HR and ARHR on the ML10M tail, comparing itemkNN, itemprob, userkNN, PureSVD, WRMF, BPRMF, BPRkNN, SLIM, and fsSLIM.)

- SLIM outperforms the other methods on the long tail.
SLIM on Binary Data: SLIM Recommendations for Different Top-N

(Figures 8-9: HR as a function of N (5 to 25) on BX and Netflix, comparing itemkNN, itemprob, userkNN, PureSVD, WRMF, BPRMF, BPRkNN, and SLIM.)

- The performance difference between SLIM and the best of the other methods is larger for smaller values of N.
- SLIM tends to rank the most relevant items higher than the other methods do.
SLIM on Binary Data: SLIM Regularization Effects

(Figure 10: HR and recommendation time (s) on BX as β and λ each vary over 0.0, 0.5, 1.0, 2.0, 3.0, 5.0 in the objective below.)

  \min_{W} \frac{1}{2}\|A - AW\|_F^2 + \frac{\beta}{2}\|W\|_F^2 + \lambda\|W\|_1

- As stronger ℓ1-norm regularization (larger λ) is applied, recommendation time drops, indicating that the learned W becomes sparser.
- The best recommendation quality is achieved when both regularization parameters β and λ are non-zero.
- The recommendation quality changes smoothly as β and λ change.
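For concreteness, a sweep like the one behind Figure 10 could look like the sketch below; it reuses slim_train and evaluate from the earlier sketches, and A_train, held_out, and topn_from stand in for a leave-one-out split and a scoring helper, so treat it as an illustrative harness rather than the authors' experiment code.

```python
# Grid over (beta, lambda), recording HR, recommendation time, and the
# sparsity of the learned W (fewer nonzeros -> faster recommendation).
import time
import numpy as np

def topn_from(A, W, user, N=10):
    """Score one user's row against W and return the top-N unseen items."""
    scores = np.asarray((A[user] @ W).todense()).ravel()
    scores[A[user].toarray().ravel() > 0] = -np.inf   # drop already-seen items
    return list(np.argsort(-scores)[:N])

for beta in (0.5, 1.0, 2.0, 3.0, 5.0):
    for lam in (0.5, 1.0, 2.0, 3.0, 5.0):
        W = slim_train(A_train, l1_reg=lam, l2_reg=beta)
        start = time.time()
        hr, arhr = evaluate(lambda u: topn_from(A_train, W, u), held_out)
        print(f"beta={beta} lambda={lam} HR={hr:.3f} "
              f"rec_time={time.time() - start:.1f}s nnz(W)={W.nnz}")
```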
SLIM on Rating Data: Top-N Recommendation Performance

(Figure 11: the rating distribution on Netflix and per-rating hit rates (rHR) for ratings 1-5, comparing PureSVD, WRMF, BPRkNN, and SLIM, each trained on the ratings themselves (-r) and on binarized data (-b).)

- Evaluation metric: per-rating Hit Rate (rHR)
- All the -r methods produce higher hit rates on items with higher ratings.
- The -r methods outperform the -b methods on highly rated items.
- SLIM-r consistently outperforms the other methods on items with higher ratings.
Conclusions
- SLIM: a Sparse LInear Method for top-N recommendation
- The recommendation score for a new item is calculated as an aggregation of the user's other items
- A sparse aggregation coefficient matrix W is learned, which makes the aggregation very fast
- W is learned by solving an ℓ1-norm and ℓ2-norm regularized optimization problem, so that sparsity is introduced into W
- SLIM is both fast and effective
References

[1] P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-N recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys '10), pages 39-46, New York, NY, USA, 2010. ACM.
[2] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22:143-177, January 2004.
[3] Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the Eighth IEEE International Conference on Data Mining (ICDM '08), pages 263-272, Washington, DC, USA, 2008. IEEE Computer Society.
[4] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI '09), pages 452-461, Arlington, VA, USA, 2009. AUAI Press.
[5] V. Sindhwani, S. S. Bucak, J. Hu, and A. Mojsilovic. One-class matrix completion with low-density factorizations. In Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM '10), pages 1055-1060, Washington, DC, USA, 2010. IEEE Computer Society.
[6] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267-288, 1996.
Thank You!