Probabilistic Partial User Model Similarity for Collaborative Filtering

1st International Workshop on Inductive Reasoning and Machine Learning (IRMLeS) 2009

Amancio Bouza, Gerald Reif, Abraham Bernstein
Department of Informatics, University of Zurich
Software Evolution and Architecture Lab

Motivation

1. Jun. 2009

IRMLeS 2009 by Amancio Bouza

[Figure: Italian Food and Asian Food, rated by users in Zurich and in Heraklion]

No common rated items, but similar preferences

Similar preferences for Italian Food and for Asian Food: Partial User Preference Similarity

Agenda
- Motivation
- User preference models
- Global similarity of user preferences
- Partial similarity of user preferences
- Evaluation
- Conclusion

User preference models

Modeling preferences
- Item rating vector (Resnick et al. 1994)
- Item rating vector with prediction of missing values (Melville et al. 2002)
- Topics of interest (Balabanovic and Shoham 1997)
- Weighted topics of interest (Good et al. 1999)
- Topics from domain ontology (Middleton et al. 2002, 2004)
- Preference vector (Anand et al. 2007)

[Figure: an item rating vector with missing ratings X, e.g. (X, 0.8, X, 0.1), completed by prediction of missing values to (0.2, 0.8, 0.6, 0.1); a preference vector, e.g. (0.1, 0.6, 0.8, 0.2)]

Each of these preference models is a hypothesized user function:
User function: $u(i) \rightarrow c_k$
Hypothesized user function: $h: i \mapsto c_k$, with $h(i) + \varepsilon(i) \rightarrow c_k$, e.g. $h_a(i) \rightarrow c_k$ for user a and $h_b(i) \rightarrow c_k$ for user b


User Preference Model
- Modeling of items: definition of a feature set with relevant features
- Mapping items to rating concepts: $u(i) = c_k$
- User contribution: the user provides item ratings
- Learning of an accurate user preference model: a program is said to learn if its performance P in task T improves with more experience E (Mitchell 1997)

User function: $u(i) = c_k$
Hypothesized user function: $h: i \mapsto c_k$, with $u(i) = h(i) + \varepsilon(i)$
User preference similarity: $sim(u_a, u_b) \equiv sim\big(u_a(i), u_b(i)\big)$


Concept learning

Item features:
- Ambiance information, e.g. #Mediterranean_Ambiance
- Food information, e.g. #Italian_Food
- Location information, e.g. #Business_District

Hypothesis, e.g. #Asian_Ambiance AND #Vegetarian_Food AND #Business_District

[Figure: space of candidate hypotheses, e.g. #Greek_District AND #American_Food AND #cheap_Wine; #Italian_Food AND #excellent_Wine; #Asian_Food AND #Asian_Ambiance; #excellent_Wine AND NOT #Italian_Food]

The learned hypotheses form the hypothesized user function $h: i \mapsto c_k$, which approximates the user function $u(i) = c_k$ up to an error term: $h(i) + \varepsilon(i) = c_k$.
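To make the idea of a hypothesis as a function from item features to a rating concept concrete, here is a minimal sketch. It assumes that items are sets of feature tags and that a hypothesized user function is an ordered list of conjunctive rules, in the spirit of a rule learner such as PART (the "Part" learner in the evaluation). The rule set, the concept labels `like`/`neutral`/`dislike`, and the default concept are hypothetical illustrations and were not learned from data.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

Item = FrozenSet[str]  # an item is represented by its set of feature tags (assumption)


@dataclass(frozen=True)
class Rule:
    """A conjunctive hypothesis, e.g. '#excellent_Wine AND NOT #Italian_Food'."""
    required: FrozenSet[str]                 # tags that must all be present
    forbidden: FrozenSet[str] = frozenset()  # tags that must be absent (the NOT part)
    concept: str = "like"                    # rating concept c_k predicted when the rule fires

    def covers(self, item: Item) -> bool:
        return self.required <= item and not (self.forbidden & item)


@dataclass
class HypothesizedUserFunction:
    """h : i -> c_k as an ordered rule list; the first rule that covers the item decides."""
    rules: List[Rule]
    default: str = "dislike"  # concept returned when no rule covers the item (assumption)

    def __call__(self, item: Item) -> str:
        for rule in self.rules:
            if rule.covers(item):
                return rule.concept
        return self.default


# Hypothetical rule set for user a, built from the example hypotheses on the slide.
h_a = HypothesizedUserFunction([
    Rule(frozenset({"#Italian_Food", "#excellent_Wine"})),
    Rule(frozenset({"#Asian_Food", "#Asian_Ambiance"})),
    Rule(frozenset({"#excellent_Wine"}), frozenset({"#Italian_Food"}), concept="neutral"),
])

print(h_a(frozenset({"#Asian_Food", "#Asian_Ambiance", "#Business_District"})))  # like
print(h_a(frozenset({"#excellent_Wine", "#Greek_District"})))                    # neutral
print(h_a(frozenset({"#American_Food", "#cheap_Wine"})))                        # dislike
```

Because `h_a` can be evaluated on any item described by these features, it yields a rating concept even for items the user never rated. This is the property that model-based similarity relies on.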

Global similarity of user preferences

Model Similarity

[Figure: hypothesized user functions $h_a: i \mapsto c_k$ and $h_b: i \mapsto c_k$ of users a and b, both defined over the whole item set]

User-based collaborative filtering:
$$\hat{r}_{aj} = \bar{r}_a + \kappa \sum_{b \neq a}^{n} sim(a,b)\,(r_{bj} - \bar{r}_b)$$

Normalization factor $\kappa$:
$$\kappa = \frac{1}{\sum_{b \neq a}^{n} sim(a,b)}$$

With model similarity, $sim(a,b)$ is the similarity of the hypothesized user functions $h_a$ and $h_b$, so it does not depend on commonly rated items.
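For contrast, the "CF (Pearson Corr.)" baseline in the evaluation measures user similarity only on commonly rated items. The following sketch shows one common variant of Pearson correlation over co-rated items and illustrates why it breaks down in the motivating example. The two-item minimum, the `None` return for undefined cases, and the Zurich/Heraklion item names are assumptions made for this illustration.

```python
import math
from typing import Mapping, Optional


def pearson_sim(r_a: Mapping[str, float], r_b: Mapping[str, float]) -> Optional[float]:
    """Pearson correlation of two users, computed only over their commonly rated items."""
    common = sorted(set(r_a) & set(r_b))
    if len(common) < 2:
        return None  # undefined: too few commonly rated items
    xs = [r_a[i] for i in common]
    ys = [r_b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)  # means over the common items only
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant rating vector has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Users a and b from the collaborative filtering table share only items 1 and 3.
a = {"item1": 2, "item3": 4}
b = {"item1": 2, "item2": 4, "item3": 5}
print(pearson_sim(a, b))  # 1.0

# Hypothetical users in Zurich and Heraklion: similar tastes, but disjoint item sets.
zurich = {"zh_italian": 5, "zh_asian": 4}
heraklion = {"hk_italian": 5, "hk_asian": 4}
print(pearson_sim(zurich, heraklion))  # None
```

With only two co-rated items, the correlation is always +1 or -1. With no co-rated items, it is undefined, even though both users clearly share preferences. A model-based similarity avoids both problems.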


Similarity Metric

User model: $u_a(i) = h_a(i) + \varepsilon(i)$

User preference similarity:
$$sim(u_a, u_b) \equiv sim\big(u_a(i), u_b(i)\big) \equiv \alpha \sum_{k=1}^{z} \sum_{j=1}^{m} P\big(u_a(j) = c_k \wedge u_b(j) = c_k\big)$$

Since the true user functions are unknown, the hypothesized user functions are used instead:
$$sim\big(u_a(i), u_b(i)\big) \approx sim\big(h_a(i), h_b(i)\big) \approx \alpha \sum_{k=1}^{z} \sum_{j=1}^{m} P\big(h_a(j) = c_k \wedge h_b(j) = c_k\big)$$

Here $j$ runs over the $m$ items, $k$ runs over the $z$ rating concepts $c_k$, and $\alpha$ is a constant factor.
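The double sum above can be sketched directly when each hypothesized user function returns a probability distribution over rating concepts for every item. This sketch makes two assumptions that the slide does not specify. First, $h_a$ and $h_b$ are treated as independent given the item, so the joint probability is the product of the two marginals. Second, $\alpha = 1/m$ is used, which keeps the result in [0, 1]. The items, concepts, and probabilities are hypothetical.

```python
from typing import Callable, Dict, Sequence

# A probabilistic hypothesized user function: item -> {rating concept: probability}.
ProbModel = Callable[[str], Dict[str, float]]


def model_similarity(h_a: ProbModel, h_b: ProbModel,
                     items: Sequence[str], concepts: Sequence[str]) -> float:
    """alpha * sum_k sum_j P(h_a(j) = c_k AND h_b(j) = c_k), sketched with alpha = 1/m."""
    m = len(items)
    if m == 0:
        return 0.0
    total = 0.0
    for j in items:                 # j = 1..m over the whole item set, rated or not
        p_a, p_b = h_a(j), h_b(j)
        for c in concepts:          # k = 1..z over the rating concepts
            # Assumes the two models are independent given the item, so the
            # joint probability factorises into a product.
            total += p_a.get(c, 0.0) * p_b.get(c, 0.0)
    return total / m                # alpha = 1/m (assumption) keeps the result in [0, 1]


concepts = ["like", "dislike"]
h_a = {"i1": {"like": 0.9, "dislike": 0.1}, "i2": {"like": 0.2, "dislike": 0.8}}.__getitem__
h_b = {"i1": {"like": 0.8, "dislike": 0.2}, "i2": {"like": 0.3, "dislike": 0.7}}.__getitem__
print(round(model_similarity(h_a, h_b, ["i1", "i2"], concepts), 2))  # 0.68
```

Two identical deterministic models (probability 1 for the same concept on every item) yield a similarity of 1 under this choice of $\alpha$.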

Partial similarity of user preferences

Partial Model Similarity

[Figure: the hypothesized user function $h_a: i \mapsto c_k$ is decomposed into Hypothesis 1, Hypothesis 2 and Hypothesis 3, each covering only part of the item set; each hypothesis is compared with $h_b: i \mapsto c_k$]

User Preference Similarity

Partial Similarity Metric

Partial user preference similarity of user b with respect to hypothesis $h_{a,q}$ of user a:
$$\partial sim(u_a, u_b \mid h_{a,q}) \equiv sim\big(h_{a,q}, h_b(i)\big)$$

Partial preference similarity, where $c_k$ is the rating concept that hypothesis $h_{a,q}$ predicts:
$$sim\big(h_{a,q}, h_b(i)\big) \equiv \alpha \sum_{j=1}^{n} P\big(h_b(j) = c_k \wedge h_{a,q}(j) = c_k\big) \equiv \alpha \sum_{j=1}^{n} P\big(h_b(j) = c_k \mid h_{a,q}(j) = c_k\big)\,P\big(h_{a,q}(j) = c_k\big)$$

Because the hypothesis $h_{a,q}$ serves as an item filter, $P\big(h_{a,q}(j) = c_k\big) = 1$ for the filtered items, so
$$sim\big(h_{a,q}, h_b(i)\big) \equiv \alpha \sum_{j=1}^{n} P\big(h_b(j) = c_k \mid h_{a,q}(j) = c_k\big)$$
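The "hypothesis as item filter" step can be sketched as follows. Keep only the items that hypothesis $h_{a,q}$ covers, then average user b's probability for the predicted concept over those items. The sketch makes two assumptions that the slide does not state. First, $\alpha = 1/n$, where $n$ is the number of filtered items. Second, $h_b$ does not depend on $h_{a,q}$, so the conditional probability reduces to $h_b$'s own probability for $c_k$ on item $j$. The hypothesis, items, and probabilities are hypothetical.

```python
from typing import Callable, Dict, Iterable

ProbModel = Callable[[str], Dict[str, float]]  # h_b: item -> {concept: probability}


def partial_similarity(covers: Callable[[str], bool], concept: str,
                       h_b: ProbModel, items: Iterable[str]) -> float:
    """alpha * sum_j P(h_b(j) = c_k | h_{a,q}(j) = c_k) over the items hypothesis q covers."""
    # Hypothesis as item filter: keep only items for which h_{a,q} predicts c_k,
    # so P(h_{a,q}(j) = c_k) = 1 on the filtered set.
    filtered = [j for j in items if covers(j)]
    n = len(filtered)
    if n == 0:
        return 0.0
    # Assumes h_b does not depend on h_{a,q}, so the conditional reduces to h_b's own
    # probability for c_k on item j; alpha = 1/n averages over the filtered items.
    return sum(h_b(j).get(concept, 0.0) for j in filtered) / n


features = {
    "r1": {"#Italian_Food", "#excellent_Wine"},
    "r2": {"#Italian_Food", "#excellent_Wine", "#Business_District"},
    "r3": {"#Asian_Food", "#Asian_Ambiance"},
}


def covers_q(j: str) -> bool:
    """Hypothetical hypothesis q of user a: '#Italian_Food AND #excellent_Wine' -> like."""
    return {"#Italian_Food", "#excellent_Wine"} <= features[j]


h_b = {"r1": {"like": 0.9}, "r2": {"like": 0.7}, "r3": {"like": 0.1}}.__getitem__
print(round(partial_similarity(covers_q, "like", h_b, features), 2))  # 0.8
```

Item r3 is ignored because the hypothesis does not cover it. User b's low probability on r3 therefore does not reduce the partial similarity, which is the point of comparing preferences only on part of the item set.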

Collaborative Filtering

User-based collaborative filtering (Resnick et al. 1994):
$$\hat{r}_{aj} = \bar{r}_a + \kappa \sum_{b \neq a}^{n} sim(a,b)\,(r_{bj} - \bar{r}_b), \qquad \kappa = \frac{1}{\sum_{b \neq a}^{n} sim(a,b)}$$

Ratings:
User | Item 1 | Item 2 | Item 3 | Avg
a | 2 | ? | 4 | 3
b | 2 | 4 | 5 | 3.66
c | 5 | 1 | 1 | 2.33
Avg | 3 | 2.5 | 3.33 |

Similarity:
 | a | b | c
a | 1 | 0.875 | 0.25
b | 0.875 | 1 | 0.125
c | 0.25 | 0.125 | 1

Example calculation (predicting user a's rating for item 2):
$$\hat{r}_{a2} = 3 + \frac{0.875 \cdot (4 - 3.66) + 0.25 \cdot (1 - 2.33)}{0.875 + 0.25} = 3 + \frac{0.298 - 0.333}{1.125} = 2.969$$

does it work?

Evaluation

Setting
- Dataset: IMDb (movie features) + Netflix Prize (user ratings); 10’128 movies, 83’029’805 ratings, 479’437 users
- Data analysis: average number of ratings per user (r/u): 173.2; median r/u: 80; average rating: 3.53; rating median: 4
- Experimental setting:
  - Few ratings, few common rated items: 500 users, 50 r/u
  - Many ratings, many common rated items: 500 users, 200 r/u
- Significance test: Wilcoxon signed-ranks test; significance level alpha = 0.01; Bonferroni correction for the family-wise error

Results, 50 ratings/user
Algorithm | RMSE | MAE | Prec. | Recall | F1
pUMSim (Part) | 1.097698 | 0.898961 | 66.23% | 71.23% | 68.64%
UMSim (SVM) | 1.077945 | 0.88902 | 66.72% | 71.33% | 68.95%
UMSim (Part) | 1.075843 | 0.885730 | 66.34% | 68.34% | 68.34%
CF (Pearson Corr.) | 1.131921 | 0.929923 | 65.19% | 71.14% | 68.04%
SVM | 1.309146 | 0.976800 | 63.85% | 71.68% | 67.53%
Part | 1.334507 | 1.003800 | 64.32% | 70.98% | 67.49%

Results, 200 ratings/user
Algorithm | RMSE | MAE | Prec. | Recall | F1
pUMSim (Part) | 1.048786 | 0.843029 | 60.90% | 66.83% | 63.73%
UMSim (SVM) | 1.035611 | 0.835009 | 60.77% | 67.33% | 63.88%
UMSim (Part) | 1.032746 | 0.833374 | 60.89% | 67.31% | 63.94%
CF (Pearson Corr.) | 1.035324 | 0.832373 | 60.56% | 68.71% | 64.38%
SVM | 1.230682 | 0.896450 | 58.54% | 67.80% | 62.83%
Part | 1.292360 | 0.953600 | 58.76% | 64.72% | 61.60%
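The result tables report RMSE, MAE, precision, recall, and F1. Significance is assessed with a Bonferroni-corrected Wilcoxon signed-ranks test. Below is a minimal sketch of the standard metric definitions. It assumes F1 is the usual harmonic mean of precision and recall. The slides do not say how relevant items were defined for precision and recall, so the sketch does not compute those two metrics. The family size in the Bonferroni example is hypothetical. The signed-ranks test itself is available, for example, as `scipy.stats.wilcoxon`.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean squared error of predicted vs. actual ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error of predicted vs. actual ratings."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def bonferroni_alpha(alpha: float, num_comparisons: int) -> float:
    """Per-test significance level that bounds the family-wise error at alpha."""
    return alpha / num_comparisons


print(round(f1(66.23, 71.23), 2))  # 68.64  (pUMSim (Part), 50 ratings/user)
print(round(f1(60.56, 68.71), 2))  # 64.38  (CF (Pearson Corr.), 200 ratings/user)
print(bonferroni_alpha(0.01, 5))   # per-test alpha for a hypothetical family of 5 tests
```

Recomputing F1 from the precision and recall columns is a quick consistency check on the tables.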

Conclusion
- Model similarity is important: similarity based on user preference models sometimes significantly outperforms similarity based on commonly rated items, especially when there are few commonly rated items.
- Partial user preference similarity needs further improvement.
- Preprocessing is needed for scalability.

Thanks for your attention

References
1. S. S. Anand, P. Kearney, and M. Shapcott. Generating semantically enriched user profiles for web personalization. ACM Transactions on Internet Technology, 2007.
2. M. Balabanovic and Y. Shoham. Fab: Content-based, collaborative recommendation. Communications of the ACM, 1997.
3. C. Basu, H. Hirsh, and W. Cohen. Recommendation as classification: Using social and content-based information in recommendation. In AAAI, 1998.
4. J. Bennett and S. Lanning. The Netflix Prize. KDD Cup and Workshop, 2007.
5. J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In 14th Conference on Uncertainty in AI, 1998.
6. I. Cantador, A. Bellogín, and P. Castells. A multilayer ontology-based hybrid recommendation model. AI Communications, 2008.
7. E. Frank and I. H. Witten. Generating accurate rule sets without global optimization. In 15th International Conference on Machine Learning, 1998.
8. N. Good, J. B. Schafer, J. A. Konstan, A. Borchers, B. Sarwar, J. Herlocker, and J. Riedl. Combining collaborative filtering with personal agents for better recommendations. In AAAI/IAAI, 1999.
9. J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 2004.
10. D. Lemire and A. Maclachlan. Slope one predictors for online rating-based collaborative filtering. In Proceedings of SIAM Data Mining (SDM’05), 2005.
11. P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In AAAI, 2002.
12. S. E. Middleton, H. Alani, and D. C. de Roure. Exploiting synergy between ontologies and recommender systems. In WWW, 2002.
13. S. E. Middleton, N. R. Shadbolt, and D. C. de Roure. Ontological user profiling in recommender systems. ACM Transactions on Information Systems, 2004.
14. T. M. Mitchell. Machine Learning. 1997.
15. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: an open architecture for collaborative filtering of netnews. In CSCW, 1994.
16. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In WWW, 2001.
17. I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. 2005.

Summary

Amancio Bouza, Gerald Reif, Abraham Bernstein: “Probabilistic Partial User Model Similarity for Collaborative Filtering”, IRMLeS 2009

- Similarity based on user preference models is important: users in Zurich and Heraklion who rate Italian Food and Asian Food have no common rated items, but similar preferences.
  User function $u(i) = c_k$; hypothesized user function $h: i \mapsto c_k$; user model $u(i) = h(i) + \varepsilon(i)$
- User preference similarity is good, but partial user preference similarity is not always good. This needs further investigation.
- Partial user preference similarity is based on the similarity between a hypothesis and a user model:
  - Hypothesis extraction from the user model (hypotheses H1, H2, H3 of $h_a$)
  - Hypothesis as item filter
  $$sim\big(h_{a,q}, h_b(i)\big) \equiv \alpha \sum_{j=1}^{n} P\big(h_b(j) = c_k \mid h_{a,q}(j) = c_k\big)\,P\big(h_{a,q}(j) = c_k\big)$$
- User-based collaborative filtering with normalization factor $\kappa$:
  $$\hat{r}_{aj} = \bar{r}_a + \kappa \sum_{b \neq a}^{n} sim(a,b)\,(r_{bj} - \bar{r}_b), \qquad \kappa = \frac{1}{\sum_{b \neq a}^{n} sim(a,b)}$$
  e.g. $\hat{r}_{a2} = 3 + \frac{0.298 - 0.333}{1.125} = 2.969$