Probabilistic Partial User Model Similarity for Collaborative Filtering

1st International Workshop on Inductive Reasoning and Machine Learning (IRMLeS) 2009

Amancio Bouza, Gerald Reif, Abraham Bernstein

Department of Informatics, University of Zurich

Software Evolution & Architecture Lab

Motivation

[Figure: Zurich and Heraklion, with the labels "Italian Food" and "Asian Food"]

No common rated items, but similar preferences

Partial User Preference Similarity

1. Jun. 2009

IRMLeS 2009 by Amancio Bouza

Agenda

Motivation
User preference models
Global similarity of user preferences
Partial similarity of user preferences
Evaluation
Conclusion

User preference models

Modeling preferences

Item rating vector (Resnick et al. 1994)
Item rating vector with prediction of missing values (Melville et al. 2002)
Topics of interest (Balabanovic and Shoham 1997)
Weighted topics of interest (Good et al. 1999)
Topics from domain ontology (Middleton et al. 2002, 2004)
Preference vector (Anand et al. 2007)

[Figure: example vectors with the entries X, 0.8, X, 0.1 and 0.2, 0.8, 0.6, 0.1 and 0.1, 0.6, 0.8, 0.2]

User function: $u(i) \to c_k$

Hypothesized user function: $h(i) + \varepsilon(i) \to c_k$

Preference model: $h : i \mapsto c_k$, written $h_a$ and $h_b$ for users $a$ and $b$

User preference models

User Preference Model

Modeling of items
Definition of feature set with relevant features

Mapping items to rating concepts
User contribution: user provides item ratings

Learning of accurate user preference model
Program is said to learn: performance P in task T improves with more experience E

User function: $u(i) = c_k$

Hypothesized user function: $h(i) + \varepsilon(i) = c_k$, with $h : i \mapsto c_k$

User preference models

Concept learning

Ambiance information, e.g. #Mediterranean_Ambiance
Food information, e.g. #Italian_Food
Location information, e.g. #Business_District

Hypothesis, e.g. #Asian_Ambiance AND #Vegetarian_Food AND #Business_District

Example hypotheses:
#Greek_District AND #American_Food AND #cheap_Wine
#Italian_Food AND #excellent_Wine
#Asian_Food AND #Asian_Ambiance
#excellent_Wine AND NOT #Italian_Food

User function: $u(i) = c_k$

Hypothesized user function: $h(i) + \varepsilon(i) = c_k$, with $h : i \mapsto c_k$
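
The concept-learning hypotheses are conjunctions of feature tags, some of them negated. The following is a minimal sketch of how such a hypothesis could act as an item filter. The `Hypothesis` class, its `covers` method and the three restaurants are assumptions made up for illustration; only the feature tags and the two hypotheses come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical container: a conjunction of required and forbidden tags."""
    required: frozenset
    forbidden: frozenset = frozenset()

    def covers(self, item_features):
        """True if the item has every required tag and no forbidden tag."""
        features = set(item_features)
        return self.required <= features and not (self.forbidden & features)


# Two hypotheses taken from the slide.
italian_wine = Hypothesis(frozenset({"#Italian_Food", "#excellent_Wine"}))
wine_not_italian = Hypothesis(
    frozenset({"#excellent_Wine"}), frozenset({"#Italian_Food"})
)

# Made-up items (not from the talk), described by feature tags.
restaurants = {
    "trattoria": {"#Italian_Food", "#excellent_Wine", "#Mediterranean_Ambiance"},
    "wine_bar": {"#excellent_Wine", "#Business_District"},
    "noodle_house": {"#Asian_Food", "#Asian_Ambiance"},
}

covered_by_italian_wine = sorted(
    name for name, tags in restaurants.items() if italian_wine.covers(tags)
)
covered_by_wine_not_italian = sorted(
    name for name, tags in restaurants.items() if wine_not_italian.covers(tags)
)
print(covered_by_italian_wine)
print(covered_by_wine_not_italian)
```

Storing required and forbidden tags separately keeps the `AND NOT` case explicit without needing a general expression parser.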

Global similarity of user preferences

Model Similarity

[Figure: item set with the hypothesized user functions $h_a$ and $h_b$]

User function: $u(i) = c_k$

Hypothesized user function: $h(i) + \varepsilon(i) = c_k$

User models: $h_a : i \mapsto c_k$ and $h_b : i \mapsto c_k$

Global similarity of user preferences

Similarity Metric

User Model:

$u_a(i) = h_a(i) + \varepsilon(i)$

User Preference Similarity:

$\mathrm{sim}(u_a, u_b) \equiv \mathrm{sim}\big(u_a(i), u_b(i)\big)$

$\mathrm{sim}(u_a, u_b) \equiv \alpha \sum_{k=1}^{z} \sum_{j=1}^{m} P\big(u_a(j) = c_k \wedge u_b(j) = c_k\big)$

$\mathrm{sim}\big(u_a(i), u_b(i)\big) \approx \mathrm{sim}\big(h_a(i), h_b(i)\big)$

$\mathrm{sim}\big(u_a(i), u_b(i)\big) \approx \alpha \sum_{k=1}^{z} \sum_{j=1}^{m} P\big(h_a(j) = c_k \wedge h_b(j) = c_k\big)$
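
The similarity metric sums, over all rating concepts $c_k$ and all items $j$, the probability that both hypothesized user functions assign the same concept. A minimal sketch follows, under two assumptions that the slides do not state: the hypothesized user functions are deterministic, so each probability is 0 or 1, and the normalization factor is $\alpha = 1/m$. The function name, the two toy models and the items are hypothetical.

```python
def model_similarity(h_a, h_b, items, classes):
    """Estimate alpha * sum_k sum_j P(h_a(j) = c_k and h_b(j) = c_k).

    Assumes deterministic models (P is an indicator) and alpha = 1 / m.
    """
    if not items:
        return 0.0
    alpha = 1.0 / len(items)  # assumed normalization, not given in the talk
    total = 0.0
    for c_k in classes:
        for item in items:
            if h_a(item) == c_k and h_b(item) == c_k:
                total += 1.0
    return alpha * total


# Toy user models (hypothetical): map an item's feature tags to a concept.
def h_a(item):
    return "like" if "#Italian_Food" in item else "dislike"


def h_b(item):
    likes = "#Italian_Food" in item or "#Asian_Food" in item
    return "like" if likes else "dislike"


items = [
    {"#Italian_Food"},
    {"#Asian_Food"},
    {"#American_Food"},
    {"#Italian_Food", "#excellent_Wine"},
]

global_sim = model_similarity(h_a, h_b, items, ["like", "dislike"])
print(global_sim)  # 0.75
```

Because the models are evaluated on any item set, the two users do not need to have rated any item in common.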

Partial similarity of user preferences

Partial Model Similarity

[Figure: user model $h_a$ with Hypothesis 1, Hypothesis 2 and Hypothesis 3 over the item set]

$h_a : i \mapsto c_k$

$h_b : i \mapsto c_k$

User Preference Similarity

Partial Similarity Metric

Partial User Preference Similarity:

$\partial\mathrm{sim}(u_a, u_b \mid h_{a,q}) \equiv \mathrm{sim}\big(h_{a,q}, h_b(i)\big)$

$\mathrm{sim}\big(h_{a,q}, h_b(i)\big) \equiv \alpha \sum_{j=1}^{n} P\big(h_b(j) = c_k \wedge h_{a,q}(j) = c_k\big)$

$\equiv \alpha \sum_{j=1}^{n} P\big(h_b(j) = c_k \mid h_{a,q}(j) = c_k\big)\, P\big(h_{a,q}(j) = c_k\big)$

Partial Preference Similarity:

$\mathrm{sim}\big(h_{a,q}, h_b(i)\big) \equiv \alpha \sum_{k=1}^{n} P\big(h_b(i) = c_k \mid h_{a,q}(i) = c_k\big)$
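
The partial similarity compares a single hypothesis $h_{a,q}$ of user $a$ with the whole model $h_b$ of user $b$. One possible reading, sketched below, treats the hypothesis as an item filter that predicts one rating concept, and estimates $P(h_b(i) = c_k \mid h_{a,q}(i) = c_k)$ as a relative frequency over the covered items. This estimator, the function names and the example items are assumptions for illustration, not the authors' implementation.

```python
def partial_similarity(covers, predicted_class, h_b, items):
    """Fraction of items covered by hypothesis h_{a,q} on which h_b
    predicts the same concept as the hypothesis (assumed estimator)."""
    covered = [item for item in items if covers(item)]
    if not covered:
        return 0.0
    agreeing = sum(1 for item in covered if h_b(item) == predicted_class)
    return agreeing / len(covered)


# Hypothesis of user a (from the slides): #Italian_Food AND #excellent_Wine.
# That it predicts the concept "like" is an assumption.
def covers_italian_wine(item):
    return {"#Italian_Food", "#excellent_Wine"} <= item


# Toy model of user b (hypothetical).
def h_b(item):
    return "dislike" if "#Business_District" in item else "like"


items = [
    {"#Italian_Food", "#excellent_Wine"},
    {"#Italian_Food", "#excellent_Wine", "#Business_District"},
    {"#Italian_Food", "#cheap_Wine"},
    {"#Asian_Food", "#excellent_Wine"},
]

partial_sim = partial_similarity(covers_italian_wine, "like", h_b, items)
print(partial_sim)  # 0.5
```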

Collaborative Filtering

User-based collaborative filtering (Resnick et al. 1994):

$\hat{r}_{aj} = \bar{r}_a + \kappa \sum_{b \neq a}^{n} \mathrm{sim}(a, b) \cdot (r_{bj} - \bar{r}_b)$

Normalization factor $\kappa$:

$\kappa = \dfrac{1}{\sum_{b \neq a}^{n} \mathrm{sim}(a, b)}$

User   Item 1   Item 2   Item 3   Avg
a      2        ?        4        3
b      2        4        5        3.66
c      5        1        1        2.33
Avg    3        2.5      3.33

Similarity   a       b       c
a            1       0.875   0.25
b            0.875   1       0.125
c            0.25    0.125   1

Example calculation:

$\hat{r}_{a2} = 3 + \dfrac{0.875 \cdot (4 - 3.66) + 0.25 \cdot (1 - 2.33)}{0.875 + 0.25} = 3 + \dfrac{0.298 - 0.333}{1.125} = 2.969$
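
The example calculation can be reproduced with a short sketch of the user-based prediction formula. The ratings and similarities are taken from the two tables on the slide; the function names are hypothetical, and restricting the sum to neighbours who actually rated the item is an assumption. The code uses unrounded user means, so it yields about 2.963, whereas the slide rounds the means to 3.66 and 2.33 and arrives at 2.969.

```python
# Ratings from the slide; user a has not rated Item 2.
ratings = {
    "a": {"Item 1": 2, "Item 3": 4},
    "b": {"Item 1": 2, "Item 2": 4, "Item 3": 5},
    "c": {"Item 1": 5, "Item 2": 1, "Item 3": 1},
}

# Pairwise similarities from the slide (symmetric).
similarity = {("a", "b"): 0.875, ("a", "c"): 0.25, ("b", "c"): 0.125}


def sim(x, y):
    """Look up the symmetric similarity of two users."""
    if x == y:
        return 1.0
    return similarity.get((x, y), similarity.get((y, x), 0.0))


def mean_rating(user):
    """Average over the items the user has rated."""
    values = ratings[user].values()
    return sum(values) / len(values)


def predict(active, item):
    """r_hat = mean(active) + kappa * sum_b sim(active, b) * (r_bj - mean(b))."""
    neighbours = [b for b in ratings if b != active and item in ratings[b]]
    norm = sum(sim(active, b) for b in neighbours)  # 1 / kappa
    if norm == 0:
        return mean_rating(active)
    weighted = sum(
        sim(active, b) * (ratings[b][item] - mean_rating(b)) for b in neighbours
    )
    return mean_rating(active) + weighted / norm


prediction = predict("a", "Item 2")
print(round(prediction, 3))  # 2.963
```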

Does it work?

Evaluation

Dataset
IMDb (movie features) + Netflix Prize (user ratings)
10’128 movies, 83’029’805 ratings, 479’437 users

Data Analysis
Avg. number of ratings per user (r/u): 173.2
Median r/u: 80
Avg. rating: 3.53
Rating median: 4

Experimental Setting
Few ratings, few common rated items: 500 users, 50 r/u
Many ratings, many common rated items: 500 users, 200 r/u

Significance test
Wilcoxon signed-ranks test
Significance level: alpha = 0.01
Bonferroni correction for the family-wise error

Setting: 50 ratings/user

Algorithm            RMSE       MAE        Prec.    Recall   F1
pUMSim (Part)        1.097698   0.898961   66.23%   71.23%   68.64%
UMSim (SVM)          1.077945   0.88902    66.72%   71.33%   68.95%
UMSim (Part)         1.075843   0.885730   66.34%   68.34%   68.34%
CF (Pearson Corr.)   1.131921   0.929923   65.19%   71.14%   68.04%
SVM                  1.309146   0.976800   63.85%   71.68%   67.53%
Part                 1.334507   1.003800   64.32%   70.98%   67.49%

Setting: 200 ratings/user

Algorithm            RMSE       MAE        Prec.    Recall   F1
pUMSim (Part)        1.048786   0.843029   60.90%   66.83%   63.73%
UMSim (SVM)          1.035611   0.835009   60.77%   67.33%   63.88%
UMSim (Part)         1.032746   0.833374   60.89%   67.31%   63.94%
CF (Pearson Corr.)   1.035324   0.832373   60.56%   68.71%   64.38%
SVM                  1.230682   0.896450   58.54%   67.80%   62.83%
Part                 1.292360   0.953600   58.76%   64.72%   61.60%
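
The result tables report RMSE, MAE, precision, recall and F1. The sketch below shows how these measures are commonly computed from actual and predicted ratings, together with a Bonferroni-adjusted significance level. The slides do not say when a rating counts as relevant, so the threshold of 4, the number of comparisons and the toy ratings are all assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def precision_recall_f1(actual, predicted, threshold=4):
    """Treat ratings >= threshold as relevant (assumed threshold)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a >= threshold and p >= threshold)
    fp = sum(1 for a, p in pairs if a < threshold and p >= threshold)
    fn = sum(1 for a, p in pairs if a >= threshold and p < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def bonferroni_alpha(alpha, comparisons):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / comparisons


# Toy data (not from the talk).
actual = [5, 3, 4, 1]
predicted = [4.0, 4.0, 3.0, 2.0]

print(rmse(actual, predicted))                 # 1.0
print(mae(actual, predicted))                  # 1.0
print(precision_recall_f1(actual, predicted))  # (0.5, 0.5, 0.5)
print(bonferroni_alpha(0.01, 5))               # hypothetical 5 comparisons
```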

Conclusion

Model similarity is important
Similarity based on user preference models sometimes significantly outperforms similarity based on common rated items
Especially with few common rated items

Partial User Preference Similarity needs further improvement
Preprocessing needed for scalability

Thanks for your attention

References

1. S. S. Anand, P. Kearney, and M. Shapcott. Generating semantically enriched user profiles for web personalization. ACM Transactions on Internet Technology, 2007.
2. M. Balabanovic and Y. Shoham. Fab: Content-based, collaborative recommendation. Communications of the ACM, 1997.
3. C. Basu, H. Hirsh, and W. Cohen. Recommendation as classification: Using social and content-based information in recommendation. In AAAI, 1998.
4. J. Bennett and S. Lanning. The Netflix Prize. KDD Cup and Workshop, 2007.
5. J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In 14th Conference on Uncertainty in AI, 1998.
6. I. Cantador, A. Bellogín, and P. Castells. A multilayer ontology-based hybrid recommendation model. AI Communications, 2008.
7. E. Frank and I. H. Witten. Generating accurate rule sets without global optimization. In 15th International Conference on Machine Learning, 1998.
8. N. Good, J. B. Schafer, J. A. Konstan, A. Borchers, B. Sarwar, J. Herlocker, and J. Riedl. Combining collaborative filtering with personal agents for better recommendations. In AAAI/IAAI, 1999.
9. J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 2004.
10. D. Lemire and A. Maclachlan. Slope one predictors for online rating-based collaborative filtering. In Proceedings of SIAM Data Mining (SDM’05), 2005.
11. P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In AAAI, 2002.
12. S. E. Middleton, H. Alani, and D. C. de Roure. Exploiting synergy between ontologies and recommender systems. In WWW, 2002.
13. S. E. Middleton, N. R. Shadbolt, and D. C. de Roure. Ontological user profiling in recommender systems. ACM Transactions on Information Systems, 2004.
14. T. M. Mitchell. Machine Learning. 1997.
15. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: an open architecture for collaborative filtering of netnews. In CSCW, 1994.
16. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In WWW, 2001.
17. I. H. Witten and E. Frank. Data Mining - Practical Machine Learning Tools and Techniques. 2005.

Summary

Amancio Bouza, Gerald Reif, Abraham Bernstein: “Probabilistic Partial User Model Similarity for Collaborative Filtering”, IRMLeS 2009

Similarity based on user preference models is important

User preference similarity is good, but partial user preference similarity is not always. Needs further investigation

Partial user preference similarity based on similarity between hypothesis and user model:
Hypothesis extraction from user model
Hypothesis as item filter

[Figure: Zurich and Heraklion with the labels "Italian Food" and "Asian Food"; user models $h_a$ and $h_b$ with hypotheses H1, H2 and H3 over the item set]

User Model: $u(i) = h(i) + \varepsilon(i)$