Probabilistic Partial User Model Similarity for Collaborative Filtering
1st International Workshop on Inductive Reasoning and Machine Learning (IRMLeS) 2009, 1 June 2009
Amancio Bouza, Gerald Reif, Abraham Bernstein
Department of Informatics, University of Zurich
Software Evolution & Architecture Lab
Motivation
Figure: Italian Food and Asian Food preferences of users in Zurich and Heraklion
No common rated items, but similar preferences
→ Partial User Preference Similarity
1. Jun. 2009
IRMLeS 2009 by Amancio Bouza
Agenda
- Motivation
- User preference models
- Global similarity of user preferences
- Partial similarity of user preferences
- Evaluation
- Conclusion
User preference models
Modeling preferences
- Topics of interest (Balabanovic and Shoham 1997)
- Weighted topics of interest (Good et al. 1999)
- Topics from domain ontology (Middleton et al. 2002, 2004)
- Preference vector (Anand et al. 2007)
- Item rating vector (Resnick et al. 1994)
- Item rating vector with prediction of missing values (Melville et al. 2002)
- Preference model: hypothesized user function
User function: $u(i) \to c_k$ maps an item $i$ to a rating concept $c_k$. The preference model approximates it with a hypothesized user function $h: i \mapsto c_k$, so that $u(i) = h(i) + \varepsilon(i)$.
User preference models
User Preference Model
- Modeling of items: definition of a feature set with relevant features
- Mapping items to rating concepts
- User contribution: the user provides item ratings
- Learning of an accurate user preference model: a program is said to learn if its performance P in task T improves with more experience E (Mitchell 1997)
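A minimal sketch of the relation $u(i) = h(i) + \varepsilon(i)$, assuming items are encoded as sets of feature tags and rating concepts as plain strings (both hypothetical encodings, not the authors' data format). It measures on how many of a user's rated items the hypothesized user function disagrees with the observed rating concept, one simple choice of the performance measure P for the learned model.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical encodings: an item is a set of feature tags, a rating concept c_k is a string.
Item = FrozenSet[str]
Concept = str


def disagreement_rate(u: Dict[Item, Concept], h: Callable[[Item], Concept]) -> float:
    """Share of rated items where h(i) differs from the observed concept u(i),
    i.e. where the error term epsilon(i) is non-zero."""
    if not u:
        raise ValueError("the user must have rated at least one item")
    mismatches = sum(1 for item, concept in u.items() if h(item) != concept)
    return mismatches / len(u)


# Example: three rated items and a one-rule hypothesis (both invented for illustration).
ratings = {
    frozenset({"#Italian_Food", "#excellent_Wine"}): "like",
    frozenset({"#American_Food", "#cheap_Wine"}): "dislike",
    frozenset({"#Asian_Food", "#Asian_Ambiance"}): "like",
}


def h_italian(item: Item) -> Concept:
    """Hypothetical hypothesis: like exactly the items with Italian food."""
    return "like" if "#Italian_Food" in item else "dislike"


print(disagreement_rate(ratings, h_italian))  # 1 of 3 items mismatches
```

A lower disagreement rate on held-out ratings as more ratings become available would correspond to "performance P improves with more experience E".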
User preference models
Concept learning
Item information:
- Ambiance information, e.g. #Mediterranean_Ambiance
- Food information, e.g. #Italian_Food
- Location information, e.g. #Business_District
Hypothesis, e.g. #Asian_Ambiance AND #Vegetarian_Food AND #Business_District
Example hypotheses learned for a user:
- #Greek_District AND #American_Food AND #cheap_Wine
- #Italian_Food AND #excellent_Wine
- #Asian_Food AND #Asian_Ambiance
- #excellent_Wine AND NOT #Italian_Food
The learned hypotheses make up the hypothesized user function $h: i \mapsto c_k$, with $u(i) = h(i) + \varepsilon(i)$.
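The hypotheses above read like the rules of a decision list. The sketch below assumes such a list, where the first matching rule determines the rating concept (in the spirit of rule learners such as PART, Frank and Witten 1998), to show how a set of rules becomes a hypothesized user function $h: i \mapsto c_k$. The rule conditions follow the slide; the concepts they predict, the concept names and the default concept are hypothetical.

```python
from typing import Callable, FrozenSet, List, Tuple

Item = FrozenSet[str]  # hypothetical encoding: an item is a set of feature tags
Rule = Tuple[Callable[[Item], bool], str]  # (condition, predicted rating concept c_k)

# Conditions follow the example hypotheses; the concepts they predict are invented.
RULES: List[Rule] = [
    (lambda i: "#Italian_Food" in i and "#excellent_Wine" in i, "c_like"),
    (lambda i: "#Asian_Food" in i and "#Asian_Ambiance" in i, "c_like"),
    (lambda i: "#excellent_Wine" in i and "#Italian_Food" not in i, "c_neutral"),
    (lambda i: {"#Greek_District", "#American_Food", "#cheap_Wine"} <= i, "c_dislike"),
]


def h(item: Item, rules: List[Rule] = RULES, default: str = "c_neutral") -> str:
    """Hypothesized user function h: i -> c_k as an ordered decision list:
    the first rule whose condition holds decides; otherwise return the default."""
    for condition, concept in rules:
        if condition(item):
            return concept
    return default


print(h(frozenset({"#Italian_Food", "#excellent_Wine", "#Business_District"})))  # c_like
print(h(frozenset({"#Greek_District", "#American_Food", "#cheap_Wine"})))  # c_dislike
```

Because the list is ordered, an item matching several rules is classified by the earliest one; an unordered rule set would need an explicit conflict-resolution strategy instead.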
Global similarity of user preferences
Model Similarity
The hypothesized user functions $h_a: i \mapsto c_k$ and $h_b: i \mapsto c_k$ of users a and b are applied to the same item set and their predictions are compared. The resulting model similarity serves as $sim(a, b)$ in user-based collaborative filtering (Resnick et al. 1994).
Global similarity of user preferences
Similarity Metric
User model: $u_a(i) = h_a(i) + \varepsilon(i)$
User preference similarity:
$$sim(u_a, u_b) \equiv sim\big(u_a(i), u_b(i)\big) \equiv \alpha \sum_{k=1}^{z} \sum_{j=1}^{m} P\big(u_a(j) = c_k \wedge u_b(j) = c_k\big)$$
Since the true user functions are unknown, they are approximated by the hypothesized user functions:
$$sim\big(u_a(i), u_b(i)\big) \approx sim\big(h_a(i), h_b(i)\big) \approx \alpha \sum_{k=1}^{z} \sum_{j=1}^{m} P\big(h_a(j) = c_k \wedge h_b(j) = c_k\big)$$
Here $k$ runs over the $z$ rating concepts $c_k$, $j$ over the $m$ items, and $\alpha$ is a normalization constant.
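A minimal sketch of the approximated metric, under assumptions the slide does not fix: each user's model returns a probability distribution over the $z$ rating concepts for every item, the joint probability $P(h_a(j) = c_k \wedge h_b(j) = c_k)$ is taken as the product of the two models' outputs (treating them as independent), and $\alpha = 1/m$. Because the models can be evaluated on any item, the item set need not contain items that either user has rated.

```python
from typing import Sequence


def model_similarity(p_a: Sequence[Sequence[float]], p_b: Sequence[Sequence[float]]) -> float:
    """Approximate sim(h_a, h_b) = alpha * sum_k sum_j P(h_a(j) = c_k and h_b(j) = c_k).

    p_a[j][k] is the probability that user a's model assigns item j to concept c_k.
    Assumptions: independence (joint = product) and alpha = 1/m, so the result is in [0, 1].
    """
    m = len(p_a)
    if m == 0 or m != len(p_b):
        raise ValueError("both models must be evaluated on the same non-empty item set")
    total = 0.0
    for dist_a, dist_b in zip(p_a, p_b):  # sum over items j
        total += sum(pa * pb for pa, pb in zip(dist_a, dist_b))  # sum over concepts c_k
    return total / m


# Three items, two concepts (like, dislike); the probabilities are invented.
p_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
p_b = [[0.8, 0.2], [0.3, 0.7], [1.0, 0.0]]
print(round(model_similarity(p_a, p_b), 2))  # 0.62
```

With deterministic models (one-hot distributions) the score reduces to the fraction of items on which both models predict the same concept.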
Partial similarity of user preferences
Partial Model Similarity
User a's hypothesized user function $h_a: i \mapsto c_k$ consists of several hypotheses (Hypothesis 1, Hypothesis 2, Hypothesis 3). Each hypothesis is compared separately with user b's hypothesized user function $h_b: i \mapsto c_k$ on the item set.
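Using a single hypothesis as an item filter can be sketched as follows, assuming each hypothesis extracted from user a's model is a boolean condition over item feature tags. The hypothesis names, conditions and items are illustrative only.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Item = FrozenSet[str]  # hypothetical encoding: an item is a set of feature tags
Hypothesis = Tuple[str, Callable[[Item], bool]]  # (name, condition)


def filter_items(hypotheses: Sequence[Hypothesis], items: Sequence[Item]) -> Dict[str, List[Item]]:
    """Use each hypothesis h_{a,q} as an item filter: keep the items it covers."""
    return {name: [i for i in items if condition(i)] for name, condition in hypotheses}


hypotheses_a = [
    ("H1", lambda i: "#Italian_Food" in i and "#excellent_Wine" in i),
    ("H2", lambda i: "#Asian_Food" in i and "#Asian_Ambiance" in i),
]
items = [
    frozenset({"#Italian_Food", "#excellent_Wine"}),
    frozenset({"#Asian_Food", "#Asian_Ambiance", "#Business_District"}),
    frozenset({"#American_Food", "#cheap_Wine"}),
]
for name, covered in filter_items(hypotheses_a, items).items():
    print(name, len(covered))  # H1 1, then H2 1
```

In this reading, each hypothesis yields its own, usually much smaller, set of items on which user b's model is then evaluated.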
User Preference Similarity
Partial Similarity Metric
Partial user preference similarity, given hypothesis $h_{a,q}$ of user a:
$$\partial sim(u_a, u_b \mid h_{a,q}) \equiv sim\big(h_{a,q}, h_b(i)\big)$$
Partial preference similarity:
$$sim\big(h_{a,q}, h_b(i)\big) \equiv \alpha \sum_{j=1}^{n} P\big(h_b(j) = c_k \wedge h_{a,q}(j) = c_k\big) \equiv \alpha \sum_{j=1}^{n} P\big(h_b(j) = c_k \mid h_{a,q}(j) = c_k\big)\, P\big(h_{a,q}(j) = c_k\big)$$
$$sim\big(h_{a,q}, h_b(i)\big) \equiv \alpha \sum_{k=1}^{n} P\big(h_b(i) = c_k \mid h_{a,q}(i) = c_k\big)$$
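A sketch of the factorized form of the partial preference similarity, under stated assumptions: the hypothesis $h_{a,q}$ is a deterministic rule predicting one concept $c_k$, so $P(h_{a,q}(j) = c_k)$ is 1 for the items it covers and 0 otherwise; $P(h_b(j) = c_k \mid h_{a,q}(j) = c_k)$ is read off user b's model as its probability for $c_k$; and $\alpha$ is chosen as 1 divided by the number of covered items, giving the average agreement of b's model with the hypothesis. The model of user b below is invented for illustration.

```python
from typing import Callable, Dict, FrozenSet, Sequence

Item = FrozenSet[str]  # hypothetical encoding: an item is a set of feature tags


def partial_similarity(
    condition: Callable[[Item], bool],
    concept: str,
    model_b: Callable[[Item], Dict[str, float]],
    items: Sequence[Item],
) -> float:
    """alpha * sum_j P(h_b(j) = c_k | h_aq(j) = c_k) * P(h_aq(j) = c_k)."""
    covered = [i for i in items if condition(i)]  # P(h_aq(j) = c_k) = 1 only here
    if not covered:
        return 0.0  # assumption: a hypothesis covering no item gives no evidence
    agreement = sum(model_b(i).get(concept, 0.0) for i in covered)
    return agreement / len(covered)  # assumption: alpha = 1 / |covered items|


def model_b(item: Item) -> Dict[str, float]:
    """Hypothetical probabilistic model of user b over two concepts."""
    if "#Italian_Food" in item:
        return {"like": 0.8, "dislike": 0.2}
    return {"like": 0.3, "dislike": 0.7}


def h1(item: Item) -> bool:
    """Hypothetical hypothesis of user a: Italian food with excellent wine -> "like"."""
    return "#Italian_Food" in item and "#excellent_Wine" in item


items = [
    frozenset({"#Italian_Food", "#excellent_Wine"}),
    frozenset({"#Italian_Food", "#excellent_Wine", "#Business_District"}),
    frozenset({"#Asian_Food", "#Asian_Ambiance"}),
]
print(partial_similarity(h1, "like", model_b, items))  # 0.8
```

How the per-hypothesis scores are combined into one similarity between two users is left open here.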
Collaborative Filtering
User-based prediction (Resnick et al. 1994):
$$\hat{r}_{aj} = \bar{r}_a + \kappa \sum_{b \neq a}^{n} sim(a, b)\,(r_{bj} - \bar{r}_b), \qquad \kappa = \frac{1}{\sum_{b \neq a}^{n} sim(a, b)}$$
Ratings
User   Item 1   Item 2   Item 3   Avg
a      2        ?        4        3
b      2        4        5        3.66
c      5        1        1        2.33
Avg    3        2.5      3.33

Similarity
User   a       b       c
a      1       0.875   0.25
b      0.875   1       0.125
c      0.25    0.125   1

Example calculation:
$$\hat{r}_{a2} = 3 + \frac{0.875 \cdot (4 - 3.66) + 0.25 \cdot (1 - 2.33)}{0.875 + 0.25} = 3 + \frac{0.298 - 0.333}{1.125} = 2.969$$
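The worked example can be reproduced with a short sketch of the prediction formula. It assumes the neighbour sum runs over users who have rated item $j$ (in the example both b and c rated Item 2) and uses the plain sum of similarities in $\kappa$, as on the slide; some formulations of this scheme use absolute similarities in the normalizer instead. Computing the averages exactly (11/3 and 7/3 instead of 3.66 and 2.33) gives about 2.963 rather than the rounded 2.969.

```python
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> item -> rating
Similarities = Dict[str, Dict[str, float]]  # user -> user -> sim(a, b)


def mean_rating(ratings: Ratings, user: str) -> float:
    """Average of all ratings the user has given."""
    values = list(ratings[user].values())
    return sum(values) / len(values)


def predict(ratings: Ratings, sim: Similarities, a: str, j: str) -> Optional[float]:
    """r_hat_aj = mean_a + kappa * sum_{b != a} sim(a, b) * (r_bj - mean_b),
    kappa = 1 / sum_{b != a} sim(a, b), over neighbours b who rated item j."""
    weighted, norm = 0.0, 0.0
    for b, items in ratings.items():
        if b == a or j not in items:
            continue
        weighted += sim[a][b] * (items[j] - mean_rating(ratings, b))
        norm += sim[a][b]
    if norm == 0:
        return None  # no usable neighbour
    return mean_rating(ratings, a) + weighted / norm


ratings = {
    "a": {"item1": 2, "item3": 4},
    "b": {"item1": 2, "item2": 4, "item3": 5},
    "c": {"item1": 5, "item2": 1, "item3": 1},
}
sim = {"a": {"b": 0.875, "c": 0.25}, "b": {"a": 0.875, "c": 0.125}, "c": {"a": 0.25, "b": 0.125}}
print(round(predict(ratings, sim, "a", "item2"), 3))  # 2.963
```

Replacing the similarity table with model-based similarities is what lets the prediction use neighbours who share no rated items with user a.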
Does it work?
Evaluation
Dataset: IMDb (movie features) + Netflix Prize (user ratings); 10,128 movies, 83,029,805 ratings, 479,437 users
Data analysis: average ratings per user 173.2; median ratings per user 80; average rating 3.53; median rating 4
Experimental setting:
- Few ratings, few common rated items: 500 users, 50 ratings/user
- Many ratings, many common rated items: 500 users, 200 ratings/user
Significance test: Wilcoxon signed-ranks test, significance level α = 0.01, Bonferroni correction for the family-wise error
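A minimal pure-Python sketch of the significance-testing setup. It uses the large-sample normal approximation of the Wilcoxon signed-ranks test without tie or continuity correction, whereas statistical packages may use exact distributions or corrections, so p-values can differ slightly. Treating per-user errors of two algorithms as the paired samples is an assumption about the experimental design.

```python
import math
from typing import List, Sequence


def wilcoxon_signed_rank_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided p-value of the Wilcoxon signed-ranks test for paired samples,
    via the normal approximation (zero differences dropped, no tie correction)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    start = 0
    while start < n:  # tied |d| values share their average (1-based) rank
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.01) -> List[bool]:
    """Control the family-wise error by testing each of m comparisons at alpha / m."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


# Invented paired errors of two algorithms on 20 users: the first is always worse.
errors_x = [1.0 + 0.05 * (k + 1) for k in range(20)]
errors_y = [1.0] * 20
print(wilcoxon_signed_rank_p(errors_x, errors_y) < 0.01)  # True
print(bonferroni_significant([0.001, 0.004, 0.02]))  # [True, False, False]
```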
Results, 50 ratings/user
Algorithm           RMSE      MAE       Prec.    Recall   F1
pUMSim (Part)       1.097698  0.898961  66.23%   71.23%   68.64%
UMSim (SVM)         1.077945  0.88902   66.72%   71.33%   68.95%
UMSim (Part)        1.075843  0.885730  66.34%   68.34%   68.34%
CF (Pearson Corr.)  1.131921  0.929923  65.19%   71.14%   68.04%
SVM                 1.309146  0.976800  63.85%   71.68%   67.53%
Part                1.334507  1.003800  64.32%   70.98%   67.49%

Results, 200 ratings/user
Algorithm           RMSE      MAE       Prec.    Recall   F1
pUMSim (Part)       1.048786  0.843029  60.90%   66.83%   63.73%
UMSim (SVM)         1.035611  0.835009  60.77%   67.33%   63.88%
UMSim (Part)        1.032746  0.833374  60.89%   67.31%   63.94%
CF (Pearson Corr.)  1.035324  0.832373  60.56%   68.71%   64.38%
SVM                 1.230682  0.896450  58.54%   67.80%   62.83%
Part                1.292360  0.953600  58.76%   64.72%   61.60%
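The tables report RMSE, MAE, precision, recall and F1. RMSE and MAE are standard; how precision and recall were derived from rating predictions is not stated, so the sketch below assumes a hypothetical relevance threshold: an item counts as relevant if its true rating is at least 4 and as recommended if its predicted rating is at least 4.

```python
import math
from typing import Sequence, Tuple


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean squared error of rating predictions."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error of rating predictions."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def precision_recall_f1(
    pred: Sequence[float], true: Sequence[float], threshold: float = 4.0
) -> Tuple[float, float, float]:
    """Assumption: relevant = true rating >= threshold, recommended = predicted >= threshold."""
    tp = sum(1 for p, t in zip(pred, true) if p >= threshold and t >= threshold)
    recommended = sum(1 for p in pred if p >= threshold)
    relevant = sum(1 for t in true if t >= threshold)
    precision = tp / recommended if recommended else 0.0
    recall = tp / relevant if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


pred = [4.2, 3.1, 4.8, 2.0]  # invented predictions
true = [5, 4, 4, 1]  # invented true ratings
print(round(rmse(pred, true), 4), round(mae(pred, true), 4))  # 0.8789 0.875
print(precision_recall_f1(pred, true))
```

F1 is the harmonic mean of precision and recall; for the pUMSim (Part) row at 50 ratings/user, 2 · 0.6623 · 0.7123 / (0.6623 + 0.7123) ≈ 0.6864, matching the reported 68.64%.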
Conclusion
- Model similarity is important: similarity based on user preference models sometimes significantly outperforms similarity based on common rated items, especially with few common rated items
- Partial user preference similarity needs further improvement
- Preprocessing is needed for scalability
Thanks for your attention
References
1. S. S. Anand, P. Kearney, and M. Shapcott. Generating semantically enriched user profiles for web personalization. ACM Transactions on Internet Technology, 2007.
2. M. Balabanovic and Y. Shoham. Fab: Content-based, collaborative recommendation. Communications of the ACM, 1997.
3. C. Basu, H. Hirsh, and W. Cohen. Recommendation as classification: Using social and content-based information in recommendation. In AAAI, 1998.
4. J. Bennett and S. Lanning. The Netflix Prize. In KDD Cup and Workshop, 2007.
5. J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In 14th Conference on Uncertainty in AI, 1998.
6. I. Cantador, A. Bellogín, and P. Castells. A multilayer ontology-based hybrid recommendation model. AI Communications, 2008.
7. E. Frank and I. H. Witten. Generating accurate rule sets without global optimization. In 15th International Conference on Machine Learning, 1998.
8. N. Good, J. B. Schafer, J. A. Konstan, A. Borchers, B. Sarwar, J. Herlocker, and J. Riedl. Combining collaborative filtering with personal agents for better recommendations. In AAAI/IAAI, 1999.
9. J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 2004.
10. D. Lemire and A. Maclachlan. Slope one predictors for online rating-based collaborative filtering. In Proceedings of SIAM Data Mining (SDM'05), 2005.
11. P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In AAAI, 2002.
12. S. E. Middleton, H. Alani, and D. C. de Roure. Exploiting synergy between ontologies and recommender systems. In WWW, 2002.
13. S. E. Middleton, N. R. Shadbolt, and D. C. de Roure. Ontological user profiling in recommender systems. ACM Transactions on Information Systems, 2004.
14. T. M. Mitchell. Machine Learning. 1997.
15. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An open architecture for collaborative filtering of netnews. In CSCW, 1994.
16. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In WWW, 2001.
17. I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. 2005.
Summary
Amancio Bouza, Gerald Reif, Abraham Bernstein: "Probabilistic Partial User Model Similarity for Collaborative Filtering", IRMLeS 2009
- Similarity based on user preference models is important. User model: $u(i) = h(i) + \varepsilon(i)$, with hypothesized user function $h: i \mapsto c_k$.
- Partial user preference similarity is based on the similarity between a hypothesis and a user model: hypothesis extraction from the user model; hypothesis as item filter.
- User preference similarity is good, but partial user preference similarity is not always; it needs further investigation.