Exploiting Similarity to Optimize Recommendations from User Feedback
Hasta Vanchinathan, Andreas Krause (Learning and Adaptive Systems Group, D-INF, ETHZ)
Collaborators: Isidor Nikolic (Microsoft, Zurich), Fabio De Bona (Google, Zurich)
A Recommendation Example
[Figure sequence: a worked recommendation example]
Many real world instances…
Disclaimer: All trademarks belong to respective owners
Common Thread
• To do well, we need a model, e.g., of how items map to rewards
• Popular techniques include
  – Content-based filtering
  – Collaborative filtering
  – Hybrid recommendation systems
• All aim to predict reward given a fixed data set
Challenges
• Many, dynamic!
• Preferences change
• Estimating all combinations is both hard and wasteful!
• We only need to identify high-reward items!
Multi-Armed Bandits
• Early approaches break down when there are more arms than rounds (k > T), since every arm must be tried separately
Learning meets bandits
[Figure: unknown reward function f(x) over items x]
• Exploit similarity information to predict rewards for new items
• Must make assumptions on the reward function f(x), e.g. (spelled out below):
  – Linear (LinUCB, Li et al. '10)
  – Lipschitz (Bubeck et al. '08)
  – Low RKHS norm (GP-UCB, Srinivas et al. '12)
• This is the approach we pursue in this work!
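In standard (illustrative) notation, these three assumptions read:

    Linear:         f(x) = \theta^\top x            for some unknown weight vector \theta
    Lipschitz:      |f(x) - f(x')| \le L \, d(x, x')  for a metric d and constant L
    Low RKHS norm:  \|f\|_{\mathcal{H}_K} \le B       for the RKHS of a kernel K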
Problem Setup
[Figure: in each round a user arrives and a list of items is recommended; the legend marks the user attributes]
• Each arriving user is described by a set of user attributes (the context)
• We want to maximize the cumulative reward collected over all rounds
• Equivalently, minimize the regret against the best possible recommendations (written out below)
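The objective formulas are elided in this export; in this list-recommendation setting they typically read as follows (notation illustrative, with b the list size and r_{t,l} the reward of the item shown at slot l in round t):

    maximize   \sum_{t=1}^{T} \sum_{l=1}^{b} r_{t,l}

    equivalently, minimize the regret
    R_T = \sum_{t=1}^{T} \Big( OPT_t - \sum_{l=1}^{b} r_{t,l} \Big)

where OPT_t denotes the expected reward of the best list for the round-t context.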
Our Approach
• We propose CGPRank, which uses a Bayesian model for the rewards
• CGPRank efficiently shares reward information across
  – Items
  – Users
  – Positions
‘Demux’ing Feedback
• We still need to predict the reward of showing an item at a given position
• Assume: items do not influence the reward of other items in the list
• The observed click signal then decomposes into the item's relevance and the position's CTR
CGPRank – Sharing across positions
Example from the slide: CTRs of 0.3, 0.17, 0.16, 0.08 are observed at positions 1–4. For a second ranking, the CTRs at positions 2 and 3 are unknown (??) and are filled in as 0.19 and 0.13 using the position weights.

    Position:          1      2          3          4
    Observed CTR:      0.30   0.17       0.16       0.08
    Second ranking:    0.30   ?? → 0.19  ?? → 0.13  0.08
    Position weight:   1.00   0.80       0.65       0.47

• Position weights are independent of items!
• They are estimated from logs
• A CTR observed at one position transfers to another by rescaling with the position weights (see the sketch below)
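A minimal sketch of the multiplicative position-bias calculation this example alludes to; the weights are the slide's example values, while the function name and the transfer formula are illustrative assumptions, not taken verbatim from the paper:

    # Position weights: relative CTR of each display position, independent of
    # the item shown there, estimated offline from logs.
    position_weights = [1.0, 0.8, 0.65, 0.47]

    def transfer_ctr(ctr_observed, pos_observed, pos_new):
        """Transfer a CTR observed at one position to a prediction for another
        position by rescaling with the position weights (0-indexed positions)."""
        relevance = ctr_observed / position_weights[pos_observed]  # item-only effect
        return relevance * position_weights[pos_new]

    # e.g. an item with CTR 0.16 observed at position 3 is predicted ~0.20 at position 2:
    print(transfer_ctr(0.16, 2, 1))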
CGPRank – Sharing across items/users
Sharing across items / users with Gaussian processes
Bayesian models for functions:
• Prior P(f)
• Likelihood P(data | f)
• Posterior P(f | data)
[Figure: reward f(x) over choice x, showing likely and unlikely functions under the prior, observed data points, and the resulting posterior]
• Closed-form Bayesian posterior inference is possible! (formulas below)
• Allows us to represent uncertainty in predictions
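For reference, the closed-form posterior referred to here is the standard Gaussian process update (a known result, not copied from the slide): given noisy observations y_i = f(x_i) + \varepsilon_i with \varepsilon_i \sim \mathcal{N}(0, \sigma^2), the posterior at any x is Gaussian with

    \mu_t(x)       = k_t(x)^\top (K_t + \sigma^2 I)^{-1} y_{1:t}
    \sigma_t^2(x)  = k(x, x) - k_t(x)^\top (K_t + \sigma^2 I)^{-1} k_t(x)

where K_t is the kernel matrix of the points observed so far and k_t(x) is the vector of kernel values between x and those points.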
Predictive confidence in GPs
[Figure: posterior over f(x), with the marginal distribution P(f(x')) highlighted at a query point x']
• Typically, we only care about the "marginals", i.e., the predictive distribution P(f(x')) at a query point x'
• The GP is parameterized by the covariance function K(x, x') = Cov(f(x), f(x'))
• Many recommendation tasks can be captured using an appropriate covariance function (example below)
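One common construction for recommendation tasks, given here as an illustrative example rather than the slide's own choice, is a product kernel over (item, user) pairs:

    K\big((x, z), (x', z')\big) = K_{items}(x, x') \cdot K_{users}(z, z')

so that feedback on one (item, user) pair informs predictions for similar items shown to similar users.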
Intuition: Explore-Exploit using GPs
Selection Rule: pick the item with the highest upper confidence bound on its reward (made explicit below).
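The rule itself is elided in this export; the GP-UCB rule cited earlier (Srinivas et al.), which this intuition follows, reads

    x_t = \arg\max_x \; \mu_{t-1}(x) + \beta_t^{1/2} \, \sigma_{t-1}(x)

where \beta_t is the time-varying exploration/exploitation tradeoff parameter that reappears on the CGPRank slides below.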
CGPRank – Selection Rule
• At t = 0, with no prior observations, uncertainty is uniformly high
• At t = 0, with some prior observations, uncertainty shrinks not just at the observation…
• …but also at other locations, based on similarity!
• If the list size is 2: the first item is selected by maximizing the upper confidence bound
• Secret sauce? A time-varying tradeoff parameter
• Hallucinate the mean at the selected item and shrink the uncertainties… (see the sketch below)
• Now update the model and pick the second item using the same rule
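A minimal sketch of this selection loop, assuming a finite candidate set with a precomputed kernel matrix; the helper names (gp_posterior, select_list) and the noise/beta parameters are illustrative, not taken from the paper:

    import numpy as np

    def gp_posterior(K, obs_idx, y, noise=0.1):
        """Posterior mean and variance at every item, given observations y at
        indices obs_idx. K is the (n x n) kernel matrix over all candidates."""
        if len(obs_idx) == 0:
            return np.zeros(K.shape[0]), np.diag(K).copy()
        K_oo = K[np.ix_(obs_idx, obs_idx)] + noise * np.eye(len(obs_idx))
        K_ao = K[:, obs_idx]
        alpha = np.linalg.solve(K_oo, np.asarray(y, dtype=float))
        mu = K_ao @ alpha
        var = np.diag(K) - np.einsum('ij,ij->i', K_ao @ np.linalg.inv(K_oo), K_ao)
        return mu, var

    def select_list(K, obs_idx, y, list_size, beta, noise=0.1):
        """Pick a list greedily: UCB selection, then hallucinate the posterior
        mean as feedback so uncertainty (not the mean) shrinks before the next pick."""
        obs_idx, y = list(obs_idx), list(y)
        chosen = []
        for _ in range(list_size):
            mu, var = gp_posterior(K, obs_idx, y, noise)
            ucb = mu + np.sqrt(beta) * np.sqrt(np.maximum(var, 0.0))
            ucb[chosen] = -np.inf              # do not repeat items within the list
            i = int(np.argmax(ucb))
            chosen.append(i)
            obs_idx.append(i)
            y.append(mu[i])                    # hallucinated (mean) observation
        return chosen

Hallucinating the posterior mean leaves the mean unchanged but shrinks the posterior variance at the chosen item and, through the kernel, at similar items, which is what spreads the remaining picks across the candidate set.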
CGPRank
[Figure sequence: CGPRank algorithm walkthrough]
CGPRank – Guarantees
Theorem 1: If we choose the tradeoff parameter appropriately, then running CGPRank for T rounds incurs a regret sublinear in T.
The bound grows strongly sublinearly for typical kernels (typical form below).
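The exact expression is elided in this export; bounds in this GP-UCB family typically take a form like

    R_T = O\big( \sqrt{T \, b \, \beta_T \, \gamma_T} \big)

where b is the list size, \beta_T the tradeoff parameter, and \gamma_T the maximum information gain of the kernel; \gamma_T grows only polylogarithmically in T for common kernels (e.g., the RBF kernel), which is what makes the regret strongly sublinear.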
Experiments – Datasets
• Google books store logs
  – 42 days of user logs
  – Given a key book, suggest a list of related books
  – Kernel computed from the "related" graph on books
• Yahoo! Webscope R6B
  – 10 days of user logs on the Yahoo! front page
  – Unbiased method to test bandit algorithms
  – 45 million user interactions with 271 articles
  – Feedback available for a single selection; we simulated list selection
Experiments – Questions
• How much does principled sharing of feedback help?
  – Across items/contexts?
  – Across positions?
• Can CGPRank outperform an existing, tuned recommendation system?
Sharing across items [results figure]
Sharing across contexts [results figure]
Effect of increasing list size [results figure]
Boost over existing approach [results figure; baseline: existing algorithm]
Conclusions
• CGPRank: an efficient algorithm with strong theoretical guarantees
• Can generalize from sparse feedback across
  – Items
  – Contexts
  – Positions
• Experiments suggest statistical and computational efficiency