
Exploiting Similarity to Optimize Recommendations from User Feedback
Hasta Vanchinathan, Andreas Krause (Learning and Adaptive Systems Group, D-INFK, ETH Zurich)

Collaborators: Isidor Nikolic (Microsoft, Zurich), Fabio De Bona (Google, Zurich)

A Recommendation Example

[Figure: step-by-step walkthrough of a recommendation example]

Many real world instances…

Disclaimer: All trademarks belong to respective owners

Common Thread
• To do well, we need a model, e.g., of the reward for each user-item pair
• Popular techniques include
  – Content-based filtering
  – Collaborative filtering
  – Hybrid recommendation systems
• All aim to predict reward given a fixed data set

Challenges
• Many, dynamic items!
• Preferences change!
• Estimating all combinations is both hard and wasteful!
• We only need to identify high-reward items!

Multi-Armed Bandits
• Early approaches treat the k arms independently, and thus require T > k rounds (every arm must be tried at least once), which is infeasible for large item sets
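
To make the T > k requirement concrete, here is a minimal UCB1 sketch in Python (the arm count and the Bernoulli reward simulator are illustrative assumptions, not from the talk); note the initialization loop that must pull every arm once before the UCB scores are even defined:

    import math
    import random

    def ucb1(n_arms, n_rounds, pull):
        # pull(arm) -> stochastic reward in [0, 1]
        counts = [0] * n_arms
        means = [0.0] * n_arms
        # Initialization: every arm must be tried once,
        # so UCB1 only makes sense when n_rounds > n_arms.
        for arm in range(n_arms):
            counts[arm] = 1
            means[arm] = pull(arm)
        for t in range(n_arms, n_rounds):
            # Optimism: empirical mean plus a confidence bonus.
            scores = [means[a] + math.sqrt(2 * math.log(t + 1) / counts[a])
                      for a in range(n_arms)]
            arm = max(range(n_arms), key=lambda a: scores[a])
            r = pull(arm)
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]
        return means

    # Example: 5 Bernoulli arms with hypothetical click probabilities.
    probs = [0.1, 0.2, 0.3, 0.25, 0.05]
    means = ucb1(5, 1000, lambda a: 1.0 if random.random() < probs[a] else 0.0)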

Learning meets bandits

[Figure: reward function f(x) over choices x]

• Exploit similarity information to predict rewards for new items
• Must make assumptions on the reward function f(x), e.g.:
  • Linear (LinUCB, Li et al. ’10)
  • Lipschitz (Bubeck et al. ’08)
  • Low RKHS norm (GP-UCB, Srinivas et al. ’12)
• This is the approach we pursue in this work!

Problem Setup

[Figure: in each round, a user with attributes z_t arrives and a list of items is recommended]

• In each round t = 1, …, T: observe a context z_t (user attributes), recommend a list L_t of items, and receive click feedback
• We want to maximize the cumulative reward over all T rounds
• Equivalently, minimize the regret against the best lists in hindsight
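
In LaTeX, one plausible formalization of the objective and regret sketched above (the slides' equations were images, so the notation here is our reconstruction):

    % Maximize the expected cumulative reward of the shown lists L_t:
    \max \; \sum_{t=1}^{T} \mathbb{E}\!\left[\,\mathrm{rew}(L_t, z_t)\,\right]
    % Equivalently, minimize the regret against the best list per context:
    R_T \;=\; \sum_{t=1}^{T} \Big( \mathrm{rew}(L_t^{\star}, z_t) - \mathrm{rew}(L_t, z_t) \Big),
    \qquad \text{goal: } R_T / T \to 0 .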

Our Approach
• We propose CGPRank, which uses a Bayesian model for the rewards
• CGPRank efficiently shares reward feedback across
  – Items
  – Users
  – Positions

‘Demux’ing Feedback
• We still need to predict per-item rewards from list-level click feedback
• Assume: items do not influence the reward of other items
• Under this assumption, an observed click-through rate decomposes into item relevance and position CTR

CGPRank – Sharing across positions

Worked example of per-position CTRs:

  Position   Observed list   Partially observed list   Position weight
  1          0.30            0.30                       1.00
  2          0.17            ?? → 0.19                  0.80
  3          0.16            ?? → 0.13                  0.65
  4          0.08            0.08                       0.47

• Position weights: independent of items!
• Estimated from logs
• The missing entries (??) are inferred by sharing feedback across positions via the weights
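
A minimal Python sketch of this position sharing, assuming the multiplicative model the example suggests (observed CTR ≈ item relevance × position weight); the function names are ours and the weights are the ones shown above:

    # Position weights: independent of items, assumed estimated from logs.
    POSITION_WEIGHTS = [1.0, 0.8, 0.65, 0.47]

    def relevance_from_click(observed_ctr, position):
        # Invert the multiplicative model: ctr = relevance * weight[position].
        return observed_ctr / POSITION_WEIGHTS[position]

    def predict_ctr(relevance, position):
        # Share the de-biased relevance estimate with any other position.
        return relevance * POSITION_WEIGHTS[position]

    # Example: an item clicked at rate 0.3 in the top slot (position 0)
    # yields predicted CTRs for the lower slots it was never shown in.
    rel = relevance_from_click(0.3, 0)
    print([predict_ctr(rel, p) for p in range(4)])  # [0.3, 0.24, 0.195, 0.141]

(The inferred values on the slide combine evidence from both observed positions, so they need not match this single-observation calculation exactly.)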

CGPRank – Sharing across items/users

[Figure: build-up showing feedback shared between similar items and similar users]

Sharing across items/users with Gaussian processes

[Figure: sampled reward functions f(x) over choices x, labeled likely/unlikely, with observed data points +]

• Bayesian models for functions: x = choice, f(x) = reward
• Prior P(f): marks some reward functions as likely, others as unlikely
• Likelihood P(data | f): how well a function explains the observed rewards
• Posterior P(f | data): functions inconsistent with the data become unlikely
• Closed-form Bayesian posterior inference is possible!
• Allows us to represent uncertainty in predictions
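
The closed-form inference fits in a few lines. A minimal Python/NumPy sketch, where the RBF kernel and the noise level are illustrative assumptions rather than choices from the talk:

    import numpy as np

    def rbf_kernel(A, B, lengthscale=1.0):
        # K(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2))
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale**2)

    def gp_posterior(X_obs, y_obs, X_test, noise=0.1):
        # Closed-form GP posterior mean and marginal std at the test points.
        K = rbf_kernel(X_obs, X_obs) + noise**2 * np.eye(len(X_obs))
        K_s = rbf_kernel(X_obs, X_test)
        K_ss = rbf_kernel(X_test, X_test)
        alpha = np.linalg.solve(K, y_obs)
        mu = K_s.T @ alpha
        cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
        return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

    # Example: three observed (choice, reward) pairs; query a grid of choices.
    X = np.array([[0.1], [0.5], [0.9]])
    y = np.array([0.2, 0.7, 0.3])
    mu, sigma = gp_posterior(X, y, np.linspace(0, 1, 5)[:, None])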

Predictive confidence in GPs

[Figure: GP posterior over f(x), with the marginal P(f(x’)) at a single point x’]

• Typically, we only care about “marginals”, i.e., P(f(x’)) at individual points x’
• Parameterized by the covariance function K(x,x’) = Cov(f(x),f(x’))
• Can capture many recommendation tasks using an appropriate covariance function
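
For recommendation, one natural construction (an assumption on our part, matching the items/contexts sharing discussed here) is a product kernel: covariance is high only when both the items and the user contexts are similar.

    import numpy as np

    def item_kernel(x, x_prime):
        # Similarity of item feature vectors (cosine similarity).
        return float(x @ x_prime / (np.linalg.norm(x) * np.linalg.norm(x_prime)))

    def context_kernel(z, z_prime, lengthscale=1.0):
        # Similarity of user-attribute vectors (RBF).
        return float(np.exp(-0.5 * np.sum((z - z_prime) ** 2) / lengthscale**2))

    def joint_kernel(pair, pair_prime):
        # K((x, z), (x', z')) = K_item(x, x') * K_context(z, z')
        (x, z), (xp, zp) = pair, pair_prime
        return item_kernel(x, xp) * context_kernel(z, zp)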

Intuition: Explore-Exploit using GPs

Selection rule (GP-UCB style): pick

  x_t = argmax_x  μ_{t−1}(x) + β_t^{1/2} · σ_{t−1}(x)

i.e., favor choices with high predicted reward (exploit) or high uncertainty (explore)

CGPRank – Selection Rule
• At t = 0, with no prior observations, the posterior is just the prior
• After an observation, uncertainty shrinks not just at the observed point…
• …but also at other locations, based on similarity!
• If the list size is 2: the first item, x_{t,1}, is selected by the GP-UCB rule above
• Secret sauce? The time-varying tradeoff parameter β_t
• For the second slot: hallucinate the posterior mean at x_{t,1} as if it were observed, and shrink the uncertainties…
• …then update the model and again pick the maximizer of μ(x) + β_t^{1/2} · σ(x)
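
Putting the pieces together, a sketch of the greedy list-selection loop just described, reusing gp_posterior from the GP sketch above (the β_t value and candidate features below are illustrative):

    import numpy as np

    def select_list(candidates, X_obs, y_obs, list_size, beta_t):
        # Greedily fill the list; after each pick, 'hallucinate' the posterior
        # mean at the chosen item as if observed, which shrinks uncertainty
        # for similar items before the next slot is filled.
        X, y = np.array(X_obs, dtype=float), np.array(y_obs, dtype=float)
        chosen = []
        for _ in range(list_size):
            mu, sigma = gp_posterior(X, y, candidates)
            scores = mu + np.sqrt(beta_t) * sigma   # GP-UCB acquisition
            scores[chosen] = -np.inf                # never pick an item twice
            i = int(np.argmax(scores))
            chosen.append(i)
            X = np.vstack([X, candidates[i]])       # hallucinated observation
            y = np.append(y, mu[i])
        return chosen

    # Example: rank 2 of 5 hypothetical items given 4 past observations.
    cands = np.random.rand(5, 3)
    picked = select_list(cands, np.random.rand(4, 3), np.random.rand(4), 2, beta_t=2.0)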

CGPRank

[Figure: algorithm walkthrough]

CGPRank – Guarantees

Theorem 1. If we choose β_t appropriately (growing logarithmically in t), then running CGPRank for T rounds incurs regret sublinear in T. Specifically, the bound takes the form

  R_T = O( sqrt( T · m · β_T · γ_T ) ),

where m is the list size and γ_T is the maximum information gain of the kernel, which grows strongly sublinearly for typical kernels.

Experiments – Datasets
• Google Books store logs
  – 42 days of user logs
  – Given a key book, suggest a list of related books
  – Kernel computed from the “related” graph on books
• Yahoo! Webscope R6B
  – 10 days of user logs on the Yahoo! front page
  – An unbiased method to test bandit algorithms
  – 45 million user interactions with 271 articles
  – Feedback available only for a single selection; we simulated list selection

Experiments – Questions
• How much does principled sharing of feedback help?
  – Across items/contexts?
  – Across positions?
• Can CGPRank outperform an existing, tuned recommendation system?

Sharing across items

[Results figure]

Sharing across contexts

[Results figure]

Effect of increasing list size

[Results figure]

Boost over existing approach

[Results figure; baseline labeled “Existing Algorithm”]

Conclusions
• CGPRank: an efficient algorithm with strong theoretical guarantees
• Can generalize from sparse feedback across
  – Items
  – Contexts
  – Positions
• Experiments suggest statistical and computational efficiency
