Personalized Implicit Learning in a Music Recommender System

Suzana Kordumova1, Ivana Kostadinovska1, Mauro Barbieri2, Verus Pronk2, and Jan Korst2

1 Ss. Cyril and Methodius University, Faculty of Natural Sciences and Mathematics, 1000 Skopje, R. Macedonia
{suzana.kordumova,ivana.kostadinovska}@gmail.com
2 Philips Research, High Tech Campus 34, 5656 AE Eindhoven, The Netherlands
{mauro.barbieri,verus.pronk,jan.korst}@philips.com
Abstract. Recommender systems typically require feedback from the user to learn the user’s taste. This feedback can come in two forms: explicit and implicit. Explicit feedback consists of ratings provided by the user for a number of items, while implicit feedback comes from observing user actions on items. These actions have to be interpreted by the recommender system and translated into a rating. In this paper we propose a method to learn how to translate user actions on items to ratings on these items by correlating user actions with explicit feedback. We do this by associating user actions to rated items and subsequently applying naive Bayesian classification to rate new items with which the user has interacted. We apply and evaluate our method on data from a web-based music service and we show its potential as an addition to explicit rating. Keywords: implicit learning; recommender systems; user behavior; relief algorithm; naive Bayesian classification.
1 Introduction

Recommenders are becoming a popular tool to retrieve, from a vast amount of items such as A/V content repositories, product catalogues, and the like, only those items a user or a group of users likes. These recommenders are typically offered as a stand-alone service (e.g. MovieLens [1]) or as an add-on to an existing service (e.g. Amazon, iTunes). They increasingly appear in consumer devices, such as the TiVo DVR [2] or the Media Center PC running the Watchmi software plug-in [3].

Recommender systems typically require feedback from the user to learn the user's taste. This feedback generally comes in two forms: explicit and implicit. Explicit feedback consists of a user providing ratings for a number of items, e.g. on a five-point scale or in a binary like/dislike form. Implicit feedback comes from observing user actions such as purchases, downloads, and selections of items for playback or deletion. These user actions have to be interpreted by the recommender system and
translated into a rating. For example, recommender systems typically interpret a purchase action or, in the case of music services, a listening action as a positive rating, and the skipping of a song in an album as a negative rating. These types of user actions are quite rudimentary, and it is not always clear how to interpret them. For example, a skipped song may, but need not, indicate a negative rating. Furthermore, the interpretation may be user-dependent.

Some of the advantages of implicit learning are that it frees users from having to explicitly rate items and that it allows continuous updating of user preferences.

In this paper we propose a method to learn, from explicit ratings, how to translate user actions on items to ratings on these items. We do this by associating user actions to rated items and considering these actions as features of these items. Naive Bayesian classification is then used to rate new items with which the user has interacted. These implicitly rated items provide an additional source of information for recommending new items. We present a preliminary evaluation of our method on data from a web-based music player. In particular, we show that it enables a form of implicit learning that adapts to each user individually.

The rest of the paper is structured as follows. Section 2 gives an overview of related work in the domain of implicit learning for recommender systems. Section 3 introduces our approach to implicit learning and the architecture we adopted. In Sect. 4 we describe the music listening behavior features. Evaluation and results are presented in Sect. 5. Section 6 concludes the paper.
2 Related Work

Given the advantages that implicit learning may offer, more and more recommender systems now rely on implicit profiles. Recommender systems in many areas of everyday life, such as e-commerce, digital news, music and TV recommendation, and search engines, collect implicit knowledge.

In [4] and [5], a user's buying behavior is recorded to achieve more personalized and effective recommendations. In [4], the proposed system first collects user behavioral data, then analyzes this data to extract user preferences and, using these preferences, creates recommendations. This system deduces the most important attributes for the user among various product attributes based on the ID3 algorithm. In [5], besides tracking the user's behavior, the interaction history of a group of similar users is analyzed. The authors find a group of customers having similar characteristics and then analyze their transaction histories. Customer profiles are constructed using the product features and individual and group interests.

Because studies have shown that the vast majority of users are reluctant to provide any explicit feedback on search results and their interests [6], search engines also try to learn the user's preferences automatically. In [6], a user model is proposed to formalize web pages and to correlate them with clicks on search results. Based on this correlation, the paper describes an algorithm to actually learn interests.

Many TV recommender systems have adopted the concept of implicit learning. In [7], the author explores different user actions that can be tracked and possible features that can be extracted from the user's behavior to model the user's preferences in TV
shows. The paper also analyzes the most relevant machine learning solutions for adapting recommendations to users.
3 Implicit Learning Model

We propose an implicit learning model that uses music listening behavioral data and an explicit rating history to build an implicit rating history. The architecture of the model is presented in Fig. 1.

Fig. 1. Implicit learning model
The music player system logs the user's actions and generates behavioral data. The behavioral data corresponding to songs that have explicit ratings is used to calculate behavioral features. These features are combined with the explicit ratings to build a behavioral features profile, a recommender-specific representation of the rating history. Thus, in this architecture, explicit ratings are required, but here they are used to build a behavior-based recommender that can link user behavior to likes and dislikes. When the explicit rating history and the features profile are rich enough, the behavior-based recommender can start working and build an implicit rating history. When the user interacts with a new song, this interaction is analyzed and behavioral features are extracted using the same feature calculation procedure as for the explicitly rated items. Using these features and the features profile, the recommender classifies the new item and writes the item and its inferred class into the implicit rating history. In our model we work with two classes: positive and negative (likes and dislikes).
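The behavior-based recommender rests on standard naive Bayesian classification over discrete behavioral feature values. The sketch below illustrates the basic mechanics under our own assumptions: the feature names, value bins and add-one smoothing are hypothetical, not the exact classifier configuration used in the paper.

```python
from collections import defaultdict
import math

class BehaviorNaiveBayes:
    """Two-class (like/dislike) naive Bayes over discrete behavioral features.
    A minimal sketch; the smoothing and binning choices are assumptions."""

    def __init__(self, classes=("like", "dislike")):
        self.classes = classes
        self.class_counts = defaultdict(int)    # N(c)
        self.feature_counts = defaultdict(int)  # N(feature, value, c)

    def train(self, features, label):
        """features maps feature name -> discrete (binned) value."""
        self.class_counts[label] += 1
        for name, value in features.items():
            self.feature_counts[(name, value, label)] += 1

    def classify(self, features):
        """Return the class with the highest log-posterior."""
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for c in self.classes:
            # log P(c) with add-one smoothing (an assumption of this sketch)
            score = math.log((self.class_counts[c] + 1) / (total + len(self.classes)))
            for name, value in features.items():
                # log P(value | c), again with add-one smoothing
                num = self.feature_counts[(name, value, c)] + 1
                score += math.log(num / (self.class_counts[c] + 2))
            if score > best_score:
                best, best_score = c, score
        return best

# Hypothetical usage with binned behavioral features:
nb = BehaviorNaiveBayes()
nb.train({"times_played": "3+", "pct_played": "90-100"}, "like")
nb.train({"times_played": "1", "pct_played": "0-10"}, "dislike")
print(nb.classify({"times_played": "2", "pct_played": "80-90"}))
```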
Note that the behavior-based recommender does not provide recommendations directly to the user; it is used only to classify items based on the user interaction. The idea is that the use of both rating histories results in faster learning than using only an explicit rating history. The implicit and explicit rating histories can be used in different ways to recommend new items with which the user has not yet had any interaction. One way is to construct separate user profiles from the implicit and explicit rating histories (see Fig. 2). Both profiles are then used by an integrated recommender to rate new items. This recommender is designed to combine two sources of information about the user and, based on them, make a distinction between future likes and dislikes. An example of an integrated recommender is described in [8]. The behavior-based recommender and the integrated recommender are based on naive Bayesian classification.
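The details of the integrated recommender are deferred to [8]. Purely as an illustration of how two profile-based outputs could be combined (and explicitly not the method of [8]), one might take a weighted log-linear mixture of the two recommenders' like-probabilities; the weight value here is hypothetical:

```python
import math

def combined_score(p_like_explicit, p_like_implicit, w_explicit=0.7):
    """Illustrative log-linear combination of two recommenders' outputs.
    The weighting scheme and weight value are hypothetical, not the
    integration method of [8]. Inputs must lie strictly between 0 and 1."""
    w_implicit = 1.0 - w_explicit
    log_odds = (w_explicit * math.log(p_like_explicit / (1.0 - p_like_explicit))
                + w_implicit * math.log(p_like_implicit / (1.0 - p_like_implicit)))
    return 1.0 / (1.0 + math.exp(-log_odds))  # back to a probability

print(combined_score(0.8, 0.6))  # e.g. ~0.75
```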
Fig. 2. Combination of implicit and explicit rating histories
4 Extracting Features from Music Listening Behavior

We have applied our model to a web-based music player called xStream [9], developed by Philips Research and used within the company for research purposes. xStream offers an interface to a music library with about 80,000 songs from many artists and genres. Users can search or browse the library and select songs to play. When an artist is selected from the library, all of the artist's songs and albums are displayed. It is important to note that the player plays all songs of an album sequentially unless the user decides otherwise by interacting with the player. After the user has selected a song, the player allows standard actions such as pause/resume, stop, seek, skip to next or previous song, repeat, shuffle, and volume adjustment. Additionally, the player allows rating songs, albums and artists on a five-point scale.

All interactions with the player are time-stamped and saved in a database together with the song id and the user id. The usage data is then analyzed to extract, for each user and for each rated song, behavioral features. Table 1 gives an overview of the behavioral features we have defined. The following paragraphs explain in more detail the behavioral features and the rationale for considering them.

Times played. Playing a song repeatedly is an indication that the user likes it; therefore the number of times a song has been played is potentially an important feature.
Table 1. Behavioral features

No.  Feature                            Description
1    Times played                       Number of times the song has been played
2    Percentage played                  How much of the song has actually been played, on average
3    Percentage played (multi-valued)   Same as feature 2, but multi-valued
4    Time when played                   Time of the day the song was played
5    Date when played                   Date the song was played
6    Day of the week when played        Day of the week the song was played
7    Times skipped                      Number of times the song was skipped
8    Times next song                    Number of times the song followed a previously played song from the same album
Percentage played. Sometimes users skip to another song before the current one is finished. The actual percentage of a song that has been played is an important feature to take into consideration. When a song is played more than once, this feature is averaged.

Percentage played (multi-valued). Most users play their favorite songs more than once, so the percentage played can also be treated as a multi-valued feature [8] instead of being averaged.

Time, date and day of the week when played. Listening to songs in the morning while at work, or at night, is different, and provides behavioral information about the user, his habits and his music listening mood during different times of the day or different days of the week. These features allow taking into consideration part of the context in which the listening takes place. They are treated as multi-valued features.

Times skipped. The number of times a song is skipped is calculated indirectly by observing the user interacting with other songs of the same album. Because, without user intervention, the xStream system plays all songs of an album sequentially, we can identify the different skipping behaviors schematically represented in Fig. 3:

1. Times skipped in album more than two songs: Corresponds to the user behavior shown in Fig. 3 (i). After playing song a, the user skips to song b, which is more than two songs further in the same album. The value of this feature for all songs between a and b is then increased.

2. Times skipped in album at most two songs: Fig. 3 (ii) and (iii) illustrate the listening behavior that corresponds to this feature. After playing song a, the user skips to song b, not more than two songs further in the same album. The value of this feature for all songs between a and b is then increased. This feature can be further separated into two features: times one song skipped in album and times two songs skipped in album. The first feature is extracted from the listening behavior illustrated in Fig. 3 (ii), and the second feature from the listening behavior shown
in Fig. 3 (iii). It is interesting to investigate whether skipping just one song or skipping more songs corresponds to different degrees of dislike.

3. Times skipped first play: When the user starts the xStream music player, he has to choose one song manually. If this song is not the first track of an album, then we can consider all songs preceding it as skipped. This situation is shown in Fig. 3 (iv), where the value of this feature is increased for all songs before b.

4. Times skipped in other album: This feature represents the behavior shown in Fig. 3 (v). The user skips from song a to a song b belonging to another album. The value of this feature is increased for all the songs of album X following a and for all the songs of album Y preceding b.

Mapping different skipping behaviors to different features allows building a more precise implicit user model. The disadvantage of this approach is a more complex model and a potentially larger number of features with missing values. As a trade-off, we first calculate the features separately and then combine them linearly into one feature, using weights heuristically determined after interviewing 36 users on their skipping behavior. (A sketch of how these skip behaviors can be detected from the play log is given after Fig. 3.)

Times next song. Considering that the xStream music player automatically plays the songs of an album following a song that a user has manually selected, we might expect users to take advantage of this feature or, at least, to have adapted their behavior to this characteristic of the player. Therefore, it is possible that a user favors songs that are followed by other songs he likes. Given a song that is automatically played by the system, this feature counts how many times any of the preceding three songs were manually selected by the user.

To reduce the zero counts in calculating the implicit user profiles, some of the features described above are binned. For example, the percentage played is mapped to 10 non-overlapping, equally distributed bins, while the time when played is non-uniformly mapped to intervals such as "morning", "afternoon", "late evening", etc.

4.1 Listening Sessions Characterization

The listening behavior of a user may vary from one listening session to another. People listen to music while working, when eating, at parties, when hanging out with friends and in many other situations. Depending on the situation, the interaction with the music player may differ considerably. For example, Fig. 4 shows two different listening sessions of about two hours, occurring on different days and times of the day, for a randomly chosen user. The plusses represent events in which the user has manually selected a song, while the circles represent events in which a song is played automatically by the music player. The two listening sessions appear to be quite different. The first one has many plusses, indicating intensive interaction with the music player. Closer inspection of the data reveals that many songs are not played till the end, and most songs that are played are chosen manually by the user. The second listening session shows much less interaction and more songs played automatically. The two sessions correspond to two different times of the day and probably to two different listening situations.
Fig. 3. Different behaviors in skipping songs. The feature values of the skipped tracks are increased by one.
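Assuming the play log is available as a chronological list of events carrying an album id, a 1-based track position and a manual/automatic flag (our own representation, not the actual xStream log schema), the skip behaviors of Fig. 3 could be detected roughly as follows:

```python
def update_skip_counts(events, album_lengths, counts):
    """Accumulate skip counts per (album, track) from one session's events.
    events: chronological list of dicts with keys 'album', 'track'
    (1-based position in the album) and 'manual' (bool).
    album_lengths: album -> number of tracks (hypothetical lookup).
    A rough sketch of the behaviors in Fig. 3, not the paper's exact code."""

    def bump(album, tracks):
        for t in tracks:
            counts[(album, t)] = counts.get((album, t), 0) + 1

    # Case (iv): the first manually chosen song of a session implicitly
    # skips all earlier tracks of its album.
    first = events[0]
    if first["manual"] and first["track"] > 1:
        bump(first["album"], range(1, first["track"]))

    for prev, cur in zip(events, events[1:]):
        if not cur["manual"]:
            continue  # automatic album continuation: nothing was skipped
        if cur["album"] == prev["album"] and cur["track"] > prev["track"] + 1:
            # Cases (i)-(iii): songs strictly between a and b were skipped;
            # the features distinguish gaps of one, two, or more songs.
            bump(cur["album"], range(prev["track"] + 1, cur["track"]))
        elif cur["album"] != prev["album"]:
            # Case (v): jumping to another album skips the rest of the old
            # album and the tracks preceding b in the new one.
            bump(prev["album"], range(prev["track"] + 1,
                                      album_lengths[prev["album"]] + 1))
            bump(cur["album"], range(1, cur["track"]))
```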
Based on the observation of this and many other user session plots from different users, we decided to classify the listening sessions into two main categories: active and passive sessions. Active sessions are sessions in which the user wants to listen to particular songs and chooses them manually; there is a lot of interaction with the music player, and songs are often skipped or not played until the end. Passive sessions are sessions in which songs are played mostly automatically and until the end; the user has little interaction with the music player and may occasionally skip some songs.

We use this definition of session types to define new behavioral features. For example, skipping a song in an active or in a passive session indicates a different degree of dislike. Until now we have not made a distinction between listening behavior in different sessions. One approach to making such a distinction is to weight the values associated with the features based on the type of listening session. We define the following additional behavioral features based on listening sessions:

1. Times first played in a session.
2. Times last played in a session.
3. Percentage listened, weighted based on the session type.
4. Times skipped, weighted based on the session type.
5. Times next song, weighted based on the session type.

Before calculating the session type, the behavioral data of a user is segmented into sessions depending on the timestamps of the interaction events. A new session is started every time a user interaction event occurs at least 20 minutes after the previous one. A session is labeled as active if the number of manually played songs exceeds the number of automatically played songs, and as passive otherwise.
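These two rules translate directly into code. The sketch below assumes the same hypothetical event representation as before, with an added 'time' field:

```python
from datetime import timedelta

SESSION_GAP = timedelta(minutes=20)  # gap that starts a new session

def split_sessions(events):
    """Segment chronological events (dicts with a datetime 'time' key)
    into sessions: a new session starts when an event occurs at least
    20 minutes after the previous one."""
    sessions, current = [], [events[0]]
    for prev, cur in zip(events, events[1:]):
        if cur["time"] - prev["time"] >= SESSION_GAP:
            sessions.append(current)
            current = []
        current.append(cur)
    sessions.append(current)
    return sessions

def session_type(session):
    """'active' if manually played songs outnumber automatic ones."""
    manual = sum(1 for e in session if e["manual"])
    return "active" if manual > len(session) - manual else "passive"
```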
Fig. 4. Two different listening sessions for user 95 (06-03-2007 and 08-03-2007), showing automatically and manually played songs over time
Times first and last played in a session. Users may start listening to music by choosing a song they definitely like. They may also stop a listening session after encountering a song they dislike.

Percentage listened, weighted based on the session type. Not listening to an entire song may indicate a certain degree of dislike for that song, and the degree of dislike may vary depending on the session type. This feature weights the listened percentages of songs depending on the session type.

Times skipped, weighted based on the session type. Skipping a song in an active session is typical behavior and may not indicate the same degree of dislike as a skip in a passive session. This feature counts the number of times a song is skipped, weighted depending on the session type: skipping a song in a passive session weighs twice as heavily as skipping a song in an active session.

Times next song, weighted based on the session type. Similar to skipping songs, the number of times a song has been played automatically may be interpreted differently for active and passive sessions. This feature counts how many times a song has been played automatically and weights these counts depending on the session type.
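As an illustration, the session-weighted skip count could be accumulated as follows; the factor of two for passive sessions is taken from the text, while the per-event representation is our assumption:

```python
SKIP_WEIGHTS = {"active": 1.0, "passive": 2.0}  # passive skips weigh double

def weighted_times_skipped(skip_events):
    """skip_events: list of (song_id, session_type) pairs for one user.
    Returns song_id -> session-weighted skip count. A minimal sketch."""
    totals = {}
    for song_id, stype in skip_events:
        totals[song_id] = totals.get(song_id, 0.0) + SKIP_WEIGHTS[stype]
    return totals
```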
5 Evaluation and Results

The goal of our experiments was to evaluate the performance of the implicit learning model and its potential for a music recommender. The experiments were conducted using data from the xStream database. From a total of 661 users who had 2,626,342 interactions in the period 08.12.2006 – 01.04.2009, we selected for our tests only the 58 users who had rated more than 50 songs. The number of rated songs per user varies from 51 to 825. To classify the instances into positive (like) and negative (dislike), we need to set a decision threshold for the Bayesian classifier. We have chosen to use symmetric classification (approximately the same positive and negative classification accuracy) to balance the positive and negative error rates (see also [8]). For assessing the performance of the classifier, we have used leave-one-out cross-validation.
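The paper does not detail how the symmetric decision threshold is found. One straightforward way, sketched here as an assumption rather than the authors' procedure, is to sweep a threshold over the classifier's like-scores from leave-one-out cross-validation and pick the value where the positive and negative accuracies are closest:

```python
def symmetric_threshold(scores_and_labels):
    """scores_and_labels: (like_score, is_positive) pairs collected via
    leave-one-out cross-validation. Returns the threshold that best
    balances positive and negative accuracy. A sketch under stated
    assumptions, not the paper's exact calibration procedure."""
    n_pos = sum(1 for _, pos in scores_and_labels if pos)
    n_neg = len(scores_and_labels) - n_pos
    best_t, best_gap = None, float("inf")
    for t in sorted(s for s, _ in scores_and_labels):
        tpr = sum(1 for s, pos in scores_and_labels if pos and s >= t) / n_pos
        tnr = sum(1 for s, pos in scores_and_labels if not pos and s < t) / n_neg
        if abs(tpr - tnr) < best_gap:
            best_t, best_gap = t, abs(tpr - tnr)
    return best_t
```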
Fig. 5. Number of users with symmetric classification accuracy in various ranges, for feature subsets {1, 2}, {1, 2, 7, 8} and {1, 2, 3, 4, 5, 6}
The first experiment was conducted using only two features: times played and percentage played. The classification accuracy for these two features was on average 0.70 with standard deviation 0.19, a promising result considering the use of only two features. For six users, the accuracy is particularly low, below 0.5. Observing the behavior of these users, it appears that they interacted with the system for only one or two days and gave most ratings after listening only very briefly to the songs. Perhaps these users only wanted to test the system and the recommender or to get an idea of what songs were available. This could explain why the classification accuracy is so low. If we remove these outliers, the average classification accuracy with merely two features increases to 0.76 with standard deviation 0.10 for the remaining 52 users. It is reasonable to consider these six users as outliers because the system, based on a very low accuracy, could decide not to apply implicit learning to these users.

Fig. 5 shows a histogram of the number of users having symmetric classification accuracy in various ranges using three heuristically chosen subsets of behavioral features, namely subsets {1, 2}, {1, 2, 7, 8} and {1, 2, 3, 4, 5, 6}. These subsets were chosen manually as a first investigation into the dependency of the classification accuracy on the choice of feature subset. See Table 1 for an overview of the behavioral features. For all three feature subsets the average accuracy is around 0.76 with standard deviation 0.1. There are a few users for which the accuracy is very low and some users for which the accuracy is very high, independently of the feature subset used. The third feature subset, however, has the fewest users (5%) with accuracy lower than 50%.

5.1 Optimization

The results discussed in the previous section indicate that different feature sets provide diverse classification accuracy results. For some users, one feature set
Fig. 6. Classification accuracy of the Relief feature subsets compared to the best feature subsets found and to subset {1, 2, 3, 4, 5, 6}, for 15 users; labels (e.g. "4f") indicate the number of features selected by Relief
provides better results; for other users, another feature set gives higher classification accuracy. In order to optimize the behavior-based recommender for each user, we have applied the Relief feature selection algorithm [10]. This algorithm estimates the importance of features based on how well their values distinguish between instances that are near to each other, in the same and in different classes.

Figure 6 shows the classification accuracy for a set of 15 randomly chosen users when using the features with the highest weights calculated by the Relief algorithm, compared to the best manually found feature subset for that user and to subset {1, 2, 3, 4, 5, 6}. Note that an exhaustive search among all possible feature subsets for each user would be impractical for 19 features. The figure also reports the number of features selected using the Relief algorithm. For most users, the accuracy achieved with the Relief algorithm does not decrease significantly compared to using the best feature subset found, while, in most cases, the number of selected features is remarkably low.

Using, for each user, the subset of features selected by the Relief algorithm provides an average symmetric classification accuracy of 0.75 with standard deviation 0.1. This is a remarkable result, indicating that the music preferences of most users can be modeled using a few automatically chosen behavioral features with substantially the same level of accuracy achieved using all features. In comparison to the heuristically chosen feature set {1, 2, 3, 4, 5, 6}, we can conclude that for most users Relief succeeds in reducing the number of features required without a significant decrease in accuracy. In a few cases, it achieves a better result than the heuristic choice, which was fixed for all users.
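In its original two-class form, Relief [10] weights each feature by how much it separates randomly sampled instances from their nearest neighbor of the other class ("nearest miss") relative to their nearest neighbor of the same class ("nearest hit"). The following is a minimal sketch for numeric, comparably scaled feature vectors; these simplifications are ours, not necessarily the variant used in the paper:

```python
import random

def relief(X, y, n_samples=100, seed=0):
    """Basic two-class Relief (Kira & Rendell [10]) for numeric feature
    vectors X with binary labels y. Returns one weight per feature; higher
    means more relevant. Assumes each class has at least two instances and
    that features are pre-scaled to comparable ranges."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features

    def dist(a, b):  # squared Euclidean distance for neighbor search
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    for _ in range(n_samples):
        i = rng.randrange(len(X))
        hit = min((j for j in range(len(X)) if j != i and y[j] == y[i]),
                  key=lambda j: dist(X[i], X[j]))   # nearest same-class
        miss = min((j for j in range(len(X)) if y[j] != y[i]),
                   key=lambda j: dist(X[i], X[j]))  # nearest other-class
        for f in range(n_features):
            # reward features that differ at the miss,
            # penalize those that differ at the hit
            w[f] += abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])
    return w
```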
6 Conclusions and Future Work

In this paper we have investigated the use of implicit learning for personalization in a music listening context, as an extension to explicit learning. To this end, we have
proposed an architecture in which the interaction of the user with user-rated items is used to create an implicit, user-tailored profile to rate items automatically. We have proposed a set of behavioral features based on the interaction of the user with a web-based music player, and we have investigated the performance of a two-class naive Bayesian classifier with various feature subset selection methods.

Although our work is still ongoing, from our results we can conclude that the proposed architecture is feasible, in that the behavior-based recommender provides encouraging results in terms of modeling explicit ratings through behavioral features. The feature subset selection experiments indicate that the best subsets are typically user-dependent, which justifies the use of an implicit learning architecture that is more complicated than one using fixed, user-independent behavioral features. However, for some users, implicit learning seems to work significantly worse than for others.

For future work, improved, efficient feature subset selection methods may be devised to optimize the recommender. Further research is also needed on the precise embedding of the behavior-based recommender into the integrated recommender and on comparing its recommendation performance with and without the implicit rating history. An important issue here is how and when to decide to add an implicitly rated song to the implicit rating history. An alternative to adding an implicitly rated song, for example when the behavior-based recommender cannot decide, is to solicit an explicit rating from the user for this song. The possibility of using active learning [11] could also be investigated. Furthermore, initial results not presented in this paper indicate that the use of behavioral data can compete with music metadata, which warrants further research. From an architectural point of view, a comparison should be made between a recommender based solely on explicit ratings and an integrated recommender based on both explicit ratings and implicit ratings derived from behavioral data. Another topic requiring future work is how to distinguish different types of listening sessions and how to apply context-aware learning. A further challenge is the gathering of a large body of detailed user interaction data, preferably using a real-life product or service over a long period of time. We are currently in the process of collecting such a data set in the broadcast video domain.

Acknowledgments. We thank Marco Tiemann for giving us access to the xStream database, Tsvetomira Tsoneva and Igor Berezhnoy for reviewing this manuscript, and all the xStream users for providing their behavioral data.
References

1. MovieLens Home Page, http://www.movielens.org/
2. TiVo Home Page, http://www.tivo.com/
3. Watchmi Home Page, http://www.watchmi.tv/
4. Oh, J., Lee, S., Lee, E.: A user modeling using implicit feedback for effective recommender system. In: ICHIT '08: Proceedings of the 2008 International Conference on Convergence and Hybrid Information Technology, pp. 155–158. IEEE Computer Society, Washington (2008)
5. Park, Y.-J., Chang, K.-N.: Individual and group behavior-based customer profile model for personalized product recommendation. Expert Syst. Appl. 36(2), 1932–1939 (2009)
6. Qiu, F., Cho, J.: Automatic identification of user interest for personalized search. In: WWW '06: Proceedings of the 15th International Conference on World Wide Web, pp. 727–736. ACM, New York (2006)
7. Gadanho, S.: TV recommendations based on implicit information. In: ECAI '06 Workshop on Recommender Systems (2006)
8. Pronk, V., Verhaegh, W., Proidl, A., Tiemann, M.: Incorporating user control into recommender systems based on naive Bayesian classification. In: RecSys '07: Proceedings of the 2007 ACM Conference on Recommender Systems, pp. 73–80. ACM, Minneapolis (2007)
9. Tiemann, M., Pauws, S., Vignoli, F.: A modular hybrid recommender system. In: BNAIC 2009: Proceedings of the 21st Conference on Artificial Intelligence, Eindhoven, The Netherlands (2009)
10. Kira, K., Rendell, L.A.: The feature selection problem: traditional methods and a new algorithm. In: AAAI-92, pp. 129–134. AAAI Press/MIT Press, Cambridge (1992)
11. Hanneke, S.: Theoretical Foundations of Active Learning. Ph.D. thesis, Carnegie Mellon University (2009)