A Context-Aware Movie Preference Model Using a Bayesian Network for Recommendation and Promotion

Chihiro Ono, Mori Kurokawa, Yoichi Motomura, and Hideki Asoh

KDDI R&D Labs, Inc., Keio University, AIST

Abstract. This paper proposes a novel approach for constructing users’ movie preference models using Bayesian networks. The advantages of the constructed preference models are 1) consideration of users’ context in addition to users’ personality, and 2) applicability to multiple applications, such as recommendation and promotion. The data acquisition process, based on a WWW questionnaire survey, and the Bayesian network model construction process using the acquired data are described. The effectiveness of the constructed model in terms of recommendation and promotion is also demonstrated through experiments.

1 Introduction

Modeling user preferences is a key technology in various personalized applications, such as recommendation, intelligent interface, and one-to-one marketing. In this paper, two major issues for constructing preference models are investigated.

The first issue is context-awareness. As Internet access via cellular phone becomes more common, the contexts in which users access a service (e.g. in town, at home) are diversifying, as are the services and items themselves. User preferences may also change, depending not only on the user's personality but also on the context, such as mood, location, accompanying person, and so forth. Therefore, a user preference model is required that takes account of both the users’ personality and the context for various personalized services, such as recommending appropriate items for each user in different situations. Several approaches for constructing a preference model for recommendation have been developed in research and business fields [1, 17, 20], of which two effective examples are collaborative filtering [3, 7, 18, 19] and the content-based method [13]. Several approaches for integrating both methods have also been investigated in order to combine the merits of each [4, 12, 15]. However, existing approaches cannot handle both users’ personality and the situation at the same time.

The second issue is multiple applicability. For example, in addition to recommendation, a preference model can also be useful for promoting items. Currently, the preference models for recommendation and promotion are constructed independently, which hinders efforts to share the data collected by each application. In a preference model that could be used for two or more applications, all collected information, including users’ feedback data, could be used to construct and improve the model.

C. Conati, K. McCoy, and G. Paliouras (Eds.): UM 2007, LNAI 4511, pp. 257–267, 2007.
© Springer-Verlag Berlin Heidelberg 2007


In order to solve both issues, we propose a novel way of constructing context-aware and multi-applicable preference models using Bayesian networks. Bayesian networks [9, 16] provide a powerful and flexible method for modeling a complex joint probability distribution of multiple random variables and are applied to various tasks such as printer failure diagnosis, traffic jam prediction, and modeling chemical reactions in body cells [8, 10]. The high flexibility of Bayesian networks is appropriate for representing complex relations between users’ preferences and contexts. One Bayesian network model can be used for multiple applications that may use different variables as dependent variables, such as recommendation and promotion. Unlike a conventional data analysis model, such as the regression model, Bayesian networks do not specify the direction of inference in advance. Any random variables in the network can be inputs, while any other variables can be the target of prediction by calculating the conditional probabilities.

One of the most difficult problems in using a Bayesian network is the model construction. Although various methods for model construction have been proposed, many are theoretical or only applicable to small-scale problems, and there remains no established standard for constructing a practically usable Bayesian network model with many variables. In this paper, we propose a novel model construction process and construct a Bayesian network using the data acquired through an original large-scale WWW questionnaire survey. We also show the effectiveness of the model through experiments.

The rest of this paper is organized as follows. Section 2 formulates our movie recommendation and promotion task, Section 3 describes our approach to model construction, and Section 4 explains the data collection process. Section 5 describes the construction process of the Bayesian network model. Section 6 evaluates the model, and Section 7 concludes the paper.

2 Preference Model for Movie Recommendation and Promotion

In this paper, we chose movie recommendation and promotion as target applications and constructed a context-aware movie preference model applicable to both. In this section, we provide an overview of how a Bayesian network preference model can be used for both recommendation and promotion.

Bayesian networks can be used to model the joint probability distribution of multiple random variables. A random variable is represented as a node of the network, and the links of the network represent dependencies between variables. Conditional independences between variables are represented by the entire structure of the network and used for a more efficient probabilistic inference. With the Bayesian network, we formulate a movie preference model in the form of a joint probability distribution P(U, C, S, V). Here, U represents a set of users’ profile variables, such as age, sex, etc., S represents a set of user situation/context variables such as location, mood, etc., C represents a set of movie attributes, such as genre, director, etc., and V denotes a user’s ratings for given movies within a given context.

In the case of movie recommendation, the problem is finding movies that a given user is likely to rate highly. For this purpose, we calculate the conditional probability P(V | u, s, c) for the target user U=u, specific context S=s, and the candidate movie C=c and then recommend movies in order of probability. Alternatively, we may calculate the conditional probability P(C | u, s, v) for the target user U=u, context S=s and rating V=positive to find movies that are highly likely to obtain a positive rating.

Figure 1 shows the flow of the recommender system we are developing (the recommended movies in the figure were blurred to protect copyright). Firstly, a user sends a request for recommendation with his situational data (accompanying person, location, and mood). Subsequently, the recommender system merges the registered user attributes with the input user situational attributes and calculates the probability of the user rating for each candidate movie using the Bayesian network inference engine. It then composes a recommendation list of movies, ordered by the probability of positive ratings. The recommender system may receive user feedback, and it periodically updates the parameters of the movie preference Bayesian network model with the feedback data, using the Bayesian inference engine, in order to increase the precision of the recommendation.

[Figure 1: flow diagram. Steps: 1) Input user situation (e.g. “I want to be relaxed with a girlfriend at the theater”, “Thirty Year Old Male”); 2) Inference Request; 3) Recommend movies and its reason; 4) User Feedback; 5) Update Parameters. Components: Recommender System (calculates P(v | u,c,s) or P(c | u,s,v)), User Database, Movie Database, Rating Database.]

Fig. 1. Flow of the Movie Recommender System
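The two recommendation queries described above can be illustrated with a minimal sketch. It assumes a tiny, made-up joint table P(U, C, S, V) stored explicitly rather than a factored Bayesian network, and every domain value, weight, and function name (`make_joint`, `conditional`, `recommend`) is hypothetical and not taken from the paper's model or data. The point is only that one joint distribution answers both P(V | u, s, c) and P(C | u, s, v).

```python
from itertools import product

# Hypothetical toy domains; the paper's real variables are far richer.
DOMAINS = {
    "U": ["young", "senior"],
    "C": ["comedy", "horror"],
    "S": ["alone", "with_partner"],
    "V": ["positive", "negative"],
}
ORDER = ["U", "C", "S", "V"]


def make_joint():
    """Build an arbitrary normalized joint table P(U, C, S, V)."""
    weights = {}
    for u, c, s, v in product(*(DOMAINS[name] for name in ORDER)):
        w = 1.0
        # Made-up preferences: comedies please couples, horror pleases the young.
        if v == "positive" and c == "comedy" and s == "with_partner":
            w += 2.0
        if v == "positive" and c == "horror" and u == "young":
            w += 1.0
        weights[(u, c, s, v)] = w
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


def conditional(joint, target, evidence):
    """Return P(target | evidence) by summing and renormalizing the joint."""
    index = {name: i for i, name in enumerate(ORDER)}
    scores = {value: 0.0 for value in DOMAINS[target]}
    for assignment, p in joint.items():
        if all(assignment[index[n]] == val for n, val in evidence.items()):
            scores[assignment[index[target]]] += p
    z = sum(scores.values())
    return {value: s / z for value, s in scores.items()}


def recommend(joint, user, situation):
    """Rank candidate movies by P(V = positive | u, s, c)."""
    def p_positive(movie):
        evidence = {"U": user, "S": situation, "C": movie}
        return conditional(joint, "V", evidence)["positive"]
    return sorted(DOMAINS["C"], key=p_positive, reverse=True)


JOINT = make_joint()
ranking = recommend(JOINT, "young", "with_partner")
# The alternative query P(C | u, s, V = positive) uses the same table.
movie_posterior = conditional(
    JOINT, "C", {"U": "young", "S": "with_partner", "V": "positive"}
)
```

In a network of realistic size the explicit table would be replaced by an inference engine over the factored model; the enumeration here is only meant to make the direction-free nature of the queries concrete.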

Although the preference model can be used in many ways, here we explain two typical ways of using it for movie promotion. The first involves finding user segments that may like the target movies to be promoted. For this purpose, we calculate the conditional probability P(U | c, s, v) for the target movies C=c, target context S=s, and V=positive, whereupon promotional information concerning the target movie is sent to the user segments with high probability.

The second way involves finding solicitation points of the target movie for each user segment. These solicitation points are attractive aspects of the movie to which the target user is likely to react. In this case, predicting the impression a user has when satisfied with a certain movie in a certain situation could be useful. Our preference model includes impression variables (I), such as feeling relaxed or laughter, as unobservable hidden variables that can be a reason to rate movies. We calculate the conditional probability P(I | u, c, s, v) for the target movies C=c, S=s, and U=u. Promotional information concerning the target movie, including personalized solicitation points to which users are highly likely to react, is sent to each target user segment.

Figure 2 shows the flow of a promotional assistance system that we are developing. Firstly, an operator sends a request to the system to find candidate target user segments with information concerning a target movie (e.g. comedy, love romance, etc.). Then, after receiving the candidate user segments and choosing the target user segments (e.g. young female, etc.), the operator sends a request to the system to find appropriate solicitation points for each target user segment (e.g. they will be satisfied because they feel relaxed by the movie). Finally, the operator sends personalized promotion information to target user segments (e.g. “This movie makes you feel relaxed!”).

[Figure 2: flow diagram. Actors: Operator, Users. Operator to system: Request for Target Users, Request for Solicitation Points. System to operator: Candidate Users, (Users, Solicitation Points). Operator to users: Promotion Action. Components: Promotion Assistance System (calculates P(u|c,s,v) and P(I|u,c,s,v); Update Parameters), Collect Reaction, Reaction Database.]

Fig. 2. Flow of the Promotion Assistance System

Here, since a recommendation system and promotion system can use the same movie preference Bayesian network model, feedback and reaction data from users can be commonly used to update the parameters of the model and thus increase the precision of both the recommendation and promotion.
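As a rough illustration of the two promotion queries, the sketch below computes P(U | c, s, V=positive) and P(I | u, c, s, V=positive) for a single fixed target movie and context. It assumes a simplified chain in which the rating depends on the user only through one impression variable; all segment names, impression names, probabilities, and function names are invented for the example and are not values from the paper.

```python
# Hypothetical tables for ONE fixed target movie c and context s.
# Prior over user segments, P(U).
P_U = {"young_female": 0.5, "senior_male": 0.5}

# P(I | u, c, s): which impression the movie leaves on each segment.
P_I_GIVEN_U = {
    "young_female": {"relaxed": 0.7, "laughed": 0.3},
    "senior_male": {"relaxed": 0.4, "laughed": 0.6},
}

# P(V = positive | I): how strongly each impression leads to satisfaction.
P_POS_GIVEN_I = {"relaxed": 0.8, "laughed": 0.5}


def normalize(scores):
    """Scale non-negative scores so that they sum to one."""
    z = sum(scores.values())
    return {key: value / z for key, value in scores.items()}


def segment_posterior():
    """P(U | c, s, V = positive): sum out the hidden impression I."""
    scores = {}
    for u, prior in P_U.items():
        likelihood = sum(
            P_I_GIVEN_U[u][i] * P_POS_GIVEN_I[i] for i in P_POS_GIVEN_I
        )
        scores[u] = prior * likelihood
    return normalize(scores)


def solicitation_points(u):
    """P(I | u, c, s, V = positive): impressions that explain satisfaction."""
    scores = {i: P_I_GIVEN_U[u][i] * P_POS_GIVEN_I[i] for i in P_POS_GIVEN_I}
    return normalize(scores)


segments = segment_posterior()
target = max(segments, key=segments.get)
points = solicitation_points(target)
best_point = max(points, key=points.get)
```

Under these made-up numbers the segment with the highest posterior would receive the promotion, and the most probable impression for that segment would be used as its solicitation point.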

3 Model Construction Strategy

A Bayesian network can be specified with the structure of the network and the conditional probability tables (CPT) attached to each node in the network. There are three major approaches to constructing a network. The first involves specifying both the network structure and the conditional probabilities manually, based on expert domain knowledge, while the second involves estimating both the network structure and the probabilities from the data automatically. There are two ways to follow the second approach. The first is to select variables and a structure from candidates using information criteria such as AIC (Akaike information criterion) or MDL. A typical example is the K2 algorithm proposed by Cooper et al. [6]. The second way is to decide on the existence of links between variables using a statistical test of independence [5]. However, these approaches can handle only small networks. The third approach involves specifying the (rough) network structure manually and estimating the conditional probabilities from the data. This approach is generally used when both learning data and domain knowledge can be obtained. It is also the most practical and has been applied to some real-world problems [2, 11]. However, the model construction processes used in those examples include various heuristics, and no standard process has been established.

Due to the considerable number of random variables in our model, we take the third approach. We initially assume a rough network structure used to predict the user’s ratings for movies, which is shown in Fig. 3. This structure means that the overall rating depends on the common variables representing the user’s impressions of a movie (feeling excited, feeling scared, feeling sad, feeling relaxed, etc.), and the impressions are based on user attributes, situational attributes, and movie attributes. For the model used in the following experiments, we reverse the direction of links in order to simplify the CPT.

[Figure 3: network diagram. Node groups: User Attributes (Age, Frequency, Gender), Situation Attributes (Who, Location, Mood), Movie Attributes (Country, Genre, Year), Impressions (Cried, Laughed, Relaxed), Rating (Satisfactory).]

Fig. 3. Model Structure

4 Data Acquisition

The data acquisition procedure was composed of two parts: a small-scale intensive interview and a large-scale questionnaire survey. Firstly, we interviewed a small number of subjects about their movie preferences. In the interview, several movies were presented to a subject and the subject was then asked to classify them into favorite and hated movie groups respectively. The subjects were then asked about their favorite movies and their reasons for rating movies favorably. Based on feedback from the interviews, we designed questions for a large-scale questionnaire survey. The questionnaire survey was conducted in March 2006.

1. Number of subjects: 2153
2. Number of movies: 197
3. Rating condition: Each subject rated 5 to 10 movies that were randomly selected from those he/she had watched.
4. Inquiries:
− User demographic and lifestyle attributes: 30 attributes such as age, gender, occupation, brand loyalty, and time and expenditure on leisure.
− User attributes regarding movie appreciation: 32 attributes such as important factors for selecting movies, and genre preference. (7-grade scale for each attribute)
− (For each movie) Situation of watching movie: 43 attributes such as accompanying person, and mood.
− (For each movie) Impression of the movie: 358 attributes such as cried, and laughed. (7-grade scale for each attribute)
− (For each movie) Rating of the movie: 1 attribute (7-grade scale from satisfied very much to not at all satisfied)

In this questionnaire, the number of attributes, such as user situation (43) and user impression (358), is larger than in other datasets such as MovieLens. As for movie attributes, we prepared 26 attributes, such as genre, length, country, and keywords extracted from introductory texts for representing movies.

We divided the data into three parts. First, we extracted the rating data for 3 movies as test data for evaluating movie promotion. Next, we extracted one rating from each user's ratings as test data for evaluating recommendation. The remainder was used for model construction. In the data used for model construction, each user rated an average of 3.26 movies, which is rather sparse.

5 Model Construction Step

Starting from the assumed general structure of the network shown in Fig. 3, we selected effective variables from many observed attributes, determined local network structures that reflect the relationship between selected variables, and estimated the CPTs via a standard maximum likelihood estimation [9], using the questionnaire data. The whole procedure is as follows:

1. Data preprocessing:
− extract keywords from movie introductory texts.
− categorize the extracted keywords into four main categories such as “describing scene,” “describing mood,” etc.
− further categorize the keywords in each category into three to five sub-categories.

2. Extracting pseudo movie attributes: We introduced pseudo movie attributes to enhance the movie attributes. These pseudo attributes are impressions that are not user specific and can thus be seen as attributes of a movie. The pseudo movie attributes were selected from impression attributes whose scores differ little across users. Specifically, we calculated: score = I(r, CID) / H(r | CID). Here, I(r, CID) is the mutual information between the attribute value r and the content-ID, and H(r | CID) is the conditional entropy of r conditioned on the content-ID. We selected attributes with high scores as pseudo movie attributes because the stronger the relation to the content-ID and the smaller the difference across users, the more the attribute represents the movie characteristics.

3. We grouped all variables into five categories: “user attributes (U),” “user situations (S),” “movie attributes (including pseudo movie attributes) (C),” “impression attributes (excluding pseudo movie attributes) (I),” and “total rating (V).”

4. Clustering attributes in each group: For each group, we calculated the mutual dependency between each pair of attributes and clustered the attributes based on the dependency score using Ward’s clustering method [21]. Subsequently, we extracted a representative attribute that stands for each cluster. The number of clusters (= the number of representative attributes) in each group is shown in Table 1.

Table 1. The number of Clusters for each group

Groups in questionnaire               Number of attributes   Groups after preprocessing            Number of clusters
Demographic and lifestyle attributes  30                     Demographic and lifestyle attributes  10
Movie appreciation                    32                     Movie appreciation                    10
Situation                             43                     Situation                             7
Impression                            358                    Impression                            24
Movie attributes                      26                     Movie attributes                      23

5. Search structures: In order to find the network structure between groups, we searched the parent-child relations between the variables in each group. We then generated candidate structures and selected the one that best fits the data using the AIC. For these processes, BAYONET [14], a tool for constructing a Bayesian network that we developed, was used. When generating the candidate structures, the maximum number of parents was set to 4. For comparison, we constructed 4 types of model, as shown in Table 2. The first model includes variables in all groups (V, I, U, C, S), while the second model does not include S. The third model includes S but does not include I, and the fourth model includes neither S nor I.

Table 2. The number of nodes and links in the constructed networks

Model     Number of nodes   Number of links
UCS-I-V   75                115
UC-I-V    68                107
UCS-V     50                49
UC-V      44                43
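The pseudo-attribute score from step 2 of the procedure, score = I(r, CID) / H(r | CID), can be sketched as follows. This is a minimal illustration that assumes the responses for one impression attribute are available as (content-ID, value) pairs and uses plug-in (empirical frequency) estimates; the function names and the handling of a zero denominator are choices made for this example, not details given in the paper.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of an empirical count vector."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def pseudo_attribute_score(samples):
    """Score one impression attribute from (content_id, value) pairs.

    Uses I(r, CID) = H(r) - H(r | CID) and returns I(r, CID) / H(r | CID).
    """
    n = len(samples)
    h_r = entropy(list(Counter(value for _, value in samples).values()))

    # Group the observed values by movie to get H(r | CID).
    by_movie = {}
    for content_id, value in samples:
        by_movie.setdefault(content_id, Counter())[value] += 1
    h_r_given_cid = sum(
        (sum(counts.values()) / n) * entropy(list(counts.values()))
        for counts in by_movie.values()
    )

    mutual_information = h_r - h_r_given_cid
    if h_r_given_cid == 0:
        # Assumption for this sketch: users never disagree within a movie,
        # so the ratio is treated as unbounded (or zero if r is constant).
        return math.inf if mutual_information > 0 else 0.0
    return mutual_information / h_r_given_cid


# Every user agrees within each movie: strongly movie-specific attribute.
consistent = [("m1", "high")] * 4 + [("m2", "low")] * 4
# Values are split evenly within each movie: purely user-specific attribute.
mixed = [("m1", "high"), ("m1", "low"), ("m2", "high"), ("m2", "low")]
# Mostly, but not fully, determined by the movie.
partial = [("m1", "high")] * 3 + [("m1", "low")] + [("m2", "low")] * 3 + [("m2", "high")]

scores = {
    "consistent": pseudo_attribute_score(consistent),
    "mixed": pseudo_attribute_score(mixed),
    "partial": pseudo_attribute_score(partial),
}
```

Attributes whose values are largely fixed by the movie score high and would be candidates for pseudo movie attributes, while attributes that vary mainly between users score near zero.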

6 Evaluation of Models

We evaluated the effectiveness of our movie preference models in terms of the accuracy of rating prediction. Four kinds of constructed models were compared from the viewpoint of the recommendation and promotion performances. In the following evaluations, as described in Section 4, data that was not used in the model construction was used.

6.1 Evaluation of Predicting User Rating

Firstly, the above four types of models were evaluated from the viewpoint of predicting user rating V. As a measure of accuracy, we used the mean absolute error (MAE) of the prediction. When the total number of predicted ratings is N, the number of values of the rating is r, the correct rating value of User i for Movie j in Context k is p_{ijk}, and v ranges over the possible rating values, so that the predicted rating is the expected value of V, the MAE can be formulated as:

\[
\mathrm{MAE} = \frac{1}{N} \sum_{i,j,k} \Bigl|\, p_{ijk} - \sum_{v} v \, P(V = v \mid U = i, C = j, S = k) \,\Bigr| .
\]
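The MAE above compares each true rating with the expected rating under the model's predictive distribution. A minimal sketch of that computation follows; the data layout (a list of true ratings paired with dictionaries mapping rating values to probabilities) and the function names are assumptions made for illustration.

```python
def expected_rating(distribution):
    """Expected value of V under a predictive distribution {v: P(V = v | ...)}."""
    return sum(value * prob for value, prob in distribution.items())


def mean_absolute_error(cases):
    """MAE over (true_rating, predictive_distribution) pairs."""
    errors = [abs(true - expected_rating(dist)) for true, dist in cases]
    return sum(errors) / len(errors)


# Two hypothetical test cases on a 7-grade scale.
cases = [
    (7, {6: 0.5, 7: 0.5}),  # expected rating 6.5, absolute error 0.5
    (3, {3: 1.0}),          # expected rating 3.0, absolute error 0.0
]
mae = mean_absolute_error(cases)
```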

We compared the four types of Bayesian network models, a baseline predictor, and the standard collaborative filtering (CF). As a baseline predictor, we used the predictor that outputs the average rating of each movie. For CF, we implemented both user-based and item-based collaborative filtering. Pearson correlation is used as the similarity in the CFs [3].

Table 3. Comparison of Prediction Accuracy

Model             MAE
Baseline          0.927
CF (user-based)   0.975
CF (item-based)   0.930
UC-V              0.887
UCS-V             0.869
UC-I-V            0.862
UCS-I-V           0.854
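To make the collaborative filtering baselines more concrete, the sketch below shows one common variant of the Pearson similarity between two users, computed over the movies both have rated. The exact variant used in the experiments is not specified beyond the citation, so the choices here (means taken over co-rated movies, no similarity when fewer than two movies are shared) are assumptions for illustration. With only a few ratings per user, many user pairs share too few movies for a similarity to be defined.

```python
import math


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over co-rated movies, or None if undefined.

    ratings_a and ratings_b map movie IDs to numeric ratings.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None  # too little overlap to correlate

    mean_a = sum(ratings_a[m] for m in common) / len(common)
    mean_b = sum(ratings_b[m] for m in common) / len(common)

    numerator = sum(
        (ratings_a[m] - mean_a) * (ratings_b[m] - mean_b) for m in common
    )
    norm_a = math.sqrt(sum((ratings_a[m] - mean_a) ** 2 for m in common))
    norm_b = math.sqrt(sum((ratings_b[m] - mean_b) ** 2 for m in common))
    if norm_a == 0 or norm_b == 0:
        return None  # one user gave identical ratings to all shared movies

    return numerator / (norm_a * norm_b)


# Hypothetical users on a 7-grade scale.
alice = {"m1": 7, "m2": 3, "m3": 5}
bob = {"m1": 6, "m2": 2, "m4": 4}
carol = {"m5": 4}

sim_alice_bob = pearson_similarity(alice, bob)
sim_alice_carol = pearson_similarity(alice, carol)
```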


Table 3 shows the evaluation results. In the experiment, the user-based CF can obtain predictions only for 4.1% (# = 72) of users, while the item-based CF can obtain predictions only for 12.6% (# = 221) of users. The MAE values for CFs are also relatively poor, because the ratings in our ratings database are rather sparse (only 3.26 movie ratings for each user on average). All BN models have a better score than the baseline and CFs. Models with an impression attributes layer (UC-I-V, UCS-I-V) have a better score than the Naïve Bayesian models (UC-V, UCS-V), while models with user situation (UCS-V, UCS-I-V) have a better score than those without (UC-V, UC-I-V). These results demonstrate that the situation attributes work well, and that introducing a hierarchical structure is effective.

6.2 Evaluation of Finding Target User Segments

We evaluated the performance of finding target user segments. We calculated P(U | C=Ci, S=s, V=positive) for 3 movies (C1, C2, C3) that were not used for the model construction, using the model that includes variables in all groups (UCS-I-V). The typical user segment that likes each movie can be characterized by the maximum a posteriori (MAP) values of the user attributes U. Subsequently, we selected users in the typical user segment. As there are many user attributes, some must be selected to obtain a useful user segment. Here, we highlighted three user attributes, namely “gender”, “marriage status”, and “focus on popularity”, because of their strong correlation with overall rating, and selected users with the MAP values of the three attributes. Finally, we compared the ratio of positive movie ratings for the selected users with that for all users. As shown in Table 4, the ratio of positive ratings for selected users was higher than the equivalent figure for all users, and the difference is significant at the 5% level for C1, which demonstrates the effectiveness of the user segmentation based on our preference model.
There may be various ways of finding effective target user segments using our user model and exploring better ways remains as a future issue.

Table 4. Comparison of ratio of positive rating among user segments

Movie ID   Target      Number of Users   Ratio of Positive Rating
C1         Whole       1281              36.20%
C1         Segmented   309               42.72%
C2         Whole       1068              32.68%
C2         Segmented   270               36.37%

7 Discussion and Conclusion This paper proposes a novel approach for constructing a context-aware movie preference model using a Bayesian network and its effectiveness, in terms of recommendation and promotion, is demonstrated through experiments. Although the improvement in prediction accuracy was relatively slight, the results demonstrate that introducing situation attributes and a hierarchical model structure are both effective and promising.


One reason for the modest performance improvement was the sparsity of the data used in the model construction, which compounds the difficulty of the prediction problem. We are now planning a field test of a movie recommender system and a movie promotion assistance system, both of which use the common constructed preference model. In field tests, we aim to conduct additional data acquisition and assess the improvement in the model performance. Meanwhile, other aspects, such as subjective impressions of the recommendation results and the usability of the promotion assistance system, should also be evaluated.

Acknowledgements. The authors wish to thank Dr. Shigeyuki Akiba, President and CEO of KDDI R&D Laboratories, Inc., for his continuous support for this study and Ms. Mayomi Haga for her support for data acquisition.

References

1. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. on Knowledge and Data Engineering 17(6), 734–749 (2005)
2. Binder, J., Koller, D., Russell, S., Kanazawa, K.: Adaptive probabilistic networks with hidden variables. Machine Learning 29, 213–244 (1997)
3. Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence, pp. 43–52 (1998)
4. Burke, R.: Hybrid recommender systems: survey and experiments. User Modeling and User-Adapted Interaction 12, 331–370 (2002)
5. de Campos, L.M.: Independency relationships and learning algorithms for singly connected networks. Journal of Experimental and Theoretical Artificial Intelligence 10, 511–549 (1998)
6. Cooper, G.F., Herskovits, E.: A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9, 309–347 (1992)
7. Herlocker, J., et al.: Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22(1), 5–53 (2004)
8. Horvitz, E.: Principles of mixed-initiative user interfaces. In: Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (1999)
9. Jensen, F.V.: Bayesian Networks and Decision Graphs. Springer, Heidelberg (2001)
10. Jensen, F.V., et al.: The SACSO methodology for troubleshooting complex systems. Artificial Intelligence for Engineering Design, Analysis and Manufacturing (AIEDAM) 15, 321–333 (2001)
11. Mani, S., McDermott, S., Valtorta, M.: MENTOR: A Bayesian model of prediction of mental retardation in newborns. Research in Developmental Disabilities, vol. 8(5) (1997)
12. Mobasher, B., Jin, X., Zhou, Y.: Semantically enhanced collaborative filtering on the Web. In: Berendt, B., Hotho, A., Mladenić, D., van Someren, M., Spiliopoulou, M., Stumme, G. (eds.) EWMF 2003. LNCS (LNAI), vol. 3209, Springer, Heidelberg (2004)
13. Mooney, R.J., Roy, L.: Content-based book recommending using learning for text categorization. In: Proceedings of the 5th ACM Conference on Digital Libraries, pp. 195–204 (2000)
14. Motomura, Y.: Bayesian network construction system: BAYONET. In: Proceedings of Tutorial on Bayesian Networks, pp. 54–58 (In Japanese) (2001)
15. Ono, C., Motomura, Y., Asoh, H.: A study of probabilistic models for integrating collaborative and content-based recommendation. Working Notes of IJCAI-05 Workshop on Advances in Preference Handling (2005)
16. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Francisco (1988)
17. Resnick, P., Varian, H.R.: Recommender systems. Communications of the ACM 40(3), 56–58 (1997)
18. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of ACM Conference on Computer Supported Cooperative Work, pp. 175–186. ACM Press, New York (1994)
19. Shardanand, U., Maes, P.: Social information filtering: algorithms for automating word of mouth. In: Proceedings of CHI’95 Mosaic of Creativity, pp. 210–217 (1995)
20. Zukerman, I., Albrecht, D.W.: Predictive statistical models for user modeling. User Modeling and User-Adapted Interaction 11(1-2), 5–18 (2001)
21. Ward, J.H.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244 (1963)
