Modelling User Preferences from Implicit Preference Indicators via Compensational Aggregations
Ladislav Peska and Peter Vojtas
Faculty of Mathematics and Physics, Charles University in Prague
Malostranske namesti 25, Prague, Czech Republic
peska|
[email protected]
Abstract. In our work, we focus on recommending for small or medium-sized e-commerce portals. Due to high competition, users of these portals lack loyalty and, e.g., refuse to register or to provide any or enough explicit feedback. Furthermore, products such as tours, cars or furniture have a very low average consumption rate, which prevents us from tracking an unregistered user between two consecutive purchases. Recommending on such domains proves to be a very challenging, yet interesting research task. We introduce a new method for learning user preferences based on their implicit feedback. The method is based on aggregating various types of implicit feedback with parameterized fuzzy T-norms and S-norms. We have conducted several off-line experiments with real user data from a travel agency, confirming the competitiveness of our method; however, further optimization and on-line experiments should be conducted in future work.
Keywords: Recommender Systems, Implicit Feedback, Fuzzy T-norms and S-norms, User Preference, E-Commerce.
1
Introduction and Related Work
We face the growth of information on the web together with an increasing offer of products, information and services. Automation of web content processing is necessary. Several solutions are available, ranging from search engines to e-shops, aggregation shops and recommender systems. The main problem we are interested in is the personalization of web information processing by user preference mining. Our main starting point is to use fuzzy (many-valued) logic and to interpret each fuzzy value as a degree of user preference. In the area of e-commerce, we are especially interested in recommending for small e-commerce sites without a dominant position on the market. Moreover, we are interested in domains where an average customer does not purchase an item very often (e.g. once a year). Competition among such sites is usually very high, so users tend not to be very loyal, visit several sites to compare offers and do not provide any data about themselves (e.g. by registering or rating products). We therefore need to deal with recommendation for a non-registered user based on very little information.
1.1
Related Work
For space reasons, we provide only a few references to the most closely related work in the areas of user preferences and fuzzy systems.
User preferences: In contrast to explicit feedback, the use of implicit feedback requires no additional effort from the user. Monitoring of implicit feedback ranges from simple binary user visits or play counts to more sophisticated tracking of scrolling, mouse movement, click streams etc. One of the first approaches dealing with implicit user feedback was that of Claypool et al. [1]. Their study involved multiple implicit preference indicators and contained the idea of combining them in order to achieve better results. In our earlier work [6], we conducted an online experiment corroborating that using multiple implicit indicators improves recommendation quality. The indicators proposed in [1] served as a starting point for our model of user feedback. Implicit feedback is often taken as positive only. One of the few papers aiming (as we do) to infer negative preference from implicit feedback is Lee and Brusilovsky [5]. Hu, Koren and Volinsky [3] raised the question of how to interpret a real-valued implicit indicator in their work on a TV recommender. Their paper introduces the idea of decomposing user preference into polarity and intensity, which we use in one part of our method.
Fuzzy Systems: The area of fuzzy systems is closely related to our work. Having multiple types of user feedback, which result in multiple preferences, we need an aggregation function to create a single value representing the user preference on the given object. Such an aggregation could be the weighted average, a fuzzy T-norm, an S-norm or a similar function. Zimmermann and Zysno [9] described the human decision making process and suggested a parameter for the level of compensation of aggregating functions. Yager [8] suggested using noble reinforced S-norms to cope with the same problem.
T-norms and S-norms generalize the usual two-valued logical conjunction and disjunction to fuzzy logics. Usually four axioms are used to define them. Besides commutativity and associativity, T-(co)norms satisfy monotonicity, $T(x, y) \le T(x, z)$ whenever $y \le z$ (and likewise for $S$), and a boundary condition: $T(x, 1) = x$ for T-norms and $S(x, 0) = x$ for S-norms. Several parametrized families of T-(co)norms have been introduced, in which the level of compensation is determined by a parameter λ. The Frank, Schweizer-Sklar, Sugeno-Weber, Yager and Hamacher T-(co)norms are used in this work and are fully described in [4]. As an example, the following formulas give the Sugeno-Weber T-norm and S-norm (also called T-conorm):
$$T_\lambda(x, y) = \max\left(0, \frac{x + y - 1 + \lambda x y}{1 + \lambda}\right), \qquad S_\lambda(x, y) = \min\left(1, x + y + \lambda x y\right)$$
1.2
Main contribution
The main contributions of this paper are:
• Proposing a method for learning user preference from multiple implicit indicators using families of fuzzy T-(co)norms.
• Comparing the usability of binary and many-valued feedback.
• Introducing negative user preference based on implicit feedback.
• Off-line experiments on a travel agency dataset.
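To make the parametrized T-(co)norm families from Section 1.1 more concrete, the following is a minimal Python sketch of the Sugeno-Weber pair. It assumes the commonly used form of the family with λ > -1; the function names and the sample values are illustrative only and are not taken from the paper's implementation.

```python
def sugeno_weber_t_norm(x, y, lam):
    """Sugeno-Weber T-norm on [0, 1], assuming lam > -1."""
    return max(0.0, (x + y - 1.0 + lam * x * y) / (1.0 + lam))


def sugeno_weber_s_norm(x, y, lam):
    """Sugeno-Weber S-norm (T-conorm) on [0, 1], assuming lam > -1."""
    return min(1.0, x + y + lam * x * y)


# Illustrative values only: two preference degrees and a few settings of lam.
for lam in (-0.5, 0.0, 1.0):
    t = sugeno_weber_t_norm(0.7, 0.6, lam)
    s = sugeno_weber_s_norm(0.3, 0.4, lam)
    print(f"lam={lam:+.1f}  T(0.7, 0.6)={t:.3f}  S(0.3, 0.4)={s:.3f}")
```

For λ = 0 the pair reduces to max(0, x + y - 1) and min(1, x + y), and for any admissible λ the boundary conditions T(x, 1) = x and S(x, 0) = x hold.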
2
User’s Implicit Preference Indicators in E-Commerce
We have collected usage data from one of the major Czech travel agencies. The data were collected from December 2012 to January 2014. A travel agency is a typical e-commerce enterprise where customers buy products only once in a while (most typically once a year). The site does not force users to register, so we can track unique users only with cookies stored in the browser. A user typically either lands straight on the intended object via a search engine (the less interesting case), or browses through several categories, compares objects (possibly on several websites) and eventually buys an object. Unlike the majority of research groups, we had access to the source code, so we could (after approval) tailor user feedback mining to suit our needs. Table 1 contains a full description of the implicit indicators used. Note that indicators are stored on a per user×object basis. The feedback dataset contains approx. 350 000 user×object records with 0.07% density of the user×object matrix and on average 1.6 visited objects per user (220 000 distinct users and 2300 objects). For the purpose of the experiment, the dataset was then restricted to users with at least one purchased and 4 visited objects, leaving over 3500 records from 364 users (the original dataset contains in total 2000 purchases and approx. 16000 users with 4 or more visited objects).
Table 1. Description of the considered implicit feedback indicators for a user visiting an object.
Factor | Description
F1: PageView | Count(OnLoad() events on object page)
F2: Mouse | Count(OnMouseOver() events on object page)
F3: Scroll | Count(OnScroll() events on object page)
F4: Open | Count(item was opened from the list of recommended objects)
F5: Time | Sum(time spent on object page, in seconds)
p_{u,o}: Purchase | 1 iff the user bought the item, 0 otherwise
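As a rough illustration of how the indicators of Table 1 could be accumulated per user×object pair, the sketch below folds a raw event log into one record per pair. The event tuple layout, the field names and the sample identifiers are assumptions made for this example; the paper does not describe the storage format. The last lines recompute the reported matrix density from the approximate dataset sizes given in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    """Implicit indicators for one user x object pair (cf. Table 1)."""
    page_view: int = 0   # F1: count of OnLoad() events
    mouse: int = 0       # F2: count of OnMouseOver() events
    scroll: int = 0      # F3: count of OnScroll() events
    open_rec: int = 0    # F4: times opened from the recommended list
    time: float = 0.0    # F5: seconds spent on the object page
    purchase: int = 0    # p_{u,o}: 1 iff the user bought the object


def aggregate(events):
    """Fold raw (user, object, kind, value) events into user x object records."""
    records = defaultdict(Feedback)
    for user, obj, kind, value in events:
        fb = records[(user, obj)]
        if kind == "purchase":
            fb.purchase = 1  # binary flag, not a counter
        else:
            setattr(fb, kind, getattr(fb, kind) + value)
    return dict(records)


# Hypothetical event log; identifiers and layout are illustrative only.
events = [
    ("u1", "tour42", "page_view", 1),
    ("u1", "tour42", "scroll", 3),
    ("u1", "tour42", "time", 37.5),
    ("u1", "tour42", "page_view", 1),
    ("u1", "tour7", "page_view", 1),
    ("u1", "tour42", "purchase", 1),
]
records = aggregate(events)
print(records[("u1", "tour42")])

# Density of the user x object matrix from the approximate figures in the text.
density = 350_000 / (220_000 * 2_300)
print(f"density = {100 * density:.2f}%")
```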
As our previous work corroborated [7], purely collaborative filtering methods yield poor results on such a sparse dataset, so content-based algorithms were used in the experiments. The travel agency domain allows us to define fairly reasonable nominal content-based features, e.g. type of the tour, destination country, and types of accommodation, transportation or meals, accompanied by the numeric price attribute.
3
Modeling User Preference from Implicit Indicators
In this section, we first describe the motivation and assumptions behind the preference model and then describe the model itself. First, we firmly believe that using only a binary preference (like vs. dislike) leads to a large loss of important and interesting information. After trying several other approaches, we tend to accept the preference model described e.g. by Hu et al. [3], where user preference is a two-dimensional feature with a binary preference p ∈ {0,1} (like, dislike) and a numerical intensity of the preference c. This model allows us to easily modify c with, e.g., the importance of an indicator, and to express negative preference easily as well. Second, there are several types of user behavior which can be recognized as indicators of preference; we will further call them implicit indicators. These indicators can be used directly, e.g. for recommendation, omitting the concept of user preference, but we would rather use them to define user preference, because afterwards any standard recommending algorithm can be used and the user preference itself can be processed for other purposes. Finally, we want to express user preference on each implicit indicator separately and then combine them. One benefit of such an approach is that we can define the usefulness of each indicator and use it for improved monitoring of user behavior.
3.1
Two-step Model of User Preferences
The prevailing approach in e-commerce systems without explicit feedback is to take a business-like point of view and state that the user positively prefers the object(s) which he/she has purchased. This user preference is a binary (0/1) function denoted $p_{u,o}$. The problem with $p_{u,o}$ is that purchase actions are very sparse. The vast majority of users did not purchase any object, so $p_{u,o}$ is useless for creating any personalized recommendations. However, we can use other indicators to express purchase behavior. This is done in two steps. In the first step, called local preference learning, every feedback indicator value is ordered using a fuzzy set $f_i : D_{F_i} \to [0,1]$, where $D_{F_i}$ is the domain of the indicator $F_i$. With these fuzzy sets, the original space of feedback indicators (also called the data cube) $\prod_{i=1}^{N} D_{F_i}$ is transformed into a preference cube $[0,1]^N$. In the second step, called global preference, the local preferences are aggregated into the overall user preference over the item using an aggregation function $@ : [0,1]^N \to [0,1]$. The resulting score $\overline{p}_{u,o}$ is the final product of the analysis of user behavior and is passed on to the content-based recommending algorithm.
Local Preferences: The basic idea behind local preferences is to project the user rating (purchasing behavior in the case of the current dataset) into the domain of each implicit indicator. All currently used implicit indicators are numerical, so we can use a regression method to model the relationship between the rating and the implicit indicator. There are other options, such as discretizing the indicator's domain, but these methods performed worse in our preliminary experiments. Linear, Quadratic and Peak [2] regressions were included in the experiments.
Global Preferences: There are several ways to construct global preferences. Probably the most common approach is to use a weighted average; however, the weighted average is not compensatory: a single bad value among the indicators can significantly decrease the resulting user preference.
This problem is known in the area of decision making and fuzzy systems, where S-norms are suggested. However, using solely S-norms may overestimate the resulting user preference, so some combination of a T-norm and an S-norm is needed. Once we obtain the local preferences, which lie in $[0,1]$, they are first transformed into our internal model of user preferences (polarity p, intensity c).
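The two steps described in this section can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes an ordinary least-squares linear fit (one of the three regression forms mentioned) clipped to [0, 1] for the local step, and it uses the product T-norm and the probabilistic-sum S-norm as simple stand-ins for the parametrized families when comparing aggregations. All function names and sample values are hypothetical.

```python
from functools import reduce


def fit_linear(xs, ys):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return mean_y - b * mean_x, b


def local_preference(model, x):
    """Map an indicator value to [0, 1] by clipping the regression output."""
    a, b = model
    return min(1.0, max(0.0, a + b * x))


def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def product_t_norm(values):
    # Stand-in T-norm (assumption): T(x, y) = x * y, folded over all values.
    return reduce(lambda x, y: x * y, values)


def probabilistic_sum(values):
    # Stand-in S-norm (assumption): S(x, y) = x + y - x * y, folded likewise.
    return reduce(lambda x, y: x + y - x * y, values)


# Step 1 (local): hypothetical time-on-page values and purchase flags.
times = [0, 10, 20, 30, 40]
purchased = [0, 0, 0, 1, 1]
model = fit_linear(times, purchased)
print([round(local_preference(model, t), 2) for t in (0, 30, 50)])

# Step 2 (global): hypothetical local preferences, one indicator at zero.
prefs = [0.9, 0.8, 0.0]
print(weighted_average(prefs, [1, 1, 1]))
print(product_t_norm(prefs))
print(probabilistic_sum(prefs))
```

With these sample values the weighted average is roughly 0.57, the T-norm collapses to 0 and the S-norm reaches roughly 0.98, which illustrates why neither a T-norm nor an S-norm alone is a satisfactory aggregation.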
%,2,3
?%,2,3
$;