Using Item Descriptors in Recommender Systems

Eliseo Reategui (1), John A. Campbell (2), Roberto Torres (3)

(1) Departamento de Informática, Centro de Ciências Exatas e Tecnologia, Universidade de Caxias do Sul, Rua Francisco Getúlio Vargas 1130, 95070-560 Caxias do Sul, RS, Brazil. [email protected]
(2) Department of Computer Science, University College London, London WC1E 6BT, UK. [email protected]
(3) Instituto de Informática, Universidade Federal do Rio Grande do Sul, 91501-970 Porto Alegre, RS, Brazil. [email protected]
Abstract

One of the earliest and most successful technologies used in recommender systems is collaborative filtering, a technique that predicts the preferences of one user based on the preferences of other, similar users. We present here a different approach that uses a simple learning algorithm to identify and store patterns about items, and a noisy-OR function to find recommendations. The technique represents knowledge in item descriptors, record-like structures that store knowledge on when to recommend each item. A recommender system keeps several item descriptors that compete when a recommendation is requested. Besides showing good performance, the item descriptors have the advantage of making it easy to understand and monitor the system's knowledge. This paper details the item descriptors as well as the way they are used to identify users' preferences. Preliminary results are presented, and directions for future work are indicated.
Introduction

Collaborative filtering is one of the most popular technologies in recommender systems (Herlocker et al., 2000). The technique has been used successfully in several research projects, such as Tapestry (Goldberg et al., 1992) and GroupLens (Sarwar et al., 1998), as well as in commercial websites, e.g. Amazon.com's Book Matcher and CDNow.com (Schafer et al., 1999). The algorithm behind collaborative filtering is based on the idea that the active user is more likely to prefer items that like-minded people prefer. To support this, a similarity score between the active user and every other user is calculated. Predictions are generated by selecting items rated by the users with the highest degrees of similarity.

Copyright © 2002, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
One of the interesting features of this technique is that, by exploiting information about other users' ranked items, one may receive recommendations of items whose content is dissimilar to that of items seen in the past. Although collaborative filtering has been used effectively in a wide range of applications, it has a scalability problem: as with other techniques that rely on actual cases or records to arrive at a solution to a problem (e.g. case-based reasoning), the more users there are in the database, the longer it may take to find similar users and items to recommend. The technique also tends to fail when little is known about a user (Mooney, 2000). Another drawback of collaborative filtering is that it is difficult to add business rules to it, or to manually modify the way the system recommends items. Content-based methods are another approach to recommender systems, where one tries to suggest items that are similar to those a given user has ranked positively in the past (Balabanovic and Shoham, 1997). This method has its roots in the field of information retrieval, and is often based on a search for certain terms and keywords (Popescul et al., 2001). One of the weaknesses of this approach is that it is susceptible to over-specialization, i.e. the more the system recommends items scoring highly against a user's profile, the more the user is restricted to seeing items similar to those already rated. Another disadvantage of this approach is that generally only a very shallow analysis of content can be supplied for the recommendations. Another scheme common in recommender systems is association rules (Lin et al., 2000; Mobasher et al., 2001), which can predict users' preferences based on general rules extracted from a database. In the domain of e-commerce, association rules are relationships between items that indicate a connection between the purchase of one item and the purchase of another item.
Although successfully applied to predicting customers' preferences, association rules are hard to modify while keeping the rule base consistent (e.g. adding new rules without contradicting existing ones). Keeping track of and trying to understand the large number of rules generated for each item is another difficulty of this approach. We present in this paper a different approach, which uses record-like structures called item descriptors to represent knowledge on how to make recommendations. The proposed method is similar to content-based methods in that the system keeps descriptors of items instead of storing information about users' preferences. However, rather than listing terms and keywords in the descriptors, users' features and item relationships are exploited. The proposed method shows good performance with respect to processing time and accuracy, and has an advantage over other techniques when it comes to understanding the knowledge used to recommend items and letting users modify it. This paper presents the item descriptors, detailing their knowledge-representation mechanism and showing how it is used to identify users' preferences. The first section below gives the general structure of the descriptors, the learning algorithm and the recommendation mechanism. Then, preliminary results are discussed, as well as conceptual advantages and drawbacks of the approach. The last section of the paper offers conclusions and directions for future work.
The item descriptors

An item descriptor represents knowledge about when to recommend a particular item by listing the characteristics that users likely to be interested in the item should have. These characteristics can be classified as:
• demographic: data describing an individual, such as age, gender, occupation or address;
• behavioral: data describing purchases or preferences of an individual.
It has been shown that both types of data are important when building a user profile (Krulwich, 1997) and inferring users' preferences (Buchner and Mulvenna, 1998; Claypool et al., 2001). Demographic material is represented here in attribute-value pairs, which we call features. For example, a university lecturer could have the demographic features "occupation = lecturer" and "gender = male". Behavioral material is represented by actions that show appreciation or dislike for facts or items. For example, the purchase of an item (or its simple selection for visualization) demonstrates the user's interest in the item chosen. While attributes used to define demographic features are typically single-valued, behavioral data is usually multi-valued. For instance, a person can belong to only one age group (demographic), but he/she may like both jazz and rock 'n' roll (behavioral). Nevertheless, both types of information are represented in our model in a similar way.
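As a minimal sketch of this uniform representation (the attribute and item names here are purely illustrative, not taken from the paper's implementation), both kinds of data can be flattened into a single set of "attribute=value" terms:

```python
# Sketch only: a hypothetical user record combining single-valued
# demographic features with multi-valued behavioral data.
user = {
    "demographic": {"occupation": "lecturer", "gender": "male"},  # one value per attribute
    "behavioral": {"purchased": {"jazz_cd", "rock_cd"}},          # possibly many values
}

def to_terms(user):
    """Flatten demographic and behavioral data into uniform 'attr=value' terms."""
    terms = {f"{attr}={val}" for attr, val in user["demographic"].items()}
    for action, values in user["behavioral"].items():
        terms |= {f"{action}={val}" for val in values}
    return terms

terms = to_terms(user)
```

Once flattened this way, the learning and recommendation mechanisms below can treat both kinds of information identically.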
Let us examine an example of an item descriptor in the domain of a virtual bookstore, where behavioral data and demographic features are considered relevant for the recommendation of a certain item (represented by descriptor dn).

Descriptor: dn    Class: Business

  Correlated terms    Class          Confidence
  ta                  Business       0.92
  te                  Marketing      0.87
  tc                  Demographic    0.85
  td                  Demographic    0.84
  tb                  Business       0.77

Figure 1: Item descriptor in the domain of a virtual bookstore
The descriptor has a target, i.e. an item that may be recommended in the presence of some of its correlated terms. Each term belongs to a certain class. For instance, term ta belongs to the behavioral class Business, representing, for example, the purchase of a business book. Term te belongs to the behavioral class Marketing, representing the purchase of an item of the corresponding category. Term tc belongs to the class Demographic, referring to a characteristic that is frequent in users who have rated the target item represented by dn. Each term's class and confidence (the strength with which the term is correlated with the target item) is displayed next to its identification, as illustrated in figure 1. A separate structure is used to keep the complete hierarchy of classes for both demographic and behavioral features. While behavioral features are learned over time, demographic data about users is read from a database and kept unchanged (unless more information about users becomes available). Figure 2 gives an example of such a hierarchy for the virtual bookstore.

Features
  Demographic
    Gender
    Occupation
    ...
  Behavioral
    Books
      Marketing
      Business
      ...
    Page-views
      Shopping cart
      Institutional
      ...

Figure 2: Hierarchy of demographic and behavioral classes
The hierarchy is useful to give a clearer idea of the existing classes and features, letting the user navigate through it at different levels of granularity. Furthermore, each class has its own set of attributes, such as:
• importance: how pertinent the features belonging to the class are in the search for appropriate recommendations;
• default value: a feature considered to be present in the absence of all features belonging to the class.
These attributes are inherited by the classes' subclasses, and consequently by the actual descriptors. They are a flexible mechanism to let the user manipulate the way the system recommends items. This is analogous to having an ontology and using it to retrieve the relative importance of relationships. In the work of Middleton et al. (2002), for instance, the relations of document authorship and project membership can be selected in order to identify communities based on publications and project work.
The Learning Process

Behavioral data and demographic features are represented in our model in the same fashion. Therefore, our learning algorithm treats them in the same way when determining the correlation between features and items. We use confidence as a correlation factor to determine how relevant a piece of information is to the recommendation of a given item. This is the same as computing the conditional probability P(dj | e), i.e. the probability that the item represented by descriptor dj is rated positively by a user given evidence e. The descriptors can therefore be learned through the analysis of actual users' records. For each item for which we want to define a recommendation strategy, a descriptor is created with the item defined as its target. Then, the confidence between the target and the other existing demographic features and behavioral data is computed. This process continues until all descriptors have been created.
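Under the assumption that each user's record can be reduced to the set of terms (rated items and demographic features) observed for that user, the confidence computation can be sketched as follows; the function and variable names are our own, not the paper's:

```python
from collections import defaultdict

def learn_descriptor(target, transactions):
    """Build an item descriptor for `target` from user records.

    `transactions` is a list of sets, one per user, containing the terms
    (behavioral items and demographic features) observed for that user.
    Confidence of a term t = P(target | t)
                           = count(users with t and target) / count(users with t).
    """
    term_count = defaultdict(int)    # users exhibiting each term
    joint_count = defaultdict(int)   # users exhibiting the term AND the target
    for terms in transactions:
        has_target = target in terms
        for t in terms:
            if t == target:
                continue
            term_count[t] += 1
            if has_target:
                joint_count[t] += 1
    # descriptor: term -> confidence P(target | term)
    return {t: joint_count[t] / term_count[t] for t in term_count}

# Hypothetical records: four users, target item "dn".
transactions = [{"ta", "te", "dn"}, {"ta", "dn"}, {"ta"}, {"te"}]
descriptor = learn_descriptor("dn", transactions)
```

Running this once per target item yields the full set of descriptors in a single pass over the user records, which is what makes the offline learning phase simple and fast.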
The Recommendation Process

The goal of the recommendation process is to find one or more items that match the user's preferences. Given a list of users U = {u1, u2, ..., um} and a list of descriptors D = {d1, d2, ..., dn}, the recommendation process starts with the gathering of information about a given user ui to whom we want to make recommendations. All demographic features and behavioral data gathered are used to fill in a user descriptor. Next, a competitive mechanism starts in which the system computes a similarity score for each descriptor dj by comparing it with the user descriptor and finding the list of terms T = {t1, t2, ..., tk} which match the rated items and demographic features of the user ui. The system computes a score for the descriptor that ranges from not similar (0) to very similar (1), according to the formula:

Score(dj) = 1 − Π p=1..k Noise(tp)

where Score(dj) is the final score of the descriptor dj, and Noise(tp) is the value of the noise parameter of term tp, a concept used in noisy-OR probability models (Pradhan et al., 1994) and computed as Noise(tp) = 1 − P(dj | tp).
That expression contains an assumption of independence of the various tp, which the designer of a practical system should be trying to achieve in the choice of terms. Ultimately the test of the assumption is in users' perception of the quality of a system's recommendations: if the perception is that the outputs are fully satisfactory, this is circumstantial evidence for the soundness of the underlying design choices. The situation here is the same as in numerical taxonomy (Sneath and Sokal, 1973), where distances between items in a multidimensional space of attributes are given by metric functions, and where the choice of distinct dimensions should obviously aim to avoid terms that have mutual dependences. If the aim fails, the metric cannot, except occasionally by accident, produce taxonomic clusters C (analogous to sets of items offered by a recommender system once a user has selected one member of C) that satisfy the users. The figure below gives an example of a descriptor de, with three terms matching the user's rated items and demographic data: tr, ts and tu.

Descriptor: de    Class: Behavioral

  Correlate    Class         P(de | ti)    Importance
  tr           Behavioral    0.75          1.0
  ts           Behavioral    0.70          1.0
  tu           Behavioral    0.60          1.0

Figure 3: Descriptor of item de
One extra column has been added to the descriptor to represent the importance of each term, which may be useful if we want to differentiate the relevance of different types of attribute. This attribute's value may be inherited from the term's class, making this a flexible mechanism to exercise some control over the way the system recommends items. The score computed for this user for descriptor de would be:

Score(de) = 1 − (0.25 × 0.3 × 0.4) = 0.97

This method is based on the assumption that any term matching the user's terms should increase the confidence that the descriptor holds the most appropriate recommendation. In a real-life example, let us suppose that we have a certain degree of confidence that a customer who buys a nail file will also want to buy nail polish. Knowing that this customer is a woman should increase the total confidence, subject to not exceeding the maximum value of 1.
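The noisy-OR combination can be sketched in a few lines of code. This is a simplified version that treats all importance values as 1.0, as in the example above; the names are illustrative:

```python
def score(descriptor, user_terms):
    """Noisy-OR score: 1 minus the product of the noise parameters
    (1 - confidence) of the descriptor terms matching the user's terms."""
    product = 1.0
    for term, confidence in descriptor.items():
        if term in user_terms:
            product *= 1.0 - confidence  # noise parameter of the matching term
    return 1.0 - product

# The worked example above: confidences 0.75, 0.70 and 0.60, all matching.
de = {"tr": 0.75, "ts": 0.70, "tu": 0.60}
s = score(de, {"tr", "ts", "tu"})  # 1 - (0.25 * 0.3 * 0.4) = 0.97
```

Note that each additional matching term can only raise the score, and the result never exceeds 1, which is exactly the behavior described for the nail-file example.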
Validation and Discussion

The first tests made in our validation experiments compared the performance of our approach with collaborative filtering, in terms of processing time and accuracy. We used the MovieLens database (1), storing anonymous ratings of 3900 movies assigned by 6040 users in the year 2000, to perform the tests. For the MovieLens database, we selected 10 films randomly from each test user and tried to identify the remaining films the user rated. The results obtained are presented below.

Table 1: Scoring results for the MovieLens data set

  Method                       Scoring
  Item Descriptors             65.7
  k-nearest-neighbor (k=1)     39.3
  k-nearest-neighbor (k=20)    54.9
  k-nearest-neighbor (k=40)    59.7
We carried out the experiments considering neighborhoods of sizes 1, 20 and 40 (we did not observe any significant improvement in accuracy for the nearest-neighbor algorithm with neighborhoods larger than 40). The item descriptors performed better than the k-nearest-neighbor algorithm regardless of the neighborhood size chosen. Sarwar et al. (2001) have carried out a series of experiments with the MovieLens data set, employing the Mean Absolute Error (MAE) method to measure the accuracy of item-based recommendation algorithms. The results reported cannot be compared directly with our own, as the authors computed their system's accuracy using the MAE and considering integer ratings ranging from 1 to 5 (reaching values around 75%). In our experiment, we only took into account whether a user rated (1) or did not rate (0) an item. In order to evaluate the system's performance, we monitored how much time was spent by the system in order to recommend the 2114 items in the test data set (2). For k=1, the nearest-neighbor approach needed less time than the item descriptors to perform the tests, though showing a lower rate of accuracy. However, for larger values of k (or simply larger numbers of users) the performance of the nearest-neighbor algorithm degrades, while that of the item descriptors remains stable. Table 2 summarizes the results of the experiment.
(1) MovieLens is a project developed in the Department of Computer Science and Engineering at the University of Minnesota (http://movielens.umn.edu).
(2) The tests were performed on a PIII 500MHz PC with 128MB of RAM.
Table 2: Performance results for the MovieLens data set

  Method                       Time spent (secs.)
  Item Descriptors             32
  k-nearest-neighbor (k=1)     14
  k-nearest-neighbor (k=20)    43
  k-nearest-neighbor (k=40)    86
In more realistic situations, where the nearest-neighbor algorithm may have to access a database containing actual users' transactions, the nearest-neighbor approach may become impractical. For the same experiment described above, we tested the nearest-neighbor algorithm through access to an actual database, using k=10. A few hours were needed for the system to make the whole set of recommendations. The MSWeb database, available from Blake and Merz (1998), was also used in our validation experiments. This database contains web data from 38,000 anonymous users who visited Microsoft's web site over a period of one week. We selected 5 items randomly from each test user and tried to identify the remaining items rated. The results obtained are presented below.

Table 3: Scoring results for the MSWeb data set

  Method                       Scoring
  Item Descriptors             59.3
  k-nearest-neighbor (k=1)     37.2
  k-nearest-neighbor (k=20)    43.8
  k-nearest-neighbor (k=40)    53.9
The item descriptors were more accurate than the k-nearest-neighbor algorithm regardless of the value chosen for k. Breese et al. (1998) reported results showing that the accuracy of other predictive algorithms varied from 54.8% (clustering method) to 59.8% (Bayesian network) for this same problem. In this experiment, the item descriptors showed an accuracy rate matching that of the Bayesian networks. We are carrying out an additional set of experiments using public databases from Blake and Merz (1998) in order to compare our method further with other recommendation techniques. At present the system is also being applied to a large business-to-business website, and we expect to be able to use some of the data from that site in order to evaluate the use of demographic and behavioral data in other research experiments. Table 4 contrasts the item descriptors with other approaches that are frequent in recommender systems, according to different criteria.
Table 4: Comparing recommender systems' approaches

  Representing knowledge
    Collaborative Filtering:  actual records of rated items
    Content-based methods:    related terms and keywords
    Association Rules:        association rules
    Item Descriptors:         correlated items and terms

  Learning
    Collaborative Filtering:  finding neighborhoods
    Content-based methods:    manual / text mining
    Association Rules:        inductive learning
    Item Descriptors:         computing confidence and other correlation factors

  Recommending
    Collaborative Filtering:  finding similar users and recommending items they have rated
    Content-based methods:    searching for items with matching terms and keywords
    Association Rules:        triggering rules and keeping the outcome with the highest value
    Item Descriptors:         looking for items with the highest correlation factors
Representing knowledge

The collaborative filtering approach represents knowledge through actual records of users' purchases. This is somewhat analogous to case-based reasoning (Watson, 1997), in that both methods search for a problem solution in the history of actual user cases. The information used to describe items in content-based methods may be defined manually or through some text-mining technique (Mooney, 2000). Such an approach elicits information about the items, but not about the users' preferences. Association rules store knowledge in rules which are extracted from a database using some inductive learning algorithm. The item descriptor approach is different in that it represents knowledge in the form of descriptors and correlation factors. When compared with the other approaches in this respect, descriptors are interesting because they make it easy for users to understand as well as modify the knowledge represented. This is particularly important when the user wants to make the system respond in a certain way in given circumstances, or to include business rules in its knowledge base. At present, the information represented in the descriptors does not contain forms of knowledge on which more complex (e.g. logic-based) automated reasoning could operate; such reasoning might improve the quality of the personalization, but it would also be computationally too expensive to be worth considering in anything larger than a toy demonstration. The challenge for knowledge-based personalization is always to find ways of adding detailed improvements under the strong constraint that they do not change the practical computational feasibility of the application.
Learning

Although collaborative filtering does not have any standard learning algorithm, the technique usually employs some clustering method to find neighborhoods and use them to accelerate the search for similar users in the database. Content-based strategies normally have their knowledge bases built manually or through some text-mining technique, scanning texts and associating terms and keywords with items. Association rules use well-known inductive learning algorithms, such as Apriori (Agrawal and Srikant, 1994), to extract knowledge from databases. The main advantage of using such learning methods lies in the robustness and stability of the algorithms available. The learning mechanism used for the item descriptors also exploits well-known methods to compute correlation factors and define the strength of the relationships between features and items. The option to use term confidence instead of conditional probability to describe the model comes from the fact that other correlation factors that are not supported by probability theory are computed by the system, such as interest and conviction (Brin et al., 1997). However, at present these are provided only to let the user analyze and validate the knowledge extracted from the database. We are currently testing different variations on the combination of these factors in the reasoning process. Although the system learns and updates its descriptors in an offline process (which is therefore not critical for the application's ability to recommend items in real time), our learning algorithm is fairly simple and fast. Above all, it is faster than algorithms that group evidence and try to compute the relevance of each item and then of each group of evidence.
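For reference, the three correlation factors mentioned (confidence, interest and conviction, in the sense of Brin et al., 1997) can all be computed from simple co-occurrence counts. The sketch below uses our own names and example counts, not the system's actual code:

```python
def correlation_factors(n_total, n_a, n_b, n_ab):
    """Correlation factors for a rule A -> B, given:
    n_total users overall, n_a with A, n_b with B, n_ab with both.
    Definitions follow Brin et al. (1997)."""
    p_a, p_b, p_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    confidence = p_ab / p_a                      # P(B | A)
    interest = p_ab / (p_a * p_b)                # >1 means positive correlation
    # conviction compares how often A occurs without B against independence
    conviction = (1 - p_b) / (1 - confidence) if confidence < 1 else float("inf")
    return confidence, interest, conviction

# Hypothetical counts: 100 users, 40 with A, 50 with B, 30 with both.
conf, lift, conv = correlation_factors(100, 40, 50, 30)
```

Since all three factors come from the same counts, exposing interest and conviction alongside confidence costs essentially nothing at learning time.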
Recommending

Regarding the process for making recommendations, collaborative filtering may go through actual purchase records in the database in the search for users who are similar to the current user. For large databases with hundreds of thousands of users, any such technique may present performance problems (a poor level of scalability). Content-based methods make recommendations by retrieving items with similar descriptions, with the help of information-retrieval techniques. As stated earlier, the main disadvantage of this approach when used on its own is that it relies solely on the item's characteristics, without taking account of other information about users' preferences. Alternative methods which combine content-based methods and collaborative filtering have been proposed in order to minimize this weakness (Popescul et al., 2001; Balabanovic and Shoham, 1997). Concerning the recommendation process used by association rules, it finds recommendations by triggering rules that match information known about the user. The recommendation process used by item descriptors selects the descriptor showing the highest correlation score when
compared to the user's information. Both of these processes are less susceptible to scalability problems, as association rules and item descriptors keep generalized knowledge about when to recommend each item (instead of having to deal with actual records in the reasoning process). Our model may also be compared with Hidden Markov Models (HMMs), employed in tasks such as the inference of the grammars of simple languages (Georgeff and Wallace, 1984) or the discovery of patterns in DNA sequences (Allison et al., 2000). The two models are similar in that both use probability theory to determine the likelihood that a given event takes place. However, the actual methods used to compute probabilities of events are different: while an HMM considers the product of the probabilities of individual events, we consider the product of noise parameters. Both models are based on the assumption that an output is statistically independent of previous outputs. This assumption may be limiting in given circumstances, but for the type of application we have chosen, i.e. item recommendation, we do not believe this to be a serious problem (as we have remarked above in our comments on independence). To take one practical example, the probability that a user purchases item C is very rarely dependent on the order in which users have bought other items (e.g. B before A, or A before B). The recommendation method we use has the peculiarity of computing the correlation of individual terms initially, and then combining them in real time. This is analogous to first finding a set of rules with only one left-side term, followed at run time by finding associations between the rules. This is a good technique to avoid computing the relevance of all possible associations among terms in the learning phase.
Conclusions

One important contribution of this work has been the use of a method for calculating the relevance of terms individually, and then combining them at recommendation time through the use of the noisy-OR function. A similar use of the function can be found in research on expert systems (Gallant, 1988), but not in applications for recommender systems. Initial results have shown that the approach is very effective in large-scale practice for purposes of personalization. The model used to represent different types of information (demographic or behavioral) in a similar way is another relevant contribution of this project. Previous work in the field has shown the importance of dealing with and combining such types of knowledge in recommender systems (Pazzani, 1999). Current research on the identification of implicit user interests also shows that recommender systems will have to manipulate different sorts of data in order to infer users' preferences (Claypool et al., 2001). From a practical point of view, dealing with single- and multi-valued attributes in a uniform manner is another
important advantage of the model. However, further research is needed to refine the reasoning process so as to let it differentiate the way single and multi-valued attributes are used, while keeping their representation unchanged. We are also starting to investigate the use of our model in making recommendations in educational systems, following the idea that courses that work well for one student will also work well for other similar students (Schank, 1997). An educational system to assist students in learning algorithms is being implemented at the Department of Computer Science of the University of Caxias do Sul, Brazil. We intend the system to be able to recommend exercises, texts and other contents according to the student’s own characteristics, as well as according to his/her more general learning profile.
References

Agrawal, R. and Srikant, R. 1994. Fast Algorithms for Mining Association Rules. In Proceedings of the 20th International Conference on Very Large Databases, Santiago, Chile.
Allison, L.; Stern, L.; Edgoose, T.; and Dix, T. I. 2000. Sequence Complexity for Biological Sequence Analysis. Computers and Chemistry 24(1):43-55.
Balabanovic, M. and Shoham, Y. 1997. Content-based, Collaborative Recommendation. Communications of the ACM 40(3).
Blake, C. L. and Merz, C. J. 1998. UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.
Breese, J.; Heckerman, D.; and Kadie, C. 1998. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI. Morgan Kaufmann.
Brin, S.; Motwani, R.; Ullman, J. D.; and Tsur, S. 1997. Dynamic Itemset Counting and Implication Rules for Market Basket Data. SIGMOD Record (ACM Special Interest Group on Management of Data) 26(2):255.
Buchner, A. and Mulvenna, M. 1998. Discovering Behavioural Patterns in Internet Files. In Proceedings of Intelligent Tutoring Systems '98, San Antonio, Texas, USA, 16-19. Lecture Notes in Computer Science 1452.
Claypool, M.; Brown, D.; Le, P.; and Waseda, M. 2001. Inferring User Interest. IEEE Internet Computing 5(6):32-39.
Gallant, S. 1988. Connectionist Expert Systems. Communications of the ACM 31(2).
Georgeff, M. P. and Wallace, C. S. 1984. A General Selection Criterion for Inductive Inference. In European Conference on Artificial Intelligence (ECAI 84), Pisa, 473-482.
Goldberg, D.; Nichols, D.; Oki, B. M.; and Terry, D. 1992. Using Collaborative Filtering to Weave an Information Tapestry. Communications of the ACM 35(12).
Herlocker, J.; Konstan, J.; and Riedl, J. 2000. Explaining Collaborative Filtering Recommendations. In Proceedings of the ACM Conference on Computer Supported Cooperative Work, Philadelphia, Pennsylvania, USA.
Krulwich, B. 1997. Lifestyle Finder: Intelligent User Profiling Using Large-Scale Demographic Data. AI Magazine 18(2):37-45.
Lin, W.; Alvarez, S. A.; and Ruiz, C. 2000. Collaborative Recommendation via Adaptive Association Rule Mining. In KDD-2000 Workshop on Web Mining for E-Commerce, Boston, MA, USA.
Linden, G.; Smith, B.; and York, J. 2003. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing 7(1):76-80.
Middleton, S. E.; Alani, H.; Shadbolt, N. R.; and De Roure, D. C. 2002. Exploiting Synergy Between Ontologies and Recommender Systems. In Proceedings of the Eleventh International World Wide Web Conference (WWW2002), Semantic Web Workshop, Hawaii, USA. ACM.
Mobasher, B.; Dai, H.; Luo, T.; and Nakagawa, M. 2001. Effective Personalization Based on Association Rule Discovery from Web Usage Data. In Proceedings of the ACM Workshop on Web Information and Data Management, Atlanta, Georgia, USA.
Mooney, R. J. 2000. Content-Based Book Recommending Using Learning for Text Categorization. In Proceedings of the Fifth ACM Conference on Digital Libraries, San Antonio, TX, USA. New York, NY: ACM Press.
Pazzani, M. 1999. A Framework for Collaborative, Content-Based and Demographic Filtering. Artificial Intelligence Review 13(5-6):393-408.
Popescul, A.; Ungar, L. H.; Pennock, D. M.; and Lawrence, S. 2001. Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, Seattle, Washington. Morgan Kaufmann.
Pradhan, M.; Provan, G. M.; Middleton, B.; and Henrion, M. 1994. Knowledge Engineering for Large Belief Networks. In Proceedings of Uncertainty in Artificial Intelligence, Seattle, Washington. Morgan Kaufmann.
Sarwar, B.; Konstan, J.; Borchers, A.; Herlocker, J.; Miller, B.; and Riedl, J. 1998. Using Filtering Agents to Improve Prediction Quality in the GroupLens Research Collaborative Filtering System. In Proceedings of the Conference on Computer Supported Cooperative Work, Seattle, Washington, USA.
Sarwar, B. M.; Karypis, G.; Konstan, J. A.; and Riedl, J. 2001. Item-Based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International World Wide Web Conference (WWW10), Hong Kong.
Schafer, J. B.; Konstan, J.; and Riedl, J. 1999. Recommender Systems in E-Commerce. In ACM Conference on Electronic Commerce, Denver, Colorado, USA.
Schank, R. 1997. Virtual Learning: A Revolutionary Approach to Building a Highly Skilled Workforce. New York, NY: McGraw-Hill.
Sneath, P. H. A. and Sokal, R. R. 1973. Numerical Taxonomy: The Theory and Practice of Numerical Classification. San Francisco, CA: Freeman.