Collaborative Filtering System in Knowledge Management
Appasaheb Naikal1* and Su Mon Shein2
1 Librarian, S P Jain School of Global Management, 10 Hyderabad Road, Singapore-119579
2 Accounts Senior, TnB Corporate Services Pvt. Ltd., Block 738, Woodlands Circle, #03-369 Singapore S730738
*E-mail: [email protected]; [email protected]
2 E-mail: [email protected]
ABSTRACT Collaborative filtering system (CFS) is the process of filtering information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. Applications of collaborative filtering (CF) typically involve very large datasets. CF methods have been applied to many different kinds of data, including sensing and monitoring data, in fields such as mineral exploration and knowledge management. In this paper, an attempt has been made to briefly explain the meaning and types of collaborative filtering systems and their advantages and disadvantages. This paper also highlights new developments and innovations in CFS to date. The authors have also assessed how CFS can help to manage knowledge at the individual, group and organisational levels.
Keywords: Collaborative filtering system, CFS, Knowledge management
1. INTRODUCTION Information is a very broad term. Essentially, anything that can be digitised and encoded as a stream of bits is information. The information available today has different value to different consumers. Some information has business value and some has entertainment value (Shapiro and Varian, 1999). It is well known that information is costly to produce but cheap to reproduce, whereas filtering and identifying the right information is both costly and tedious. In this age of information overload, consumers often struggle to find what they are looking for, be it a product, service or content. Sorting through the huge amount of information available on the Internet has become problematic, time-consuming and costly. In the past, if we wanted a suggestion about where to buy a product, or even road directions, we asked friends or colleagues who we knew had that information. Today, things have changed because many tools are available to provide such information or tips. Collaborative filtering system (CFS) is one such technology that has emerged to help us. The CFS, developed by Firefly and Net Perceptions, is an automated version of this word-of-mouth process. The software allows people to use the feedback of thousands of other people with
tastes similar to their own and find the information they desire (Brynjolfsson and Charlet, 1998). The CFS provides a personalised service to the buyer on the web by recommending alternative and similar products.
2. WHAT IS COLLABORATIVE FILTERING? Collaborative filtering (CF) is the process of filtering information or patterns using techniques that involve collaboration among multiple agents, viewpoints, data sources, etc. Applications of CF typically involve very large datasets. CF methods are becoming an integral part of electronic commerce and Web 2.0 applications, where the focus is on users and their buying and spending behaviour. CFSs work by including people in the filtering system, and we can expect people to be better at evaluating documents than a computed function. Present CFSs attempt to find articles of interest to their users, often using some scoring function to evaluate features of the documents and returning the documents with the highest scores (Maltz and Ehrlich, 1995).
3. BACKGROUND OF COLLABORATIVE FILTERING SYSTEMS The concept of CF originated with the Information Tapestry project at Xerox Palo Alto Research Center (PARC). Tapestry was the first system to support CF: its users could annotate documents and choose which documents to read based on both the content of the documents and what other users said about them. In this CFS, people collaborate to help one another perform filtering by recording their reactions to the documents they read. These reactions are generally called annotations and can be accessed by others' filters.
4. TYPES OF COLLABORATIVE FILTERING SYSTEMS
4.1. Active Filtering System Active filtering system (AFS) is the most popular at the moment because of the ever-growing base of information available to users of the Internet. Since an enormous amount of information and data is added to the Internet every day, getting the right information to the right user at the right time has become difficult. When we search the Internet for any information, thousands of results are returned across numerous pages, and most of them are neither relevant nor useful. Many databases and search engines are available to address this problem. Unfortunately, many of us are not familiar with these options or do not know how to make use of them. This is where active filtering comes into effect. Active filtering differs from other CF methods because it uses a peer-to-peer approach. In this system, peers, co-workers and like-minded people rate products, reports and other material objects that they have used, evaluated or are considering owning, and share this
information on the web for other people’s benefit. It is a system based on the fact that people want to share consumer information with their peers. The users of active filtering use lists of commonly used links to send the information over the web, where others can view it and use the product ratings to make their own decisions. AFS can be useful to different people on different occasions. It comes in handy, for example, when a non-guided web search produces hundreds of results that look similar but are not useful to the person looking for the right information; in such instances, filtering is very useful and effective.
4.2. Passive Filtering System Passive filtering system is another method of CFS and shows great potential for the future. In passive filtering, information is collected implicitly: the web browser is used to record a user’s preferences by following and measuring his/her actions. The same implicit filters are then used to determine that user's other interests and what could be recommended to him/her. Implicit filtering depends heavily on the actions of the users to determine a value rating for specific content, such as purchasing an item, the number of items he/she queried before buying, saving items for future consideration and referring to sites.
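The idea of deriving a value rating from observed actions can be illustrated with a minimal sketch. The action names, the weights in ACTION_WEIGHTS and the 5-point cap are all hypothetical choices made for this example; the paper does not specify how actions should be weighted.

```python
# Hypothetical weights: how strongly each observed action signals interest.
ACTION_WEIGHTS = {
    "purchase": 5.0,
    "save": 3.0,
    "referral": 2.0,
    "query": 1.0,
}

def implicit_scores(events, max_score=5.0):
    """Turn (user, item, action) events into capped implicit ratings."""
    scores = {}
    for user, item, action in events:
        weight = ACTION_WEIGHTS.get(action, 0.0)  # unknown actions count for nothing
        key = (user, item)
        scores[key] = min(max_score, scores.get(key, 0.0) + weight)
    return scores

# Illustrative browsing log (invented data).
events = [
    ("ann", "book", "query"),
    ("ann", "book", "query"),
    ("ann", "book", "save"),
    ("bob", "book", "purchase"),
    ("ann", "cd", "query"),
]
scores = implicit_scores(events)
```

The resulting scores can be fed into any rating-based CF method in place of explicit ratings; in practice the weights would need to be tuned against observed behaviour.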
4.3. Item-Based Collaborative Filtering Item-based collaborative filtering is a model-based algorithm for making recommendations. In this algorithm, the similarities between different items in the dataset are calculated using one of a number of similarity measures, and these similarity values are then used to predict ratings for user-item pairs not present in the dataset. Item-based filtering is thus a method of CF in which items, rather than users, are rated and used as parameters. This type of filtering uses the ratings to group items together so that consumers can compare them, and it provides manufacturers with a consumer-based rating scale on which they can see where their product stands in the market. Under this method, users or user groups use and test a product and give it a rating that is relevant to the product and to the product class in which it falls. These users test many products and rate them accordingly, and the products are classified based on the information the ratings hold. The products are used and tested by the same user or group in order to obtain an accurate rating and to eliminate some of the errors that are possible in tests conducted under this type of filtering.
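The two steps described above (compute item-to-item similarities, then predict a missing user-item rating) can be sketched as follows. This is only one possible instance: cosine similarity is chosen here as the similarity measure, and the ratings table and the function names are invented for illustration.

```python
from math import sqrt

# Invented ratings on a 1-5 scale; dan has not rated item "B".
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2},
    "cat": {"A": 1, "B": 2, "C": 5},
    "dan": {"A": 5, "C": 1},
}

def item_vector(ratings, item):
    """Collect every user's rating of one item."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine(v1, v2):
    """Cosine similarity over the users who rated both items."""
    common = set(v1) & set(v2)
    if not common:
        return 0.0
    dot = sum(v1[u] * v2[u] for u in common)
    n1 = sqrt(sum(v1[u] ** 2 for u in common))
    n2 = sqrt(sum(v2[u] ** 2 for u in common))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def predict_item_based(ratings, user, item):
    """Similarity-weighted average of the user's own ratings of other items."""
    target = item_vector(ratings, item)
    num = den = 0.0
    for other, rating in ratings[user].items():
        sim = cosine(target, item_vector(ratings, other))
        num += sim * rating
        den += abs(sim)
    return num / den if den else None

prediction = predict_item_based(ratings, "dan", "B")
```

Because item B is rated much like item A and unlike item C, the prediction for dan leans towards his high rating of A. Plain cosine similarity on raw ratings is always positive, so mean-centred variants are often preferred in practice.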
5. ADVANTAGES AND DISADVANTAGES The advantages of CFS lie in what it does better than content-based filtering. The CFS does not rely on keywords alone to make recommendations. Content filtering systems rely on defined fields or attributes of the item. Items having the same content in the field would fall under the same category or have the same
rating. A content filtering algorithm would continue to recommend similar items to the user, while new items that are not yet rated would be missed. In effect, such a system is unable to understand the subtleties of an item, for example the writing style of an article, which a human would be able to appreciate. The CFS overcomes these limitations by making use of the capability of people. It analyses all consumers’ profiles of what they have bought, together with their opinions and experiences, to find similar consumers, or neighbours, and so predict the next item. As such, a recommendation is not restricted by the consumer’s own profile or by what that user has rated.
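The neighbour-based prediction described above can be sketched as follows. This is an illustrative assumption rather than the method of any particular system: Pearson correlation is used to find similar consumers, the prediction is a mean-centred weighted average over the k most similar neighbours, and the data and function names are invented.

```python
from math import sqrt

# Invented ratings on a 1-5 scale; dan has not rated item "D".
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1, "D": 5},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 4},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
    "dan": {"A": 5, "B": 4, "C": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    ma = mean(a[i] for i in common)
    mb = mean(b[i] for i in common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    da = sqrt(sum((a[i] - ma) ** 2 for i in common))
    db = sqrt(sum((b[i] - mb) ** 2 for i in common))
    return num / (da * db) if da and db else 0.0

def predict_user_based(ratings, user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    user_mean = mean(ratings[user].values())
    candidates = [
        (pearson(ratings[user], r), name)
        for name, r in ratings.items()
        if name != user and item in r
    ]
    neighbours = sorted(candidates, reverse=True)[:k]
    num = den = 0.0
    for sim, name in neighbours:
        if sim <= 0:
            continue  # ignore dissimilar users
        num += sim * (ratings[name][item] - mean(ratings[name].values()))
        den += sim
    return user_mean + num / den if den else user_mean

prediction = predict_user_based(ratings, "dan", "D")
```

Here ann and bob share dan's tastes and both rated D above their own averages, so the prediction for dan lies above his mean rating, while cat, whose tastes are opposite, is ignored.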
The CFS suffers from a common problem known as cold start. The system cannot be used effectively for a new user who has not yet rated any items or has not rated enough of them. Hence, a training period is required before the system can be effective. Furthermore, the system is not effective when a small population of evaluators faces a large database, since it is inconceivable that every item will have been rated. A first-rater problem can also arise, where new items, or items not yet rated by consumers, do not feature in the algorithm and cannot help in the prediction. If the algorithm has to process the entire database to make a prediction, processing requirements increase dramatically with the number of evaluators and the number of items rated; as such, CFS does not scale well in processing. Finally, as CFS depends on user opinion, results may not always be objective and impartial. There will be some element of false or inconsistent ratings.
6. INNOVATION IN COLLABORATIVE FILTERING SYSTEM New algorithms have been developed for CF as a result of the Netflix Prize. The Netflix Prize was an open competition for the best CF algorithm to predict user ratings for films based on previous ratings. The competition was held by Netflix, an online DVD rental service, and was open to anyone (with some exceptions). A grand prize of $1,000,000 was reserved for the entry that bested Netflix's own algorithm for predicting ratings by 10%. Cross-system CF is where user profiles across multiple recommender systems are combined in a privacy-preserving manner. Recommender systems form a specific type of information filtering technique that attempts to present information items (movies, music, books, news, images, web pages, etc.) that are likely to be of interest to the user. Typically, a recommender system compares the user's profile with some reference characteristics and seeks to predict the 'rating' that a user would give to an item they had not yet considered. These characteristics may come from the information item (the content-based approach) or from the user's social environment (the CF approach).
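To make the 10% criterion concrete, the sketch below measures prediction accuracy by root mean squared error (RMSE), the metric used in the Netflix Prize although not named in this paper, and computes the relative improvement of one algorithm over another. The ratings and predictions are invented, and the function names are hypothetical.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))

def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate lowers the baseline's error."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse

# Invented held-out ratings and two sets of predictions for them.
actual = [4, 3, 5, 2]
baseline_predictions = [3, 3, 4, 3]
candidate_predictions = [4, 3, 4, 2]

baseline_error = rmse(baseline_predictions, actual)
candidate_error = rmse(candidate_predictions, actual)
gain = relative_improvement(baseline_error, candidate_error)
beats_threshold = gain >= 0.10  # the 10% bar described in the text
```

A lower RMSE means predictions are closer to the ratings users actually gave, so an entry qualifies only if its error is at least 10% below that of the baseline.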
7. HOW COLLABORATIVE FILTERING SYSTEM SUPPORTS KNOWLEDGE MANAGEMENT AT THE INDIVIDUAL, GROUP AND ORGANISATION LEVELS It is interesting to note that CFS and knowledge management (KM) have a common denominator, i.e. people. Filtering systems cannot replace the ability of people to differentiate the nuances and subtleties of
the article or artifact, such as writing style. The CFS helps to leverage the opinions and experiences of others to shorten search time, thereby reducing search cost to the user and providing value to the organisation as a whole.
The CFS has been used for information and document sharing in small workshop environments, particularly in domains where personal preference is highly subjective, such as music and the arts, where machines are not able to discern and analyse the nuances. Many e-commerce websites, such as iTunes, Hollywood Video and Amazon, make use of CFS. The CFS combines CRM techniques and personalised marketing to help the customer make better decisions, improve the consumer experience and build customer loyalty. Each customer visit is maximised by recommending and cross-selling additional products, which translates into improved profits for the organisation. In expertise recommender systems, CFS uses passive filtering to rate how often a person is searched for or viewed in order to locate expertise.
In the study by Maltz and Ehrlich (1995) on group-level information sharing and collaboration among colleagues, information mediators would send interesting articles and relevant documents as pointers to a distribution list. Pointers are hyperlinks to the referenced document. The system demonstrated the importance of people in selecting relevant articles, and of trust in the sender as a reliable source when there is no formal process to review and validate the documents. The system was well liked by the group, but is more suitable for small, focused groups. Balabanovic and Shoham (1997) designed a system that comprises multiple collection agents. Each agent dynamically learns a different topic from the rest, yet the agents also learn from one another. What is significantly different is that each agent adapts to the whole population of users, rather than to any specific user. Hence, the system avoids the critical problem of processing scalability highlighted in Section 5.
8. FUTURE OF COLLABORATIVE FILTERING SYSTEM The CFS will continue to experience the cold-start, first-rater and scalability problems highlighted above. Research has been conducted to create a hybrid of content filtering and CF techniques. In a controlled environment, results have shown that the hybrid model performed better than the pure model of each
technique. When we relate CFS to current Internet trends, CFS techniques are very similar to those found in knowledge organisation, folksonomies and social tagging. In this era of Web 2.0, collaboration and user interactivity will continue to drive the adoption and proliferation of CFS-related techniques.
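One simple way to picture a hybrid of content filtering and CF is a weighted blend that relies on the content-based score while few ratings exist and shifts towards the CF score as ratings accumulate. The sketch below is an illustrative assumption only, not the method of the research mentioned above; the function name, the linear weighting and the full_trust_at threshold are all hypothetical.

```python
def hybrid_score(cf_score, content_score, n_ratings, full_trust_at=10):
    """Blend a CF score and a content-based score for one user-item pair.

    cf_score may be None when CF cannot produce a prediction (cold start).
    n_ratings is the number of ratings the CF prediction is based on.
    """
    if cf_score is None:
        return content_score  # fall back entirely on content filtering
    weight = min(1.0, n_ratings / full_trust_at)  # trust CF more as data grows
    return weight * cf_score + (1.0 - weight) * content_score

# Invented scores on a 1-5 scale.
new_item = hybrid_score(None, 3.5, n_ratings=0)         # no ratings yet
some_ratings = hybrid_score(4.0, 2.0, n_ratings=5)      # equal blend
many_ratings = hybrid_score(4.0, 2.0, n_ratings=20)     # CF score only
```

Under this scheme a new or unrated item still receives a score from its content attributes, which is one way a hybrid can soften the cold-start and first-rater problems.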
9. CONCLUSION
Due to the revolutionary impact of information technology, the market is changing rapidly. In the business domain, there is no place for inefficient and slow-moving enterprises. Companies are under tremendous pressure to manage their information assets more efficiently and effectively than ever (Davis et al., 2006). How companies capture information and then filter, validate, store and distribute it is becoming the deciding factor in their survival and profitability. The CFS has stood up to meet this new challenge. It is becoming an integral part of how all kinds of companies filter data and information. The CFS also plays a vital role in KM because it allows people to feed back their opinions and experience, which are key enablers of KM success. The success of any KM project is not possible without the concerted contribution of infrastructure, process, people and culture. The CFS facilitates this in a meaningful way.
ACKNOWLEDGEMENT The authors would like to express their sincere gratitude to Mr. Tan Pok Cheng and Miss Nwaye Mi Mi Tun for their valuable contributions in the form of idea sharing and suggestions while writing this paper.
REFERENCES
Balabanovic, M. and Shoham, Y. (1997). Content-based, collaborative recommendation, Communications of the ACM, 40(3), pp. 66-72.
Brynjolfsson, E. and Charlet, J.C. (1998). Collaborative Filtering, Technology Note-SOIT-22S. Retrieved on 20 January 2012 from http://cb.hbsp.harvard.edu/cb/web/search_results.seam?Ntt=Collaborative%2BFiltering%252C%2BTechnology%2BNote%2B&conversationId=3797798
Carr, N.G. (2004). Does IT Matter?: Information Technology and the Corrosion of Competitive Advantage, Harvard Business School Press, Boston.
Davis, J., Miller, G.J. and Russell, A. (2006). Information Revolution: Using the Information Evolution Model to Grow Your Business, John Wiley and Sons Inc., New Jersey.
Herlocker, J.L., Konstan, J.A., Terveen, L.G. and Riedl, J.T. (2004). Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems, 22(1), pp. 5–53.
Kim, B.D. and Kim, S.O. (2001). A new recommender system to combine content-based and collaborative filtering systems, Journal of Database Marketing, 8, pp. 244-252.
Maltz, D. and Ehrlich, K. (1995). Pointing the way: active collaborative filtering. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '95), ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, pp. 202-209. Retrieved on 20 January 2012 from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.5535&rep=rep1&type=pdf
McDonald, D.W. and Ackerman, M.S. (2000). Expertise recommender: A flexible system and architecture, ACM Conference on Computer Supported Cooperative Work. Retrieved on 20 January 2012 from http://courses.ischool.utexas.edu/donturn/2008/fall/INF_385Q/readings/McDonald_Ackerman-2000Expert.pdf
Shapiro, C. and Varian, H.R. (1999). Information Rules: A Strategic Guide to the Network Economy, Harvard Business School Press, Boston.
Wang, J., Pouwelse, J., Lagendijk, R.L. and Reinders, M.J.T. (2006). Distributed collaborative filtering for peer-to-peer file sharing systems, In Proceedings of the 2006 ACM Symposium on Applied Computing (SAC '06), ACM, New York, NY, USA, pp. 1026-1030.
About the Authors
Appasaheb Naikal is an information and knowledge management professional from India, presently working as Librarian at S P Jain School of Global Management, Singapore. He received his Master of Science in Knowledge Management from Nanyang Technological University, Singapore in 2009 and his Master of Library & Information Science from Karnatak University, Dharwad, India in 1999. He has more than 12 years’ experience in managing both academic and corporate libraries at national and international level. His research interests include knowledge management, digital libraries, social media and ICT applications in LIS education.
Ms. Su Mon Shein received her Master of Science in Knowledge Management from Nanyang Technological University, Singapore in 2009 and is presently working as Accounts Senior at TnB Corporate Services Pte Ltd, Singapore.