CREATING ACADEMIC WEB COMMUNITIES: A RECOMMENDER SYSTEM TO AID BRAZILIAN RESEARCHERS TO FIND INFORMATION AND CONTACT RELEVANT PEOPLE*

Sílvio César Cazella1,2, Luis Otávio Campos Alvares2
1 Universidade do Vale do Rio dos Sinos, São Leopoldo, RS, Brazil, cazella@inf.ufrgs.br
2 Universidade Federal do Rio Grande do Sul, PPGC, Porto Alegre, RS, Brazil, [email protected]

ABSTRACT

Academic researchers need to find useful information and to contact interesting people in their area of research. If a system could facilitate contact among these researchers, they could put more effort into doing research, and perhaps carry out joint research and exchange information. Such a system could help this group of researchers manage their time and effort better. This paper introduces the architecture of the recommendation system W-RECMAS, which combines a multi-agent system and data mining techniques to help people find the information they need and contact people with whom to exchange ideas and knowledge. The work involves filtering information using a recommender system combined with a Brazilian e-government¹ database to build the user's profile. The great advantage of this solution is that it can help identify people with the same interests (like-minded people) and put them in contact in order to create academic web communities.

INTRODUCTION

The Web is an excellent source of information, but it offers huge amounts of data, so finding relevant information is becoming increasingly difficult. For this reason, a considerable amount of research on the information overload problem has been done, and well-known Internet search engines such as Google (google.com) and AltaVista (av.com) are available to assist the user in his/her search. However, these search engines help the user only with the information retrieval task; they cannot guarantee the quality of the results (Flake, G.W. et al., 2000). To retrieve information, the user must specify a query to the search engine, which matches the query against documents from a huge database. There are two main problems with this approach: the user may not know the right words to use to find the desired information, and the system may return a great deal of possibly useless information (Flake, G.W. et al., 2000).
In order to assist users, recommender systems have been developed. They present a new approach to helping the user find relevant information. Almost all recommender systems do the same thing: they identify items based on some criteria and on the user's profile in order to provide recommendations to the user (Balabanovic, M., 1997). Techniques such as collaborative filtering (Balabanovic, M., 1997; Konstan, J.A. et al., 1997) and content-based filtering (Balabanovic, M., 1997; Pazzani, M., 1999) are applicable in recommender systems. Another technique, known as the hybrid technique, applies collaborative filtering and content-based filtering together (Balabanovic, M., 1997). Many recommender systems have been developed to recommend music, news, web sites, and so on. Examples include RINGO (Shardanand, U., 1995), GROUPLENS (Konstan, J.A. et al., 1997) and FAB (Balabanovic, M., 1997). None of these systems takes into account the relevance of the user's opinion. One problem for recommender systems is how to justify the recommendations in order to encourage users to accept them.

When we relate this concern to academic life, we see that the relevance of a user's opinion is of great importance. Exchange of experiences is a basic principle in academic life, since researchers need to access large quantities of information in a limited amount of time. A student working on a research project needs to be able to find relevant information; the question is how. In our approach, the solution is a combination of a recommender system and a relevant community. Our system takes into account the relevance of the user's opinion, represented by a metric named Recommender's Rank. The student is then able to obtain more information about specific items through contact with relevant people from his/her area of study.

* This research has been funded in part by the Brazilian agency CAPES under grant BEX1357/03-4.
¹ In this paper we assume e-government to be an emergent concept. It refers to systems that provide free information through an electronic resource, which a public government agency makes available to citizens.

VIRTUAL COMMUNITIES

The first forms of virtual communities were created by users who were able to meet and hold discussions at a distance. The Internet can be considered a large community of like-minded people. According to Rheingold, H. (2000), a virtual community represents the emergence of socially motivated communities of interest on the Internet. The author describes virtual communities as "social aggregations that emerge from the Net when enough people carry on those public discussions long enough, with sufficient human feeling, to form webs of personal relationships in cyberspace". In creating a web community, the geographical distribution of the users is not a problem, because web technologies (e.g., e-mail, chat, forums) allow people to establish contact easily. A community provides a source of knowledge and information and helps members avoid duplicating the current and past efforts of other members. Consider a graduate student who needs to start some research but has a limited grasp of the meaningful keywords and their synonyms. He/she may waste considerable time searching for information that others have already discovered. If the student were enrolled in a community, it would provide some shortcuts. In order to create a community we need to address some challenges (Case, S. et al., 2001):
a) members should be directed to like-minded people and relevant information;
b) members should have access to the information they require without feeling overloaded;
c) members should be informed about other like-minded members;
d) members should be able to disseminate information to the appropriate people.

A MULTI-AGENT BASED DATA MINING ARCHITECTURE

This architecture is part of a project named W-RECMAS (Cazella, S.C., 2003), which was elaborated to enable the creation of communities and to minimize the data overload problem by controlling the content recommended to the user.
Our multi-agent based data mining architecture is responsible for generating recommendations (academic papers) and creating virtual web communities. It takes advantage of multi-agent technology and a mix of different types of intelligent agents, each one responsible for a specific task: Crawler Agent, Analyst Agent, Recommender Agent, Personal Agent and Community Agent. To have access to the system, a user needs to enroll and supply some personal information (such as full name, e-mail address, university, home page, academic level, and areas of interest). Once we have the user's full name, a Crawler Agent obtains more information about this user from the Internet by accessing an e-government system in Brazil named CV-Lattes². The CV-Lattes database has become an excellent source of information, because the system offers the community a number of important production metrics (quantitative data) for each researcher (e.g., the number of publications in journals, papers published in conferences, books and chapters published, supervised students, participation in examining committees, and so on). Using the information provided by the user and by the Crawler Agent, the Analyst Agent calculates the relevance of the user's opinion (Recommender's Rank, RR) for a specific area of knowledge, and a user profile is created. For example, if a user indicates interest in "data mining", the agent parses his/her curriculum to verify his/her authority to give an opinion in this area. The user's profile changes dynamically because the Crawler Agent monitors the CV-Lattes entries of the users already enrolled to verify whether any alteration has occurred. If so, the profile is updated and the RR is recalculated (Cazella, S.C., 2003). A user is identified only by RR until the community is created and the user agrees to be contacted by others.
To the other users, a user appears as a number in the range [0,10] (the range could be changed; we use it because it is the customary scale for evaluation in Brazil). An RR equal to zero means that the user does not have much experience in a paper's area (we assume this user's opinion is not very relevant for this area); five means that the user has a good deal of experience in the paper's area; and ten means that the user has in-depth knowledge of the paper's area (we assume that this user's opinion is very relevant). In order to recommend items to the user, our solution combines content-based filtering and collaborative filtering techniques. We use this hybrid approach to minimize the limitations of using content-based and collaborative filtering individually (Balabanovic, M., 1997).

² CV-Lattes was developed by CNPq (the National Council for Scientific and Technological Development), a foundation linked to the Ministry of Science and Technology (MCT), to support Brazilian research. In this database, it is possible to find information about researchers, graduate students, teachers, and professors.

The role of the Analyst Agent is to match the user's interests (according to the profile database and the database of historical consumption) with papers (the papers database). The agent's task is to decide which recommendation technique (content-based or collaborative) is best to apply for each user at a specific moment. For example, if there are sufficient ratings for items and sufficient users, the system is able to apply the nearest neighbor algorithm (Han, J., Kamber, M., 2001) to find like-minded users, and the agent may use the collaborative filtering technique. Otherwise, the agent applies content-based filtering: the user's interests and the papers he/she accessed in the past are examined in order to predict whether a specific paper would be interesting to the user. Once we have a recommendation, a Recommender Agent delivers it to a Personal Agent. The Personal Agent reminds the user to send feedback on the recommendation to the system, so that recommendations improve in the future. A Community Agent examines the users' information to find potential members for a community.

Predicting Implicit User's Interests

According to Han, J., Kamber, M. (2001), association rule mining is a data mining task that discovers relationships among items in a transactional database. In this work, we use the original Apriori algorithm (Han, J., Kamber, M., 2001). Our idea is to discover new relations between areas of interest and to find users who match this new knowledge. To confirm that a rule has value beyond its confidence and support, we average the RRTotal values of the users who match the new rule perfectly. If the result is greater than or equal to a threshold (in our example, 7.0 on a range [0,10]), we consider the rule applicable to other users.
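The validation step just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the W-RECMAS implementation: the user data, the function names (`rule_metrics`, `rule_is_applicable`, `target_users`) and the reading of a "perfect match" as "the user declared both the antecedent and the consequent area" are all hypothetical. Only the 7.0 threshold on the [0,10] scale is taken from the text. The last helper lists users who declared the antecedent area but not the consequent one, that is, the users to whom a validated rule would suggest a new area.

```python
from statistics import mean

# Hypothetical data for illustration only: each user has an RRTotal in
# [0, 10] and a set of declared areas of interest.
USERS = {
    "User1": {"rr_total": 7.0, "areas": {"MAS", "DM"}},
    "User2": {"rr_total": 9.0, "areas": {"MAS", "DM"}},
    "User3": {"rr_total": 8.0, "areas": {"MAS", "DM", "ML"}},
    "User4": {"rr_total": 4.0, "areas": {"MAS"}},
    "User5": {"rr_total": 6.0, "areas": {"ES"}},
}


def rule_metrics(users, antecedent, consequent):
    """Return (support, confidence, mean RRTotal) for 'antecedent -> consequent'.

    Assumption: a user matches the rule when both areas are declared.
    """
    with_antecedent = [u for u in users.values() if antecedent in u["areas"]]
    matching = [u for u in with_antecedent if consequent in u["areas"]]
    support = len(matching) / len(users) if users else 0.0
    confidence = len(matching) / len(with_antecedent) if with_antecedent else 0.0
    relevance = mean(u["rr_total"] for u in matching) if matching else 0.0
    return support, confidence, relevance


def rule_is_applicable(relevance, threshold=7.0):
    """Accept the rule when the group's mean RRTotal reaches the threshold."""
    return relevance >= threshold


def target_users(users, antecedent, consequent):
    """Users who declared the antecedent area but not the consequent area."""
    return sorted(
        name
        for name, u in users.items()
        if antecedent in u["areas"] and consequent not in u["areas"]
    )


support, confidence, relevance = rule_metrics(USERS, "MAS", "DM")
print(support, confidence, relevance)      # 0.6 0.75 8.0
print(rule_is_applicable(relevance))       # True
print(target_users(USERS, "MAS", "DM"))    # ['User4']
```

In a real deployment the support and confidence would come from the rule miner itself; they are recomputed here only to keep the sketch self-contained.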
Once the rule is established and its relevance is greater than or equal to 7.0, the Analyst Agent finds the users to whom the rule is applicable and proceeds with the recommendations according to this new rule of interest. To accomplish this, we analyze the users' areas of interest. For example, consider a matrix with ten users and their respective areas of interest. In this matrix, the column User identifies the user, and RRTotal represents his/her authority to give an opinion about some areas (e.g., User1 has authority equal to 7.0 on a range [0,10]). The other columns represent areas of interest, and a value equal to one indicates that the user is interested in that area (examples of areas are multi-agent systems (MAS), data mining (DM), machine learning (ML), and expert systems (ES)). By applying the association rules task (we used the Apriori algorithm from the Weka³ tool) we discovered, for example, the rule "If AI=MAS then AI=DM", with a support factor equal to 0.5 and a confidence factor equal to 0.8. The average RRTotal of the users whose interests match this rule is equal to 7.8. This number represents the relevance of the opinion and knowledge of the group that matches the rule. The rule is recorded in a database of rules. When the Analyst Agent observes this rule, it attempts to identify, among the profiles, users who may benefit from it, that is, users who mentioned their interest in "MAS" (the precondition of the rule) but not in "DM" (the postcondition of the rule). The agent then recommends papers from the newly discovered area and, if the user shows interest in this new area, the profile is updated.

CREATING COMMUNITIES

It is important to highlight that, besides recommending items to our users, the main goal of this work is to enable a researcher to contact others with the same areas of interest and to take advantage of this contact to exchange ideas and experiences.
For example, consider a user interested in the "multi-agent system" area and living in Recife (a Brazilian city), and another individual interested in the same area and living in Porto Alegre (another Brazilian city). Through the architecture it is possible to discover whether these people are interested in the same areas and whether they are like-minded. If so, we can consider them potential members of a community. Let U = {u_0, …, u_{n-1}} be a set of n users and let AI = {a_0, …, a_{m-1}} be a set of m areas. Let a community C be a set of users, C ⊆ U. We consider two distinct users u_i, u_j ∈ U to be possible members of a community by applying the rule: if Areas(u_i, u_j) ≠ ∅ and Like_minded(u_i, u_j) = true, then Community(u_i, u_j), where Like_minded(x, y) is a function that defines whether two users are like-minded (applying the correlation coefficient from GroupLens (Konstan, J.A. et al., 1997)) and Areas(x, y) is a function that verifies whether two users have at least one area of interest in common. Every user characterized by this rule is a potential member of a community. When we apply the algorithm to identify like-minded people and discover, for example, two users with the same likes, we consider this group like-minded. If these users also have at least one area of interest in common, they can be candidates for a community.

³ Weka is a collection of machine learning algorithms for data mining tasks (http://www.cs.waikato.ac.nz/~ml/weka/).

Once the Community Agent (based on our rule and our recommender system) identifies all the possible members of a community, a process is started to put the users in contact. The users' private information cannot be published for a simple reason: the user ratings are based on a private opinion. For this reason, the Community Agent needs to ask the users' Personal Agents whether their owners want to get in contact with others. Once contact is established, the others in the same community will know at least the user's e-mail address. After the Community Agent identifies the potential participants in a community, it sends a message to all the users' Personal Agents with the list of users identified as a possible community, and asks each user to return a list of the people he/she considers relevant (interesting people). The user receives only a list of potential participants and their RR (the relevance of the recommender's opinion). Each Personal Agent returns the authorized participants to the Community Agent. After verifying and merging all the answers, the Community Agent provides the contact e-mail addresses to keep the users in touch.

CONCLUSIONS AND FUTURE WORK

In this paper, we have presented a recommender system to help people find the information they need and contact relevant people in order to create communities on the web. The three principal contributions of this paper are: (a) a detailed architecture to recommend items and identify academic communities; (b) a method of working with multi-agent systems and data mining together; (c) a metric for recommender systems that considers the relevance of a user's opinion.
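The community-membership rule from the CREATING COMMUNITIES section (two users are candidates when they share at least one area of interest and are like-minded) can be sketched as follows. This is a minimal illustration, not the W-RECMAS code: the data, the function names and, in particular, the 0.5 correlation cut-off used by `like_minded` are assumptions, since the text does not state which correlation value counts as like-minded. The Pearson coefficient is computed over co-rated papers only, in the spirit of the GroupLens correlation the paper cites.

```python
from itertools import combinations
from math import sqrt

# Hypothetical users: declared areas of interest and ratings of papers.
USERS = {
    "u0": {"areas": {"MAS", "DM"}, "ratings": {"p1": 5, "p2": 3, "p3": 4}},
    "u1": {"areas": {"MAS"}, "ratings": {"p1": 4, "p2": 2, "p3": 5}},
    "u2": {"areas": {"ES"}, "ratings": {"p1": 5, "p2": 3, "p3": 4}},
    "u3": {"areas": {"DM"}, "ratings": {"p1": 1, "p2": 5, "p3": 2}},
}


def pearson(ratings_x, ratings_y):
    """Pearson correlation over co-rated items, or None when undefined."""
    common = sorted(set(ratings_x) & set(ratings_y))
    if len(common) < 2:
        return None
    xs = [ratings_x[item] for item in common]
    ys = [ratings_y[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # constant ratings: correlation is undefined
    return cov / sqrt(var_x * var_y)


def areas(user_x, user_y):
    """Areas(x, y): the areas of interest the two users have in common."""
    return user_x["areas"] & user_y["areas"]


def like_minded(user_x, user_y, min_correlation=0.5):
    """Like_minded(x, y), with an assumed cut-off on the correlation."""
    r = pearson(user_x["ratings"], user_y["ratings"])
    return r is not None and r >= min_correlation


def community_candidates(users):
    """Pairs of users satisfying Areas(x, y) != {} and Like_minded(x, y)."""
    return [
        (a, b)
        for a, b in combinations(sorted(users), 2)
        if areas(users[a], users[b]) and like_minded(users[a], users[b])
    ]


print(community_candidates(USERS))  # [('u0', 'u1')]
```

In this toy data u0 and u2 rate papers identically but share no area, and u0 and u3 share an area but rate papers in opposite ways, so only the pair (u0, u1) qualifies. The sketch stops at candidate pairs; the consent step handled by the Personal Agents is not modeled.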
Our architecture supports the challenges cited in the Virtual Communities section: members are directed to relevant people and relevant information; they have access to information without feeling overloaded, since the system controls the quantity of content presented to the user; and they are informed about other like-minded people. By using a recommender system and applying the Recommender's Rank metric to identify the relevance of a user's opinion, our system W-RECMAS can facilitate contact among Brazilian researchers, so that they can exchange ideas and information about their areas of research. It is a shortcut for the researcher to find relevant information and relevant people in Brazil.

Future work includes validating our system with real users. Currently, students and professors are using W-RECMAS. The aim of this experiment is to evaluate the users' satisfaction with recommendations based on the relevance of users' opinions. Other future work includes a study on using trust to identify communities, because trust is the basis of all social interactions. Since we are already working with collaborative filtering, we could use this information to create a model that evaluates trust in the interactions among users before creating a community.

REFERENCES

Balabanovic, M., Shoham, Y. (1997). Fab: Content-Based, Collaborative Recommendation. Communications of the ACM, Vol. 40, n. 3 (pp. 66-72). New York, USA.
Case, S. et al. (2001). Enhancing E-Communities with agent-based systems. IEEE Computer Society Press, Vol. 34, n. 7 (pp. 64-69). Los Alamitos, CA, USA.
Cazella, S.C. (2003). W-RECMAS: a hybrid recommender system based on multi-agent system to academic paper recommendation. Technical Report, PPGC-UFRGS-2003-339, Federal University of Rio Grande do Sul, POA, Brazil.
Flake, G.W. et al. (2000). Efficient identification of web communities. In Proceedings of knowledge discovery in database (pp. 150-160). Boston, MA, USA.
Han, J., Kamber, M. (2001). Data mining: concepts and techniques. Morgan Kaufmann Publishers, San Francisco, CA, USA.
Konstan, J.A. et al. (1997). Grouplens: Applying Collaborative Filtering to Usenet News. Communications of the ACM, Vol. 40, n. 3 (pp. 77-87). New York, USA.
Pazzani, M. (1999). A Framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, Vol. 13, Issue 5 (pp. 393-408). Kluwer Academic Publishers, Hingham, MA, USA.
Rheingold, H. (2000). The virtual community: homesteading on the electronic frontier. MIT Press, Massachusetts, USA.
Shardanand, U., Maes, P. (1995). Social information filtering: Algorithms for automating "word of mouth". In Proceedings of Human Factors in Computing Systems (CHI '95) (pp. 210-217). Colorado, USA.
