
Personalized Information Retrieval by Using Adaptive User Profiling and Collaborative Filtering

Hochul Jeon, Taehwan Kim, Joongmin Choi
Department of Computer Science & Engineering, Hanyang University
{hcjeon,kimth}@cse.hanyang.ac.kr, [email protected]
doi:10.4156/aiss.vol2.issue4.14

Abstract

Many search engines, such as Yahoo, Google, MSN, and AltaVista, have been developed to meet users' various search needs in the real world. In general, because they lack personal information such as hobbies, preferences, and interests, these existing information retrieval systems are ill-suited to providing personalized search results. In this paper, we propose an adaptive user profiling method that uses a dynamic updating policy to account for changes in users' preferences over time and across domains. Moreover, we employ a collaborative filtering method to handle the situation in which users' preferences change frequently or continuously. Experimental results show that our method considerably improves personalized search performance for each user through automatic creation, maintenance, and personalization of user preference profiles that include the search patterns of individual users.

Keywords: Personalized Information Retrieval, Adaptive User Profiling, Collaborative Filtering

1. Introduction

With the rapid growth of the World Wide Web, a vast amount of data is available on the web, and the amount of available information keeps increasing rapidly. Overcoming this flood of information and providing personalized information to users is an important issue in the information retrieval field. Many search engines, such as Yahoo, Google, MSN, and AltaVista, have been developed to meet users' various search needs in the real world. In general, because they lack personal information such as hobbies, preferences, and interests, these existing information retrieval systems are ill-suited to providing personalized search results. As a solution to this problem, the user preference profile approach has been employed to provide personalized search results to each user [5]. In this method, users' preferences are compared with the search results, and the matched results are then served to the user. In general, user preference profiles consist of the results of lexical analysis, the input query, the documents that the user clicked, the queries that the user issued in the past, and some weight values. However, a user preference profile that contains incorrect preferences only frustrates users. Incorrect user preferences generally arise from the static profile approach, in which preferences and weight values are fixed once the user preference profile is created. Most portal systems use the static profile approach to provide personalized information. Because users' preferences vary over time, place, context, and domain, the static profile approach has a high chance of holding incorrect preferences.
To address this problem, various learning techniques, such as Bayesian classifiers, neural networks, and genetic algorithms (GAs), have been used to revise user profiles in several studies [8, 9, 10, 11], with varying levels of improvement. However, these studies suffer from overspecialization: users can obtain only the information indicated in their profiles and have no chance to explore new information they might want. Moreover, because user profiles are complex, the learning processes are time-consuming and are not appropriate when user preferences change rapidly and frequently. To solve these problems, in this paper we propose an adaptive user profiling method that uses a dynamic updating policy to account for changes in users' preferences over time and across domains. Moreover, we employ a collaborative filtering method to deal with the situation in which users' preferences change frequently or continuously. Experimental results show that our method considerably improves personalized search performance for each user through automatic creation, maintenance, and personalization of user preference profiles that include the search patterns of individual users. By using this user profile, our system can provide more personalized search results to users.

The remainder of this paper is organized as follows. Section 2 describes related work on user profile technologies for personalized information retrieval. Section 3 presents the architecture of our proposed system and its components. Section 4 describes techniques for finding similar users and experts, and explains the collaborative filtering we employ. Section 5 reports the results of experiments on participant satisfaction. The last section summarizes the adaptive user profiling approach, gives conclusions, and discusses future research.

Advances in Information Sciences and Service Sciences Volume 2, Number 4, December 2010

2. Related Work

In this section, we describe the current status of user-profile-based personalized information retrieval systems and their problems. In general, most personalized information retrieval systems combine the user preference profile method with the filtering method that is commonly used in recommendation systems. We also employ both the user profile method and the collaborative filtering method, but our proposed system can provide more personalized and optimized search results by overcoming the problems of existing systems.

2.1. Adaptive User Profiling

In [1], the system uses a user profile approach to minimize the difference between users' perception and the physical features of data in an information retrieval system. By using the user profile method, the system can provide more suitable search results to users. However, this approach merely learns the created user profiles using machine learning technology. Therefore, if the user profiles contain incorrect information, the system gives irrelevant results to users. Users can obtain only the information indicated in their profiles and have no chance to explore new information they might want. Moreover, because user profiles are complex, the learning processes are time-consuming and are not appropriate when user preferences change rapidly and frequently. Compared to this system, our method utilizes the users' relevance feedback to improve the profiles automatically using a genetic algorithm. A user gives feedback by clicking one of the search results, and our system then updates the user's profile with that feedback.

The system described in [5] focuses on changes in user preferences. The core problem of personalized recommendation is to model and track users' interests and their changes. To address this problem, both content-based filtering (CBF) and collaborative filtering (CF) are explored in that system. User interests involve interests in fixed categories and in dynamic events, yet current CBF approaches lack the ability to model users' interests at the event level. The system in [5] therefore proposes a novel approach to user profile modeling. In this model, a user's interests are modeled by a multi-layer tree with a dynamically changeable structure, whose top layers model user interests in fixed categories and whose bottom layers model dynamic events. Thus, this model can track the user's reading behavior on both fixed categories and dynamic events, and consequently capture interest changes. A modified CF algorithm based on the hierarchically structured profile model is also proposed. However, the system proposed in [5] is unsuitable when user preference information changes frequently.

In summary, our system employs a user profile approach and a collaborative filtering approach to deal with the situation in which user preferences change continuously and frequently.

2.2. Personalized Information Retrieval

The study in [2] focuses on utilizing clickthrough data to improve the performance of Web search. Since millions of searches are conducted every day, a search engine accumulates a large volume of clickthrough data, which records who submits each query and which pages that user clicks on. The clickthrough data is highly sparse and contains different types of objects (user, query, and Web page), and the relationships among these objects are also very complicated.

[2] attempts to discover Web users' interests and the patterns by which users locate information by analyzing these data. The clickthrough data is represented by a 3-order tensor, on which 3-mode analysis is performed using the higher-order singular value decomposition technique to automatically capture the latent factors that govern the relations among these multi-type objects: users, queries, and Web pages. A tensor reconstructed from the CubeSVD analysis reflects both the observed interactions among these objects and the implicit associations among them. Therefore, Web search activities can be carried out based on CubeSVD analysis. However, this approach is difficult to apply to commonly used search engines because the analysis is very complex.

[3] focuses on how to model the user and his/her context in an extensible way that can be interpreted and used for personalization. [3] describes an architecture that provides personalization facilities based on a contextual user model for tourism usage. User modeling begins with the creation of a user profile. In [3], each user profile is created based on an ontology and is used to influence the current context of the user. The ontology includes pairs of contexts and past user behaviors, and influences users' current context and behaviors. Therefore, the system proposed in [3] can provide personalized information through ontology-based modeling. However, the abovementioned systems are all unsuitable when user preference information changes continuously and frequently.

[4] works on a new system that learns to improve retrieval effectiveness by integrating the following factors:

A. The user characteristics (user model or user profile).
B. The characteristics of the interactions of other users (social IR, stereotypes, and collaborative information retrieval).
C. The context of the research (context modeling).

Such a system may have the potential to overcome the current plateau in ad-hoc retrieval. [4] concentrates on the first two elements: the user profile and Collaborative Information Retrieval (CIR). CIR is an approach that learns to improve retrieval effectiveness from the interactions of different users with the retrieval system. Collaboration here assumes that users can benefit from search processes carried out earlier by other users, although they may not know about the other users or their search processes. In other words, collaborative search records the fact that a result d has been selected for query q, and then reuses this information for similar queries in the future by promoting results that were reliably selected in the past. However, the goals and characteristics of two users may differ, so when they send the same query to a CIR system, they may be interested in two different lists of documents (known as the personalization problem). Personalization is a common problem that CIR researchers often encounter in constructing their systems. The personalized system proposed in [4] is the first attempt toward resolving the personalization problem in CIR systems by incorporating user profiles.

[4] uses three Profile Similarity (PS) calculation methods: query based PS, document based PS, and query-document based PS. The query based PS approach considers only the queries in the user profile. [4] holds that user queries can partially represent the needs and preferences of the users, because users express their requirements formally through queries. The document based PS approach considers only the documents that the user has studied or has marked as pertinent to his/her requirements. These marked documents lead the system to determine the users' needs: when a user reads a particular document, it can be judged that the user's need is related to the content of that document. Thus, the marked documents in a user profile can be useful for estimating the similarity between two profiles. This approach is very similar to the query based approach, except that it deals with the documents the user has marked rather than with queries. [4] uses both the query based and the document based approaches to partially capture the users' interests. However, the system proposed in [4] is also unsuitable when user preference information changes continuously and frequently.

3. System

In this section, we will introduce an adaptive user profiling model which can model user interests at both category (or domain) and event levels. After that, we will describe how it can be learned and utilized for prediction.

3.1. System Architecture

As illustrated in Figure 1, our proposed system returns search results retrieved from the user profile search, weighted by a re-rank weight, and from an existing search engine, weighted by one minus that weight. The weight is initialized to 0.5. It is used to re-rank the search results: with the initial value, out of 10 results our system selects the top 5 from the user profile search and the top 5 from the existing search engine. The weight is updated whenever users give feedback to the system. As the weight increases, the number of results taken from the user profile search increases; conversely, as it decreases, the number of results taken from the user profile search decreases. Since user feedback is reflected in the user profiles, the weight decreases at the initial stage and increases as more feedback is received.

Figure 1. System Architecture
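As a minimal sketch of the re-ranking step in Section 3.1, the following code fills a top-10 list from two ranked lists according to the weight. The function name `merge_results`, the rounding rule used to turn the weight into a slot count, and the choice to list profile results first are assumptions made for illustration; the paper only states the 5 + 5 split at the initial weight of 0.5.

```python
from typing import List


def merge_results(profile_hits: List[str],
                  engine_hits: List[str],
                  weight: float = 0.5,
                  k: int = 10) -> List[str]:
    """Fill k result slots from two ranked lists according to the re-rank weight.

    Assumption: the number of slots given to the profile search is
    round(weight * k); the paper does not spell out the allocation rule.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    n_profile = min(round(weight * k), len(profile_hits))
    merged = list(profile_hits[:n_profile])
    # Fill the remaining slots from the existing search engine,
    # skipping documents already taken from the profile search.
    for doc in engine_hits:
        if len(merged) >= k:
            break
        if doc not in merged:
            merged.append(doc)
    return merged


# Hypothetical ranked result ids from the two sources.
profile_hits = [f"p{i}" for i in range(1, 9)]    # p1 .. p8
engine_hits = [f"e{i}" for i in range(1, 11)]    # e1 .. e10
top10 = merge_results(profile_hits, engine_hits, weight=0.5)
```

With the initial weight of 0.5, `top10` holds five profile results followed by five engine results; a larger weight shifts more slots to the profile search, as the text describes.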

3.2. User Profile

As illustrated in Figure 2, a user profile in our system has a hierarchical multi-layer structure. The top layer is a domain layer that contains the web search results selected by the user. The bottom layer is a search result layer that contains the results selected by the user.

Figure 2. User Profile Structure

We define the parameters used in Figure 2 as follows:

$U$ : a set of domains, $U = \{Domain_1, Domain_2, \ldots, Domain_n\}$
$Domain_n$ : a domain that includes the documents selected by the user. $Domain_n =$
$T$ : a term vector for the given user query, $T = \{t_1, t_2, t_3, \ldots, t_n\}$, where $t$ is a term of the user query
$D$ : a set of documents selected by the user, $D = \{D_1, D_2, D_3, \ldots, D_n\}$, with $D_i = \{t_{i1}, t_{i2}, t_{i3}, \ldots, t_{in}\}$, where $t$ is an index term of the document selected by the user
$W_{D_i T_j}$ : the probability (or weight) that the user issues a query using $T_j$ in domain $D_i$
$W_{T_i D_j}$ : the probability (or weight) that the user will select document $D_j$

By maintaining this hierarchical structure, the system can deal more effectively with continuously changing user interests. To address frequently changing user interests, the system provides more personalized search results by using the collaborative filtering method: it finds users who have similar interests, or who hold much information for a given query, and then returns filtered results. In addition, by using the search results from existing search engines, our system can deal with two situations. The first is the situation in which the profile search returns no results, and the second is the situation in which user interests change suddenly.
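The hierarchical profile defined above can be pictured as nested mappings from domain to past query to selected document. The sketch below is only one possible reading: the class `UserProfile`, its method names, and the estimation of both weights as relative click frequencies are assumptions for illustration, since the paper does not give the exact update rule.

```python
from collections import defaultdict
from typing import Dict, Tuple

Query = Tuple[str, ...]  # a past query T, stored as a tuple of its terms


class UserProfile:
    """Hypothetical profile: domain -> past query -> clicked document -> count."""

    def __init__(self) -> None:
        self.clicks: Dict[str, Dict[Query, Dict[str, int]]] = defaultdict(
            lambda: defaultdict(lambda: defaultdict(int)))

    def record_click(self, domain: str, query: Query, doc_id: str) -> None:
        """Store one piece of implicit feedback (the user clicked doc_id)."""
        self.clicks[domain][query][doc_id] += 1

    def query_weight(self, domain: str, query: Query) -> float:
        """Assumed estimate of W_{D_i T_j}: share of the domain's clicks from this query."""
        per_query = self.clicks.get(domain, {})
        total = sum(sum(docs.values()) for docs in per_query.values())
        if total == 0:
            return 0.0
        return sum(per_query.get(query, {}).values()) / total

    def doc_weight(self, domain: str, query: Query, doc_id: str) -> float:
        """Assumed estimate of W_{T_i D_j}: share of the query's clicks on this document."""
        docs = self.clicks.get(domain, {}).get(query, {})
        total = sum(docs.values())
        return docs.get(doc_id, 0) / total if total else 0.0


# Hypothetical usage with made-up domain, queries, and document ids.
profile = UserProfile()
world_cup = ("world", "cup")
profile.record_click("sports", world_cup, "d1")
profile.record_click("sports", world_cup, "d1")
profile.record_click("sports", world_cup, "d2")
profile.record_click("sports", ("nba",), "d3")
```

Because the weights are derived from counts on demand, every new click changes them immediately, which is one simple way to keep a profile dynamic rather than static.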

3.3. Probabilistic approach for personalized information retrieval

Once a user query is entered, our system runs the search process for that query in two phases. The first phase is the preprocessing phase. A query is represented as a set of terms:

$$Q = \{t_1, t_2, t_3, \ldots, t_n\}$$

where $Q$ is the set of user query terms and $t$ is a term of the query. For each domain, our system selects similar queries that the user issued in the past. The system takes the queries whose similarity values are greater than a threshold, whose value is decided through experiments. We use a modified cosine similarity as the similarity measure:

$$Sim(T, Q) = \sum_{i=1}^{n} w_{t_i} \times w_{q_i}, \quad w_{t_i} = tf \times W_{D_j T_i}, \quad w_{q_i} = tf \times 1 \qquad (1)$$

The second phase is the searching phase. For each query that the user issued in the past, the system selects similar documents that the user chose in the past. The system takes the documents whose similarity values (or expected values) are greater than a threshold. As before, we use a modified cosine similarity as the similarity measure,

$$Sim(D, Q) = \sum_{i=1}^{n} w_{d_i} \times w_{q_i}, \quad w_{d_i} = tf \times W_{T_j D_i}, \quad w_{q_i} = tf \times \frac{1}{N} \qquad (2)$$

where $N$ denotes the number of documents selected by the user.

3.4. Discussion

In engineering psychology, researchers represent the diagnostic stage of decision making as a process in which the decision maker is confronted by a series of cues or sources of information, as shown below, bearing on the true state of the world. The decision maker attends to some or all of these cues with the goal of using them to influence belief in one of several alternative hypotheses [6]. The human decision-making process described in engineering psychology is as follows:

A. Take hypotheses (true state of the world)
B. Extract physical features from cues based on the hypotheses
C. Apply weights (or probabilities) to the physical features based on experience
D. Choose data (information)

Therefore, using the past queries and the documents that the user clicked is useful and efficient in personalized information retrieval. Our proposed system reflects this process through the re-rank factor and the user profile architecture.

4. User Finding and Collaborative Filtering

Our system finds similar users and expert users in the domain to provide more personalized results. By using the collaborative filtering method, the system can handle the 'no results' situation and the 'unsatisfied' situation, in which all results have similarity values below the threshold. These situations imply that the user's interests have changed quickly and suddenly.

4.1. Similar Users

Equation (3) measures the distance between a term vector $T$ in the profile of one user and a term vector $T'$ in the profile of another user. A shorter distance means that the two users are more similar. Finding similar users makes it feasible to handle frequently and continuously changing user interests through collaborative filtering.

$$Dis(T, T') = \sum_{i=1}^{n} (t_i - t'_i)^2 \qquad (3)$$
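The distance in Equation (3) and the selection of similar users can be sketched as follows. The sketch assumes that each term vector is stored as a mapping from term to numeric weight, that a term absent from one profile counts as 0, and that the distance is the sum of squared component differences; the names `distance` and `most_similar_users` and the example users are hypothetical.

```python
from typing import Dict, List

TermVector = Dict[str, float]


def distance(t: TermVector, t_other: TermVector) -> float:
    """Sum of squared component differences over the union of both term sets."""
    terms = set(t) | set(t_other)
    return sum((t.get(term, 0.0) - t_other.get(term, 0.0)) ** 2 for term in terms)


def most_similar_users(current: TermVector,
                       others: Dict[str, TermVector],
                       k: int = 3) -> List[str]:
    """Return the ids of the k users with the smallest distance to `current`."""
    return sorted(others, key=lambda uid: distance(current, others[uid]))[:k]


# Hypothetical term vectors for the current user and three other users.
current = {"python": 2.0, "web": 1.0}
others = {
    "alice": {"python": 2.0, "web": 1.0, "django": 1.0},
    "bob": {"soccer": 3.0},
    "carol": {"python": 1.0},
}
neighbours = most_similar_users(current, others, k=2)
```

Results clicked by the returned neighbours could then serve as candidates when the current user's own profile yields no result above the threshold.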

4.2. Expert Users

An expert user is selected from among the other users whose query patterns are not similar to those of the current user but who hold much high-quality information for a given query. In this paper, 5 experts are found for each domain. The process of finding expert users runs offline.

5. Experiments

To evaluate the performance of the adaptive user profiling approach, we adopted a satisfaction degree measure that has been used in the machine learning and classification community. Five users participated in the experiments for 4 weeks. We measured the satisfaction with the adaptive user profiling using the following formula:

$$SAT = \frac{\text{Number of Retrieved Satisfied Documents}}{\text{Number of Retrieved Documents (10)}} \qquad (4)$$

Equation (4) measures the satisfaction degree as the ratio of the number of retrieved relevant documents to the top 10 retrieved documents. Each user issued at least 2 queries every day. Our system did not explicitly record the satisfaction for each result; instead, we gathered the satisfaction data from the users' implicit feedback by using the decision tree conditions of [12]. Figure 3 shows the satisfaction degree of each user for the top 10 search results during the 4 weeks of experiments. As shown in Figure 3, the number of relevant documents in the top 10 for a given query increases over time for most participants. This result indicates that the re-access ratio for the same documents increases for most participants. Conversely, a decrease in the number of satisfying documents over time implies a decrease in the re-access ratio; this happens occasionally, for example for 'user1' from the first to the second week. The ratio decreases when a user issues a query with new terms, finds new documents with each search, or re-accesses the same documents by using queries with different terms.
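The satisfaction degree of Equation (4) can be computed as in the following sketch. The set of satisfying documents is taken as given here, whereas the paper derives it from implicit feedback; the function name `satisfaction_at_k` and the example document ids are hypothetical.

```python
from typing import Iterable, Set


def satisfaction_at_k(retrieved: Iterable[str], satisfied: Set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents that the user found satisfying."""
    top_k = list(retrieved)[:k]
    hits = sum(1 for doc in top_k if doc in satisfied)
    return hits / k  # the denominator is fixed at k, as in Equation (4)


# Hypothetical result list and satisfaction judgments for one query.
retrieved = [f"d{i}" for i in range(1, 11)]        # d1 .. d10
satisfied = {"d1", "d3", "d4", "d8", "d42"}        # d42 was never retrieved
sat = satisfaction_at_k(retrieved, satisfied)
```

Averaging this value over a user's queries in a given week would give one point per user and week of the kind plotted in Figure 3.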

Figure 3. The satisfaction for top 10 search results

Figure 4 shows a comparison of the average number of relevant documents in the top 10 results obtained with the adaptive user profiling method against the Google search results for the same queries. As shown in this figure, the adaptive user profiling approach outperforms Google by 16.4%.

Figure 4. A comparison of the satisfaction for the top 10 search results

Figure 5 shows the reflection values of each user. A smaller reflection value implies that the user issued different queries with each search, or employed different queries to re-access the same documents. A larger value, on the other hand, implies that the user has a higher re-access ratio for the same documents and frequently uses the same query.

Figure 5. Reflection value of each user

6. Conclusion

We have proposed an adaptive user profiling method that uses a dynamic updating policy to account for changes in user preferences over time and across domains. Moreover, we used a collaborative filtering method to deal with the situation in which user preferences change frequently and continuously. Our proposed system reflects the human decision-making process described in engineering psychology through the re-rank factor and the user profile architecture. Our system returns improved, personalized search results for each user through automatic creation, maintenance, and personalization of user preference profiles that include each user's search patterns. To further improve the performance of personalized information retrieval, various kinds of analysis information about the web documents selected by users, such as a personal PageRank, need to be utilized.

7. References

[1] C. Shahabi and Y. Chen, “Automatically Improving the Accuracy of User Profiles with Genetic Algorithms”, In Proceedings of the Fourth Annual IASTED International Conference on Artificial Intelligence and Soft Computing (ASC2001), Cancun, Mexico, pp. 283-288, 2001.
[2] J. Sun, H. Zeng, H. Liu, Y. Lu, and Z. Chen, “CubeSVD: A novel approach to personalized web search”, In Proceedings of the Fourteenth International World Wide Web Conference, ACM Press, pp. 382-390, 2005.
[3] Z. Jrad, M. Aufaure, and M. Hadjouni, “A Contextual User Model for Web Personalization”, In Proceedings of the WISE Workshops 2007, pp. 350-360, 2007.
[4] H. Naderi and B. Rumpler, “PERCIRS: a PERsonalized Collaborative Information Retrieval System”, In Proceedings of the INFORSID, pp. 113-127, 2006.
[5] J. Wang, Z. Li, J. Yao, Z. Sun, M. Li, and W. Ma, “Adaptive User Profile Model and Collaborative Filtering for Personalized News”, In Proceedings of the APWeb 2006, pp. 474-485, 2006.
[6] C.D. Wickens and J.G. Hollands, “Engineering Psychology and Human Performance”, 3rd ed., Upper Saddle River, NJ: Prentice-Hall Inc., pp. 293-330, 2001.
[7] B. van Gils, H.A. Proper, P. van Bommel, and E.D. Schabell, “Profile-based retrieval on the World Wide Web”, In Proceedings of the Conferentie Informatiewetenschap (INFWET2003), pp. 91-98, 2003.
[8] M. Pazzani and D. Billsus, “Learning and revising user profiles: The identification of interesting web sites”, Machine Learning, vol. 27, pp. 313-331, 1997.
[9] A. Tan and C. Teo, “Learning user profiles for personalized information dissemination”, In Proceedings of the International Joint Conference on Neural Network, pp. 183-188, 1998.
[10] W. Lam, S. Mukhopadhyay, J. Mostafa, and M. Palakal, “Detection of shifts in user interests for personalized information filtering”, In Proceedings of the 19th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 317-325, 1996.
[11] A. Moukas, “Amalthea: Information discovery and filtering using a multiagent evolving ecosystem”, In Proceedings of the Conference on Practical Applications of Agents and Multiagent Technology (PAAM), vol. 11, pp. 437-457, 1996.
[12] S. Fox, K. Karnawat, M. Mydland, S. Dumais, and T. White, “Evaluating implicit measures to improve web search”, ACM Transactions on Information Systems, vol. 23, no. 2, pp. 147-168, 2005.
