A Multi-Agent Based Personalized Meta-Search Engine Using ...

2010 Third International Conference on Knowledge Discovery and Data Mining

A Multi-Agent Based Personalized Meta-Search Engine Using Automatic Fuzzy Concept Networks

Batool Arzanian; Fardin Akhlaghian; Parham Moradi
Department of Electrical Engineering, University of Kurdistan, Sanandaj, Iran
e-mail: [email protected]; [email protected]; [email protected]

Abstract- As a result of the rapid growth and dynamic content of the web, general-purpose web search engines are becoming inadequate. Although meta-search engines help by increasing search coverage of the web, the large number of irrelevant results they return still causes problems for users. Personalizing meta-search engines addresses this problem by filtering results with respect to the individual user's interests. In this paper, a multi-agent architecture is introduced for personalizing a meta-search engine using fuzzy concept networks. The main goal of this paper is to use automatic fuzzy concept networks to personalize the results of a meta-search engine built on a multi-agent architecture for fast searching and retrieval. Experimental results indicate that the personalized meta-search results of the system are more relevant than the combined results of the search engines.

Keywords- Search Engine, Meta-search Engine, Web Personalization, Multi-agent, Fuzzy Concept Networks, User Profiles.

I. INTRODUCTION

The continuous expansion of the web and its fully dynamic environment make developing and maintaining efficient web search tools a challenging problem. While the available search engines are used massively, they present a number of problems and limitations: partial coverage of the available web pages, erroneous answers (e.g., links to pages that are no longer available), and redundancy of results [1].

Meta-search engines provide unified access to multiple existing search engines. To increase search coverage of the web and improve the scalability of search, a meta-search engine sends the user query directly to several search engines rather than searching a locally built index as a search engine does. It then combines the search results and displays them to the user. Although the available meta-search engines partially solve the problems of partial coverage and of keeping indexes up to date, most of them lack a good mechanism for overcoming the other two problems of search engines, namely erroneous answers and redundancy of results [7]. A meta-search engine returns a large number of results, some of which are irrelevant to the user. In other words, users are overwhelmed by the large number of search results, and the results do not satisfy most users. Personalization of meta-search engines can therefore help users find information relevant to their needs. The personalization process extracts users' preferences implicitly and provides an immediate response by filtering and re-ranking the results with respect to the individual user's interests.

The information overload problem on the web implies the need to search for and quickly retrieve the appropriate information for users. Several studies show that using intelligent agents in an information retrieval system is an efficient way to solve this problem [3]. In addition, a multi-agent architecture for meta-search engines is easier to extend, maintain, and distribute [3].

In this paper, a multi-agent architecture is introduced for the personalization of meta-search engines using fuzzy concept networks. The main goal of this paper is to use automatic fuzzy concept networks to personalize the results of a meta-search engine built on a multi-agent architecture.

978-0-7695-3923-2/10 $26.00 © 2010 IEEE DOI 10.1109/WKDD.2010.95

II. RELATED WORKS

Several meta-search engines offer primitive personalization services. For example, Query Server, Profusion, Infogrid, Mamma, and Ixquick let the user choose the search engines to be used [4]. The Excalibur project [2] is a personalized meta-search engine that extracts users' preferences implicitly and re-ranks the results using a Naive Bayesian classifier and a resemblance measure. Radovanovic and Ivanovic [6] proposed a meta-search engine called CatS that uses text classification techniques to improve the presentation of search results and displays a tree of topics, derived from the dmoz Open Directory topic hierarchy, that the user can browse. Captain Nemo [5] is a fully functional meta-search engine with personalized hierarchical search spaces that retrieves and presents search results according to personalized retrieval models and presentation styles.

A few projects have also addressed the architecture of intelligent meta-search engines. WebNaut [7] is a multi-agent based meta-search engine that consists of a set of interconnected agents and uses a meta-genetic algorithm to learn the user's interests and personalize search results. Kanteev, Minakov, Rzevski, Skobelev, and Volman [8] proposed a multi-agent text understanding system that is based on semantic understanding of page content and generation of semantic descriptors, and that uses knowledge about the problem domain stored in the form of an ontology.

III. BACKGROUND

A fuzzy concept network used in information retrieval systems consists of nodes and directed links. Each node represents a concept $c_i \in C$ or a document $d_j \in D$; each directed link connects two concepts, or connects one concept $c_i$ to one document $d_j$, and is labeled with a real value between zero and one [9]. Figure 1 shows an example of a fuzzy concept network.

If $c_i \xrightarrow{\mu} c_j$, the degree of relevance from concept $c_i$ to concept $c_j$ is $\mu$; if $c_i \xrightarrow{\mu} d_j$, the degree of relevance of document $d_j$ with respect to concept $c_i$ is $\mu$, where $\mu \in [0,1]$. The expression $c_i \xrightarrow{\mu} c_j$ is written $f(c_i, c_j) = \mu$ and the expression $c_i \xrightarrow{\mu} d_j$ is written $g(c_i, d_j) = \mu$, where $f$ and $g$ are mapping functions.

In a concept network, a document has a different relevance value with respect to each concept. For each document $d_j$, the document descriptor $I_{d_j}$, based on the binary indexing relation $I$, is a fuzzy subset of $C$ defined as follows:

$$I = \{\mu_I(d,c), (d,c) \mid d \in D,\ c \in C\} \tag{1}$$

$$\mu_I : D \times C \to [0,1] \tag{2}$$

where $\mu_I$ is a membership function that indicates the degree of relevance of document $d$ with respect to concept $c$. The document descriptor matrix $H$ is therefore defined as follows [10]:

$$H = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1n} \\ h_{21} & h_{22} & \cdots & h_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ h_{m1} & h_{m2} & \cdots & h_{mn} \end{bmatrix}, \qquad h_{ij} = I_{d_j}(c_i),\ 1 \le i \le m,\ 1 \le j \le n \tag{3}$$

$C = \{c_1, c_2, \ldots, c_n\}$ is a set of concepts. A fuzzy concept matrix $K$ is a matrix with $K_{ij} \in [0,1]$, where the $(i,j)$ element of $K$ represents the degree of relevance from concept $c_i$ to concept $c_j$. $K^2 = K \otimes K$ is the multiplication of the concept matrix with itself [10]:

$$K^2_{ij} = \bigvee_{l=1}^{n} (K_{il} \wedge K_{lj}), \qquad 1 \le i, j \le n \tag{4}$$

where $\vee$ and $\wedge$ represent the max operation and the min operation, respectively. There exists an integer $\rho \le n-1$ such that $K^{\rho} = K^{\rho+1} = K^{\rho+2} = \cdots$. Let $K^* = K^{\rho}$; $K^*$ is called the transitive closure of the concept matrix $K$. Missing information in a fuzzy concept network can be inferred from its transitive closure [10]. The relevance degree of each document with respect to a specific concept can be improved by multiplying the document descriptor matrix $H$ by the transitive closure of the concept matrix $K$:

$$H^* = H \otimes K^* \tag{4}$$
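The max-min composition and the transitive closure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the rows of `h` are assumed to index documents and the columns concepts (so that the product with the concept matrix is defined), and the element-wise max with the previous iterate is an added safeguard so that the loop also terminates when `k` is not reflexive. For a reflexive `k` (ones on the diagonal) the result coincides with the stable power of the concept matrix.

```python
from typing import List

Matrix = List[List[float]]


def maxmin_product(a: Matrix, b: Matrix) -> Matrix:
    """Max-min composition: result[i][j] = max over l of min(a[i][l], b[l][j])."""
    inner = len(b)
    cols = len(b[0])
    return [
        [max(min(row[l], b[l][j]) for l in range(inner)) for j in range(cols)]
        for row in a
    ]


def transitive_closure(k: Matrix) -> Matrix:
    """Repeat the max-min squaring until the concept matrix stops changing."""
    closure = [row[:] for row in k]
    while True:
        squared = maxmin_product(closure, closure)
        # Keep the larger of the old and new degree for every pair of concepts.
        merged = [
            [max(x, y) for x, y in zip(old_row, new_row)]
            for old_row, new_row in zip(closure, squared)
        ]
        if merged == closure:
            return closure
        closure = merged


def expand_descriptors(h: Matrix, k: Matrix) -> Matrix:
    """Expanded document descriptor matrix: h composed with the closure of k."""
    return maxmin_product(h, transitive_closure(k))


# Hypothetical example: 3 concepts, 2 documents.
K = [
    [1.0, 0.8, 0.0],
    [0.0, 1.0, 0.5],
    [0.0, 0.0, 1.0],
]
H = [
    [0.9, 0.0, 0.0],  # document 0 mentions only concept 0
    [0.0, 0.0, 0.7],  # document 1 mentions only concept 2
]
print(transitive_closure(K))     # concept 0 now reaches concept 2 with degree 0.5
print(expand_descriptors(H, K))  # document 0 gains relevance to concepts 1 and 2
```

In this example the closure infers the missing link from concept 0 to concept 2 through concept 1, which is the kind of inferred information the text refers to.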

$H^*$ is called the expanded document descriptor matrix [10].

IV. THE PROPOSED PERSONALIZATION METHOD

We use automatic fuzzy concept networks to personalize meta-search results. The proposed method consists of two stages, described below.

In the first stage, the user profiles, which include the web pages the user has already visited, are collected. After the profiles are preprocessed, a fuzzy concept network is generated for them automatically according to a predefined concepts vector. This fuzzy concept network represents the degree of relevance between concepts with respect to the user's profiles. The degree of relevance between concepts $c_i$ and $c_j$ is computed by the following formula:

$$\mu(c_i, c_j) = \frac{f_{c_i} + f_{c_j} - |f_{c_i} - f_{c_j}|}{N} \tag{5}$$

Here $f_{c_i}$ is the number of times concept $c_i$ appears in the user's profiles, $f_{c_j}$ is the number of times concept $c_j$ appears in them, and $N$ is the total number of concepts in the profiles. $\mu(c_i, c_j)$ is the ratio of the number of joint appearances of $c_i$ and $c_j$ in the user profiles, that is, the degree of relevance between each pair of concepts with respect to the profiles.

In the next stage, the user's query is submitted to the search engine, which returns a list of results containing the retrieved web pages. A fuzzy concept network, called the document descriptor matrix, is also generated for the search engine results. This matrix represents the degree of relevance between the concepts and the retrieved documents and is calculated using tf-idf (term frequency-inverse document frequency) for each concept-document pair as follows:

$$\text{tf-idf}(c_i, d_j) = \frac{f_{c_i d_j}}{\sum_{d_j} f_{c_i d_j}} \tag{6}$$

Figure 1. Fuzzy concept network

Here, the numerator is the number of times concept $c_i$ appears in document $d_j$, and the denominator is the number of times concept $c_i$ appears across all retrieved documents. tf-idf($c_i$, $d_j$) thus indicates the degree of relevance between each concept and each document.

The ranking algorithm then computes the transitive closure of the fuzzy concept network built from the user's profiles and multiplies it by the fuzzy concept network obtained from the retrieved documents. The ranking algorithm also sums the elements in each row of the resulting matrix to obtain the ranking of each document and reorders the documents based on their rankings.

Figure 2. The proposed architecture (user agent with the user's profile; search agents: Google, Yahoo, Msn, and Ask agents, connected to the WWW; personalization agents: FCN1, FCN2, and ranking agents)

V. THE PROPOSED ARCHITECTURE

In this paper, a multi-agent based personalized meta-search engine is introduced that uses automatic fuzzy concept networks to personalize the meta-search results. The proposed architecture for this personalized meta-search engine is depicted in Figure 2. The implemented system consists of a user agent, a search agents group, and a personalization agents group. The search agents group includes four agents, namely the Google agent, Yahoo agent, Ask agent, and Msn agent, each of which is responsible for communication with a specific search engine. The personalization agents group includes three agents, namely the FCN1 agent, FCN2 agent, and ranking agent. The agents communicate through asynchronous message passing. The key responsibility of each agent is described below. We used the conventional two-level architecture [11] for the meta-search stage and a two-level architecture for the personalization stage, which gives the four-level architecture depicted in Figure 3.

VI. EXPERIMENTAL RESULTS

We have used this multi-agent architecture to develop a personalized meta-search engine. We used JADE (Java Agent DEvelopment Framework) to build the multi-agent framework of our meta-search engine. This framework, which is written entirely in Java, facilitates the development of complete agent-based applications [13]. To test and evaluate the system, some users' profiles were collected and a concepts vector of length 100, containing concepts from the field of computing, was defined. The URLs appearing on the first page of each search engine were combined by the Borda rule, and the fuzzy concept network was generated for the top 5 URLs in the Borda list. We used the evaluation measure $d$ to compare the ranking produced by the proposed system with the ranking obtained by applying the Borda rule to the meta-search results. The average difference between the rankings is defined as follows:

$$d = \frac{1}{m} \sum_{i=1}^{m} |r_i - r'_i| \tag{7}$$

where $m$ is the number of web pages, $r_i$ is the ranking given by the user, and $r'_i$ is the ranking produced by the proposed system or by the Borda rule.
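The paper does not spell out how the Borda rule scores each URL, so the following is only a sketch of one common Borda count variant, given as an assumption rather than as the authors' procedure. The function name `borda_merge`, the scoring (a URL at the top of a list of n results earns n points, the last earns 1, and an unlisted URL earns 0), and the tie-breaking rule are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List


def borda_merge(result_lists: List[List[str]], top_k: int = 5) -> List[str]:
    """Merge ranked URL lists with a Borda count and return the top_k URLs."""
    scores: Dict[str, int] = defaultdict(int)
    first_seen: Dict[str, int] = {}
    for results in result_lists:
        n = len(results)
        for position, url in enumerate(results):
            # First place in a list of n results earns n points, last place earns 1.
            scores[url] += n - position
            first_seen.setdefault(url, len(first_seen))
    # Higher score first; ties are broken by order of first appearance (arbitrary).
    ranked = sorted(scores, key=lambda url: (-scores[url], first_seen[url]))
    return ranked[:top_k]


# Hypothetical first-page results from three engines (placeholder URLs).
google = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
yahoo = ["http://example.com/b", "http://example.com/a", "http://example.com/d"]
ask = ["http://example.com/b", "http://example.com/c", "http://example.com/a"]
print(borda_merge([google, yahoo, ask], top_k=3))
```

Here URL b collects 8 points, a collects 6, c collects 3, and d collects 1, so the merged top three are b, a, c.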

Figure 3. Four-level architecture of the proposed meta-search engine

Given the above, the key responsibilities of the agents are as follows. The user agent communicates with the user and obtains the user's query, the predefined concepts vector, and the user's profile. After preprocessing the user's profile, the user agent sends it to the FCN1 agent. The user agent also sends the user's query to the search agents and the predefined concepts vector to the FCN1 and FCN2 agents. Each search agent dispatches the user query to its search engine and sends the search results to the FCN2 agent. The FCN1 agent automatically generates a fuzzy concept network for the user's profile and sends it to the ranking agent. The FCN2 agent merges the results of the search agents using the Borda rule [12], automatically generates a fuzzy concept network for the combined result list, and sends this network to the ranking agent. The ranking agent applies the ranking algorithm described earlier to the two fuzzy concept networks and then sends the personalized ranking of the meta-search results to the user agent. The communications between agents are summarized in Figure 4.
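To make the work of the three personalization agents concrete, the sketch below computes the profile network of equation (5), the document descriptor matrix of equation (6), and the row-sum ranking in plain Python. It is a simplified illustration under several assumptions that are not from the paper: profiles and documents are taken to be pre-tokenized word lists, $N$ is read as the total number of concept occurrences in the profile, the numerator of equation (5) is read with an absolute difference, and all function names are hypothetical. The agent messaging itself (JADE in the paper) is not modeled.

```python
from typing import List

Matrix = List[List[float]]


def concept_network(profile_tokens: List[str], concepts: List[str]) -> Matrix:
    """FCN1 step: concept-to-concept relevance following equation (5)."""
    freq = [profile_tokens.count(c) for c in concepts]
    total = sum(freq) or 1  # N, assumed to be the total count of concept occurrences
    return [[(fi + fj - abs(fi - fj)) / total for fj in freq] for fi in freq]


def descriptor_matrix(docs: List[List[str]], concepts: List[str]) -> Matrix:
    """FCN2 step: document-to-concept relevance following equation (6)."""
    counts = [[doc.count(c) for c in concepts] for doc in docs]
    totals = [sum(col) or 1 for col in zip(*counts)]  # occurrences over all documents
    return [[n / t for n, t in zip(row, totals)] for row in counts]


def maxmin(a: Matrix, b: Matrix) -> Matrix:
    """Max-min composition of two fuzzy matrices."""
    return [[max(min(x, y) for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank_documents(h: Matrix, k: Matrix) -> List[int]:
    """Ranking step: closure of k, max-min product with h, then row sums."""
    closure = k
    while True:
        squared = maxmin(closure, closure)
        merged = [[max(x, y) for x, y in zip(r, s)] for r, s in zip(closure, squared)]
        if merged == closure:
            break
        closure = merged
    scores = [sum(row) for row in maxmin(h, closure)]
    # Document indices, highest score first; ties keep their original order.
    return sorted(range(len(h)), key=lambda d: -scores[d])


# Hypothetical data: a profile dominated by "agent" and three retrieved documents.
concepts = ["agent", "fuzzy", "search"]
profile = ["agent", "search", "agent", "fuzzy", "web"]
docs = [
    ["search", "engine", "search"],
    ["agent", "fuzzy", "agent"],
    ["fuzzy", "logic"],
]
k = concept_network(profile, concepts)
h = descriptor_matrix(docs, concepts)
print(rank_documents(h, k))  # [1, 0, 2]: the document about agents is ranked first
```

In a multi-agent setting, `concept_network` would correspond to the FCN1 agent, `descriptor_matrix` to the FCN2 agent (after merging the search results), and `rank_documents` to the ranking agent.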

Figure 4. Communications between agents

TABLE I. TOP 5 URLS APPEARING IN THE BORDA LIST (QUERY = "oop")

1  http://en.wikipedia.org/wiki/Object-oriented_programming
2  http://www.oopstuff.com/
3  http://www.acronymfinder.com/OOP.html
4  http://www.geocities.com/tablizer/oopbad.htm
5  http://www.tonymarston.net/php-mysql/what-is-oop.html

TABLE II. RANKING OF SIX USERS

User1  User2  User3  User4  User5  User6
5      4      4      1      5      5
3      1      1      3      4      1
2      5      3      4      3      2
1      3      2      5      2      3
4      2      5      2      1      4

TABLE III. PERSONALIZED RANKING

User1  User2  User3  User4  User5  User6
5      4      4      1      5      4
3      3      2      3      4      1
2      1      3      4      2      5
4      5      1      5      3      2
1      2      5      2      1      3

TABLE IV. EVALUATING MEASURE D

                     User1  User2  User3  User4  User5  User6  Average
The proposed method  0.4    0.8    0.8    0      0.4    1.6    0.67
Borda rule           2      2      1.2    1.2    2.2    1.6    1.7

The results are presented below. Table 1 shows the top 5 URLs in the Borda list, which each user evaluated. Table 2 shows the rankings given by the six users. Table 3 shows the personalized ranking of the search results for the six users; a shaded box indicates that the personalized rank is equal to the user's rank. Table 4 shows the evaluation measure d for the proposed system and for the Borda rule, which indicates an improvement of about 61% for the proposed system over the Borda rule.
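As a worked check of equation (7), the sketch below recomputes d for the proposed method from the values reported in Tables II and III. It rests on one interpretive assumption that the paper does not state: each user's column is read as a preference order, that is, a list of URL numbers from Table I with the most preferred first, and the ranks $r_i$ and $r'_i$ are the positions of each URL in those orders. This reading is used here because it reproduces the per-user values reported for the proposed method; the function names are hypothetical.

```python
from typing import List


def ranks_from_order(order: List[int]) -> List[int]:
    """Convert a preference order (URL numbers, best first) into a rank per URL."""
    ranks = [0] * len(order)
    for position, url in enumerate(order, start=1):
        ranks[url - 1] = position
    return ranks


def average_rank_difference(user_order: List[int], system_order: List[int]) -> float:
    """Measure d of equation (7): mean absolute difference between two rankings."""
    r = ranks_from_order(user_order)
    r_prime = ranks_from_order(system_order)
    return sum(abs(a - b) for a, b in zip(r, r_prime)) / len(r)


# Columns of Table II (user rankings), User1 to User6.
user_orders = [
    [5, 3, 2, 1, 4], [4, 1, 5, 3, 2], [4, 1, 3, 2, 5],
    [1, 3, 4, 5, 2], [5, 4, 3, 2, 1], [5, 1, 2, 3, 4],
]
# Columns of Table III (personalized rankings), User1 to User6.
personalized_orders = [
    [5, 3, 2, 4, 1], [4, 3, 1, 5, 2], [4, 2, 3, 1, 5],
    [1, 3, 4, 5, 2], [5, 4, 2, 3, 1], [4, 1, 5, 2, 3],
]

d_values = [
    average_rank_difference(u, p) for u, p in zip(user_orders, personalized_orders)
]
print(d_values)                                   # per-user d for the proposed method
print(round(sum(d_values) / len(d_values), 2))    # average d

# Relative improvement computed from the two reported averages in Table IV.
print(round((1.7 - 0.67) / 1.7 * 100))            # about 61 percent
```

For User1, for example, the two orders differ only in the positions of URLs 1 and 4, each by one place, which gives d = 2/5 = 0.4.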

VII. CONCLUSIONS

The proposed meta-search engine uses automatic fuzzy concept networks to personalize the meta-search results. Meta-search engines can overcome some of the limitations of search engines and directories. The multi-agent architecture used to implement the meta-search engine makes it easy to extend, maintain, and distribute. The system reorders the retrieved meta-search results with respect to the user's interests. Enriching the user profiles with an ontology before the fuzzy concept network is generated automatically may help to obtain better results.

REFERENCES

[1] E. Bolognesi and A. Brogi, "A prolog meta-search engine for the world wide web," Proc. Inter. Workshop on Distributed and Internet Programming with Logic and Constraint Languages (DIPLCL99), Las Cruces (USA), Nov. 1999.
[2] L. Yuen, M. Chang, Y.K. Lai, and C.K. Poon, "Excalibur: a personalized meta-search engine," Proc. 28th Annual International Computer Software and Applications Conference (COMPSAC'04), 2004, vol. 2, pp. 49-50.
[3] A.H. Keyhanipour, M. Piroozmand, B. Moshiri, and C. Lucas, "A multilayer/multi-agent architecture for meta-search engines," Proc. ICGST International Conf. on Artificial Intelligence and Machine Learning (AIML-05), Cairo, Egypt, 2005.
[4] S. Souldatos, T. Dalamagas, and T. Sellis, "Sailing the web with Captain Nemo: a personalized meta-search engine," Proc. the ICML 2005 Workshop: Learning in Web Search (LWS'05), Bonn, Germany, Aug. 2005.
[5] S. Souldatos, T. Dalamagas, and T. Sellis, "Captain Nemo: A metasearch engine with personalized hierarchical search space," INFORMATICA -LJUBLJANA-, vol. 30, 2006, pp. 173-182.
[6] M. Radovanovic and M. Ivanovic, "CatS: A classification-powered meta-search engine," Advances in Web Intelligence and Data Mining, Springer-Verlag, vol. 23, 2006, pp. 191-200.
[7] N. Zacharis and T. Panayiotopoulos, "SpiderServer: the meta-search engine of WebNaut," Proc. the 2nd Hellenic Conf. on Artificial Intelligence, 2002, pp. 475-86.
[8] M. Kanteev, I. Minakov, G. Rzevski, P. Skobelev, and S. Volman, "Multi-agent meta-search engine based on domain ontology," Springer, vol. 4476, July 2007, pp. 269-274, doi: 10.1007/978-3-540-72839-9_22.
[9] S. M. Chen and Y. J. Horng, "Fuzzy query processing for document retrieval based on extended fuzzy concept networks," IEEE Trans. Syst. Man Cybern., vol. 29, no. 1, 1999, pp. 96-104.
[10] K. J. Kim and S. B. Cho, "A personalized web search engine using fuzzy concept network with link structure," Proc. Joint 9th IFSA World Cong. and 20th NAFIPS Internat. Conf., vol. 1, Vancouver, Canada, 2001, pp. 81-86.
[11] W. Meng and K.L. Lui, "Building efficient and effective meta-search engines," ACM Computing Surveys, vol. 34, no. 1, March 2002, pp. 48-84.
[12] J. C. Heckelman, "Probabilistic Borda rule voting," Social Choice and Welfare, vol. 21, no. 3, 2003, pp. 455-468.
[13] F. L. Bellifemine, G. Caire, and D. Greenwood, "Developing multi-agent systems with JADE," 2007.
