The 17th National Computer Conference (Informatics in the Service of the Guests of the Merciful), King Abdulaziz University, Al-Madinah Al-Munawwarah (Safar 1425 AH / April 2004)

Agent Based Information Retrieval System

F. Eissa, H. Alghamdi
King Abdulaziz University, Jeddah, Saudi Arabia

ABSTRACT. The quantity of presentations on the Internet is constantly increasing, which raises the problem of searching for and quickly retrieving the appropriate information. Several studies show that using intelligent agents in an information retrieval system is an efficient way to solve this problem. Different proposed systems use different types of agents: some agents are used to understand the user's requirements, others to increase performance. In this paper we propose a multi-agent based information retrieval system. The system can be used to search for long term interests as well as short term interests. We take advantage of several techniques borrowed from the software engineering, artificial intelligence and information retrieval fields. The proposed system creates a user profile to capture the user's long term interest. The profile is optimized with a genetic algorithm and adapted through relevance feedback. The user queries and the retrieved documents are analyzed with natural language processing techniques. We show how our proposed system increases both network and information retrieval performance.

1. Introduction

The amount of information available via the WWW has increased and continues to increase, so the need for effective searching tools is growing. Search engines are such tools. However, these search engines are not as sophisticated as one might expect. Although they are powerful and efficient at locating matching terms and phrases, they are currently dumb, passive systems that require resourceful, active, intelligent human users to produce acceptable results. Generally speaking, there are two ambiguities in Internet information retrieval with a search engine. The first ambiguity concerns information sources: a standard search engine hits a page just when the page includes the specified keywords. The second ambiguity concerns the user's queries: the user may not know how to formulate the query, since different search engines have different interfaces.

For some years now, agents have been proposed as a way of enhancing the search services on the WWW. Negroponte and Kay [1] were among the first to recognize the value of agents, and a number of researchers have explored the use of agents for information retrieval. But first, let us explain what an agent is. There are several definitions of agents. One can also describe, rather than define, agents in terms of their task, autonomy, and communication capabilities. Some of the major definitions and descriptions of agents are:

"An agent is a software thing that knows how to do things that you could probably do yourself if you had the time."[2]

"A piece of software which performs a given task using information gleaned from its environment to act in a suitable manner so as to complete the task successfully. The software should be able to adapt itself based on changes occurring in its environment, so that a change in circumstances will still yield the intended result."[2]

"An agent is one who is authorized to act for or in the place of another" (Merriam Webster's Dictionary).


"An agent is a software program that can perform specific tasks for a user and possesses a degree of intelligence that permits it to perform parts of its tasks autonomously and to interact with its environment in a useful manner".[3]

The last definition leads us to the three categories of agents: information, cooperation and transaction agents. We describe the information agent because it is the agent used to enhance the search service. An information agent supports its user in the search for information in distributed systems or networks. [3] The information agent should be capable of the following tasks: locate information sources; extract information from the sources; filter the information of relevance to the user from the total quantity of found information using the user's interest profile; and prepare and present the results in an appropriate form.

An agent-based search engine is an example of an information agent that can be used to overcome the problems associated with the simple search engine. Using agents when searching the WWW has certain advantages over current methods, such as using a simple search engine. For example, an information search is based on one or more keywords given by a user. This requires that the user be capable of formulating the right set of keywords to retrieve the wanted information; otherwise he/she may get stuck with a large number of references. Agents, on the other hand, are capable of searching for information more intelligently, for instance because tools (such as a thesaurus) enable them to search on related terms as well, or even on concepts. Agents will also use these tools to fine-tune, or even correct, user queries (on the basis of a user model or other user information).

2. Related Works

The application of intelligent agents to the design of information retrieval systems has drawn some attention in recent years, and their benefits have been demonstrated in several applications. The following are some examples.

WebSifter II: Larry K. and Wooju K. have proposed a semantic based personalized meta-search agent approach to achieve two goals: allowing users more expressive power in formulating their web searches, and improving the relevancy of search results based on the user's real intent. They designed and are presently implementing a meta-search agent system called WebSifter II that cooperates with WordNet for concept retrieval and with most well known search engines for web page retrieval.[4]

SAIRE (A Scalable Agent-Based Information Retrieval Engine), developed by Odubiyi et al. (1997), is a multi-agent search engine that uses agent technology, natural language understanding, and conceptual search techniques to support public access to Earth and Space Science Data over the Internet. SAIRE provides a Web based, integrated user interface to distributed data sources maintained by NASA and NOAA. Two dictionaries, a main dictionary and a personal dictionary, are used to specify a user's domain of interest. The main dictionary contains words with semantic meaning related to SAIRE's specific domains, while the personal dictionary contains words that might have multiple meanings in the domain, as well as new words defined by users. Thus, SAIRE can learn new words by interacting with users to define unknown words as well as to clarify words with multiple meanings. [5]

Amalthaea (Moukas & Zacharia 1997) is a multi-agent system for personalized filtering, discovery and monitoring of information sites. Its main goal is to assist users in finding interesting information on the Web. There are two kinds of agents in Amalthaea: filtering agents that model and monitor the interests of the user, and discovery agents that model the information sources. Both the user's interest and the documents retrieved from Web sites are represented by weighted keyword vectors. The information agents pick one document from the downloaded set passed by the discovery agents and calculate a confidence level that the specific document will satisfy the user's needs. The confidence measure is no different from the typical normalized similarity measure (cosine) used in the vector space model in information retrieval. A particular feature of Amalthaea is that it provides a market-like ecosystem in which agents evolve, collaborate and compete to survive. Agents that are valuable (useful) to the user and to other agents are allowed to reproduce, while low-performing agents are destroyed to save system resources. Though its multi-agent architecture was not explicitly specified, Amalthaea appears to be a kind of specification sharing system. [5]

Multi-Agent Based Intelligent WWW Interfacer: Yasuhiko Kitamura has proposed a multi-agent based intelligent WWW interfacer to deal with the ambiguities of information sources and of the user's query; he developed keyword spice and multi-character interface technologies and showed their potential. [6]

Agent Communication Inside an Internet Search Engine: Maitel. Francisco M. and others developed a new information retrieval approach using autonomous software agents. In this system, an agent corresponds to each web page in an existing search engine's results. These agents communicate with each other to reach results that are more consistent and complete, so the proposed system automatically provides a view of opinion in the Internet community. This mechanism supports searchers who are unfamiliar with a topic, without misdirection from the ill-structured web. [7]

Using an Intelligent Agent to Enhance Search Engine Performance: James Jansen has developed an autonomous, intelligent agent that uses a user preference algorithm based on short-term user preference. Existing information filtering agents develop profiles of user preferences over an extended time frame. For this project, the agent would immediately begin to make decisions based on the information available, regardless of the quantity. He also proposed a technique for evaluating the performance of an agent system compared to a non-agent system. [1]

YourAmigo: YourAmigo Pty Ltd developed the YourAmigo Enterprise Search Engine, an agent based search engine that uses a patent-pending architecture and technology, which differs from the basis of most web search engines particularly in its ability to index dynamic and recently updated web pages.[8]

3. Current Work

We propose a multi-agent based information retrieval system. The system can be used to search for long term interests or short term interests. We take advantage of several techniques borrowed from the software engineering, artificial intelligence and information retrieval fields. Our objective is to enhance the search process and help the searcher find what he requires as quickly and accurately as possible. First we discuss the design goals that led us to use certain techniques. Then we present a background of these techniques. After that we present our proposed system design model. The detailed specification is then presented, followed by the conclusion.

4. Design Goals

In our proposed information retrieval system we would like to achieve the following goals:

1- Increase Information Retrieval System Performance: Given a set of documents and a person who has an interest in the information in some of them, one can define the optimal information retrieval system as one that finds all of the relevant and none of the irrelevant documents. The documents that contain information of interest are relevant; the other documents are not. In light of this, there are two standard measures [1] of the performance of information retrieval systems:

Recall: relevant documents retrieved / total number of relevant documents
Precision: relevant documents retrieved / total number of retrieved documents

For an information retrieval system to be effective, it should increase both the recall and the precision of the retrieved documents. It has been shown [9] that users could miss 77% of the references they would find most relevant by relying on a single search engine (low recall). So, in order to achieve high recall, meta-search engines were designed. They query several simple search engines in parallel and provide the user with the results of the search query in a compressed and improved form compared with a simple search engine. Because meta-search engines do not use their own databases to store the information, they access and process the stored data of the simple search engines. Representatives are SavvySearch and MetaCrawler[3]. However, the search results of meta-search engines are usually not relevant to the query because of a problem with keyword matching. Therefore, it is important to increase precision in the retrieved documents. In our opinion, the effective ways to increase precision are to use a matching method more powerful than keyword matching and to apply natural language processing (NLP) to both the user's query and the retrieved documents. Applying NLP techniques enables us to extract meaningful phrases from the user's query, which makes the system more intelligent. We apply these methods to the retrieved documents as well. Finally, instead of simple keyword matching we propose to use the vector space model, a well-known model in the information retrieval field, since it provides a more effective way of matching.

2- Decreasing Communication Cost: The information retrieval task can be performed by either a stationary or a mobile agent [3], and there are fundamental differences in the resulting method of operation. The stationary agent generates a high network load to perform its task. In addition, it requires a continuous network connection. The mobile agent, in contrast, can achieve the same goal with a very low network load. Once the mobile agent arrives at its destination, the connection of its user to the network can be terminated, so it does not require a continuous network connection. Only after the mobile agent finishes its task does the connection need to be reestablished, and then only for the relatively short time needed to transfer the agent. Thus, the mobile agent can be used to develop a powerful and dynamic distributed system instead of a static client-server architecture. However, with the use of mobile agent technology the resource demands on the associated server systems increase, because these must execute and support the mobile agent technology.

3- Understanding User's long term interest: This understanding can be achieved by creating the user's profile from the user's examples, so that the profile consists of the subjects most interesting to the user. The user profile is not static. Creating it is not the end of the process but its beginning. We need to optimize the profile so that it reflects the user's interest exactly. One of the effective optimization algorithms is the genetic algorithm (GA). In our proposed system a GA is used to optimize the user's profile. We also need to adapt the profile according to the user's feedback. For this we apply the relevance feedback process borrowed from the information retrieval field.
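The recall and precision measures defined under the first design goal can be illustrated with a short sketch. The function name and the document identifiers below are hypothetical and chosen only for illustration; the two ratios themselves follow the definitions given in the text.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    retrieved: identifiers of the documents the system returned.
    relevant:  identifiers of all documents that are actually relevant.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
retrieved_docs = ["d1", "d2", "d3", "d7"]
relevant_docs = ["d1", "d2", "d3", "d4", "d5", "d6"]
recall, precision = recall_and_precision(retrieved_docs, relevant_docs)
print(recall, precision)  # 0.5 0.75
```

In this example a single engine finds half of the relevant documents (low recall), which is the situation that querying several engines is meant to improve.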

With these goals in mind, we propose a multi-agent based information retrieval system. We suggest using a mobile agent in our system to take advantage of its low network load. We assume that a suitable infrastructure exists, i.e. the needed server has a mobile-agent system such as Concordia or Aglets installed. We also suggest using a natural language processing based agent that extracts meaningful phrases from the user's examples (profile) or the user's natural queries and from the retrieved documents. The vector space model is used as the matching method between queries and documents. Thus, we can achieve high precision in the results. To achieve high recall we suggest using a meta-search engine as a component of our proposed system. In the next sections we give a background on the architecture of the typical information retrieval system and on the genetic algorithm. Then we present the architecture of our proposed system. Each of its components is then explained, with interaction diagrams used for the explanation. We conclude with the results and conclusion.

5. Background

5.1 Information Retrieval System

A document based IR system typically consists of three main subsystems: document representation, representation of users' requirements (queries), and the algorithms used to match user requirements (queries) with document representations. [10] The basic architecture is shown in figure 1.

[Figure 1 labels: Users, Queries, Matching algorithm, Documents, Documents Representation, Retrieved Documents, Relevance Feedback]

Fig. (1) : Basic architecture of the information retrieval system

Document contents are transformed into a document representation (either manually or automatically). Document representations are built so that matching them with queries is easy, while still correctly reflecting the author's intention. Queries transform the user's need into a form that correctly represents the user's underlying information requirement and is suitable for the matching process. Query formatting depends on the underlying model of retrieval used (Boolean models, vector space models, probabilistic models, etc.).[6] In the vector space model, a document is represented by a vector of its unique words, the terms vector (t), along with their frequencies (f). A weights vector (w) can be calculated from the frequencies of the terms (Figure 2).

Fig. (2) : Vector space model
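As a rough illustration of the terms vector (t), frequencies (f) and weights vector (w) of Figure 2, the sketch below builds all three for a small, made-up collection. It assumes the common tf*idf weighting with idf = log(N / df), where N is the number of documents and df is the number of documents containing the term; the paper does not state which tf*idf variant it uses, and the function name `vector_space` is hypothetical.

```python
import math
from collections import Counter


def vector_space(documents):
    """Build (terms, frequencies, weights) for each tokenized document.

    documents: list of token lists.
    ASSUMPTION: weight = tf * log(N / df); other tf*idf variants exist.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in documents:
        freqs = Counter(tokens)
        terms = sorted(freqs)                       # t: unique terms
        f = [freqs[t] for t in terms]               # f: term frequencies
        w = [freqs[t] * math.log(n_docs / doc_freq[t]) for t in terms]  # w
        vectors.append((terms, f, w))
    return vectors


# Made-up collection of three already tokenized documents.
docs = [
    ["agent", "search", "agent"],
    ["search", "engine"],
    ["genetic", "algorithm"],
]
terms, f, w = vector_space(docs)[0]
print(terms, f, w)
```

With this variant, a term that occurs in every document receives weight zero, which matches the intuition that such a term does not help to distinguish documents.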

Information-retrieval systems typically calculate weights from the so-called tf*idf. [11] A matching algorithm matches a user's requests (in terms of queries) with the document representations and retrieves the documents that are most likely to be relevant to the user. The user rates the documents presented as either relevant or non-relevant to his/her information need. This relevance feedback is typically used by the system to improve document descriptions or queries, with the expectation that the overall performance of the system will improve after such feedback.

5.2 Genetic Algorithm

The basic principles of GAs were first laid down rigorously by Holland [4]. Genetic algorithms (GAs) are adaptive methods which may be used to solve search and optimization problems. They are based on the genetic processes of biological organisms. Over many generations, natural populations evolve according to the principles of natural selection and survival of the fittest. Genetic algorithms are able to "evolve" solutions to real world problems, provided the problems have been suitably encoded. The analogy in information filtering makes use of the vector space model to represent the documents. The genetic algorithm represents a gene as a term, an individual as a document and the community as a profile. After the terms of two parent documents are recombined, an objective function is used as the survival process to decide whether or not to keep the two generated child documents in the profile.

6. System Architecture

Figure 3 shows the architecture of our proposed system. The design is an example of a multi-agent based system, in which multiple agents cooperate in order to perform a specific task.

[Figure 3 labels: User Machine (Interface Agent, Main Agent, NLP Agent, GA Agent, Profile, Mobile Agent); Meta Search Engine (Search Agent, Query Agent, Search Engines URL's); Simple Search Engines 1 to n (DB Agent, DB)]

Fig. (3) : Proposed system architecture

6.1 Process Overview

Our system consists of the following agents:

• Interface agent: This agent provides a graphical interface that links the user with the other agents. The major tasks of the interface agent are to:
  o Accept a natural language query from the user or, in order to build a profile, accept the user's examples.
  o Accept relevance feedback from the user when the system has proposed documents.
  o Accept an update operation, since sometimes the user only needs to update the search process rather than search for a new subject.
  o Cooperate with the main agent and send it the accepted information.
  o Cooperate with the main agent in order to receive results.
  o Display the results to the user.

• Main agent: This agent can be considered the administration and cooperation agent. The major tasks it performs are:
  o Send the query or examples to the NLP agent.
  o Use the vectors from the profile (when searching for a long term interest) or the vector returned by the NLP agent (when searching for a short term interest) to form a query.
  o Generate a mobile agent carrying the generated query.
  o Forward the results to the interface agent whenever the mobile agent returns with them.
  o Modify the user profile, which it accesses directly, according to the user feedback.
  o Send any document that the user considers relevant to the NLP agent, so that it is added to the profile after natural language processing.

• NLP agent: This is an intelligent agent. It applies NLP techniques to the user's query, the user's examples or relevant documents. The major tasks it performs are:
  o Extract meaningful phrases from the user's query, the user's examples or relevant documents.
  o Create vectors for the user's query, then send them back to the main agent.
  o Create vectors for the user's examples or relevant documents, then add them to the profile.

• GA agent: This agent applies the genetic algorithm to the user's profile in order to optimize it. Its execution consists of two phases: a passive phase and an active phase. Before the search process begins, the profile is optimized in the passive phase. Once the search process has begun, and the system has proposed documents to the user and accepted the user's relevance feedback, the profile is optimized again in the active phase.

• Mobile agent: This agent conveys the query vector or the profile's vectors to a meta-search engine. Its itinerary is set by the main agent.

• Search agent: This is a stationary agent that resides in a meta-search engine server. The major tasks it performs are:
  o Cooperate with the arrived mobile agent to receive its data.
  o Formulate queries specialized for the simple search engines.
  o Spawn a number of query agents (mobile agents) which hold the formulated query, the query's or profile's vectors, the document-to-vector conversion code and the matching code.
  o Send the query mobile agents to a number of specified simple search engines. The URLs of these simple search engines are stored in the URL's database in the meta-search engine.
  o Eliminate duplicates in the results; because each query agent travels to a single search engine, the combined results may contain duplicates.
  o Pass the results to the mobile agent after the duplicates are eliminated.

• Query agent: This is a mobile agent which visits a single simple search engine. It conveys the formulated query or the profile's vectors, the document-to-vector conversion code and the matching code. Its major tasks are:
  o Cooperate with the DB agent in the simple search engine in order to search for the required information using the formulated query.
  o Convert the retrieved documents to vectors.
  o Execute the matching code on the vectors.
  o Return the relevant documents to the meta-search engine.

• DB agent: This is a stationary agent which resides in the simple search engine. It accesses the simple search engine's database directly. Its main function is to search for the information needed as the formulated query specifies.
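The duplicate elimination performed by the search agent can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say how duplicates are detected or ordered, so the sketch assumes that each query agent returns (URL, matching score) pairs, that two results are duplicates when their URLs are equal, and that the best score is kept. The function name and the example.org URLs are hypothetical.

```python
def merge_results(results_per_engine):
    """Merge the result lists returned by several query agents.

    results_per_engine: one list of (url, score) pairs per query agent.
    ASSUMPTION: equal URLs are duplicates and the highest score is kept.
    Returns (url, score) pairs sorted by descending score.
    """
    best = {}
    for results in results_per_engine:
        for url, score in results:
            if url not in best or score > best[url]:
                best[url] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from two query agents; page "a" is found by both.
engine_1 = [("http://example.org/a", 0.9), ("http://example.org/b", 0.4)]
engine_2 = [("http://example.org/a", 0.6), ("http://example.org/c", 0.7)]
merged = merge_results([engine_1, engine_2])
for url, score in merged:
    print(url, score)
```

The merged list is what the search agent would hand to the waiting mobile agent in this sketch.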

6.2 Process Detailed Specification

The process starts when the user asks the system to search for specific information or provides the system with examples of what he is interested in. The interface agent accepts this information and passes it to the main agent. The main agent passes the queries or examples to the NLP agent, which extracts meaningful phrases and then creates a vector. This vector is passed immediately to the main agent if it corresponds to a query, or added to the profile if it corresponds to an example. The phrase extraction process is described in the next section. The optimization and adaptation of the profile using the genetic algorithm and relevance feedback are described after that. Finally, we explain the matching process.

6.2.1 Phrases extraction and vector creation : The phrase extraction process consists of several steps. [13] Figure 4 shows these steps.

[Figure 4 steps: Tokenization, Stopword removal, Morphological Normalization (Stemmer or Lemmatizer), Phrases Extraction, Synonym Normalization]

Fig. (4) : Phrase Extraction process
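The first two steps of Figure 4 can be sketched in a few lines. The sketch treats every character that is neither a letter nor a digit as an inter-word symbol, and shows both a conceptual stop list and a frequency-based one. Lower-casing the text, the tiny stop list and all function names are assumptions made for illustration; stemming, lemmatization, phrase extraction and synonym normalization are not covered here.

```python
import re
from collections import Counter

# ASSUMPTION: a tiny illustrative stop list; real stop lists are much longer.
STOP_WORDS = {"the", "it", "a", "of", "and", "to", "in", "is"}


def tokenize(text):
    """Split text into processing tokens made of letters and digits only."""
    # [^\W_]+ matches runs of letters and digits; everything else
    # (spaces, hyphens, question marks, ...) is an inter-word symbol.
    return re.findall(r"[^\W_]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in stop_words]


def frequent_words(collection_tokens, n):
    """Frequency-based stop list: the n most frequent tokens."""
    return {word for word, _ in Counter(collection_tokens).most_common(n)}


tokens = tokenize("The stock-exchange: is it open?")
print(tokens)                     # ['the', 'stock', 'exchange', 'is', 'it', 'open']
print(remove_stop_words(tokens))  # ['stock', 'exchange', 'open']
```

Note that the hyphen and the question mark disappear during tokenization, so they can no longer be searched for afterwards.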

1-Tokenization: As a first step in processing a document or a query, it has to be determined what the processing tokens are. One of the simplest approaches to tokenization defines word symbols and inter-word symbols. All characters that are neither a letter nor a digit are considered to be inter-word symbols. The inter-word symbols are ignored during this phase, and the remaining sequences of word symbols are the processing tokens. As a result it is not possible to search for punctuation marks such as hyphens and question marks.

2-Stop word removal: Stop words are words with little meaning that are removed from the document and the query. Words might carry little meaning from a frequency point of view or, alternatively, from a conceptual point of view. Words that occur in many of the documents in the collection carry little meaning from a frequency point of view, because a search for documents that contain such a word will retrieve many of the documents in the collection. Stop word removal on the basis of frequency can be done easily by removing the 200-300 words with the highest frequencies in the document collection. As a result of stopping the very frequent words, indexes will be between 30 % and 50 % smaller. If words carry little conceptual meaning, they might be removed whether their frequency in the collection is high or low. Removing stop words for conceptual reasons can be done by using a stop list that enumerates all words with little meaning, typically function words such as “the”, “it” and “a”. These words also have a high frequency in English, but most publicly available stop lists are, at least partly, not constructed on the basis of word frequencies alone. For instance, the stop list published by Van Rijsbergen (1979) contains infrequent words like “hereupon” and “whereafter”, which occur respectively two and four times in the document collection.

3-Morphological normalization: Morphological normalization of words in documents and queries is used to find documents that contain morphological variants of the original query. Morphological normalization can be achieved either by using a stemmer or by using dictionary lookup. A stemmer applies morphological ‘rules of thumb’ to normalize words.
Sometimes stemming algorithms may conflate two words with very different meanings to the same stem; for instance, the words “skies” and “ski” might both be reduced to “ski”. In such cases users might not understand why a certain document is retrieved and may begin to question the integrity of the system in general. Still, stemmers are used in many research systems. Linguistically correct output can be generated by dictionary lookup. Having a full-form dictionary is, however, not enough to build a lemmatizer. Some words will have multiple entries, possibly with different lemmas. For instance, the word “saw” may be a past tense verb, in which case its lemma is “see”, or it may be a noun, in which case its lemma is equal to the full form. Similarly, the word “number” may be the comparative of “numb”. For these cases, a lemmatizer has to determine the word’s part-of-speech before the correct lemma can be chosen. Corpora may be used to find the correct part-of-speech, and therefore the correct lemma, effectively.

4-Phrase Extraction and Parsing: During document or query processing, multiple words may be treated as one processing token. The meaning of a phrase might be quite different from the meaning of its separate words. A user who enters the query “stock exchange” will probably not be satisfied with documents that discuss “exchange of live stock”. There are different approaches to phrase extraction. Phrases might be simply predefined or extracted by syntactic processing. Syntactic processing might be used to extract noun phrases, which are then normalized to head-modifier pairs. This will produce the same processing token for e.g. “information retrieval” and “retrieval of information”, because in both “information” modifies the head “retrieval”.

5-Synonym normalization: Much like stemming and lemmatization, synonymous words might also be conflated to one processing token during indexing and automatic query formulation.

After applying the above steps we are ready to create a vector. The vector is created from the extracted phrases, which are considered as terms. The term frequencies and weights are calculated for the documents in the profile. The weights are calculated using the formula tf x idf. The query terms are given high frequencies and high weights.

6.2.2 Adaptation and optimization the profile : In our proposed system the genetic algorithm is used to optimize the profile, whereas relevance feedback is used to adapt it. The profile is used to search for long term interests. A short term interest search can start immediately after the NLP agent returns the created vector to the main agent. As mentioned above, a document is represented by a vector. Thus, the genetic algorithm represents a gene as a term, an individual as a document and the population as the profile. The process is as follows. After the documents provided by the user (examples or relevant documents) are translated into vectors by the NLP agent, they are added to the profile. Then the genetic agent starts its execution by calculating the objective function from the formula:

S(D_i, P_j) = \sum_{k} W_{ik} W_{jk}

where D and P represent the document and the profile respectively, and the subscript k ranges over common terms only. This formula computes the similarity between vectors in the profile. The fitness function is calculated based upon it and is defined as:

[Fitness equation missing from the extracted text]

where S(Dp_k, P_i) is the similarity between the profile's vector i and the kth document example provided by the user, and #Dp represents the total number of document examples. After optimization, the main agent starts the search process using the vectors in the profile, or the vector returned from the NLP agent in the case of a short term interest. In both cases, the mobile agent conveys these vectors to the meta-search engine. The search agent spawns several query agents carrying these vectors, the vector creation code and the matching code. The matching function used is the same as the objective function, since it measures the similarity between documents. After the query agents execute in each search engine they return to the meta-search engine, which forwards the results to the mobile agent. The mobile agent returns to the user machine. Then the main agent takes the results and passes them to the interface agent, which displays the results to the user. The user replies with his relevance feedback. The feedback is passed to the main agent, which in turn modifies the weights of the profile vector according to this reply. If the document is relevant the weights are increased; otherwise they are decreased. The formula used for this modification is:

[Update equation missing from the extracted text]

where the feedback power α is a predetermined parameter between 0 and 1, Wp are the weights of the firing vectors of the profile, Wd are the weights of the proposed document and f is the user feedback. The user feedback is either 1 if the document is relevant or -1 if it is not. If the proposed document is judged relevant, the agent adds its vector to the profile. Then the genetic agent re-optimizes the modified profile and the system proceeds with the next iteration. In the next section we present the interaction diagrams for our proposed system to provide a visual understanding.
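The quantities used in this section can be sketched in code under clearly stated assumptions. The similarity S follows the objective function given in the text (a sum of weight products over common terms). The fitness and feedback formulas are not available here, so the sketch assumes one plausible reading of the surrounding definitions: fitness as the mean similarity between a profile vector and the user's #Dp example documents, and an additive update W_p ← W_p + α·f·W_d applied to the terms the profile vector shares with the document. Both are assumptions for illustration, not the authors' confirmed formulas, and all function names are hypothetical.

```python
def similarity(doc, profile_vec):
    """S(D, P): sum of W_d * W_p over the terms common to both vectors."""
    return sum(w * profile_vec[t] for t, w in doc.items() if t in profile_vec)


def fitness(profile_vec, examples):
    """ASSUMPTION: mean similarity to the user's example documents."""
    if not examples:
        return 0.0
    return sum(similarity(d, profile_vec) for d in examples) / len(examples)


def apply_feedback(profile_vec, doc, f, alpha=0.5):
    """ASSUMPTION: W_p <- W_p + alpha * f * W_d on shared terms only.

    f is +1 for a relevant document and -1 for a non-relevant one;
    alpha is the feedback power, between 0 and 1.
    """
    updated = dict(profile_vec)
    for term, w_d in doc.items():
        if term in updated:
            updated[term] += alpha * f * w_d
    return updated


# Made-up weight vectors, stored as term -> weight dictionaries.
profile = {"agent": 1.0, "search": 2.0}
document = {"agent": 2.0, "engine": 4.0}

print(similarity(document, profile))               # 2.0
print(fitness(profile, [document, {"search": 1.0}]))  # 2.0
print(apply_feedback(profile, document, f=1))      # {'agent': 2.0, 'search': 2.0}
print(apply_feedback(profile, document, f=-1))     # {'agent': 0.0, 'search': 2.0}
```

Under this reading, positive feedback raises the weights of the shared terms and negative feedback lowers them, which matches the behaviour described in the text.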

275 7. Interaction Diagrams In this section we present some of the system interaction diagrams. Each diagram corresponds to one or more agents. For simplicity we separate the long term interest search interaction diagrams from the short term interest search ones. 7.1 Interface Agent 7.1.1 Short term interest : Figure 5 shows the interaction diagram in case of short term interest search. In this diagram the interface window accept the natural query from the user. This query stored in an intermediate object “Query” in order to be transmitted to the main agent. Once the user enter the query he/she can disconnected. So the lifetime for the search window is short. Whenever the user connects again he/she can ask the system to display the results. The system checks to see if the results are ready. If so the system displays them. Otherwise, the system either displays an error message. Interface Window

Interaction diagram objects: Interface Window, Query, Result Window, Result. Messages: Search(NQ), New(NQ), Search(NQ), Display_result(), Newresult(). Arrow labels: to main agent, from main agent.

Fig. (5) : Interface Agent interaction diagram- short term interest
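The short term flow described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the class InterfaceAgent, the callback forward_to_main and the wording of the message returned when results are not ready are all hypothetical, and only the message names Search, New, Newresult and Display_result come from the diagram.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Query:
    """Intermediate object that carries the natural query (NQ)."""
    text: str


class InterfaceAgent:
    """Hypothetical interface agent for short term interest searches."""

    def __init__(self, forward_to_main: Callable[[Query], None]) -> None:
        self._forward = forward_to_main
        self._result: Optional[List[str]] = None  # None means "not ready"

    def search(self, nq: str) -> Query:
        query = Query(nq)      # New(NQ): store the query in the Query object
        self._result = None    # results of an earlier search are discarded
        self._forward(query)   # Search(NQ): hand the query to the main agent
        return query           # the user may disconnect at this point

    def new_result(self, documents: List[str]) -> None:
        """Newresult(): called by the main agent when results arrive."""
        self._result = list(documents)

    def display_result(self) -> str:
        """Display_result(): show results if ready, otherwise a message."""
        if self._result is None:
            return "Results are not ready yet."
        return "\n".join(self._result)
```

Keeping the result in a separate slot that starts empty is what lets the user disconnect after submitting the query and ask for the results later.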

7.1.2 Long term interest: Figure 6 shows the interaction in the case of a long term interest. The user first provides the system with examples. These examples are used to create the profile and are stored in the query object. After the search process completes, the system proposes the retrieved documents to the user through the result window. The user replies with relevance feedback, which is forwarded to the main agent through the result class. If a document is relevant, its vector must be added to the profile, so in this case the search function is called. If the user wants to refresh the search to retrieve up-to-date documents, he/she can do so through the update function call, which is passed to the main agent.

Interaction diagram objects: Interface Window, Query, Result Window, Result. Messages: New(E), Search(E), Display_result(), Newresult(), feedback(f), update(), [R] Search(E). Arrow labels: to main agent, from main agent.

Fig. (6) : Interface agent interaction diagram- long term interest

7.2 Main and Mobile Agents

Figure 7 shows that whenever the main agent accepts the Search (NQ) call from the Query object in the interface agent, it calls the analyze function in the NLP agent. The parameter of this function can be the natural query (NQ), the user’s examples (E) or a relevant document proposed by the system (R). In the case of a long term interest, the main agent must access the profile object to retrieve the vectors; otherwise, the NLP agent returns the vector immediately. Then, in order to set the itinerary of the mobile agent, the Get() function of the URL’s object is called. Finally, the mobile agent is created and launched to execute at the meta-search engine.

Interaction diagram objects: Search, Generation Mobile, profile, URL’s DB. Messages: Search(NQ | R | E), Analyze(NQ | R | E), retrieve_vector(), Return(vector), Long_Search(), Get(), launch(mobile).

Fig. (7) : Main and Mobile agent interaction diagram
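The dispatch performed by the main agent can be sketched in Python as follows. This is a hedged illustration: toy_analyze stands in for the NLP agent with a plain term count, the kind argument ("NQ", "E" or "R") and the MobileAgent data class are hypothetical, the step of adding a vector to the profile is folded into the main agent for brevity, and the sketch returns the mobile agent instead of actually launching it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = Dict[str, float]  # term -> weight (assumed representation)


def toy_analyze(text: str) -> Vector:
    """Stand-in for the NLP agent: plain term counts, not real NLP."""
    vector: Vector = {}
    for term in text.lower().split():
        vector[term] = vector.get(term, 0.0) + 1.0
    return vector


@dataclass
class MobileAgent:
    vectors: List[Vector]   # what the agent carries to the meta-search engine
    itinerary: List[str]    # URLs obtained through Get()


class MainAgent:
    def __init__(self, analyze: Callable[[str], Vector], urls: List[str]) -> None:
        self.analyze = analyze
        self.urls = urls
        self.profile: List[Vector] = []

    def search(self, text: str, kind: str) -> MobileAgent:
        vector = self.analyze(text)          # Analyze(NQ | R | E)
        if kind == "NQ":
            vectors = [vector]               # short term: use the vector directly
        elif kind in ("E", "R"):
            self.profile.append(vector)      # long term: extend the profile
            vectors = list(self.profile)     # retrieve_vector()
        else:
            raise ValueError("kind must be 'NQ', 'E' or 'R'")
        itinerary = list(self.urls)          # Get(): set the itinerary
        return MobileAgent(vectors, itinerary)  # a real system would launch it
```

For example, MainAgent(toy_analyze, ["http://engine-a.example"]).search("mobile agents", "NQ") yields a mobile agent that carries a single query vector and leaves the profile untouched.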

7.3 NLP Agent

This agent accepts the analyze call with an NQ, R or E parameter. In the case of NQ, it returns the vector of the natural query immediately to the main agent. In the other cases, it adds the vector to the profile. The process of generating the vectors is described in section 5.2.1.

7.4 Search Agent

The tasks performed by this agent are shown in figure 8. It cooperates with the mobile agent and accepts the vectors. The process of creating query agents starts with the Get_URL’s() function call, which determines the URL of each simple search engine; the call is repeated until all URLs have been retrieved. The formulate_query (vector) function formulates a specialized query for each simple search engine. After that, the generation query class generates many query agents and sends them to the search engines. After each agent returns with its own results (stored in the results class), the interface calls the elimination_duplication function. The mobile agent can then return to the user's machine with the results.

7.5 Query and DB Agents

Figure 9 shows how these agents cooperate to perform the search task. The interface class in the query agent contains the formulated query and the vectors. It calls the search function in the DB agent using the formulated query. If results are found, the retrieved documents are converted to vectors by the converter class. The matching class applies the matching code (the similarity function) to the document and query vectors and determines the relevance. The results are stored in the result class, which can be accessed by the search agent in the meta-search engine.

Interaction diagram objects: Interface, Generation Query, URL’s DB, Query DB, Result. Messages: launch(mobile agent), new(vector), * Get URL’s(), * formulate query(vector), * launch(Query agent), Eliminate duplication().

Fig. (8) : Search agent interaction diagram
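The search agent's steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: engines are modeled as plain callables keyed by a hypothetical name, query agents run sequentially instead of migrating to the engines, one query string is shared by all engines although the paper formulates a specialized query per engine, and the max_terms cutoff is invented for the example.

```python
from typing import Callable, Dict, Iterable, List

Vector = Dict[str, float]  # term -> weight (assumed representation)


def formulate_query(vector: Vector, max_terms: int = 3) -> str:
    """Turn a weighted vector into a keyword query from its heaviest terms."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return " ".join(term for term, _ in ranked[:max_terms])


def eliminate_duplicates(result_lists: Iterable[List[str]]) -> List[str]:
    """Merge per-engine results, keeping the first occurrence of each URL."""
    seen = set()
    merged: List[str] = []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def meta_search(vector: Vector,
                engines: Dict[str, Callable[[str], List[str]]]) -> List[str]:
    """Query every engine and return the de-duplicated result list."""
    query = formulate_query(vector)
    # Each call stands in for one query agent executing at one search engine.
    result_lists = [run(query) for run in engines.values()]
    return eliminate_duplicates(result_lists)
```

Removing duplicates at the meta-search engine, before the mobile agent travels back, keeps the payload it carries to the user's machine small.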

Interaction diagram objects: Interface, DB, Converter, Matching, Result. Messages: Execute(Query_agent), Search(Query), * Convert_retrieved(document), * matched(document_vector), Store(relevant document).

Fig. (9) : Query and DB agents interaction diagram
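The converter and matching classes can be illustrated with a short Python sketch. Cosine similarity is used here as one common vector space similarity; the paper's own similarity function, the term-count conversion in to_vector and the threshold value of 0.3 are assumptions of this sketch.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # term -> weight (assumed representation)


def to_vector(text: str) -> Vector:
    """Converter step: map a document to term counts (a simplification)."""
    return {term: float(count) for term, count in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_documents(query_vector: Vector, documents: List[str],
                    threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Matching step: keep documents scoring at least `threshold`, best first."""
    relevant = []
    for document in documents:
        score = cosine(query_vector, to_vector(document))
        if score >= threshold:
            relevant.append((document, score))  # Store(relevant document)
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return relevant
```

Because the matching code travels with the query agent, only documents that pass the threshold need to be sent back to the meta-search engine.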

8. Conclusions

Research in the information retrieval field has underscored the need for agent-based systems. We propose a multi-agent based system that runs several agents locally on each user’s computer and supports communication between them to help in common tasks. We suggest using a mobile agent in our system because it reduces the communication cost. We also suggest using a natural language processing based agent that extracts meaningful phrases from the user’s examples (profile) or natural queries and from the retrieved documents. Instead of a simple matching method, the vector space model is used to match queries against documents. We also suggest using a meta-search engine as a component of the proposed system. The proposed system creates a user profile to understand the user's long term interest. The profile is optimized with a genetic algorithm and adapted with relevance feedback. We believe that the social behavior of multi-agent systems will enhance the performance of information retrieval systems in both recall and precision. In future work we will implement the proposed system in order to test it, discover its weak points and improve it.

References

[1] James J., "Using an Intelligent Agent to Enhance Search Engine Performance", First Monday, peer-reviewed journal on the Internet, Vol. 2, No. 2, 1997. Available online at http://www.firstmonday.dl/issues/issue2_3/jansen/index.html
[2] B. Hermans, "Intelligent Software Agents on the Internet: An inventory of currently offered functionality in the information society and a prediction of (near-)future developments", 1996. Available online at http://www.hermans.org/agents
[3] Walt B., Rudiger Z., Intelligent Software Agent.
[4] Larry K., Wooju K., "WebSifter II: A Personalizable Meta-Search Agent based on Semantic Weighted Taxonomy Tree", 2002.
[5] Hui-min Chen, "Design and implementation of the agent-based EVMs System", June 2000.
[6] Yasuuhaki K., "A multi-agent Based Intelligent WWW interfacer", 2002.
[7] Matie L. et al., "Agent Communication inside an Internet Search engine", Intelligent Software Component, September 2000.
[8] IT Council for SA Newsletter, "Your Amigo develops unique search engine technology", February 2002.
[9] Selberg, E. and Etzioni, O., "Multi-Service Search and Comparison using the MetaCrawler", In Proceedings of the Fourth World Wide Web Conference, pp. 21-70, Boston, Dec. 1995.
[10] Jonathan F., "Information Retrieval Research", In Proceedings of the 19th Annual BCS-IRSG Colloquium on IR Research, Aberdeen, Scotland, 8-9 April 1997.
[11] Zacharis Z. Nick, "Web Search Using a Genetic Algorithm", University of Piraeus, 2001.
[12] Franco Busetti, "Genetic algorithms overview", 2001. Available online at http://citeseer.nj.nec.com/busetti01genetic.html
[13] Djoerd Hiemstra and Franciska de Jong, "Statistical Language Models and Information Retrieval: natural language processing really meets retrieval", University of Twente, 2001.

Information Retrieval System Based on the Use of Software Agents

F. Eissa, H. Alghamdi

King Abdulaziz University, Jeddah, Saudi Arabia

Abstract. The rapid and steady growth in the volume of information available on the Internet makes searching harder and slows the retrieval of the required information. As a solution to this problem, many studies have proposed building systems that use intelligent software agents. The proposed systems differ in the type of software agent used and in the purpose of its use: in some systems the agent is used to understand the user's requirements, and in others it is used to increase network efficiency. In this paper we propose an information retrieval system based on a number of software agents that increases the accuracy and speed of search results in addition to increasing network efficiency. In this system we have tried to draw on techniques from several fields: artificial intelligence, software engineering and information retrieval. The system creates a profile of the user to understand his/her long term and short term interests. The user profile is optimized by a genetic algorithm and adapted through relevance feedback, and the user's query and the search results are analyzed using natural language processing techniques.
