How Adaptive Agents Learn to Deal with Incomplete Queries in Distributed Information Environments

Francisco B. Pereira♦,♣, Ernesto Costa♣

♦ Instituto Superior de Engenharia de Coimbra, Quinta da Nora, 3030 Coimbra, Portugal
♣ Centro de Informática e Sistemas da Universidade de Coimbra, Polo II – Pinhal de Marrocos, 3030 Coimbra, Portugal
{xico, ernesto}@dei.uc.pt

Abstract- Queries that are not indicative of real information needs are a major problem for information retrieval systems. In this work we study how individual learning helps adaptive agents, when searching for information in a distributed environment, to modify incomplete queries and improve their retrieval performance. Two learning procedures, operating at two different levels, are proposed and their effect is studied in several situations. Preliminary results show that the changes learning induces in the query vector of adaptive agents provide an important advantage, enabling them to make correct decisions about how to deal with this problem.

1 Introduction
Evolution and learning are the two major forces that promote the adaptation of organisms living in Artificial Life (AL) worlds. Evolution, operating at the population level, includes all the mechanisms of genetic change that occur in organisms over generations. Learning occurs at a different time scale: it gives each individual the ability to modify its behavior during its life in order to increase its adaptation to the environment and, hence, its chances of survival and reproduction. In our research we are interested in the interactions between individual learning and evolution (for a good overview of the subject, see [2]).

To study the changes that learning induces in behavior, we designed a prototype of an agent-based system for discovering and retrieving information in a distributed and dynamic environment. We use a subset of the World Wide Web (Web) as the test environment for our experiments. In this system a set of evolving organisms browses through the information environment trying to find information relevant to a specific query. When facing a new document, they try to behave like the user that requested the information: they evaluate it to determine whether it is relevant and then select the most promising link (the one most likely to lead to relevant information). Information Retrieval (IR) agents are well adapted to the environment if, when browsing through the Web, they are able to identify the most

promising links and can also accurately determine the relevance of documents. It should be emphasized that we do not intend to present here a complete multi-agent system for IR on the Web. Our goal is to study the interactions between evolution and learning; the system we describe in the next sections is just a prototype designed to test our ideas. Nevertheless, we believe that the insight gained from the experiments reported here may improve our understanding of the role that learning plays in this domain. Some experimental results were already presented in [12], suggesting that, under specific conditions, changes in behavior induced by learning can provide an important advantage when agents browse the Web. Two learning procedures were proposed and the results showed that, when the complexity of the environment is high, learning effectively helps agents to develop correct behaviors. In the experiments reported in that work, complexity resulted from the lack of syntactic clues in the environment; these clues are needed to guide the search performed by the agents. Here we are concerned with query modification techniques. Queries are formal statements of information needs placed to the IR system by users. Incomplete queries are a frequent problem because users often provide short queries that are not representative of the information need and/or contain terms that do not match the terms appearing in the majority of relevant documents [5]. In this situation complexity arises from the lack of significance of the submitted query. We analyze how learning helps agents to modify their queries, enabling them to increase their retrieval performance.

The paper has the following structure: in the next two sections we present a brief overview of AL models and of IR on the Web. In section 4 we present the key features of the IR system. Section 5 analyzes the results of a set of experiments performed to determine the influence of learning on the modification of incomplete queries. Finally, in section 6 we present some conclusions and suggest directions for future work.

2 Artificial Life Models
The agents in our IR model live in an AL ecosystem. In AL worlds there is no global selection mechanism guiding evolution and, therefore, no pressure towards a global optimum. In standard evolutionary computation algorithms [6], an individual's survival depends on the rest of the population. In contrast, the "quality" of an AL organism is determined without direct comparisons with other individuals: it is an absolute measure, resulting only from the interactions between that individual and the environment. An individual senses its neighborhood, decides which actions to perform (in accordance with its behavior) and then executes them. If it is well adapted, i.e., if the actions performed bring it benefits, it will survive and eventually reproduce; otherwise, it will die. It is the complexity of the environment (how difficult adaptation and survival are) that creates the selective pressure leading to the evolution of the organisms. This complexity can be measured by the amount of resources available to the whole population. Agents need energy to survive and reproduce. This energy is collected from the environment as a result of appropriate actions; on the other hand, every action has a cost. At the individual level, the survival of each agent depends on the difference between these two values. At the population level, by controlling the carrying capacity of the environment¹ and the amount of work that must be done to find energy, we obtain a measure of how difficult it is for an organism to survive. Cooperation between agents living in an AL world is indirect, resulting from the competition for resources. If there are enough resources to maintain multiple organisms, agents will spread through the environment and collaborate in the discovery and retrieval of relevant information. If resources are scarce, agents will compete for them and the less adapted will eventually die.

3 Information Retrieval on the Web
The goal of IR systems is to find and deliver to the user the set of documents that best satisfy his information need [3]. The process of finding information can be divided into three stages:

- Formulating queries;
- Finding documents;
- Determining relevance.

Traditional IR systems were designed for static and centralized collections of directly accessible documents, so standard IR research is primarily concerned with the first and third stages of the process. In contrast, when searching for information on the Web, finding documents is the most difficult problem to solve. This environment possesses several properties that make it very difficult to access relevant information. First of all, its structure and size: the Web consists of millions of documents distributed over a large number of independent servers. Another problem is the dynamic nature of the environment, with information being constantly updated: documents disappear, new ones are added and others are modified. From an operational point of view, we can see this structure as a graph where documents are connected by hyperlinks. The easiest way to access information, although inefficient, is to browse across documents, following promising hyperlinks. To help users in their search, several search engines, such as Altavista² or Yahoo³, are available to the public. These IR systems use indexing databases where they store an efficient representation of a large number of documents [4]. Search engines differ in the way they build their databases (either using Web spiders or by subscription) and how they keep them up to date. Their main limitation is that, between updates, they assume that the environment remains unchanged, which is obviously not true. Therefore, all index tables contain a significant portion of incorrect information. Lawrence and Giles [7] estimate that the main search engines return, as an answer to a query, a significant percentage of invalid documents (between 2% and 9% of invalid links). In addition, the same study reports that only a fraction of the total number of documents is indexed. Although coverage varies between search engines, none indexes more than one third of the Web. There is also evidence that coverage is growing more slowly than the size of the Web (the estimated growth rate is about 1000% for the next few years).

Multi-agent adaptive systems are an alternative to standard search engines. These models rely on a set of semi-autonomous intelligent agents that help the user to find relevant documents. Several online multi-agent systems for IR on the Web, with varying degrees of autonomy and intelligence, have been developed in recent years. Concerning the problem of finding specific information requested by the user, multi-agent systems can be divided into two categories. In the first one, they submit queries to different search engines and then decide which answers are most relevant (a kind of metasearch) [11]. These systems act as filters of the information provided by standard Web search engines. They are able to improve retrieval efficiency because they use personal profiles of the user who requested the information (they filter information that they think the user will consider irrelevant). In spite of that, they do not perform an autonomous search in the information environment, so the major problems of standard search engines, such as limited coverage or outdated information, remain. Another approach is to let agents act as "spiders" on the Web looking for online information that they think will be relevant for the user [1] [9]. These systems do not suffer from the scalability problem [10], since they search through the current environment and dynamically adapt to changes, both in the information resources and in the interests of the user. The disadvantage of this approach is that the user has to wait longer before relevant information is retrieved.

¹ Carrying capacity is defined as the amount of latent energy available to individuals [8].
² http://www.altavista.com
³ http://www.yahoo.com
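The energy-based selection scheme described in section 2 can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the tax, initial energy and reproduction threshold are taken from Table 1, but the reward distribution and the loop structure are our assumptions.

```python
import random

class Agent:
    """Minimal AL organism: survival depends only on its own energy."""
    def __init__(self, energy=25.0):
        self.energy = energy

    def step(self, reward):
        self.energy -= 0.15          # every visited document costs a tax
        self.energy += reward        # energy collected from the environment

def simulate(n_agents=10, cycles=40, reproduction_threshold=31.25):
    population = [Agent() for _ in range(n_agents)]
    for _ in range(cycles):
        offspring = []
        for a in population:
            # reward here is a stand-in for user feedback on retrieved pages
            a.step(reward=random.uniform(0.0, 1.0))
            if a.energy >= reproduction_threshold:
                # asexual reproduction: the child takes half the parent's energy
                a.energy /= 2.0
                offspring.append(Agent(energy=a.energy))
        # no global selection: an agent dies only when its own energy runs out
        population = [a for a in population + offspring if a.energy > 0.0]
    return population
```

Note that fitness is never compared between individuals: selective pressure comes solely from the cost/reward balance of each agent's own actions, as the text describes.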

4 Architecture of the IR System
Here we only highlight the main features of the system; a detailed description can be found elsewhere [12]. We did not develop a completely new architecture for an agent-based IR system. As mentioned before, all we need is a framework to study the interactions between evolution and learning. The system relies on the notion of link topology [9]. This property states that, even in unstructured hypertext information spaces like the one we are dealing with, a structure is imposed upon the collection of documents because authors tend to cluster documents on related subjects. This happens because the links connecting documents are manually inserted with the purpose of helping the browsing activity of human users. We expect IR agents to develop behaviors that enable them to exploit this property. Even though the information environment is physically distributed over a large number of places, in our system agents live in a local AL world. When they need to access the information contained in a document, they send a request to the server where the page is located. The retrieved information is then processed locally.

4.1 Information Representation
The information contained in documents has to be in a standard format. Since we are using typical IR techniques, only the text of the documents is considered. Web pages are represented by a vector of weighted stemmed keywords and a list of links. The processing of a retrieved document has the following steps: a parser converts the HTML document into plain text and builds a list with all the links. The parsed text is then stemmed (the process of removing prefixes and suffixes from words) [5] and the output is filtered to remove commonly used words considered to have no indexing value [5]. After these operations we obtain a stemmed word vector D. To enhance retrieval efficiency, we attach a weight to each word.
A variation of the TFIDF (Term Frequency * Inverse Document Frequency) measure [13] is used to obtain a normalized weight wi for each word di in D. wi measures the ability of word di to discriminate the document to which it belongs from the rest of the collection (all pages already retrieved compose the collection of documents).

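The weighting scheme just described can be sketched as follows. The paper uses an unspecified variation of TFIDF, so the smoothing and the unit-length normalization below are our assumptions, not the authors' exact formula.

```python
import math
from collections import Counter

def tfidf_weights(doc_words, collection):
    """Sketch of TFIDF weighting: w_i reflects how well word d_i
    discriminates this document from the rest of the collection
    (here, the set of all pages retrieved so far).

    doc_words:  list of stemmed words of the current document D
    collection: list of sets, one set of words per retrieved page
    """
    n_docs = len(collection)
    tf = Counter(doc_words)                           # term frequency
    weights = {}
    for word, freq in tf.items():
        df = sum(1 for d in collection if word in d)  # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1   # smoothed IDF (assumed)
        weights[word] = freq * idf
    # normalize to unit length so weights are comparable across documents
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {word: w / norm for word, w in weights.items()}
```

With this normalization, a word that is frequent in D but rare in the collection receives the highest weight, which is exactly the discriminating ability the text asks of wi.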
4.2 Retrieval Process
At the beginning of the simulation the user provides a set of words that he believes best define the query. Each agent builds a keyword vector K with all these words. This vector helps agents to decide about the relevance of the pages they find. During the simulation, evolution and learning may change K in order to increase performance. Each agent receives the address of one page where it must start the search. It then browses through the environment trying to find relevant information. The process has two steps:

i) Evaluation: when an agent reaches a document D, it uses equation 1 to measure the similarity between D and its query vector K. This formula is a variation of the similarity equation between a document and a query proposed by [13].

F_D = (Σ_{i∈T} w_i) × (η + #T / #K)

(eq. 1)

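Equation 1 can be sketched directly. Here T is the set of query words from K that also occur in D, and doc_weights maps each word of the document to its TFIDF weight w_i (both taken from the surrounding text; the function name is ours).

```python
def document_score(K, doc_weights, eta=0.5):
    """Sketch of eq. 1: F_D = (sum of w_i for i in T) * (eta + #T / #K).

    K:           the agent's keyword vector (list of query words)
    doc_weights: dict mapping each word of document D to its weight w_i
    eta:         the constant from eq. 1; a typical value is 0.5
    """
    if not K:
        return 0.0
    T = [word for word in K if word in doc_weights]
    weight_sum = sum(doc_weights[word] for word in T)
    # the score is amplified when more than (1 - eta) of K appears in D
    return weight_sum * (eta + len(T) / len(K))
```

For example, if both words of a two-word query appear in D, the weight sum is multiplied by (0.5 + 1) = 1.5, the "reward" the text mentions.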
T is the set of keywords that appear both in K and D. There is a reward if the percentage of words from K that occur in D is higher than (1−η). A typical value for η is 0.5. Each agent believes it has a perfect model of the user, so its keyword vector is a perfect representation of the information needed. To determine when it should present a given document to the user, each individual has a private value representing the minimum acceptable quality (the relevance threshold parameter Rt). When it finds a page with F_D higher than Rt, it considers the document relevant enough and presents it to the user.

ii) Decision about the link to follow: after the evaluation, an agent has to decide which link is most likely to lead to an interesting document. At this point it follows a strategy similar to the human activity of browsing: it gives higher priority to links that appear promising and backtracks when it feels that the followed path is unreliable. This decision is local and based on syntactic clues. To estimate the potential of each link, agents use equation 2, a weighted sum of three factors:

Link_F = (αC1 + βC2 + χC3) / (α + β + χ)

(eq. 2)

C1: Syntactic analysis: we consider that, in addition to link topology, there is also a word topology [9]. Under this assumption, links with keywords in their neighborhood are more likely to point to documents about those subjects. To obtain a value for this factor, the text that surrounds each link is considered and a weighted count of the words from K that occur in this text is performed; the weights decay with distance from the link.

C2: Syntactic quality of nearby links: in a document, links pointing to pages about similar subjects are usually clustered together. Following this assumption, the potential of a link should also reflect the syntactic quality of neighboring links.

C3: Quality of the current document (F_D): assuming link topology, it is more likely to reach a relevant document from another relevant document.

Before the calculation, all factors are normalized to the interval [0, 1]. The weights α, β, χ, with values in the range [0, 1], are specific to each agent and are subject to evolution. Variation in the weights may lead to the development of different search strategies. During its life, an agent keeps track of the links with the highest estimates. When it has to decide which link to follow, it considers not only the links from the current page, but also high-ranked links from previous pages that were not selected.

4.3 Energy of an Agent
At birth, an agent has an initial energy level. For each visited document, it pays a tax. When an agent believes that its current document is relevant (F_D > Rt), it presents it to the user, expecting a reward. The user is crucial in our system: since all agents believe that they are perfect, explicit rewards and punishments guide evolution, enabling the development of correct behaviors. User judgements are represented on an ordinal scale with three categories: relevant, partially relevant and non-relevant [14]. The classification is converted to a numerical value (2 for Relevant, 1 for Partially Relevant and −1 for Non-Relevant) and the variation of the energy of agent A is computed with equation 3 (φ is a reward constant and Feedback is the numerical value provided by the user):

E_new(A) ← E_old(A) + φ ∗ Feedback

(eq. 3)
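Equations 2 and 3 can be sketched together as one evaluate-and-reward step. The three factors C1–C3 are assumed to be already computed and normalized to [0, 1], as the text requires; the function names are ours.

```python
def link_potential(c1, c2, c3, alpha, beta, chi):
    """Sketch of eq. 2: a weighted sum of the three normalized factors,
    C1 (syntactic analysis of the link's neighborhood), C2 (syntactic
    quality of nearby links) and C3 (quality F_D of the current document).
    alpha, beta, chi are the agent-specific, evolvable weights in [0, 1]."""
    return (alpha * c1 + beta * c2 + chi * c3) / (alpha + beta + chi)

def energy_update(energy, feedback, phi=1.5):
    """Sketch of eq. 3: the user's judgement (2 relevant, 1 partially
    relevant, -1 non-relevant) scaled by the reward constant phi."""
    return energy + phi * feedback
```

Because the sum is divided by α + β + χ, Link_F stays in [0, 1] whenever the factors do, so link estimates from different agents and documents remain comparable.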

4.4 Evolution and Learning
Evolutionary mechanisms in AL worlds are controlled by the amount of energy of individuals. When, due to external rewards, the energy of an agent reaches a fixed reproduction threshold, a new agent is created. Reproduction is asexual. The genotype of an agent is composed of the vector K, the weights α, β, χ, and the relevance threshold Rt. The offspring receives a mutated copy of the parent genotype and half of its energy. There are two types of mutation:

Normally distributed mutation: new values for α, β, χ and Rt are obtained by adding a small random value to the original value.

Keyword mutation: one word with a high TFIDF weight, present in the current document, is selected. It is then either added to K or used to replace the least discriminating word in the vector. The probability of adding a new word decays as the dimension of the vector increases.

Learning, occurring during the lifetime of individuals, increases their adaptability to the environment. We consider two different learning procedures:

Link learning: it gives an agent the ability to update K in order to improve the link estimate. C1 is the most important component when evaluating several links from the same document: it is the existence or absence of keywords from K that is responsible for the difference in the potential of links. During link learning, an agent has the ability to visit and evaluate the pages pointed to by some of the links of the current document. It then compares the quality F_D of each page with the estimate of the link that led to it. When it finds an underestimated link, a new word belonging to the neighborhood of that link is added to K. Link learning is only possible when agents are completely lost (i.e., the current document does not contain any syntactic clues). Concerning link learning, we test two different strategies:
• Global: all changes induced in K by link learning are permanent and directly passed to the descendants.
• Local: link learning is local to a single action of the agent. Vector K changes by the effect of link learning and the new keyword vector is used to re-evaluate the link potential, but the modification in K is then discarded and not used in the future.

Keyword learning: when the user provides relevance feedback, the agent performs several keyword mutations trying to obtain a value of F_D closer to the value provided. It is not possible to add or replace more than one word during each occurrence of keyword learning. Changes in K resulting from keyword learning are permanent and directly passed to the descendants.

5 Dealing with Incomplete Queries: Some Experiments
In many situations the user does not have a clear idea of what his needs are and/or how to express them. This problem has long been recognized as a major difficulty in IR systems [5]. One possible solution is to modify the initial query. There are two main strategies to achieve this [5]: query reweighting and query expansion. In this paper we only consider the second option, since our approach does not attach weights to query terms (we consider all terms belonging to the query K to be equally important).

The major problem with query expansion is finding a suitable set of words to add to the original query vector. Standard IR techniques suggest several possibilities [5]. We expect the learning procedures to enable agents to develop behaviors that mimic the following suggestions:
• If some relevant documents have already been found, one possible solution is to add some of their words to the query. The keyword learning procedure induces a similar behavior.
• Even if the query has not retrieved any relevant document, some words must be selected. Link learning tries to find, among the words belonging to irrelevant documents, some that might help agents to find the areas where the relevant information is.
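The keyword mutation that drives query expansion can be sketched as follows. The text only says that the probability of adding (rather than replacing) decays as K grows, so the linear decay and the max_words cap used here are assumptions.

```python
import random

def keyword_mutation(K, doc_weights, max_words=10):
    """Sketch of keyword mutation: pick a high-TFIDF word from the
    current document and either add it to K or use it to replace K's
    least discriminating word.

    K:           the agent's keyword vector (list of words)
    doc_weights: dict of TFIDF weights for the current document's words
    max_words:   cap on the size of K (assumed; Table 1 uses 10)
    """
    candidate = max(doc_weights, key=doc_weights.get)  # highest-TFIDF word
    if candidate in K:
        return K
    # probability of adding decays linearly as K grows (our assumption)
    p_add = 1.0 - len(K) / max_words
    if random.random() < p_add:
        return K + [candidate]
    # otherwise replace the least discriminating word currently in K
    weakest = min(K, key=lambda w: doc_weights.get(w, 0.0))
    return [candidate if w == weakest else w for w in K]
```

Applied to K = {Giant, Wolf} in a document where a word such as "fenrir" has high weight, the mutation either grows the query or swaps out its weakest term, which is how the expanded vectors in Table 2 arise.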

5.1 Experimental Settings
The experiments were performed in a self-contained and controlled environment. To test our ideas we need a consistent and clearly defined evaluation framework, and this would be impossible to achieve if we considered the whole Web. We selected the "Encyclopedia Mythica™"⁴, an on-line encyclopedia about mythology with a structure similar to the Web. It contains over 5700 documents fully interconnected by hyperlinks. The most important parameters of the AL world are presented in table 1.

5.2 Question: What is the name of the Giant Wolf in Norse Mythology?
This question was reduced to a set of keywords to form the initial vector K of all agents. Agents were placed in the homepage (main entry) of the Encyclopedia and needed to find and retrieve documents containing the answer to the question. The authors played the role of users⁵.

Parameter                                                    Value
Number of agents                                             100 (divided in 10 runs)
Number of time cycles                                        40
Initial Energy                                               25
Initial Rt                                                   0.5
Pages visited during Link Learning                           10
Probability of Link Learning                                 {0.25, 0.5}
Maximum number of mutations performed during Keyword Learning 5
Probability of Keyword Learning                              0.5
Maximum number of links in Keep list                         5
Reproduction Threshold                                       31.25
Maximum number of words in K                                 10
Tax for Visiting a Document                                  0.15
Constant η                                                   0.5
Reward Constant φ                                            1.5

Table 1: Parameters of the AL World

Figure 1: Cumulative number of relevant documents presented to the user in experiment 1

Since we are essentially concerned with the influence of learning on the behavior of single agents, our analysis focuses on direct comparisons between simulations with and without learning. We intend to study how different learning strategies are able to improve retrieval efficiency. To achieve this, we performed several tests in three different situations with varying degrees of incompleteness in the query. In the analysis, we use an adjusted version of precision, defined as the ratio of relevant documents presented by the agents to the total number of documents presented. The Encyclopedia provides a MythQuiz with questions about myths and gods; the answer to each question is somewhere in one (or more) of the Encyclopedia's documents. We selected several questions and performed an extensive set of experiments. Due to lack of space, the results reported in the next section concern just one question, which is nevertheless illustrative of the overall results achieved.
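The adjusted precision measure can be written as a small helper. We count only documents judged "relevant" as hits; the treatment of partially relevant documents is our assumption, since the text does not specify the adjustment.

```python
def adjusted_precision(presented_judgements):
    """Sketch of the precision measure used in the analysis: the ratio
    of relevant documents presented by the agents to the total number
    of documents presented.

    presented_judgements: list of user labels for the presented pages,
    e.g. "relevant", "partial", "non-relevant" (label names are ours).
    """
    if not presented_judgements:
        return 0.0
    relevant = sum(1 for j in presented_judgements if j == "relevant")
    return relevant / len(presented_judgements)
```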

⁴ Available at http://www.pantheon.org/mythica. Copyright © 1995-1999 M. F. Lindemans.

Figure 2: Link learning attempts performed by agents in experiment 1 over 40 time cycles.

5.2.1 Experiment 1: Ideal Query
In the first experiment we supplied the agents with the following set of keywords: K = {Giant, Wolf, Norse}. This can be considered an ideal set of keywords, and the results obtained will serve as a basis for comparisons with the results of subsequent experiments. In figure 1 we present a graph showing the cumulative number of relevant documents supplied to the user in runs without learning, with local learning and with global learning. Only results with a link learning probability of 25% are presented (results with 50% probability are similar). It is clear that agents easily found documents relevant to the query and, also, that both learning strategies were ineffective in the search. This experiment confirms the results obtained in [12]. The information environment has a hierarchical structure and is divided into areas, one of them being Norse Mythology. Since the word Norse belongs to K, a significant number of agents easily find their way to this area. The graph in figure 2, which presents the total number of agents that performed link learning, supports this conclusion: link learning nearly disappears after 5 cycles, because agents have already found enough syntactic clues to guide their search.

⁵ The authors do not have any specific knowledge about mythology (they were unable to answer the question used in the experiments). When rewarding or punishing the agents, they can be considered typical users searching for answers to their queries.

5.2.2 Experiment 2: Incomplete Query

It is clear that the word Norse enables agents to quickly find their way into the area where the relevant documents are. To see how they behave in the absence of this word, we performed a second experiment with the following query vector: K = {Giant, Wolf}. In figure 3 we present a graph showing the cumulative number of relevant documents supplied to the user. There are two lines for each learning strategy: one for runs with a link learning probability of 25% and another for runs with a probability of 50% (the keyword learning probability is always 50%). In this experiment it is clear that both learning strategies helped the agents. The removal of the word Norse causes difficulties in the search process, both because agents have trouble identifying promising areas in the environment and because the query is very short and not indicative of the information need (it is too general). There are several areas in the information environment that, considering the initial query, can be considered relevant. The best strategy for agents is therefore to walk around, trying to find clues about the best path to follow; at this stage link learning is very helpful. After they find the first clues, agents expand their query vector with new words (the query becomes more specific) and continue to explore the paths found. If a path leads to promising areas, the user will reward them; if, otherwise, they choose a bad path (with regard to the concrete information need), they will not gain any reward. In figure 4 we present a graph with the number of link learning attempts performed by agents. The decrease in learning attempts over time is visible in all runs. Near the end of the simulation, runs that started with different probabilities tend to converge to similar values, which suggests that agents are capable of self-adjusting this probability to their needs. Agents that started with a higher link learning probability decrease this value over time, as

soon as they start to find syntactic clues. The large overhead in document retrieval caused by learning occurs in the early stages of the simulation. Nevertheless, it enables agents with a 50% link learning probability to achieve significantly better results (see the graph in figure 3), because, when they start to spread through the environment, they collect a larger number of clues and therefore make wiser decisions about the best path to follow.

Figure 3: Cumulative number of relevant documents presented to the user in experiment 2

Figure 4: Link learning attempts performed by agents in experiment 2 over 40 time cycles.

Concerning the number of relevant documents retrieved, the two learning strategies can be considered similar. A different picture emerges from the graph in figure 5, which presents the precision obtained in the different runs: whilst precision in the experiments with global learning is poor, it is very high in the experiments with local link learning. The explanation is related to the roles that the two learning procedures play when agents are trying to find new words to add to the initial query. The main task of link learning, when improving the potential of links that point to interesting documents, is to help agents to discover promising areas in the environment. Link learning only occurs in irrelevant documents, so the keyword chosen is likely to be irrelevant for the current query. Since the global learning strategy inserts this new word into K, agents gradually change their interests and start seeking something new. On the contrary, local learning agents discard the changes produced by link learning (they only use them to re-evaluate link potential), and new words are added to K only as a result of keyword learning. Since this procedure occurs in relevant documents, the probability of adding a relevant word is significant. In this way, local learning combines the best of the two learning procedures: it uses link learning to gain a better understanding of the information environment and keyword learning to find new relevant words to add to K and complete the query. In table 2 we present several examples of the changes induced in K by the two learning strategies. For agents with global learning, a large number of the words added to K are clearly not directly related to the original information need.
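The difference between the two strategies comes down to what happens to K after a link is re-evaluated. A minimal sketch, with the choice of the neighborhood word simplified to the first available one (an assumption):

```python
def link_learning(K, link_neighbourhood_words, strategy):
    """Sketch of the two link-learning strategies. When an underestimated
    link is found, a word from its neighborhood is added to K so the link
    can be re-scored; 'global' keeps that word in K permanently, while
    'local' discards it after the single re-evaluation.

    Returns (vector used for the re-evaluation, vector kept afterwards).
    """
    new_word = link_neighbourhood_words[0]    # word picked near the link
    K_for_reestimate = K + [new_word]         # used to re-score the link
    if strategy == "global":
        # permanent change, also passed on to descendants
        return K_for_reestimate, K_for_reestimate
    # local: the change is used only for this one re-evaluation
    return K_for_reestimate, K
```

This makes the precision gap plausible: under "global", words harvested from irrelevant documents accumulate in K and drift the query, whereas under "local" K can only grow through keyword learning in relevant documents.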

Figure 5: Level of precision of the different runs in experiment 2.

Local Learning               Global Learning
{Giant, Wolf, Loki}          {Giant, Wolf, Norse, Loki}
{Giant, Wolf, Fenrir}        {Giant, Wolf, Juno, Hades}
{Giant, Wolf, Huge, Appear}  {Giant, Wolf, Ragnarok, Earth}
{Giant, Wolf, Tyr}           {Giant, Wolf, Romulus, Feast}
{Giant, Wolf, Woman}         {Giant, Poseidon, Cyclopes}

Table 2: Examples of changes induced in K by the two learning strategies.

5.2.3 Experiment 3: One Additional Hint
According to the legend, at the destruction of the world Vidar will kill the Giant Wolf. Given this hint, we performed a final set of runs with the following query vector: K = {Giant, Wolf, Vidar}.

Figure 6: Cumulative number of relevant documents presented to the user in experiment 3.

Figure 7: Level of precision of the different runs in experiment 3.

It is important to notice that this new word should not make the search process easier. Documents about Vidar are at the same distance from the starting point as documents about the giant wolf, so agents have to find their way into this area with the same information as in experiment 2. The difference is that the query is now more precise, although it is still incomplete due to the absence of the word Norse. In figure 6 we present a graph with the cumulative number of relevant documents presented to the user. At first sight, the results are similar to those obtained with the second query. The small increase in the number of documents presented is directly related to the document about Vidar (the percentage of agents that were able to find the area with the relevant documents is equivalent to that obtained in the previous experiment). The most important difference is the large decrease in the number of irrelevant documents presented: the graph in figure 7 shows a substantial increase in precision in all the different runs. This is a direct

consequence of the query words. In this final experiment the question is not so generic and agents are able to distinguish, from the beginning of the search, which areas seem most promising. This final result reveals that, although learning is able to help agents to modify their query in order to increase performance, initial longer queries (even if they are not complete) improve precision and prevent the user from receiving a large number of irrelevant information.
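As a reminder of the metric plotted in figures 5 and 7, precision is the fraction of documents presented to the user that are actually relevant. A minimal sketch follows; the retrieval counts are illustrative, not values taken from the experiments:

```python
def precision(relevant_presented, total_presented):
    """Fraction of presented documents that are relevant to the query."""
    if total_presented == 0:
        return 0.0
    return relevant_presented / total_presented

# e.g. a run that presents 40 documents of which 30 are relevant:
p = precision(30, 40)  # 0.75
```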

6 Conclusions and Further Work

Multi-agent adaptive systems are a promising approach to searching for information in distributed and dynamic environments. In this paper we studied how lifetime learning helps IR agents deal with incomplete queries in such environments. We proposed two different learning procedures to modify the agents' behavior. Experimental results revealed that their effects are complementary: one of them (link learning) is very useful in helping agents gain a better understanding of the information environment and access some syntactic clues when they feel "lost". The other one (keyword learning) is directly responsible for the addition of new words to the query.

The results presented here are part of on-going research. Further tests with different queries and in different information environments are needed to confirm our results. We also intend to study several new situations, such as the presence of noise in the query or the case where the interests of the user change over time, which will add another degree of complexity to the environment.

In the future we also intend to study the possible emergence of mechanisms of explicit cooperation and/or competition. Currently, interactions between agents living in the AL world are indirect: they search different areas of the information environment and, since they compete for a finite set of resources, only well-adapted agents are able to survive and reproduce. To enable explicit interactions between agents we need to give them the ability to communicate, which might itself be subject to evolution and learning.
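The kind of query modification keyword learning performs (cf. Table 2) can be sketched as follows. This is a hypothetical illustration only; the paper's actual selection and replacement rules are not reproduced, and all names here are ours:

```python
from collections import Counter

def keyword_learn(query, relevant_doc_text, max_size=4):
    """Add the most frequent unseen term of a relevant document to the query.

    Hypothetical sketch: a frequent content word from a document judged
    relevant extends the query vector K, bounded to max_size terms.
    """
    terms = Counter(t for t in relevant_doc_text.lower().split()
                    if t not in query and len(t) > 3)
    if not terms:
        return set(query)
    new_word = terms.most_common(1)[0][0]
    updated = set(query) | {new_word}
    # Keep the query bounded, dropping an arbitrary old word if needed.
    while len(updated) > max_size:
        updated.discard(next(iter(updated - {new_word})))
    return updated

K = {"giant", "wolf"}
K = keyword_learn(K, "the norse wolf fenrir was bound until ragnarok norse myth")
```

Under this sketch, a relevant Norse-mythology document would push a word such as "norse" into the query, mirroring the useful additions seen in the left column of Table 2.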

Acknowledgments

This work was partially funded by the Portuguese Ministry of Science and Technology, under Program PRAXIS XXI.

Bibliography

1. Balabanovic, M. (1997). An Adaptive Web Page Recommendation Service. In Proceedings of the 1st International Conference on Autonomous Agents.
2. Belew, R. and Mitchell, M. (1996). Adaptive Individuals in Evolving Populations: Models and Algorithms. Santa Fe Institute Studies in the Sciences of Complexity, Vol. XXVI.
3. Belkin, N. and Croft, B. (1992). Information Filtering and Information Retrieval. Communications of the ACM, Vol. 35, No. 12, pp. 29-37.
4. De Bra, P. (1995). Finding Information on the Web. CWI Quarterly, Vol. 8(4), pp. 289-306.
5. Frakes, W. and Baeza-Yates, R., Eds. (1992). Information Retrieval: Data Structures and Algorithms. Prentice-Hall.
6. Holland, J. (1975). Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, Michigan.
7. Lawrence, S. and Giles, C. (1998). Searching the World Wide Web. Science, Vol. 280, 3 April 1998, pp. 98-100.
8. Menczer, F., Cecconi, F. and Belew, R. K. (1996). From Complex Environments to Complex Behaviors. In Todd, P. (Ed.), Adaptive Behavior (Special Issue on Environment, Structure, and Behavior), Vol. 4(3), pp. 317-363.
9. Menczer, F. and Belew, R. (1998). Adaptive Information Agents in Distributed Textual Environments. In Proceedings of the 2nd International Conference on Autonomous Agents.
10. Menczer, F. (1999). Is Agent-based Online Search Feasible? In Working Notes of the AAAI Spring Symposium on Intelligent Agents in Cyberspace.
11. Moukas, A. (1996). Information Discovery and Filtering using a Multiagent Evolving Ecosystem. In Proceedings of the Conference on Practical Application of Intelligent Agents and Multi-Agent Technology, London.
12. Pereira, F. B. and Costa, E. (2000). The Influence of Learning in the Behavior of Information Retrieval Adaptive Agents. In Carroll, J., Damiani, E., Haddad, H. and Oppenheim, D. (Eds.), Proceedings of the ACM Symposium on Applied Computing (SAC-2000), Como, Italy, pp. 452-457. The Association for Computing Machinery, New York.
13. Salton, G. and Buckley, C. (1987). Term Weighting Approaches in Automatic Text Retrieval. Cornell University, TR87-881.
14. Saracevic, T., Kantor, P., Chamis, A. Y. and Trivison, D. (1987). A Study of Information Seeking and Retrieving. I: Background and Methodology. Journal of the American Society for Information Science, Vol. 39, pp. 161-176.
