© Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
A Model for Adaptive Information Retrieval

FABIO CRESTANI [email protected]
Dipartimento di Elettronica e Informatica, Universita di Padova, I-35131 Padova, Italy

CORNELIS J. VAN RIJSBERGEN [email protected]
Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, Scotland
Abstract. The paper presents a network model that can be used to produce conceptual and logical schemas for Information Retrieval applications. The model has interesting adaptability characteristics and can be instantiated in various effective ways. The paper also reports the results of an experimental investigation into the effectiveness of implementing associative and adaptive retrieval on the proposed model by means of Neural Networks. The implementation makes use of the learning and generalisation capabilities of the Backpropagation learning algorithm to build up and use application domain knowledge in a sub-symbolic form. The knowledge is acquired from examples of queries and relevant documents. Three different learning strategies are introduced, their performance is analysed and compared with the performance of a traditional Information Retrieval system.
Keywords: Information Retrieval, Neural Networks, Conceptual Modelling, Learning Strategies, Relevance Feedback.
1. Introduction

Recent research work in Information Retrieval (IR) suggests that significant improvements in retrieval performance will require techniques that, in some sense, "understand" the content of documents and queries [1]. Recently IR researchers have tried to use application domain knowledge to determine probable relationships between documents and queries. In knowledge representation one has first to choose between using a symbolic or a subsymbolic knowledge representation technique. A symbolic knowledge representation technique uses symbols to represent information: there is a clear and well defined correspondence between a symbol (or a part of the knowledge representation structure) and the kind of information the symbol is supposed to represent. In subsymbolic knowledge representation techniques this correspondence is not present. The entire knowledge representation structure (though composed of several different atomic elements) stores all the information without explicitly associating a piece of information with a particular subpart of the knowledge representation structure [2]. Symbolic domain knowledge representation approaches have many drawbacks in their application to IR. Some areas of difficulty pointed out by previous research are:
it is difficult to decide the level to which the application domain knowledge should be represented;
the knowledge of an application domain, however specific it may be, is dynamic, and therefore it must be kept up to date;
a symbolic knowledge representation reflects the knowledge of the expert (or team of experts) who built it; however, it may not reflect the user's understanding of the application domain.
The use of subsymbolic knowledge representation techniques may provide a solution to some of these drawbacks. The fact that learning is an integral part of representation in subsymbolic techniques may help to solve the problem that Feigenbaum called "the bottleneck of knowledge engineering" [3], that is, the problem of building large knowledge bases and keeping them up to date. Representing an application domain involves a big effort, in which experts of the application domain provide the knowledge to knowledge engineers, who have to represent it in a form suitable for use in the application. Although user needs are kept in mind, users rarely take part in the construction of the knowledge base. Bringing the knowledge base up to date is also time and effort consuming. The knowledge base cannot be kept continually up to date, so updating becomes a discrete process, where the interval between updates can be lengthy.

The choice of the appropriate level of representation of the application domain knowledge is difficult. An error at this level can jeopardise the effectiveness of the use of the domain knowledge. A representation of the application domain knowledge that is either too deep or too superficial results in user dissatisfaction: novice users may find it too deep to be usable, while expert users may find it too superficial. What is needed is a representation of the application domain knowledge that can adapt itself to the user's needs. An adaptive knowledge representation structure would enable the system to adapt to the level of domain knowledge required by the application.

Subsymbolic representation structures are often adaptive. Both the human nervous system, upon which Artificial Neural Networks (NN) are modelled, and the evolutionary system, upon which Genetic Algorithms are modelled, possess adaptive mechanisms, and learning itself is often seen as a sort of adaptation to the environment. Thus, subsymbolic knowledge representation structures possess in principle the ability to solve some of the problems related to the use of symbolic knowledge representation structures in IR. Some of these subsymbolic representation structures are based on an associative representation paradigm, the importance of which has long been recognised in IR [4].

The aim of this paper is to present a network model that will be used to model adaptive IR at a conceptual and logical level. The model enables the representation of the application and of the application domain knowledge.
2. Related work

In recent years several attempts have been made to apply knowledge representation techniques to IR. Most of these attempts made use of symbolic knowledge representation, where the application domain knowledge was encoded "manually" in the representation structure. Almost every symbolic knowledge representation technique developed in Artificial Intelligence was tried: rules [5], frames [6, 7, 8], spreading activation on semantic networks [9, 10, 11], and hybrid knowledge representation techniques [12]. In general the results were encouraging, and some experimental findings were implemented in operational IR systems such as TOPIC™ or ConQuest™. The problem of encoding the necessary application domain knowledge, which is a bottleneck for these applications, however, brought IR researchers to explore areas such as knowledge acquisition and machine learning.

It is only recently that attempts have been made to use NN in IR. The first experiments were performed by Mozer [13], using an interactive activation model with binary (+1 or -1) weights and no learning. Despite the simplicity of his experiments and the small scale of the test data used, Mozer stimulated further research, on both theoretical and experimental grounds. In the years that followed, a few researchers developed experimental systems using similar models but testing on larger collections of documents [14], or using a hybrid knowledge representation [15, 16]. On the theoretical side a very interesting study was reported by Wong et al. in [17]. In that paper the authors investigated the use of a linear decision function in the context of IR. The procedure they used to determine the parameters of the function was based on the gradient descent algorithm, also known as the Perceptron learning function. So, even without any explicit reference to NN in their paper, this can be considered one of the first studies on the design of an adaptive IR system. However, as a paradigm the perceptron is limited by its linear separability condition, as Minsky and Papert pointed out [18], and so it has the same problems as probabilistic relevance feedback.

A much larger scale set of experiments was performed by Belew [19, 20]. Belew's AIR (Adaptive IR) system was designed to learn through interaction with the user. AIR's interface allowed users to indicate the retrieved items they liked and did not like, suggesting where to expand or prune the search. This form of relevance feedback was used to alter the representation of the documents, so that the system would learn their representation from experience. This kind of NN learning is known as "supervised learning", because it requires a teacher that provides the system with training examples. Supervised learning was also used in other experiments, such as [21, 22]. Other attempts made use of "unsupervised learning", also referred to as self-organisation, that is, learning performed without training examples, relying only on local information and internal controls [23, 24].

Attempts to combine sound classical IR techniques with NN can be found in the work of a few researchers.
Some of them [25, 26] tried to use NN to reformulate the probabilistic model of IR with single terms as document components. Others [27] attempted to marry the vector space model with learning techniques in order to build a thesaurus-like knowledge representation structure.

Despite the large number of attempts to use NN in IR, only briefly surveyed here, we feel that some interesting work can still be done in this area. In particular, we think that most of the previous work was performed with "ad hoc" models, mostly copied from other NN application areas and crudely adjusted to IR. We feel that the area needs a conceptual study before deciding which form of learning and which learning technique to use. In the following section we propose a conceptual model that can be used to model classical IR applications and that can also be coupled with application domain knowledge automatically acquired using a supervised learning procedure.
3. The conceptual model

Scholars working in the field of IR have used network structures for various purposes since the 1960s. These structures have been employed to support browsing, clustering, spreading activation search, multiple search strategies, and the representation of user knowledge and document or query content. Models using network structures differ and are often determined by the requirements of varying and diverse functionalities. So far, no general IR network model suitable for use as a conceptual modelling tool has been developed.

The main characteristic of a network representation is that it views an object in terms of its relations with other objects. The advantage of a network representation structure resides in its expressive power. This is particularly valuable in IR, since in IR the significance of an information object can only be fully captured by considering its semantic relationships with other objects. In this light, the complexity of IR data resides in their complex semantic relationships rather than in the data themselves. It is widely recognised that further progress in the effectiveness of IR systems requires a breakthrough in our ability to capture the semantics of data. In this model the semantics of an IR object is realised by its relationships with higher level objects representing domain concepts or domain features. The structure that arises from this approach is a network, constructed in layers, that can be used both as a conceptual and logical schema of the application and as a design and prototyping tool.

It should be noticed that we are not concerned here with issues of physical storage of data. The network is seen as a conceptual/logical structure whose actual implementation can be obtained using different processing paradigms, as required for efficient access and storage of the physical data. Besides, one should bear in mind that a network structure is entirely compatible with other forms of data representation, such as vectors or matrices, and therefore most of the work already done in IR towards an efficient implementation of IR techniques can easily be used in the context of our model.
3.1. The basic network structure

The basic structure of the network is composed of objects and connections between objects. An object is anything that has its own identity or uniqueness, irrespective of whether it denotes a physical or a conceptual entity. The notions of object and of an object's identity are those of the object oriented paradigm. The use of the object oriented conceptual paradigm for modelling IR objects was introduced in [28, 29]; here it is important to bear in mind that an object can exist independently of its own characteristics or properties. This means that it is possible to establish relationships between objects directly, without any reference whatsoever to the objects' properties. A connection expresses the existence of a relation between two objects. A connection can have a weight associated with it, denoting the strength of the connection. Furthermore, a connection has a direction, and different directions within the same connection denote different relations between the two objects. Thus a connection can have different weights associated with different directions. Connections can be joined in order to relate objects that are not directly connected. Weights associated with these new connections are a function of the weights of the single component connections. Using conventional terminology, a network can be graphically depicted using nodes, meant to represent objects, and links, meant to represent connections. The main IR objects are:

queries: a query is an expression of a user's information need. An information need can be expressed using one or more queries, and queries can be more or less complex, but a complex query can always be expressed in terms of simple elements. A set of queries forms a query collection.

query descriptors: a query descriptor is any object used to represent a query by means of the query language of the system. A query can be represented using one or more query descriptors, and the complexity of a query depends on the number and structure of the query descriptors it uses. Some query languages, such as the Boolean query language, require the use of operators to express a query, but other languages do not. The set of query descriptors provides the vocabulary that can be used to formulate a query and to express an information need, while the system provides the grammar and syntax of the query description language. A set of query descriptors forms a query representation.

documents: a document is any object (book, article, tape, picture, etc.) carrying information that can potentially satisfy a user's information need. A set of documents forms a document collection.

document descriptors: a document descriptor is an object used to describe the document information content. Usually more than one document descriptor is necessary to express the document information content. Like the query descriptors, the document descriptors allow only a poor description of the real information content of a document; they form the vocabulary of the description language (also known as the indexing language), while the syntax and grammar are determined by the system. A set of document descriptors forms a document representation.
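To make the structure concrete, a minimal sketch of how the layered network of objects and directed, weighted connections could be held in memory is given below. This is only an illustration of the conceptual structure, not part of the model itself; the class and all identifiers are ours.

from collections import defaultdict

class LayeredNetwork:
    """Illustrative sketch of the conceptual model: objects grouped into layers,
    connected by directed links, each direction carrying its own weight."""

    def __init__(self):
        self.layers = defaultdict(set)   # layer name -> set of object identifiers
        self.weights = {}                # (source, target) -> weight of that direction

    def add_object(self, layer, obj_id):
        self.layers[layer].add(obj_id)

    def connect(self, source, target, weight=1.0):
        # (a, b) and (b, a) are distinct connections and may carry different weights.
        self.weights[(source, target)] = weight

    def successors(self, source):
        # Objects directly reachable from `source`, with the associated weights.
        return [(t, w) for (s, t), w in self.weights.items() if s == source]

# Illustrative use: one query connected to two query descriptors.
net = LayeredNetwork()
net.add_object("query collection", "q1")
net.add_object("query representation", "boundary layer")
net.add_object("query representation", "heat transfer")
net.connect("q1", "boundary layer", 1.0)
net.connect("q1", "heat transfer", 0.5)
print(net.successors("q1"))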
Figure 1. Query (a) and document (b) network.
Queries and documents are usually stored outside the IR system; their structure is independent of the IR system. Query descriptors and document descriptors, instead, are part of the IR system, and their structure depends very much on its characteristics. In fact the IR system derives the query and document descriptors directly from queries and documents by means of two processes: query processing and document indexing. Both query processing and document indexing can be performed either manually or automatically.

In our network model similar objects are grouped into layers. Links can connect objects placed on the same layer or on different layers. Links connecting objects on the same layer represent relationships between similar objects; examples of this type of relationship are document citations or similarity relationships between descriptors. These kinds of connection will not be considered in this paper, although they are very important for modelling hypertextual relations [30].

The network model consists of two component networks: a query network and a document network. Figure 1(a) depicts a query network, where a link connecting a query to a query descriptor shows either that the query is expressed using the connected query descriptor, or that the query descriptor is used by that query. The two directions have different semantics, but we will not use this distinction in this paper. In the same way, Figure 1(b) depicts a document network, where a link shows either that a document is represented by a document descriptor, or the reverse.

As a result of differences in artificial and natural query languages and in query processing methods, the same information need can be expressed using different query representations, so a query layer can be connected to different query descriptor layers, as depicted in Figure 2(a). Likewise, differences in the languages used in the document collection (as in multi-lingual collections) or differences in the document indexing methods used to index the collection can cause the same document collection to be represented using different document descriptor layers, as shown in Figure 2(b). Moreover, as Figure 2(b) also shows, the same document descriptor layer can be used to represent the information content of documents of different document collections.
Figure 2. Multiple query (a) and document (b) descriptions.
So far queries and documents have been modelled as two distinct networks. At this point, however, it becomes necessary to establish a connection between the two networks to enable the retrieval of documents in response to a query. The connection between the query and document networks is achieved by a matching process. Generally speaking, the matching process associates a value S(q, d) with every query-document pair (q, d). This value is the retrieval status value, which can be seen as a measure of association, or similarity, between query q and document d. The retrieval status value can be computed as a function of every different path s(q, d) connecting the query to the document through the descriptor layers. The way the matching is performed and the way the retrieval status value is computed vary in different IR models. In traditional IR models matching is achieved by using the same descriptors in the query and document representations. Documents are retrieved and ranked only if they are described by means of at least one of the descriptors used in the query. This case can be modelled in our network representation using the same set of descriptors (and therefore the same representation layer) for queries and documents, as depicted in Figure 3. The model presented in the next section enables the conceptual modelling of traditional as well as advanced IR applications [31]. In this model we will use different representation layers for query and document descriptors.
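As an illustration of the matching process, the sketch below computes a retrieval status value S(q, d) by accumulating one contribution per path s(q, d) joining the query to the document through a shared descriptor, taking the product of the two link weights on each path. The choice of sum-of-products as the combining function, and all names and numbers, are our own assumptions; the model itself leaves the function open.

def retrieval_status_value(query_links, document_links):
    """query_links: descriptor -> weight of the query-to-descriptor link.
    document_links: descriptor -> weight of the descriptor-to-document link.
    Every shared descriptor gives one path s(q, d); here S(q, d) is the sum,
    over paths, of the product of the two weights along the path."""
    shared = set(query_links) & set(document_links)
    return sum(query_links[t] * document_links[t] for t in shared)

# Toy example: query and document described over the same descriptor layer.
q = {"boundary": 1.0, "layer": 1.0, "flow": 0.5}
d = {"layer": 0.8, "flow": 0.6, "pressure": 0.9}
print(retrieval_status_value(q, d))   # 1.0*0.8 + 0.5*0.6, approximately 1.1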
Figure 3. Matching in traditional IR models.
3.2. Intelligent Information Retrieval

A long time has passed since Luhn [32] suggested the use of statistical techniques for the representation of a document's information content. There have been many changes in the field of IR since then, and there have been incredible developments in computer hardware. However, some fundamental issues remain unsolved. In particular, the representation of textual documents and of user information needs remains one of the most problematic aspects of research in IR. It is true that statistical approaches to the analysis of text and retrieval of documents have significant advantages in terms of effectiveness compared to other techniques, but it is recognised that, at present, statistical techniques have reached the limit of their performance. The dissatisfaction with the current state of research in statistical methods is one of the main factors causing the recent upsurge of interest among the IR research community in more "intelligent" IR techniques.

The area of IR research called Intelligent Information Retrieval (IIR) draws on the overlap of research in Artificial Intelligence (AI) and IR. The application of AI results to IR is a recent phenomenon. The basic belief underlying the research in IIR is that a truly helpful IR system must, in some way, "understand" what the user is looking for. In order to do this, the system must minimally understand the document's information content in relation to the domain to which the document belongs. In fact, an obvious reason for the somewhat poor performance of traditional IR technology is the use of isolated words, or lexical items, as descriptors, which gives a poor approximation of the information content of a document. Even with the addition of a thesaurus or a phrasal lexicon, the approximation is not good enough.

It is possible to identify three distinct areas in which AI research has been applied to IR [1]. A brief description of the three application areas follows.

IR and expert systems. The central theme of this area of research is the development of an expert intermediary system. This is an expert system that assists query formulation, search strategy selection, and the evaluation of retrieved documents [33, 34, 35].
IR and natural language processing. It has been recognised that natural language processing (NLP) is an essential part of the process of identifying and representing the information content of the query and the document [36]. The aim is to produce representations that reflect more accurately the meaning of the objects represented. Higher performance should result, but there is no proof that it will. Besides, in order to perform effective NLP it is necessary to have a large amount of domain knowledge, even larger than that required for expert systems. So far, much of the traditional research in IR has concentrated on simple language analysis techniques, such as identifying word stems and phrases.

IR and knowledge representation. The main intent of this area of research is to represent and use application domain knowledge to understand the information content of documents and queries. There are two basic approaches being explored that reflect different views of IR. The first approach assumes that very detailed domain knowledge is available and that document contents are represented using this knowledge. Examples of such IR systems are GRANT [9] and RUBRIC [5, 37]. The second approach is based on conventional IR systems that can be used in more loosely defined domains. It assumes that domain knowledge is incomplete and must be incrementally acquired through interaction with users. The I3R system [7] is an example of this approach.

We agree with this last approach. We intend to make use of application domain knowledge to perform a more effective association between query and document descriptors. The domain knowledge necessary for this task is acquired incrementally through interactions with users.

3.3. Representing application domain knowledge

Studies on knowledge representation in AI have developed representation techniques that have been used experimentally in several IR prototype systems. Some examples of the representation and use of domain knowledge in IR applications will be pointed out in the sequel. Network knowledge representation structures appear to be the best way of representing domain knowledge for IR applications, since such domain knowledge captures the semantics of objects in terms of their relations with other objects. Using a network representation, the knowledge of a specific object can be enlarged or restricted by looking at its relations with other objects as a function of the knowledge level required. If, for example, we are interested only in a general understanding of an information object, then we will look at a small set of rather general information objects related to it in the network structure. If, on the other hand, we are interested in a more detailed knowledge of an information object, then we will look at a larger number of very specific information objects related to it. The knowledge level is therefore in direct relation with the two main parameters used to evaluate the performance of an IR system: recall and precision. An accurate knowledge of an object, coming from a precise identification of the object in relation to other objects, enhances the precision of the system, whereas a wider knowledge of an object, extending to other objects related to it, enhances the recall of the system.
Figure 4. Matching using application domain knowledge.
In order to add domain knowledge to our network model, it is necessary to introduce a new type of object: concepts. The semantics of this object type is more complex than that of the other objects of the network structure, and it is often necessary to use phrases to represent a concept. Descriptors, on the other hand, are often simpler objects than concepts: they are often (except in manual indexing) single terms or even stems of terms. Dictionary definitions of concepts are remarkably vague, but they have in common the abstract idea of a class of objects, particularly one derived from specific instances or occurrences. We will consider a concept as it is often defined in AI, where it denotes a structural description of examples of the objects being described. A concept carries information about a class of objects or features of the application domain.

Concepts can be used to approach the modelling of knowledge based IR (KBIR) systems. In a KBIR system the matching between query descriptors and document descriptors takes place on the concept layer, as shown in Figure 4, where query descriptors and document descriptors are connected to the concepts they describe. A concept is represented by a small circle, while relationships between concepts are represented by links connecting concepts. Links are directional (the direction is not shown in the figures) and can be labelled and/or weighted to indicate, respectively, the type and/or the strength of the relationship. The concept approach to modelling KBIR systems has some important advantages over traditional modelling:
descriptors depend upon the language and/or the symbolism used in queries and documents, while concepts are independent of language and symbolism [2]. If the matching is performed on a conceptual level, there is no need to use the same language and/or the same symbolism for query descriptors and document descriptors;
a single concept can be expressed using different descriptors, or a single descriptor can be used to express different concepts. If descriptors are terms, the significance of a concept can be expressed using different terms, and a term can have varying significance and so be associated with different concepts.
Figure 5. Multiple classification system.
The way the concept network is implemented depends on the particular knowledge representation technique chosen. The concept network is an independent part of the basic network model, and the only relationship it has with the query network and the document network lies in the links connecting descriptors to concepts. It is considered a component network of the conceptual network model.

Since the concept network, representing domain knowledge, is sometimes developed by more than one expert, it can reflect different classification systems on the same domain, and it is possible to have the situation depicted in Figure 5, in which the document network and the query network are connected via two or more different concept networks. Each concept network reflects a different encoding of the same domain knowledge. It is possible to organise the concept networks in a new layer that contains, as instances, all the knowledge representations of the same domain knowledge. This new layer is called the knowledge layer. It is possible to think of it as a sort of meta-knowledge, that is, knowledge about the different knowledge representations available for a specific application domain.

The utility of having different domain knowledge representations derives from the possibility for the user to choose the best domain knowledge representation or classification schema for the particular domain. Sometimes it is not easy to identify the best application domain knowledge, and it is difficult to say which classification schema is the best, even in a narrow and specific domain. Besides, some users might be familiar with a particular classification schema while other users of the same application might be familiar with another. Users might prefer to use their own classification schema, or similarly they might prefer to build their own schema. Again, if a domain knowledge representation reflects the view of a specific expert, a user might be interested in retrieving documents using the views of other experts.

Another interesting possibility is depicted in Figure 6. In this example the same query is processed on two different document collections that use two different classification schemas or domain knowledge representations.
Figure 6. Querying different collections.
In a modern environment, in which resources are often distributed over complex communication networks and can be shared by different users, it will quickly become possible to query different collections at the same time. A network structure like the one depicted in Figure 6 can help model such a situation.
4. Associative Information Retrieval using domain knowledge

We have so far described the structure of the conceptual model in detail. An important characteristic of this model is its flexibility: various processing techniques can be applied to it. They determine the way the weights on links are set and the way the retrieval status values of documents with regard to a query are computed. Each processing technique is related to a particular interpretation of the network structure. Three different approaches are examined here.

Semantic network approach
In a semantic network links specify semantic relationships between nodes and are usually labelled according to their semantic significance [38]. The semantic significance of links connecting queries to query descriptors or documents to document descriptors derives from their definition, as described in Section 3.1. In contrast, the semantic significance of links connecting descriptors to concepts is more complex, because it involves the idea of "expressing" a concept using descriptors: descriptors are expressions of concepts by means of different languages or simply by means of different terms. Links among concepts depend on the semantic association between them; if the concept network is intended to represent a classification schema, then the relationships between concepts can be the usual semantic relationships represented
in a thesaurus, and links are labelled accordingly [39]. More complex representations result from the expression of expert knowledge in the concept network. There are several processing techniques that can be applied to a semantic network. One of the most interesting is the constrained spreading activation technique [9, 40, 41], in which weights on links, assigned according to their semantics, are used to spread activation from the query to the documents. Activation spreads from the query layer to the document layer using links. Some rules determining constraints on the spreading activation are needed to prevent too many nodes from being activated. Spreading over the knowledge representation network adds the effect of using knowledge to the activation flow. The activation value reaching a document determines the retrieval status value of that document with regard to the query.
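A minimal sketch of constrained spreading activation over such a network is given below; the decay factor, the activation threshold and the hop limit play the role of the constraints that prevent too many nodes from being activated. All parameter values and node names are illustrative and not taken from the cited systems.

def spread_activation(links, sources, decay=0.8, threshold=0.2, max_hops=3):
    """links: node -> list of (neighbour, weight) pairs, following link directions.
    sources: initially activated nodes (e.g. the descriptors of the query).
    A node propagates activation only if its own activation exceeds `threshold`,
    and activation crosses at most `max_hops` layers: these are the constraints."""
    activation = dict(sources)
    frontier = dict(sources)
    for _ in range(max_hops):
        next_frontier = {}
        for node, act in frontier.items():
            if act < threshold:
                continue                  # constraint: weakly activated nodes do not spread
            for neighbour, weight in links.get(node, []):
                gain = act * weight * decay
                activation[neighbour] = activation.get(neighbour, 0.0) + gain
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + gain
        frontier = next_frontier
    return activation

# Invented fragment: query descriptor -> concept -> documents.
links = {
    "qd:heat": [("c:thermodynamics", 0.9)],
    "c:thermodynamics": [("doc:42", 0.7), ("doc:7", 0.3)],
}
print(spread_activation(links, {"qd:heat": 1.0}))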
Associative network approach

Associative networks are characterised by the fact that links represent generic associations that do not need to be labelled, since the focus is not on their semantics but on their strength. For this reason weights are associated with links; weights can be calculated using a number of different approaches. An interesting processing technique for an associative network involves the use of parallel distributed processing (PDP) models [42]. The use of PDP models allows one to think of the three layers of Figure 4 (query, knowledge, and document representation) as a NN. The NN works as a pattern associator that learns, through training sessions, to associate activation values on query descriptors with activation values on document descriptors. The knowledge representation layer acts as the hidden layer of the NN, and it represents knowledge in an implicit (subsymbolic) way using the strengths of its links. The associations the system learns in the training sessions are stored in the weights on the links. The main advantage of this approach is that it does not require the explicit coding of knowledge in the knowledge representation network, because the knowledge is automatically acquired from user or expert feedback. Besides, there is no need to update the knowledge, because the updating is automatically performed using relevance feedback. However, despite the appeal of this approach, there are many open problems related to the implementation of these ideas, as we will discuss later in this paper.

Inference network approach
This is another interesting processing technique, proposed recently by Turtle and Croft [43, 44]. Links are interpreted as logical implications and have a measure of probability associated with them. Different processing techniques can be used on an inference network, such as those based on Bayesian probability theory or on the Dempster-Shafer theory of evidence.
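As a toy illustration only, and not the formulation of Turtle and Croft, the fragment below combines the probabilities carried by links into a node belief with a normalised weighted sum, one simple way of reading links as probabilistic implications. The topology, weights and probabilities are invented.

def node_belief(parent_beliefs, link_weights):
    """Belief of a node as the weighted sum of its parents' beliefs, with the
    weights normalised so that the result stays in [0, 1]."""
    total = sum(link_weights.values())
    return sum(link_weights[p] * b for p, b in parent_beliefs.items()) / total

# Belief that document d1 satisfies the information need, given the (invented)
# beliefs of two concept nodes activated by the query and the strengths of the
# implication links connecting them to d1.
concepts = {"c:aerodynamics": 0.9, "c:turbulence": 0.4}
strengths = {"c:aerodynamics": 2.0, "c:turbulence": 1.0}
print(node_belief(concepts, strengths))   # (2*0.9 + 1*0.4) / 3, about 0.73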
One of the major advantages of this approach is its probabilistic basis, which makes it possible to perform a deep mathematical analysis of the behaviour of the network. In the following section the associative network approach will be considered in more detail and a prototype system based on the use of NN will be presented.
5. Adaptive Information Retrieval and subsymbolic learning of domain knowledge

We decided to perform a set of experiments related to the use of a subsymbolic knowledge representation of the application domain. In order to perform an experimental analysis, at least two tools are necessary: (a) a document collection with relevance judgments, and (b) a NN or a NN simulator.

Document collection
The document collection chosen for this research is the ASLIB Cranfield test collection. This collection was built up with considerable effort in the early 1960s as the testbed for the ASLIB-Cranfield research project [45]. The project produced two collections of documents in the domain of aeronautics, consisting of documents, queries and relevance judgments. The two collections differ only in size. The larger one is made up of 1400 documents with 279 queries and the relative relevance judgments; the second one is made up of 200 documents with 42 queries and the corresponding relevance judgments. Because of operative limits of our simulation environment, only the 200-document collection has been used in this investigation. It is of course understood that this limits the generality of the results obtained. However, the main purpose of this investigation is to prove the feasibility of the proposed approach. There are still many open issues concerning the application of NN to IR in real-life applications, and the problem of scaling up the results is one of the major ones.

Simulation system
For the investigations reported in this paper a NN simulator running on a fast conventional computer has been used. The NN model employed is the classical three-layer feedforward model, in which the three layers correspond to the query representation, knowledge representation, and document representation layers of the conceptual network model previously described. The learning rule used to train the NN was the Back Propagation (BP) learning rule [42], an error-correcting rule of general use in pattern association applications of NN. A simulation system (SS) was developed. It is composed of the following components:
Query Processor: it transforms a query expressed using descriptors into a binary vector (1 indicates the presence of the descriptor in the query and 0 its absence).
NN Simulator: a simulated three-layer feedforward NN using the BP learning rule.
Matcher: it evaluates the similarity between two binary vectors using Dice's coefficient [46] and produces a value indicating the query-document representation similarity, i.e. the document retrieval status value.
Document Processor: it transforms documents, which are represented using descriptors, into a binary vector representation to make it possible to evaluate the query-document similarity.

The experiments reported in Section 6 were performed on this system. Every experiment is composed of two phases: a training phase and a retrieval phase. The structure of the system during the training phase is depicted on the left of Figure 7. During the training phase no use is made of the Matcher. As can be seen in the figure, on one side the SS gets the query in the form of a set of descriptors, while on the other side it gets a set of descriptors representing the relevant document(s). The Query Processor transforms the query into a binary vector whose dimension is that of the input layer of the NN, while the Document Processor does the same for the relevant document(s). The input and output layers of the NN are set to represent queries and relevant document(s), and the BP algorithm is used for the learning. This is monitored by the NN simulator control structure and, when some predetermined condition (e.g. error level or number of learning cycles) on the learning is met, the learning phase is halted. Link matrices are produced, expressing in the weights associated with links the application domain knowledge acquired during the training phase. They are stored for further use during the retrieval phase. In the learning phase the SS can be fed with queries and documents according to various teaching strategies, as will be explained in Section 6.

During a retrieval phase the components of the SS interact with each other as depicted on the right of Figure 7. As shown in the figure, on one side the SS is fed with a query and, after the Query Processor has transformed the query into a binary representation, the NN is activated. The activation spreads from the input layer to the output layer using the weight matrices produced during the training phase. The vector representing the query is modified, or rather adapted, to the application domain knowledge, and a new query representation vector is produced on the NN simulator's output layer. On the other side, the entire collection of documents is transformed into a large representation matrix by the Document Processor. This matrix is then fed, together with the result of the query adaptation, into the Matcher. The Matcher produces a ranked list of document identification numbers, where the ranking reflects the similarity of the documents to the query, expressed by the retrieval status value. The user interface of the SS displays the documents.
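The roles of the Query/Document Processors and of the Matcher can be sketched as follows; the binary encoding and Dice's coefficient follow the description above, while the function names, the vocabulary and the toy data are ours.

def to_binary_vector(descriptors, vocabulary):
    """Query/Document Processor: 1 if the descriptor occurs, 0 otherwise."""
    present = set(descriptors)
    return [1 if term in present else 0 for term in vocabulary]

def dice(x, y):
    """Matcher: Dice's coefficient between two binary vectors, used as the
    retrieval status value of a document for a query."""
    common = sum(a & b for a, b in zip(x, y))
    size = sum(x) + sum(y)
    return 2.0 * common / size if size else 0.0

def rank(query_vec, document_vecs):
    """Ranked list of document identifiers by decreasing retrieval status value."""
    scores = {doc_id: dice(query_vec, vec) for doc_id, vec in document_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vocab = ["boundary", "layer", "heat", "transfer", "wing"]
q = to_binary_vector(["boundary", "layer"], vocab)
docs = {1: to_binary_vector(["boundary", "layer", "heat"], vocab),
        2: to_binary_vector(["wing"], vocab)}
print(rank(q, docs))   # document 1 (Dice 0.8) is ranked above document 2 (Dice 0.0)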
Figure 7. Schematic view of the simulation system during the training and the retrieval phases.
Evaluation
The evaluation of the performance of the SS and of the validity of our approach was performed taking into account the following aspects.

Learning performance: this is the ability of the system to acquire domain knowledge during the training phase. It is evaluated using an approach that is classical in NN research: the evaluation of the "mean error" between the training (target) results and the obtained results.

Generalisation performance: this is the ability of the system to generalise the acquired domain knowledge. To evaluate this we determined the recall and precision performance of the SS at different stages of learning, with different sets of training examples. In Figure 9 and Figure 10, for example, recall and precision graphs have been evaluated after using as training examples only 10%, 20%, or 30% of the total number of queries available in the relevance assessments. The queries were chosen randomly among the entire set of queries available in the relevance assessments. We are aware that a careful selection of these queries could have produced better results; however, we decided to stick as much as possible to the real situation, where queries are submitted to an IR system almost randomly. The graphs were evaluated using all the queries in the relevance assessments that were not used in the training phase, to highlight the generalisation effects. A line reporting the case of no training has also been plotted as a baseline. If learning of the domain knowledge has effectively taken place, and if the NN can generalise it, then an improvement in the performance obtained in the retrieval of new queries is to be expected.

Retrieval performance: this is evaluated by comparing the retrieval performance of the SS with that of a traditional IR system. The comparison enables the evaluation of the query adaptation strategy versus the use of the original query. The results of this comparison are again presented using recall and precision graphs. In this evaluation, in contrast with the previous one, recall and precision performance were evaluated using the entire set of queries (including those used in the training), to highlight the fact that in operative situations the same query could be submitted more than once.
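To make the kind of figures plotted in the recall/precision graphs concrete, the sketch below computes the precision observed when a ranked list first reaches a set of recall levels; it is a standard computation, not the authors' evaluation code, and all data are invented.

def precision_at_recall_points(ranking, relevant, recall_points=(0.25, 0.5, 0.75, 1.0)):
    """ranking: document identifiers ordered by decreasing retrieval status value.
    relevant: set of identifiers of the documents judged relevant to the query.
    Returns the precision observed at the rank where each recall level is first reached."""
    results = {}
    found = 0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            found += 1
            recall = found / len(relevant)
            precision = found / position
            for point in recall_points:
                if point not in results and recall >= point:
                    results[point] = precision
    return results

# Invented ranking with two relevant documents among the four retrieved.
print(precision_at_recall_points([3, 7, 1, 9], {7, 9}))
# {0.25: 0.5, 0.5: 0.5, 0.75: 0.5, 1.0: 0.5}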
Figure 8. Types of learning. Total Learning: for each query the entire set of relevant documents is used. Horizontal Learning: for each query a cluster representative of the entire set of relevant documents is used. Vertical Learning: only one query and only a subset of its set of relevant documents are used.
6. Experimenting with Adaptive Information Retrieval

Typically, every experiment starts from scratch, with the weights on the NN's links assigned randomly. It begins by training the system using a subset of the examples provided by the relevance assessment. The knowledge acquired by means of the training phase is then used to adapt the original user query so as to take into account the experience gained by solving similar queries in the training phase. So, after the training has been performed and its effectiveness has been evaluated (learning performance), the SS is tested to see if it is able to generalise the associations learned in the training phase and to respond correctly to the remaining examples (generalisation performance). After this, the effectiveness of the SS is tested against the effectiveness of a traditional IR system, which is based on the evaluation of the retrieval status value (RSV) between the original user query and the documents (retrieval performance). This last evaluation tests whether the solutions provided by the SS are better than those provided by methods that make no use of application domain knowledge. The following three sections report the methodology and evaluations of three different learning strategies, resulting from three different ways of teaching the SS using examples provided by the relevance assessment. Figure 8 summarises the characteristics of the different types of learning.
6.1. Total learning

The purpose of this set of experiments was to investigate the ability of the system to learn application domain knowledge from a set of training patterns of the form:

$(q_1, d^1_1), \ldots, (q_1, d^1_h), \ldots, (q_i, d^i_k), \ldots, (q_l, d^l_1), \ldots, (q_l, d^l_m)$

where $(q_i, d^i_s)$ is a single training pattern made of a query and a document known, from the relevance assessment, to be relevant to that query.
Table 1. Learning results for Total Learning with 25 queries together.

Hidden units   Learning cycles   Mean error
100            300               0.009102
100            600               0.009079
100            900               0.009136
200            300               0.009006
The set of training patterns is made up of all the documents relevant to a set of queries, the latter being a subset of all the queries present in the relevance assessment. Each training pattern is treated by the NN module of the SS as a pattern to be learned.
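A sketch, under our own naming conventions, of how such a Total Learning training set could be assembled from the relevance assessment: one (query, relevant document) pattern per relevance judgment, with each side encoded as a binary descriptor vector.

def binary(descriptors, vocabulary):
    present = set(descriptors)
    return [1 if term in present else 0 for term in vocabulary]

def total_learning_patterns(queries, documents, relevance, q_vocab, d_vocab):
    """queries / documents: identifier -> list of descriptors.
    relevance: query identifier -> identifiers of its relevant documents.
    Returns one (query vector, document vector) pattern per (query, relevant
    document) pair, which is the training set used in Total Learning."""
    patterns = []
    for q_id, relevant_ids in relevance.items():
        q_vec = binary(queries[q_id], q_vocab)
        for d_id in relevant_ids:
            patterns.append((q_vec, binary(documents[d_id], d_vocab)))
    return patterns

# Invented toy data: one query with two relevant documents gives two patterns.
queries = {"q1": ["shear", "buckling"]}
documents = {"d1": ["shear", "panel"], "d2": ["buckling", "load"]}
relevance = {"q1": ["d1", "d2"]}
print(len(total_learning_patterns(queries, documents, relevance,
                                  ["shear", "buckling"],
                                  ["shear", "panel", "buckling", "load"])))   # 2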
Learning results

It was observed that the learning results were good for queries with a small set of relevant documents and gradually got worse as the size of the set of relevant documents per query increased. This is in complete accordance with NN theory: the larger the set of associations (or patterns) to be learned, the bigger the error. The mean error is a measure of the difference between the training set and the retrieved set. In the present context this error can be seen as the difference between two sets of documents: the first set is composed of the documents that experts in the application domain associate with the query, and is considered the target of the learning; the second set is the one actually given by the SS as a response when that query is submitted. The higher the error, the bigger the difference between these two sets, and so the difference between the desired and the real response of the SS. With the target being determined by the relevance assessments, this evaluation can be considered "objective", because it evaluates the SS against the best results it could possibly give. It does not take into consideration the user-perceived relevance of documents to the query. The same interpretation applies to the learning performance of the SS whatever the learning strategy employed. A comparison of the figures obtained for various experiments with different NN structures suggested the use of the following settings:

the number of input and output units is equal to the number of descriptors used, respectively, in the queries and in the documents, that is 195 input units and 1142 output units;

only one hidden layer has been used because, as explained by the Kolmogorov theorem [47], there is no need for more than three layers to be theoretically able to separate classes of arbitrarily complex shape;

the number of learning cycles performed during the learning phase was set to 300, because we observed that the improvement gained by performing a larger number of learning cycles (see Table 1) was not worth the time and computational resources used in obtaining it;

the number of hidden units was set to 100, bringing the number of connections whose weights have to be evaluated at each pass to 22,269,000; using 200 hidden units brought a higher computational effort without a considerable improvement of the learning (see Table 1) and generalisation performance.
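For illustration only, a minimal three-layer feedforward network with the dimensions just given (195 input, 100 hidden and 1142 output units), trained by plain backpropagation on binary query/document patterns. This is a textbook formulation and not the simulator actually used; the learning rate, the initialisation and the random toy pattern are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, N_OUT = 195, 100, 1142          # dimensions reported above

# Small random initial weights, as in a standard backpropagation set-up.
W1 = rng.normal(scale=0.1, size=(N_IN, N_HID))
W2 = rng.normal(scale=0.1, size=(N_HID, N_OUT))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pattern(q_vec, d_vec, lr=0.1):
    """One backpropagation step on a (query, relevant document) pattern."""
    global W1, W2
    h = sigmoid(q_vec @ W1)                   # hidden (knowledge) layer activation
    o = sigmoid(h @ W2)                       # output (document descriptor) layer
    err_o = (d_vec - o) * o * (1.0 - o)       # output error times sigmoid derivative
    err_h = (err_o @ W2.T) * h * (1.0 - h)    # error backpropagated to the hidden layer
    W2 += lr * np.outer(h, err_o)
    W1 += lr * np.outer(q_vec, err_h)
    return float(np.mean((d_vec - o) ** 2))   # mean squared error over the output units

def adapt_query(q_vec):
    """Retrieval phase: propagate the query through the trained network to obtain
    the adapted query on the document descriptor layer."""
    return sigmoid(sigmoid(q_vec @ W1) @ W2)

# Illustrative training cycle on one random binary pattern.
q = (rng.random(N_IN) < 0.05).astype(float)
d = (rng.random(N_OUT) < 0.02).astype(float)
for _ in range(10):
    mean_error = train_pattern(q, d)
print(round(mean_error, 4))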
Figure 9. Generalisation and retrieval performances of Total Learning (precision/recall graphs; the generalisation graph compares 10%, 20%, and 30% training against no training, the retrieval graph compares 10% and 20% training against a conventional IRS).
Generalisation results

This set of experiments aimed at investigating the capability of the SS to perform generalisation on the trained associations between query descriptors and document descriptors. The generalisation should enable the SS to deal with a new query by associating to it an appropriate set of document descriptors. This provides a modification, or adaptation, of the user query to the application domain. Of course, the larger the number of queries used in the training, the easier it is for the SS to find among them a query similar to the new one. However, the larger the set of training patterns used in the training phase, the higher the error and therefore the lower the performance of the learning. It is therefore interesting to see how the generalisation varies with different numbers of queries used in the training phase. The results obtained for various dimensions of the training set are reported on the left of Figure 9. The terms n% training reported in that figure refer to the percentage of queries used in the training over the entire set of queries available in the relevance assessments. The figures refer to precision and recall values obtained by testing with the entire set of queries after the SS was trained using only n% of them. As can be noticed from the graph, the precision and recall values were higher when the SS received a larger training. These results show that the generalisation capabilities of the NN are better when there is more ground on which to base them. For comparison, the precision and recall figures obtained for the case of absence of training were just those that we would expect if the descriptors were chosen randomly.
Retrieval results
After testing the ability of the SS to perform the kind of generalisation required for dealing with new queries, the performance of the SS was compared with that achieved by a traditional IRS. The performance of the SS was evaluated at different stages of training. The results are depicted on the right of Figure 9. There are three recall/precision graphs reported in that figure. The first was obtained from a conventional IRS based on the use of Dice's coefficient of similarity between the original query and the documents of the entire collection. The second and third were obtained by evaluating the same similarity measure between the documents and the adapted query, at two different stages of training. The graph shows that at reasonable levels of training, that is when only 10% or 20% of the total number of queries in the relevance assessment were used in the training, the SS still performed very poorly when compared with an operational IRS. The retrieval performance was not considered to be at an acceptable level. The most probable explanation for these bad results lies in the structure of the training patterns. These patterns are designed so that the input is constant over the entire set of patterns. In this case the NN has to learn to associate different outputs with the same input, which is not an ideal situation. The association of different outputs with the same input generates noise in the encoding, which makes it difficult for the NN to detect similarities among the patterns and makes it very difficult to perform generalisation.
6.2. Horizontal learning

A new set of experiments was performed using training patterns of the following form:

$(q_1, c_1), \ldots, (q_i, c_i), \ldots, (q_l, c_l)$

where $(q_i, c_i)$ is a single training pattern made of a query and the cluster representative of the set of documents known to be relevant to that query. A cluster representative is simply an object that represents all the objects in the cluster. There are many different procedures for obtaining a cluster representative, but not all of them are useful in IR. The procedure used here is referred to as "centroid" evaluation, and it is the most widely used in IR. However, since the documents were represented in binary form, a variant of this procedure was used. This consists of determining a vector $c_i = (c_{i1}, c_{i2}, \ldots, c_{in})$, whose dimension is the same as that of the document vectors, and which is representative of all the documents relevant to the query $q_i$. The generic jth element of this vector is determined as follows:
$c_{ij} = \begin{cases} 1 & \text{if } \sum_{l=1}^{n} d^i_{l,j} > 1 \\ 0 & \text{otherwise} \end{cases}$

where $d^i_{l,j}$ is the jth element of the binary representation of the lth document relevant to query i. The intuition is that descriptors occurring more than once in the cluster should be considered representative of the cluster.

The motivation for performing such an experiment came from an analysis of the results obtained with TL (see Section 6.1). The use of training patterns where the input is constant while the output varies (a 1-to-many mapping) causes the NN to be subject to too much noise to be able to generalise what it has learned. Indeed, the BP algorithm has not been designed to deal with such a situation. A possible way of avoiding this problem is to use some kind of "synthesis" of the characteristics of the set of documents relevant to the query. This is equivalent to using a single document representation for each query in the training phase, thus having a single different output for each input. This unique document representation should characterise all the relevant documents for that query. The most common way of obtaining such a representation is by clustering the set of documents in order to produce a cluster representative that summarises and represents the objects of the cluster.
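The cluster representative just defined translates directly into the following sketch; the function name and the toy data are ours.

def cluster_representative(relevant_doc_vectors):
    """Binary 'centroid' of the documents relevant to one query: the j-th element
    is 1 if the j-th descriptor occurs in more than one of the relevant documents,
    0 otherwise."""
    n_terms = len(relevant_doc_vectors[0])
    counts = [sum(doc[j] for doc in relevant_doc_vectors) for j in range(n_terms)]
    return [1 if count > 1 else 0 for count in counts]

# Three relevant documents over a five-descriptor vocabulary (invented data).
docs = [[1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 1, 1, 0, 1]]
print(cluster_representative(docs))   # [1, 1, 1, 0, 0]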
Learning results

The learning performance increased compared with that obtained for TL. The learning results again gradually worsened as the size of the set of queries used in the training increased. However, in this case the situation was consistently better than in TL. The addition of a new query to the set of training patterns adds only one single pattern, while in TL it adds as many patterns as the number of documents relevant to that query. In HL the increase in the error is linear in the increase of the number of queries used in the training, while in TL it increases more than linearly.

Generalisation results
Here the SS is using "second hand" information, so to speak. It does not use directly the descriptors used in the representation of the relevant documents, but the descriptors of the cluster representative. This implies a loss of information, but the number of training patterns decreases enormously, easing the computational problems of the NN learning and generalisation. The results obtained by evaluating the entire set of queries for various dimensions of the training set are reported on the left of Figure 10. The generalisation performance appears considerably improved. The SS shows higher values of recall and precision than those observed in TL, for each of the different levels of training. This proves that the simpler the patterns used to train the NN, the easier it is for the NN to encode them and detect similarities among them. In fact, our results show that, although the actual number of different patterns was reduced, the fact that the patterns are simpler and clearer facilitates not only their encoding but also feature detection and generalisation.
Figure 10. Generalisation and retrieval performances of Horizontal Learning (precision/recall graphs; the generalisation graph compares 10%, 20%, and 30% training against no training, the retrieval graph compares 10% and 20% training against a conventional IRS).
Retrieval results
The graph on the right of Figure 10 shows that there is an improvement in the retrieval performance of the SS due to the new type of training. However, the performance of the SS is still lower than that achieved by a traditional IRS based on the use of the original query.
6.3. Vertical learning

In VL the set of training patterns assumes the following form:

(q_i, d_{i1}), (q_i, d_{i2}), \ldots, (q_i, d_{il})

where (q_i, d_{ij}) is a training pattern made of a query representation and a relevant document representation. The set is made of only a subset (l documents) of all the documents (k documents, with k > l) known to be relevant to that single query. Different dimensions of the learning set were used. Experiments are identified by the ratio of the number of documents present in the learning set to the number of documents known to be relevant to the query. Three illustrative values of the ratio were used: 1/3, 1/2, and 2/3. It is not possible to report here how many documents these ratios correspond to, since this varied from query to query.

There are various differences between VL and TL. The main one is that VL is concerned with a single query and uses information about some documents known to be relevant to that query to find other documents relevant to that same query. The heuristic rule the SS uses is: "if these documents are relevant to this query, then these other documents must be relevant too".
In this way VL is similar to traditional IR relevance feedback. In TL, instead, the SS has to generalise the information acquired during the training phase about several queries and their relevant documents in order to find the proper query adaptation for a new user-formulated query. In TL the SS uses the following heuristic rule: "if these sets of documents are relevant to these queries, then this set of documents must be relevant to this new query". These two tasks are not mutually exclusive; they can be combined at different stages of a query session. The SS can first point out to the user a set of documents that, according to its knowledge of the application domain, appears to be relevant. Then, using relevance feedback from the user, the SS can point out other relevant documents that did not appear in the previous set. In this case the general application domain knowledge is used first to locate a preliminary set of documents considered relevant, while the more specific knowledge acquired through interaction with the user is used later to identify the relevant documents more precisely.
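As a minimal sketch of how the VL training set described above could be assembled (the function, its parameters, and the random sampling are our own illustrative assumptions, not the paper's procedure), one query representation is paired with each document in a subset of its known relevant documents:

import random

def vertical_learning_pairs(query_vec, relevant_doc_vecs, ratio=0.5, seed=0):
    # query_vec: binary descriptor vector of the query q_i
    # relevant_doc_vecs: binary descriptor vectors of the k documents known
    #                    to be relevant to q_i
    # ratio: fraction of the relevant documents used for training
    #        (the paper reports experiments with 1/3, 1/2 and 2/3)
    rng = random.Random(seed)
    k = len(relevant_doc_vecs)
    l = max(1, round(ratio * k))                      # size l of the training subset
    training_docs = rng.sample(relevant_doc_vecs, l)  # the l relevant documents shown to the SS
    # Each pattern pairs the same query representation with one relevant document.
    return [(query_vec, doc) for doc in training_docs]

The remaining k - l relevant documents are held out; ideally, after training on these pairs, the adapted query should be able to retrieve them.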
Learning results

Again, the results gradually got worse as the learning ratio increased; however, a bigger increase in the learning performance was noticed. This was due partially to the relatively small average set of patterns involved in the learning, and also to the very specific application domain context in which the training was performed.
Generalisation results

The results obtained for various dimensions of the learning ratio are reported on the left side of Figure 11. As can be seen, the generalisation improved when the SS based it on a larger amount of information. Notice that it is not necessary to provide the SS with a large amount of relevance information all at once: this information can be provided gradually, in an interactive process. The user can point to a small set of relevant documents and let the SS reorder the relevance evaluation of the entire collection according to this information, by means of a training session. Then the user can look again through the documents and identify some other relevant documents to be used, together with those provided before, for another training session. In this way the generalisation performance of the SS keeps improving and the SS identifies new relevant documents more and more precisely.
Retrieval results

The retrieval results are depicted on the right of Figure 11. It is interesting to notice that the results obtained by VL are always better than those obtained by the traditional IRS.
(Two recall-precision graphs: on the left, "Generalisation Results - Vertical Learning", with curves for no training and for training with 1/3, 1/2 and 2/3 of the data set; on the right, "Retrieval Results - Vertical Learning", with curves for the conventional IRS and for training with 1/2 and 2/3 of the data set.)
Figure 11. Generalisation and retrieval performances of Vertical Learning.
Moreover, the precision obtained with the two-thirds training is better than that obtained with the half training for low values of recall, while the situation is reversed at high levels of recall. A possible explanation is that very specific training, obtained by using a larger number of relevant documents, enables the SS to retrieve fewer documents, but mostly relevant ones. The behaviour of the SS is therefore well suited to the task: when it uses little relevance information it favours recall, while when the relevance information becomes more specific (larger) it favours precision. However, to be fair to the traditional IR technique we used as a comparison, we must notice that retrieval based on VL is closer to relevance feedback than to traditional matching retrieval: VL makes use of relevance information from the user, while the traditional retrieval technique we compared it with does not. We therefore decided to compare retrieval based on VL with a traditional form of retrieval based on relevance feedback. The next section reports some experiments comparing VL with probabilistic relevance feedback.
7. Neural and Probabilistic Relevance Feedback

Relevance Feedback (RF) is a technique that allows a user to express his information requirement better, by adapting his original query formulation with further information provided by indicating some relevant documents. RF is a very good way of specifying an information requirement because it releases the user from the burden of having to think up many terms for the query; instead, the user deals with the ideas and concepts contained in the documents. It also fits in well with the known human trait of "I don't know what I want, but I'll know it when I see it". Obviously the user cannot mark documents as relevant until some are retrieved, so the first search has to be initiated by a query. The IR system will return a list of ordered documents covering a range of topics, but probably at least one document in the list will cover, or come close to covering, the user's interest.
The user will mark the document(s) as relevant and start the RF process by performing another search. If RF performs well, the next list should be closer to the user's requirement. Another way of thinking about RF is as a filter that receives as input a query and a set of documents that are relevant (or considered so by the user), and gives as output a modified, or adapted, query. This process of query adaptation is supposed to alter the original user-formulated query to take into consideration the information provided by features of relevant documents.

In IR we can use different RF techniques, depending on the IR model being preferred. Here we decided to use Probabilistic RF. Probabilistic Relevance Feedback (PRF) is one of the most advanced techniques for performing RF in operative IR systems [46]. Briefly, the technique consists of adding a few other terms to those already present in the original query. The terms added are chosen by taking the first m terms in a list where all the terms present in relevant documents are ranked according to the following weighting function:

w_i = \log \frac{r_i \, (N - n_i - R + r_i)}{(R - r_i)(n_i - r_i)}

where: N is the number of documents in the collection, n_i is the number of documents with an occurrence of term i, R is the number of relevant documents pointed out by the user, and r_i is the number of relevant documents pointed out by the user with an occurrence of term i. Essentially, this function compares the frequency of occurrence of a term in the documents the user marked as relevant with its frequency of occurrence in the whole document collection: if a term occurs relatively more frequently in the documents marked as relevant than in the whole collection, it is assigned a high weight. In the experiments reported in this paper the number of terms added to the original query was experimentally set to 10.

We can implement Neural Relevance Feedback (NRF) using the results obtained for VL, described in Section 6.3. NRF learns from training examples to associate new terms with the original query formulation. The adapted query is produced by adding to the original query the m most highly activated terms (nodes) on the output layer; for comparison with PRF, m is set to 10. We adopted this strategy to make NRF stick more closely to the original user-formulated query: a problem found with HL and VL was that the adapted query sometimes turned out to be quite different from the original one, with some of the index terms indicated by the user lost and substituted by new ones, so that one could doubt whether some of the information contained in the original query was lost. This strategy also makes the comparison between NRF and PRF more interesting, since what we are actually comparing is two different techniques of query expansion/adaptation.

Figure 12 shows graphically how the performance of NRF and PRF increases when the SS is fed with an increasing number of relevant documents. This result shows that both PRF and NRF act like a pattern recognition device: the more information they receive, the better they can discriminate between patterns of relevant and non-relevant documents.
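The following minimal Python sketch illustrates the PRF term selection described above; the 0.5 correction added to the counts, and all function and variable names, are our own assumptions rather than details taken from the paper:

import math
from collections import Counter

def prf_expansion_terms(collection, relevant_docs, query_terms, m=10):
    # collection:    list of documents, each represented as a set of index terms
    # relevant_docs: the documents the user marked as relevant (a subset of collection)
    # query_terms:   the terms of the original query
    # m:             number of expansion terms to add (set to 10 in the paper)
    N = len(collection)
    R = len(relevant_docs)
    n = Counter(t for doc in collection for t in set(doc))     # n_i: documents containing term i
    r = Counter(t for doc in relevant_docs for t in set(doc))  # r_i: relevant documents containing term i

    def weight(t):
        ri, ni = r[t], n[t]
        # The PRF weight above, with 0.5 added to each count to avoid division by zero and log(0).
        return math.log(((ri + 0.5) * (N - ni - R + ri + 0.5)) /
                        ((R - ri + 0.5) * (ni - ri + 0.5)))

    candidates = [t for t in r if t not in query_terms]        # terms seen in the relevant documents
    return sorted(candidates, key=weight, reverse=True)[:m]    # top m terms by weight

NRF plays the analogous role by taking, instead of the top-weighted terms, the m most highly activated nodes on the output layer of the trained network.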
(Two recall-precision graphs: on the left, "Neural Relevance Feedback"; on the right, "Probabilistic Relevance Feedback"; each with curves for 1, 2, 5 and 10 relevant documents given as feedback.)
Figure 12. Performance of NRF and PRF w.r.t. the training sets.
The performance is evaluated by averaging over the entire set of queries in the relevance assessments, at different values of the number of relevant documents given as feedback. The graph shows that NRF increases in performance more rapidly than PRF. This is due to the better non-linear discrimination characteristics of NRF, which enable it to separate the two classes of patterns better. However, the performance of PRF is better at every level of training, and especially at the lower levels. This makes PRF more useful in real-world applications, where the percentage of relevant documents used in the training is usually very low compared with the number of relevant documents present in the collection.
8. Conclusions

We proposed a network model for associative adaptive IR, and we showed how it is possible to implement it using NN. We experimented with the acquisition and use of application domain knowledge for query adaptation, and we tested different learning and query adaptation strategies.

The experiments reported in this paper show that query adaptation produced by TL does not give good results. The NN is not able to learn and generalise the application domain knowledge: the amount of information submitted to the system seems to be too large and the system shows a form of "confusion". It is necessary to filter and rearrange the information to be learned.

Query adaptation produced by HL gives performance similar to that provided by the use of the original query. The interesting thing is that the adapted query is most of the time quite different from its original formulation. Accordingly, the two sets of retrieved documents resulting from the use of the original query and of the adapted one are sometimes quite different: the adapted query is often able to retrieve relevant documents that the original query is not able to retrieve. This is because the adaptation process determines descriptors not specified in the original query, but useful for identifying other documents relevant to the same information need. Why, then, is there approximately the same level of performance?
What happens is that the SS gives too much importance to the domain knowledge acquired in the training phase and modifies the query accordingly; in doing so it loses some of the information contained in the original query, which is why some descriptors specified in the original formulation of the query were dropped from the adapted query.

The results produced by VL show that it is easier to learn about a very narrow topic when there is no other knowledge that can interfere with the learning. The advantage of such a result is that it is possible to distinguish two different kinds of query: (a) a generic query, in which the user expresses a not-well-defined information need; (b) a very specific query, in which the user is also able to point out some documents he knows to be relevant. In the first case the query adaptation resulting from HL, or a combination of traditional and adaptive retrieval, can be used. The results provided by that retrieval should be good enough to enable the user to point out some relevant documents among those retrieved. Subsequently, or in the second case, the user provides a more specific formulation of his information need, either through a process of relevance feedback or by formulating the query by giving examples of relevant documents (known in IR as "query by example"). In this case the system can use this information to retrieve other relevant documents without taking into account the knowledge of the entire application domain, focusing only on that particular topic. The two situations are typical of an interactive query session and can be combined, as is done in many systems, using relevance feedback devices. We tested the use of NN to implement relevance feedback and we showed that a simple form of neural relevance feedback based on VL gives performance only slightly worse than probabilistic relevance feedback.

An added bonus of the proposed implementation is that the result of a good query session, i.e. a query specification and a set of relevant documents, can be stored and used in future training sessions. This process should keep improving the performance of the system's response to the first kind of query. This has not been proved yet, and further experimental investigations will be devoted to the analysis of the improvement of the domain knowledge base, in particular with larger document collections.

We think it is necessary to proceed with further research using different NN architectures and different learning algorithms. Currently NN do not seem well suited for use in IR applications, where the number of training examples is usually very small and the number of nodes and connections is extremely high. Another major drawback in adopting NN to model IR is the high computational complexity, primarily due to the large number of document and/or query descriptions involved. This problem must be satisfactorily resolved before a viable IR system can be implemented. Despite that, we think that future progress in hardware and new NN learning strategies will make NN more and more useful in IR and related areas.
Notes

1. For the sake of simplicity the direction is not shown in the pictures reported in this paper, since for most of the examples reported here there is no real need to show it.
2. Recall and precision are two well-known effectiveness measures in IR. They are, respectively, the ratio of the number of retrieved relevant documents to the total number of relevant documents present in the collection, and the ratio of the number of retrieved relevant documents to the total number of retrieved documents.
3. Experimental results [47] show that the larger the number of hidden units, the worse the generalisation performance of a 3-layer feedforward NN.
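In the standard set notation (our formulation, not the paper's), note 2 corresponds to:

\text{recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}, \qquad \text{precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}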
References

1. W.B. Croft. Approaches to Intelligent Information Retrieval. Information Processing & Management, 23(4):249-254, 1987.
2. D.E. Rumelhart and D.A. Norman. Representation in memory. Technical report, Department of Psychology and Institute of Cognitive Science, UCSD, La Jolla, USA, 1983.
3. P.R. Cohen and E.A. Feigenbaum. The Handbook of Artificial Intelligence, volume 3. William Kaufmann, Los Altos, CA, 1982.
4. M. Agosti, F. Crestani, and G. Gradenigo. Towards data modelling in Information Retrieval. Journal of Information Science, 25(6):307-319, 1989.
5. R.M. Tong, L.A. Appelbaum, V.N. Askman, and J.F. Cunningham. Conceptual Information Retrieval using RUBRIC. In Proceedings of ACM SIGIR, New Orleans, Louisiana, USA, June 1987.
6. M.T. Weaver, R.K. France, Q.F. Chen, and E.A. Fox. Using a frame based language for Information Retrieval. International Journal of Intelligent Systems, 4(3):223-258, 1989.
7. W.B. Croft and R.H. Thompson. I3R: a new approach to the design of Document Retrieval Systems. Journal of the American Society for Information Science, 38(6):389-404, 1987.
8. M.K. Di Benigno, G.R. Cross, and C.G. de Bessonet. COREL: a conceptual retrieval system. In Proceedings of ACM SIGIR, Pisa, Italy, September 1986.
9. P.R. Cohen and R. Kjeldsen. Information Retrieval by constrained spreading activation on Semantic Networks. Information Processing & Management, 23(4):255-268, 1987.
10. W.B. Croft, T.J. Lucia, and P.R. Cohen. Retrieving documents by plausible inference: a preliminary study. In Proceedings of ACM SIGIR, Grenoble, France, June 1988.
11. W.B. Croft, T.J. Lucia, J. Crigean, and P. Willet. Retrieving documents by plausible inference: an experimental study. Information Processing & Management, 25(6):599-614, 1989.
12. L.F. Rau. Knowledge organization and access in a conceptual information system. Information Processing & Management, 23(4):269-283, 1987.
13. M.C. Mozer. Inductive Information Retrieval using parallel distributed computation. Technical report, Institute for Cognitive Science, University of California, San Diego, USA, June 1984.
14. J. Bein and P. Smolensky. Application of the interactive activation model to document retrieval. Technical report, Department of Computer Science, University of Colorado, Boulder, 1988.
15. R.J. Brachman and D.L. McGuinness. Knowledge representation, connectionism, and conceptual retrieval. In Proceedings of ACM SIGIR, Grenoble, France, June 1988.
16. D.E. Rose and R.K. Belew. A connectionist and symbolic hybrid for improving legal research. International Journal of Man-Machine Studies, 35:1-33, 1991.
17. S.K.M. Wong, Y.Y. Yao, and P. Bollmann. Linear structure in information retrieval. In Proceedings of ACM SIGIR, pages 219-232, Grenoble, France, June 1988.
18. M.L. Minsky and S.A. Papert. Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, MA, USA, extended edition, 1988.
19. R.K. Belew. Adaptive Information Retrieval: machine learning in associative networks. PhD thesis, University of Michigan, USA, 1986.
20. R.K. Belew. Adaptive Information Retrieval: using a connectionist representation to retrieve and learn about documents. In Proceedings of ACM SIGIR, Cambridge, USA, June 1989.
21. P. Hingston and R. Wilkinson. Document retrieval using a Neural Network. Technical report, Department of Computer Science, Royal Melbourne Institute of Technology, Melbourne, Australia, 1990.
22. S.K.M. Wong, Y.J. Cai, and Y.Y. Yao. Computation of term association by a Neural Network. In Proceedings of ACM SIGIR, Pittsburgh, PA, USA, July 1993.
23. R.N. Oddy and B. Balakrishnan. PThomas: an adaptive Information Retrieval system on the Connection Machine. Information Processing & Management, 27(4):317-335, 1991.
24. J.C. Scholtes. Neural Nets and their relevance for Information Retrieval. Technical report, Department of Computational Linguistics, University of Amsterdam, The Netherlands, 1991.
25. K.L. Kwok. A Neural Network for probabilistic Information Retrieval. In Proceedings of ACM SIGIR, Cambridge, MA, USA, June 1989.
26. K.L. Kwok. Application of Neural Networks to Information Retrieval. In Proceedings of the International Joint Conference on Neural Networks, volume 2, pages 623-626, Washington, USA, January 1990. IEEE.
27. G.S. Jung and V.V. Raghavan. Connectionist learning in constructing thesaurus-like knowledge structure. AAAI Spring Symposium on Text-Based Intelligent Systems, Working Notes, March 1990.
28. M. Agosti, F. Crestani, G. Gradenigo, and P. Mattiello. An approach to conceptual modelling of IR auxiliary data. In Proceedings of the IEEE International Conference on Computers and Communications, Scottsdale, Arizona, USA, 1990.
29. D.J. Harper and A.D.M. Walker. ECLAIR: an extensible class library for Information Retrieval. The Computer Journal, 35(3):256-267, 1992.
30. M. Agosti, F. Crestani, and M. Melucci. Design and implementation of a tool for the automatic construction of hypertexts for information retrieval. Information Processing & Management, 32(4):459-476, 1996.
31. F. Crestani. A network model for Adaptive Information Retrieval. M.Sc. thesis, Department of Computing Science, University of Glasgow, Glasgow, Scotland, January 1992.
32. H.P. Luhn. A statistical approach to mechanized encoding and searching of library information. IBM Journal of Research and Development, 1:309-317, 1957.
33. R.H. Thompson. The design and implementation of an intelligent interface for Information Retrieval. Technical report, Computer and Information Science Department, University of Massachusetts, 1989.
34. H.M. Brooks. Expert systems and Intelligent Information Retrieval. Information Processing & Management, 23(4):367-382, 1987.
35. M. Pegman. RADA: an intelligent research and development advisor. Expert Systems for Information Management, 2(2):81-104, 1989.
36. A.F. Smeaton. Progress in the application of Natural Language Processing to Information Retrieval tasks. The Computer Journal, 35(3):268-278, 1992.
37. R.M. Tong, L.A. Appelbaum, and V.N. Askman. A knowledge representation for conceptual Information Retrieval. International Journal of Intelligent Systems, 4(3):259-284, 1989.
38. U. Schiel. Abstraction in Semantic Networks: axiom schemata for generalization, aggregation and grouping. SIGART Newsletter, 107:25-26, January 1989.
39. J. Aitchison and A. Gilchrist. Thesaurus Construction: A Practical Manual. ASLIB, London, 2nd edition, 1987.
40. S.E. Preece. A spreading activation model for Information Retrieval. PhD thesis, University of Illinois, Urbana-Champaign, USA, 1981.
41. G. Salton and C. Buckley. On the use of spreading activation methods in automatic Information Retrieval. In Proceedings of ACM SIGIR, Grenoble, France, June 1988.
42. D.E. Rumelhart, J.L. McClelland, and the PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, 1986.
43. H.R. Turtle and W.B. Croft. Inference networks for document retrieval. In Proceedings of ACM SIGIR, Brussels, Belgium, September 1990.
44. H.R. Turtle and W.B. Croft. Evaluation of an inference network-based retrieval model. ACM Transactions on Information Systems, 9(3):187-222, July 1991.
45. C. Cleverdon, J. Mills, and M. Keen. ASLIB Cranfield Research Project: factors determining the performance of indexing systems. ASLIB, 1966.
46. C.J. van Rijsbergen. Information Retrieval. Butterworths, London, second edition, 1979.
47. J. Hertz, A. Krogh, and R. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, New York, 1991.