IMPLEMENTATION AND EVALUATION OF A RELEVANCE FEEDBACK DEVICE BASED ON NEURAL NETWORKS

Fabio Crestani
Dipartimento di Elettronica e Informatica
Università di Padova
I-35131 Padova - Italy
Abstract

This paper presents the results of an experimental investigation into the use of Neural Networks for implementing Relevance Feedback in an interactive Information Retrieval System. The most advanced Relevance Feedback technique used in operative interactive Information Retrieval systems, Probabilistic Relevance Feedback, is compared with a Neural Network-based technique. The latter uses the learning and generalisation capabilities of a 3-layer feedforward Neural Network with the Backpropagation learning procedure to learn to distinguish between relevant and non-relevant documents. A comparative evaluation of the two techniques is reported, using an advanced Information Retrieval System, a Neural Network simulator, and an IR test document collection. The results are reported and explained from an Information Retrieval point of view.
1 Information Retrieval

Information Retrieval (IR) is a science that aims at storing and allowing fast access to a large amount of information. This information can be of any kind: textual, visual, or auditory. An Information Retrieval System (IRS) is a computing tool which stores this information so that it can be retrieved for future use. Most current IR systems store and enable the retrieval of only textual information, or documents. This is not an easy task: to give a clue to its size, it must be noticed that the collections of documents an IRS has to deal with often contain several thousands or even millions of documents. A user accesses the IRS by submitting a query; the IRS then tries to retrieve all documents that "satisfy" the query. As opposed to database systems, an IRS does not provide an exact answer, but produces a ranking of documents that appear to contain information relevant to the query. Queries and documents are usually expressed in natural language; to be processed by the IRS they are passed through a query processor and a document processor, which break them into their constituent words. Non-content-bearing words ("the", "but", "and", etc.) are discarded, and suffixes are removed, so that what remains to represent queries and documents are lists of terms that can be compared using some similarity evaluation algorithm.
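To make this pipeline concrete, the following Python sketch indexes and matches texts under deliberately simplistic assumptions: the stopword list, the crude suffix stripping, and the Dice-coefficient similarity are placeholders standing in for the real components of an IRS, not the ones used by any system discussed in this paper.

```python
# Minimal sketch of the query/document processing and matching pipeline.
STOPWORDS = {"the", "but", "and", "a", "of", "in", "to", "is"}

def to_terms(text):
    """Break a text into content-bearing terms: lowercase, drop
    stopwords, and crudely strip common English suffixes."""
    terms = []
    for word in text.lower().split():
        word = word.strip(".,;:!?")
        if not word or word in STOPWORDS:
            continue
        for suffix in ("ing", "ed", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                break
        terms.append(word)
    return terms

def similarity(query_terms, doc_terms):
    """Dice coefficient between the two term sets: one of many
    possible similarity evaluation algorithms."""
    q, d = set(query_terms), set(doc_terms)
    return 2 * len(q & d) / (len(q) + len(d)) if q and d else 0.0

def rank(query, documents):
    """Return (score, document) pairs ordered by decreasing similarity."""
    q_terms = to_terms(query)
    scored = [(similarity(q_terms, to_terms(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```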
Figure 1: Schematic view of an Information Retrieval System

Good IR systems typically rank the matched documents so that those most likely to be relevant (those with the highest similarity to the query) are presented to the user first. An example of an IR system is depicted in Fig. 1. Some retrieved documents will be relevant (with varying degrees of relevance) and some will, unfortunately, be irrelevant. In some advanced IRSs, the user can point out the documents that he considers relevant and feed them back through a process called Relevance Feedback (RF). RF modifies the original query to produce a new, improved query by adding additional query terms (see Fig. 2). As a consequence, a new ranking of documents is produced. If the IR system is interactive, this feedback process goes on until the user is happy with the resulting list of relevant documents. In recent years considerable effort has been devoted to improving the performance of IR systems, and research has explored many different directions, trying to profit from results achieved in other areas. In this paper we investigate the possibility of using Neural Networks (NN) in IR, concentrating in particular on the RF process. RF is not always present in operative IR systems, but it has been widely recognised to improve IR systems' performance considerably. The purpose of this work is therefore to investigate the possibility of using NN to implement a Neural RF. Previous research (see [1] for a survey) shows that, though giving encouraging results, NN cannot be effectively used in IR at the current state of NN technology. The scale of real IR applications, where hundreds of thousands of documents are at stake, makes it impossible to use NN in an effective way, unless we make use of very poor document and query representations. However, recent results [2] prove that it may still be possible to use NN in IR for very specific tasks where the number of patterns involved (and therefore the training) is reduced to a manageable size. In this paper we show that it is indeed possible to use NN to implement an RF device. However, the results reported here also show that a Neural RF device is not effective in real-world IR applications, where it performs worse than traditional RF techniques.
2 Relevance feedback and query adaptation

Relevance Feedback is a technique that allows a user to express his information requirement more accurately by adapting his original query formulation with further information, provided by indicating some relevant documents.
Figure 2: Schematic view of a Relevance Feedback Device

When a document is marked as relevant, the RF device analyses the document text, picking out terms that are statistically significant to the document, and adds these terms to the query. RF is a very good technique for specifying an information requirement, because it releases the user from the burden of having to think up many terms for the query. Instead, the user deals with the ideas and concepts contained in the documents. It also fits in well with the known human trait of "I don't know what I want, but I'll know it when I see it". Obviously the user cannot mark documents as relevant until some are retrieved, so the first search has to be initiated by a query. The IRS will return a list of ordered documents covering a range of topics, but probably at least one document in the list will cover, or come close to covering, the user's interest. The user will mark the document(s) as relevant and start the RF process by performing another search. If RF performs well, the next list should be closer to the user's requirement. A schematic view of an RF device is depicted in Fig. 2. We may also think of an RF device as a filter that receives as input a query and a set of relevant documents (or documents considered so by the user), and that gives as output a modified or adapted query. This process of query adaptation is supposed to alter the original user-formulated query to take into consideration the information provided by features of relevant documents. More formally, the input of the RF device has the following form:

$$(q_i;\; d^i_1, d^i_2, d^i_3, \ldots, d^i_l)$$

where $q_i$ is the representation of the query $i$, and $d^i_j$ is the representation of a document relevant to the query $q_i$. The set is composed of only a subset ($l$ documents) of all the documents ($k$ documents, with $k > l$) that are relevant to that particular query. The aim of RF is to enable the retrieval of the other ($k - l$) relevant documents. In IR we can use different RF techniques, depending on the IR model being preferred. In the following section we will illustrate a technique called Probabilistic RF, which is based on the Probabilistic IR model.
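As a purely illustrative rendering of this filter view, the sketch below assumes term-list representations and leaves the term-selection strategy pluggable, so the probabilistic and neural devices of the following sections can be seen as two instances of the same interface. All names here are hypothetical.

```python
def adapt_query(query_terms, relevant_docs, select_terms, m=10):
    """Generic RF filter: input (q_i; d_1, ..., d_l), output an adapted query.

    query_terms:   representation of the query q_i (list of terms)
    relevant_docs: representations d_1 ... d_l of the indicated documents
    select_terms:  pluggable strategy returning candidate terms, best first
    """
    candidates = select_terms(query_terms, relevant_docs)
    # Add the m best candidate terms not already in the query.
    new_terms = [t for t in candidates if t not in query_terms][:m]
    return list(query_terms) + new_terms
```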
3 Probabilistic relevance feedback

Probabilistic Relevance Feedback (PRF) is one of the most advanced techniques for performing RF in operative IR systems. For a better and more complete explanation of the mathematics involved see [3] and [4].
Figure 3: Training and spreading phases of Neural Relevance Feedback

Briefly, the technique consists in adding a few other terms to those already present in the original query. The terms added are chosen by taking the first $m$ terms in a list where all the terms present in relevant documents are ranked according to the following weighting function:

$$w_i = \log \frac{r_i \, (N - n_i - R + r_i)}{(R - r_i)(n_i - r_i)}$$

where $N$ is the number of documents in the collection, $n_i$ is the number of documents with an occurrence of term $i$, $R$ is the number of relevant documents pointed out by the user, and $r_i$ is the number of relevant documents pointed out by the user with an occurrence of term $i$. Essentially, what this rather complex function does is compare the frequency of occurrence of a term in the documents the user marked as relevant with its frequency of occurrence in the whole document collection. So, if a term occurs much more frequently in the documents marked as relevant than in the whole document collection, it will be assigned a high weight. In the experiments reported in this paper the number of terms added to the original query has been experimentally set to 10.
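As a concrete illustration, here is a minimal Python sketch of this term-selection scheme. It is not the NRT implementation: representing documents as term sets and adding a 0.5 smoothing to every count (to avoid log of zero and division by zero) are assumptions of this sketch, not stated in the paper.

```python
import math

def rsj_weight(N, n_i, R, r_i):
    """Weight of term i from the formula above, with an assumed
    0.5 smoothing on every count to keep the log well-defined."""
    num = (r_i + 0.5) * (N - n_i - R + r_i + 0.5)
    den = (R - r_i + 0.5) * (n_i - r_i + 0.5)
    return math.log(num / den)

def expand_query(query_terms, relevant_docs, doc_freq, N, m=10):
    """Add to the query the m best-weighted terms occurring in the
    relevant documents.

    relevant_docs: list of term sets, one per document marked relevant
    doc_freq:      mapping term -> n_i, number of documents containing it
    """
    R = len(relevant_docs)
    candidates = set().union(*relevant_docs) - set(query_terms)
    def weight(term):
        r_i = sum(term in doc for doc in relevant_docs)
        return rsj_weight(N, doc_freq[term], R, r_i)
    best = sorted(candidates, key=weight, reverse=True)[:m]
    return list(query_terms) + best
```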
4 Neural relevance feedback

In this work we intend to investigate the possible use of NN in designing and implementing RF. In [2, 5] an Adaptive IR System using NN was presented. We refer back to those papers for the architecture, the algorithms, and the learning performance of the system. Here we will describe only how that system can be used to perform Neural Relevance Feedback (NRF). We can perform NRF using an RF device based upon a NN. This device learns from training examples to associate new terms with the original query formulation. The NRF device acts in a way similar to classical RF. The main difference is that the weights used to order and select the terms are obtained from the output of a 3-layer feedforward NN trained using the Backpropagation (BP) learning algorithm. We use such a simple NN structure because we intended to treat the NN as a black box, in order to investigate the feasibility of the approach. After this first investigation we will go on to tune the structure to make it more effective for the problem at hand. We use as many input nodes as there are terms that can be used to formulate queries, and as many output nodes as there are terms that can be used to represent all the documents in the collection. Each node in the NN input layer represents a query term and each node in the output layer represents a document term. We use the BP learning algorithm because its very definition, as back propagation of errors, suggests its use in the RF context (see [6, 7]).
In fact, what BP does is to back propagate the difference between the desired output (the representation of a relevant document) and the actual output produced by the network. The BP learning algorithm aims to make this difference as small as possible. Fig. 3 shows that the NRF acts in two phases. During the training phase, the input and the output layers of the NN are set to represent a training example made of a query representation $q_i$ and a relevant document representation $d^i_j$. The BP algorithm is used for learning, and is repeated for every relevant document $d^i_j$, with $j = 1, \ldots, l$. The learning is monitored by the NN control structure, and when some predetermined conditions are met (such as, for example, a mean error below a certain threshold) the training phase is halted. During the spreading phase, the activation produced on the input nodes by the query spreads from the input layer to the output layer using the weight matrices produced during the training phase. A new query is produced by adding to the terms already present in the query the first $m$ (here $m = 10$) most highly activated terms (nodes) on the output layer.
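To fix ideas, the following numpy sketch reproduces the two phases for the architecture described above (layer sizes taken from the experiments in Section 6). It is an illustrative reconstruction, not the PlaNet implementation evaluated in this paper: binary term vectors, the learning rate, the weight initialisation, and the stopping threshold are all assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 195, 100, 1142   # query terms, hidden, document terms
W1 = rng.normal(0, 0.1, (n_in, n_hidden))
W2 = rng.normal(0, 0.1, (n_hidden, n_out))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(q):
    """Propagate a query-term activation vector through both layers."""
    h = sigmoid(q @ W1)
    return h, sigmoid(h @ W2)

def train(query_vec, relevant_doc_vecs, lr=0.5, threshold=0.01, epochs=1000):
    """Training phase: for each relevant document representation d_j,
    backpropagate the error between the network output and d_j until
    the mean error falls below the threshold."""
    global W1, W2
    for _ in range(epochs):
        mean_err = 0.0
        for d in relevant_doc_vecs:
            h, out = forward(query_vec)
            err = d - out
            mean_err += np.abs(err).mean() / len(relevant_doc_vecs)
            delta_out = err * out * (1 - out)             # sigmoid derivative
            delta_hid = (delta_out @ W2.T) * h * (1 - h)
            W2 += lr * np.outer(h, delta_out)
            W1 += lr * np.outer(query_vec, delta_hid)
        if mean_err < threshold:
            break

def spread(query_vec, m=10):
    """Spreading phase: propagate the query activation and return the
    indices of the m most highly activated output terms (nodes)."""
    _, out = forward(query_vec)
    return np.argsort(out)[::-1][:m]
```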
5 Experimental Settings and Evaluation

In order to perform an experimental analysis of the performance of PRF and NRF, the following tools are necessary: a document collection with relevance assessments, a probabilistic IRS with PRF, and a NN or a NN simulator on which to implement NRF. The document collection chosen for the investigation is the ASLIB Cranfield test collection. This collection was built up with considerable effort in the 60s as the testbed for the ASLIB-Cranfield research project, aimed at studying the "factors determining the performance of indexing systems". The project produced two test collections of documents about aeronautics, complete with documents, requests, and the corresponding relevance judgements. For a full description of the collections see [8]. In the investigation reported in this paper the 200-document collection, with 42 requests and relevance judgements, has been used. It is of course understood that this limits the generality of the results obtained. However, the main purpose of this investigation is to demonstrate the feasibility of the proposed approach. There are still many open issues in the application of NN to IR, and the problem of scaling the results is one of the major ones. Nevertheless, it must be noticed that the dimension of the data set used in these experiments is one of the largest used in any application of NN techniques to IR. The choice of IR systems with PRF was quite limited. In fact, there are not many operative IR systems with such an advanced feature. One of the best systems is the News Retrieval Tool (NRT) [9], developed at the University of Glasgow for the Financial Times. NRT implements PRF as specified in Section 3. For the investigations reported in this paper a NN simulator, PlaNet 5.6 [10], running on a fast conventional computer, has been used. The architecture and the algorithmic details of the implementation are reported in [2]. For the evaluation, the main retrieval effectiveness measures used in IR are Recall and Precision. These two measures have been used in evaluating and comparing the performance of PRF and NRF. Recall is the proportion of all documents in the collection that are relevant to a query and that are actually retrieved.
[Figure 4 comprises two precision-recall graphs, one for Neural Relevance Feedback and one for Probabilistic Relevance Feedback, each plotting Precision against Recall for 1, 2, 5, and 10 relevant documents fed back.]
Figure 4: Performance of NRF and PRF w.r.t. the training sets

Precision is the proportion of the retrieved set of documents that is also relevant to the query. In order to give a measure of the generalisation performance of NRF and PRF, Recall and Precision have been evaluated for different sizes of the set of training examples. If some learning of the domain knowledge has taken place, and if the NN can generalise it, then an improvement in the retrieval performance is to be expected.
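For reference, the two measures can be computed for a single query as in the short sketch below, using the set-based definitions just given; the function and variable names are illustrative.

```python
def recall_precision(retrieved, relevant):
    """Set-based Recall and Precision for one query.

    retrieved: ids of the documents returned by the system
    relevant:  ids of all documents relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Example: 3 of the 4 retrieved documents are relevant, out of 6 relevant overall.
print(recall_precision({1, 2, 3, 4}, {2, 3, 4, 7, 8, 9}))  # (0.5, 0.75)
```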
6 Experimental results

The results reported here refer to the PRF implemented in NRT and to a NRF implemented on PlaNet using a 3-layer feedforward NN with BP learning. The number of input nodes was set to 195, the number of hidden nodes to 100, and the number of output nodes to 1142. The numbers of input and output nodes correspond respectively to the number of terms used in all the queries and to the number of terms used by the documents of the Cranfield collection. The mean number of documents relevant to a query is about 6.8. Fig. 4 shows graphically how the performance of NRF and PRF increases when the RF device is fed with increasing amounts of relevant documents. This shows that both PRF and NRF act like a pattern recognition device: the more information they receive, the better they can discriminate between patterns of relevant and non-relevant documents. The performance has been evaluated by averaging over the whole set of queries in the relevance assessments, at different values of the number of relevant documents given as feedback (the training examples). The graph shows that the performance of NRF increases more rapidly than that of PRF. This is due to the better non-linear discrimination characteristics of NRF, which enable it to separate the two patterns better. However, the performance of PRF is better at every level of training, and especially at lower levels. This makes PRF more useful in real-world applications, where the proportion of relevant documents used in the training, relative to the total number of relevant documents, is usually very low. Tab. 1 reports performance figures for the case of 10% training, i.e. when 10% of the relevant documents present in the document collection are fed to the RF device for training.
Recall    Precision NRF    Precision PRF
0.10      0.245            0.440
0.20      0.234            0.363
0.30      0.225            0.332
0.40      0.201            0.301
0.50      0.179            0.275
0.60      0.160            0.259
0.70      0.132            0.239
0.80      0.111            0.221
0.90      0.051            0.210
1.00      0.022            0.130
Table 1: Comparison of the performance of Neural and Probabilistic RF for 10% training

The table shows numerically how PRF outperforms NRF in a realistic situation, where the number of training examples, i.e. the number of relevant documents fed to the RF device, is low compared to the total number of relevant documents present in the collection. This confirms what Fig. 4 already shows, and gives a stronger argument for preferring PRF to NRF in IR.
7 Conclusions

The results of this investigation, briefly summarised in this paper, demonstrate that an RF device based on NN (NRF) acts in a way similar to classical RF techniques. However, a comparison with one of the most advanced RF techniques presently used in IR (PRF) shows that for low levels of training, which is the most common case in IR, NRF does not perform as well as PRF. Though it is necessary to proceed with further research using different NN architectures and different learning algorithms, we think that NN are currently not suitable for use in IR applications, where the number of training examples is usually very small and the number of nodes and connections is extremely high.
References

[1] F. Crestani. A survey on the application of Neural Networks' supervised learning procedures in Information Retrieval. Rapporto Tecnico CNR 5/85, Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo - P5: Linea di Ricerca Coordinata Multidata, December 1991.

[2] F. Crestani. Learning strategies for an Adaptive Information Retrieval System using Neural Networks. In Proceedings of the IEEE International Conference on Neural Networks, pages 244-249, San Francisco, California, USA, March 1993.

[3] C.J. van Rijsbergen. Information Retrieval. Butterworths, London, second edition, 1979.

[4] D. Harman. Relevance feedback and other query modification techniques. In W.B. Frakes and R. Baeza-Yates, editors, Information Retrieval: data structures and algorithms, chapter 11. Prentice Hall, Englewood Cliffs, New Jersey, USA, 1992.

[5] F. Crestani. An Adaptive Information Retrieval System based on Neural Networks. In J. Mira, J. Cabestany, and A. Prieto, editors, New Trends in Neural Computation, volume 686 of Lecture Notes in Computer Science, pages 732-737. Springer-Verlag, June 1993.

[6] D.E. Rumelhart, J.L. McClelland, and the PDP Research Group. Parallel Distributed Processing: explorations in the microstructure of cognition. MIT Press, Cambridge, 1986.

[7] J. Hertz, A. Krogh, and R. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, New York, 1991.

[8] C. Cleverdon, J. Mills, and M. Keen. ASLIB Cranfield Research Project: factors determining the performance of indexing systems. ASLIB, 1966.

[9] M. Sanderson and C.J. van Rijsbergen. NRT: news retrieval tool. Electronic Publishing, 4(4):205-217, 1991.

[10] Y. Miyata. A user's guide to PlaNet version 5.6: a tool for constructing, running and looking into a PDP network. Computer Science Department, University of Colorado, Boulder, USA, December 1990.