Soft Information Retrieval: Applications of Fuzzy Set Theory and Neural Networks

Fabio Crestani¹ and Gabriella Pasi²

¹ Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, Scotland. Email: [email protected]
² Istituto per le Tecnologie Informatiche Multimediali, Consiglio Nazionale delle Ricerche, via Ampere 56, 20131 Milano, Italy. Email: [email protected]
Abstract. This paper presents a short survey of fuzzy and neural approaches to Information Retrieval. The goal of such approaches is to define flexible Information Retrieval Systems able to deal with the inherent vagueness and uncertainty of the retrieval process. In this survey we address whether and how some of these approaches have met their goal.
1. Introduction

The rapid and increasing development of the Internet, with the consequent huge availability of online textual information, makes the need for effective Information Retrieval Systems urgent. The goal of an Information Retrieval System (IRS) is to retrieve information considered pertinent to a user's query (formally expressed in the system's query language). The effectiveness of an IRS is measured through parameters which reflect the ability of the system to accomplish this goal. However, the nature of the goal is not deterministic, since uncertainty and vagueness are present in many different parts of the retrieval process. The user's expression of his/her information needs in a query is uncertain and often vague, the representation of a document's informative content is uncertain, and so is the process by which a query representation is matched to a document representation. The effectiveness of an IRS is therefore crucially related to the system's capability to deal with the vagueness and uncertainty of the retrieval process. Commercially available IRSs generally ignore these aspects; they oversimplify both the representation of the documents' content and the user-system interaction. In recent years a great deal of research in IR has aimed at modelling the vagueness and uncertainty which invariably characterise the management of information. A first class of approaches is based on methods of analysis of natural language [56]. The main limitation of these methods lies in the depth of the analysis of the language, and in their consequent range of applicability: a satisfying interpretation of the documents' meaning requires too large a number of decision rules even in narrow application domains. A second class of approaches is more general: their objective is to define retrieval models which deal with imprecision and uncertainty independently of the application domain. The most long-standing set of approaches belonging to
this class goes under the name of Probabilistic IR [20]. The aim of Probabilistic IR is to develop ad hoc models able to cope with the uncertainty of the retrieval process. However, there is another set of approaches, receiving increasing interest, that aims at applying general techniques for dealing with vagueness and uncertainty. This set of approaches goes under the name of Soft Information Retrieval. In this chapter we will review some of the approaches to Soft Information Retrieval, in particular those that make use of Fuzzy Set Theory and Neural Networks. The remainder of this chapter is structured as follows: in section 2. we present an introduction to the IR problem. In section 2.1 we give an overview of the main classical IR models, while in section 2.2 we explain the concept of Soft Information Retrieval. The core of the chapter is in sections 3. and 4., where a number of models of Soft Information Retrieval based on Fuzzy Set Theory and Neural Networks are surveyed. In section 5. we draw the conclusions of our survey and outline future directions of research.
2. Information Retrieval

Information Retrieval is a branch of Computing Science that aims at storing and allowing fast access to a large amount of information. This information can be of any kind: textual, visual, or auditory [59]. An Information Retrieval System is a computing tool which represents and stores information to be automatically retrieved for future use. Most current IR systems store and enable the retrieval of only textual information or documents. This is not an easy task: to give a clue to its size, it should be noted that the collections of documents an IRS has to deal with often contain several thousands or sometimes millions of documents.
Fig. 2.1. Schematic view of an Information Retrieval System
A user accesses the IRS by submitting a query; the IRS then tries to retrieve all documents that are "relevant" to the query. To this purpose, in
a preliminary phase, the documents contained in the archive are analysed to provide a formal representation of their contents: this process is known as "indexing". Once a document has been analysed, a surrogate describing the document is stored in an index, while the document itself is also stored in the collection or archive. To express an information need a user formulates a query in the system's query language. The query is matched against entries in the index in order to determine which documents are relevant to the user. In response to a query, an IRS can provide either an exact answer or a ranking of documents that appear likely to contain information relevant to the query. The result depends on the formal model adopted by the system. As will be explained in the next section, the Boolean model produces an exact answer, while other, more advanced models apply a partial matching mechanism, which produces a ranking of the retrieved documents so that those most likely to be relevant are presented to the user first. In some IRSs queries are expressed in natural language; to be processed by the system they are passed through a query processor which breaks them into their constituent words. Non-content-bearing words are discarded, and suffixes are removed, so that what remains to represent query and documents are lists of terms that can be compared using some "relevance evaluation" algorithm. A scheme of an IR system is depicted in Fig. 2.1.
2.1 Information Retrieval Models

The choice of the formal background used to define both the document and query representations characterises the model of an IRS. In the IR literature different models have been proposed. The Boolean model is still the one most commonly used in commercial IR systems. It is based on mathematical set theory. Here documents are represented as sets of index terms, whose role cannot be differentiated to express the information content. A query is a logical formula made up of index terms and logical connectives (e.g. AND, OR, NOT). A document is considered relevant and retrieved by the IRS if it satisfies the logical formula representing the query. The Vector Space model [50] is based on a spatial interpretation of both documents and queries. Here an improvement of the document representation over the Boolean model is obtained by associating with each index term a numeric value, called the index term weight, which expresses the variable degree of significance that the term has in synthesising the information content of the document. Similarity measures between document and query representations are then used to evaluate a document's relevance with regard to a query. The Probabilistic model [59] ranks documents in decreasing order of their evaluated probability of relevance to a user's information need. Past and present research has made much use of formal theories of probability and of
statistics in order to evaluate, or at least estimate, the probability of relevance. Without going into the details of any of the large number of probabilistic models of IR that have been proposed in the literature (for a survey see [20]), if we assume that a document is either relevant (R) or not relevant (R̄) to a query, the task of a probabilistic IR system is to rank documents according to their estimated probability of being relevant, i.e. P(R | q, d). Probabilistic relevance models base this estimation on evidence about which documents are relevant to a given query. The problem of estimating the probability of relevance for every document in the collection is difficult because of the large number of variables involved in the representation of documents in comparison to the small amount of document relevance information available. The models differ, primarily, in the way they estimate this or related probabilities. Probabilistic inference models apply concepts and techniques originating from areas such as logic and artificial intelligence. The above mentioned models are the most studied ones. However, a large number of other models have been investigated and used in prototypical IRSs. Some of these models, related to the so-called Soft Computing paradigm, are addressed in this paper.
2.2 Soft Information Retrieval

In recent years great efforts have been devoted to improving the performance of IR systems, and research has explored many different directions, trying to profit from results achieved in other areas. In this paper we will survey the application to IR of two theories that have been used in Artificial Intelligence for quite some time: fuzzy set theory and connectionist (neural network) theory. The use of fuzzy set or connectionist techniques in IR has recently been referred to as Soft Information Retrieval, in analogy with the area called Soft Computing. Fuzzy set theory [64] is a formal framework well suited to model vagueness: in IR it has been successfully employed at several levels [29, 60], in particular for the definition of a superstructure of the Boolean model, with the appealing consequence that existing Boolean IRSs can be improved without redesigning them completely [10, 11, 13]. Through these extensions the gradual nature of the relevance of documents to user queries can be modelled. A different approach is based on the application of connectionist theory [48] to IR. Neural networks have been used in this context to design and implement IRSs that are able to adapt to the characteristics of the IR environment, and in particular to the user's interpretation of relevance. In the remainder of this chapter we will review a number of applications of fuzzy set theory and neural networks to IR.
3. Application of Fuzzy Set Theory to Information Retrieval
In order to increase the flexibility of IRSs, some approaches based on the application of fuzzy set theory have been defined. A fuzzy set allows the characterisation of its elements by means of the concept of "graduality"; this concept supports a more accurate description of a class of elements when a sharp boundary of membership cannot be naturally devised [64]. The entities involved in an IRS are well suited to be formalised within this framework with the aim of capturing their inherent vagueness; the main levels of application of fuzzy set theory to IR have concerned:
- the definition of extensions of the Boolean model, concerning both the representation of documents and the query language;
- the definition of associative mechanisms, such as fuzzy thesauri and fuzzy clustering.
In the following sections a short survey of these approaches is presented. Other applications of fuzzy set theory have concerned the definition of knowledge-based models of IR, and the definition of fuzzy measures for evaluating the effectiveness of IRSs in terms of recall and precision. These approaches are not described here, but the interested reader can consult, among others, the contributions in [33, 5, 6] for the former applications, and the contributions in [14, 38] for the latter.
3.1 Extended Boolean models: fuzzy document representations

A first natural extension of the Boolean model is to represent a document as a fuzzy set of terms, thus making the description of the document's information content more accurate [46]. For each term associated with a document a numeric weight is specified (the membership degree), which expresses the degree to which the term is concerned with the information contained in the document; formally, the function defining the relation between documents and terms is defined as F: D × T → [0, 1]. A document is then represented as a fuzzy set of terms, {μ(t)/t}, in which μ(t) = F(d, t). The fuzzy document representation is thus based on the definition of a weighted indexing function, which for each term-document pair produces a numeric value, the so-called index term weight. As in the vector space model and in the probabilistic model, the use of index term weights enables the retrieval mechanism to rank documents in decreasing order of their relevance to the user query, the relevance being expressed by a numeric score, the so-called Retrieval Status Value. The quality of the retrieval results strongly depends on the definition of the adopted weighting function; the original proposal in the literature defined this function by means of a count of term occurrences in the document and in the whole archive [50].
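As a concrete illustration, the sketch below derives membership degrees F(d, t) from term occurrence counts in the document and in the archive, in the spirit of the occurrence-based weighting of [50]. The normalisation scheme (maximum-tf times a logarithmic idf factor, rescaled into [0, 1]) is our own assumption, not the exact function proposed there.

```python
import math
from collections import Counter

def index_term_weights(doc_terms, collection):
    """Compute F(d, t) in [0, 1] for every term of a document from raw
    occurrence counts (a hypothetical tf-idf-style membership function)."""
    n_docs = len(collection)
    tf = Counter(doc_terms)            # term occurrences in the document
    max_tf = max(tf.values())
    weights = {}
    for term, freq in tf.items():
        df = sum(1 for d in collection if term in d) or 1  # document frequency
        idf = math.log(1.0 + n_docs / df)                  # rarer terms weigh more
        weights[term] = (freq / max_tf) * idf
    top = max(weights.values())
    return {t: w / top for t, w in weights.items()}        # rescale into [0, 1]

docs = [["fuzzy", "retrieval", "fuzzy"], ["boolean", "retrieval"]]
print(index_term_weights(docs[0], docs))  # {'fuzzy': 1.0, 'retrieval': ~0.32}
```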
This definition, however, does not take into account that the information in documents is often structured: a scientific paper, for example, is organised into sections such as title, authors, keywords, abstract, references, etc. In such a configuration of a text, a single occurrence of a term in the title suggests that the paper is fully concerned with the concept expressed by the term, while a single occurrence in a footnote indicates that the paper refers to other publications dealing with that concept. It is evident that the information carried by a term occurrence depends on the semantics of the section where it is located. Besides this semantic dependency, the sections of a document may assume a different importance on the basis of users' needs. For example, when looking for papers written by a given person, the most important subpart to analyse is the authors section, while when looking for papers on a given topic, the title, keywords, abstract and introduction should be analysed first. To address this problem Bordogna and Pasi have proposed a fuzzy representation of structured documents, which can be biased by the user's interpretation [9]. The significance of a term t in a given document d is computed by first evaluating the significance of t in each of the n sections; this is done by means of the application of a function Fci, which has to be defined for each section ci (Fci(d, t) denotes the significance degree of term t in section ci of document d). Moreover, with each document section the user can associate a numeric importance in the interval [0, 1], which is used in the aggregation phase to emphasise the role of the Fci(d, t) values of important sections with respect to those of less important ones. The significance degrees Fc1(d, t), ..., Fcn(d, t) are then aggregated by means of a function, which can be selected by the user from a predefined set of linguistic quantifiers: all, at least one, or almost k. The linguistic quantifier indicates the number of document sections in which a term must be present to be considered fully significant. Linguistic quantifiers have been formalised by means of OWA operators [61]. This fuzzy representation of structured documents has been implemented and evaluated, showing that it improves the effectiveness of a system with respect to the use of the traditional fuzzy representation [9]. Molinari and Pasi have proposed an approach to index documents written in HTML (HyperText Markup Language), in which another kind of structure is exploited, based on the syntactic structure of the language [40]. The basic assumption is that, when writing a document in HTML, one associates a different importance with different document subparts by delimiting them with appropriate tags. For example, if characters of different dimensions are used inside the text, the bigger the dimension, the more important the information carried by the text; putting words in bold or italics generally means highlighting a portion of the text with respect to others. Tags can then be seen as elements carrying the author's explicit indication of the importance of the associated text. Based on this assumption an indexing function has been proposed, which computes the significance of a term in a document by taking into account the different role
of term occurrences according to the importance of the tags in which they appear. The significance degree of a term in a document is obtained by first computing the significance degrees of the term inside the different tags, and then by aggregating these values, taking into account the different importance of the tags. A ranking of tags based on their importance makes it possible to assign a different numeric weight to each tag, and consequently the contribution of a word occurring in a tag is modulated by this importance weight.
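A minimal sketch of how per-section (or per-tag) significance degrees might be combined, assuming a simple importance-weighted mean; the cited models actually aggregate with linguistic quantifiers formalised as OWA operators, so this is only an approximation of the idea.

```python
def structured_significance(section_degrees, section_importance):
    """Aggregate per-section significance degrees F_ci(d, t) into a single
    F(d, t), biased by the user- or tag-supplied importance of each
    section. A weighted mean is used here purely for illustration."""
    num = sum(section_importance[c] * deg for c, deg in section_degrees.items())
    den = sum(section_importance[c] for c in section_degrees)
    return num / den if den > 0 else 0.0

# e.g. a term occurring strongly in the title but barely in the footnotes
degrees = {"title": 0.9, "abstract": 0.6, "footnotes": 0.1}
importance = {"title": 1.0, "abstract": 0.8, "footnotes": 0.2}
print(structured_significance(degrees, importance))  # 0.70
```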
3.2 Extended Boolean models: fuzzy extensions of the query language

A query formulated in the Boolean language can be seen as the specification of a set of selection criteria connected through the Boolean operators AND and OR. A selection criterion is the elementary block for requesting information; in the Boolean language it is constituted by a term, which is selected by a user as the carrier of the concept synthesising the information he/she is looking for. To express more structured requests, the elementary selection criteria can be related by means of the aggregation operators. The fuzzy extensions of the Boolean query language proposed in the literature have concerned different levels. A first level of extension was directed at the selection criteria. This is done by defining selection criteria as term-importance weight pairs, in which the weight specifies the importance of the search term in the desired documents.

Numeric query weights

Importance weights were first formalised as numeric values, which specify a constraint to be satisfied by the fuzzy representation of documents in the indexed archive. The function g matching a selection criterion ⟨t, w⟩ against a document d is defined as g: [0, 1] × [0, 1] → [0, 1]. The value g(F(d, t), w) is the degree of satisfaction of the selection criterion ⟨t, w⟩ by document d. The satisfaction of the separability property by an IRS makes it possible to evaluate a query from the bottom up, evaluating a given document against each weighted term in the query and then combining those evaluations according to the query structure [60]. The nature of the constraint imposed by the weighted selection criterion depends on the semantics associated with the weight; in the literature different semantics for query weights have been proposed. The weight can be interpreted as an importance weight, as a threshold, or as the description of an "ideal" document. The semantics adopted for query weights implies a different definition of the function g. Some authors, among which Radecki, Bookstein, and Yager, have interpreted query weights as indicators of the relative importance among terms in a query [7, 46, 62]. The problem with this semantics, however, is its dependence on the type of aggregation operator which connects pairs of selection criteria. When using an AND, for example, a very small value of w
for one of the two terms will dominate the min function and force a decision based on the least important (smallest w) term, which is just the opposite of what is desired by the user [29]. To overcome this problem, the proposed definitions of the g function violate the property of separability [60]. Another semantics for query weights is to interpret them as thresholds that indicate the minimum acceptance level of the term significance degree for a document to be selected. Radecki first proposed the following simple definition of the g function, based on an α-level meaning: g(F(d, t), w) = 0 for F(d, t) < w, while g(F(d, t), w) = F(d, t) for F(d, t) ≥ w [46]. Kraft and Buell proposed a more flexible definition, allowing some small partial credit for a document whose F(d, t) value is less than the threshold [29]. For F(d, t) < w, the function g is a measure of the closeness of F(d, t) to w, varying as the percent satisfaction of the threshold w by the weight F(d, t), while for F(d, t) ≥ w, g expresses the degree of over-satisfaction of the threshold w. A query ⟨t, w⟩ can then be interpreted as a request for the minimally acceptable document, which is the one having F(d, t) = w. Bordogna, Carrara and Pasi [11] have interpreted query weights as specifications of ideal significance degrees. Based on work by Cater and Kraft [15], they have proposed a g function which measures the closeness of F(d, t) to w: g(F(d, t), w) = exp(−K(F(d, t) − w)²). This inverse distance measure is symmetric, in that documents with an F(d, t) value greater than w are treated in the same way as documents with an F(d, t) value less than w. Based on [11], Kraft, Bordogna, and Pasi [28] have proposed a closeness measure that allows for asymmetry.

Linguistic query weights

The main limitation of numeric query weights is that they force the user to quantify the qualitative and vague concept of importance. To make the specification of a level of importance associated with the terms in a query simpler and more natural, linguistic weights have been formalised, such as important, very important, fairly important, etc. To this aim Bordogna and Pasi [8] have defined a fuzzy retrieval model in which the linguistic descriptors are formalised within the framework of fuzzy set theory through linguistic variables [65]. A ⟨t, l⟩ pair identifies a qualitative selection criterion, where t is a term and l is a value belonging to the term set of the linguistic variable Importance, which has a base variable ranging over the interval [0, 1] (the admissible values of the indexing function F). Such a query language can be employed by any IRS with a weighted document representation. To compute the degree of satisfaction of a pair ⟨t, l⟩ by a given document d, the compatibility of the index term weight F(d, t) is evaluated with respect to the constraint imposed by the linguistic query weight l. The term set of the linguistic variable can be formally generated by means of a context-free grammar; an example of such a term set is T(Importance) = {important, very important, not important, fairly important, ...}. The meaning of a linguistic value l is defined by means of a function μ_l which assesses
the compatibility of the representation of documents, i.e. the F(d, t) values, with the linguistic term l. The meanings of non-primary terms in T(Importance) are obtained by first defining the compatibility function μ_important associated with the primary term important, and then by modifying μ_important according to the semantics of the hedges. By considering linguistic weights as a "fuzzification" of numeric weights, in [9] a simple procedure has been proposed to derive the semantics of the primary term important from the semantics of numeric query weights. This procedure is based on the assumption that a linguistic query weight can be seen as the synthetic expression of a set of numeric weights; in other words, when a user asks for documents in which the concept represented by term t is important, he/she expresses a fuzzy concept on the term significance values (the F(d, t) values). The function evaluating a pair ⟨t, l⟩ has then been defined as:
μ_important(F(d, t)) = max_{w ∈ [i, j]} g(F(d, t), w)

in which the definition of the g function depends on the semantics adopted for query weights. The two values i and j, with i < j, define the range of
numeric values satisfying the linguistic constraint important. An example of a function evaluating the linguistic weight important has been proposed by Kraft, Bordogna, and Pasi as a generalisation of the threshold semantics [28].

Aggregation operators

Aggregation operators are used to combine single selection criteria to express more complex requests for information. The function e evaluating a Boolean query composed of n selection criteria is defined as e: D × [0, 1]^n → [0, 1]. The arguments of the function e are the degrees of satisfaction of the selection criteria (produced by the application of the function g), and the result of its application is the RSV of the document with respect to the query. When considering weighted selection criteria, the AND and OR connectives are interpreted as a T-norm and a T-conorm operator respectively. Usually the min T-norm and the max T-conorm are adopted. In the Boolean language the allowed connectives are AND and OR, which support crisp aggregations. For example, when evaluating a query composed of n selection criteria aggregated through AND, the matching mechanism does not tolerate the unsatisfaction of even a single criterion; this may cause the rejection of useful items. Although the fuzzy query expressions seen so far achieve a higher expressiveness than ordinary Boolean expressions, they do not reduce the complexity of Boolean logic. To address this problem, other extensions of the Boolean query language have concerned the definition of softer aggregation operators. To this aim new definitions of aggregation operators have been proposed; for example, Salton, Fox and Wu proposed a model based on a p-norm operator [49]. Hayashi [22], Sanchez [51] and Paice [44] consider "soft" Boolean operators weighted between the AND and the OR as a compromise. However, in these approaches different soft interpretations of the Boolean connectives within the same query are not supported.
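The sketch below illustrates the query-weight semantics discussed above. The crisp threshold follows Radecki's α-level definition and the ideal semantics follows the exp(−K(F(d, t) − w)²) measure given in the text; the partial-credit threshold function is a hypothetical stand-in for Kraft and Buell's definition, whose exact form is not reproduced here.

```python
import math

def g_crisp(f, w):
    """Radecki's threshold semantics: reject documents below the threshold."""
    return f if f >= w else 0.0

def g_partial(f, w):
    """Hypothetical partial-credit threshold (in the spirit of Kraft and
    Buell): closeness to w below the threshold, over-satisfaction above."""
    if f < w:
        return 0.5 * f / w                 # fraction of the threshold reached
    return 0.5 + 0.5 * (f - w) / (1.0 - w) if w < 1.0 else 1.0

def g_ideal(f, w, k=4.0):
    """Ideal-degree semantics of Bordogna, Carrara and Pasi:
    g = exp(-K * (F(d, t) - w)^2), symmetric around the ideal weight w."""
    return math.exp(-k * (f - w) ** 2)

f, w = 0.4, 0.7   # index term weight and query weight, invented for the demo
print(g_crisp(f, w), g_partial(f, w), g_ideal(f, w))
```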
Within the framework of fuzzy set theory, Bordogna and Pasi [8] have proposed a generalisation of the Boolean query language based on the concept of linguistic quantifiers: these are employed to specify both crisp and vague aggregation conditions. New aggregation operators with a self-expressive meaning, such as at least n and most of, are defined with a behaviour between those of the AND and OR connectives, which are associated with the linguistic quantifiers all and at least 1 respectively. The requirements of a complex Boolean query are more easily and intuitively formulated by using linguistic quantifiers. For example, to ask that at least 3 out of the four terms "climate", "satellite", "meteorology", "image" be satisfied, the following Boolean query would have to be formulated:

(climate AND satellite AND meteorology) OR (climate AND satellite AND image) OR (climate AND meteorology AND image) OR (meteorology AND image AND satellite)

By using linguistic quantifiers the same request can be formulated as:

at least 3 (climate, satellite, meteorology, image)

Ordered weighted averaging (OWA) operators have been used to define the linguistic quantifiers [61]. Besides the quantifier at least k, defined as a crisp threshold, other quantifiers with a vague meaning can be defined. The quantifier almost k is interpreted as a fuzzy threshold on the number of criteria to be satisfied: the user gets a certain satisfaction even when fewer than k criteria are satisfied. The quantifier more than k specifies that the higher the number of satisfied criteria above k, the higher the overall satisfaction value. The function e evaluating a query q = quantifier(q1, ..., qn) yields a value in [0, 1] for each d ∈ D; it is formalised as e(d, q) = OWA_quantifier(e(d, q1), ..., e(d, qn)), in which OWA_quantifier is the OWA operator associated with the quantifier. These aggregation operators can be applied either to unweighted or weighted terms. In [8] another extension of the query language has been introduced to specify optional selection criteria. In some cases, the order in which the selection criteria are connected through the AND operator may reflect an implicit priority among them. The selection criteria which are listed first in the query are in some sense considered essential to characterise the topics of interest, while those which appear later are considered optional. What is desired is that the retrieval results depend on the satisfaction of the essential criteria, while the overall relevance is also conditioned by the satisfaction of the optional criteria. The and possibly operator has been defined to ask for optional selection criteria in relation to essential ones. An optional criterion affects only the degree of relevance of the documents retrieved; it acts as a filter on those items which satisfy the essential criteria. For example, to express interest in documents dealing with "expert systems" (the essential criteria), while declaring a greater interest in those documents which also deal with "fuzzy" or "ANN" (the optional criteria), the following query can be formulated:
all (expert, systems) and possibly at least 1 (fuzzy, ANN)

The and possibly operator has been defined as a non-monotonic intersection [63] and provides a further level of softening of the retrieval mechanism, not discarding documents which satisfy only the essential criteria.
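A small sketch of how a crisp at least k quantifier can be realised as an OWA operator: the satisfaction degrees are sorted in decreasing order and the weight vector places all its mass on the k-th position. The example degrees are invented for illustration.

```python
def owa(values, weights):
    """Ordered weighted averaging: sort the satisfaction degrees in
    decreasing order and take their weighted sum."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

def at_least(k, n):
    """Crisp 'at least k' quantifier: all weight on the k-th largest value."""
    return [1.0 if i == k - 1 else 0.0 for i in range(n)]

# evaluate: at least 3 (climate, satellite, meteorology, image)
degrees = [0.8, 0.9, 0.4, 0.1]        # hypothetical per-term satisfactions
print(owa(degrees, at_least(3, 4)))   # 0.4, the third-best degree
```

A vague quantifier such as almost k corresponds to a smoother weight vector spreading some mass around the k-th position, so that a query is partially satisfied even when fewer than k criteria hold.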
3.3 Fuzzy Thesauri of terms

Associative mechanisms are defined to enrich an IRS so as to make it able to retrieve additional documents which are not indexed by the terms in a given query. Thesauri are an example of associative mechanisms which exploit relations among terms, and are usually employed to select terms associated with query terms. Three main types of relation among terms are generally exploited: the relation broader term (BT) is used to express that one term has a more general meaning than the entry term; the relation narrower term (NT) is the inverse relation; the relation related term (RT) is defined to exploit synonyms or near-synonyms. Fuzzy thesauri have been defined in order to express the strength of the association between pairs of terms [38, 39, 42]. The first works on fuzzy thesauri introduced the notion of fuzzy relations to represent associations between terms. Kohout, Keravanou, and Bandler [27] consider a synonym link to be a fuzzy binary relation defined in terms of fuzzy implication. The authors also define a narrower term link (where term t1 is narrower than term t2, so term t2 is broader than term t1) by means of fuzzy implication. Miyamoto and Nakayama [39] have introduced the concept of fuzzy pseudo-thesauri and fuzzy associations based on a citation index. Bezdek, Biswas, and Huang [4] generate a thesaurus based on the max-star transitive closure for linguistic completion of a thesaurus generated initially by an expert linking terms. Miyamoto has introduced the following definition of a fuzzy thesaurus [38], as a fuzzy relation. Let C be a set of concepts. Each term t ∈ T corresponds to a fuzzy set of concepts h(t) = { μ_t(c)/c | c ∈ C }, in which μ_t(c) is the degree to which term t is related to concept c. A measure M is defined on all the possible fuzzy sets of concepts, which satisfies: M(∅) = 0, M(C) < ∞, and M(A) ≤ M(B) if A ⊆ B. A typical example of M is the cardinality of a fuzzy set. The similarity between two index terms t1, t2 ∈ T is represented in a fuzzy thesaurus by the fuzzy RT relation s, defined as s(t1, t2) = M[h(t1) ∩ h(t2)] / M[h(t1) ∪ h(t2)]. The fuzzy NT relation t, which represents the grade of inclusion of a narrower term t1 in another (broader) term t2, is defined as t(t1, t2) = M[h(t1) ∩ h(t2)] / M[h(t1)]. By taking M as the cardinality of a fuzzy set, s and t are given as:
s(t1, t2) = Σ_k min[μ_t1(c_k), μ_t2(c_k)] / Σ_k max[μ_t1(c_k), μ_t2(c_k)]
t(t1, t2) = Σ_k min[μ_t1(c_k), μ_t2(c_k)] / Σ_k μ_t1(c_k)
A fuzzy pseudo-thesaurus can be defined by replacing the set C in the definition of h(t) above with the set of documents D, with the assumption that h(t) is the fuzzy set of documents indexed by term t. Thus h(t) = { ⟨d, μ_t(d)⟩ | d ∈ D }, in which μ_t(d) = F(d, t). F can take either binary values or values in [0, 1], the latter defining a fuzzy representation of documents. The fuzzy RT and fuzzy NT relations are then defined as:
s(t1, t2) = Σ_k min[F(t1, d_k), F(t2, d_k)] / Σ_k max[F(t1, d_k), F(t2, d_k)]

t(t1, t2) = Σ_k min[F(t1, d_k), F(t2, d_k)] / Σ_k F(t1, d_k)
The values s(t1, t2) and t(t1, t2) are obtained on the basis of the co-occurrences of terms t1 and t2 in the set D of documents.
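A minimal sketch of these co-occurrence-based relations, assuming the index term weights are available as a mapping from (term, document) pairs to [0, 1]; the toy weights below are invented.

```python
def fuzzy_relations(F, t1, t2, docs):
    """Fuzzy RT degree s(t1, t2) and fuzzy NT degree t(t1, t2) computed
    from term-document weights F, here a dict keyed by (term, document)."""
    mins = sum(min(F[t1, d], F[t2, d]) for d in docs)
    maxs = sum(max(F[t1, d], F[t2, d]) for d in docs)
    self1 = sum(F[t1, d] for d in docs)
    rt = mins / maxs if maxs else 0.0    # symmetric relatedness
    nt = mins / self1 if self1 else 0.0  # degree to which t1 is narrower than t2
    return rt, nt

# toy weights: "fuzzy" mostly co-occurs with "soft"
F = {("fuzzy", 1): 0.8, ("fuzzy", 2): 0.0,
     ("soft", 1): 0.6, ("soft", 2): 0.9}
print(fuzzy_relations(F, "fuzzy", "soft", docs=[1, 2]))  # (~0.35, 0.75)
```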
3.4 Fuzzy Clustering of Documents

Clustering in information retrieval is a method for partitioning a given set of documents D into groups, using a measure of similarity which is defined on every pair of documents. The similarity between documents in the same group should be large, while it should be small for documents in different groups. The generated clusters can then be used as an index for information retrieval: documents which belong to the same clusters as the documents directly indexed by the terms in the query are also retrieved. Often, similarity measures are suggested empirically or heuristically [50]. When adopting fuzzy set theory, clustering can be formalised as a kind of fuzzy association. In this case, the fuzzy association is defined as f: D × D → [0, 1], where D is the set of documents. By assuming R(d) to be the fuzzy set of terms representing a document d, with membership function μ_d whose values μ_d(t) = F(d, t) are the index term weights of term t in document d, the symmetric fuzzy relation s, as originally defined above, is taken to be the similarity measure for clustering documents. In fuzzy clustering, documents can belong to more than one cluster with varying degrees of membership. Each document is assigned a membership value to each cluster. In pure fuzzy clustering, a complete overlap of clusters is allowed. Modified fuzzy clustering, or soft clustering, approaches use thresholding mechanisms to limit the number of documents belonging to each cluster. The main advantage of using modified fuzzy clustering is the fact that the degree
of fuzziness is controlled. Several researchers have worked on fuzzy clustering for retrieval, including Kamel et al. [26], Miyamoto [38], and De Mantaras et al. [21].
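For illustration, a minimal fuzzy c-means loop of the kind such approaches build on (not the specific algorithm of any of the cited works); thresholding the resulting membership matrix yields the "modified" soft clusters mentioned above.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, iters=100, eps=1e-9):
    """Minimal fuzzy c-means: every document (a row of the term-weight
    matrix X) gets a membership degree in each of the c clusters."""
    rng = np.random.default_rng(0)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)      # memberships sum to 1 per document
    for _ in range(iters):
        W = U ** m                         # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        inv = dist ** (-2.0 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

# soft clusters: keep only documents whose membership exceeds a threshold
# U, _ = fuzzy_c_means(X); clusters = [np.where(U[:, k] > 0.3)[0] for k in range(3)]
```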
4. Application of Neural Networks to Information Retrieval

The application of connectionist models to Information Retrieval (IR) is not a recent phenomenon. Indeed, a number of papers have appeared on this subject, and much research is in progress. However, so far, there is no operational IR system based on the connectionist model, and only recently have prototypes been proposed approaching the size of "real world" applications. Besides, many research papers claim to be about applications of NN to IR, while they are often applications of spreading activation techniques (see [19] for a few examples). In this section we summarise some of the most innovative research in the application of NN and connectionist models to IR. We are aware that a number of interesting recent works are left out of our review because of space. We decided to concentrate only on those that we considered the most important, so that we could have space to show both the innovative aspects and the drawbacks of each approach reviewed. The review is divided in two sections dealing with the two most important learning paradigms used in the NN field: supervised learning and unsupervised learning. We refer to the NN literature for a detailed explanation of these two learning paradigms (see for example [23, 55]).
4.1 Supervised Learning Techniques

A supervised learning procedure is a process which incorporates an "external teacher". This means that the teacher specifies the desired output of the NN. During the learning phase the NN adapts the values of the weights on its connections in order to obtain the desired output [54]. M. C. Mozer [41] was one of the first researchers to start working on the application of NN to IR. Some of his ideas are still research ground for many researchers. The dynamics of this model were based on McClelland and Rumelhart's interactive activation model of word perception [48]. The structure of the model is depicted in Figure 4.1. In the model each document or descriptor is represented by a unit. The activation level of a unit indicates the system's belief in the relevance of the document. Excitatory and inhibitory links permit the flow of activation from document to document and from document to descriptors. An asymmetric aspect of the model is that each document is connected with inhibitory links (with a constant weight) to all other documents, whereas descriptors do not have mutually inhibitory connections. This competitive aspect among documents helps keep their activation levels under
control during the retrieval phase and helps to control the level of associativity among documents. At an abstract level of description, the model operates as follows. The user activates a set of descriptors. These descriptors activate a set of documents, which activate a set of new descriptors, which in turn will activate a set of new documents as well as reinforce the activation of the already active ones, and so on. This flow allows descriptors in a query to indirectly suggest other descriptors that may be useful in the document search, and allows active documents to indirectly suggest other documents.
Fig. 4.1. Mozer's Inductive IR model.
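A loose sketch of the retrieval dynamics just described, assuming a single signed link matrix over document and descriptor units; the update rule (decayed activation plus propagated input, clipped to [0, 1]) is our simplification, not Mozer's exact activation function.

```python
import numpy as np

def spread(W, activation, steps=10, decay=0.9):
    """Propagate activation over a signed link matrix W whose rows and
    columns stand for document and descriptor units (+1 excitatory,
    -1 inhibitory links). A simplified, clipped update."""
    for _ in range(steps):
        activation = np.clip(decay * activation + W @ activation, 0.0, 1.0)
    return activation

# query: clamp the chosen descriptor units to 1 and let the activation
# flow suggest related documents and descriptors
# a0 = np.zeros(n_units); a0[query_descriptor_ids] = 1.0; print(spread(W, a0))
```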
The evaluation of the prototype on a small set of documents and descriptors gave good and sometimes surprising results, often retrieving documents with no descriptors in common with the query which were nevertheless clearly relevant. However, the model has some important drawbacks. First, weights on links are only +1 or -1, and they do not reflect the importance of the descriptor in the representation of the document's informative content. Second, weights are static, and there is no learning procedure which modifies them; the system therefore performs in the same way over time. Lastly, there are no links among descriptors that could represent semantic relationships among them. The relationships between descriptors are induced from their relationships with documents; the model therefore requires that the documents be indexed by a highly correlated set of descriptors. Without such an indexing scheme, the model would not be able to perform the desired induction, and it has been shown to perform, given appropriate values of the weights on the links, exactly like a Vector Space model. A few years later, J. Bein and P. Smolensky [1], understanding the importance and the potential of Mozer's research, rigorously tried to test whether Mozer's model of Inductive IR was useful for larger collections. Although they used a document collection of reasonable size, they could not produce recall and precision figures to be compared with other IRSs because they did not use a standard document collection. Nevertheless, the results achieved gave analytical and empirical evidence to support the claim that Mozer's model is feasible and efficient enough to be implemented for larger applications. In particular, the issue of feasibility, in terms of the associative characteristics of the model, was investigated in depth. The empirical results showed a good value of induced association among descriptors that are not part of the user-
specified query and among documents which are not directly associated with the descriptors in the user query. P. Hingston and R. Wilkinson [24] also continued the refinement of Mozer's ideas. The architecture of their model is, in fact, almost the same as the one proposed initially by Mozer. The major contribution of their work lies in the proposal of incorporating relevance feedback from users and in the use of a standard document collection, giving figures that could be compared with those produced by conventional IRSs. Another not dissimilar approach is that of a document retrieval system implemented on a Connection Machine (CM) by C. Stanfill and B. Kahle [57]. R. K. Belew [2, 3] investigated in depth the use of various NN techniques in an IRS called AIR. AIR's structure is made of three layers (see Figure 4.2). Nodes on the first layer represent descriptors; nodes in the second layer represent documents; nodes in the third layer represent documents' authors. Two links (one in each direction) are created between a document and each of its descriptors. Weights are assigned to these links according to an inverse frequency weighting scheme. Similarly, two links connect a document to each of its authors. The sum of the weights on all links going out from a node is forced to be a constant. This weighting scheme is recognised to be simplistic, especially when descriptors are taken only from the document's title. However, this is only an initial weighting. Weights are permanently modified from the first user session onwards, by means of relevance feedback. An initial query is composed by specifying some of the three types of features represented in the network. The query causes the activation of the nodes corresponding to the features named in the query. This activity is allowed to propagate throughout the network, and the response of the system is the set of nodes that become active above a certain threshold during this propagation. Subsequent queries are performed using relevance feedback from the user. The user is requested to evaluate the relevance of the documents that are displayed in order of their current activation. The system constructs a new query directly from this feedback. Moreover, this relevance feedback acts as a training signal to modify the document representation by changing the weights on the links. Although Belew uses a learning rule derived from the Hebbian one, he interprets it as a conditional probability. A weight wAB is considered as the conditional probability that node B is relevant given that node A is relevant. This probability is then extended inductively to include direct, transitive paths, which AIR uses extensively for its retrieval. This method enables the system to construct a representation of the collection based on the combination of two completely different sources of evidence: the word frequencies of the initial indexing and the opinions of the users. K. L. Kwok attempted to use the NN paradigm to reformulate the probabilistic model of IR with single terms as document components [30, 31]. The model proposed by Kwok is represented in Figure 4.3. It is a three-layer
Fig. 4.2. AIR's structure.
NN, with bi-directional and asymmetric connections, where no connections are allowed between units on the same layer. Units on the first layer, which represent queries, may receive an external input and are connected to the units in the second layer, which represent index terms. The second layer is considered a "hidden" layer, as in the classical three-layer feedforward NN model. Units of the second layer are connected to units on the third layer, which represent documents.
Fig. 4.3. Kwok's model.
The most innovative aspect of Kwok's model is the way the initial weights are evaluated. It uses a modification of probabilistic indexing to evaluate the initial strength of the connections. Without entering into the details of the actual formulas, it should be noted that the strength of the connections represents a sort of inference, determined using classical probabilistic IR measures. According to this, a weight on a connection is considered, depending on its direction, either as the probability of presence of that index
term given a particular query (wka) or document (wki), or as the evidence that if the index term k is used it will be dealing with the contents of that particular query (wak) or document (wik). Although this can be considered a good attempt to combine old and new paradigms in IR, the complexity of the computations necessary for the spreading of activation and for the learning process makes this approach impracticable for real-size collections. Other attempts to combine sound classical IR techniques with NN can be found in the work of G. S. Jung and V. V. Raghavan [25]. They attempted to marry the Vector Space model with the learning paradigms of the connectionist model. The main contribution of their work concerns the construction of a thesaurus-like knowledge representation structure, referred to as a "pseudo-thesaurus". The domain knowledge contained in a pseudo-thesaurus is in numeric, rather than symbolic, form, and it is represented in a network structure similar to a single-layer NN, where (again) terms are represented by nodes and relationships between terms are represented by links. Relationships between terms are therefore represented in the pseudo-thesaurus as real numbers (weights on links), and they are determined by means of a learning procedure which makes use of relevance feedback from past users. The information provided by the relevance feedback is in the form of training pairs, that is, pairs of queries and document descriptions. The learning procedure is such that the pseudo-thesaurus can incrementally update itself in an adaptive way by means of continuous relevance feedback from users. Once the learning of the weights in the pseudo-thesaurus has taken place, the information contained in it is used in conjunction with the vector space model to perform the ranking and retrieval of documents in response to a query. There is a straightforward mapping between the learning procedure they use and the perceptron learning procedure. This model is very interesting and is theoretically sound, being based on the vector space model and its geometrical spatial representation. The major drawback of the model is its assumption that the relationships between terms in the pseudo-thesaurus are symmetric. This is not always true. Indeed, in many cases (e.g. the generalisation-specialisation relationship) the strength of the connection between a pair of terms differs according to the direction under consideration. A different approach is followed by F. Crestani. In [17] an approach to using a NN as a black box to acquire domain knowledge from an IR application is investigated with a series of experiments. Crestani developed a prototype adaptive IR system. The use of the system is divided in two phases: a training phase and a retrieval phase. During the training phase a Query Processor gets the query in the form of a set of index terms and transforms it into a binary vector whose dimension is that of the input layer of the NN. A Document Processor does the same for documents, transforming them into binary vectors whose dimension is that of the output layer of the NN. The input and output layers of a 3-layer
feedforward NN are set to represent a query and a relevant document, and the Backpropagation algorithm is used for the learning. This is monitored by the NN simulator control structure, and when some predetermined conditions are met the learning phase is halted. Link matrices representing the acquired application domain knowledge are produced and stored for further use in the retrieval phase. During the training phase the system is fed according to the various teaching strategies described further on. During a retrieval phase, after the Query Processor has transformed the query into a binary representation, the NN is activated. The activation spreads from the input layer to the output layer using the weight matrices produced during the training phase. The vector representing the query is thereby modified or, better, adapted according to the application domain knowledge, and a new query representation vector is produced on the NN output layer. The new query representation is then fed into the Matcher, which produces a list of documents ranked according to their evaluated similarity. The ranking reflects the evaluated relevance of the documents to the query. The interface of the system displays the documents to the user according to their evaluated relevance to the query. The user can then assess the actual relevance of the documents presented and mark them according to his personal perceived relevance. This can be used for further training inside the retrieval phase as well. The results of this further training, however, are discarded at the end of the retrieval phase in order to avoid the system being influenced by personal, unsupervised relevance relationships. Three different learning strategies were employed in the experimentation, resulting in three different ways of training the system. They are summarised in Figure 4.4. In the first two types of learning the application domain knowledge is learned by training the system with examples spanning the entire application domain at the same time, using many different queries and relevance assessments. The third type of learning is instead intended to build up knowledge along a specialisation dimension, concentrating on a particular query.
Fig. 4.4. Crestani's learning strategies: total learning (for each query the entire set of relevant documents is used), horizontal learning (for each query a cluster representative of the entire set of relevant documents is used), and vertical learning (only one query and only a subset of its set of relevant documents are used).
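The following sketch illustrates the training and retrieval phases described above: a 3-layer feedforward network trained by backpropagation to map binary query vectors onto binary representations of relevant documents, after which a query is adapted by simple forward propagation. All sizes and learning parameters are illustrative, not those of the actual prototype.

```python
import numpy as np

def train_query_adapter(queries, rel_docs, hidden=32, lr=0.5, epochs=500):
    """Train a 3-layer sigmoid network on (query, relevant document) pairs
    with plain batch backpropagation on a squared-error loss."""
    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 0.1, (queries.shape[1], hidden))
    W2 = rng.normal(0, 0.1, (hidden, rel_docs.shape[1]))
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        h = sig(queries @ W1)
        out = sig(h @ W2)
        d_out = (out - rel_docs) * out * (1 - out)   # output-layer delta
        d_h = (d_out @ W2.T) * h * (1 - h)           # hidden-layer delta
        W2 -= lr * h.T @ d_out / len(queries)
        W1 -= lr * queries.T @ d_h / len(queries)
    return W1, W2

def adapt_query(q, W1, W2):
    """Retrieval phase: propagate a query to obtain its adapted representation."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    return sig(sig(q @ W1) @ W2)
```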
The results showed that the query adaptation produced by the first two types of learning does not give good retrieval results. The NN is only partially able
to learn and generalise the characteristics of the application domain knowledge. The amount of information submitted to the system seems to be too large, and the system shows a form of "confusion". Only when the system is fed with filtered and prearranged information, by training it with a centroid cluster representative of the set of documents relevant to a query, do the generalisation and retrieval performance improve. More interesting results were obtained from the third type of learning, where each training example is made of a single query representation and a single relevant document representation, and where only a subset of all the documents known to be relevant to that particular query is used in the training. The purpose is to retrieve the remaining relevant documents by exploiting their similarity with those used in the training. This technique is quite different from classical statistical or probabilistic relevance feedback because it uses a non-linear discriminating function. Query adaptation produced by this technique, in fact, gives performance similar to that provided by probabilistic relevance feedback [16]. The interesting thing is that the adapted query is most of the time quite different both from its original formulation and from the expanded query produced by relevance feedback. Accordingly, the sets of documents resulting from the use of the original query, the query expanded by relevance feedback, and the adapted query are often quite different. The adapted query is often able to retrieve relevant documents that the original or the expanded one are not able to retrieve. An integration of the proposed prototype system into a more general network model for adaptive IR is presented in [18].
4.2 Unsupervised Learning Techniques

In unsupervised learning procedures the NN does not receive any teaching or learning feedback, but is left to learn by itself. This procedure is also often referred to as "self-organisation", because the process relies only upon local information and internal control to learn by capturing regularities in the stream of input patterns. For these reasons, unsupervised learning has been used in IR mainly for document or term clustering and classification. In IR, documents or terms can be clustered in related groups so that, once a relevant one has been identified, retrieval of associated documents or terms can be facilitated. These are the two classes of applications we will review. We will only report some of the most significant approaches; for a more complete review we refer the interested reader to [53]. K. J. MacLeod and W. Robertson [34] were among the first to examine in depth the suitability of current NN models for performing document clustering. The Adaptive Resonance Theory as well as Backpropagation models were examined. The cited paper describes a NN model which has been designed specifically to perform document clustering by feature extraction, using unsupervised learning. The paper reports in detail the hard work of tuning the system parameters and reports the experimental results obtained from the
clustering and subsequent querying of a classical and well known test collection. The results are compared to those obtained by other researchers using several more traditional (i.e. statistical) clustering algorithms. In particular, the MacLeod algorithm gave results comparable in effectiveness to hierarchic (sequential) clustering algorithms. Another clustering effort, based on a fully distributed data representation scheme, was performed by X. Lin, D. Soergel, and G. Marchionini [32]. They used a Kohonen feature map for clustering documents of the artificial intelligence literature: 140 titles of documents were used and 25 index terms were extracted. Documents were represented by vectors of index terms that were used as input to train a self-organising map with a 25-D input space and 140 neurons. After training, each document was mapped to a neuron and the map reported in Figure 4.5 was produced. The numbers on the map represent the number of documents mapped to each node. The map is divided into concept areas. The size of each area corresponds to the frequency of occurrence of the terms in that area. In addition, terms that are used together in the particular context of the AI literature become neighbours in the map, and since neighbouring neurons have similar input vectors mapped to them, the continuity of the areas is assured. The map can serve as a navigation and visualisation interface for information retrieval.
Fig. 4.5. A self-organising map of 140 documents of AI literature.
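A minimal self-organising map trainer of the kind used in this experiment; the linear decay schedules and Gaussian neighbourhood are common textbook choices, not necessarily those of Lin et al.

```python
import numpy as np

def train_som(docs, grid=(10, 14), iters=2000, lr0=0.5, radius0=5.0):
    """Minimal Kohonen feature map: document vectors (e.g. 25-D index-term
    vectors) are mapped onto a 2-D grid of neurons."""
    rng = np.random.default_rng(0)
    rows, cols = grid
    W = rng.random((rows, cols, docs.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    for t in range(iters):
        x = docs[rng.integers(len(docs))]          # pick a random document
        # best-matching unit: the neuron whose weights are closest to x
        bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(axis=2)),
                               (rows, cols))
        lr = lr0 * (1 - t / iters)                 # decaying learning rate
        radius = max(1.0, radius0 * (1 - t / iters))
        dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
        h = np.exp(-dist2 / (2 * radius ** 2))[:, :, None]  # neighbourhood
        W += lr * h * (x - W)                      # pull neighbourhood towards x
    return W
```

After training, each document is assigned to its best-matching unit, and documents mapped to neighbouring neurons form the concept areas visible on the map.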
Although the attempt was successful, the scale of the task cannot be compared to that of a real information retrieval system. Moreover, there is no evaluation of the implemented system, neither of the quality of the clustering nor of the retrieval effectiveness. D. Merkl [36, 37] used the self-organising map for clustering software library documents of the National Institutes of Health (NIH, a US government organisation) class library (NIHCL). NIHCL is a collection of classes in the C++
programming language. For each software component, a set of terms was extracted from the full text of the respective part of the manual and document vectors were constructed. The document vectors were used as input to train a self-organising map with a 498-D input space (this is the number of index terms extracted) and 100 neurons. After training, each document was mapped to a neuron, producing the map depicted in Figure 4.6.
Fig. 4.6. Merkl's map of the NIH class library.
Tests were performed to compare the clustering effectiveness of the map with the effectiveness of a clustering approach using the complete linkage method (a clustering method often used in IR [59]). The results show that Merkl's algorithm is more effective, being able to generate more coherent and homogeneous clusters. However, software libraries are a special category of documents on which traditional clustering methods are known to fail to uncover meaningful clusters. Thus, the evaluation results should be considered successful only for this particular kind of document collection. Mnemosine, a testbed for the comparison of different NN architectures and learning algorithms, has been created to demonstrate potential applications of NN in the field of IR. The pattern clustering algorithm developed by M. S. Klassen and Y. Pao [45] is incorporated into Mnemosine. By encoding lexicon terms into sparse numeric matrices which become the inputs to the clustering module, it is possible to produce clusters of lexically related terms which are generally also semantically related. The testbed also supports the formation of fuzzy cognitive maps, which enable the retrieval of sets of terms related to a given vector of input terms by a form of constrained spreading activation. Sets of retrieved terms may be stored for future reference in a temporal associative memory. In the experiments performed by M. P. Oakes and D. Reid [43], the fuzzy cognitive map is initially hardwired, but is able to learn in real time by competitive differential Hebbian learning. The authors discuss how fuzzy cognitive maps with competitive differential Hebbian learning and ART might also be employed to support hypertexts. J. C. Scholtes made a very extensive survey of the application of NN to IR. Moreover, [52] contains a chapter that presents an implemented NN method for free-text search. A specific interest (a query) is taught to a Kohonen feature map. Subsequently, large amounts of unstructured documents are passed along the network. Depending on the activity patterns that occur on
the network, a document can be selected by the system. This system works very much like a "neural" filter. The neural filter implements a mechanism in which a (usually large) query stated in natural language is taught to a self-organising NN, which derives an internal representation of the text. This text is then matched against a large stream of incoming data. The query is stored in a feature map, and in a practical implementation of this model multiple queries can be matched simultaneously. The model is implemented as a Kohonen feature map, using statistics about the adjacency of elements in the underlying text. A statistical algorithm that incorporates such adjacency information is the n-gram vector method. The n-gram analysis method can be interpreted as a window of size n shifting over the words. This can be implemented quite simply in the Kohonen input sensors by assigning several sensors to each element in the window and concatenating all the window sensors into one big input vector. By shifting this window over the training text, only frequent n-grams form clusters on the feature map; the others are overruled.
The above model was tested on a small selection of the 1987 nuclear weapon restriction talks between the USA and the (former) USSR. The test set was the entire Pravda CD-ROM, passed through the neural filter. The results show that the precision and recall figures of the neural filter are higher than those obtained using traditional statistical IR techniques.
In [58], G. Troina and N. Walker report on the construction of a prototype that deals with the two tasks of query expansion and document classification. Both tasks are achieved by clustering index terms extracted from the text of documents using a Kohonen self-organising map. Query expansion is achieved by suggesting to the user terms that are similar to those she put in the query. Terms are considered similar if they belong to the same cluster. The user can choose whether or not to follow the suggestion and include the suggested terms in the query. This form of query expansion is called "explicit" by the authors. Document classification is achieved as a by-product of term clustering. The document vector is passed to the system, which clusters documents by analysing their patterns of term occurrences. Documents are then classified into subject-related groups and each group is given a "label", that is, a set of index terms that best describes the subject the documents are about. These cluster labels can also be incorporated into queries to broaden or narrow the search. An interesting aspect of the prototype is its ability to perform what the authors call "implicit query expansion". Implicit query expansion is achieved by retrieving not only the documents that match the original query, but also those that are related to them by belonging to the same cluster. In this way the user is also supplied with documents that are to be considered relevant to the user's information need even though they do not contain any of the terms present in the query.
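As an illustration of the implicit query expansion just described, here is a minimal sketch assuming each document carries a precomputed cluster label (for instance its cell on a self-organising map, as in the earlier sketch). The data and function names are hypothetical, not Troina and Walker's code.

```python
# Illustrative sketch of implicit query expansion via shared clusters.
# Assumes each document already carries a cluster label, e.g. its SOM cell.
docs = {
    "d1": {"terms": {"microgravity", "fluid"}, "cluster": 0},
    "d2": {"terms": {"fluid", "convection"}, "cluster": 0},
    "d3": {"terms": {"crystal", "growth"}, "cluster": 1},
}

def retrieve(query_terms):
    """Documents matching the query, plus cluster-mates of the matches."""
    direct = {d for d, info in docs.items() if info["terms"] & query_terms}
    hit_clusters = {docs[d]["cluster"] for d in direct}
    implicit = {d for d, info in docs.items()
                if info["cluster"] in hit_clusters} - direct
    return direct, implicit

direct, implicit = retrieve({"microgravity"})
# direct == {"d1"}; implicit == {"d2"}: d2 shares d1's cluster even
# though it contains none of the query terms.
```

The point of the design is that cluster membership, not term overlap, licenses the extra retrievals: d2 is returned although it shares no term with the query.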
Experiments were performed using a rather small collection of documents: 975 documents of the ESA Microgravity Database. These documents produced a dictionary of 2962 terms, which was reduced to 100 manually determined index terms composing the semantic pattern vector fed into the Kohonen self-organising map. The paper does not report effectiveness results, but gives an example of the actual subject-related document clusters produced by the system.
In [12] G. Bordogna and G. Pasi have proposed a neural relevance feedback model based on the definition of an associative neural network, and on a rule-based mechanism to expand the query evaluation with the terms highlighted by the network dynamics. The basic idea is to operate on the first retrieved set of documents by accumulating in a neural network the evidence of the user's information interests. This is done by the definition of a neural network, dynamically constructed from the analysis of the retrieved documents the user has judged as most relevant. The network can be regarded as a mirror of user interests evolving in time, or as a kind of dynamic personal thesaurus of concepts of interest. In this network the neurons represent the most significant terms in the selected documents, while the synapses represent the relations between pairs of terms in these documents. The activities of the neurons are computed on the basis of the terms' significance degrees in the selected documents, while the weights of the synapses are computed from a similarity measure between the terms' significance degrees in the selected documents. At the steady state the terms corresponding to active nodes are considered meaningful terms, while the degrees of the connections between these nodes and those corresponding to the original query terms indicate the strength of the associations between concepts of interest. The rule-based superstructure is then used to expand the original query evaluation with the meaningful terms, avoiding the explicit construction of a new query. Starting from this work, R.A. Marques Pereira and G. Pasi have recently proposed a new dynamical model for defining fuzzy associations of terms extracted from relevant documents and query terms [35].
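The following is a minimal sketch of this kind of associative network, under simplifying assumptions: the significance degrees of a term across the user-selected documents form that node's profile, synapse weights are taken to be cosine similarities between profiles, and a damped spreading-activation loop highlights candidate expansion terms. The weighting scheme, the update rule and all values are illustrative, not Bordogna and Pasi's actual formulation.

```python
import numpy as np

# Illustrative associative-network sketch: terms from documents the user
# judged relevant become nodes; link weights are assumed here to be the
# cosine similarity of the terms' significance-degree profiles.
terms = ["retrieval", "fuzzy", "neural", "query"]
# significance degrees of each term in 3 relevant documents (rows = terms)
sig = np.array([[0.9, 0.7, 0.8],
                [0.8, 0.6, 0.1],
                [0.1, 0.2, 0.9],
                [0.7, 0.8, 0.6]])

profiles = sig / np.linalg.norm(sig, axis=1, keepdims=True)
W = profiles @ profiles.T            # synapse weights between term nodes
np.fill_diagonal(W, 0.0)             # no self-links

def spread(query_terms, steps=10, decay=0.8):
    """Propagate activation from the query terms until (near) steady state."""
    a = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    for _ in range(steps):
        a = np.maximum(a, decay * (W @ a) / (len(terms) - 1))  # damped update
    return dict(zip(terms, a.round(3)))

activation = spread({"fuzzy"})
# Terms whose steady-state activation exceeds a threshold would be used
# to expand the evaluation of the original query.
```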
5. Conclusions and Future Directions
In this chapter a synthetic survey of some approaches to defining flexible IRSs has been presented. This survey refers to IR models based on the so-called Soft Computing paradigm; in particular, the approaches based on Fuzzy Set Theory and on Neural Networks were considered. The analysed approaches make it possible to model aspects of the inherent vagueness and uncertainty characterising the information retrieval process. The application of soft computing to IR is particularly appealing for modelling mechanisms which learn the user's notion of documents' relevance. Future directions of this research area include the integration of neuro-fuzzy approaches and the definition of more powerful associative mechanisms to improve the effectiveness of IRSs.
References
1. Bein, J. and Smolensky, P. Application of the interactive activation model to document retrieval. Technical report, Dept. of Computer Science, University of Colorado, Boulder, 1988.
2. Belew, R.K. Adaptive Information Retrieval: machine learning in associative networks. PhD thesis, University of Michigan, USA, 1986.
3. Belew, R.K. Adaptive Information Retrieval: using a connectionist representation to retrieve and learn about documents. In Proceedings of ACM SIGIR, Cambridge, USA, June 1989.
4. Bezdek, J.C., Biswas, G., and Huang, L.Y. Transitive closures of fuzzy thesauri for information-retrieval systems. International Journal of Man-Machine Studies, 25(3), September 1986.
5. Biswas, G., Bezdek, J.C., Marques, M., and Subramanian, V. Knowledge-assisted document retrieval. I. The natural-language interface. II. The retrieval process. Journal of the American Society for Information Science, 38(2), March 1987.
6. Bolc, L., Kowalski, A., Kozlowska, M., and Strzalkowski, T. A natural language information retrieval system with extensions towards fuzzy reasoning (medical diagnostic computing). International Journal of Man-Machine Studies, 23(4), October 1985.
7. Bookstein, A. Fuzzy requests: an approach to weighted Boolean searches. Journal of the American Society for Information Science, 31(4), 240-247, 1980.
8. Bordogna, G. and Pasi, G. Linguistic aggregation operators of selection criteria in fuzzy information retrieval. International Journal of Intelligent Systems, 10, 233-248, 1995.
9. Bordogna, G. and Pasi, G. Controlling retrieval through a user-adaptive representation of documents. International Journal of Approximate Reasoning, 12, 317-339, 1995.
10. Bordogna, G. and Pasi, G. A fuzzy linguistic approach generalizing Boolean information retrieval: a model and its evaluation. Journal of the American Society for Information Science, 44(2), 70-82, March 1993.
11. Bordogna, G., Carrara, P., and Pasi, G. Query term weights as constraints in fuzzy information retrieval. Information Processing and Management, 27(1), 15-26, 1991.
12. Bordogna, G. and Pasi, G. A user adaptive neural network supporting rule-based relevance feedback. Fuzzy Sets and Systems, 82(2), 1996.
13. Buell, D.A. and Kraft, D.H. A model for a weighted retrieval system. Journal of the American Society for Information Science, 32(3), 211-216, May 1981.
14. Buell, D.A. and Kraft, D.H. Performance measurement in a fuzzy retrieval environment. In Proceedings of the Fourth International Conference on Information Storage and Retrieval, 31 May-2 June 1981, Oakland, CA; SIGIR Forum, 16(1), 56-62, Summer 1981.
15. Cater, S.C. and Kraft, D.H. TIRS: A topological information retrieval system satisfying the requirements of the Waller-Kraft wish list. In Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, New Orleans, LA, June 1989.
16. Crestani, F. Comparing neural and probabilistic relevance feedback in an interactive Information Retrieval system. In Proceedings of the IEEE International Conference on Neural Networks, pages 3426-3430, Orlando, Florida, USA, June 1994.
17. Crestani, F. Domain knowledge acquisition for Information Retrieval using neural networks. Journal of Applied Expert Systems, 2(2), 101-116, 1994.
18. Crestani, F. and van Rijsbergen, C.J. A model for adaptive Information Retrieval. Journal of Intelligent Information Systems, 8, 29-56, 1997.
19. Crestani, F. Applications of spreading activation techniques in information retrieval. AI Review, 11(6), 1997. In print.
20. Crestani, F., Lalmas, M., Campbell, I., and van Rijsbergen, C.J. Is this document relevant? ...probably. A survey of probabilistic models in information retrieval. ACM Computing Surveys. In print.
21. De Mantaras, R.L., Cortes, U., Manero, J., and Plaza, E. Knowledge engineering for a document retrieval system. Fuzzy Sets and Systems, 38(2), November 1990.
22. Hayashi, I., Naito, E., Wakami, N., Terano, T., Sugeno, M., Mukaidono, M., and Shigemasu, K. A proposal of fuzzy connective with learning function and its application to fuzzy information retrieval. In Fuzzy Engineering Toward Human Friendly Systems, 13-15 November 1991, Yokohama, Japan. IOS Press, Amsterdam, The Netherlands, 446-455, 1991.
23. Hertz, J., Krogh, A., and Palmer, R. Introduction to the Theory of Neural Computation. Addison-Wesley, New York, 1991.
24. Hingston, P. and Wilkinson, R. Document retrieval using a Neural Network. Technical report, Dept. of Computer Science, Royal Melbourne Institute of Technology, Melbourne, Australia, 1990.
25. Jung, G.S. and Raghavan, V.V. Connectionist learning in constructing thesaurus-like knowledge structure. AAAI Spring Symposium on Text-Based Intelligent Systems, Working Notes, March 1990.
26. Kamel, M., Hadfield, B., and Ismail, M. Fuzzy query processing using clustering techniques. Information Processing and Management, 26(2), 279-293, 1990.
27. Kohout, L.J., Keravanou, E., and Bandler, W. Information retrieval system using fuzzy relational products for thesaurus construction. In Proceedings of IFAC Fuzzy Information, Marseille, France, 7-13, 1983.
28. Kraft, D.H., Bordogna, G., and Pasi, G. An extended fuzzy linguistic approach to generalize Boolean information retrieval. Journal of Information Sciences Applications, 2(3), 1995.
29. Kraft, D.H. and Buell, D.A. Fuzzy sets and generalized Boolean retrieval systems. International Journal of Man-Machine Studies, 19(1), 45-56, July 1983.
30. Kwok, K.L. A Neural Network for probabilistic Information Retrieval. In Proceedings of ACM SIGIR, Cambridge, MA, USA, June 1989.
31. Kwok, K.L. Application of neural networks to information retrieval. In Proceedings of the International Joint Conference on Neural Networks, volume 2, pages 623-626, Washington, USA, January 1990. IEEE.
32. Lin, X., Soergel, D., and Marchionini, G. A self-organising semantic map for information retrieval. In Proceedings of ACM SIGIR, pages 262-269, Chicago, IL, USA, 1991.
33. Lucarella, D. and Morara, R. FIRST: fuzzy information retrieval system. Journal of Information Science, 17(2), 81-91, 1991.
34. MacLeod, K.J. and Robertson, W. A neural algorithm for document clustering. Information Processing and Management, 27(4), 337-346, 1991.
35. Marques Pereira, R.A. and Pasi, G. A relevance feedback model based on soft consensus dynamics. In Proceedings of FUZZ-IEEE 1997, 1-5 July 1997, Barcelona.
36. Merkl, D. A connectionist view on document classification. In Proceedings of the 6th Australasian Database Conference (ADC'95), Adelaide, Australia, January 1995.
37. Merkl, D. Content-based document classification with highly compressed input data. In Proceedings of the 5th International Conference on Artificial Neural Networks (ICANN'95), Paris, France, October 1995.
38. Miyamoto, S. Fuzzy Sets in Information Retrieval and Cluster Analysis. Kluwer Academic Publishers, 1990.
39. Miyamoto, S. and Nakayama, K. Fuzzy information retrieval based on a fuzzy pseudothesaurus. IEEE Transactions on Systems, Man and Cybernetics, SMC-16(2), March-April 1986.
40. Molinari, A. and Pasi, G. A fuzzy representation of HTML documents for Information Retrieval Systems. In Proceedings of the IEEE International Conference on Fuzzy Systems, New Orleans, September 1996.
41. Mozer, M.C. Inductive Information Retrieval using parallel distributed computation. Technical report, Institute for Cognitive Science, University of California, San Diego, USA, June 1984.
42. Neuwirth, E. and Reisinger, L. Dissimilarity and distance coefficients in automation-supported thesauri. Information Systems, 7(1), 1982.
43. Oakes, M.P. and Reid, D. Some practical applications of neural networks in information retrieval. In Proceedings of the 13th BCS IR Colloquium, pages 167-185, Lancaster, UK, 1991.
44. Paice, C.D. Soft evaluation of Boolean search queries in information retrieval systems. Information Technology: Research Development Applications, 3(1), 33-41, January 1984.
45. Pao, Y. Adaptive Pattern Recognition and Neural Networks. Addison-Wesley Publishing Co., Reading, MA, USA, 1989.
46. Radecki, T. Fuzzy set theoretical approach to document retrieval. Information Processing and Management, 15(5), 247-260, 1979.
47. Ragade, R.K. and Zunde, P. Incertitude characterization of the retriever-system communication process. In Proceedings of the ASIS 37th Annual Meeting, 13-17 October 1974, Atlanta, GA. American Society for Information Science, Washington, DC, 11, 128-129, 1974.
48. Rumelhart, D.E., McClelland, J.L., and the PDP Research Group. Parallel Distributed Processing: explorations in the microstructure of cognition. MIT Press, Cambridge, 1986.
49. Salton, G., Fox, E., and Wu, H. Extended Boolean information retrieval. Communications of the ACM, 26(12), 1983.
50. Salton, G. and McGill, M.J. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY, 1983.
51. Sanchez, E. Importance in knowledge systems. Information Systems, 14(6), 1989.
52. Scholtes, J.C. Neural Networks in Natural Language Processing and Information Retrieval. PhD thesis, Department of Computational Linguistics, University of Amsterdam, The Netherlands, 1993.
53. Scholtes, J.C. Neural networks in information retrieval in a libraries context. EC/PROLIB/ANN Contract, M.S.C. Information Retrieval Technologies B.V., The Netherlands, 1994.
54. Sejnowski, T. Neural Network learning algorithms. In R. Eckmiller and Ch. v.d. Malsburg, editors, Neural Computers, NATO ASI, 1988.
55. Simpson, P.K. Artificial Neural Systems: foundations, paradigms, applications and implementation. Pergamon Press, New York, 1990.
56. Smeaton, A.F. Progress in the application of Natural Language Processing to Information Retrieval tasks. The Computer Journal, 35(3), 268-278, 1992.
57. Stanfill, C. and Kahle, B. Parallel free-text search on the connection machine system. Communications of the ACM, 29(12), 1229-1239, December 1986.
58. Troina, G. and Walker, N. Document classification and searching - a neural network approach. ESA Bulletin, 87, 90-96, 1996.
59. van Rijsbergen, C.J. Information Retrieval. Butterworths, London, second edition, 1979.
60. Waller, W.G. and Kraft, D.H. A mathematical model of a weighted Boolean retrieval system. Information Processing and Management, 15, 235-245, 1979.
61. Yager, R.R. On ordered weighted averaging aggregation operators in multi-criteria decision making. IEEE Transactions on Systems, Man and Cybernetics, 18(1), 183-190, 1988.
62. Yager, R.R. A note on weighted queries in information retrieval systems. Journal of the American Society for Information Science, 38(1), 1987.
63. Yager, R.R. Second order structures in multi-criteria decision making. International Journal of Man-Machine Studies, 36, 553-570, 1992.
64. Zadeh, L.A. Fuzzy sets. Information and Control, 8, 338-353, 1965.
65. Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning, parts I, II. Information Sciences, 8, 199-249, 301-357, 1975.
Acknowledgement. Fabio Crestani is supported by a "Marie Curie" Research Fellowship from the European Community.