An Interactive and Collaborative Approach to Answering Questions for an Organization

Vladimir A. Kulyukin, Kristian J. Hammond, Robin D. Burke
Intelligent Information Laboratory, Department of Computer Science, University of Chicago
1100 E. 58th St., Chicago, IL 60637
[email protected]

The University of Chicago Computer Science Department
1100 East 58th Street, Chicago, Illinois 60637
Technical Report TR-97-14
February 1998

Abstract

Many organizations regularly have to answer questions from their clients. While clients complain that they spend too much time navigating the organization's infrastructure to get answers, the organization's experts feel overwhelmed because they have to answer the same questions repeatedly or they receive questions that are irrelevant or marginal to their expertise. As the organization's body of expertise grows and market competition increases, there is an urgent need to satisfy the clients quickly and use the experts efficiently. To address this need, we have developed an approach to building web-based managers of online textual expertise, called Information Exchange systems. These systems act as intermediaries between the organization's clients and experts. They provide the clients with access to online textual expertise through interactive question-answering. They allow the experts to modify the existing textual expertise and to collaborate on the incoming questions. In this paper, we describe an emerging World Wide Web application, called the Chicago Information Exchange, that manages the online textual expertise of the University of Chicago Computer Science Department and acts as a web-based intermediary between its clients and experts.

The University of Chicago Computer Science Department supported this work.

Introduction

Many organizations regularly have to answer questions from their clients. Universities, libraries, museums, hospitals, and mutual funds are only a few examples of such organizations. While clients complain that they have to spend too much time navigating the information infrastructure to get answers, the organization's experts feel overwhelmed, because they have to answer the same questions repeatedly or they receive questions that are irrelevant or marginal to their expertise. As the organization's body of expertise grows and market competition increases, there is an urgent need to satisfy the clients quickly and use the experts efficiently. To address this need, we have developed an approach to building web-based managers of online textual expertise that act as intermediaries between the clients and the experts.

Our approach is to view the organization's online textual expertise as a set of topics, each of which is a dynamic collection of question-answer (Q&A) pairs. A topic is an area of the organization's expertise. A Q&A pair is a question previously answered by an expert. The clients are provided with interfaces to ask natural language questions of the Q&A collections and to browse them. Thus, in addition to getting answers to their questions, the clients gain insight into the organization's expertise. The experts are provided with interfaces that allow them to register their expertise on a topic, collaborate with each other on the incoming questions, edit their previous answers, and add new answers. Thus, the update and acquisition of online textual expertise occur as natural by-products of the experts' question-answering activity.

We have implemented these ideas in the Chicago Information Exchange system (CIE), an emerging Information Exchange application for managing the online textual expertise of the University of Chicago Computer Science Department. CIE is a web-based intermediary between the department's clients and experts. The clients are the students and their parents. The experts are the faculty, the graduate students, and the secretaries.

The class of problems addressed by the Chicago Information Exchange is best described through an example: X is an undergraduate student at the University of Chicago enrolled in CS115, which uses the Scheme programming language. X finds "The Structure and Interpretation of Computer Programs," the textbook for the class, a bit terse and wants to know if there is another book on Scheme that he could read, too. Also, X is a fan of Unix and wants to find out if there is an implementation of Scheme for that operating system. How does the Department make sure that X gets the answers quickly? How can the Department see to it that once the answers to such questions are available they will be reused in case someone else has similar questions? How can the Department make sure that any changes in the answers are quickly integrated?

The ultimate objective of the CIE project is to develop a technology for building applications that organizations can use to deal with these kinds of problems. We see this technology as applicable in organizations whose experts receive numerous questions on a large set of topics either via the World Wide Web (WWW) or the Internet. We do not claim that our approach applies to all organizations and all kinds of expertise available online. Nor do we claim that the question-answering framework captures all information seeking behaviors. Rather, our claim is twofold. First, client-oriented question-answering is already an integral part of the routine activities of many organizations. Second, due to the growing popularity of the World Wide Web and the Internet among their clients, these organizations are actively seeking ways to support question-answering in these new media.

An important question is how the set of topics is structured. Treating the topics as a flat set is conceptually simple, but may have unattractive computational and browsing properties. Imposing a hierarchical structure on it has computational and browsing advantages but may not appeal to some experts sociologically (Davenport, 1994). There are several ways to structure the same topics depending on the question-answering functionality to be supported. In this paper, we consider the case when the experts' Q&A collections are organized into a single-inheritance hierarchy of topics.

Obviously, the user interface is an important part of the Information Exchange systems. Examples in this paper will show some of the interface designs that we have implemented. However, we focus on the functionality that underlies these interfaces and the techniques that support it.

This paper is organized as follows. In Section 2, we present an overview of CIE and describe several sample interactions by experts and clients with the system. In Section 3, we focus on the techniques that support these interactions. We review two term weight metrics of the vector space retrieval model used by CIE. We review spreading activation and describe WordNet, a semantic network of English words built at Princeton University. (WordNet is a trademark of Princeton University.) We present our term weight metric based on spreading activation in WordNet. We proceed to explain how the system computes the terms and present its term weight metric based on the two metrics of the vector space model and on our spreading activation metric. In Section 4, we describe how the system indexes, reindexes, and retrieves online textual expertise. We review two relevance feedback techniques that CIE uses to support interactions with its clients and experts. We present our technique of negative evidence acquisition that complements relevance feedback. In Section 5, we discuss related work and some future directions of our research. Section 6 presents our conclusions.

Overview of the Chicago Information Exchange

The Expert

Figure 1 gives the hierarchy of topics currently used in CIE. A CIE expert opens an information account under the topics about which he or she wants to answer questions. The hierarchy of topics allows the experts to choose comfortable levels of abstraction. For example, if an expert feels competent about answering questions on all topics grouped under the topic of Artificial Intelligence, the expert opens an information account under that topic. The experts choose their topics by browsing the hierarchy with a set of interfaces and clicking on the topics in which they are interested. Information accounts are opened through a brief online interview during which the system asks the experts for some personal information and for a short textual description of their expertise on the topic. Figure 2 contains part of a sample information account that was opened under the topic of the Scheme programming language. (The name of the actual CIE expert who owns this account has been changed to protect his identity.)

Figure 1: Hierarchy of Topics for the CS Department. The topics shown include Computer Science, Administration, Curriculum, Research, Finance, Artificial Intelligence, Planning, NLP, IR, Systems, C/C++, OS, Networks, Theory, Prog. Languages, Common Lisp, Scheme, and Java.

Figure 2: Sample Information Account

After the expert opens an information account under a topic, the system associates with it a Q&A collection, which is initially empty. As the expert answers questions on the topic, the new Q&A pairs are dynamically added to that collection. The expert can add, remove, or edit any Q&A pair in that collection. The expert can tell the system that a routed question is not relevant to his or her area of expertise. The system saves such questions in a collection of nonrelevant questions associated with the expert's information account and uses that collection in deciding the relevancy of the expert to future clients' questions.

The Client

The client is provided with a natural language WWW interface to the textual expertise of the CS Department. Figure 3 and figure 4 depict a typical interaction that the client has with CIE. Suppose that the client is the CS student in our example story who enters the following question: What is a good first book on Scheme? CIE first finds the topics that are most relevant to the question. Once a topic is found, CIE checks if there are any information accounts under it. For each information account, the system first checks if there is a sufficiently high match on the collection of nonrelevant questions. If such a match is found, that information account is no longer considered. Otherwise, the system iterates through the Q&A collection, comparing each Q&A against the client's question and computing a score. The Q&A pair with the highest score is presented to the client. Figure 3 shows one such pair. The left side of the interface contains information on the expert from whose information account the Q&A is retrieved. The client can evaluate the quality of retrieval, request to continue the search, view the expert's expertise, or email the expert. Thus, the client not only receives an answer to his question, but also is introduced to an expert for future reference.

Figure 3: Receiving an Answer from Chicago Information Exchange

The system can initiate a dialog with the client if the client's question is related to several topics. Suppose the next question that the CS student enters is: What is the difference between Scheme and Common Lisp? Figure 4 presents the resulting CIE-generated interface for the client to indicate his or her preferences. Specifically, the client can search each retrieved topic separately, search all the topics, or tell the system that none of the retrieved topics is relevant.

Figure 4: Resolving Ambiguity through a Simple Dialog

Collaboration among experts

When the system identifies an expert as relevant to a client's question, it notifies the expert via email about it and stores the question in the expert's collection of unanswered questions. The expert logs in at his or her convenience and inspects the unanswered questions. Figure 5 contains a question about the SMART information retrieval system that a CIE expert is about to answer.

The CIE experts can collaborate with each other on the incoming questions. When a CIE expert cannot answer a routed question, either because it is nonrelevant or due to lack of time, the expert may know someone who can and has the option of forwarding the question to that individual. As figure 5 shows, forwarding is the second option available to the expert. Although the expert may not know anyone who can answer that question, he or she may know the appropriate topic for the question. In the latter case, the expert can forward the question to that topic, thereby telling the system to search the information accounts under it. As the system monitors the flow of questions, it tracks the networks of experts who regularly collaborate with each other. When the system knows that one expert is temporarily unavailable (technically, this is implemented by giving each incoming question a time stamp), it routes the question to the other members of the expert's network. Similarly, when the client wants to email one expert, the system can suggest that the client browse the network of the expert's collaborators and view their expertise.

Figure 5: CIE Expert Answering an Unanswered Question

How the System Works

CIE uses the standard vector space model of retrieval (Salton, 1971). In this model, a collection of documents is a vector space where each document is a vector of weighted terms. A client's question Q is turned into a vector of weighted terms \vec{Q} = (q_1, ..., q_n), where n is the dimension of the vector space. The text similarity between Q and D, sim(Q, D), is computed as the cosine of the angle between the question term weight vector \vec{Q} = (q_1, ..., q_n) and the document term weight vector \vec{D} = (d_1, ..., d_n), i.e.,

    sim(\vec{Q}, \vec{D}) = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} d_i^2}}.    (1)
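To make the computation concrete, here is a minimal sketch (in Python, which is not the language of the original system) of formula (1) over sparse term-weight vectors stored as dictionaries; the variable names and the sample vectors are purely illustrative.

```python
import math

def cosine_similarity(q_weights, d_weights):
    """sim(Q, D) of formula (1) for two sparse {term: weight} vectors."""
    # Numerator: sum of products over the terms the two vectors share.
    shared = set(q_weights) & set(d_weights)
    dot = sum(q_weights[t] * d_weights[t] for t in shared)
    # Denominator: product of the Euclidean norms of the two vectors.
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    d_norm = math.sqrt(sum(w * w for w in d_weights.values()))
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0
    return dot / (q_norm * d_norm)

# Example: a client question vector against a Q&A pair vector.
q = {"book10": 0.7, "scheme10": 0.7}
d = {"book10": 0.4, "scheme10": 0.8, "lisp10": 0.3}
print(cosine_similarity(q, d))
```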

The term weight metric used by CIE combines two of the term weight metrics of the vector space model and our term weight metric based on spreading activation in WordNet, a semantic network of English words and phrases built at Princeton University (Miller, 1995).

Two metrics of the vector space model

The first term weight metric used by the system is commonly referred to as tfidf (Salton and McGill, 1983). Let tf be the term frequency (the number of times a term t appears in a document D), N be the total number of documents, and n_t the number of documents containing t. Then the term's tfidf in a document D is tf \cdot \log(N / n_t).

The second term weight metric used by CIE is condensation clustering (Bookstein, Klein, & Raita, 1995, 1998), which complements tfidf, because, unlike the latter, it takes into account the sequential structure of text. As Bookstein, Klein, and Raita show both theoretically and experimentally, good indexing terms can be found on the basis of topological information about their patterns of occurrences in a sequence of textual units, e.g., sentences, paragraphs, pages, chapters, documents, etc. The terms that do not bear content appear to be distributed randomly over the textual units, while deviations from randomness indicate that a term bears content. From the point of view of CIE, this metric is useful because documents in information accounts are sequences of Q&A pairs.

One way to measure the condensation clustering of a term is by a ratio of the actual number of units containing at least one occurrence of it over the expected number of such units, assuming a random distribution. Bookstein, Klein, and Raita (1998) offer a simple derivation of the expected number of units based on random variables, which we now briefly summarize. Let U be the total number of textual units in the collection under consideration. Let a random variable \tilde{x}_i be defined as follows:

    \tilde{x}_i = \begin{cases} 1 & \text{if the } i\text{-th unit contains the term} \\ 0 & \text{otherwise.} \end{cases}

Let T be the number of occurrences of a term in all units. One can express the number of units containing the term as \sum_i \tilde{x}_i, with expected value E_C = E(\sum_i \tilde{x}_i) = U E(\tilde{x}_i), since each indicator variable has the same expected value. But the indicator variables are binary, and so their expected values are just the probabilities of an indicator variable taking the value one. Since the probability that no occurrence of the term hits a given unit is (1 - 1/U)^T, the probability that \tilde{x}_i = 1 is given by 1 - (1 - 1/U)^T. Hence, E_C = U(1 - (1 - 1/U)^T).

If N is the actual number of textual units containing the term, its condensation clustering can be measured as the ratio N / E_C. Obviously, the smaller this ratio, the more valuable the term. Since the terms for which the ratio N / E_C is greater than or equal to 1 are not useful, we modified the condensation clustering measure, which we denote W_cc, as follows:

    W_{cc}(atrm) = \begin{cases} \log(E_C / N) & \text{if } N / E_C < 1 \\ 0 & \text{otherwise.} \end{cases}

But W_cc is a measure of whether or not a given term bears content in a collection of documents treated as a sequence of textual units. When the weight of a term is computed with respect to a particular document, e.g., a Q&A pair, we need to take into account how important that term is locally. We do this by multiplying the condensation clustering of the term by its frequency in the document. Thus, the modified condensation clustering weight of an annotated term atrm in a document D, which we denote W_tfcc, is given by

    W_{tfcc}(atrm, D) = tf \cdot W_{cc}(atrm),    (2)

where tf is the frequency of atrm in D.
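As an illustration, the following sketch computes E_C, the modified W_cc, and the W_tfcc of formula (2) for a collection given as a list of textual units, each already reduced to a list of terms; this is our own reading of the formulas above, not code from CIE.

```python
import math

def w_cc(term, units):
    """Modified condensation clustering weight of a term over a list of units
    (each unit is a list of terms)."""
    U = len(units)                              # total number of textual units
    T = sum(u.count(term) for u in units)       # occurrences of the term in all units
    N = sum(1 for u in units if term in u)      # units containing the term at least once
    if U == 0 or T == 0 or N == 0:
        return 0.0
    E_C = U * (1.0 - (1.0 - 1.0 / U) ** T)      # expected number of units under randomness
    return math.log(E_C / N) if N / E_C < 1.0 else 0.0

def w_tfcc(term, doc_terms, units):
    """Formula (2): local term frequency times the collection-level W_cc."""
    return doc_terms.count(term) * w_cc(term, units)
```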

Spreading activation in WordNet

The technique of spreading activation (Quillian, 1968; Cohen & Kjeldsen, 1987; Burke et al., 1997) aims at finding semantically related words or concepts. It relies on a semantic network where words or concepts are connected with each other by links that describe various relations. For example, the word "computer" may be connected to the word "machine" by means of an isa link. A basic spreading activation algorithm takes two words as input and attempts to find a path in the network that leads from one to the other. From the point of view of CIE, the main objective of spreading activation is to reduce the lexical dependence on the terms in the Q&A collections, thus allowing for a modest degree of lexical variation in clients' questions. For example, if a client's question contains the term "machine" and an expert's question has the term "computer," the two terms match due to their semantic relation.

Our spreading activation technique is based on WordNet. The basic unit of WordNet is a set of synonyms, called a synset. Synsets contain words that are interchangeable in a context. For instance, one can easily think of a context in which the words "computer" and "machine" are interchangeable. WordNet consists of 4 subnets organized by part of speech: noun, verb, adjective, and adverb. Each subnet is organized in terms of its own set of relations. For example, nouns are organized in terms of antonymy, hypernymy/hyponymy (WordNet's term for the isa relation), and three meronym/holonym (part-of) relations. The technique described in this paper uses all parts of speech. But, for each part of speech only a subset of the defined relations is used. For nouns, the relation of hypernymy/hyponymy is used. For example, the noun "machine" is a hypernym of the noun "computer." For verbs, the relation of entailment is used. For example, the verb "to limp" entails the verb "to walk."

walk." For adjectives, the relation of similarity is used. For example, the adjective \wet" is similar to the adjective \watery." For adverbs, no relation is used because our version of WordNet does not de ne any relation on this part of speech. Figure 6 presents a part of the WordNet noun net showing some of the hypernyms and hyponyms of the noun \computer." The hierarchical lines denote the hypernym/hyponym, or isa relations; the horizontal line connects two senses, i.e. synsets, of the word \computer." device

person

isa

isa

machine

expert

isa

isa synset

computer, data_processor isa

calculator, reckoner

isa

analog_computer

digital_computer

isa

statistician, actuary

Our spreading activation algorithm takes two inputs: a word to spread activation from and a depth, which is an integer specifying how many links away from the word the activation is to spread. Each word encountered along the way is saved along with its part of speech and the depth at which it was found. WordNet phrases like "digital computer" are ignored. The integers 1, 2, 3, and 4 are used to encode noun, verb, adjective, and adverb, respectively. For example, "device12" means that "device" is a noun found at depth 2. The depth of the original word is 0. We refer to words like "device12" as annotated terms. If a word is found at two or more depths, only the smallest depth is encoded. For example, consider figure 6. Assuming that the word to spread activation from is "computer" and the depth is 2, the following encoded words are obtained: computer10, machine11, device12, calculator11, reckoner11, expert12, statistician12, actuary12.
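The following rough sketch of this depth-annotated spread uses NLTK's WordNet interface as a stand-in for the authors' WordNet version; the per-part-of-speech relations follow the text (hypernyms/hyponyms for nouns, entailment for verbs, similarity for adjectives), but details such as how the synonyms and other senses of the starting word are counted are our assumptions and may differ from the original algorithm.

```python
from collections import deque
from nltk.corpus import wordnet as wn   # stand-in for the authors' WordNet version

POS_CODE = {"n": 1, "v": 2, "a": 3, "s": 3, "r": 4}   # noun, verb, adjective, adverb

def related(synset):
    """Relations used for each part of speech, as described in the text."""
    if synset.pos() == "n":
        return synset.hypernyms() + synset.hyponyms()
    if synset.pos() == "v":
        return synset.entailments()
    if synset.pos() in ("a", "s"):
        return synset.similar_tos()
    return []   # adverbs: no relation is used

def spread(word, pos, max_depth):
    """Return annotated terms such as 'device12' reachable within max_depth links;
    each word is kept at the smallest depth at which it is found."""
    best = {}
    frontier = deque((s, 0) for s in wn.synsets(word, pos=pos))
    while frontier:
        synset, depth = frontier.popleft()
        for lemma in synset.lemma_names():
            if "_" in lemma:        # phrases like "digital_computer" are ignored
                continue
            if lemma not in best or depth < best[lemma]:
                best[lemma] = depth
        if depth < max_depth:
            frontier.extend((s, depth + 1) for s in related(synset))
    best[word] = 0                  # the original word is always at depth 0
    code = POS_CODE[pos]
    return sorted(f"{w}{code}{d}" for w, d in best.items())

print(spread("computer", "n", 2))   # e.g. ['computer10', 'device12', 'machine11', ...]
```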

The WordNet weight of an annotated term, W_wn, is a combination of four functions and one parameter. The first function is the part of speech function, Pos, which maps the annotated term to the integer corresponding to its part of speech. For example, Pos(dog10) = 1. The second function is the polysemy function, Poly, which gives an integer that denotes the number of senses the annotated term has in its part of speech. Our version of WordNet provides the number of senses of each base form. For example, Poly(dog10) = 12 means that as a noun the word "dog" has 12 senses. The third function is the part of speech weight of an annotated term, W_pos, given by

    W_{pos}(atrm) = \begin{cases} 1.0 & \text{if } Pos(atrm) = 1 \\ 0.75 & \text{if } Pos(atrm) = 2 \\ 0.5 & \text{if } Pos(atrm) = 3 \text{ or } Pos(atrm) = 4. \end{cases}

The fourth function is the depth function, Depth, which maps the annotated term to the depth at which it was found. For example, Depth(dog12) = 2. The parameter is the rate of decay, r, which indicates by how much the weight of an annotated term atrm depreciates with depth. The WordNet weight, W_wn, is a function of atrm and r given by

    W_{wn}(atrm, r) = \frac{W_{pos}(atrm)}{Poly(atrm) \cdot r^{Depth(atrm)}}.    (3)
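A small sketch of formula (3) is given below, under the annotation convention just described; the polysemy lookup is assumed to be supplied (e.g., precomputed from WordNet sense counts), and the parsing assumes the two trailing digits encode part of speech and depth.

```python
POS_WEIGHT = {1: 1.0, 2: 0.75, 3: 0.5, 4: 0.5}   # W_pos for noun, verb, adjective, adverb

def parse_annotated(atrm):
    """Split an annotated term such as 'device12' into (word, pos code, depth)."""
    return atrm[:-2], int(atrm[-2]), int(atrm[-1])

def w_wn(atrm, r, polysemy):
    """Formula (3): WordNet weight of an annotated term.
    `polysemy` maps (word, pos code) to the number of senses of the base form."""
    word, pos, depth = parse_annotated(atrm)
    poly = max(polysemy.get((word, pos), 1), 1)   # guard against missing entries
    return POS_WEIGHT[pos] / (poly * r ** depth)

# Example: "dog" has 12 noun senses; with decay rate r = 2, depth 2 divides the weight by 4.
print(w_wn("dog12", r=2.0, polysemy={("dog", 1): 12}))   # 1.0 / (12 * 4)
```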

Computation of terms and their weights

Now we describe how the annotated terms are computed and weighted. A greedy WordNet-based morphological algorithm stoplists a piece of text and brings each of the remaining words to its base form. (The algorithm is greedy because it takes the first part of speech whose rules let it obtain the base form.) For example, the base form of "books" is "book" and the base form of "walked" is "walk." The algorithm uses a stoplist of 425 words derived from the Brown corpus by Francis and Kucera (1982). It works as follows. Given a word, reduce it to its base form as if it was a noun. If the reduction was successful, output the base form tagged as a noun. If the reduction was a failure, reduce the word to its base as if it was a verb. If the reduction was successful, output the base form tagged as a verb. Otherwise, reduce the word to its base as if it was an adjective. If the reduction was successful, save the base form appropriately tagged. Otherwise, reduce the word to its base as if it was an adverb. In case of failure, output the word tagged as a noun by default. We do not do any parsing or word sense disambiguation. The output of the algorithm is a vector of unweighted annotated terms that represents the text. For example, the question "What is a good first book on Scheme?" turns into (book10, scheme10), assuming no activation is spread and "what," "is," "a," "good," "first," and "on" are stoplisted.
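A sketch of this greedy reduction order follows, using NLTK's wn.morphy (which returns None when no base form is found) as a stand-in for the authors' WordNet-based reducer; the stoplist and the simple tokenization are illustrative.

```python
from nltk.corpus import wordnet as wn   # stand-in morphological reducer

# Greedy order described in the text: noun, then verb, adjective, adverb.
GREEDY_ORDER = [("n", 1), ("v", 2), ("a", 3), ("r", 4)]

def base_form(word):
    """Return (base form, pos code), defaulting to a noun reading on failure."""
    for pos, code in GREEDY_ORDER:
        base = wn.morphy(word, pos)
        if base is not None:
            return base, code
    return word, 1   # default: tag the word as a noun

def to_terms(text, stoplist):
    """Turn a piece of text into depth-0 annotated terms such as 'book10'."""
    terms = []
    for token in text.lower().split():
        token = token.strip('.,?!;:"()')
        if token and token not in stoplist:
            base, code = base_form(token)
            terms.append(f"{base}{code}0")
    return terms

stop = {"what", "is", "a", "good", "first", "on"}
print(to_terms("What is a good first book on Scheme?", stop))   # ['book10', 'scheme10']
```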

The total weight of an annotated term is computed as follows. Let D be a document or a question and let atrm be an annotated term in D. Then the total weight of atrm in D, which we denote W_trm, is given by

    W_{trm}(atrm, D) = W_{wn}(atrm)^{\alpha_1} \cdot \left( W_{tfidf}(atrm)^{\alpha_2} + W_{tfcc}(atrm)^{\alpha_3} \right),    (4)

where W_wn is the WordNet weight given in (3), W_tfidf is the tfidf weight, W_tfcc is the condensation clustering weight given in (2), and \alpha_1, \alpha_2, and \alpha_3 denote how much importance is given to each term weight metric. The total weight of atrm in D is normalized by the cosine normalization: the square root of the sum of the squares of the total weights of the other annotated terms in D. Note that if \alpha_1 and \alpha_3 are set to 0 and \alpha_2 to 1, the formula computes tfidf.
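For concreteness, here is a sketch of formula (4) and the cosine normalization, using the exponent reading of \alpha_1, \alpha_2, and \alpha_3 reconstructed above (an assumption on our part); the three component weight functions are assumed to be supplied by the pieces sketched earlier.

```python
import math

def total_weight(atrm, doc, w_wn, w_tfidf, w_tfcc, alphas=(1.0, 1.0, 1.0)):
    """Formula (4): combine the WordNet, tfidf, and condensation clustering weights."""
    a1, a2, a3 = alphas
    return w_wn(atrm) ** a1 * (w_tfidf(atrm, doc) ** a2 + w_tfcc(atrm, doc) ** a3)

def normalized_vector(doc_terms, doc, w_wn, w_tfidf, w_tfcc):
    """Cosine-normalize the total weights of the annotated terms of a document."""
    raw = {t: total_weight(t, doc, w_wn, w_tfidf, w_tfcc) for t in set(doc_terms)}
    norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
    return {t: w / norm for t, w in raw.items()}
```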


Our preliminary small scale experiments showed that W_tfidf acts as a better distinguisher than W_tfcc for terms whose occurrences are small, relative to the overall number of documents in a collection. Our explanation for this is as follows. Condensation clustering is based on the expectation that the number of units containing at least one occurrence of a given term is less than the total number of occurrences of the term in the whole collection. Now let d be the total number of documents, i.e., units, in a collection and let t be the number of occurrences of a given term in all documents. The probability that no document will contain more than 1 occurrence of the term, assuming a random distribution, is (d)_t / d^t, where (d)_t = d(d-1)...(d-t+1) is the falling factorial. This ratio is close to 1.0 if t is considerably smaller than d. Therefore, the term's W_tfcc is likely to be 0. On the other hand, if t is considerably smaller than d, the term's W_tfidf is high. This observation suggests that W_tfidf and W_tfcc indeed act as complementary metrics. In small collections of documents, W_tfidf may be more reliable because small term occurrences are less likely to exhibit easily identifiable topological patterns. However, in larger collections, W_tfcc may do better. Thus, the relative importance of these metrics may be a function of the number of documents in a collection. This is especially pertinent for dynamically growing collections of Q&A pairs. However, we have no experimental support for this conjecture. In our current implementation, \alpha_1, \alpha_2, and \alpha_3 are set to 1.0.

Indexing and Retrieval of Online Textual Expertise

Indexing

In Section 1, we stated that an information account under a topic has three components: a document with the expert's description of his or her expertise on the topic, a collection of Q&A pairs that the expert has answered about the topic, and a collection of questions that the expert has classified as nonrelevant. A topic consists of subtopics and a collection of information accounts opened under it. The system indexes its information space by turning its hierarchy of topics and information accounts into a pyramid of vector spaces. The term weights are computed by the metric given in (4). Each topic is turned into a vector of terms contained in its subtopics and information accounts. Each information account is turned into a vector of terms contained in its Q&A collection and the description of the expert's expertise. Q&A pairs and expertise descriptions are turned into vectors of terms contained in their texts. Finally, each collection of nonrelevant questions is also turned into a vector space. To see this pictorially, consider figure 7. A topic T has two subtopics, ST1 and ST2, and an information account collection, IA, that includes three information accounts: IA1, IA2, and IA3. After the indexing is completed, this hierarchy is turned into a pyramid of two vector spaces: T^n = (\vec{ST}_1, \vec{ST}_2, \vec{IA}) and IA^m = (\vec{IA}_1, \vec{IA}_2, \vec{IA}_3), where n and m are the dimensions of T^n and IA^m, respectively. Each vector space is enclosed in a triangle. For the sake of simplicity, we do not depict the components of the subtopics and the information accounts and treat them as pieces of text.

Figure 7: Pyramid of Vector Spaces. The topic T1 has subtopics ST1 and ST2 and an information account collection IA containing the information accounts IA1, IA2, and IA3; each vector space is enclosed in a triangle.

The expert's feedback in indexing

When a question is added to a nonrelevant collection, the weights of the terms in all of the above vectors (for example, in figure 7, \vec{IA} is above \vec{IA}_1, \vec{IA}_2, and \vec{IA}_3) are modified in a way similar to the one proposed by Brauen (1971). Let \vec{Q} be the nonrelevant question vector added to a nonrelevant collection C_nr, let \vec{C}_nr be a vector of all terms in that negative collection, and let \vec{V} be a vector above \vec{C}_nr in the hierarchy. If an annotated term atrm is present in \vec{Q} and is absent from \vec{V}, no action is taken. If atrm is present both in \vec{Q} and in \vec{V}, its weight is decreased by a small constant. If atrm is present in \vec{V} but absent from \vec{Q}, its weight in \vec{V} is increased by a small constant. Thus, the system uses the expert's feedback both locally and globally. The expert's feedback is used locally, because even when a term is not present in any of the above vectors it is still added to the nonrelevant vector space. We call this negative evidence acquisition. The expert's feedback is used globally, because the terms that caused the wrong retrieval of the experts are punished while the terms that did not participate in it are rewarded.

Negative evidence acquisition complements relevance feedback as follows. Relevance feedback, when applied to term weights in document vectors, requires several iterations before the term weights are improved (Brauen, 1971; Ide & Salton, 1971). But the expert is not interested in receiving the same nonrelevant question several times. The system knows not to retrieve the expert on the same nonrelevant question as soon as the question is added to the nonrelevant collection. A similar adjustment of term weights is done when the expert answers a client's question. The terms of the new Q&A pair that are not in the above vectors are added to them with a small positive weight. The terms that are in them are rewarded while the terms that are not are punished.
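A minimal sketch of the weight adjustment, assuming vectors are {term: weight} dictionaries and a small constant of 0.05 (the text only says "a small constant"); it shows the Brauen-style update applied to one vector above the nonrelevant collection.

```python
def adjust_for_nonrelevant(question_vec, above_vec, delta=0.05):
    """Update a vector above the nonrelevant collection when a nonrelevant
    question is added: shared terms are punished, terms absent from the
    question are rewarded, and terms only in the question are left alone
    (they are still stored in the nonrelevant vector space itself)."""
    for term in above_vec:
        if term in question_vec:
            above_vec[term] = max(above_vec[term] - delta, 0.0)
        else:
            above_vec[term] += delta
    return above_vec
```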

Retrieval

The retrieval of textual expertise starts at the top vector space. At each stage the top vector is retrieved and an appropriate action is taken, depending on whether the retrieved vector corresponds to a topic or to an information account. If the vector corresponds to a topic, the search proceeds into the vector space of that topic. If the vector represents an information account, the search proceeds first into its nonrelevant vector space and then, if no high matches are found, into the Q&A vector space. When several vectors are retrieved in a given vector space, the system asks the client to clarify his or her search needs through a set of interfaces, one of which appears in figure 4.

Unlike other information retrieval approaches that use spreading activation (Cohen & Kjeldsen, 1987; Burke et al., 1997), CIE does not spread activation at retrieval time. It annotates the terms in the client's question vector at each of the available depth levels and uses the expanded query vector to search the vector spaces. For example, if the question vector is (intelligent20, machine10) and the depth bound is 2, the expanded question vector will contain the following annotated terms: intelligent20, intelligent21, intelligent22, machine10, machine11, machine12. The idea behind this approach is to simulate the spread of activation at retrieval time. Consider again figure 6. If the question of a Q&A pair contains the annotated term computer10 and the depth bound is 2, the annotated term machine11 will be added to the term vector of the Q&A pair. Thus, the client's question vector and the Q&A vector will match at least on one annotated term. The same effect would be achieved if the activation was spread at run time from the term machine10 down to the term computer11.
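A small sketch of this query-side expansion, assuming the two-trailing-digit encoding of annotated terms described earlier; question terms start at depth 0 and are replicated at every depth up to the bound.

```python
def expand_question(question_terms, depth_bound):
    """E.g. 'machine10' with a bound of 2 yields machine10, machine11, machine12."""
    expanded = []
    for atrm in question_terms:
        word, pos = atrm[:-2], atrm[-2]
        for depth in range(depth_bound + 1):
            expanded.append(f"{word}{pos}{depth}")
    return expanded

print(expand_question(["intelligent20", "machine10"], 2))
```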

The client's feedback in retrieval

When the client searches a Q&A collection, the system uses another relevance feedback technique similar to the one proposed by Aalbersberg (1992). The system retrieves the Q&A pairs one by one, as figure 3 shows. At each retrieval, the client can tell the system whether or not the retrieved Q&A pair answers his or her question. The client can also request the retrieval of more Q&A pairs without evaluating the quality of the pair that was just retrieved, in which case the system simply retrieves the next best match. Formally, the client-oriented relevance feedback technique is as follows. Let \vec{Q}_0 be the vector of the client's question; then for i \geq 0, \vec{Q}_{i+1} is given by

    \vec{Q}_{i+1} = \begin{cases} \alpha \vec{Q}_i + \beta \vec{D}_i & \text{if } \vec{D}_i \text{ is relevant} \\ \alpha \vec{Q}_i - \gamma \vec{D}_i & \text{if } \vec{D}_i \text{ is nonrelevant,} \end{cases}

where \alpha = 1.0, \beta = 0.6, \gamma = 0.4, and \vec{D}_i is the Q&A vector retrieved by \vec{Q}_i and is different from all of the previously retrieved vectors.

Our technique differs slightly from the one proposed by Aalbersberg in that the number of negative interactions that the client and the system can go through is explicitly limited. If the client is not satisfied with the retrieved result when the limit is reached, the system offers the client the opportunity to browse the whole Q&A collection or to move the search elsewhere. One advantage of this is that the system knows when to give up and recommend that the client do something else. Another thing to note here is that the client-oriented relevance feedback technique affects only the weights in question vectors. Unlike the expert-oriented relevance feedback and negative evidence acquisition, it does not cause any permanent changes in the document term weights. Thus, the system trusts the experts' judgments more than the clients'.
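The update rule can be sketched as follows for sparse {term: weight} vectors; the constants are the ones given above, and only the question vector is modified, mirroring the point that document term weights are left untouched.

```python
ALPHA, BETA, GAMMA = 1.0, 0.6, 0.4

def feedback_update(q_vec, d_vec, relevant):
    """One step of the client-oriented incremental feedback:
    Q_{i+1} = ALPHA*Q_i + BETA*D_i if D_i is relevant,
              ALPHA*Q_i - GAMMA*D_i otherwise."""
    step = BETA if relevant else -GAMMA
    updated = {t: ALPHA * w for t, w in q_vec.items()}
    for t, w in d_vec.items():
        updated[t] = updated.get(t, 0.0) + step * w
    return updated
```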

Discussion

Interactive and collaborative systems is an area of active research and development in information retrieval, artificial intelligence, and business psychology. (See, for example, Cutting et al., 1992; Burke, 1996; Davenport, 1994.) Much of this research and development is technology driven and directed at defining applications that would require their potential users to engage in new types of interaction and collaboration. Our methodology is different. We focus on an interactive and collaborative behavior that already exists, i.e., collaborative and interactive question-answering in organizations, and try to develop a technology that supports it. Our approach is fundamentally human-centered: it is the technology that has to catch up with the people, not vice versa.

Most information retrieval systems use relevance feedback as the main method to support interaction with the client. But, while evaluations of relevance feedback have consistently shown improvements in performance (Ide & Salton, 1971; Brauen, 1971; Aalbersberg, 1992), they have also consistently avoided assessments of their user-friendliness. Typically, relevance feedback techniques require multiple retrieval interactions, each of which asks the client to state which of the retrieved results are relevant to his or her question and which are not. Essentially, the client has to keep giving feedback until either a good answer appears or the client gives up. In this scenario, the client can easily get overwhelmed with choices or may not have enough knowledge to estimate the relevancy of each retrieved result. To address this problem, Aalbersberg (1992) proposes an interaction technique in which documents are retrieved one by one and the client only has to specify whether or not the most recently retrieved document is relevant. Our client-oriented relevance feedback technique builds on his idea of incremental relevance feedback, but differs from it both in explicitness and flexibility. We explicitly limit the number of retrieval interactions that the client can have with the system in searching a document collection. At each interaction, we offer the client several options: to browse the document collection, redirect the search to another collection, or contact an expert.

Another, perhaps less prominent, line of relevance feedback research is aimed at modifying the term weights in the document vectors (Ide & Salton, 1971; Brauen, 1971). Here, the idea is to bring the relevant documents closer to a given question and the nonrelevant ones away from it, so that future clients will benefit from the interactions that the previous clients had with the system. The difference between this approach and ours is that in our system some vectors in the collection correspond to the experts. If the weights in those vectors are not adjusted fast enough, the expert may receive the same nonrelevant question multiple times. To address this problem, we complement Brauen's technique with our technique of negative evidence acquisition.

Information retrieval research has generally paid little attention to word structure. The variability in word forms is typically accounted for via the use of stemmers (Salton & McGill, 1983). However, more recent studies show the importance of morphology in improving performance. Krovetz (1993) demonstrates the importance of morphology in a stemmer that resolves lexical ambiguity. Riloff (1995) offers experimental evidence to show that stemming algorithms remove or conflate many words that could be used for effective indexing. Although we share with these researchers the view that morphology is important, we have a somewhat different view of its function. We use morphology only as a means of finding terms that are semantically related to some terms in a document. We do not do any word sense disambiguation for two reasons. First, as Voorhees (1993) shows, using morphology in combination with a constrained spread of activation in WordNet does not result in performance improvement. Second, as Sanderson (1994) demonstrates, IR systems are surprisingly resilient to ambiguity. In fact, Sanderson's experiments show that to be of practical use, disambiguation tools need to operate with at least 90 percent accuracy. To the best of our knowledge, none of the currently available technologies in information retrieval, artificial intelligence, and computational linguistics is capable of that. The question therefore becomes what approach could provide a moderate step forward. Our position is to do a loosely constrained spread of activation on parts of the document at indexing time and let the term weight metric and incremental relevance feedback home in on useful terms. Our WordNet-based term weight metric given in (3) explicitly decreases the weight of polysemantic words. Although we have not yet done large scale experiments with our term weight metric, we have reason to believe that it will be comparable to other approaches. For example, Burke et al. (1997) present and evaluate a retrieval technique similar to ours that is based on morphological analysis, spreading activation, and inverse document frequency. Their evaluations show the technique to have good performance characteristics.

A common criticism of numerical information retrieval approaches by artificial intelligence researchers is that they are knowledge poor and, hence, cannot perform matching on a deep semantic level (Cohen & Kjeldsen, 1987; Burke et al., 1995). But the proponents of knowledge intensive approaches often omit from their arguments the fact that knowledge representation has costs. First, all of the necessary knowledge has to be entered into the system before it can be used for deep matches. Second, the knowledge features useful for matching have to be reliably extracted from the inputs. For example, Cohen and Kjeldsen's system, called GRANT, simulates the performance of a funding advisor and relies on an inference-based spreading activation in a semantic network built specifically for the task. As the authors admit, that semantic network took four person-months to build. The inputs to their system are semantic representations of grant requests. Thus, in order to use the system, the client has to master a query language, which limits the range of potential users. Unfortunately, no evaluation is given that compares the performance of GRANT to that of a numerical information retrieval system on the same task. Burke et al. (1995) present an apartment-finding system that avoids the problem of query languages by providing the client with a set of browsing interfaces. However, their approach is applicable only in domains with finite sets of features where clients have very specific search goals expressible in terms of those features. Although our approach uses some knowledge representation, its complexity does not go beyond a single-inheritance hierarchy of topics.
The remaining knowledge is acquired by the system automatically from free-text documents. Furthermore, our approach does not put any constraints on the clients' inputs: the clients interact with the system in natural language.

Conclusion

In this paper we have presented an approach to building web-based managers of online textual expertise, called Information Exchange systems. The objective of the Information Exchange technology is to meet the growing need of organizations to answer clients' questions quickly and to use their experts efficiently. The Information Exchange systems act as intermediaries between the organization's clients and experts and support the organization's routine question-answering activity.

We have described the Chicago Information Exchange system that manages the online textual expertise of the University of Chicago Department of Computer Science and acts as an intermediary between its clients and experts. The CIE clients are provided with access to the Department's online textual expertise through interactive question-answering. The CIE experts register their expertise on an appropriate topic, collaborate with each other on the incoming questions, edit their previous answers, and add new answers. Thus, the update and acquisition of online textual expertise occur as natural by-products of the experts' question-answering activity.

We have presented the indexing and retrieval techniques and term weight metrics that support interactions and collaborations in question-answering. The CIE term weight metric combines morphological analysis, spreading activation, and two statistical techniques of the vector space model of retrieval. We have also presented two relevance feedback techniques that support the system's interactions with the clients and the experts. We have argued that to support adequate interactions with the experts, relevance feedback needs to be complemented with negative evidence acquisition.

Finally, a major advantage of our approach is that it is fundamentally human-centered. We focus on an interactive and collaborative behavior that already exists, i.e., interactive and collaborative question-answering in organizations, and try to develop a technology that supports it to the benefit of both clients and experts.

References

Aalbersberg, I.J. (1992). "Incremental relevance feedback." Proceedings of the 15th Annual International SIGIR Conference, 1992 (pp. 11-21).

Bookstein, A., Klein, S.T., & Raita, T. (1995). "Detecting Content-Bearing Words by Serial Clustering - Extended Abstract." Proceedings of the ACM SIGIR Conference, 1995 (pp. 319-327).

Bookstein, A., Klein, S.T., & Raita, T. (expected 1998). Clumping Properties of Content-Bearing Words. Journal of the American Society for Information Science.

Brauen, T.L. (1971). "Document vector modification." In Salton, G. (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 456-484). Englewood Cliffs, NJ: Prentice-Hall, Inc.

Burke, R.D., Hammond, K.J., Kulyukin, V., Lytinen, S.L., Tomuro, N., & Schoenberg, S. (1997). Question answering from frequently asked question files: Experiences with the FAQ Finder system. AI Magazine, 18, 57-66.

Burke, R., Hammond, K.J., & Young, B.C. (1996). "Knowledge-based navigation of complex information spaces." Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96).

Cohen, P.R., & Kjeldsen, R. (1987). Information retrieval by constrained spreading activation in semantic networks. Information Processing & Management, 23, 255-268.

Cutting, D.R., Pedersen, J.O., Karger, D., & Tukey, J.W. (1992). "Scatter/Gather: A cluster-based approach to browsing large document collections." Proceedings of the 15th Annual International SIGIR Conference, 1992 (pp. 318-329).

Davenport, T. (1994). Saving IT's soul: Human-centered information management. Harvard Business Review (pp. 119-131).

Francis, W., & Kucera, H. (1982). Frequency Analysis of English Usage. New York: Houghton Mifflin.

Ide, E., & Salton, G. (1971). "Interactive search strategies and dynamic file organization in information retrieval." In Salton, G. (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 456-484). Englewood Cliffs, NJ: Prentice-Hall, Inc.

Krovetz, R. (1993). "Viewing morphology as an inference process." Proceedings of the ACM SIGIR Conference, 1993 (pp. 191-202).

Miller, G.A. (1995). WordNet: A Lexical Database for English. Communications of the ACM, 38(11).

Quillian, M.R. (1968). Semantic Memory. In Minsky, M. (Ed.), Semantic Information Processing (pp. 216-270). Cambridge, MA: MIT Press.

Riloff, E. (1995). "Little words can make a big difference for text classification." Proceedings of the ACM SIGIR Conference, 1995 (pp. 130-136).

Salton, G., & McGill, M. (1983). Introduction to modern information retrieval. New York: McGraw-Hill.

Salton, G., Yang, C.S., & Yu, C.T. (1975). A theory of term importance in automatic text analysis. Journal of the American Society for Information Science (pp. 33-44).

Sanderson, M. (1994). "Word disambiguation and information retrieval." Proceedings of the ACM SIGIR Conference, 1994.

Sparck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11-21.

Voorhees, E.M. (1993). "Using WordNet to disambiguate word senses for text retrieval." Proceedings of the ACM SIGIR Conference, 1993 (pp. 171-180).
