Domain-Specific FAQ Retrieval Using Independent Aspects

CHUNG-HSIEN WU, JUI-FENG YEH, AND MING-JUN CHEN
National Cheng Kung University, Tainan, Taiwan

Authors' addresses: C.-H. Wu, J.-F. Yeh, and M.-J. Chen, Department of Computer Science and Information Engineering, National Cheng Kung University, Ta-Hsueh Rd., Tainan, Taiwan; email: {chwu, jfyeh, mjchen}@csie.ncku.edu.tw

This investigation presents an approach to domain-specific FAQ (frequently-asked question) retrieval using independent aspects. The data analysis classifies the questions in the collected QA (question-answer) pairs into ten question types in accordance with question stems. The answers in the QA pairs are then paragraphed and clustered using latent semantic analysis and the K-means algorithm. For semantic representation of the aspects, a domain-specific ontology is constructed based on WordNet and HowNet. A probabilistic mixture model is then used to interpret the query and QA pairs based on independent aspects; hence the retrieval process can be viewed as a maximum likelihood estimation problem. The expectation-maximization (EM) algorithm is employed to estimate the optimal mixing weights in the probabilistic mixture model. Experimental results indicate that the proposed approach outperformed the FAQ-Finder system in medical FAQ retrieval.

Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing - Text analysis; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Retrieval models, Query formulation, Search process; H.2.4 [Database Management]: Systems - Query processing

General Terms: Algorithms, Design, Experimentation, Performance

Additional Key Words and Phrases: Information retrieval, natural language processing, FAQ retrieval, question-answering, ontology, probabilistic mixture model, latent semantic analysis

1. INTRODUCTION

During the last decade, question-answering (QA) systems have been designed to find the question-answer pairs most similar to user queries. The Text REtrieval Conference (TREC) enables researchers to share their experiences and provides a global metric for assessing QA system performance. These methods can be summarized as rule-based, statistical, and mixed approaches. Rule-based QA systems usually contain rule chaining and inference strategies [Lin and Pantel 2001; Wang et al. 2000]. Statistical approaches [Clarke et al. 2002; Brill et al. 2002] are able to mine the data using the n-gram model. The mixed approach was realized in QASM [Radev et al. 2001], which included the expectation-maximization (EM) algorithm and 15 basic operations for the rule chain inference. In Wang et al. [2000], the information is organized in a tabular form, and the QA system can understand the tabular structure by inference. AskMSR [Brill et al. 2002] can mine the data using question categories. Some systems, such as FAQ-Finder [Hammond et al. 1995; Burke et al. 1997], QASM [Radev et al. 2001], and the systems proposed in TREC [Na et al. 2002; Paranjpe et al. 2003], used WordNet as a lexicon. Automated question-answering methods proposed in Chu-Carroll et al. [2002] and Sneiders [2002] used multiple lexicons as the knowledge base.


In addition, ontology has been demonstrated to provide a sound semantic basis in Tong et al. [2003] and Ahmedi and Lausen [2002].

On the other hand, instead of generating the desired answers as in QA systems, FAQ systems retrieve existing QA pairs from frequently-asked question files [Burke et al. 1997]. In general, FAQ retrieval is a task of retrieving information from a set of semi-structured texts. FAQ systems have the following characteristics: (1) the FAQ system is designed for the retrieval of very frequent, popular, and highly reusable question-answer pairs, called QA pairs; (2) QA pairs are generally maintained and periodically posted on the Internet; and (3) QA pairs are usually provided or verified by domain experts. FAQ retrieval also differs from traditional information retrieval: in general, no semantic representation or knowledge is used in traditional information retrieval, whereas FAQ retrieval is usually domain-specific and adopts inference and reasoning to retrieve a more accurate QA pair for a query.

Much research has focused on FAQ retrieval in the past few years. Auto-FAQ [Whitehead 1995] relied on a shallow, surface-level analysis for FAQ retrieval. FAQ-Finder [Hammond et al. 1995; Burke et al. 1997] adopted two major aspects, i.e., concept expansion using the hypernyms defined in WordNet and the TF-IDF weighted score in the retrieval process. In FAQ-Finder, interrogative words like "what" and "how" are, respectively, substrings of interrogative phrases such as "for what" and "how large". This results in misdetection of question types and therefore degrades system performance. To eliminate this problem in FAQ-Finder, Tomuro [2002] combined lexical and semantic features to automatically extract the interrogative words from a corpus of questions. Besides WordNet-based approaches, FALLQ [Lenz et al. 1998] retrieved FAQs via case-based reasoning (CBR). Sneiders [1999] defined four types of words, i.e., required keywords, optional keywords, forbidden keywords, and irrelevant words, for prioritized keyword-matching. Sneiders [2002] used question templates with entity slots that are replaced by data instances from the underlying database to interpret the structure of queries and questions. Berger et al. [2000] proposed a statistical lexicon correlation method. DiQuest e-Answer [Lee and Lee 2003] used dynamic passage selection and lexico-semantic patterns for FAQ retrieval.

Although open-domain systems have been a hot topic for QA systems and information retrieval [Baeza-Yates and Ribeiro-Neto 1999], a domain-specific FAQ retrieval system with a high content-to-noise ratio is still a core research topic for practical applications. This investigation focuses on medical domain FAQ retrieval using independent aspects. In this approach, questions in the QA pairs are classified according to their question stems. In order to provide a more precise answer to a query, each answer in the QA pairs is segmented into several paragraphs. These paragraphs are then clustered using latent semantic analysis (LSA) [Manning and Schutze 1999] and the K-means algorithm. A probabilistic mixture model is adopted for retrieval modeling, whereby the input query and the QA pairs are interpreted based on independent aspects. Using the probabilistic mixture model, the retrieval process is considered a maximum likelihood problem.
The probabilistic mixture model based on the independent aspects provides a powerful and conceptually transparent formalism for the causal relation and identifies the desired answer from the answer part of the QA pairs, even if the user query is not included in the QA questions. Additionally, predefined relation rules, such as "result-in" and "result-from," regarding the relationship between diseases and syndromes proposed in Yeh et al. [2004], are adopted in our medical FAQ retrieval system. Finally, the EM algorithm is employed to estimate the optimal mixing weights in the probabilistic mixture model.


The rest of this article is organized as follows. Section 2 provides the data analysis and answer paragraphing. Section 3 presents the modeling of the FAQ retrieval problem and the probabilistic mixture model fitting using the EM algorithm. Section 4 presents the experimental results for the evaluation of our approach. Section 5 provides concluding remarks.

2. DATA ANALYSIS AND ANSWER PARAGRAPHING

This study constructs a medical FAQ retrieval system with a collection of 1172 medical QA pairs from the Internet. For the collected QA pairs, the question and answer parts convey different information and should be analyzed separately. For the question part, the question stem is the most important word in an interrogative sentence. Consequently, ten categories of question types are derived based on question stems. For the answer part, the answers in the QA pairs can be classified into four types based on their contents:

(a) Boolean type: answers with only "yes" or "no" words.
Example:
Q: 是不是每一位病毒性肝炎病人都需要接受干擾素治療? (Should everyone with virus hepatitis be cured using Interferon?)
A: 不是 (No)

(b) Set type: answers with only some keywords or concepts.
Example:
Q: 我終年鼻塞為什麼呀? (Why do I have a stuffy nose all the time?)
A: 終年鼻塞原因很多。「肥厚性鼻炎」、「鼻中隔彎曲」引起最多。「鼻竇炎」也有可能。 (There are many reasons for a stuffy nose. "Plumped rhinitis" and "crooked nasal septum" are the most frequent causes. One other possibility is nasosinusitis.)

(c) Short description type: answers with a short paragraph. Generally, this type does not provide neat answers, and some noisy concepts should be filtered out.
Example:
Q: 我們為何會產生背部疼痛? (Why do we have backaches?)
A: 大部分的時間我們並沒有以嚴肅的態度來處理我們的背部，快速地坐下或不正常的站姿以及在搬重物時不是以足部施力而以腰部取代之，… (Most of the time we don't treat our backs seriously. We slump down in our chairs, slouch when we stand, lift with our backs instead of our legs, and carry too much weight around our waists...)

(d) Long description type: answers with different topics and long descriptions. Greetings and copyright claims usually appear in this type, and more noisy concepts are embedded in this answer type than in the others.
Example:
Q: 手機致癌首項證據　德國科學家宣稱手機輻射可能導致眼癌 (Radio-frequency radiation exposure linked with increased risk of uvea melanoma)


A: 德國科學家發現手機輻射可能會導致眼癌，這是科學界首度提出使用手機和致癌的關聯性。…… 德國埃森大學(University of Essen)的史丹格(Andreas Stang)大夫負責主持這項研究。…… (Mobile phones have been strongly linked to human cancer in a new scientific study. The research found a threefold increase in eye cancers among people who regularly use the devices. … The research, published in the journal Epidemiology, was carried out by a team from the University of Essen in Germany.)

From the data analysis, long answers containing multiple topics are segmented into several paragraphs using a topic-segmentation algorithm [Hsieh et al. 2003]. Since it is difficult to collect sufficient paragraphs for the statistical approach, the paragraphs are clustered. Most previous semantic theories characterized the meaning of a sentence on the basis of the meanings of its constituent parts and the way they were combined. In this study, latent semantic analysis is adopted for distance estimation and is used to filter out the noisy constituents, thus preserving the discriminative concepts in a vector. Finally, the 3312 segmented paragraphs were assembled into 320 clusters via the K-means algorithm.

3. PROBLEM MODELING FOR QA

In this study, the FAQ retrieval system compares the user query with every one of the collected QA pairs [Hammond et al. 1995]. The problem can be characterized using a conditional probability model. In this model, the QA pairs can be interpreted based on a set of independent aspects, S = {s_1, s_2, ..., s_M}, expanded from the query q. The conditional probability model over all QA pairs and their associated queries is defined by the mixture P(QA | q) = Σ_{s∈S} P(s | q) P(QA | s, q), where q denotes the query and QA is a QA pair. Consider the class-conditional distributions P(⋅ | s, q) over all QA pairs. These distributions can be represented as the points projected from the set of aspects S = {s_1, s_2, ..., s_M}. From the above model assumption, each QA pair can be interpreted using M independent aspects, with M_Q aspects from the question part and M_A aspects from the answer part, where M_Q + M_A = M. As in many conditional probabilistic models, the conditional independence assumption is introduced, namely that only the corresponding interpretation of each aspect in a QA pair can be activated. Accordingly, the equation can be further simplified as

P(QA | q) = Σ_{s∈S} P(s | q) P(QA | s, q)
          = Σ_{s∈S} P(s | q) P(Q_1, ..., Q_{M_Q}, A_1, ..., A_{M_A} | s, q)
          ≈ Σ_{m=1}^{M_Q} P(s = s_{Q,m} | q) P(Q_m | s = s_{Q,m}, q)
            + Σ_{m=1}^{M_A} P(s = s_{A,m} | q) P(A_m | s = s_{A,m}, q),        (1)

where Q_m represents the m-th interpretation from the question part with respect to the m-th aspect s_{Q,m}, and A_m represents the m-th interpretation from the answer part with respect to the m-th aspect s_{A,m}.


In the FAQ retrieval system, from the modeling above, the retrieval process can be considered maximum likelihood estimation. The objective of this study is to identify the best QA pair, QA*, which maximizes the likelihood P(QA | q) over the collected QA pairs; that is,

QA* = argmax_{QA} P(QA | q).        (2)
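To make the retrieval rule concrete, the following minimal Python sketch shows one way Eqs. (1) and (2) could be realized, assuming every aspect has been implemented as a normalized scoring function returning P(interpretation | s, q); the scorer interface and the mixing weights are illustrative assumptions, not the authors' implementation.

# A minimal sketch of the mixture retrieval rule in Eqs. (1)-(2).
# The aspect scorers and mixing weights below are illustrative
# placeholders, not the paper's actual code.

def mixture_score(qa_pair, query, aspect_scorers, weights):
    """P(QA|q) ~= sum_m P(s_m|q) * P(interpretation_m | s_m, q)."""
    return sum(w * scorer(qa_pair, query)
               for w, scorer in zip(weights, aspect_scorers))

def retrieve_best(qa_pairs, query, aspect_scorers, weights):
    """Eq. (2): return the QA pair maximizing the mixture likelihood."""
    return max(qa_pairs,
               key=lambda qa: mixture_score(qa, query, aspect_scorers, weights))

In this reading, the mixing weights play the role of P(s | q) and are exactly the quantities the EM training in Section 3.3 estimates.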

To solve this problem, the aspects for the question and answer can span many diverse features, such as question stems and syntactic and semantic information. This investigation considers the aspects for the question and answer parts separately.

3.1 Aspects for the Question Part

Three aspects are investigated separately for the question part, namely: (1) the question stem, (2) the key concept, and (3) the vector space representation.

3.1.1 Aspect s_{Q,1}: The Question Stem. In this aspect, question stems, i.e., interrogative words such as "who," "how," and "what," provide one useful classification of factual questions. The question stems are used as coarse clues in the identification of the expected answer types [Moldovan et al. 2003]. The question stems in questions and queries are analyzed using the collected query and QA pairs. The conditional probability of the i-th QA pair with respect to aspect s_{Q,1} and query q is defined as

P(Q_{1,i} | s_{Q,1}, q) = P(Stem(Q_i) | s_{Q,1} = Stem(q))
                        = P(Stem(Q_i) | s_{Q,1} = Stem(q)) / Σ_i P(Stem(Q_i) | s_{Q,1} = Stem(q)),        (3)

where Stem(⋅) represents the question stem of the query or question.
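As a rough illustration of stem detection, the sketch below uses a tiny hypothetical stem lexicon (the full ten-type inventory is in Appendix A; the three stems and type indices shown are taken from the examples discussed in Section 4.1.1) and matches longer stems first to avoid the substring confusion noted there.

# Illustrative sketch of stem-based question typing behind Eq. (3).
# The stem lexicon is a hypothetical three-entry fragment of the
# paper's ten-type inventory (Appendix A).
QUESTION_STEMS = {
    "為什麼": 4,   # "why" (question type 4)
    "什麼": 1,     # "what" (question type 1)
    "可否": 8,     # "may" (question type 8)
}

def detect_question_type(query_words):
    # Match longer stems first so "什麼 (what)" is not detected inside
    # "為什麼 (why)", the substring confusion discussed in Sec. 4.1.1.
    for stem in sorted(QUESTION_STEMS, key=len, reverse=True):
        if any(stem in word for word in query_words):
            return QUESTION_STEMS[stem]
    return None  # no stem found; the other aspects must decide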

3.1.2 Aspect s_{Q,2}: The Key Concept. For semantic representation, traditional keyword-based systems introduce two problems in practice. First, ambiguity generally results from the polysemy of a word due to over-expansion. Second, relations between the concepts should be expanded and weighted to include more semantic information. Hence, this article adopts the medical ontology proposed in Yeh et al. [2004]. The hierarchical structure in WordNet is employed as the basis, and the ontology is constructed by aligning the synsets in WordNet with the corresponding Chinese words defined in HowNet. A medical domain corpus is collected and used to extract the domain-specific concepts according to the TF-IDF score. Finally, 1213 axioms derived from a medical encyclopedia are also integrated into the domain ontology.

This study treats each question and input query as a bag of words including the interrogative words. For the words of the query q = W_q = w_{q1}, w_{q2}, ..., w_{qK} and the words of the question Q = W_Q = w_{Q1}, w_{Q2}, ..., w_{QL}, the similarity between the input query and the question is defined as the similarity between the two bags of words. The similarity measure based on key concepts defined in the constructed ontology [Yeh et al. 2004] is

SimKey(Q, q) = SimKey(W_Q, W_q)
             = SimKey((w_{Q1}, w_{Q2}, ..., w_{QL}), (w_{q1}, w_{q2}, ..., w_{qK}))
             = Σ_{k=1,l=1}^{K,L} H_{kl},        (4)

where H_{kl} denotes the concept similarity of w_{Ql} and w_{qk}. Most keyword expansion approaches extend the scope using synonyms. This investigation defines the similarity H_{kl} as

H_{kl} = 1,             if w_{Ql} and w_{qk} are identical;
       = (1/2)^r,       if w_{Ql} and w_{qk} are hypernyms, where r is the number of levels in between;
       = 1 − (1/2)^t,   if w_{Ql} and w_{qk} are synonyms, where t is the number of common concepts;
       = 0,             otherwise.        (5)

The conditional probability of the i-th QA pair with respect to aspect s_{Q,2} and query q is defined as

P(Q_{2,i} | s_{Q,2}, q) = P(KC(Q_i) | s_{Q,2} = KC(q), q) = SimKey(Q_i, q) / Σ_i SimKey(Q_i, q),        (6)

where KC(⋅) represents the key concept of the query or question.

3.1.3 Aspect s_{Q,3}: Vector Space Representation. The LSA approach is generally adopted for vector space representation. The mapping from the query to the vector space can be obtained using singular value decomposition [Manning and Schutze 1999]. Besides dimensionality reduction, this approach can be used to measure the similarity between the question part of the QA pairs and a query. The similarities between the query and question vectors are calculated in the reduced LSA space. The conditional probability of the i-th QA pair with respect to aspect s_{Q,3} and query q is defined as

P(Q_{3,i} | s_{Q,3}, q) = P(VS(Q_i) | s_{Q,3} = VS(q), q) = cos(VS(q), VS(Q_i)) / Σ_i cos(VS(q), VS(Q_i)),        (7)

where cos(⋅) denotes the cosine function and VS(⋅) represents the vector space representation of the query or question.
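A minimal numerical sketch of this aspect, assuming a term-document matrix built from the question parts, is given below; the rank k and the folding-in of query vectors follow the standard LSA recipe rather than any implementation detail reported in the paper.

# Sketch of the LSA aspect in Eq. (7). The term-document matrix is
# assumed to be built from the question parts; k is illustrative.
import numpy as np

def lsa_fit(term_doc_matrix, k=100):
    """Truncated SVD: keep the k largest singular values."""
    U, s, Vt = np.linalg.svd(term_doc_matrix, full_matrices=False)
    return U[:, :k], s[:k]

def fold_in(term_vector, U_k, s_k):
    """Project a query or question term vector into the reduced space."""
    return term_vector @ U_k / s_k

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))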


Fig. 1. Axiom for the disease (diabetes) and its related syndromes.

3.2 Aspects for the Answer Part

For the answer part, two aspects are investigated separately, namely: (1) the relation and (2) the paragraph cluster.

3.2.1 Aspect s_{A,1}: Relations. Relations such as "result in" and "result from," which describe the relationship between syndromes and diseases in the medical domain, are expected to influence system performance. Data on syndromes and diseases was obtained from a medical encyclopedia that defines 1213 axioms containing the relationships between diseases and syndromes. Each disease was assigned to one of three levels, depending on the number of its occurrences, while each syndrome was assigned to one of four levels, depending on its significance to a specific disease [Yeh et al. 2004]. Figure 1 shows an example of the axiom for the disease "diabetes." If a disease occurs in the input query and its related syndromes appear in the answer part of the QA pair, the "result in" relation score RI(A_i, q) is used. Similarly, if a syndrome occurs in the input query and its corresponding disease appears in the answer part of the QA pair, the "result from" relation score RF(A_i, q) is used. The relation score is estimated as

Rel(A, q) = max{RI(A, q), RF(A, q)}
          = max{RI(w_{A1}, ..., w_{AP}, w_{q1}, ..., w_{qR}), RF(w_{A1}, ..., w_{AP}, w_{q1}, ..., w_{qR})}
          = max{Σ_{p=1,r=1}^{P,R} d_{pr}^{RI}, Σ_{p=1,r=1}^{P,R} d_{pr}^{RF}},        (8)

where d_{pr}^{RI} = (1/2)^{n−1} if disease w_{Ap} results in syndrome w_{qr} and w_{qr} is the top-n feature of w_{Ap}, and d_{pr}^{RF} = (1/2)^{n−1} if syndrome w_{Ap} results from disease w_{qr} and w_{Ap} is the top-n feature of w_{qr}. The conditional probability of the i-th QA pair with respect to aspect s_{A,1} and query q is defined as

P(A_{1,i} | s_{A,1}, q) = P(REL(A_i) | s_{A,1} = REL(q), q) = Rel(A_i, q) / Σ_i Rel(A_i, q),        (9)

where REL(⋅) represents the relation of the query or answer.
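The sketch below follows Eq. (8) as reconstructed above. The axiom table mapping a disease to its significance-ranked syndromes is a hypothetical stand-in for the 1213 encyclopedia-derived axioms, and the RI/RF word-direction convention is an assumption read off the formula.

# Sketch of the relation score in Eq. (8). RESULT_IN is a hypothetical
# fragment of the axiom base; real axioms come from the encyclopedia.
RESULT_IN = {  # disease -> syndromes ranked by significance (top-n order)
    "diabetes": ["polyuria", "thirst", "weight loss"],
}

def d_score(disease, syndrome, axioms=RESULT_IN):
    """(1/2)^(n-1) when the syndrome is the top-n feature of the disease."""
    feats = axioms.get(disease, [])
    return 0.5 ** feats.index(syndrome) if syndrome in feats else 0.0

def relation_score(answer_words, query_words):
    """Rel(A, q) = max{RI, RF} of Eq. (8)."""
    ri = sum(d_score(d, s) for d in answer_words for s in query_words)
    rf = sum(d_score(d, s) for d in query_words for s in answer_words)
    return max(ri, rf)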


3.2.2 Aspect s_{A,2}: Paragraph Clusters. For the long description answer type, paragraphs are generally treated as separate documents, and document-oriented retrieval techniques are applied to retrieve QA pairs [Harabagiu and Maiorano 1999; Clarke et al. 2001]. Accordingly, in this aspect, all answer parts with short or long description types in the QA pairs are segmented into paragraphs via a topic-segmentation algorithm [Hsieh et al. 2003]. For all the segmented paragraphs, LSA and the K-means algorithm are adopted to assemble the paragraphs into clusters. Paragraphs in each cluster are therefore assumed to belong to topics with semantically similar words. Given a query q denoted as a sequence of concepts W_q, this study assumes that each paragraph cluster is statistically independent, and the conditional probability of an answer with n paragraphs p_1, ..., p_n is described as

P(A | s_{A,2}, q) = P(PC_1, ..., PC_n | w_{q1}, w_{q2}, ..., w_{qR}) = Π_{j=1}^{n} P(PC_j | w_{q1}, w_{q2}, ..., w_{qR}),        (10)

where PC_j represents the paragraph cluster of the j-th paragraph. Using the Bayes rule, the above equation is derived as

P(PC_j | w_{q1}, w_{q2}, ..., w_{qR}) = P(PC_j) P(w_{q1}, w_{q2}, ..., w_{qR} | PC_j) / P(w_{q1}, w_{q2}, ..., w_{qR})
                                      ∝ P(PC_j) P(w_{q1}, w_{q2}, ..., w_{qR} | PC_j)
                                      = P(PC_j) Π_{r=1}^{R} P(w_{qr} | PC_j).        (11)

The prior probability can be estimated as

P(PC_j) = N(PC_j) / Σ_{j=1}^{k} N(PC_j),

where N(PC) represents the number of occurrences of paragraph cluster PC and k denotes the number of clusters. P(w_{qr} | PC_j) represents the probability of w_{qr} given the paragraph cluster PC_j, estimated from the collected QA pairs. In the proposed approach, paragraphs are selected as the query outputs rather than the whole answer. The conditional probability of the answer with n paragraphs can be considered the conditional probability of the paragraph with the highest probability for a particular input query:

P(p_1, ..., p_n | q) ≡ max_j P(p_j | W_q) ≈ max_j P(PC_j | w_{q1}, w_{q2}, ..., w_{qR}) = max_j P(PC_j) Π_{r=1}^{R} P(w_{qr} | PC_j).        (12)
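A compact sketch of the cluster scoring in Eqs. (11)-(12) follows; it works in log space for numerical stability, and the cluster priors, word distributions, and smoothing floor are assumed inputs estimated from the clustered paragraphs, not values from the paper.

# Sketch of the naive-Bayes cluster scoring in Eqs. (11)-(12).
# Priors and word distributions are assumed to have been estimated
# from the clustered paragraphs; the smoothing floor is illustrative.
import math

def cluster_log_score(query_words, prior, word_prob, floor=1e-6):
    """log P(PC) + sum_r log P(w_qr | PC)."""
    return math.log(prior) + sum(math.log(word_prob.get(w, floor))
                                 for w in query_words)

def best_paragraph(query_words, clusters):
    """Eq. (12): the paragraph whose cluster maximizes the score."""
    return max(clusters,
               key=lambda c: cluster_log_score(query_words,
                                               c["prior"], c["word_prob"]))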


The conditional probability of the i-th QA pair with respect to aspect s_{A,2} and query q is thus defined as

P(A_{2,i} | s_{A,2}, q) = P(PC(A_i) | s_{A,2} = PC(q), q)
                        = max_j P(PC_j^i) Π_{r=1}^{R} P(w_{qr} | PC_j^i) / Σ_i max_j P(PC_j^i) Π_{r=1}^{R} P(w_{qr} | PC_j^i),        (13)

where PC(⋅) represents the paragraph cluster of the answer and PC_j^i denotes the paragraph cluster for the j-th paragraph in the i-th answer.

3.3 Probabilistic Mixture Model Training Using the EM Algorithm

From the modeling above, a probabilistic mixture model is proposed to linearly combine the above aspects for FAQ retrieval. This mixture model has a density function P(QA | q, Θ) governed by the set of parameters Θ. The model also has a data set of size N, assumed drawn from this distribution, i.e., (QA, q) = {(QA_i, q_i), i = 1, ..., N}. This study assumes that these data vectors are independent and identically distributed with distribution P. Therefore, the resulting density for the data is

P(QA | q, Θ) = Σ_{s∈S} P(s | q, Θ) P(QA | s, q, Θ)
             = Σ_{s∈S} P(s | q, Θ) P(Q_1, ..., Q_{M_Q}, A_1, ..., A_{M_A} | s, q, Θ)
             = Σ_{m=1}^{M_Q} P(s = s_{Q,m} | q, Θ) P(Q_m | s = s_{Q,m}, q, Θ)
               + Σ_{m=1}^{M_A} P(s = s_{A,m} | q, Θ) P(A_m | s = s_{A,m}, q, Θ),        (14)

where the unobserved parameters are Θ = (P(s_{Q,1} | q, Θ), P(s_{Q,2} | q, Θ), ..., P(s_{A,2} | q, Θ)). The mixture weight parameters are estimated using the EM algorithm [Dempster et al. 1977].
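The sketch below shows the standard EM update for the mixing weights of such a model, assuming the per-aspect likelihoods P(⋅ | s_m, q, Θ) have been precomputed for each training pair and held fixed, so that only the weights are re-estimated; the iteration count is an arbitrary choice.

# Sketch of EM re-estimation of the mixing weights in Eq. (14),
# assuming a precomputed (N, M) array of per-aspect likelihoods,
# one row per training (QA, q) pair and one column per aspect.
import numpy as np

def em_mixing_weights(likelihoods, n_iter=50):
    N, M = likelihoods.shape
    w = np.full(M, 1.0 / M)                  # uniform initialization
    for _ in range(n_iter):
        resp = w * likelihoods               # E-step: responsibilities
        resp /= resp.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)                # M-step: re-estimate weights
    return w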

4. EXPERIMENTAL RESULTS

A medical domain FAQ retrieval system was constructed to evaluate the proposed approach. The QA pairs were collected from websites and contain a total of 1172 medical QA pairs. Forty individuals who did not participate in system development were asked to provide queries given the answer parts of the QA pairs. In this manner, 5371 training queries and 5907 test queries were obtained. On average, each query corresponds to the answers of 17.23 QA pairs, and there are 6.72 terms per query, as listed in Table I.

Table I. Statistics in the Query Collection

Number of queries: 11,278
Average no. of terms per query: 6.72
Average no. of QA pairs related to a query: 17.23


For comparison with FAQ-Finder [Burke et al. 1997], approaches based on the vector space model using key term expansion and the TF-IDF score were implemented as the baseline system. A performance measure called 11-AvgP, the average of precision at 11 standard recall points (0.0, 0.1, 0.2, ..., 1.0) [Eichmann and Srinivasan 1998], was used to integrate the precision and recall rates; a code sketch of this measure is given after Table II. Section 4.1 describes the evaluation of the FAQ retrieval system for answers of different types, to assess the effects of the different aspects on overall system performance. Section 4.2 describes the overall evaluation for all answer types.

4.1 Evaluation of Answers with Different Types

4.1.1 The Boolean Type. In this experiment, Boolean type answers were evaluated first; the results are shown in Figure 2, where the horizontal axis shows recall (0% to 100%) and the vertical axis shows precision (0% to 100%). For Boolean type answers, since the question part of the QA pairs contains most of the information, the baseline FAQ-Finder system obtained good precision and recall rates. The best 11-AvgP score, 0.6643, was achieved using question stems, as shown in Table II. Since the most important information in QA pairs with Boolean type answers is embedded in the question part, the experiment using only the question stem obtained the most promising results.

Table II. Best 11-AvgP Score and Relative Aspect for Answers with Different Types

Answer type      Best 11-AvgP    Aspect
Boolean          0.6643          Question stem
Set              0.6732          Relation rule
Description      0.6327          Paragraph cluster
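As promised above, the following sketch computes 11-AvgP under one common reading of the measure, using interpolated precision at each standard recall level; the input format is an assumption, not the evaluation code used in the paper.

# Sketch of the 11-point average precision (11-AvgP) measure,
# interpolating precision at recall points 0.0, 0.1, ..., 1.0.

def eleven_avgp(precision_recall):
    """precision_recall: list of (recall, precision) points on a
    run's precision-recall curve, e.g. one point per retrieved rank."""
    points = []
    for level in [i / 10.0 for i in range(11)]:
        # interpolated precision: best precision at recall >= level
        ps = [p for r, p in precision_recall if r >= level]
        points.append(max(ps) if ps else 0.0)
    return sum(points) / 11.0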

Fig. 2. Precision-recall rate curves for Boolean type answers (curves: FAQ-Finder, question stem, key concept, vector space representation, probabilistic mixture model).

Fig. 3. Precision-recall rate curves for Set type answers (curves: FAQ-Finder, question stem, vector space representation, key concept, relation, probabilistic mixture model).

However, similar to the question type misclassification problem described in Hammond et al. [1995], this study still has some problems when using question stems: (1) there exists confusion between some question stems, such as "什麼 (what)" in question type 1 and "為什麼 (why)" in question type 4, because the word "什麼 (what)" is a substring of the word "為什麼 (why)," and errors in word segmentation result in misclassification of question stems; (2) some queries not containing question stems are sometimes misdetected as other question types. For example, the query "流感病毒感染幼童的途徑很多 (The flu virus infects children in many ways)" was detected as type 6, in which the word "多 (many)" was misdetected as the question stem "多 (how many)"; and (3) although some words are not collected in the question stem set, the intention is still embedded in the sentence pattern. For example, the query "可用阿司匹靈治療否? (May it be cured using aspirin?)" contains the word pattern "可…否," which is not defined in the question stem set but can be regarded as the question stem "可否 (may)" in type 8.

In addition, both recall and precision rates were improved by using the key concept aspect, especially for medical terms contained in the user query or QA pairs. This is because the key concept also plays an important role in characterizing the question part of the QA pairs with Boolean type answers. Moreover, the relation aspect using the defined relations is also important for Boolean type answers. The reasons for these results are: (1) synset pruning [Turcato et al. 2000] of polysemous words reduces word ambiguity; and (2) expansion by the other relations provides more effective inference.

4.1.2 Set Type. In this experiment on Set type answers, the relation between diseases and syndromes is important for semantic inference; 81.3% of the QA pairs in medical domain FAQ retrieval are relevant to diseases and syndromes. The experimental results are shown in Figure 3; the best 11-AvgP score, 0.6732, was achieved using the relation rules, as shown in Table II. The results show that the relations "result in" and "result from" greatly improve the retrieval performance for Set type answers. From the investigation of the experimental results, significant improvement is obtained when the diseases and their related syndromes (and vice versa) appear, respectively, in the question and answer parts. Improvement in the recall rate is also due to robustness in answering users' incomplete queries. That is, even though the user query contains only one or two of the three syndromes resulting from a disease, the desired answer can still be retrieved using the defined relations. Moreover, the approach using question stems also achieved good results for question types 1 and 9 because the correspondence between the question stem and answers is fixed. We also find that some new words are contained in the Set type answers; the OOV (out-of-vocabulary) problem is another important issue for this type, especially in domain-specific applications.


4.1.3 Short and Long Description Types. In this experiment, the answers with the long description type were segmented into paragraphs. The results for answers with short and long description types are shown in Figures 4 and 5, respectively. The performance of the FAQ retrieval system with the paragraph-based approach is better than with the whole-answer-based approach, showing that a paragraph with a specific topic can provide a more precise answer. The best 11-AvgP score, 0.6327, was achieved using paragraph clusters, as shown in Table II, which agrees with the conclusion of our earlier data analysis. In addition, vector space representation and paragraph clustering using LSA were beneficial in keeping information for the folding-in query, i.e., a query whose question is not contained in the original training collection. According to this experiment, a paragraph or subparagraph with a focused topic is more precise than the document as a whole and is more suitable as the unit for query output.

Fig. 4. Precision-recall rate curves for short description type answers (curves: FAQ-Finder, question stem, vector space representation, key concept, relation, paragraph cluster, probabilistic mixture model).

Fig. 5. Precision-recall rate curves for long description type answers (curves: FAQ-Finder, question stem, vector space representation, key concept, relation, paragraph cluster, probabilistic mixture model).

Fig. 6. Comparing the performance of systems for all answer types (curves: FAQ-Finder, probabilistic mixture model).

On the other hand, the key concept approach does not obtain promising results for the short and long description types, because the number of words in these answers is larger than in the other types. Question stems and relation inference play a major role in improving performance when the answer is short, but the improvement decreases as the length of the answer increases.

4.2 Overall Evaluation for All Answer Types

In this experiment, answers of all types were used to evaluate overall performance. The precision-recall curve of the entire model and the performance using the nAP measure were evaluated.

4.2.1 Evaluation of the Mixture Model Using the Precision-Recall Curve. The probabilistic mixture model with equal weights and the baseline FAQ-Finder system were also evaluated for comparison; the results are shown in Figure 6. In this figure, a best 11-AvgP score of 0.6632 is obtained. The probabilistic mixture model with equal weights is only slightly inferior to the probabilistic mixture model trained with the EM algorithm. Compared to the baseline system, the proposed probabilistic mixture model achieved a conspicuous improvement in the precision rate. This result also confirms that the integration of different aspects, spanning the question stem and syntactic to semantic information, improves retrieval performance. In addition, approaches based on different aspects are useful in covering different answer types. Also, the hierarchical structure for key concept expansion defined in Eq. (5) obtains a better result than the semantic expansion in the baseline system.

4.2.2 Performance Evaluation Using the nAP Measure. The non-interpolated average precision (nAP) [Schapire et al. 1998] was also adopted for comparing performances. We define the difference measure

nAP_{A/B} = nAP_A − nAP_B.        (15)


Fig. 7. The nAP measure (nAP: ontology/synonym) for comparing key concept expansion via ontology and synonym, as a function of the number of key concepts in the query.

A value of nAP_{A/B} equal to 0 indicates that both methods have equivalent performance; a positive value indicates better performance by method A, while a negative value indicates better performance by method B. Using this evaluation method, key concept expansion via the ontology performs better than expansion via synonyms; the result is shown in Figure 7. According to our investigation, expansion via the ontology achieved better performance when the number of key concepts was smaller than 5, while synonym pruning [Turcato et al. 2000] eliminated the problem of unnecessary expansion when the number of key concepts was larger than 8. Comparing vector space representation with the key concept approach shows that the key concept approach performs better when the number of key concepts per query is smaller than 6; otherwise the vector space representation is better, as shown in Figure 8. This is because vector space representation using LSA provides latent semantic inference when the key concepts do not exactly match the words in the answers.

Fig. 8. The nAP measure (nAP: key concept/vector space) for comparing the key concept and vector space representation as a function of the number of terms in a query.


5. CONCLUSIONS

This article presents a probabilistic mixture model using independent aspects for medical domain FAQ retrieval. First, data analysis is performed to categorize the QA pairs into ten question types and four answer types. In FAQ retrieval using the proposed model, the input query and the QA pairs are interpreted by a set of independent aspects, and a probabilistic mixture model is adopted to model FAQ retrieval. Based on this mixture model, the retrieval process can be considered a maximum likelihood estimation problem, and the EM algorithm is employed to optimize the mixing weights in the mixture model. In the evaluation on the test database, the approach using question stems achieved promising results for the Boolean and Set type answers. Rather than keywords, the concepts in the ontology proved important for term expansion, using synonym pruning and expansion from relations for the specific domain. Folding-in answers can be effectively retrieved by the LSA approach, especially for long description answers. Additionally, relation rules are important for a specific domain and are effective when the question and answer parts contain no common concepts. Moreover, the retrieval outputs based on topic-based paragraphs outperformed those based on the whole answer. Finally, the experimental results demonstrate that the mixture model can effectively improve the performance of the medical domain FAQ retrieval system compared to the baseline FAQ-Finder system.

APPENDIX A. EXAMPLES OF TEN TYPES OF QUESTION STEMS


REFERENCES

AHMEDI, L. AND LAUSEN, G. 2002. Ontology-based querying of linked XML documents. In Proceedings of the Semantic Web Workshop at the 11th International World Wide Web Conference (WWW 2002, Hawaii). 7-11.
BAEZA-YATES, R. AND RIBEIRO-NETO, B. 1999. Modern Information Retrieval. Addison-Wesley, Reading, MA.
BERGER, A., CARUANA, R., COHN, D., FREITAG, D., AND MITTAL, V. 2000. Bridging the lexical chasm: Statistical approaches to answer-finding. In Proceedings of the ACM SIGIR Conference. ACM, New York. 192-199.
BRILL, E., DUMAIS, S., AND BANKO, M. 2002. An analysis of the AskMSR question-answering system. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing. 291-298.
BURKE, R. D., HAMMOND, K., KULYUKIN, V., LYTINEN, S. L., TOMURO, N., AND SCHOENBERG, S. 1997. Question answering from frequently-asked-question files: Experiences with the FAQ Finder system. Tech. Rep. TR-97-05, Dept. of Computer Science, University of Chicago, Chicago, IL.
CHU-CARROLL, J., PRAGER, J., WELTY, C., CZUBA, K., AND FERRUCCI, D. 2002. A multi-strategy and multi-source approach to question answering. In Proceedings of the TREC 2002 Conference. 281-288.
CLARKE, C. L. A., CORMACK, G. V., AND LYNAM, T. R. 2001. Exploiting redundancy in question answering. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 358-365.
CLARKE, C. L. A., CORMACK, G. V., KEMKES, G., LASZLO, M., LYNAM, T. R., TERRA, E. L., AND TILKER, P. L. 2002. Statistical selection of exact answers (MultiText experiments for TREC 2002). In Proceedings of the TREC 2002 Conference. 823-831.
CROWSTON, K. AND WILLIAMS, M. 1999. The effects of linking on genres of Web documents. In Proceedings of the 32nd Hawaii International Conference on System Sciences (Maui, Hawaii).
DEMPSTER, A. P., LAIRD, N. M., AND RUBIN, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. Ser. B 39, 1 (1977), 1-39.
EICHMANN, D., RUIZ, M., AND SRINIVASAN, P. 1998. Cross-language information retrieval with the UMLS metathesaurus. In Proceedings of the ACM SIGIR Conference. ACM, New York. 72-80.
HAMMOND, K., BURKE, R., MARTIN, C., AND LYTINEN, S. 1995. FAQ Finder: A case-based approach to knowledge navigation. In Working Notes of the AAAI Spring Symposium on Information Gathering from Heterogeneous Distributed Environments. AAAI.
HARABAGIU, S. AND MAIORANO, S. 1999. Finding answers in large collections of texts: Paragraph indexing + abductive inference. In Proceedings of the AAAI Fall Symposium on Question Answering Systems. 63-71.
HSIEH, J. H., WU, C. H., AND FUNG, K. A. 2003. Two-stage story segmentation and detection on broadcast news using genetic algorithm. In Proceedings of the 2003 ISCA Workshop on Multilingual Spoken Document Retrieval (MSDR2003). 55-60.
KWOK, C., ETZIONI, O., AND WELD, D. S. 2001. Scaling question answering to the Web. ACM Trans. Inf. Syst. 19, 3 (2001), 242-262.
LEE, S. AND LEE, G. G. 2003. Use of dynamic passage selection and lexico-semantic patterns for Japanese natural language question answering. IEICE Trans. Inf. Syst. E86-D, 9 (2003), 1638-1647.
LENZ, M., HÜBNER, A., AND KUNZE, M. 1998. Question answering with textual CBR. In Proceedings of the International Conference on Flexible Query Answering Systems (Denmark). 236-247.
LIN, D. AND PANTEL, P. 2001. Discovery of inference rules for question-answering. Natural Language Engineering 7, 4 (2001), 343-378.
MANNING, C. D. AND SCHÜTZE, H. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA. 554-566.
MOLDOVAN, D., PASCA, M., HARABAGIU, S., AND SURDEANU, M. 2003. Performance issues and error analysis in an open-domain question answering system. ACM Trans. Inf. Syst. 21, 2 (2003), 133-154.
NA, S. H., KANG, I. S., LEE, S. Y., AND LEE, J. H. 2002. Question answering approach using a WordNet-based answer type taxonomy. In Proceedings of the TREC 2002 Conference. 512-519.
PARANJPE, D., RAMAKRISHNAN, G., AND SRINIVASAN, S. 2003. Passage scoring for question answering via Bayesian inference on lexical relations. In Proceedings of the TREC 2003 Conference.
RADEV, D. R., QI, H., ZHENG, Z., BLAIR-GOLDENSOHN, S., ZHANG, Z., FAN, W., AND PRAGER, J. 2001. Mining the Web for answers to natural language questions. In Proceedings of the Tenth International Conference on Information and Knowledge Management. 143-150.
SCHAPIRE, R., SINGER, Y., AND SINGHAL, A. 1998. Boosting and Rocchio applied to text filtering. In Proceedings of SIGIR-98, the 21st ACM International Conference on Research and Development in Information Retrieval. ACM, New York.


SNEIDERS, E. 1999. Automated FAQ answering: Continued experience with shallow language understanding. In Proceedings of the 1999 AAAI Fall Symposium on Question Answering Systems.
SNEIDERS, E. 2002. Automated question answering using question templates that cover the conceptual model of the database. In Natural Language Processing and Information Systems, Proceedings of the NLDB 2002 Conference (Stockholm). LNCS 2553, Springer, New York. 235-239.
TOMURO, N. 2002. Question terminology and representation for question type classification. In Proceedings of the 2nd International Workshop on Computational Terminology (COMPUTERM 2002).
TONG, R., QUACKENBUSH, J., AND SNUFFIN, M. 2003. Knowledge-based access to the bio-medical literature: Ontologically-grounded experiments for the TREC 2003 genomics track. In Proceedings of the TREC 2003 Conference.
TURCATO, D., POPOWICH, F., TOOLE, J., FASS, D., NICHOLSON, D., AND TISHER, G. 2000. Adapting a synonym database to specific domains. In Proceedings of the ACL 2000 Workshop on Information Retrieval and Natural Language Processing (Hong Kong, Oct. 2000).
WANG, H. L., WU, S. H., WANG, I. C., SUNG, C. L., HSU, W. L., AND SHIH, W. K. 2000. Semantic search on Internet tabular information extraction for answering queries. In Proceedings of the Ninth International Conference on Information and Knowledge Management. 243-249.
WHITEHEAD, S. D. 1995. Auto-FAQ: An experiment in cyberspace leveraging. Computer Networks and ISDN Systems 28, 1/2 (1995), 137-146.
YEH, J. F., WU, C. H., CHEN, M. J., AND YU, L. C. 2004. Automated alignment and extraction of bilingual ontology for cross-language domain-specific applications. In Proceedings of the COLING 2004 Conference.

Received July 2004; revised November 2004; accepted December 2004
