Learning Feature Taxonomies for Case Indexing

Kalyan Moy Gupta1, David W. Aha2, and Philip Moore1

1 ITT Industries, AES Division, Alexandria, VA 22303
2 Navy Center for Applied Research in Artificial Intelligence, Naval Research Laboratory (Code 5515), Washington, DC 20375

Abstract. Taxonomic case retrieval systems significantly outperform standard conversational case retrieval systems. However, their feature taxonomies, which are the principal reason for their superior performance, must be manually developed. This is a laborious and error-prone process. In an earlier paper, we proposed a framework for automatically acquiring features and organizing them into taxonomies to reduce the taxonomy acquisition effort. In this paper, we focus on the second part of this framework: automated feature organization. We introduce TAXIND, an algorithm for inducing taxonomies from a given set of features; it implements a step in our FACIT framework for knowledge extraction. TAXIND builds taxonomies using a novel bottom-up procedure that operates on a matrix of asymmetric similarity values. We introduce measures for evaluating taxonomy induction performance and use them to evaluate TAXIND's learning performance on two case bases. We investigate both a knowledge poor and a knowledge rich variant of TAXIND. While both outperform a baseline approach that does not induce taxonomies, there is no significant performance difference between the TAXIND variants. Finally, we discuss how a more comprehensive feature representation should improve TAXIND's results on both its learning and performance tasks.

1 Introduction

Retrieval of text documents can be significantly improved by semantically indexing them with their concepts and relations rather than using the keyword and term indexing technique that is standard in information retrieval systems. Semantic indices can be used to guide the user during query formulation and to effectively retrieve documents at a conceptual level, which should increase user satisfaction. Conversational case-based reasoning (CCBR) (Aha et al., 2001) is a suitable methodology for conceptual retrieval of text documents. CCBR is a case-based reasoning (CBR) methodology (Aamodt & Plaza, 1994; Watson, 1999) in which a user engages in a question-answer dialog (i.e., a conversation) with the system to incrementally specify a query. In response, the system displays ranked solutions with increasing precision. The CCBR methodology has been used to develop hundreds of customer support and equipment troubleshooting applications (Watson, 1997). In these diagnosis tasks the CCBR system identifies and retrieves information that could be used to solve the problem described by the user's query. This retrieved information could be in the form of text documents. Thus, indices must be assigned to these documents so that users can efficiently retrieve them through an incremental querying process.



Creating semantic indices for text documents can be a laborious and time-consuming task. For example, this is true for the Taxonomic CCBR methodology, which outperforms other CCBR approaches (Gupta et al., 2002). The taxonomic method requires constructing a set of feature taxonomies for indexing the documents. Although this distributed indexing approach imposes strict constraints for guiding taxonomy development (e.g., each feature in a taxonomy must be a semantic refinement of its parent, and there can be at most one leaf in a taxonomy per document), constructing the taxonomies is difficult because there is a large space of possible taxonomy sets to choose from. Currently, constructing taxonomies is a manual process; a knowledge engineer must create a set and validate that it yields good retrieval performance. Clearly, the process of taxonomy generation could be accelerated and improved by automating it.

We introduce and describe an initial empirical analysis for TAXIND (TAXonomy INDuction), an automated approach for organizing extracted features into a set of taxonomies. This builds on our earlier research (Gupta & Aha, 2004), in which we introduced a semi-automated framework (named FACIT) to ease the task of constructing taxonomies for Taxonomic CCBR applications. TAXIND focuses on FACIT's feature organization subtask. We report results showing that, for a pair of case bases, TAXIND outperforms a baseline strategy on the learning task (i.e., taxonomy induction), but variants of TAXIND that exploit semantic knowledge do not significantly enhance its performance. This motivates our future investigation of a semantically richer approach for feature organization.

The rest of this paper focuses on TAXIND's learning task. Section 2 describes the context for this problem, explaining how it differs from previous research. Sections 3 and 4 introduce TAXIND and present its empirical evaluation. Finally, we discuss the results and our future plans for FACIT in Section 5.

2 Problem and Related Work

2.1 Taxonomic CCBR and the Feature Organization Task

We address the problem of organizing features to index documents for a Taxonomic CCBR methodology (Figure 1), which implements a mixed-initiative case retrieval process (Gupta, 2001). Table 1 summarizes our notation. In this methodology, users incrementally provide a query Q. We assume that Q is composed of a set of features, each of which is a <question, answer> pair qai, although there is no theoretical constraint on the representation of CCBR queries and a structural representation could instead be used (Bergmann, 2002).

Table 1. Notation used in Section 2.1.

Notation    Meaning
c           Case
cp and cs   Problem and solution components of case c
L           Case library
Q           Query
QA          Set of features (question-answer pairs)
qai         ith feature (question-answer pair string) in QA
T           Set of induced taxonomies
Tk          kth taxonomy in T
ti          Node i in a taxonomy

[Figure 1 depicts the interaction loop between the user (1: describe problem; 2: answer questions; 3: suitable solution available?; 4: view case and apply solution; 4': trigger case acquisition) and the Taxonomic CCBR system (search, select and rank questions, match, rank and filter, acquire new case), which operates over the feature taxonomies and the case base.]

Fig. 1. The Taxonomic CCBR methodology.

[Figure 2 depicts nodes t1-t7 linked by is-a-type-of relationships: t1 <Printing problem?, Yes>, t2 <Print quality problem?, Yes>, t3 <Printer prints black pages?, Yes>, t4 <Printer prints blank pages?, Yes>, t5 <Quality problem looks like?, Black Streaks>, t6 <Quality problem looks like?, Faded>, and t7 <Printer prints extra blank pages?, Yes>.]

Fig. 2. Subset of a feature taxonomy for a printer troubleshooting application (Gupta, 2001).

As shown in step 1, the user describes an initial problem to the system by providing text that the system can convert to a set of feature(s). The system then compares Q with the problem component cp of each case c in a library L, where we assume a simple representation for cases (i.e., c = <cp, cs>). For simplicity, we assume problems are also represented as a set of features and that solutions are text documents. Using a matching function, the system ranks each case c using sim(Q, cp), the similarity of its problem to Q. It then displays the solution cs for each retrieved case, along with questions in their problems that are not answered in Q. In step 2, the user may select and answer a proffered question, which adds a <question, answer> pair to Q, or otherwise modifies Q. The system then cycles with the modified query. Alternatively, the user could decide (step 3) to select and view a displayed solution (step 4). However, if none are acceptable and the user has completed the query, they may decide to trigger case acquisition (step 4').

The primary distinguishing characteristic of Taxonomic CCBR is its reliance on a set of feature subsumption taxonomies T. Each taxonomy Tk ∈ T is an acyclic directed graph whose nodes ti are features drawn from the features of L's cases. All nodes in T except the root are related to their parent node by either an is-a-type-of or an is-a-part-of relation.


[Figure 3 depicts FACIT's processes and steps: lexicon engineering (1. update lexicon, applied to an initial semantic lexicon and domain-specific concepts), text processing of source documents such as troubleshooting manuals and fault reports (2. syntactic parsing, yielding parsed text, and 3. semantic interpretation, yielding source logical forms), knowledge extraction (4. feature extraction, 5. feature organization, yielding feature taxonomies, and 6. case index assignment, yielding case indices), and a case library used by 7. conversational case retrieval (TCRS) with the user.]

Fig. 3. The FACIT framework processes and steps, feeding into a TCRS application.

We say that a node subsumes all its descendant nodes (i.e., while we label each node using a <question, answer> pair, nodes denote classes, such as the class of printing problems, on which subsumption is well defined). Figure 2 shows a subset of the taxonomy from a CCBR application for a printer troubleshooting domain. In this figure, node t2 represents the pair <Print quality problem?, Yes>, has one parent node (t1), and has two children nodes (t5 and t6).

In this paper we address the problem of how to organize a given set of features into a set of subsumption taxonomies. Figure 3 shows this as the feature organization task (step 5) of FACIT (Feature Acquisition and Case Indexing from Text), the knowledge extraction framework we introduced in (Gupta & Aha, 2004). FACIT requires an initial semantic lexicon and a set of source documents as input. We argued that this lexicon should be generative rather than enumerative in its behavior, and should exploit a representation that extends Generative Lexicon theory (Gupta & Aha, 2003). We also argued for extracting a source logical form for each concept in the documents; it uses predicate argument structures to represent the meaning of sentences contained in the text as propositions. This representation eliminates syntactic variance in the text because sentences with different grammatical structure but the same meaning will have, or be reducible to, the same logical form. Also, we can apply predicate calculus operations to logical forms, such as those we will use to select and organize features into taxonomies.

To date, we have only partially implemented FACIT. We have developed a suite of tools for lexicon engineering and syntactic parsing, along with a preliminary implementation for semantic interpretation. Also, we have developed TCRS, an implementation of Taxonomic case retrieval (Gupta et al., 2002).


However, we have not previously developed tools for feature extraction, feature organization, and case indexing. TAXIND is FACIT's initial feature organization component (step 5 in Figure 3). Our objective in introducing and evaluating it is to gain further understanding of the feature organization process. Because the source logical forms for the extracted features do not yet exist, we will use the <question, answer> representation for them, where both components are represented by text strings. We also have a lexicon that can help identify subsumption relations for these features, although together they do not support the more powerful logical subsumption approach that we will use when FACIT is fully implemented. We discuss this further in Section 5.

The learning task we address here is the induction of a set of feature-subsumption taxonomies from a set of cases whose problems are defined by a set of features. The taxonomies are constrained; a feature can appear in only one taxonomy, and only once in that taxonomy. Also, among the features at the leaves of a hierarchy, at most one can appear in a case. See (Gupta, 2001) for more details on these constraints. The performance task that we address concerns TCRS's retrieval performance.

2.2 Related Work

TAXIND's learning task is unusual because it must induce one or more subsumption taxonomies from a given set of features, and the taxonomies have certain constraints. This is an unsupervised task: no labels are provided with these features (e.g., an associated class label). Yang and Wu (2001) investigated the induction of a set of taxonomies (i.e., a "decision forest") for their CaseAdvisor CCBR system. However, nodes in their trees are attributes rather than features, they neither discover nor use feature subsumption to organize their trees, and their case indices do not have a distributed representation. Aha et al.'s (2001) CLIRE also induces a tree for a CCBR system, but it induces only a single tree, and uses attributes rather than features at nodes. Neither approach used the semantics of features to guide taxonomy induction.

In the greater CBR and related literatures (e.g., machine learning), several algorithms exist that induce case-indexing hierarchies (e.g., Daelemans et al., 1997). However, unlike TAXIND, these algorithms typically do not consider the semantics of the features, nor do they induce trees intended to guide a mixed-initiative querying process.

FACIT relates to Textual CBR (Lenz et al., 1998) because it focuses on textual cases. TAXIND induces a distributed indexing representation, which is also a characteristic of case retrieval nets (CRNs), which Lenz and his colleagues used for several Textual CBR problems. However, TAXIND induces strict taxonomies rather than a more general semantic network, it does not use spreading activation for case retrieval, and we designed it specifically for the CCBR methodology.

Several researchers (e.g., Müller et al., 1999; Kashyap et al., 2004) have investigated methods to learn taxonomies for information retrieval tasks. Their approaches have typically involved using clustering algorithms to create intermediate nodes. In contrast, TAXIND must select and relate leaf nodes to the intermediate nodes (i.e., features), which are already given.


FACIT is a knowledge extraction framework, which differs from information extraction frameworks that assume the existence of an initial set of features and focus on the simpler task of feature assignment (i.e., determining which among a given set of features to use as indices for a case). FACIT does not make this initial assumption. We discuss other issues concerning FACIT in (Gupta & Aha, 2004).

3 The TAXIND Approach for Learning Indexing Taxonomies

In this section we present TAXIND. It organizes a set of features QA, each of which is a <question, answer> pair, into a set of taxonomies T as follows:

(1) Feature pre-processing: This step prepares each qai ∈ QA for similarity computations using either a bag-of-words or a phrasal tokenization process.
(2) Similarity matrix computation: This step computes the similarity of a feature qai to all other features qaj ∈ QA (i ≠ j) and generates a feature similarity matrix. When the features have been phrasally processed, this step can use application-specific ontologies to improve similarity computation.
(3) Taxonomy induction: This step identifies potential subsumption relations among features in QA using the feature similarity matrix and a two-step regularization subprocess that induces a strict taxonomy.

We present these steps below using a standard printer troubleshooting application.

3.1 Feature Pre-processing

This step converts each feature into its signature, S, which includes a list of signature elements for use in computing the similarity between any two features. Each signature element consists of a token t, the frequency f with which the token occurs in the feature, and its weight w (see Section 3.2). TAXIND creates signature element tokens via either bag-of-words processing, which creates a signature element for each word in a feature, or phrasal processing, which parses each feature into a signature whose elements contain part-of-speech tagged phrases as tokens. With the assistance of an application-specific ontology, phrasal tokens can be used to compute the similarity between two terms that cannot otherwise be related; for example, between the two distinct tokens "jam [n]" and "problem [n]". The following paragraphs provide additional detail on these two processing methods.

Bag-of-words processing: This approach converts a feature into its signature with words as tokens. No stemming is performed. A list of predefined stopwords is used to eliminate words deemed useless for similarity assessment. For example, assuming the stopwords are "are", "you", and "?", the feature <Are you having print quality problems?, Yes> would have five signature elements with the word-tokens "having", "print", "quality", "problems", and "Yes", each with a frequency of one. Their weights are assessed during similarity matrix computation.
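To make the bag-of-words option concrete, the following minimal Python sketch builds a signature from a <question, answer> feature. The data structures, function names, and the stopword list are illustrative assumptions of this sketch (the TAXIND implementation described in Section 4.1 is in Java); weights are deliberately left unset because they are only assigned during similarity matrix computation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional

# Illustrative stopword list; the paper assumes a predefined, application-specific list.
STOPWORDS = {"are", "you", "is", "a", "the", "do", "?"}

@dataclass
class SignatureElement:
    token: str                       # word (bag-of-words) or tagged phrase (phrasal)
    frequency: int                   # occurrences of the token in the feature
    weight: Optional[float] = None   # assigned later, during similarity matrix computation

@dataclass
class Signature:
    elements: Dict[str, SignatureElement] = field(default_factory=dict)

def bag_of_words_signature(question: str, answer: str) -> Signature:
    """Convert a <question, answer> feature into a signature of word tokens."""
    words = (question + " " + answer).replace("?", " ? ").split()
    counts = Counter(w for w in words if w.lower() not in STOPWORDS)
    return Signature({w: SignatureElement(w, f) for w, f in counts.items()})

if __name__ == "__main__":
    sig = bag_of_words_signature("Are you having print quality problems?", "Yes")
    for element in sig.elements.values():
        print(element.token, element.frequency)   # having, print, quality, problems, Yes
```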


Phrasal processing: This approach tags each word in a given feature with its part of speech and creates phrases to enable syntactic phrasal subsumption detection and similarity computation with an application-specific ontology (see Section 3.2). It then removes pronouns, prepositions, and linking verbs, which has an effect similar to using the stopwords in the bag-of-words approach; unlike the stopword approach, however, this procedure does not require any additional knowledge engineering effort. Simple inflectional morphological processing is also performed. For example, verb forms are reduced to their base forms (e.g., "having [v]" is transformed to "have [v]") and plural nouns are reduced to their singular forms (e.g., "problems [n]" is transformed to "problem [n]"). Table 2 shows the signature elements with phrasal tokens generated by processing the example feature <Are you having print quality problems?, Yes>.

Table 2. Phrase-processed signature for the feature <Are you having print quality problems?, Yes>.

Token (t) - Phrase             Frequency (f)   Weight (w)
have [v] *                     1               undefined
print [n]                      1               undefined
quality [n]                    1               undefined
print quality problem [n] **   1               undefined
Yes [d]                        1               undefined

* "having" was stemmed to "have"
** noun phrase generated and "problems" stemmed to "problem"
Part-of-speech tags: [v] verb, [n] noun, [d] adverb
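The phrasal option depends on part-of-speech tagging and light morphology; the authors used a Java implementation of Brill's (1992) tagger for this. The Python sketch below only illustrates the shape of the processing, with a tiny hard-coded lexicon standing in for the tagger and stemmer, so its output approximates rather than reproduces Table 2 (for instance, it keeps the individual noun tokens alongside the generated noun phrase).

```python
import re
from typing import Dict, List, Tuple

# Tiny illustrative lexicon standing in for a real part-of-speech tagger;
# tags follow the paper's convention: [v] verb, [n] noun, [d] adverb.
TOY_TAGS = {"having": "v", "print": "n", "quality": "n", "problems": "n", "yes": "d"}
FUNCTION_WORDS = {"are", "you", "is", "the", "a", "do"}   # pronouns, linking verbs, etc.
BASE_FORMS = {"having": "have", "problems": "problem"}    # toy inflectional morphology

def phrasal_signature(question: str, answer: str) -> Dict[str, int]:
    """Sketch of phrasal pre-processing: tag, drop function words, stem, build a noun phrase."""
    words = re.findall(r"[A-Za-z]+", question) + re.findall(r"[A-Za-z]+", answer)
    tagged: List[Tuple[str, str]] = []
    for w in words:
        lw = w.lower()
        if lw in FUNCTION_WORDS:
            continue                          # similar effect to stopword removal
        base = BASE_FORMS.get(lw, lw)         # reduce to base/singular form
        tagged.append((base, TOY_TAGS.get(lw, "n")))
    tokens = [f"{w} [{t}]" for w, t in tagged]
    # Generate one noun-phrase token by concatenating the noun tokens; a real
    # implementation would derive phrases from the syntactic parse instead.
    nouns = [w for w, t in tagged if t == "n"]
    if len(nouns) > 1:
        tokens.append(" ".join(nouns) + " [n]")
    return {tok: tokens.count(tok) for tok in tokens}

if __name__ == "__main__":
    print(phrasal_signature("Are you having print quality problems?", "Yes"))
```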

3.2 Feature Similarity Matrix

To organize question-answer features into taxonomies we need to identify subsumption (is-a-type-of) relations between them. Subsumption is a directed relation: for any two distinct features qa1 and qa2, at most one subsumes the other (i.e., subsumes(qa1, qa2), subsumes(qa2, qa1), or neither is true) and, by definition, if either feature subsumes the other then the truth values of subsumes(qa1, qa2) and subsumes(qa2, qa1) must differ. If we define similarity as subsumption (i.e., yielding "1" for an ordered pair of features, and "0" otherwise), and sim(qa1, qa2) denotes the similarity of qa1 with respect to qa2, then sim(qa1, qa2) ≠ sim(qa2, qa1) when either similarity value is non-zero. Thus, sim() is an asymmetric similarity function. We compute it using the features' signatures, and define it to yield values in [-1, 1].

Signature Similarity Computation: We define the similarity of two signatures sim(Si, Sj) as a weighted sum of their token similarities, as shown in Equation 1:

sim(S_i, S_j) = \frac{\sum_{\forall t_i \in S_i} sim_t(t_i, S_j) \cdot w_{t_i}}{\sum_{\forall t \in S_j} w_t}    (1)

where Si and Sj are the signatures of features qai and qaj, respectively, simt() defines the similarity between a token and a signature (see below), and wt is the weight associated with token t. Token weights are computed using an adaptation of the term frequency (tf) and inverse document frequency (idf) weight computation procedure used in information retrieval systems, as shown in Equation 2:

w_t = tf \cdot \log(N / df)    (2)

where N is the number of features used in a given case library, tf is the frequency of token t among them, and df is the number of cases that contain t.


Token Similarity Computation: The token similarity function simt() (Equation 3) compares a token from one signature to all the tokens in the other feature's signature and returns the first non-zero similarity value found:

sim_t(t_k, t_l) = \begin{cases} 1.0 & \text{if } String(t_k) = String(t_l) \\ phrasalSim(t_k, t_l) & \text{else if } phrasalSim(t_k, t_l) > 0 \\ ontoSim(t_k, t_l) & \text{else if } ontoSim(t_k, t_l) \neq 0 \\ 0 & \text{otherwise} \end{cases}    (3)

simt() returns values in [-1, 1]. If defined using only the first and final lines of this equation, simt() would be a string-comparison function that yields 1 for identical strings and 0 otherwise. However, such a function would be inadequate for tokens that do not share any words but are semantically related. Therefore, we introduce two additional types of similarity computations:

(a) Syntactic phrasal similarity: phrasalSim() computes the similarity between multiword phrases that have the same last word. For example, "problem" is the last word in "print quality problem". This computation can be used to identify subsumption relations between phrases. It computes the similarity of a phrase-token with respect to another as the ratio of the number of words they have in common to the number of words in the other phrase:

phrasalSim(t_i, t_j) = \frac{\sum_{w \in t_i \cap t_j} f(w, t_j)}{\sum_{w \in t_j} f(w, t_j)}    (4)

where f(w, tj) is the number of times word w occurs in token tj. This asymmetric function can be applied only when phrasal pre-processing has been performed. For example, phrasalSim("Problem [n]", "Print quality problem [n]") = 1/3 = 0.33 and phrasalSim("Print quality problem [n]", "Problem [n]") = 1/1 = 1.0. Clearly, "Print quality problem" is-a-type-of problem. Section 3.3 describes how we use similarity values to detect subsumption relations.

While phrasalSim() can help detect subsumption relations between phrases, it is still inadequate for semantically related phrases that do not share tokens. To compute similarity between such phrases, we rely on predefined application-specific ontologies, which might specify these relations between terms (see (b) below).


(b) Ontological similarity: Similarity computations can be extended by using an application-specific ontology. We assume the existence of simple ontologies that organize terms with is-a-type-of, is-a-part-of, and is-opposite-of relations. Linguistic ontologies (i.e., semantic lexicons) like WordNet (Fellbaum, 1998) and more sophisticated generative ontologies (Gupta & Aha, 2003) include these relations. As seen in Figure 4, we represent the relationship between "error" and "problem" using an is-a-type-of relation. Likewise, for this printer troubleshooting application, "printing" has "quality" as a part; that is, "quality" is-a-part-of "printing". The similarity between two terms in the ontology is computed as shown in Equation 5.

[Figure 4 depicts an ontology fragment with the terms "Printer", "printing", "quality", "problem", "error", "jam", "streak", "yes", and "no" connected by is-a-type-of, is-a-part-of, and is-opposite-of relations.]

Fig. 4. A fragment from the Printer troubleshooting ontology.

ontoSim(t_i, t_j) = \begin{cases} 1 / (1 + path\_length(t_i, t_j)) & \text{if } t_j \text{ is-a-type-of } t_i \lor t_j \text{ is-a-part-of } t_i \\ 1.0 & \text{if } t_i \text{ is-a-type-of } t_j \\ 0.8 & \text{if } t_i \text{ is-a-part-of } t_j \\ -1 & \text{if } t_i \text{ is-opposite-of } t_j \\ 0 & \text{otherwise} \end{cases}    (5)

The path length from a node n to one of its descendants d is the number of links from n to d. Therefore, using the ontology fragment shown in Figure 4, ontoSim("printing [n]", "quality [n]") = 1/(1+1) = 0.5 and ontoSim("quality [n]", "printing [n]") = 1.0. Likewise, ontoSim("yes [d]", "no [d]") = -1.0.
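The following Python sketch shows one way to realize Equation 5 over a small hand-coded ontology shaped like Figure 4. The ontology encoding (a single typed parent link per term) and the treatment of chains of links are assumptions of this sketch rather than details given in the paper.

```python
from typing import Dict, Optional, Tuple

# Toy ontology fragment modelled on Figure 4: each term maps to (parent, relation-to-parent).
ONTOLOGY: Dict[str, Tuple[str, str]] = {
    "quality": ("printing", "is-a-part-of"),
    "printing": ("printer", "is-a-part-of"),
    "error": ("problem", "is-a-type-of"),
    "jam": ("error", "is-a-type-of"),
    "streak": ("error", "is-a-type-of"),
    "no": ("yes", "is-opposite-of"),
}

def path_length(descendant: str, ancestor: str) -> Optional[int]:
    """Number of links from the ancestor down to the descendant, or None if unrelated."""
    steps, node = 0, descendant
    while node in ONTOLOGY:
        node, steps = ONTOLOGY[node][0], steps + 1
        if node == ancestor:
            return steps
    return None

def onto_sim(t_i: str, t_j: str) -> float:
    """Equation 5 (sketch): ontology-based similarity between two terms."""
    if ONTOLOGY.get(t_i) == (t_j, "is-opposite-of") or ONTOLOGY.get(t_j) == (t_i, "is-opposite-of"):
        return -1.0
    down = path_length(t_j, t_i)           # t_j is a subtype or part of t_i
    if down is not None:
        return 1.0 / (1 + down)
    if path_length(t_i, t_j) is not None:  # t_i lies below t_j in the ontology
        relation = ONTOLOGY[t_i][1]
        return 1.0 if relation == "is-a-type-of" else 0.8
    return 0.0

# onto_sim("printing", "quality") == 0.5 and onto_sim("yes", "no") == -1.0, as in the text.
```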

3.3 Taxonomy Induction

Identifying Taxonomic Relations: We denote a taxonomic relation between two features qai and qaj (i ≠ j) by qai → qaj, implying that qaj is-a-subtype-of qai. We identify potential taxonomic relations between features by reference to the feature similarity matrix, using the following selection rule:

IF (sim(Si, Sj) > Π) & (sim(Sj, Si) > Π) THEN
    IF (sim(Si, Sj) > sim(Sj, Si)) THEN qaj → qai ELSE qai → qaj
ELSE IF (sim(Si, Sj) > Π) & (sim(Sj, Si) > Ψ) THEN qaj → qai
ELSE IF (sim(Sj, Si) > Π) & (sim(Si, Sj) > Ψ) THEN qai → qaj
ELSE no relation

where Π is a user-specified parent selection threshold, typically set to a value representing a high degree of similarity (e.g., 0.5), and Ψ is the user-specified child selection threshold (Ψ < Π). When a feature has more than one candidate parent, TAXIND retains only the strongest relation; for example, if sim(qa3, qa1) > sim(qa3, qa2), then it retains qa1 → qa3 and marks qa2 → qa3 for deletion. The relations marked for deletion are removed at the end of this step and the remaining taxonomic relations are transformed into taxonomies.
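To illustrate the selection rule and the single-parent regularization, here is a short Python sketch. It assumes the asymmetric similarity matrix has already been computed, represents a relation qai → qaj as the pair (i, j), and resolves multiple candidate parents as in the example above; the function and parameter names are ours.

```python
from typing import Dict, List, Tuple

def select_relations(sim: List[List[float]], parent_thr: float = 0.5,
                     child_thr: float = 0.1) -> List[Tuple[int, int]]:
    """Selection rule sketch: sim[i][j] = sim(S_i, S_j); a pair (i, j) means qa_i -> qa_j,
    i.e., qa_j is-a-subtype-of qa_i. parent_thr and child_thr play the roles of Π and Ψ."""
    relations = []
    n = len(sim)
    for i in range(n):
        for j in range(i + 1, n):
            s_ij, s_ji = sim[i][j], sim[j][i]
            if s_ij > parent_thr and s_ji > parent_thr:
                relations.append((j, i) if s_ij > s_ji else (i, j))
            elif s_ij > parent_thr and s_ji > child_thr:
                relations.append((j, i))
            elif s_ji > parent_thr and s_ij > child_thr:
                relations.append((i, j))
    return relations

def enforce_single_parent(relations: List[Tuple[int, int]],
                          sim: List[List[float]]) -> List[Tuple[int, int]]:
    """Regularization sketch: for each child, keep only the parent with the strongest relation."""
    best: Dict[int, Tuple[int, float]] = {}
    for parent, child in relations:
        strength = sim[child][parent]   # compare sim(child, parent), as in the example above
        if child not in best or strength > best[child][1]:
            best[child] = (parent, strength)
    return [(parent, child) for child, (parent, _) in best.items()]
```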

4 Evaluation

4.1 Methodology

Performance Measures: We define TAXIND's learning performance as its ability to accurately induce (i.e., generate) taxonomic relations from a space of possible relations. The relations in a taxonomy are those between its nodes and their descendants (see Figure 5). For single-node taxonomies, a null relation is counted.

Fig. 5. Example taxonomies and their relations.

For this evaluation, we assume the availability of a "gold-standard" (i.e., ideal) set of taxonomic relations that can be compared with the relations induced by TAXIND. We used a printer troubleshooting and a consumer electronics taxonomic case base. In our previous work, we manually prepared taxonomies for these case bases, which provide the gold-standard relations we need for this evaluation.


Like the Recall, Precision, and F-Measure equations used to evaluate information retrieval performance, we propose the following three measures:

Taxonomic Relations Recall (TRR) = \frac{\text{\# gold-standard relations retrieved}}{\text{\# gold-standard relations}}    (6)

Taxonomic Relations Precision (TRP) = \frac{\text{\# gold-standard relations retrieved}}{\text{\# total relations retrieved}}    (7)

Taxonomic F-Measure (TF-Measure) = \frac{2 \cdot TRR \cdot TRP}{TRR + TRP}    (8)

For example, using the gold standard and the relations shown in example 2 of Figure 5 yields TRR = 2/6 = 0.33, TRP = 2/4 = 0.5, and TF-Measure = 0.4.

An alternative to inducing taxonomies is a degenerate approach that generates only single-node taxonomies. That is, it creates one taxonomy for each feature qai ∈ QA. In the absence of alternative taxonomy generation methodologies, we will use this approach as a baseline for assessing TAXIND's performance. For example, the learning performance values for the baseline approach for example 1 in Figure 5 are TRR = 0/6 = 0, TRP = 0/4 = 0, and TF-Measure = 0.
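Equations 6-8 can be computed directly once the gold-standard and induced relation sets are available; a minimal Python sketch follows, in which a relation is represented as an (ancestor, descendant) pair of features. The representation and names are assumptions of the sketch.

```python
from typing import Set, Tuple

Relation = Tuple[str, str]   # (ancestor feature, descendant feature); null relations included

def taxonomy_scores(gold: Set[Relation], induced: Set[Relation]) -> Tuple[float, float, float]:
    """Equations 6-8: taxonomic relations recall (TRR), precision (TRP), and TF-Measure."""
    retrieved_gold = len(gold & induced)
    trr = retrieved_gold / len(gold) if gold else 0.0
    trp = retrieved_gold / len(induced) if induced else 0.0
    tf = 2 * trr * trp / (trr + trp) if (trr + trp) > 0 else 0.0
    return trr, trp, tf

# With 6 gold-standard relations and 4 induced relations, 2 of which are correct,
# this yields TRR = 0.33, TRP = 0.5, and TF-Measure = 0.4 (the Figure 5, example 2 values).
```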

Test Data: We selected two taxonomic case bases (Gupta et al., 2002) pertaining to printer troubleshooting ("Printer") and consumer electronics troubleshooting ("Electronics") as our test data sets. Their existing taxonomies were used as gold standards for our empirical evaluation of TAXIND. Printer's cases refer to 54 distinct features, while Electronics' cases refer to 121.

Test Procedure and Hypotheses: In addition to varying the dataset, we also varied the version of TAXIND that we applied. In particular, we used two versions:

1. Knowledge poor: This version used bag-of-words processing with the string similarity computation option.
2. Knowledge rich: This version used phrasal processing with string similarity, phrasal similarity, and ontological similarity.

We ran both versions of TAXIND for each application using a range of thresholds for Π and Ψ, report only the best results in Section 4.2, and evaluated two hypotheses:

(1) Both versions of TAXIND will outperform the baseline approach.
(2) The knowledge rich approach will outperform the knowledge poor approach.

TAXIND Implementation: We implemented TAXIND and used our Java implementation of Brill's (1992) Tagger for phrasal processing. We compiled a list of stopwords and domain-specific ontologies for Printer and Electronics; the ontologies were developed using our Generative Sublanguage Ontology Editor. Only is-a-type-of, is-a-part-of, and is-opposite-of relations were used from among the many possible types of relations available.

4.2 Analysis of the Learning and Performance Tasks

The learning task being addressed is TAXIND's ability to induce good taxonomies, as assessed by the measures described in Section 4.1. After discussing its learning results, we briefly summarize its results for our intended performance task: retrieval performance when using the resulting taxonomic cases in TCRS.


Fig. 6. A taxonomy for Electronics learned by the knowledge poor version of TAXIND.

To illustrate TAXIND’s capabilities, we display one of the taxonomies that the knowledge poor approach learned for Electronics in Figure 6. It has three features that do not belong to it by reference to its gold standard. Still, the knowledge poor approach is effective when the cases have been prepared using consistent terminology and phrasing by experienced case base developers, as was done for both data sets. Tables 3 and 4 display the results of the baseline and TAXIND variants for the two case bases. The knowledge poor and the knowledge rich approaches did indeed outperform the baseline approach for both datasets. For example, this is clear from the results of the TF-Measures in these tables. Therefore, our first hypothesis has some support. TAXIND performs comparatively better on Electronics than on Printer because the expected taxonomies for Electronics are deeper and more complex. Table 3. Baseline and TAXIND’s performances for the Printer case base. Option Baseline Knowledge poor1 Knowledge rich2

Generated Taxonomies Avg. Max. Single No. Depth Depth Nodes 54 0.0 0 54 43 0.20 1 40 36 0.33 1 27

TRR

TRP

TF

0.50 0.50 0.48

0.50 0.57 0.58

0.50 0.53 0.53

1: Π=0.30, Ψ= 0.08; 2: Π=0.45, Ψ= 0.18

Table 4. Baseline and TAXIND’s performances for the Electronics case base. Option Baseline Knowledge poor1 Knowledge rich2

No. 121 73 96

Generated Taxonomies Avg. Max. Single Depth Depth Nodes 0.0 0 121 0.52 3 62 0.29 3 91

TRR

TRP

TF

0.43 0.51 0.52

0.59 0.69 0.70

0.50 0.58 0.60

1: Π=0.63, Ψ= 0.15; 2: Π=0.65, Ψ= 0.23

There was little difference in learning performance between the knowledge poor and knowledge rich versions of TAXIND on both data sets.


The knowledge rich version does marginally better in TRP (0.58 vs. 0.57 on Printer and 0.70 vs. 0.69 on Electronics). This small improvement in TRP is offset by a decrease in TRR in the case of Printer (0.48 vs. 0.50). This result leads us to reject our second hypothesis, that the knowledge rich approach will outperform the knowledge poor approach on the measures we are using. Adding background knowledge at a phrasal level and performing linguistic (i.e., phrasal and morphological) processing did not noticeably improve TAXIND's performance. In particular, the improved computation of token-level similarities did not appear to have a significant impact on the overall similarity computations due to their aggregation, and this marginal improvement did not have a large positive effect on TAXIND's overall performance.

We reviewed the taxonomic relations in the gold standard taxonomies with reference to the relations in the background ontologies. Because the Taxonomic CCBR methodology uses strict taxonomies (i.e., a feature can have only one parent), ambiguities can arise when assigning features to taxonomies. The knowledge engineer must often make an assessment based on the cases at hand to resolve these ambiguities and organize the features. For example, in the Printer gold standard, a feature concerning the "13 Paper Jam" message is taxonomically related to an error message feature rather than another candidate parent because "13 Paper Jam" is a type of "Error Message". In other cases, complex negations, which are difficult to detect using only phrasal processing, were missed; for example, one gold-standard feature is a subtype of another only by virtue of such a negation.

In addition to assessing TAXIND's comparative abilities on the learning task (i.e., taxonomy induction), we also briefly examined its capabilities on the performance task (i.e., retrieval using TCRS). In our previous work (Gupta et al., 2002), we used the leave-one-in evaluation methodology, which was designed for data sets in which each case's solution is unique (Aha et al., 2001). For each "target" case, the simulated user submitted, for each of its conditions, one query for its node and for each of that condition's ancestor nodes in the same taxonomy. For each query, a simulated TCRS conversation iterated until the target case attained a similarity value of 100% and all its questions were answered. The simulated user answered exactly one question per iteration by selecting the highest-ranking question that is answered in the target case. For each iteration, we recorded the rank of the target case, the number of retrieved cases, the rank of the answered question, and the total number of questions presented. TCRS retrieved all cases with similarities greater than zero and presented all eligible questions selected by the methodologies.

Table 5 shows the results obtained from running three of the four approaches tested on the Printer case base. That is, we report results for the baseline (i.e., degenerate), gold-standard, and TAXIND's knowledge poor approach. Although we already had the results for the baseline and gold-standard taxonomies, this required creating a taxonomic case base of the printer application using the taxonomies induced by the knowledge poor approach. Therefore, we re-indexed the original set of standard CCBR printer cases on TAXIND's induced taxonomy. The induced taxonomy contained some errors, including incorrect and missing taxonomic relations, so some of the cases could not be fully indexed. Nonetheless, we expected the end-user performance of this partially correct case base to be better than the baseline CCBR case base and poorer than the gold-standard taxonomies.


Table 5. Comparison of expert end user TCRS performance on Printer with taxonomies induced by TAXIND's knowledge poor approach.

                           ------------- Expert End User Performance -------------
Measures                   Baseline (Degenerate)   Gold-standard Taxonomic   Knowledge Poor Taxonomic
Rank of Retrieved Case     2.03                    1.59                      1.38
No. of Retrieved Cases     6.02                    2.85                      3.45
Length of Conversation     5.20                    2.67                      3.22
Rank of Ans. Question      1.10                    1.00                      1.10
No. of Ques. Displayed     7.30                    3.62                      2.82

The results shown in Table 5 confirm our expectation. For example, the mean numbers of retrieved cases for the Baseline, Gold-standard Taxonomic, and Knowledge Poor Taxonomic approaches are 6.02, 2.85, and 3.45, respectively. Thus, TAXIND retrieved fewer cases on average than did the baseline approach, but did not perform quite as well as TCRS with the gold-standard taxonomies. Likewise, the results for conversation length and rank of answered questions were as expected. However, the average rank of the retrieved case and the number of questions presented suggest that TAXIND's taxonomies outperformed the gold standard. These are anomalous results; they are more positive for TAXIND than expected due to the case indexing errors described above. In future comparisons, a human may need to be in the loop to fix the errors in the set of induced taxonomies prior to their evaluation with TCRS.

5 Discussion

Creating case bases and gold standard taxonomies for the types of analysis we described is a manpower-intensive effort. This limited the extent of experimentation discussed here. We plan to create gold-standard taxonomies for additional case bases in our future work, thus permitting a more comprehensive evaluation of TAXIND.

TAXIND induces taxonomies that must adhere to the constraints imposed by the Taxonomic CCBR methodology. For example, these are feature subsumption taxonomies rather than decision trees, a feature can be in only one taxonomy, and there can be at most one leaf per case in each taxonomy. Thus, it is difficult to compare TAXIND's learning approach with existing approaches that induce taxonomies that do not abide by these constraints.

The logical form representation that we intend to use to represent features will be derived using a generative ontology approach that we described in (Gupta & Aha, 2003). This representation captures the meaning of features more accurately, and is able to support negations and more complex semantic equivalences that are not recognized by TAXIND's representation for features as described in this paper. Using this more powerful representation should yield a taxonomy induction algorithm for FACIT with dramatically improved performance. Finally, we will analyze TAXIND's computational complexity, and will empirically compare its performance vs. the logical form approach described above.


6 Conclusion

Manually generating taxonomies for conversational CBR applications is laborious. We described TAXIND (TAXonomy INDuction), a machine learning approach for inducing these taxonomies. We evaluated two versions of TAXIND on two case bases against a baseline strategy that did not induce any taxonomies. To do this, we compared their abilities to induce the taxonomic relations found in the gold-standard set of taxonomies that we had previously developed manually for both case bases. We found that TAXIND outperformed the baseline strategy on both our learning (i.e., taxonomy induction) and performance (i.e., case retrieval) tasks. In our future work, we intend to conduct a more thorough evaluation of TAXIND and investigate an alternative approach that exploits a logical form representation for features.

Acknowledgements

This research was supported by the Naval Research Laboratory. Thanks to Kurt Fenstermacher and our reviewers for comments on an earlier version of this paper.

References

Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7, 39-59.
Aha, D.W., Breslow, L.A., & Munoz-Avila, H. (2001). Conversational case-based reasoning. Applied Intelligence, 14(1), 9-32.
Bergmann, R. (2002). Experience management: Foundations, development methodology, and Internet-based applications. Berlin: Springer.
Brill, E. (1992). A simple rule-based part-of-speech tagger. Proceedings of the Third Conference on Applied Natural Language Processing. Trento, Italy: ACL.
Daelemans, W., van den Bosch, A., & Weijters, T. (1997). IGTree: Using trees for compression and classification in lazy learning algorithms. Artificial Intelligence Review, 11, 407-423.
Fellbaum, C. (Ed.) (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT Press.
Gupta, K.M. (2001). Taxonomic case-based reasoning. Proceedings of the Fourth International Conference on CBR (pp. 219-233). Vancouver, Canada: Springer.
Gupta, K.M., & Aha, D.W. (2003). Nominal concept representation in sublanguage ontologies. Proceedings of the Second International Workshop on Generative Approaches to the Lexicon (pp. 53-62). Geneva, Switzerland: Univ. of Geneva.
Gupta, K.M., & Aha, D.W. (2004). Acquiring case indexing taxonomies from text. Proceedings of the Sixteenth International Conference of the Florida Artificial Intelligence Research Society. Miami Beach, FL: AAAI Press.
Gupta, K.M., Aha, D.W., & Sandhu, N. (2002). Exploiting taxonomic and causal relations in conversational case retrieval. Proceedings of the Sixth European Conference on CBR (pp. 133-147). Aberdeen, Scotland: Springer.
Kashyap, V., Ramakrishnan, C., Thomas, C., Bassu, D., Rindflesch, T.C., & Sheth, A. (2004). TaxaMiner: An experimentation framework for automated taxonomy bootstrapping. Unpublished manuscript.
Lenz, M., Hubner, A., & Kunze, M. (1998). Textual CBR. In M. Lenz, B. Bartsch-Sporl, H.-D. Burkhard, & S. Wess (Eds.), Case-based reasoning technology: From foundations to applications. Berlin: Springer.


Müller, A., Dörre, J., Gerstl, P., & Seiffert, R. (1999). The TaxGen framework: Automating the generation of a taxonomy for a large document collection. Proceedings of the 32nd Hawaii International Conference on System Sciences (pp. 20-34). Maui, Hawaii: IEEE Press.
Watson, I. (1997). Applying case-based reasoning: Techniques for enterprise systems. San Francisco, CA: Morgan Kaufmann.
Watson, I. (1999). CBR is a methodology not a technology. Knowledge-Based Systems, 12(5-6), 303-308.
Yang, Q., & Wu, J. (2001). Enhancing the effectiveness of interactive case-based reasoning with clustering and decision forests. Applied Intelligence, 14(1), 49-64.
