IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 27, NO. 5, SEPTEMBER 1997


Empirical Evaluation of Retrieval in Case-Based Reasoning Systems Using Modified Cosine Matching Function

Kalyan Moy Gupta and Ali R. Montazemi

Abstract—Case-based reasoning (CBR) supports ill-structured decision making by retrieving previous cases that are useful toward the solution of a new decision problem. The usefulness of previous cases is determined by assessing the similarity of a new case with the previous cases. In this paper, we present a modified form of the cosine matching function that makes it possible to contrast the two cases being matched and to include differences in the importance of features in the new case and the importance of features in the previous case. Our empirical evaluation of a CBR application to a diagnosis and repair task in an electromechanical domain shows that the proposed modified cosine matching function has a superior retrieval performance when compared to the performance of nearest-neighbor and the Tversky’s contrast matching functions.

I. INTRODUCTION

APPLICATION of computer-based technology depends on the nature of a decision problem, which can be categorized as either structured or ill-structured. Structured decision problems are repetitive, well-defined, and programmable. Ill-structured decision problems are ad hoc, complex, and fuzzy; no standard solution exists for them [34]. Case-based reasoning (CBR) has opened up new vistas of computer-based support in the context of ill-structured decision problems [15], [25], [33], [38], [42]. To assist a decision maker (DM), a CBR system proceeds as follows [42], [49]. A previous case (or cases) similar to the new decision problem (new case) is retrieved; the solution of the previous case is mapped as a solution for the new case; the mapped solution is adapted to account for the differences between the new case and the previous case; and the adapted solution is then evaluated against hypothetical situations. To aid in future decision making, feedback on the success or failure of the evaluated solution is obtained from the DM. Recent CBR systems mainly retrieve previous cases to offer decision support [1], [27]. The retrieval of relevant previous cases is critical to their success [37]. Central to retrieval methodologies is the assessment of the similarity of a new case with previous cases. Similarity is assessed by means of domain-specific heuristics and matching functions. This paper proposes a modified form of the cosine matching function which has capabilities in addition to those of the nearest-neighbor and Tversky's matching functions. The proposed matching function is compared with these two other methodologies in an empirical investigation.

The paper is structured as follows. Section II provides an overview of elements of retrieval; Section III presents issues related to matching; Section IV presents the modified cosine matching function; Section V describes an application of CBR to an industrial diagnosis and repair decision problem; Section VI describes our investigation for assessing the effectiveness of the proposed methodology; and Section VII presents the results of our investigation. A conclusion and a consideration of the directions of future research are presented in Section VIII.

Manuscript received November 5, 1994; revised July 27, 1995 and October 26, 1996. This work was supported under Grant 39126 from the Natural Sciences and Engineering Research Council of Canada. K. M. Gupta was with the Michael G. DeGroote School of Business, McMaster University, Hamilton, Ont., Canada L8S 4M4. He is now with Atlantis Aerospace Corporation, Brampton, Ont., Canada L6T 5E6. A. R. Montazemi is with the Michael G. DeGroote School of Business, McMaster University, Hamilton, Ont., Canada L8S 4M4. Publisher Item Identifier S 1083-4427(97)04999-0.

II. OVERVIEW OF RETRIEVAL

The aim of case-based retrieval is to retrieve the most useful previous cases toward the solution of a new decision problem and to ignore previous cases that are irrelevant [26], [29]. Retrieval in CBR proceeds as follows. Based on the description of a new decision problem (i.e., new case), the case base is searched for previous cases of potential use in providing decision support to a DM (see Fig. 1). Typically, the search is underconstrained and a large number of previous cases are retrieved [3]. In many domains, the filtering of previous cases based on exclusion criteria is possible [46]. This process involves comparison and filtering [4]. Previous cases that remain after filtering are matched and ranked in order of decreasing degree of match. Matching assesses the degree of similarity of a potentially useful previous case with a new case.

III. MATCHING

A case can be considered a schema comprising a set of attribute-value pairs (i.e., descriptors) [14], [24]. For example, in a credit assessment decision scenario, a loan manager assesses several attribute-value pairs (e.g., the attribute "character of the applicant" has a value of "average"). Matching establishes the similarity of the schema of a new case with the schema of a previous case (e.g., [35]). Matching is performed in two steps:
1) similarity of the schemata of a new case with a previous case along the descriptors is assessed; and

1083–4427/97$10.00  1997 IEEE


Fig. 1. Components of retrieval in CBR.

2) overall similarity of the schemata is assessed by means of a matching function.
Similarity of the schemata of two cases along descriptors has been assessed using domain knowledge in the form of heuristics and domain-specific matching rules (e.g., see [20], [39], and [50]). For example, a matching rule can be adopted to determine that the descriptor "color of the object" with a value orange is very similar to a descriptor with a value red. The overall similarity of a new case with a previous case is assessed by aggregating similarity along descriptors by use of a matching function. Two matching functions presented in the CBR literature are 1) the nearest-neighbor matching function and 2) Tversky's contrast function. We provide the pros and cons of each.

A. Nearest-Neighbor Matching

The nearest-neighbor matching function [11] is widely used in CBR systems [7], [10], [21], [24]. The function is

\mathrm{NN} = \frac{\sum_{i=1}^{n} w_i \, s_i}{\sum_{i=1}^{n} w_i} \qquad (1)

where s_i is the similarity of the new case with a previous case along the ith descriptor pair, and w_i is the weight of the ith descriptor. The nearest-neighbor matching function assesses the overall similarity by a weighted linear combination of similarities along descriptors. The weights provide a surrogate method of representing the complex interrelationships in the decision domain; they represent the degree of importance of a descriptor toward the goal of a decision problem. The degree of importance (i.e., domain knowledge) can be used at two levels of granularity [24].
1) Global, when the degree of importance of a descriptor does not vary with the previous cases. For example, in the decision domain of military strategy, the ratio of the strength of the opposing sides engaged in a battle is always weighted higher than their individual strengths [16].
2) Local, when the degree of importance of a descriptor varies with the goal of the previous cases. For example, in CASEY the degree of importance of a test depends on the type of the heart disease [28].
The global degree of importance is coarse, while local weights are fine-grained and context sensitive. Generally, the weights are acquired from the domain experts by a knowledge engineer or by machine learning techniques, e.g., explanation-based learning [9], [21], [39]. However, some CBR systems use rules to determine the weights during retrieval (e.g., [2] and [28]).
The nearest-neighbor matching function was derived from the literature on pattern matching. In pattern matching, all features (i.e., descriptors) of the domain, whose weights are globally determined by using an inductive learning technique, are included [11]. However, the use of the nearest-neighbor matching function in CBR gives rise to limitations when local domain knowledge is used.
1) The overall similarity of a new case with a previous case depends on descriptors that match and descriptors that do not match [2]. In ill-structured decision problems, only a subset of all possible descriptors is included in a previous case, and their weights are used locally. Therefore, incorporating the effect of unmatched descriptors can be problematic. In particular, either the descriptors of the new case and their weights, or the descriptors of the previous case and their weights, can be considered.
2) When local fine-grained knowledge is available, it is assumed that the weights associated with the new case descriptors are equal to the weights associated with the previous case descriptors. This can be justified when a new decision problem has a single goal and the weights are global; however, when decision problems have multiple goals, the weights of the descriptors can vary with those of previous cases.
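As an illustration, the weighted aggregation in (1) can be sketched as follows; the descriptor weights and the graded similarities along descriptors in this example are hypothetical, and the similarities are assumed to have already been assessed by domain heuristics.

```python
def nearest_neighbor(similarities, weights):
    """Nearest-neighbor match, as in (1): a normalized, weighted
    linear combination of the graded similarities along descriptors."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    return sum(w * s for w, s in zip(weights, similarities)) / total_weight

# Hypothetical example: three descriptors with local importance weights.
sims = [1.0, 0.8, 0.0]   # graded similarity along each descriptor
wts = [5, 3, 2]          # degree-of-importance weights
print(round(nearest_neighbor(sims, wts), 2))  # 0.74
```

Note that the result depends only on the descriptors (and weights) of whichever case the function is evaluated over, which is precisely the first limitation discussed above.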
The first issue is addressed by Tversky's matching function, which is based on cognitive notions of similarity that require the use of a contrast model. A contrast model expresses similarity among objects as a combination of their common and distinctive features [51].

B. Tversky's Contrast Matching

Tversky's contrast matching function (TC) is [51]

\mathrm{TC} = \frac{|N \cap P_j|}{|N \cup P_j|} \qquad (2)

where N is the set of descriptors of a new case, and P_j is the set of descriptors of the jth previous case. The numerator of Tversky's contrast function comprises the set of descriptors that match, whereas the denominator consists of a set that includes the descriptors of the new case and the descriptors of the jth previous case, thereby computing a contrast. However, the application of Tversky's matching function in CBR has been limited (e.g., see [30]). This is because


TABLE I COSINE MATCHING FUNCTION’S ABILITY TO INCLUDE THE DIFFERENCES IN THE WEIGHTS OF THE DESCRIPTORS IN A NEW CASE AND PREVIOUS CASES

1) similarity along descriptors is assumed to be binary (i.e., either 0 or 1); and
2) TC does not include the weights associated with the descriptors.
In this paper, we develop a modified form of the cosine matching function to overcome both these limitations of the nearest-neighbor and Tversky's matching functions.

IV. MODIFIED COSINE MATCHING FUNCTION

In information retrieval systems, the cosine matching function [44] has been extensively used for matching a query with documents in a database [13], [36]. The function is represented as

\cos = \frac{\sum_{i=1}^{m} w_i^N w_i^P}{\sqrt{\sum_{i=1}^{m} (w_i^N)^2} \sqrt{\sum_{i=1}^{m} (w_i^P)^2}} \qquad (3)

where w_i^N is the weight of the ith descriptor in the weight vector of a new case, and w_i^P is the weight of the ith descriptor in the previous case. The cosine matching function determines the overall similarity of two cases by comparing the frequency of terms (i.e., the weights of descriptors in CBR) in a query (i.e., new case) and the weights of the terms in a document (i.e., previous case). The function measures the cosine of the angle between the weight vector of a new case and the weight vector of a previous case. The terms in the denominator of (3) normalize the weight vectors by determining their Euclidean length in an m-dimensional descriptor space. Two properties of the cosine matching function in the context of CBR follow.
1) The function incorporates the effect of unmatched descriptors of a new case and a previous case. This is achieved by means of the weight vector of the new case and the weight vector of the previous case. An unmatched descriptor has a weight equal to zero in either the previous case or the new case. However, the denominator of the cosine matching function includes the weights of all the unmatched descriptors, thereby computing a contrast. The Venn diagram shown in Fig. 2 illustrates the contrast property of the cosine matching function. Assume that the similarity along a descriptor is binary (i.e., either a complete mismatch [0] or a complete match [1])

and that the weights of all descriptors are equal. In Fig. 2(a), the cosine matching function distinguishes between previous cases P1 and P2, and determines that P1 is closer to the new case N. The same is determined by Tversky's contrast function. However, nearest-neighbor matching that only considers the descriptors of the new case (NN_N) cannot distinguish between previous cases P1 and P2, whereas nearest-neighbor matching that only considers the descriptors of the previous case (NN_P) is able to distinguish between P1 and P2. Consider the scenario in Fig. 2(b): the cosine matching function determines that P1 is closer than P2. The same is determined by Tversky's contrast function, whereas NN_P cannot distinguish between P1 and P2.
2) When domain knowledge is used at a local level, the descriptors of a new case are assumed to have weights equal to those of the descriptors of a previous case. Ignoring the differences in the weights of descriptors can adversely affect the retrieval of the most useful previous cases. The cosine matching function, however, includes the weights of the descriptors of the new case (w_i^N) and the weights of the descriptors of the previous case (w_i^P). Therefore, the overall similarity computed by the cosine matching function incorporates the differences between the weights of the descriptors of the new case and those of the previous case. This ability is illustrated by the following example. Consider the weight vector of a new case when matched with the weight vectors of three candidate previous cases, as shown in Table I. Assume that 1) the similarity along descriptors is binary, 2) all the previous cases match on the same set of descriptors, and 3) the importance of descriptors in the three previous cases differs. Table I indicates that the weights of the descriptors in previous case P1 are closest to those of the new case, whereas the weights of the descriptors of previous case P3 are the most different.
The cosine matching function determines that the overall similarity of P1 is higher than that of P2, and the overall similarity of P3 is less than that of P2. However, NN_N is unable to distinguish any of the three previous cases, and NN_P can distinguish only P1 from P2 and P3. Tversky's contrast matching function cannot distinguish any of the previous cases because it does not consider descriptor weights.
The above examples illustrate that the cosine matching function has more capabilities than the nearest-neighbor and Tversky's matching functions. However, the cosine matching function needs modification for use in CBR systems. Graded


Fig. 2. Venn diagram representation of the contrast property of the cosine matching function.

similarity along descriptors needs to be included. This modification is

\mathrm{MC} = \frac{\sum_{i=1}^{m} s_i \, w_i^N w_i^P}{\sqrt{\sum_{i=1}^{m} (w_i^N)^2} \sqrt{\sum_{i=1}^{m} (w_i^P)^2}} \qquad (4)

where s_i is the similarity along the ith descriptor of a new case and a previous case. The modification implies that the cosine of the angle generated by the weight vectors of the new case and the previous case is weighted by the graded similarity along the m dimensions of the descriptor space. The candidate previous cases determined by searching the case base can be matched by MC and rank ordered according to decreasing overall similarity. MC was implemented in a CBR system developed to assist service personnel at a large manufacturing and service organization to diagnose and repair electromechanical machinery.

V. DIAGNOSTIC AND REPAIR CBR SYSTEM

Our CBR application, developed to troubleshoot (i.e., diagnose) and repair alternating current (ac) motors, was named TRAAC. The decision domain of ac motors and the decision support provided by TRAAC are now described.

A. Decision Domain

AC motors have diverse applications that range from driving blower fans in mine shafts to driving pumps in sewage pumping stations. When diagnosing faults in ac motors, DMs (i.e., troubleshooters) can encounter large numbers of combinations of faults. Troubleshooters frequently base a diagnosis on incomplete evidence. This is because time constraints and the resources available at the location do not always allow all relevant tests to be performed; the inherent complexity of the ac motor is also a factor. Moreover, tests often do not uniquely identify faults. For the same reason, troubleshooters occasionally disagree with one another's diagnoses. As well, for a set of faults that has been confirmed, repair methods can vary depending on a troubleshooter's past experience. A diagnosis is considered good and complete if the symptoms disappear after the repairs are made.


Fig. 3. Processes in a diagnostic decision.

Fig. 4. Components of TRAAC.

The characteristics of a decision domain comprising a large number of possible faults and their combinations, incomplete information, uncertainty in the relationships among faults and tests, and an inability to characterize a diagnosis and repair (i.e., solution) as right or wrong make the context for the diagnosis and repair of ac motors ill-structured [47]. Experience of the application and the equipment is necessary to identify the root cause of the problem and to repair the equipment. The diagnosis and repair of ac motors is therefore a good candidate for a CBR application.

B. Decision Support

TRAAC provides decision support to troubleshooters based on a diagnostic process model of electromechanical equipment [40] (see Fig. 3). First, the abnormal behavior of a system (symptoms) is reported; next, the troubleshooter hypothesizes the faults that could produce these symptoms. If more than one hypothesis is possible, then the set of hypotheses that best explains the symptoms is selected. Tests are performed to gather evidence with a view to either confirming or rejecting the hypotheses. Confirmation of faults allows the troubleshooter to develop a repair plan to restore the system to its normal state of functioning. To support the diagnostic process, TRAAC includes three components: 1) Advisor; 2) Retriever; and 3) Analyzer (see Fig. 4) [17]. Advisor assists the troubleshooter in hypothesizing faults and performing tests (i.e., selecting the relevant descriptors) [31]. Reported symptoms, hypothesized faults, and tests performed to confirm them are the relevant descriptors of a new case. Based on this description, Retriever retrieves the most relevant previous cases from the case base and presents them to the troubleshooter in order of decreasing overall similarity. The troubleshooter then views the content of the previous cases and uses Analyzer to compare retrieved previous cases with the new case.
This enables the troubleshooter to confirm or reject the hypothesized faults, gain insight into the possible relationship between the faults, determine additional tests that

should be performed, and formulate a repair plan. The effectiveness of the decision support is dependent on the contents of the TRAAC knowledge base.
1) Knowledge Base: TRAAC's knowledge base has two components: a descriptor library and a case base. Currently, TRAAC's descriptor library consists of nine symptoms, 43 faults, and 100 tests. TRAAC's case base consists of 35 prototypical previous cases. These range from vibration problems to overheating problems. The prototypical previous cases were acquired from trip reports maintained by the organization for which TRAAC was developed. The previous cases also include descriptors and their degrees of importance, which are necessary for matching. Descriptors and their relative importance were acquired from a domain expert. Relevant symptoms, faults, tests, and their interrelationships applicable to a previous case were determined by the expert by reference to the trip reports. The number of descriptors in a previous case ranged from six to 17. In a CBR system used for diagnosis and repair, the descriptor weights can only be determined after the case has been solved. Hence, in TRAAC, the descriptors of a new case are equally weighted. Troubleshooters therefore do not specify the descriptor weights of a new case during problem solving. In previous cases, local descriptor weights are used. This is because only a subset of descriptors from the descriptor library is included in a previous case, and the descriptor weights can vary among the previous cases depending on their context. The descriptor weights in each previous case were acquired from a domain expert as follows. After the domain expert identifies the descriptors in a previous case, they are sorted into three categories: symptoms, faults, and tests. The categorization enables the domain expert to assess their role in a particular previous case (e.g., MEDIATOR [23]).
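A minimal sketch of how a TRAAC-style case might be represented follows; the field names, descriptor values, and weights here are hypothetical illustrations, not TRAAC's actual data model.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a case is a schema of attribute-value pairs
# (descriptors), each categorized as a symptom, fault, or test, and
# each carrying a degree-of-importance weight on a five-point scale.
@dataclass
class Descriptor:
    name: str
    category: str   # "symptom", "fault", or "test"
    value: str      # linguistic value, e.g., "heavy"
    weight: int     # degree of importance, 1 (least) to 5 (most)

@dataclass
class Case:
    case_id: str
    descriptors: list = field(default_factory=list)
    solution: str = ""  # repair plan recorded with the solved case

# Illustrative (not actual) content for a stored previous case.
case_00035 = Case(
    case_id="00035",
    descriptors=[
        Descriptor("vibration", "symptom", "very heavy", 1),
        Descriptor("thermal imbalance", "fault", "confirmed", 3),
        Descriptor("increase in vibration from startup to operation",
                   "test", "large", 5),
    ],
    solution="replace rotor",
)
print(len(case_00035.descriptors))  # 3
```

The categorization step described above corresponds to the `category` field, which lets the expert assign weights per role within each case.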
Next, the domain expert was provided with a five-point Likert-type scale that assigns a degree of importance to individual descriptors qualitatively [7]. The scale ranges from the least important descriptor (1) to the most important descriptor (5) (see Appendix). The expert assigns a degree of importance based on the events associated with the descriptors in the previous cases. An example of how the degree of importance is assigned is presented in Section V-B2. The decision support provided by TRAAC is further illustrated by the following example.


Fig. 5. Example of diagnosis and repair of ac motors by means of TRAAC.

2) Example: A client reported that a 450 hp ac motor in a steel mill was “vibrating” heavily and the “bearing overheating” was fairly high. On the basis of reported symptoms, Advisor recommended some faults for consideration. By using a diagnostic strategy in conjunction with the recommendations of Advisor, the troubleshooter hypothesized an occurrence of “rotor damage,” an occurrence of “thermal imbalance,” and an onset of “bearing failure.” Next, Advisor recommended relevant tests that could gather evidence toward confirmation of these faults. The troubleshooter observed a large “increase in vibration as the motor reached the operating speed” and performed a “vibration analysis” whose spectrum showed that the harmonics in the vibration were fairly high. The bearing end cover was removed and fairly heavy “rub marks on the bearing shoulder” were observed (see Fig. 5). With this description, Retriever retrieved three previous cases, 00035, 00028, and 00032. These were then presented to the troubleshooter (see Fig. 5). The troubleshooter viewed each previous case and used Analyzer to assess the similarities and the differences of the new case with the retrieved previous cases.

On the basis of similarity with previous case 00035, the troubleshooter confirmed an occurrence of "thermal imbalance." "Rotor damage" was confirmed by the harmonics in the "vibration signature." The troubleshooter concluded, based on the explanation of previous case 00035, that the "thermal imbalance" was probably a result of the "damaged rotor," which can distort due to the increase in temperature as the motor reaches its full operating load, thereby causing vibration. In addition, case 00035 indicated that further confirmation of "rotor damage" should be obtained by performing a "single phase test" and by measuring the "increase in current" drawn by the motor. The co-occurrence of "rotor damage" and "thermal imbalance" implied that the rotor damage in the new case was probably extensive (cracks were found on the rotor after disassembly of the motor in previous case 00035). Therefore, the rotor should be replaced, because repairing the damaged rotor would be both expensive and difficult. On the basis of previous case 00028, the troubleshooter concluded that the "bearing had failed." Case 00028 suggested that the "axial thrust" should be measured to detect the


Fig. 5. (Continued.) Example of diagnosis and repair of ac motors by means of TRAAC.

possibility of "shaft misalignment." The harmonic "vibration signature" suggested looseness in the rotor and that a "single phase test" should be performed. The repair of the motor in the new case should include the replacement of the bearing. Furthermore, tightening the bars on the rotor (i.e., the rotor repair in previous case 00028), although inexpensive, would probably not eliminate the vibration completely. The "vibration signature" in previous case 00032 suggested that looseness in the rotor had occurred, that a "single phase test" should be performed, and that the "increase in current" should be measured. After viewing the retrieved previous cases, the troubleshooter performed a "single phase test." This resulted in a fairly loud humming noise, thereby confirming the "damaged rotor." The "axial thrust" was measured and found to be light. On the basis of acquired experience and knowledge, the troubleshooter ruled out the possibility of shaft misalignment and concluded that the "axial thrust" was the result of a floating shaft, which, in turn, had caused the bearing to fail. The troubleshooter decided to replace the bearing and the rotor, and to realign the shaft on assembly. The new case, along with

the explanation provided by the troubleshooter, was stored for future use. Central to the decision support provided by TRAAC was the order in which the previous cases were presented. The order of presentation of previous cases was determined by the overall similarity computed by the modified cosine matching function, which ranked previous case 00035 as the most relevant and previous case 00032 as the least relevant. To compute the overall similarity of a previous case, computation of the similarity along the descriptors in the previous case and the new case is necessary, as well as application of the weights of the previous case descriptors and the new case descriptors. Table II provides the details of the new case and the three previous cases 00035, 00028, and 00032. Values were assigned to a descriptor by means of a Likert-type scale with linguistic qualifiers (see Appendix). For example, the extent of vibration in the new case was heavy (4). The importance of descriptors in the previous cases was provided by a domain expert who used the methodology presented in Section V-B1. For example, in previous case


TABLE II THE OVERALL SIMILARITY OF THREE CANDIDATE PREVIOUS CASES WITH THE NEW CASE DETERMINED BY MEANS OF MC, NN, AND TC

00035, symptoms were assigned as the least important descriptors and faults were assigned as important. All tests were assigned as important, except the test "increase in vibration from startup to operation," which was assigned as the most important descriptor because it confirmed thermal imbalance, the principal cause of vibration in previous case 00035. The similarity along a descriptor was determined heuristically (see the Appendix). For example, the similarity of the new case with previous case 00035 along the descriptor "vibration," with the new case value heavy and the previous case value very heavy, is 0.80. Table II shows the overall similarity of previous cases 00035, 00028, and 00032 as determined by the modified cosine function (MC); nearest-neighbor matching using the descriptors of the previous case and their weights (NN_P); nearest-neighbor matching using the descriptors of the new case and their weights (NN_N); and Tversky's contrast matching function (TC). The overall similarity determined by MC and NN_P indicated that previous case 00035 was more relevant than previous cases 00028 and 00032 toward solving the new case. This is favorable, because previous case 00035 gave information about "thermal imbalance" and its co-occurrence with "rotor damage," which was necessary for selecting an appropriate repair method, whereas previous case 00028 provided only information regarding bearing failure. Unlike MC, NN_N and TC indicated that previous case 00028 was more relevant than previous case 00035. The reason is that NN_N does not consider the unmatched descriptors of the previous case, and TC does not consider the domain knowledge.
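To illustrate how the four functions can rank candidate cases differently, the following sketch implements the modified cosine (4), nearest-neighbor matching (1) over either case's descriptors, and Tversky's contrast (2). The descriptor weights and similarities are hypothetical, not the actual values from Table II.

```python
import math

def mc(sim, w_new, w_prev):
    """Modified cosine (4): cosine of the two weight vectors, with
    each term weighted by the graded similarity along that descriptor."""
    num = sum(s * wn * wp for s, wn, wp in zip(sim, w_new, w_prev))
    den = (math.sqrt(sum(wn * wn for wn in w_new))
           * math.sqrt(sum(wp * wp for wp in w_prev)))
    return num / den if den else 0.0

def nn(sim, w):
    """Nearest-neighbor (1) over one case's descriptors and weights."""
    total = sum(w)
    return sum(wi * si for wi, si in zip(w, sim)) / total if total else 0.0

def tc(new_set, prev_set):
    """Tversky's contrast (2) with binary similarity: matched
    descriptors over the union of both cases' descriptors."""
    return len(new_set & prev_set) / len(new_set | prev_set)

# Hypothetical case pair over five descriptors (weight 0 = absent).
w_new = [1, 1, 1, 0, 0]              # new case: equally weighted
w_prev = [5, 3, 0, 2, 0]             # previous case: local weights
sim = [1.0, 0.8, 0.0, 0.0, 0.0]      # graded similarity per descriptor

print(round(mc(sim, w_new, w_prev), 2))
print(round(nn(sim, w_new), 2))   # NN_N: new-case weights -> 0.6
print(round(nn(sim, w_prev), 2))  # NN_P: previous-case weights -> 0.74
d_new = {"vibration", "overheating", "noise"}
d_prev = {"vibration", "overheating", "axial thrust"}
print(round(tc(d_new, d_prev), 2))  # 2 matched of 4 total -> 0.5
```

Note how MC's denominator includes the weights of unmatched descriptors of both cases (the contrast property), while each NN variant sees only one case's weights and TC sees no weights at all.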

The effectiveness of the matching functions for retrieving previous cases was empirically evaluated in the domain of diagnosis and repair of ac motors. The empirical evaluation follows.

VI. EMPIRICAL EVALUATION

A. Objective

The objective of the empirical evaluation was to compare the retrieval performance of the modified cosine matching function with those of the nearest-neighbor matching function and Tversky's contrast matching function.

B. Methodology

1) Measure of CBR Retrieval Performance: Although a number of CBR systems have been reported in the literature, only a few evaluate their performance [8]. Some CBR systems use classification accuracy as a measure of retrieval performance [39], [48], while others use recall and precision [29], [45]. Recall and precision have been adopted from the information retrieval literature (e.g., see [22], [36], and [43]). However, these measures are not suitable when multiple previous cases are retrieved and rank ordered based on degree of usefulness. Recall and precision ignore the rank ordering of retrieved previous cases [18]. Furthermore, the use of these two measures makes the comparison of alternative retrieval methodologies ambiguous. A measure of the retrieval performance of a CBR system should incorporate the following four components:
1) retrieved previous cases useful to the new case;
2) retrieved previous cases not useful to the new case;


3) useful previous cases not retrieved; and
4) agreement of the ranking produced by the CBR system with the ranking expected by the DM.
We used Kendall's tau with ties [19] to measure retrieval performance; it incorporates all the above components [18]. Kendall's tau with ties (τ) measures the agreement between judgments from two sources, each of which produces an ordinal ranking of a set of items. The two sources in this CBR retrieval evaluation are the rank ordering of previous cases determined from the usefulness ratings of retrieved previous cases provided by the DM, and the rank ordering of previous cases retrieved by the CBR system. The numbers of agreements and disagreements between the rank order determined by the DM and the rank order determined by the system are counted, and the statistical correlation between the two rank orderings is then measured.
2) Experimental Design: Three sets of comparisons were performed.
1) Comparison set 1: The retrieval performance of the matching functions was compared precluding domain knowledge, to assess the effect of contrast in matching.
2) Comparison set 2: The retrieval performance of each matching function precluding domain knowledge was compared with its retrieval performance when the domain knowledge was included. This strategy allowed comparison of the effect of domain knowledge on retrieval performance.
3) Comparison set 3: The retrieval performance of the matching functions was compared with the inclusion of domain knowledge. This enabled us to assess the combined effect of domain knowledge and contrast on retrieval performance.
Retrieval was performed by means of the MC matching function; nearest-neighbor matching with the descriptors of the previous cases and their weights (NN_P); nearest-neighbor matching with the descriptors of the new case and their weights (NN_N); and Tversky's matching function (TC). Domain knowledge comprised the weights of the descriptors provided by experts. The descriptors of a new case were equally weighted (i.e., no domain knowledge was used).
The retrieval performance of a matching function was determined for each test case by computing Kendall's tau with ties. The performance of the matching functions was compared by conducting a pairwise sign test [20]. The comparison of a retrieval matching function A with a matching function B consists of the following: A wins over B when A's tau is greater than B's, A loses to B when A's tau is less than B's, and the two tie when their taus are equal. The numbers of wins, ties, and losses were determined over all the test cases. The proportions of wins and losses represent the probabilities of success and failure of a binomial distribution. Therefore, to ascertain whether A is better than B, the null hypothesis that the proportion of wins is 0.5 was tested against the alternative hypothesis that the proportion of wins is greater than 0.5. For samples of size greater than 10, the normal approximation to the binomial can be used; on the basis of this approximation, the null hypothesis is rejected at the 95% confidence level when the z-value is greater than 1.645.
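The pairwise sign test just described can be sketched as follows (illustrative code; the win/loss counts are hypothetical).

```python
from math import sqrt

def sign_test_z(wins, losses):
    """One-sided sign test for H0: P(win) = 0.5 vs. H1: P(win) > 0.5.

    Ties are dropped before counting; the normal approximation to the
    binomial is used (reasonable for n > 10; no continuity correction).
    """
    n = wins + losses
    # Under H0, wins ~ Binomial(n, 0.5): mean n/2, std 0.5 * sqrt(n).
    return (wins - 0.5 * n) / (0.5 * sqrt(n))

# Hypothetical example: matching function A beats B on 55 of 80 test cases.
z = sign_test_z(wins=55, losses=25)
print(round(z, 3), z > 1.645)  # 3.354 True -> reject H0 at the 95% level
```
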


Consequently, it can be concluded that one retrieval methodology is superior to the other.

3) Decision Environment: TRAAC was used to assess the retrieval performance of the various matching functions in the domain of diagnosis and repair of ac motors.

4) Subjects: The organization under study has several regional services divisions across Canada and a central engineering services division in Ontario. Each division has a team of service engineers who troubleshoot a variety of electrical machinery. Problems that are not solved at the regional level are referred to Central Engineering Services. Ten troubleshooters from Central Engineering Services and Regional Field Services participated in this investigation. All subjects had a college or university degree. The troubleshooting experience of the subjects ranged from four to 30 years, with an average of ten years. All subjects were male.

5) Instruments: Two questionnaires were completed by the subjects. The first questionnaire recorded the subject's demographic information, such as position in the organization, level of education, years of experience related to ac motors, years of experience in troubleshooting ac motors, and perceived level of troubleshooting expertise. The second questionnaire measured the degree of usefulness of the retrieved previous cases in response to a test case (new case). A single-item questionnaire with a seven-point Likert-type scale [12] was used to measure the usefulness of a retrieved previous case toward the new decision problem (test case). This is based on reported findings that usefulness and relevance of retrieved information are equivalent [41].

6) Test: Eleven test cases (i.e., new cases) that were representative of problems that had occurred in the field in the past were selected for our investigation. The diagnosis and the repair solution used in these test cases were known.
The functionality and utility of TRAAC were demonstrated to the subjects by reference to a previous troubleshooting event. Each subject was then given a warm-up test case to allow familiarization with the functionality of the system; this particular test case was removed from the subsequent analysis. After the subject was comfortable with the system, one test case at a time was presented. Because learning during evaluations could affect subject response [41], test cases were assigned in a random order. Subjects described the test cases using Advisor. On average, 6.85 descriptors were used to describe a test case. Based on the test-case description provided by the subject, TRAAC retrieved a subset of the 35 previous cases and presented them to the subject in a random order. After reading the content of each retrieved previous case, the subject rated it by means of the usefulness questionnaire. When responses indicated ties in the usefulness rating, the subject was given the option of expressing a preferred order among the tied retrieved previous cases. On completing the rating, the subject wrote an analysis of the new case and suggested a repair technique. The experimental session, comprising the description of the new case, the matching process, and the retrieved previous cases, was recorded by TRAAC for later analysis. On average, it took 45 min to complete the analysis of a test case. Due to time constraints, the number of test cases assessed by the subjects



varied between four and ten. A total of 80 cases were analyzed by ten subjects.

TABLE III EFFECT OF DOMAIN KNOWLEDGE ON RETRIEVAL PERFORMANCE

VII. ANALYSIS

The solutions provided by the subjects were scored against the actual solution implemented in real settings for each test case. The average score was 87.38%. The question arises as to whether any significant differences existed among the solutions provided by the subjects. To assess this, we conducted an analysis of variance (ANOVA) of the scores. The results indicated no significant difference among the subjects. Therefore, it is reasonable to assume that the subjects were uniformly competent in assessing the usefulness of retrieved previous cases toward the solution of the test cases (i.e., the new cases).

A. Comparison Set 1

The performance of the matching functions was compared precluding domain knowledge. The analysis shows that the two contrast-based functions, the modified cosine and Tversky's matching functions, each performed better than the two nearest-neighbor variants, although not all of the differences were significant (see Table III). Overall, this shows that the new case and the previous case should be contrasted completely, by considering the unmatched descriptors of both the previous case and the new case. The comparison of the modified cosine function with Tversky's function demonstrates no significant difference between the two; the reason is that, when domain knowledge is not used, both functions contrast the two cases. The comparison of the two nearest-neighbor variants shows that matching on the descriptors of the previous case performs better, although not significantly. This implies that ignoring the unmatched descriptors of the previous case has a greater impact on the contrasting ability of nearest-neighbor matching than ignoring the unmatched descriptors of the new case.
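The ANOVA used above to check that the subjects were uniformly competent can be sketched as follows (illustrative code; the scores are hypothetical, and a complete test would compare the F statistic against the critical value of the F distribution).

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group mean square over
    within-group mean square. groups is a list of score lists, one per subject."""
    k = len(groups)                              # number of groups (subjects)
    n = sum(len(g) for g in groups)              # total number of scores
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical solution scores for two subjects with similar means:
print(one_way_anova_f([[80, 90], [85, 95]]))  # 0.5 -> far below any critical F
```
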

TABLE IV COMPARISON OF MATCHING FUNCTIONS USING DOMAIN KNOWLEDGE

TABLE V COMPARISON OF THE THREE MATCHING FUNCTIONS PRECLUDING DOMAIN KNOWLEDGE

B. Comparison Set 2

The effect of domain knowledge on the retrieval performance of a matching function is shown in Table IV. The comparisons show that the descriptor weights of the previous cases have a significant impact on the retrieval performance of the matching functions that use them. The results indicate that, in some test cases, a fall (i.e., a loss) in retrieval performance occurred when domain knowledge was used. Nonetheless, on average, the use of domain knowledge led to better retrieval performance. The weights of the previous cases provided by the expert did not result in a better rank ordering for all the subjects, because the desired rank ordering of previous cases varied among the subjects, depending on individual diagnostic strategies and knowledge. This was due to the ill-structured nature of the ac motor diagnosis and repair decision domain [5], [6].

C. Comparison Set 3

The combined effect of domain knowledge and contrast is shown in Table V. The analysis demonstrates the superiority of

compared to and (all considered only the previous case descriptors; considdid not make use of ered only new case descriptors, and domain knowledge. VIII. CONCLUSION An empirical assessment of TRAAC demonstrated the superiority of the modified cosine matching function in an illstructured diagnosis and repair environment. The ill-structured nature of the domain was a result of the following characteristics: occasionally, relevant information was missing; conclusions drawn from the available information could be conflicting (e.g., troubleshooters often disagreed with a diagnosis), the approach to diagnosis and repair varied among troubleshooters; and the repair method used by a troubleshooter could not be characterized as either right or wrong. Under


these conditions, multiple previous cases were used to solve a new case. Previous cases included domain knowledge at a local level, in the form of descriptor weights based on the context of the previous cases. The ability of the modified cosine matching function to include this domain knowledge, as well as its ability to contrast a new case with a previous case, resulted in better performance than the nearest-neighbor and Tversky's matching functions. A number of current CBR systems determine the overall similarity of a new case with a previous case based on surface features by means of the nearest-neighbor matching function. The modified cosine matching function can be used in any CBR system in which the nearest-neighbor matching function can be used. In these systems, weights are usually acquired from domain experts in advance. Our empirical evaluation showed that local weights provided by a domain expert improve retrieval performance significantly, and that, when local weights are used for matching, the modified cosine matching function performs better than the nearest-neighbor and Tversky's matching functions.

APPENDIX


C. Similarity Along a Descriptor

The similarity along a descriptor was computed using a numeric heuristic as

Similarity = 1 - |V_N - V_P| / R

where V_N is the value of the descriptor in the new case, V_P is the value of the descriptor in the previous case, and R is the range of the scale of the descriptor. For example, the descriptor "extent of bearing overheating" with the value very high (5) matched with the value medium (2) on a five-point scale (range 4) gives

Similarity = 1 - |5 - 2| / 4 = 0.25.

See [32] for a detailed discussion of the above heuristics.

ACKNOWLEDGMENT

The authors would like to thank the two anonymous reviewers for their helpful comments on the original version of this paper. They would also like to extend their appreciation to the employees of Westinghouse Canada Inc. for their help and cooperation in this research.

REFERENCES

A. Assignment of Importance of Descriptors The following five-point Likert-type scale with qualitative values was used to assign importance to the descriptors of a previous case.

B. Qualitative Values of Descriptors in the Example Qualitative values of the descriptors were assessed on fivepoint Likert-type scales. The “extent of motor vibration,” “the extent of axial thrust,” “the extent of increase in vibration from startup to operation,” and “the extent of rub marks on the bearing shoulder” were assessed as follows.

The descriptors “extent of bearing overheating,” the “extent of loudness in the single phase test” and “the extent of harmonics in the vibration signature” were assessed as follows.

The extent of “drop in the vibration when stiffeners are used” was assessed on the following scale.
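The numeric similarity heuristic of Section C of the Appendix, applied to descriptor values on scales such as those above, can be sketched as follows. This reflects our reading of the formula, with the range of a five-point scale taken as 5 - 1 = 4; see [32] for the exact heuristics.

```python
def descriptor_similarity(v_new, v_prev, scale_range):
    """Similarity along one descriptor: one minus the distance between
    the new-case and previous-case values, normalized by the scale range."""
    return 1.0 - abs(v_new - v_prev) / scale_range

# "Extent of bearing overheating": very high (5) matched with medium (2)
# on a five-point scale whose range is taken as 4.
print(descriptor_similarity(5, 2, 4))  # 0.25
```
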

[1] B. P. Allen, "Case-based reasoning: Business applications," Commun. ACM, vol. 37, pp. 40–42, Mar. 1994.
[2] K. D. Ashley, "Assessing similarity among cases: A position paper," in Proc. DARPA Workshop Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann, 1989, pp. 72–75.
[3] R. Bareiss and J. A. King, "Similarity assessment in case-based reasoning," in Proc. DARPA Workshop Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann, 1989, pp. 67–71.
[4] N. J. Belkin and W. B. Croft, "Information filtering and information retrieval: Two sides of the same coin?," Commun. ACM, vol. 35, pp. 29–38, Dec. 1992.
[5] C. Bento and E. Costa, "Retrieval of cases imperfectly described and explained: A quantitative approach," in Case-Based Reasoning: Papers from the 1993 Workshop, Tech. Rep. WS-93-01. Menlo Park, CA: AAAI, 1993, p. 156.
[6] T. Cain, M. Pazzani, and G. Silverstein, "Using domain knowledge to influence similarity judgments," in Proc. Case-Based Reasoning Workshop, Washington, DC, 1991, pp. 191–198.
[7] Cognitive Systems, REMIND Developers Reference Manual, Boston, MA, 1992.
[8] P. R. Cohen, "Evaluation and case-based reasoning," in Proc. DARPA Workshop Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann, 1989, pp. 168–172.
[9] G. DeJong and R. Mooney, "Explanation-based learning: An alternative view," Mach. Learn., vol. 1, no. 2, pp. 145–176, 1986.
[10] D. Donahue, "OGRE: Generic reasoning from experience," in Proc. DARPA Workshop Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann, 1989, pp. 248–252.
[11] R. Duda and P. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[12] M. B. Eisenberg, "Measuring relevance judgements," Inf. Process. Manage., vol. 24, pp. 373–389, July–Aug. 1988.
[13] N. Fuhr, "Optimum polynomial retrieval functions based on the probability ranking principle," ACM Trans. Inf. Syst., vol. 7, pp. 183–204, July 1989.
[14] D. Gentner, "Structure mapping: A theoretical framework for analogy," Cognitive Sci., vol. 7, pp. 155–170, Apr.–June 1983.
[15] A. R. Golding and P. S. Rosenbloom, "Improving rule-based systems through case-based reasoning," in Proc. 9th Nat. Conf. Artificial Intelligence. Menlo Park, CA: AAAI, 1991, vol. 1, pp. 22–27.
[16] M. Goodman, "CBR in battle planning," in Proc. DARPA Workshop Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann, 1989, pp. 264–269.
[17] K. M. Gupta, "A framework for the design and development of diagnostic case-based reasoning systems," Ph.D. dissertation, McMaster Univ., Hamilton, Ont., Canada, 1996.



[18] K. M. Gupta and A. R. Montazemi, "A methodology for evaluating the retrieval performance of case-based reasoning systems," Res. Work. Paper Series 398, School of Business, McMaster Univ., Hamilton, Ont., Canada, 1994.
[19] W. L. Hays, Statistics. New York: Holt, Rinehart and Winston, 1963.
[20] T. R. Hinrichs, Problem Solving in Open Worlds: A Case Study in Design. Northvale, NJ: Lawrence Erlbaum, 1992.
[21] G. A. Klein, J. A. King, and L. Whitaker, "Using analogues to predict and plan," in Proc. Case-Based Reasoning Workshop. San Mateo, CA: Morgan Kaufmann, 1988, pp. 224–232.
[22] D. W. King and E. C. Bryant, The Evaluation of Information Services and Products. Washington, DC: Information Resources, 1971.
[23] J. L. Kolodner and R. L. Simpson, "The MEDIATOR: Analysis of an early case-based problem solver," Cognitive Sci., vol. 13, pp. 507–549, Oct.–Dec. 1989.
[24] J. L. Kolodner, Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann, 1993.
[25] ___, "Improving human decision making through case-based decision aiding," AI Mag., vol. 12, pp. 52–68, Summer 1991.
[26] ___, "Judging which is the best case for a case-based reasoner," in Proc. DARPA Workshop Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann, 1989, pp. 77–81.
[27] J. L. Kolodner and W. Mark, "Case-based reasoning," IEEE Expert, pp. 5–6, Oct. 1992.
[28] P. Koton, "Reasoning about evidence in causal explanations," in Proc. DARPA Workshop Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann, 1988, pp. 260–270.
[29] M. Kriegsman and R. Barletta, "Building a case-based help desk application," IEEE Expert, vol. 8, no. 6, pp. 18–26, 1993.
[30] T. Liang and B. R. Konsynski, "Modeling by analogy: Use of analogical reasoning in model management systems," Decision Support Syst., vol. 9, pp. 113–125, Jan. 1993.
[31] A. R. Montazemi and K. M. Gupta, "An adaptive agent for description in diagnostic case-based reasoning systems," Comput. Industry, vol. 29, pp. 209–224, 1996.
[32] ___, "A framework for retrieval in case-based reasoning systems," Res. Work. Paper Series 407, School of Business, McMaster Univ., Hamilton, Ont., Canada, 1995.
[33] ___, "Case-based reasoning: A methodology for decision support systems," in Proc. 11th Annu. Conf. Association of Management, Atlanta, GA, 1993, vol. 11, no. 1, p. 63.
[34] A. R. Montazemi, D. W. Conrath, and C. A. Higgins, "An exception reporting information system for ill-structured decision problems," IEEE Trans. Syst., Man, Cybern., vol. SMC-17, pp. 771–779, Sept./Oct. 1987.
[35] D. Offutt, "SIZZLE: A knowledge acquisition tool specialized for the sizing task," in Automating Knowledge Acquisition for Expert Systems, S. Marcus, Ed. Norwell, MA: Kluwer, 1988, pp. 175–200.
[36] E. Ozkarahan, Database Machines and Database Management. Englewood Cliffs, NJ: Prentice-Hall, 1986.
[37] Panel of CBR workshop, "Case-based reasoning from DARPA machine learning program," in Proc. DARPA Workshop Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann, 1989, pp. 1–14.
[38] D. Poetschke, "Analogical reasoning for second generation expert systems," in Proc. Int. Workshop Analogical Inductive Inference, Reinhardsbrunn Castle, Germany, 1989, pp. 264–276.
[39] B. W. Porter, R. Bareiss, and R. C. Holte, "Concept learning in weak theory domains," Artif. Intell., vol. 45, no. 1–2, pp. 229–264, Sept. 1990.
[40] O. Raoult, "A survey of diagnosis expert systems," in Knowledge Based Systems for Test and Diagnosis, G. Saucier, A. Ambler, and M. A. Breuer, Eds. New York: Elsevier, 1989, pp. 153–167.
[41] J. J. Regazzi, "Performance measures for information retrieval systems: An experimental approach," J. Amer. Soc. Inf. Sci., vol. 39, pp. 235–251, July 1988.
[42] C. K. Riesbeck and R. C. Schank, Inside Case-Based Reasoning. Hillsdale, NJ: Lawrence Erlbaum, 1989.

[43] G. Salton, "The state of retrieval system evaluation," Inf. Process. Manage., vol. 28, pp. 441–449, July–Aug. 1992.
[44] G. Salton, Automatic Information Organization and Retrieval. New York: McGraw-Hill, 1968.
[45] E. Simoudis, "Using case-based retrieval for customer technical support," IEEE Expert, vol. 7, pp. 7–11, Oct. 1992.
[46] E. Simoudis and J. Miller, "Validated retrieval in case-based reasoning," in Proc. 8th Nat. Conf. Artificial Intelligence. Menlo Park, CA: AAAI, 1990, vol. 1, pp. 310–315.
[47] R. H. Sprague and E. D. Carlson, Building Effective Decision Support Systems. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[48] C. Stanfill and D. L. Waltz, "Memory-based reasoning paradigm," in Proc. DARPA Workshop Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann, 1988, pp. 414–424.
[49] R. J. Sternberg, "Component processes in analogical reasoning," Psychol. Rev., vol. 84, pp. 353–378, July 1977.
[50] R. H. Stottler, "CBR for cost and sales prediction," AI Expert, vol. 9, pp. 25–33, Aug. 1994.
[51] A. Tversky, "Features of similarity," Psychol. Rev., vol. 84, pp. 327–352, July 1977.

Kalyan Moy Gupta received the B.E. degree from Ravishanker University, India, the M. Tech. degree from the Indian Institute of Technology, Kharagpur, India, and the Ph.D. degree from McMaster University, Hamilton, Ont., Canada. He is a Research Engineer at Atlantis Aerospace Corporation, Brampton, Ont., where he designs and develops performance support systems. Prior to joining Atlantis, he spent several years developing business information systems. He has published papers in the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, Journal of Computers in Industry and Journal of Management Information Systems. He has presented and published in the conferences of the North East Decision Sciences Institute, the Canadian Operations Research Society, and the American Association of Management. His current research interests include use of artificial intelligence techniques in performance support systems, message extraction, and requirements analysis. Dr. Gupta is a member of ACM, INFORMS, and TORCHI.

Ali R. Montazemi received the Ph.D. degree in management sciences from the University of Waterloo, Waterloo, Ont., Canada, in 1984. He is Associate Professor of Management Information Systems at the Michael G. DeGroote School of Business, and Associate Member of the Department of Electrical and Computer Engineering, McMaster University, Hamilton, Ont., Canada. He has published papers in the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, Decision Support Systems, European Journal of Operational Research, INFOR, International Journal of Man-Machine Studies, Journal of Artificial Intelligence in Education, Journal of Computers in Industry, Journal of Educational Computing Research, Journal of Management Information Systems, Journal of the Operational Research Society, MIS Quarterly, and others. His current areas of research interest include human–computer interaction, application of case-based reasoning systems in business, design and development of DSS, and intelligent tutoring systems.
