To appear in the Proceedings of the Fifth International Symposium on Knowledge Engineering, 1992
Complexity-Based Evaluation of Rule-Based Expert Systems

Nitin Indurkhya
Advanced Research Laboratory, Hitachi Ltd, Hatoyama 350-03, JAPAN
Sholom M. Weiss
Department of Computer Science, Rutgers University, New Brunswick, NJ 08903, USA

Peter Politakis
Digital Equipment Corporation, 111 Locke Drive, Marlboro, MA 01752, USA
Abstract

We describe techniques for evaluating rule-based expert systems by complexity analysis. The degree of redundancy and superfluity in the knowledge base is measured by comparing its performance with that of pruned versions of itself. The pruning method, called weakest-link pruning, takes advantage of the structure of the rule base to perform a cost-complexity pruning of the set of rules. Two large rule-based expert systems for real-world applications are evaluated using this method. These pruning techniques enable an evaluation of the degree of redundancy of the knowledge base, the identification of the critical portions of the knowledge base, and an evaluation of the adequacy of the benchmark cases.

1 Introduction

While the field of knowledge engineering has focused on techniques for constructing rule-based expert systems in a cost-effective manner, relatively little attention has been paid to techniques for evaluating these systems to determine how they will perform on new cases. The failure to assess the performance of an expert system on new cases can be a major obstacle to its acceptance and routine commercial deployment.

In this paper, we discuss techniques for evaluating rule-based expert systems by analyzing their complexity. In particular, these methods are useful for determining the degree of redundancy and superfluity in the knowledge base. Our methods are based on a dynamic analysis of the expert system's performance on benchmark cases. We address the problem of redundancy within the context of a more general issue: how much of the knowledge base is critical to good performance? Identification of the critical portions can be useful in maintaining the knowledge base.

We also show how the methods can be used to analyze the benchmark cases. Since the benchmark cases are typically used in performing different kinds of evaluation of the expert system, it is important to assess how useful they are for evaluation. These methods enable such an analysis of the benchmark cases and identify possible gaps in their coverage.

Only classification-type rule-based expert systems are considered, although some of the results are applicable to other types of expert knowledge bases as well. The rules are typical IF...THEN propositional rules with intermediate hypotheses. A simple interpreter is used for reasoning: rules are activated based on the inputs, and when multiple rules are satisfied and applicable to the same hypothesis, the interpreter selects the single rule with the maximum absolute confidence for that hypothesis. An example of such an interpreter is described in [Weiss and Kulikowski, 1984].
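To make the reasoning model concrete, the following is a minimal Python sketch of such an interpreter. The Rule representation and all names here are our own illustrative assumptions, not the implementation described in [Weiss and Kulikowski, 1984].

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class Rule:
    conditions: List[str]  # propositional components of the IF part
    hypothesis: str        # hypothesis concluded by the THEN part
    confidence: float      # signed confidence in the conclusion

def interpret(rules: List[Rule], facts: Set[str]) -> Dict[str, float]:
    """Fire all satisfied rules; for each hypothesis, keep only the single
    rule with the maximum absolute confidence.  (A fuller interpreter would
    also feed concluded intermediate hypotheses back in as new facts.)"""
    conclusions: Dict[str, float] = {}
    for rule in rules:
        if all(c in facts for c in rule.conditions):  # rule is satisfied
            current = conclusions.get(rule.hypothesis)
            if current is None or abs(rule.confidence) > abs(current):
                conclusions[rule.hypothesis] = rule.confidence
    return conclusions
```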
2 Related Work

Early systems for evaluating rule-based systems, such as [Nguyen et al., 1985], attempted to evaluate properties such as consistency by performing a static analysis of the rules. More recent approaches [Ginsberg, 1988; Ginsberg, Weiss and Politakis, 1988] have been based on dynamic analysis of system performance on sample cases. While this is more useful than static analysis, these methods modify the knowledge base, either through minor refinements to the rules [Ginsberg, Weiss and Politakis, 1988] or through the addition of hundreds of rules [Ginsberg, 1988; Ginsberg, 1990]. There has been little work on the broader issue of identifying the portions of the knowledge base that are critical for good performance. The redundant portions represent just one end of the spectrum: the least critical portions. Analysis of other parts of the knowledge base would also be helpful in knowledge base maintenance. Pruning methods for complexity analysis have been applied in rule-induction systems [Quinlan, 1987; Weiss and Indurkhya, 1991]; however, they have not so far been used to evaluate rule-based systems built by human experts. The pruning method presented in this paper is related to the one used in [Weiss and Indurkhya, 1991].

3 Methods

The central idea behind our evaluation method is a pruning technique called weakest-link pruning. The initial rule-based system is successively pruned back into smaller subsets, and the performance of each of these pruned versions is compared with that of the initial knowledge base. This enables identification of the redundant portions of the knowledge base and also provides an analysis of how different parts of the knowledge base affect its performance. In this section we define the pruning problem, give a metric for comparing the performance of pruned knowledge bases, and describe the weakest-link pruning algorithm as applied to rule-based systems.

3.1 The Pruning Problem

Consider a knowledge base of size N and an associated set of sample cases S. For a rule-based system, the size is given by the total number of propositional components in the left-hand sides (the IF parts) of the rules.
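Under the same hypothetical Rule representation as in the sketch above, this size measure is straightforward to compute:

```python
def knowledge_base_size(rules: List[Rule]) -> int:
    # N = total count of propositional components over all IF parts
    return sum(len(rule.conditions) for rule in rules)
```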
A pruned knowledge base is one obtained from the original knowledge base by deleting rules and/or components from it. The problem is to find the best pruned knowledge base. Two issues must be addressed: (i) what does one mean by the best pruned knowledge base? (ii) how can one find it? The first issue involves giving an appropriate metric for comparing the performance of pruned knowledge bases. The second involves searching over an exponential number of prunings; an exhaustive search is clearly impractical, and a heuristic approach is strongly indicated.

3.2 Cost-Complexity Evaluation

In this section we discuss how to evaluate and compare pruned knowledge bases using a set of benchmark cases. Conventionally, accuracy (or equivalently, error-rate) has been used to evaluate a knowledge base. The problem with this is that the size of the pruned knowledge base plays no role in the evaluation. Since we are interested in the relationship between the size of the knowledge base and its error-rate, a different metric that includes a cost factor for the size of the knowledge base must be used. One such metric, which has been used in the context of decision trees, is the cost-complexity evaluation function [Breiman et al., 1984]. Consider a knowledge base of size N with error-rate E. The cost-complexity error-rate, Ecomp, is given by

    Ecomp = E + α × N

where α can be seen as the cost per unit size. When α = 0, Ecomp is the same as the error-rate. An attractive feature of Ecomp is that even when the error-rate is zero, Ecomp is smaller for smaller knowledge bases. Other possible metrics, such as E × N, also order knowledge bases by size when the error-rate is identical for all of them, but only for non-zero error-rates. Ecomp, on the other hand, behaves well across the entire range of possible values of E and N.
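As a sketch, the metric itself is a one-line computation (again under the hypothetical representation used above):

```python
def cost_complexity_error(error_rate: float, size: int, alpha: float) -> float:
    """Ecomp = E + alpha * N, where alpha is the cost per unit of size."""
    return error_rate + alpha * size
```

For example, with α = 0.001, a pruned knowledge base of size 40 and error-rate 0.05 scores 0.05 + 0.001 × 40 = 0.09, beating a full knowledge base of size 120 and error-rate 0.04, which scores 0.04 + 0.001 × 120 = 0.16, despite the full base's lower raw error-rate.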
3.3 Cost-Complexity Pruning

Ecomp, as defined in the previous section, depends on the cost factor α. While we would like to choose the best pruned knowledge base on the basis of Ecomp, we would not like to limit ourselves to a specific value of α. The cost-complexity pruning procedure addresses this concern by picking the best pruned knowledge base over all values of α. Consider the best pruned knowledge base for each value of α. While α runs through an infinite continuum of real values, there are only a finite number of pruned knowledge bases to be considered -- 2^N of them for a knowledge base of size N. Because of this finiteness, if P(α1) is the best pruned knowledge base for α = α1, then it remains the best pruned knowledge base for a range of values around α1. Thus, instead of finding the best pruned knowledge base for each possible value of α, we can find the points along the range of α values where the best pruned knowledge base changes.

Cost-complexity pruning is therefore a two-step process:

1. Obtain a series of pruned knowledge bases such that each is the best pruned knowledge base for some range of α values.

2. From this series, pick the knowledge base that has the lowest error-rate.

The series of pruned knowledge bases is obtained by a procedure called weakest-link pruning.

3.4 Weakest-Link Pruning

In the previous section we saw that cost-complexity pruning needs to generate a series of pruned knowledge bases such that each knowledge base in the series is the best pruned knowledge base for a range of α values. Let us order this series by the corresponding α values. Thus, we seek to generate a series of pruned knowledge bases P1(α1), P2(α2), ..., PL(αL) such that Pi(αi) is the best pruned knowledge base for αi-1 < α ≤ αi.
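The following Python sketch conveys the flavor of this procedure under the assumptions of the earlier sketches. For brevity it prunes whole rules rather than individual components, and `evaluate` is a hypothetical function returning the error-rate of a rule set on the benchmark cases; it is not the algorithm's full specification.

```python
def weakest_link_series(rules: List[Rule], cases, evaluate) -> List[List[Rule]]:
    """Generate a nested series of pruned knowledge bases.  At each step,
    remove the candidate whose deletion raises the error-rate least per
    unit of size removed -- the weakest link.  The alpha at which a
    candidate ties with the current base under Ecomp is
        (E_pruned - E_current) / (N_current - N_pruned)."""
    current = list(rules)
    series = [current]
    current_error = evaluate(current, cases)
    while current:
        best_pruned, best_alpha = None, float("inf")
        for i, rule in enumerate(current):
            pruned = current[:i] + current[i + 1:]
            delta_size = len(rule.conditions)  # size removed with this rule
            alpha = (evaluate(pruned, cases) - current_error) / delta_size
            if alpha < best_alpha:
                best_pruned, best_alpha = pruned, alpha
        current = best_pruned
        current_error = evaluate(current, cases)
        series.append(current)
    return series
```

Selecting from this series the member with the lowest error-rate on the benchmark cases then completes step 2 of cost-complexity pruning.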