VALIDATION OF RULE-BASED SYSTEMS GENERATED BY CLASSIFICATION ALGORITHMS

Mieczyslaw Lech Owoc and Violetta Galant
Department of Computer Systems, Wroclaw University of Economics
ul. Komandorska 118/120, 53-345 Wroclaw, Poland
Phone: ++4871-680513; Fax: ++4871-679611
E-mail: {owoc, galant}@ksk-2.iie.ae.wroc.pl

Abstract. Validation of a knowledge base is an important part of the knowledge-based systems (KBS) development procedure; it aims to assure the system's ability to reach correct results. Knowledge bases (KBs) generated by classification algorithms, as an example of the machine-learning approach, seem particularly promising. This paper addresses the verification and evaluation of KBs that exist as rule-based systems and are generated with classification algorithms. The framework consists of three steps. The first creates a set of rules using two tested algorithms, C4.5 and GIMS, and then transforms the rules into decision tables. In the second, the rules are verified against two criteria: completeness and consistency. Finally, in the last step, the set of rules is evaluated using two additional criteria: adequacy and reliability. The classification problem concerns bank customers who apply for credit; their applications can be approved or refused. Certain unique features of the generated rules are briefly commented on in the summary.

Keywords: knowledge validation, rule-based systems, machine-learning approach, verification and evaluation, classification algorithms
1. Introduction

Validation of knowledge is still regarded as one of the crucial tasks that can improve the quality of knowledge-based systems. A relatively new approach is to develop knowledge-based systems in a more automatic way, namely with machine learning. Without doubt, such systems also have to be validated using a specific methodology. We believe validation techniques developed for expert systems can be adopted, although some difficulties may occur. The idea of knowledge validation (KV), though intuitively intelligible, has more than one interpretation. We accept Laurent's proposal (Laurent, 1992), which distinguishes two separate procedures: verification and evaluation. It is necessary to specify a set of criteria, see (Owoc, 1994), useful in these procedures. The criteria are: knowledge completeness and consistency for the verification purposes, and adequacy together with reliability for the evaluation procedure. The organisation of this paper is as follows. In the next section the idea of classification algorithms is briefly presented. The characterisation of the classification task covers customer properties expressed as attributes, together with the rules produced by the chosen algorithms. The next part is devoted to the two procedures that make up the validation process, verification and evaluation, with respect to the set of criteria mentioned above. The paper concludes with a summary of the results as well as future directions of the research.
2. Classification Algorithms Applied for Generation of the Rules

Classification problems can be solved in many ways. Recently, an essential increase of interest can be observed in independent knowledge-acquisition systems based on machine learning techniques. In this paper we use two classification algorithms for rule generation: C4.5 from Quinlan (1993) and GIMS (Generalisation by Inductive Symbolic Method) from Galant (1996; 1997). Both systems generate classification rules on the basis of decision trees. In C4.5, Quinlan applied an evaluation function, based on a classic formula from information theory, that measures theoretical information content. In the GIMS system, the Czerwinski coefficient of attribute association is applied to generate a decision tree. This coefficient measures the degree of dependence, or independence, between two variables (Galant, 1996). The coefficient is computed for all describing attributes, and the attribute with the maximum coefficient value is chosen for the next node of the decision tree.
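The attribute-selection step both systems share can be illustrated with a small sketch. The function below computes a simplified entropy-based information gain in the spirit of the measure Quinlan describes (C4.5 itself uses the gain ratio refinement, and GIMS the Czerwinski coefficient); the toy attribute and class names are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attr, target="credit"):
    """Reduction in entropy of `target` after splitting on `attr`."""
    base = entropy([e[target] for e in examples])
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e[target] for e in examples if e[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

# Toy training set in the spirit of the credit task
examples = [
    {"job": "yes", "deposit": "high", "credit": "yes"},
    {"job": "yes", "deposit": "low",  "credit": "yes"},
    {"job": "no",  "deposit": "high", "credit": "no"},
    {"job": "no",  "deposit": "low",  "credit": "no"},
]
# "job" separates the classes perfectly here, so it wins the split
best = max(["job", "deposit"], key=lambda a: information_gain(examples, a))
```

In a tree inducer this selection is repeated recursively on each subset until the leaves are (nearly) pure.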
3. Credit Risk Assessment Application as a Form of Classification Task

Classification tasks are regarded as very important, especially in the context of expert systems. We deal with classification wherever one decision must be chosen from many possible ones. Credit decision making, based on a client's financial condition, is an example of such a classification task. A formal definition of the classification task is as follows. Objects used for generating classification knowledge are called examples and are given in a training set C. The examples in the training set are described by m attributes X and one classification attribute Y. Each example in the set C describes an entity as follows:
Ci = (x1, x2, ..., xm, y), where xl ∈ dom(Xl), y ∈ dom(Y), l = 1, ..., m

On the basis of the training set C, a classification rule φ is generated such that:

φ(x1, ..., xm) = y, where xl ∈ dom(Xl), y ∈ dom(Y), l = 1, ..., m
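The definition above can be mirrored in a few lines: a training set C of (attribute-vector, class) pairs and a mapping φ from attribute values to a class. The attribute values and the hand-written classifier below are invented for illustration; in the paper, φ is induced automatically by C4.5 or GIMS.

```python
# Training set C: each example pairs an attribute vector (x1, ..., xm)
# with a class label y ∈ dom(Y).
C = [
    (("yes", "high"), "approve"),   # invented attribute values
    (("yes", "low"),  "approve"),
    (("no",  "high"), "refuse"),
]

def phi(x):
    """Hand-written stand-in for the generated classification rule φ:
    maps an attribute vector (x1, ..., xm) to a class y."""
    job, deposit = x
    return "approve" if job == "yes" else "refuse"

# φ prescribes a new example (not belonging to C) to one of the classes
new_case = ("no", "low")
decision = phi(new_case)
```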
The result of a system solving the classification task is knowledge that allows prescribing new examples (not belonging to the training set C) to one of the determined classes. This effect can be presented by means of many knowledge representation formalisms, such as production rules, decision trees or decision tables. The tests were based on two databases from the banking domain. The first one, CREDIT (Japanese Credit Screening, Chiharu Sano), is taken from the UCI Repository. The database CREDITPOL (Credit Decision, V. Galant) contains real data files from one of the Polish banks. Table 1 briefly describes the main features of the training sets. Both databases include more than 100 cases (therefore the data files are representative) referring to the same or a similar number of classes and attributes.

Table 1. Test databases characteristics

Database     No. of cases   No. of classes   Attributes (continuous / discrete)
CREDIT       125            2                5 / 5
CREDITPOL    146            2                1 / 5
The experiments were performed by means of two systems: C4.5, the most popular and best developed decision-tree tool mentioned above, and GIMS, designed and implemented by one of the paper's authors. As a consequence of the research, four knowledge bases were generated. Their contents are presented in the Appendix.
4. A Validation Process of Generated Rules
4.1. A General Idea of Knowledge Base Validation
Knowledge base verification and evaluation, as the two procedures of a validation process, use separate techniques and, as a consequence, different criteria. Basically, all existing methods were developed for knowledge bases created in the more natural manner (mostly employing a knowledge acquisition phase). A question therefore arises whether these methods can be applied to a set of rules generated by classification algorithms. In our opinion they may be, for the following reasons: 1) generated knowledge bases are used for classification tasks with the same purpose as in other expert systems that use domain knowledge (concordance of goals), 2) the reasoning techniques employed during the classification process are very similar to heuristic rules obtained from experts (concordance of procedures), 3) the knowledge bases to be generated are created as a consequence of machine learning procedures and can later be expressed in one of the commonly accepted knowledge formalisms (concordance of knowledge representations). This does not mean the two sorts of knowledge bases are fully interchangeable; on the contrary, some specific features of generated rules can be detected.

Let us look at the procedures and approaches applied during the knowledge validation process. The first procedure, verification, can be identified with objective validation, because its ultimate goal is to check whether a knowledge base fulfils one or more formal specifications, see (Laurent, 1992). In practice, we have to verify the completeness and consistency of a knowledge base. Several methods have been developed for verifying knowledge bases represented as rule sets. They can be grouped as follows: a) graph-oriented approaches, using directed graphs (Nazareth and Kennedy, 1991) or K-trees (Suh and Murray, 1994), b) decision-table approaches, employing some form of decision tables (Cragun and Steudel, 1987; Nguyen, 1987; Wets et al., 1997), c) other approaches, based on concepts such as metaknowledge (Morell, 1989), incremental verification (Meseguer, 1992) or machine-learning techniques (Lounis, 1995). Applying one of them, we first need to transform the rule set into a form accepted by the method. We then check rule set completeness (e.g. unreferenced and illegal attribute values, unreachable conclusions and so-called "dead-end" conditions or goals) and verify its consistency, searching for redundant, subsumed, conflicting and circular rules or unnecessary IF conditions.

The second procedure, evaluation, is more subjective; it determines whether pseudo-formal specifications (Laurent, 1992) are achieved from the user's point of view. There is also a sensu largo understanding of the procedure, relying on total validation of a knowledge-based system, see for example (Owoc, 1997; Rouge et al., 1995). In practice, we may use two additional criteria during this procedure: knowledge adequacy and reliability. Testing is the most important activity here. The better-known knowledge evaluation techniques fall into two categories: a) model-based approaches, where a knowledge base model (expressed in some way) is the basic referential concept; such techniques include the analytical hierarchy approach (Liebowitz, 1986), the metaknowledge concept (Morell, 1989) and the VITAL methodology (Rouge et al., 1995), b) procedure-oriented approaches, employing tasks adequate to the problem, such as the empirical method (Lehner, 1989) or the knowledge refinement approach (Craw and Sleeman, 1995).
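The pairwise consistency anomalies named above (redundant, subsumed, conflicting rules) can be detected mechanically once rules are held in a uniform form. A minimal sketch, assuming a hypothetical representation where a rule is a (premise dict, conclusion) pair:

```python
def subsumes(rule_a, rule_b):
    """rule_a subsumes rule_b if it reaches the same conclusion
    from a proper subset of rule_b's premises (a more general rule)."""
    prem_a, concl_a = rule_a
    prem_b, concl_b = rule_b
    return (concl_a == concl_b
            and prem_a.items() <= prem_b.items()   # dict-items set inclusion
            and prem_a != prem_b)

def conflicts(rule_a, rule_b):
    """Two rules conflict if identical premises yield different conclusions."""
    return rule_a[0] == rule_b[0] and rule_a[1] != rule_b[1]

# Hypothetical rules in the style of the credit KBs
r1 = ({"demand": "high"}, "yes")
r2 = ({"demand": "high", "ratio": ">21"}, "yes")   # subsumed by r1
r3 = ({"demand": "high"}, "no")                    # conflicts with r1
```

Redundancy is the special case where premises and conclusion are both identical; circularity additionally requires following conclusions back into premises across a rule chain.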
Basically, in performing this procedure we tend to assess the user's satisfaction with a knowledge-based system, especially with the implemented knowledge regarded as a whole. Therefore, we evaluate whether a rule set fits the assumed system goal (knowledge adequacy) and whether the system generates the expected results (knowledge reliability). The overview presented above shows how significant and useful, during the whole knowledge validation process, is a knowledge model prepared earlier. Some of the methods were implemented in programmed validation tools described in (Zlatareva, 1994). In practice, the usability of these tools is very limited because of the troublesome requirements assumed during the validation procedures (Owoc and Ochmanska, 1996). When classification algorithms are used, a domain knowledge model is not available, but some concepts can be incorporated. The proposed validation framework for sets of rules generated by classification algorithms consists of three steps. In the initial step, the rules are transformed into decision tables (DTs). We use a tabular representation expressed as an expanded DT storing decisions in a canonical form. Cragun and Steudel (1987) and Wets et al. (1997), among others, have shown the usability of this form in selected knowledge validation procedures. Then, each of the four preprocessed rule sets is verified with respect to completeness and consistency. We check knowledge completeness from the local point of view (Owoc, 1997), searching for attributes and values missing from the rules, as well as for rules with unreachable conclusions or premises. Verifying knowledge consistency, we apply the typical methods: we look for redundant, contradictory, subsumed and circular rules, and we try to discover unnecessary IF conditions. The last step evaluates each knowledge base against two criteria: knowledge adequacy and reliability. The next sections present the results of the validation process.
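The local completeness check of the second step can be sketched as an exhaustive sweep over the Cartesian product of condition states, reporting combinations no rule covers. The attribute domains and mini rule set below are invented; real EDTs would supply both.

```python
from itertools import product

def uncovered(domains, rules):
    """Enumerate condition-state combinations no rule fires on.
    `domains` maps each condition subject to its states; a rule's
    premise may omit subjects it does not test (matches any state)."""
    subjects = list(domains)
    missing = []
    for combo in product(*(domains[s] for s in subjects)):
        case = dict(zip(subjects, combo))
        if not any(premise.items() <= case.items() for premise, _ in rules):
            missing.append(case)
    return missing

# Invented mini-KB: the "demand = low" cases are never handled
domains = {"demand": ["high", "low"], "job": ["yes", "no"]}
rules = [({"demand": "high"}, "yes")]
gaps = uncovered(domains, rules)
```

The exhaustive product grows exponentially with the number of subjects, which is acceptable for the small generated rule sets discussed here.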
4.2. Verification of Rules Completeness and Consistency
According to the method pointed out earlier, we have transformed the rule sets into expanded decision tables (EDTs), see (Wets et al., 1997). Every table refers to exactly one rule set, in such a way that the condition states and actions have been derived from the successive rules. The source of every state is marked in the bottom line of the table as a rule number.

Table 2. EDT for the "CREDIT" KB generated by C4.5

Condition Subjects          Condition States
Job                         Yes | No
Position period (months)    >2-4.5 | >4.5
Deposit value               ≤75 | >75
Bad localisation            Yes | No

Action Subjects   Action Values
Credit Yes        X  X
Credit No         X  X  X
Rule No.          1  2  3  4  5

Verifying the "CREDIT" KB generated by the GIMS system, we see a partially changed set of attributes taken into consideration. The classification algorithm acts in a different way (without the simplifications assumed in C4.5), so we expect different results. Indeed, the rule set is complete with regard to attribute values (no unreferenced or illegal cases) as well as to the conclusions to be reached (all are achievable), and there are no dead-end IF conditions or goals. Checking KB consistency, we did not find anomalies when comparing the rules with each other (e.g. redundant, subsumed, conflicting or circular rules). However, the number of rules could be reduced by creating premises with a larger number of variables. This is a matter of reasoning effectiveness; the rule set itself seems to be consistent.
Table 4. EDT for the "CREDITPOL" KB generated by C4.5

Condition Subjects   Condition States
Demand               High | Good | Average | Low
Sales Forecasts      V. good | Good | Sufficient
Financial Ratio      >21 | [24,30] | ≤24

Action Subjects   Action Values
Credit Yes        X (rules 1, 5, 3, 6)
Credit No         X (rules 9, 4, 11)
Rule No.          1  5  3  6  9  4  11
The CREDITPOL KB includes just three attributes, but a larger number of condition states. Testing rule set completeness, we have detected unreferenced attribute values (e.g. in rules 9 and 4), although they give the same conclusion. Rule 3 does not cover all possible values (what about "Demand" = Good and "Financial Ratio" ≤ 21?), and neither does rule 6. Verifying the KB consistency, no redundant, subsumed or circular rules have been detected. However, we discovered unnecessary IF conditions when comparing the premises of rules 9 and 4: the same conclusion can be achieved with "Sales Forecasts" = Not Good, with no mention of "Financial Ratio". As a result of missing attribute values, some rules can generate conflicts (rules 6 and 4).

Table 5. EDT for the "CREDITPOL" KB generated by GIMS

Condition Subjects   Condition States
Demand               High | Good | Average | Low
Sales Forecasts      Very good | Good | Sufficient | Unsufficient
Financial Ratio      ≤21.5 | >21.5 | ≤17.5 | >17.5

Action Subjects   Action Values
Credit Yes        X  X  X  X
Credit No         X  X  X  X  X
Rule No.          1  2  3  4  5  6  7  8  9
We did not discover missing rules when verifying the completeness of the CREDITPOL-GIMS KB, in relation to unreferenced or illegal attribute values, as well as to unreachable conclusions or dead-end cases. However, as before, the number of rules could be reduced from a formal point of view. Checking the KB consistency, we did not find evidently "bad" rules.

4.3. Evaluation of Generated Knowledge Bases: Adequacy and Reliability
As explained earlier, the goal of the last procedure is to evaluate the knowledge base as a whole. The first criterion, knowledge adequacy, is easy to estimate. Classification algorithms generate rules from a training set, so if the delivered database is properly prepared, we may expect full adequacy. In other words, the generated rules use only variables that relate exactly to the classification task, which means 100% knowledge base adequacy for obvious reasons. The four tested rule sets did not contain features that do not fit the problem. The true value of the system can be assessed by testing with real data and comparing the expected and achieved results. Table 6 presents an empirical evaluation of the knowledge bases, showing their main features as well as accuracy ratios. The results were obtained by means of ten-fold cross-validation: the columns "Number of rules" and "Average premise length" illustrate the size of the knowledge base, and the column "Accuracy" gives the percentage of correctly classified cases on the validation sample.

Table 6. Knowledge base size and classification accuracy

Algorithm-database   Number of rules   Average premise length   Accuracy
C4.5-CREDIT          5                 1.6                      73.82
C4.5-CREDITPOL       7                 1.86                     91.93
GIMS-CREDIT          5                 2.6                      73.98
GIMS-CREDITPOL       9                 2.0                      95.78
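The ten-fold resampling loop behind Table 6 can be sketched generically. The majority-class "learner" below is only a placeholder standing in for C4.5 or GIMS, and all names and data are invented for illustration.

```python
import random
from collections import Counter

def cross_val_accuracy(examples, train, classify, k=10, seed=0):
    """Average percentage accuracy over k folds: each fold is held out
    once while the classifier is induced from the remaining examples."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        model = train(training)               # stand-in for C4.5 or GIMS
        correct = sum(classify(model, x) == y for x, y in held_out)
        scores.append(correct / len(held_out))
    return 100 * sum(scores) / len(scores)

# Trivial majority-class "learner", used only to exercise the loop
def train(training):
    return Counter(y for _, y in training).most_common(1)[0][0]

def classify(model, x):
    return model

data = [((i,), "yes") for i in range(9)] + [((9,), "no")]
acc = cross_val_accuracy(data, train, classify)
```

Each example is scored exactly once as a held-out case, so the averaged figure estimates accuracy on data the induced rules never saw.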
The last column, with accuracy ratios, can be interpreted as system reliability. Despite the gaps detected in rule set completeness and consistency, both algorithms generate very accurate results. A level of over 70% correctly classified cases should be regarded as very successful.

5. Conclusions
Usually, rule sets generated by machine learning techniques seem to be free of typical knowledge base deficiencies. In this paper, we have demonstrated that it is nevertheless reasonable to validate such KBs using techniques adopted from the expert systems validation process. The basic contributions of the research are:
1) Rule sets are small, usually fewer than 10 rules. All rules represent rather "simple" knowledge, in contrast to the "deep" knowledge introduced recently. It is assumed that every rule concludes by prescribing new examples to one of the defined classes. These properties of the rule sets are essential during the validation process.
2) There are regular anomalies detected when testing knowledge base completeness. Depending on the quality of the training set, and on subsequent simplification efforts, a generated rule set can contain one or more missing rules (not covering all needed attribute values). Because of the "flat" control structure of a rule set, unreachable conclusions as well as "dead-end" IF conditions and goals are excluded.
3) Checking rule set consistency, we did discover conflicting rules and, exceptionally, unnecessary IF conditions. The other errors of this category, such as redundant, subsumed or circular rules, are rather impossible, for the reasons put forward in the paper.
4) Rule sets generated by classification algorithms contain, in practice, fully adequate knowledge. This is a consequence of the machine-learning approach, in which accidental, unnecessary data can be avoided.
5) Evaluation of knowledge base reliability can be made by comparing expected and tested results. Classification accuracy is very high, provided the algorithms are well applied.
As future directions of the current study, we are interested in using other techniques for transforming rules and comparing the results. Additionally, we expect some validation procedures to be automated. Currently, such research is being carried out.

References
Cragun B.J., Steudel H.J. (1987), A Decision-Table-Based Processor for Checking of Completeness and Consistency in Rule-Based Expert Systems. IEEE.
Craw S., Sleeman D. (1995), Refinement in Response to Validation. Expert Systems with Applications, Vol. 8, No. 3.
Czerwinski Z. (1970), O mierze zaleznosci stochastycznej [About a Measure of Stochastic Dependency]. Przeglad Statystyczny 1970/2, PWN, Warszawa.
Galant V. (1996), GIMS - Decision Tree Learning System. In: Proceedings of the 1st Polish Conference on Theory and Applications of Artificial Intelligence, Lodz.
Galant V. (1997), Zastosowanie indukcyjnych metod symbolicznych do odkrywania wiedzy w SIZ [Application of Inductive Symbolic Methods for Knowledge Discovery in MIS]. Doctoral Dissertation, Wroclaw.
Laurent J.-P. (1992), Proposals for a Valid Terminology in KBS Validation. 10th European Conference on Artificial Intelligence. John Wiley & Sons, Ltd.
Lehner P.E. (1989), Toward an Empirical Approach to Evaluating the Knowledge Base of an Expert System. IEEE.
Liebowitz J. (1986), Useful Approach for Evaluating Expert Systems. Expert Systems, Vol. 3, No. 2.
Lounis H. (1995), Knowledge-Based Systems Verification: A Machine Learning-Based Approach. Expert Systems with Applications, Vol. 8, No. 3.
Meseguer P. (1992), Incremental Verification of Rule-Based Expert Systems. 10th European Conference on Artificial Intelligence. John Wiley & Sons, Ltd., New York.
Morell L.J. (1989), Use of Metaknowledge in the Verification of Knowledge-Based Systems. IEEE.
Nazareth D.L., Kennedy M.H. (1991), Verification of Rule-Based Knowledge using Directed Graphs. In: Knowledge Acquisition. Academic Press Ltd.
Nguyen T.A. (1987), Verifying Consistency of Production Systems. Proceedings of the Third Conference on Artificial Intelligence Applications, Washington D.C., IEEE Computer Society Press.
Owoc M.L. (1994), Kryteria wartosciowania wiedzy [Knowledge Validation Criteria]. Prace Naukowe AE [Research Papers of the AE], No. 691, Wroclaw.
Owoc M.L., Ochmanska M. (1996), Limits of Knowledge Base Validation. EXPERSYS-96 Artificial Intelligence Applications. IITT-International, Paris.
Owoc M.L. (1997), From Local to Global Validation of a Knowledge Base. Prace Naukowe AE [Research Papers of the AE], No. 772, Wroclaw.
Quinlan J.R. (1993), C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.
Rouge A., Lapicque J.L., Brossier F., Lozingues Y. (1995), Validation and Verification of KADS Data and Domain Knowledge. Expert Systems with Applications, Vol. 8, No. 3.
Suh Y.H., Murray T.J. (1994), A Tree-Based Approach for Verifying Completeness and Consistency in Rule-Based Systems. Expert Systems with Applications, Vol. 7, No. 2.
Wets G., Vanthienen J., Piramuthu S. (1997), Extending a Tabular Knowledge-Based Framework with Feature Selection. Expert Systems with Applications, Vol. 13, No. 2.
Zlatareva N., Preece A. (1994), State of the Art in Automated Validation of Knowledge-Based Systems. Expert Systems with Applications, Vol. 7, No. 2.

Appendix - Contents of generated knowledge bases:
CREDIT with C4.5: Rule 9, Rule 1, Rule 6, Rule 12, Rule 2
CREDIT with GIMS: Rule 1, Rule 2, Rule 3, Rule 4, Rule 5

job = yes and position period (months) > 2 → pos
job = yes and credit purpose = pc → pos
credit purpose = medinstr and age 75.00 and good localisation = no → neg
position period (months) > 4.50 and job = no and deposit value > 75.00 and good localisation = yes → pos
CREDITPOL with C4.5
Rule 1: Demand = high → yes
Rule 5: Sales Forecasts = v.good → yes
Rule 3: Financial ratio > 21 and Demand = good → yes
Rule 6: Financial ratio 24 → yes
Rule 11: Demand = low → no
Rule 9: Sales Forecasts = sufficient and Demand = average → no
Rule 4: Financial ratio