
Rule Improvement Through Decision Boundary Detection Using Sensitivity Analysis

AP Engelbrecht (1) and HL Viktor (2)

(1) Department of Computer Science, University of Pretoria, Pretoria, SOUTH AFRICA, [email protected]
(2) Department of Informatics, University of Pretoria, Pretoria, SOUTH AFRICA, [email protected]

Abstract. Rule extraction from artificial neural networks (ANN) provides a mechanism to interpret the knowledge embedded in the numerical weights. Classification problems with continuous-valued parameters create difficulties in determining boundary conditions for these parameters. This paper presents an approach to locate such boundaries using sensitivity analysis. Inclusion of this decision boundary detection approach in a rule extraction algorithm resulted in significant improvements in rule accuracies.

1 Introduction

Artificial neural networks (ANN) have proved to be very efficient classification tools. Domain experts are, however, skeptical about basing crucial decisions on the results obtained from ANNs, mainly due to the numerical representation used by ANNs: it is very difficult to interpret the knowledge encapsulated by the numerical weights. Rule extraction from ANNs provides a mechanism to interpret this numerically encoded knowledge. Several rule extraction algorithms have been developed, including [Craven et al 1993,Fu 1994,Towell 1994,Viktor 1998], and these algorithms have been shown to be efficient in the knowledge extraction process. The output of a rule extraction algorithm is a set of propositional DNF rules with attribute-value tests of the form A_i <relational operator> boundary_value, e.g. petal-width < 49.50.

Classification problems with continuous-valued attributes present difficulties in determining the boundary conditions for such attributes. Current solutions to this problem include discretizing the continuous attributes, using a brute-force approach to find boundaries, or implementing an algorithm to locate decision boundaries. Discretization may cause important information to be obscured, while decision boundary detection algorithms are computationally complex [Baum 1991,Cohn et al 1994,Hwang et al 1991]. This paper presents a computationally efficient approach to locate decision boundaries using sensitivity analysis. First-order derivatives of the ANN outputs with regard to input patterns are used to find the position of boundaries. This algorithm is used in conjunction with the ANNSER rule extraction algorithm [Viktor et al 1995] to find the boundary values for continuous-valued attributes. Results of this approach showed significant improvements in rule accuracies.
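To make the rule format concrete, the short Python sketch below (illustrative only, not part of the ANNSER implementation) represents a rule as a conjunction of attribute-value tests and evaluates it on one example; the thresholds are the Versicolor boundaries reported in Section 3.

import operator

# Each attribute-value test is (attribute, relational operator, boundary value).
versicolor_rule = [
    ("petal-length", operator.lt, 49.50),
    ("petal-width", operator.lt, 18.50),
]

def rule_fires(example, tests):
    # A propositional rule antecedent is a conjunction: every test must hold.
    return all(rel(example[attr], boundary) for attr, rel, boundary in tests)

print(rule_fires({"petal-length": 45.0, "petal-width": 14.0}, versicolor_rule))  # True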

This paper is outlined as follows: Section 2 overviews an attribute evaluation approach to find boundary values, and presents a sensitivity analysis approach to locate decision boundaries. Sections 3 and 4 respectively present results of the sensitivity analysis approach on the iris and breast cancer problems, compared to those of the attribute evaluation approach.

2 Decision boundary detection

The ANNSER rule extraction algorithm extracts rules from feedforward ANNs, using sensitivity analysis to prune the ANN prior to rule extraction [Viktor et al 1995]. Propositional DNF rules are produced. This section overviews two approaches to determine boundary values for continuous-valued attributes: the attribute evaluation approach, and the sensitivity analysis decision boundary detection approach.

Consider a training set that contains n values for a continuous attribute A_i, namely the set of values {v_1, ..., v_n}. A decision boundary is a point x_i in the input space where the attribute values of A_i are divided into two subsets, i.e. subset A with values {v_1, ..., v_i} for which A_i < x_i, and subset B containing {v_{i+1}, ..., v_n} for which A_i > x_i. The value of x_i can be used as a threshold value (boundary value) in attribute-value tests to distinguish between output classes, where the test (A_i < x_i) covers concepts of class C_j and the test (A_i > x_i) covers those concepts that do not fall in class C_j.

The attribute evaluation method to determine these threshold values is applied to the original unscaled data set. In this approach, the minimum and maximum values of each attribute A_i with respect to each output class C_j are determined. That is, for each attribute A_i and output class C_j, the range min_{A_i} < A_i < max_{A_i} is determined. If min_{A_i} corresponds to the minimum value contained in the training set, the test is simplified to (A_i < max_{A_i}). Similarly, if the maximum value of the range is equal to the maximum value in the training set, the test is simplified to (min_{A_i} < A_i). Note that the training examples possibly do not include values that reflect the exact boundaries of the attribute values. To improve the generalization of the rules over unseen examples, the ranges can be extended to include those values that fall on the boundaries. For example, consider a range a_1 < A_i < a_2. The thresholds can be modified according to the following equations:

a_1 = a_1 - ε(a_2 - a_1)    (1)
a_2 = a_2 + ε(a_2 - a_1)    (2)

where ε is a domain dependent constant, used to improve the test data classification by the rules obtained from the training data [Sestito et al 1994].
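A minimal sketch of this attribute evaluation procedure is given below, assuming NumPy arrays X (unscaled attribute values, one column per attribute) and y (class labels); it is a reconstruction of the steps above, not the authors' implementation.

import numpy as np

def attribute_evaluation_range(X, y, attr, cls, eps):
    # Per-class minimum and maximum of attribute `attr` with respect to class `cls`.
    values = X[y == cls, attr]
    a1, a2 = values.min(), values.max()
    # Extend the range with the domain-dependent constant, equations (1) and (2).
    width = a2 - a1
    lower, upper = a1 - eps * width, a2 + eps * width
    # Simplify the test when a range end coincides with the training-set extreme.
    if a1 <= X[:, attr].min():
        lower = None          # test reduces to (A_i < upper)
    if a2 >= X[:, attr].max():
        upper = None          # test reduces to (lower < A_i)
    return lower, upper

For the Versicolor petal-length example discussed in Section 3, this sketch returns approximately (None, 47.5) when eps = 0.03.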

The sensitivity analysis decision boundary detection algorithm is based on the following assumption: consider a continuous-valued attribute A_i; if a small perturbation ΔA_i of A_i causes the ANN to change its classification from one class to another, then, according to [Engelbrecht et al 1998a,Engelbrecht 1998b], a decision boundary is located in the range [A_i, A_i + ΔA_i]. That is, a decision boundary is located at the point in input space where a small perturbation to the value of an input parameter causes a change in the output class.

Sensitivity analysis of the ANN output with respect to input parameter A_i is used to assign a "measure of closeness" of an attribute value to the boundary value(s) of that attribute. That is, for each example p in the training set, the first-order derivative

∂C_j^(p) / ∂A_i    (3)

is calculated for each class (output) C_j and for each input A_i [Engelbrecht et al 1998a,Engelbrecht 1998b]. The higher the value of ∂C_j^(p)/∂A_i, the greater the chance that a small perturbation of A_i^(p) will cause a different classification [Engelbrecht 1998b]. Therefore, patterns with high ∂C_j^(p)/∂A_i values lie closest to decision boundaries.

A graph of ∂C_j^(p)/∂A_i, p = 1, ..., P, reveals peaks at boundary points. A curve fitting algorithm can be used to fit a curve over the values of ∂C_j^(p)/∂A_i and to find the values of A_i where a peak is located. These values of A_i constitute decision boundaries. Sampling values to the left and right of the boundary peaks indicates whether an attribute should have a value less than or greater than the boundary value to trigger a rule. The sensitivity analysis decision boundary detection algorithm is applied in conjunction with the ANNSER rule extraction algorithm in the next sections.
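The following sketch illustrates the idea under stated assumptions: forward is any trained ANN forward pass returning class outputs for a batch of patterns, the derivative of equation (3) is approximated by central finite differences rather than the analytic form available from the learning equations, and the peak is simply taken as the attribute value with maximum sensitivity instead of being located by curve fitting.

import numpy as np

def output_sensitivity(forward, X, class_idx, attr_idx, h=1e-3):
    # |dC_j/dA_i| per training pattern: the "measure of closeness" to a boundary.
    X_plus, X_minus = X.copy(), X.copy()
    X_plus[:, attr_idx] += h
    X_minus[:, attr_idx] -= h
    dC = forward(X_plus)[:, class_idx] - forward(X_minus)[:, class_idx]
    return np.abs(dC) / (2.0 * h)

def boundary_from_peak(X, sensitivities, attr_idx):
    # Crude peak location: the attribute value at which the sensitivity is largest.
    # The paper instead fits a curve over the sensitivity values and reads off the peaks.
    return X[np.argmax(sensitivities), attr_idx]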

3 The Iris data set

The aim of this section is to illustrate the sensitivity analysis decision boundary detection algorithm and to compare the rules it produces with those extracted when the attribute evaluation method is used. The Iris classification problem concerns the classification of Irises into one of three classes, namely Setosa, Versicolor and Virginica. Irises are described by means of four continuous-valued inputs: sepal-width, sepal-length, petal-width and petal-length. The original 150-instance Iris data set was randomly divided into a 105-instance training set and a 45-instance test set, and the sensitivity analysis decision boundary detection algorithm was applied to the training set.

Firstly, a 4-2-3 ANN was trained using sigmoid activation functions with steep slopes to approximate linear threshold functions. All input values were scaled to the range [-1, 1]. Training converged after 10 epochs, with a classification test accuracy of 98%. Next, the sensitivity analysis pruning algorithm of [Engelbrecht et al 1996,Engelbrecht 1998b] was executed to prune irrelevant ANN parameters.
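This experimental setup can be approximated with standard tooling. The sketch below is a reconstruction, not the authors' code: scikit-learn's MLPClassifier does not expose the slope of the sigmoid that the paper uses to approximate linear threshold functions, and its training procedure differs from the one used by the authors.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=105, random_state=0)

# Scale inputs to [-1, 1], as in the paper, and train a small 4-2-3-style network.
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)
net = MLPClassifier(hidden_layer_sizes=(2,), activation="logistic",
                    max_iter=2000, random_state=0)
net.fit(scaler.transform(X_train), y_train)
print("test accuracy:", net.score(scaler.transform(X_test), y_test))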

[Figure 1 appears here: two panels, "Petal Length vs Iris Versicolor" and "Petal Width vs Iris Versicolor", each plotting closeness to the boundary against the scaled attribute value in [-1, 1].]

Fig. 1. Petal-width and petal-length decision boundaries for the Versicolor Iris

Sensitivity analysis showed the sepal-width and sepal-length attributes to be of low significance, and these two attributes were subsequently pruned [Engelbrecht 1998b]. After pruning, a reduced 2-2-3 ANN was trained. Again, training converged after 10 epochs, with a classification test accuracy of 95.9%.

The threshold values of the attribute-value tests were determined using the sensitivity analysis decision boundary detection algorithm, as discussed in Section 2. The algorithm was applied to each of the three Iris types. Figure 1 illustrates the decision boundary peaks formed for the petal-width and petal-length attributes with regard to the Versicolor Iris. These peaks were used to determine the actual unscaled attribute values that corresponded to the boundaries, which were located at 49.50 for the petal-length attribute and at 18.50 for the petal-width. The relational operators were determined by sampling values to the left and right of the boundaries. The boundaries produced two attribute-value tests describing the Versicolor Iris, namely (petal-length < 49.50) and (petal-width < 18.50).

The decision boundaries for the Setosa and Virginica Irises were located using the same approach. The Setosa decision boundaries were detected at a petal-length of 19.50 and a petal-width of 6.50, producing the attribute-value tests (petal-length < 19.50) and (petal-width < 6.50). The Virginica Iris type was described by the attribute-value tests (petal-length > 49.50) and (petal-width > 16.50).
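The two post-processing steps, mapping a boundary found in the scaled input space back to original units and choosing the relational operator by sampling on either side of the peak, can be sketched as follows (a reconstruction under assumptions; forward is the trained ANN forward pass and template a representative scaled input pattern, both hypothetical names).

import numpy as np

def unscale(x_scaled, orig_min, orig_max):
    # Invert the linear scaling of an attribute to [-1, 1] used for ANN training.
    return orig_min + (x_scaled + 1.0) / 2.0 * (orig_max - orig_min)

def relational_operator(forward, template, attr_idx, boundary, class_idx, delta=0.1):
    # The class output is sampled just below and just above the boundary; the side
    # with the stronger response determines the direction of the attribute-value test.
    left, right = template.copy(), template.copy()
    left[attr_idx] = boundary - delta
    right[attr_idx] = boundary + delta
    out_left = forward(left[np.newaxis, :])[0, class_idx]
    out_right = forward(right[np.newaxis, :])[0, class_idx]
    return "<" if out_left > out_right else ">"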


Using these attribute-value tests, the ANNSER rule extraction algorithm extracted the following rules:

Rule 1: IF petal-length < 19.50 AND petal-width < 6.50 THEN Setosa
Rule 2: IF petal-length > 49.50 AND petal-width > 16.50 THEN Virginica
Rule 3: IF petal-length < 49.50 THEN Versicolor
Rule 4: IF petal-width < 18.50 THEN Versicolor

The test set accuracy of the rule set was 95.9%, with individual rule accuracies ranging from 93.9% to 100%. The accuracy of the rule set was equal to the classification accuracy of the 2-2-3 ANN. This implies that the rule set models the ANN to a comparable degree of fidelity, where fidelity is measured by comparing the classification performance of the rule set to that of the ANN from which it was extracted [Craven et al 1993].

The attribute evaluation method was applied next, and is illustrated by considering the construction of the attribute-value tests of the rule that describes the Versicolor Iris, as depicted in Table 1. The illustration concerns the petal-length attribute. For the Versicolor Iris, the petal-length attribute had values within the range (13.0 < petal-length < 46.50). The petal-length attribute values in the training set ranged from 13 to 69. Therefore, the minimum value of the attribute-value test range corresponded to the minimum value in the training data set, and the test was simplified to (petal-length < 46.50). To improve the generalization of the rule set, the value of ε was set to 0.03. This value was used to calculate a new threshold using equation (2): 46.50 + 0.03(46.50 - 13.0) ≈ 47.50, producing the new attribute-value test (petal-length < 47.50).

The resultant rules were subsequently compared with the results of the decision boundary detection algorithm. Table 1 shows the attribute-value tests of the two rule sets. Using the attribute evaluation approach, four rules with a test set accuracy of 93.9% were extracted, with individual rule accuracies ranging from 89.9% to 100.0%. Using the decision boundary threshold values obtained from the sensitivity analysis approach, an improvement of 2.0% on the overall accuracy was achieved, and an improvement of 4.0% was achieved on the least accurate rule. For this set of experiments, the decision boundary detection algorithm produced an accurate, general set of rules.

Technique              Iris type   Attribute-value tests
Attribute evaluation   Setosa      petal-length < 19.50, petal-width < 6.50
                       Virginica   petal-length > 44.50, petal-width > 14.50
                       Versicolor  petal-length < 47.50, petal-width < 17.50
Decision boundaries    Setosa      petal-length < 19.50, petal-width < 6.50
                       Virginica   petal-length > 49.50, petal-width > 16.50
                       Versicolor  petal-length < 49.50, petal-width < 18.50

Table 1. Attribute evaluation versus decision boundaries
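As a concrete illustration of how the rule set accuracy and fidelity can be checked, the following sketch applies the four extracted rules to labelled test examples and compares the rule set accuracy with the ANN's test accuracy. It is illustrative only; in particular, the first-match conflict resolution between the overlapping rules is an assumption that the paper does not specify.

import operator

OPS = {"<": operator.lt, ">": operator.gt}
RULES = [  # (conjunction of attribute-value tests, concluded Iris type)
    ([("petal-length", "<", 19.50), ("petal-width", "<", 6.50)], "Setosa"),
    ([("petal-length", ">", 49.50), ("petal-width", ">", 16.50)], "Virginica"),
    ([("petal-length", "<", 49.50)], "Versicolor"),
    ([("petal-width", "<", 18.50)], "Versicolor"),
]

def predict_with_rules(example):
    # First matching rule wins (an assumption, since the rules overlap).
    for tests, iris_type in RULES:
        if all(OPS[op](example[attr], boundary) for attr, op, boundary in tests):
            return iris_type
    return None  # no rule fires

def rule_set_accuracy(examples, labels):
    hits = sum(predict_with_rules(e) == label for e, label in zip(examples, labels))
    return hits / len(labels)

# Fidelity, as used in the paper, is gauged by comparing rule_set_accuracy(...) on
# the test set with the classification accuracy of the 2-2-3 ANN on the same set.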

4 The Breast Cancer data set

The aim of this section is to illustrate the sensitivity analysis decision boundary detection algorithm in a noisy domain containing incorrect values. The breast cancer data set, obtained from the UCI machine learning repository, was used for this purpose. The database was originally obtained from Dr. William H. Wolberg of the University of Wisconsin Hospitals, Madison. The data set contained 699 tuples and distinguished between benign (non-cancerous) breast diseases and malignant cancer, comprising 458 (65.5%) benign and 241 (34.5%) malignant cases. In practice, over 80 percent of breast lumps are proven benign.
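As a sketch of the preprocessing involved (the file name, column layout and missing-value encoding below are assumptions about the UCI distribution, not details given in the paper):

import pandas as pd

COLUMNS = ["sample_code"] + [f"attr_{i}" for i in range(1, 10)] + ["class"]
data = pd.read_csv("breast-cancer-wisconsin.data", names=COLUMNS, na_values="?")
data = data.dropna()                               # one simple way to handle missing values
X = data.drop(columns="class").to_numpy()          # 10 inputs, including the sample code number
y = (data["class"] == 4).astype(int).to_numpy()    # UCI coding: 2 = benign, 4 = malignant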

The data set contained missing values, and the level of noise (incorrect values) was unknown. There are 10 input attributes, including the redundant sample code number; the other nine inputs are the results of the pathological analysis of the tissue samples. A 10-10-2 ANN was trained, using sigmoid activation functions with steep slopes to approximate linear threshold functions. The sensitivity analysis pruning algorithm reduced the ANN to a 3-3-2 network, which produced six rules. The classification test accuracy of this ANN was 95.2%.

Next, the attribute-value test thresholds were determined using both the attribute evaluation method and the sensitivity analysis decision boundary detection algorithm, and the rule sets for both methods were extracted. For the original attribute evaluation method, the rule set accuracy was 79.6%, with individual rule accuracies ranging from 66.4% to 85.3%. The accuracy of the rule set produced after incorporating the results of the sensitivity analysis decision boundary detection algorithm was 94.3%, an improvement of 14.7%; the individual rule accuracies ranged from 65.4% to 93.4%. The fidelity of the final rule set is high, since the rule set accuracy of 94.3% is comparable to that of the original ANN (95.2%).

5 Conclusion

This paper presented an approach to rule extraction where a decision boundary detection algorithm was used to find threshold values for continuous-valued attributes in attribute-value tests. The decision boundary detection algorithm uses sensitivity analysis to locate boundaries for each attribute.

The sensitivity analysis approach to detect decision boundaries is computationally feasible, since the first-order derivatives are already calculated as part of the learning equations. Results showed a significant improvement in rule accuracies compared to an attribute evaluation approach to find threshold values.

References

[Baum 1991] EB Baum, Neural Net Algorithms that Learn in Polynomial Time from Examples and Queries, IEEE Transactions on Neural Networks, 2(1), 1991, pp 5-19.
[Cohn et al 1994] D Cohn, L Atlas, R Ladner, Improving Generalization with Active Learning, Machine Learning, Vol 15, 1994, pp 201-221.
[Craven et al 1993] MW Craven and JW Shavlik, Learning Symbolic Rules using Artificial Neural Networks, Proceedings of the Tenth International Conference on Machine Learning, Amherst: USA, 1993, pp 79-95.
[Engelbrecht et al 1996] AP Engelbrecht and I Cloete, A Sensitivity Analysis Algorithm for Pruning Feedforward Neural Networks, IEEE International Conference on Neural Networks, Washington, Vol 2, 1996, pp 1274-1277.
[Engelbrecht et al 1998a] AP Engelbrecht and I Cloete, Selective Learning using Sensitivity Analysis, 1998 International Joint Conference on Neural Networks (IJCNN'98), Alaska: USA, 1998, pp 1150-1155.
[Engelbrecht 1998b] AP Engelbrecht, Sensitivity Analysis of Multilayer Neural Networks, submitted PhD dissertation, Department of Computer Science, University of Stellenbosch, Stellenbosch: South Africa, 1998.
[Fu 1994] LM Fu, Rule Generation from Neural Networks, IEEE Transactions on Systems, Man and Cybernetics, Vol 24, No 8, August 1994, pp 1114-1124.
[Hwang et al 1991] J-N Hwang, JJ Choi, S Oh, RJ Marks II, Query-Based Learning Applied to Partially Trained Multilayer Perceptrons, IEEE Transactions on Neural Networks, 2(1), January 1991, pp 131-136.
[Sestito et al 1994] S Sestito and TS Dillon, Automated Knowledge Acquisition, Prentice-Hall, Sydney: Australia, 1994.
[Towell 1994] GG Towell and JW Shavlik, Refining Symbolic Knowledge using Neural Networks, Machine Learning, Vol 12, 1994, pp 321-331.
[Viktor et al 1995] HL Viktor, AP Engelbrecht and I Cloete, Reduction of Symbolic Rules from Artificial Neural Networks using Sensitivity Analysis, IEEE International Conference on Neural Networks (ICNN'95), Perth: Australia, 1995, pp 1788-1793.
[Viktor et al 1998a] HL Viktor, AP Engelbrecht and I Cloete, Incorporating Rule Extraction from ANNs into a Cooperative Learning Environment, Neural Networks & their Applications (NEURAP'98), Marseilles, France, March 1998, pp 386-391.
[Viktor 1998] HL Viktor, Learning by Cooperation: An Approach to Rule Induction and Knowledge Fusion, submitted PhD dissertation, Department of Computer Science, University of Stellenbosch, Stellenbosch: South Africa, 1998.
