IEE Colloquium on Applied Statistical Pattern Recognition, Birmingham, 20th April, 1999

KNOWLEDGE EXTRACTION AND INSERTION FROM RADIAL BASIS FUNCTION NETWORKS

Kenneth J. McGarry and John MacIntyre
School of Computing, Engineering and Technology, St Peters Campus, St Peters Way, University of Sunderland, Sunderland, England, SR6 0DD
email: [email protected]

ABSTRACT

Neural networks provide excellent solutions for pattern recognition and classification problems. Unfortunately, in the case of distributed neural networks such as the multilayer perceptron it is difficult to comprehend the learned internal mappings. This makes any form of explanation facility, such as that possessed by expert systems, impractical. However, in the case of localist neural representations the situation is more transparent to examination. This paper examines the quality and comprehensibility of rules extracted from localist neural networks, specifically the radial basis function network. The rules are analysed in order to gain knowledge and insight into the data. We also investigate the benefits of inserting prior domain knowledge into a radial basis function network.

INTRODUCTION

Recently there has been considerable interest in the extraction of symbolic rules from neural networks. The work described in this paper is concerned with an evaluation of the accuracy and complexity of symbolic rules extracted from radial basis function networks. Here we examine the ability of rule extraction algorithms to extract meaningful rules that describe the overall performance of a particular network. The research carried out on extracted rule quality and complexity also has a direct bearing on the use of rule extraction algorithms for data mining and knowledge discovery. Most neural network development effort has concentrated upon what has become known as the tabula rasa approach, i.e. each neural network is developed from scratch using the appropriate training data and does not take advantage of previous task-related work. We investigate the practicality of inserting prior domain knowledge and observe the effects upon network performance in terms of classification accuracy, training times and the number of training samples required.

RADIAL BASIS FUNCTION NETWORKS

Radial basis function (RBF) neural networks were independently proposed by a number of researchers [6], [11] and have been proved capable of universal function approximation [14]. RBF networks have been applied to several real-world, large-scale problems of considerable complexity [21]. They are excellent at pattern recognition and are robust classifiers, with the ability to generalize in making decisions about imprecise input data [16]. They offer robust solutions to a variety of classification problems such as speech, character and signal recognition, as well as functional prediction and system modeling where the physical processes are not understood or are highly complex [10]. Figure 1 shows the architecture of a typical RBF network.

[Figure 1: Radial basis function network. The diagram shows an input layer, a hidden layer of Gaussian basis function units, and an output layer producing outputs y_1 ... y_n.]

The RBF network consists of a feedforward architecture with an input layer, a hidden layer of RBF units and an output layer of linear units. The response of output unit l is calculated quite simply using equation 1:

$$ y_l(\mathbf{x}) = \sum_{j=1}^{J} w_{lj} \, z_j(\mathbf{x}) \qquad (1) $$

where: $W$ = weight matrix, $z$ = hidden unit activations, $\mathbf{x}$ = input vector.
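To make the mapping concrete, the following Python sketch computes this forward pass; the function and variable names are ours rather than the paper's software, and the Gaussian hidden-unit response it assumes is the one defined in equation 2 below.

```python
import numpy as np

def rbf_forward(x, centres, widths, W):
    """Forward pass of an RBF network (equations 1 and 2).

    x:       (d,)   input vector
    centres: (J, d) Gaussian centre positions mu_j
    widths:  (J,)   receptive-field widths sigma_j
    W:       (L, J) weights from hidden RBF units to linear output units
    """
    # Hidden layer: localized Gaussian response z_j(x) (equation 2).
    z = np.exp(-np.sum((x - centres) ** 2, axis=1) / widths ** 2)
    # Output layer: linear combination y_l = sum_j w_lj * z_j(x) (equation 1).
    return W @ z
```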

The input layer simply transfers the input vector to the hidden units, which form a localized response to the input pattern. The activation levels of the output units provide an indication of the nearness of the input vector to the classes. Learning is normally undertaken as a two-stage process: an unsupervised clustering technique is appropriate for the hidden layer, while a supervised method is applied to the output layer units. The nodes in the hidden layer are implemented by kernel functions, which operate over a localized area of input space. The effective range of the kernels is determined by the values allocated to the centre and width of the radial basis function. The Gaussian function is very appropriate for knowledge transfer and has a response characteristic determined by equation 2:

$$ z_j(\mathbf{x}) = \exp\!\left( - \frac{\| \mathbf{x} - \mu_j \|^2}{\sigma_j^2} \right) \qquad (2) $$

where: $\mu$ = n-dimensional parameter vector (the centre position), $\sigma$ = width of receptive field, $\mathbf{x}$ = input vector.

The output of a hidden unit is radially symmetric in input space, so a hidden unit gives an output dependent upon the Euclidean distance between the centre of the basis function and the input vector. Radial basis function networks generally require more varied training examples and need more hidden units than multi-layer perceptrons to achieve similar accuracy. However, due to the localized architecture they have the advantage of quicker training times. RBF networks are an appropriate choice for both classification tasks and function approximation. The adjustable parameters within a radial basis function network that affect classification accuracy and that may provide information for rule extraction are:

- the number of basis functions used;
- the location of the centre of each basis function;
- the width of each basis function;
- the weights connecting the hidden RBF units to the linear output units.
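As an illustration of how these parameters are fitted in the two-stage procedure described above, here is a minimal sketch assuming k-means clustering for the unsupervised stage and a linear least-squares fit for the supervised stage; the paper does not name its clustering method or output-layer trainer, so both choices (and the fixed width) are our assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_rbf(X, Y, n_hidden=10, width=1.0):
    """Two-stage RBF training sketch: unsupervised clustering places the
    hidden-unit centres, then a supervised linear fit sets the output
    weights. X is (N, d) training patterns, Y is (N, L) one-hot targets."""
    # Stage 1: place the centres with an unsupervised clustering method.
    km = KMeans(n_clusters=n_hidden, n_init=10).fit(X)
    centres = km.cluster_centers_
    widths = np.full(n_hidden, width)   # illustrative fixed width
    # Hidden-unit activations for every training pattern (equation 2).
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    Z = np.exp(-d2 / widths ** 2)
    # Stage 2: supervised fit of the output weights (equation 1),
    # here by linear least squares on the hidden activations.
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return centres, widths, W.T
```

The returned centres, widths and weight matrix can be passed straight to the rbf_forward sketch given earlier.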

EXTRACTING RULES FROM NEURAL NETWORKS

In this section we discuss the motivations, techniques and methodology for rule extraction from RBF networks. RBF networks provide a localized solution [4, 12] that is amenable to rule extraction. Previous work on extracting rules from radial basis functions [19] has investigated generating probabilistic rules or has identified certain neuro-fuzzy similarities [9]. Rule extraction has been carried out upon a variety of neural network types such as multi-layer perceptrons [18, 5, 7], Kohonen networks [20] and recurrent networks [13]. The advantages of extracting rules from neural networks, discussed here in general terms applicable to most neural networks, are:

- The knowledge learned by a neural network is generally difficult for humans to understand. The provision of a mechanism that can interpret the network's input/output mappings in the form of rules would be very useful.

- Deficiencies in the original training set may be identified, and thus the generalization of the network may be improved by the addition/enhancement of new classes. The identification and removal of noisy training data would also enhance network performance.

- Analysis of previously unknown relationships in the data. This feature has huge potential for knowledge discovery/data mining, and possibilities may exist for scientific induction.

- Once rules have been extracted from a neural network, we have a rule base that can potentially be inserted back into a new network with a similar problem domain.

An important part of rule extraction is concerned with the preprocessing of the derived rules; this must be carried out to ensure that a compact set of good quality rules adequately describes the initial neural network.
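The paper does not specify how this preprocessing was performed; one plausible ingredient is the removal of rules whose intervals are wholly contained in a broader rule of the same class. The sketch below uses a hypothetical rule representation (per-feature (lo, hi) bounds plus a class label) that is ours, not the paper's.

```python
def subsumes(r1, r2):
    """True if interval rule r1 covers rule r2 for the same class.
    A rule is a (bounds, label) pair, bounds = list of (lo, hi) per feature."""
    (b1, c1), (b2, c2) = r1, r2
    return c1 == c2 and all(lo1 <= lo2 and hi1 >= hi2
                            for (lo1, hi1), (lo2, hi2) in zip(b1, b2))

def prune(rules):
    """Drop rules wholly contained in another rule of the same class;
    among exact duplicates, keep only the first occurrence."""
    kept = []
    for i, r in enumerate(rules):
        dominated = any(
            subsumes(q, r) and (not subsumes(r, q) or j < i)
            for j, q in enumerate(rules) if j != i)
        if not dominated:
            kept.append(r)
    return kept
```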

Rule-extraction methodology

We developed two rule extraction algorithms. The first gave a simple statistical summary of the RBF hidden unit parameters for each class, i.e. a single rule describing each class was produced. The second analyzed each hidden unit, producing a rule for each unit. The data sets used were the iris data set and a condition monitoring data set.

Iris data set

The first and simplest data set used was Fisher's iris data [8]. The iris data set is well known within the neural network community as a benchmark to demonstrate the effectiveness of new algorithms. It consists of three classes of flowers with 50 patterns each; one class is linearly separable (Setosa) while the other two are not (Versicolor and Virginica). The data is continuous valued with a dimensionality of four. The inputs correspond to plant features: the sepal length, sepal width, petal length and petal width. See table 1.

Table 1: Examples of the iris data set. SL = sepal length, SW = sepal width, PL = petal length, PW = petal width; C1 = Setosa, C2 = Versicolor, C3 = Virginica.

SL    SW    PL    PW    C1  C2  C3
5.2   3.5   1.5   0.2   1   0   0
6.2   2.8   4.8   1.8   0   0   1
6.7   3.1   4.4   1.4   0   1   0
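For readers who wish to reproduce the setup, the iris data ships with modern libraries. A sketch using scikit-learn (which, of course, postdates the paper) with the 25% hold-out fraction reported in the results section below:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Hypothetical modern reproduction of the data setup: a stratified
# 75/25 split mirroring the hold-out fraction quoted in the paper.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, stratify=iris.target,
    random_state=0)
```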

Vibration data set

The second data set consisted of spectral vibration data gathered from a fault diagnosis application. Many large items of machinery are regularly monitored by analyzing the spectral vibration data they generate when operating. The vibration signatures produced are very distinct, and any changes in these patterns may be used to detect faults in the mechanical condition. The data set consists of 681 samples composed of 10 input features and seven output classes. The input features are continuous values representing the running speed (rotations per minute, RPM) of the motor or fan and the various harmonics that occur at twice and three times the running speed, etc. The data represents several fault conditions but also includes healthy data. Several aspects of the data are highly non-linear and linearly non-separable. Table 2 presents a subset of the vibration data test set.

Details of rule extraction algorithms

The initial algorithm developed to extract rules used a simple approach: the input weight space was summarized in terms of maximum and minimum values per input dimension. The extracted rule set is compact, providing one rule for each output class. Figure 2 describes the extraction algorithm in detail.

Input:    Hidden weights μ (centre positions)
Output:   One rule per output class
Procedure:
    Train RBF network on data set
    Cluster hidden units by class
    For each class cluster
        For each μ
            Get min value
            Get max value
    For each class
        Write out rule by:
            For each μ = [min - max] interval
            Join intervals with AND
        Add class label
        Write rule to file

Figure 2: Rule-extraction algorithm I

Rule extraction algorithm II analyzed the RBF hidden units in greater detail, with each hidden unit compiling into a rule. The algorithm is similar to the RULEX technique of Andrews [3]. The extracted rules have the form:

IF Feature_1 is TRUE AND IF Feature_2 is TRUE AND ... IF Feature_n is TRUE THEN Class_x

where each feature is composed of upper and lower bounds calculated from the RBF centre $\mu$, the RBF width $\sigma$ and a feature "steepness" $S$. The value of the steepness was discovered empirically to be about 0.6 and is related to the value of the width parameter. The bounds for each antecedent (the input features in the data set) are calculated by:

$$ X_{lower} = \mu_i - \sigma_i + S, \qquad X_{upper} = \mu_i + \sigma_i - S $$

Table 2: Examples of the vibration spectra data set. RPM1 = motor speed, RPM2 = twice motor speed, etc.

RPM1    RPM2    RPM3    RPM4     Healthy  Unbalance  Misalignment
3.205   1.687   1.046   0.868    1        0          0
1.399   1.273   1.328   0.982    0        1          0
0.406   0.457   0.289   0.789    0        0          1

Input:    Hidden weights μ (centre positions)
          Gaussian radius spread σ
          Steepness S
Output:   One rule per hidden unit
Procedure:
    Train RBF network on data set
    For each hidden unit
        For each μ_i
            X_lower = μ_i - σ_i + S
            X_upper = μ_i + σ_i - S
        Build rule by:
            antecedent = [X_lower, X_upper]
            Join antecedents with AND
        Add class label
        Write rule to file

Figure 3: Rule-extraction algorithm II
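A Python sketch of algorithm II follows, assuming each hidden unit carries a scalar width σ_j and has already been assigned an output class label; the per-unit class assignment and the textual rule format are our assumptions, not details given in the paper.

```python
import numpy as np

def extract_rules_algorithm2(centres, widths, class_labels,
                             feature_names, S=0.6):
    """Rule extraction per Figure 3: one rule per hidden unit.
    centres is (J, d), widths is (J,) scalar sigma_j per unit, and
    class_labels[j] is the output class assigned to hidden unit j.
    The steepness default S = 0.6 is the empirical value from the text."""
    rules = []
    for mu, sigma, label in zip(centres, widths, class_labels):
        lower = mu - sigma + S          # X_lower = mu_i - sigma_i + S
        upper = mu + sigma - S          # X_upper = mu_i + sigma_i - S
        antecedents = [f"{name} >= {lo:.4g} AND {name} <= {hi:.4g}"
                       for name, lo, hi in zip(feature_names, lower, upper)]
        rules.append("IF " + " AND IF ".join(antecedents) + f" THEN {label}")
    return rules
```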

Analysis of results

The rules extracted using algorithm I proved to be inaccurate when presented with the test data (25% of the neural network data was held back for this purpose). Rules extracted from the RBF network trained on the iris domain had accuracies of 25-40%, which are unacceptable for any practical purpose. Rules extracted from the vibration RBF network gave worse results; accuracies of 15-20% were typical. Using algorithm II, the extracted rules provided good classification results on both the iris and vibration data sets. Accuracies of the extracted rules were between 80-85% for the iris domain and 65-70% for the vibration domain. Examples of the extracted rules for the vibration diagnosis problem are shown in figure 4. However, for humans to understand the extracted rules it is essential that only a small number of key rules are generated. The rules represent local solutions that must be organized into clusters representing the global trends and relationships within the data.

Rule 4
IF (RPM ≥ 0.41821 AND ≤ 3.0124) AND
IF (2RPM ≥ 0.45858 AND ≤ 2.8013) AND
IF (3RPM ≥ 0.52502 AND ≤ 1.6375) AND
IF (4RPM ≥ 0.44754 AND ≤ 2.2021) AND
IF (5RPM ≥ 0.23192 AND ≤ 0.86202) AND
IF (HarmPow ≥ 2.3951 AND ≤ 14.6983) AND
IF (IRD ≥ 62.1835 AND ≤ 248.734) AND
IF (ORD ≥ 0.0001 AND ≤ 0.0001) AND
IF (Train ≥ 0.0001 AND ≤ 0.0001) AND
IF (Ball ≥ 0.0001 AND ≤ 0.0001)
THEN IRD

Figure 4: Extracted rules from the vibration data set
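To score extracted rules against held-out data as in the experiments above, something like the following sketch would do. Here rules are kept as (lower, upper, label) bound arrays rather than strings; how the original work resolved conflicting or non-matching patterns is not stated, so first-match scoring is our assumption.

```python
import numpy as np

def rule_accuracy(rules, X, y):
    """Fraction of test patterns classified correctly by the rule set.
    Each rule is a (lower, upper, label) triple of per-feature bound
    arrays; the first rule whose intervals contain the pattern wins."""
    correct = 0
    for x, target in zip(X, y):
        for lower, upper, label in rules:
            if np.all(x >= lower) and np.all(x <= upper):
                correct += int(label == target)
                break   # first matching rule decides; no match scores zero
    return correct / len(X)
```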

INSERTION AND REFINEMENT OF PRIOR DOMAIN KNOWLEDGE

The use of prior domain knowledge has had a long and successful history within purely symbolic processing systems. However, neural networks generally have difficulties in sharing their task experience because each network is trained individually on a specific task that may involve the modeling of a complex function. In the case of multi-layer perceptron (MLP) networks, the learned function is stored across the weights and thresholds in a distributed form. This difficulty hinders the isolation and transfer of desirable features learned by the neural network to another task. Early work by Abu-Mostafa involved inserting knowledge other than the training set into a neural network [1]. Further work by researchers [15, 17] has proved to a certain degree that manipulating the neural network parameters can lead to successful task transfer. Unfortunately, many problems still exist, such as catastrophic interference between old and new knowledge and the identification and preservation of useful hyperplane positions.

The situation is more encouraging with localist neural network representations, since each neuron encodes a particular section of input space. It is then a reasonably straightforward process to isolate a specific neuron and identify its role in classification. The use of domain theory within a neural network architecture would enable the priming of the network with parameters derived from IF..THEN type rules [2]. The motivation for inserting rules into an RBF network is to take advantage of previous training experiences. More interestingly, it is possible to modify the network's parameters according to the user's beliefs or expectations in the absence of suitable data.

Rule insertion process

Inserting an IF..THEN rule into an RBF network is the reverse process of rule extraction. Based upon the upper and lower limits of each feature we can determine the centre of each new RBF unit, where S is the steepness parameter:

$$ \mu_n = (X_{upper} - X_{lower}) \, / \, S $$

Next the width σ_n of the new RBF unit must be chosen; this value is normally set at between 0.8-1.0 (using our RBF software). Given the relative immaturity of the experimental work we have only considered synthetic data. The iris data set was modified to include a fourth class of flower. The number of features within the data set remains the same, but an extra column of zeros was required to pad out the existing class label definitions. The actual data points were generated as a cluster with a Gaussian distribution. Figure 6 shows the patterns plotted against the two main features. Figure 5 shows a typical set of data points made into a rule.

Rule 1
IF (SL ≥ 4.4 AND ≤ 5.7) AND
IF (SW ≥ 2.9 AND ≤ 4.4) AND
IF (PL ≥ 3.0 AND ≤ 4.5) AND
IF (PW ≥ 4.2 AND ≤ 6.4)
THEN NewClass

Figure 5: Synthetic rule inserted into RBF network

The process of retraining the output layer of the network is quickly and easily accomplished.

[Figure 6: Iris data. Petal length (PL) plotted against petal width (PW) for the Setosa, Versicolor, Virginica and NewClass patterns.]
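A sketch of the insertion step in Python; the centre formula mirrors the one printed above as we have reconstructed it, and the width of 0.9 is simply a value inside the 0.8-1.0 range the paper reports for its software.

```python
import numpy as np

def insert_rule(lower, upper, centres, widths, S=0.6, new_width=0.9):
    """Reverse of extraction: prime a new hidden unit from an IF..THEN
    rule given per-feature lower/upper bound arrays. Illustrative only."""
    mu_new = (upper - lower) / S               # centre from the rule bounds
    centres = np.vstack([centres, mu_new])     # append the new hidden unit
    widths = np.append(widths, new_width)      # assumed width in 0.8-1.0
    # The output-layer weights would then be retrained on the available
    # data, which the paper notes is quick for the linear output layer.
    return centres, widths
```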

CONCLUSIONS

In order for humans to understand the extracted rules it is essential that only a small number of important rules are generated. The extraction techniques based on radial basis function networks described in this paper have the potential to abstract away from the details of the learned network knowledge and provide a much more understandable interpretation. The use of prior knowledge has the effect of providing a base level of knowledge that may be used to initialize the radial basis network parameters in optimum positions prior to training. Where no suitable data exists, prior knowledge in the form of user expectations of what may constitute the required class may be inserted. This enables better control over the input space and leads to a more robust RBF network.

ACKNOWLEDGEMENTS

We would like to thank Robert Andrews of Queensland University of Technology for his advice on RULEX. This work was supported by European funding through the Brite-Euram III initiative.

REFERENCES

[1] Y. Abu-Mostafa. Learning from hints in neural networks. Journal of Complexity, 6:192-198, 1990.

[2] R. Andrews and S. Geva. RULEX and CEBP networks as the basis for a rule refinement system. In J. Hallam et al., editors, Hybrid Problems, Hybrid Solutions. IOS Press, 1995.

[3] R. Andrews and S. Geva. Rules and local function networks. In Rules and Networks: Proceedings of the Rule Extraction From Trained Artificial Neural Networks Workshop, Artificial Intelligence and Simulation of Behaviour, Brighton, UK, 1996.

[4] C. G. Atkeson, A. Moore, and S. Schaal. Locally weighted learning. Artificial Intelligence Review, pages 11-73, February 1997.

[5] G. Bologna and C. Pellegrini. Accurate decomposition of standard MLP classification responses into symbolic rules. In International Work Conference on Artificial and Natural Neural Networks, IWANN'97, pages 616-627, Lanzarote, Canaries, 1997.

[6] D. S. Broomhead and D. Lowe. Multivariable functional interpolation and adaptive networks. Complex Systems, pages 321-355, 1988.

[7] T. Corbett-Clarke and L. Tarassenko. A principled framework and technique for rule extraction from multi-layer perceptrons. In Proceedings of the 5th International Conference on Artificial Neural Networks, pages 233-238, Cambridge, England, July 1997.

[8] R. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179-188, 1936.

[9] J.-S. Roger Jang and C.-T. Sun. Functional equivalence between radial basis function networks and fuzzy inference systems. IEEE Transactions on Neural Networks, 4(1):156-159, January 1993.

[10] D. Lowe. Characterising complexity in a radial basis function network. In Proceedings of the 5th International Conference on Artificial Neural Networks, pages 19-23, Cambridge, UK, 1997.

[11] J. Moody and C. J. Darken. Fast learning in networks of locally tuned processing units. Neural Computation, pages 281-294, 1989.

[12] R. Murray-Smith and T. A. Johansen. Local learning in local model networks. In IEE Artificial Neural Networks, pages 40-46, 1995.

[13] C. W. Omlin and C. L. Giles. Extraction and insertion of symbolic information in recurrent neural networks. In V. Honavar and L. Uhr, editors, Artificial Intelligence and Neural Networks: Steps Towards Principled Integration, pages 271-299. Academic Press, San Diego, 1994.

[14] J. Park and I. W. Sandberg. Universal approximation using radial basis function networks. Neural Computation, 3:246-257, 1991.

[15] L. Pratt. Transferring Previously Learned Back-Propagation Neural Networks to New Learning Tasks. PhD thesis, Rutgers, The State University of New Jersey, May 1993.

[16] A. Roy, S. Govil, and R. Miranda. An algorithm to generate radial basis function (RBF)-like nets for classification problems. Neural Networks, 8(2):179-201, 1995.

[17] D. L. Silver and R. E. Mercer. The retention and transfer of neural network task knowledge. In Proceedings of the INNS World Congress on Neural Networks, volume III, pages 164-169. Lawrence Erlbaum Associates, 1995.

[18] S. Thrun. Extracting rules from artificial neural networks with distributed representations. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7. MIT Press, San Mateo, CA, 1995.

[19] V. Tresp, J. Hollatz, and S. Ahmad. Representing probabilistic rules with networks of Gaussian basis functions. Machine Learning, 27:173-200, 1997.

[20] A. Ultsch, R. Mantyk, and G. Halmans. Connectionist knowledge acquisition tool: CONKAT. In J. Hand, editor, Artificial Intelligence Frontiers in Statistics: AI and Statistics III, pages 256-263. Chapman and Hall, 1993.

[21] Q. Zhao and Z. Bao. Radar target recognition using a radial basis function neural network. Neural Networks, 9(4):709-720, 1996.
