Modeling fuzzy classification systems with compact rule base

Giovanna Castellano and Anna Maria Fanelli
Dipartimento di Informatica - Università di Bari
Via E. Orabona, 4 - 70126 - Bari - ITALY
e-mail: [castellano, fanelli]@di.uniba.it
Abstract. An adaptive method to construct compact fuzzy systems for solving pattern classification problems is presented. The method consists of two phases: a rule identification phase and a rule selection phase. The rule identification phase generates fuzzy rules from numerical data through a simple fuzzy grid method, then tunes the resulting fuzzy rules by training a neuro-fuzzy network that models the fuzzy classifier. The rule selection phase simplifies the fuzzy classifier by iteratively removing rules from the trained neuro-fuzzy network and adjusting the remaining rules so that the input-output behavior of the network remains approximately unchanged. The performance of the proposed method on both training and test data is examined by computer simulations on the Iris data classification problem.

1 Introduction

Fuzzy systems have been successfully applied to various control and classification problems [1], [2], [3]. In many application tasks, fuzzy rules are manually derived from human expert knowledge, but this approach becomes impractical for problems of large dimension. Recently, several methods have been proposed for automatically generating fuzzy rules from numerical data [4], [5], [6], [7], but most of them deal mainly with system identification or control tasks. The main drawback of these approaches is that the number of generated fuzzy rules becomes enormous, especially for classification problems with high-dimensional pattern spaces. Moreover, the resulting rule base often needs to be tuned in order to improve the accuracy of the resulting fuzzy system.

This paper proposes an adaptive method to construct a compact fuzzy system that solves pattern classification problems with high performance. The method consists of two phases: a rule identification phase and a rule selection phase. In the rule identification phase two procedures are applied: the first generates rules from numerical examples, the second tunes the resulting rules. The generation of fuzzy rules from numerical data is carried out through a simple fuzzy grid method, similar to the one developed in [8], that partitions the pattern space into fuzzy subspaces and defines a fuzzy rule for each subspace; each fuzzy rule is then associated with a grade of certainty. This rule generation procedure produces a first approximate model of the fuzzy classification system. However, the classification power of the fuzzy system can be expected to improve by tuning its parameters. Therefore, we apply an additional tuning procedure to adjust both the antecedent membership functions and the grade of certainty of each fuzzy rule. This is accomplished by transforming the fuzzy classification system into a neural network, as is usual for many neuro-fuzzy approaches [9], [10], and then by training the neural network used to model the fuzzy system.

The main drawback of the rule identification phase is that the number of generated rules may become very large when a fine fuzzy partition is applied, resulting in an efficient but overly complex fuzzy classifier. On the other hand, a too coarse fuzzy partition produces a small rule base that may fail to classify all the patterns correctly. To resolve the tradeoff between accuracy and complexity, we generate rules with a fine fuzzy partition and then apply a rule selection procedure in order to reduce the complexity of the rule base while leaving the classification rate of the fuzzy system almost unchanged and, possibly, improving its generalization ability. The rule selection procedure is a method previously developed for selecting significant rules in a neuro-fuzzy system [11]. The method iteratively prunes off unnecessary rules and adjusts the remaining ones so that the input-output behavior of the trained neuro-fuzzy network remains approximately unchanged. Experimental results on the Iris data problem are presented, showing that the proposed two-phase method is able to produce a fuzzy classification system with a parsimonious model and good generalization capability.
2 Rule identification phase

In the rule identification phase two procedures are applied: the first generates rules from numerical examples, the second tunes the resulting rules. Let us consider an n-dimensional classification problem for which P patterns $x_p = (x_{p1}, \ldots, x_{pn})$, $p = 1, 2, \ldots, P$, with $x_{pi} \in [0,1]$, are given from m classes $C_1, C_2, \ldots, C_m$. To solve this classification problem through a fuzzy system we have to generate fuzzy rules that divide the pattern space into disjoint areas. In this paper we employ fuzzy rules of the following type:

Rule r: IF $x_{p1}$ is $A_1^r$ and ... and $x_{pn}$ is $A_n^r$ THEN $x_p$ belongs to $C^r$ with $CF = CF^r$

where $A_i^r$, $i = 1, \ldots, n$, are the fuzzy subsets in the antecedent, $C^r$ is the consequent (i.e. one of the m classes), and $CF^r$ is the grade of certainty of the fuzzy rule for the class represented by $C^r$. Once a set S of such rules is generated, as described hereafter in this section, they are used to classify unknown patterns by the following procedure. Given a pattern $x = (x_1, \ldots, x_n)$, we calculate for $T = 1, \ldots, m$ the quantities:
$$\alpha_{C_T} = \max\left\{ \prod_{i=1}^{n} \mu_{A_i^r}(x_i) \cdot CF^r \;\Big|\; C^r = C_T,\ r \in S \right\}$$
and classify x in the class maximizing $\alpha_{C_T}$. When multiple classes take the maximum value, the pattern x cannot be classified.

2.1 Rule generation procedure
Our rule generation method is an extension of the simple-fuzzy-grid method proposed in [8]. First, each axis of the pattern space is partitioned into K fuzzy subsets, creating a simple fuzzy grid. We assume that the fuzzy sets on each axis are defined on the unit interval [0, 1]. Let $\{A_{ij}\}_{j=1,\ldots,K}$ be the fuzzy subsets for the i-th input variable. For each fuzzy set $A_{ij}$ a Gaussian membership function is employed:

$$\mu_{ij}(x_i) = \exp\left(-\left(\frac{x_i - w_{ij}}{\sigma_{ij}}\right)^2\right) \qquad (1)$$

The center and the width of the Gaussian functions are obtained as

$$w_{ij} = \frac{j-1}{K-1} \qquad \text{and} \qquad \sigma_{ij} = \frac{1}{\lambda(K-1)}$$

where $\lambda$ controls the slopes at the crossover points (i.e. where the membership function value is 0.5). The proposed simple fuzzy grid procedure generates $K^n$ rules, that is, one rule for each input fuzzy subspace. The r-th input fuzzy subspace $(A_1^r, A_2^r, \ldots, A_n^r)$, $r = 1, \ldots, K^n$, represents the antecedent of the r-th rule. The consequent $C^r$ and the grade of certainty $CF^r$ of the r-th rule are determined by the following procedure:
1. For $T = 1, 2, \ldots, m$ calculate:
$$\beta_{C_T} = \sum_{x_p \in C_T} \prod_{i=1}^{n} \mu_{A_i^r}(x_{pi})$$

2. The consequent $C^r$ is determined as the class C such that:
$$\beta_C = \max\{\beta_{C_1}, \beta_{C_2}, \ldots, \beta_{C_m}\}$$

3. The grade of certainty $CF^r$ is determined as:
$$CF^r = \frac{\beta_C - \bar{\beta}}{\sum_{T=1}^{m} \beta_{C_T}} \qquad \text{where} \qquad \bar{\beta} = \frac{\sum_{C_T \neq C} \beta_{C_T}}{m-1}$$
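To make the generation and classification steps concrete, the following Python sketch outlines how rules of this form might be built from data and then used to classify a pattern. It is a minimal illustration of the grid-based procedure described above, not the authors' original implementation; the function and variable names (e.g. `generate_rules`, `classify`) and the value of the slope parameter are our own assumptions.

```python
import numpy as np

def memberships(x, K, lam=2.0):
    """Gaussian membership degrees of scalar x in the K fuzzy sets of one axis.
    lam is the slope parameter lambda of Eq. (1); its value here is arbitrary."""
    j = np.arange(K)
    w = j / (K - 1)                # centers w_ij = (j-1)/(K-1), 0-based j here
    sigma = 1.0 / (lam * (K - 1))  # widths sigma_ij = 1/(lambda (K-1))
    return np.exp(-((x - w) / sigma) ** 2)

def generate_rules(X, y, K, m):
    """One rule per cell of the K^n fuzzy grid (X in [0,1]^n, labels y in 0..m-1)."""
    P, n = X.shape
    rules = []
    for idx in np.ndindex(*([K] * n)):           # idx = antecedent cell (j_1, ..., j_n)
        # beta_{C_T}: sum over patterns of class T of the product of memberships
        beta = np.zeros(m)
        for p in range(P):
            act = np.prod([memberships(X[p, i], K)[idx[i]] for i in range(n)])
            beta[y[p]] += act
        c = int(np.argmax(beta))                 # consequent class C^r
        total = beta.sum()
        beta_bar = (total - beta[c]) / (m - 1)   # mean activation of the other classes
        cf = (beta[c] - beta_bar) / total if total > 0 else 0.0
        rules.append((idx, c, cf))               # (antecedent cell, consequent, certainty)
    return rules

def classify(x, rules, K, m):
    """Assign x to the class maximizing alpha_{C_T}; return None if the maximum is tied."""
    alpha = np.zeros(m)
    for idx, c, cf in rules:
        act = np.prod([memberships(x[i], K)[idx[i]] for i in range(len(x))]) * cf
        alpha[c] = max(alpha[c], act)
    winners = np.flatnonzero(alpha == alpha.max())
    return int(winners[0]) if len(winners) == 1 else None
```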
With such a procedure we generate a set of fuzzy rules that divide the pattern space into m disjoint decision areas.

2.2 Rule tuning procedure
To improve the classification power of the generated fuzzy rule base, we apply an additional tuning procedure to adjust both the antecedent membership functions and the grade of certainty of each fuzzy rule. This is accomplished by training a neuro-fuzzy network that models the fuzzy classification system. The adopted neuro-fuzzy network has a four-layer feed-forward architecture (Fig. 1). The first layer $L_1$ is the input layer, which consists of n units representing the linguistic input variables. No computation is done by these nodes: they just transmit the input values directly to the next layer. Thus the output of the i-th node is

$$O_i^1 = x_i$$
Units in the second layer $L_2$, called the fuzzification layer, compute the membership degrees for an input vector x. This layer is composed of n groups, each including $N_i$ neurons. Note that in this case $N_i = K$ for each $i = 1, \ldots, n$, since the rule generation procedure divides each axis of the pattern space into K fuzzy subsets. Node $ij \in L_2$ computes the membership value of the i-th input variable $x_i$ for the j-th fuzzy set. Thus the output of neuron $ij \in L_2$ is:

$$O_{ij}^2 = \mu_{ij}(x_i)$$

Weights between unit $i \in L_1$ and unit $ij \in L_2$ are initialized with the mean $w_{ij}$ and the width $\sigma_{ij}$ of the membership function $\mu_{ij}$.

The third layer $L_3$ performs the inference of the rules by means of the product operator. The number of nodes in this layer is equal to the number of rules $K^n$ of the fuzzy classification system. The connections to this layer are fixed (i.e. no modifiable parameter is associated with them) and they are used to perform precondition matching of the fuzzy rules. The output of neuron $r \in L_3$ is:

$$O_r^3 = \prod_{ij \in L_2} O_{ij}^2$$
that corresponds to the activation strength of the r-th rule.

The fourth layer $L_4$ performs the defuzzification phase. The number of units in this layer is m. The output of unit $k \in L_4$ is:

$$O_k^4 = \frac{net_k}{\sum_{r \in L_3} O_r^3}$$

where $net_k = \sum_{r \in L_3} v_{rk} O_r^3$ is the net input. The weights $v_{rk}$ connecting unit $r \in L_3$ to unit $k \in L_4$ represent the grade of certainty of the r-th rule for the k-th output class. They are initialized as follows: $v_{rk} = CF^r$ if $C^r$ is the k-th output class, $v_{rk} = 0.0$ otherwise. The learning algorithm is a gradient-descent technique commonly used for training neuro-fuzzy networks [10].
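A hypothetical Python sketch of the forward pass through the four layers just described is given below. It only makes the layer equations explicit; the class and attribute names are ours, and the gradient-descent training procedure of [10] is not reproduced here. A pattern would then be assigned to the class with the largest output component.

```python
import numpy as np

class NeuroFuzzyNet:
    """Four-layer neuro-fuzzy classifier: input, fuzzification, rule inference, defuzzification."""

    def __init__(self, centers, widths, antecedents, v):
        self.w = centers                # (n, K) centers w_ij of the Gaussian fuzzy sets
        self.sigma = widths             # (n, K) widths sigma_ij
        self.antecedents = antecedents  # (R, n) index j of the fuzzy set used by rule r on axis i
        self.v = v                      # (R, m) certainty weights v_rk (CF^r on the consequent class, 0 elsewhere)

    def forward(self, x):
        # Layer 1: input nodes simply transmit x_i.
        # Layer 2: fuzzification, O2[i, j] = mu_ij(x_i).
        O2 = np.exp(-((x[:, None] - self.w) / self.sigma) ** 2)
        # Layer 3: rule inference, O3[r] = product of the memberships in the antecedent of rule r.
        n = len(x)
        O3 = np.prod(O2[np.arange(n), self.antecedents], axis=1)
        # Layer 4: defuzzification, O4[k] = net_k / sum_r O3[r], with net_k = sum_r v_rk O3[r].
        net = O3 @ self.v
        return net / O3.sum()
```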
Figure 1: The neuro-fuzzy network (input layer x1, ..., xn; fuzzification layer; rule inference layer; defuzzification layer with outputs y1, ..., ym).
3 Rule selection phase

Once rules and parameters have been identified through the phase described above, we attempt to optimize the resulting fuzzy model in order to obtain a simpler and efficient system. The fuzzy system is simplified by reducing the number of rules while leaving the classification rate almost unchanged. The rule selection procedure is a method previously developed for selecting significant rules in a neuro-fuzzy system [11]. The method iteratively prunes off unnecessary rules and adjusts the remaining ones so that the input-output behavior of the trained neuro-fuzzy network remains approximately unchanged. At each step, a rule node is identified to be removed and the remaining ones are updated so that the performance of the reduced system remains approximately unchanged. Figure 2 shows the effect of removing a rule node in a single step of the algorithm. The selection algorithm works as follows:

1. t := 0;
2. Repeat
   (a) identify the unit $h \in L_3$ to be removed;
   (b) remove its ingoing connections from the nodes $ij \in L_2$;
   (c) remove its outgoing connections $\{v_{hk}\}_{k \in L_4}$;
   (d) remove eventual isolated nodes $ij \in L_2$;
   (e) update the remaining weights $\{v_{rk}\}_{r \in L_3, k \in L_4}$ according to $v_{rk}^{(t+1)} = v_{rk}^{(t)} + \delta_{rk}$;
   (f) t := t + 1;
   until the stopping condition is met.

The update quantities $\delta_{rk}$ are derived by imposing that the net input $net_k$ of each output node $k \in L_4$ remains approximately unchanged after the elimination of unit $h \in L_3$. This amounts to requiring that, for each training pattern $x_p$ and for each node $k \in L_4$, the following relation holds:

$$\sum_{r \in L_3} v_{rk} O_{rp}^3 = \sum_{r \in L_3 - \{h\}} (v_{rk} + \delta_{rk}) O_{rp}^3 \qquad (2)$$
where $O_{rp}^3$ is the output of node $r \in L_3$ when pattern $x_p$ is presented. Simple algebraic manipulations yield the following linear system:

$$\sum_{r \in L_3 - \{h\}} \delta_{rk} O_{rp}^3 = v_{hk} O_{hp}^3 \qquad (3)$$

The quantities $\delta_{rk}$ are then computed by solving the linear system (3) in the least-squares sense through an efficient preconditioned conjugate-gradient method [12]. The criterion for identifying the rule to be removed at each step is suggested by the adopted least-squares method. Such a method provides a better solution with faster convergence if the system being solved has a small known-term vector (in terms of Euclidean norm). Since in system (3) the known terms depend essentially on the unit h being removed, our idea is to choose the unit for which the norm of the known-term vector is minimum. The algorithm is stopped before the performance of the reduced network worsens significantly.
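The sketch below illustrates one pruning step of this selection scheme in Python, under the following assumptions: the least-squares systems are solved with numpy's generic solver rather than the preconditioned conjugate-gradient method of [12], the selection criterion aggregates the known-term norms over all output nodes, and the names (`select_unit`, `prune_step`) are hypothetical.

```python
import numpy as np

def select_unit(O3, v):
    """Choose the rule node h whose known-term vectors {v_hk * O3[:, h]} have minimum norm.
    O3: (P, R) rule activations over the training patterns; v: (R, m) certainty weights."""
    norms = [np.linalg.norm(np.outer(O3[:, h], v[h])) for h in range(O3.shape[1])]
    return int(np.argmin(norms))

def prune_step(O3, v):
    """Remove one rule node and adjust the remaining weights so that the net inputs
    net_k = sum_r v_rk O3[p, r] stay approximately unchanged (Eqs. (2)-(3))."""
    h = select_unit(O3, v)
    keep = [r for r in range(O3.shape[1]) if r != h]
    A = O3[:, keep]                        # coefficients of the deltas, one row per pattern
    deltas = np.zeros((len(keep), v.shape[1]))
    for k in range(v.shape[1]):
        b = v[h, k] * O3[:, h]             # known-term vector of system (3) for output k
        deltas[:, k], *_ = np.linalg.lstsq(A, b, rcond=None)
    v_new = v[keep] + deltas               # v_rk^(t+1) = v_rk^(t) + delta_rk
    return keep, v_new
```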
Figure 2: Elimination of a rule node during a step of the selection process.
4 Experimental results

The performance of the proposed method has been tested on the well-known Iris data benchmark [13], also used as a test problem in [8], where a rule generation method based on a simple-fuzzy-grid procedure is proposed. The Iris classification problem consists of classifying three species of iris (setosa: $C_1$, versicolor: $C_2$ and virginica: $C_3$). There are 150 samples for this problem, 50 of each class. A sample is a four-dimensional pattern vector $(x_1, x_2, x_3, x_4)$ representing four attributes of the iris flower (sepal length, sepal width, petal length, petal width). All the attribute values are normalized in [0, 1] in order to employ the same membership functions for each axis of the pattern space.

To begin, we applied the rule identification phase. First, using all the 150 patterns, the rule generation procedure was run for K = 2, 3, 4, 5, obtaining four rule bases with $K^4$ rules respectively. Then, we applied the tuning procedure to the generated fuzzy classifiers. Each fuzzy classification system was transformed into a neuro-fuzzy network with 4 inputs, $K \cdot n$ units in the fuzzification layer, $K^4$ units in the rule layer, and 3 outputs, corresponding to the 3 classes. For each network, the parameters of the fuzzification and defuzzification layers were initialized with the membership function parameters and the $CF^r$ values, respectively, obtained in the rule generation procedure. The four networks were trained on the whole data set for 1000 iterations of the training procedure proposed in [10] with variable step size. Table 1 summarizes the classification rates of the fuzzy classifiers obtained in the rule identification phase. It can be seen that after the tuning procedure the classification rate improves for all four fuzzy systems.

Afterwards, the rule selection procedure was run. It was stopped when the classification rate over the whole data set worsened by more than 1%. The results are summarized in Table 2. It can be seen that in all cases almost 50% of the rules are removed while leaving the classification rate of the fuzzy system almost unchanged.

Since in classification problems we care about the generalization ability of the classifier, we carried out some simulations to test the fuzzy classifiers generated through the proposed method in the classification of unknown patterns. To investigate the generalization ability, we randomly divided the 150 samples into two subsets, each having 75 samples (i.e. 25 samples for each class). One subset was used to generate the fuzzy rules and train the neuro-fuzzy networks in the rule identification phase, the other subset was
used as the test set. As before, the rule generation procedure was run for K = 2, 3, 4, 5, producing four fuzzy classifiers. Afterwards we applied the tuning procedure, using the fuzzy rules to initialize the neuro-fuzzy networks. In this case, the training was performed with a cross-validation technique. From the 75-sample training set, a validation set of 15 samples was extracted and used to evaluate the performance of the networks during the learning phase. The remaining samples were used for training. The training was stopped when the classification rate of the network over the validation set worsened for five consecutive times. The four fuzzy systems obtained after the tuning phase were simplified through the rule selection procedure. As the goal was to improve the generalization ability, this time the rule selection process was stopped when the classification rate over the test set worsened by more than 1%. The generalization results of the obtained fuzzy classifiers, both before and after the rule selection phase, are shown in Table 3, together with those of the fuzzy classifiers obtained in [8] when a learning procedure is applied to the simple-fuzzy-grid method. It can be seen that in all cases the rule selection process drastically reduces the number of rules while leaving the generalization ability unchanged (or even improving it). Moreover, the fuzzy classification system generated with the proposed method outperforms the fuzzy system produced in [8] both in terms of simplicity and classification rate.

rule identification method   K=2    K=3    K=4     K=5
before tuning                70.0   96.0   96.0    96.7
after tuning                 96.6   99.3   100.0   100.0

Table 1: Classification rate (%) of the fuzzy systems obtained in the rule identification phase
fuzzy classifier           K=2            K=3            K=4             K=5
                           #rules  (%)    #rules  (%)    #rules  (%)     #rules  (%)
before rule selection      16      96.6   81      99.3   256     100.0   625     100.0
after rule selection       8       96.7   53      98.0   112     99.3    320     99.3

Table 2: Classification rate (%) of the fuzzy systems obtained by the rule selection phase
method                     K=2            K=3            K=4            K=5
                           #rules  (%)    #rules  (%)    #rules  (%)    #rules  (%)
Rule identification        16      94.7   81      97.3   256     97.3   625     96.0
Rule selection             8       94.7   4       98.7   91      97.3   105     96.0
Adaptive method [8]        16      91.7   62      94.8   129     94.5   190     94.8

Table 3: Generalization ability of the fuzzy classifiers for testing patterns (%)
5 Conclusions

This paper proposed an adaptive method to model fuzzy classifiers with a compact rule base. The method performs a rule identification phase, which generates and tunes fuzzy rules using numerical data, and a rule selection phase, which reduces the complexity of the produced rule base. A simple-fuzzy-grid method is employed to generate rules from numerical examples. The rules are then tuned by training a neuro-fuzzy network used to model the fuzzy system. Finally, the rule base is simplified through a rule selection process that removes rules from the trained neuro-fuzzy network. Preliminary experimental results have been presented, demonstrating that the proposed two-phase method can construct a compact fuzzy rule-based system with high performance not only on training data but also on test data. More extensive tests are in progress, aimed at evaluating the performance of the method on a wider range of pattern classification problems. Future work will concern an extension of the rule generation procedure that uses multiple partitions of the input space, in order to produce more flexible models of fuzzy classifiers.
References

[1] C.C. Lee. Fuzzy logic in control systems: fuzzy logic controller - Part II. IEEE Transactions on Systems, Man and Cybernetics, 20(2):419-435, 1990.
[2] D.G. Schwartz, G.J. Klir, H.W. Lewis III, and Y. Ezawa. Applications of fuzzy sets and approximate reasoning. Proceedings of the IEEE, 82(4):482-497, April 1994.
[3] D.E. Thomas and B. Armstrong-Helouvry. Fuzzy logic control - a taxonomy of demonstrated benefits. Proceedings of the IEEE, 83(3):407-421, March 1995.
[4] L.X. Wang and J.M. Mendel. Generating fuzzy rules by learning from examples. IEEE Transactions on Systems, Man and Cybernetics, 22(6):1414-1427, 1992.
[5] S. Abe and M. Lan. Fuzzy rules extraction directly from numerical data for function approximation. IEEE Transactions on Systems, Man and Cybernetics, 25(1):119-129, January 1995.
[6] T. Takagi and M. Sugeno. Fuzzy identification of systems and its applications to modelling and control. IEEE Transactions on Systems, Man and Cybernetics, 15(1):116-132, January 1985.
[7] M. Delgado, A.F. Gomez-Skarmeta, and F. Martin. A fuzzy clustering-based rapid prototyping for fuzzy rule-based modelling. IEEE Transactions on Fuzzy Systems, 5(2), May 1997.
[8] K. Nozaki, H. Ishibuchi, and H. Tanaka. Adaptive fuzzy rule-based classification systems. IEEE Transactions on Fuzzy Systems, 4(3):238-250, August 1996.
[9] J.S.R. Jang and C.T. Sun. Neuro-fuzzy modeling and control. Proceedings of the IEEE, 83(3):378-406, March 1995.
[10] C.-T. Lin and C.S.G. Lee. Neural-network-based fuzzy logic control and decision system. IEEE Transactions on Computers, 40(12):1320-1336, December 1991.
[11] G. Castellano and A.M. Fanelli. Simplifying a neuro-fuzzy model. Neural Processing Letters, 4(2):75-81, November 1996.
[12] A. Björck and T. Elfving. Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations. BIT, 19:145-163, 1979.
[13] R.A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179-188, 1936.