1999 Third International Conference on Knowledge-Based Intelligent Information Engineering Systems, 31st Aug-1st Sept 1999, Adelaide, Australia
An Improved Learning Algorithm for Rule Refinement in Neuro-Fuzzy Modeling*

Chen-Sen Ouyang and Shie-Jue Lee
Department of Electrical Engineering
National Sun Yat-Sen University
Kaohsiung 804, Taiwan, ROC

*Partially supported by the National Science Council under grant NSC-88-2213-E110-010.
Keywords: Neuro-fuzzy Modeling, Singular Value Decomposition

Abstract

In this paper, we propose an improved learning algorithm for rule refinement in neuro-fuzzy modeling. This algorithm is mainly based on a well-known technique, singular value decomposition (SVD). By using SVD, the learning algorithm converges quickly. Besides, the reasoning operator adopted in our algorithm is a compensatory fuzzy operator, which has the advantage of being more adaptive and effective. Experimental results show that the proposed algorithm converges quickly and that the obtained fuzzy rules are more precise.

1 Introduction

In recent years, knowledge acquisition has become more and more important in many areas such as control, expert systems, and databases. The purpose of knowledge acquisition is to extract knowledge, such as a relation or a distribution, from a set of data. By representing knowledge as symbolic rules, we can easily understand the essential properties of the data and handle them properly. So far, many approaches have been published to solve this problem. The techniques of neural networks, such as multi-layered neural networks [1], are widely used because of their learning ability and adaptive property. However, this kind of method often suffers from the problems of local minima, slow convergence, and the difficulty of extracting the symbolic meaning of numerical data. Another approach, fuzzy modeling [2], has the ability to deal with uncertain information. Also, the knowledge represented by fuzzy rules can be easily understood and handled by human beings. Nevertheless, fuzzy modeling lacks a definite method to determine the membership functions and an effective learning algorithm to refine those functions. To combine these two methods, some authors have proposed neuro-fuzzy modeling techniques [3, 4, 5], which have the advantages of adaptability, quick convergence, and high accuracy.

In general, neuro-fuzzy modeling is carried out in three phases. In the first phase, fuzzy partitions are used to extract some rough fuzzy rules from a set of sampled data. In the second phase, for the sake of more precision, the extracted rough fuzzy rules are refined by neural networks. Finally, we can re-extract the symbolic rules from the numerical weights of the neural networks. In this paper, we focus on the second and third phases. For the first phase, we have proposed a self-constructing approach [6] to generate rough fuzzy rules from a set of numerical data. After the rough fuzzy rules have been extracted, we can construct the architecture of the fuzzy neural network according to the number of rules and the parameters associated with each rule. Traditionally, we can use the backpropagation learning algorithm to train the network. However, it still suffers from the problems of local minima and slow convergence. To avoid these problems, we incorporate singular value decomposition (SVD) into the conventional backpropagation learning algorithm.
2 Construction of the Fuzzy Neural Network

The problem we consider in this paper can be described as follows. Suppose we have a set of input-output data from a system with N inputs $\{x_i \in \Re \mid i = 1, 2, \ldots, N\}$ and a single output $y \in \Re$. The relation between these input-output data can be described by a function $f : \Re^N \rightarrow \Re$. The purpose of neuro-fuzzy modeling is to find a way to model this function precisely. Moreover, we assume that R fuzzy rules have been generated by the self-constructing approach in [6]. Each rule s has the following format:

IF $x_1$ is $\mu_{1s}(x_1)$, and $x_2$ is $\mu_{2s}(x_2)$, ..., and $x_N$ is $\mu_{Ns}(x_N)$, THEN $y$ is $c_s$

where $\{\mu_{is} \mid i = 1, 2, \ldots, N\}$ is the set of input fuzzy membership functions of rule s and $c_s$ is the consequent parameter of rule s. The membership function $\mu_{is}$ is the Gaussian function

$$\mu_{is}(x_i) = \exp\left(-\frac{(x_i - m_i^s)^2}{(\sigma_i^s)^2}\right),$$

where $m_i^s$ and $\sigma_i^s$ represent the mean and the standard deviation of this Gaussian function, respectively.
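As a minimal illustration (our own sketch, not code from the paper), the one-dimensional Gaussian membership degree can be computed as follows; the exponent form without the factor of 2 follows the reconstruction above and should be checked against the authors' definition.

```python
import numpy as np

def gaussian_membership(x, m, sigma):
    """Membership degree mu_is(x_i) of input x under a Gaussian with
    mean m (m_i^s) and standard deviation sigma (sigma_i^s)."""
    return np.exp(-((x - m) ** 2) / sigma ** 2)
```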
To construct the fuzzy neural network shown in Figure 1, we first set the number of input nodes in the input layer to N. The output function of the ith input node is

$$o_i^{(1)} = x_i,$$

where $x_i$ is the ith element of the input vector $\mathbf{x} = (x_1, x_2, \ldots, x_N)^T$. The second layer is the fuzzification layer with $N \times R$ nodes. These nodes can be divided into R sets, each of which includes N nodes and represents the linguistic variables of a rule. We label each node in this layer with $n_{is}$ to represent the ith node in the sth set. The connection between the ith input node and node $n_{is}$ is labeled with the two weights $(m_i^s, \sigma_i^s)$ and is initially set to the values of rule s. The output function of node $n_{is}$ is

$$o_{is}^{(2)} = \exp\left(-\frac{(o_i^{(1)} - m_i^s)^2}{(\sigma_i^s)^2}\right).$$

The third layer is the inference layer with R nodes. Each node receives the one-dimensional membership degrees of the associated rule from the nodes of a set in the fuzzification layer. Here we use the compensatory fuzzy operator mentioned in [5] to perform IF-condition matching of fuzzy rules. As a result, the output function of each inference node s is

$$o_s^{(3)} = \left(\prod_{i=1}^{N} o_{is}^{(2)}\right)^{1 - \gamma + \gamma/N},$$

where $\gamma \in [0, 1]$ is called the compensatory degree. By tuning $\gamma$, the fuzzy operator becomes more adaptive. The final layer is the defuzzification layer with one node. The connection between the sth node in the inference layer and the single node in the defuzzification layer is labeled with the weight $c_s$. The initial value of $c_s$ is set to the value of rule s. The single output node acts as a defuzzifier, and the output function of this node is

$$o^{(4)} = \sum_{s=1}^{R} c_s\, o_s^{(3)}.$$

Figure 1: Architecture of the fuzzy neural network.
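To make the four-layer computation concrete, here is a small sketch (ours, with assumed array shapes) of the forward pass. Whether the original defuzzifier normalizes by the sum of $o_s^{(3)}$ is not recoverable from the scan; we use the plain weighted sum, which matches the linear objective $|O^{(3)} C - \mathbf{y}|$ used in Section 3.2.

```python
import numpy as np

def forward(x, M, S, c, gamma):
    """Forward pass of the fuzzy neural network (a sketch).

    x     : (N,)   input vector
    M, S  : (N, R) means m_i^s and standard deviations sigma_i^s
    c     : (R,)   consequent parameters c_s
    gamma : float  compensatory degree in [0, 1]
    Returns the inference-layer outputs o^(3) and the network output.
    """
    N = x.shape[0]
    o1 = x                                                  # layer 1: identity
    o2 = np.exp(-((o1[:, None] - M) ** 2) / S ** 2)         # layer 2: fuzzification
    o3 = np.prod(o2, axis=0) ** (1.0 - gamma + gamma / N)   # layer 3: compensatory matching
    o4 = float(c @ o3)                                      # layer 4: defuzzification
    return o3, o4
```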
3 Our Learning Algorithm

Our improved learning algorithm is based on the technique of SVD. We first introduce the basic concepts of this technique. Then, we describe our learning algorithm in detail.

3.1 SVD
SVD is a very powerful technique for dealing with matrices that are either singular or numerically very close to singular. The main objective of SVD is to factorize a given matrix into another form which is easy to process. According to the SVD theorem, any m by n matrix A whose number of rows m is greater than or equal to its number of columns n can be factorized into

$$A = Q_1 \Sigma Q_2^T.$$

The columns of $Q_1$ (m by m) are eigenvectors of $AA^T$, and the columns of $Q_2$ (n by n) are eigenvectors of $A^T A$. The r singular values $\sigma_1, \ldots, \sigma_r$ on the diagonal of $\Sigma$ (m by n) are the square roots of the nonzero eigenvalues of both $AA^T$ and $A^T A$. After the SVD of A is found, we can get the pseudoinverse of A as

$$A^+ = Q_2 \Sigma^+ Q_1^T,$$

where the reciprocals $1/\sigma_1, \ldots, 1/\sigma_r$ are on the diagonal of $\Sigma^+$ (n by m). Consider a set of simultaneous equations $A\mathbf{x} = \mathbf{b}$, where A is an m by n matrix, $\mathbf{b}$ is an m by 1 vector, and $\mathbf{x}$ is an n by 1 vector. We want to find the $\mathbf{x}$ which minimizes $|A\mathbf{x} - \mathbf{b}|$. It can be proved that the minimum-length least-squares solution is

$$\mathbf{x}^+ = A^+ \mathbf{b} = Q_2 \Sigma^+ Q_1^T \mathbf{b}. \tag{1}$$
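For illustration (our example, not from the paper), the minimum-length least-squares solution of Equation (1) can be computed with NumPy; numpy.linalg.pinv implements exactly this SVD-based pseudoinverse, taking reciprocals only of the nonzero singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # m = 8 rows >= n = 3 columns
b = rng.standard_normal(8)

# Explicit SVD route: A = Q1 diag(sigma) Q2^T (reduced form).
Q1, sigma, Q2T = np.linalg.svd(A, full_matrices=False)
x_plus = Q2T.T @ ((Q1.T @ b) / sigma)   # x+ = Q2 Sigma+ Q1^T b

# The built-in pseudoinverse gives the same minimum-length solution.
assert np.allclose(x_plus, np.linalg.pinv(A) @ b)
```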
3.2 Learning Algorithm

For the learning algorithm, we take $\{(m_i^s, \sigma_i^s) \mid i = 1, 2, \ldots, N;\ s = 1, 2, \ldots, R\}$, $\gamma$, and $\{c_s \mid s = 1, 2, \ldots, R\}$ as adjustable parameters. The error function we consider for batch learning is

$$E = \frac{1}{2} \sum_{p=1}^{P} (y^p - \hat{y}^p)^2,$$

where $\hat{y}^p$ is the simulated output. According to the gradient descent method, the learning rule is

$$w(t+1) = w(t) + \eta \left(-\frac{\partial E}{\partial w}\right),$$

where $\eta$ is the learning rate and w is an adjustable parameter.

Suppose we have P training patterns in the training set. Each pattern p is represented as $\{\mathbf{x}^p = (x_1^p, \ldots, x_N^p)^T;\ y^p\}$, where $y^p$ is the desired output. The learning mode we adopt is batch learning. After the forward calculation of each training pattern p, we get the inference-layer output vector $\mathbf{o}^{p(3)} = (o_1^{p(3)}, \ldots, o_R^{p(3)})^T$. Let $O^{(3)}$ be the P by R matrix whose rows are $\mathbf{o}^{p(3)T}$, $p = 1, \ldots, P$; let C be the R by 1 vector whose elements are $c_s$, $s = 1, \ldots, R$; and let $\mathbf{y}$ be the P by 1 vector whose elements are $y^p$, $p = 1, \ldots, P$. Because the objective of batch learning is to minimize $|O^{(3)} C - \mathbf{y}|$, we can use Equation (1) to find $C^+$, the update of the original C.

Therefore, our learning algorithm can be described as follows.

Learning rule for the consequent parameter $c_s$: We update $c_s$ by the SVD-based batch method described above.

Learning rule for the compensatory degree $\gamma$: To ensure $\gamma \in [0, 1]$, we redefine $\gamma$ as $\gamma = c^2/(c^2 + d^2)$ and train c and d by the gradient descent rule.

Learning rule for $m_i^s$ and $\sigma_i^s$: Both are updated by the gradient descent rule, with $\partial E / \partial m_i^s$ and $\partial E / \partial \sigma_i^s$ obtained by the chain rule through the network.
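A sketch of the consequent update under our reconstruction: stack the inference-layer outputs of all P patterns into the P by R matrix $O^{(3)}$ and solve for C with the pseudoinverse, as in Equation (1). The array names and shapes here are our assumptions.

```python
import numpy as np

def inference_outputs(X, M, S, gamma):
    """P x R matrix O^(3): row p holds o^{p(3)T} for pattern p."""
    P, N = X.shape
    # o_is^(2) for every pattern: shape (P, N, R).
    o2 = np.exp(-((X[:, :, None] - M[None, :, :]) ** 2) / S[None, :, :] ** 2)
    return np.prod(o2, axis=1) ** (1.0 - gamma + gamma / N)

def update_consequents(X, y, M, S, gamma):
    """C+ = pinv(O^(3)) @ y, the minimizer of |O^(3) C - y|."""
    O3 = inference_outputs(X, M, S, gamma)
    return np.linalg.pinv(O3) @ y
```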
4 Experiments
In this section, we demonstrate the performance of our approach by experimenting with one set of input-output data. Suppose we want to model the nonlinear system defined by

$$y = x_2 \sin(x_1) + x_1 \cos(x_2), \quad 0 \le x_1, x_2 \le \pi,$$

where $x_1$ and $x_2$ are the inputs and y is the single output. The output of this system is shown in Figure 2.

Figure 2: A nonlinear system.

The input-output data, represented by $\{(x_1, x_2, y)\}$, are sampled from $x_1, x_2 \in \{0, \frac{\pi}{20}, \frac{2\pi}{20}, \ldots, \frac{19\pi}{20}, \pi\}$. As a result, there are 441 input-output data in total. By using the self-constructing approach in [6], we obtain nine fuzzy rules, as shown in Figure 3, with a mean square error of 0.16. Then we train these rough rules with our improved learning algorithm. After 20 iterations, the rules are refined to a mean square error of 0.016. The simulated result is shown in Figure 4. In [3], nine fuzzy rules with a mean square error of 0.16 are also obtained by fuzzy partitions; however, their training algorithm needs 100 iterations to reach the improved rules with a mean square error of 0.016.

Figure 3: Simulated result by rough fuzzy rules.

Figure 4: Improved fuzzy rules.
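The 441 training pairs can be regenerated as follows (our sketch; the grid spacing of $\pi/20$ over $[0, \pi]$ is our reading of the garbled sampling set):

```python
import numpy as np

grid = np.linspace(0.0, np.pi, 21)        # 0, pi/20, 2*pi/20, ..., pi
x1, x2 = np.meshgrid(grid, grid)          # 21 x 21 = 441 input combinations
y = x2 * np.sin(x1) + x1 * np.cos(x2)     # the target nonlinear system

data = np.column_stack([x1.ravel(), x2.ravel(), y.ravel()])
print(data.shape)                         # (441, 3)
```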
5 Conclusion

In this paper, we have presented an improved learning algorithm for rule refinement in neuro-fuzzy modeling. The advantages of this learning algorithm are that it converges quickly and that the obtained fuzzy rules are more precise. Experimental results have confirmed the better performance of our algorithm.

References

[1] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359-366, 1989.

[2] M. Sugeno and T. Yasukawa, "A fuzzy-logic-based approach to qualitative modeling," IEEE Trans. Fuzzy Systems, vol. 1, pp. 7-31, February 1993.

[3] Y. Lin, G. Cunningham, and S. Coggeshall, "Using fuzzy partitions to create fuzzy systems from input-output data and set the initial weights in a fuzzy neural network," IEEE Trans. Fuzzy Systems, vol. 5, pp. 614-621, November 1997.

[4] C. F. Juang and C. J. Lin, "An online self-constructing neural fuzzy inference network and its applications," IEEE Trans. Fuzzy Systems, vol. 6, pp. 12-32, February 1998.

[5] Y.-Q. Zhang and A. Kandel, "Compensatory neurofuzzy systems with fast learning algorithms," IEEE Trans. Neural Networks, vol. 9, pp. 83-105, January 1998.

[6] C.-S. Ouyang and S.-J. Lee, "A new self-constructing approach for neuro-fuzzy modeling," in Proceedings of the Eighth International Fuzzy Systems Association World Congress, Taipei, Taiwan, R.O.C., August 1999, to appear.