Neuro-fuzzy Systems Complexity Reduction by Subtractive Clustering and Support Vector Learning for Nonlinear Process Modeling

C. Pereira (1,2) and A. Dourado (1)

(1) CISUC – Centro de Informática e Sistemas da Universidade de Coimbra, Pinhal de Marrocos, 3030 Coimbra, Portugal. Phone: +351 39 790000, Fax: +351 39 701266. Email: {cpereira,dourado}@dei.uc.pt
(2) ISEC – Instituto Superior de Engenharia de Coimbra, Quinta da Nora, 3030 Coimbra, Portugal. Phone: +351 39 790200, Fax: +351 39 790270. Email: [email protected]

ABSTRACT: The design of a neuro-fuzzy system based on a radial basis function (RBF) network architecture and trained by support vector learning is considered. Typically, a neuro-fuzzy model structure is created from numerical data; however, the common modeling techniques may introduce unnecessary redundancy into the rule base, and it is of great interest to reduce the number of fuzzy rules. The proposed method proceeds in two phases. First, the input-output data is clustered according to a modified form of the Mountain Method for cluster estimation, the subtractive clustering method. Second, a support vector machine is defined. The parameters of the network (the number of centers, their positions and the output layer weights) are computed using support vector learning. This approach improves interpretability and reduces the complexity of the problem. The proposed learning scheme is applied to the distributed collector field of a solar power plant.

KEYWORDS: Neuro-fuzzy systems, radial basis function networks, subtractive clustering, support vector machines, nonlinear process modeling.
1. INTRODUCTION

Neuro-fuzzy systems have attracted growing attention from researchers for the modeling and control of nonlinear systems. Such systems can take linguistic information from human experts and also adapt using the available input-output data. In contrast to pure neural or fuzzy methods, the neuro-fuzzy approach combines the advantages of both [1]: the learning and adaptation capabilities of neural networks, and an inference mechanism that enables approximate human reasoning. One significant issue in neuro-fuzzy modeling is how to optimally partition the input space. The way the input space is partitioned determines the number of rules extracted from the training data, as well as the number of fuzzy sets on the universe of discourse of each input variable. Often, lattice networks with hypercubes as finite elements are used. However, such models do not comply with the principle of parsimony, which demands that a model represent a given set of data with the fewest possible parameters. A fuzzy model with a large number of rules often runs the risk of overfitting the data [2], [3]. One valuable approach to parsimonious interpolation is to allow the nodes to be arbitrarily distributed; the node density can then be adjusted to the local complexity of the represented function. Usually a clustering algorithm is applied, with the generation of a new input cluster corresponding to the generation of a new fuzzy rule. An alternative design for neuro-fuzzy systems using the radial basis function (RBF) network architecture is support vector learning (SVL), a method proposed by Vapnik and co-workers [4], [5], which in particular allows the construction of RBF networks by choosing an appropriate kernel function.
Instead of minimizing the empirical training error, like conventional methods, the support vector machine (SVM) aims at minimizing an upper bound on the generalization error, finding a separating hyperplane that maximizes the margin between the hyperplane and the training data. A particular characteristic of the SVM is that it provides good generalization performance even though it does not incorporate problem domain knowledge. The support vector algorithm provides a direct way of choosing the number and location of the centers and the weights of the network. The SVM also suggests an alternative point of view: the centers are those examples which are critical for solving the given task, whereas in the traditional view the centers are regarded as cluster prototypes.
eunite 2001
122
www.eunite.org
The performance of a SVM largely depends on the kernel; however, there is a lack of theory concerning how to choose good kernel functions in a data-dependent way. An information-geometric method of modifying the kernel to improve performance has been suggested [6]. If Gaussian functions are chosen, the width is a parameter that must be defined a priori. In addition, when designing a SVM for regression tasks, the selection of the insensitivity zone of the loss function and of the regularization parameter, which controls the trade-off between model complexity and training error, is difficult. In practice, several values are tried and cross-validation is performed. Moreover, the SVM needs to solve a quadratic optimization problem, exhibiting long running times and, in most cases, slower training than conventional neural networks. Thus, for computational reasons, it is more adequate to pre-process the data in order to solve a smaller problem. The proposed framework addresses these design difficulties. The remainder of this paper is organized as follows. Section 2 presents the network structure, describes the subtractive clustering method for input-output space clustering and introduces support vector machines. The description of the solar power plant and the modeling results are given in Section 3. Conclusions are given in Section 4.
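The cross-validation practice mentioned above can be automated. A minimal sketch, using scikit-learn's SVR and GridSearchCV on synthetic data (not the tooling used in this paper), assuming the Gaussian width enters through gamma = 1/(2·sigma²):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic one-dimensional regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)

# Cross-validate over the regularization parameter C, the insensitivity
# zone epsilon and the Gaussian kernel width (gamma = 1 / (2 * sigma**2)).
grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1.0, 10.0, 100.0],
                "epsilon": [0.01, 0.1],
                "gamma": [0.5, 2.0]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```

Each candidate triple (C, epsilon, gamma) is scored on held-out folds, which is exactly the "several values plus cross-validation" procedure described above.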
2. NEURO-FUZZY NETWORK LEARNING

The proposed training method proceeds in two phases. First, subtractive clustering is applied, computing the number and position of the clusters representing the training data set. Second, the model in the feature space is computed using support vector learning. Gaussian functions are used as multidimensional membership functions, and the local model neuro-fuzzy network is implemented in the form of an RBF network.
2.1 NETWORK STRUCTURE

A local basis function neural network has been adopted. The model belongs to a general class of function approximators, called the basis function expansion, taking the form:

    y = \sum_{i=1}^{m} \phi_i(x)\,\theta_i    (1)

where \phi_i are the basis functions and \theta_i the consequent parameters. The well-known RBF neural networks belong to this class of models. Under certain conditions, these networks are functionally equivalent to the fuzzy model with constant consequent parts [7]:

    R_i: \text{If } x \text{ is } A_i \text{ then } y = \theta_i, \quad i = 1, 2, \ldots, m    (2)

where x is the antecedent linguistic variable, which represents the input to the fuzzy system, and y is the consequent variable representing the output. A_i is the linguistic term defined by the Gaussian multivariable membership function, and m denotes the number of rules.
Figure 1. Local basis function network structure.

The functional equivalence implies that the learning algorithms of RBF networks can be used to train models of this type. In most cases the RBF training proceeds in two phases. The basis functions are constructed by clustering the training vectors in the input space, and the consequent parameters \theta_i are estimated from the data using least-squares methods [8]. In a similar way, Takagi-Sugeno fuzzy models are constructed by product-space fuzzy clustering. The way the input space is partitioned determines the number of fuzzy rules. Typically, a model structure is created from numerical data in the form of IF-THEN rules. However, these automated modeling techniques may introduce
unnecessary redundancy into the rule base. It is of great interest to reduce the number of fuzzy rules (or hidden units); a survey of some methods proposed in the literature is given in [9]. Recently, the SVM has been proposed to construct different types of learning machines. Applied to RBF networks, the learning process automatically determines, from a given set of training data, the required number of hidden units (support vectors), their positions and their weights. In this work, by using an effective partition of the input space, a more parsimonious structure can be achieved.
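As a point of reference, the conventional two-phase RBF training described above (cluster the inputs to place the Gaussian centers, then estimate the consequent parameters \theta_i by least squares) can be sketched as follows. This is a minimal illustration with k-means standing in for the clustering step; all function names and parameter values are ours, not the paper's:

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_rbf(X, y, m=8, sigma=0.5):
    """Phase 1: cluster inputs for centers; phase 2: least-squares weights."""
    centers = KMeans(n_clusters=m, n_init=10, random_state=0).fit(X).cluster_centers_
    # Design matrix Phi[i, j] = phi_j(x_i) with Gaussian basis functions.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2 * sigma**2))
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, theta

def predict_rbf(X, centers, theta, sigma=0.5):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma**2)) @ theta

# Smooth one-dimensional example.
X = np.linspace(-1, 1, 100)[:, None]
y = np.sin(3 * X[:, 0])
centers, theta = fit_rbf(X, y)
err = np.abs(predict_rbf(X, centers, theta) - y).max()
print(err)
```

Here the number of centers m must be fixed in advance, which is exactly the limitation the subtractive clustering and support vector phases below are meant to address.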
2.2 SUBTRACTIVE CLUSTERING

In the proposed method, the input-output data is clustered according to the subtractive clustering method. The purpose of clustering is to classify a set of N data points X = \{x_1, x_2, \ldots, x_N\} into homogeneous groups of data P = \{p_1, p_2, \ldots, p_c\} with 1 \le c \le N (if the number of clusters is c = 1, all data belongs to the same class; if c = N, each data sample defines a class). The fuzzy c-means clustering algorithm [10] is an extremely powerful classification method, which minimizes the Euclidean distance between each data point and its cluster center. The number of clusters should reflect the level of knowledge of the system under consideration, or the level of generality in the user's description of the system. The quality of the fuzzy c-means result depends strongly on the choice of the number of centers and of the initial cluster positions. The advantage of the subtractive clustering algorithm is that the number of clusters does not need to be specified a priori; instead, the method determines both the number of clusters and their values. The method is a modified form of the Mountain Method [11] for cluster estimation. Assuming N normalized points in an M-dimensional space, each data point is considered as a potential cluster center [12], and the potential of data point x_i is defined as:

    Pot_i = \sum_{j=1}^{N} e^{-\alpha \| x_i - x_j \|^2}    (3)

where \alpha = 4 / r_a^2, and r_a is a positive constant (a radius defining a neighborhood). The potential of a given point is thus a function of its distances to all other data points: a point with many neighboring points has a high potential value. After the potential of every data point has been computed, the point x_k with the highest potential is selected as the first cluster center. The potential of each point x_i is then updated by the formula:

    Pot_i \Leftarrow Pot_i - Pot_k \, e^{-\beta \| x_i - x_k \|^2}    (4)

where Pot_k is the potential of the selected cluster center and \beta = 4 / r_b^2, with r_b a positive constant somewhat greater than r_a. When the potentials of all data points have been updated according to equation (4), the data point with the highest remaining potential is selected as the next cluster center, and the process repeats until the potential falls below a given threshold.
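Equations (3) and (4) translate directly into code. The following is a simplified sketch: the stopping rule here is a single potential threshold, whereas [12] uses more elaborate accept/reject criteria, and all names and parameter values are illustrative:

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, rb=0.75, threshold=0.15):
    """Pick cluster centers from normalized data by potential subtraction."""
    alpha, beta = 4.0 / ra**2, 4.0 / rb**2
    # Pairwise squared distances between all data points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    pot = np.exp(-alpha * d2).sum(axis=1)          # equation (3)
    pot_first = pot.max()
    centers = []
    while True:
        k = int(np.argmax(pot))
        if pot[k] < threshold * pot_first:         # stop below threshold
            break
        centers.append(X[k])
        pot = pot - pot[k] * np.exp(-beta * d2[k])  # equation (4)
    return np.array(centers)

# Two well-separated groups of points should yield two cluster centers.
rng = np.random.default_rng(0)
X = np.vstack([np.zeros((10, 2)), np.ones((10, 2))]) + 0.01 * rng.standard_normal((20, 2))
centers_found = subtractive_clustering(X)
print(len(centers_found))  # → 2
```

Because each selected center subtracts its own potential completely, a point is never chosen twice, and the number of centers emerges from the data rather than being fixed in advance.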
2.3 NETWORK SIZE REDUCTION BY SUPPORT VECTOR LEARNING

Support vector learning, proposed by Vapnik, is a constructive learning procedure based on statistical learning theory. By choosing different kinds of kernels, the technique can be applied to a variety of representations, such as multilayer perceptron (MLP) neural networks, radial basis function networks, splines or polynomial estimators, and it can be used for either classification or regression tasks. The SVM works by mapping the input space into a high-dimensional feature space using a set of nonlinear basis functions. The framework was originally designed for pattern recognition, but its basic properties carry over to regression by choosing a suitable cost function, for example the e-insensitive (Vapnik) loss function:

    L_e(y, f(x)) = \begin{cases} |y - f(x)| - e, & \text{for } |y - f(x)| \ge e \\ 0, & \text{otherwise} \end{cases}    (5)
This is a linear loss with an insensitive zone. The parameter e > 0 controls the width of the insensitive zone (errors below e are not penalized) and is usually chosen a priori. In the support vector machine, a function linear in the parameters is used to approximate the regression in the feature space:

    f(x) = \sum_{j=1}^{m} w_j g_j(x)    (6)

where x \in \mathbb{R}^d represents the input, w_j are the linear parameters and g_j denotes a nonlinear transformation. The goal is to find a function f with a small test error, based on the training data (x_i, y_i), i = 1, \ldots, n, by minimizing the empirical risk:

    R_{emp}[f] = \frac{1}{n} \sum_{i=1}^{n} L_e(y_i, f(x_i))    (7)

subject to the constraint \|w\|^2 \le c, where c is a constant. The constrained optimization problem is reformulated by introducing the slack variables \xi_i \ge 0, \xi'_i \ge 0, i = 1, \ldots, n:

    y_i - \sum_{j=1}^{m} w_j g_j(x_i) \le e + \xi_i
    \sum_{j=1}^{m} w_j g_j(x_i) - y_i \le e + \xi'_i    (8)
Under these constraints, the problem is posed as a quadratic optimization by introducing the following functional:

    Q(\xi_i, \xi'_i, w) = \frac{C}{n} \left( \sum_{i=1}^{n} \xi_i + \sum_{i=1}^{n} \xi'_i \right) + \frac{1}{2} w^T w    (9)
The pre-determined coefficient C should be sufficiently large; it governs the trade-off between complexity and training error. This optimization problem is then transformed into its dual by constructing a Lagrangian and applying the Kuhn-Tucker theorem (for details, see [4]). The regression function is then given by:

    f(x) = \sum_{i=1}^{n} (\alpha_i - \beta_i) K(x_i, x)    (10)
where K(x_i, x) is the inner-product kernel, defined in accordance with Mercer's theorem. It is interesting to note that explicit knowledge of g_j is not actually necessary; instead, equation (6) is rewritten in terms of dot products. In the dual problem, the parameters \alpha_i and \beta_i are calculated by maximizing the functional:

    Q(\alpha, \beta) = \sum_{i=1}^{n} y_i (\alpha_i - \beta_i) - e \sum_{i=1}^{n} (\alpha_i + \beta_i) - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \beta_i)(\alpha_j - \beta_j) K(x_i, x_j)    (11)

subject to the following constraints:

    \sum_{i=1}^{n} (\alpha_i - \beta_i) = 0, \quad 0 \le \alpha_i \le \frac{C}{n}, \quad 0 \le \beta_i \le \frac{C}{n}, \quad i = 1, \ldots, n    (12)
In the SVM, the nonlinear feature space is directly incorporated in the parameter optimization. By solving the quadratic optimization problem, one obtains the number of hidden units, their values and the weights of the output layer. Choosing Gaussian functions as kernels,

    K(x_i, x) = \exp\left( -\frac{\| x - x_i \|^2}{2\sigma^2} \right)    (13)

support vector learning offers an alternative method for the design of RBF networks. The parameters to choose are the capacity control C, the insensitivity e and the spread \sigma of the Gaussian functions.
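As an illustration of this design route, the following sketch trains an e-insensitive SVR with a Gaussian kernel using scikit-learn (not the implementation used in the paper); the support vectors it returns play the role of the RBF centers, and their number is determined by the optimization rather than fixed in advance:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data standing in for a nonlinear process (illustrative only).
rng = np.random.default_rng(1)
X = np.linspace(-2, 2, 150)[:, None]
y = np.tanh(2 * X[:, 0]) + 0.05 * rng.standard_normal(150)

# Gaussian kernel width sigma maps to scikit-learn's gamma = 1 / (2 * sigma**2).
sigma = 0.5
svr = SVR(kernel="rbf", gamma=1.0 / (2 * sigma**2), C=10.0, epsilon=0.1)
svr.fit(X, y)

# The support vectors are the data points that become the RBF centers.
print(len(svr.support_), "support vectors out of", len(X), "samples")
```

With a noise level below epsilon, most samples fall inside the insensitive tube, so only a small subset of the data survives as centers, which is the complexity reduction this section describes.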
3. MODELING A SOLAR POWER PLANT

The support vector learning model is applied in this section to the distributed collector field of a solar power plant. The main characteristic of the solar plant is that the primary energy source, the solar radiation, cannot be manipulated. The solar radiation changes substantially during plant operation due to the daily solar cycle and to atmospheric conditions such as cloud cover, humidity and turbidity, leading to significant variations in the dynamic characteristics of the field, corresponding to different operating conditions. It is therefore difficult to obtain a satisfactory model over the total operating range. To deal with the several operating points, some control strategies have been proposed. One of them applies adaptive control schemes, using local linear models of the plant [13]. A hierarchical control strategy consisting of a supervisory switching of PID controllers, based on recurrent neural network models, has also been tested [14]. The approach presented in this paper, which effectively partitions the input-output space and takes advantage of the generalization ability of the support vector machine, intends to overcome this problem. The Acurex distributed solar collector field of the solar power plant is well described in the literature [13] and is located at the 'Plataforma Solar de Almeria', Spain. The field consists of 480 distributed solar collectors arranged in 20 rows, which form 10 parallel loops. Each loop is 172 m long and the total aperture surface is 2672 m2. A schematic diagram is shown in Figure 2. Each collector uses parabolic mirrors to concentrate solar radiation on a receiver tube. Synthetic oil is pumped through the receiver tube and picks up the heat transferred through the tube walls. The inlet oil, at temperature Tin, is pumped from the bottom of the storage tank and flows through the collector field, where its temperature is raised. After that, the fluid is introduced into the storage tank from the top, to be used for electrical energy generation or for feeding a heat exchanger in the desalination plant. The oil flow rate, Qin, is the manipulated variable in the solar plant, and the main goal is to regulate the outlet field oil temperature, Tout, at a desired level, Tref. The main disturbances are the solar radiation Irr and the inlet oil temperature.
Figure 2. Schematic diagram of the Acurex field.
The model considered corresponds to the following NARX regression model:

    Tout(k) = f(Tout(k-1), \ldots, Tout(k-n_a), Irr(k-1), \ldots, Irr(k-n_b), Qin(k-1), \ldots, Qin(k-n_c))    (14)

Figure 3 shows the model prediction for the learning data set (VAF = 95.9%). In this experiment, n_a = n_b = n_c = 1, and the number of support vectors obtained was 12. Figure 4 shows the model prediction for a validation data set (VAF = 93.6%). The VAF criterion is the percentual "variance accounted for" between the measured data y_1 and the model output y_2:

    VAF = 100\% \left( 1 - \frac{\mathrm{var}(y_1 - y_2)}{\mathrm{var}(y_1)} \right)    (15)

VAF equals 100% for two identical signals y_1 and y_2.
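Equation (15) is straightforward to implement; a small illustrative helper:

```python
import numpy as np

def vaf(y_measured, y_model):
    """Percentage of the measured signal's variance explained by the model,
    per equation (15): 100% for identical signals, lower as errors grow."""
    return 100.0 * (1.0 - np.var(y_measured - y_model) / np.var(y_measured))

y1 = np.sin(np.linspace(0, 10, 100))
print(vaf(y1, y1))  # identical signals give VAF = 100%
```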
Fig. 3(a). Learning data set. Oil flow rate and radiation.
Fig. 3(b). Learning data set. Oil temperature and temperature prediction (dashed).
Fig. 4(a). Validation data set. Oil flow rate and radiation.
Fig. 4(b). Validation data set. Oil temperature and temperature prediction (dashed).
This experiment confirms the excellent generalization ability obtained with support vector learning. For comparison, a feedforward neural network with 30 units, trained with the Levenberg-Marquardt backpropagation algorithm, was also applied. It exhibits similar learning performance (VAF = 95.2%), but its generalization ability is clearly inferior (VAF = 47.3%).
4. CONCLUSIONS

This work has presented a framework for the construction of a neuro-fuzzy network based on support vector learning and subtractive clustering. In the modeling of the solar power plant, the results show the good approximation capabilities of the technique while keeping a parsimonious network structure. Clustering the input-output space is an effective way to decompose a large problem, difficult to solve directly with a support vector algorithm, into a simpler, easily solved one. The method simplifies the design of RBF networks, providing a direct way of choosing the number and location of the centers and the weights of the network. Compared to standard neural networks, the generalization ability is improved by the use of the support vector learning algorithm, which gives good generalization performance even though it does not incorporate problem domain knowledge.
ACKNOWLEDGMENTS This work was partially supported by project ALCINE PRAXIS/EEI/14155/98 and POSI - Programa Operacional Sociedade de Informação of Portuguese Fundação para a Ciência e Tecnologia and European Union FEDER. The experiments described in this paper were carried out within the project Improving Human Potential program (ECDGXII) supported by the European Union Program Training and Mobility of Researchers. The authors would like to express their gratitude to the personnel of the PSA, in particular Diego Martinez and Loreto Valenzuela, who were nominated to take care of this project.
REFERENCES

[1] Brown, M.; Harris, C., 1994, "Neurofuzzy Adaptive Modelling and Control", New York: Prentice Hall.
[2] Babuska, R., 1998, "Fuzzy Modeling for Control", Boston: Kluwer.
[3] Yen, J.; Wang, L., 1999, "Constructing optimal fuzzy models using statistical information criteria", Journal of Intelligent and Fuzzy Systems, 7, 185-201.
[4] Vapnik, V., 1995, "The Nature of Statistical Learning Theory", Springer.
[5] Cortes, C.; Vapnik, V., 1995, "Support Vector Networks", Machine Learning, 20, 273-297.
[6] Amari, S.; Wu, S., 1999, "Improving support vector machine classifiers by modifying kernel functions", Neural Networks, 12, 783-789.
[7] Jang, J.; Sun, C., 1993, "Functional equivalence between radial basis function networks and fuzzy inference systems", IEEE Transactions on Neural Networks, 4(1), 156-159.
[8] Moody, J.; Darken, C., 1989, "Fast learning in networks of locally-tuned processing units", Neural Computation, 1(2), 281-294.
[9] Setnes, M.; Lacrose, V.; Titli, A., 1999, "Complexity Reduction Methods for Fuzzy Systems", in Fuzzy Algorithms for Control, Boston: Kluwer, Chapter 8.
[10] Bezdek, J., 1981, "Pattern Recognition with Fuzzy Objective Function Algorithms", New York: Plenum.
[11] Yager, R.; Filev, D., "Learning of fuzzy rules by mountain clustering", Proc. SPIE Conf. on Applications of Fuzzy Logic Technology, Boston, 246-254.
[12] Chiu, S., 1994, "Fuzzy Model Identification Based on Cluster Estimation", Journal of Intelligent and Fuzzy Systems, 2, 267-278.
[13] Camacho, E.; Berenguel, M.; Rubio, F., 1998, "Modeling and simulation of a solar power plant with distributed collector system", IFAC Symposium on Power Systems, Modelling and Control Applications, Brussels.
[14] Henriques, J.; Cardoso, A.; Dourado, A., 1999, "Supervision and c-means clustering of PID controllers for a solar power plant", International Journal of Approximate Reasoning, 22, 73-91.