Compact TS-Fuzzy Models through Clustering and OLS plus FIS Model Reduction
J. Abonyi
J.A. Roubos
M. Oosterom
F. Szeifert
Corresponding author, E: abonyij@fmt.vein.hu, T: +36 88 422-022/4209, F: +36 88 422-022/4171. University of Veszprem, Dep. of Process Engineering, P.O. Box 158, H-8201, Hungary, http://www.fmt.vein.hu/softcomp

Delft University of Technology, Dep. of Information Technology and Systems, Control Laboratory, P.O. Box 5031, 2600 GA Delft, The Netherlands

ABSTRACT
The identification of uncertain and nonlinear systems is an important and challenging problem. Fuzzy logic models are often a good choice to describe such systems; however, in many cases these models quickly become complex. Generally, too little effort is put into variable selection and into the creation of suitable local rules. Moreover, in general no model reduction is applied, although this may simplify the model by removing redundant information. This paper proposes a combined method that handles these issues in order to create compact Takagi-Sugeno (TS) models that can be effectively used to represent complex systems. A new fuzzy clustering method is proposed for the identification of compact TS-fuzzy models. The most relevant consequent variables of the TS model are selected by an orthogonal least squares method based on the obtained clusters. For the selection of the relevant antecedent (scheduling) variables, a new method has been developed based on Fisher's interclass separability criterion. This overall approach is demonstrated by means of the MPG (miles per gallon) nonlinear regression benchmark. The results are compared with results obtained by standard linear, neuro-fuzzy and advanced fuzzy clustering based identification tools.

I. INTRODUCTION

Fuzzy modeling and identification from input-output process data have proved to be effective for the approximation of uncertain nonlinear systems [1]. The most frequently applied Takagi-Sugeno (TS) model decomposes the input space of the nonlinear model into fuzzy subspaces and then approximates the system in each subspace by a simple linear regression model [2]. Different approaches to obtain TS-fuzzy models from data have been proposed. Most approaches, however, utilize only the function approximation capabilities of fuzzy systems, and little attention is paid to the qualitative aspects. This makes them less suited for applications in which the emphasis is not only on accuracy, but also on interpretability, computational complexity and maintainability [3]. Additionally, such straightforward approaches soon lead to quite complex models because generally little effort is put into variable selection and into the creation of suitable local rules. Moreover, in general no model reduction is applied, although this may simplify the model by removing redundant information. This paper proposes a combined method that handles these issues in order to create compact Takagi-Sugeno (TS) models that can be effectively used to represent complex systems.

The bottleneck of TS model identification is the data-driven identification of the structure, which requires nonlinear optimization. For this purpose often heuristic, data-driven approaches, such as fuzzy clustering methods, are applied, mainly for determining the rule antecedents of TS models [4]. We propose a more advanced clustering method that pursues a further step in accomplishing the total parameter and structure identification of TS models. This approach is based on a new clustering algorithm: the modified Gath-Geva clustering [5].
The clusters are formed by expectation-maximization (EM) identification of the TS model in the form of a mixture of Gaussians model [6]. The obtained model is then reduced by reducing the number of antecedent variables and also the number of consequent variables. Using too many antecedent variables may degrade the prediction performance and the interpretability of the model due to redundancy, non-informative features and noise. Hence, selection of the scheduling variables is usually necessary. For this purpose, we modify our method based on the Fisher interclass separability criterion, which was developed for the feature selection of fuzzy classifiers [7]. Others have focused on reducing the antecedent by similarity analysis of the fuzzy sets [3], [8]; however, this approach is not very suitable for feature selection.

An orthogonal least squares (OLS) method is proposed for the reduction of the consequent space. The application of orthogonal transforms for the reduction of the number of rules has received much attention in the recent literature [9], [10]. These methods evaluate the output contribution of the rules to obtain an importance ordering. For modeling purposes, OLS is the most appropriate tool [9]. In this paper, OLS is applied for a different purpose: the selection of the most relevant input and consequent variables based on the OLS analysis of the local models of the clusters.

The paper is organized as follows. In Section II, the mathematical structure of the applied TS-fuzzy model is presented. Section III describes the new clustering algorithm that allows for the direct identification of TS models. For the selection of the consequent variables, an orthogonal least squares based method is presented in Section IV-A. The selection of the antecedent variables is based on Fisher's interclass separability criterion, as presented in Section IV-B. Section V presents an application example. Conclusions are given in Section VI.

II. MATHEMATICAL STRUCTURE OF THE APPLIED TAKAGI-SUGENO FUZZY MODELS

Our goal is to develop an algorithm for the identification of unknown nonlinear systems of the form:
$$y = f(\mathbf{x}) \qquad (1)$$

based on specified or measured input data $\mathbf{x}_k = [x_{1,k}, \ldots, x_{n,k}]^T$ and measured output data $y_k$ of the system, where $k = 1, \ldots, N$ denotes the index of the $k$-th input-output data pair. In general it may not be easy to find a global nonlinear model that is universally applicable to describe the unknown system $f(\cdot)$. In that case it would certainly be worthwhile to build local linear models for specific operating points of the process and combine these into a global model. This can be done by combining a number of local models, where each local model has a predefined operating region in which it is valid. This results in the so-called operating regime based model [11], or more specifically in a TS-fuzzy model when
the operating region is defined by fuzzy rules. This type of operating regime based model is formulated as:
$$\hat{y} = \sum_{i=1}^{c} \phi_i(\mathbf{z}) \left(\mathbf{a}_i^T \mathbf{x} + b_i\right) \qquad (2)$$

where $\phi_i(\mathbf{z})$ describes the operating regime of the $i$-th local linear model defined by the parameter vector $\theta_i = [\mathbf{a}_i^T \; b_i]^T$, $\mathbf{x}$ is the regression vector, and $\mathbf{z}$ and $\mathbf{x}$ are $n_z$- and $n_x$-dimensional subsets of the original input variables, respectively. The operating regimes of the local models can also be represented by fuzzy sets [12]. Hence, the entire global model can be conveniently represented by Takagi-Sugeno fuzzy rules [2]:

$$R_i: \ \text{If } \mathbf{z} \text{ is } A_i(\mathbf{z}) \text{ then } \hat{y} = \mathbf{a}_i^T \mathbf{x} + b_i, \quad [w_i], \qquad i = 1, \ldots, c \qquad (3)$$

Here $A_i(\mathbf{z})$ represents a multivariate membership function that describes the fuzzy set $A_i$, $\mathbf{a}_i$ and $b_i$ are the parameters of the local linear model, and $w_i \in [0,1]$ is the weight of the rule that represents the desired impact of the rule (note that $w_i$ is not present in (2)). The value of $w_i$ is often chosen by the designer of the fuzzy system based on his or her belief in the goodness and accuracy of the $i$-th rule. When such knowledge is not available, $w_i$ is set to $w_i = 1$. Usually, the antecedent proposition "$\mathbf{z}$ is $A_i(\mathbf{z})$" is expressed as a logical combination of simple propositions with univariate fuzzy sets defined for the individual components of $\mathbf{z}$, often in the conjunctive form:
$$R_i: \ \text{If } z_1 \text{ is } A_{i,1}(z_1) \text{ and } \ldots \text{ and } z_{n_z} \text{ is } A_{i,n_z}(z_{n_z}) \text{ then } \hat{y} = \mathbf{a}_i^T \mathbf{x} + b_i, \quad [w_i] \qquad (4)$$

In this case, $\beta_i(\mathbf{z})$, the degree of fulfillment of the rule, is calculated as the product of the degrees of fulfillment of the fuzzy sets in the rule:

$$\beta_i(\mathbf{z}) = w_i \prod_{j=1}^{n_z} A_{i,j}(z_j) \qquad (5)$$

The rules of the fuzzy model are aggregated using the normalized fuzzy mean formula:

$$\hat{y} = \frac{\sum_{i=1}^{c} \beta_i(\mathbf{z}) \left(\mathbf{a}_i^T \mathbf{x} + b_i\right)}{\sum_{i=1}^{c} \beta_i(\mathbf{z})} \qquad (6)$$

Gaussian membership functions are used here to represent the fuzzy sets $A_{i,j}(z_j)$:

$$A_{i,j}(z_j) = \exp\left(-\frac{1}{2}\,\frac{\left(z_j - v_{i,j}\right)^2}{\sigma_{i,j}^2}\right) \qquad (7)$$

where $v_{i,j}$ represents the center and $\sigma_{i,j}^2$ the variance of the Gaussian function. The use of Gaussian membership functions allows for the compact formulation of (5):

$$\beta_i(\mathbf{z}) = w_i \exp\left(-\frac{1}{2}\left(\mathbf{z} - \mathbf{v}_i\right)^T \mathbf{F}_i^{-1} \left(\mathbf{z} - \mathbf{v}_i\right)\right) \qquad (8)$$

where $\mathbf{v}_i = [v_{i,1}, \ldots, v_{i,n_z}]^T$ denotes the center of the $i$-th multivariate Gaussian and $\mathbf{F}_i$ stands for a diagonal matrix that contains the variances $\sigma_{i,j}^2$.
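The model structure of (3)-(8) can be evaluated numerically in a direct way. The following sketch is an illustration only, not the identification algorithm itself; the arrays for the centers `v`, widths `sigma`, local parameters `theta` and rule weights `w` are assumed placeholders. It computes the Gaussian membership degrees (7), the degrees of fulfillment (5) and the normalized model output (6) for a batch of data:

    import numpy as np

    def ts_model_output(Z, X, v, sigma, theta, w):
        """Evaluate a TS model: y_hat = sum_i beta_i(z) (a_i^T x + b_i) / sum_i beta_i(z).

        Z     : (N, n_z) antecedent (scheduling) variables z_k
        X     : (N, n_x) consequent regressors x_k
        v     : (c, n_z) Gaussian centers v_{i,j}
        sigma : (c, n_z) Gaussian standard deviations sigma_{i,j}
        theta : (c, n_x + 1) local model parameters [a_i^T, b_i]
        w     : (c,) rule weights w_i
        """
        N = Z.shape[0]
        # Gaussian membership degrees A_{i,j}(z_j), cf. eq. (7)
        A = np.exp(-0.5 * ((Z[:, None, :] - v[None, :, :]) / sigma[None, :, :]) ** 2)
        # degrees of fulfillment beta_i(z) = w_i * prod_j A_{i,j}(z_j), cf. eq. (5)
        beta = w[None, :] * A.prod(axis=2)                    # shape (N, c)
        # local linear model outputs a_i^T x + b_i
        Xe = np.hstack([X, np.ones((N, 1))])                  # extended regressors
        y_local = Xe @ theta.T                                # shape (N, c)
        # normalized fuzzy mean, cf. eq. (6)
        return (beta * y_local).sum(axis=1) / (beta.sum(axis=1) + 1e-12)

With, for example, c = 3 rules and n_z = n_x = 2, the function returns the model prediction for all N samples at once.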
In the following sections a new clustering-based technique for the identification of the above presented model structure is described. In addition, new techniques for the selection of the consequent and the antecedent variables are described.

III. CLUSTERING FOR THE IDENTIFICATION OF TS MODELS

The objective of clustering is to partition the identification data $\mathbf{Z}$ into $c$ clusters. Each observation consists of the input and the output variables, grouped into a row vector $\mathbf{z}_k = [\mathbf{x}_k^T \; y_k]$, where the subscript $k$ denotes the $k$-th row of the $\mathbf{Z}$ matrix. The fuzzy partition is represented by the $U = [\mu_{i,k}]_{c \times N}$ matrix, where the element $\mu_{i,k}$ represents the degree of membership of the observation $\mathbf{z}_k$ in the cluster $i = 1, \ldots, c$. Here a new fuzzy clustering method is utilized that has been adapted from the Gath-Geva method [5]. Each cluster contains an input distribution, a local model and an output distribution to represent the density of the data:

$$p(\mathbf{z}_k \mid \eta_i) = p(\mathbf{x}_k \mid \eta_i)\, p(y_k \mid \mathbf{x}_k, \eta_i) \qquad (9)$$

The input distribution, parameterized as an unconditioned Gaussian [13], defines the domain of influence of a cluster similarly to the multivariate membership functions (8). The clustering is based on the minimization of the sum of weighted squared distances between the data points $\mathbf{z}_k$ and the cluster prototypes $\eta_i$ that contain the parameters of the clusters. The squared distances $d^2(\mathbf{z}_k, \eta_i)$ are weighted with the membership values $\mu_{i,k}$ in the objective function that is minimized by the clustering algorithm:
$$J(\mathbf{Z}; U, \eta) = \sum_{i=1}^{c} \sum_{k=1}^{N} \left(\mu_{i,k}\right)^m d^2(\mathbf{z}_k, \eta_i) \qquad (10)$$
To get a fuzzy partitioning space, the membership values have to satisfy the following conditions:
$$\mu_{i,k} \in [0,1], \qquad \sum_{i=1}^{c} \mu_{i,k} = 1,\ 1 \le k \le N, \qquad 0 < \sum_{k=1}^{N} \mu_{i,k} < N,\ 1 \le i \le c \qquad (11)$$
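As a small illustration of (10) and (11), the sketch below evaluates the clustering objective for a given partition matrix and shows how a randomly initialized partition can be normalized so that its columns sum to one. The squared-distance matrix `D2` is assumed to be available from the cluster prototypes; the helper names are hypothetical.

    import numpy as np

    def clustering_objective(U, D2, m=2.0):
        """Objective (10): J = sum_i sum_k (mu_{i,k})^m * d^2(z_k, eta_i).

        U  : (c, N) fuzzy partition matrix with mu_{i,k} in [0, 1]
        D2 : (c, N) squared distances d^2(z_k, eta_i)
        m  : weighting exponent (m > 1)
        """
        return float(((U ** m) * D2).sum())

    def random_partition(c, N, seed=0):
        """Random partition matrix satisfying the constraints (11):
        every column sums to one and every row has a nonzero sum."""
        rng = np.random.default_rng(seed)
        U = rng.random((c, N))
        return U / U.sum(axis=0, keepdims=True)   # column-wise normalization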
The minimization of the functional (10) represents a nonlinear optimization problem that is subject to the constraints defined by (11) and can be solved by using a variety of available methods. The most popular method is alternating optimization (AO), which is formulated as follows:

Clustering algorithm:

Initialization. Given a set of data $\mathbf{Z}$, specify the number of clusters $c$, choose a weighting exponent $m > 1$ and a termination tolerance $\epsilon > 0$. Initialize the partition matrix $U = [\mu_{i,k}]_{c \times N}$ randomly, where $\mu_{i,k}$ denotes the membership that the $\mathbf{z}_k$ input-output data pair is generated by the $i$-th cluster.

Repeat for $l = 1, 2, \ldots$

Step 1: Calculate the parameters of the clusters.
• Calculate the centers and standard deviations of the Gaussian membership functions:
$$v_{i,j}^{(l)} = \frac{\sum_{k=1}^{N} \mu_{i,k}^{(l-1)}\, z_{j,k}}{\sum_{k=1}^{N} \mu_{i,k}^{(l-1)}}, \qquad \sigma_{i,j}^{2\,(l)} = \frac{\sum_{k=1}^{N} \mu_{i,k}^{(l-1)} \left(z_{j,k} - v_{i,j}^{(l)}\right)^2}{\sum_{k=1}^{N} \mu_{i,k}^{(l-1)}}, \qquad 1 \le i \le c,\ 1 \le j \le n_z \qquad (12)$$
• Estimate the parameters of the local models by weighted least squares. The weighted, also called local, parameter estimation approach does not estimate all parameters simultaneously; instead, the parameters of the local models are estimated separately, using a set of local estimation criteria [4]:

$$\min_{\theta_i} \; \frac{1}{N}\left(\mathbf{y} - \mathbf{X}_e \theta_i\right)^T \Phi_i \left(\mathbf{y} - \mathbf{X}_e \theta_i\right) \qquad (13)$$

where $\mathbf{X}_e$ denotes the extended regression matrix obtained by adding a unitary column to $\mathbf{X}$, $\mathbf{X}_e = [\mathbf{X}\ \mathbf{1}]$, with $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]^T$ and $\mathbf{y} = [y_1, \ldots, y_N]^T$, and $\Phi_i$ denotes a diagonal matrix having the membership degrees $\mu_{i,k}$ in its diagonal elements:

$$\Phi_i = \begin{bmatrix} \mu_{i,1} & 0 & \cdots & 0 \\ 0 & \mu_{i,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mu_{i,N} \end{bmatrix} \qquad (14)$$

The weighted least-squares estimate of the consequent rule parameters is given by:

$$\theta_i = \left[\mathbf{X}_e^T \Phi_i \mathbf{X}_e\right]^{-1} \mathbf{X}_e^T \Phi_i \mathbf{y} \qquad (15)$$

• Compute $\sigma_i$, the standard deviation of the modeling error:

$$\sigma_i^2 = \frac{\sum_{k=1}^{N} \mu_{i,k}\left(y_k - \mathbf{x}_{e,k}^T \theta_i\right)^2}{\sum_{k=1}^{N} \mu_{i,k}}$$

where $\mathbf{x}_{e,k}^T$ denotes the $k$-th row of $\mathbf{X}_e$. A numerical sketch of this parameter-estimation step is given below.
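Step 1 can be written compactly with matrix operations. The following sketch is an illustration under the notation above; the arrays `U`, `Z`, `X` and `y` are assumed to hold the current partition matrix, the scheduling (antecedent) variables, the consequent regressors and the measured outputs. It updates the Gaussian centers and variances according to (12) and estimates the local model parameters and the standard deviation of the modeling error by weighted least squares, cf. (13)-(15):

    import numpy as np

    def update_cluster_parameters(U, Z, X, y):
        """One pass of Step 1 for all clusters.

        U : (c, N) fuzzy partition matrix
        Z : (N, n_z) scheduling (antecedent) variables
        X : (N, n_x) consequent regressors
        y : (N,) measured outputs
        Returns centers v (c, n_z), variances s2 (c, n_z),
        local parameters theta (c, n_x + 1) and error std sigma_e (c,).
        """
        c, N = U.shape
        Xe = np.hstack([X, np.ones((N, 1))])              # extended regression matrix
        v = np.zeros((c, Z.shape[1]))
        s2 = np.zeros_like(v)
        theta = np.zeros((c, Xe.shape[1]))
        sigma_e = np.zeros(c)
        for i in range(c):
            mu = U[i]                                      # memberships of cluster i
            sw = mu.sum()
            # centers and variances of the Gaussian membership functions, cf. eq. (12)
            v[i] = (mu[:, None] * Z).sum(axis=0) / sw
            s2[i] = (mu[:, None] * (Z - v[i]) ** 2).sum(axis=0) / sw
            # weighted least-squares estimate of the local model, cf. eqs. (13)-(15)
            Phi = np.diag(mu)                              # cf. eq. (14)
            theta[i] = np.linalg.solve(Xe.T @ Phi @ Xe, Xe.T @ Phi @ y)
            # standard deviation of the membership-weighted modeling error
            e = y - Xe @ theta[i]
            sigma_e[i] = np.sqrt((mu * e ** 2).sum() / sw)
        return v, s2, theta, sigma_e

The loop over clusters keeps the local estimation criteria separate, which is exactly what distinguishes the local (weighted) approach from a global least-squares fit of all consequents at once.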
The within-class ($\mathbf{F}_w$) and between-class ($\mathbf{F}_b$) fuzzy scatter matrices of the scheduling variables are computed from the clusters (22). The feature interclass separability selection criterion is a trade-off between $\mathbf{F}_b$ and $\mathbf{F}_w$:

$$J = \frac{\det(\mathbf{F}_b)}{\det(\mathbf{F}_w)} \qquad (23)$$
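One possible numerical reading of this criterion is sketched below. It assumes generic between-class and within-class scatter matrices built from the fuzzy memberships; the exact scatter-matrix definitions used in [7] may differ, so this is only an illustrative construction.

    import numpy as np

    def interclass_separability(U, Z):
        """Fisher-type criterion J = det(F_b) / det(F_w) for candidate scheduling variables.

        U : (c, N) fuzzy partition matrix
        Z : (N, n) candidate antecedent (scheduling) variables
        """
        c, N = U.shape
        p = U.sum(axis=1) / N                               # cluster (class) weights
        v = (U @ Z) / U.sum(axis=1, keepdims=True)          # cluster centers, (c, n)
        v0 = p @ v                                          # overall weighted mean
        Fw = np.zeros((Z.shape[1], Z.shape[1]))
        Fb = np.zeros_like(Fw)
        for i in range(c):
            d = Z - v[i]                                    # deviations from center i
            Fw += (U[i][:, None] * d).T @ d / N             # within-class scatter
            Fb += p[i] * np.outer(v[i] - v0, v[i] - v0)     # between-class scatter
        return np.linalg.det(Fb) / np.linalg.det(Fw)

Evaluating this ratio for different subsets of candidate scheduling variables could provide an importance ranking of the antecedent variables in the spirit of the selection criterion above.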