Determining the Model Order of Nonlinear Input-Output Systems by Fuzzy Clustering Balazs Feil, Janos Abonyi, and Ferenc Szeifert University of Veszprem, Department of Process Engineering, Veszprem, P.O. Box 158, H-8201, Hungary
[email protected], http://www.fmt.vein.hu/softcomp Abstract. Selecting the order of an input-output model of a dynamical system is a key step toward the goal of system identification. By determining the smallest regression vector dimension that allows accurate prediction of the output, the false nearest neighbors algorithm (FNN) is a useful tool for linear and also for nonlinear systems. The one parameter that needs to be determined before performing FNN is the threshold constant that is used to compute the percentage of false neighbors. For this purpose heuristic rules can be followed. However, for nonlinear systems choosing a suitable threshold is extremely important, the optimal choice of this parameter will depend on the system. While this advanced FNN uses nonlinear inputoutput data based models, the computational effort of the method increases along with the number of data and the dimension of the model. To increase the efficiency of the method this paper proposes the application of a fuzzy clustering algorithm. The advantage of the generated solutions is that it remains in the horizon of the data, hence there is no need to apply nonlinear model identification tools. The efficiency of the algorithm is supported by a data driven identification of a polymerization reactor.
1 Introduction

In recent years a wide range of model-based engineering tools have been developed. However, most of these advanced techniques require models of relatively low order and restricted complexity. Since most current data-driven identification algorithms assume that the model structure is a priori known, the structure and the order of the model have to be chosen before identification. Several information theoretic criteria have been proposed for the selection of the order of input-output models of linear dynamical systems. A technique based on prediction-error variance, the Final Prediction-Error (FPE) criterion, was developed by Akaike [1]. Akaike also proposed another well known criterion, Akaike's Information Criterion (AIC), which is derived from information theoretic concepts but does not yield consistent estimates of the model order. To avoid this problem, the Minimum Description Length (MDL) criterion was developed independently by Schwarz and Rissanen, and its ability to produce consistent estimates of the model order has also been proven [8]. With these tools, determining the model order of linear systems is not a problematic task. While there is extensive work on determining the proper
model order for linear systems, there is relatively little work in the field of nonlinear systems. For the determination of the order of nonlinear models, deterministic suitability measures [3] and false nearest neighbors (FNN) based algorithms [9] have been worked out and applied in the chemical process industry. These ideas build upon similar methods developed for the analysis of self-driven chaotic time series [7]. The idea behind the FNN algorithm is geometric in nature. If there is enough information in the regression vector to predict the future output, then any two regression vectors which are close in the regression space will also have future outputs which are close in some sense. Hence, model order identification is reformulated as the determination of a distance measure and the calculation of a problem-specific threshold that is used to compute the percentage of false neighbors for all combinations of the possible input variables. The one parameter that needs to be determined before performing FNN is the threshold constant. For this purpose heuristic rules can be followed [3]. However, for nonlinear systems choosing a suitable threshold is extremely important, and the optimal choice of this parameter depends on the system [9]. While this advanced FNN uses nonlinear input-output data based models, the computational effort of the method increases with the number of data points and the dimension of the model. To increase the efficiency of the method, this paper proposes the application of a fuzzy clustering algorithm. The advantage of the generated solution is that it remains in the horizon of the data, hence there is no need to apply nonlinear model identification tools.

The paper is organized as follows. Section 2 presents the idea behind the FNN algorithm. In Section 3, the application of fuzzy clustering for the improvement of this algorithm is proposed. An application example is given in Section 4. Conclusions are given in Section 5.
2 FNN Algorithm

Many non-linear static and dynamic processes can be represented by the following regression model

\[ y_k = f(x_k) \tag{1} \]

where f(·) is a nonlinear function and x_k represents its input vector. Among this class of models, the identification of discrete-time, Non-linear AutoRegressive models with eXogenous inputs (NARX) is considered in this paper. In the NARX model, the model inputs are past values of the process outputs y(k) and the process inputs u(k),

\[ x_k = [y(k-1), \ldots, y(k-m), u(k-1), \ldots, u(k-n)]^T \tag{2} \]

while the output of the model is the one-step-ahead prediction of the process, ŷ_k = y(k).
The number of past outputs used is m and the number of past inputs is n. The values m and n are often referred to as model orders. The above SISO system representation can be assumed without loss of generality, since the extension to MISO and MIMO systems is straightforward.

The method of false nearest neighbors (FNN) was developed by Kennel [7] specifically for determining the minimum embedding dimension, the number of time-delayed observations necessary to model the dynamic behavior of chaotic systems. For determining the proper regression for input/output dynamic processes, the only change to the original FNN algorithm involves the regression vector itself [9].

The main idea of the FNN algorithm utilized in this article stems from the basic property of a function. If there is enough information in the regression vector to predict the future output, then any two regression vectors which are close in the regression space will also have future outputs which are close in some sense. For all regression vectors embedded in the proper dimensions, two regression vectors that are close in the regression space have outputs that are related in the following way:

\[ y_k - y_j = \left[df\left(x_k^{m,n}\right)\right]^T \left(x_k^{m,n} - x_j^{m,n}\right) + o\left(\left\|x_k^{m,n} - x_j^{m,n}\right\|^2\right) \tag{3} \]

where df(x_k^{m,n}) is the Jacobian of the function f(·) at x_k^{m,n}. Ignoring higher order terms and using the Cauchy–Schwarz inequality, the following inequalities can be obtained:

\[ |y_k - y_j| \le \left\|df\left(x_k^{m,n}\right)\right\|_2 \left\|x_k^{m,n} - x_j^{m,n}\right\|_2 \tag{4} \]

\[ \frac{|y_k - y_j|}{\left\|x_k^{m,n} - x_j^{m,n}\right\|_2} \le \left\|df\left(x_k^{m,n}\right)\right\|_2 \tag{5} \]

If the above expression is true, then the neighbors are recorded as true neighbors. Otherwise, the neighbors are false neighbors. Based on this theoretical background, the outline of the FNN algorithm is the following.

1. Identify the nearest neighbor to a given point in the regressor space. For a given regressor

\[ x_k^{m,n} = [y(k), \ldots, y(k-m), u(k), \ldots, u(k-n)]^T \]

find the nearest neighbor x_j^{m,n} such that the distance d is minimized:

\[ d = \left\|x_k^{m,n} - x_j^{m,n}\right\|_2 \]

2. Determine whether the following expression is true or false:

\[ \frac{|y_k - y_j|}{\left\|x_k^{m,n} - x_j^{m,n}\right\|_2} \le R \]

where R is a previously chosen threshold value. If the above expression is true, then the neighbors are recorded as true neighbors. Otherwise, the neighbors are false neighbors.
3. Continue the algorithm for all times k in the data set.

4. Calculate the percentage of points in the data that have false nearest neighbors, J(m, n).

5. Continue the algorithm for increasing m and n until the percentage of false nearest neighbors drops to some acceptably small number.

Because the model order is determined by finding the number of past outputs m and past inputs n, the J(m, n) indices form a surface over two dimensions. It is possible to find a 'global' solution (or solutions) for the model orders by computing the desired index over all values of m and n in a certain range and determining which points satisfy the order determination conditions. The smallest m and n values such that J(m, n) is zero lie in the corner of this surface that is nearest to the origin, m̂ and n̂. This corner is easily identified since J(m, n) ≠ 0 for m < m̂ or n < n̂. When the noise is not zero, J(m, n) will not be zero if m and n are chosen as m ≥ m̂ and n ≥ n̂, but it will tend to remain relatively small and flat. Therefore, we calculate a table of J(m, n) and then search for the corner where J(m, n) drops quickly, similarly to the MDL based method suggested in [8]. A more heuristic 'local' solution is also possible. In this case, initial guesses for m and n are used, and the optimum model order is computed iteratively; at each iteration, either m or n is increased by one, depending on which reduces the index by the greater amount [3].

In cases where the available input-output data set is small, the algorithm is sensitive to the choice of the R threshold. In [3] the threshold value was selected by a trial-and-error method based on empirical rules of thumb, 10 ≤ R ≤ 50. However, choosing a single threshold that works well for all data sets is an impossible task. In this case, it is advantageous to estimate R based on (5) using the following expression,
as suggested by Rhodes and Morari [9]:

\[ R = \max_k \left\|df\left(x_k^{m,n}\right)\right\|_2 \]

While the method uses input-output data based models for the estimation of df(x_k^{m,n}), the computational effort of FNN increases with the number of data points and the dimension of the model. To increase the efficiency of the method, this paper proposes the application of a fuzzy clustering algorithm that will be introduced in the following section.
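As an illustration, the FNN test of Section 2 (steps 1-5, here with a constant threshold R) can be sketched in a few lines of Python; the function and variable names are our own, and the regressor follows (2):

```python
import numpy as np

def fnn_percentage(y, u, m, n, R=10.0):
    """Percentage of false nearest neighbors for model orders (m, n).

    A simplified sketch of the FNN test; R is a constant threshold here
    (the adaptive, cluster-based R_k is introduced in Section 3).
    """
    # Build regression vectors x_k = [y(k-1..k-m), u(k-1..k-n)] as in (2)
    start = max(m, n)
    N = len(y) - start
    X = np.column_stack(
        [y[start - i: start - i + N] for i in range(1, m + 1)]
        + [u[start - j: start - j + N] for j in range(1, n + 1)]
    )
    Y = y[start: start + N]

    false_count = 0
    for k in range(N):
        # Step 1: nearest neighbor of x_k (excluding itself) in regressor space
        d = np.linalg.norm(X - X[k], axis=1)
        d[k] = np.inf
        j = np.argmin(d)
        if d[j] == 0.0:        # identical regressors: skip the degenerate pair
            continue
        # Step 2: neighbors are "false" if the output ratio exceeds R
        if abs(Y[k] - Y[j]) / d[j] > R:
            false_count += 1
    return 100.0 * false_count / N
```

For a noiseless linear system of known order, the percentage is exactly zero when the regressor is rich enough, since the output ratio is bounded by the norm of the (constant) gradient.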
3 Application of Fuzzy Clustering to FNN

The available input-output data can be clustered. The main idea of the paper is that when the appropriate number of regressors is used, the collection of the obtained clusters will approximate the regression surface of the model of the system. In this case the clusters can be approximately regarded as local linearizations of the system and can be used to estimate R.

Clusters of different shapes can be obtained by different clustering algorithms by using an appropriate definition of cluster prototypes (e.g., points vs. linear varieties) or by using different distance measures. The Gustafson–Kessel clustering algorithm [6] has often been applied to identify Takagi–Sugeno fuzzy systems that
are based on local linear models [2]. The main drawback of this algorithm is that only clusters with approximately equal volumes can be properly identified, a constraint that makes the application of this algorithm problematic for the task of this paper. To circumvent this problem, in this paper the Gath–Geva algorithm [5] is applied, as described in the following subsection.

3.1 Gath–Geva Clustering of the Data
The objective of clustering is to partition a data set Z into c clusters. The available identification data, Z^T = [X y], is formed by concatenating the regression data matrix X and the output vector y:

\[ X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \tag{6} \]

This means each observation consists of m + n + 1 variables, grouped into an (m + n + 1)-dimensional column vector z_k = [x_{1,k}, . . . , x_{n+m,k}, y_k]^T = [x_k^T y_k]^T. Through clustering, the fuzzy partition matrix U = [µ_{i,k}]_{c×N} is obtained, whose element µ_{i,k} represents the degree of membership of the observation z_k in cluster i = 1, . . . , c. In this paper, c is assumed to be known, based on prior knowledge, for instance. For methods to estimate or optimize c in the context of system identification, refer to [2].

The GG algorithm is based on the minimization of the sum of the weighted squared distances between the data points z_k and the cluster centers v_i, i = 1, . . . , c:

\[ J(Z, U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{i,k}^{m_w} D_{i,k}^2 \tag{7} \]

where V = [v_1, . . . , v_c] contains the cluster centers and m_w ∈ [1, ∞) is a weighting exponent that determines the fuzziness of the resulting clusters; it is often chosen as m_w = 2. The fuzzy partition matrix has to satisfy the following conditions:

\[ U \in \mathbb{R}^{c \times N} \;\text{with}\; \mu_{i,k} \in [0, 1], \; \forall i, k; \quad \sum_{i=1}^{c} \mu_{i,k} = 1, \; \forall k; \quad 0 < \sum_{k=1}^{N} \mu_{i,k} < N, \; \forall i. \tag{8} \]

Given the data Z, choose the number of clusters 1 < c < N, the weighting exponent m_w > 1 and a termination tolerance ε > 0. Initialize the partition matrix such that (8) holds. Repeat for l = 1, 2, . . .
Step 1: Calculate the cluster centers:

\[ v_i^{(l)} = \frac{\sum_{k=1}^{N} \left(\mu_{i,k}^{(l-1)}\right)^{m_w} z_k}{\sum_{k=1}^{N} \left(\mu_{i,k}^{(l-1)}\right)^{m_w}}, \quad 1 \le i \le c \tag{9} \]

Step 2: Compute the distance measure D²_{i,k}. The distance to the prototype is calculated based on the fuzzy covariance matrices of the clusters,

\[ F_i^{(l)} = \frac{\sum_{k=1}^{N} \left(\mu_{i,k}^{(l-1)}\right)^{m_w} \left(z_k - v_i^{(l)}\right)\left(z_k - v_i^{(l)}\right)^T}{\sum_{k=1}^{N} \left(\mu_{i,k}^{(l-1)}\right)^{m_w}}, \quad 1 \le i \le c \tag{10} \]

The distance function is chosen as

\[ D_{i,k}^2(z_k, v_i) = \frac{(2\pi)^{(m+n+1)/2} \sqrt{\det F_i}}{\alpha_i} \exp\left(\frac{1}{2}\left(z_k - v_i^{(l)}\right)^T F_i^{-1} \left(z_k - v_i^{(l)}\right)\right) \tag{11} \]

with the a priori probability

\[ \alpha_i = \frac{1}{N} \sum_{k=1}^{N} \mu_{i,k} \tag{12} \]

Step 3: Update the partition matrix:

\[ \mu_{i,k}^{(l)} = \frac{1}{\sum_{j=1}^{c} \left(D_{i,k}(z_k, v_i) / D_{j,k}(z_k, v_j)\right)^{2/(m_w-1)}}, \quad 1 \le i \le c, \; 1 \le k \le N \tag{13} \]

until ||U^{(l)} − U^{(l−1)}|| < ε.
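The iteration of Steps 1-3 can be sketched as follows. This is a simplified illustration: the random initialization and the stacked-matrix bookkeeping are our own choices, and the numerical safeguards of a production implementation (e.g., covariance regularization) are omitted:

```python
import numpy as np

def gath_geva(Z, c, m_w=2.0, eps=1e-6, max_iter=100, seed=0):
    """Gath-Geva fuzzy clustering, a sketch of the iteration in Section 3.1.

    Z is the N x (m+n+1) data matrix of observations z_k = [x_k^T, y_k];
    returns the partition matrix U (c x N), the centers V and the fuzzy
    covariance matrices F.
    """
    N, d = Z.shape
    rng = np.random.default_rng(seed)
    U = rng.random((c, N))
    U /= U.sum(axis=0)                      # enforce sum_i mu_ik = 1, cf. (8)
    for _ in range(max_iter):
        U_old = U.copy()
        W = U ** m_w                        # weights mu_ik^m_w
        # Step 1: cluster centers, eq. (9)
        V = (W @ Z) / W.sum(axis=1, keepdims=True)
        D2 = np.empty((c, N))
        F = np.empty((c, d, d))
        for i in range(c):
            # Step 2: fuzzy covariance (10) and GG distance (11)
            E = Z - V[i]
            F[i] = (W[i, :, None] * E).T @ E / W[i].sum()
            alpha = U[i].mean()             # a priori probability, eq. (12)
            inv_F = np.linalg.inv(F[i])
            maha = np.einsum('nd,de,ne->n', E, inv_F, E)
            D2[i] = ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(F[i]))
                     / alpha * np.exp(0.5 * maha))
        # Step 3: update the partition matrix, eq. (13)
        D = np.sqrt(D2)
        U = 1.0 / np.sum((D[:, None, :] / D[None, :, :]) ** (2 / (m_w - 1)),
                         axis=1)
        if np.linalg.norm(U - U_old) < eps:
            break
    return U, V, F
```

In practice the GG algorithm is sensitive to initialization; it is often started from a fuzzy c-means partition rather than a random one.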
3.2 Estimation of the R Threshold Coefficient
The collection of c clusters approximates the regression surface, as illustrated in Figure 1. Hence, the clusters can be approximately regarded as local linear subspaces. This is reflected by the smallest eigenvalues λ_{i,m+n+1} of the cluster covariance matrices F_i, which are typically orders of magnitude smaller than the remaining eigenvalues [2] (see Figure 2). The eigenvector corresponding to this smallest eigenvalue, t_{m+n+1}^i, determines the normal vector to the hyperplane spanned by the remaining eigenvectors of that cluster:

\[ \left(t_{m+n+1}^i\right)^T (z_k - v_i) = 0 \tag{14} \]

Similarly to the observation vector z_k = [x_k^T y_k]^T, the prototype vector is partitioned as v_i = [(v_i^x)^T v_i^y]^T into a vector v_i^x corresponding to the regressor x_k, and
Fig. 1. Example of clusters approximating the regression surface.
a scalar v_i^y corresponding to the output y. The smallest eigenvector is partitioned in the same way, t_{m+n+1}^i = [(t_{m+n+1}^{i,x})^T t_{m+n+1}^{i,y}]^T. By using these partitioned vectors, (14) can be written as

\[ \left[\left(t_{m+n+1}^{i,x}\right)^T \; t_{m+n+1}^{i,y}\right] \left([x_k^T \; y_k]^T - [(v_i^x)^T \; v_i^y]^T\right) = 0 \tag{15} \]

from which the parameters of the hyperplane defined by the cluster can be obtained:

\[ y_k = \underbrace{-\frac{1}{t_{m+n+1}^{i,y}} \left(t_{m+n+1}^{i,x}\right)^T}_{a_i^T} x_k + \underbrace{\frac{1}{t_{m+n+1}^{i,y}} \left(t_{m+n+1}^{i}\right)^T v_i}_{b_i} = a_i^T x_k + b_i \tag{16} \]
Fig. 2. Example of the smallest eigenvalues of the cluster covariance matrices.
Although the parameters have been derived from the geometrical interpretation of the clusters, it can be shown [2] that (16) is equivalent to the weighted total least-squares estimation of the consequent parameters, where each data point is weighted by the corresponding √µ_{i,k}.
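Under this interpretation, the local model parameters can be read directly off the eigenstructure of a cluster's covariance matrix; a minimal sketch of (16), where the function name and the synthetic check are our own:

```python
import numpy as np

def local_linear_params(F_i, v_i):
    """Hyperplane parameters a_i, b_i of one cluster, following eq. (16).

    The eigenvector of the cluster covariance F_i belonging to the
    smallest eigenvalue is the normal of the local hyperplane; splitting
    it into its regressor part t_x and output part t_y gives
    y = a_i^T x + b_i.
    """
    eigvals, eigvecs = np.linalg.eigh(F_i)   # eigenvalues in ascending order
    t = eigvecs[:, 0]                        # smallest-eigenvalue eigenvector
    t_x, t_y = t[:-1], t[-1]                 # partition [t_x^T, t_y]^T
    a_i = -t_x / t_y                         # slope, as in (16)
    b_i = t @ v_i / t_y                      # offset, as in (16)
    return a_i, b_i
```

For data lying near a plane y = 2x₁ − x₂ + 3, the recovered parameters are a_i ≈ [2, −1] and b_i ≈ 3, independently of the (arbitrary) sign of the eigenvector.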
The main contribution of this paper is that it suggests the application of an adaptive threshold function that takes into account the nonlinearity of the system. This means that, based on the result of the fuzzy clustering, a different R_k value is calculated for each input-output data pair. Since the optimal value of the threshold is R_k = ||df(x_k^{m,n})||_2, and the partial derivatives df(x_k^{m,n}) can be estimated from the shape of the clusters based on (16),

\[ df\left(x_k^{m,n}\right) \approx \sum_{i=1}^{c} \mu_{i,k} \left(-\frac{1}{t_{m+n+1}^{i,y}} t_{m+n+1}^{i,x}\right) \tag{17} \]

the threshold can be calculated as

\[ R_k = \left\| \sum_{i=1}^{c} \mu_{i,k} \frac{1}{t_{m+n+1}^{i,y}} t_{m+n+1}^{i,x} \right\|_2 \tag{18} \]
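Computing the adaptive thresholds R_k from a clustering result can then be sketched as follows (the helper name is our own; it reuses the eigenvector partition of (16)):

```python
import numpy as np

def adaptive_thresholds(U, F):
    """Adaptive FNN thresholds R_k of eq. (18) from a fuzzy clustering.

    U is the c x N partition matrix and F the stacked (c, d, d) cluster
    covariance matrices over z = [x^T, y]^T. Each cluster's local slope
    a_i comes from the smallest-eigenvalue eigenvector as in (16); R_k is
    the norm of the membership-weighted slope, eqs. (17)-(18).
    """
    slopes = []
    for F_i in F:
        _, eigvecs = np.linalg.eigh(F_i)   # eigenvalues in ascending order
        t = eigvecs[:, 0]                  # normal vector of the local hyperplane
        slopes.append(-t[:-1] / t[-1])     # a_i = -(t^{i,y})^{-1} t^{i,x}
    A = np.array(slopes)                   # shape (c, m+n)
    grads = U.T @ A                        # row k approximates df(x_k), eq. (17)
    return np.linalg.norm(grads, axis=1)   # R_k = ||df(x_k)||_2, eq. (18)
```

As a sanity check, for a single cluster fitted to data on the line y = 3x, every R_k reduces to the (constant) gradient magnitude 3.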
4 Application to a Continuous Polymerization Reactor

The following example illustrates identification using data from a model of a continuous polymerization reactor. The model describes the free-radical polymerization of methyl methacrylate with azobisisobutyronitrile as an initiator and toluene as a solvent. For further information on the details of this model and how it is derived, see [4]. The reaction takes place in a jacketed CSTR, and after some simplifying assumptions are made, the first-principles model is given by (19)–(23):

\[ \dot{x}_1 = 10(6 - x_1) - 2.4568 x_1 \sqrt{x_2} \tag{19} \]
\[ \dot{x}_2 = 80u - 10.1022 x_2 \tag{20} \]
\[ \dot{x}_3 = 0.024121 x_1 \sqrt{x_2} + 0.112191 x_2 - 10 x_3 \tag{21} \]
\[ \dot{x}_4 = 245.978 x_1 \sqrt{x_2} - 10 x_4 \tag{22} \]
\[ \dot{x}_5 = \frac{x_4}{x_3} \tag{23} \]

The dimensionless state variable x_1 refers to the monomer concentration, and x_4/x_3 is the number-average molecular weight (and also the output y). The input u is the dimensionless volumetric flow rate of the initiator. Since a model of the system is known, large amounts of data can be collected for analysis. For this example we apply a uniformly distributed random input over the range 0.007 to 0.015 with a sampling time of 0.2. Driving the system with this input signal produces an output roughly in the range of 26,000 to 34,000, which is the desired operating range of the system. The model with output order m = 1 and input order n = 2 should give an accurate estimate of future outputs, because the MARS algorithm constructs an accurate model for this problem [9]. Table 1 shows the results of the proposed algorithm with c = 6 clusters. The value at m = 2 and n = 1 is sufficiently small, but larger input and output orders are acceptable, too [9]. The clustering algorithm has one parameter: c, the
Table 1. FNN results (% FNN) for polymerization data when R is obtained by fuzzy clustering

Input          Output Delays (m)
Delays (n)      0      1      2      3      4
    0        100.00  99.87  62.40  38.14  17.25
    1         99.19  69.14   8.76   0.54   0.54
    2         73.45   3.77   1.89   0.67   0.00
    3          8.76   0.40   0.14   0.00   0.00
    4          0.14   0.40   0.13   0.00   0.00
number of clusters. As a general rule, increasing this parameter increases the accuracy of the model. To avoid overfitting and an increasing computational requirement, it is recommended to determine the number of clusters automatically; for this purpose the method of Gath and Geva can be applied [5], [2]. For comparison, the next table shows the results when a constant threshold is used. In this case the value of R has been estimated based on the parameters of a linear ARX model identified from the data used for clustering.

Table 2. FNN results (% FNN) for polymerization data when R is obtained based on the parameters of a linear ARX model

Input          Output Delays (m)
Delays (n)      0      1      2      3      4
    0        100.00  99.73  97.04  92.32  84.64
    1         99.60  68.19   9.97   1.48   0.94
    2         73.72   5.52   4.45   2.02   0.14
    3          9.57   2.16   1.08   1.48   0.27
    4          1.48   1.08   1.35   0.94   0.27
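For reference, the data-generation experiment of this section can be reproduced approximately as follows. The ODEs (19)-(22) and the input range are taken from the text; the fixed-step RK4 integrator, the initial state and the zero-order-hold input are our own assumptions (x₅ is omitted since it does not feed back into the other states, and the output is y = x₄/x₃):

```python
import numpy as np

def simulate_polymerization(T=1000, dt=0.2, seed=0):
    """Simulate the CSTR model (19)-(22) and return (u, y), y = x4/x3.

    The input is a uniformly distributed random signal on [0.007, 0.015],
    held constant over each sampling interval of length 0.2, as described
    in the text. Integration scheme and initial state are assumptions.
    """
    def f(x, u):
        x1, x2, x3, x4 = x
        s = np.sqrt(max(x2, 0.0))
        return np.array([
            10.0 * (6.0 - x1) - 2.4568 * x1 * s,             # (19)
            80.0 * u - 10.1022 * x2,                         # (20)
            0.024121 * x1 * s + 0.112191 * x2 - 10.0 * x3,   # (21)
            245.978 * x1 * s - 10.0 * x4,                    # (22)
        ])

    rng = np.random.default_rng(seed)
    x = np.array([5.6, 0.09, 0.005, 40.0])  # rough steady state (assumed)
    us = np.empty(T)
    ys = np.empty(T)
    for k in range(T):
        u = rng.uniform(0.007, 0.015)
        # one classical RK4 step over the sampling interval
        k1 = f(x, u)
        k2 = f(x + 0.5 * dt * k1, u)
        k3 = f(x + 0.5 * dt * k2, u)
        k4 = f(x + dt * k3, u)
        x = x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        us[k] = u
        ys[k] = x[3] / x[2]   # y = x4 / x3, number-average molecular weight
    return us, ys
```

The resulting (u, y) records can then be fed to the regressor construction of Section 2 and the clustering of Section 3.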
It can be seen that this linear model based method does not give conspicuously incorrect results, but it induces larger errors for highly nonlinear systems because it yields a less accurate approximation. Hence, the application of the proposed clustering based approach is much more advantageous.
5 Conclusions

By determining the smallest regression vector dimension that allows accurate prediction of the output, the FNN algorithm is a useful tool for linear as well as nonlinear systems. It reduces the overall computational effort and simplifies nonlinear identification, which becomes difficult, and not necessarily accurate, when a false regression vector is used. To increase the efficiency of FNN, this paper proposed the application of a clustering algorithm. The advantage of our approach is that it remains in the horizon of the data, and there is no need to apply nonlinear model identification tools to determine the threshold parameter of the FNN algorithm.
Acknowledgement

The financial support of the Hungarian Ministry of Culture and Education (FKFP-0073/2001) and the Hungarian Science Foundation (T037600) is gratefully acknowledged. Janos Abonyi is grateful for the financial support of the Janos Bolyai Research Fellowship of the Hungarian Academy of Sciences.
References

1. H. Akaike. A new look at the statistical model identification. IEEE Trans. on Automatic Control, 19:716–723, 1974.
2. R. Babuška. Fuzzy Modeling for Control. Kluwer Academic Publishers, Boston, 1998.
3. J.D. Bomberger and D.E. Seborg. Determination of model order for NARX models directly from input–output data. Journal of Process Control, 8:459–468, Oct–Dec 1998.
4. F.J. Doyle, B.A. Ogunnaike, and R.K. Pearson. Nonlinear model-based control using second-order Volterra models. Automatica, 31:697, 1995.
5. I. Gath and A.B. Geva. Unsupervised optimal fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7:773–781, 1989.
6. D.E. Gustafson and W.C. Kessel. Fuzzy clustering with fuzzy covariance matrix. In Proceedings of the IEEE CDC, San Diego, pages 761–766, 1979.
7. M.B. Kennel, R. Brown, and H.D.I. Abarbanel. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Physical Review A, 45:3403–3411, 1992.
8. G. Liang, D.M. Wilkes, and J.A. Cadzow. ARMA model order estimation based on the eigenvalues of the covariance matrix. IEEE Trans. on Signal Processing, 41(10):3003–3009, 1993.
9. C. Rhodes and M. Morari. Determining the model order of nonlinear input/output systems. AIChE Journal, 44:151–163, 1998.