Comparison of fuzzy and Volterra series nonlinear system modeling approaches

Musa H. Asyali (1) and Musa Alci (2)

(1) Dept. of Computer Engineering, Yasar University, Izmir, Turkey, [email protected]
(2) Dept. of Electrical and Electronics Engineering, Ege University, Izmir, Turkey, [email protected]
In this study, we investigated the modeling performance of two popular nonlinear system identification methods, namely fuzzy modeling and Volterra series. No general approach to nonlinear structure modeling exists in the literature; fuzzy models and Volterra series are therefore of interest and widely used, as both can approximate a large class of nonlinear functions. In fuzzy modeling, a dynamic system is modeled using a set of fuzzy membership functions and rules, and the model parameters are trained using optimization techniques. In the Volterra series approach, the dynamic system is modeled using a set of kernel functions that represent the first and higher order convolutions. The kernel functions are typically estimated by an orthogonal expansion over a set of suitable basis functions, such as Laguerre functions. We compared the modeling performance of these approaches on a hypothetical test system whose kernels, or structure, are known a priori, and observed that Volterra modeling based on Laguerre basis expansion of the kernels offers better performance.
Kenan Taş et al. (eds), Mathematical Methods in Engineering, 335–345. © 2007 Springer. Printed in the Netherlands.

1 Introduction

The use of fundamental models for nonlinear systems is not practical in most cases, as models of realistic complexity may often involve hundreds of nonlinear differential equations. Therefore, empirical models obtained from input-output data, such as the Volterra series approach, are commonly used in nonlinear system identification [WK03, AJ05]. Another empirical technique that has gained popularity in recent years is the fuzzy modeling approach [SZL95]. Both approaches are important because they can approximate a large class of nonlinear systems. In this study, we aim at comparing the modeling performance, or efficiency, of these two approaches on a test system. Such a comparison does not exist in the literature. Therefore, the motivation behind this study is to address the question of which approach performs better. This will help modeling scientists select the better performing approach. We define modeling efficiency as the ability of a model to reduce the estimation error using as few parameters as possible. In this context, we used the minimum description length criterion to compare the efficiencies of the two approaches.

In Section 2, we briefly review the Volterra series and fuzzy modeling approaches. In Section 3, we describe the test data used in our comparison and present the results. Finally, in Section 4, we discuss our results and make some concluding remarks.
2 Methods

2.1 Volterra series

Volterra has shown that the output or response y(t) of a nonlinear time-invariant system to an input or stimulus x(t) can be expressed by the following multiconvolution relation [WK03, AJ05, Mar93]:

\[
y(t) = C_{dc} + \sum_{i=1}^{\infty} \int_{0}^{\infty} \cdots \int_{0}^{\infty} x(t-\tau_1) \cdots x(t-\tau_i)\, H_i(\tau_1, \ldots, \tau_i)\, d\tau_1 \cdots d\tau_i \tag{1}
\]
The sum given by (1) is known as the Volterra series. Here, Hi(τ1, ..., τi) denotes the system's ith order Volterra kernel, which is associated with the system's ith order nonlinearity. Cdc is a constant term that balances the means of the two sides. For a linear system, Cdc is 0 and the remainder of the right-hand side of (1) reduces to the well-known convolution integral, in which case H1(τ1) is called the impulse response.

As we will be using numerical techniques for kernel estimation, we need to translate equation (1) into discrete time. We do this by assuming a default sampling period of 1 s and an input-output data length of N points. If we denote the input and output sequences by x[n] and y[n] (n = 0, 1, ..., N − 1), discretization of (1) gives

\[
y[n] = C_{dc} + \sum_{m=0}^{N_1-1} x[n-m]\, H_1[m] + \sum_{m_1=0}^{N_2-1} \sum_{m_2=0}^{N_2-1} x[n-m_1]\, x[n-m_2]\, H_2[m_1, m_2] + e[n] \tag{2}
\]

where H1 and H2 denote the 1st and 2nd order kernels and N1 and N2 their lengths. Notice that, during the conversion to discrete time, we included kernels only up to 2nd order and assumed that the contribution of the higher order kernels is embedded in the error sequence e[n].
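To make the structure of (2) concrete, the following Python sketch evaluates the truncated model for given kernels, with the error term e[n] left out. It is only an illustration and not the code used in this study; the function name volterra2_output and the convention that the input is zero before n = 0 are assumptions made here.

```python
import numpy as np


def volterra2_output(x, h1, h2, c_dc=0.0):
    """Evaluate the 2nd order model (2) without the error term e[n].

    x    : input sequence of length N (assumed zero for n < 0)
    h1   : first-order kernel, length N1
    h2   : second-order kernel, shape (N2, N2)
    c_dc : constant term Cdc
    """
    x = np.asarray(x, dtype=float)
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    n1, n2 = len(h1), h2.shape[0]
    n_lag = max(n1, n2)
    # Zero-pad the past so that x[n - m] is defined for every lag m.
    xp = np.concatenate([np.zeros(n_lag - 1), x])
    y = np.full(len(x), float(c_dc))
    for n in range(len(x)):
        past = xp[n:n + n_lag][::-1]             # past[m] == x[n - m]
        y[n] += h1 @ past[:n1]                   # first-order convolution
        y[n] += past[:n2] @ h2 @ past[:n2]       # second-order double sum
    return y
```

With H2 set to zero the function reduces to an ordinary discrete convolution of x with H1, which mirrors the linear special case noted after (1).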
Given a system with memory M, our sample input-output data must have a length greater than M, i.e. N > M, in order to capture the system dynamics. We need some prior knowledge in order to make a reasonable guess for the memory of the system. The kernel lengths N1 and N2 cannot exceed the data length N, i.e. N > N1 and N > N2. Further, both N1 and N2 must be at least M. From (2) we note that H1 has N1 values and H2, which can be taken symmetric in its arguments, has N2(N2 + 1)/2 distinct values to be estimated; the number of values in the kernels thus increases exponentially with the order. Therefore, to keep the number of parameters at a reasonable level, one can select kernel lengths N1 and N2 such that N > N1 ≥ N2 ≥ M.

We note that even a 2nd order Volterra model is highly parameterized, and this may make the parameter estimates highly sensitive to noise. One way to alleviate this difficulty is to project the kernels onto a small number of orthogonal basis functions. The key issue here is to utilize basis functions that are morphologically similar to the kernels of the system under study. This enables accurate representation of the kernels with a relatively small number of basis functions, which implies a reduction in the number of parameters to be estimated. Estimating a smaller number of parameters may improve the numerical conditioning of the estimation problem and produce coefficient estimates with less variance, and hence a more reliable model. The series expansion utilizing discrete orthogonal Laguerre basis functions (LBF) has been used widely [WK03]. Discrete LBF are most conveniently defined by their z-transform:

\[
L_q[m] \;\xrightarrow{\;z\text{-transform}\;}\; L_q[z] = \sqrt{1-\xi^2}\; \frac{z}{z-\xi} \left( \frac{1-\xi z}{z-\xi} \right)^{q} \tag{3}
\]
Here, ξ is the pole parameter (0 < ξ < 1) that determines how soon the LBF die away, and q is the order of the basis function. As q increases, the functions become more oscillatory and prolonged. Therefore, ξ must be chosen in accordance with the memory of the system. Given a guessed value for the memory and the highest order of LBF to be used to represent the kernels, the most suitable value for ξ can be calculated easily using (3). This is the approach typically employed in the literature. However, accurately knowing the memory of the system in advance is rarely possible. We therefore employ a different technique, based on simplex optimization, to select an optimal value for ξ in our kernel estimation studies. This issue is further explained in Section 3.2. For a more detailed treatment of the LBF, we refer the reader to [WK03, AJ05, Mar93].

We will now show that the Volterra kernel estimation problem can be formulated as a multiple regression problem using LBF expansion and can therefore be solved using least squares estimation. We return to (2), where we employ two kernels to explain the nonlinear dynamics of a system. Even if the system under study has higher order nonlinearities (kernels), we can still find out how well its behavior can be approximated with a 2nd order Volterra model. This is why this approach is sometimes referred to as the truncated Volterra series. We first expand H1 and H2 using the LBF Lq, q being the order, as

\[
H_1[m] \cong \sum_{q=0}^{Q_1-1} C_q L_q[m], \qquad m = 0, 1, \ldots, N_1-1 \tag{4}
\]
\[
H_2[m_1, m_2] \cong \sum_{q_1=0}^{Q_2-1} \sum_{q_2=0}^{Q_2-1} C_{q_1,q_2}\, L_{q_1}[m_1]\, L_{q_2}[m_2] \tag{5}
\]
Here, Cq and Cq1,q2 are coefficients or weights, and Q1 and Q2 are the numbers of basis functions used in the expansion of H1 and H2, respectively. Selection of proper values for Q1 and Q2 is of crucial importance. We will discuss this issue in Section 3.2. If we substitute (4) and (5) in (2) and define the convolution of x with Lq as

\[
v_q[n] = \sum_{m=0}^{N-1} x[n-m]\, L_q[m],
\]
we can express y[n] as

\[
y[n] = C_{dc} + \sum_{q=0}^{Q_1-1} C_q v_q[n] + \sum_{q_1=0}^{Q_2-1} \sum_{q_2=0}^{Q_2-1} C_{q_1,q_2}\, v_{q_1}[n]\, v_{q_2}[n] + e[n] \tag{6}
\]
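The quantities entering (6) can be generated numerically. The sketch below is a minimal illustration under stated assumptions rather than the implementation used in this study: it obtains the LBF from a time-domain recursion derived here from (3) by writing the all-pass factor as (z^-1 − ξ)/(1 − ξ z^-1), and it computes vq[n] assuming x[n] = 0 for n < 0. The function names laguerre_basis and filtered_inputs are hypothetical.

```python
import numpy as np


def laguerre_basis(num_funcs, length, xi):
    """Return L with L[q, m] = Lq[m] for q < num_funcs and m < length."""
    if not 0.0 < xi < 1.0:
        raise ValueError("xi must lie in (0, 1)")
    L = np.zeros((num_funcs, length))
    # q = 0: sqrt(1 - xi^2) * z / (z - xi)  <->  sqrt(1 - xi^2) * xi**m
    L[0] = np.sqrt(1.0 - xi ** 2) * xi ** np.arange(length)
    for q in range(1, num_funcs):
        # Each extra all-pass factor gives
        # Lq[m] = xi * Lq[m-1] + L(q-1)[m-1] - xi * L(q-1)[m].
        L[q, 0] = -xi * L[q - 1, 0]
        for m in range(1, length):
            L[q, m] = xi * L[q, m - 1] + L[q - 1, m - 1] - xi * L[q - 1, m]
    return L


def filtered_inputs(x, L):
    """Return v with v[q, n] = sum_m x[n - m] * Lq[m], x[n] = 0 for n < 0."""
    x = np.asarray(x, dtype=float)
    return np.array([np.convolve(x, Lq)[:len(x)] for Lq in L])
```

For a basis long enough that the functions have decayed, the rows of L are numerically orthonormal, which is the property that makes the expansion coefficients in (4) and (5) well separated.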
We should note here that the error terms in (6) and (2) are slightly different. The error term in (6) includes not only the missing and/or ignored contribution of higher order kernels to the output but also the error introduced by the approximate kernel expansions (4) and (5), which are substituted in (2). By further defining column vectors corresponding to the output, convolution, and error sequences respectively as y = [y[0] ... y[N − 1]]^T, vq = [vq[0] ... vq[N − 1]]^T, and e = [e[0] ... e[N − 1]]^T, we can put (6) into matrix form as

\[
\mathbf{y} = \mathbf{V}\mathbf{C} + \mathbf{e}, \tag{7}
\]

where

\[
\mathbf{V} = \left[\, [1 \ldots 1]^T \;\middle|\; \mathbf{v}_0\ \mathbf{v}_1 \ldots \mathbf{v}_{Q_1-1} \;\middle|\; \mathbf{v}_{0,0}\ \mathbf{v}_{0,1} \ldots \mathbf{v}_{Q_2-1,Q_2-1} \,\right]
\]

is the N × P observation matrix formed by using the vq's and their element-wise multiplicative combinations vq1,q2 = vq1 ∘ vq2, and

\[
\mathbf{C} = \left[\, C_{dc} \;\middle|\; C_0\ C_1 \ldots C_{Q_1-1} \;\middle|\; C_{0,0}\ 2C_{0,1} \ldots C_{Q_2-1,Q_2-1} \,\right]^T
\]

is the P × 1 vector of coefficients. The column of 1's in V allows for the estimation of the constant term Cdc. Since vq1,q2 = vq2,q1, we collected similar terms in the expansion of H2 in (5) and doubled the corresponding coefficient in vector C; hence P = 1 + Q1 + Q2(Q2 + 1)/2. The over-determined system of equations given in (7) can be solved conveniently for C using the least squares technique. Inclusion of the constant term in the regression ensures that the error sequence has zero mean.
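The regression in (7) can be sketched as follows. This is an illustrative example under stated assumptions, not the implementation used in this study: the filtered inputs vq are assumed to be available as the rows of an array v, the generic solver numpy.linalg.lstsq stands in for whichever least squares routine is preferred, and the function names are hypothetical. The estimated entries for q1 ≠ q2 correspond to the doubled coefficients in C and are halved to recover the symmetric Cq1,q2.

```python
import numpy as np


def build_observation_matrix(v, q1, q2):
    """Assemble V = [1 | vq, q < Q1 | vq1 * vq2, q1 <= q2 < Q2]."""
    v = np.asarray(v, dtype=float)
    n_pts = v.shape[1]
    pairs = [(a, b) for a in range(q2) for b in range(a, q2)]
    cross = [v[a] * v[b] for a, b in pairs]      # element-wise products
    V = np.column_stack([np.ones(n_pts), *v[:q1], *cross])
    return V, pairs


def estimate_coefficients(y, v, q1, q2):
    """Solve y = V C + e in the least squares sense."""
    V, pairs = build_observation_matrix(v, q1, q2)
    theta, *_ = np.linalg.lstsq(V, np.asarray(y, dtype=float), rcond=None)
    c_dc = theta[0]
    c1 = theta[1:1 + q1]
    c2 = np.zeros((q2, q2))
    for (a, b), value in zip(pairs, theta[1 + q1:]):
        # Off-diagonal entries of theta hold 2 * C[a, b]; split them evenly.
        c2[a, b] = c2[b, a] = value if a == b else value / 2.0
    return c_dc, c1, c2
```

The number of columns of V equals P = 1 + Q1 + Q2(Q2 + 1)/2, so the size of the regression grows with the number of basis functions rather than with the kernel lengths N1 and N2.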
Once the coefficients are estimated, they are substituted in the expansions (4) and (5) and the kernels are constructed. Extension of this technique to the estimation of higher order kernels is straightforward.

2.2 Fuzzy modeling

Fuzzy models are based on the concept of fuzzy logic, a notion which extends human decision making practices or heuristics into a formal system modeling and/or identification platform. Using fuzzy models, one can formulate a mapping from a given input to an output using the following elements: fuzzifier, inference engine, defuzzifier, and rule base [Le90, LLW97]. The rule base consists of linguistic statements such as

\[
\text{If } x_1 = A_1^l \text{ and } x_2 = A_2^l \text{ and } \ldots \text{ and } x_n = A_n^l, \text{ then } y = B^l \tag{8}
\]
where A_1^l, ..., A_n^l are the fuzzy sets represented by the input membership functions, B^l are the fuzzy sets represented by the output membership functions, and l = 1, 2, ..., M is the rule index. Fuzzy models have been successfully applied in fields such as automatic control, expert systems, computer vision, and data clustering/classification.

Two types of fuzzy inference systems are commonly used in practice: Mamdani-type and Sugeno-type. Mamdani's model uses fuzzy sets in both the antecedent and consequent parts of the rules. Sugeno has shown that it is also possible to use crisp functions as the output membership functions rather than distributed fuzzy sets. This approach (also called the Takagi-Sugeno-Kang model) enhances the efficiency of the defuzzification process, as it requires less computation than the Mamdani method. A comprehensive survey of the many other ways proposed to implement fuzzy rules and models can be found in [Le90].

Using a singleton fuzzifier, product inference engine, center average defuzzifier, and Gaussian membership functions, the fuzzy model with respect to the given rule base is expressed in [Wan94, Wan97] as

\[
f(\mathbf{x}, \bar{x}_i^l, \sigma_i^l, \bar{y}^l) = \frac{\sum_{l=1}^{M} \bar{y}^l \prod_{i=1}^{n} \mu_{A_i^l}(x_i, \bar{x}_i^l, \sigma_i^l)}{\sum_{l=1}^{M} \prod_{i=1}^{n} \mu_{A_i^l}(x_i, \bar{x}_i^l, \sigma_i^l)} \tag{9}
\]

where M is the number of rules, n is the number of inputs, the ȳ^l parameters represent the centers of the output membership functions B^l, the x̄_i^l parameters represent the centers of the input membership functions A_i^l, and the σ_i^l parameters represent the input membership function widths. Since fuzzy models are parametric, we can use optimization tools to calculate or train the system parameters. During the optimization, the following performance criterion is minimized:

\[
E = \frac{1}{2} \sum_{n=0}^{N-1} \left( y[n] - \tilde{y}[n] \right)^2 \tag{10}
\]
Here, N is the number of input/output pairs, and y and ỹ denote the actual and estimated output values, respectively. In order to determine the parameters of the fuzzy model, the fuzzy system is represented as a feedforward network. In our estimations, the Levenberg-Marquardt algorithm with the Fletcher strategy is used for tuning the parameters [KA05].
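As an illustration of (9) and (10), the following sketch evaluates the fuzzy model for one input vector and computes the performance criterion. It is a sketch under stated assumptions: the Gaussian membership function is taken here as exp(−((x − x̄)/σ)²), which is one common form and is not confirmed by the text above, the function names are hypothetical, and the Levenberg-Marquardt training step is not shown.

```python
import numpy as np


def fuzzy_output(x, centers, widths, y_centers):
    """Evaluate (9) for a single input vector x.

    x         : shape (n,)    input vector
    centers   : shape (M, n)  input membership function centers
    widths    : shape (M, n)  input membership function widths
    y_centers : shape (M,)    output membership function centers
    """
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    widths = np.asarray(widths, dtype=float)
    y_centers = np.asarray(y_centers, dtype=float)
    # Gaussian membership grades (assumed form, see the note above).
    mu = np.exp(-(((x - centers) / widths) ** 2))
    firing = mu.prod(axis=1)                     # product inference per rule
    # Center average defuzzifier: firing-weighted mean of output centers.
    return float(firing @ y_centers / firing.sum())


def performance_criterion(y, y_est):
    """Half the sum of squared errors, as in (10)."""
    y = np.asarray(y, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    return 0.5 * float(np.sum((y - y_est) ** 2))
```

With a single rule the output equals that rule's output center regardless of the input, and for two rules placed symmetrically about the input the output is the mean of the two output centers.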
3 Experimental data and results

3.1 Experimental data

We carried out the modeling performance comparison study on synthetic data obtained using the following linear-nonlinear Wiener cascade (Fig. 1).
Fig. 1. Generation of synthetic test data.
This cascade corresponds to a 2nd order nonlinear system. Expressed algebraically, the relationship between input x and output y is y[n] = y1[n] + y1[n]^2, where y1[n] = x[n − 5] − 0.35 y1[n − 1]. According to this formulation, the Volterra kernels of our test system are as follows 0 , n