System Identification Using Overparametrized State-Space Models

T. McKelvey
Department of Electrical Engineering
Linkoping University
S-581 83 Linkoping, Sweden
E-mail: [email protected]

Abstract
In this report we consider identification of linear time-invariant finite-dimensional systems using state-space models. We introduce a new model structure which is fully parametrized, i.e. all matrices are filled with parameters. All multivariable systems of a given order can be described within this model structure, which relieves us from the internal structural issues otherwise inherent in the multivariable state-space identification problem. The models are obtained with an identification algorithm that minimizes a regularized prediction error criterion. An analysis shows that the proposed model structure retains the statistical properties of the standard identifiable model structures. We prove, under some mild assumptions, that the proposed identification algorithm converges locally to the set of true systems. Inclusion of an additional step in the algorithm gives convergence to a balanced realization of the true system. Some results on the sensitivity of the transfer function with respect to the parameters of a given realization are reviewed; they show that balanced realizations have low sensitivity. We show that for one particular choice of regularization the obtained model is a norm-minimal realization. Examples using both real and simulated data illustrate the properties of the proposed model structure.
1 Introduction

In this report we consider identification of multivariable linear time-invariant systems. We assume that we have no structural information and thus have to resort to so-called black-box models. These can be either of input-output type or of state-space type. Several reasons favor the state-space form:

- They admit economical implementation as filtering and prediction algorithms.
- They have simple relationships between simulation and prediction applications.
- Many control design methods and algorithms, including the recent Doyle-Glover algorithm solving the H∞ optimal control problem [2], use state-space descriptions of the plants to be controlled.

From a computer implementation point of view it is important to use models which have low sensitivity with respect to round-off errors and finite word length effects. Input-output descriptions of some systems are known to be very sensitive to these effects [29]. We will mainly concentrate on the parametrization problem of multivariable state-space systems. We propose a fully parametrized state-space model structure which immediately solves the difficult parametrization problem and also opens possibilities to obtain models with good numerical properties. The only drawback is a higher computational burden, since more parameters are estimated compared to standard techniques using identifiable model structures.

The report is outlined as follows. In Section 2 we give a brief review of state-space models and their associated predictors. The parametrization of state-space model structures is then discussed and the fully parametrized model structure is introduced in Section 3. Section 4 is devoted to the prediction error minimization problem, and a regularized prediction error criterion is introduced. In Section 5 a statistical analysis is performed showing that the quality of the prediction for the fully parametrized model is the same as for a standard identifiable model. A convergence analysis shows that the identification algorithm is locally convergent. In Section 6 some results on the sensitivity of a transfer function are reviewed which show that a balanced realization has low sensitivity. The main identification algorithm, which yields a balanced realization of the true system, is presented in Section 7. In Section 8 we discuss a particular form of regularization and show that the estimated system will be a norm-minimal realization. Some identification examples using both real and simulated data are given in Section 9. Section 10 sums up the report.
2 State-Space Models of Linear Systems
A general time-invariant linear system S can be described as

    S :  y(t) = G_0(q) u(t) + H_0(q) e_0(t)    (1)

where y(t) is an output vector of dimension p, and u(t) is an input vector of dimension m and is assumed to be known. The unknown disturbances acting on the output are assumed to be generated by the second term, where e_0(t) is a noise vector of dimension p and is assumed to be a sequence of independent stochastic variables with E e_0(t) = 0 and E e_0(t) e_0(t)^T = Λ_0. The transfer functions can be characterized by their impulse responses as

    G_0(q) = Σ_{k=0}^{∞} g(k) q^{-k},    H_0(q) = I + Σ_{k=1}^{∞} h(k) q^{-k}    (2)
where H_0(q)^{-1} is assumed to be stable. We will also assume that g(0) = 0, i.e. the direct term from u(t) to y(t) is missing. Inclusion of the direct term is straightforward and does not change any of the results to be presented. Furthermore, we will assume that the system is finite-dimensional, i.e. the transfer function can be written as a matrix where each entry is a rational function. A natural predictor ŷ(t) for the output y(t), given inputs u and outputs y up to time t-1, is

    ŷ(t) = H_0(q)^{-1} G_0(q) u(t) + [I - H_0(q)^{-1}] y(t).    (3)
Since the system (1) is of finite order it can also be described in a state-space model formulation by introducing an auxiliary state vector of dimension n, the order of the system. In order to estimate the dynamics of the system we introduce a model structure M, parametrized by a parameter vector θ. In this report we focus on state-space model structures in innovations form

    M :  x̂(t+1) = A(θ) x̂(t) + B(θ) u(t) + K(θ) e(t)
         y(t) = C(θ) x̂(t) + e(t)    (4)

where the matrices A, B, C, K are constructed from the parameter vector θ according to the model structure M. The model structure M is thus a mapping from the parameter space to the model (4). Let

    d_M = dim θ    (5)

be the dimension of the parameter vector and let M(θ) denote the model (4). The model will thus have the transfer functions
    G(q,θ) = C(θ)[qI - A(θ)]^{-1} B(θ)    (6)

and

    H(q,θ) = I + C(θ)[qI - A(θ)]^{-1} K(θ).    (7)

Using (4) the predictor is given by

    x̂(t+1) = [A(θ) - K(θ)C(θ)] x̂(t) + B(θ) u(t) + K(θ) y(t)
    ŷ(t|θ) = C(θ) x̂(t)    (8)

which can also be written as

    ŷ(t|θ) = W_u(q,θ) u(t) + W_y(q,θ) y(t)    (9)

where

    W_u(q,θ) = C(θ)[qI - A(θ) + K(θ)C(θ)]^{-1} B(θ)    (10)
    W_y(q,θ) = C(θ)[qI - A(θ) + K(θ)C(θ)]^{-1} K(θ).    (11)

In order to use the predictor (8) we have to ensure stability of the two transfer functions W_u and W_y, and we make the following definition.

Definition 2.1  Let the set of parameters D_M ⊂ R^{d_M} be

    D_M = { θ ∈ R^{d_M} | W_u(q,θ) and W_y(q,θ) are stable }.    □

From the structure of the state-space predictor (8) it follows trivially that for the general state-space model structure (4) we here have

    D_M = { θ ∈ R^{d_M} | the matrix A(θ) - K(θ)C(θ) has all eigenvalues inside the open unit disc }.
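The predictor (8) and the stability condition defining D_M are straightforward to realize numerically. The following minimal sketch (in Python with NumPy; not part of the original report, and the function names are illustrative assumptions) checks membership of D_M and runs the predictor recursion for given matrices A, B, C, K and data u, y.

    import numpy as np

    def predictor_is_stable(A, K, C):
        """theta is in D_M if all eigenvalues of A - K C lie strictly inside the unit disc."""
        return np.max(np.abs(np.linalg.eigvals(A - K @ C))) < 1.0

    def one_step_predictions(A, B, C, K, u, y):
        """Run the predictor (8): xhat(t+1) = (A - K C) xhat(t) + B u(t) + K y(t),
        yhat(t) = C xhat(t).  u has shape (N, m), y has shape (N, p)."""
        N = u.shape[0]
        n, p = A.shape[0], C.shape[0]
        xhat = np.zeros(n)
        yhat = np.zeros((N, p))
        Abar = A - K @ C
        for t in range(N):
            yhat[t] = C @ xhat
            xhat = Abar @ xhat + B @ u[t] + K @ y[t]
        return yhat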
Definition 2.2  Two models, M_1(θ_1) and M_2(θ_2), are equal if and only if

    G_1(e^{iω}, θ_1) = G_2(e^{iω}, θ_2)  and  H_1(e^{iω}, θ_1) = H_2(e^{iω}, θ_2)  for almost all ω.    (12)    □

To simplify the reasoning about different models we will use the term model set, denoted by ℳ, see [13]. A model structure M together with a parameter space gives us a model set:

    ℳ = { M(θ) | θ ∈ D_M }.    (13)
3 Parametrization of State-Space Models

The parametrization of models to be used in system identification is often closely tied to the concept of identifiability.

Definition 3.1 [13]  A model structure M is globally identifiable at θ* if

    M(θ) = M(θ*)  ⇒  θ = θ*.    (14)    □

An identifiable model structure thus has a one-to-one correspondence between the models, i.e. the transfer functions, and the value of the parameter vector θ.
Traditionally, see [13, 27], model structures which are identifiable have been favored for the purpose of system identification. The use of an identifiable model structure has some clear advantages: the criterion V_N(θ) has a unique minimum if the data is informative, numerical algorithms perform well, etc. However, the parametrization of multivariable state-space models is a well-known and notoriously difficult problem in system identification, see e.g. [9], [17] or [31]. The root of the problem is that there is no single, smooth, identifiable parametrization of all multi-output systems (4). Typically one has to work with a large number of different possible parametrizations depending on the internal structure of the system. Also, for some systems it is difficult to find an identifiable parametrization which is well conditioned [14]. As an example of an identifiable model structure we have the following (a construction sketch is given after this definition):

    Let A(θ) initially be a matrix filled with zeros and with ones along the
    superdiagonal. Then let the rows with numbers r_1, r_2, ..., r_p, where
    r_p = n, be filled with parameters. Take r_0 = 0, let C(θ) be filled with    (15)
    zeros, and then let row i have a one in column r_{i-1} + 1. Let B(θ) and
    K(θ) be filled with parameters.

Denote this model structure M_I. This parametrization is uniquely characterized by the p numbers r_i, which are to be chosen by the user. Instead of r_i we will use the numbers ν_i = r_i - r_{i-1} and call

    ξ_n = {ν_1, ..., ν_p}    (16)

the multi-index associated with the model structure (15).
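As an illustration, the fixed structure of A(θ) and C(θ) in (15) can be constructed as in the Python sketch below (not from the original report; the function and argument names are assumptions, and the free parameters of B(θ) and K(θ) are simply filled in elsewhere).

    import numpy as np

    def build_identifiable_A_C(theta_rows, r, n, p):
        """Construct A and C of the identifiable structure (15).
        r: row indices r_1 < ... < r_p with r_p = n (1-based, as in the text).
        theta_rows: (p, n) array with the free parameters of rows r_1, ..., r_p of A."""
        A = np.diag(np.ones(n - 1), k=1)        # zeros with ones along the superdiagonal
        for i, ri in enumerate(r):
            A[ri - 1, :] = theta_rows[i]        # parameter rows overwrite the shift structure
        C = np.zeros((p, n))
        r_prev = [0] + list(r[:-1])             # r_0 = 0
        for i in range(p):
            C[i, r_prev[i]] = 1.0               # row i has a one in column r_{i-1} + 1
        return A, C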
Theorem 3.1  The state-space model structure M_I defined by (15) is globally identifiable at θ* if and only if {A(θ*), [B(θ*) K(θ*)]} is controllable.

Proof. See [13], Theorem 4A.2.    □

From the definition of the identifiable model structure (15) it follows that the dimension of the parameter vector is

    d_{M_I} = nm + 2np.    (17)
For a given system order n and number of outputs p there exist (n-1 choose p-1) different multi-indices. We also have from [13]:

Theorem 3.2  Any linear system that can be represented in state-space form of order n can also be represented in the particular form (15) for some multi-index ξ_n.    □

Thus, if we consider the union of all model sets ℳ_{ξ_n} generated by the model structures given by all different multi-indices, we have

    S ∈ ∪_{ξ_n} ℳ_{ξ_n}

for all possible linear systems S of order n. When identifying a multivariable system of a given order n using an identifiable parametrization we can, in principle, not use one single parametrization and thus have to search for the best model among all model structures defined by the multi-indices. However, as pointed out in [13], the different model structures overlap considerably, and one particular multi-index is capable of describing almost all n-dimensional linear systems. The price paid is that the identification might result in a numerically ill-conditioned model. The concept of identifiable model structures is important if we are interested in the values of the parameters themselves, e.g. in change detection techniques where a change in the parameters indicates a change in the underlying system. If we are only interested in the input/output relation G(q,θ), the value of each individual parameter is of no interest to us. The parameters can then be seen as vehicles to describe the interesting characteristics of the system. It is also known that some systems described in their canonical forms have transfer functions with a very high sensitivity with respect to the parameters and are thus sensitive to finite word length effects occurring in a computer [29]. Based on these arguments we will introduce a new model structure which circumvents these problems.
3.1 The Fully Parametrized State-Space Model

If we consider the state-space model (4) and choose to fill all the matrices A, B, C, K with parameters, we clearly over-parametrize the model and thus lose identifiability (14). To formalize:

    Let all the matrices A(θ), B(θ), C(θ) and K(θ) in the model (4) be filled
    with parameters from the parameter vector θ ∈ D_M, and let the              (18)
    corresponding model structure be called fully parametrized.

Denote this model structure M_F. The number of parameters needed for this model structure (18) is

    d_{M_F} = n^2 + 2np + nm.    (19)

The fully parametrized model structure thus has an additional n^2 parameters compared to an identifiable model structure (15), (17). For completeness we have:
Property 3.1  Any linear system S that can be represented in state-space form of order n can also be represented by a model from the fully parametrized model structure (18).    □

Using this model structure for the purpose of identification has two interesting implications. First, this state-space model structure relieves us from the search through the many different forms defined by the multi-indices when dealing with multivariable systems, since the proposed model structure trivially includes all possible systems of a given order n. Secondly, the quality of the estimated model might increase if we use a flexible model structure which not only can describe the underlying system but also allows a numerically well-conditioned description. This is a major difference compared with the identifiable model structures (15), which by definition only have one realization for each system S. Since we are confined to computers with limited accuracy for all calculations, it is important to be able to use models which are numerically well conditioned. It might be important to point out that even if the proposed model has more parameters than the corresponding identifiable model, the two models have exactly the same flexibility with respect to the transfer functions, which we formally can state as

    ℳ_F = ∪_{ξ_n} ℳ_{ξ_n}

where ℳ_F denotes the model set generated by the fully parametrized model structure (18).
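For completeness, a minimal sketch (Python, illustrative only) of how a fully parametrized parameter vector can be mapped to and from the matrices (A, B, K, C); the ordering follows the vec-stacking used later in (28), and the total length is d_{M_F} = n^2 + 2np + nm as in (19).

    import numpy as np

    def pack(A, B, K, C):
        """theta = [vec(A); vec(B); vec(K); vec(C)] with column-major stacking."""
        return np.concatenate([M.flatten(order="F") for M in (A, B, K, C)])

    def unpack(theta, n, m, p):
        """Recover (A, B, K, C) of the fully parametrized structure (18)."""
        sizes = [(n, n), (n, m), (n, p), (p, n)]
        mats, k = [], 0
        for (rows, cols) in sizes:
            mats.append(theta[k:k + rows * cols].reshape((rows, cols), order="F"))
            k += rows * cols
        A, B, K, C = mats
        return A, B, K, C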
4 Prediction Error Minimization

In order to investigate the properties of the proposed model structure we will focus on system identification techniques based on the minimization of the prediction errors (PEM). The standard setting can be described as follows: given an input sequence {u(t)} and an output sequence {y(t)} with t = 1, ..., N, denoted by Z^N, and a model structure M which defines a mapping between the parameter space D_M and a predictor ŷ(t|θ), find the value θ̂ which minimizes a criterion V_N(θ). Let us define the criterion to be

    V_N(θ) = (1/N) Σ_{t=1}^{N} |ε(t,θ)|^2    (20)

where |.| is the Euclidean l_2 norm. The prediction error is the vector

    ε(t,θ) = y(t) - ŷ(t|θ)    (21)

with the predictor ŷ(t|θ) given by (8). This problem formulation is well known and there exists a vast literature on how to minimize (20) and on the properties of the estimate

    θ̂_N = arg min_{θ ∈ D_M} V_N(θ)    (22)

under varying assumptions about the model structure M and the data Z^N. See [13] or [27] for a general discussion of the topic.
4.1 PEM for the overparametrized model

If we use (20) together with the proposed model structure M_F, the minimum of V_N(θ) will be a hyperplane in the parameter space and thus not unique. This follows from the fact that there exist infinitely many parameter values θ such that S = M(θ). In Section 5 we will discuss this property further. This non-uniqueness usually leads to convergence problems if standard optimization algorithms are used in the minimization of (20), since most algorithms use the inverse of the Hessian

    (d^2/dθ^2) V_N(θ)

and in this case the Hessian will be singular and non-invertible. One possibility to overcome this is to further constrain the solution. We will here focus on regularization, which is a standard technique for ill-conditioned estimation problems, see [3]. Regularization means that the criterion (20) is augmented with a term:

    W_N(θ) = V_N(θ) + (δ/2) |θ - θ^#|^2,    δ > 0    (23)

where |.| denotes the Euclidean norm, or more generally with a function r(θ, θ^#) having a positive definite Hessian. The regularization term (δ/2)|θ - θ^#|^2 has the effect that the resulting estimate

    θ̂_N = arg min_{θ ∈ D_M} W_N(θ)    (24)

will be a compromise between minimizing V_N(θ) and being close to the value θ^#. Since our objective is to find the θ which minimizes V_N(θ), it is clear that the choice of θ^# will influence the result. In the next section we will address this question together with the presentation of the identification algorithm.
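A minimal sketch of the regularized criterion (23), assuming the prediction errors ε(t,θ) have already been computed, e.g. with the predictor recursion sketched in Section 2 (Python; the names are illustrative assumptions):

    import numpy as np

    def regularized_criterion(theta, theta_sharp, delta, eps):
        """W_N(theta) = V_N(theta) + (delta/2) |theta - theta#|^2, with
        V_N(theta) = (1/N) sum_t |eps(t, theta)|^2; eps has shape (N, p)."""
        N = eps.shape[0]
        V_N = np.sum(eps ** 2) / N
        return V_N + 0.5 * delta * np.sum((theta - theta_sharp) ** 2)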
5 Analysis

5.1 Statistical analysis of the prediction quality

In the system identification literature the concept of parsimony [27] is considered to be very important: use the simplest model with as few parameters as possible. This concept can easily be justified by analyzing the statistical properties of the prediction error. The key result is that the variance of each estimated parameter in most cases increases if we estimate more parameters with a fixed amount of data, which degrades the quality of the estimated model, i.e. the variance of the prediction error increases. An identifiable state-space model of a given order has, by definition, a minimal number of parameters and thus satisfies this concept. Even though the fully parametrized model has more parameters than the corresponding identifiable model, we show in this section that regularization restores the statistical properties of an identifiable model. In [26] some statistical properties are developed for neural networks used as nonlinear predictor models. These results also apply to the fully parametrized model structure, as pointed out in [16]. In the sequel we perform the analysis focusing on fully parametrized state-space models.
Definition 5.1  Let the true system S be described by (1) and consider a model structure M of minimal order n such that S ∈ ℳ. Then the set of true parameters is defined as

    D_T = { θ ∈ D_M | G_0(z) = G(z,θ), H_0(z) = H(z,θ), ∀z }.    (25)    □

For an identifiable model structure (15) the set D_T contains only one point, which follows directly from (14) and Theorem 3.1. If the model structure is the fully parametrized state-space model structure (18), the elements in D_T are related as follows.

Lemma 5.1  Consider the fully parametrized model structure (18) with n states. Then for all θ_i ∈ D_T, i = 1, 2, there exists a unique invertible T ∈ R^{n x n} such that    (26)

    A(θ_1) = T^{-1} A(θ_2) T,  B(θ_1) = T^{-1} B(θ_2),
    C(θ_1) = C(θ_2) T,  K(θ_1) = T^{-1} K(θ_2).    (27)

Proof. See [9], Theorem 6.2.4.    □

Let the parameter vector be composed as

    θ = [ vec(A)^T  vec(B)^T  vec(K)^T  vec(C)^T ]^T    (28)

where vec(.) is the operator which forms a vector from a matrix by stacking its columns. For compatible matrices A, B, C the relation

    vec(ABC) = (C^T ⊗ A) vec(B)    (29)

holds, where ⊗ denotes the Kronecker product [6], defined for A of dimension m x n as the block matrix A ⊗ B = [a_{ij} B], i = 1, ..., m, j = 1, ..., n. The vectors θ_1 and θ_2 in Lemma 5.1 are then related as

    θ_1 = T(T) θ_2    (30)

where

    T(T) = blockdiag( T^T ⊗ T^{-1},  I_m ⊗ T^{-1},  I_p ⊗ T^{-1},  T^T ⊗ I_p )    (31)

which proves the following lemma.
Lemma 5.2  Consider the fully parametrized model structure M_F in (18) with n states, m inputs and p outputs. Then for all θ_i ∈ D_T, i = 1, 2, there exists a unique invertible T ∈ R^{n x n} such that

    θ_1 = T(T) θ_2

where T(T) is given by (31).    □
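The map T(T) in (31) is easy to form with Kronecker products. The sketch below (Python; not from the original report, with the block ordering following the stacking (28)) builds T(T) for a given state transformation T.

    import numpy as np
    from scipy.linalg import block_diag

    def T_map(T, m, p):
        """Block-diagonal matrix (31): theta_1 = T_map(T, m, p) @ theta_2 when the two
        realizations are related by the similarity transformation T of Lemma 5.1."""
        Ti = np.linalg.inv(T)
        return block_diag(np.kron(T.T, Ti),          # vec(T^-1 A T) = (T^T (x) T^-1) vec(A)
                          np.kron(np.eye(m), Ti),    # vec(T^-1 B)   = (I_m (x) T^-1) vec(B)
                          np.kron(np.eye(p), Ti),    # vec(T^-1 K)   = (I_p (x) T^-1) vec(K)
                          np.kron(T.T, np.eye(p)))   # vec(C T)      = (T^T (x) I_p) vec(C)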
Consider now two models of the same order n, one from the fully parametrized model structure (18), M_F(θ_F), and one from the family of identifiable model structures (15), M_I(θ_I), with multi-index ξ_n, and assume that the two models are equal,

    M_I(θ_I) = M_F(θ_F).

Let

    θ̄_I = [ vec A(θ_I)^T  vec B(θ_I)^T  vec K(θ_I)^T  vec C(θ_I)^T ]^T.

This vector consists of the fixed ones and zeros given by the model structure definition (15) and of the parameters from the vector θ_I. Since the two models are equal we have from Lemma 5.2 that there exists a unique T such that

    θ̄_I = T(T) θ_F.

Thus, for all θ_F such that M_F(θ_F) ∈ ℳ_I, we can uniquely find a T, since the model structure M_I is identifiable. T is easily constructed from the observability matrix given by the matrices in M_F(θ_F) together with the multi-index ξ_n, see [9]. Hence there exists a differentiable function

    g(θ) : D̃_{M_F} → D_{M_I}

such that

    M_F(θ_F) = M_I(g(θ_F)),  ∀ θ_F ∈ D̃_{M_F}    (32)

where D̃_{M_F} ⊂ D_{M_F} is the set of parameters which yield all models in the model set ℳ_I.
The gradient of the predictor ŷ(t|θ) is defined as

    ψ(t,θ) = (d/dθ) ŷ(t|θ) = [ dŷ_k(t|θ)/dθ_j ],  j = 1, ..., d_M,  k = 1, ..., p    (33)

a matrix of dimension d_M x p, where the subscript k in ŷ_k(t|θ) denotes the position in the vector.
Lemma 5.3  Let the predictor ŷ^F(t|θ_F) be given by a fully parametrized model structure M_F (18). Then

    rank( (1/N) Σ_{t=1}^{N} ψ^F(t,θ_F) ψ^F(t,θ_F)^T ) ≤ d_{M_I}

where ψ is given by (33) and d_{M_I} by (17).

Proof. Pick an identifiable model structure (15) and a θ_I such that M_I(θ_I) = M_F(θ_F). Then, using (32), we have

    ψ^F(t,θ_F) = (d/dθ_F) ŷ^F(t|θ_F) = (d/dθ_F) ŷ^I(t|g(θ_F)) = (∂g(θ_F)/∂θ_F) ψ^I(t,θ_I)

which gives us

    (1/N) Σ_{t=1}^{N} ψ^F(t,θ_F) ψ^F(t,θ_F)^T
        = (∂g(θ_F)/∂θ_F) [ (1/N) Σ_{t=1}^{N} ψ^I(t,θ_I) ψ^I(t,θ_I)^T ] (∂g(θ_F)/∂θ_F)^T.

The proof is concluded by observing that ψ^I(t,θ_I) ψ^I(t,θ_I)^T has dimension d_{M_I} x d_{M_I} and applying Sylvester's inequality.    □
We continue by introducing some assumptions needed for the convergence analysis. The set Z^N, i.e. our measured input and output data, is our source of information about the true system. A model structure M together with the parameter space D_M gives us the model set ℳ. A natural question to pose is whether the data set contains enough information to distinguish between different models in the model set.

Definition 5.2 [13]  A data set Z^∞ is informative with respect to a model structure M if

    lim_{N→∞} (1/N) Σ_{t=1}^{N} E |ŷ(t|θ_1) - ŷ(t|θ_2)|^2 = 0  ⇒  M(θ_1) = M(θ_2).    (34)    □

Assumption A1  Consider a data set Z^∞ which has been generated by the system (4), where the input u(t) is chosen so that the data set is informative and the data set satisfies the technical condition D1 in [13, p. 210].    □

We then have the following convergence result from [13]:

Theorem 5.1  Let θ̂_N be defined by (22) and let the data Z^∞ satisfy A1. Then

    inf_{θ ∈ D_T} |θ̂_N - θ| → 0,  w.p. 1 as N → ∞.    (35)    □
If we now consider the regularized version of the criterion (23) and let θ^# = θ_0 ∈ D_T, we will have θ̂_N → θ_0 as N → ∞, since both terms in the criterion simultaneously attain their minimum at θ = θ_0, which proves the following theorem.

Theorem 5.2  Let θ̂_N be defined by (23) and (24) with θ^# = θ_0 ∈ D_T and let the data Z^∞ satisfy A1. Then

    θ̂_N → θ_0,  w.p. 1 as N → ∞.    (36)    □

This choice is, however, artificial, since in reality we do not know θ_0 a priori. Nevertheless, the theoretical implications of the analysis to follow are still interesting, since we will let θ^# approach a θ_0 ∈ D_T in the algorithm. In the analysis we will also need the following result.

Lemma 5.4  Let the predictor be defined by (8), the gradient by (33), and let the data set Z^∞ satisfy A1. Then for all θ ∈ D_M

    lim_{N→∞} E (1/N) Σ_{t=1}^{N} ψ(t,θ) ψ(t,θ)^T = R̄(θ) < ∞.

Proof. Follows from Lemma 4.2 and Theorem 2.3, both in [13].    □
We will now derive a statistical model quality measure for the fully parametrized models obtained via minimization of the regularized prediction error criterion (23) and (24) with θ^# = θ_0 ∈ D_T. For a finite data set Z^N the obtained estimate θ̂_N will deviate a little from θ_0. As a measure of the quality of the estimated model using finite data we can consider the scalar

    V̄(θ̂_N) = lim_{N→∞} E V_N(θ̂_N).    (37)

For all estimates we have V̄(θ̂_N) ≥ V̄(θ_0) = tr(Λ_0), θ_0 ∈ D_T, and the increase of V̄(θ̂_N) due to the parameter deviation should be as low as possible. This assessment criterion thus considers the variance of the one-step prediction errors when the estimated model is applied to future data. Finally we have

    V̄_N = E V̄(θ̂_N)    (38)

with the expectation taken over the estimated parameters, as a performance measure "on the average" for the estimated model.

At the minimum of the criterion (23) we have

    W_N'(θ̂_N) := (d/dθ) W_N(θ) |_{θ=θ̂_N} = 0.    (39)

A Taylor expansion around the limiting estimate θ_0 then gives (footnote 1)

    0 = W_N'(θ_0) + W_N''(ξ_N)(θ̂_N - θ_0)    (40)

with ξ_N between θ̂_N and θ_0, and

    W_N''(ξ_N) := (d^2/dθ^2) W_N(θ) |_{θ=ξ_N}.    (41)

From [13, p. 270] we have

    W_N''(ξ_N) → W̄''(θ_0)  w.p. 1 as N → ∞    (42)

where

    W̄''(θ_0) = lim_{N→∞} E W_N''(θ_0).

Thus, for sufficiently large N, we can write

    θ̂_N - θ_0 = -[W̄''(θ_0)]^{-1} W_N'(θ_0)  w.p. 1.    (43)

We also have

    W_N'(θ) = V_N'(θ) + δ(θ - θ_0)    (44)

with

    V_N'(θ) = -(2/N) Σ_{t=1}^{N} ψ(t,θ) ε(t,θ)    (45)

and (footnote 2)

    V_N''(θ) = (2/N) Σ_{t=1}^{N} ( ψ(t,θ) ψ(t,θ)^T - ψ'(t,θ) ε(t,θ) )    (46)

which gives us

    W̄''(θ_0) = V̄''(θ_0) + δ I = Q + δ I    (47)

where

    Q = V̄''(θ_0) = lim_{N→∞} E (2/N) Σ_{t=1}^{N} ψ(t,θ_0) ψ(t,θ_0)^T    (48)

since ε(t,θ_0) = e_0(t) and ψ'(t,θ_0) are independent. Now, by Taylor expansion of V̄_N (38) around θ_0, we have

    V̄_N = E { V̄(θ_0) + (θ̂_N - θ_0)^T V̄'(θ_0) + (1/2)(θ̂_N - θ_0)^T V̄''(θ_0)(θ̂_N - θ_0) }    (49)

neglecting higher order terms. The second term in (49) is zero since V̄'(θ_0) = 0. If we now insert equation (43) and use the properties of the trace operator we obtain

    V̄_N = tr(Λ_0) + (1/2) tr{ E W_N'(θ_0)(W_N'(θ_0))^T [Q + δI]^{-1} Q [Q + δI]^{-1} }.    (50)

If we assume that Λ_0 = λ_0 I we have

    lim_{N→∞} N E W_N'(θ_0)(W_N'(θ_0))^T = 2 λ_0 Q.    (51)

Equation (50) will then be

    V̄_N = λ_0 ( p + (1/N) tr{ Q [Q + δI]^{-1} Q [Q + δI]^{-1} } ).    (52)

Since Q is a symmetric positive semidefinite matrix we can simultaneously diagonalize Q and Q + δI, which gives us

    V̄_N = λ_0 ( p + (1/N) Σ_{i=1}^{d_{M_F}} λ_i^2 / (δ + λ_i)^2 )    (53)

where λ_i are the eigenvalues of the matrix Q and d_{M_F} is given by (19). From Lemma 5.3 we know that the matrix Q has at most d_{M_I} nonzero eigenvalues. If we now choose δ to satisfy

    0 < δ << λ_i,  ∀ λ_i ≠ 0    (54)

we arrive at

    V̄_N = λ_0 ( p + rank Q / N ).    (55)

Thus we have proved the following theorem.

Footnote 1: The application of the Taylor expansion (40) may give different ξ_N in different rows of this vector expression.
Footnote 2: Note that the equation is written in an informal way since ψ'(t,θ) is a tensor.
Theorem 5.3  Consider an overparametrized model structure (18) which defines a predictor (8), and let the data set Z^∞ satisfy A1 and be generated by a true system with Λ_0 = λ_0 I. Furthermore, let θ̂_N be given by (23)-(24) where θ^# = θ_0 ∈ D_T. Then

    lim_{δ→0} V̄_N = λ_0 ( p + rank Q / N )    (56)

where V̄_N is given by (37) and (38).    □

The expression (56) is similar to the known expression

    V̄_N = λ_0 ( p + d_M / N )

[13, 27] for identifiable model structures, where the latter clearly shows the relation between model accuracy and the number of estimated parameters. However, Theorem 5.3 shows that the fully parametrized model retains the same properties, since the rank of Q, by Lemma 5.3, can be at most d_{M_I}, which is the number of parameters in a corresponding identifiable model. Hence the fully parametrized predictor model in fact attains the same statistical properties, i.e. the same prediction error variance, as an identifiable model.
5.2 Convergence analysis

In reality we cannot choose θ^# = θ_0, since it is unknown to us, so we have to use some good a priori guess. One possibility is to perform a sequence of minimizations of (23) and use the obtained parameter estimate θ̂_N as θ^# in the next minimization. This scheme is in fact locally convergent, which we now prove. Since we have to perform the analysis locally, we can simplify the calculations by considering a linear regression setup: let the data Z^∞ be generated by the system

    S :  y(t) = φ(t)^T θ_0 + e(t)    (57)

where the regressors φ(t) are made up from data and the noise terms e(t) are assumed to be a sequence of independent stochastic variables with zero mean E e(t) = 0, covariance E e(t) e(t)^T = Λ, and independent of the regressors. Let the prediction model be

    M :  ŷ(t|θ) = φ(t)^T θ.    (58)

Consider the criterion

    V̄(θ) = lim_{N→∞} E V_N(θ).    (59)

This gives us

    V̄(θ) = tr(Λ) + (θ_0 - θ)^T Q (θ_0 - θ)

with

    Q = lim_{N→∞} E (1/N) Σ_{t=1}^{N} φ(t) φ(t)^T.

Now assume that the matrix Q is only positive semidefinite, which implies that V̄(θ) attains its minimum value tr(Λ) for all θ = θ_0 + z, z ∈ N(Q), the null space of the matrix Q. If we let the data set Z^∞ be informative, the set D_T for this system is then given by

    D_T = { θ | θ = θ_0 + z, z ∈ N(Q) }.    (60)

To find the minimum of (59), consider the iterative scheme

    θ̂^k = arg min_{θ ∈ D_M} [ V̄(θ) + δ |θ - θ̂^{k-1}|^2 ].    (61)

Without loss of generality we can assume that θ_0 = 0. The right hand side of (61) is then

    tr(Λ) + θ^T Q θ + δ (θ - θ̂^{k-1})^T (θ - θ̂^{k-1})

which is a quadratic expression in θ. Completing the squares gives us

    (θ - δ(Q + δI)^{-1} θ̂^{k-1})^T (Q + δI) (θ - δ(Q + δI)^{-1} θ̂^{k-1}) + C

where C collects all terms independent of θ. The minimum is thus given by

    θ̂^k = δ (Q + δI)^{-1} θ̂^{k-1}.
Let T be a square nonsingular matrix such that T Q T^{-1} = Λ is diagonal and let θ̃^k = T θ̂^k. Furthermore, let the diagonal elements of Λ be given in descending order. This gives us

    θ̃^k = δ (Λ + δI)^{-1} θ̃^{k-1}.

Using the diagonal form of Λ gives us

    θ̃^k = diag( δ/(δ+λ_1), ..., δ/(δ+λ_r), 1, ..., 1 ) θ̃^{k-1} = R θ̃^{k-1}

where λ_i, i = 1, ..., r, are the nonzero eigenvalues of Q. If we now study the limiting estimate as k tends to infinity we have

    lim_{k→∞} Λ θ̃^k = lim_{k→∞} Λ R^k θ̃^0 = 0.

The limiting estimate lim_{k→∞} θ̃^k thus belongs to the null space of Λ, which shows that

    lim_{k→∞} θ̂^k ∈ N(Q).

This proves the following theorem.
Theorem 5.4  Let the data set Z^∞ be generated by the system (57) and be informative. Let the predictor be given by (58). Let the sequence of estimates be given by (61) with δ > 0, and let D_T be given by (60). Then

    lim_{k→∞} inf_{θ ∈ D_T} |θ̂^k - θ| = 0.    □
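A small numeric illustration of the iteration analyzed above (Python; purely illustrative, with θ_0 = 0 as assumed in the derivation): for a rank-deficient positive semidefinite Q, the iterates θ̂^k = δ(Q + δI)^{-1} θ̂^{k-1} converge to the null space of Q.

    import numpy as np

    rng = np.random.default_rng(0)
    L = rng.standard_normal((5, 3))
    Q = L @ L.T                      # positive semidefinite, rank 3, null space of dimension 2
    delta = 0.1
    theta = rng.standard_normal(5)
    for _ in range(200):
        theta = np.linalg.solve(Q + delta * np.eye(5), delta * theta)
    print(np.linalg.norm(Q @ theta))  # close to zero: the limit lies in N(Q)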
If we now return to the state-space model structures and consider a Taylor expansion of V̄(θ) around a θ_0 ∈ D_T, we get

    V̄(θ) ≈ V̄(θ_0) + (θ - θ_0)^T V̄'(θ_0) + (1/2)(θ - θ_0)^T Q (θ - θ_0),    Q = V̄''(θ_0)

for θ in the neighborhood of θ_0. The second term is zero since θ_0 ∈ D_T is a minimum. If all higher order terms are small, the approximation is valid locally and we have the same problem as in the linear case of Theorem 5.4. This implies that the iterative minimization scheme (61) is also locally convergent for the fully parametrized state-space model structures, which exhibit the same non-uniqueness properties as the linear regression setup in the theorem. From a regularization point of view, the iterative scheme (61) decreases the actual regularization effect compared to minimization of (23) with a fixed θ^#. In the light of Theorem 5.3 this is fine, since δ can become arbitrarily small, and in the case (23) δ is a measure of the degree of regularization. Theorem 5.4 proves that the iterative scheme (61) restores the convergence properties of the standard prediction error techniques, i.e. the limiting estimate belongs to the set D_T of true parameters. Our main goal is thus achieved: we have one model structure which includes all possible multivariable systems, and the introduced regularization gives the estimated predictors the same statistical quality as identifiable models, as stated by Theorem 5.3. Note, however, that we cannot directly apply Theorem 5.3 to the iterative scheme (61), since θ^# = θ̂^k is a stochastic variable. Our model structure still gives us more freedom: the set D_T is a hyperplane in the parameter space and all the results in this section apply to all points θ_0 ∈ D_T. However, it is known in the literature that some realizations, i.e. points in D_T, are much more favorable to use on a computer from a numerical point of view. This important practical question is addressed in the next section.
6 Numerical Sensitivity

In the area of digital filter synthesis, the numerical sensitivity of different filter structures has been addressed by several authors, among others in the work by Mullis and Roberts [24, 23]. They modeled the fixed point roundoff error as noise and gave conditions which minimize the output noise. From our system identification point of view it is interesting to note that their resulting filter structures are far from the standard canonical forms. Other, more recent results have focused on the sensitivity of the transfer function with respect to the parameters [19, 8, 10]. The results indicate that balanced realizations have low sensitivity and thus yield numerically well conditioned models. The question of sensitivity combined with system identification is, however, not at all as developed in the literature. Some of the work has focused on changing the delay operator q to some other operator in ARX models [5, 11], which in some cases yields well conditioned parameter estimates. In [20] a balanced realization with some restrictions to make it canonical is proposed. Such a model would thus be well conditioned. The identification method, however, includes a difficult constrained optimization problem and no convergence analysis is given.
6.1 A review of some results

In this section we consider state-space systems of the form (4) without noise (footnote 3), i.e. H(q) = 0. The matrix

    W_o = Σ_{k=0}^{∞} (A^T)^k C^T C A^k    (62)

is known as the observability Gramian of the state-space system (4). The eigenvalues of this matrix describe how the state variables influence the output signal y. The matrix also satisfies the Lyapunov equation

    W_o = A^T W_o A + C^T C.

The dual matrix

    W_c = Σ_{k=0}^{∞} A^k B B^T (A^T)^k    (63)

is called the controllability Gramian. This matrix describes how the inputs u influence the state vector x. W_c also satisfies

    W_c = A W_c A^T + B B^T.

The Gramian matrices are symmetric by construction, and if the state-space realization is minimal the matrices are also positive definite. These Gramian matrices are the discrete time counterparts of the continuous time Gramians described in [9]. In [22] Moore showed that for every transfer function G(q) there exists a state-space realization where W_c = W_o = Σ with Σ diagonal. This realization is called the balanced realization. Moore used this realization to perform model reduction. A change of basis of the state vector, x̃ = T^{-1} x, gives new Gramian matrices

    W̃_c = T^{-1} W_c T^{-T}  and  W̃_o = T^T W_o T.

The eigenvalues of the matrix W_c W_o are thus invariant under similarity transformations T. The Hankel singular values [4] of G(q) are defined as

    σ_i := [λ_i(W_c W_o)]^{1/2},  i = 1, ..., n

where λ_i(W_c W_o) denotes the i'th eigenvalue of W_c W_o.

Footnote 3: We could also consider the noise e(t) as an input and let B̄ = [B K].
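The Gramians (62)-(63) and the Hankel singular values are easy to compute from the Lyapunov equations. A minimal Python/SciPy sketch (illustrative; assumes a stable A and is not part of the original report):

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    def gramians(A, B, C):
        """Discrete-time Gramians: Wc = A Wc A^T + B B^T, Wo = A^T Wo A + C^T C."""
        Wc = solve_discrete_lyapunov(A, B @ B.T)
        Wo = solve_discrete_lyapunov(A.T, C.T @ C)
        return Wc, Wo

    def hankel_singular_values(A, B, C):
        """sigma_i = sqrt(lambda_i(Wc Wo)), sorted in descending order."""
        Wc, Wo = gramians(A, B, C)
        return np.sqrt(np.sort(np.real(np.linalg.eigvals(Wc @ Wo)))[::-1])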
6.2 The sensitivity problem
In computer implementations it is important to consider so-called finite word length effects, i.e. the effect that a finite representation of real numbers has on an algorithm. In our case we are interested in which state-space realizations are least sensitive to these effects and thus minimize the degradation of the system performance. As a measure of the sensitivity of a transfer function with respect to the parameters we will use the expression from [28]

    M = || ∂G(z)/∂A ||^2_{F,L_1} + || ∂G(z)/∂B ||^2_{F,L_2} + || ∂G(z)/∂C ||^2_{F,L_2}
      = ( (1/2π) ∫_{-π}^{π} || ∂G(e^{iω})/∂A ||_F dω )^2
        + (1/2π) ∫_{-π}^{π} || ∂G(e^{iω})/∂B ||^2_F dω
        + (1/2π) ∫_{-π}^{π} || ∂G(e^{iω})/∂C ||^2_F dω    (64)

where the Frobenius norm is ||X||^2_F = tr(X* X) and X* is the conjugate transpose of X. This scalar expression gives a measure of the sensitivity of the transfer function over the whole frequency range [-π, π]. The mix between L_1 and L_2 norms is motivated by the analytical properties of the first term in (64). For single-input single-output systems (m = 1, p = 1) the measure M can be shown [30] to have an upper bound S given by

    M ≤ S := tr(W_c) tr(W_o) + tr(W_c) + tr(W_o).    (65)
For general multivariable systems the above bound generalizes to [19]

    M ≤ S = tr(W_c) tr(W_o) + p tr(W_c) + m tr(W_o)    (66)

where p and m are the number of system outputs and inputs, respectively.

Theorem 6.1  Let W_c and W_o be the n x n Gramian matrices of a minimal state-space realization of a transfer function G(z). Then

    S = tr(W_c) tr(W_o) + p tr(W_c) + m tr(W_o) ≥ ( Σ_{i=1}^{n} σ_i )^2 + 2 sqrt(pm) Σ_{i=1}^{n} σ_i

where {σ_i} are the Hankel singular values of G(z). Equality holds if and only if

    p W_c = m W_o.

Proof. See [19].    □

In [30] it is shown that realizations of systems with p = m = 1 (SISO) which satisfy W_c = W_o also minimize the true measure M. From the theorem it immediately follows that balanced realizations of square multivariable systems (p = m) achieve the minimal value of the upper bound S. If the multivariable system has p ≠ m, a minimum sensitivity realization is obtained from the balanced realization via the state transformation matrix T = (p/m)^{1/4} I. We also note that if a realization satisfies W_c = W_o, an orthonormal state transformation T (T T^T = I) gives a new realization with the Gramian matrices still equal. This shows that there exist infinitely many realizations which satisfy W_c = W_o.
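The upper bound S in (66) can be evaluated directly from the Gramians. The following sketch (Python/SciPy, illustrative and not from the original report) computes S for a given realization, so that different realizations of the same transfer function can be compared, as in Example 8.1 later.

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    def sensitivity_bound(A, B, C):
        """Upper bound (66): S = tr(Wc) tr(Wo) + p tr(Wc) + m tr(Wo)."""
        p, m = C.shape[0], B.shape[1]
        Wc = solve_discrete_lyapunov(A, B @ B.T)
        Wo = solve_discrete_lyapunov(A.T, C.T @ C)
        return np.trace(Wc) * np.trace(Wo) + p * np.trace(Wc) + m * np.trace(Wo)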
7 The Algorithm

In this section we present an algorithm for identification using the proposed fully parametrized model structure. The algorithm is designed with the goal of providing accurate models with low sensitivity with respect to finite word length representations. The results on sensitivity presented in the previous section lead us to seek an algorithm which has a balanced realization as the convergence point in D_T. This is easily achieved if, after each step in (61), we adjust the vector θ̂^k to represent a balanced realization through a change of basis T. This means that we not only obtain a balanced realization in the limit but also use one in each step of the identification. This discussion leads us to the following algorithm:

Algorithm 7.1

1. Obtain an initial estimate by the least squares solution of the equations A(q^{-1}) y(t) = B(q^{-1}) u(t), t = 1, ..., N, where A and B are polynomial matrices. Each entry of the A matrix is a monic polynomial 1 + a_1 q^{-1} + ... + a_n q^{-n} of degree n, and the entries of the B matrix are polynomials b_0 + b_1 q^{-1} + ... + b_n q^{-n}, also of degree n.

2. Realize the transfer function A(q^{-1})^{-1} B(q^{-1}) in state-space form using some basis, e.g. the observability canonical form [9]. This yields a state-space system with npm states.

3. Convert the system to a balanced realization by a state transformation and reduce the system order to n by keeping only the n states corresponding to the n largest values of the Gramian, see [22].

4. If the model includes the matrix K, let K be the solution to the Kalman filtering problem with noise covariances equal to identity matrices. The resulting model is the initial estimate θ̂_b^0.

5. Solve the minimization problem

       θ̂^k = arg min_{θ ∈ D_M} [ V_N(θ) + (δ/2) |θ - θ̂_b^{k-1}|^2 ].

6. Convert the obtained estimate to represent a balanced realization, θ̂_b^k = b(θ̂^k).

7. Repeat steps 5-6 until a minimum of the criterion is reached.    □

Step 4 ensures that the initial predictor is stable and hence θ̂_b^0 ∈ D_M. Step 5 of the algorithm can easily be solved by a Gauss-Newton method, see Section 7.1. The balanced realization is obtained via singular value decompositions, see [22]. A different version of the algorithm is obtained if only one numerical iteration is performed in step 5. Practical experience shows that both methods give comparable convergence properties.
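Steps 3 and 6 of Algorithm 7.1 require a balancing state transformation. A minimal sketch (Python/SciPy; illustrative, assuming a minimal and stable realization, and treating [B K] as the input matrix in W_c, in line with the remark in Section 6.1):

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov, cholesky, svd

    def balance(A, B, C, K):
        """Return a balanced realization: the transformed Gramians are equal and diagonal."""
        BK = np.hstack([B, K])
        Wc = solve_discrete_lyapunov(A, BK @ BK.T)
        Wo = solve_discrete_lyapunov(A.T, C.T @ C)
        R = cholesky(Wc, lower=True)               # Wc = R R^T
        U, s, _ = svd(R.T @ Wo @ R)                # R^T Wo R = U diag(s) U^T, s = sigma_i^2
        T = R @ U @ np.diag(s ** -0.25)            # balancing transformation
        Ti = np.linalg.inv(T)
        return Ti @ A @ T, Ti @ B, C @ T, Ti @ K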
7.1 Numerical solution by iterative search

Since all state-space model structures give a nonlinear relation between the parameters θ and the predictor ŷ(t|θ), we have to search the parameter space D_M with some iterative method in order to find a minimum of the criterion V_N(θ). A well-known method is the Newton method, which can be described as

    θ̂^{i+1} = θ̂^i - [V_N''(θ̂^i)]^{-1} V_N'(θ̂^i)    (67)

where θ̂^i is the estimate at the i'th step of the iterative algorithm,

    V_N'(θ) = (d/dθ) V_N(θ) = -(2/N) Σ_{t=1}^{N} ψ(t,θ) ε(t,θ)    (68)

with

    ψ(t,θ) = (d/dθ) ŷ(t|θ)    (69)

the gradient of ŷ(t|θ) with respect to the parameters θ, and

    V_N''(θ) = (d^2/dθ^2) V_N(θ) = (2/N) Σ_{t=1}^{N} ( ψ(t,θ) ψ(t,θ)^T - ψ'(t,θ) ε(t,θ) )    (70)

is the Hessian. In the neighborhood of the minimum θ = θ_0 the Hessian V_N''(θ) can be approximated by

    H(θ) = (1/N) Σ_{t=1}^{N} ψ(t,θ) ψ(t,θ)^T    (71)

since ε(t,θ_0) and ψ'(t,θ_0) are independent. If we use this approximation together with an adjustable step length we obtain the damped Gauss-Newton method

    θ̂^{i+1} = θ̂^i - μ_i [H(θ̂^i)]^{-1} V_N'(θ̂^i).    (72)

The scalar step length μ_i is chosen so that the criterion V_N(θ) decreases in every iteration. If we instead chose H = I we would obtain a gradient method, which is fairly inefficient close to the minimum compared to the Gauss-Newton method. A thorough treatment of Newton methods can be found in [1]. A condition which has to be met in order to use a Newton method is that the Hessian V_N''(θ) must be nonsingular, so that its inverse is well defined. Using model structures which are identifiable meets this condition in a neighborhood of the minimum if the input is rich enough (persistently exciting), which results in data Z^N that are informative [13]. The proposed overparametrized model structure does, however, not meet this condition, since V_N(θ) will in this case not have a unique minimum. The introduction of regularization and the minimization of (23) instead of (20) solves this problem, since the approximate Hessian is now given by

    H(θ) = (2/N) Σ_{t=1}^{N} ψ(t,θ) ψ(t,θ)^T + δ I    (73)

which is positive definite.
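One damped Gauss-Newton step on the regularized criterion (23), using the approximate Hessian (73), can be sketched as follows (Python; the array layout of the gradients ψ(t,θ) and the helper name are assumptions, not the report's implementation):

    import numpy as np

    def gauss_newton_step(theta, psi, eps, delta, theta_sharp, step=1.0):
        """One damped Gauss-Newton step on W_N of (23).
        psi: (N, d, p) gradients of yhat(t|theta), eps: (N, p) prediction errors."""
        N, d, p = psi.shape
        H = (2.0 / N) * np.einsum('tip,tjp->ij', psi, psi) + delta * np.eye(d)       # (73)
        grad = -(2.0 / N) * np.einsum('tip,tp->i', psi, eps) + delta * (theta - theta_sharp)
        return theta - step * np.linalg.solve(H, grad)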
8 Norm Minimal Realizations

In this section we establish a connection between norm-minimal realizations and the fully parametrized model structure together with a particular regularization, namely the choice θ^# = 0 in (23). The key result is that the obtained realization is norm minimal, and if the system is of a particularly simple form this also implies that the realization is a balanced realization up to an orthonormal state transformation. A further specialization of the set in Definition 5.1 defines norm-minimal realizations.

Definition 8.1  Consider a fully parametrized model structure (18). Let the set of parameters from D_T which satisfies

    D_N = { θ | θ = arg min_{θ ∈ D_T} |θ| }    (74)

be called norm-minimal, and let the realization of a model M(θ), θ ∈ D_N, be called a norm-minimal realization.    □
Lemma 8.1  Let T(T) be defined by (31). Then

    T T^T = I  ⇔  T(T) T(T)^T = I.

Proof. Follows immediately from the fact that

    T(T) T(T)^T = blockdiag( (T^T T) ⊗ (T^T T)^{-1},  I_m ⊗ (T^T T)^{-1},  I_p ⊗ (T^T T)^{-1},  (T^T T) ⊗ I_p ).    □

Lemma 8.2  Consider the fully parametrized model structure (18). Then for all θ_i ∈ D_N, i = 1, 2, there exists a T such that

    θ_1 = T(T) θ_2  and  T T^T = I.

Proof. The existence of a T is proven in Lemma 5.2. In [7] it is proven that the state transformation matrix T satisfies T T^T = I. Applying Lemma 8.1 concludes the proof.    □

This lemma shows that the set D_N ⊂ D_T is not a singleton, but a hyperplane in the parameter space D_M. The following theorem shows that we obtain a norm-minimal realization (in the limit) if we use the regularized prediction error criterion with the special choice θ^# = 0.
Theorem 8.1  Let the data set Z^∞ satisfy A1, consider a fully parametrized model structure (18) which contains the true system, and let θ̂_N be defined by (23) and (24) with θ^# = 0. Then

    lim_{δ→0} lim_{N→∞} inf_{θ ∈ D_N} |θ̂_N - θ| = 0  w.p. 1

where D_N is defined by (74).

Proof. We can consider (24) as a penalty method for solving the constrained minimization problem

    min |θ|^2  subject to  lim_{N→∞} E V_N(θ) = V̄(θ) = tr(Λ_0)

since we know from Theorem 5.1 that V̄(θ) ≥ tr(Λ_0) and that the minimum is attained for θ ∈ D_T (w.p. 1). The theorem in [18, p. 368] then states that the sequence of

    θ̂^∞ = lim_{N→∞} θ̂_N

converges to D_N as δ → 0, which concludes the proof.    □

In the next section we will point out that a norm-minimal realization is close to (and in some cases identical to) realizations with low sensitivity to finite word length effects occurring in computer implementations. The set of models M(θ), θ ∈ D_N, can also be characterized by a simple matrix equation with a balancing appearance. The following theorem gives the details; it can also be found in [7], where a complete proof is given, and it is also mentioned in [25]. The partial proof given here is, however, different.

Theorem 8.2  Let the model M(θ) from the fully parametrized model structure (18) have matrices Ā, B̄, C̄, K̄. Then

    Ā Ā^T + B̄ B̄^T + K̄ K̄^T = Ā^T Ā + C̄^T C̄

if and only if θ ∈ D_N.
Proof. Let a model M(θ), θ ∈ D_T, be represented by the matrices A, B, C, K. The set of matrices Ā, B̄, C̄, K̄ corresponding to a parameter value in the set D_N can be characterized via a nonsingular matrix T̄,

    Ā = T̄^{-1} A T̄,  B̄ = T̄^{-1} B,  C̄ = C T̄,  K̄ = T̄^{-1} K

where T̄ is given by

    T̄ = arg min_{T ≠ 0} J(T),    J(T) = || [ T^{-1}AT  T^{-1}B  T^{-1}K ; CT  0  0 ] ||^2_F.

Since ||.||_F denotes the Frobenius norm, we can express J(T) as

    J(T) = tr( T^{-1}ATT^TA^TT^{-T} + T^{-1}BB^TT^{-T} + T^{-1}KK^TT^{-T} + T^TC^TCT )
         = tr( T^{-T}T^{-1}ATT^TA^T + T^{-T}T^{-1}BB^T + T^{-T}T^{-1}KK^T + TT^TC^TC ).

The function J(T) does not have a unique minimum T̄, as already stated in Lemma 8.2. If we let P = TT^T and

    J̄(P) = tr( P^{-1}APA^T + P^{-1}BB^T + P^{-1}KK^T + PC^TC )

a minimizing matrix P̄ then gives us the whole set of minimizing matrices T̄. The existence and uniqueness of a minimizing P̄ is shown in [7]. The minimizing P̄ satisfies

    (d/dP) J̄(P) = 0.

According to Lemmas A.1-A.3 in the appendix we thus have

    0 = A^T P̄^{-1} A - P̄^{-1} A P̄ A^T P̄^{-1} - P̄^{-1} B B^T P̄^{-1} - P̄^{-1} K K^T P̄^{-1} + C^T C.    (75)

Take a T̄ which satisfies T̄ T̄^T = P̄, insert it in (75), and multiply the equation by T̄^T from the left and by T̄ from the right:

    0 = T̄^T A^T T̄^{-T} T̄^{-1} A T̄ - T̄^{-1} A T̄ T̄^T A^T T̄^{-T} - T̄^{-1} B B^T T̄^{-T} - T̄^{-1} K K^T T̄^{-T} + T̄^T C^T C T̄
      = Ā^T Ā + C̄^T C̄ - Ā Ā^T - B̄ B̄^T - K̄ K̄^T

which proves the "if" part. For the "only if" part we refer to [7].    □
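The characterization of Theorem 8.2 gives a simple numerical test for norm minimality of a given realization (Python sketch; illustrative, not part of the original report):

    import numpy as np

    def is_norm_minimal(A, B, C, K, tol=1e-9):
        """Check the condition of Theorem 8.2: A A^T + B B^T + K K^T = A^T A + C^T C."""
        lhs = A @ A.T + B @ B.T + K @ K.T
        rhs = A.T @ A + C.T @ C
        return np.linalg.norm(lhs - rhs) <= tol * max(1.0, np.linalg.norm(rhs))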
8.1 Norm minimal realizations and sensitivity

The norm-minimal realizations are partly related to balanced realizations and thus to low sensitivity. We have the following theorem:

Theorem 8.3  Assume that a single-input single-output transfer function G(z) of order n has the structure

    G(z) = Σ_{i=1}^{n} α_i / (z - β_i),    α_i ∈ R_+ (R_+ = [0, ∞)),  β_i ∈ R,  i = 1, ..., n.

Then all norm-minimal state-space realizations M(θ), θ ∈ D_N, have the controllability Gramian W_c and the observability Gramian W_o equal.

Proof. Let a fully parametrized model M(θ) represent the system and let the parameter θ give the following matrices:

    A = diag(β_i),  i = 1, ..., n
    B = [ sqrt(α_1), sqrt(α_2), ..., sqrt(α_n) ]^T
    C = [ sqrt(α_1), sqrt(α_2), ..., sqrt(α_n) ]

which gives us

    G(z) = C(zI - A)^{-1} B = Σ_{i=1}^{n} α_i / (z - β_i)

and thus θ ∈ D_T. Since A = A^T and C^T = B, this realization trivially satisfies

    A A^T + B B^T = A^T A + C^T C

which shows that the realization is norm minimal and θ ∈ D_N. Furthermore, from (62)-(63) we automatically have W_c = W_o. From Lemma 8.2 we know that all other norm-minimal realizations can be reached via orthonormal transformation matrices T. Since the transformation matrices are orthonormal, we have W_c = W_o for all norm-minimal realizations.    □

The result shows that if we identify systems with the structure in Theorem 8.3 using a fully parametrized state-space model together with the criterion (23) and θ^# = 0, we automatically obtain a minimum sensitivity realization. Although the class of systems which satisfy the conditions in Theorem 8.3 is very small, we believe there is a connection between low sensitivity and norm-minimal realizations for the whole class of systems (4). To conclude this section we give some numerical values of the measure S for different kinds of state-space realizations of the same system.
Example 8.1  Consider a discrete time Butterworth low-pass filter of order n = 4 with cutoff frequency π/2 rad/s. A state-space realization of the filter can be obtained in MATLAB [12] with the following command:

    [A, B, C, D] = butter(4, 0.5);    (76)

If we evaluate S in (65) for some of the different realizations previously mentioned, we obtain the following results.

    Realization                 Sensitivity S
    Original from (76)          8.11
    Observer canonical form     11.37
    Balanced realization        5.20
    Norm-minimal realization    5.28

The balanced realization attains a minimal value, as proven in Theorem 6.1, since the balanced realization satisfies W_c = W_o. The near-optimal value obtained for the norm-minimal realization illustrates the previously stated connection between norm-minimal realizations and low sensitivity. As expected, the observer canonical form has the least favorable sensitivity of the four realizations.    □
9 Examples

In this section we present three examples illustrating the previously discussed properties of the proposed identification algorithm and model structure.

Example 9.1  In [21], Appendix A.2, a turbo generator model with two inputs, two outputs and six states is presented. We used the continuous time model to generate an estimation data set and a validation data set using random binary (-1, +1) signals as inputs. The sample time was set to 0.05. The estimation data set and the validation set were 500 and 300 samples long, respectively, with the output sequences corrupted by zero mean Gaussian noise with covariance Λ_0 = 0.0025 I_2 to make the system of output-error type. The estimation was performed on the estimation data according to Algorithm 7.1 using a fully parametrized model with six states. This gives a total of 64 parameters to estimate, which can be compared with the 40 parameters of any identifiable model structure (15). The regularization parameter δ was set to 10^{-5}, which makes the numerics well conditioned. To assess the quality of the estimated model we evaluate

    V̂ = (1/N) Σ_{t=1}^{N} |y(t) - ŷ(t|θ̂_N)|^2

on the independent validation set. To compare the proposed parametrization with the conventional identifiable parametrizations, we identified the five different possible models corresponding to the five multi-indices ξ_6. The results are given in Table 1, which clearly shows that the fully parametrized model (FM) is as good as the best identifiable model (IM). It is also interesting to notice that all the other identifiable models perform significantly worse.    □

    Model             V̂
    FM                0.0056
    IM ξ_6 = {1,5}    0.0070
    IM ξ_6 = {2,4}    0.0123
    IM ξ_6 = {3,3}    0.0177
    IM ξ_6 = {4,2}    0.0056
    IM ξ_6 = {5,1}    0.0059

Table 1: V̂ evaluated for the fully parametrized model (FM) and the five identifiable models (IM) from Example 9.1.
Example 9.2  Consider a Butterworth filter of order n = 4 with a narrow passband [0.1, 0.11] rad/s. The corresponding state-space realization can be obtained in MATLAB [12] with the following command:

    [A, B, C, D] = butter(2, [0.1, 0.11]);    (77)

Let {u(t)} be a random binary input signal consisting of 400 samples and let {y(t)} be the output from the filter (77) using {u(t)} as the input. Let the estimation data set Z^{400} consist of {u(t)} and {y(t)}. Based on the data set Z^{400} and two different model structures, two models were estimated:

1. An identifiable state-space model of order four.
2. A fully parametrized state-space model of order four.

Model 1 was estimated by first initializing the model with an ARX estimate and then numerically minimizing (20) using the standard commands canstart and pem from [14]. Model 2 was estimated using Algorithm 7.1. The minimizations were continued until the minima were reached for both models. Table 2 shows the performance of the two estimated models simulated with {u(t)} as the input signal. The value Fit is the RMS value of the deviations between the true output and the model output. Since the two models theoretically have the same ability to model the underlying system, the difference in performance must be attributed to numerical properties. The flexibility of the second model not only allows the true system to be correctly described but also offers a degree of freedom to obtain better numerical properties. Another possibility is that Model 1 reached a local minimum.    □

    Model    Fit
    1        1.3 x 10^{-13}
    2        1.1 x 10^{-16}

Table 2: Performance of the two estimated models in Example 9.2. Model 1: identifiable state-space model. Model 2: fully parametrized state-space model.
Example 9.3  In [32] an identification of a glass oven using real data is given as an example to illustrate the subspace identification algorithm presented in that paper. The system has 3 inputs and 6 outputs and is modeled as a system of order n = 5. Using the same data and our fully parametrized model structure we identified two different models: one output-error model, i.e. K = 0, and one model where K was also estimated. In Table 3 a prediction error measure P of the two estimated models is compared with the model obtained by [32], where

    P = (1/6) Σ_{k=1}^{6} sqrt( (1/N) Σ_{t=1}^{N} (y_k(t) - ŷ_k(t))^2 / y_k(t)^2 )

is evaluated on independent data. The fully parametrized OE model has the best simulation performance of the models. The other fully parametrized model is slightly worse than the subspace model. This example shows that identification using real data works quite well for the proposed model structure compared with the subspace identification method.    □

    Model                   Simulation   1-step prediction
    Subspace [32]           0.536        0.108
    Full param. with K      0.547        0.135
    Full param. K = 0       0.511        -

Table 3: Performance P of the estimated models in Example 9.3.
10 Conclusions

In this report we have introduced a state-space model structure which is fully parametrized; consequently the individual parameters are not identifiable. The use of this model structure for identification of multivariable systems relieves us from the search through all possible identifiable parametrizations and allows parametrizations which are well conditioned. In order to minimize the prediction error for the fully parametrized model with an efficient numerical method, regularization is introduced. It is shown that the use of regularization automatically gives the same statistical properties as an identifiable model structure, even though the fully parametrized model structure contains more parameters to estimate.

An identification algorithm is presented which yields, in the limit, a balanced realization of the true system. Material is reviewed which shows that a balanced realization has low sensitivity with regard to finite word length effects. The proposed algorithm thus produces a numerically sound model of the underlying system. It is also shown that the use of a particular type of regularization gives a model which, in the limit, is norm minimal. A connection between low sensitivity and norm-minimal realizations was also established.
A Appendix

Lemma A.1  Let A ∈ R^{n x n} and let P ∈ R^{n x n} be symmetric. Then

    (d/dP) tr(P^{-1}APA^T) = (d/dP) tr(PA^TP^{-1}A) = A^T P^{-1} A - P^{-1} A P A^T P^{-1}.

Proof. Denote Y = PA^TP^{-1}A and let V = A^TP^{-1}A, which gives Y = PV. From [6, p. 64] we have

    ∂y_{ii}/∂P = Σ_{k=1}^{n} ( (∂p_{ik}/∂P) v_{ki} + p_{ik} (∂v_{ki}/∂P) )

with

    ∂v_{ki}/∂P = -P^{-1} A E_{ki} A^T P^{-1}  and  ∂p_{ik}/∂P = E_{ik}

where E_{ik} is a matrix with element (i,k) equal to 1 and all other elements equal to zero. This gives us

    ∂y_{ii}/∂P = Σ_{k=1}^{n} ( E_{ik} v_{ki} + p_{ik} (-P^{-1} A E_{ki} A^T P^{-1}) )

which finally gives us

    (d/dP) tr(Y) = Σ_{i=1}^{n} Σ_{k=1}^{n} ( E_{ik} v_{ki} + p_{ik} (-P^{-1} A E_{ki} A^T P^{-1}) )
                 = A^T P^{-1} A - P^{-1} A P A^T P^{-1}

which concludes the proof.    □

Lemma A.2  Let B ∈ R^{n x m} and let P ∈ R^{n x n} be symmetric. Then

    (d/dP) tr(P^{-1}BB^T) = (d/dP) tr(B^TP^{-1}B) = -P^{-1} B B^T P^{-1}.

Proof. Denote Y = B^TP^{-1}B. From [6, p. 65] we have

    ∂y_{ii}/∂P = -P^{-1} B E_{ii} B^T P^{-1}

which directly gives us

    (d/dP) tr(Y) = Σ_{i=1}^{m} -P^{-1} B E_{ii} B^T P^{-1} = -P^{-1} B B^T P^{-1}.    □

Lemma A.3  Let C ∈ R^{p x n} and let P ∈ R^{n x n} be symmetric. Then

    (d/dP) tr(C^TCP) = (d/dP) tr(CPC^T) = C^T C.

Proof. Denote Y = CPC^T. From [6, p. 65] we have

    ∂y_{ii}/∂P = C^T E_{ii} C

which gives us

    (d/dP) tr(Y) = Σ_{i=1}^{p} C^T E_{ii} C = C^T C.    □
References

[1] J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, New Jersey, 1983.
[2] J. C. Doyle, K. Glover, P. P. Khargonekar, and B. A. Francis. State-space solutions to standard H2 and H∞ control problems. IEEE Trans. on Automatic Control, 34(8):832-847, August 1989.
[3] N. R. Draper and R. C. Van Nostrand. Ridge regression and James-Stein estimation: Review and comments. Technometrics, 21(4):451-466, November 1979.
[4] K. Glover. All optimal Hankel-norm approximations of linear multivariable systems and their L∞-error bounds. International Journal of Control, 39(6):1115-1193, 1984.
[5] G. C. Goodwin. Some observations on robust stochastic estimation. In Proc. IFAC Identification and System Parameter Estimation, Beijing, PRC, 1988.
[6] A. Graham. Kronecker Products and Matrix Calculus With Applications. Ellis Horwood Limited, Chichester, England, 1981.
[7] U. Helmke. Balanced realisations for linear systems: a variational approach. SIAM J. Control and Optimization, 31(1):1-15, January 1993.
[8] M. Iwatsuki, M. Kawamata, and T. Higuchi. Statistical sensitivity structures with fewer coefficients in discrete time linear systems. IEEE Trans. on Circuits and Systems, 37(1):72-80, January 1989.
[9] T. Kailath. Linear Systems. Prentice-Hall, Englewood Cliffs, New Jersey, 1980.
[10] G. Li, B. D. O. Anderson, and M. Gevers. Optimal FWL design of state-space digital systems with weighted sensitivity minimization and sparseness considerations. IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, 39(5):365-377, May 1992.
[11] G. Li and M. Gevers. Data filtering, reparametrization, and the numerical accuracy of parameter estimators. In Proc. 31st IEEE Conference on Decision and Control, Tucson, Arizona, 1992.
[12] J. Little and L. Shure. Signal Processing Toolbox. The MathWorks, Inc., 1988.
[13] L. Ljung. System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, New Jersey, 1987.
[14] L. Ljung. Issues in system identification. IEEE Control Systems Magazine, 11(1):25-29, January 1991.
[15] L. Ljung. A simple start-up procedure for canonical form state space identification, based on subspace approximation. In Proc. 30th IEEE Conference on Decision and Control, pages 1333-1336, Brighton, England, December 1991.
[16] L. Ljung, J. Sjoberg, and T. McKelvey. On the use of regularization in system identification. Technical Report LiTH-ISY-I-1379, Dep. of Electrical Engineering, Linkoping University, S-581 83 Linkoping, Sweden, August 1992.
[17] D. G. Luenberger. Canonical forms for linear multivariable systems. IEEE Trans. Automatic Control, AC-12:290, 1967.
[18] D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, Reading, Massachusetts, 1984.
[19] W. J. Lutz and S. L. Hakimi. Design of multi-input multi-output systems with minimum sensitivity. IEEE Trans. on Circuits and Systems, 35(9):1114-1121, September 1988.
[20] J. M. Maciejowski. Balanced realisations in system identification. In Proc. 7th IFAC Symposium on Identification and System Parameter Estimation, York, UK, 1985.
[21] J. M. Maciejowski. Multivariable Feedback Design. Addison-Wesley, Wokingham, England, 1989.
[22] B. C. Moore. Principal component analysis in linear systems: controllability, observability, and model reduction. IEEE Trans. on Automatic Control, 26(1):17-32, 1981.
[23] C. T. Mullis and R. A. Roberts. Roundoff noise in digital filters: Frequency transformations and invariants. IEEE Trans. on Acoustics, Speech and Signal Processing, 24(6):538-550, December 1976.
[24] C. T. Mullis and R. A. Roberts. Synthesis of minimal roundoff noise fixed point digital filters. IEEE Trans. on Circuits and Systems, 23(9):551-562, September 1976.
[25] J. E. Perkins, U. Helmke, and J. B. Moore. Balanced realisations via gradient flow techniques. Systems & Control Letters, 14:369-379, 1990.
[26] J. Sjoberg. Regularization issues in neural network models of dynamical systems. Linkoping Studies in Science and Technology, Thesis No. 366, LiU-TEK-LIC-1993:08, Department of Electrical Engineering, Linkoping University, Sweden, 1993.
[27] T. Soderstrom and P. Stoica. System Identification. Prentice-Hall International, Hemel Hempstead, Hertfordshire, 1989.
[28] V. Tavsanoglu and L. Thiele. Optimal design of state-space digital filters by simultaneous minimization of sensitivity and roundoff noise. IEEE Trans. on Circuits and Systems, 31(10):884-888, October 1984.
[29] L. Thiele. Design of sensitivity and round-off noise optimal state-space discrete systems. Circuit Theory and Applications, 12:39-46, 1984.
[30] L. Thiele. On the sensitivity of linear state-space systems. IEEE Trans. on Circuits and Systems, 33:502-510, 1986.
[31] A. J. M. van Overbeek and L. Ljung. On-line structure selection for multivariable state space models. Automatica, 18(5):529-543, 1982.
[32] P. Van Overschee and B. De Moor. Two subspace algorithms for the identification of combined deterministic-stochastic systems. In Proc. 31st IEEE Conference on Decision and Control, Tucson, Arizona, pages 511-516, 1992.