
N4SID(1): Subspace Algorithms for the Identification of Combined Deterministic-Stochastic Systems(2)

Peter Van Overschee(3), Bart De Moor(4)

ESAT/SISTA 1992-34-I Internal report. Accepted for publication in: Automatica, Special Issue on Statistical Signal Processing and Control.

(1) Numerical algorithms for Subspace State Space System Identification. Read as a Californian license plate: enforce it.
(2) Running title: N4SID: Identification of Combined Deterministic-Stochastic Systems.
(3) ESAT, Department of Electrical Engineering, Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, 3001 Heverlee, BELGIUM. Email: [email protected]. Research Assistant of the Belgian National Fund for Scientific Research. Author to whom all correspondence should be addressed.
(4) Research Associate of the Belgian National Fund for Scientific Research.

Abstract

Recently a great deal of attention has been given to numerical algorithms for subspace state space system identification (N4SID). In this paper, we derive two new N4SID algorithms to identify mixed deterministic-stochastic systems. Both algorithms determine state sequences through the projection of input and output data. These state sequences are shown to be outputs of non-steady state Kalman filter banks. From these it is easy to determine the state space system matrices. The N4SID algorithms are always convergent (non-iterative) and numerically stable, since they only make use of QR and Singular Value Decompositions. Both N4SID algorithms are similar, but the second one trades off accuracy for simplicity. These new algorithms are compared with existing subspace algorithms in theory and in practice.

Key words: Subspace identification, non-steady state Kalman filter, Riccati difference equations, QR and Singular Value Decomposition

1 Introduction

The greater part of the system identification literature is concerned with computing polynomial models, which are however known to typically give rise to numerically ill-conditioned mathematical problems, especially for Multi Input Multi Output systems. Numerical algorithms for subspace state space system identification (N4SID) are then viewed as the better alternatives. This is especially true for high-order multivariable systems, for which it is not trivial to find a useful parametrization among all possible parametrizations. This parametrization is needed to start up the classical identification algorithms (see e.g. Ljung, 1987), which means that a priori knowledge of the order and of the observability (or controllability) indices is required. With N4SID algorithms, most of this a priori parametrization can be avoided. Only the order of the system is needed, and it can be determined through inspection of the dominant singular values of a matrix that is calculated during the identification. The state space matrices are not calculated in their canonical forms (with a minimal number of parameters), but as full state space matrices in a certain, almost optimally conditioned basis (this basis is uniquely determined, so that there is no problem of identifiability). This implies that the observability (or controllability) indices do not have to be known in advance. Another major advantage is that N4SID algorithms are non-iterative, with no non-linear optimization part involved. This is why they do not suffer from the typical disadvantages of iterative algorithms, e.g. no guaranteed convergence, local minima of the objective criterion and sensitivity to initial estimates. For classical identification, an extra parametrization of the initial state is needed when estimating a state space system from data measured on a plant with a non-zero initial condition. A final advantage of the N4SID algorithms is that there is no difference between zero and non-zero initial states.

The most commonly known subspace methods are realization algorithms such as that of Kung (1978), where a discrete-time state space model is computed from a block Hankel matrix of Markov parameters. It is unfortunate that the theory here relies on Markov parameters as a starting point, something rather difficult to measure or compute in practice (think, for example, of unstable systems). An alternative direct identification scheme for purely deterministic systems is described by e.g. Moonen et al. (1989) and Moonen & Ramos (1991), where a state space model is computed directly from a block Hankel matrix constructed from the input-output data. In a first step, a state vector sequence is computed as an interface between a 'past' and a 'future'. Once the state vector sequence is known, the system matrices are computed from a set of linear equations. Similar data-driven identification schemes for purely stochastic identification are well known (see e.g. Arun & Kung (1990) and the references therein). Less well known is that these algorithms can compute extremely biased results. This problem was studied and solved by Van Overschee & De Moor (1991a, 1991b).

The problem addressed in this paper is that of identifying a general state space model for combined deterministic-stochastic systems directly from the input-output data. Some papers in the past have already treated this problem, but from a different viewpoint. In Larimore (1990), for instance, the problem is treated from a purely statistical point of view; there is no proof of correctness (in the sense of the algorithms being asymptotically unbiased) whatsoever. In De Moor et al. (1991) and Verhaegen (1991) the problem is split up into two subproblems: deterministic identification followed by a stochastic realization of the residuals. In Moonen et al. (1992) the problem is solved for doubly infinite block Hankel matrices, which implies practical computational problems. In this paper, we derive two N4SID algorithms that determine the deterministic and stochastic system at the same time. The connection with classical system theory (the Kalman filter) will be used to prove the exactness (unbiasedness for an infinite number of measurements) of algorithm 1, and the degree of approximation (calculation of the bias for an infinite number of measurements) of algorithm 2. The approach adopted here is similar to the identification schemes of Moonen et al. (1989) for the purely deterministic case and Van Overschee & De Moor (1991a, 1991b) for the stochastic case. First a state sequence is determined from the projection of input-output data. This projection retains all the information (deterministic and stochastic) in the past that is useful to predict the future. Then the state space matrices are determined from this state sequence. Fig. 1 shows how these N4SID algorithms differ from the classical identification schemes. The connection of the two new N4SID algorithms with the existing algorithms described above will also be indicated.

This paper is organized as follows: the problem description and the mathematical tools can be found in section 2. In section 3 the main projection is defined. Section 4 introduces a closed form formula for the non-steady state Kalman filter estimation problem. This result is related to the results of section 3 to find the interpretation of the main projection as a sequence of outputs of a non-steady state Kalman filter bank. Section 5 introduces a first N4SID algorithm that identifies the system matrices exactly. In section 6 accuracy is traded off for simplicity in a second, approximate N4SID algorithm. Section 7 shows how these N4SID algorithms can be implemented in a numerically reliable way, using the QR and the Singular Value Decomposition (SVD). Section 8 investigates the connection with other existing algorithms. Finally, section 9 treats some comparative examples. The conclusions can be found in section 10.

2 Preliminaries

In this section, we describe the linear time-invariant system we want to identify. We also introduce the input and output block Hankel matrices, the past and future horizon, as well as the input-output equations.

2.1 System description

Consider the following combined deterministic-stochastic model to be identified:

$$x_{k+1} = A x_k + B u_k + w_k \quad (1)$$
$$y_k = C x_k + D u_k + v_k \quad (2)$$

with

$$E\left[ \begin{pmatrix} w_k \\ v_k \end{pmatrix} \begin{pmatrix} w_l^t & v_l^t \end{pmatrix} \right] = \begin{pmatrix} Q^s & S^s \\ (S^s)^t & R^s \end{pmatrix} \delta_{kl} \geq 0 \quad (3)$$

and $A, Q^s \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{l \times n}$, $D \in \mathbb{R}^{l \times m}$, $S^s \in \mathbb{R}^{n \times l}$ and $R^s \in \mathbb{R}^{l \times l}$. The input vectors $u_k \in \mathbb{R}^{m \times 1}$ and output vectors $y_k \in \mathbb{R}^{l \times 1}$ are measured. $v_k \in \mathbb{R}^{l \times 1}$ and $w_k \in \mathbb{R}^{n \times 1}$, on the other hand, are unmeasurable, Gaussian distributed, zero mean, white noise vector sequences. $\{A, C\}$ is assumed to be observable, while $\{A, [B \; (Q^s)^{1/2}]\}$ is assumed to be controllable. The system (1)-(2) is split up in a deterministic and a stochastic subsystem, by splitting up the state ($x_k$) and output ($y_k$) in a deterministic ($\cdot^d$) and a stochastic ($\cdot^s$) component: $x_k = x^d_k + x^s_k$, $y_k = y^d_k + y^s_k$. The deterministic state ($x^d_k$) and output ($y^d_k$) follow from the deterministic subsystem, which describes the influence of the deterministic input ($u_k$) on the deterministic output:

$$x^d_{k+1} = A x^d_k + B u_k \quad (4)$$
$$y^d_k = C x^d_k + D u_k \quad (5)$$

(2) $E$ denotes the expected value operator and $\delta_{kl}$ the Kronecker delta.


The controllable modes of $\{A, B\}$ can be either stable or unstable. The stochastic state ($x^s_k$) and output ($y^s_k$) follow from the stochastic subsystem, which describes the influence of the noise sequences ($w_k$ and $v_k$) on the stochastic output:

$$x^s_{k+1} = A x^s_k + w_k \quad (6)$$
$$y^s_k = C x^s_k + v_k \quad (7)$$

The controllable modes of $\{A, (Q^s)^{1/2}\}$ are assumed to be stable. The deterministic inputs ($u_k$) and states ($x^d_k$) and the stochastic states ($x^s_k$) and outputs ($y^s_k$) are assumed to be quasi-stationary (as defined in Ljung, 1987, p. 27). Note that even though the deterministic subsystem can have unstable modes, the excitation ($u_k$) has to be chosen in such a way that the deterministic states and output are finite for all time. Also note that since the systems $\{A, B\}$ and $\{A, (Q^s)^{1/2}\}$ are not assumed to be controllable, the deterministic and stochastic subsystem may have common as well as completely decoupled input-output dynamics. The main problem of this paper can now be stated: given input and output measurements $u_1, \ldots, u_N$ and $y_1, \ldots, y_N$ ($N \to \infty$), and the fact that these two sequences are generated by an unknown combined deterministic-stochastic model of the form described above, find $A, B, C, D, Q^s, R^s, S^s$ (up to within a similarity transformation). In the next two subsections, we define some more useful properties and notations for the deterministic and the stochastic subsystem.
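To fix ideas, the following sketch generates data from a model of the form (1)-(3). The numerical values of $A, B, C, D$ and the noise covariances are illustrative placeholders, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical system matrices (n = 2 states, m = 1 input, l = 1 output)
A = np.array([[0.7, 0.3], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.1]])
Qs = 0.01 * np.eye(2)        # E[w_k w_k^t]
Rs = 0.05 * np.eye(1)        # E[v_k v_k^t]
Ss = np.zeros((2, 1))        # E[w_k v_k^t]

N = 1000
u = rng.standard_normal((N, 1))                      # measured input
wv = rng.multivariate_normal(np.zeros(3),            # (w_k, v_k) drawn jointly
                             np.block([[Qs, Ss], [Ss.T, Rs]]), size=N)
x = np.zeros(2)
y = np.zeros((N, 1))
for k in range(N):
    y[k] = C @ x + D @ u[k] + wv[k, 2:]              # output equation (2)
    x = A @ x + B @ u[k] + wv[k, :2]                 # state equation (1)
```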

2.1.1 The deterministic subsystem

Associated with the deterministic subsystem (4)-(5), we define the following matrices:

- The extended ($i > n$) observability matrix $\Gamma_i$ (where the subscript $i$ denotes the number of block rows):

$$\Gamma_i \stackrel{\rm def}{=} \begin{pmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{i-1} \end{pmatrix}$$

- The reversed extended controllability matrix $\Delta^d_i$ (where the subscript $i$ denotes the number of block columns):

$$\Delta^d_i \stackrel{\rm def}{=} \begin{pmatrix} A^{i-1}B & A^{i-2}B & \cdots & AB & B \end{pmatrix}$$

- The lower block triangular Toeplitz matrix $H^d_i$:

$$H^d_i \stackrel{\rm def}{=} \begin{pmatrix} D & 0 & 0 & \cdots & 0 \\ CB & D & 0 & \cdots & 0 \\ CAB & CB & D & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ CA^{i-2}B & CA^{i-3}B & CA^{i-4}B & \cdots & D \end{pmatrix}$$
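The matrices $\Gamma_i$, $\Delta^d_i$ and $H^d_i$ are easy to assemble numerically. A minimal sketch (NumPy, reusing the $A, B, C, D$ of the generator above):

```python
def gamma(A, C, i):
    # Extended observability matrix: stack C, CA, ..., CA^(i-1)
    blocks, M = [], C.copy()
    for _ in range(i):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def delta_d(A, B, i):
    # Reversed extended controllability matrix: [A^(i-1)B, ..., AB, B]
    blocks, M = [], B.copy()
    for _ in range(i):
        blocks.insert(0, M)
        M = A @ M
    return np.hstack(blocks)

def H_d(A, B, C, D, i):
    # Lower block triangular Toeplitz matrix with D on the block diagonal
    l, m = D.shape
    H = np.zeros((l * i, m * i))
    for r in range(i):
        for c in range(r + 1):
            blk = D if r == c else C @ np.linalg.matrix_power(A, r - c - 1) @ B
            H[r * l:(r + 1) * l, c * m:(c + 1) * m] = blk
    return H
```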

2.1.2 The stochastic subsystem

For the stochastic subsystem (6)-(7) we define:

$$P^s \stackrel{\rm def}{=} E[x^s_k (x^s_k)^t] \qquad G \stackrel{\rm def}{=} E[x^s_{k+1} (y^s_k)^t] \qquad \Lambda_0 \stackrel{\rm def}{=} E[y^s_k (y^s_k)^t]$$

With equations (3), (6), (7) and through stability of the controllable modes of the system $\{A, (Q^s)^{1/2}\}$, we easily find that the following equations are satisfied:

$$P^s = A P^s A^t + Q^s \quad (8)$$
$$G = A P^s C^t + S^s$$
$$\Lambda_0 = C P^s C^t + R^s$$

This set of equations describes the set of all possible stochastic realizations that have the same second order statistics as a given stochastic sequence $y^s_k$. We call them the positive real equations. More details can be found in Faure (1976). It is also easy to derive that:

$$\Lambda_i \stackrel{\rm def}{=} E[y^s_{k+i} (y^s_k)^t] = \begin{cases} C A^{i-1} G & i > 0 \\ \Lambda_0 & i = 0 \\ G^t (A^t)^{-i-1} C^t & i < 0 \end{cases}$$

Associated with the stochastic subsystem, we define the following matrices:

- The matrix $\Delta^s_i$:

$$\Delta^s_i \stackrel{\rm def}{=} \begin{pmatrix} A^{i-1}G & A^{i-2}G & \cdots & AG & G \end{pmatrix}$$

- The block Toeplitz covariance matrix $L^s_i$:

$$L^s_i \stackrel{\rm def}{=} \begin{pmatrix} \Lambda_0 & \Lambda_{-1} & \Lambda_{-2} & \cdots & \Lambda_{1-i} \\ \Lambda_1 & \Lambda_0 & \Lambda_{-1} & \cdots & \Lambda_{2-i} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \Lambda_{i-1} & \Lambda_{i-2} & \Lambda_{i-3} & \cdots & \Lambda_0 \end{pmatrix}$$

- The block Toeplitz cross covariance matrix $H^s_i$:

$$H^s_i \stackrel{\rm def}{=} \begin{pmatrix} \Lambda_i & \Lambda_{i-1} & \Lambda_{i-2} & \cdots & \Lambda_1 \\ \Lambda_{i+1} & \Lambda_i & \Lambda_{i-1} & \cdots & \Lambda_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \Lambda_{2i-1} & \Lambda_{2i-2} & \Lambda_{2i-3} & \cdots & \Lambda_i \end{pmatrix} = \Gamma_i \Delta^s_i$$

2.2 Block Hankel matrices and input-output equations

Input and output block Hankel matrices are defined as:

$$U_{0|i-1} \stackrel{\rm def}{=} \begin{pmatrix} u_0 & u_1 & u_2 & \cdots & u_{j-1} \\ u_1 & u_2 & u_3 & \cdots & u_j \\ \vdots & \vdots & \vdots & & \vdots \\ u_{i-1} & u_i & u_{i+1} & \cdots & u_{i+j-2} \end{pmatrix} \qquad Y_{0|i-1} \stackrel{\rm def}{=} \begin{pmatrix} y_0 & y_1 & y_2 & \cdots & y_{j-1} \\ y_1 & y_2 & y_3 & \cdots & y_j \\ \vdots & \vdots & \vdots & & \vdots \\ y_{i-1} & y_i & y_{i+1} & \cdots & y_{i+j-2} \end{pmatrix}$$

where we presume that $j \to \infty$ throughout the paper. The subscripts of $U$ and $Y$ denote the subscript of the first and last element of the first column. The block Hankel matrices formed with the output $y^s_k$ of the stochastic subsystem are defined as $Y^s_{0|i-1}$ in the same way. Somewhat loosely, we denote the "past" inputs by $U_{0|i-1}$ or $U_{0|i}$ and the "future" inputs by $U_{i|2i-1}$ or $U_{i+1|2i-1}$. A similar notation applies for the past and future outputs. This notational convention is useful when explaining concepts. The deterministic and stochastic state matrices are defined as:

$$X^d_i \stackrel{\rm def}{=} \begin{pmatrix} x^d_i & x^d_{i+1} & x^d_{i+2} & \cdots & x^d_{i+j-1} \end{pmatrix} \qquad X^s_i \stackrel{\rm def}{=} \begin{pmatrix} x^s_i & x^s_{i+1} & x^s_{i+2} & \cdots & x^s_{i+j-1} \end{pmatrix}$$

For the deterministic subsystem we define:

$$\lim_{j \to \infty} \frac{1}{j} \begin{pmatrix} U_{0|i-1} \\ U_{i|2i-1} \\ X^d_0 \end{pmatrix} \begin{pmatrix} U_{0|i-1}^t & U_{i|2i-1}^t & (X^d_0)^t \end{pmatrix} = \begin{pmatrix} R_{11} & R_{12} & S_1^t \\ R_{12}^t & R_{22} & S_2^t \\ S_1 & S_2 & P^d \end{pmatrix} \stackrel{\rm def}{=} \begin{pmatrix} R & S^t \\ S & P^d \end{pmatrix}$$

where we use the assumption that the limit exists (quasi-stationarity of $u_k$ and $x^d_k$). For the stochastic subsystem we find that, due to stationarity of $y^s_k$, the following equalities hold true:

$$\lim_{j \to \infty} \frac{1}{j} \begin{pmatrix} Y^s_{0|i-1} \\ Y^s_{i|2i-1} \end{pmatrix} \begin{pmatrix} (Y^s_{0|i-1})^t & (Y^s_{i|2i-1})^t \end{pmatrix} = \begin{pmatrix} L^s_i & (H^s_i)^t \\ H^s_i & L^s_i \end{pmatrix} \quad (9)$$

The matrix input-output equations are defined in the following Theorem (De Moor, 1988):

Theorem 1

$$Y_{0|i-1} = \Gamma_i X^d_0 + H^d_i U_{0|i-1} + Y^s_{0|i-1} \quad (10)$$
$$Y_{i|2i-1} = \Gamma_i X^d_i + H^d_i U_{i|2i-1} + Y^s_{i|2i-1} \quad (11)$$
$$X^d_i = A^i X^d_0 + \Delta^d_i U_{0|i-1} \quad (12)$$

The Theorem is easy to prove by recursive substitution into the state space equations.
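The block Hankel matrices translate directly into code. A sketch of the construction (the $1/\sqrt{j}$ scaling used in section 7 is omitted here), reusing the data of the earlier snippets:

```python
def block_hankel(w, start, rows, j):
    # w is an (N, dim) signal; returns the (rows*dim) x j matrix W_{start|start+rows-1}
    dim = w.shape[1]
    H = np.zeros((rows * dim, j))
    for r in range(rows):
        H[r * dim:(r + 1) * dim, :] = w[start + r:start + r + j, :].T
    return H

i, j = 5, 900                       # requires 2*i + j - 1 <= N
U0 = block_hankel(u, 0, i, j)       # U_{0|i-1} ("past" inputs)
Y0 = block_hankel(y, 0, i, j)       # Y_{0|i-1} ("past" outputs)
Uf = block_hankel(u, i, i, j)       # U_{i|2i-1} ("future" inputs)
Yf = block_hankel(y, i, i, j)       # Y_{i|2i-1} ("future" outputs)
```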

3 The Main Projection

In this section, we introduce the projection of the future outputs onto the past and future inputs and the past outputs. The results can be described as a function of the system matrices and the input-output block Hankel matrices. We define the matrices $Z_i$ and $Z_{i+1}$ as:

$$Z_i = Y_{i|2i-1} / \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \end{pmatrix} \quad (13)$$

$$Z_{i+1} = Y_{i+1|2i-1} / \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i} \end{pmatrix} \quad (14)$$

where $A/B = A B^t (B B^t)^{-1} B$. The row space of $A/B$ is equal to the projection of the row space of $A$ onto the row space of $B$. Formula (13) corresponds to the optimal prediction of $Y_{i|2i-1}$ given $U_{0|2i-1}$ and $Y_{0|i-1}$, in the sense that $\| Y_{i|2i-1} - Z_i \|^2_F$ is minimized under the constraint

$$Z_i \in \text{Row Space} \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \end{pmatrix}$$

So, intuitively, the $k$th row of $Z_i$ corresponds to a $k$ step ahead prediction of the output. This intuition becomes clearer in section 4. These projections ($Z_i$ and $Z_{i+1}$) are useful in determining the combined system, since (as we will show in Theorem 2) the linear combinations of the input-output block Hankel matrices that generate the matrices $Z_i$ and $Z_{i+1}$ are functions of the system matrices ($A, B, C, D, Q^s, S^s, R^s$). Moreover, the system matrices can be retrieved from these linear combinations, as will be explained in section 5.
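Numerically, the projection $A/B = A B^t (B B^t)^{-1} B$ is a row-space least squares fit. A direct sketch (adequate for illustration; section 7 gives the numerically preferred QR-based route):

```python
def project_rows(Amat, Bmat):
    # A/B = A B^t (B B^t)^{-1} B: solve (B B^t) X = B A^t, then A/B = X^t B
    X, *_ = np.linalg.lstsq(Bmat @ Bmat.T, Bmat @ Amat.T, rcond=None)
    return X.T @ Bmat

Wp = np.vstack([U0, Uf, Y0])    # rows of U_{0|2i-1} and Y_{0|i-1}
Zi = project_rows(Yf, Wp)       # equation (13)
```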

It is tedious though straightforward to prove the following Theorem, which delivers formulas for the linear combinations to be made of the rows of the input-output block Hankel matrices to generate the matrices $Z_i$ and $Z_{i+1}$:

Theorem 2 (Main projection)

- If the deterministic input $u_k$ and state $x^d_k$ are uncorrelated with the stochastic output $y^s_k$:

$$\lim_{j \to \infty} \frac{1}{j} Y^s_{0|i-1} U^t_\bullet = 0 \qquad \lim_{j \to \infty} \frac{1}{j} Y^s_{0|i-1} (X^d_\bullet)^t = 0$$
$$\lim_{j \to \infty} \frac{1}{j} Y^s_{i|2i-1} U^t_\bullet = 0 \qquad \lim_{j \to \infty} \frac{1}{j} Y^s_{i|2i-1} (X^d_\bullet)^t = 0$$

where the subscript $\bullet$ denotes past or future;

- and if the input is "persistently exciting of order $2i$" (Ljung, 1987, p. 363): $\text{rank}\, U_{0|2i-1} = 2mi$;

- and if the stochastic subsystem is not identically zero (the purely deterministic case will be treated in section 6.1.3);

then (for $j \to \infty$):

$$Z_i = \Gamma_i \hat{X}_i + H^d_i U_{i|2i-1} \quad (15)$$
$$Z_{i+1} = \Gamma_{i-1} \hat{X}_{i+1} + H^d_{i-1} U_{i+1|2i-1} \quad (16)$$

with:

$$\hat{X}_i = \begin{pmatrix} [A^i - Q_i \Gamma_i] S R^{-1} & \Delta^d_i - Q_i H^d_i & Q_i \end{pmatrix} \begin{pmatrix} U_{0|2i-1} \\ U_{0|i-1} \\ Y_{0|i-1} \end{pmatrix} \quad (17)$$

$$\hat{X}_{i+1} = \begin{pmatrix} [A^{i+1} - Q_{i+1} \Gamma_{i+1}] S R^{-1} & \Delta^d_{i+1} - Q_{i+1} H^d_{i+1} & Q_{i+1} \end{pmatrix} \begin{pmatrix} U_{0|2i-1} \\ U_{0|i} \\ Y_{0|i} \end{pmatrix} \quad (18)$$

and

$$Q_i = \Psi_i \Omega_i^{-1}$$
$$\Psi_i = A^i (P^d - S R^{-1} S^t) \Gamma^t_i + \Delta^s_i \quad (19)$$
$$\Omega_i = \Gamma_i (P^d - S R^{-1} S^t) \Gamma^t_i + L^s_i \quad (20)$$

A proof can be found in Appendix A. In the next section, we give an interpretation of these projections.

4 A Bank of Kalman Filters

In this section, we show how the sequences $\hat{X}_i$ and $\hat{X}_{i+1}$ can be interpreted in terms of the states of a bank of $j$ non-steady state Kalman filters, applied in parallel to the data. This interpretation will lead to a formula that will prove to be extremely useful when determining the system matrices from the data. As stated before, it may come as no surprise that there is a connection between the states $\hat{X}_i$ defined by the projection $Z_i$ and some optimal prediction of the outputs $Y_{i|2i-1}$. To establish this connection, we need one more Theorem, which states how the non-steady state Kalman filter state estimate $\hat{x}_k$ can be written as a linear combination of $u_0, \ldots, u_{k-1}$, $y_0, \ldots, y_{k-1}$ and the initial state estimate $\hat{x}_0$.

Theorem 3 (Kalman filter)

Given $\hat{x}_0$, $P_0$, $u_0, \ldots, u_{k-1}$, $y_0, \ldots, y_{k-1}$ and all the system matrices ($A, B, C, D, Q^s, S^s, R^s$), the non-steady state Kalman filter state $\hat{x}_k$ defined by the following recursive formulas:

$$\hat{x}_k = A \hat{x}_{k-1} + B u_{k-1} + K_{k-1} (y_{k-1} - C \hat{x}_{k-1} - D u_{k-1}) \quad (21)$$
$$K_{k-1} = (A P_{k-1} C^t + G)(\Lambda_0 + C P_{k-1} C^t)^{-1} \quad (22)$$
$$P_k = A P_{k-1} A^t - (A P_{k-1} C^t + G)(\Lambda_0 + C P_{k-1} C^t)^{-1} (A P_{k-1} C^t + G)^t \quad (23)$$

can be written as:

$$\hat{x}_k = \begin{pmatrix} A^k - Q_k \Gamma_k & \Delta^d_k - Q_k H^d_k & Q_k \end{pmatrix} \begin{pmatrix} \hat{x}_0 \\ u_0 \\ \vdots \\ u_{k-1} \\ y_0 \\ \vdots \\ y_{k-1} \end{pmatrix} \quad (24)$$

where:

$$Q_k = \Psi_k \Omega_k^{-1} \quad (25)$$
$$\Psi_k = A^k P_0 \Gamma^t_k + \Delta^s_k \quad (26)$$
$$\Omega_k = \Gamma_k P_0 \Gamma^t_k + L^s_k \quad (27)$$

The proof of this Theorem and some details concerning the special form of the Kalman filter equations (21)-(23) can be found in Appendix B and Appendix C. Let us just indicate that the error covariance matrix $\tilde{P}_k \stackrel{\rm def}{=} E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^t]$ is given by $P^s + P_k$, with $P^s$ the state covariance matrix from Lyapunov equation (8). Note that the limiting solution ($k \to \infty$) of (23) is $-P_\infty$, where $P_\infty$ is the state covariance matrix of the forward innovation model (Faure, 1976). Hence the limiting error covariance matrix is $\tilde{P}_\infty = P^s - P_\infty$, which is the smallest state error covariance matrix we can obtain (in the sense of non-negative definiteness). Also note that the expressions for $\Psi_k$ and $\Omega_k$ (26)-(27) are equal to the expressions for $\Psi_i$ and $\Omega_i$ (19)-(20) with $P^d - S R^{-1} S^t$ substituted by $P_0$.

If we now combine the results of Theorems 2 and 3, we find an interpretation of the sequences $\hat{X}_i$ and $\hat{X}_{i+1}$ in terms of the states of a bank of non-steady state Kalman filters, applied in parallel to the data. More specifically, compare formulas (17), (18) and (24):

1. The $j$ columns of $\hat{X}_i$ are equal to the outputs of a bank of $j$ non-steady state Kalman filters in parallel. The $(p+1)$th column of $\hat{X}_i$, for instance, is equal to the non-steady state Kalman filter state $\hat{x}_{i+p}$ of the Kalman filter (21), (22), (23), with initial error covariance matrix at starting time $p$:

$$\tilde{P}_p = P_p + P^s = P^d - S R^{-1} S^t + P^s$$

and initial state:

$$\hat{x}_p = S R^{-1} \begin{pmatrix} u_p \\ \vdots \\ u_{p+2i-1} \end{pmatrix}$$

Notice that $\tilde{P}_p$ is independent of the column index, so it is denoted by $\tilde{P}^0$. In this way, all the columns can be interpreted as Kalman filter states. The initial states of the $j$ filters together can be written as:

$$\hat{X}^0 = S R^{-1} U_{0|2i-1}$$

All this is clarified in Fig. 2. The expressions for $\tilde{P}^0$ and $\hat{X}^0$ can be interpreted (somewhat loosely) as follows: if we had no information at all about the initial state, then the initial state estimate would be $\hat{X}^0 = 0$ and the initial error covariance would be equal to the expected variance of the state: $\tilde{P}^0 = E[x_k x^t_k] = P^d + P^s$. Now, since the inputs are possibly correlated, we can derive information about $\hat{X}^0$ from the inputs $U_{0|2i-1}$. This is done by projecting the (unknown) exact initial state sequence $X^d_0 + X^s_0$ onto the row space of the inputs $U_{0|2i-1}$:

$$\hat{X}^0 = (X^d_0 + X^s_0)/U_{0|2i-1} = S R^{-1} U_{0|2i-1}$$

This extra information on the initial state of the Kalman filter also implies that the error covariance matrix reduces from $P^d + P^s$ to:

$$\tilde{P}^0 = P^d + P^s - \lim_{j \to \infty} \frac{1}{j} \hat{X}^0 (\hat{X}^0)^t = P^d + P^s - S R^{-1} S^t$$

These are exactly the same expressions for $\hat{X}^0$ and $\tilde{P}^0$ as we found above. It can also be seen that when the inputs are uncorrelated (white noise), the projection of $X^d_0 + X^s_0$ onto the inputs $U_{0|2i-1}$ is zero, which implies that the inputs $U_{0|2i-1}$ contain no information about the initial state $\hat{X}^0$. The state sequence $\hat{X}_{i+1}$ has a similar interpretation. The $p$th column of $\hat{X}_{i+1}$ is equal to the non-steady state Kalman filter state estimate of the same (in the sense of the same initial conditions) non-steady state Kalman filter bank as discussed above, but now the filter has iterated one step beyond the estimate of the $p$th column of $\hat{X}_i$. This is valid for all columns $p = 1, \ldots, j$.

2. We define the residuals $R_i$ of the projection as:

$$R_i = Y_{i|2i-1} - Z_i = Y_{i|2i-1} - \Gamma_i \hat{X}_i - H^d_i U_{i|2i-1} \quad (28)$$

Since $Z_i$ is the result of the projection of $Y_{i|2i-1}$ on the row space of $U_{0|2i-1}$ and $Y_{0|i-1}$, the residuals of this projection ($R_i$) will always satisfy: $R_i U^t_{0|2i-1} = 0$, $R_i Y^t_{0|i-1} = 0$ and $R_i Z^t_i = 0$. Also, since $\hat{X}_i$ can be written as a linear combination of $U_{0|2i-1}$ and $Y_{0|i-1}$ (see formula (17)), we find: $R_i \hat{X}^t_i = 0$.

3. Since the corresponding columns of $\hat{X}_i$ and $\hat{X}_{i+1}$ are state estimates of the same (in the sense of the same initial conditions) non-steady state Kalman filter at two consecutive time instants, we can write (see formula (21)):

$$\hat{X}_{i+1} = A \hat{X}_i + B U_{i|i} + K_i (Y_{i|i} - C \hat{X}_i - D U_{i|i}) \quad (29)$$

It is also trivial that:

$$Y_{i|i} = C \hat{X}_i + D U_{i|i} + (Y_{i|i} - C \hat{X}_i - D U_{i|i}) \quad (30)$$

If we inspect the formula for $R_i$ (28) a little closer, we see that its first block row is equal to $Y_{i|i} - C \hat{X}_i - D U_{i|i}$. And since we know that the row space of $R_i$ (and thus also the first $l$ rows of $R_i$) is perpendicular to $U_{0|2i-1}$, $Y_{0|i-1}$ and $\hat{X}_i$, we find (together with (29) and (30)):

$$\hat{X}_{i+1} = A \hat{X}_i + B U_{i|i} + \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \\ \hat{X}_i \end{pmatrix}^\perp \quad (31)$$

$$Y_{i|i} = C \hat{X}_i + D U_{i|i} + \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \\ \hat{X}_i \end{pmatrix}^\perp \quad (32)$$

where $(\cdot)^\perp$ indicates a matrix whose row space is perpendicular to the row space of $(\cdot)$. These formulas will prove to be extremely useful in the next section, where we determine the system matrices from $Z_i$ and $Z_{i+1}$. This summarizes the whole interpretation as a bank of non-steady state Kalman filters.

5 Identification Scheme

In this section, we derive an N4SID algorithm that identifies exactly (unbiased for $j \to \infty$) the deterministic subsystem, directly from the given inputs $u_k$ and outputs $y_k$. The stochastic subsystem can be determined in an approximate sense.

5.1 The projections

First, the projections $Z_i$ and $Z_{i+1}$ (13)-(14) have to be calculated. In section 7 we will describe a numerically stable way to do this. For convenience, we rewrite these projections as follows:

$$Z_i = \begin{pmatrix} L^1_i & L^2_i & L^3_i \end{pmatrix} \begin{pmatrix} U_{0|i-1} \\ U_{i|2i-1} \\ Y_{0|i-1} \end{pmatrix} \quad (33)$$

with $L^1_i \in \mathbb{R}^{li \times mi}$, $L^2_i \in \mathbb{R}^{li \times mi}$, $L^3_i \in \mathbb{R}^{li \times li}$, and

$$Z_{i+1} = \begin{pmatrix} L^1_{i+1} & L^2_{i+1} & L^3_{i+1} \end{pmatrix} \begin{pmatrix} U_{0|i} \\ U_{i+1|2i-1} \\ Y_{0|i} \end{pmatrix} \quad (34)$$

with $L^1_{i+1} \in \mathbb{R}^{l(i-1) \times m(i+1)}$, $L^2_{i+1} \in \mathbb{R}^{l(i-1) \times m(i-1)}$, $L^3_{i+1} \in \mathbb{R}^{l(i-1) \times l(i+1)}$, where, from (15)-(18):

$$L^1_i = \Gamma_i \left( [A^i - Q_i \Gamma_i] S (R^{-1})_{1|mi} + \Delta^d_i - Q_i H^d_i \right) \quad (35)$$
$$L^2_i = H^d_i + \Gamma_i [A^i - Q_i \Gamma_i] S (R^{-1})_{mi+1|2mi} \quad (36)$$
$$L^3_i = \Gamma_i Q_i \quad (37)$$

with $(R^{-1})_{1|mi}$ denoting the submatrix of $R^{-1}$ from column 1 to column $mi$. The expressions for $L^1_{i+1}$, $L^2_{i+1}$ and $L^3_{i+1}$ are similar, but with shifted indices.

5.2 Determination of $\Gamma_i$ and $n$

An important observation is that the column space of the matrices $L^1_i$ and $L^3_i$ coincides with the column space of $\Gamma_i$. This implies that $\Gamma_i$ and the order of the system $n$ can be determined from the column space of one of these matrices. The basis for this column space actually determines the basis for the states of the final (identified) state space description. Let us mention two other possible matrices that have the same column space as $\Gamma_i$:

$$L^1_i + L^3_i L^2_i \quad (38)$$

$$\begin{pmatrix} L^1_i & L^3_i \end{pmatrix} \begin{pmatrix} U_{0|i-1} \\ Y_{0|i-1} \end{pmatrix} \quad (39)$$

It should be mentioned that, for $i \to \infty$, the first one (38) will lead to a deterministic subsystem that is balanced (see also Moonen & Ramos, 1991), while the second one (39) leads to a deterministic subsystem that is frequency weighted (with the input spectrum) balanced (Enns, 1984), together with a stochastic subsystem of which the forward innovation model is balanced in a deterministic sense. We will not expand any more on this, but keep it for future work. We can now determine $\Gamma_i$, $\Gamma_{i-1}$ and the order $n$ as follows. Let $T$ be any rank deficient matrix whose column space coincides with that of $\Gamma_i$.

- Calculate the Singular Value Decomposition:

$$T = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} V^t$$

- Since $T$ is of rank $n$, the number of singular values different from zero will be equal to the order of the system.

- The column spaces of $\Gamma_i$ and $U_1 \Sigma_1^{1/2}$ coincide(3). So, $\Gamma_i$ can be put equal to $U_1 \Sigma_1^{1/2}$.

- With $\underline{\Gamma}_i$ defined as $\Gamma_i$ without the last $l$ rows ($l$ is the number of outputs), we get: $\Gamma_{i-1} = \underline{\Gamma}_i$.

In the following, we will take $T$ equal to the expression in formula (39), but one can replace it with any other matrix whose column space coincides with that of $\Gamma_i$.

(3) The factor $\Sigma_1^{1/2}$ is introduced for symmetry reasons.
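In code, this step amounts to locating the gap in the singular values of $T$ and scaling the left singular vectors; a sketch with a hypothetical relative tolerance:

```python
def order_and_gamma(T, rtol=1e-8):
    # Order n = number of dominant singular values; Gamma_i = U_1 Sigma_1^(1/2)
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    n = int(np.sum(s > rtol * s[0]))
    Gamma_i = U[:, :n] * np.sqrt(s[:n])    # scales column k by sqrt(sigma_k)
    return n, Gamma_i
```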

5.3 Determination of the system matrices

We now assume that $\Gamma_i$, $\Gamma_{i-1}$ and $n$ have been determined as described in the previous section, and are thus known. From (15)-(16) it follows that(4):

$$\hat{X}_i = \Gamma_i^\dagger (Z_i - H^d_i U_{i|2i-1}) \quad (40)$$
$$\hat{X}_{i+1} = \Gamma_{i-1}^\dagger (Z_{i+1} - H^d_{i-1} U_{i+1|2i-1}) \quad (41)$$

In these formulas, the only unknowns on the right hand side are the matrices $H^d_i$ and $H^d_{i-1}$. From (31) and (32) we also know that:

$$\begin{pmatrix} \hat{X}_{i+1} \\ Y_{i|i} \end{pmatrix} = \begin{pmatrix} A \\ C \end{pmatrix} \hat{X}_i + \begin{pmatrix} B \\ D \end{pmatrix} U_{i|i} + \begin{pmatrix} U_{0|2i-1} \\ Z_i \\ \hat{X}_i \end{pmatrix}^\perp \quad (42)$$

If we now substitute the expressions for $\hat{X}_i$ and $\hat{X}_{i+1}$ (40)-(41) in this formula, we get:

$$\underbrace{\begin{pmatrix} \Gamma_{i-1}^\dagger Z_{i+1} \\ Y_{i|i} \end{pmatrix}}_{\text{term 1}} = \underbrace{\begin{pmatrix} A \\ C \end{pmatrix} \Gamma_i^\dagger Z_i}_{\text{term 2}} + \begin{pmatrix} K_{12} \\ K_{22} \end{pmatrix} U_{i|2i-1} + \underbrace{\begin{pmatrix} U_{0|2i-1} \\ Z_i \\ \hat{X}_i \end{pmatrix}^\perp}_{\text{term 3}} \quad (43)$$

where we define:

$$\begin{pmatrix} K_{12} \\ K_{22} \end{pmatrix} \stackrel{\rm def}{=} \begin{pmatrix} B - A \Gamma_i^\dagger \begin{pmatrix} D \\ \Gamma_{i-1} B \end{pmatrix} & \quad \Gamma_{i-1}^\dagger H^d_{i-1} - A \Gamma_i^\dagger \begin{pmatrix} 0 \\ H^d_{i-1} \end{pmatrix} \\ D - C \Gamma_i^\dagger \begin{pmatrix} D \\ \Gamma_{i-1} B \end{pmatrix} & \quad - C \Gamma_i^\dagger \begin{pmatrix} 0 \\ H^d_{i-1} \end{pmatrix} \end{pmatrix} \quad (44)$$

Observe that the matrices $B$ and $D$ appear linearly in the matrices $K_{12}$ and $K_{22}$. Let $\Pi$ be a matrix whose row space coincides with that of

$$\begin{pmatrix} \Gamma_i^\dagger Z_i \\ U_{i|2i-1} \end{pmatrix}$$

Then (from (43), since term 3 is perpendicular to $\Pi$):

$$\begin{pmatrix} \Gamma_{i-1}^\dagger Z_{i+1} \\ Y_{i|i} \end{pmatrix} / \Pi = \begin{pmatrix} A & K_{12} \\ C & K_{22} \end{pmatrix} \begin{pmatrix} \Gamma_i^\dagger Z_i \\ U_{i|2i-1} \end{pmatrix}$$

(4) $\dagger$ denotes the Moore-Penrose pseudo-inverse.

Obviously, this is a set of linear equations in the unknowns $A, C, K_{12}, K_{22}$. Another point of view is that one could solve the least squares problem:

$$\min_{A, C, K_{12}, K_{22}} \left\| \begin{pmatrix} \Gamma_{i-1}^\dagger Z_{i+1} \\ Y_{i|i} \end{pmatrix} - \begin{pmatrix} A & K_{12} \\ C & K_{22} \end{pmatrix} \begin{pmatrix} \Gamma_i^\dagger Z_i \\ U_{i|2i-1} \end{pmatrix} \right\|^2_F \quad (45)$$

Either way, from (43) we find (term by term):

term 1. $A$ and $C$ exactly.

term 2. $K_{12}$ and $K_{22}$, from which $B$ and $D$ can be unraveled by solving a set of linear equations, analogous to the one described in De Moor (1988). Note that in (44), $B$ and $D$ appear linearly. Hence, if $A, C, \Gamma_i, \Gamma_{i-1}, K_{12}$ and $K_{22}$ are known, solving for $B$ and $D$ is equivalent to solving a set of linear equations.

term 3. The residuals of the least squares solution (43) can be written as:

$$\rho = \begin{pmatrix} U_{0|2i-1} \\ Z_i \\ \hat{X}_i \end{pmatrix}^\perp = \begin{pmatrix} W_{i|i} \\ V_{i|i} \end{pmatrix}$$

where $W_{\cdot|\cdot}$ and $V_{\cdot|\cdot}$ are block Hankel matrices(5) with as entries $w_k$ and $v_k$: the process and measurement noise. This is clearly indicated by equation (42). The system matrices $R^s$, $S^s$ and $Q^s$ are determined approximately from $\rho$ as follows:

$$\frac{1}{j} \rho \rho^t \simeq \begin{pmatrix} Q^s & S^s \\ (S^s)^t & R^s \end{pmatrix}$$

The approximation is due to the fact that the bank of Kalman filters for finite $i$ is not in steady state (see for instance the Riccati difference equation (23)). As $i$ grows larger, the approximation error grows smaller. For infinite $i$ the stochastic subsystem is determined exactly (unbiased). More details on this purely stochastic aspect can be found in Van Overschee & De Moor (1991a, 1991b).

(5) The block Hankel matrices of the residual $\rho$ have only one block row. The notation is introduced to be consistent with previous notations.
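The covariance step in term 3 is a simple partitioning of the residual outer product; a sketch (rho stacks the $n$ state residual rows on top of the $l$ output residual rows):

```python
def noise_covariances(rho, n, l, j):
    Sig = (rho @ rho.T) / j
    Qs = Sig[:n, :n]              # process noise covariance Q^s
    Ss = Sig[:n, n:n + l]         # cross covariance S^s
    Rs = Sig[n:n + l, n:n + l]    # measurement noise covariance R^s
    return Qs, Ss, Rs
```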

5.4 N4SID Algorithm 1

In this section, we summarize the first N4SID algorithm step by step, not yet paying attention to the fine numerical details, which will be treated in section 7.

1. Determine the projections:

$$Z_i = Y_{i|2i-1} / \begin{pmatrix} U_{0|i-1} \\ U_{i|2i-1} \\ Y_{0|i-1} \end{pmatrix} = \begin{pmatrix} L^1_i & L^2_i & L^3_i \end{pmatrix} \begin{pmatrix} U_{0|i-1} \\ U_{i|2i-1} \\ Y_{0|i-1} \end{pmatrix}$$

$$Z_{i+1} = Y_{i+1|2i-1} / \begin{pmatrix} U_{0|i} \\ U_{i+1|2i-1} \\ Y_{0|i} \end{pmatrix}$$

2. Determine the Singular Value Decomposition:

$$\begin{pmatrix} L^1_i & L^3_i \end{pmatrix} \begin{pmatrix} U_{0|i-1} \\ Y_{0|i-1} \end{pmatrix} = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} V^t$$

The order is equal to the number of non-zero singular values, $\Gamma_i = U_1 \Sigma_1^{1/2}$ and $\Gamma_{i-1} = \underline{U}_1 \Sigma_1^{1/2}$.

3. Determine the least squares solution ($\rho_1$ and $\rho_2$ are residuals):

$$\begin{pmatrix} \Gamma_{i-1}^\dagger Z_{i+1} \\ Y_{i|i} \end{pmatrix} = \begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix} \begin{pmatrix} \Gamma_i^\dagger Z_i \\ U_{i|2i-1} \end{pmatrix} + \begin{pmatrix} \rho_1 \\ \rho_2 \end{pmatrix}$$

where $\Gamma_{i-1}^\dagger Z_{i+1}$, $\Gamma_i^\dagger Z_i$ and $\rho_1$ have $n$ rows, $Y_{i|i}$ and $\rho_2$ have $l$ rows and $U_{i|2i-1}$ has $mi$ rows (all with $j$ columns).

4. The system matrices are determined as follows:

4.1 $A = K_{11}$, $C = K_{21}$.

4.2 $B, D$ follow from $A, C$ and $K_{12}, K_{22}$ through a set of linear equations.

4.3 $$\frac{1}{j} \begin{pmatrix} \rho_1 \\ \rho_2 \end{pmatrix} \begin{pmatrix} \rho_1^t & \rho_2^t \end{pmatrix} \simeq \begin{pmatrix} Q^s & S^s \\ (S^s)^t & R^s \end{pmatrix}$$

The deterministic subsystem will be identified exactly (as $j \to \infty$, independent of $i$). The approximation of the stochastic subsystem still depends on $i$ and converges as $i \to \infty$.

6 A Simple Approximate Solution

In this section, we introduce an N4SID algorithm that is very similar to the "exact" (as $j \to \infty$) algorithm of the previous section (algorithm 1). The algorithm we present now finds a good approximation to the state $\hat{X}_i$, and to the system matrices, without having to go through the complicated step 4 for the determination of $B$ and $D$. This results in a simple and elegant algorithm with a slightly lower computational complexity compared to algorithm 1. Another advantage of this simplified N4SID algorithm is that it is very closely related to existing algorithms (Larimore, 1990). This means that the analysis of this simplified algorithm can also be applied to the other algorithms, and can thus contribute to a better understanding of the mechanism of these algorithms. A disadvantage is that the results are not exact (unbiased) for finite $i$ (except for special cases), but an estimate for the bias on the solutions can be calculated. If we could determine the state sequences $\hat{X}_i$ and $\hat{X}_{i+1}$ directly from the data, the matrices $A, B, C$ and $D$ could be found as the least squares solution of (31)-(32). Unfortunately, it is not possible to separate the effect of the input $H^d_i U_{i|2i-1}$ from the effect of the state $\Gamma_i \hat{X}_i$ in formula (15) by just dropping the term with the linear combinations of $U_{i|2i-1}$ ($L^2_i$) in the expression for $Z_i$: we would automatically drop a part of the initial state if we did this (see for instance (36)). So, it is not possible to obtain an explicit expression for $\Gamma_i \hat{X}_i$ and $\Gamma_{i-1} \hat{X}_{i+1}$ without knowledge of $H^d_i$, which would require knowledge of the system matrices. It is however easy to find a good approximation of the state sequences directly from the data. If we use this approximation in (31)-(32), we obtain a second, very elegant and simple N4SID algorithm that calculates approximations of the system matrices.

6.1 The Approximate States

In this section, we derive an approximate expression for the states $\hat{X}_i$ and $\hat{X}_{i+1}$. This approximation can be calculated directly from input-output data. The approximate state sequences are calculated by dropping the linear combinations of $U_{i|2i-1}$ out of $Z_i$, and the linear combinations of $U_{i+1|2i-1}$ out of $Z_{i+1}$. In this way, we obtain Kalman filter states of a different Kalman filter, compared to the one that produces $\hat{X}_i$ (in the sense of different initial conditions). We call the resulting matrices $\Gamma_i \tilde{X}_i$ and $\Gamma_{i-1} \tilde{X}_{i+1}$:

$$\Gamma_i \tilde{X}_i = Z_i - L^2_i U_{i|2i-1} \quad (46)$$
$$\Gamma_{i-1} \tilde{X}_{i+1} = Z_{i+1} - L^2_{i+1} U_{i+1|2i-1} \quad (47)$$

From (33) and (34) we derive:

$$\tilde{X}_i = \begin{pmatrix} [A^i - Q_i \Gamma_i] S (R^{-1})_{1|mi} + \Delta^d_i - Q_i H^d_i & Q_i \end{pmatrix} \begin{pmatrix} U_{0|i-1} \\ Y_{0|i-1} \end{pmatrix} \quad (48)$$

$$\tilde{X}_{i+1} = \begin{pmatrix} [A^{i+1} - Q_{i+1} \Gamma_{i+1}] S (R^{-1})_{1|m(i+1)} + \Delta^d_{i+1} - Q_{i+1} H^d_{i+1} & Q_{i+1} \end{pmatrix} \begin{pmatrix} U_{0|i} \\ Y_{0|i} \end{pmatrix}$$

The matrices $\Gamma_i \tilde{X}_i$ and $\Gamma_{i-1} \tilde{X}_{i+1}$ can be obtained directly from the data without any knowledge of the system matrices, and can be interpreted as oblique projections as described in Appendix D. The state sequence $\tilde{X}_i$ is generated by a bank of non-steady state Kalman filters with $\tilde{P}^0 = P^d - S R^{-1} S^t + P^s$ and $\tilde{X}^0 = S (R^{-1})_{1|mi} U_{0|i-1}$. The state sequence $\tilde{X}_{i+1}$, on the other hand, is generated by a bank of non-steady state Kalman filters with $\tilde{P}^0 = P^d - S R^{-1} S^t + P^s$ and $\tilde{X}^0 = S (R^{-1})_{1|m(i+1)} U_{0|i}$. So clearly, the two sequences do not belong to the same bank of Kalman filters, and the useful formulas (31)-(32) are not valid for these two sequences. Still, we will see below that $\tilde{X}_i$ and $\tilde{X}_{i+1}$ are very close to $\hat{X}_i$ and $\hat{X}_{i+1}$, and that $\hat{X}_i = \tilde{X}_i$, $\hat{X}_{i+1} = \tilde{X}_{i+1}$ if at least one of the following conditions is satisfied:

- $i \to \infty$;
- the deterministic input $u_k$ of the combined deterministic-stochastic system is white noise;
- the system is purely deterministic.

For these three special cases, we will analyze the difference between $\hat{X}_i$ (17) and $\tilde{X}_i$ (48), defined as $\Delta X_i$:

$$\Delta X_i \stackrel{\rm def}{=} \hat{X}_i - \tilde{X}_i = [A^i - Q_i \Gamma_i] S (R^{-1})_{mi+1|2mi} U_{i|2i-1} \quad (49)$$

6.1.1 $i \to \infty$

Define the term between square brackets in (49) as $P_i \stackrel{\rm def}{=} A^i - Q_i \Gamma_i$. It is easy to prove (Appendix E) that:

$$P_i = \prod_{k=0}^{i-1} (A - K_k C) \quad (50)$$

Since the non-steady state Kalman filter converges to a stable closed loop system (Anderson & Moore, 1979), we find that $P_i$ grows smaller as $i$ grows larger. It is also clear that $\lim_{i \to \infty} P_i = 0$. In the limit, equation (49) thus becomes (with $S (R^{-1})_{mi+1|2mi} U_{i|2i-1}$ finite):

$$\lim_{i \to \infty} \Delta X_i = \lim_{i \to \infty} P_i S (R^{-1})_{mi+1|2mi} U_{i|2i-1} = 0$$

The same holds for $\Delta X_{i+1}$. So, we can conclude that, for $i \to \infty$, there is no difference between the state sequences $\hat{X}_i$ and $\tilde{X}_i$. Actually, when $i \to \infty$, the non-steady state Kalman filter bank converges to a steady state Kalman filter bank, i.e. the Kalman filters converge (the Riccati difference equation (23) converges to an algebraic Riccati equation). To obtain the effect of $i \to \infty$ on a computer, in theory we would need a test like:

$$\frac{\| \Delta X_i \|_F}{\sqrt{n \cdot j}} = \frac{\| P_i S (R^{-1})_{mi+1|2mi} U_{i|2i-1} \|_F}{\sqrt{n \cdot j}} < \varepsilon_{\rm mach}$$

where $\varepsilon_{\rm mach}$ is the machine precision. This is a condition that can only be checked after the identification is done. Appendix G indicates how this quantity can be calculated. In practice, it turns out that $i$ does not have to be that large: the identification results are already very good for reasonably small $i$ ($\simeq 10$). This will become apparent in the examples of section 9.

6.1.2 $u_k$ white noise

With the deterministic input $u_k$ white noise, we find $S = 0$ and $R = I_{2mi}$, with $I_{2mi}$ the $2mi \times 2mi$ identity matrix. So, for a white noise input, we find for any $i$ (see (49)): $\hat{X}_i = \tilde{X}_i$.

6.1.3 Purely deterministic system

From Moonen et al. (1989), we know that generically for deterministic systems we have (if there is no 'rank cancellation', see De Moor (1988)):

$$\text{rank} \begin{pmatrix} U_{0|i-1} \\ Y_{0|i-1} \end{pmatrix} = mi + n \qquad \text{and} \qquad \text{rank} \begin{pmatrix} U_{i|2i-1} \\ Y_{i|2i-1} \end{pmatrix} = mi + n \quad (51)$$

which implies that

$$\text{rank} \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \end{pmatrix} = 2mi + n$$

This rank deficiency implies that for purely deterministic systems the proof of Theorem 2 breaks down (the inverse in the projection $A/B$ cannot be calculated). The following Theorem is an alternative to Theorem 2 for purely deterministic systems:

Theorem 4 (Purely deterministic systems)

If the input is persistently exciting and the stochastic subsystem is zero, we have:

$$\Gamma_i \tilde{X}_i = \begin{pmatrix} \Gamma_i [\Delta^d_i - A^i \Gamma_i^\dagger H^d_i] & \Gamma_i A^i \Gamma_i^\dagger \end{pmatrix} \begin{pmatrix} U_{0|i-1} \\ Y_{0|i-1} \end{pmatrix} = \Gamma_i X_i = \Gamma_i \hat{X}_i$$

A proof can be found in Appendix F. This implies that for deterministic systems, $\Delta X_i$ is also equal to zero.

6.2 The Algorithm

We know from (31) and (32) that the least squares solution $L$ of:

$$\min_{L} \left\| \begin{pmatrix} \hat{X}_{i+1} \\ Y_{i|i} \end{pmatrix} - L \begin{pmatrix} \hat{X}_i \\ U_{i|i} \end{pmatrix} \right\|^2_F$$

is equal to:

$$L = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

We also know that from the residuals of this least squares problem, we can approximately calculate the stochastic subsystem (exactly if $i \to \infty$). Unfortunately, it is impossible to calculate the states $\hat{X}_i$ and $\hat{X}_{i+1}$ directly from the data, without any knowledge of the system matrices. In the previous section, we have seen that for some special cases $\hat{X}_i = \tilde{X}_i$. In the general case, we have $\hat{X}_i \simeq \tilde{X}_i$ if $i$ is reasonably large. This is because (section 6.1.1):

$$\Delta X_i = \prod_{k=0}^{i-1} (A - K_k C) \, S (R^{-1})_{mi+1|2mi} U_{i|2i-1}$$

Consequently, for the least squares solution $\tilde{L}$ of:

$$\min_{\tilde{L}} \left\| \begin{pmatrix} \tilde{X}_{i+1} \\ Y_{i|i} \end{pmatrix} - \tilde{L} \begin{pmatrix} \tilde{X}_i \\ U_{i|i} \end{pmatrix} \right\|^2_F \quad (52)$$

we have:

$$\tilde{L} \simeq \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

Contrary to $\hat{X}_i$, $\tilde{X}_i$ can be calculated directly from the data, without any knowledge of the system matrices. The solution is exact ($\tilde{L} = L$) for the cases of section 6.1.1 ($i \to \infty$), section 6.1.2 (white noise input) and section 6.1.3 (purely deterministic), since then $\hat{X}_i = \tilde{X}_i$. In Appendix G, an approximate expression for $(L - \tilde{L})$ is derived. This is the bias on the solution.

6.3 N4SID Algorithm 2

In this section, we summarize the second N4SID algorithm step by step, not yet paying attention to the fine numerical details, which will be treated in section 7.

1. Determine the projections:

$$Z_i = Y_{i|2i-1} / \begin{pmatrix} U_{0|i-1} \\ U_{i|2i-1} \\ Y_{0|i-1} \end{pmatrix} = \begin{pmatrix} L^1_i & L^2_i & L^3_i \end{pmatrix} \begin{pmatrix} U_{0|i-1} \\ U_{i|2i-1} \\ Y_{0|i-1} \end{pmatrix}$$

$$Z_{i+1} = Y_{i+1|2i-1} / \begin{pmatrix} U_{0|i} \\ U_{i+1|2i-1} \\ Y_{0|i} \end{pmatrix} = \begin{pmatrix} L^1_{i+1} & L^2_{i+1} & L^3_{i+1} \end{pmatrix} \begin{pmatrix} U_{0|i} \\ U_{i+1|2i-1} \\ Y_{0|i} \end{pmatrix}$$

2. Determine the Singular Value Decomposition:

$$\begin{pmatrix} L^1_i & L^3_i \end{pmatrix} \begin{pmatrix} U_{0|i-1} \\ Y_{0|i-1} \end{pmatrix} = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} V^t$$

The order is equal to the number of non-zero singular values, $\Gamma_i = U_1 \Sigma_1^{1/2}$ and $\Gamma_{i-1} = \underline{U}_1 \Sigma_1^{1/2}$.

3. Determine the states $\tilde{X}_i$ and $\tilde{X}_{i+1}$:

$$\tilde{X}_i = \Gamma_i^\dagger \begin{pmatrix} L^1_i & L^3_i \end{pmatrix} \begin{pmatrix} U_{0|i-1} \\ Y_{0|i-1} \end{pmatrix}$$

$$\tilde{X}_{i+1} = \Gamma_{i-1}^\dagger \begin{pmatrix} L^1_{i+1} & L^3_{i+1} \end{pmatrix} \begin{pmatrix} U_{0|i} \\ Y_{0|i} \end{pmatrix}$$

4. Determine the least squares solution ($\rho_1$ and $\rho_2$ are residuals):

$$\begin{pmatrix} \tilde{X}_{i+1} \\ Y_{i|i} \end{pmatrix} = \begin{pmatrix} \tilde{L}_{11} & \tilde{L}_{12} \\ \tilde{L}_{21} & \tilde{L}_{22} \end{pmatrix} \begin{pmatrix} \tilde{X}_i \\ U_{i|i} \end{pmatrix} + \begin{pmatrix} \rho_1 \\ \rho_2 \end{pmatrix}$$

5. The system matrices are (approximately) determined as follows:

5.1 $$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \simeq \begin{pmatrix} \tilde{L}_{11} & \tilde{L}_{12} \\ \tilde{L}_{21} & \tilde{L}_{22} \end{pmatrix}$$

5.2 $$\frac{1}{j} \begin{pmatrix} \rho_1 \\ \rho_2 \end{pmatrix} \begin{pmatrix} \rho_1^t & \rho_2^t \end{pmatrix} \simeq \begin{pmatrix} Q^s & S^s \\ (S^s)^t & R^s \end{pmatrix}$$

where in the general case the approximation of the deterministic and stochastic subsystem depends on $i$ (even when $j \to \infty$) and converges as $i, j \to \infty$.
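Put together, algorithm 2 condenses to a short script. The following sketch (reusing block_hankel and order_and_gamma from the earlier snippets) implements steps 1-5 with plain least squares instead of the QR-based route of section 7, so it is meant as an illustration rather than the numerically preferred implementation:

```python
def n4sid_alg2(u, y, i, j):
    m, l = u.shape[1], y.shape[1]

    def coeffs(F, W):
        # Least squares coefficient matrix L with F ~ L W
        L, *_ = np.linalg.lstsq(W.T, F.T, rcond=None)
        return L.T

    U0, Y0 = block_hankel(u, 0, i, j), block_hankel(y, 0, i, j)
    Uf, Yf = block_hankel(u, i, i, j), block_hankel(y, i, i, j)
    Up, Yp = block_hankel(u, 0, i + 1, j), block_hankel(y, 0, i + 1, j)
    Uff, Yff = block_hankel(u, i + 1, i - 1, j), block_hankel(y, i + 1, i - 1, j)

    Li = coeffs(Yf, np.vstack([U0, Uf, Y0]))     # (L1_i L2_i L3_i), cf. (33)
    Lp = coeffs(Yff, np.vstack([Up, Uff, Yp]))   # shifted version, cf. (34)

    # Step 2: order and Gamma_i from (L1_i L3_i)[U_{0|i-1}; Y_{0|i-1}]
    L1, L3 = Li[:, :m * i], Li[:, 2 * m * i:]
    n, Gam = order_and_gamma(L1 @ U0 + L3 @ Y0)
    Gm1 = Gam[:-l, :]                            # Gamma_{i-1}: drop the last l rows

    # Step 3: approximate states, cf. (46)-(47)
    Xt = np.linalg.pinv(Gam) @ (L1 @ U0 + L3 @ Y0)
    Lp1, Lp3 = Lp[:, :m * (i + 1)], Lp[:, m * (i + 1) + m * (i - 1):]
    Xt1 = np.linalg.pinv(Gm1) @ (Lp1 @ Up + Lp3 @ Yp)

    # Steps 4-5: least squares for [[A, B], [C, D]]
    Uii, Yii = block_hankel(u, i, 1, j), block_hankel(y, i, 1, j)
    Ltil = coeffs(np.vstack([Xt1, Yii]), np.vstack([Xt, Uii]))
    return Ltil[:n, :n], Ltil[:n, n:], Ltil[n:, :n], Ltil[n:, n:]   # A, B, C, D
```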

7 A Numerically Stable and Efficient Implementation

In this section, we describe how N4SID algorithms 1 and 2 can be implemented in a numerically stable and efficient way, making extensive use of the QR decomposition and the SVD. In sections 5.4 and 6.3 we described, step by step, two N4SID algorithms that determine the system matrices from given input-output data. Steps 1 through 4 below are common to both algorithms.

1. Construct the block Hankel matrix $H$(6):

$$H = \frac{1}{\sqrt{j}} \begin{pmatrix} U_{0|2i-1} \\ Y_{0|2i-1} \end{pmatrix} \in \mathbb{R}^{2(m+l)i \times j}$$

2. Calculate the $R$ factor of the RQ factorization of $H$:

$$H = \underbrace{R}_{2(m+l)i \times 2(m+l)i} \cdot \underbrace{Q^t}_{2(m+l)i \times j}$$

with $Q^t Q = I$ and $R$ lower triangular. It is important to note that in the final calculations only the $R$ factor is needed, which lowers the computational complexity significantly.

(6) The scalar $\sqrt{j}$ is used to conform with the definition of $E$.
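NumPy does not ship an RQ routine, but the lower triangular $R$ factor here follows from the ordinary QR factorization of $H^t$; a sketch:

```python
def lower_R(H):
    # H = R Q^t with R lower triangular  <=>  H^t = Q R^t with R^t upper triangular
    Q, Rt = np.linalg.qr(H.T)     # economy-size QR of the transposed data matrix
    return Rt.T                   # the 2(m+l)i x 2(m+l)i lower triangular factor

Hmat = np.vstack([block_hankel(u, 0, 2 * i, j),
                  block_hankel(y, 0, 2 * i, j)]) / np.sqrt(j)
R = lower_R(Hmat)                 # only R is needed in what follows
```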

Partition this factorization in the following way:

$$\begin{pmatrix} U_{0|i-1} \\ U_{i|i} \\ U_{i+1|2i-1} \\ Y_{0|i-1} \\ Y_{i|i} \\ Y_{i+1|2i-1} \end{pmatrix} = \begin{pmatrix} R_{11} & 0 & 0 & 0 & 0 & 0 \\ R_{21} & R_{22} & 0 & 0 & 0 & 0 \\ R_{31} & R_{32} & R_{33} & 0 & 0 & 0 \\ R_{41} & R_{42} & R_{43} & R_{44} & 0 & 0 \\ R_{51} & R_{52} & R_{53} & R_{54} & R_{55} & 0 \\ R_{61} & R_{62} & R_{63} & R_{64} & R_{65} & R_{66} \end{pmatrix} \begin{pmatrix} Q^t_1 \\ Q^t_2 \\ Q^t_3 \\ Q^t_4 \\ Q^t_5 \\ Q^t_6 \end{pmatrix}$$

where the block rows (and the corresponding block columns of $R$) have heights $mi$, $m$, $m(i-1)$, $li$, $l$ and $l(i-1)$, and where we use the shorthand notation $R_{4:6,1:3}$ for the submatrix of $R$ consisting of block rows 4 to 6 and block columns 1 to 3.

3. Calculate the projections (see also the note at the end of this section):

$$Z_i = R_{5:6,1:4} \, R^{-1}_{1:4,1:4} \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \end{pmatrix} = \begin{pmatrix} L^1_i & L^2_i & L^3_i \end{pmatrix} \begin{pmatrix} U_{0|i-1} \\ U_{i|2i-1} \\ Y_{0|i-1} \end{pmatrix}$$

$$Z_{i+1} = R_{6:6,1:5} \, R^{-1}_{1:5,1:5} \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i} \end{pmatrix} = \begin{pmatrix} L^1_{i+1} & L^2_{i+1} & L^3_{i+1} \end{pmatrix} \begin{pmatrix} U_{0|i} \\ U_{i+1|2i-1} \\ Y_{0|i} \end{pmatrix}$$

4. Determine $\Gamma_i$ and $n$ through the SVD of $\Gamma_i \tilde{X}_i$:

$$\begin{pmatrix} L^1_i & 0 & L^3_i \end{pmatrix} R_{1:4,1:4} Q^t_{1:4} = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{pmatrix} (Q_{1:4} V)^t$$

Note that this SVD can be calculated through the SVD of $( L^1_i \;\; 0 \;\; L^3_i ) R_{1:4,1:4}$, since $Q_{1:4}$ is an orthonormal matrix. The rank is determined from the dominant singular values of this decomposition ($\Sigma_1$), and $\Gamma_i$ can be chosen as:

$$\Gamma_i = U_1 \Sigma_1^{1/2} \ , \qquad \Gamma_{i-1} = \underline{\Gamma}_i$$

where the underbar means deleting the last $l$ rows ($l$ is the number of outputs). The two algorithms described in sections 5 and 6 now differ in steps 5 and 6.

N4SID Algorithm 1:

5. For the left hand side and right hand side of equation (43), written as a function of the original $Q$ matrix, we find:

$$\begin{pmatrix} \Gamma_i^\dagger Z_i \\ U_{i|2i-1} \end{pmatrix} = \begin{pmatrix} \Sigma_1^{-1/2} U_1^t R_{5:6,1:4} \\ R_{2:3,1:4} \end{pmatrix} Q^t_{1:4}$$

$$\begin{pmatrix} \Gamma_{i-1}^\dagger Z_{i+1} \\ Y_{i|i} \end{pmatrix} = \begin{pmatrix} \Sigma_1^{-1/2} (\underline{U}_1)^\dagger R_{6:6,1:5} \\ R_{5:5,1:5} \end{pmatrix} Q^t_{1:5}$$

6. Now the least squares problem (45) can be rewritten as:

$$\min_{K} \left\| \begin{pmatrix} \Sigma_1^{-1/2} (\underline{U}_1)^\dagger R_{6:6,1:4} \\ R_{5:5,1:4} \end{pmatrix} - K \begin{pmatrix} \Sigma_1^{-1/2} U_1^t R_{5:6,1:4} \\ R_{2:3,1:4} \end{pmatrix} \right\|^2_F$$

and solved in a least squares sense for $K$. Note that the $Q$ matrix has completely disappeared from these final formulas. The first $n$ columns of $K$ are $A$ and $C$ stacked on top of each other. The next columns determine $B$ and $D$ as described in section 5. The residuals of this solution determine the stochastic system as described in section 5.

N4SID Algorithm 2:

5. We have:

$$\begin{pmatrix} \tilde{X}_i \\ U_{i|i} \end{pmatrix} = \begin{pmatrix} \Sigma_1^{-1/2} U_1^t \begin{pmatrix} L^1_i & 0 & L^3_i \end{pmatrix} R_{1:4,1:4} \\ R_{2:2,1:4} \end{pmatrix} Q^t_{1:4}$$

$$\begin{pmatrix} \tilde{X}_{i+1} \\ Y_{i|i} \end{pmatrix} = \begin{pmatrix} \Sigma_1^{-1/2} (\underline{U}_1)^\dagger \begin{pmatrix} L^1_{i+1} & 0 & L^3_{i+1} \end{pmatrix} R_{1:5,1:5} \\ R_{5:5,1:5} \end{pmatrix} Q^t_{1:5}$$

6. Now the least squares problem (52) can be rewritten as:

$$\min_{\tilde{L}} \left\| \begin{pmatrix} \Sigma_1^{-1/2} (\underline{U}_1)^\dagger \begin{pmatrix} L^1_{i+1} & 0 & L^3_{i+1} \end{pmatrix} R_{1:5,1:4} \\ R_{5:5,1:4} \end{pmatrix} - \tilde{L} \begin{pmatrix} \Sigma_1^{-1/2} U_1^t \begin{pmatrix} L^1_i & 0 & L^3_i \end{pmatrix} R_{1:4,1:4} \\ R_{2:2,1:4} \end{pmatrix} \right\|^2_F$$

Once again, the $Q$ matrix has completely disappeared from these final formulas. From $\tilde{L}$ and the residuals we can find the system matrices as described in section 6.

Note:

- The projection of step 3 is written as a function of $U$ and $Y$, and not as a function of $Q$, because we need to be able to drop the linear combinations ($L^2_i$) of the rows of $U_{i|2i-1}$ in $Z_i$ to find the sequence $\Gamma_i \tilde{X}_i$, and the linear combinations ($L^2_{i+1}$) of the rows of $U_{i+1|2i-1}$ in $Z_{i+1}$ to find the sequence $\Gamma_{i-1} \tilde{X}_{i+1}$ (see formulas (46)-(47)).

- The possible rank deficiency of the row space we are projecting on (in step 3) has to be taken into account (see section 6.1.3). This occurs when $R_{1:4,1:4}$ and $R_{1:5,1:5}$ are rank deficient, and it can be checked by inspection of the singular values of $R_{1:4,1:4}$. If one of them turns out to be zero (purely deterministic case), a basis $\Pi$ for the space we are projecting on has to be selected. This basis should contain the row vectors of $U_{i|2i-1}$ explicitly, which makes it easy to drop the linear combinations of $U_{i|2i-1}$ ($L^2_i$) in $Z_i$ to obtain the sequence $\Gamma_i \tilde{X}_i$. With the singular value decomposition:

$$\begin{pmatrix} U_{0|i-1} \\ Y_{0|i-1} \end{pmatrix} = \begin{pmatrix} R_{1:1,1:4} \\ R_{4:4,1:4} \end{pmatrix} Q^t_{1:4} = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} S_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_1^t \\ V_2^t \end{pmatrix} Q^t_{1:4}$$

we find as an appropriate basis:

$$\Pi = \begin{pmatrix} V_1^t \\ R_{2:3,1:4} \end{pmatrix} Q^t_{1:4}$$

The rest of the algorithm is very similar to the general case. We should note that in the generic real life case, the problem of rank deficiency is a purely academic one.

8 Connection with existing algorithms

8.1 Instrumental variable method

This method was described by De Moor et al. (1991) and Verhaegen (1991). The basic idea will be shortly repeated here. We start from the input-output equation (11): $Y_{i|2i-1} = \Gamma_i X^d_i + H^d_i U_{i|2i-1} + Y^s_{i|2i-1}$. Projection of this equation on the row space perpendicular to that of $U_{i|2i-1}$ gives:

$$Y_{i|2i-1} / U^\perp_{i|2i-1} = \Gamma_i X^d_i / U^\perp_{i|2i-1} + Y^s_{i|2i-1} \quad (53)$$

Note that $Y^s_{i|2i-1} / U^\perp_{i|2i-1} = Y^s_{i|2i-1}$, since $Y^s_{i|2i-1} / U_{i|2i-1} = 0$ as $j \to \infty$. Projection of (53) on $U_{0|i-1}$ gives:

$$[Y_{i|2i-1} / U^\perp_{i|2i-1}] / U_{0|i-1} = \Gamma_i [X^d_i / U^\perp_{i|2i-1}] / U_{0|i-1}$$

since $Y^s_{i|2i-1} / U_{0|i-1} = 0$. It can be seen from this equation that the column space of $[Y_{i|2i-1} / U^\perp_{i|2i-1}] / U_{0|i-1}$ is equal to that of $\Gamma^d_i$ (only the deterministic controllable modes are observed). So, from the column space of $\Gamma^d_i$, $A$ and $C$ can be derived. $B$ and $D$ are then found from the fact that:

$$(\Gamma^d_i)^\perp [Y_{i|2i-1} / U_{i|2i-1}] U^\dagger_{i|2i-1} = (\Gamma^d_i)^\perp H^d_i$$

which leads to a set of linear equations in $B$ and $D$. Finally, the stochastic subsystem is identified from the difference between the measured output and a simulation of the identified deterministic subsystem with the given input $u_k$. This difference is almost equal to the output of the stochastic subsystem ($y^s_k$). The stochastic subsystem can then be identified with, for instance, one of the algorithms described by Arun & Kung (1990) or Van Overschee & De Moor (1991a, 1991b). A clear disadvantage of this algorithm is the fact that the deterministic and stochastic subsystems are identified separately. This involves more calculations. Two different $A$ matrices will also be identified (one for the deterministic and one for the stochastic system), even though a lot of the dynamics could be shared between the two subsystems.

8.2 Intersection Algorithms

In the literature, a couple of algorithms which we call 'intersection algorithms' have been published by Moonen et al. (1989) and Moonen et al. (1992). It is interesting to note the analogy between these algorithms and algorithm 2 described in section 6. We show that the row space of $\tilde{X}_i$ can also be found as the intersection of the row spaces of two Hankel matrices $H_1$ and $H_2$. Define $I = \text{row space } H_1 \cap \text{row space } H_2$ with:

$$H_1 = \begin{pmatrix} U_{0|i-1} \\ Y_{0|i-1} \end{pmatrix} \qquad \text{and} \qquad H_2 = \begin{pmatrix} Y_{i|2i-1} / \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \end{pmatrix} \\ U_{i|2i-1} \end{pmatrix}$$

Facts:

- The subspace $I$ is $n$ dimensional. Proof: we have $\text{rank } H_1 = \gamma_1$, with $\gamma_1 = (m+l)i$ in the generic combined deterministic-stochastic case. From equation (15), we find that $\text{rank } H_2 = mi + n$. Finally, if the input is persistently exciting, $\text{rank} \left( \begin{smallmatrix} H_1 \\ H_2 \end{smallmatrix} \right) = \gamma_1 + mi$. With Grassmann's Theorem, we find: $\dim[H_1 \cap H_2] = \text{rank } H_1 + \text{rank } H_2 - \text{rank} \left( \begin{smallmatrix} H_1 \\ H_2 \end{smallmatrix} \right) = (\gamma_1) + (mi + n) - (\gamma_1 + mi) = n$.

- The intersection subspace $I$ is equal to the row space of $\tilde{X}_i$. Proof: $\tilde{X}_i$ lies in the row space of $H_1$, since it is written as a linear combination of $U_{0|i-1}$ and $Y_{0|i-1}$ (see formula (48)). $\tilde{X}_i$ also lies in the row space of $H_2$, since we know from (46) that:

$$\tilde{X}_i = \Gamma_i^\dagger \left[ Y_{i|2i-1} / \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \end{pmatrix} - L^2_i U_{i|2i-1} \right]$$

So $\tilde{X}_i$ lies in the intersection of the row spaces of $H_1$ and $H_2$. Since both the intersection subspace $I$ and the row space of $\tilde{X}_i$ are $n$ dimensional, they must coincide.

This derivation indicates strong similarities between N4SID algorithm 2 and the existing intersection algorithms mentioned above. When $j \to \infty$, both algorithms will calculate the same solution (if the same state space basis is used). In practice, however (finite $j$), they calculate slightly different models. In the references given above, only purely deterministic systems and the asymptotic case $i \to \infty$ are treated. The interpretation in terms of Kalman filter states is totally absent, as is the effect of finite $i$. The similarities indicate an alternative way to calculate the states $\tilde{X}_i$ and $\tilde{X}_{i+1}$, as intersections (see Moonen et al., 1992 for example).

8.3 Interpretation of general projection methods

Larimore (1990) shows that, viewed from a statistical point of view, the state (memory) $M_i$ is the solution to:

$$p\left[ Y_{i|2i-1} \left| \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \end{pmatrix} \right. \right] = p\left[ Y_{i|2i-1} \left| \begin{pmatrix} U_{i|2i-1} \\ M_i \end{pmatrix} \right. \right]$$

where $p[A|B]$ is the probability distribution of $A$ given $B$. For Gaussian processes, these distributions are characterized by the first two moments, and we can replace $p[A|B]$ by the expectation operator $E[A|B]$. This expectation operator can in its turn be replaced by the projection operator (Papoulis, 1984). We then get:

$$Y_{i|2i-1} / \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \end{pmatrix} = Y_{i|2i-1} / \begin{pmatrix} U_{i|2i-1} \\ M_i \end{pmatrix} \quad (54)$$

By substitution, it is easy to verify that all of the following three alternatives satisfy equation (54): $M_i = \hat{X}_i$, $M_i = \tilde{X}_i$ and $M_i = \hat{X}_i / U^\perp_{i|2i-1}$. Actually, every matrix $M_i$ satisfying $M_i = Z_i - \Theta U_{i|2i-1}$ and $\text{rank } M_i = n$, with $\Theta$ an arbitrary $li \times mi$ matrix, will also satisfy equation (54) and give rise to an $n$ dimensional memory. This ambiguity is due to the fact that the exact influence of the input variable $U_{i|2i-1}$ on the output is not known. In the framework of linear theory, though, the most natural choice for $M_i$ is the one for which:

$$Y_{i|2i-1} / \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \end{pmatrix} = \Gamma_i M_i + H^d_i U_{i|2i-1}$$

is satisfied ($\Theta = H^d_i$). This is because this choice corresponds to the linear state space equations:

$$M_{i+1} = A M_i + B U_{i|i}$$
$$Y_{i|i} / \begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \end{pmatrix} = C M_i + D U_{i|i}$$

This leads to the choice $M_i = \hat{X}_i$.

9 Examples

9.1 A simple example

This simple simulation example illustrates the concepts explained in this paper. We consider the single input single output, single state system in forward innovation form (Faure, 1976):

$$x_{k+1} = 0.9490 x_k + 1.8805 u_k - 0.1502 e_k$$
$$y_k = 0.8725 x_k - 2.0895 u_k + 2.5894 e_k$$

with $e_k$ a unit energy ($\sigma = 1$), zero mean, Gaussian distributed stochastic sequence.

Effect of i: We first investigate the effect of the number of block rows ($i$) of the block Hankel matrices. The input $u_k$ to the system is the same for all experiments and is equal to a filtered unit energy ($\sigma = 1$), zero mean, Gaussian distributed stochastic sequence added to a Gaussian white noise sequence ($\sigma = 0.1$). The filter is a second order Butterworth filter, with cut-off frequency equal to 0.025 (sampling time 1). The number of data points used for identification is fixed at 1000. 100 different disturbance sequences $e_k$ were generated. For each of these sequences and for each $i$ in the interval $[2, 15]$, two models were identified using N4SID algorithm 1 (section 5) and algorithm 2 (section 6). For each of these models, the bias was also calculated using the expression in Appendix G. Then the mean of all these quantities over the 100 different disturbance sequences was calculated (Monte Carlo experiment).
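The example is small enough to reproduce directly. A sketch of the data generation (SciPy's butter/lfilter stand in for the Butterworth filter; the normalized cut-off 0.05 corresponds to 0.025 with sampling time 1, since SciPy normalizes by the Nyquist frequency):

```python
from scipy.signal import butter, lfilter

rng = np.random.default_rng(1)
N = 1000
b, a = butter(2, 0.05)                              # 2nd order Butterworth filter
u = (lfilter(b, a, rng.standard_normal(N))          # filtered excitation
     + 0.1 * rng.standard_normal(N))                # plus white noise (sigma = 0.1)
e = rng.standard_normal(N)                          # unit variance innovation

x = 0.0
y = np.zeros(N)
for k in range(N):
    y[k] = 0.8725 * x - 2.0895 * u[k] + 2.5894 * e[k]
    x = 0.9490 * x + 1.8805 * u[k] - 0.1502 * e[k]
```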

Fig. 4 shows the results as a function of $i$ for both N4SID algorithms. The results for algorithm 1 are represented by a dotted line (:) and circles (o). The results for algorithm 2 are represented by a dotted-dashed (-.) line and stars (*). The exact values are indicated with a dashed line. Fig. 4a shows the eigenvalue of the system as a function of $i$. Clearly the estimates are accurate, and there is hardly any difference between the two algorithms. Fig. 4b shows the deterministic zero of the system ($A - B D^{-1} C$) as a function of $i$. The difference between the two N4SID algorithms is clearly visible. Algorithm 1 estimates a zero that is close to the exact deterministic zero, independently of $i$. Algorithm 2, on the other hand, is clearly biased for small $i$. Fig. 4c shows this bias (dashed-dotted line) and the estimated bias (dotted line) using the expressions of Appendix G, as a function of $i$. As expected, the bias grows smaller as $i$ grows larger. Fig. 4d shows the calculated and estimated bias on the identified $D$ matrix. This matrix seems to be a lot more sensitive to $i$ than $A$. Finally, Fig. 4e shows the estimated stochastic zero as a function of $i$ (the eigenvalue of $A - KC$, with $K$ the steady state Kalman gain). The convergence is very slow (the exact zero is 0.9996). More details can be found in Van Overschee & De Moor (1991a, 1991b). For this example, we can conclude that both algorithms do a good job of estimating $A$ and $C$. $B$ and $D$ are estimated accurately with N4SID algorithm 1, but not with algorithm 2 (for small $i$); the bias can be calculated, though. The accuracy of the stochastic subsystem is strongly dependent on $i$ for both algorithms.

Effect of j: Throughout the paper, we assumed that $j \to \infty$. The effect of finite $j$ is shown in Fig. 4f. This figure shows the standard deviation (stars) of the estimate of $A$ as a function of $j$ ($i = 5$). For each $j$, 100 Monte Carlo experiments were done. The standard deviation is proportional to $1/\sqrt{j}$ (dashed line). So, the accuracy of the results depends on $j$ as $1/\sqrt{j}$.

9.2 A glass oven

The glass oven has 3 inputs (2 burners and 1 ventilator) and 6 outputs (temperatures). The data have been pre-processed: detrending, peak shaving, delay estimation, normalization (Backx, 1987). Using 700 data points, five different models are identified (the 300 following points are used as validation data):

1. A state space model with N4SID algorithm 1 of section 5.

2. A state space model with N4SID algorithm 2 of section 6.

3. A state space model with the algorithm of section 8.1.

4. An ARX model (Ljung, 1991) followed by a balanced truncation.

5. A prediction error model (Ljung, 1991).

The results are summarized in Table 1. The first row indicates the chosen order. Fig. 5 shows the singular values that led to the system order 5 for N4SID algorithms 1 and 2. The gap in the singular values is clearly visible. For the ARX model, the parameters 'na/nb' indicate that 'na' was a $6 \times 6$ matrix with all entries 4, and 'nb' a $6 \times 3$ matrix with all entries 4 (see also Ljung, 1991). In this way, the resulting state space model was of order 33. It was reduced to a fourth order model using balanced truncation of the deterministic subsystem. This is basically the same philosophy as for the N4SID algorithms, but the latter do not calculate the intermediate model explicitly. For the prediction error model, we used the identified model of algorithm 1 in the controllability canonical form (controllability indices: 2, 2, 1; number of parameters: 93) as an initial value (the initial values obtained from 'canform' or 'canstart' of Ljung (1991) did not converge). We restricted the number of iterations to 3 (no improvement with more iterations). The third row shows the number of floating point operations. Algorithm 2 needs about 5% less computation (simpler algorithm). The calculation of the ARX model takes 3 times as much computation. The calculation of the prediction error model (only 3 iterations) takes about 25 times as much computation! It should be mentioned that none of the algorithms was optimized in the number of operations (neither ours, nor the ones in the Matlab toolbox), but these figures still give an idea of the order of magnitude of the number of computations. The fourth row is the error (in percent) between the measured validation output and the simulated output using only the deterministic subsystem. The fifth row shows the error between the measured output and the Kalman filter one step ahead prediction. With $[y^v_k]_i$ the $i$th validation output channel and $[y^s_k]_i$ the $i$th simulated output channel, the error is defined as ($N_v = 300$):

$$\epsilon = 100 \cdot \sqrt{ \frac{1}{6} \sum_{i=1}^{6} \frac{ \sum_{k=1}^{N_v} \left( [y^v_k]_i - [y^s_k]_i \right)^2 }{ \sum_{k=1}^{N_v} \left( [y^v_k]_i \right)^2 } } \ \%$$

The best deterministic prediction is obtained with the ARX model, which is logical since it came about by balanced truncation of the deterministic subsystem. The other models perform a little worse (but all four about the same).

The Kalman filter error is the smallest, though, for the first two N4SID algorithms. The projection retains the deterministic and stochastic information in the past that is useful to predict the future. This implies that the resulting models will perform best when used with a Kalman filter. The prediction error model does not improve this Kalman filter prediction error at all. There is even a slight decline, which is probably due to numerical errors, since a warning for a badly conditioned matrix was given during the iteration procedure. This means that, even though there is no optimization involved, the results obtained with the N4SID algorithms for this industrial MIMO process are close to optimal. For this example, the N4SID algorithms thus calculate a state space model without any a priori fixed parametrization, and this in a fast and numerically reliable way. Finally, we should mention that, as we found out in discussion with Lennart Ljung, the prediction error method can be made to work better with some extra manipulations. The main problem with the prediction error method is that the outputs are pairwise collinear. This results in a very hard, ill-conditioned optimization problem, for every possible parametrization. This problem can be solved as follows: first estimate an initial model; then, in the optimization step, let only the parameters of the $C$ and $D$ matrices corresponding to three outputs vary (outputs 1, 4 and 6 for this example). The optimization is better conditioned now, and the resulting model has a performance similar to that of the subspace models for this example. The number of floating point operations stays extremely high, though, and this method is rather complicated and requires quite some insight from the identifiers.

10 Conclusions

In this paper, two new N4SID algorithms to identify combined deterministic-stochastic linear systems have been derived. The first one calculates unbiased results; the second one is a simpler, biased approximation. The connection between these N4SID algorithms and classical Kalman filter estimation theory has been indicated. This allows interpretations and proofs of correctness of the algorithms. In future work, we will discuss other connections with linear system theory: model reduction properties, frequency interpretations and connections with robust control. The open problems that remain to be solved are the connection with maximum likelihood algorithms (Ljung, 1987) and the problem of finding good asymptotic statistical error estimates.

11 Acknowledgments

We would like to thank Lennart Ljung and Mats Viberg for their help and useful comments.

References

Anderson B.D.O., Moore J.B. (1979). Optimal Filtering. Prentice Hall.

Arun K.S., Kung S.Y. (1990). Balanced Approximation of Stochastic Systems. SIAM Journal on Matrix Analysis and Applications, 11, 42-68.

Astrom K., Wittenmark B. (1984). Computer Controlled Systems: Theory and Design. Prentice Hall.

Backx T. (1987). Identification of an Industrial Process: A Markov Parameter Approach. Doctoral Dissertation, Technical University Eindhoven, The Netherlands.

Baksalary J.K., Kala R. (1979). Two Relations Between Oblique and Lambda-Orthogonal Projectors. Linear Algebra and its Applications, 24, 99-103.

De Moor B. (1988). Mathematical concepts and techniques for modeling of static and dynamic systems. Doctoral Dissertation, Katholieke Universiteit Leuven, Dept. E.E., Belgium.

De Moor B., Van Overschee P., Suykens J. (1991). Subspace algorithms for system identification and stochastic realization. Proceedings MTNS, Kobe, Japan, part 1, 589-594.

Enns D. (1984). Model reduction for control system design. Ph.D. dissertation, Dept. Aeronaut. Astronaut., Stanford University, Stanford, CA.

Faure P. (1976). Stochastic realization algorithms. In: System Identification: Advances and Case Studies (Eds. Mehra R. and Lainiotis D.), Academic Press.

Kailath T. (1980). Linear Systems. Prentice Hall.

Kung S.Y. (1978). A New Identification and Model Reduction Algorithm via Singular Value Decomposition. Proceedings of the 12th Asilomar Conference on Circuits, Systems and Computers, Pacific Grove, 705-714.

Larimore W.E. (1990). Canonical Variate Analysis in Identification, Filtering, and Adaptive Control. Proceedings of the 29th Conference on Decision and Control, Hawaii, 596-604.

Ljung L. (1987). System Identification: Theory for the User. Prentice Hall Information and System Sciences Series.

Ljung L. (1991). System Identification Toolbox. The MathWorks, Inc.

Moonen M., De Moor B., Vandenberghe L., Vandewalle J. (1989). On- and off-line identification of linear state space models. International Journal of Control, 49, 219-232.

Moonen M., Ramos J. (1991). A subspace algorithm for balanced state space system identification. ESAT/SISTA report 1991-07, Katholieke Universiteit Leuven, Dept. E.E., Belgium. Accepted for publication in IEEE Transactions on Automatic Control.

Moonen M., Van Overschee P., De Moor B. (1992). A Subspace Algorithm for Combined Stochastic-Deterministic State Space Identification. ESAT/SISTA report 1992-10, Katholieke Universiteit Leuven, Dept. E.E., Belgium.

Papoulis A. (1984). Probability, Random Variables, and Stochastic Processes. McGraw-Hill International Book Company.

Van Overschee P., De Moor B. (1991a). Subspace Algorithms for the Stochastic Identification Problem. 30th IEEE Conference on Decision and Control, Brighton, England, 1321-1326.

Van Overschee P., De Moor B. (1991b). Subspace Algorithms for the Stochastic Identification Problem. ESAT/SISTA report 1991-26, Katholieke Universiteit Leuven, Dept. E.E., Belgium. Accepted for publication in Automatica.

Verhaegen M. (1991). A novel non-iterative MIMO state space model identification technique. Preprints 9th IFAC/IFORS Symposium on Identification and System Parameter Estimation, Budapest, Hungary, 1453-1458.

Appendices

A Proof of the projection theorem

Proof of formula (33):

Note: due to the complexity of the formulas, we use the subscripts p (past) and f (future) throughout this appendix as follows: for $U$, $Y^d$ and $Y^s$ these subscripts denote the subscripts $0|i-1$ and $i|2i-1$ respectively, while for $X^d$ and $X^s$ they denote the subscripts $0$ and $i$. We also make use of (9)-(12) without explicitly mentioning it.

Define:

$$\mathcal{A} \stackrel{\rm def}{=} \lim_{j\to\infty}\frac{1}{j}\,Y_f\begin{pmatrix} U_p^t & U_f^t & Y_p^t \end{pmatrix} = \begin{pmatrix} \mathcal{A}_1 & \mathcal{A}_2 & \mathcal{A}_3 \end{pmatrix}$$

We have:

$$\mathcal{A}_1 = \lim_{j\to\infty}\frac1j Y_f U_p^t = \lim_{j\to\infty}\frac1j\left(\Gamma_iA^iX_p^d + \Gamma_i\Delta_i^dU_p + H_i^dU_f + Y_f^s\right)U_p^t = \Gamma_iA^iS_1 + \Gamma_i\Delta_i^dR_{11} + H_i^dR_{12}^t$$

$$\mathcal{A}_2 = \lim_{j\to\infty}\frac1j Y_f U_f^t = \Gamma_iA^iS_2 + \Gamma_i\Delta_i^dR_{12} + H_i^dR_{22}$$

$$\begin{aligned}
\mathcal{A}_3 &= \lim_{j\to\infty}\frac1j Y_f Y_p^t
= \lim_{j\to\infty}\frac1j\left(\Gamma_iA^iX_p^d + \Gamma_i\Delta_i^dU_p + H_i^dU_f + Y_f^s\right)\left((X_p^d)^t\Gamma_i^t + U_p^t(H_i^d)^t + (Y_p^s)^t\right) \\
&= \Gamma_iA^iP^d\Gamma_i^t + \Gamma_iA^iS_1(H_i^d)^t + \Gamma_i\Delta_i^dS_1^t\Gamma_i^t + \Gamma_i\Delta_i^dR_{11}(H_i^d)^t + H_i^dS_2^t\Gamma_i^t + H_i^dR_{12}^t(H_i^d)^t + \Gamma_i\Delta_i^s
\end{aligned}$$

Thus, we find (with $S = (S_1\;\,S_2)$):

$$\mathcal{A} = \Big(\; \Gamma_iA^iS + \Gamma_i\Delta_i^d\left(R_{11}\;R_{12}\right) + H_i^d\left(R_{12}^t\;R_{22}\right) \;\;\Big|\;\; \Gamma_iA^iP^d\Gamma_i^t + \Gamma_i\Delta_i^s + \Gamma_i\Delta_i^dR_{11}(H_i^d)^t + \Gamma_iA^iS_1(H_i^d)^t + \Gamma_i\Delta_i^dS_1^t\Gamma_i^t + H_i^dR_{12}^t(H_i^d)^t + H_i^dS_2^t\Gamma_i^t \;\Big) \qquad (55)$$

The second part we need to express in terms of the system matrices is:

$$\mathcal{B} \stackrel{\rm def}{=} \lim_{j\to\infty}\frac1j\begin{pmatrix} U_p \\ U_f \\ Y_p \end{pmatrix}\begin{pmatrix} U_p^t & U_f^t & Y_p^t \end{pmatrix} = \begin{pmatrix} \mathcal{B}_{11} & \mathcal{B}_{21}^t \\ \mathcal{B}_{21} & \mathcal{B}_{22} \end{pmatrix}$$

with:

$$\mathcal{B}_{21} = \lim_{j\to\infty}\frac1j Y_p\left(U_p^t\;\,U_f^t\right) = \lim_{j\to\infty}\frac1j\left(\Gamma_iX_p^d + H_i^dU_p + Y_p^s\right)\left(U_p^t\;\,U_f^t\right) = \Gamma_iS + H_i^d\left(R_{11}\;R_{12}\right)$$

$$\mathcal{B}_{22} = \lim_{j\to\infty}\frac1j Y_pY_p^t = \Gamma_iP^d\Gamma_i^t + \Gamma_iS_1(H_i^d)^t + H_i^dS_1^t\Gamma_i^t + H_i^dR_{11}(H_i^d)^t + L_i^s$$

and $\mathcal{B}_{11} = R$. Note that $\mathcal{B}$ is guaranteed to be of full rank $(2m+l)i$, due to the persistently exciting input and the non-zero stochastic subsystem. To compute $\mathcal{B}^{-1}$, we use the formula for the inverse of a block matrix (Kailath, 1980):

$$\begin{pmatrix} \mathcal{B}_{11} & \mathcal{B}_{21}^t \\ \mathcal{B}_{21} & \mathcal{B}_{22} \end{pmatrix}^{-1} = \begin{pmatrix} \mathcal{B}_{11}^{-1} + \mathcal{B}_{11}^{-1}\mathcal{B}_{21}^t\Delta^{-1}\mathcal{B}_{21}\mathcal{B}_{11}^{-1} & -\mathcal{B}_{11}^{-1}\mathcal{B}_{21}^t\Delta^{-1} \\ -\Delta^{-1}\mathcal{B}_{21}\mathcal{B}_{11}^{-1} & \Delta^{-1} \end{pmatrix} \qquad (56)$$

with $\Delta = \mathcal{B}_{22} - \mathcal{B}_{21}\mathcal{B}_{11}^{-1}\mathcal{B}_{21}^t$.

So, in our case, this becomes:

$$\mathcal{B}^{-1} = \begin{pmatrix} R^{-1} + \left(\begin{pmatrix} I \\ 0\end{pmatrix}(H_i^d)^t + R^{-1}S^t\Gamma_i^t\right)\Omega_i^{-1}\left(H_i^d\left(I\;\,0\right) + \Gamma_iSR^{-1}\right) & -\left(\begin{pmatrix} I \\ 0\end{pmatrix}(H_i^d)^t + R^{-1}S^t\Gamma_i^t\right)\Omega_i^{-1} \\ -\Omega_i^{-1}\left(H_i^d\left(I\;\,0\right) + \Gamma_iSR^{-1}\right) & \Omega_i^{-1} \end{pmatrix}$$

with

$$\begin{aligned}
\Omega_i &= \Gamma_iP^d\Gamma_i^t + L_i^s + \Gamma_iS_1(H_i^d)^t + H_i^dS_1^t\Gamma_i^t + H_i^dR_{11}(H_i^d)^t - \left(\Gamma_iS + H_i^d\left(R_{11}\;R_{12}\right)\right)R^{-1}\left(S^t\Gamma_i^t + \begin{pmatrix} R_{11} \\ R_{12}^t\end{pmatrix}(H_i^d)^t\right) \\
&= \Gamma_i(P^d - SR^{-1}S^t)\Gamma_i^t + L_i^s
\end{aligned}$$

From formulas (13) and (33) and the definitions of $\mathcal{A}$ and $\mathcal{B}$, we find (with $j\to\infty$):

$$\left(L_i^1\;\,L_i^2\;\,L_i^3\right) = \left[\frac1j\,Y_{i|2i-1}\begin{pmatrix} U_{0|2i-1}^t & Y_{0|i-1}^t\end{pmatrix}\right]\left[\frac1j\begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1}\end{pmatrix}\begin{pmatrix} U_{0|2i-1}^t & Y_{0|i-1}^t\end{pmatrix}\right]^{-1} = \mathcal{A}\,\mathcal{B}^{-1} \qquad (57)$$

Using (55) and (57), we thus find:

$$\begin{aligned}
\left(L_i^1\;\,L_i^2\right) =\;& \Gamma_iA^iSR^{-1} + \Gamma_i\Delta_i^d\left(I\;\,0\right) + H_i^d\left(0\;\,I\right) \\
&+ \big[\,\Gamma_iA^iS_1(H_i^d)^t + \Gamma_iA^iSR^{-1}S^t\Gamma_i^t + \Gamma_i\Delta_i^dR_{11}(H_i^d)^t + \Gamma_i\Delta_i^dS_1^t\Gamma_i^t + H_i^dR_{12}^t(H_i^d)^t + H_i^dS_2^t\Gamma_i^t \\
&\qquad - \Gamma_iA^iP^d\Gamma_i^t - \Gamma_i\Delta_i^s - \Gamma_i\Delta_i^dR_{11}(H_i^d)^t - \Gamma_iA^iS_1(H_i^d)^t - \Gamma_i\Delta_i^dS_1^t\Gamma_i^t - H_i^dR_{12}^t(H_i^d)^t - H_i^dS_2^t\Gamma_i^t\,\big]\,\Omega_i^{-1}\left(H_i^d\left(I\;\,0\right) + \Gamma_iSR^{-1}\right) \\
=\;& \left(\Gamma_i\Delta_i^d\;\;H_i^d\right) + \Gamma_iA^iSR^{-1} + \Gamma_i\left(A^i(SR^{-1}S^t - P^d)\Gamma_i^t - \Delta_i^s\right)\Omega_i^{-1}\left(H_i^d\left(I\;\,0\right) + \Gamma_iSR^{-1}\right) \\
=\;& \left(\Gamma_i\Delta_i^d\;\;H_i^d\right) + \Gamma_iA^iSR^{-1} - \Gamma_i\Phi_i\Omega_i^{-1}H_i^d\left(I\;\,0\right) - \Gamma_i\Phi_i\Omega_i^{-1}\Gamma_iSR^{-1} \\
=\;& \left(\Gamma_i[\Delta_i^d - Q_iH_i^d]\;\;H_i^d\right) + \Gamma_i[A^i - Q_i\Gamma_i]SR^{-1}
\end{aligned}$$

where $\Phi_i = A^i(P^d - SR^{-1}S^t)\Gamma_i^t + \Delta_i^s$ and $Q_i = \Phi_i\Omega_i^{-1}$. This is exactly the same as the first part of (33). Once again, using (55) and (57), we find for $L_i^3$:

$$\begin{aligned}
L_i^3 &= -\big[\,\Gamma_iA^iS_1(H_i^d)^t + \Gamma_iA^iSR^{-1}S^t\Gamma_i^t + \Gamma_i\Delta_i^dR_{11}(H_i^d)^t + \Gamma_i\Delta_i^dS_1^t\Gamma_i^t + H_i^dR_{12}^t(H_i^d)^t + H_i^dS_2^t\Gamma_i^t \\
&\qquad\; - \Gamma_iA^iP^d\Gamma_i^t - \Gamma_i\Delta_i^s - \Gamma_i\Delta_i^dR_{11}(H_i^d)^t - \Gamma_iA^iS_1(H_i^d)^t - \Gamma_i\Delta_i^dS_1^t\Gamma_i^t - H_i^dR_{12}^t(H_i^d)^t - H_i^dS_2^t\Gamma_i^t\,\big]\,\Omega_i^{-1} \\
&= \Gamma_i\left(A^i(P^d - SR^{-1}S^t)\Gamma_i^t + \Delta_i^s\right)\Omega_i^{-1} \\
&= \Gamma_i\Phi_i\Omega_i^{-1} = \Gamma_iQ_i
\end{aligned}$$

which is exactly the same as the second part of (33). The proof of formula (34) is completely analogous. □
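For finite $j$, the product $\mathcal{A}\,\mathcal{B}^{-1}$ of (57) is simply the least squares solution of $(L_i^1\;L_i^2\;L_i^3)\,W = Y_{i|2i-1}$, with $W$ the stacked inputs $U_{0|2i-1}$ and past outputs $Y_{0|i-1}$; this is also how it would be computed in practice. A minimal NumPy sketch (our own illustration; Yf, U, Yp are assumed to be the block Hankel data matrices with j columns):

```python
import numpy as np

def projection_coefficients(Yf, U, Yp):
    """Finite-j version of (57): the least squares solution of L W = Yf,
    with W the stacked U_{0|2i-1} and Y_{0|i-1}; the 1/j factors in the
    definitions of A and B cancel in the product A B^{-1}."""
    W = np.vstack([U, Yp])
    # min_L || Yf - L W ||_F   =>   L = Yf W^t (W W^t)^{-1}
    L = np.linalg.lstsq(W.T, Yf.T, rcond=None)[0].T
    nu = U.shape[0]                         # 2mi input rows
    return L[:, :nu], L[:, nu:]             # (L^1 L^2) and L^3
```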

B Notes on the special form of the Kalman filter

In this Appendix, we give two different forms of the recursive Kalman filter. The first one is the classical form that can be found in, for instance, Astrom & Wittenmark (1984). This form is then transformed into the form of (21)-(23), which is more useful for this paper. Consider the system (1)-(2):

$$x_{k+1} = Ax_k + Bu_k + w_k$$
$$y_k = Cx_k + Du_k + v_k$$

where, for the time being, we assume that $A, B, C, D$ as well as

$$E\left[\begin{pmatrix} w_k \\ v_k \end{pmatrix}\begin{pmatrix} w_l^t & v_l^t \end{pmatrix}\right] = \begin{pmatrix} Q^s & S^s \\ (S^s)^t & R^s \end{pmatrix}\delta_{kl}$$

are known. Given $\hat x_0$, $\tilde P_0$ and $u_0,\ldots,u_{k-1}$, $y_0,\ldots,y_{k-1}$, the non-steady state Kalman filter state estimate $\hat x_k$ is then given by the following set of recursive formulas (Astrom & Wittenmark, 1984):

$$\hat x_k = A\hat x_{k-1} + Bu_{k-1} + K_{k-1}\left(y_{k-1} - C\hat x_{k-1} - Du_{k-1}\right) \qquad (58)$$

with:

$$K_{k-1} = (A\tilde P_{k-1}C^t + S^s)(C\tilde P_{k-1}C^t + R^s)^{-1} \qquad (59)$$

$$\tilde P_k = A\tilde P_{k-1}A^t + Q^s - (A\tilde P_{k-1}C^t + S^s)(C\tilde P_{k-1}C^t + R^s)^{-1}(A\tilde P_{k-1}C^t + S^s)^t \qquad (60)$$

and $\tilde P_k$ the error covariance matrix.

For our purpose, a different form of these general Kalman filter equations is more useful. With $P^s$, $G$ and $\Lambda_0$ the solution of the set of positive real equations:

$$P^s = AP^sA^t + Q^s,\qquad G = AP^sC^t + S^s,\qquad \Lambda_0 = CP^sC^t + R^s$$

and the transformation:

$$\tilde P_k = P^s + P_k$$

we get (from (59)):

$$K_{k-1} = \left((AP^sC^t + S^s) + AP_{k-1}C^t\right)\left((CP^sC^t + R^s) + CP_{k-1}C^t\right)^{-1} = (AP_{k-1}C^t + G)(\Lambda_0 + CP_{k-1}C^t)^{-1}$$

For the Riccati equation (from (60)) we have:

$$P^s + P_k = AP^sA^t + AP_{k-1}A^t + (P^s - AP^sA^t) - \left((AP^sC^t + S^s) + AP_{k-1}C^t\right)\left((CP^sC^t + R^s) + CP_{k-1}C^t\right)^{-1}\left((AP^sC^t + S^s) + AP_{k-1}C^t\right)^t$$

$$P_k = AP_{k-1}A^t - (AP_{k-1}C^t + G)(\Lambda_0 + CP_{k-1}C^t)^{-1}(AP_{k-1}C^t + G)^t$$

This means that the Kalman filter (21)-(23) calculates the same state estimate $\hat x_k$ as the original Kalman filter (58)-(60) with $\tilde P_0 = P^s + P_0$.
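The equivalence of the two filter forms is easy to verify numerically. The sketch below is our own illustration, with randomly generated matrices under the stated assumptions (A stable, joint noise covariance positive definite); it runs both recursions side by side on the same data, with P^s obtained by iterating its Lyapunov equation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, l, N = 3, 2, 2, 50
A = rng.standard_normal((n, n))
A = 0.5 * A / np.abs(np.linalg.eigvals(A)).max()     # enforce a stable A
B = rng.standard_normal((n, m))
C = rng.standard_normal((l, n))
D = rng.standard_normal((l, m))
M = rng.standard_normal((n + l, n + l))
Cov = M @ M.T                                        # [[Qs, Ss], [Ss^t, Rs]] > 0
Qs, Ss, Rs = Cov[:n, :n], Cov[:n, n:], Cov[n:, n:]

Ps = np.zeros((n, n))                                # P^s = A P^s A^t + Q^s
for _ in range(200):
    Ps = A @ Ps @ A.T + Qs
G = A @ Ps @ C.T + Ss
Lam0 = C @ Ps @ C.T + Rs

u = rng.standard_normal((N, m))                      # arbitrary test data
y = rng.standard_normal((N, l))
x1, x2 = np.zeros(n), np.zeros(n)
P0 = np.eye(n)
Pt, P = Ps + P0, P0.copy()                           # P~_0 = P^s + P_0
for k in range(N):
    # classical form (58)-(60)
    K1 = (A @ Pt @ C.T + Ss) @ np.linalg.inv(C @ Pt @ C.T + Rs)
    x1 = A @ x1 + B @ u[k] + K1 @ (y[k] - C @ x1 - D @ u[k])
    Pt = A @ Pt @ A.T + Qs - K1 @ (A @ Pt @ C.T + Ss).T
    # transformed form (21)-(23)
    K2 = (A @ P @ C.T + G) @ np.linalg.inv(Lam0 + C @ P @ C.T)
    x2 = A @ x2 + B @ u[k] + K2 @ (y[k] - C @ x2 - D @ u[k])
    P = A @ P @ A.T - K2 @ (A @ P @ C.T + G).T
    assert np.allclose(x1, x2)                       # identical state estimates
```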

C Proof of the Kalman filter linear combinations

In this Appendix we prove Theorem 3. First we show that (with $\Phi_k$ and $\Omega_k$ as defined in (26)-(27)):

$$P_k = A^kP_0(A^t)^k - \Phi_k\Omega_k^{-1}\Phi_k^t \qquad (61)$$

We prove (61) by induction.

Proof 1:

$k = 1$: From the Kalman filter formula (23), we find:

$$P_1 = AP_0A^t - (AP_0C^t + G)(CP_0C^t + \Lambda_0)^{-1}(AP_0C^t + G)^t$$

which is exactly the same as (61) for $k = 1$.

$k = i \Rightarrow k = i+1$: We have to prove that

$$P_{i+1} = A^{i+1}P_0(A^t)^{i+1} - \Phi_{i+1}\Omega_{i+1}^{-1}\Phi_{i+1}^t \qquad (62)$$

given that:

$$P_i = A^iP_0(A^t)^i - \Phi_i\Omega_i^{-1}\Phi_i^t$$

Hereto, we write $\Phi_{i+1}\Omega_{i+1}^{-1}\Phi_{i+1}^t$ as a function of $\Phi_i$ and $\Omega_i$:

$$\Phi_{i+1}\Omega_{i+1}^{-1}\Phi_{i+1}^t = \begin{pmatrix} A\Phi_i & G + A^{i+1}P_0(A^t)^iC^t \end{pmatrix}\begin{pmatrix} \Omega_i & \Phi_i^tC^t \\ C\Phi_i & \Lambda_0 + CA^iP_0(A^t)^iC^t \end{pmatrix}^{-1}\begin{pmatrix} \Phi_i^tA^t \\ (G + A^{i+1}P_0(A^t)^iC^t)^t \end{pmatrix} \qquad (63)$$

To calculate the inverse of the middle block matrix in (63), we use formula (56) (to avoid confusion, we write $\Delta_2$ instead of $\Delta$):

$$\Delta_2 = \Lambda_0 + CA^iP_0(A^t)^iC^t - C\Phi_i\Omega_i^{-1}\Phi_i^tC^t = \Lambda_0 + C\left(A^iP_0(A^t)^i - \Phi_i\Omega_i^{-1}\Phi_i^t\right)C^t = \Lambda_0 + CP_iC^t$$

Finally, if we put $\Theta = A^{i+1}P_0(A^t)^iC^t + G$, we find for (63):

$$\begin{aligned}
\Phi_{i+1}\Omega_{i+1}^{-1}\Phi_{i+1}^t &= \begin{pmatrix} A\Phi_i & \Theta \end{pmatrix}\begin{pmatrix} \Omega_i^{-1} + \Omega_i^{-1}\Phi_i^tC^t\Delta_2^{-1}C\Phi_i\Omega_i^{-1} & -\Omega_i^{-1}\Phi_i^tC^t\Delta_2^{-1} \\ -\Delta_2^{-1}C\Phi_i\Omega_i^{-1} & \Delta_2^{-1} \end{pmatrix}\begin{pmatrix} \Phi_i^tA^t \\ \Theta^t \end{pmatrix} \\
&= A\Phi_i\Omega_i^{-1}\Phi_i^tA^t + A\Phi_i\Omega_i^{-1}\Phi_i^tC^t\Delta_2^{-1}C\Phi_i\Omega_i^{-1}\Phi_i^tA^t - A\Phi_i\Omega_i^{-1}\Phi_i^tC^t\Delta_2^{-1}\Theta^t \\
&\qquad - \Theta\Delta_2^{-1}C\Phi_i\Omega_i^{-1}\Phi_i^tA^t + \Theta\Delta_2^{-1}\Theta^t
\end{aligned}$$

So, from (62):

$$\begin{aligned}
P_{i+1} &= A^{i+1}P_0(A^t)^{i+1} - \Phi_{i+1}\Omega_{i+1}^{-1}\Phi_{i+1}^t \\
&= A\left[A^iP_0(A^t)^i - \Phi_i\Omega_i^{-1}\Phi_i^t\right]A^t - \left(A\Phi_i\Omega_i^{-1}\Phi_i^tC^t - \Theta\right)\Delta_2^{-1}\left(C\Phi_i\Omega_i^{-1}\Phi_i^tA^t - \Theta^t\right) \\
&= AP_iA^t - \left[A\left(\Phi_i\Omega_i^{-1}\Phi_i^t - A^iP_0(A^t)^i\right)C^t - G\right]\Delta_2^{-1}\left[A\left(\Phi_i\Omega_i^{-1}\Phi_i^t - A^iP_0(A^t)^i\right)C^t - G\right]^t \\
&= AP_iA^t - (AP_iC^t + G)(\Lambda_0 + CP_iC^t)^{-1}(AP_iC^t + G)^t
\end{aligned}$$

which is the equation of the Kalman filter (23). This implies that $P_{i+1}$ as written in (62) is indeed the state error covariance at time instant $i+1$.

End of Proof 1. □

The rest of the proof is also a proof by induction. We prove (24):

Proof 2:

$k = 1$:

The Kalman filter gain $K_0$ is given by (22): $K_0 = (AP_0C^t + G)(\Lambda_0 + CP_0C^t)^{-1}$, and the state by (21):

$$\hat x_1 = A\hat x_0 + Bu_0 + K_0(y_0 - C\hat x_0 - Du_0) = (A - K_0C)\hat x_0 + (B - K_0D)u_0 + K_0y_0$$

The linear combinations (24) are for $k = 1$:

$$\begin{pmatrix} A - \Phi_1\Omega_1^{-1}C & B - \Phi_1\Omega_1^{-1}D & \Phi_1\Omega_1^{-1} \end{pmatrix}$$

But we also know from (27) and (26) that for $k = 1$:

$$\Phi_1\Omega_1^{-1} = (AP_0C^t + G)(CP_0C^t + \Lambda_0)^{-1} = K_0$$

So these linear combinations give us the Kalman filter state estimate, for a Kalman filter with initial state covariance matrix $P_0$ and initial state estimate $\hat x_0$.

$k = i \Rightarrow k = i+1$: We know that (24) is valid for $k = i$:

$$\hat x_i = (A^i - Q_i\Gamma_i)\hat x_0 + (\Delta_i^d - Q_iH_i^d)\begin{pmatrix} u_0 \\ \vdots \\ u_{i-1} \end{pmatrix} + Q_i\begin{pmatrix} y_0 \\ \vdots \\ y_{i-1} \end{pmatrix}$$

with $Q_i = \Phi_i\Omega_i^{-1}$. So, from this we find the state estimate at time $i+1$ through the Kalman filter equation (21):

$$\hat x_{i+1} = \begin{pmatrix} (A - K_iC)(A^i - Q_i\Gamma_i) & (A - K_iC)(\Delta_i^d - Q_iH_i^d) & (B - K_iD) & (A - K_iC)Q_i & K_i \end{pmatrix}\begin{pmatrix} \hat x_0 \\ u_0 \\ \vdots \\ u_{i-1} \\ u_i \\ y_0 \\ \vdots \\ y_{i-1} \\ y_i \end{pmatrix} \qquad (64)$$

Now we have to prove that the linear combinations in (64) are the same as those of (24) for $k = i+1$.

Proof 3:

Hereto, we have to prove that:

$$Q_{i+1} = \begin{pmatrix} (A - K_iC)Q_i & K_i \end{pmatrix}$$

From the same formulas as in Proof 1, we find:

$$\begin{aligned}
Q_{i+1} = \Phi_{i+1}\Omega_{i+1}^{-1} &= \begin{pmatrix} A\Phi_i & \Theta \end{pmatrix}\begin{pmatrix} \Omega_i^{-1} + \Omega_i^{-1}\Phi_i^tC^t\Delta_2^{-1}C\Phi_i\Omega_i^{-1} & -\Omega_i^{-1}\Phi_i^tC^t\Delta_2^{-1} \\ -\Delta_2^{-1}C\Phi_i\Omega_i^{-1} & \Delta_2^{-1} \end{pmatrix} \\
&= \begin{pmatrix} AQ_i - (AP_iC^t + G)(\Lambda_0 + CP_iC^t)^{-1}CQ_i & (AP_iC^t + G)(\Lambda_0 + CP_iC^t)^{-1} \end{pmatrix} \\
&= \begin{pmatrix} (A - K_iC)Q_i & K_i \end{pmatrix}
\end{aligned} \qquad (65)$$

End of Proof 3. □

If we now look at (24) for $k = i+1$, and use the recursions $\Gamma_{i+1} = \left(\begin{smallmatrix} \Gamma_i \\ CA^i \end{smallmatrix}\right)$, $\Delta_{i+1}^d = \left(A\Delta_i^d\;\;B\right)$ and $H_{i+1}^d = \left(\begin{smallmatrix} H_i^d & 0 \\ C\Delta_i^d & D \end{smallmatrix}\right)$, we get:

$$\begin{aligned}
&\begin{pmatrix} A^{i+1} - (A - K_iC)Q_i\Gamma_i - K_iCA^i & A\Delta_i^d - (A - K_iC)Q_iH_i^d - K_iC\Delta_i^d & B - K_iD & (A - K_iC)Q_i & K_i \end{pmatrix} \\
&= \begin{pmatrix} (A - K_iC)(A^i - Q_i\Gamma_i) & (A - K_iC)(\Delta_i^d - Q_iH_i^d) & B - K_iD & (A - K_iC)Q_i & K_i \end{pmatrix}
\end{aligned}$$

These last linear combinations are exactly the same as those of (64). This concludes the proof of the Kalman filter states.

End of Proof 2. □
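Relation (61) can also be checked numerically. The sketch below is our own illustration: it builds Γ_k, the reversed controllability matrix Δ_k^c = (A^{k-1}G ... G) and the block Toeplitz output covariance L_k^s from a randomly generated stochastic subsystem, forms Φ_k = A^k P_0 Γ_k^t + Δ_k^c and Ω_k = Γ_k P_0 Γ_k^t + L_k^s (our reading of the definitions (26)-(27), which is an assumption of the sketch), and compares the right hand side of (61) with the Riccati recursion (23):

```python
import numpy as np

rng = np.random.default_rng(1)
n, l, imax = 3, 2, 6
A = rng.standard_normal((n, n))
A = 0.5 * A / np.abs(np.linalg.eigvals(A)).max()   # stable A
C = rng.standard_normal((l, n))
M = rng.standard_normal((n + l, n + l)); Cov = M @ M.T
Qs, Ss, Rs = Cov[:n, :n], Cov[:n, n:], Cov[n:, n:]
Ps = np.zeros((n, n))
for _ in range(200):                               # P^s = A P^s A^t + Q^s
    Ps = A @ Ps @ A.T + Qs
G = A @ Ps @ C.T + Ss
Lam0 = C @ Ps @ C.T + Rs
M0 = rng.standard_normal((n, n)); P0 = M0 @ M0.T   # arbitrary P_0 >= 0

def Lam(j):                                        # Λ_0 and Λ_j = C A^{j-1} G
    return Lam0 if j == 0 else C @ np.linalg.matrix_power(A, j - 1) @ G

P, Ak = P0.copy(), np.eye(n)
Gam, Dc = np.zeros((0, n)), np.zeros((n, 0))
for k in range(1, imax + 1):
    Gam = np.vstack([Gam, C @ Ak])                 # Γ_k
    Dc = np.hstack([A @ Dc, G])                    # Δ_k^c = (A^{k-1}G ... G)
    Ak = A @ Ak                                    # A^k
    Ls = np.block([[Lam(r - c) if r >= c else Lam(c - r).T
                    for c in range(k)] for r in range(k)])
    Phi = Ak @ P0 @ Gam.T + Dc
    Om = Gam @ P0 @ Gam.T + Ls
    K = (A @ P @ C.T + G) @ np.linalg.inv(Lam0 + C @ P @ C.T)
    P = A @ P @ A.T - K @ (A @ P @ C.T + G).T      # one Riccati step (23)
    assert np.allclose(P, Ak @ P0 @ Ak.T - Phi @ np.linalg.inv(Om) @ Phi.T)
```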

D Oblique Projections

In this section, we introduce oblique projections. It is shown that $\Gamma_i\tilde X_i$ and $\Gamma_{i-1}\tilde X_{i+1}$ can be interpreted as oblique projections. A geometrical interpretation is also provided.

D.1 Oblique projections

Definition 1 (Oblique Projection). Given two subspaces $\mathcal B$, $\mathcal C$ of a $j$-dimensional ambient space and two matrices $B \in \mathbf{R}^{q\times j}$, $C \in \mathbf{R}^{r\times j}$ the rows of which generate a basis for these spaces, the oblique projection along $\mathcal C$ on $\mathcal B$, denoted by $P_{\mathcal B|\mathcal C}$, is uniquely defined by (Baksalary & Kala, 1979):

$$B\,P_{\mathcal B|\mathcal C} = B,\qquad C\,P_{\mathcal B|\mathcal C} = 0$$

The following Theorem is easy to derive from this definition:

Theorem 5 (Oblique Projection). Given two matrices $B \in \mathbf{R}^{q\times j}$ with row space $\mathcal B$ and $C \in \mathbf{R}^{r\times j}$ with row space $\mathcal C$, the oblique projection along $\mathcal C$ on $\mathcal B$ can be written as:

$$P_{\mathcal B|\mathcal C} = \begin{pmatrix} B^t & C^t \end{pmatrix}\left(\begin{pmatrix} BB^t & BC^t \\ CB^t & CC^t \end{pmatrix}^{-1}\right)_{1:q}B$$

with $(\cdot)^{-1}_{1:q}$ denoting the first $q$ columns of $(\cdot)^{-1}$.

Proof:

$$B\,P_{\mathcal B|\mathcal C} = \begin{pmatrix} BB^t & BC^t \end{pmatrix}\left(\begin{pmatrix} BB^t & BC^t \\ CB^t & CC^t \end{pmatrix}^{-1}\right)_{1:q}B = \left(\begin{pmatrix} I_q & 0 \end{pmatrix}\right)_{1:q}B = B$$

$$C\,P_{\mathcal B|\mathcal C} = \begin{pmatrix} CB^t & CC^t \end{pmatrix}\left(\begin{pmatrix} BB^t & BC^t \\ CB^t & CC^t \end{pmatrix}^{-1}\right)_{1:q}B = \left(\begin{pmatrix} 0 & I_r \end{pmatrix}\right)_{1:q}B = 0$$ □

Theorem 6. With $A \in \mathbf{R}^{p\times j}$ and

$$A\begin{pmatrix} B^t & C^t \end{pmatrix}\begin{pmatrix} BB^t & BC^t \\ CB^t & CC^t \end{pmatrix}^{-1} = \begin{pmatrix} M & N \end{pmatrix}$$

we have:

$$A\,P_{\mathcal B|\mathcal C} = MB = A/\begin{pmatrix} B \\ C \end{pmatrix} - NC$$

The Theorem is easy to prove starting from Theorem 5, since the orthogonal projection of $A$ on the combined row space is $A/\left(\begin{smallmatrix} B \\ C \end{smallmatrix}\right) = MB + NC$.

As a last fact on oblique projections, we find from Baksalary & Kala (1979) that the oblique projection on $\mathcal B$ along $\mathcal C$ can also be written as a $\Pi$-orthogonal projection on $\mathcal B$:

$$P_{\mathcal B,\Pi} = \Pi B^t(B\Pi B^t)^{-1}B$$

with $\Pi = P_{\mathcal C^\perp}$, where $P_{\mathcal C^\perp}$ denotes the orthogonal projection operator on the row space orthogonal to $\mathcal C$.
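Theorem 5 translates directly into a few lines of NumPy. The sketch below (our own illustration, with randomly generated B and C) builds the oblique projector, checks both defining relations, and verifies the Pi-orthogonal form:

```python
import numpy as np

rng = np.random.default_rng(2)
q, r, j = 3, 2, 20
B = rng.standard_normal((q, j))            # rows span the subspace B
C = rng.standard_normal((r, j))            # rows span the subspace C

Gram = np.block([[B @ B.T, B @ C.T],
                 [C @ B.T, C @ C.T]])
P = np.hstack([B.T, C.T]) @ np.linalg.inv(Gram)[:, :q] @ B

assert np.allclose(B @ P, B)               # B P_{B|C} = B
assert np.allclose(C @ P, 0)               # C P_{B|C} = 0

# the same projector as a Pi-orthogonal projection on B, with Pi = P_{C_perp}
Pi = np.eye(j) - C.T @ np.linalg.inv(C @ C.T) @ C
P2 = Pi @ B.T @ np.linalg.inv(B @ Pi @ B.T) @ B
assert np.allclose(P, P2)
```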

D.2 Interpretation of the states

From Theorem 6 and the definitions (46)-(47) of $\tilde X_i$ and $\tilde X_{i+1}$, it can clearly be seen that $\Gamma_i\tilde X_i$ and $\Gamma_{i-1}\tilde X_{i+1}$ are oblique projections:

$$\Gamma_i\tilde X_i = Y_{i|2i-1}\,P_{\left(\begin{smallmatrix} U_{0|i-1} \\ Y_{0|i-1} \end{smallmatrix}\right)\big|U_{i|2i-1}} = \begin{pmatrix} L_i^1 & L_i^3 \end{pmatrix}\begin{pmatrix} U_{0|i-1} \\ Y_{0|i-1} \end{pmatrix}$$

$$\Gamma_{i-1}\tilde X_{i+1} = Y_{i+1|2i-1}\,P_{\left(\begin{smallmatrix} U_{0|i} \\ Y_{0|i} \end{smallmatrix}\right)\big|U_{i+1|2i-1}} = \begin{pmatrix} L_{i+1}^1 & L_{i+1}^3 \end{pmatrix}\begin{pmatrix} U_{0|i} \\ Y_{0|i} \end{pmatrix}$$

Fig. 3 shows how the oblique projection of the row space of $Y_{i|2i-1}$ on the row space of $(U_{0|i-1}^t\;\,Y_{0|i-1}^t)^t$ along the row space of $U_{i|2i-1}$ can be interpreted as the following sequence of operations:

- an orthogonal projection to determine $Z_i$,
- followed by a decomposition of $Z_i$ into directions along the row spaces of $(U_{0|i-1}^t\;\,Y_{0|i-1}^t)^t$ and $U_{i|2i-1}$.

This gives respectively $\Gamma_i\tilde X_i$ and $L_i^2U_{i|2i-1}$. The decomposition of $Z_i$ into $\Gamma_i\hat X_i$ and $H_i^dU_{i|2i-1}$ is also indicated. This is just a nice geometrical interpretation of the Kalman filter states, and has no further consequences for the derivation of the identification algorithm.

E Additional Proof

In this Appendix, we prove that:

$$A^i - Q_i\Gamma_i = \prod_{k=0}^{i-1}(A - K_kC)$$

(factors ordered with decreasing $k$ from left to right).

Proof:

We have (with (65) and $\Gamma_i = \left(\begin{smallmatrix} \Gamma_{i-1} \\ CA^{i-1} \end{smallmatrix}\right)$):

$$\begin{aligned}
A^i - Q_i\Gamma_i &= A^i - (A - K_{i-1}C)Q_{i-1}\Gamma_{i-1} - K_{i-1}CA^{i-1} \\
&= A^i - AQ_{i-1}\Gamma_{i-1} + K_{i-1}CQ_{i-1}\Gamma_{i-1} - K_{i-1}CA^{i-1} \\
&= A(A^{i-1} - Q_{i-1}\Gamma_{i-1}) - K_{i-1}C(A^{i-1} - Q_{i-1}\Gamma_{i-1}) \\
&= (A - K_{i-1}C)(A^{i-1} - Q_{i-1}\Gamma_{i-1})
\end{aligned}$$

From this, we easily find:

$$A^i - Q_i\Gamma_i = \prod_{k=0}^{i-1}(A - K_kC)$$ □
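Note that this proof only uses the recursion (65) and the shift structure of Γ_i, so the product formula holds for an arbitrary gain sequence K_k. The following sketch (our own illustration) checks it with random gains:

```python
import numpy as np

rng = np.random.default_rng(3)
n, l, imax = 4, 2, 7
A = rng.standard_normal((n, n))
C = rng.standard_normal((l, n))
Ks = [rng.standard_normal((n, l)) for _ in range(imax)]   # arbitrary gains

Gam = C.copy()                       # Γ_1 = C
Q = Ks[0].copy()                     # Q_1 = K_0
prod = A - Ks[0] @ C                 # (A - K_0 C)
Ai = A.copy()                        # A^1
for i in range(1, imax):
    assert np.allclose(Ai - Q @ Gam, prod)
    Gam = np.vstack([Gam, C @ Ai])               # Γ_{i+1} = (Γ_i ; C A^i)
    Q = np.hstack([(A - Ks[i] @ C) @ Q, Ks[i]])  # recursion (65)
    prod = (A - Ks[i] @ C) @ prod
    Ai = A @ Ai
assert np.allclose(Ai - Q @ Gam, prod)
```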

F Purely Deterministic Systems

In this appendix, we prove that the algorithms still work for purely deterministic systems. From the deterministic input-output equations (10)-(12) (Moonen et al., 1989) we find:

$$Y_{i|2i-1} = \Gamma_i[\Delta_i^d - A^i\Gamma_i^\dagger H_i^d]U_{0|i-1} + H_i^dU_{i|2i-1} + \Gamma_iA^i\Gamma_i^\dagger Y_{0|i-1}$$

So, from this, (51) and the fact that the input is persistently exciting, we find:

$$Y_{i|2i-1} = \begin{pmatrix} T_1 & T_2 & T_3 \end{pmatrix}\begin{pmatrix} U_{0|i-1} \\ U_{i|2i-1} \\ Y_{0|i-1} \end{pmatrix}$$

with:

$$T_1 = \Gamma_i[\Delta_i^d - A^i\Gamma_i^\dagger H_i^d] + T_1^h,\qquad T_2 = H_i^d,\qquad T_3 = \Gamma_iA^i\Gamma_i^\dagger + T_3^h$$

and the homogeneous solution satisfying:

$$\begin{pmatrix} T_1^h & 0 & T_3^h \end{pmatrix}\begin{pmatrix} U_{0|i-1} \\ U_{i|2i-1} \\ Y_{0|i-1} \end{pmatrix} = 0$$

So, we find that:

$$\begin{aligned}
Z_i &= Y_{i|2i-1}\Big/\begin{pmatrix} U_{0|2i-1} \\ Y_{0|i-1} \end{pmatrix} \\
&= \begin{pmatrix} T_1 & T_2 & T_3 \end{pmatrix}\begin{pmatrix} U_{0|i-1} \\ U_{i|2i-1} \\ Y_{0|i-1} \end{pmatrix} \\
&= \begin{pmatrix} \Gamma_i[\Delta_i^d - A^i\Gamma_i^\dagger H_i^d] & H_i^d & \Gamma_iA^i\Gamma_i^\dagger \end{pmatrix}\begin{pmatrix} U_{0|i-1} \\ U_{i|2i-1} \\ Y_{0|i-1} \end{pmatrix} \\
&= Y_{i|2i-1}
\end{aligned}$$

This implies that:

$$\Gamma_i\tilde X_i = Z_i - L_i^2U_{i|2i-1} = Y_{i|2i-1} - H_i^dU_{i|2i-1} = \Gamma_iX_i = \Gamma_i\hat X_i$$ □
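The statement is easy to illustrate on simulated data: for a noise-free (purely deterministic) system with persistently exciting input, the orthogonal projection Z_i reproduces Y_{i|2i-1} exactly. A minimal simulation sketch (our own illustration; random second order system, i = 3 > n):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, l, i, j = 2, 1, 1, 3, 200
N = 2 * i + j - 1
A = rng.standard_normal((n, n))
A = 0.9 * A / np.abs(np.linalg.eigvals(A)).max()   # stable A
B = rng.standard_normal((n, m))
C = rng.standard_normal((l, n))
D = rng.standard_normal((l, m))

u = rng.standard_normal((N, m))                    # persistently exciting input
x = rng.standard_normal(n)
y = np.empty((N, l))
for k in range(N):                                 # noise-free simulation
    y[k] = C @ x + D @ u[k]
    x = A @ x + B @ u[k]

def hankel(sig, rows):                             # block Hankel, j columns
    return np.vstack([sig[k:k + j].reshape(j, -1).T for k in range(rows)])

U = hankel(u, 2 * i)                               # U_{0|2i-1}
Y = hankel(y, 2 * i)
Yp, Yf = Y[:l * i], Y[l * i:]                      # Y_{0|i-1}, Y_{i|2i-1}
W = np.vstack([U, Yp])
Zi = Yf @ np.linalg.pinv(W) @ W                    # Z_i = Y_f / (U ; Y_p)
assert np.allclose(Zi, Yf)
```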

G Calculation of the bias

For the calculation of the bias $L - \tilde L$, the error on the states $\tilde X_i$ and $\tilde X_{i+1}$ has to be calculated. From (49)-(50), we have:

$$\delta\tilde X_i = \prod_{k=0}^{i-1}(A - K_kC)\,S(R^{-1})_{mi+1|2mi}\,U_{i|2i-1}$$

This quantity can only be calculated after the system has been identified. In the following it is thus assumed that the system matrices $A, B, C, D, Q^s, S^s, R^s$ are known ($R$ and $Q$ are the submatrices of the RQ decomposition as defined in section 7).

- $SR^{-1}$ can be calculated as:

$$\begin{aligned}
SR^{-1} &= X_0^dU_{0|2i-1}^t(U_{0|2i-1}U_{0|2i-1}^t)^{-1} \\
&= \Gamma_i^\dagger[Y_{0|i-1}^d - H_i^dU_{0|i-1}]U_{0|2i-1}^t(U_{0|2i-1}U_{0|2i-1}^t)^{-1} \\
&= \Gamma_i^\dagger[Y_{0|i-1}U_{0|2i-1}^t - H_i^dU_{0|i-1}U_{0|2i-1}^t](U_{0|2i-1}U_{0|2i-1}^t)^{-1} \\
&= \Gamma_i^\dagger[R_{4:4,1:3} - H_i^dR_{1:1,1:3}]R_{1:3,1:3}^{-1}
\end{aligned}$$

- For the calculation of the Kalman filter gains $K_k$, we also need the initial covariance estimate:

$$\tilde P_0 = P^d + P^s - SR^{-1}S^t$$

We have approximately:

$$\lim_{j\to\infty}\frac1j\tilde X_i\tilde X_i^t \simeq P^d + P^s$$

and from the description of the algorithm (see section 7) we have $\frac1j\tilde X_i\tilde X_i^t = \Sigma_1$. Thus we have: $P^d + P^s \simeq \Sigma_1$.

$SR^{-1}S^t$ can be easily found as:

$$SR^{-1}S^t = SR^{-1}(R_{1:3,1:3}R_{1:3,1:3}^t)(SR^{-1})^t = \Gamma_i^\dagger[R_{4:4,1:3} - H_i^dR_{1:1,1:3}][R_{4:4,1:3}^t - R_{1:1,1:3}^t(H_i^d)^t](\Gamma_i^\dagger)^t$$

Care should be taken when subtracting both quantities to obtain $\tilde P_0$: it is possible that, due to the approximations, $\tilde P_0$ becomes negative definite. If this is the case, it should be put equal to zero.

algorithm        1          2          3          4          5
order            5          5          3/3        4/4        5
i                5          5          5          -          -
flops            14355148   13590364   24056676   41744504   330529255
pred. determ.    47.41      47.95      48.56      45.72      47.34
pred. Kalman     10.76      10.76      14.17      28.98      10.87

Table 1: Comparison of 5 identification algorithms.

- Now the Kalman gains $K_k$ can be calculated using formulas (22)-(23). The errors $\delta\tilde X_i$ and $\delta\tilde X_{i+1}$ can be easily calculated as:

$$\delta\tilde X_i \simeq \prod_{k=0}^{i-1}(A - K_kC)\,S(R^{-1})_{mi+1|2mi}\,R_{2:3,1:3}Q_{1:3}$$

$$\delta\tilde X_{i+1} \simeq \prod_{k=0}^{i}(A - K_kC)\,S(R^{-1})_{m(i+1)+1|2mi}\,R_{3:3,1:3}Q_{1:3}$$

The least squares solution of:

$$\min_{L}\left\|\begin{pmatrix} \tilde X_{i+1} + \delta\tilde X_{i+1} \\ Y_{i|i} \end{pmatrix} - L\begin{pmatrix} \tilde X_i + \delta\tilde X_i \\ U_{i|i} \end{pmatrix}\right\|_F^2$$

will thus be a better approximation of $L$ than $\tilde L$ is. Denoting this least squares solution by $\hat L$, the bias can then be calculated as:

$$L - \tilde L \simeq \hat L - \tilde L$$
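In code, the RQ factorization used above is most conveniently obtained from a QR factorization of the transposed data matrix. The following sketch (our own illustration) computes SR^{-1} along the lines of this appendix; the input blocks 1:3 are collapsed into a single block, and Gamma_pinv and Hd (our names for Γ_i^† and H_i^d from the identified model) are assumed given:

```python
import numpy as np

def SR_inv(U, Yp, Gamma_pinv, Hd):
    """S R^{-1} from the RQ factorization of the stacked data matrix;
    block rows: U_{0|2i-1} (input blocks 1:3 collapsed) and Y_{0|i-1}."""
    H = np.vstack([U, Yp])
    Qt, Rt = np.linalg.qr(H.T)        # H^t = Qt Rt  =>  H = R Q
    R, Q = Rt.T, Qt.T                 # R lower triangular, Q orthonormal rows
    nu = U.shape[0]                   # 2mi rows of U_{0|2i-1}
    mi = Hd.shape[1]                  # H_i^d is (li x mi)
    R13 = R[:nu, :nu]                 # R_{1:3,1:3}
    R11 = R[:mi, :nu]                 # R_{1:1,1:3}
    R41 = R[nu:, :nu]                 # R_{4:4,1:3}
    # S R^{-1} = Gamma_i^dag [ R_{4:4,1:3} - H_i^d R_{1:1,1:3} ] R_{1:3,1:3}^{-1}
    return Gamma_pinv @ (R41 - Hd @ R11) @ np.linalg.inv(R13)
```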

Figure 1: The left hand side shows the N4SID approach: first the (Kalman) states, then the system matrices. The right hand side shows the classical approach: first the system, and then an estimate of the states.

Figure 2: Interpretation of the sequence $\hat X_i$ as a sequence of non-steady state Kalman filter state estimates based upon $i$ measurements of $u_k$ and $y_k$.

Figure 3: Interpretation of $\Gamma_i\tilde X_i$ as an oblique projection.

Figure 4: a/ Eigenvalue of A as a function of i (N4SID algorithm 1 (*) and 2 (o)). b/ Deterministic zero as a function of i (N4SID algorithm 1 (*) and 2 (o)). c/ Calculated (*) and estimated (o) bias on the deterministic zero as a function of i (N4SID algorithm 2). d/ Calculated (*) and estimated (o) bias on D as a function of i (N4SID algorithm 2). e/ Stochastic zero as a function of i (N4SID algorithm 1 (*) and 2 (o)). f/ Standard deviation of A as a function of j (N4SID algorithm 1).

Figure 5: Order decision based on the dominant singular values.
