ON WEIGHTING IN STATE-SPACE SUBSPACE SYSTEM IDENTIFICATION

Magnus Jansson and Bo Wahlberg
S3-Automatic Control, Royal Institute of Technology (KTH), S-100 44 Stockholm, Sweden
Keywords: System identification; parameter estimation; subspace methods; performance analysis.

Abstract

The aim of this contribution is to analyze a class of state-space subspace system identification (4SID) methods. In particular, the effect of different weighting matrices is studied. By a linear regression formulation, different cost-functions, which are rather implicit in the ordinary framework of 4SID, are compared. Expressions for asymptotic variances of the pole estimation error are analyzed and, from these expressions, some difficulties in choosing user specified parameters are pointed out. Another interesting result obtained is that a row weighting in the subspace estimation step does not affect the pole estimation error, asymptotically. This result holds for two different pole estimation methods, namely the shift invariance method and an optimally weighted least-squares method. In a numerical example, the relative efficiency of the optimally weighted pole estimation method is indicated.

1 Introduction

The schemes for state-space subspace system identification (4SID) are considered very attractive since they estimate a state-space representation directly from input-output data [1, 2, 3, 4]. This is done without requiring a canonical parameterization, as is normally necessary in conventional prediction error methods [5, 6]. This requirement has been relaxed lately in [7], where it is shown that an over-parameterization of the state-space model does not necessarily lead to performance degradation. However, that estimation method requires a huge parameter search. The 4SID methods use reliable numerical tools which lead to numerically effective implementations, and the nonlinear iterative optimization schemes that usually are necessary in the traditional framework are avoided. Perhaps the most attractive property of 4SID is that the computations for multi-variable systems are just as easy as for single input single output systems. For an overview of 4SID methods, see [8, 9]. The motivation of our work is to improve the understanding of 4SID methods.

In this paper 4SID methods are studied in a linear regression framework. Subspace estimation from the linear regression model is discussed and different approaches are related to previously reported estimates. Results from asymptotic analysis of two pole estimation methods are also presented. This analysis is provided to be able to give guidelines for the user and to see the effects of some weighting matrices proposed in the literature; see e.g. [8, 10, 11]. The analytical results are applied to a simple numerical example where a problem in choosing a user specified parameter is highlighted. In the example the results are also compared with a lower bound, indicating the relative efficiency of an optimally weighted parameter estimator.

2 Model and Assumptions

Consider a linear time-invariant system and assume that the dynamics of the system can be described by an nth order state-space realization

x(t + 1) = Ax(t) + Bu(t) + w(t)   (1a)
y(t) = Cx(t) + Du(t) + v(t).   (1b)

Here, x(t) is an n-dimensional state vector, u(t) is a vector of m input signals, w(t) represents process noise, the vector y(t) contains l output signals and v(t) is additive measurement noise. The system is assumed to be asymptotically stable. The objective of an identification of the system description (1) can be stated as follows: estimate the system order n, a set of matrices {A, B, C, D} which is similar to the true description and, possibly, the noise covariances from the measured data {u(t), y(t)}, t = 1, ..., N. Subspace estimation techniques rely on a model that has an informative part which inherits a low rank property plus, possibly, a noise part. In 4SID, the model is obtained through a stacking of input-output data. Introduce an αl x 1 vector of stacked consecutive outputs as

y_α(t) = [y(t)^T, y(t + 1)^T, ..., y(t + α − 1)^T]^T,   (2)

where α is a user specified integer. To assure a low rank property, α is required to be larger than the system order n.

Similarly, define the vectors of stacked inputs, process and measurement noises as u_α(t), w_α(t), and v_α(t), respectively. Now, it is straightforward to iterate the system equations in (1) to obtain the equation for the stacked quantities

y_α(t) = Γ_α x(t) + Φ_α u_α(t) + n_α(t),   (3)

where n_α(t) = Ψ_α w_α(t) + v_α(t) contains the noise. Here, we have introduced the block matrices whose ijth blocks are

{Γ_α}_{i1} = CA^{i−1}   (4)
{Φ_α}_{ij} = CA^{i−j−1}B for i − j > 0,  D for i − j = 0,  0 for i − j < 0   (5)
{Ψ_α}_{ij} = CA^{i−j−1} for i − j > 0,  0 for i − j ≤ 0   (6)

for i = 1, ..., α and j = 1, ..., α. Equation (3) is the basic model in 4SID. Notice that Γ_α is the extended observability matrix and, hence, of rank n if the system is observable. The process and measurement noise are assumed to be stationary zero-mean ergodic white jointly Gaussian random processes. The measurable input, u(t), is assumed to be a quasi-stationary deterministic sequence [5], uncorrelated with w(t) and v(t). The correlation sequence of u(t) is defined as

r_u(s) = lim_{N→∞} (1/N) ∑_{t=1}^{N} E{u(t + s)u^T(t)} = E{u(t + s)u^T(t)}.   (7)

Furthermore, assume that the input is persistently exciting (PE) of order α + β, [5], where the parameter β is described below.
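To make the stacked-data notation concrete, the following sketch (not part of the paper; the example matrices and the helper names are ours) forms the extended observability matrix Γ_α of (4) and the block Toeplitz matrix Φ_α of (5) for an assumed small system.

```python
# Minimal sketch: building Gamma_alpha (eq. 4) and Phi_alpha (eq. 5)
# for a hypothetical small example system. numpy is the only dependency.
import numpy as np

def extended_observability(A, C, alpha):
    """Stack C, CA, ..., CA^(alpha-1) as in eq. (4)."""
    n = A.shape[0]
    blocks, M = [], np.eye(n)
    for _ in range(alpha):
        blocks.append(C @ M)   # i-th block row is C A^(i-1)
        M = M @ A
    return np.vstack(blocks)

def input_toeplitz(A, B, C, D, alpha):
    """Lower block triangular Phi_alpha of eq. (5): D on the diagonal,
    C A^(i-j-1) B below it, zeros above."""
    l, m = D.shape
    Phi = np.zeros((alpha * l, alpha * m))
    for i in range(alpha):
        for j in range(i + 1):
            blk = D if i == j else C @ np.linalg.matrix_power(A, i - j - 1) @ B
            Phi[i*l:(i+1)*l, j*m:(j+1)*m] = blk
    return Phi

# hypothetical 2nd order SISO example, just to exercise the functions
A = np.array([[0.7, 0.2], [0.0, 0.5]]); B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]]);             D = np.array([[0.0]])
alpha = 4
print(extended_observability(A, C, alpha).shape)  # (4, 2), rank n = 2
print(input_toeplitz(A, B, C, D, alpha).shape)    # (4, 4)
```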
3 Linear Regression
The key idea of subspace identification is, as a first step, to estimate the n-dimensional range space of Γ_α from Equation (3). This can be done with more or less clever projections on the data. The framework, usually phrased in terms of advanced numerical linear algebra, can be a bit difficult to interpret and tends to conceal the statistical properties. Here, 4SID will be formulated in a linear regression framework. The explicit relation to linear regression will hopefully clarify certain questions regarding 4SID.
Definition. The 4SID linear regression model is defined by

y_α(t) = L1 y_β(t − β) + L2 u_β(t − β) + L3 u_α(t) + e_α(t).   (8)

As seen in [11, 12], the linear regression model will inherit certain properties when used in conjunction with state-space descriptions like (1). The noise vector e_α(t) is orthogonal to y_β(t − β), u_β(t − β), u_α(t), and

rank[L1 L2] = n,   R[L1 L2] = R[Γ_α],   (9)

where R[·] denotes the range space of [·]. Here, β is a design variable which corresponds to the dimension of the past used for estimating the state in (3). The parameter α is the prediction horizon and is required to fulfill α > n. The conditions (9) imply that there exists a rank n factorization of [L1 L2] like

[L1 L2] = Γ_α [K1 K2].   (10)

The noise vector e_α(t) consists of two parts. The first part contains the process and measurement noises, n_α(t). The second part comes from the estimation error of the states.

Remark 1. While it is obvious that rank[L1 L2] ≤ n, it is not easy to establish conditions for when equality holds. Loosely speaking, this requires the states to be sufficiently excited. We shall not elaborate further on this but rather assume (9) to hold throughout the paper.

In [1] a similar idea of replacing the state with an estimate is elaborated on. They consider the estimated state sequence as obtained from a bank of non-steady state Kalman filters and, in that case, give expressions for the matrices Li. The difference here is that the observer parameters are kept free when evolving in time. This results in the linear regression interpretation [11] and makes it possible to perform a systematic analysis within this framework. In 4SID algorithms it is common to use α = β and there exists no general rule for how to choose them. It is also a quite common idea that the performance improves if α and β are increased. However, in this paper a simple example is provided to show that this is not always the case.

4 Estimation

Most 4SID methods start the estimation procedure by estimating the extended observability matrix. This, so called "subspace", estimation is discussed in Sect. 4.1. To be able to compare the effects of different weightings, the estimation of the system poles is considered in Sect. 4.2.

4.1 Subspace Estimation

Using equation (8), the observations can be modeled by the matrix equation

Y_α = L1 Y_β + L2 U_β + L3 U_α + E_α,   (11)
where the block Hankel matrix of outputs is defined as

Y_α = [y_α(1 + β), ..., y_α(N − α + 1)],   (12)

and similarly for U_α, Y_β, U_β and E_α. The 4SID linear regression model, (8), corresponds to a multi-step prediction error model. It is quite natural to base an estimation procedure on minimization of the prediction error using some suitable norm. Thus, consider the prediction error covariance matrix

Σ(L1, L2, L3) = E_α E_α^T = ∑_{t=1+β}^{N−α+1} e_α(t) e_α^T(t),   (13)

where only the explicit dependence on the regression coefficients has been stressed. In general the matrices Li depend on the system matrices, noise covariances and the initial state. The structure in these matrices can be incorporated in the minimization of Σ. However, the resulting nonlinear minimization problem is quite complicated. Here, the structure will be relaxed and L3 will be considered as a completely unknown matrix. Furthermore, the only restriction on [L1 L2] will be the rank n factorization (10). In the following two subsections we will consider the subspace estimation for two different norms.
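As a concrete illustration of (11)-(12), the sketch below (our own; the indexing convention and the random placeholder data are assumptions, not the authors' code) builds the "past" and "future" block Hankel matrices from a measured record {u(t), y(t)}.

```python
# Illustrative sketch: arranging data {u(t), y(t)} into the stacked
# vectors/Hankel matrices of eqs. (2), (11)-(12).
import numpy as np

def stack_hankel(z, start, horizon, n_cols):
    """Columns are z_horizon(t) = [z(t); z(t+1); ...; z(t+horizon-1)]
    for t = start, ..., start + n_cols - 1. z has one row per sample."""
    dim = z.shape[1]
    H = np.zeros((horizon * dim, n_cols))
    for k in range(n_cols):
        t = start + k
        H[:, k] = z[t:t + horizon, :].reshape(-1)
    return H

rng = np.random.default_rng(0)
N, alpha, beta = 200, 3, 3
u = rng.standard_normal((N, 1))          # single input record (placeholder)
y = rng.standard_normal((N, 1))          # single output record (placeholder)
n_cols = N - alpha - beta + 1            # number of usable columns

Y_alpha = stack_hankel(y, beta, alpha, n_cols)   # "future" outputs
U_alpha = stack_hankel(u, beta, alpha, n_cols)   # "future" inputs
Y_beta  = stack_hankel(y, 0,    beta,  n_cols)   # "past" outputs
U_beta  = stack_hankel(u, 0,    beta,  n_cols)   # "past" inputs
print(Y_alpha.shape, U_beta.shape)               # (alpha*l, n_cols), (beta*m, n_cols)
```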
4.1.1 Weighted Least Squares (WLS)
The cost-function to be minimized in this section is given by

Tr{WΣ} = Tr{W E_α E_α^T} = ∑_{t=1+β}^{N−α+1} e_α^T(t) W e_α(t),   (14)

where W is a symmetric positive definite matrix. Below we will directly supply the result of the minimization of (14). For a detailed derivation, see [12]. Introduce the singular value decomposition (SVD)

Q̂_s Ŝ_s V̂_s^T + Q̂_n Ŝ_n V̂_n^T = W^{1/2} Y_α Π⊥ P^T (P Π⊥ P^T)^{−1/2},   (15)

where

Π⊥ = I − U_α^T (U_α U_α^T)^{−1} U_α   (16)

and

P = [Y_β^T  U_β^T]^T.   (17)

Ŝ_s is a diagonal matrix containing the n largest singular values and the columns of Q̂_s are the corresponding left singular vectors. The subspace estimate obtained by minimizing the WLS cost-function is now given by

Γ̂_α = W^{−1/2} Q̂_s T.   (18)

The estimate of the extended observability matrix concludes the first step in many 4SID techniques. The appearance of the invertible factor T in (18) reflects the fact that different state-space realizations are related by a similarity transformation. T does not affect the minimization, but determines which realization is obtained.
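A minimal numerical sketch of the WLS subspace estimate (15)-(18) is given below. This is our own rendering, not the authors' implementation; it assumes the Hankel matrices from the previous sketch, takes T = I, and uses explicit projections and pseudo-inverses for readability (a numerically careful implementation would work with QR factorizations instead).

```python
# Sketch of the WLS subspace estimate, eqs. (15)-(18).
import numpy as np
from scipy.linalg import sqrtm, pinv

def wls_subspace(Y_a, U_a, Y_b, U_b, W, n):
    N_c = Y_a.shape[1]
    # Pi_perp of eq. (16): projector onto the null space of U_alpha
    Pi = np.eye(N_c) - U_a.T @ pinv(U_a @ U_a.T) @ U_a
    P = np.vstack([Y_b, U_b])                      # eq. (17)
    G = Y_a @ Pi @ P.T                             # Y_alpha Pi_perp P^T
    R = np.real(sqrtm(P @ Pi @ P.T))               # (P Pi_perp P^T)^(1/2)
    Wh = np.real(sqrtm(W))
    M = Wh @ G @ pinv(R)                           # matrix decomposed in (15)
    Q, s, Vt = np.linalg.svd(M, full_matrices=False)
    Qs = Q[:, :n]                                  # n dominant left singular vectors
    Gamma_hat = pinv(Wh) @ Qs                      # eq. (18) with T = I
    return Gamma_hat, s

# e.g. with W = I (the PO-MOESP-like choice) and model order n = 1:
# Gamma_hat, s = wls_subspace(Y_alpha, U_alpha, Y_beta, U_beta,
#                             np.eye(Y_alpha.shape[0]), n=1)
```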
4.1.2 Approximate Maximum Likelihood (AML)

It is well known that prediction error methods under certain assumptions have the same performance as Maximum Likelihood (ML) estimators. More precisely, it is required that the prediction error is Gaussian noise, white in time, and that the cost-function is properly chosen, e.g. the determinant of the prediction error covariance matrix, [5, 6]. However, in the present problem the prediction errors are not white and thus the determinant cost-function will not give the ML estimate. Instead, here, it will be called approximate ML (AML). Actually, there are more profound considerations in an ML approach. This is due to the rank constraint on the regression coefficients, (10). For the derivation of the maximum likelihood estimator for a reduced-rank linear regression model we refer to [13], which also served as inspiration to the presentation in this section. Here, the criterion thus will be chosen as

det{Σ}.   (19)

Again we will only state the final result of the minimization; for details see [12]. Similar to (15), introduce the SVD

Q̂_s Ŝ_s V̂_s^T + Q̂_n Ŝ_n V̂_n^T = (Y_α Π⊥ Y_α^T)^{−1/2} Y_α Π⊥ P^T (P Π⊥ P^T)^{−1/2}.   (20)

With this the AML subspace estimate is

Γ̂_α = (Y_α Π⊥ Y_α^T)^{1/2} Q̂_s T.   (21)

4.1.3 Discussion

The weighting matrix introduced in the weighted least squares method is a user's choice. Different alternatives have been suggested in the literature. If W = I, the PO-MOESP method is obtained [3]. The weighting W = (Y_α Π⊥ Y_α^T)^{−1} results in the canonical variate analysis (CVA) solution [14, 2]. A natural interpretation of the details of Larimore's algorithm is given in [8]. Here it is called the CVA solution even if Larimore does not explicitly use the extended observability matrix. Actually, Larimore does propose a generalization of the canonical variate analysis method and introduces a general weighting matrix corresponding to W in our notation. In [8] an interpretation of W as a shaping filter in the frequency domain is given. More precisely, W contains impulse response coefficients of a shaping filter specifying the important frequency bands. Comparing (15) and (18) with (20) and (21), it is clear that if W = (Y_α Π⊥ Y_α^T)^{−1} the two estimates are identical and equal to the CVA estimate. Thus, CVA can be interpreted as an approximate ML estimator. In N4SID [1] the rank of [L1 L2]P is constrained, while in our derivation the constraint is put on [L1 L2], which leads to a completely different solution.

4.2 Pole Estimation

For comparing the quality of the estimates, a canonical parameter set is to be considered. The perhaps most simple and most well known set is the system poles. Initial work on the performance of 4SID methods has been reported in [15, 4, 10]. The results presented therein rely on the pole estimation error and we will draw heavily upon their results.

4.2.1 The Shift Invariance Method

For deriving the system matrix, A, a straightforward method is to use the shift invariance structure of Γ_α, [16]. The estimate of the system matrix is then

Â = Γ̂_1^† Γ̂_2,   (22)

where Γ̂_1 and Γ̂_2 are the (α − 1)l first and last rows of Γ̂_α, respectively. Here, (·)^† denotes the Moore-Penrose pseudo-inverse of (·). A direct application of this approach can in some cases lead to quite strange behavior of the accuracy of the estimates as a function of the dimension parameter α; see [11] or Sect. 5. However, in [11] the weighting matrix, W, was not included in the analysis. It would be interesting to investigate the effect of the weighting on the pole estimation error.

Theorem 1. Let the subspace estimate be given by (18) together with (15) and the estimated system poles be the eigenvalues of (22). Then the asymptotic accuracy, to first order approximation in N, of the pole estimates is not affected by the weighting matrix, W.

Proof. See [12].

Remark 2. This result, showing that the error is not affected by a row weighting in the subspace estimation, is quite surprising. Actually, it was expected that the multi-step ahead prediction error should be prewhitened by W. However, the effect of W obviously cancels, in first approximation, in the shift invariance step of the estimation.

Remark 3. One should keep in mind that this result holds for large data records. For finite data, the weighting may indeed improve the estimates. Furthermore, the weighting can be useful for other reasons, e.g. model order selection, finite sample bias shaping and for coping with unmodeled dynamics.

Remark 4. Theorem 1 implies that the asymptotic distributions of the estimated poles with or without a W are the same. For explicit expressions of the asymptotic variance of the pole estimates, see [11, 12].
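The shift invariance step (22) itself is a one-liner; the sketch below (ours, not the authors' code) applies it to a noise-free observability matrix so that the true poles are recovered exactly.

```python
# Sketch of the shift invariance step, eq. (22). Gamma_hat is an
# (alpha*l x n) subspace estimate, e.g. from the WLS sketch above,
# and l is the number of outputs.
import numpy as np

def shift_invariance_poles(Gamma_hat, l=1):
    G1 = Gamma_hat[:-l, :]                 # first (alpha-1)*l rows
    G2 = Gamma_hat[l:, :]                  # last  (alpha-1)*l rows
    A_hat = np.linalg.pinv(G1) @ G2        # eq. (22)
    return np.linalg.eigvals(A_hat)        # estimated poles

# Noise-free check on a hypothetical 2nd order system:
A = np.array([[0.7, 0.2], [0.0, 0.5]]); C = np.array([[1.0, 0.0]])
Gamma = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(4)])
print(np.sort(shift_invariance_poles(Gamma)))   # approx [0.5, 0.7]
```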
4.2.2 Optimal Method

Recently, a method for optimally estimating the system poles has been proposed in [10]. It is suggested that a parameterization of the extended observability matrix, say Γ_α = Γ_α(θ), is chosen. Assume that an estimate, Γ̂_α, is given. Then the estimate of the parameter vector is obtained by solving the weighted least-squares problem

θ̂ = arg min_θ vec^T(Π⊥_{W^{1/2}Γ_α} Q̂_s) Ω vec(Π⊥_{W^{1/2}Γ_α} Q̂_s)   (23)

for some symmetric positive semi-definite weighting matrix Ω. Here, vec(A) denotes a vector obtained by stacking the columns of A, and

Π⊥_{W^{1/2}Γ_α} = I − W^{1/2} Γ_α (Γ_α^* W Γ_α)^{−1} Γ_α^* W^{1/2}   (24)

is a projection matrix onto the null space of (W^{1/2}Γ_α)^*; here (·)^* denotes complex conjugate transpose. The argument (θ) has been suppressed for notational convenience. In [10], this problem is solved by invoking the general theory of asymptotically best consistent estimation of nonlinear regression parameters; see e.g. [6]. Furthermore, the projection matrix is reparameterized in the coefficients of the characteristic polynomial of A and a non-iterative procedure for solving (23) is described. The results of the analysis below will to some extent be parallel to the one in [10]. Specifically, the optimally weighted method derived below is the same as in [10], though phrased differently and tailored to our subspace estimate, (15)-(18), with the weight W included. Our derivation also provides an expression for the parameter estimation error covariance with a general weighting matrix and not just the optimal weighting. This will make it possible to compare different weightings. Moreover, the results explicitly concern parameters in the extended observability matrix. For example, this makes it possible to directly apply the result to the pole estimation error, as will be seen in Sect. 5.

Here, Γ_α is assumed to be parameterized such that the true parameter vector, θ_0, is given uniquely by the identity Π⊥_{W^{1/2}Γ_α(θ_0)} Q_s = 0. In particular, Γ_α in the diagonal realization can be parameterized by the system poles and, for multi-output systems, possibly some of the coefficients in C; see [10]. The asymptotic distribution of these parameters can be derived through a standard Taylor series expansion approach [5, 6]. This will answer e.g. the question of the effect of the weighting matrix W.

Theorem 2. The normalized parameter estimation error √N θ̃ = √N(θ̂ − θ_0), when using the estimator (23), has a limiting zero-mean Gaussian distribution with the asymptotic covariance given by

lim_{N→∞} N E{θ̃ θ̃^T} = (G^T Ω G)^{−1} (G^T Ω M Ω G) (G^T Ω G)^{−1},   (25)

if the inverse exists. The matrices G and M are defined by

G = [((W^{1/2}Γ_α)^† Q_s)^T ⊗ Π⊥_{W^{1/2}Γ_α} W^{1/2}] [vec(Γ^{(1)}) ... vec(Γ^{(d)})],   (26)

M = ∑_{|τ|<α} H^T R_φ(τ) H ⊗ Π⊥_{W^{1/2}Γ_α} W^{1/2} R_n(τ) W^{1/2} Π⊥_{W^{1/2}Γ_α}.   (27)

Here, ⊗ denotes the Kronecker product, Γ^{(i)} is the partial derivative of Γ_α(θ) with respect to θ_i and d is the number of parameters in θ. Furthermore,

H = [I  ;  −R_u^{−1} R_{up}] (R_p − R_{up}^T R_u^{−1} R_{up})^{−1/2} V_s S_s^{−1},   (28)

φ(t) = [p^T(t)  u_α^T(t + β)]^T,   (29)

R_φ(τ) = E{φ(t + τ) φ^T(t)},   (30)

R_{up} = E{u_α(t + β) p^T(t)},   (31)

where p(t) = [y_β^T(t − β)  u_β^T(t − β)]^T collects the past data (cf. (17)), and similarly for all correlation matrices introduced. All expressions are to be evaluated at θ_0.

Proof. See [12].

Now it is straightforward to obtain an expression for the optimal weighting in (23), in the sense of yielding the lowest possible estimation error covariance.

Theorem 3. If Ω is chosen as

Ω = M^†,   (32)

the covariance in (25) is

lim_{N→∞} N E{θ̃ θ̃^T} = (G^T M^† G)^{−1}.   (33)

The weighting¹ (32) is optimal in the sense that the difference between (25), with any positive semi-definite Ω, and (33) is positive semi-definite.

Proof. See [12].

¹ This weighting is not unique. It is easy to see that any matrix in the null space of G^T can be added without affecting the lower bound.

Remark 5. If the null space of Γ_α can be linearly parameterized and if the weighting can be consistently estimated, it is possible to implement the estimator (23) in a non-iterative fashion; see [10]. The dimensions of the quantities, however, blow up considerably.

Similarly as in the shift invariance method, it can be proven that the weighting W does not affect the asymptotic accuracy when the optimal Ω, (32), is used.

Theorem 4. Assume that the weighting matrix Ω = M^† is used in (23). Then the asymptotic accuracy, to first order approximation in N, of the parameter estimates does not depend on the row weighting in the subspace estimation, i.e. W.

Proof. See [12].

Remark 6. In this case the result of Theorem 4 is quite intuitive, but not obvious, since each element in the error is already weighted individually by Ω.

In the analysis above it was assumed that a subspace estimate was given. One could ask why not introduce the parameterization of Γ_α directly in (14), before a specific subspace estimate is derived. The cost-function (14) can be concentrated to only depend on Γ_α(θ) by first minimizing with respect to L3 and K = [K1 K2]. The concentrated cost-function reads

V(θ) = Tr{Π⊥_{W^{1/2}Γ_α} W^{1/2} Y_α Π⊥ P^T (P Π⊥ P^T)^{−1} P Π⊥ Y_α^T W^{1/2} Π⊥_{W^{1/2}Γ_α}},   (34)

where terms independent of Γ_α have been omitted. Consider the parameter estimates obtained through

θ̂_1 = arg min_θ V(θ).   (35)

An interesting question is how this estimate is related to the one from (23). This can be answered (asymptotically) by using the following result.

Theorem 5. The estimator (35) is asymptotically equivalent to

θ̂_2 = arg min_θ Tr{Π⊥_{W^{1/2}Γ_α} Q̂_s Ŝ_s^2 Q̂_s^T},   (36)

i.e. √N(θ̂_1 − θ̂_2) → 0 in probability as N → ∞. This implies that θ̂_1 and θ̂_2 have the same asymptotic distribution.

Proof. See [12].

Now, it is possible to rewrite the cost-function in (36) to yield

vec^T(Π⊥_{W^{1/2}Γ_α} Q̂_s) (Ŝ_s^2 ⊗ I) vec(Π⊥_{W^{1/2}Γ_α} Q̂_s),   (37)

where the facts that Tr{AB} = vec^T(A^T) vec(B) and that vec(ABC) = (C^T ⊗ A) vec(B) have been used. Comparing (37) and (23), it is obvious that (37) is a special case with

Ω = Ŝ_s^2 ⊗ I.   (38)

The following theorem is then immediate.

Theorem 6. The estimator in (35) yields less accurate estimates than using (23) with the optimal weighting (32).

Remark 7. The performance comparison in Theorem 6 is not completely proper since the optimally weighted estimator is considerably more involved (the norms are different). However, it is interesting that we by this have an expression for the asymptotic covariance of the estimates from (35).

Remark 8. This should also be compared with the subspace formulation in [17]. They consider a weighted subspace fitting method applied to another subspace estimate than that used here. However, if their approach is adopted on the estimate herein, the estimator would be (36) and the performance given by Theorem 2 with the weight (38).
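The covariance expressions in Theorems 2 and 3 have the familiar "sandwich" structure. The following small numerical check (ours; G and M are random stand-ins of plausible dimensions, not quantities computed from a system) verifies that (25) collapses to (33) when Ω = M^† and that the difference to the unweighted choice Ω = I is positive semi-definite, as Theorem 3 states.

```python
# Numerical illustration of the sandwich covariance (25) and the bound (33).
import numpy as np

def sandwich_cov(G, Omega, M):
    """Asymptotic covariance (25): (G'OG)^{-1} G'OMOG (G'OG)^{-1}."""
    A = np.linalg.inv(G.T @ Omega @ G)
    return A @ (G.T @ Omega @ M @ Omega @ G) @ A

rng = np.random.default_rng(1)
q, d = 8, 2                        # q = dim of vec(Pi_perp Q_s), d = dim of theta
G = rng.standard_normal((q, d))
L = rng.standard_normal((q, q))
M = L @ L.T                        # some positive definite "error covariance"

cov_opt = sandwich_cov(G, np.linalg.pinv(M), M)      # Omega = M^+ as in (32)
cov_bnd = np.linalg.inv(G.T @ np.linalg.pinv(M) @ G) # right-hand side of (33)
cov_eye = sandwich_cov(G, np.eye(q), M)              # unweighted, Omega = I

print(np.allclose(cov_opt, cov_bnd))                 # True: (25) reduces to (33)
# cov_eye - cov_bnd should be positive semi-definite (Theorem 3)
print(np.linalg.eigvalsh(cov_eye - cov_bnd).min() >= -1e-10)
```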
5 Example

With the example below, the importance of proper weighting when recovering the poles from the subspace estimate is clearly illustrated. The theoretical expression for the estimation error covariance, (33), for the optimal estimator will be compared with the corresponding results for the shift invariance method; see [11, 12]. Consider the same example as in [11]. The system is described by the first-order state-space model

x_{t+1} = 0.7 x_t + u_t + w_t   (39a)
y_t = x_t + v_t.   (39b)

The measurable input, u_t, the process noise, w_t, and the measurement noise, v_t, are all zero-mean mutually independent white Gaussian scalar-valued random processes. In the two cases considered, the variance of u_t is r_u = 1 and the variance of v_t is r_v = 1. The variance of the process noise will be varied.
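For readers who wish to reproduce a Monte Carlo version of the comparison, the following simulation sketch of (39) is provided (our own code; the seed and the choice r_w = 0.2 correspond to the low-noise case of the example).

```python
# Simulation sketch of the example system (39).
import numpy as np

def simulate(N, a=0.7, r_u=1.0, r_v=1.0, r_w=0.2, seed=0):
    rng = np.random.default_rng(seed)
    u = np.sqrt(r_u) * rng.standard_normal(N)
    w = np.sqrt(r_w) * rng.standard_normal(N)
    v = np.sqrt(r_v) * rng.standard_normal(N)
    x = 0.0
    y = np.zeros(N)
    for t in range(N):
        y[t] = x + v[t]                  # (39b)
        x = a * x + u[t] + w[t]          # (39a)
    return u.reshape(-1, 1), y.reshape(-1, 1)

u, y = simulate(N=2000)
# u, y can now be fed to the Hankel, WLS and shift-invariance sketches above.
```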
Figure 1: The basic shift invariance approach. Normalized asymptotic standard deviation of the estimated pole as a function of α and β. In the first plot the variance of the process noise is r_w = 0.2 and in the second r_w = 2.

In order to use (23), a parameterization of Γ_α has to be chosen. If C is fixed to 1, Γ_α takes the form

Γ_α = [1  λ  λ^2  ...  λ^{α−1}]^T,   (40)

where λ is the pole to be estimated. For a practical way of doing this we refer to [10].

Fig. 1 shows the normalized (by √N) asymptotic standard deviation of the pole estimate for the shift invariance method. The two sub-plots correspond to different variances of the process noise. Next, the normalized asymptotic standard deviation of the pole estimate from (33) is plotted in Fig. 2. It is seen that the degradation in accuracy when increasing α for high process noise variance in Fig. 1 is overcome in the optimal weighting case; see Fig. 2. The shift invariance method does not have any inherent weighting compensating for bad scaling and, thus, it is impossible to give any rules for the choice of the prediction horizon α. In the optimal method, the weighting assures a more robust and consistent behavior. The performance clearly improves with α, but the computational burden increases and a trade-off is necessary. For the choice of the observer dimension, β, it is observed that the performance is quite independent of β. Thus, a choice close to the system order is recommended, since the computational cost increases with β.

For the example considered it is possible to calculate a lower bound on the variance of the estimation error for any unbiased estimator of the pole, i.e. the Cramér-Rao lower bound (CRB). For this example the bounds on the pole estimation error standard deviation were calculated to be 0.506 and 0.830 for low and high process noise, respectively. An interesting observation is that the CRB seems to be attained by the method which uses the complete structure of Γ_α, for large α and β.
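Referring back to the parameterization (40), the sketch below (a simplified illustration of ours, not the procedure of [10]) evaluates the fit criterion (23) with W = I and Ω = I over a grid of candidate poles and picks the minimizer; with a noise-free dominant singular vector the true pole is recovered.

```python
# Simplified grid-search illustration of (23) with the scalar
# parameterization (40): minimize || Pi_perp(lam) @ qs ||^2 over lam.
import numpy as np

def criterion(lam, qs):
    alpha = qs.shape[0]
    g = lam ** np.arange(alpha)                          # Gamma_alpha(lam), eq. (40)
    Pi_perp = np.eye(alpha) - np.outer(g, g) / (g @ g)   # eq. (24) with W = I
    r = Pi_perp @ qs
    return float(r @ r)                                  # (23) with Omega = I

# toy check: if qs spans the true Gamma(0.7), the scan recovers 0.7
alpha = 5
qs_true = 0.7 ** np.arange(alpha)
qs_true = qs_true / np.linalg.norm(qs_true)
grid = np.linspace(0.0, 0.99, 991)
lam_hat = grid[np.argmin([criterion(g, qs_true) for g in grid])]
print(round(lam_hat, 2))   # 0.7
```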
Figure 2: The optimal weighting approach. Normalized asymptotic standard deviation of the estimated pole as a function of α and β. In the first plot the variance of the process noise is r_w = 0.2 and in the second r_w = 2.
6 Conclusions
In this paper we have used a linear regression formulation to link state-space subspace system identification (4SID) to traditional identification methods. This has been used to compare some cost-functions which arise implicitly in the ordinary framework of 4SID. From the cost-functions, estimates of the extended observability matrix have been derived and, based on these estimates, the estimation of certain parameters was analyzed. The problem of estimating the poles was considered. In particular, the asymptotic properties of the estimates from two different methods were analyzed. The first method is common in 4SID and makes use of the shift invariance structure of the observability matrix, while the second method is a recently proposed weighted least-squares method. From these results the choice of user-specified parameters was discussed, and it was shown that this choice indeed may not be obvious for the shift invariance method. This problem can be mitigated by using more of the structure of the system. It was also shown that a proposed row weighting matrix in the subspace estimation step does not affect the asymptotic properties of the estimates. Furthermore, a numerical example was used to indicate the relative efficiency, i.e. attainment of the CRB, when the optimally weighted least-squares method was applied.
Acknowledgments
The authors are very grateful to Iven Mareels, Björn Ottersten, Mats Viberg and Petre Stoica for various help during the course of this project. The comments from the reviewers are also acknowledged.

References

[1] P. Van Overschee and B. De Moor, "N4SID: Subspace Algorithms for the Identification of Combined Deterministic-Stochastic Systems", Automatica, Special Issue on Statistical Signal Processing and Control, 30(1):75-93, Jan. 1994.
[2] W. E. Larimore, "Canonical Variate Analysis in Identification, Filtering and Adaptive Control", In Proc. 29th CDC, pages 596-604, Honolulu, Hawaii, December 1990.
[3] M. Verhaegen, "Identification of the Deterministic Part of MIMO State Space Models given in Innovations Form from Input-Output Data", Automatica, Special Issue on Statistical Signal Processing and Control, 30(1):61-74, Jan. 1994.
[4] M. Viberg, B. Ottersten, B. Wahlberg, and L. Ljung, "Performance of Subspace-Based System Identification Methods", In Proc. IFAC 93, Sydney, Australia, July 1993.
[5] L. Ljung, System Identification: Theory for the User, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[6] T. Söderström and P. Stoica, System Identification, Prentice-Hall International, Hemel Hempstead, UK, 1989.
[7] T. McKelvey, "Fully Parametrized State-Space Models in System Identification", In Proc. 10th IFAC Symp. on Syst. Id., volume 2, pages 373-378, Copenhagen, Denmark, 1994.
[8] P. Van Overschee and B. De Moor, "A Unifying Theorem for Three Subspace System Identification Algorithms", In Proc. 10th IFAC Symp. on Syst. Id., Copenhagen, Denmark, 1994.
[9] M. Viberg, "Subspace Methods in System Identification", In Proc. 10th IFAC Symp. on Syst. Id., volume 1, pages 1-12, Copenhagen, Denmark, 1994.
[10] B. Ottersten and M. Viberg, "A Subspace-Based Instrumental Variable Method for State-Space System Identification", In Proc. 10th IFAC Symp. on Syst. Id., volume 2, pages 139-144, Copenhagen, Denmark, 1994.
[11] B. Wahlberg and M. Jansson, "4SID Linear Regression", In Proc. IEEE 33rd Conf. on Decision and Control, pages 2858-2863, Orlando, USA, December 1994.
[12] M. Jansson and B. Wahlberg, "A Linear Regression Approach to State-Space Subspace System Identification", Submitted to Signal Processing, 1995. Also available at http://control.e.kth.se/s3/.
[13] P. Stoica and M. Viberg, "Maximum Likelihood Parameter and Rank Estimation in Reduced-Rank Multivariate Linear Regressions: A Subspace Approach", Submitted to Signal Processing, 1995.
[14] W. E. Larimore, "System Identification, Reduced-Order Filtering and Modeling via Canonical Variate Analysis", In Proc. ACC, pages 445-451, June 1983.
[15] M. Viberg, B. Ottersten, B. Wahlberg, and L. Ljung, "A Statistical Perspective on State-Space Modeling Using Subspace Methods", In Proc. 30th IEEE Conf. on Decision & Control, pages 1337-1342, Brighton, England, Dec. 1991.
[16] S. Y. Kung, "A New Identification and Model Reduction Algorithm via Singular Value Decomposition", In Proc. 12th Asilomar Conf. on Circuits, Systems and Computers, pages 705-714, Pacific Grove, CA, November 1978.
[17] A. L. Swindlehurst, R. Roy, B. Ottersten, and T. Kailath, "A Subspace Fitting Method for Identification of Linear State Space Models", IEEE Trans. Autom. Control, 40(2):311-316, Feb. 1995.