WEIGHTING MATRICES IN SUBSPACE ALGORITHMS
D. BAUER
Institute f. Econometrics, Operations Research and System Theory, University of Technology Vienna, Argentinierstr. 8, A-1040 Wien, Austria, e-mail: [email protected]
Abstract
In this paper we deal with the estimation of linear, time invariant, discrete time systems using so-called subspace algorithms. The asymptotic distribution of the estimates is given for the case where the subspace algorithm is performed using rather general weighting matrices. This result is an extension of the result obtained in [1]. The presented theory is complemented by a comparison of the asymptotic variance for two different weighting matrices and by a simulation study of the small sample properties.
Keywords: Subspace Methods, Asymptotic Properties, Identification, Linear Systems.
1. INTRODUCTION
The traditional approach to the estimation of finite dimensional linear time invariant systems is to optimize a criterion function (such as the likelihood) over the set of all transfer functions of given degree. Fully automatic procedures have been derived (see e.g. [2]). These procedures are known to be consistent and asymptotically efficient under mild conditions on the system and the noise. However, these approaches also have some disadvantages: they are computationally demanding, especially for multi-input multi-output (MIMO) systems, and they suffer from the problems associated with local minima. These problems can be circumvented by using subspace algorithms (for a review of subspace methods, see [3]), which use the geometrical structure of state space systems to obtain estimates of the transfer function.
The statistical properties of subspace algorithms have been analyzed in a number of papers: in [4] consistency for a special class of subspace algorithms was derived, in [5] asymptotic normality for the system poles was established, and in [1] asymptotic normality for the system matrix estimates was proved. When using subspace algorithms, the user has to make certain choices whose effect on the accuracy of the estimates has not been clarified up to now. For a class of subspace algorithms which is slightly different from the one considered here, [6] showed that the choice of the weighting matrices does not affect the asymptotic distribution of the estimates of the poles. This paper extends the results of [1] to the case of general weighting matrices and demonstrates the effects of the weighting matrix on the asymptotic distribution in one example, using both an analysis of the asymptotic variance and a small sample simulation study.
Support by the Austrian 'Fonds zur Förderung der wissenschaftlichen Forschung', Projekt P11213-MAT, is gratefully acknowledged.
2. MODEL SET AND DESCRIPTION OF THE ALGORITHM
In this paper we consider linear, finite dimensional, time invariant, discrete time state space systems of the form

$$
\begin{aligned}
x_{t+1} &= A x_t + B \varepsilon_t \\
y_t &= C x_t + D \varepsilon_t
\end{aligned}
\qquad (1)
$$

where $y_t$ denotes the $s$-dimensional output series, $\varepsilon_t$ denotes the $s$-dimensional (unobserved) white noise with unit variance, and $x_t$ denotes the $n$-dimensional state variable. The matrices $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times s}$, $C \in \mathbb{R}^{s \times n}$ and $D \in \mathbb{R}^{s \times s}$, where $D$ is lower triangular with strictly positive entries on the diagonal, are parameter matrices. The system is assumed to be stable (i.e. $|\lambda_{\max}(A)| < 1$, where $\lambda_{\max}(\cdot)$ denotes an eigenvalue of maximum modulus of a matrix) and strictly minimum phase (i.e. $|\lambda_{\max}(A - BD^{-1}C)| < 1$). Note that these assumptions only exclude spectra having zeros on the unit circle.
All subspace algorithms hinge on the same fact: Defining $Y^+_{t,f} = [y_t^T, \dots, y_{t+f-1}^T]^T$, $Y^-_{t,p} = [y_{t-1}^T, \dots, y_{t-p}^T]^T$ and the matrices $O_f = [C^T, A^T C^T, \dots, (A^T)^{f-1} C^T]^T$, $K_p = [B, \dots, (A - BD^{-1}C)^{p-1} B]$, we obtain the following equation:

$$
Y^+_{t,f} = O_f K_p Y^-_{t,p} + O_f (A - BD^{-1}C)^p x_{t-p} + N^+_{t,f}
$$

where $N^+_{t,f}$ denotes the term due to $\varepsilon_t, \dots, \varepsilon_{t+f-1}$, which is uncorrelated with $Y^-_{t,p}$ and $x_{t-p}$. Also note that $(A - BD^{-1}C)^p$ tends to zero for $p \to \infty$. Now the algorithms considered here may be decomposed into three steps:
(1) Regress $Y^+_{t,f}$ on $Y^-_{t,p}$ to get an estimate $\hat\beta_{f,p}$ of $O_f K_p$.
(2) Approximate $\hat\beta_{f,p}$ by a rank $n$ matrix and decompose the approximation into the product of two rank $n$ matrices to get an estimate $\hat K_p$ of $K_p$.
(3) Use the estimate of the state $\hat x_t = \hat K_p Y^-_{t,p}$ to estimate $C$ by regressing $y_t$ on $\hat x_t$, and then estimate $[A, BD^{-1}]$ by regressing $\hat x_{t+1}$ on $\hat x_t$ and the residuals $\hat\varepsilon_t$ of the first regression. $D$ is estimated as the lower triangular Cholesky factor of the sample variance of $\hat\varepsilon_t$.
The approximation in step 2 is performed by using the singular value decomposition $\hat W^+_f \hat\beta_{f,p} \hat W^-_p = \hat U_n \hat\Sigma_n \hat V_n^T + \hat R$, where the smaller singular values contained in $\hat R$ are neglected. The two weighting matrices $\hat W^+_f$ and $\hat W^-_p$ are user choices. In this paper we will restrict $\hat W^-_p$ to be equal to $(\hat\Gamma^-_p)^{1/2}$, where $\hat\Gamma^-_p$ denotes the estimate of the variance of $Y^-_{t,p}$ and $(\cdot)^{1/2}$ denotes any square root of a symmetric matrix. $W^+_f$ will be assumed to be equal to $W^+_f = (k_W(i-j))_{i,j=1,\dots,f}$, with $k_W(j) = 0$ for $j < 0$ and $k_W(0)$ nonsingular, for some transfer function $k_W(z) = \sum_{j=0}^{\infty} k_W(j) z^j$. To the best of the author's knowledge, no other weighting matrices have been proposed, with the exception of the CCA algorithm [7] (which uses $\hat W^+_f = (\hat\Gamma^+_f)^{-1/2}$, where $\hat\Gamma^+_f$ denotes the estimate of the variance of $Y^+_{t,f}$).
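The three steps can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used for the results of this paper: it is restricted to scalar output ($s = 1$), it uses the Cholesky factor as the square root of the sample variance of the past, and it attaches the singular values to the observability factor, so that the state estimate is $\hat V_n^T (\hat W^-_p)^{-1} Y^-_{t,p}$. The function names (`simulate`, `toeplitz_weight`, `ols`, `subspace_estimate`) and the second order test system are hypothetical; the test system has $D = 1$, so that $BD^{-1} = B$.

```python
import numpy as np

def simulate(A, B, C, D, T, rng):
    """Draw T outputs from x_{t+1} = A x_t + B e_t, y_t = C x_t + D e_t."""
    n, s = B.shape
    x = np.zeros(n)
    y = np.empty((T, s))
    for t in range(T):
        e = rng.standard_normal(s)          # unit variance white noise
        y[t] = C @ x + D @ e
        x = A @ x + B @ e
    return y

def toeplitz_weight(kw, f):
    """Scalar case of W_f^+ = (k_W(i-j)), with k_W(j) = 0 for j < 0."""
    W = np.zeros((f, f))
    for i in range(f):
        for j in range(max(0, i - len(kw) + 1), i + 1):
            W[i, j] = kw[i - j]
    return W

def ols(Y, X):
    """Coefficient M of the least squares fit Y ~ M X (observations in columns)."""
    return np.linalg.solve(X @ X.T, X @ Y.T).T

def subspace_estimate(y, n, f, p, kw=(1.0,)):
    """Three step subspace estimate for a scalar output series (assumed s = 1)."""
    y = np.asarray(y, dtype=float).reshape(-1)
    ts = range(p, y.size - f + 1)
    Yf = np.array([y[t:t + f] for t in ts]).T          # rows y_t, ..., y_{t+f-1}
    Yp = np.array([y[t - p:t][::-1] for t in ts]).T    # rows y_{t-1}, ..., y_{t-p}
    N = Yf.shape[1]
    # Step 1: regression of the future on the past.
    beta = ols(Yf, Yp)
    # Step 2: weighted SVD and rank n truncation.
    Wp = np.linalg.cholesky(Yp @ Yp.T / N)             # one square root of the past variance
    Wf = toeplitz_weight(kw, f)
    U, sv, Vt = np.linalg.svd(Wf @ beta @ Wp)
    Kp = np.linalg.solve(Wp.T, Vt[:n].T).T             # V_n^T Wp^{-1}
    # Step 3: regressions in the estimated state.
    X = Kp @ Yp
    y0 = Yf[:1]
    C = ols(y0, X)
    eps = y0 - C @ X
    AB = ols(X[:, 1:], np.vstack([X[:, :-1], eps[:, :-1]]))
    A, BDinv = AB[:, :n], AB[:, n:]
    D = np.linalg.cholesky(eps @ eps.T / N)
    return A, BDinv @ D, C, D

# Hypothetical stable, strictly minimum phase test system with D = 1.
rng = np.random.default_rng(0)
A0 = np.diag([0.7, -0.4])
B0 = np.array([[0.6], [-0.5]])
C0 = np.array([[1.0, 1.0]])
D0 = np.array([[1.0]])
y = simulate(A0, B0, C0, D0, 20000, rng)
A_hat, B_hat, C_hat, D_hat = subspace_estimate(y, n=2, f=10, p=10)
# The eigenvalues of A_hat estimate the poles 0.7 and -0.4 in any state basis.
print(np.sort(np.linalg.eigvals(A_hat).real))
```

The default `kw=(1.0,)` corresponds to $k_W = I$; passing a longer impulse response such as `kw=(1.0, 0.5, 0.25)` changes only the weighting $W^+_f$ in step 2. The estimated matrices are determined only up to the choice of the state basis, which is why the sketch compares eigenvalues rather than matrix entries.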
3. THE MAIN RESULT
Let $U_n^+(k_W)$ denote the set of transfer functions of order $n$ for which, for given weighting matrix $W^+_f$ and fixed finite $f \ge n$, the matrix $W^+_f H_f (\Gamma^-)^{-1/2}$ has distinct singular values. It can be shown that this set is generic in $M(n)$, the set of all stable, minimum phase, rational transfer functions of degree $n$. Let $D_n^+(k_W)$ denote the set of all realizations of transfer functions $k \in U_n^+(k_W)$ corresponding to an SVD of $W^+_f H_f (\Gamma^-)^{-1/2}$. Note that the equivalence classes in $D_n^+(k_W)$ are finite sets corresponding to different choices of the orientation of the singular vectors (for a discussion see [1]). Then the following holds:
Theorem 1. Let $(y_t)_{t \in \mathbb{Z}}$ be generated by $k_0 \in U_n^+(k_W)$, where the ergodic white noise $(\varepsilon_t)_{t \in \mathbb{Z}}$ fulfills the following conditions:

$$
\begin{aligned}
E\{\varepsilon_t \mid F_{t-1}\} &= 0 \\
E\{\varepsilon_t \varepsilon_t^T \mid F_{t-1}\} &= E\{\varepsilon_t \varepsilon_t^T\} = I > 0 \\
E\{\varepsilon_{t,i}^4\} &< \infty \\
E\{\varepsilon_{t,a} \varepsilon_{t,b} \varepsilon_{t,c} \mid F_{t-1}\} &= \omega_{a,b,c}
\end{aligned}
$$

where $E$ denotes the expectation operator, $F_t$ the $\sigma$-algebra spanned by the past of the noise, and additional subscripts indicate components of the vector $\varepsilon_t$. If $p$ fulfills the following conditions:
(1) $p \ge -\dfrac{d \log T}{2 \log |\rho_0|}$ for all $T > T_0$ and some $d > 1$, where $\rho_0$ is the zero of $k_0$ of maximum modulus,
(2) $p / (\log T)^a \to 0$ for some $a < \infty$,
then there exist matrices $(A_0, B_0, C_0, D_0) \in D_n^+(k_W)$ such that

$$
\sqrt{T}\, \mathrm{vec}[\hat A_T - A_0, \hat B_T - B_0, \hat C_T - C_0, \hat D_T - D_0] \to^d Z
$$

where $Z$ is a multivariate normal random variable with zero mean and (singular) variance $V_f(k_W)$. Let $g$ be a mapping attaching vectors $x \in \mathbb{R}^m$ to matrices $(A, B, C, D)$, which is differentiable in a neighborhood of $(A_0, B_0, C_0, D_0)$ and has Jacobian $J_g$ at $(A_0, B_0, C_0, D_0)$. Then $\sqrt{T}\, [g(\hat A_T, \hat B_T, \hat C_T, \hat D_T) - g(A_0, B_0, C_0, D_0)]$ is asymptotically normally distributed with mean zero and variance $J_g V_f(k_W) J_g^T$.
Although the variance $V_f(k_W)$ is too complicated to be analyzed analytically, it can be approximated numerically and thus can be used to compare the effects of different weightings on the asymptotic variance. Note also that the transfer function at a fixed frequency is a differentiable function of the system matrices. Thus the asymptotic accuracy of the transfer function estimate at a fixed frequency can be compared for different weightings. Also note that there are possibilities to estimate $p$ such that the restrictions hold a.s. asymptotically (see [1]).
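The step from the variance of the system matrix estimates to the variance of the transfer function estimate at a fixed frequency, $J_g V J_g^T$, can be illustrated numerically. The following sketch makes several assumptions that are not part of the theorem: the system is SISO, the parameters are stacked in the order (A, B, C, D) chosen by the hypothetical helper `unpack`, the transfer function is written as $k(z) = D + zC(I - zA)^{-1}B$ in line with the power series convention used for $k_W$, and the Jacobian is approximated by central differences. The matrix `V` passed to `delta_variance` is a placeholder supplied by the user; the sketch does not compute $V_f(k_W)$.

```python
import numpy as np

def unpack(theta, n):
    """Split a parameter vector into SISO matrices; the ordering is this sketch's own."""
    A = theta[:n * n].reshape(n, n)
    B = theta[n * n:n * n + n].reshape(n, 1)
    C = theta[n * n + n:n * n + 2 * n].reshape(1, n)
    D = theta[n * n + 2 * n:].reshape(1, 1)
    return A, B, C, D

def g(theta, n, omega):
    """Real and imaginary part of k(z) = D + z C (I - zA)^{-1} B at z = exp(i omega)."""
    A, B, C, D = unpack(theta, n)
    z = np.exp(1j * omega)
    k = (D + (z * C) @ np.linalg.solve(np.eye(n) - z * A, B))[0, 0]
    return np.array([k.real, k.imag])

def jacobian(theta, n, omega, h=1e-6):
    """Central difference approximation of the Jacobian of g at theta."""
    J = np.empty((2, theta.size))
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = h
        J[:, i] = (g(theta + step, n, omega) - g(theta - step, n, omega)) / (2 * h)
    return J

def delta_variance(theta, V, n, omega):
    """Variance J V J^T of the limiting distribution of sqrt(T) (g(estimate) - g(truth))."""
    J = jacobian(theta, n, omega)
    return J @ V @ J.T

# Hypothetical second order example; V = I is a placeholder, not V_f(k_W).
theta0 = np.concatenate([np.diag([0.7, -0.4]).ravel(), [0.6, -0.5], [1.0, 1.0], [1.0]])
V_placeholder = np.eye(theta0.size)
for omega in (0.0, np.pi / 2, np.pi):
    S = delta_variance(theta0, V_placeholder, 2, omega)
    print(omega, np.trace(S))
```

Evaluating `delta_variance` on a grid of frequencies for two different variance matrices gives a frequency by frequency comparison of two weightings, provided the variance matrices themselves are available.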
4. NUMERICAL EXAMPLES
Consider the SISO system having zeros at $0.4$, $-0.4$ and $\pm 0.6i$ and poles at $0.8e^{\pm 0.2i}$, $0.79e^{\pm 0.8i}$. In the first example we compare the asymptotic variance of the transfer function estimate for three different weighting matrices: a high pass filter (12th order Butterworth filter, generated by the MATLAB command butter(12,0.5,'high')), a low pass filter (12th order Butterworth, butter(12,0.5)) and finally the weighting used in CCA (for the central limit theorem in this case see [1]). Figure 1 plots the absolute value of the asymptotic variance of the transfer function estimates at 100 equally spaced points over the frequency range. The high pass filter results in a lower variance for frequencies in the high frequency range, whereas the low pass filter achieves the best accuracy in the low frequency range, as could have been expected from the choice of $k_W$. The CCA procedure performs comparably to the high pass filter weighting in the high frequency range and comparably to the low pass filter weighting in the low frequency range.
FIGURE 1: Modulus of the asymptotic variance: low pass (-o-), high pass (-x-) and CCA (-), f = 10.
The same system is used in a simulation study to investigate the small sample properties of the estimates. 500 output series of sample size 500 are generated, and subspace algorithms with three different weightings are applied to these series: the CCA procedure, $W^+_f = I$ (i.e. $k_W = I$) and the low pass filter used in the last example. $f = p = 20$ was chosen for all experiments. Figure 2 shows the absolute value of the estimated variance of the estimates, and Figure 3 shows the absolute value of the deviation of the sample mean of the transfer function estimates from the true transfer function. The CCA procedure and the weighting corresponding to $k_W = I$ show comparable variances; however, the bias distribution over the frequency range is quite different. In this case, the weighting corresponding to the low pass filter performs worse in all respects: it produces higher variances and also higher deviations from the true transfer function.
FIGURE 2: Absolute value of sample variance: CCA (-), $k_W = I$ (-x-) and low-pass (-o-).
FIGURE 3: Absolute value of deviations of sample mean from true transfer function: CCA (-), $k_W = I$ (-x-) and low-pass (-o-).
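As a rough plausibility check, the choice f = p = 20 in the simulation study can be compared with the lower bound on p in Theorem 1, read here as $p \ge -d \log T / (2 \log |\rho_0|)$ with $d > 1$. The sketch below assumes this reading of the condition and takes $|\rho_0| = 0.6$, the largest modulus among the zeros of the example system. The condition is asymptotic (it is required for all $T$ beyond some $T_0$), so evaluating it at $T = 500$ is only indicative. The function name `min_lag` is hypothetical.

```python
import math

def min_lag(T, rho0, d):
    """Smallest integer p with p >= -d * log(T) / (2 * log|rho0|), for 0 < |rho0| < 1."""
    if not 0.0 < abs(rho0) < 1.0:
        raise ValueError("the zero of maximum modulus must lie strictly inside the unit circle")
    return math.ceil(-d * math.log(T) / (2.0 * math.log(abs(rho0))))

# Sample size and zero modulus taken from the example; the values of d are arbitrary choices > 1.
for d in (1.01, 2.0, 3.0):
    print(d, min_lag(500, 0.6, d))
```

For these values of d the bound stays below 20, so the lag used in the experiments is compatible with the condition at this sample size.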
5. CONCLUSIONS
In this paper we presented an extension of the central limit theorem given in [1], which can be used to compare different weightings for a given system with respect to their asymptotic properties. In the examples it was shown that the asymptotic properties may reflect the properties of the weighting transfer function $k_W$, and that the finite sample performance using low-pass filters may be poor compared to e.g. the CCA procedure. However, the reader is reminded that the results only refer to the case where the true transfer function can be modelled exactly.
6. REFERENCES
[1] D. Bauer, M. Deistler, and W. Scherrer. The statistical analysis of subspace algorithms. Technical Report 150, Inst. f. Econometrics, OR and System Theory, TU Wien, 1997.
[2] E. J. Hannan and M. Deistler. The Statistical Theory of Linear Systems. John Wiley, New York, 1988.
[3] M. Viberg. Subspace-based methods for the identification of linear time-invariant systems. Automatica, 31(12):1835-1851, 1995.
[4] M. Deistler, K. Peternell, and W. Scherrer. Consistency and relative efficiency of subspace methods. Automatica, 31:1865-1875, 1995.
[5] M. Viberg, B. Ottersten, B. Wahlberg, and L. Ljung. Performance of subspace based state space system identification methods. In Proc. of the 12th IFAC World Congress, volume 7, pages 369-372, Sydney, Australia, 1993.
[6] B. Wahlberg and M. Jansson. 4SID linear regression. In Proc. of the 33rd Conference on Decision and Control, pages 2858-2863, Orlando, USA, 1994.
[7] W. E. Larimore. System identification, reduced order filters and modeling via canonical variate analysis. In H. S. Rao and P. Dorato, editors, Proc. 1983 Amer. Control Conference 2, pages 445-451, Piscataway, NJ, 1983. IEEE Service Center.