Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference Shanghai, P.R. China, December 16-18, 2009


Fundamental Limitations on the Accuracy of MIMO Linear Models Obtained by PEM for Systems Operating in Open Loop

Juan C. Agüero, Cristian R. Rojas and Graham C. Goodwin

Juan C. Agüero and Graham C. Goodwin are with the School of Electrical Engineering and Computer Science, The University of Newcastle, Australia. Email: {juan.aguero,graham.goodwin}@newcastle.edu.au. Cristian R. Rojas is with the ACCESS Linnaeus Center, Electrical Engineering, KTH, S-100 44 Stockholm, Sweden. Email: [email protected].

Abstract— In this paper we show that the variance of estimated parametric models for open-loop Multiple-Input Multiple-Output (MIMO) systems obtained by the prediction error method (PEM) satisfies a fundamental integral limitation. This limitation gives rise to a multivariable 'water-bed' effect.

I. INTRODUCTION

The study of fundamental limitations has important implications in many fields. As an example, we recall the well-known Bode "water-bed" effect (and related results) in the area of linear feedback systems (see e.g. [10]). In system identification there are several results regarding fundamental limitations. For example, in [18], [8], [28] a water-bed effect for the accuracy of spectral estimates was established. In [25], [7], [22] integral constraints on the bias of least-squares estimates were presented. Recently, in [24], a "water-bed" effect analogous to the Bode result was established for system identification. The latter result is restricted to single-input single-output systems. Here, we develop the corresponding results for multiple-input multiple-output systems. We consider both absolute errors and relative errors. As part of the development we also present explicit expressions for the parameter covariance matrix for multiple-input multiple-output systems; this result is difficult to find in the contemporary system identification literature.

II. SYSTEM DESCRIPTION AND NOTATION

The system of interest in this paper is as follows:

$$y_t = G_o(z)u_t + v_t, \qquad v_t = H_o(z)w_t \qquad (1)$$

where $G_o(z)$ and $H_o(z)$ are multivariable transfer functions written in terms of the complex variable¹ $z$. The signals $y_t \in \mathbb{R}^{n_y}$ and $u_t \in \mathbb{R}^{n_u}$ are the output and input of the system, respectively, and $w_t \in \mathbb{R}^{n_y}$ is a zero-mean white noise process with covariance matrix $\mathrm{cov}\{w_t\} = \Gamma_o$. We denote the cross-spectrum between two (quasi-stationary) signals $z_t$ and $x_t$ as:

$$\Phi_{zx}(e^{j\omega}) = \sum_{\tau=-\infty}^{\infty} R_{zx}(\tau)\, e^{-j\omega\tau}, \qquad \omega \in [-\pi,\pi] \qquad (2)$$

where the cross-correlation is given by:

$$R_{zx}(\tau) = \bar{E}\{z_t x_{t+\tau}^T\} \qquad (3)$$

and the operator $\bar{E}\{\cdot\}$ denotes [21]:

$$\bar{E}\{s_t\} = \lim_{N\to\infty} \frac{1}{N} \sum_{t=1}^{N} E\{s_t\} \qquad (4)$$

where $E\{s_t\}$ denotes the expected value of $s_t$.

¹This is actually an abuse of notation: the complex variable $z$ in equation (1) should be understood as the forward shift operator (i.e. $z y_t = y_{t+1}$).
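As an aside (our illustration, not part of the paper), the definitions (2)–(4) have natural finite-data counterparts. The sketch below computes a (biased) sample cross-correlation and a truncated version of the sum in (2); all signal and helper names are our own.

```python
import numpy as np

# Sketch (ours): sample estimates of the cross-correlation (3) and a
# truncated cross-spectrum (2) for two scalar quasi-stationary signals.
rng = np.random.default_rng(0)
N = 10_000
u = rng.standard_normal(N)
y = np.convolve(u, [0.0, 2.0, 0.8])[:N]     # y_t = 2 u_{t-1} + 0.8 u_{t-2}

def cross_correlation(z, x, max_lag):
    """R_zx(tau) ~ (1/N) sum_t z_t x_{t+tau}, for |tau| <= max_lag."""
    n = len(z)
    taus = np.arange(-max_lag, max_lag + 1)
    R = [np.dot(z[max(0, -t):n - max(0, t)], x[max(0, t):n - max(0, -t)]) / n
         for t in taus]
    return taus, np.array(R)

taus, R = cross_correlation(y, u, max_lag=50)
omega = np.linspace(-np.pi, np.pi, 512)
# Truncated sum in (2): Phi_zx(e^{jw}) ~ sum_tau R_zx(tau) e^{-j w tau}
Phi_yu = np.array([np.sum(R * np.exp(-1j * w * taus)) for w in omega])
```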

We denote the transpose of $X$ as $X^T$, the conjugate transpose of $X$ as $X^H$, and the conjugate of $X$ as $X^*$. In the Prediction Error Method (PEM) framework [21], the estimate obtained from $N$ input-output measurements is given by:

$$\hat{\theta}_N = \arg\min_{\theta}\, l(\theta) \qquad (5)$$

where the cost function $l(\theta)$ is given by:

$$l(\theta) = \frac{1}{2} \sum_{t=1}^{N} \varepsilon_t(\theta)^T \Gamma_o^{-1} \varepsilon_t(\theta) \qquad (6)$$

and where the prediction error $\varepsilon_t$ is given by:

$$\varepsilon_t(\theta) = H(z,\theta)^{-1}\,[y_t - G(z,\theta)u_t] \qquad (7)$$

Here, $G(z,\hat{\theta}_N)$ and $H(z,\hat{\theta}_N)$ are models for $G_o$ and $H_o$ respectively. Notice that we have chosen the weighting matrix in PEM as $\Gamma_o$ (the noise covariance). This choice is not as restrictive as it first sounds. Indeed, when $w_t$ has a Gaussian distribution and the covariance matrix is iteratively estimated, the PEM estimate in (5) is also the Maximum Likelihood (ML) estimate. Moreover, it is well known that, in the case that $\hat{\Gamma}_N$ and $\hat{\theta}_N$ are the ML estimates of $\Gamma_o$ and $\theta_o$ respectively, the accuracy of $\hat{\theta}_N$ is the same irrespective of whether the ML estimate of $\Gamma_o$ or its true value is used [6]. We also denote the $i$-$k$ element of a matrix $X$ as $[X]_{i,k}$, and $\mathrm{vec}\{X\}$ is the operator that transforms a matrix into a vector by stacking its columns on top of each other [2].
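To make (5)–(7) concrete, the following sketch (ours, not from the paper; it assumes SciPy is available and a single-output, two-input model with scalar monic noise model, all parameter names being our own) evaluates the prediction errors and the PEM cost for a candidate $\theta$:

```python
import numpy as np
from scipy.signal import lfilter

# Sketch (ours): prediction error (7) and PEM cost (6) for a 1-output,
# 2-input model y_t = G(z,theta) u_t + H(z,theta) w_t with scalar H.
def pem_cost(theta, y, u, gamma_o=1.0):
    """l(theta) in (6); y is (N,), u is (N, 2), gamma_o is scalar Gamma_o."""
    b1, b2, a1, a2, c = theta
    # G(z,theta) u_t: two assumed first-order channels b_i z^{-1}/(1 - a_i z^{-1}).
    y_det = (lfilter([0.0, b1], [1.0, -a1], u[:, 0])
             + lfilter([0.0, b2], [1.0, -a2], u[:, 1]))
    # eps_t = H(z,theta)^{-1} [y_t - G(z,theta) u_t]; here H = 1/(1 - c z^{-1})
    # is monic (cf. Assumption 4 below), so H^{-1} is the FIR filter 1 - c z^{-1}.
    eps = lfilter([1.0, -c], [1.0], y - y_det)
    return 0.5 * np.sum(eps**2) / gamma_o

# theta_hat in (5) would then be obtained with a numerical optimiser, e.g.
# scipy.optimize.minimize(pem_cost, theta0, args=(y, u)).
```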


III. ASSUMPTIONS


We use the following assumptions in the sequel:

Assumption 1: The PEM estimate $\hat{\theta}_N$ in equation (5) converges in distribution as follows:

$$\sqrt{N}\,(\hat{\theta}_N - \theta_o) \xrightarrow{d} \mathcal{N}(0, P_\theta) \qquad (8)$$


where $\mathcal{N}(0,P_\theta)$ denotes a normal distribution with zero mean and covariance matrix $P_\theta$, $N$ is the number of data points, and $P_\theta$ is the inverse of the following matrix:

$$M_\theta = \lim_{N\to\infty} \frac{1}{N}\, E\left\{ \left.\frac{\partial l(\theta)}{\partial \theta}\right|_{\theta_o}^{T} \left.\frac{\partial l(\theta)}{\partial \theta}\right|_{\theta_o} \right\} = \bar{E}\left\{ \left.\frac{\partial \varepsilon_t}{\partial \theta^T}\right|_{\theta_o}^{T} \Gamma_o^{-1}\, \left.\frac{\partial \varepsilon_t}{\partial \theta^T}\right|_{\theta_o} \right\} = P_\theta^{-1} \qquad (9)$$

Assumption 2: $u_t$ and $w_t$ are independent of each other. Indeed, $u_t$ might be a deterministic (quasi-stationary) sequence.

Assumption 3: $G_o(z)$, $H_o(z)$ and $H_o(z)^{-1}$ are analytic for $|z| \geq 1$.

Assumption 4: $H(\infty) = H_o(\infty) = I_{n_y}$.

Assumption 5: There is no under-modelling, i.e. there exists a vector of parameters $\theta_o$ such that $G(z,\theta_o) = G_o(z)$ and $H(z,\theta_o) = H_o(z)$.

Assumption 6: The vector of parameters $\theta$ is partitioned as $\theta = [\rho^T\ \eta^T]^T$, where $\rho \in \mathbb{R}^{n_\rho}$ and $\eta \in \mathbb{R}^{n_\eta}$ are the parameters of $G(z,\rho)$ and $H(z,\eta)$ respectively.

Assumption 7: $G(z,\rho)$ and $H(z,\eta)$ are differentiable functions of $\rho$ and $\eta$ respectively.

Assumption 8: The input spectrum $\Phi_u$ and the noise spectrum $\Phi_v$ are positive definite for all $\omega$.

The ramifications of the above assumptions are explained below. Assumption 1 is satisfied under standard regularity conditions (see e.g. [19], [4], [21]). The expression for $M_\theta$ in (9) appears in [21]. In the case that the noise $w_t$ is Gaussian, $M_\theta$ is usually called the (asymptotic) Fisher information matrix; in the current paper we use this name even when the noise $w_t$ is not Gaussian. Recently, there has been interest in finding alternative expressions for the information matrix (see e.g. [16], [15], [17]). Note that we assume that $M_\theta$ is non-singular, i.e. the vector of parameters $\theta_o$ is locally identifiable [27]. This condition depends on the parametrization of the system and on persistent excitation [26, section 6.4]. Local identifiability is an important property since it implies that large-sample properties of the estimator hold. From a practical point of view, the ML algorithm is well behaved when the parameters are locally identifiable [12, section 3.6]. Assumption 2 implies that the system is operating in open loop; this assumption simplifies the analysis and allows us to obtain stronger results. Assumption 3 means that the system $G_o(z)$ is stable and that $H_o(z)$ is the minimum-phase spectral factor of the noise process $v_t$. Assumption 4 can be used without loss of generality, since it amounts to a normalization of the noise model; this assumption is standard. Assumption 5 is the usual paradigm in system identification. Assumption 6 sets the class of models as Box-Jenkins (BJ) models (see e.g. [21]); this assumption is useful when the system is operating in open loop, since it is then possible to obtain consistent estimates of $G_o(e^{j\omega})$ irrespective of under-modelling in $H_o(e^{j\omega})$. Assumption 7 is used to calculate the variance of $G(z,\rho)$ and $H(z,\eta)$ by means of a first-order approximation. Finally, notice that spectra are always positive semidefinite; the stronger condition of Assumption 8 is necessary in the proof of one of the Theorems presented in the paper.

IV. PREVIOUS RESULTS

The topic of the current paper is related to the general area of experiment design (see e.g. [11], [13], [14], [3], [9]). In this area, one usually designs the experiment (e.g. system input, sampling time, etc.) in order to obtain "good" estimates. Experiment design has traditionally been developed for Maximum Likelihood (ML) estimates. However, it is also convenient to analyze the covariance of models identified using other estimation algorithms. In particular, we focus on PEM with a weighting matrix $\Gamma_o$ equal to the noise covariance matrix. Also, in experiment design, it is common to approximate the covariance matrix of the parameters as [21, chapter 9]:

$$\mathrm{cov}\{\hat{\theta}_N\} \approx \frac{1}{N} M_\theta^{-1} \qquad (10)$$

Notice that when $w_t$ is Gaussian distributed, the matrix in (9) is the "usual" (per-sample) information matrix (see e.g. [11, chapter 6]).

System identification is, in general, concerned with the estimation of dynamic systems. Thus, it is important to also obtain expressions for the covariance of the transfer functions $G$ and $H$. Since $G$ and $H$ are complex matrices, we use the $\mathrm{vec}\{\cdot\}$ operator to transform them into vectors. We then define the covariance matrix of a random vector as follows:

Definition 1: The covariance matrix of a complex random vector $X$ is given by:

$$\mathrm{cov}\{X\} = E\left\{[X - E\{X\}][X - E\{X\}]^H\right\} \qquad (11)$$

We combine $G$ and $H$ by defining the following complex random vector:

$$\Pi(e^{j\omega},\theta) := \left[\, \mathrm{vec}\{G(e^{j\omega},\theta)\}^T \quad \mathrm{vec}\{H(e^{j\omega},\theta)\}^T \,\right]^T \qquad (12)$$

and use the Delta-method [5, page 243] to obtain the covariance of $\hat{\Pi}(e^{j\omega}) = \Pi(e^{j\omega},\hat{\theta}_N)$:

$$\mathrm{cov}\{\hat{\Pi}(e^{j\omega})\} \approx \frac{1}{N} \left[\frac{\partial \Pi(e^{j\omega},\theta_o)}{\partial \theta_o^T}\right] P_\theta \left[\frac{\partial \Pi(e^{j\omega},\theta_o)}{\partial \theta_o^T}\right]^H \qquad (13)$$

A. Asymptotic results in the number of parameters

It is well known that, asymptotically as the number of parameters goes to infinity, the covariance of the estimated dynamic system $\hat{\Pi}(e^{j\omega})$ operating in open loop is given by (see [20], [29] and [31]):

$$\mathrm{cov}\{\hat{\Pi}(e^{j\omega})\} = \frac{n}{N} \begin{bmatrix} \Phi_u(e^{j\omega}) & 0 \\ 0 & \Gamma_o \end{bmatrix}^{-T} \otimes\, \Phi_v(e^{j\omega}) \qquad (14)$$

where $\otimes$ denotes the Kronecker product and $n$ is the order of the system.


The previous formula for the covariance of a dynamic system provides useful insight into the design of identification experiments. However, in [23] it has been pointed out that these results might be misleading for low-order models. In addition, it is suggested in [14] that it is preferable to develop experiment design directly from (13). Here we will pursue results that are non-asymptotic in the number of parameters.

B. Water-bed effect in system identification

In [18] it was established that a measure of the spectral error is bounded as follows:

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} E\left\{ \mathrm{tr}\left\{ \left[ \Phi(e^{j\omega},\eta_o)^{-1} \left( \Phi(e^{j\omega},\hat{\eta}) - \Phi(e^{j\omega},\eta_o) \right) \right]^2 \right\} \right\} d\omega \geq \frac{2 n_\eta}{N} \qquad (15)$$

where $\eta \in \mathbb{R}^{n_\eta}$ is the vector of parameters that defines the spectrum. This result was derived for the estimation of multivariable spectra. Notice that $\{X\}^2$ in (15) should be understood as $\{X\}^2 = XX^H$. In [8] a similar result was established for the case of scalar spectral estimates:

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \mathrm{var}\left\{ \log \Phi(e^{j\omega},\hat{\eta}) \right\} d\omega \geq \frac{2 n_\eta}{N} \qquad (16)$$

Notice that the result in (16) can be obtained by applying the Delta-method to $\log[\Phi(e^{j\omega},\hat{\eta})]$ in (15). In [28] scalar systems were studied and the result presented in [18] was interpreted as a water-bed effect for efficient estimates, for which the inequality in (15) becomes an equality. Recently, in [24], a water-bed effect on the accuracy of estimates for scalar systems driven by noise and an external input was presented. This water-bed effect represents a fundamental limitation on the estimation accuracy of spectral estimates and of dynamic single-input single-output (SISO) models. We repeat the result below for completeness.

Theorem 1: In open-loop identification, where $G$ and $H$ are independently parametrized with $n_\rho$ and $n_\eta$ parameters respectively, and where $(G(q,\theta_G), H(q,\theta_H))$ are parameter identifiable under $\Phi_u$ for the ML method [26], then²

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{\Phi_u}{\Phi_v}\, \mathrm{var}\{\hat{G}\}\, d\omega = \frac{n_\rho}{N}, \qquad (17)$$

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{\sigma^2}{\Phi_v}\, \mathrm{var}\{\hat{H}\}\, d\omega = \frac{n_\eta}{N}. \qquad (18)$$

Proof: See [24].

The result presented in Theorem 1 imposes a restriction on the covariance of models obtained by ML. For example, if we change the parametrization of $G(\rho)$, then (provided that the number of parameters is unchanged and that $\Phi_u$ and $\Phi_v$ remain the same) the variance of the dynamic model $G$ cannot be improved at all frequencies.

²For simplicity of notation, we omit the argument and range of integration; unless explicitly shown, they are $\omega$ and $[-\pi,\pi]$. Moreover, the argument $e^{j\omega}$ will be omitted when there is no risk of confusion.
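As a numerical illustration of (17) (ours, under assumed values): for $G(z,\rho) = b z^{-1}/(1 - a z^{-1})$ with $H = 1$, $\Phi_u = 1$ and $\Phi_v = \sigma^2$, the sketch below evaluates the normalised variance of $\hat{G}(e^{j\omega})$ from (13) on a frequency grid and checks that the weighted integral equals $n_\rho = 2$:

```python
import numpy as np

# Sketch (ours): numerical check of (17) for G = b z^{-1}/(1 - a z^{-1}),
# H = 1, Phi_u = 1, Phi_v = sigma^2; everything is normalised by N.
b, a, sigma2 = 2.0, 0.4, 0.5
w = np.linspace(-np.pi, np.pi, 4001)
zinv = np.exp(-1j * w)
d = 1.0 - a * zinv
J = np.stack([zinv / d, b * zinv**2 / d**2], axis=-1)   # dG/drho, rho = [b, a]
# Per-sample information matrix, cf. (20): M = (1/2pi) int (Phi_u/Phi_v) J^H J
M = (np.einsum('wi,wj->ij', J.conj(), J) / (w.size * sigma2)).real
# Normalised variance of G-hat from (13): N var{G-hat(e^jw)} = J M^{-1} J^H
nvarG = np.einsum('wi,ij,wj->w', J, np.linalg.inv(M), J.conj()).real
# Left side of (17) times N: (1/2pi) int (Phi_u/Phi_v) N var{G-hat} d omega
print(np.mean(nvarG / sigma2))   # prints ~2.0 = n_rho
```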

V. THE INFORMATION MATRIX FOR A MIMO BJ MODEL

The frequency-domain representation of the information matrix for SISO systems is well known (see e.g. [1]). Extensions to a class of multivariable systems can be found in [16]. In [30], [17] the (asymptotic) Fisher Information Matrix (FIM) for a multivariable ARMAX (VARMAX) model is presented. In the following Theorem we develop a frequency-domain expression for the information matrix of MIMO BJ models. (To the best of our knowledge this result is novel; we present it here mainly for completeness.)

Theorem 2: Under assumptions 1 to 7, the per-sample information matrix is given by:

$$M_\theta = \begin{bmatrix} M^{\rho\rho} & M^{\rho\eta} \\ M^{\rho\eta} & M^{\eta\eta} \end{bmatrix} \qquad (19)$$

where the $(r,m)$ elements of the matrices $M^{\rho\rho}$, $M^{\rho\eta}$ and $M^{\eta\eta}$, denoted by $[M^{\rho\rho}]_{r,m}$, $[M^{\rho\eta}]_{r,m}$, $[M^{\eta\eta}]_{r,m}$, are given respectively by:

$$[M^{\rho\rho}]_{r,m} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \mathrm{tr}\left\{ \Phi_v^{-1} \dot{G}_m \Phi_u \dot{G}_r^H \right\} d\omega \qquad (20)$$

$$[M^{\rho\eta}]_{r,m} = 0 \qquad (21)$$

$$[M^{\eta\eta}]_{r,m} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \mathrm{tr}\left\{ \Phi_v^{-1} \dot{H}_m \Gamma_o \dot{H}_r^H \right\} d\omega \qquad (22)$$

where

$$\dot{G}_k = \left.\frac{\partial G}{\partial [\rho]_k}\right|_{\theta_o}, \qquad \dot{H}_k = \left.\frac{\partial H}{\partial [\eta]_k}\right|_{\theta_o} \qquad (23)$$

$\Phi_u$ is the input spectrum, and $\Phi_v$ is:

$$\Phi_v = H_o \Gamma_o H_o^H \qquad (24)$$
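The expressions (20)–(22) are straightforward to evaluate on a frequency grid. The sketch below (ours; the scalar BJ model and the numerical values are assumptions) computes the per-sample information blocks $M^{\rho\rho}$ and $M^{\eta\eta}$; by (21) the cross-block vanishes, so $M_\theta$ in (19) is block diagonal:

```python
import numpy as np

# Sketch (ours): numerical evaluation of (20) and (22) for a scalar BJ model
# G = b z^{-1}/(1 - a z^{-1}), H = 1/(1 - c z^{-1}), Phi_u = 1, Gamma_o = sigma^2.
b, a, c, sigma2 = 2.0, 0.4, 0.3, 1.0
w = np.linspace(-np.pi, np.pi, 4001)
zinv = np.exp(-1j * w)

Gdot = np.stack([zinv / (1 - a * zinv),                   # dG/db
                 b * zinv**2 / (1 - a * zinv)**2], -1)    # dG/da
Hdot = (zinv / (1 - c * zinv)**2)[:, None]                # dH/dc
Phi_v = sigma2 * np.abs(1.0 / (1 - c * zinv))**2          # (24), scalar case

# (20): [M^rho rho]_{r,m} = (1/2pi) int Phi_v^{-1} Gdot_m Phi_u Gdot_r^H d omega
M_rho = np.einsum('w,wr,wm->rm', 1 / Phi_v, Gdot.conj(), Gdot).real / w.size
# (22): [M^eta eta]_{r,m} = (1/2pi) int Phi_v^{-1} Hdot_m Gamma_o Hdot_r^H d omega
M_eta = sigma2 * np.einsum('w,wr,wm->rm', 1 / Phi_v, Hdot.conj(), Hdot).real / w.size
print(M_rho, M_eta)   # diagonal blocks of (19); the cross-block (21) is zero
```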

Proof: We follow the analysis in [30], [16], [21], [15], [17]. Due to space limitations the proof is not included; however, the extension is relatively straightforward based on the analysis developed for ML and VARMAX models in [30].

VI. WATER-BED EFFECT FOR MIMO BJ MODELS

In the previous section, we analyzed the information matrix for the parameters of the MIMO model in (1). We next present a result showing the fundamental limitation that exists on standard model errors for MIMO systems.

Theorem 3: Under assumptions 1 to 8, the covariances of the estimates for $G$ and $H$ satisfy the following:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ [\Phi_u^T \otimes \Phi_v^{-1}]\, \mathrm{cov}\{\mathrm{vec}\{\hat{G}\}\} \right\} d\omega = \frac{n_\rho}{N} \qquad (25)$$

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ [\Gamma_o \otimes \Phi_v^{-1}]\, \mathrm{cov}\{\mathrm{vec}\{\hat{H}\}\} \right\} d\omega = \frac{n_\eta}{N} \qquad (26)$$

where $\Phi_u$ is the input spectrum and $\Phi_v$ is as in equation (24).

Proof: See Appendix.
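As a quick consistency check (ours, not stated in the paper): in the SISO case $n_y = n_u = 1$, $\mathrm{vec}\{\hat{G}\} = \hat{G}$ and the Kronecker weight in (25) collapses to the scalar $\Phi_u/\Phi_v$, so Theorem 3 reduces to (17):

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ [\Phi_u^T \otimes \Phi_v^{-1}]\, \mathrm{cov}\{\mathrm{vec}\{\hat{G}\}\} \right\} d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{\Phi_u}{\Phi_v}\, \mathrm{var}\{\hat{G}\}\, d\omega = \frac{n_\rho}{N}$$

In this sense, Theorem 3 is the exact multivariable counterpart of Theorem 1.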


Fig. 1. (Normalised) variance of the transfer function estimators of $G$, based on the model structures $S_1$ (left) and $S_2$ (right), as functions of $\omega$. The covariance of the first element of $G$ is shown in red (dashed line) and that of the second element in blue (solid line). (Both panels plot normalised variance against normalised frequency in rad/s.)

We next present a corollary showing a multivariable "water-bed" effect for affine transformations of $G$ and $H$.

Corollary 1: Let the vectors $\hat{g}(e^{j\omega})$ and $\hat{h}(e^{j\omega})$ be given by

$$\hat{g}(e^{j\omega}) = A(e^{j\omega})\, \mathrm{vec}\{\hat{G}(e^{j\omega}) - G_o(e^{j\omega})\} + g_o(e^{j\omega}) \qquad (27)$$

$$\hat{h}(e^{j\omega}) = D(e^{j\omega})\, \mathrm{vec}\{\hat{H}(e^{j\omega}) - H_o(e^{j\omega})\} + h_o(e^{j\omega}) \qquad (28)$$

where $A(e^{j\omega})$ and $D(e^{j\omega})$ are deterministic matrices that are non-singular (almost everywhere), and $g_o(e^{j\omega})$ and $h_o(e^{j\omega})$ are deterministic vectors. Then,

$$E\{\hat{g}\} = g_o, \qquad E\{\hat{h}\} = h_o \qquad \text{as } N \to \infty \qquad (29)$$

Moreover, the covariances of $\hat{g}(e^{j\omega})$ and $\hat{h}(e^{j\omega})$ satisfy the following water-bed effect:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ A^{-H}[\Phi_u^T \otimes \Phi_v^{-1}]A^{-1}\, \mathrm{cov}\{\hat{g}\} \right\} d\omega = \frac{n_\rho}{N} \qquad (30)$$

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ D^{-H}[\Gamma_o \otimes \Phi_v^{-1}]D^{-1}\, \mathrm{cov}\{\hat{h}\} \right\} d\omega = \frac{n_\eta}{N} \qquad (31)$$

Proof: Solving for $\mathrm{vec}\{\hat{G} - G_o\}$, we have that:

$$\mathrm{vec}\{\hat{G} - G_o\} = A^{-1}[\hat{g} - g_o] \qquad (32)$$

Next, using Theorem 3 and Lemma 3 (in the Appendix), we obtain the result for $\hat{g}(e^{j\omega})$. The proof for $\hat{h}(e^{j\omega})$ follows a similar procedure.

MIMO example: Consider a system described by

$$G_0(z) = \begin{bmatrix} \dfrac{b_1^0 z^{-1}}{1 - a^0 z^{-1}} & \dfrac{b_2^0 z^{-1}}{1 - a^0 z^{-1}} \end{bmatrix}, \qquad H_0(z) = 1 \qquad (33)$$

where $b_1^0 = 2$, $b_2^0 = 1$, $a^0 = 0.4$, and the following two alternative model structures:

$$G_1(z,\theta_1) = \begin{bmatrix} \dfrac{b_1 z^{-1}}{1 - a_1 z^{-1}} & \dfrac{b_2 + b_3 z^{-1}}{1 - a_1 z^{-1}} \end{bmatrix}, \qquad H_1(z) = 1 \qquad (34)$$

$$G_2(z,\theta_2) = \begin{bmatrix} \dfrac{b_1 z^{-1}}{1 - a_1 z^{-1}} & \dfrac{b_2 z^{-1}}{1 - a_2 z^{-1}} \end{bmatrix}, \qquad H_2(z) = 1 \qquad (35)$$

where $\theta_1 := [b_1\ b_2\ b_3\ a_1]^T$ and $\theta_2 := [b_1\ b_2\ a_1\ a_2]^T$. Notice that both model structures, $S_1$ and $S_2$, have 4 parameters and include the true plant.

For $\Gamma_o$ and $\Phi_u(e^{j\omega})$ equal to the identity matrix, the normalised (i.e. multiplied by $N$) variances of the transfer function estimators $\hat{G}_1(e^{j\omega})$ and $\hat{G}_2(e^{j\omega})$ are shown in Figure 1. From the figure, we see that the variances are different functions of frequency. In particular, the covariances of the first elements of $\hat{G}_1$ and $\hat{G}_2$ are the same for both parametrizations. On the other hand, the covariance of the second element of $\hat{G}_1$ is smaller than the covariance of the second element of $\hat{G}_2$ at low frequencies, and larger at high frequencies. This is consistent with the fundamental limitation derived in Theorem 3, namely that

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ \mathrm{cov}\left\{ \mathrm{vec}\{\hat{G}_i(e^{j\omega})\} \right\} \right\} d\omega = \frac{4}{N}, \qquad i = 1, 2.$$

This result means that it is not possible to reduce the variance of $\hat{G}$ at all frequencies by choosing a suitable model structure: if we reduce the variance at some frequencies, it will necessarily increase at others. This illustrates the multivariable 'water-bed' effect which is the subject of the current paper.
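The integral above is easy to verify numerically. The following sketch (ours, not from the paper; it assumes $\Gamma_o = 1$ and $\Phi_u = I$, hence $\Phi_v = 1$, and all helper names are our own) builds $\partial\,\mathrm{vec}\{G_i\}/\partial\theta_i$ for both structures (34)–(35) at the true parameters, forms the information matrix via (20) and the normalised covariance via (13), and checks that the integral equals $n_\rho = 4$ for both parametrizations:

```python
import numpy as np

w = np.linspace(-np.pi, np.pi, 4001)
zinv = np.exp(-1j * w)

def jac_S1(zinv, b1=2.0, b2=0.0, b3=1.0, a1=0.4):
    """d vec{G_1}/d theta_1, theta_1 = [b1, b2, b3, a1], cf. (34);
    the true plant (33) corresponds to b2 = 0, b3 = 1."""
    d = 1.0 - a1 * zinv
    z0 = np.zeros_like(zinv)
    J = np.empty((zinv.size, 2, 4), dtype=complex)
    J[:, 0, :] = np.stack([zinv / d, z0, z0, b1 * zinv**2 / d**2], axis=-1)
    J[:, 1, :] = np.stack([z0, 1.0 / d, zinv / d,
                           (b2 + b3 * zinv) * zinv / d**2], axis=-1)
    return J

def jac_S2(zinv, b1=2.0, b2=1.0, a1=0.4, a2=0.4):
    """d vec{G_2}/d theta_2, theta_2 = [b1, b2, a1, a2], cf. (35)."""
    d1, d2 = 1.0 - a1 * zinv, 1.0 - a2 * zinv
    z0 = np.zeros_like(zinv)
    J = np.empty((zinv.size, 2, 4), dtype=complex)
    J[:, 0, :] = np.stack([zinv / d1, z0, b1 * zinv**2 / d1**2, z0], axis=-1)
    J[:, 1, :] = np.stack([z0, zinv / d2, z0, b2 * zinv**2 / d2**2], axis=-1)
    return J

for name, jac in [("S1", jac_S1), ("S2", jac_S2)]:
    J = jac(zinv)                                     # (n_w, 2, 4)
    # Information matrix (20) with Phi_u = I, Phi_v = 1: average of J^H J.
    M = np.mean(np.einsum('wki,wkj->wij', J.conj(), J), axis=0).real
    # Normalised covariance of vec{G-hat} from (13): J M^{-1} J^H per omega.
    C = np.einsum('wik,kl,wjl->wij', J, np.linalg.inv(M), J.conj())
    print(name, np.mean(np.trace(C, axis1=1, axis2=2)).real)  # ~ 4 = n_rho
```

Although the two variance curves differ pointwise (Figure 1), both structures integrate to the same constant, which is the water-bed effect.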

VII. WATER-BED EFFECT FOR RELATIVE ERRORS OF MIMO BJ MODELS

In the previous section, we analyzed fundamental limitations on the covariance of the estimators for MIMO dynamical systems. Relative errors are especially important: when comparing multivariable quantities, a small error can either be insignificant (if it relates to a normally large quantity) or dramatically significant (if it relates to a normally small quantity). As an aside, we note that relative errors underpin typical uses of estimated models, e.g. robust control [10]. To illustrate, let us define the left multiplicative error by

$$G_o(e^{j\omega}) = [I + G_{\Delta l}(e^{j\omega})]\hat{G}(e^{j\omega}) \qquad (36)$$

i.e.

$$G_{\Delta l}(e^{j\omega}) = [G_o(e^{j\omega}) - \hat{G}(e^{j\omega})]\hat{G}(e^{j\omega})^{-1} \qquad (37)$$

Then it is well known [10, page 642] that a sufficient condition for robust stability is that

$$\bar{\sigma}\left( G_{\Delta l}(e^{j\omega})\hat{T}(e^{j\omega}) \right) < 1, \qquad \forall \omega \in [-\pi,\pi] \qquad (38)$$

where $\bar{\sigma}$ denotes the largest singular value and $\hat{T}(e^{j\omega})$ is the nominal complementary sensitivity, i.e.

$$\hat{T}(e^{j\omega}) = \hat{G}(e^{j\omega})C(e^{j\omega})[I + \hat{G}(e^{j\omega})C(e^{j\omega})]^{-1} \qquad (39)$$

Thus, we see from (38) that relative model errors are the key issue in MIMO robust stability.
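To illustrate how (37)–(39) would be used in practice (our sketch; the controller $C$ and the frequency-response arrays are assumptions, not from the paper):

```python
import numpy as np

# Sketch (ours): evaluating the robust stability test (38) on a frequency
# grid, for square systems; Go, Ghat and the controller C are assumed to be
# given as arrays of shape (n_w, n, n) of frequency responses.
def robust_stability_margin(Go, Ghat, C):
    """Return max over omega of sigma_bar(G_Delta_l(e^jw) T_hat(e^jw))."""
    worst = 0.0
    for Gow, Gw, Cw in zip(Go, Ghat, C):
        I = np.eye(Gw.shape[0])
        G_dl = (Gow - Gw) @ np.linalg.inv(Gw)            # (37)
        T_hat = Gw @ Cw @ np.linalg.inv(I + Gw @ Cw)     # (39)
        worst = max(worst, np.linalg.svd(G_dl @ T_hat, compute_uv=False)[0])
    return worst  # robust stability is guaranteed if this is < 1, cf. (38)
```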


We next present a result showing how the water-bed trade-off applies to the relative errors of the estimates for MIMO BJ models.

Lemma 1: Under assumptions 1 to 8, the relative errors of the estimates of $G_o$ and $H_o$ satisfy the following water-bed effects:

• If $G_o$ is square and invertible almost everywhere, then:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ [G_o^* \Phi_u G_o^T \otimes \Phi_v^{-1}]\, R_G^l \right\} d\omega = \frac{n_\rho}{N} \qquad (40)$$

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ [\Phi_u \otimes G_o^H \Phi_v^{-1} G_o]\, R_G^r \right\} d\omega = \frac{n_\rho}{N} \qquad (41)$$

where

$$R_G^l := E\left\{ \mathrm{vec}\left\{ [\hat{G} - G_o]G_o^{-1} \right\} \mathrm{vec}\left\{ [\hat{G} - G_o]G_o^{-1} \right\}^H \right\} \qquad (42)$$

$$R_G^r := E\left\{ \mathrm{vec}\left\{ G_o^{-1}[\hat{G} - G_o] \right\} \mathrm{vec}\left\{ G_o^{-1}[\hat{G} - G_o] \right\}^H \right\} \qquad (43)$$

• Similarly, the relative error of the estimate of $H_o$ satisfies the following:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ [H_o^* \Gamma_o H_o^T \otimes \Phi_v^{-1}]\, R_H^l \right\} d\omega = \frac{n_\eta}{N} \qquad (44)$$

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ [\Gamma_o \otimes H_o^H \Phi_v^{-1} H_o]\, R_H^r \right\} d\omega = \frac{n_\eta}{N} \qquad (45)$$

where

$$R_H^l := E\left\{ \mathrm{vec}\left\{ [\hat{H} - H_o]H_o^{-1} \right\} \mathrm{vec}\left\{ [\hat{H} - H_o]H_o^{-1} \right\}^H \right\} \qquad (46)$$

$$R_H^r := E\left\{ \mathrm{vec}\left\{ H_o^{-1}[\hat{H} - H_o] \right\} \mathrm{vec}\left\{ H_o^{-1}[\hat{H} - H_o] \right\}^H \right\} \qquad (47)$$

Proof: The proof of (40) follows by using Corollary 1 with $A = [G_o^{-T} \otimes I]$, $g_o = 0$, and using Lemma 4. The remainder of the proof follows a similar procedure.

VIII. CONCLUSIONS

In this paper we have established fundamental limitations on the variance of the frequency response of estimated parametric models for MIMO systems operating in open loop. We have illustrated the results via an example which shows the trade-offs imposed by the fundamental limitations, and also illustrates the 'water-bed' effect in system identification for MIMO systems.

REFERENCES

[1] J. C. Agüero and G. C. Goodwin. Choosing between open and closed loop experiments in linear system identification. IEEE Transactions on Automatic Control, 52(8):1475–1480, 2007.
[2] D. S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory. Princeton University Press, 2005.
[3] X. Bombois, G. Scorletti, M. Gevers, P. M. J. Van den Hof, and R. Hildebrand. Least costly identification experiment for control. Automatica, 42(10):1651–1662, 2006.
[4] P. Caines. Linear Stochastic Systems. John Wiley & Sons, 1988.
[5] G. Casella and R. L. Berger. Statistical Inference. Duxbury, second edition, 2002.

[6] D. R. Cox and N. Reid. Orthogonality and approximate conditional inference. Journal of the Royal Statistical Society, Series B (Methodological), 49(1):1–39, 1987.
[7] B. de Moor, M. Gevers, and G. C. Goodwin. L2-overbiased, L2-underbiased and L2-unbiased estimation of transfer functions. Automatica, 30(5):893–898, 1994.
[8] B. Friedlander and B. Porat. A general lower bound for parametric spectrum estimation. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(4):728–733, 1984.
[9] M. Gevers, L. Miskovic, D. Bonvin, and A. Karimi. Identification of multi-input systems: variance analysis and input design issues. Automatica, 42(4):559–572, 2006.
[10] G. C. Goodwin, S. F. Graebe, and M. E. Salgado. Control System Design. Prentice Hall, Upper Saddle River, NJ, 2001.
[11] G. C. Goodwin and R. Payne. Dynamic System Identification: Experiment Design and Data Analysis. Academic Press, 1977.
[12] A. C. Harvey. The Econometric Analysis of Time Series. The MIT Press, 2nd edition, 1990.
[13] H. Hjalmarsson. From experiment design to closed-loop control. Automatica, 41(3):393–438, 2005.
[14] H. Jansson and H. Hjalmarsson. Input design via LMIs admitting frequency-wise model specifications in confidence regions. IEEE Transactions on Automatic Control, 50(10):1534–1549, 2005.
[15] A. Klein. A generalization of Whittle's formula for the information matrix of vector-mixed time series. Linear Algebra and its Applications, 321(1-3):197–208, 2000.
[16] A. A. B. Klein and G. Mélard. The information matrix of multiple-input single-output time series models. Journal of Computational and Applied Mathematics, 51:349–356, 1994.
[17] A. A. B. Klein and P. Spreij. Matrix differential calculus applied to multiple stationary time series and an extended Whittle formula for information matrices. Linear Algebra and its Applications, 430(2-3):674–691, 2009.
[18] W. E. Larimore. A survey of some recent developments in system parameter estimation. In Proceedings of the 6th IFAC Symposium on Identification and Parameter Estimation, pages 979–984, Arlington, VA, 1982.
[19] L. Ljung. Convergence analysis of parametric identification methods. IEEE Transactions on Automatic Control, AC-23:770–783, 1978.
[20] L. Ljung. Asymptotic variance expressions for identified black-box transfer function models. IEEE Transactions on Automatic Control, 30(9):834–844, 1985.
[21] L. Ljung. System Identification: Theory for the User. Prentice Hall, 2nd edition, 1999.
[22] B. Ninness. Integral constraints on the accuracy of least-squares estimation. Automatica, 32(3):391–397, 1996.
[23] B. Ninness, H. Hjalmarsson, and F. Gustafsson. The fundamental role of general orthonormal bases in system identification. IEEE Transactions on Automatic Control, 44(7):1384–1406, 1999.
[24] C. R. Rojas, J. S. Welsh, and J. C. Agüero. Fundamental limitations on the variance of estimated parametric models. IEEE Transactions on Automatic Control, 54(5):1077–1081, 2009.
[25] M. E. Salgado, C. E. de Souza, and G. C. Goodwin. Qualitative aspects of the distribution of errors in least squares estimation. Automatica, 26(1):97–101, 1990.
[26] T. Söderström and P. Stoica. System Identification. Prentice-Hall International, 1989.
[27] V. Solo. Topics in advanced time series analysis. In G. del Pino and R. Rebolledo, editors, Lectures in Probability and Statistics, Lecture Notes in Mathematics, 1215:165–328, 1986.
[28] P. Stoica, J. Li, and B. Ninness. The waterbed effect in spectral estimation. IEEE Signal Processing Magazine, 21(3):88–100, 2004.
[29] Z. D. Yuan and L. Ljung. Black-box identification of multivariable transfer functions: asymptotic properties and optimal input design. International Journal of Control, 40(2):233–256, 1984.
[30] M. B. Zarrop, R. L. Payne, and G. C. Goodwin. Experiment design for time series analysis: the multivariate case. SIAM Journal on Applied Mathematics, 37(2):370–381, 1979.
[31] Y. Zhu. Black-box identification of MIMO transfer functions: asymptotic properties of prediction error models. International Journal of Adaptive Control and Signal Processing, 3:357–373, 1989.


APPENDIX

A. Proof of Theorem 3

Proof: Using Lemma 2, we have that $\Phi_v$ and $\Phi_u$ can be rewritten as:

$$\Phi_v = \Phi_v^{1/2}\Phi_v^{H/2} \qquad (48)$$

$$\Phi_u = \Phi_u^{1/2}\Phi_u^{H/2} \qquad (49)$$

We then define $X$ as follows:

$$X = \Phi_v^{-1/2}\, G\, \Phi_u^{1/2} \qquad (50)$$

We thus have that the $r$-$m$ element of the matrix $M^{\rho\rho}$ can be written as:

$$[M^{\rho\rho}]_{r,m} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ \Phi_v^{-1/2}\dot{G}_m\Phi_u^{1/2}\Phi_u^{H/2}\dot{G}_r^H\Phi_v^{-H/2} \right\} d\omega \qquad (51)$$

$$= \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ \left.\frac{\partial X}{\partial\rho_m}\right|_{\theta_o} \left( \left.\frac{\partial X}{\partial\rho_r}\right|_{\theta_o} \right)^{H} \right\} d\omega \qquad (52)$$

$$= \frac{1}{2\pi}\int_{-\pi}^{\pi} \sum_{i,k} \left.\frac{\partial [X]_{i,k}}{\partial\rho_m}\right|_{\theta_o} \left.\frac{\partial [X]_{i,k}^{*}}{\partial\rho_r}\right|_{\theta_o} d\omega \qquad (53)$$

where (51) is obtained by using (48), (49) and Theorem 2, (52) by using the definition of $X$ in (50), and (53) by using Lemma 3. We also have, by using the "Delta method", that the covariance of the $i$-$k$ element of $X$ is given by:

$$\mathrm{cov}\{[X]_{i,k}\} = \frac{1}{N} \sum_{r,m} [(M^{\rho\rho})^{-1}]_{r,m} \left.\frac{\partial [X]_{i,k}}{\partial\rho_r}\right|_{\theta_o} \left.\frac{\partial [X]_{i,k}^{*}}{\partial\rho_m}\right|_{\theta_o} \qquad (54)$$

Then

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\{\mathrm{cov}\{\mathrm{vec}\{X\}\}\}\, d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} \sum_{i} [\mathrm{cov}\{\mathrm{vec}\{X\}\}]_{i,i}\, d\omega \qquad (55)$$

$$= \frac{1}{2\pi}\int_{-\pi}^{\pi} \sum_{i,k} \mathrm{cov}\{[X]_{i,k}\}\, d\omega \qquad (56)$$

$$= \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{1}{N} \sum_{i,k}\sum_{r,m} [(M^{\rho\rho})^{-1}]_{r,m} \left.\frac{\partial [X]_{i,k}}{\partial\rho_r}\right|_{\theta_o} \left.\frac{\partial [X]_{i,k}^{*}}{\partial\rho_m}\right|_{\theta_o} d\omega \qquad (57)$$

$$= \frac{1}{N} \sum_{r,m} [(M^{\rho\rho})^{-1}]_{r,m}\, \frac{1}{2\pi}\int_{-\pi}^{\pi} \sum_{i,k} \left.\frac{\partial [X]_{i,k}}{\partial\rho_r}\right|_{\theta_o} \left.\frac{\partial [X]_{i,k}^{*}}{\partial\rho_m}\right|_{\theta_o} d\omega \qquad (58)$$

$$= \frac{1}{N} \sum_{r,m} [(M^{\rho\rho})^{-1}]_{r,m}\, [M^{\rho\rho}]_{m,r} \qquad (59)$$

$$= \frac{1}{N}\, \mathrm{tr}\left\{(M^{\rho\rho})^{-1} M^{\rho\rho}\right\} \qquad (60)$$

$$= \frac{1}{N}\, \mathrm{tr}\{I_{n_\rho}\} \qquad (61)$$

$$= \frac{n_\rho}{N} \qquad (62)$$

where (55) is obtained from the definition of the trace, (56) by rewriting the sum of the covariances of all the elements of $X$ in matrix form, (57) by using (54), (58) by interchanging the $\int$ and $\sum$ operators, (59) by using (53), (60) by using Lemma 3, (61) by noting that $(M^{\rho\rho})^{-1} M^{\rho\rho} = I_{n_\rho}$, and (62) by using $\mathrm{tr}\{I_{n_\rho}\} = n_\rho$. Finally we have that:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\{\mathrm{cov}\{\mathrm{vec}\{X\}\}\}\, d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ \mathrm{cov}\left\{ \mathrm{vec}\left\{ \Phi_v^{-1/2}\, G\, \Phi_u^{1/2} \right\} \right\} \right\} d\omega \qquad (63)$$

$$= \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ \mathrm{cov}\left\{ [\Phi_u^{T/2} \otimes \Phi_v^{-1/2}]\, \mathrm{vec}\{G\} \right\} \right\} d\omega \qquad (64)$$

$$= \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{tr}\left\{ [\Phi_u^{T} \otimes \Phi_v^{-1}]\, \mathrm{cov}\{\mathrm{vec}\{G\}\} \right\} d\omega \qquad (65)$$

where (63) is obtained by using the definition of $X$, (64) follows by using Lemma 4, and (65) follows by using Definition 1, Lemma 3, and equations (48) and (49). The proof of the limitation on the covariance of $H$ follows a similar procedure.

B. Technical Lemmas

Lemma 2: Let $A \in \mathbb{C}^{n\times n}$ be a positive semidefinite matrix with $\mathrm{rank}\{A\} = r$. Then there exists $B \in \mathbb{C}^{n\times r}$ such that $A = BB^H$.

Proof: See [2, page 218].

Lemma 3:

$$\mathrm{tr}\{AB\} = \mathrm{tr}\{BA\} = \sum_{i,k} A_{ik} B_{ki} \qquad (66)$$

Proof: See [2, page 22].

Lemma 4: Let $A$, $B$, $C$ and $D$ be matrices of appropriate dimensions. Then the $\mathrm{vec}\{\cdot\}$ operator and the Kronecker product satisfy the following properties:
1) $\mathrm{vec}\{ABC\} = [C^T \otimes A]\, \mathrm{vec}\{B\}$
2) $[A \otimes B][C \otimes D] = [AC \otimes BD]$
3) $[A \otimes B]^H = [A^H \otimes B^H]$
4) $[A \otimes B]^{-1} = [A^{-1} \otimes B^{-1}]$

Proof: 1) See [2, page 249]. 2) See [2, page 248]. 3) See [2, page 248].
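The identities in Lemmas 3 and 4 are easy to sanity-check numerically; a minimal sketch (ours) for Lemma 3 and property 1 of Lemma 4:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# Lemma 3: tr{AB} = tr{BA} = sum_{i,k} A_ik B_ki
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert np.isclose(np.trace(A @ B), np.sum(A * B.T))

# Lemma 4, property 1: vec{ABC} = [C^T kron A] vec{B} (column-stacking vec)
vec = lambda X: X.reshape(-1, order='F')
Bm = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
Cm = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
assert np.allclose(vec(A @ Bm @ Cm), np.kron(Cm.T, A) @ vec(Bm))
```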
