
Proceedings of the 38th Conference on Decision & Control, Phoenix, Arizona, USA, December 1999


Vector ARMA Estimation: an enhanced subspace approach

Petre Stoica
Systems and Control Group, Uppsala University, PO Box 27, SE-751 03 Uppsala, Sweden

Jorge Mari
Optimization and Systems Theory, Royal Institute of Technology, SE-100 44 Stockholm, Sweden

Tomas McKelvey
Division of Automatic Control, Linköping University, SE-581 83 Linköping, Sweden

Abstract

A parameter estimation method for finite-dimensional multivariate linear stochastic systems is presented which is guaranteed to produce valid models close enough to the true underlying model, in a computational time that is at most polynomial in the system dimension. This is achieved by combining the main features of certain stochastic subspace identification techniques with sound statistical order estimation methods, matrix Schur restabilization procedures and multivariate covariance fitting, the latter formulated as a linear matrix inequality problem. In this paper we emphasize the last of these issues, and provide an example of the overall performance in a multivariable case.

1 Introduction The problem of estimating the parameters of ARMA signals has received significant attention in the literature for more than fifty years, and a great number of ARMA estimation methods have been proposed. Even so, the search for a satisfactory ARMA estimation method has not stopped. Indeed, all existing solutions to this problem appear to suffer from one or more drawbacks.

More exactly, to our knowledge, no method possessing the following combined desiderata was available.

1. No need for using canonical parameterizations; the only structural parameter to be determined is the ARMA order.

2. No hard failures: the method should always provide a valid estimated ARMA model, which should be stable and minimum phase if so desired.

3. All computations performed in polynomial time. This is an important property: note, for instance, that one can use global search algorithms with the ARMA estimation methods requiring multidimensional search in an attempt to eliminate convergence problems. However, these methods have non-polynomial algorithmic complexity, so the practical computation time required may be prohibitively long.

4. ARMA parameter estimates with satisfactory accuracy should be obtained in the small or medium-sized sample cases, and the errors in the approximations should vanish asymptotically as the number of data samples increases.

The objective of this paper is to present a vector ARMA (VARMA) parameter estimation method enjoying the qualities mentioned above.

0-7803-5250-5/99/$10.00 © 1999 IEEE

2 Data models and problem formulation

Consider a vector-valued stochastic process y(t), t ∈ Z, which is assumed to have a rational spectral density. Under these conditions, the process admits a finite-dimensional state-space innovation representation as follows:

x(t+1) = A x(t) + K e(t)    (n × 1)
y(t)   = C x(t) + e(t)      (ny × 1)                        (1)

Here x(t) is the state vector and A, K, C are matrices of suitable dimensions (implicitly indicated in (1)), while the vector-valued sequence e(t) is the white innovation sequence. We assume zero-mean processes throughout. The innovations covariance matrix is denoted by Q and is assumed non-singular:

E[e(t) e^T(s)] = Q δ_{t,s}                                  (2)

with δ_{t,s} being the Kronecker delta function. The stationarity assumption is equivalent to A being a Schur stable matrix. For later developments we also need to assume that (1) is a minimum-degree representation of y(t) in the stochastic sense.

The problem considered in this paper is to estimate the order n and the matrices A, K, C and Q in (1), (2), from N observations of a sample realization of y(t), such that the second-order statistics of the identified model match those of the measured one. We do not use any special parameterization (canonical or otherwise) of A, K and C. Hence, n is the only structural parameter of (1) we need to find. The estimation of n from the available data will be discussed briefly.

A stochastic process like the y(t) considered here admits in general several Markovian representations of type (1). For computational purposes, later on, we will find it convenient to work also with the following state-space representation for y(t):

x̃(t+1) = A x̃(t) + w(t)    (n × 1)
y(t)    = C x̃(t) + v(t)    (ny × 1)                         (3)

with A and C as before, a new state vector x̃(t), and general zero-mean noise processes w(t), v(t) satisfying

E{ [w(t); v(t)] [w^T(s)  v^T(s)] } = Π δ_{t,s}.             (4)

The matrix Π can be partitioned as follows:

Π = [ Π11    Π12 ]   } n
    [ Π12^T  Π22 ]   } ny                                   (5)

Note that Π is necessarily a positive semidefinite matrix, Π ≥ 0, and also that it is possible to show the existence of a generally nontrivial entire set of matrices Π [2] parameterizing a fixed rational spectral density Φ(e^{iω}). Despite the aforementioned non-uniqueness of either (1) or (3), our method is able to estimate the unknown parameters (A, K, C, Q), modulo a similarity transformation T, in a reliable manner, as we will explain shortly.

3 Preliminary results

For t, k ∈ Z (integers) define

Rk = E[y(t) y^T(t−k)]
P  = E[x(t) x^T(t)]
P̃  = E[x̃(t) x̃^T(t)]                                        (6)
D  = A P C^T + K Q
D̃  = A P̃ C^T + Π12

where E denotes the expected value with respect to the underlying probability measure. The Rk are the process covariances. It then holds that

Rk = C A^{k−1} D = C A^{k−1} D̃,   k = 1, 2, ...             (7)

and R_{−k} = Rk^T. From (1) we get

P  = A P A^T + K Q K^T
R0 = C P C^T + Q                                            (8)

and similarly from (3) we obtain

P̃  = A P̃ A^T + Π11
R0 = C P̃ C^T + Π22.                                         (9)

It follows from (6) and (8) that

K = (D − A P C^T) Q^{−1}                                    (10)

and, respectively,

Q = R0 − C P C^T.                                           (11)

Finally, insertion of (10) and (11) into (8) yields the following discrete algebraic Riccati equation (DARE) for P:

P = A P A^T + (D − A P C^T)(R0 − C P C^T)^{−1}(D − A P C^T)^T.   (12)

Next we note that the minimality assumption introduced in Section 2 implies that the following observability matrix

Ω = [ C
      C A
      ...
      C A^{m−1} ],   m ≥ n                                  (13)

has full (column) rank. This fact, along with (7), implies at once that D̃ = D. Hence, besides the expression in (6), D is also given by

D = D̃ = A P̃ C^T + Π12.                                     (14)

By the same minimality assumption, the constructibility matrix

Γ^T = [ D   A D   ...   A^{m−1} D ],   m ≥ n                (15)

must also have full (row) rank. It readily follows from (7) that the matrices Ω and Γ introduced above can be used to write down the following factorization of the block-Hankel matrix of covariances:

R = [ R1    R2      ...   Rm     ]
    [ R2    R3      ...   Rm+1   ]
    [ ...   ...     ...   ...    ]
    [ Rm    Rm+1    ...   R2m−1  ] = Ω Γ^T.                 (16)

The aforementioned rank properties of Ω and Γ, together with (16), imply that

rank(R) = n   for m ≥ n                                     (17)

which ends the presentation of the preliminary results needed.
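The relations above can be checked numerically. The sketch below uses a small hypothetical system (the matrices A, K, C, Q are chosen only for illustration, not taken from the paper): it solves the Lyapunov part of (8) for P by vectorization, forms D and the covariances Rk of (6)-(7), builds the block-Hankel matrix of (16), and verifies the rank property (17).

```python
import numpy as np

# Hypothetical example system (n = 2 states, ny = 1 output), Schur-stable A.
A = np.array([[0.5, 0.2], [0.0, 0.3]])
K = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
Q = np.array([[1.0]])

# State covariance from the Lyapunov part of (8): P = A P A^T + K Q K^T,
# solved by vectorization: vec(P) = (I - A kron A)^{-1} vec(K Q K^T).
n = A.shape[0]
vecP = np.linalg.solve(np.eye(n * n) - np.kron(A, A), (K @ Q @ K.T).ravel())
P = vecP.reshape(n, n)

# D from (6) and the theoretical covariances (7): R_k = C A^{k-1} D, k >= 1.
D = A @ P @ C.T + K @ Q
m = 4
R = [C @ np.linalg.matrix_power(A, k - 1) @ D for k in range(1, 2 * m)]

# Block-Hankel matrix of covariances, eq. (16).
H = np.block([[R[i + j] for j in range(m)] for i in range(m)])

# Eq. (17): rank(H) = n for m >= n.
print(np.linalg.matrix_rank(H))  # expected: 2
```

Because (A, K) is controllable and (A, C) observable in this toy example, P is positive definite and the Hankel matrix has rank exactly n = 2.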

4 Parameter estimation: theoretical aspects

In most of this section we assume that the theoretical VARMA covariances are given. The extension to the practically relevant case when only a sample {y(t)}_{t=1}^{N} is available is deferred to the next section.

4.1 Determination of n, A and C

According to (17), the ARMA order n can be obtained as the rank of R, the block-Hankel matrix of covariances, if the order of the system is no more than half the length of the available covariance segment. However, our method is able to derive an approximate underlying model even if this non-testable assumption is not fulfilled. The subspace approach we consider derives A, C and D from the bases generating the range subspaces of R and R^T. To understand how this can be done, consider the singular value decomposition of R:

R = U Σ V^T                                                 (18)

where Σ is an n × n nonsingular diagonal matrix (we use the value of n coming from the rank of R), and U and V are semi-orthogonal matrices: U^T U = V^T V = I. Comparing (16) and (18) we obtain

Ω = U Σ^{1/2} T                                             (19)

for some nonsingular n × n matrix T, because the columns of both Ω and U Σ^{1/2} form a basis for the range space of R. Henceforth, Σ^{1/2} denotes the positive square root of Σ. Inserting (19) in the equation Ω Γ^T = U Σ V^T yields

Γ^T = T^{−1} Σ^{1/2} V^T.                                   (20)

Next observe that

Ω T^{−1} = [ (C T^{−1})
             (C T^{−1})(T A T^{−1})
             ...
             (C T^{−1})(T A T^{−1})^{m−1} ]                 (21)

and

T Γ^T = [ (T D)   (T A T^{−1})(T D)   ...   (T A T^{−1})^{m−1}(T D) ].   (22)

Setting T = I in (19)-(22) results in the following choice of matrices:

C = the first ny rows of U Σ^{1/2}
D = the first ny columns of Σ^{1/2} V^T.                    (23)

The matrix A can then be obtained as the solution to the following system of linear equations:

(Ū Σ^{1/2}) A = U̲ Σ^{1/2}                                   (24)

where Ū and U̲ are the matrices made from the first and, respectively, the last m − 1 block rows of U.

4.2 Determination of Q and K

Once A, C and D have been determined, P can in principle be obtained as the stabilizing positive semidefinite solution of the Riccati equation (12), and then Q and K are derived from (10), (11).

4.3 The role of the state-space equation (3)

It is still unclear where the state-space equation (3) comes into the picture. In practice, we do not have an exact covariance sequence, nor may we have a priori knowledge about n. The discussion to follow makes the connection with the implementation aspects of the method that are presented in the next section. From {y(t)}_{t=1}^{N} we can obtain the following sample covariances:

R̂k = (1/N) Σ_{t=k+1}^{N} y(t) y^T(t−k),   k = 0, ..., 2m−1.   (25)

The estimation errors in {R̂k} will induce errors in the estimated matrices Â, Ĉ and D̂. Owing to these errors, the synthesized sequence

R̂0,   R̃k = Ĉ Â^{k−1} D̂,   k = 1, 2, ...                     (26)

need not belong to the set of covariances of all minimal degree-n VARMA signals. If this happens, no ARMA model of order n can match the estimated R̂k, k = 0, ..., 2m−1, and the Riccati equation (12) cannot have any positive definite solution P. This in turn means that K and Q cannot be determined as indicated above, and hence the "subspace approach" has reached a hard failure mode.

The problem of re-estimating D under an appropriate constraint can be reduced to that of estimating Π in (3) (with A = Â and C = Ĉ) under the simple constraint that Π ≥ 0, which, as we shall see in the next section, can be easily handled.

4.4 Estimation of Π

For presentation reasons, consider once again the case where theoretical exact covariances are available. Since A and C are fixed, we can perform the following left coprime factorization:

C (zI − A)^{−1} = F^{−1}(z) G(z).                           (27)

Here F(z) and G(z) are polynomial matrices such that any greatest common left divisor of F(z) and G(z) is unimodular (see e.g. [3]). Combining (27) and (3) yields the following equation for y(t):

y(t) = [ F^{−1}(z) G(z)   I_{ny} ] [ w(t); v(t) ].          (28)

We could then obtain Π by matching the spectra of the two sides of (28), or of the following equivalent equation:

z^{−ℓ} F(z) y(t) = [ z^{−ℓ} G(z)   z^{−ℓ} F(z) ] [ w(t); v(t) ].   (29)

The latter equation is more convenient than (28) for fitting purposes, since both sides of (29) are moving averages and hence much more easily handled than those in (28).
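The subspace steps (18)-(24) can be sketched concretely. The fragment below (same hypothetical system matrices as before, used only to generate exact covariances) computes the truncated SVD of the Hankel matrix, reads off C and D as in (23), solves the shift system (24) for A, and checks that the recovered triple reproduces the covariances, modulo the unavoidable similarity transformation.

```python
import numpy as np

# Hypothetical true system (n = 2, ny = 1), used only to build exact covariances.
A = np.array([[0.5, 0.2], [0.0, 0.3]])
K = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
Q = np.array([[1.0]])
n, ny, m = 2, 1, 4

# Exact covariances R_k = C A^{k-1} D, eqs (6)-(7), and the Hankel matrix (16).
vecP = np.linalg.solve(np.eye(n * n) - np.kron(A, A), (K @ Q @ K.T).ravel())
P = vecP.reshape(n, n)
D = A @ P @ C.T + K @ Q
R = [C @ np.linalg.matrix_power(A, k - 1) @ D for k in range(1, 2 * m)]
H = np.block([[R[i + j] for j in range(m)] for i in range(m)])

# (18): truncated SVD, keeping the n dominant singular values.
U, s, Vt = np.linalg.svd(H)
U, S, Vt = U[:, :n], np.diag(np.sqrt(s[:n])), Vt[:n, :]

# (19)-(20) with T = I, then (23): C and D from the factor matrices.
Omega = U @ S      # plays the role of Omega in (19)
GammaT = S @ Vt    # plays the role of Gamma^T in (20)
C_est = Omega[:ny, :]
D_est = GammaT[:, :ny]

# (24): A from the shift structure (first vs last m-1 block rows of Omega).
A_est = np.linalg.lstsq(Omega[:-ny, :], Omega[ny:, :], rcond=None)[0]

# The estimated triple reproduces the covariances up to similarity.
print(np.allclose(C_est @ D_est, R[0]))  # expected: True
```

The recovered A_est is similar to the true A, so its eigenvalues (0.5 and 0.3 here) are invariant even though the state basis is not.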

5 Parameter estimation: practical issues

We begin by computing the sample covariances {R̂k}_{k=0}^{2m−1} using (25) for a sufficiently large m. There are several rules of thumb concerning the choice of m.

We need a statistical test on R̂ (the estimated block-Hankel matrix of (16)) to infer the rank of R. We refer the reader to the recent survey [5] for details. Let n̂ be the estimate of n returned by one of the algorithms in the cited reference. Also, for later use, let Û Σ̂ V̂^T denote the truncated SVD of R̂ in which only the largest n̂ singular values have been kept.

Following (23) and (24) we set

Ĉ = the first ny rows of Û Σ̂^{1/2}
D̂ = the first ny columns of Σ̂^{1/2} V̂^T                     (30)

and obtain an estimate Â of A as the least-squares (or total least-squares) solution to the following approximate system of linear equations:

(Û̄ Σ̂^{1/2}) A ≈ Û̲ Σ̂^{1/2}.                                  (31)

5.1 Enforcing the stability of Â

There is no reason why the solution of (31) should be Schur stable. Consider then the problem of finding an approximant Ã to Â such that Ã is close to Â and stable. It is well known that Ã is strictly Schur stable if and only if there exists a positive definite matrix P such that

P − Ã P Ã^T > 0,                                            (32)

where, as is customary, X > 0 means that X is a symmetric positive definite matrix. The introduction of the new variable P allows us to express the stability condition on Ã in the convenient form of the Lyapunov inequality (32). Using this, we propose determining a stable approximant to Â by solving the following problem:

min_{Ã,P} ||(Ã − Â) P||₂    s.t.   P > 0,   P − Ã P Ã^T > 0.   (33)

The weighting by P is done for computational convenience (the resulting problem is expressible as a standard semidefinite programming problem, see below), but it can also be justified as a special weighted approximation problem when finding the A matrix in a subspace identification method, as we discuss in [4]. Problem (33) is regularized by introducing an auxiliary positive scalar ν and requiring that

P − Ã P Ã^T ≥ ν I,    P ≥ ν I.                              (34)

Introducing the auxiliary matrix X = Ã P and using standard transformations in mathematical programming, we can rewrite problem (33) as follows:

min_{X,P,γ} γ   s.t.   [ γI           X − Â P ]
                       [ (X − Â P)^T  γI      ] ≥ 0,

                       [ P − νI   X ]
                       [ X^T      P ] ≥ 0.                  (35)

This is a so-called semidefinite programming (SDP) problem that can be solved in a number of iterations that is polynomial in the size of the problem; see [1]. Let X̂ and P̂ be the values of X and P returned by the solver of (35). Then

Ã = X̂ P̂^{−1}                                                (36)

which concludes the description of the stabilizing procedure. There is empirical evidence suggesting that the accuracy of the estimates Â, Ĉ and D̂ is enhanced if, instead of using the matrix R̂ directly, we use a "normalized version" of R̂ as done in the canonical correlation analysis literature. For presentation reasons we refrain from doing so.
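We do not reproduce the SDP (35) here, since the paper solves it with a dedicated LMI solver. The sketch below only illustrates the certificate behind it: for a Schur-stable matrix a positive definite P satisfying the Lyapunov inequality (32) always exists (here obtained by solving P = Ã P Ã^T + I), while for an unstable matrix no such P exists. The test matrices are hypothetical, chosen only for illustration.

```python
import numpy as np

def schur_stable_certificate(A_tilde, tol=1e-9):
    """Return a P > 0 satisfying the Lyapunov inequality (32) if A_tilde is
    Schur stable, else None. P solves P = A P A^T + I by vectorization."""
    n = A_tilde.shape[0]
    if np.max(np.abs(np.linalg.eigvals(A_tilde))) >= 1 - tol:
        return None  # no certificate exists: spectral radius >= 1
    vecP = np.linalg.solve(np.eye(n * n) - np.kron(A_tilde, A_tilde),
                           np.eye(n).ravel())
    return vecP.reshape(n, n)

A_stable = np.array([[0.5, 0.2], [0.0, 0.3]])     # hypothetical stable estimate
A_unstable = np.array([[1.1, 0.0], [0.0, 0.3]])   # hypothetical unstable estimate

P = schur_stable_certificate(A_stable)
# By construction P - A P A^T = I > 0, so (32) holds strictly.
print(np.all(np.linalg.eigvalsh(P - A_stable @ P @ A_stable.T) > 0))  # True
print(schur_stable_certificate(A_unstable) is None)                   # True
```

The SDP (35) goes further than this check: it searches jointly over Ã and P for the stable matrix closest (in the weighted sense of (33)) to the unstable estimate.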

5.2 Estimation of K and Q

First we determine the stabilizing positive definite solution P̂ of the DARE (12) with A, C, D and R0 replaced by their estimates:

P̂ = Â P̂ Â^T + (D̂ − Â P̂ Ĉ^T)(R̂0 − Ĉ P̂ Ĉ^T)^{−1}(D̂ − Â P̂ Ĉ^T)^T.   (37)

Of course, if Â were unstable, its stabilized version Ã would be used. Then we obtain estimates of K and Q using (11), (10):

Q̂ = R̂0 − Ĉ P̂ Ĉ^T                                            (38)
K̂ = (D̂ − Â P̂ Ĉ^T) Q̂^{−1}.                                   (39)

This concludes the estimation procedure. As already explained, the final step cannot be completed whenever the synthesized sequence (26) is not a valid segment of the covariance sequence of a VARMA model of the estimated order n̂. When this happens, we have recourse to the procedure outlined in Subsection 4.4 and described in full detail in the next section.
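The step (37)-(39) can be illustrated numerically. The sketch below uses exact covariances from a hypothetical minimum-phase system (so that (37) is guaranteed to have a stabilizing solution) and runs a plain fixed-point iteration on the Riccati equation instead of a dedicated DARE solver; the recovered K and Q then match the generating model.

```python
import numpy as np

# Hypothetical minimum-phase system; exact R0 and D so that (37) is solvable.
A = np.array([[0.5, 0.2], [0.0, 0.3]])
K = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
Q = np.array([[1.0]])
n = 2
vecP = np.linalg.solve(np.eye(n * n) - np.kron(A, A), (K @ Q @ K.T).ravel())
P_true = vecP.reshape(n, n)
D = A @ P_true @ C.T + K @ Q
R0 = C @ P_true @ C.T + Q

# Fixed-point iteration on the DARE (37), started at P = 0 (the Kalman filter
# covariance recursion); a dedicated Riccati solver could be used instead.
P = np.zeros((n, n))
for _ in range(500):
    W = D - A @ P @ C.T
    P = A @ P @ A.T + W @ np.linalg.solve(R0 - C @ P @ C.T, W.T)

# (38), (39): Q and K from the converged P.
Q_est = R0 - C @ P @ C.T
K_est = (D - A @ P @ C.T) @ np.linalg.inv(Q_est)
print(np.allclose(K_est, K, atol=1e-6), np.allclose(Q_est, Q, atol=1e-6))
```

Starting the iteration at P = 0 keeps R0 − C P C^T positive definite throughout and converges monotonically to the stabilizing solution when one exists; with estimated covariances this is exactly the step that can fail, triggering the re-estimation of D described next.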

5.3 Re-estimation of D when needed

We assume here that the attempt to find a stabilizing solution P̂ of (37) has failed.

>0

c(z)

Let R(z) and be the polynomials obtained from (27), when C and A have been replaced by their estimates. Also define

e(z-') 9 [z-'e(z)

e z-'R(z)] =

k=O

&z-k

of order fi. If one wishes to obtain a square minimum phase model as the result of the identification pethod it is necessary to obtain a state covariance matrix P as the stabilizing solution to

(40)

P =

The "true" values of e k are denoted by Lk. Finally, let { ~ k denote the covariances of either side of (29). A straightforward calculation yields that the stacked covariances are given by:

}

[

LI]lJ,

(42)

Here, 18is the Kronecker product symbol, Lj = 0 for j > l , andJis the ( r ~ + n ~ (n+ny)(n+ny+1)/2matrix )~x defined through the equality vec(l7) = J p . The covariances {sk} can be estimated from {&} and { y ( t ) ) by considering this time the left hand side in (29). Define the process S(t)

pjY(f

j=O

-j >

(43)

and compute its covariances& with the usual ergodic estimate. We derive an estimate of l2 by solving the following covariance matching problem:

(44) where 9 is the matrix Y made from the {&}. Next, from the resulting fi coming from (44), together with d and e, we obtain P as the positive definite solution to the

Lyapunov equation P =ApAT + f i l l (see ( 9 ) the ~ existence of which is guaranteed by the (possibly enforced) stability of A. We also construct

ko = CPCT + fi22

(45)

and from (14)

b =A P P + 0 1 2 . (46) By construction, {&,CAk-'b k = 1,2,...} is the covari-

ance sequence of a VARMA model. Generically it will be

3669

-1

(b-A P & T ) T

(47) and then, finally, determine estimates of Q and K as in (1 l), (lo), i.e.,

6

Lp+e

e

+ (b-APCT)(ko - & C T )

2

is the vector comprising the elwhere /3 E W("fn~)("Cnyf*)/2 ements on and above the main diagonal of l2 (e.g. rowwise), a n d t h e n ? ( l + l ) x (n+ny)(n+ny+1)/2matrixY"sobtained as:

Y = p=o Ze [ L p @

APAT

= &o-&?

=

(48)

(b-,&T)Q-'.

This completes the description of our VARMA estimation approach. For a complete description of the algorithm see 141.
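The synthesis step that follows the covariance matching (44) can be sketched as below. The values of Â, Ĉ and Π̂ are hypothetical placeholders (a valid Π̂ ≥ 0 is manufactured as M M^T for illustration): the fragment solves the Lyapunov equation from (9) for P̃ and rebuilds R̂0 and D̂ via (45), (46).

```python
import numpy as np

# Hypothetical estimates: a stable A_hat, C_hat, and a positive semidefinite
# Pi_hat partitioned as in (5); the numbers are for illustration only.
A_hat = np.array([[0.5, 0.2], [0.0, 0.3]])
C_hat = np.array([[1.0, 0.0]])
n, ny = 2, 1
M = np.array([[1.0, 0.2, 0.1],
              [0.2, 0.5, 0.0],
              [0.1, 0.0, 0.8]])
Pi_hat = M @ M.T  # M M^T guarantees Pi_hat >= 0
Pi11, Pi12, Pi22 = Pi_hat[:n, :n], Pi_hat[:n, n:], Pi_hat[n:, n:]

# Lyapunov equation from (9): P_tilde = A P_tilde A^T + Pi11, solved by
# vectorization (stability of A_hat guarantees a positive semidefinite solution).
vecP = np.linalg.solve(np.eye(n * n) - np.kron(A_hat, A_hat), Pi11.ravel())
P_tilde = vecP.reshape(n, n)

# (45), (46): the re-synthesized R0 and D.
R0_hat = C_hat @ P_tilde @ C_hat.T + Pi22
D_hat = A_hat @ P_tilde @ C_hat.T + Pi12

# By construction {R0_hat, C A^{k-1} D_hat} is a valid covariance sequence;
# here we just check that the synthesized R0 is positive definite.
print(np.all(np.linalg.eigvalsh(R0_hat) > 0))  # True
```

With this re-synthesized pair (R̂0, D̂), the Riccati equation (47) is guaranteed to admit the stabilizing solution needed for (48).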

6 Numerical results

We illustrate our procedure for a system whose transfer function H(z) is 2 × 2, i.e., a true multivariable case. The entries of H(z) are:

h11(z) = (z² + 0.5451 z + 0.8406) / (z² − 0.8936 z + 0.9702)
h21(z) = (0.8334 z + 0.02877) / (z² − 0.8936 z + 0.9702)
h12(z) = (0.8854 z + 0.1189) / (z² + 0.76 z + 0.986)
h22(z) = (z² + 0.5053 z + 0.7187) / (z² + 0.76 z + 0.986)          (49)

A two-dimensional pseudo-random white noise sequence was generated and passed through the filter H(z) to obtain 500 data points of a stochastic process y(t). 100 different noise realizations were simulated. The Hankel matrix chosen was square with 8 block rows, and the model order chosen was n = 4. We do not consider the model order selection problem here.

Figures 1 and 2 show the averages of the diagonal elements of the identified spectral densities (solid lines), superposed with the true diagonal elements (dash-dotted), both for the realizations that needed corrections and for those that did not. The dashed curves correspond to the average estimated spectral density plus and minus the statistical standard deviation (all in dB). In Figure 3 we show the maximum singular value of the difference of the spectral densities (identified and true), averaged over all realizations (with and without correction), and normalized with respect to the maximum singular value of the original spectral density. As can be seen, the systems recovered when the correction mechanisms come into action are not worse than the systems identified when no corrective measures are needed. The directional errors are large at the frequencies corresponding to the arguments of the poles of the system.
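The pole locations of (49) explain the observation about large directional errors: a quick check of the two denominator polynomials (shared column-wise by the entries of H(z)) confirms that both complex pole pairs lie inside, but close to, the unit circle, producing sharp spectral peaks at the corresponding frequencies.

```python
import numpy as np

# Poles of H(z) in (49): roots of the two denominator polynomials.
den1 = [1.0, -0.8936, 0.9702]   # shared by h11 and h21
den2 = [1.0, 0.76, 0.986]       # shared by h12 and h22
for den in (den1, den2):
    roots = np.roots(den)
    print(np.abs(roots), np.angle(roots))
# For a monic quadratic with a complex pair, the modulus is the square root of
# the constant term: sqrt(0.9702) ~ 0.985 and sqrt(0.986) ~ 0.993, i.e. stable
# but very close to the unit circle.
```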

The algorithm was implemented in MATLAB and relies on MATLAB's LMI Control Toolbox. The interested reader

!

R8alLelkms lhal reguindcmmclh 51 X

........... ..............................

............. ....

I .......

............ RsalirariDns m mnsdDn needed 49 X

........ 30 . . . . . . . .

........

g25

. ; ......... ,.:

. . . . . . . .1. . . . . . . . . . . . . . . . . . . . . ............. ...... , _ . . . . . . . . . . . . . . . . :, . . . . . . . . . . . . . . . . . . . . . . .

.:.. . . . . . . . .:.

20 . . . . . . . . . . ~ .

0

0.5 1 1.5 2 2.5 btimted PSD (saw). original E D (&&dol). ttd. d s v M i (dashed)

.............

3

0

2.5

3

........

:

estimation: a complete story. Submitted for publication, 1999. [5] J. Sorelius, T. Soderstrom, P. Stoica, and M. Cedervall. Comparative study of rank test methods for ARMA order estimation. In T.Katayama and S.Sugimoto, editors, Statistical methods in control and signal processing, pages 179-216. Marcel Dekker, 1997.

2.5

(daJhed)

3

Figure 2: As for Figure 1but for @ ~ 2 .

/ ; l i

T. Kailath. Linear Systems. Prentice Hall, 1980. J. Mari, P. Stoica, and T. McKelvey. Vector ARMA

...............

...............................

0.5 1 1.5 2 WmaSd PSD (6W.orighal PSD (dashdd). sld. &Viatior,

References [11 S . Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory. SIAM, 1994. [2] P. Faurre, M. Clerget, and E Germain. Ope'rateurs Rationnels Postitiifs. Dunod, 1979.

;

2

. I ;.

can obtain a copy of the programs from the Internet web site http://uuw.math.kth.sefjorgem/doc/Mlab/code.html where further details are also given.

[4]

1.5

. . . . . . . . . ..j . . . . . . . . . . .

Figure 1: Results of the identification procedure when applied to the multivariate model (49),for 100 noise realizations. Here @I 1 is considered. See the text for explanation.

[3]

1

RealiratDnaM)amadion needed 49 X

.............. ............ .......... ............. . . . . . . . . . . . . . . . . . . . . ......................... ..............; .. .............. ............ .... ..: . . . . . . . . . . . ..............:..

0

0.5

emr ( m e d casas)

Avaage relatueo,,

:

02

0

05

1

Average &We

15 a-

2

25

3

enw (casesmmout anrezbon)

02

0

05

1

15

2

25

3

Figure 3: An indication of the overall performance of the identification algorithm is provided by plotting the singular value of the spectral density error matrix between true and identified systems. This is done here for both th8 realizations that required corrections and those that did not require.

3670