Identification of structurally constrained second-order Volterra models

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 44, NO. 11, NOVEMBER 1996


Identification of Structurally Constrained Second-Order Volterra Models

Ronald K. Pearson, Babatunde A. Ogunnaike, and Francis J. Doyle, III

I. INTRODUCTION

THIS paper is concerned with the identification of second-order Volterra models, which are defined by the equation

ŷ(k) = y0 + Σ_{i=1}^{M} a(i) u(k − i) + Σ_{i=1}^{M} Σ_{j=1}^{M} b(i, j) u(k − i) u(k − j).   (1)

Here, ŷ(k) represents the prediction of some observable variable y(k) in response to a stimulus sequence {u(k)}. In particular, we are interested in the application of these models in approximating chemical process dynamics, where u(k) represents a sequence of "manipulated variable" values, and y(k) represents the response of the chemical process to that sequence. Our interest in Volterra models stems both from the fact that the structure of these models may be exploited in developing identification algorithms [5], [6], and the fact that the associated internal model control (IMC) problem for this class of nonlinear dynamic models is tractable [4].

General-order Volterra models in either continuous time [1], [2] or discrete time [11] have been considered much earlier by other authors, but we restrict attention here to the second-order case because it is one of the simplest nonlinear subsets of the Volterra family. Despite this simplicity, significant and interesting results may be obtained, as we demonstrate here. Even when we restrict our attention to second-order Volterra models, the number of parameters required to specify them rapidly becomes unacceptable with increasing dynamic order M. In particular, note that we have one constant parameter y0, M linear parameters {a(i)}, and M² quadratic parameters {b(i, j)} appearing in (1); there is no loss of generality in assuming b(j, i) = b(i, j) for all i, j since the value of ŷ(k) depends only on the sums b(i, j) + b(j, i), and we will adopt this simplification here. Practical applications of model predictive control (MPC) typically involve linear models with M ≈ 30; direct extension to the second-order Volterra case corresponds to adding 465 quadratic parameters for M = 30. This observation motivates our interest in pruned Volterra models, which are defined as Volterra models in which some fraction (typically, "most") of these parameters are constrained to be zero. For example, the class of Hammerstein models [12] represents a subset of the class of diagonally pruned Volterra models, in which all off-diagonal coefficients (specifically, b(i, j) for i ≠ j in the second-order case) are constrained to be zero. These models have been the subject of some recent interest in the chemical process control literature [3] and will be discussed further here in connection with diagonal pruning. In general, both the nature of the pruning considered and the input sequences used will influence the model identification problem, as the results presented here will illustrate.

More specifically, this paper considers the problem of determining the parameter values y0, {a(i)}, and {b(i, j)} in (1) such that ŷ(k) is a minimum variance, unbiased estimator (MVUE) of the observed process response sequence {y(k)}. We are concerned with both the basic form of the resulting parameter estimation equations obtained under different working assumptions and the influence of pruning on these results. In favorable cases, we show that pruning the Volterra model has no influence on the optimal estimates for the unconstrained parameter values, but in the general case, the question appears to be open.

Manuscript received January 24, 1995; revised May 15, 1996. The associate editor coordinating the review of this paper and approving it for publication was Prof. Andreas Spanias. R. K. Pearson and B. A. Ogunnaike are with E. I. du Pont de Nemours and Co., P.O. Box 80101, Wilmington, DE 19880-0101 USA. F. J. Doyle, III is with the School of Chemical Engineering, Purdue University, West Lafayette, IN 47907-1283 USA. Publisher Item Identifier S 1053-587X(96)08248-7.
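To make the parameter-count argument concrete, the sketch below evaluates a prediction of the form (1) and counts the free parameters of a symmetric quadratic kernel. This is an illustration with made-up coefficients, not code from the paper:

```python
import numpy as np

def volterra2_predict(u, y0, a, b):
    """Second-order Volterra prediction of the form (1): for each k >= M,
    y_hat(k) = y0 + sum_i a(i) u(k-i) + sum_{i,j} b(i,j) u(k-i) u(k-j),
    where past[i] below holds u(k - (i+1)) in zero-based indexing."""
    M = len(a)
    y_hat = np.full(len(u), np.nan)
    for k in range(M, len(u)):
        past = u[k - M:k][::-1]          # u(k-1), u(k-2), ..., u(k-M)
        y_hat[k] = y0 + a @ past + past @ b @ past
    return y_hat

M = 30
n_quadratic = M * (M + 1) // 2           # free entries of the symmetric kernel b
print(n_quadratic)                        # 465
```

For M = 30, the symmetric kernel b alone contributes 465 free parameters, which is the growth that motivates pruning.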

Abstract: Recent and continuing advances in industrial control hardware and software have increased interest in the identification of nonlinear dynamic models from chemical process data. Two practically important issues are those of nonlinear model structure selection and input sequence design for adequate model identification. While the "general nonlinear model identification problem" is intractably complex, we may obtain useful insights into both of these issues by restricting our attention to special cases, building incrementally on our "linear intuition." Here, we consider the special case of second-order Volterra models, focusing on the effects of structural restrictions and non-Gaussian input sequences on the model identification problem. The results presented here build on the work of Powers and his co-workers, who considered the unconstrained Gaussian problem, certain constrained special cases (e.g., the Hammerstein model), and identification using non-i.i.d. input sequences. Besides extending these results to a wider class of model structure constraints and input sequences, the results presented here yield some useful insights into the issue of input sequence design.


“’? l C C L TRANSACTIONS ON SIGNAL PROCESSING, VOL. 44, NO 11. NOVEMBER 1996

LX3X

The remainder of this paper is organized as follows. It is shown in Section II that y0 may be chosen to make ŷ(k) an unbiased estimator of y(k) for any given input sequence {u(k)}, extending a result given earlier by Koh and Powers [5] for zero-mean, Gaussian input sequences. Section III discusses conditions on the input sequence {u(k)} under which the linear and quadratic model identification problems decouple. Under these conditions, which are satisfied by zero-mean, Gaussian input sequences, we recover the Yule-Walker equations for the optimal linear parameter estimates, and we obtain a second, more complex set of simultaneous linear equations for the quadratic model parameters. Section IV derives an expression for the model prediction error variance for pruned Volterra models and uses it to show that, for linear approximations to the second-order Volterra model, pruning and optimization commute. That is, optimal linear approximations to a second-order Volterra model are obtained by appropriately pruning the unconstrained Volterra model identified from the input sequence under consideration. Further, this optimal pruned model is unique if the input sequence is persistently exciting. It is not obvious that this result holds for the quadratic model parameters, and this question defines the focus of the rest of the paper. In particular, Section V formulates the general pruned Volterra model identification problem, discussing the distinction between Hammerstein models and diagonally pruned Volterra models. In addition, Section V introduces the class of elliptically distributed input sequences [7], [8], which provides a basis for a general extension of Koh and Powers' Gaussian results. Sections VI and VII establish that results analogous to the linear case also hold for two other cases: diagonal pruning with arbitrary input sequences and arbitrary pruning with i.i.d. input sequences, respectively. Consequently, if pruning and optimization do not commute in the general case, any counterexamples necessarily involve both dependent sequences and Volterra models with off-diagonal terms. Section VIII considers the identification problem for elliptically distributed input sequences because it provides a tractable problem formulation that can incorporate both of these features. Finally, Section IX gives a brief summary of the key results of this paper.

II. UNBIASED MODELS

As noted above, the constant y0 may be chosen to make ŷ(k) an unbiased estimator of y(k) for any given input sequence {u(k)}. This result is established under the assumption that the input sequence {u(k)} and the output sequence {y(k)} are stationary stochastic processes with known means ū and ȳ, respectively, and known joint second- and third-order correlations. In particular, the following correlations are assumed to be finite and known:

R_uu(i − j) = E{u(k − i) u(k − j)},  r_uy(i) = E{u(k − i) y(k)},  t_uy(i, j) = E{u(k − i) u(k − j) y(k)}.   (3)

Taking the expectation in (1) yields

E{ŷ(k)} = y0 + Σ_{i=1}^{M} a(i) E{u(k − i)} + Σ_{i=1}^{M} Σ_{j=1}^{M} b(i, j) E{u(k − i) u(k − j)}
        = y0 + ū Σ_{i=1}^{M} a(i) + Σ_{i=1}^{M} Σ_{j=1}^{M} b(i, j) R_uu(i − j).

To make ŷ(k) an unbiased estimator of y(k), y0 must be chosen so that E{ŷ(k)} = E{y(k)} = ȳ, implying

y0 = ȳ − ū Σ_{i=1}^{M} a(i) − Σ_{i=1}^{M} Σ_{j=1}^{M} b(i, j) R_uu(i − j).   (2)
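The bias-removing constant (2) can be spot-checked numerically from sample moments. The sketch below uses hypothetical coefficients and a simulated "observed" process; it is an illustration, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n = 3, 20_000
a = rng.normal(size=M)
b = rng.normal(size=(M, M)); b = (b + b.T) / 2     # symmetric quadratic kernel
u = 1.0 + rng.normal(size=n)                       # input with nonzero mean

def predict(y0):
    """Evaluate the second-order Volterra prediction (1) for k = M..n-1."""
    yh = np.empty(n - M)
    for k in range(M, n):
        past = u[k - M:k][::-1]                    # u(k-1), ..., u(k-M)
        yh[k - M] = y0 + a @ past + past @ b @ past
    return yh

y = predict(0.5) + rng.normal(size=n - M)          # "observed" process (true y0 = 0.5)

# Equation (2): y0 = y_bar - u_bar * sum_i a(i) - sum_ij b(i,j) E{u(k-i)u(k-j)}
u_bar, y_bar = u.mean(), y.mean()
Ruu = np.array([[np.mean(u[M - i:n - i] * u[M - j:n - j]) for j in range(1, M + 1)]
                for i in range(1, M + 1)])
y0 = y_bar - u_bar * a.sum() - np.sum(b * Ruu)
print(abs(predict(y0).mean() - y_bar))             # small: the predictor is unbiased
```

With y0 chosen this way, the sample mean of the prediction matches the sample mean of the observed response up to end effects of the lag window.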

Remark 2.1: While the problem of estimating correlations like r_uy(i) and t_uy(i, j) has a long history, traditional moment-based methods can be extremely sensitive to outliers, particularly if they are present in both {u(k)} and {y(k)} simultaneously [11]. Robust approaches to this problem are being explored and will be discussed in future publications.

In what follows, it will initially be more convenient to work in terms of the following deviation variables:

v(k) = u(k) − ū,  w(k) = y(k) − ȳ,  z(k) = ŷ(k) − ȳ.   (4)

The corresponding deviation correlations are R_vv(i − j) = E{v(k − i) v(k − j)}, r_wv(i) = E{w(k) v(k − i)}, and t_wv(i, j) = E{w(k) v(k − i) v(k − j)}.

Note that E{v(k)} = E{w(k)} = 0 by definition and that E{z(k)} = 0 because we have chosen y0 to make E{ŷ(k)} = E{y(k)}. Combining (1), (2), and (4) yields the deviation model

z(k) = Σ_{i=1}^{M} a(i) v(k − i) + 2ū Σ_{i=1}^{M} Σ_{j=1}^{M} b(i, j) v(k − i) + Σ_{i=1}^{M} Σ_{j=1}^{M} b(i, j)[v(k − i) v(k − j) − R_vv(i − j)].   (5)

Remark 2.2: Note that in the linear model, b(i, j) = 0, and the output deviations z(k) depend on the input deviations v(k) only; there is no dependence on the mean input level ū. In the nonlinear case, however, the output deviations depend explicitly on ū. In particular, note that we may rewrite (5) as z(k) = L(k) + Q(k), where Q(k) is the last term in (5) (i.e., the "quadratic part" of the model), and L(k) is the "linear part" of the model, which is defined by

L(k) = Σ_{i=1}^{M} α(i) v(k − i)

where

α(i) = a(i) + 2ū Σ_{j=1}^{M} b(i, j).
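The algebra behind (5) can be verified numerically: the right-hand sides of (1) and (5) differ only by a constant, which is exactly the constant absorbed into y0 and ȳ. A sketch with arbitrary hypothetical coefficients:

```python
import numpy as np

rng = np.random.default_rng(6)
M, n = 3, 1000
a = rng.normal(size=M)
b = rng.normal(size=(M, M)); b = (b + b.T) / 2
u = 2.0 + rng.normal(size=n)
u_bar = u.mean()
v = u - u_bar                                      # deviation input, as in (4)

V = np.column_stack([v[M - i:n - i] for i in range(1, M + 1)])   # V[k, i-1] = v(k-i)
U = V + u_bar                                      # lagged u values
Rvv = (V.T @ V) / len(V)                           # sample R_vv at the needed lags

rhs1 = U @ a + np.einsum('ki,ij,kj->k', U, b, U)   # (1) without the constant y0
alpha = a + 2 * u_bar * b.sum(axis=1)              # alpha(i) = a(i) + 2*u_bar*sum_j b(i,j)
z = V @ alpha + np.einsum('ki,ij,kj->k', V, b, V) - np.sum(b * Rvv)   # (5)
print(np.std(rhs1 - z))                            # ~ 0: they differ only by a constant
```

The residual is constant to floating-point precision, confirming that (5) is (1) recentered.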

In contrast, note that the mean output level ȳ does not enter the deviation equation in either the linear or the nonlinear case. This observation emphasizes one difference between linear and nonlinear models: "input offsets" (ū ≠ 0) and "output offsets" (ȳ ≠ 0) have radically different effects. The physical implications of this dependence of z(k) on ū will be discussed in detail in the companion paper [10] describing the application of the model identification results presented here.

III. MINIMUM VARIANCE MODELS

Having chosen y0 to make E{ŷ(k)} = E{y(k)}, we now choose the model coefficients a(i) and b(i, j) to minimize the prediction error variance σ_e², which is defined as

σ_e² = E{[z(k) − w(k)]²} = σ_w² + Σ_{i=1}^{M} a(i) g(i) + Σ_{i=1}^{M} Σ_{j=1}^{M} b(i, j) h(i, j).   (6)

Here, σ_w² is the variance of the zero-mean output fluctuation sequence {w(k)}, and g(i) and h(i, j) are defined as

g(i) = Σ_{j=1}^{M} a(j) R_vv(i − j) − 2 r_wv(i) + Σ_{m=1}^{M} Σ_{n=1}^{M} b(m, n)[4ū R_vv(i − m) + 2 E{v(k − i) v(k − m) v(k − n)}]   (7)

and

h(i, j) = Σ_{m=1}^{M} Σ_{n=1}^{M} b(m, n) H(i, j, m, n) − 4ū r_wv(i) − 2 t_wv(i, j).   (8)

The term H(i, j, m, n) in this second equation is defined by

H(i, j, m, n) = D(i, j, m, n) + 4ū² R_vv(i − m) + 4ū E{v(k − i) v(k − m) v(k − n)}   (9)

where

D(i, j, m, n) = E{v(k − i) v(k − j) v(k − m) v(k − n)} − R_vv(i − j) R_vv(m − n).   (10)

Optimal values for these model parameters are obtained by setting the corresponding derivatives of the prediction error variance to zero. Specifically, define the linear model identification problem to be the problem of estimating the linear coefficients {a(i)} from the available input/output data. Similarly, define the quadratic model identification problem as that of estimating the quadratic model coefficients {b(i, j)} from the available input/output data. For arbitrary input sequences {u(k)}, these problems will generally be coupled, as may be seen by considering the linear problem. Note that the linear model coefficients {a(i)} only enter the second term in the prediction error variance expression (6). In general, the derivative of this term will depend on both {a(i)} and {b(i, j)}, coupling the linear and quadratic model identification problems. Koh and Powers [5] obtained independent linear and quadratic problems when {u(k)} is assumed to be a zero-mean Gaussian sequence. In fact, decoupling of the linear and quadratic problems occurs under the following weaker assumptions:

ū = 0   (11)

E{v(k − i) v(k − m) v(k − n)} = 0 for all i, m, n.   (12)

Note that these conditions are satisfied by any sequence {u(k)} whose joint density is symmetric, including Gaussian input sequences as a special case. Under assumptions (11) and (12), g(i) is independent of b(m, n), and the linear model identification problem reduces to the solution of the familiar Yule-Walker equations:

R_vv a = r_wv   (13)

where R_vv is the N × N autocorrelation matrix whose (i, j) element is R_vv(i − j), r_wv is the N-vector [r_wv(1) ... r_wv(N)]^T, and a is the N-vector of linear model coefficients a(i). This result is exactly the same as (9) of Koh and Powers [5], but it was derived here without assuming the input sequence is Gaussian. Similarly, the fact that g(i) is independent of b(m, n) implies that σ_e² may be minimized with respect to b(m, n) by differentiating only the last term in (6) with respect to b(m, n). Setting this derivative to zero and noting that H(i, j, m, n) simplifies to D(i, j, m, n) under assumptions (11) and (12), we obtain the following set of N(N + 1)/2 simultaneous linear equations for the quadratic model coefficients b(m, n):

Σ_{m=1}^{N} Σ_{n=1}^{N} b(m, n) D(i, j, m, n) = t_wv(i, j)   (14)

for i = 1, 2, ..., N and j = 1, 2, ..., i. While they do not develop the general formalism used here, Koh and Powers effectively obtain an explicit solution to this equation for the Gaussian case using fourth-moment factorization results. This solution will be discussed further in Section VIII. Finally, note that we can obtain an expression for the minimum prediction error variance σ_0² that results when the model coefficients {a(i)} and {b(i, j)} have their optimal values {a0(i)} and {b0(i, j)}, which are defined by (13) and (14), respectively. In particular, substitute (11)-(13) into (7) to obtain

g0(i) = −r_wv(i).   (15)

Similarly, substitute (9), (11), (12), and (14) into (8) to obtain

h0(i, j) = −t_wv(i, j).

Substituting these two results into the prediction error variance expression (6) yields the minimum prediction error variance:

σ_0² = σ_w² − a0^T r_wv − tr{B0 T_wv}   (16)


where B0 is the N × N symmetric matrix of optimal model parameters b0(i, j), and T_wv is the N × N symmetric matrix of cross-bicorrelations t_wv(i, j). Note that this result was also given by Koh and Powers [5], but it was derived here without assuming the sequence {u(k)} is normally distributed.
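The decoupling result can be illustrated numerically: even when the simulated output contains a quadratic Volterra term, the Yule-Walker solution (13), built from sample correlations of a correlated but symmetric (zero-mean Gaussian) input, recovers the linear coefficients. This is a sketch with hypothetical coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 200_000, 3
a_true = np.array([0.8, -0.4, 0.2])
b_true = np.array([[0.5, 0.1, 0.0],
                   [0.1, -0.3, 0.2],
                   [0.0, 0.2, 0.1]])      # symmetric quadratic kernel

# Zero-mean Gaussian input with AR(1) correlation: symmetric, so (11)-(12) hold.
e = rng.normal(size=n)
v = np.empty(n)
v[0] = e[0]
for k in range(1, n):
    v[k] = 0.6 * v[k - 1] + e[k]

V = np.column_stack([v[M - i:n - i] for i in range(1, M + 1)])   # V[k, i-1] = v(k-i)
w = V @ a_true + np.einsum('ki,ij,kj->k', V, b_true, V) + 0.1 * rng.normal(size=len(V))
w = w - w.mean()                          # output deviation variable

# Yule-Walker equations (13): R_vv a = r_wv, from sample correlations.
R_vv = (V.T @ V) / len(V)
r_wv = (V.T @ w) / len(V)
a_hat = np.linalg.solve(R_vv, r_wv)
print(a_hat)        # close to a_true: the quadratic term does not bias the linear fit
```

The third-order moments coupling the linear and quadratic problems vanish for this input, so the linear estimates are unaffected by the quadratic part.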


IV. PRUNED VOLTERRA MODELS

Recall from the introduction that a pruned Volterra model was defined as a Volterra model with a subset of its model parameters constrained to be zero. More specifically, define M as the set of integers M = {1, 2, ..., M}, and take the index set S1 to be any subset of M. Similarly, take the index set S2 to be any symmetric subset of M × M, i.e., if (i, j) ∈ S2, then (j, i) ∈ S2. We will say the second-order Volterra model defined in (1) is pruned with respect to S1 and S2 if the following constraints are satisfied:

a(i) = 0 if i ∈ S1
b(i, j) = 0 if (i, j) ∈ S2.

Note that S1 and S2 need not be proper subsets of M and M × M. In particular, if these subsets are both empty, the corresponding pruned model is the unconstrained second-order Volterra model defined in (1). At the other extreme, if S1 = M, all of the linear terms are constrained to be zero, leaving us with a "completely quadratic" model. Similarly, if S2 = M × M, the corresponding pruned Volterra model exhibits only the constant term y0 and the linear terms defined by the model coefficients {a(i)}.

The following perturbation formulation of the Volterra model identification problem is useful in considering pruned Volterra models. Define the perturbations a1(i) and b1(i, j) of the model parameters a(i) and b(i, j) from their optimal values a0(i) and b0(i, j), which are defined by (13) and (14):

a1(i) = a(i) − a0(i),  b1(i, j) = b(i, j) − b0(i, j).

The prediction error variance associated with these perturbed parameters may be computed exactly. In particular, for a Volterra model that is pruned with respect to S1 and S2, it follows that

i ∈ S1 ⇒ a1(i) = −a0(i),  (i, j) ∈ S2 ⇒ b1(i, j) = −b0(i, j).

As an interesting special case, suppose we allow the unconstrained parameters a(i) and b(i, j), i.e., those with i ∉ S1 and (i, j) ∉ S2, to assume their optimal values a0(i) and b0(i, j); the resulting "constrained prediction error variance" σ_c² is then obtained by setting all of the remaining perturbations to zero. The key practical question is whether σ_c² is a lower bound on the prediction error variance σ_p² for the pruned Volterra model of interest. To address this question, define the following "free perturbation" terms: the vector φ = [φ(1) φ(2) ... φ(M)]^T with components

φ(i) = a1(i) + 2ū Σ_{j=1}^{M} b1(i, j)

and the free parameter matrix Γ with elements

Γ(i, j) = b1(i, j).   (17)

The prediction error variance for the pruned Volterra model is then given by

σ_p² = σ_0² + φ^T R_vv φ + Y_Q   (18)

where Y_Q is defined by

Y_Q = Σ_{i=1}^{M} Σ_{j=1}^{M} Σ_{m=1}^{M} Σ_{n=1}^{M} Γ(i, j) Γ(m, n) D(i, j, m, n).   (19)

We define the pruned Volterra model identification problem as follows. Given an input sequence {u(k)} and the sets S1 and S2, choose the "linear free parameters" {a1(i)} and the "quadratic free parameters" {b1(i, j)} such that σ_p² is minimized. As in the unconstrained Volterra model identification problem, y0 is chosen to make ŷ(k) an unbiased estimator of y(k) for the input sequence {u(k)}. The next three sections are concerned with solutions to this problem, but it is useful to first note the following result for the linear parameters {a1(i)}. Recall that the input sequence {u(k)} is persistently exciting if R_uu is nonsingular [16, p. 363]; further, note that since ū = 0 here, R_vv = R_uu.

Theorem 4.1: Consider the Volterra model (1), pruned with respect to arbitrary index sets S1 and S2, and suppose the parameters {a1(i)} do not depend on the parameters {b1(i, j)}. Then, for any input sequence {u(k)} satisfying the decoupling conditions (11) and (12), we have the following:
1) a1(i) = 0 for all i ∉ S1 minimizes σ_p².
2) If {u(k)} is persistently exciting, these optimal values are unique.

Proof: Note that since R_vv is a covariance matrix, it is nonnegative definite, and therefore, the second term in (18) is nonnegative. Thus, its contribution is minimized by setting φ = 0, corresponding to a1(i) = 0 for all i ∉ S1. Further, if {u(k)} is persistently exciting, R_vv is positive definite, implying that σ_p² is strictly larger than this value for any φ ≠ 0. Thus, the solution given in 1) above is unique. □
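For the simplest case covered by Theorem 4.1, an i.i.d. symmetric input (so that R_vv is diagonal), the commuting property can be seen directly in a least-squares sketch: refitting after pruning leaves the surviving coefficients essentially unchanged. The coefficients and pruning set below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n, M = 100_000, 4
v = rng.choice([-1.0, 1.0], size=n) * rng.uniform(0.5, 1.5, size=n)  # i.i.d., symmetric
V = np.column_stack([v[M - i:n - i] for i in range(1, M + 1)])        # lagged regressors
w = V @ np.array([1.0, -0.5, 0.3, 0.1]) + 0.2 * rng.normal(size=len(V))

a_full, *_ = np.linalg.lstsq(V, w, rcond=None)       # unconstrained linear fit
free = [0, 2]                                        # prune a(2) and a(4): S1 = {2, 4}
a_pruned, *_ = np.linalg.lstsq(V[:, free], w, rcond=None)
print(a_full[free] - a_pruned)    # near zero: pruned fit keeps the unconstrained values
```

Because the regressors are (sample-)uncorrelated, dropping some of them does not shift the remaining least-squares coefficients beyond sampling error.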

Remark 4.2: The independence condition between {a1(i)} and {b1(i, j)} excludes Hammerstein models and other "structured Volterra models." This point is discussed further in the next section.

Remark 4.3: Note that a1(i) = 0 corresponds to taking a(i) to have the unconstrained optimal value a0(i) for any i ∉ S1. In other words, for the linear model parameters, optimization and pruning commute: The optimal pruned linear model coefficients are the same as the unconstrained coefficients. It is not obvious that Y_Q is nonnegative, however; therefore, it is not clear that this result extends to the quadratic model parameters. The next three sections consider special cases for which the quadratic model identification problem can be solved explicitly.

V. IDENTIFICATION OF PRUNED VOLTERRA MODELS

Koh and Powers [5] considered both the unconstrained and the diagonally pruned Volterra model identification problems using Gaussian input sequences with arbitrary correlation structures. For the diagonally pruned problem, S2 = {(i, j) | i ≠ j}, implying that b(i, j) = 0 for i ≠ j. Here, we consider a number of extensions of these problems, which are described in detail in the following paragraphs. In particular, we consider all possible combinations of the three problem characteristics defined in Table I: the type of pruning involved, the distribution of the input sequence, and the correlation structure of the input sequence. In this paper, we consider three different pruning options: pruning with respect to arbitrary sets S1 and S2 (denoted "A"), the diagonal pruning defined above (denoted "D"), and no pruning (denoted "N"). While these options are not exhaustive, they do cover a useful range of possibilities. Similarly, we consider three classes of input sequence distributions: symmetric distributions, denoted "S," elliptical distributions, denoted "E," and Gaussian distributions, denoted "G." The class of symmetric distributions is the most obvious class of distributions satisfying the decoupling condition (12), whereas the class of elliptical distributions is a class of symmetric distributions that generalizes the Gaussian distribution.

TABLE I
THREE PROBLEM CHARACTERISTICS

Pruning:       Arbitrary (A), Diagonal (D), None (N)
Distribution:  Symmetric (S), Elliptical (E), Gaussian (G)
Correlation:   Arbitrary (A), I.I.D. (I)

More specifically, elliptically distributed random variables [7]-[9] are vector-valued random variables x ∈ R^n whose joint densities are of the form

f(x) = |Λ|^{−1/2} ψ{(x − μ)^T Λ^{−1} (x − μ)}.

For this class, μ ∈ R^n represents the mean of the vector x, and Λ is proportional to its covariance when these quantities are defined. To indicate that the random vector x is a member of this class, we will use the notation of [7] and write x ~ EC_n(μ, Λ, ψ). Many different choices of the function ψ{z} are possible (see, for example, the cases discussed in [8]); as a specific example, note that when ψ{z} = exp(−z/2), the variable is normally distributed, i.e., x ~ N(μ, Λ). The advantage of considering the wider class of elliptically distributed random variables is that many of the standard results for Gaussian random variables are, in fact, generic to the entire class of elliptically distributed random variables. For example, if x ~ EC_n(μ, Λ, ψ) and y = Hx, then y ~ EC_m(Hμ, HΛH^T, ψ). Thus, many estimation and control problems that are usually formulated for Gaussian random variables may be formulated for elliptically distributed random variables with little or no change [8].

A. Cases Considered

Table II lists the options considered explicitly in this paper for model structures, input sequence distributions, and input sequence correlation structures. Taken together, these options lead to 18 possible model identification problem formulations, each designated by a sequence of three letters, as defined in Table I. The most general problem considered here is "A,S,A": identification of an arbitrarily pruned second-order Volterra model, excited by an arbitrary symmetrically distributed input sequence with arbitrary correlation structure. Unfortunately, we have not been able to solve this most general problem or its restriction to either elliptically distributed ("A,E,A") or Gaussian ("A,G,A") input sequences. In particular, it is not clear that analytic solutions to these problems are possible since it is not clear that the operations of parameter estimation and model pruning commute. The most general problems we can solve explicitly are the following four: "N,S,A," "D,S,A," "A,S,I," and "N,E,A." Problem "N,S,A" (no pruning, arbitrary symmetrically distributed input sequences, and arbitrary correlation structure) was reduced to a set of simultaneous linear equations in Section III. Section IV then showed that for the linear part of the model identification problem, parameter estimation and model pruning commute for arbitrary symmetric input sequences (Theorem 4.1). This result permits us to focus on the problem of quadratic model identification, except in those cases (discussed in Section VI) in which the linear and quadratic model parameters themselves are not independent.

More specifically, Section VI focuses on the problem of diagonal pruning, where it is established that if the linear and quadratic model parameters are independent, a result similar to Theorem 4.1 for the linear parameters (Theorem 6.1) holds for the quadratic parameters as well. This result holds for arbitrary symmetric input sequences, establishing a solution for the "D,S,A" problem listed in Table II. Section VII establishes a similar result for the "A,S,I" problem, proving (Theorem 7.1) that optimization and pruning commute for this case as well. It follows from these results that if there are problems for which optimization and pruning do not commute, they necessarily involve both dependent input sequences (e.g., correlated sequences) and nondiagonal Volterra models.

TABLE II
EIGHTEEN POSSIBLE PROBLEM FORMULATIONS

Number  Formulation  Comments
1       A,S,A        Most general case, unresolved
4       N,S,A        Reduced to linear system in Sec. III
5       D,S,A        Reduced to N,S,A in Sec. VI
7                    Special case, considered in [5]
8       A,S,I        Solved in Sec. VII
9                    Equivalent to A,G,I
12                   Equivalent to D,G,I
14                   Considered in [6]
15      N,E,A        Solved in Sec. VIII
16      N,G,A        Main result in [5]
17                   Equivalent to N,G,I

Such cases will be fairly complex to analyze, in general, motivating our interest in the unconstrained problem, "N,E,A," considered in Section VIII. There, an unconstrained solution is presented that generalizes the one obtained by Koh and Powers [5] for Gaussian input sequences ("N,G,A").

VI. DIAGONALLY PRUNED MODELS

Recall from the above discussion that a diagonally pruned Volterra model is a Volterra model of the form (1) that is

pruned with respect to any arbitrary linear index set S1 and any set S2 that includes all possible pairs (i, j) for which i ≠ j. For diagonally pruned Volterra models, the free parameter matrix Γ defined by (17) is given by

Γ = diag{b1(i, i)}.

If we now define the quadratic parameter vector β = [b1(1, 1), b1(2, 2), ..., b1(M, M)]^T, it follows from (19) that

Y_Q = β^T R β

where R is the N × N matrix whose (i, j) element is

R_ij = D(i, i, j, j).

An important simplification of this result is possible if we observe that

D(i, i, j, j) = E{v²(k − i) v²(k − j)} − σ⁴ = E{[v²(k − i) − σ²][v²(k − j) − σ²]}

implying that R is the covariance matrix for the sequence {v²(k)}. This observation leads immediately to the following result.

Theorem 6.1: Consider any diagonally pruned Volterra model, and suppose the parameters {a1(i)} do not depend on the parameters {b1(i, j)}. Then, for any input sequence {u(k)} satisfying the decoupling conditions (11) and (12), we have the following:
1) b1(i, i) = 0 for all (i, i) ∉ S2 minimizes σ_p².
2) If the squared input sequence {v²(k)} is persistently exciting, these optimal values are unique.

Proof: As in Theorem 4.1, since R is a covariance matrix, it is nonnegative definite, implying that Y_Q ≥ 0 for all possible quadratic parameter vectors β. Thus, it follows from (18) that the minimum possible value of σ_p² is achieved by taking β = 0, establishing the first result. Similarly, if {v²(k)} is persistently exciting, it follows that R is positive definite, implying that this solution is unique, just as in Theorem 4.1. □

Remark 6.2: The requirement that the squared input sequence be persistently exciting has some interesting consequences. In particular, note that pseudorandom binary sequences (PRBS's) are quite popular in linear model identification [16], and they can easily be made to conform to the decoupling conditions (11) and (12). In this case, these sequences switch at random times between the values ±c for some c > 0; consequently, v²(k) = c² for all k, and R = 0 identically. One of the practical consequences of this result is that PRBS inputs cannot be used to identify Hammerstein models, which is a point we discuss further below.
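The excitation failure in Remark 6.2 is easy to see numerically: squaring a symmetric binary (PRBS-like) sequence yields a constant, so the covariance matrix R of {v²(k)} vanishes, and the sequence also attains the kurtosis lower bound κ = −2 discussed in Section VII. A sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
c = 1.5
v = c * rng.choice([-1.0, 1.0], size=100_000)   # symmetric binary sequence, values +/- c

print(np.var(v**2))                              # 0: v(k)**2 == c**2 for every k
kappa = np.mean(v**4) / np.mean(v**2)**2 - 3     # excess kurtosis
print(kappa)                                     # -2, the binary lower bound
```

With zero variance in {v²(k)}, the diagonal quadratic parameters b(i, i) are unidentifiable from such an input.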
The Hammerstein model is illustrated in Fig. 1 and is a member of the block-oriented family of nonlinear models, which is a class of considerable independent interest in the literature [3], [12], [13]. Recently, the Hammerstein model has attracted some interest in chemical process control [3], in part because it combines two familiar practical concepts: a "nonlinear steady-state gain" and linear dynamics. In particular, this model consists of a static nonlinearity g(·) followed by linear dynamics, which is characterized by the transfer function H(z). If g(·) is analytic, expanding it as a Taylor series yields an infinite-order Volterra series model that is diagonally pruned. Further, if g(u) = g0 + g1 u + g2 u², we obtain a diagonally pruned second-order Volterra model. It is important to note, however, that this model does not satisfy the parameter independence conditions imposed in Theorems 4.1 and 6.1. In particular, if H(z) = Σ_{i=1}^{M} h_i z^{−i}, then a(i) = g1 h_i and b(i, i) = g2 h_i for i = 1, 2, ..., M.
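The expansion just described can be verified directly. The sketch below uses hypothetical g and H coefficients and checks that the Hammerstein cascade and the diagonally pruned second-order Volterra model with a(i) = g1·h_i and b(i, i) = g2·h_i produce identical outputs:

```python
import numpy as np

rng = np.random.default_rng(4)
g0, g1, g2 = 0.5, 1.0, -0.4            # static nonlinearity g(u) = g0 + g1*u + g2*u**2
h = np.array([0.9, 0.5, 0.2])          # FIR dynamics H(z) = sum_i h[i-1] z**(-i)
M, n = len(h), 50
u = rng.normal(size=n)

x = g0 + g1 * u + g2 * u**2            # Hammerstein: static nonlinearity first ...
y_hamm = np.array([h @ x[k - M:k][::-1] for k in range(M, n)])   # ... then dynamics

y0 = g0 * h.sum()                      # constant term of the Volterra form
a = g1 * h                             # a(i) = g1 * h_i
b_diag = g2 * h                        # b(i,i) = g2 * h_i; off-diagonal terms zero
y_volt = np.array([y0 + a @ u[k - M:k][::-1] + b_diag @ u[k - M:k][::-1] ** 2
                   for k in range(M, n)])
print(np.max(np.abs(y_hamm - y_volt))) # 0: the two model forms coincide
```

Note how a(i) and b(i, i) share the common factor h_i, which is exactly the parameter dependence that violates the independence conditions of Theorems 4.1 and 6.1.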



Fig. 1. Hammerstein model structure: a static nonlinearity applied to v(k), followed by linear dynamics.

VII. IDENTIFICATION WITH I.I.D. INPUTS

Here, we assume the input sequence {u(k)} is independent, identically distributed (i.i.d.) and satisfies the decoupling conditions given in (11) and (12). Under these assumptions, the autocorrelation matrix reduces to R_vv = σ²I, where σ² is the variance of the sequence {v(k)}, which is the same as {u(k)} since ū = 0. The expression defined in (10) is given by

D(i, j, m, n) = (κ + 2)σ⁴ if i = j = m = n; σ⁴ if i = m, j = n, m ≠ n, or i = n, j = m, m ≠ n; 0 otherwise   (20)

where the kurtosis κ is defined as [14]

κ = E{v⁴(k)}/σ⁴ − 3.

Under this definition, κ = 0 for a Gaussian distribution, and it may be either positive or negative for non-Gaussian distributions. Defining the skewness γ = E{v³(k)}/σ³, Rohatgi and Székely show that κ + 2 ≥ γ² [15]. Further, they show that this lower bound is achievable by a distribution concentrated on, at most, two discrete values. Thus, a symmetric, binary distributed sequence will achieve the lower bound κ = −2. It follows immediately from the above expressions that the optimal unconstrained linear and quadratic parameters a0(i) and b0(i, j) are given by

a0(i) = r_wv(i)/σ²,  b0(i, j) = t_wv(i, j)/(2σ⁴) for i ≠ j,  b0(i, i) = t_wv(i, i)/[(κ + 2)σ⁴].

In addition to providing an explicit solution for the unconstrained model identification problem, the special case of i.i.d. input sequences also yields the following analog of Theorems 4.1 and 6.1. This result is based on the following prediction error variance expression, which is obtained by substituting (20) into (18) and (19):

σ_p² = σ_0² + σ² Σ_{i=1}^{M} φ²(i) + (κ + 2)σ⁴ Σ_{i=1}^{M} Γ²(i, i) + 2σ⁴ Σ_{i=1}^{M} Σ_{j≠i} Γ²(i, j).   (21)

Theorem 7.1: Consider the Volterra model (1), pruned with respect to arbitrary index sets S1 and S2, and suppose the parameters {a1(i)} do not depend on the parameters {b1(i, j)}. Then, for any i.i.d. input sequence {u(k)} with variance σ² and kurtosis κ, satisfying the decoupling conditions (11) and (12), we have the following:
1) a1(i) = 0 for all i ∉ S1 and b1(i, j) = 0 for all (i, j) ∉ S2 minimize σ_p².
2) If σ² > 0 and κ > −2, these optimal values are unique.

Proof: Since all of the model parameter terms enter (21) as squares, i.e., φ²(i), Γ²(i, i), Γ²(i, j), it follows immediately that σ_p² ≥ σ_0². Further, this lower bound is achievable by setting φ(i) = 0, Γ(i, i) = 0, and Γ(i, j) = 0, from which the first result follows. The conditions σ² > 0 and κ > −2 make the coefficients associated with all three sums in (21) strictly positive, implying that any other choice of model parameters will result in a strictly greater value for σ_p². □

Remark 7.2: Since the terms multiplying the sums in (21) are nonnegative, we may view them as penalties associated with the deviations {φ(i)}, {Γ(i, i)}, and {Γ(i, j) | i ≠ j} of the linear, diagonal, and off-diagonal model parameters from their optimal values. For example, note that when σ² is small, the linear deviations {φ(i)} dominate the prediction error variance σ_p². Consequently, the accuracy of linear model parameter estimates should improve as the variance of the input sequence decreases. Similarly, since the quadratic model parameter deviations {Γ(i, j)} are weighted by σ⁴, increasing the input variance will emphasize the contribution of these deviations to the total prediction error variance. Here, we may distinguish between diagonal terms and off-diagonal terms since they enter (21) with different coefficients. In particular, note that if κ = −2, the prediction error variance is completely insensitive to arbitrary deviations Γ(i, i) from the optimal diagonal parameter values. This observation corresponds to the fact that (14) has no solution for b(m, m) when κ = −2. As noted above, κ = −2 is achievable by a symmetric, binary distributed input sequence. Conversely, note that large-variance, high-kurtosis input sequences will emphasize the contributions of diagonal model errors to the prediction error variance. Thus, such input sequences would be preferred for identifying diagonal model parameters. These observations form the basis for a three-step model identification procedure that will be discussed in detail later [10].

-t -

ai.

ai.

+

VIII. IDENTIFICATION WITH ELLIPTICALLY DISTRIBUTED INPUTS In addition to providing an explicit solution for the unconstrained model identification problem, the special case of i i d . input sequences also yields the following analog of Theorems 4.1 and 6.1. This result is based on the following prediction error variance expression, which is obtained by substituting (20) into (18) and (19): M

M

i=l j # z
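The κ = −2 lower bound, and its practical consequence for binary (PRBS-like) inputs, can be checked numerically. The following sketch (NumPy; the helper name excess_kurtosis is ours, not from the paper) estimates the sample excess kurtosis of a symmetric binary sequence and of a Gaussian sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def excess_kurtosis(u):
    """Sample excess kurtosis, kappa = E{u^4}/sigma^4 - 3."""
    u = u - u.mean()
    s2 = np.mean(u**2)
    return np.mean(u**4) / s2**2 - 3.0

binary = rng.choice([-1.0, 1.0], size=n)   # symmetric binary sequence
gauss = rng.standard_normal(n)             # Gaussian reference

k_bin = excess_kurtosis(binary)            # close to the lower bound -2
k_gau = excess_kurtosis(gauss)             # close to 0

# For a +/-1 sequence, u(k-i)^2 is identically 1, so the diagonal quadratic
# regressors are constant and b(i,i) cannot be separated from the constant
# term y0, consistent with (21) losing its r^2(i,i) penalty at kappa = -2.
assert abs(k_bin + 2.0) < 1e-2
assert abs(k_gau) < 0.1
```

This illustrates why binary inputs, although popular for linear identification, leave the diagonal quadratic parameters unidentifiable.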

VIII. IDENTIFICATION WITH ELLIPTICALLY DISTRIBUTED INPUTS

Koh and Powers [5] solved the second-order Volterra model identification problem for the case of arbitrarily correlated, zero-mean Gaussian input sequences {u(k)}. Note that these sequences satisfy the separation conditions given here; therefore, the linear model parameters are given by (13), whereas the quadratic parameter matrix is given by

B = (1/2) R_uu⁻¹ T_u R_uu⁻¹

where T_u denotes the matrix appearing on the right-hand side of (14). This result was obtained by exploiting the fact that the fourth moments defining D(i, j, m, n) in (14) may be computed explicitly from the covariance matrices for a Gaussian input
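The Gaussian fourth-moment property underlying the Koh and Powers solution is the Isserlis (Wick) factorization, which is easy to verify by simulation. In the sketch below (NumPy; the covariance matrix R and the helper names are illustrative choices of ours), sample fourth moments of a correlated Gaussian vector are compared against the factorized form:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 3, 400_000
A = rng.standard_normal((M, M))
R = A @ A.T + M * np.eye(M)                 # an arbitrary valid covariance matrix
u = rng.multivariate_normal(np.zeros(M), R, size=N)

def m4(i, j, m, n):
    """Sample fourth moment E{u_i u_j u_m u_n}."""
    return np.mean(u[:, i] * u[:, j] * u[:, m] * u[:, n])

def wick(i, j, m, n):
    """Isserlis/Wick factorization: R_ij R_mn + R_im R_jn + R_in R_jm."""
    return R[i, j] * R[m, n] + R[i, m] * R[j, n] + R[i, n] * R[j, m]

# Check two representative index combinations with strictly positive targets
for idx in [(0, 0, 0, 0), (0, 0, 1, 1)]:
    assert abs(m4(*idx) - wick(*idx)) < 0.05 * wick(*idx)
```

For zero-mean Gaussian sequences, every fourth moment in (14) therefore reduces to products of covariances, which is exactly what makes the closed-form solution possible.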


sequence {u(k)}. Here, we extend this result to the case of elliptically distributed input sequences. If {u(k)} is an infinite sequence such that every finite subsequence is elliptically distributed, it is called a nondegenerate elliptical process. Jensen and Foutz [9] showed that all nondegenerate elliptical processes may be represented in the form

u(k) = a x(k)   (22)

where {x(k)} is a Gaussian sequence with mean proportional to μ(k) and covariances proportional to Λ(j, k). Here, a is a positive random variable, independent of {x(k)}, whose distribution determines the ultimate distribution of {u(k)}. In particular, note that if the underlying Gaussian sequence is zero-mean, then the sequence {u(k)} will satisfy the separation conditions (11) and (12). Further, the kurtosis may be obtained directly by taking the fourth moment of (22); the resulting expression involves σ_a², the variance of the random variable a. In addition, note that since all quantities in this expression are nonnegative, nondegenerate elliptically distributed sequences must have nonnegative kurtosis.

One of the key properties of elliptically distributed random variables is that they all satisfy the fourth-moment factorization result

E{u(k−i)u(k−j)u(k−m)u(k−n)} = (1 + κ/3)[R(i, j)R(m, n) + R(i, m)R(j, n) + R(i, n)R(j, m)].   (23)

This result is important because it allows us to solve (14) as Koh and Powers did but without assuming that the input sequence is Gaussian. Thus, we can explore the effects of both non-Gaussian distributions (κ ≠ 0) and dependent (i.e., "non-i.i.d.") correlation structures (Λ ≠ σ²I). Specifically, D(i, j, m, n) is given by

D(i, j, m, n) = (κ/3)R(i, j)R(m, n) + (1 + κ/3)[R(i, m)R(j, n) + R(i, n)R(j, m)].

It is important to distinguish here between the notions of i.i.d. sequences and uncorrelated sequences. First, note that if the input sequence {u(k)} is i.i.d., D(i, j, m, n) is given by (20). In contrast, if {u(k)} is uncorrelated, it follows only that R_uu = σ²I. Substituting this result into (23) yields the following result for uncorrelated, elliptically distributed input sequences:

D(i, j, m, n) = (κ + 2)σ⁴,      i = j = m = n
             = (κ/3)σ⁴,         i = j, m = n, i ≠ m
             = (1 + κ/3)σ⁴,     i = m, j = n, m ≠ n
             = (1 + κ/3)σ⁴,     i = n, j = m, m ≠ n
             = 0,               otherwise

Note that this expression and (20) are not the same unless κ = 0, corresponding to the Gaussian special case. Thus, an uncorrelated, elliptically distributed sequence is i.i.d. if and only if it is Gaussian [9]. For this reason, problems "A,E,I," "H,E,I," and "N,E,I" appearing in Table I are equivalent to the Gaussian problems "A,G,I," "H,G,I," and "N,G,I," respectively.

Substituting (23) into (14) yields an equation whose left-hand side consists of three double sums. Noting that b(i, j) = b(j, i) and exchanging summation indices, the second of these sums may be shown to be identical to the first. Expressing these sums in matrix notation, we have

2(κ/3 + 1) R_uu B R_uu + (κ/3) tr{B R_uu} R_uu = T_u.

To solve this equation, multiply on the right by R_uu⁻¹ and define the matrices X = R_uu B and Y = T_u R_uu⁻¹. Noting that tr{B R_uu} = tr{R_uu B} = tr{X}, the result is

2(κ/3 + 1)X + [(κ/3) tr{X}] I = Y.   (24)

The solution of this equation is obtained by first taking its trace, noting that tr{X} is a scalar and that tr{I} = M. We may then solve for tr{X} as

tr{X} = 3 tr{Y} / [(M + 2)κ + 6].

Substituting this result back into (24) then yields

X = [3 / (2(κ + 3))] { Y − [κ tr{Y} / ((M + 2)κ + 6)] I }.

In terms of the original variables, this solution yields

B = [3 / (2(κ + 3))] R_uu⁻¹ [T_u R_uu⁻¹ − λ I]   (25)

where

λ = κ tr{T_u R_uu⁻¹} / [(M + 2)κ + 6].   (26)

Note that this result reduces to that of Koh and Powers [5] in the Gaussian limit κ = 0.
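The closed-form solution above can be checked by direct substitution. The sketch below (NumPy; the values of M and κ and the function name solve24 are illustrative choices of ours) builds a symmetric right-hand side Y, forms X from the trace formula, and verifies that it satisfies (24) exactly, including the Gaussian limit:

```python
import numpy as np

def solve24(Y, kappa):
    """Solve 2(kappa/3 + 1) X + (kappa/3) tr{X} I = Y for X (M x M)."""
    M = Y.shape[0]
    trY = np.trace(Y)
    return 3.0 / (2.0 * (kappa + 3.0)) * (
        Y - (kappa * trY / ((M + 2) * kappa + 6.0)) * np.eye(M))

rng = np.random.default_rng(2)
M, kappa = 5, 1.7                    # illustrative; kappa >= 0 for elliptical inputs
S = rng.standard_normal((M, M))
Y = S + S.T                          # symmetric right-hand side

X = solve24(Y, kappa)
residual = 2 * (kappa / 3 + 1) * X + (kappa / 3) * np.trace(X) * np.eye(M) - Y
assert np.allclose(residual, 0.0)    # X satisfies (24)
assert np.isclose(np.trace(X), 3 * np.trace(Y) / ((M + 2) * kappa + 6))
assert np.allclose(solve24(Y, 0.0), Y / 2)   # Gaussian limit: Koh-Powers
```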


Relative to this result, the effect of non-Gaussian elliptically distributed input sequences is twofold. First, all elements of B are reduced uniformly by the factor 3/(κ + 3). The other effect is to suppress or enhance the elements of B in a way that depends on the correlation structure of the input sequence. In general, this dependence is complicated, but there is one very special case that is somewhat insightful. Specifically, consider the uncorrelated case R_uu = σ²I. Recall from the above discussion that this case is not the same as the i.i.d. case for κ ≠ 0; therefore, the results presented in Section VII do not apply. Instead, substituting this expression for the correlation matrix into (25) and (26) yields

B = [3/(κ + 3)] { B_G − [κ tr{B_G} / ((M + 2)κ + 6)] I }

where B_G = T_u/(2σ⁴) denotes the Gaussian estimate. Thus, in the uncorrelated case, the effect of the non-Gaussian input sequences is to suppress the diagonal elements of the Gaussian estimate B_G by an amount that depends on the kurtosis. In the "high-kurtosis limit," i.e., κ ≫ 6/(M + 2), the magnitude of this suppression is

κ tr{B_G} / [(M + 2)κ + 6] ≈ tr{B_G} / (M + 2).
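The distinction drawn in Section VIII between uncorrelated elliptical inputs and i.i.d. inputs can be illustrated by simulating the Jensen and Foutz representation (22). In the sketch below (NumPy), the mixing distribution for a is an arbitrary illustrative choice of ours, not from the paper; the simulation confirms that the resulting kurtosis is positive and that D(i, i, j, j) for i ≠ j matches the (κ/3)σ⁴ entry in the uncorrelated-elliptical expression, rather than the zero an i.i.d. sequence would give:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 500_000, 4

# Elliptical realizations per (22): u = a * x, with x zero-mean Gaussian and
# a > 0 independent of x. Here a^2 is uniform on [0.5, 1.5] (illustrative).
a = np.sqrt(rng.uniform(0.5, 1.5, size=(N, 1)))
x = rng.standard_normal((N, M))
u = a * x                                   # uncorrelated but not i.i.d.

s2 = np.mean(u**2)                          # variance sigma^2
kappa = np.mean(u[:, 0]**4) / s2**2 - 3.0   # excess kurtosis (positive here)
assert kappa > 0.1                          # nondegenerate elliptical: kappa >= 0

# D(i,i,j,j) = E{u_i^2 u_j^2} - sigma^4 equals (kappa/3) sigma^4 for i != j,
# whereas an i.i.d. sequence would give exactly zero.
d_iijj = np.mean(u[:, 0]**2 * u[:, 1]**2) - s2**2
assert abs(d_iijj - (kappa / 3.0) * s2**2) < 0.03
```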

This suppression is approximately equal to the average diagonal element of B_G.

IX. SUMMARY

The results presented here give solutions to several specific formulations of the pruned second-order Volterra model identification problem. The motivation for this investigation is the practical necessity of reducing the number of parameters required to use second-order Volterra models in empirical chemical process modeling. In addition, motivated by the practical constraints imposed on input sequences in industrial process model identification, we have also explored the influence of non-Gaussian input distributions and non-i.i.d. dependence structures. More specifically, we showed in Section III that the linear and quadratic model identification problems decouple under very nonrestrictive assumptions, reducing the linear problem to the standard Yule-Walker equations and deriving a corresponding set of simultaneous linear equations for the quadratic model parameters. It is not clear that the operations of parameter estimation and model pruning commute in general, but we showed in Sections VI and VII that these operations do commute for the special cases of diagonally pruned models and i.i.d. input sequences. In addition, we presented a solution to the unconstrained model identification problem for the case of elliptically distributed input sequences, providing results for one case in which the input sequence is neither Gaussian nor i.i.d.

The prediction error variance expression (21) given in Section VII provides a basis for input sequence design. Specifically, examination of this expression shows how the variance and kurtosis of an i.i.d. input sequence can be chosen to selectively emphasize the effects of linear model coefficients, quadratic diagonal model coefficients, and quadratic off-diagonal model coefficients. These observations have been used to develop a three-pass model identification algorithm, which will be described in detail in a forthcoming paper and demonstrated on a process-oriented example. Note also that these results demonstrate that binary input sequences (PRBS's), which are popular in linear model identification, are not suitable for use in identifying second-order Volterra models. Specifically, it was demonstrated in Section VI that the quadratic diagonal model coefficients cannot be identified from PRBS response data. Finally, note that (14) in Section III provides the basis for investigating the use of a wide variety of other input correlation structures. For example, correlated input sequences generated from non-Gaussian MA(q) processes would reduce the coupling in this system and may lead to efficient identification algorithms for these input sequences. In addition, we are exploring other examples for which the modified fourth moments D(i, j, m, n) may be computed analytically from distributional assumptions.

ACKNOWLEDGMENT

The authors wish to acknowledge the helpful comments and suggestions of two anonymous reviewers.


REFERENCES

[1] J. H. Seinfeld and L. Lapidus, Mathematical Methods in Chemical Engineering, vol. 3, Process Modeling, Estimation, and Identification. Englewood Cliffs, NJ: Prentice-Hall, 1974.
[2] N. S. Rajbman and V. M. Chadeev, Identification of Industrial Processes. New York: North-Holland, 1980.
[3] E. Eskinat, S. H. Johnson, and W. L. Luyben, "Use of Hammerstein models in identification of nonlinear systems," AIChE J., vol. 37, pp. 255-268, 1991.
[4] F. J. Doyle, III, B. A. Ogunnaike, and R. K. Pearson, "Nonlinear model-based control using second order Volterra models," Automatica, vol. 31, pp. 697-714, 1995.
[5] T. Koh and E. J. Powers, "Second-order Volterra filtering and its application to nonlinear system identification," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 1445-1455, 1985.
[6] Y. S. Cho and E. J. Powers, "Estimation of quadratically nonlinear systems with an i.i.d. input," in Proc. 1991 Int. Conf. Acoust., Speech, Signal Processing, Toronto, May 1991.
[7] S. Cambanis, S. Huang, and G. Simons, "On the theory of elliptically contoured distributions," J. Multivariate Anal., vol. 11, pp. 368-385, 1981.
[8] K. Chu, "Estimation and decision for linear systems with elliptical random processes," IEEE Trans. Autom. Contr., vol. 18, pp. 499-505, 1973.
[9] D. R. Jensen and R. V. Foutz, "The structure and analysis of spherical time-dependent processes," SIAM J. Appl. Math., vol. 49, pp. 1834-1844, 1989.
[10] F. J. Doyle, III, R. K. Pearson, and B. A. Ogunnaike, "Nonlinear identification of chemical processes using second-order Volterra models," in preparation.
[11] R. K. Pearson, "Robust estimation of cross-bicorrelations," in Proc. IEEE Workshop Nonlinear Signal Image Processing, Neos Marmaras, Greece, June 1995.
[12] W. Greblicki and M. Pawlak, "Nonparametric identification of Hammerstein systems," IEEE Trans. Inform. Theory, vol. 35, pp. 409-418, 1989.
[13] W. Greblicki and M. Pawlak, "Nonparametric identification of a cascade nonlinear time series system," Signal Processing, vol. 22, pp. 61-75, 1991.
[14] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis. Reading, MA: Addison-Wesley, 1973.
[15] V. K. Rohatgi and G. J. Szekely, "Sharp inequalities between skewness and kurtosis," Statist. Probab. Lett., vol. 8, pp. 297-299, 1989.
[16] L. Ljung, System Identification: Theory for the User. Englewood Cliffs, NJ: Prentice-Hall, 1987.


Ronald K. Pearson was born in Kansas City, KS, in 1952. He received the B.S. degree in physics in 1972 from the University of Arkansas, Monticello, the M.S.E.E. degree from the Massachusetts Institute of Technology (MIT), Cambridge, in 1974, and the Ph.D. degree in control theory from the MIT Department of Electrical Engineering and Computer Science in 1981. Since then, he has been with E. I. du Pont de Nemours and Company, Wilmington, DE, where he is now a Research Associate. His primary research interests are in the areas of nonlinear system identification, analysis of "nonideal data" (e.g., asymmetric, multimodal, outlier-contaminated, etc.), and the empirical modeling of chemical processes for improved process understanding and control.


Babatunde A. Ogunnaike was born in Nigeria in 1956. He received the B.Sc. degree (with First Class Honors) in chemical engineering from the University of Lagos, Nigeria, in 1976, the M.S. degree in statistics from the University of Wisconsin, Madison, in 1981, and the Ph.D. degree in chemical engineering from the University of Wisconsin, Madison, in 1981. From 1981 to 1982, he was a Research Engineer with the Process Control group of the Shell Development Corporation in Houston, TX. From 1982 to 1988, he was a professor at the University of Lagos with joint appointments in the Chemical Engineering and Statistics Departments. He joined the Advanced Control and Modeling Group of DuPont Central Science and Engineering in 1989, where he is currently a Research Fellow. He is also currently an Adjunct Professor in the Chemical Engineering Department of the University of Delaware. His research interests include modeling and control of polymer reactors, identification and control of nonlinear systems, applied statistics, and reverse engineering of biological control systems for process applications.

Francis J. Doyle, III was born in Philadelphia, PA, in 1963. He received the B.S.E. degree from Princeton University, Princeton, NJ, in 1985, the C.P.G.S. degree from Cambridge University, Cambridge, UK, in 1986, and the Ph.D. degree from the California Institute of Technology, Pasadena, in 1991, all in chemical engineering. From 1991 to 1992, he was a visiting scientist in the strategic process technology group at the DuPont Company, Wilmington, DE. Since 1992, he has been an assistant professor with the School of Chemical Engineering at Purdue University, West Lafayette, IN. His research interests include nonlinear dynamics and control with applications in process control, nonlinear model reduction, and the reverse engineering of biological control systems. Dr. Doyle received the National Young Investigator Award from the National Science Foundation in 1992 and an Office of Naval Research Young Investigator Award in 1996.
