Automatica, Vol. 23, No. 6, pp. 707-718, 1987. Printed in Great Britain.
0005-1098/87 $3.00 + 0.00. Pergamon Journals Ltd. © 1987 International Federation of Automatic Control.

Direct and Indirect Least Squares Methods in Continuous-time Parameter Estimation*

S. VAJDA,† P. VALKÓ‡ and K. R. GODFREY§

The extension of the discrete-time least squares approach to the estimation of parameters in continuous nonlinear models is considered and a three-stage procedure is formulated, combining the robustness and numerical efficiency of direct integral least squares with the asymptotic unbiasedness of the indirect least squares approach.

Key Words--Continuous systems; identification; least squares estimation; nonlinear systems; parameter estimation.

* Received 4 June 1986; revised 16 February 1987; revised 20 April 1987. The original version of this paper was presented at the 7th IFAC/IFORS Symposium on Identification and System Parameter Estimation which was held in York, England during July 1985. The Published Proceedings of this IFAC Meeting may be ordered from: Pergamon Books Limited, Headington Hill Hall, Oxford OX3 0BW, U.K. This paper was recommended for publication in revised form by Associate Editor G. C. Goodwin under the direction of Editor P. C. Parks.
† Department of Engineering, University of Warwick, Coventry CV4 7AL, U.K. On leave from L. Eötvös University, Budapest, Hungary.
‡ Laboratory for Chemical Cybernetics, L. Eötvös University, Muzeum Krt 6-8, H-1088 Budapest, Hungary.
§ Department of Engineering, University of Warwick, Coventry CV4 7AL, U.K.

Abstract--The discrete-time least squares approach is extended to the estimation of parameters in continuous nonlinear models. The resulting direct integral least squares (DILS) method is both simple and numerically efficient and it usually improves the mean-squared error of the estimates compared with the conventional indirect least squares (ILS) method. The biasedness of the DILS estimates may become serious if the sample points are widely spaced in time and/or the signal-to-noise ratio is low, and so a continuous-time symmetric bootstrap (SB) estimator which removes this problem is described. The DILS, SB and ILS methods form a three-stage procedure combining the robustness and numerical efficiency of direct methods with the asymptotic unbiasedness of ILS procedures.

1. INTRODUCTION

THE EMPHASIS in much of the recent control engineering literature on system identification and parameter estimation has been on discrete-time models (see, for example, Goodwin and Payne, 1977; Sorenson, 1980; Söderström and Stoica, 1983). In many applications, however, it is required to estimate the parameters in continuous-time models (Young, 1981). An example is the "diagnostic" identification of theoretically-based continuous-time models in chemistry and biomedicine, where the primary goals are the validation of an assumed model by fitting it to observations and the determination of quantities which have physical meaning (Åström and Eykhoff, 1971). Another example occurs in clinical pharmacokinetics, where the actual parameter values are not particularly important but where the primary goal is to maintain desired plasma drug levels by a rational therapy. In many such applications, limitations on the measurements often necessitate identification and parameter estimation from very small samples with quite large inter-sample intervals (Mori and DiStefano, 1979), so that a discrete-time model, predicting behaviour only at the sample times, is not very useful for control design.

In contrast to the situation in the control engineering literature, the continuous-time estimation problem has received a good deal of attention in the statistical, chemical and biomedical literature (see, for example, Himmelblau, 1970; Bard, 1974; Endrenyi, 1981; Cobelli, 1985) and there has been comparatively little cross-fertilization of ideas with control engineering.

The goal of the present paper is to extend some methods of discrete-time parameter estimation to the classical continuous-time, nonlinear estimation problem, which is stated in Section 2.1. The conventional indirect least squares (ILS) solution to this problem is outlined in Section 2.2, and a direct integral least squares (DILS) method is introduced in Section 2.3. The two approaches are compared in Section 2.4, where it is shown that there is a trade-off between asymptotic biasedness, always present in DILS estimates, and the increased variance and mean-squared error of the ILS estimator.


In Section 3, we use the instrumental variable principle and select a bootstrap estimator for reducing biasedness to an extent commensurate with keeping mean-squared errors relatively small. These estimators are considered as stages of a multi-stage estimation procedure in Section 4. Some ground rules for selecting the best estimator are given in Section 5 and results of simulation studies comparing the methods are presented in Section 6. Throughout, because we are dealing with small samples, we restrict consideration to batch methods.

2. LEAST SQUARES METHODS

2.1. Model structure
Consider estimation of the parameters p \in R^q in the ordinary differential equations

    \dot{x}(t, p) = f(x(t, p), u(t), t, p);  x(t_0, p) = x_0(p).  (1)

Let the measurements of x(t, p) \in R^n be made at t = t_0, t_1, ..., t_m and assume that the observations can be described by

    y_i = x(t_i, p) + v_i,  i = 0, 1, ..., m,  (2)

where y_i \in R^n and v_i \in R^n. Let v = [v_0^T, v_1^T, ..., v_m^T]^T denote the noise vector in the entire sample, with the properties E(v) = 0 and E(vv^T) = R, where R is the positive definite covariance matrix, known at least up to a scalar multiplier. We also assume that the disturbances v and the variables x are independent.

In many continuous-time applications, parameter estimation is based on special identification experiments, rather than normal operating records, and the deterministic input functions are usually of very simple form, typically consisting of impulsive and step functions. Zero-input experiments, describing either the response to a non-zero initial condition, or the response, for t > 0, to an impulsive perturbation, are particularly important and they will be considered in our examples because much of the published data are from such experiments.

Observations of the form (2) assume that all state variables are measurable. While this may appear a restrictive condition, it is satisfied in many engineering applications, where models rarely involve unmeasurable quantities. In a number of other cases this situation is attained by applying a quasi-steady-state approximation to the "fast" and usually unmeasurable variables of the model. Such simplifying assumptions are particularly important in chemical and enzyme reaction kinetics; see, e.g., Gelinas (1972) and Klonowski (1983). The assumption that the covariance matrix of the disturbances is known is not essential and may be eliminated by applying the multiresponse estimation criterion of Bates and Watts (1985). In this paper, we deal only with the least squares method and its modifications; this is to keep the similarity with the simplest discrete-time identification techniques.

2.2. Indirect least squares method (ILS)
This method, which involves the integration of differential equations for use in parameter estimation algorithms, is well documented in the statistics literature, but rarely mentioned in the control engineering literature. Let x(t, p) = F(t, x_0, u, p) denote the solution of (1) with initial conditions x_0(p) and input u, and define

    Y = [y_1^T, y_2^T, ..., y_m^T]^T  and  \phi(p) = [F(t_1, x_0, u, p)^T, F(t_2, x_0, u, p)^T, ..., F(t_m, x_0, u, p)^T]^T.  (3)

The objective function to be minimized is

    Q_{ILS}(p) = (Y - \phi(p))^T W (Y - \phi(p))  (4)

where the weighting matrix W satisfies R = \sigma^2 W^{-1}, where \sigma^2 is an unknown scalar and R is the covariance matrix of v. The estimation is inherently nonlinear, even for linear systems, and generally requires numerical integration of the differential equations in each iteration step. Bard (1970) and Nazareth (1980) have shown that one of the most efficient least squares minimization algorithms, in terms of the total number of function evaluations, is the classical Marquardt procedure, using the Jacobian matrix \partial\phi(p)/\partial p. Evaluation of this matrix, either by solving the sensitivity equations or by finite difference approximations, requires integration of (q + 1) × n differential equations in each iteration step. The computational effort becomes prohibitive if the system described by (1) is very stiff, as is the case in many chemical and biological applications.
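To make these steps concrete, here is a minimal Python sketch of the ILS approach (our illustration, not code from the paper): a hypothetical two-state compartmental model is integrated numerically for every trial parameter vector and (4) is minimized by a Levenberg-Marquardt routine. The model, the noise level, the sample times and the choice W = I are all assumptions made for brevity.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Hypothetical test model (not from the paper): a two-state linear
# compartmental system, x1' = -p1*x1, x2' = p1*x1 - p2*x2, observed
# at coarsely spaced sample times in a zero-input experiment.
def f(t, x, p):
    return [-p[0] * x[0], p[0] * x[0] - p[1] * x[1]]

def F(p, x_init, t_samp):
    """Solution x(t_i, p) of (1), integrated numerically on every call."""
    sol = solve_ivp(f, (t_samp[0], t_samp[-1]), x_init, args=(p,),
                    t_eval=t_samp, rtol=1e-8, atol=1e-10)
    return sol.y.T                          # shape (m+1, n)

def ils_residuals(p, x_init, t_samp, Y):
    # Residual vector Y - phi(p); its sum of squares is Q_ILS of (4)
    # with the simplifying choice W = I.
    return (Y - F(p, x_init, t_samp)).ravel()

# Simulated observations (noise level and spacing chosen arbitrarily).
rng = np.random.default_rng(0)
p_true = np.array([0.7, 0.3])
x_init = np.array([1.0, 0.0])               # x0 taken as known here
t_samp = np.linspace(0.0, 8.0, 9)           # m+1 = 9 coarse samples
Y = F(p_true, x_init, t_samp) + 0.02 * rng.standard_normal((9, 2))

# Levenberg-Marquardt minimization of (4), as in the Marquardt
# procedure discussed in the text.
fit = least_squares(ils_residuals, [1.0, 1.0], method='lm',
                    args=(x_init, t_samp, Y))
print("ILS estimates:", fit.x)
```

Note that every residual evaluation re-integrates the differential equations, which is exactly the cost the direct method of the next section avoids.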

2.3. Direct integral least squares method (DILS)
The estimation of parameters directly from the differential equations (1), instead of their solutions, results in improved numerical efficiency and is a classical approach in chemical reaction kinetics. In contrast to ILS, this approach is very rarely mentioned in the statistics literature. The basic problem of the approach is the estimation of the derivatives from noisy discrete observations. As shown by Young (1970) and Young and Jakeman (1980), this problem can be solved through the use of a state variable filter with frequency bandwidth that approximately encompasses the frequency band of interest in the identification. The filter also damps the effects of initial conditions on the process variables. While state variable filters have also been used in the case of relatively coarse sampling intervals, such applications require great care (Young and Jakeman, 1980). In the applications considered here the sampling rate is usually low and the data are collected in zero-input experiments. In addition, we consider mostly nonlinear systems. While the use of state variable filters under real conditions needs more research, good results were obtained by avoiding the need for numerical differentiation through the use of the integral equations

    x(t_i, p) = x_0(p) + \int_{t_0}^{t_i} f(x(\tau, p), u(\tau), \tau, p) d\tau.  (5)

The idea of direct integral methods is to approximate the integrands in (5) by functions interpolating the points f(y_i, u(t_i), t_i, p), i = 0, 1, ..., m, and then to predict the response x(t, p) by evaluating the integrals. The simplest choice is a piecewise linear interpolation, resulting in the trapezium rule of numerical integration. In many applications, spline interpolation gives better results, as shown by Yermakova et al. (1982) and Sinha and Qijie (1985). Let S_f^p denote the n-vector of natural cubic splines interpolating the values f(y_i, u(t_i), t_i, p), i = 0, 1, ..., m. Then the predicted output is given by

    \tilde{F}(t_i, x_0, u, p) = x_0(p) + \int_{t_0}^{t_i} S_f^p(\tau) d\tau.  (6)

To minimize the deviations between observed and predicted outputs, we use the criterion

    Q_{DILS}(p) = (Y - \tilde{\phi}(p))^T W (Y - \tilde{\phi}(p))  (7)

where

    \tilde{\phi}(p) = [\tilde{F}(t_1, x_0, u, p)^T, \tilde{F}(t_2, x_0, u, p)^T, ..., \tilde{F}(t_m, x_0, u, p)^T]^T.  (8)

Evaluation of the Jacobian matrix \partial\phi/\partial p is the most computationally expensive step of the indirect method but, by contrast, the Jacobian matrix \partial\tilde{\phi}/\partial p of the prediction \tilde{\phi}(p) can easily be computed. Changing the order of differentiation and integration, we obtain

    \partial/\partial p_j \int_{t_0}^{t_i} S_f^p(\tau) d\tau = \int_{t_0}^{t_i} S_{\partial f/\partial p_j}(\tau) d\tau  (9)

where S_{\partial f/\partial p_j} denotes the n-vector of natural cubic spline functions interpolating the values \partial f(y_i, u(t_i), t_i, p)/\partial p_j, i = 0, 1, ..., m, over the interval t_0 <= t <= t_m. The Jacobian matrix of the prediction (8) is then given by

    \tilde{J}(p) = [(x_0)_1 + \int_{t_0}^{t_i} S_{\partial f/\partial p_1}(\tau) d\tau, ..., (x_0)_q + \int_{t_0}^{t_i} S_{\partial f/\partial p_q}(\tau) d\tau],  i = 1, ..., m,  (10)

where (x_0)_j = \partial x_0(p)/\partial p_j. Thus it is computed by the same interpolation technique used for the prediction. Minimizing (7) by the Marquardt method gives the iteration formula

    p^{k+1} = p^k + [\tilde{J}^T(p^k) W \tilde{J}(p^k) + \lambda_k I]^{-1} \tilde{J}^T(p^k) W [Y - \tilde{\phi}(p^k)]  (11)

where \lambda_k denotes the Marquardt \lambda in the kth iteration step; see Bard (1974).

The DILS algorithm is a natural continuous-time nonlinear counterpart of the well-known discrete-time linear least squares method (see, for example, Åström and Eykhoff, 1971; Strejc, 1980). The DILS method is more efficient numerically than the ILS method because spline approximation followed by analytical integration is much faster than numerical integration of differential and sensitivity equations. If the model (1) is linear in the parameters, then the Jacobian matrix does not depend on the parameters and, as in the discrete-time case, the least squares estimate is obtained in one step. Applications have appeared in the literature for many years. Early examples are those of Joseph et al. (1961) and Himmelblau et al. (1967). Recent applications are discussed by Sinha and Qijie (1985). The initial conditions can be eliminated from the predictions by the method of modulating functions (Loeb and Cahen, 1965), but this is not necessary in the approach described here since the initial conditions can be regarded as free parameters. The nonlinear version of the DILS method has been studied by Yermakova et al. (1982). Application of a smoothing spline function in (6) instead of an interpolating one can be regarded as a state variable filter. Smoothing is certainly advantageous if the sampling intervals are sufficiently small and the error variance is a priori known. Otherwise we prefer simple interpolation, thereby avoiding the need for choosing a particular smoothing function. As will be shown in Section 3, filtered variables can be obtained from the model itself using the auxiliary model principle in conjunction with the method of instrumental variables (Young and Jakeman, 1980).
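Continuing the same hypothetical example, the sketch below traces the DILS route of (5)-(8): natural cubic splines are fitted through the values f(y_i, u(t_i), t_i, p) and integrated analytically. Because the example model is linear in the parameters, the Jacobian (10) does not depend on p and the estimate is obtained in one linear least squares step, as noted above; x_0 is taken as known rather than estimated, again for brevity.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Continues the ILS sketch above (reuses t_samp, Y, x_init).  For the
# hypothetical model, f is linear in the parameters, f(x, p) = B(x) p
# with B(x) = [[-x1, 0], [x1, -x2]].
def B(x):
    return np.array([[-x[0], 0.0], [x[0], -x[1]]])

n, q = Y.shape[1], 2
Bstack = np.stack([B(y) for y in Y])        # f(y_i, p) = Bstack[i] @ p

# Natural cubic splines through each entry of B(y_i), integrated
# analytically from t0 to every t_i: this yields G with the direct
# integral prediction (6) written as  x(t_i, p) ~ x0 + G[i] @ p.
G = np.zeros((len(t_samp), n, q))
for j in range(n):
    for k in range(q):
        s = CubicSpline(t_samp, Bstack[:, j, k], bc_type='natural')
        G[:, j, k] = [s.integrate(t_samp[0], ti) for ti in t_samp]

# One-step least squares for Q_DILS of (7) with W = I; G.reshape(-1, q)
# plays the role of the Jacobian (10) (x0 known, not estimated).
p_dils, *_ = np.linalg.lstsq(G.reshape(-1, q), (Y - x_init).ravel(),
                             rcond=None)
print("DILS estimates:", p_dils)
```

No differential equation is solved at any point, which is the source of the numerical efficiency discussed above; for a model that is nonlinear in p, the same spline-based prediction would instead be embedded in the Marquardt iteration (11).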

2.4. Statistical properties of least squares estimators
The prediction error Y - \tilde{\phi}(p) in the DILS method is correlated with the predicted response \tilde{\phi}(p) and the estimates are not consistent (i.e. they are asymptotically biased) (Strejc, 1980), whereas the ILS estimates are consistent (Jennrich, 1969). Therefore, the DILS approach is usually regarded as only a short-cut method to simplify parameter estimation, with inferior statistical properties of the estimates (see, for example, Hosten, 1979). Here we use some heuristic arguments to show that consistency of the ILS estimates does not necessarily imply their practical superiority in every situation. For example, if the columns of the Jacobian matrix J(p) are nearly collinear, the conventional LS estimates tend to be inflated (Hoerl and Kennard, 1970; Hocking, 1983). In many control applications, this problem can be avoided by using low-order models and appropriate parameterizations (see, for example, Gevers and Tsoi, 1984), but it remains an important problem in applications involving models with physically-justified state variables and parameters. Jennrich and Sampson (1968) showed that such models are often "poorly parameterized" in the sense that the collinearity of the Jacobian is inherent.

Let \lambda_1 >= \lambda_2 >= ... >= \lambda_{min} > 0 denote the eigenvalues of the approximate Hessian matrix J^T(\hat{p}) W J(\hat{p}) at the estimates \hat{p}. Based on the usual linear approximation of the response function \phi near \hat{p}, the volume of the joint confidence region of the parameters is proportional to

    S = \sigma^q / \prod_{j=1}^{q} \lambda_j^{1/2}  (12)

(Bard, 1974). Further, let L^2 = (p - \hat{p})^T (p - \hat{p}) denote the squared distance between the LS estimates \hat{p} and the true parameters p. Again, based on the linear approximation of the response function,

    E(L^2) = trace [J^T(\hat{p}) R^{-1} J(\hat{p})]^{-1} = \sigma^2 \sum_{j=1}^{q} 1/\lambda_j > \sigma^2/\lambda_{min}  (13)

(Hoerl and Kennard, 1970). Thus if near-collinearity occurs, we may expect both large variances and large mean-squared error of the estimates. As the number of samples becomes small, these are of more serious concern than asymptotic biasedness.

To reduce the consequences of near-singularity, several biased estimators have been proposed that dampen or shrink the least squares estimator towards the origin by increasing the eigenvalues of the Hessian matrix (Hocking, 1983); the best known is the ridge regression of Hoerl and Kennard (1970). The eigenvalues of \tilde{J}^T(\hat{p}) W \tilde{J}(\hat{p}) in the DILS method are usually larger than those of J^T(\hat{p}) W J(\hat{p}) in the ILS method. To see the source of this difference, consider a change \Delta p in the parameters. In the DILS method, the values x(t_i, p) are approximated by observations y_i which do not depend on the parameters. Hence \Delta p affects the DILS objective function (7) more directly than in the ILS case, for which the response function values x(t_i, p) are also influenced by \Delta p, and these cross-effects may result in near-collinearity of the columns of the Jacobian matrix. If there are large measurement errors, widely-spaced samples or insufficiently good interpolating functions, there is likely to be considerable error in the approximation of the right-hand side of (5). In this case, the severe bias of the estimator may outweigh the advantage of larger eigenvalues. In the next section, a further method for reducing the bias of the estimates is introduced.
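The practical import of (12) and (13) is easy to check numerically. The fragment below (a sketch, again with the W = I assumption) computes the eigenvalues of the approximate Hessian J^T J for a given Jacobian, the confidence-volume factor S of (12) and the expected squared error E(L^2) of (13); nearly collinear columns show up immediately as a small \lambda_{min} and inflated values of both quantities.

```python
import numpy as np

def ls_diagnostics(J, sigma2):
    """Eigenvalue diagnostics of the approximate Hessian J^T J (W = I).

    Returns the eigenvalues sorted in decreasing order, the
    confidence-volume factor S of (12), and E(L^2) of (13)."""
    q = J.shape[1]
    lam = np.sort(np.linalg.eigvalsh(J.T @ J))[::-1]
    S = sigma2 ** (q / 2) / np.sqrt(np.prod(lam))   # sigma^q / prod sqrt(lam_j)
    EL2 = sigma2 * np.sum(1.0 / lam)                # sigma^2 * sum 1/lam_j
    return lam, S, EL2

# Two toy Jacobians: well-conditioned columns versus nearly collinear
# ones.  Near-collinearity drives lambda_min toward zero and inflates
# both the confidence volume and the expected squared error.
J_good = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
J_bad = np.array([[1.0, 1.01], [1.0, 0.99], [1.0, 1.00]])
for J in (J_good, J_bad):
    lam, S, EL2 = ls_diagnostics(J, sigma2=0.01)
    print("eigenvalues:", lam, " S:", S, " E(L^2):", EL2)
```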

3. REDUCING THE BIAS OF DIRECT ESTIMATES

For discrete-time models, the main operations for reducing biasedness are model extension, filtering and introducing instrumental variables. The last of these can readily be extended to continuous-time nonlinear systems. There is some freedom in choosing an IV signal \hat{x} as long as it is correlated with the undisturbed variables x, but uncorrelated with the noise v. The most obvious procedure is to generate \hat{x} by solving (1) at the current estimates of the parameters. This idea of the "auxiliary model" has been pursued in a number of papers, e.g. Wong and Polak (1967), Young (1970), and Young and Jakeman (1980). Let \hat{S}_{\partial f/\partial p_j} denote the natural cubic spline functions interpolating the values \partial f(\hat{x}(t_i, p), u(t_i), t_i, p)/\partial p_j, i = 0, 1, ..., m. Let \hat{J}(p) denote the matrix obtained from (10), replacing S_{\partial f/\partial p_j} by \hat{S}_{\partial f/\partial p_j}. Then, according to the basic IV method, (11) is replaced by

    p^{k+1} = p^k + [\hat{J}^T(p^k) W \tilde{J}(p^k) + \lambda_k I]^{-1} \hat{J}^T(p^k) W [Y - \phi(p^k)].  (14)

Since \hat{x} is an estimate of the undisturbed process signal x, (14) is the continuous-time counterpart of the bootstrap estimator discussed by Söderström and Stoica (1983) in their Theorem 4.5. In the discrete-time linear case, this estimator is consistent under fairly mild conditions and one may expect global convergence for sufficiently large samples. As to its efficiency, the estimator is optimal if v is white noise; see Theorem 6.2 in Söderström and Stoica (1981). If v is not white noise, the estimates generally do not have optimal properties, but according to Young and Jakeman (1980) should be reasonably efficient. To avoid the inversion of a non-symmetric matrix, and later to form a multistage estimation procedure, we use the symmetric variant of the bootstrap estimator, replacing (14) by the iteration formula

    p^{k+1} = p^k + [\hat{J}^T(p^k) W \hat{J}(p^k) + \lambda_k I]^{-1} \hat{J}^T(p^k) W [Y - \phi(p^k)]  (15)

where the starting point p^0 is the DILS estimate.

The symmetric bootstrap (SB) estimator can be interpreted in different ways. First, from a practical point of view, it minimizes the indirect least squares objective function (4) using the approximate Jacobian matrix \hat{J}(p) based on (10), with spline functions interpolating the values \partial f(\hat{x}(t_i, p), u(t_i), t_i, p)/\partial p_j, i = 0, 1, ..., m. Second, with \lambda_k = 0, (15) is the continuous-time counterpart of the discrete-time pseudo-linear regression estimator, recently discussed by Stoica et al. (1985). In fact, let s_j(t, p) = \partial x(t, p)/\partial p_j denote the vector of sensitivities of the solution of (1) with respect to p_j; then from the sensitivity equation the entries of the Jacobian matrix J(p) are given by

    s_j(t_i, p) = \int_{t_0}^{t_i} [\partial f/\partial x](x(\tau, p), u(\tau), \tau, p) s_j(\tau, p) d\tau + \int_{t_0}^{t_i} [\partial f/\partial p_j](x(\tau, p), u(\tau), \tau, p) d\tau.  (16)

Evaluating \hat{J}(p), we neglect the first term on the right-hand side of (16), as in the pseudo-linear regression, and we approximate the integrand in the second term by spline functions. As shown by Stoica et al. (1985), for linear systems and in its basic form [i.e. with \lambda_k = 0 in (15)], this algorithm converges to the consistent estimates under quite restrictive conditions, but its convergence can be improved considerably by a "step-variable" modification, multiplying the second term in (15) by a scalar \alpha_k such that

    \alpha_k >= 0,  \sum_{k=1}^{\infty} \alpha_k = \infty  and  \lim_{k \to \infty} \alpha_k = 0.

It should be emphasized that in monitoring the convergence, the Marquardt algorithm in (15) automatically introduces a similar sequence of multipliers, with an additional effect of regularization. In fact, if Q_{ILS}(p^{k+1}) >= Q_{ILS}(p^k), then \lambda_{k+1} > \lambda_k, thereby reducing the step length. It should be noted that the Marquardt algorithm probably does not yield the best "step-variable" SB algorithm and, based on the results of Stoica et al. (1985), its properties certainly can be improved. Application of the same search technique in the DILS, SB and ILS methods, however, simplifies the formulation of a multistage estimation procedure, discussed in the next section.
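A minimal sketch of the SB iteration for the running example may help fix ideas: each sweep solves (1) at the current estimates to obtain the auxiliary-model signal \hat{x} (whose sampled values also provide \phi(p^k)), rebuilds the direct-integral Jacobian from derivatives evaluated along \hat{x} rather than along the noisy observations, and takes a damped step of the form (15) from the DILS starting point. The damping schedule and stopping test below are crude placeholders, not the authors' Marquardt settings.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import CubicSpline

# Continues the earlier sketches (reuses f, B, t_samp, Y, x_init and
# the DILS estimate p_dils); W = I and x0 known throughout.
def auxiliary_model(p):
    """phi(p): the solution of (1) at the current estimates, which
    also serves as the IV signal x_hat."""
    sol = solve_ivp(f, (t_samp[0], t_samp[-1]), x_init, args=(p,),
                    t_eval=t_samp, rtol=1e-8, atol=1e-10)
    return sol.y.T

def iv_jacobian(Xhat, nparam):
    """J_hat of (15): splines through df/dp evaluated along x_hat
    (df/dp = B(x_hat) for the linear-in-p model), integrated as in
    (10); the first term of (16) is neglected."""
    Bstack = np.stack([B(x) for x in Xhat])
    J = np.zeros((len(t_samp), Xhat.shape[1], nparam))
    for j in range(Xhat.shape[1]):
        for k in range(nparam):
            s = CubicSpline(t_samp, Bstack[:, j, k], bc_type='natural')
            J[:, j, k] = [s.integrate(t_samp[0], ti) for ti in t_samp]
    return J.reshape(-1, nparam)

p, lam, Q_old = p_dils.copy(), 1e-3, np.inf
for k in range(50):
    Xhat = auxiliary_model(p)
    r = (Y - Xhat).ravel()                  # Y - phi(p^k), as in (15)
    Q_new = r @ r                           # Q_ILS, monitored as in the text
    lam = lam * 10.0 if Q_new >= Q_old else lam / 2.0
    Jhat = iv_jacobian(Xhat, len(p))
    step = np.linalg.solve(Jhat.T @ Jhat + lam * np.eye(len(p)),
                           Jhat.T @ r)
    p, Q_old = p + step, Q_new
    if np.linalg.norm(step) < 1e-9:
        break
print("SB estimates:", p)
```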


If the integration is replaced by state variable filtering, then the SB procedure is directly comparable with the sub-optimal IV method for continuous linear systems proposed by Young (1970), in which state variable filters were utilized to avoid integration and the problems with non-zero initial conditions (if these are not estimated). The integration is, however, advantageous in situations depending on initial conditions. In addition, for problems involving nonlinear models and coarsely sampled data we were unable to find state variable filters to replace integration in the DILS and SB methods without significantly worsening the estimates. This agrees with the general suggestions of numerical mathematics, which advocate avoiding any form of numerical differentiation as far as possible (Hamming, 1962). The problems are that differentiation increases the random errors and its results are heavily dependent on additional assumptions, whereas numerical integration automatically decreases the effects brought about by zero-mean random errors. According to our experience, the direct integral approach (with multiple integrals) is advantageous also in the case of higher-order continuous-time ARMA models with a continuous but deterministic input signal perturbation. Dealing with applications where the linear model describes system behaviour around a stationary state, all state variables vanish for [...]

[...] (B) S_{ILS} > S_{DILS}, where S is defined by (12), when an increased variance of the estimates in the ILS stage can be expected, and/or (C) \lambda_{ILS,min} < \lambda_{DILS,min}, when the ILS stage is expected to increase the mean-squared error of the estimates. This is a heuristic argument and a more reliable, though significantly more tedious, method of selecting the best estimator would be to perform a number of estimations with simulated data. As will be shown in the next section, however, our conditions predict the outcome of such simulation studies. As noted previously, the DILS and frequently also the SB methods fail if Q_{DILS}(\hat{p}) [...]
