Data Assimilation and Inverse Methods in Terms of a Probabilistic Formulation

Peter Jan van Leeuwen

Institute for Marine and Atmospheric Research Utrecht (IMAU), Utrecht University, Utrecht, The Netherlands, and

Geir Evensen

Nansen Environmental and Remote Sensing Center, Bergen, Norway

Submitted to Monthly Weather Review, April 24, 1996

Peter Jan van Leeuwen, IMAU, Utrecht University, PO Box 80005, 3508 TA Utrecht, The Netherlands

Phone: +31-30-2537759, fax: +31-30-2543163, e-mail: [email protected]

Contents

1 Introduction
2 Formulation of the Inverse in Terms of Probability Functions
  2.1 Model
  2.2 Bayesian Statistics
3 Maximum-Likelihood Estimator
  3.1 Gaussian Error Statistics
  3.2 Linear Dynamics
  3.3 Nonlinear Dynamics
4 Minimum-Variance Estimator
  4.1 Direct Ensemble Method
  4.2 Linear Ensemble Smoother
  4.3 Nonlinear Smoother Estimate
5 Examples
  5.1 Reference Case and Simulated Measurements
  5.2 Error Statistics
  5.3 Ensemble Kalman Filter
  5.4 Ensemble Smoother
  5.5 Direct Ensemble Method
6 Summary and Discussion
A Representer Calculations Based on Ensemble Statistics

Abstract

The weak-constraint inverse for nonlinear dynamical models is discussed and derived in terms of a probabilistic formulation. The well-known result that for Gaussian error statistics the minimum of the weak-constraint inverse is equal to the maximum-likelihood estimate is rederived. Then several methods based on ensemble statistics which can be used to find the smoother (as opposed to the filter) solution are introduced and compared to traditional methods. A strong point of the new methods is that they avoid the integration of adjoint equations, which is a complex task for real oceanographic or atmospheric applications. They also avoid iterative searches in a Hilbert space, and error estimates can be obtained without much additional computational effort. The feasibility of the new methods is illustrated in a two-layer quasi-geostrophic ocean model.

1 Introduction

Weak-constraint inverse problems are normally formulated by allowing a model, its initial and boundary conditions, and a set of measurements to contain errors. An integral is then defined which contains the weighted squares of these errors, and the minimizing solution is an estimate which is close to the measurements and the first-guess initial and boundary conditions, and at the same time satisfies the dynamical equations approximately. In such a formulation it has been assumed implicitly that the error statistics are Gaussian.

For a linear model the weak-constraint inverse for time-dependent problems is readily solved using either the representer technique [Bennett, 1992], a gradient descent method, the Kalman smoother, or the sweep algorithm [Bennett and Budgell, 1989; Bennett, 1992]. The extension of these methods to nonlinear dynamics is nontrivial and has led to a focus on so-called suboptimal formulations and solution methods.

An approach which has been popular in meteorology is to assume that the model is perfect and to include it as a strong constraint in the variational formulation. However, it has been illustrated by several authors that the strong-constraint formulation imposes a strict limitation on the length of the time interval for nonlinear systems, related to the predictability limit of the model [Miller et al., 1994; Gauthier et al., 1993; Tanguay et al., 1995]. This limitation does not appear in weak-constraint formulations (as proposed in a manuscript by Evensen and Fario submitted to the J. Atmos. Sci. Japan, 1995).

Another popular class of methods are those based on sequential updating of the solution during a forward integration of the model. For linear dynamics the optimal sequential technique is the Kalman filter. In the Kalman filter an additional equation for the second statistical moment is integrated forward

in time to predict error statistics for the model forecast. The error statistics are then used to calculate a variance-minimizing estimate whenever measurements are available. The Kalman filter can be derived directly as a suboptimal minimizer of the weak-constraint variational formulation [Bennett, 1992] as long as linear dynamics are considered. For nonlinear dynamics the extended Kalman filter may be applied, in which an approximate linearized equation is used for the prediction of error statistics. It has been shown that this equation is based on a too simplified closure assumption, where higher-order statistical moments have been neglected, and this may lead to a nonphysical error variance evolution [Evensen, 1992; Miller et al., 1994].

So far, only a few methods have proven successful for weak-constraint inverse calculations when the model dynamics are nonlinear (see, e.g., the review by Evensen, 1994a). The most promising methods used today are the representer method and variants of a substitution or descent method.

The representer method was introduced in oceanography by Bennett and McIntosh [1982] and used for the first time with nonlinear dynamics by Bennett and Thorburn [1992] with a quasi-geostrophic ocean circulation model. For nonlinear dynamics a convergent sequence of linear iterates of the Euler-Lagrange equations is defined, where each iterate is solved using representer expansions. The representer method is appealing because it completely decouples the system of Euler-Lagrange equations. The solution is sought in a finite-dimensional space spanned by the so-called representers (one for each measurement). It can be shown that unobservable fields are rejected automatically and that the method is equivalent to Gauss-Markov interpolation (objective analysis) in space and time. The method has recently been implemented for tropical cyclone prediction, and this system provided a significant improvement of the meteorological forecasts compared to results from the operational systems [Bennett et al., 1993]. The representer method has also been used for solving a nonlinear inverse problem in combination with parameter estimation in a manuscript by Eknes and Evensen (submitted to the J. Atmos. Sci. Japan, 1995). In the work by Egbert et al. [1994] it was shown that the numerical load, which depends on the number of measurements, could be significantly reduced, since only an approximation of the representer matrix is needed to calculate the representer coefficients.

By defining a numerical grid, and then calculating the gradient of the penalty function with respect to the state variables in space and time, the gradient can be used in a descent algorithm to find the minimizing solution. The huge state space associated with such a formulation is the main objection against using a gradient descent method for weak-constraint inverse calculations. On the other hand, when using a gradient descent method, there is no need to integrate any dynamical equation, since a new candidate for the solution in space and time is substituted at every step of the iteration procedure.

In a manuscript by Evensen and Fario (submitted to the J. Atmos. Sci. Japan, 1995) the method was used for the first time to minimize a weak-constraint functional. The method was tested and proved very successful with the strongly nonlinear and chaotic Lorenz equations. This suggests that the method may work well for general ocean circulation models.

Recently Evensen [1994b] proposed a new sequential method, the ensemble Kalman filter, where the error statistics are predicted using ensemble integrations as an alternative to integrating the error covariance equation in the Kalman filter. The method resolves many of the problems related to nonlinear dynamics in the extended Kalman filter, and it also reduces the numerical load significantly. It was shown that this approach is equivalent to solving Kolmogorov's equation, which describes the evolution of the probability density function for the error statistics. The surprising result was that a rather small ensemble provided reasonably accurate results, and the method has recently been applied with success in a realistic application in the Agulhas Current by Evensen and van Leeuwen [1996].

For process studies one is more interested in generalized inverses or smoothers than in predictions with filters. A smoother gives a best estimate given the model and all the data, while a filter only uses data up to the time point of interest. Assuming noise which is uncorrelated in time, filter solutions are discontinuous in time at data points, while a smoother solution at most has a discontinuous time derivative at data points.

The promising results obtained with the ensemble Kalman filter led us to search for new probabilistic methods that can be used for minimizing nonlinear inverse problems. A further motivation is obtained from the fact that for each linear iterate of the Euler-Lagrange equations the representers can be calculated from an ensemble integration. This means that the weak-constraint problem can be solved without any backward integrations of adjoint equations.

Intuitively it is clear that the probability density of the model and the probability density of the data contain all information needed to calculate the inverse estimate. Indeed, using Bayesian statistics one can consider the probability density of the model forecast as prior information, which is "updated" by the data. This results in a new probability density of the model, given the data. This view is expressed by Tarantola [1987] for time-independent problems. Of course, one cannot calculate the probability density function of the model for realistic oceanographic or atmospheric problems; however, only a few statistical moments are needed to calculate an analyzed estimate. As in the ensemble Kalman filter, ensemble or Monte-Carlo calculations can be used to represent the probability density of the model. When the probability density of the data is known, which is usually assumed to be the case, the moments of the new probability density can easily be determined.

The outline of the paper is as follows. In Section 2 the intuitive idea of using the probability densities for the smoother solution is presented, and the generalized inverse is formulated in a probabilistic context. In the next section the maximum-likelihood estimator is calculated and shown to result in the well-known penalty-function approach to inverse modeling. Expressions for the estimator are given for linear and nonlinear model dynamics. In Section 4 the minimum-variance estimators are presented. First a direct method is presented for Bayes' theorem, where a frequency or ensemble representation is used for the prior probability function. Then another variance-minimizing method is presented, based on calculating the second statistical moment from the ensemble and then using this in the standard variance-minimizing analysis scheme. This method becomes equal to standard Gauss-Markov interpolation in space and time with higher-order statistical moments neglected. The new methods are tested on a two-layer quasi-geostrophic ocean model in Section 5, and a comparison to the ensemble Kalman filter is made. A summary with discussion is given in Section 6.

2 Formulation of the Inverse in Terms of Probability Functions

In this section we show how an inverse estimator can be formulated in terms of probability density functions, using Bayes' theorem. In the context of time-independent problems this method has been used by Tarantola [1987] and Lorenc [1988]. The use of ensemble or Monte-Carlo methods in calculating the estimator will be outlined.

2.1 Model

Consider a nonlinear model, defined for the domain $D$ with boundary $\partial D$,
\[
  \frac{\partial \psi}{\partial t} = g(\psi) + q, \tag{1}
\]
with initial conditions
\[
  \psi(t = 0) = \Psi_0 + a, \tag{2}
\]
and boundary conditions
\[
  \psi(x \in \partial D) = \Psi_B + b, \tag{3}
\]

where the q, a, and b are random errors with known distributions. It is assumed that with all error terms equal to zero this system would provide a unique solution. By adding additional conditions in 5

the form of a set of measurements

d = L[ ] + ;

(4)

the system becomes over-determined and the error terms are introduced so a solution can be found which minimizes these error terms in some sense.

2.2 Bayesian Statistics

The determination of the generalized inverse can be considered as the estimation of the unknown true model variables given the data and the model estimates, with information about their prior error statistics. If the error statistics are given in terms of probability density functions, a Bayesian estimation problem can be formulated. In Bayesian statistics the unknown is viewed as the value of a random variable, and the probability density of the data $d$ is interpreted as the conditional density $f(d|\psi)$ of $d$ given $\psi$. The pure model is regarded as a-priori information, and it is used to assign a density $f(\psi)$ to the random variable $\psi$. Using the definition of a conditional probability density we can derive the probability density of $\psi$ given the data,
\[
  f(\psi|d) = \frac{f(d|\psi)\, f(\psi)}{f(d)}. \tag{5}
\]
The denominator can be rewritten as
\[
  f(d) = \int f(d|\psi)\, f(\psi)\, d\psi. \tag{6}
\]

To use this concept, the prior probability density $f(\psi)$ has to be determined, as well as the probability density of the data $f(d|\psi)$. The latter is usually assumed to be known to be, for instance, a Gaussian. More problematic is the probability density function of the model evolution. If one assumes that the model equations describe a first-order autoregressive, or Markov, process, this probability density can be determined by solving the Kolmogorov equation. The assumption is that the model is forced randomly as
\[
  d\psi = g(\psi)\, dt + dq, \tag{7}
\]
in which $dq$ are random increments with known variance and mean. The probability density for the model state has a huge number of variables, so it is computationally not feasible for real oceanographic or meteorological applications to determine its evolution. An alternative is to determine it from an ensemble calculation, but to construct the density from the ensemble members is again not feasible. However, generally one is not interested in $f(\psi|d)$ as a whole, only in its first few moments, e.g., a best estimator of the truth and its error variance. In that case Monte-Carlo experiments can be extremely useful.

These estimates can be obtained in a number of ways. The most general estimator is the maximum-likelihood estimator, which will be evaluated in the next section. To use that concept the joint probability density of model evolution and data has to be known. The most-used estimator is the minimum-variance estimator. It is well known that for Gaussian-distributed variables the minimum-variance estimate is equal to the maximum-likelihood estimate. We will elaborate on the minimum-variance estimate in Section 4.
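To make the Monte-Carlo idea concrete, the following minimal sketch represents $f(\psi)$ by an ensemble integrated under a randomly forced model as in equation (7). The tendency function, state size, and noise amplitudes are hypothetical placeholders chosen only for illustration; a real application would use the ocean model itself.

```python
import numpy as np

def ensemble_forecast(g, psi0_mean, init_std, q_std, n_members, n_steps, dt, rng):
    """Represent the prior density f(psi) by an ensemble of model states,
    each integrated with its own realization of the random forcing dq
    from equation (7): d(psi) = g(psi) dt + dq (Euler-Maruyama step)."""
    ensemble = psi0_mean + init_std * rng.standard_normal(
        (n_members, psi0_mean.size))
    for _ in range(n_steps):
        # Deterministic tendency plus a random model-error increment dq.
        tendency = np.array([g(member) for member in ensemble])
        ensemble += dt * tendency
        ensemble += np.sqrt(dt) * q_std * rng.standard_normal(ensemble.shape)
    return ensemble

# Toy usage: a linear damping tendency stands in for the dynamical model.
rng = np.random.default_rng(1)
prior = ensemble_forecast(lambda p: -0.5 * p, np.ones(10), 0.2, 0.05,
                          n_members=500, n_steps=80, dt=0.1, rng=rng)
print("prior mean:", prior.mean(axis=0)[:3])
print("prior variance:", prior.var(axis=0)[:3])
```

The first few statistical moments of $f(\psi)$ then follow directly from ensemble averages, which is all the estimators below require.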

3 Maximum-Likelihood Estimator

To use the maximum-likelihood method, the probability density of the model evolution given the data has to be known. The value $\hat{\psi}$ of $\psi$ that maximizes $f(\psi|d)$ is the maximum-likelihood estimate of $\psi$. In practice one maximizes the logarithm of $f(\psi|d)$, the so-called log-likelihood function; from the monotonicity of the logarithm it follows that the maximizer of the log-likelihood is also the maximizer $\hat{\psi}$ of $f(\psi|d)$ itself. The conditional probability density of the model evolution given the data follows from (5) and (6):
\[
  f(\psi|d) = \frac{f(d|\psi)\, f(\psi)}{\int f(d|\psi)\, f(\psi)\, d\psi}. \tag{8}
\]

Thus, the probability density of the data given a model evolution, $f(d|\psi)$, and the probability density of $\psi$, $f(\psi)$, must be known. For the model one has to specify initial and boundary conditions with their respective probability densities. The probability density $f(\psi)$ used above should therefore be written as $f(\psi|\psi_B, \psi_0)\, f(\psi_B)\, f(\psi_0)$, and accordingly the probability density for the measurements should be $f(d|\psi, \psi_B, \psi_0)$. The maximum-likelihood estimator maximizes the probability density
\[
  f(\psi, \psi_B, \psi_0 \,|\, d) = A\, f(d|\psi, \psi_B, \psi_0)\, f(\psi|\psi_B, \psi_0)\, f(\psi_B)\, f(\psi_0), \tag{9}
\]
or rather the log-likelihood function
\[
  \log f(\psi, \psi_B, \psi_0 \,|\, d) = \log f(d|\psi, \psi_B, \psi_0) + \log f(\psi|\psi_B, \psi_0) + \log f(\psi_B) + \log f(\psi_0) + \log A, \tag{10}
\]
where $A$ arises from the denominator in equation (8) and only depends on the data.

3.1 Gaussian Error Statistics

For the probability density of the model evolution we use in this section the assumption that the model evolution can be described as a Markov process. This means that the past has no influence on the future if the present is specified. Introduce a model state $\psi_n$, which develops from the deterministic model evolution $\tilde{g}(\psi_{n-1})$ and the random increment $dq_n$. At this stage it is only necessary to assume that the distribution of the $dq_n$'s is known; we do not have to assume its shape. The joint probability density of $\psi_n$ and $\psi_{n-1}$ is found from
\[
  f(\psi_n, \psi_{n-1}) = f(\psi_n|\psi_{n-1})\, f(\psi_{n-1}). \tag{11}
\]
Because the $dq$'s are independent by definition, we have
\[
  f(\psi_n|\psi_{n-1}, \psi_{n-2}) = f(\psi_n|\psi_{n-1}), \tag{12}
\]
so if this sequence is continued we arrive at
\[
  f(\psi) = \prod_{n=1}^{N} \left[ f(\psi_n|\psi_{n-1}) \right] f(\psi_0), \tag{13}
\]
in which $\psi_0 = \psi(0)$ and $\psi = (\psi_N, \psi_{N-1}, \ldots, \psi_0)$. We now assume that the random increments $dq$ are Gaussian distributed, i.e.,
\[
  f(\psi_n|\psi_{n-1}) = A \exp\left( -\tfrac{1}{2} \left( \psi_n - \tilde{g}(\psi_{n-1}) \right) \bullet \tilde{W}_{qq} \bullet \left( \psi_n - \tilde{g}(\psi_{n-1}) \right) \right), \tag{14}
\]
in which $A$ is a normalization constant, $\tilde{W}_{qq}$ is the inverse of the covariance matrix of $dq$, and the $\bullet$ denotes integration over the space variables. By taking the continuum limit we find
\[
  f(\psi) = A \exp\left( -\tfrac{1}{2} \left( \psi_t - g(\psi) \right) \circ W_{qq} \circ \left( \psi_t - g(\psi) \right) \right) f(\psi_0), \tag{15}
\]
where the subscript $t$ denotes differentiation in time. If all error distributions are Gaussian, the maximization of the joint probability density (10) becomes equal to the minimization of the penalty function $J$, given by
\[
  J[\psi] = q \circ W_{qq} \circ q + a \bullet W_{aa} \bullet a + b \star W_{bb} \star b + \epsilon^T w\, \epsilon, \tag{16}
\]

where $q$, $a$, $b$, and $\epsilon$ are the errors in the model equations, initial condition, boundary conditions, and data, respectively. Here $\circ$ is a short-hand notation for integration over the total space-time domain of interest, $\bullet$ denotes integration over space, and $\star$ denotes integration over the boundaries and time. The $W$'s and $w$ are the inverses of the covariances of the errors in the model, initial and boundary conditions, and measurements, respectively.

With this equation we have rederived the notion that the minimum-variance estimator and the maximum-likelihood estimator are equal for Gaussian-distributed variables. From the penalty function one can derive the Euler-Lagrange equations, from which one can derive well-known data-assimilation methods like the representer method and the strong-constraint adjoint method. A quadratic penalty function like (16) implicitly assumes that the errors are Gaussian. If this assumption fails to be true, the penalty function will no longer define the maximum-likelihood estimator, but it can still be used as a variance-minimizing estimator.

3.2 Linear Dynamics

Assuming that all errors are Gaussian distributed and the model is linear, it is easy to see that the errors of $\psi$ itself are Gaussian distributed at all times. Let $\psi_F$ be a first-guess solution obtained from a model integration with zero noise. For linear dynamics the time increments of the model solution, $\psi_{n+1} - \psi_n$, including boundary values, can be written as a linear combination of the initial field $\psi_0$ and the random increments $dq_n$. It is well known that the sum of Gaussian-distributed variables is again Gaussian distributed. Thus the distribution for the a-priori model solution becomes
\[
  f_G(\psi) = A \exp\left( -\tfrac{1}{2} (\psi - \psi_F) \circ W_{\psi\psi} \circ (\psi - \psi_F) \right), \tag{17}
\]
and the posterior density then becomes
\[
  f(\psi|d) = A \exp\left( -\tfrac{1}{2} (\psi - \psi_F) \circ W_{\psi\psi} \circ (\psi - \psi_F) - \tfrac{1}{2} (d - L[\psi])^T w\, (d - L[\psi]) \right), \tag{18}
\]
from which one can define the maximum-likelihood estimate for Gaussian statistics and linear model dynamics as the minimum of the penalty function
\[
  J[\psi] = (\psi - \psi_F) \circ W_{\psi\psi} \circ (\psi - \psi_F) + (d - L[\psi])^T w\, (d - L[\psi]). \tag{19}
\]

It is a simple exercise to derive an expression for the minimizing solution (see the next section), which of course becomes identical to the representer solution [Bennett, 1992], i.e.,
\[
  \hat{\psi} = \psi_F + r^T b, \tag{20}
\]
with
\[
  (R + w^{-1})\, b = d - L[\psi_F]. \tag{21}
\]
The representers $r$ are the model field-measurement covariances,
\[
  r = E\left[ (\psi - \psi_F)\, L[(\psi - \psi_F)] \right] = L[Q_{\psi\psi}], \tag{22}
\]
in which $E[\,\cdot\,]$ is the expectation operator. The representer matrix $R$ is the measurement-measurement covariance,
\[
  R = E\left[ L[(\psi - \psi_F)]\, L[(\psi - \psi_F)] \right] = L^T[r]. \tag{23}
\]
The error covariance of the model $Q_{\psi\psi}$ is related to the weights $W_{\psi\psi}$ as
\[
  Q_{\psi\psi}(x_1, t_1; x_3, t_3) \circ W_{\psi\psi}(x_3, t_3; x_2, t_2) = \delta(x_1 - x_2)\, \delta(t_1 - t_2). \tag{24}
\]
Equation (21) is a matrix problem of the same order as the number of observations. When this number becomes large the problem becomes difficult to solve. Thus, when all errors are Gaussian distributed, the Euler-Lagrange equations are not needed to derive the minimum of the penalty function. Of course, this could be anticipated beforehand, because the model evolution equations do not appear explicitly in the expression for $J[\psi]$ in (19). This estimator is also equivalent to Gauss-Markov interpolation in space and time.
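For readers who want to see equations (20)-(24) in action, the sketch below solves the representer system for a small discrete linear problem. The covariance matrix, the measurement matrix, and the error weights are synthetic stand-ins of our own choosing; nothing here is specific to the paper's model.

```python
import numpy as np

def representer_estimate(psi_F, Q, Lmat, w_inv, d):
    """Representer solution of eqs. (20)-(21) for a discrete linear problem.

    psi_F : (n,) first-guess field on the space-time grid
    Q     : (n, n) prior error covariance (functional inverse of W)
    Lmat  : (m, n) linear measurement operator, one row per datum
    w_inv : (m, m) measurement error covariance (inverse of the weights w)
    d     : (m,) data
    """
    r = Lmat @ Q                                       # representers, eq. (22)
    R = Lmat @ r.T                                     # representer matrix, eq. (23)
    b = np.linalg.solve(R + w_inv, d - Lmat @ psi_F)   # coefficients, eq. (21)
    return psi_F + r.T @ b                             # estimate, eq. (20)
```

Note that the linear solve involves an $m \times m$ matrix, with $m$ the number of measurements, which is exactly the size issue noted above.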

3.3 Nonlinear Dynamics

The probability density of the a-priori model integration is partly determined by its mean and its covariance. For linear dynamics, assuming Gaussian statistics, these two completely determine the density, and the maximum-likelihood solution can be found as shown in the last section. For nonlinear dynamics the probability density may be separated into a Gaussian part ($G$), with the mean and covariance of the whole density, and a non-Gaussian part ($N$), describing the deviation from a Gaussian density. Thus the model density distribution to be used in the expression for the posterior density (8) becomes
\[
  f(\psi) = f_G(\psi)\, f_N(\psi). \tag{25}
\]

The maximum-likelihood estimator can be found from the variational derivative of minus the logarithm of the posterior density,
\[
  -\delta \log\left( f(d|\psi)\, f_G(\psi)\, f_N(\psi) \right) = W_{\psi\psi} \circ (\hat{\psi} - \psi_F) - L^T[\delta]\, w\, (d - L[\hat{\psi}]) - \delta \log f_N(\psi) = 0, \tag{26}
\]
in which $\psi_F$ is the first-guess evolution of $\psi$, which follows from a pure model integration with zero noise, $W_{\psi\psi}$ is the functional inverse of the covariance $Q_{\psi\psi}$ of $\psi$, and $L[\delta]$ denotes the measurement of $\delta$ functions at the measurement times and positions. This equation can be rewritten as
\[
  \hat{\psi} = \psi_F + L^T[Q_{\psi\psi}]\, w\, (d - L[\hat{\psi}]) + Q_{\psi\psi} \circ \delta \log f_N(\psi). \tag{27}
\]
To proceed, this equation is measured and subtracted from the measurements $d$ to find
\[
  (d - L[\hat{\psi}]) = w^{-1} (R + w^{-1})^{-1} (d - L[\psi_F]) - w^{-1} (R + w^{-1})^{-1} L[Q_{\psi\psi}] \circ \delta \log f_N(\psi), \tag{28}
\]
in which $R = L[r^T]$ is the representer matrix and $r = L[Q_{\psi\psi}]$ are the representer fields. If we use this expression in (27) we obtain
\[
  \hat{\psi} = \psi_F + r^T b - \left( r^T (R + w^{-1})^{-1} r - Q_{\psi\psi} \right) \circ \delta \log f_N(\psi). \tag{29}
\]

This expression for the inverse estimate is actually the representer solution with an additional term holding information about the non-Gaussian contribution. To find the nonlinear smoother one has to specify $f_N$. A way to do this is to assume a certain form of $f_N$ in which the parameters are determined from the ensemble, with the maximum-likelihood method or the method of moments, for instance. The problem remains that we have no clue what form to choose for $f_N$, and even for simple forms of $f_N$ complicated expressions for $\hat{\psi}$ arise. A way to circumvent these problems is to calculate a few additional moments of the ensemble, i.e., the skewness and the kurtosis. If they are small, the smoother can be approximated by using Gaussian error statistics. If not, the method should not be used.

4 Minimum-Variance Estimator

The minimum-variance method can be phrased as follows: find the variable $\hat{\psi}$ such that the variance
\[
  e(\hat{\psi}) = E\left[ (\psi - \hat{\psi})^2 \right] = \int (\psi - \hat{\psi})^2 f(\psi|d)\, d\psi \tag{30}
\]
is minimal. (In this section $\psi$ contains the boundary and initial values.) The minimum can be obtained from the calculus of variations as
\[
  2 \int (\psi - \hat{\psi})\, f(\psi|d)\, d\psi = 0, \tag{31}
\]
so we find
\[
  \hat{\psi} = \int \psi\, f(\psi|d)\, d\psi. \tag{32}
\]
Now use the expression for $f(\psi|d)$ from equation (8) to find
\[
  \hat{\psi} = \frac{\int \psi\, f(d|\psi)\, f(\psi)\, d\psi}{\int f(d|\psi)\, f(\psi)\, d\psi}. \tag{33}
\]

Two ways of evaluating this expression are given in the following.

4.1 Direct Ensemble Method

One way to proceed is to use a frequency interpretation of the model probability density function. This leads to the following well-known expression for the expected value of a general function $h(\psi)$:
\[
  E[h(\psi)] = \frac{\sum_{i=1}^{N} h(\psi_i)\, f(d|\psi_i)}{\sum_{i=1}^{N} f(d|\psi_i)}. \tag{34}
\]

If the probability density function of the data is known, this expression can be used to evaluate the influence of the data. The relative frequencies of the model states can be determined from an ensemble calculation. The probability density at a certain time is represented by a large cloud of states at that time. The integration of an ensemble of these states forward in time builds up the complete model evolution, or prior, probability density. For the expected value and the variance one then finds
\[
  E[\psi] = \frac{\sum_{i=1}^{N} \psi_i\, f(d|\psi_i)}{\sum_{i=1}^{N} f(d|\psi_i)} \tag{35}
\]
and
\[
  E\left[ (\psi - E[\psi])^2 \right] = \frac{\sum_{i=1}^{N} (\psi_i - E[\psi])^2\, f(d|\psi_i)}{\sum_{i=1}^{N} f(d|\psi_i)}. \tag{36}
\]

The interpretation of (35) is that each member is weighted with its "distance" to the data. To calculate the ensemble smoother variance, the alternative formula
\[
  E\left[ (\psi - E[\psi])^2 \right] = \frac{\sum_{i} \psi_i^2\, f(d|\psi_i)}{\sum_{i} f(d|\psi_i)} - E^2[\psi] \tag{37}
\]

is used. The summations of the terms $f(d|\psi_i)$, $\psi_i f(d|\psi_i)$, and $\psi_i^2 f(d|\psi_i)$ can be performed by integrating the ensemble members forward one at a time and adding their contributions to the summations. The expressions (35) and (37) can be evaluated at the desired times during the accumulation of the contributions of new ensemble members, and the process can be stopped when acceptable convergence has been obtained. If only the mean and its variance are needed, only three $xyz$-dependent fields have to be stored at each time step during the integration of each ensemble member. The strong point of the above formulation is that we have derived a smoother for nonlinear dynamics which only uses forward model integrations. This means that the difficult and complex task of determining the adjoints of the model equations can be avoided. Furthermore, the assumption of noise being uncorrelated in time can easily be relaxed. Of course, the method is only feasible if one can obtain a good estimate of the prior and posterior probability density with an ensemble of reasonable size.
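A minimal sketch of this accumulation, assuming a Gaussian data density $f(d|\psi)$ and a simple linear measurement operator (both our own choices for illustration), could look as follows.

```python
import numpy as np

def direct_ensemble_estimate(members, H, d, obs_var):
    """Accumulate the sums in eqs. (35) and (37) one member at a time.

    members : iterable of (n,) model trajectories (space-time, flattened)
    H       : (m, n) measurement operator
    d       : (m,) data
    obs_var : scalar observation error variance (Gaussian f(d|psi) assumed)
    """
    s0, s1, s2 = 0.0, 0.0, 0.0
    for psi in members:
        misfit = d - H @ psi
        weight = np.exp(-0.5 * misfit @ misfit / obs_var)  # f(d | psi_i)
        s0 += weight
        s1 = s1 + weight * psi
        s2 = s2 + weight * psi**2
    mean = s1 / s0                 # eq. (35)
    var = s2 / s0 - mean**2        # eq. (37)
    return mean, var
```

Because only the three running sums are kept, the storage cost matches the three fields mentioned above, and new members can be accumulated until the estimate converges.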

4.2 Linear Ensemble Smoother

Consider the mean and the covariance of the prior distribution, and the data with their variances. A linear unbiased smoother estimate can be found as
\[
  \hat{\psi} = \psi_F + C\, (d - L[\psi_F]), \tag{38}
\]
in which $C$ is a vector of space-time fields which has to be determined. The standard variance-minimizing analysis scheme can be used to find
\[
  \hat{\psi} = \psi_F + r^T b, \tag{39}
\]
with
\[
  (R + w^{-1})\, b = d - L[\psi_F], \tag{40}
\]

in which the space-time covariance function $Q_{\psi\psi}$, the representer $r$, and the representer matrix $R$ can all be calculated from an ensemble of model solutions. The first-guess estimate will be the mean of the ensemble, and thus $Q_{\psi\psi}$ can be interpreted as the error covariance of the first guess. Because of the nonlinearities in the model, this estimator is not the maximum-likelihood estimator, which requires the contribution from the last term in (29). However, it can be interpreted as the best linear unbiased smoother for nonlinear model dynamics. This approach (the ensemble smoother) is similar to the method used in the ensemble Kalman filter [Evensen, 1994b], where a variance-minimizing estimate is calculated at measurement times based on the current ensemble statistics, except that here the estimate is calculated over the whole space and time domain. Note that this ensemble smoother is as simple to calculate as the ensemble Kalman filter, and no fields depending on both space and time need to be stored. The computational load is either the same as or twice that of the ensemble Kalman filter, depending on whether the ensemble members are stored at diagnostic output times or the ensemble is recalculated. The details of the construction of the smoother solution using ensemble statistics are given in Appendix A.
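The following sketch shows how the analysis (39)-(40) might be assembled from ensemble statistics. The anomaly-based estimates of $r$ and $R$ follow the spirit of Appendix A, while the array names and the dense measurement matrix are our own simplifications.

```python
import numpy as np

def ensemble_smoother(A, H, d, w_inv):
    """Linear ensemble smoother analysis, eqs. (39)-(40).

    A     : (n, N) ensemble matrix, one space-time trajectory per column
    H     : (m, n) measurement operator
    d     : (m,) data
    w_inv : (m, m) measurement error covariance
    """
    N = A.shape[1]
    psi_F = A.mean(axis=1)                 # first guess = ensemble mean
    Ap = A - psi_F[:, None]                # ensemble anomalies
    HAp = H @ Ap                           # measured anomalies, (m, N)
    r = Ap @ HAp.T / (N - 1)               # representers L[Q], (n, m)
    R = HAp @ HAp.T / (N - 1)              # representer matrix, (m, m)
    b = np.linalg.solve(R + w_inv, d - H @ psi_F)
    return psi_F + r @ b                   # smoother estimate
```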

4.3 Nonlinear Smoother Estimate

To find the minimum-variance estimator taking the nonlinear model evolution into account, we use the same method as in Section 3.3. The prior density is divided into a Gaussian and a non-Gaussian part, and essentially the same derivation as in Section 3.3 is used to find, with (33),
\[
  \hat{\psi} = \psi_F + r^T b - \frac{1}{A} \left[ r^T (R + w^{-1})^{-1} r - Q_{\psi\psi} \right] \circ \int \delta f_N(\psi)\, f(d|\psi)\, f_G(\psi)\, d\psi, \tag{41}
\]
in which $A$ is a normalization factor. Again we recognize the representer expression plus an extra term describing the contribution of the non-Gaussian part of the prior density. A comparison with (29) shows that the difference between the maximum-likelihood estimator and the minimum-variance estimator lies in the contribution from the non-Gaussian part, as can be expected. Equation (41) can also be compared with the expression obtained with the best linear unbiased estimator (39). It is clear now what the linear estimator does: it simply neglects the non-Gaussian part of the prior distribution.

5 Examples

The smoothing algorithms will now be illustrated using a simple example. A two-layer nonlinear quasi-geostrophic model on a β-plane, with a flat bottom, is used to model the evolution of a perturbed baroclinic jet propagating through a channel that is periodic in the east-west direction. This case is based on the examples from Ikeda and Lygre [1989], who studied eddy-jet interactions with the QG model. A similar example was also applied for studies of open boundary conditions in Evensen [1993].

The following parameters have been used in all the model runs: the internal Rossby radius is $R_d = 5600$ m, and the nondimensional grid spacing is $\Delta x = 0.5$ on an $80 \times 40$ grid. The velocity scale is $U = 0.3$ m s$^{-1}$. The total depth is 300.0 m, and an upper layer of 50.0 m and a lower layer of 250.0 m have been used, with a density difference of $\Delta\rho = 1.0$ kg m$^{-3}$. This results in Froude numbers $F_{1,2} = 1.0$ and $F_{2,1} = 0.2$, and using a Coriolis parameter $f = 1.25 \times 10^{-4}$, the Rossby number becomes $\varepsilon = U/(f R_d) \simeq 0.43$. The dimensionless time step is $dt = 0.125$. These parameters are typical for mesoscale processes in the Norwegian coastal waters. The QG model has proven to give good results in this parameter regime, as discussed by Haugan et al. [1991] and Ikeda et al. [1989], even though the Rossby number is rather large.

5.1 Reference Case and Simulated Measurements

First, a reference case is created in which the initial upper-layer stream function is defined by a jet with a Gaussian velocity profile that propagates through the channel and interacts with a barotropic cyclone. The jet is initialized as
\[
  \psi_1 = -0.8 \int_0^y \exp\left[ -\left( \frac{y' - 5}{2} \right)^2 \right] dy', \qquad \psi_2 = 0.0, \tag{42}
\]
and then the barotropic cyclone is added in both layers,
\[
  \psi_l = \psi_l - 2 \exp\left[ -\left( \frac{\sqrt{(x - 12.5)^2 + (y - 8.5)^2}}{2} \right)^2 \right], \tag{43}
\]
for $l = 1, 2$. The model is integrated from this initial condition for 40 time units, and the resulting model solution is used as the initial condition for the reference run. The upper- and lower-layer stream functions for the reference run are shown in Figure 1. The triggering of a baroclinic instability produces strong

perturbations of the upper-layer jet which propagate downstream. Measurements are collected from this reference case in every fifth grid point, giving a total of 112 observations at each measurement time. The measurements are taken at the times $t = 10, 20, 30, 40$. Each stream function measurement is contaminated with independent Gaussian noise with zero mean and variance equal to 0.02.
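A sketch of the initial stream functions (42)-(43) on the $80 \times 40$ grid might read as follows; the grid construction, the crude quadrature, and the array layout are our own assumptions.

```python
import numpy as np

dx = 0.5                                    # nondimensional grid spacing
x = np.arange(80) * dx
y = np.arange(40) * dx
X, Y = np.meshgrid(x, y, indexing="ij")

# Upper-layer jet, eq. (42): integrate the Gaussian velocity profile in y
# (a simple cumulative sum stands in for the quadrature).
u_jet = -0.8 * np.exp(-(((y - 5.0) / 2.0) ** 2))
psi1 = np.tile(np.cumsum(u_jet) * dx, (80, 1))
psi2 = np.zeros_like(psi1)                  # lower layer starts at rest

# Barotropic cyclone, eq. (43), added to both layers.
radius = np.sqrt((X - 12.5) ** 2 + (Y - 8.5) ** 2)
cyclone = -2.0 * np.exp(-((radius / 2.0) ** 2))
psi1 += cyclone
psi2 += cyclone
```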

5.2 Error Statistics

An error-covariance model is specified for the generation of pseudo-random fields. It is used to determine a perturbed initial condition, the initial ensemble, and the system noise, and it is given as
\[
  \mathrm{cov}(x_1, x_2) = e(x_1)\, e(x_2) \exp\left( -\frac{\| x_1 - x_2 \|^2}{r_d^2} \right), \tag{44}
\]

where the horizontal de-correlation length is $r_d = 3.0$. Note that the errors decrease to zero at the closed boundaries, where the stream function must be kept constant to avoid cross-boundary flow (see, e.g., the initial variance plots in Figure 3).

The initial condition for the data-assimilation runs is generated by adding a smooth pseudo-random field, with zero mean and variance equal to 0.2, to the initial condition for the reference case. Further, an ensemble of pseudo-random fields is then added to this new initial condition to create the initial ensemble. The resulting mean of the initial ensemble is shown in Figure 2 at $t = 0.0$. The vertical correlation has been set to zero initially, to allow the model dynamics to develop the proper vertical correlation structure. For this reason the vertical correlation is also set to zero in the system noise, which is simulated by adding pseudo-random fields to each of the ensemble members every time step. These random fields are also Gaussian distributed, having mean zero and variance equal to 0.0005. The ensemble size is 500. The standard errors in a sample of $N$ estimates are of order $1/N^{1/2} \approx 0.04$. Indeed, increasing the ensemble size resulted in no visual changes in the results shown.
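One way to draw pseudo-random fields with the covariance model (44) is by factoring the covariance matrix directly, as sketched below. This brute-force approach is only meant to illustrate the statistics, and the function and variable names are hypothetical.

```python
import numpy as np

def pseudo_random_fields(coords, e, r_d, n_fields, rng):
    """Draw smooth random fields with the covariance model (44).

    coords : (n, 2) grid-point positions
    e      : (n,) standard-deviation envelope e(x), zero at closed boundaries
    r_d    : horizontal de-correlation length
    """
    dist2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    cov = np.outer(e, e) * np.exp(-dist2 / r_d**2)
    # A small jitter keeps the Cholesky factorization of the
    # (semi-definite) covariance numerically stable.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(coords)))
    return (chol @ rng.standard_normal((len(coords), n_fields))).T
```

For the 80 × 40 grid used here a dense factorization is still affordable; much larger grids would call for a spectral (FFT-based) sampler instead.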

5.3 Ensemble Kalman Filter

First an example is given using the ensemble Kalman filter, for comparison with the smoother solution. The ensemble Kalman filter is the optimal sequential method as long as a variance-minimizing analysis scheme can be used, i.e., as long as the ensemble is close to a Gaussian distribution.

A time series of the estimated upper- and lower-layer stream function is shown in Figure 2. Clearly,

the ensemble Kalman filter is capable of keeping the solution on track, and all significant features are reproduced in the correct location. Note also that the lower-layer stream function is very well reconstructed, even though the initial ensemble did not contain any correlation between the two layers.

The time evolution of the variance estimate is shown in Figure 3. The initial variance is fairly constant in the center of the channel and declines to zero close to the channel boundaries. The upper-layer variance is strongly reduced after the first analysis step, and a low variance is maintained during the simulation; thus the upper layer is controlled by the measurements. On the other hand, the influence on the lower layer is rather weak in the first analysis step, because the vertical influence functions have not yet had enough time to develop properly. However, after the analysis at $t = 20$ the variance has been reduced to a significantly lower value, and proper influence functions have been established. Such an increase in the vertical correlation can be expected in this case, where there is no topographic influence on the lower layer and no surface forcing has been used. The flows in the two layers are therefore expected to adjust to each other; e.g., an eddy in the upper layer will spin up an eddy in the lower layer. Note also the increased variance in between measurement times at $t = 25.0$.

5.4 Ensemble Smoother

Now the same example is examined using the linear ensemble smoother proposed in Section 4.2. The estimated stream function is shown in Figure 4. Again it is clear that the solution is relatively close to the reference solution, and that the smoother is capable of controlling the evolution. As can be expected, the smoother performs better than the filter at small integration times. For instance, at $t = 0.0$ the eddies at (65, 10) and at (72, 32), which are not present in the reference case, are weaker in the smoother case than in the filter case. Hence, there is an influence of future data.

A surprising result is that the smoother estimate is poorer than the filter estimate after some integration time. This observation is supported by the variance estimates given in Figure 5, which clearly show a higher variance than in the filter experiment. This interesting result can be further illustrated by examining a time series of mean square errors (the trace of the error covariance matrix) in Figure 6. The variance of the prior estimate grows exponentially during the initial integration and thereafter starts to level off. The filter variance has the characteristic behavior that has been observed in previous applications with the extended Kalman filter [Evensen, 1992] and the ensemble Kalman filter [Evensen, 1994b]. The ensemble smoother estimate has some of the characteristic behaviors of the experiments with the Kalman smoother with a linear quasi-geostrophic model by Bennett and Budgell [1989], i.e., there are local minima of the variance at measurement locations, and the variance estimate is continuous in time.

However, there are some remarkable differences too. In Bennett and Budgell [1989] it was found that the smoother variance was equal to the filter variance at the final time, and that the smoother variance at all other times was lower than the filter variance. These are characteristics of linear dynamics and are clearly not valid for nonlinear dynamics. Note that this is not due to the ensemble approach. A linear model with stable dynamics will have a prior estimate whose variance is constant in time, or at most linearly growing if model errors are included. This can be illustrated by considering a linear advection equation with a constant advection velocity: all ensemble members will conserve their shape and propagate with the same phase speed, and thus the ensemble will not spread. The ensemble smoother will clearly give exactly the same result as the Kalman smoother for linear dynamics in the limit of an infinite ensemble size.

The poor performance of the ensemble smoother compared to the filter can best be explained by the poor prior estimate, illustrated by its large variance in Figure 6. In the filter estimate the complete ensemble, and thus the complete prior distribution, is "pulled" towards the data at every data point. The posterior or analyzed distribution then has a much lower variance. It will grow exponentially again until the next data point. If enough data points are present, the variance increase due to this exponential growth will not be that large, so that the next analysis will pull it again below the smoother variance.

Another problem with the ensemble smoother when used with quasi-geostrophic dynamics concerns the propagation of energy towards large wavelengths. When the ensemble for the prior error statistics evolves freely, as in the ensemble smoother, it will result in an ensemble of rather smooth samples with a correspondingly wide correlation function. (In more advanced primitive-equation ocean circulation models this dynamical-smoothing problem will not appear.) Note that the reference run does not show this smoothing yet, because its initial conditions are such that the instabilities still have to grow. This is not a problem in the ensemble Kalman filter run, since the analysis yields stream functions with scales similar to those contained in the measurements. This problem actually means that the model is biased towards solutions that are too smooth. This bias can be removed by adding forcing and dissipation to the model, but we will not elaborate further on this here.

5.5 Direct Ensemble Method

We also computed the direct ensemble estimate based on the method outlined in Section 4.1. Note that this computation can be performed in parallel with the ensemble smoother solution, since all required information has already been stored for the smoother calculations.

The results of the computation were not good and are not shown here. The direct ensemble estimate is a weighted average of the individual ensemble members, where the weights are determined by the "distance" of each ensemble member from the observations. The problem with this method is that only a few (4 to 5) members are relatively close to the observations, while the rest of the ensemble members are too far off and hence obtain a very low weight. This causes them to have a negligible influence on the smoother estimate. It is believed that a very large ensemble is needed to get a reasonable estimate using this method. We will elaborate on this in the next section.

6 Summary and Discussion

In this paper inverse methods were formulated in a probabilistic framework. The idea is that the pure model evolution provides a prior distribution, which is modified by adding data to the problem. The resulting posterior distribution is determined by Bayes' theorem. For Gaussian-distributed errors and linear model dynamics the representer method is rederived as the maximum-likelihood estimator. For nonlinear model dynamics the errors in the variables will not remain Gaussian. A minimum-variance estimator was formulated, and two new methods to determine smoothers were presented. The results were compared with the ensemble Kalman filter.

A simple smoother was determined using a frequency interpretation of the posterior distribution, the so-called direct ensemble smoother. In this smoother each ensemble member is weighted with its "distance" to all data points in the integration interval. The advantage of the method is that only three model fields have to be stored at all times. It is also very easy to add or remove a data point in the final solution. Finally, the integration interval can easily be extended without having to perform the complete model integrations again; the method behaves as a filter in this respect.

We also formulated the best linear unbiased ensemble smoother for nonlinear model dynamics. The resulting equations are similar to those of the representer method for linear model dynamics. However, in this case the representers and the representer matrix are calculated from an ensemble of nonlinear model integrations. We showed that the method can be derived from the minimum-variance estimator by assuming Gaussian error statistics for the prior distribution. A remaining term for the non-Gaussian part of the prior distribution turned out to be very hard to evaluate, so it is at this stage unclear how good the estimator is.

The new methods were tested with a two-layer quasi-geostrophic model in an eddy-jet interaction case. We found that the so-called direct ensemble smoother did not work with 500 ensemble members. It turned out that only 4 to 5 members were close enough to the data to contribute to the posterior distribution. This means that the posterior estimate was very poor. One way to solve this is to increase

the ensemble size significantly. To obtain a realistic posterior estimate it is expected that at least 100 members close to the data are necessary. This means that 10000 members, or probably more, are needed, which is computationally impossible. Another way to address this problem may be to change the data-error covariance, which serves as the weighting function for the posterior estimate. It is clear that one bad data point can completely ruin the solution by selecting the wrong members. It is believed that the tails of the Gaussian distribution are too thin, so a Lorentz profile may do better. This will be investigated in a following paper.

The ensemble smoother seemed to work fine. One may wonder why the direct method needs a larger ensemble than the ensemble smoother. For the ensemble smoother only the mean and the covariance are used: what is actually done is that a certain shape of the posterior distribution is chosen, namely a Gaussian. In the direct method such an assumption is not made; it uses all information contained in the ensemble.

We compared the results obtained with the ensemble smoother with those of the ensemble Kalman filter, which was shown by Evensen and van Leeuwen [1996] to work fine with 500 ensemble members. Contrary to our expectations, the filter was superior to the smoother, at least at later times. The poor performance of the ensemble smoother compared to the filter can best be explained by the poor prior estimate, illustrated by its large variance. In the filter estimate the complete ensemble, and thus the complete prior distribution, is "pulled" towards the data at every data point. The posterior or analyzed distribution then has a much lower variance, and the members are all "close together" again. Its variance will grow exponentially again until the next data point. If enough data points are present, the variance increase due to this exponential growth will not be that large, so that the next analysis will pull it again below the smoother variance.

It is interesting to compare the methods with the well-known representer method. The representer method has been applied for solving nonlinear inverse problems by Bennett and Thorburn [1992], Bennett et al. [1993], and in a manuscript by Eknes and Evensen (submitted to the J. Atmos. Sci. Japan, 1995). A weak-constraint variational formulation like (16) is used for the model and measurements (1)-(4), with the inherent assumption of Gaussian prior error statistics for data and model dynamics. This results in a system of nonlinear Euler-Lagrange equations, for which a sequence of linear iterates is defined, where each linear iterate can be solved using the representer method. Again the representers can be calculated from an ensemble integration for each iterate. The first-guess field changes in each iteration because the advection velocity and the forcing change. However, the equation for the first-guess field remains linear. This means that if the initial condition is Gaussian distributed, then so will

the prior probability distribution be at all times. Thus, the traditional representer method is devised so that the prior distribution is Gaussian for every iteration. This will be true for the first iteration, but it is unclear what the initial distribution is in later iterates. To summarize, in the representer method an iterative procedure is used to find the unknown first-guess field and its covariance, i.e., the mean and covariance of a Gaussian distribution. The ensemble smoother proposed here generates a non-Gaussian ensemble, from which the mean and the covariance are used to calculate the variance-minimizing analysis.

Another problem with the representer method for nonlinear problems is that, until now, attempts to prove the convergence of the iteration scheme have been unsuccessful. It is, however, possible to prove that the iterative sequence of solutions is bounded. In the ensemble methods discussed here convergence is not a problem, because no iterations are performed. However, the ensemble method needs to converge with respect to the ensemble size.

An interesting remark about the Euler-Lagrange equations resulting from the weak-constraint inverse problem for a nonlinear model concerns the appearance of an additional term in the adjoint equation. For the quasi-geostrophic models used in Bennett and Thorburn [1992] and Bennett et al. [1993] the term arises from the functional variation of the advecting velocity. One can often solve the nonlinear system (1)-(3), which is the pure model integration, by defining an iteration, e.g., by evaluating the advecting velocities in the previous iterate. Accordingly, one can define a sequence of inverse problems, one for each of the linear iterates. Each of the linear inverse problems can again be solved exactly using the representer method. The interesting point is that, by defining a convergent sequence of linear inverse problems, one still has to solve exactly the same Euler-Lagrange equations as in the traditional generalized inverse formulation, except that one avoids the term resulting from the functional variation of the advecting velocity.

To conclude, there are various ways of formulating an inverse problem for nonlinear dynamical models. The traditional nonlinear generalized inverse formulation, a convergent sequence of inverse problems, and a probabilistic formulation all yield different solutions, but all are mathematically consistent. Note also that the representer solution of the generalized inverse formulation integrates linearized equations, and thus may give a solution which is different from that obtained when the same integral is minimized directly using a so-called substitution method. Clearly, the different formulations and solution techniques for nonlinear inverse problems need further investigation and intercomparison.


A Representer Calculations Based on Ensemble Statistics

The ensemble of model states is integrated over the complete time interval of interest, but the prior error covariance is not calculated explicitly. This would be a huge computation, and the equations in Section 4.2 show that only measurements of $Q_{\psi\psi}$ are needed. The actual algorithm can be illustrated as follows. Assume that $\psi$ is represented on a discrete numerical grid in space and time with a total of $N_t$ unknowns. Let us then store each of the $n$ ensemble members in the columns of a matrix $A \in \Re^{N_t \times n}$.
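A sketch of this bookkeeping, complementing the analysis step shown in Section 4.2 and assuming a generic measurement functional of our own naming, is given below; it forms the measured covariances without ever building the full $N_t \times N_t$ prior covariance.

```python
import numpy as np

def measured_covariances(A, measure):
    """Compute the representers L[Q] and the representer matrix from A.

    A       : (N_t, n) matrix with one ensemble member per column
    measure : function mapping an (N_t,) trajectory to its (m,) measurements
    """
    n = A.shape[1]
    psi_F = A.mean(axis=1)                  # ensemble mean / first guess
    Ap = A - psi_F[:, None]                 # anomalies
    M = np.column_stack([measure(Ap[:, j]) for j in range(n)])  # (m, n)
    r = Ap @ M.T / (n - 1)                  # representers L[Q], (N_t, m)
    R = M @ M.T / (n - 1)                   # representer matrix, (m, m)
    return psi_F, r, R
```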
