
MONTHLY WEATHER REVIEW

VOLUME 137

The Diffusion Kernel Filter Applied to Lagrangian Data Assimilation

PAUL KRAUSE
Department of Mathematics, The University of Arizona, Tucson, Arizona

JUAN M. RESTREPO
Department of Mathematics, and Department of Physics, The University of Arizona, Tucson, Arizona

(Manuscript received 25 November 2008, in final form 25 June 2009)

ABSTRACT

The diffusion kernel filter is a sequential particle-method approach to data assimilation of time series data and evolutionary models. The method is applicable to nonlinear/non-Gaussian problems. Within branches of prediction it parameterizes small fluctuations of Brownian-driven paths about deterministic paths. Its implementation is relatively straightforward, provided a tangent linear model is available. A by-product of the parameterization is a bound on the infinity norm of the covariance matrix of such fluctuations (divided by the grid model dimension). As such it can be used to define a notion of "prediction" itself. It can also be used to assess the short-time sensitivity of the deterministic history to Brownian noise or Gaussian initial perturbations. In pure oceanic Lagrangian data assimilation, the dynamics and the statistics are nonlinear and non-Gaussian, respectively. Both of these characteristics challenge conventional methods, such as the extended Kalman filter and the popular ensemble Kalman filter. The diffusion kernel filter is proposed as an alternative and is evaluated here on a problem that is often used as a test bed for Lagrangian data assimilation: it consists of tracking point vortices and passive drifters, using a dynamical model and data, both of which have known error statistics. It is found that the diffusion kernel filter captures the first few moments of the random dynamics, with a computational cost that is competitive with a particle filter estimation strategy. The authors also introduce a clustered version of the diffusion kernel filter (cDKF), which is shown to be significantly more efficient with regard to computational cost, at the expense of a slight degradation in the description of the statistics of the dynamical history.
Upon parallelizing branches of prediction, cDKF can be computationally competitive with EKF.

1. Introduction

The diffusion kernel filter (DKF) is a particle-based filtering strategy for continuous-time models and discrete-time data. The type of models and observations we focus on here will, in general, have explicit error terms; for example, the error term in the model can represent unresolved physical processes. The statistics of the model and observation errors are presumed known. The data assimilation problem is specified in section 2. The state vector for which an estimator is sought can consist of dynamic quantities (e.g., physical variables) as well as parameters. Hence, the dimension of the state variable

Corresponding author address: Juan M. Restrepo, Department of Mathematics, and Department of Physics, The University of Arizona, Tucson, AZ 85721. E-mail: [email protected] DOI: 10.1175/2009MWR2889.1 © 2009 American Meteorological Society

and that of the dynamic variable may differ. The dimension of the measurement vector may be different from that of the state vector as well. The DKF yields the first few moments of the time-dependent state vector, conditioned on observations. It does so without assuming linearity in the dynamics or Gaussianity in the statistics. However, the DKF is applicable to small diffusion processes. It is derived by a reformulation of the Itô nonlinear stochastic ordinary differential equation problem for diffusion processes into a Liouville stochastic partial differential equation problem. An application of Duhamel's principle to the Liouville problem and a splitting into deterministic and fluctuating components of branches of prediction is then used, within a Bayesian particle-filter structure, to produce a sample-based filtering method. The method was inspired by the work of Chorin et al. (2002), where a similar technique was used to tackle the dimension

DECEMBER 2009

KRAUSE AND RESTREPO

reduction problem for the dynamics of a system of nonlinear ordinary differential equations. The DKF is described in section 3, and the algorithmic aspects are covered in section 4. The DKF is built upon a diffusion kernel. The norm of the diffusion kernel, which we call the uncertainty norm, can be used independently of the filtering methodology proposed here to find an upper bound on the covariance matrix ∞-norm of branches of prediction. The notion of branches of prediction will be made clear in section 3. The uncertainty norm is presented in section 3, and applied in section 5 to test the degree of complexity of the dynamics of a dynamical system. For linear dynamics and Gaussian error statistics an optimal smoother of the history is provided by variance-minimizing least squares, the variational data assimilation approach, or the Kalman filter/smoother [see Wunsch (1996) for details on these]. Two commonly used techniques in nonlinear and weakly non-Gaussian contexts are the extended Kalman filter–smoother (EKF/S) and the ensemble Kalman filter–smoother (EnKF/S). Ensemble Kalman approaches and the variational approach [i.e., four-dimensional variational data assimilation (4DVar)] are presently being evaluated in operational forecast models for weather and climate [see Gustafsson (2007) and references contained therein]. The extended Kalman filter/smoother uses a linearized forecast, followed by a linear (Gaussian) analysis. The ensemble Kalman filter, on the other hand, makes a nonlinear forecast, but retains the use of the Gaussian analysis [see Evensen (1997) and references contained therein]. There are a number of data assimilation strategies that specifically target problems in which nonlinearity and/or non-Gaussianity pose significant assimilation challenges.
Among them are the optimal approach of Kushner (1962) [see as well Kushner (1967a,b), Stratonovich (1960), and Pardoux (1982)] and the near-optimal mean-field variational strategy of Eyink et al. (2004) [see also Eyink and Restrepo (2000)]. Optimality here refers to the fact that the methods minimize the trace of the conditioned posterior variance. There are a variety of different particle methods (also called sequential Monte Carlo methods), such as those alluded to in Van Leeuwen (2003), and parametric resampling methods, such as that of Kim et al. (2003). [Arulampalam et al. (2002) provide a tutorial on particle methods.] There are also direct probability density function sampling methods, for example, the method introduced by Pham (2001), the path-integral method of Alexander et al. (2005) and Restrepo (2008), and the Langevin sampling method of Hairer et al. (2007). In the above-mentioned sample-based methods the aim is to obtain truly representative samples of the posterior distribution. These are then used to compute


sample moments. A common trait of all the methods mentioned above is that they are computationally intensive and thus deemed impractical in problems for which the state vector is large in dimension. This computational limitation, however, should not be construed as a complete practical failure; for example, not every time-dependent estimation problem of interest has dimensions comparable to those of a weather forecasting problem. Furthermore, a method capable of handling nonlinearity and non-Gaussianity in a computationally robust way is an essential tool, serving as a benchmark with which to evaluate the relative merits of alternative assimilation strategies. The first aim of this paper is to show how this new DKF methodology performs on nonlinear/non-Gaussian filtering problems. Its estimates will be compared to a benchmark solution. We will also compare DKF estimates to those produced by the extended and ensemble Kalman filters (EKF and EnKF, respectively). The DKF will be shown to perform well in terms of the statistical consistency provided by the benchmark and to distinguish itself from other methods applicable to nonlinear/non-Gaussian problems in its ability to handle problems of larger dimension. We will also show that the DKF is competitive with the benchmark in operations count and, in a clustered version, which we test here, is significantly more efficient. We will compare the DKF estimates to those obtained by Kuznetsov et al. (2003, hereafter KIJ) on a Lagrangian data assimilation problem that is frequently used to test Lagrangian estimation ideas. See also Ide et al. (2002) and Özgökmen et al. (2000) for related work; KIJ coined their method "oceanic Lagrangian data assimilation." Their methodology, as applied to a fluid–drifter problem, consists in posing the dynamics in a Lagrangian frame: the flow is entirely characterized by the dynamics of two-dimensional interacting nondissipative vortices.
The resulting system consists of a reduced representation of the dynamics, in the form of a system of differential equations. (The issue of how noise and uncertainties translate from the Eulerian frame, in which the dynamics are given by partial differential equations, to the Lagrangian frame is not addressed, however.) A very appealing aspect of this approach is that the system is now amenable to powerful dynamical systems analytical and sensitivity tools. The use of a Lagrangian framework is also inspired by real and practical circumstances; namely, this approach is compatible with a current trend in oceanic data-gathering strategies: the oceans are presently, and will be in the near future, measured by a variety of active and passive platforms and drifters, which themselves have possibly complex pathfollowing dynamics.


As a general rule it is expected that the recasting of oceanic flows in purely Lagrangian terms will lead to highly nonlinear dynamics and non-Gaussian statistics. If estimation techniques are to be used to complement the analysis of these types of problems, or deal sensibly with their sensitivity to initial conditions, an estimation technique that can handle nonlinear dynamics and/or non-Gaussian statistics would be needed. At the same time, a geophysically viable method must also be significantly more efficient computationally than the aforementioned nonlinear–non-Gaussian methods. This will allow consideration of problems of oceanic interest that have state variables of reasonable size. The estimation strategy chosen by KIJ in the drifter–vortex problem that they focus on is known as the constrained extended Kalman filter [the drifter equations are effectively constraints on the dynamics of the point vortices; see Simon and Chia (2002) and references contained therein]. Kuznetsov and collaborators report satisfactory performance from their estimation strategy. However, they also point out that the method fails in exceptional circumstances. They were tentative regarding how and why their method failed. Most practical or potentially practical methods for data assimilation in the context of nonlinear–non-Gaussian processes are ad hoc or low-order approximations to random processes. This includes EKF and EnKF. It also includes DKF. A common measure of success of an estimator is the closeness of its estimate to the truth, which we denote here the estimate error norm. KIJ use this measure to assess their assimilation strategy. Between different classes of estimators, this is certainly a telling measure of success–failure. However, probabilistic estimators of the same class are better evaluated by computing the distance between their moment estimates and those produced by a benchmark method. 
We will denote this alternative measure the n-moment error norm, where n is the highest moment under consideration. The estimate error norm is commonly used in practical data assimilation. However, it is common to have no information on how large or small this norm should be in order to consider an assimilation a success or a failure. The n-moment error norm does not provide such an absolute measure either; however, it is reasonable to compare an assimilation to a benchmark in order to qualify an alternative method's estimation abilities. Moreover, the n-moment error norm conveys better statistical consistency than a norm that simply looks at the first moment alone. With statistical consistency it is possible to define predictors that are more faithful to the physics than the simple average prediction; for example, the average entropy may be such a predictor (to be defined shortly). In the context of the simulations on


Lagrangian data assimilation of section 5 we will take the opportunity to discuss and illustrate how these two metrics compare and assess the quality of an assimilation simulation. The bootstrap filter estimator (BF) is chosen as a benchmark because it has been shown to be convergent (see Crisan et al. 1999). In section 5 it will be compared with the extended Kalman filter (EKF) methodology advocated by KIJ, with our proposed alternative, the DKF, and with the EnKF, since the latter is becoming widely used and is often proposed as a viable alternative in nonlinear problems. In section 6 we describe this alternative implementation, which we call the clustered DKF (cDKF). We show that in some cases the cDKF can be competitive in computational efficiency with the EKF, yet outperforms the EKF with regard to the quality of the answer. In section 7 we summarize the computational expense incurred in the DKF as well as the cDKF. We show that the DKF is competitive with the bootstrap filter. However, by clustering along branches of prediction the DKF can be made significantly more efficient computationally, at the expense of a modest deterioration in the estimate. With its potential to provide a robust approach to nonlinear–non-Gaussian problems and to handle problems with state vectors of modest yet physically and scientifically significant size, it is suggested that DKF–cDKF may be a practical tool for moderate-dimension data assimilation, where nonlinearity is an issue. Such is the case in moderately sized Lagrangian data assimilation studies.

2. Problem statement

Here we consider the determination of the statistics of the state variable x(t), whose dimension is N, given incomplete and possibly imprecise observations of that system. Here t is an indexing parameter, taken to represent time. The state vector x is assumed to satisfy the stochastic differential equation (SDE)

$$dx(t) = f[x(t)]\,dt + g\,dw, \qquad x(t_0) = x_0, \qquad t > t_0, \qquad (1)$$

where $g := (2D)^{1/2}(x, t)$ is the diffusion matrix, and $2D$ is the diffusion stress tensor, of dimension $N \times N$. The deterministic part of the dynamics of x is given by f. The term dw is a Wiener incremental process of dimension N with independent components. Here g is assumed known, as is the distribution of the initial state $x_0$. Observations at discrete times $t_k$ are denoted by the $N_y$-dimensional vector

$$y(t_k) \equiv y_k, \qquad k = 1, 2, \ldots, K, \qquad t_0 \le t_k \le t_f.$$

The relationship between the observations and the state variable at different times is given by

$$y_k = h[x(t_k)] + e_k, \qquad (2)$$

where $h: \mathbb{R}^N \to \mathbb{R}^{N_y}$ and $e_k$ is an $N_y$-dimensional zero-mean noise vector with a known statistical distribution. We will use a Gaussian distribution for the measurement errors in the computation examples presented later. The filtering problem consists of sampling the probability density

$$p(x \mid y_k, \ldots, y_1), \qquad k = 1, 2, \ldots, K,$$

for any $t \ge t_k$, and using these to produce sample moments of the history $x(t) \mid y_k, \ldots, y_1$. The sequential Bayesian approach consists of associating model outcomes to the prior and the data to the likelihood, using these to compute or approximate the conditional probability

$$p(x \mid y_k) = \frac{p(y_k \mid x)\, p(x)}{p(y_k)}, \qquad (3)$$

when data at $t_k$ become available. The diffusion kernel will be used within the particle filter framework to propose the DKF, a filtering strategy.
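As a rough illustration of (1)–(3), the following minimal Python sketch integrates the SDE with the Euler–Maruyama scheme and computes normalized Gaussian likelihood weights. The drift, diffusion matrix, observation operator, and all parameter values are hypothetical placeholders for illustration, not those used in the paper.

```python
import numpy as np

def euler_maruyama(f, g, x0, t0, t1, dt, rng):
    """Integrate dx = f(x) dt + g dw from t0 to t1; g is a constant N x N matrix."""
    x = np.array(x0, dtype=float)
    n_steps = int(round((t1 - t0) / dt))
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)  # Wiener increment
        x = x + f(x) * dt + g @ dw
    return x

def gaussian_likelihood_weights(y, h, samples, r2):
    """Weights for (3): p(y | x^(i)) with y = h(x) + e, e ~ N(0, r2 I), normalized."""
    w = np.array([np.exp(-0.5 * np.sum((y - h(x)) ** 2) / r2) for x in samples])
    return w / w.sum()

rng = np.random.default_rng(0)
f = lambda x: -x                      # toy linear drift (illustrative only)
g = 0.1 * np.eye(2)
samples = [euler_maruyama(f, g, [1.0, 0.0], 0.0, 1.0, 1e-3, rng) for _ in range(100)]
w = gaussian_likelihood_weights(np.array([0.4, 0.0]), lambda x: x, samples, 0.01)
```

The weights `w` then play the role of the discrete posterior mass in (3), given equally weighted prior samples.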

FIG. 1. Schematic representation of branches of prediction. The solution for each branch $i = 1, \ldots, I_k$ of prediction is the set of sample paths $F_j[\xi(t_k, i), t] = \phi[\xi(t_k, i), t] + F'_j[\xi(t_k, i), t]$, $j = 1, \ldots, J_{k,i}$. The solid lines depict the deterministic trajectories $\phi[\xi(t_k, i), t]$ emanating from the sample states $\xi(t_k, i)$; the dashed lines depict sample paths emanating from the sample states $\xi(t_k, i)$; $\mathbb{R}^N$ is configuration space, for the N-dimensional real state variable.

3. The diffusion kernel

Referring to Fig. 1, let $F_j[\xi(t_k, i), t]$ be a set of sample paths that obey

$$dx = f(x)\,dt + g(x)\,dw, \qquad t \in (t_k, t_{k+1}), \qquad x(t_k) = \xi(t_k, i) \in \mathbb{R}^N, \qquad (4)$$

where the $\xi(t_k, i)$ are realizations of the conditional density at time $t_k$. In the time interval $t \in (t_k, t_{k+1})$ the branches are indexed by $i = 1, \ldots, I_k$ and a particular sample path within branch i is identified by $j = 1, \ldots, J_{k,i}$. Each branch of prediction is written in the form $F = \phi + F'$, where $\phi[\xi(t_k, i), t]$ is the deterministic path of branch i, that is, with initial value $\xi(t_k, i)$, that satisfies

$$\frac{dx}{dt} = f(x), \qquad t \in (t_k, t_{k+1}). \qquad (5)$$

As shown by Krause (2009), the fluctuations about the ith deterministic path are

$$F'[\xi(t_k, i), t] \approx \nabla\phi[\xi(t_k, i), t] \int_{t_k}^{t} g\{\phi[\xi(t_k, i), s]\}\, dw(s - t_k) \qquad (6)$$

$$= \int_{t_k}^{t} G[\xi(t_k, i), t, s]\, dw(s - t_k), \qquad (7)$$

where $\nabla\phi$ is the tangent linear model solution, whose initial value is the identity matrix, and G in (7) is the diffusion kernel. If the diffusion matrix happens to be a constant, $g = g_0$ say, then

$$F'[\xi(t_k, i), t] \approx \nabla\phi[\xi(t_k, i), t]\, g_0\, w(t - t_k).$$

In this particular case the diffusion kernel is equal to the fluctuating field of the particular branch of prediction.

a. The tangent linear model

The tangent linear model is needed in order to compute the diffusion kernel. It is obtained by solving

$$\frac{d\nabla\phi}{dt} = \nabla f(\phi)\, \nabla\phi, \qquad t \in (t_k, t_{k+1}), \qquad \nabla\phi = I \ \text{ at } \ t = t_k, \qquad (8)$$

where I is the $N \times N$ identity matrix. Obtaining the gradient of f is a nontrivial task, however. Automatic differentiation (see Bischoff et al. 1992; Giering 1999) and checkpointing (see Restrepo et al. 1998) are tools that have greatly simplified obtaining it.
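The coupled integration of the deterministic path (5) and the tangent linear model (8) can be sketched as follows; here the Jacobian `jac` is supplied analytically, whereas for realistic models the text points to automatic differentiation. The drift and the matrix A are illustrative choices, not from the paper; for a linear drift $f(x) = Ax$ the tangent linear solution is the matrix exponential $e^{A(t_1 - t_0)}$, which provides a check.

```python
import numpy as np

def heun_path_and_tlm(f, jac, x0, t0, t1, dt):
    """Integrate the deterministic path (5) and the tangent linear model (8)
    with the Heun (trapezoidal predictor-corrector) scheme.
    f: drift; jac: its Jacobian grad f. Returns (phi, grad_phi)."""
    x = np.array(x0, dtype=float)
    Phi = np.eye(len(x))                       # grad phi = I at t = t_k
    n = int(round((t1 - t0) / dt))
    for _ in range(n):
        kx1, kP1 = f(x), jac(x) @ Phi
        xp, Pp = x + dt * kx1, Phi + dt * kP1  # predictor step
        kx2, kP2 = f(xp), jac(xp) @ Pp
        x = x + 0.5 * dt * (kx1 + kx2)         # corrector step
        Phi = Phi + 0.5 * dt * (kP1 + kP2)
    return x, Phi

# Linear check: f(x) = A x with a rotation generator, so
# grad phi(t1) = [[cos t, sin t], [-sin t, cos t]] for t = t1 - t0.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
x1, Phi1 = heun_path_and_tlm(lambda x: A @ x, lambda x: A, [1.0, 0.0], 0.0, 1.0, 1e-3)
```

Heun is second-order accurate, so with this step size the computed `Phi1` agrees with the exact rotation matrix to well within plotting accuracy.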

b. The uncertainty norm

We call $\|G\|[\xi(t_k, i), t]$ the uncertainty norm associated with the ith branch of prediction. It is defined mathematically as

$$\|G\|[\xi(t_k, i), t] := \max_{r=1,\ldots,N} \sqrt{\int_{t_k}^{t} E\{G_r[\xi(t_k, i), t, s]\, G_r^*[\xi(t_k, i), t, s]\}\, ds},$$

where $G_r$ and $G_r^*$ refer to a row of G and its conjugate transpose. The symbol E denotes an expectation operator. The norm $\|G\|[\xi(t_k, i), t]$ was shown in Krause (2009) to bound the infinity norm of the covariance matrix of branch i, divided by N. Specifically,

$$\frac{1}{N} \|\mathrm{Cov}(F)\|_{\infty} \le \|G\|^2, \qquad t_k \le t \le t_{k+1}. \qquad (9)$$

This result holds as long as (6) is a good approximation. In the special case where the diffusion matrix is constant, $g = g_0$, one has

$$\|G\|(t) = \sqrt{t - t_k}\, \max_{r=1,\ldots,N} \left\| [\nabla\phi(t)]_r\, g_0 \right\|_2, \qquad (10)$$

where the dependence of both $\|G\|$ and $\nabla\phi$ upon $\xi(t_k, i)$ was omitted. The second term in (10) reads as the maximum fluctuation over all state components obtained at time t after a set of initial perturbations given by the column vectors of $g_0$. The key observation is that $\|\mathrm{Cov}(F)\|_{\infty}$ will be small whenever $\|G\|$ is small; that is, for small diffusion kernels within branches of prediction. The uncertainty norm can thus be used to obtain a bound on the covariance matrix of the state within a branch of prediction, independent of its use in the context of filtering. Furthermore, its connection to uncertainty makes it useful in defining quantitatively a notion of prediction. Further on we will illustrate how it compares to the more traditional average and likely notions of prediction. We will also use it to infer the complexity of the dynamics in the example calculations.
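For the constant-diffusion special case (10), the uncertainty norm reduces to a one-line computation over the rows of the tangent linear solution; the function name and the test values below are illustrative, not from the paper.

```python
import numpy as np

def uncertainty_norm_const_g(grad_phi, g0, t_minus_tk):
    """Uncertainty norm (10) for a constant diffusion matrix g0:
    ||G||(t) = sqrt(t - t_k) * max_r || [grad phi(t)]_r g0 ||_2,
    where [grad phi]_r denotes the r-th row of the tangent linear solution."""
    rows = grad_phi @ g0  # r-th row of this product is [grad phi]_r g0
    return np.sqrt(t_minus_tk) * np.max(np.linalg.norm(rows, axis=1))

# With grad_phi = I the norm reduces to sqrt(t - t_k) times the largest row
# norm of g0: here sqrt(0.25) * 0.2 = 0.1.
g0 = 0.2 * np.eye(3)
val = uncertainty_norm_const_g(np.eye(3), g0, 0.25)   # 0.1
```

Per (9), `val**2` then bounds $\|\mathrm{Cov}(F)\|_\infty / N$ for that branch, as long as the linearization holds.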

4. The diffusion kernel filter

The DKF is a sample-based filtering method. Moments of the history are thus approximated by sampling averages. The samples themselves are obtained from the random variable

$$F(x) \mid (y_1, \ldots, y_k)(t), \qquad k = 1, \ldots, K,$$

for $0 \le t$ and $t_k \le t_f$. Here $t_k$ are the filtering times. Let $\Delta t$ be the time interval between filtering times, for simplicity taken here as constant and commensurate with the time stepping used in approximating solutions to (4). In what follows we describe the filtering algorithm as it proceeds from time $t_k$ to $t_{k+1}$. Let I denote the total number of sample paths, over all branches of prediction. This number is fixed in what follows (over the entire filtering process). In the time interval $[t_k, t_{k+1}]$, each branch of prediction $i = 1, 2, \ldots, I_k$ is, in turn, composed of $J_{k,i}$ sample paths, so that $I = \sum_{i=1}^{I_k} J_{k,i}$.

The DKF is built upon the structure of the BF, a particular particle filter discussed at length by Gordon et al. (1993). [See Arulampalam et al. (2002) for a tutorial on particle filters.] The choice of the bootstrap filter was dictated by simplicity and familiarity; however, it would not be difficult to adapt the DKF to alternative particle filters.

a. The BF algorithm

1) Samples of the prior distribution at $t_{k+1}$ are denoted by $F_j[\xi(t_k, i), t_{k+1}]$. They are obtained by numerically solving (4), $J_{k,i}$ times, with the initial condition

$$x(t_k) = \xi(t_k, i), \qquad (11)$$

where $i = 1, \ldots, I_k$.

2) The density function of (2) is used to compute the likelihood weights:

$$\varpi(t_{k+1}, i, j) := \frac{p(y_{k+1} \mid x^{(i,j)})(t_{k+1})}{\sum_{i,j} p(y_{k+1} \mid x^{(i,j)})(t_{k+1})}. \qquad (12)$$

The density function is known since the distribution of e is assumed known. Here, $x^{(i,j)} := F_j[\xi(t_k, i), t_{k+1}]$. The weights are associated with the likelihood.

3) Combining $F_j[\xi(t_k, i), t_{k+1}]$ and $\varpi(t_{k+1}, i, j)$ one obtains weighted samples of the posterior $p(x \mid y_{k+1})(t_{k+1})$, in accordance with (3).

4) Next we resample as well as prepare to continue to the next filtering stage. We perform the following operations.
- Using $x^{(i,j)}$, over all admissible i and j, we define a discrete random variable taking values (i, j) with mass function $\varpi(t_{k+1}, i, j)$ over all of the $x^{(i,j)}$. Then we draw I samples from this discrete random variable. Let $(i', j')$ be these samples, without repetition, and $J^{(i',j')}$ the number of times they are repeated.
- Relabel the sets $F_{j'}[\xi(t_k, i'), t_{k+1}]$ and $J^{(i',j')}$ as $\xi(t_{k+1}, i)$ and $J_{k+1,i}$, for $i = 1, 2, \ldots, I_{k+1}$.

A sequential algorithm is then obtained by repeating the process in the next filtering time interval.
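The resampling operation in step 4 can be sketched as follows; the helper name and the degenerate test weights are illustrative, not from the paper.

```python
import numpy as np

def bootstrap_resample(samples, weights, rng):
    """Step 4 of the BF: draw I indices with probabilities given by the
    likelihood weights, then return the distinct surviving states together
    with their multiplicities (the J^(i',j') of the text)."""
    I = len(samples)
    idx = rng.choice(I, size=I, p=weights)          # I draws from the weighted discrete law
    survivors, counts = np.unique(idx, return_counts=True)
    resampled = [samples[k] for k in survivors]     # samples without repetition
    return resampled, counts

rng = np.random.default_rng(1)
samples = [np.array([float(k)]) for k in range(5)]
w = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # degenerate weights: sample 2 always survives
states, counts = bootstrap_resample(samples, w, rng)   # states == [array([2.])], counts == [5]
```

The surviving states become the $\xi(t_{k+1}, i)$ and the multiplicities the $J_{k+1,i}$ of the next filtering interval.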


Built upon this bootstrap filter, the diffusion kernel filter can be implemented as follows.

b. The DKF algorithm

1) For $t_k \le t \le t_{k+1}$, (8) is solved, for $i = 1, 2, \ldots, I_k$, in order to obtain $\nabla\phi[\xi(t_k, i), t_{k+1}]$.

2) For $t_k \le t \le t_{k+1}$, (4)–(5) is solved, with g set to 0. The solution yields $\phi[\xi(t_k, i), t_{k+1}]$.

3) Draw $J_{k,i}$ samples $x'^{(i,j)}$ from the density of

$$F'[\xi(t_k, i), t_{k+1}] := \nabla\phi[\xi(t_k, i), t_{k+1}] \int_{t_k}^{t_{k+1}} g[s, w(s - t_k)]\, dw(s - t_k). \qquad (13)$$

4) Define $F_j[\xi(t_k, i), t_{k+1}] := \phi[\xi(t_k, i), t_{k+1}] + x'^{(i,j)}$, for $i = 1, \ldots, I_k$ and $j = 1, \ldots, J_{k,i}$, thus obtaining I samples of the prior probability density $p(x)(t_{k+1})$. [See Eq. (3).]

5) Proceed as in steps 2–4 of the BF algorithm, repeating the process at each filtering time.

The DKF algorithm introduces an approximation to the BF at the level of the stochastic dynamics in the prediction step: by linearizing the equations within each branch of prediction, the nonlinear dynamics are effectively approximated by a sum of Gaussian branches of prediction. As a general rule, in numerically solving the stochastic differential equation required in step 1 of the BF, one is forced to use a very small time step and/or a great many substage computations. In contrast, numerically integrating the deterministic ordinary differential equations in steps 1 and 2 of the DKF algorithm generally permits the use of a significantly longer time step. Depending on the required samples and the dimension of the state variable, exploiting this fact can have a significant impact on the computational expense of the DKF compared to the BF. It is noted that in step 3 of the DKF, if g is constant, the integral is a simple product. For simplicity, we will specialize the presentation in what follows to problems whose g is a constant matrix or a matrix that depends on time and the vector Wiener process $w(t - t_k)$ such that $\int_{t_k}^{t} g[s, w(s - t_k)]\, dw(s - t_k)$ can be computed analytically, as a function of time. (This precludes cases in which the history of this process has to be sampled.)

5. Example calculations

The estimation of the history and higher moments of the position of "floats" or drifters and the state of large-scale oceanic dynamics in the Lagrangian frame is what is called by KIJ "Lagrangian data assimilation." They specifically consider a two-dimensional ocean state, fully determined by the (nondissipative) dynamics of vortices, coupled to passive drifters. In what follows, the system for the time evolution of the centers of the vortices and the passive drifters has already been made nondimensional by scaling spatial coordinates $2(x, y)/\ell \to (X, Y)$ and time by $2\Gamma t/\pi\ell^2 \to t$. Here $\ell$ is the typical distance between the vortices (e.g., the distance between them at t = 0) and $\Gamma$ is the vortex strength. In compact form, the dynamics of the mth point vortex with space coordinates $[p_m(t), q_m(t)]$ at time t can be written in terms of complex coordinates $z_m(t) := p_m(t) + iq_m(t)$ as

$$\frac{dz_m}{dt} = \frac{i}{2\pi} \sum_{l=1,\, l \neq m}^{N_V} \frac{\Gamma_l}{z_m^* - z_l^*} + h_m^V(t), \qquad m = 1, 2, \ldots, N_V, \qquad (14)$$

where $\Gamma_m$ is the vortex strength of the mth vortex (all of them assumed hereon to be equal to $2\pi$). The dynamics of the nth passive tracer, with position coordinates $[v_n(t), w_n(t)]$, is given compactly in terms of $\zeta_n(t) := v_n(t) + iw_n(t)$ as

$$\frac{d\zeta_n}{dt} = \frac{i}{2\pi} \sum_{l=1}^{N_V} \frac{\Gamma_l}{\zeta_n^* - z_l^*} + h_n^P(t), \qquad n = 1, 2, \ldots, N_P. \qquad (15)$$

The star superscript denotes complex conjugate. In the above equations, $h_m^V = h_m^{V(p)} + i\,h_m^{V(q)}$ and $h_n^P = h_n^{P(v)} + i\,h_n^{P(w)}$ are stochastic terms. These may represent unresolved processes, for example, and are assumed known. They will be taken here as being independent white noise processes over time. We choose

$$E[h_{(\cdot)}^{(\cdot)}(t)\, h_{(\cdot)}^{(\cdot)}(t')] = Q\,\delta(t - t'),$$

where E denotes the expectation operator, the superscript $(\cdot)$ denotes the components of the elements indexed by the subscript $(\cdot)$ of the vectors $h_n^P$ or $h_m^V$. The matrix Q is assumed herein to be diagonal and constant: $Q := \mathrm{diag}(\sigma^2)$. In terms of the matrix g in (1) the relationship is $gg^T = Q\,dt$, which stems from writing $g\,dw = h\,dt$. The equation for the covariance C, in a linear setting, is

$$\frac{dC}{dt} = \nabla f\, C + (\nabla f\, C)^T + gg^T.$$

In terms of Q this equation then becomes

$$\frac{dC}{dt} = \nabla f\, C + (\nabla f\, C)^T + Q\,dt.$$
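The deterministic part of the vortex–drifter system (14)–(15) can be sketched with complex arithmetic as follows; the function name is ours, and the two-vortex configuration mirrors the example of this section (equal strengths $2\pi$, vortices at $(\mp 1, 0)$).

```python
import numpy as np

def vortex_drifter_drift(z, zeta, Gamma):
    """Deterministic part of (14)-(15):
    dz_m/dt    = (i/2pi) sum_{l != m} Gamma_l / (z_m* - z_l*),
    dzeta_n/dt = (i/2pi) sum_l      Gamma_l / (zeta_n* - z_l*)."""
    zc, Nv = np.conj(z), len(z)
    dz = np.zeros(Nv, dtype=complex)
    for m in range(Nv):
        for l in range(Nv):
            if l != m:
                dz[m] += (1j / (2 * np.pi)) * Gamma[l] / (zc[m] - zc[l])
    dzeta = np.array([np.sum((1j / (2 * np.pi)) * Gamma / (np.conj(zn) - zc))
                      for zn in zeta])
    return dz, dzeta

# Two equal vortices at (-1, 0) and (1, 0), one drifter at (0.7, 0.3):
z = np.array([-1.0 + 0j, 1.0 + 0j])
dz, dzeta = vortex_drifter_drift(z, np.array([0.7 + 0.3j]), 2 * np.pi * np.ones(2))
# dz == [-0.5j, +0.5j]: the pair co-rotates counterclockwise about the origin.
```

Adding the white-noise terms $h^V$, $h^P$ to this drift gives the SDE (1) for this problem, with state dimension $N = 2(N_V + N_P)$.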


In (2), we assume the noise is Gaussian with zero mean and covariance

$$E[e_{(\cdot)}(t)\, e_{(\cdot)}(t')] = R\,\delta(t - t'),$$

where the symbol $(\cdot)$ denotes the components of the measurement error vector and R is hereon assumed to be diagonal and constant: $R := \mathrm{diag}(r^2)$. In nondimensional form the model and observation variances are transformed according to $\sigma\sqrt{2\pi/\Gamma} \to \sigma$ and $2r/\ell \to r$, respectively.

The underlying deterministic dynamics for the drifter/vortex problem have received considerable scrutiny. Friedrichs (1966) gives a detailed derivation and analysis of this system of equations; see Weiss et al. (1998) and references contained therein for an analysis. Its noisy counterpart, posed as a data assimilation problem, has been used by KIJ (also see Ide et al. 2002; Restrepo 2008) as a test bed for Lagrangian data assimilation schemes. The specific assimilation strategy adopted in KIJ to perform the estimation is called the constrained EKF; the drifter dynamics, which are coupled to the fluid dynamics, are thought of as constraints on the history of the flow. Evaluating the outcome of the assimilation itself will be done in two ways: using the estimate error norm (the Euclidean distance of the estimate to the "truth"), and the n-moment error norm (the Euclidean distance of the first n estimated moments to the corresponding moments produced by a benchmark calculation). In what follows we take n = 3. We present two Lagrangian data assimilation examples. The first one consists of producing an estimate for the positions of a drifter and two point vortices, conditioned on noisy drifter observations. In this example the state variable dimension is N = 6. The initial estimate positions of the vortices and the drifter are taken to be "delta" distributed, and noncoincidental with truth.
The second example considers the same dynamical system; however, the initial sample positions of the vortices are drawn from uniform distributions over two different unit squares on the ocean plane, while the initial drifter location retains a delta distribution. As in KIJ, in this set of calculations we have 1 drifter and 2 vortices ($N_P = 1$ and $N_V = 2$). The drifter observations make up the full dataset. These data are drawn from a numerically generated sample orbit of (14)–(15). KIJ acknowledge that the EKF had a tendency to fail, though it was an exceptional rather than a systematic outcome. They make use of the estimate error norm to assess failure or success. They consider an estimate a success when a threshold for the norm is not exceeded. This threshold is based on a dimensional


argument. An estimate was deemed a failure when its error norm became large. In their analysis of the events that lead to EKF failure, by the estimate error norm measure, they report that these exceptional circumstances are correlated with instances in which the drifter is extremely close to the vortices or in which the drifter approaches a saddle bifurcation in the Lagrangian flow. In other words, when the nonlinear dynamics became sensitive in the extreme (see also Griffa et al. 2007, p. 222). They also found that the assimilation procedure was very sensitive to the initial position of the drifter relative to the Lagrangian flow structures in the flow.

We will compare EKF, EnKF, and DKF against a benchmark estimate that is provided by the bootstrap particle filter, BF. All methods use the Heun scheme in the deterministic or stochastic dynamics updates. Save for using Heun, the EKF algorithm used appears in the book by Wunsch (1996). The EnKF algorithm used was provided by Evensen (2004) and implemented with a Heun time integrator. The implementations of all methods were first tested on a linear scalar Langevin problem, of the following form:

$$dx = -ax\,dt + g\,dW, \qquad (16)$$

where $a \ge 0$, as well as a double-well problem $dx = 4x(1 - x^2)\,dt + g\,dW$. Both of these problems have analytical stationary (asymptotic) probability distributions, which enabled us to determine whether the algorithms were working correctly.

In what follows the time discretization for the time integrators of the BF, EKF, DKF, and EnKF was $\delta t = 10^{-3}$. (As we mentioned before, it is possible to make the time stepping of DKF and EKF much larger than the BF time step; however, we wanted to show all methods in the best light possible and the least affected by time integration issues.) The interval between filtering times was set to 0.75 in all runs. (The filtering times were chosen to be at unit intervals in the KIJ calculations.) The number of samples required, in all of the sample-based methods, is problem dependent. The number of samples used in the BF, DKF, and EnKF was $5 \times 10^5$.

In Figs. 2–6 we compare the estimated mean position of the drifter, as predicted by all four estimation methods. In this example $\sigma^2 = 0.1$ and $r^2 = 0.01$. The initial sample values for the positions of the drifter and the vortices were not coincidental with truth at t = 0; it is often the case in practice that the initial truth is not known with certainty. The initial sample values for the locations were all taken at (−0.95, 0.05) and (0.95, 0.05) for the vortices and (0.68, 0.30) for the drifter. The initial truth for the vortex positions was (−1, 0) and (+1, 0), and (0.7, 0.3) was the initial truth for the drifter. Thus,


FIG. 2. Mean drifter (y1, w1) path, up to t = 10. Comparison of the estimated history for the two-vortex–one-drifter problem, highlighting comparison to EKF. Here σ² = 0.1, data uncertainty is ρ² = 0.01, time stepping is 0.001, and the filtering time step is 0.75. BF and DKF (both dark solid) and EKF (dashed). Truth (light solid), observations shown as circles, and the estimate starting point indicated on the figure as an asterisk. Initial estimate position for the drifter is (0.68, 0.30) and the initial estimates for the vortices are (−0.95, 0.05) and (0.95, 0.05). The initial position for the drifter associated with the truth is (0.7, 0.3). The vortex positions associated with the truth path are (−1, 0) and (1, 0), respectively.

the initial truth is not a member of the state samples (in filtering problems truth is supposed to be a sample path of the random dynamics associated with the model and the random initial value). In practice it is common to choose truth with little to no knowledge of the state sample statistics, and thus it is not unreasonable to investigate how the filtering schemes make estimates under these conditions. The filter runs to t_f = 30. In Figs. 2 and 3 we show the estimated mean drifter path, up to t = 10, as estimated by the various methods. In Fig. 2 we highlight the BF, DKF, and EKF estimates. For comparison, Fig. 3 highlights the BF, DKF, and EnKF estimates. The truth path is shown as a light solid line and the observations are indicated as circles. The BF and DKF mean paths are coincidental and represented by a heavy solid line. The EKF mean path, shown as a dashed line, is significantly different from the benchmark BF estimate. In Fig. 3 the ensemble estimate of EnKF is shown with squares. The EnKF coincides with the benchmark for a longer interval than EKF, and reacts strongly to observations at filtering times. None of the estimates tracks truth, because of the choice of initial setting: truth is not encompassed by the samples of the random initial state value, which implies that truth is at best a low-probability sample of it. In KIJ, saddle bifurcations or proximity of the drifter to vortex centers are singled out as being the primary


FIG. 3. Mean drifter (y1, w1) path, up to t = 10, highlighting comparison to EnKF. Comparison of the estimated history for the two-vortex–one-drifter problem. Same parameters as in Fig. 2. Observations (circles), EnKF (squares), BF (solid), and truth (light solid).

failure mechanisms of EKF. Since these events are infrequent, the implication is that EKF fails only rarely. The results shown above paint a different picture. The empirical evidence is well typified by the figures discussed thus far: the linearization in the forecast was successful, but the Gaussian assumption in the analysis stage was not. Figure 4a shows a comparison between the BF and EKF estimates. For completeness, we also show, in Fig. 4b, a comparison of BF to EnKF. The case illustrated is the same one as shown in Fig. 2, focusing on times close to t = 0. During this time the system does not experience extreme sensitivity, dynamic or numerical. Yet it is clear that both EKF and EnKF are unable to track the benchmark estimate, save for very short times. Furthermore, the estimate divergence grows markedly after the first filtering time. The circles indicate the order in which the observations are assimilated. Asterisks on the paths denote the position of the drifter estimate, at filtering times, prior to filtering. Figures 5 and 6a portray the second and third moments of the first component of the drifter, respectively. Similar plots would be obtained for the moments of the other components of the state vector. The second moment predicted by DKF shows some differences from the BF at early times; however, it latches on to BF later. The EKF second moment is comparable in size to that of BF though not qualitatively similar: it does not improve later in the simulation. The third moments of DKF are nearly coincidental with those of BF. There is no third moment for EKF. The EnKF second- and third-moment estimates are significantly larger and qualitatively different from the benchmark estimates. It is emphasized that the EnKF moments were computed


FIG. 4. Corresponding to the two-vortex–one-drifter problem, estimated path of (y1, w1), up to t = 3. Same parameters as in Fig. 2. Numbered circles indicate the order in which observations are assimilated: EKF (dashed), BF (heavy solid), truth (light solid), and EnKF (squares). Asterisks on paths indicate the position of the drifter estimate at filtering times, prior to filtering.

using half a million samples; thus the computations yielded stable ensemble estimates. Based upon the three-moment error norm, the DKF performs well. In Fig. 6b we show the maximum uncertainty norm value over all branches, as a function of time. For the linear Langevin problem in (16), using (7) it can be shown that G = exp[(t − t_k)a] g, with g constant, and hence that ‖G‖(t) = √(t − t_k) exp[(t − t_k)a] g, in all branches of prediction in any prediction step. It is clear from Fig. 6b that in the vortex/drifter problem √(t − t_k) is the dominant term in all prediction steps, in which case (t − t_k)a must be very small in these steps, with a ∼ ‖∇f‖(φ). The EKF estimate, thus, was not challenged by extreme model dynamics sensitivity, as alluded to in KIJ.
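For the linear problem, the bound just quoted can be checked directly by simulation. The following is a minimal sketch, assuming the scalar linear SDE takes the multiplicative form dx = ax dt + g dW (so that the tangent linear factor e^{a(t − t_k)} appears); the parameter values are illustrative, not those of the experiments.

```python
import numpy as np

# Monte Carlo check of the uncertainty-norm bound for a scalar linear SDE.
# Assumption (not spelled out in the text): the multiplicative form
#   dx = a x dt + g dW,
# whose tangent linear model gives G(t) = exp[a (t - tk)] g, so that
# sqrt(t - tk) exp[a (t - tk)] g should bound the fluctuation spread.
rng = np.random.default_rng(0)

a, g = 0.5, 0.3        # illustrative drift rate and noise amplitude
tk, t = 0.0, 0.75      # one prediction interval, as in the experiments
dt = 1e-3
nsamp = 50_000

x = np.ones(nsamp)     # stochastic ensemble started at x(tk) = 1
xd = 1.0               # deterministic path, same initial value
for _ in range(int((t - tk) / dt)):
    dW = rng.normal(0.0, np.sqrt(dt), nsamp)
    x += a * x * dt + g * dW       # Euler-Maruyama suffices for a check
    xd += a * xd * dt

fluct_std = float(np.std(x - xd))                    # empirical spread
bound = np.sqrt(t - tk) * np.exp(a * (t - tk)) * g   # sqrt(t - tk) * ||G||
print(fluct_std, bound)   # the bound should dominate the empirical spread
```

The exact fluctuation standard deviation here is g√[(e^{2aτ} − 1)/(2a)], which the stated bound dominates because (e^{2aτ} − 1)/(2a) ≤ τ e^{2aτ}.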


FIG. 5. For the two-vortex–one-drifter problem, second moment of y1, up to time 10. Same parameters as in Fig. 2: (a) EKF and (b) EnKF. DKF (light solid), BF (dark solid), EKF (dashed), and EnKF (squares).

In this second example σ² = 0.1 and ρ² = 0.01. The filter was run until t_f = 22.5. The initial sample values for the locations of the vortices were drawn from uniform distributions over the unit squares on the ocean plane centered at (−0.75, 0.25) and (1.25, 0.25). The initial sample values for the location of the drifter were all taken at (0.70, 0.26). The initial truth for the vortex positions was (−1, 0) and (+1, 0), and (0.7, 0.3) was the initial truth for the drifter. In this example the vortex samples encompass the initial truth position. In Fig. 7 we show the estimated drifter path and one of the vortex paths. Initially the histories are much further from truth than in the previous run; nevertheless, in a few filtering steps the DKF and BF estimates close in on the truth. The same is not true for EnKF.
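The scalar problems used earlier to validate the implementations have stationary densities of the form p(x) ∝ exp[−2V(x)/g²]; for the double-well drift 4x(1 − x²), V(x) = x⁴ − 2x². A sketch of such a validation check with a stochastic Heun integrator (parameter values are illustrative, not those of the experiments):

```python
import numpy as np

# Validation in the spirit of the scalar test problems: the double-well SDE
#   dx = 4x(1 - x^2) dt + g dW,   V(x) = x^4 - 2x^2,
# has stationary density p(x) ~ exp(-2 V(x) / g^2), so long-run ensemble
# moments can be checked against quadrature of p.  Stochastic Heun
# (predictor-corrector with the same noise increment; additive noise).
rng = np.random.default_rng(1)

def f(x):                      # drift, f = -V'
    return 4.0 * x * (1.0 - x**2)

g, dt, T = 1.0, 1e-2, 100.0
nsamp = 2000
x = np.where(np.arange(nsamp) % 2 == 0, -1.0, 1.0)   # start in both wells

for _ in range(int(T / dt)):
    dW = rng.normal(0.0, np.sqrt(dt), nsamp)
    xp = x + f(x) * dt + g * dW                       # Heun predictor
    x = x + 0.5 * (f(x) + f(xp)) * dt + g * dW        # corrector, same dW

# analytic stationary second moment by quadrature on a grid
xs = np.linspace(-3.0, 3.0, 6001)
p = np.exp(-2.0 * (xs**4 - 2.0 * xs**2) / g**2)
p /= p.sum()
m2_exact = float((xs**2 * p).sum())

m1, m2 = float(x.mean()), float((x**2).mean())
print(m1, m2, m2_exact)   # m1 near 0 by symmetry; m2 near m2_exact
```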


FIG. 7. For the two-vortex–one-drifter problem, with initial uniform distributions, shown up to time 15: vortex (p1, q1) estimate; truth (light solid), BF and DKF (solid), and EnKF (squares). The asterisk indicates the position of the initial average estimate for the vortex.

FIG. 6. For the two-vortex–one-drifter problem: (a) third moment of y1, up to time 10. Same parameters as in Fig. 2: BF (dark solid), DKF (light solid), and EnKF (squares). (b) The maximum uncertainty norm value over all branches.
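Moment comparisons like those in Figs. 5, 6, and 8 amount to measuring discrepancies between method and benchmark moment estimates. A sketch of such an n-moment error measure, with the norm over the moment differences (Euclidean here) an assumption:

```python
import numpy as np

# Sketch of an n-moment error: distance between the first n raw-moment
# estimates of a method's samples and those of a benchmark sample set.
# The Euclidean norm over moment differences is an assumption here.
def n_moment_error(samples, benchmark, n=3):
    m = np.array([np.mean(samples**k) for k in range(1, n + 1)])
    mb = np.array([np.mean(benchmark**k) for k in range(1, n + 1)])
    return float(np.linalg.norm(m - mb))

rng = np.random.default_rng(3)
bench = rng.normal(0.0, 1.0, 100_000)    # benchmark (e.g., BF) samples
good = rng.normal(0.0, 1.0, 100_000)     # statistically consistent method
biased = rng.normal(0.5, 1.0, 100_000)   # method with a biased mean
print(n_moment_error(good, bench), n_moment_error(biased, bench))
```

A consistent method produces a small n-moment error (sampling noise only), while a biased mean inflates all three moment differences at once.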

Figure 8 shows the estimated second and third moments for the first component of a vortex. The BF and DKF are nearly coincidental; the EnKF is not. The EKF second moment agrees with the EnKF at first and then diverges. The average entropy and likelihood-weights predictions are defined as follows: the average entropy prediction is the deterministic path associated with the branch of prediction whose final covariance matrix ∞-norm is closest to the average over all branches (the average is taken with respect to the likelihood weights); the likelihood-weights prediction is the deterministic path obtained by starting from the most likely sample. For linear Gaussian problems we expect all three norms to agree; in nonlinear non-Gaussian problems they can differ, as is the case here. Figure 9 compares various prediction errors, for the vortices alone. The average entropy prediction error follows more closely the average prediction error of DKF. The average prediction of EnKF is seen to fluctuate. To conclude, a filtering scheme with good statistical consistency will produce estimates with better uncertainty quantification; as borne out in our illustrative examples, such a scheme also yields valuable information in the testing of an estimation procedure itself.
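The two prediction selectors just defined can be sketched as follows; the branch weights and covariance ∞-norms below are toy values:

```python
import numpy as np

# Sketch of the two prediction selectors over a set of branches of
# prediction.  Inputs: per-branch likelihood weights and the infinity-norm
# of each branch's final covariance matrix.
def likelihood_weights_branch(weights):
    """Branch started from the most likely sample."""
    return int(np.argmax(weights))

def average_entropy_branch(weights, cov_inf_norms):
    """Branch whose final covariance inf-norm is closest to the
    likelihood-weighted average over all branches."""
    w = np.asarray(weights, float)
    w = w / w.sum()
    target = float(np.sum(w * np.asarray(cov_inf_norms, float)))
    return int(np.argmin(np.abs(np.asarray(cov_inf_norms, float) - target)))

# toy example with four branches
w = [0.1, 0.5, 0.3, 0.1]
norms = [0.2, 1.0, 0.6, 0.4]
print(likelihood_weights_branch(w))       # -> 1 (largest weight)
print(average_entropy_branch(w, norms))   # weighted average is 0.74 -> 2
```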

6. The clustered DKF

We introduce a further development of DKF that we call the clustered DKF (cDKF). The cDKF is a cost-reducing implementation of the full DKF. The computational cost reduction in cDKF comes from using fewer samples to describe the posterior, which implies using fewer branches in the prediction step. The idea behind cDKF is to use a few branches of prediction and (13) to obtain F′ at filtering times, from which to draw the required samples; these samples are then decimated into cluster representatives. The weights associated with these representative samples are taken to be the sum of the weights of the members of their respective cluster, so that the number of paths that would emanate from each cluster is preserved. A simple, but nonoptimal, procedure for this in Lagrangian data


FIG. 8. For the two-vortex–one-drifter problem, with initial uniform distributions, up to time 15. Parameters are the same as in Fig. 7: (a) second and (b) third moment of p1. BF and DKF (solid) and EnKF (squares).

assimilation is to partition the interval [w_1, w_I] of likelihood weights into I_k subintervals, which amounts to partitioning the whole configuration space into I_k cells based on a partitioning of the drifters space {note that p[y_{k+1} | x^{(i,j)}](t_{k+1}) is a distance in the drifters space, up to the exponentiation}, which embodies the most unstable directions. In each subinterval the weights are then added in order to compute the effective weight of the representative sample associated with that subinterval. Here I_k is chosen to be sufficiently large so that the samples effectively represent each cluster. We considered how cDKF compares to the DKF on a two-vortex–drifter system, similar to the cases considered in the previous section, and found that the cDKF estimates were very close to those of DKF. To illustrate how well the cDKF performs on a problem with higher

FIG. 9. For the two-vortex–one-drifter problem, with initial uniform distributions, up to time 20. Contributions from the vortical components only. Parameters are the same as in Fig. 7: (a) comparison of likelihood estimate error (crosses), DKF average estimate error (solid), and EnKF average estimate error (squares). (b) Comparison of entropy estimate error (crosses), DKF average estimate error (solid), and EnKF average (squares).

dimensions, we chose N_p = 4 and N_y = 4 (thus the state variable dimension was 16), up to t_f = 20. The observations comprise the positions of the four drifters. The uncertainty parameters are σ² = 0.03 and ρ² = 0.01. Figure 10 compares the cDKF, DKF, and bootstrap filter estimates. The figure features the outcomes for one of the components of the state vector. We found that the other components were similarly well captured by DKF and cDKF as compared to the bootstrap filter results.
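The decimation step described above can be sketched as follows; the choice of cell representative (here the highest-weight member of each cell) is an assumption, and, as noted, the procedure is nonoptimal:

```python
import numpy as np

# Sketch of the cDKF decimation step: partition the range of likelihood
# weights into Ik subintervals, keep one representative sample per
# nonempty cell, and assign it the summed weight of its cell so that the
# number of branches emanating from each cluster is preserved.
def cluster_by_weight(samples, weights, Ik):
    w = np.asarray(weights, float)
    edges = np.linspace(w.min(), w.max(), Ik + 1)
    cells = np.clip(np.digitize(w, edges) - 1, 0, Ik - 1)
    reps, rep_w = [], []
    for c in range(Ik):
        idx = np.flatnonzero(cells == c)
        if idx.size == 0:
            continue                                  # empty cell, skip
        reps.append(samples[idx[np.argmax(w[idx])]])  # representative
        rep_w.append(w[idx].sum())                    # effective weight
    return np.array(reps), np.array(rep_w)

rng = np.random.default_rng(2)
x = rng.normal(size=(1000, 2))        # 1000 two-dimensional samples
w = rng.random(1000)
w /= w.sum()                          # normalized likelihood weights
reps, rep_w = cluster_by_weight(x, w, Ik=10)
print(len(reps), rep_w.sum())         # at most 10 clusters; weights sum to 1
```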


FIG. 10. The four-vortex–four-drifter problem: (y1, w1) estimate and truth (dashed). BF, DKF, and cDKF estimates appear to coincide as the solid line: σ² = 0.01 and ρ² = 0.01.

7. Computational cost of DKF and cDKF

A reasonable comparison of costs is to contrast the computational complexity of DKF and cDKF with that of the bootstrap filter. The cost of computing the bootstrap filter estimate from t_k to t_{k+1} is dominated by

C b aT × N² × I,        (17)

which is the cost of computing the noise term. Here C is an implementation constant, I is the number of sample paths, and N is the dimension of the state variable. Typical of the calculations performed here, I = 5 × 10⁵. Here T is the number of time steps required by the deterministic ordinary differential equation integrator, and aT is the number of time steps taken in the stochastic differential equation integrator, where a ≫ 1; b > 1 is associated with the computational overhead inherent in the particular numerical stochastic integrator chosen, as compared with a deterministic ordinary differential equation integrator of the same order. The time step in the stochastic integrator in the bootstrap was on the order of 10⁻⁴; the time integrator for the deterministic problem, on the other hand, could be made 100 times larger, and in general a can be a large number. Relevant to this estimate is that for many explicit numerical integrators for stochastic differential equations, increasing b decreases a and vice versa. (The stochastic and ordinary differential equation integration schemes used here were the stochastic Heun and its deterministic counterpart.) The cost of computing the DKF in the same time interval includes a contribution from having to time integrate the deterministic nonlinear equation and the TLM equation. Thus,

T × (C′N + C N³) × I_k ≈ T × C N³ × I_k.        (18)

Here C′ is an implementation constant of the same order as C, and we are assuming that the second term dominates the estimate. Comparing (17) and (18) we can find the conditions under which the cost of the bootstrap filter exceeds that of the DKF:

N < a b I / I_k.

This reflects the impact of non-Gaussianity on the cost of the DKF: the less Gaussian the assimilation becomes, the closer I/I_k is to 1; the worst scenario would be a uniform distribution of likelihood weights. Our experience with the cDKF is that one can take I_k to be at most a fixed fraction of I, say I_k ≤ 10^(-r) I, r ≥ 1, without suffering a serious deterioration in the estimate of the first three moments. As such, a conservative estimate for cDKF cost exceedance over the bootstrap filter would be N < a b 10^r, where r is related to the phase-space configuration complexity. In the computational examples illustrated above involving the four vortices and four drifters, we used r = 2. In the prediction steps the cost of DKF (or cDKF) is the same as that of EKF times the number of branches of prediction solved, provided the EKF forward propagator can be solved with the same time step as in DKF. The cost of the DKF analysis step when g is a constant matrix, or g(t), is on the order of N² times the number of samples I. For a random g(w) the cost is still of order N² times I: the integral of g is computed or approximated, yielding a vector, and the vector–matrix multiply is then the same as before. The computational cost of the analysis step in EKF is the smaller of order N³ or order N_y³. The cDKF scheme has another computationally attractive feature: since the maximum number of branches of prediction I_k is fixed from the outset, it is possible to distribute the load and fix the memory allocation among different processors in a multiprocessor implementation of the method. This is clearly an advantage over the situation in which communication and memory management would have to take place in the course of filtering, which is in fact a challenge for parallel implementations of a bootstrap filter. With parallelism, cDKF can be computationally competitive with EKF yet capable of handling non-Gaussian problems.
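The crossover condition can be checked with the numbers quoted in the text (I = 5 × 10⁵ samples and r = 2, so I_k = 10⁻² I):

```python
# Quick check of the crossover condition N < a*b*I/Ik derived above:
# when it holds, the bootstrap-filter cost ~ C b aT N^2 I exceeds the
# DKF cost ~ T C N^3 Ik (the implementation constants taken equal).
def bootstrap_costlier(N, a, b, I, Ik):
    return N < a * b * I / Ik

# numbers in the text: I = 5e5 samples, r = 2 so Ik = 1e-2 * I = 5e3,
# giving a threshold of a*b*100 on the state dimension N.
I, Ik = 5e5, 5e3
print(bootstrap_costlier(N=16, a=10, b=2, I=I, Ik=Ik))   # True
```

For the 16-dimensional four-vortex–four-drifter problem the bootstrap filter is thus the costlier method for any plausible a and b.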


8. Summary and conclusions

The diffusion kernel filter is a data assimilation method of general applicability to time-dependent nonlinear non-Gaussian estimation problems. The method is arrived at by a parameterization of small fluctuations of Brownian-driven paths about a deterministic path, within branches of prediction. A small-covariance expansion of sample paths, within branches of prediction, permits representing the overall cloud of samples in terms of a deterministic path and a fluctuating field, which is determined by the diffusion kernel. The diffusion kernel is the matrix product of the tangent operator and the diffusion matrix of the stochastic differential equation. The norm of this kernel gives a bound on the covariance matrix of the state within a branch of prediction. As such it can be used to define a notion of "prediction" itself. It can also be used to assess the sensitivity of the deterministic history to Brownian noise. The DKF is relatively easy to implement, provided a tangent linear model is available. In applications, the tangent linear model can be found using automatic differentiation. We apply the DKF to oceanic Lagrangian data assimilation. In the general case this problem is nonlinear and non-Gaussian and thus not amenable to standard methodologies; in the context of a test problem, we show that this is the case: extended and ensemble Kalman filters do not provide robust answers when the initial uncertainties are properly set (i.e., truth is prone to fall outside the confidence interval during the filtering process). There are other sample-based methodologies that can handle the dynamic and statistical complexities of the Lagrangian data assimilation problem, such as the path integral method (see Restrepo 2008) or a bootstrap particle method (which we implement here, using its estimates as benchmarks). However, these methods are challenged by the typical size of practical estimation problems.
The computational efficiency of DKF is competitive with that of particle filter methods for moderately sized problems. A variation of the DKF method is the clustered DKF, or cDKF, which is significantly more efficient than DKF. It derives its efficiency from using fewer samples to describe the posterior, which implies using fewer branches in the prediction step. This increase in efficiency comes with a slight but acceptable degradation in the description of the statistics of the dynamical history. To illustrate DKF and compare its estimates to those of other filters, we chose the purely Lagrangian oceanic problem that was considered by KIJ. This problem has been used to test several filtering methodologies. The model can easily be made multiscale by prescribing many vortices with different circulations and many drifters. It can also


be expanded to handle more complex phenomenology. The dynamics consist of interacting point vortices and passive drifters, modeled by a set of stochastic differential equations with a well-characterized diffusion term. In this estimation problem, noisy time-dependent measurements are assumed to be available. As was done by Kuznetsov and collaborators, we focused on assimilation problems in which sparse partial observations of the drifter positions are available. As it turns out, the positions of the drifters depend very sensitively on the positions of the vortices. Though the model we considered here is highly stylized, it may be the case that the assimilation of the ocean state by a procedure that takes in drifter data will also be highly sensitive to this source of data. This aspect of practical Lagrangian data assimilation deserves further analysis. In this study we show that even in the simple case of two interacting vortices and one drifter, for which measurements and their uncertainty are available, the sensitivity of the observed variables to the remaining variables is enough to challenge extended as well as ensemble Kalman filters, even when the uncertainties are very small. On the other hand, the first three moments of the dynamical history produced by DKF were found to agree well with the outcomes of the benchmark bootstrap filter under a variety of system configurations and estimation parameters. Oceanic Lagrangian data assimilation problems are, as a rule, nonlinear and non-Gaussian. Kuznetsov and coworkers reported failures in their extended Kalman filter scheme when applied to the point vortex–drifter system. However, they concluded that failures were rare, since they were tied to the occurrence of complex dynamic behavior of the system, such as in the vicinity of saddle bifurcations.
They also identify high dynamic sensitivity, which can occur when the vortices and drifters are very close to each other, as another source of failure in their extended Kalman filter scheme. We compared the extended Kalman filter estimates to a benchmark estimate. In doing so it became clear that extended Kalman filter failure happens even when the dynamics do not involve saddle bifurcations or dynamic sensitivity; we note, in fact, that the uncertainty norm for the cases in which we observed failure of the extended Kalman filter indicated that the dynamics were not very complex (the uncertainty norm was comparable among nearby branches of prediction). In the example calculations considered here we found that the extended Kalman filter estimates failed at the analysis step, when the Gaussianity assumption did not hold. In the cases in which Kuznetsov and collaborators report large error norms, the extended Kalman filter estimate was nonsensical. However, they were apparently


not aware of the failures when the error norm was below their chosen threshold: the threshold was not small enough, nor suited to the nature of the problem. As it turns out, the dynamics of the problem under consideration are such that a relatively small portion of the ocean surface area is explored. In this study we explore to some extent the issue of choosing a norm to quantify whether a data assimilation simulation is to be deemed a success or a failure. The choice is not straightforward when the problem is nonlinear and non-Gaussian, as it is tied to the very notion of prediction; and as we illustrate in this paper, these notions of prediction can vary significantly for any given nonlinear non-Gaussian problem. We suggested here the use of an alternative norm to the Euclidean distance between an estimate history and the truth trajectory. This alternative emphasizes the statistical consistency of the method's product, and is more naturally suited to proposing alternative notions of prediction and robust answers. We call our proposed measure the n-moment error. It is defined as the distance between the first n moment estimates produced by the method being assessed and those produced by a benchmark method. We implemented and tested the ensemble Kalman filter and found it to perform poorly in the n-moment error norm, even when using 500 000 samples and a very small forecast time step. The DKF, on the other hand, consistently captured at least up to the third-moment estimate. In addition to accuracy, a good estimation tool requires high computational efficiency, particularly in the geosciences, where a variety of important problems are inherently large in state dimension. In comparison with other methods applicable to nonlinear non-Gaussian problems, the DKF is competitive. It is possible to push the DKF further and trade some estimation performance for additional computational gain, as we did in proposing the cDKF.
In this variant the branches of prediction are subsampled. We obtained a tenfold increase in computational efficiency with a small degradation of the moment estimates. Going forward, the challenge will be to obtain maximal gains in efficiency in exchange for reasonable losses in the quality of the description of the statistics of the dynamical history. In this spirit we are presently studying several algorithmic modifications to DKF.

Acknowledgments. This work was supported by NSF Grant DMS0335360. The authors thank P. L. S. Dias and A. Stuart for stimulating discussions. We also thank J. Hansen and the anonymous reviewers for suggesting ways to significantly improve the presentation of the results.

REFERENCES

Alexander, F. J., G. L. Eyink, and J. M. Restrepo, 2005: Accelerated Monte Carlo for optimal estimation of time series. J. Stat. Phys., 119, 1331–1345.
Arulampalam, M. S., S. Maskell, N. Gordon, and T. Clapp, 2002: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process., 50, 174–188.
Bischoff, C. H., A. Carle, G. F. Corliss, and A. Griewank, 1992: ADIFOR: Automatic differentiation in a source translation environment. Proceedings of the International Symposium on Symbolic and Algebraic Computation, P. S. Wang, Ed., ACM Press, 294–302. [Available online at http://www-fp.mcs.anl.gov/autodiff/.]
Chorin, A. J., O. H. Hald, and R. Kupferman, 2002: Optimal prediction with memory. Physica D, 166, 239–257.
Crisan, D., P. del Moral, and T. J. Lyons, 1999: Discrete filtering using branching and interacting particle systems. Markov Process. Related Fields, 5, 293–331.
Evensen, G., 1997: Advanced data assimilation for strongly nonlinear dynamics. Mon. Wea. Rev., 125, 1342–1354.
——, 2004: Sampling strategies and square root analysis schemes for the EnKF. Ocean Dyn., 54, 539–560.
Eyink, G. L., and J. M. Restrepo, 2000: Most probable histories for nonlinear dynamics: Tracking climate transitions. J. Stat. Phys., 101, 459–472.
——, ——, and F. J. Alexander, 2004: A mean field approximation in data assimilation for nonlinear dynamics. Physica D, 194, 347–368.
Friedrichs, K. O., 1966: Special Topics in Fluid Dynamics. Gordon and Breach, 177 pp.
Giering, R., cited 1999: TAMC: Tangent linear and adjoint model compiler. [Available online at http://www.autodiff.com/tamc/.]
Gordon, N. J., D. J. Salmond, and A. F. M. Smith, 1993: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc., 140, 107–113.
Griffa, A., A. D. Kirwan Jr., A. J. Mariano, T. Özgökmen, and H. T. Rossby, Eds., 2007: Lagrangian Analysis and Prediction of Coastal and Ocean Dynamics. Cambridge University Press, 500 pp.
Gustafsson, N., 2007: Discussion on '4DVAR or EnKF?'. Tellus, 59A, 774–777.
Hairer, M., A. M. Stuart, and J. Voss, 2007: A Bayesian approach to data assimilation. Physica D, 230, 50–64.
Ide, K., L. Kuznetsov, and C. K. R. T. Jones, 2002: Lagrangian data assimilation for point vortex systems. J. Turbulence, 3, doi:10.1088/1468-5248/3/1/053.
Kim, S., G. L. Eyink, J. M. Restrepo, F. J. Alexander, and G. Johnson, 2003: Ensemble filtering for nonlinear dynamics. Mon. Wea. Rev., 131, 2586–2594.
Krause, P., 2009: The diffusion kernel filter. J. Stat. Phys., 134, 365–380.
Kushner, H. J., 1962: On the differential equations satisfied by conditional probability densities of Markov processes, with applications. SIAM J. Control, 2A, 106–119.
——, 1967a: Dynamical equations for optimal nonlinear filtering. J. Differ. Equations, 3, 179–190.
——, 1967b: Approximation to optimal nonlinear filters. IEEE Trans. Auto. Control, 12, 546–556.
Kuznetsov, L., K. Ide, and C. K. R. T. Jones, 2003: A method for assimilation of Lagrangian data. Mon. Wea. Rev., 131, 2247–2260.
Özgökmen, T. M., A. Griffa, A. J. Mariano, and L. I. Piterbarg, 2000: On the predictability of Lagrangian trajectories in the ocean. J. Atmos. Oceanic Technol., 17, 366–383.

4400

MONTHLY WEATHER REVIEW

Pardoux, E., 1982: Équations du filtrage non linéaire, de la prédiction et du lissage (Nonlinear filtering equations for prediction and smoothing). Stochastics, 6, 193–231.
Pham, D. T., 2001: Stochastic methods for sequential data assimilation in strongly nonlinear systems. Mon. Wea. Rev., 129, 1194–1207.
Restrepo, J. M., 2008: A path integral method for data assimilation. Physica D, 237, 14–27.
——, G. K. Leaf, and A. Griewank, 1998: Circumventing storage limitations in variational data assimilation studies. SIAM J. Sci. Comput., 19, 1586–1605.

VOLUME 137

Simon, D., and T. Chia, 2002: Kalman filtering with state equality constraints. IEEE Trans. Aerosp. Electron. Syst., 39, 128–136.
Stratonovich, R. L., 1960: Conditional Markov processes. Theory Probab. Appl., 5, 156–178.
Van Leeuwen, P. J., 2003: A variance minimizing filter for large scale applications. Mon. Wea. Rev., 131, 2071–2084.
Weiss, J. B., A. Provenzale, and J. C. McWilliams, 1998: Lagrangian dynamics in high-dimensional point-vortex systems. Phys. Fluids, 10, 1929–1941.
Wunsch, C., 1996: The Ocean Circulation Inverse Problem. Cambridge University Press, 458 pp.
