1640
MONTHLY WEATHER REVIEW
VOLUME 137
Sampling Errors in Ensemble Kalman Filtering. Part II: Application to a Barotropic Model
WILLIAM SACHER AND PETER BARTELLO
McGill University, Montréal, Québec, Canada
(Manuscript received 13 June 2008, in final form 17 September 2008)
ABSTRACT
In the current study, the authors compare the average performance of stochastic versions of the ensemble Kalman filter with and without covariance inflation, as well as the double ensemble Kalman filter. The theoretical results obtained in Part I of this study are compared with idealized simulations performed with a perfect barotropic quasigeostrophic model. The results are very consistent with the analytic expressions found in Part I. It is also shown that both the double ensemble Kalman filter and covariance inflation techniques can avoid filter divergence. Nevertheless, covariance inflation gives efficient results in terms of accuracy and reliability for a much lower computational cost than the double ensemble Kalman filter and for smaller ensemble sizes.
1. Introduction
Reduced-state filters (Lawson and Hansen 2004) have been extensively studied in the past few years as an alternative to the extended Kalman filter (e.g., Evensen 1992), which has shown limitations when used with strongly nonlinear dynamics (Evensen 1992; Gauthier et al. 1993; Bouttier 1994). Various versions using Monte Carlo methods, many of them reviewed by Evensen (2003), have been the subject of studies in meteorological, oceanographic, and hydrological contexts. One of the major issues of these ensemble assimilation techniques is related to the impossibility of using a very large number of members. Small ensemble sizes lead to unavoidable sampling errors in the estimation of the background error statistics, and can have dramatic consequences on the quality of the analysis. We focus here mainly on the sampling issues concerning the operational applications of the ensemble Kalman filter (EnKF), namely, the fact that the use of a limited number of members implies spurious covariances, leading to a degradation of the filter accuracy, as well as an underestimation of the analysis variance. To address these issues, we first recall that the aim of Monte Carlo filtering is to satisfy the following two criteria:
Corresponding author address: William Sacher, Atmospheric and Oceanic Sciences Department, McGill University, 805 Sherbrooke St. West, Montreal, QC H3A 2K6, Canada. E-mail:
[email protected] DOI: 10.1175/2008MWR2685.1 © 2009 American Meteorological Society
1) to produce an analysis having the best accuracy, and
2) to produce a reliable (or statistically consistent) ensemble of analyses.
Practically, the first criterion means producing a sample whose ensemble mean has the smallest expected error (the ensemble mean is commonly used as the state’s best estimate). The second criterion means that the variations of the ensemble mean error, though not accessible, should be well represented by the ensemble of analyses given by the filter itself. In other words, the true state should be statistically indistinguishable from a randomly selected member of the ensemble. In the standard EnKF, the first criterion is generally satisfied on average, whereas the second one is not, as analytically established in Sacher and Bartello (2008, hereafter Part I). As detailed in Part I, it is possible to define an ideal value for the analysis variance when using an ensemble of limited size. This value is generally different from the optimal value, that is to say the one obtained when using an infinite number of members. Rather, the ideal value is the one the analysis variance has to take for the analysis ensemble given by the EnKF to be reliable. In general, the EnKF gives an unreliable ensemble of analyses. The finite ensemble size implies sampling errors, so that the true state is, on average, statistically distinguishable from a randomly selected member of the ensemble (Part I). The practical consequence of such a characteristic, namely, the underestimation of the analysis variance, has been known for many years. In some cases this leads to filter
MAY 2009
SACHER AND BARTELLO
divergence: the analysis loses track of the truth, whereas its spread (and therefore the assumed background error variance) remains much smaller than the actual error of the analysis mean, making the filter believe it performs better than it does in reality. This paper is the second part of a two-part study. In Part I, a series of analytic results on EnKF performance are presented and verified using simulations with a simple scalar. Here, we use a quasigeostrophic barotropic model in order to examine the relevance of the analytic results presented in Part I in a multivariable context. These results concern one-analysis cycle experiments. We point out here that this is the natural approach to adopt. Namely, when iterating the analysis over many cycles, no simple analytical results can be established for the mean analysis error variance or the mean-squared error (MSE) of the analysis ensemble mean, owing to the nonlinear nature of the flow. In the scalar case, we have seen that it is possible to considerably improve the quality of one analysis with respect to criterion 2. In the current paper, we want to see if this result carries over to a barotropic model and ultimately test the solutions, based on one analysis cycle, in a simulated forecast system. We point out here that this exercise remains of a theoretical nature. Among the various problems existing in practical data assimilation, we only address a specific one: the sampling errors associated with the use of ensembles of limited size in Monte Carlo filtering. The simulated forecast system uses the barotropic model at a low resolution, and therefore provides a very simplified context in comparison with the NWP setting. Nevertheless, even if we are not in an identical regime, we want to determine if our analytic results can be used in data assimilation in a multivariate context. We expect that some of our results, even if not immediately applicable, will be extendable to operations.
We use a stochastic (Tippett et al. 2003; Lawson and Hansen 2004) version of the EnKF, namely, using an ensemble of perturbed observations. We aim to establish the relative performances of the covariance inflation techniques and the double ensemble Kalman filter (DEnKF; see references hereafter). Both have been proposed to avoid filter divergence. In addition to these techniques, some authors (Houtekamer and Mitchell 2001; Hamill et al. 2001) also used a so-called covariance localization (or tapering), acknowledging the difficulty to obtain accurate estimates of the small correlations associated with remote observations and limited ensemble sizes (as recognized by Houtekamer and Mitchell 1998). To properly tune an operational EnKF, both covariance inflation and localization may be necessary, although Karspeck and Anderson (2007) show that isotropic localization is more detrimental than spurious
correlation because of the sampling errors, even for small ensemble sizes. In any case, in order to get the best understanding of the impact of each, we think it is appropriate to study them separately. That is why we concentrate our efforts here on the covariance inflation and reserve the analysis of covariance localization for a later study. However, we shall point out that the use of localization would reduce the sampling error $\varepsilon$ in the evaluation of the background error covariance matrix $P^f$. The analytic results of Part I lead directly to the conclusion that the filter would be more accurate, and that a smaller covariance inflation factor would be required than if no localization is used. This is consistent with Furrer and Bengtsson (2007), who found that applying an appropriate taper to the sample error covariance matrix reduces the need for inflation. The DEnKF was first implemented by Houtekamer and Mitchell (1998) on a three-level, quasigeostrophic T21 model, with simulated observations. They found a reduction of the spread misrepresentation, at the expense of a significant loss of accuracy. The authors have then extensively used this technique in many subsequent studies (Houtekamer and Mitchell 2001, 2005; Houtekamer et al. 2005). Their version of the DEnKF, using a pair of 48-member ensembles, has been implemented in operations in the medium-range ensemble prediction system at Environment Canada. Observing that the use of two pairs of ensembles still leads to analysis variance underestimation, the authors also suggested (without testing) dividing the ensemble into 4, 8, and ultimately N subensembles (using, for a particular member, a gain matrix computed using all of the other ensemble members), stating that this would lead to ‘‘more efficient implementations’’ of the EnKF (Houtekamer and Mitchell 1998; Mitchell et al. 2002). As in Part I, we shall refer to the l-DEnKF, where $l \in [\![2, N]\!]$ is the number of subensembles used.
Other authors have tested the DEnKF in various contexts. Whitaker and Hamill (2002) compared the single and double deterministic and stochastic EnKF versions with a simple scalar model. They found that the DEnKF reverses the sign of the error in the analysis variance with respect to the optimal value. They concluded that the DEnKF provides a ‘‘very accurate estimate of the actual ensemble-mean analysis error’’ (i.e., satisfying the second criterion), and also that ‘‘the mean absolute error in the estimation of the analysis variance for the DEnKF with 2N members is very similar to the single EnKF with N members.’’ Charron et al. (2006) used a DEnKF with 2 × 50 members to assimilate synthetic radar data in a dry (perfect) Boussinesq model with periodic boundary conditions. The analyses obtained were generally reliable. Using a hybrid EnKF three-dimensional variational data assimilation (3DVAR) analysis scheme
with a quasigeostrophic model, Hamill and Snyder (2000) tested the N-DEnKF, as proposed by Houtekamer and Mitchell (1998). Based on rank histograms, they obtained spread underestimation (i.e., the second criterion was not satisfied), whose intensity depended on the observation density. Anderson (2001) applied the DEnKF to the 40-variable Lorenz model, and found a loss of accuracy. Tong and Xue (2005) tested the DEnKF to assimilate synthetic radar data in a compressible nonhydrostatic model and concluded that this approach ‘‘overestimated the analysis uncertainty in our case and did not help improve’’ the analysis. Using basic perturbation matrix theory, van Leeuwen (1999) analytically addressed the problem of analysis variance and spread misrepresentation, which appeared to come from the nonlinear dependence of the Kalman gain on the background error covariance matrix. The analytic development detailed in Part I uses an identical framework to express the mean analysis error variance and the MSE of the analysis ensemble mean as a function of the sampling error $\varepsilon$. Covariance inflation, a technique increasing the spread of the ensemble while not changing the subspace spanned by the ensemble, has also been tested in various contexts. Anderson and Anderson (1999) first suggested covariance inflation because of the known problem of underestimation of analysis variance and its potential link to filter divergence, though Pham et al. (1998) were the first to introduce an inflation factor (termed the forgetting factor by them). Hamill et al. (2001) tested covariance inflation in a series of analysis cycles using a two-layer hemispheric primitive equation global spectral model at T31 truncation, and a stochastic EnKF. They found that the optimal magnitude of the inflation was a function of ensemble size; the smaller the size of the ensemble, the larger the inflation.
Whitaker and Hamill (2002) applied covariance inflation and tapering simultaneously with both a stochastic and deterministic filter and a two-level idealized GCM. They determined the optimal combination of the two in terms of reliability. Ott et al. (2004) tested the same covariance inflation with data assimilation experiments using the 40-variable Lorenz model and a new formulation of the EnKF. They found that the ad hoc covariance inflation factor increases when the number of observations used decreases. In a twin experiment, Snyder and Zhang (2003) tested the potential of a deterministic EnKF to assimilate Doppler radar data with a cloud-scale model. They used various values for the covariance inflation factor to compensate for an observed lack of spread. They found that inflation, by increasing the deviations from the ensemble mean and then enhancing spurious convective cells, degraded the results. Tong and Xue (2005) reached the same conclusions in a similar study.
Hamill and Whitaker (2005) point out that a constant inflation coefficient for all of the domain may not be optimal. In addition, they stated that the existence of data-sparse regions in the model domain may lead to unchecked growth in the variance of the ensemble. One general characteristic recognized by many authors is that only a few percent of covariance inflation is required to avoid divergence. We claim here that many results obtained in the above-mentioned studies are quite consistent with what can be inferred from the analytic results presented in Part I, which were confirmed with a scalar experiment. In that case, covariance inflation and the 2-DEnKF were found to perform well in the sense of both criteria. The results presented in the current study show that these are also confirmed in a fully nonlinear barotropic model. Section 2 recalls some fundamental results obtained in Part I. Section 3 describes the barotropic model and presents the simulation settings, and section 4 contains the main results obtained in such a multivariate context, providing some guidelines on the relative performance of each type of filter tested here.
2. Summary of the main results of Part I
To verify the optimality of each EnKF version in the sense of both criteria, we have measured the total analysis error variance by taking the trace of the ensemble-based average analysis error covariance matrix $P^a_N$. We recall Eq. (12) from Part I, which links the average sampling error $\langle \varepsilon^a \rangle = \langle P^a_N \rangle - P^a$ made in the estimation of the analysis error covariance matrix, to the square of the sampling error $\varepsilon = P^f_N - P^f$:
$$\langle P^a_N \rangle = P^a - L \langle \kappa Q \kappa^T \rangle L^T + o(\|\kappa Q \kappa^T\|),$$
where $Q = HP^fH^T + R$, $\kappa = \varepsilon H^T Q^{-1}$, and $L = I - KH$. In appendix B of Part I, it is also described how in turn, $\langle \kappa Q \kappa^T \rangle$ can be linked to the optimal values $P^a$ and $K$, for Gaussian error fields. This leads to Eq. (13) of Part I, which is reproduced here:
$$\mathrm{tr}(\langle P^a_N \rangle) \simeq \mathrm{tr}(P^a) - \frac{1}{N-1}\left[\mathrm{tr}(LP^aKH) + \mathrm{tr}(LP^a)\,\mathrm{tr}(KH)\right]. \tag{1}$$
A necessary condition to satisfy the second criterion is that $\mathrm{tr}(\langle P^a_N \rangle)$ has to be as close as possible to the expected MSE (with respect to the truth) of the analysis ensemble mean. This is estimated by the trace of $D^a_N$, the analysis mean error covariance matrix:
$$D^a_N = (\psi^t - \overline{\psi^a})(\psi^t - \overline{\psi^a})^T,$$
where $\psi^t$ is the true state and $\overline{\psi^a}$ is the analysis ensemble mean of the studied process. The equivalent of (1) for $D^a_N$ is given by Eq. (14) of Part I:
$$\mathrm{tr}(\langle D^a_N \rangle) \simeq \left(1 + \frac{1}{N}\right)\left\{\mathrm{tr}(P^a) + \frac{1}{N-1}\left[\mathrm{tr}(LP^aKH) + \mathrm{tr}(LP^a)\,\mathrm{tr}(KH)\right]\right\}. \tag{2}$$
For the l-DEnKF, $l \in [\![2, N]\!]$, we have equivalent analytic expressions for these values [Eqs. (28) and (31) of Part I]:
$$\mathrm{tr}(\langle P^{a(1,l)}_N \rangle) \simeq \mathrm{tr}(P^a) + \frac{l+1}{(l-1)N - l}\left[\mathrm{tr}(LP^aKH) + \mathrm{tr}(LP^a)\,\mathrm{tr}(KH)\right], \tag{3}$$
$$\mathrm{tr}(\langle D^{a(1,l)}_N \rangle) \simeq \left(1 + \frac{1}{N}\right)\left\{\mathrm{tr}(P^a) + \frac{l-1}{(l-1)N - l}\left[\mathrm{tr}(P^aLKH) + \mathrm{tr}(P^aL)\,\mathrm{tr}(KH)\right]\right\}. \tag{4}$$
These expressions are valid for a merged analysis, that is to say an analysis that considers the whole ensemble given by merging all of the subensembles. The expression for one particular subensemble of the l available is given by the following [Eq. (22) and (25) of Part I]:
$$\mathrm{tr}(\langle P^a_N \rangle) \simeq \mathrm{tr}(P^a) + \frac{l}{(l-1)N - l}\left[\mathrm{tr}(P^aLKH) + \mathrm{tr}(P^aL)\,\mathrm{tr}(KH)\right], \tag{5}$$
$$\mathrm{tr}(\langle D^a_N \rangle) \simeq \left(1 + \frac{l}{N}\right)\left\{\mathrm{tr}(P^a) + \frac{l}{(l-1)N - l}\left[\mathrm{tr}(P^aLKH) + \mathrm{tr}(P^aL)\,\mathrm{tr}(KH)\right]\right\}, \tag{6}$$
with $l \in [\![2, N]\!]$ and $p \in [\![1, l]\!]$.
We also recall from Part I that in the covariance inflation technique, the deviations from the ensemble-mean forecast are inflated by a constant factor r (Hamill et al. 2001) so that the new value $\tilde{\psi}^f_j$ for each member is
$$\tilde{\psi}^f_j = \psi^f_j + (r - 1)(\psi^f_j - \overline{\psi^f}),$$
where $\psi$ is a vector of length n containing the assimilated field and the overbar stands for an ensemble-based estimate. We shall point out here that this is a coarse way to compensate for covariance underestimation, because the inflation is homogeneous on the domain, whereas one may expect the inflation being dependent on the local density of observations. The optimal inflation factor in the sense of both criteria is given by solving $\mathrm{tr}(\langle P^a_N \rangle) = \mathrm{tr}(\langle D^a_N \rangle)$ for the new analysis:
$$r \simeq \left\{1 + \frac{1}{N}\frac{\mathrm{tr}(P^a)}{\mathrm{tr}(\langle L_N P^a_N \rangle)} + \left(2 + \frac{1}{N}\right)\frac{\mathrm{tr}(LP^aKH) + \mathrm{tr}(LP^a)\,\mathrm{tr}(KH)}{(N-1)\,\mathrm{tr}(\langle L_N P^a_N \rangle)}\right\}^{1/2}. \tag{7}$$
This value of r can be approximated by evaluating $\mathrm{tr}(P^a)$, $\mathrm{tr}(LP^aKH)$, $\mathrm{tr}(LP^a)$, and $\mathrm{tr}(KH)$ over the ensemble (for more details, please see appendix A).
3. The model and the assimilation system
a. The model
In the context of data assimilation in geophysical fluid dynamics, the use of a simple β-plane barotropic model is motivated by the need for dynamics that mimic a certain range of observed features and a scale-to-scale nonlinear transfer. The latter is particularly important when considering the propagation of information introduced at various scales by a set of observations, or resolution errors. The enstrophy cascade of barotropic β-plane homogeneous turbulence in the 2D model used here provides a simple representation of synoptic-scale flows. We think that this model is generally a good one to address the fundamental problems that we want to look at regarding the cost implications of the numerics. Namely, it is easy and rapid to implement and reproduces the simplest balanced dynamics. The model solves the barotropic vorticity equation on the β-plane:
$$\frac{\partial \zeta}{\partial t} + J(\Psi, \zeta) + \beta v = f - D(\zeta), \tag{8}$$
where $\Psi = -U_0 y + \psi$, $\zeta = \nabla^2 \psi$, $u = -\partial\psi/\partial y$, and $v = \partial\psi/\partial x$. Here J is the Jacobian, f is a forcing term, and D is a linear dissipation operator. The model integrates the equation on a doubly periodic domain of length 2π using pseudospectral methods (Orszag 1971). It is possible to perform model scaling by looking at its ‘‘climatology,’’ namely, the ensemble mean enstrophy and kinetic energy. After spinup, these quantities reach a statistically stationary state where fluctuations grow and decay. The same model settings as in Tanguay et al. (1995) are used here and the reader is referred to that paper for a detailed description. The domain size corresponds roughly to 7000 km, and the forcing scale to 2000 km. This mimics the injection of barotropic energy
by baroclinic instability at the synoptic scale. In this study, we have used the model at a very low resolution (64 × 64), implying Δx ∼ 100 km. As we work with a large number of realizations of the model, this is the price to pay to have good statistics. Figure 1 shows typical snapshots of the vorticity and the streamfunction. We think that this model more accurately reflects the NWP problem than, for instance, the Lorenz 1963 model.
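As an illustration of the pseudospectral treatment of Eq. (8), the following is a minimal sketch, assuming NumPy and a doubly periodic domain of length 2π. It evaluates only the unforced, inviscid part of the vorticity tendency, using J(Ψ, ζ) = J(ψ, ζ) + U_0 ∂ζ/∂x. The function names are hypothetical, the forcing f and dissipation D of Tanguay et al. (1995) are not reproduced, and dealiasing is omitted; it is not the model code used in this study.

```python
import numpy as np

def wavenumbers(n):
    """Integer wavenumbers (kx, ky) for an n x n grid on a domain of length 2*pi."""
    k = np.fft.fftfreq(n, d=1.0 / n)
    return np.meshgrid(k, k, indexing="ij")  # kx varies along axis 0, ky along axis 1

def jacobian(a_hat, b_hat, kx, ky):
    """Spectral coefficients of J(a, b) = a_x b_y - a_y b_x (products formed on the grid)."""
    def ddx(f_hat):
        return np.real(np.fft.ifft2(1j * kx * f_hat))
    def ddy(f_hat):
        return np.real(np.fft.ifft2(1j * ky * f_hat))
    return np.fft.fft2(ddx(a_hat) * ddy(b_hat) - ddy(a_hat) * ddx(b_hat))

def tendency(zeta_hat, kx, ky, beta, u0):
    """Unforced, inviscid part of d(zeta)/dt: -J(psi, zeta) - u0 * zeta_x - beta * psi_x."""
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                      # avoid dividing by zero for the mean mode
    psi_hat = -zeta_hat / k2            # invert zeta = laplacian(psi)
    psi_hat[0, 0] = 0.0
    return (-jacobian(psi_hat, zeta_hat, kx, ky)
            - 1j * kx * (u0 * zeta_hat + beta * psi_hat))

# Example: a single wave psi = cos(x), for which the Jacobian term vanishes.
n = 64
x = 2.0 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
kx, ky = wavenumbers(n)
zeta = -np.cos(X)
dzeta_dt = np.real(np.fft.ifft2(tendency(np.fft.fft2(zeta), kx, ky, beta=1.0, u0=0.0)))
```

For this single wave only the β term contributes, so the tendency reduces to β sin(x), which gives a simple check of the sign conventions.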
b. The assimilation system
The complete formulation of the stochastic, nonlocalized, EnKF used here can be found in section 3 of Part I. We have detailed in the latter that there are two ways to consider the analysis given by the DEnKF. The first one, the partitioned ensemble, implies that the members that compose each subensemble do not change for all the subsequent analysis cycles. The second one, the merged ensemble version of the DEnKF, implies that the l subensembles are merged at the end of each analysis step to give a single ensemble of size N. A new random partition of this merged ensemble in l different N/l-member subsets is employed for the next assimilation period. Twin-experiment data assimilation simulations have been performed using the barotropic model. The assimilated field is the streamfunction. A randomly chosen simulation, taken to be the truth, is run in parallel. Synthetic observations are generated from this truth run by perturbing it with numbers drawn from a zero-mean Gaussian distribution, generated with a pseudorandom number generator. For all EnKF implementations here, we have assumed that observations are independent and all have the same variance $\sigma_r^2$. This means that the observation error covariance matrix can be written $R = \sigma_r^2 I_m$ [where $I_m$ is the identity matrix in the observation space, and $\sigma_r$ has been systematically chosen so that $\mathrm{tr}(HP^f_N H^T) \sim \mathrm{tr}(R)$]. Observations are regularly spaced on the grid. Also, the total number of available observations as well as the observation frequency have been chosen so that the total variance is roughly reduced by a factor of 2 [i.e., $\mathrm{tr}(P^a) \sim 0.5\,\mathrm{tr}(P^f)$], which is in the range of expected values in the NWP context.
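The analysis step described above can be sketched as follows. This is a minimal illustration assuming NumPy, a linear observation operator H given as an m × n matrix, and R = σ_r² I_m; it is not the implementation of Part I. The function names, the argument layout (state dimension × members), and the optional inflation argument `r` are assumptions made for the example. The same gain routine serves the merged l-DEnKF, where each subensemble is updated with a gain computed from the remaining members of a random partition.

```python
import numpy as np

def kalman_gain(gain_ens, H, sigma_r):
    """K = Pf H^T (H Pf H^T + R)^-1 from the sample covariance of gain_ens (n x N)."""
    n_mem = gain_ens.shape[1]
    dev = gain_ens - gain_ens.mean(axis=1, keepdims=True)
    h_dev = H @ dev
    pf_ht = dev @ h_dev.T / (n_mem - 1)
    q = h_dev @ h_dev.T / (n_mem - 1) + sigma_r**2 * np.eye(H.shape[0])
    return np.linalg.solve(q, pf_ht.T).T      # q is symmetric, so this is pf_ht @ inv(q)

def update(ens, K, H, y, sigma_r, rng):
    """Update each member with its own perturbed copy of the observations y."""
    perturbed = y[:, None] + sigma_r * rng.standard_normal((H.shape[0], ens.shape[1]))
    return ens + K @ (perturbed - H @ ens)

def enkf_analysis(ens, H, y, sigma_r, rng, r=1.0):
    """Stochastic EnKF analysis; forecast deviations are first inflated by the factor r."""
    mean = ens.mean(axis=1, keepdims=True)
    ens = mean + r * (ens - mean)
    return update(ens, kalman_gain(ens, H, sigma_r), H, y, sigma_r, rng)

def denkf_analysis(ens, H, y, sigma_r, rng, l=2):
    """Merged l-DEnKF: each subensemble uses a gain computed from all the other members."""
    n_mem = ens.shape[1]
    analysis = np.empty_like(ens)
    for idx in np.array_split(rng.permutation(n_mem), l):   # new random partition
        others = np.setdiff1d(np.arange(n_mem), idx)
        K = kalman_gain(ens[:, others], H, sigma_r)
        analysis[:, idx] = update(ens[:, idx], K, H, y, sigma_r, rng)
    return analysis
```

With `r=1.0` the first routine is the standard stochastic EnKF; in `denkf_analysis` each of the l gains is built from (l − 1)N/l members, which is where the additional cost of the l-DEnKF arises.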
FIG. 1. Examples of vorticity and streamfunction fields obtained with the barotropic model at resolution 64 × 64. Shading corresponds to the vorticity and the contours correspond to the streamfunction.
1) INITIAL ENSEMBLE
To generate a collection of 10 000 initial and equiprobable members, the model is first spun up from random initial conditions. Once statistical stationarity is obtained, Fourier phases (but not amplitudes) are randomly perturbed to generate a set of 10 000 equally probable realizations. All of them are then run for a long time, up to a climatological error level. Then a succession of data assimilation cycles is performed with a 256-member EnKF. Here, m = 64 observations are available every 30 h. Once a stabilized error is reached (after 150 analysis cycles), the initial ensemble is considered to be ready. These operations have been repeated 36 times, in order to generate a first set of 9000 available independent members, and another 4 times to generate an additional 1000 independent members.
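The phase perturbation used to build the initial members can be illustrated as follows. This is a sketch under assumptions: NumPy is used, the function name is hypothetical, and the phases are fully replaced by random ones (the text only states that phases, but not amplitudes, are randomly perturbed, so the actual perturbation may differ).

```python
import numpy as np

def randomize_phases(field, rng):
    """Return a real field with the Fourier amplitudes of `field` and random phases."""
    amplitude = np.abs(np.fft.fft2(field))
    # The phases of a real white-noise field are random and obey the Hermitian
    # symmetry required for the inverse transform to be real.
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(field.shape)))
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * noise_phase)))
```

Because the amplitudes are untouched, every realization has the same spectrum (and therefore the same energy and enstrophy) as the spun-up field, which is what makes the members equally probable starting points.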
2) ONE ANALYSIS CYCLE
This experiment should be compared to a similar one with the Gaussian scalar of Part I. Here, an average over 1000 simulations has been performed for each ensemble size tested (varying between 16 and 128 members). Each simulation takes a random sample of N members among the first 9000 members of the 10 000 available, whereas ‘‘truth’’ is randomly chosen among the remaining 1000 others. The true background error covariance matrix, as well as the optimal values of the gain and the analysis error covariance matrix K and $P^a$ are approximated by running the 10 000 available members. As done in Hamill et al. (2001), we assume that the covariance estimate over this large ensemble can be considered as the true covariance $P^f$. This estimate has been used to calculate the analytic values taken by $\mathrm{tr}(\langle D^a_N \rangle)$ and $\mathrm{tr}(\langle P^a_N \rangle)$ and given by Eqs. (1) and (2). In this experiment, 64 observations are available every 30 h.
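The evaluation of the analytic values can be sketched as follows, assuming NumPy and taking Eqs. (1) and (2) in the form tr(⟨P^a_N⟩) ≃ tr(P^a) − c/(N − 1) and tr(⟨D^a_N⟩) ≃ (1 + 1/N)[tr(P^a) + c/(N − 1)], with c = tr(LP^aKH) + tr(LP^a)tr(KH). The function name and argument layout are hypothetical; in the experiment, P^f would be the covariance estimated from the 10 000-member ensemble.

```python
import numpy as np

def analytic_traces(Pf, H, R, N):
    """Return tr(Pa) and the analytic estimates of tr<Pa_N> and tr<Da_N> for N members."""
    Q = H @ Pf @ H.T + R
    K = np.linalg.solve(Q, H @ Pf).T          # optimal gain Pf H^T Q^-1 (Pf, Q symmetric)
    KH = K @ H
    L = np.eye(Pf.shape[0]) - KH
    Pa = L @ Pf                               # optimal analysis error covariance
    c = np.trace(L @ Pa @ KH) + np.trace(L @ Pa) * np.trace(KH)
    tr_Pa = np.trace(Pa)
    tr_PaN = tr_Pa - c / (N - 1)                      # Eq. (1): spread is underestimated
    tr_DaN = (1 + 1 / N) * (tr_Pa + c / (N - 1))      # Eq. (2): MSE of the ensemble mean
    return tr_Pa, tr_PaN, tr_DaN
```

In the scalar case with P^f = R = 1 and N = 11, this gives tr(P^a) = 0.5, an expected ensemble variance of 0.475, and an expected MSE of about 0.573, which shows the gap between spread and error that the second criterion is concerned with.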
3) SIMULATED FORECAST SYSTEM
A particular ensemble of members and truth chosen randomly from the 10 000 available is run for 200 days. In this data assimilation simulation m = 32 observations are available every 10 h. Ensembles with N = 32, 64, 96, 128, and 256 have been tested for the standard EnKF, and N = 32, 64, and 128 for the other EnKF versions.
4. Numerical results
In this section, we present the results of the simulations with the barotropic model. The standard EnKF, the EnKF with covariance inflation, and the l-DEnKF techniques have been tested. We compare the results obtained to the analytic expressions given in section 2, as well as to the results obtained with the Gaussian scalar of Part I. We conclude on the relative performance of each EnKF version. In the one-analysis cycle experiment we use the same second-order moment approach as in Part I. That is to say we look at the MSE of the analysis mean $\mathrm{tr}(\langle D^a_N \rangle)$, which is a measure of the accuracy of the filter, as well as the mean total analysis error variance represented by the trace of the analysis error covariance matrix $\mathrm{tr}(\langle P^a_N \rangle)$. The comparison of both values gives a measure of the reliability of the produced ensemble of analyses. These two mean values represent the total error values (i.e., the sum of the contributions of each length scale to the calculated errors). To see if some features are observable at particular length scales, we also looked at the spectral decomposition of these quantities. Even so, an analysis relying only on the second-order moment of the probability density function (pdf) is an incomplete tool to evaluate how reliable the ensemble might be. Since we may be dealing here with non-Gaussian fields (owing to the nonlinear character of the flow), we present rank histograms for each version of the EnKF tested. A technique adapted to multivariate contexts, the minimum spanning tree (MST) rank histogram (Smith and Hansen 2004), has been used. Last, we present the performance of each EnKF version in the simulated forecast system.
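As an illustration of the MST ranking, the following sketch computes the length L_0 of the tree built on the N ensemble members, the N lengths L_i obtained when the ith member is replaced by the truth, and the number of L_i smaller than L_0. It assumes plain Python (3.8 or later for `math.dist`), Euclidean distances between state vectors, and Prim's algorithm; the function names are hypothetical and the distance actually used in the experiments may differ.

```python
import math

def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    dist = [math.inf] * n          # distance from each point to the growing tree
    dist[0] = 0.0
    in_tree = [False] * n
    total = 0.0
    for _ in range(n):
        i = min((j for j in range(n) if not in_tree[j]), key=dist.__getitem__)
        in_tree[i] = True
        total += dist[i]
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[i], points[j])
                if d < dist[j]:
                    dist[j] = d
    return total

def mst_rank(members, truth):
    """Number of lengths L_i (member i replaced by the truth) smaller than L_0."""
    members = list(members)
    l0 = mst_length(members)
    lengths = [mst_length(members[:i] + [truth] + members[i + 1:])
               for i in range(len(members))]
    return sum(li < l0 for li in lengths)
```

Accumulating `mst_rank` over many independent analysis cycles gives the histogram. When the truth lies far outside the ensemble, every L_i exceeds L_0 and the rank is 0, so an underdispersive ensemble overpopulates the first bins.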
a. One-analysis cycle
1) MEAN ANALYSIS ERROR VARIANCE AND MSE OF THE ANALYSIS MEAN
FIG. 2. Simulation of one analysis cycle with the barotropic model. MSE of the analysis ensemble mean (triangles) and trace of the average analysis ensemble-based covariance matrix (squares). Means are taken over 1000 realizations. For each experimental curve, the analytic curves from Eqs. (1) and (2) are drawn as continuous lines.
Figures 2, 3, 4, 5, and 6 show the analytic (continuous lines) and simulated values (squares and triangles) taken by $\mathrm{tr}(\langle P^a_N \rangle)$ and $\mathrm{tr}(\langle D^a_N \rangle)$ as a function of N. The analytic curves are given by the equations of section 2. Figures 3, 4, and 5 show the results for the DEnKF, the left panel showing the results obtained with a merged ensemble and the right panel showing those obtained for the first subensemble only. In general, the results of the simulation with the barotropic model match very well the analytic curve for the EnKF and the EnKF with covariance inflation, and quite well for the DEnKF when N is not too small. This shows that the second-order accuracy of the series expansion is sufficient. In Fig. 2, as established in (1) and (2), we see that $\mathrm{tr}(\langle P^a_N \rangle)$ and $\mathrm{tr}(\langle D^a_N \rangle)$ are converging toward the optimal value (drawn as a horizontal solid line) with a negative and positive error, respectively. This means that the analysis ensemble is not reliable and confirms that the EnKF is naturally subject to filter divergence. In Fig. 3 one sees that the partitioned DEnKF (right panel) almost gives a reliable analysis ensemble (the second criterion), but at the expense of a significant decrease of the filter accuracy (roughly by a factor of 2), meaning that the first criterion is no longer satisfied. When l is increased, we see in Figs. 4 and 5 that the MSE of the
FIG. 3. As in Fig. 2, but for a 2-DEnKF. (left) Statistics for the merged analysis; analytic curves from Eqs. (3) and (4) are drawn as continuous lines. (right) Statistics for the analysis of the first ensemble; analytic curves from Eqs. (5) and (6) are drawn as continuous lines.
analysis mean increases dramatically. This characteristic comes from the factor l in the first parenthesis of Eq. (6). This loss of accuracy has been reported by studies cited in the introduction. On the other hand, as established by the analytical results, the mean error variance now has a positive error with respect to the optimal value. However, the mean analysis error variance still underestimates the actual MSE of the analysis mean. In our case, even if the partitioned 2-DEnKF might be interesting for the reliability of its ensemble (the squares are just below the triangles), we see in Figs. 4 and 5 that further ensemble divisions deteriorate the quality of the analysis ensemble in the sense of both criteria. On the other hand, and still as predicted, the merged DEnKF does not imply a significant loss of accuracy when compared to the standard stochastic EnKF. However, while still satisfying the first criterion, the 2-DEnKF gives a mean analysis variance significantly bigger than the MSE of the analysis mean, meaning that
the filter is not optimal in the sense of the second criterion (the spread is too big). This leads to a filter giving too much weight to the observations. When looking at Figs. 4 and 5 we see that this behavior is less evident when increasing l, as indicated by Eq. (3). For the 8-DEnKF, the merged analysis is almost optimal in the sense of both criteria. This is consistent with the idea that there exists some optimal value for l. The solution for this optimal value is given by Eq. (32) in Part I: in our case [64 × 64 barotropic model with $\mathrm{tr}(HP^fH^T) \sim \mathrm{tr}(R)$], we found $l_{\mathrm{opt}} = 9.6$. However, the l-DEnKF includes the calculation of l Kalman gain matrices using an ensemble of size $l^{-1}(l - 1)N$. Hence, an optimal l-DEnKF is quite demanding in terms of computational cost. In the scalar case, the solution is $l_{\mathrm{opt}} = 2$. In a simulation with the barotropic model at a resolution of 16 × 16 (not shown), we found that the optimal value was $l_{\mathrm{opt}} = 5.4$. This illustrates the variability of the optimal value of l (already mentioned in
FIG. 4. As in Fig. 3, but for a 4-DEnKF.
Part I), and suggests that it increases with resolution. Consequently, it might be difficult to implement a merged DEnKF with the optimal value of l in an operational context. Figure 6 shows the results obtained using the covariance inflation factor r described by Eq. (7). We see that this method is likely to satisfy both criteria, even though the mean analysis variance given by the EnKF with covariance inflation is slightly too small for small values of N. This comes from the fact that we used the actual value of the optimal inflation factor r as evaluated over the ensemble. An inflation of the inflation is then needed here. [The quality of the analysis is confirmed in Fig. 9, where one sees that this method can avoid filter divergence down to a small number of members (N = 32 in our case).] The above-cited graphs look very much like their scalar counterparts, presented in Part I. For the EnKF, the second-order truncation in the power series is still a very good approximation in this multivariate case. For the l-DEnKF, the truncated power series remains a
good approximation for values of N that are not too small. The similarity between the scalar case and the 64 × 64 periodic barotropic model, somewhat surprising, as well as the similar results obtained with resolutions 16 × 16 and 32 × 32 (not shown), suggest that the conclusions of our analysis may be independent of resolution and might apply quite well in an operational context where the state space is larger. We shall point out here that the magnitude of the MSE of the analysis mean and the mean analysis error variance are functions of $(I - KH)KH$. In a realistic data assimilation process, one expects the background and the observation information to have comparable error variances. In the simple scalar case, if we let k vary from 0.25 to 0.75, $(1 - k)k$ varies between 0.1875 and 0.25, a small range. In the present case, the results described above should remain qualitatively the same. We have confirmed this remark by performing similar experiments where K was changed by varying either the number of available observations or the observation error variance $\sigma_r^2$ (not shown).
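The range quoted for the scalar case can be checked directly. The short sketch below (plain Python; the function name and the sampling step are arbitrary choices for the illustration) evaluates (1 − k)k for gains between 0.25 and 0.75.

```python
def gain_product(k):
    """Scalar counterpart of (I - KH)KH for a gain k between 0 and 1."""
    return (1.0 - k) * k

gains = [i / 100 for i in range(25, 76)]       # k from 0.25 to 0.75 in steps of 0.01
products = [gain_product(k) for k in gains]
lowest, highest = min(products), max(products)
```

The product is symmetric about k = 0.5, where it reaches its maximum of 0.25, and it falls to 0.1875 at both ends of the interval, a variation of only 25% while the gain itself changes by a factor of 3.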
FIG. 5. As in Fig. 3, but for an 8-DEnKF.
2) MINIMUM SPANNING TREES RANK HISTOGRAMS
So far, we have looked at the mean analysis variance and the MSE of the analysis mean to assess the reliability of the analysis ensembles. That is to say we only focused on the second-order moment of the pdfs. One may ask if conclusions change when examining the pdfs as a whole. In this section, we use rank histograms to determine whether ensemble members and the truth are drawn from the same pdf. Rank histograms (also called Talagrand diagrams; Hamill et al. 2001) are a classic tool to evaluate ensemble forecasts. For an ensemble of size N, one chooses a way to sort the members, for example by comparing the magnitude of some single element of the state vector. Then, one records the rank of the verification in the constituted list. When the experiment is repeated over many realizations, the verification will populate the bins corresponding to the N + 1 available ranks. If the ensemble and the verification share a common pdf, there should
be no preference in the verification rank. Hence, a perfectly reliable ensemble leads to a flat histogram. However, in multivariate cases, as pointed out by Smith and Hansen (2004), one cannot “evaluate the diagrams from more than one variable unless the forecast value of each variable is truly independent.” In our context, rank histograms obviously do not provide reliable results because of the existence of spatial correlations in the error fields. Instead, we have employed minimum spanning trees (MSTs), an approach suggested by Smith and Hansen (2004). The idea is to use the length of MSTs to sort the members and determine the rank of the verification. First, N MST lengths $L_i$, $i \in [1, N]$, are calculated, using the trees obtained when the ith member is replaced by the truth. Then the length $L_0$ of the MST using the N ensemble members is calculated and ranked in the constituted list. This is repeated over $N_r$ independent analysis cycles. If the ensemble is reliable, the number of $L_i$ smaller than $L_0$ is a random variable with mean $(N_r + 1)^{-1}$ and standard deviation $N^{-0.5}\{(N_r + 1)^{-1}[1 - (N_r + 1)^{-1}]\}^{0.5}$, and should lead to a flat histogram. For example, in the case of the EnKF,
as the mean analysis error variance generally underestimates the MSE of the analysis mean, we expect $L_0$ to exceed very few of the lengths $L_i$, so the first bins of the histograms will be overpopulated. Here, we have constructed MST rank histograms for ensemble sizes varying between 16 and 128. As for the mean analysis error variance and the MSE of the analysis mean, we used 1000 different twin experiments, as described in section 3b. We point out that in these twin experiments, the verification and the analyzed ensemble can be considered to be independent. Figure 7 shows MST rank histograms for the standard EnKF, for the EnKF with covariance inflation, and for the merged analysis of the l-DEnKF with l = 2, 4, and 8. The results are highly consistent with those obtained for the second-order moments. For instance, we see that the standard EnKF is characterized by histograms far from being flat, especially for small values of N. Indeed, the fact that the left bins are overpopulated shows that the truth lies, on average, far from any generated ensemble. This is consistent with an underestimation of the spread, and therefore poses a risk of filter divergence. On the other hand, we see that using the appropriate covariance inflation factor leads to much flatter histograms. We see that the DEnKF gives comparable histograms, but only for l = 8. For smaller values of l, the last bins of the histograms are overpopulated, consistent with the previous experiment. (Also, when looking at Fig. 9, described below, we see that the flat histograms observed here correspond in practice to the best results obtained in terms of spread and error of the mean, and particularly that there seems to be a direct link between histograms overpopulated in the first bins and filter divergence.)
FIG. 6. As in Fig. 2, but for an EnKF with an optimal covariance inflation factor.
b. Multianalysis experiment
Figure 8 shows similar experiments as above, but after 9 analyses (~4 days) and 17 analyses (~7 days). We see that the analysis error variance tends to take an almost constant value, close to the optimal value and independent of the number of members N. On the other hand, the MSE of the analysis ensemble mean still decreases with respect to N. This supports the use of an inflation coefficient r that does not correct for the negative error observed in Eq. (1) in the analysis error variance. We can then drop the factor of 2 in the expression of r and use the following corrected expression for the inflation coefficient:
$$r_c \simeq \left\{ 1 + \frac{1}{N}\,\frac{\mathrm{tr}(P^a)}{\mathrm{tr}(\langle L_N P^a_N \rangle)} + \frac{(1 + N^{-1})\,[\mathrm{tr}(L P^a K H) + \mathrm{tr}(L P^a)\,\mathrm{tr}(K H)]}{(N - 1)\,\mathrm{tr}(\langle L_N P^a_N \rangle)} \right\}^{1/2}, \qquad (9)$$
rather than the one presented in Eq. (7). In our nonlinear context, we point out that it is impossible to establish theoretical results for the mean analysis error variance, as well as for the MSE of the ensemble mean, for more than one analysis. We therefore acknowledge that the above formula is only speculative in this case. Nevertheless, the results obtained in the next section confirm the relevance of Eq. (9).
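For reference, the way a multiplicative inflation coefficient acts on an ensemble can be sketched as follows. The sketch assumes the usual form of covariance inflation, in which the perturbations about the ensemble mean are multiplied by r; the text above does not spell out the implementation, so the function names and the `matching_factor` helper are hypothetical.

```python
import numpy as np

def inflate(ensemble, r):
    """Scale the perturbations about the ensemble mean by r (rows are members)."""
    ens = np.asarray(ensemble, dtype=float)
    mean = ens.mean(axis=0)
    return mean + r * (ens - mean)

def spread(ensemble):
    """Trace of the sample covariance, using the 1/(N - 1) normalization."""
    ens = np.asarray(ensemble, dtype=float)
    pert = ens - ens.mean(axis=0)
    return float(np.sum(pert ** 2) / (len(ens) - 1))

def matching_factor(spread_value, error_value):
    """Coefficient r for which the inflated spread equals a target error level."""
    return float(np.sqrt(error_value / spread_value))

rng = np.random.default_rng(1)
ens = rng.normal(size=(32, 5))       # 32 members, 5 state variables
inflated = inflate(ens, 1.2)
print(spread(inflated) / spread(ens))  # ratio of traces, equal to r**2 up to round-off
```

Because the spread scales as $r^2$ while the ensemble mean is unchanged, matching a target error level requires the square root of a ratio of traces, which is consistent with the exponent 1/2 in Eq. (9).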
c. Performance in a forecast system
Figure 9 shows a data assimilation experiment for different ensemble sizes and the above-mentioned versions of the EnKF. A given ensemble, chosen randomly from the 10 000 available, was run for 200 days. The values of $\mathrm{tr}(D^a_N)$, which is a measure of the analysis error of the mean, and $\mathrm{tr}(P^a_N)$, which is a measure of the spread, are plotted as a function of time. In the top panels, we can see the simulation for the standard EnKF for ensemble sizes N = 32, 64, 96, 128, and 256 members. For the period presented, the EnKF clearly diverges for N < 256 members. We see that the average variance, which reaches some stable value after
FIG. 7. MST rank histograms for different versions of the EnKF, for N = 32, 64, and 128: (from left to right) standard EnKF, EnKF with optimal covariance inflation, merged analysis of the 2-DEnKF, merged analysis of the 4-DEnKF, and merged analysis of the 8-DEnKF.
a few days, is smaller when N is small. On the other hand, the error of the mean increases more rapidly when N decreases and the divergence occurs earlier when N is smaller. These observations are highly consistent with previous results and confirm the existence of a relation between filter divergence and the nonreliability of the EnKF analysis ensemble. For N = 256, the two curves follow each other quite well. The bottom part of the panel shows the simulations for the same period and for the same ensemble sizes using the EnKF with optimal covariance inflations (corrected and uncorrected), and the merged l-DEnKF, l = 2, 4, and 8. The bottom row shows the results obtained with N = 32, the middle row with N = 64, and the top row for N = 128. For N = 32, we see that the DEnKF diverges for each value of l tested. This happens earlier for small l. We point out that the divergence observed here is different from the filter divergence observed with the standard EnKF in the top part, because both the analysis variance and error of the mean diverge. This result, already observed by Houtekamer and Mitchell (1998), shows that the DEnKF is able to track the evolution of the error of the mean, even if this error diverges. On the other hand, one can see that the uncorrected covariance inflation prevents the filter from diverging. Nevertheless, when comparing with the values obtained in the case of a standard EnKF with N = 256, we see that the level of the mean analysis error is much higher. This is the price to
pay to have a stable EnKF with such a small ensemble. For N = 64 and N = 128, the covariance inflation techniques as well as the DEnKF avoid the filter divergence observed for the standard EnKF. We clearly see that the corrected covariance inflation leads to smaller spread than any of these versions of the DEnKF. This result is consistent with the experiments presented above. The reason why the corrected covariance inflation diverges for N = 32 probably stems from the general underestimation of traces of matrices calculated over the ensemble. Nevertheless, unlike what might be inferred from the one-cycle experiment, we see that the error of the mean for the DEnKF and for the uncorrected inflation coefficient does not remain at the level of that obtained for the corrected inflation coefficient. Instead, it seems to stabilize at a larger value and follows the evolution of the mean analysis error variance. When using a larger r in the case of covariance inflation, and therefore increasing the spread, we have also observed (not shown) that the error of the mean stabilizes at a larger value. This means that the accuracy is degraded when the spread is too large.
To analyze error spectra, we define in the physical space the squared error of the analysis mean $d^2(x, y) = (\overline{\psi}^a - \psi^t)^2$, as well as the analysis error variance $\sigma^2(x, y) = \overline{(\psi_j^a - \overline{\psi}^a)^2}$, where $\psi$ is the actual streamfunction. Then, it is possible to express these error fields in the Fourier space as $d_k^2$ and $\sigma_k^2$, k being the
FIG. 8. As in Fig. 2, but for more than one analysis. MSE of the analysis ensemble mean (triangles) and trace of the average analysis ensemble-based covariance matrix (squares). The error values have been divided by the optimal analysis error obtained in Fig. 2 (i.e., 0.53). Results for (a) 9 and (b) 17 analysis cycles. Means are performed over 250 realizations.
wavenumber. Figure 10 shows the values of $d_k^2$ and $\sigma_k^2$ for various analysis times in the case of 128 members with a standard EnKF, an experiment shown in Fig. 9. One can see that the two errors have similar spectra, almost overlapping each other at the beginning. When the filter begins to diverge, the spectrum of the error of the analysis mean is translated vertically, whereas the spectrum of the analysis error covariance does not change significantly on average. This suggests that the difference between these two errors is quite homogeneous. This supports the use of a covariance inflation factor that does not depend on scale. We point out here that though the spectra of $\psi^t$ and $\psi$ are expected to be significantly different (especially in the small scales), this is not the case with the error field examined here. The average over the ensemble is also likely to reduce the differences between the two spectra. Moreover, the error
spectra appear to be very flat after 190 days. This might come from a lack of resolution, and it points to the need for future work involving a model with higher resolution, which might lead to a different averaging of the small scales and show more differences between the curves at large k. It would be straightforward to correct this in a spectral model.
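The spectral diagnostics discussed above can be illustrated with a short Python sketch. It assumes a square, doubly periodic grid and one plausible reading of the definitions, namely that $d_k^2$ and $\sigma_k^2$ are the power of the corresponding error fields summed over shells of integer total wavenumber k. The binning convention and the function names are assumptions, not details given in the text.

```python
import numpy as np

def shell_spectrum(field):
    """Sum |FFT|^2 of a square periodic field over integer wavenumber shells."""
    f = np.asarray(field, dtype=float)
    n = f.shape[0]
    fhat = np.fft.fft2(f) / f.size             # normalized Fourier coefficients
    power = np.abs(fhat) ** 2
    kx = np.fft.fftfreq(n, d=1.0 / n)          # integer wavenumbers along one axis
    kk = np.sqrt(kx[:, None] ** 2 + kx[None, :] ** 2)
    shell = np.rint(kk).astype(int)            # nearest integer total wavenumber
    return np.bincount(shell.ravel(), weights=power.ravel())

def error_spectra(members, truth):
    """Spectra of the error of the ensemble mean and of the ensemble variance."""
    ens = np.asarray(members, dtype=float)     # shape (N, n, n)
    mean = ens.mean(axis=0)
    d_k = shell_spectrum(mean - truth)
    s_k = sum(shell_spectrum(m - mean) for m in ens) / (len(ens) - 1)
    return d_k, s_k

# Synthetic example on a 16 x 16 grid (values are random, for illustration only).
rng = np.random.default_rng(2)
truth = rng.normal(size=(16, 16))
members = truth + rng.normal(size=(8, 16, 16))
d_k, s_k = error_spectra(members, truth)
```

With this normalization, each spectrum sums to the domain average of the corresponding physical-space field, so a uniform vertical shift of one curve relative to the other corresponds to a scale-independent ratio between the two errors.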
5. Conclusions
The performance of different types of ensemble Kalman filter (EnKF) in a quasigeostrophic barotropic model has been tested and compared. Though this context is very simplistic, we expect it to provide dynamics that efficiently mimic a certain range of features and the scale-to-scale energy transfer observed in the atmosphere and ocean. The problem addressed here is the sampling
FIG. 9. Simulation in a forecast system for different versions of the EnKF, for different ensemble sizes. The dark line corresponds to $\mathrm{tr}(D^a_N)$ (i.e., to the analysis error of the mean), and the gray line corresponds to $\mathrm{tr}(P^a_N)$ (i.e., to the spread). (top) Standard EnKF with N = 32, 64, 96, 128, and 256; (bottom from left to right) EnKF with optimal corrected covariance inflation, EnKF with optimal uncorrected covariance inflation, merged analysis of the 2-DEnKF, merged analysis of the 4-DEnKF, and merged analysis of the 8-DEnKF.
errors introduced by the use of ensembles of limited size and the general underestimation of the analysis variance and filter divergence. By averaging over a large number of equivalent realizations in a one-analysis cycle experiment, we have been able to confirm the analytic results presented in Part I for a stochastic EnKF, a stochastic double ensemble Kalman filter (DEnKF) and a stochastic EnKF with perturbed observations. The general agreement between the analytic and observed
behavior of the filters confirms the validity of the hypotheses behind the analytic results, especially the second-order truncation in the Taylor series and the Gaussianity of the error fields. It turns out that the results presented with a simple scalar model in Part I generalize very well to the nonlinear multivariate context. When used with an ensemble of limited size, the EnKF has a propensity to lead to filter divergence, because the error statistics of the analysis mean are on average different from the
FIG. 10. Spectra obtained for various times with a standard EnKF with N = 128 members in the simulated forecast system. As shown in Fig. 9, the EnKF diverges in this situation. (solid line) Squared error of the analysis ensemble mean; (dashed line) analysis error variance.
error statistics calculated over the ensemble. Namely, the spread underestimates the optimal value of the analysis error, whereas the error of the mean overestimates it, with the dependency on the number of members N described in Part I. On the other hand, it is seen that with the DEnKF, in its merged version, the analysis error variance generally overestimates the error of the mean. We found that it is possible to find the number of divisions for the DEnKF to be optimally accurate and reliable. Nevertheless, this optimal number may be hard to determine operationally and implies a larger computational cost. In its partitioned version, a DEnKF using two subensembles may give a reasonably reliable analysis, but at the expense of a significant loss of accuracy, while further division of the ensemble dramatically degrades the quality of the analysis. An EnKF with optimal covariance inflation has also been tested. Results show that it is a very good alternative to obtain an optimally accurate and reliable analysis, with no significant change in the EnKF algorithm and a very small computational cost. Furthermore, analysis of the MSE of the analysis mean and the analysis error covariance spectra supports the use of a scale-independent inflation factor at our resolution. However, we think that the use of an inflation factor that is homogeneous in space needs further investigation. For instance, using inhomogeneous boundary conditions may imply the need for a spatially varying inflation factor. In addition to the above-mentioned second-order statistics, a series of minimum spanning tree rank histograms were generated with the same experiment settings. Besides the consistency with the results obtained with second-order statistics, we noticed that the histogram's lack of flatness may be directly related to the potential for the filter to diverge. In particular, if the first bins are overpopulated, there is a high risk of filter divergence.
Further investigation would be needed to formally establish this link. Finally, flat rank histograms are obtained for large N, showing that the stochastic EnKF's analysis converges toward the correct error pdf. In general, there is also a good agreement between the conclusions inferred from the one-analysis cycle experiments and the results obtained from the simulated forecast system, meaning that the theoretical conclusions give practical insight. However, an anonymous reviewer pointed out that the number of observations used (m = 32) is less than or equal to any of the sizes of ensemble tested (N = 32, 64, 96, 128, and 256). This ratio does not correspond to the typical operational NWP problem. Future work would be needed to examine the performance of the tested filters in the portion of parameter space where the number of observations far outweighs
the number of ensemble members. Also, we have seen that the analytic formula proposed for the inflation coefficient might need some modification when used in a forecast system because the analysis error variance tends to take almost constant values, with respect to the number of members, after several cycles. Finally, we mention that in spite of the relatively low resolution and simplified dynamics used here, a series of strong assumptions have been shown to be valid in a nonlinear multivariate context. For instance, a linear observation operator as well as a diagonal observation error covariance matrix have been used. Moreover, though crucial in an operational context, the issue of covariance localization has not been addressed in our study. To properly tune an EnKF assimilation system, the impact of both covariance inflation and covariance localization on the accuracy and reliability of the EnKF should be studied. Nevertheless, even if the conclusions made here may need modifications in a real prediction system, we expect that this study will provide guidance for users of ensemble Kalman filters.
Acknowledgments. This work would not have been possible without fruitful conversations with Peter Houtekamer and Herschel Mitchell. We acknowledge financial support from the Canadian Foundation for Climate and Atmospheric Sciences (CFCAS; Grant GR-500B).
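APPENDIX
Estimation Errors in the Calculation of Matrix Traces
Formally, the estimation of all the matrices over the ensemble of size N involves an error. This error can be established by using appendix B of Part I. We have, for instance,
$$\mathrm{tr}(P^a) = \mathrm{tr}(\langle P^a_N \rangle) + \frac{1}{N - 1}\,[\mathrm{tr}(P^a L K H) + \mathrm{tr}(P^a L)\,\mathrm{tr}(K H)],$$
$$\mathrm{tr}(K H) = \mathrm{tr}(\langle K_N H \rangle) + \frac{1}{N - 1}\,[\mathrm{tr}(L K H K H) + \mathrm{tr}(L K H)\,\mathrm{tr}(K H)],$$
and
$$\mathrm{tr}(L P^a) = \mathrm{tr}(\langle L_N P^a_N \rangle) + \frac{1}{N - 1}\,[\mathrm{tr}(L K H P^a L) + \mathrm{tr}(L K K H L P^a) - \mathrm{tr}(L K H K H P^a) + \mathrm{tr}(K H)\,\mathrm{tr}(L P^a L) - \mathrm{tr}(K H)\,\mathrm{tr}(L K H P^a) + \mathrm{tr}(L K H)\,\mathrm{tr}(L P^a)],$$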
namely, a general underestimation of the traces of the matrices. This dependency on N may explain why the curves are not exact matches of each other for small values of N in Fig. 6, and also why the filter diverges with the corrected inflation coefficient for N = 32 in Fig. 9.
REFERENCES
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.
Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2758.
Bouttier, F., 1994: A dynamical estimation of forecast error covariances in an assimilation system. Mon. Wea. Rev., 122, 2376–2390.
Charron, M., P. L. Houtekamer, and P. Bartello, 2006: Assimilation with an ensemble Kalman filter of synthetic radial wind data in anisotropic turbulence: Perfect model experiments. Mon. Wea. Rev., 134, 618–637.
Evensen, G., 1992: Using the extended Kalman filter with a multilayer quasi-geostrophic ocean model. J. Geophys. Res., 97 (C11), 17 905–17 924.
Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343–367.
Furrer, R., and T. Bengtsson, 2007: Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivariate Anal., 98, 227–255.
Gauthier, P., P. Courtier, and P. Moll, 1993: Assimilation of simulated wind lidar data with a Kalman filter. Mon. Wea. Rev., 121, 1803–1820.
Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter–3D variational analysis scheme. Mon. Wea. Rev., 128, 2905–2919.
Hamill, T. M., and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. Mon. Wea. Rev., 133, 3132–3147.
Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790.
Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811.
Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.
Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131, 3269–3289.
Houtekamer, P. L., H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, 2005: Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Mon. Wea. Rev., 133, 604–620.
Karspeck, A. R., and J. L. Anderson, 2007: Experimental implementation of an ensemble adjustment filter for an intermediate ENSO model. J. Climate, 20, 4638–4658.
Lawson, W. G., and J. A. Hansen, 2004: Implications of stochastic and deterministic filters as ensemble-based data assimilation methods in varying regimes of error growth. Mon. Wea. Rev., 132, 1966–1981.
Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance, and model-error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130, 2791–2808.
Orszag, S. A., 1971: Numerical simulation of incompressible flow within simple boundaries (I). Galerkin (spectral) representations. Stud. Appl. Math., 50, 293–327.
Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.
Pham, D. T., J. Verron, and M. C. Roubaud, 1998: A singular evolutive extended Kalman filter for data assimilation in oceanography. J. Mar. Syst., 16, 323–340.
Sacher, W., and P. Bartello, 2008: Sampling errors in ensemble Kalman filtering. Part I: Theory. Mon. Wea. Rev., 136, 3035–3049.
Smith, L. A., and J. A. Hansen, 2004: Extending the limits of ensemble forecast verification with the minimum spanning tree. Mon. Wea. Rev., 132, 1522–1528.
Snyder, C., and F. Zhang, 2003: Assimilation of simulated Doppler radar observations with an ensemble Kalman filter. Mon. Wea. Rev., 131, 1663–1677.
Tanguay, M., P. Bartello, and P. Gauthier, 1995: Four-dimensional data assimilation with a wide range of scales. Tellus, 47A, 974–997.
Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485–1490.
Tong, M., and M. Xue, 2005: Ensemble Kalman filter assimilation of Doppler radar data with a compressible nonhydrostatic model: OSS experiments. Mon. Wea. Rev., 133, 1789–1807.
van Leeuwen, P. J., 1999: Comments on “Data assimilation using an ensemble Kalman filter technique.” Mon. Wea. Rev., 127, 1374–1377.
Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924.