
Meteorologische Zeitschrift, Vol. 17, No. 6, 707-718 (December 2008) © by Gebrüder Borntraeger 2008

Towards a GME ensemble forecasting system: Ensemble initialization using the breeding technique

Jan D. Keller¹∗, Luis Kornblueh², Andreas Hense¹ and Andreas Rhodin³

¹ Meteorological Institute, University of Bonn, Germany
² Max-Planck-Institute for Meteorology, Hamburg, Germany
³ Deutscher Wetterdienst, Offenbach, Germany

(Manuscript received March 19, 2008; in revised form September 30, 2008; accepted September 30, 2008)

Abstract

The quantitative forecast of precipitation requires a probabilistic background, particularly with regard to forecast lead times of more than 3 days. As only ensemble simulations can provide useful information on the underlying probability density function, we built a new ensemble forecasting system (GME-EFS) based on the GME model of the German Meteorological Service (DWD). For the generation of appropriate initial ensemble perturbations we chose the breeding technique developed by Toth and Kalnay (1993, 1997), which develops perturbations by estimating the regions of largest model-error-induced uncertainty. This method is applied and tested in the framework of quasi-operational forecasts for a three-month period in 2007. The performance of the resulting ensemble forecasts is compared to the operational ensemble prediction systems ECMWF EPS and NCEP GFS by means of ensemble spread of free atmosphere parameters (geopotential and temperature) and ensemble skill of precipitation forecasting. This comparison indicates that the GME ensemble forecasting system (GME-EFS) provides reasonable forecasts with a spread skill score comparable to that of the NCEP GFS. An analysis with the continuous ranked probability score exhibits a lack of resolution for the GME forecasts compared to the operational ensembles. However, with significant enhancements during the 3-month test period, the first results of our work with the GME-EFS indicate possibilities for further development as well as the potential for later operational usage.

1 Introduction

The development of meteorological forecasting techniques and observation systems has steadily improved the quality of short-range (up to 3 days) and medium-range (up to 10 days) weather forecasts, e.g. for temperature and wind, over the past 20 years. In contrast, precipitation forecasts are still associated with the same deficiencies as 15 years ago (Ebert et al., 2003). Quantitative precipitation forecasts, for instance, are still of poor quality and should always be considered uncertain. Additionally, forecasting precipitation with lead times of more than 3 days is inherently probabilistic due to the chaotic nature of atmospheric dynamics. Hence the attention of forecasting centers has been drawn to ensemble forecasting, which provides the only way of estimating the underlying probability density function of precipitation consistent with the remaining dynamics. In addition to the most probable value, the user of an ensemble forecast is also supplied with information regarding the likelihood of other events (e.g. extreme events). A general overview of ensemble forecasting can be found, for example, in Kalnay (2003).

∗ Corresponding author: Jan D. Keller, Meteorological Institute, University of Bonn, Auf dem Hügel 20, 53121 Bonn, Germany, e-mail: [email protected]

DOI 10.1127/0941-2948/2008/0333


J.D. Keller et al.: Ensemble initialization using the breeding technique

The quality of ensemble forecasts depends strongly on the method of ensemble initialization, as it influences the subsequent evolution of the simulations. A simple random perturbation (e.g. Leith, 1974) of the initial state, for instance, is not likely to lead to pronounced perturbation growth, which is necessary for a good estimate of the uncertainties in the forecast. Hence, special ensemble initialization procedures have been developed to enforce growth of the ensemble spread in such a way that it represents the uncertainties as well as possible.

Several different techniques are used for the generation of the initial perturbations of analysis fields. They focus either on the estimation of the fastest growing error modes or on the uncertainties in the initial analysis field caused by deficiencies in the observational network (either a lack of observations in a certain region or measurement errors). Only a short overview of three of these methods is presented here; a detailed description can be found in Hamill et al. (2000).

The singular vector (SV) method (Buizza and Palmer, 1995) is used for ensemble initialization at the European Centre for Medium-Range Weather Forecasts (ECMWF) in its ensemble prediction system (EPS). The SVs are created in such a way that the forecast perturbation growth is maximized over a finite time window. For this purpose the adjoint and tangent linear versions of the model are used, making this method technically difficult to implement.

The breeding (or bred vector (BV)) method was developed by Toth and Kalnay (1993, 1997) and has since been used in the National Centers for Environmental Prediction (NCEP) global forecasting system (GFS). This approach uses a recursive rescaling of the perturbed forecast relative to the unperturbed forecast and thus "breeds" the fastest growing modes of the perturbed forecast. A more detailed description of this method is given in section 2.

An ensemble initialization procedure that addresses the uncertainty arising from observational errors is presented in Houtekamer and Derome (1995). In the perturbed observations (PO) method the initial ensemble perturbations are produced by different sets of observations, which are then used in the data assimilation cycle. The different sets of observations are created by perturbing the original observations with normally distributed random numbers using the error statistics of the observations. This results in a set of analysis states describing an analysis probability density function.

Multiple studies compare the performance of the different ensemble initialization techniques. The difficulty in such a comparison is that the techniques are implemented in different models and are sometimes even combined in one forecast. Hence, most studies apply the techniques to a simplified model to make the comparison practicable (e.g. Hamill et al., 2000; Wang and Bishop, 2003; Bowler, 2006; Descamps and Talagrand, 2007), and therefore all of these studies lack realistic atmospheric conditions. Results of a comparison of (simple and masked) breeding and singular vectors as ensemble initialization techniques based on a full atmospheric model (ECMWF IFS) are presented in Magnusson et al. (in press). Their main findings are that the singular vector approach performs better than breeding for the Northern Hemisphere and that masked breeding is beneficial for enhancing growth in the Southern Hemisphere and tropical regions. They also emphasize a shortcoming of the breeding method: the initial spread has to be amplified considerably to allow the system to produce sufficient spread for medium-range forecasts.

A comparison of the performance of the three major centers for ensemble prediction (namely ECMWF, MSC and NCEP) is given by Buizza et al. (2005). They state that the ECMWF EPS provides the most skillful performance of the three systems, but that this is not proof of the superiority of the singular vector approach. The higher skill is considered to originate also from a superior model and data assimilation scheme, which has been confirmed by Magnusson et al. (in press).

The aforementioned methods for ensemble perturbation are based on the assumption that the errors of numerical weather forecasts originate from uncertainties in the initial conditions (arising either from observational or from model error). In addition to these methods, meteorological forecast centers have begun to perturb parts of the model physics itself to account for the uncertainties in parameterizations, especially on the subgrid scale (e.g. convection, turbulence, surface parameters; for further details see Houtekamer et al., 1996; Buizza et al., 1999). The implementation of such perturbations is not restricted to the perturbation of parameters but can also be realized by using completely different parameterization schemes.
This method of ensemble perturbation is referred to as stochastic parameterization and is currently used in operational forecasting at ECMWF and MSC, for example.

In this paper we compare the forecasts performed using our system to those produced by the operational ensemble prediction systems ECMWF EPS and NCEP GFS. The ECMWF EPS forecasts consist of a 50-member ensemble. In our study we chose the breeding technique as it is easy to implement and computationally inexpensive. In addition, the method uses the full nonlinear model to generate initial perturbations. As part of the priority program Quantitative Precipitation Forecast of the German Research Foundation (DFG) we have access to models and data of the German Meteorological Service (DWD), which gives us the opportunity to develop a new ensemble forecasting system based on the operational deterministic global model GME (Majewski et al., 2002).

In this article we describe in detail the application and performance of breeding as the ensemble initialization method in the completely new ensemble prediction framework GME-EFS. The principal idea behind the breeding method and a description of our implementation are given in section 2, the experimental setup is outlined in section 3, and this is followed by a description of the verification tools (section 4.1). The results of the verification analysis are shown and discussed in section 5. Final conclusions and remarks are given in section 6.

2 The breeding technique

The breeding technique is one of the simplest and computationally least expensive ensemble initialization procedures. The method was developed at NCEP during the early 1990s (Toth and Kalnay, 1993, 1997). It is a very efficient algorithm for the initialization of ensemble-based prediction systems, focusing on the dynamically introduced errors of the system. The bred vectors produce enhanced perturbations especially in those regions where former forecasts show large uncertainties due to model errors (error of the flow). The algorithm, called the breeding cycle, consists of the following steps:

1. First a small perturbation is added to the atmospheric analysis at a given time t0.
2. Then the model is integrated from both the perturbed and the unperturbed initial conditions for a short period t1 − t0.
3. The unperturbed (control) forecast is subtracted from the perturbed one.
4. The difference field is scaled down so that it has the same norm (see section 2.1) as the initial perturbation.
5. This scaled perturbation is added to the analysis corresponding to time t1 and the cycle is restarted at step 2.

The principal idea of the breeding technique is simple and often described in the literature. However, in some details we take an alternative approach to the original proposal of the breeding method. These details are described in the following sections (except for regional rescaling of the breeding modes, as we apply the rescaling on a global scale).
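The five steps of the breeding cycle can be illustrated with a minimal sketch on a toy system. The sketch below assumes the Lorenz-63 equations in place of the GME, a plain Euclidean norm in place of the energy norm of section 2.1, and the control forecast as a stand-in for the next analysis; the function names (`lorenz63_step`, `integrate`, `breed`) and all parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np

def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz-63 toy model (assumed stand-in for the GME) by one RK4 step."""
    def rhs(s):
        x, y, z = s
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def integrate(state, n_steps):
    """Integrate the toy model for n_steps time steps."""
    for _ in range(n_steps):
        state = lorenz63_step(state)
    return state

def breed(analysis0, n_cycles=50, steps_per_cycle=8, amplitude=1e-3, seed=0):
    """Run a breeding cycle; return the final bred vector and the growth per cycle."""
    rng = np.random.default_rng(seed)
    analysis = np.asarray(analysis0, dtype=float)
    # Step 1: small random perturbation with prescribed (Euclidean) norm.
    pert = rng.standard_normal(analysis.shape)
    pert *= amplitude / np.linalg.norm(pert)
    growth = []
    for _ in range(n_cycles):
        # Step 2: integrate the unperturbed and the perturbed initial conditions.
        control = integrate(analysis, steps_per_cycle)
        perturbed = integrate(analysis + pert, steps_per_cycle)
        # Step 3: subtract the control forecast from the perturbed one.
        diff = perturbed - control
        # Step 4: rescale the difference to the norm of the initial perturbation.
        growth.append(np.linalg.norm(diff) / amplitude)
        pert = diff * (amplitude / np.linalg.norm(diff))
        # Step 5: the control forecast stands in for the new analysis (assumption).
        analysis = control
    return pert, np.array(growth)
```

Calling `breed([1.0, 1.0, 1.0])` returns a bred vector whose norm equals the prescribed amplitude together with the amplification factor of each cycle; in a real system the analysis at t1 would come from data assimilation rather than from the control forecast.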

2.1 Total energy norm

For the breeding method some kind of norm is required to measure the difference between the atmospheric fields of the perturbed and the unperturbed forecast. Total energy is a reasonable choice of norm, as it can incorporate the kinetic and/or potential energy of the perturbation. We have expanded such an energy norm to a weighted total energy norm $\|\cdot\|_E$, which is defined by

$$
\|\cdot\|_E = w_u \cdot \int \frac{1}{2} u'^2 \, dV + w_v \cdot \int \frac{1}{2} v'^2 \, dV + w_T \cdot \int \frac{1}{2} \frac{c_p}{\bar{T}} T'^2 \, dV + w_{p_s} \cdot \int \frac{1}{g} \cdot \frac{1}{p_s} p_s'^2 \, \Phi_s \, dF \tag{2.1}
$$

with the weighting factors $w_u$, $w_v$, $w_T$ and $w_{p_s}$, $c_p$ the specific heat capacity at constant pressure, $g$ the gravitational acceleration, $\bar{T}$ the mean zonal temperature at the corresponding level and $\Phi_s$ the orographic height expressed by the surface geopotential. The symbols $u'$, $v'$, $T'$ and $p_s'$ denote the difference fields of the u and v wind components, the temperature and the surface pressure. These difference fields are given by

$$
u' = u_p - u_c \tag{2.2}
$$

with $u_p$ as u in the perturbed breeding run and $u_c$ as u in the unperturbed control run. The other variables $v'$, $T'$ and $p_s'$ are calculated analogously.

The weighted total energy norm $\|\cdot\|_E$ as described in equation 2.1 takes account of the kinetic energy (KE) as well as the available potential energy (APE), the latter represented by both the temperature and the surface pressure contributions. $\|\cdot\|_E$ is weighted by $w_u$, $w_v$, $w_T$ and $w_{p_s}$ in such a way that every contribution to the total energy, either from kinetic (u and v) or potential (T and ps) energy, is set to 0.25 at the initial time, so that changes of all individual contributions have the same influence on the value of the norm. The weights are calculated as follows

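$$
w_u = \frac{1}{4} \cdot \left( \int \frac{1}{2} u_0'^2 \, dV \right)^{-1} \tag{2.3}
$$

$$
w_v = \frac{1}{4} \cdot \left( \int \frac{1}{2} v_0'^2 \, dV \right)^{-1} \tag{2.4}
$$

$$
w_T = \frac{1}{4} \cdot \left( \int \frac{1}{2} \frac{c_p}{\bar{T}} T_0'^2 \, dV \right)^{-1} \tag{2.5}
$$

$$
w_{p_s} = \frac{1}{4} \cdot \left( \int \frac{1}{g} \cdot \frac{1}{p_s} p_{s0}'^2 \, \Phi_s \, dF \right)^{-1} \tag{2.6}
$$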

where $u_0'$, $v_0'$, $T_0'$ and $p_{s0}'$ are the initial perturbations of the u, v, T and ps fields. Hence $\|\cdot\|_E$ equals 1 for the initial state and expresses the relative change of the perturbation in the further course of the breeding process. In addition to measuring the perturbation growth in the breeding cycle, $\|\cdot\|_E$ can be used to measure the spread growth of the ensemble forecast itself. This has proven very beneficial in the process of setting up and tuning the GME-EFS.
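A discrete version of the weighted norm may make the role of the weights clearer. The sketch below is an illustration under stated assumptions, not the GME-EFS implementation: the integrals are replaced by sums over hypothetical volume weights `dV` and area weights `dF`, the constants `CP` and `G` are assumed standard values, and the function names are invented for this example.

```python
import numpy as np

CP = 1004.0   # specific heat at constant pressure in J kg^-1 K^-1 (assumed value)
G = 9.80665   # gravitational acceleration in m s^-2 (assumed value)

def energy_terms(du, dv, dT, dps, dV, dF, T_bar, ps, phi_s):
    """Return the four unweighted contributions of equation 2.1 as discrete sums.

    du, dv, dT : difference fields with shape (n_levels, n_points)
    dps        : surface pressure difference with shape (n_points,)
    dV, dF     : volume and area weights replacing the integrals (assumed given)
    """
    ke_u = np.sum(0.5 * du**2 * dV)
    ke_v = np.sum(0.5 * dv**2 * dV)
    ape_T = np.sum(0.5 * CP / T_bar * dT**2 * dV)
    ape_ps = np.sum(dps**2 * phi_s / (G * ps) * dF)
    return np.array([ke_u, ke_v, ape_T, ape_ps])

def weights_from_initial(terms0):
    """Equations 2.3 to 2.6: each initial contribution is scaled to 0.25."""
    return 0.25 / np.asarray(terms0, dtype=float)

def weighted_norm(weights, terms):
    """Equation 2.1: weighted sum of the four energy contributions."""
    return float(np.dot(weights, terms))
```

With weights derived from the initial perturbation the norm of that perturbation is 1 by construction, and doubling the amplitude of all difference fields yields a value of 4, since every term is quadratic in the perturbation.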

2.2 Ensemble initialization with breeding

The information coming from breeding is used in the ensemble initialization process as follows: let

$$
X = \begin{pmatrix} u \\ v \\ T \\ p_s \end{pmatrix} \tag{2.7}
$$

be a state vector consisting of the breeding parameter fields. An initial state vector $X_i^{init}$ is then generated so that

$$
X_i^{init} = f \cdot \left( X_i^{breeding} - X^{mean} \right) + X^{analysis} \tag{2.8}
$$

with f as a scaling factor, $X^{analysis}$ as the analysis state vector, $X_i^{breeding}$ as a single state vector from the breeding run and

$$
X^{mean} = \frac{1}{n} \sum_{i=1}^{n} X_i^{breeding} \tag{2.9}
$$

as the mean state of all breeding runs. In the method described above the breeding modes are calculated using the mean of the breeding runs instead of the control run. This is because the breeding patterns with respect to the control run are very similar to one another, and so would be the resulting perturbations. This is also the reason for applying the scaling factor f to the ensemble perturbations, as the breeding patterns with respect to the breeding runs' mean are quite small in amplitude.

3 Model and experimental setup

3.1 Model

For our experiments we have chosen the global model GME (Majewski et al., 2002) of the German Meteorological Service (DWD). In contrast to its predecessor, the GME performs the integration in the horizontal domain in grid space rather than spectral space. The grid is based on an icosahedron which is iteratively (ni times) divided into triangles (e.g. as proposed by Sadourny et al., 1968; Williamson, 1969). Such a grid provides nearly equally spaced (triangular) grid boxes and avoids a singularity at the pole. The horizontal resolution used in the experiments, ni = 64 (from here on referred to as ni64), corresponds to a conventional grid spacing of approximately 100 km. This treatment of the horizontal discretization also avoids unnecessary physical calculations in over-resolved near-pole areas, which occur in grids with polar singularities. In the adiabatic part of the model prognostic equations for the horizontal wind components, temperature, surface pressure, water vapor and liquid water content are solved. As in all NWP models, the GME uses parameterization schemes for physical processes which are left unresolved by the adiabatic part of the model. The schemes implemented in the GME model include calculations for

• radiative transfer,
• grid scale precipitation and cloud microphysics,
• shallow and deep convection,
• vertical turbulent fluxes,
• subgrid-scale orographic effects,
• soil model, and
• cloudiness.

For vertical discretization hybrid coordinates are used as described by Simmons and Burridge (1981) with 40 levels and a top level at about 30 km. For further information on the GME the reader is referred to Majewski et al. (2002) and references therein.

3.2 Adaption of the GME for ensemble prediction

The practical implementation of an ensemble prediction environment is often referred to as an embarrassingly parallel problem. To allow for efficient usage of the GME in such an ensemble prediction environment, the ensemble is started as a single application instead of multiple instances. The different ensemble members are then spawned in parallel threads. This implementation offers very simple handling of the whole system and hence is advantageous for experimental application of the GME-EFS. In this context special attention should be paid to the interpolation of data. Our experience shows that such procedures should be based on conservative methods (SCRIP; see Jones, 1999)¹. In addition, special care has to be taken when interpolating water vapor and cloud properties as well as surface coverage with snow and sea ice. For technical details a special report is available upon request.
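The idea of starting the ensemble as one application that spawns its members in parallel threads can be sketched as follows. This is only a structural illustration: `run_member` is a hypothetical stand-in (a logistic map) for one model integration, and Python threads are used merely to show the single-application layout, not to reproduce the performance characteristics of the actual GME-EFS implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def run_member(member_id, n_steps=100):
    """Hypothetical stand-in for one ensemble member integration.

    A logistic map is iterated from a member-specific initial value;
    a real member would run the forecast model from its perturbed state.
    """
    state = 0.3 + 0.01 * member_id
    for _ in range(n_steps):
        state = 3.7 * state * (1.0 - state)
    return state

def run_ensemble(n_members=21, max_workers=4):
    """Run all members from a single application using a pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the member order in the returned results.
        return list(pool.map(run_member, range(n_members)))
```

Because the members do not communicate during the integration, the result of each member is identical to what a stand-alone run of `run_member` would give, which is what makes the problem embarrassingly parallel.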

3.3 Experimental setup

The full ensemble forecasting system is tested with the basic breeding initialization in a quasi-operational mode for the three-month period of the COPS experiment (Wulfmeyer et al., 2008). To this end, initial forecasts are started at 2007-05-24 00 UTC with randomly disturbed surface pressure and run for 4 days. The resulting difference fields between the perturbed runs and the control run are then used as initial perturbations for the breeding cycle starting at 2007-05-28 00 UTC. The breeding procedure is then carried out every 12 hours until 2007-08-31 12 UTC. The breeding cycle is set up with 21 runs (one control run and 20 perturbed runs) and hence results in one control forecast and a total of 20 ensemble members.

The ensemble initialization is implemented as follows: the original breeding patterns obtained from the breeding cycle are multiplied by a scalar scaling factor f (see equation 2.8). In our simulations this factor is set to 3, which proved to be a reasonable value in a case study sensitivity analysis. A scalar scaling factor is chosen instead of random vectors to ensure the balance of the initialization state. Vectorial scaling, e.g. for concentrating the growth in a specific region, would disturb this balance, leading to a damping of the initial perturbations and therefore to a smaller growth of ensemble spread. The initialization procedure is performed every day at 00 UTC and the forecast simulations are run for 6 days.

¹ The implementation of the conservative calculations was realized within the cdo tool, which allows conversions and calculations of various gridded data formats and is freely available at http://www.mpimet.mpg.de/fileadmin/software/cdo
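The initialization of equation 2.8 with the scalar factor f = 3 can be written in a few lines. The sketch assumes that each breeding state is flattened into one row of a two-dimensional array; the function name `initial_states` and this array layout are illustrative choices and not taken from the GME-EFS code.

```python
import numpy as np

def initial_states(breeding_states, analysis, f=3.0):
    """Equation 2.8: scaled deviations from the breeding mean added to the analysis.

    breeding_states : array of shape (n_members, n_state), one row per breeding run
    analysis        : array of shape (n_state,), the analysis state vector
    f               : scalar scaling factor (3 in the experiments described above)
    """
    breeding_states = np.asarray(breeding_states, dtype=float)
    # Equation 2.9: mean state of all breeding runs.
    breeding_mean = breeding_states.mean(axis=0)
    return f * (breeding_states - breeding_mean) + np.asarray(analysis, dtype=float)
```

Since the deviations from the breeding mean sum to zero, the mean of the resulting initial states coincides with the analysis, and the ensemble spread at the initial time is f times the spread of the breeding runs.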

4 Verification

4.1 Verification methods

The quality of the GME-EFS is tested with different scores and skill scores. The results of these analyses are compared to the operational ensembles ECMWF EPS and NCEP GFS. One of the scores, the spread skill score, is based on the root mean square error (RMSE) relationship of ensemble spread and forecast error. The continuous ranked probability score is a probabilistic score which can be seen as an integral of the Brier score, making it usable not only for binary events but also for continuous distributions. The last verification score, the quantile verification skill score, computes the skill of forecast quantiles from the ensemble compared to those from a reference forecast for a certain variable, which is precipitation in our specific case.

4.1.1 Ensemble spread skill

Let $\|\cdot\|_{RMS}$ denote the root mean square (RMS). Then we can define

$$
E^t_{spd} = \overline{ \left\| X_e^t - \langle X_e^t \rangle \right\|^2_{RMS} } \tag{4.1}
$$

which is the spread of an ensemble at forecast time step t, where $X_e^t$ are the forecasted values of the single ensemble members at time t and $\langle X_e^t \rangle$ is the ensemble mean. The $\overline{\,\cdots\,}$ denotes an average over many ensemble forecasts. Then

$$
E^t_{err} = \overline{ \left\| X_T^t - \langle X_e^t \rangle \right\|^2_{RMS} } \tag{4.2}
$$

is the error of the forecast at time step t, where $X_T^t$ is the true state at time t. Following Palmer et al. (2006), the relationship between the spread and the error of the ensemble forecast is

$$
E^t_{spd} = E^t_{err} \tag{4.3}
$$

if the ensemble forecast is perfect regarding the estimate of the underlying probability density function. Thus

$$
E^t_{spd} \neq E^t_{err} \tag{4.4}
$$

for any other ensemble forecast. Therefore we define the spread skill score of the ensemble forecast in relation to the perfect ensemble as

$$
S^t = \frac{E^t_{spd}}{E^t_{err}} \tag{4.5}
$$

at the forecast time t, with $S^t = 1$ for a perfect ensemble forecast. In our verification the spread skill score $S^t$ is calculated by accumulating $E^t_{spd}$ and $E^t_{err}$ of either the geopotential at 500 hPa (Φ500) or the temperature at 850 hPa (T850) at each forecast time step t over all ensemble forecasts, resulting in a time series of the ensemble's spread skill score depending on forecast time. In addition, separate illustrations are given for $E_{spd}$ and $E_{err}$ in relation to the forecast lead time.

4.1.2 Continuous ranked probability score

The continuous ranked probability score (CRPS) is an extension of the Brier and ranked probability scores and is defined as

$$
\mathrm{CRPS} = \int_{-\infty}^{\infty} \left( P(x) - H(x - x_0) \right)^2 dx \tag{4.6}
$$

with $P(x)$ the cumulative probability density at x, $x_0$ the observed value and $H(x - x_0)$ the Heaviside function

$$
H(y) = \begin{cases} 1 & \text{if } y \geq 0 \\ 0 & \text{if } y < 0. \end{cases} \tag{4.7}
$$

The cumulative probability density is calculated using the empirical ranking method described in section 4.1.3. For additional information on the CRPS the reader is referred to Hersbach (2000). In our verification analysis the CRPS is calculated for the same data and time period as described for the spread skill score (section 4.1.1).

4.1.3 Quantile verification skill score

Following Friederichs and Hense (2007) we define a skill score based on quantiles of ensemble forecasts. The quantiles of the forecasts are estimated using the empirical ranking method (summarized in Folland and Anderson, 2002). Adapted to our analysis, let $N_{ens}$ be the number of ensemble members and let the single ensemble forecast values be ranked from the smallest value $x_1$ to the largest value $x_{N_{ens}}$. Then we use the method described by Jenkinson (1977) to estimate the cumulative probability value $P_i$ for the i-th value (with $i = 1 \ldots N_{ens}$) by

$$
P_i = 100 \cdot \frac{i - 0.31}{N_{ens} + 0.38}. \tag{4.8}
$$

Then we can calculate any quantile τ by simple linear interpolation, which is defined as follows:

$$
\beta_\tau(x) = (\tau - P_j) \cdot \frac{x_{j+1} - x_j}{P_{j+1} - P_j} + x_j \tag{4.9}
$$

with j being the index of the next smaller probability value in relation to τ and x the ensemble forecast values $x_1 \ldots x_{N_{ens}}$. It should be mentioned that the 0.5 quantile is also called the median and equals the mean if the data are normally distributed. Then the quantile verification score (QVS) for a forecast time t is given by

$$
\mathrm{QVS}^t = \sum_{i_t = 1}^{N_{fcs}} \rho\left( y_{i_t} - \beta_\tau(x^t_i) \right) \tag{4.10}
$$

with $y_{i_t}$ the observations, $x^t_i$ the ensemble forecast values at forecast time t, $N_{fcs}$ the total number of ensemble forecasts and

$$
\rho(u) = \begin{cases} \tau \cdot u & \text{if } u \geq 0 \\ (\tau - 1) \cdot u & \text{if } u < 0 \end{cases} \tag{4.11}
$$

the so-called check function. Assuming normally distributed data the QVS for the median would be close to the root mean square error. Based on the QVS, the quantile verification skill score for the forecast time t and quantile τ is defined in analogy to the Brier skill score (Brier, 1950) as

$$
\mathrm{QVSS}^t_\tau = 1 - \frac{\mathrm{QVS}^t_{for}(\tau)}{\mathrm{QVS}^t_{ref}(\tau)} \tag{4.12}
$$

which equals 1 for the perfect ensemble and is smaller than 1 otherwise.

4.2 Data set and verification domain

To allow for comparability, both verification methods are applied to data on a Gaussian grid with a resolution of 1.25° in longitude and latitude, which approximately matches the resolution of the GME ni64 grid used in our ensemble forecasts. Therefore all other model data are also remapped to this resolution on a Gaussian grid. The ensemble spread skill score analysis is limited to the northern hemisphere extra-tropics (30° N to 87.5° N). The approximation of the true atmospheric state for each verification time is provided by the corresponding ECMWF Integrated Forecast System (IFS) analysis. The ensemble forecast data of EPS and GFS and the IFS analyses are taken from the THORPEX Interactive Grand Global Ensemble (TIGGE) dataset¹.

Our QVSS verification analysis is conducted with 24-hour precipitation sums using DWD data from over 1000 gauges in Germany, provided in the framework of the DFG priority program. The QVSS is calculated using forecasts in 24-hour intervals and the mean of 24-hour accumulated precipitation observations. To ensure that a sufficient number of observations is available in each 1.25° box to calculate a reasonable area mean, only grid boxes with at least 80% of their area lying in Germany are taken into consideration. The reference forecast is built on precipitation data from 1999 to 2006 from the same observational network. This reference forecast is constructed by taking 31 days around the date of interest for all of the 8 years and calculating the area mean for each single date, resulting in a set of 8 · 31 values.

The GME-EFS results are compared to the operational ensemble forecasting systems of ECMWF and NCEP, which are both based on ensemble perturbation methods accounting for the uncertainties coming from model dynamics (singular and bred vectors). However, these centers have made additions to their operational frameworks for better forecasting performance. The ECMWF uses stochastic parameterizations, i.e. in addition to the perturbation of the initial state, parameterized quantities are stochastically perturbed. This accounts for the uncertainty coming from the parameterized processes (such as convection or subgrid turbulence) and has proven to be beneficial (Buizza et al., 1999). NCEP has altered the ensemble prediction system of the GFS by extending the ensemble initialization procedure with an ensemble transform (ET) approach. The original breeding patterns are orthogonalized using the forecast error covariance matrix, which is obtained by an ET with rescaling applied in the data assimilation cycle of a perturbed observation process (Wei et al., 2008).
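The spread skill score of section 4.1.1 and the quantile-based scores of section 4.1.3 can be sketched in a few functions. The sketch rests on several assumptions that are not part of the text: the array layouts are invented for illustration, probabilities and the quantile τ are expressed as fractions between 0 and 1 (equation 4.8 without the factor 100), and quantiles outside the range of the estimated probabilities are clamped to the smallest or largest member, a case that equation 4.9 leaves open. All function names are hypothetical.

```python
import numpy as np

def spread_skill(ens, truth):
    """Equations 4.1, 4.2 and 4.5 for one lead time.

    ens   : array (n_forecasts, n_members, n_points)
    truth : array (n_forecasts, n_points), e.g. verifying analyses
    """
    ens_mean = ens.mean(axis=1)
    e_spd = np.mean((ens - ens_mean[:, None, :]) ** 2)  # squared RMS spread
    e_err = np.mean((truth - ens_mean) ** 2)            # squared RMS error
    return e_spd / e_err

def ensemble_quantile(members, tau):
    """Equations 4.8 and 4.9 with probabilities as fractions (assumption)."""
    x = np.sort(np.asarray(members, dtype=float))
    n = x.size
    p = (np.arange(1, n + 1) - 0.31) / (n + 0.38)
    # np.interp interpolates linearly and clamps outside [p[0], p[-1]].
    return float(np.interp(tau, p, x))

def check_function(u, tau):
    """Equation 4.11."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)

def qvs(obs, quantile_fc, tau):
    """Equation 4.10: sum of the check function over all forecasts."""
    diff = np.asarray(obs, dtype=float) - np.asarray(quantile_fc, dtype=float)
    return float(np.sum(check_function(diff, tau)))

def qvss(obs, fc_quantiles, ref_quantiles, tau):
    """Equation 4.12: skill of the forecast quantiles relative to a reference."""
    return 1.0 - qvs(obs, fc_quantiles, tau) / qvs(obs, ref_quantiles, tau)
```

In such a setup the reference quantiles would be computed with `ensemble_quantile` from the climatological sample of 8 · 31 values described above, in the same way as the forecast quantiles are computed from the ensemble members.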

5 Results

Sample breeding patterns at the beginning and the end of the simulation period, taken from the geopotential height field at 500 hPa, are shown in Figures 1 and 2. The filled contours in the figures represent the differences between the first breeding member of the breeding cycle and the mean of all breeding members at 2007-06-01 00 UTC and 2007-08-31 12 UTC, respectively. These breeding patterns, which have been used to initialize the ensemble simulations, reveal a large difference in magnitude between the two times. This points to the problem that the breeding patterns do not seem to be fully developed at the beginning of the simulation period (2007-06-01) after 4 days of "breeding". The following analyses support this hypothesis.

¹ http://tigge.ecmwf.int

Figure 1: Breeding pattern of the first member for the 500 hPa geopotential at 2007-06-01 00 UTC expressed in geopotential meters.

Figure 2: Breeding pattern of the first member for the 500 hPa geopotential at 2007-08-31 12 UTC expressed in geopotential meters.

Figure 3: Spread skill score diagram of the 500 hPa geopotential against forecast lead time. Thick lines denote scores calculated over the whole simulation period for GME-EFS (solid), ECMWF EPS (dashed) and NCEP GFS (dash-dotted). Thin lines represent the scores of the GME-EFS calculated for June (diamonds), July (triangles) and August (squares). Vertical lines denote estimated 95 % confidence intervals.

The result of the spread skill score analysis for Φ500 is shown in Figure 3. The spread skill score of the GME-EFS for the quasi-operational simulations, calculated over all three months of the simulation period (thick lines), is comparable to that of the GFS, but both are much lower than that of the EPS, indicating that the EPS performs best. An additional comparison of single-month values is provided for the GME-EFS, indicated by the thin lines with diamonds (June), triangles (July) and squares (August). The monthly values for the EPS and GFS ensembles show only very small deviations from the overall values and are therefore omitted. The monthly analysis reveals a distinct trend towards higher skill scores, with low values in June and high values in August.

Figure 4: Root of ensemble spread and root mean square error calculated with the ensemble mean of the 500 hPa geopotential height for the whole period (thick lines) for the GME-EFS (solid), ECMWF EPS (dashed) and NCEP GFS (dash-dotted). Thin lines represent the values of the GME-EFS calculated for June (diamonds), July (triangles) and August (squares).

To further investigate the reasons for the different skill score characteristics, the two contributions to the skill score, ensemble spread and forecast error, are shown separately in Figure 4. The EPS exhibits a far stronger spread growth rate than the GFS and the GME-EFS, and this spread growth rate nearly equals the error growth rate of the EPS. This explains the ability of the EPS to maintain high skill score levels over longer forecast lead times. It can also clearly be seen that the monthly trend in the GME-EFS skill score is induced by an increase in ensemble spread and not by a reduction of the forecast error. Overall, the GME-EFS shows a higher forecast error and a smaller spread than the GFS. Hence, the skill score may be higher in a sense of reliability but not in a sense of resolution. However, the GME-EFS results are reasonable if we keep in mind that these are first results and that no calibration or tuning has been applied to the GME-EFS yet.

The results of an analogous analysis of the spread skill score (Figure 5) and of the ensemble spread and forecast error (Figure 6) for the 850 hPa temperature (T850) are encouraging. The trend of this skill score is contrary to that of the Φ500 skill score, as it increases with forecast lead time instead of decreasing. Again, the EPS performs best, and the GME-EFS skill score is higher than the GFS score at the beginning and lower at the end of the simulations. The separate illustration of spread and forecast error in Figure 6 also exhibits a very reasonable spread growth rate for the ECMWF EPS, which is far better than that of the GFS and the GME-EFS. The comparison between the latter two reveals that the forecast error of the GME-EFS is lower than that of the GFS. This also applies to the monthly values, which show a trend towards smaller forecast errors, especially at the end of the forecast time.

Figure 5: Spread skill score diagram of the 850 hPa temperature against forecast lead time. Thick lines denote scores calculated over the whole simulation period for GME-EFS (solid), ECMWF EPS (dashed) and NCEP GFS (dash-dotted). Thin lines represent the scores of the GME-EFS calculated for June (diamonds), July (triangles) and August (squares). Vertical lines denote estimated 95 % confidence intervals.

Figure 6: Root of ensemble spread and root mean square error of the ensemble mean of the 850 hPa temperature, calculated for the whole period (thick lines) for the GME-EFS (solid), ECMWF EPS (dashed) and NCEP GFS (dash-dotted). Thin lines represent the values of the GME-EFS calculated for June (diamonds), July (triangles) and August (squares).

An additional analysis using the CRPS is given in Figure 7, where the CRPS calculated for the geopotential height at 500 hPa is plotted against forecast lead time. The results confirm that the ECMWF EPS performs best of the three ensembles. All three ensembles exhibit a similar growth of the CRPS with lead time, with a slightly stronger increase for the GME-EFS. In general, the GME-EFS values are offset from the GFS values by about 3 at the beginning and 5.5 at the end of the forecast lead time. The CRPS split up on a monthly basis shows large differences between the three months. However, no clear trend can be determined: the June values of the CRPS are larger than the overall score, the July values are lower, and the August values match it. This may indicate that the ensemble spread is too large in August and that the forecast skill is decreasing in a sense of resolution.

Figure 7: Continuous ranked probability score of the 500 hPa geopotential, calculated for the whole period (thick lines) for the GME-EFS (solid), ECMWF EPS (dashed) and NCEP GFS (dash-dotted). Thin lines represent the values of the GME-EFS calculated for June (diamonds), July (triangles) and August (squares). Vertical lines denote estimated 95 % confidence intervals.

As an approach to further investigate the cause of the growth in ensemble spread, a perturbation growth analysis of the breeding modes is undertaken. The results presented in Figure 8 show that the strength of perturbation growth fluctuates around a value of 1.5 per day over the whole simulation period. The spread of the perturbation growth rates between the single simulations is small in the beginning and slowly widens over the three months. As this process continues until the end of the simulation period, a saturation of this spread does not seem to be reached. This is an indication that the quality of bred vector perturbations may depend not on the saturation of the perturbation growth rates but on the saturation of the spread of the bred vectors.

Figure 8: Time series of the perturbation growth of the breeding modes, expressed as the ratio of the energy norm ‖·‖_E at the current time step to that at the initial time step. In addition to the values of the single breeding runs (grey lines), the 9-day running mean (solid black line) is plotted with the standard deviation (dashed black line).

An analysis regarding precipitation is given using the quantile verification skill score (QVSS), which is calculated for 19 quantiles from 0.05 up to 0.95. Figure 9 shows an example of the spatial distribution of the QVSS for the GME-EFS, calculated with the 0.7 quantile for the whole forecast period. The structure of the plot, with higher QVSS values in the low mountain ranges and lower values in the lowlands and high mountain regions, is typical for quantiles above 0.5. This contrast increases up to the 0.95 quantile, while for quantile values lower than 0.5 the QVSS becomes more uniform in the horizontal (not shown).

Figure 9: Quantile verification skill score analysis for the 0.7 quantile of precipitation in Germany over the whole simulation period for the GME-EFS.

In Figure 10 the dependence of the quantile verification score (QVS) on the quantile value, averaged over the full area and the whole simulation period, is shown. The ensemble forecasts have higher QVS values than the reference forecast for low quantiles, but the increase of the reference's QVS towards higher quantile values is steeper than that of the ensembles. The best model is the EPS, which has a lower QVS than the other ensembles over all quantiles. The QVS of the GME-EFS is lower than that of the GFS up to a quantile value of about 0.7. The maxima of all forecasts lie around quantile values of 0.6 to 0.7, with slower increases from the lower quantiles and sharp decreases towards higher quantiles. For comparison only, the root mean square errors of the ensemble means of all three forecast models are plotted (the GME-EFS and EPS lines overlap); these would be comparable to the QVS of the 0.5 quantile in the case of normally distributed data.

Figure 10: The quantile verification score (QVS) as a function of the quantile value for the complete simulation period (June, July and August 2007) over all grid boxes in Germany for the parameter precipitation. The QVSs are plotted as thick black lines and their 95 % confidence intervals are represented by vertical lines for the GME-EFS (solid), the EPS (dotted), the GFS (dashed) and the reference forecast (dash-dotted). The RMSE of the ensemble mean is plotted in light grey for each model (the lines for GME-EFS and EPS overlap).

The influence of the QVS on the QVSS can be seen in Figure 11. The QVSS averaged over the full area and the whole three-month period also depends strongly on the quantile value. All three models exhibit negative values for low quantiles, which means that they are worse than the reference forecast. The QVSS increases towards positive values for higher quantiles. While the increase for the EPS extends up to the 0.95 quantile, the GME-EFS and the GFS have a maximum near the 0.7 and 0.8 quantile, respectively. Up to its maximum, the GME-EFS has a higher QVSS than the GFS, but again both are lower than that of the EPS. The enhanced performance of the EPS is probably connected to the stochastic parameterization, which is also used for subgrid-scale processes.

Figure 11: The quantile verification skill score (QVSS) as a function of the quantile value for the complete simulation period (June, July and August 2007) over all grid boxes in Germany for the parameter precipitation. The QVSSs are plotted as thick black lines and their 95 % confidence intervals are represented by vertical lines for the GME-EFS (solid), the EPS (dotted) and the GFS (dashed).

As it is a relatively new score, experience with the QVS and QVSS is limited. Like most skill scores, the QVSS is not proper, and therefore the QVS may be a better measure of forecast quality. However, QVS and QVSS perform well for variables whose distributions are far from those commonly assumed (e.g. precipitation and the normal distribution). In addition, forecast calibration shows promising results for the enhancement of ensemble forecasting skill: it would result in a significant reduction of the QVS values and therefore in a higher QVSS (Friederichs and Hense, 2008).
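The relation between QVS and QVSS discussed above can be made concrete with a small sketch. It assumes the check-function form commonly used in quantile regression for the QVS and the usual skill-score form QVSS = 1 − QVS/QVS_ref, which is consistent with negative values meaning "worse than the reference" in the text. The function names and the toy precipitation values are hypothetical, and the estimation of the quantile forecasts from the ensemble is not shown.

```python
def check_loss(u, tau):
    """Check function rho_tau(u): weights under- and over-forecasts asymmetrically."""
    return u * (tau - 1.0) if u < 0 else u * tau


def qvs(observations, quantile_forecasts, tau):
    """Quantile verification score: mean check loss of the observation minus the
    forecast tau-quantile (assumed form; lower values are better)."""
    losses = [check_loss(y - q, tau) for y, q in zip(observations, quantile_forecasts)]
    return sum(losses) / len(losses)


def qvss(qvs_forecast, qvs_reference):
    """Skill score relative to a reference forecast: positive means better than
    the reference, negative means worse."""
    return 1.0 - qvs_forecast / qvs_reference


# Toy daily precipitation totals (mm) and 0.7-quantile forecasts.
obs = [0.0, 2.0, 10.0]
ensemble_q70 = [1.0, 1.0, 4.0]      # hypothetical ensemble-derived quantiles
reference_q70 = [2.0, 2.0, 2.0]     # hypothetical climatological quantile

score = qvs(obs, ensemble_q70, 0.7)
reference = qvs(obs, reference_q70, 0.7)
print(score, reference, qvss(score, reference))
```

Because the QVSS is a ratio against the reference score, a small change in the reference QVS at low quantiles can change the sign of the skill score, which is one reason to inspect the QVS itself as well.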

6 Conclusions

The results from the quasi-operational mode of the GME-EFS shown in section 5 indicate that the implemented breeding cycle establishes an ensemble initialization process that can be used to generate reasonable ensemble forecasts. However, the GME-EFS still performs worse than the operational ensembles ECMWF EPS and NCEP GFS. In particular, it exhibits a higher forecast error and a lower skill in a sense of resolution. The most prominent characteristic of the GME-EFS simulations is the significant increase of the ensemble spread (and therefore of the spread skill score) from June to August, caused by an increase in the spread of the bred vectors used for ensemble initialization. This may be an indication that the 4-day breeding cycle prior to the ensemble simulations is simply too short. The time necessary for the breeding modes to develop reasonable initial perturbations seems to be much longer, probably on the scale of months. The reason for the initially small spread is the similarity of the bred vectors, which only slowly develop different structures: at first the bred vectors all point in one direction and then slowly begin to evolve into different preferred directions. Evidence for this diversification becomes apparent in the breeding mode perturbation growth analysis. While the perturbation growth oscillates quite strongly, the spread between the different breeding runs increases over the whole period: the initial differences are too alike to build up a set of diverse breeding modes. Additional studies with longer timescales and different initial perturbations for the breeding process will be undertaken to determine the appropriate setup for an optimal perturbation growth in the ensemble simulations. The improvement in quality of the breeding modes also leads to the conclusion that the scaling factor f, which inflates the bred vectors to be used for ensemble initialization, may become superfluous when the breeding cycle is performed over a long time period. This would also prevent the system from overestimating the ensemble spread, which negatively influences the skill in a sense of resolution (as can be seen from the CRPS).

During set-up and testing of the GME-EFS another issue regarding the implementation of the breeding cycle became apparent. It is commonly assumed that the unperturbed control forecast is a good representation of the ensemble mean, but during the set-up of the GME-EFS this could not be validated. We therefore decided to calculate the initial perturbations for the ensemble forecasts from the differences between the breeding runs and the mean of the breeding runs. As this led to far better performance, it may also be beneficial to implement this approach in the breeding cycle itself or in the initialization of the breeding process, which requires further study.
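The choice described above, taking perturbations relative to the mean of the breeding runs rather than relative to the control forecast, can be sketched as follows. This is an illustrative example only: the state vectors are toy lists, the perturbations are simply added to an analysis state (an assumption about how they are applied), and the function names are hypothetical. The parameter `f` plays the role of the scaling factor f mentioned in the text.

```python
def centered_perturbations(breeding_runs, f=1.0):
    """Differences between each breeding run and the mean of all breeding runs,
    inflated by the scaling factor f."""
    n = len(breeding_runs)
    mean = [sum(values) / n for values in zip(*breeding_runs)]
    return [[f * (x - m) for x, m in zip(run, mean)] for run in breeding_runs]


def initialize_ensemble(analysis, breeding_runs, f=1.0):
    """Initial ensemble states: analysis plus one centered perturbation per member
    (assumption: perturbations are added directly to the analysis)."""
    perturbations = centered_perturbations(breeding_runs, f)
    return [[a + p for a, p in zip(analysis, pert)] for pert in perturbations]


# Three toy breeding runs with a two-component state vector.
runs = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
analysis = [10.0, 20.0]
ensemble = initialize_ensemble(analysis, runs, f=2.0)
print(ensemble)
```

By construction the perturbations sum to zero, so the mean of the initial ensemble coincides with the analysis regardless of how well the control run represents the mean of the breeding runs.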
In our analysis, 95 % confidence intervals are calculated using the bootstrap method (see Efron and Tibshirani, 1991). The method is implemented by calculating the score for 100 different sets of ensemble members drawn randomly with replacement. These 100 scores are then used to determine the 95 % interval. In our case all resampled sets exhibited a lower score than the original one, and hence the confidence interval generally lay below the original score. This indicates that information from all ensemble members is necessary to ensure the quality of the forecast. In turn, this implies that an ensemble size of 20 (or even 50 in the case of the EPS) is not yet sufficient to appropriately describe the probability density function of the forecast. For our illustrations we changed the basis for the resampling from single ensemble members to sets of ensemble forecasts to ensure correct confidence intervals.

The comparative analysis of the GME-EFS against the state-of-the-art operational ensemble prediction systems ECMWF EPS and NCEP GFS, conducted for a three-month test period, showed prospects for operational usage of the GME-EFS. The results from the quasi-operational simulations also reveal the potential for future enhancements, be it from an enhanced ensemble initialization (e.g. masked breeding) or from ensemble calibration, which will be implemented in relation to the corresponding data assimilation procedures. Overall, the promising first results of the quasi-operational GME-EFS ensemble forecasts encourage further work.
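The bootstrap procedure used for the 95 % confidence intervals can be sketched as follows. This is a minimal example, not the code used for the figures: it resamples single ensemble members with replacement 100 times, uses a simple percentile rule to read off the interval (an assumption, since the text does not state how the interval is derived from the 100 scores), and uses the ensemble standard deviation at one point as a stand-in score. All names are hypothetical.

```python
import math
import random
import statistics


def bootstrap_interval(members, score, n_boot=100, level=0.95, seed=0):
    """Percentile bootstrap interval of `score` over resampled ensembles.

    Each resample draws len(members) members with replacement."""
    rng = random.Random(seed)
    n = len(members)
    scores = sorted(score(rng.choices(members, k=n)) for _ in range(n_boot))
    alpha = (1.0 - level) / 2.0
    lower = scores[math.floor(alpha * (n_boot - 1))]
    upper = scores[math.ceil((1.0 - alpha) * (n_boot - 1))]
    return lower, upper


def point_spread(members):
    """Toy score: population standard deviation of scalar member values."""
    return statistics.pstdev(members)


# Twenty toy member values at a single grid point.
toy_rng = random.Random(1)
members = [toy_rng.gauss(0.0, 1.0) for _ in range(20)]
print(point_spread(members), bootstrap_interval(members, point_spread))
```

Resampling with replacement tends to repeat some members and omit others, so a score that depends on the diversity of the ensemble is often smaller for the resampled sets than for the full ensemble. This is in line with the observation that the intervals lay below the original score.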

Acknowledgments

Thanks to the Max-Planck-Institute for Meteorology (MPIfM), the German Meteorological Service (DWD), the German Climate Computing Centre (DKRZ) and the European Centre for Medium-Range Weather Forecasts (ECMWF) for providing data and computational resources. This work has been funded by the German Research Foundation (DFG) priority program 1167 Quantitative Precipitation Forecast. We also want to thank the reviewers for their constructive and helpful remarks.

References

Bowler, N.E., 2006: Comparison of error breeding, singular vectors, random perturbations and ensemble Kalman filter perturbation strategies on a simple model. – Tellus 58, 538–548.
Brier, G.W., 1950: Verification of forecasts expressed in terms of probability. – Mon. Wea. Rev. 78, 1–3.
Buizza, R., T. Palmer, 1995: The singular-vector structure of the atmospheric global circulation. – J. Atmos. Sci. 52, 1434–1456.
Buizza, R., M. Miller, T. Palmer, 1999: Stochastic representation of model uncertainties in the ECMWF Ensemble Prediction System. – Quart. J. Roy. Meteor. Soc. 125, 2887–2908.
Buizza, R., P. Houtekamer, G. Pellerin, Z. Toth, Y. Zhu, M. Wei, 2005: A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. – Mon. Wea. Rev. 133, 1076–1097.
Descamps, L., O. Talagrand, 2007: On some aspects of the definition of initial conditions for ensemble prediction. – Mon. Wea. Rev. 135, 3260–3272.
Ebert, E.E., U. Damrath, W. Wergen, M.E. Baldwin, 2003: The WGNE assessment of short-term quantitative precipitation forecasts. – Bull. Amer. Meteor. Soc. 84, 481–492.
Efron, B., R. Tibshirani, 1991: Statistical data analysis in the computer age. – Science 253, 390–395.
Folland, C., C. Anderson, 2002: Estimating changing extremes using empirical ranking methods. – J. Climate 15, 2954–2960.
Friederichs, P., A. Hense, 2007: Statistical downscaling of extreme precipitation events using censored quantile regression. – Mon. Wea. Rev. 135, 2365–2378.
Friederichs, P., A. Hense, 2008: A probabilistic forecast approach for daily precipitation totals. – Wea. Forecasting 23, 659–673.
Hamill, T.M., C. Snyder, R.E. Morss, 2000: A comparison of probabilistic forecasts from bred, singular-vector, and perturbed observation ensembles. – Mon. Wea. Rev. 128, 1835–1851.
Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. – Wea. Forecasting 15, 559–570.
Houtekamer, P., J. Derome, 1995: Methods for ensemble prediction. – Mon. Wea. Rev. 123, 2181–2196.
Houtekamer, P.L., L. Lefaivre, J. Derome, H. Ritchie, H.L. Mitchell, 1996: A system simulation approach to ensemble prediction. – Mon. Wea. Rev. 124, 1225–1242.
Jenkinson, A.F., 1977: The analysis of meteorological and other geophysical extremes. – Met Office Synoptic Climatology Branch Memo 58, available from National Meteorological Library, Bracknell, Berkshire RG12 2SZ, United Kingdom.
Jones, P., 1999: First- and second-order conservative remapping schemes for grids in spherical coordinates. – Mon. Wea. Rev. 127, 2204–2210.
Kalnay, E., 2003: Atmospheric modeling, data assimilation and predictability. – Cambridge University Press, 341 pp.
Leith, C., 1974: Theoretical skill of Monte Carlo forecasts. – Mon. Wea. Rev. 102, 409–418.
Magnusson, L., M. Leutbecher, E. Källén, in press: Comparison between singular vectors and breeding vectors as initial perturbations for the ECMWF ensemble prediction system. – Mon. Wea. Rev.
Majewski, D., D. Liermann, P. Prohl, B. Ritter, M. Buchhold, T. Hanisch, G. Paul, W. Wergen, J. Baumgardner, 2002: The operational global icosahedral-hexagonal gridpoint model GME: Description and high-resolution tests. – Mon. Wea. Rev. 130, 319–338.
Palmer, T., R. Buizza, R. Hagedorn, A. Lawrence, M. Leutbecher, L. Smith, 2006: Ensemble prediction: A pedagogical perspective. – ECMWF Newsletter 106, 10–17.
Sadourny, R., A. Arakawa, Y. Mintz, 1968: Integration of the nondivergent barotropic equation with an icosahedral hexagonal grid for the sphere. – Mon. Wea. Rev. 96, 351–356.
Simmons, A.J., B.M. Burridge, 1981: An energy and angular momentum conserving vertical finite-difference scheme and hybrid vertical coordinates. – Mon. Wea. Rev. 109, 758–766.
Toth, Z., E. Kalnay, 1993: Ensemble forecasting at NMC: the generation of perturbations. – Bull. Amer. Meteor. Soc. 74, 2317–2330.
Toth, Z., E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. – Mon. Wea. Rev. 125, 3297–3319.
Wang, X., C. Bishop, 2003: A comparison of breeding and ensemble transform Kalman filter ensemble forecast schemes. – J. Atmos. Sci. 60, 1140–1158.
Wei, M., Z. Toth, R. Wobus, Y. Zhu, 2008: Initial perturbations based on the ensemble transform (ET) technique in the NCEP global operational forecast system. – Tellus A 60, 62–79.
Williamson, D.L., 1969: Integration of the primitive barotropic model over a spherical geodesic grid. – Mon. Wea. Rev. 98, 512–520.
Wulfmeyer, V., A. Behrendt, H.-S. Bauer, C. Kottmeier, U. Corsmeier, A. Blyth, G. Craig, U. Schumann, M. Hagen, S. Crewell, P. Di Girolamo, C. Flamant, M. Miller, A. Montani, S. Mobbs, E. Richard, M.W. Rotach, M. Arpagaus, H. Russchenberg, P. Schlüssel, M. König, V. Gärtner, R. Steinacker, M. Dorninger, D.D. Turner, T. Weckwerth, A. Hense, C. Simmer, 2008: The Convective and Orographically-induced Precipitation Study: A research and development project of the World Weather Research Program for improving quantitative precipitation forecasting in low-mountain regions. – Bull. Amer. Meteor. Soc. 89, 1477–1486, DOI 10.1175/2008BAMS2367.1.