Improved Skill for the Anomaly Correlation of Geopotential Height at 500 hPa
T.N. Krishnamurti1 K. Rajendran1 T.S.V. Vijaya Kumar1 Stephen Lord2 Zoltan Toth2 Xiaolei Zou1 I. Michael Navon3 and Jon Ahlquist1 1
Department of Meteorology Florida State University Tallahassee, Fl 32306–4520
2
Environmental Modeling Center, National Centers for Environmental Prediction, 5200 Auth Road, Camp Springs, MD 20746 3
CSIT and Department of Mathematics, Florida State University, Tallahassee, FL 32306
Submitted to the Bulletin of American Meteorological Society 20 November 2001
Corresponding Author: T.N. Krishnamurti E-mail:
[email protected]
1
Abstract This paper addresses the anomaly correlations of the 500 hPa geopotential heights from a suite of global multimodels and from a model-weighted ensemble mean called the superensemble. This procedure follows a number of current studies on weather and seasonal climate forecasting that are being pursued.
This study includes a slightly
different procedure from that used in our experimental forecasts for other variables. Here we construct a superensemble for the ∇2 of the geopotential based on the daily forecasts of the geopotential fields at the 500 hPa level. The geopotential of the superensemble is recovered from solution of the Poisson equation. This procedure appears to improve the skill for those scales where the variance of the geopotential is large and contributes to a marked improvement in the skill of the anomaly correlation. We note especially large improvements over the Southern Hemisphere. Consistent day-6 forecast skill above 0.80 is achieved on a day-to-day basis. The superensemble skills are higher than those of the best model and the ensemble mean. For days 1 through 6, the percent improvement in anomaly correlations of the superensemble over the best model are 0.3, 0.8, 2.3, 4.8, 8.6 and 14.6% respectively for the Northern Hemisphere. The corresponding numbers for the Southern Hemisphere are 1.1, 1.7, 2.7, 4.5, 7.1 and 12.2% respectively. At forecast day-5 and day-6, the superensemble realizes major improvement of anomaly correlation skills. The collective regional strengths of the member models, which reflected in the proposed superensemble, provide a useful consensus product that may be useful for future operational guidance.
2
1. Introduction In several operational numerical weather prediction centers, the anomaly correlation of 500 hPa forecasts has always been used as a measure of the models’ overall performance. Professor Fred Sanders from MIT was a frequent lecturer at FSU in recent years.
One of his favorite comments was that the 500 hPa anomaly correlation (a
measure of skill of 500 hPa geopotential height forecasts) was not weather. He always wondered why so much was said on that skill parameter while assessing the relative performance of models when in fact what mattered was the rain and the severe weather. The counter argument, he usually received, was that if the troughs and ridges were not correctly placed, the likelihood of a good weather forecast were slim. For what it may have been worth, an anomaly correlation index has been used uniformly over several decades by the weather services of the world to assess the performance of their models. The motivation for this paper emerged from our recent examination of anomaly correlations of 500 hPa heights in the context of a multimodel superensemble following several of our recent studies, Krishnamurti et al. (1999, 2000a, 2000b, and 2001). The multimodel superensemble is best explained from the schematics of Fig. 1. The word “superensemble” was used by the first author of this paper in a series of publications, only to stress the fact that this ensemble does carry the highest skill compared to participating member models of the ensemble and also carry skills above those of the bias removed ensemble mean representations. Here the multimodel forecasts are divided into two categories: i) past medium range forecasts and ii) current real-time medium range forecasts. The past covers roughly 100 recent past forecasts made by the multimodels. Given a benchmark analysis for this entire period, it is possible to obtain a modelweighted bias of forecasts of the multimodels at each geographical location. That is done 3
following Krishnamurti et al. (2000a), using a multiple regression of the model forecasts (applied on individual ensemble members) against a benchmark analysis. In this study, ECMWF analysis was selected as the benchmark.
This superensemble, based on
selective weights assigned to the member models, can be viewed in a probabilistic sense providing information from a number of models, Stefanova and Krishnamurti (2002). A list of acronyms is provided in Table 1. An anomaly correlation of 0.6 is generally regarded as an indication of a useful forecast. This threshold value came from experience of watching the forecast charts. A forecast with a skill greater than 0.6 generally implies that troughs and ridges at 500 hPa are beginning to be properly placed in that forecast. Reviewing the anomaly correlations of US operational forecasts, Kalnay (1991) reported the summaries for the decade of the 1980’s. During this decade, the anomaly correlations for five-day forecasts increased from values such as 0.6 to approximately 0.75 over the Northern Hemisphere and from approximately 0.4 to 0.625 over the Southern Hemisphere, as seen in Fig. 2a. Here, only the first twelve zonal wave numbers were included in her analysis. If we take the first twelve wave numbers (from forecasts and the analysis), the current skill during November 2000 for the best model and the proposed superensemble are approximately 0.85 and 0.90. This implies that major improvements for the five-day forecasts are continually being experienced. This progress of the forecast quality was attributed to factors such as the improved computing power, improved models, data coverage and assimilation methodologies. Fig. 2b shows the 500 hPa anomaly correlation skill over the European region for October through December 1981 carried out with the ECMWF model (Nieminen, 1983). What is most encouraging here is the steady maintenance of very high skill at day 3 4
forecasts where the anomaly correlations are around 0.9 for almost the entire three-month period.
That speaks for the high quality of modeling and data assimilation of the
ECMWF system. Variability in skill from one day to the next increases with skills at day 5 of forecasts ranging from 0.4 to 0.9 and for day 7 of forecasts ranging from –0.4 to 0.7. This is nearly the current state of these forecasts at 500 hPa from the current best models. Kalnay (1998) also reported on the anomaly correlation at the 500 hPa level, covering a more extensive period of nearly 43 years of forecasts. Here, the forecasts were based on the NCEP reanalysis data sets, described by Kalnay et al. (1996), and the 1998 version of the NCEP forecast model runs at the resolution T62 for day five forecasts. Results for the Southern and Northern Hemispheres (20˚ latitude to 80˚ latitude) are presented in Fig. 2c. This shows a slow increase of skill over the Northern Hemisphere with skills reaching around 0.7 and around 0.6 for the Southern Hemisphere. Dashed lines indicate the recent operational scores (also averaged over yearly periods). They are based on higher resolution operational models and reveal slightly higher skills. Brown (1987) has provided a review of the rapid progress in the improvements of global numerical weather prediction during the 1980s. The anomaly correlation at the 500 hPa was one of the measures historically monitored by the weather services during that decade. Fig. 2d from the review of Brown shows rapid increase of 500 hPa skill of the European Centre’s forecasts. Here the length of forecast period (along the ordinate) for which the forecasts with anomaly correlation greater than 0.6 are reached during different years (along the abscissa) are shown. As the analysis and forecasting system has been improved, the useful length of the prediction has increased from 5.5 days in 1980 to 6.5 days in 1985, which is reflected in the 12-month average of the forecast length where the anomaly correlation of the 500-hPa geopotential height over the 5
Northern Hemisphere drops to a value of 0.6. The monthly mean values indicated that the longer forecast skills are realized in the colder months. Anomaly correlations of 500 hPa geopotential heights from ensemble forecasts at various operational centers also have shown major improvements of skill in recent years. The ECMWF forecasts now show that three-day skills can be close to 0.9. This implies that on day three these forecasts are placing the troughs, ridges and contours right on top of the observed fields. This is an extraordinary accomplishment when we compare results obtained in the 1970’s and 1980’s from single models. Recent skills of NCEP/EPS and several other models also show similar major improvements in recent years.
It is
interesting to note that although there are major differences in the horizontal resolution of these models, somewhat comparable anomaly correlations are being achieved by these groups simply from overall model improvements. 2. Superensemble Methodology A main tool of this study is a multimodel superensemble that was recently developed at Florida State University, Krishnamurti et al. 1999, 2000 a, 2000 b, and 2001). The superensemble is developed using forecasts from a variety of weather and climate models. Along with an observed (analysis) field, past forecasts are used to derive statistics on the past behavior of these models. These statistics, combined with multimodel forecasts, enable us to construct a superensemble forecast. Given a set of past multimodel forecasts, we used a multiple regression technique (for the multimodels), in which the model forecasts were regressed against an observed (analysis) field. We then used least-squares minimization of the difference between the anomalies of the model and the analysis fields in order to determine the weights. We carried out this minimization at all vertical levels, at all geographic locations (the grid 6
points of the multimodels), and for all model variables. In all, some six million statistical coefficients describe the past behavior. The motivation for this approach came from the construction of a multimodel superensemble from a low-order spectral model (Lorenz 1963). In this low-order model, it was shown possible to introduce various (proxy) versions of cumulus parameterization (or model physics) by simply altering forcing terms (Krishnamurti et al. 1999). Time integration of this multimodel system showed that the multiple regression coefficients of these multimodels (regressed against a nature run) carry a marked time invariance. This time invariance was a key element for the success of the proposed method. We have used many models at diverse horizontal and vertical resolutions. Model output was interpolated to a common grid of the lowest resolution multimodel (about 125 km). These global models include several different parameterizations of physical processes; effects of ocean, snow, and ice cover; and treatment of orography.
The
observed (or the analysis) fields are used only during the control period to determine the weights and the verification of forecasts in the forecast phase of the superensemble. The training period for global weather comprises about 120 forecast experiments for each of the multimodels. The construction of the superensemble is a post-processing of multimodel forecasts. At least 7 or 8 multimodel forecasts are needed to produce very effective superensemble forecasts. We are preparing this forecast product experimentally in real time at the Florida State University using eleven member models. Thus it is a useful product for people to see and is currently available on a website1 on real-time basis. We have assessed the quality of superensemble weather and seasonal climate 1
http://lexxy.met.fsu.edu/rtnwp 7
forecasts using standard measures of skill such as root mean square (RMS) error, correlation against observed fields, anomaly correlation, and the so-called Brier skill score (assessing skills above those of climatology). By construction, the least squares procedure for computing the superensemble weights minimizes the root mean square error, but weights that optimize one measure of skill do not in general optimize other measures of skill. Therefore, if some other measure of skill were important, it would be desirable to select ensemble weights that optimize that particular measure of skill, although that may not always be practical in practice. 3. Summary of past results The following is a summary of computations based on our past publications (Krishnamurti et al. 1999, 2000a, 2000b and 2001). It is consistently noted in our past studies that the superensemble forecasts generally have higher skill compared to all participating multimodels and the ensemble mean. If N is the number of models, the ensemble mean assigns a weight of 1/N to all the member models everywhere for all variables. As a result, assigning the same weight of 1/N to poorer models degrades the skill of the ensemble mean. It is possible to remove the bias of models individually at all locations and for all variables and to compute the ensemble mean of the bias-removed models.
That too has somewhat lower skill
compared to the superensemble, which carries selective weights distributed in space, multimodels, and variables. A poorer model does not reach the levels of the best models after its bias removal. Training is a major component of this forecast initiative. We have compared training with the best quality “observed” past data sets versus training deliberately with poorer data sets. This has shown that forecasts are improved when higher quality training 8
data sets are deployed for the evaluation of the multimodel bias statistics. It was felt that the skill during “forecast phase” could be degraded if the training were executed with either poorer analysis or poorer forecasts.
That was noted in our recent work on
precipitation forecasts where we had shown that the use of poorer rainfall estimates during the training period affected the superensemble forecasts during the “forecast phase” (Krishnamurti et al. 2001). In medium-range real-time global weather forecasts, the largest skill improvement is seen for precipitation forecasts both regionally and globally. The overall skill of the superensemble is 40% to 120% higher than the precipitation forecast skills of the best global models. The RMS error and the equitable threat scores were the skill parameter used in that study. The training data sets for precipitation came from the daily TSDIS operational files of TRMM microwave radiometer based rainfall estimates. These were augmented from the use of the US Air Force polar orbiting DMSP satellites that provided SSM/I data from a number of current satellites (F11, F13, F14, and F15) in order to extend the global coverage. An application of these precipitation forecasts included the forecast guidance for some recent flood episodes. In real-time global weather forecasts the superensemble exhibited major improvements in skill for the divergent part of the wind and the temperature distributions. Tropical latitudes show major improvements for the superensemble for daily weather forecasts. For most variables, we have used the operational ECMWF analysis at 0.5° latitude/ longitude for the training phase in these previous studies. Real-time hurricane track and intensity forecasts is another major component of superensemble modeling. This approach of carrying out a training phase followed by real-time forecasts has shown improved forecasts for the tracks and intensity (up to 5 9
days) for the Atlantic hurricanes. Improvements in track forecasts were 25% to 35% better than those of the participating member models.
The intensity forecasts for
hurricanes have been only marginally better than the best models. In some recent realtime tests during 1999, marked skills in the forecasts of difficult storms such as Floyd and Lennie were noted where the performance of the superensemble was considerably better than that of the member models. The area of seasonal climate simulations has only been addressed recently in the context of atmospheric climate models where the sea surface temperatures and sea ice were prescribed, such as the AMIP data sets. In this context, given a training period of some 8 years and a training data base from the ECMWF the results exhibited improved skill compared to the member models and the ensemble mean. Those were based on seasonal and multiseasonal forecasts of monthly mean precipitation, temperatures, winds, and sea level pressure distributions. Further extension of this work is currently being pursued in the area of improved multimodel seasonal forecasts using coupled climate models. 4. Computational Methodology A main tool for this study is a multimodel superensemble that was recently developed at Florida State University, Krishnamurti et al. (1999, 2000 a, 2000 b, and 2001). The proposed superensemble is defined by: __
S= O +
N
∑
__
ai( F i – F i)
(1)
i −i
__
Where, S = superensemble prediction, O = time mean of “observed” state, ai is the __
weight for model i, where ‘i’ being the model index, N is the number of models, F i is the
10
time mean of prediction by model i, F i is the prediction by model i. Here the coefficient ai is determined from the use of the least square minimization procedure. The weights “ai” are computed at each grid point by minimizing the following function: t − train
G=
∑
(St – Ot)2
(2)
t =0
O = Observed state, t = time, t-train = Length of training period (100 days in present case for NWP prediction) The variance of the final geopotential field was obtained using two different methods, i) the use of height field Z to construct the superensemble, and ii) the use of the ∇ 2 Z field to construct the superensemble - here instead of constructing a superensemble
of the geopotential height Z at the 500hPa level; we have first constructed the superensemble of the ∇ 2 Z field. The reason behind that was to extract some extra skill from the geopotential gradients and its laplacian. The geopotential is thereafter recovered from a solution of the Poisson equation using the spectral transform.
(∇ Z ) 2
m n
= − n(n + 1)Z nm Ynm
(3)
Where m is the zonal wave number, and n is the index of the Associate Legendre Function (denotes the degree of its polynomial). As one proceeds to smaller and smaller scales, n becomes larger and we note then ∇2Z is directly proportional to n2, where as Z is inversely proportional to n2, thus this property would be reflected in the two dimensional spectral distribution of ∇2Z and Z. This method appears to improve the superensemble solution for the geopotential compared to a direct construction of the superensemble of Z. We are furthermore able to
11
assess the scales where this method appears to contribute to the improvement of forecasts. We have also carried out a comparison of the classical bias versus that of the proposed superensemble for the forecasts. A simple way of finding the bias for the NWP of a specific model is to take a daily string of NWP forecasts, obtain a monthly average of these, and compare those with the analysis (or observed) mean for that month. This procedure has been used by most weather services to assess whether the model has a cold, warm, moist or dry bias, etc., e.g. Heckley (1985), Sumi and Kanamitsu (1984), Kanamitsu (1985). This is the classical bias of a forecast. The proposed superensemble does not do quite the same thing. The multiple regression based least square minimization of errors is different from the simple bias correlation in the following manner. The simple classical bias is given by: CB (Classical Bias) = {Z Fn (λ, φ) − Z on (λ, φ )}/ N
(4)
Here the nth day forecast bias for a total number of N days are considered. Z Fn is the average forecasted geopotential height value at 500 hPa for a period of N days while Z On is the averaged of the observed (analyzed) geopotential height for that period.
The Superensemble based bias, following equation 1, is given by: t=N
P×Q
t =1
1
SB ( Superensemble Bias) = Z sn − Z on = ∑
∑ a (Z i
Fi
− Z FMi )
(6)
Here Z Sn is the average geopotential height obtained from superensemble forecasts for the period of N days, ais are the coefficients (weights) for individual member models, Z Fi is the forecast from model ‘i’ and Z FMi is the simple forecast mean of geopotential height from the ith member model. The summation is done both on spatial (P x Q grid points 12
(lat/lon)) and temporal (days) scales for all member models. If the ai were all set equal to 1/N, then the classical bias and the superensemble based bias are very close to each other. The weights take into account the local relative error characteristics of each model in the formulation of the superensemble. The anomaly correlation coefficient is computed for individual model forecasts and the ensemble based on the method suggested by Brankovic et al. (1990). Anomaly correlation for forecast variables is defined as the correlation between the predicted and analyzed anomalies of the variables. Here anomalies are deviations from the mean climatological values. The following expression is used for computing the anomaly correlation of geopotential height at 500 hPa. ACC =
∑ {[(Z − Z ) − (Z − Z )][(Z − Z ) − (Z − Z )] } ∑ [(Z − Z ) − (Z − Z )] ∑ [(Z − Z ) − (Z − Z )] F
C
F
C
V
C
V
C
2
F
C
F
C
V
C
V
(7) 2
C
Here suffix F denotes forecast, suffix C denotes climatology and suffix V stands for verifying analysis. Over bar is the area mean and Z is the geopotential height at 500 hPa. 5. Data Sets: The data sets used in this study are identical to those used in a recent study on precipitation forecasts (Krishnamurti et al. 2001). The daily global analysis and forecasts from the following prediction centers were used: NCEP (Washington), RPN (Canada), NOGAPS (NRL, Monterey), BMRC (Australia), UKMET (Reading, UK) and JMA (Japan). In addition to these, we used six daily forecasts from the FSU global spectral model that utilizes different initial analyses. The differences in the initial analysis are obtained through the physical initialization of observed rain rates (Krishnamurti et al. 1991) using different rain rate algorithms (Krishnamurti et al. 2001).
All of these
forecasts commenced at 1200 UTC and in most cases six-day forecast data sets were 13
available for extended periods.
In addition to these, we had access to the initial
assimilated data sets from the European Center for Medium Range Weather Forecasting for the extended periods. It should be noted that in this study we have used ECMWF data sets in the following context: i) For the training phase and forecast verification we have used the analyzed fields from ECMWF. ii) We have not used ECMWF forecast data sets in the training phase and iii) We had access to the daily anomaly correlation of geopotential height at 500 hPa of ECMWF forecasts from a web site provided by the NCEP. These are included in our tabulations of anomaly correlation skills but those were not a part of the superensemble. The formal computation of anomaly correlation for most of these models was possible on a daily basis. Table (2) provides an outline of all the models used in this study. These models deploy different horizontal and vertical resolutions, topography, physical parameterizations and modes of dynamic computations. The superensemble was constructed at an interpolated horizontal resolution of T126 (Triangular truncation at 126 waves around the globe) at the 500 hPa surface for this study. The Environmental Modeling Center of the U.S. National Weather Service maintains daily records of the 500 hPa anomaly correlation. Some of these data sets for the diverse models were extracted from their web site and used in this study. In the computation of anomaly correlation, it should be noted that the anomaly correlation do depend on the choice of the source of climatology and also the verifying analysis. In this study, both ECMWF and the NCEP analyses were separately used for the training and forecast validation and in the evaluation of anomaly correlation skills. The results from the use of these different analyses for training were quite similar; hence those based on the use of ECMWF only are shown here. 14
Since we carry a daily inventory of rms errors of forecasts during the training phase, it was possible to identify days where the forecasts skills were low. We arbitrarily removed those dates where the rms errors of the training superensemble were greater than 120 meters for the 500 hPa geopotential heights. Using 100 days of ‘higher skill’ training days, we noted that the skill of the superensemble was much improved. Fig. 3 shows the results of computations of anomaly correlations with and without this optimization of the training phase respectively. It is clear that some further improvement of the skill of the superensemble is achievable from this procedure. 6. Results of Computations: In this section we present some of our results on the multimodel forecasts covering several months of forecasts. This is an ongoing effort, which is being carried out in real time with the data sets from the models described in section 5. Unless stated otherwise, most computations and results presented here are based on a training period covering 20 March to 15 July 2000 and the forecasts covered the period from 16 July to 17 September 2000.
This period includes some missing days where data were not
available. 6.1
Optimizing the number of training days We had noted that the forecast skill degrades somewhat having either too few or
too many training days for the construction of the superensemble. Thus it was felt that a certain optimal number of days could be determined in order to obtain the best statistics for the training phase, this in turn could provide the highest forecast skill. In order to determine that, we worked separately at each grid point and assessed the optimal number of days that provided the highest skill for the superensemble.
Fig. 4a shows the
distribution of the optimal number of days for the best anomaly correlations of 15
geopotential height at the 500 hPa level for day 6 of forecasts over the North American region. The forecasts here were carried out for the entire month of August 2000; the optimal training days preceded that. It is interesting to note that a large-scale distribution of the optimal number of days is present in this analysis. That number around North America shows large-scale variation from oceans to mountains to the Great Plains. This behavior reflects the possible effects of bias errors of the member models over these different geographical regions. We have not addressed the issues of seasonality with respect to the optimal number of training days; this may need to be addressed for practical applications. 6.2
The distribution of weights The essence of the proposed superensemble lies with the distribution of multiple
regression weights for the member models. The training phase provides these weights. These weights exhibit a distribution of positive and negative fractional values. Fig. 5 (a and b) illustrates some of these distributions based on the training period covering the months April to July for the year 2000. We have arbitrarily selected four of the member models, whose weights for the Northern and Southern Hemisphere are displayed in Fig. 5a and 5b respectively. These weights display positive and negative fractional distributions. The scales of centers of maxima and minima of these weights appear to be smaller over the Northern Hemisphere compared to those of the Southern Hemisphere. Some interesting features seen here are, for instance, the weights for model 1 over North America show positive fractional weights over the Western U.S. and negative fractional weights over the Eastern U.S. Over the South Pacific and Australia, the weights for model 2 are predominantly positive whereas they are negative for model 3. Collectively, positive or negative signs are contributed by the different models. 16
If one were to contrast these with the ensemble mean, then all these distributions would bear the same constant value 1/N (where N denotes the number of ensemble models). This is the major difference between a superensemble and a member mean. The past history of performance of the member models makes the superensemble superior in skill that is reflected by the geographical distribution of the statistical weights. This exercise of determining the weights is based on training of the ∇2 Z fields and not the geopotential Z. 6.3
On the number of models We noted that differences in the design on the models arise from the choice of
physics, resolution, air-sea interaction and the definition of orography. Thus the question of the optimal minimum number of models is important in the construction of the superensemble. If more than two of the models are included, the errors of the ensemble mean start to increase, see Krishnamurti et al. (2000a). That growth of error arises from assigning an equal weight of 1.0 to all models including the models with relatively lower performance levels. However, the error of the superensemble decreases as these additional models are included, since all models seem to have something to contribute over different regions. The error growth rate starts to decrease as these models are included, and beyond the inclusion of six top models, the error reduction almost stops. The reduction of error from six models is substantial and much higher in comparison to the ensemble mean. The reason for this behavior of the superensemble arises from its selective use of fractional and negative weights for the member models of the superensemble. Thus, we feel that improved 500 hPa anomaly correlations require
17
minimally six models. These results are quite similar to what had already been noted from the data sets of 1998, Krishnamurti et al. (2000a). 6.4
Predicted Maps on day 6 of forecasts Figs. 6, 7 and 8 illustrate a typical 6-day forecast. In Fig. 6 (a, b, c and d) we
show a forecast over the northern hemisphere for day-6 of forecast from the superensemble (panel b), the best model (panel c) and a model with the lowest anomaly correlation skill (panel d). These are to be compared to the analysis valid on that date (i.e., September 4, 2000, 1200 UTC). The anomaly correlation for these three respective forecast categories were 0.75, 0.69 and 0.44. We can see a slight improvement in the forecast features over the best model and a considerable improvement with respect to the model with the lowest skill. A similar example of the day-6 forecast for the southern hemisphere is shown in Fig. 7 (a, b, c and d). The skill of the superensemble was generally quite high over the southern hemisphere. The respective anomaly correlations for the superensemble, best and the lowest skill models for this day-6 forecast were 0.81, 0.71 and 0.32. Given that roughly 10 – 15% improvement in anomaly correlation skill is possible from the superensemble over the best model, does it convey any useful synoptic information. Inspecting numerous 500 hPa forecasts from the best model and from the superensemble, we find that there is generally some more information content in the superensemble forecasts.
The superensemble places the trough along 120˚W more
accurately, as compared to the best model, which moves it somewhat farther east in six days. The ridge over North America west of Lake Michigan shows a short wave trough (riding the ridge) that is captured by the superensemble; the best model fails to capture that very short wave feature. In general, the model with the lowest skill has much lower 18
heights over the entire United States. The trough along 120˚E in the analysis is placed too far east near 140˚E by the best model and near 125˚E by the superensemble. In the highest latitudes, north of 50˚N, the difference between the forecasts of the superensemble and the best model does not appear to be large. The above features are more clearly seen from the differences between the forecasts and the analysis fields. Those fields for the northern and southern hemispheres are shown in Fig. 8 (a, b, c, d and e). The preponderance of dark blue and dark brown coloring shows the large forecast errors for the model with the lowest skill. That coloring diminishes somewhat as we proceed to the best model and the least errors are seen for the superensemble. This was an example that was selected randomly. There are several other instances where the forecast skills on day-6 were strikingly larger for the superensemble compared to the best model where these differences are even sharper than what are shown in Fig.8. 6.5 Anomaly Correlations at 500 hPa Fig. 9 (a, b and c) illustrates the anomaly correlation of the geopotential height at 500 hPa for the member models (blue) and for the superensemble (red) for a recent 20day period (this includes a 100 days of training and 20 days of forecast). These results include those models that directly participated in providing daily data sets, apart from those models whose anomaly correlations were available on the NCEP web site. Although the member models exhibit considerable variation in their anomaly correlation varying from 0.1 to 0.8 for these forecasts, we note more consistent results for the superensemble. The ∇2Z - based superensemble does show higher skill on almost all occasions for days 1, 2, 3, 4, 5 and 6 of forecasts. Panel ‘a’ shows the results over the southern hemisphere, panels ‘b’ and ‘c’ show the results for the northern hemisphere and 19
the whole globe respectively. Through day three of forecasts, the anomaly correlation skill for superensemble is consistently higher than 0.9.
This shows the major
improvement in the state of numerical weather prediction from some of the member models and from the superensemble. Overall the results for the superensemble are quite impressive. A noteworthy feature of these forecasts is the consistent higher skills over the southern hemisphere that is well above those of the last two decades shown earlier. Although we present a sample of results here, we have noted a major uniformity of these results in our continued computation of this algorithm on a real time basis. Table 3 (a, b, c and d) provides a summary of these results for the southern hemisphere, northern hemisphere and the global belt. Here the entries for the anomaly correlation skills covering a forecast period from August 20th to September 17th 2000 are presented. Results for the member models, the ensemble mean and the superensemble are included here. Results for days 1 through 6 of forecasts are provided in these tables. The wintertime skills of the anomaly correlation are generally higher than those for the summer season. The overall member model skills over the Southern Hemisphere are quite high. Over the Northern Hemisphere during this period, the best model’s skill at days 1 and 6 were 0.992 and 0.653 respectively. The corresponding numbers for the superensemble were 0.995 and 0.748 for days 1 and 6 respectively. This shows roughly a 14 percent improvement on day 6 of forecasts.
The corresponding figures for the
Southern Hemisphere are an improvement for days 1 and 6 from 0.979 and 0.715 (for the best model) to 0.990 to 0.802 (for the superensemble, i.e. roughly a 13 percent improvement on day 6). Also shown in this table are the entrees for the ensemble mean, which lie roughly halfway in between the best model and the superensemble. Thus it appears that a substantial improvement in skill is possible from the use of the proposed 20
superensemble. The global results presented in Table 3 basically confirm these same findings. The fall of skill from the best to the poor model on day 6 of forecasts for the Northern Hemisphere can be seen ranging from 0.65 to 0.45 and for the Southern Hemisphere from 0.71 to 0.53.
We have noted a consistent high skill for the
superensemble around 0.75 to 0.8 for day 6 in our experimental runs. Table 3d describes the global results for the case where two of the member models exhibiting the highest anomaly correlation skills are entirely excluded. The entries in this table are to be compared with the entries in Table 3c where these high-skill member models are included. It appears as though the addition of the best models contributes to roughly 1 to 2 percent improvement for the superensemble, the overall improvement of the superensemble over the best (available) model is around 10 percent. This improvement of superensemble is a result of the selective weighting of the available models during the training phase. 6.6 Reduction of systematic errors In Fig. 10, we show the systematic errors (predicted minus observed mean) for day-6 of forecasts for entire month of August 2000. Here the results for a member model with the least systematic error are compared with those of the superensemble for the northern and southern hemisphere. In this illustration, the distribution of colors from blue to red displays the negative to positive spread of systematic errors. A preponderance of dark blue color over the southern hemisphere reflects large positive systematic errors in excess of 40 meters for the best model. It can be clearly seen that the systematic error of the model with the lowest skill is quite large (as seen by the coloring). Even the best model has rather large systematic errors. The superensemble is not free of these errors; however, the error is small compared to that of the member models. Over the northern 21
hemisphere the preponderance of white, yellow and light brown colors for the superensemble clearly reflect a large reduction of the systematic error. These same features have been monitored on a real-time basis for the last two years. It is also clear from these computations that the superensemble does not simply remove the classical bias (i.e., forecast mean minus observed mean equal to zero).
The proposed
superensemble, although not systematic error free, is able to reduce the forecast error with obvious practical advantages but that kind of an artificial bias removal is not possible in a truly predictive sense. The bias correction is an after-the-fact correction and cannot be implemented for forecasts of any practical utility. 6.7
Percent improvement in rms errors from superensemble forecasts Fig. 11 shows the percent improvements from the superensemble forecasts
compared to those of the ensemble mean over the best model. These are for 2 and 5-day forecasts of the global geopotential heights at 500 hPa. These improvements are related to the respective rms errors. It is clear that the improvements for the superensemble are quite large at day 2 of the forecasts, approaching about 40 percent, whereas the corresponding improvement for the ensemble mean is around 28 percent. At day 5, the improvements appear significant and larger - here the corresponding numbers are 52 and 46 percent for the superensemble and the best model respectively. The mean rms errors at 48 and 120 hours of forecast for several of the member models and for the ensemble mean and superensemble are shown in Fig. 11b. These are results over the tropical belt 30o S to 30o N. The period covered for the results shown in Fig. 11 (a and b) includes 92 days starting from June 1 through August 31, 2000 (averaged). It is clear that the errors of the superensemble are smaller than the other representations shown here.
22
6.8 Improvements in the planetary and synoptic scales When we examine the zonal harmonics of the geopotential at 500 hPa level (Fig. 12), we note that the superensemble (based on the use of ∇2Z) carries larger portion of variance compared to the individual models (assessed in terms of their respective anomaly correlation) for the first 7 zonal wave numbers. These are the scales where the zonal harmonics of the 500 hPa geopotential carries the largest proportion of the total variance. The predicted geopotential field for the worst model is quite flat by day-6 of forecasts than its variance field, shown in Fig. 12, and appears that the long waves are somewhat misrepresented. The best model exhibits some improvement for the zonal percent variances while the superensemble, holding the highest anomaly correlation skill, exhibits a robust structure for the planetary and synoptic scale waves. These same features can be viewed using two-dimensional variances in the triangular truncation space. In Fig. 13 we show these variances as a function of the eastwest and north-south wave numbers ‘m’ and ‘n’. Here again we note a very robust structure for the variances centered around zonal wave numbers 2 to 5 and meridional wave numbers 4 through 12. The variances for the best and the worst models are much lower in comparison. When we first started on this exercise we felt that the construction of the superensemble would enhance the structure of the smaller scales (wave numbers greater than 10) since the ∇2Z field exhibits many smaller scales compared to the Z-field. This exercise revealed that large geopotential gradients on the planetary and synoptic scales were much improved from the construction of the superensemble of the ∇2Z field. 7. Future Outlook The results on Anomaly Correlation at the 500 hPa level are a part of an ongoing real-time Numerical Weather Prediction Exercise that is being pursued at Florida State 23
University. The total problem includes eleven different models where all the variables at 12 vertical levels are being subjected to the construction of the superensemble. For these weather models, some 107 statistical weights, which are being used, carry the past behavior. The reason for this large volume of statistics has to do with the model’s performance. Some models appear to handle local water bodies better; others have greater skill over orographic regions while others seem to describe the oceanic convection and rainfall better. The systematic errors, in detail, vary from one region to another for these diverse models.
Results for all of the variables, including daily
precipitation, Krishnamurti et al. (2001), show a rather similar behavior, i.e. the superensemble generally exhibits a much higher skill compared to the ensemble mean and the member models. The emphasis on the 500 hPa geopotential is based on historical and practical reasons. Noting that it is now possible to construct a superensemble of the geopotential heights at 500 hPa with an accuracy of 0.95 to 0.80 between days 1 to 6 of forecasts, implies that troughs and ridges are very nearly accurately placed close to their correct locations for the medium range forecasts. Achieving these skills in a consistent manner appears to be an accomplishment of current NWP systems combined with the superensemble post-processing method. This may be one of the reasons why operational forecasters should consider the implementation of this simple procedure of construction of superensemble forecasts. The data sets provided by the superensemble contain the weighted averages that are carried out independently at each grid point of the domain of calculation for each day of forecast separately. The issue of dynamical consistency of this data set has been raised following our first study, Krishnamurti et. al. (2000a), where we have examined the quasi static and quasi geostrophic (over middle latitudes) of this 24
data and noted that the balance is quite acceptable. It should be noted in this context that the construction of superensemble is based on equation (1) where the regression coefficients are based on anomalies with respect to a time mean and not from a direct use of the full geopotential fields. It should also be noted that the individual models have indeed improved considerably over the last three decades. Further improvement, no doubt, will occur as the models improve their data assimilation, resolution, dynamical representations and physical parameterizations. From what we have seen in our recent work it would seem that further improvements of the superensemble forecast would also follow. An area of future work would be to explore the usefulness of Singular Value Decomposition (SVD) method to address the removal of cases of possible ill conditioned matrices in the current superensemble. The current multiple regression procedure in the training phase involves the solution of matrices at each location and uses the GaussJordan method. We have essentially bypassed the difficulty at a few hundred of grid points using the ensemble mean wherever ill conditioning was encountered. The SVD method, as well as several other methods that invoke Empirical Orthogonal Functions (EOFs), Z-transforms, Kalman Filter and cyclostationary EOFs can also be used to remove the degeneracy. An example of anomaly correlation improvement from the SVD method is shown in Table 4. These are the results for the same period as those presented earlier in Table 3. We notice that an improvement of over 5% from the use of SVD over the southern and northern hemispheres and the global belt. On day-6 of forecasts, an increase of anomaly correlation from 0.799 to 0.845 can be seen over the global belt; while it increased from 0.748 to 0.801 and 0.802 to 0.849 over the Northern and Southern hemispheres respectively using the SVD method.
We propose to incorporate these 25
refinements in our future studies and hopefully would also be useful for operational forecasts. It is worth noting that the anomaly correlation skills over the Southern Hemisphere are beginning to exceed those of the Northern Hemisphere. At first we thought that this might have been peculiar for the period we had investigated, but it appears that as the member models are improving, the Southern Hemisphere forecast skills have indeed been going up in recent years. Currently the multimodel data sets have only been available to us through day 6 of forecasts. It should be possible to obtain these data sets through day 10 of the forecasts. It may also be possible to receive the data sets as ensemble forecasts for the modeling groups. Given such data sets we expect further improvements in the forecasts of the 500 hPa anomaly correlations. Having such a forecast on the operational suite of products can provide a useful guidance for the weather especially over the tropics and mid-latitudes. Acknowledgements: The research reported here was funded by NASA grants NAG5-9662 and NAG81537, NSF grant ATM-0108741, NOAA grant NA07WA0104 and FSURF COE. The authors wish to express their thanks to Dr. Ricardo Correa Torres for computing the optimal training periods for this study.
26
References: Brankovic, C., T.N. Palmer, F. Molteni, S. Tibaldi and U. Cubasch, 1990: ExtendedRange Predictions with ECMWF Models – Time-Lagged Ensemble Forecasting. Quart. J. Roy. Meteor. Soc., 116, 867-912. Brown, J.A., 1987: Operational Numerical Weather Prediction. Rev. Geophys., 25, 312322. Heckley, W.A, 1985: Systematic-errors of the ECMWF operational forecasting-model in tropical regions. Quart. J. Roy. Meteor. Soc., 111, 709-738 Kalnay, E, R. Petersen, M. Kanamitsu and W.E. Baker, 1991: United-States Operational Numerical Weather Prediction. Rev. Geophys., 29, 104-114. Kalnay, E., M. Kanamitsu, R. Kistler, W. Collins, D. Deaven, L. Gandin, M. Iredell, S. Saha, G. White, J. Woollen, Y. Zhu, A. Leetmaa, B. Reynolds, M. Chelliah, W. Ebisuzaki, W. Higgins, J. Janowiak, K.C. Mo, C. Ropelewski, J. Wang, Roy Jenne, and Dennis Joseph, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–472. Kalnay, E, S.J. Lord and Ronald D. McPherson, 1998:
Maturity of Operational
Numerical Weather Prediction: Medium Range. Bull. Amer. Meteor. Soc., 79, 2753–2892. Kanamitsu, M, 1985: A study of the predictability of the ECMWF Operational Forecast Model in the tropics. J. Met. Soc. Japan, 63, 779-804. Krishnamurti, T.N., C.M. Kishtawal, T. LaRow, D. Bachiochi, Z. Zhang, C.E. Williford, S. Gadgil and S. Surendran, 1999: Improved skills for weather and seasonal climate forecasts from multimodel superensemble. Science, September 3, 1999.
27
Krishnamurti, T.N., C.M. Kishtawal, D.W. Shin and C.E. Williford, 2000a: Multimodel superensemble forecasts for weather and seasonal climate. J. Climate, 13, 41964216. Krishnamurti, T.N., C.M. Kishtawal, T. LaRow, D. Bachiochi, Z. Zhang, C.E. Williford, S. Gadgil and S. Surendran, 2000b: Improving tropical precipitation forecasts from a multianalysis superensemble. J. Climate, 13, 4217-4227. Krishnamurti, T.N., S. Surendran, D.W. Shin, R.J. Correa-Torres, T.S.V. Vijaya Kumar, C.E. Williford, C. Kummerow, R.F. Adler, J. Simpson, R. Kakar, W.S. Olson and F.J. Turk, 2001: Real Time Multianalysis/Multimodel Superensemble Forecasts of Precipitation using TRMM and SSM/I Products. Mon Wea. Rev., in press. Nieminen, R, 1983: Operational verification of ECMWF forecast fields and results for 1980-1981. ECMWF Technical Report No. 36. Available from European Centre for Medium Range Weather Forecasts, Shinfield Park, Reading, Berkshire, RG2 9AX, England. 40 pp. Lorenz, E. N., 1963: Deterministic non-periodic flow. J. Atmos. Sci., 20, 130-141. Stefanova, L. and T.N. Krishnamurti, 2002: Interpretation of seasonal climate forecast using Brier skill score, FSU superensemble and the AMIP-1 dataset. Accepted for publication in J. Climate.
Sumi, A and M. Kanamitsu, 1984: A study of systematic-errors in a numerical weather prediction model .1. General-aspects of the systematic-errors and their relation with the transient eddies. J. Met. Soc. Japan, 62, 234-251.
28
Figure Captions Fig. 1 A schematic diagram that illustrates the division of the time line; prior to day zero is the training phase that includes 120 experiments from the
multimodels
whose daily results are regressed against an observed “analysis” benchmark. To the right of the zero line is the future forecasts phase that make use of the statistics from the training phase. Fig 2 Historical progress on Anomaly Correlations of the 500hPa level. Panel a) 5-day forecast 500 hPa height anomaly correlation (seasonally averaged), including zonal wave numbers 0 to 12 for both Northern and Southern Hemispheres. (Kalnay et al. 1991). Panel b) Daily variation of the anomaly correlation of error of the 500 hPa height for forecast days 1, 3, 5 and 7 from 1 October to 31 December 1981 in Europe. (Nieminen 1983). Panel c) Comparison of operational and reanalysis 5-day forecast anomaly correlations for NH and SH. (Kalnay et al. 1998) Panel d) ECMWF operational forecast skill since 1980 as represented by the forecast day on which the 500 hPa geopotential height anomaly over the Northern Hemisphere has dropped to 0.6. Heavy line is the 12-month average of the monthly mean values. (Brown 1987). Fig 3 Two graphs of anomaly correlation skill are being compared here; i) optimized training, ii) non-optimized training. The ordinate denotes anomaly correlation and the abscissa denotes forecast days. The dashed vertical line separates the training and forecast phase of the superensemble.
29
Fig 4 Distribution of optimal number of days for the training phase that provides the highest anomaly correlations at the 500 hPa surface over the North American Region for day-6 forecast. Fig 5 Geopgraphical distribution of statistical weights for different member models. (a) For Northern Hemisphere (b) For Southern Hemisphere Coloring interval of the fractional weights is shown at the bottom. Fig 6 A typical example of forecast on day 6 over the Northern Hemisphere valid on September 4, 2000. Top left: Analysis
Bottom left: the best model
Top right: Superensemble Bottom right: the model with the lowest skill Contour interval 100 meters Fig 7 Same as Fig 6 but for the Southern Hemisphere Fig 8 Differences between forecast and analysis from figures 6 and 7 Top:
For superensemble forecasts
Middle: For the forecast from the best model Bottom: For the forecast from the model with the lowest skill Units: meters, coloring interval in meters shown at the bottom Fig 9 Anomaly correlation as a function of forecast days: Blue lines show results for member models Red lines show results for the superensemble The different panels show forecasts for days 1 trough 6 Fig 10 Systematic errors: forecast minus analysis over 120 days over the 500 hPa level Top left:
Northern Hemisphere superensemble 30
Bottom left:
Northern Hemisphere best model
Top right:
Southern Hemisphere superensemble
Bottom left:
Southern Hemisphere best model
Units: meters; coloring interval in meters shown at the bottom Fig 11a Percent reduction of mean rms errors at the 500hPa surface over the best model by the superensemble and by the ensemble mean for July 2000 over the global tropical belt 30S to 30N. Left panel shows results at 48 hours forecast, the right panel shows the results at 120 hours. Fig 11b Mean rms errors of the respective member models, the ensemble mean and the Superensemble at hours 48 and 120 of forecasts over the tropical belt 30S to 30N Fig 12 Meridionally averaged percent variance of the geopotential heights as a function of zonal wave number. Fig 13 Geopotential variance in the two-dimensional triangular truncation space. The ordinate denotes a meridional wave number n and the abscissa denotes the zonal wave number m. Top diagram shows results for the superensemble. The bottom left diagram shows the variances for the best model and the bottom right illustrates those for the model with the lowest skill.
31
List of Tables: Table 1: List of Acronyms Table 2: Details of the multimodels used in this study Table 3: a) 500 hPa geopotential height anomaly correlation values from the Superensemble, Ensemble mean and the multimodels averaged for the period 20 August to 17 September, 2000 for Northern Hemisphere for forecasts from day 1 through day 6. b) Same as Table 3a except for Southern Hemisphere c) Same as Table 3a except for total Globe d) Same as Table 3c except that two best models are excluded from the superensemble suite. Table 4: 500 hPa geopotential anomaly correlation averaged for the period from 20 August to 17 September, 2000 for Global, Northern Hemisphere and Southern Hemisphere for forecasts from day 1 through day 6. Here the results from two different methods, the conventional superensemble (SENS) and the SVD based superensemble (SE_SVD) , are presented.
32
t=0 TRAINING PHASE
FORECAST PHASE
Multimodel Forecasts S T A T I S T I C S
Multimodel Forecasts
Observed Analysis Superensemble
t=0 TIME
Figure 1 33
Figure 2 34
Figure 3 35
Figure 4 36
Figure 5a 37
Figure 5b 38
Figure 6 (a,b,c and d) 39
Figure 7 (a,b,c and d) 40
Figure 8
41
Figure 9a
42
Figure 9b
43
Figure 9c
44
Figure 10 45
Percent Reduction of Mean RMS Error
0
10
20
30
40
50
60
SUPERENSEMBLE
48 HOUR FORECAST
Model
Figure 11a
ENSEMBLE MEAN
SUPERENSEMBLE
120 HOUR FORECAST
ENSEMBLE MEAN
Percent Reduction of Mean RM S Errors of 500mb Heights over the Best Model by Superensemble and Ensemble Mean for July 2000 as Applied to the Global Tropics (30N-30S)
46
Mean RMS (meters)
0
5
10
15
20
25
30
Model 1
Model 2
Model 3
Figure 11b
Model
Model 4
Model 5
ENS
SENS
Mean RMS (120 hour fcst. (m))
Mean RMS (48 hour fcst.(m))
Mean RM S errors (July 2000) for 48 and 120 Hour, 500mb Heights Forecasts for 5 Global Models, Ensemble Mean, and Superensemble Applied to the Global Tropics (30N-30S)
47
Figure 12 48
Figure 13
49
Table 1: List of Acronyms Acronym
Full Form
BMRC
Bureau of Meteorology Research Center
DMSP
Defense Meteorological Satellite Program
ECMWF
European Centre for Medium-Range Weather Forecasts
EMC
Environmental Modeling Center
EPS
Ensemble Prediction System
FSU
Florida State University
hPa
hectopascals
JMA
Japan Meteorological Agency
MIT
Massachusetts Institute of Technology
NCEP
National Centers for Environmental Prediction
NOGAPS
Navy Operational Global Atmospheric Prediction System
NWP
Numerical Weather Prediction
RMSE
Root Mean Square Error
RPN
Recherché en Prévision Numérique
SSM/I
Special Sensor Microwave Imager
TRMM
Tropical Rainfall Measuring Mission
TSDIS
TRMM Science Data and Information System
UTC
Coordinated Universal Time
50
Table 2: Outline of Multimodels used in this study Model
Vertical Levels
Horizontal Resolution
ECMWF
31
T213
UKMET
30
0.8333 Lon x 0.5555 Lat
BMRC
29
T239
JMA
40
T213
FSU
14
T126
NCEP
42
T170
NRL
24
T159
RPN
28
0.9 Lon x 0.9 Lat
51
Table 3 (a, b)
52
Table 3 (c, d)
53
Table 4 500 hPa Geopotential Height Anomaly Correlation: 20 Aug-17 Sep 2000
Global
NH
SH
SENS
SE_SVD
SENS
SE_SVD
SENS
SE_SVD
Day-1
0.992
0.994
0.995
0.996
0.990
0.993
Day-2
0.979
0.983
0.981
0.983
0.978
0.982
Day-3
0.958
0.960
0.956
0.959
0.956
0.961
Day-4
0.928
0.952
0.905
0.948
0.933
0.953
Day-5
0.881
0.911
0.843
0.869
0.889
0.921
Day-6
0.799
0.845
0.748
0.801
0.802
0.849
54