3008
MONTHLY WEATHER REVIEW
VOLUME 130
Evaluation of Ensemble Predictions of Blocking in the NCEP Global Spectral Model

JOSHUA S. WATSON* AND STEPHEN J. COLUCCI

Department of Earth and Atmospheric Sciences, Cornell University, Ithaca, New York

(Manuscript received 13 September 2001, in final form 21 May 2002)

ABSTRACT

Ensemble forecasts from the National Centers for Environmental Prediction (NCEP) Global Spectral Model (GSM) have been used to develop a probabilistic scheme for the prediction of blocking over the Northern Hemisphere. An evaluation of these forecasts during three recent cool seasons revealed that they underpredicted the frequency of blocking at all ranges considered. Probabilistic forecasts for blocking over a sector on a particular day were constructed via binary logistic regression from all ensemble forecasts verifying on that day. Forecasts from two cool seasons served as the developmental dataset; the probabilistic forecasts generated from these data were tested on data from a third, independent cool season. The resulting calibrated (bias corrected) ensemble forecasts were compared to the climatology of blocking derived from the 40-yr NCEP–NCAR reanalysis data, and to the probability of blocking expected from the uncalibrated ensemble forecasts. The calibrated forecasts had higher skill, relative to climatology, over the Atlantic than over the Pacific. The calibrated forecasts also produced more accurate predictions of blocking than did the uncalibrated forecasts over the Atlantic, but not over the Pacific. The results demonstrate that the ensemble forecasts, when calibrated, can provide useful predictions of blocking at extended ranges.
* Current affiliation: Scientific Services Division, National Weather Service Eastern Region Headquarters, Bohemia, New York.

Corresponding author address: Stephen J. Colucci, Dept. of Earth and Atmospheric Sciences, 1116 Bradfield Hall, Cornell University, Ithaca, NY 14853. E-mail: [email protected]

© 2002 American Meteorological Society

1. Introduction

The inability of numerical weather prediction models to reliably predict the onset of blocking is well documented (Tibaldi and Molteni 1990; Tracton 1990; Tibaldi et al. 1994, 1995; Colucci and Baumhefner 1998). These studies have shown that the error associated with block prediction is a major contributor to the systematic errors of the models. The block-prediction errors may be due to model biases toward a systematic underforecasting of blocking events. Additionally, the block predictions may be sensitive to uncertainties in model initial conditions. Given these problems, one possible way to improve the usefulness of model predictions of blocking is to use an ensemble of model predictions to generate blocking probabilities. A time-lagged ensemble procedure, using ensemble forecasts from different initial times but verifying on the same day (e.g., Brankovic et al. 1990), can further enhance forecast utility by increasing the amount of information available for the development of
probabilistic forecasts. The ensemble forecasts can, in turn, be calibrated to correct for model biases. In this contribution, a scheme for improving the utility of blocking forecasts is presented. The scheme exploits information available from the ensemble forecasts of the Global Spectral Model (GSM) of the National Centers for Environmental Prediction (NCEP). The ensemble forecasts are calibrated, or corrected for model biases, following Hamill and Colucci (1998). The calibration also serves to remove artificially introduced correlations that arise as a consequence of the persistence of blocking. The accuracy of the resulting forecasts, developed from data for two cool seasons and tested on data for a third season over the Northern Hemisphere, was compared to the climatological expectation of blocking determined from the NCEP reanalyses (Kalnay et al. 1996). The calibrated forecasts were further compared to predictions of blocking based upon the raw (uncalibrated) ensemble forecasts. The outcome of this evaluation is sufficiently encouraging so as to suggest that useful information about the likelihood of blocking may be obtained from the NCEP GSM ensemble. The proposed method may be further refined, with additional data, to sharpen the geographical resolution of the blocking forecasts. Also, this evaluation, which is focused upon predictions of the transition to blocking, may be extended to the problem of predicting block maintenance and decay.
DECEMBER 2002
WATSON AND COLUCCI
2. Definition of blocking

In this study, we used a modified version of the procedure of Tibaldi and Molteni (1990) for objectively identifying blocks in forecasts and analyses. In their procedure, two geopotential height gradients, the middle-latitude geopotential height gradient (GHGS) and the high-latitude geopotential height gradient (GHGN), are calculated at each longitude on the 500-mb level at a particular time:

\[
\mathrm{GHGS} = \frac{Z(\phi_o) - Z(\phi_s)}{\phi_o - \phi_s}, \qquad
\mathrm{GHGN} = \frac{Z(\phi_n) - Z(\phi_o)}{\phi_n - \phi_o},
\]

where $\phi_n = 80^\circ\mathrm{N} + \Delta$, $\phi_o = 60^\circ\mathrm{N} + \Delta$, $\phi_s = 40^\circ\mathrm{N} + \Delta$, and $\Delta = -4^\circ$, $0^\circ$, or $4^\circ$. A given longitude is considered to be potentially blocked at that time if, for at least one value of $\Delta$, 1) $\mathrm{GHGS} \ge 0$ and 2) $\mathrm{GHGN} < -10$ m per degree of latitude. Blocking is defined to occur where and when these conditions are met at 20 consecutive degrees of longitude for at least five consecutive days.

To improve the Tibaldi and Molteni (1990) index, their objective procedure was modified to include a height gradient calculation on the equatorward side of the system and to increase slightly the poleward reach of the index. In addition, changes were made in $\Delta$ to reflect the 2.5° spacing of the available analysis and forecast data (discussed in the next section). Calculations of the maximum meridional height gradient (MMHG), upper (poleward) geopotential height gradient (UGHG), and lower (equatorward) geopotential height gradient (LGHG) were performed. Here, $\mathrm{MMHG} = Z(b) - Z(a)$, where $a$ ranges from 37.5° to 42.5°N and $b$ starts at $a + 20$ and ends at 70°N. Both $a$ and $b$ proceed in 2.5° increments. MMHG represents the maximum geostrophic easterly wind between 37.5° and 70°N. Each 2.5° longitude is searched for MMHG. Twenty degrees of latitude is the minimum distance over which this calculation is made. Upon finding the maximum meridional height gradient, two more calculations are made:

1) the height gradient covering 20° of latitude directly poleward of MMHG, or

\[
\mathrm{UGHG} = \frac{Z(b + 20) - Z(b)}{20},
\]

and

2) the height gradient covering 20° of latitude directly equatorward of MMHG, or

\[
\mathrm{LGHG} = \frac{Z(a) - Z(a - 20)}{20}.
\]

Our definition requires that at a given longitude the following instantaneous conditions be met: $\mathrm{MMHG} \ge 0$ and UGHG and LGHG $\le -10$ m per degree of latitude.
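As a rough illustration of the instantaneous test at a single longitude, the sketch below searches the allowed (a, b) latitude pairs for MMHG and then evaluates UGHG and LGHG. The function names, the dictionary-based height profile, the idealized test profile, and the choice to take the single largest Z(b) − Z(a) over all allowed pairs are assumptions made for this example; they are not taken from the authors' code.

```python
def is_instantaneously_blocked(z, threshold=-10.0):
    """Apply the MMHG/UGHG/LGHG test at one longitude.

    z: dict mapping latitude (degrees N, multiples of 2.5) to 500-mb height (m).
    Hypothetical helper: assumes MMHG is the largest Z(b) - Z(a) over all
    allowed (a, b) pairs, with b at least 20 degrees poleward of a.
    """
    best = None  # (mmhg, a, b)
    for a in (37.5, 40.0, 42.5):
        b = a + 20.0
        while b <= 70.0:
            diff = z[b] - z[a]
            if best is None or diff > best[0]:
                best = (diff, a, b)
            b += 2.5
    mmhg, a, b = best
    ughg = (z[b + 20.0] - z[b]) / 20.0  # poleward gradient, m per degree
    lghg = (z[a] - z[a - 20.0]) / 20.0  # equatorward gradient, m per degree
    return mmhg >= 0.0 and ughg <= threshold and lghg <= threshold


def profile(height_at):
    """Sample a height function on a 2.5-degree grid from 0 to 90N."""
    return {2.5 * i: height_at(2.5 * i) for i in range(37)}


def blocked_like(lat):
    """Idealized (made-up) profile: trough near 40N, ridge near 60N."""
    if lat <= 20.0:
        return 5800.0
    if lat <= 40.0:
        return 5800.0 - 15.0 * (lat - 20.0)
    if lat <= 60.0:
        return 5500.0 + 5.0 * (lat - 40.0)
    return 5600.0 - 15.0 * (lat - 60.0)


def zonal_like(lat):
    """Idealized (made-up) profile with heights falling steadily poleward."""
    return 5900.0 - 10.0 * lat


print(is_instantaneously_blocked(profile(blocked_like)))
print(is_instantaneously_blocked(profile(zonal_like)))
```

In the idealized blocked profile the reversed gradient between 40° and 60°N gives a nonnegative MMHG, and the 15 m per degree decreases on either side satisfy the UGHG and LGHG thresholds; the purely zonal profile fails on MMHG alone.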
This requirement on UGHG and LGHG was to provide a minimum westerly flow on the poleward and equatorward sides of the easterly flow, similar to the original Rex (1950) definition that required equivalent flow on the poleward and equatorward sides. A height gradient of 10 m per degree of latitude corresponds approximately to a geostrophic westerly wind speed of 15 m s$^{-1}$ within LGHG and 25 m s$^{-1}$ within UGHG. Blocking is defined to occur where and when these conditions are met at 20 consecutive degrees of longitude for at least 5 consecutive days. This follows the procedure used by Tibaldi et al. (1994, 1995) in their comprehensive evaluations of the operational prediction of blocking. Our procedure improves upon the original Tibaldi and Molteni (1990) procedure by rejecting closed lows spuriously identified as blocks by the original procedure. This type of improvement is also noted by J. Pelly and B. Hoskins (2002, personal communication) using a new blocking definition based upon potential vorticity.

3. Data and methods

The approach currently used at NCEP to generate ensemble members is the ‘‘breeding of growing modes,’’ or the ‘‘breeding method’’ (Toth and Kalnay 1993, 1997). The GSM ensemble used in this research consists of 17 members, 12 from the 0000 UTC Medium-Range Forecast (MRF) model run and 5 from the 1200 UTC Aviation Forecast (AVN) run. All forecasts are run out to 16 days. At 0000 UTC each day, the forecasts include the following:

• T126 high-resolution control (MRF forecast) that gets truncated after 7 days and is run at T62 resolution,
• T62 control that is started with a truncated T126 analysis, and
• five pairs of perturbed forecasts each run at T62 horizontal resolution. The perturbations are from five independent breeding cycles. Scaled growing modes from these five cycles are added to or subtracted from the 0000 UTC T62 control analysis.
At 1200 UTC each day, the forecasts include the following:

• T126 high-resolution control (‘‘Aviation Forecast’’), which gets truncated after 3.5 days and is run at T62 resolution, and
• two pairs of perturbed forecasts each run at T62 horizontal resolution. The perturbations are from two independent breeding cycles. Scaled growing modes from these two cycles are added to or subtracted from the 1200 UTC T62 control analysis.

Each forecast dataset is on a global 144 × 73 equally spaced latitude–longitude grid with grid points spaced 2.5° apart. Only Northern Hemisphere data were used in this research. The ensemble forecasts studied were from November 1995 to March 1996, October 1996 to May 1997, and October 1997 to May 1998. Each forecast extends from 0 to 16 days in advance of the initial time (forecast day 0). The ensemble forecast dataset had missing forecasts as well as bad data points. A complete listing of the missing forecast data is given in appendix A. Forecasts containing bad data points were disregarded to ensure a ‘‘pure’’ dataset for use in forecast probability development. After eliminating all forecasts containing erroneous data and accounting for missing forecast data, 91% (162 308) of the maximum possible 177 480 forecasts were available. A summary of the available forecast data is given in Table 1.

NCEP reanalysis data (Kalnay et al. 1996) were used to verify the ensemble forecasts. A climatology of blocking, constructed from the reanalyses, served as a benchmark control forecast and for comparison with the GSM blocking climatology. The reanalysis data consisted of four times daily averaged global 500-mb heights for the periods September to May from 1959 to 1998 on a 144 × 73 equally spaced latitude–longitude grid with grid points spaced 2.5° apart. As with the forecasts, only data over the Northern Hemisphere were used. The reanalysis dataset was complete with no missing or bad data points.

FIG. 1. Analyzed 500-mb heights on 16 Dec 1997. The blocking system over western Europe and Great Britain was identified in the analyses from 15 to 19 Dec. The contour interval is 80 m.
FIG. 2. As in Fig. 1 except for 25 Dec 1996. The blocking system is centered near 180° lon and was identified in the analyses from 21 to 30 Dec 1996.
The reanalysis and forecast data were searched for all longitudes meeting the instantaneous blocking definition and screened to find blocking systems that spanned 20° or more of longitude and lasted for 5 or more days. In the case of the ensemble forecasts, the screening process was done independently for each individual ensemble member. A particular ensemble member was considered to have predicted blocking at a longitude if, during the member’s 16-day forecast, there were at least 5 consecutive days meeting the blocking definition at that longitude. For example, if blocking was predicted at a longitude by an ensemble member on consecutive days 3–8 of a forecast, then blocking was considered to have been predicted by that member at that longitude on each of the days 3, 4, 5, 6, 7, and 8 of that forecast. On the other hand, a forecast initialized during a blocking episode may not be considered to have predicted blocking if, in the forecast, blocking does not persist for at least 5 days (i.e., until at least day 4). Thus, by this procedure, we are unable to evaluate the prediction of the maintenance of blocking and are focusing instead upon the prediction of the transition to blocking. We will address the problem of predicting block maintenance in subsequent work.
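The 5-day persistence screening described above can be sketched as a run-length check on one member's daily flags at one longitude. This is only an illustration: the function name and the list-of-booleans input are hypothetical, and the 20° longitudinal extent test is assumed to have been applied before these flags were formed.

```python
def persistent_block_days(flags, min_days=5):
    """Mark the days that belong to a run of at least `min_days` consecutive True flags.

    flags: one boolean per forecast day (day 0 first) saying whether the
    instantaneous blocking definition is met. Hypothetical helper.
    """
    marked = [False] * len(flags)
    start = None
    # A trailing False acts as a sentinel so a run ending on the last day is closed.
    for day, flag in enumerate(list(flags) + [False]):
        if flag and start is None:
            start = day
        elif not flag and start is not None:
            if day - start >= min_days:
                for d in range(start, day):
                    marked[d] = True
            start = None
    return marked


# Days 3-8 of a 17-day forecast (days 0-16) meet the instantaneous definition.
days_3_to_8 = [3 <= d <= 8 for d in range(17)]
print([d for d, m in enumerate(persistent_block_days(days_3_to_8)) if m])

# A block present at initialization that lasts only through day 3 is not counted.
days_0_to_3 = [d <= 3 for d in range(17)]
print(any(persistent_block_days(days_0_to_3)))
```

The second case mirrors the situation noted in the text: a forecast started during a blocking episode is not credited with predicting blocking unless the forecast block lasts until at least day 4.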
TABLE 1. Available ensemble blocking forecasts. Units are forecast days.

Blocking season      Total possible forecasts     Missing forecasts     Erroneous data forecasts     Subtotal available
Nov–Mar 1995–96      47 736                     − 2159                − 2407                     = 43 170
Oct–May 1996–97      64 872                     − 1768                − 1270                     = 61 834
Oct–May 1997–98      64 872                     − 6290                − 1278                     = 57 304
Total available forecasts                                                                          162 308
FIG. 3. Frequency of blocking as a function of longitude during the blocking seasons (1 Sep– 15 May) from 1 Sep 1959 through 15 May 1998.
The blocks identified in the forecast and reanalysis data were assigned to sectors. Following Colucci and Alberta (1996), the Pacific sector was defined to be the Northern Hemisphere longitudes from 90°E eastward to 90°W and the Atlantic sector was defined to be the Northern Hemisphere longitudes from 90°W eastward to 90°E. A 500-mb height analysis during an identified Atlantic blocking episode is shown in Fig. 1, while a similar analysis during an identified Pacific block is presented in Fig. 2.

4. Climatology of blocking in the NCEP reanalyses

The NCEP reanalysis data were searched for blocks, as defined above, during the cool seasons (1 September–16 May) from 1 September 1959 to 16 May 1998. Northern Hemisphere blocking is rarely observed outside of this season. Analyzed blocking frequencies as a function of longitude are presented for the entire record in Fig. 3. Consistent with previous climatological studies (Colucci and Alberta 1996 and references therein), maximum blocking frequencies were noted over the central Pacific Ocean and eastern Atlantic Ocean. Minimum frequencies were determined near the longitudes (90°E and 90°W) separating the Atlantic and Pacific sectors. Analyzed blocking frequencies as a function of longitude are presented for each season by Watson (1999). Here attention is focused upon blocking frequencies during the three seasons (1995–96, 1996–97, and 1997–98) for which the ensemble forecasts were evaluated. A blocked day in a sector (Atlantic or Pacific) is defined when at least one blocking system is found in that sector. The blocking frequency for a sector during a season is defined to be the total number of blocked days over that sector during that season divided by the number of days in that season.
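A small sketch of the sector assignment and the seasonal blocking frequency defined above follows. The function names are hypothetical, and the treatment of the two boundary longitudes (90°E counted as Pacific, 90°W as Atlantic) is an assumption, since the text does not say how they are handled.

```python
def sector(lon_deg_east):
    """Assign a longitude in degrees east (any range) to a sector.

    Assumption: 90E is counted as Pacific and 90W (270E) as Atlantic.
    """
    lon = lon_deg_east % 360.0
    return "Pacific" if 90.0 <= lon < 270.0 else "Atlantic"


def blocking_frequency(blocked_days, season_days):
    """Blocked days in a sector divided by the number of days in the season."""
    return blocked_days / season_days


print(sector(180.0), sector(0.0), sector(-30.0))
# 83 blocked days in a 259-day season, the 1995-96 Atlantic figures quoted in section 4.
print(round(100 * blocking_frequency(83, 259)))
```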
Thirty-two percent (83) of the 259 days during the 1995–96 season were blocked over the Atlantic sector by the above definition. This was the highest seasonal Atlantic blocking frequency of the 39 seasons studied (Watson 1999) and compares with a 39-season mean Atlantic blocking frequency of 13%. The other seasons (1996–97 and 1997–98) selected for forecast evaluation were characterized by near-normal (12%) blocking frequency over the Atlantic. The average number of blocked days per blocking season over the Pacific was approximately 30, corresponding to 12% of the time. The 1995–96 and 1996–97 seasons featured above-normal blocking frequencies (23% and 21%, respectively), while the 1997–98 blocking frequency (6%) was below normal. The block onset and decay dates during the 1995–96, 1996–97, and 1997–98 seasons over both sectors are presented in appendix B. 5. Ensemble forecast performance The forecast blocking frequency as a function of longitude, averaged over all ensemble members at five different forecast ranges, is presented for the 1995–96 cool season in Fig. 4 and is compared with the correspondingly analyzed blocking frequency. At each of the forecast time ranges, the ensemble forecasts on average qualitatively reproduced the analyzed blocking frequency maxima over the Atlantic and Pacific sectors. Blocking frequency appeared to be overpredicted at shorter ranges, especially over the Pacific, but underpredicted at longer time ranges, particularly over the Atlantic. In fact, at the longest range displayed (15 days), the predicted blocking frequency is no greater than the climatological frequency shown in Fig. 3,
FIG. 4. Analyzed (solid) and forecast (dashed) blocking frequency for the 1995–96 cool season averaged over all ensemble members for all (a) day 3, (b) day 6, (c) day 9, (d) day 12, and (e) day 15 forecasts.
whereas the analyzed frequency, as noted above, was well above normal during this season. Figure 4 does not reveal the accuracy of forecasts relative to the length of time between forecast initialization and the analyzed block onset. The analyzed block-onset day over a sector is defined as the first day of an analyzed blocking episode over that sector. The percentage of ensemble members predicting blocking
FIG. 5. Percentage of available ensemble members predicting blocking on the block-onset day, averaged over all three seasons studied, as a function of forecast time range for the Pacific and the Atlantic.
TABLE 2. A 2-by-2 contingency table for categorical ensemble forecasts.

                      Event observed
Event forecast        Yes       No
Yes                   A         B
No                    C         D
over a sector on the analyzed block-onset day, averaged over the number of cases in each sector, is shown in Fig. 5 as a function of the length of time between forecast initialization and block-onset day. Less than 100% of the ensemble members on average predict block onset on day 0 because, by definition, blocking beginning at day 0 is required to persist at least until day 4. Our focus is not on the prediction of the maintenance of existing blocks but on the prediction of the transition to blocking, that is, on the accuracy of forecasts initialized at least one day prior to the onset of blocking. To keep the forecast verification as simple as possible, we strictly require each forecast to meet the objective blocking definition. We recognize that blocklike forecasts, or those that missed block onset by a day or two, may be subjectively regarded as correct even if they do not satisfy the objective blocking criteria. Figure 5 reveals that the percentage of ensemble members predicting block onsets dropped below the climatological (12%–13%) frequency of blocking by day 12 over the Atlantic and day 15 over the Pacific, suggesting that the ensemble can skillfully predict block onsets through at least the medium ranges. The skill of the ensemble was further investigated as follows. Ensemble forecast skill scores were calculated for each full block season (November 1995–March 1996, October 1996–April 1997, and October 1997–April 1998). Each forecast for each member was categorized using a 2 by 2 contingency table (Table 2) since there are only
two possible outcomes to the block forecast or observation (yes or no). ‘‘Yes’’ implies a block analysis or forecast within a particular sector, while ‘‘no’’ means no analyzed or forecast blocking in that sector. The entry A in Table 2 is the number of event forecasts that correspond to analyzed events, or the number of hits. Entry B is the number of event forecasts that do not correspond to analyzed events, or the number of false alarms. Entry C is the number of no-event forecasts corresponding to analyzed events, or the number of misses. Entry D is the number of no-event forecasts corresponding to no events analyzed, or the number of correct rejections. Thus, $A + B + C + D = n$, the total number of forecast–observation pairs. The 2 by 2 table will be referenced in the definitions of a number of performance measures, or skill scores, formulated for the 2 by 2 verification problem. The skill scores calculated were the bias, the Heidke skill score (HSS), and the false alarm rate (FAR). Specific details regarding each of these skill scores can be found in Wilks (1995). Skill scores will be presented and discussed herein for day 6 forecasts by way of example. Briefly, the bias measures whether events are overforecast or underforecast. Using the contingency table notation, $\mathrm{bias} = (A + B)/(A + C)$. A bias score of one indicates that the event is observed as often as it is forecast. Greater than one indicates that the event is overforecast and less than one indicates the event is underforecast. The HSS is a verification measure of categorical forecast performance based on the hit rate, $(A + D)/n$. It provides a skill calculation of forecasts versus the hit rate achieved by random forecasts that are constrained to have the same relative frequency as the verification dataset. Again, using the contingency table notation,
FIG. 6. Skill scores for Pacific forecasts, averaged over every forecast day 6 for the 3 yr of verification data.
FIG. 7. Same as in Fig. 6 but for Atlantic sector forecasts.
\[
\mathrm{HSS} = \frac{2(AD - BC)}{(A + C)(C + D) + (A + B)(B + D)}.
\]
Perfect forecasts receive an HSS of 1. Forecasts equivalent to randomly made forecasts receive a skill score of 0, indicating no skill. Forecasts worse than the reference (random) forecast receive a score of less than 0. The FAR is a verification measure of categorical forecast performance equal to the number of false alarms divided by the total number of event forecasts. In contingency table notation, $\mathrm{FAR} = B/(A + B)$. FAR ranges from zero to one. Low FAR scores denote more accurate forecasts (fewer false alarms) and high FAR scores denote less accurate forecasts (more false alarms). The bias, HSS, and FAR averaged over all day 6 forecasts for each individual ensemble member are presented for the Pacific in Fig. 6 and the Atlantic in Fig. 7. Not surprisingly, Figs. 6 and 7 show that there are only minor fluctuations in skill scores between members. All 17 members were found to be statistically identical, with variations in skill scores attributable to sampling errors. Thus, each of these members has an approximately equal chance of predicting blocking. The bias calculations in both sectors average less than one, indicating that blocking is underforecast. The HSS is positive for each member in each sector, however. From Figs. 8 and 9, it is seen that the skill of the average of all the ensemble members maximizes near forecast day 4, but declines rapidly after that. The forecasts averaged over all ensemble members for each forecast day for the entire 3-yr verification set were examined. In Figs. 10 and 11, the bias plots clearly
FIG. 8. Skill scores for ensemble member 18’s Pacific forecasts, averaged over individual forecast days (0–16) for the 3 yr of verification data.
FIG. 9. Same as in Fig. 8 but for Atlantic sector forecasts.
show that, on average, the ensemble underforecasts blocking in the Pacific and Atlantic sectors, respectively, at all forecast ranges. That the forecast blocking frequency is higher than the analyzed frequency at most Pacific longitudes at day 6 (Fig. 4b) means that fewer Pacific blocking days are being predicted than analyzed, but on these days blocking conditions are forecast at more longitudes than are analyzed. From the evidence presented, it appears that the GSM ensembles were able to qualitatively predict the analyzed blocking frequencies during the period of investigation. Therefore, the forecast data were used to quantitatively formulate a probabilistic block prediction scheme, as described in the next section.
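The three categorical scores used in this section can be computed directly from the entries of the 2 by 2 contingency table. The sketch below is illustrative only: the function name and the example counts are invented, and the HSS is written in the standard form given by Wilks (1995), with a sum of the two products in the denominator, so that perfect forecasts (B = C = 0) score exactly 1.

```python
def categorical_scores(a, b, c, d):
    """Return (bias, HSS, FAR) from 2x2 contingency table counts.

    a: hits, b: false alarms, c: misses, d: correct rejections.
    Raises ZeroDivisionError if no events were observed (a + c == 0)
    or none were forecast (a + b == 0).
    """
    bias = (a + b) / (a + c)
    hss = 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    far = b / (a + b)
    return bias, hss, far


# Invented counts: blocking forecast on 30 days, analyzed on 50 days.
bias, hss, far = categorical_scores(20, 10, 30, 140)
print(round(bias, 3), round(hss, 3), round(far, 3))
```

With these made-up counts the bias is below one, the situation the text describes as underforecasting, while the positive HSS indicates skill relative to random forecasts.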
6. Probability forecast development and validation The first step toward developing probability forecasts was to determine the predictors available for the regression. One natural selection was to use ‘‘the number of ensemble members predicting blocking on a given day in either the Atlantic or Pacific sector.’’ However, there are, at times, considerable gaps in the forecast data due to unavailable or erroneous data. Hence, using only ‘‘the number of members available’’ for a specific forecast period would not provide an accurate assessment of the block likelihood from the ensemble forecasts. Therefore, the predictor chosen for use in the probability forecast development
FIG. 10. Average skill scores for Pacific ensemble forecasts. The skill scores were averaged over individual forecast days (0–16) over all ensemble members for the 3 yr of verification data.
FIG. 11. Same as in Fig. 10 but for Atlantic sector forecasts.
was ‘‘percent of available members predicting blocking on a given forecast day for either the Atlantic or Pacific sector,’’ referred to hereafter as %-available. However, using the %-available for each individual forecast only provided one predictor for each forecast day; that is, the %-available for forecast day 5 would be the only information used to develop a probability statement about the blocking forecast 5 days from now. These data were therefore supplemented with previous forecasts valid for the same day, that is, with time-lagged forecasts. Use of the time-lagged forecasts provided 16 times more data (since the forecasts are run out to day 16), such that forecast data from up to 16 days ago could be used in a regression to develop forecast blocking probabilities. The 1996–97 and 1997–98 forecast data served as the developmental sample and the 1995–96 data were used for independent validation, based on the distribution of blocking episodes over time in each season. The 1996–97 and 1997–98 seasons both had block episodes that were spaced apart temporally from one another, while the 1995–96 season had numerous block episodes that occurred in near succession. The 1996–97 season was dominated by Pacific blocking and the 1997–98 season was dominated by Atlantic blocking. Combined, the 1996–97 and 1997–98 seasons had nearly the same number of block episodes as the 1995–96 season. Binary logistic regression (Wilks 1995, p. 183) was used to determine the blocking probability on a forecast day:
\[
\hat{y}_i = \frac{1}{1 + \exp(B_0 + B_1 X_1 + B_2 X_2 + \cdots + B_k X_k)} \times (100\%),
\]

where $B_k$ are the coefficients and $X_k$ are the percent of available members predicting blocking at a given forecast range. Both the Atlantic and Pacific forecast data from the 1996–97 and 1997–98 blocking seasons were combined into one dataset to develop the regression because there were not enough blocking episodes in either sector to develop separate sector regressions. Each regression equation was built separately, using only ‘‘current’’ and ‘‘past’’ or time-lagged forecasts, for each forecast day. Backward elimination was used to remove predictors, starting with those that had the highest p value. Once all the predictors with p values > 0.05 were eliminated, the coefficients were examined. Elimination of predictors continued until the remaining coefficients had p values of < 0.01, when testing the null hypothesis that the coefficient is zero. Resulting forecast equations are given in Table 3. The boxes with only dashes in Table 3 indicate that forecast data from those days are not used in the regression, since those boxes are actually ‘‘future’’ forecasts. Note that equation ‘‘x’’ is simply forecast day ‘‘x − 1’’ and again that the day −n headings are read as, ‘‘A forecast made n days prior to the valid date.’’ As an example, the forecast equation for a day 4 forecast made ‘‘today’’ is given as follows:

\[
\text{probability of blocking on day 4} = \frac{1}{1 + \exp[3.683 - 5.978(\text{day } {-4}) - 2.364(\text{day } {-7}) - 2.798(\text{day } {-11})]} \times (100\%).
\]
TABLE 3. Regression equations using 1996–97 and 1997–98 ensemble forecast data. For each equation, the coefficients of the retained %-available predictors are listed in order from day 0 to day −16; each dash marks a ‘‘future’’ forecast that is not used.

Equation   Forecast day   B0      Coefficients (day 0 to day −16)
1          0              4.496   −7.11 −3.724 −3.971 −3.226 −3.381
2          1              4.291   – −5.068 −2.835 −3.277 −3.082 −2.842
3          2              4.192   – – −5.921 −3.686 −3.249 −3.019
4          3              3.963   – – – −5.218 −2.657 −2.893 −2.854
5          4              3.683   – – – – −5.928 −2.364 −2.995
6          5              3.421   – – – – – −5.127 −2.798 −2.454
7          6              3.098   – – – – – – −5.388 −2.95
8          7              3.096   – – – – – – – −6.131 −2.836
9          8              2.844   – – – – – – – – −4.915 −3.355
10         9              2.906   – – – – – – – – – −3.398 −3.386 −2.394
11         10             2.892   – – – – – – – – – – −3.491 −2.926 −2.694
12         11             2.881   – – – – – – – – – – – −4.345 −2.173 −2.893
13         12             2.791   – – – – – – – – – – – – −3.892 −2.617 −3.1
14         13             2.655   – – – – – – – – – – – – – −4.035 −2.67 −3.132
15         14             2.48    – – – – – – – – – – – – – – −4.954 −3.934
16         15             2.384   – – – – – – – – – – – – – – – −4.716 −4.051
17         16             2.189   – – – – – – – – – – – – – – – – −6.692
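To make the predictor and the example equation concrete, the sketch below computes a %-available value from a set of member forecasts with gaps and then evaluates the day 4 equation quoted in the text. The function names are hypothetical. The predictors are passed as fractions between 0 and 1; this scaling is an assumption, because the text describes the predictor as a percent without stating the scale on which it enters the regression.

```python
import math


def percent_available(member_forecasts):
    """Fraction (0-1) of available members predicting blocking.

    member_forecasts: True/False per member, or None where the member's
    forecast is missing or erroneous. Returns None if nothing is available.
    """
    available = [m for m in member_forecasts if m is not None]
    if not available:
        return None
    return sum(1 for m in available if m) / len(available)


def blocking_probability(b0, terms):
    """Logistic probability in percent; `terms` holds (coefficient, predictor) pairs."""
    exponent = b0 + sum(coef * x for coef, x in terms)
    return 100.0 / (1.0 + math.exp(exponent))


def day4_probability(x_day4, x_day7, x_day11):
    """Day 4 equation as quoted in the text (predictors assumed to be fractions)."""
    return blocking_probability(
        3.683, [(-5.978, x_day4), (-2.364, x_day7), (-2.798, x_day11)]
    )


# Hypothetical member sets: today's run, the run 3 days ago, and the run 7 days ago,
# all verifying 4 days from now. None marks an unavailable member.
x4 = percent_available([True, True, False, None, True])
x7 = percent_available([True, False, False, False])
x11 = percent_available([False, False, None])
print(round(day4_probability(x4, x7, x11), 1))
```

Under the fraction assumption, the equation returns roughly 2.5% when no member at any of the three lags predicts blocking and approaches 100% when all of them do.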
Since this is a 4-day forecast made today, the %-available on day −4 is actually the %-available from today’s 4-day ensemble forecasts. The %-available on day −7 is actually the %-available from 3 days prior to today, verifying 4 days from now. The most notable feature of Table 3 is the constant selection of day −4, day −7, and day −11 as the most significant predictors. In the two equations where day −7 or day −11 was not chosen, their p values were less than 0.03, but above the 0.01 cutoff. A test was performed to examine the ‘‘true’’ importance of these forecast days, recognizing that adjacent forecasts were highly correlated (Fig. 12). For example, day −7 forecasts were chosen consistently over day −8 forecasts, and day −11 forecasts were chosen consistently over day −12 forecasts. Furthermore, day −8 forecasts were nearly always the first forecasts to be removed in the backward elimination process. However, if the day −7 and day −11 forecasts were arbitrarily discarded instead of the day −8 and day −12 forecasts, then the day −8 and day −12 forecasts became as important a predictor as the day −7 and day −11 forecasts had been. This indicates that either of the two forecast days could have been used as a predictor, as long as the other was not.

The forecast equations were validated using the 1995–96 season ensemble forecast data applied to block prediction over the Atlantic and Pacific sectors. It is important to note that the same set of equations was used to predict both Atlantic and Pacific sector blocking. Overall ‘‘reliability diagrams’’ for the blocking forecast equations are presented in Fig. 13. Reliability diagrams for the individual forecast equations are found in Watson (1999). A reliability diagram is a plot of the observed relative frequency of an event versus the frequency of use of the forecast values for that event.
Also displayed on a reliability diagram is a histogram of the forecast probability ‘‘bins.’’ Forecast probabilities have been categorized into bins to ensure that each bin has a large enough sample size to make the reliability diagram a useful tool. Four bins were used to categorize the ensemble probability forecasts, 0%–9.9%, 10%–49.9%, 50%–89.9%, and 90%–100%. These are the bins for all the histograms plotted with the reliability diagrams. The ranges were averaged to obtain the points 5%, 30%, 70%, and 95%. These points are plotted on the reliability diagram as the 1:1 ‘‘perfect forecast’’ diagonal. Since the bins are not spaced equally between 0% and 100%, the 1:1 diagonal is not a straight line. The histogram reveals the number of times a particular bin of probabilities has been used. ‘‘Sharp’’ forecasts are those that use the highest and lowest probabilities most often. A sharp forecast is preferable, especially when diagnosing the forecast of a dichotomous event, so long as reliability is not overly compromised to achieve sharper forecasts. Interpreting the reliability of each of the forecast
FIG. 12. Correlation between the %-available forecasts and forecasts verifying on the same day but initialized k days earlier, with each forecast range (from day 1 to 16) represented by a single curve, during the 1995–96 season over the (top) Pacific and (bottom) Atlantic. Note that extended-range forecasts are more correlated with shorter-range forecasts, verifying at the same time, over the Atlantic than over the Pacific.
equations is straightforward. The thick, black line with the small boxes plotted in Fig. 13 is the relative frequency of observed blocking (RFOB) for a particular forecast equation. When the RFOB is above the thin ‘‘perfect reliability’’ line, the interpretation is that the equation is underforecasting blocking in those particular probability bins. Conversely, when the RFOB is below the thin perfect reliability line, the interpretation is that the equation is overforecasting blocking in those particular probability bins. Overall, the results are encouraging and the equations appear reliable. The equations underforecast blocking at smaller probabilities in the Atlantic sector and overforecast blocking at larger probabilities in the Pacific sector. The question of whether the forecast equations were skillful at predicting blocking over each sector is addressed by comparing their accuracy, determined by the Brier (1950) score, with the accuracy of suitable reference forecasts. Not surprisingly, the calibrated ensemble forecasts skillfully predicted blocking, relative to a climatological reference forecast, over each sector and at all ranges (Fig. 14). On the other hand, the calibrated ensemble forecasts skillfully predicted blocking, relative to the uncalibrated ensemble forecasts, at most ranges over the Atlantic but not over the Pacific (Fig. 15). In other words, our calibration of the ensemble forecasts improved the accuracy at predicting blocking, at most ranges, over the Atlantic but not over the Pacific where the uncalibrated ensemble forecasts were more accurate. An exception for the Atlantic sector prediction was at day 4 range [Eq. (5) in Fig. 15] when the uncalibrated ensemble forecasts achieved peak accuracy (Fig. 11).
FIG. 13. Reliability diagrams for the calibrated ensemble predictions of blocking over the (left) Pacific and (right) Atlantic sectors. The dark curve in each diagram is the observed vs predicted frequency of blocking, while the light curve represents perfect reliability. The observed relative frequency is linearly scaled, while the forecast probability is linear in bin number.
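The points of a reliability diagram such as Fig. 13 can be obtained by binning the forecast probabilities and computing the relative frequency of observed blocking (RFOB) in each bin. The sketch below is one way to do this. The function name, the choice of ten equal-width bins, and the sample data are assumptions made for illustration and are not specified in the text.

```python
from typing import List, Sequence, Tuple


def reliability_table(probs: Sequence[float],
                      outcomes: Sequence[int],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean forecast probability, RFOB, count) for each non-empty bin."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have equal length")
    counts = [0] * n_bins
    prob_sums = [0.0] * n_bins
    hits = [0] * n_bins
    for p, o in zip(probs, outcomes):
        # Equal-width bins on [0, 1]; a probability of exactly 1 goes in the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        counts[b] += 1
        prob_sums[b] += p
        hits[b] += o
    return [
        (prob_sums[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Hypothetical sample data: forecast probabilities and 0/1 blocking outcomes.
sample_probs = [0.05, 0.05, 0.95, 0.95]
sample_outcomes = [0, 1, 1, 1]

for mean_p, rfob, n in reliability_table(sample_probs, sample_outcomes):
    # RFOB above the forecast probability indicates underforecasting in that bin;
    # RFOB below it indicates overforecasting.
    if rfob > mean_p:
        tendency = "underforecast"
    elif rfob < mean_p:
        tendency = "overforecast"
    else:
        tendency = "reliable"
    print(f"forecast={mean_p:.2f}  RFOB={rfob:.2f}  n={n}  {tendency}")
```

Bins with few cases give noisy RFOB values, so the count is returned alongside each point.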
FIG. 14. Calibrated probability equation forecast skill vs overall climatology in the Pacific and Atlantic. Climatological block relative frequencies for the Pacific (0.115) and Atlantic (0.131) are used as the reference forecasts. Note that equation “x” corresponds to forecast day “x − 1”.
The reason for the differing results between the Atlantic and Pacific sectors is unclear. Perhaps there are stronger signals for blocking in the extended forecast range over the Atlantic sector relative to these signals over the Pacific sector. Future work will incorporate more data and will segregate these data to separately develop Pacific and Atlantic prediction equations.

7. Conclusions

GSM ensemble 500-mb height forecasts from two cool seasons (September through mid-May) were used to develop a probabilistic scheme for the prediction of blocking over the Atlantic and Pacific sectors of the Northern Hemisphere. The probabilistic scheme was tested for blocking prediction during a third cool season. The raw (uncalibrated) ensemble forecasts underpredicted the frequency of blocking at all ranges over both sectors during all three seasons. The probabilistic scheme developed from the bias-corrected (calibrated) ensemble forecasts during two cool seasons skillfully (relative to climatology) predicted blocking during the third cool season at all ranges over both sectors. The accuracy of the calibrated ensemble forecasts was greater at most ranges than the accuracy of the uncalibrated ensemble forecasts over the Atlantic sector, but not over the Pacific.
FIG. 15. Calibrated probability equation forecast skill vs the %-available forecast Brier scores. Note that equation “x” corresponds to forecast day “x − 1”.
The lack of improvement noted over the Pacific is likely due to our use of a single set of predictive equations for both sectors rather than one set for each sector. This constraint was imposed by the limited amount of ensemble forecast data available. Still, we are encouraged by the demonstrated potential of the scheme to correct an underforecasting bias in the Atlantic and to produce sharper probabilistic forecasts over both the Atlantic and Pacific sectors. With additional data, the method can be refined to produce more geographically specific blocking forecasts. Our results demonstrate that an ensemble of forecasts can overcome the effect of initial condition uncertainty by skillfully predicting the onset of blocking at extended ranges. The usefulness of these forecasts can be enhanced through the correction of model biases toward the underprediction of blocking. Further testing and extension of our block-prediction scheme seem justified by these results.

Acknowledgments. This work represents the first author’s (JSW) M.S. thesis research under the supervision of the second author (SJC) and was supported in part by National Science Foundation Grant ATM-9726250. The data for this study were provided by the National Oceanic and Atmospheric Administration’s Climate Diagnostics Center in Boulder, Colorado (http://www.cdc.noaa.gov). We thank Dr. Steven Tracton for his encouragement of this work and the anonymous reviewers for their helpful comments.
APPENDIX A
Missing Ensemble Forecast Data

1995–96
Oct: not available
Nov: 3 Nov 1200 UTC; 17 Nov 1200 UTC; 18 Nov 0000 UTC; 20 Nov 1200 UTC; 21 Nov 0000 UTC; 25 Nov 0000 UTC
Dec: 23 Dec 1200 UTC
Jan: 2 Jan 1200 UTC; 15 Jan 1200 UTC; 16 Jan 1200 UTC; 30 Jan 1200 UTC
Feb: 24 Feb 1200 UTC
Mar: 2 Mar, 16 Mar, 18 Mar (times listed: 1200, 0000, 0000, 1200 UTC); 19 Mar 0000 UTC
Apr: not available

1996–97
Oct: 12 Oct 1200 UTC; 21 Oct 0000 UTC; 31 Oct 1200 UTC
Nov: none
Dec: none
Jan: none
Feb: 9 Feb 0000 UTC
Mar: 6 Mar, 8 Mar, 9 Mar, 14 Mar, 26 Mar (times listed: 0000, 0000, 1200, 0000, 0000, 1200 UTC)
Apr: 4 Apr 0000 UTC; 19 Apr 0000 UTC

1997–98
Oct: 1, 2, 6, 7, 8, 9, 10, 14, 16, 17, 20, 21, 22, 27, 28, 29, 30, 31 Oct
Nov: 1, 2, 9, 16, 23, 27 Nov
Times listed for the Oct and Nov entries: 0000, 1200, 1200, 1200, 0000, 1200, 1200, 1200, 1200, 1200, 1200, 0000, 1200, 1200, 0000, 1200, 1200, 1200, 1200, 1200, 1200, 0000, 1200 UTC; 0000, 1200, 1200, 1200, 1200, 1200, 0000 UTC
Dec: none
Jan: 7 Jan 0000 UTC; 27 Jan 1200 UTC
Feb: 17 Feb 1200 UTC; 21 Feb 1200 UTC; 23 Feb 1200 UTC; 27 Feb 1200 UTC
Mar: 1, 8, 15, 18, 20, 21, 22, 31 Mar
Apr: 5, 6, 11, 15, 20, 28 Apr
Times listed for the Mar and Apr entries: 1200, 0000, 1200, 0000 UTC; 0000, 1200, 1200, 0000, 1200, 0000, 1200, 0000, 1200, 0000, 1200, 1200, 1200 UTC
APPENDIX B
Block Onset and End Dates

Block onset      Sector      Block end

1995–96 season
28 Nov 1995      Pacific     3 Dec 1995
3 Dec 1995       Atlantic    8 Dec 1995
8 Dec 1995       Pacific     14 Dec 1995
10 Dec 1995      Atlantic    23 Dec 1995
25 Dec 1995      Atlantic    30 Dec 1995
4 Jan 1996       Pacific     15 Jan 1996
16 Jan 1996      Atlantic    31 Jan 1996
1 Feb 1996       Pacific     5 Feb 1996
24 Feb 1996      Atlantic    29 Feb 1996
27 Feb 1996      Pacific     11 Mar 1996
6 Mar 1996       Atlantic    21 Mar 1996
25 Mar 1996      Pacific     30 Mar 1996
25 Mar 1996      Atlantic    1 Apr 1996
27 Apr 1996      Atlantic    1 May 1996
10 May 1996      Pacific     16 May 1996
10 May 1996      Atlantic    14 May 1996

1996–97 season
16 Sep 1996      Atlantic    20 Sep 1996
21 Dec 1996      Pacific     30 Dec 1996
28 Dec 1996      Atlantic    12 Jan 1997
19 Jan 1997      Pacific     1 Feb 1997
8 Feb 1997       Pacific     15 Feb 1997
12 Mar 1997      Pacific     17 Mar 1997
10 Apr 1997      Pacific     20 Apr 1997
13 Apr 1997      Atlantic    24 Apr 1997

1997–98 season
11 Nov 1997      Pacific     15 Nov 1997
17 Nov 1997      Atlantic    22 Nov 1997
29 Nov 1997      Atlantic    6 Dec 1997
15 Dec 1997      Atlantic    19 Dec 1997
21 Jan 1998      Atlantic    27 Jan 1998
2 Feb 1998       Pacific     8 Feb 1998
20 Mar 1998      Atlantic    24 Mar 1998

REFERENCES
Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1–3.
Brankovic, C., T. N. Palmer, F. Molteni, S. Tibaldi, and U. Cubasch, 1990: Extended-range predictions with ECMWF models: Time-lagged ensemble forecasting. Quart. J. Roy. Meteor. Soc., 116, 857–912.
Colucci, S. J., and T. L. Alberta, 1996: Planetary-scale climatology of explosive cyclogenesis and blocking. Mon. Wea. Rev., 124, 2509–2520.
——, and D. P. Baumhefner, 1998: Numerical prediction of the onset of blocking: A case study with forecast ensembles. Mon. Wea. Rev., 126, 773–784.
Hamill, T. M., and S. J. Colucci, 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. Mon. Wea. Rev., 126, 711–724.
Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471.
Rex, D. F., 1950: Blocking action in the middle troposphere and its effect upon regional climate: The climatology of blocking action. Tellus, 2, 275–301.
Tibaldi, S., and F. Molteni, 1990: On the operational predictability of blocking. Tellus, 42A, 343–365.
——, E. Tosi, A. Navarra, and L. Pedulli, 1994: Northern and Southern Hemisphere seasonal variability of blocking frequency and predictability. Mon. Wea. Rev., 122, 1971–2003.
——, P. Ruti, E. Tosi, and M. Maruca, 1995: Operational predictability of winter blocking at ECMWF: An update. Ann. Geophys., 13, 305–317.
Toth, Z., and E. Kalnay, 1993: Ensemble forecasting at NMC: The generation of perturbations. Bull. Amer. Meteor. Soc., 74, 2317–2330.
——, and ——, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297–3319.
Tracton, M. S., 1990: Predictability and its relationship to scale interaction processes in blocking. Mon. Wea. Rev., 118, 1666–1695.
Watson, J. S., 1999: Using time-lagged ensemble forecasts to generate blocking probabilities. M.S. thesis, Cornell University, 106 pp. [Available from J. S. Watson, 38 Audubon Ave., Holbrook, NY 11741.]
Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences: An Introduction. Academic Press, 467 pp.