3750
MONTHLY WEATHER REVIEW
VOLUME 135
Evaluation of Precipitation from Numerical Weather Prediction Models and Satellites Using Values Retrieved from Radars
SLAVKO VASIĆ, CHARLES A. LIN, AND ISZTAR ZAWADZKI
Department of Atmospheric and Oceanic Sciences, and Global Environmental and Climate Change Centre, McGill University, Montréal, Québec, Canada
OLIVIER BOUSQUET
Centre de Météorologie Radar, Météo France, Paris, France
DIANE CHAUMONT
Ouranos Consortium on Regional Climatology and Adaptation to Climate Change, Montréal, Québec, Canada
(Manuscript received 12 July 2006, in final form 25 January 2007)
ABSTRACT
Precipitation is evaluated from two weather prediction models and satellites, taking radar-retrieved values as a reference. The domain is over the central and eastern United States, with hourly accumulated precipitation over 21 days for the models and radar, and 13 days for satellite. Conventional statistical measures and scale decomposition methods are used. The models generally underestimate strong precipitation and show nearly constant modest skill over a 24-h forecast period. The scale decomposition results show that the effective model resolution for precipitation is many times the grid size. The model predictability extends beyond a few hours for only the largest scales.
1. Introduction Numerical weather prediction (NWP) models are constructed to simulate complex physical processes that occur in the atmosphere. Their success in predicting different physical fields crucially depends on various processes (e.g., turbulence, cloud physics and microphysics, and radiative heat transfer) that have to be adequately formulated and implemented in the model. Furthermore, formulation of pertinent initial and boundary conditions for all governing equations and the choice of well-behaved numerical schemes used in the solution procedure are very important issues. Considerable improvements in modeling different processes have been made over the last few decades, resulting in an increased forecast skill (Roebber and Bosart 1996). There are, of course, further improvements to be made. For instance, Skamarock (2004)
Corresponding author address: Dr. Slavko Vasić, Department of Atmospheric and Oceanic Sciences, McGill University, 805 Sherbrooke Street West, Montréal, QC H3A 2K6, Canada. E-mail:
[email protected] DOI: 10.1175/2007MWR1955.1 © 2007 American Meteorological Society
strongly advocates the use of higher-than-second-order numerical schemes. He proposes the use of an up-to-fifth-order upwind advection scheme that contains a sixth-order filter with both hyperviscosity and a scale-selective damping feature that are minimally diffusive. In practice, models commonly use second-order schemes. Another example is the treatment of turbulent transport processes in atmospheric flows. The simple eddy viscosity/diffusivity type of turbulence models, as mostly used in operational NWP models, are known to perform inadequately in predicting flows with strong curvature or countergradient heat transport, which occur in the atmospheric convective boundary layer. A number of countergradient heat flux formulations have been proposed (e.g., Deardorff 1972; Troen and Mahrt 1986; Holtslag and Moeng 1991; Abdella and McFarlane 1997; Zilitinkevich et al. 1999) to either relax or eliminate this deficiency of eddy diffusivity models. Second-moment closure turbulence models have been used in dealing with the effects of stress anisotropy, streamline curvature, buoyancy, Coriolis effects, and cross interactions between fluctuating and mean fields (e.g., Zeman and Lumley 1976; Mellor
NOVEMBER 2007
VASIĆ ET AL.
and Yamada 1982; Canuto et al. 1994; Vasić et al. 2001). However, their incorporation into NWP models is associated with a number of issues that require special treatment. In this study, we evaluate precipitation fields from operational NWP models using observations from radars and satellites. We use both traditional statistical and scale decomposition methods. Many traditional statistical approaches yield a single-valued measure of correlation between the observed and forecast fields. Examples of such measures are categorical skill scores for precipitation above a given threshold rate, or the Pearson correlation coefficient. Though useful and popular, these and similar measures can be inadequate to characterize and assess the overall performance of model forecasts (e.g., Murphy and Winkler 1987; Murphy and Epstein 1989; Wilks 1995; Ebert and McBride 2000; Bousquet et al. 2006). Precipitation fields in general are characterized by large and small scales with complex space–time structures. A valuable approach to analyze such variability or to discriminate between observations and forecasts is the multiscale decomposition technique that relies on Fourier and wavelet methods. This methodology yields estimates of power spectra over a wide range of scales. Fourier spectra have been widely used for this purpose (Harris et al. 2001; Baldwin and Wandishin 2002; Skamarock 2004; Skamarock and Dempsey 2005). Harris et al. (2001) used a suite of multiscale statistical methods to compare the scale dependence of the precipitation variability of a high-resolution numerically simulated convective storm with that observed by radar. Baldwin and Wandishin (2002) examined several Weather Research and Forecast (WRF) and Eta Model precipitation forecasts, obtained at different grid spacings and using different convective schemes. Both studies show that model precipitation fields exhibit less spatial variability than that observed, particularly at small scales.
Skamarock (2004) noted that errors generated at the grid scale may propagate upscale through the spectrum for mesoscale NWP models with grid sizes between 1 and 20 km and forecast periods from 1 to 3 days. Skamarock examined the time-averaged kinetic energy spectra over a 24-h period from WRF model forecasts at 3-h intervals and with horizontal grid spacings (Δx) of 22, 10, and 4 km. Generally, it has been suggested that models have an effective resolution that is of order 5–10 Δx (Harris et al. 2001; Baldwin and Wandishin 2002; Skamarock 2004). Bousquet et al. (2006) examined 6-h rainfall accumulations from radar observations and models during a 3-day summer precipitation event over central Alberta in Canada. Using wavelet analysis, they showed that the model effectively resolves spatial scales larger than 6 times its grid resolution of 10 km. The lack of power of the models at smaller scales is a consequence of both the inadequate parameterization of physical processes and numerical schemes that are prone to numerical diffusion. The latter has been examined by Skamarock and Dempsey (2005) in an evaluation of forecasts of the WRF and Nonhydrostatic Mesoscale Model (NMM; Janjić 2003). They used a correction method to modify NMM forecasts to bring its results in closer agreement with the WRF model results and observations. In this study, we use both traditional statistical and scale decomposition methods to evaluate model- and satellite-derived precipitation using radar-retrieved precipitation. We recognize that radar measures reflectivity and not precipitation intensity. Reflectivity and precipitation rate are given by two different moments of the rainfall drop size distribution: the 6th moment for reflectivity and the 3.67th moment for precipitation rate. Mesoscale model precipitation is not derived from a direct computation of precipitation-generating processes, but instead is the result of parameterizations based on model information on the environmental conditions on scales much larger than the scales of precipitation formation. We call it model precipitation for practical reasons; we could as well call it model radar reflectivity. Thus, evaluating model and satellite precipitation using radar values as a reference is perfectly legitimate. Whether we transform radar data into precipitation rate units or model data into reflectivity units is arbitrary. However, conversion of these data is not uniquely determined, and the chosen conversion factor in the Z–R relationship may contribute to the more- or less-accurate estimate of model-based reflectivity or radar-based precipitation.
This issue is closely related to the treatment of the microphysical processes in NWP models and information that is available from model forecast results, as well as information from radar observations. Microphysical processes in the Global Environmental Multiscale (GEM) and Eta Models are parameterized differently, and the details are reported by Sundqvist et al. (1989) and Ferrier (1994). Section 2 describes the comparison data and methods used. In section 3, we discuss results of the comparison of model forecasts and observations using traditional statistical and scale decomposition methods. Section 4 contains concluding remarks.
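The statement in the Introduction that reflectivity and precipitation rate are different moments of the drop size distribution can be written out explicitly. The following is only a sketch: N(D) (number of drops per unit volume and unit diameter), D (drop diameter), and v(D) (fall speed) are notation introduced here, and the power-law fall speed v(D) ∝ D^0.67 is a common assumption that the text does not state.

$$
Z = \int_0^\infty N(D)\,D^{6}\,dD, \qquad
R = \frac{\pi}{6}\int_0^\infty N(D)\,D^{3}\,v(D)\,dD \;\propto\; \int_0^\infty N(D)\,D^{3.67}\,dD .
$$

Because the two integrals weight the drop sizes differently, a power law of the form Z = aR^b holds only for an assumed shape of N(D), which is why the conversion is not uniquely determined. The relationship used in section 2b corresponds to a = 300 and b = 1.5.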
2. Comparison data and methods a. Numerical model simulations We compare precipitation fields from numerical weather prediction models with values retrieved from
TABLE 1. Summary of model configurations related to regional forecasts. Note: other details can be found in cited references and/or MSC and NCEP online documentation.

Model                         GEM/HIMAP                  GEM                     Eta
Reference                     Côté et al. (1998)         Mailhot et al. (2006)   Janjić (1994)
Horizontal grid spacing       10 km                      15 km                   12 km
Vertical levels               28                         58                      60
Precipitation accumulated     Hourly                     Hourly                  Hourly
Data assimilation             Fully cycled 3D variational data assimilation system in each model
Lateral boundary conditions   Supplied by a corresponding (GEM, NCEP) 6-h old global model forecasts
PBL turbulence scheme         Dry TKE–l                  Moist TKE–l             Mellor–Yamada level 2.5
Horizontal diffusion          ∇⁶* (only for momentum)    ∇² (all variables)      ∇² (all variables)
Vertical diffusion            Fully implicit             Fully implicit          Fully implicit
Deep convection               Fritsch–Chappell           Kain–Fritsch            Betts–Miller–Janjić
Shallow convection            Conres                     Kuo transient           Betts–Miller–Janjić
Cloud microphysics            Sundqvist                  Sundqvist               Ferrier
Gravity wave drag             None                       McFarlane               None

* Horizontal diffusion for other variables is treated by a second-order scheme.
the continental U.S. radar network. More precisely, we examine hourly accumulated precipitation fields from two variants of the Canadian GEM model, and the U.S. Eta Model. The first is the High-Resolution Model Application Project (HIMAP) version of the global GEM model (Côté et al. 1998), which had been used with the operational 24-km GEM model (Bélair et al. 2000) to produce regional precipitation forecasts at a higher 10-km grid resolution over western and eastern parts of the North American continent. The second is the new GEM version with 15-km grid spacing and an improved physics package (Mailhot et al. 2006). This model became operational at the Canadian Meteorological Centre (CMC) on 18 May 2004. Since then, the 15-km GEM has been used for regional forecasts over the entire North American continent. The Eta Model is the operational model of the U.S. National Centers for Environmental Prediction (NCEP) (Janjić 1994), and currently produces forecasts at 12-km grid spacing. All of these models are run twice a day, with initialized fields at 0000 and 1200 UTC. A summary of model configuration related to regional forecasts is given in Table 1. The horizontal grid spacing and the number of vertical levels are different between the models. Other major differences include the physics package, such as the schemes for cumulus parameterization and microphysics; turbulence models of transport processes in the planetary boundary layer (PBL); and treatment of gravity wave drag. For example, GEM uses the Kain–Fritsch scheme for deep convection and the Sundqvist scheme for cloud microphysics, while the Eta Model uses the Betts–Miller–Janjić and Ferrier schemes. Furthermore, the GEM variants employ a prognostic equation for turbulent kinetic energy and a turbulent mixing length (TKE–l model); the Eta Model uses the Mellor
and Yamada (1982) algebraic turbulence model, closure level 2.5. More details can be found in the cited references or in the Meteorological Service of Canada (MSC) and NCEP online documentation.
b. Observations The radar precipitation data are taken from U.S. radar composites provided by Weather Decision Technologies, Inc. (WDT), with 10-min and 5-km temporal and spatial resolutions, respectively, and 5-dBZ reflectivity intervals. Reflectivity corresponds to the maximum value measured in the vertical over each pixel at a particular time. In this study, a lower threshold of 10 dBZ (logarithmic radar reflectivity units), corresponding to 0.1 mm h⁻¹, is used. To compare model results with radar data, we first converted reflectivity Z to rainfall rate R (mm h⁻¹) using the following relationship:

$$R = \left(\frac{Z}{300}\right)^{2/3}, \tag{1}$$

and then, groups of six 10-min rainfall rate values were summed and averaged to obtain hourly accumulated precipitation on a 5-km grid. As illustrated by Germann and Zawadzki (2002), the accuracy in terms of rainfall rates is not comparable to that of rainfall maps based on sophisticated algorithms that include corrections for factors such as the vertical profile of reflectivity, visibility, attenuation, and Z–R variation. There are errors in the quantitative rainfall estimation associated with brightband contamination, range effects on individual radars, and lack of cross-calibration of radars. However, the unique spatial coverage of radar composites warrants their use for scale-dependent analyses of model forecasts up to continental scales (Germann and Zawadzki 2002). Also, we note that the Z–R relationship (1) is commonly used at NCEP to infer radar reflectivities associated with rainfall from Eta-parameterized convection (B. Ferrier 2007, personal communication).

TABLE 2. Period of the analyzed models/radar and satellite/radar 1-h accumulated precipitation fields.

Period            Models/radar (21 days)    Satellite/radar (13 days)
2003 September    12, 13, 18, 27            18, 27
2003 October      14, 17, 25, 26, 28        14, 17, 25, 26, 28
2004 May          21, 22, 30, 31            none
2004 June         11, 24                    24
2004 July         3, 4, 5, 6                4, 5, 6
2004 August       19, 20                    19, 20

The second set of data that we include in this paper is the Geostationary Operational Environmental Satellite (GOES) data, obtained on the same spatial resolution as the radar data and at 15-min time resolution. These data are based on the GOES operational rainfall autoestimator (Vicente et al. 1998). This algorithm computes rain rates from the 10.7-μm brightness temperatures using a relation derived from more than 6000 collocated radar and satellite pixels. We do not have detailed information on the quality of satellite data, but it is certain that the algorithm introduces additional errors in comparison with radar data. Thus, we do not use the satellite data to evaluate the model results; in fact, we consider these data as model data as well. Consequently, we evaluate them both with radar data. The results will show how different the satellite data are from the radar data. This is useful for examining the applied autoestimator algorithm, and also for further understanding the effects of assimilation of satellite data in numerical models. For a consistent comparison, both the radar and satellite data need to be projected onto a common verification grid.
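The radar conversion described in this subsection, Eq. (1) followed by averaging groups of six 10-min values, can be sketched in Python as follows. This is a minimal illustration and not the authors' processing code: the function names are hypothetical, Z in Eq. (1) is taken to be linear reflectivity (so 10 dBZ corresponds to Z = 10), and setting echoes below the 10-dBZ threshold to zero rain is an assumption.

```python
import numpy as np

def dbz_to_rain_rate(dbz, min_dbz=10.0):
    """Convert reflectivity in dBZ to rain rate (mm/h) with R = (Z/300)^(2/3)."""
    dbz = np.asarray(dbz, dtype=float)
    z_linear = 10.0 ** (dbz / 10.0)           # dBZ -> linear Z (mm^6 m^-3)
    rate = (z_linear / 300.0) ** (2.0 / 3.0)  # Eq. (1)
    # Assumption: echoes below the lower threshold are treated as no rain.
    return np.where(dbz >= min_dbz, rate, 0.0)

def hourly_mean_rate(rates_10min):
    """Average consecutive groups of six 10-min rain rates along axis 0."""
    rates = np.asarray(rates_10min, dtype=float)
    n_hours = rates.shape[0] // 6             # incomplete trailing hours are dropped
    grouped = rates[: n_hours * 6].reshape((n_hours, 6) + rates.shape[1:])
    return grouped.mean(axis=1)

if __name__ == "__main__":
    # 10 dBZ maps to roughly 0.1 mm/h, matching the threshold quoted in the text.
    print(dbz_to_rain_rate([5.0, 10.0, 40.0]))
```

With this convention the 10-dBZ threshold maps to about 0.1 mm h⁻¹, which agrees with the value quoted in the text.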
c. Study cases and domain of analysis The Canadian and U.S. model data are obtained from, respectively, the CMC and the National Oceanic and Atmospheric Administration (NOAA)/National Weather Service/NCEP Mesoscale Modeling Branch. Hourly accumulated precipitation fields over 24 h for 21 days in the summer and autumn of 2003 and 2004 are used in this study. These dates are displayed in Table 2, and were chosen because of the availability of the Eta Model data. For Canadian models, the 2003 cases are from the 10-km GEM/HIMAP, while the 2004 cases are from the 15-km GEM. The model data are obtained from forecasts initialized at 1200 UTC the previous day,
FIG. 1. The domain of analysis is the boxed region over the central and eastern United States with an extent of 2160 km × 2160 km. The model, radar, and satellite precipitation are put on a common grid with a resolution of 12 km.
where the first 12 h of each forecast are not used to avoid model spinup problems associated with the initialization. Thus, we use only the 12–36-h forecasts for our study. Because these forecast hours correspond to 0000–2400 UTC of the days considered, we subsequently refer to this period as 0000–2400 UTC in our results. The radar data correspond to the same days as the models over 21 days, but the satellite data are available for only 13 of these days (Table 2). We use a 12-km grid to compare precipitation data from models, radars, and satellites over a domain in the central and eastern United States (Fig. 1). The domain is slightly reduced in extent (2160 km × 2160 km) compared with that of Germann and Zawadzki (2002) to better match the GEM/HIMAP model and radar domains, and to exclude regions that are not covered by the radar network. Using CMC software, all model forecasts (GEM/HIMAP, GEM, and Eta) on the original grid were used to extract precipitation over a common 12-km analysis grid. Hourly accumulated radar and satellite precipitation data (mm h⁻¹) on the original 5-km grid were projected onto the 12-km analysis grid by searching all 5-km grid points whose coordinates (latitude and longitude) fall in the analysis domain and assigning them to the nearest verification grid point. The rain accumulation is then appropriately averaged over the 12-km grid for further analysis.
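The projection of the 5-km data onto the 12-km analysis grid described above (assign each source point to the nearest analysis point, then average) can be sketched as follows. This is an illustration under stated assumptions, not the CMC software: the function name is hypothetical, coordinates are assumed to be already projected to kilometres relative to the first analysis grid point (the text works with latitude and longitude), and the 180 × 180 grid size is inferred from 2160 km / 12 km.

```python
import numpy as np

def regrid_nearest_average(x_km, y_km, values, n=180, spacing=12.0):
    """Average source values over the nearest point of an n x n analysis grid.

    Analysis grid points are assumed to sit at (i * spacing, j * spacing).
    Cells that receive no source point are returned as NaN.
    """
    x = np.asarray(x_km, dtype=float).ravel()
    y = np.asarray(y_km, dtype=float).ravel()
    v = np.asarray(values, dtype=float).ravel()

    # Index of the nearest analysis grid point in each direction.
    ix = np.rint(x / spacing).astype(int)
    iy = np.rint(y / spacing).astype(int)

    # Keep only source points that fall inside the analysis domain.
    inside = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)
    flat = iy[inside] * n + ix[inside]

    total = np.bincount(flat, weights=v[inside], minlength=n * n)
    count = np.bincount(flat, minlength=n * n)

    mean = np.full(n * n, np.nan)
    filled = count > 0
    mean[filled] = total[filled] / count[filled]
    return mean.reshape(n, n)
```

Returning NaN for empty cells keeps missing coverage distinguishable from zero precipitation; whether the original processing did the same is not stated in the text.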
d. Comparison approaches We use both traditional statistical and scale decomposition methods to compare model and satellite precipitation fields with the radar-retrieved values. The
following statistics are considered: the mean diurnal cycle of the domain-averaged precipitation, the area of fractional coverage of different precipitation intensity levels relative to a threshold of 0.1 mm h⁻¹, the predictability of precipitation over a 24-h forecast lead time using conventional skill scores, and frequency distribution of binned precipitation intensity. The predictability is assessed using conventional categorical skill scores, such as the probability of detection (POD), false-alarm rate (FAR), and critical success index (CSI). These measures depend on the number of hits H, misses M, and false alarms F of the forecast (Johnson and Olsen 1998), and are computed as

$$\mathrm{POD} = \frac{H}{H+M}, \qquad \mathrm{FAR} = \frac{F}{H+F}, \qquad \text{and} \qquad \mathrm{CSI} = \frac{H}{H+M+F}. \tag{2}$$

For a perfect forecast, POD = 1, FAR = 0, and CSI = 1. These scores were used by Germann and Zawadzki (2002) and Turner et al. (2004) to assess the predictability of radar nowcasts. We use the Haar discrete wavelet transform to perform scale decomposition of the precipitation fields, as implemented by Turner et al. (2004). The latter updated the discrete cosine transform used by Germann and Zawadzki (2002) to study the predictability of precipitation from the U.S. continental radar network. Lin et al. (2005) used the same algorithm to compare NWP model forecasts and radar nowcasts of precipitation, and showed that the latter is more skillful up to a lead time of about 6 h. We extend this approach to analyze the time evolution of lagged auto- and cross correlation of the spectral amplitudes at different dyadic scales. The lagged cross-correlation coefficient at scale m and lag τ (h) is given as

$$R_{\alpha\beta}(m,\tau) = \frac{1}{T-\tau}\sum_{t=1}^{T-\tau} \frac{4^{-m}\sum_{k=1}^{3}\overline{\left[W_{\alpha}^{k}(m,x,y,t)\,W_{\beta}^{k}(m,x,y,t+\tau)\right]}}{\left[S_{\alpha\alpha}(m,t)\,S_{\beta\beta}(m,t+\tau)\right]^{1/2}}, \qquad \tau = 0, 1, \ldots, T-1. \tag{3}$$

The numerator is the lagged cospectrum at scale index m and lag τ, computed via wavelet coefficients W of two lagged precipitation fields α and β, where, for example, α denotes radar and β the GEM model. The inner sum runs over the three directional components of the two-dimensional wavelet coefficients, indexed here by k. The overbar denotes a spatial average, and there are T = 24 hourly accumulations for each day. The terms in the denominator represent the spectra of the corresponding precipitation fields (S_αα, S_ββ). In the case of α ≡ β, (3) represents the autocorrelation coefficient.
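The categorical scores of Eq. (2) and the lagged wavelet correlation of Eq. (3) can be sketched as follows. This is a hedged illustration rather than the implementation of Turner et al. (2004): all function names are hypothetical, the fields are assumed to be square arrays whose side is divisible by 2^m, a pixel is counted as rainy when it is at or above the threshold, and the spectrum S is assumed to be the zero-lag cospectrum of a field with itself, so that the 4^(-m) normalization cancels in the ratio.

```python
import numpy as np

def categorical_scores(forecast, observed, threshold=0.1):
    """Return (POD, FAR, CSI) of Eq. (2) for rain at or above a threshold."""
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    # Assumption: at least one observed and one forecast event exist,
    # otherwise the ratios are undefined.
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return float(pod), float(far), float(csi)

def haar_details(field, m):
    """Three directional Haar detail-coefficient arrays at dyadic scale m >= 1."""
    n = 2 ** (m - 1)
    ny, nx = field.shape
    # Block-average to the next-finer scale (Haar approximation up to a constant).
    approx = field.reshape(ny // n, n, nx // n, n).mean(axis=(1, 3))
    a, b = approx[0::2, 0::2], approx[0::2, 1::2]
    c, d = approx[1::2, 0::2], approx[1::2, 1::2]
    return np.stack([(a + b - c - d) / 2.0,   # difference between the two rows
                     (a - b + c - d) / 2.0,   # difference between the two columns
                     (a - b - c + d) / 2.0])  # diagonal difference

def cospectrum(wa, wb):
    """Spatial mean of the coefficient products, summed over the three directions."""
    return (wa * wb).mean(axis=(1, 2)).sum()

def lagged_wavelet_correlation(fa, fb, m, tau):
    """Eq. (3) for field sequences fa, fb of shape (T, ny, nx) at scale m and lag tau."""
    T = fa.shape[0]
    terms = []
    for t in range(T - tau):
        wa = haar_details(fa[t], m)
        wb = haar_details(fb[t + tau], m)
        # Assumption: both spectra are nonzero (no completely flat field at this scale).
        norm = np.sqrt(cospectrum(wa, wa) * cospectrum(wb, wb))
        terms.append(cospectrum(wa, wb) / norm)
    return float(np.mean(terms))
```

Passing the same sequence as both `fa` and `fb` gives the autocorrelation case α ≡ β, which equals 1 at zero lag by construction.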
e. Precipitation estimation from models, radar, and satellite Precipitation estimation from model forecast and observational systems is a difficult problem. There are many issues that contribute to the quality of the estimated fields. We emphasize some of these issues here. The formation of clouds and the development of precipitation are dictated by complex phenomena. Physical processes generating precipitation occur at scales that usually cannot be treated explicitly by mesoscale NWP models at current grid resolution. Thus, model precipitation is obtained as a result of parameterizations based on environmental conditions on scales much larger than the scales of real precipitation formation. These processes are represented by different cumulus parameterization and microphysics schemes (Table 1). The tuning of parameters such as “cloud efficiency” and “trigger function” is needed in these schemes to generate reasonable precipitation intensity. Other processes include turbulent transport in the planetary boundary layer and the overall effect of turbulence on the evolution of droplet spectra. Model initialization, and lateral and bottom boundary conditions are other factors that affect model accuracy. Finally, the choice of finite-difference numerical schemes with filters that effectively control implicit and explicit numerical diffusion contributes to accuracy of the model simulation (Skamarock 2004). We now turn to radar-estimated precipitation fields. The radar detects precipitation but measures reflectivity at some height from the ground. Transformation of the radar-measured reflectivity into a rainfall rate poses a number of difficulties. First, the accuracy of the measured reflectivity values can be affected either by the influence from fixed targets (e.g., ground clutter, beam blocking) or by the intermittent influence of anomalous propagation. Second, the increase in elevation of the beam with distance from the radar leads to errors associated with uncertainties in the shape of the vertical reflectivity profile between the beam height and the surface; precipitation is often undetected, or underestimated, as distance from the radar increases (Hunter 1996). Third, brightband effects can lead to overestimation of rainfall rates. The Z–R relationship between
reflectivity Z and rainfall rate R cannot apply equally well for different radars at different times and locations (Knievel et al. 2004). The Z–R relationship depends on the drop size distribution and is also affected by beam spreading because a uniform spatial distribution of liquid drops is usually assumed (Hunter 1996). Precipitation estimation from satellite radiance is an interesting approach; however, it is an additional step removed from radar estimation. While radar detects precipitation and measures its reflectivity, satellite observes clouds from the top and measures radiances on a wide range of scales. Moreover, the algorithms used to convert radiances into rainfall rates are less robust than those for radars. The functional relation between the 10.7-μm cloud-top brightness temperature and rainfall rates is calibrated by radar data. Further, the algorithm uses several adjustments, such as multiplicative moisture adjustment using humidity data from the Eta forecasts, and adjustments for parallax and orographic effects (Vicente et al. 2002). Because the autoestimator has a tendency to incorrectly identify cirrus shields as raining clouds, a screen that uses radar reflectivity data to identify no-rain clouds has also been added. We thus believe precipitation data from satellite measurements are less accurate than those from radar measurements. For this reason, we do not use the satellite data for model evaluation, and instead they are evaluated using radar data. We acknowledge all estimates of precipitation have errors, and the purpose of this paper is not to quantify the sources of errors. Instead, the goal is to evaluate model and satellite precipitation data, using radar values as reference.
3. Results We will show results averaged over two study periods depending on data availability. The first is the 21-day period where precipitation fields are available from models and radars (Table 2). The second is for the restricted 13-day period where satellite precipitation is also available. We will refer to results based on these study periods as “21 days” and “13 days,” respectively. We will also show combined results, where the averaging is performed over 21 days for the models and radar, and over 13 days for the models, radar, and satellite. These results using the combined datasets with the two averaging periods will be referred to as “21/13 days.” We will refer to the results for 4–6 July 2004 as “4–6 July”; these are the only three consecutive days where precipitation is available from all three sources (models, radar, and satellite) in our datasets. The combination of the 15-km GEM and 10-km GEM/HIMAP forecasts, after their transformation onto the common 12-km verification grid, is hereafter termed the GEM results. Figure 2 shows a typical 1-h-accumulated precipitation field from the models, radar, and satellites, on 20 August 2004 (1900–2000 UTC). The radar and satellite fields qualitatively resemble each other more than the model results, with more small-scale structure. The resemblance between the radar and satellite results is perhaps not surprising, because the satellite retrieval algorithm is calibrated using radar precipitation. The model results tend to be smoother, with Eta precipitation maxima being generally reduced compared with GEM, which will be verified quantitatively later.
a. Statistical evaluation In this subsection, we apply conventional statistical measures to quantitatively evaluate model and satellite precipitation with respect to radar values. Figures 3a,b show the mean diurnal cycles of the domain-averaged precipitation over 21 days for models and radar, and over 13 days for models, radar, and satellite. The radar results show a minimum of precipitation at about 1700 UTC (1100 LT) over the eastern United States. There is a maximum around 2300 UTC (1800 LT). The satellite results reproduce both the minimum and maximum well, but the averaged precipitation in the 0700–2200 UTC time period is higher than that of the radar, and is considerably lower at about 0400–0700 UTC. The GEM and Eta both exhibit similar distribution in comparison with radar with a clear diurnal cycle, and slight phase shifts in the timing of the precipitation minimum. Both models have maximum precipitation at about 2300 UTC, but the magnitudes are quite different compared with that of the radar. The Eta Model also has a second maximum at about 0700 UTC. GEM overestimates the domain-averaged precipitation throughout the diurnal cycle, with almost the correct phasing. Dai et al. (1999) examined the diurnal cycle of precipitation over the continental United States using observations. Their results for June–July–August, averaged over 1963–93, show that the maximum precipitation occurs at about 0600 UTC (0000 LT) over the west side of our domain, and at about 2400 UTC (1800 LT) over the east side. We did not examine the diurnal cycle separately over the west and east sides of our domain, but the presence of a broad maximum near 1800 LT is consistent with the Dai et al. (1999) analysis. Knievel et al. (2004) examined the rainfall diurnal and semidiurnal modes over the entire United States in the WRF numerical weather prediction model and their agreement with observations from the Weather Surveillance Radar-1988 Doppler network. 
They found that the model’s peak in rainfall frequency was 1–3 h too early over
FIG. 2. A typical distribution of 1-h accumulated precipitation (mm) at 1900–2000 UTC 20 Aug 2004 from the (bottom) models (GEM and Eta) and (top) RAD and SAT.
large regions of the country. Our results also show phasing errors in the precipitation minima and maxima. Figure 3c shows the time series of domain-averaged precipitation for 4–6 July, for three consecutive days. The 12–36-h forecasts of each individual forecast initialized at 1200 UTC on 3, 4, and 5 July, respectively, were combined to produce the consecutive 72-h rainfall time series for 4, 5, and 6 July 2004. The results are noisier than those of the mean diurnal cycle, as expected. There are time periods where the models’ and satellite results are in better agreement with the radar results. In particular, during the 36–48-h period, the models and satellite are in good agreement with radar, while the models’ results for the 24–36-h period are
considerably underpredicted. During 60–72 h, both models overpredict the observed precipitation minimum; the Eta Model highly underpredicts the observed maximum at the end of the third day. The radar precipitation minimum at around 1700 UTC is present in all 3 days with a broad maximum earlier in the day, consistent with the mean diurnal cycle. The models’ diurnal maxima and minima again are somewhat shifted in comparison with the radar. The agreement among the different sources of precipitation is better in the second half of the 72-h time period compared with the first. Figures 4a and 4b show the percentage area coverage of different precipitation intensity levels over the mean
FIG. 3. The domain-averaged precipitation for models, radar, and satellite. (a), (b) The mean diurnal cycle averaged over 21 and 13 days, respectively. (c) The time series for three consecutive days, 4–6 Jul 2004.
diurnal cycle, averaged for 21 and 13 days, respectively. The top panels show the fractional coverage of precipitation exceeding 1 mm h⁻¹ relative to a threshold of 0.1 mm h⁻¹, and the middle and bottom panels show the 5 and 10 mm h⁻¹ exceedance, respectively. The timing of maximum and minimum precipitation for the three exceedance levels is consistent with Figs. 3a,b. The fractional coverage is systematically lower for satellite and models compared with radar for the 5 and 10 mm h⁻¹ levels. In other words, the models and satellite underestimate strong precipitation. We already noted from Figs. 3a,b that GEM has a higher domain-averaged precipitation compared with the radar over the mean diurnal cycle. This is consistent with the top panels in Figs. 4a,b, which show the area coverage of precipitation exceeding 1 mm h⁻¹ as being generally higher for GEM compared with radar. However, the middle and bottom panels in Figs. 4a,b show the reverse is true for higher precipitation levels. The same comments apply to the Eta Model, with even more underestimation of strong precipitation compared with the GEM. The overestimation of domain-averaged precipitation by the models together with an underestimation of strong precipitation means that model precipitation patterns tend to be too diffuse spatially. That is especially evident with the Eta Model results. We note that the satellite results have a distribution similar to that of the Eta Model, except the higher precipitation intensities are considerably closer to those of the radar and GEM. We expect the satellite results to be in closer agreement with the radar relative to the models because the satellite retrieval algorithm is calibrated by radar data. Lin et al. (2005) compared the skill of short-term precipitation forecasts based on Lagrangian advection of radar echoes (“nowcasts”) with the GEM and Eta Model forecasts using the 21 cases examined in this study. The radar nowcasts are robust and have more skill than numerical weather prediction models over time scales of several hours. This is because the models do not generally capture the initial precipitation distribution well. Over longer time scales, we expect the models to perform better than nowcasts as they resolve the large-scale flow dynamically (Austin et al. 1987; Golding 1998; Wilson et al. 1998). Lin et al.
(2005) verified this conceptual picture of the relative accuracy of radar nowcasts and model forecasts using categorical skill scores (POD, FAR, CSI). The crossover point in time, where model forecasts start to have more skill than nowcast methods, was identified as occurring at about 6 h after forecast initiation. The comparison of radar nowcasts and model forecasts was done for a forecast lead time of 9 h in that study. In this study, we focus on model forecasts and not on radar nowcasts, and we extend the skill analysis from lead times of 9 to 24 h. Figure 5a shows that model forecast skill is approximately constant over a 24-h lead time. The 21-day-averaged skill results obtained using a precipitation threshold of 0.1 mm h⁻¹, along with the sample standard deviation of the corresponding GEM skill measures, are shown for the 12-km analysis grid
FIG. 4. The percentage area coverage of different precipitation intensity levels over the mean diurnal cycle. For example, A_pr > 10 refers to the area in the study domain where the precipitation exceeds 10 mm h⁻¹. The areas for different levels (1, 5, and 10 mm h⁻¹) are normalized by the area where precipitation exceeds a threshold of 0.1 mm h⁻¹. (a) RAD, GEM, and Eta results are 21-day averaged. (b) All results are 13-day averaged.
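The normalization described in the Fig. 4 caption can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `fractional_coverage`, the strict `>` comparison, and the small sample field are assumptions made for the example.

```python
def fractional_coverage(field, level, base_threshold=0.1):
    """Fraction of the raining area (rate > base_threshold) where rate > level.

    `field` is a 2D list of hourly precipitation rates in mm/h.
    Returns None when no pixel exceeds the base threshold.
    """
    rates = [rate for row in field for rate in row]
    raining = sum(1 for rate in rates if rate > base_threshold)
    if raining == 0:
        return None
    exceeding = sum(1 for rate in rates if rate > level)
    return exceeding / raining


# Hypothetical 3 x 4 field (mm/h), for illustration only.
field = [
    [0.0, 0.2, 1.5, 6.0],
    [0.0, 0.05, 2.0, 12.0],
    [0.3, 0.0, 0.0, 0.8],
]

# Coverage at the three exceedance levels used in Fig. 4.
coverage = {level: fractional_coverage(field, level) for level in (1.0, 5.0, 10.0)}
```

Multiplying each value by 100 gives the percentage coverage plotted in the figure; averaging over days at each hour of the day would then give the mean diurnal cycle.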
and after wavelet low-pass filtering with a cutoff scale of 96 km. The results with the filtering (Fig. 5b) will be discussed later. The envelope delineated by ±1s_GEM (standard deviation) about the mean value represents an estimate of the spread of the GEM skill scores, and is taken to represent the uncertainty in the forecast skill. The fact that the Eta Model results fall within this envelope suggests that both models have similar skill. We note that the Eta Model is more skillful than GEM in the POD score, but the reverse is true for FAR. This suggests that the Eta Model forecasts rain in more pixels than GEM, thus giving more hits and a higher POD score. However, this also produces more false alarms and hence a worse FAR. This interpretation is consistent with Fig. 4a (top panel), which shows that the Eta Model has more spatial coverage than GEM for rain rates exceeding 1 mm h⁻¹, but that the opposite holds for higher rain rates (middle and
bottom panels). We have also computed skill scores for two more precipitation thresholds (0.5 and 1.0 mm h⁻¹), and found generally the same trend except with decreasing skill.

We now turn to an analysis of precipitation in the frequency domain. For models and radar, our datasets consist of 21 days, each with 24 hourly accumulated precipitation fields, giving 21 × 24 = 504 hourly samples. The sample size is smaller (13 × 24 = 312 h) for the satellite dataset. The domain extent is 2160 km × 2160 km at 12-km resolution, yielding a total of 180 × 180 = 32 400 pixels. We bin the hourly precipitation of each pixel in intervals of width δ = 0.3 mm h⁻¹. Figures 6a,b show the histograms so obtained for the radar and models, and for the radar, models, and satellite for the reduced dataset. Note that a straight line in the semilog plot indicates a power-law dependence of the frequency, with the exponent given by the slope. The results show
FIG. 5. The 21-day averaged model skill scores (POD, FAR, and CSI) over a 24-h forecast lead time. The threshold for hourly precipitation is 0.1 mm. The thin solid lines are the ±1s_GEM for GEM, taken over all forecast cases. (a) The results on the original model grid with a resolution of 12 km. (b) The results obtained after wavelet low-pass filtering with a cutoff scale of 96 km.
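For readers less familiar with the categorical scores shown in Fig. 5, the sketch below computes POD, FAR, and CSI for one forecast–observation pair. It assumes the standard contingency-table definitions (hits, misses, false alarms); the function name, the strict `>` comparison at the threshold, and the sample fields are illustrative assumptions rather than the authors' implementation.

```python
def categorical_scores(forecast, observed, threshold=0.1):
    """POD, FAR, and CSI for two 2D fields of hourly precipitation (mm)."""
    hits = misses = false_alarms = 0
    for f_row, o_row in zip(forecast, observed):
        for f, o in zip(f_row, o_row):
            f_yes, o_yes = f > threshold, o > threshold
            if f_yes and o_yes:
                hits += 1
            elif o_yes:
                misses += 1
            elif f_yes:
                false_alarms += 1

    def ratio(num, den):
        # Undefined scores (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "POD": ratio(hits, hits + misses),             # probability of detection
        "FAR": ratio(false_alarms, hits + false_alarms),  # false alarm ratio
        "CSI": ratio(hits, hits + misses + false_alarms),  # critical success index
    }


# Hypothetical 2 x 3 fields (mm in 1 h), for illustration only.
observed = [[0.0, 0.5, 2.0], [0.0, 0.0, 1.0]]
forecast = [[0.3, 0.4, 0.0], [0.0, 0.2, 3.0]]
scores = categorical_scores(forecast, observed)
```

In this example a forecast that rains in more pixels would add hits and false alarms together, which is the POD versus FAR trade-off discussed for the Eta Model and GEM.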
a good agreement between the radar and GEM, as well as between the radar and satellite, with the GEM and satellite values falling well within the envelope of ±1 standard deviation about the radar. The comparison with the Eta Model is much less favorable, with most of the Eta Model values falling outside the envelope. Only Eta Model precipitation of approximately 3–5 mm h⁻¹ falls within the envelope. The Eta Model shows a near-power-law dependence over almost the entire precipitation range, as seen from the straight line slope. For the radar, satellite, and GEM, the power-law dependence is found at intermediate and high precipitation values, with an exponent that is quite different from that of the Eta Model.

We showed in Fig. 3c the average precipitation over the consecutive 3-day period of 4–6 July 2004, where the radar, satellite, and model values are all available. Figure 7 shows a quartile analysis of this 3-day period, where the first (highest 25%), second (median), and third (lowest 25%) quartile values are identified. The
lines are tightly bunched together for the third quartile, with the spread being progressively larger for the median and first quartile. This indicates that the models are better at reproducing low precipitation values. We saw in Fig. 6 that the GEM histograms compare well with the radar, but the quartile analysis for 4–6 July 2004 shows that GEM does not reproduce intense precipitation well for these specific days. The results are even less favorable for the Eta Model.
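The binning and quartile steps used in this subsection can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names are hypothetical, the sample rates are invented, and the quartiles use the default interpolation of Python's `statistics.quantiles`, which may differ from the procedure behind Fig. 7. Following the text, the "first" quartile is taken to be the upper one.

```python
import statistics


def bin_counts(rates, width=0.3):
    """Histogram of rates in bins [k*width, (k+1)*width), keyed by bin index k."""
    counts = {}
    for rate in rates:
        k = int(rate // width)
        counts[k] = counts.get(k, 0) + 1
    return counts


def quartiles_upper_first(rates):
    """Return (first, second, third) quartiles, where 'first' is the upper quartile."""
    lower, median, upper = statistics.quantiles(rates, n=4)
    return upper, median, lower


# Hypothetical hourly rates (mm/h) pooled over pixels, for illustration only.
rates = [0.1, 0.2, 0.4, 0.5, 0.7, 1.0, 2.0, 4.0]
counts = bin_counts(rates)
first, second, third = quartiles_upper_first(rates)
```

Dividing each count by the total number of pixels would give the relative frequencies of the kind plotted in Fig. 6.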
FIG. 6. Frequency distributions of the observed and the model-predicted precipitation. (a) The radar and model results are based on 504 hourly precipitation samples. (b) All results are based on 312 hourly precipitation samples. 1s_RAD denotes 1 std dev for the radar. The numbers in the top right corners indicate the number of pixels with rain above the threshold of 0.3 mm h⁻¹, out of a total of 32 400 pixels.

b. Scale decomposition analysis

For the scale decomposition analysis, we use the Haar wavelet algorithm of Turner et al. (2004). Figure 8a shows the average Haar power spectra of the model, radar, and satellite precipitation fields as a function of spatial scale. There is a decrease in power with decreasing spatial scale for all sources of precipitation, but the rate of this decrease can be quite different for the model and observed spectra, as seen from the spectral slopes at high wavenumbers. The results for the satellite precipitation show reduced power compared with radar by almost the same amount at all scales. Because the spectra are given in dimensional form, this offset results from the use of different sample sizes for the satellite (13 days) and radar (21 days). We have verified that using the same 13 days for both radar and satellite removes the offset. This is confirmed later in the results shown in Fig. 9, where we examine the spectral amplitudes for radar and satellite over these 13 days. GEM shows slightly more power than the radar at scales larger than about 100 km, while its power at small scales decreases considerably faster than that of the radar. The Eta Model shows less power than radar at all scales, with a loss of power at small scales being even
more pronounced than that for GEM. This is consistent with our earlier conclusion that Eta Model precipitation tends to be too diffuse compared with that of the radar. This problem with the Eta Model precipitation forecast is also reported by Baldwin and Wandishin (2002) in their comparative study of several WRF and Eta Model forecasts, at different grid spacing with different convective schemes. These authors analyzed a 3–6-h forecast of a single case initialized at 1200 UTC 4 June 2002. The operational Eta Model precipitation field at 12-km resolution exhibits much less spatial variability than the other models, and in particular the observed field, resulting from significant smoothing. They also reported unexpected findings that an experimental
FIG. 7. A quartile analysis of the observed and model forecast precipitation for three consecutive days (4–6 Jul 2004). (left) First (highest 25%) and third (lowest 25%) quartiles and (right) the second (median) quartile.
FIG. 8. (a) The average Haar power spectra of the model (GEM and Eta), RAD and SAT precipitation fields, obtained by averaging the 504- (GEM, Eta, and RAD) or 312-sample spectra (SAT) of 1-h accumulated precipitation fields. The arrows show estimates of the effective model resolution for precipitation; see text for discussion. (b) The effect of smoothing on the model and radar precipitation fields, using low-pass filtering on a number of dyadic scales. The index m = 1, 3, and 5 refers to the original field, and cutoff scales of 8Δx = 96 km and 32Δx = 384 km, respectively. The abscissa is the nondimensional wavenumber (kΔx) and the corresponding dimensional length scale (km) is also shown.
version of the Eta Model running at the National Severe Storms Laboratory at 22-km grid spacing with the Kain–Fritsch convective scheme (denoted by KF in their paper) performed much better than the operational Eta Model on a finer 12-km grid with the Betts–Miller–Janjić convective scheme. As seen in Table 1, we also use data from the operational Eta Model with the Betts–Miller–Janjić convective scheme, while GEM uses the Kain–Fritsch convective scheme. In both Baldwin and Wandishin (2002) and our precipitation spectra, an enhanced drop in power of Eta Model spectra is evident across almost all spatial scales. A similar problem with the NMM model spectra of kinetic energy is reported by Skamarock and Dempsey (2005). These authors introduced several corrections in the original NMM second-order numerical schemes and succeeded in eliminating anomalies in the kinetic energy spectra. The improved spectra are consistent with the results in Skamarock (2004). We thus conclude it is likely that both the convective and numerical schemes (with filters that are less scale selective) in the Eta Model are responsible for the differences between the Eta Model and GEM.

Model spectra can help to determine the resolved
and unresolved scales. The point where the model’s spectrum begins to decay faster than the radar spectrum can be taken as the separation between the unresolved and resolved scales, that is, determining the effective resolution of the model. Referring to Fig. 8a, we use the length scale determined by the intersection of the GEM and radar spectra as a measure of the effective model resolution. This criterion does not work for the Eta Model, because the Eta Model spectrum is offset from the radar spectrum at all scales. We thus use the intersection of the Eta Model spectrum with the lower bound of one standard deviation from the radar spectrum. The effective resolution so obtained is approximately 8Δx = 96 km for GEM and 11Δx = 132 km for Eta. A second criterion for the Eta Model is based on the scale where the spectrum starts to decay faster relative to the radar, giving a second estimate of the Eta Model effective resolution as approximately 32Δx = 384 km. All of these estimates are indicated by arrows in Fig. 8a. Other studies have determined the effective resolution to be 5–10 times the model grid size (Harris et al. 2001; Baldwin and Wandishin 2002; Skamarock 2004; Bousquet et al. 2006). Studying NWP model kinetic energy spectra, Skamarock (2004) found that the
FIG. 9. Scatterplot of the Haar spectral power at different scales for models (GEM and Eta) vs RAD and SAT vs RAD. The results represent a daily (24-h) average of 1-h accumulated precipitation fields for each of 21 days for (top) GEM–RAD and (middle) Eta–RAD, and 13 days for (bottom) SAT–RAD.
WRF model forecasts at 22-, 10-, and 4-km grid spacing show an effective resolution of around 7Δx. This is similar to the Bousquet et al. (2006) result of an effective resolution of 6Δx for the GEM/HIMAP model with 10-km grid spacing.

Figure 8b shows the filtered spectra, through the application of a Haar wavelet smoothing function on a number of dyadic scales. We see that low-pass filtering does reduce the difference between the model and radar spectra from the original unfiltered results. With smoothing at the scale of 8Δx = 96 km (m = 3), the
spectra of the model fields agree much better with observed spectra than the unfiltered original results (m = 1); there is, of course, reduced power at the small scales and a steeper spectral slope. Further filtering (m = 5) only increases the spectral slope, leading to near-identical model and radar slopes. Thus, scales smaller than m = 3 may be treated as noise, which accounts for the discrepancy between the model and radar precipitation. The effective model resolution for precipitation, being many times the model grid size, means the smaller scales are essentially unpredictable. It is thus reasonable to filter these scales in both the model and radar spectra in our evaluation.

We return to Fig. 5, which shows the model skill over a 24-h forecast period at both the original grid resolution of Δx = 12 km (Fig. 5a) and after wavelet low-pass filtering with a cutoff scale of 96 km (Fig. 5b). The latter choice corresponds to 8Δx, consistent with our earlier results. We see that filtering does increase the model skill, resulting in higher values of POD and CSI and a lower FAR. Bousquet et al. (2006) also showed, using a much smaller sample, that smoothing leads to an improvement in model skill. The filtering of small scales makes the precipitation fields smoother and reduces the variance, leading to a better correlation with observations. Another way to improve the correlation is through phase shifting to reduce phase errors. Bousquet et al. (2006) showed that such phase shifting, followed by smoothing to remove small-scale noise, reduced the mean square error by as much as 30%.

Figure 9 shows scatterplots of the daily (24 h) averaged GEM, Eta Model, and satellite spectral power versus radar values at different scales. In this analysis, we have 21 days of data for model–radar comparison (top and middle panels) and only 13 days for satellite–radar comparison (bottom panel).
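The low-pass filtering applied for Figs. 5b and 8b can be illustrated with a simple block-averaging sketch, which is the Haar scaling-function approximation at one dyadic scale. This is not the algorithm of Turner et al. (2004): the function name is hypothetical, and it is an assumption that a cutoff of 8Δx corresponds to averaging over tiles of 8 pixels per side. The sketch also requires the grid size to be a multiple of the tile size, so the 180 × 180 analysis grid would need padding or cropping, which is not addressed here.

```python
def haar_lowpass(field, block):
    """Replace every block x block tile of a 2D field by the tile mean."""
    ny, nx = len(field), len(field[0])
    if ny % block or nx % block:
        raise ValueError("grid size must be a multiple of the block size")
    out = [[0.0] * nx for _ in range(ny)]
    for j0 in range(0, ny, block):
        for i0 in range(0, nx, block):
            tile = [field[j][i]
                    for j in range(j0, j0 + block)
                    for i in range(i0, i0 + block)]
            mean = sum(tile) / len(tile)
            for j in range(j0, j0 + block):
                for i in range(i0, i0 + block):
                    out[j][i] = mean
    return out


# Hypothetical 4 x 4 field (mm/h) smoothed with 2 x 2 tiles, for illustration only.
field = [
    [0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 8.0],
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
]
smoothed = haar_lowpass(field, 2)
```

The domain total is preserved while isolated peaks are spread over their tile, which is why small displacement errors are penalized less after filtering.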
There is a decrease of power with decreasing scales, as seen from the earlier results. The satellite–radar comparison shows better agreement than the model–radar comparison, which is probably a result of the satellite retrieval algorithm being calibrated using radar precipitation. The good agreement between the radar and satellite values is nonetheless encouraging. The Eta Model shows a much larger underestimation of power at small scales as compared with the GEM. These results could guide model developers in identifying model deficiencies at different scales. Precipitation fields evolve in space and time, and both aspects need to be considered in precipitation verification. We now examine the Haar time-lagged auto- and cross correlation of the spectral amplitudes at
FIG. 10. The 21-day Haar averaged lag correlation (RAD–RAD and GEM–GEM) and cross correlation (RAD–GEM and RAD–Eta) over a 24-h time period at different length scales, ranging from λ/Δx = 8 to 128 (lag = 0, 1, 2, ..., 23 h). The ranked Spearman correlation results are shown as the full thin lines. The horizontal dashed line at e⁻¹ = 0.368 is used as a measure of the decorrelation time. The top and middle panels use only the fluctuating component, while the bottom panels include both fluctuating and mean components at each scale. See text for further discussion.
different scales, comparing radar–radar (RAD–RAD), GEM–GEM, RAD–GEM and RAD–Eta. The time lag ranges from 0 to 23 h. Figure 10 shows results for different values of the nondimensional wavelength (λ/Δx) ranging from 8 to 128. The ranked Spearman correlation is also evaluated. The top and middle panels show the results when only the fluctuating component (i.e., with the mean contribution removed) is included in the correlation analysis. For computation of these coefficients we used (3). The bottom panels show results with
the entire field (mean and fluctuating components) included. The autocorrelation analysis gives a measure of the decorrelation time as a function of scale, while the cross correlation gives an estimate of the lead time over which the forecast has skill. As in Germann and Zawadzki (2002), the threshold of e⁻¹ = 0.368 is used as an indicator of the predictability time scale. The RAD–RAD results show a decorrelation time of about 9–10 h for the largest scale (λ/Δx = 128), which decreases to less than 2 h for the smallest scale shown (λ/Δx = 8).
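The e⁻¹ criterion above can be made concrete with a short sketch that locates where a lag-correlation curve first drops to the threshold. The function name, the linear interpolation between hourly lags, and the synthetic exponential curve are assumptions for illustration; the correlation values themselves would come from the scale-dependent analysis described in the text.

```python
import math


def decorrelation_time(correlations, threshold=math.exp(-1)):
    """Lag (h) at which the correlation first drops to the threshold.

    `correlations[k]` is the correlation at a lag of k hours. The crossing is
    located by linear interpolation between neighbouring lags. Returns None if
    the correlation stays above the threshold at every lag supplied.
    """
    for lag, value in enumerate(correlations):
        if value <= threshold:
            if lag == 0:
                return 0.0
            previous = correlations[lag - 1]
            return (lag - 1) + (previous - threshold) / (previous - value)
    return None


# Hypothetical correlation curve decaying as exp(-lag / 5 h), lags 0 to 23 h.
curve = [math.exp(-lag / 5) for lag in range(24)]
time_scale = decorrelation_time(curve)
```

Applying the same function to a cross-correlation curve (for example RAD–GEM) would give the lead time over which the forecast retains skill at that scale.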
The behavior of GEM–GEM and RAD–RAD is quite comparable, with a slightly longer decorrelation time for GEM–GEM at the larger scales, most likely resulting from smoother model results. The Eta–Eta results (not shown) are similar to those of GEM–GEM. The RAD–GEM and RAD–Eta lagged cross-correlation coefficients (middle panels) show that there is little predictability beyond about 5 h for scales smaller than 64Δx = 768 km. However, when both the mean and fluctuating fields are used in computing cross-correlation coefficients (bottom panels), the results are more favorable, with a model predictability of about 5–6 h being extended to scales of 32Δx = 384 km. In both cases, smaller scales are not predictable and thus should be excluded in model skill evaluation. We also included the ranked Spearman correlation and cross-correlation coefficients (full thin lines) for precipitation fields taken as a whole without scale decomposition. These results show that the model forecasts have skill up to a lead time of approximately 5–6 h.

In this section, we use “predictability” to mean temporal correlation between model and radar precipitation fields. This correlation is in general sensitive to the relative position of precipitation patterns. Errors in the latter affect the small scales most. A simple phase shift can improve correlation as a postprocessing correction (e.g., Bousquet et al. 2006).

We now turn to the spatial correlation of precipitation accumulation over different time periods (1, 3, 6, 12, and 24 h). We would expect longer accumulation to be more predictable. Figure 11 shows the RAD–GEM, RAD–Eta, and RAD–satellite (SAT) cospectra at zero lag as a function of spatial scales, for different accumulation periods. The RAD–GEM and RAD–Eta cospectra of the 1-h accumulated precipitation fields in Fig. 11 have, at each scale, corresponding values in Fig. 10 at zero lag (lag = 0). Thus, for 1-h accumulation, cospectra (Fig. 11) are a special case of the lagged cospectra (Fig. 10).
Spatial correlation is enhanced for 24-h accumulation compared with 1-h accumulation, as seen from the corresponding spatial decorrelation scales of ~26Δx = 312 km and ~48Δx = 576 km. The agreement between satellite and radar is better than between models and radar for a larger range of spatial scales.
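A toy example may help show why longer accumulations tend to correlate better. The sketch below sums hourly fields over a chosen period and computes a plain Pearson correlation between the accumulated fields. It is a simplification: no scale decomposition is performed, so it does not reproduce the cospectra of Fig. 11, and the function names and the tiny fields are hypothetical.

```python
import math


def accumulate(hourly_fields, hours):
    """Sum consecutive, non-overlapping groups of `hours` hourly 2D fields."""
    out = []
    for start in range(0, len(hourly_fields) - hours + 1, hours):
        group = hourly_fields[start:start + hours]
        ny, nx = len(group[0]), len(group[0][0])
        out.append([[sum(f[j][i] for f in group) for i in range(nx)]
                    for j in range(ny)])
    return out


def pearson(a, b):
    """Pearson correlation between two 2D fields of equal shape."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical case: the model has the right rain but in the wrong hour.
radar = [[[1.0, 0.0, 0.0, 0.0]], [[0.0, 1.0, 0.0, 0.0]]]
model = [[[0.0, 1.0, 0.0, 0.0]], [[1.0, 0.0, 0.0, 0.0]]]

r_1h = pearson(radar[0], model[0])
r_2h = pearson(accumulate(radar, 2)[0], accumulate(model, 2)[0])
```

Here the hourly fields are anticorrelated because of the timing error, while the 2-h accumulations agree exactly, since accumulation absorbs errors shorter than the accumulation period.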
4. Conclusions

Precipitation from two operational numerical weather prediction models (GEM and Eta) has been compared quantitatively with observed precipitation from radars and satellites. Hourly accumulated precipitation fields from the models and radar from 21 days over the summer and the autumn of 2003–04 were examined over a domain covering the eastern and central
FIG. 11. The RAD–GEM, RAD–Eta, and RAD–SAT cospectra of the 1-, 3-, 6-, 12-, and 24-h accumulated precipitation fields as a function of length scale.
United States, at a grid spacing of 12 km. For satellite data, we only used 13 days because of data availability. The sample size is thus 504 and 312 1-h accumulated precipitation fields for the models/radar and satellite, respectively. Both conventional statistical skill measures and wavelet-based scale decomposition methods were used as diagnostic methods. We take the radar precipitation as a reference, and evaluate the model and satellite precipitation in comparison.

We first summarize results using conventional skill measures, including categorical scores such as POD, FAR, and CSI. The observed mean diurnal cycle of the domain-averaged precipitation is reasonably well reproduced by both forecast models, with phase shifts in the timing of the minimum; these results agree in general with Dai et al. (1999). The satellite results are generally close to those of the radar, partially because of the calibration of the satellite retrieval algorithm using radars (Vicente et al. 1998). The model–radar difference is larger than satellite–radar, especially for the Eta Model. The latter underestimates strong precipitation compared with GEM and the radar, and its precipitation patterns tend to be too diffuse spatially.

Consistent with these results is the averaged model skill score over a 24-h forecast lead time. For a precipitation threshold of 0.1 mm h⁻¹, the Eta Model shows more skill than GEM for POD scores with the reverse being true for FAR, a consequence of smoothing in the Eta Model leading to more diffuse precipitation. Consequently, CSI is about the same for both models. Analysis of the binned precipitation in the frequency domain shows a good agreement of GEM results with radar and satellite at all intensities, with much less favorable results for the Eta Model; most of the Eta Model values fall outside the envelope of ±1 standard deviation about the radar mean.

We now turn to the Haar wavelet-based scale decomposition results. The average spectra show an enhanced falloff of power in the models compared with radar at small scales, with an effective resolution of about 8 times the grid spacing for GEM, and larger for Eta. Analysis of the lagged cospectra shows that the models have little predictability at small scales. Scale decomposition is thus a powerful diagnostic tool to characterize spatial and temporal features of precipitation, and to assess the predictability of models as a function of both time and space scales.

Acknowledgments. This work forms part of the Canadian Weather Research Program. The authors gratefully acknowledge the help of Dr. Eric Rogers of the U.S. NCEP, who made the 12-km Eta Model hourly accumulated precipitation available.
We thank Dr. Brad Ferrier for providing us with some specific details of his cloud microphysics scheme. We thank Dr. Robert J. Kuligowski from NOAA/NESDIS/Office of Research and Applications, for providing us with the satellite data, and Dr. Yingxin Gu from SAIC/USGS Earth Resources Observation and Science, who converted these data to a suitable format. The precipitation data from the GEM model were obtained from the CMC archives. The U.S. composite radar data were provided by WDT. We acknowledge useful discussions with Dr. Barry Turner on the discrete wavelet transform. Finally, the authors are especially grateful to the anonymous reviewers, whose comments and suggestions on an initial
version of the manuscript have helped us to make considerable improvements. Financial support for this study is from the CRTI-02-0041RD project on “Real time determination of influence of CBRN release” of Health Canada, and the project on “Improving quantitative precipitation forecasts of extreme weather,” funded by the Canadian Foundation for Climate and Atmospheric Sciences (CFCAS).

REFERENCES

Abdella, K., and N. McFarlane, 1997: A new second-order turbulence closure scheme for the planetary boundary layer. J. Atmos. Sci., 54, 1850–1867.
Austin, G. L., A. Bellon, P. Dionne, and M. Roch, 1987: On the interaction between radar and satellite image nowcasting systems and mesoscale numerical models. Mesoscale Analysis and Forecasting, European Space Agency Special Publication ESA-SP 282, 225–228.
Baldwin, M. E., and M. S. Wandishin, 2002: Determining the resolved spatial scales of Eta model precipitation forecasts. Preprints, 19th Conf. on Weather Analysis and Forecasting/15th Conf. on Numerical Weather Prediction, San Antonio, TX, Amer. Meteor. Soc., CD-ROM, 3.2.
Bélair, S., A. Méthot, J. Mailhot, B. Bilodeau, A. Patoine, G. Pellerin, and J. Côté, 2000: Operational implementation of the Fritsch–Chappell convective scheme in the 24-km Canadian regional model. Wea. Forecasting, 15, 257–274.
Bousquet, O., C. A. Lin, and I. Zawadzki, 2006: Analysis of scale dependence of quantitative precipitation forecast verification: A case study over the Mackenzie River Basin. Quart. J. Roy. Meteor. Soc., 132, 2107–2125.
Canuto, V. M., F. Minotti, C. Ronchi, R. M. Ypma, and O. Zeman, 1994: Second-order closure PBL model with new third-order moments: Comparison with LES data. J. Atmos. Sci., 51, 1605–1618.
Côté, J., S. Gravel, A. Méthot, A. Patoine, M. Roch, and A. Staniforth, 1998: The operational CMC-MRB Global Environmental Multiscale (GEM) model. Part I: Design considerations and formulation. Mon. Wea. Rev., 126, 1373–1395.
Dai, A., F. Giorgi, and K. E. Trenberth, 1999: Observed and model-simulated diurnal cycles of precipitation over the contiguous United States. J. Geophys. Res., 104, 6377–6402.
Deardorff, J. W., 1972: Theoretical expression for the countergradient vertical heat flux. J. Geophys. Res., 77, 5900–5904.
Ebert, E. E., and J. L. McBride, 2000: Verification of precipitation in weather systems: Determination of systematic errors. J. Hydrol., 239, 179–202.
Ferrier, B. S., 1994: A double-moment multiple-phase four-class bulk ice scheme. Part I: Description. J. Atmos. Sci., 51, 249–280.
Germann, U., and I. Zawadzki, 2002: Scale-dependence of the predictability of precipitation from continental radar images. Part I: Description of the methodology. Mon. Wea. Rev., 130, 2859–2873.
Golding, B. W., 1998: Nimrod: A system of generating automated very short range forecasts. Meteor. Appl., 5, 1–16.
Harris, D., E. Foufoula-Georgiou, K. K. Droegemeier, and J. J. Levit, 2001: Multiscale statistical properties of a high-resolution precipitation forecast. J. Hydrometeor., 2, 406–418.
Holtslag, A. A. M., and C.-H. Moeng, 1991: Eddy diffusivity and countergradient transport in the convective atmospheric boundary layer. J. Atmos. Sci., 48, 1690–1698.
Hunter, S. M., 1996: WSR-88D radar rainfall estimation: Capabilities, limitations, and potential improvements. Natl. Wea. Dig., 20, 26–38.
Janjić, Z. I., 1994: The step-mountain Eta coordinate model: Further developments of the convection, viscous sublayer, and turbulence closure schemes. Mon. Wea. Rev., 122, 927–945.
Janjić, Z. I., 2003: A nonhydrostatic model based on a new approach. Meteor. Atmos. Phys., 82, 271–285.
Johnson, L. E., and B. G. Olsen, 1998: Assessment of quantitative precipitation forecasts. Wea. Forecasting, 13, 75–83.
Knievel, J. C., D. A. Ahijevych, and K. W. Manning, 2004: Using temporal modes of rainfall to evaluate the performance of a numerical weather prediction model. Mon. Wea. Rev., 132, 2995–3009.
Lin, C., S. Vasić, A. Kilambi, B. Turner, and I. Zawadzki, 2005: Precipitation forecast skill of numerical weather prediction models and radar nowcasts. Geophys. Res. Lett., 32, L14801, doi:10.1029/2005GL023451.
Mailhot, J., and Coauthors, 2006: The 15-km version of the Canadian regional forecast system. Atmos.–Ocean, 44, 133–149.
Mellor, G. L., and T. Yamada, 1982: Development of a turbulence closure model for geophysical fluid problems. Rev. Geophys. Space Phys., 20, 851–875.
Murphy, A. H., and R. L. Winkler, 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330–1338.
Murphy, A. H., and E. S. Epstein, 1989: Skill scores and correlation coefficients in model verification. Mon. Wea. Rev., 117, 572–582.
Roebber, P. J., and L. F. Bosart, 1996: The complex relationship between forecast skill and forecast value: A real-world analysis. Wea. Forecasting, 11, 544–559.
Skamarock, W. C., 2004: Evaluating mesoscale NWP models using kinetic energy spectra. Mon. Wea. Rev., 132, 3019–3032.
Skamarock, W. C., and D. Dempsey, 2005: High-resolution winter-season NWP: Preliminary evaluation of the WRF ARW and NMM models in the DWFE forecast experiment. Preprints, 21st Conf. on Weather Analysis and Forecasting/17th Conf. on Numerical Weather Prediction, Washington, DC, Amer. Meteor. Soc., CD-ROM, 16A.3.
Sundqvist, H., E. Berge, and J. E. Kristjánsson, 1989: Condensation and cloud parameterization studies with a mesoscale numerical weather prediction model. Mon. Wea. Rev., 117, 1641–1657.
Troen, I., and L. Mahrt, 1986: A simple model of the atmospheric boundary layer; sensitivity to surface evaporation. Bound.-Layer Meteor., 37, 129–148.
Turner, B. J., I. Zawadzki, and U. Germann, 2004: Predictability of precipitation from continental radar images. Part III: Operational nowcasting implementation (MAPLE). J. Appl. Meteor., 43, 231–248.
Vasić, S., C. A. Lin, and Y. Delage, 2001: Assessment of heat flux turbulence models for flows dominated by buoyancy effects. Atmos.–Ocean, 39, 471–484.
Vicente, G. A., R. A. Scofield, and W. P. Menzel, 1998: The operational GOES infrared rainfall estimation technique. Bull. Amer. Meteor. Soc., 79, 1883–1898.
Vicente, G. A., J. C. Davenport, and R. A. Scofield, 2002: The role of orographic and parallax corrections on real time high resolution satellite rainfall rate distribution. Int. J. Remote Sens., 23, 221–230.
Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.
Wilson, J. W., N. A. Crook, C. K. Mueller, J. Sun, and M. Dixon, 1998: Nowcasting thunderstorms: A status report. Bull. Amer. Meteor. Soc., 79, 2079–2099.
Zeman, O., and J. L. Lumley, 1976: Modeling buoyancy driven mixed layers. J. Atmos. Sci., 33, 1974–1988.
Zilitinkevich, S., V. M. Gryanik, V. N. Lykossov, and D. V. Mironov, 1999: Third-order transport and nonlocal turbulence closures for convective boundary layers. J. Atmos. Sci., 56, 3463–3477.