DECEMBER 2010
TORN
4375
Performance of a Mesoscale Ensemble Kalman Filter (EnKF) during the NOAA High-Resolution Hurricane Test
RYAN D. TORN
Department of Atmospheric and Environmental Sciences, University at Albany, State University of New York, Albany, New York
(Manuscript received 27 January 2010, in final form 3 August 2010)
ABSTRACT
An ensemble Kalman filter (EnKF) combined with the Advanced Research Weather Research and Forecasting model (ARW-WRF; hereafter WRF) on a 36-km Atlantic basin domain is cycled over six different time periods that include the 10 tropical cyclones (TCs) selected for the NOAA High-Resolution Hurricane (HRH) test. The analysis ensemble is generated every 6 h by assimilating conventional in situ observations, synoptic dropsondes, and TC advisory position and minimum sea level pressure (SLP) data. On average, observation assimilation leads to smaller TC position errors in the analysis compared to the 6-h forecast; however, the same is true for TC minimum SLP only for tropical depressions and storms. Over the 69 HRH initialization times, TC track forecasts from a single member of the WRF EnKF ensemble have 12 h less skill compared to other operational models; the increased track error partially results from the WRF EnKF analysis having a stronger Atlantic subtropical ridge. For nonmajor TCs, the WRF EnKF forecast has lower TC minimum SLP and maximum wind speed errors compared to some operational models, particularly the GFDL model, while category-3, -4, and -5 TCs are characterized by large biases due to horizontal resolution. WRF forecasts initialized from an EnKF analysis have similar or smaller TC track, intensity, and 34-kt wind radii errors relative to those initialized from two other operational analyses, which suggests that EnKF assimilation produces the best TC forecasts for this domain. Both TC track and intensity forecasts are deficient in ensemble variance, which is at least partially due to the lack of error growth in dynamical fields and model biases.
1. Introduction
Tropical cyclone (TC) track and intensity forecasts from numerical weather prediction (NWP) models are limited by initial condition errors associated with the TC structure and environment. Over the past 15 yr, enhanced analyses of the TC environment are credited with the steady decrease in TC track forecast error; however, there has been minimal change in TC intensity forecast error over the same period (e.g., Rogers et al. 2006). Although the lack of improvement is at least partially related to model errors, including grid resolution, initial condition errors associated with the TC structure may also contribute to this finding. Tropical cyclones are characterized by large gradients in the wind and mass field; thus, it can be difficult to derive error statistics for a data assimilation system that
Corresponding author address: Ryan Torn, University at Albany, State University of New York, ES 351, 1400 Washington Ave., Albany, NY 12222. E-mail:
[email protected] DOI: 10.1175/2010MWR3361.1 © 2010 American Meteorological Society
will project observations onto model grid points in both the TC core and environment in an appropriate manner. As a consequence, many modeling systems simply relocate the TC to the observed position (e.g., Liu et al. 2000), use a vortex insertion technique in which a TC-like vortex is added to the analysis (e.g., Kurihara et al. 1995; Zou and Xiao 2000), or sample observations from a predefined vortex and assimilate them in a conventional manner (e.g., Chou and Wu 2008). The latter two methods are limited by their assumptions, so the above techniques may not be optimal at all times, particularly for asymmetric and vertically tilted TCs.
Ensemble-based assimilation systems, such as the ensemble Kalman filter (EnKF), have shown great promise for atmospheric data assimilation in a number of settings because they use flow-dependent error statistics estimated from short-term ensemble forecasts (e.g., Dowell et al. 2004; Whitaker et al. 2008; Szunyogh et al. 2008; Meng and Zhang 2008). The error statistics, which determine how much weight to give to observations relative to a model forecast and how to spread observation information to model state variables, should allow for the
4376
MONTHLY WEATHER REVIEW
VOLUME 138
TABLE 1. Starting and ending dates of each assimilation cycling period and the HRH TCs that are contained in each.
Cycling start date | Cycling end date | Storms
0000 UTC 7 Jul 2005 | 1200 UTC 21 Jul 2005 | Emily (E)
0000 UTC 21 Aug 2005 | 1800 UTC 30 Aug 2005 | Katrina (Kat)
0000 UTC 3 Sep 2005 | 0000 UTC 26 Sep 2005 | Ophelia (O), Philippe (P), Rita (R)
0000 UTC 12 Oct 2005 | 1800 UTC 25 Oct 2005 | Wilma (W)
0000 UTC 28 Aug 2007 | 1200 UTC 17 Sep 2007 | Felix (F), Humberto (H), Ingrid (I)
0000 UTC 21 Sep 2007 | 0600 UTC 29 Sep 2007 | Karen (Kar)
more effective use of observations near TCs. In addition, a cycling EnKF system provides a set of analyses for TC ensemble forecasting.
Previous work has shown that EnKF assimilation has great promise for TC state estimation. Torn and Hakim (2009) cycled an EnKF over the lifetime of Hurricane Katrina (2005). Their results indicate that EnKF TC track and intensity forecasts are characterized by a 50% reduction in error compared to the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) and National Hurricane Center (NHC) official forecasts. Zhang et al. (2009) showed that assimilating coastal radar data with an EnKF led to much improved forecasts of Hurricane Humberto (2007), while Chen and Snyder (2006) obtained lower TC track errors when initializing a mesoscale model with an EnKF analysis, rather than the GFS analysis. Moreover, TC track forecasts appear to benefit from assimilating satellite-derived temperature data with an EnKF (e.g., Li and Liu 2009; Liu et al. 2010, manuscript submitted to Mon. Wea. Rev.). Although these studies are encouraging, they are all limited to individual case studies and are characterized by relatively short periods over which observations are assimilated (i.e., cycling).
This study describes the performance of a cycling mesoscale EnKF system over the life cycle of multiple TCs, ranging from marginal to the most intense systems observed in the Atlantic basin. The EnKF system described here produces an analysis ensemble every 6 h for the 10 Atlantic basin TCs chosen for the National Oceanic and Atmospheric Administration (NOAA) High-Resolution Hurricane (HRH) test. The primary goal of this program was to evaluate the sensitivity of TC forecasts to grid resolution. Although one member of this analysis ensemble is used to initialize the National Center for Atmospheric Research (NCAR) contribution to HRH, the interested reader is directed to Davis et al. (2010) for those results.
This manuscript proceeds as follows. Section 2 describes the modeling and data assimilation system, while section 3 provides an overview of the 10 storms studied here. Sections 4 and 5 provide verification information for the data assimilation system and ensemble forecasts,
respectively. The role of initial condition and model errors in the WRF forecasts is evaluated in section 6. A summary and conclusions are provided in section 7.
2. Experiment setup
Ensemble analyses are generated every 6 h by cycling an EnKF over six different time periods that include 10 Atlantic basin TCs from the 2005 and 2007 seasons. Table 1 gives the starting and ending dates of the cycling periods, while Fig. 1 shows the model domain, which represents a trade-off between including the track of all storms and being computationally feasible. The 36-km horizontal grid spacing is chosen to be an odd integer multiple of the largest grid spacing used for the NCAR contribution to the HRH test. All 96 ensemble members are advanced in time using version 2.2.1 of the Advanced Research Weather Research and Forecasting model (ARW-WRF, hereafter WRF; Skamarock et al. 2005) with 36 vertical levels up to 20 hPa. This implementation of WRF has the following components: the WRF 5-class microphysics scheme (Hong et al. 2004), Rapid Radiative Transfer Model (RRTM) longwave radiation (Mlawer et al. 1997), the Dudhia shortwave scheme (Dudhia 1989), the Kain–Fritsch cumulus parameterization (Kain and Fritsch 1990), the Yonsei University (YSU) boundary
FIG. 1. Tropical cyclone best-track data for each of the 10 HRH TCs studied here. See Table 1 for the list of storms and the label identification. Latitude and longitude lines are shown every 10°.
layer scheme (Hong et al. 2006), and the similarity theory land surface model (Skamarock et al. 2005) that includes the updated enthalpy and momentum drag formulations described in Davis et al. (2008). Ensemble initial and lateral boundary conditions are generated using the fixed-covariance perturbation (FCP) technique of Torn et al. (2006). This technique produces a deviation from the ensemble mean for each ensemble member by drawing random perturbations from the NCEP error covariances contained in the WRF VAR system (Barker et al. 2004). The initial ensemble is then generated by multiplying the state perturbation by 1.7 and adding it to the 36-h NCEP GFS forecast valid at the starting times listed in Table 1. This setup follows Dirren et al. (2007), who showed that regional EnKF systems spin up faster when using larger ensemble-mean error and spread. The initial dates are chosen so that the ensemble has little memory of the initial ensemble by the time the first HRH TC is declared a tropical depression (at least 48 h prior to genesis). Ensemble lateral boundary conditions are produced in a similar manner as the initial ensemble, but the ensemble mean is the 6-h NCEP GFS forecast valid at the appropriate time. Observations are assimilated from Automated Surface Observing System (ASOS) stations, ships, buoys, rawinsondes, the Aircraft Communications Addressing and Reporting System (ACARS), and cloud motion vectors (Velden et al. 2005) using the Data Assimilation Research Testbed (DART; Anderson et al. 2009), which is an implementation of the ensemble adjustment Kalman filter (Anderson 2001). In addition, this system uses dropsondes deployed within 1 h of the analysis time from the NOAA G-IV aircraft, which samples the synoptic environment surrounding the storm (e.g., Aberson 2002). Dropsondes from the NOAA P3s are not considered because the model does not have sufficient resolution to resolve features sampled by the dropsondes in the TC core. 
Previous studies have shown that assimilating these dropsondes with a coarse-resolution model can result in degraded forecast skill (e.g., Aberson 2008). NHC TC advisory position (latitude and longitude of lowest sea level pressure) and minimum sea level pressure (SLP) are also assimilated using the technique outlined by Chen and Snyder (2007). Table 2 gives the list of observations assimilated from each platform and the data source. Observations are preprocessed using the methods outlined by Torn and Hakim (2008b). In particular, surface observations are assimilated only if the model and station elevation are within 300 m of each other. The number of ACARS observations is reduced by averaging all observations within 36 km in the horizontal and 25 hPa in the vertical of other ACARS observations. The resulting averaged observations, or ‘‘superobs,’’ are then
TABLE 2. Observations assimilated from each platform and the data source.
Observation platform | Types assimilated | Source
Surface | Altimeter | NCEP prepbufr
Rawinsonde | u, v, T, q | NCEP prepbufr
Dropsonde | u, v, T, q | Postprocessed HRD files
ACARS | u, v | NCEP prepbufr
Satellite winds | u, v | Cooperative Institute for Meteorological Satellite Studies (CIMSS)
TC | Lat, lon, min SLP | NHC advisory data
assimilated instead of individual data. A similar procedure is used for satellite wind observations, except that observations are averaged within 60 km in the horizontal. All observations are subject to a quality control algorithm that rejects the observation if the absolute value of the difference between the observation and model estimate of the observation (i.e., observation innovation) is greater than 4 times the square root of the variance of the model estimate of the observation plus the observation error variance (i.e., innovation standard deviation). Observation errors are obtained from NCEP statistics, with the exception of TC position and minimum SLP. For TC position, the observation error is specified as 10 km in each horizontal direction, while the TC minimum SLP error is assumed to be 3 hPa. EnKF systems require additional steps designed to overcome sampling errors that result from using a small ensemble compared to the number of state variables and account for model error. The ensemble-derived covariances are localized using Eq. (4.10) of Gaspari and Cohn (1999) where the value reduces to zero 2000 km in the horizontal and 6 km in the vertical from the observation location. For locations where there are more than 1600 observations within the volume defined by the localization radii, the horizontal and vertical covariance localization radii are reduced until there are approximately 1600 observations within the ellipsoid. In densely observed regions, this approach helps prevent sampling errors associated with observations that are at the edge of the covariance localization window from artificially reducing the variance in a state variable. Both the localization radii and maximum observation number are determined by assimilating observations every 6 h from 0000 UTC 21 August to 1800 UTC 30 August 2005 and comparing the resulting 6-h wind and temperature forecasts against rawinsonde data. 
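The quality control test and the covariance localization weight described above can be sketched in a few lines. This is a minimal illustration, not the DART implementation: the function names and the `cutoff` convention (distance at which the weight reaches zero) are assumptions made here, and the way horizontal and vertical distances are combined into a single ellipsoidal distance is not shown because the text does not specify it.

```python
import math


def gaspari_cohn(distance, cutoff):
    """Fifth-order piecewise rational weight, Eq. (4.10) of Gaspari and Cohn (1999).

    distance: separation between the observation and the state variable.
    cutoff:   distance at which the weight reaches zero (same units),
              e.g. 2000 km in the horizontal for the setup described above.
    """
    c = cutoff / 2.0          # the function has compact support of width 2c
    r = abs(distance) / c
    if r >= 2.0:
        return 0.0
    if r <= 1.0:
        return (-0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3
                - (5.0 / 3.0) * r**2 + 1.0)
    return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2
            - 5.0 * r + 4.0 - 2.0 / (3.0 * r))


def passes_quality_control(obs, prior_mean, prior_var, obs_err_var, threshold=4.0):
    """Accept an observation only if |innovation| <= threshold * innovation std."""
    innovation = obs - prior_mean
    innovation_std = math.sqrt(prior_var + obs_err_var)
    return abs(innovation) <= threshold * innovation_std


# Hypothetical values: a 900-hPa minimum SLP observation against a 950-hPa prior
# with 4-hPa ensemble spread and the 3-hPa observation error quoted in the text.
print(passes_quality_control(900.0, 950.0, 16.0, 9.0))   # rejected
print(gaspari_cohn(1000.0, 2000.0))                      # weight halfway to the cutoff
```

With these hypothetical numbers the innovation standard deviation is 5 hPa, so a 50-hPa innovation far exceeds the 20-hPa rejection limit, which is the situation discussed later for intense TCs on a 36-km grid.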
At each assimilation time, the ensemble deviations from the ensemble mean are inflated prior to assimilation using the adaptive inflation technique of Anderson (2009), where the inflation factor is damped by 10% at
TABLE 3. Forecast initialization times for each HRH TC. The number in parentheses is the forecast length in hours.
TC name | Forecast initialization times
Emily (2005) | 0000 UTC 11 Jul (72), 0000 UTC 12 Jul (72), 0000 UTC 13 Jul (72), 0000 UTC 14 Jul (72), 0000 UTC 15 Jul (72), 0000 UTC 16 Jul (72), 0000 UTC 17 Jul (72), 0000 UTC 18 Jul (72), 0000 UTC 19 Jul (60), 0000 UTC 20 Jul (36)
Katrina (2005) | 0000 UTC 24 Aug (72), 0000 UTC 25 Aug (72), 0000 UTC 26 Aug (72), 0000 UTC 27 Aug (72), 0000 UTC 28 Aug (66), 0000 UTC 29 Aug (42)
Ophelia (2005) | 1200 UTC 6 Sep (72), 1200 UTC 7 Sep (72), 1200 UTC 8 Sep (72), 1200 UTC 9 Sep (72), 1200 UTC 10 Sep (72), 1200 UTC 11 Sep (72), 1200 UTC 12 Sep (72), 1200 UTC 13 Sep (72), 1200 UTC 14 Sep (72), 1200 UTC 15 Sep (54), 1200 UTC 16 Sep (30)
Philippe (2005) | 1200 UTC 17 Sep (72), 1200 UTC 18 Sep (72), 1200 UTC 19 Sep (72), 1200 UTC 20 Sep (72), 1200 UTC 21 Sep (48), 1200 UTC 22 Sep (24)
Rita (2005) | 0000 UTC 18 Sep (72), 0000 UTC 19 Sep (72), 0000 UTC 20 Sep (72), 0000 UTC 21 Sep (72), 0000 UTC 22 Sep (72), 0000 UTC 23 Sep (72), 0000 UTC 24 Sep (48)
Wilma (2005) | 0000 UTC 16 Oct (72), 0000 UTC 17 Oct (72), 0000 UTC 18 Oct (72), 0000 UTC 19 Oct (72), 1200 UTC 19 Oct (72), 0000 UTC 20 Oct (72), 0000 UTC 21 Oct (72), 0000 UTC 22 Oct (72), 0000 UTC 23 Oct (66), 0000 UTC 24 Oct (42), 0000 UTC 25 Oct (18)
Felix (2007) | 1200 UTC 31 Aug (72), 1200 UTC 1 Sep (72), 0000 UTC 2 Sep (72), 0600 UTC 2 Sep (72), 1200 UTC 2 Sep (72), 1800 UTC 2 Sep (72), 0000 UTC 3 Sep (66), 1200 UTC 3 Sep (42)
Humberto (2007) | 1200 UTC 12 Sep (30), 0000 UTC 13 Sep (24)
Ingrid (2007) | 1200 UTC 12 Sep (72), 1200 UTC 13 Sep (72), 1200 UTC 14 Sep (60), 1200 UTC 15 Sep (36)
Karen (2007) | 0000 UTC 25 Sep (72), 0000 UTC 26 Sep (72), 0000 UTC 27 Sep (54), 0000 UTC 28 Sep (30)
each assimilation time and the inflation standard deviation is fixed at 0.6. Inflation damping limits the ensemble variance in regions where the number of observations changes with time, but the inflation factor is large. Without inflation damping and vertical covariance localization, this covariance inflation technique can lead to unrealistic ensemble variance at the model top, which can ultimately lead to model failure. For each of the initialization times chosen for the HRH (given in Table 3), up to 72-h ensemble forecasts are produced by integrating the 96 ensemble analyses forward in time. Lateral boundary conditions for each ensemble member are generated using the FCP technique, where the perturbation scaling factor linearly increases with time such that a 48-h forecast has a perturbation standard deviation that is 55% larger than the 6-h value; this perturbation scaling is consistent with the RMS error in 48-h GFS forecasts (e.g., Torn and Hakim 2008a). The ensemble-mean lateral boundary condition is the comparable-time GFS forecast.
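The two scalar adjustments mentioned above, inflation damping and the lead-time scaling of the boundary perturbations, can be written down explicitly. The sketch below rests on two assumptions that the text does not spell out: "damped by 10%" is read as shrinking the departure of the inflation factor from 1 by 10% each cycle, and the boundary scaling is taken to equal 1 at 6 h and to continue linearly beyond 48 h. Function names are illustrative.

```python
def damp_inflation(inflation, damping=0.9):
    """Relax an inflation factor toward 1 (no inflation).

    damping=0.9 keeps 90% of the departure from 1, i.e. a 10% damping
    (assumed interpretation of the text).
    """
    return 1.0 + damping * (inflation - 1.0)


def lbc_perturbation_scale(lead_time_h, ref_h=6.0, end_h=48.0, growth=0.55):
    """Scaling of the boundary perturbation standard deviation.

    Linear in lead time, equal to 1 at ref_h and 1 + growth at end_h,
    so a 48-h forecast is 55% larger than the 6-h value.
    """
    return 1.0 + growth * (lead_time_h - ref_h) / (end_h - ref_h)


for hour in (6, 24, 48, 72):
    print(hour, round(lbc_perturbation_scale(hour), 3))
```

Under these assumptions an inflation factor of 1.5 is reduced to 1.45 before the next cycle, and the perturbation scaling at 72 h is obtained by extrapolating the same line.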
3. Overview of cases
Prior to describing the performance of the ensemble analysis and forecast system, a short overview of the
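TABLE 3. (Continued) Maximum intensity of each HRH TC.
TC name | Max intensity
Emily (2005) | 140 kt, 0000 UTC 17 Jul
Katrina (2005) | 150 kt, 1800 UTC 28 Aug
Ophelia (2005) | 75 kt, 0600 UTC 11 Sep
Philippe (2005) | 70 kt, 0000 UTC 20 Sep
Rita (2005) | 155 kt, 0600 UTC 22 Sep
Wilma (2005) | 160 kt, 1200 UTC 19 Oct
Felix (2007) | 150 kt, 0000 UTC 3 Sep
Humberto (2007) | 80 kt, 0600 UTC 13 Sep
Ingrid (2007) | 40 kt, 1200 UTC 14 Sep
Karen (2007) | 65 kt, 1200 UTC 26 Sep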
10 TCs is provided (Fig. 1). The TCs and initialization times were chosen for the HRH test by NHC forecasters based on the skill of operational model forecasts and represent a variety of Atlantic basin TCs. Although some of these TCs have received considerable attention because of their impact on highly populated areas (i.e., Katrina, Rita, and Wilma), others were less notable (i.e., Philippe, Ingrid, and Karen). Most of the TCs formed in the main development region, moved westward along the southern periphery of the subtropical anticyclone, and turned poleward at some point in their life; others formed at relatively higher latitudes. The lifetime of the storms ranged from 48 h (Humberto) to approximately 10 days (Ophelia and Wilma). Unlike many other TC modeling studies that tend to focus on well-observed or high-impact TCs, these storms are characterized by a wide variety of intensities. In particular, 70% of the 6-hourly position fixes for these TCs are characterized by category 1 or less intensity on the Saffir–Simpson scale (maximum winds less than 42.7 m s⁻¹). Emily, Katrina, Rita, Wilma, and Felix all reached major hurricane status (at least category 3 on the Saffir–Simpson scale, maximum winds greater than 49.4 m s⁻¹) and were characterized by at least one rapid intensification period, defined using the Kaplan–DeMaria criteria [15.4 m s⁻¹
FIG. 2. (a) RMS error in the ensemble-mean analysis (dark gray bar) and 6-h forecast (light gray bar) TC position as a function of best-track intensity. The left (right) white bar gives the analysis (6-h forecast) observation innovation standard deviation. The number of verification times is given along the top. (b) Bias in the ensemble-mean analysis (circle) and 6-h forecast (square) TC position for all TCs. The range rings denote 5-km intervals. (c) As in (a), but for the TC minimum SLP. (d) As in (c), but for the bias (forecast − observation) in TC minimum SLP. (e),(f) As in (c),(d), but for the TC maximum wind speed. Error bars denote the 5th and 95th percentiles determined from bootstrap resampling.
(24 h)⁻¹; Kaplan and DeMaria (2003)]. Both Ophelia and Humberto met rapid intensification criteria but failed to reach major TC status despite favorable environmental conditions. The remaining storms (Philippe, Ingrid, and Karen) were subject to large environmental wind shear and proved problematic for operational models. In addition, these six cycling periods captured a portion of 10 other TCs’ life cycles (2005: Dennis, Jose, Lee, Maria, Nate, Vince, and Alpha; 2007: Gabrielle, Jerry, and Lorenzo); most of these storms were weak and/or short lived. Except where noted, all verification statistics are computed with respect to the 10 HRH TCs listed in Table 1.
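The rapid intensification criterion quoted above is easy to apply to a 6-hourly best-track wind series. The following is a minimal sketch: the function name, the list-based input, and the sample wind values are hypothetical, and the only number taken from the text is the 15.4 m s⁻¹ per 24 h threshold.

```python
RI_THRESHOLD_MS = 15.4  # m/s increase over 24 h (Kaplan and DeMaria 2003), as quoted above


def rapid_intensification_starts(max_wind_ms, step_h=6, window_h=24):
    """Return the indices at which a rapid intensification period begins.

    max_wind_ms: maximum wind speed (m/s) at regular intervals of step_h hours.
    An index i qualifies when the wind window_h hours later is at least
    RI_THRESHOLD_MS stronger than at i.
    """
    lag = window_h // step_h
    return [i for i in range(len(max_wind_ms) - lag)
            if max_wind_ms[i + lag] - max_wind_ms[i] >= RI_THRESHOLD_MS]


# Hypothetical 6-hourly wind series (m/s), not taken from any storm in the text.
winds = [20.0, 22.0, 25.0, 30.0, 38.0, 45.0, 50.0]
print(rapid_intensification_starts(winds))
```

A series that strengthens by only a few meters per second per day returns an empty list, so the same helper also separates steadily intensifying storms from rapidly intensifying ones.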
4. Cycling verification
Ensemble analysis (i.e., posterior) and 6-h forecast (i.e., prior) data are verified against NHC best-track position and intensity estimates. It is important to note that the TC position and minimum SLP data used as verification are not the same as what is assimilated. These experiments assimilate the NHC advisory TC position and minimum SLP, which are real-time estimates of these quantities. In contrast, the best-track data are based on a comprehensive analysis of all available data by a set of NHC specialists after the season is over (see Beven et al. 2010 for further information about best track). The relationship between advisory and best-track data is akin to the prior forecast and analysis in data assimilation; therefore, these two pieces of information can be considered quasi-independent. Figure 2 shows the RMS error and observation innovation standard deviation in TC track, minimum SLP, and maximum wind speed as a function of the best-track TC category. If the ensemble is well calibrated, the RMS error and innovation standard deviation should match (e.g., Houtekamer et al. 2005). In all instances, the TC position and minimum SLP are determined using the lowest SLP in the vicinity of the storm, while the maximum wind speed is given by the maximum 10-m wind speed within 250 km of the storm center. The statistical significance of all results is determined using a bootstrap resampling method. In this implementation, each error distribution is resampled with replacement 10 000 times. Only significance levels greater
FIG. 3. (a) Wilma minimum SLP for the ensemble member where the minimum SLP observation is (dashed) and is not (solid) assimilated as a function of lead time for the forecast initialized at 0000 UTC 20 Oct 2005. (b) The RMS value of the second derivative of the WRF dry air mass with respect to time within 500 km of the TC center.
than 90% are considered statistically significant hereafter. For all TC categories, the analysis TC position error is less than the prior forecast value, which indicates that assimilating observations with an EnKF leads to systematic improvements in the RMS error and bias (Figs. 2a,b). Although this result may seem trivial because TC position observations are assimilated, TC position is not a state variable; thus, it indicates that the flow-dependent error statistics are consistently correcting the ensemble. In addition, the analysis and prior innovation standard deviation are statistically indistinguishable from the RMS error; thus, the ensemble position estimates contain the appropriate amount of variance. The position errors and variance are inversely proportional to the intensity of the storm, which is likely related to weak TCs having poorly defined centers. As a means of comparison, the mean absolute error in the posterior and prior TC position over all TCs is 34 and 66 km, respectively (not shown); the latter value is roughly equivalent to the error in 12-h NHC position forecasts during the 2005 season (Beven et al. 2008).
Unlike TC position, observation assimilation does not lead to consistent improvements to the TC minimum SLP. In general, the RMS error in the ensemble mean prior and posterior minimum SLP increases with the TC category; however, the innovation standard deviation does not increase proportionally, so the ensemble is not well calibrated for this quantity (Fig. 2c). The increased RMS error is accompanied by larger bias values, which
increase from less than 5 hPa for category 2 or weaker storms to 32 hPa for major TCs (Fig. 2d). This bias is likely related to the model’s inability to resolve the large gradients in the TC’s wind and mass field at this horizontal resolution (36 km). For tropical depressions (TDs) and tropical storms (TSs), the RMS error in the posterior is smaller than the prior, which suggests that observation assimilation is systematically reducing the error in TC minimum SLP for these storms. Although the EnKF system attempts to assimilate the TC minimum SLP observation at every time, this observation is often rejected by the quality control algorithm (described in section 2) because the resolution-related model biases lead to large observation innovations. For nonmajor TCs, the TC minimum SLP observation is assimilated 93% of the time, compared to 25% of the time for category 3 and above TCs. It may seem tempting to force the assimilation system to use this observation; however, this does not necessarily produce the desired effect of making the TC more intense in the subsequent forecast. Figure 3 shows the evolution of two 6-h Wilma forecasts: one where the TC minimum SLP observation is rejected and another where the minimum SLP observation is forced to be assimilated. Assimilating the minimum SLP observation leads to a 20-hPa decrease in the 0-h TC minimum SLP, yet by 4 h into the forecast the difference reduces to less than 2 hPa. During the first 1 h of the simulation, the forecast where the minimum SLP observation is assimilated is characterized by a 30% increase in acoustic and gravity wave activity, measured by the RMS value of the second derivative of the dry airmass field averaged within 500 km of the TC center (Fig. 3b). This result suggests that the model ‘‘rejects’’ this minimum SLP observation and returns to an intensity that this grid resolution can resolve. 
Moreover, it appears that the quality control algorithm acts as a crude way of preventing the assimilation system from using observations the model cannot represent.
Similar to TC minimum SLP, the prior and posterior TC maximum wind speed estimates suffer from large biases for stronger TCs and a large mismatch between the ensemble-mean error and spread. The RMS error (bias) in the analysis maximum wind speed increases from 4.0 (0.5) m s⁻¹ for TD–TS to 26 (22) m s⁻¹ for major TCs; at all intensities, assimilation does not improve this metric (Figs. 2e,f). Over all TCs, the prior and posterior mean absolute error (8.0 m s⁻¹) is equivalent to the error in 48-h NHC intensity forecasts during the 2005 season (Beven et al. 2008). For major TCs, it appears that observation assimilation leads to lower bias in the posterior compared to the prior; however, this result is an artifact of using a finite ensemble to compute error covariances. Sampling errors lead to spurious positive and negative
changes to all fields when observations are assimilated (e.g., Fig. 8 of Torn and Hakim 2009). Over large horizontal areas, these spurious increments are expected to average out to 0; however, since the TC maximum wind speed is defined as the maximum 10-m wind speed at any single grid point, a spurious increase in the 10-m wind components will manifest as an increase in the TC maximum wind speed in the analysis compared to the prior forecast. This idea is further supported by looking at the maximum wind speed innovation standard deviation, which is larger in the posterior, compared to the prior.
Prior to TC genesis, most modeling systems, including the EnKF described here, do not use any special observations or data assimilation techniques in the vicinity of pregenesis systems. As a consequence, the 6-h forecasts valid at genesis time (defined from NHC best-track data) provide a good opportunity to evaluate how this EnKF system handles pregenesis systems relative to GFS.¹ The mean absolute error in the prior ensemble-mean position forecast for the 17 storms that undergo genesis during these six periods is 154 km, compared to 186 km for the GFS²; this difference is statistically significant at the 90% confidence level. For 13 of the 17 storms, the difference between the WRF EnKF and GFS position is less than 50 km; however, the WRF EnKF position error for Rita, Felix, Jose, and Lorenzo is at least 100 km smaller than the GFS value. This result suggests that the EnKF assimilation may have more skill for pregenesis systems, though the sample size is small.
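The verification statistics used throughout this section (RMS error, innovation standard deviation, and bootstrap percentiles) can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the figures: the function names are made up, the innovation standard deviation is computed from the time-mean ensemble variance, and the percentile lookup uses a simple index into the sorted resampled statistics.

```python
import math
import random


def rmse(errors):
    """Root-mean-square of a list of (forecast - verification) errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def innovation_std(ensemble_vars, obs_err_var):
    """Expected innovation std: sqrt(mean ensemble variance + obs error variance)."""
    mean_var = sum(ensemble_vars) / len(ensemble_vars)
    return math.sqrt(mean_var + obs_err_var)


def bootstrap_interval(sample, statistic, n_resamples=10000,
                       lower=5.0, upper=95.0, seed=0):
    """Percentile interval of a statistic from resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    stats = sorted(
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = stats[int(lower / 100.0 * (n_resamples - 1))]
    hi = stats[int(upper / 100.0 * (n_resamples - 1))]
    return lo, hi


# Hypothetical position errors (km); a well-calibrated ensemble would have
# rmse(errors) close to innovation_std(...) for the same set of times.
errors = [12.0, -30.0, 45.0, -8.0, 22.0, -51.0, 17.0, 3.0]
print(rmse(errors), bootstrap_interval(errors, rmse, n_resamples=2000))
```

One way to test a difference between two systems, such as the EnKF and GFS genesis position errors, is to bootstrap the paired differences and check whether the 5th to 95th percentile interval excludes zero; whether the original analysis paired the samples this way is not stated in the text.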
5. Forecast verification
In this section, WRF EnKF TC track and intensity forecasts are evaluated against NHC best-track data during the 69 HRH initialization times listed in Table 3; the number of forecasts available at each lead time is given in Table 4. For comparison, errors in the equivalent-time official NHC forecast, the NCEP GFS, the Geophysical Fluid Dynamics Laboratory (GFDL) hurricane model (Bender et al. 2007), the NCEP Hurricane WRF (HWRF; Rappaport et al. 2009), the Navy Operational Global Atmospheric Prediction System (NOGAPS; Peng et al. 2004), the Climatology and Persistence (CLIPER) statistical and climatological model (Aberson 1998), and the Statistical Hurricane Intensity Forecast (SHIFOR) model (Knaff et al. 2003) are also computed; all data are obtained from NHC. The latter two forecasts are often used as a benchmark for the skill of TC position and intensity forecasts, respectively (e.g., Beven et al. 2008).
¹ Best-track data are not available prior to genesis, so it is difficult to verify prior forecasts from earlier times.
² This is evaluated from available 1° gridded data.
TABLE 4. Number of TC forecasts as a function of category and lead time (h).
Category | 00 | 12 | 24 | 36 | 48 | 72
TD and TS | 35 | 29 | 32 | 26 | 24 | 19
Categories 1 and 2 | 15 | 20 | 14 | 18 | 14 | 10
Categories 3, 4, and 5 | 19 | 20 | 22 | 21 | 20 | 18
All | 69 | 69 | 68 | 65 | 58 | 47
To provide a fair comparison between these deterministic forecasts and the WRF EnKF system, it is necessary to use a single forecast from the WRF EnKF system. The obvious choice would be to use the ensemble-mean analysis; however, because of the variability in the 0-h TC position, this analysis contains an overly smoothed depiction of the TC mass and wind fields. As a consequence, the TC minimum SLP and maximum winds within the ensemble-mean analysis are often weaker than any individual ensemble member (not shown); thus, a forecast initialized from this analysis requires a larger ‘‘spinup’’ time compared to individual members. Instead of comparing the operational deterministic forecasts against a WRF forecast initialized from the ensemble-mean analysis, a single member (SM) of the WRF EnKF analysis ensemble is chosen for the WRF deterministic forecast. Since all members of an EnKF ensemble are equally likely estimates of the atmospheric state, there are a number of ways to determine which ensemble member to use. Given that this study concerns TC forecasting, the SM initial condition is chosen based on how ‘‘close’’ the analysis member’s TC position and minimum SLP values are to the ensemble-mean values. The single-member analysis, which is determined independently for each initialization time, minimizes the cost function
\[
J_i = \left[\frac{f(\mathbf{x}^a_i) - \overline{f(\mathbf{x}^a)}}{\sigma_{\mathrm{dist}}}\right]^2 + \left[\frac{g(\mathbf{x}^a_i) - \overline{g(\mathbf{x}^a)}}{\sigma_{\mathrm{dist}}}\right]^2 + \left[\frac{h(\mathbf{x}^a_i) - \overline{h(\mathbf{x}^a)}}{\sigma_{\mathrm{mslp}}}\right]^2, \tag{1}
\]
where $\mathbf{x}^a_i$ is the ith member analysis state vector; f, g, and h are scalar functions that return the TC latitude, longitude, and minimum SLP, respectively; and the overbar indicates the ensemble-mean value (i.e., the mean over all ensemble members). Finally, $\sigma_{\mathrm{dist}}$ and $\sigma_{\mathrm{mslp}}$ are the ‘‘climatological’’ TC position (15 km) and minimum SLP (4 hPa) standard deviation, respectively, which are determined by averaging the analysis standard deviation over the 69 initialization times. Dividing by the standard deviation normalizes these quantities and prevents one metric from dominating another because of intrinsic variability.
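The member selection defined by Eq. (1) amounts to a normalized nearest-to-the-mean search. The sketch below is one possible reading, with several assumptions labeled in the code: the overbar is taken as the mean of each member's TC latitude, longitude, and minimum SLP; latitude and longitude differences are converted to kilometers so they can be divided by the 15-km position standard deviation; and the function name, the conversion constant, and the sample values are illustrative only.

```python
import math

KM_PER_DEG_LAT = 111.2  # approximate km per degree of latitude (assumption)


def select_single_member(lats, lons, mslps, sigma_dist_km=15.0, sigma_mslp_hpa=4.0):
    """Return the index of the member that minimizes the cost function J_i.

    lats, lons: TC center latitude and longitude (degrees) for each member.
    mslps:      TC minimum SLP (hPa) for each member.
    """
    n = len(lats)
    mean_lat = sum(lats) / n
    mean_lon = sum(lons) / n
    mean_mslp = sum(mslps) / n
    # Assumed conversion of longitude differences to km at the mean latitude.
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(mean_lat))

    def cost(i):
        dlat_km = (lats[i] - mean_lat) * KM_PER_DEG_LAT
        dlon_km = (lons[i] - mean_lon) * km_per_deg_lon
        dp_hpa = mslps[i] - mean_mslp
        return ((dlat_km / sigma_dist_km) ** 2
                + (dlon_km / sigma_dist_km) ** 2
                + (dp_hpa / sigma_mslp_hpa) ** 2)

    return min(range(n), key=cost)


# Hypothetical four-member example.
lats = [25.0, 25.2, 24.8, 25.05]
lons = [-80.0, -80.1, -79.9, -80.0]
mslps = [980.0, 970.0, 990.0, 981.0]
print(select_single_member(lats, lons, mslps))
```

Because each term is divided by its climatological standard deviation, a 4-hPa pressure departure counts as much as a 15-km position departure, which is the normalization argument made in the text.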
a. Track
Figure 4a shows the mean absolute track errors as a function of forecast hour. Over all times, the GFS, GFDL, HWRF, and NHC official forecasts have similar values, while the SM and NOGAPS forecasts have roughly 12–18 h less skill relative to these models (i.e., the 60-h SM track error is equivalent to the 72-h NHC track error). Moreover, the 72-h SM track error is roughly equivalent to the 72-h NHC official forecast track error during the 2005 season (289 km; Beven et al. 2008) and is 31% lower than the CLIPER model error. It is worth noting that the SM track error increases faster than the other dynamical models during the first 36 h but is similar thereafter; the reason for this discrepancy will be explored later. Although the SM forecast appears to have lower skill than other operational systems, there are several individual forecasts for which this is not the case. For 15 (17) of the 69 times, the 48-h SM position error is at least 50 km smaller (100 km larger) than the NHC official forecast error (not shown). The SM forecast has lower errors during the early forecasts of Emily and Katrina and later forecasts of Rita. In contrast, the SM forecast is significantly worse than NHC during cases of weak steering flow (i.e., early Ophelia, Rita, and Wilma forecasts) or when Emily and Felix are within 48 h of making landfall in Central America.
The previous result suggests that the SM forecast may suffer from persistent track biases. This possibility is evaluated by computing the position bias as a function of forecast lead time (Fig. 4b). The SM position bias is consistently directed northwest of the best-track position and the magnitude increases with time, reaching 130 km by 48 h; this position bias equates to a 0.75 m s⁻¹ bias in TC motion. In contrast, the position bias for other models is generally less than 70 km, but is also directed to the
FIG. 4. (a) Mean absolute error in TC position for the WRF EnKF SM (red line), NHC official (green), NCEP GFS (blue), GFDL (cyan), HWRF (magenta), NOGAPS (gray), and the CLIPER (black) forecasts as a function of forecast hour for the 69 forecast initialization times listed in Table 3. The number of verification times at each lead time is given in Table 4. (b) The position bias (forecast − verification) relative to the best track in the earth-relative framework. (c) The position bias in a reference frame relative to the best-track TC motion. Error bars in (a) denote the 5% and 95% percentiles determined from bootstrap resampling. Both the zonal distance and along-track bias in the SM forecast are statistically different from 0 at all lead times. The range rings in (b) and (c) denote 50-km intervals, while the dots denote the bias each 12 h.
northwest. To determine whether there is a systematic problem in TC motion, the position biases are recomputed in a natural coordinate framework relative to the best-track motion at the verification time, where positive across (along) track is to the right of (faster than) the observed TC (Fig. 4c). Whereas the bias in other models is generally less than 30 km, the SM position bias is 80 km in the along-track direction by 48 h, which indicates the TCs are propagating too fast.
These large track biases suggest there is a systematic error in the WRF EnKF initial conditions or in the WRF model itself. There are several possible explanations for this bias including, but not limited to, errors in the TC steering wind or errors in TC size, which can result in excessive advection of planetary vorticity (i.e., beta drift; e.g., Holland 1983). To determine whether there are systematic errors in the WRF EnKF steering wind, the difference between the GFS and WRF EnKF analyses is computed. For both the GFS analysis and WRF EnKF ensemble-mean analysis, the time-average along-track component of the wind as a function of pressure level is computed by averaging over all horizontal grid points within 300 km of the model TC center and the 69 initialization times. The horizontal averaging is meant to remove the TC circulation from the total wind field. Although it is highly unlikely that the GFS analysis steering winds are perfect, systematic differences between the WRF EnKF and GFS values likely reflect problems in the WRF EnKF analysis because the GFS forecast has lower TC track errors. Between 300 and 850 hPa, the WRF EnKF analysis along-track component of the wind is higher than the GFS value (Fig. 5); the largest differences are in the 400–500-hPa layer (1.2 m s⁻¹). Previous studies have shown that TC motion correlates strongly with the 500- and 700-hPa winds (e.g., Chan and Gray 1982); therefore, this result could explain the positive along-track bias in the SM forecast.
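The natural-coordinate decomposition described above can be illustrated with a short Python sketch. It projects the forecast-minus-observed position difference onto the observed motion vector, with positive along-track meaning the forecast TC is ahead of the observed TC and positive across-track meaning it lies to the right of the observed motion. The function name, the local flat-earth approximation, and the input conventions are assumptions for illustration only.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def along_cross_track_error(fcst_lat, fcst_lon, obs_lat, obs_lon, obs_u, obs_v):
    """Decompose a position error (km) into along- and across-track parts.

    fcst_*, obs_* : forecast and best-track TC position (degrees)
    obs_u, obs_v  : eastward and northward components of the observed
                    TC motion (any consistent units; must be nonzero)
    """
    # Position difference on a local tangent plane centered on the observed TC.
    dx = np.radians(fcst_lon - obs_lon) * EARTH_RADIUS_KM * np.cos(np.radians(obs_lat))
    dy = np.radians(fcst_lat - obs_lat) * EARTH_RADIUS_KM

    # Unit vector pointing along the observed motion.
    speed = np.hypot(obs_u, obs_v)
    tx, ty = obs_u / speed, obs_v / speed

    along = dx * tx + dy * ty   # positive: forecast TC ahead of (faster than) observed
    cross = dx * ty - dy * tx   # positive: forecast TC to the right of the motion
    return along, cross
```

Averaging `along` and `cross` over all verification times at a given lead time would give bias estimates of the kind plotted in Fig. 4c. A stationary storm has no defined motion direction, so such cases would need to be screened out before calling the function.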
It is possible that the positive along-track wind bias could be related to the TC structure or that it reflects a large-scale bias within the model. The latter possibility is evaluated by averaging the WRF EnKF ensemble-mean analysis 500-hPa wind field over the 315 analysis times at least 48 h after the ensemble is initialized and comparing it to the GFS analysis at the same times (Fig. 6). On average, the easterly winds equatorward of 25°N are 0.5 m s⁻¹ too strong in the WRF EnKF analysis, with differences of up to 2 m s⁻¹ over the Pacific Ocean south of Mexico; this latter bias probably explains the large position bias in later Emily and Felix forecasts. Moreover, there is a positive meridional wind bias off the east coast of the United States and a 0.4 m s⁻¹ westerly wind bias north of 30°N in the midlatitude westerlies. This figure suggests that the winds associated with the
FIG. 5. Difference between the 0-h WRF EnKF ensemble mean and 0-h GFS along-track component of the wind averaged within 300 km of the TC position for the 69 forecast initialization times listed in Table 3. Error bars denote the 5% and 95% percentiles determined from bootstrap resampling.
subtropical high are stronger in the WRF EnKF analysis, particularly west of 50°W. Along the eastern portion of the high, the lateral boundary conditions strongly constrain the model; thus, the biases are smaller there. For most of the HRH forecasts, the 0-h TC position is collocated with regions where the 500-hPa wind bias is greater than 0.4 m s⁻¹ (cf. Figs. 1 and 6), so much of the along-track position bias is likely related to this large-scale wind bias.
Returning to Fig. 5, it is worth noting that the along-track wind bias is inversely proportional to the number of satellite wind observations near TCs (not shown). This result implies that assimilation of satellite winds could be reducing the wind bias where there are observations, but has limited impact in the vertical column. To explore this possibility, the time-average linear regression between the 200 (850)-hPa zonal wind for a grid point south of Hispaniola (denoted by the X in Fig. 6) and the zonal winds in the column is computed from 6-h WRF EnKF data (Fig. 7). In essence, this calculation shows the average change in the column zonal wind when either a 200- or 850-hPa zonal wind observation is assimilated; these levels correspond to the maximum in infrared and visible satellite motion vectors, respectively. For both 200 and 850 hPa, the wind increment extends roughly 200 hPa above and below the observation location, so wind data at either level have little impact between 400 and 600 hPa. As a consequence, the large along-track wind bias in the midtroposphere cannot be overcome by assimilating satellite winds alone.
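The column regression can be sketched as follows, assuming the regression coefficient at each level is the ensemble covariance between that level's zonal wind and the reference-level zonal wind, divided by the ensemble variance at the reference level. The function name and array layout are hypothetical, and the sketch ignores observation error and covariance localization, both of which would further reduce an actual analysis increment.

```python
import numpy as np


def column_regression(u_column, ref_level):
    """Regress the zonal wind at every level onto the wind at one level.

    u_column  : array of shape (n_members, n_levels) holding the ensemble
                zonal wind in one model column at one time
    ref_level : index of the level treated as the observed level
    Returns an array of n_levels regression coefficients; the value at
    ref_level is 1 by construction.
    """
    u_column = np.asarray(u_column, dtype=float)

    # Ensemble perturbations about the ensemble mean at each level.
    pert = u_column - u_column.mean(axis=0)
    ref = pert[:, ref_level]

    n_members = u_column.shape[0]
    cov = pert.T @ ref / (n_members - 1)   # covariance with the reference level
    var = ref @ ref / (n_members - 1)      # variance at the reference level
    return cov / var
```

A time-mean profile like the one in Fig. 7 would then follow from averaging the returned coefficients over all of the 6-h forecast times considered.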
FIG. 6. Difference between the 0-h WRF EnKF ensemble-mean and 0-h GFS 500-hPa wind for the 315 analysis times at least 48 h after the ensemble initialization times in Table 1 (vectors). The gray wind barbs are the time-mean 0-h GFS winds (kt). Latitude and longitude lines are shown every 10°.
Finally, the possibility that the position bias results from excessive beta drift is evaluated by verifying the SM TC 34-kt wind radius in all four quadrants (i.e., the largest radius at which a 34-kt wind exists in the northeast, northwest, southeast, and southwest quadrants) against best-track values; this metric is used as a proxy for TC size. Although the best-track wind radii can have large errors (e.g., Moyer et al. 2007), systematic differences between the model and observations should be indicative of problems in model TC structure. Figure 8a indicates that the SM wind radii error increases from 44 km at 0 h to 86 km at 72 h, which is roughly 20 km higher (lower) than the NHC (GFDL) forecasts, while the improvement relative to HWRF is statistically significant during the first 36 h. In addition, the SM forecast has a nearly zero bias, while the HWRF and GFDL TC wind radii are 20 and 60 km larger than best-track data, respectively. Nevertheless, this result suggests that the TC size estimates are comparable to or better than other operational models, even though no direct information about the TC wind profile is included in the assimilation system. As a consequence, it appears unlikely that the SM forecast is suffering from excessive beta drift.
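The error bars and significance statements in this section rely on bootstrap resampling. A minimal sketch of one common variant is given below: the sample of forecast errors is resampled with replacement, and the 5th and 95th percentiles of the resampled means are taken as the interval. The number of resamples, the random seed, and the function name are assumptions, since the text does not specify how the resampling was configured.

```python
import numpy as np


def bootstrap_mean_interval(errors, n_resamples=10000, seed=0):
    """Return the sample mean and a 5th-95th percentile bootstrap interval.

    errors      : 1D sequence of forecast errors (e.g., one value per
                  verification time at a fixed lead time)
    n_resamples : number of bootstrap resamples (assumed value)
    """
    errors = np.asarray(errors, dtype=float)
    rng = np.random.default_rng(seed)

    # Draw resamples of the same size as the data, with replacement.
    idx = rng.integers(0, errors.size, size=(n_resamples, errors.size))
    resampled_means = errors[idx].mean(axis=1)

    lower, upper = np.percentile(resampled_means, [5, 95])
    return errors.mean(), lower, upper
```

Applying the same function to a sample of error differences between two models gives a simple significance check: if the resulting interval excludes zero, the two models differ at roughly this confidence level.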
b. Intensity
Similar to TC track, the SM intensity forecasts appear to suffer from large model-related biases. Even though the EnKF system assimilates TC minimum SLP, both the TC minimum SLP and maximum wind errors are at least twice as large as any other model (Figs. 9a,b). The SM intensity forecast error increases slowly with time; however, it is statistically worse than any other dynamical model prior to 72 h and is comparable to SHIPS beyond 36 h. A significant fraction of the SM intensity forecast error is related to the 14-hPa bias in minimum SLP and the −12 m s⁻¹ bias in maximum wind at nearly all lead times (Figs. 9c,d). By comparison, the NHC and GFDL maximum wind speed bias is less than 3 m s⁻¹ at all lead times, while the HWRF bias increases from −2 m s⁻¹ at 0 h to −9 m s⁻¹ by 72 h.
Given the relatively coarse horizontal grid spacing in the SM forecast, the intensity biases are not surprising, particularly for major TCs. Recall from Figs. 2d,f that the 6-h TC intensity forecast bias is much smaller for nonmajor TCs, which suggests that this grid spacing can resolve weaker TCs. To determine whether the SM forecast has skill in these instances, the intensity forecast
errors are computed only for lead times where the best-track intensity is category 2 or less (Figs. 9e,f). In these cases, the SM TC minimum SLP and maximum wind speed errors are better than both the GFDL and SHIPS models from 24 h onward, but are statistically indistinguishable from HWRF and NHC. This result could be at least partially related to analysis TC structure errors for these storms. For the 30 initialization times at which the 0-h TC intensity is category 2 or less, the mean absolute 34-kt wind radii error is 48, 47, and 60 km for the SM, HWRF, and GFDL models, respectively; the GFDL result is statistically worse than both SM and HWRF. These results suggest that the overly broad wind field in the GFDL forecast may lead to overintensification at later times, although understanding how this happens is beyond the scope of this work.
FIG. 7. Time-mean linear regression between the 200- (solid) and 850-hPa (dashed) zonal wind at the X south of Hispaniola in Fig. 6 and the zonal wind at every other point in the column for the 69 initialization times listed in Table 3. Error bars denote the 5% and 95% percentiles determined from bootstrap resampling.
c. Ensemble performance
The remainder of this section evaluates whether the ensemble contains the appropriate amount of variance by comparing the RMS error in ensemble-mean position and intensity forecasts with the innovation variance. Mismatches between these two quantities can indicate a lack of growing modes or reflect model biases. Note that since the ensemble spread should only explain the random error component, the forecast-hour-dependent bias is removed prior to computing the RMS error. Figure 10a shows that the RMS error and ensemble standard deviation are comparable at 0 h; beyond that, the RMS error increases at a higher rate, so that the 72-h ensemble standard deviation (157 km) is half of the RMS error (292 km). This large mismatch indicates that the ensemble track forecasts are spread deficient. The lack of spread in TC position is likely related to the mismatch between ensemble-mean error and variance in the large-scale dynamical fields in the tropics, such as the 500-hPa zonal winds (Fig. 10b). Previous studies have shown that ensemble TC track forecasts are characterized by a lack of variance, which is at least partially due to a lack of growing modes in the TC steering wind (e.g., Puri et al. 2001; Magnusson et al. 2008). To further determine whether this is true within the EnKF ensemble, the time-average perturbation
FIG. 8. (a) Mean absolute error and (b) bias in the TC 34-kt wind radii in all four quadrants with respect to best-track data as a function of forecast hour for the 69 initialization times in Table 3. The thick solid line denotes the SM forecast, the thin solid line is the NHC official forecast, the thick dotted–dashed line is the GFDL forecast, and HWRF is the thick dashed line. Error bars denote the 5% and 95% percentiles determined from bootstrap resampling.
FIG. 9. Mean absolute error in TC (a) minimum SLP and (b) maximum wind speed as a function of forecast hour for the 69 initialization times listed in Table 3. (c),(d) As in (a),(b), but for the bias. (e),(f) As in (a),(b), but for cases where the best-track intensity is category 2 or less. Error bars denote the 5% and 95% percentiles determined from bootstrap resampling.
steering wind magnitude is computed for each forecast initialization and lead time. The perturbation steering wind is found by calculating the 200–850-hPa layer-average zonal and meridional wind component, averaging over all grid points within 300 km of the member’s TC position, subtracting the ensemble-mean value for that particular time, and averaging the magnitude of the resulting perturbation wind vectors over all members. Figure 11 shows that, on average, the perturbation wind
amplitude increases at all lead times, doubling in 48 h. Yamaguchi and Majumdar (2010) performed a similar calculation with the 500-hPa winds in two different operational ensembles. In the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble, the perturbation 500-hPa wind increases during the first 12 h but is roughly constant for longer lead times. In contrast, the magnitude of the perturbation wind remained fairly steady in the GFS ensemble at all forecast
FIG. 10. (a) The RMS error in the ensemble-mean TC position (solid) and ensemble standard deviation (dashed) as a function of forecast hour. (b) As in (a), but for the 500-hPa zonal wind south of 30°N and east of 95°W verified against rawinsonde data. (c),(d) As in (a), but for the TC minimum SLP and maximum wind, respectively. In (c) and (d), the thick lines refer to all forecasts, while the thin lines are for lead times where the best-track intensity is category 2 or less. Error bars denote the 5% and 95% percentiles determined from bootstrap resampling.
hours. Overall, this suggests that, unlike the NCEP and ECMWF ensembles, the EnKF ensemble has consistent error growth in the perturbation steering wind, even though the ensemble track forecasts are spread deficient. Even after removing the bias in TC minimum SLP and maximum wind speed forecasts, the ensemble still does not contain the appropriate variance for these two metrics, even at 0 h (Figs. 10c,d). Whereas the RMS error in these two quantities increases by roughly 25% during the 72-h forecast, the ensemble standard deviation decreases during the first 12 h and then slowly increases thereafter. Given that this ensemble suffers from smaller biases for nonmajor TCs, the RMS error and standard deviation are recomputed for only those times. When only these lead times are considered, the RMS error is reduced by 30% relative to considering all TCs, while the standard deviation is indistinguishable. As a consequence, the mismatch between the
ensemble error and variance is not solely attributable to resolution-related biases.
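The comparison between bias-removed RMS error and ensemble spread used throughout this subsection can be sketched for a scalar quantity such as minimum SLP. The sketch assumes the bias is estimated as the mean ensemble-mean error over all cases at one lead time and that the spread is the square root of the case-averaged ensemble variance; the function name and array layout are hypothetical, and observation error is ignored.

```python
import numpy as np


def spread_and_rmse(ens_fcst, verif):
    """Compare bias-removed RMS error with ensemble spread at one lead time.

    ens_fcst : array of shape (n_cases, n_members) of ensemble forecasts
    verif    : array of shape (n_cases,) of verifying values
    Returns (rmse, spread); a well-calibrated ensemble has rmse ~ spread.
    """
    ens_fcst = np.asarray(ens_fcst, dtype=float)
    verif = np.asarray(verif, dtype=float)

    # Ensemble-mean error for each case.
    err = ens_fcst.mean(axis=1) - verif

    # Remove the lead-time-dependent bias so only the random error remains.
    err_unbiased = err - err.mean()
    rmse = np.sqrt(np.mean(err_unbiased**2))

    # Average the ensemble variance over cases before taking the square root.
    spread = np.sqrt(np.mean(ens_fcst.var(axis=1, ddof=1)))
    return rmse, spread
```

Calling the function once per lead time traces out curves comparable in spirit to those in Fig. 10; a spread that falls well below the RMS error is the signature of a variance-deficient ensemble.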
6. Control forecasts
Although the verification in the previous section suggests that the WRF EnKF track and intensity forecasts are not as skillful as operational models, it is difficult to determine whether this is a consequence of the initial conditions or model formulation. To address these two possibilities, all 69 forecasts are repeated where WRF is initialized with either GFS (WRF-GFS) or GFDL (WRF-GFDL) initial conditions, but with the same lateral boundary conditions and model formulation. Differences between the WRF-GFS or WRF-GFDL and SM forecasts result solely from initial condition errors, while differences between the GFS (GFDL) and WRF-GFS (WRF-GFDL) can be attributed to model errors.
FIG. 11. Perturbation 200–850-hPa layer-average wind speed averaged over all grid points within 300 km of the TC center and all ensemble members as a function of lead time for the 69 initialization times listed in Table 3.
Figure 12 shows that the SM, WRF-GFS, and WRF-GFDL track errors are similar during the first 36 h, while the SM has lower errors thereafter. By 72 h, the SM position error is 305 km, compared to 370 (380) km for GFDL (GFS) initial conditions. In addition, the WRF-GFS (WRF-GFDL) track error is 54% (65%) larger than the GFS (GFDL) model (cf. Fig. 4), which indicates that this configuration of WRF has nontrivial model errors. Moreover, the WRF-GFDL (WRF-GFS) has a 105 (82)-km westward position bias by 48 h (not shown). The larger error in WRF-GFS and WRF-GFDL TC position forecasts is likely a symptom of errors in the large-scale wind fields in the tropics. Figure 13 shows that the WRF-GFS and WRF-GFDL 500-hPa vector wind errors are similar to the SM forecast during the first 36 h but become larger thereafter.
For both TC minimum SLP and maximum wind speed, the SM forecast has similar skill to the GFS and GFDL initialized forecasts, even though the WRF EnKF initial conditions have lower intensity errors at 0 h (Figs. 12b,c). Moreover, the minimum SLP and maximum wind speed biases are similar for both models after initialization (not shown). This result suggests that TC intensity forecasts on this domain are limited by model-related errors, and in particular, the grid resolution.
Finally, the SM forecast appears to provide a better estimate of TC size compared to the other two WRF simulations (Fig. 12d). In general, the WRF-GFDL (WRF-GFS) 34-kt wind radii error is 10 (20) km larger than SM.³ Furthermore, the bias in WRF-GFS and WRF-GFDL wind radii forecasts is −30 km at 12 h, but subsequently decreases in magnitude, which indicates that the TC wind field in these forecasts is broadening with time.
7. Summary and conclusions
This manuscript describes the performance of a cycling mesoscale EnKF system during six different periods that include the life cycle of 10 TCs during 2005 and 2007. This EnKF assimilates conventional in situ data, TC position, and minimum SLP every 6 h for a total of 91 days. For the 69 analysis times chosen for the HRH test, all 96 ensemble analyses are integrated forward up to 72 h and validated against best-track data. The resulting forecast errors are compared to operational models and WRF forecasts on the same domain initialized from other analyses. On average, observation assimilation improves the TC position; however, the same is true for TC minimum SLP only at TD and TS intensity. For major TCs, this configuration of WRF cannot resolve the actual TC minimum SLP or maximum wind speed, thus the TC minimum SLP observation is often rejected by the quality control algorithm during these analysis times. Although one could bypass the quality control algorithm for TC minimum SLP observations, supplemental tests indicate that forcing the EnKF to use the TC minimum SLP when it would otherwise be rejected has little impact beyond a few hours. While the initial intensity of the storm is closer to observations, the TC subsequently weakens once the model is integrated forward, resulting in larger acoustic and gravity wave activity. At all categories, the ensemble prior and posterior TC position estimates contain the appropriate amount of variance, while the TC intensity estimates are variance deficient at all intensities. Over the 69 initialization times used here, the SM position forecast has approximately 12 h less skill compared to other operational forecasts. A portion of the increased track errors can be attributed to biases in the subtropical anticyclone in the WRF EnKF analysis. As a consequence, TCs in the SM forecast move faster than the actual storm and the ensemble position forecasts are characterized by a lack of variance relative to the RMS error. 
At this point, it is not clear what the source of this wind bias is, although it could be related to a recently
³ The 0-h 34-kt wind radii error is not computed for the WRF-GFS and WRF-GFDL forecasts because the model does not calculate 10-m winds until after the first time step.
FIG. 12. Mean-absolute error in (a) TC position, (b) minimum SLP, (c) maximum wind speed, and (d) 34-kt wind radii in each of the four quadrants as a function of forecast hour for the 69 initialization times listed in Table 3. The solid line is the WRF forecast initialized from an EnKF analysis, the dashed line is the WRF forecast initialized from the GFS analysis, and the dotted–dashed line is the WRF forecast initialized from the GFDL analysis. The thin lines in (d) indicate the bias. Error bars denote the 5% and 95% percentiles determined from bootstrap resampling.
discovered stratospheric temperature bias that results from the assumed temperature profile above the model top. Satellite wind observations, which are the only source of data in the free troposphere over the ocean, attempt to correct this problem; however, their impact is limited to within 200 hPa of the observation location and by relatively large observation errors. Other types of observations, such as global positioning system (GPS) refractivity profiles (e.g., Anthes et al. 2008), could provide additional data over the ocean but were not available during 2005. Moreover, most operational centers use satellite radiance data to improve analyses over the ocean, but these data are not used here because they would require sophisticated bias-correction algorithms (e.g., Dee 2005). Nevertheless, these results show that model biases seemingly unrelated to TCs can degrade the skill of TC forecasts. Although the model grid spacing is relatively coarse for TCs, this EnKF system provides skillful TC intensity
and size forecasts for weaker storms. In particular, the SM TC minimum SLP and maximum wind speed error is lower than GFDL forecasts for nonmajor TCs. It is not clear why the EnKF-initialized forecast has more skill in these situations, though it could be related to the analysis TC structure, including the TC size. For weak TCs, the EnKF initial conditions have lower 34-kt wind radii errors relative to GFDL and subsequently had smaller minimum SLP and maximum wind speed errors. In addition, TC initialization schemes assume a vertically aligned vortex; however, this is not always appropriate because TCs subject to large vertical shear are often vertically tilted (e.g., Jones 1995; Black et al. 2002). In contrast, the EnKF does not place any such constraints on the TC structure and thus the storm does not have to adjust to the correct structure during the first few hours of the forecast; future work will evaluate the validity of this hypothesis. The ensemble intensity forecasts are characterized by a lack of variance compared to the ensemble-mean
error, which could result from resolution-related biases or from other sources, such as each member using the same prescribed sea surface temperature field. Similar to the atmosphere, there is uncertainty in both the ocean surface temperature and ocean dynamics, which is not accounted for here. At this point, it is unclear how to obtain an ensemble of ocean states without employing a coupled ocean model. In addition, one could also increase the variance in intensity forecasts by adding uncertainty in parameterized processes (e.g., Reynolds et al. 2008).
The EnKF forecasts are also compared to WRF forecasts on the same domain initialized from the GFS and GFDL analyses. For TC track forecasts beyond 36 h and TC wind radii, the EnKF-initialized forecast has the lowest errors among these three forecasts, while TC track prior to 36 h and intensity forecasts are similar. It is not clear why the EnKF-initialized forecast has better wind radii forecasts because this system does not include any direct observations of the TC wind field. One possibility is that assimilating observations with an EnKF has an indirect benefit to wind radii forecasts; this is the subject of future work. It is worthwhile to note that the GFS- and GFDL-initialized WRF forecasts have larger errors relative to the parent model, which indicates that this configuration of WRF is characterized by nontrivial model errors.
Finally, the assimilation system described here uses a consistent set of model settings and observations; accordingly, the ensemble analysis and forecast data provide a unique dataset for studying TC predictability via ensemble-based sensitivity analysis (e.g., Hakim and Torn 2008; Torn and Hakim 2008a). Future work will use this output to determine the dynamical processes that limit the predictability of TC track and intensity forecasts during certain times.
FIG. 13. The RMS error in the 500-hPa vector wind field south of 30°N and east of 95°W verified against rawinsonde data for the 69 initialization times listed in Table 3. The solid line is the SM forecast, the dashed line is the WRF forecast initialized from the GFS analysis, and the dotted–dashed line is the WRF forecast initialized from the GFDL analysis. The error bars are smaller than the width of the line at all times.
Acknowledgments. I would like to thank Jeff Anderson, Chris Davis, Greg Holland, Chris Snyder (NCAR), Frank Marks, and Sim Aberson (AOML) for feedback on this work. Vijay Tallapragada (NCEP) made available HWRF forecasts from 2005. Satellite wind data were obtained from Chris Velden and Dave Stettner (CIMSS/SSEC). Processed dropsonde data were provided by NOAA/Hurricane Research Division of AOML. Three anonymous reviewers helped to significantly improve this manuscript. This work is supported by the College of Arts and Sciences at the University at Albany, State University of New York and the NOAA Hurricane Forecast Improvement Project.
REFERENCES
Aberson, S. D., 1998: Five-day tropical cyclone track forecasts in the North Atlantic basin. Wea. Forecasting, 13, 1005–1015.
——, 2002: Two years of operational hurricane synoptic surveillance. Wea. Forecasting, 17, 1101–1110.
——, 2008: Large forecast degradations due to synoptic surveillance during the 2004 and 2005 hurricane seasons. Mon. Wea. Rev., 136, 3138–3150.
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.
——, 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus, 61A, 72–83.
——, T. Hoar, K. Raeder, H. Liu, N. Collins, R. Torn, and A. Arellano, 2009: The Data Assimilation Research Testbed: A community facility. Bull. Amer. Meteor. Soc., 90, 1283–1296.
Anthes, R. A., and Coauthors, 2008: The COSMIC/FORMOSAT-3 mission: Early results. Bull. Amer. Meteor. Soc., 89, 313–333.
Barker, D. M., W. Huang, Y. R. Guo, A. J. Bourgeois, and Q. N. Xiao, 2004: A three-dimensional variational data assimilation system for MM5: Implementation and initial results. Mon. Wea. Rev., 132, 897–914.
Bender, M. A., I. Ginis, R. Tuleya, B. Thomas, and T. Marchok, 2007: The operational GFDL coupled hurricane–ocean prediction system and a summary of its performance. Mon. Wea. Rev., 135, 3965–3989.
Beven, J. L., and Coauthors, 2008: Atlantic hurricane season of 2005. Mon. Wea. Rev., 136, 1109–1173.
——, L. A. Avila, E. S. Blake, E. S. Cobb, and R. J. Pasch, 2010: Comments on "Structure and evolution of a possible U.S. landfalling tropical storm in 2006." Mon. Wea. Rev., 138, 279–281.
Black, M. L., J. F. Gamache, F. D. Marks Jr., C. E. Sumsury, and H. E. Willoughby, 2002: Eastern Pacific Hurricanes Jimena of 1991 and Olivia of 1994: The effect of vertical wind shear on structure and intensity. Mon. Wea. Rev., 130, 2291–2312.
Chan, J. C. L., and W. M. Gray, 1982: Tropical cyclone movement and surrounding flow relationships. Mon. Wea. Rev., 110, 1354–1374.
Chen, Y., and C. Snyder, 2006: Initializing a hurricane vortex with an ensemble Kalman filter. Preprints, 27th Conf. on Hurricanes and Tropical Meteorology, Monterey, CA, Amer. Meteor. Soc., 8A.5. [Available online at http://ams.confex.com/ams/27Hurricanes/techprogram/paper_108770.htm.]
——, and ——, 2007: Assimilating vortex position with an ensemble Kalman filter. Mon. Wea. Rev., 135, 1828–1845.
Chou, K.-H., and C.-C. Wu, 2008: Typhoon initialization in a mesoscale model-combination of the bogused vortex and the dropwindsonde data in DOTSTAR. Mon. Wea. Rev., 136, 865–879.
Davis, C., and Coauthors, 2008: Prediction of landfalling hurricanes with the Advanced Hurricane WRF model. Mon. Wea. Rev., 136, 1990–2005.
——, W. Wang, J. Dudhia, and R. Torn, 2010: Does increased horizontal resolution improve hurricane wind forecasts? Wea. Forecasting, 25, 1826–1841.
Dee, D. P., 2005: Bias and data assimilation. Quart. J. Roy. Meteor. Soc., 131, 3323–3343.
Dirren, S., R. D. Torn, and G. J. Hakim, 2007: A data assimilation case-study using a limited-area ensemble Kalman filter. Mon. Wea. Rev., 135, 1455–1473.
Dowell, D. C., F. Zhang, L. J. Wicker, C. Snyder, and N. A. Crook, 2004: Wind and temperature retrievals in the 17 May 1981 Arcadia, Oklahoma, supercell: Ensemble Kalman filter experiments. Mon. Wea. Rev., 132, 1982–2005.
Dudhia, J., 1989: Numerical study of convection observed during the Winter Monsoon Experiment using a mesoscale two-dimensional model. J. Atmos. Sci., 46, 3077–3107.
Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757.
Hakim, G. J., and R. D. Torn, 2008: Ensemble synoptic analysis. Synoptic-Dynamic Meteorology and Weather Analysis and Forecasting: A Tribute to Fred Sanders, Meteor. Monogr., No. 55, Amer. Meteor. Soc., 147–161.
Holland, G. J., 1983: Tropical cyclone motion: Environmental interaction plus beta effect. J. Atmos. Sci., 40, 328–342.
Hong, S. Y., J. Dudhia, and S. H. Chen, 2004: A revised approach to ice microphysical processes for the bulk parameterization of clouds and precipitation. Mon. Wea. Rev., 132, 103–120.
——, Y. Noh, and J. Dudhia, 2006: A new vertical diffusion package with an explicit treatment of entrainment processes. Mon. Wea. Rev., 134, 2318–2341.
Houtekamer, P. L., H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, 2005: Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Mon. Wea. Rev., 133, 604–620.
Jones, S. C., 1995: The evolution of vortices in vertical shear. I: Initially barotropic vortices. Quart. J. Roy. Meteor. Soc., 121, 821–851.
Kain, J. S., and J. M. Fritsch, 1990: A one-dimensional entraining detraining plume model and its application in convective parameterization. J. Atmos. Sci., 47, 2784–2802.
Kaplan, J., and M. DeMaria, 2003: Large-scale characteristics of rapidly intensifying tropical cyclones in the North Atlantic basin. Wea. Forecasting, 18, 1093–1108.
Knaff, J. A., M. DeMaria, B. Sampson, and J. M. Gross, 2003: Statistical, 5-day tropical cyclone intensity forecasts derived from climatology and persistence. Wea. Forecasting, 18, 80–92.
Kurihara, Y., M. A. Bender, R. E. Tuleya, and R. J. Ross, 1995: Improvements in the GFDL hurricane prediction system. Mon. Wea. Rev., 123, 2791–2801.
Li, J., and H. Liu, 2009: Improved hurricane track and intensity forecast using single field-of-view advanced IR sounding measurements. Geophys. Res. Lett., 36, L11813, doi:10.1029/2009GL038285.
Liu, Q., T. Marchok, H.-L. Pan, M. Bender, and S. J. Lord, 2000: Improvements in hurricane initialization and forecasting at NCEP with global and regional (GFDL) models. Tech. Rep., NOAA Tech. Procedures Bull. 472, 7 pp.
Magnusson, L., M. Leutbecher, and E. Kallen, 2008: Comparison between singular vectors and breeding vectors as initial perturbations for the ECMWF Ensemble Prediction System. Mon. Wea. Rev., 136, 4092–4104.
Meng, Z., and F. Zhang, 2008: Test of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part IV: Comparison with 3DVar in a month-long experiment. Mon. Wea. Rev., 136, 3671–3682.
Mlawer, E. J., S. J. Taubman, P. D. Brown, M. J. Iacono, and S. A. Clough, 1997: Radiative transfer for inhomogeneous atmosphere: RRTM, a validated correlated-k model for the long-wave. J. Geophys. Res., 102, 16 663–16 682.
Moyer, A. C., J. L. Evans, and M. Powell, 2007: Comparison of observed gale radius statistics. Meteor. Atmos. Phys., 97, 41–55.
Peng, M. S., J. A. Ridout, and T. F. Hogan, 2004: Recent modifications of the Emanuel convective scheme in the Navy operational global atmospheric prediction system. Mon. Wea. Rev., 132, 1254–1268.
Puri, K., J. Barkmeijer, and T. N. Palmer, 2001: Ensemble prediction of tropical cyclones using targeted diabatic singular vectors. Quart. J. Roy. Meteor. Soc., 127, 709–731.
Rappaport, E. N., and Coauthors, 2009: Advances and challenges at the National Hurricane Center. Wea. Forecasting, 24, 395–419.
Reynolds, C. A., J. Teixeira, and J. G. McLay, 2008: Impact of stochastic convection on the ensemble transform. Mon. Wea. Rev., 136, 4517–4526.
Rogers, R., and Coauthors, 2006: The Intensity Forecasting Experiment. Bull. Amer. Meteor. Soc., 87, 1523–1537.
Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers, 2005: A description of the Advanced Research WRF Version 2. Tech. Rep. 468+STR, National Center for Atmospheric Research, 88 pp.
Szunyogh, I., E. J. Kostelich, G. Gyarmati, E. Kalnay, B. R. Hunt, E. Ott, E. Satterfield, and J. A. Yorke, 2008: A local ensemble transform Kalman filter data assimilation system for the NCEP global model. Tellus, 60, 113–130.
Torn, R. D., and G. J. Hakim, 2008a: Ensemble-based sensitivity analysis. Mon. Wea. Rev., 136, 663–677.
——, and ——, 2008b: Performance characteristics of a pseudo-operational ensemble Kalman filter. Mon. Wea. Rev., 136, 3947–3963.
——, and ——, 2009: Ensemble data assimilation applied to RAINEX observations of Hurricane Katrina (2005). Mon. Wea. Rev., 137, 2817–2829.
——, ——, and C. Snyder, 2006: Boundary conditions for limited-area ensemble Kalman filters. Mon. Wea. Rev., 134, 2490–2502.
Velden, C., and Coauthors, 2005: Recent innovations in deriving tropospheric winds from meteorological satellites. Bull. Amer. Meteor. Soc., 86, 205–223.
Whitaker, J. S., T. M. Hamill, X. Wei, Y. Song, and Z. Toth, 2008: Ensemble data assimilation with the NCEP Global Forecast System. Mon. Wea. Rev., 136, 463–482.
Yamaguchi, M., and S. J. Majumdar, 2010: Using TIGGE data to diagnose initial perturbations and their growth for tropical cyclone ensemble forecasts. Mon. Wea. Rev., 138, 3634–3655.
Zhang, F., Z. Weng, Z. Meng, J. A. Sippel, and C. H. Bishop, 2009: Cloud-resolving hurricane initialization and prediction through assimilation of Doppler radar observations with an ensemble Kalman filter. Mon. Wea. Rev., 137, 2105–2125.
Zou, X., and Q. Xiao, 2000: Studies on the initialization and simulation of a mature hurricane using a variational bogus data assimilation scheme. J. Atmos. Sci., 57, 836–860.