3560
JOURNAL OF CLIMATE
VOLUME 16
Comparison of Modeled and Observed Trends in Indices of Daily Climate Extremes

DMITRY KIKTEV
Hydrometeorological Centre of Russia, Moscow, Russia

DAVID M. H. SEXTON, LISA ALEXANDER, AND CHRIS K. FOLLAND
Hadley Centre for Climate Prediction and Research, Met Office, Bracknell, Berkshire, United Kingdom

(Manuscript received 10 July 2002, in final form 11 March 2003)

ABSTRACT

Gridded trends of annual values of various climate extreme indices were estimated for 1950 to 1995, presenting a clearer picture of the patterns of trends in climate extremes than has been seen with raw station data. The gridding also allows one, for the first time, to compare these observed trends with those simulated by a suite of climate model runs forced by observed changes in sea surface temperatures, sea ice extent, and various combinations of human-induced forcings. Bootstrapping techniques are used to assess the uncertainty in the gridded trend estimates and the field significance of the patterns of observed trends. The findings mainly confirm earlier, less objectively derived, results based on station data. There have been significant decreases in the number of frost days and increases in the number of very warm nights over much of the Northern Hemisphere. Regions of significant increases in rainfall extremes and decreases in the number of consecutive dry days are smaller in extent. However, patterns of trends in annual maximum 5-day rainfall totals were not significant. Comparisons of the observed trend estimates with those simulated by the climate model indicate that the inclusion of anthropogenic effects in the model integrations, in particular increasing greenhouse gases, significantly improves the simulation of changing extremes in temperatures. This analysis provides good evidence that human-induced forcing has recently played an important role in extreme climate. The model shows little skill in simulating changing precipitation extremes.
1. Introduction The Second Assessment Report of the Intergovernmental Panel on Climate Change (IPCC) concluded that, although several regions had experienced local changes in climate extremes, there were inadequate data to say anything about global changes (Houghton et al. 1995, 132–192 and 289–357). But given the potentially devastating effects of changing extremes, the Third Assessment Report made a concerted effort to obtain a more global picture of these climate indicators (Folland et al. 2001). Even so, the daily data required for such analyses are still far from globally complete, partly due to the reluctance of national meteorological services to part with such a potentially valuable commodity. Despite the problems, the most comprehensive studies so far, notably Groisman et al. (1999) and Frich et al. (2002), did find evidence for significant large-scale changes in temperature and precipitation extremes during the latter half of the twentieth century. Corresponding author address: David Sexton, Hadley Centre, Met Office, London Road, Bracknell, Berkshire, RG12 2SY, United Kingdom. E-mail:
[email protected]
© 2003 American Meteorological Society
These studies found that, in some parts of the world, sharp contrasts in local climate extremes between neighboring stations make it difficult to present coherent results. To overcome this problem, Frich et al. (2002) only analyzed a subset of stations that provided a relatively uniform network. An alternative approach, which we use here, is to interpolate the extremes data at all available stations onto a regular latitude–longitude grid. This has the additional advantage of enabling us to compare observed trends in extremes with climate model simulations of changing twentieth-century climate extremes and test the hypothesis proposed by Frich et al. (2002) that these changes were likely to be anthropogenic in origin. In this study, we use three ensembles of an atmosphere-only GCM forced by observed SSTs and a combination of anthropogenic forcings. Such model integrations can be used to test whether the observed changes can be simulated by observed SSTs alone (Folland et al. 1998). If the inclusion of anthropogenic effects in the atmosphere-only GCM (AGCM) is required to reproduce the patterns of observed trends with significant skill, then this provides good evidence that human influences have played an important role in recent climate
15 NOVEMBER 2003
KIKTEV ET AL.
change. However, these experiments cannot be used to rule out a human-induced influence on climate because there is already an anthropogenic effect in the observed SSTs.

First, we discuss the observational and model data and the method of gridding the observed data in section 2. Section 3 describes the methodology for estimating the uncertainty of observed and modeled trend estimates. The technique also improves upon Frich et al. (2002) in assessing whether the patterns of observed trends in extremes, as a whole, are significant by taking into account the spatial interdependence of extremes at the stations. This section also discusses the method of comparison of the observed and modeled trend patterns. Sections 4 and 5 discuss the results and the conclusions.

2. Data

a. Derived indices from observed and modeled data

We chose to analyze six indices from the updated dataset of Frich et al. (2002), omitting those which had poor coverage or very complex statistical properties. The ones chosen were as follows:

• Frost days (FD): The annual number of days when the absolute minimum temperature is less than 0°C.
• Very warm nights (Tn90): The percentage of time in a year when daily minimum temperature is above the 90th percentile of the 1961–90 daily temperature distribution, calculated at the specific stations. Distributions are calculated from 5-day windows centered on each calendar day in the 1961–90 period.
• Consecutive dry days (CDD): The annual maximum number of consecutive days when daily precipitation was less than 1 mm.
• Simple daily intensity index (SDII): Total annual precipitation divided by the number of days with greater than or equal to 1 mm of precipitation.
• Maximum 5-day precipitation (R5D): The annual maximum consecutive 5-day precipitation total.
• Wet days (R10): The number of days in a year with precipitation ≥ 10 mm.

We chose to analyze the period from 1950 to 1995 due to a dramatic drop in the data coverage after 1995.
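As an illustration only, the precipitation and frost-day indices listed above could be computed for a single station-year roughly as follows. This is a minimal sketch, not the code used to build the dataset: the function names are hypothetical, the inputs are assumed to be complete daily series in °C and mm, and SDII is taken here to count days with at least 1 mm of precipitation. Tn90 is left out because it needs the 1961–90 percentile climatology.

```python
import numpy as np

WET_DAY_MM = 1.0  # threshold separating dry and wet days (assumed for CDD and SDII)


def frost_days(tmin_c):
    """FD: number of days with daily minimum temperature below 0 deg C."""
    return int(np.sum(np.asarray(tmin_c, dtype=float) < 0.0))


def consecutive_dry_days(precip_mm):
    """CDD: longest run of consecutive days with precipitation < 1 mm."""
    longest = current = 0
    for dry in np.asarray(precip_mm, dtype=float) < WET_DAY_MM:
        current = current + 1 if dry else 0
        longest = max(longest, current)
    return longest


def simple_daily_intensity(precip_mm):
    """SDII: total precipitation divided by the number of days with >= 1 mm."""
    p = np.asarray(precip_mm, dtype=float)
    n_wet = int(np.sum(p >= WET_DAY_MM))
    return float(p.sum() / n_wet) if n_wet > 0 else float("nan")


def max_5day_precip(precip_mm):
    """R5D: largest precipitation total over any 5 consecutive days."""
    p = np.asarray(precip_mm, dtype=float)
    return float(np.convolve(p, np.ones(5), mode="valid").max())


def wet_days_r10(precip_mm):
    """R10: number of days with precipitation >= 10 mm."""
    return int(np.sum(np.asarray(precip_mm, dtype=float) >= 10.0))


# Toy ten-day record (mm), used only to exercise the functions.
sample = [0, 0, 12, 0.5, 0, 0, 0, 3, 20, 0]
indices = {
    "CDD": consecutive_dry_days(sample),
    "SDII": simple_daily_intensity(sample),
    "R5D": max_5day_precip(sample),
    "R10": wet_days_r10(sample),
}
```

Each function takes one year of daily values, so building an annual index series amounts to applying it year by year.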
Trends in the indices estimated over this period used the ordinary least squares method and required at least 35 years of data to be present. It should be noted that after this paper was accepted, some inconsistencies were discovered in the way different data centers had calculated R5D. Therefore, the trends have been recalculated for this indicator using the correct algorithm. Although this made little difference to our results, it has highlighted the pitfalls of exchanging indicators rather than raw data. Other indicators were not affected. Frich et al. (2002) analyzed four other indices: heatwave duration index (HWDI), extreme rainfall (R95T),
growing season length (GSL), and extreme temperature range (ETR). These have been omitted from our paper for a number of reasons. First, we found that there were inconsistencies in the way R95T was calculated by different national meteorological services and although this error was rectified before the Frich et al. paper went to press, it was too late for us to include it in our analysis. Second, GSL was omitted because it is not strictly a “global” indicator as it is biased toward midlatitudes. ETR was omitted because, although it may be indicative of anthropogenic climate change, there may also be several other physical reasons for changes in this index. We therefore deemed that ETR requires further investigation, which was beyond the scope of this paper, to show whether ETR is indeed a suitable indicator of anthropogenic climate change. Finally, HWDI was omitted because this particular index can only take the values 0 (i.e., no heat waves in a year) or a number greater than 5 (at least six heat waves in a year), and as a result has very poor statistical properties. An unfortunate consequence of omitting HWDI is that we do not analyze any indicators based solely on maximum daily temperatures; such a study will have to wait until a new derived index of the raw daily data is available.

b. Spatial statistical structure of observed derived climate indices

A major part of this study was to place the station index data on a regular latitude–longitude grid. As one purpose of the gridding was to enable comparisons between observed and modeled data, we used the same grid as the climate model, which is 2.5° latitude by 3.75° longitude. We chose to grid the data in four distinct regions: Northern Hemisphere extratropics (30°–85°N), northern Tropics (0°–30°N), Australia (south from 12°S), and South Africa (south from 20°S for precipitation only). Before we gridded the station index data, we had to better understand the statistical structure of the derived climate indices.
For each pair of stations within a region, the correlation between the values of each given selected index was calculated and this value was binned according to the distance between the two stations in intervals of 100 km. The region-mean correlation was estimated for each 100-km interval and an exponential decay function, c(l) [Eq. (1)], was fitted to these mean values using least squares estimation:

$$c(l) = A\,e^{-l/L_0}. \tag{1}$$
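A fit of Eq. (1) to the binned mean correlations might be sketched as below. The text says only that least squares was used; this sketch assumes a straight-line least squares fit to log c(l), which is a simplification and requires positive mean correlations. The function name `fit_decorrelation` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_decorrelation(bin_centres_km, mean_corr):
    """Fit c(l) = A * exp(-l / L0) to binned mean correlations.

    Assumption of this sketch: the fit is done on log(c), so only bins
    with positive mean correlation are used.
    """
    l = np.asarray(bin_centres_km, dtype=float)
    c = np.asarray(mean_corr, dtype=float)
    keep = c > 0.0
    # log c = log A - l / L0, i.e. a straight line in l.
    slope, intercept = np.polyfit(l[keep], np.log(c[keep]), 1)
    return float(np.exp(intercept)), float(-1.0 / slope)  # (A, L0 in km)


# Synthetic check with 100-km bins and known parameters.
centres = np.arange(50.0, 2050.0, 100.0)
A_fit, L0_fit = fit_decorrelation(centres, 0.8 * np.exp(-centres / 500.0))
```

A nonlinear least squares fit applied directly to c(l) would weight the short-distance bins more heavily and could give somewhat different parameters on noisy data.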
Here l is the distance between stations, and $L_0$ is the e-folding distance, called the decorrelation length scale. As l tends to zero, c(l) tends to some value A < 1. This value A measures the random observational errors and small-scale features that are independent of covariability between the stations. Separate estimates of A and $L_0$ were made for each of the four regions.

TABLE 1. Parameters A and $L_0$ (km) for spatial autocorrelation function [Eq. (1)] for annual derived extremes indices and their trends for Northern Hemisphere extratropics.

         Annual data          Trends
Index    A       L_0 (km)     A       L_0 (km)
FD       0.74    1277         0.81    854
Tn90     0.61     837         0.86    929
CDD      0.45     534         0.84    663
SDII     0.63     273         0.83    562
R5D      0.63     219         0.82    537
R10      0.70     468         0.87    620

Table 1 indicates a lack of coherence between neighboring stations for some precipitation indices, in particular R5D and SDII. Therefore, we decided to grid trend estimates of the indices based on annual values rather than the annual values of the indices themselves, as we expected low-frequency changes in climate extremes to have larger spatial scales. To confirm this, we generated 1000 “plausible” fields of station index trend values using a bootstrap procedure (see the appendix). These 1000 fields provide 1000-point distributions of trends in indices at each station. The estimate of the decorrelation function, c(l), for station trends was based on the correlations between the 1000-point trend distributions of each pair of stations, binned in 100-km intervals. Table 1 shows that the decorrelation length scale is mostly greater for trend index data than annual index data (FD is an exception); therefore, it is better to grid the trend data.

We gridded the station trends using a modified version of Shepard’s angular-distance weighting (ADW) algorithm (Shepard 1984). New et al. (2000) advocated this method rather than other approaches because of its flexibility when gridding irregularly spaced station data. The appendix describes the gridding procedure in detail.

c. Model experiments and derivation of indices

The experimental design is based on the same principles used in earlier AGCM climate change studies (Folland et al. 1998; Sexton et al. 2001). The atmosphere-only GCM, HadAM3 (Pope et al. 1999), is a gridpoint model with a horizontal resolution of 2.5° latitude by 3.75° longitude and 19 sigma levels in the vertical. One important parameterization scheme in HadAM3 is the two-stream Edwards–Slingo radiation scheme (Edwards and Slingo 1996), which resolves the individual radiative effects of the greenhouse gases. Ensembles of AGCM integrations were run so that individual ensemble members differed only in their initial atmospheric and land surface conditions.
SSTNAT is the control experiment, forced by the observed record of SST and sea ice extent using version 3.1 of the Global Sea Ice and Sea Surface Temperature dataset (GISST3.1: much as described in Rayner et al. 1996). SSTNAT is a six-member ensemble, which includes the effects of two naturally occurring external forcings: changing solar output and stratospheric aerosols resulting from volcanic eruptions. GSOT, which has four members, is like SSTNAT but includes additional radiative forcings: increasing well-mixed greenhouse gases (G), direct sulphate aerosol effect (S) that increases the backscattering of incoming shortwave radiation, stratospheric ozone loss (O), and increasing tropospheric ozone (T). GSOTI includes the indirect aerosol effect (I), which increases cloud albedo, in addition to these other anthropogenic radiative forcings. Both the natural and anthropogenic forcings and how the time profiles of their global mean radiative forcing are estimated are described in detail in Johns et al. (2003).

Indices were derived in the same way as the observed data for each model integration using daily precipitation and daily maximum and minimum temperature, which were calculated as the highest and lowest of the 48 half-hourly simulated values beginning at midnight.

3. Methodology

a. Estimation of trend uncertainty

Climate extreme indicators are generally non-Gaussian, which prohibits the use of standard theoretical results for the estimation of the uncertainties in their linear trends. To overcome this problem, we used a bootstrap technique (see the appendix) to estimate the statistical significance of the observed trend values at each grid point, thereby avoiding any erroneous assumptions that may invalidate the hypothesis tests. The bootstrapping procedure aims to use the observational data to construct a distribution of trends at each station that could have plausibly occurred in the real world. To provide a gridded pattern of plausible observed trends, the set of trend values at each station was gridded using the same combination of search radius R and decorrelation length scale L as the actual station trend estimates (see the appendix). A similar bootstrapping procedure was used to generate a set of 1000 modeled trend patterns for each ensemble mean to assess the significance of the simulated trends in climate extremes.
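The idea of a bootstrapped trend distribution can be sketched for a single time series as follows. The details of the authors' procedure are in the appendix, which is not reproduced here, so this is an assumed variant: residuals about the ordinary least squares line are resampled in blocks of `block_len` consecutive years and added back to the fitted line. The function names, the default block length and the synthetic frost-day series are all hypothetical.

```python
import numpy as np


def ols_trend(years, values):
    """Ordinary least squares slope of values against years."""
    return float(np.polyfit(np.asarray(years, dtype=float), values, 1)[0])


def bootstrap_trends(years, values, n_boot=1000, block_len=2, seed=0):
    """Distribution of plausible trends from block-resampled residuals."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(years, values, 1)
    fitted = slope * years + intercept
    resid = values - fitted
    n = years.size
    n_blocks = -(-n // block_len)  # ceiling division
    rng = np.random.default_rng(seed)
    trends = np.empty(n_boot)
    for b in range(n_boot):
        # Draw random block start years, then string the blocks together.
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)).ravel()[:n]
        trends[b] = ols_trend(years, fitted + resid[idx])
    return trends


# Synthetic annual frost-day counts with a built-in downward trend.
years = np.arange(1950, 1996)
noise = np.random.default_rng(1).normal(0.0, 3.0, years.size)
frost = 100.0 - 0.3 * (years - 1950) + noise
dist = bootstrap_trends(years, frost)
lo, hi = np.percentile(dist, [2.5, 97.5])
# The trend is locally significant at the 5% level if zero lies outside [lo, hi].
```

Setting `block_len=1` reduces this to resampling individual years, which ignores any year-to-year autocorrelation in the residuals.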
To account for autocorrelation we used a bootstrap technique that resampled blocks of consecutive years rather than individual years (see the appendix). The same value of block length, C, was chosen for the model analysis. Values of C = 2 were adequate for FD, R10, CDD, R5D, and Tn90, while values of C = 3 were required for the SDII indicator. The sensitivity of our results to the choice of C will be discussed in section 4b.

b. Field significance of observed trends

Livezey and Chen (1983) showed how to assess the collective statistical significance of a finite number of locally significant trends allowing for the interdependence between station data. Frich et al. (2002) did not account for interdependence between station data in determining whether there were an unusually large number
of significant trend estimates for each extreme indicator that they analyzed. We use the set of 1000 patterns of observed trends produced by the bootstrapping to estimate field significance. The null hypothesis for this problem is that the pattern of trends estimated from the actual station data is due to climate noise. A suitable set of trends that could have arisen through natural variability alone was estimated by taking the difference between each of the 1000 patterns of trends produced in section 3a and the actual trend pattern. This new set of 1000 fields was used to estimate the 95% confidence interval for zero trend at each grid point. Using these confidence intervals we calculated the percentage area of significant grid points in each of the 1000 plausible patterns of observed trends from section 3a to form a distribution. If the percentage area of significant grid points in the actual trend pattern lies in the upper 5% tail of this distribution, then the actual trend estimates are field significant at the 5% level. c. Comparison of observed and modeled trend patterns To check that the modeled climate extreme indices were appropriate for a comparison with observations, we evaluated how HadAM3 simulated the observed natural variability of the climate extreme indices on interannual timescales. First, we gridded each observed climate extreme index for each year from 1949 to 1995. Then we used a Kolmogorov–Smirnoff test to compare the observed and modeled probability distribution functions (PDFs) of the detrended time series at each grid point where at least 15 stations had contributed to the gridded observed values. For the model, we used data from all six members of SSTNAT ensemble. To objectively compare the similarity between the patterns of observed and modeled trends, we took a similar approach to that of Folland et al. (1998). 
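Returning briefly to the field significance test of section 3b, its logic can be sketched as below. This is one reading of the procedure, stated as an assumption: the null distribution of significant area is built from the centered fields (bootstrap pattern minus actual pattern), and the same fields supply the pointwise confidence interval for zero trend. The function name, the cosine-latitude weights and the synthetic fields are hypothetical.

```python
import numpy as np


def field_significance(actual, boot, area_weights, alpha=0.05):
    """Return (significant area fraction of actual pattern, p-value)."""
    noise = boot - actual  # trend patterns plausible under zero true trend
    lo, hi = np.percentile(
        noise, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0
    )
    w = area_weights / area_weights.sum()

    def sig_area(pattern):
        # Area-weighted fraction of grid points outside the zero-trend interval.
        return float(np.sum(w * ((pattern < lo) | (pattern > hi))))

    null_areas = np.array([sig_area(field) for field in noise])
    actual_area = sig_area(actual)
    p_value = float(np.mean(null_areas >= actual_area))
    return actual_area, p_value


# Synthetic example: a uniform negative trend plus spatially correlated noise.
rng = np.random.default_rng(0)
lat = np.linspace(30.0, 85.0, 200)
weights = np.cos(np.deg2rad(lat))
actual = np.full(lat.size, -1.0)
noise = 0.2 * rng.normal(size=(1000, 1)) + 0.2 * rng.normal(size=(1000, lat.size))
area, p = field_significance(actual, actual + noise, weights)
# The pattern is field significant at the 5% level when p < 0.05.
```

Because the noise fields keep their spatial correlation, the null distribution of significant area is wider than it would be for independent grid points, which is the point of the Livezey and Chen (1983) approach.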
We used the bootstrapped fields produced in section 3a to estimate PDFs of measures of similarity between the patterns of observed and collocated modeled trends. Four measures of pattern similarity were used:

• Centered pattern correlation: Area-weighted correlation of the series of gridpoint values from the modeled trends with the corresponding series of the observed trends. The global mean trend, which was calculated as the area-weighted mean of available trend values, was removed prior to the calculation in each case.
• Congruence (or uncentered pattern correlation): As for the centered pattern correlation but the global mean trend was not removed prior to the calculation.
• Regression: A regression of the series of modeled trend values against the series of observed trend values. Unlike the correlation statistics, this measure retains information about the relative magnitudes of the modeled and observed trend patterns. The appropriate global mean trend was removed from each series prior to the regression.
• Amplitude: As for regression, but here the global mean trend was retained.

For each ensemble mean, PDFs for the four measures of pattern similarity were estimated to compare the observed and modeled trends for each daily extreme indicator. To generate the PDF, we randomly selected a modeled and observed pattern of trends from the appropriate set of 1000 trend patterns, calculated the four measures of pattern similarity for these patterns, and repeated this procedure 2500 times. As in Folland et al. (1998), this procedure accounts for uncertainty due to natural climate variability and due to the fact that the patterns of observed and modeled trends are estimated from a finite number of realizations. For the two PDFs of correlation statistics, we centered the PDF on the value of the correlation between the actual patterns of observed and modeled trends to correct for bias in the bootstrapped estimates of the PDF. This was achieved by setting the median value of the PDF equal to the original correlation value in Fisher-Z space, and then transforming the modified PDF back to the original correlation space (Efron and Tibshirani 1993).

With a PDF for each ensemble mean, we tested various null hypotheses for each climate extreme indicator. First, we tested the null hypothesis that each ensemble mean had no skill in reproducing the pattern of observed trends; that is, its pattern similarity was zero. This null hypothesis was rejected at the 5% level if a zero measure of similarity fell within either the upper or lower 2.5% tail of the PDF. If more than 97.5% of the PDF was to the right of zero, then the ensemble mean showed skill at simulating the observed trends for the climate extreme indicator under consideration.
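The four similarity measures can be written compactly for a pair of collocated trend patterns. The following is a minimal sketch under stated assumptions: both patterns are one-dimensional arrays over the same grid points, the weights are proportional to grid-box area, and the regression slope is taken as modeled regressed on observed through the origin after the stated mean handling. The direction of the regression is a literal reading of the text and the function name is hypothetical.

```python
import numpy as np


def pattern_similarity(model, obs, weights):
    """Four area-weighted similarity measures between two trend patterns."""
    w = weights / weights.sum()
    m_anom = model - np.sum(w * model)  # remove area-weighted mean trend
    o_anom = obs - np.sum(w * obs)

    def corr(a, b):
        return float(
            np.sum(w * a * b) / np.sqrt(np.sum(w * a * a) * np.sum(w * b * b))
        )

    def slope(a, b):
        # Weighted least squares slope of a regressed on b, through the origin.
        return float(np.sum(w * a * b) / np.sum(w * b * b))

    return {
        "centered_correlation": corr(m_anom, o_anom),
        "congruence": corr(model, obs),
        "regression": slope(m_anom, o_anom),
        "amplitude": slope(model, obs),
    }


# A model pattern with the right shape but half the observed magnitude.
lat = np.linspace(30.0, 85.0, 23)
weights = np.cos(np.deg2rad(lat))
obs = -2.0 + np.sin(np.deg2rad(4.0 * lat))
stats = pattern_similarity(0.5 * obs, obs, weights)
```

In this toy case both correlation measures equal 1 while regression and amplitude equal 0.5, which shows why the text keeps the last two: they retain the magnitude information that the correlations discard. Repeating the calculation over randomly paired bootstrap patterns would give the PDFs described above.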
The second test checked whether the pattern similarity was the same for a pair of ensemble means by seeing whether the estimate of pattern similarity for the actual patterns of modeled and observed trends fell within the PDF of the other ensemble. However, if this value was in the upper 2.5% of the other ensemble’s PDF, then the first ensemble showed a significant improvement in its reproduction of the observed trend patterns at the 5% level. We also checked that our results were robust to errors in HadAM3’s simulation of natural variability on the timescales used in the above analysis. First, we produced a new set of 1000 plausible patterns of model trends by perturbing the actual pattern of model trends with observed residual data. Then we repeat the analyses above with this new set of patterns of trend values. 4. Results a. Patterns of modeled and observed trends The observed pattern of trends in the number of frost days per year (Fig. 1a) was field significant at the 5%
FIG. 1. Trends per decade for 1950 to 1995 in the annual number of frost days per year for (a) observations, (c) SSTNAT, (e) GSOT, and (g) GSOTI; (b), (d), (f ), (h) as in (a), (c), (e), and (g) but for the percentage of time exceeding 90% threshold in minimum temperature defined for the period 1961–90. Black lines enclose regions where trends are significant at the 5% level.
level. There have been locally significant decreases in the number of frost days over the United Kingdom and much of Europe, Eurasia, western North America, and southern Australia during 1950–95. Indeed, there are no examples of locally significant increases in the number of frost days per year in the gridded observations. In contrast, the ensemble mean pattern of trends for SSTNAT (Fig. 1c) shows a significant increase in frost days in mid-Asia with a general increase over much of Eurasia. Of the significant observed regional reductions in frost days, SSTNAT only reproduces those in southern Australia and western Canada and Alaska. The
trends over North America are better simulated by GSOT and GSOTI (Figs. 1e,g), most likely because of land surface air temperature increases in the model due to increasing greenhouse gas concentrations (Sexton et al. 2003). Over Eurasia, there is a general reduction in frost days in GSOT but only a small proportion of these trends are significant at the 5% level; the significant increase in frost days in SSTNAT has been substantially reduced in GSOT. GSOTI shows similar changes to GSOT. However, GSOT and GSOTI underestimate the observed trends over Europe and Eurasia. This may be partly due to the simulation of recent changes in the
North Atlantic Oscillation (NAO), which underestimates the positive trend in westerly circulation during winter. However, a regression of an NAO index onto the number of frost days (not shown) shows that this could only explain why the model underestimates trends in the number of frost days over Europe and far-east Asia, not central Asia. Southern Australia consistently shows a decrease in the number of frost days in all ensembles in good agreement with the observations, suggesting that oceanic forcing dominates this change. Figure 1b shows that in tropical West Africa and much of Europe, Russia, and southern Australia, there has been a significant observed increase in the days where the minimum temperature falls into the top 10% of values calculated from the period 1961–90; this is mostly due to warmer nights. (It is possible for a minimum temperature to be recorded by day in the extratropical winter but this is not very common.) By contrast, in Japan there have been significant decreases in the top decile of observed minimum temperatures for the period 1961–90. This observed pattern in trends is field significant at the 5% level. In all three ensembles, the trends are reproduced in southern Australia and tropical West Africa (see Fig. 1d), though not to the extent observed. As these trends are significant in all three ensembles, it implies that the higher minimum temperatures in these regions are mostly influenced by SST fluctuations. This also occurs over China but the modeled increase is not observed. The increase in warm nights over Eurasia is poorly simulated by SSTNAT, but the explicit inclusion of anthropogenic forcing (see Figs. 1f,h) improves the modeled trend patterns over western Russia and reproduces the general increase in the occurrence of warm nights over much of the Northern Hemisphere. However, over Europe GSOT and GSOTI both underestimate the observed trends. 
Figure 2 shows the results for the observed and modeled patterns of trends in the rainfall indices. The three ensemble means do not show many large regions of significant changes in rainfall indices, so only the trends for the ensemble mean of GSOTI (considered to be the most realistic ensemble mean) is plotted. Figure 2a shows the observed trend pattern for consecutive dry days, which is field significant at the 5% level. Although the regions of locally significant trends are small scale and scattered, they point toward a general reduction in consecutive dry days over much of Eurasia and parts of North America. There have also been locally significant increases in consecutive dry days over Japan and Southeast Asia. The modeled trend patterns are in reasonable agreement over North America but do not show many significant trends over Eurasia. The observed trend pattern for the number of wet days is also field significant at the 5% level. The main changes are an increased number of wet days over east Europe, the Sahel, and Japan, and over central and northeast North America and Alaska. The ensembles reproduce the observed Sahelian, Alaskan, and central
North American trends but fail to simulate the changes over Europe. The trend pattern for simple daily precipitation intensity index (Fig. 2e) is field significant but only at the 10% level. The significant trends are scattered but there is a relatively large region over the eastern United States where there has been increased intensity in precipitation during 1950–95, in good agreement with the increase in the number of wet days there. The ensemble means show no such change and significant trends are sporadic. The observed trend map for maximum 5-day rainfall was not field significant and the ensemble means show very few significant points. b. The objective comparison of observed and modeled trend patterns First, the ability of the climate model to simulate natural variability of the climate extreme indices was evaluated. For FD and SDII, the Kolmogorov–Smirnoff tests showed very few statistically different grid points, indicating that HadAM3 reproduces the natural variability of these climate extremes well. In contrast, natural variability of R10 is poorly simulated over much of Asia. For the other three indices, the climate model generally reproduces their natural variability well, although there are several regions where this is not the case, especially in the Tropics and areas of sharp physiographic contrasts such as the Pacific coastline, Alaska, and Japan. Figure 3a shows the value for the centered pattern correlation, and its PDF, between the actual observed and modeled patterns of trends in the number of frost days for each ensemble mean. Zero correlation lies well within the PDFs for SSTNAT and GSOT so that these ensembles show no skill in simulating the observed trend pattern once the global mean has been removed. The inclusion of the indirect aerosol effect in GSOTI shows some skill at reproducing the centered observed trend pattern, as zero correlation lies within the first 2.5% of the PDF. 
However, the centered pattern correlation of the observed and GSOTI trend patterns (light gray dash on the x axis) lies within the PDF of SSTNAT. So the inclusion of indirect aerosol effect does not cause a significant improvement in pattern similarity. The regression statistic (Fig. 3c) supports this result. In contrast, the value for the congruence between GSOT and the observed trend pattern (when the global mean is retained) lies in the upper tail of the PDF for SSTNAT (see Fig. 3b). Also, zero congruence lies outside the PDF for GSOT. This shows that GSOT reproduces the observed trend pattern with skill and is significantly better than SSTNAT. This is mainly due to the inclusion of increasing concentrations of well-mixed greenhouse gases. As this occurred for the congruence and not for the centered pattern correlation, this implies that GSOT well simulates the global mean change in the number of frost days per year but not the geographical variation about the global mean. The amplitude (Fig. 3d) shows a similar result but provides the additional information that the magnitude of the GSOT trend pattern in frost days is not the same magnitude as the observed trend pattern (the amplitude would then equal 1), mainly because the model underestimates the observed trends over Europe and Eurasia (see above) but also partly because the model does not capture the correct anomalous pattern about the global mean.

FIG. 2. Observed and modeled trends per decade for 1950 to 1995 for (a), (b) consecutive dry days; (c), (d) number of wet days per year; (e), (f) simple daily intensity index; and (g), (h) maximum 5-day rainfall amount. Black lines enclose regions where trends are significant at the 5% level.

For the trends in the fraction of days above the 90% minimum temperature threshold, all three ensemble means show significant skill in reproducing the trend pattern both when the global mean is removed (Figs.
4a,c) and retained (Fig. 4b). The congruence and the amplitude both indicate that GSOT and GSOTI are significant improvements over SSTNAT. However, the amplitudes of GSOT and GSOTI do not overlap with one, indicating that these ensembles underestimate the magnitude of the observed trend pattern of 90% minimum temperatures. For the rainfall indices (not shown), only GSOTI showed skill at reproducing an observed trend pattern, and then only for the number of wet days. However, this may be due to deficiencies in modeling the interannual variability of R10 as described above. The sensitivity of the results to the choice of the
FIG. 3. PDF of pattern similarity statistics of model signals in observed trend pattern for the number of frost days per year: (a) centered pattern correlation, (b) congruence, (c) regression, (d) amplitude. Dashes on x axis indicate pattern similarity for ensemble mean trend patterns compared with the observed trend patterns. In (b) the dashes for GSOT and GSOTI are very close together.
length of the moving blocks, C (see section 3a), was explored only for the number of frost days per year. We found that the PDFs were only affected when C = 1; this is below our choice of C of 2 or 3 in the main calculation (see the appendix). Therefore, a larger choice of block length would not have affected the main results described above. We also found, by perturbing the actual pattern of model trends with observed residual data and repeating the analysis, that the results were robust to errors in the model’s simulation of natural variability on long timescales.

5. Conclusions

The main aim of this study was to provide a rigorous and easily interpretable methodology for analyzing spatially distributed trends in climate extremes. The gridding of observed trend estimates at each station avoided the need to omit stations as done in Frich et al. (2002). To assess the statistical significance of the gridded trend estimates, we used a bootstrap procedure to estimate a distribution for the trend at each grid point. The distribution, and therefore the statistical inferences based on it, reflected uncertainty in the trend due to natural climate variability, which was allowed by our method to
be non-Gaussian. The bootstrap procedure also incorporated the interdependence between stations, which allowed us to test the field significance of the pattern of observed trends.
The results largely confirm the findings of previous studies. There has been a significant reduction in frost days and an increase in very warm nighttime temperatures. Also, our results confirm that there has been a significant decrease in the number of consecutive dry days, an increase in wet days with >10 mm rainfall, and an increase in the simple daily precipitation intensity index. Our results show that the trend in maximum annual 5-day rainfall is not field significant. This differs from Frich et al. (2002), but one reason may be that they used the earlier, partially incorrect R5D data.
Comparisons with the atmosphere-only GCM runs forced by prescribed oceanic forcing and human-induced effects show that explicit anthropogenic radiative forcing is required to reproduce observed changes in temperature extremes, particularly on large spatial scales; anthropogenic forcing through oceanic feedback processes contained in the observed SSTs is insufficient. This shows that human influences are an important component of changes in the number of frost days and warm nights. For precipitation extremes, there is no evidence
FIG. 4. PDF of pattern similarity statistics of model signals in observed trend pattern for the fraction of days above 90% minimum temperature threshold: (a) centered pattern correlation, (b) congruence, (c) regression, (d) amplitude. Dashes on x axis indicate pattern similarity for ensemble mean trend patterns compared with the observed trend patterns.
here that HadAM3 can detect an anthropogenic signal in rainfall indices. This does not eliminate the possibility that such a signal exists, as any effect may arise through the SSTs. Indeed, the transient climate change runs of Semenov and Bengtsson (2002) indicate that the characteristics of precipitation such as intensity should respond significantly to increased greenhouse gas concentrations.
This study cannot show whether the field-significant observed trends may still have a large contribution from multidecadal natural climate variability. Also, it is not possible to estimate the total anthropogenic effect on climate extremes from the model results, as the prescribed SSTs already contain some anthropogenic effect. To answer these questions, a study based on data from transient coupled ocean–atmosphere GCM integrations that include anthropogenic radiative forcings is required. Furthermore, an analysis with a long unforced coupled model run could determine whether the observed changes are unusual in the context of multidecadal natural climate variability.
The main limitations of this work arise from the observational dataset. A seasonal analysis was not possible, as the raw daily data were mostly not available. Also, only the Northern Hemisphere extratropics is reasonably well observed, with large gaps in the Tropics where extreme climate often has its most devastating
FIG. 5. Schematic of the angular-distance gridding procedure. Black dots represent stations and the long dashed lines represent the latitude–longitude grid. The distance, $r_i$, and angle, $\theta_i$, of the ith station are shown relative to the specified location.
impact. So there is a strong requirement for raw daily station data to be released worldwide and for these data to be regularly updated. A suitable mechanism would be the new Global Climate Observing System (GCOS) global surface network of about 1000 well-distributed stations (Peterson et al. 1997).
Acknowledgments. This work was funded by the U.K. Department of the Environment, Food, and Rural Affairs (Contract PECD/7/12/37) and the U.K. Government Meteorological Research Program. This paper is in British crown copyright. We thank Jesse Kenyon and Gabi Hegerl for discovering the inconsistencies in the R5D data.
APPENDIX
FIG. A1. Spatially averaged rms interpolation errors for trends in the number of frost days per year for the period 1950–95 plotted as a function of search radius and decorrelation length scale for the Northern Hemisphere extratropics.
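The kind of calculation summarized in Fig. A1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: it applies the angular-distance weights of Eq. (A1) in a leave-one-out fashion to synthetic station trends. The function names (`adw_weights`, `adw_estimate`, `loo_rms_error`), the flat x–y coordinates in km (instead of distances on the sphere), the normalization of the weights to sum to one, the unweighted spatial mean (the appendix uses Thiessen-polygon weights), and the reading of the i ≠ k condition as applying to both sums are all assumptions made for the example.

```python
import numpy as np

def adw_weights(r, theta, L):
    """Eq. (A1) weights for stations at distances r (km) and angles theta (rad)."""
    w0 = np.exp(-r / L)                                  # distance decay term e^{-r_i/L}
    ang = 1.0 - np.cos(theta[None, :] - theta[:, None])  # ang[i, k] = 1 - cos(theta_k - theta_i)
    w0k = np.tile(w0, (r.size, 1))                       # w0k[i, k] = e^{-r_k/L}
    np.fill_diagonal(w0k, 0.0)                           # drop k == i (assumed for both sums)
    return w0 * (1.0 + (w0k * ang).sum(axis=1) / w0k.sum(axis=1))

def adw_estimate(x0, y0, xs, ys, values, R, L):
    """Weighted mean at (x0, y0) from stations within radius R; NaN if fewer than 3."""
    dx, dy = xs - x0, ys - y0
    r = np.hypot(dx, dy)
    near = r <= R
    if near.sum() < 3:
        return np.nan
    theta = np.arctan2(dx[near], dy[near])               # angle measured from north (+y axis)
    w = adw_weights(r[near], theta, L)
    return np.sum(w * values[near]) / np.sum(w)          # weights normalized (assumption)

def loo_rms_error(xs, ys, values, R, L):
    """RMS difference between each station value and its estimate from the other stations."""
    errs = []
    for i in range(xs.size):
        keep = np.arange(xs.size) != i                   # exclude the central station itself
        est = adw_estimate(xs[i], ys[i], xs[keep], ys[keep], values[keep], R, L)
        if not np.isnan(est):
            errs.append(est - values[i])
    return np.sqrt(np.mean(np.square(errs))) if errs else np.nan

# Synthetic demonstration: 60 hypothetical stations with a smooth trend field plus noise.
rng = np.random.default_rng(0)
xs, ys = rng.uniform(0, 2000, 60), rng.uniform(0, 2000, 60)
trend = 0.002 * xs - 0.001 * ys + rng.normal(0, 0.3, 60)
grid = [(R, L) for R in range(100, 701, 50) for L in range(100, 1001, 50)]
errors = {(R, L): loo_rms_error(xs, ys, trend, R, L) for R, L in grid}
best = min((k for k in errors if not np.isnan(errors[k])), key=errors.get)
print("smallest leave-one-out error at (R, L) =", best)
```

In this sketch a small search radius leaves many stations with fewer than three neighbours, so the error is then averaged over only a few locations; a low error at small R should therefore be weighed against the coverage it implies.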
Further Details of the Analysis
a. Gridding the station trends in extreme indices
The following gridding procedure is based on Shepard’s angular-distance weighting (ADW) algorithm (Shepard 1984), used for its flexibility when gridding irregularly spaced station data (New et al. 2000). For gridding, a trend value at a specified gridpoint location is estimated as a weighted sum of the trend values at n stations that fall within a search radius, R. If n < 3 then the gridded value is set to missing. We define the position of the ith station in terms of its distance, $r_i$, and its angle to north, $\theta_i$, relative to the specified location (see Fig. 5). The angular-distance weight for the ith station, $w_i$, is defined by New et al. (2000) as
$$ w_i = e^{-r_i/L} \left\{ 1 + \frac{\sum_k e^{-r_k/L} \, [1 - \cos(\theta_k - \theta_i)]}{\sum_k e^{-r_k/L}} \right\}, \quad i \neq k. \tag{A1} $$
The first $e^{-r_i/L}$ term weights the gridded trend in favor of stations close to the specified location using an exponential decay function with a decorrelation length scale, L. The terms in large brackets increase the weight if the station is isolated in an angular sense. Therefore the weights depend on the relative positions of the stations to the specified gridpoint location and two parameters: the search radius, R, and the decorrelation length scale, L. The search radius, which controls the number of stations used to estimate each gridpoint value, ultimately determines the coverage of the gridded dataset. However, we must not choose a value of R that is so large that unrepresentative stations are included in the gridding.
To determine the best combination of R and L to grid data in each region, for each index we estimated a mean trend interpolation error over the region, which we aimed to minimize. This spatially averaged interpolation error effectively quantifies the sensitivity of our gridded trend values to the choice of R and L. For each index,
we use Eq. (A1) to estimate a trend value at each station location with the circle centered on the station of interest rather than a grid point; all stations within the search radius, R, are used except the central station itself. The root-mean-square difference between these estimates and the trends at the central station provides a trend interpolation error at that location for that combination of R and L for each index. The spatial mean of the trend interpolation error was estimated as a weighted average, which avoided biasing the estimate of the interpolation error toward parts of the region that had the densest station coverage. We used Thiessen polygons to generate these weights (Thiessen 1911). This method determines the polygonal region around each station, which consists of the set of points that are closer to this station than to any other. The weight of each station value is proportional to the area of this polygon.
It is possible to estimate the most appropriate combination of R and L for each grid box. However, the estimates would be prone to large sampling errors. Therefore, to obtain more robust estimates of R and L, we assumed that both of these were constant over the four regions defined earlier. The spatially averaged interpolation errors were estimated for all possible combinations of search radius, R, and decorrelation length scale, L, such that R = (100, 150, ..., 650, 700) km and L = (100, 150, ..., 1000) km. Figure A1 shows an example of mean interpolation error as a function of search radius and decorrelation length scale. This example is typical in that the interpolation error is more sensitive to the search radius than to the decorrelation length scale, because the search radius controls the total number of observations included in the gridding. In some instances these figures revealed several local minima, which may be due to the configuration of the station network. In such cases, we also considered the coverage
of the gridded fields so that our final choice of the ‘‘optimal’’ combination of R and L was partly subjective.
b. Assessment of uncertainty in trend estimates using a bootstrap approach
Bootstrap techniques use repeated resampling of the available data to estimate the distributions required to make statistical inferences. The approach, which is typically very computationally intensive, is used when the data invalidate one or more of the underlying assumptions of standard statistical theory. In this study, the goal of the bootstrap is to estimate distributions of trends for each station from a set of N annual values.
For a single station, we first estimate the linear trend from the N actual data points, keeping the residuals and setting aside the best-fit line for later use. A linear trend was only estimated if at least 35 yr of data were present at a station. The bootstrap technique randomly selects N values from the residuals using the method described below to form a time series of residual variations of the same length, N. This randomly resampled set of residuals is added back onto the station’s best-fit line from the trend analysis of the actual data, and the trend is reestimated. Ordinary least squares estimation, which yields an unbiased estimate of the trend, was used so that the calculation was computationally inexpensive; this was possible as only the trend estimate and not the uncertainty was used in the bootstrapping. The bootstrapping procedure was repeated 1000 times to produce a distribution of 1000 observed trends at each station.
To deal with temporal correlation in the residuals, we use a method of resampling called ‘‘moving block bootstrap resampling’’ (Wilks 1997). This means that the data are resampled in time using blocks of consecutive years rather than single years so that a good proportion of the autocorrelation is taken into account (see Fig. 2 of Wilks 1997).
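The residual-resampling procedure described above can be sketched in Python as follows. This is a minimal illustration for a single station with no missing years, not the authors' code; the function names (`ols_trend`, `moving_block_indices`, `bootstrap_trends`), the synthetic series, and the fixed block length of 2 are assumptions made for the example. Blocks of consecutive years are drawn with replacement from all overlapping blocks, joined together, and trimmed to the original length.

```python
import numpy as np

def ols_trend(t, y):
    """Return (slope, intercept) of the ordinary least squares line through (t, y)."""
    slope, intercept = np.polyfit(t, y, 1)
    return slope, intercept

def moving_block_indices(n, block_len, rng):
    """Time indices for one moving-block resample of a series of length n."""
    n_blocks = -(-n // block_len)                        # smallest integer >= n / block_len
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)  # n - block_len + 1 possible blocks
    idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()
    return idx[:n]                                       # discard surplus values at the end

def bootstrap_trends(t, y, block_len=2, n_boot=1000, seed=0):
    """Observed OLS trend and a bootstrap distribution of re-estimated trends."""
    rng = np.random.default_rng(seed)
    slope, intercept = ols_trend(t, y)
    fitted = intercept + slope * t                       # best-fit line, kept fixed
    resid = y - fitted                                   # residuals to be resampled
    trends = np.empty(n_boot)
    for b in range(n_boot):
        idx = moving_block_indices(len(y), block_len, rng)
        trends[b] = ols_trend(t, fitted + resid[idx])[0]  # add residuals back, refit
    return slope, trends

# Synthetic example: 46 annual values (as for 1950-95) with a hypothetical downward trend.
t = np.arange(46.0)                                      # years since the start of the record
y = 120.0 - 0.3 * t + np.random.default_rng(1).normal(0.0, 4.0, t.size)
slope, trends = bootstrap_trends(t, y, block_len=2, n_boot=1000, seed=2)
lo, hi = np.percentile(trends, [2.5, 97.5])
print(f"trend = {slope:.3f} per year, central 95% bootstrap range = [{lo:.3f}, {hi:.3f}]")
```

Fixing the best-fit line and resampling only the residuals keeps the observed trend in every replicate, so the spread of `trends` reflects the variability about that trend rather than the trend itself.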
The amount of temporal interdependence that is included in the bootstrap depends on the choice of the length C of the block used in the resampling. Wilks (1997) presents formulas for determining C for various statistical descriptions of autocorrelation. Here, we chose a first-order autoregressive process specified by a single lag-1 correlation parameter. To resample the N residual values for each iteration of the bootstrap, we randomly select K of the N − C + 1 possible blocks and join them together. If N is divisible by C, then K = N/C; if not, then K is set to the next integer greater than N/C, and the last KC − N values are discarded after the K blocks have been joined together. Missing data were also shuffled around so that the effects of missing data were captured in the bootstrap estimate of the uncertainty.
As the variability of the station trends is spatially interdependent, the distributions of the trends for each station must be related to each other. To account for the spatial interdependence of the station data, we use the
same resampling sequence for each station within each iteration of the bootstrap. Therefore, in effect, in each iteration we estimated a plausible set of trend values by resampling from N whole fields of residuals. Consequently, we had to use the same value of block length C for all stations, although in reality the degree of autocorrelation varies spatially. A global value is taken for C so that it is greater than or equal to the station value at a very large percentage of the stations, say 90%. The final product is 1000 plausible fields of station trends, which were then gridded in the same way as the actual observed trend values. The trend at each grid point was statistically significant at the 5% level if a zero trend fell within either the upper or lower 2.5% tail of its bootstrapped distribution.
We also tested two alternative methods of determining trend significance: a Mann–Kendall test and a linear model that allowed for red noise. Both were modified so that their respective gridpoint standard errors were inflated using a formula by Kagan (1966) and Kagan (1997) to account for the finite number of stations used in each gridpoint estimate. The results from the bootstrapped significance tests were found to agree well with those from these two alternative methods.
REFERENCES
Edwards, J. M., and A. Slingo, 1996: Studies with a flexible new radiation code. I: Choosing a configuration for a large-scale model. Quart. J. Roy. Meteor. Soc., 122, 689–719.
Efron, B., and R. J. Tibshirani, 1993: An Introduction to the Bootstrap. Chapman and Hall, 436 pp.
Folland, C. K., D. M. H. Sexton, D. J. Karoly, C. E. Johnson, D. P. Rowell, and D. E. Parker, 1998: Influences of anthropogenic and oceanic forcing on recent climate change. Geophys. Res. Lett., 25, 353–356.
——, and Coauthors, 2001: Observed climate variability and change. Climate Change 2001: The Scientific Basis. Contribution of Working Group I to the Third Assessment Report of the Intergovernmental Panel on Climate Change, J. T. Houghton et al., Eds., Cambridge University Press, 99–181.
Frich, P., L. V. Alexander, P. Della-Marta, B. Gleason, M. Haylock, A. M. G. Klein Tank, and T. Peterson, 2002: Observed coherent changes in climatic extremes during the second half of the 20th century. Climate Res., 19, 193–212.
Groisman, P. Ya., and Coauthors, 1999: Changes in the probability of heavy precipitation: Important indicators of climatic change. Climatic Change, 42, 243–283.
Houghton, J. T., L. G. Meira Filho, B. A. Callander, N. Harris, A. Kattenberg, and K. Maskell, Eds., 1995: Climate Change 1995: The Science of Climate Change. Cambridge University Press, 572 pp.
Johns, T. C., and Coauthors, 2003: Anthropogenic climate change for 1860 to 2100 simulated with the HadCM3 model under updated emissions scenarios. Climate Dyn., 20, 583–612.
Kagan, R. L., 1966: An evaluation of representativeness of precipitation data (in Russian). Proc. Main Geophys. Observatory, 191, 22–34.
——, 1997: Averaging of Meteorological Fields. L. S. Gandin and T. M. Smith, Eds. Kluwer Academic, 279 pp. (Translated from original Russian edition 1979 by U.K. Ministry of Defence Linguistic Services.)
Livezey, R. E., and W. Y. Chen, 1983: Statistical field significance and its determination by Monte Carlo techniques. Mon. Wea. Rev., 111, 46–59.
New, M., M. Hulme, and P. Jones, 2000: Representing twentieth-century space–time climate variability. Part II: Development of 1901–96 monthly grids of terrestrial surface climate. J. Climate, 13, 2217–2238.
Peterson, T., H. Daan, and P. Jones, 1997: Initial selection of a GCOS Surface Network. Bull. Amer. Meteor. Soc., 78, 2145–2152.
Pope, V. D., M. L. Gallani, P. R. Rowntree, and R. A. Stratton, 1999: The impact of new physical parametrizations in the Hadley Centre climate model: HadAM3. Climate Dyn., 16, 123–146.
Rayner, N. A., E. B. Horton, D. E. Parker, C. K. Folland, and R. B. Hackett, 1996: Version 2.2 of the Global Sea-Ice and Sea Surface Temperature data set, 1903–1994. Climate Research Tech. Note 74, Hadley Centre for Climate Prediction and Research, Met Office, 43 pp.
Semenov, V. A., and L. Bengtsson, 2002: Secular trends in daily precipitation characteristics: Greenhouse gas simulation with a coupled AOGCM. Climate Dyn., 19, 123–140.
Sexton, D. M. H., D. P. Rowell, C. K. Folland, and D. J. Karoly, 2001: Detection of anthropogenic climate change using an atmospheric GCM. Climate Dyn., 17, 669–685.
——, H. Grubb, K. P. Shine, and C. K. Folland, 2003: Design and analysis of climate model experiments for the efficient estimation of anthropogenic signals. J. Climate, 16, 1320–1336.
Shepard, D. S., 1984: Computer mapping: The SYMAP interpolation algorithm. Spatial Statistics and Models, G. L. Gaile and C. J. Willmott, Eds., D. Reidel, 133–145.
Thiessen, A. H., 1911: Precipitation averages for large areas. Mon. Wea. Rev., 39, 1082–1084.
Wilks, D. S., 1997: Resampling hypothesis tests for autocorrelated fields. J. Climate, 10, 65–82.