Evaluation of Statistically Downscaled GCM Output as Input for ...

Earth Interactions • Volume 18 (2014) • Paper No. 9

Copyright © 2014, Paper 18-09; 67813 words, 18 Figures, 0 Animations, 3 Tables. http://EarthInteractions.org

Evaluation of Statistically Downscaled GCM Output as Input for Hydrological and Stream Temperature Simulation in the Apalachicola–Chattahoochee–Flint River Basin (1961–99)

Lauren E. Hay,* Jacob LaFontaine, and Steven L. Markstrom
U.S. Geological Survey, Lakewood, Colorado

Received 26 December 2013; accepted 9 January 2014

ABSTRACT: The accuracy of statistically downscaled general circulation model (GCM) simulations of daily surface climate for historical conditions (1961–99) and the implications when they are used to drive hydrologic and stream temperature models were assessed for the Apalachicola–Chattahoochee–Flint River basin (ACFB). The ACFB is a 50 000 km² basin located in the southeastern United States. Three GCMs were statistically downscaled, using an asynchronous regional regression model (ARRM), to 1/8° grids of daily precipitation and minimum and maximum air temperature. These ARRM-based climate datasets were used as input to the Precipitation-Runoff Modeling

* Corresponding author address: Lauren E. Hay, U.S. Geological Survey, Box 25046, MS 412, Denver Federal Center, Lakewood, CO 80225-0046. E-mail address: [email protected] DOI: 10.1175/2013EI000554.1


System (PRMS), a deterministic, distributed-parameter, physical-process watershed model used to simulate and evaluate the effects of various combinations of climate and land use on watershed response. The ACFB was divided into 258 hydrologic response units (HRUs) in which the components of flow (groundwater, subsurface, and surface) are computed in response to climate, land surface, and subsurface characteristics of the basin. Daily simulations of flow components from PRMS were used with the climate to simulate in-stream water temperatures using the Stream Network Temperature (SNTemp) model, a mechanistic, one-dimensional heat transport model for branched stream networks. The climate, hydrology, and stream temperature for historical conditions were evaluated by comparing model outputs produced from historical climate forcings developed from gridded station data (GSD) versus those produced from the three statistically downscaled GCMs using the ARRM methodology. The PRMS and SNTemp models were forced with the GSD and the outputs produced were treated as “truth.” This allowed for a spatial comparison by HRU of the GSD-based output with ARRM-based output. Distributional similarities between GSD- and ARRM-based model outputs were compared using the two-sample Kolmogorov–Smirnov (KS) test in combination with descriptive metrics such as the mean and variance and an evaluation of rare and sustained events. In general, precipitation and streamflow quantities were negatively biased in the downscaled GCM outputs, and results indicate that the downscaled GCM simulations consistently underestimate the largest precipitation events relative to the GSD. The KS test results indicate that ARRM-based air temperatures are similar to GSD at the daily time step for the majority of the ACFB, with stream temperature perhaps requiring subweekly averaging.
Depending on the GCM and spatial location, ARRM-based precipitation and streamflow require averaging of up to 30 days to become similar to the GSD-based output. Evaluation of the model skill for historical conditions suggests some guidelines for use of future projections; while it seems correct to place greater confidence in evaluation metrics that perform well historically, this does not necessarily mean those metrics will accurately reflect model outputs for future climatic conditions. Results from this study indicate no “best” overall model, but the breadth of analysis can be used to give the product users an indication of the applicability of the results to address their particular problem. Since results for historical conditions indicate that model outputs can have significant biases associated with them, the range in future projections examined in terms of change relative to historical conditions for each individual GCM may be more appropriate.

KEYWORDS: Precipitation-Runoff Modeling System; Stream Network Temperature model; Asynchronous regional regression model; Statistical downscaling; General circulation model; Apalachicola–Chattahoochee–Flint River basin

1. Introduction

The U.S. Geological Survey (USGS) National Climate Change and Wildlife Science Center (NCCWSC; http://nccwsc.usgs.gov/) is supporting a series of regional assessments that seek to provide integrated science that is useful to resource managers to understand the effect of climate change on a range of ecosystem responses. The chosen methodology is to link simulation models that span a broad range of scales and themes: from planetary general circulation models (GCMs) to


local models of landscape dynamics and biota. The USGS Southeast Regional Assessment Project (SERAP; http://serap.er.usgs.gov) is the first regional assessment to be funded by the NCCWSC. SERAP has been developed in close coordination with recently formed Department of the Interior Landscape Conservation Cooperatives (http://www.doi.gov/lcc/index.cfm) to ensure that its products meet the needs of resource managers in the southeastern United States. Scientists associated with SERAP have developed regional models and other science tools to help environmental resource managers assess potential effects of climate change on land cover, ecosystems, and priority species in the region. Two components of SERAP are the development of hydrologic and stream temperature models for the Apalachicola–Chattahoochee–Flint River basin (ACFB) (Figure 1) and the corresponding climatic datasets to drive these models for historical and future conditions. In recent decades, competing demands of municipal, industrial, and agricultural water use; ecological needs of fish and mussels; and economic development have resulted in conflict and discussions between stakeholders in Alabama, Florida, and Georgia over water allocation in the ACFB (U.S. Army Corps of Engineers 1997). These pressures, coupled with the effects of climatic variability and potential climatic change, have stimulated research efforts to develop water management tools, which include the use of atmospheric model output from GCMs in hydrologic models. Many atmospheric processes that have hydrologic consequences are not modeled adequately with GCMs; hydrologic modeling at the basin scale requires climatological information on scales that are generally much finer than the typical grid size of even the highest-resolution GCMs (Hay et al. 2002). To study the effects of climate change at basin scale, finer-resolution climate projections are required, based on either statistical or dynamical downscaling techniques. 
An overview of statistical and dynamical GCM downscaling techniques for hydrologic modeling is presented in Fowler et al. (Fowler et al. 2007). Statistical downscaling uses empirical relations between features reliably simulated by a GCM at gridbox scales and surface predictands at subgrid scales. Dynamical downscaling uses simulations from regional climate models with initial and lateral boundary conditions from GCM output. Both downscaling techniques introduce uncertainties: statistical downscaling is based on the assumption that the derived present relations will remain unchanged in the future, while dynamical downscaling is dependent on the GCMs for their boundary conditions and will retain some of the larger GCM biases. For the SERAP study, two sources of climate data (daily precipitation and maximum and minimum air temperature) were compared under the same historic condition: 1) 1/8° gridded station data (GSD) derived from station measurements (Maurer et al. 2002) and 2) statistically downscaled GCM output [an asynchronous regional regression model (ARRM); Stoner et al. 2012]. The GSD and ARRM datasets were used to simulate streamflow and components of flow (surface, subsurface, and groundwater) using the Precipitation-Runoff Modeling System (PRMS). Daily simulations of surface, subsurface, and groundwater flow from PRMS were in turn used with the GSD and ARRM climate forcings to simulate daily in-stream water temperatures using the Stream Network Temperature (SNTemp) model. Figure 2 shows a schematic of the inputs and outputs from the loosely coupled models (GSD, ARRM, PRMS, and SNTemp).


Figure 1. Map showing the ACFB. [Note the boundary for Metro North Georgia Water Planning District (MNGWPD).]


Figure 2. Diagram of Southeast Regional Assessment Project data flow showing the various linkages of climate, hydrology, and stream temperature components.

Numerous sources of uncertainty are introduced at each step of the simulation process outlined in Figure 2. This includes uncertainty in the models associated with climate, hydrology, and stream temperature. The climate system represented by the GCMs has large uncertainties associated with the representation of physical processes, model structure, and feedbacks within the climate system (Alley et al. 2007). Additionally, there is uncertainty associated with the ARRM downscaling procedure. However, this uncertainty may be overwhelmed by the choice of driving GCM (Fowler et al. 2007), which has been shown to be consistently greater than uncertainty from the hydrologic model or natural variability (Prudhomme and Davies 2009). The uncertainty propagated through the modeling chain shown in Figure 2 may be enhanced or compensated for, depending on the structure and parameterization of each simulation model (Buytaert and Beven 2009), leading to final model predictions that may have large uncertainties that could jeopardize the effectiveness of management decisions (Pappenberger and Beven 2006).


Wood et al. (Wood et al. 2004) note that reproducing accurate historical conditions is the minimum standard for any hydrologic application that uses downscaled climate and is considered by many to be a necessary condition for a model to be trusted (Tebaldi and Knutti 2007). Chen et al. (Chen et al. 2011) examined the uncertainty of the hydrological impacts of climate change and found that the major contributors to uncertainty can vary depending on the hydrological variables selected (low flow, floods, etc.). There are well-known examples where errors in different components of a single model tend to cancel (Tebaldi and Knutti 2007). However, if a model performs poorly for historical conditions, then it may continue to perform poorly in the future (e.g., Charles et al. 1999; Jun et al. 2008; Knutti 2010; Perkins et al. 2012). For this study, all model outputs produced using the GSD were considered “truth,” allowing for spatial and temporal comparisons between the GSD- and ARRM-based outputs in Figure 2. The distributional similarities for historical conditions between model outputs from GSD and ARRM were evaluated using the two-sample Kolmogorov–Smirnov (KS) statistic (Conover 1971). The KS test gives an indication of overall performance but will not give an indication of negative or positive model bias, nor will it indicate the relative accuracy of extremes (i.e., droughts/floods, largest/smallest events, or sustained events). There is evidence that extreme events may change more than indicated by a change in the distribution mean (Mearns et al. 1984; Schaeffer et al. 2005; Trigo et al. 2005). Simulating these extreme events accurately is critical for environmental impact studies and adaptation strategies (Katz and Brown 1992; Colombo et al. 1999; Easterling et al. 2000). Therefore, the KS test is used in combination with a diversity of descriptive metrics (e.g., STARDEX 2005; Brands et al. 2011; Maxino et al. 2008; Knutti 2010) such as the mean and variance and an evaluation of rare and sustained events. This paper presents an evaluation of statistically downscaled precipitation and maximum and minimum air temperature from a set of three GCMs and the associated accuracy of using these downscaled estimates to simulate streamflow and stream temperature outputs using the PRMS and SNTemp models, respectively.

2. Study area

The ACFB includes three major rivers: the Apalachicola, Chattahoochee, and Flint Rivers (Figure 1). The Chattahoochee River begins in the mountains of northeastern Georgia and flows southwestward through metropolitan Atlanta to the Alabama–Georgia border, where the river flows southward to Lake Seminole on the Florida–Georgia border. The Flint River begins in north-central Georgia, just south of Atlanta, and flows southward to Lake Seminole. The Apalachicola River begins at Lake Seminole, which is the confluence of the Chattahoochee and Flint Rivers, and flows southward through Florida to the Gulf of Mexico. The Chattahoochee River is regulated by four U.S. Army Corps of Engineers (USACE) projects and eight run-of-the-river dams (not operated to regulate flow), while the Flint River is relatively unregulated with just two run-of-the-river dams (U.S. Army Corps of Engineers 1997). The Apalachicola River has one USACE project (Lake Seminole) as its headwaters; one other impoundment in the basin, Dead Lakes, is present on the Chipola River (Figure 1). The ACFB is home to


numerous fish and wildlife species of conservation concern (including four endangered and two threatened freshwater mussel species and one threatened fish species) and is regionally important for water supply. The ACFB is characterized by a warm and humid temperate climate. The ACFB has a generally north–south direction and spans approximately 590 km. Because of this substantial areal range and surface altitudes that range from greater than 1300 m in the northern area to sea level at the mouth of the ACFB, air temperature and precipitation are not spatially constant across the basin. Annual precipitation in the basin averages about 1270 mm, with totals substantially higher in the mountains in the northernmost part of the basin (about 1780 mm) and along the Gulf of Mexico in the southernmost part of the basin (about 1520 mm); totals are relatively close to the basin average across the middle part (data from http://www.sercc.com/climate). Precipitation totals in the ACFB are generally lower in the fall (September–November) compared to the rest of the year. Maximum daily summer (June–August) air temperatures are around 29.4°C in the northern part of the basin and 32.2°C in the southern part. Daily minimum summer air temperatures range from 17.2°C in the northern part of the basin to 21.7°C in the southern part. Daily minimum and maximum air temperatures for winter (December–February) range from 0° to 12.2°C in the northern part of the basin and from 6.7° to 18.3°C in the southern part.

3. Model overviews/background

The following section describes each of the loosely coupled models shown in Figure 2. Daily climate values (precipitation and maximum and minimum air temperature) from the GSD and ARRM datasets were used as input to the hydrologic model. The hydrologic model produced climate and flow variables by stream segment, which were used as input to the stream temperature model. Models are described in the order used.

3.1. Climate

The hydrology and stream temperature models were driven with data derived from 1) 1/8° gridded station data and 2) statistically downscaled GCMs (ARRM). All climate forcings were made available through the USGS Geo Data Portal (GDP; http://cida.usgs.gov/climate/gdp/) (Blodgett 2013). The GDP was used to summarize daily values of precipitation and maximum and minimum air temperature from the gridded datasets (GSD and ARRM) to the appropriate modeling units (hydrologic response units; described below).

3.1.1. Station data

The GSD produced by Maurer et al. (Maurer et al. 2002) was chosen for consistency across the SERAP tasks for both historical and future conditions. Precipitation and maximum and minimum air temperature grids were developed for the conterminous United States at a 1/8° cell size (approximately 140 km²) using stations from the National Oceanic and Atmospheric Administration/National Weather Service Cooperative Observer Program. Daily precipitation totals were assigned to each day based on the time of observation for the gauge, so a fraction of


Table 1. Climate data used in analysis.

Model abbreviation   Origin                                                              Horizontal resolution (lat × lon)
GSD                  Gridded station data (Maurer et al. 2002)                           0.125° × 0.125°
CCSM3                Community Climate System Model, version 3                           1.4° × 1.4°
GFDL CM2.1           Geophysical Fluid Dynamics Laboratory Climate Model, version 2.1    2.0° × 2.5°
PCM                  Parallel Climate Model                                              2.8° × 2.8°

each daily precipitation total was applied to the previous day. The gridded daily precipitation data were then scaled to match the long-term average precipitation climatology from the Parameter-Elevation Regressions on Independent Slopes Model (PRISM; Daly et al. 1994; Daly et al. 1997). The minimum and maximum daily air temperature data were derived using the same algorithm as for precipitation and were lapsed at −6.5°C (1000 m)⁻¹ to the gridcell mean elevation. Temperatures at each time step were interpolated by fitting an asymmetric spline through the daily maxima and minima (Maurer et al. 2002). The GSD was available for 1950–99 and was spatially transferred to the hydrologic modeling units through the GDP using an area-weighted averaging for each daily time step for maximum and minimum air temperature and precipitation.

3.1.2. Statistically downscaled GCM climate

For the SERAP study, statistically downscaled daily precipitation and maximum and minimum air temperature from the asynchronous regional regression model (the ARRM dataset), developed by Stoner et al. (Stoner et al. 2012) and available through the GDP, were used. The ARRM is based on quantile regression, which matches quantiles of the observed and simulated time series. This approach was originally proposed by O’Brien et al. (O’Brien et al. 2001) and applied by Dettinger et al. (Dettinger et al. 2004). In addition, the ARRM uses a piecewise regression model to improve its ability to simulate extremes in the daily distribution. For precipitation, a mixture model clustering approach that includes nonhomogeneous transition probabilities to model the occurrence and intensity of daily precipitation was used (Vrac and Naveau 2007). The GSD developed by Maurer et al. (Maurer et al. 2002) and described previously was used as the ARRM training dataset with retrospective downscaling available for the years 1961–99. The reader is referred to Stoner et al. (Stoner et al. 2012) for a detailed description of the ARRM downscaling procedure. For this study, three GCMs, using the A1FI emission scenario, were analyzed for historical conditions (CCSM3, GFDL CM2.1, and PCM). Table 1 summarizes the datasets used for statistical downscaling in this study.

3.2. Hydrology

The USGS PRMS was used to simulate and evaluate the effects of various combinations of precipitation, climate, and land use on watershed response in the ACFB and is documented in LaFontaine et al. (LaFontaine et al. 2013). PRMS is a modular, deterministic, distributed-parameter, physical-process watershed model used to simulate the generation of streamflow by process algorithms that are based on physical laws or empirical relations with measured or estimated characteristics


Figure 3. Overview of the Precipitation-Runoff Modeling System conceptualization of basin components and fluxes (taken from Markstrom et al. 2008).

(Figure 3). PRMS uses daily inputs of precipitation and maximum and minimum air temperature. The reader is referred to Leavesley et al. (Leavesley et al. 1983; Leavesley et al. 2005), Leavesley and Stannard (Leavesley and Stannard 1995), and Markstrom et al. (Markstrom et al. 2008) for a complete description of PRMS. The ACFB PRMS model was divided into 258 hydrologic response units (HRUs) in which the components of flow (groundwater, subsurface, and surface) are computed in response to precipitation, air temperature, and land surface and subsurface characteristics of the basin (Figure 4). Daily maximum and minimum air temperature and precipitation data from the GSD dataset (Maurer et al. 2002), summarized by HRUs, were used as PRMS forcings. A total of 35 USGS streamflow gauges, identified as predominately “unregulated” streamflow (Falcone 2011), were used for PRMS calibration and evaluation (Figure 4). A multiple-objective, stepwise, automated procedure (Hay et al. 2006; Hay and Umemoto 2006) combined with a nested-basin modeling approach was developed to calibrate the ACFB PRMS model. The simulated storm hydrographs using the GSD tended to start earlier, have smaller peak flows, and last longer when compared to measured and climate station-forced (nongridded) streamflow storm hydrographs. Streamflow timing was calibrated in PRMS using a 3-day running mean to keep the calibration process from overcompensating for the difference in flow timing due to the gridded inputs and adjusting the model parameters to unrealistic values. Therefore, evaluating streamflow on a daily basis may not be appropriate.
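The effect of a 3-day running mean on a goodness-of-fit score can be illustrated with a minimal sketch. It assumes a centered window with shortened windows at the series ends (the text does not specify the windowing), uses the Nash–Sutcliffe efficiency as the score, and relies on made-up flow values in which the simulated hydrograph starts earlier, peaks lower, and lasts longer than the observed one. The function names are hypothetical and are not part of PRMS.

```python
from statistics import mean


def running_mean(values, window=3):
    """Centered running mean; windows are shortened at the series ends
    (an assumption, the paper does not state how edges were handled)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of squared model error
    to the variance of the observations about their mean."""
    obs_mean = mean(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - error / spread


# Hypothetical daily flows: the simulated storm starts earlier,
# has a smaller peak, and recedes more slowly than the observed one.
observed = [10.0, 12.0, 30.0, 22.0, 15.0, 12.0, 11.0, 10.0]
simulated = [10.0, 20.0, 24.0, 22.0, 18.0, 13.0, 11.0, 10.0]

nse_daily = nash_sutcliffe(observed, simulated)
nse_3day = nash_sutcliffe(running_mean(observed), running_mean(simulated))
print(f"daily NSE = {nse_daily:.2f}, 3-day NSE = {nse_3day:.2f}")
```

For this toy series the smoothed score is higher than the daily score, because averaging damps the timing and peak mismatches that a daily comparison penalizes.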


Figure 4. Map showing Precipitation-Runoff Modeling System HRUs, stream network, and USGS stream gauges.


Figure 5. Comparison of Nash–Sutcliffe goodness-of-fit statistics using daily vs 3-day running mean streamflow values for the ACFB stream gauges (LaFontaine et al. 2013).

This known limitation in the predictive capability of models using the gridded forcings to reproduce measured streamflow magnitudes and timing must be considered when using these results (Mannshardt-Shamseldin et al. 2010; LaFontaine et al. 2013). Although other climate forcings could have been used to better simulate basin hydrology (i.e., nongridded station data or finer-resolution gridded inputs), the GSD dataset was selected as it provided coverage of the conterminous United States and a consistent framework for all components of the SERAP. Figure 5 shows the relation between the Nash–Sutcliffe (NS) model efficiency coefficients calculated with daily versus 3-day running mean time series for the 35 stream gauges (LaFontaine et al. 2013). The shaded areas indicate the performance rating for the NS coefficient based on the criteria from Moriasi et al. (Moriasi et al. 2007). As expected, NS results on a 3-day time step are always higher (greater than 0.6) than those for a daily time step. The resulting PRMS simulations of streamflow on a 3-day running mean time step for the period 1961–99 are all at least satisfactory with most of the ACFB classified as very good. The less optimal simulations


Figure 6. Diagram showing the information flow between the PRMS, P2S, and Stream Network Temperature model.

occurred in smaller, more developed subbasins and in subbasins located in the southern groundwater-dominated part of the basin. The reader is referred to the LaFontaine et al. (LaFontaine et al. 2013) report for further information on the ACFB PRMS model setup and analysis.

3.3. Stream Network Temperature model

SNTemp was used to predict in-stream water temperatures based on hydrological, meteorological, topographic and vegetative shading, and stream channel conditions in the ACFB. SNTemp is a mechanistic, one-dimensional heat transport model for branched stream networks that predicts the daily mean and maximum water temperatures as a function of stream distance and environmental heat flux. SNTemp incorporates 1) a heat transport model that predicts the daily mean water temperature and diurnal fluctuations in water temperature as functions of longitudinal downstream distance; 2) a heat flux model that predicts the energy balance between the water and its surrounding environment; and 3) a shade model that predicts the solar radiation-weighted shading resulting from both topography and riparian vegetation. Theurer et al. (Theurer et al. 1984) and Bartholow (Bartholow 2000) provide a complete description of SNTemp. Stream temperature is simulated with a “loosely coupled” model framework called P2S (documented in Markstrom 2012). Simulation of water temperature in a stream segment with SNTemp depends on 1) the water temperature and amount of water coming from upstream segments, 2) the water temperature and amount of local lateral flow to the segment from the adjacent land surface and subsurface, and 3) the energy balance of the stream segment. Total streamflow from the adjacent land is partitioned into direct surface runoff, subsurface, and groundwater components in PRMS (Figure 6), and different water temperatures are assigned to each of these components.
On the basis of this flow partitioning, a daily stream temperature is calculated for the total lateral flow into any particular stream


Figure 7. Schematic showing the PRMS input to the SNTemp model.

segment. In addition, PRMS computes air temperature, solar radiation, and potential evapotranspiration, on a daily time step, for each stream segment simulated by SNTemp. SNTemp produces stream temperature for every stream segment. In general there are two HRUs associated with each stream segment (a left and a right bank; Figure 7). To make the results for SNTemp consistent with the other modeling results by HRUs, all SNTemp results by stream segment were assigned to the contributing HRUs (i.e., in Figure 7 stream temperature for the stream segment is assigned to HRU 1 and HRU 2). SNTemp uses the accumulated streamflow above each stream segment (see Figure 7); therefore, streamflow results are also examined by HRU.
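One way to picture how a single lateral-inflow temperature could follow from the flow partitioning described above is a flow-weighted average of the component temperatures. This is only a sketch under that assumption: the text does not state the mixing rule used by P2S or SNTemp, and the function name, component labels, flows, and temperatures below are all hypothetical.

```python
def lateral_inflow_temperature(flows, temperatures):
    """Flow-weighted mean temperature of the lateral inflow to a segment.

    flows and temperatures are dicts keyed by flow component.
    Simple mixing is an assumption made for illustration only.
    """
    total_flow = sum(flows.values())
    if total_flow == 0:
        return None  # no lateral inflow on this day
    weighted = sum(flows[name] * temperatures[name] for name in flows)
    return weighted / total_flow


# Hypothetical daily values for one stream segment.
flows = {"surface": 1.0, "subsurface": 1.0, "groundwater": 2.0}            # m3/s
temperatures = {"surface": 20.0, "subsurface": 18.0, "groundwater": 15.0}  # deg C

print(lateral_inflow_temperature(flows, temperatures))
```

With these inputs the groundwater component carries half of the flow, so the mixed temperature is pulled toward the groundwater value.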

4. Model evaluation metrics

The goal of this study was to evaluate the 1) ability of the ARRM climate dataset to reproduce historical climate and 2) corresponding simulated streamflow and stream temperature outputs from the PRMS and SNTemp models, respectively. Climate (maximum and minimum air temperature and precipitation), hydrology (components of flow and streamflow), and stream temperature in Figure 2 (yellow boxes) for historical conditions (1961–99) were evaluated by comparing model outputs produced using historical climate forcings developed from the GSD (Maurer et al.


Table 2. Evaluation metrics.

Evaluation metric                              Abbreviation
Mean daily values                              Mean
Variance in daily values                       Variance
Number of dry days                             nDRY
Days < 5th percentile threshold each year      nLT05
Days > 95th percentile threshold each year     nGT95
Highest (lowest) value each year               1-day
Highest (lowest) 7-day value each year         7-day
Two-sample Kolmogorov–Smirnov test             KS

2002) to those produced using the three statistically downscaled GCMs (CCSM3, GFDL CM2.1, and PCM) developed with the ARRM downscaling procedure (Stoner et al. 2012) for the same period. In this analysis, all outputs produced using the GSD were considered “truth.” This allows for a spatial comparison, by HRU, of the simulations using GSD output with those using the ARRM output for the three GCMs (Table 1) using descriptive statistics such as the mean and variance, in combination with an evaluation of rare and sustained events and a test of distributional similarity using the KS test (Table 2).

4.1. Descriptive statistics

An examination of the mean and variance in daily climate, hydrology, and stream temperature output by HRU is presented in the form of model bias (ARRM minus GSD) for each downscaled GCM result (CCSM3, GFDL CM2.1, and PCM). Hayhoe and Stoner (Hayhoe and Stoner 2012) noted that, when analyzing the accuracy of ARRM statistically downscaled GCMs, absolute biases at the tails of the distribution were much greater than biases toward the center of the distribution because of the sparse observational data available to train the downscaling model at the tails of the distribution as compared to the center. There should be no expectation that extreme or sustained events produced using the GSD output will temporally match those from the downscaled GCMs because of the statistical nature of the ARRM downscaling procedure. However, over the period of record, the average of the number, length, and magnitude of extreme events from each year in the record from outputs produced using the ARRM downscaled GCMs should be similar to those using the GSD. Table 2 lists the extreme and sustained events evaluated for climate, hydrology, and stream temperature outputs for each HRU. The mean, variance, and average number of dry days per year (nDRY) were compared for each dataset.
Model biases in the tails of the distributions were examined by calculating threshold values at the 5th and 95th percentile for the climate, hydrology, and stream temperature outputs using the GSD. Using these thresholds, the number of days below (above) the 5th (95th) percentile value each year were calculated (nLT05 and nGT95) for each dataset. In addition, 1- and 7-day events each year for climate, hydrology, and stream temperature outputs were determined. These included the largest precipitation events, warmest maximum daily air temperatures, coldest minimum daily air temperatures, and warmest daily mean stream temperatures calculated for each of


the time periods (1 and 7 days). To examine high and low flow biases, the largest 1-day peak and smallest 7-day low flow were calculated using streamflow.

4.2. Kolmogorov–Smirnov test

The nonparametric two-sample KS test (Conover 1971) was used to determine if the GSD- and ARRM-derived datasets had different distributions. The KS test finds the maximum distance between two empirical cumulative distribution functions and is sensitive to differences in both central tendency and distribution shape. In this study, the null hypothesis (two datasets are from identical populations) is rejected if the KS test, using GSD and ARRM datasets, shows significant probabilities (p values) less than 0.05. If the null hypothesis is rejected, then the two populations tested may differ in median, variability, and/or the shape of the distribution. The distributions considered under the KS test null hypothesis are continuous and will not have definite null distributions when ties are possible. Therefore, a bootstrap KS test was used to calculate the correct test level when ties were present in the data (Abadie 2002).
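The test described above can be sketched in a few lines. The first function computes the two-sample KS statistic as the largest gap between the two empirical cumulative distribution functions. The second estimates a p value by resampling with replacement from the pooled data, which is one common way to build a bootstrap null distribution when ties are present; it is offered as an illustration and is not necessarily the exact procedure the authors ran. Function names, the number of resamples, and the sample values are assumptions.

```python
import random
from bisect import bisect_right


def ks_statistic(x, y):
    """Maximum absolute difference between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:  # the ECDFs only change at observed values
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def bootstrap_ks_pvalue(x, y, n_boot=200, seed=0):
    """Share of pooled resamples whose KS statistic is at least as large
    as the observed one (an illustrative bootstrap p value)."""
    rng = random.Random(seed)
    observed = ks_statistic(x, y)
    pooled = list(x) + list(y)
    exceed = 0
    for _ in range(n_boot):
        bx = [rng.choice(pooled) for _ in x]
        by = [rng.choice(pooled) for _ in y]
        if ks_statistic(bx, by) >= observed:
            exceed += 1
    return observed, exceed / n_boot


# Hypothetical daily values for one HRU; many zeros produce ties,
# as happens with dry days in a precipitation series.
gsd = [0.0, 0.0, 0.0, 1.2, 3.4, 0.0, 8.9, 0.0, 2.1, 0.0]
arrm = [0.0, 0.3, 0.5, 1.0, 2.8, 0.2, 6.5, 0.0, 1.9, 0.4]

d, p = bootstrap_ks_pvalue(gsd, arrm)
reject_null = p < 0.05  # decision rule stated in the text
```

In the study this comparison is made for each HRU and each output variable, with the null hypothesis rejected when the p value falls below 0.05.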

5. Results

In the following sections, the evaluation metrics described above and listed in Table 2 are used to compare the climate, hydrology, and stream temperature outputs shown in Figure 2 (yellow boxes), for historical conditions (1961–99). Model outputs using the GSD developed from station measurements (Maurer et al. 2002) are used as a surrogate for truth and are compared to model outputs using the ARRM datasets for the three GCMs.

5.1. Descriptive statistics

For each HRU, the mean and variance for daily climate (maximum and minimum air temperature and precipitation), hydrology (streamflow and components of flow), and stream temperature (daily mean) simulated using the GSD and the ARRM datasets for three GCMs (CCSM3, GFDL CM2.1, and PCM) were calculated. Figure 8 shows the range in HRU model bias (ARRM minus GSD) in the mean daily values for each variable. Positive biases (ARRM overprediction) are shaded a light pink. Model biases are minimal (less than 0.5°C) for all temperature calculations (Figure 8a). Median model biases for precipitation and hydrology are all negative (Figure 8b), indicating, in general, an underprediction in the mean daily values from the ARRM-based precipitation and hydrology outputs. The range in model bias, but not necessarily the medians, increased when going from precipitation to the surface, subsurface, and groundwater flow components. The range in bias for the resulting streamflow was generally less than the range in any of the flow components with median streamflow biases similar to those shown for the subsurface flow component. The largest biases in streamflow are associated with the CCSM3-based results (median bias of −7%), and the smallest biases are associated with the PCM-based results (median bias of −2%).
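The bias summaries reported above can be reproduced in outline as follows. The sketch computes, for each HRU, the bias in the mean daily value (ARRM minus GSD) and the same bias expressed as a percentage of the GSD mean, then takes the median across HRUs. Normalizing by the GSD mean is an assumption about how percent bias was defined, and the HRU identifiers and values are invented for illustration.

```python
from statistics import mean, median


def bias(arrm, gsd):
    """Bias in the mean daily value, ARRM minus GSD."""
    return mean(arrm) - mean(gsd)


def percent_bias(arrm, gsd):
    """Bias as a percentage of the GSD mean (assumed normalization)."""
    return 100.0 * (mean(arrm) - mean(gsd)) / mean(gsd)


# Hypothetical daily streamflow for three HRUs: (ARRM-based, GSD-based).
hru_outputs = {
    "hru_1": ([9.0, 9.5, 8.5], [10.0, 10.0, 10.0]),
    "hru_2": ([4.9, 5.0, 4.8], [5.0, 5.0, 5.0]),
    "hru_3": ([19.0, 18.0, 17.0], [20.0, 20.0, 20.0]),
}

percent_biases = {
    hru: percent_bias(arrm, gsd) for hru, (arrm, gsd) in hru_outputs.items()
}
median_percent_bias = median(percent_biases.values())
```

A negative median, as in this toy example, corresponds to the general underprediction described for the ARRM-based precipitation and hydrology outputs.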


Figure 8. Range in HRU (a) bias and (b) percent bias (ARRM minus GSD) in the mean daily values for each variable.

Figure 9 shows the range in model bias by HRU (ARRM minus GSD) for the variance of daily values for each variable. Absolute median model biases are small (less than 0.5°C²) for maximum and minimum air temperature calculations (Figure 9a), with ranges from −2 to 2.6°C². Median model biases for stream temperature variance are higher and all positive, with all values ranging from −2.8 to 4.1°C². Median model biases of variance for precipitation and hydrology are negative, with the largest underprediction of variance associated with the CCSM3-based results.
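The bias calculations behind Figures 8 and 9 can be sketched as follows. This is a hedged illustration: the percent-bias formula, 100 × (ARRM − GSD) / GSD, and the use of the sample variance are assumptions of the sketch, since the text only states that bias is "ARRM minus GSD." All function names and the per-HRU dictionary layout are hypothetical.

```python
from statistics import mean, median, variance


def bias(arrm, gsd, stat=mean):
    """Bias of a statistic (default: mean) as ARRM minus GSD."""
    return stat(arrm) - stat(gsd)


def percent_bias(arrm, gsd, stat=mean):
    """Bias expressed as a percentage of the GSD value (assumed form)."""
    reference = stat(gsd)
    return 100.0 * (stat(arrm) - reference) / reference


def summarize_by_hru(arrm_by_hru, gsd_by_hru, metric=bias, stat=mean):
    """Return (min, median, max) of a bias metric across HRUs.

    Both arguments map a hypothetical HRU identifier to a list of
    daily values for that HRU.
    """
    values = [
        metric(arrm_by_hru[hru], gsd_by_hru[hru], stat=stat)
        for hru in gsd_by_hru
    ]
    return min(values), median(values), max(values)
```

Passing `stat=variance` gives the variance biases, and passing `metric=percent_bias` gives the percent form used for precipitation and the flow variables.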

Figure 9. Range in HRU (a) bias and (b) percent bias (ARRM minus GSD) in the variance in daily values for each variable.


Figure 10. Range in model bias (ARRM minus GSD) by GCM for difference in average number of dry days per year.

The mean annual values for nDRY were calculated for each HRU and each precipitation dataset in Table 1. The boxplots in Figure 10 show the ranges in HRU model biases (ARRM minus GSD) for nDRY; negative values indicate that the GCM-derived output has fewer dry days than the GSD. Figure 11 maps the percentage of dry days in the period from 1961 to 1999 based on the GSD (Figure 11a) and the corresponding model biases from Figure 10 for the ARRM-based outputs from the three GCMs (CCSM3, GFDL CM2.1, and PCM; Figures 11b–d). The CCSM3-based precipitation consistently underestimates nDRY (Figure 10a) by as much as 10% for some HRUs in the southern part of the basin (Figure 11b), even when considering a 0.254-mm precipitation detection limit in the calculations (not shown). The GFDL CM2.1–based precipitation has nDRY similar to the GSD-based precipitation for many HRUs in the basin (Figures 10a, 11c). The PCM-based ARRM results underestimate nDRY compared to the GSD (Figures 10a, 11d) but not to the extent seen with the CCSM3-based ARRM results (Figures 10a, 11b).

Figure 11. (a) Percent dry days for GSD and model bias in percent dry days (ARRM minus GSD) for (b) CCSM3, (c) GFDL CM2.1, and (d) PCM.

To further examine the difference between the GSD and downscaled precipitation, model biases in mean daily precipitation by percentile (5% increments) were calculated (Figure 12) for each HRU. In general, the model bias in mean daily precipitation is zero in the 30th and lower percentiles (dashed gray line in Figure 12, indicating that at least 30% of the days are dry for all datasets). CCSM3-based ARRM results show increasing positive median biases from the 30th to the 80th percentile, indicating that the CCSM3-based ARRM simulations tend to consistently overestimate all but the largest precipitation events relative to the GSD. This may be due to the inability of the downscaling approach to correct for the “drizzle problem” typical in GCMs (Iorio et al. 2004; Mearns et al. 1995; Dai 2006; Lee et al. 2009), where smaller precipitation events are overly common. The downscaled GFDL CM2.1– and PCM-based precipitation shows the opposite pattern, with increasing negative median biases from the 30th to the 80th percentile, indicating that the GFDL CM2.1– and PCM-based ARRM simulations tend to consistently underestimate these precipitation events relative to the GSD. The largest negative median model biases are seen in the highest percentile for all ARRM results, indicating that the downscaled GCM simulations consistently underestimate the largest precipitation events relative to the GSD, especially when using the CCSM3-based ARRM results.

Figure 12. Range in HRU model biases (ARRM minus GSD) in mean daily precipitation by percentiles.

Figure 13 shows the HRU range in model bias by GCM for the mean annual number of days below the 5th percentile threshold (nLT05) and above the 95th percentile threshold (nGT95) of the GSD. For precipitation (Figure 13a), the threshold values were calculated after omitting the percentage of dry days determined from the GSD precipitation (approximately 30%; see Figure 12). Precipitation model biases are minimal for nGT95 (Figure 13a). The nLT05 results for precipitation are similar to those shown in Figure 10a, but Figure 13a includes the smallest precipitation events. In Figure 13a, the ARRM downscaled GCMs show median model biases for nLT05 of −13, −5, and −7 days yr⁻¹ for the CCSM3, GFDL CM2.1, and PCM, respectively. Model biases for maximum and minimum air temperature for the 5th and 95th percentile thresholds are less than 5 days (Figures 13b,c), with the exception of minimum air temperature in simulating nGT95 (negative model bias as large as 17 days; Figure 13c). Compared to maximum and minimum air temperature, stream temperature model biases are more variable and tend to be positive, indicating overestimation of these metrics (Figure 13d).

Figure 13. Range in HRU model bias (ARRM minus GSD) by GCM for number of days below the 5th (nLT05) and above the 95th (nGT95) percentile GSD threshold for (a) precipitation, (b) maximum air temperature, (c) minimum air temperature, (d) stream temperature, (e) surface flow, (f) subsurface flow, (g) groundwater flow, and (h) streamflow.
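A short sketch of the nLT05 and nGT95 calculation may help make the metric concrete. The thresholds are taken from the GSD series and then applied to both datasets, as described above; the percentile interpolation method (`method="inclusive"`), the function names, and the flat list-of-daily-values input are assumptions of this illustration. The dry-day omission applied to precipitation is not reproduced here.

```python
from statistics import quantiles


def gsd_thresholds(gsd_daily):
    """5th and 95th percentile values of the GSD daily series."""
    cuts = quantiles(gsd_daily, n=20, method="inclusive")
    return cuts[0], cuts[-1]  # first and last of the 19 cut points


def mean_annual_counts(daily, low, high, n_years):
    """Mean number of days per year below `low` and above `high`."""
    n_lt05 = sum(1 for v in daily if v < low) / n_years
    n_gt95 = sum(1 for v in daily if v > high) / n_years
    return n_lt05, n_gt95


def threshold_bias(arrm_daily, gsd_daily, n_years):
    """Bias (ARRM minus GSD) in nLT05 and nGT95, in days per year."""
    low, high = gsd_thresholds(gsd_daily)
    arrm = mean_annual_counts(arrm_daily, low, high, n_years)
    gsd = mean_annual_counts(gsd_daily, low, high, n_years)
    return arrm[0] - gsd[0], arrm[1] - gsd[1]
```

A negative first element means the downscaled series spends fewer days per year below the GSD 5th percentile than the GSD itself does.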


Figure 14. Range in HRU model bias (ARRM minus GSD) by GCM for mean annual 1- and 7-day maximum or minimum values for (a) precipitation, (b) maximum air temperature, (c) minimum air temperature, (d) stream temperature, and (e) streamflow.

The hydrology results (Figures 13e–h) show the largest model biases in the simulation of nLT05, with inconsistent results between simulations using the ARRM downscaled GCMs. Surface runoff results (Figure 13e) show a pattern between GCMs that is somewhat similar to that shown for precipitation (Figure 13a). Surface and subsurface runoff results (Figures 13e,f) both show the largest model biases in predicting nLT05 (−40 to 10 days for surface and −25 to 38 days for subsurface). Groundwater flow biases are the largest of the hydrology components (Figure 13g). Streamflow model biases (Figure 13h) are similar in magnitude to those shown for subsurface flow (Figure 13f). Biases in total streamflow associated with nLT05 are mostly positive for the GFDL CM2.1– and PCM-based ARRM results, both with median values of approximately 10 days, while the median bias for the CCSM3-based ARRM results was near zero.

Figure 14 shows the range in HRU model bias by GCM for mean annual 1- and 7-day maximum or minimum values for precipitation, maximum air temperature, minimum air temperature, stream temperature, and streamflow. Model biases are shown in degrees Celsius for temperatures and in percent for precipitation and streamflow. Median model biases for precipitation (Figure 14a) are all negative, indicating that the downscaling procedure tends to underestimate 1- and 7-day maximum annual precipitation events. The PCM-based ARRM results produce the smallest negative median biases, and the CCSM3-based ARRM results produce the largest negative median biases for the 1- and 7-day maximums. Model biases for maximum air temperatures (Figure 14b) are variable; the median biases are positive and less than 0.5°C for the CCSM3- and PCM-based ARRM results and negative and less than 1°C for the GFDL CM2.1–based ARRM results. Model biases for minimum air temperatures (Figure 14c) are positive and less than 0.8°C for the GFDL CM2.1– and PCM-based ARRM results and variable but less than 0.3°C for the CCSM3-based ARRM results. Stream temperature biases (Figure 14d) follow the same pattern as maximum air temperature for the GCMs (Figure 14b). Median model biases for streamflow are all negative, ranging between −15% and −25% for the 1-day maximums and between −5% and −12% for the 7-day minimums.

The descriptive evaluation metrics show some similarities in the outputs generated by the three ARRM downscaled GCMs. All ARRM-based models tended to underestimate precipitation, including the daily mean and variance (Figures 8b, 9b) and the annual 1-day and 7-day maximums (Figure 14a). Underprediction of precipitation for the historical period could indicate that precipitation will be underpredicted for any forecasted period. With respect to maximum and minimum air temperature, all ARRM-based outputs are highly accurate for daily mean and variance (Figures 8a, 9a), with less accuracy in replicating the temperature extremes (Figures 14b,c). For surface runoff, all ARRM-based outputs tended to underestimate the daily mean and variance. This could be because the models are underpredicting precipitation. Subsurface flow and streamflow biases tended to be of similar magnitudes. Groundwater flow biases show the largest spatial variability. The nGT95 metrics are well represented using the ARRM-based streamflow. In general, the annual 1-day maximum and the 7-day minimum are underestimated using the ARRM-based streamflow, again possibly because of the tendency to underpredict precipitation. Finally, stream temperature, the last link in the chain of models studied, shows high accuracy in estimating mean daily values, which can be attributed to the high accuracy associated with simulating air temperature.
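The annual 1-day maximum and 7-day minimum metrics can be illustrated with a small sketch. It assumes that the 7-day low flow is the minimum of a 7-day moving average within each year and that the "mean annual" value is the average of the yearly extremes; both are assumptions of this example, as are the function names and the year-to-daily-values dictionary.

```python
from statistics import mean


def rolling_means(values, window):
    """Moving average over `window` consecutive days."""
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


def mean_annual_extremes(flow_by_year, window=7):
    """Mean annual 1-day maximum and `window`-day minimum.

    `flow_by_year` maps a year to that year's list of daily values
    (a hypothetical input layout for this sketch).
    """
    peaks = [max(daily) for daily in flow_by_year.values()]
    lows = [min(rolling_means(daily, window)) for daily in flow_by_year.values()]
    return mean(peaks), mean(lows)


def percent_bias(arrm_value, gsd_value):
    """Percent bias of an ARRM-based extreme relative to the GSD-based one."""
    return 100.0 * (arrm_value - gsd_value) / gsd_value
```

Applying `mean_annual_extremes` to the GSD-based and ARRM-based streamflow for one HRU and passing the results to `percent_bias` would give one point in a boxplot like those in Figure 14.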

5.3. Kolmogorov–Smirnov test

Comparisons of model output from the GSD with the three ARRMs using the KS test are summarized by variable and GCM in Figures 15 and 16. KS tests were calculated using daily outputs for every HRU. Figure 15 shows boxplots of KS test p values by HRU, and Figure 16 maps the KS test p values by HRU. The null hypothesis (the two datasets are from identical populations) was not rejected if the KS test p values using the GSD and downscaled ARRMs were greater than 0.05. The gray-shaded area in Figure 15 corresponds to p values less than 0.05 (the gray HRUs in Figure 16). The null hypothesis is rejected for every HRU for precipitation, indicating the inability of the ARRM downscaling approach to replicate historical conditions on a daily basis for precipitation for all GCMs tested. Maximum and minimum air temperature results are more encouraging, with only a few HRUs with p values less than 0.05.

Figure 15. Range in KS test p values for HRUs by variable.

Because the ACFB is a rainfall-dominated basin, the lack of skill in the daily simulation of precipitation by the ARRMs is directly reflected in the KS test results for hydrology. The majority of HRUs show p values less than 0.05, indicating the inability of the ARRM downscaling approach to replicate historical conditions on a daily basis for hydrology for large portions of the ACFB. Streamflow results show an exception in the northwestern portion of the ACFB, where KS test p values are greater than 0.05 for the CCSM3-based ARRM results. The skill in downscaled maximum and minimum air temperature translated into relatively high skill for stream temperature simulations. Markovic et al. (2013) note that air temperature variability describes more than 80% of total stream temperature variability, and this is reflected in the KS test stream temperature results, with large portions of the basin having p values greater than 0.05 for the GFDL CM2.1– and PCM-based ARRM results. The CCSM3-based ARRM results for stream temperature show only a few HRUs with p values greater than 0.05 and also had the largest areas with p values less than 0.05 for air temperature (Figure 16).

6. Discussion

Table 3 summarizes the evaluation metrics (listed in Table 2) by identifying which ARRM downscaled GCM (Table 1) produced the best overall skill for each variable and metric combination. There is no “best” overall ARRM downscaled GCM for simulating the climate, hydrology, and stream temperature model outputs in the ACFB; skill varies from GCM to GCM, variable to variable, metric to metric, and HRU to HRU. Some general conclusions can be drawn from Table 3: 1) the highest overall skill in simulating precipitation is associated with the GFDL CM2.1–based ARRM results and the least skill is associated with the CCSM3-based ARRM results, 2) the highest overall skill in simulating maximum air temperature is associated with the CCSM3-based ARRM results, and 3) the highest overall skill in simulating the components of flow and streamflow is associated with the PCM-based ARRM results.

Figure 16. KS test p values for each variable and ARRM downscaled GCM.

Based on the KS test results alone, no skill is demonstrated in any of the ARRM-based daily precipitation, which translated into little skill in simulating daily streamflow and components of flow. To determine the aggregation of time steps appropriate for analysis of model outputs using downscaled climate as input, the KS test was calculated using daily values incrementally summed up to 30 days. Figure 17 shows, for all HRUs, the range in the number of days each model output must be aggregated to achieve a KS test p value greater than 0.05 for each GCM.
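The aggregation search described above can be sketched as a loop over window lengths. Several simplifications in this example are assumptions rather than the authors' procedure: values are summed over consecutive, non-overlapping blocks; the p value uses the standard asymptotic Kolmogorov series (which assumes no ties) instead of the bootstrap test used in the study; and all function names are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(x, y):
    """Maximum distance between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    return max(
        (abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
         for v in set(xs) | set(ys)),
        default=0.0,
    )


def ks_pvalue_asymptotic(d, n, m):
    """Asymptotic two-sample KS p value (approximation, ignores ties)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.05:  # the series converges slowly here; p is ~1
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return max(0.0, min(1.0, 2.0 * total))


def aggregate(daily, n_days):
    """Sum consecutive non-overlapping blocks; drop any leftover days."""
    return [
        sum(daily[i:i + n_days])
        for i in range(0, len(daily) - n_days + 1, n_days)
    ]


def days_to_agree(arrm, gsd, max_days=30, alpha=0.05):
    """Smallest aggregation length with KS p value > alpha, else None."""
    for n in range(1, max_days + 1):
        a, g = aggregate(arrm, n), aggregate(gsd, n)
        p = ks_pvalue_asymptotic(ks_statistic(a, g), len(a), len(g))
        if p > alpha:
            return n
    return None
```

A return value of `None` corresponds to an HRU that does not reach a p value greater than 0.05 within the maximum window.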

Table 3. Summary of evaluation metrics indicating best downscaled general circulation model for each variable in each box. Variables: precipitation (PRCP), maximum air temperature (TMAX), minimum air temperature (TMIN), surface flow (SURF), subsurface flow (SSUR), groundwater flow (GWTR), streamflow (STRM), and stream temperature (TEMP). Evaluation metrics: mean, variance, nDRY, nLT05, nGT95, 1 day, 7 days, and KS.


Figure 17. Range in number of days to achieve KS test p values greater than 0.05 for HRUs by variable and GCM.

Figure 18 maps the number of aggregated days by HRU. Some HRUs did not reach a p value greater than 0.05 within 30 days; these are omitted from the analysis (Figure 17) and indicated by the gray areas shown in Figure 18. The number of days over which the precipitation values need to be summed to achieve a KS test p value greater than 0.05 varied spatially and across ARRM-based GCMs, with approximate HRU median values of 7 days for the CCSM3 and 4 days for the GFDL CM2.1– and PCM-based ARRM results. The spatial patterns in the precipitation results are directly reflected in the surface runoff results (Figure 18). The range in the number of days tended to increase from surface runoff to subsurface flow to groundwater flow (from the fastest to the slowest components of flow). The resulting streamflow results are highly variable, ranging from 1 to 26 days across the basin. The streamflow results are somewhat counterintuitive, with the CCSM3-based ARRM results showing the lowest median number of days for streamflow but the highest for precipitation. Stream temperature results indicate accurate simulations with a time step of 2 days or less for most HRUs, but there are a few HRUs associated with each GCM that require more than 15 days for accurate stream temperature results.

These results indicate that many of the model outputs based on downscaled data may not be reliable for studies or decision making using analyses based on daily or even weekly time scales, but this is highly variable. Daily simulations of streamflow are even more suspect because of the limitations noted earlier in the predictive capability of PRMS using measured GSD to reproduce measured streamflow magnitudes and timing on a daily time step (LaFontaine et al. 2013). Until better GCM simulations of daily precipitation are available, estimates of future streamflow may be most appropriately evaluated on at least a weekly time step, whereas future stream temperatures, which rely more heavily on the downscaled air temperatures, may be evaluated subweekly.

Figure 18. Number of days to achieve a KS test p value greater than 0.05 for each HRU, variable, and downscaled GCM.

Evaluation of the model skill for simulated historical conditions suggests some guidelines for use of the carbon emission–based future projections. While it seems correct to place greater confidence in evaluation metrics that perform well for historical conditions, this does not necessarily mean those metrics will perform well under a different climate state. Problems with stationarity (Milly et al. 2008) may mean that parameters calibrated to historical conditions are not relevant for future conditions (Charles et al. 1999; Bloschl and Montanari 2010). Depending on the structure and parameterization of each model, uncertainty may propagate through the modeling chain in ways that either enhance it or compensate for it, with different components of a single model or model chain effectively canceling the errors (Tebaldi and Knutti 2007; Buytaert and Beven 2009; Knutti 2010). This may explain the localized exceptions in model skill for the different output variables and why some model results seem to get better further down the modeling chain.

Previous studies have found that the more metrics used to evaluate model skill, the harder it becomes to identify a “better model” (Bureau of Reclamation 2011; Brekke et al. 2008; Reichler and Kim 2008; Gleckler et al. 2008). The Statistical and Regional Dynamical Downscaling of Extremes for European Regions Project (STARDEX 2005) noted that their nonsystematic results made it difficult to choose a single best method. The hydrologic and stream temperature models developed in this study will be used to project future conditions in the ACFB for use by resource managers in a wide range of applications. While there may be no best model overall, the breadth of analysis can be used to give the product users an indication of the applicability of the results to their particular problem.

The Intergovernmental Panel on Climate Change (IPCC; Parry et al. 2007) recommended applying several GCMs in climate change studies, using an ensemble of projections. Some researchers promote weighting or not using specific GCMs based on their ability to reproduce historical conditions (Murphy et al. 2004; Tebaldi et al. 2005; Pitman and Perkins 2008; Reifen and Toumi 2009; Santer et al. 2009; Perkins et al. 2009; Sánchez et al. 2009; Knutti 2010). Others argue that the basis for establishing a GCM culling rationale is unclear (Bureau of Reclamation 2011) and promote the use of GCM ensembles to produce an outer bound on the maximum range of uncertainty for future conditions (Stainforth et al. 2007; Hay et al. 2011). Since results for historical conditions indicate that model outputs can have significant biases, the range in future projections examined in terms of change relative to historical conditions for each individual GCM may be more appropriate than predictions of magnitude. Analysis of the future ensembles that are consistent versus those showing significant differences can then serve to quantify the uncertainty associated with the range in future predictions.
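The idea of examining change relative to each GCM's own historical simulation can be illustrated with a small sketch. It shows that a constant multiplicative bias cancels in a relative change, which holds only if the bias is the same in both periods, an assumption the stationarity concerns raised above may undermine. The function names and GCM labels are hypothetical, and no values from the study are used.

```python
def relative_change(historical_mean, future_mean):
    """Percent change of a future mean relative to the historical mean."""
    return 100.0 * (future_mean - historical_mean) / historical_mean


def change_range(hist_by_gcm, fut_by_gcm):
    """Per-GCM relative change plus the ensemble minimum and maximum.

    Each GCM's future output is compared with that same GCM's
    historical output, never with another model or with observations.
    """
    changes = {
        gcm: relative_change(hist_by_gcm[gcm], fut_by_gcm[gcm])
        for gcm in hist_by_gcm
    }
    return changes, min(changes.values()), max(changes.values())


if __name__ == "__main__":
    # Placeholder numbers only: a series with a constant 7% low bias
    # yields the same relative change as the unbiased series.
    unbiased = relative_change(100.0, 110.0)
    biased = relative_change(0.93 * 100.0, 0.93 * 110.0)
    print(round(unbiased, 6), round(biased, 6))
```

The spread between the minimum and maximum returned by `change_range` is one simple way to express the range in projected change across an ensemble.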

7. Conclusions

The hydrologic and stream temperature models for the ACFB were developed as part of the SERAP to help environmental resource managers assess potential effects of climate change on ecosystems and priority species in the region. In this study, the daily GSD dataset and the hydrology and stream temperature model outputs based on it (see Figure 2) were considered “truth” and compared with model outputs produced using the ARRM datasets from three GCMs for historical conditions (1961–99). Results from this study of historical conditions will be used to guide the use of future projections.

The model skill evaluation for historical conditions suggests some guidelines for use of future projections. These results indicate that many of the model outputs based on downscaled data may not be reliable for studies requiring analyses on daily or even weekly time scales, but this is highly variable across the ACFB. Until better GCM simulations of daily precipitation are available, estimates of future streamflow may be most appropriately evaluated on at least a weekly time step, whereas future stream temperatures, which rely more heavily on the downscaled air temperatures, may be evaluated subweekly. While it seems correct to place greater confidence in evaluation metrics that perform well historically, this does not necessarily mean those metrics will perform well under a different climate state. While there may be no best overall model, the breadth of analysis can be used to give the product users an indication of the applicability of the results to their particular problem. Since results for historical conditions indicate that model outputs can have significant biases, the range in future projections examined in terms of change relative to historical conditions for each individual GCM may be more appropriate than predictions of magnitude.

References

Abadie, A., 2002: Bootstrap tests for distributional treatment effects in instrumental variable models. J. Amer. Stat. Assoc., 97, 284–292, doi:10.1198/016214502753479419.

Alley, R. B., and Coauthors, 2007: Summary for policymakers. Climate Change 2007: The Physical Science Basis, S. Solomon et al., Eds., Cambridge University Press, 1–18.

Bartholow, J. M., 2000: The stream segment and stream network temperature models: A self-study course. U.S. Geological Survey Open-File Rep. 99-112, 270 pp.

Blodgett, D. L., 2013: The U.S. Geological Survey Climate Geo Data Portal: An integrated broker for climate and geospatial data. U.S. Geological Survey Fact Sheet 2013-3019, 2 pp.

Bloschl, G., and A. Montanari, 2010: Climate change impacts—Throwing the dice? Hydrol. Processes, 24, 374–381.

Brands, S., S. Herrera, D. San-Martín, and J. M. Gutiérrez, 2011: Validation of the ENSEMBLES global climate models over southwestern Europe using probability density functions, from a downscaling perspective. Climate Res., 48, 145–161, doi:10.3354/cr00995.

Brekke, L. D., M. D. Dettinger, E. P. Maurer, and M. Anderson, 2008: Significance of model credibility in estimating climate projection distributions for regional hydroclimatological risk assessments. Climatic Change, 89, 371–394, doi:10.1007/s10584-007-9388-3.

Bureau of Reclamation, 2011: West-wide climate risk assessments: Bias-corrected and spatially downscaled surface water projections. U.S. Department of the Interior Bureau of Reclamation Technical Services Center Tech. Memo. 86-68210-2011-01, 138 pp.

Buytaert, W., and K. Beven, 2009: Regionalization as a learning process. Water Resour. Res., 45, W11419, doi:10.1029/2008WR007359.

Charles, S. P., B. C. Bates, P. H. Whetton, and J. P. Hughes, 1999: Validation of downscaling models for changed climate conditions: Case study of southwestern Australia. Climate Res., 12, 1–14, doi:10.3354/cr012001.

Chen, J., F. P. Brissette, A. Poulin, and R. Leconte, 2011: Overall uncertainty study of the hydrological impacts of climate change for a Canadian watershed. Water Resour. Res., 47, W11515, doi:10.1029/2011WR010491.

Colombo, A., D. Etkin, and B. Karney, 1999: Climate variability and the frequency of extreme temperature events for nine sites across Canada: Implications for power usage. J. Climate, 12, 2490–2502, doi:10.1175/1520-0442(1999)012<2490:CVATFO>2.0.CO;2.

Conover, W. J., 1971: Practical Nonparametric Statistics. John Wiley & Sons, 462 pp.

Dai, A., 2006: Precipitation characteristics in eighteen coupled climate models. J. Climate, 19, 4605–4630, doi:10.1175/JCLI3884.1.

Daly, C., R. P. Neilson, and D. L. Phillips, 1994: A statistical-topographic model for mapping climatological precipitation over mountainous terrain. J. Appl. Meteor., 33, 140–158, doi:10.1175/1520-0450(1994)033<0140:ASTMFM>2.0.CO;2.


——, G. H. Taylor, and W. P. Gibson, 1997: The PRISM approach to mapping precipitation and temperature. Proc., 10th Conf. on Applied Climatology, Reno, NV, Amer. Meteor. Soc., 10–12.

Dettinger, M. D., D. R. Cayan, M. K. Meyer, and A. E. Jeton, 2004: Simulated hydrologic responses to climate variations and change in the Merced, Carson, and American River basins, Sierra Nevada, California, 1900–2099. Climatic Change, 62, 283–317, doi:10.1023/B:CLIM.0000013683.13346.4f.

Easterling, D. R., G. A. Meehl, C. Parmesan, S. A. Changnon, T. R. Karl, and L. O. Mearns, 2000: Climate extremes: Observations, modeling, and impacts. Science, 289, 2068–2074, doi:10.1126/science.289.5487.2068.

Falcone, J. A., cited 2011: GAGES-II, Geospatial Attributes of Gages for Evaluating Streamflow. Digital spatial dataset. [Available online at http://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml.]

Fowler, H. J., S. Blenkinsop, and C. Tebaldi, 2007: Linking climate change modelling to impacts studies: Recent advances in downscaling techniques for hydrological modelling. Int. J. Climatol., 27, 1547–1578, doi:10.1002/joc.1556.

Gleckler, P. J., K. E. Taylor, and C. Doutriaux, 2008: Performance metrics for climate models. J. Geophys. Res., 113, D06104, doi:10.1029/2007JD008972.

Hay, L. E., and M. Umemoto, 2006: Multiple-objective step-wise calibration using Luca. U.S. Geological Survey Open File Rep. 2006-1323, 28 pp.

——, M. P. Clark, R. L. Wilby, W. J. Gutowski, G. H. Leavesley, Z. Pan, R. W. Arritt, and E. S. Takle, 2002: Use of regional climate model output for hydrologic simulations. J. Hydrometeor., 3, 571–590, doi:10.1175/1525-7541(2002)003<0571:UORCMO>2.0.CO;2.

——, G. H. Leavesley, M. P. Clark, S. L. Markstrom, R. J. Viger, and M. Umemoto, 2006: Stepwise, multiple-objective calibration of a hydrologic model for a snowmelt-dominated basin. J. Amer. Water Resour. Assoc., 42, 877–890, doi:10.1111/j.1752-1688.2006.tb04501.x.

——, S. L. Markstrom, and C. Ward-Garrison, 2011: Watershed-scale response to climate change through the twenty-first century for selected basins across the United States. Earth Interact., 15, doi:10.1175/2010EI370.1.

Hayhoe, K., and A. Stoner, 2012: The Gulf Coast Study, phase 2: Temperature and precipitation projections for the Mobile Bay region. U.S. DOT Center for Climate Change and Environmental Forecasting Final Rep., 54 pp. [Available online at http://www.fhwa.dot.gov/environment/climate_change/adaptation/ongoing_and_current_research/gulf_coast_study/phase2_task2/mobile_infrastructure/mobile_climate_report.pdf.]

Iorio, J. P., P. B. Duffy, B. Govindasamy, S. L. Thompson, M. Khairoutdinov, and D. Randall, 2004: Effects of model resolution and subgrid-scale physics on the simulation of precipitation in the continental United States. Climate Dyn., 23, 243–258, doi:10.1007/s00382-004-0440-y.

Jun, M., R. Knutti, and D. W. Nychka, 2008: Spatial analysis to quantify numerical model bias and dependence. J. Amer. Stat. Assoc., 103, 934–947, doi:10.1198/016214507000001265.

Katz, R. W., and B. G. Brown, 1992: Extreme events in a changing climate. Climatic Change, 21, 289–302, doi:10.1007/BF00139728.

Knutti, R., 2010: The end of model democracy? Climatic Change, 102, 395–404, doi:10.1007/s10584-010-9800-2.

LaFontaine, J. H., L. E. Hay, R. J. Viger, S. L. Markstrom, R. S. Regan, C. M. Elliott, and J. W. Jones, 2013: Application of the Precipitation-Runoff Modeling System (PRMS) in the Apalachicola-Chattahoochee-Flint River basin in the southeastern United States. U.S. Geological Survey Scientific Investigations Rep. 2013-5162, 132 pp.

Leavesley, G. H., and L. G. Stannard, 1995: The Precipitation-Runoff Modeling System—PRMS. Computer Models of Watershed Hydrology: Highlands Ranch, V. P. Singh, Ed., Water Resources Publications, 281–310.

——, R. W. Lichty, B. M. Troutman, and L. G. Saindon, 1983: Precipitation-Runoff Modeling System—User’s manual. U.S. Geological Survey Water-Resources Investigations Rep. 83-4238, 207 pp.


——, S. L. Markstrom, R. J. Viger, and L. E. Hay, 2005: USGS Modular Modeling System (MMS)–Precipitation-Runoff Modeling System (PRMS) MMS-PRMS. Watershed Models, V. P. Singh and D. K. Frevert, Eds., CRC Press, 159–177.

Lee, J. E., R. Pierrehumbert, A. Swann, and B. R. Linter, 2009: Sensitivity of stable water isotopic values to convective parameterization schemes. Geophys. Res. Lett., 36, L23801, doi:10.1029/2009GL040880.

Mannshardt-Shamseldin, E., R. L. Smith, S. R. Sain, L. Mearns, and D. Cooley, 2010: Downscaling extremes: A comparison of extreme value distributions in point-source and gridded precipitation data. Ann. Appl. Stat., 4, 484–502, doi:10.1214/09-AOAS287.

Markovic, D., U. Scharfenberger, S. Schmutz, F. Pletterbauer, and C. Wolter, 2013: Variability and alterations of water temperatures across the Elbe and Danube River basins. Climatic Change, 119, 375–389, doi:10.1007/s10584-013-0725-4.

Markstrom, S. L., 2012: P2S—Coupled simulation with the Precipitation-Runoff Modeling System (PRMS) and the Stream Temperature Network (SNTemp) models. U.S. Geological Survey Open-File Rep. 2012-1116, 19 pp.

——, R. G. Niswonger, R. S. Regan, D. E. Prudic, and P. M. Barlow, 2008: GSFLOW—Coupled ground-water and surface-water flow model based on the integration of the Precipitation-Runoff Modeling System (PRMS) and the Modular Ground-Water Flow Model (MODFLOW-2005). U.S. Geological Survey Techniques and Methods 6-D1, 240 pp.

Maurer, E. P., A. W. Wood, J. C. Adam, D. P. Lettenmaier, and B. Nijssen, 2002: A long-term hydrologically based dataset of land surface fluxes for the conterminous United States. J. Climate, 15, 3237–3251, doi:10.1175/1520-0442(2002)015<3237:ALTHBD>2.0.CO;2.

Maxino, C. C., B. J. McAvaney, A. J. Pitman, and S. E. Perkins, 2008: Ranking the AR4 climate models over the Murray-Darling basin using simulated maximum temperature, minimum temperature and precipitation. Int. J. Climatol., 28, 1097–1112, doi:10.1002/joc.1612.

Mearns, L. O., R. W. Katz, and S. H. Schneider, 1984: Extreme high-temperature events: Changes in the probabilities with changes in mean temperature. J. Climate Appl. Meteor., 23, 1601–1613, doi:10.1175/1520-0450(1984)023<1601:EHTECI>2.0.CO;2.

——, F. Giorgi, L. McDaniel, and C. Shield, 1995: Analysis of daily variability of precipitation in a nested regional climate model: Comparison with observations and doubled CO2 results. Global Planet. Change, 10, 55–78, doi:10.1016/0921-8181(94)00020-E.

Milly, P. C. D., J. Betancourt, M. Falkenmark, R. M. Hirsch, Z. W. Kundzewicz, D. P. Lettenmaier, and R. J. Stouffer, 2008: Stationarity is dead: Whither water management? Science, 319, 573–574, doi:10.1126/science.1151915.

Moriasi, D. N., J. G. Arnold, M. W. Van Liew, R. L. Bingner, R. D. Harmel, and T. L. Veith, 2007: Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE, 50, 885–900, doi:10.13031/2013.23153.

Murphy, J. M., D. M. H. Sexton, D. N. Barnett, G. S. Jones, M. J. Webb, M. Collins, and D. A. Stainforth, 2004: Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature, 430, 768–772, doi:10.1038/nature02771.

O’Brien, T. P., D. Sornette, and R. L. McPherron, 2001: Statistical asynchronous regression: Determining the relationship between two quantities that are not measured simultaneously. J. Geophys. Res., 106, 13 247–13 259, doi:10.1029/2000JA900193.

Pappenberger, F., and K. J. Beven, 2006: Ignorance is bliss: Or seven reasons not to use uncertainty analysis. Water Resour. Res., 42, W05302.

Parry, M. L., O. F. Canziani, J. P. Palutikof, P. J. van der Linden, and C. E. Hanson, Eds., 2007: Climate Change 2007: Impacts, Adaptation, and Vulnerability, Cambridge University Press, 976 pp.

Perkins, S. E., A. J. Pitman, and S. A. Sisson, 2009: Smaller projected increases in 20-year temperature returns over Australia in skill-selected climate models. Geophys. Res. Lett., 36, L06710, doi:10.1029/2009GL037293.

——, D. B. Irving, J. R. Brown, S. B. Power, A. F. Moise, R. A. Colman, and I. Smith, 2012: CMIP3 ensemble climate projections over the western tropical Pacific based on model skill. Climate Res., 51, 35–58, doi:10.3354/cr01046.
Pitman, A. J., and S. E. Perkins, 2008: Regional projections of future seasonal and annual changes in rainfall and temperature over Australia based on skill-selected AR4 models. Earth Interact., 12, doi:10.1175/2008EI260.1.
Prudhomme, C., and H. Davies, 2009: Assessing uncertainties in climate change impact analyses on the river flow regimes in the UK. Part 2: Future climate. Climatic Change, 93, 197–222, doi:10.1007/s10584-008-9461-6.
Reichler, T., and J. Kim, 2008: How well do coupled models simulate today’s climate? Bull. Amer. Meteor. Soc., 89, 303–311, doi:10.1175/BAMS-89-3-303.
Reifen, C., and R. Toumi, 2009: Climate projections: Past performance no guarantee of future skill? Geophys. Res. Lett., 36, L13704, doi:10.1029/2009GL038082.
Sánchez, E., R. Romera, M. A. Gaertner, C. Gallardo, and M. Castro, 2009: A weighting proposal for an ensemble of regional climate models over Europe driven by 1961–2000 ERA40 based on monthly precipitation probability density functions. Atmos. Sci. Lett., 10, 241–248.
Santer, B. D., and Coauthors, 2009: Incorporating model quality information in climate change detection and attribution studies. Proc. Natl. Acad. Sci. USA, 106, 14 778–14 783, doi:10.1073/pnas.0901736106.
Schaeffer, M., F. M. Selten, and J. D. Opsteegh, 2005: Shifts in means are not a proxy for changes in extreme winter temperatures in climate projections. Climate Dyn., 25, 51–63, doi:10.1007/s00382-004-0495-9.
Stainforth, D. A., T. E. Downing, R. W. A. Lopez, and M. New, 2007: Issues in the interpretation of climate model ensembles to inform decisions. Philos. Trans. Roy. Soc., 365A, 2163–2177, doi:10.1098/rsta.2007.2073.
STARDEX, 2005: STARDEX: Downscaling climate extremes. STARDEX Executive Summary, 24 pp. [Available online at http://www.cru.uea.ac.uk/projects/stardex/reports/STARDEX_FINAL_REPORT.pdf.]
Stoner, A. M. K., K. Hayhoe, X. Yang, and D. J. Wuebbles, 2012: An asynchronous regional regression model for statistical downscaling of daily climate variables. Int. J. Climatol., 33, 2473–2494, doi:10.1002/joc.3603.
Tebaldi, C., and R. Knutti, 2007: The use of the multi-model ensemble in probabilistic climate projections. Philos. Trans. Roy. Soc., 365A, 2053–2075, doi:10.1098/rsta.2007.2076.
——, R. L. Smith, D. Nychka, and L. O. Mearns, 2005: Quantifying uncertainty in projections of regional climate change: A Bayesian approach to the analysis of multimodel ensembles. J. Climate, 18, 1524–1540, doi:10.1175/JCLI3363.1.
Theurer, F. D., K. A. Voos, and W. J. Miller, 1984: Instream water temperature model. U.S. Fish and Wildlife Service Instream Flow Information Paper 16, 316 pp.
Trigo, R. M., R. García-Herrera, J. Díaz, and I. F. Trigo, 2005: How exceptional was the early August 2003 heatwave in France? Geophys. Res. Lett., 32, L10701, doi:10.1029/2005GL022410.
U.S. Army Corps of Engineers, 1997: ACT/ACF comprehensive water resources study: Surface water availability: Volume I: Unimpaired flow. U.S. Army Corps of Engineers Rep., 96 pp.
Vrac, M., and P. Naveau, 2007: Stochastic downscaling of precipitation: From dry events to heavy rainfalls. Water Resour. Res., 43, W07402, doi:10.1029/2006WR005308.
Wood, A. W., L. R. Leung, V. Sridhar, and D. P. Lettenmaier, 2004: Hydrologic implications of dynamical and statistical approaches to downscaling climate model outputs. Climatic Change, 62, 189–216, doi:10.1023/B:CLIM.0000013685.99609.9e.

Earth Interactions is published jointly by the American Meteorological Society, the American Geophysical Union, and the Association of American Geographers. Permission to use figures, tables, and brief excerpts from this journal in scientific and educational works is hereby granted provided that the source is acknowledged. Any use of material in this journal that is determined to be ‘‘fair use’’ under Section 107 or that satisfies the conditions specified in Section 108 of the U.S. Copyright Law (17 USC, as revised by P.L. 94-553) does not require the publishers’ permission. For permission for any other form of copying, contact one of the copublishing societies.