
Water Resources Research
RESEARCH ARTICLE
10.1002/2015WR017782

Key Points:
- Reduced-order model for predicting high-resolution hydrological and biogeochemical dynamics
- Accurate and efficient estimates of soil moisture, latent heat flux, and net primary production
- Accurate reproduction of subgrid probability distributions, with computational savings of O(1000)

Correspondence to: G. S. H. Pau, [email protected]

Citation: Pau, G. S. H., C. Shen, W. J. Riley, and Y. Liu (2016), Accurate and efficient prediction of fine-resolution hydrologic and carbon dynamic simulations from coarse-resolution models, Water Resour. Res., 52, 791–812, doi:10.1002/2015WR017782.

Received 2 JUL 2015
Accepted 7 JAN 2016
Accepted article online 11 JAN 2016
Published online 10 FEB 2016

Accurate and efficient prediction of fine-resolution hydrologic and carbon dynamic simulations from coarse-resolution models

George Shu Heng Pau¹, Chaopeng Shen², William J. Riley¹, and Yaning Liu¹

¹ Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
² Department of Civil and Environmental Engineering, Pennsylvania State University, University Park, Pennsylvania, USA

Abstract The topography and the biotic and abiotic parameters are typically upscaled to make watershed-scale hydrologic-biogeochemical models computationally tractable. However, upscaling procedures can produce biases when nonlinear interactions between different processes are not fully captured at coarse resolutions. Here we applied the Proper Orthogonal Decomposition Mapping Method (PODMM) to downscale the field solutions from a coarse (7 km) resolution grid to a fine (220 m) resolution grid. PODMM trains a reduced-order model (ROM) with coarse-resolution and fine-resolution solutions, here obtained using PAWS+CLM, a quasi-3-D watershed processes model that has been validated for many temperate watersheds. Subsequent fine-resolution solutions were approximated based only on coarse-resolution solutions and the ROM. The approximation errors were efficiently quantified using an error estimator. By jointly estimating correlated variables and temporally varying the ROM parameters, we further reduced the approximation errors by up to 20%. We also improved the method’s robustness by constructing multiple ROMs using different sets of variables and selecting the best approximation based on the error estimator. The ROMs produced accurate downscaling of soil moisture, latent heat flux, and net primary production with an O(1000) reduction in computational cost. The subgrid distributions were also nearly indistinguishable from the ones obtained using the fine-resolution model. Compared to coarse-resolution solutions, biases in upscaled ROM solutions were reduced by up to 80%. This method has the potential to help address the long-standing spatial scaling problem in hydrology and enable long-time integration, parameter estimation, and stochastic uncertainty analysis while accurately representing the heterogeneities.

1. Introduction

© 2016. American Geophysical Union. All Rights Reserved.

PAU ET AL.

Hyperresolution hydrological and biogeochemical (BGC) models are increasingly used to study land surface and subsurface processes [Bierkens et al., 2015; Wood et al., 2011]. While the merit of high-resolution models remains debatable given the scarcity of appropriate observational data to constrain the models [Beven and Cloke, 2012], high-resolution spatial structure in hydrological states and fluxes has been demonstrated to be important in the prediction of surface evapotranspiration budgets [Vivoni et al., 2007; Wood, 1997], runoff and streamflow [Arrigo and Salvucci, 2005; Barrios and Frances, 2012; Vivoni et al., 2007], atmospheric feedbacks [Nykanen and Foufoula-Georgiou, 2001], and carbon fluxes [Bohn et al., 2013, 2007]. For BGC models, there is a need to resolve “hot spot” dynamics at scales that range between centimeters [Frei et al., 2012] and meters [McClain et al., 2003]. Bouwman et al. [2013] also concluded that detailed representations of streams, ponds, lakes, reservoirs, floodplains, and wetlands are needed to accurately describe BGC cycles. Directly simulating these processes at their relevant scales will remain computationally challenging in an operational setting [Bierkens et al., 2015], such as for uncertainty quantification or data assimilation, when a large number of simulations is needed. Practical considerations therefore impose a need for upscaling techniques that are able to represent spatial heterogeneity at coarser resolutions. In subsurface flow, the accurate upscaling of hydraulic parameters is a well-studied field; see, for example, Wen and Gómez-Hernández [1996] and Jana and Mohanty [2012]. However, effects of fine-resolution heterogeneities are not directly captured and the upscaled solutions may have significant biases, depending on the upscaling techniques used and the types of nonlinearities in the models, especially those with coupled processes.

EFFICIENT PREDICTION OF HYDROLOGIC AND CARBON DYNAMICS


One approach to recover the spatial heterogeneity at resolutions finer than represented in a particular modeling framework is to relate the statistical properties of the field of interest (e.g., soil moisture) with the spatial scale [Hu et al., 1997; Rodriguez-Iturbe et al., 1995; Wood, 1998]. However, these relationships cannot be described in a simple manner [e.g., Das and Mohanty, 2008; Famiglietti et al., 1999; Joshi and Mohanty, 2010; Mascaro et al., 2010, 2011; Nykanen and Foufoula-Georgiou, 2001]. The soil moisture statistical fractal [Rodriguez-Iturbe et al., 1995], for example, evolves in time depending on complex interactions between rainfall, groundwater flow, soil water retention, and land-use heterogeneity [Ji et al., 2015].

A second approach is to explicitly include a description of higher-order moments in the governing equations [Albertson and Montaldo, 2003; Montaldo and Albertson, 2003; Teuling and Troch, 2005; Kumar, 2004; Choi et al., 2007]. However, the derivations of these augmented governing equations for different processes and subsequent numerical implementations are nontrivial, and are frequently applicable only to idealized situations. These methods also cannot account for the temporal memory in the system that impacts BGC transformations.

A third approach is to relate subgrid higher-order moments to the means, and then apply these relationships within a model that predicts the transient coarse-resolution mean. In many observationally based studies [e.g., Brocca et al., 2010, 2012; Choi and Jacobs, 2011; Famiglietti et al., 2008; Lawrence and Hornberger, 2007; Li and Rodell, 2013a; Pan and Peters-Lidard, 2008; Rosenbaum et al., 2012; Tague et al., 2010; Teuling et al., 2007; Teuling and Troch, 2005] and theoretical analyses [e.g., Vereecken et al., 2007], an upward convex relationship between the mean and variance of soil moisture has been reported.
However, the relationships between soil moisture mean and statistical moments have been reported to depend on a large number of factors [Brocca et al., 2007; Ivanov et al., 2010; Riley and Shen, 2014]. The lack of a set of dominant factors complicates efforts to utilize these relationships within land models to represent subgrid spatial heterogeneity.

In climate modeling, a common approach to capture subgrid variability is the tiling scheme [Essery et al., 2003]. For example, in the Community Land Model [Oleson et al., 2010], the tiling scheme consists of a hierarchy of nested subgrids. A particular grid block is divided into land units, which are subdivided into snow/soil columns; each column is further populated with different plant functional types (PFTs). Although this method allows processes to be described at the desired scale, it is still computationally demanding since we must evaluate climate variables (such as heat and carbon fluxes) in all the grid blocks in a fine-resolution subgrid. In addition, the tiling scheme is not capable of capturing the effects of lateral communication, e.g., overland flow and groundwater flow. Recently there has been discussion in the literature regarding the missing mechanism of lateral convergence and how tiling schemes have been found to be inadequate under some conditions [Clark et al., 2015].

Downscaling techniques have also been used to downscale climatic variables, typically precipitation and temperature, from global to regional scales; these techniques have been reviewed, e.g., in Wilby et al. [1998], Fowler et al. [2007], and Gutmann et al. [2014]. Statistical scaling methods, such as the bias-corrected spatial disaggregation method [Wood et al., 2004], have been particularly successful for downscaling precipitation suitable as input to regional hydrological models.
More complex regression models [von Storch et al., 1993; Hanssen-Bauer et al., 2003] are used to directly model the relationship between the predictors (e.g., sea level pressure) and climatic variables of interest. A similar approach called pattern scaling [Tebaldi and Arblaster, 2014] has been used within integrated assessment models for impact assessment. However, the above methods have not been directly used to downscale land subsurface and surface quantities such as soil moisture and net primary production (NPP).

In this paper, we focus on a reduced-order modeling technique called the Proper Orthogonal Decomposition Mapping Method (PODMM) [Robinson et al., 2006; Pau et al., 2014] that captures spatially explicit relationships between fine-resolution and coarse-resolution model predictions. PODMM determines the dominant singular vectors that describe the spatial and temporal variability in the solutions based on a singular value decomposition of the data matrix, similarly to principal component analysis [Jolliffe, 2002] and Karhunen-Loève decomposition [Everson and Sirovich, 1995]. High-resolution approximations are then obtained by mapping coarse-resolution solutions onto a subset of the singular vectors. PODMM can thus be considered a regression-based downscaling technique since the mapping process involves solving a least squares minimization problem.

There are other types of reduced-order models (ROMs) that are commonly used in hydrology that do not involve mapping coarse-resolution solutions to fine-resolution solutions. An approach that is closely related
to PODMM is the projection-based POD approach [Willcox and Peraire, 2002] that directly discretizes the governing equation in the approximation space spanned by the singular vectors. This approach has recently been used in hydrology [Vermeulen et al., 2004; Siade et al., 2010; Li and Rodell, 2013b; Boyce et al., 2015] and subsurface flow [Cardoso et al., 2009; Lieberman et al., 2010]. However, for highly nonlinear partial differential equations, projection-based approaches are difficult to apply and intrusive, requiring extensive modification to existing codes.

Alternatively, response surface approaches are less intrusive methods that directly map the model parameters to the output of the model [see, e.g., Razavi et al., 2012, for a sample of ROMs used in hydrology]. In hydrology, these ROMs are typically constructed based on Bayesian statistical approaches [Kennedy and O’Hagan, 2001; Sacks et al., 1989]. In particular, Gaussian process regression (GPR) [Rasmussen and Williams, 2006] has been almost exclusively used in recent years for model calibration and uncertainty analysis [Drignei et al., 2008; Edwards et al., 2010; Holden et al., 2009; Olson et al., 2012; Rougier et al., 2009]. However, only scalar output is modeled in these studies. Direct use of GPR with multivariate output is possible by considering a covariance function that is a function of both the parameters and all components of the output [Alvarez, 2012; Conti and O’Hagan, 2009], although the resulting GPR model will be computationally expensive if we are interested in predicting fine-resolution solutions. An alternative approach is to first reduce the degrees of freedom of the multivariate output through principal component analysis [Higdon et al., 2008; Lawrence, 2004; Liu et al., 2015] or wavelets [Bayarri et al., 2007; Drignei et al., 2008; Marrel et al., 2010]. GPR models are then constructed for the output of these dimension reduction techniques.
This approach belongs to the same category as the regression-based statistical downscaling techniques if coarse-scale predictors are used in place of model parameters.

In this paper, we focus on the development of PODMM. We applied PODMM to a PAWS+CLM model of a Michigan river basin that has significantly more temporal and spatial variability than the model in Pau et al. [2014]. In section 2, we describe the Michigan watershed used for our simulations, the PAWS+CLM model, the PODMM method and several new improvements, and an efficient approach to error estimation. In section 3, we describe the accuracy of the methods in approximating fine-scale and upscaled solutions, as well as subgrid statistical distributions. We end with a discussion of the method’s current limitations, possible improvements, and methods to incorporate the proposed ROM approach within a global-scale hydrological and biogeochemical model.

2. Methods

2.1. The Clinton River Watershed

The Clinton River basin (CRB, 1837 km²) drains into Lake St. Clair to the East and has a humid continental climate, with a relatively uniform seasonal distribution of precipitation (average annual precipitation is approximately 900 mm) but strong seasonal variations in solar radiation and air temperature (average temperatures in July and December are 22°C and −3°C, respectively). There are significant spatial heterogeneities in the topography, subsurface properties, and vegetation [see Shen et al., 2013, Figures A1 and 2]. The basin has rugged hills on the highlands in the West and flat plains toward the East, with minimum and maximum elevations of 175 m and 375 m, respectively. This regional-scale topographic descent induces basin-scale groundwater flow and regional patterns of soil moisture [Shen et al., 2013]. The soils in the basin consist of patchy, unconsolidated glacial drifts (moraines and till) in most parts of the basin and lacustrine deposits in the Southeast, where higher clay percentages are found. Underneath the soils is the very-low-permeability Coldwater Shale, which can be treated as impervious. The basin has varying intensities of urban areas in the South, forests in the northwestern hills, and agricultural land uses on the plains in the Northeast.

2.2. PAWS+CLM Model Description and Simulations Performed

The PAWS+CLM model was used to generate the fine-resolution and coarse-resolution solutions used to construct the ROM. PAWS (Process-based Adaptive Watershed Simulator) [Shen et al., 2013; Shen and Phanikumar, 2010] is a computationally efficient, physically based hydrologic model that is coupled with CLM4.0 [Lawrence et al., 2011]. PAWS+CLM explicitly solves the physically based governing partial differential equations for overland flow, channel flow, subsurface flow, wetlands, and the dynamic two-way interactions among these components.
The model evaluates the integrated hydrologic response of the surface-subsurface system using a noniterative method that couples runoff and groundwater flow to vadose zone processes
approximating the three-dimensional (3-D) Richards equation. By reducing the dimensionality of the fully 3-D subsurface problem, the model significantly reduces the computational demand with little loss of physics representation. The model has a lowland-depression storage compartment that allows groundwater to exfiltrate prior to saturation of the entire soil column, which reduces the effects of resolution on the simulated water table elevations. Rivers in the domain are discretized separately from the land grid, and explicitly exchange with the overland and groundwater flows. The PAWS+CLM model has been tested extensively with analytical and 3-D benchmarks and compares favorably with other physically based models [Maxwell et al., 2014; Shen and Phanikumar, 2010]. Reactive transport simulation capabilities have recently been added to the model [Niu and Phanikumar, 2015].

We applied PAWS+CLM at 220 m × 220 m horizontal resolution across the CRB. Although this resolution is coarser than the hyper-resolution called for in Wood et al. [2011] and the proof-of-concept work in Kollet et al. [2010], it provides substantial resolution of topographic and land use variation across a horizontal 256 × 280 grid. Twenty vertical layers were used to discretize the subsurface between the land surface and bedrock top. The vertical spatial resolution therefore varies throughout the basin depending on the depth to bedrock. The simulations were performed from 2002 to 2009. Details of the data used to construct and calibrate the model have been described in Shen et al. [2013]; for completeness we summarize the information in Appendix A. When generating models at different resolutions, the GIS-enabled discretizer aggregated the same raw data sets listed above, which have resolutions that are finer than our fine-resolution model.
Similar to constructing large-scale land surface models, the areally averaged attributes, e.g., surface elevation, bedrock elevations, and horizontal conductivities, were used as the attributes for the cell. The areally averaged vertical conductivities at different depths were also used. For each cell, we recorded the bottom elevation of the lowland-depression storage by registering the lowest elevation found in the cell. The other soil hydraulic parameters (e.g., the van Genuchten parameters) were derived based on the dominant soil type in the cell. Although upscaling to effective parameters is a research topic [Joshi and Mohanty, 2010], we employed the pragmatic approach described above as our focus is on the effectiveness of the ROM.

2.3. Proper Orthogonal Decomposition Mapping Method

The proper orthogonal decomposition mapping method (PODMM) was first proposed by Robinson et al. [2006] and is derived from the gappy proper orthogonal decomposition (POD) method [Everson and Sirovich, 1995]. This method was recently applied to a subsurface, three-dimensional hydrological model in Pau et al. [2014], although the lateral extent of the studied Alaska polygonal tundra sites is only 100 m × 100 m and the spatial properties, such as surface elevation and rock permeability, at the sites are relatively homogeneous. For these sites, relative accuracies of soil moisture predictions ranging from 0.1% to 1% were achieved. Here we summarize the details of the PODMM, and describe several new improvements tailored to tackle some of the challenges we encountered in applying the method to heterogeneous watershed-scale models.

The PODMM method maps coarse-resolution solutions ($\mathbf{g} = [g_1, \ldots, g_{N_g}]^T$) to fine-resolution solutions ($\mathbf{f} = [f_1, \ldots, f_{N_f}]^T$); $N_g$ and $N_f$ are the respective degrees of freedom. These solutions can correspond to any spatial quantity of interest obtained from the PAWS+CLM model.
Under the same set of model parameters (e.g., vegetation distribution, soil types, and topography), simulation periods, and climate forcings (e.g., precipitation rate), we obtain a training sample set consisting of corresponding sets of fine-resolution and coarse-resolution snapshots, i.e., $\{\mathbf{f}_1, \ldots, \mathbf{f}_N\}$ and $\{\mathbf{g}_1, \ldots, \mathbf{g}_N\}$, where $N$ is the number of snapshots in the training sample set. We then construct a reduced-order model (ROM) based on this training sample set. In the prediction stage, fine-resolution solutions are reconstructed using the ROM and the coarse-grid solution, i.e., for any given $\mathbf{g}$, we determine $\mathbf{f}^{ROM}$, a downscaled approximation to the fine-resolution solution corresponding to $\mathbf{g}$.

We achieve dimensional reduction by first determining a set of POD bases that are used in subsequent approximation of $\mathbf{f}$. These POD bases are found through a singular value decomposition of a data matrix. With PODMM, the data matrix $W^{PODMM}$ is given by:

$$
W^{PODMM} = \begin{bmatrix} \mathbf{f}_1 - \bar{\mathbf{f}} & \cdots & \mathbf{f}_N - \bar{\mathbf{f}} \\ \mathbf{g}_1 - \bar{\mathbf{g}} & \cdots & \mathbf{g}_N - \bar{\mathbf{g}} \end{bmatrix} \tag{1}
$$

where $\bar{\mathbf{f}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{f}_i$ and $\bar{\mathbf{g}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{g}_i$. We then determine $M$ right singular vectors, $V = \{v_1, \ldots, v_M\}$, corresponding to the $M$ largest singular values of the above data matrix. The POD bases are then given by
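$\zeta_i = W^{PODMM} v_i$, $i = 1, \ldots, M$, and represent dominant modes of variability in the snapshots within the data matrix $W^{PODMM}$. Dimensional reduction is achieved if only a small $M$ (typically $M \ll N_f$) is needed to approximate $\mathbf{f}$ to the desired accuracy. By decomposing $\zeta_i$ into:

$$
\zeta_i = \begin{bmatrix} \zeta_i^f \\ \zeta_i^g \end{bmatrix} \tag{2}
$$

where $\zeta_i^f$ and $\zeta_i^g$ are components associated with the fine-resolution and coarse-resolution models, an approximation to $\mathbf{f}$ and $\mathbf{g}$ can be obtained by taking linear combinations of $\zeta_i^f$ and $\zeta_i^g$, respectively:

$$
\mathbf{f}_{approx} = \bar{\mathbf{f}} + \sum_{i=1}^{M} \gamma_i \zeta_i^f, \qquad \mathbf{g}_{approx} = \bar{\mathbf{g}} + \sum_{i=1}^{M} \gamma_i \zeta_i^g \tag{3}
$$

All POD-based ROMs differ on how $\gamma = \{\gamma_1, \ldots, \gamma_M\}$ is determined. With the PODMM method, we determine the optimal $\gamma$, given by $\alpha^{PODMM}$, by minimizing the least square error between $\mathbf{g}$ and $\mathbf{g}_{approx}$ for any given $\mathbf{g}$:

$$
\alpha^{PODMM}(\mathbf{g}) = \arg\min_{\gamma} \left\| \mathbf{g} - \bar{\mathbf{g}} - \sum_{i=1}^{M} \gamma_i \zeta_i^g \right\|_2 \tag{4}
$$

where

$$
\|\mathbf{h}\|_2 = \left( \frac{1}{N_h} \sum_{i=1}^{N_h} h_i^2 \right)^{1/2} \tag{5}
$$

is the root mean square (RMS) for a given vector $\mathbf{h}$ with dimension $N_h$, and $M$ is the number of POD bases used in an approximation. The approximated PODMM solution, $\mathbf{f}^{PODMM}$, is then given by:

$$
\mathbf{f}^{PODMM} = \bar{\mathbf{f}} + \sum_{i=1}^{M} \alpha_i^{PODMM} \zeta_i^f \tag{6}
$$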
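The training and prediction steps of equations (1) to (6) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `train_podmm` and `predict_podmm`, the dictionary layout, and the synthetic low-rank test data are all hypothetical. Minimizing the RMS norm of equation (5) is equivalent to an ordinary least-squares solve, which is what the sketch uses.

```python
import numpy as np

def train_podmm(F, G, M):
    """Build a PODMM reduced-order model from paired training snapshots.

    F: (N_f, N) fine-resolution snapshots, one column per snapshot.
    G: (N_g, N) coarse-resolution snapshots at the same times.
    M: number of POD bases to retain.
    """
    f_mean = F.mean(axis=1)
    g_mean = G.mean(axis=1)
    # Equation (1): centered fine snapshots stacked on centered coarse ones.
    W = np.vstack([F - f_mean[:, None], G - g_mean[:, None]])
    # Rows of Vt are the right singular vectors, ordered by singular value.
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    zeta = W @ Vt[:M].T                      # POD bases, split as in equation (2)
    n_f = F.shape[0]
    return {"f_mean": f_mean, "g_mean": g_mean,
            "zeta_f": zeta[:n_f], "zeta_g": zeta[n_f:]}

def predict_podmm(rom, g):
    """Downscale one coarse snapshot g with a trained ROM."""
    # Equation (4): fit the coarse part of the bases to the centered g.
    alpha, *_ = np.linalg.lstsq(rom["zeta_g"], g - rom["g_mean"], rcond=None)
    # Equation (6): reuse the same coefficients on the fine part of the bases.
    return rom["f_mean"] + rom["zeta_f"] @ alpha

# Synthetic check (hypothetical data): fine fields are combinations of three
# latent modes, and coarse fields are block averages of the fine fields.
rng = np.random.default_rng(0)
modes = rng.normal(size=(64, 3))
average = np.kron(np.eye(4), np.full((1, 16), 1 / 16))   # 64 fine -> 4 coarse cells
F_train = modes @ rng.normal(size=(3, 20))
rom = train_podmm(F_train, average @ F_train, M=3)
f_new = modes @ rng.normal(size=3)
f_rom = predict_podmm(rom, average @ f_new)
```

In this idealized setting the fine field lies exactly in the span of the retained bases, so the downscaled field matches the reference. Real simulations are not exactly low rank, which is why the choice of `M` and the error estimate matter.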

We note that $\gamma = \alpha^{PODMM}$ does not minimize $\|\mathbf{f} - \mathbf{f}_{approx}\|_2$, but if $\zeta_i^f$ is strongly correlated with $\zeta_i^g$, we expect the resulting approximation to be good.

Next we present an improvement to the algorithm. The number of POD bases we can use is typically a small percentage of the degrees of freedom in the coarse-resolution model to avoid over-fitting [Pau et al., 2014]. However, in a typical land surface simulation, a large number of outputs are generated, some of which are correlated to one another. By including these correlated variables in the data matrix, we can potentially increase the degrees of freedom used to determine an optimal $\gamma$. This approach is motivated by the same rationale behind cokriging [Odeh et al., 1995; Ver Hoef and Barry, 1998; Goovaerts, 1998]. This increase will allow a larger number of POD bases to be used and improve the overall accuracy of the approximation. Denoting this approach as the POD Multicomponent Mapping Method (PODM3), the associated data matrix, $W^{PODM3}$, can be defined as

$$
W^{PODM3} =
\begin{bmatrix}
\sigma(\mathbf{f}^1) & & & & & \\
& \ddots & & & & \\
& & \sigma(\mathbf{f}^n) & & & \\
& & & \sigma(\mathbf{g}^1) & & \\
& & & & \ddots & \\
& & & & & \sigma(\mathbf{g}^n)
\end{bmatrix}^{-1}
\begin{bmatrix}
\mathbf{f}^1_1 - \bar{\mathbf{f}}^1 & \cdots & \mathbf{f}^1_N - \bar{\mathbf{f}}^1 \\
\vdots & & \vdots \\
\mathbf{f}^n_1 - \bar{\mathbf{f}}^n & \cdots & \mathbf{f}^n_N - \bar{\mathbf{f}}^n \\
\mathbf{g}^1_1 - \bar{\mathbf{g}}^1 & \cdots & \mathbf{g}^1_N - \bar{\mathbf{g}}^1 \\
\vdots & & \vdots \\
\mathbf{g}^n_1 - \bar{\mathbf{g}}^n & \cdots & \mathbf{g}^n_N - \bar{\mathbf{g}}^n
\end{bmatrix} \tag{7}
$$

where $\mathbf{f}^j$ and $\mathbf{g}^j$ are the fine-resolution and coarse-resolution solutions of the $j$th variable, and $\sigma(\mathbf{f}^j)$ and $\sigma(\mathbf{g}^j)$ are the standard deviations of $\mathbf{f}^j$ and $\mathbf{g}^j$ determined based on the training snapshots. We scale the different variables by the inverse of the standard deviation. To determine the optimal $\gamma$, we jointly minimize the least square error between the scaled $\mathbf{g}^j$ and scaled $\mathbf{g}_{approx,j}$:

$$
\alpha^{PODM3} = \arg\min_{\gamma} \sum_{j=1}^{n} \left\| \frac{\mathbf{g}^j - \bar{\mathbf{g}}^j}{\sigma(\mathbf{g}^j)} - \sum_{i=1}^{M} \gamma_i \zeta_i^{g^j} \right\|_2 \tag{8}
$$

The PODM3 approximation is then given by:

$$
\mathbf{f}^{PODM3} = \bar{\mathbf{f}} + \sigma(\mathbf{f}) \sum_{i=1}^{M} \alpha_i^{PODM3} \zeta_i^f \tag{9}
$$

Table 1. Summary of the ROMs Studied in This Paper

Abbreviation             Data Used
PODMM                    Solution of a single variable determined at coarse resolutions and fine resolutions
PODM3                    Solutions of multiple variables that are correlated to one another, determined at coarse and fine resolutions
PODMM-vm                 Similar to the PODMM, but the optimal M depends on month of the year
PODM3-vm                 Similar to the PODM3, but the optimal M depends on month of the year
PODM1($\hat{\epsilon}$)  Multiple ROMs are evaluated simultaneously and best approximation is chosen based on an empirical error estimator $\hat{\epsilon}$
PODM1($\epsilon$)        Multiple ROMs are evaluated simultaneously and best approximation is chosen based on the actual error $\epsilon$
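The multivariable mapping of equations (7) to (9) can be sketched in the same way. The code below is an illustration under assumptions that the text does not confirm: each variable is scaled by a single scalar standard deviation computed over all its training values, and the joint fit is done as one stacked least-squares solve (a sum of squared scaled residuals), which simplifies the sum of norms written in equation (8). The names `train_podm3` and `predict_podm3` are hypothetical.

```python
import numpy as np

def train_podm3(F_vars, G_vars, M):
    """F_vars / G_vars: lists of (N_f, N) / (N_g, N) snapshot arrays, one per
    variable, with the primary variable first in both lists."""
    blocks, stats = [], []
    for X in list(F_vars) + list(G_vars):
        mean = X.mean(axis=1)
        sigma = X.std()              # assumption: one scalar sigma per variable
        blocks.append((X - mean[:, None]) / sigma)
        stats.append((mean, sigma))
    W = np.vstack(blocks)            # scaled data matrix, equation (7)
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    zeta = W @ Vt[:M].T
    n_primary = F_vars[0].shape[0]
    n_fine = sum(X.shape[0] for X in F_vars)
    return {"zeta_f": zeta[:n_primary],   # rows of the primary fine variable
            "zeta_g": zeta[n_fine:],      # rows of all coarse variables
            "f_stats": stats[0], "g_stats": stats[len(F_vars):]}

def predict_podm3(rom, g_vars):
    """Downscale the primary variable from coarse snapshots of all variables."""
    # Stack the centered, scaled coarse variables in the training order.
    rhs = np.concatenate([(g - mean) / sigma
                          for g, (mean, sigma) in zip(g_vars, rom["g_stats"])])
    alpha, *_ = np.linalg.lstsq(rom["zeta_g"], rhs, rcond=None)
    f_mean, f_sigma = rom["f_stats"]
    return f_mean + f_sigma * (rom["zeta_f"] @ alpha)   # equation (9)
```

Because every coarse variable contributes rows to the least-squares system, the fit has more equations per unknown coefficient than the single-variable mapping, which is the motivation given above for allowing a larger number of bases.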

The accuracy of $\mathbf{f}^{PODM3}$ depends on the scaling approach and the correlations between the primary variable we are interested in and the ancillary variables considered in PODM3. Different scaling approaches will change the minimization problem that we are solving in equation (8) and thus yield different $\alpha^{PODM3}$. If the variables considered are correlated, equation (8) allows us to search for a better $\gamma$ in the vicinity of $\alpha^{PODMM}$ since minimizers of $\|\mathbf{g}^j - \mathbf{g}_{approx,j}\|_2$, $j = 1, \ldots, n$, will be clustered. We do not consider ancillary variables that are perfectly and linearly correlated to the primary variable since no new information is gained from adding these ancillary variables. Instead, we choose variables that are correlated, but not perfectly, to the primary variables we are interested in. We list the primary variables studied in this paper and their respective ancillary variables in section 2.6.

A second improvement is possible by taking into account the fact that the magnitude and characteristics of a variable can change drastically over an annual cycle due to seasonal changes. This variation with time has a strong effect on the optimal $M$. However, it is not currently possible to determine an optimal $M$ for any given $\mathbf{g}$. As a compromise, we prescribe an optimal $M$ for each month, determined by analyzing the validation results of PODMM and PODM3. We denote the resulting methods as PODMM-vm and PODM3-vm, respectively.

Finally, PODM1 is a method where we determine the best approximation by finding the method that gives the lowest error for a given $\mathbf{g}$. Using the actual error (denoted as PODM1($\epsilon$)) is not practical, however, since we need to first determine $\mathbf{f}$ to determine the actual error. The alternative (denoted as PODM1($\hat{\epsilon}$)) uses the empirical error estimator, described in section 2.5, to identify the best approximation among the different methods. We summarize the different methods in Table 1.
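The two selection steps described above can be sketched as simple lookups. The selection rule for the monthly $M$ is an assumption: the text only says that $M$ is determined by analyzing validation results, so the sketch picks, for each month, the candidate with the lowest mean validation error. The function names and the example numbers are hypothetical.

```python
import numpy as np

def optimal_m_by_month(errors, months, m_values):
    """Pick one M per calendar month from validation errors.

    errors: (len(m_values), n_snapshots) errors, one row per candidate M.
    months: (n_snapshots,) calendar month (1-12) of each validation snapshot.
    m_values: candidate numbers of POD bases, one per row of errors.
    """
    errors = np.asarray(errors, dtype=float)
    months = np.asarray(months)
    best = {}
    for month in np.unique(months):
        mean_err = errors[:, months == month].mean(axis=1)
        best[int(month)] = m_values[int(np.argmin(mean_err))]
    return best

def pick_best_rom(estimated_errors):
    """Return the ROM name with the lowest estimated error for one coarse
    snapshot; estimated_errors maps ROM name -> estimated error."""
    return min(estimated_errors, key=estimated_errors.get)

# Hypothetical example: two candidate M values, snapshots in January and July.
m_table = optimal_m_by_month([[1.0, 1.0, 3.0, 3.0],
                              [2.0, 2.0, 1.0, 1.0]],
                             months=[1, 1, 7, 7], m_values=[5, 20])
choice = pick_best_rom({"PODMM-vm": 0.04, "PODM3-vm": 0.03})
```

The second function corresponds to the choice made by PODM1 once an error value is available for each candidate method, whether that value is the actual error or the empirical estimate.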

2.4. Approximation Error

We studied ROM errors at the fine and coarse resolutions used to determine $\mathbf{f}$ and $\mathbf{g}$, respectively. At fine resolution, we measure the RMS error (RMSE) as defined in equation (5):

$$
\epsilon^{ROM}(\mathbf{f}) = \|\mathbf{e}(\mathbf{f})\|_2 \tag{10}
$$

where $\mathbf{e}(\mathbf{f}) = \mathbf{f} - \mathbf{f}^{ROM}$ and ROM can stand for any of the methods listed in Table 1. We note that $\epsilon^{ROM}$ is a very stringent error criterion if exact reproduction of the spatial heterogeneity is not needed. To facilitate the analysis of $\epsilon^{ROM}$, we examine the sample-averaged error ($\bar{\epsilon}^{ROM}$) for the entire validation sample set:

$$
\bar{\epsilon}^{ROM} = \frac{1}{N_v} \sum_{i=1}^{N_v} \epsilon^{ROM}(\mathbf{f}_i) \tag{11}
$$

We can also define a monthly averaged error ($\bar{\epsilon}^{ROM}_{monthly}$) when the averages are only taken over snapshots in a particular month of a year. Similarly, we can define the RMSE at coarse resolution as:

$$
\tilde{\epsilon}^{ROM}(\mathbf{f}) = \|\tilde{\mathbf{e}}(\mathbf{f})\|_2, \qquad \tilde{\mathbf{e}}(\mathbf{f}) = \tilde{\mathbf{f}} - \tilde{\mathbf{f}}^{ROM} \tag{12}
$$

where $\tilde{\mathbf{f}}$ and $\tilde{\mathbf{f}}^{ROM}$ are upscaled solutions of $\mathbf{f}$ and $\mathbf{f}^{ROM}$. The fine-resolution and coarse-resolution grids are nested and the upscaling procedure amounts to averaging elements of $\mathbf{f}$ that fall within a coarse gridcell. Since the boundaries of the watershed are irregular, the boundaries of the fine and coarse models do not match. The above errors are thus only evaluated for coarse gridcells that are not on the boundaries. We can similarly measure the error in the coarse-resolution solution by replacing $\tilde{\mathbf{f}}^{ROM}$ by $\mathbf{g}$. Analogous to $\bar{\epsilon}^{ROM}$ and $\bar{\epsilon}^{ROM}_{monthly}$, we can define the mean errors $\bar{\tilde{\epsilon}}^{ROM}$ and $\bar{\tilde{\epsilon}}^{ROM}_{monthly}$ for $\tilde{\epsilon}^{ROM}$.

Figure 1. $\mathbf{g}$, $\mathbf{f}$, $\mathbf{f}^{ROM}$, and $\mathbf{f}^{ROM} - \mathbf{f}$ on days with error ($\epsilon^{ROM}$) that is close to the average error over the summer. The ranges of $\mathbf{f}^{ROM} - \mathbf{f}$ are set to 25% of the minimum and maximum of $\mathbf{f}^{ROM} - \mathbf{f}$ so that the distributions of the difference can be more clearly visualized. The snapshots indicate that the ROM can accurately predict the heterogeneous structure in the fine-resolution solutions.
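The error measures of equations (5) and (10) to (12) can be sketched as below. The sketch assumes a rectangular 2-D field whose dimensions are exact multiples of the coarsening factor, and it does not exclude boundary gridcells as the text describes for the irregular watershed; the function names are hypothetical.

```python
import numpy as np

def rms(h):
    """Root mean square of a vector or field, equation (5)."""
    return np.sqrt(np.mean(np.square(h)))

def upscale(field, factor):
    """Block-average a 2-D fine field onto a nested coarse grid."""
    ny, nx = field.shape
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))

def fine_error(f, f_rom):
    """RMSE at fine resolution, equation (10)."""
    return rms(f - f_rom)

def coarse_error(f, f_rom, factor):
    """RMSE of the upscaled fields, equation (12)."""
    return rms(upscale(f, factor) - upscale(f_rom, factor))

def sample_averaged_error(fields, rom_fields):
    """Mean of the fine-resolution RMSE over a set of snapshots, equation (11)."""
    return np.mean([fine_error(f, r) for f, r in zip(fields, rom_fields)])
```

A perturbation that alternates in sign from cell to cell gives a nonzero fine-resolution error but averages out within each coarse block, which illustrates why the upscaled error can be much smaller than the fine-resolution error.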

Exact reproduction of the spatial heterogeneity is frequently not needed as long as $\mathbf{f}^{ROM}$ has the same probability distribution as $\mathbf{f}$. To compare the empirical distribution functions of $\mathbf{f}$ and $\mathbf{f}^{ROM}$, we determine the Kolmogorov-Smirnov (KS) statistic:

$$
D_{\mathbf{f},\mathbf{f}^{ROM}} = \sup_x \left| F^{\mathbf{f}}(x) - F^{\mathbf{f}^{ROM}}(x) \right| \tag{13}
$$

where $F^{\mathbf{f}}(x)$ and $F^{\mathbf{f}^{ROM}}(x)$ are empirical cumulative distribution functions (cdfs) of $\mathbf{f}$ and $\mathbf{f}^{ROM}$. Based on equation (13) and significance level $\alpha_{KS}$, we can determine $P\left( \sqrt{\frac{N_f^2}{2 N_f}}\, D_{\mathbf{f},\mathbf{f}^{ROM}} < k(\alpha_{KS}) \right)$, where $k$ is the critical value of the Kolmogorov test for the $\alpha_{KS}$ specified. We will use $\alpha_{KS} = 0.001$ in this paper. We can similarly define the KS statistics to compare $\tilde{\mathbf{f}}^{ROM}$ and $\mathbf{g}$ to $\tilde{\mathbf{f}}$.

2.5. Estimate of Approximation Error

During the prediction stage, the approximation error must be estimated since we do not know the true solution. We found $\epsilon^{ROM}(\mathbf{g}) = \|\mathbf{g} - \mathbf{g}^{ROM}\|_2$, $\mathbf{g}^{ROM} = \bar{\mathbf{g}} + \sum_{i=1}^{M} \gamma_i \zeta_i^g$, to be highly correlated with $\epsilon^{ROM}(\mathbf{f})$. In addition, since $\mathbf{g}$ is typically viewed as an upscaled $\mathbf{f}$ and thus less heterogeneous than $\mathbf{f}$, $\epsilon^{ROM}(\mathbf{g})$ is typically smaller than $\epsilon^{ROM}(\mathbf{f})$. Thus, we propose the following empirical error estimator $\hat{\epsilon}^{ROM}(\mathbf{g})$ that approximates $\epsilon^{ROM}(\mathbf{f})$:

$$
\hat{\epsilon}^{ROM}(\mathbf{g}) = \epsilon^{ROM}(\mathbf{g}) + C \tag{14}
$$

where $C = \max_{(\mathbf{f},\mathbf{g}) \in \mathcal{N}_v} \left| \epsilon^{ROM}(\mathbf{f}) - \epsilon^{ROM}(\mathbf{g}) \right|$ and $\mathcal{N}_v$ is the validation sample set. For a given $\mathbf{g}$ (used to determine $\mathbf{f}^{ROM}$) and the corresponding $\mathbf{f}$, we define the effectivity of the error estimator as:

$$
\eta^{ROM} = \frac{\hat{\epsilon}^{ROM}(\mathbf{g})}{\epsilon^{ROM}(\mathbf{f})} \tag{15}
$$

We note that $\hat{\epsilon}$ only depends on $\mathbf{g}$ once $C$ is determined. It can thus be evaluated efficiently since it only involves determining $\mathbf{g}^{ROM}$. Values of $\eta^{ROM}$ that vary between 1 and 10 are typically desirable. The lower bound of 1 means $\hat{\epsilon}^{ROM}$ is always an upper bound of the actual error, while the upper bound of 10 means the estimator is at most an order of magnitude larger than the actual error.

Figure 2. $\mathbf{g}$, $\mathbf{f}$, $\mathbf{f}^{ROM}$, and $\mathbf{f}^{ROM} - \mathbf{f}$ on days with the worst approximation in 2006. The ranges of $\mathbf{f}^{ROM} - \mathbf{f}$ are set to 25% of the minimum and maximum of $\mathbf{f}^{ROM} - \mathbf{f}$ so that the distributions of the difference can be more clearly visualized. Even though the differences between $\mathbf{f}$ and $\mathbf{f}^{ROM}$ are clearly more distinguishable than in Figure 1, the heterogeneity structures are still well-approximated.

2.6. ROM Setup

We performed simulations at three horizontal resolutions ($\Delta x$ = 7040, 3520, and 220 m) from 2002 to 2009 during which daily snapshots of the solutions were taken. We studied the approximation of the soil moisture in the top 10 cm layer ($\theta_{0-10}$), the latent heat ($H_l$), and the net primary production (NPP); these variables are representative of typical moisture, heat, and carbon cycle variables. Solutions at the finest resolution, $\Delta x = \Delta x_f = 220$ m, are taken to be the reference solutions, and will be denoted by $\mathbf{f}$. Solutions at resolutions $\Delta x = 7040$ m or 3520 m (denoted as $\mathbf{g}$) are used to reconstruct an approximation to $\mathbf{f}$. The ROMs were constructed based on a training sample set consisting of snapshots taken from 2002 to 2005 and validated based on a validation sample set consisting of snapshots taken from 2006 to 2009. Since we are studying multiple variables, we use [ ] to denote which variable we are determining the error for. For example, $\epsilon^{PODMM}[\theta_{0-10}]$ means we are determining $\epsilon^{PODMM}$ for the ROM constructed for $\theta_{0-10}$ using PODMM.

For constructing a ROM based on PODM3, we used the following ancillary variables: soil moisture at depths 10–30, 30–50, 50–100, and 100–200 cm for $\theta_{0-10}$; evapotranspiration rate (ET), sensible heat ($H_s$), and ground temperature ($T_{ground}$) for $H_l$; and leaf area index (LAI), gross primary production (GPP), net ecosystem exchange (NEE), heterotrophic respiration ($r_h$), autotrophic respiration ($r_a$), and ecosystem respiration ($r_e$) for NPP. We did not consider forcing data since the resolution of forcing data is coarser than the coarse-resolution grid and as such the effects of spatial variability in the forcing data are already captured by the coarse-resolution solutions.

Figure 3. Average $\mathbf{f}$ and $\mathbf{f}^{ROM}$ for a 2 km × 2 km domain at the center of the domain. The temporal dynamics of the variables are accurately captured.
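The validation diagnostics of sections 2.4 and 2.5 can be sketched as follows: the KS statistic of equation (13), the offset $C$ and estimator of equation (14), and the effectivity of equation (15). The function names are hypothetical, the inputs are assumed to be plain arrays of field values or of per-snapshot errors, and the critical value $k(\alpha_{KS})$ is not computed here.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic, sup |F_a(x) - F_b(x)|, equation (13)."""
    a = np.sort(np.ravel(a))
    b = np.sort(np.ravel(b))
    # Both empirical cdfs are step functions, so the supremum is attained
    # at one of the observed values.
    x = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, x, side="right") / a.size
    cdf_b = np.searchsorted(b, x, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))

def calibrate_offset(err_f, err_g):
    """Offset C of equation (14): the largest gap between the fine-resolution
    and coarse-resolution errors over the validation samples."""
    return np.max(np.abs(np.asarray(err_f) - np.asarray(err_g)))

def estimate_error(err_g, C):
    """Empirical error estimator of equation (14)."""
    return np.asarray(err_g) + C

def effectivity(err_g, err_f, C):
    """Effectivity of equation (15): estimated error over actual error."""
    return estimate_error(err_g, C) / np.asarray(err_f)
```

By construction, the effectivity is at least 1 for every sample used to calibrate `C`; for new snapshots this is expected only if the calibration samples are representative.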

3. Results

In this section, we first demonstrate the ability of the ROM to reproduce both spatial and temporal heterogeneity in the fine-resolution solutions. We then demonstrate how the upscaled solutions of the ROM have significantly smaller bias than coarse-resolution solutions when compared to the upscaled fine-resolution solutions. We finally describe how the various numerical parameters in the ROM affect the accuracy of the approximation, and compare among the different methods proposed.

Figure 4. Variation of ε^ROM_monthly for the months in the validation period. The ROM errors for θ_{0-10}, H_l, and NPP are small in comparison to the monthly means.


Figure 5. Differences between the seasonal averages of f^ROM and f for the following seasonal periods: December–February (DJF), March–May (MAM), June–August (JJA), and September–November (SON).

3.1. Comparing ROM Solutions to Fine-Resolution Solutions

Here we compare f and f^ROM to determine whether f^ROM reproduces the spatial heterogeneities in f. We consider only the ROM constructed using PODM+(ε̂) in this section; in subsequent comparisons of the methods proposed in section 2.3, we will demonstrate that PODM+(ε̂) is the most robust and accurate method. The comparisons between fine-resolution solutions and ROM solutions of θ_{0-10}, H_l, and NPP show that PODM+(ε̂) reproduces the heterogeneity in the fine-resolution solutions of θ_{0-10}, H_l, and NPP very accurately (Figure 1, shown for days with errors (ε^ROM) that are close to the average error over the summer: 0.031 m³ m⁻³, 10.1 W m⁻², and 5.3 × 10⁻⁶ g C m⁻² s⁻¹, respectively). The difference between f and f^ROM is small compared to the magnitude of the variables. Compared to the coarse-resolution solutions g, it is clear that the ROM adds a significant amount of information when downscaling from the coarse-resolution grid to the fine-resolution grid. Based on the rightmost column in Figure 1, the ROM slightly under-predicts θ_{0-10} when θ_{0-10} is high and over-predicts θ_{0-10} when θ_{0-10} is small.

Figure 2 shows the comparison of the solutions g, f, and f^ROM on the day with the largest error (ε^ROM) within the validation period. The corresponding ε^ROM are 0.1 m³ m⁻³, 35.2 W m⁻², and 1.8 × 10⁻⁵ g C m⁻² s⁻¹, respectively. Even though the differences between f and f^ROM are more clearly distinguishable than in Figure 1, the heterogeneity structures are still well approximated. We note that 80% of the ε^ROM of the snapshots in the validation sample set are below 0.04 m³ m⁻³, 9.8 W m⁻², and 4.9 × 10⁻⁶ g C m⁻² s⁻¹ for θ_{0-10}, H_l, and NPP, respectively.

Apart from the spatial heterogeneity, the ROMs are also able to capture the time-varying dynamics in the solutions, as shown by the time series of θ_{0-10}, H_l, and NPP, averaged over a 2 km × 2 km domain in the middle of the domain (Figure 3). To more concisely determine the ROM’s ability to reproduce the temporal variability in the entire f, we examined the monthly mean approximation error (ε^ROM_monthly) for each month of the year. The errors are only a small fraction of the monthly mean of f (Figure 4). In June (the month in which H_l and NPP are at their largest), the average errors are 7.31 W m⁻² and 3.4 × 10⁻⁶ g C m⁻² s⁻¹, or 8.5% and 12.4%, respectively, relative to the monthly means. The ε^ROM_monthly of θ_{0-10} varies less since the monthly mean of θ_{0-10} also varies less than those of H_l and NPP.
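The monthly mean approximation error discussed above can be illustrated with a short sketch. We assume here that ε^ROM is the root-mean-square difference between f and f^ROM over the fine grid for each snapshot, and that ε^ROM_monthly is its average over all validation snapshots falling in a given calendar month; the function names and the synthetic values are hypothetical and are not data from this study.

```python
import math
from collections import defaultdict

def rmse(f, f_rom):
    """Root-mean-square difference over the grid cells of one snapshot."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, f_rom)) / len(f))

def monthly_mean_error(snapshots):
    """snapshots: iterable of (month, f, f_rom); returns {month: mean RMSE}."""
    per_month = defaultdict(list)
    for month, f, f_rom in snapshots:
        per_month[month].append(rmse(f, f_rom))
    return {month: sum(e) / len(e) for month, e in per_month.items()}

# Synthetic values for illustration only (not data from this study).
snapshots = [
    (6, [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]),  # RMSE = 1.0
    (6, [2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0]),  # RMSE = 0.0
    (12, [0.0, 0.0], [3.0, 4.0]),                     # RMSE = sqrt(12.5)
]
monthly = monthly_mean_error(snapshots)

# Error relative to the monthly mean of f; 2.25 is the mean of the June f values.
june_relative_percent = 100.0 * monthly[6] / 2.25
```

Read the same way, the June values quoted above (7.31 W m⁻² at 8.5%) would imply a June mean H_l of roughly 86 W m⁻².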
Figure 6. The leftmost plots show how D_{f,f^ROM} changes over the entire validation period. D_{f,f^ROM} is the Kolmogorov-Smirnov statistic defined in (13); a small value indicates that two pdfs are similar. At point (a), D_{f,f^ROM} is closest to the average of D_{f,f^ROM} over the summer period. The comparison of the corresponding pdfs of f and f^ROM is shown in the middle column. At point (b), D_{f,f^ROM} is at its maximum over the validation period. The corresponding pdfs are shown in the rightmost column. D_{f,f^ROM} is small, particularly in summer, resulting in pdfs that are similar. D_{f,f^ROM} tends to be larger in winter, with larger discrepancies between the pdfs, especially for H_l and NPP. However, these quantities are an order of magnitude smaller during the winter compared to summer.

Figure 7. D̃_season is the season-averaged D_{f,f^ROM} for each grid block in the coarse-resolution grid; D_{f,f^ROM} is the Kolmogorov-Smirnov statistic defined in (13). D̃_season is smallest in the spring (MAM) for θ_{0-10}, fall (SON) for H_l, and summer (JJA) for NPP, although the seasonal differences can be considered small.

To understand the temporal dependence of the spatial difference between f and f^ROM, we looked at the difference between the seasonal averages of f^ROM and f (Figure 5). The differences in θ_{0-10} are more uniform than those in H_l and NPP, both spatially and temporally. In addition, the differences are largest in the urban area (south of the watershed) where θ_{0-10} is typically underestimated. Similarly, H_l and NPP are consistently underestimated in the urban area. Temporally, H_l and NPP are underestimated in the winter (DJF) and overestimated in other seasons. Although the differences in all three variables are relatively small compared to the magnitudes of the variables, the presence of distinguishable patterns indicates that the number of POD bases used in our approximation is not sufficient to capture all structural variabilities in the variables.

We next compare the statistical distributions of f and f^ROM by examining how D_{f,f^ROM} varies over the validation period (Figure 6). D_{f,f^ROM}, defined in (13), measures closeness between two empirical cdfs, and thus is a good indicator of how similar two probability density functions (pdfs) are. For all three variables, D_{f,f^ROM} is small in summer but tends to be larger in the colder months. In Figure 6, we have also highlighted two points. At point (a), D_{f,f^ROM} is closest to the average of D_{f,f^ROM} over the summer period. An examination of the corresponding pdfs shows that f^ROM reproduces the pdf of f. At point (b), D_{f,f^ROM} is at its maximum over the validation period. As expected, the pdfs of f and f^ROM show greater discrepancies, especially for H_l and NPP. However, H_l and NPP are also an order of magnitude smaller during the winter compared to summer. Thus, the impact of the larger discrepancies will be small. The poorer reproduction of the pdfs of f by f^ROM during the winter is not surprising since the dominant signals captured by PODMM occur in the summer, when the magnitudes of H_l and NPP are significantly larger than in other seasons.

In order to study how the statistical distributions of f and f^ROM differ spatially and temporally, we determine D̃_season, the season-averaged D_{f,f^ROM} for each grid block in the coarse-resolution grid (Figure 7). Discrepancies tend to be larger in the eastern region where the agricultural and urban lands are located. D̃_season also tends to be smaller in summer. For NPP, D̃_season is noticeably larger in fall, winter, and spring because the smaller magnitudes lead to larger relative differences between f and f^ROM, especially in the western region. We note that even for the larger D̃_season, the match between the subgrid pdfs of f and f^ROM is very good (Figure 8). For many coarse grid cells, the POD-generated pdfs are nearly indistinguishable from the fine-grid simulations, especially for H_l, thus reproducing the subgrid variability accurately. The good performance was achieved not only for cells with high central tendency (bell-shaped pdfs), but also for skewed, flat, and multimodal distributions. Both θ_{0-10} and H_l have some local peaks that were smoothed out in the pdf of f^ROM. The θ_{0-10} shows slightly larger deviation than H_l because the patchiness of the soil texture aggregates that strongly influences the soil moisture cannot be captured efficiently with a small number of POD bases. On the other hand, H_l has lower sensitivity to soil texture spatial aggregation patterns.

3.2. Comparing Upscaled ROM Solutions to Coarse-Resolution Solutions

As in the previous section, we consider only the ROM constructed using PODM+(ε̂). At coarse resolution, the mean monthly error of the upscaled ROM solutions (ε̃^ROM_monthly) is significantly smaller than the mean monthly error of the coarse-resolution solutions (ε̃^coarse_monthly) when both are compared to the upscaled fine-resolution solutions (Figure 9). In June, for example, the error is reduced by 59%, 62%, and 80% for θ_{0-10}, H_l, and NPP, respectively. Consistent with the errors measured on the fine grid, the relative improvement is bigger in the summer months and smaller in the winter months. This result demonstrates that the ROM significantly reduces the bias in the coarse-resolution model, especially in summer months when the biases are typically larger. In addition, the pdfs and cdfs of f̃^ROM and f̃ are very similar, while the pdfs and cdfs of g deviate significantly from those of f̃ (Figure 10, shown for June 2005; we use data from the entire month in order to have sufficient data points to construct the pdf and cdf since the coarse-resolution grid has only 23 interior grid blocks).

3.3. Numerical Analysis

3.3.1. Convergence Analysis of ROMs

The main numerical parameter in PODMM is M, the number of POD bases used in a ROM. We thus examine how the mean RMSE of the ROM constructed based on PODMM (ε̄^PODMM), measured for two different coarse resolutions, Δx = 7040 and 3520 m, converges with M (Figure 11). The errors decrease monotonically with M initially but eventually start to fluctuate since the ROM approximations of the snapshots in the validation set can have different optimal M for which the errors are the smallest.
The large fluctuation (e.g., for θ_{0-10}, Δx = 7040 m, and M > 10 in Figure 11) is primarily due to over-fitting for all snapshots in the validation data set [Pau et al., 2014]. The fluctuation is less prominent when the coarse model has higher resolution. For all three variables, the errors for Δx = 3520 m fluctuate much less than the errors for Δx = 7040 m since the higher resolution provides more data that can be used to fit a larger number of α.

Figure 8. Spatial variations of the pdfs of the fine-scale solution and the ROM approximation of θ_{0-10}, H_l, and NPP for the summer. The subplots are 7 km grid cells arranged in the same way as they are in the actual watershed. The blue line is the pdf of f and the red line is the pdf of f^ROM. The pdfs in the subregions are well approximated by the ROMs.

The D̄_{f,f^PODMM} (mean of D_{f,f^PODMM}) also decreases with M (Figure 11). For θ_{0-10}, the smallest D̄_{f,f^PODMM} is approximately 0.07. We also do not see large increases in D̄_{f,f^PODMM} when there are large increases in ε̄^PODMM. As such, the empirical cdf of the ROM solutions is very close to the empirical cdf of the actual solutions, even though the ROM solutions do not reproduce the exact spatial structures of the actual solutions. When evaluated at M with the smallest ε̄^PODMM (between 9 and 11), the KS null hypothesis is rejected for all snapshots in the validation data set for all three variables. As such, even when ε̄^PODMM is relatively large compared to the mean value (for example, ε̄^PODMM of NPP relative to the mean NPP is 28%), the ROM solutions are still statistically accurate and can thus be used in situations where exact reproduction of the spatial patterns is not the goal.

The θ_{0-10}, H_l, and NPP have different convergence behaviors. Figure 11 shows that doubling the resolution of the coarse model reduces the minimum ε̄^PODMM of H_l and NPP by approximately 17% and 27%, but no reduction is seen for θ_{0-10}. The upscaling procedure plays an important role in the coarse model. When discretizing for the coarse-resolution models, the nonlinear van Genuchten soil parameters of a gridcell were determined based on the dominant soil type. As a result, increasing the coarse model’s resolution does not necessarily mean the resulting solution is closer to the fine model’s solution. On the other hand, H_l and NPP are functions of PFTs, and the proportion of the PFTs per unit area was preserved regardless of resolution, resulting in a more consistent upscaling procedure. Thus, the coarse model’s solution converges consistently to the fine model’s solution when the resolution of the coarse model is increased. A higher-resolution coarse model for H_l and NPP will then lead to a lower ε̄^PODMM of H_l and NPP.
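The Kolmogorov-Smirnov statistic used throughout this section to compare distributions can be sketched as follows. We assume that (13) is the usual two-sample statistic, that is, the largest absolute difference between the two empirical cdfs; the function name is hypothetical and the sketch is not the implementation used in this study.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical cdfs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical cdfs are right-continuous step functions that only change
    # at sample values, so it suffices to evaluate the gap at every sample value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

d_same = ks_statistic([1, 2, 3], [1, 2, 3])         # identical samples
d_disjoint = ks_statistic([1, 2, 3], [4, 5, 6])     # non-overlapping samples
d_shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

In this sketch, identical samples give a statistic of 0 and non-overlapping samples give 1, which brackets the values of D_{f,f^ROM} reported above.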


Figure 9. ε̃^ROM_monthly is significantly lower than ε̃^coarse_monthly for θ_{0-10}, H_l, and NPP.

3.3.2. Comparison of the Different Methods

We studied whether the different enhancements described in section 2.3, including the use of correlated variables to make predictions (PODM3), monthly varying M (PODMM-vm and PODM3-vm), and joint consideration of all aforementioned methods (PODM+(ε̂)), can improve the accuracy of PODMM. We only considered the coarse model with resolution Δx = 7040 m in this section.

We define M_opt,monthly as the M at which the mean monthly error (ε^PODMM_monthly), averaged over the validation period, is smallest for each month. Figure 12 shows that M_opt,monthly for θ_{0-10} and H_l tends to be larger in winter months compared to the summer months, while M_opt,monthly for NPP has the opposite trend. With PODMM-vm, we use M_opt,monthly for different months instead of using M_opt, defined as the M at which the mean error (ε̄^PODMM) over the entire validation sample set is smallest. As expected, this enhancement reduces ε^PODMM_monthly when M_opt,monthly is significantly different from M_opt (Figure 12). Similarly, the ROM constructed using PODM3-vm has higher overall accuracy than the ROM constructed using PODM3.

Comparison between PODMM-vm and PODM3-vm shows that PODM3-vm performs better than PODMM-vm in summer months (typically April–November) but worse during the rest of the year (Figure 13). This discrepancy is likely because the variables considered in PODM3-vm are more strongly correlated during summer months than in the winter months. For example, H_l is much more weakly correlated with H_s in the winter months (the correlation coefficient is less than 0.2) compared to the summer months. By using PODM+(ε̂), the errors we obtained are the minimum, or close to the minimum, of the errors obtained using PODMM-vm and PODM3-vm, as shown in Figure 13.
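The choice of M_opt and M_opt,monthly, and the estimator-based choice among ROMs that PODM+(ε̂) relies on, can be sketched as follows. This is a minimal sketch under stated assumptions: the errors for each candidate M are assumed to be already available as (month, error) pairs for the validation snapshots, the function names are hypothetical, and the numbers are synthetic rather than results from this study.

```python
from collections import defaultdict

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def optimal_m(errors):
    """errors: {M: [(month, error), ...]} over the validation snapshots.
    Returns the single M with the smallest mean error over all snapshots."""
    return min(errors, key=lambda m: mean(e for _, e in errors[m]))

def optimal_m_monthly(errors):
    """Returns {month: M}, where M minimizes the mean error for that month."""
    by_month = defaultdict(dict)  # month -> {M: [errors]}
    for m, records in errors.items():
        for month, e in records:
            by_month[month].setdefault(m, []).append(e)
    return {month: min(per_m, key=lambda m: mean(per_m[m]))
            for month, per_m in by_month.items()}

def select_by_estimator(estimated_errors):
    """estimated_errors: {rom_name: estimated error for one coarse solution}.
    Returns the name of the ROM with the lowest estimated error."""
    return min(estimated_errors, key=estimated_errors.get)

# Synthetic errors for two candidate values of M (illustration only).
errors = {8: [(1, 0.30), (7, 0.10)], 12: [(1, 0.20), (7, 0.25)]}
m_opt = optimal_m(errors)
m_opt_monthly = optimal_m_monthly(errors)
best_rom = select_by_estimator({"PODMM-vm": 0.04, "PODM3-vm": 0.03})
```

In this synthetic case, a single M of 8 is best on average, whereas the monthly choice picks 12 for January and 8 for July, which is the situation in which a monthly varying M is expected to help.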
To construct a ROM based on PODM+(ε̂), we first constructed ROMs based on PODMM, PODM3, PODMM-vm, and PODM3-vm. In addition, different combinations of ancillary variables are considered when constructing ROMs based on PODM3 and PODM3-vm. Given a coarse-resolution solution, the best approximation is determined by choosing results from the ROM with the lowest error estimator, ε̂^ROM, which can be evaluated efficiently. However, when the accuracies of the different ROMs are similar, the ε̂^ROM determined from the different ROMs are insufficiently precise to determine the ROM that gives the lowest ε^ROM_monthly. For example, the ε^ROM_monthly of H_l in May for PODM+(ε̂) is larger than that for PODM3-vm, but the difference is less than 1 W m⁻² (Figure 13).

Figure 13 also shows that PODM+(ε̂) is less accurate than PODM+(ε). The larger error is expected since the true error, ε^ROM, is used to determine the best M for each snapshot and the best approximation among the different ROMs considered in PODM+(ε). However, it is impractical to use ε^ROM as a selection criterion since we need to first compute the true solution. As such, PODM+(ε) provides a lower bound on the approximation error but cannot be efficiently used as a ROM. Based on the above results, we can conclude that PODM+(ε̂) is the best method among the methods considered in this paper.

Figure 10. The pdf and cdf of f̃^ROM have much smaller deviations from those of f̃ compared to g.

Figure 11. Variations of ε̄^PODMM and D̄_{f,f^PODMM} (mean of D_{f,f^PODMM}) with M for θ_{0-10}, H_l, and NPP using two different coarse models.

3.3.3. Error Estimator

The ability of PODM+(ε̂) to pick the best approximation among the different PODMM and PODM3 models based on the error estimator ε̂ indicates that ε̂^ROM is a good approximation of the actual error ε^ROM. Here we examine η^ROM for PODM+(ε̂) to more concretely evaluate the performance of ε̂^ROM. In general, the monthly means of η^ROM for all three variables studied are close to one and the standard deviation of η^ROM in each month is small (Figure 14). The maximum η^ROM is below 10, indicating that ε̂^ROM is acceptable for all snapshots in the validation sample set.

3.4. Discussion

We have demonstrated the ability of the POD mapping methods to predict fine-resolution solutions and distributions with significant heterogeneity using only coarse-resolution simulations. We envision using PODMM to develop multiple ROMs at critical locations, or hot spots, that require the higher-resolution models across the globe.
When used in conjunction with a coarse global model, the ROMs provide the desired resolutions used for fine-resolution models such as the mechanistic BGC models developed in Bouskill et al. [2012], Riley et al. [2014], and Tang and Riley [2015]. The computational demands of developing ROMs customized to each site are readily surmountable by fully utilizing modern supercomputers. Since supercomputers are heavily shared by many users, there are vast variations in their overall throughput and performance due to uneven resource utilization [Gunasekaran and Kim, 2014]. By executing the construction stage of ROMs, which is computationally demanding, during the off-peak cycles, we are able to execute the prediction stage of the ROM even during peak cycles since it requires very limited resources. We will also be able to execute the prediction stage of the ROM on smaller machines with smaller user bases and thus better throughput. To this end, we have developed a software framework called parallel Reduced-Order Model for Earth systems (pROME) that allows ROMs to be constructed and utilized efficiently on supercomputers. pROME utilizes the latest numerical and IO libraries, such as PETSc [Balay et al., 2014], SLEPc [Hernandez et al., 2005], pnetcdf [Li et al., 2003], and hdf5 [The HDF Group, 1997–2016], to perform parallel IO operations, eigenvalue decompositions, and linear solves. Efforts are currently being undertaken to couple pROME more tightly with CLM so that coupled simulations involving ROMs can be performed.

Figure 12. Variations of ε^ROM_monthly, averaged over the validation period by month, for ROM = PODMM and PODMM-vm, and the optimal M with the month. For PODMM, the Ms used in determining ε^PODMM_monthly for θ_{0-10}, H_l, and NPP are 10, 12, and 9, respectively.

Figure 13. Variations of ε^ROM_monthly, averaged over the validation period by month, for ROM = PODMM-vm, PODM3-vm, PODM+(ε̂), and PODM+(ε). Except for PODMM-vm, we jointly considered soil moisture at depths 10–30, 30–50, 50–100, and 100–200 cm for θ_{0-10}; evapotranspiration rate (ET), sensible heat (H_s), and ground temperature (T_ground) for H_l; and leaf area index (LAI), gross primary production (GPP), net ecosystem exchange (NEE), heterotrophic respiration (r_h), autotrophic respiration (r_a), and ecosystem respiration (r_e) for NPP.

Instead of site-specific ROMs, we could also develop a ROM that is applicable to multiple sites, although it will require several enhancements not explored in this paper. First, the model’s domain must be disaggregated into statistically homogeneous subdomains. This task can be very challenging, and clustering techniques [Hoffman et al., 2013] can perhaps be used. Second, separate ROMs must be constructed for each subdomain. There must be sufficient data on both the fine-resolution and coarse-resolution grids of the subdomain to apply POD mapping methods successfully, which sets limits on the coarse model resolution.
Third, the separate ROMs developed for each subdomain must be mapped to subdomains at the new river basin. The accuracy of the resulting approximation will depend on how similar the subdomains from the two river basins are. We can improve the accuracy of the site-independent ROMs, at least statistically, by considering multiple river basins (not necessarily including the basin under consideration) during the construction stage. The challenges of constructing site-independent ROMs are similar to those encountered by regionalization methods [Oudin et al., 2008], even if the motivation is different. While we understand the attractiveness of developing a site-independent ROM that is broadly applicable, the resulting site-independent ROMs are unlikely to have the accuracy and efficiency of a site-specific ROM since it is very difficult to statistically reproduce the structured heterogeneities, such as PFT distributions, topography, and soil properties, present in land models.

Figure 14. The monthly effectivity η^ROM. The solid box represents the η^ROM that fall within ±1 standard deviation from the mean of that month. The thin vertical line represents the range of η^ROM for that particular month.

Apart from their role in multiscale simulation, ROMs can also be used to improve sensitivity and uncertainty analyses; see, for example, Borgonovo et al. [2012], Zhang et al. [2013], and Sargsyan et al. [2014] for recent applications in hydrology. Studying the land model responses to uncertainty in the climate forcing involves running the models with many different instantiations of forcing data. In order to capture the spatial heterogeneity in the response, ROMs can be used to approximate these high-resolution solutions efficiently instead of running a high-resolution model that is computationally expensive.

To accurately perform the above analyses, we need to address several shortcomings of the method before it can be used to construct a ROM for complex land models. First, the mapping scheme does not include a mechanism to ensure discrete conservation of quantities. Such a mechanism will require two-way coupling between the coarse model and the ROM.
For example, the difference in the total moisture content in the coarse and the approximated fine solutions can be taken into account by adjusting the runoff in the coarse model at the end of a time step, similar to approaches implemented in CLM [Oleson et al., 2010]. Second, we should only model independent variables with this method since it does not explicitly enforce physical relationships between the downscaled variables. The three variables we modeled are only weakly correlated and the good accuracy of the ROM has allowed these correlations to be reproduced. Finally, downscaling directly from kilometer scale to centimeter scale based on PODMM will be challenging since running a watershed-scale model at centimeter scale would be very computationally demanding. A possible solution will be to develop a hierarchical approach that couples multiple PODMM-based ROMs at different scales. We will address these challenges in our future work.

4. Conclusions

In this paper, we describe the construction of ROMs for fine-resolution river basin models based on POD mapping methods. We study the performance of POD mapping methods that consider, independently and jointly, different sets of variables. We describe an approach that chooses the best approximation among the ROMs based on an error estimator that efficiently and accurately approximates the actual error. We achieve accurate downscaling of soil moisture, latent heat flux, and net primary production; the ROM solutions and the actual solutions have nearly indistinguishable statistical distributions. Biases in the upscaled ROM solutions are also significantly smaller than those in the coarse-resolution solutions, and their distributions are similar to those of the upscaled fine-resolution solutions. ROMs constructed based on POD mapping methods will have the desired efficiency and accuracy needed to perform long-time integration and stochastic uncertainty analysis at the resolutions that best represent the heterogeneities in the processes that we are modeling.

Appendix A: PAWS+CLM Model of Clinton River Basin

To create a PAWS+CLM model for the Clinton River watershed (Figure A1), high-resolution elevation (30 m data set from the National Elevation Dataset (NED)), land use (30 m data set from the Integrated Forest Monitoring, Assessment, and Prescription data set (IFMAP) [Michigan Department of Natural Resources, 2010]), soil (1:12,000 to 1:63,360 SSURGO), river hydrography (1:24,000), well-log-based aquifer characteristics, land-based climate forcing data (12 stations; precipitation, temperature, humidity, and wind speeds), and simulated steady state carbon and nitrogen states (220 m) were used as inputs to the model [Shen et al., 2013]. All 12 land use-land-cover (LULC) types (10 plant functional types, or PFTs, one bare ground type, and an impervious type) present in the domain were modeled in each horizontal cell. We extracted the soil data from the SSURGO database [Soil Survey Staff, 2010] and kriged well records from the WELLOGIC database [Groundwater Inventory and Map, 2006; Simard, 2007] to obtain the spatial distribution of lateral conductivities and the depths to bedrock. Daily weather data were obtained from the National Climatic Data Center [National Climatic Data Center, 2010]. In particular, daily precipitation data have been downscaled to hourly using the type II (intensive rain) rainfall hyetograph to which the study domain belongs. Daily maximum and minimum temperatures are downscaled to hourly using the Campbell equation, assuming the highest temperature occurs at 3 P.M. local time [Shen, 2009]:

$$T = T_{av} + \frac{T_{mx} - T_{mn}}{2} \cos\left(0.2618\,(t - 15)\right) \qquad \text{(A1)}$$

in which t is the time of the day (h), and T_mx, T_mn, and T_av are, respectively, the maximum, minimum, and average temperatures of that day (°C). The climatic data from the weather stations are distributed to the gridcells using the nearest neighbor (Thiessen polygon) approach.

Figure A1. Map of Clinton River Watershed in the state of Michigan, USA.

The model was then calibrated against USGS gaging station 04165500 (Clinton River at Mt. Clemens). To avoid over-fitting to the peak flow, we calibrated against the average of the Nash-Sutcliffe model performance coefficient (NS) [Nash and Sutcliffe, 1970] and the NS based on root-square-transformed discharge (RNS), which is defined as:

$$\mathrm{RNS} = 1 - \frac{\sum_{t=1}^{T} \left(\sqrt{Q_o^t} - \sqrt{Q_m^t}\right)^2}{\sum_{t=1}^{T} \left(\sqrt{Q_o^t} - \overline{\sqrt{Q_o}}\right)^2} \qquad \text{(A2)}$$

where Q_o and Q_m are the daily observed and simulated flows (m³ d⁻¹), respectively, and the overbar denotes the mean over the T days. NS, which can range from −∞ to 1, measures the performance of a model in matching the observed streamflow; a value of 0.7 based on the daily hydrograph indicates very good model performance. The NS value we obtained for the calibrated gages was 0.61 for the current CRB model [Shen et al., 2013], the Upper Grand River basin (4527 km²) [Shen et al., 2014]. The RNS value was 0.14 higher than the NS value, indicating that the groundwater dynamics were better captured than the peaks. This pattern was confirmed by discharge comparisons shown in log-scale, in which the low-flow periods were shown to be very well simulated [Shen and Phanikumar, 2010]. We note that calibration is done at 800 m resolution due to computational considerations. The calibrated PAWS+CLM model at CRB has been evaluated against basin outlet gages, internal uncalibrated gages, spatially distributed depth to water table, in situ soil moisture, soil temperature measurements, and satellite-based leaf area index and evapotranspiration estimates [Shen et al., 2013; Riley and Shen, 2014].
In this humid continental climate region, the model has also been compared to transient groundwater heads and GRACE-based terrestrial water storage [Niu et al., 2014]. This wide range of published comparisons with observations has demonstrated that the model reproduces the spatial-temporal hydrologic dynamics in this region reasonably well. For one of the uncalibrated internal gages in these basins, the NS value was 0.65. The performance, however, deteriorated in more recent years as the number of missing climate records grew. Apart from a good fit between the observed and simulated discharges, the coefficient of determination (R²) between the observed and simulated solutions in the CRB was 0.99 for the spatially distributed groundwater head (0.66 for depths to the water table), 0.72 for the transient groundwater head, and 0.97 for the soil temperature. We also demonstrated that the simulated soil moisture was in general agreement with in situ measurements [Riley and Shen, 2014]. Leaf Area Index (LAI) and ET also matched reasonably well with satellite (MODIS)-based products.
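The temperature downscaling in (A1) and the calibration metrics NS and RNS in (A2) can be illustrated with the short sketch below. It is a sketch under stated assumptions rather than the PAWS+CLM implementation: the function names are hypothetical, the sample values are synthetic, and we assume that the denominator of (A2) uses the mean of the square-root-transformed observed flows, that is, that RNS is NS applied to the transformed flows.

```python
import math

def hourly_temperature(t_hour, t_max, t_min, t_avg):
    """Equation (A1): cosine diurnal cycle that peaks at 15:00 local time."""
    return t_avg + 0.5 * (t_max - t_min) * math.cos(0.2618 * (t_hour - 15))

def nash_sutcliffe(observed, simulated):
    """NS = 1 - sum((Qo - Qm)^2) / sum((Qo - mean(Qo))^2)."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((o - m) ** 2 for o, m in zip(observed, simulated))
    denominator = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - numerator / denominator

def root_nash_sutcliffe(observed, simulated):
    """Equation (A2), read as NS applied to square-root-transformed flows."""
    return nash_sutcliffe([math.sqrt(q) for q in observed],
                          [math.sqrt(q) for q in simulated])

# Synthetic daily values for illustration only.
t_peak = hourly_temperature(15, t_max=25.0, t_min=15.0, t_avg=20.0)
t_night = hourly_temperature(3, t_max=25.0, t_min=15.0, t_avg=20.0)
ns_perfect = nash_sutcliffe([1.0, 4.0, 9.0], [1.0, 4.0, 9.0])
rns_flat = root_nash_sutcliffe([1.0, 4.0, 9.0], [4.0, 4.0, 4.0])
```

The constant 0.2618 is approximately 2π/24, so the cosine completes one cycle per day; with these inputs the sketch returns the daily maximum at 3 P.M. and very nearly the daily minimum at 3 A.M. A simulation that is no better than the mean of the transformed observations gives an RNS of 0, and a perfect simulation gives 1.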


Acknowledgments

We would like to thank the three anonymous reviewers for their constructive comments. Pau, Riley, and Liu were supported by the Director, Office of Science, Office of Biological and Environmental Research of the U.S. Department of Energy under contract DE-AC02-05CH11231. C. Shen was supported by the Office of Biological and Environmental Research of the U.S. Department of Energy under contract DE-SC0010620. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH11231. The data used to construct the models were obtained from multiple public sources identified in Appendix A. The model results and codes used in this work are available upon request by sending an email to the first author at [email protected].


References
Albertson, J. D., and N. Montaldo (2003), Temporal dynamics of soil moisture variability. 1: Theoretical basis, Water Resour. Res., 39(10), 1274, doi:10.1029/2002WR001616.
Alvarez, M. A. (2012), Kernels for vector-valued functions: A review, Found. Trends Mach. Learning, 4(3), 195–266.
Arrigo, J. A. S., and G. D. Salvucci (2005), Investigation hydrologic scaling: Observed effects of heterogeneity and nonlocal processes across hillslope, watershed, and regional scales, Water Resour. Res., 41, W11417, doi:10.1029/2005WR004032.
Balay, S., et al. (2014), PETSc users manual, Tech. Rep. ANL-95/11—Revision 3.5, Argonne Natl. Lab., Lemont, Ill.
Barrios, M., and F. Frances (2012), Spatial scale effect on the upper soil effective parameters of a distributed hydrological model, Hydrol. Processes, 26(7), 1022–1033, doi:10.1002/hyp.8193.
Bayarri, M. J., J. O. Berger, J. Cafeo, G. Garcia-Donato, F. Liu, J. Palomo, R. J. Parthasarathy, R. Paulo, J. Sacks, and D. Walsh (2007), Computer model validation with functional output, Ann. Stat., 35(5), 1874–1906.
Beven, K. J., and H. L. Cloke (2012), Comment on “Hyperresolution global land surface modeling: Meeting a grand challenge for monitoring earth’s terrestrial water” by Eric F. Wood et al., Water Resour. Res., 48, W01801, doi:10.1029/2011WR010982.
Bierkens, M. F. P., et al. (2015), Hyper-resolution global hydrological modelling: What is next?, Hydrol. Processes, 29(2), 310–320, doi:10.1002/hyp.10391.
Bohn, T. J., D. P. Lettenmaier, K. Sathulur, L. C. Bowling, E. Podest, K. C. McDonald, and T. Friborg (2007), Methane emissions from western Siberian wetlands: Heterogeneity and sensitivity to climate change, Environ. Res. Lett., 2(4), 045015.
Bohn, T. J., et al. (2013), Modeling the large-scale effects of surface moisture heterogeneity on wetland carbon fluxes in the west Siberian lowland, Biogeosciences, 10(10), 6559–6576, doi:10.5194/bg-10-6559-2013.
Borgonovo, E., W. Castaings, and S. Tarantola (2012), Model emulation and moment-independent sensitivity analysis: An application to environmental modelling, Environ. Modell. Software, 34, 105–115.
Bouskill, N. J., J. Tang, W. J. Riley, and E. L. Brodie (2012), Trait-based representation of biological nitrification: Model development testing, and predicted community composition, Front. Microbiol., 3, 364, doi:10.3389/Fmicb.2012.00364.
Bouwman, A. F., M. F. P. Bierkens, J. Griffioen, M. M. Hefting, J. J. Middelburg, H. Middelkoop, and C. P. Slomp (2013), Nutrient dynamics, transfer and retention along the aquatic continuum from land to ocean: Towards integration of ecological and biogeochemical models, Biogeosciences, 10(1), 1–22, doi:10.5194/bg-10-1-2013.
Boyce, S. E., T. Nishikawa, and W. W.-G. Yeh (2015), Reduced order modeling of the Newton formulation of MODFLOW to solve unconfined groundwater flow, Adv. Water Resour., 83, 250–262, doi:10.1016/j.advwatres.2015.06.005.
Brocca, L., R. Morbidelli, F. Melone, and T. Moramarco (2007), Soil moisture spatial variability in experimental areas of central Italy, J. Hydrol., 333(2–4), 356–373, doi:10.1016/j.jhydrol.2006.09.004.
Brocca, L., F. Melone, T. Moramarco, and R. Morbidelli (2010), Spatial-temporal variability of soil moisture and its estimation across scales, Water Resour. Res., 46, W02516, doi:10.1029/2009WR008016.
Brocca, L., T. Tullo, F. Melone, T. Moramarco, and R. Morbidelli (2012), Catchment scale soil moisture spatial-temporal variability, J. Hydrol., 422, 63–75, doi:10.1016/j.jhydrol.2011.12.039.
Cardoso, M. A., L. J. Durlofsky, and P. Sarma (2009), Development and application of reduced-order modeling procedures for subsurface flow simulation, Int. J. Numer. Methods Eng., 77(9), 1322–1350.
Choi, H. I., P. Kumar, and X. Z. Liang (2007), Three-dimensional volume-averaged soil moisture transport model with a scalable parameterization of subgrid topographic variability, Water Resour. Res., 43, W04414, doi:10.1029/2006WR005134.
Choi, M., and J. M. Jacobs (2011), Spatial soil moisture scaling structure during soil moisture experiment 2005, Hydrol. Processes, 25(6), 926–932, doi:10.1002/hyp.7877.
Clark, M. P., et al. (2015), Improving the representation of hydrologic processes in earth system models, Water Resour. Res., 51, 5929–5956, doi:10.1002/2015WR017096.
Conti, S., and A. O’Hagan (2009), Bayesian emulation of complex multi-output and dynamic computer models, J. Stat. Plann. Inference, 140(3), 640–651.
Das, N. N., and B. P. Mohanty (2008), Temporal dynamics of PSR-based soil moisture across spatial scales in an agricultural landscape during SMEx02: A wavelet approach, Remote Sens. Environ., 112(2), 522–534, doi:10.1016/J.Rse.2007.05.007.
Drignei, D., C. E. Forest, and D. Nychka (2008), Parameter estimation for computationally intensive nonlinear regression with an application to climate modeling, Ann. Appl. Stat., 2(4), 1217–1230.
Edwards, N. R., D. Cameron, and J. Rougier (2010), Precalibrating an intermediate complexity climate model, Clim. Dyn., 37(7–8), 1469–1482.
Essery, R. L. H., M. J. Best, R. A. Betts, P. M. Cox, and C. M. Taylor (2003), Explicit representation of subgrid heterogeneity in a GCM land surface scheme, J. Hydrometeorol., 4(3), 530–543, doi:10.1175/1525-7541(2003)004<0530:EROSHI>2.0.CO;2.
Everson, R., and L. Sirovich (1995), Karhunen-Loeve procedure for Gappy data, J. Opt. Soc. Am. A Opt. Image Sci., 12(8), 1657–1664.
Famiglietti, J. S., J. A. Devereaux, C. A. Laymon, T. Tsegaye, P. R. Houser, T. J. Jackson, S. T. Graham, M. Rodell, and P. J. van Oevelen (1999), Ground-based investigation of soil moisture variability within remote sensing footprints during the southern great plains 1997 (SGP97) hydrology experiment, Water Resour. Res., 35(6), 1839–1851, doi:10.1029/1999WR900047.
Famiglietti, J. S., D. Ryu, A. A. Berg, M. Rodell, and T. J. Jackson (2008), Field observations of soil moisture variability across scales, Water Resour. Res., 44, W01423, doi:10.1029/2006WR005804.
Fowler, H. J., S. Blenkinsop, and C. Tebaldi (2007), Linking climate change modelling to impacts studies: Recent advances in downscaling techniques for hydrological modelling, Int. J. Climatol., 27(12), 1547–1578, doi:10.1002/joc.1556.
Frei, S., K. H. Knorr, S. Peiffer, and J. H. Fleckenstein (2012), Surface micro-topography causes hot spots of biogeochemical activity in wetland systems: A virtual modeling experiment, J. Geophys. Res., 117, G00N12, doi:10.1029/2012JG002012.
Goovaerts, P. (1998), Ordinary cokriging revisited, Math. Geol., 30(1), 21–42, doi:10.1023/A:1021757104135.
Groundwater Inventory and Mapping Project (2006), State of Michigan Public Act 148 Groundwater Inventory and Mapping Project (GWIM), IWR, Michigan State Univ. [Available at http://gwmap.rsgis.msu.edu/, accessed on 1 Dec. 2015.]
Gunasekaran, R., and Y. Kim (2014), Feedback computing in leadership compute systems, in 9th International Workshop on Feedback Computing (Feedback Computing 14), USENIX Assoc., Philadelphia, Pa.
Gutmann, E., T. Pruitt, M. P. Clark, L. Brekke, J. R. Arnold, D. A. Raff, and R. M. Rasmussen (2014), An intercomparison of statistical downscaling methods used for water resource assessments in the United States, Water Resour. Res., 50, 7167–7186, doi:10.1002/2014WR015559.
Hanssen-Bauer, I., E. J. Førland, J. E. Haugen, and O. E. Tveito (2003), Temperature and precipitation scenarios for Norway: Comparison of results from dynamical and empirical downscaling, Clim. Res., 25(1), 15–27, doi:10.3354/cr025015.



Hernandez, V., J. E. Roman, and V. Vidal (2005), SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems, ACM Trans. Math. Software, 31(3), 351–362.
Higdon, D., J. Gattiker, B. Williams, and M. Rightley (2008), Computer model calibration using high-dimensional output, J. Am. Stat. Assoc., 103(482), 570–583.
Hoffman, F. M., J. Kumar, R. T. Mills, and W. W. Hargrove (2013), Representativeness-based sampling network design for the state of Alaska, Landscape Ecol., 28(8), 1567–1586, doi:10.1007/s10980-013-9902-0.
Holden, P. B., N. R. Edwards, K. I. C. Oliver, T. M. Lenton, and R. D. Wilkinson (2009), A probabilistic calibration of climate sensitivity and terrestrial carbon change in GENIE-1, Clim. Dyn., 35(5), 785–806.
Hu, Z. L., S. Islam, and Y. Z. Cheng (1997), Statistical characterization of remotely sensed soil moisture images, Remote Sens. Environ., 61(2), 310–318, doi:10.1016/S0034-4257(97)89498-9.
Ivanov, V. Y., S. Fatichi, G. D. Jenerette, J. F. Espeleta, P. A. Troch, and T. E. Huxman (2010), Hysteresis of soil moisture spatial heterogeneity and the homogenizing effect of vegetation, Water Resour. Res., 46, W09521, doi:10.1029/2009WR008611.
Jana, R. B., and B. P. Mohanty (2012), A topography-based scaling algorithm for soil hydraulic parameters at hillslope scales: Field testing, Water Resour. Res., 48, W02519, doi:10.1029/2011WR011205.
Ji, X., C. Shen, and W. J. Riley (2015), Temporal evolution of soil moisture statistical fractal and controls by soil texture and regional groundwater flow, Adv. Water Resour., 86, 155–169, doi:10.1016/j.advwatres.2015.09.027.
Jolliffe, I. T. (2002), Principal Component Analysis, Springer Ser. Stat., 2nd ed., XXIX, 487 pp., Springer, N. Y.
Joshi, C., and B. P. Mohanty (2010), Physical controls of near-surface soil moisture across varying spatial scales in an agricultural landscape during SMEx02, Water Resour. Res., 46, W12503, doi:10.1029/2010WR009152.
Kennedy, M., and A. O’Hagan (2001), Bayesian calibration of computer models, J. R. Stat. Soc., Ser. B, 63(3), 425–464.
Kollet, S. J., R. M. Maxwell, C. S. Woodward, S. Smith, J. Vanderborght, H. Vereecken, and C. Simmer (2010), Proof of concept of regional scale hydrologic simulations at hydrologic resolution utilizing massively parallel computer resources, Water Resour. Res., 46, W04201, doi:10.1029/2009WR008730.
Kumar, P. (2004), Layer averaged Richard’s equation with lateral flow, Adv. Water Resour., 27(5), 521–531, doi:10.1016/j.advwatres.2004.02.007.
Lawrence, D. M., et al. (2011), Parameterization improvements and functional and structural advances in version 4 of the community land model, J. Adv. Model. Earth Syst., 3, M03001, doi:10.1029/2011MS000045.
Lawrence, J. E., and G. M. Hornberger (2007), Soil moisture variability across climate zones, Geophys. Res. Lett., 34, L20402, doi:10.1029/2007GL031382.
Lawrence, N. D. (2004), Gaussian process latent variable models for visualisation of high dimensional data, Adv. Neural Inform. Process. Syst., 16, 329–336.
Li, B., and M. Rodell (2013a), Spatial variability and its scale dependency of observed and modeled soil moisture over different climate regions, Hydrol. Earth Syst. Sci., 17(3), 1177–1188, doi:10.5194/Hess-17-1177-2013.
Li, B., and M. Rodell (2013b), Spatial variability and its scale dependency of observed and modeled soil moisture over different climate regions, Hydrol. Earth Syst. Sci., 17(3), 1177–1188, doi:10.5194/Hess-17-1177-2013.
Li, J., W.-K. Liao, A. Choudhary, R. Ross, R. Thakur, W. Gropp, R. Latham, A. Siegel, B. Gallagher, and M. Zingale (2003), Parallel netCDF: A high-performance scientific I/O interface, in Supercomputing, 2003 ACM/IEEE Conference, 39 pp., IEEE, Lansing, Mich., doi:10.1109/SC.2003.10053.
Lieberman, C., K. Willcox, and O. Ghattas (2010), Parameter and state model reduction for large-scale statistical inverse problems, SIAM J. Sci. Comput., 32(5), 2523–2542.
Liu, Y., G. Bisht, Z. Subin, W. Riley, and G. S. H. Pau (2015), A hybrid reduced-order model of fine-resolution hydrologic simulations at a polygonal tundra site, Vadose Zone J., 1–14, doi:10.2136/vzj2015.05.0068, in press.
Marrel, A., B. Iooss, M. Jullien, B. Laurent, and E. Volkova (2010), Global sensitivity analysis for models with spatially dependent outputs, Environmetrics, 22(3), 383–397, doi:10.1002/env.1071.
Mascaro, G., E. R. Vivoni, and R. Deidda (2010), Downscaling soil moisture in the southern Great Plains through a calibrated multifractal model for land surface modeling applications, Water Resour. Res., 46, W08546, doi:10.1029/2009WR008855.
Mascaro, G., E. R. Vivoni, and R. Deidda (2011), Soil moisture downscaling across climate regions and its emergent properties, J. Geophys. Res., 116, D22114, doi:10.1029/2011JD016231.
Maxwell, R. M., et al. (2014), Surface-subsurface model intercomparison: A first set of benchmark results to diagnose integrated hydrology and feedbacks, Water Resour. Res., 50, 1531–1549, doi:10.1002/2013WR013725.
McClain, M. E., et al. (2003), Biogeochemical hot spots and hot moments at the interface of terrestrial and aquatic ecosystems, Ecosystems, 6(4), 301–312, doi:10.1007/S10021-003-0161-9.
Michigan Department of Natural Resources (2010), 2001 IFMAP/GAP Lower Peninsula land cover. Lansing, Mich. [Available at http://www.dnr.state.mi.us/spatialdatalibrary/sdl2/land_use_cover/2001/IFMAP_lp_landcover.exe, accessed 30 Oct. 2015.]
Montaldo, N., and J. D. Albertson (2003), Temporal dynamics of soil moisture variability. 2: Implications for land surface models, Water Resour. Res., 39(10), 1275, doi:10.1029/2002WR001618.
Nash, J. E., and J. V. Sutcliffe (1970), River flow forecasting through conceptual models. Part I: A discussion of principles, J. Hydrol., 10(3), 282–290, doi:10.1016/0022-1694(70)90255-6.
National Climatic Data Center (2010), National climatic data center, Natl. Oceanic and Atmos. Admin. [Available at: http://www.noaa.gov/oa/climate/climatedata.html#daily, last access 30 May 2014.]
Niu, J., and M. S. Phanikumar (2015), Modeling watershed-scale solute transport using an integrated, process-based hydrologic model with applications to bacterial fate and transport, Part 1, J. Hydrol., 529, 35–48, doi:10.1016/j.jhydrol.2015.07.013.
Niu, J., C. Shen, S.-G. Li, and M. S. Phanikumar (2014), Quantifying storage changes in regional Great Lakes watersheds using a coupled subsurface-land surface process model and GRACE, MODIS products, Water Resour. Res., 50, 7359–7377, doi:10.1002/2014WR015589.
Nykanen, D. K., and E. Foufoula-Georgiou (2001), Soil moisture variability and scale-dependency of nonlinear parameterizations in coupled land-atmosphere models, Adv. Water Resour., 24(9–10), 1143–1157, doi:10.1016/S0309-1708(01)00046-X.
Odeh, I., A. McBratney, and D. Chittleborough (1995), Further results on prediction of soil properties from terrain attributes: Heterotopic cokriging and regression-kriging, Geoderma, 67(3–4), 215–226, doi:10.1016/0016-7061(95)00007-B.
Oleson, K., et al. (2010), Technical description of version 4.0 of the Community Land Model (CLM), Tech. Note NCAR/TN-478+STR., Natl. Center for Atmos. Res., Boulder, Colo., doi:10.5065/D6RR1W7M.
Olson, R., R. Sriver, M. Goes, N. M. Urban, H. D. Matthews, M. Haran, and K. Keller (2012), A climate sensitivity estimate using Bayesian fusion of instrumental observations and an earth system model, J. Geophys. Res., 117, D04103, doi:10.1029/2011JD016620.


Oudin, L., V. Andréassian, C. Perrin, C. Michel, and N. Le Moine (2008), Spatial proximity, physical similarity, regression and ungaged catchments: A comparison of regionalization approaches based on 913 French catchments, Water Resour. Res., 44, W03413, doi:10.1029/2007WR006240.
Pan, F., and C. D. Peters-Lidard (2008), On the relationship between mean and variance of soil moisture fields, J. Am. Water Resour. Assoc., 44(1), 235–242, doi:10.1111/J.1752-1688.2007.00150.X.
Pau, G. S. H., G. Bisht, and W. J. Riley (2014), A reduced-order modeling approach to represent subgrid-scale hydrological dynamics for land-surface simulations: Application in a polygonal tundra landscape, Geosci. Model Dev., 7(5), 2091–2105, doi:10.5194/gmd-7-2091-2014.
Rasmussen, C. E., and C. K. I. Williams (2006), Gaussian Processes for Machine Learning, 248 pp., MIT Press, Cambridge, Mass.
Razavi, S., B. A. Tolson, and D. H. Burn (2012), Review of surrogate modeling in water resources, Water Resour. Res., 48, W07401, doi:10.1029/2011WR011527.
Riley, W. J., and C. Shen (2014), Characterizing coarse-resolution watershed soil moisture heterogeneity using fine-scale simulations, Hydrol. Earth Syst. Sci. Discuss., 11(2), 1967–2009, doi:10.5194/hessd-11-1967-2014.
Riley, W. J., F. Maggi, M. Kleber, M. S. Torn, J. Y. Tang, D. Dwivedi, and N. Guerry (2014), Long residence times of rapidly decomposable soil organic matter: Application of a multi-phase, multi-component, and vertically resolved model (BAMS1) to soil carbon dynamics, Geosci. Model Dev., 7(4), 1335–1355, doi:10.5194/gmd-7-1335-2014.
Robinson, T., M. Eldred, K. Willcox, and R. Haimes (2006), Strategies for multifidelity optimization with variable dimensional hierarchical models, in 47th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Am. Inst. of Aeronaut. and Astronaut., Newport, R. I., doi:10.2514/6.2006-1819.
Rodriguez-Iturbe, I., G. K. Vogel, R. Rigon, D. Entekhabi, F. Castelli, and A. Rinaldo (1995), On the spatial-organization of soil-moisture fields, Geophys. Res. Lett., 22(20), 2757–2760, doi:10.1029/95GL02779.
Rosenbaum, U., H. R. Bogena, M. Herbst, J. A. Huisman, T. J. Peterson, A. Weuthen, A. W. Western, and H. Vereecken (2012), Seasonal and event dynamics of spatial soil moisture patterns at the small catchment scale, Water Resour. Res., 48, W10544, doi:10.1029/2011WR011518.
Rougier, J., D. M. H. Sexton, J. M. Murphy, and D. Stainforth (2009), Analyzing the climate sensitivity of the HadSM3 climate model using ensembles from different but related experiments, J. Clim., 22(13), 3540–3557, doi:10.1175/2008JCLI2533.1.
Sacks, J., W. J. Welch, T. J. Mitchell, and H. P. Wynn (1989), Design and analysis of computer experiments, Stat. Sci. A, 4(4), 409–435.
Sargsyan, K., C. Safta, H. N. Najm, B. J. Debusschere, D. Ricciuto, and P. Thornton (2014), Dimensionality reduction for complex models via Bayesian compressive sensing, Int. J. Uncertain. Quantif., 4(1), 63–93, doi:10.1615/Int.J.UncertaintyQuantification.2013006821.
Shen, C. (2009), A process-based distributed hydrologic model and its application to a Michigan watershed, PhD thesis, Michigan State Univ., East Lansing, Mich.
Shen, C., and M. S. Phanikumar (2010), A process-based, distributed hydrologic model based on a large-scale method for surface-subsurface coupling, Adv. Water Resour., 33(12), 1524–1541, doi:10.1016/j.advwatres.2010.09.002.
Shen, C., J. Niu, and M. S. Phanikumar (2013), Evaluating controls on coupled hydrologic and vegetation dynamics in a humid continental climate watershed using a subsurface-land surface processes model, Water Resour. Res., 49, 2552–2572, doi:10.1002/wrcr.20189.
Shen, C., J. Niu, and K. Fang (2014), Quantifying the effects of data integration algorithms on the outcomes of a subsurface-surface processes model, Environ. Modell. Software, 59, 146–161, doi:10.1016/j.envsoft.2014.05.006.
Siade, A. J., M. Putti, and W. W. G. Yeh (2010), Snapshot selection for groundwater model reduction using proper orthogonal decomposition, Water Resour. Res., 46, W08539, doi:10.1029/2009WR008792.
Simard, A. (2007), Predicting groundwater flow and transport using Michigan’s statewide wellogic database, PhD thesis, Civ. and Environ. Eng., Michigan State Univ., East Lansing, Mich.
Soil Survey Staff (2010), Soil Survey Geographic Database, Natural Resources Conservation Service, U.S. Dep. of Agric. [Available at http://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/geo/?cid=nrcs142p2_053627, accessed 1 Dec. 2015.]
Tang, J., and W. J. Riley (2015), Weaker soil carbon-climate feedbacks resulting from microbial and abiotic interactions, Nat. Clim. Change, 5(1), 56–60.
Tague, C., L. Band, S. Kenworthy, and D. Tenebaum (2010), Plot- and watershed-scale soil moisture variability in a humid piedmont watershed, Water Resour. Res., 46, W12541, doi:10.1029/2009WR008078.
Tebaldi, C., and J. Arblaster (2014), Pattern scaling: Its strengths and limitations, and an update on the latest model simulations, Clim. Change, 122(3), 459–471, doi:10.1007/s10584-013-1032-9.
Teuling, A. J., and P. A. Troch (2005), Improved understanding of soil moisture variability dynamics, Geophys. Res. Lett., 32, L05404, doi:10.1029/2004GL021935.
Teuling, A. J., F. Hupet, R. Uijlenhoet, and P. A. Troch (2007), Climate variability effects on spatial soil moisture dynamics, Geophys. Res. Lett., 34, L06406, doi:10.1029/2006GL029080.
The HDF Group (1997–2016), Hierarchical Data Format, Version 5. [Available at http://www.hdfgroup.org/HDF5/.]
Ver Hoef, J. M., and R. P. Barry (1998), Constructing and fitting models for cokriging and multivariable spatial prediction, J. Stat. Plan. Inference, 69(2), 275–294.
Vereecken, H., T. Kamai, T. Harter, R. Kasteel, J. Hopmans, and J. Vanderborght (2007), Explaining soil moisture variability as a function of mean soil moisture: A stochastic unsaturated flow perspective, Geophys. Res. Lett., 34, L22402, doi:10.1029/2007GL031813.
Vermeulen, P., A. Heemink, and C. Te Stroet (2004), Reduced models for linear groundwater flow models using empirical orthogonal functions, Adv. Water Resour., 27(1), 57–69.
Vivoni, E. R., D. Entekhabi, R. L. Bras, and V. Y. Ivanov (2007), Controls on runoff generation and scale-dependence in a distributed hydrologic model, Hydrol. Earth Syst. Sci., 11(5), 1683–1701.
von Storch, H., E. Zorita, and U. Cubasch (1993), Downscaling of global climate change estimates to regional scales: An application to Iberian rainfall in wintertime, J. Clim., 6(6), 1161–1171, doi:10.1175/1520-0442(1993)006<1161:DOGCCE>2.0.CO;2.
Wen, X., and J. Gómez-Hernández (1996), Upscaling hydraulic conductivities in heterogeneous media: An overview, J. Hydrol., 183(1–2), R9–R32.
Wilby, R. L., T. M. L. Wigley, D. Conway, P. D. Jones, B. C. Hewitson, J. Main, and D. S. Wilks (1998), Statistical downscaling of general circulation model output: A comparison of methods, Water Resour. Res., 34(11), 2995–3008, doi:10.1029/98WR02577.
Willcox, K., and J. Peraire (2002), Balanced model reduction via the proper orthogonal decomposition, AIAA J., 40(11), 2323–2330.
Wood, A. W., L. R. Leung, V. Sridhar, and D. P. Lettenmaier (2004), Hydrologic implications of dynamical and statistical approaches to downscaling climate model outputs, Clim. Change, 62(1–3), 189–216, doi:10.1023/B:CLIM.0000013685.99609.9e.
Wood, E. F. (1997), Effects of soil moisture aggregation on surface evaporative fluxes, J. Hydrol., 190(3–4), 397–412, doi:10.1016/S0022-1694(96)03135-6.


Wood, E. F. (1998), Scale analyses for land-surface hydrology, in Scale Dependence and Scale Invariance in Hydrology, edited by G. Sposito, pp. 1–29, Cambridge Univ. Press, Cambridge, U. K.
Wood, E. F., et al. (2011), Hyperresolution global land surface modeling: Meeting a grand challenge for monitoring earth’s terrestrial water, Water Resour. Res., 47, W05301, doi:10.1029/2010WR010090.
Zhang, G. N., D. Lu, M. Ye, M. Gunzburger, and C. Webster (2013), An adaptive sparse-grid high-order stochastic collocation method for Bayesian inference in groundwater reactive transport modeling, Water Resour. Res., 49, 6871–6892, doi:10.1002/wrcr.20467.
