Advances in Water Resources 83 (2015) 299–309

Contents lists available at ScienceDirect

Advances in Water Resources journal homepage: www.elsevier.com/locate/advwatres

Optimal allocation of computational resources in hydrogeological models under uncertainty

Mahsa Moslehi a, Ram Rajagopal b, Felipe P.J. de Barros a,∗

a Sonny Astani Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, CA, USA
b Department of Civil and Environmental Engineering, Stanford University, CA, USA

Article info

Article history: Received 27 February 2015; Revised 25 June 2015; Accepted 27 June 2015; Available online 2 July 2015

Keywords: Model complexity; Computational resource allocation; Optimization; Stochastic hydrology; Flow and transport; Aquifer heterogeneity

Abstract

Flow and transport models in heterogeneous geological formations are usually large-scale with excessive computational complexity and uncertain characteristics. Uncertainty quantification for predicting subsurface flow and transport often entails utilizing a numerical Monte Carlo framework, which repeatedly simulates the model according to a random field parameter representing hydrogeological characteristics of the aquifer. The physical resolution (e.g. spatial grid resolution) for the simulation is customarily chosen based on recommendations in the literature, independent of the number of Monte Carlo realizations. This practice may lead to either excessive computational burden or inaccurate solutions. We develop an optimization-based methodology that considers the trade-off between the following conflicting objectives: time associated with computational costs, statistical convergence of the model prediction and physical errors corresponding to numerical grid resolution. Computational resources are allocated by considering the overall error based on a joint statistical–numerical analysis and optimizing the error model subject to a given computational constraint. The derived expression for the overall error explicitly takes into account the joint dependence between the discretization error of the physical space and the statistical error associated with Monte Carlo realizations. The performance of the framework is tested against computationally extensive simulations of flow and transport in spatially heterogeneous aquifers. Results show that modelers can achieve optimum physical and statistical resolutions while keeping a minimum error for a given computational time. The physical and statistical resolutions obtained through our analysis yield lower computational costs when compared to the results obtained with prevalent recommendations in the literature. Lastly, we highlight the significance of the geometrical characteristics of the contaminant source zone on the optimum physical and statistical resolutions.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction Hydrogeological models that represent flow and transport in subsurface formations are usually large-scale with excessive computational complexity and uncertain characteristics. In general, hydrogeologists have to rely on numerical methods to predict a model response. This is due to the presence of the spatio-temporal variability of input parameters, nonlinearities and complex boundary conditions. Such conditions are normally encountered in different hydrogeological applications such as flow and solute transport [1,2], contaminant site management [3,4], human health risk assessment [5,6] and the response of ecosystems [7]. Thus, in many situations, fully analytical treatments cannot be utilized and therefore numerical methods are employed. Besides the numerical complexity of



Corresponding author. Tel.: +12137400603. E-mail address: [email protected], [email protected] (F.P.J. de Barros).

http://dx.doi.org/10.1016/j.advwatres.2015.06.014 0309-1708/© 2015 Elsevier Ltd. All rights reserved.

large-scale hydrogeological systems, there are multiple sources of uncertainty that affect model prediction. A major source of uncertainty stems from the incomplete characterization of the heterogeneous geological formations. Hydrogeological properties, such as the hydraulic conductivity, are spatially variable and due to the high costs of data acquisition and measurement errors, a full detailed characterization is infeasible [2]. Therefore, hydrogeological predictions are subject to uncertainty and stochastic methods are required. The combined effect of data scarcity and spatial variability of the hydraulic properties of the subsurface leads to several challenges. First, the heterogeneous hydrogeological system needs to be approximately discretized in order to accurately capture spatial and temporal fluctuations of the model response. In addition to spatio-temporal discretization in numerical schemes, a statistical resolution should be considered in order to quantify the uncertainty in model predictions. For this purpose, the Monte Carlo (MC) framework is commonly utilized. The MC framework is straightforward to implement and can provide an estimation of the model response, but its accuracy highly


M. Moslehi et al. / Advances in Water Resources 83 (2015) 299–309

depends on the considered statistical resolution [8]. In the MC framework, the hydrogeological model needs to be simulated repeatedly to provide statistically meaningful results. Consequently, the predicted hydrogeological model response contains two major sources of error. The first originates from the truncation error of the numerical approximations, and the second arises from the statistical error associated with the brute-force MC framework. Other stochastic approaches could be chosen in which the statistical error decreases at a faster rate than in the MC technique [9–11]. However, the simplicity of the MC framework has made it the most commonly used approach among hydrogeologists. By considering a higher statistical resolution (e.g. by increasing the number of MC runs), the accuracy of the solution can be improved and the statistical error can be reduced. On the other hand, by decreasing the size of the physical discretization (e.g. smaller numerical grid blocks), the truncation error is diminished. Due to limited computational resources, choosing high resolutions for both the physical and the statistical discretization is rarely practical. Even though the costs associated with numerical computations can be alleviated through the use of parallel computing, there is still a need to allocate computational resources in an optimal manner. Moreover, if a fine physical discretization is chosen but the statistical resolution is coarse, the truncation error may be negligible but the predicted solution is not statistically converged. The converse occurs when many model evaluations are used within the MC framework but the physical discretization is coarse. In this case, the truncation error due to the physical discretization is dominant and the solution cannot represent the physics of the model accurately. Consequently, there is a tradeoff between physical and statistical discretization, and a method for optimally choosing these variables is required. Having this methodology at hand would provide a step towards addressing the following fundamental question: Given a limited computational resource, what are the optimum spatial and statistical resolutions that minimize the total error of the model prediction?

Ababou et al. [12] studied the impact of grid refinement in spatially heterogeneous subsurface flow fields. Based on heuristic arguments, they proposed a rule to determine the number of numerical grid blocks per heterogeneity correlation scale of the log-conductivity field needed to achieve physically meaningful solutions and to capture the effects of velocity variability in solute transport. However, this method does not guarantee that the chosen grid resolution is the computationally optimal resolution (especially when confronted with the computational costs associated with MC). Cainelli et al. [13] investigated the accuracy of numerical schemes for solving the flow governing equation and noted that numerical schemes should be chosen carefully when dealing with spatially heterogeneous formations. By comparing velocity fields calculated by different numerical schemes, Cainelli et al. proposed a methodology for defining the physical resolution that bounds the numerical error of the flow field. However, they did not consider the roles of computational resource limitations and statistical resolutions in their analysis.

Investigations have also been carried out to enhance the computational efficiency of numerical hydrogeological models. Leube et al. [14] used the temporal moment method to reduce the complexity of the governing transport equation. With the aim of improving the computational efficiency while keeping the physics of the problem accurate, Battiato et al. [15] developed a hybrid model that accounts for processes at both the pore-scale and the Darcy-scale. Parameter upscaling can also alleviate the computational burden [16]. Examples include upscaling of the permeability field [17] and of the dispersion tensor [18,19]. Although these papers consider the influence of model reduction on the computational cost, they do not investigate the interplay between uncertainty reduction and computational cost.

In the context of uncertainty quantification, Ballio and Guadagnini [20] suggested a methodology for convergence analysis of the MC framework. As a result, they provided an estimation for the number of MC realizations required to refine the accuracy of the model prediction by a given percentage of the confidence interval. Passetto et al. [21] constructed reduced-order models for fully saturated heterogeneous subsurface formations. They evaluated the accuracy of the reduced model as a function of the correlation scale and variance of the log-transmissivity without considering the interaction between computational budget, physical and statistical resolutions. In a recent work, Leube et al. [22] quantified the optimal computational resource allocation by jointly considering the trade-off between uncertainty quantification and discretization and model reduction. They obtained cost-to-error surfaces which were generated by simulating different combinations of physical and statistical resolutions which are then used to determine the optimal points for a given computational constraint [22]. The limitation of the approach described in [22] is the need to run various numerical experiments in order to develop cost-to-error surfaces. Thus, the process of finding the optimal physical and statistical resolutions (for a given computation budget) is a byproduct of these numerical investigations which may lead to an excessive computational burden. In this study, we develop a methodology to efficiently distribute the available computational resources among physical and statistical discretization while minimizing the overall error. The main goal is to illustrate the existence of an appropriate choice of grid resolution and the number of MC runs that accurately estimate the model’s prediction with respect to the computational resource constraints. 
A key and novel component of our methodology is the definition of the overall error in the model response, which is based on the bias-variance tradeoff [8]. The derived expression for the overall error explicitly accounts for the contributions stemming from the physical discretization and the stochastic resolution. As opposed to Leube et al. [22], the derived error metric does not rely on the assumption of statistical independence between the physical and stochastic errors. Through a series of examples, we show that the derived overall model response error is capable of efficiently handling the following conflicting objectives: computational cost, discretization error and statistical error. We test the framework on a commonly used stochastic spatially heterogeneous groundwater flow and transport model set-up. Results indicate that rational allocation of resources can reduce the computational costs while keeping the overall error at a minimum. The optimal physical and statistical resolutions arising from our method compare favorably with the optimal results obtained by extensive brute-force numerical simulations. Finally, we highlight the importance of the solute plume scale in controlling the tradeoffs between the statistical and physical resolutions.

2. Formulation

Let $\Omega$ denote the output of a physically-based hydrogeological model. Because of the uncertainty associated with the model parameters, $\Omega$ must be treated stochastically. Due to the spatial and temporal variability of these models, the output $\Omega$ varies over the space ($X$) and time ($T$) domains. The output $\Omega$ can be considered as an environmental performance metric (EPM) such as human health risk, resident or flux-averaged concentration of a chemical mixture, and solute travel time [23]. Furthermore, the output $\Omega$ is a function of the model input parameters, which can be represented by the input vector $\theta$. A variety of parameters such as the hydraulic conductivity, porosity and geostatistical parameters can be included in the input vector $\theta$. Thus, the output can be expressed as,

$$\Omega = f(x, t; \theta), \qquad (1)$$

where x and t are related to a specific point in the space and time domain, respectively. The function f assigns each feasible point in


space and time to a corresponding output value by considering the input parameters. This function is the solution of a governing equation that can be a partial differential equation (PDE) or, in more complex cases, a set of coupled PDEs that mathematically describes the physical model. The model output also depends on the physical configuration of the problem, such as boundary conditions and sink and source terms. In general, given the physical-(bio)chemical complexity of the problem, $\Omega$ is computed via numerical methods. The common feature of most numerical methods is the discretization of the physical domain of the model according to the resolution vector $\Delta$. Here $\Delta$ contains the dimensions of the numerical grid block. The proper resolution depends not only on the physics of the problem but also on the numerical approach that is used [13], bearing in mind that fine resolutions can significantly increase the computational effort. Although conventional numerical methods require the definition of grid blocks, there exist grid-free methods such as the moving particle semi-implicit method (MPS) [24] and smooth particle hydrodynamics (SPH) [25,26]. However, our work and upcoming results are based on discretization-based methods.

In subsurface environments, hydrogeological features vary spatially over multiple scales [2]. The spatial variability of hydrogeological properties makes the full characterization of the site properties a difficult task. Therefore, the hydraulic parameters present in $\theta$ are subject to uncertainty that arises from the lack of complete knowledge about the site properties. Consequently, $\theta$ should be modeled as a random function. In order to take this randomness into account, the stochastic input parameters can be discretized into Monte Carlo (MC) realizations by sampling from the underlying probability distribution function (pdf) that best represents the uncertainty in $\theta$. In other words, the MC approach discretizes the stochastic space into $N_{mc}$ equiprobable realizations $\theta_1, \ldots, \theta_{N_{mc}}$. Finally, by integrating all realizations, an estimation of the model output statistics can be obtained. The process of evaluating the statistics of the model output ($\Omega$) by considering physical ($\Delta$) and statistical ($N_{mc}$) resolutions is illustrated in Fig. 1.

The main difficulty associated with the process depicted in Fig. 1 lies in selecting the appropriate spatial resolution $\Delta$ and number of MC realizations $N_{mc}$ to achieve accurate results given a computational time constraint. The main objective of this work is to tackle this challenge by developing a framework that illustrates the existence of an appropriate choice of $\Delta$ and $N_{mc}$ that accurately estimates the model's output $\Omega$ with respect to the computational resource constraints. Our analysis is based on deriving an expression for the error in approximating the statistics and physical response of $\Omega$. The second goal is to show that optimizing the derived error expression will yield near optimal results for $\Delta$ and $N_{mc}$. Details of the proposed framework are described in Section 3.

3. Methodology

3.1. Overall error estimation of the model response

Let the vector $\Delta$ denote the physical resolution of the numerical model,

$$\Delta = (\Delta_1, \Delta_2, \ldots, \Delta_J), \qquad (2)$$

where $J$ is the number of physical dimensions. In this work, the focus is solely on the spatial discretization of the flow domain over a regularly spaced grid. Furthermore, the input parameters in $\theta$ are random variables. Within the MC framework, the ensemble of $\theta$ has a finite size $N_{mc}$:

$$\theta = (\theta_1, \theta_2, \ldots, \theta_{N_{mc}}), \qquad (3)$$

where $\theta_m$ corresponds to the $m$th realization of the random field. As a consequence, substituting the ensemble by a finite size sample and adopting a numerical grid will lead to an approximation of $\Omega$. Our overall goal is to obtain the $\Delta$ and $N_{mc}$ that minimize the error of the approximation of $\Omega$ for a given computational resource constraint. Thus, the optimization problem can be written as,

$$\begin{aligned} \text{minimize} \quad & \varepsilon(\Delta, N_{mc}) \\ \text{subject to} \quad & T_{tot}(\Delta, N_{mc}) \le B, \end{aligned} \qquad (4)$$

with variables $\Delta \in \mathbb{R}_+^J$ and $N_{mc} \in \mathbb{N}_+$. Here $\varepsilon(\Delta, N_{mc})$ is the objective function and $T_{tot}(\Delta, N_{mc})$ is the resource function that determines the resources required for each combination of physical and statistical resolutions. $T_{tot}(\Delta, N_{mc})$ can represent the computational time required by a specific processor to reach the desired outputs. The required computational resources should be bounded by $B$, which represents the available resources (e.g. time). Hence, by solving this optimization problem, we can estimate the optimal $\Delta^{opt}$ and $N_{mc}^{opt}$ such that the minimum error is obtained for the computational budget at hand.
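As a minimal sketch of the search implied by Eq. (4), assume that the candidate resolutions are restricted to finite sets and that callables for the error and the cost are available. The function `best_allocation` and the toy models `toy_error` and `toy_cost` are hypothetical names introduced only for illustration; they are not part of the paper, and the exhaustive enumeration stands in for whatever optimizer is actually used.

```python
from typing import Callable, Iterable, Optional, Tuple


def best_allocation(
    deltas: Iterable[float],
    n_mcs: Iterable[int],
    error: Callable[[float, int], float],
    cost: Callable[[float, int], float],
    budget: float,
) -> Optional[Tuple[float, int, float]]:
    """Return (delta, n_mc, error) with the smallest error among feasible pairs.

    A pair is feasible when cost(delta, n_mc) <= budget, mirroring the
    constraint T_tot <= B in Eq. (4). Returns None if no pair is feasible.
    """
    n_mcs = list(n_mcs)  # allow the inner loop to be repeated
    best = None
    for delta in deltas:
        for n_mc in n_mcs:
            if cost(delta, n_mc) > budget:
                continue  # violates the budget constraint
            err = error(delta, n_mc)
            if best is None or err < best[2]:
                best = (delta, n_mc, err)
    return best


# Hypothetical placeholder models (assumptions, not taken from the paper):
# a statistical term decaying as 1/n_mc plus a squared-bias term growing with delta,
# and a cost proportional to the number of realizations and grid blocks.
def toy_error(delta: float, n_mc: int) -> float:
    return 0.01 / n_mc + delta ** 2


def toy_cost(delta: float, n_mc: int) -> float:
    return n_mc / delta
```

Calling `best_allocation([0.5, 0.25], [100, 1000], toy_error, toy_cost, 2500.0)` returns the feasible pair with the lowest toy error. With a continuous $\Delta$, a nonlinear programming solver would replace the enumeration.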

Fig. 1. Schematic process of a typical stochastic numerical hydrogeological simulation: (a) Different grid resolutions for the physical domain (here only spatial discretization). (b) For a specific physical resolution $\Delta_i$, the MC framework is employed by sampling $N_{mc}$ realizations from the pdf of the random field parameter $\theta$. (c) The hydrogeological model is solved numerically for each realization and the results are integrated to achieve the statistical characterization of the output corresponding to a specific $\Delta$ and $N_{mc}$ value.
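The workflow in Fig. 1 can be sketched as a short Monte Carlo loop. This is only an illustration under stated assumptions: `mc_estimate`, `toy_solver` and `sample_theta` are hypothetical names, and the toy solver simply adds the grid size to the sampled parameter so that the resolution acts as an artificial bias.

```python
import random
import statistics
from typing import Callable, Tuple


def mc_estimate(
    solver: Callable[[float, float], float],
    sample_theta: Callable[[random.Random], float],
    delta: float,
    n_mc: int,
    seed: int = 0,
) -> Tuple[float, float]:
    """Run n_mc realizations at resolution delta; return (sample mean, sample variance)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    outputs = [solver(sample_theta(rng), delta) for _ in range(n_mc)]
    return statistics.fmean(outputs), statistics.variance(outputs)


# Hypothetical stand-ins for the numerical model and the parameter pdf.
def toy_solver(theta: float, delta: float) -> float:
    # "Numerical" output: exact response theta plus a bias equal to the grid size.
    return theta + delta


def sample_theta(rng: random.Random) -> float:
    return rng.uniform(1.0, 2.0)
```

With the same seed, changing `delta` shifts the sample mean by the imposed bias while leaving the sample variance essentially unchanged, which is the separation between physical and statistical effects that the error analysis below formalizes.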


In order to perform the minimization in Eq. (4), the error expression needs to be defined. In the rest of this section, the method of estimating the overall error is described. Let us define $V$ as the estimator of $\Omega$, which is given by,

$$V(x, t) \equiv E_\theta[\Omega(x, t; \theta)], \qquad (5)$$

where $E_\theta[\cdot]$ represents the expectation operator with respect to the random field $\theta$. As described in Section 2, numerical approaches are usually used to solve Eq. (1). Let $\hat{\Omega}(x, t; \theta, \Delta)$ be the output of the numerical method at a point in space and time for a model with input parameter $\theta$ and resolution vector $\Delta$. Numerical errors are embedded into $\hat{\Omega}$, so it can be interpreted as an approximation for $\Omega$. Consequently, the expected value of $\hat{\Omega}$ is given by,

$$\hat{V}(x, t; \Delta) = E_\theta[\hat{\Omega}(x, t; \theta, \Delta)]. \qquad (6)$$

In general, due to the complexity of the models, it is not usually tractable to calculate the above expectation analytically. Therefore, the sample mean can be used as an approximation for Eq. (6). Next, we define $\bar{V}$ to be the sample mean of $\hat{\Omega}$ over $N_{mc}$ realizations of $\theta$,

$$\bar{V}(x, t; \Delta, N_{mc}) = \frac{1}{N_{mc}} \sum_{m=1}^{N_{mc}} \hat{\Omega}(x, t; \theta_m, \Delta). \qquad (7)$$

Thus, the combined use of numerical methods for solving the governing equations and the MC framework leads to Eq. (7), which is an approximation for Eq. (5). This approximation is accurate as long as fine resolutions are considered (e.g. $\Delta \to 0$) and a large number of MC simulations are used ($N_{mc} \to \infty$). The error of this approximation can be expressed as,

$$\begin{aligned} \epsilon(x, t; \Delta, N_{mc}) &\equiv \bar{V}(x, t; \Delta, N_{mc}) - V(x, t) \\ &= [\bar{V}(x, t; \Delta, N_{mc}) - \hat{V}(x, t; \Delta)] + [\hat{V}(x, t; \Delta) - V(x, t)]. \end{aligned} \qquad (8)$$

In the following, the notation $\epsilon(x, t; \Delta, N_{mc})$ is simplified by using $\epsilon(\Delta, N_{mc})$. The expected value of the error squared can be determined by,

$$\begin{aligned} \varepsilon(\Delta, N_{mc}) \equiv E_\theta[\epsilon^2(\Delta, N_{mc})] &= E_\theta[(\bar{V}(x, t; \Delta, N_{mc}) - \hat{V}(x, t; \Delta))^2] + E_\theta[(\hat{V}(x, t; \Delta) - V(x, t))^2] \\ &\quad + E_\theta[2(\bar{V}(x, t; \Delta, N_{mc}) - \hat{V}(x, t; \Delta))(\hat{V}(x, t; \Delta) - V(x, t))]. \end{aligned} \qquad (9)$$

Since $\hat{V}(x, t; \Delta)$ and $V(x, t)$ are constant values and not random variables (see Eqs. (5) and (6)), and the expectation of the sample mean $\bar{V}(x, t; \Delta, N_{mc})$ is the expected value of the population, $\hat{V}(x, t; \Delta)$, the third term on the right hand side (RHS) of Eq. (9) can be expressed as,

$$E_\theta[2(\bar{V}(x, t; \Delta, N_{mc}) - \hat{V}(x, t; \Delta))(\hat{V}(x, t; \Delta) - V(x, t))] = 2(\hat{V}(x, t; \Delta) - V(x, t))\, E_\theta[\bar{V}(x, t; \Delta, N_{mc}) - \hat{V}(x, t; \Delta)] = 0. \qquad (10)$$

Using Eq. (10) and the fact that $\hat{V}(x, t; \Delta)$ and $V(x, t)$ are constant values, Eq. (9) can be rewritten as follows,

$$\varepsilon(\Delta, N_{mc}) = E_\theta[(\bar{V}(x, t; \Delta, N_{mc}) - \hat{V}(x, t; \Delta))^2] + (\hat{V}(x, t; \Delta) - V(x, t))^2. \qquad (11)$$

The equation above is known as the bias-variance decomposition, where $\hat{V}(x, t; \Delta) - V(x, t)$ is the bias. The first term on the RHS of Eq. (11) is the variance of the sample mean $\bar{V}(x, t; \Delta, N_{mc})$. Since the variance of the sample mean can be expressed as the population variance over the sample size, Eq. (11) can be represented as,

$$\begin{aligned} \varepsilon(\Delta, N_{mc}) &= \frac{\mathrm{Var}_\theta[\hat{\Omega}(x, t; \theta, \Delta)]}{N_{mc}} + \{E_\theta[\hat{\Omega}(x, t; \theta, \Delta)] - E_\theta[\Omega(x, t; \theta)]\}^2 \\ &= \frac{\mathrm{Var}_\theta[\hat{\Omega}(x, t; \theta, \Delta)]}{N_{mc}} + \{E_\theta[\hat{\Omega}(x, t; \theta, \Delta) - \Omega(x, t; \theta)]\}^2, \end{aligned} \qquad (12)$$

where $\mathrm{Var}_\theta[\cdot]$ corresponds to the variance operator over $\theta$. The statistical dependency between the statistical and physical errors is manifested in both terms on the RHS of Eq. (12). Although the errors are not completely independent, as will be shown in the following examples, the first component of Eq. (12) can be viewed as the statistical error (MC error) and its second component is strongly linked to the discretization error. Moreover, the correlation between the errors will depend on the complexity of the PDE.

In order to apply the error expression, we need to evaluate the RHS of Eq. (12). To achieve this, we provide an approximation to Eq. (12) based on heuristic arguments. The first component of the error depends on $\mathrm{Var}_\theta[\hat{\Omega}]$, which can be approximated by the sample variance of $\hat{\Omega}$, denoted by $S^2_{\hat{\Omega}}$. The sample variance can be estimated from some preliminary evaluations of the model. The second term on the RHS of Eq. (12), i.e. $E_\theta[\hat{\Omega} - \Omega]$, is mainly associated with the truncation error in the discretization. For this component of the error, we hypothesize that it can be approximated by a polynomial $\sum_{i=0}^{p} a_i \Delta^i$, see chap. 3 of [27]. Note that the nature of the discretization error will highly depend on the flow and transport solver used and the numerical scheme adopted. Given the difficulty inherent in solving Eq. (12), we assume that Eq. (12) can be approximated as follows,

$$\varepsilon(\Delta, N_{mc}) \approx \frac{S^2_{\hat{\Omega}}}{N_{mc}} + \left(a_p \Delta^p + a_{p-1} \Delta^{p-1} + \cdots + a_1 \Delta + a_0\right)^2, \qquad (13)$$

where $S^2_{\hat{\Omega}}$ is the sample variance of $\hat{\Omega}(\theta, \Delta; x, t)$ and $a_i$ (with $i = 0, 1, \ldots, p$) are the coefficients of the polynomial. Details pertaining to the estimation of $S^2_{\hat{\Omega}}$ and the parameters $a_i$, as well as the optimization process, are discussed in Section 3.2.

3.2. Description of the optimization process

The constrained optimization problem in Eq. (4) should be specified and solved to find the optimal physical ($\Delta$) and statistical ($N_{mc}$) resolutions. The objective function, given by Eq. (13), contains the unknown parameters $S^2_{\hat{\Omega}}$ and $a_i$ (with $i = 0, 1, \ldots, p$). These parameters can be obtained by estimating the overall error in Eq. (13) from preliminary simulations. The first component of Eq. (13) contains $S^2_{\hat{\Omega}}$ as an unknown parameter that can be estimated by multiplying the MC error, obtained from preliminary simulations, by $N_{mc}$. In the following examples, it will be shown that $S^2_{\hat{\Omega}}$ does not display a significant dependence on $\Delta$. The second component of Eq. (13) contains coefficients associated with a polynomial that can be determined by fitting the discretization error of the empirical results (obtained from simulations) to a polynomial. Once the estimate for the unknown coefficients of Eq. (13) is obtained, we can address the optimization problem in Eq. (4). There are various existing optimization methods in the literature based on mathematical methods (nonlinear programming) or heuristic algorithms [e.g. 28,29,30]. For this work, we optimized the objective function by a nonlinear programming method using MATLAB's built-in fmincon function, which enables us to find the optimal decision variables $\Delta$ and $N_{mc}$. The optimal parameters resulting from the


minimization of Eq. (13), constrained by a computational time budget, are denoted by $\Delta^{opt}$ and $N_{mc}^{opt}$.

4. Evaluating the performance of the method

In order to test the performance of the optimization based on Eq. (13), we will compare $\Delta^{opt}$ and $N_{mc}^{opt}$ with the optimal results obtained from a series of brute-force numerical simulations. The procedure for evaluating the performance of the method is given in the following steps:

1. This step consists of solving the numerical model for different combinations of $\Delta$ and $N_{mc}$. For each combination of $\Delta$ and $N_{mc}$, we obtain a computational cost and an overall error. Details of the error computation are given further below using Eq. (14). The outcome of this intensive numerical investigation is a data set from which we can estimate the empirical optimal points for a given computational budget, e.g. computational time $B$ in Eq. (4). These empirical optimal points are denoted by $\Delta^{opt,*}$ and $N_{mc}^{opt,*}$. This is the approach adopted in Leube et al. [22] to construct the cost-to-error surfaces.
2. The second step aims at determining the parameters $S^2_{\hat{\Omega}}$ and $a_i$ in Eq. (13). This can be achieved by estimating the error through numerical simulations for a few discretizations and MC realizations.
3. Once we have an estimate for $S^2_{\hat{\Omega}}$ and $a_i$, we can evaluate and minimize Eq. (13) subject to the same computational budget $B$ as in step 1. This allows us to estimate the optimal physical and statistical resolutions, namely $\Delta^{opt}$ and $N_{mc}^{opt}$.
4. Lastly, we compare $\Delta^{opt}$ and $N_{mc}^{opt}$ (step 3) with the results obtained from the brute-force numerical investigation (step 1), namely $\Delta^{opt,*}$ and $N_{mc}^{opt,*}$.

We point out that the optimal results obtained from the full brute-force numerical investigation (step 1) should be based on an error formulation consistent with the proposed error definition given in Eq. (13). In other words, the error metric should be similar to Eq. (13) so that the results from step 1 and step 3 are comparable. To construct the error formulation to be used in step 1, we need to analyze both terms of Eq. (13). The first component of Eq. (13), i.e. the residual variance, is mainly associated with the error of the statistical resolution. The residual variance is inversely proportional to the number of MC realizations, i.e. by increasing $N_{mc}$, the statistical error is diminished [31]. The second component of Eq. (13) evaluates the effect of the physical discretization error. Thus, the error metric, which is the sum of the MC error and the discretization error, is proposed as,



$$\varepsilon^{*}(\Delta, N_{mc}) = \frac{1}{N_{mc}} \sum_{i=1}^{N_{mc}} \left(\hat{\Omega}^{(i)} - \hat{\Omega}_{mean}\right)^2 + \left|\hat{\Omega}_{mean} - \hat{\Omega}_{ref}\right|^2, \qquad (14)$$

where $\hat{\Omega}^{(i)}$ is the output of the numerical method at a specific point in space and time for the $i$th realization of $\theta$. The sample mean of all realizations of the model is given by $\hat{\Omega}_{mean}$, and $\hat{\Omega}_{ref}$ denotes the model output for maximum $N_{mc}$ and finest resolution, i.e. small $\Delta$. $\hat{\Omega}_{ref}$ is considered as a reference point and, for this work, we evaluate it numerically. Evaluating Eq. (14) for a series of $N_{mc}$ and $\Delta$ values allows us to perform the tasks described in both step 1 and step 2. In step 2, Eq. (14) is used to estimate the $S^2_{\hat{\Omega}}$ and $a_i$ needed for the error estimation in Eq. (13). The main drawback in using Eq. (14) to execute step 2 is that, in hydrogeological applications, we rarely have $\hat{\Omega}_{ref}$ (especially when one wishes to estimate the optimal $N_{mc}$ and $\Delta$ a priori). However, from a practical perspective, $\hat{\Omega}_{ref}$ can represent a measurement from the actual field site. For example, $\hat{\Omega}_{ref}$ can be estimated through the site investigation and conceptual model development processes and/or when calibrating the model. In other words, $\hat{\Omega}_{ref}$ in Eq. (14) can be obtained from hydraulic head measurements or breakthrough curves measured from tracer tests.

In addition to the error metric in Eq. (14), the computational effort should be quantified. The computational budget should be computed as a function of $\Delta$ and $N_{mc}$ so that it can be used as the constraint function for the optimization problem in Eq. (4). There are different budget norms, such as units of elapsed time or computational floating point operations (FLOPS), that can be used to measure the required computational budget of one realization of the model for a given physical discretization. For the total budget, the computational cost associated with one realization is simply multiplied by the total number of realizations, bearing in mind that different realizations are assumed to be independent. This assumption was tested and verified for the cases analyzed in this work. Therefore, the computational budget can be estimated by the following equation:

$$T_{tot}(\Delta, N_{mc}) = N_{mc}\, \tilde{T}(\Delta), \qquad (15)$$

where $T_{tot}$ is the total budget, which is a function of $\Delta$ and $N_{mc}$, and $\tilde{T}(\cdot)$ is the computational time associated with simulating one realization of the model for a specific spatial discretization.

5. Simplistic illustration

To illustrate the methodology described in Sections 3 and 4, we consider a simple ordinary differential equation (ODE). In this example, 1D steady-state transport of a non-reactive solute in an idealized horizontal homogeneous soil column of dimension $x \in [0, L]$ is considered. The main physical processes are advection and local-scale dispersion. Hence, the advection–dispersion equation is given by:

$$\frac{d^2 C(x)}{dx^2} - \alpha \frac{dC(x)}{dx} = 0, \qquad (16)$$

with $\alpha = v/D$, where $v$ is the Darcy-scale velocity and $D$ is the local-scale dispersion coefficient. $C(x)$ corresponds to the concentration of the solute at point $x$. For the purpose of illustration, we assume that $v$ is a random parameter with a uniform distribution bounded by 1 and 2 m/d, i.e. $U[1, 2]$ m/d. The concentrations at $x = 0$ and $x = L$ are equal to $C_0$ and $C_L$, respectively. The desired output ($\Omega$) of this example is the concentration of the plume at a specific point $x_0$, which is denoted by $C_{x_0} \equiv C(x_0)$. The finite difference method (FDM) is employed to solve Eq. (16) numerically within the MC framework to capture the effect of the uncertainty of $v$. Table 1 summarizes the relevant parameters associated with this example. Following the procedure described in Section 4, we have executed a large number of simulations with different grid resolutions ($\Delta = L/n$, where $n$ is the number of grid blocks) and MC realizations ($N_{mc}$). For each simulation, we recorded the computational error, using Eq. (14), and the computational time (step 1). The results of these simulations are illustrated in Figs. 2 and 3. The total error evaluated in Eq. (14) associated with different combinations of $\Delta$ and $N_{mc}$ is shown in Fig. 2a and b (step 1). It can be observed that the total error diminishes with increasing statistical and spatial resolutions. The dependence of the MC error (first component

Table 1
Parameters for the ODE example.

Parameter    Values                                            Units
$L$          10                                                m
$C_0$        1                                                 ppm
$C_L$        0                                                 ppm
$x_0$        7.5                                               m
$D$          1                                                 m²/d
$v$          U[1, 2]                                           m/d
$\Delta$     0.66, 0.5, 0.4, 0.33, 0.28, 0.25, 0.22, 0.20      m
$N_{mc}$     10², 5 × 10², 10³, 5 × 10³, 10⁴                   –

of the RHS of Eq. (14)) on Nmc for various values of Δ is depicted in Fig. 2c. The stochastic errors are nearly identical for all Δ values explored. This result indicates that, for this class of problem, the statistical error can be assumed to be independent of Δ. In order to estimate S² in Eq. (13), the statistical error is multiplied by Nmc. The results show that S² is almost constant (approximately equal to 0.01) and does not change with Nmc. As a result, we can use the set of simulations with the minimum computational demand (minimum Nmc and maximum Δ) to estimate S² (step 2).

[Figure 2: four panels. (a) Total Error versus No. of MC Realizations, curves for Δ = 0.66, 0.50, 0.40, 0.33, 0.28, 0.25, 0.22, 0.20. (b) Total Error versus Spatial Discretization size (Δ), curves for Nmc = 100, 500, 1000, 5000, 10000. (c) MC Error versus No. of MC Realizations. (d) Discretization Error versus Spatial Discretization size (Δ).]

Fig. 2. (a) The total error (Eq. (14)) versus the number of MC realizations (statistical resolutions) for different spatial discretizations Δ. (b) Evolution of the total error (Eq. (14)) versus spatial discretization for given numbers of MC realizations. (c) MC error (first term of the RHS of Eq. (14)) versus the number of MC realizations for different spatial resolutions. (d) Discretization error (second term of the RHS of Eq. (14)) versus the spatial discretization for different numbers of MC realizations.

The discretization error is computed using the second term of the RHS of Eq. (14), and the results are shown in Fig. 2d. The trend of the discretization error is almost the same for any value of the statistical resolution. The evolution of the error is in agreement with the classic results from discretization error analysis [chap. 3 of 27]. For this problem, the discretization error is almost independent of Nmc (compare the curves for Nmc = 10² and 10⁴ in Fig. 2d). As mentioned in Section 3.1, a polynomial can be fitted to the discretization errors; in our work, we opted for a quadratic polynomial. Based on our simulations, the fitted polynomials obtained for different Nmc are almost identical (see Fig. 2d); thus the curve obtained for Nmc = 500 is taken as representative of all cases. Therefore, the quadratic polynomial fitted with Nmc = 500 is used in the objective function, Eq. (13) (i.e. p = 2).

By following the methodology described in Section 4, the parameters in Eq. (13), namely S² and ai (for i = 0, 1, 2), can be estimated (step 2). As a consequence, Eq. (13) can be optimized subject to the computational resource constraints. The optimal solutions for different computational budget constraints are shown in Fig. 3 (step 3). In this figure we have also included the contour plots of computational time and error obtained empirically from the aforementioned simulations for all cases of Nmc and Δ (step 1). This figure verifies the performance of the methodology, since the optimal points found by solving the optimization problem described in step 3, Δ^opt and Nmc^opt, are close to the minimum-error points on the contour lines restricted to the time constraints (i.e. the points Δ^opt,∗ and Nmc^opt,∗ of step 1, located where the dashed blue line crosses the solid black line). Next, we compare the error ε∗, obtained from empirical results and Eq. (14), with the estimated error ε, calculated by Eq. (13), to quantitatively investigate the accuracy of the proposed overall error equation (13). For this comparison, three different physical and statistical resolutions are chosen; the actual error (ε∗)

and the estimated error (ε) are presented in Table 2. As illustrated in Table 2, the values of ε and ε∗ are similar.

Fig. 3. Error and computational time for different combinations of Nmc and Δ. The optimal points were computed by solving Eq. (4) subject to the following computational budget (time): Ttot = [5, 10, 50, 100, 250] s. It can be observed that the empirical optimal points found by brute-force simulations and the optimal points achieved by solving Eq. (4) are almost identical. (For interpretation of the references to color in the text, the reader is referred to the web version of this article.)

Table 2
Comparison between the actual and estimated error for the ODE example.

Nmc     Δ (m)    ε              ε∗
10²     L/15     2.47 × 10⁻⁴    2.68 × 10⁻⁴
10³     L/40     1.35 × 10⁻⁵    1.54 × 10⁻⁵
10⁴     L/30     1.06 × 10⁻⁵    9.34 × 10⁻⁶
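The numerical experiment of this section can be reproduced in outline with a few lines of code. The sketch below is only an illustration under stated assumptions: it discretizes Eq. (16) with second-order central differences, solves the resulting tridiagonal system with the Thomas algorithm, and averages C(x0) over draws of v ∼ U[1, 2] m/d using the Table 1 values. The function names, the linear interpolation at x0 and the use of the closed-form solution C(x) = C0 + (CL − C0)(e^{αx} − 1)/(e^{αL} − 1) as a reference are choices made here for illustration; they are not the error metric of Eq. (14), nor necessarily the scheme used to produce Figs. 2 and 3.

```python
import math
import random


def solve_ade_fdm(alpha, n, length=10.0, c0=1.0, c_end=0.0):
    """Central-difference solution of C'' - alpha*C' = 0 on n >= 2 grid blocks.

    Returns the n + 1 nodal concentrations, boundary nodes included.
    """
    dx = length / n
    m = n - 1                                   # number of interior nodes
    lower = 1.0 / dx**2 + alpha / (2.0 * dx)    # multiplies C[i-1]
    diag = -2.0 / dx**2                         # multiplies C[i]
    upper = 1.0 / dx**2 - alpha / (2.0 * dx)    # multiplies C[i+1]
    rhs = [0.0] * m
    rhs[0] -= lower * c0                        # Dirichlet value at x = 0
    rhs[-1] -= upper * c_end                    # Dirichlet value at x = L
    # Thomas algorithm: forward elimination, then back substitution.
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = upper / diag, rhs[0] / diag
    for i in range(1, m):
        denom = diag - lower * cp[i - 1]
        cp[i] = upper / denom
        dp[i] = (rhs[i] - lower * dp[i - 1]) / denom
    interior = [0.0] * m
    interior[-1] = dp[-1]
    for i in range(m - 2, -1, -1):
        interior[i] = dp[i] - cp[i] * interior[i + 1]
    return [c0] + interior + [c_end]


def concentration_at(x0, nodal, length=10.0):
    """Linearly interpolate the nodal concentrations at x0 (illustrative choice)."""
    n = len(nodal) - 1
    dx = length / n
    i = min(int(x0 / dx), n - 1)
    w = (x0 - i * dx) / dx
    return (1.0 - w) * nodal[i] + w * nodal[i + 1]


def exact_concentration(x0, alpha, length=10.0, c0=1.0, c_end=0.0):
    """Closed-form solution of Eq. (16) for the same Dirichlet boundaries."""
    return c0 + (c_end - c0) * math.expm1(alpha * x0) / math.expm1(alpha * length)


def mc_mean_concentration(n, n_mc, x0=7.5, disp=1.0, seed=0):
    """Sample mean of C(x0) over n_mc draws of v ~ U[1, 2] m/d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_mc):
        v = rng.uniform(1.0, 2.0)
        total += concentration_at(x0, solve_ade_fdm(v / disp, n))
    return total / n_mc


if __name__ == "__main__":
    for n in (15, 40):   # grid blocks, i.e. delta = L/15 and L/40
        print(n, mc_mean_concentration(n, n_mc=100))
```

Central differences stay free of oscillations here because the cell Péclet number αΔ remains below 2 for every Δ in Table 1; a coarser grid or a larger velocity would call for a different scheme.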

6. Heterogeneous groundwater flow and transport problem

In this section, the proposed methodology is applied to a heterogeneous groundwater flow model. For this illustration, we consider a regularly spaced grid, i.e. Δi ≡ Δ for i = 1, 2, ..., J. As in the previous section, the accuracy of the method is verified by comparing the optimal results from Eq. (4) (step 3) with the optimal results from the brute-force numerical simulations (step 1). For this section, we have selected a flow and transport configuration widely used in the stochastic hydrogeological literature. Two distinct transport scenarios are investigated and their details are presented in the remaining parts of this section.

6.1. Physical formulation, model setup and implementation

A two-dimensional aquifer with a steady-state incompressible horizontal flow field in the absence of sinks and sources is considered, with a Cartesian coordinate system x = (x1, x2). The transmissivity (T) is spatially variable, i.e. T = T(x), and isotropic. The porosity (φ) of the geological formation is considered uniform. Flow is governed by the following differential equation:

\[
\nabla \cdot \left[ T(\mathbf{x}) \nabla H(\mathbf{x}) \right] = 0, \tag{17}
\]

with no-flow boundary conditions at the transverse boundaries and a constant head gradient imposed at the longitudinal boundaries of the flow domain. This physical set-up was selected for the purpose of illustration, and different types of boundary conditions (deterministic or random) can be incorporated. The hydraulic head is denoted by H. For this example, the environmental performance metric of interest is the time of the peak concentration (τp) at a control plane downstream from the injection source. Transport of an instantaneously released dissolved non-reactive tracer along a line source of transverse dimension L2 is assumed to be governed by the advection–dispersion equation:

\[
\frac{\partial C(\mathbf{x},t)}{\partial t} + \mathbf{u}(\mathbf{x}) \cdot \nabla C(\mathbf{x},t) = D_{d} \nabla^{2} C(\mathbf{x},t), \tag{18}
\]

where Dd is the local-scale dispersion tensor, u(x) is the Darcy-scale velocity field and C(x, t) is the resident concentration of the plume. The velocity u(x) is determined via Darcy's law once the solution of Eq. (17) is obtained. For this investigation, the major source of uncertainty stems from the spatial distribution of T. The log-transmissivity field, Y = ln T, is modeled as a statistically stationary random space function (RSF) [2] with variance σ², correlation length λ and covariance CY. We consider an isotropic exponential covariance model for the upcoming illustrations. To cast Eqs. (17) and (18) within an MC framework, we randomly generate multiple (Nmc) realizations of Y. For each equiprobable realization of Y, we solve the flow and transport problems and obtain the statistics of τp. The spatially variable Y fields are generated with SGeMS [32]. Flow is solved using MODFLOW [33], and solute transport is modeled with the random walk particle tracking code RW3D documented in [34–37]. The concentration field within the random walk particle tracking framework is obtained using the methodology described in [38]. We performed extensive MC simulations using MODFLOW in a 200 × 200 m² square domain, in which

the spatial resolutions in both directions are identical. Table 3 summarizes the flow, transport and geostatistical parameters associated with this computational example. The time associated with the peak concentration is evaluated at the control plane located at x1 = 100 m. In these scenarios, the contaminant is released from the vertical line segment source located at x1 = 25 m with length L2. The impact of L2 on the optimal resource allocation will be addressed in Section 6.3.

Table 3
Parameters of flow and transport in heterogeneous field example.

Parameter             Value                                    Units
σ²                    0.8                                      –
λ                     6.667                                    m
KG                    1                                        m/d
φ                     0.3                                      –
Dd,11                 0.1                                      m²/d
Dd,22                 0.02                                     m²/d
Hydraulic gradient    0.07                                     –
Number of particles   30 000                                   –
Δ                     λ/3, λ/6, λ/10, λ/20                     m
Nmc                   10, 50, 100, 500, 1000, 5000, 10 000     –

6.2. Computational results and method performance

The upcoming results are based on a contaminant line segment source of dimension L2 = 2.5λ m. Fig. 4 depicts the overall empirical error (Eq. (14)) and its two components (MC and spatial errors) for various combinations of statistical and physical resolutions (see step 1 in Section 4). As shown in Fig. 4b and d, the discretization error and the total error increase as the grid block is coarsened. As expected, the error decreases for a higher statistical resolution (see Fig. 4a and c). In addition, Fig. 4d reveals that the statistical and discretization errors are not independent. The dependency between the errors is explicitly taken into account in Eq. (12). Next, we utilize the results in Fig. 4c and d to determine the coefficients S² and ai. These are required for the evaluation of the objective function (13); see step 2 in Section 4. As depicted in Fig. 4c, the curves obtained from all physical resolutions are quite similar. Thus, S² can be evaluated based on simulations with a coarse numerical grid block. The coefficients ai, in the second term on the RHS of Eq. (13), can be estimated by fitting a polynomial curve through the results presented in Fig. 4d. For this illustration, we fit the polynomial to the results

computed for Nmc = 500, since additional realizations lead only to an incremental decrease of the discretization error (see Fig. 4d). As opposed to the ODE example presented in Section 5, where the discretization error had a clear quadratic growth independent of the number of MC runs (see Fig. 2d), the results shown in Fig. 4d display a dependency on Nmc. This dependency is reduced for Nmc ≥ 500 (Fig. 4d). Notice that the error displayed in Fig. 4d accumulates the discretization error from both the MODFLOW software (based on the FDM) and the random walk particle tracking code [34–36]. The latter contains errors associated with the time step, velocity interpolation and smoothing of the dispersion tensor [34,39]. We believe this to be the cause of the complex dynamics observed in Fig. 4d.

[Figure 4: four panels. (a) Total Error versus No. of MC Realizations, curves for Δ = λ/3, λ/6, λ/10, λ/20. (b) Total Error versus Spatial Discretization size (Δ), curves for Nmc = 50, 100, 500, 1000, 5000, 10000. (c) MC Error versus No. of MC Realizations. (d) Discretization Error versus Spatial Discretization size (Δ).]

Fig. 4. (a) The total error (Eq. (14)) versus the number of MC realizations (statistical resolutions) for different spatial discretizations Δ. (b) Evolution of the total error (Eq. (14)) versus spatial discretization for given numbers of MC realizations. (c) MC error (first term of the RHS of Eq. (14)) versus the number of MC realizations for different spatial resolutions. (d) Discretization error (second term of the RHS of Eq. (14)) versus the spatial discretization for different numbers of MC realizations.

Fig. 5. Error and computational time for different combinations of Nmc and Δ. The optimal points were computed by solving Eq. (4) subject to the following computational budget (time): Ttot = [10⁵, 2 × 10⁵, 3 × 10⁵] s. It can be observed that the empirical optimal points found by brute-force simulations and the optimal points achieved by solving Eq. (4) are almost identical. (For interpretation of the references to color in the text, the reader is referred to the web version of this article.)

The overall error and the computational time for all values of Δ and Nmc are depicted in Fig. 5. This figure reflects the existence of a trade-off between the overall error and the computational budget. It also compares the empirical results with the results obtained from optimizing the objective function subject to different resource constraints (steps 1 and 3 in Section 4). For each time constraint, the optimal point (Δ^opt and Nmc^opt) is close to the minimum error obtained empirically through extensive simulations (Δ^opt,∗ and Nmc^opt,∗). This observation implies that the proposed error metric and framework provide an acceptable estimation of the optimal decision variables Δ and Nmc for a given limited computational budget. Table 4 reports the actual error (ε∗) and the estimated error (ε) for three different statistical and physical resolutions to investigate the accuracy of the proposed overall error estimation (Eq. (13)). As shown in Table 4, Eq. (13) provides an appropriate estimation of the actual error, since ε∗ and ε are close for each resolution tested.
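The allocation step (step 3) can be sketched as a small search over candidate grid sizes. The code below is a minimal illustration under explicit assumptions, not the paper's implementation: it assumes an error model of the form S²/Nmc plus a polynomial in Δ with coefficients ai (Eq. (13) may combine these terms differently), a hypothetical power-law cost per realization T̃(Δ) = c Δ^(−k), and placeholder values for S², ai, c and k. Only λ, the candidate grid sizes and the budgets are taken from Table 3 and Fig. 5. For each candidate Δ the whole budget is spent on realizations, and the pair with the smallest assumed error is kept.

```python
def estimated_error(delta, n_mc, s2, coeffs):
    """Assumed error model: MC term s2 / n_mc plus a polynomial in delta."""
    return s2 / n_mc + sum(a * delta**i for i, a in enumerate(coeffs))


def time_per_realization(delta, c=50.0, k=2.0):
    """Hypothetical cost law (seconds per run): c * delta**(-k)."""
    return c * delta ** (-k)


def allocate(budget, deltas, s2, coeffs):
    """Return (delta, n_mc, error) with the smallest assumed error within budget.

    Returns None when the budget cannot pay for a single run on any grid.
    """
    best = None
    for delta in deltas:
        # Largest number of realizations affordable on this grid.
        n_mc = int(budget // time_per_realization(delta))
        if n_mc < 1:
            continue
        err = estimated_error(delta, n_mc, s2, coeffs)
        if best is None or err < best[2]:
            best = (delta, n_mc, err)
    return best


if __name__ == "__main__":
    lam = 6.667                                   # correlation length (Table 3)
    grid_sizes = [lam / 3, lam / 6, lam / 10, lam / 20]
    for budget in (1e5, 2e5, 3e5):                # budgets of Fig. 5, in seconds
        # s2 and coeffs are placeholders, not fitted values from the paper.
        print(budget, allocate(budget, grid_sizes, s2=1.0e4, coeffs=(0.0, 0.0, 10.0)))
```

Because the assumed error decreases monotonically with Nmc, the budget constraint is always active, which is why the search only needs to scan Δ. In practice T̃(Δ) would be replaced by the timings recorded in step 1.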
Next, we compare the performance of the methodology against grid sizes commonly suggested in the literature, which are chosen independently of the number of MC runs. Ababou et al. [12] recommend using a grid block size Δ = λ/4 to capture the effects of heterogeneity in groundwater flow and transport. In addition, modelers tend to perform stochastic simulations with Nmc on the order of 10²–10³.

Running our flow and transport simulation with these commonly used parameters (Δ = λ/4, Nmc = 10³), we obtain an error of ε∗ = 290 and a computational cost of Ttot = 33 960 s. Using Ttot as a constraint, i.e. B = Ttot, we adopt the proposed method defined in Section 4 (Eq. (13)) and obtain the following near-optimal values for the grid size and the number of MC realizations: Δ = λ/6.65 and Nmc = 680, with ε = 206. Note that our method provides an overall error (ε = 206) smaller than the one obtained with the Δ and Nmc values recommended in the literature (a decrease of about 29% in the total error). We believe that this difference will be enhanced when dealing with higher levels of aquifer heterogeneity and model complexity. Furthermore, the near-optimum results for Δ^opt and Nmc^opt favor a finer grid resolution over a higher statistical resolution for the given time constraint (B = Ttot).

Table 4
Comparison between the actual and the estimated error in heterogeneous flow and transport problem.

Nmc     Δ       ε        ε∗
10²     λ/20    963.8    1008.1
10³     λ/10    128.9    130.8
10⁴     λ/6     87.6     84.6

6.3. Plume-scale and ergodicity

Next, we analyze the impact of the source dimension, L2, on the optimal statistical and physical resolutions. We performed simulations with the same parameters as in the previous scenario (Table 3) on the same Y fields generated by SGeMS in Section 6.2. The main distinction is the increase of the contaminant line segment source from L2 = 2.5λ m to L2 = 9λ m. By repeating the analysis, we obtain the new optimal resolutions. Fig. 6 displays the computational time

contour map (solid lines) as a function of the grid resolution and the number of Monte Carlo realizations, as well as the optimal points for L2 = 2.5λ (red circles) and L2 = 9λ (blue squares). Fig. 6 thus illustrates how the optimal points in this scenario (L2 = 9λ m) deviate from the optimal resolutions obtained in the previous scenario (L2 = 2.5λ m). As the size of the contaminant source increases relative to λ, the prediction error for τp becomes more sensitive to the physical resolution than to the number of MC simulations. Fig. 6 shows that the optimal path (the path that includes the optimal points) has a smaller slope in the scenario with the larger plume. As a consequence, the model output is more sensitive to the physical discretization when compared to the previous scenario (i.e. with the smaller source zone, L2 = 2.5λ m). This shift in the optimal allocation is due to plume ergodicity. The results in Fig. 6 reveal that under near-ergodic conditions the model response (τp) is less sensitive to the statistical resolution than to Δ. This result is in agreement with stochastic hydrogeological theory [2,40], i.e. near-ergodic plumes are less prone to uncertainty. Investigating the sensitivity of the optimal resolution to the plume scale can assist decision makers in better allocating computational resources. A similar analysis was performed in the context of goal-oriented uncertainty reduction [6].

[Figure 6: No. of MC Realizations versus Spatial Resolution (λ/Δ), with computational-time contours and the optimal points for the 1st Scenario (L2 = 2.5λ) and the 2nd Scenario (L2 = 9λ).]

Fig. 6. Optimal points of the two scenarios with different source zone dimensions are depicted subject to the 3 different time constraints: Ttot = [10⁵, 2 × 10⁵, 3 × 10⁵] s. Blue square symbols correspond to L2 = 9λ m, while red circle symbols are for L2 = 2.5λ m. Solid lines correspond to the computational cost. (For interpretation of the references to color in this figure legend and in the text, the reader is referred to the web version of this article.)

7. Conclusions

We examined the impact of both physical and stochastic resolutions on the overall error approximation of the model response subject to a computational budget constraint. In general, hydrogeological predictions of interest are large-scale with excessive computational complexity and uncertain characteristics. A wide variety of numerical schemes combined with a Monte Carlo framework is used to handle the complexity associated with these models.
However, in order to maintain accurate predictions, one needs to wisely allocate limited computational resources. In this paper, by considering the tradeoff between two conflicting objectives, i.e. the available computational resource and the accuracy of predictions, we proposed a framework based on the bias-variance tradeoff analysis to determine the optimal physical and statistical resolutions for a given computational resource in order to minimize the error of predictions. Our objective is to illustrate the existence of an appropriate choice of physical and stochastic resolutions that accurately estimates the model's prediction with respect to a given computational resource constraint. To achieve this, we derived an error expression of the model response that accounts for both sources of error as well as their statistical dependency. We applied the method to a series of examples in order to test its performance. Results show that the optimal points, achieved through the model error expression, are close to the actual optimal points obtained by extensive brute-force numerical simulation (i.e. examining all candidate resolutions to find the minimal error within the given time constraint).

In addition, our analyses show that various factors such as the physics of the problem, the numerical scheme and the available computational resources affect the optimal resolutions and the allocation of computational resources. In particular, the optimal resolutions depend strongly on the available computational budget. The initial dimension of the contaminant source plays a fundamental role in defining the allocation of computational resources. As the size of the contaminant source increases with respect to the correlation scale, the model output becomes more sensitive to the physical discretization and the significance of the statistical resolution diminishes. This is an outcome of plume ergodicity: larger plumes are less prone to uncertainty since they sample more of the variability of the conductivity field [2,40,41]. Extending a similar analysis to investigate the impact of different error metrics on the allocation of computational resources should be considered in future studies. In addition, applying the ideas discussed in this paper to a real case study would be a worthwhile future research endeavor.

An interesting question to be addressed is how the tradeoffs between spatial and statistical resolutions behave for three-dimensional groundwater flows and for geological formations with both high heterogeneity and connectivity. Moreover, results should be expanded to consider the tradeoffs for both spatial and temporal resolutions in the presence of uncertainty and to investigate the impact of distinct boundary conditions on the final optimal design. Although outside the scope of the current contribution, we highlight that the computational effort associated with the physical discretization can be alleviated through upscaling techniques. For example, the computational burden can be relieved by developing block-scale conductivity and block-scale dispersion coefficients [17–19,42,43]. It should be noted that the optimal resource allocation is goal-oriented and depends on the level of statistics being sought. If one is interested in evaluating extreme values (e.g. rare events in contamination [4,26]), the error metric needs to be modified appropriately. Capturing these rare events would require many MC runs and a refined numerical grid to accurately predict the peak concentration. Therefore, the target upon which decisions will be made plays a significant role in developing the error expression and consequently the resource allocation.

As discussed in Section 4, the proposed approach requires determining, a priori, the reference solution. To avoid this problem, field data from preliminary site characterization (e.g. [44]) or, under certain conditions, analytical solutions or surrogate approximations (e.g. [21,45]) can be utilized to evaluate the reference solution. Additional investigation should be carried out to propose an error metric that does not require a reference point. Moreover, spatial refinement depends on the smoothness of the flow field. In the presence of high velocity gradients, such as in the vicinity of pumping or injection wells and fractures, the discretization may become adaptive. Under these conditions, higher physical resolution might be required as opposed to a higher statistical resolution, since a fraction of the flow is controlled by a pumping or injection well. Hence, new strategies should be explored to determine

the optimal discretization based on the physical setting of the problem (e.g. distinct boundary conditions, sinks and sources) and the geostatistical characteristics. The results shown in this study are limited to our simulation configurations and the numerical method adopted; however, the methodology can be applied and extended to models of other environmental flows that entail numerical schemes and uncertainty.

Acknowledgments

The first author gratefully acknowledges the financial support from the USC Provost's Ph.D. Fellowship. All authors would like to thank Christopher Henri and Daniel Fernandez-Garcia for providing the flow and transport codes and valuable technical advice. The authors thank the anonymous reviewers, Arianna Libera and Arsalan Heydarian for editorial comments.

References

[1] Helmig R, et al. Multiphase flow and transport processes in the subsurface: a contribution to the modeling of hydrosystems. Springer-Verlag; 1997.
[2] Rubin Y. Applied stochastic hydrogeology. Oxford University Press, USA; 2003.
[3] Fernàndez-Garcia D, Bolster D, Sanchez-Vila X, Tartakovsky DM. A Bayesian approach to integrate temporal data into probabilistic risk analysis of monitored NAPL remediation. Adv Water Resour 2012;36:108–20. http://dx.doi.org/10.1016/j.advwatres.2011.07.001.
[4] Tartakovsky DM. Assessment and management of risk in subsurface hydrology: a review and perspective. Adv Water Resour 2013;51:247–60. http://dx.doi.org/10.1016/j.advwatres.2012.04.007.
[5] Maxwell RM, Kastenberg WE, Rubin Y. A methodology to integrate site characterization information into groundwater-driven health risk assessment. Water Resour Res 1999;35(9):2841–55. http://dx.doi.org/10.1029/1999WR900103.
[6] de Barros FPJ, Rubin Y, Maxwell RM. The concept of comparative information yield curves and its application to risk-based site characterization. Water Resour Res 2009;45(6). http://dx.doi.org/10.1029/2008WR007324.
[7] Porporato A, Dodorico P, Laio F, Ridolfi L, Rodriguez-Iturbe I. Ecohydrology of water-controlled ecosystems. Adv Water Resour 2002;25(8):1335–48. http://dx.doi.org/10.1016/S0309-1708(02)00058-1.
[8] Asmussen S, Glynn PW. Stochastic simulation: algorithms and analysis, 57. Springer; 2007. http://dx.doi.org/10.1007/978-0-387-69033-9.
[9] Ghanem RG, Spanos PD. Stochastic finite elements: a spectral approach, 387974563. Springer; 1991. http://dx.doi.org/10.1007/978-1-4612-3094-6.
[10] Li H, Zhang D. Probabilistic collocation method for flow in porous media: comparisons with other stochastic methods. Water Resour Res 2007;43(9). http://dx.doi.org/10.1029/2006WR005673.
[11] Oladyshkin S, Nowak W. Data-driven uncertainty quantification using the arbitrary polynomial chaos expansion. Reliab Eng Syst Safe 2012;106:179–90. http://dx.doi.org/10.1016/j.ress.2012.05.002.
[12] Ababou R, McLaughlin D, Gelhar L, Tompson A. Numerical simulation of three-dimensional saturated flow in randomly heterogeneous porous media. Transport Porous Med 1989;4(6):549–65. http://dx.doi.org/10.1007/BF00223627.
[13] Cainelli O, Bellin A, Putti M. On the accuracy of classic numerical schemes for modeling flow in saturated heterogeneous formations. Adv Water Resour 2012;47:43–55. http://dx.doi.org/10.1016/j.advwatres.2012.06.016.
[14] Leube PC, Nowak W, Schneider G. Temporal moments revisited: why there is no better way for physically based model reduction in time. Water Resour Res 2012;48(11). http://dx.doi.org/10.1029/2012WR011973.
[15] Battiato I, Tartakovsky DM, Tartakovsky AM, Scheibe T. Hybrid models of reactive transport in porous and fractured media. Adv Water Resour 2011;34(9):1140–50 (new computational methods and software tools). http://dx.doi.org/10.1016/j.advwatres.2011.01.012.
[16] Dentz M, Le Borgne T, Englert A, Bijeljic B. Mixing, spreading and reaction in heterogeneous media: a brief review. J Contam Hydrol 2011;120:1–17. http://dx.doi.org/10.1016/j.jconhyd.2010.05.002.
[17] Wen X-H, Gómez-Hernández JJ. Upscaling hydraulic conductivities in heterogeneous media: an overview. J Hydrol 1996;183(1). http://dx.doi.org/10.1016/S0022-1694(96)80030-8.
[18] Rubin Y, Sun A, Maxwell R, Bellin A. The concept of block-effective macrodispersivity and a unified approach for grid-scale- and plume-scale-dependent transport. J Fluid Mech 1999;395:161–80. http://dx.doi.org/10.1017/S0022112099005868.
[19] de Barros FPJ, Rubin Y. Modelling of block-scale macrodispersion as a random function. J Fluid Mech 2011;676:514–45. http://dx.doi.org/10.1017/jfm.2011.65.


[20] Ballio F, Guadagnini A. Convergence assessment of numerical Monte Carlo simulations in groundwater hydrology. Water Resour Res 2004;40(4). http://dx.doi.org/10.1029/2003WR002876.
[21] Pasetto D, Guadagnini A, Putti M. A reduced-order model for Monte Carlo simulations of stochastic groundwater flow. Computat Geosci 2014;18(2):157–69. http://dx.doi.org/10.1007/s10596-013-9389-4.
[22] Leube PC, de Barros FPJ, Nowak W, Rajagopal R. Towards optimal allocation of computer resources: trade-offs between uncertainty quantification, discretization and model reduction. Environ Model Softw 2013;50:97–107. http://dx.doi.org/10.1016/j.envsoft.2013.08.008.
[23] de Barros FPJ, Ezzedine S, Rubin Y. Impact of hydrogeological data on measures of uncertainty, site characterization and environmental performance metrics. Adv Water Resour 2012;36:51–63 (special issue on uncertainty quantification and risk assessment). http://dx.doi.org/10.1016/j.advwatres.2011.05.004.
[24] Surhone L, Tennoe M, Henssonow S. Moving particle semi-implicit method. VDM Publishing; 2010.
[25] Benz W. Smooth particle hydrodynamics: a review. In: Buchler J, editor. The numerical modelling of nonlinear Stellar pulsations. NATO ASI series, 302. Netherlands: Springer; 1990. p. 269–88. http://dx.doi.org/10.1007/978-94-009-0519-1_16.
[26] Boso F, de Barros FPJ, Fiori A, Bellin A. Performance analysis of statistical spatial measures for contaminant plume characterization toward risk-based decision making. Water Resour Res 2013;49(6):3119–32. http://dx.doi.org/10.1002/wrcr.20270.
[27] Tannehill J, Anderson D, Pletcher R. Computational fluid mechanics and heat transfer, Series in computational and physical processes in mechanics and thermal sciences. Washington, DC: Taylor and Francis; 1997.
[28] Goldberg D. Genetic algorithms in search, optimization, and machine learning. Addison-Wesley; 1989. http://dx.doi.org/10.5860/choice.27-0936.
[29] Dorigo M, Maniezzo V, Colorni A. Ant system: optimization by a colony of cooperating agents. IEEE Syst Man Cybern 1996;26(1):29–41. http://dx.doi.org/10.1109/3477.484436.
[30] Kaveh A, Motie M, Moslehi M. Magnetic charged system search: a new meta-heuristic algorithm for optimization. Acta Mech 2013;224(1):85–107. http://dx.doi.org/10.1007/s00707-012-0745-6.
[31] Liu JS. Monte Carlo strategies in scientific computing. Springer; 2008.
[32] Remy N, Boucher A, Wu J. Applied geostatistics with SGeMS: a user's guide. Cambridge University Press; 2009. http://dx.doi.org/10.1007/s11004-009-9217-5.
[33] Harbaugh AW. MODFLOW-2005, the US Geological Survey modular groundwater model: the ground-water flow process. VA, USA: US Department of the Interior, US Geological Survey Reston; 2005.
[34] Salamon P, Fernandez-Garcia D, Gomez-Hernandez JJ. A review and numerical assessment of the random walk particle tracking method. J Contam Hydrol 2006;87(3–4):277–305. http://dx.doi.org/10.1016/j.jconhyd.2006.05.005.
[35] Henri CV, Fernandez-Garcia D. Toward efficiency in heterogeneous multispecies reactive transport modeling: a particle-tracking solution for first-order network reactions. Water Resour Res 2014;50(9):7206–30. http://dx.doi.org/10.1002/2013WR014956.
[36] Fernandez-Garcia D, Illangasekare TH, Rajaram H. Differences in the scale-dependence of dispersivity estimated from temporal and spatial moments in chemically and physically heterogeneous porous media. Adv Water Resour 2005;28(7):745–59. http://dx.doi.org/10.1016/j.advwatres.2004.12.011.
[37] Pedretti D, Fernàndez-Garcia D. An automatic locally-adaptive method to estimate heavily-tailed breakthrough curves from particle distributions. Adv Water Resour 2013;59:52–65. http://dx.doi.org/10.1016/j.advwatres.2013.05.006.
[38] Fernàndez-Garcia D, Sanchez-Vila X. Optimal reconstruction of concentrations, gradients and reaction rates from particle distributions. J Contam Hydrol 2011;120–121:99–114. http://dx.doi.org/10.1016/j.jconhyd.2010.05.001.
[39] Zheng C. Analysis of particle tracking errors associated with spatial discretization. Groundwater 1994;32(5):821–8. http://dx.doi.org/10.1111/j.1745-6584.1994.tb00923.x.
[40] Dagan G. Transport in heterogeneous porous formations: spatial moments, ergodicity, and effective dispersion. Water Resour Res 1990;26(6):1281–90. http://dx.doi.org/10.1029/WR026i006p01281.
[41] Fiori A. On the influence of pore-scale dispersion in nonergodic transport in heterogeneous formations. Transport Porous Med 1998;30(1):57–73. http://dx.doi.org/10.1023/A:1006548529015.
[42] Ebrahimi F, Sahimi M. Multiresolution wavelet scale up of unstable miscible displacements in flow through heterogeneous porous media. Transport Porous Med 2004;57(1):75–102. http://dx.doi.org/10.1023/B:TIPM.0000032742.05517.06.
[43] Lawrence AE, Rubin Y. Block-effective macrodispersion for numerical simulations of sorbing solute transport in heterogeneous porous formations. Adv Water Resour 2007;30(5):1272–85. http://dx.doi.org/10.1016/j.advwatres.2006.11.005.
[44] Ferro CA, Jupp TE, Lambert FH, Huntingford C, Cox PM. Model complexity versus ensemble size: allocating resources for climate prediction. Phil Trans R Soc A 2012;370:1087–99. http://dx.doi.org/10.1098/rsta.2011.0307.
[45] Zhang D, Lu Z. An efficient, high-order perturbation approach for flow in random porous media via Karhunen–Loeve and polynomial expansions. J Comput Phys 2004;194(2):773–94. http://dx.doi.org/10.1016/j.jcp.2003.09.015.