Stoch Environ Res Risk Assess (2015) 29:959–974 DOI 10.1007/s00477-014-0927-y
ORIGINAL PAPER
Development of a surrogate model and sensitivity analysis for spatio-temporal numerical simulators
Amandine Marrel · Nadia Perot · Clémentine Mottet
Published online: 22 July 2014. Springer-Verlag Berlin Heidelberg 2014
Abstract To evaluate the consequences on human health of radionuclide releases in the environment, numerical simulators are used to model the radionuclide atmospheric dispersion. These codes can be time consuming and depend on many uncertain variables related to the radionuclide, the release or the weather conditions. These variables are of different kinds: scalar, functional and qualitative. Given the uncertain parameters, the code provides spatial maps of radionuclide concentration at various moments. The objective is to assess how these uncertainties can affect the code predictions and to perform a global sensitivity analysis of the code in order to identify the most influential uncertain parameters. This sensitivity analysis often calls for the estimation of variance-based importance measures, called Sobol' indices. To estimate these indices, we propose a global methodology combining several advanced statistical techniques which make it possible to deal with the various natures of the uncertain inputs and the high dimension of the model outputs. First, a quantification of the input uncertainties is made based on data analysis and expert judgment. Then, an initial realistic sampling design is generated and the corresponding code simulations are performed. Based on this sample, a proper orthogonal decomposition of the spatial output is used and the main decomposition coefficients are modeled with a Gaussian process surrogate model. The obtained spatial metamodel is then used to compute spatial maps of Sobol' indices, yielding the identification of the global and local influence of each input variable and the detection of areas with interactions. The impact of the uncertainty quantification step on the results is also evaluated.
A. Marrel (corresponding author) · N. Perot · C. Mottet, CEA, DEN, DER, 13108 Saint-Paul-lez-Durance, France. e-mail: [email protected]
Keywords: Spatio-temporal models · Metamodel · Gaussian process · POD decomposition · Global sensitivity analysis
1 Introduction

In the case of a radioactive material release into the environment, as a consequence of a nuclear power plant accident or of accidental releases due to other events, the first concern relates to the dispersion of these radioactive substances in the air. Atmospheric dispersion modeling is an efficient way to characterize the transport of the released substances over long distances and to define the geographical areas that are likely to be contaminated. The prediction of radionuclide dispersion in the atmosphere is an important element for emergency response procedures and risk assessment. Numerical modeling is an essential tool for an accurate prediction of the contamination plume spread and an assessment of the environmental risks associated with the site. The CEA has developed the CERES-MITHRA (C-M) application to model the radionuclide atmospheric dispersion and evaluate the consequences on human health of radioisotope releases in the environment. This application is used either for crisis management or to perform assessment calculations for regulatory safety documents relative to nuclear facilities. However, it is well known that many input variables, such as the radionuclide release characteristics or the weather conditions, are highly uncertain. To deal with all these uncertainties, computer experiment methodologies based upon statistical techniques are useful. The uncertainty analysis step is used to evaluate statistical indicators, confidence intervals, or the probability density distribution of the model response (De Rocquigny et al. 2008), while the global sensitivity analysis
step is used to quantify the influence of the model input variable uncertainties (over their whole range of variation) on the model responses (Saltelli et al. 2000). Recent studies have applied different statistical methods of uncertainty and sensitivity analysis to environmental models (Helton 1993; Nychka et al. 1998; Fasso et al. 2003; Volkova et al. 2008; Lilburne and Tarantola 2009; Marrel et al. 2011; Ciriello et al. 2012). All these methods have shown their efficiency in providing guidance towards a better understanding of the modeling. However, the C-M application has several specific features. First of all, the uncertain variables can be of different kinds: scalar, functional and qualitative. It is necessary to propose a well-adapted uncertainty quantification (i.e. probabilistic modeling), especially for functional inputs such as the weather conditions, and to extend the classical sensitivity techniques to this kind of functional inputs. Moreover, the C-M simulator (or C-M code) is a time consuming, complex spatio-temporal model based on a Gaussian puff model. It is too computationally expensive to be used directly to conduct uncertainty propagation studies or global sensitivity analysis based on intensive Monte Carlo methods. To avoid this problem of huge calculation time, it can be useful to replace the complex computer code by a mathematical approximation, called surrogate model or metamodel (Sacks et al. 1989; Fang et al. 2006). This function must be as representative as possible of the computer code, with good prediction capabilities, and it must require a negligible calculation time, which allows the computation of sensitivity indices. For example, Ciriello et al. (2012) recently used a polynomial chaos expansion to compute variance-based sensitivity indices for the peak radionuclide concentration predicted by a contaminant transport model. Among all the metamodel-based solutions (polynomials, splines, neural networks, polynomial chaos, etc.), we focus our attention on the Gaussian process (Gp) model. It can be viewed as an extension of the kriging method, which is used for interpolating data in space (Chilès and Delfiner 1999), to computer code data (Sacks et al. 1989; Oakley and O'Hagan 2002). Many authors (e.g., Welch et al. 1992; Marrel et al. 2008) have shown how the Gp model can be used as an efficient emulator of code responses, even in high dimensional cases. Given the uncertain parameters, the C-M code provides spatial maps of radionuclide concentration at various moments defined by the scenario. The first challenge here is to extend the currently used metamodels to the specific case of a functional output, i.e. one which varies in space and time. These spatial outputs encompass several thousand grid blocks, each with a concentration value. This kind of problem cannot simply be treated as a vectorial output problem because of its dimensionality: the metamodeling of this vectorial output cannot be addressed with kriging or
cokriging techniques (Fang et al. 2006). Therefore, we consider the model output as a functional output synthesized by its projection on an appropriate basis. This problem of building a metamodel (based upon functional decomposition and Gp modeling) for a functional output has been addressed for one-dimensional outputs by Shi et al. (2007) and Bayarri et al. (2007) and, for two-dimensional outputs, by Higdon et al. (2008) and more recently by Marrel et al. (2011). The latter proposed to use a wavelet decomposition combined with Gp modeling of the coefficients. The obtained spatial metamodel was efficient, but a large number of wavelet components had to be kept and, consequently, many Gp metamodels had to be built, which can be delicate in the case of a large number of inputs and several outputs. In the case of sensitivity analysis, a functional output is usually considered as a vectorial output and sensitivity indices relative to each input are computed for each discretized value of the output (De Rocquigny et al. 2008). To avoid the large amount of sensitivity index computations when applying such an approach, a few authors referred to various basis decompositions of the functional output, such as principal component analysis (Campbell et al. 2006). Following this idea, Lamboni et al. (2011) recently proposed ''aggregated sensitivity indices'' to measure the global contribution of each input factor to multivariate outputs. The full functional restitution of the sensitivity analysis, which would provide local and global information, remains a challenge. Moreover, the characterization of the uncertainty related to stochastic variables such as the weather conditions can be delicate and can have a significant impact on the results of the sensitivity analysis and uncertainty propagation. Indeed, the probabilistic models used to represent the uncertainty of such variables can be inadequate or simply uncertain due to difficulties in their estimation. It can be interesting to evaluate the influence of this probabilistic characterization of uncertainty, particularly if the related variables (the weather conditions for the C-M test case) are very influential on the code predictions. The second main challenge in this paper is to investigate the influence of the probabilistic model on the results of the sensitivity analysis, which is usually known as «second-level sensitivity analysis» (De Rocquigny 2012). In the case of a functional output, the visualization and the interpretation of this second-level sensitivity analysis is a challenge. To meet all these objectives, we propose a global methodology combining several advanced statistical techniques which make it possible to deal with the various natures of the uncertain inputs, the high dimension of the model outputs and the impact of the probabilistic characterization of complex uncertain inputs. The C-M atmospheric dispersion model is described in the following section. Then, a global methodology is proposed
to deal with the model input uncertainties and perform sensitivity analysis. This methodology is then applied step by step to the C-M test case: the ''Step 1: Uncertainty quantification'' section describes the uncertainty quantification step, the ''Step 2: Experimental design and code simulations'' section the construction of the learning sample, and the ''Step 3: Surrogate modeling for spatial output'' and ''Step 4: Global sensitivity analysis and uncertainty propagation'' sections present, respectively, the surrogate model used to approximate the C-M code and the global sensitivity analysis. The impact of the probabilistic characterization of the meteorological variables is then studied in the last section.
2 Ceres-Mithra atmospheric model

2.1 Impact assessment

Following an accidental release of radionuclides in the atmosphere, Ceres-Mithra evaluates instantaneous and time-integrated activity concentrations (in Bq m-3 and Bq s m-3, respectively) for different points and moments. To do this, several phenomena are simulated: transport, diffusion, impaction and sedimentation. Atmospheric transport modeling is carried out with the Gaussian puff model (Monfort et al. 2011). This model assumes that a sequence of individual puffs of pollutant is released from the source. Different standard deviation equations can be used; Doury's formulas (Doury 1980), which are functions of travel time, are the default option used in this study. Instantaneous and time-integrated volume activity concentrations are predicted. Deposits on the ground result from mechanisms of diffusion, impaction and sedimentation, and from the washout of puffs during rainy situations. The dry deposition velocity is considered independent of the distance from the emissary. Heavier rain leads to more significant deposits. The wet deposition velocity as a function of the distance from the emissary can be calculated by taking into account the washout rate. For aerosols or vapours, depletion due to dry and wet deposition is calculated. Instantaneous and time-integrated surface activity concentrations are predicted.

2.2 Uncertain input parameters

The C-M code depends on many uncertain input parameters related to the various phenomena involved:

• Radionuclide parameters: the deposition process depends on the radionuclide deposition velocity. CERES has a database containing the characteristics of about 600 radioisotopes, including their deposition velocity.
• Release parameters: height of release and source term activity. As the Mithra component is a Gaussian puff model, it can take into account releases varying with time.
• Meteorological parameters: in the calculations, it is possible to take into account variations of the weather conditions. This can be necessary for long duration releases. Indeed, if the release is quite long, the weather conditions, especially the wind direction, are not constant during the release and observation time and can impact the atmospheric dispersion prediction. For example, the variation of the wind direction increases the width of the plume; the concentrations in the wind axis are then lower. If the weather conditions are well known, it is possible to define meteorological steps, i.e. time intervals during which the meteorological data (wind speed and direction, atmospheric stability and rain) are supposed to be constant.
All these uncertain inputs are modeled by random variables, which will be characterized by their probabilistic distribution (or probabilistic model) in the uncertainty quantification step. The C-M application has the specificity of having variables of different kinds: scalar and functional.

2.3 Ceres-Mithra outputs

Given the uncertain parameters, the C-M code provides spatial maps of radionuclide volume and surface concentrations for various moments defined by the scenario. These data can be instantaneous or integrated over time. C-M outputs can thus be viewed as functional (spatio-temporal) outputs. Fig. 1 gives an example of an integrated activity concentration map for a given moment. Each map consists of a grid of nz = 64 × 64 = 4,096 points, which corresponds to a discretization of the two-dimensional space domain. This discretization is the same for each simulation of the code. The C-M code is a time consuming, complex spatio-temporal model and the time required for one run can vary from a few seconds to a few minutes, depending on the release configuration (number of release sites, radioisotopes, calculation moments, etc.). Moreover, no automatic simulator launch is available: for each simulation, the user has to enter the input parameters manually in dialog boxes, launch the corresponding simulation and post-treat the results. For a median configuration, the time required for each simulation (including the parameter entry, the computation of the C-M model and the post-treatment of the outputs) can reach 30 min. Consequently, the number of simulations is limited to a few hundred. Fig. 2 summarizes, via a flowchart, the inputs and outputs of the C-M model (in bold the inputs considered as uncertain in our study).
Fig. 1 Integrated surface activity concentration of Cesium 137 after 20 min of release (from two installations, denoted I1 and I2)
Fig. 2 Flowchart of the C-M model, with the inputs and the outputs of the simulator (in bold the inputs considered as uncertain in our study)
3 Methodology for uncertainty treatment

The objective is to assess how the input uncertainties can affect the C-M predictions and to perform a global sensitivity analysis of the C-M code in order to identify the most influential uncertain parameters. However, for the purpose of sensitivity analysis, five main difficulties can arise due to practical problems, especially when focusing on environmental risk assessment:
• physical models involve rather complex phenomena (they are nonlinear and subject to threshold effects), sometimes with strong interactions between physical variables;
• the number of available code simulations is often limited, due to CPU time and simulation constraints;
• numerical models take as inputs some functional uncertain variables (varying with time);
• the code outputs encompass many variables of interest that can vary in space and time;
• the probabilistic characterization of complex uncertain inputs such as weather conditions can be difficult and can have a significant impact on the sensitivity analysis studies; this impact has to be evaluated.
To solve these problems, we propose in this paper a global methodology combining several advanced statistical techniques in order to manage the uncertainties.

• Step 1: First of all, a scenario of an accidental release is defined and an important step of uncertainty quantification is made. This step aims at identifying the variables which are considered as uncertain in our study, among all the scenario parameters required to run a simulation with the computer code (or simulator). These uncertain input variables are modeled by random variables which are characterized by their probability distribution. This uncertainty characterization can be made from database analysis or expert judgment.
• Step 2: Then, an experimental design is chosen, which yields a set of simulations of the computer code. This design makes it possible to investigate the domain of variation of the uncertain parameters and provides a learning sample.
• Step 3: From this learning sample, a metamodel can be built to approximate the computer code. The metamodel aims at reproducing the behavior of the computer code in the domain of its influential parameters (Sacks et al. 1989; Fang et al. 2006). Once its accuracy and prediction capabilities have been checked, the metamodel, which requires a negligible calculation time, can be used to perform sensitivity analysis and uncertainty propagation. Indeed, these studies, which often refer to the probabilistic framework and Monte Carlo methods, require a lot of simulations (thousands). As the number of computer code simulations is limited, these studies cannot be performed directly with the computer code.
• Step 4: The metamodel is thus used to perform global sensitivity analysis (GSA). GSA accounts for the whole input range of variation and tries to explain output uncertainties on the basis of input uncertainties. GSA based on variance decomposition makes it possible to perform a quantitative sensitivity analysis which determines the precise part of response variability explained by each variable and by any interaction between variables (Sobol 1993). Based upon the functional analysis of variance decomposition of any square integrable function (Efron and Stein 1981), sensitivity indices called Sobol' indices are thus obtained. These indices are estimated with the metamodel, through analytical or Monte Carlo formulations. Our purpose here is to extend the use of metamodels and sensitivity analysis to the specific case of functional output variables, for instance spatially or temporally dependent ones. The problem of building a metamodel for a functional output has recently been addressed for one-dimensional outputs by Shi et al. (2007) and Bayarri et al. (2007), who proposed an approach based upon functional decomposition, such as wavelets. In the case of sensitivity analysis, a functional output is usually considered as a vectorial output and sensitivity indices relative to each input are computed for each discretized value of the output. To avoid the large amount of sensitivity index computations when applying such an approach, a few authors referred to various basis decompositions of the functional output such as principal component analysis (Campbell et al. 2006; Lamboni et al. 2011). Then, sensitivity indices are obtained for the coefficients of the expansion basis. However, the full functional restitution of Sobol' indices remains an unexplored challenge. Recently, Marrel et al. (2011) proposed a complete methodology to perform sensitivity analysis of two-dimensional outputs: their approach consists in building a functional metamodel (based on a wavelet decomposition technique and the Gaussian process metamodel) and computing variance-based sensitivity indices at each location of the spatial output map. The maps of Sobol' indices thus obtained yield the identification of the global and local influence of each input variable and the detection of areas with interactions. However, this method seems to require a large number of metamodels to be built. To deal with the C-M test case, we propose to adopt the methodology proposed by Marrel et al. (2011) but using a proper orthogonal decomposition (Chatterjee 2000) instead of the wavelet decomposition, in order to obtain Sobol' maps and deduce the local and global influence of each input parameter.
• Step 5: The uncertainty quantification process (step 1) can be a tricky step because of the lack of information or data on the uncertain inputs, or of difficulties to fit a well-adapted probabilistic model. In the case of parametric probabilistic models, this additional uncertainty can be partially represented by the variability of the probabilistic model parameters. These additional uncertain parameters constitute a second layer of uncertainties. When the parameters of the probabilistic models vary, the GSA results can change. It can be interesting to evaluate the impact of these 2nd kind uncertainties on the GSA results, which is usually known as «second-level sensitivity analysis» (De Rocquigny 2012). To quantify this variability, we propose to use specific tools dedicated to functional uncertainty visualization.
This paper presents a complete application of the whole methodology to the real C-M test case. The methodology is applied from step 1 to step 5 and allows dealing with the various natures of the uncertain inputs, the uncertain
probabilistic modeling of these inputs and the high dimension of model outputs.
4 Step 1: Uncertainty quantification

4.1 Scenario of accidental release

In this paper, we consider the following scenario of accidental release:

• Release from several emission points: a simultaneous release occurs in two CEA installations, denoted I1 and I2. We consider a release of one hour with a constant radionuclide activity. The location and height of the two release points are different.
• Release of one aerosol: cesium 137, denoted 137Cs.
• Doury's formulas are used to compute the standard deviations of the Gaussian puff model.
• Concerning the weather conditions, we assume that there is no rain and that the temperature, the hydrometry and the atmospheric stability are constant. During the whole release, normal diffusion conditions (stable atmosphere) hold. Only the wind speed and direction are uncertain meteorological variables. Moreover, we consider that the class of meteorological day is known and fixed: it is a dry weather class with a wind origin direction lying in [249; 333] degrees (the class of weather conditions with the highest probability of occurrence).
The 137Cs atmospheric dispersion is studied during 2 h from the beginning of the one-hour multi-emissary release. During this period, the meteorological data are defined by steps, i.e. time intervals during which the meteorological data (wind speed and direction) are supposed to be constant. We chose here to use steps of 20 min; consequently, for a two-hour dispersion study, six values are required for each meteorological variable.

4.2 Uncertain parameter characterization

Considering the chosen scenario, the remaining uncertain inputs fall into three categories: those related to the radionuclide, to the release or to the weather conditions. Concerning the radionuclide, the deposition velocity is assumed, following expert opinion, to be known to within a factor of ten around a reference value DV137Cs. Thus, the variation interval is [10^-1 DV137Cs; 10 DV137Cs] and a log-uniform distribution is used to model its probability distribution over this interval. The source term activity released by each installation is supposed to be measured by a sensor with an error of a factor of ten around a reference value Q137Cs. The heights of the release points of the I1 and I2 installations are assumed to be uniformly distributed in an interval [h0/2; 3h0/2] around their reference value h0 (respectively 15 and 45 m).
All the uncertain parameters and their associated probability distributions are summarized in Table 1. Concerning the weather conditions, a database of 10 years of meteorological records is available (from a sensor placed on the CEA site). These records are made at 10-minute intervals. From this database, several meteorological classes are identified, corresponding to different kinds of ''meteorological days'' (e.g. dry day with north wind, rainy windless day, etc.). We suppose in this study that the type of ''meteorological day'' is known: dry weather with north wind. Under the hypothesis of such a meteorological class day, the atmospheric dispersion is assumed to be normal and the temperature is supposed constant at 20 °C during the two-hour dispersion study. Only two meteorological variables are considered: the wind speed (WS) and the wind direction (WD). For the chosen meteorological class day, these variables belong respectively to the intervals [0; 12.5] m s-1 and [249; 333] degrees and can be supposed to be independent. Time series of wind speed and direction belonging to this meteorological class are extracted from the database. These two variables are random temporal processes. An autoregressive (AR) model is chosen to fit each stochastic process. The AR parameters are estimated by maximum likelihood and the optimal order of the process is determined with the Akaike information criterion (Akaike 1974). AR processes of order one are selected for both WS and WD and the corresponding AR parameters, i.e. the model coefficient, the mean and the white noise variance, are respectively denoted a, μ and σ²ε. The formulation of these temporal processes is given by Eq. (1):

$$(X_t - \mu) = a\,(X_{t-1} - \mu) + \epsilon_t \qquad (1)$$

with εt a centered Gaussian white noise of variance σ²ε. Several statistical tests (normality, independence and residual-centering tests), not detailed here for brevity, have been made to check the validity of the choice of AR(1) models. Thus, for each available WS and WD time series, an AR(1) model is fitted and the AR(1) parameters are estimated. The mean and standard deviation values of these parameters are given in Table 2. To model the variability of these parameters, parametric probabilistic distributions are fitted; the most adapted distributions are listed in Table 2. In a first step, only the mean values of the AR(1) parameters will be considered to simulate the meteorological WS and WD time series and run the C-M simulations. In a second step, the variability of the AR(1) parameters will be taken into account and its impact on the sensitivity analysis results will be evaluated. To achieve this, the probabilistic distributions given in Table 2 will be used.
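The AR(1) fit described above can be illustrated with a short sketch. The example below is our own (it is not the authors' code): it fits the model of Eq. (1) to a synthetic 10-minute wind-speed series by conditional least squares, a standard approximation of the Gaussian maximum-likelihood estimate, and recovers the coefficient a, the mean μ and the white-noise variance σ²ε. All numerical values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic 10-minute wind-speed record playing the role of one database series
a_true, mu_true, sd_eps_true = 0.80, 5.5, 0.7
x = np.empty(500)
x[0] = mu_true
for t in range(1, len(x)):
    x[t] = mu_true + a_true * (x[t - 1] - mu_true) + rng.normal(0.0, sd_eps_true)

# Conditional least squares: regress X_t on (1, X_{t-1}), i.e. X_t = c + a X_{t-1} + eps_t
x_lag, x_cur = x[:-1], x[1:]
design = np.column_stack([np.ones_like(x_lag), x_lag])
(c, a_hat), *_ = np.linalg.lstsq(design, x_cur, rcond=None)
mu_hat = c / (1.0 - a_hat)                  # intercept c = mu (1 - a)
resid = x_cur - (c + a_hat * x_lag)
sigma2_eps_hat = resid.var(ddof=2)          # white-noise variance (2 fitted parameters)
print(f"a = {a_hat:.2f}, mu = {mu_hat:.2f}, sigma_eps^2 = {sigma2_eps_hat:.2f}")
```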
Table 1 Uncertain parameters and associated probabilistic model for the C-M application

Uncertain parameter (random variable) | Nature | Reference value | Variation interval | Probability distribution (first-level uncertainty) | Second-level uncertainty
Height of release of I1 (m) | real scalar | 15 | [7.5; 22.5] | Uniform | no second-level uncertainty
Height of release of I2 (m) | real scalar | 45 | [22.5; 67.5] | Uniform | no second-level uncertainty
Deposition velocity of 137Cs (m.s-1) | real scalar | 5×10^-3 | [5×10^-4; 5×10^-2] | Log-uniform | no second-level uncertainty
Source term activity of 137Cs (Bq) | real scalar | 10^9 | [10^8; 10^10] | Log-uniform | no second-level uncertainty
Wind direction WD (degrees) | real vector | 291 | [249; 333] | Truncated AR(1) process | AR parameters following Weibull distributions
Wind speed WS (m.s-1) | real vector | 5.3 | [0; 12.5] | Truncated AR(1) process | AR parameters following Weibull and log-normal distributions
Table 2 Mean, standard deviation and probabilistic distribution of the AR(1) parameters of the WS and WD series

Series | a (mean) | a (std) | a (model) | μ (mean) | μ (std) | μ (model) | σ²ε (mean) | σ²ε (std) | σ²ε (model)
Wind direction (WD) | 0.66 | 0.21 | Weibull | 283.71 | 14.93 | Weibull | 59.36 | 46.78 | Weibull
Wind speed (WS) | 0.80 | 0.16 | Log-normal | 5.51 | 2.43 | Weibull | 0.53 | 0.36 | Weibull
The process to simulate a wind speed (or wind direction) trajectory is the following: considering the initial value to be known (measured by a sensor at the beginning of the release), a simulation of the AR(1) model with 10-minute intervals is made from this initial value. For this simulation, the mean values of the AR(1) parameters, given in Table 2, are used. The WS (or WD) value obtained for each interval is computed by multiplying the value of the previous interval by the AR(1) coefficient μa and adding a white noise sampled with variance μσ²ε. The initial values at t = 0 min are assumed to be known for all the simulations and equal to 291 degrees and 5 m s-1 for WD and WS, respectively. This trajectory is then averaged to obtain 20-minute intervals, as required for each run of the C-M code. Finally, to generate a 2-hour WS (resp. WD) trajectory, 12 white noise values ε1,WS, …, ε12,WS (resp. ε1,WD, …, ε12,WD) are sampled and then, after averaging the trajectories over 20-minute intervals, six values WS1, …, WS6 (resp. WD1, …, WD6) are obtained in order to describe the weather conditions during the 2-hour atmospheric diffusion process. Examples of random simulations of WD and WS in their piecewise linear form with 20-minute intervals are given in Fig. 3. Note that truncated AR(1) processes, based on truncated Gaussian simulations, are used in order to ensure that WS (resp. WD) keeps its values within its definition interval. Consequently, this test case has the particularity of involving variables of different kinds: scalar parameters like the deposition velocity and functional parameters like the wind speed or direction.
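A minimal sketch of the trajectory-generation procedure described above is given below; it is our own illustration, not the C-M pre-processing code. It simulates a truncated AR(1) process on 10-minute steps (truncation enforced here by rejection sampling, one possible implementation of the truncated Gaussian simulation mentioned in the text) and then averages pairs of consecutive 10-minute values, one plausible reading of the 20-minute averaging. The parameter values are the Table 2 means and the initial values are those given above.

```python
import numpy as np

def simulate_ar1_trajectory(x0, a, mu, sigma2_eps, bounds, n_steps, rng):
    """Simulate (X_t - mu) = a (X_{t-1} - mu) + eps_t, truncated to stay inside `bounds`."""
    lo, hi = bounds
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(1, n_steps + 1):
        while True:  # rejection sampling enforces the truncation
            eps = rng.normal(0.0, np.sqrt(sigma2_eps))
            candidate = mu + a * (x[t - 1] - mu) + eps
            if lo <= candidate <= hi:
                x[t] = candidate
                break
    return x

def average_to_20min(x_10min):
    """Average consecutive pairs of 10-minute values into 20-minute values."""
    steps = x_10min[1:]                      # drop the known initial value
    return steps.reshape(-1, 2).mean(axis=1)

rng = np.random.default_rng(0)
# Wind speed: 12 white noises -> 12 ten-minute steps -> 6 twenty-minute values WS1..WS6
ws_10min = simulate_ar1_trajectory(x0=5.0, a=0.80, mu=5.51, sigma2_eps=0.53,
                                   bounds=(0.0, 12.5), n_steps=12, rng=rng)
ws_20min = average_to_20min(ws_10min)
# Wind direction: WD1..WD6
wd_10min = simulate_ar1_trajectory(x0=291.0, a=0.66, mu=283.71, sigma2_eps=59.36,
                                   bounds=(249.0, 333.0), n_steps=12, rng=rng)
wd_20min = average_to_20min(wd_10min)
print(ws_20min, wd_20min)
```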
5 Step 2: Experimental design and code simulations

To investigate the domain of variation of the uncertain parameters as efficiently as possible, we propose to use a Latin Hypercube Sampling (LHS) design (McKay et al. 1979). To optimize the space-filling properties of the design, we used a maximin LHS, introduced by Johnson et al. (1990), which maximizes the minimal distance between two points of the design. Note that, as the meteorological variables are simulated via AR processes based on several white noise simulations, the LHS technique is applied to the sample of white noises (to simulate a 2-hour random trajectory of WS and WD, a centered Gaussian realization is simulated for each 10-minute interval, which yields a total of 24 normal samplings). Finally, for each simulation a total of 28 independent random variables are sampled: the heights of release I1 and I2, the deposition velocity, the source term activity and the 24 independent white noises ε1,WS, …, ε12,WS and ε1,WD, …, ε12,WD. From these 28 variables, 16 variables are deduced: the heights of release I1 and I2, the deposition velocity, the source term activity and the 12 values of wind speed and direction WS1, …, WS6, WD1, …, WD6 (as described in the previous paragraph). These 16 variables are the input parameters of each C-M simulation. We chose to use an LHS of n = 150 simulations; this number of simulations is a compromise between the time required for each simulation (30 min, giving a total of about 3 days for the 150 runs) and the number of input parameters.
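The maximin LHS construction can be sketched as follows. This is our own illustration, assuming a simple candidate-search strategy rather than the exact optimization of Johnson et al. (1990): a number of random Latin Hypercube designs are drawn and the one maximizing the minimal pairwise distance between points is kept.

```python
import numpy as np
from scipy.spatial.distance import pdist

def lhs(n, d, rng):
    """Random Latin Hypercube with n points in [0, 1]^d (one stratum per column)."""
    perms = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T   # n x d permutations
    return (perms + rng.random((n, d))) / n

def maximin_lhs(n, d, n_candidates=500, seed=None):
    """Keep the candidate LHS with the largest minimal inter-point distance."""
    rng = np.random.default_rng(seed)
    best, best_crit = None, -np.inf
    for _ in range(n_candidates):
        cand = lhs(n, d, rng)
        crit = pdist(cand).min()
        if crit > best_crit:
            best, best_crit = cand, crit
    return best

# 150 simulations, 28 independent sampled variables (4 scalars + 24 white noises)
design = maximin_lhs(n=150, d=28, seed=1)
print(design.shape)
```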
Fig. 3 Examples of random simulations of wind direction and speed, in their piecewise linear form with 20-minute intervals, as required for each run of the C-M code
Fig. 4 Examples of 137Cs integrated surface activity maps from the learning sample (log-scale Bq)
Fig. 5 Mean (left) and variance (right) of the 137Cs integrated surface activity maps of the learning sample (log-scale Bq)
For each simulated set of input variables, C-M computes the atmospheric dispersion of 137Cs and predicts, every 20 min, the evolution of the instantaneous and time-integrated activity concentrations (volume and surface concentrations). Figure 4 gives a few examples of the integrated surface activity maps obtained after one hour of dispersion. Note that the limits of the considered spatial domain are fixed, extending over nearly 4 km in both directions from the source term location. We can clearly observe the two source terms corresponding to the two nuclear installations.
In what follows, we focus our study on the 137Cs integrated surface activity, but all our methodology can easily be applied to the other C-M outputs. The mean and variance maps of the 137Cs integrated surface activity of the learning sample at time t = 1 h are given in Fig. 5. As the release scenario is studied during 2 h and C-M computes the atmospheric dispersion every 20 min, each simulation provides six maps of integrated surface activity (corresponding to t = 20 min, 40 min, …, t = 2 h). As the temporal discretization is coarse, the uncertainty treatment process (metamodeling and sensitivity analysis) is applied to the maps of each moment independently. Steps 3, 4 and 5 and the corresponding equations are thus described for maps corresponding to a fixed moment and are then applied to the six moments separately. However, the whole process can be extended to spatio-temporal outputs, considering 3D outputs instead of 2D maps.
6 Step 3: Surrogate modeling for spatial output

The input variables are uncertain and modeled by the random vector X = (X1, …, Xd), belonging to a bounded domain of R^d and of known distribution. For a given value x* of the vector X = (X1, …, Xd), the code output is a deterministic function y(x*, z), where z denotes a vector of two-dimensional spatial coordinates. Thus, the target outputs are two-dimensional maps. Our objective is to perform a sensitivity analysis with respect to the uncertain input variables X. The variables z are deterministic. They vary on a grid of size nz which corresponds to a discretization of the two-dimensional space domain (as described in the ''Ceres-Mithra outputs'' section). At each simulation, the C-M model yields a vectorial output corresponding to the nz values y(x*, z) for z describing the grid. The n outputs of the learning sample are arranged in an n × nz matrix A. As a metamodel cannot be built for each point of the space discretization, we propose an innovative strategy based upon a proper orthogonal decomposition (POD) of the spatial output and the metamodeling of the main POD coefficients by a Gaussian process (Gp). Therefore, for a given input design Xs = (x^(i))_{i=1..n} and the n corresponding simulations of the map (y(x^(i), zj), j = 1, …, nz), i = 1, …, n, the three main steps of the method are:

(1) proper orthogonal decomposition of the n maps;
(2) selection of the k main POD components, i.e. the coefficients with the largest variance;
(3) modeling of the coefficients corresponding to the k main POD components with respect to the input variables using a Gp.

6.1 Proper orthogonal decomposition

The proper orthogonal decomposition (Chatterjee 2000), also known as principal component analysis, Karhunen–Loève decomposition or singular value decomposition, is a powerful method of data analysis aimed at obtaining low-dimensional approximate descriptions of high-dimensional processes. Data analysis using the POD is often conducted to extract mode shapes or basis functions. An orthogonal decomposition truncated to k elements applied to the C-M output is:

$$Y_k(X, z) = \bar{Y}(z) + \sum_{h=1}^{k} a_h(X)\,\varphi_h(z) \qquad (2)$$

where $\bar{Y}(z) = E[Y(X, z)]$ and the functions $(\varphi_h)_h$ form an orthonormal basis: $\int \varphi_{h_1}(z)\,\varphi_{h_2}(z)\,dz = \delta_{h_1 h_2}$, with $\delta_{h_1 h_2} = 1$ if $h_1 = h_2$ and 0 otherwise, the integrals being taken over the spatial domain. The coefficients $a_h$ are functions of the random variables X and are defined by $a_h(X) = \int Y(X, z)\,\varphi_h(z)\,dz$.

The orthonormal functions $\varphi_h$ are chosen in such a way that the approximation for each k is as good as possible in a least squares sense. That is, we would try to find, once and for all, a sequence of orthonormal functions such that the first two of these functions give the best possible two-term approximation, the first seven give the best possible seven-term approximation, and so on. These special, ordered, orthonormal functions are called the proper orthogonal modes and the decomposition in Eq. (2) is called the POD of Y(X, z) over z. The discrete version of the POD is the singular value decomposition (SVD) of the matrix A, described in Chatterjee (2000). The method of snapshots is used to obtain the k main eigenvectors (the discrete versions of the functions $\varphi_h$, each of size nz). For each eigenvector, a learning sample of n realizations of the coefficient $a_h$ is available. The value of this coefficient depends on X and is then modeled by a Gaussian process metamodel.
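The discrete POD step can be illustrated by the following sketch (our own, with synthetic data standing in for the n × nz matrix A): the centered snapshot matrix is decomposed by SVD, the smallest number k of components explaining 95 % of the variance is selected, and the corresponding coefficients and rank-k reconstruction are computed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, nz = 150, 4096                       # learning-sample size and number of grid points
A = rng.normal(size=(n, nz))            # placeholder for the n simulated maps

Y_mean = A.mean(axis=0)                 # spatial mean map \bar{Y}(z)
U, S, Vt = np.linalg.svd(A - Y_mean, full_matrices=False)

explained = (S**2) / np.sum(S**2)
k = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)   # smallest k reaching 95 %

phi = Vt[:k]                            # k proper orthogonal modes, each of size nz
a = (A - Y_mean) @ phi.T                # n x k matrix of POD coefficients a_h(x^(i))

# Rank-k reconstruction of the maps: \bar{Y}(z) + sum_h a_h(x) phi_h(z)
A_k = Y_mean + a @ phi
print(k, np.linalg.norm(A - A_k) / np.linalg.norm(A))
```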
6.2 Gaussian process-based metamodel

Among all the metamodel-based solutions (polynomials, splines, neural networks, etc.), we focus our attention on the Gaussian process (Gp) model. It can be viewed as an extension of the kriging method, which is used for interpolating data in space (Chilès and Delfiner 1999), to computer code data (Sacks et al. 1989; Oakley and O'Hagan 2002). Many authors (e.g., Welch et al. 1992; Marrel et al. 2008) have shown how the Gp model can be used as an efficient emulator of code responses, even in high dimensional cases. We used the Gp model as described in Marrel et al. (2008): the deterministic part is a linear regression model with a selection process and a Matérn correlation function (Rasmussen and Williams 2006) is used for the stochastic part. For each coefficient ah(x) of the POD decomposition, the Gp metamodel is estimated on the learning sample following the methodology proposed by Marrel et al. (2008), and the obtained approximation of ah is denoted âh(x).
Thus, an approximation of Y(X, z) is obtained:

$$\hat{Y}_k(X, z) = \bar{Y}(z) + \sum_{h=1}^{k} \hat{a}_h(X)\,\varphi_h(z)$$

For any new input point x*, $\hat{Y}_k(x^*, z)$ is used as a predictor and yields the predicted map $(\hat{y}(x^*, z_j),\ j = 1, \ldots, n_z)$. Of course, this method of map prediction has the advantage, compared to a simulation of the code, of being much less time consuming (less than one second to predict a map with the metamodel, compared to the 30 min required for each C-M simulation). Finally, the global metamodel resulting from the combination of a POD decomposition and the Gp metamodeling of the coefficients provides an efficient functional metamodel and, more precisely, in our case, a spatial metamodel. To evaluate the accuracy of this global metamodel, we use the predictivity coefficient Q2, which gives the percentage of the mean explained variance of the output map:

$$Q_2(z) = 1 - \frac{\sum_{j=1}^{n_{\mathrm{test}}} \left[ \hat{Y}_k\!\left(x^{(j)}, z\right) - Y\!\left(x^{(j)}, z\right) \right]^2}{\sum_{j=1}^{n_{\mathrm{test}}} \left[ Y\!\left(x^{(j)}, z\right) - \frac{1}{n}\sum_{i=1}^{n} Y\!\left(x^{(i)}, z\right) \right]^2}$$

where $\{x^{(j)}\}_{j=1,\ldots,n_{\mathrm{test}}}$ is a test sample. Q2 corresponds to the coefficient of determination R2 computed in prediction. It can be computed on a test sample independent from the learning sample or by cross-validation on the learning sample. The coefficient Q2 can be estimated for each point of the spatial grid, which yields a map of Q2, and it can also be averaged over the space.
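As an illustration of the metamodeling step, the sketch below (our own; it uses scikit-learn, which the authors do not mention, and synthetic data) fits a Gaussian process with an anisotropic Matérn covariance to one POD coefficient and estimates its predictivity by a leave-one-out Q2. In the application, the Q2 is computed for each point of the reconstructed spatial maps rather than for the coefficients themselves.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(2)
n, d = 150, 16                                    # learning-sample size, number of inputs
X = rng.random((n, d))                            # placeholder for the 16 C-M inputs
a_h = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=n)  # placeholder coefficient

kernel = ConstantKernel() * Matern(length_scale=np.ones(d), nu=2.5)     # anisotropic Matern 5/2
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, n_restarts_optimizer=1)

a_h_loo = cross_val_predict(gp, X, a_h, cv=LeaveOneOut())               # leave-one-out predictions
q2 = 1.0 - np.sum((a_h - a_h_loo) ** 2) / np.sum((a_h - a_h.mean()) ** 2)
print(f"Leave-one-out Q2 for this POD coefficient: {q2:.2f}")
```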
6.3 Application to C-M data

For each moment, the discrete version of the POD is applied to the matrix A containing the n discretized maps and a selection of the main components is performed. For example, for the maps corresponding to the time t = 1 h, the k = 10 main eigenvectors, which explain more than 95 % of the variance, are kept and their coefficients are modeled by Gp metamodels. Recall that 16 variables (the heights of release I1 and I2, the deposition velocity, the source term activity and the 12 values of wind speed and direction WS1, …, WS6, WD1, …, WD6) are the input parameters of each C-M simulation (as explained in the ''Step 2: Experimental design and code simulations'' section). The metamodels are built taking these variables as inputs. However, only the values of wind speed and direction in the time period prior to the considered moment are kept. For example, for the maps corresponding to the time t = 1 h, only WS and WD at time t = 0 min (WS1 and WD1), t = 20 min (WS2 and WD2) and t = 40 min (WS3 and WD3) are considered as explicative variables in the Gp metamodels. Moreover, to build the metamodels on independent inputs, we consider as explicative variables the increments of WS (resp. WD) instead of their absolute values: WS1, WS2-WS1, …, WS6-WS5 (resp. WD1, WD2-WD1, …, WD6-WD5). As the number of C-M simulations is limited, the Q2 of the spatial metamodels is estimated by cross-validation (the leave-one-out technique described in Hastie and Tibshirani 1990). The maps of Q2 obtained for each moment are given in Fig. 6. Note that Q2 is set equal to one in the areas where the activity variance and mean are null (cf. Fig. 5). The spatial metamodel based on the POD and the Gp modeling of the first ten POD coefficients yields a good predictivity at each moment: the Q2 averaged over space is greater than 0.8 (i.e. more than 80 % of the variance is explained). Note that only the points where the activity variance is not null are taken into account to compute the spatial average of Q2. The central part of the plume is better predicted, except for the central limit between the two releases. The limits of the plume are less well predicted. Note that the area where the Q2 is the lowest corresponds to the area where the variability is the lowest, which is not too damaging. However, it could be interesting in future work to improve the quality of the spatial metamodels in this area, either by considering more components in the POD or by further improving the Gp estimation.
7 Step 4: Global sensitivity analysis and uncertainty propagation

7.1 Definition of Sobol' indices

To identify the most influential uncertain inputs on the predicted surface activity maps, we perform a global sensitivity analysis (GSA) computing variance-based measures. These measures can handle nonlinear and non-monotonic relationships between inputs and outputs (Saltelli et al. 2000). They are based upon the functional analysis of variance (ANOVA) decomposition of any square integrable function (Efron and Stein 1981) and determine how the variance of the output is shared between each variable Xi and the interactions between variables (Sobol 1993):

$$S_i = \frac{\mathrm{Var}\left[E(Y \mid X_i)\right]}{\mathrm{Var}(Y)}, \qquad S_{ij} = \frac{\mathrm{Var}\left[E(Y \mid X_i, X_j)\right]}{\mathrm{Var}(Y)} - S_i - S_j, \qquad S_{ijk} = \ldots$$

Si, the first-order Sobol' index, also called the primary effect of Xi, measures the part of the response variance explained by Xi alone. Similarly, Sij, defined for i ≠ j, measures the part of the response variance due to the interaction effect between Xi and Xj. In an equivalent way, higher order indices can be defined.
Fig. 6 Temporal evolution of the Q2 map estimated, by cross-validation, for the predicted 137Cs integrated surface activity
Sobol' indices all lie in the interval [0; 1] and their sum is one in the case of independent input variables. The larger the index value, the greater the importance of the variable related to this index. To express the overall output sensitivity to an input Xi, Homma and Saltelli (1996) introduced the total sensitivity index. STi, also called total effect, is defined as the sum of all the sensitivity indices involving Xi:

$$S_{T_i} = \sum_{k \# i} S_k$$

where k#i denotes all the terms that include the index i. As some explicative input variables in the C-M application are dependent, like WS1, …, WS6 (resp. WD1, …, WD6), and refer to the same uncertain physical input WS (resp. WD), we compute Sobol' indices by group (estimation of Sobol' indices for subsets of variables, Sobol 2001). Consequently, for each variance decomposition, the Sobol' indices related to 6 independent variables are computed (heights of release I1 and I2, deposition velocity, source term activity, wind speed and wind direction).

7.2 Application to Ceres-Mithra code

A map of the Sobol' index of an input Xi shows the local and global influences of this input on the output. It can help to better understand the computer code results and can be used to reduce the uncertainties in the responses more efficiently. Thus, to reduce the output variability at a given point of the map, we analyze all the Sobol' maps and determine the most influential inputs.
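A minimal sketch of the Monte Carlo estimation of first-order and total Sobol' indices in the spirit of Saltelli et al. (2010) is given below. The emulator is a toy scalar function standing in for the POD/Gp spatial metamodel at one grid point, and the inputs are assumed independent and uniform on [0, 1]; in the application, the indices are instead computed by groups of dependent inputs.

```python
import numpy as np

def emulator(X):
    """Toy scalar stand-in for the metamodel output at one grid point."""
    return X[:, 0] + 0.5 * X[:, 1] ** 2 + X[:, 0] * X[:, 2]

def sobol_pick_freeze(model, d, n=10_000, seed=None):
    """First-order (Saltelli et al. 2010) and total (Jansen) Sobol' index estimators."""
    rng = np.random.default_rng(seed)
    A, B = rng.random((n, d)), rng.random((n, d))       # two independent input samples
    yA, yB = model(A), model(B)
    var = np.var(np.concatenate([yA, yB]))
    S1, ST = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                             # column i taken from B
        yABi = model(ABi)
        S1[i] = np.mean(yB * (yABi - yA)) / var         # first-order index
        ST[i] = 0.5 * np.mean((yA - yABi) ** 2) / var   # total effect
    return S1, ST

S1, ST = sobol_pick_freeze(emulator, d=3, seed=5)
print(np.round(S1, 2), np.round(ST, 2))
```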
Then, we can try to reduce the uncertainty of these inputs by accounting for additional measures. In addition, the global influence of each input over the whole space can be investigated to identify areas of influence and non-influence for this input. The functional metamodel obtained in Sect. 6 can now be used to estimate first-order and total Sobol' indices. We used the Monte Carlo-based formulas of Saltelli et al. (2010) and made 10,000 simulations with the metamodel. As a final result, we obtain, for each uncertain input and each of the six moments, maps of first-order Sobol' indices and maps of total Sobol' indices. For the sake of brevity, only the variables with significant Sobol' indices are presented, in Figs. 7 to 9, and only three moments are shown. The global analysis of the Sobol' indices shows that the wind direction is the most influential variable on the surface activity forecast: it explains more than half of the activity variance. The source term activity is the second most influential variable, with on average about 15 % of explained variance. The uncertainty on the wind speed is less influential, while the deposition velocity and the heights of release do not affect the forecast. A local analysis of the Sobol' indices reveals that the source term activity influence is predominant in the central part of the plume, while the wind direction is less influential in this area. The situation is reversed at the limit of the plume: logically, the wind conditions have a strong impact on the plume trajectory. The sum of the primary effects (i.e. without considering any interactions between variables) explains between 80 and 100 % of the variance.
Fig. 7 Temporal evolution of 1st order Sobol’ indices for the predicted 137Cs integrated surface activity
Thus, the interactions account for less than 10 % of the variance and the only significant interaction is the one between the meteorological variables (WS and WD), located at the limit of the plume, as illustrated by Fig. 9. This interaction increases slightly over time while the wind direction influence decreases. This study could be completed by an uncertainty propagation in order to evaluate confidence intervals for the C-M forecasts. As the weather conditions, which are in practice often considered as constant, appear as the most influential parameters, the spatial metamodels can be used to compare two scenarios: one with constant weather conditions and the other with variable weather conditions. To do this, a Monte Carlo technique is applied (using the metamodels) and provides probabilistic distributions for the predicted 137Cs integrated surface activity at each spatial point, for the two scenarios. An illustration is given in Fig. 10 and shows the wide impact of the weather conditions variability. The GSA reveals the predominant influence of the meteorological variables, wind speed and direction, on the C-M forecasts. Their uncertainty can strongly impact the C-M predictions and their interpretation. WS and WD are stochastic variables: their uncertainty can hardly be reduced in practice and should therefore be propagated for each study of a release scenario. However, as illustrated in the ''Uncertain parameter characterization'' section, the characterization of their uncertainty is quite difficult. A modeling with AR(1) processes has been chosen (this choice could be discussed) but a residual uncertainty remains on the AR(1) parameters. Probabilistic distributions have been fitted to model the AR(1)-parameter uncertainty, which we can call uncertainty of the 2nd kind, as opposed to the uncertainty on the input variables, called uncertainty of the 1st kind. The GSA provides the impact of the 1st kind uncertainties. It could be interesting to evaluate the impact of the 2nd kind uncertainties on the GSA results, which is usually known as «second-level sensitivity analysis» (De Rocquigny 2012).
8 Step 5: Impact of uncertainties during the quantification step on the global sensitivity analysis

To evaluate the impact of the AR(1) parameter uncertainties on the GSA results, one thousand samples of AR(1) parameters are drawn following the probabilistic distributions given in Table 2. For each sample, ten thousand WS and WD simulations are generated, yielding the estimation of the Sobol' maps (still with the Gp-based spatial metamodel and the Sobol' estimation with Saltelli's formulas). To visualize the variability of the Sobol' maps due to the AR(1) parameter variation, we propose to use the functional bagplot. The bagplot was introduced by Rousseeuw et al. (1999), who adapted the common boxplot definition to bivariate data, using the halfspace location depth defined by Tukey (1975).
Fig. 8 Temporal evolution of total Sobol’ indices for the predicted 137Cs integrated surface activity
Fig. 9 Temporal evolution of 2nd Sobol’ indices corresponding to the interaction WS and WD for the predicted 137 Cs integrated surface activity
For a point x and a set Z of points, this depth is defined as the smallest number of points of Z contained in a closed halfspace whose boundary contains x. The bagplot is composed of four main elements: the median, the bag, the fence and the outliers. The bagplot median is defined as the point maximizing the halfspace location depth. The bag is the smallest depth region containing 50 % of the total number of observations. The fence is the convex hull of the points contained in the region obtained by inflating the bag by a factor of 2.58. The points outside the fence are considered as outliers. Hyndman and Shang (2010) have extended the notion of bagplot to functional data: in their method, the functions are first decomposed by principal component analysis and a bagplot is built for the first two components. This functional bagplot is applied to the sample of 1st and 2nd order Sobol' maps at time t = 2 h, when the AR(1) parameters vary. The corresponding lower and upper limits of the bag are given in Fig. 11. Only the results for the influential inputs are presented. The uncertainty on the AR(1) parameters has a significant impact on the sensitivity results. Even though the influential and non-influential inputs remain unchanged, as well as their order of influence, the AR(1) parameter uncertainty modifies the local influence of the wind direction at the limit of the plume, by transferring part of its influence to the interaction between wind direction and wind speed.
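The functional-bagplot idea can be sketched as follows (our own illustration, not the implementation of Hyndman and Shang 2010): a sample of maps is summarized by its first two principal component scores, an approximate Tukey halfspace depth is computed from random projection directions, and the bagplot median and bag are derived from it. The data below are synthetic placeholders for the sample of Sobol' maps.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(3)
n_maps, nz = 200, 4096
sobol_maps = rng.normal(size=(n_maps, nz))       # placeholder for a sample of Sobol' maps

# First two principal component scores of the centered maps
centered = sobol_maps - sobol_maps.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ Vt[:2].T                     # n_maps x 2 bivariate summary

def approx_halfspace_depth(points, n_dir=500, rng=rng):
    """Approximate Tukey halfspace depth via random projection directions."""
    n = len(points)
    depth = np.full(n, n)
    directions = rng.normal(size=(n_dir, 2))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    for u in directions:
        proj = points @ u
        rank = proj.argsort().argsort()          # 0-based rank along this direction
        depth = np.minimum(depth, np.minimum(rank + 1, n - rank))
    return depth

depth = approx_halfspace_depth(scores)
median_score = scores[depth.argmax()]            # bagplot median: deepest point
bag = scores[depth >= np.quantile(depth, 0.5)]   # roughly the 50 % deepest points
hull = ConvexHull(bag)                           # boundary of the bag in the PC plane
print(median_score, hull.volume)                 # 'volume' is the polygon area in 2-D
```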
Fig. 10 Probabilistic distribution of 137Cs integrated surface activity for site A with two scenarios: constant and variable weather conditions
Fig. 11 Lower and upper envelopes of the bag in the bagplot approach applied to the 1st and 2nd order Sobol' indices maps for the predicted 137Cs integrated surface activity, when the AR(1) parameters vary
The AR(1) parameter values used to model the uncertainty on the weather conditions therefore substantially modify the results of the GSA, both quantitatively and locally. It thus seems important to reduce the uncertainty on the AR(1) parameters. Two options can be considered: either the AR(1) model is kept and the uncertainty on the AR coefficients is reduced by making better use of the meteorological database, or another, more adapted probabilistic model is fitted on the database (by increasing the order of the AR process or by choosing a more sophisticated stochastic process).
9 Conclusion

In this paper, a new methodology was introduced to compute spatial maps of variance-based sensitivity indices (such as the Sobol' indices) for numerical models giving spatial maps as outputs and taking uncertain functional variables as inputs. Such situations often occur in environmental modeling problems.
Our global methodology combines several advanced statistical techniques, from the quantification of the input uncertainties (with the management of functional inputs) to the sensitivity analysis of spatial outputs, through the development of a functional metamodel. One critical issue in our application is the reduced number of model output maps available, because of the high CPU time cost of the numerical model. A proper orthogonal decomposition combined with a metamodeling technique (based upon the Gp model) is proposed and used to solve this problem. Choosing the proper orthogonal decomposition reduces the number of components to keep and, consequently, the number of Gp metamodels to build. In addition, the Gp model is appropriate for handling the large differences between the output maps obtained for various inputs, which induce strong non-linear variations in the POD coefficients to be modeled. The resulting functional metamodel is a fast emulator (i.e. with negligible CPU time) of the computer code.
It can be used for uncertainty propagation issues, optimization problems and, as advocated in this paper, for sensitivity index estimation. The sensitivity maps thus obtained allow us to spatially identify the most influential inputs, to detect zones with input interactions and to determine the zone of influence of each input. The approach that we propose in this paper thus provides global and local information on the influence of each input. The wind direction has been identified as the most influential variable on the surface activity forecast and explains more than half of the activity variance. The source term activity and the wind speed are the two other influential variables, while the deposition velocity and the heights of release do not affect the code prediction. A local analysis of the Sobol' indices revealed that the source term activity influence is predominant in the central part of the plume, while the wind direction is more influential at the limit of the plume. Even if it increases slightly over time, the only significant interaction, which accounts for less than 10 % of the variance, is the one between the meteorological variables and is located at the limit of the plume. The ranking of the input variables according to their direct influence on the mapped model output supports a robust expert opinion. In our global methodology, we also proposed to evaluate the impact of the additional uncertainty related to the probabilistic characterization of the meteorological variables. To perform this second-level sensitivity analysis, a Monte Carlo approach combined with uncertainty visualization tools is proposed. It revealed the significant impact of the probabilistic models used for the weather conditions: they substantially modify the results of the sensitivity analysis, quantitatively and locally. Several options have been proposed to reduce this impact. Finally, such an efficient methodology supports the global decision process, from interpretation and prediction modeling to the deployment of targeted and pragmatic measurement and characterization strategies. The economic and time benefits are obvious. Our methodology can be extended to any computer code with functional inputs and outputs: codes with outputs depending on time, codes depending on other physical processes (such as a function of temperature), codes with outputs varying in space and time.

Acknowledgments This work was supported by the French National Research Agency (ANR) through the COSINUS program (project COSTA BRAVA no ANR-09-COSI-015).
References

Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716–723
Bayarri M, Berger J, Cafeo J, Garcia-Donato G, Liu F, Palomo J, Parthasarathy R, Paulo R, Sacks J, Walsh D (2007) Computer model validation with functional output. Ann Stat 35:1874–1906
Campbell K, McKay M, Williams B (2006) Sensitivity analysis when model outputs are functions. Reliab Eng Syst Saf 91:1468–1472
Chatterjee A (2000) An introduction to the proper orthogonal decomposition. Curr Sci 78:808–817
Chilès J-P, Delfiner P (1999) Geostatistics: modeling spatial uncertainty. Wiley, New York
Ciriello V, Di Federico V, Riva M, Cadini F, De Sanctis J, Zio E, Guadagnini A (2012) Polynomial chaos expansion for global sensitivity analysis applied to a model of radionuclide migration in a randomly heterogeneous aquifer. Stoch Environ Res Risk Assess 27:945–954
De Rocquigny E (2012) Modelling under risk and uncertainty: an introduction to statistical, phenomenological and computational methods. Wiley, New York
De Rocquigny E, Devictor N, Tarantola S (2008) Uncertainty in industrial practice. Wiley, New York
Doury A (1980) Pratiques françaises en matière de dispersion quantitative de la pollution atmosphérique potentielle liée aux activités nucléaires. Séminaire sur les rejets radioactifs et leur dispersion dans l'atmosphère à la suite d'un accident hypothétique de réacteur, RISQ, Denmark
Efron B, Stein C (1981) The jackknife estimate of variance. Ann Stat 9:586–596
Fang K-T, Li R, Sudjianto A (2006) Design and modeling for computer experiments. Chapman & Hall/CRC
Fasso A, Esposito A, Porcu E, Reverberi AP, Veglio F (2003) Statistical sensitivity analysis of packed column reactors for contaminated wastewater. Environmetrics 14:743–759
Hastie T, Tibshirani R (1990) Generalized additive models. Chapman & Hall/CRC, New York
Helton J (1993) Uncertainty and sensitivity analysis techniques for use in performance assessment for radioactive waste disposal. Reliab Eng Syst Saf 42:327–367
Higdon D, Gattiker J, Williams B, Rightley M (2008) Computer model calibration using high-dimensional output. J Am Stat Assoc 103:571–583
Homma T, Saltelli A (1996) Importance measures in global sensitivity analysis of non linear models. Reliab Eng Syst Saf 52:1–17
Hyndman R, Shang H-L (2010) Rainbow plots, bagplots and boxplots for functional data. J Comput Graph Stat 19(1):29–45
Johnson ME, Moore LM, Ylvisaker D (1990) Minimax and maximin distance designs. J Stat Plan Inference 26:131–148
Lamboni M, Monod H, Makowski D (2011) Multivariate sensitivity analysis to measure global contribution of input factors in dynamic models. Reliab Eng Syst Saf 96:450–459
Lilburne L, Tarantola S (2009) Sensitivity analysis of spatial models. Int J Geogr Inf Sci 23:151–168
Marrel A, Iooss B, Van Dorpe F, Volkova E (2008) An efficient methodology for modeling complex computer codes with Gaussian processes. Comput Stat Data Anal 52:4731–4744
Marrel A, Iooss B, Jullien M, Laurent B, Volkova E (2011) Global sensitivity analysis for models with spatially dependent outputs. Environmetrics 22:383–397
McKay MD, Conover WJ, Beckman RJ (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21:239–245
Monfort M, Patryl L, Armand P (2011) Presentation of the CERES platform used to evaluate the consequences of the emissions of pollutants in the environment. International conference on radioecology & environmental radioactivity: environment & nuclear renaissance
Nychka D, Cox L, Piegorsch W (1998) Case studies in environmental statistics. Springer, Berlin
Oakley J, O'Hagan A (2002) Bayesian inference for the uncertainty distribution. Biometrika 89:769–784
Rasmussen CE, Williams C (2006) Gaussian processes for machine learning. The MIT Press, Cambridge
Rousseeuw PJ, Ruts I, Tukey JW (1999) The bagplot: a bivariate boxplot. Am Stat 53:382–387
Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Stat Sci 4:409–435
Saltelli A, Chan K, Scott EM (eds) (2000) Sensitivity analysis. Wiley series in probability and statistics. Wiley, New York
Saltelli A, Annoni P, Azzini I, Campolongo F, Ratto M, Tarantola S (2010) Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput Phys Commun 181:259–270
Shi J, Wang B, Murray-Smith R, Titterington D (2007) Gaussian process functional regression modeling for batch data. Biometrics 63:714–723
Sobol IM (1993) Sensitivity estimates for non linear mathematical models. Math Model Comput Exp 1:407–414
Sobol IM (2001) Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math Comput Simul 55:271–280
Tukey JW (1975) Mathematics and the picturing of data. In: Proceedings of the international congress of mathematicians, vol 2, pp 523–531
Volkova E, Iooss B, Van Dorpe F (2008) Global sensitivity analysis for a numerical model of radionuclide migration from the RRC ''Kurchatov Institute'' radwaste disposal site. Stoch Environ Res Risk Assess 22:17–31
Welch W, Buck R, Sacks J, Wynn H, Mitchell T, Morris M (1992) Screening, predicting, and computer experiments. Technometrics 34:15–25