STATISTICAL INFERENCE ON SENSITIVITY INDICES OF MATHEMATICAL MODELS: AN ILLUSTRATION TO A FLOOD RISK MODEL
MATIEYENDOU LAMBONI 1,2,*
1 Univ. of Guyane, Department of Science and Technology (DFRST), 97346 Cayenne, French Guiana, France
2 228-UMR Espace-Dev, Univ. of Guyane, Univ. of Réunion, Univ. of Montpellier, Univ. of Nouvelle-Calédonie, IRD
[email protected]
[email protected]
A deep exploration of mathematical models in reliability analysis is often carried out with model-free methods. Variance-based sensitivity analysis ([18,17,3]) and multivariate sensitivity analysis ([10,9,2]) are two such methods; they aim at apportioning the variability of the model output(s) into input factors and their interactions. Sobol's first-order index of a single factor or of a group of factors, which accounts for the effect of the input factor(s), serves as a practical tool to assess the order of interactions among input factors. In this abstract, we propose an optimal estimator of the (non-normalized) first-order index, including its rate of convergence. The optimal estimator of the non-normalized index makes use of a kernel of degree (p, q). We also provide the statistical properties of the estimator of the first-order index, including its asymptotic confidence bounds. An illustration to a flood risk model shows that our estimator improves the estimation of the first-order indices.
Keywords: First-order index; Mathematical models; Optimal estimator; Uncertainty; U-Statistics.
1. Methodology

Let Y = f(X) be a model that includes d independent input factors X = (X_1, ..., X_d) (assumption A1). Under the assumption E[f^2(X)] < +∞ (A2), we have the Hoeffding decomposition:

f(X) = \sum_{u \subseteq \{1,2,\ldots,d\}} f_u(X_u),    (1)

where f_∅ = E[f(X)], f_j(X_j) = E[f(X) | X_j] − f_∅, and E[f_u(X_u)] = 0.
* Corresponding Author
In the following text, let X_u = {X_j, j ∈ u} be a set of input factors and X_{∼u} denote the vector containing all input factors except X_u. We have the partition X = (X_u, X_{∼u}). The non-normalized first-order index of a set of inputs X_u is defined as follows:

D_u = V\left[ E\left( f(X) \mid X_u \right) \right].    (2)

If we use D for the variance of the model output, the first-order index of X_u ([18]) is given as follows:

S_u = \frac{D_u}{D}.    (3)
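To build intuition for (2) and (3), D_u and S_u can be approximated by a naive double-loop Monte Carlo scheme: sample X_u, estimate the conditional mean E(f(X) | X_u) by an inner loop over X_{∼u}, and take the empirical variance of these conditional means. A minimal sketch, on a hypothetical toy model of our own (not the flood model of Section 2) with two independent standard normal inputs, for which D_1 = 1, D = 6 and hence S_1 = 1/6 analytically:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x1, x2):
    # hypothetical toy model with independent standard normal inputs:
    # E(f | X1) = X1, so D_1 = 1; D = V[f] = 1 + 4 + 1 = 6; hence S_1 = 1/6
    return x1 + 2.0 * x2 + x1 * x2

# double-loop Monte Carlo: outer loop over X1, inner loop over X2
n_outer, n_inner = 2000, 2000
v = rng.standard_normal(n_outer)
cond_mean = np.array([f(x, rng.standard_normal(n_inner)).mean() for x in v])

D_u = cond_mean.var()            # Eq. (2): V[E(f(X) | X1)]
D = f(*rng.standard_normal((2, 200_000))).var()   # total output variance
S_u = D_u / D                    # Eq. (3)
print(S_u)                       # close to 1/6 ≈ 0.167
```

Such a brute-force scheme costs n_outer × n_inner model runs, which is precisely what more economical estimators, such as the kernel-based one studied here, aim to avoid.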
Definition 1 Let p ≥ 2, q ≥ 2 be integers; let X_u^{(j)}, j = 1, 2, ..., p be p vectors drawn from the probability measure µ(X_u), and X_{∼u}^{(j)}, j = 1, 2, ..., q be q vectors drawn from the probability measure µ(X_{∼u}). We consider the kernel of degree (p, q), K(·), defined as follows:

K\left( X_u^{(1)}, \ldots, X_u^{(p)}, X_{\sim u}^{(1)}, \ldots, X_{\sim u}^{(q)} \right) = \frac{2}{p^2 (p-1) q (q-1)} \sum_{k=1}^{p} \sum_{l=1}^{q-1} \sum_{i=l+1}^{q} \sum_{\substack{j_1=1 \\ j_1 \neq k}}^{p} \sum_{\substack{j_2=1 \\ j_2 \neq k}}^{p} \left[ f\left( X_u^{(k)}, X_{\sim u}^{(l)} \right) - f\left( X_u^{(j_1)}, X_{\sim u}^{(l)} \right) \right] \left[ f\left( X_u^{(k)}, X_{\sim u}^{(i)} \right) - f\left( X_u^{(j_2)}, X_{\sim u}^{(i)} \right) \right].    (4)

The kernel K(X_u^{(1)}, ..., X_u^{(p)}, X_{∼u}^{(1)}, ..., X_{∼u}^{(q)}) is symmetric under independent permutations of its first arguments (X_u^{(1)}, ..., X_u^{(p)}) and of its second arguments (X_{∼u}^{(1)}, ..., X_{∼u}^{(q)}). The following theorem gives the key property of the kernel K(·).

Theorem 1 Under assumptions A1 and A2, we have:

E\left[ K\left( X_u^{(1)}, \ldots, X_u^{(p)}, X_{\sim u}^{(1)}, \ldots, X_{\sim u}^{(q)} \right) \right] = D_u.    (5)

Proof The proof is straightforward knowing that ([13])

D_u = \mathrm{Cov}\left[ f(X_u, X_{\sim u}),\, f(X_u, X'_{\sim u}) \right],

with X'_{∼u} an independent copy of X_{∼u}, under assumption A1 (see [8] for more details). □
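To make Definition 1 and Theorem 1 concrete, the kernel (4) can be implemented and its unbiasedness checked empirically. The sketch below is our own illustration on the same hypothetical toy model f(X1, X2) = X1 + 2 X2 + X1 X2 with independent standard normal inputs, for which D_u = 1 when u = {1}; it exploits the fact that the inner sums over j1 ≠ k and j2 ≠ k collapse to p times an entry minus a column sum:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(xu, xv):
    # hypothetical toy model (one input per group, both standard normal):
    # analytically D_u = V[E(f | X_u)] = 1 for u = {1}
    return xu + 2.0 * xv + xu * xv

def kernel_pq(xu, xv, p, q):
    # kernel of degree (p, q) from Eq. (4); the sums over j1 != k and j2 != k
    # reduce to p*A[k, l] minus the column sum of A
    A = f(xu[:, None], xv[None, :])        # A[k, l] = f(Xu^(k), X~u^(l))
    C = p * A - A.sum(axis=0)[None, :]     # C[k, l] = sum_{j != k} (A[k,l] - A[j,l])
    total = sum(np.dot(C[:, l], C[:, i]) for l in range(q - 1) for i in range(l + 1, q))
    return 2.0 * total / (p**2 * (p - 1) * q * (q - 1))

p, q, m = 4, 3, 5000
K_draws = [kernel_pq(rng.standard_normal(p), rng.standard_normal(q), p, q)
           for _ in range(m)]
print(np.mean(K_draws))   # close to D_u = 1, as Theorem 1 predicts
```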
Theorem 2 Let Y = f(X) be a model output and consider two independent samples (X_{i,u}^{(1)}, ..., X_{i,u}^{(p)}) from µ(X_u) and (X_{i,∼u}^{(1)}, ..., X_{i,∼u}^{(q)}) from µ(X_{∼u}), with i = 1, 2, ..., m. If assumptions A3 (E[f^4(X)] < +∞) and A4 (2 ≤ p, 2 ≤ q) hold, then:

i) the minimum-variance (best) unbiased estimator of D_u for a given (p, q) and m is:

\hat{D}_u = \frac{2}{m\, p^2 (p-1) q (q-1)} \sum_{i=1}^{m} \sum_{k=1}^{p} \sum_{l=1}^{q-1} \sum_{i_1=l+1}^{q} \sum_{\substack{j_1=1 \\ j_1 \neq k}}^{p} \sum_{\substack{j_2=1 \\ j_2 \neq k}}^{p} \left[ f\left( X_{i,u}^{(k)}, X_{i,\sim u}^{(l)} \right) - f\left( X_{i,u}^{(j_1)}, X_{i,\sim u}^{(l)} \right) \right] \left[ f\left( X_{i,u}^{(k)}, X_{i,\sim u}^{(i_1)} \right) - f\left( X_{i,u}^{(j_2)}, X_{i,\sim u}^{(i_1)} \right) \right];    (6)

ii) the variance of \hat{D}_u is:

V(\hat{D}_u) = \frac{\sigma^2_{p,q}}{m} \quad \text{and} \quad m\, E\left[ \left( \hat{D}_u - D_u \right)^2 \right] = \sigma^2_{p,q},    (7)

with σ²_{p,q} the variance of the kernel K(·);

iii) if m → +∞, the estimator of the first-order index, \hat{S}_u = \hat{D}_u / \hat{D}, is consistent:

\hat{S}_u \xrightarrow{P} S_u;    (8)

iv) its asymptotic distribution is:

\sqrt{m} \left( \hat{S}_u - S_u \right) \xrightarrow{D} \mathcal{N}\left( 0, \frac{\sigma^2_{p,q}}{D^2} \right),    (9)

with D the model variance;

v) the 100(1 − α)% asymptotic confidence bounds of S_u are:

S_u \in \left[ \hat{S}_u - \frac{\sigma_{p,q}\, h_{1-\alpha/2}}{D \sqrt{m}},\; \hat{S}_u + \frac{\sigma_{p,q}\, h_{1-\alpha/2}}{D \sqrt{m}} \right],    (10)

with h_{1−α/2} the (1 − α/2) fractile of the standard normal distribution.
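Under the stated assumptions, points i), ii) and v) translate directly into a procedure: average the kernel over m independent blocks to get D̂_u, use the empirical standard deviation of the kernel values as an estimate of σ_{p,q}, and plug both into (10). A minimal sketch on the hypothetical toy model f(X1, X2) = X1 + 2 X2 + X1 X2 (independent standard normal inputs, S_u = 1/6 for u = {1}); estimating the model variance D by a plain Monte Carlo sample is our own assumption, as the abstract does not prescribe one:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(xu, xv):
    # hypothetical toy model: D_u = 1, D = 6, hence S_u = 1/6 for u = {1}
    return xu + 2.0 * xv + xu * xv

def kernel_pq(xu, xv, p, q):
    # kernel of degree (p, q) of Eq. (4), with the inner sums vectorized
    A = f(xu[:, None], xv[None, :])
    C = p * A - A.sum(axis=0)[None, :]
    total = sum(np.dot(C[:, l], C[:, i]) for l in range(q - 1) for i in range(l + 1, q))
    return 2.0 * total / (p**2 * (p - 1) * q * (q - 1))

p, q, m = 3, 3, 3000
K = np.array([kernel_pq(rng.standard_normal(p), rng.standard_normal(q), p, q)
              for _ in range(m)])

D_u_hat = K.mean()               # Eq. (6): average of m independent kernel values
sigma_hat = K.std(ddof=1)        # empirical estimate of sigma_{p,q}
D_hat = f(*rng.standard_normal((2, 100_000))).var()  # model variance, plain Monte Carlo
S_u_hat = D_u_hat / D_hat

h = 1.96                         # h_{1 - alpha/2} for alpha = 0.05
half_width = sigma_hat * h / (D_hat * np.sqrt(m))    # Eq. (10)
print(S_u_hat, S_u_hat - half_width, S_u_hat + half_width)
```

The interval typically covers the true value S_u = 1/6 at the nominal 95% rate.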
Proof The kernel K(X_u^{(1)}, ..., X_u^{(p)}, X_{∼u}^{(1)}, ..., X_{∼u}^{(q)}) is symmetric under independent permutations of its first arguments (X_u^{(1)}, ..., X_u^{(p)}) and of its second arguments (X_{∼u}^{(1)}, ..., X_{∼u}^{(q)}). Knowing that E[K(X_u^{(1)}, ..., X_u^{(p)}, X_{∼u}^{(1)}, ..., X_{∼u}^{(q)})] = D_u (see Theorem 1), points i)-iv) are obtained using the properties of U-statistics ([11,1,12,4,5]) and Slutsky's theorem. For comprehensive details, see [7,8]. Point v) gives the classical confidence bounds of S_u knowing the asymptotic distribution of \hat{S}_u and the value of σ_{p,q}. □

2. Illustration to a Flood Risk Model

2.1. Flood Risk Model

To illustrate our approach, we consider a flood risk model that simulates the height of a river compared to the height of a dyke ([6]). Flooding occurs when the height of the river exceeds the height of the dyke. The model includes 8 input factors (listed in Table 1) and is defined as follows:

S = Z_v + H - H_d - C_b, \quad \text{with} \quad H = \left( \frac{Q}{B K_s \sqrt{(Z_m - Z_v)/L}} \right)^{0.6},    (11)

with S the maximal annual overflow (in meters) and H the maximal annual height of the river (in meters).

Table 1. Input variables of the flood model and their probability distributions

Input | Description                 | Unit | Probability distribution
Q     | Maximal annual flowrate     | m³/s | Truncated Gumbel G(1013, 558) on [500, 3000]
Ks    | Strickler coefficient       | —    | Truncated normal N(30, 8) on [15, +∞[
Zv    | River downstream level      | m    | Triangular T(49, 50, 51)
Zm    | River upstream level        | m    | Triangular T(54, 55, 56)
Hd    | Dyke height                 | m    | Uniform U[7, 9]
Cb    | Bank level                  | m    | Triangular T(55, 55.5, 56)
L     | Length of the river stretch | m    | Triangular T(4990, 5000, 5010)
B     | River width                 | m    | Triangular T(295, 300, 305)
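The model (11) and the distributions of Table 1 are straightforward to simulate. A sketch in Python (our own illustration, not the paper's code; drawing the truncated distributions by rejection sampling is an implementation choice on our part):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_inputs(n):
    """Draw n samples of the 8 inputs of Table 1 (truncations by rejection)."""
    def trunc(draw, lo, hi):
        x = draw(n)
        while True:
            bad = (x < lo) | (x > hi)
            if not bad.any():
                return x
            x[bad] = draw(bad.sum())
    Q  = trunc(lambda k: rng.gumbel(1013.0, 558.0, k), 500.0, 3000.0)
    Ks = trunc(lambda k: rng.normal(30.0, 8.0, k), 15.0, np.inf)
    Zv = rng.triangular(49.0, 50.0, 51.0, n)
    Zm = rng.triangular(54.0, 55.0, 56.0, n)
    Hd = rng.uniform(7.0, 9.0, n)
    Cb = rng.triangular(55.0, 55.5, 56.0, n)
    L  = rng.triangular(4990.0, 5000.0, 5010.0, n)
    B  = rng.triangular(295.0, 300.0, 305.0, n)
    return Q, Ks, Zv, Zm, Hd, Cb, L, B

def flood_model(Q, Ks, Zv, Zm, Hd, Cb, L, B):
    """Eq. (11): maximal annual overflow S (in meters)."""
    H = (Q / (B * Ks * np.sqrt((Zm - Zv) / L))) ** 0.6
    return Zv + H - Hd - Cb

S = flood_model(*sample_inputs(100_000))
print(S.mean(), S.std())   # S is negative when the dyke contains the river
```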
2.2. Implementation Issues

For a fair comparison, we computed the indices for the degree (p, q) and for the degree (p, q = 2) using the same number of model runs. We consider six different values of the degree (p, q): (p = 3, q = 3), (p = 4, q = 3), (p = 4, q = 4), (p = 5, q = 3), (p = 5, q = 4), (p = 5, q = 5). We also compare our estimator with the estimator of the first-order index from [16], implemented in the R package "sensitivity" ([14]). We used the root mean square error (RMSE) to assess the accuracy of our estimations. For each sample size (m) and for each degree (p, q), we replicated the process of computing the indices R = 30 times (randomly changing the seed when sampling the input values). The average RMSE over the d indices is defined as follows:

\mathrm{RMSE}_d = \frac{1}{d} \sum_{j=1}^{d} \sqrt{\frac{1}{R} \sum_{r=1}^{R} \left( \hat{S}_{j,r} - S_j \right)^2},    (12)

where S_j and \hat{S}_{j,r} are respectively the true value and the estimate, for replication r, of the first-order index of factor X_j, j = 1, 2, ..., d.
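For clarity, (12) maps onto a couple of NumPy lines. The sketch below applies it to synthetic estimates; the reference values and the noise level are made-up placeholders, not the paper's results:

```python
import numpy as np

rng = np.random.default_rng(4)
d, R = 8, 30

# hypothetical reference indices and noisy replicated estimates
# (placeholders; the 0.02 noise level is arbitrary)
S_true = rng.uniform(0.0, 0.3, d)
S_hat = S_true[None, :] + rng.normal(0.0, 0.02, (R, d))

# Eq. (12): per-index RMSE over the R replications, averaged over the d indices
rmse_d = np.mean(np.sqrt(np.mean((S_hat - S_true[None, :]) ** 2, axis=0)))
print(rmse_d)   # close to the injected noise level, 0.02
```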
2.3. Numerical Results

In Figure 1, we present the average of the RMSEs of the d = 8 first-order indices when increasing the total number of model evaluations. Figure 1 consists of six panels associated with the six values of the degree (p, q). In each panel, we show the trends of the average RMSE for the degree (p, q), compared to the RMSE associated with the degree (p, q = 2) and to the RMSE associated with the estimator from [16]. The RMSEs associated with our estimator, for the different values of the degree, decrease with the number of model evaluations, and the estimations converge. It turns out that our estimator outperforms the estimator from [16], with a significant difference in the bottom panels, which suggests using small values for the degree.

Figure 1. Log-RMSEs against the total number of model evaluations (in log10) for six values of the degree: (p = 3, q = 3), (p = 4, q = 3), (p = 4, q = 4), (p = 5, q = 3), (p = 5, q = 4), (p = 5, q = 5). In each panel, we show the RMSE for the degree (p, q) (solid line), the RMSE for the degree (p, q = 2) (dashed line), and the RMSE associated with the estimator from [16] (dotted line).

3. Conclusions

We propose a new estimator of Sobol's first-order index by making use of a kernel of degree (p, q) based upon U-statistics. The properties of U-statistics allow for deriving the main properties of our estimator, such as the optimality and the rate of convergence of the non-normalized estimator of the first-order index, and the asymptotic confidence bounds of the first-order index. We found that, for a given input factor or group of factors, the kernel of degree (p, q) with a low variance should be preferred for estimating Sobol' indices. The numerical tests confirmed the superiority of our estimator of the first-order index compared to the estimator from [16], and suggest using small values of the degree.

References

[1] T. S. Ferguson. A Course in Large Sample Theory. Chapman-Hall, New York, 1996.
[2] F. Gamboa, A. Janon, T. Klein, and A. Lagnoux. Sensitivity indices for multivariate outputs. Comptes Rendus de l'Académie des Sciences, in press, 2014.
[3] R. Ghanem, D. Higdon, and H. Owhadi. Handbook of Uncertainty Quantification. Springer International Publishing, 2017.
[4] W. Hoeffding. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 19:293-325, 1948.
[5] W. Hoeffding. A non-parametric test for independence. Annals of Mathematical Statistics, 19:546-557, 1948.
[6] B. Iooss. Revue sur l'analyse de sensibilité globale de modèles numériques. Journal de la Société Française de Statistique, 152:1-23, 2011.
[7] M. Lamboni. Global sensitivity analysis: a generalized, unbiased and optimal estimator of total-effect variance. Statistical Papers, pages 1-26, 2016.
[8] M. Lamboni. Uncertainty quantification: a minimum variance unbiased (joint) estimator of the non-normalized Sobol's indices. Submitted, 2017.
[9] M. Lamboni, D. Makowski, S. Lehuger, B. Gabrielle, and H. Monod. Multivariate global sensitivity analysis for dynamic crop models. Field Crops Research, 113:312-320, 2009.
[10] M. Lamboni, H. Monod, and D. Makowski. Multivariate sensitivity analysis to measure global contribution of input factors in dynamic models. Reliability Engineering and System Safety, 96:450-459, 2011.
[11] E. L. Lehmann. Consistency and unbiasedness of certain nonparametric tests. Annals of Mathematical Statistics, 22:165-179, 1951.
[12] E. L. Lehmann. Elements of Large Sample Theory. Springer, 1999.
[13] A. B. Owen. Variance components and generalized Sobol' indices. SIAM/ASA Journal on Uncertainty Quantification, 1(1):19-41, 2013.
[14] G. Pujol, B. Iooss, and A. Janon. sensitivity: Sensitivity Analysis, 2013. R package version 1.7.
[15] A. Saltelli and P. Annoni. How to avoid a perfunctory sensitivity analysis. Environmental Modelling and Software, 25(12):1508-1517, 2010.
[16] A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto, and S. Tarantola. Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Computer Physics Communications, 181(2):259-270, 2010.
[17] A. Saltelli, K. Chan, and E. Scott. Variance-Based Methods. Probability and Statistics. John Wiley and Sons, 2000.
[18] I. M. Sobol. Sensitivity analysis for non-linear mathematical models. Mathematical Modelling and Computational Experiments, 1:407-414, 1993.