Applied Mathematical Modelling 39 (2015) 5310–5326
Characterization of uncertainty in probabilistic model using bootstrap method and its application to reliability of piles

Dian-Qing Li a,⇑, Xiao-Song Tang a, Chuang-Bing Zhou a, Kok-Kwang Phoon a,b

a State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, 8 Donghu South Road, Wuhan 430072, PR China
b Department of Civil and Environmental Engineering, National University of Singapore, Blk E1A, #07-03, 1 Engineering Drive 2, Singapore 117576, Singapore
Article info

Article history: Received 19 May 2014; Received in revised form 1 March 2015; Accepted 11 March 2015; Available online 28 March 2015

Keywords: Probabilistic models; Uncertainty; Copulas; Bootstrap method; Probability of failure

Abstract

This paper aims to propose a bootstrap method for characterizing the uncertainty in probabilistic models and its effect on geotechnical reliability. First, the copula theory is briefly introduced. Second, both the uncertainties in parameters and type of the best-fit marginal distributions and copulas are characterized by the bootstrap method. Finally, four load-test datasets of load-settlement curves of piles are used to illustrate the proposed method. The serviceability limit state reliability analysis of piles is presented to illustrate the practical application of the proposed method. The results indicate that the bootstrap method can effectively characterize the uncertainty in probabilistic models derived from a small sample. Through bootstrapping, the uncertainties in both the parameters and type of the specified probabilistic models are simultaneously incorporated into geotechnical reliability analyses. The probability of failure of piles is represented by a confidence interval at a specified confidence level instead of a single fixed probability, which enables engineers to make a more informed decision.

© 2015 Elsevier Inc. All rights reserved.
1. Introduction

The geotechnical engineering literature is replete with correlations between two engineering parameters, such as cohesion and friction angle of soils [1–3], two hyperbolic curve-fitting parameters underlying load-settlement curves of piles [4], and curve-fitting parameters of a soil–water characteristic curve [5]. To achieve a rigorous evaluation of geotechnical reliability, the joint cumulative distribution function (CDF) or probability density function (PDF) of these parameters should be known [6–9]. Recently, copula theory (e.g., [10,11]) has received increasing attention in geotechnical engineering for constructing the joint CDF or PDF of multivariate data, particularly bivariate data [12–19]. Many copulas are available in the literature to characterize the dependence structure among variables, such as the Gaussian, t, Frank, Clayton and Gumbel copulas. Likewise, many marginal distributions can be used for fitting univariate data. Among the candidate marginal distributions and copulas, the best-fit marginal distribution and best-fit copula are usually identified from the measured data using the Akaike Information Criterion (AIC) [20] or the Bayesian Information Criterion (BIC) [21]. Furthermore, sample statistics such as the sample mean, sample standard deviation (SD) and sample Kendall rank correlation coefficient are used to determine the parameters of the marginal distributions and copulas. Theoretically, the joint CDF or PDF determined by the copula approach is accurate
⇑ Corresponding author. Tel.: +86 27 6877 2496; fax: +86 27 6877 4295. E-mail address: [email protected] (D.-Q. Li).
http://dx.doi.org/10.1016/j.apm.2015.03.027
0307-904X/© 2015 Elsevier Inc. All rights reserved.
when sufficient data are available. In geotechnical practice, unfortunately, probabilistic models such as marginal distributions and copulas are commonly selected from very limited data because a small sample size is a main feature of geotechnical data [22], which leads to uncertainty in the derived probabilistic models [23]. These uncertainties mainly consist of two components: (1) the uncertainty in the distribution parameters and copula parameters underlying the derived marginal distributions and copulas, and (2) the uncertainty in the type of the best-fit marginal distributions and copulas. In the literature, the probabilistic models derived from a small sample are often assumed to be the population ones [13–15]. The reliability analyses are therefore significantly affected by the uncertainties in both the parameters and the type of the derived probabilistic models, which should be properly considered to achieve a more meaningful reliability estimate. The key issue is to characterize the uncertainty in probabilistic models based on the limited data.

In statistics, resampling techniques such as jackknifing [24] and bootstrapping [25] are often used to derive the sampling distributions of statistics based on the measured data. Compared with the jackknife method, the bootstrap method is simple in engineering applications [25], especially where the distribution of the statistic concerned is unknown and/or the sample size of the measured data is insufficient for straightforward statistical inference [26–28]. It has been widely used to derive the sampling distributions and confidence intervals of statistics in many fields such as hydrology and economics [29–32]. Recently, it has been applied to geotechnical engineering [33–35]. For instance, Most and Knabe [33] used the bootstrap method to estimate the variations of the sample mean and SD of shear strength parameters associated with the bearing failure problem.
Most [34] employed the bootstrap method to derive the variations of the first four moments of input parameters. Luo et al. [35] investigated the effect of uncertainty in the mean and SD of typical soil parameters on the probability of exceedance in a braced excavation using bootstrapping. These studies mainly focused on the variations of the statistical moments, in which only the uncertainty in the distribution parameters of variables was considered. In these studies, the distribution types of variables were typically assumed to be a fixed distribution such as the normal, lognormal or maximum entropy distribution. As mentioned previously, there exist uncertainties in the results of the best-fit marginal distributions and copulas determined from a small sample using AIC/BIC. Hence, the uncertainty in the type of the best-fit marginal distributions and copulas and its effect on reliability should be further studied, and is the topic of the present research.

This paper aims to propose a bootstrap method for characterizing the uncertainty in probabilistic models and investigating its effect on geotechnical reliability. To achieve this goal, this article is organized as follows. In Section 2, the copula theory is briefly introduced to model the bivariate distribution of correlated geotechnical parameters. Thereafter, the AIC and BIC are also presented to identify the best-fit marginal distributions and copulas. In Section 3, the bootstrap method is presented to characterize both the uncertainties in parameters and type of the best-fit marginal distributions and copulas. An example of a probabilistic model of two hyperbolic curve-fitting parameters underlying load-settlement curves of piles is presented to illustrate the proposed method in Section 4. A practical application of serviceability limit state reliability analysis of piles is also presented.

2. Bivariate probability distribution functions using copula

Let X1 and X2 be two random variables with marginal cumulative distribution functions (CDFs) F1(x1) and F2(x2), respectively. F(x1, x2) is the joint distribution function of X1 and X2. The marginal probability density functions (PDFs) of X1 and X2 are denoted by f1(x1) and f2(x2), respectively. According to Sklar’s theorem (e.g., [10]), the bivariate joint CDF of X1 and X2 can be constructed as

$$F(x_1, x_2) = C(F_1(x_1), F_2(x_2); \theta) \tag{1}$$
in which C is a copula; θ is a copula parameter describing the dependency between X1 and X2. If F1(x1) and F2(x2) are continuous, then C is unique; otherwise, C is uniquely determined on Ran F1 × Ran F2. Let u1 = F1(x1) and u2 = F2(x2), then C(F1(x1), F2(x2); θ) = C(u1, u2; θ). Therefore, the copula function C(u1, u2; θ) is a two-dimensional probability distribution on [0, 1]^2 with uniform marginal probability distributions on [0, 1]. Sklar’s theorem clearly states that the joint probability distribution of random variables can be expressed in terms of a copula function and its marginal probability distributions. From Eq. (1), the bivariate joint PDF f(x1, x2) of X1 and X2 can be obtained as

$$f(x_1, x_2) = \frac{\partial^2 C(F_1(x_1), F_2(x_2); \theta)}{\partial F_1(x_1)\,\partial F_2(x_2)} f_1(x_1) f_2(x_2) = c(F_1(x_1), F_2(x_2); \theta) f_1(x_1) f_2(x_2) = c(u_1, u_2; \theta) f_1(x_1) f_2(x_2) \tag{2}$$

in which c(F1(x1), F2(x2); θ) is the copula density associated with the copula function C(F1(x1), F2(x2); θ), which is given by

$$c(F_1(x_1), F_2(x_2); \theta) = c(u_1, u_2; \theta) = \frac{\partial^2 C(u_1, u_2; \theta)}{\partial u_1\,\partial u_2}. \tag{3}$$
Theoretically, the joint CDF and PDF of X1 and X2 can be determined by Eqs. (1) and (2) if the marginal distributions of X1 and X2, and the copula function are known. Within the framework of the copula theory, the Kendall rank correlation coefficient τ can be expressed in terms of a copula function, which is given by

$$\tau = 4 \int_0^1 \int_0^1 C(u_1, u_2; \theta)\, dC(u_1, u_2; \theta) - 1. \tag{4}$$
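As a minimal sketch of how Eq. (4) can be used to recover θ from a target τ, the code below relies on the closed-form relations listed for the Gaussian and Frank copulas in Table 4. The function names, the midpoint integration rule and the bisection bracket are illustrative assumptions, not the authors' implementation; the bracket limits the Frank solver to roughly |τ| < 0.96.

```python
import math

def gaussian_theta_from_tau(tau):
    """Gaussian copula: tau = 2*arcsin(theta)/pi, hence theta = sin(pi*tau/2)."""
    return math.sin(math.pi * tau / 2.0)

def frank_tau(theta, n=2000):
    """Frank copula: tau = 1 + (4/theta) * [(1/theta) * int_0^theta t/(e^t - 1) dt - 1]."""
    h = theta / n  # signed step, so negative theta is handled as well
    # Midpoint rule avoids evaluating the integrand at t = 0.
    integral = h * sum(((i + 0.5) * h) / math.expm1((i + 0.5) * h) for i in range(n))
    return 1.0 + 4.0 * (integral / theta - 1.0) / theta

def frank_theta_from_tau(tau, tol=1e-10):
    """Solve frank_tau(theta) = tau by bisection (tau increases with theta)."""
    if tau == 0.0:
        raise ValueError("tau = 0 corresponds to the independence limit theta -> 0")
    lo, hi = (-100.0, -1e-4) if tau < 0 else (1e-4, 100.0)  # assumed bracket
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frank_tau(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the Plackett and No.16 copulas no closed form is listed, so the double integral in Eq. (4) would have to be evaluated numerically inside a similar root-finding loop.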
This relationship is very useful. For a prescribed target τ, the copula parameter θ underlying a specified copula can be determined by solving the nonlinear Eq. (4), as demonstrated later. Many functions exist in the literature satisfying the mathematical conditions for a copula function. Thus, the best-fit copula for fitting the measured data should be identified. In engineering applications, the AIC [20] or BIC [21] is often adopted for such a purpose. The copula producing the smallest AIC or BIC score is considered to be the best-fit copula. The AIC and BIC are defined as

$$\mathrm{AIC} = -2 \sum_{i=1}^{N} \ln c(u_{1i}, u_{2i}; \theta) + 2k_1, \tag{5}$$

$$\mathrm{BIC} = -2 \sum_{i=1}^{N} \ln c(u_{1i}, u_{2i}; \theta) + k_1 \ln N, \tag{6}$$

where k1 is the number of copula parameters; {(u1i, u2i), i = 1, 2, . . ., N} are the empirical distribution values of measured (X1, X2), which are defined as

$$\begin{cases} u_{1i} = \dfrac{\mathrm{rank}(x_{1i})}{N+1} \\[2mm] u_{2i} = \dfrac{\mathrm{rank}(x_{2i})}{N+1} \end{cases} \qquad i = 1, 2, \ldots, N, \tag{7}$$
where rank(x1i) [or rank(x2i)] denotes the rank of x1i (or x2i) among the list {x11, x12, . . . , x1N} (or {x21, x22, . . . , x2N}) in ascending order. Similarly, the AIC or BIC can also be used to identify the best-fit marginal distributions. The marginal distribution producing the smallest AIC or BIC score is considered to be the best-fit marginal distribution. The AIC and BIC are defined as

$$\mathrm{AIC} = -2 \sum_{i=1}^{N} \ln f(x_i; p, q) + 2k_2, \tag{8}$$

$$\mathrm{BIC} = -2 \sum_{i=1}^{N} \ln f(x_i; p, q) + k_2 \ln N, \tag{9}$$
where p and q are the distribution parameters of a marginal distribution; k2 is the number of distribution parameters. Note that the distribution parameters p and q can be obtained using the mean and SD of a random variable.

3. Bootstrap method for characterizing the uncertainty in probabilistic models

In practice, sample statistics such as the sample mean, sample SD and sample Kendall rank correlation coefficient are used instead of the unknown population ones to determine the parameters of marginal distributions and copulas. Furthermore, the AIC/BIC scores are employed to identify the best-fit marginal distributions and copulas. These sample statistics and the AIC/BIC scores may exhibit large variations due to a small sample size, which leads to uncertainties in the derived probabilistic models. These uncertainties mainly consist of two components: (1) the uncertainty in the distribution parameters and copula parameters underlying the derived marginal distributions and copulas, and (2) the uncertainty in the type of the best-fit marginal distributions and copulas. In this study, these uncertainties are characterized using a bootstrap method, as introduced below.

In geotechnical engineering practice, the available information consists of only limited observations of the geotechnical parameters. It is impossible to estimate the sampling properties of statistics exactly from limited observations. Resampling techniques such as the bootstrap method [25] provide a simple and practical tool for deriving the approximate sampling properties of statistics based on limited data. The bootstrap method is particularly useful for statistics with an unknown distribution and for data sets with a small sample size.
For example, numerical examples show that the bootstrap method can effectively model the variations of sample statistics and AIC/BIC scores even when the sample size of the original data set is as small as N = 20. The idea behind the bootstrap method is relatively simple: many sets of bootstrap samples are created by random sampling with replacement from the original data set. Let X = {xi, i = 1, 2, . . . , N} denote the original data set. A bootstrap sample set Bj = {B1,j, B2,j, . . . , BN,j} is constructed by random sampling with replacement from X, as illustrated in Fig. 1. In this set, each observation xi may appear once, more than once or not at all. With the constructed bootstrap sample set, the statistics of concern (e.g., sample mean, sample SD, sample Kendall rank correlation coefficient and AIC/BIC scores) are obtained. For example, the sample mean and sample SD are given by

$$\bar{B}_{N,j} = \frac{1}{N} \sum_{i=1}^{N} B_{i,j} \tag{10}$$

and

$$S_{N,j} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( B_{i,j} - \bar{B}_{N,j} \right)^2}. \tag{11}$$

Fig. 1. Generation of one bootstrap sample set from the original observation set by random sampling with replacement (modified from [33]).
The above procedure is repeated many times and Ns bootstrap sample sets are obtained. These sets of bootstrap samples are used to estimate the statistics such as the sample mean, sample SD, sample Kendall rank correlation coefficient and AIC/BIC scores. To this end, each statistic investigated can be estimated using its mean value and standard deviation. Then, the bootstrap mean and SD estimates of the sample mean, MN, can be calculated by

$$M_{N,\mathrm{mean}} = \frac{1}{N_s} \sum_{j=1}^{N_s} \bar{B}_{N,j} \tag{12}$$

and

$$\sigma_{M_N} = \sqrt{\frac{1}{N_s - 1} \sum_{j=1}^{N_s} \left( \bar{B}_{N,j} - M_{N,\mathrm{mean}} \right)^2}. \tag{13}$$

Similarly, the bootstrap mean and SD estimates of the sample SD, SN, are derived as below:

$$S_{N,\mathrm{mean}} = \frac{1}{N_s} \sum_{j=1}^{N_s} S_{N,j} \tag{14}$$

and

$$\sigma_{S_N} = \sqrt{\frac{1}{N_s - 1} \sum_{j=1}^{N_s} \left( S_{N,j} - S_{N,\mathrm{mean}} \right)^2}. \tag{15}$$
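The resampling loop and Eqs. (10)–(15) can be sketched in a few lines. The following is an illustrative sketch using only the Python standard library; the function names, the dictionary keys and the fixed random seed are assumptions made for the example and do not come from the paper.

```python
import math
import random

def sample_mean(values):
    """Eq. (10): arithmetic mean of one (bootstrap) sample set."""
    return sum(values) / len(values)

def sample_sd(values):
    """Eq. (11): sample SD with the N - 1 denominator."""
    m = sample_mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def bootstrap_statistics(data, n_sets=10000, seed=0):
    """Bootstrap mean and SD estimates of the sample mean and sample SD, Eqs. (12)-(15)."""
    rng = random.Random(seed)  # assumed seed, only for reproducibility
    n = len(data)
    means, sds = [], []
    for _ in range(n_sets):
        # One bootstrap set: N draws with replacement from the original data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(sample_mean(resample))
        sds.append(sample_sd(resample))
    return {
        "M_mean": sample_mean(means),  # Eq. (12)
        "sigma_M": sample_sd(means),   # Eq. (13)
        "S_mean": sample_mean(sds),    # Eq. (14)
        "sigma_S": sample_sd(sds),     # Eq. (15)
    }
```

A bootstrap COV of the sample mean would then follow as `sigma_M / M_mean`, and likewise `sigma_S / S_mean` for the sample SD.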
The bootstrap method requires only the original data set and no additional simulations, because the resampling process draws samples only from the existing data set. As pointed out by Most and Knabe [33], a large value for Ns is required to obtain converged results in the statistical analysis. Following Most and Knabe [33] and Luo et al. [35], a value of Ns = 10,000 is adopted in this study. The sample size of each bootstrap sample set is taken equal to that of the original data set, N. The sample statistics may be overestimated or underestimated if the bootstrap sample size is set smaller or greater than N [36].

4. Illustrative example

4.1. Measured data for hyperbolic curve-fitting parameters of piles

In pile foundation design, a hyperbolic equation is usually fitted to the measured load-settlement data in a pile load test to characterize the load-settlement behavior of piles [37,38]. Then, Davisson’s failure criterion is adopted to interpret the measured capacities of piles [4,39]. To reduce the scatter within the hyperbolic load-settlement curves, Phoon and Kulhawy [40] recommended normalizing the hyperbolic equation by the measured capacity. The normalized hyperbolic load-settlement model is as follows

$$\frac{Q}{Q_{STC}} = \frac{y}{a + by}, \tag{16}$$
where Q is the applied load (kN); y is the pile head settlement (mm); QSTC denotes the measured ultimate capacity (kN); a (mm) and b are hyperbolic curve-fitting parameters for the normalized load-settlement curve. Note that the curve-fitting parameters are physically meaningful, with the reciprocals of a and b equal to the initial slope and asymptotic value, respectively, as illustrated in Fig. 2. Eq. (16) reduces uncertainties in a set of nonlinear continuous curves to uncertainties in two hyperbolic curve-fitting parameters.

To derive probabilistic models of the two parameters for reliability-based serviceability limit state design, four measured data sets of (a, b) compiled by Dithinde et al. [4] are considered in this study. Fig. 3 shows the scatter plots of the measured data. The four pile load test databases were collected from the geologic region of Southern Africa. Each pile load test in the databases includes complete test information such as test pile size (length and diameter), complete records of the load-settlement data and availability of subsurface exploration data for the site. The main pile types include Franki (expanded base) piles, auger piles and continuous flight auger (CFA) piles, which are widely used in Southern Africa. The soil types conform to geomaterials occurring in a typical Southern Africa soil profile. According to the construction process and soil type, the data are sorted into four pile-soil classes, namely driven piles in noncohesive soils (D-NC), bored piles in noncohesive soils (B-NC), driven piles in cohesive soils (D-C) and bored piles in cohesive soils (B-C) [4]. After a careful detection of data outliers, the resulting sample sizes (N) are 28, 30, 59 and 53, respectively. Since the considered databases cover a fairly representative range of pile sizes, pile types and soil types in Southern Africa, the four data sets of (a, b) are considered as independent and identically distributed (i.i.d.) observations from the four populations (i.e., D-NC, B-NC, D-C and B-C piles in Southern Africa). All four data sets of (a, b) are small samples from the statistical point of view.

Based on the measured data for (a, b), the statistical properties of the two parameters including sample mean, sample SD, sample coefficient of variation (COV) and sample Kendall rank correlation coefficient [14] can be obtained, which are summarized in Table 1. In the context of this study, these properties are referred to as sample statistics from the original data sets. It is evident from Fig. 3 that there is a strongly negative correlation between a and b, which should be considered explicitly to achieve realistic results. For example, if this correlation were ignored, the scatter within the hyperbolic load-settlement curves would not be simulated properly and the probability of serviceability failure of piles would be highly overestimated [14]. Previous studies [4,37] employed a translation approach for modeling the bivariate distribution of the two parameters. Li et al. [14] pointed out that the translation approach essentially adopts a Gaussian copula for modeling the dependence between a and b, and revealed that the Gaussian copula is not the best-fit copula for modeling this dependence in most cases. Hence, they suggested modeling the bivariate distribution using the copula approach, which is also adopted in this study.

4.2. Bivariate distribution of curve-fitting parameters using copulas

To fit the marginal distributions of a and b, four candidate marginal distributions, namely Normal truncated below zero (referred to as TruncNormal hereafter), Lognormal, Gumbel truncated below zero (TruncGumbel) and Weibull distributions, are examined in this study. These four marginal distributions ensure that the simulated data of (a, b) are positive. Table 2 summarizes the probability density functions and distribution parameters.
The distribution parameters underlying the four marginal distributions are determined through the means μ and SDs σ of a and b, as shown in Table 2. Having determined the distribution parameters of the four marginal distributions, the next step is to select a marginal distribution that provides the best fit to the measured data using AIC or BIC. Since the four marginal distributions have
Fig. 2. Definition of hyperbolic curve-fitting parameters for piles.
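The roles of a and b in Eq. (16) and Fig. 2 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the inverse relation y = a r / (1 − b r) is simple algebra on Eq. (16) with r = Q/QSTC, and the D-NC sample means reported in Table 1 are used purely as example inputs.

```python
def normalized_load(y_mm, a_mm, b):
    """Eq. (16): Q / Q_STC = y / (a + b*y)."""
    return y_mm / (a_mm + b * y_mm)

def settlement_for_load_ratio(ratio, a_mm, b):
    """Invert Eq. (16): y = a*r / (1 - b*r), valid for 0 <= r < 1/b."""
    if not 0.0 <= ratio < 1.0 / b:
        raise ValueError("load ratio must lie in [0, 1/b)")
    return ratio * a_mm / (1.0 - b * ratio)

a_mm, b = 5.55, 0.71  # D-NC sample means of a (mm) and b, used as example values

# Initial slope tends to 1/a and the curve flattens towards the asymptote 1/b.
initial_slope = normalized_load(1e-9, a_mm, b) / 1e-9
asymptote = normalized_load(1e9, a_mm, b)
print(initial_slope, 1.0 / a_mm)
print(asymptote, 1.0 / b)
```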
Fig. 3. Scatter plots of the measured hyperbolic curve-fitting parameters for piles.
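Table 1 Sample statistics for measured hyperbolic curve-fitting parameters.

| Statistic | D-NC a (mm) | D-NC b | B-NC a (mm) | B-NC b | D-C a (mm) | D-C b | B-C a (mm) | B-C b |
|---|---|---|---|---|---|---|---|---|
| Sample mean | 5.55 | 0.71 | 4.10 | 0.77 | 3.58 | 0.78 | 2.79 | 0.82 |
| Sample SD | 3.00 | 0.10 | 3.20 | 0.16 | 2.04 | 0.09 | 2.04 | 0.09 |
| Sample COV | 0.54 | 0.14 | 0.78 | 0.21 | 0.57 | 0.11 | 0.73 | 0.11 |
| Sample correlation coefficient between a and b | −0.597 | | −0.750 | | −0.740 | | −0.755 | |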
the same number of distribution parameters, i.e. k2 = 2, the best-fit marginal distribution identified using AIC and BIC remains the same. For illustration, only AIC is used to identify the best-fit marginal distribution hereafter. Table 3 shows the AIC scores associated with the four marginal distributions for the four data sets of (a, b). It is evident that the Weibull distribution is the best-fit marginal distribution in most cases. The often used Lognormal distribution is the best-fit marginal distribution only for parameter b associated with D-NC data set. In the context of this study, the AIC scores for the four marginal distributions and best-fit marginal distributions for the four data sets in Table 3 are referred to as the AIC scores and best-fit marginal distributions derived from the original data sets. After determining the best-fit marginal distributions of (a, b), the next step is to select a copula for constructing the bivariate distribution of (a, b). To characterize the dependence structures between a and b, four candidate copulas, namely
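Table 2 Probability density functions and domains of distribution parameters associated with the selected four marginal distributions.

| Distribution | f(x; p, q) | μ, σ² | Range of p | Range of q |
|---|---|---|---|---|
| TruncNormal | $\frac{1}{q\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-p}{q}\right)^2\right] \Big/ \left[1 - \Phi\left(-\frac{p}{q}\right)\right]$ | $\mu = p$; $\sigma^2 = q^2$ | (−∞, ∞) | (0, ∞) |
| Lognormal | $\frac{1}{qx\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln x - p}{q}\right)^2\right]$ | $\mu = \exp(p + 0.5q^2)$; $\sigma^2 = [\exp(q^2) - 1]\exp(2p + q^2)$ | (−∞, ∞) | (0, ∞) |
| TruncGumbel | $q \exp\{-q(x-p) - \exp[-q(x-p)]\} / \{1 - \exp[-\exp(pq)]\}$ | $\mu = p + 0.5772/q$; $\sigma^2 = \pi^2/(6q^2)$ | (−∞, ∞) | (0, ∞) |
| Weibull | $\frac{q}{p}\left(\frac{x}{p}\right)^{q-1} \exp\left[-\left(\frac{x}{p}\right)^{q}\right]$ | $\mu = p\,\Gamma(1 + 1/q)$; $\sigma^2 = p^2[\Gamma(1 + 2/q) - \Gamma^2(1 + 1/q)]$ | (0, ∞) | (0, ∞) |

Note: Φ denotes the standard normal distribution function; Γ is the gamma function.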
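The moment relations in Table 2 can be inverted to obtain p and q from a mean μ and SD σ. The sketch below is a hedged illustration with hypothetical function names: the Lognormal and Gumbel relations invert in closed form, while the Weibull shape q is found here by bisection on the COV, with a search bracket of [0.1, 50] chosen as an assumption.

```python
import math

EULER_GAMMA = 0.5772  # rounded constant as it appears in Table 2

def truncnormal_params(mu, sigma):
    """Table 2 takes mu = p and sigma^2 = q^2 for the TruncNormal distribution."""
    return mu, sigma

def lognormal_params(mu, sigma):
    """Invert mu = exp(p + 0.5 q^2), sigma^2 = [exp(q^2) - 1] exp(2p + q^2)."""
    q2 = math.log(1.0 + (sigma / mu) ** 2)
    return math.log(mu) - 0.5 * q2, math.sqrt(q2)

def truncgumbel_params(mu, sigma):
    """Invert mu = p + 0.5772/q, sigma^2 = pi^2 / (6 q^2)."""
    q = math.pi / (sigma * math.sqrt(6.0))
    return mu - EULER_GAMMA / q, q

def weibull_params(mu, sigma):
    """Solve Gamma(1+2/q)/Gamma(1+1/q)^2 - 1 = (sigma/mu)^2 for q, then p = mu/Gamma(1+1/q)."""
    target = (sigma / mu) ** 2
    lo, hi = 0.1, 50.0  # assumed bracket for the shape parameter q
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        cov2 = math.gamma(1.0 + 2.0 / mid) / math.gamma(1.0 + 1.0 / mid) ** 2 - 1.0
        if cov2 > target:  # squared COV decreases as q grows
            lo = mid
        else:
            hi = mid
    q = 0.5 * (lo + hi)
    return mu / math.gamma(1.0 + 1.0 / q), q

# Example inputs: sample mean and SD of parameter a for the D-NC data set (Table 1).
for fit in (truncnormal_params, lognormal_params, truncgumbel_params, weibull_params):
    print(fit.__name__, fit(5.55, 3.00))
```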
Table 3 AIC scores associated with various marginal distributions of parameters a and b using the measured data.

| Data set | TruncNormal a | TruncNormal b | Lognormal a | Lognormal b | TruncGumbel a | TruncGumbel b | Weibull a | Weibull b | Best-fit distribution a | Best-fit distribution b |
|---|---|---|---|---|---|---|---|---|---|---|
| D-NC | 142.16 | −48.34 | 146.99 | −48.82* | 140.72 | −44.98 | 140.36* | −44.29 | Weibull | Lognormal |
| B-NC | 151.67 | −20.73 | 205.32 | −15.85 | 150.56 | −7.05 | 148.99* | −22.77* | Weibull | Weibull |
| D-C | 249.61 | −115.58 | 250.34 | −113.18 | 243.96 | −96.76 | 243.59* | −116.86* | Weibull | Weibull |
| B-C | 219.49 | −99.32 | 220.18 | −91.49 | 202.93* | −11.70 | 206.45 | −104.89* | TruncGumbel | Weibull |

Note: An asterisk marks the AIC score of the preferred (smallest-AIC) marginal distribution.
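Table 4 Copula density functions and domains of copula parameters associated with the selected four copulas.

| Copula | c(u1, u2; θ) | τ | Range of θ |
|---|---|---|---|
| Gaussian | $\frac{1}{\sqrt{1-\theta^2}} \exp\left[-\frac{\zeta_1^2\theta^2 - 2\theta\zeta_1\zeta_2 + \zeta_2^2\theta^2}{2(1-\theta^2)}\right]$, with $\zeta_1 = \Phi^{-1}(u_1)$, $\zeta_2 = \Phi^{-1}(u_2)$ | $\frac{2\arcsin\theta}{\pi}$ | [−1, 1] |
| Plackett | $\frac{\theta[1 + (\theta-1)(u_1 + u_2 - 2u_1u_2)]}{\{[1 + (\theta-1)(u_1 + u_2)]^2 - 4u_1u_2\theta(\theta-1)\}^{3/2}}$ | $4\int_0^1\int_0^1 C(u_1, u_2; \theta)\,dC(u_1, u_2; \theta) - 1$ | (0, ∞) \ {1} |
| Frank | $\frac{-\theta[\exp(-\theta) - 1]\exp[-\theta(u_1 + u_2)]}{\{[\exp(-\theta) - 1] + [\exp(-\theta u_1) - 1][\exp(-\theta u_2) - 1]\}^2}$ | $1 + \frac{4}{\theta}\left[\frac{1}{\theta}\int_0^{\theta} \frac{t}{\exp(t) - 1}\,dt - 1\right]$ | (−∞, ∞) \ {0} |
| No.16 | $\frac{1}{2}\left(1 + \frac{\theta}{u_1^2}\right)\left(1 + \frac{\theta}{u_2^2}\right) S^{-1/2}\left\{-S^{-1}\left[u_1 + u_2 - 1 - \theta\left(\frac{1}{u_1} + \frac{1}{u_2} - 1\right)\right]^2 + 1\right\}$, with $S = \left[u_1 + u_2 - 1 - \theta\left(\frac{1}{u_1} + \frac{1}{u_2} - 1\right)\right]^2 + 4\theta$ | $4\int_0^1\int_0^1 C(u_1, u_2; \theta)\,dC(u_1, u_2; \theta) - 1$ | [0, ∞) |

Note: Φ⁻¹ denotes the inverse of the standard normal distribution function; C(u1, u2; θ) is the copula function.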
Table 5 AIC scores associated with various copulas for modeling the bivariate distribution of (a, b) using the measured data.

| Data set | Gaussian | Plackett | Frank | No.16 | Best-fit copula |
|---|---|---|---|---|---|
| D-NC | −26.26* | −23.41 | −22.23 | −16.09 | Gaussian |
| B-NC | −39.29 | −44.92* | −40.45 | −40.55 | Plackett |
| D-C | −83.14 | −93.79* | −93.43 | −89.26 | Plackett |
| B-C | −72.42 | −85.62* | −79.21 | −81.45 | Plackett |

Note: An asterisk marks the AIC score of the preferred (smallest-AIC) copula.
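To make the scoring behind Table 5 concrete, the sketch below evaluates Eqs. (5)–(7) for the Gaussian copula, whose density is given in Table 4. It is an illustrative sketch only: the function names are hypothetical, ties in the data are not handled, and θ is assumed to be supplied by the caller (for instance from the sample Kendall τ).

```python
import math
from statistics import NormalDist

def pseudo_observations(x):
    """Eq. (7): u_i = rank(x_i) / (N + 1), ranks taken in ascending order (no tie handling)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0] * len(x)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return [r / (len(x) + 1) for r in ranks]

def gaussian_copula_log_density(u1, u2, theta):
    """Logarithm of the Gaussian copula density listed in Table 4."""
    z1 = NormalDist().inv_cdf(u1)
    z2 = NormalDist().inv_cdf(u2)
    quad = theta ** 2 * (z1 * z1 + z2 * z2) - 2.0 * theta * z1 * z2
    return -0.5 * math.log(1.0 - theta ** 2) - quad / (2.0 * (1.0 - theta ** 2))

def copula_aic_bic(u1, u2, theta, k1=1):
    """Eqs. (5) and (6) for a one-parameter copula evaluated on pseudo-observations."""
    loglik = sum(gaussian_copula_log_density(a, b, theta) for a, b in zip(u1, u2))
    n = len(u1)
    return -2.0 * loglik + 2.0 * k1, -2.0 * loglik + k1 * math.log(n)
```

Scoring another candidate copula would only require swapping in its log-density; the copula with the smallest score is then taken as the best fit.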
Gaussian, Plackett, Frank and No.16 copulas [10] are examined in this study. The copula density functions and the ranges of the copula parameters θ are listed in Table 4. The values of θ underlying the four copulas can be determined from the Kendall rank correlation coefficient τ using Eq. (4), and the resulting relationships between τ and θ are also listed in Table 4. The Gaussian copula is an elliptical copula. The Plackett copula is a member of the Plackett copula family. The Frank and No.16 copulas are commonly used Archimedean copulas. In addition, all four copulas can describe negative dependence, and the values of the correlation coefficients can approach −1. These features are suitable for describing the strongly negative correlations between a and b. Similarly, the AIC or BIC is used to identify the best-fit copula underlying the measured data. Since the four copulas considered have the same number of copula parameters, i.e. k1 = 1, the identification results using AIC and BIC are the same. Thus, only the results obtained from AIC are presented herein. Table 5 summarizes the AIC scores associated with the four
copulas for various data sets of (a, b). The Plackett copula is the best-fit copula in most cases. The often-used Gaussian copula is the best-fit copula only for the D-NC data set. In the context of this study, the AIC scores for the four copulas and the best-fit copulas for the four data sets in Table 5 are referred to as the AIC scores and best-fit copulas from the original data sets.

4.3. Variation in distribution parameters using bootstrapping

Having obtained the Ns sets of bootstrap samples, the sample statistics such as sample mean, sample SD and sample Kendall rank correlation coefficient are calculated for each bootstrap sample set in the same way as for the original data set in Section 4.1. Then, the set of Ns bootstrap estimates of sample statistics defines an empirical distribution of these statistics. In this study, the kernel smoothing density [41] is used to fit this empirical distribution, which can be realized by the MATLAB function ‘ksdensity’. Fig. 4(a) and (b) show the kernel smoothing density functions of sample mean and sample SD for parameter a, respectively. For comparison, the sample means and sample SDs from the original data sets are also marked on the horizontal axis by vertical lines. It is evident that the bootstrap mean estimates of sample mean and sample SD match well with the sample means and sample SDs from the original data sets. This indicates that the bootstrap distributions can capture the characteristics of the original data sets, which essentially act as the populations for the bootstrap sample sets. In addition, the sample mean and sample SD derived from a small sample exhibit large variation. The bootstrap COV estimates of sample mean are 0.100, 0.139, 0.074 and 0.098 for D-NC, B-NC, D-C and B-C data sets, respectively. The corresponding values for sample SD are 0.135, 0.108, 0.100 and 0.224. Generally, the COVs for sample SD are much larger than those for sample mean.
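For the sample Kendall rank correlation coefficient, each bootstrap set has to resample the (a, b) pairs jointly so that the dependence is preserved. The sketch below illustrates one way to do this; the function names and seed are hypothetical, and the tau-a form used here counts tied pairs (which arise from repeated draws) as neither concordant nor discordant, which is an assumption rather than the paper's stated convention.

```python
import math
import random

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs divided by N(N-1)/2."""
    n = len(x)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)  # tied pairs contribute zero
    return 2.0 * score / (n * (n - 1))

def bootstrap_tau(x, y, n_sets=2000, seed=0):
    """Bootstrap mean, SD and COV of the sample Kendall tau using paired resampling."""
    rng = random.Random(seed)  # assumed seed, only for reproducibility
    n = len(x)
    taus = []
    for _ in range(n_sets):
        idx = [rng.randrange(n) for _ in range(n)]  # the same indices for both variables
        taus.append(kendall_tau([x[i] for i in idx], [y[i] for i in idx]))
    mean = sum(taus) / n_sets
    sd = math.sqrt(sum((t - mean) ** 2 for t in taus) / (n_sets - 1))
    # COV is reported relative to |mean|; it is undefined when the mean is zero.
    return mean, sd, sd / abs(mean)
```

The pairwise loop costs O(N²) per bootstrap set, so Ns = 10,000 sets of N = 59 pairs may take some time in pure Python; the default `n_sets` here is deliberately smaller.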
The variation in sample mean and sample SD will lead to uncertainty in the distribution parameters underlying the four marginal distributions for parameter a. Similarly, Fig. 4(c) and (d) show the kernel smoothing density functions of sample mean and sample SD for parameter b, respectively. Like the results in Fig. 4(a) and (b), the bootstrap mean estimates of sample mean and sample SD agree well with those from the original data sets. Furthermore, considerable variation in sample mean and sample SD is observed because of the small sample size. For example, the bootstrap COV estimates of sample mean are 0.025, 0.038, 0.015 and 0.015 for D-NC, B-NC, D-C and B-C data sets, respectively. The corresponding values for sample SD are 0.134, 0.106, 0.072 and 0.127. The COVs for sample SD are much larger than those for sample mean.

Fig. 4(e) shows the kernel smoothing density function for the sample Kendall rank correlation coefficient between a and b. Like sample mean and sample SD, the bootstrap mean estimates of sample correlation coefficient agree well with those from the original data sets. The bootstrap COV estimates of sample correlation coefficient can also be obtained through bootstrapping. Similarly, significant variation in sample correlation coefficient due to a small sample size is observed. The bootstrap COV estimates of sample correlation coefficient are 0.185, 0.125, 0.058 and 0.078 for D-NC, B-NC, D-C and B-C data sets, respectively. These COVs tend to decrease as the sample size increases. The variation in sample correlation coefficient will lead to uncertainty in the copula parameters underlying the four copulas for (a, b).

4.4. Variation in AIC scores to identify distribution types using bootstrapping

Section 4.3 investigates the variation in sample mean, sample SD and sample Kendall rank correlation coefficient. In this section, the variation in AIC scores for identifying the best-fit marginal distributions and copulas is further studied.
Based on the Ns sets of bootstrap samples, the AIC scores associated with the selected marginal distributions and copulas are obtained for each bootstrap sample set. Then, the set of Ns bootstrap estimates of the AIC scores defines an empirical distribution of these values. Likewise, the kernel smoothing density is adopted to fit this empirical distribution. Figs. 5 and 6 present the probability density functions of the AIC scores associated with the four marginal distributions for parameters a and b, respectively. For comparison, the AIC scores calculated from the original data sets are also marked on the horizontal axis by vertical lines. Note that the bootstrap mean estimates of the AIC scores match well with those from the original data sets. The AIC scores associated with the four marginal distributions also exhibit large variation. For example, the bootstrap COV estimates of the AIC scores in Fig. 5(d) are 0.098, 0.138, 0.077 and 0.078 for TruncNormal, Lognormal, TruncGumbel and Weibull distributions, respectively. These values increase significantly to 0.293, 0.379, 0.925 and 0.280 in Fig. 6(b). The variation in the AIC scores will lead to uncertainty in the identification results of the best-fit marginal distributions for parameters a and b, which is discussed below.

As mentioned earlier, the best-fit marginal distribution can be determined from the AIC scores associated with the four candidate marginal distributions for each bootstrap sample set. Such a process produces Ns best-fit marginal distributions corresponding to the Ns bootstrap sample sets. To quantify the variation in identification results, the number of times that each candidate marginal distribution is identified as the best-fit distribution is summarized in Table 6. Note that no single distribution is identified as the best-fit distribution with a probability of one, owing to the small sample size underlying the four data sets for (a, b).
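The counting procedure behind Table 6 can be sketched as follows. To keep the example short, only two simplified candidates are used, an untruncated normal and a lognormal, both fitted by moments; this is an assumption made for illustration and differs from the four truncated and Weibull candidates in the paper. All names are hypothetical, and the data are assumed positive and not all identical within a resample.

```python
import math
import random
from collections import Counter

def mean_sd(values):
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def aic_normal(values):
    """Eq. (8) with k2 = 2 for an untruncated normal fitted by moments."""
    mu, sd = mean_sd(values)
    loglik = sum(-math.log(sd * math.sqrt(2 * math.pi)) - 0.5 * ((x - mu) / sd) ** 2
                 for x in values)
    return -2.0 * loglik + 4.0

def aic_lognormal(values):
    """Eq. (8) with k2 = 2 for a lognormal fitted by the moment relations of Table 2."""
    mu, sd = mean_sd(values)
    q2 = math.log(1.0 + (sd / mu) ** 2)
    p, q = math.log(mu) - 0.5 * q2, math.sqrt(q2)
    loglik = sum(-math.log(q * x * math.sqrt(2 * math.pi)) - 0.5 * ((math.log(x) - p) / q) ** 2
                 for x in values)
    return -2.0 * loglik + 4.0

def best_fit_counts(data, n_sets=10000, seed=0):
    """Count how often each candidate attains the smallest AIC over the bootstrap sets."""
    rng = random.Random(seed)
    candidates = {"Normal": aic_normal, "Lognormal": aic_lognormal}
    counts = Counter()
    n = len(data)
    for _ in range(n_sets):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        scores = {name: aic(resample) for name, aic in candidates.items()}
        counts[min(scores, key=scores.get)] += 1
    return counts
```

Dividing each count by `n_sets` gives the estimated probability that a candidate is selected as the best fit, analogous to the 3560/10,000 figure quoted for the Lognormal distribution.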
For parameter a, the most frequently identified best-fit marginal distribution is the Weibull distribution, except for the B-C data set. For parameter b, it is also the Weibull distribution, except for the D-NC data set. These results are consistent with the identification results from the original data sets, which indicates that the bootstrap method can preserve the main properties of the original data sets. Bootstrapping also yields the probability that each of the four candidate distributions is identified as the best fit. For example, for parameter b of the D-NC data set, the most frequently selected distribution is the Lognormal distribution, with a probability of 0.3560 (i.e., 3560/10,000). This probability is still below 50%. Such results indicate that there exists a significant uncertainty in the identified best-fit marginal distribution resulting from a small sample size, which should be properly taken into consideration to achieve a realistic reliability evaluation of piles, as illustrated in Section 4.5.

Fig. 4. Probability density functions of the sample statistics for different piles: (a) sample mean of parameter a; (b) sample SD of parameter a; (c) sample mean of parameter b; (d) sample SD of parameter b; (e) sample correlation coefficient between a and b. Each panel plots the probability density function for the D-NC (N = 28), B-NC (N = 30), D-C (N = 59) and B-C (N = 53) data sets.
Fig. 5. Probability density functions of the AIC scores associated with various marginal distributions for parameter a: (a) D-NC; (b) B-NC; (c) D-C; (d) B-C. Each panel plots the probability density function against the AIC score for the TruncNormal, Lognormal, TruncGumbel and Weibull distributions.
The probability density functions of the AIC scores associated with the four copulas are plotted in Fig. 7. Like the results in Fig. 6, the bootstrap mean estimates of the AIC scores match well with those from the original data sets. The AIC scores associated with the various copulas also exhibit large variation. For example, the bootstrap COV estimates of the AIC scores in Fig. 7(a) are 0.398, 0.410, 0.424 and 0.654 for the Gaussian, Plackett, Frank and No.16 copulas, respectively. Such variation in the AIC scores will lead to uncertainty in the identification results of the best-fit copulas for (a, b).

Similarly, the best-fit copula can be determined from the AIC scores associated with the four candidate copulas for each bootstrap sample set. This results in Ns best-fit copulas for the Ns bootstrap sample sets. Table 7 presents the number of times that each candidate copula is identified as the best-fit copula. It is observed that no single copula is identified as the best-fit copula with a probability of one. The copulas most frequently identified as the best fit are the Gaussian, Plackett, Frank and Plackett copulas for the D-NC, B-NC, D-C and B-C data sets, respectively. These results agree with the identification results from the original data sets in most cases. However, for the D-C data set, the best-fit copula identified from the original data set is the Plackett copula, whereas bootstrapping most often selects the Frank copula. This may be attributed to the fact that the AIC scores for the Frank and Plackett copulas calculated from the original data set are quite similar, as shown in Table 5. A judgment of the best-fit copula made without consideration of the uncertainty due to small sample size may therefore be unreliable. In this respect, the bootstrap method shows a clear advantage over the method based only on the AIC scores from the original data set. In summary, the uncertainty in the identified best-fit copula should also be properly incorporated into the reliability evaluation of piles, as illustrated in Section 4.5.

4.5. Practical application for serviceability limit state reliability analysis of piles

In the previous sections, a bootstrap method is used to characterize the uncertainty in the probabilistic models of parameters (a, b). The uncertainties in both the best-fit marginal distributions and the best-fit copulas are systematically characterized using the bootstrap method. This section further investigates the effect of the uncertainty in the probabilistic models of (a, b) on the computed serviceability limit state reliability of piles.
Fig. 6. Probability density functions of the AIC scores associated with various marginal distributions for parameter b: (a) D-NC; (b) B-NC; (c) D-C; (d) B-C. Each panel plots the probability density function against the AIC score for the TruncNormal, Lognormal, TruncGumbel and Weibull distributions.
Table 6
Numbers of times identified as the best-fit marginal distribution based on 10,000 bootstrap sample sets.

Data set   TruncNormal      Lognormal        TruncGumbel      Weibull
           a       b        a       b        a       b        a       b
D-NC       1131    2499     2129    3560*    1539    2774     5201*   1167
B-NC       2415    119      16      87       384     4        7185*   9790*
D-C        47      2828     1600    389      2365    9        5988*   6774*
B-C        644     1402     2964    287      5504*   33       888     8278*

Note: An asterisk marks the maximum number of times among the four marginal distributions.
4.5.1. Performance function for reliability of piles

In limit state design of piles, the ultimate limit state (ULS) (dealing with pile capacity) and the serviceability limit state (SLS) (dealing with pile settlement) are typically considered separately [42]. For the SLS, it is convenient to capture the uncertainties in two hyperbolic curve-fitting parameters for load-settlement curves as discussed previously. Thus, the probabilistic models of (a, b) developed in this study are used for SLS reliability analysis of piles. The SLS occurs when the pile head settlement y is equal to the allowable settlement ya. The pile exceeds serviceability if y > ya. Conversely, the pile is satisfactory if y < ya. These three situations can be described concisely by the following performance function
\[ g = y_a - y(Q), \tag{17} \]
Fig. 7. Probability density functions of the AIC scores associated with various copulas for different piles: (a) D-NC; (b) B-NC; (c) D-C; (d) B-C. Each panel plots the probability density function against the AIC score for the Gaussian, Plackett, Frank and No.16 copulas.
Table 7 Numbers of times identified as the best-fit copula based on 10,000 bootstrap sample sets. Gaussian
Plackett
Frank
No.16
1056
6
B-NC
7832 2786
1106
166
1529
4599 1787
2449
D-C
70
B-C
1073
5216
6614 2295
D-NC
1416
Note: The numbers of times are bold and underlined if they are maximum among the four copulas.
where y(Q) is the settlement for a given applied load Q. If the load corresponding to ya is Qa, an alternate expression of the performance function can be obtained as follows
\[ g = Q_a(y_a) - Q. \tag{18} \]
For convenience, the performance function shown in Eq. (18) is adopted for SLS reliability analysis of piles. The corresponding probability of failure, pf, can be expressed as
\[ p_f = P\left(Q_a(y_a) < Q\right). \tag{19} \]
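Eq. (19) can be estimated by direct Monte Carlo simulation once samples of Q_a(y_a) and Q are available. The sketch below is only a generic illustration: the independent lognormal distributions, their parameters and the function name `mc_probability_of_failure` are hypothetical and do not represent the copula-based model of (a, b) used in this study. Independent lognormals are chosen because the probability then has a closed form, which serves as a check on the simulation.

```python
from math import erf, log, sqrt

import numpy as np


def mc_probability_of_failure(qa_samples, q_samples):
    """Monte Carlo estimate of p_f = P(Q_a(y_a) < Q) from paired samples."""
    qa = np.asarray(qa_samples, dtype=float)
    q = np.asarray(q_samples, dtype=float)
    return float(np.mean(qa < q))


# Hypothetical independent lognormal allowable load and applied load.
MEDIAN_QA, SIGMA_QA = 1000.0, 0.3
MEDIAN_Q, SIGMA_Q = 500.0, 0.2

rng = np.random.default_rng(0)
n_samples = 200_000
qa = rng.lognormal(mean=log(MEDIAN_QA), sigma=SIGMA_QA, size=n_samples)
q = rng.lognormal(mean=log(MEDIAN_Q), sigma=SIGMA_Q, size=n_samples)
pf_mc = mc_probability_of_failure(qa, q)

# Closed form, valid only for independent lognormal variables:
# ln(Q_a) - ln(Q) is normal, so p_f = Phi(-beta).
beta = (log(MEDIAN_QA) - log(MEDIAN_Q)) / sqrt(SIGMA_QA**2 + SIGMA_Q**2)
pf_exact = 0.5 * (1.0 - erf(beta / sqrt(2.0)))

print(f"Monte Carlo p_f = {pf_mc:.4f}, closed-form p_f = {pf_exact:.4f}")
```

Repeating such an estimate for the probabilistic model identified from each bootstrap sample set would give a set of p_f values rather than a single number, which is the idea behind reporting a confidence interval for the probability of failure.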
Substituting Eq. (16) into Eq. (19) yields
pf ¼ P
ya Q