Stoch Environ Res Risk Assess (2013) 27:1039–1054 DOI 10.1007/s00477-012-0641-6
ORIGINAL PAPER
A new bivariate Gamma distribution generated from functional scale parameter with application to drought data
Muhammad Mohsin • Albrecht Gebhardt • Jürgen Pilz • Gunter Spöck
Published online: 5 September 2012 © Springer-Verlag 2012
Abstract Univariate and bivariate Gamma distributions are among the most widely used distributions in hydrological statistical modeling and applications. This article presents the construction of a new bivariate Gamma distribution which is generated from a functional scale parameter. The use of the proposed bivariate Gamma distribution for drought modeling is described by deriving the exact distributions of the inter-arrival time and the proportion of drought along with their moments, assuming that the lengths of drought duration (X) and non-drought duration (Y) jointly follow this bivariate Gamma distribution. The model parameters of this distribution are estimated by the maximum likelihood method and by an objective Bayesian analysis using the Jeffreys prior and a Markov Chain Monte Carlo method. These methods are applied to a real drought dataset from the State of Colorado, USA.

Keywords Gamma distribution · Maximum likelihood estimates · Likelihood ratio test · Bayesian estimates · Jeffreys prior · Drought modeling

M. Mohsin (corresponding author) · A. Gebhardt · J. Pilz · G. Spöck
Department of Statistics, Alpen-Adria University, 9020 Klagenfurt, Austria

1 Introduction

The man-induced changes and the erratic behaviour of the climate are the main threats to the water resource management
in the modern age. Rapid increase in the world population, deforestation, river regularization, soil sealing, low or absent precipitation, rising temperatures and the irregular distribution of resources cause many problems, of which water deficit or drought is a major one. Whatever the reasons for water deficit, drought can have serious health, social, economic and political impacts such as poverty, hunger, thirst, wildfire, disease, migration, land degradation and erosion, and it is thus becoming a big challenge for the future. It occurs all around the equator in hot and dry climatic regions and persists for long periods with far-reaching impacts. Mishra and Singh (2010, 2011) express their concerns about the frequently occurring droughts in recent years and their impacts because of the increasing demand for water and the variability in hydro-meteorological variables due to climate change. They also discuss different concepts of drought and various methodologies used for drought modeling, such as drought forecasting, probability based modeling, spatio-temporal analysis, the use of Global Climate Models (GCMs) for drought scenarios, land data assimilation systems for drought modeling, and drought planning. Significant improvement has been observed in drought modeling over time.

In this paper, we attempt to provide a new and flexible model for the inter-arrival time and the proportion of drought periods. Because the scale parameter is specified functionally, the proposed model can be modified according to the given circumstances, which is its main merit compared with other models and gives it a wide range of applications. Many authors use these types of models in different areas of hydrology. Hallack-Alegria and Watkins (2007) conduct a meteorological drought intensity–duration–frequency analysis based on annual and warm season precipitation records to estimate the return period for different years. Porporato et al. (2001) suggest a measure of vegetation water stress which combines the mean intensity and the duration of the drought period, i.e. the soil water deficit period. Dupuis (2010) develops a new model for dry period inter-arrival times and analyzes different duration characteristics of dry and wet periods based on monthly Palmer drought severity indices. Nadarajah (2008, 2009a) proposes bivariate Pareto and bivariate F-distributions to model drought data by deriving the exact distributions of the inter-arrival time of drought events, the magnitude of drought events and the proportion of drought events, respectively. Kim et al. (2006) propose a nonparametric method to estimate the joint distribution of drought properties. Song and Singh (2010) use a trivariate Plackett copula to derive the joint probability distribution of drought duration, severity and inter-arrival time, where the drought duration and the inter-arrival time each follow a Weibull distribution and the drought severity follows a Gamma distribution, based on stream flow data. Hao and Singh (2011) propose a method based on entropy theory to construct a bivariate distribution that is capable of modeling drought duration and severity with different marginal distributions.

Different forms of bivariate Gamma distributions are extensively used to model hydrological events such as floods, rain, precipitation and droughts. Clarke (1980) applies a bivariate Gamma distribution to an extension of stream flow records that are correlated with longer records of precipitation. Yue (2001) uses a bivariate Gamma distribution in multivariate flood frequency analysis. Nadarajah (2007) uses a bivariate Gamma model for drought data and derives the distributions of the inter-arrival time and the proportion. Yue et al. (2001) review various bivariate Gamma distribution models which are constructed from Gamma marginals and applied frequently to multivariate hydrological events. Cheng et al. (2010) demonstrate a frequency factor based approach for the stochastic simulation of a bivariate Gamma distribution which is capable of generating random sample pairs that reproduce the marginal densities of the random variables as well as their correlation coefficient. Some other types of univariate and bivariate Gamma distributions are used in the field of hydrology as well; see Husak et al. (2007), Prekopa and Szantai (1978), Nadarajah and Gupta (2006a, b), Nadarajah and Kotz (2006) and Loaiciga and Leipnik (2005).

The material of this paper is arranged as follows: The construction of the new bivariate Gamma distribution is presented in Sect. 2. Our main interest here is to model the distribution of the inter-arrival time of drought $X + Y$ and of the proportion of drought duration to the total length. The explicit expressions for the probability density function and the moments of $S = X + Y$ and $W = X/(X + Y)$ are derived in Sects. 3 and 4. An expression for the estimation of the model parameters by maximum likelihood (ML) is provided in Sect. 5. The estimation of the model parameters by Bayesian and ML methods using drought data for the Colorado state along
with their goodness of fit is presented in Sect. 6 and some concluding remarks are stated in Sect. 7.

In this paper, analytical expressions involve some special functions such as the gamma function $\Gamma(\cdot)$, the modified Bessel function of the second kind $K_a(\cdot)$ of order $a$ and the confluent hypergeometric function $U(\cdot)$ with parameters $a$ and $b$, which are defined as follows:

$$\Gamma(a) = \int_0^\infty w^{a-1} \exp(-w)\, dw,$$

$$K_a(x) = \frac{\sqrt{\pi}\, x^{a}}{2^{a}\, \Gamma\!\left(a + \tfrac{1}{2}\right)} \int_1^\infty (w^2 - 1)^{a - \frac{1}{2}} \exp(-xw)\, dw,$$

$$U(z; a; b) = \frac{1}{\Gamma(a)} \int_0^\infty w^{a-1} (1 + w)^{b-a-1} \exp(-zw)\, dw.$$
The details and properties of these special functions can be studied in Prudnikov et al. (1986).

2 Construction of the new bivariate Gamma distribution

In this section we derive the bivariate Gamma distribution as a compound distribution of two Gamma variates. The bivariate Gamma distribution is defined below: Let the random variable X have a Gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$. The probability density function then is:

$$f(x; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} \exp(-\beta x), \quad \alpha, \beta > 0;\ x > 0. \tag{1}$$

Suppose another random variable Y has a Gamma distribution with shape parameter $\gamma$ and scale parameter $\phi(x)$, where $\phi(x)$ is some function of X. The probability density function of Y then is:

$$f(y; \gamma, \phi(x)) = \frac{(\phi(x))^{\gamma}}{\Gamma(\gamma)}\, y^{\gamma-1} \exp(-\phi(x)\, y), \quad \gamma, \phi(x) > 0;\ y > 0. \tag{2}$$

The bivariate Gamma distribution is defined as the compound distribution of (1) and (2). The probability density function of the new bivariate distribution is thus given as:

$$f(x, y) = \frac{\beta^{\alpha} (\phi(x))^{\gamma}}{\Gamma(\alpha)\, \Gamma(\gamma)}\, x^{\alpha-1} y^{\gamma-1} \exp\bigl(-(\beta x + \phi(x)\, y)\bigr), \quad \alpha, \beta, \gamma, \phi(x) > 0;\ x, y > 0. \tag{3}$$
The bivariate probability distribution (3) can be used to produce several bivariate probability distributions depending on the choice of $\phi(x)$. Since there is an insignificant correlation between the drought duration and the non-drought duration, in this paper we use $\phi(x) = \delta/x$ in (3) to obtain the following new bivariate Gamma distribution:

$$f(x, y) = \frac{\beta^{\alpha} \delta^{\gamma}}{\Gamma(\alpha)\, \Gamma(\gamma)}\, x^{\alpha-\gamma-1} y^{\gamma-1} \exp\left(-\left(\beta x + \delta\, \frac{y}{x}\right)\right), \quad \alpha, \beta, \gamma, \delta > 0;\ x, y > 0. \tag{4}$$
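The compound construction behind (4) also suggests a direct way to simulate from it: draw X from the Gamma density (1) and then draw Y from (2) with $\phi(x) = \delta/x$. The following Python sketch illustrates this under the assumption that $\beta$ and $\phi(x)$ enter as rates, as the exponents in (1) and (2) indicate. The function name `sample_bivariate_gamma` and the parameter values in the usage lines are illustrative choices and do not come from the paper.

```python
import random


def sample_bivariate_gamma(alpha, beta, gamma, delta, n, seed=None):
    """Draw n pairs (x, y) from density (4) by compounding (1) and (2)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # random.gammavariate takes (shape, scale); the scale is 1 / rate.
        x = rng.gammavariate(alpha, 1.0 / beta)
        # Given X = x, Y is Gamma with shape gamma and rate phi(x) = delta / x.
        y = rng.gammavariate(gamma, x / delta)
        pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    draws = sample_bivariate_gamma(1.1, 0.1, 0.6, 0.2, n=20000, seed=1)
    mean_x = sum(x for x, _ in draws) / len(draws)
    mean_ratio = sum(y / x for x, y in draws) / len(draws)
    # The sample means should be roughly alpha / beta and gamma / delta.
    print(mean_x, mean_ratio)
```

Because Y given X = x is a Gamma variate with rate $\delta/x$, the ratio Y/X drawn this way is a Gamma variate with shape $\gamma$ and rate $\delta$ that does not depend on x, which is a convenient sanity check on simulated samples.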
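The marginal density of Y can be obtained from (4) by integrating it over x and is given as:

$$g(y) = \frac{\beta^{\alpha} \delta^{\gamma}}{\Gamma(\alpha)\, \Gamma(\gamma)}\, y^{\gamma-1} \int_0^\infty x^{\alpha-\gamma-1} \exp\left(-\left(\beta x + \delta\, \frac{y}{x}\right)\right) dx. \tag{5}$$

The integral in (5) can be calculated by using equation (2.3.16.1) in Prudnikov et al. (1986), yielding $g(y)$ as

$$g(y) = \frac{2\, \beta^{\frac{\alpha+\gamma}{2}}\, \delta^{\frac{\alpha+\gamma}{2}}}{\Gamma(\alpha)\, \Gamma(\gamma)}\, y^{\frac{\alpha+\gamma}{2} - 1}\, K_{\alpha-\gamma}\!\left(2\sqrt{\beta \delta y}\right), \quad \alpha, \beta, \gamma, \delta, y > 0, \tag{6}$$

where $K_{\alpha-\gamma}(\cdot)$ is the modified Bessel function of the second kind. The s-th moment of (6) is given as

$$E(Y^{s}) = \frac{\Gamma(\alpha + s)\, \Gamma(\gamma + s)}{(\beta \delta)^{s}\, \Gamma(\alpha)\, \Gamma(\gamma)}. \tag{7}$$

From this, the mean and variance of Y can easily be calculated. The p-th and q-th product moment of (4) is

$$E(X^{p} Y^{q}) = \frac{\Gamma(\alpha + p)\, \Gamma(\alpha + q)\, \Gamma(\gamma + q)}{\beta^{p+q}\, \delta^{q}\, (\Gamma(\alpha))^{2}\, \Gamma(\gamma)}. \tag{8}$$

It is observed that if $R = Y/X$, then X and R are independent. The probability density function of R is given as

$$f(r) = \frac{\delta^{\gamma}}{\Gamma(\gamma)}\, r^{\gamma-1} \exp(-\delta r), \quad \gamma, \delta > 0;\ r > 0.$$

For simplicity, the derived variables S and W can be written as $S = X(1 + R)$ and $W = 1/(1 + R)$. In addition to the proportion $R = Y/X$, the hydrologists are interested in the joint distribution of $(X, X/(X + Y))$, which shows the dependency between the drought duration and the proportion of the drought events. Nadarajah (2009b) and Porporato et al. (2001) use it quite effectively, which adds more value to the well known result that if X is Gamma then the second component of the joint distribution of $(X, X/(X + Y))$ follows the beta distribution. This not only highlights the dependency between the Gamma and beta distributions but also proves to be useful to study the duration and the proportion of drought. In this paper we apply the continuous bivariate Gamma distribution successfully to model the drought data where X (drought duration) and Y (non-drought duration) have positive discrete values. Applying the transformation $T = X$ and $W = X/(X + Y)$ in the proposed bivariate distribution (4) we get the joint pdf of T and W as

$$f(t, w) = \frac{\beta^{\alpha} \delta^{\gamma}}{\Gamma(\alpha)\, \Gamma(\gamma)}\, \frac{(1 - w)^{\gamma-1}\, t^{\alpha-1}}{w^{\gamma+1}} \exp\left(-\left(\beta t + \frac{\delta (1 - w)}{w}\right)\right), \quad t > 0;\ 0 < w < 1. \tag{9}$$

Equation (9) reflects that T and W are independent, where T follows the Gamma distribution and W follows some beta type distribution; see Theorem 2 below.

3 Probability density function of $S = X + Y$ and $W = X/(X + Y)$

In this section we derive the probability density function of the sum and the ratio when X and Y are distributed according to (4).

Theorem 1 If X and Y are jointly distributed according to (4), then the pdf of S is given as

$$f(s) = \frac{\beta^{\alpha} \delta^{\gamma}}{\Gamma(\alpha)} \sum_{i=0}^{\infty} \frac{(-\beta)^{i}}{i!}\, s^{\alpha+i-1}\, U(\delta; \gamma; \gamma - \alpha - i + 1), \quad 0 < s < \infty, \tag{10}$$

where $U(\delta; \gamma; \gamma - \alpha - i + 1)$ is the confluent hypergeometric function.

Proof Using the transformation $S = X + Y$ and $W = X/(X + Y)$ in (4), the joint pdf of S and W is given as

$$f(s, w) = \frac{\beta^{\alpha} \delta^{\gamma}}{\Gamma(\alpha)\, \Gamma(\gamma)}\, s^{\alpha-1} w^{\alpha-\gamma-1} (1 - w)^{\gamma-1} \exp\left(-\left(\beta s w + \frac{\delta (1 - w)}{w}\right)\right). \tag{11}$$

The pdf of S can be obtained as

$$\begin{aligned}
f(s) &= \frac{\beta^{\alpha} \delta^{\gamma}}{\Gamma(\alpha)\Gamma(\gamma)}\, s^{\alpha-1} \int_0^1 w^{\alpha-\gamma-1} (1 - w)^{\gamma-1} \exp\left(-\left(\beta s w + \frac{\delta (1 - w)}{w}\right)\right) dw \\
&= \frac{\beta^{\alpha} \delta^{\gamma}}{\Gamma(\alpha)\Gamma(\gamma)}\, s^{\alpha-1} \int_0^1 \left\{ \sum_{i=0}^{\infty} \frac{(-\beta s w)^{i}}{i!} \right\} w^{\alpha-\gamma-1} (1 - w)^{\gamma-1} \exp\left(-\frac{\delta (1 - w)}{w}\right) dw \\
&= \frac{\beta^{\alpha} \delta^{\gamma}}{\Gamma(\alpha)\Gamma(\gamma)} \sum_{i=0}^{\infty} \frac{(-\beta)^{i}}{i!}\, s^{\alpha+i-1} \int_0^1 w^{\alpha+i-\gamma-1} (1 - w)^{\gamma-1} \exp\left(-\frac{\delta (1 - w)}{w}\right) dw.
\end{aligned} \tag{12}$$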
Substituting $t = \frac{1}{w} - 1$ in (12), we get

$$f(s) = \frac{\beta^{\alpha} \delta^{\gamma}}{\Gamma(\alpha)\, \Gamma(\gamma)} \sum_{i=0}^{\infty} \frac{(-\beta)^{i}}{i!}\, s^{\alpha+i-1} \int_0^\infty \frac{t^{\gamma-1}}{(t + 1)^{\alpha+i}} \exp(-\delta t)\, dt. \tag{13}$$

Using the definition of the confluent hypergeometric function in (13), we obtain

$$f(s) = \frac{\beta^{\alpha} \delta^{\gamma}}{\Gamma(\alpha)} \sum_{i=0}^{\infty} \frac{(-\beta)^{i}}{i!}\, s^{\alpha+i-1}\, U(\delta; \gamma; \gamma - \alpha - i + 1), \quad 0 < s < \infty.$$

Theorem 2 If X and Y are jointly distributed according to (4), then the pdf of W is given as

$$g(w) = \frac{\delta^{\gamma}}{\Gamma(\gamma)}\, \frac{(1 - w)^{\gamma-1}}{w^{\gamma+1}} \exp\left(-\frac{\delta (1 - w)}{w}\right), \quad 0 < w < 1. \tag{14}$$

Proof Using (11), the marginal density of W can be obtained as

$$g(w) = \frac{\beta^{\alpha} \delta^{\gamma}}{\Gamma(\alpha)\, \Gamma(\gamma)}\, w^{\alpha-\gamma-1} (1 - w)^{\gamma-1} \exp\left(-\frac{\delta (1 - w)}{w}\right) \int_0^\infty s^{\alpha-1} \exp(-\beta s w)\, ds. \tag{15}$$

Solving the integral in (15), we get

$$g(w) = \frac{\delta^{\gamma}}{\Gamma(\gamma)}\, \frac{(1 - w)^{\gamma-1}}{w^{\gamma+1}} \exp\left(-\frac{\delta (1 - w)}{w}\right), \quad 0 < w < 1.$$

4 Moments of $S = X + Y$ and $W = X/(X + Y)$

In this section we derive the moments of $S = X + Y$ and $W = X/(X + Y)$ when X and Y are distributed according to (4).

Theorem 3 If X and Y are jointly distributed according to (4), then the q-th moment of S is given as

$$E(S^{q}) = \frac{1}{\beta^{q}\, (\Gamma(\alpha))^{2}\, \Gamma(\gamma)} \sum_{j=0}^{q} \binom{q}{j} \frac{\Gamma(\alpha + j)\, \Gamma(\alpha + q - j)\, \Gamma(\gamma + q - j)}{\delta^{q-j}}. \tag{16}$$

Proof Since X and R are independent,

$$E(X^{p} Y^{q}) = E(X^{p+q} R^{q}) = E(X^{p+q})\, E(R^{q}).$$

We know from (8) that the p-th and q-th product moment of (4) is

$$E(X^{p} Y^{q}) = \frac{\Gamma(\alpha + p)\, \Gamma(\alpha + q)\, \Gamma(\gamma + q)}{\beta^{p+q}\, \delta^{q}\, (\Gamma(\alpha))^{2}\, \Gamma(\gamma)}.$$

The proof of (16) can be completed by using (8) in the following expression

$$E(S^{q}) = E(X + Y)^{q} = \sum_{j=0}^{q} \binom{q}{j} E(X^{j})\, E(Y^{q-j}).$$

Theorem 4 If X and Y are jointly distributed according to (4), then the p-th order moments of W are given as

$$E(W^{p}) = \delta^{\gamma}\, U(\delta; \gamma; \gamma - p + 1). \tag{17}$$

Proof From (14), we have

$$E(W^{p}) = \frac{\delta^{\gamma}}{\Gamma(\gamma)} \int_0^1 w^{p-\gamma-1} (1 - w)^{\gamma-1} \exp\left(-\frac{\delta (1 - w)}{w}\right) dw. \tag{18}$$

Setting $t = \frac{1}{w} - 1$ in (18), we get

$$E(W^{p}) = \frac{\delta^{\gamma}}{\Gamma(\gamma)} \int_0^\infty \frac{t^{\gamma-1}}{(t + 1)^{p}} \exp(-\delta t)\, dt. \tag{19}$$

Using the definition of the confluent hypergeometric function gives the proof of (17).

5 Estimation of the parameters by maximum likelihood method

In this section we estimate the model parameters of the new bivariate Gamma distribution by using the maximum likelihood method. The log likelihood function of (4) is given as

$$L(\eta) = n \alpha \ln \beta + n \gamma \ln \delta - n \ln \Gamma(\alpha) - n \ln \Gamma(\gamma) + (\alpha - \gamma - 1) \sum_{i=1}^{n} \ln(x_i) + (\gamma - 1) \sum_{i=1}^{n} \ln(y_i) - \beta \sum_{i=1}^{n} x_i - \delta \sum_{i=1}^{n} \frac{y_i}{x_i}. \tag{20}$$

The partial derivatives of (20) with respect to $\alpha, \beta, \gamma, \delta$, equated to zero, are given as

$$\frac{\partial L(\eta)}{\partial \alpha} = n \ln \beta - n\, \psi(\alpha) + \sum_{i=1}^{n} \ln(x_i) = 0, \tag{21}$$

where $\psi(\cdot)$ is the first derivative of the log gamma function (also called Psi or digamma function),

$$\frac{\partial L(\eta)}{\partial \beta} = \frac{n \alpha}{\beta} - \sum_{i=1}^{n} x_i = 0, \tag{22}$$

$$\frac{\partial L(\eta)}{\partial \gamma} = n \ln \delta - n\, \psi(\gamma) - \sum_{i=1}^{n} \ln(x_i) + \sum_{i=1}^{n} \ln(y_i) = 0, \tag{23}$$

$$\frac{\partial L(\eta)}{\partial \delta} = \frac{n \gamma}{\delta} - \sum_{i=1}^{n} \frac{y_i}{x_i} = 0. \tag{24}$$

Solving the above nonlinear equations (21)–(24) numerically, we get the estimated values $\hat{\alpha}, \hat{\beta}, \hat{\gamma}$ and $\hat{\delta}$. The first derivatives with respect to $\alpha$ and $\beta$ contain only the values of $x_i$ and depend neither on $y_i$ nor on $\gamma$ and $\delta$. Therefore the ML estimates $\hat{\alpha}$ and $\hat{\beta}$ are the same as those of the univariate Gamma distribution discussed by Johnson et al. (1994). The Fisher information matrix $Q(\hat{\eta})$ can be obtained by taking second derivatives of the log-likelihood function (20). The matrix $Q(\hat{\eta})$ with corresponding entries is given in the Appendix. Asymptotic confidence intervals can be constructed on the basis of the approximate normal distribution of the ML estimate $\hat{\eta} = (\hat{\alpha}, \hat{\beta}, \hat{\gamma}, \hat{\delta})$ with standard errors

$$\sigma(\hat{\alpha}, \hat{\beta}, \hat{\gamma}, \hat{\delta}) = \sqrt{(i, i)\ \text{entry in}\ [Q(\hat{\eta})]^{-1}}.$$

An approximate $(1 - \xi)100\,\%$ confidence interval, e.g. for $\alpha$, is constructed as

$$\hat{\alpha} \pm z\, \sigma(\hat{\alpha}),$$

where $z$ is the $(1 - \xi/2)$ quantile of the standard normal distribution.
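Because (21) and (22) involve only the $x_i$, while (23) and (24) involve only the ratios $y_i/x_i$, the four likelihood equations split into two univariate Gamma fits. The Python sketch below illustrates this for the division 1 durations listed in Table 1 and also evaluates the normal-approximation interval $\hat{\alpha} \pm z\,\sigma(\hat{\alpha})$. It is not the authors' implementation (Sect. 6.2 reports the R package maxLik); the helper names `digamma`, `fit_gamma`, `fit_bivariate_gamma` and `wald_interval`, the series approximation of $\psi$ and the bisection solver are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

# Division 1 durations in months, transcribed from Table 1 (cases 1 to 56).
NON_DROUGHT = [51, 1, 10, 4, 4, 6, 5, 5, 18, 3, 193, 1, 72, 16,
               5, 61, 36, 7, 16, 5, 7, 16, 1, 2, 5, 8, 2, 21,
               1, 18, 3, 1, 2, 20, 5, 4, 21, 3, 1, 11, 6, 1,
               12, 6, 40, 10, 29, 1, 4, 3, 1, 11, 1, 14, 3, 12]
DROUGHT = [3, 2, 7, 14, 1, 9, 5, 7, 3, 4, 4, 6, 76, 9,
           3, 10, 26, 59, 5, 3, 6, 33, 2, 2, 18, 6, 6, 7,
           13, 29, 5, 2, 14, 22, 5, 4, 1, 6, 1, 1, 7, 14,
           7, 5, 9, 1, 4, 6, 3, 32, 1, 5, 8, 11, 10, 5]


def digamma(x):
    """psi(x) for x > 0, using psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return shift + series


def fit_gamma(values):
    """ML estimates (shape, rate) of a univariate Gamma sample."""
    n = len(values)
    mean = sum(values) / n
    # Eliminating the rate gives log(a) - psi(a) = log(mean) - mean(log v).
    target = math.log(mean) - sum(math.log(v) for v in values) / n
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        # log(a) - psi(a) decreases in a, so the root lies above mid here.
        if math.log(mid) - digamma(mid) > target:
            lo = mid
        else:
            hi = mid
    shape = math.sqrt(lo * hi)
    return shape, shape / mean


def fit_bivariate_gamma(x, y):
    """Solve (21)-(24): one Gamma fit for x and one for the ratios y / x."""
    alpha, beta = fit_gamma(x)
    gamma, delta = fit_gamma([yi / xi for xi, yi in zip(x, y)])
    return alpha, beta, gamma, delta


def wald_interval(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return estimate - z * std_error, estimate + z * std_error


if __name__ == "__main__":
    # Compare with the division 1 column of Table 5.
    print(fit_bivariate_gamma(DROUGHT, NON_DROUGHT))
    print(wald_interval(1.071, 0.179))
```

The standard error passed to `wald_interval` in the usage line is the value reported in Table 5; computing it from scratch would require the information matrix given in the Appendix, which this sketch does not reproduce.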
6 Application: drought data

In this section we use drought data from the state of Colorado, USA to estimate the model parameters of (4) by means of Bayesian and ML methods. The drought data are available at http://www.ncdc.noaa.gov/oa/climate/onlineprod/drought/xmgr.html and contain the monthly modified Palmer Drought Severity Index (PDSI) from January 1895 to December 2010. The PDSI is based on the idea of the balance between moisture supply and demand. Since the PDSI is used to quantify the long-term drought conditions for a given location and time, it is appropriate for the current discrete dataset. Alley (1984) describes the PDSI as a meteorological drought index used to measure dryness based on precipitation and temperature. He finds that the PDSI gives the spatial and temporal representation of historical droughts along with the historical perspective of the current weather conditions. Willeke et al. (1994) regard the PDSI as the most useful monitoring and measuring tool for soil moisture conditions and for starting or ending drought contingency plans. Kogan (1995) and Hu and Willson (2000) state that the PDSI is widely used to monitor drought in the United States. Palmer arbitrarily selects a classification scale varying from -6.0 to +6.0 depending upon the moisture conditions. Alley (1984) modifies the Palmer scale to range from -4.0 to +4.0, where -0.49 to +0.49 refers to near-normal moisture conditions. Yevjevich (1967) and Guerrero-Salazar and Yevjevich (1975) introduce the statistical theory of runs, which proposes that drought takes place when the value of the PDSI is less than 0. Therefore we take 0 as a threshold below which the values indicate drought conditions and above which the values indicate wet conditions with relative intensities. Changing the threshold value from 0 to any other number may affect the results.

The state of Colorado has five climate divisions which are numbered and named as 1 (Arkansas Drainage), 2 (Colorado Drainage), 3 (Kansas Drainage), 4 (Platte Drainage) and 5 (Rio Grande Drainage). These five climate divisions are marked by the US Bureau of Census and Climate Data Centre, as can be seen from the web site: http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/regional_monitoring/CLIM_DIVS/. Figure 1 shows the geographical map of these five divisions. For illustrative purposes, the real data on drought duration and non-drought duration for climate division 1 are given in Table 1. The descriptive statistics of the five divisions are exhibited in Table 2. We obtain data on drought duration and non-drought duration for each climate division by using the PDSI data, and our focus is to

1. estimate the model parameters of (4) by Bayesian and ML methods.
2. determine the distribution of the inter-arrival time of drought (S) = drought duration + non-drought duration.
3. determine the distribution of the proportion of drought (W) = drought duration/(drought duration + non-drought duration).

Fig. 1 The five climate divisions of the state of Colorado

Table 1 Drought duration and non-drought duration data for Colorado climate division 1

| Case | Non-drought duration (months) | Drought duration (months) | Case | Non-drought duration (months) | Drought duration (months) |
|---|---|---|---|---|---|
| 1 | 51 | 3 | 29 | 1 | 13 |
| 2 | 1 | 2 | 30 | 18 | 29 |
| 3 | 10 | 7 | 31 | 3 | 5 |
| 4 | 4 | 14 | 32 | 1 | 2 |
| 5 | 4 | 1 | 33 | 2 | 14 |
| 6 | 6 | 9 | 34 | 20 | 22 |
| 7 | 5 | 5 | 35 | 5 | 5 |
| 8 | 5 | 7 | 36 | 4 | 4 |
| 9 | 18 | 3 | 37 | 21 | 1 |
| 10 | 3 | 4 | 38 | 3 | 6 |
| 11 | 193 | 4 | 39 | 1 | 1 |
| 12 | 1 | 6 | 40 | 11 | 1 |
| 13 | 72 | 76 | 41 | 6 | 7 |
| 14 | 16 | 9 | 42 | 1 | 14 |
| 15 | 5 | 3 | 43 | 12 | 7 |
| 16 | 61 | 10 | 44 | 6 | 5 |
| 17 | 36 | 26 | 45 | 40 | 9 |
| 18 | 7 | 59 | 46 | 10 | 1 |
| 19 | 16 | 5 | 47 | 29 | 4 |
| 20 | 5 | 3 | 48 | 1 | 6 |
| 21 | 7 | 6 | 49 | 4 | 3 |
| 22 | 16 | 33 | 50 | 3 | 32 |
| 23 | 1 | 2 | 51 | 1 | 1 |
| 24 | 2 | 2 | 52 | 11 | 5 |
| 25 | 5 | 18 | 53 | 1 | 8 |
| 26 | 8 | 6 | 54 | 14 | 11 |
| 27 | 2 | 6 | 55 | 3 | 10 |
| 28 | 21 | 7 | 56 | 12 | 5 |

Since model (4) is based on the assumption that X and Y/X are independent, drought duration and non-drought
duration are used for the analysis. In general terms, one physical interpretation of this assumption is that the ratio of non-drought duration to drought duration is independent of the drought duration. This assumption is checked by applying Pearson's correlation test for independence to the observed values of drought duration and of non-drought duration/drought duration. Table 3 displays the p values of this test. None of the p values is below 0.05, so the test gives no evidence of an association between X and R for any of the five climate divisions of Colorado state.

Table 2 Descriptive statistics for Colorado PDSI data

| Climate division | Number of droughts | Drought frequency (number/year) | Mean drought duration (months) | Standard deviation of drought duration (months) | Mean non-drought duration (months) | Standard deviation of non-drought duration (months) |
|---|---|---|---|---|---|---|
| 1 | 56 | 0.483 | 10.125 | 13.555 | 14.732 | 28.449 |
| 2 | 55 | 0.474 | 12.127 | 13.916 | 13.127 | 24.077 |
| 3 | 60 | 0.517 | 9.550 | 15.201 | 13.150 | 23.741 |
| 4 | 45 | 0.380 | 11.244 | 15.014 | 19.222 | 38.766 |
| 5 | 42 | 0.362 | 16.191 | 30.118 | 16.976 | 27.175 |

Table 3 Pearson's correlation test for independence between drought duration and non-drought duration/drought duration

| Climate division | p value |
|---|---|
| 1 | 0.184 |
| 2 | 0.148 |
| 3 | 0.161 |
| 4 | 0.237 |
| 5 | 0.221 |
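The independence check summarized in Table 3 can be retraced for division 1 from the durations in Table 1. The Python sketch below computes Pearson's correlation between the drought duration X and the ratio R = Y/X and a two-sided p value from the Student t distribution with n - 2 degrees of freedom. It is an illustration rather than the authors' code; the function names and the Simpson-rule evaluation of the t distribution are assumptions of this sketch.

```python
import math

# Division 1 durations in months, transcribed from Table 1 (cases 1 to 56).
NON_DROUGHT = [51, 1, 10, 4, 4, 6, 5, 5, 18, 3, 193, 1, 72, 16,
               5, 61, 36, 7, 16, 5, 7, 16, 1, 2, 5, 8, 2, 21,
               1, 18, 3, 1, 2, 20, 5, 4, 21, 3, 1, 11, 6, 1,
               12, 6, 40, 10, 29, 1, 4, 3, 1, 11, 1, 14, 3, 12]
DROUGHT = [3, 2, 7, 14, 1, 9, 5, 7, 3, 4, 4, 6, 76, 9,
           3, 10, 26, 59, 5, 3, 6, 33, 2, 2, 18, 6, 6, 7,
           13, 29, 5, 2, 14, 22, 5, 4, 1, 6, 1, 1, 7, 14,
           7, 5, 9, 1, 4, 6, 3, 32, 1, 5, 8, 11, 10, 5]


def pearson_r(u, v):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p value of a Student t statistic via Simpson's rule (steps even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    central = total * h / 3.0          # P(0 < T < |t|)
    return 1.0 - 2.0 * central


def independence_test(x, y):
    """Correlation of x with the ratio y / x, its t statistic and p value."""
    ratio = [yi / xi for xi, yi in zip(x, y)]
    r = pearson_r(x, ratio)
    n = len(x)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t, t_two_sided_p(t, n - 2)


if __name__ == "__main__":
    # Compare the p value with the division 1 row of Table 3.
    print(independence_test(DROUGHT, NON_DROUGHT))
```

Pearson's test only detects linear association, so a large p value here is consistent with independence of X and R but does not establish it.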
6.1 Bayesian estimation of model parameters using MCMC

Bayesian methodology combines prior information with the sample information to make inference. The Bayesian posterior density is proportional to the prior density times the likelihood. The objective Bayesian approach in this paper uses the so-called non-informative Jeffreys prior. A Markov Chain Monte Carlo (MCMC) simulation with a Gaussian proposal distribution, as implemented in the R package MCMCpack (Martin et al. 2011), is used to estimate the model parameters from the posterior distribution. We use 520,000 MCMC iterations with a burn-in of 20,000 values, and in the remaining chain only every 100th sample is retained in order to get 5,000 approximately independent samples. Convergence is monitored by the trace plots for each parameter. The posterior distributions of the parameters are approximately Gaussian.

Fig. 2 Shapes of the posterior distributions of the model parameters $\alpha$, $\beta$, $\gamma$ and $\delta$ for climate division 1 (a trace plot and a posterior density panel for each parameter)

The variance–covariance matrix V of the Gaussian proposal distribution is controlled by the "tune" parameter. The variance–covariance matrix is calculated as:
$$V = T\, (Q)^{-1}\, T,$$

where T, a diagonal positive definite matrix, is formed from the "tune" values and Q is the approximate Hessian of the function. For all estimations we set the tuning value for the variance–covariance matrix V to 1.5. Here, we show only the shape of the posterior distributions of the model parameters along with the trace plots for climate division 1. For the other four divisions the posterior distributions are also approximately Gaussian, and the chains in the trace plots seem to be quite stable and well mixing. The acceptance rates of the Metropolis–Hastings sampling algorithm for all five climate divisions are approximately 21 % each. Figure 2 shows the shape of the posterior distributions of the parameters along with the trace plots for climate division 1. Table 4 shows the summary statistics of the posterior distributions for the parameters of the proposed bivariate distribution (4) using the Jeffreys prior.

Table 4 Summary statistics of the posterior distributions for $\alpha$, $\beta$, $\gamma$ and $\delta$

| Climate division | Parameter | Mean | Median | Standard deviation | 25th percentile | 75th percentile | 95 % credible interval |
|---|---|---|---|---|---|---|---|
| 1 | $\alpha$ | 1.124 | 1.115 | 0.189 | 0.996 | 1.240 | (0.797, 1.510) |
| 1 | $\beta$ | 0.113 | 0.111 | 0.024 | 0.096 | 0.127 | (0.072, 0.162) |
| 1 | $\gamma$ | 0.586 | 0.580 | 0.093 | 0.524 | 0.645 | (0.425, 0.774) |
| 1 | $\delta$ | 0.189 | 0.186 | 0.045 | 0.159 | 0.216 | (0.113, 0.281) |
| 2 | $\alpha$ | 1.215 | 1.201 | 0.207 | 1.071 | 1.341 | (0.851, 1.652) |
| 2 | $\beta$ | 0.102 | 0.010 | 0.021 | 0.087 | 0.115 | (0.066, 0.147) |
| 2 | $\gamma$ | 0.484 | 0.479 | 0.076 | 0.431 | 0.531 | (0.351, 0.643) |
| 2 | $\delta$ | 0.155 | 0.152 | 0.039 | 0.128 | 0.178 | (0.091, 0.238) |
| 3 | $\alpha$ | 0.875 | 0.869 | 0.139 | 0.782 | 0.959 | (0.630, 1.156) |
| 3 | $\beta$ | 0.093 | 0.092 | 0.020 | 0.080 | 0.106 | (0.059, 0.135) |
| 3 | $\gamma$ | 0.483 | 0.480 | 0.073 | 0.435 | 0.528 | (0.355, 0.632) |
| 3 | $\delta$ | 0.126 | 0.124 | 0.030 | 0.105 | 0.145 | (0.075, 0.188) |
| 4 | $\alpha$ | 1.123 | 1.105 | 0.211 | 0.978 | 1.253 | (0.770, 1.558) |
| 4 | $\beta$ | 0.102 | 0.010 | 0.024 | 0.085 | 0.116 | (0.061, 0.153) |
| 4 | $\gamma$ | 0.454 | 0.452 | 0.078 | 0.401 | 0.503 | (0.315, 0.617) |
| 4 | $\delta$ | 0.100 | 0.098 | 0.028 | 0.081 | 0.117 | (0.054, 0.159) |
| 5 | $\alpha$ | 0.682 | 0.673 | 0.127 | 0.596 | 0.761 | (0.462, 0.950) |
| 5 | $\beta$ | 0.044 | 0.043 | 0.011 | 0.036 | 0.051 | (0.025, 0.067) |
| 5 | $\gamma$ | 0.588 | 0.581 | 0.108 | 0.512 | 0.656 | (0.399, 0.812) |
| 5 | $\delta$ | 0.184 | 0.180 | 0.050 | 0.150 | 0.215 | (0.101, 0.286) |

6.2 ML-estimation of model parameters

The maximum likelihood estimates, with standard errors based on the inverse information matrix and 95 % confidence intervals (based on the normal approximation) of the model parameters, along with the negative logarithm of the maximized likelihood (NL) values for the drought duration (X) and non-drought duration (Y), are exhibited in Table 5 for the five climate divisions. The Newton–Raphson algorithm implemented in the R package maxLik (Henningsen and Toomet 2011) is used to maximize the likelihood. It is observed that the MCMC estimates and their standard deviations are quite similar to those obtained by the ML method, which reflects the stability of the estimates of the model parameters. We note that although the standard errors of the ML estimates are small, the estimates seem to differ across the divisions. So it is important to check whether the distribution of drought duration and non-drought duration given in (4) is the same across the five
climate divisions or not.

Table 5 Estimated parameters by ML method

| Estimates | Climate division 1 | Climate division 2 | Climate division 3 | Climate division 4 | Climate division 5 |
|---|---|---|---|---|---|
| $\hat{\alpha}$ | 1.071 | 1.160 | 0.839 | 1.063 | 0.645 |
| $\hat{\beta}$ | 0.106 | 0.096 | 0.088 | 0.095 | 0.040 |
| $\hat{\gamma}$ | 0.563 | 0.465 | 0.466 | 0.434 | 0.558 |
| $\hat{\delta}$ | 0.177 | 0.143 | 0.117 | 0.091 | 0.168 |
| SE($\hat{\alpha}$) | 0.179 | 0.197 | 0.133 | 0.198 | 0.119 |
| SE($\hat{\beta}$) | 0.022 | 0.020 | 0.019 | 0.022 | 0.011 |
| SE($\hat{\gamma}$) | 0.089 | 0.073 | 0.070 | 0.074 | 0.102 |
| SE($\hat{\delta}$) | 0.042 | 0.036 | 0.028 | 0.026 | 0.046 |
| Lower 95 % C.I. for $\alpha$ | 0.720 | 0.774 | 0.578 | 0.675 | 0.412 |
| Upper 95 % C.I. for $\alpha$ | 1.422 | 1.546 | 1.100 | 1.451 | 0.878 |
| Lower 95 % C.I. for $\beta$ | 0.063 | 0.057 | 0.051 | 0.052 | 0.018 |
| Upper 95 % C.I. for $\beta$ | 0.149 | 0.135 | 0.125 | 0.138 | 0.062 |
| Lower 95 % C.I. for $\gamma$ | 0.389 | 0.322 | 0.329 | 0.289 | 0.358 |
| Upper 95 % C.I. for $\gamma$ | 0.737 | 0.608 | 0.603 | 0.579 | 0.758 |
| Lower 95 % C.I. for $\delta$ | 0.095 | 0.072 | 0.062 | 0.040 | 0.078 |
| Upper 95 % C.I. for $\delta$ | 0.259 | 0.214 | 0.172 | 0.142 | 0.258 |
| NL | -398.318 | -406.924 | -414.365 | -338.438 | -319.366 |

If this assumption is fulfilled, we can pool the drought data of all five divisions. Since the distribution (4) is based on four parameters, we state the null and alternative hypotheses as

$$H_0:\ \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \alpha,\quad \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta,\quad \gamma_1 = \gamma_2 = \gamma_3 = \gamma_4 = \gamma_5 = \gamma,\quad \delta_1 = \delta_2 = \delta_3 = \delta_4 = \delta_5 = \delta$$

versus

$$H_1:\ \text{not all } \alpha_i \text{ are equal, not all } \beta_i \text{ are equal, not all } \gamma_i \text{ are equal, not all } \delta_i \text{ are equal.}$$

For this purpose we carry out a likelihood ratio test (LRT). First we combine the drought data of all five divisions. Under the null hypothesis we find the ML estimates $\hat{\alpha} = 0.901$, $\hat{\beta} = 0.078$, $\hat{\gamma} = 0.487$ and $\hat{\delta} = 0.132$ with standard errors $SE(\hat{\alpha}) = 0.069$, $SE(\hat{\beta}) = 0.008$, $SE(\hat{\gamma}) = 0.035$ and $SE(\hat{\delta}) = 0.015$ for the joint distribution of drought duration and non-drought duration. The value of the negative logarithm of the maximized likelihood (NL) for the pooled data of all five divisions is 1887.508. The NL value without the restriction imposed by the null hypothesis is 1877.411 (the sum of the magnitudes in the last row of Table 5). Under $H_0$, the LRT statistic, which is twice the difference between these two NL values, has an approximate chi-square distribution with 8 degrees of freedom. Thus,

$$2\,(1887.508 - 1877.411) = 20.194 > \chi^2_{8,\,0.05} = 15.507,$$

and therefore we reject the null hypothesis that the five divisions of Colorado are homogeneous with respect to the joint distribution of drought duration and non-drought duration. Hence, we conclude that there is a significant difference among the five divisions for the drought data, so we cannot pool them. The contour plot of (4) for the drought duration and non-drought duration for climate division 1, using its ML estimates, is presented in Fig. 3.

Fig. 3 Contour plot of the fitted pdf (4) for drought duration and non-drought duration for climate division 1

Next we check whether (4) provides an adequate fit for the drought data for all five climate divisions.
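The arithmetic behind the pooling decision above can be retraced in a few lines. The Python sketch below takes the NL magnitudes from the last row of Table 5, the pooled NL value 1887.508 and the critical value 15.507 exactly as reported in the text; the variable and function names are illustrative assumptions, and the critical value is copied from the paper rather than recomputed.

```python
# NL magnitudes per division (last row of Table 5) and for the pooled fit.
NL_BY_DIVISION = [398.318, 406.924, 414.365, 338.438, 319.366]
NL_POOLED = 1887.508
# Chi-square critical value quoted in the text (8 degrees of freedom, 5 % level).
CHI2_CRITICAL = 15.507


def likelihood_ratio_statistic(nl_restricted, nl_unrestricted):
    """Twice the difference of the negative maximized log-likelihoods."""
    return 2.0 * (nl_restricted - nl_unrestricted)


nl_separate = sum(NL_BY_DIVISION)            # unrestricted fit: one model per division
lrt = likelihood_ratio_statistic(NL_POOLED, nl_separate)
reject_pooling = lrt > CHI2_CRITICAL
print(round(nl_separate, 3), round(lrt, 3), reject_pooling)
# prints: 1877.411 20.194 True
```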
The adequacy of the fit of (4) can be checked by probability plots. In a probability plot, the observed probability is plotted against the predicted probability for the fitted model. To check the goodness of fit of (4) for the bivariate distribution of drought duration (X) and non-drought duration (Y) we draw two probability plots for X and Y, respectively. This can be done by plotting $F_X(x_i)$ versus $(i - 0.375)/(n + 0.25)$ as recommended by Blom (1958) and Chambers et al. (1983), where $F_X(\cdot)$ represents the marginal cumulative distribution function (CDF) of X corresponding to (4) and $x_i$ are the values of X sorted in ascending order. Similarly, we check the goodness of fit of (4) for Y by plotting $F_Y(y_i)$ versus $(i - 0.375)/(n + 0.25)$, where $F_Y(\cdot)$ represents the marginal CDF of Y corresponding to (4) and $y_i$ are the values of Y sorted in ascending order. However, the CDF of Y is not available in closed form, so we use numerical solvers to obtain the quantiles for the probability plot of Y. Figure 4a–e shows the probability plots for X and Y for the five climate divisions, respectively. The fit appears to be reasonably good for the drought duration X, which follows the standard Gamma distribution. However, the fit for the non-drought duration Y appears to be questionable and the respective results should be used cautiously. The reason might be that the non-drought duration datasets vary relatively more than the drought duration datasets. Another possible reason is that the corresponding density function is fairly complicated.

Fig. 4 Probability plots of the marginal X and Y density of (4) for the observed values of the drought duration (X) and non-drought duration (Y) dataset for climate a division 1, b division 2, c division 3, d division 4, e division 5

Further, the probability density functions (pdfs) of the inter-arrival time S and the proportion of drought W are fitted by using Eqs. (10) and (14) with their respective ML estimates for all five divisions of Colorado. These fitted distributions are compared to the histograms of the inter-arrival time S and the proportion of drought W of the five divisions. The fitted pdf for the inter-arrival time S approximately follows the general pattern of the histogram and looks reasonably well fitted. However, the fitted pdf for the proportion W is debatable, as it deviates somewhat from the general pattern of the histogram, and the corresponding results should therefore be used cautiously. Figure 5a–e shows the histograms along with their respective fitted pdfs for the five divisions.

Fig. 5 Fitted values of the pdf (10) of S = X + Y and (14) of W = X/(X + Y), where X is drought duration and Y is non-drought duration for the climate a division 1, b division 2, c division 3, d division 4, e division 5

Further, our proposed model can be compared with other existing bivariate models. Of these existing models, Nadarajah (2009b) uses a three-parameter bivariate distribution to model the drought duration and non-drought duration. We compare our model with Nadarajah's (2009b) model on the basis of the negative log likelihood value (NL) and Akaike's information criterion (AIC), using the same drought data of the Colorado State. The results of this comparison are given in Table 6.

Table 6 Model comparison

| Divisions | Our proposed model: NL | Our proposed model: AIC | Nadarajah's (2009b) model: NL | Nadarajah's (2009b) model: AIC |
|---|---|---|---|---|
| 1 | -398.318 | -788.636 | -392.804 | -779.608 |
| 2 | -406.924 | -805.848 | -388.401 | -770.802 |
| 3 | -414.365 | -820.73 | -405.240 | -804.480 |
| 4 | -338.438 | -668.876 | -332.033 | -658.066 |
| 5 | -319.366 | -630.732 | -314.025 | -622.05 |
Table 7 Return period for (a) drought duration, (b) non-drought duration, (c) inter-arrival time

Division   3-Years   5-Years   10-Years   25-Years   50-Years   100-Years
(a)
1            3.958     9.054     15.831     24.692     31.356      38.000
2            3.767     8.869     15.649     24.511     31.176      37.820
3            4.647     9.724     16.492     25.347     32.009      38.651
4            1.478     6.679     13.497     22.379     29.053      35.702
5            0.956     6.196     13.023     21.911     28.587      35.238
(b)
1            2.834    14.018     41.513     98.836    157.776     230.001
2            2.577    13.467     40.584     97.422    156.001     227.867
3            3.860    16.099     44.954    104.030    164.277     237.801
4            0.427     7.779     30.431     81.553    135.875     203.517
5            0.186     6.727     28.389     78.252    131.637     198.339
(c)
1            9.453    25.182     55.878    117.286    179.513     255.027
2            8.950    24.516     54.874    115.798    177.633     252.803
3           11.306    27.654     59.606    122.798    186.323     263.153
4            3.314    17.235     43.795     98.894    156.457     227.393
5            2.117    15.757     41.547     95.370    151.989     221.994
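Each entry of Table 7 is the value at which a fitted CDF reaches the level 1 - 1/(N T) of Eqs. (25)–(27). The following is a minimal sketch of that inversion for a Gamma-distributed duration, using only the Python standard library. The function names, the parameter values and the drought frequency N used in the example are hypothetical and are not the estimates behind Table 7; the series-based CDF and the bisection solver are illustrative choices, not the authors' implementation.

```python
import math


def gamma_cdf(x: float, shape: float, scale: float) -> float:
    """Gamma CDF via the power series of the regularized incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape          # first series term, 1/a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (shape + n)  # next term: z^n / (a (a+1) ... (a+n))
        total += term
        n += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def return_level(shape, scale, events_per_year, years, tol=1e-10):
    """Solve F(x) = 1 - 1/(N*T) for x by bisection."""
    p = 1.0 - 1.0 / (events_per_year * years)
    if not 0.0 < p < 1.0:
        raise ValueError("N*T must exceed 1 for a valid non-exceedance probability")
    lo, hi = 0.0, scale
    while gamma_cdf(hi, shape, scale) < p:  # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs: shape 2, scale 3 (months), N = 0.5 droughts per year.
for T in (10, 25, 50, 100):
    print(T, round(return_level(2.0, 3.0, 0.5, T), 3))
```

With shape 1 the Gamma distribution reduces to the exponential distribution, whose quantiles are known in closed form, which gives a simple check of the solver.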
Table 8 Estimated percentile for (a) inter-arrival time and (b) the proportion of drought for five climate divisions of Colorado

(a) Inter-arrival time
p      Division 1   Division 2   Division 3   Division 4   Division 5
0.05        1.344        1.781        0.706        1.517        0.500
0.10        2.681        3.401        1.659        3.052        1.486
0.15        4.094        5.066        2.782        4.693        2.840
0.20        5.613        6.829        4.069        6.477        4.542
0.25        7.262        8.725        5.534        8.439        6.601
0.30        9.073       10.794        7.201       10.625        9.047
0.35       11.082       13.079        9.104       13.088       11.928
0.40       13.334       15.638       11.292       15.903       15.313
0.45       15.893       18.545       13.834       19.171       19.301
0.50       18.841       21.903       16.823       23.033       24.029
0.55       22.295       25.857       20.399       27.698       29.694
0.60       26.426       30.619       24.766       33.472       36.585
0.65       31.485       36.514       30.239       40.828       45.141
0.70       37.873       44.055       37.328       50.512       56.064
0.75       46.243       54.102       46.903       63.745       70.544
0.80       57.761       68.173       60.521       82.649       90.777
0.85       74.726       89.252       81.370      111.413      121.301
0.90      102.690      124.538      117.152      160.160      173.404
0.95      161.250      199.529      195.724      265.589      278.965

(b) Proportion of drought
p      Division 1   Division 2   Division 3   Division 4   Division 5
0.05        0.012        0.018        0.031        0.051        0.014
0.10        0.088        0.099        0.134        0.170        0.096
0.15        0.182        0.186        0.231        0.268        0.192
0.20        0.267        0.264        0.311        0.344        0.278
0.25        0.342        0.330        0.377        0.407        0.352
0.30        0.407        0.388        0.433        0.459        0.417
0.35        0.464        0.438        0.482        0.504        0.473
0.40        0.514        0.483        0.525        0.543        0.523
0.45        0.560        0.525        0.564        0.579        0.567
0.50        0.602        0.563        0.599        0.612        0.608
0.55        0.640        0.599        0.633        0.642        0.646
0.60        0.676        0.633        0.664        0.672        0.681
0.65        0.711        0.665        0.695        0.700        0.715
0.70        0.744        0.698        0.724        0.727        0.748
0.75        0.776        0.730        0.754        0.755        0.779
0.80        0.809        0.763        0.784        0.783        0.811
0.85        0.842        0.780        0.816        0.813        0.844
0.90        0.878        0.836        0.851        0.847        0.879
0.95        0.919        0.884        0.895        0.889        0.920
Table 6 shows that both criteria take their minimum values for our proposed model, compared with Nadarajah’s (2009b) model, in all five divisions of the State of Colorado, which indicates that the proposed model fits better than Nadarajah’s (2009b) model.
Estimation of the return period of drought events is another important feature of drought analysis. Bonaccorso et al. (2003) argue that the return period is an important statistic for characterizing drought and that it provides information about improvements in water system management under dry conditions. The return period is defined in different ways depending on the application. Lloyd (1970), Loaicigica and Mariño (1991) and Shiau and Shen (2001) define the return period as the average elapsed time between occurrences of the critical event. On the other hand, Vogel (1987), Bras (1990) and Douglas et al. (2002) define it as the average number of trials required until the first occurrence of the critical event. Haan (1977) considers the return period of a variable a standard criterion in water resources system planning and management. In this paper, the return period defined in terms of drought duration (X), non-drought duration (Y) and inter-arrival time (S) is given as

$$F_X(x) = 1 - \frac{1}{N_x T_x}, \tag{25}$$

$$F_Y(y) = 1 - \frac{1}{N_y T_y}, \tag{26}$$

$$F_S(s) = 1 - \frac{1}{N_s T_s}, \tag{27}$$
respectively, where F(·) denotes the CDF of the corresponding random variable X, Y or S, T is the return period and N is the drought frequency per year. The values obtained from (25)–(27) for T = 3, 5, 10, 25, 50 and 100 years are given in Table 7a–c. Many important results can be read from Table 7a–c. For instance, a drought expected to occur once every 10 years has a duration of about 16 months in climate divisions 1, 2 and 3 and about 13 months in climate divisions 4 and 5. Table 7b and c can be interpreted in the same way. Similarly, Shiau (2003) defines the return period for two hydrological variables either by the joint return period of X and Y or by the conditional return period of X given Y, or vice versa. Examples are the drought duration exceeding a specific value together with the non-drought duration exceeding another specific value, i.e. (X > x, Y > y), or the inter-arrival time of drought exceeding a specific value given that the drought duration has exceeded a threshold, i.e. (S > s, X > x). Because these relationships can be applied to various combinations of two hydrological variables, different combinations can result in the same return period. Finally, we provide some quantiles z_p associated with the CDFs of (10) and (14), respectively. These quantiles are computed numerically by solving the equations

$$\int_0^{z_p} f(s)\,ds = p \tag{28}$$

and

$$\int_0^{z_p} f(w)\,dw = p. \tag{29}$$
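Equations (28) and (29) are one-dimensional root-finding problems in the upper integration limit. The sketch below shows the idea with a composite Simpson rule for the integral and plain bisection for the root, using only the Python standard library. The names `simpson`, `quantile` and `placeholder_pdf` are hypothetical, and the placeholder density is a simple polynomial on (0, 1) chosen only so that the result can be checked; it is not the density of Eq. (10) or Eq. (14).

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number of panels n."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def quantile(pdf, p, lower, upper, tol=1e-9):
    """Find z in [lower, upper] such that the integral of pdf from lower to z equals p."""
    def g(z):
        return simpson(pdf, lower, z) - p

    lo, hi = lower, upper
    if g(lo) > 0 or g(hi) < 0:
        raise ValueError("p is not bracketed on [lower, upper]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Placeholder density on (0, 1); NOT the paper's Eq. (14).
def placeholder_pdf(w):
    return 12.0 * w * (1.0 - w) ** 2


for p in (0.25, 0.50, 0.75):
    print(p, round(quantile(placeholder_pdf, p, 0.0, 1.0), 4))
```

For a variable on the positive half-line, such as the inter-arrival time S, the upper end of the bracket has to be chosen large enough that the integral up to it exceeds p.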
We use the function uniroot in R for the numerical solution of Eqs. (28) and (29). Table 8a–b provides some important numerical values of z_p obtained with the ML estimates for each climate division. We hope these will be useful for environmental scientists and practitioners.

7 Conclusion

Drought is a multivariate phenomenon, so its characteristics are better explained by deriving a joint distribution. We have presented a new bivariate Gamma distribution generated from a functional scale parameter to model the inter-arrival time and the proportion of drought from drought duration and non-drought duration data. In some situations the standard distributions do not work properly because of the random and uncertain behaviour of extreme events. An appropriate choice of the functional relationship φ(x) makes this class of distributions very flexible, so that it can be adapted to the given circumstances. In this paper the choice φ(x) = d/x is justified by the insignificant correlation between the drought duration and the non-drought duration. This flexibility further suggests extending this class to multivariate distributions by the same procedure. An application of the bivariate Gamma distribution to drought data from the State of Colorado is presented. The proposed bivariate Gamma distribution (4) seems to be a reasonable model for the drought data. We derived the explicit distributions of the inter-arrival time (S) and the proportion of drought (W) when the drought duration (X) and the non-drought duration (Y) follow the proposed bivariate distribution, and checked their fit to the observed data. The distribution of S provides an adequate fit to the observed drought data for all five climate divisions, whereas the fit of the distribution of W is questionable, so the corresponding results should be used carefully.
We also estimated the parameters of the proposed bivariate distribution by MCMC based on the Jeffreys prior and by the maximum likelihood method. These estimates are quite stable and have small standard deviations. The fitted bivariate Gamma distribution is used to estimate the return periods for 3, 5, 10, 25, 50 and 100 years for the drought duration, the non-drought duration and the inter-arrival time. These estimates can support water resource management in future planning.
Acknowledgments The authors are thankful to the Associate Editor and the two referees for their valuable comments and suggestions, which significantly helped to improve the paper. The first author is also thankful to the Higher Education Commission of Pakistan for its financial support of this project.
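Appendix

The Fisher information matrix Q(\hat{\eta}) is given by:

$$
Q(\hat{\eta}) = -E\begin{bmatrix}
\dfrac{\partial^2 L(\eta)}{\partial a^2} & \dfrac{\partial^2 L(\eta)}{\partial a\,\partial b} & 0 & 0 \\
\dfrac{\partial^2 L(\eta)}{\partial a\,\partial b} & \dfrac{\partial^2 L(\eta)}{\partial b^2} & 0 & 0 \\
0 & 0 & \dfrac{\partial^2 L(\eta)}{\partial c^2} & \dfrac{\partial^2 L(\eta)}{\partial c\,\partial d} \\
0 & 0 & \dfrac{\partial^2 L(\eta)}{\partial c\,\partial d} & \dfrac{\partial^2 L(\eta)}{\partial d^2}
\end{bmatrix}
= n\begin{bmatrix}
\psi'(a) & \dfrac{1}{b} & 0 & 0 \\
\dfrac{1}{b} & \dfrac{a}{b^2} & 0 & 0 \\
0 & 0 & \psi'(c) & \dfrac{1}{d} \\
0 & 0 & \dfrac{1}{d} & \dfrac{c}{d^2}
\end{bmatrix}
$$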
Here ψ'(·) is the first derivative of the psi (digamma) function, also called the trigamma function.

References

Alley WM (1984) The Palmer Drought Severity Index: limitations and assumptions. J Clim Appl Meteorol 23:1100–1109
Blom G (1958) Statistical estimates and transformed beta-variables. Wiley, New York
Bonaccorso B, Cancelliere A, Rossi G (2003) An analytical formulation of return period of drought severity. Stoch Environ Res Risk Assess 17:157–174
Bras RL (1990) Hydrology: an introduction to hydrologic science. Addison-Wesley, Reading
Chambers J, Cleveland W, Kleiner B, Tukey P (1983) Graphical methods for data analysis. Chapman and Hall, London
Cheng KS, Hou JC, Liou JJ, Wu YC, Chiang JL (2010) Stochastic simulation of bivariate Gamma distribution: a frequency-factor based approach. Stoch Environ Res Risk Assess 25(2):107–122
Clarke RT (1980) Bivariate gamma distribution for extending annual stream flow records from precipitation: some large sample results. Water Resour Res 16:863–870
Douglas EM, Vogel RM, Kroll CN (2002) Impact of streamflow persistence on hydrologic design. J Hydrol Eng 7(3):220–227
Dupuis DJ (2010) Statistical modeling of the monthly Palmer drought severity index. J Hydrol Eng 15(10):796–808
Guerrero-Salazar P, Yevjevich V (1975) Analysis of drought characteristics by the theory of runs. Hydrology Paper Nr. 80, Colorado State University, Fort Collins
Haan CT (1977) Statistical methods in hydrology. Iowa State University Press, Ames
Hallack-Alegria M, Watkins DW Jr (2007) Annual and warm season drought Intensity–Duration–Frequency analysis for Sonora, Mexico. J Clim 20(9):1897–1909
Hao Z, Singh VP (2011) Bivariate drought analysis using entropy theory. 2011 Symposium on Data-Driven Approaches to Droughts, Paper 43. http://docs.lib.purdue.edu/ddad2011/43
Henningsen A, Toomet O (2011) maxLik: a package for maximum likelihood estimation in R. J Comput Stat 26:443–458
Hu Q, Willson GD (2000) Effects of temperature anomalies on the Palmer Drought Severity Index in the central United States. Int J Climatol 20:1899–1911
Husak JG, Michaelsen J, Funk C (2007) Use of the gamma distribution to represent monthly rainfall in Africa for drought monitoring applications. Int J Climatol 27:935–944
Johnson NL, Kotz S, Balakrishnan N (1994) Continuous univariate distributions, vol 1. Wiley, New York
Kim TW, Valdes JB, Yoo C (2006) Nonparametric approach for bivariate drought characterization using Palmer Drought Index. J Hydrol Eng 11(2):134–143
Kogan FN (1995) Droughts of the late 1980s in the United States as derived from NOAA polar-orbiting satellite data. Bull Am Meteor Soc 76:655–668
Lloyd EH (1970) Return period in the presence of persistence. J Hydrol 10(3):202–215
Loaiciga HA, Leipnik RB (2005) Correlated gamma variables in the analysis of microbial densities in water. Adv Water Resour 28:329–335
Loaicigica M, Mariño MA (1991) Recurrence interval of geophysical events. J Water Resour Plan Manag 117(3):367–382
Martin AD, Quinn KM, Park JH (2011) MCMCpack: Markov Chain Monte Carlo in R. J Stat Softw 42(9):1–21
Mishra AK, Singh VP (2010) A review of drought concepts. J Hydrol 391:202–216
Mishra AK, Singh VP (2011) Drought modeling: a review. J Hydrol 403:157–175
Nadarajah S (2007) A bivariate gamma model for drought. Water Resour Res 43:W08501. doi:10.1029/2006WR005641
Nadarajah S (2008) The bivariate F distribution with application to drought data. Statistics 42(6):535–546
Nadarajah S (2009a) A bivariate Pareto model for drought. Stoch Environ Res Risk Assess 23:811–822
Nadarajah S (2009b) A bivariate distribution with gamma and beta marginals with application to drought data. J Appl Stat 36(3):277–301
Nadarajah S, Gupta AK (2006a) Cherian’s bivariate gamma distribution as a model for drought data. Agrociencia 40:483–490
Nadarajah S, Gupta AK (2006b) Intensity-duration models based on bivariate gamma distribution. Hiroshima Math J 36:387–395
Nadarajah S, Kotz S (2006) A note on the correlated gamma distribution of Loaiciga and Leipnik. Adv Water Resour 30:1053–1055
Porporato A, Laio F, Ridolfi L, Rodriguez-Iturbe I (2001) Plants in water-controlled ecosystem: active role in hydrologic processes and response to water stress: III. Vegetation water stress. Adv Water Resour 24:725–744
Prekopa A, Szantai T (1978) New multivariate gamma distribution and its fitting to empirical stream flow data. Water Resour Res 14:19–24
Prudnikov AP, Brychkov YA, Marichev OI (1986) Integrals and series, vols 1–3. Gordon and Breach Science Publishers, Amsterdam
Shiau JT (2003) Return period of bivariate distributed hydrological events. Stoch Environ Res Risk Assess 17:42–57
Shiau J, Shen HW (2001) Recurrence analysis of hydrologic droughts of differing severity. J Water Res Plan Manag 127(1):30–40
Song SB, Singh VP (2010) Frequency analysis of droughts using the Plackett copula and parameter estimation by genetic algorithm. Stoch Environ Res Risk Assess 24:783–805
Vogel RM (1987) Reliability indices for water supply systems. J Water Resour Plan Manag 113(4):645–654
Willeke G, Hosking JRM, Wallis JR, Guttman NB (1994) The National Drought Atlas. Institute for Water Resources Rep. 94-NDS-4. U.S. Army Corps of Engineers, Fort Belvoir, VA, 587 pp
Yevjevich V (1967) An objective approach to definitions and investigations of continental hydrologic drought. Hydrol. Pap., 23, Colorado State University, Fort Collins
Yue S (2001) A bivariate gamma distribution for use in multivariate flood frequency analysis. Hydrol Process 15:1033–1045
Yue S, Ouarda TBMJ, Bobee B (2001) A review of bivariate Gamma distribution for hydrological application. J Hydrol 246:1–18