Testing Parameters of a Gamma Distribution for Small Samples

Supplemental materials for this article are available through the TECH web page at http://www.amstat.org/publications.

Dulal K. BHAUMIK, Kush KAPUR, and Robert D. GIBBONS
Center for Health Statistics
Department of Psychiatry and Division of Epidemiology and Biostatistics
University of Illinois at Chicago
Chicago, IL 60612 ([email protected])

The gamma distribution is relevant to numerous areas of application in the physical, environmental, and biological sciences. The focus of this paper is on testing the shape, scale, and mean of the gamma distribution. Testing the shape parameter of the gamma distribution is relevant to failure time modeling where it can be used to determine if the failure rate is constant, increasing, or decreasing. Testing the scale parameter is also relevant to problems in survival analysis, where, when the shape parameter κ = 1, the reciprocal of the scale parameter measures the hazard function. Finally, testing the mean of the gamma distribution allows us to determine if the average concentration of an environmental contaminant is higher, lower, or equivalent to a health-based standard. In this paper, we first derive new small sample-based tests and then, via simulation, we study the Type I error rate and statistical power of these tests. Results of these simulation studies reveal that in terms of maintaining Type I error rate, the new tests perform extremely well as long as the shape parameter is not too small, and even then the results are only slightly conservative. We illustrate the new tests using three real datasets taken from the fields of engineering, medicine, and environmental science. This article has supplementary material online.

KEY WORDS: Gamma distribution; Log-Gamma; Power function; Wilson–Hilferty approximation.

1. INTRODUCTION

Gamma distributions are routinely used for the analysis of data with right-skewed distributions. The gamma distribution has been applied in numerous disciplines including engineering, finance, climatology, and environmental science. Davis (1952) and Barlow and Proschan (1965) pointed out the importance of a gamma distribution for the failure times of complex systems under continuous repair and maintenance. Das (1955), Stephenson et al. (1999), and Aksoy (2000) have used the gamma distribution to model the amount of daily rainfall in a region. Bain, Engelhardt, and Shiue (1984) proposed approximate tolerance limits for a gamma distribution for the purpose of finding lower tolerance limits for the endurance of deep-groove ball bearings. Bhaumik and Gibbons (2006) developed simultaneous statistical prediction limits for a gamma-distributed random variable for environmental monitoring problems. Krishnamoorthy, Mathew, and Mukherjee (2008) developed prediction and tolerance intervals for gamma-distributed random variables using the Wilson–Hilferty (1931) approximation in the context of stress-strength reliability problems. For industrial quality control problems, in which endpoints often have long-tailed distributions, gamma distributions are used to determine whether engineering processes are in control, and have been shown to lead to increased power over normal alternatives. The use of a gamma distribution is more appropriate than a normal distribution when variability and concentration are related, as they are in the case of many environmental constituents (Gibbons and Coleman 2001, pp. 34–47). For small sample sizes, when the distributional properties cannot be easily verified, routine use of the normal distribution is often misleading. Taken as a whole, gamma distributions are potentially

quite useful for applications in numerous fields, including but not limited to environmental monitoring, genetic research, and industrial quality control. Over the years, researchers have mainly focused on two distinct aspects related to gamma distributions. The first focus is on estimation, testing, and construction of confidence intervals of the parameters of gamma distributions. The second focus is on construction of prediction and tolerance limits for a gamma random variable. In this article, our interest is in developing tests for parameters or a function of parameters of a two parameter gamma distribution. Suppose x follows a gamma distribution with shape parameter κ and scale parameter θ, and denote it by x ∼ G(κ, θ). The density function of x is

$$f(x) = \frac{e^{-x/\theta}\, x^{\kappa-1}}{\Gamma(\kappa)\,\theta^{\kappa}}.$$
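As a small numerical companion to the density of G(κ, θ), the following Python sketch evaluates f(x) on the log scale. The function name `gamma_pdf` is illustrative and does not come from the article; only the standard library is assumed.

```python
import math

def gamma_pdf(x: float, kappa: float, theta: float) -> float:
    """Density of G(kappa, theta): shape kappa > 0, scale theta > 0."""
    if x <= 0.0:
        return 0.0
    # Work with logarithms so that large shape values do not overflow Gamma(kappa).
    log_f = (-x / theta + (kappa - 1.0) * math.log(x)
             - math.lgamma(kappa) - kappa * math.log(theta))
    return math.exp(log_f)

# With kappa = 1 the density reduces to the exponential density exp(-x/theta)/theta.
print(gamma_pdf(2.0, 1.0, 0.5), math.exp(-2.0 / 0.5) / 0.5)
```

The exponential special case printed at the end is the κ = 1 situation that the shape test in this article takes as its null hypothesis.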

In the following three sections we motivate our statistical developments of three tests with some important applications. Testing the shape parameter: The shape parameter (κ) of a gamma distribution plays an important role in various fields. For example, in renewal theory, to model times of occurrence of events, engineers focus attention on the shape parameter. In reliability theory and survival analysis the physical interpretation of testing the null hypothesis κ = 1 against the left-sided alternative κ < 1 is that under the null hypothesis the failure rate is constant whereas it is decreasing (DFR) under the alternative hypothesis within the gamma family. On the other hand,


© 2009 American Statistical Association and the American Society for Quality TECHNOMETRICS, AUGUST 2009, VOL. 51, NO. 3 DOI 10.1198/TECH.2009.07038


the failure rate is increasing (IFR) in case of the right-sided alternative hypothesis. Researchers have paid special attention in testing κ = 1 as it provides a test of exponentiality against gamma alternatives (see Keating, Glaser, and Ketchum 1990). The second motivation for this test relates to testing the coefficient of variation in connection to measuring the efficiency of gears, blades, and deep-groove ball bearings of heavy engines (Bain, Engelhardt, and Shiue 1984). The third motivation arises from using transformed gamma variables for constructing prediction and tolerance intervals (see Aryal et al. 2008; Krishnamoorthy, Mathew, and Mukherjee 2008). Hence, an appropriate testing procedure of the shape parameter is necessary. Testing the scale parameter: In reliability and life testing when κ = 1, the reciprocal of the scale parameter measures the constant failure rate known as the hazard function. Testing of the scale parameter plays an important role in manufacturing for comparing the failure rate of a new design or system with that of the existing system. In survival analysis the hazard function provides instantaneous potential for an event to occur given survival up to a fixed time. In the general case, the reciprocal of the scale parameter measures the asymptotic value of the failure rate function (see Barlow and Proschan 1965, p. 13). Hence, testing the scale parameter is related to testing the limiting failure rate. In addition, testing equality of scale parameters is crucial for studying the additive property of multiple independent gamma variables, to generate beta-prime, beta, or Dirichlet distributions (Johnson, Kotz, and Balakrishnan 1994, p. 350). Testing the mean: In environmental statistics an important problem is to compare the average of a small number of potentially contaminated measurements to a regulatory standard, usually health-based in nature. 
This type of one-sample test, often implemented by constructing a confidence interval and comparing the lower confidence limit to the fixed standard, is widely used in environmental monitoring applications (see Gibbons and Coleman 2001, pp. 204–217). Another important problem in environmental statistics is to compare the average of a small number of potentially impacted measurements with a larger collection of background measurements. The distributions of the analytes of concern are generally right skewed and gamma distributions are appropriate for analyzing these types of data (see Bhaumik and Gibbons 2006 and Krishnamoorthy, Mathew, and Mukherjee 2008). T-tests based on bootstrapping or other permutation-based methods frequently used in this context are generally inadequate due to a typically small number of measurements. A small sample based test for the mean exploiting the gamma distribution of the analytes is critical for this type of analysis. The motivation for the current work is to develop a valid testing strategy that is robust to violation of the assumption of normality, yet retains good statistical power and Type I error rates for small samples. Potential applications of these methods are widespread in quality control, survival analysis, reliability theory, and environmental science. In what follows, we first explore various exact and approximate results related to gamma random variables. In Section 3 based on a result by Bain and Engelhardt (1975) we study the performance of a test developed for the shape parameter and compare its power with the test of Keating, Glaser, and Ketchum (1990) via simulation. In Section 4 we construct a test for the scale parameter and compare


the performance of this test with the Engelhardt and Bain (1977) test. In Section 5 we propose two tests for the mean, one of which is based on the Wilson–Hilferty (1931) approximation. All the tests proposed in this article are easy to compute and remain valid for small samples as long as the shape parameter is not too small. For every testing problem we numerically compute the Type I error rates using Monte Carlo simulation. We illustrate these methods with three examples in Section 6.

2. STATISTICAL FOUNDATION

Let x1, x2, ..., xn be an independent random sample of size n drawn from a gamma population to estimate the unknown parameters. Denote the arithmetic and geometric means based on this random sample by $\bar{x}$ and $\tilde{x}$, respectively. Let U be the ratio of the geometric mean to the arithmetic mean, and Rn = ln(1/U). In the following we provide some foundational results that will be used in the subsequent sections. For results (1)–(5) related to a gamma distribution we refer to Glaser (1976a) and Bain and Engelhardt (1975).

Results
1. $\bar{x}$ and U are jointly sufficient and complete statistics for the gamma distribution.
2. The distribution of U does not depend on θ.
3. $\bar{x}$ and U are statistically independent.
4. 2nκRn is approximately distributed as $c\chi^2_\nu$. The values of c and ν depend on n and κ. For κ > 2, the distribution of 2nκRn can be approximated by a chi-square distribution with degrees of freedom (df) n − 1.
5. Let X = n$\bar{x}$. Then X ∼ G(nκ, θ).

The maximum likelihood estimators (MLE) of θ and κ, denoted by $\hat\theta$ and $\hat\kappa$, are solutions to the following equations:

$$R_n = \ln(\hat\kappa) - \psi(\hat\kappa) \quad\text{and}\quad \hat\kappa\,\hat\theta = \bar{x}, \tag{1}$$

where ψ denotes the digamma (Euler's psi) function, which is the derivative of the logarithm of the gamma function, that is, $\psi(\kappa) = \frac{d\ln\Gamma(\kappa)}{d\kappa} = \frac{\Gamma'(\kappa)}{\Gamma(\kappa)}$. Denote the mean and variance of x ∼ G(κ, θ) by E(x) and V(x), respectively; then

$$E(x) = \kappa\theta \quad\text{and}\quad V(x) = \kappa\theta^2. \tag{2}$$
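The likelihood equations in (1) have no closed-form solution for the shape estimate, so a numerical root finder is needed. The sketch below is one possible approach, not the authors' implementation: it assumes a bisection search on a wide bracket and a hand-written digamma function (recurrence plus asymptotic series), because the Python standard library does not provide one. The names `digamma` and `gamma_mle` are illustrative. The data are the 29 failure intervals analyzed in Section 6, for which the article reports estimates of 1.67 for κ and 49.981 for θ.

```python
import math

def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 via recurrence and an asymptotic series."""
    shift = 0.0
    while x < 6.0:                      # psi(x) = psi(x + 1) - 1/x
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return shift + math.log(x) - 0.5 / x - series

def gamma_mle(xs):
    """Return (kappa_hat, theta_hat, R_n) solving R_n = ln(k) - psi(k), k * theta = mean."""
    n = len(xs)
    mean = sum(xs) / n
    r_n = math.log(mean) - sum(math.log(x) for x in xs) / n   # ln(arithmetic / geometric mean)
    lo, hi = 1e-6, 1e6                  # assumed search bracket for kappa_hat
    for _ in range(200):
        mid = math.sqrt(lo * hi)        # bisect on the log scale
        if math.log(mid) - digamma(mid) > r_n:
            lo = mid                    # ln(k) - psi(k) decreases in k, so the root is larger
        else:
            hi = mid
    kappa_hat = math.sqrt(lo * hi)
    return kappa_hat, mean / kappa_hat, r_n

# Times between successive A/C failures (hours), as analyzed in Section 6.
AC_HOURS = [90, 10, 60, 186, 61, 49, 14, 24, 56, 20, 79, 84, 44, 59, 29,
            118, 25, 156, 310, 76, 26, 44, 23, 62, 130, 208, 70, 101, 208]
kappa_hat, theta_hat, r_n = gamma_mle(AC_HOURS)
print(kappa_hat, theta_hat, r_n)
```

Bisection is used here only because ln(κ) − ψ(κ) is monotone decreasing in κ, which makes the search simple to verify; a Newton iteration would converge faster.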

For more results on the gamma distribution we refer to Johnson, Kotz, and Balakrishnan (1994, pp. 337–414).

3. TESTING THE SHAPE PARAMETER

Engelhardt and Bain (1977) developed a uniformly most powerful unbiased test for κ based on U when θ is unknown in testing the simple null hypothesis against the simple alternative hypothesis. Glaser (1973, 1976a) derived the exact density function of U and showed that this density is related to Bartlett's (1937a) test statistic for testing the homogeneity of variances. Glaser's derivation for the density function of U involves various powers of − ln(U). However, his expression provides a conservative radius of convergence, and convergence takes place


for all U in the interval [exp(−2π/n), 1]. Using Glaser's expression, one can compute the upper critical values while testing right-sided alternatives for κ. Both Glaser (1980) and Keating, Glaser, and Ketchum (1990) observed that determination of lower critical values for left-sided alternative hypotheses is highly computationally intensive with a very slow rate of convergence, and the method often fails to produce lower critical values. In this context, we investigate the performance of a test constructed for κ using result (4). As the distribution of Rn does not depend on θ [see result (2)], without any loss of generality we can assume that θ = 1. Let the null hypothesis be H01 : κ = κ0, and the alternative hypothesis be Ha1 : κ > κ0. The test statistic for H01 is

$$T_1 = 2n\kappa_0 R_n, \tag{3}$$

where $T_1 \sim c\chi^2_\nu$ approximately. To determine the values of c and ν we use the following equations:

$$2n\kappa_0 E(R_n) = c\nu, \qquad (2n\kappa_0)^2 \operatorname{Var}(R_n) = 2c^2\nu. \tag{4}$$

To obtain the constants c and ν we generate 100,000 random samples of size n from G(κ0, 1) and numerically compute the mean and variance of Rn and use them in the previous equations (see supplementary file for a function to compute critical values using R statistical software package available at the Technometrics webpage). We reject H01 if $T_1 < c\chi^2_{\nu,\alpha}$, where $\chi^2_{\nu,\beta}$ is the (β)th percentile point of a chi-square distribution with df ν. Note that the distribution of Rn depends only on the parameter κ. Hence under the null hypothesis, when κ = κ0, its distribution does not depend on any unknown quantities. The percentile point $\chi^2_{\nu,\beta}$ depends only on the known quantity κ0. In this regard, θ plays absolutely no role in testing the shape parameter. When κ0 > 2, using result (4) we claim that 2nκ0Rn follows approximately a chi-square distribution with df (n − 1). Hence, the critical value of the test statistic T1 is global and it does not depend on any parameter as long as the chi-square approximation with df (n − 1) is used when κ0 > 2. An alternative approach to obtain the critical value is to use the empirical distribution of Rn, that is, the 95th percentile point of 2nκ0Rn.

An extensive Monte Carlo simulation study based on 1 million samples indicates that this test performs extremely well in terms of controlling simulated Type I error rates for all values of n = 3, 5, 8, 10, 15, 20 and κ > 1. Even for κ = 0.25 and for a small value of n = 3, the simulated Type I error rate did not exceed 0.065 when the nominal value was fixed at 5%. But such small values of κ are rarely encountered in practice. Next, we compared the simulated Type I error rates of T1 with those of the likelihood ratio test (LRT) for the same values of n and κ = 0.25(0.25)3.0. These results are reported in Table 1. Table 1 reveals a poor performance of the LRT in terms of controlling the Type I error rates for all values of κ and n, with increasingly poor performance when n decreases. By contrast, T1 performed well for all values of κ and n.
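The moment-matching step in Equation (4) can be sketched in Python as follows. Solving the two equations gives c = Var(T1)/(2E(T1)) and ν = 2E(T1)²/Var(T1). This is a minimal illustration, not the R function from the article's supplementary file; the function names, the seed, and the reduced number of replicates (chosen for speed) are assumptions.

```python
import math
import random

def moment_match(mean_t: float, var_t: float):
    """Solve c * nu = mean_t and 2 * c**2 * nu = var_t for (c, nu)."""
    c = var_t / (2.0 * mean_t)
    nu = 2.0 * mean_t ** 2 / var_t
    return c, nu

def simulate_c_nu(n: int, kappa0: float, reps: int = 5000, seed: int = 1):
    """Estimate (c, nu) for T1 = 2 n kappa0 R_n from samples of G(kappa0, 1)."""
    rng = random.Random(seed)
    t1_values = []
    for _ in range(reps):
        xs = [rng.gammavariate(kappa0, 1.0) for _ in range(n)]
        r_n = math.log(sum(xs) / n) - sum(math.log(x) for x in xs) / n
        t1_values.append(2.0 * n * kappa0 * r_n)
    mean_t = sum(t1_values) / reps
    var_t = sum((t - mean_t) ** 2 for t in t1_values) / (reps - 1)
    return moment_match(mean_t, var_t)

# Illustrative run for n = 10 and kappa0 = 3.
c_hat, nu_hat = simulate_c_nu(10, 3.0)
print(c_hat, nu_hat)
```

For κ0 > 2 the estimated ν should land near n − 1, in line with result (4); more replicates would tighten the Monte Carlo error of both constants.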

Table 1. Simulated Type I error rates of T1 and LRT for shape parameters

              n = 3              n = 5              n = 8
   κ       T1      LRT        T1      LRT        T1      LRT
  0.25   0.0642  0.1530     0.0569  0.1020     0.0538  0.0754
  0.50   0.0604  0.1521     0.0553  0.1028     0.0529  0.0759
  0.75   0.0564  0.1502     0.0538  0.1026     0.0522  0.0754
  1.00   0.0539  0.1475     0.0525  0.1021     0.0517  0.0757
  1.25   0.0527  0.1441     0.0520  0.1028     0.0516  0.0752
  1.50   0.0519  0.1407     0.0515  0.1029     0.0514  0.0754
  1.75   0.0515  0.1376     0.0511  0.1021     0.0510  0.0757
  2.00   0.0513  0.1347     0.0511  0.1021     0.0508  0.0759
  2.25   0.0510  0.1319     0.0507  0.1020     0.0505  0.0751
  2.50   0.0509  0.1291     0.0506  0.1027     0.0503  0.0753
  2.75   0.0506  0.1261     0.0504  0.1025     0.0502  0.0755
  3.00   0.0504  0.1231     0.0502  0.1021     0.0502  0.0755

              n = 10             n = 15             n = 20
   κ       T1      LRT        T1      LRT        T1      LRT
  0.25   0.0526  0.0678     0.0521  0.0639     0.0517  0.0607
  0.50   0.0519  0.0672     0.0515  0.0638     0.0512  0.0605
  0.75   0.0515  0.0673     0.0511  0.0639     0.0509  0.0610
  1.00   0.0512  0.0675     0.0509  0.0636     0.0507  0.0609
  1.25   0.0510  0.0673     0.0509  0.0639     0.0506  0.0608
  1.50   0.0508  0.0675     0.0506  0.0631     0.0504  0.0605
  1.75   0.0504  0.0674     0.0502  0.0633     0.0500  0.0607
  2.00   0.0505  0.0674     0.0502  0.0634     0.0500  0.0604
  2.25   0.0502  0.0671     0.0499  0.0635     0.0498  0.0604
  2.50   0.0502  0.0672     0.0501  0.0632     0.0500  0.0604
  2.75   0.0503  0.0674     0.0501  0.0633     0.0500  0.0602
  3.00   0.0503  0.0675     0.0501  0.0631     0.0499  0.0602
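Simulated rejection rates of the kind reported for T1 can be approximated along the following lines. This sketch makes two assumptions that are not taken from the article: it uses an empirical critical value from a first batch of null samples rather than the fitted c and ν, and, because H01 is rejected for small T1, it takes the lower α quantile of that batch. The function names, seed, and replicate counts are illustrative.

```python
import math
import random

def t1_statistic(xs, kappa0: float) -> float:
    """T1 = 2 n kappa0 R_n with R_n = ln(arithmetic mean / geometric mean)."""
    n = len(xs)
    r_n = math.log(sum(xs) / n) - sum(math.log(x) for x in xs) / n
    return 2.0 * n * kappa0 * r_n

def draw_t1(n, kappa_true, kappa0, reps, rng):
    """Simulate T1 for samples of size n from G(kappa_true, 1)."""
    return [t1_statistic([rng.gammavariate(kappa_true, 1.0) for _ in range(n)], kappa0)
            for _ in range(reps)]

def rejection_rate(n, kappa_true, kappa0, alpha=0.05, reps=10000, seed=2009):
    """Share of samples with T1 below the empirical lower alpha quantile under H01."""
    rng = random.Random(seed)
    null_draws = sorted(draw_t1(n, kappa0, kappa0, reps, rng))
    crit = null_draws[int(alpha * reps)]          # empirical critical value
    fresh = draw_t1(n, kappa_true, kappa0, reps, rng)
    return sum(t < crit for t in fresh) / reps

type1 = rejection_rate(5, 1.0, 1.0)   # data generated under H01: kappa = 1
power = rejection_rate(5, 3.0, 1.0)   # data generated under kappa = 3 > kappa0
print(type1, power)
```

Setting `kappa_true` equal to `kappa0` gives a Type I error estimate, and any larger value gives a point on the power curve.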

The critical values provided by Keating, Glaser, and Ketchum (1990) in their paper are based on simulations. Figure 1 displays the power curves for T1 and the Keating, Glaser, and Ketchum test (KGKT) for testing H01 : κ = 0.5 against the alternative Ha1 : κ > 0.5. The Type I error rate of KGKT for this test is 0.059 and that of T1 is 0.060 for n = 3. Figure 1(a) reveals that T1 has better performance in terms of power than KGKT. For n = 5, the Type I error rate of KGKT is 0.052 whereas that of T1 is 0.055. Figure 1(b) reveals that T1 performs slightly better than KGKT, but it should be noted that the simulated Type I error rate for T1 is slightly higher than that of KGKT, and the increment may disappear if both tests are calibrated to a 5% Type I error rate. However, for n ≥ 15 the simulated power curves are indistinguishable (not reported here). Overall, these findings indicate that for very small samples T1 has better performance than KGKT. For large samples T1 is a strong competitor of KGKT in terms of both controlling Type I error rate at the nominal level and in terms of statistical power. However, T1 is much easier to implement than KGKT for κ > 2, as determination of critical values of KGKT is problematic.

Figure 1. Power curves for the shape parameter using T1 and KGKT for n = 3 (a) and n = 5 (b).

4. TESTING THE SCALE PARAMETER

Engelhardt and Bain (1977) proposed a uniformly most powerful unbiased (UMPU) test (EBT) for testing H02 : θ = θ0 against the alternative hypothesis Ha2 : θ > θ0 of a gamma distribution when the shape parameter is unknown. The construction of their test is based on the conditional distribution of 1/U given g = $\tilde{x}$/θ. The determination of critical values of this test is computationally intensive as the conditional distribution depends on g. They proved that for large g, E(1/U|g) → 1 and constructed a test for H02 using Zn = 2ng(1/U − 1), where Zn follows approximately $\chi^2_{n-1}$. Engelhardt and Bain (1978) improved this result by deriving an approximate 2-moment chi-square distribution for Zn given g. The distribution of this test statistic depends on the unknown parameter κ, and the determination of percentile points of this conditional distribution is extremely complicated for both the original and improved tests of Engelhardt and Bain (1977, 1978). Also, the performance of these tests is unsatisfactory in terms of controlling Type I error rate for small values of κ. In the following, we derive a new test for θ that is easier to compute and maintains the nominal Type I error rates.

The distribution of $\bar{x}$ depends on both κ and θ. We propose an approximate simple test using the additive property of chi-square distributions. In this procedure, we first estimate κ by using the first part of Equation (1). Let $Z = 2X/\theta_0$; then using result (5) we obtain a chi-square distribution for Z with estimated df = $2n\hat\kappa$ and denote it by $Z \sim \chi^2_{2n\hat\kappa}$. Using result (4), we get an approximate chi-square distribution for $2n\hat\kappa R_n/c$ with df = ν. Moreover, result (3) implies that Z and Rn are independently distributed. Define

$$T_2 = \frac{2n\hat\kappa R_n}{c} + Z. \tag{5}$$

Based on the above results and using the fact that the sum of two independent chi-square random variables follows a chi-square distribution, we obtain an approximate chi-square distribution for T2 with estimated df = $\nu + 2n\hat\kappa$. Note that both the distributions of Z and T2 in (5) depend on the unknown parameter κ. For large samples, when the MLE of κ is consistent, the performance of T2 in terms of controlling Type I error rates is satisfactory. However, even for large samples Z alone has a very poor performance when κ is small (≤ 1). An alternative to T2 can be obtained based on the concept of parametric bootstrapping (PBT) for the scale parameter. The test is computed as follows:

1. For a given dataset of sample size n, estimate the shape parameter by the maximum likelihood method to obtain $\hat\kappa$. For large n, $\hat\kappa$ is a consistent estimator for κ. In the following we replace κ by $\hat\kappa$.
2. Using $\hat\kappa$ and θ0, generate N datasets each of size n from G($\hat\kappa$, θ0). Generally N is between 1000 and 10,000.
3. Compute $T_{2j} = 2n\hat\kappa R_{nj}/c + Z_j$ for the generated datasets in step 2, j = 1, ..., N.
4. Find the 100(1 − α)th percentile point T2(α) from the T2j's.
5. Reject H02 if T2 > T2(α), where T2 is obtained from Equation (5).
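The five PBT steps can be sketched in Python as follows. This is a hedged illustration rather than the authors' code. It assumes that the constant c is supplied by the user (the article obtains it by simulation through Equation (4)), that the shape estimate is recomputed for each generated dataset, and that the percentile is read off the sorted bootstrap values; the helper names, seed, and N = 1000 are likewise illustrative. The data are the 20 survival times analyzed in the scale-test example of Section 6, used here with the right-sided alternative θ > 20 purely for demonstration.

```python
import math
import random

def digamma(x: float) -> float:
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return shift + math.log(x) - 0.5 / x - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))

def kappa_mle(r_n: float) -> float:
    """Solve R_n = ln(k) - psi(k) by bisection on an assumed bracket."""
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > r_n:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def t2_statistic(xs, theta0: float, c: float):
    """Return (T2, kappa_hat) with T2 = 2 n kappa_hat R_n / c + 2 X / theta0."""
    n = len(xs)
    r_n = math.log(sum(xs) / n) - sum(math.log(x) for x in xs) / n
    kappa_hat = kappa_mle(r_n)
    return 2.0 * n * kappa_hat * r_n / c + 2.0 * sum(xs) / theta0, kappa_hat

def pbt_scale_test(xs, theta0, c, alpha=0.05, n_boot=1000, seed=0):
    """Parametric bootstrap test of H02: theta = theta0 against theta > theta0."""
    rng = random.Random(seed)
    n = len(xs)
    t2_obs, kappa_hat = t2_statistic(xs, theta0, c)                 # step 1
    boot = sorted(                                                  # steps 2 and 3
        t2_statistic([rng.gammavariate(kappa_hat, theta0) for _ in range(n)], theta0, c)[0]
        for _ in range(n_boot))
    crit = boot[min(n_boot - 1, int((1.0 - alpha) * n_boot))]       # step 4
    return t2_obs, crit, t2_obs > crit                              # step 5

MICE_WEEKS = [152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
              125, 40, 128, 123, 136, 101, 62, 153, 83, 69]
t2_obs, crit, reject = pbt_scale_test(MICE_WEEKS, theta0=20.0, c=1.01723)
print(t2_obs, crit, reject)
```

Because the critical value comes from the simulated T2j values, no chi-square quantile with estimated degrees of freedom is needed.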


In order to compare performance of Zn (EBT) and T2 , we conducted a Monte Carlo simulation study based on 1 million samples for various values of κ under the null hypothesis H02 : θ = 1 and for n = 3, 5, 8, 10, 15, 20. For κ ≥ 0.5, both T2 and PBT perform extremely well in terms of controlling the Type I error rates for a fixed nominal rate of 5%. However, the performance of EBT is poor for the aforementioned parameter values. Next, we compared power curves of these two tests (EBT and T2 ) in Figure 2 for testing H02 : θ = 0.5 against the alternative Ha2 : θ > 0.5 via simulation for n = 5 and 10 with κ = 1. The simulated Type I error rates for EBT are 0.0302 for n = 5, and 0.0240 for n = 10. T2 maintains the Type I error rate to its nominal level at 5% for n = 5 and n = 10. Both the power curves [Figures 2(a) and (b)] reveal that T2 has better performance than EBT. The power curves for PBT are not shown in these figures as they are almost identical to those of T2 .


Figure 2. Power curves for the scale parameter using T2 and EBT for n = 5 (a) and n = 10 (b).

5. TESTING THE MEAN OF A GAMMA DISTRIBUTION

Testing the mean of a gamma distribution involves the product of the shape and scale parameters. Grice and Bain (1980) developed an approximate test (GBT) for the mean of a gamma distribution when both parameters are assumed to be unknown. GBT is based on $\bar{x}$ and its distribution depends on the unknown shape parameter κ. In fact, the distribution of their test statistic follows a chi-square distribution with df = 2nκ. To implement this test they first estimate the unknown shape parameter κ and also estimate the df based on $\hat\kappa$. The drawback of this procedure is that the true significance level may be different from the nominal level (α) for small sample sizes. To overcome this problem, the authors select an appropriate initial level β so that approximately

$$P(\kappa, \beta, n) = \Pr\!\left[\frac{2n\hat\kappa\,\bar{x}}{\mu_0} < \chi^2_\beta(2n\hat\kappa)\right] = \alpha,$$

where μ0 is the value of the mean specified under the null hypothesis. They showed that the limiting value of P(κ, β, n) is close to P(∞, β, n) when κ is large. The value of β is chosen such that it satisfies P(∞, β, n) = α. Based on this concept, Shiue and Bain (1983) developed a test for the means of two independent gamma populations with a common unknown shape parameter. Shiue, Bain, and Engelhardt (1988) generalized the concept of Shiue and Bain (1983) for testing the means of two gamma populations when shape parameters are unequal and unknown.

Our test procedure is distinctly different from the abovementioned tests. Note that while developing a test for the mean (means), the aforementioned authors used only the distribution of $\bar{x}$. It is equally important within this framework to exploit the distribution of U as it provides vital information regarding κ. Our test is based on both the distributions of $\bar{x}$ and U. Wilson and Hilferty (1931) used a normal approximation for the gamma distribution. Specifically, we use their result that the cube root of a gamma random variable is approximately normally distributed. In the context of constructing prediction and tolerance limits, Krishnamoorthy, Mathew, and Mukherjee (2008) used the cube root approximation for a gamma random variable and showed that the simulated coverage matched the given value.

Let μ = κθ. Our hypothesis is H03 : μ = μ0 against the alternative hypothesis Ha3 : μ > μ0. Krishnamoorthy, Mathew, and Mukherjee (2008) observed that when x ∼ G(κ, θ) and κ ≥ 1, then $x^{1/3}$ follows approximately a normal distribution. They compared the quantiles of a gamma distribution for κ = 1, 2, 5, and 20 based on an exact method (IMSL routine GAMIN) and the Wilson–Hilferty approximation. This quantile-based comparison produces a satisfactory result for the assumption that the cube root of a gamma random variable follows approximately a normal distribution. However, when a random sample of size n is available, the condition that κ > 1 is replaced by nκ > 1 as X ∼ G(nκ, θ). Let $\gamma_0 = (n\mu_0)^{1/3}$. We compute the variance of $X^{1/3}$ by the delta method and get $\operatorname{Var}(X^{1/3}) = \theta/(9\gamma_0)$ under H03. Approximately, $X^{1/3} \sim N(\gamma_0, \theta/(9\gamma_0))$. Let

$$Z_1 = \frac{9\kappa\gamma_0\,(X^{1/3} - \gamma_0)^2}{\mu_0}. \tag{6}$$

Approximately, Z1 has a $\chi^2_1$ distribution. Recall from result (4) that the approximate distribution of 2nκRn is $\chi^2_{n-1}$. Moreover, the aforementioned result (3) states that these two chi-square variables are statistically independent. Define

$$T_3 = \frac{9\gamma_0\,(n-1)\,(X^{1/3} - \gamma_0)^2}{2n\mu_0 R_n}. \tag{7}$$

Under H03, approximately, T3 in (7) has an F-distribution with dfs 1 and (n − 1). Note that the construction of T3 in (7) does not depend on any of the unknown parameters under H03. Using the Wilson–Hilferty approximation, the mean and variance of $X^{1/3}$ under the null hypothesis are

$$\mu_{X^{1/3}} = \frac{\mu_0^{1/3}\,\Gamma(n\kappa + 1/3)}{\kappa^{1/3}\,\Gamma(n\kappa)} \quad\text{and}\quad \sigma^2_{X^{1/3}} = \frac{\mu_0^{2/3}\,\Gamma(n\kappa + 2/3)}{\kappa^{2/3}\,\Gamma(n\kappa)} - \mu^2_{X^{1/3}}. \tag{8}$$

Based on this approximation, our test for the mean is

$$T_4 = \frac{(n-1)\,(X^{1/3} - \mu_{X^{1/3}})^2}{2n\kappa R_n\,\sigma^2_{X^{1/3}}}. \tag{9}$$

T4 is constructed using the expressions for the mean and variance provided by Wilson–Hilferty. Approximately, T4 in (9) has an F-distribution with dfs 1 and n − 1.

Figure 3. Power curves for the mean using T3, T4, T5, and GBT for n = 5 (a) and n = 10 (b).

The third test we propose in this context is based on the result of Krishnamoorthy, Mathew, and Mukherjee (2008). It is important to note that Krishnamoorthy, Mathew, and Mukherjee (2008) used the cube root approximation to construct tolerance and prediction intervals for a gamma random variable. They did not address the issue of testing the mean in their article. Based on the cube root approximation of Krishnamoorthy, Mathew, and Mukherjee (2008), we construct a test for the mean of a gamma distribution as follows. Let

$$\mu_K = \frac{\mu_0^{1/3}\,\Gamma(\kappa + 1/3)}{\kappa^{1/3}\,\Gamma(\kappa)}, \qquad \sigma_K^2 = \frac{\mu_0^{2/3}\,\Gamma(\kappa + 2/3)}{\kappa^{2/3}\,\Gamma(\kappa)} - \mu_K^2,$$

and

$$T_5 = \frac{\sqrt{n}\left(\left(\sum_{i=1}^{n} x_i^{1/3}\right)/n - \mu_K\right)}{\sigma_K}. \tag{10}$$

T5 has an approximate standard normal distribution. To compute T4 and T5 we use the MLE of κ. As mentioned before, for small samples the distribution of T5 has similar problems as that of T4 . A Monte Carlo simulation study based on 1 million samples indicates that the overall performance of T3 and T4 is excellent in terms of controlling the simulated Type I error rates for n = 3, 5, 8, 10, 15, 20 and μ = 0.25(0.25)3.0 with θ = 1. For μ ≤ 0.50, these tests are slightly conservative. On the other hand, both T5 and GBT suffer from highly inflated Type I error rates. Our simulation study in this regard matches with a previous study performed by Grice and Bain (1980). For example, when n = 5 and κ = 2, the Type I error rates of GBT and T5 are 0.0960 and 0.0870, respectively, whereas T3 and T4 have Type I error rates 0.0502 and 0.0510, respectively. Further comparisons of T3 , GBT, T4 , and T5 in terms of power are provided in Figure 3 for testing H03 : μ = 10.99

against the alternative hypothesis Ha3 : μ > 10.99 (the motivation for this value comes from an example discussed in Section 6). Figure 3(a) shows the power curves of these four tests for n = 5 along with the Type I error rates. In this figure we see that the Type I error rate of T5 is close to 15% and that of GBT is approximately 10%, whereas both T3 and T4 maintain the Type I error rate at 5%. For n = 10, the Type I error rates of both T5 and GBT are approximately 10%, which is still unacceptable. On the other hand, the performances of T3 and T4 are indistinguishable, and both tests maintain the Type I error rate at the nominal level of 5%. We find that GBT and T5 have higher power at the expense of inflated Type I error rates. Note that T3 does not require an estimate of κ. The poor performance of T5 has been noted by Krishnamoorthy, Mathew, and Mukherjee (2008), who observed that their method is not useful for making inferences regarding the gamma parameters. We further explore the performance of GBT in connection to constructing a confidence interval for the mean in Section 6.
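Because T3 needs only n, the arithmetic and geometric means, and μ0, it is easy to compute directly. The sketch below evaluates T3 and its upper-tail probability under the F-distribution with dfs 1 and n − 1. The summary statistics are those reported for the mean-test example in Section 6 (n = 20, arithmetic mean 123.55, geometric mean 117.978, μ0 = 113.45), where the article reports T3 = 1.5407 and a p-value of 0.2296. The function names are illustrative, and the tail probability is obtained by Simpson integration of the t density, an implementation choice made here to stay within the Python standard library.

```python
import math

def t3_statistic(n: int, mean: float, geo_mean: float, mu0: float) -> float:
    """T3 = 9 g0 (n - 1) (X^(1/3) - g0)^2 / (2 n mu0 R_n), with g0 = (n mu0)^(1/3)."""
    x_total = n * mean                       # X = n * x_bar
    gamma0 = (n * mu0) ** (1.0 / 3.0)
    r_n = math.log(mean / geo_mean)          # R_n = ln(arithmetic / geometric mean)
    return 9.0 * gamma0 * (n - 1) * (x_total ** (1.0 / 3.0) - gamma0) ** 2 / (2.0 * n * mu0 * r_n)

def t_density(t: float, df: float) -> float:
    """Student t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - 0.5 * (df + 1.0) * math.log1p(t * t / df))

def f_upper_tail(f_value: float, df2: float, steps: int = 2000) -> float:
    """P(F(1, df2) > f_value) = P(|t_df2| > sqrt(f_value)), via Simpson's rule."""
    s = math.sqrt(f_value)
    h = s / steps                            # steps must be even
    total = t_density(0.0, df2) + t_density(s, df2)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * t_density(i * h, df2)
    return 1.0 - 2.0 * total * h / 3.0

t3 = t3_statistic(20, 123.55, 117.978, 113.45)
p_value = f_upper_tail(t3, 19)
print(t3, p_value)
```

The identity between F(1, m) and the square of a t variable with m degrees of freedom is what allows the t density to be used here.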


Table 2. Times between successive failures of air condition (A/C) equipment in a Boeing 720 aircraft

6. ILLUSTRATION We now illustrate our results with three real datasets: one from engineering, one from the biomedical sciences, and one from environmental monitoring. Shape test: We consider a historically important dataset illustrated first in Proschan (1963) and subsequently used by Keating, Glaser, and Ketchum (1990) and Pal, Jin, and Lim (2005, p. 175). The data were collected in the following manner: The times of successive failures of air condition (A/C) equipment in Boeing 720 aircrafts were recorded. The intervals between successive failures are shown in order of occurrence in Table 2. The basic motivation of this study was to determine the reliability and optimal maintenance schedule and inventory of spare parts of the A/C system in a Boeing 720 aircraft. For a complete description of the dataset and in depth background on statistical applications in the area of reliability, see Proschan (1963) and also Pal, Jin and Lim (2005, p. 175). Figure 4 presents a gamma probability plot corresponding to the data displayed in Table 2. In general, Figure 4 reveals an excellent fit of these data to the gamma distribution. We also computed the following goodness-of-fit tests: Kolmogorov Smirnov p-value > 0.5; Anderson Darling p-value > 0.5; and Cramer von Mises p-value > 0.5, all of which failed to reject the null hypothesis that the data are consistent with a gamma distribution. We have n = 29 times between successive failures of air conditioning equipment in Boeing 720 aircraft measurements, with sample mean x¯ = 83.5172 hours, geometric mean x˜ = 60.1544 hours and sample standard deviation (S) = 70.8058 hours. Our estimates of κ and θ are κˆ = 1.67 and θˆ = 49.981. To test the exponentiality against an IFR, that is, H01 : κ = 1 against the alternative Ha1 : κ > 1, Lawless (1982, pp. 110–111) used nonparametric methods and drew the conclusion that there is a lack of evidence against the null hypothesis of exponentiality. 
On the other hand, Keating, Glaser, and Ketchum (1990) used the same dataset for testing exponentiality and rejected the null hypothesis at the 5% significance level, but not at the 1% level. To test the same hypothesis, we compute T1 = 19.032 using Equation (3). The estimates c = 1.085 and ν = 28.394 are based on 1,000,000 samples. Using these estimates of c and ν, the critical value at the 5% level of significance, $c\chi^2_{\nu,0.05}$, is 18.697. Because T1 is larger than this lower-tail critical value, we also find a lack of evidence against the null hypothesis, although the p-value of this test is 0.0562, which approaches significance. The 95% lower confidence limit for κ based on T1 is

$$\mathrm{LCT}(\kappa) = \frac{c\chi^2_{\nu,0.05}}{2nR_n} = 0.9495,$$

and the 95% upper confidence limit is

$$\mathrm{UCT}(\kappa) = \frac{c\chi^2_{\nu,0.95}}{2nR_n} = 2.3039.$$

The two-sided 95% confidence interval for κ is (0.8891, 2.5628). In order to obtain the coverage probabilities of these confidence limits (interval), we simulated data from a gamma distribution with κ = 1.67, θ = 49.981, and n = 29. We computed the coverage probabilities based on 1,000,000 samples and found that the average coverage probability is in the neighborhood of 93%, which is slightly below the specified level of 95%.

Table 2. Intervals between successive failures of air conditioning equipment in a Boeing 720 aircraft (in hours)

 90   14   44  310  130
 10   24   59   76  208
 60   56   29   26   70
186   20  118   44  101
 61   79   25   23  208
 49   84  156   62

Figure 4. Gamma probability plot for times between successive failures of air conditioning equipment in a Boeing 720 aircraft.

TECHNOMETRICS, AUGUST 2009, VOL. 51, NO. 3

Scale test: In the second example, we consider a dataset provided in Gross and Clark (1975) and Engelhardt and Bain (1977) in connection with testing the scale parameter of a gamma distribution. The dataset consists of a random sample of 20 survival times (in weeks) from a group of 208 male mice that were exposed to 240 rads of gamma radiation (Table 3). The sample values are x̄ = 113.45 and x̃ = 107.068. The MLEs are κ̂ = 8.7992 and θ̂ = 12.8932. Let β = 1/θ. The testing problem considered by Engelhardt and Bain (1977) is H02 : β ≤ 0.05 against the alternative Ha2 : β > 0.05 at the 0.01 significance level; they found no evidence to reject the null hypothesis. This is the same as testing H02 : θ ≥ 20 against the alternative Ha2 : θ < 20 at the 0.01 significance level. We obtained the estimates c = 1.01723 and ν = 19.0459 based on 1,000,000 samples. The value of the test statistic using the estimate of c is T2 = 246.933, and the critical value based on the estimate of ν is $\chi^2_{\nu+2n\hat{\kappa},\,0.01} = 310.599$. Hence, we reject the null hypothesis. The p-value for this test is p = 1.1854e−07 < 0.01. The 99% confidence interval for θ based on T2 is (10.6802, 15.9467). The simulated coverage probability for this interval is 99.12%.

Table 3. Survival times of male mice exposed to gamma radiation (in weeks)

152  152  115  109  137   88   94   77  160  165
125   40  128  123  136  101   62  153   83   69

Mean test: We consider the second random sample of 20 survival times (in weeks) from the same dataset provided in Gross and Clark (1975) and Grice and Bain (1980) in connection with testing the mean of a gamma distribution. Gross and Clark compared the means of survival times of the two samples presented in Tables 3 and 4. For the purpose of illustration, we test H03 : μ = 113.45 against the alternative hypothesis Ha3 : μ > 113.45. Using Table 4 we obtain x̄ = 123.55 and x̃ = 117.978. The MLEs are κ̂ = 10.9995 and θ̂ = 11.2324. The value of the test statistic is T3 = 1.5407. The p-value of the test is 0.2296. Hence, we do not have enough evidence to reject the null hypothesis at the nominal level of 5%. The 95% confidence interval for μ based on T3 is (104.7597, 147.1166). The simulated coverage probability for this interval is 97.2%.

Table 4. Survival times of male mice exposed to gamma radiation (in weeks)

 56  174  134  157  166  131  147  127  156  137
 91   86   96   70   87  177   83  128  123  145

Remark. Grice and Bain (1980) constructed a 95% confidence interval for the mean based on the dataset provided in Table 3. Their confidence interval was (107.7, 134.7). We have simulated the coverage probability of this interval and found that it is 73.8%. For this dataset, our confidence interval is (94.364, 138.050) and the simulated coverage probability is 96.24%.

In the third example, we consider vinyl chloride data obtained from clean upgradient monitoring wells (Table 5). Vinyl chloride is a volatile organic compound. This constituent is of particular interest in environmental investigations because it is both anthropogenic and carcinogenic. Nevertheless, low levels of this constituent are found in many background monitoring wells. The low-level detections of this compound in clean upgradient background monitoring wells are due to cross contamination from air or gas or the analytical process itself. Bhaumik and Gibbons (2006) and Krishnamoorthy, Mathew, and Mukherjee (2008) considered this example in constructing prediction and tolerance intervals for gamma random variables.

Table 5. Vinyl chloride data from clean upgradient ground-water monitoring wells (in μg/L)

5.1  2.4  0.4  0.5  2.5  0.1  6.8
1.2  0.5  0.6  5.3  2.3  1.8  1.2
1.3  1.1  0.9  3.2  1.0  0.9  0.4
0.6  8.0  0.4  2.7  0.2  2.0  0.2
0.5  0.8  2.0  2.9  0.1  4.0
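The critical values and the p-value quoted for the shape and scale tests above can be sanity-checked without statistical tables. The sketch below is an illustration, not the authors' computation: it substitutes the Wilson–Hilferty normal approximation for an exact chi-square routine, and it assumes that Equation (3) has the form T1 = 2nκ0 log(x̄/x̃), which is not restated in this section but is consistent with the reported value. It also treats the scale-test critical value as the lower 1% point, matching the stated 0.01 significance level. All function and variable names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the lower-tail p quantile of chi-square(df)."""
    z = _STD_NORMAL.inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * sqrt(a)) ** 3


def chi2_cdf_wh(x, df):
    """Wilson-Hilferty approximation to P(chi-square(df) <= x)."""
    a = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - a)) / sqrt(a)
    return _STD_NORMAL.cdf(z)


# Shape test: inputs are the values reported in the text.
n1, xbar1, xtilde1 = 29, 83.5172, 60.1544
c1, nu1, kappa0 = 1.085, 28.394, 1.0
t1 = 2 * n1 * kappa0 * log(xbar1 / xtilde1)  # assumed form of Equation (3)
crit1 = c1 * chi2_quantile_wh(0.05, nu1)     # lower 5% point of c * chi-square(nu)
pval1 = chi2_cdf_wh(t1 / c1, nu1)            # lower-tail p-value

# Scale test: degrees of freedom nu + 2 * n * kappa_hat, lower 1% point.
n2, kappa_hat2, nu2 = 20, 8.7992, 19.0459
crit2 = chi2_quantile_wh(0.01, nu2 + 2 * n2 * kappa_hat2)

print(f"T1 = {t1:.3f}, critical value = {crit1:.3f}, p-value = {pval1:.4f}")
print(f"scale-test critical value = {crit2:.3f}")
```

Under these assumptions the sketch gives values close to those reported: T1 near 19.03, a shape-test critical value near 18.70 with a p-value of about 0.056, and a scale-test critical value near 310.6. Small differences in the last digit are expected because the Wilson–Hilferty formula is only an approximation to the chi-square distribution.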

An appropriate statistical problem related to this dataset is to compute an upper confidence limit (UCL) for the mean vinyl chloride concentration, so that practitioners can compare this UCL with the mean obtained from downgradient monitoring wells that are potentially contaminated by leakage from the facility (e.g., a landfill) that they are designed to monitor. The arithmetic and geometric means are x̄ = 1.879 and x̃ = 1.096. The MLEs are κ̂ = 1.0627 and θ̂ = 1.769. The 95% UCL for the mean using T3 is 2.816. Average concentrations of vinyl chloride from downgradient monitoring wells in excess of 2.816 μg/L should therefore provide a signal that the facility may be impacting ground water and that further study is required.

7. DISCUSSION

Applications of gamma distributions are widespread in the environmental sciences, biological sciences, and many other fields. Testing of parameters of a gamma distribution should provide a useful addition to the arsenal of statistical methods that are related to these important applied problems. Tests presented in this article extend the literature developed by Bain and Engelhardt (1975), Engelhardt and Bain (1977, 1978), Grice and Bain (1980), Shiue and Bain (1983), Bain, Engelhardt, and Shiue (1984), Shiue, Bain, and Engelhardt (1988), and Keating, Glaser, and Ketchum (1990). Although our tests were developed for one-sided alternatives, they can easily be modified for two-sided alternatives as well. We can form an approximate F-test exploiting the distribution of T1 to test the equality of shape parameters of two independent gamma populations. Similarly, we can construct a test statistic for comparing scale parameters of two independent gamma random variables based on the second component of T2. However, we cannot extend our results discussed in Section 5 to compare two gamma means.
Results based on extensive simulation studies reveal that in terms of maintaining Type I error rates, the proposed tests perform extremely well as long as the shape parameter is not too small, and even then the results are only slightly conservative.

ACKNOWLEDGMENTS

The authors thank the editor, associate editor, and two referees for their insightful comments, which significantly improved the quality of this article. This work was supported in part by grants from the National Institute of Health (R01 MH69353 and R01 MH65556).

[Received March 2007. Revised March 2009.]


REFERENCES

Aksoy, H. (2000), “Use of Gamma Distribution in Hydrological Analysis,” Turkish Journal of Engineering and Environmental Sciences, 24, 419–428.
Aryal, S., Bhaumik, D. K., Mathew, T., and Gibbons, R. D. (2008), “Approximate Tolerance Limits and Prediction Limits for the Gamma Distribution,” Journal of Applied Statistical Science, 16, 103–111.
Bain, L. J., and Engelhardt, M. (1975), “A Two-Moment Chi-Square Approximation for the Statistic log(x̄/x̃),” Journal of the American Statistical Association, 70, 948–950.
Bain, L. J., Engelhardt, M., and Shiue, W.-K. (1984), “Approximate Tolerance Limits and Confidence Limits on Reliability for the Gamma Distribution,” IEEE Transactions on Reliability, 33, 184–187.
Barlow, R. E., and Proschan, F. (1965), Mathematical Theory of Reliability, New York: Wiley.
Bartlett, M. S. (1937a), “Properties of Sufficiency and Statistical Tests,” Proceedings of the Royal Society of London, Ser. A, 160, 268–282.
Bhaumik, D. K., and Gibbons, R. D. (2006), “One-Sided Approximate Prediction Intervals for at Least p of m Observations From a Gamma Population at Each of r Locations,” Technometrics, 48, 112–119.
Das, S. C. (1955), “Fitting Truncated Type III Curves to Rainfall Data,” Australian Journal of Physics, 8, 298–304.
Davis, D. J. (1952), “An Analysis of Some Failure Data,” Journal of the American Statistical Association, 47, 113–150.
Engelhardt, M., and Bain, L. J. (1977), “Uniformly Most Powerful Unbiased Tests on the Scale Parameter of a Gamma Distribution With a Nuisance Shape Parameter,” Technometrics, 19, 77–81.
Engelhardt, M., and Bain, L. J. (1978), “Construction of Optimal Unbiased Inference Procedures for the Parameters of the Gamma Distribution,” Technometrics, 20, 485–489.
Gibbons, R. D., and Coleman, D. E. (2001), Statistical Methods for Detection and Quantification of Environmental Contamination, New York: Wiley.
Glaser, R. E. (1973), “Inferences for a Gamma Distributed Random Variable With Both Parameters Unknown With Applications to Reliability,” Technical Report 154, Stanford University, Dept. of Statistics.
Glaser, R. E. (1976a), “The Ratio of the Geometric Mean to the Arithmetic Mean for a Random Sample From a Gamma Distribution,” Journal of the American Statistical Association, 71, 480–487.
Glaser, R. E. (1980), “A Characterization of Bartlett’s Test Involving Incomplete Beta Functions,” Biometrika, 67, 53–58.
Grice, J. V., and Bain, L. J. (1980), “Inferences Concerning the Mean of the Gamma Distribution,” Journal of the American Statistical Association, 75, 929–933.
Gross, A. J., and Clark, V. A. (1975), Survival Distributions: Reliability Applications in the Biomedical Sciences, New York: Wiley.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994), Continuous Univariate Distributions. Wiley Series in Probability and Statistics, New York: Wiley.
Keating, J. P., Glaser, R. E., and Ketchum, N. S. (1990), “Testing Hypotheses About the Shape Parameter of a Gamma Distribution,” Technometrics, 32, 67–82.
Krishnamoorthy, K., Mathew, T., and Mukherjee, S. (2008), “Normal Based Methods for a Gamma Distribution: Prediction and Tolerance Intervals and Stress-Strength Reliability,” Technometrics, 50, 69–78.
Lawless, J. F. (1982), Statistical Models and Methods for Lifetime Data, New York: Wiley.
Pal, N., Jin, C., and Lim, W. K. (2005), Handbook of Exponential and Related Distributions for Engineers and Scientists, Boca Raton, FL: Chapman & Hall/CRC.
Proschan, F. (1963), “Theoretical Explanation of Observed Decreasing Failure Rate,” Technometrics, 5, 375–383.
Shiue, W. K., and Bain, L. J. (1983), “A Two-Sample Test of Equal Gamma Distribution Scale Parameters With Unknown Common Shape Parameter,” Technometrics, 25, 377–381.
Shiue, W. K., Bain, L. J., and Engelhardt, M. (1988), “Test of Equal Gamma-Distribution Means With Unknown and Unequal Shape Parameters,” Technometrics, 30, 169–174.
Stephenson, D. B., Kumar, K., Doblas-Reyes, F.-J., Royer, J.-F., Chauvin, F., and Pezzulli, S. (1999), “Extreme Daily Rainfall Events and Their Impact on Ensemble Forecasts of the Indian Monsoon,” Monthly Weather Review, 127, 1954–1966.
Wilson, E. B., and Hilferty, M. M. (1931), “The Distribution of Chi-Squares,” Proceedings of the National Academy of Sciences, 17, 684–688.