Journal of Applied Statistics, Vol. 33, No. 7, 679–690, August 2006

Modifying the Exact Test for a Binomial Proportion and Comparisons with Other Approaches

ALAN D. HUTSON
University at Buffalo, The State University of New York, New York, USA

ABSTRACT In this note we provide a simple continuity- and tail-corrected approach to the standard exact test for a single binomial proportion commonly used in practice. We redefine the p-value for the two-sided alternative by noting the skewed distribution of the sample proportion under the null hypothesis. We illustrate that, for both one- and two-sided alternatives, the coverage probabilities of the new methodology more closely approach the desired type I error $\alpha$, and we thus recommend these modifications to the applied statistician for consideration.

KEY WORDS: Binomial confidence interval, exact test

Correspondence Address: Alan D. Hutson, University at Buffalo, The State University of New York, Farber Hall Room 249, 3435 Main Street, Building 26, Buffalo, New York 14214-3000, USA. Email: [email protected]
0266-4763 Print/1360-0532 Online/06/070679–12 © 2006 Taylor & Francis. DOI: 10.1080/02664760600708723

Introduction

A common problem in biostatistics is the hypothesis test involving a single binomial parameter, $H_0: p = p_0$ versus either a simple one-sided alternative, taking the form $H_1: p < p_0$ or $H_1: p > p_0$, or a two-sided alternative, $H_1: p \neq p_0$. For example, $p_0$ may represent some survival fraction at a fixed point in time for a new therapy under investigation in a phase II trial. The most popular approach in practice for carrying out this test is based upon the distribution of the binomial observation under $H_0$, where $Y \sim B(n, p_0)$, $\hat{p}(Y) = Y/n$ is a random proportion with binomial probabilities corresponding to $Y$ under the null hypothesis, and $\hat{p} = y/n$ is the observed proportion. The p-values based on the binomial probabilities calculated under $H_0$ are deemed 'exact' p-values. For the one-sided alternative $H_1: p < p_0$, the p-value is defined as

$$p_l = P(\hat{p}(Y) \le \hat{p} \mid H_0) = P(Y \le y \mid H_0) \tag{1}$$

and for the one-sided alternative $H_1: p > p_0$, the p-value is defined as

$$p_u = P(\hat{p}(Y) \ge \hat{p} \mid H_0) = P(Y \ge y \mid H_0) \tag{2}$$
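As a minimal sketch, the one-sided exact p-values in equations (1) and (2) can be computed by summing binomial probabilities directly. The function names `binom_pmf`, `p_lower` and `p_upper` and the example data are illustrative assumptions, not part of the original note.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_lower(y, n, p0):
    """Equation (1): exact p-value for H1: p < p0, i.e. P(Y <= y | H0)."""
    return sum(binom_pmf(k, n, p0) for k in range(y + 1))


def p_upper(y, n, p0):
    """Equation (2): exact p-value for H1: p > p0, i.e. P(Y >= y | H0)."""
    return sum(binom_pmf(k, n, p0) for k in range(y, n + 1))


# Hypothetical data: y = 3 successes out of n = 10 with p0 = 0.5.
print(p_lower(3, 10, 0.5))  # 0.171875
print(p_upper(3, 10, 0.5))  # 0.9453125
```

Note that the two p-values sum to $1 + P(Y = y \mid H_0)$, since the observed outcome is counted in both tails.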

The calculation of the so-called 'exact' p-values in equations (1) and (2) for one-sided alternatives is straightforward yet highly conservative, i.e. the true underlying type I error is restricted to be less than or equal to the desired type I error, denoted as $\alpha$. In fact, the true error may be substantially less than the desired error for small to moderate sample sizes, even for non-extreme values of $p_0$, i.e. values of $p_0$ near 1/2. For fixed sample sizes, this true type I error rate takes the well-known saw-toothed pattern as a function of $p_0$ and is bounded above by $\alpha$. This will be illustrated graphically in later sections. In terms of calculating an 'exact' p-value given the two-sided alternative $H_1: p \neq p_0$, there is no fixed definition. However, the approach commonly employed by statistical software packages such as SAS version 9 (SAS Institute, 2004, Cary, NC) is to define a two-sided p-value simply as

$$p = 2 \min(p_l, p_u) \tag{3}$$

where $p_l$ and $p_u$ are defined in equations (1) and (2), respectively, i.e. twice the p-value from the smaller of the two one-sided alternatives. Note that $p$ in equation (3) has the undesirable property that it may take values greater than one. Hence, in practice, the typical statistical software definition arbitrarily sets $p = 1$ whenever $p > 1$. As in the one-sided case, the actual type I error is bounded above by the desired type I error $\alpha$, and hence the two-sided test as defined by equation (3) may also be very conservative for a given $p_0$. The interesting pattern for the type I error as a function of $p_0$ is given by the convolution of the two one-sided alternative type I error functions. Other well-known approaches for both the one-sided and two-sided tests for a single proportion are given by the Wald, score and variance-stabilization methods, with test statistics defined as $T_w = \sqrt{n}(\hat{p} - p_0)/\sqrt{\hat{p}(1 - \hat{p})}$, $T_s = \sqrt{n}(\hat{p} - p_0)/\sqrt{p_0(1 - p_0)}$ and $T_a = 2\sqrt{n}(\arcsin\sqrt{\hat{p}} - \arcsin\sqrt{p_0})$, respectively, each being asymptotically $N(0, 1)$ under $H_0$. The exact differences in variance between the statistics are contained in the terms of order $O(n^{-2})$ and above. Currently, it is rare that these tests or similar versions are utilized in lieu of the exact binomial tests. It should be noted that even though these tests are based on normal approximations, the true underlying type I error rate may be calculated based on $Y \sim B(n, p_0)$, similar to the exact binomial test. This is carried out by simply aligning the binomial probabilities for a given test statistic with each value of $Y$ in the sample space.
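The enumeration described above can be sketched as follows: for each possible value of $Y$, evaluate the test statistic, decide whether the two-sided normal-approximation test rejects, and add up the binomial probabilities of the rejecting outcomes. The function names are illustrative assumptions, and the handling of $\hat{p} \in \{0, 1\}$ in the Wald statistic (where the estimated standard error is zero) is a convention chosen here for the sketch, not one stated in the note.

```python
from math import asin, comb, copysign, inf, sqrt
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def wald_stat(y, n, p0):
    p_hat = y / n
    if p_hat in (0.0, 1.0):
        # Assumption: a zero estimated standard error counts as infinitely extreme.
        return copysign(inf, p_hat - p0)
    return sqrt(n) * (p_hat - p0) / sqrt(p_hat * (1 - p_hat))


def score_stat(y, n, p0):
    return sqrt(n) * (y / n - p0) / sqrt(p0 * (1 - p0))


def arcsine_stat(y, n, p0):
    return 2 * sqrt(n) * (asin(sqrt(y / n)) - asin(sqrt(p0)))


def true_type1_error(stat, n, p0, alpha=0.05):
    """Sum P(Y = y | H0) over every y at which |stat| exceeds the normal cutoff."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return sum(
        binom_pmf(y, n, p0) for y in range(n + 1) if abs(stat(y, n, p0)) > z
    )


for name, stat in [("Wald", wald_stat), ("score", score_stat), ("arcsine", arcsine_stat)]:
    print(name, true_type1_error(stat, n=25, p0=0.5))
```

Unlike the exact test, nothing forces these rates to stay at or below $\alpha$; the sketch simply reports whatever the enumeration gives for a chosen $n$ and $p_0$.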
This implies that one could also define exact p-values for $T_w$, $T_s$ and $T_a$ simply by summing the binomial probabilities of the outcomes whose test statistic is at least as extreme as the one observed, in the same manner as for the exact binomial test. The resulting p-values differ from those based on normal approximations and yield a more conservative test. Interestingly, the dual problem of calculating the confidence interval for $p$ has generated several publications that make the case strongly against the exact approach, primarily due to its conservative properties, e.g. see Casella (1986), Vollset (1993), Agresti & Coull (1998), Brown et al. (2001, 2002) and Piegorsch (2004), and the references therein. The general recommendations for confidence intervals are the approaches of Brown et al. (2001, 2002) and Wilson (1927). The approach of Brown et al. extends the notion set forth by Agresti & Coull (1998), with the $100 \times (1 - \alpha)\%$ confidence interval defined as

$$\tilde{p} \pm z_{\alpha/2} \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}} \tag{4}$$

where $z_{\alpha/2}$ is the $(1 - \alpha/2)$th quantile of a standard normal distribution, the modified estimator of $p$ is given by $\tilde{p} = (y + z_{\alpha/2}^2/2)/(n + z_{\alpha/2}^2)$ and $\tilde{n} = n + z_{\alpha/2}^2$. The approach of Wilson (1927) is defined as

$$\tilde{p} \pm \frac{z_{\alpha/2}}{\tilde{n}} \sqrt{n\hat{p}(1 - \hat{p}) + z_{\alpha/2}^2/4} \tag{5}$$

where $\tilde{p}$ and $\tilde{n}$ are defined above. In terms of the frequentist setting, Brown et al. (2001) recommend equation (4) for large $n$ and equation (5) for small $n$. The exact intervals may be calculated simply by associating binomial probabilities with $\hat{p}(Y) = Y/n$ given $Y \mid \hat{p} \sim B(n, \hat{p})$ and finding the lower and upper quantiles closest to $\alpha/2$ and $1 - \alpha/2$. Alternatively, some refer to the Clopper–Pearson intervals, which are given by inverting binomial probabilities with respect to $p$, as exact. Both approaches have similar operating characteristics. Arguments against the exact confidence interval approach and in favor of the approximate approach are stated as follows:

1. Agresti & Coull (1998): 'However, this procedure (exact) is necessarily conservative, because of the discreteness of the binomial distribution (Neyman 1935), just as the corresponding exact test (without supplementary randomization on the boundary of the critical region) is conservative we believe it is inappropriate to treat this approach as optimal for statistical practice.'
2. Brown et al. (2001): 'The Clopper–Pearson (exact) interval is wastefully conservative and is not a good choice for practical use, unless strict adherence to the prescription $C(p, n) \ge 1 - \alpha$ is demanded,' where $C(p, n)$ refers to the coverage probability.

In the next section we modify the exact binomial hypothesis test such that it should be considered 'optimal' for statistical practice as compared to all other tests. Furthermore, the inversion of this modified test will likewise produce well-behaved confidence intervals in the Agresti and Brown framework. A comparison with the other methods in terms of confidence intervals will be performed in the third section.
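For reference, the two approximate intervals in equations (4) and (5) can be sketched as below. Both share the center $\tilde{p}$ and differ only in the half-width. The function names and the example counts are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull_interval(y, n, alpha=0.05):
    """Equation (4): p~ +/- z * sqrt(p~ (1 - p~) / n~)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_tilde = n + z**2
    p_tilde = (y + z**2 / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half


def wilson_interval(y, n, alpha=0.05):
    """Equation (5): p~ +/- (z / n~) * sqrt(n p^ (1 - p^) + z^2 / 4)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_tilde = n + z**2
    p_tilde = (y + z**2 / 2) / n_tilde
    p_hat = y / n
    half = (z / n_tilde) * sqrt(n * p_hat * (1 - p_hat) + z**2 / 4)
    return p_tilde - half, p_tilde + half


# Hypothetical data: 5 successes in 25 trials, then the boundary case y = 0.
print(agresti_coull_interval(5, 25), wilson_interval(5, 25))
print(agresti_coull_interval(0, 25), wilson_interval(0, 25))
```

With $y = 0$ the lower limit from equation (4) falls below zero, whereas the lower limit from equation (5) is zero up to rounding.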

Modifications to the Exact Test

In the spirit of the work carried out by Agresti & Coull (1998) and Brown et al. (2001) with respect to developing new approaches towards defining confidence intervals for a binomial parameter, we illustrate that the exact one-sided and two-sided hypothesis tests may be modified to provide exact type I error control as close as possible (in the average sense) to the desired type I error control over the range of the parameter space corresponding to $p$ under $H_0$. Furthermore, these corrected exact tests tend to have better statistical properties as compared with other alternatives and thus may also be inverted to generate well-behaved confidence intervals. As a starting point, let us first consider the two exact one-sided tests. Define the observed p-value for the test of $H_0: p = p_0$ versus $H_1: p < p_0$ as

$$p_l(y) = P(\hat{p}(Y) \le \hat{p} \mid H_0) = P(Y \le y \mid H_0) \tag{6}$$

where $\hat{p}(Y) = Y/n$, $\hat{p} = y/n$ and $Y \sim B(n, p_0)$. Then the exact type I error, $\alpha'_l$, is given by

$$\alpha'_l = \sum_{i=1}^{n+1} I(p_l(i-1) < \alpha)\, P(Y = i - 1 \mid H_0) \tag{7}$$

where $I(\cdot)$ denotes the indicator function and the function $p_l(\cdot)$ is defined in equation (6). Likewise, for the test of $H_0: p = p_0$ versus $H_1: p > p_0$ with

$$p_u(y) = P(\hat{p}(Y) \ge \hat{p} \mid H_0) = P(Y \ge y \mid H_0) \tag{8}$$

we have that the exact type I error is given by

$$\alpha'_u = \sum_{i=1}^{n+1} I(p_u(i-1) < \alpha)\, P(Y = i - 1 \mid H_0) \tag{9}$$
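Equations (7) and (9) amount to a single pass over the sample space. A minimal sketch is given below, summing over $y = 0, \ldots, n$ rather than over $i = y + 1$; the function names are illustrative assumptions.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_lower(y, n, p0):
    """Equation (6): P(Y <= y | H0)."""
    return sum(binom_pmf(k, n, p0) for k in range(y + 1))


def p_upper(y, n, p0):
    """Equation (8): P(Y >= y | H0)."""
    return sum(binom_pmf(k, n, p0) for k in range(y, n + 1))


def type1_lower(n, p0, alpha=0.05):
    """Equation (7): total probability of the outcomes y with p_l(y) < alpha."""
    return sum(
        binom_pmf(y, n, p0) for y in range(n + 1) if p_lower(y, n, p0) < alpha
    )


def type1_upper(n, p0, alpha=0.05):
    """Equation (9): total probability of the outcomes y with p_u(y) < alpha."""
    return sum(
        binom_pmf(y, n, p0) for y in range(n + 1) if p_upper(y, n, p0) < alpha
    )


# Trace the error rate over a coarse grid of p0 for n = 25.
for p0 in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(p0, type1_lower(25, p0), type1_upper(25, p0))
```

When no outcome has a p-value below $\alpha$ (for example the lower-tailed test with $n = 25$ and $p_0 = 0.05$, where even $y = 0$ has probability above 0.05), the sum is empty and the type I error is exactly zero.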

Figures 1 and 2 illustrate $\alpha'_l$ and $\alpha'_u$ as a function of $p_0$ for $n = 25$ and $\alpha = 0.05$ for the traditional exact binomial test. As can be seen, there is a familiar saw-tooth pattern with $n = 25$ peaks, and the exact type I error may be much below the desired type I error, $\alpha$, for a given $p_0$. In fact, it may be exactly 0 for a substantial range of $p_0$. These plots are consistent across sample sizes and desired type I error $\alpha$.

Figure 1. Type I error for the exact binomial test as a function of $p_0$ for the hypothesis $H_0: p = p_0$ versus $H_1: p < p_0$ given $n = 25$

Figure 2. Type I error for the exact binomial test as a function of $p_0$ for the hypothesis $H_0: p = p_0$ versus $H_1: p > p_0$ given $n = 25$

As noted in the previous section, there is not a unique definition of a p-value for the two-sided test of $H_0: p = p_0$ versus $H_1: p \neq p_0$. The approach that is commonly used in statistical software packages, such as SAS, is to define the p-value as a function of both one-sided p-values, yielding the two-sided p-value

$$p(y) = 2 \min(p_l(y), p_u(y)) \tag{10}$$

where $p_l(y)$ and $p_u(y)$ are defined in equations (6) and (8), respectively. Given this definition, the exact type I error for the two-sided test may now be calculated explicitly as

$$\alpha' = \sum_{i=1}^{n+1} I(p(i-1) < \alpha)\, P(Y = i - 1 \mid H_0) \tag{11}$$

where $I(\cdot)$ denotes the indicator function and the function $p(\cdot)$ is defined in equation (10). As noted earlier, $\alpha'$ for the two-sided test may be thought of as a convolution of $\alpha'_l$ and $\alpha'_u$ in equations (7) and (9), respectively. Figure 3 illustrates this interesting convolution property. By examining Figures 1 and 2 closely in terms of the maxima and minima, we see how the oddly shaped pattern arises for the exact type I error of the two-sided test.
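The two-sided calculation in equations (10) and (11) can be sketched in the same way. The cap at 1 follows the software convention described in the Introduction; the function names and example values are illustrative assumptions.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_lower(y, n, p0):
    """Equation (6): P(Y <= y | H0)."""
    return sum(binom_pmf(k, n, p0) for k in range(y + 1))


def p_upper(y, n, p0):
    """Equation (8): P(Y >= y | H0)."""
    return sum(binom_pmf(k, n, p0) for k in range(y, n + 1))


def p_two_sided_raw(y, n, p0):
    """Equation (10) as written: twice the smaller one-sided p-value."""
    return 2 * min(p_lower(y, n, p0), p_upper(y, n, p0))


def p_two_sided(y, n, p0):
    """Equation (10) with values above 1 set to 1, as done in software."""
    return min(1.0, p_two_sided_raw(y, n, p0))


def type1_two_sided(n, p0, alpha=0.05):
    """Equation (11): total probability of the outcomes y with p(y) < alpha."""
    return sum(
        binom_pmf(y, n, p0) for y in range(n + 1) if p_two_sided(y, n, p0) < alpha
    )


# The raw definition can exceed 1 when y sits at the center of the distribution.
print(p_two_sided_raw(5, 10, 0.5), p_two_sided(5, 10, 0.5))
print(type1_two_sided(25, 0.5))
```

Capping at 1 does not change the rejection region for any $\alpha < 1$, so the type I error in equation (11) is the same with or without the cap.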

Figure 3. Type I error for the exact binomial test as a function of $p_0$ for the hypothesis $H_0: p = p_0$ versus $H_1: p \neq p_0$ given $n = 25$

Simple Continuity Correction

A simple continuity correction that provides a less conservative test is given by redefining the one-sided and two-sided p-values as follows. For the test of $H_0: p = p_0$ versus $H_1: p < p_0$, the continuity-corrected p-value is given by subtracting one-half the probability at the boundary of the test, such that the corrected p-value is

$$p_l^* = P(\hat{p}(Y) \le \hat{p} \mid H_0) - P(\hat{p}(Y) = \hat{p} \mid H_0)/2 = P(Y \le y \mid H_0) - P(Y = y \mid H_0)/2 \tag{12}$$

where, as before, $\hat{p}(Y) = Y/n$, $\hat{p} = y/n$ and $Y \sim B(n, p_0)$. Similarly, for the test of $H_0: p = p_0$ versus $H_1: p > p_0$ we define the continuity-corrected p-value as

$$p_u^* = P(\hat{p}(Y) \ge \hat{p} \mid H_0) - P(\hat{p}(Y) = \hat{p} \mid H_0)/2 = P(Y \ge y \mid H_0) - P(Y = y \mid H_0)/2 \tag{13}$$

This leads to a simple continuity-corrected two-sided p-value for the test of $H_0: p = p_0$ versus $H_1: p \neq p_0$, corresponding to the commonly used definition and given by

$$p^* = 2 \min(p_l^*, p_u^*) \tag{14}$$

where $p_l^*$ and $p_u^*$ are defined in equations (12) and (13), respectively. Note that, in contrast to the two-sided p-value $p$ in equation (3), the modified two-sided p-value $p^*$ is bounded above by 1 (because $p_l^* + p_u^* = 1$) and hence will not yield nonsensical p-values greater than 1. Plots of the exact type I error based on the continuity-corrected p-values $p_l^*$ and $p_u^*$ defined in equations (12) and (13) are provided in Figures 4 and 5, respectively, for a sample size of $n = 25$. The two-sided version is illustrated in Figure 6. As one can see, these corrected p-values provide the applied statistician with a test that is more in line with what is expected in practice and similar to the notions proposed by Agresti & Coull (1998) and Brown et al. (2001). It is interesting to note that the two-sided test has fluctuations around the desired level $\alpha$ that are less pronounced than those of the one-sided tests, due to a dampening effect via the convolution of the one-sided test error rates. These plots are consistent in form across different sample sizes and values of $\alpha$. We will examine this approach in more detail in the next section.
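A minimal sketch of the continuity-corrected p-values in equations (12) to (14), together with the resulting exact type I error, is given below. The function names are illustrative assumptions. Because each corrected p-value is no larger than its uncorrected counterpart, the rejection region can only grow, which is why the test becomes less conservative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_lower_cc(y, n, p0):
    """Equation (12): P(Y <= y | H0) - P(Y = y | H0) / 2."""
    tail = sum(binom_pmf(k, n, p0) for k in range(y + 1))
    return tail - binom_pmf(y, n, p0) / 2


def p_upper_cc(y, n, p0):
    """Equation (13): P(Y >= y | H0) - P(Y = y | H0) / 2."""
    tail = sum(binom_pmf(k, n, p0) for k in range(y, n + 1))
    return tail - binom_pmf(y, n, p0) / 2


def p_two_sided_cc(y, n, p0):
    """Equation (14): twice the smaller corrected one-sided p-value."""
    return 2 * min(p_lower_cc(y, n, p0), p_upper_cc(y, n, p0))


def type1_two_sided_cc(n, p0, alpha=0.05):
    """Exact type I error of the test that rejects when equation (14) is below alpha."""
    return sum(
        binom_pmf(y, n, p0)
        for y in range(n + 1)
        if p_two_sided_cc(y, n, p0) < alpha
    )


# The corrected one-sided p-values always sum to 1, so equation (14) never exceeds 1.
print(p_lower_cc(5, 10, 0.5) + p_upper_cc(5, 10, 0.5))
print(type1_two_sided_cc(25, 0.3))
```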

Figure 4. Type I error for the continuity corrected exact binomial test as a function of $p_0$ for the hypothesis $H_0: p = p_0$ versus $H_1: p < p_0$ given $n = 25$

Figure 5. Type I error for the continuity corrected exact binomial test as a function of $p_0$ for the hypothesis $H_0: p = p_0$ versus $H_1: p > p_0$ given $n = 25$

Figure 6. Type I error for the continuity corrected exact binomial test as a function of $p_0$ for the hypothesis $H_0: p = p_0$ versus $H_1: p \neq p_0$ given $n = 25$

It should be noted that one further correction can be made to improve the two-sided test of $H_0: p = p_0$ versus $H_1: p \neq p_0$. The additional correction is based on the notions expounded by George & Mudholkar (1990) for defining p-values for two-sided tests when the distribution of the test statistic under the null hypothesis is asymmetric. Their general approach leads to a very simple notion with respect to the one-sample binomial case, which is basically to reweight $p_l^*$ and $p_u^*$ in equations (12) and (13), respectively, given some measure of skewness under $H_0$, in order to redefine the two-sided p-value. The basic approach is to note that the product-moment skewness for the distribution of the test statistic $\hat{p}$ under $H_0$ depends on $p_0$ (it equals $(1 - 2p_0)/\sqrt{n p_0 (1 - p_0)}$, which is zero only at $p_0 = 1/2$). This leads to the continuity and tail corrected p-value for the two-sided test of $H_0: p = p_0$ versus $H_1: p \neq p_0$ of

$$p^{**} = \min\left(\frac{p_l^*}{p_0}, \frac{p_u^*}{1 - p_0}\right) \tag{15}$$

where $p_l^*$ and $p_u^*$ are defined in equations (12) and (13), respectively. If strict adherence to a type I error less than or equal to $\alpha$ across $p_0$ is required, then equation (15) may be modified as

$$p_c = \min\left(\frac{p_l}{p_0}, \frac{p_u}{1 - p_0}\right) \tag{16}$$

where $p_l$ and $p_u$ are defined in equations (1) and (2), respectively. As we can see from Figure 8, this is still an improvement over the exact binomial test given by the common definition in equation (3), as illustrated by Figure 3. Note that $p^{**} = p^*$ and $p_c = p$ given $p_0 = 1/2$, and that practically speaking the two p-value definitions are similar for $p_0$ anywhere within the neighborhood of 1/2. The difference between $p^{**}$ and $p^*$ occurs for $p_0$ at the extremes near 0 or 1. For example, examine the plot of the exact type I error rates based on $p^{**}$ given in Figure 7 and compare this plot to Figure 6. Through simple inspection it is clear that the two-sided hypothesis test based on $p^{**}$, in the Agresti & Coull (1998) and Brown et al. (2001) vein, should be preferred to the hypothesis test based on $p^*$. The same relative improvement holds across sample sizes in the sense of properly centering the exact type I error rates around the desired type I error rate.

Figure 7. Type I error for the continuity and tail corrected exact binomial test as a function of $p_0$ for the hypothesis $H_0: p = p_0$ versus $H_1: p \neq p_0$ given $n = 25$

Figure 8. Type I error for the tail corrected exact binomial test as a function of $p_0$ for the hypothesis $H_0: p = p_0$ versus $H_1: p \neq p_0$ given $n = 25$
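The tail-corrected definitions in equations (15) and (16) can be sketched as follows. The function names and example data are illustrative assumptions. The only change from the earlier two-sided definitions is that the factor 2 is replaced by the weights $1/p_0$ and $1/(1 - p_0)$.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided(y, n, p0):
    """Return (p_l, p_u, P(Y = y)) under H0, per equations (1) and (2)."""
    pl = sum(binom_pmf(k, n, p0) for k in range(y + 1))
    pu = sum(binom_pmf(k, n, p0) for k in range(y, n + 1))
    return pl, pu, binom_pmf(y, n, p0)


def p_continuity(y, n, p0):
    """Equation (14): 2 * min of the continuity-corrected one-sided p-values."""
    pl, pu, point = one_sided(y, n, p0)
    return 2 * min(pl - point / 2, pu - point / 2)


def p_continuity_tail(y, n, p0):
    """Equation (15): corrected one-sided p-values weighted by 1/p0 and 1/(1 - p0)."""
    pl, pu, point = one_sided(y, n, p0)
    return min((pl - point / 2) / p0, (pu - point / 2) / (1 - p0))


def p_tail_conservative(y, n, p0):
    """Equation (16): the same weighting applied to the uncorrected p-values."""
    pl, pu, _ = one_sided(y, n, p0)
    return min(pl / p0, pu / (1 - p0))


# Hypothetical data: y = 6 successes in n = 25 trials, tested against p0 = 0.1.
print(p_continuity(6, 25, 0.1))
print(p_continuity_tail(6, 25, 0.1))
print(p_tail_conservative(6, 25, 0.1))
```

At $p_0 = 1/2$ both weights equal 2, so equation (15) reduces to equation (14), in line with the remark in the text.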

Figure 9. Approximate area under the Type I error function for the standard, continuity and continuity-tail corrected exact binomial test as a function of $p_0$ for the hypothesis $H_0: p = p_0$ versus $H_1: p \neq p_0$ as a function of $n$ at $\alpha = 0.05$

Figure 10. Approximate area under the Type I error function for the standard, continuity and continuity-tail corrected exact binomial test as a function of $p_0$ for the hypothesis $H_0: p = p_0$ versus $H_1: p \neq p_0$ as a function of $n$ at $\alpha = 0.1$

Figures 9 and 10 illustrate the approximate average exact type I error over $p_0 \in (0.001, 0.999)$ for each of the three definitions of a two-sided test given by the p-values $p$, $p^*$ and $p^{**}$ in equations (3), (14) and (15), respectively, as a function of sample size. The averages were calculated using equation (11) over the range of $p_0$ with increments of 0.001. From the figures we see that the continuity and tail corrected p-value, $p^{**}$, approaches the desired level $\alpha$ in the average sense more rapidly than the continuity corrected p-value, $p^*$, and that the standard version based on $p$ is not close to the desired level even at $n = 100$, and hence is overly conservative.
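The averaging behind Figures 9 and 10 can be sketched as below, assuming a plain arithmetic mean of the exact type I error over the grid $p_0 = 0.001, 0.002, \ldots, 0.999$. The function names, the `method` labels and the use of an unweighted mean are assumptions made for this sketch; the note does not spell out the numerical details.

```python
from math import comb


def type1_error(n, p0, alpha, method):
    """Equation (11) for one of the three two-sided p-value definitions."""
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    size, cdf = 0.0, 0.0
    for prob in pmf:
        cdf += prob
        pl = cdf                # P(Y <= y), equation (1)
        pu = 1.0 - cdf + prob   # P(Y >= y), equation (2)
        if method == "standard":            # equation (3), capped at 1
            pval = min(1.0, 2 * min(pl, pu))
        elif method == "continuity":        # equation (14)
            pval = 2 * min(pl - prob / 2, pu - prob / 2)
        elif method == "continuity_tail":   # equation (15)
            pval = min((pl - prob / 2) / p0, (pu - prob / 2) / (1 - p0))
        else:
            raise ValueError(f"unknown method: {method}")
        if pval < alpha:
            size += prob
    return size


def average_type1_error(n, alpha, method, step=0.001):
    """Unweighted mean of the exact type I error over p0 = step, 2*step, ..., 1 - step."""
    grid = [i * step for i in range(1, round(1 / step))]
    return sum(type1_error(n, p0, alpha, method) for p0 in grid) / len(grid)


for method in ("standard", "continuity", "continuity_tail"):
    print(method, average_type1_error(25, 0.05, method))
```

Repeating the final loop over a range of $n$ gives one curve per method of the kind summarized in Figures 9 and 10.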

Comparison with Methods used for Constructing Confidence Intervals

Noting that there is a one-to-one relationship between confidence intervals and hypothesis tests allows us to compare the continuity and tail corrected p-value approach with the two recommended approaches for confidence intervals, namely the methods proposed by Brown et al. (2001, 2002) and Wilson (1927), given in equations (4) and (5), respectively. The key comparisons are Figures 6 and 7, corresponding to the type I error control for the continuity-corrected and continuity- and tail-corrected methods, versus one minus the coverage probability for the Brown et al. and Wilson methods given in Figures 11 and 12, respectively. As can be seen, the Wilson approach has poor operating characteristics at extreme values of $p$, while the method of Brown et al. is conservative for extreme $p$ and is very comparable to the continuity-corrected hypothesis test approach. In general, the continuity- and tail-corrected approach has good operating characteristics across the range of $p$ as compared to the other approaches in the average sense. Hence, the inversion of this test will yield reasonable confidence intervals when compared head-to-head with the approaches of Brown et al. and Wilson. This small study also suggests a modification of the Brown et al. approach in terms of defining a weighting scheme applied to the lower and upper bounds around the estimate of $p$ in order to define asymmetric intervals. Also note that, although the methods of Brown et al. work well for non-extreme $p$, these intervals may still lead to lower and/or upper bounds that are outside the range of the parameter space for small to moderate sample sizes. However, inverting the continuity and tail-corrected hypothesis test will always yield intervals that lie within the parameter space.

Figure 11. 1 − coverage probability for the Brown et al. interval given $n = 25$

Figure 12. 1 − coverage probability for the Wilson interval given $n = 25$

Acknowledgement

This work is partially supported by a NYSTAR Faculty Development grant. The author would like to thank the reviewers for their thoughtful comments.

References

Agresti, A. & Coull, B.A. (1998) Approximate is better than 'exact' for interval estimation of binomial proportions, American Statistician, 52, pp. 119–126.
Brown, L.D., Cai, T.T. & DasGupta, A. (2001) Interval estimation for a binomial proportion (with discussion), Statistical Science, 16, pp. 101–133.
Brown, L.D., Cai, T.T. & DasGupta, A. (2002) Confidence intervals for a binomial proportion and asymptotic expansions, Annals of Statistics, 30, pp. 160–201.
Casella, G. (1986) Refining binomial confidence intervals, Canadian Journal of Statistics, 14, pp. 113–129.
George, E.O. & Mudholkar, G.S. (1990) P-values for two-sided tests, Biometrical Journal, 32, pp. 747–751.
Piegorsch, W.W. (2004) Sample sizes for improved binomial confidence intervals, Computational Statistics & Data Analysis, 46, pp. 309–316.
Vollset, S.E. (1993) Confidence intervals for a binomial proportion, Statistics in Medicine, 12, pp. 809–824.
Wilson, E.B. (1927) Probable inference, the law of succession, and statistical inference, Journal of the American Statistical Association, 22, pp. 209–212.
