Perceptual and Motor Skills, 1994, 78, 275-284. © Perceptual and Motor Skills 1994
SAMPLE-SIZE FORMULAE FOR PARAMETER ESTIMATION¹

DAVID L. STREINER

Department of Clinical Epidemiology and Biostatistics, McMaster University

Summary.-Formulae are presented for calculating sample-size requirements when the purpose of the study is to estimate the magnitude of a parameter rather than to test an hypothesis. Formulae are given for the mean, a proportion, and a correlation; for the slope, intercept, mean value of Ŷ, and value of Y for a given value of X in multiple regression; and for the odds and risk ratios.

¹I thank Dr. Andrew Willan for his assistance in calculating some of the equations and for his helpful suggestions about the manuscript. Address correspondence to Dr. David L. Streiner, Department of Clinical Epidemiology and Biostatistics, McMaster University, Faculty of Health Sciences, 1200 Main Street West, Hamilton, Ontario, Canada L8N 3Z5.
The publication of Cohen's book on power analysis (Cohen, 1969) resulted in a major change in the thinking about experiments. Previously, the majority of studies were designed with little or no thought given to the number of subjects needed to minimize a Type II error. Cohen's own analysis of published reports (Cohen, 1962) found a mean power of only .48 for a medium effect size (for a two-tailed α of .05); it is impossible to know how many studies never found their way into print because their negative findings were the result of insufficient power. Progress over the past three decades has been mixed. On the positive side, many granting agencies now feel that the lack of justification for sample size is sufficient grounds for rejecting a proposal. Further, in addition to Cohen's book, there have been many other publications in the area of power analysis (e.g., Dupont & Plummer, 1990; Friedman, 1982; Kraemer & Thiemann, 1987; Lachin, 1981). Counterbalancing this, a number of surveys have indicated that increased interest in the area has not been matched by a corresponding improvement in the power of published articles. Rossi (1990), surveying the same journals and using the same methods as in Cohen's original study, found that the mean power had increased to only .50. With a few exceptions, most of which are largely unknown to psychologists (e.g., Lemeshow, Hosmer, & Klar, 1988; Lemeshow, Hosmer, Klar, & Lwanga, 1990; Lwanga & Lemeshow, 1991; McHugh & Le, 1984), the majority of the books and papers on power analysis deal with hypothesis-testing situations, e.g., determining whether differences between two or more means, proportions, correlations, or the like are stochastically significant. However, there are a number of studies which are more concerned with estimating the size of parameters (e.g., a correlation, mean, or intercept) rather than with the comparison of groups. Some examples of such problems are (a) deriving a
regression equation so that a person's value of Y can be estimated with a given degree of accuracy, based on a vector of Xs; (b) seeing if the correlation of a new test with an older, accepted one is around 0.90; and (c) determining if the prevalence of a given disorder is .30 among a group of newly admitted patients. The issue in these cases is not to test whether the parameter differs from zero (which is often a trivial question) or whether it differs from the parameter in another group (which is the domain of hypothesis testing) but rather how many subjects are necessary to estimate the parameter with a given degree of precision. The purpose of this paper is to present formulae which can be used to calculate sample sizes for various parameters; when the factors affecting the parameter are bounded, tables are also provided. The approach taken is to begin with the estimate of the half-width of the confidence interval (CIH; that is, the width from the mean to either the upper or lower end of the interval) for the parameter and solve for the sample size, n. In some cases, the formulae are "quick and dirty" and give approximate solutions. These, though, yield results which are within a few percentage points of those derived from more complicated equations. However, since the other parameters which enter into the formulae are usually educated guesses themselves, it was felt that the greater ease of calculation more than offsets any loss in accuracy.
1. The Mean

1a. A Single Mean

The CIH around an individual mean is given by

$CIH = v_{\alpha/2} \, s / \sqrt{n},$

where s is the standard deviation and $v_{\alpha/2}$ is the critical value of either the t or the normal distribution (see below). Solving for n gives us

$n = \left( \frac{v_{\alpha/2} \, s}{CIH} \right)^2 . \qquad [1]$

When n is 30 or more, the standard normal deviate can be used for $v_{\alpha/2}$; for sample sizes under 30, the t distribution with n − 1 degrees of freedom is employed. This means that, for small sample sizes, an iterative procedure must be used: first guessing the appropriate sample size, putting the corresponding value of $t_{\alpha/2}$ in Equation [1], getting a more accurate estimate of n, putting its value of $t_{\alpha/2}$ in the equation, and so on, until n converges on one value. (In fact, when the standard normal deviate is used and the resulting n is under 30, simply adding 3 yields a very close approximation to the iterative procedure.) Since the value of s does not have an upper bound, Table 1 presents values of n for different values of the CIH, expressed in standard deviation units.
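As a computational sketch of Equation [1] and the iterative small-sample procedure just described, the following Python fragment may be useful. It is our own illustration rather than part of the original paper; it assumes scipy is available for the normal and t critical values, and the function name and defaults are arbitrary.

    import math
    from scipy.stats import norm, t

    def n_for_mean(cih, s, alpha=0.05, max_iter=50):
        # Equation [1] with the standard normal deviate.
        z = norm.ppf(1 - alpha / 2)
        n = (z * s / cih) ** 2
        if n >= 30:
            return math.ceil(n)
        # Small-sample case: iterate with t critical values until n stabilizes.
        n = max(math.ceil(n), 2)
        for _ in range(max_iter):
            v = t.ppf(1 - alpha / 2, df=n - 1)
            new_n = math.ceil((v * s / cih) ** 2)
            if new_n == n:
                break
            n = new_n
        return n

    # Example from the text: CIH equal to half a standard deviation, alpha = .05.
    print(n_for_mean(cih=0.5, s=1.0))   # 18, matching the worked example below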
TABLE 1. Sample-size requirements for estimates of the mean, based on the ratio of the half-width of the CI to the standard deviation
As an example, to estimate a mean so that the CIH is half the standard deviation, with α = .05, requires a sample size of 18. Keeping α the same but reducing the CIH to one-quarter of a standard deviation increases the sample-size requirement to 61.

1b. Two Means

At times, such as when testing the equivalence of two forms of therapy, the magnitude of the difference between two means is the important parameter. The approach is the same as outlined above, except that the CIH is drawn around $(\bar{X}_1 - \bar{X}_2)$, and the standard error, instead of being $s/\sqrt{n}$, is the standard error of the difference, $\sqrt{(s_1^2 + s_2^2)/n}$ for two groups each of size n.
This value is used in Equation [1].

2. Pearson's r and Spearman's rho

The distribution of r is not normal, so it is first necessary to normalize it using Fisher's z'(r) transformation, where

$z'(r) = \frac{1}{2} \log_e \left( \frac{1 + r}{1 - r} \right),$

and the CIH (in z' units) is

$CIH = \frac{z_{\alpha/2}}{\sqrt{n - 3}}.$

Solving for n yields

$n = \left( \frac{z_{\alpha/2}}{CIH} \right)^2 + 3. \qquad [2]$
Thus, the width of the CI, expressed in units of r, depends on the correlation: it is narrower when r is close to 1 and wider when r is near zero. Although the interval is symmetric around z'(r), it is not symmetric around r. We computed the sample-size requirements around (r − CIH); this yields a larger value of n and is therefore more conservative than using (r + CIH). Sample-size requirements for α = .01 and .05 and for CIHs of .01, .05, and .10 are given in Table 2. Since the distribution of Spearman's rho is similar to that of r, the same table can be used. It should be noted that this approach differs from that of Donner and Eliasziw (1987), who based their calculations on demonstrating that r (calculated as an intraclass correlation) differed from some arbitrary value, r₀. To illustrate the use of Equation [2], we will use a CIH not directly available from Table 2. If we anticipate that r will be .50 and want to be 95% confident that the estimate is within the range of ± .15 (although, as mentioned above, the interval is not symmetrical), the first step would be to calculate the z'(r) equivalents of .50 and .65, which are 0.5493 and 0.7753. Putting these in Equation [2] yields

$n = \left( \frac{1.96}{0.7753 - 0.5493} \right)^2 + 3 = 78.2,$
so that 79 subjects (or pairs of subjects) are needed.
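For readers who prefer to compute Equation [2] directly, a minimal sketch follows; the function name and defaults are ours, and math.atanh serves as Fisher's z' transformation. It reproduces the worked example above.

    import math
    from scipy.stats import norm

    def n_for_r(r, cih, alpha=0.05, conservative=True):
        z = norm.ppf(1 - alpha / 2)
        r_edge = r - cih if conservative else r + cih     # the paper's tables use (r - CIH)
        width_z = abs(math.atanh(r) - math.atanh(r_edge))  # difference on the z' scale
        return math.ceil((z / width_z) ** 2 + 3)           # Equation [2]

    # The worked example: r = .50, CIH = .15, using r + CIH = .65 as in the text.
    print(n_for_r(0.50, 0.15, conservative=False))   # 79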
TABLE 2. Sample-size requirements for r, for confidence interval half-widths of ± .10, ± .05, and ± .01
3. Proportions

3a. A Single Proportion

The lower and upper exact confidence limits, $p_L$ and $p_U$, given by Armitage and Berry (1987), are the solutions of

$\frac{r - n p_L - \frac{1}{2}}{\sqrt{n p_L q_L}} = z_{\alpha/2} \quad \text{and} \quad \frac{r - n p_U + \frac{1}{2}}{\sqrt{n p_U q_U}} = -z_{\alpha/2},$

where r is the number of observed events, n the sample size, and $p_U$ and $p_L$ the upper and lower limits of the proportion, respectively. They state that the continuity correction, ½, can be omitted when n is large. Tables of the exact limits for ns of 2 through 25 are given in the Geigy Scientific Tables (Diem & Lentner, 1970). However, a much simpler equation is

$CIH = z_{\alpha/2} \sqrt{\frac{pq}{n}},$
where q = (1 − p). Especially when p is near .50, the approximation is quite close. Solving for n, we get

$n = \frac{z_{\alpha/2}^2 \, p q}{CIH^2}.$
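A minimal sketch of this approximation, with illustrative names of our own choosing; it reproduces the 505-subject example given below.

    import math
    from scipy.stats import norm

    def n_for_proportion(p, cih, alpha=0.05):
        # n = z^2 * p * q / CIH^2, rounded up to the next whole subject.
        z = norm.ppf(1 - alpha / 2)
        return math.ceil(z ** 2 * p * (1 - p) / cih ** 2)

    # Example from the text: p = .30, 95% confidence, CIH = .04.
    print(n_for_proportion(0.30, 0.04))   # 505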
Table 3 gives values of n for α = .01 and .05, and for CIHs ranging from .05 to .50.

TABLE 3. Sample-size requirements for p
If we observe that the proportion of bipolar patients who have a positive family history of depression is 0.30 and want to be 95% confident that the true value falls between 0.26 and 0.34, we would need

$n = \frac{(1.96)^2 (0.30)(0.70)}{(0.04)^2} = 504.2,$

or 505 subjects. Notice that, according to Table 3, if the CIH were increased just slightly to ± .05, the sample size drops dramatically to 323.

3b. Two Proportions

In a manner analogous to the CIH around the difference between two means, if the interest is in the difference between two proportions, then we must use the standard error of the difference between proportions,

$\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}.$
Assuming that both groups will have the same number of subjects, the equation for the sample size per group is then

$n = \frac{z_{\alpha/2}^2 \, (p_1 q_1 + p_2 q_2)}{CIH^2}.$
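A companion sketch for the two-proportion case under the equal-group-size assumption; the formula is our reconstruction of the equation above, and the example values (p₁ = .30, p₂ = .40, CIH = .10) are hypothetical rather than taken from the paper.

    import math
    from scipy.stats import norm

    def n_per_group_two_proportions(p1, p2, cih, alpha=0.05):
        # n per group = z^2 * (p1*q1 + p2*q2) / CIH^2 (our reading of the formula).
        z = norm.ppf(1 - alpha / 2)
        return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / cih ** 2)

    # Hypothetical values, not from the paper: p1 = .30, p2 = .40, CIH = .10.
    print(n_per_group_two_proportions(0.30, 0.40, 0.10))   # 173 per group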
A different approach is taken by Cohen (1969), who deals with the situation where one can specify the order of magnitude of the difference between the proportions but not their individual values. Since tables for this purpose are provided in his book, they are not given here.

4. Regression

4a. Slope of the Regression Line

The CIH around the slope is given by

$CIH = v_{\alpha/2} \, \frac{s_{residual}}{s_x \sqrt{n - 1}},$
where $s_{residual}$ is the residual standard deviation of Y around the regression line, $s_x$ is the standard deviation of X, and $v_{\alpha/2}$ is defined as for the mean: the standard normal deviate when n is over 30, and iteratively determined using values of t when it is under 30. The equation for the sample size is then

$n = \left( \frac{v_{\alpha/2} \, s_{residual}}{s_x \, CIH} \right)^2 + 1.$
If $s_x$ is 15, $s_{residual}$ is 17, and we want to be 95% confident that the slope (anticipated to be about 0.90) is estimated to within ± .30, then

$n = \left( \frac{(1.96)(17)}{(15)(.30)} \right)^2 + 1 = 55.8,$

or 56 subjects.
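A sketch of the slope formula as reconstructed above; the names and defaults are ours, and since the printed result of the original example did not survive extraction, the figure in the comment is simply what this formula returns for the stated inputs.

    import math
    from scipy.stats import norm

    def n_for_slope(s_residual, s_x, cih, alpha=0.05):
        # n = (z * s_residual / (s_x * CIH))^2 + 1, rounded up.
        z = norm.ppf(1 - alpha / 2)
        return math.ceil((z * s_residual / (s_x * cih)) ** 2 + 1)

    # The example above: s_x = 15, s_residual = 17, CIH = .30.
    print(n_for_slope(17, 15, 0.30))   # 56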
4b. Mean Value of Ŷ for a Given Value of X

When the equation for the CIH,

$CIH = v_{\alpha/2} \, s_{residual} \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{(n - 1)\, s_x^2}}, \qquad [6]$
is solved for n, the result is a fairly formidable quadratic. If n is relatively large, this can be simplified by replacing the n term by (n − 1), resulting in the more tractable solution

$n = 1 + \frac{v_{\alpha/2}^2 \, s_{residual}^2}{CIH^2} \left( 1 + \frac{(X_0 - \bar{X})^2}{s_x^2} \right), \qquad [7]$
where X̄ is the mean value of X and X₀ is the value of X to be evaluated. [Replacing n by (n − 1), rather than the reverse, increases the sample size by 1, resulting in a slightly more conservative answer.] Using the same numbers for $s_x$ and $s_{residual}$ as in the previous example, setting X̄ = 45, and wanting to be 95% confident that Ŷ is within the range of ± 10 at X₀ = 65, then

$n = 1 + \frac{(1.96)^2 (17)^2}{(10)^2} \left( 1 + \frac{(65 - 45)^2}{(15)^2} \right) = 31.8,$

or 32 subjects.
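The same idea for Equation [7], again as our own illustration; it reproduces the ± 10 example just given.

    import math
    from scipy.stats import norm

    def n_for_mean_y(s_residual, s_x, x_bar, x0, cih, alpha=0.05):
        # Equation [7]: half-width CIH for the mean value of Y-hat at X0.
        z = norm.ppf(1 - alpha / 2)
        return math.ceil(1 + (z * s_residual / cih) ** 2 * (1 + ((x0 - x_bar) / s_x) ** 2))

    # The example above: s_residual = 17, s_x = 15, X-bar = 45, X0 = 65, CIH = 10.
    print(n_for_mean_y(17, 15, 45, 65, 10))   # 32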
4c. The Value of Y for a Specific Person
The CI around the predicted value of Y for a given individual is wider than that around Ŷ, since the scatter around the regression line must be taken into account. (Strictly speaking, it is not a CI, since Y is not a parameter; Kleinbaum and Kupper (1978) referred to it as a "prediction interval.") The formula in Equation [6] is modified slightly by including the term '+1' under the radical. Making the same simplifying assumption regarding the (n − 1) term, we arrive at the formula for the sample size,

$n = 1 + \frac{v_{\alpha/2}^2 \, s_{residual}^2 \left( 1 + \dfrac{(X_0 - \bar{X})^2}{s_x^2} \right)}{CIH^2 - v_{\alpha/2}^2 \, s_{residual}^2}.$
Again using the same example, but with a prediction interval of ± 30 and α = .10,

$n = 1 + \frac{(1.645)^2 (17)^2 \left( 1 + \dfrac{(65 - 45)^2}{(15)^2} \right)}{(30)^2 - (1.645)^2 (17)^2} = 19.4,$

or 20 subjects.
Note that the equation is undefined if $CIH \le z_{\alpha/2} \times s_{residual}$.

4d. Intercept of the Regression Line

Equation [7] can be used by setting X₀ equal to zero, resulting in

$n = 1 + \frac{v_{\alpha/2}^2 \, s_{residual}^2}{CIH^2} \left( 1 + \frac{\bar{X}^2}{s_x^2} \right).$
With the same values for $s_x$, $s_{residual}$, X̄, and the CIH as in the example for Section 4b, we get

$n = 1 + \frac{(1.96)^2 (17)^2}{(10)^2} \left( 1 + \frac{(45)^2}{(15)^2} \right) = 112.0,$

or 113 subjects.
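A sketch covering both the individual prediction (Section 4c) and the intercept (Section 4d), as reconstructed above; the guard clause reflects the note that the equation is undefined when CIH does not exceed $z_{\alpha/2} \times s_{residual}$.

    import math
    from scipy.stats import norm

    def n_for_individual_y(s_residual, s_x, x_bar, x0, cih, alpha=0.10):
        z = norm.ppf(1 - alpha / 2)
        denom = cih ** 2 - (z * s_residual) ** 2
        if denom <= 0:
            raise ValueError("CIH must exceed z * s_residual")  # the 'undefined' case noted above
        num = (z * s_residual) ** 2 * (1 + ((x0 - x_bar) / s_x) ** 2)
        return math.ceil(1 + num / denom)

    def n_for_intercept(s_residual, s_x, x_bar, cih, alpha=0.05):
        # The intercept is the mean value of Y-hat at X0 = 0, so Equation [7] is reused.
        z = norm.ppf(1 - alpha / 2)
        return math.ceil(1 + (z * s_residual / cih) ** 2 * (1 + (x_bar / s_x) ** 2))

    # Examples above: prediction interval of +/- 30 at alpha = .10; intercept with CIH = 10.
    print(n_for_individual_y(17, 15, 45, 65, 30))   # 20
    print(n_for_intercept(17, 15, 45, 10))          # 113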
5. The Odds Ratio

Lemeshow, et al. (1988) point out that, since the upper end of the odds ratio (OR) is unbounded, the normal approximation holds only for very large sample sizes. Better results are obtained using ln(OR), the natural logarithm of the OR. Assuming an equal number of subjects, n, in the diseased and nondiseased groups and letting P₁ and P₂ be the true probabilities of exposure in the two groups, respectively, the CIH around ln(OR) is

$CIH = z_{\alpha/2} \sqrt{\frac{1}{n} \left( \frac{1}{P_1 (1 - P_1)} + \frac{1}{P_2 (1 - P_2)} \right)}.$
They define the width of the desired CIH as ε, which is expressed as a percentage of the OR. Thus, they derive the formula for the sample size to be

$n = \frac{z_{\alpha/2}^2 \left( \dfrac{1}{P_1 (1 - P_1)} + \dfrac{1}{P_2 (1 - P_2)} \right)}{[\ln(1 - \varepsilon)]^2}.$
For example, if the estimated exposure rate in the control group is 0.50 and the OR is believed to be 2.5, then P₁ is determined from the identity relating the OR to the two exposure probabilities (taking whichever solution yields a value of P < 1) to be 0.20. If we wanted to have 90% confidence that the estimate we obtained is within 15% of the true value, then substituting P₁ = 0.20, P₂ = 0.50, ε = 0.15, and $z_{\alpha/2}$ = 1.645 into the formula above gives the required number of subjects in each group.
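A sketch of the odds-ratio formula as presented above; since the printed result of this example did not survive extraction, the figure in the comment is only what this formula returns for the quoted values, not a value taken from the paper.

    import math
    from scipy.stats import norm

    def n_for_odds_ratio(p1, p2, eps, alpha=0.10):
        # n per group = z^2 * [1/(P1*(1-P1)) + 1/(P2*(1-P2))] / ln(1 - eps)^2.
        z = norm.ppf(1 - alpha / 2)
        var_term = 1 / (p1 * (1 - p1)) + 1 / (p2 * (1 - p2))
        return math.ceil(z ** 2 * var_term / math.log(1 - eps) ** 2)

    # Values quoted in the example: P1 = .20, P2 = .50, 90% confidence, eps = .15.
    print(n_for_odds_ratio(0.20, 0.50, 0.15))   # about 1050 per group with this formula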
6. The Relative Risk

Using the same assumptions as for the OR, Lemeshow, et al. (1988) derived the equation for the sample size for the relative risk (RR) in a cohort study to be

$n = \frac{z_{\alpha/2}^2 \left( \dfrac{1 - P_1}{P_1} + \dfrac{1 - P_2}{P_2} \right)}{[\ln(1 - \varepsilon)]^2},$

where P₁ and P₂ are now the probabilities of the outcome in the two groups.
If the anticipated outcome will occur in 25% of the control group, then to be 95% confident that the true value will be within 15% of an estimate of an RR whose true value is 2.0 (so that the outcome probability in the exposed group is 0.50) will require

$n = \frac{(1.96)^2 \left( \dfrac{1 - 0.50}{0.50} + \dfrac{1 - 0.25}{0.25} \right)}{[\ln(0.85)]^2} = 581.8,$

or approximately 582 subjects per group.
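A final sketch for the relative risk, illustrating the formula above with the example's values; again the function name and defaults are ours.

    import math
    from scipy.stats import norm

    def n_for_relative_risk(p1, p2, eps, alpha=0.05):
        # n per group = z^2 * [(1-P1)/P1 + (1-P2)/P2] / ln(1 - eps)^2.
        z = norm.ppf(1 - alpha / 2)
        var_term = (1 - p1) / p1 + (1 - p2) / p2
        return math.ceil(z ** 2 * var_term / math.log(1 - eps) ** 2)

    # The example above: outcome rates of .50 (exposed) and .25 (controls), eps = .15.
    print(n_for_relative_risk(0.50, 0.25, 0.15))   # 582 per group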
DISCUSSION

This paper presents some formulae for calculating sample sizes when the aim of the study is to estimate the magnitude of some parameter rather than to test an hypothesis. As with traditional sample-size calculations, it is necessary to make some assumptions about the data. If these can be made on the basis of previous studies or pilot work, then the estimates of n may be fairly accurate. If no data are available, the equations can be used to evaluate the feasibility of a study through a sensitivity analysis. If it is found that small changes in one parameter do not result in large increases in sample size, then it is likely that the final estimate of the sample size will be accurate.

REFERENCES
ARMITAGE, P., & BERRY, G. (1987) Statistical methods in medical research. (2nd ed.) Oxford, UK: Blackwell.
COHEN, J. (1962) The statistical power of abnormal-social psychological research: a review. Journal of Abnormal and Social Psychology, 65, 145-153.
COHEN, J. (1969) Statistical power analysis for the behavioral sciences. New York: Academic Press.
DIEM, K., & LENTNER, C. (Eds.) (1970) Documenta Geigy: scientific tables. (7th ed.) Basle: Geigy.
DONNER, A., & ELIASZIW, M. (1987) Sample size requirements for reliability studies. Statistics in Medicine, 6, 441-448.
DUPONT, W. D., & PLUMMER, W. D., JR. (1990) Power and sample size calculations: a review and computer program. Controlled Clinical Trials, 11, 116-128.
FRIEDMAN, H. (1982) Simplified determinations of statistical power: magnitude of effect and research sample size. Educational and Psychological Measurement, 42, 521-526.
KLEINBAUM, D. G., & KUPPER, L. L. (1978) Applied regression analysis and other multivariable methods. North Scituate, MA: Duxbury.
KRAEMER, H. C., & THIEMANN, S. (1987) How many subjects? Beverly Hills, CA: Sage.
LACHIN, J. M. (1981) Introduction to sample size determination and power analysis for clinical trials. Controlled Clinical Trials, 2, 93-113.
LEMESHOW, S., HOSMER, D. W., & KLAR, J. (1988) Sample size requirements for studies estimating odds ratios or relative risks. Statistics in Medicine, 7, 759-764.
LEMESHOW, S., HOSMER, D. W., KLAR, J., & LWANGA, S. K. (1990) Adequacy of sample size in health studies. New York: Wiley.
LWANGA, S. K., & LEMESHOW, S. (1991) Sample size determination in health studies. Geneva: World Health Organization.
MCHUGH, R. B., & LE, C. T. (1984) Confidence intervals and the size of a clinical trial. Controlled Clinical Trials, 5, 157-163.
ROSSI, J. S. (1990) Statistical power of psychological research: what have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58, 646-656.

Accepted November 15, 1993.