Bivariate Median Splits and Spurious Statistical Significance

Scott E. Maxwell and Harold D. Delaney

Despite pleas from methodologists, researchers often continue to dichotomize continuous predictor variables. The primary argument against this practice has been that it underestimates the strength of relationships and reduces statistical power. Although this argument is correct for relationships involving a single predictor, a different problem can arise when multiple predictors are involved. Specifically, dichotomizing 2 continuous independent variables can lead to false statistical significance. As a result, the typical justification for using a median split as long as results continue to be statistically significant is invalid, because such results may in fact be spurious. Thus, researchers who dichotomize multiple continuous predictor variables not only may lose power to detect true predictor-criterion relationships in some situations but also may dramatically increase the probability of Type I errors in other situations.

For many years, behavioral statisticians have chided psychological researchers for artificially dichotomizing continuous variables. Despite many such methodological pleas (e.g., Cohen, 1983, 1990; Humphreys, 1978; McNemar, 1969, pp. 444-449), the ubiquitous median split has retained its popularity in many areas of psychology. Indeed, some well-known psychological theories such as the Type A/Type B personality distinction, the reflection/impulsivity distinction, and the models of sex roles discussed by Bem (1977) and Spence and Helmreich (1978) are based on forming dichotomies from continuous scores obtained from psychological instruments. Studies have demonstrated that dichotomization is not just an abstract methodological issue but can in fact greatly impact the interpretation of empirical results (e.g., Block, Block, & Harrington, 1974; Lubinski, Tellegen, & Butcher, 1983; Spence, 1983; Tellegen & Lubinski, 1983).

Why have researchers continued to ignore methodologists' advice not to dichotomize their measures? Certainly one obvious reason is that data analysis procedures are generally somewhat simpler for dichotomous measures than for continuous measures. A common defense on the part of researchers is that as long as they can obtain statistical significance with a dichotomous measure, why should they have to bother with the more complicated statistical technique likely to be required by using a continuous measure? Underlying this argument is an implicit assumption that the effect of artificially dichotomizing a continuous measure is necessarily to lower the power of obtaining statistical significance. In fact, various methodologists have noted that dichotomizing a continuous measure in effect throws away information because individuals within a subgroup are treated as if they were identical with respect to the attribute in question, when there is evidence to the contrary. This loss of information typically reduces measurement precision, underestimates the magnitude of bivariate relationships, and lowers statistical power for detecting true effects (e.g., Cohen, 1978; Humphreys & Fleishman, 1974; Maxwell, Delaney, & Dill, 1984).

If indeed these were the only effects, researchers might very well be justified in dichotomizing their measures so as to simplify data analysis so long as they continued to achieve statistical significance. However, the point of this article is that lower estimates of effect size and lower power are not the only possible results of such an artificial dichotomization. In a design in which artificial dichotomization is all too common in psychological research, we show that dichotomizing continuous predictor measures may in fact lead to overestimates of strength of relationship accompanied by an increase in Type I errors, that is, to results that are spuriously statistically significant.

Bivariate Problems

Most methodological attention has been focused on a situation in which the goal is to relate a single independent variable to a dependent variable. The likely decrease in the standardized effect size and therefore statistical power as a result of dichotomizing either or both of the variables has been known for many years. For example, Peters and Van Voorhis (1940) showed that under a bivariate normal distribution, dichotomizing one of the variables at its mean¹ reduces the population correlation coefficient from ρ to .798ρ. Dichotomizing both variables at their means in this situation generally lowers the correlation yet further (as discussed below, in the Correlational Perspective section).

1 Whereas Peters and Van Voorhis (1940) described the effects of dichotomizing a variable at its mean, in practice most researchers dichotomize at the median. However, in a symmetric distribution, such as the normal distribution, the mean and the median are identical. Because the primary focus of the current article is on the population effects of dichotomizing normally distributed variables, the distinction between the mean and the median is irrelevant to our conclusions.
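The attenuation factors for a bivariate normal pair can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes splits exactly at the mean, the .798 factor being sqrt(2/π) and the correlation between two dichotomized variables being (2/π) arcsin ρ, the expression whose ratio to ρ is given as the true reduction factor in the article's third footnote.

```python
from math import asin, pi, sqrt


def single_split_correlation(rho):
    """Correlation after dichotomizing ONE bivariate normal variable at its mean."""
    return sqrt(2 / pi) * rho  # sqrt(2/pi) is approximately .798


def double_split_correlation(rho):
    """Correlation after dichotomizing BOTH variables at their means."""
    return 2 * asin(rho) / pi


rho = 0.5
one_split = single_split_correlation(rho)
two_splits = double_split_correlation(rho)
print(round(one_split, 3), round(two_splits, 3))
```

For ρ = .5 the second value is lower than the first, in line with the statement that dichotomizing both variables lowers the correlation further.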

Scott E. Maxwell, Department of Psychology, University of Notre Dame; Harold D. Delaney, University of New Mexico. Correspondence concerning this article should be addressed to Scott E. Maxwell, Department of Psychology, University of Notre Dame, Notre Dame, Indiana 46556.

Psychological Bulletin, 1993, Vol. 113, No. 1, 181-190 Copyright 1993 by the American Psychological Association, Inc. 0033-2909/93/$3.00


Table 1 Hypothetical Test Scores

Errors (X1)   Speed (X2)   Standardized test (Y)
10            8            99
10            8            101
10            12           99
10            12           101
12            10           109
12            10           111
12            14           109
12            14           111
14            12           119
14            12           121
14            16           119
14            16           121
16            14           129
16            14           131
16            18           129
16            18           131

Cohen (1983) has discussed the costs of such dichotomization. He concludes that the cost in the degradation of measurement due to dichotomization is a loss of one-fifth to two-thirds of the variance that may be accounted for on the original variables, and a concomitant loss of power equivalent to that of discarding one-third to two-thirds of the sample. Such losses cannot be justified, given the availability of methods that fully exploit all the original measurement information. (p. 253)

Although Cohen acknowledges that his calculations assume bivariate normality, he nevertheless argues persuasively that the same general tendency applies to most behavioral data. As he points out, even when distributions are highly skewed and relationships are nonlinear, polynomial regression is generally preferable to dichotomization.

Multivariate Problems

Much less attention has been paid to problems involving more than two variables. Nevertheless, a popular design in the behavioral sciences involves dichotomizing two continuous variables to study their effects on a third dependent variable. Such designs are especially frequent in research on individual differences, such as the research on sex roles mentioned earlier in the article. Other recent examples in which two continuous predictor measures are jointly dichotomized include the effects of depression and self-esteem on attributions (Stoltz & Galassi, 1989), Type A personality and leisure ethic on academic performance (Tano, 1988), fetal movement and habituation on Brazelton and Bayley scales (Madison, Madison, & Adubato, 1986), stress and depression on preference for type of humor (Schill & O'Laughlin, 1984), simultaneous and successive processing on linguistic functioning (Ashman, 1982), and reaction time and performance accuracy on modifiability of impulsivity and self-esteem (Tolor & Tolor, 1982). Despite Humphreys and Fleishman's (1974) view that correlational analysis is generally strongly preferred to the analysis of variance (ANOVA) for such problems, many researchers obviously continue to dichotomize their independent variables and

Table 2 Cell Sizes and Cell Means for the 2 × 2 Design Resulting From Table 1

             X2 Low            X2 High
X1         M        n        M        n
High       120.00   2        126.67   6
Low        103.33   6        110.00   2

Note. X1 = errors, X2 = speed.

proceed to perform an ANOVA. Although Humphreys and Fleishman are clear in their preference for a correlational approach, some researchers may have rationalized their choice to dichotomize their variables on the basis of Humphreys and Fleishman's statement that such an approach can be useful under the proper circumstances. Specifically, Humphreys and Fleishman (1974) state that "interpretation of main effects as approximations to partial correlations rather than zero-order relationships is legitimate and useful. If dichotomization is at the means or medians, the approximation is reasonably accurate" (p. 472).

The primary purpose of this article is to explore the consequences of dichotomization in designs involving two independent variables. Whereas dichotomization of a single independent variable almost always produces a conservative bias, we show that dichotomization of multiple independent variables can easily have the opposite effect. In particular, a liberal bias may emerge, so that one or more effects are actually overestimated.

Illustrative Example

To motivate our presentation, we consider a hypothetical example involving an educational psychologist who is interested in the effects of cognitive ability on test performance. Suppose that the psychologist has three measures available for each of a number of children: number of errors made in a laboratory cognitive task (X1), speed of response during the task (X2), and score on a standardized ability test (Y). Further suppose that the general goal is to determine the relative effects of number of errors and response speed during the laboratory task on standardized test performance.²

Table 1 contains hypothetical test data obtained from a sample of 16 children. What can be inferred from these data about the effects of number of errors (X1) and response speed (X2) on standardized test performance (Y)? One approach to answering this question is to dichotomize X1 and X2 at their respective medians.
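A minimal sketch of that median-split step, applied to the Table 1 scores with only the Python standard library. The helper names are illustrative rather than taken from the article, scores above the median are treated as "high" (an assumption), and the closed-form two-predictor least-squares fit is included only as a point of comparison with the undichotomized scores.

```python
from statistics import mean, median

# Table 1 scores: errors (X1), speed (X2), standardized test (Y)
x1 = [10] * 4 + [12] * 4 + [14] * 4 + [16] * 4
x2 = [8, 8, 12, 12, 10, 10, 14, 14, 12, 12, 16, 16, 14, 14, 18, 18]
y = [99, 101, 99, 101, 109, 111, 109, 111,
     119, 121, 119, 121, 129, 131, 129, 131]


def median_split_cells(x1, x2, y):
    """Return {(x1_level, x2_level): (n, mean_y)} after splitting at the medians."""
    m1, m2 = median(x1), median(x2)
    cells = {}
    for a, b, score in zip(x1, x2, y):
        key = ("high" if a > m1 else "low", "high" if b > m2 else "low")
        cells.setdefault(key, []).append(score)
    return {key: (len(vals), mean(vals)) for key, vals in cells.items()}


def ols_two_predictors(x1, x2, y):
    """Least-squares intercept and slopes for y regressed on x1 and x2."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


cells = median_split_cells(x1, x2, y)
b0, b1, b2 = ols_two_predictors(x1, x2, y)
for key in sorted(cells):
    print(key, cells[key])
print(b0, b1, b2)
```

The cells are unbalanced because X1 and X2 are correlated in these data, and the slope for X2 in the continuous fit is zero.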
Table 2 shows the cell

2 Effect here is taken to mean the relationship between Y and either of the X variables at a fixed value of the other X variable. In the absence of randomization, effects cannot generally be interpreted as causal without satisfying a set of stringent assumptions (cf. Kenny, 1979, pp. 50-53).

Table 3 Analysis of Variance Table From 2 × 2 Design

Source      SS       df   MS       F       p
X1          833.33   1    833.33   35.38   0.0001
X2          133.33   1    133.33   5.66    0.0348
X1 × X2     0.00     1    0.00     0.00    1.0000

Note. X1 = errors, X2 = speed.

sizes and cell means that result from this procedure. Table 3 shows the corresponding ANOVA table. There is literally no interaction in these artificial data, but both the X1 and X2 main effects are statistically significant at the .05 level. As an aside, note that the Table 3 and Table 4 results are applicable for both Type II and Type III sums of squares because of the complete lack of an interaction effect for these data. In other words, the interaction term could be omitted from the model, and the sums of squares for the main effects would remain the same for these data. Furthermore, both Type II and Type III sums of squares are based on a full model that includes both main effects (Maxwell & Delaney, 1990). Thus, the ANOVA sum of squares and associated test statistic for X2 are obtained from a model that seemingly controls not only for the correlation between X1 and X2 but also for the effect of X1 on Y.

Table 4 contains the results of analyzing these data with multiple regression where Y is regressed on X1 and X2. The important point for our purposes is that the effect of X2 on Y is now literally zero, in stark contrast to the statistically significant effect obtained in the 2 × 2 ANOVA. Why has the 2 × 2 ANOVA revealed an effect that the regression analysis failed to detect? As we show, the answer is that the bivariate dichotomization of X1 and X2 has led to a situation in which the estimated effects of X1 and X2 on Y are biased.

Table 4 Regression Analysis

Variable     b       SE     t       p
Intercept    50.00   1.64   30.56   0.0001
X1           5.00    0.19   26.87   0.0001
X2           0.00    0.14   0.00    1.0000

Note. X1 = errors, X2 = speed.

Mathematical Relationships

To illustrate the possible bias that can result from artificially dichotomizing continuous measures, we suppose that unbeknownst to the researcher, standardized test performance (Y) is related to number of errors made in the laboratory task (X1) but not to response speed (X2). Specifically, we assume that X1, X2, and Y have a trivariate normal distribution, in which the zero-order correlations between all pairs of variables are nonzero in the population but where the correlation between Y and X2 controlling for X1 (that is, the correlation between Y and X2 at each fixed value of X1) is zero. Thus, in the population, the relationship of Y to X1 and X2 can be written as

$$Y = \beta_0 + \beta_1 X_1 + \varepsilon \tag{1}$$

because X2 makes no unique contribution to the explanation of Y.

We now consider what the likely effect would be if the researcher decided to analyze sample data by artificially dichotomizing both X1 and X2, perhaps in an effort to be able to analyze the data with a 2 × 2 ANOVA instead of having to use multiple regression. The results of the 2 × 2 ANOVA would be determined by the number of observations in each of the four cells, the four cell means, and the average within-cell variance.

Using results derived by Rosenbaum (1961) and Kendall and Stuart (1958), the relevant population values can be found. Without loss of generality, we assume that both X1 and X2 have been standardized. First, the expected number of observations in each cell can be determined from Sheppard's theorem on median dichotomy (Kendall & Stuart, 1958, p. 351), which states that the volume in the upper right quadrant of a bivariate normal distribution (conventionally represented as F1 in the statistics literature) is given by

$$F_1 = 0.25 + \frac{\arcsin\rho}{2\pi} \tag{2}$$

where ρ is the population correlation between X1 and X2. By symmetry, Equation 2 also provides the volume in the lower left quadrant, and the volume in the other two quadrants can be found from subtraction. Second, Rosenbaum (1961) showed that the expected value of X1 in a truncated bivariate normal distribution is given by

$$E(X_1 \mid X_1 > 0, X_2 > 0) = \frac{.199\,(1 + \rho)}{.25 + \dfrac{\arcsin\rho}{2\pi}} \tag{3}$$

The population variance of X1 in the upper right quadrant of the truncated distribution can be found by making use of the following expression from Rosenbaum for the expected value of X1²:

$$E(X_1^2 \mid X_1 > 0, X_2 > 0) = 1 + \frac{0.3989\,\rho\sqrt{1-\rho^2}}{\sqrt{2\pi}\,F_1} \tag{4}$$

where F1 is the volume in the quadrant. Similar expressions can be used to find the expected value and variance of X1 in each of the other quadrants (see Appendix for details). Substituting Equations 3 and 4 into Equation 1 provides the means and variances for Y within each cell of the 2 × 2 design. For example,

$$E(Y \mid X_1 > 0, X_2 > 0) = \beta_0 + \beta_1\,E(X_1 \mid X_1 > 0, X_2 > 0) \tag{5}$$

and

$$\operatorname{var}(Y \mid X_1 > 0, X_2 > 0) = \beta_1^2\,\operatorname{var}(X_1 \mid X_1 > 0, X_2 > 0) + \operatorname{var}(\varepsilon) \tag{6}$$

To consider the practical implications of Equations 1 through 6, consider a situation in which Equation 1 describes the relationship among Y, X1, and X2. Specifically, assume that β0 = 100, β1 = 10, σ²X1 = 1, and σ²ε = 104, which implies that ρX1Y = .7. Furthermore, assume that ρX1X2 = .5 and that ρX2ε = .0. It then follows that ρX2Y = .35, in which case the partial correlation between X2 and Y controlling for X1 equals 0. In other words, as a simple consequence of Equation 1, X2 and Y are unrelated to one another at each fixed value of X1.

What would happen if both X1 and X2 were dichotomized at their respective medians? Table 5 shows the population means for Y for the resultant 2 × 2 factorial design. It can be shown from Equation 6 that the population pooled within-cell standard deviation for Y equals 11.76 in this situation. Although X2 has no true effect on Y in the trivariate normal distribution, the means in Table 5 clearly show that the X2 factor will have a nonzero main effect in the 2 × 2 design. In fact, the standardized effect for the X2 factor is d = 0.25, somewhat less than midway between a small and a medium effect size (Cohen, 1988), although in reality X2 has absolutely no effect on Y. Furthermore, note that this bias exists in the population and is unrelated to sampling error. In addition, the X2 main effect is nonzero regardless of the type of sum of squares used to analyze the nonorthogonal data. In particular, X2 has an effect even when X1 is included (and therefore seemingly controlled for) in the ANOVA model.

Table 5 Population Cell Means for Y in 2 × 2 Design Yielded by Performing Median Splits on Bivariate Normal Data When ρYX1 = .7 and ρX1X2 = .5

             X1 Low    X1 High
X2 High      94.0      109.0
X2 Low       91.0      106.0

Figure 1 illustrates the source of the bias shown in the cell means in Table 5. Each of the 100 small circles in Figure 1 represents an observation drawn from a standardized bivariate normal distribution for X1 and X2, where the correlation between the variables equals .5. The dotted horizontal and vertical lines divide the bivariate distribution into four quadrants. Each large square represents the (X1, X2) centroid for its respective quadrant. The critical point for our purposes is that the top and bottom quadrants within each half of X1 differ not only in terms of X2 (i.e., the vertical dimension) but also in terms of X1 (i.e., the horizontal dimension). For example, the mean X1 score in the upper right quadrant is higher than the mean X1 score in the lower right quadrant. The same relationship holds for the upper left and lower left quadrants.
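The population values quoted for this example can be reproduced from the quadrant volumes and truncated moments. The sketch below is a check under stated assumptions, not the authors' computation: function and variable names are hypothetical, it requires |ρ| < 1, and it writes the second-moment term as ρ·sqrt(1 − ρ²)/(2π·F), with the sign reversed in the two quadrants where X1 and X2 have opposite signs.

```python
from math import asin, pi, sqrt

HALF_PHI0 = 1 / (2 * sqrt(2 * pi))  # the constant .199 in the conditional mean


def quadrant_moments(rho):
    """Volumes, |mean| and variance of X1 in same-sign and opposite-sign quadrants."""
    f1 = 0.25 + asin(rho) / (2 * pi)   # upper right / lower left volume
    f2 = 0.5 - f1                      # upper left / lower right volume
    c = rho * sqrt(1 - rho ** 2) / (2 * pi)
    mean_same = HALF_PHI0 * (1 + rho) / f1
    mean_opp = HALF_PHI0 * (1 - rho) / f2
    var_same = 1 + c / f1 - mean_same ** 2
    var_opp = 1 - c / f2 - mean_opp ** 2
    return f1, f2, mean_same, mean_opp, var_same, var_opp


def spurious_x2_effect(rho_x1x2, beta0, beta1, var_eps):
    """Cell means of Y keyed by (X1 level, X2 level), pooled SD, and d for X2."""
    f1, f2, m_same, m_opp, v_same, v_opp = quadrant_moments(rho_x1x2)
    means = {
        ("high", "high"): beta0 + beta1 * m_same,
        ("high", "low"): beta0 + beta1 * m_opp,
        ("low", "high"): beta0 - beta1 * m_opp,
        ("low", "low"): beta0 - beta1 * m_same,
    }
    pooled_var_x1 = 2 * (f1 * v_same + f2 * v_opp)  # weights sum to 1
    pooled_sd = sqrt(beta1 ** 2 * pooled_var_x1 + var_eps)
    d = (means[("high", "high")] - means[("high", "low")]) / pooled_sd
    return means, pooled_sd, d


means, pooled_sd, d = spurious_x2_effect(0.5, beta0=100, beta1=10, var_eps=104)
print({k: round(v, 1) for k, v in means.items()}, round(pooled_sd, 2), round(d, 2))
```

Calling the same function with standardized inputs, for example `spurious_x2_effect(0.7, 0.0, 0.7, 1 - 0.7 ** 2)`, gives the effect size for other combinations of correlations, because d does not depend on the scale of Y.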
From a geometric perspective, the problem is that the shape produced by the four centroids is a parallelogram instead of a square. As a consequence, comparing people whose X2 scores are above the X2 median with people who are below the X2 median confounds X2 differences with X1 differences, even if the comparison is restricted to people in a particular half of the X1 distribution. As a consequence, the X2 main effect in the 2 × 2 design reflects not just an X2 difference but also any X1 difference that may exist. If an X2 main effect is found, it may reflect a true X2 effect, or it may in fact reflect an X1 effect. The important practical implication is that the main effects in the 2 × 2 design represent not only the

Figure 1. Bivariate scatterplot of individual data points and group centroids for each quadrant. (X1 = errors, X2 = speed.)

true main effect of the corresponding continuous variable but also some part of the main effect of the opposite continuous variable. As X1 and X2 become more highly correlated, the ellipse becomes more concentrated, and the confounding of effects becomes worse.

The arguments to this point have demonstrated a potential bias that can result from dichotomizing a pair of continuous predictor variables. Some practical arguments for the utility of the procedure might still remain if the degree of bias were very small. However, Table 6 shows that the magnitude of the bias can in fact be quite substantial. Each tabled value represents the population standardized effect size measure d that will be obtained for the main effect in a 2 × 2 factorial design when the underlying continuous variable (X2) has no true effect on the dependent variable. Specifically, d can be written as

$$d = \frac{E(Y \mid X_1 > 0, X_2 > 0) - E(Y \mid X_1 > 0, X_2 < 0)}{\sigma_Y} \tag{7}$$

where σY is defined to be the square root of the weighted average within-cell variance of Y. As shown in the Appendix, d can be rewritten as

$$d = \frac{\rho_{X_1Y}\left\{\dfrac{.199(1+\rho)}{F_1} - \dfrac{.199(1-\rho)}{F_2}\right\}}{\sqrt{1 - 2\rho_{X_1Y}^2\left\{\dfrac{[.199(1+\rho)]^2}{F_1} + \dfrac{[.199(1-\rho)]^2}{F_2}\right\}}} \tag{8}$$

where F1 and F2 are the areas in the upper right and lower right quadrants, respectively, of the bivariate normal distribution. The table shows that the apparent effect in the 2 × 2 design can exceed a small effect even when the variable has no true


Table 6 Standardized Effect Size d for Main Effect in 2 × 2 Design When the Corresponding Continuous Variable Has a Null Effect

             ρX1Y
ρX1X2      .3      .5      .7
.7         0.13    0.24    0.37
.5         0.09    0.16    0.25
.3         0.05    0.10    0.15
0          0.00    0.00    0.00

effect. Consistent with Equation 8, the bias increases the more ρX1Y and ρ depart from zero.

Note that the bias shown in Equation 8 and Table 6 has severe consequences for the actual Type I error rate of the X2 main effect. We have assumed that X2 has no true effect on Y, so the null hypothesis regarding this effect is true. However, when X1 and X2 are dichotomized, the population effect of X2 on Y will be nonzero any time ρX1X2 and ρX1Y are both nonzero. Hence, the true effect of X2 on Y is nonzero for the dichotomized variables. The probability of rejecting the null hypothesis of no effect then equals the power of the test. The probability of finding a statistically significant X2 effect (at the .05 level) can be calculated analytically from the noncentrality parameter for the X2 main effect in the 2 × 2 ANOVA model. By regarding the main effect as a single-degree-of-freedom contrast (Maxwell & Delaney, 1990), procedures described by McFatter and Gollob (1986) can be used to calculate the required noncentrality parameter. Table 7 shows the resultant Type I error rate for the X2 effect as a function of total sample size and ρX1X2 when ρX1Y equals .5.

Four further comments are pertinent to interpreting the values displayed in Table 7. First, these Type I error rates are approximate because Equation 6 (and its counterparts for other cells) implies that the homogeneity of variance assumption for Y will be violated in the 2 × 2 ANOVA model when ρX1X2 is nonzero, if the assumption is valid in the original regression model. Because large cell sizes will typically be paired with large variances, the actual error rates may be somewhat less than those shown in Table 7 (Milligan, Wong, & Thompson, 1987). However, the general tendency for inflated error rates will still occur.
Second, the combination of unequal cell sizes and heterogeneity of variance calls into question the appropriateness of analyzing such data with a standard 2 × 2 ANOVA over and above the bias documented in Table 6. Third, Table 7 shows Type I error rates for the special case in which ρX1Y equals .5. If the correlation between X1 and Y were lower, the degree of inflation in Type I error rates would be lessened; on the other hand, if the correlation were higher, the degree of inflation would be greater. Fourth, as shown in Table 7, Type I error rates will generally increase as a function of sample size. If sample size is sufficiently large, the error rate of the test may approach 1.0. Thus, even though the continuous measure (X2) has no true effect, a researcher who dichotomizes variables in this situation is almost certain to conclude that the variable does have an effect, thereby committing a Type I error. Ironically, researchers may be most likely to dichotomize two independent variables when their sample is quite large because they may then believe that they can afford the loss in power they have been led to anticipate. However, it is exactly this situation in which the true result of dichotomization is likely to be a badly inflated Type I error rate.

Table 7 Approximate Type I Error Rate for the X2 Main Effect When ρX1Y = .5

           ρX1X2
N        0      .3     .5     .7
50       .05    .06    .08    .10
100      .05    .08    .12    .18
200      .05    .10    .19    .31

Correlational Perspective

Another perspective on the bias produced by dichotomization can be obtained by considering the partial correlation ρX2Y·X1. According to the assumptions of Equation 1, this partial correlation equals zero for the three continuous variables. Letting X1d and X2d represent the dichotomous equivalents, the partial correlation when X1 and X2 are dichotomized can be written as

$$\rho_{X_{2d}Y\cdot X_{1d}} = \frac{\rho_{X_{2d}Y} - \rho_{X_{1d}X_{2d}}\,\rho_{X_{1d}Y}}{\sqrt{(1 - \rho_{X_{1d}X_{2d}}^2)(1 - \rho_{X_{1d}Y}^2)}} \tag{9}$$

As stated earlier, for a bivariate normal distribution,³

$$\rho_{X_{1d}Y} = .798\,\rho_{X_1Y} \tag{10}$$

$$\rho_{X_{2d}Y} = .798\,\rho_{X_2Y} \tag{11}$$

and

$$\rho_{X_{1d}X_{2d}} = .637\,\rho_{X_1X_2} \tag{12}$$

Thus, the partial correlation ρX2dY·X1d can be written in terms of the original correlations as

$$\rho_{X_{2d}Y\cdot X_{1d}} = \frac{.798\,\rho_{X_2Y} - (.637\,\rho_{X_1X_2})(.798\,\rho_{X_1Y})}{\sqrt{[1 - (.637\,\rho_{X_1X_2})^2][1 - (.798\,\rho_{X_1Y})^2]}} \tag{13}$$

$$= \frac{.798\,(\rho_{X_2Y} - .637\,\rho_{X_1X_2}\,\rho_{X_1Y})}{\sqrt{(1 - .405\,\rho_{X_1X_2}^2)(1 - .637\,\rho_{X_1Y}^2)}} \tag{14}$$

According to our assumptions, ρX2Y·X1 equals zero, so it follows from the definition of partial correlation that

$$\rho_{X_2Y} = \rho_{X_1X_2}\,\rho_{X_1Y} \tag{15}$$

Substituting from Equation 15 into the numerator of Equation 14 yields

$$\rho_{X_{2d}Y\cdot X_{1d}} = \frac{.290\,\rho_{X_1X_2}\,\rho_{X_1Y}}{\sqrt{(1 - .405\,\rho_{X_1X_2}^2)(1 - .637\,\rho_{X_1Y}^2)}} \tag{16}$$

3 Although the .637 reduction factor for ρX1X2 derived by Peters and Van Voorhis (1940, pp. 395-398) has generally been regarded as exact, it is in fact an approximation. The .637 figure closely approximates the true reduction factor of (2 arcsin ρX1X2)/(πρX1X2) as long as ρX1X2 is .5 or less.
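As a numerical illustration of Equation 16, the sketch below evaluates the partial correlation after dichotomization for the running example (ρX1Y = .7, ρX1X2 = .5). It is a hedged check rather than part of the original analysis: the function names are hypothetical, the constants .798 and .637 are taken as sqrt(2/π) and 2/π, and the `exact_phi` option (an addition made here) swaps in the arcsine expression from the footnote for the correlation between the two dichotomized predictors.

```python
from math import asin, pi, sqrt


def partial_corr(r_2y, r_12, r_1y):
    """First-order partial correlation of variable 2 with Y, controlling for 1."""
    return (r_2y - r_12 * r_1y) / sqrt((1 - r_12 ** 2) * (1 - r_1y ** 2))


def dichotomized_partial(rho_x1y, rho_x1x2, exact_phi=False):
    """Partial correlation of dichotomized X2 with Y given dichotomized X1,
    when the continuous partial correlation is exactly zero."""
    rho_x2y = rho_x1x2 * rho_x1y            # zero continuous partial correlation
    a = sqrt(2 / pi)                        # .798 attenuation for one split
    r_1dy = a * rho_x1y
    r_2dy = a * rho_x2y
    if exact_phi:
        r_12d = 2 * asin(rho_x1x2) / pi     # arcsine expression from the footnote
    else:
        r_12d = (2 / pi) * rho_x1x2         # the .637 approximation
    return partial_corr(r_2dy, r_12d, r_1dy)


approx = dichotomized_partial(0.7, 0.5)
exact = dichotomized_partial(0.7, 0.5, exact_phi=True)
print(round(approx, 3), round(exact, 3))
```

Both versions give a clearly nonzero partial correlation for these inputs, and both return zero when either ρX1X2 or ρX1Y is zero.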


Thus, although the partial correlation for the continuous variables is zero, the partial correlation when X1 and X2 are dichotomized will be nonzero any time both ρX1X2 and ρX1Y are nonzero. Consistent with Equation 8 and Table 6, the bias worsens the more ρX1X2 and ρX1Y depart from zero. Equation 16 demonstrates the bias produced by dichotomization more clearly than does Equation 8, primarily because the correlational approach is naturally suited to continuous measures, unlike the ANOVA underlying Equation 8, which requires an artificial dichotomization. However, from another perspective, Equation 8 is more useful than Equation 16. Although Equation 16 clearly shows conditions under which bias will be present, it cannot be used to assess the magnitude of the bias produced by using ANOVA to analyze these artificially dichotomized variables. Equation 8 is important because it provides a direct measure of the magnitude of this bias from the ANOVA perspective.

Another useful perspective for understanding the bias produced by dichotomization comes from the literature on the effects of measurement error. In essence, dichotomizing a continuous predictor variable can be conceptualized as adding error of measurement to the variable. As a result, the effects of dichotomization are similar to the effects of random error of measurement, which have been documented for multiple regression by such sources as Cochran (1968) and Kenny (1979). In particular, error of measurement in a predictor variable biases both its own regression weight and the weights of other correlated predictors. If X1 and X2 are positively correlated, as in our examples, error of measurement in X1 will tend to deflate the regression coefficient for X1 but inflate the coefficient for X2. Thus, even if the true population weight for X2 is zero, its expected value will be positive in an equation where X1 is artificially dichotomized (cf. Kenny, 1979, p. 81). At the same time, the true effect of X1 may be underestimated, so previous methodologists who have asserted that dichotomization can reduce power are correct, even in multivariate situations. Paradoxically, dichotomization of two or more predictor variables can reduce power for some of these variables while simultaneously leading to spurious statistical significance for others.

Figure 2 contains an asymmetric scatterplot matrix (Cleveland, 1985) to illustrate why dichotomizing X1 inflates the regression coefficient for X2. Each of the four parts of Figure 2 shows a scatterplot for a pair of variables that is based on the data originally shown in Figure 1. The two columns of Figure 2 correspond to X1 and X2, respectively; that is, X1 is the horizontal dimension for the two scatterplots on the left, and X2 is the horizontal dimension for the two scatterplots on the right. The top row represents the residual formed when Y is regressed on X1 (the continuous X1 variable); the bottom row represents the residual formed when Y is regressed on X1d (the dichotomized X1 variable); that is, CRESID is the vertical dimension for the two scatterplots on the top, and DRESID is the vertical dimension for the two scatterplots on the bottom. The scatterplot between CRESID and X1 in the upper left hand square shows that there is no relationship between X1 and the residual formed when Y is regressed on X1. The lack of any linear relationship simply reflects a well-known property of ordinary least squares, namely that the residual is always uncorrelated with the predictor. The scatterplot to the right (between CRESID and X2) shows further that X2 also has no relationship with this

Figure 2. Scatterplot matrix of residuals (CRESID and DRESID) with X1 (errors) and X2 (speed).

residual. Thus, X2 contributes nothing to the prediction of Y once X1 has been controlled for. However, the bottom row of scatterplots shows a very different picture. When Y is regressed on X1d, a nonzero linear relationship remains between the residual (DRESID) and X1, because X1d has failed to capture the full extent of the relationship between Y and X1. To the extent that X2 is correlated with X1, it will also correlate with DRESID, so that X2 will predict Y after X1d has been controlled for, even though X2 has no true effect on Y. This is indicated in the figure by the positive slope of the regression line in the lower right quadrant.
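The same point can be made with population regression weights instead of scatterplots. The following sketch is an illustration under assumptions that are not in the original: X1 and X2 are standard bivariate normal, Y = β0 + β1·X1 + ε, X1 is replaced by a 0/1 indicator of X1 > 0 while X2 is left continuous, and the function name is hypothetical. It solves the two normal equations from the population covariances.

```python
from math import pi, sqrt


def slopes_with_dichotomized_x1(beta1, rho):
    """Population slopes when Y is regressed on the indicator 1(X1 > 0) and
    continuous X2, given Y = beta0 + beta1 * X1 + error."""
    phi0 = 1 / sqrt(2 * pi)       # E[X1 * 1(X1 > 0)] for a standard normal X1
    var_d = 0.25                  # variance of a 50/50 indicator
    var_x2 = 1.0
    cov_d_x2 = rho * phi0         # X2 = rho * X1 + independent part
    cov_d_y = beta1 * phi0
    cov_x2_y = beta1 * rho
    det = var_d * var_x2 - cov_d_x2 ** 2
    slope_d = (var_x2 * cov_d_y - cov_d_x2 * cov_x2_y) / det
    slope_x2 = (var_d * cov_x2_y - cov_d_x2 * cov_d_y) / det
    return slope_d, slope_x2


slope_d, slope_x2 = slopes_with_dichotomized_x1(beta1=10, rho=0.5)
print(round(slope_d, 2), round(slope_x2, 2))
```

With β1 = 10 and ρ = .5, the weight for X2 is positive even though its weight in the continuous model is zero, and it returns to zero when ρ = 0.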

Testing Interactions

One possible reason for dichotomizing two continuous measures is to test their interactive effect on a third variable. The rationale for this procedure is quite likely a result of the centrality of the concept of interaction in ANOVA textbooks, accompanied historically by a lack of concern for testing interactions between continuous variables using multiple regression. However, as pointed out most vividly by Cronbach and Snow (1977), artificial dichotomization is likely to reduce power substantially for testing interactions. Although many multiple regression textbooks still fail to discuss interactions between two continuous variables, the topic has begun to receive attention in some texts (e.g., Cohen & Cohen, 1983, pp. 320-325). More recently, two books have appeared that are devoted solely to the topic of testing interactions in multiple regression (Aiken & West, 1991; Jaccard, Turrisi, & Wan, 1990). As these sources clearly point out (e.g., Aiken & West, 1991, pp. 167-168; Jaccard et al., 1990, pp. 48-49), artificial dichotomization will usually reduce the statistical power to detect an interaction.


Although it is unquestionably true that artificial dichotomization can dramatically lower power to detect true interactions, it is also nevertheless true that dichotomizing can create the illusion of an interaction where none exists. Lubinski and Humphreys (1990) have shown how nonlinear effects in regression models can masquerade as interaction effects. Artificially dichotomizing variables results in an inability to distinguish nonlinear effects from interaction effects. To understand how artificial dichotomization can produce an illusory interaction, suppose that unbeknownst to a researcher, the relationship among Y, X1, and X2 is defined by the following equation:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1^2 + \varepsilon \tag{17}$$

Note that X1 has a quadratic effect on Y but that X1 and X2 do not interact. Furthermore, suppose that X1 and X2 follow a standardized bivariate normal distribution with correlation coefficient ρ. If X1 and X2 are dichotomized at their respective population medians, the cell means for Y can be found directly from Equations 3 and 4 (and their counterparts for the other quadrants). To simplify notation, let μ11, μ12, μ21, and μ22 refer to the cell means for Y in the upper left, upper right, lower left, and lower right quadrants, respectively (see Figure 1). Similarly, let F1 represent the volume in the lower left and upper right quadrants and F2 the volume in the upper left and lower right quadrants. It then follows that the population means for the four cells are given by the following expressions:

$$\mu_{11} = \beta_0 + (\beta_2 - \beta_1)\,\frac{.199(1-\rho)}{F_2} + \beta_3\left[1 - \frac{0.3989\,\rho\sqrt{1-\rho^2}}{\sqrt{2\pi}\,F_2}\right] \tag{18}$$

$$\mu_{12} = \beta_0 + (\beta_1 + \beta_2)\,\frac{.199(1+\rho)}{F_1} + \beta_3\left[1 + \frac{0.3989\,\rho\sqrt{1-\rho^2}}{\sqrt{2\pi}\,F_1}\right] \tag{19}$$

$$\mu_{21} = \beta_0 - (\beta_1 + \beta_2)\,\frac{.199(1+\rho)}{F_1} + \beta_3\left[1 + \frac{0.3989\,\rho\sqrt{1-\rho^2}}{\sqrt{2\pi}\,F_1}\right] \tag{20}$$

$$\mu_{22} = \beta_0 + (\beta_1 - \beta_2)\,\frac{.199(1-\rho)}{F_2} + \beta_3\left[1 - \frac{0.3989\,\rho\sqrt{1-\rho^2}}{\sqrt{2\pi}\,F_2}\right] \tag{21}$$

Equation 23 can be simplified to .3989 ftpVl - p2 _ = 0, which (unless p = 1) further simplifies to

ftp = o.

(25)

Thus, there will be no population interaction in the 2 X 2 design as long as either p = 0 or ft = 0. When the two predictor variables are correlated (i.e., p * 0), ft must equal 0, or there will appear to be an interaction. If Xl and X2 are correlated and either has a quadratic effect on Y, artificially dichotomizing the predictors will also produce an artificial interaction even though in reality the variables do not interact. Furthermore, the practice of artificially dichotomizing is especially pernicious here, because the dichotomization prohibits the researcher from being able to follow Lubinski and Humphreys's (1990) recommendations to disentangle interaction effects from nonlinear effects. For example, suppose that Y is related to A', and X2 in the following manner:

(26)

Y= 100

where X\ and X2 follow a standardized bivariate normal distribution with a correlation of .5, and t is a random error term with a variance equal to 9. Tables 8 and 9 show results obtained from applying a 2 X 2 ANOVA to a randomly generated sample of 500 observations from such a population using SYSTAT (1990). As expected from Equations 18 through 21, although both Xi and X2 have an effect on Y, the effect varies as a function of the specific levels of X, and X2. Furthermore, the ANOVA table in Table 9 shows that the Xl X X2 interaction is statistically significant for these data. However, as Equation 26 shows, these data were in fact generated from a model in which A', and X2 do not interact. When nonlinear effects exist, artificially dichotomizing two correlated predictor variables will produce cell means where the population interaction is nonzero, even when the continuous variables do not truly interact. In this situation, the 2 X 2 ANOVA can lead to a badly inflated Type I error rate for the interaction test.

.199(1 -p)

Summary

M22 = 00 + (01 - 02)

1 -•

(24)

(21)

There will be no population interaction among the 2 x 2 cell means if and only if Mn - Mi2 - M2i + M22 = 0, (22) which from Equations 18 through 21 will be true if and only if

In summary, effects of dichotomized variables in 2 X 2 factorial designs are often biased estimates of the true effect of the underlying continuous variable. Researchers who justify their Table 8 Cdl Means

&Data RandornlV Generated from Equation 26

203 1 -

(23)

High Low

Low

High

100.40 99.76

103.14 100.70

188

SCOTT E. MAXWELL AND HAROLD D. DELANEY

Table 9 ANOVA Table for Data in Figure 5 Source

x,

X2 X , X X-i

SS

df

MS

375.99 263.01 89.99

1 1 1

375.99 263.01 89.99

25.29 36.15 8.65

.001 .001 .003

Note. ANOVA = analysis of variance

use of bivariate median splits cannot fall back on the argument that their results were statistically significant in spite of their taking a conservative approach. When two (or more) continuous predictor measures are dichotomized, the resulting 2 x 2 analysis is not necessarily conservative. Instead, there is the potential for an effect that is truly zero for a continuous measure to be estimated as a small to medium effect in the 2 x 2 factorial design. The extent of bias worsens as the continuous measures become more highly correlated. The potential for misinterpretation is quite high, because many of the measures studied this way in the psychological literature are undoubtedly highly correlated (e.g., depression and self-esteem). This recommendation to avoid performing median splits on continuous variables should not be interpreted as implying that all psychological variables are necessarily best regarded as dimensional instead of as types or classes. However, as Gangestad and Snyder (1985), Tellegen and Lubinski (1983), and others have pointed out, different methodologies are appropriate for studying discrete classes, so even researchers who postulate the existence of types will rarely find median splits to be appropriate for testing their hypotheses. Thus, as many other methodologists have argued, there is almost always a cost associated with dichotomization. However, the perspective offered here is that the cost is not necessarily one of conservatism, but instead may be an increase in Type I errors. As a consequence, statistically significant main effects and interactions obtained with dichotomized variables in factorial designs cannot necessarily be trusted as reflecting true population effects of the corresponding continuous variables.
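The illusory interaction described under Testing Interactions can be checked numerically. The sketch below computes the population cell means for a quadratic model with no interaction after both predictors are split at their medians, and then fits a regression that includes quadratic and product terms to simulated data. Several pieces are assumptions rather than material from the article: the quadrant volume 1/4 ± arcsin(ρ)/(2π) and the conditional moments are standard results for a standard bivariate normal distribution truncated at zero (they are not quoted from Equations 3 and 4), the coefficients (100, 1, 1, 1) are purely illustrative and need not match Equation 26, and all function names are hypothetical. The code requires |ρ| < 1.

```python
import math
import numpy as np

HALF_PHI0 = 0.5 / math.sqrt(2.0 * math.pi)  # about .199


def quadrant_volumes(rho):
    """Return (F1, F2): probability of one same-sign and one opposite-sign quadrant."""
    shift = math.asin(rho) / (2.0 * math.pi)
    return 0.25 + shift, 0.25 - shift


def population_cell_means(b0, b1, b2, b3, rho):
    """Cell means of Y = b0 + b1*X1 + b2*X2 + b3*X1**2 + e after splits at zero."""
    f1, f2 = quadrant_volumes(rho)
    lin_same = HALF_PHI0 * (1.0 + rho) / f1  # E(X1 | X1 > 0, X2 > 0)
    lin_opp = HALF_PHI0 * (1.0 - rho) / f2   # E(X1 | X1 > 0, X2 < 0)
    curve = rho * math.sqrt(1.0 - rho ** 2) / (2.0 * math.pi)
    sq_same = 1.0 + curve / f1               # E(X1**2 | same-sign quadrant)
    sq_opp = 1.0 - curve / f2                # E(X1**2 | opposite-sign quadrant)
    return {
        "mu11": b0 + (b2 - b1) * lin_opp + b3 * sq_opp,    # X1 low, X2 high
        "mu12": b0 + (b1 + b2) * lin_same + b3 * sq_same,  # X1 high, X2 high
        "mu21": b0 - (b1 + b2) * lin_same + b3 * sq_same,  # X1 low, X2 low
        "mu22": b0 + (b1 - b2) * lin_opp + b3 * sq_opp,    # X1 high, X2 low
    }


def interaction_contrast(means):
    """2 x 2 interaction contrast; zero means no interaction among the cell means."""
    return means["mu11"] - means["mu12"] - means["mu21"] + means["mu22"]


def simulate(b0, b1, b2, b3, rho, n, error_sd=3.0, seed=0):
    """Draw (x1, x2, y) from the quadratic, no-interaction model (error variance 9)."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    y = b0 + b1 * x1 + b2 * x2 + b3 * x1 ** 2 + error_sd * rng.standard_normal(n)
    return x1, x2, y


def sample_cell_means(x1, x2, y):
    """Observed cell means after dichotomizing both predictors at zero."""
    hi1, hi2 = x1 > 0, x2 > 0
    return {
        "mu11": float(y[~hi1 & hi2].mean()),
        "mu12": float(y[hi1 & hi2].mean()),
        "mu21": float(y[~hi1 & ~hi2].mean()),
        "mu22": float(y[hi1 & ~hi2].mean()),
    }


def regression_coefficients(x1, x2, y):
    """OLS on 1, x1, x2, x1**2, x2**2, x1*x2 (quadratic terms alongside the product)."""
    design = np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


pop = population_cell_means(100.0, 1.0, 1.0, 1.0, rho=0.5)
print("population interaction contrast:", round(interaction_contrast(pop), 4))

x1, x2, y = simulate(100.0, 1.0, 1.0, 1.0, rho=0.5, n=50_000, seed=0)
coef = regression_coefficients(x1, x2, y)
print("regression product-term coefficient:", round(float(coef[5]), 3))
```

Under these assumptions the dichotomized cell means show a nonzero interaction contrast whenever both ρ and the quadratic coefficient are nonzero, whereas the regression on the continuous scores attributes the curvature to the squared term and leaves the product term near zero.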

References

Aiken, L. S., & West, S. G. (with Reno, R. R.). (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
Ashman, A. F. (1982). Strategic behavior and linguistic functions of institutionalized moderately retarded persons. International Journal of Rehabilitation Research, 5, 203-214.
Bem, S. L. (1977). On the utility of alternative procedures for assessing psychological androgyny. Journal of Consulting and Clinical Psychology, 45, 196-205.
Block, J., Block, J. H., & Harrington, D. M. (1974). Some misgivings about the Matching Familiar Figures Test as a measure of reflection-impulsivity. Developmental Psychology, 10, 611-632.
Cleveland, W. S. (1985). The elements of graphing data. Monterey, CA: Wadsworth.
Cochran, W. G. (1968). Errors of measurement in statistics. Technometrics, 10, 637-666.
Cohen, J. (1978). Partialed products are interactions; partialed powers are curve components. Psychological Bulletin, 85, 858-866.
Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7, 249-253.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Cronbach, L. J., & Snow, R. E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. New York: Wiley.
Gangestad, S., & Snyder, M. (1985). "To carve nature at its joints": On the existence of discrete classes in personality. Psychological Review, 92, 317-349.
Humphreys, L. G. (1978). Research on individual differences requires correlational analysis, not ANOVA. Intelligence, 2, 1-5.
Humphreys, L. G., & Fleishman, A. (1974). Pseudo-orthogonal and other analysis of variance designs involving individual-differences variables. Journal of Educational Psychology, 66, 464-472.
Jaccard, J., Turrisi, R., & Wan, C. K. (1990). Interaction effects in multiple regression. Newbury Park, CA: Sage.
Kendall, M. G., & Stuart, A. (1958). The advanced theory of statistics. London: Griffin.
Kenny, D. A. (1979). Correlation and causality. New York: Wiley.
Lubinski, D., & Humphreys, L. G. (1990). Assessing spurious "moderator effects": Illustrated substantively with the hypothesized ("synergistic") relation between spatial and mathematical ability. Psychological Bulletin, 107, 385-393.
Lubinski, D., Tellegen, A., & Butcher, J. N. (1983). Masculinity, femininity, and androgyny viewed and assessed as distinct concepts. Journal of Personality and Social Psychology, 44, 428-439.
Madison, L. S., Madison, J. K., & Adubato, S. A. (1986). Infant behavior and development in relation to fetal movement and habituation. Child Development, 57, 1475-1482.
Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data: A model comparison perspective. Belmont, CA: Wadsworth.
Maxwell, S. E., Delaney, H. D., & Dill, C. A. (1984). Another look at ANCOVA versus blocking. Psychological Bulletin, 95, 136-147.
McFatter, R. M., & Gollob, H. F. (1986). The power of hypothesis tests for comparisons. Educational and Psychological Measurement, 46, 883-886.
McNemar, Q. (1969). Psychological statistics (4th ed.). New York: Wiley.
Milligan, G. W., Wong, D. S., & Thompson, P. A. (1987). Robustness properties of nonorthogonal analysis of variance. Psychological Bulletin, 101, 464-470.
Peters, C. C., & Van Voorhis, W. R. (1940). Statistical procedures and their mathematical bases. New York: McGraw-Hill.
Rosenbaum, S. (1961). Moments of a truncated bivariate normal distribution. Journal of the Royal Statistical Society (Ser. B), 23, 405-408.
Schill, T., & O'Laughlin, M. S. (1984). Humor preference and coping with stress. Psychological Reports, 55, 309-310.
Spence, J. T. (1983). Comment on Lubinski, Tellegen, and Butcher's "Masculinity, femininity, and androgyny viewed and assessed as distinct concepts." Journal of Personality and Social Psychology, 44, 440-446.
Spence, J. T., & Helmreich, R. L. (1978). Masculinity & femininity: Their psychological dimensions, correlates, and antecedents. Austin: University of Texas Press.
Stoltz, R. E., & Galassi, J. P. (1989). Internal attributions and types of depression in college students: The learned helplessness model revisited. Journal of Counseling Psychology, 36, 316-321.
SYSTAT. (1990). SYSTAT. Evanston, IL: Author.
Tang, T. L. (1988). Effects of Type A personality and leisure ethic on Chinese college students' leisure activities and academic performance. Journal of Social Psychology, 128, 153-154.
Tellegen, A., & Lubinski, D. (1983). Some methodological comments on labels, traits, interaction, and types in the study of "femininity" and "masculinity": Reply to Spence. Journal of Personality and Social Psychology, 44, 447-455.
Tolor, B., & Tolor, A. (1982). An attempted modification of impulsivity and self-esteem in kindergarteners. Psychology in the Schools, 19, 526-531.

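Appendix

Derivation of Expression for d

A mathematical expression for $d$ can be derived from the equations in the text. As shown in Equation 7, $d$ can be calculated as

$$d = \frac{E(Y \mid X_1 > 0, X_2 > 0) - E(Y \mid X_1 > 0, X_2 < 0)}{\sigma_Y}.$$

The numerator of this expression can be written as

$$E(Y \mid X_1 > 0, X_2 > 0) - E(Y \mid X_1 > 0, X_2 < 0) = \beta_1\left[E(X_1 \mid X_1 > 0, X_2 > 0) - E(X_1 \mid X_1 > 0, X_2 < 0)\right],$$

which equals

$$\beta_1\left[\frac{0.199(1 + \rho)}{F_1} - \frac{0.199(1 - \rho)}{F_2}\right],$$

where $F_1$ and $F_2$ are the respective volumes in the upper right and lower right quadrants of the bivariate normal distribution (see Figure 1). This expression for the numerator of $d$ can be simplified by obtaining a common denominator and by expressing $\beta_1$ in terms of $\rho_{X_1Y}$, because

$$\beta_1 = \frac{\rho_{X_1Y}}{\sqrt{1 - \rho_{X_1Y}^2}}.$$

Thus, the numerator of $d$ can be written as

$$\frac{0.199\,\rho_{X_1Y}\left[(1 + \rho)F_2 - (1 - \rho)F_1\right]}{F_1F_2\sqrt{1 - \rho_{X_1Y}^2}}.$$

The denominator of the expression for $d$ is the pooled within-cell standard deviation of $Y$. Because of the symmetry of the distribution, it is sufficient once again to consider only the upper right and lower right quadrants of the distribution, so that $\sigma_Y$ is given by

$$\sigma_Y = \sqrt{\frac{F_1\,\mathrm{var}(Y \mid X_1 > 0, X_2 > 0) + F_2\,\mathrm{var}(Y \mid X_1 > 0, X_2 < 0)}{F_1 + F_2}}.$$

From Equation 6 of the text,

$$\mathrm{var}(Y \mid X_1 > 0, X_2 > 0) = \beta_1^2\,\mathrm{var}(X_1 \mid X_1 > 0, X_2 > 0) + 1$$

and

$$\mathrm{var}(Y \mid X_1 > 0, X_2 < 0) = \beta_1^2\,\mathrm{var}(X_1 \mid X_1 > 0, X_2 < 0) + 1.$$

The conditional variances of $X_1$ can be found from

$$\mathrm{var}(X_1 \mid X_1 > 0, X_2 > 0) = E(X_1^2 \mid X_1 > 0, X_2 > 0) - \left[E(X_1 \mid X_1 > 0, X_2 > 0)\right]^2$$

and

$$\mathrm{var}(X_1 \mid X_1 > 0, X_2 < 0) = E(X_1^2 \mid X_1 > 0, X_2 < 0) - \left[E(X_1 \mid X_1 > 0, X_2 < 0)\right]^2.$$

By substituting from Equations 3 and 4 of the text, straightforward but tedious algebra yields

$$\sigma_Y^2 = \frac{F_1^2F_2(\beta_1^2 + 1) + F_1F_2^2(\beta_1^2 + 1) - \beta_1^2F_2[0.199(1 + \rho)]^2 - \beta_1^2F_1[0.199(1 - \rho)]^2}{F_1F_2(F_1 + F_2)}.$$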

By collecting terms and substituting 0.5 for the sum of F₁ and F₂, σ_Y can be rewritten as

σ_Y = √{ F₁²F₂(β₁² + 1) + F₁F₂²(β₁² + 1) − β₁²F₂[0.199(1 + ρ)]² −