206 / SOCIAL FORCES / vol. 50, dec. 1971

Measurement Error and Regression to the Mean in Matched Samples*

ROBERT P. ALTHAUSER, Princeton University
DONALD RUBIN, Harvard University

ABSTRACT

This is a treatment of "regression to the mean" in matched samples produced by measurement error in matching variables. An equation expressing the respective contributions of regression and treatment effects to a difference of Y-means for two matched groups is derived, followed by an intuitive picture of regression effects showing their dependence on the variance in error and the normal distribution of error and of X. An illustration of a crude estimate of regression effects produced by measurement error in occupational prestige completes the paper.

Analyses of data from matched samples (and panel studies) have long been considered vulnerable to the "regression fallacy" (or regression effects or "regression to the mean") (McNemar, 1940a, 1940b; Rulon, 1941;

* We are grateful to the following for their helpful comments on earlier drafts of this paper: James Posner, Charles Werts, Robert Linn, H. M. Blalock, and Robert McGinnis. We are also indebted to the following organizations who supported the study which drew our attention to regression effects and which provided the illustrative data discussed in the latter part of this paper: the

Thorndike, 1942). The problem is said to arise because groups under study (e.g., matched groups) have been "selected for their extremity" (Campbell and Stanley, 1966: 11) on certain variables. (These variables are sometimes considered fallible; at other times considered error-free.)

College Entrance Examination Board, and the Ford, Carnegie, Esso Education, Alfred P. Sloan, New York, Woodrow Wilson National Fellowship, Seth Sprague Education, John Hay Whitney, Field and Roger Williams Straus Memorial Foundations.


1 In our terminology of matched sampling, there are four types of variables. We are primarily interested in matched sample studies of the relationship between some dependent variable and some independent, treatment, or match variable. Characteristics with respect to which two (or more) groups are matched (and so made equivalent) are matching variables. Variables outside this system of dependent, match, and matching variables are extraneous or external variables. For a causal model of these variables, see Althauser and Rubin (1970).

generally a function of the degree of correlation...." More pointed are Campbell and Clayton (1961:115) as they respond to Maccoby's (1956) paper, which stressed measurement error:

Maccoby has called attention effectively to the problem in certain types of panel analyses other than those treated here, using primarily random error considerations to explain the effects. It would however be unfortunate if her presentation led to the belief that the mistaken interpretation is limited to instances of unreliable measures or the use of broad categories of measurement. Regression effects will be present wherever the correlation is less than unity, no matter how reliable or with what degree of refinement the variables in question are measured, or what the underlying sources of correlation, or the lack thereof, may be (our italics). The effects here discussed are tautological restatements of the fact of imperfect relationships and their degree.

From this point of view, the following treatment of regression as a function of measurement error must necessarily be incomplete. Our own view is that regression should be considered solely a function of measurement error in the pre-test or matching variable. We will take for granted but not defend that view here. We offer a review and critique of the "imperfect correlation" interpretation in another paper (Althauser and Rubin, 1971).

THE CONDITIONS FOR DIFFERENTIAL REGRESSION

Let us consider the conditions which underlie regression effects. We will utilize a fairly typical example: the impact of some "match variable" or treatment variable (e.g., two types of educational "treatment") on Y (e.g., final IQ) when two groups of students have been matched on a variable X (e.g., initial IQ or SES) on which they have unequal population distributions. We will show in the equations to follow that to the degree

1. that there is random measurement error in the matching variable (initial IQ);
2. that we can assume that this error and the matching variable are normally distributed (or approximately so); and
3. that the ratio of the variance of this error to the variance in X is appreciable,


When we match, for example, we attempt to "equalize" the members of two or more experimental or natural groups with respect to one or more characteristics or matching variables.1 In doing this we usually oversample members of each group who are decidedly above or below the average values of each characteristic, because prior to matching, the average characteristics of the respective groups differ. As a result, the values of the matching variables of each matched group "regress" toward different and respectively less extreme population means of Y, a dependent variable or post-test, creating a difference of Y-means which is in part artifactual.

Previous treatments of this problem have produced two distinct interpretations of regression. Those who see these effects as a manifestation of measurement error include McNemar (1940a, 1940b), Rulon (1941), and Maccoby (1956). Others see regression effects in the usually imperfect correlations between matching or pre-test and dependent or post-test variables. Among those holding this view, Thorndike (1942) says very little about measurement error producing the imperfect correlation. Campbell and Clayton (1961) and Campbell and Stanley (1966) hold that both measurement error and imperfect correlation per se (between error-free tests) produce regression. Some, like Lord (1962), Campbell and Stanley (1966), and Hovland et al. (1949), blend the two interpretations.

Of all these treatments, the most recent and influential have been those by Campbell and his associates. They have stressed the interpretation of regression as imperfect correlation per se over the interpretation of regression as measurement error. Thus, Campbell and Stanley (1966:11) write, "While regression has been discussed here in terms of error, it is more


(1)  Y_ij = μ_Yi + β(X_ij − μ_Xi) + d_ij = (μ_Yi − βμ_Xi) + βX_ij + d_ij

where

μ_Yi = the mean value of the final measurement of IQ in group i
μ_Xi = the mean value of the initial measurement of IQ in group i
β = the regression coefficient of Y on X (assumed the same in both groups)
d_ij = the residual after Y_ij is regressed on X_ij, with E(d_ij | X_ij) = 0.

The effect of the match variable (or the treatment effect) is the expected difference in the final measures of IQs of exactly matched students (i.e., they have identical X measures). This difference is also a difference of Y-intercepts:

(2)  T = (μ_Y1 − μ_Y2) − β(μ_X1 − μ_X2)

We usually cannot measure a "true" X_ij or Y_ij free of measurement error, but rather a fallible X'_ij = X_ij + e_ij and Y'_ij = Y_ij + f_ij. As before, we assume E(e_ij | X_ij) = 0 and E(f_ij | Y_ij) = 0, where e_ij is the measurement error in X'_ij and f_ij is the error in Y'_ij. Then if X_ij, e_ij, and f_ij are normally distributed, the regression of Y' on X' remains linear:2

(3)  Y'_ij = μ_Yi + β'(X'_ij − μ_Xi) + d'_ij

where d'_ij is the residual after Y'_ij is regressed on X'_ij, with E(d'_ij | X'_ij) = 0, and β' is the "attenuated" regression coefficient. Only in the absence of measurement error in X_ij will β = β'. What we want to estimate is the effect (T) of the match variable. The usual estimate is the difference of observed means in Y, (Ȳ'_1. − Ȳ'_2.). Taking expectations of this difference, assuming samples exactly matched on the fallible X'_ij, we have

2 We need the assumption of normal errors and normal X in order that the regression of Y on X' remains linear, and hence that the equations to follow are simply derived. Our dependence on the normality of errors is more a matter of theoretical convenience than our assumption that X is normal. It is easy to imagine distributions of X which are non-normal (e.g., bimodal distributions) for which regression to the mean does not occur. When errors are non-normal, however, regression to the mean probably still occurs for most conceivable distributions of errors.

(4)  E(Ȳ'_1. − Ȳ'_2.) = E(μ_Y1 + β'(X̄'_1. − μ_X1)) − E(μ_Y2 + β'(X̄'_2. − μ_X2))

(5)       = (μ_Y1 − μ_Y2) + β'E(X̄'_1. − X̄'_2.) − β'(μ_X1 − μ_X2)

(6)       = (μ_Y1 − μ_Y2) − β'(μ_X1 − μ_X2)

since with exactly matched samples, X̄'_1. = X̄'_2.. If we compare equation (2) with (6), we see that E(Ȳ'_1. − Ȳ'_2.) = T only if β = β'. That is, the difference of observed means in Y will estimate the treatment effect only if there is no measurement error in X'_ij. If X_ij and e_ij are independent and normally distributed with variances σ²_X and σ²_e respectively, it can be shown that

(7)  β' = β / [1 + (σ²_e/σ²_X)] = β[1 − (σ²_e/σ²_X) / (1 + (σ²_e/σ²_X))]
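The attenuation in equation (7) can be checked with a short simulation. This is a minimal sketch under assumed illustrative values (β = 1, σ²_X = 4, σ²_e = 1; none of these come from the paper): the least-squares slope of Y on the fallible X' should come out near β/[1 + (σ²_e/σ²_X)] = 0.8.

```python
import random

random.seed(0)
beta, var_x, var_e = 1.0, 4.0, 1.0   # assumed values for illustration
n = 200_000

xs  = [random.gauss(0.0, var_x ** 0.5) for _ in range(n)]
ys  = [beta * x + random.gauss(0.0, 1.0) for x in xs]      # true regression of Y on X
xps = [x + random.gauss(0.0, var_e ** 0.5) for x in xs]    # fallible X' = X + e

# least-squares slope of Y on the fallible X'
mx = sum(xps) / n
my = sum(ys) / n
cov = sum((xp - mx) * (y - my) for xp, y in zip(xps, ys)) / n
var = sum((xp - mx) ** 2 for xp in xps) / n
beta_prime = cov / var

print(round(beta_prime, 2), beta / (1 + var_e / var_x))   # both come out near 0.8
```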

Solving equation (2) for (μ_Y1 − μ_Y2) and substituting into (6), we have

(8)  E(Ȳ'_1. − Ȳ'_2.) = T + (μ_X1 − μ_X2)(β − β')

After substituting equation (7) into (8), we finally arrive at a general expression for the average value of the difference of observed means in Y:

(9)  E(Ȳ'_1. − Ȳ'_2.) = T (treatment effect) + β(μ_X1 − μ_X2)[(σ²_e/σ²_X) / (1 + (σ²_e/σ²_X))] (regression effect)
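Equation (9) can be illustrated with a small Monte Carlo sketch. All numbers here are assumptions for illustration (β = 1, σ²_X = 4, σ²_e = 1, μ_X1 − μ_X2 = 1, and a zero treatment effect T = 0); matching on the fallible X' is mimicked by giving each matched pair a common X' value and drawing each member's true X from its group's conditional normal distribution given X'. The observed difference of Y-means should then approximate the regression effect β(μ_X1 − μ_X2)(σ²_e/σ²_X)/[1 + (σ²_e/σ²_X)] = 0.2 even though T = 0.

```python
import random

random.seed(1)
beta, var_x, var_e = 1.0, 4.0, 1.0   # assumed values, not the paper's data
mu_x1, mu_x2 = 1.0, 0.0              # groups differ on true X before matching
T = 0.0                              # no treatment effect
rho = var_x / (var_x + var_e)        # reliability of X'

def y_given_xprime(xp, mu_x):
    # Under joint normality, X | X' is normal with
    # mean mu_x + rho*(xp - mu_x) and variance rho*var_e.
    x = random.gauss(mu_x + rho * (xp - mu_x), (rho * var_e) ** 0.5)
    return beta * x + random.gauss(0.0, 1.0)   # true regression of Y on X

n = 100_000
diff = 0.0
for _ in range(n):
    xp = random.gauss(0.5, 1.0)      # a common matched value of X' for the pair
    diff += (T + y_given_xprime(xp, mu_x1)) - y_given_xprime(xp, mu_x2)
observed = diff / n

r = var_e / var_x
predicted = T + beta * (mu_x1 - mu_x2) * r / (1 + r)   # regression effect of eq. (9)
print(round(observed, 2), predicted)   # both near 0.2
```

With no true treatment effect, the entire observed Y-difference between the "matched" groups is regression effect, just as equation (9) says.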


regression to the mean will follow. Thus, if there are no treatment effects, then we may falsely infer a treatment effect which, in fact, only reflects the initial difference of group means on IQ (μ_X1 − μ_X2) in favor of group 1. In the more general case, there may well be true differences between the two groups of students with respect to the final measure of IQ. Thus, genuine treatment effects and regression effects are confounded, and we may have no immediate way of knowing how much of the observed difference between groups is due to each. Letting Y_ij be the final true IQ for the jth student in the ith matched group (i = 1, 2) and X_ij be the initial true IQ, we assume that Y_ij has a linear regression on X_ij:


AN INTUITIVE APPROACH

An intuitive picture of regression to the mean can be drawn from these conditions for regression effects: that X and e are each approximately normally distributed, and that there is at least a modest ratio of the variance of e to the variance of X. To construct this picture, let us consider (Figure 1) regression to the mean within a single group. We will consider a normal-like distribution of five of the many possible true scores (say, -2, -1, 0, +1, +2). We assume that the probabilities of these five are, respectively, .01, .04, .90, .04 and .01. Superimposed on these five scores (see Figure 1) are the normally distributed errors for these

scores. An observed score X' is, of course, equal to X + e. Now let us say that we have an observed X' = −1. With at least a modest variance of the errors distributed about each true value, it is possible that the true X underlying this observed X' is −2, −1, or 0. But clearly a true score of 0 is more likely than a true score of −2, a reflection of the normal-like distribution of true Xs and the symmetrical distribution of errors. Likewise, if we observed an X' = +1, a true score of 0 is more likely than a true score of +2. In general, then, with measurement error present, the true value of X which underlies a measured X' is more likely to lie between the value of X' and the mean of the true Xs. Hence, the mean of the values of a later measurement of X' (or in matched sampling, of the dependent variable) will lie towards the mean of X relative to the value of X'.

The larger the ratio of the variance of the error to the variance of X, the flatter and broader-based would be the normal curves for the distribution of errors relative to the curve for the distribution of X. The larger the ratio, the more likely it is that the observed values of X will be very different from their true values. Hence the mean value of a re-measurement of any X could "regress" further towards the mean, the larger this ratio. By contrast, none of these results would follow if the variance of the error was very small. Measured values of X would be comprised of true scores that were very close to these measured values. Nor, if the distribution of the Xs was U-shaped, would there be a greater fre-

Figure 1. A Superimposition of the Two Components of a Hypothetical Distribution of Observed Scores (which is not shown): A Discrete Distribution of True Scores (−2, −1, 0, +1, +2) and Normal Distributions of Errors Associated with Each True Score
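The arithmetic behind this intuitive picture can be made explicit. A small sketch using the Figure 1 probabilities (.01, .04, .90, .04, .01) and an assumed error standard deviation of 1 (the text fixes no value): Bayes' rule gives the probability of each true score having produced an observed X' = −1.

```python
import math

# Discrete true scores and their probabilities, as in Figure 1
scores = [-2, -1, 0, 1, 2]
probs  = [0.01, 0.04, 0.90, 0.04, 0.01]
sd_e   = 1.0   # assumed error standard deviation (illustrative only)

def norm_pdf(z, sd):
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior(x_obs):
    # P(X = s | X' = x_obs) is proportional to P(X = s) times the
    # normal density of an error of size x_obs - s.
    w = [p * norm_pdf(x_obs - s, sd_e) for s, p in zip(scores, probs)]
    total = sum(w)
    return {s: wi / total for s, wi in zip(scores, w)}

post = posterior(-1.0)
print({s: round(p, 3) for s, p in post.items()})
# A true score of 0 comes out far more likely than -2, so the expected
# true score lies between the observed X' = -1 and the mean of the Xs.
```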


Since X and the measurement error in X are independently normally distributed with means μ_Xi and zero and standard deviations σ_X and σ_e respectively, the result above in equation (9) likewise depends on these assumptions. As we can see from equation (9), when the ratio of the variance of measurement error in X to the variance of X is at all sizeable, the expression in brackets to the right might not be negligible. Hence the expected value of the difference between matched groups on the dependent variable will be a sum of both treatment and regression effects. In the case where the former is, in fact, zero, we can falsely infer that it is not zero when measurement error in X is large enough to produce a regression effect. When there is a treatment effect and at least modest measurement error in X, we may not have the information on measurement error to be able to separate or identify the contributions of treatment and regression effects.


MEASUREMENT ERROR AND IMPERFECT CORRELATION

Consider now the implications of the preceding treatment of regression effects for the other interpretation of regression as a tautological restatement of imperfect correlation between pre- and post-test scores. Measurement error in matching variables or pre-test scores could, of course, be responsible for such imperfect correlation, but it is generally not its only source. This can be seen in a simple path analysis of the correlation between fallible pre- and post-test scores, X' and Y'. In Figures 2A, 2B, and

[Figures 2A, 2B, and 2C: path diagrams relating X, X', and Y through the paths p_XX' and p_YX.]

2C, we let X be the unmeasured true value of the pre-test or matching variable, X' be the measure of X, Y be the error-free value of the dependent variable or later measure, and e the random error affecting X'. Of course, Y will also tend to be measured imperfectly, but we do not assume as much in Figure 2A. Note also that the path from X to X' is p_XX' and p_YX is the path from X to Y. Using the algorithm from path analysis, we can interpret the correlation between X' and Y, r_X'Y, as:

(10)  r_X'Y = p_XX' p_YX

But p_XX' will depend on the amount of measurement error present in X'. From reliability theory (Bohrnstedt, 1969, has an elementary review of this), we know that:

(11)  reliability of X = p²_XX' = σ²_X / σ²_X'

where σ²_X is the variance of X, and σ²_X' is the variance of X'. We can rewrite this expression in terms of the ratio σ²_e/σ²_X, where σ²_e is the variance of the measurement error in X'. Since X' = X + e, and assuming X and e uncorrelated, and E(e) = 0, we can write

(12)  σ²_X' = σ²_X + σ²_e

and equation (11) becomes

(13)  p²_XX' = 1 / (1 + (σ²_e/σ²_X))

Thus we see that the greater the ratio of
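Equations (10) through (13) can be checked together by simulation. A sketch under assumed variances σ²_X = 4 and σ²_e = 1 (illustrative only, with an error-free Y that is linear in X): the observed correlation r_X'Y should equal the product p_XX' · p_YX, with p_XX' computed from the reliability in equations (11) through (13).

```python
import math
import random

random.seed(2)
var_x, var_e = 4.0, 1.0   # assumed variances for illustration
n = 200_000

xs  = [random.gauss(0.0, var_x ** 0.5) for _ in range(n)]
xps = [x + random.gauss(0.0, var_e ** 0.5) for x in xs]   # X' = X + e
ys  = [x + random.gauss(0.0, 1.0) for x in xs]            # error-free Y, linear in X

def corr(a, b):
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n
    va  = sum((ai - ma) ** 2 for ai in a) / n
    vb  = sum((bi - mb) ** 2 for bi in b) / n
    return cov / math.sqrt(va * vb)

p_xxp = math.sqrt(var_x / (var_x + var_e))   # p_XX' from eqs. (11)-(13)
r_xy  = corr(xs, ys)                          # p_YX: correlation of true X with Y
r_xpy = corr(xps, ys)                         # the observed r_X'Y

print(round(r_xpy, 2), round(p_xxp * r_xy, 2))   # equal, as eq. (10) predicts
```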