Quality & Quantity 27: 19-30, 1993. © 1993 Kluwer Academic Publishers. Printed in the Netherlands.
Making measurement errors and interpreting path coefficients: a practical perspective
WAYNE L. FRANCIS Dept. of Political Science, 3324 Turlington Hall, University of Florida, Gainesville, FL 32611, U.S.A.
and
EDWARD G. CARMINES Dept. of Political Science, Woodburn Hall, Indiana University, Bloomington, Indiana 47405, U.S.A.
Abstract. The purpose of this analysis is to provide a practical approach to the assessment of
reliability. In particular, we examine the impact of random measurement error upon the magnitude and interpretation of standardized regression coefficients (or path coefficients) and the specification of regression models. With the proper research the relationship between measured and true values can be inferred by using path coefficients. Such inferences allow assessments of the specification of statistical models. Several examples illustrate how researchers can be misled without knowledge of the impact of measurement error.
A voluminous literature now exists focusing on the effects of measurement error in structural equation models. Indeed, the general model for the analysis of covariance structures, implemented most widely in Joreskog and Sorbom's LISREL (1988) computer program, specifies the formal connection between structural equation models representing the causal dynamics among variables and factor analytic models representing the relationships between measured and unmeasured variables. This literature, though extensive, is highly technical and thus beyond the capability of many social science researchers (see Bollen, 1989). For this reason, the assessment of measurement error is often viewed as an esoteric and technically specialized topic that is only of concern to methodological experts. Nothing, of course, could be further from the truth. Measurement is absolutely fundamental to the process of social science research. And while the assessment of measurement error can be technically quite complex, it is possible to represent and estimate measurement error in a clear, straightforward, and uncomplicated manner. The purpose of this paper is to clarify the impact of random measurement error in structural equation models,
specifically its effect on the magnitude and interpretation of standardized regression coefficients (or path coefficients) and the specification of regression models. The focus in this treatment will not be on theoretical matters so much as the kinds of situations the typical social science researcher encounters routinely in the analysis of data.
One reason measurement error is so important in the social sciences is that at the outset, and perhaps by necessity, inaccurate procedures are employed for pinning numbers on things. The case for the seriousness of this problem has been made at length in both economics (Morgenstern, 1963) and sociology (Siegel and Hodge, 1968). In political science, many of us utilize data from election returns, yet we know that official recounts seldom give identical results. Census counts provide valuable information, but recounts are not taken and/or evaluated. Financial estimates are often compiled from thousands of transactions and it would be unlikely that such sums would be duplicated precisely in a reenactment of the aggregation process. Then we have the multitude of estimates derived from interviews and questionnaires that exhibit a variety of ways of assigning numbers to responses. The accuracy of such measurement is often suspect.
Types of error
Social scientists are concerned with several types of error, but overall these concerns can be consolidated into three areas:
1. Conceptual error. Such error occurs when the concepts or labels assigned to measures are not congruent with such measures. Typically the concepts have much broader connotations than the actual measures convey. For example, "political participation" is too broad a concept to apply to a measure that takes into account only how often people vote. Problems of this kind are sorted out by substantive specialists concerned with validity.
2. Stochastic error. Such error appears in the error term of a regression equation and results from all of the minor influences created by the many variables that cannot be taken into account in the equation model (Wonnacott & Wonnacott, 1979, p. 23). Thus normally the researcher is advised to include in the model those variables with statistically discernible impacts, which will have the effect of reducing the magnitude of the error term. Nevertheless, it is expected that many minor influences will not be known or stipulated. Stochastic error is almost never eliminated.
3. Measurement error. As the examples in the introductory section illustrate, measurement error is encountered in the acts of observing, recording, and compiling event and attribute data. The errors encountered in this
process create problems of reliability. Repeated measurement is seldom convenient, and in non-experimental research, estimates of reliability frequently are not established. It is this latter type of error that is the concern of this analysis. Most data analysts are no doubt aware of the measurement error problem. One partial solution to the problem, of course, is to measure more carefully at the outset, or to set up alternative measurement procedures on every occasion. More careful procedures can be expensive and time-consuming, and it is fair to ask whether the additional cost would be more than offset by improvements in statistical models and explanations. Most social scientists perhaps would be sympathetic to the additional costs and it is hoped that this analysis will offer a better basis for judgment.
Definitions and assumptions
In order to develop the above thesis, the following distinctions are made:
1. True-x = x, representing the actual values of a variable labeled x.
2. Measured-x = x', representing the measured values of x.
3. Given the linear equation, x' = bx + e, it is assumed that the error term is random, E(e) = 0, and that the correlation between the true scores and the error scores is zero.
4. The above distinctions hold for all other variables (e.g., y, z).
These assumptions can be altered to treat special conditions, say where measurement error is a function of the true values, but the possibilities are too numerous to treat formally here, and in any case would divert our attention from the central points to be addressed.
Classical reliability theory defines "parallel" measures as those measures that have identical true scores and equal variance. The correlation between true-x and measured-x is equal to the square root of the correlation between two parallel measures, such that
$$p_{x'x} = \sqrt{p_{x'_1 x'_2}} \qquad (1)$$
(Carmines and Zeller, 1979). The square of the right hand term (i.e., the correlation between parallel measures) is normally referred to as the "reliability coefficient," although as may be seen, it may be more direct and intuitive to work with the path coefficient (the left hand term above), since it represents the degree of correspondence between true-x and measured-x.
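The relationship between the path coefficient and the reliability coefficient can be checked numerically. The following is a minimal simulation sketch, not part of the original analysis: it assumes normally distributed true scores with unit variance, independent normal errors, and an error standard deviation of 0.75 (chosen so that the true-to-measured correlation is 0.8 and the reliability is 0.64). The function names, sample size, and seed are illustrative.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_parallel_measures(n=20000, error_sd=0.75, seed=1):
    """Draw true scores and two parallel measures with independent random error."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x1 = [t + rng.gauss(0.0, error_sd) for t in true_x]
    x2 = [t + rng.gauss(0.0, error_sd) for t in true_x]
    return true_x, x1, x2


true_x, x1, x2 = simulate_parallel_measures()
r_true_measured = pearson(true_x, x1)  # correlation of true-x with measured-x
r_parallel = pearson(x1, x2)           # correlation between parallel measures

# The first value should be close to the square root of the second.
print(round(r_true_measured, 2), round(math.sqrt(r_parallel), 2))
```

In practice the true scores are never observed, which is why the correlation between parallel measures is the quantity from which the path coefficient has to be inferred.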
For our purposes, these results can be treated diagrammatically within a path coefficient framework, as in the following example:
[Path diagram: true-x (x) points to each of two parallel measures, x'_1 and x'_2, with a path coefficient of 0.8 on each arrow; the correlation between x'_1 and x'_2 is 0.64.]
where it may be seen that the path coefficients between x and x' are identical, and that the correlation between parallel measures is predicted by $p_{x'_1 x}(p_{x'_2 x})$.
How is this implied causal relationship to be interpreted? Essentially, the difficulties in measuring x create a certain amount of random measurement error. We cannot necessarily specify what those difficulties are, although in most cases it is clear to the researcher that the instrumentation lacks precision, or that random human error is inevitable in the procedures, or both.
The impact of measurement error
The previous section sets the foundation for the manner in which measurement error can be treated more formally. For any measured-x we can state that there is a true-x, such that
[Path diagram: x -> x']
whose path coefficient is a measurement error (ME) coefficient that represents an impact upon the empirical model. The ME coefficient is the square root of what has been called the "reliability" coefficient. The empirical model presumably would include other variables. This kind of representation allows us to initiate a number of thought experiments, the simplest of which evaluates the impact of measurement error in assessing the impact of x upon y. To illustrate the above, we may show in the first case that
[Path diagram: x -> y with path coefficient p_yx; x -> x'; y -> y']
In other words, the relationship between the measured-x and measured-y is
determined by the other path coefficients. The logic can be reversed also. If the path coefficient (or correlation coefficient in these recursive models) for x' and y' is known, as is usually the case, then additional knowledge of the relationships between x and x', and y and y', would allow an estimate of the path coefficient between the true values of x and the true values of y. Essentially,
$$p_{yx} = \frac{p_{y'x'}}{p_{x'x}\,(p_{y'y})} \qquad (2)$$
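The step from the path diagram to equation (2) can be written out explicitly. The following sketch assumes, beyond the definitions given earlier, that all variables are standardized and that the measurement errors in x' and y' are uncorrelated with each other, so that the only route connecting x' and y' runs through x and y. The numerical values in the last line are illustrative.

```latex
\begin{align*}
p_{y'x'} &= p_{x'x}\; p_{yx}\; p_{y'y}
  && \text{(product of the paths linking } x' \text{ to } y'\text{)} \\
p_{yx} &= \frac{p_{y'x'}}{p_{x'x}\; p_{y'y}}
  && \text{(solving for the path between true values)} \\
p_{yx} &= \frac{0.64}{(0.9)(0.9)} \approx 0.79
  && \text{(e.g., } p_{y'x'} = 0.64,\; p_{x'x} = p_{y'y} = 0.9\text{)}
\end{align*}
```

Because both coefficients in the denominator are at most 1, the estimate for the true values is never smaller in magnitude than the observed coefficient.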
These values must be estimated in order to assess the impact of random measurement error upon the statistical model. Further insight into possible ramifications of measurement error can be gained by examining a number of cases in which it will be assumed that the measurable path coefficients have specific values, such that
Case   p_y'x'   p_x'x   p_y'y   p_yx
1.     0.81     0.9     0.9     (1.00)
2.     0.64     0.9     0.9     (0.79)
3.     0.5      0.9     0.9     (0.62)
4.     0.64     0.8     0.8     (1.00)
5.     0.5      0.8     0.8     (0.78)
6.     0.4      0.8     0.8     (0.63)
7.     0.49     0.7     0.7     (1.00)
8.     0.4      0.7     0.7     (0.82)
9.     0.3      0.7     0.7     (0.61)
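The parenthesized estimates for the nine cases can be recomputed directly from equation (2). The short sketch below is offered only as a check on the arithmetic; the function and variable names are illustrative, and the published values are compared with a small tolerance because they are rounded to two decimals.

```python
# Observed coefficient, the two ME coefficients, and the parenthesized
# estimate reported for each of the nine cases.
CASES = [
    # case, p_y'x', p_x'x, p_y'y, published p_yx
    (1, 0.81, 0.9, 0.9, 1.00),
    (2, 0.64, 0.9, 0.9, 0.79),
    (3, 0.50, 0.9, 0.9, 0.62),
    (4, 0.64, 0.8, 0.8, 1.00),
    (5, 0.50, 0.8, 0.8, 0.78),
    (6, 0.40, 0.8, 0.8, 0.63),
    (7, 0.49, 0.7, 0.7, 1.00),
    (8, 0.40, 0.7, 0.7, 0.82),
    (9, 0.30, 0.7, 0.7, 0.61),
]


def true_path(p_measured, me_x, me_y):
    """Equation (2): divide the observed coefficient by both ME coefficients."""
    return p_measured / (me_x * me_y)


estimates = {case: true_path(p, mx, my) for case, p, mx, my, _ in CASES}

# Largest difference between a recomputed estimate and the published value.
largest_gap = max(
    abs(estimates[case] - published) for case, _, _, _, published in CASES
)

for case, p, mx, my, published in CASES:
    print(
        f"Case {case}: observed {p:.2f} -> estimated {estimates[case]:.3f} "
        f"(published {published:.2f})"
    )
```

Each recomputed value agrees with the published figure to within rounding.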
Cases 1-9 exhibit a small range in the magnitude of correlations between true-x and measured-x, from 0.9 to 0.7, but obviously the range could be extended at both ends to cover all cases. It may be observed above that the estimated path coefficients for true values (last column) differ more dramatically from the path coefficients for the measured values (first column) as the measurement error increases. It is also apparent that at a given level of random measurement error, the correlation between measured variables can be high enough to suggest that the full variance in the dependent variable has been "explained" in the statistical sense. Cases 1, 4, and 7 above, for example, suggest that an ordinary least squares regression model can be fully specified and that stochastic error (as distinct from measurement error) is negligible.
It is conventional wisdom in statistical analysis that the unstandardized regression coefficients (slope coefficients) are crucial to understanding cause and effect. To estimate the change in y produced by a change in x is much more important than concern for the correlation coefficient. The other side of the argument, of course, is that R² is indicative of whether or not the
model is underspecified. A large error term, or low R², suggests that other variables ought to be included in the model. But if we take Case 7 above we see that the correlation among the measured variables is only 0.49, and the explained variance only 0.24, yet the model is fully specified if we take into account random measurement error, such that
[Path diagram: x -> y with a path coefficient of 1.00; x' and y' joined by a dashed line with an observed correlation of 0.49.]
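Case 7 can also be illustrated by simulation. The sketch below is not from the original analysis: it assumes normally distributed, unit-variance true scores, sets true-y exactly equal to true-x so that the true path coefficient is 1.00, and adds independent random error sized to give a true-to-measured correlation of 0.7 for each variable. The helper names, sample size, and seed are illustrative.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def error_sd_for(me_coefficient):
    """Error SD giving a unit-variance true score the requested ME coefficient."""
    return math.sqrt(1.0 / me_coefficient ** 2 - 1.0)


ME = 0.7
rng = random.Random(7)
n = 20000
sd = error_sd_for(ME)

true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
true_y = list(true_x)  # fully specified model: y is determined entirely by x
meas_x = [t + rng.gauss(0.0, sd) for t in true_x]
meas_y = [t + rng.gauss(0.0, sd) for t in true_y]

r_observed = pearson(meas_x, meas_y)    # expected to be near 0.49
explained = r_observed ** 2             # expected to be near 0.24
r_corrected = r_observed / (ME * ME)    # equation (2); expected to be near 1.00

print(round(r_observed, 2), round(explained, 2), round(r_corrected, 2))
```

The observed fit looks weak even though nothing other than random measurement error separates the measured variables from a perfectly specified model.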
Taken to its logical conclusion, we can concoct more extreme cases, say where the path coefficients between true and measured values are only 0.5 and the observed correlation between x' and y' is only 0.25. Again the estimated relationship between x and y is 1.00. Thus it seems that we could have many situations where random measurement error is the principal detractor in assessing the true relationships between variables.
The specific illustrations above can be expanded to three or more variables without difficulty. For example,
[Path diagram: true variables x, y, and z with their measured counterparts x', y', and z'; the arrows shown point to y.]
The same rules of deduction apply. If the path coefficients between true and observed values and the observed correlations between x', y', and z' are known, then the path coefficients between the true values of each variable can be estimated. Here again, the observed correlations between variables may be quite low, yet the model may be fully specified. In essence, the above examples and arguments suggest that it is crucial to establish the impact of measurement error in the construction of statistical models. Estimates of the coefficients representing the relationship between true and measured values of variables will allow:
1. Estimation of the correct path coefficients between variables in the model (which amounts to corrections for attenuation due to random measurement error).
2. Estimation of whether the statistical model is correctly specified.
It is clear that although empirical models may look weak in terms of explained variance, they in fact may represent correctly the appropriate linkages.
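The three-variable case can be sketched in the same way. The example below is illustrative only: the ME coefficients and observed correlations are hypothetical values chosen so that the corrected model is fully specified, and the two-predictor formulas are the standard expressions for standardized regression coefficients rather than anything specific to this paper.

```python
def disattenuate(r_observed, me_a, me_b):
    """Correct an observed correlation for random error in both variables."""
    return r_observed / (me_a * me_b)


def two_predictor_paths(r_yx, r_yz, r_xz):
    """Standardized coefficients and R^2 for y regressed on x and z."""
    denom = 1.0 - r_xz ** 2
    p_yx = (r_yx - r_yz * r_xz) / denom
    p_yz = (r_yz - r_yx * r_xz) / denom
    r_squared = p_yx * r_yx + p_yz * r_yz
    return p_yx, p_yz, r_squared


# Hypothetical inputs, chosen only for illustration.
me = {"x": 0.8, "y": 0.8, "z": 0.8}  # true-to-measured path coefficients
observed = {"yx": 0.384, "yz": 0.512, "xz": 0.0}  # correlations among x', y', z'

# Correct each observed correlation using the ME coefficients of its two variables.
corrected = {
    pair: disattenuate(r, me[pair[0]], me[pair[1]])
    for pair, r in observed.items()
}

observed_fit = two_predictor_paths(observed["yx"], observed["yz"], observed["xz"])
corrected_fit = two_predictor_paths(corrected["yx"], corrected["yz"], corrected["xz"])

print("observed  p_yx, p_yz, R^2:", [round(v, 3) for v in observed_fit])
print("corrected p_yx, p_yz, R^2:", [round(v, 3) for v in corrected_fit])
```

With these made-up inputs the measured variables explain about 41 percent of the variance, while the corrected coefficients imply that the true variables explain all of it.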
Table 1. Examples of reported reliability coefficients for various types of data (non-aggregated measures)
Source                  Variable                                   Reliability coefficient
Siegel & Hodge (1968)   Years of School Completed                  0.93
Achen (1975)            Party Identification (7pts)                0.88*
Siegel & Hodge (1968)   Occupational SES                           0.87
Siegel & Hodge (1968)   Personal Income                            0.85
Achen (1975)            Church Attendance Recall                   0.75*
Markus (1983)           Strength of Partisanship                   0.58*
Achen (1975)            Eight Attitude Items on Public Policy      0.37-0.55
                        Issues (e.g., School Integration,
                        Foreign Aid, etc., each on a 5pt scale)
*Mean value in three wave panel studies.
How much measurement error is typical?
Since reliability coefficients are seldom reported in political science, as distinguished from psychology and much of sociology, it may be difficult to gain much perspective on this subject. Table 1 includes a number of non-aggregated measures relating to political science research. Three of the variables have been examined by Siegel and Hodge following the 1960 Population Census. They matched respondents to the Current Population Survey and Post Enumeration Survey with the Census returns for three variables: (1) years of school completed, (2) personal income, and (3) occupational SES. The reliability coefficients for these variables ranged from 0.85 to 0.93. In the Achen study the reliability coefficients are calculated for respondent reports on their own political party identification, and their own church attendance. These coefficients are fairly high, 0.88 and 0.75, and correspond to studies in other areas where respondents are asked to make observations or estimates of their own behavior.
The single attitude items, however, generate low reliability coefficients, often below 0.5. Psychologists and sociologists have been acutely aware of this problem and as a matter of course almost never employ single attitude items in their analyses. Instead they aggregate or combine responses to items on identifiable dimensions. Reliability theory has advanced to the point where it is possible to estimate coefficients from aggregated responses (see Cronbach, 1951; Lord & Novick, 1968; Novick & Lewis, 1967; Nunnally, 1978; Carmines & Zeller, 1979).
In one sense it would seem ideal if in each case the reliability of a measure would be tested by repeated measurement of the same subjects with the same instrument. Such retests, however, are often inconvenient and in addition flawed by carryover effects from one test to the next. There is general
agreement now that the use of alternative measurement stimuli along the same dimension on the same test offers the most efficient way to estimate reliability. The most widely accepted estimate is called Cronbach's alpha, where for a k-item measure x, [...]
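For reference, Cronbach's alpha in its standard textbook form is k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below computes it under that standard definition, which may differ in notation from the expression used in this paper; the function names and the response data are hypothetical.

```python
def variance(values):
    """Sample variance with an n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(items):
    """Alpha for k items; `items` holds one list of respondent scores per item."""
    k = len(items)
    # Total score for each respondent across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical responses of five respondents to three items on one dimension.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]

alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.3f}")
```

In the notation used earlier, alpha estimates the reliability coefficient of the composite, so its square root would play the role of the path coefficient between the true and measured values.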
[Path diagram: only fragments survive, showing x and z' with estimated path coefficients of (0.25+) and (0.45).]
Again, these are maximum estimates allowing for some stochastic error (0.05 attenuation).
Conclusions
Measurement reliability plays a key role in constructing and understanding statistical models. The measured values of a variable are seldom the true values. With the proper research the relationship between measured and true values can be inferred by using path coefficients. Such inferences allow assessments of the specification of statistical models. Several examples have illustrated how the researcher can be misled without knowledge of the impact of measurement error. On a less formal basis, it seems appropriate for political scientists, like psychologists, to routinely begin to report reliability coefficients whenever possible, and especially in the use of attitude or opinion scales. Even though authors may be reporting only the descriptive aspects of their research, reliability coefficients allow us to build up experience on the accuracy of different types of scales and to construct better informed models of reality.
References
Achen, C. H. (1975). Mass political attitudes and the survey response, American Political Science Review 69, 1218-1231.
Bollen, K. A. (1989). Structural Equation Models with Latent Variables, New York: John Wiley & Sons.
Carmines, E. G. and R. A. Zeller (1979). Reliability and Validity Assessment, Beverly Hills, California: Sage.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests, Psychometrika 16, 297-334.
Joreskog, K. G. and D. Sorbom (1988). LISREL: A Guide to the Program and Applications, Chicago: SPSS, Inc.
Lord, F. M. and M. R. Novick (1968). Statistical Theories of Mental Test Scores, Reading MA: Addison-Wesley.
Markus, G. B. (1983). Dynamic modeling of cohort change: the case of political partisanship, American Journal of Political Science 27, 717-739.
Morgenstern, Oskar (1963). On the Accuracy of Economic Observations, Princeton NJ: Princeton University Press.
Niemi, R. G., E. G. Carmines, and J. P. McIver (1986). The impact of scale length on reliability and validity: a clarification of some misconceptions, Quality and Quantity 20, 371-376.
Novick, M. R. and C. Lewis (1967). Coefficient alpha and the reliability of composite measurements, Psychometrika 32, 1-13.
Nunnally, J. C. (1978). Psychometric Theory, New York: McGraw-Hill.
Rainey, H. G. (1983). Public agencies and private firms: incentive structures, goals, and individual roles, Administration and Society 15, 207-242.
Romzek, B. S. (1985). The effects of public service recognition, job security, and staff on organizational involvement, Public Administration Review 45, 283-291.
Rosenberg, M. (1965). Society and the Adolescent Self Image, Princeton University Press.
Siegel, P. M. and R. W. Hodge (1968). A causal approach to the study of measurement error, chapter in H. M. Blalock Jr. and A. B. Blalock (eds.), Methodology in Social Research, 28-59.
Tyler, T. R. and R. Weber (1982). Support for the death penalty: instrumental response to crime, or symbolic attitude?, Law and Society Review 17, 21-45.
Wiley, D. E. and J. A. Wiley (1970). The estimation of measurement error in panel data, American Sociological Review 35, 112-117.
Wonnacott, R. J. and T. H. Wonnacott (1979). Econometrics, New York: Wiley.