The Use of Incremental Goodness of Fit Indices in Structural Equation Models in Marketing Research*
Alan G. Sawyer, University of Florida
Thomas J. Page, Jr., University of Wisconsin
Indices of the incremental goodness of fit of structural equation models can provide useful information beyond that provided by statistical significance tests. Two such indices are described, and their advantages are illustrated with several marketing research studies. Additional discussion focuses on the issue of appropriate null hypotheses against which to test proposed models.
Although not yet common in marketing research, the use of structural equation models with unobservable variables and measurement error is becoming increasingly popular. This article briefly discusses the information that is obtained from the statistical inference tests involved in analyzing these models and the advantages of augmenting these tests with nonstatistical estimates of incremental goodness of fit. Although the disadvantages of solely relying on statistical significance tests are certainly not limited to structural equation models [15, 16], there are important problems involving structural equation models. Even though some of these issues have been previously commented on [3, 4, 8, 11], their importance warrants the following reemphasis and extended discussion. The issues are illustrated by applying indices of incremental goodness of fit to models in marketing research studies in which conclusions about goodness of fit were made primarily on the incomplete basis of statistical significance tests. In addition, the key issue of selecting the appropriate alternative hypotheses is discussed.

* Research support was provided by the Dean’s Research Fund of the College of Administrative Science at Ohio State. The authors acknowledge the helpful ideas of James Ginter, Robert MacCallum, and Anthony Greenwald.
Send correspondence to: Thomas J. Page, Jr., 1155 Observatory Drive, Madison, Wisconsin 53706.
Journal of Business Research 12, 297-308 (1984). © Elsevier Science Publishing Co., Inc. 1984, 52 Vanderbilt Ave., New York, NY 10017. 0148-2963/84/$3.00

In analysis of structural equation models such as latent variable causal modeling and confirmatory factor analysis, maximum likelihood estimates of various parameters of the model’s covariance matrix (Σ) are compared to the sample covariance matrix (S). The chi-square statistic provides a test of the closeness of Σ to S. The smaller the chi-square compared to the degrees of freedom, the more confident one is that the model that generated Σ is a plausible description of the relationships among the variables in the population. Unfortunately, there are some problems involving statistical power with the chi-square statistic in these model tests. First, although a large sample size is desired to provide a sufficiently conservative statistical test, a very high sample size ensures that nearly any Σ will be rejected as different from S, since the chi-square statistic is a direct function of the sample size. Such a result could occur even when an examination of the residual matrix suggests that the proposed model represents most of the variance in the sample data. The second problem involves the opposite situation of low sample size. With small samples, the power of the test may be sufficiently low that statistically significant differences between Σ and S are unlikely. Unlike most conventional classical statistical tests, a statistically insignificant chi-square test of the difference between Σ and S means that the test fails to reject the null hypothesis that the proposed model provides a plausible representation of the sample covariance matrix. Thus, a low sample size biases the statistical significance test in favor of a proposed model, whereas a very high sample size may provide a test so sensitive that it rejects any proposed model that does not perfectly fit the sample covariance matrix. There are several alternatives that can alleviate the high sample size problems [5]. These approaches are all appropriate and can complement each other.
One alternative is to examine subsamples that are not so large that rejection of the model is very likely, yet not so small as to bias the test against rejection of the model. Second, the residual matrix could be examined to assess the practical significance of the remaining unexplained information. The third alternative is to compare only nested models and hence to examine the differences in chi-square tests, a procedure that is much less likely to be biased by sample size. This test of the difference between two proposed models can assess whether a given model is a statistically significant improvement over another, even if the model is significantly different from a perfect representation of the sample covariance matrix itself. This article will concentrate on a variation of the model comparison approach that involves the use of incremental fit indices based on null models.
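The sample-size sensitivity described above can be illustrated with a short sketch. It assumes the usual large-sample relation in which the chi-square equals (n - 1) times the minimized value of the fit function; that relation, the helper names `chi2_sf` and `chi_square_at_n`, and the choice of sample sizes are illustrative assumptions rather than part of the original analysis. The discrepancy value is backed out from a chi-square of 71.47 with 6 degrees of freedom at n = 932, figures reported later in this article for one of the alienation models.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_scale = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of the lower regularized incomplete gamma function.
        denom, term = a, 1.0 / a
        total = term
        for _ in range(100000):
            denom += 1.0
            term *= z / denom
            total += term
            if term < total * 1e-15:
                break
        return max(0.0, 1.0 - total * math.exp(log_scale))
    # Continued fraction (modified Lentz method) for the upper incomplete gamma.
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 100000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_scale)


def chi_square_at_n(f_min, n):
    """Chi-square implied by a fixed discrepancy f_min at sample size n (assumed (n - 1) * F)."""
    return (n - 1) * f_min


# Hypothetical setup: hold the discrepancy fixed and vary only the sample size.
DF = 6
F_MIN = 71.47 / (932 - 1)

for n in (100, 200, 400, 932):
    stat = chi_square_at_n(F_MIN, n)
    print(f"n={n:4d}  chi-square={stat:7.2f}  p={chi2_sf(stat, DF):.4f}")
```

Under these assumptions the same degree of misfit is not rejected at the 0.05 level with 100 cases but is rejected decisively with 932, which is the pattern the text warns about.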
Indices of Incremental Fit

Bentler and Bonett [9] propose that a null model be used as a standard of comparison to proposed models. A null model is generally the most restricted model that is a special case of the model(s) of interest. In other words, the proposed model must be a less restricted form of the null model in order to make possible nested comparisons. The usual null model sets up the straw man of complete independence among all model terms and therefore specifies zero covariances among all model terms. The proposed model, of course, specifies certain nonzero covariances. Statistical tests of a proposed model with t degrees of freedom can then go beyond the test of whether the chi-square is statistically significantly different from zero, a test analogous to asking whether the R² of a regression model is significantly less than 1.00. With this approach, the proposed model can also be compared to the null model with j (> t) degrees of freedom. The significance test then focuses on the difference between the chi-squares for the proposed and null models with j - t degrees of freedom, a test analogous to the question of whether R² is significantly greater than 0. These two research questions can be stated as two formal research hypotheses:

H1: The covariance matrix implied by the model does not differ from the sample covariance matrix. This research hypothesis is supported by a nonsignificant chi-square statistic comparing Σ with S.

H2: The covariance matrix implied by the proposed model is closer to the sample covariance matrix than the covariance matrix implied by a null model of complete independence. This second research hypothesis is supported by a statistically significant difference in the respective chi-squares for the null and proposed model.
Thus, a model might be a statistically significant improvement over a null model and support H2, but not represent the sample covariance matrix completely enough to result in an insignificant difference between Σ and S. The latter result would reject H1. Furthermore, as suggested earlier, the improvement of a proposed model over a second proposed model can be tested by comparing the difference in the respective chi-square values.
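Stated compactly, the two tests refer different statistics to different chi-square distributions. The following is only a restatement of the hypotheses in symbols; the subscripts p (proposed model) and 0 (null model) are labels introduced here for illustration, while t and j are the degrees of freedom defined in the text.

```latex
% H1: overall fit of the proposed model (t degrees of freedom).
% Support for H1 requires a NONsignificant statistic.
\chi^2_{p} \;\text{is referred to}\; \chi^2(t)

% H2: improvement of the proposed model over the null model (j > t degrees of freedom).
% Support for H2 requires a SIGNIFICANT difference.
\chi^2_{0} - \chi^2_{p} \;\text{is referred to}\; \chi^2(j - t)
```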
Descriptive nonstatistical indices of goodness of fit can also be calculated to augment the statistical significance tests. Bentler and Bonett [9] extended Tucker and Lewis’ [17] reliability coefficient to general models described by arbitrary covariance structures. It is

$$\rho_{0r} = \frac{Q_0 - Q_r}{Q_0 - 1}, \tag{1}$$

where $Q_0$ is the ratio of the chi-square to the degrees of freedom for the null model and $Q_r$ is the same ratio for the least restrictive proposed model. Bentler and Bonett extend this index to any two models alternative to the null:

$$\rho_{kr} = \frac{Q_k - Q_r}{Q_0 - 1}$$

[...]

However, several factors caused them to believe that the model provided an adequate representation of the data. First, all but one theoretical path in the model was supported; second, the large sample size (n = 1175) resulted in a very sensitive test of the model’s fit. MacKenzie and Lutz’s confidence in their model was increased by the large size of the index of fit using a null model (Δ = 0.96). MacKenzie and Lutz iteratively reduced the sample size from 1175 and found that the chi-square became insignificant at a sample of 146 (12.4% of the original sample). All theoretical paths were still supported at that sample size and the fit index remained acceptable. They then tested six independent holdout samples and found statistically insignificant chi-square values in four instances. MacKenzie and Lutz’s procedure demonstrates how a model could be rejected at a larger sample size but not for a smaller sample and how an index of fit that is independent of sample size can assess the fit of a model. Their procedure may prove helpful where a large sample may cause a model of some value to be rejected by the test of statistical significance.

What Constitutes a Suitable Null Model?

A strict null model is one in which there are no paths at all, and the only parameters being estimated are the error terms for each measured variable. In other words, it is a model of complete independence. However, this is not always the most defensible model to use. Bentler and Bonett [9] state that “In some research contexts, certain variances, covariances, or
regression coefficients may be treated as known (possibly nonnull), and the (null) model $M_0$ specifies that the remaining covariances are zero” (p. 596). Later in that paper, they assert that “In general, the most restrictive, theoretically defensible model should be used in the denominator of the fit indices” (emphasis added). Thus, in some instances, the incremental fit indices should use a null model of modified independence. Modified independence means that paths between terms that are known to covary are included in the null model. This means that goodness of fit indices measure the extent of improvement over what is already known rather than the improvement compared to total independence (see [2]). Bentler and Bonett do not give an example of when such a modified independence model should be used. To illustrate this important point, we analyze the much discussed model of alienation [12]. The data include measures of anomia and powerlessness in both 1967 and 1971 along with education and socioeconomic indices for a sample of 932. The hypothesized latent (unmeasured) constructs were alienation in two time periods and socioeconomic status. These data were analyzed by Bentler and Bonett to illustrate how different models could be compared. In their analysis, a strict null model of complete independence was employed. We argue that such a strict null model might be misleading in this case. Certainly, one would not expect that repeated measures of the same variables, even 4 years apart, would have a zero correlation. Therefore, we suggest that in this instance the analysis include a modified null model ($M_m$) that specifies mutual independence among all terms except between the errors in measurement of anomia in 1967 and 1971 and powerlessness in 1967 and 1971. Instead of being restricted to zero, those two covariances should be allowed to freely vary between 0 and 1. This modified independence model could then serve as a more appropriate null model.
Models which allowed other covariances to be nonzero could then be judged against a null model that does not ignore the high likelihood of some correlation of measures of the same variable over time. Figures 1 and 2 illustrate the proposed models of alienation. The proposed model $M_k$ specifies zero covariances between errors of measurement in anomia in 1967 and 1971 and powerlessness in the two time periods (Figure 1), and the second proposed model $M_r$ includes the correlations between e1 and e2 and e3 and e4, as shown in Figure 2.

[Figure 1: Proposed model $M_k$ for the alienation study.]
[Figure 2: Proposed model $M_r$ for the alienation study.]

Several results of the analysis of these models are worth noting. First, only for model $M_r$ [χ²(4) = 4.73, p = 0.316] is the hypothesis of no difference between Σ and S not rejected by a chi-square test. $M_k$ is significantly different from the sample data [χ²(6) = 71.47, p < 0.0001], as are, of course, $M_0$ [χ²(15) = 2131.43, p < 0.001] and $M_m$ [χ²(13) = 1487.73, p < 0.0001]. However, as shown in Table 1, each tested model is a statistically significant improvement over the strict null model $M_0$. Furthermore, the indices of incremental goodness of fit indicate that the modified null model $M_m$, although a statistically significant improvement, does not account for a substantial amount of the data by itself. However, each of the proposed models, whether or not the errors of measurement are allowed to correlate over time, is a statistically significant improvement over the null model. Moreover, the incremental goodness of fit of $M_r$ over $M_m$ accounted for 80.3% (ρ) and 69.7% (Δ), respectively, of the overall incremental fit of $M_r$ over $M_0$ ($\rho_{0r}$ = 0.999 and $\Delta_{0r}$ = 0.998). Except where noted, the incremental fit indices in Table 1 used the chi-square value for the strict null model in the denominator. When the appropriate values for the modified null are instead used in the denominator of the equations comparing $M_r$ to $M_m$, the value of both indices is 0.998. In this case, it seems appropriate to compare the proposed model to the modified model as well as to a strict null because a proposed model ought to offer incremental explanation over what is already known rather than simply improve over no knowledge at all. However, simply allowing the errors of measurement to correlate over time does not account for the very large incremental fit accounted for by the proposed model in the anomia study. Moreover, the incremental fit of the proposed model over even the modified null model is impressively large and makes it unlikely that, for this sample, any other theoretically defensible model will achieve a much better fit. Such a result suggests that further research ought to concentrate on enlarging the theory and model to account for other measures, latent constructs and time periods (the original data included measures in 1966 as well as in 1967 and 1971), and the replicability of the excellent fit of the proposed model should be tested on another sample if possible.

Table 1. Incremental Fit of Models of Alienation Compared to Strict Null Models

Comparisonᵃ      χ²          df    ρ         Δ
$M_0$-$M_m$      643.7ᵇ      2     0.196     0.302
$M_m$-$M_r$      1483.0ᵇ     9     0.803     0.696
$M_m$-$M_r$                        0.998ᶜ    0.998ᶜ
$M_0$-$M_r$      2126.7ᵇ     11    0.999     0.998
$M_0$-$M_k$      2059.96ᵇ    9     0.923     0.967
$M_k$-$M_r$      66.74       2     0.076     0.031

ᵃ Since $M_k$ is not nested within $M_m$, they cannot be compared on the basis of chi-square.
ᵇ p < 0.001.
ᶜ This calculation of ρ and Δ uses the chi-square value for $M_m$ in the denominator. All others use the value for $M_0$.

Conclusion

There are deficiencies in relying solely on statistical comparisons of covariance matrices implied by a theoretical model and by sample data in structural equation modeling of unobservable variables and measurement error. In addition to examination of the absolute size of residuals, consideration of the incremental goodness of fit indices can be a very useful complement to statistical tests. Even when the incremental fit indices do not suggest a high incremental fit, they can be used as benchmarks potentially more standardizable than levels of statistical significance. When a proposed model is compared to either the null or another proposed model, these benchmarks can indicate how close to a satisfactory incremental fit a model is, even though it might be rejected as an imperfect fit by a chi-square statistical significance test. Of course, these indices measure the fit of the overall model and, when the fit is unsatisfactory, do not help indicate which parts of the model are the sources of the poor fit. Examination of the residual matrix may indicate where the model can be further improved to obtain acceptable increments in goodness of fit. The prime objective of our article has been to illustrate the advantages of augmenting the chi-square test of goodness of fit with indices of the size of the incremental fit of one model over another. We have also highlighted the potential problems of choosing an unsuitable null model for comparison.
One drawback of our analysis is our reliance on Bentler and Bonett’s two indices with unknown statistical properties. Recent work by Bearden, Sharma, and Teel supports our intuition that Δ is preferable to ρ and also suggests that a Δ value of 0.95, not 0.90 as Bentler and Bonett suggested, be used as a rough cutoff value delineating a “good” fit. It is encouraging that Jöreskog and Sörbom’s latest version (LISREL V) contains several new indices of incremental fit. Although these measures, like those of Bentler and Bonett, have unknown statistical properties, they have the advantage of being routinely calculated in LISREL V. Unlike Bentler and Bonett’s indices, they can be used to compare different nonnested models.
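As a computational note, the incremental fit indices reported in Table 1 can be approximately recomputed from the four chi-square values given for the alienation models. The sketch below is a minimal illustration, not the authors’ procedure. It assumes that ρ is formed as in equation (1), with the chosen null model in the denominator, and that Δ is the difference in chi-squares divided by the null model’s chi-square. The definition of Δ does not appear in the text above and is an assumption here, as are the function and variable names. Under these assumptions the results agree with Table 1 to within about 0.002.

```python
# Chi-square values and degrees of freedom as reported for the alienation models.
MODELS = {
    "M0": (2131.43, 15),  # strict null model (complete independence)
    "Mm": (1487.73, 13),  # modified null model (errors correlated over time only)
    "Mk": (71.47, 6),     # proposed model without correlated errors
    "Mr": (4.73, 4),      # proposed model with correlated errors
}


def rho(k, r, null="M0"):
    """(Q_k - Q_r) / (Q_null - 1), where Q is chi-square divided by its df."""
    q = {name: chi2 / df for name, (chi2, df) in MODELS.items()}
    return (q[k] - q[r]) / (q[null] - 1.0)


def delta(k, r, null="M0"):
    """(chi2_k - chi2_r) / chi2_null; an assumed definition, see the lead-in."""
    return (MODELS[k][0] - MODELS[r][0]) / MODELS[null][0]


# (more restricted model, less restricted model, null model used in the denominator)
COMPARISONS = [
    ("M0", "Mm", "M0"),
    ("Mm", "Mr", "M0"),
    ("Mm", "Mr", "Mm"),
    ("M0", "Mr", "M0"),
    ("M0", "Mk", "M0"),
    ("Mk", "Mr", "M0"),
]

for k, r, null in COMPARISONS:
    chi2_diff = MODELS[k][0] - MODELS[r][0]
    df_diff = MODELS[k][1] - MODELS[r][1]
    print(
        f"{k}-{r} (null {null}): chi2 diff={chi2_diff:8.2f}  df={df_diff:2d}  "
        f"rho={rho(k, r, null):.3f}  delta={delta(k, r, null):.3f}"
    )
```

Keeping the null model as an explicit argument makes the choice between the strict and the modified null visible in every call, which is the design point the article argues for.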
References

1. Aaker, David A., Bagozzi, Richard P. Unobservable variables in structural equation models with an application in industrial selling. Journal of Marketing Research 16: 147-158 (May 1979).
2. Armstrong, J. Scott. Advocacy and objectivity in science. Management Science 25: 423-428 (May 1979).
3. Bagozzi, Richard P. Structural equation models in experimental research. Journal of Marketing Research 14: 209-226 (May 1977).
4. Bagozzi, Richard P. Causal Models in Marketing. New York: Wiley, 1980.
5. Bagozzi, Richard P. Evaluating structural equation models with unobservable variables and measurement error: A comment. Journal of Marketing Research 18: 375-381 (August 1981).
6. Bagozzi, Richard P., Burnkrant, Robert E. Attitude organization and the attitude-behavior relationship. Journal of Personality and Social Psychology 37: 913-929 (1979).
7. Bearden, William O., Sharma, Subhash, Teel, Jesse E. Sample size effects upon chi-square and other statistics used in evaluating causal models. Journal of Marketing Research 19: 425-430 (November 1982).
8. Bentler, Peter M. Multivariate analysis with latent variables: Causal modeling. Annual Review of Psychology 31: 419-456 (1980).
9. Bentler, Peter M., Bonett, Douglas G. Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin 88: 588-606 (1980).
10. Churchill, Gilbert A., Jr., Pecotich, Anthony. A structural equation investigation of the pay satisfaction-valence relationship among salespeople. Journal of Marketing 46: 114-124 (Fall 1982).
11. Fornell, Claes, Larcker, David F. Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research 18: 39-50 (February 1981).
12. Jöreskog, Karl G., Sörbom, Dag. LISREL User’s Guide. Chicago: National Educational Resources, Inc., 1978.
13. MacKenzie, Scott B., Lutz, Richard J. Testing competing theories of advertising effectiveness via structural equation models, in Research Methods and Causal Modeling in Marketing, William R. Darden, Kent B. Monroe, and William R. Dillon, eds. Chicago: American Marketing Association, 1983, pp. 70-75.
14. Phillips, Lynn W., Bagozzi, Richard P. On measuring organizational properties: Methodological issues in the use of key informants. Unpublished working paper, Stanford University, 1980.
15. Sawyer, Alan G., Ball, A. Dwayne. Statistical power and effect size in marketing research. Journal of Marketing Research 18: 275-290 (August 1981).
16. Sawyer, Alan G., Peter, J. Paul. The significance of statistical significance tests in marketing research. Journal of Marketing Research 20: 122-134 (May 1983).
17. Tucker, Ledyard R, Lewis, Charles. A reliability coefficient for maximum likelihood factor analysis. Psychometrika 38: 1-10 (March 1973).