PSYCHOMETRIKA—VOL. 70, NO. 4, DECEMBER 2005, pp. 791–798
DOI: 10.1007/s11336-001-0930-9
ON NONEQUIVALENCE OF SEVERAL PROCEDURES OF STRUCTURAL EQUATION MODELING
KE-HAI YUAN
UNIVERSITY OF NOTRE DAME
WAI CHAN
THE CHINESE UNIVERSITY OF HONG KONG

The normal theory based maximum likelihood procedure is widely used in structural equation modeling. Three alternatives are: the normal theory based generalized least squares, the normal theory based iteratively reweighted least squares, and the asymptotically distribution-free procedure. When data are normally distributed and the model structure is correctly specified, the four procedures are asymptotically equivalent. However, this equivalence is often invoked when models are not correctly specified. This short paper clarifies the conditions under which these procedures are not asymptotically equivalent. Analytical results indicate that, when a model is incorrect, two factors contribute to the nonequivalence of the procedures. One is that the covariance matrices estimated by the different procedures differ; the other is that the procedures use different scales to measure the distance between the sample covariance matrix and the estimated covariance matrix. The results are illustrated using real as well as simulated data. Implications of the results for model fit indices are also discussed, using the comparative fit index as an example.

Key words: generalized least squares, incorrect model, incremental fit index, maximum likelihood, noncentrality parameter.
1. Introduction

Structural equation modeling (SEM) has been widely used in the social and behavioral sciences. Model estimation and evaluation are the most important components of SEM. In estimating and evaluating models, the normal theory based maximum likelihood (ML) procedure is widely used in practical data analysis. Closely related to the normal theory ML are the normal theory based generalized least squares (GLS) (Anderson, 1973; Browne, 1974; Jöreskog & Goldberger, 1972) and the normal theory based iteratively reweighted least squares (IRLS) (Bentler, 1995, p. 216; Lee & Jennrich, 1979). When data are nonnormal, the asymptotically distribution-free (ADF) method developed by Browne (1984) is preferable when the sample size is large enough. For normal data and correctly specified models, the asymptotic equivalence of the four procedures has been established (Browne, 1974, 1984; Jöreskog & Goldberger, 1972; Lee & Jennrich, 1979; Shapiro, 1985). Data in practice may not be normally distributed, and substantive models are seldom correctly specified. Nevertheless, the equivalence of these procedures is often assumed without checking the conditions. This may be because, when the conditions are violated, the nonequivalence of these procedures has not been clearly documented in the literature. The purpose of this note is to discuss issues related to the nonequivalence of these procedures and to illustrate the nonequivalence by means of specific examples. An example of an incorrectly specified model is when the independence model

Σ_I = diag(σ_11, σ_22, ..., σ_pp)

The work described in this paper was supported by a grant from the Research Grants Council of Hong Kong Special Administrative Region (Project No. CUHK 4170/99M) and by NSF grant DMS04-37167. Requests for reprints should be sent to Ke-Hai Yuan, Department of Psychology, University of Notre Dame, Notre Dame, IN 46556, USA. Email: [email protected].
© 2005 The Psychometric Society
is fitted to a p-dimensional correlated data set. This model is actually used as the base model in defining several so-called incremental or relative fit indices (see Hu & Bentler, 1998). Because model Σ_I is generally incorrect for a practical data set, an incremental fit index changes substantially when evaluated by different procedures. Such an inconsistency was first noted by Tanaka (1987). La Du and Tanaka (1989) further demonstrated the variability of incremental fit indices across different estimation methods. Sugawara and MacCallum (1993) explained this phenomenon through the different weight matrices used in the estimation process. We will describe conditions under which the different procedures are not equivalent. A detailed analysis is provided in section 2. Indeed, the differences among the statistics can be as big as the statistics themselves. When the statistics are not equivalent, any measures of model fit/misfit derived from these statistics will inherit the nonequivalence. Using the comparative fit index of Bentler (1990), we will further illustrate this inconsistency in section 3 by examples.

2. The Effect of an Incorrect Model on Test Statistics

Let S be a p × p sample covariance matrix based on a sample of size N = n + 1. Then the vector of parameter estimates θ̂ in the ML procedure is obtained by minimizing the discrepancy function

F_ML(S, Σ(θ)) = tr[SΣ^{-1}(θ)] − log|SΣ^{-1}(θ)| − p.

Under the null hypothesis Σ_0 = E(S) = Σ(θ_0) and the assumption of multivariate normality of the sample,

T_ML = nF_ML(S, Σ(θ̂)) →_L χ²_{p*−q},
where p* = p(p + 1)/2 and q is the number of unknown parameters in θ. It is well known that the normal theory based ML is equivalent to the normal theory based GLS and IRLS when the model structure is correct (Shapiro, 1985). However, it is not clear how these procedures differ when the model Σ(θ) is not correctly specified. This section analytically studies their differences. Some notation is necessary for this development. For the p × p symmetric matrix S, let s = vech(S) be the p*-dimensional vector formed by stacking the columns of S, leaving out the elements above the diagonal. Let vec(S) be the vector formed by stacking the columns of S; then there exists a unique p² × p* matrix D_p such that vec(S) = D_p s (Magnus & Neudecker, 1999). A function with dots on top denotes a derivative; for example, Ḟ(S, Σ) = ∂F(S, Σ)/∂σ and F̈(S, Σ) = ∂²F(S, Σ)/∂σ∂σ′, where σ = vech(Σ). Treating F_ML(S, Σ) as a function of σ and using a Taylor expansion at σ = s, we have
F_ML(S, Σ) = F_ML(S, S) + Ḟ′_ML(S, S)(σ − s) + 2^{-1}(σ − s)′F̈_ML(S, S)(σ − s) + r_ML(s, σ),   (1)

where r_ML(s, σ) is the remainder and is of order O(||s − σ||³) for σ sufficiently close to s. Because F_ML(S, S) = 0, Ḟ_ML(S, S) = 0, and F̈_ML(S, S) = D′_p(S^{-1} ⊗ S^{-1})D_p, it follows from equation (1) that

F_ML(S, Σ) = (s − σ)′W_1(s − σ) + r_ML(s, σ),   (2)

where W_1 = 2^{-1}D′_p(S^{-1} ⊗ S^{-1})D_p. The first term on the right side of (2) equals F_GLS(S, Σ), which can also be further written as F_GLS(S, Σ) = 2^{-1}tr[(I − ΣS^{-1})²]. Let Σ̂ = Σ(θ̂) be the estimated covariance matrix based on minimizing either of the functions F_ML or F_GLS. Then it follows from equation (2) that F_ML(S, Σ̂) and F_GLS(S, Σ̂) are not approximately equal unless r_ML(s, σ̂) is ignorable. Note that we are using the same Σ̂ in
both F_ML and F_GLS, which implies that the difference between F_ML and F_GLS is not just due to different parameter estimates or estimation methods, but due to the different ways of measuring the distance between S and Σ̂. Equation (2) also implies that the equivalence between F_ML and F_GLS does not depend on the sampling distribution but on the correctness of the model structure. When the model is correct, whatever the sampling distribution is, s − σ̂ approaches zero in probability and so does nr_ML(s, σ̂). Because T_ML − T_GLS = nr_ML(s, σ̂) with T_GLS = nF_GLS(S, Σ̂), T_ML and T_GLS are asymptotically equivalent. However, nr_ML(s, σ̂) is asymptotically stochastically unbounded when the model is incorrectly specified (see chapter 14 of Bishop, Fienberg, & Holland, 1975). In such a case, T_ML and T_GLS are no longer asymptotically equivalent. Actually, with misspecified models T_GLS is also stochastically unbounded as n → ∞. Adding nr_ML(s, σ̂) to T_GLS may result in a totally different statistic T_ML.

For a sample x_1, x_2, ..., x_N, let y_i = vech[(x_i − x̄)(x_i − x̄)′] and let S_y be the sample covariance matrix of the y_i. The ADF discrepancy function is given by (Browne, 1984)

F_ADF(S, Σ(θ)) = (s − σ(θ))′S_y^{-1}(s − σ(θ)).

When data are normal and the model structure is correct, F_ADF is also asymptotically equivalent to both F_ML and F_GLS. However, when the model is incorrect, the ADF procedure is not asymptotically equivalent to F_ML even when data are normal. Comparing F_ADF with F_GLS, the only difference between them is in the weight matrices. For normal data, W_1 and S_y^{-1} converge in probability to the same weight matrix 2^{-1}D′_p(Σ_0^{-1} ⊗ Σ_0^{-1})D_p. Thus, W_1 ≈ S_y^{-1} when n is large. It follows that the equivalence between F_GLS and F_ADF does not depend on the model structure but only on the sampling distribution and sample size. So, when a sample is approximately normal, T_ADF = nF_ADF(S, Σ̂) and T_GLS are closer to each other than to T_ML.
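The role of the remainder r_ML(s, σ̂) in (2) can be sketched numerically. The snippet below (Python with NumPy; the 2 × 2 covariance matrices are hypothetical, not taken from the paper) evaluates F_ML and F_GLS directly and shows that their gap, the remainder term, is negligible when Σ is close to S but substantial when Σ is badly misspecified, such as under the independence structure.

```python
import numpy as np

def f_ml(S, Sigma):
    # Normal-theory ML discrepancy: tr(S Sigma^{-1}) - log|S Sigma^{-1}| - p.
    p = S.shape[0]
    M = S @ np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - p

def f_gls(S, Sigma):
    # Normal-theory GLS discrepancy: 2^{-1} tr[(I - Sigma S^{-1})^2].
    p = S.shape[0]
    M = np.eye(p) - Sigma @ np.linalg.inv(S)
    return 0.5 * np.trace(M @ M)

# Hypothetical sample covariance of two correlated variables.
S = np.array([[1.0, 0.6],
              [0.6, 1.0]])

# A nearly correct model covariance and a badly misspecified
# (independence-structure) covariance.
Sigma_close = np.array([[1.0, 0.59],
                        [0.59, 1.0]])
Sigma_indep = np.diag(np.diag(S))

for Sigma in (Sigma_close, Sigma_indep):
    r_ml = f_ml(S, Sigma) - f_gls(S, Sigma)  # the remainder term in (2)
    print(f_ml(S, Sigma), f_gls(S, Sigma), r_ml)
```

Multiplying either discrepancy by n gives the corresponding test statistic, so a non-negligible remainder translates directly into a gap between T_ML and T_GLS.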
In addition to T_ML, T_GLS, and T_ADF, the normal theory based IRLS statistic is also routinely reported in standard software (e.g., EQS, SAS CALIS). This statistic is defined as (e.g., Bentler, 1995, p. 216)

T_IRLS = n(s − σ̂)′W_2(s − σ̂),

where W_2 = W_2(θ̂) = 2^{-1}D′_p[Σ^{-1}(θ̂) ⊗ Σ^{-1}(θ̂)]D_p. There are two factors that contribute to the difference between T_IRLS and T_GLS. One is the different weight matrices W_1 and W_2; the other is the different covariance matrix estimates Σ̂ obtained by GLS and ML. Notice that different models correspond to different weight matrices in T_IRLS. Now, treating F_ML(S, Σ̂) as a function of s and using a Taylor expansion of this function at s = σ̂, we have

F_ML(S, Σ̂) = (s − σ̂)′W_2(s − σ̂) + r^(1)_ML(s, σ̂).   (3)

It immediately follows from (3) that T_ML = T_IRLS + nr^(1)_ML(s, σ̂). Similar to nr_ML(s, σ̂), nr^(1)_ML(s, σ̂) approaches zero in probability only when Σ(θ) is correctly specified. So, with a misspecified Σ(θ), T_ML will not be asymptotically equivalent to T_IRLS, as commonly believed. The IRLS is only an algorithm for obtaining the ML estimate θ̂, where the weight matrix W_2(θ) is updated at each iteration. This should be distinguished from the estimation method that minimizes the normal theory quadratic fitting function with weight given by W_2(θ). Swain (1975) studied the properties of the quadratic discrepancy function with weight W_2(θ), but no standard SEM software minimizes this function for parameter estimation.

There also exists a vast literature (e.g., Bentler & Dijkstra, 1985; Satorra, 1989; Satorra & Saris, 1985; Shapiro, 1983; Steiger, Shapiro, & Browne, 1985) indicating that T_ML, T_GLS, T_ADF, and T_IRLS all asymptotically follow a noncentral chi-square distribution. The key behind this development is the assumption

E(S) = Σ_{0,n} = Σ(θ_0) + Δ/√n,   (4)
where Δ is a constant matrix. Condition (4) is commonly called Pitman (1948) drift, which implies that the true population value of the covariance matrix is sufficiently close to the model for the noncentral chi-square approximation of the distribution of the test statistic to make sense. Using a Taylor expansion as in either (1) or (3), one obtains the quadratic form of s − σ̂, whose population counterpart is σ_0 − σ(θ_0). Because of (4), the quadratic form asymptotically follows a noncentral chi-square distribution. Also because of (4), the higher order term represented by nr_ML(s, σ̂) or nr^(1)_ML(s, σ̂) approaches zero in probability and can be ignored for large n. Notice that E(S) in (4) is the population covariance matrix, which should not depend on n, while (4) implies that the amount of misspecification in Σ(θ) decreases as n increases. Such a condition has an obvious fault, but it provides the mathematical convenience that allows one to show that all four statistics asymptotically follow the same noncentral chi-square distribution. For an approximately correct model, (4) may be a reasonable assumption at a given sample size. However, even for normal data, the four statistics are not equally well approximated by the same noncentral chi-square as characterized by asymptotics (see Yuan & Hayashi, 2003). Related discussions on regularity conditions can be found in Stroud (1972) and Shapiro (1983). When Σ(θ) is not correct,

min_θ F_ML(Σ_0, Σ(θ)) = τ > 0   (5)
with τ being a constant that does not depend on n. Then the higher order term nr_ML(s, σ̂) cannot be ignored, and none of the statistics can be well approximated by a noncentral chi-square distribution. Simulation studies conducted in Satorra, Saris, and de Pijper (1991) and Yuan and Hayashi (2003) verified this empirically. Actually, even when assumption (4) holds and data are normal, at any finite sample size, δ̂ = T − df will not be an unbiased estimate of the noncentrality parameter as characterized by asymptotics (see Yuan & Marshall, 2004).

Whatever the model is, F(S, Σ̂) is always a consistent estimate of the distance between Σ_0 and Σ(θ) as measured by (5) (see Shapiro, 1983). Of course, this distance depends on the specific function used in measuring it, such as F_ML, F_GLS, or F_ADF. As a test statistic, the weight matrix W_2 in T_IRLS depends on the model structure. While comparing the distance between s and σ(θ̂), T_IRLS also compares the weight matrices. So T_IRLS is less desirable, although it might offer useful information about the model. Thus, we will not deal with it further in the next section.

3. Illustrations

We will illustrate the differences among the statistics T_ML, T_GLS, and T_ADF when Σ(θ) is not correctly specified as well as when it is correctly specified. We will compare the three statistics under the independence model Σ_I = Σ(θ_I) and study their effect on the comparative fit index (CFI) (Bentler, 1990) or, equivalently, the relative noncentrality index (McDonald & Marsh, 1990). Let Σ_M(θ) be a substantively interesting model. When evaluated at Σ_I and Σ_M, the corresponding statistics will be denoted by T_I and T_M, whose degrees of freedom are df_I and df_M, respectively. Then the widely used CFI is calculated as

CFI = 1 − δ̂_M/δ̂_I,   (6)

where δ̂_I = max(T_I − df_I, T_M − df_M, 0) and δ̂_M = max(T_M − df_M, 0). It is obvious from (6) that the CFI corresponding to a greater T_I will be greater. When model Σ_M is correct, the statistics T_M evaluated by the ML and the GLS procedures are comparable. When evaluated at Σ_I, T_ML and T_GLS are not comparable, but T_GLS and T_ADF may be comparable when n is large and data are normally distributed. Let θ̂_IML, θ̂_IGLS, and θ̂_IADF be the parameter estimates of θ_I by the procedures ML, GLS, and ADF, respectively. Weng and Cheng (1997) obtained θ̂_IML = (s_11, s_22, ..., s_pp)′.
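Weng and Cheng's closed form is easy to verify numerically. The sketch below (Python with NumPy; the 3 × 3 covariance matrix is hypothetical, for illustration only) evaluates F_ML over diagonal model covariances and spot-checks that the sample variances diag(S) minimize it.

```python
import numpy as np

def f_ml(S, Sigma):
    # Normal-theory ML discrepancy: tr(S Sigma^{-1}) - log|S Sigma^{-1}| - p.
    p = S.shape[0]
    M = S @ np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - p

# Hypothetical 3 x 3 sample covariance matrix.
S = np.array([[2.0, 0.8, 0.5],
              [0.8, 1.5, 0.6],
              [0.5, 0.6, 1.0]])

# Candidate ML solution for the independence model: the sample variances.
f_at_diag = f_ml(S, np.diag(np.diag(S)))

# Spot-check minimality: randomly perturbed diagonals never do better.
rng = np.random.default_rng(0)
for _ in range(200):
    d = np.diag(S) * np.exp(0.2 * rng.standard_normal(3))
    assert f_ml(S, np.diag(d)) >= f_at_diag
```

For a diagonal model Σ = diag(d_1, ..., d_p), setting ∂F_ML/∂d_i = 1/d_i − s_ii/d_i² = 0 gives d_i = s_ii, which is what the spot-check reflects.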
It is easy to see that

F_ML(S, Σ(θ̂_IML)) = tr(R) − log|R| − p = −log|R|,   (7)
where R is the correlation matrix. Let S^{-1} = (s^{ij}), vecdiag(S^{-1}) = (s^{11}, s^{22}, ..., s^{pp})′, and let S^{-1} ∘ S^{-1} = ([s^{ij}]²) be the Hadamard product of S^{-1} with itself; then θ̂_IGLS = (S^{-1} ∘ S^{-1})^{-1}vecdiag(S^{-1}) (Weng & Cheng, 1997). Notice that when S is diagonal, θ̂_IML = θ̂_IGLS, and they generally are not equal even asymptotically unless Σ_0 is diagonal. Because there is no simple expression for F_GLS(S, Σ(θ̂_IGLS)), we cannot compare it with equation (7) analytically; instead, some numerical comparisons through examples will be given below. The parameter estimate θ̂_IADF can also be obtained analytically. Let e_i be the ith unit vector in the Euclidean space R^p. Let b = (b_1, b_2, ..., b_p)′ with b_i = vech′(e_i e′_i)S_y^{-1}s, and let A = (a_ij) with a_ij = vech′(e_i e′_i)S_y^{-1}vech(e_j e′_j). Then θ̂_IADF = A^{-1}b. Similarly, there is no simple analytical form for F_ADF(S, Σ(θ̂_IADF)), and we will compare the three corresponding test statistics numerically through examples.

Example 1. Our first example is based on a data set of Holzinger and Swineford (1939). This classical data set consists of 24 cognitive variables from 145 subjects. Jöreskog (1969) used 9 of the 24 variables and studied their correlation structures with the normal theory ML method. The nine variables are, respectively, Visual Perception, Cubes, Lozenges, Paragraph Comprehension, Sentence Completion, Word Meaning, Addition, Counting Dots, and Straight-Curved Capitals. In the original report of Holzinger and Swineford (1939), the first three variables were designed to measure Spatial ability, the next three variables were designed to measure Verbal ability, and the last three variables were tested within a limited time period and were designed to measure a Speed factor in performing the tasks. Let x be the vector of the nine observed variables; then the confirmatory factor model represented by

x = µ_x + Λf + ε   with   µ_x = E(x),   Cov(x) = ΛΦΛ′ + Ψ,

Λ′ = | 1.0  λ_21  λ_31   0     0     0     0     0     0   |
     |  0     0     0   1.0  λ_52  λ_62   0     0     0   |,   (8)
     |  0     0     0    0     0     0   1.0  λ_83  λ_93 |

and

Φ = | φ_11  φ_12  φ_13 |
    | φ_21  φ_22  φ_23 |
    | φ_31  φ_32  φ_33 |
reflects the hypothesis of the original design. We assume the measurement errors are uncorrelated, with Ψ being a diagonal matrix. The degrees of freedom for model (8) are df_M = 24. Test statistics T_I and T_M corresponding to the three fitting procedures for model (8) are given in Table 1.

TABLE 1.
Statistics T_I, T_M, and CFI, based on a data set of Holzinger and Swineford (1939).

         T_I       T_M      CFI
ML     502.279   51.187   0.942
GLS    138.329   42.584   0.818
ADF    201.261   57.517   0.797

The statistic T_IML is more than three times T_IGLS and more than twice T_IADF. This implies that the term nr_ML(s, σ̂) is more than twice T_IGLS. If T_IML follows a noncentral chi-square
distribution, T_IGLS or T_IADF cannot follow the same noncentral chi-square distribution. Although differences exist among T_ML, T_GLS, and T_ADF when evaluated at model Σ_M, compared to their counterparts at Σ_I the differences are minor. The substantial differences among the T_I's mainly contribute to the differences in the CFIs. For example, although the statistic T_GLS gives the best support for the model, its corresponding CFI is only .818, while the CFI corresponding to T_ML is much greater. When the test statistics T_M are comparable, the CFI in (6) is decided by δ̂_I, which is in turn decided by T_I. Because T_IML is several times the size of T_IGLS and T_IADF, the CFI corresponding to the ML procedure gives better support for model (8) than those corresponding to the other two procedures when the same cutoff value is used to judge the CFIs.

The above example illustrates the substantial differences among the three test statistics when evaluated at the incorrect base model Σ_I. This inconsistency further has its effect on the fit index CFI. Although the model Σ_M in the example is also only an approximation, the differences among the three statistics at Σ_M are much smaller. In order to further study the differences among the three test statistics when models change from correct to incorrect, we present the following simulation example.

Example 2. Let Σ_a = aS + (1 − a)Σ(θ̂), where S is the sample covariance matrix of the Holzinger and Swineford data in Example 1 and θ̂ is the corresponding ML estimate for model (8). Then all three discrepancy functions, when evaluated at the minimum of F(Σ_a, Σ(θ)), are monotonically increasing as a increases (Yuan & Hayashi, 2004). When a = 1, F_ML(Σ_a, Σ(θ̂)) ≈ .355, and F_ML(Σ_a, Σ(θ̂)) ≈ .355/2 when a = .719. Our comparison is based on computer simulated samples from N(µ, Σ_a) for a = 0, .719, and 1, respectively.
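The mixing construction Σ_a = aS + (1 − a)Σ(θ̂) is straightforward to set up. The sketch below (Python with NumPy) uses small hypothetical stand-ins for S and Σ(θ̂) rather than the actual Holzinger-Swineford matrices, and draws N = 1000 normal observations under each level of misspecification.

```python
import numpy as np

rng = np.random.default_rng(2005)

# Hypothetical stand-ins: S plays the role of the sample covariance,
# Sigma_hat the fitted model covariance Sigma(theta_hat).
S = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
Sigma_hat = np.array([[1.0, 0.45, 0.45],
                      [0.45, 1.0, 0.27],
                      [0.45, 0.27, 1.0]])

def sigma_a(a):
    # Population covariance Sigma_a = a*S + (1 - a)*Sigma(theta_hat);
    # a = 0 makes the model exactly correct, a = 1 the most misspecified.
    return a * S + (1.0 - a) * Sigma_hat

# One replication: N = 1000 draws from N(mu, Sigma_a) at each level.
N, mu = 1000, np.zeros(3)
samples = {a: rng.multivariate_normal(mu, sigma_a(a), size=N)
           for a in (0.0, 0.719, 1.0)}
```

Each simulated sample would then be fed to the ML, GLS, and ADF procedures to compute T_ML, T_GLS, and T_ADF for one replication.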
With model (8), these three sampling schemes, respectively, represent the correct model, a moderately misspecified model, and a more severely misspecified model. Because our interest is to contrast the systematic differences among T_ML, T_GLS, and T_ADF when the model is misspecified, a large sample size N = 1000 is chosen. Because the underlying distribution is normal, T_ADF will be approximately equal to T_GLS. Test statistics and CFIs for five replications of the above design are reported in Table 2.

TABLE 2.
Statistics T_I, T_M, and CFI for different levels of misspecification. Five replications of normal data (N = 1000).

      Correct Model (a = 0)      Incorrect Model I (a = .719)   Incorrect Model II (a = 1)
       T_I       T_M     CFI       T_I       T_M      CFI        T_I       T_M      CFI
ML   2928.180   15.643  1.000    3066.425  143.846   0.960     3231.118  302.601   0.913
GLS   782.056   15.576  1.000     829.896  130.698   0.866      909.326  257.419   0.733
ADF   803.672   16.045  1.000     917.087  141.570   0.867     1019.833  283.489   0.736

ML   3096.756   16.859  1.000    3285.437  204.746   0.944     3467.253  385.946   0.895
GLS   767.565   16.118  1.000     864.991  189.921   0.800      964.911  334.624   0.666
ADF   865.982   16.211  1.000    1003.340  187.827   0.831     1115.922  336.665   0.710

ML   2975.828   27.656  0.999    3187.866  232.903   0.934     3378.542  420.715   0.881
GLS   798.883   26.399  0.997     894.104  205.714   0.788      985.066  347.303   0.659
ADF   851.138   27.131  0.996     993.668  217.503   0.798     1106.814  365.034   0.682

ML   3175.763   28.277  0.999    3286.358  133.665   0.966     3439.547  283.199   0.924
GLS   800.201   28.298  0.994     852.364  128.482   0.872      932.470  255.869   0.741
ADF   815.401   28.616  0.994     881.314  140.687   0.862      975.346  279.545   0.728

ML   3296.301   30.353  0.998    3419.410  159.996   0.960     3573.015  316.421   0.917
GLS   823.643   29.653  0.993     877.696  145.525   0.856      952.415  272.779   0.729
ADF   840.309   33.438  0.988     950.674  156.286   0.855     1045.677  291.440   0.735

When the model is correct, the three test statistics at Σ_M are about the same, illustrating their asymptotic
equivalence. Even though there still exist substantial differences among T_ML, T_GLS, and T_ADF at model Σ_I, the differences among the corresponding CFIs are minimal. This is because, when δ̂_M is small enough, the effect of T_I on CFI is ignorable, as can be seen from (6). Actually, CFI will be 1.0 when δ̂_M = 0 regardless of the value of T_I > df_I. When model Σ_M is moderately misspecified, the differences among the T_M's by the three fitting procedures are more obvious, which, together with the differences among the corresponding T_I's, have a substantial effect on the CFIs. The contrast among the different procedures is more salient with the more severely misspecified model, as reflected in the last three columns (Incorrect Model II) of Table 2. Notice that the difference between GLS and ADF is only due to a finite sample effect for any of the models. This difference is not trivial, as can be visually inspected through the different rows in Table 2.

4. Conclusion

Various studies have been conducted on the performance of T_ML, T_GLS, and T_ADF for model evaluation. It is commonly believed that the ML, GLS, and ADF procedures are asymptotically equivalent when data are normal. To our knowledge, no systematic study has been made of their nonequivalence when a model is misspecified. Actually, there exist substantial systematic differences among the three statistics. The difference between T_ML and T_GLS is due to model misspecification, not the distribution of the sample. The difference between T_GLS and T_ADF is due to the distribution of the sample and a small sample size, not model misspecification. The difference between T_ML and T_ADF can be due to model misspecification, the distribution of the sample, as well as a small sample size. When a systematic difference exists, it is created by two factors. One factor is that different discrepancy functions measure distances between different covariance matrices.
For example, with the independence model, F_ML measures the distance between S and Σ(θ̂_IML) while F_GLS measures the distance between S and Σ(θ̂_IGLS), and Σ(θ̂_IML) does not equal Σ(θ̂_IGLS) even approximately unless Σ_0 is diagonal. The distance between S and Σ(θ̂_IML) is created by all the off-diagonal elements s_ij of S, while the distance between S and Σ(θ̂_IGLS) or Σ(θ̂_IADF) is created by all the off-diagonal elements as well as by elements on the diagonal of S. The other factor is that these discrepancy functions use different scales to measure the differences between matrices, as discussed in section 2. The systematic differences among the test statistics also create differences in model fit indices that depend on these statistics or estimation methods. Consequently, blind use of the same cutoff value for a fit index is inappropriate (see Hu & Bentler, 1999; McDonald & Ho, 2002).

References

Anderson, T.W. (1973). Asymptotically efficient estimation of covariance matrices with linear structure. Annals of Statistics, 1, 135–141.
Bentler, P.M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.
Bentler, P.M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P.M., & Dijkstra, T.K. (1985). Efficient estimation via linearization in structural models. In P.R. Krishnaiah (Ed.), Multivariate analysis VI (pp. 9–42). Amsterdam: North-Holland.
Bishop, Y.M.M., Fienberg, S.E., & Holland, P.W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.
Browne, M.W. (1974). Generalized least-squares estimators in the analysis of covariance structures. South African Statistical Journal, 8, 1–24.
Browne, M.W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Holzinger, K.J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. University of Chicago: Supplementary Educational Monographs, No. 48.
Hu, L.T., & Bentler, P.M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424–453.
Hu, L.T., & Bentler, P.M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Jöreskog, K.G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34, 183–202.
Jöreskog, K.G., & Goldberger, A.S. (1972). Factor analysis by generalized least squares. Psychometrika, 37, 243–260.
La Du, T.J., & Tanaka, J.S. (1989). The influence of sample size, estimation method, and model specification on goodness-of-fit assessments in structural equation models. Journal of Applied Psychology, 74, 625–636.
Lee, S.-Y., & Jennrich, R.I. (1979). A study of algorithms for covariance structure analysis with specific comparisons using factor analysis. Psychometrika, 44, 99–114.
Magnus, J.R., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics. New York: Wiley.
McDonald, R.P., & Ho, R.M. (2002). Principles and practice in reporting structural equation analyses. Psychological Methods, 7, 64–82.
McDonald, R.P., & Marsh, H.W. (1990). Choosing a multivariate model: Noncentrality and goodness of fit. Psychological Bulletin, 107, 247–255.
Pitman, E.J.G. (1948). Lecture notes on nonparametric statistical inference. Columbia University.
Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified approach. Psychometrika, 54, 131–151.
Satorra, A., & Saris, W.E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Satorra, A., Saris, W.E., & de Pijper, W.M. (1991). A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis. Statistica Neerlandica, 45, 173–185.
Shapiro, A. (1983). Asymptotic distribution theory in the analysis of covariance structures (A unified approach). South African Statistical Journal, 17, 33–81.
Shapiro, A. (1985). Asymptotic equivalence of minimum discrepancy function estimators to GLS estimators. South African Statistical Journal, 19, 73–81.
Steiger, J.H., Shapiro, A., & Browne, M.W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.
Stroud, T.W.F. (1972). Fixed alternatives and Wald's formulation of the noncentral asymptotic behavior of the likelihood ratio statistic. Annals of Mathematical Statistics, 43, 447–454.
Sugawara, H.M., & MacCallum, R.C. (1993). Effect of estimation method on incremental fit indexes for covariance structure models. Applied Psychological Measurement, 17, 365–377.
Swain, A.J. (1975). A class of factor analysis estimation procedures with common asymptotic sampling properties. Psychometrika, 40, 315–335.
Tanaka, J.S. (1987). "How big is big enough?": Sample size and goodness of fit in structural equation models with latent variables. Child Development, 58, 134–146.
Weng, L.-J., & Cheng, C.-P. (1997). Why might relative fit indices differ between estimators? Structural Equation Modeling, 4, 121–128.
Yuan, K.-H., & Hayashi, K. (2003). Bootstrap approach to inference and power analysis based on three test statistics for covariance structure models. British Journal of Mathematical and Statistical Psychology, 56, 93–110.
Yuan, K.-H., & Hayashi, K. (2004). Standard errors with misspecified covariance structure models. Unpublished manuscript.
Yuan, K.-H., & Marshall, L.L. (2004). A new measure of misfit for covariance structure models. Behaviormetrika, 31, 67–90.

Manuscript received 5 SEP 2001
Final version received 6 JUN 2004