Power in structural equation models: A regression model for the non-centrality parameter as means to construct the power surface

Stanislav Kolenikov

April 5, 2011

Abstract

In this paper, we propose a new way to look at the power of tests in structural equation models. To determine the power of a test at a given misspecification and sample size, methodologists either rely on normality assumptions that rarely hold in practice to compute the non-centrality parameters in the Satorra-Saris or MacCallum-Browne framework, or run Monte Carlo simulations to determine power with non-normal data for a given configuration of parameters. For tests that have asymptotic chi-square distributions (central $\chi^2$ for a correctly specified model; non-central $\chi^2$ for a misspecified model), the asymptotic power is a simple expression that involves only the non-centrality parameter, so the latter should be studied in detail. We propose to run a small number of simulations for each configuration, and then build a model for the non-centrality parameter, from which the asymptotic power for any given configuration can be recovered. We provide an illustrative example of the proposed methodology, and provide code to implement it.
1 Introduction

1.1 Papers on power
Satorra & Saris (1985) discuss testing against a specific alternative. For a constraint under the alternative, $h(\theta) = 0$, they mention that the non-centrality parameter is (equation (4))

$$\lambda = \delta' \bigl(\nabla' h(\theta_0) B^{-1} \nabla h(\theta_0)\bigr)^{-1} \delta,$$

where $B = (1/n) \times$ information matrix (i.e., $B^{-1}$ is the asymptotic covariance matrix of the parameter estimates under multivariate normality; NB: need to check what it becomes with non-normal data!), and $\delta = \lim \sqrt{n}\, h(\theta_{n0})$ is the asymptotic local alternative limit of the constraints (NB: existence of the sequence of $\theta_{n0}$ satisfying this limit has not been established). Obtaining explicit parameterization for the overall test may be difficult. They then show that $nF(\Sigma(\theta_{n0}), \Sigma(\theta_n^*)) = \lambda + o(1)$, where $\theta_n^*$ minimizes $F(\Sigma(\theta_{n0}), \Sigma(\cdot))$. More importantly, an intermediate step of theirs shows (equation (12))

$$\sqrt{n}(\theta_n^* - \theta_{n0}) = -B^{-1} \bigl(\nabla' h(\theta_0) B^{-1} \nabla h(\theta_0)\bigr)^{-1} \delta + o(1)$$

and hence (equation (13))

$$\lambda + o(1) = nF(\Sigma(\theta_{n0}), \Sigma(\theta_n^*)) = n(\theta_n^* - \theta_{n0})' B^{-1} (\theta_n^* - \theta_{n0}) + o(1).$$

Under the regularity condition that $h \in C^2$ and $F \in C^3$, the order of the $o(1)$ terms used throughout is $O(n^{-1/2})$ (don't need $O_p(\cdot)$ since the derivations are for the population values). They argue that the locality of the alternative is not a limitation, as the test will likely have power against more distant alternatives (NB: monotonicity of any kind
is not established). They note that using $nF(\cdot)$ as the approximation to the non-centrality parameter is more accurate than using their equation (4) (Satorra, Saris & de Pijper 1991). They mention that under non-normal data, $B^{-1}$ is not the right matrix to use, but don't provide derivations or clarifications.

Satorra et al. (1991) compare the performance of five different approximations to the non-centrality parameter:

$$
\begin{aligned}
W_1 &= nF(\Sigma(\theta_0, \tau), \Sigma(\theta(\tau), \tau_0)), \\
W_2 &= \tfrac{1}{2}\, n (\tau - \tau_0)' \nabla'_{\tau'}\sigma \, H_{ss,\theta} \, \nabla_{\tau'}\sigma \, (\tau - \tau_0), \\
W_3 &= \tfrac{1}{2}\, n (\sigma(\tau) - \sigma_0)' H_{ss,\theta} (\sigma(\tau) - \sigma_0), \\
W_4 &= nF(\Sigma(\theta_0, \tau_0), \Sigma(\theta_0, \tau)), \\
W_5 &= nF(\Sigma(\theta_0, \tau), \Sigma(\theta_0, \tau_0)), \\
H_{ss,\theta} &= \nabla^2_{ss'} F(S, \Sigma(\theta, \tau_0)) - \nabla^2_{s\theta'} F(S, \Sigma(\theta, \tau_0)) \bigl[\nabla^2_{\theta\theta'} F(S, \Sigma(\theta, \tau_0))\bigr]^{-1} \nabla^2_{\theta s'} F(S, \Sigma(\theta, \tau_0)),
\end{aligned}
$$

when the true model is $\Sigma = \Sigma(\theta, \tau_0)$ under a family of the local alternatives $\{\Sigma(\theta_0, \tau) \mid \tau \in U(\tau_0)\}$. Here, $\theta(\tau) = \arg\min F(\Sigma(\theta_0, \tau), \Sigma(\theta, \tau_0))$. $W_1$ is due to Satorra & Saris (1985) and Steiger, Shapiro & Browne (1985); $W_3$ is due to Shapiro (1985). They prove asymptotic equivalence $W_1 \sim W_2 \sim W_3$ up to $o(1)$, while $W_4$ and $W_5$ are asymptotically biased by $O(n\|\tau - \tau_0\|^2)$ (bias corrections were proposed). The simulation is from a CFA model (1 vs. 2 factors), with varying factor loadings (concomitant issue) and correlation of the factors ($\tau - \tau_0$ misspecification), with n = 26. They conclude that all the expressions overestimate the power, especially against far alternatives. $W_1$ is more accurate than $W_2 \equiv W_3$ (the equivalence is only due to the special form of misspecification in their setup). Stas tends to think that they have discovered small sample issues: for asymptotically equivalent expressions pertaining to the population, that's the only reasonable explanation. The authors note, too, that when the alternative expressions give wildly different ncp and power, none should be trusted, and suggest using the normal approximation (Shapiro 1985).
Saris & Satorra (1993) in Bollen & Long (1993) provide an overview of the issue of power for parameterizable misspecification. They make several important contributions, as well as a number of useful observations. First, they introduce the concept of isopower (actually, they borrow it from Andrews (1989); I don't think that the later paper by Maccallum, Lee & Browne (2010) makes that reference): different models tested against the null may produce the same power, and they show how different values of the parameters that have the same power can be determined from a respective quadratic function. Second, they discuss various outcomes of the test in the 2 × 2 table with “significant–not significant” and “low–high power” margins: it's OK to reject the model with low power and a significant test statistic, and to retain the model with high power and a non-significant test statistic, but other outcomes are not sufficiently informed. Third, they propose a decomposition of the misspecification space into the directions associated with high and low eigenvalues of the matrix $M = \nabla'_{\tau'}\sigma \, H_{ss,\theta} \, \nabla_{\tau'}\sigma$ in $\lambda = n(\tau - \tau_0)' M (\tau - \tau_0)$. The higher the eigenvalue, the higher the sensitivity of the test in that direction, and the narrower the (hyper-)ellipse of the isopower surface. (Stas should see how to use that in the paper! The direction in which both $\theta_1$ and $\theta_2$ are of the same sign is that of high sensitivity, while with opposite signs, the sensitivity is lower: the effects offset one another.) As a useful observation, they demonstrate that the power of the overall chi-square test is monotone with respect to the reliability of the indicators (greater reliability is associated with greater power), although this is only done for population values with asymptotic chi-square. I could try to extract that information analytically to guide another simulation model for another paper. There is no obvious analytical expression for this to work out though.
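The eigenvalue argument above can be sketched numerically. The following is a minimal illustration, assuming a two-dimensional misspecification measured as a deviation from $\tau_0$ and a hypothetical positive definite matrix `M` (its entries are made up for the example, not taken from any of the cited papers). The function name `isopower_points` is likewise an assumption of this sketch. It returns points on the ellipse $n \tau' M \tau = \lambda$, all of which share the same asymptotic power.

```python
import numpy as np


def isopower_points(M, n, lam, num=8):
    """Return `num` points tau with n * tau' M tau == lam (2-D case only)."""
    # Eigen-decomposition of the symmetric, positive definite matrix M.
    evals, evecs = np.linalg.eigh(M)
    angles = np.linspace(0.0, 2.0 * np.pi, num, endpoint=False)
    circle = np.stack([np.cos(angles), np.sin(angles)])  # unit circle, 2 x num
    # Semi-axis along each eigenvector: shorter where the eigenvalue is larger.
    semi_axes = np.sqrt(lam / (n * evals))
    return (evecs @ (semi_axes[:, None] * circle)).T


# Hypothetical sensitivity matrix with positively correlated effects.
M = np.array([[2.0, 1.2],
              [1.2, 1.0]])
points = isopower_points(M, n=200, lam=10.0)
ncp = 200 * np.einsum("ij,jk,ik->i", points, M, points)
print(np.round(ncp, 6))  # every point yields the same non-centrality
```

With a positive off-diagonal entry, the eigenvector of the largest eigenvalue has components of the same sign, so the ellipse is narrowest in the same-sign direction, in line with the remark above.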
Maccallum, Browne & Sugawara (1996) drill on RMSEA (Steiger & Lind 1980, Browne & Cudeck 1993): $\varepsilon = \sqrt{F_0/d}$, where $F_0 = F(\Sigma_0, \Sigma(\theta))$ is “the resulting value of the discrepancy function reflecting lack of model fit in the population” (i.e., population ncp per observation). The sample estimate is

$$\hat\varepsilon = \sqrt{\min_\theta F(S, \Sigma(\theta))/d - 1/(n - 1)},$$

where $d$ is the residual degrees of freedom. They spent freaking five pages just to introduce the idea of RMSEA. Inference is possible by virtue of $(n - 1)F \sim$ non-central $\chi^2(d, (n - 1)\lambda)$, where also $\lambda = (n - 1) d \varepsilon^2$. An argument is provided in favor of the statistical test of not-close fit, $H_0: \varepsilon \ge 0.05$ vs. $H_1: \varepsilon < 0.05$. This is an inverse test, in that the null is rejected for low values of the positive statistic. NB: implement it in confa with a user-supplied value of RMSEA! NB: there is an upper limit on the power of this test, as provided by the exact fit; write a small note on this and submit to PM. Arguments regarding the width of the CI: as $d$ or $N$ increases, the CI shrinks, reflecting better assessment of the accuracy of the model. NB: Stas suspects that larger $d$ also brings greater small sample biases, according to the general theory of GMM. NB: An argument can be given that increases in either $d$ or $N$ shift the distribution up. If the test is of the form “Reject if above the critical value”, then an associated increase in power would result. They mildly criticize the Satorra-Saris approach on the grounds that it requires the user to specify the full model, while their procedure is model-free and only uses the fit of the model (need to review again when writing up the current paper that works with the test of exact fit). Stas is not convinced: he would like to see whether the model with a different fit is actually feasible (existence). For gross misspecifications, there are upper bounds for the fit, and arbitrarily small values may or may not be realized.

Curran, Bollen, Paxton, Kirby & Chen (2002) report the results of a simulation study that deals with moderately (parametrically) misspecified SEMs. They find that the non-central chi-square performs well with mild misspecifications and sufficiently large sample sizes (n ≥ 200).
When the model was severely misspecified, the test statistic still had the right mean, but was overdispersed compared to the target non-central chi-square.

McQuitty (2004): a not so technical overview of power issues in SEM. The formulae from Maccallum et al. (1996) are applied to the median model/sample size in published journals; there's a worthy review of SEM papers in marketing and business research journals. The studies were found to be far from the recommended power of 80%: about a third were overpowered, with power approaching 1; and about a third were hugely underpowered, with power below 40%.

MacCallum, Browne & Cai (2006) reintroduce the idea of testing not-close fit against close fit proposed in Maccallum et al. (1996) (looks mostly like a remake of that article; excessively long as all PM articles are). Things are put somewhat upside down: they propose the procedure to compute the power first, and then go on to discuss the null and the alternative hypotheses. A brief review of the non-central chi-square distribution and the need to specify the ncp is given, with specific parametric differences between models being dubbed the Satorra-Saris approach, and the overall model fit framework reproduced from Maccallum et al. (1996). An argument is suggested that the former is applicable in established fields where the models are relatively well known, and the latter in new fields. The question of isopower is briefly mentioned. An interesting relation between the d.f. and the power is established: the greater the residual d.f. of the model, the greater the power of the overall test (based on RMSEA). Various possible null/alternative combos in the framework of RMSEA testing may include: perfect fit against not-perfect fit (the classical LRT paradigm); small difference null $H_0: F_A - F_B \le \delta_0$, $\delta_0 > 0$; $H_1: F_A - F_B = \delta_1 > \delta_0$.
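The RMSEA-based power calculations reviewed above can be illustrated with a short sketch. It assumes the non-centrality parameter $(n - 1) d \varepsilon^2$ used by Maccallum et al. (1996), rejects for large values when the alternative RMSEA exceeds the null one (test of close fit), and for small values otherwise (test of not-close fit). The function names and the example values of $n$, $d$ and $\varepsilon$ are hypothetical; Python with SciPy is used for illustration only.

```python
from scipy.stats import chi2, ncx2


def _dist(d, lam):
    """Central chi-square when lam == 0, non-central chi-square otherwise."""
    return chi2(d) if lam == 0 else ncx2(d, lam)


def rmsea_power(n, d, eps0, eps1, alpha=0.05):
    """Asymptotic power of the RMSEA-based test of H0: eps = eps0 against eps1."""
    lam0 = (n - 1) * d * eps0 ** 2   # non-centrality under the null
    lam1 = (n - 1) * d * eps1 ** 2   # non-centrality under the alternative
    null, alt = _dist(d, lam0), _dist(d, lam1)
    if eps1 > eps0:
        # Test of close fit: reject for large values of the statistic.
        return float(alt.sf(null.ppf(1 - alpha)))
    # Test of not-close fit: reject for small values of the statistic.
    return float(alt.cdf(null.ppf(alpha)))


close_fit = rmsea_power(n=200, d=50, eps0=0.05, eps1=0.08)
not_close_fit = rmsea_power(n=200, d=50, eps0=0.05, eps1=0.01)
exact_fit_bound = rmsea_power(n=200, d=50, eps0=0.05, eps1=0.0)
print(close_fit, not_close_fit, exact_fit_bound)
```

The last call sets the alternative to exact fit, which gives the upper limit on the power of the test of not-close fit mentioned in the notes on Maccallum et al. (1996).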
Stas suspects some mess with the differences between non-central $\chi^2$ distributions (see also Yuan & Bentler (2004), which needs to be read and reviewed!), although he could not put his finger on it. Discussed limitations: choice of RMSEA values; violations of assumptions: normality and asymptotic robustness, small sample size with normal and non-normal data, “population drift” (≡ neither of the models is too badly misspecified); elevated levels of the LRT statistic in small samples (Curran, West & Finch 1996); other estimation and testing methods.

Chen, Curran, Bollen, Kirby & Paxton (2008) present the results of their huge simulation project with correctly and incorrectly specified models, and find essentially sporadic performance of RMSEA with the traditional cutoffs of 0.05 and 0.10, as well as of the use of CIs for RMSEA. The tests have huge levels in low sample sizes, and may have zero power if the population RMSEA is below this cutoff. The RMSEA-based test performed better in larger models (not surprising: the non-centrality per parameter is larger in these for a given RMSEA). They recommend complementing fit testing with additional considerations, and find it difficult to justify or recommend any fixed cut-off point.

Chun & Shapiro (2009) compare the asymptotic non-central $\chi^2$ and normal distributions as approximations to
the distribution of the LRT. They discuss the foundations for both distributions: the non-central $\chi^2$, coming from the sequence of local alternatives (n.c.p. $\delta = nF^*_{ML}$); the normal distribution, as arising from the expansion of the non-central $\chi^2$ into the sum of squares, and expanding the first square that is assumed to contain all non-normality; for high non-centrality, this term will be greater than the central $\chi^2$ remainder; a reference to the standard result on approximate normality of the non-central $\chi^2$ for either large d.f. or large non-centrality is also made. They further note around equation (7) that

$$T_{ML} \overset{asy}{\sim} Y'QY, \qquad Y \sim N(0, \Gamma), \qquad Q = \frac{1}{2} \frac{\partial^2 F_{ML}(S, \Sigma(\theta))}{\partial s \, \partial s'},$$

and in the case of the normal data, $Q \Gamma_{NT} Q = Q$ due to the special structure of $\Gamma_{NT}$. For misspecified data, they obtain that

$$\sqrt{n}(F_{ML} - F_0) \xrightarrow{d} N(0, \gamma' \Gamma \gamma), \qquad \gamma = \operatorname{vec}\bigl[(\Sigma(\theta^*))^{-1} - \Sigma_0^{-1}\bigr],$$

where $\theta^*$ is the minimizer of $F(\Sigma_0, \Sigma(\theta))$. (This first order Taylor series expansion term vanishes for correctly specified models.) A simplification for $\Gamma_{NT}$ is given. The second order quadratic term in the Taylor series expansion has expected value of $n^{-1} \operatorname{tr}[\Gamma Q] = \text{residual d.f.}/n$ and variance of $n^{-2}(2\,\text{d.f.} + 4\delta)$, both contributing smaller order terms to the above normal distribution. Simulations involved an EFA model for normal and non-normal data with varying numbers of factors, variables and observations. Misspecifications were generated to be orthogonal to the estimated parameters (that's smart! see their references), with $F^* \le 1.36$. The quality of approximation was assessed by the Kolmogorov-Smirnov statistic. The non-central approximation and the normal approximation with higher order corrections both performed reasonably well when misspecification was not too severe. None of the distributions worked well when the model was badly misspecified. Also, for extremely misspecified models, the quality of approximation deteriorated with sample size. Q-Q plots show that the normal distribution has problems in the tails, however. The non-normal data were represented by an elliptically contoured distribution generated as a mixture of two normals with the same mean and proportionate covariance matrices. (For the current paper, what we can take is the range of sample sizes and range of misspecifications, although the details of parameterization of the misspecifications are opaque.)

Hessen & Dolan (2009) consider a parametric model of heteroskedasticity in a CFA model with one factor: $V[y_j \mid \eta] = \exp\bigl(\sum_{k=0}^{r_j} \beta_{kj} \eta^k\bigr)$. The model is estimated assuming normality of both the factor $\eta$ and $y_j \mid \eta$ using Gaussian quadrature.
A likelihood ratio test against homoskedasticity is proposed; an LM/score test is mentioned, but not developed (if I were a referee, I would insist on one). The simulations demonstrate reasonable accuracy (size) of the LRT with n ≥ 300, and reasonable power against heteroskedasticity. The minimum of the normal theory fit function, the traditional chi-square, has very low power against heteroskedasticity: one needs all items to be quite notably heteroskedastic for this power to exceed 10% for a nominal 5% level test. By then, the LRT of the heteroskedastic model has power of > 0.999. Worth noting for this paper: (i) the form of heteroskedasticity; (ii) the $\chi^2$ distribution of the LRT; (iii) the low sensitivity/power of the NTLRT against heteroskedasticity.

Maccallum et al. (2010) provide a discussion of several ways in which different models can produce the same power in a given test. In the parametric Satorra-Saris framework, the alternatives with a given power can be created by finding alternative parameter values for the same model, by finding a ($\chi^2$-)equivalent model, by finding another alternative model in which a given null is nested, by finding another arbitrary model, or by using the procedure of Cudeck & Browne (1992) for creating arbitrarily misspecified matrices (see also procedures in Chun & Shapiro (2009)). Most of these procedures require a guided search over tuning parameters that would generate the given power. In MacCallum-Browne's RMSEA-based framework, things are even easier, as all you need to specify is the alternative combination of $\varepsilon$ (or rather a pair of RMSEAs for the null and the alternative) and $n$. Note: no discussion of non-normality; no discussion of accuracy of asymptotic power calculations.

While Maccallum et al. (2010) discuss the issue of equal power in general descriptive terms, von Oertzen (2010) provides specific details and gives the specific structural transformations that provide power-equivalent models.
(The $\chi^2$-equivalent models are a subclass of power-equivalent models.) The RAM formulation of SEM is used to drive the proofs. He demonstrates an example of how a linear growth model with multiple waves can be reduced to fewer waves, and eventually to a single slope parameter with a single intercept (dubbed “minimal power equivalent representative”; amazing!). It should be noted that the fact that a lot of parameters are known for the growth model is heavily utilized.

Foldnes, Olsson & Foss (2011) relate the power of the difference and overall tests to kurtosis. As expected, since an increase in kurtosis leads to an increase in the variance of the estimators, the power of tests goes down. Hence, as always, the closer the data are to the multivariate normal, the better the power of the overall test is. The test must be sensitive to kurtosis; thus, the Satorra & Bentler (1994) scaled test and the normal theory LRT have different performance, and the former will be more likely to lose power: $\lambda_{\text{scaled}} = \lambda_{NT}/c$, where $c$ is the scaling factor.
1.2 Low-level reviews
Muthén & Muthén (2002) provide a very simplistic explanation of how to determine the sample size so that the biases of parameter estimates and standard errors are sufficiently small (say within 10% of the target), coverage is sufficiently accurate (within 3% of the target) and the power (correlation between two factors in CFA model: $H_0: \phi_{12} = 0$, $H_1: \phi_{12} = 0.25$; growth model: $H_0$: slope = 0, $H_1$: slope = 0.1) is close to 80%.

Hancock (2006): not so technical overview, with multiple technical inaccuracies. He reviews both power for parameters (Satorra-Saris approach) and power of the overall fit (RMSEA, MacCallum-Browne approach).
1.3 Parallel examples
MacKinnon (1996) and MacKinnon, Haug & Michelis (1999) provide calibration constants for the finite sample quantiles of Dickey & Fuller (1979) test for unit roots in time series. While the asymptotic critical values have been derived using Brownian motion, the higher order asymptotics has been utilized to guide the meta-model for their simulations.
2 Simulations and role of non-centrality
As we have seen in the previous section, the question of power in SEM is studied by simulations. Researchers speculate what the plausible values of the parameters are, and simulate their models for those parameters. To estimate the power accurately, hundreds or thousands of simulations are necessary at each parameter combination. We would like to argue that this expense of computational power is not necessary if the simulation results are summarized across different settings. Skrondal (2000) advocated a more extensive use of statistical models to summarize the results of simulation studies, and this is exactly the approach this paper is pursuing.

We shall discuss the overall fit test and its variants here, but the discussion equally applies to the tests of nested hypotheses and other tests that have an asymptotic chi-square distribution (central under the null hypothesis and non-central under the alternative). The origins of the non-central $\chi^2$ distribution as the distribution of the test under the alternative can be demonstrated with a Wald test of a hypothesis that a (vector-valued) parameter is equal to zero:

$$H_0: \theta = 0 \quad \text{vs.} \quad H_1: \theta = \theta_1 = \delta/\sqrt{n}, \tag{1}$$

where the local alternatives parameterization is taken for convenience of deriving approximations with terms involving $\sqrt{n}$ or $n$. If the estimates are asymptotically normal (Yuan & Jennrich 1998) [ FIND A BETTER REFERENCE! ],

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, \Omega), \tag{2}$$

where $\theta_0$ is the true value of the parameter (0 under the null, $\theta_1$ under the alternative), and $\Omega$ is the asymptotic covariance matrix of the estimates, then the Wald test statistic

$$W = n \hat\theta' \hat\Omega^{-1} \hat\theta \tag{3}$$

is asymptotically a quadratic form in the normal variables, and hence has an asymptotic non-central chi-square distribution with non-centrality parameter

$$\lambda = n \theta_0' \Omega^{-1} \theta_0 = \begin{cases} 0 & \text{under } H_0, \\ \delta' \Omega^{-1} \delta & \text{under } H_1. \end{cases} \tag{4}$$

See Appendix on the definition and properties of the non-central chi-square distribution. Going back from the local parameterization to the fixed alternative, we have

$$\lambda = n \theta_1' \Omega^{-1} \theta_1. \tag{5}$$
This expression provides us with several important pieces of information regarding how a theoretical model of the expected simulation results can be set up. First, the non-centrality parameter is expected to be quadratic in the residuals. Second, the non-centrality parameter is linear in the sample size, n. Finally, the non-centrality parameter is inversely related to the variance of the estimates, and hence greater kurtosis of the data will be associated with lower noncentrality (although the relation will not necessarily be very direct).
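The passage from the non-centrality parameter to the asymptotic power can be sketched in a few lines. This is a minimal illustration under the assumptions of this section: the statistic is referred to a central $\chi^2$ critical value, and its distribution under the alternative is non-central $\chi^2$ with $\lambda = n \theta_1' \Omega^{-1} \theta_1$. The function names and the numerical values of $\theta_1$ and $\Omega$ are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, ncx2


def wald_ncp(theta1, omega, n):
    """Non-centrality parameter lambda = n * theta1' Omega^{-1} theta1."""
    theta1 = np.atleast_1d(np.asarray(theta1, dtype=float))
    omega = np.atleast_2d(np.asarray(omega, dtype=float))
    return float(n * theta1 @ np.linalg.solve(omega, theta1))


def asymptotic_power(lam, df, alpha=0.05):
    """P(reject) when the statistic is non-central chi-square(df, lam)."""
    crit = chi2.ppf(1 - alpha, df)       # critical value under the null
    if lam == 0:
        return float(chi2.sf(crit, df))  # equals alpha: the size of the test
    return float(ncx2.sf(crit, df, lam))


# Hypothetical two-parameter alternative with identity asymptotic covariance.
for n in (100, 200, 400):
    lam = wald_ncp([0.1, 0.2], np.eye(2), n)
    print(n, round(lam, 3), round(asymptotic_power(lam, df=2), 3))
```

Doubling $n$ doubles $\lambda$, and scaling $\theta_1$ by a constant scales $\lambda$ by its square, which are the two structural features the meta-model exploits.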
2.1 Small sample effects

Expansions of estimators. Bartlett correction. Non-central $\chi^2$ for small sample effects: $\lambda = O(n^{-1})$.
3 Example

3.1 An illustrative model
To connect the theoretical concepts introduced in the previous section with SEM practice, consider the following example. Figure 1 represents a path diagram of a simple two-factor confirmatory factor analysis model. We shall study the power of the normal theory and scaled/adjusted tests (Satorra & Bentler 1994) to detect distributional and structural misspecifications. When the model is perfectly specified, only the solid paths are present, and the data are multivariate normal. The parameters of the model are as follows:

$$
\begin{gathered}
\lambda_1 = \lambda_5 = 1, \quad \lambda_2 = \lambda_6 = 0.9, \quad \lambda_3 = \lambda_7 = 0.8, \quad \lambda_4 = \lambda_8 = 0.7, \\
V[f_1] = V[f_2] = 1, \quad \mathrm{COV}[f_1, f_2] = 0.5, \\
\psi_j = V[\delta_j] = 0.5, \quad j = 1, \ldots, 8.
\end{gathered}
$$
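As a check on the parameter values above, the implied covariance matrix of the correctly specified model can be assembled as $\Sigma = \Lambda \Phi \Lambda' + \Psi$. The sketch below does this with NumPy; the variable names are illustrative. The count of free parameters assumes that $\lambda_1$ and $\lambda_5$ are fixed at 1 for identification while the factor variances are free, which is an assumption of this sketch, although it does reproduce the residual degrees of freedom $d = 19$ used later in the paper.

```python
import numpy as np

# Loadings: y1-y4 load on f1, y5-y8 load on f2, with the values listed above.
loadings = [1.0, 0.9, 0.8, 0.7]
Lambda = np.zeros((8, 2))
Lambda[:4, 0] = loadings
Lambda[4:, 1] = loadings

Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])   # factor variances and covariance
Psi = 0.5 * np.eye(8)          # unique error variances psi_j = 0.5

# Model-implied covariance matrix of (y1, ..., y8).
Sigma = Lambda @ Phi @ Lambda.T + Psi

# Residual degrees of freedom under the assumed identification:
# 6 free loadings + 3 factor (co)variances + 8 error variances = 17 parameters.
n_moments = 8 * 9 // 2
n_free = 6 + 3 + 8
d = n_moments - n_free
print(np.round(Sigma, 3))
print("residual d.f.:", d)
```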
To generate misspecifications that will produce non-trivial power of the tests to be studied, we introduce the following problems to the model (shown as dotted lines in Figure 1).

Figure 1: The model used in simulations. Misspecified parts of the model are shown in dotted lines. [Path diagram: factors f1, f2; indicators y1–y8; errors δ1–δ8; labelled elements λ1–λ8, λ51, φ11, φ12, φ22, ψ6, ψ78, t(k).]

1. A small structural misspecification: correlated measurement errors $\delta_7$ and $\delta_8$. The magnitude of misspecification is given by the correlation $\theta_1 = \psi_{78}/\sqrt{\psi_7 \psi_8}$.

2. A moderate structural misspecification: omitted path connecting $f_1$ and $y_5$. The magnitude of misspecification is given by $\theta_2 \equiv \lambda_{51}$.

3. A small distributional misspecification: the unique error $\delta_6$ is heteroskedastic:

$$\delta_6 = \sqrt{1/2}\, z_6 \exp(\theta_3 f_1 - \theta_3^2), \qquad V[\delta_6 \mid f_1] = 0.5 \exp(2\theta_3 f_1 - 2\theta_3^2), \tag{6}$$

where $z_6$ is a standard normal variate independent of other variables in the model. Then, using the moment-generating function of the normal distribution, the unconditional variance of $\delta_6$ is

$$V[\delta_6] = E[\delta_6^2] = 0.5 \exp(-2\theta_3^2)\, E[\exp(2\theta_3 f_1)] = 0.5 \exp(-2\theta_3^2)\, M_z(2\theta_3) = 0.5, \tag{7}$$

where $M_z(t) = \exp(t^2/2)$ is the mgf of the standard normal variable $f_1$. The magnitude of misspecification is given by $\theta_3$.

4. A moderate distributional misspecification: the error terms $\delta_1, \ldots, \delta_4$ have a scaled multivariate Student $t(k)$ distribution:

$$\delta_j = z_j \Big/ \sqrt{\frac{G_k}{k - 2}}, \qquad j = 1, \ldots, 4, \tag{8}$$

where $z_j$ are standard normal variates independent of other variables in the model, and $G_k \sim \chi^2_k$ independently of other variables in the model. The magnitude of misspecification is given by the excess kurtosis of the error, $\theta_4 = 6/(k - 4)$, $k > 4$.

For all misspecifications, the case of $\theta_i = 0$ corresponds to no misspecification present (infinite degrees of freedom, or multivariate normal distribution, in the case of misspecification 4).
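The two distributional misspecifications can be sketched as random number generators following equations (6) and (8) literally. The function names, the seed and the sample size are hypothetical. Note that equation (8) as written produces unit-variance errors; matching $\psi_j = 0.5$ would presumably require an additional factor $\sqrt{\psi_j}$, which this sketch leaves out because the text does not state it.

```python
import numpy as np


def df_from_kurtosis(theta4):
    """Invert theta4 = 6 / (k - 4); theta4 = 0 is the normal (k = infinity) case."""
    return np.inf if theta4 == 0 else 4.0 + 6.0 / theta4


def hetero_error(f1, theta3, rng):
    """Heteroskedastic delta_6 of equation (6), given draws of the factor f1."""
    z6 = rng.standard_normal(f1.shape)
    return np.sqrt(0.5) * z6 * np.exp(theta3 * f1 - theta3 ** 2)


def scaled_t_errors(n, k, rng, dim=4):
    """Scaled multivariate t(k) errors of equation (8); unit variance as written."""
    z = rng.standard_normal((n, dim))
    if np.isinf(k):
        return z
    # One chi-square draw per observation, shared by all `dim` components.
    g = rng.chisquare(k, size=(n, 1))
    return z / np.sqrt(g / (k - 2))


rng = np.random.default_rng(2011)
f1 = rng.standard_normal(200_000)
delta6 = hetero_error(f1, theta3=0.3, rng=rng)
delta_t = scaled_t_errors(200_000, df_from_kurtosis(0.5), rng)
print(delta6.var(), delta_t.var(axis=0))
```

The sample variance of `delta6` should be close to 0.5 for any value of $\theta_3$, as derived in equation (7).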
3.2 Meta-model
Based on the stylized facts regarding the performance of the central and the non-central $\chi^2$ distribution presented in Section 2, we entertain the following model for the mean of the test statistic:

$$
\begin{aligned}
E[T] &= d + T_1 + T_2, \\
T_1 &= n \sum_{k + l \le 2} \frac{a_{kl}\, \theta_1^k \theta_2^l}{1 + \gamma_{3kl} \theta_3 + \gamma_{4kl} \theta_4}, \\
T_2 &= \frac{\beta_0 + \beta_3 \theta_3 + \beta_{33} \theta_3^2 + \beta_4 \theta_4}{n}.
\end{aligned}
\tag{9}
$$

The term $T_1$ is the first order term of the non-centrality parameter for the structurally misspecified model, and the term $T_2$ is the second order term responsible for the Bartlett correction of the correctly specified model ($\theta_1 = \theta_2 = 0$). The sum $T_1 + T_2$ approximates the non-centrality parameter of the distribution, where the non-centrality is allowed to appear both because of the structural misspecification ($T_1$) and because of the small sample effects ($T_2$).

Since the variance of the non-central $\chi^2$ distribution increases with $T$, a variance stabilizing transformation is called for. As discussed in Appendix, an appropriate transformation has the form

$$G = \sqrt{D - a}. \tag{10}$$

We took $a = 4$ instead of the optimal $a = d/2 = 19/2$ to retain all observations.
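To make the structure of the meta-model (9) concrete, the following sketch evaluates $E[T]$ for given coefficients. The coefficient values in the example are invented for illustration and are not estimates from the simulations; the function name and the dictionary layout are also assumptions of this sketch. Coefficients not supplied are treated as zero.

```python
from itertools import product


def expected_statistic(theta, n, d, a, gamma3, gamma4, beta):
    """Evaluate E[T] = d + T1 + T2 of the meta-model (9).

    theta  : (theta1, theta2, theta3, theta4)
    a, gamma3, gamma4 : dicts keyed by (k, l) with k + l <= 2
    beta   : dict with keys 'b0', 'b3', 'b33', 'b4'
    """
    t1, t2, t3, t4 = theta
    T1 = 0.0
    for k, l in product(range(3), repeat=2):
        if k + l > 2:
            continue
        numerator = a.get((k, l), 0.0) * t1 ** k * t2 ** l
        denominator = 1.0 + gamma3.get((k, l), 0.0) * t3 + gamma4.get((k, l), 0.0) * t4
        T1 += numerator / denominator
    T1 *= n
    T2 = (beta["b0"] + beta["b3"] * t3 + beta["b33"] * t3 ** 2 + beta["b4"] * t4) / n
    return d + T1 + T2


# Hypothetical coefficients: purely quadratic structural terms, constant Bartlett term.
a = {(2, 0): 1.5, (0, 2): 2.0, (1, 1): 1.0}
beta = {"b0": 5.0, "b3": 0.0, "b33": 0.0, "b4": 0.0}
value = expected_statistic((0.1, 0.2, 0.0, 0.0), n=100, d=19,
                           a=a, gamma3={}, gamma4={}, beta=beta)
print(value)
```

Because the $\gamma$ coefficients enter the denominator, positive values of $\gamma_{4kl}$ reduce the non-centrality as the kurtosis $\theta_4$ grows, which is the direction suggested by the discussion in Section 2.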
3.3 Simulation setup
Suppose we are interested in evaluating the power of the overall fit tests across a broad range of possible alternatives and sample sizes:

$$\theta_1, \theta_2, \theta_3 \in \{0, \pm 0.1, \pm 0.3, \pm 0.5\}, \quad \theta_4 \in \{0, 0.5, 1, 2, 6\}, \quad n \in \{25, 50, 100, 200, 400, 800\}. \tag{11}$$

The degrees of freedom of the Student distribution corresponding to the values of $\theta_4$ are $+\infty$ (normal), 16, 10, 7 and 5, in increasing order of the kurtosis (0, 0.5, 1, 2, 6). The full factorial design has 7 · 7 · 7 · 5 · 6 = 10290 distinct combinations. We took 200 replications when $\theta_1 = \theta_2 = 0$ to obtain identification of the small sample term $T_2$ in (9), and 5 replicates for each combination of $\theta$'s when the model was structurally misspecified (either $\theta_1$ or $\theta_2 \ne 0$) to obtain identification of $T_1$ in (9).
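The design described above can be enumerated directly, which also gives the total simulation budget. The sketch below is illustrative only; the variable names and the dictionary layout are assumptions.

```python
from itertools import product

STRUCTURAL = [0.0, 0.1, -0.1, 0.3, -0.3, 0.5, -0.5]   # grid for theta1, theta2, theta3
KURTOSIS = [0.0, 0.5, 1.0, 2.0, 6.0]                  # grid for theta4
SAMPLE_SIZES = [25, 50, 100, 200, 400, 800]

design = [
    {
        "theta1": t1, "theta2": t2, "theta3": t3, "theta4": t4, "n": n,
        # 200 replications without structural misspecification, 5 otherwise.
        "reps": 200 if (t1 == 0 and t2 == 0) else 5,
    }
    for t1, t2, t3, t4, n in product(STRUCTURAL, STRUCTURAL, STRUCTURAL,
                                     KURTOSIS, SAMPLE_SIZES)
]

n_cells = len(design)
n_null_cells = sum(1 for cell in design if cell["reps"] == 200)
total_reps = sum(cell["reps"] for cell in design)
print(n_cells, n_null_cells, total_reps)
```

Under this allocation the cells with $\theta_1 = \theta_2 = 0$ make up about 2% of the design but a little under half of all replications.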
3.4 Main results

Research questions:

1. Are there small sample effects?
2. Are there interactions between misspecifications?
3. Is the power surface symmetric with respect to sign changes in $\theta_1$, $\theta_2$, $\theta_3$?
3.5 Verification
Simulate with all zeroes. Simulate 1000 replications with a value off the grid.
3.6 Side results
Besides the tests of the model, we also collected the results of Mardia (1970) multivariate skewness and kurtosis tests. Since these tests have an asymptotic χ2 distribution, they can also be analyzed in the current framework to study the sensitivity of these tests to misspecifications 3 and 4.
4 Discussion
Other schemes that provide input parameters $\theta$ to the simulation can be thought of. A random design can use random values of $\theta$. The theory of optimal designs (Fedorov 1972, Dette, Melas & Pepelyshev 2004, Melas 2005) can be used to further reduce the variance of the estimates of the regression model (??). The proposed approach is currently only suitable for test statistics that have an asymptotic $\chi^2$ distribution with fixed degrees of freedom. Thus, the power of the adjusted test of Satorra & Bentler (1994), for which the degrees of freedom is a random variable, cannot be studied in the current framework.
A Non-central χ2 distribution
Let $x \sim N_p(\mu, I_p)$ be a multivariate normal vector with independent components. Then $D = \|x\|^2 = \sum_{k=1}^{p} x_k^2$ is said to have a non-central chi-square distribution with $p$ degrees of freedom and non-centrality parameter

$$\lambda = \|\mu\|^2 = \sum_{k=1}^{p} \mu_k^2. \tag{12}$$

When $\mu = 0$, the (central) chi-square distribution is obtained. Often, it is convenient to normalize the mean vector as $\mu = (\sqrt{\lambda}, 0, \ldots, 0)$. Important properties of the non-central chi-square distribution are as follows.

1. The first two moments of $D$ are

$$E[D] = p + \lambda, \qquad V[D] = 2(p + 2\lambda). \tag{13}$$

2. The probability density function is given by

$$f(x; p, \lambda) = \sum_{i=0}^{\infty} \frac{\exp(-\lambda/2)(\lambda/2)^i}{i!} f(x; p + 2i) \tag{14}$$

for $x \ge 0$, and zero otherwise, where

$$f(x; k) = \frac{x^{k/2 - 1} \exp(-x/2)}{2^{k/2}\, \Gamma(k/2)}$$

is the density of the central chi-square variable with $k$ degrees of freedom. This representation is sometimes referred to as a Poisson mixture, as the factors in front of the density are Poisson probabilities.

3. As $p + \lambda \to \infty$, i.e., as either the degrees of freedom, or the non-centrality parameter, or both, increase, the distribution converges to a normal:

$$\frac{D - (p + \lambda)}{\sqrt{2(p + 2\lambda)}} \xrightarrow{d} N(0, 1). \tag{15}$$

More accurate approximations are available (Sankaran 1963).

A variance stabilizing transformation for the non-central $\chi^2$ distribution is given in Johnson, Kotz & Balakrishnan (n.d.), equation (29.61b), as

$$G = \sqrt{D - \frac{p - 1}{2}}. \tag{16}$$

Here, $V[G]$ is nearly independent of $\lambda$. For small values of $\lambda$, however, this would require taking square roots of negative numbers. In the analysis of our simulation results, we had to resort to a transformation of the form

$$G = \sqrt{D - a} \tag{17}$$

for sufficiently small $a$ (say the smallest value of the test statistic that was produced in simulations).
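Properties (13) and (14) can be checked numerically against a library implementation. The sketch below uses SciPy's `ncx2` distribution; the values of $p$, $\lambda$ and the evaluation point are arbitrary, and the infinite sum in (14) is truncated at a fixed number of terms, which is an approximation chosen for this illustration.

```python
import numpy as np
from scipy.stats import chi2, ncx2, poisson


def poisson_mixture_pdf(x, p, lam, terms=200):
    """Density (14): Poisson(lam / 2) mixture of central chi-square densities."""
    i = np.arange(terms)
    weights = poisson.pmf(i, lam / 2.0)           # exp(-lam/2) (lam/2)^i / i!
    return float(np.sum(weights * chi2.pdf(x, p + 2 * i)))


p, lam = 19, 7.5
mean, var = ncx2.stats(p, lam, moments="mv")

print(float(mean), p + lam)                        # first moment in (13)
print(float(var), 2 * (p + 2 * lam))               # second central moment in (13)
print(poisson_mixture_pdf(20.0, p, lam), float(ncx2.pdf(20.0, p, lam)))
```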
References

Andrews, D. W. K. (1989), ‘Power in econometric applications’, Econometrica 57(5), 1059–1090.

Bollen, K. A. & Long, J. S., eds (1993), Testing structural equation models, Sage, Thousand Oaks, CA.

Browne, M. W. & Cudeck, R. (1993), Alternative ways of assessing model fit, in K. A. Bollen & J. S. Long, eds, ‘Testing Structural Equation Models’, SAGE Publishers, Newbury Park, CA, USA, chapter 6, pp. 136–162.

Chen, F., Curran, P., Bollen, K. A., Kirby, J. & Paxton, P. (2008), ‘An empirical evaluation of the use of fixed cutoff points in RMSEA test statistic in structural equation models’, Sociological Methods and Research 36, 462–494.

Chun, S. Y. & Shapiro, A. (2009), ‘Normal versus noncentral chi-square asymptotics of misspecified models’, Multivariate Behavioral Research 44(6), 803–827.

Cudeck, R. & Browne, M. (1992), ‘Constructing a covariance matrix that yields a specified minimizer and a specified minimum discrepancy function value’, Psychometrika 57(3), 357–369.

Curran, P. J., Bollen, K. A., Paxton, P., Kirby, J. & Chen, F. (2002), ‘The noncentral chi-square distribution in misspecified structural equation models: Finite sample results from a Monte Carlo simulation’, Multivariate Behavioral Research 37(1), 1–36.

Curran, P. J., West, S. G. & Finch, J. F. (1996), ‘The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis’, Psychological Methods 1(1), 16–29.

Dette, H., Melas, V. B. & Pepelyshev, A. (2004), ‘Optimal designs for a class of nonlinear regression models’, The Annals of Statistics 32(5), 2142–2167. http://www.jstor.org/stable/3448568.

Dickey, D. A. & Fuller, W. A. (1979), ‘Distribution of the estimators for autoregressive time series with a unit root’, Journal of the American Statistical Association 74(366), 427–431.

Fedorov, V. V. (1972), Theory of Optimal Experiments, Academic Press.

Foldnes, N., Olsson, U. H. & Foss, T. (2011), ‘The effect of kurtosis on the power of two test statistics in covariance structure analysis’, British Journal of Mathematical and Statistical Psychology forthcoming(??), ??–??

Hancock, G. R. (2006), Power analysis in covariance structure modeling, in G. R. Hancock & R. O. Mueller, eds, ‘Structural Equation Modeling: A Second Course’, Quantitative Methods in Education and the Behavioral Sciences, IAP - Information Age Publishing Inc.

Hessen, D. J. & Dolan, C. V. (2009), ‘Heteroscedastic one-factor models and marginal maximum likelihood estimation’, British Journal of Mathematical and Statistical Psychology 62(1), 57–77.

Johnson, N. L., Kotz, S. & Balakrishnan, N. (n.d.), Continuous Univariate Distributions, Vol. 1 (Wiley Series in Probability and Statistics), 2 edn, Wiley-Interscience.

MacCallum, R. C., Browne, M. W. & Cai, L. (2006), ‘Testing differences between nested covariance structure models: Power analysis and null hypotheses’, Psychological Methods 11(1), 19–35.

Maccallum, R. C., Browne, M. W. & Sugawara, H. M. (1996), ‘Power analysis and determination of sample size for covariance structure modeling’, Psychological Methods 1(2), 130–149.

Maccallum, R. C., Lee, T. & Browne, M. W. (2010), ‘The issue of isopower in power analysis for tests of structural equation models’, Structural Equation Modeling: A Multidisciplinary Journal 17(1), 23–41.

MacKinnon, J. G. (1996), ‘Numerical distribution functions for unit root and cointegration tests’, Journal of Applied Econometrics 11(6), 601–618.

MacKinnon, J. G., Haug, A. A. & Michelis, L. (1999), ‘Numerical distribution functions of likelihood ratio tests for cointegration’, Journal of Applied Econometrics 14(5), 563–577.

Mardia, K. V. (1970), ‘Measures of multivariate skewness and kurtosis with applications’, Biometrika 57, 519–530.

McQuitty, S. (2004), ‘Statistical power and structural equation models in business research’, Journal of Business Research 57(2), 175–183.

Melas, V. B. (2005), ‘On the functional approach to optimal designs for nonlinear models’, Journal of Statistical Planning and Inference 132(1–2), 93–116.

Muthén, L. & Muthén, B. (2002), ‘How to use a Monte Carlo study to decide on sample size and determine power’, Structural Equation Modeling 9(4), 599–620.

Sankaran, M. (1963), ‘Approximations to the non-central chi-square distribution’, Biometrika 50(1–2), 199–204.

Saris, W. E. & Satorra, A. (1993), Power evaluations in structural equation models, in ‘Testing Structural Equation Models’, SAGE, Newbury Park, California.

Satorra, A. & Bentler, P. M. (1994), Corrections to test statistics and standard errors in covariance structure analysis, in A. von Eye & C. C. Clogg, eds, ‘Latent Variable Analysis’, Sage, Thousand Oaks, CA, chapter 16, pp. 399–419.

Satorra, A. & Saris, W. E. (1985), ‘Power of the likelihood ratio test in covariance structure analysis’, Psychometrika 50(1), 83–90.

Satorra, A., Saris, W. E. & de Pijper, W. M. (1991), ‘A comparison of several approximations to the power function of the likelihood ratio test in covariance structure analysis’, Statistica Neerlandica 45(2), 173–185.

Shapiro, A. (1985), ‘Asymptotic distribution theory in the analysis of covariance structures (a unified approach)’, South African Statistical Journal 17, 33–81.

Skrondal, A. (2000), ‘Design and analysis of Monte Carlo experiments: Attacking the conventional wisdom’, Multivariate Behavioral Research 35(2), 137–167.

Steiger, J. H. & Lind, J. C. (1980), Statistically-based tests for the number of common factors. Paper presented at the Annual Meeting of the Psychometric Society, Iowa City, Iowa, USA. Available at http://www.statpower.net/Steiger Biblio/Steiger-Lind 1980.pdf.

Steiger, J., Shapiro, A. & Browne, M. (1985), ‘On the multivariate asymptotic distribution of sequential Chi-square statistics’, Psychometrika 50(3), 253–263.

von Oertzen, T. (2010), ‘Power equivalence in structural equation modelling’, British Journal of Mathematical and Statistical Psychology 63(2), 257–272.

Yuan, K.-H. & Bentler, P. M. (2004), ‘On chi-square difference and z tests in mean and covariance structure analysis when the base model is misspecified’, Educational and Psychological Measurement 64(5), 737–757.

Yuan, K.-H. & Jennrich, R. I. (1998), ‘Asymptotics of estimating equations under natural conditions’, Journal of Multivariate Analysis 65(2), 245–260.