Mean Comparison: Manifest Variable versus Latent Variable*

Ke-Hai Yuan
University of Notre Dame

Peter M. Bentler
University of California, Los Angeles

March 4, 2004

* The research was supported by Grant DA01070 from the National Institute on Drug Abuse.
Abstract

Mean comparisons are of great importance in the application of statistics. Procedures for mean comparison with manifest variables have been well studied. However, few rigorous studies have been conducted on mean comparisons with latent variables, although the methodology has been widely used and documented. This paper studies the commonly used statistics in latent variable mean modeling and compares them with parallel manifest variable statistics in terms of power, asymptotic distributions, and empirical distributions. The robustness of each statistic is also explored when the model is misspecified or when data are nonnormally distributed. Our results indicate that, under certain conditions, the likelihood ratio and Wald statistics used for latent mean comparisons do not always have greater power than the Hotelling $T^2$ statistic used for manifest mean comparisons. The noncentrality parameter corresponding to the $T^2$ statistic can be much greater than those corresponding to the likelihood ratio and Wald statistics, which we find to be different from those provided in the literature. Our results also indicate that the likelihood ratio statistic can be stochastically much greater than the corresponding Wald statistic, and that neither of their distributions can be described by a chi-square distribution when the null hypothesis is not trivially violated. Recommendations and advice are provided for the use of each statistic.
Keywords: Noncentrality parameter, likelihood ratio statistic, Wald statistic, empirical power, parameter bias, asymptotic robustness.
1. Introduction

Comparing means of multiple variables is a central theme of many statistical procedures. For example, ANOVA, MANOVA, and growth curve modeling all aim to study mean changes either across populations or across time. Because measurements in the social and behavioral sciences are typically subject to error, there has been great interest in mean comparisons of latent variables (LV) using the structural equation modeling (SEM) approach (Bagozzi, 1977; Bagozzi & Yi, 1989; Bentler & Yuan, 2000; Byrne, Shavelson & Muthén, 1989; Cole, Maxwell, Arvey & Salas, 1993; Hancock, 2001, 2003; Hancock, Lawrence & Nevitt, 2000; Kano, 2001; Kaplan & George, 1995; Kühnel, 1988; McArdle & Epstein, 1987; Meredith & Tisak, 1990; Muthén, 1989; Sörbom, 1974). When the observed variables are subject to errors of measurement, modeling the means of LVs may be substantively more meaningful than comparing the means of the observed or manifest variables (MV). However, the LV approach cannot be applied to just any set of variables. When a factor analysis model does not hold for the observed data, it may be more appropriate simply to evaluate the means of the MVs by ANOVA or MANOVA (Cole, Maxwell, Arvey & Salas, 1993). Although some research has been done on both the LV and MV approaches to mean comparisons, it is not clear what statistical advantage the LV approach has over the MV approach, even under idealized conditions. Our purpose is to compare these two approaches under ideal as well as more realistic conditions. The comparison includes a study of type I error, power, parameter bias, and efficiency of parameter estimates.

The power of a mean comparison method is closely related to the effect size and sample size, which affect the power through the so-called noncentrality parameter (NCP). Given the sample size and effect size, one can obtain the corresponding NCP.
With the NCP, one can evaluate the overlap between the noncentral chi-square distribution (or $F$ distribution) and the corresponding central chi-square distribution (or $F$ distribution), and further determine power characteristics (see e.g., Hancock, Lawrence & Nevitt, 2000; Kaplan & George, 1995). Such a comparison lies behind all the methods of power evaluation in the conventional approaches to inference (e.g., Cohen, 1988; MacCallum, Browne & Sugawara, 1996; Satorra & Saris, 1985). It is not an exaggeration to say that the NCP in the noncentral chi-square or $F$ distribution plays a key role in facilitating power comparisons. Actually, when the NCP and degrees of freedom are known, the data, the sample size, or the effect size are no longer necessary for evaluating power. The NCP is easiest to obtain when comparing the means of MVs. In the LV approach to mean comparisons, it is not clear how the NCP is related to model misspecifications or effect sizes. In particular, it is not clear whether the NCP in the LV approach equals that of the MV approach when testing, say, the population mean of a multivariate normal distribution. A related issue is which approach is statistically more powerful. When comparing the means of MVs, the statistic of choice is clear, for example, the $t$ or $T^2$. In the LV approach, there are the likelihood ratio (LR), Wald, and Lagrange multiplier (LM) statistics (see chapter 10 of Bentler, 1995; Buse, 1982; Satorra, 1989). Under idealized conditions, these three procedures are asymptotically equivalent (see Engle, 1984). In reality, they may not be equivalent, and thus it would be desirable to know which statistic optimally controls type I errors while also achieving good power. We will discuss these issues using analytical as well as Monte Carlo methods.

Kano (2001) thoroughly studied the effect of treating categorical group-indicator variables as normally distributed in the SEM/LV approach to mean comparison. He also compared the power properties of MANOVA with those of SEM. Using asymptotics, he found that the NCP in the SEM approach is the same as that in the MANOVA approach. Because there are fewer degrees of freedom in the SEM approach, the power for finding mean differences among latent variables is then always greater than that in MANOVA. In his comparison, Kano assumed that the factor loadings and unique variances are known. In practice, these are seldom known in advance and have to be estimated from the data.
It is not clear how the NCP or power in the SEM approach is affected when the factor loadings and unique variances are unknown. Hancock (2001) defined effect size and studied the power of mean comparisons in the SEM approach. Using normal theory based maximum likelihood, he determined the relationships among power, NCP, and effect size. However, as we shall see, the NCP given by Hancock is also implicitly based on the assumption that the factor loading matrix is known, and thus his power and sample size determination procedures are limited to that situation. In section 2, we will provide a new formula for the NCP of several models, including the linear and nonlinear growth curve models. We also provide a general result that characterizes the NCP in the LV and MV approaches to mean comparisons. In the LV approach to a mean comparison, one commonly refers the LR or the Wald statistic to a chi-square distribution, justified by asymptotics. Some regularity conditions, such as normally distributed data and large enough sample sizes, have to be met. In section 3, we discuss these conditions and empirically study the behavior of several commonly used statistics. We shall see that, even for normally distributed data, the commonly used statistics cannot be even approximately described by chi-square distributions when the null hypothesis is not trivially violated. In this situation, the Wald and LR statistics are not asymptotically equivalent either. When comparing means using the MV approach, conclusions will be biased if the covariance matrices in separate groups are assumed to be equal but are actually not. Similarly, in the LV approach, a misspecification in the covariance structure can affect the evaluation of the mean structure. We will analytically characterize such effects in section 4 and relate our results to those in the literature. In practice, data may not be normally distributed. Commonly used procedures for mean comparison, as implemented in standard software, are based on the normal distribution assumption. In section 5, we will characterize the effect of nonnormality on several normal theory based statistics. Under certain conditions, normal theory based procedures remain asymptotically valid even when data are nonnormally distributed. Both the MV and the LV approaches to mean comparison need regularity conditions. In practice, both can be misused when applied blindly, without checking these conditions. A related issue is how to minimize errors with a given procedure.
In the concluding section, we will summarize the problems with each method and discuss how to proceed when conditions are not ideal.

2. NCP and Power Under Idealized Conditions

Let $y_1, y_2, \ldots, y_n$ be a random sample from a $p$-variate normal distribution $N(\mu, \Sigma)$. Assume that $y_i$ is generated by
$$y_i = \nu + \Lambda\xi_i + \varepsilon_i, \quad (1a)$$
where $\xi_i$ and $\varepsilon_i$ are independent with $E(\xi_i) = \tau$, $\mathrm{Cov}(\xi_i) = \Phi$, and $\mathrm{Cov}(\varepsilon_i) = \Psi$ a diagonal matrix. So the mean vector and covariance matrix of $y_i$ are
$$\mu = \nu + \Lambda\tau \quad \text{and} \quad \Sigma = \Lambda\Phi\Lambda' + \Psi. \quad (1b)$$
Using this structure for motivation, below we also study more general structures. An interest in the MV approach is to test $H_{01}: \mu = 0$. It is well known that the Hotelling $T^2$ is designed for such a purpose. It is given by
$$T^2 = n\bar{y}'S^{-1}\bar{y}, \quad (2)$$
where $\bar{y}$ is the sample mean and $S$ is the sample covariance matrix. When using (2) to test $\mu = 0$, one usually transforms $T^2$ to an $F$ statistic (see Anderson, 1984, p. 163). On the other hand, the test statistics in the LV approach are justified by asymptotics. For the purpose of comparing the LV and MV approaches, we will also describe the distribution of $T^2$ by asymptotics (see also Kano, 2001). It is obvious that
$$T^2 \xrightarrow{L} \chi^2_p(\delta_1), \quad (3a)$$
where
$$\delta_1 = n\mu'\Sigma^{-1}\mu. \quad (3b)$$
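As a numeric illustration of (2) and (3b), the statistic and its NCP can be computed directly. This is a minimal sketch with hypothetical values of $\mu$, $\Sigma$, and $n$ (not values used in the paper); numpy is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 6, 200
mu = np.full(p, 0.3)                 # hypothetical population mean
Sigma = 0.5 * np.eye(p) + 0.5        # hypothetical covariance: unit variances, 0.5 correlations
y = rng.multivariate_normal(mu, Sigma, size=n)

ybar = y.mean(axis=0)
S = np.cov(y, rowvar=False)                     # unbiased sample covariance
T2 = n * ybar @ np.linalg.solve(S, ybar)        # Hotelling T^2, equation (2)
delta1 = n * mu @ np.linalg.solve(Sigma, mu)    # NCP delta_1, equation (3b)
```

With the NCP in hand, the power of $T^2$ follows from the overlap of $\chi^2_p(\delta_1)$ with the central $\chi^2_p$ at the chosen critical value.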
In the setup of model (1), testing $\mu = 0$ can also be accomplished by testing some more basic parameters in the mean structure. For example, when $\nu = 0$ in (1b), $\mu = 0$ is equivalent to $\tau = 0$. More generally, suppose the elements of $\mu$ and $\Sigma$ are further parameterized as $m(\theta)$ and $C(\theta)$. When correctly specified, $\mu = m(\theta_0)$, $\Sigma = C(\theta_0)$, and we call $\theta_0$ the population value of $\theta$. Let $\theta = (\theta_1', \theta_2')'$, where $\theta_1$ is a $q_1 \times 1$ vector containing all the parameters that appear only in $m(\theta)$, and $\theta_2$ contains the $q_2$ remaining parameters. The normal theory based maximum likelihood estimator (MLE) $\hat\theta$ of $\theta_0$ can be obtained by minimizing (see Browne & Arminger, 1995)
$$F_{ML}(\theta, \bar{y}, S) = [\bar{y} - m(\theta)]'C^{-1}(\theta)[\bar{y} - m(\theta)] + \mathrm{tr}[SC^{-1}(\theta)] - \log|SC^{-1}(\theta)| - p. \quad (4)$$
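The discrepancy function (4) is straightforward to evaluate. The following sketch (hypothetical data; numpy assumed) also checks the defining property that it vanishes at the saturated model $m = \bar{y}$, $C = S$:

```python
import numpy as np

def f_ml(m, C, ybar, S):
    """Normal-theory ML discrepancy F_ML of equation (4)."""
    p = len(ybar)
    Cinv = np.linalg.inv(C)
    r = ybar - m
    SCinv = S @ Cinv
    sign, logdet = np.linalg.slogdet(SCinv)    # stable log-determinant of S C^{-1}
    return float(r @ Cinv @ r + np.trace(SCinv) - logdet - p)

rng = np.random.default_rng(1)
y = rng.normal(size=(100, 4))
ybar, S = y.mean(axis=0), np.cov(y, rowvar=False)
f0 = f_ml(ybar, S, ybar, S)    # saturated model: discrepancy is zero
```

Minimizing `f_ml` over a parameterized $(m(\theta), C(\theta))$ yields the MLE $\hat\theta$; the optimizer itself is not shown here.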
We need some notation for further technical development. For a $p \times p$ matrix $A$, let $\mathrm{vec}(A)$ be the $p^2$-dimensional vector formed by stacking the columns of $A$, while $\mathrm{vech}(A)$ is the $p^* = p(p+1)/2$-dimensional vector formed by stacking the nonduplicated elements of $A$, leaving out the elements above the diagonal. There exists a unique $p^2 \times p^*$ matrix $D_p$ such that $\mathrm{vec}(A) = D_p\,\mathrm{vech}(A)$ (Magnus & Neudecker, 1999). Let $c = \mathrm{vech}(C)$ and
$$W(\theta) = 2^{-1}D_p'[C^{-1}(\theta) \otimes C^{-1}(\theta)]D_p.$$
A dot on top of a function denotes a derivative, for example, $\dot{m}(\theta) = dm(\theta)/d\theta$. Partial derivatives will be denoted by a subscript, for example $\dot{m}_1(\theta) = \partial m(\theta)/\partial\theta_1$ and $\dot{c}_\lambda(\theta) = \partial c(\theta)/\partial\lambda$. We often omit the argument of a function when it is evaluated at the population value $\theta_0$. We will denote by $0_{p \times q}$ the matrix of dimension $p \times q$ whose elements are all 0, by $I_p$ the $p \times p$ identity matrix, and by $1_p$ the vector of length $p$ whose elements are all 1.0.

Suppose $\mu = \dot{m}_1\theta_{01}$. Then testing $H_{01}: \mu = 0$ is equivalent to testing $H_{02}: \theta_{01} = 0$. We will mainly consider the Wald statistic in this section because the $T^2$ statistic in (2) and the commonly used $z$ statistic for evaluating the significance of a parameter estimate are also Wald statistics. Under idealized conditions, the Wald statistic is asymptotically equivalent to the LR and LM statistics (see Engle, 1984). In the context of SEM, the Wald statistic for testing $\theta_{01} = 0$ is given by
$$T_W = \hat\theta_1'[\widehat{\mathrm{Acov}}(\hat\theta_1)]^{-1}\hat\theta_1, \quad (5)$$
where $\widehat{\mathrm{Acov}}(\hat\theta_1)$ is a consistent estimator of the asymptotic covariance matrix of $\hat\theta_1$. Under standard regularity conditions (see Satorra, 1989; Engle, 1984),
$$T_W \xrightarrow{L} \chi^2_{q_1}(\delta_2), \quad (6a)$$
where
$$\delta_2 = \theta_{01}'[\mathrm{Acov}(\hat\theta_1)]^{-1}\theta_{01}. \quad (6b)$$
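The duplication matrix $D_p$ and the weight matrix $W$ defined above can be constructed explicitly. A minimal sketch (numpy assumed; the matrix `A` is an arbitrary symmetric positive definite example, not from the paper):

```python
import numpy as np

def duplication_matrix(p):
    """D_p with vec(A) = D_p vech(A) for symmetric A
    (vech stacks the lower triangle column by column)."""
    ps = p * (p + 1) // 2
    D = np.zeros((p * p, ps))
    col = 0
    for j in range(p):
        for i in range(j, p):
            D[j * p + i, col] = 1.0   # position of a_ij in vec(A)
            D[i * p + j, col] = 1.0   # its symmetric twin a_ji
            col += 1
    return D

p = 3
A = np.array([[2.0, 1.0, 0.5],
              [1.0, 3.0, 1.0],
              [0.5, 1.0, 4.0]])
vech_A = np.concatenate([A[j:, j] for j in range(p)])
assert np.allclose(duplication_matrix(p) @ vech_A, A.flatten(order="F"))

# the weight matrix W = 2^{-1} D_p' (C^{-1} (x) C^{-1}) D_p, with C = A here
Cinv = np.linalg.inv(A)
D = duplication_matrix(p)
W = 0.5 * D.T @ np.kron(Cinv, Cinv) @ D
```

$W$ is symmetric and positive definite whenever $C$ is, which is what makes the projection matrices built from $W^{1/2}$ below well defined.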
We have the following theorem regarding the two NCPs $\delta_1$ and $\delta_2$.

Theorem 1. Suppose both $m(\theta)$ and $C(\theta)$ are correctly specified with $\mu = \dot{m}_1\theta_{01}$. Then $\delta_1 = \delta_2$ when $\dot{m}_1'\Sigma^{-1}\dot{m}_2 = 0$, and $\delta_1 > \delta_2$ otherwise.

Proof: Taking the second derivative of $F_{ML}(\theta, \bar{y}, S)$ with respect to $\theta$, we obtain the Hessian matrix whose expectation at $\theta_0$ is
$$-(\dot{m}'\Sigma^{-1}\dot{m} + \dot{c}'W\dot{c}).$$
Note that $\dot{m} = (\dot{m}_1, \dot{m}_2)$ and $\dot{c} = (0_{p^* \times q_1}, \dot{c}_2)$, which further give the information matrix for minimizing (4) as
$$I = \begin{pmatrix} \dot{m}_1'\Sigma^{-1}\dot{m}_1 & \dot{m}_1'\Sigma^{-1}\dot{m}_2 \\ \dot{m}_2'\Sigma^{-1}\dot{m}_1 & \dot{m}_2'\Sigma^{-1}\dot{m}_2 + \dot{c}_2'W\dot{c}_2 \end{pmatrix}. \quad (7)$$
Let
$$A = I^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
where $A_{11}$ is of dimension $q_1 \times q_1$ corresponding to $\theta_1$. Then
$$\mathrm{Acov}(\hat\theta_1) = A_{11}/n = n^{-1}\left[\dot{m}_1'\Sigma^{-1}\dot{m}_1 - \dot{m}_1'\Sigma^{-1}\dot{m}_2(\dot{m}_2'\Sigma^{-1}\dot{m}_2 + \dot{c}_2'W\dot{c}_2)^{-1}\dot{m}_2'\Sigma^{-1}\dot{m}_1\right]^{-1}.$$
When $\dot{m}_1'\Sigma^{-1}\dot{m}_2 = 0$, $\mathrm{Acov}(\hat\theta_1) = n^{-1}(\dot{m}_1'\Sigma^{-1}\dot{m}_1)^{-1}$ and
$$\delta_2 = n\theta_{01}'\dot{m}_1'\Sigma^{-1}\dot{m}_1\theta_{01} = n\mu'\Sigma^{-1}\mu = \delta_1.$$
When $\dot{m}_1'\Sigma^{-1}\dot{m}_2 \neq 0$, $\mathrm{Acov}(\hat\theta_1) - n^{-1}(\dot{m}_1'\Sigma^{-1}\dot{m}_1)^{-1} > 0$, i.e., a positive definite matrix, thus $\delta_1 > \delta_2$.

Of course, the hypothesis $H_{02}$ can also be tested by LR statistics. Under the hypothesis $\theta_{01} = 0$, one can fix $\theta_1 = 0$ in (4) and just estimate $\theta_2$. Denote the estimator as $\check\theta_2$. Because $(m(\theta_2), C(\theta_2))$ is nested within $(m(\theta), C(\theta))$, one LR statistic is
$$T_{LR}^{(1)} = n\left[F_{ML}(\check\theta_2, \bar{y}, S) - F_{ML}(\hat\theta, \bar{y}, S)\right].$$
In using the above LR statistic for testing $\theta_{01} = 0$, one needs to make sure that $m(\theta)$ and $C(\theta)$ are correctly specified. Otherwise, the power or type I error might not be properly controlled. Of course, the saturated covariance model is automatically correct, and $(m(\theta_2), C(\theta_2))$ is nested within the saturated model. The LR statistic based on this nesting is
$$T_{LR}^{(2)} = nF_{ML}(\check\theta_2, \bar{y}, S).$$
When $(m(\theta), C(\theta))$ is correctly specified and $(m(\theta_2), C(\theta_2))$ is not correctly specified, $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$ have the same NCP, but it is $T_{LR}^{(1)}$ that is asymptotically equivalent to the Wald statistic $T_W$ in (5). Actually,
$$T_{LR}^{(1)} \xrightarrow{L} \chi^2_{q_1}(\delta_2) \quad \text{and} \quad T_{LR}^{(2)} \xrightarrow{L} \chi^2_{p^*+p-q_2}(\delta_2).$$
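The partitioned-inverse step behind $\mathrm{Acov}(\hat\theta_1)$ in the proof of Theorem 1, namely $(I^{-1})_{11} = (I_{11} - I_{12}I_{22}^{-1}I_{21})^{-1}$, can be checked numerically on a generic positive definite matrix (a sketch; the matrix is arbitrary, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
q1, q2 = 2, 5
B = rng.normal(size=(q1 + q2, q1 + q2))
I = B @ B.T + np.eye(q1 + q2)       # a generic positive definite "information" matrix

I11, I12 = I[:q1, :q1], I[:q1, q1:]
I21, I22 = I[q1:, :q1], I[q1:, q1:]

# block-inverse (Schur complement) identity:
# (I^{-1})_{11} = (I11 - I12 I22^{-1} I21)^{-1}
A11 = np.linalg.inv(I)[:q1, :q1]
schur = np.linalg.inv(I11 - I12 @ np.linalg.inv(I22) @ I21)
```

The subtracted term $I_{12}I_{22}^{-1}I_{21}$ is what inflates $\mathrm{Acov}(\hat\theta_1)$, and hence shrinks $\delta_2$, whenever $\dot{m}_1'\Sigma^{-1}\dot{m}_2 \neq 0$.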
Under idealized conditions, there also exists an LM statistic that is asymptotically equivalent to $T_W$ and $T_{LR}^{(1)}$. However, the LM statistic is seldom used in mean comparisons, and thus we will not explicitly deal with it here.

In the remainder of this section, we will study several important cases. We first consider the simple one-group one-factor model so that enough details can be provided. We next consider linear and nonlinear growth curve models that have two factors. A two-group one-factor model will be considered at the end. By specializing to different cases, we will be able to see the functional relationship of the NCP with various model parameters.

2.1 One-group one-factor model

Consider the situation when $\xi_i$ contains a single factor. Then $\Lambda = \lambda$ is a vector, and we fix the variance of the single factor $\xi_i$ at 1.0 to identify its scale. For simplicity, we further assume that $\nu = 0$, which also makes the mean structure identified. Thus, the mean and covariance structures of (1) are
$$m(\theta) = \lambda\tau \quad \text{and} \quad C(\theta) = \lambda\lambda' + \Psi, \quad (8)$$
where $\theta_1 = \tau$ and $\theta_2 = (\lambda', \psi')'$ with $\psi = (\psi_{11}, \psi_{22}, \ldots, \psi_{pp})'$. It is easy to see that
$$\dot{m}_1 = \lambda, \quad \dot{m}_2 = (\tau I_p, 0_{p\times p}), \quad \dot{c}_2 = (\dot{c}_\lambda, \dot{c}_\psi), \quad \text{and} \quad \mu = \dot{m}_1\theta_{01}. \quad (9)$$
Because $\dot{m}_1'\Sigma^{-1}\dot{m}_2 \neq 0$, the NCP $\delta_2$ in (6b) for testing $\tau = 0$ does not equal the NCP $\delta_1$ in (3b). Using (7) and (9), we obtain the information matrix corresponding to model (8) as
$$I = \begin{pmatrix} \lambda'\Sigma^{-1}\lambda & \tau\lambda'\Sigma^{-1} & 0_{1\times p} \\ \tau\Sigma^{-1}\lambda & \tau^2\Sigma^{-1} + \dot{c}_\lambda'W\dot{c}_\lambda & \dot{c}_\lambda'W\dot{c}_\psi \\ 0_{p\times 1} & \dot{c}_\psi'W\dot{c}_\lambda & \dot{c}_\psi'W\dot{c}_\psi \end{pmatrix}. \quad (10)$$
The standard error of $\hat\tau$ is given by $\sqrt{a_{11}/n}$. Let
$$Q = I_{p^*} - W^{1/2}\dot{c}_\psi(\dot{c}_\psi'W\dot{c}_\psi)^{-1}\dot{c}_\psi'W^{1/2}. \quad (11)$$
By repeatedly using the formula for the inversion of partitioned matrices (e.g., Magnus & Neudecker, 1999, p. 11), we have
$$a_{11} = \left[\lambda'\Sigma^{-1}\lambda - \lambda'\Sigma^{-1/2}\left(I_p + \tau^{-2}\Sigma^{1/2}\dot{c}_\lambda'W^{1/2}QW^{1/2}\dot{c}_\lambda\Sigma^{1/2}\right)^{-1}\Sigma^{-1/2}\lambda\right]^{-1}. \quad (12)$$
The Wald test for $\tau = 0$ refers $t_W = \sqrt{n}\,\hat\tau/\sqrt{\hat{a}_{11}}$ to the standard normal distribution $N(0, 1)$, or uses
$$T_W = t_W^2 = n\hat\tau^2\hat{a}_{11}^{-1} \sim \chi^2_1. \quad (13)$$
When $\tau \neq 0$, it follows from (12) and (13) that
$$T_W \xrightarrow{L} \chi^2_1(\delta_2), \quad (14a)$$
where
$$\delta_2 = n\tau^2 a_{11}^{-1} = n\mu'\left[\Sigma^{-1} - \Sigma^{-1/2}\left(I_p + \tau^{-2}\Sigma^{1/2}\dot{c}_\lambda'W^{1/2}QW^{1/2}\dot{c}_\lambda\Sigma^{1/2}\right)^{-1}\Sigma^{-1/2}\right]\mu. \quad (14b)$$
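The ordering $\delta_2 < \delta_1$ can be verified numerically by building the information matrix (10) for the Table 1 loadings and a hypothetical $\tau$ and $n$ (a sketch, numpy assumed; the entrywise trace identity $(\dot{c}_A'W\dot{c}_B)_{ij} = \tfrac{1}{2}\mathrm{tr}(\Sigma^{-1}\dot{A}_i\Sigma^{-1}\dot{B}_j)$, which holds for symmetric derivative matrices, is used to avoid forming $D_p$):

```python
import numpy as np

lam = np.array([0.70, 0.70, 0.75, 0.75, 0.80, 0.80])
p = len(lam)
Psi = np.diag(1.0 - lam**2)               # so Sigma is a correlation matrix
Sigma = np.outer(lam, lam) + Psi
Sinv = np.linalg.inv(Sigma)
n, tau = 50, 0.6                          # hypothetical sample size and latent mean

# symmetric derivative matrices dC/dlambda_j and dC/dpsi_jj of model (8)
dC_lam = [np.outer(e, lam) + np.outer(lam, e) for e in np.eye(p)]
dC_psi = [np.outer(e, e) for e in np.eye(p)]

def cwc(As, Bs):
    # (c_A' W c_B)_{ij} = 0.5 tr(Sigma^{-1} A_i Sigma^{-1} B_j)
    return np.array([[0.5 * np.trace(Sinv @ A @ Sinv @ B) for B in Bs] for A in As])

# information matrix (10) for theta = (tau, lambda', psi')'
I = np.zeros((1 + 2 * p, 1 + 2 * p))
I[0, 0] = lam @ Sinv @ lam
I[0, 1:p + 1] = I[1:p + 1, 0] = tau * (Sinv @ lam)
I[1:p + 1, 1:p + 1] = tau**2 * Sinv + cwc(dC_lam, dC_lam)
I[1:p + 1, p + 1:] = cwc(dC_lam, dC_psi)
I[p + 1:, 1:p + 1] = I[1:p + 1, p + 1:].T
I[p + 1:, p + 1:] = cwc(dC_psi, dC_psi)

a11 = np.linalg.inv(I)[0, 0]
delta2 = n * tau**2 / a11                 # NCP of T_W, equations (6b)/(14b)
delta1 = n * tau**2 * (lam @ Sinv @ lam)  # NCP of T^2, equation (3b), since mu = tau*lam
```

With these loadings, `delta2 < delta1`, in line with Theorem 1, since $\dot{m}_1'\Sigma^{-1}\dot{m}_2 = \tau\lambda'\Sigma^{-1} \neq 0$ for $\tau \neq 0$.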
When $\lambda$ and $\Psi$ are known, Kano (2001) showed that the LR statistic for testing $\tau = 0$ in model (8) asymptotically follows $\chi^2_1(\delta_1)$, where $\delta_1$ is given by (3b). Obviously, when $\lambda$ and $\psi$ are known, $\dot{m}_1'\Sigma^{-1}\dot{m}_2 = 0$. When $\lambda$ and $\Psi$ are unknown, the NCP for inferring the means of MVs is actually greater than the NCP corresponding to the LR statistic for inferring the means of LVs. Actually, holding $\lambda$ and $\Psi$ constant, $a_{11}^{-1}$ is a decreasing function of $\tau$ while $\lambda'\Sigma^{-1}\lambda$ does not depend on $\tau$. As $\tau$ increases, the difference between $\delta_1$ and $\delta_2$ gets larger. Because $T^2$ is referred to $\chi^2_p(\delta_1)$, which has $p-1$ more degrees of freedom than $\chi^2_1(\delta_2)$, $\delta_1 > \delta_2$ does not necessarily imply that inference for $\mu = 0$ based on (3) is more powerful than that based on (14). We are unable to give an analytical characterization of the magnitude of the power difference between $T_W$ and $T^2$; a numerical comparison is used instead. Let $\lambda = (0.70, 0.70, 0.75, 0.75, 0.80, 0.80)'$ and the $\psi_{ii}$ be chosen so that $\Sigma$ is a correlation matrix. At level $\alpha = 0.05$, Table 1 provides the powers ($\beta$) of $T_W$ and $T^2$ when referring them to $\chi^2_1$ and $\chi^2_6$ respectively, though their actual distributions are respectively $\chi^2_1(\delta_2)$ and $\chi^2_6(\delta_1)$. Note that the sample size requirement for a statistic to be approximately described by its asymptotic distribution is directly related to the degrees of freedom (see Yuan & Bentler, 1998). With 1 degree of freedom in $T_W$ and $T_{LR}^{(1)}$, $n = 20$ can be regarded as a small sample size and $n = 50$ as a medium sample size. For extra information, Table 1 also includes the powers of referring $T_{LR}^{(2)} \sim \chi^2_{15}(\delta_2)$ to $\chi^2_{15}$ and referring $T^2$ to the commonly used $F$ distribution through
$$T_F = (n-p)T^2/\{p(n-1)\} \sim F_{p,n-p}. \quad (15)$$
Notice that (15) is not a large sample approximation but the exact finite sample distribution when data are normally distributed. At $n = 20$, $T_W$ is more powerful than $T^2$ and $T_F$ when $\tau \leq 1.0$. $T^2$ is more powerful than $T_W$ when $\tau \geq 1.2$, and $T_F$ is also more powerful than $T_W$ when $\tau \geq 1.6$. $T_{LR}^{(2)}$ is the least powerful statistic due to its smaller NCP and the greater degrees of freedom in its reference distribution. At $n = 50$, $T_W$ is more powerful than $T^2$ and $T_F$ until all of their powers reach essentially 1.0 at $\tau = 1.0$. Notice that $\delta_1$ increases much faster than $\delta_2$ as $\tau$ increases. Because $T^2$ has more degrees of freedom, its power is not greater than that of $T_W$ until $\delta_1$ is much greater than $\delta_2$. Also notice that the powers of $T^2$, $T_W$ and $T_{LR}^{(2)}$ are based on large sample approximations, which may not properly control type I errors at smaller sample sizes. Actually, with normally distributed data, referring $T^2$ to $\chi^2_6$ for inference leads to type I errors greater than the $\alpha = 0.05$ specified by the critical value.

Insert Table 1 about here

The statistic $T^2$ in (3a) for testing $\mu = 0$ is based on a saturated model. One can also formulate the problem of testing $\mu = 0$ based on the estimated $\hat\mu = \hat\tau\hat\lambda$ from the structured model with
$$\sqrt{n}(\hat\mu - \mu) \xrightarrow{L} N(0, V),$$
where $V = \mathrm{Acov}(\sqrt{n}\hat\mu)$ can be obtained from standard asymptotics (see Ferguson, 1996, p. 45). The corresponding test statistic is
$$T_S = n\hat\mu'\hat{V}^{-1}\hat\mu \xrightarrow{L} \chi^2_p(\delta_1),$$
where $\delta_1$ is given by (3b). This indicates that $\hat\mu = \hat\tau\hat\lambda$ does not contain more information about the underlying population parameter $\mu$ than $\bar{y}$. Actually, each individual $\hat\mu_j = \hat\tau\hat\lambda_j$ is more efficient than $\bar{y}_j$, but the different $\hat\mu_j$s also have greater correlations (overlapping information) than those among the corresponding $\bar{y}_j$s.

Note that one can obtain estimates $\tilde\lambda$ and $\tilde\psi$ when modeling $S$ by $C(\theta)$ without the mean structure involved. The question is whether $\hat\lambda$ or $\hat\psi$ become more efficient by jointly modeling the mean and covariance structures. It follows from (10) that the asymptotic covariance matrix of $\sqrt{n}\hat\lambda$ is given by
$$A_\lambda = \left\{[\dot{c}_\lambda'W\dot{c}_\lambda - \dot{c}_\lambda'W\dot{c}_\psi(\dot{c}_\psi'W\dot{c}_\psi)^{-1}\dot{c}_\psi'W\dot{c}_\lambda] + \tau^2[\Sigma^{-1} - \Sigma^{-1}\lambda(\lambda'\Sigma^{-1}\lambda)^{-1}\lambda'\Sigma^{-1}]\right\}^{-1}$$
and the asymptotic covariance matrix of $\sqrt{n}\hat\psi$ is given by
$$A_\psi = \left\{\dot{c}_\psi'W\dot{c}_\psi - \dot{c}_\psi'W\dot{c}_\lambda\left[\dot{c}_\lambda'W\dot{c}_\lambda + \tau^2(\Sigma^{-1} - \Sigma^{-1}\lambda(\lambda'\Sigma^{-1}\lambda)^{-1}\lambda'\Sigma^{-1})\right]^{-1}\dot{c}_\lambda'W\dot{c}_\psi\right\}^{-1}.$$
The asymptotic covariance matrices of $\sqrt{n}\tilde\lambda$ and $\sqrt{n}\tilde\psi$ are respectively
$$B_\lambda = \left[\dot{c}_\lambda'W\dot{c}_\lambda - \dot{c}_\lambda'W\dot{c}_\psi(\dot{c}_\psi'W\dot{c}_\psi)^{-1}\dot{c}_\psi'W\dot{c}_\lambda\right]^{-1} \quad \text{and} \quad B_\psi = \left[\dot{c}_\psi'W\dot{c}_\psi - \dot{c}_\psi'W\dot{c}_\lambda(\dot{c}_\lambda'W\dot{c}_\lambda)^{-1}\dot{c}_\lambda'W\dot{c}_\psi\right]^{-1}.$$
Because $\Sigma^{-1} - \Sigma^{-1}\lambda(\lambda'\Sigma^{-1}\lambda)^{-1}\lambda'\Sigma^{-1} \geq 0$, we have $A_\lambda \leq B_\lambda$ and $A_\psi \leq B_\psi$. The greater the $\tau$, the more efficient the estimators $\hat\lambda$ and $\hat\psi$ are. So both $\hat\lambda$ and $\hat\psi$ are more efficient when jointly modeling the means and covariances. This is consistent with results given by Yung and Bentler (1999).

A final note we would like to make is that $\delta_1 = n\mu'\Sigma^{-1}\mu = n\tau^2(\lambda'\Sigma^{-1}\lambda)$ is proportional to the reliability of the measurements in $y_i$. Actually, $\lambda'\Sigma^{-1}\lambda = \lambda'\Psi^{-1}\lambda/(1 + \lambda'\Psi^{-1}\lambda)$ is the maximum reliability of a weighted average of the $p$ measurements (see Bentler, 1968; Hancock & Mueller, 2001; Li, 1997; Raykov, 2004; Yuan & Bentler, 2002). For a given $\tau$, the more reliable the measurements, the greater $\delta_1$ and $\delta_2$ are.

2.2 Growth curve model

Let $y_i = (y_{i1}, y_{i2}, \ldots, y_{ip})'$ be repeated measures at $p$ time points. Then a latent growth curve model can be expressed by equation (1) (Meredith & Tisak, 1990; Curran, 2000; Duncan, Duncan, et al., 1999), where $\nu = 0$,
$$\Lambda = \begin{pmatrix} 1 & 1 & 1 & \ldots & 1 \\ 0 & 1 & \lambda_3 & \ldots & \lambda_p \end{pmatrix}', \quad (16)$$
$\xi_i = (\xi_{i1}, \xi_{i2})'$ with $\xi_{i1}$ being the latent intercept and $\xi_{i2}$ being the latent slope (matching the two columns of $\Lambda$), $\tau = (\tau_1, \tau_2)'$, and
$$\Phi = \mathrm{Cov}(\xi_i) = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix}$$
is the covariance matrix of the individual differences in intercept and slope. This setup leads to the following mean and covariance structures:
$$m(\theta) = \Lambda\tau, \quad C(\theta) = \Lambda\Phi\Lambda' + \Psi. \quad (17)$$
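For concreteness, the linear growth version of (16)-(17) can be set up as follows (a sketch with hypothetical parameter values, not values from the paper; numpy assumed):

```python
import numpy as np

p = 5                                           # number of time points
growth = np.arange(p, dtype=float)              # 0, 1, 2, ..., p-1: initial-value linear growth rates
Lambda = np.column_stack([np.ones(p), growth])  # loading matrix of (16), columns (1_p, lambda_0)
tau = np.array([1.0, 0.5])                      # hypothetical latent means
Phi = np.array([[0.8, 0.2],
                [0.2, 0.4]])                    # hypothetical Cov(xi_i)
Psi = 0.3 * np.eye(p)                           # hypothetical unique variances

mu = Lambda @ tau                               # model-implied means, equation (17)
Sigma = Lambda @ Phi @ Lambda.T + Psi           # model-implied covariance, equation (17)
```

With these values the implied means grow linearly from 1.0 by 0.5 per time point, which is exactly the predetermined-rate special case discussed next.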
When $\lambda_3$ to $\lambda_p$ in (16) are known constants, model (17) represents a growth curve model with predetermined or hypothesized growth rates. A special case is the initial-value linear growth model with $\lambda_3 = 2$, $\lambda_4 = 3$, \ldots, $\lambda_p = p - 1$. With predetermined growth rates, $\theta_1 = (\tau_1, \tau_2)'$ and $\theta_2 = (\phi', \psi')'$, where $\phi = (\phi_{11}, \phi_{21}, \phi_{22})'$. It is obvious that $\dot{m}_1 = \Lambda$ and $\mu = \dot{m}_1\theta_{01}$. Because $\dot{m}_2 = 0$, $\dot{m}_1'\Sigma^{-1}\dot{m}_2 = 0$, and the NCP $\delta_2$ in the asymptotic distribution of the Wald statistic $T_W$ equals that in the asymptotic distribution of $T^2$. Since the likelihood ratio statistic in the SEM approach has smaller degrees of freedom, it will have greater power than the $T^2$ statistic in testing $\mu = 0$. Actually, the normal theory based information matrix for model (17) with known $\lambda_i$ is
$$I = \begin{pmatrix} 1_p'\Sigma^{-1}1_p & 1_p'\Sigma^{-1}\lambda_0 & 0_{1\times 3} & 0_{1\times p} \\ \lambda_0'\Sigma^{-1}1_p & \lambda_0'\Sigma^{-1}\lambda_0 & 0_{1\times 3} & 0_{1\times p} \\ 0_{3\times 1} & 0_{3\times 1} & \dot{c}_\phi'W\dot{c}_\phi & \dot{c}_\phi'W\dot{c}_\psi \\ 0_{p\times 1} & 0_{p\times 1} & \dot{c}_\psi'W\dot{c}_\phi & \dot{c}_\psi'W\dot{c}_\psi \end{pmatrix}, \quad (18)$$
where $\lambda_0 = (0, 1, \lambda_3, \lambda_4, \ldots, \lambda_p)'$ is the prespecified vector of growth rates. Thus, $A = I^{-1} = \mathrm{diag}(A_{11}, A_{22})$, with $A_{11}$ being the asymptotic covariance matrix of $\sqrt{n}\hat\theta_1$ and $A_{22}$ that of $\sqrt{n}\hat\theta_2$. So $\hat\theta_1$ and $\hat\theta_2$ are asymptotically independent. It also follows from (18) that the MLEs of $\Phi$ and $\Psi$ do not become more efficient when modeling the mean and covariance structures simultaneously. This is because $\bar{y}$ does not contain any information about the covariance parameters $(\phi, \psi)$.

When $\lambda_3$ to $\lambda_p$ in (16) are unknown, (17) represents a growth curve model with an arbitrary growth rate to be determined by $(\lambda_{j+1} - \lambda_j)\tau_2$. In such a case, $\theta_1 = \tau = (\tau_1, \tau_2)'$ and $\theta_2 = (\lambda_3, \ldots, \lambda_p, \phi', \psi')'$. Let $\lambda = (0, 1, \lambda_3, \ldots, \lambda_p)'$ and $E = (e_3, \ldots, e_p)$, where $e_i$ is the $i$th unit vector in Euclidean space. We have
$$\dot{m}_1 = \Lambda \quad \text{and} \quad \dot{m}_2 = (\tau_2 E, 0_{p\times(3+p)}).$$
Consequently, $\mu = \dot{m}_1\theta_{01}$ and $\dot{m}_1'\Sigma^{-1}\dot{m}_2 \neq 0$. Thus, the NCP $\delta_2$ in (6b) for testing $\theta_{01} = \tau = 0$ is smaller than the $\delta_1$ in (3b) for testing $\mu = 0$.
The normal theory based information matrix for model (17) with unknown $\lambda_j$ is
$$I = \begin{pmatrix} 1_p'\Sigma^{-1}1_p & 1_p'\Sigma^{-1}\lambda & \tau_2 1_p'\Sigma^{-1}E & 0_{1\times 3} & 0_{1\times p} \\ \lambda'\Sigma^{-1}1_p & \lambda'\Sigma^{-1}\lambda & \tau_2\lambda'\Sigma^{-1}E & 0_{1\times 3} & 0_{1\times p} \\ \tau_2 E'\Sigma^{-1}1_p & \tau_2 E'\Sigma^{-1}\lambda & \tau_2^2 E'\Sigma^{-1}E + \dot{c}_\lambda'W\dot{c}_\lambda & \dot{c}_\lambda'W\dot{c}_\phi & \dot{c}_\lambda'W\dot{c}_\psi \\ 0_{3\times 1} & 0_{3\times 1} & \dot{c}_\phi'W\dot{c}_\lambda & \dot{c}_\phi'W\dot{c}_\phi & \dot{c}_\phi'W\dot{c}_\psi \\ 0_{p\times 1} & 0_{p\times 1} & \dot{c}_\psi'W\dot{c}_\lambda & \dot{c}_\psi'W\dot{c}_\phi & \dot{c}_\psi'W\dot{c}_\psi \end{pmatrix}. \quad (19)$$
It follows from (19) that the asymptotic covariance matrix of $\sqrt{n}\hat\tau$ is
$$A_{11} = \left\{\begin{pmatrix} 1_p' \\ \lambda' \end{pmatrix}\left[\Sigma^{-1} - (\Sigma^{-1}E)(E'\Sigma^{-1}E)^{-1/2}M(E'\Sigma^{-1}E)^{-1/2}(E'\Sigma^{-1})\right](1_p, \lambda)\right\}^{-1},$$
where
$$M = \left\{I_{p-2} + \tau_2^{-2}(E'\Sigma^{-1}E)^{-1/2}\dot{c}_\lambda'W^{1/2}HW^{1/2}\dot{c}_\lambda(E'\Sigma^{-1}E)^{-1/2}\right\}^{-1}$$
with
$$H = I_{p^*} - W^{1/2}(\dot{c}_\phi, \dot{c}_\psi)\left[(\dot{c}_\phi, \dot{c}_\psi)'W(\dot{c}_\phi, \dot{c}_\psi)\right]^{-1}(\dot{c}_\phi, \dot{c}_\psi)'W^{1/2}.$$
Thus,
$$\delta_2 = n\mu'\Sigma^{-1}\mu - n\mu'\Sigma^{-1}E(E'\Sigma^{-1}E)^{-1/2}M(E'\Sigma^{-1}E)^{-1/2}E'\Sigma^{-1}\mu.$$
Notice that $M$ is a function of $\tau_2$ but not of $\tau_1$. When $\tau_2$ increases, $\delta_2$ does not increase as fast as $\delta_1$. So, once again, the LR or the Wald statistic will have greater power than $T^2$ for smaller $\tau_2$, and the opposite is true for larger $\tau_2$. Also notice that, when $\tau_2 = 0$, $\dot{m}_2 = 0$ and consequently $\delta_2 = \delta_1$. Then the likelihood ratio statistic or $T_W$ will always have greater power than $T^2$, regardless of how large $\tau_1$ is.

2.3 Two-group one-factor model

Suppose $x_1, x_2, \ldots, x_{n_1} \sim N(\mu_1, \Sigma_1)$ and $y_1, y_2, \ldots, y_{n_2} \sim N(\mu_2, \Sigma_2)$ are two independent random samples. We further assume that both samples are generated by one-factor models and that $\mu_1$, $\mu_2$, $\Sigma_1$, $\Sigma_2$ can be correctly modeled by
$$m_1(\theta) = \nu, \quad m_2(\theta) = \nu + \lambda\tau, \quad C_1(\theta) = C_2(\theta) = \lambda\lambda' + \Psi, \quad (20)$$
where $\theta = (\tau, \nu', \lambda', \psi')'$. Although this setup does not fall into the category of Theorem 1, the results obtained below for $T^2$ and $T_W$ are closely related to the one-group case considered in subsection 2.1.
Using the MV approach, one can test $H_{03}: \mu_1 = \mu_2$ by
$$T^2 = \frac{n_1 n_2}{N}(\bar{y} - \bar{x})'S^{-1}(\bar{y} - \bar{x}), \quad (21)$$
where $S$ is the pooled sample covariance matrix and $N = n_1 + n_2$. When $\mu_1 \neq \mu_2$,
$$T^2 \xrightarrow{L} \chi^2_p(\delta_3) \quad (22a)$$
as $n_1$ and $n_2$ increase, where
$$\delta_3 = Nr_1r_2\tau^2\lambda'\Sigma^{-1}\lambda = Nr_1r_2(\mu_2 - \mu_1)'\Sigma^{-1}(\mu_2 - \mu_1), \quad (22b)$$
with $r_1 = n_1/N$ and $r_2 = n_2/N$.

The interest in the LV approach is to test $H_{04}: \tau = 0$, which is equivalent to testing $\mu_1 = \mu_2$ in the setup of (20). The MLE $\hat\theta$ of $\theta$ can be obtained by minimizing
$$F_{ML}(\theta, \bar{x}, S_1, \bar{y}, S_2) = r_1 F_{ML}(\theta, \bar{x}, S_1) + r_2 F_{ML}(\theta, \bar{y}, S_2). \quad (23)$$
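The two-group statistic (21) and the NCP $\delta_3$ of (22b) can be sketched as follows (hypothetical populations with $\Sigma_1 = \Sigma_2 = I_p$ and illustrative group sizes, not values from the paper; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n1, n2 = 4, 60, 80
mu1, mu2 = np.zeros(p), np.full(p, 0.5)      # hypothetical population means
x = rng.multivariate_normal(mu1, np.eye(p), size=n1)
y = rng.multivariate_normal(mu2, np.eye(p), size=n2)

N = n1 + n2
r1, r2 = n1 / N, n2 / N
xbar, ybar = x.mean(axis=0), y.mean(axis=0)
# pooled sample covariance matrix
S = ((n1 - 1) * np.cov(x, rowvar=False) + (n2 - 1) * np.cov(y, rowvar=False)) / (N - 2)

d = ybar - xbar
T2 = (n1 * n2 / N) * d @ np.linalg.solve(S, d)    # equation (21)
delta3 = N * r1 * r2 * (mu2 - mu1) @ (mu2 - mu1)  # equation (22b) with Sigma = I_p
```

Note the $Nr_1r_2 = n_1n_2/N$ factor, which plays the role of $n$ in the one-group NCP (3b).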
The information matrix associated with minimizing (23) is
$$I = \begin{pmatrix} r_2\lambda'\Sigma^{-1}\lambda & r_2\lambda'\Sigma^{-1} & r_2\tau\lambda'\Sigma^{-1} & 0_{1\times p} \\ r_2\Sigma^{-1}\lambda & \Sigma^{-1} & r_2\tau\Sigma^{-1} & 0_{p\times p} \\ r_2\tau\Sigma^{-1}\lambda & r_2\tau\Sigma^{-1} & r_2\tau^2\Sigma^{-1} + \dot{c}_\lambda'W\dot{c}_\lambda & \dot{c}_\lambda'W\dot{c}_\psi \\ 0_{p\times 1} & 0_{p\times p} & \dot{c}_\psi'W\dot{c}_\lambda & \dot{c}_\psi'W\dot{c}_\psi \end{pmatrix}. \quad (24)$$
When both $n_1$ and $n_2$ increase, standard asymptotics imply that
$$\sqrt{N}(\hat\theta - \theta_0) \xrightarrow{L} N(0, I^{-1}). \quad (25)$$
Denote $A = (a_{ij}) = I^{-1}$. Then
$$a_{11} = (r_1r_2)^{-1}\left\{\lambda'\Sigma^{-1}\lambda - \lambda'\Sigma^{-1/2}\left[I_p + (r_1r_2\tau^2)^{-1}\Sigma^{1/2}\dot{c}_\lambda'W^{1/2}QW^{1/2}\dot{c}_\lambda\Sigma^{1/2}\right]^{-1}\Sigma^{-1/2}\lambda\right\}^{-1}, \quad (26)$$
where $Q$ is given in (11). The Wald statistic for testing $\tau = 0$ is
$$T_W = N\hat\tau^2\hat{a}_{11}^{-1}. \quad (27)$$
When $\tau \neq 0$,
$$T_W \xrightarrow{L} \chi^2_1(\delta_4), \quad (28a)$$
where
$$\delta_4 = N\tau^2 a_{11}^{-1} = Nr_1r_2(\mu_2 - \mu_1)'P(\mu_2 - \mu_1) \quad (28b)$$
with
$$P = \Sigma^{-1} - \Sigma^{-1/2}\left[I_p + (r_1r_2\tau^2)^{-1}\Sigma^{1/2}\dot{c}_\lambda'W^{1/2}QW^{1/2}\dot{c}_\lambda\Sigma^{1/2}\right]^{-1}\Sigma^{-1/2}.$$
Comparing (28b) with (22b), $\delta_4 \leq \delta_3$, and the inequality is strict when $\tau \neq 0$. Notice that, when holding $\lambda$ and $\psi$ constant,
$$\lambda'\Sigma^{-1/2}\left[I_p + (r_1r_2\tau^2)^{-1}\Sigma^{1/2}\dot{c}_\lambda'W^{1/2}QW^{1/2}\dot{c}_\lambda\Sigma^{1/2}\right]^{-1}\Sigma^{-1/2}\lambda$$
is an increasing function of $r_1r_2\tau^2$. Thus, $\delta_3 - \delta_4$ increases as $r_1r_2\tau^2$ increases. Comparing (22b) with (3b) and (28b) with (14b), we find that the difference between the LV and MV approaches to mean comparison in the two-group case is almost the same as in the one-group case. Actually, changing $r_1r_2\tau^2$ to $\tau^2$ and $N$ to $n$ turns the result in the two-group case into that in the one-group case. Based on the numerical results of the one-group case, the power of $T_W$ based on (28) will be greater than that based on (22) for smaller $\tau$, while the opposite is true for larger $\tau$.

When $\lambda$ and $\psi$ are known, Kano (2001) obtained the same NCP for both the LV and MV approaches to mean comparison as that given in (22). Hancock's (2001) formula for the NCP of the likelihood ratio statistic is just (22b) in the setup of model (20). This equals (28b) only when $\lambda$ is known or $m(\theta)$ satisfies $\dot{m}_1'\Sigma^{-1}\dot{m}_2 = 0$. Under a sequence of local alternatives (see Stroud, 1972), it can be shown that the likelihood ratio statistic for testing $\tau = 0$ in model (20) also asymptotically follows the chi-square distribution specified in (28).

3. Empirical Behaviors of the Statistics with Normally Distributed Data and Fixed Alternative Hypothesis

The previous section showed that the $T^2$ statistic in the MV approach and the Wald statistic in the LV approach to mean comparison do not have the same NCP unless the mean structure satisfies certain conditions, as in the linear growth curve model. Under idealized conditions, the LR statistic is asymptotically equivalent to the Wald statistic, so the LR statistic $T_{LR}^{(1)}$ does not have the same NCP as that of the $T^2$ statistic. In this section, we will study the behavior of the LR statistics, the Wald statistic, and the $T^2$ statistic when conditions are not ideal. Specifically, with normally distributed data we will consider a fixed alternative hypothesis rather than a sequence of local (or contiguous) alternative hypotheses. The sample size will be small or medium instead of huge. A Monte Carlo study comparing the LM, the Wald, and the LR statistics in the context of covariance structure analysis only was conducted by Chou and Bentler (1990). Our analysis and results below will provide more insight into the three statistics in mean and covariance structure models.

3.1 Alternative hypothesis and theoretical NCP

Let $(\mu_n, \Sigma_n)$ be a sequence of alternative hypotheses and let $\theta^*$ satisfy
$$\min_\theta F_{ML}(\theta, \mu_n, \Sigma_n) = F_{ML}(\theta^*, \mu_n, \Sigma_n).$$
In order to establish that the LR or the Wald statistic asymptotically follows a noncentral chi-square distribution, one needs to assume
$$\mu_n = m(\theta^*) + \delta/\sqrt{n}, \quad \Sigma_n = C(\theta^*) + \Delta/\sqrt{n}, \quad (29)$$
where $\delta$ is a $p$-dimensional vector and $\Delta$ is a $p \times p$ matrix, neither depending on $n$. Notice that the $m(\theta)$ in (29) is correctly specified for modeling $\mu_n$ when $\delta = 0$, and $C(\theta)$ is correctly specified for modeling $\Sigma_n$ when $\Delta = 0$. The amount of misspecification in (29) decreases as $n$ increases, with $\lim_{n\to\infty}\mu_n = m(\theta^*)$ and $\lim_{n\to\infty}\Sigma_n = C(\theta^*)$. When $\mu_n$ and $\Sigma_n$ represent the population under consideration and $\theta_0$ is the population value of $\theta$ that corresponds to a correct model structure, under (29) $\hat\theta$ is consistent for $\theta_0$; that is, $\theta^* = \theta_0$. Let $\varphi = nF_{ML}(\theta^*, \mu_n, \Sigma_n)$. Under (29), one can further show that (see e.g., Satorra, 1989; Steiger, Shapiro & Browne, 1985)
$$\delta_2 \approx \varphi, \quad (30)$$
and (30) becomes exact when $n = \infty$.

However, when fitting $(m(\theta), C(\theta))$ to $(\bar{y}, S)$, the alternative hypothesis is $\mu_n = E(\bar{y}) = \mu$ and $\Sigma_n = E(S) = \Sigma$, which do not depend on $n$. In such a realistic situation, $\theta^*$ will not equal $\theta_0$; (30) may not hold; the Wald statistic $T_W$ and the LR statistic $T_{LR}^{(1)}$ may
not be asymptotically equivalent either. Actually, condition (29) is just a mathematical convenience; without it, one cannot show, for models as complicated as those in SEM, that the LR statistic asymptotically follows a noncentral chi-square distribution. We have to use numerical procedures to study the difference between $\delta_2$ and $\varphi$, and simulation to illustrate the discrepancy between the statistic $T_{LR}^{(1)}$ and the chi-square distribution with NCP $\varphi$. In the simulation we also study the statistics $T^2$, $T_W$, $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$.

Insert Table 2 about here

Using the same model as for producing the results of Table 1, the NCP $\varphi$ and the powers of referring $T_{LR}^{(1)}$ to $\chi^2_1$ and $T_{LR}^{(2)}$ to $\chi^2_{15}$ are reported in Table 2, where the real distributions are $T_{LR}^{(1)} \sim \chi^2_1(\varphi)$ and $T_{LR}^{(2)} \sim \chi^2_{15}(\varphi)$. For comparison purposes, $\delta_2$ and the powers of $T_W$ and $T_{LR}^{(2)}$ from Table 1 are also copied here. It is easy to see that $\varphi > \delta_2$ for all the $\tau$s in Table 2. Consequently, $T_W$ and $T_{LR}^{(1)}$ are no longer equivalent. Actually, the corresponding power of $T_{LR}^{(1)}$ is greater than that of $T_W$ due to $\varphi > \delta_2$. Based on our numerical exploration with different factor loadings, error variances, and $\tau$s in model (1), we predict that $\varphi$ is always greater than $\delta_2$ when $\tau \neq 0$, but we are unable to provide an analytical answer. Our exploration also indicates that, holding all the other parameters constant, the magnitude of $\varphi - \delta_2$ is an increasing function of $\tau$. When $\tau$ is tiny, $\varphi \approx \delta_2$, which is also implicitly implied by the ideal condition specified in (29). When $n$ is near $\infty$, the misspecification $\delta/\sqrt{n}$ in (29) will be tiny, and $T_W$ and $T_{LR}^{(1)}$ having the same power implies that $\varphi = \delta_2$.

Comparing Tables 1 and 2, we may observe that $T^2 \sim \chi^2_p(\delta_1)$ is still more powerful than $T_{LR}^{(1)} \sim \chi^2_1(\varphi)$ when $\tau \geq 1.2$. Another observation is that $\varphi$ is still much smaller than $\delta_1$ when $\tau$ is large, although it is greater than $\delta_2$.

3.2 Empirical power

The distributions of the statistics that generated the powers in Tables 1 and 2 are based on asymptotics. With finite sample sizes and fixed alternative hypotheses, it is not clear how well the behavior of the statistics can be described by chi-square distributions. To better understand these statistics, a simulation is performed with normally distributed data. The
model and parameters are the same as those that produced Tables 1 and 2. With $n = 20$, 50, Table 3 contains the empirical powers, $\hat\beta$, of $T_F$, $T^2$, $T_W$, $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$, where the 95th percentiles of the corresponding asymptotic null distributions are used in judging the significance of each statistic over 500 replications.

Insert Table 3 about here

The first line in Table 3 contains the power or type I error when $\tau = 0$ and $n = 20$. When the model is correct, the ideal situation is to have a 5% rejection rate. Both $T^2$ and $T_{LR}^{(2)}$ over-reject the correct model, implying that their behavior cannot be well described by their asymptotic distributions. On the other hand, the type I errors of $T_F$, $T_W$ and $T_{LR}^{(1)}$ are approximately at 5%. Notice that the statistic $T_F$ is referred to an exact distribution; thus, the discrepancy between $\hat\beta(T_F)$ and $\beta(T_F)$ is just due to sampling errors. As $\tau$ increases, the powers of all the statistics increase, with those of $T_W$ and $T_{LR}^{(1)}$ increasing the fastest. According to the asymptotic results in Tables 1 and 2, $T^2$ has greater power than $T_W$ or $T_{LR}^{(1)}$ when $\tau = 1.2$, but $T_W$ has greater empirical power in Table 3. The statistic $T_{LR}^{(1)}$ always has greater power than $T_W$ according to Table 2; however, their empirical powers are about the same in Table 3. The bottom panel of Table 3 contains the powers when $n = 50$. As the sample size increases, the type I errors of $T^2$ and $T_{LR}^{(2)}$ move closer to the nominal level of 0.05. Comparing Table 3 with Tables 1 and 2, we may also observe that, except for the statistic $T_F$, the empirical powers of the other four statistics are always greater than the corresponding asymptotic powers.

3.3 Empirical NCP

An interesting phenomenon is that in Table 2, $\beta(T_{LR}^{(1)}) > \beta(T_W)$, while the difference between $\hat\beta(T_W)$ and $\hat\beta(T_{LR}^{(1)})$ is smaller than that between $\hat\beta(T_F)$ and $\beta(T_F)$. Because $\hat\beta(T_F) - \beta(T_F)$ is just sampling error, $\hat\beta(T_W)$ and $\hat\beta(T_{LR}^{(1)})$ might be regarded as essentially equal. The discrepancy between $\hat\beta$ and $\beta$ corresponding to $T_W$ and $T_{LR}^{(1)}$ deserves further understanding. For this purpose, we first obtain the empirical NCPs corresponding to $T_W$ and $T_{LR}^{(1)}$ and see how much they differ.

Insert Table 4 about here
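The empirical-NCP idea used for Table 4 can be sketched directly: if a statistic follows $\chi^2_{df}(\varphi)$, then its expectation is $df + \varphi$, so averaging (statistic $-$ df) over replications estimates the NCP. A minimal illustration in Python (not the authors' code; simulated noncentral chi-square draws stand in for the replicated statistics):

```python
import numpy as np
from scipy.stats import ncx2

def empirical_ncp(stats, df):
    """Estimate the NCP by averaging (statistic - df) over replications."""
    return float(np.mean(np.asarray(stats) - df))

# Draws standing in for a statistic that truly follows chi^2_1(ncp = 2.5).
rng = np.random.default_rng(0)
df, ncp = 1, 2.5
draws = ncx2.rvs(df, ncp, size=200_000, random_state=rng)
ncp_hat = empirical_ncp(draws, df)  # close to 2.5
```

When the statistic does not really follow the reference noncentral chi-square, as in Table 4, the same average becomes a biased estimator, which is exactly the diagnostic being exploited there.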
Let $T_{Wi}$ be the statistic $T_W$ evaluated at the $i$th replication, $\hat\delta_{2i} = T_{Wi} - 1$, and
$$\bar\delta_2 = \frac{1}{500}\sum_{i=1}^{500} \hat\delta_{2i}.$$
Both statistics $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$ asymptotically follow chi-square distributions with the same NCP $\varphi$. Their estimates can be obtained by $\hat\varphi_1 = T_{LR}^{(1)} - 1$ and $\hat\varphi_2 = T_{LR}^{(2)} - 15$. The average estimators over 500 replications are $\bar\varphi_1$ and $\bar\varphi_2$, respectively. Similarly, two estimators for $\delta_1$ are obtained using $T_F \sim F_{p,n-p}(\delta_1)$ and $T^2 \sim \chi^2_p(\delta_1)$. The average estimators over 500 replications are $\bar\delta_{11}$ and $\bar\delta_{12}$, respectively. Notice that $E(\bar\delta_{11}) = \delta_1$ for all the $\tau$s and $n$s, so we may regard $\bar\delta_{11} - \delta_1$ as a sampling error.

It is obvious from Table 4 that $\bar\delta_{12}$ over-estimates $\delta_1$, and the amount of over-estimation increases as $\delta_1$ increases and decreases as $n$ increases. The estimation of $\delta_2$ by $\bar\delta_2$ is quite good overall. However, at $n = 20$, $\bar\delta_2$ over-estimates $\delta_2$ when $\delta_2 = 0$, 0.2 and under-estimates $\delta_2$ when $\delta_2 = 0.4$ to 2.0. A similar pattern also exists when $n = 50$. This may imply that the discrepancy between $T_W$ and $\chi^2_1(\delta_2)$ systematically changes as $\delta_2$ increases. The estimator $\bar\varphi_2$ consistently over-estimates $\varphi$; the amount of over-estimation stays roughly constant as $\varphi$ increases and decreases when $n$ increases. As an estimator of $\varphi$, $\bar\varphi_1$ is quite good, and there does not seem to exist a systematic pattern of bias at $n = 20$. When $n = 50$ and $\tau \geq 0.8$, $\bar\varphi_1$ is consistently smaller than $\varphi$. This may imply that the discrepancy between $T_{LR}^{(1)}$ and $\chi^2_1(\varphi)$ systematically changes as $\tau$ increases. Although $\hat\beta(T_W)$ and $\hat\beta(T_{LR}^{(1)})$ are about the same, $\bar\varphi_1$ is always greater than $\bar\delta_2$. This implies that there exists a systematic difference in the distribution shapes of $T_W$ and $T_{LR}^{(1)}$.

3.4 Empirical distribution

We next use quantile-quantile (QQ) plots to better understand the distributions of the statistics $T_F$, $T^2$, $T_W$, $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$. For each statistic, eleven QQ plots corresponding to $\tau = 0, 0.2, \ldots, 2.0$ were created. To save space we only present the QQ plots for $\tau = 0$, 0.2, 1.0 and 1.8. These four plots give us enough information about the distribution change of each statistic as $\tau$ and $n$ change.

Insert Figure 1 about here

Figures 1(a) and (b) contain the plots of the statistic $T_F$ against the distribution $F_{p,n-p}(\delta_1)$ when $n = 25$ and 50, respectively. Notice that the distribution $F_{p,n-p}(\delta_1)$ exactly describes
the behavior of $T_F$. In Figure 1, the quantiles of $T_F$ match those of $F_{p,n-p}(\delta_1)$ very well from the left tail to approximately the median; they match less well from the median to the right tail. As $n$ increases, the discrepancy between the two sets of quantiles becomes smaller. But there is no systematic pattern in the discrepancy as $\tau$ changes, probably because the discrepancy is entirely due to sampling errors.

Insert Figure 2 about here

Figures 2(a) and (b) contain the plots of the quantiles of $T^2$ against those of $\chi^2_p(\delta_1)$. When $n = 20$, essentially all the quantiles of $T^2$ are above the corresponding ones of $\chi^2_p(\delta_1)$; only those near the very left tail are close. The right tail of $T^2$ is far above that of $\chi^2_p(\delta_1)$. When $n = 50$, $T^2$ is better described by $\chi^2_p(\delta_1)$. The majority of the quantiles of $T^2$ are still above the corresponding ones of $\chi^2_p(\delta_1)$ when $\tau$ is small. As $\tau$ increases, the left tail of $T^2$ becomes shorter than that of $\chi^2_p(\delta_1)$ while the right tail of $T^2$ becomes much longer than that of $\chi^2_p(\delta_1)$.

Insert Figure 3 about here

Figures 3(a) and (b) contain the QQ plots of $T_W$ against those of $\chi^2_1(\delta_2)$. At $\tau = 0$, the distribution of $T_W$ can be described by $\chi^2_1$ quite well when compared to Figure 1. However, the distribution of $T_W$ gradually departs from $\chi^2_1(\delta_2)$ as $\tau$ or $\delta_2$ increases. When $\tau = 1.8$, $T_W$ has a much longer tail on the left and a much shorter tail on the right. The sample size $n$ has little effect in controlling the discrepancy between $T_W$ and $\chi^2_1(\delta_2)$.

Insert Figure 4 about here

Figures 4(a) and (b) contain the QQ plots of $T_{LR}^{(1)}$ against those of $\chi^2_1(\varphi)$. The distribution of $T_{LR}^{(1)}$ can be described by $\chi^2_1(\varphi)$ quite well when $\tau$ or $\varphi$ is small. However, the distribution of $T_{LR}^{(1)}$ gradually departs from $\chi^2_1(\varphi)$ as $\tau$ increases. When $\tau = 1.8$, $T_{LR}^{(1)}$ has a much longer tail on the left and a much shorter tail on the right. The sample size $n$ has little effect in controlling the discrepancy between $T_{LR}^{(1)}$ and $\chi^2_1(\varphi)$. Comparing Figures 3 and 4, the discrepancy between $T_{LR}^{(1)}$ and $\chi^2_1(\varphi)$ is much smaller than that between $T_W$ and $\chi^2_1(\delta_2)$.
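The QQ comparisons in Figures 3 and 4 can be reproduced in miniature. The sketch below (a hypothetical one-variable setting, not the paper's factor model) simulates a Wald-type statistic whose exact distribution is $\chi^2_1(\delta_2)$ and computes the two quantile sets that a QQ plot would display:

```python
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(1)
n, delta2 = 50, 1.0                        # hypothetical sample size and NCP

# Wald statistic for H0: mu = 0 from a univariate normal sample with known
# variance 1 and true mean sqrt(delta2/n); n * ybar^2 is exactly chi^2_1(delta2).
mu = np.sqrt(delta2 / n)
ybar = rng.normal(mu, 1.0 / np.sqrt(n), size=5000)  # sampling dist. of the mean
t_wald = n * ybar**2

probs = np.linspace(0.05, 0.95, 19)
q_emp = np.quantile(t_wald, probs)         # empirical quantiles
q_ref = ncx2.ppf(probs, 1, delta2)         # reference noncentral quantiles
```

Plotting `q_emp` against `q_ref` gives the QQ plot; here the points fall near the 45-degree line, whereas for the SEM statistics under a fixed alternative they drift away in the tails.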
Insert Figure 5 about here

Statistics $T_{LR}^{(1)}$ and $T_W$ are asymptotically equivalent under idealized conditions. With a fixed alternative, neither of them can be described well by its asymptotic distribution even for normally distributed data. It is interesting to see how different they are empirically. Figures 5(a) and (b) compare the quantiles of $T_{LR}^{(1)}$ against those of $T_W$. When $\tau = 0$, the two statistics follow approximately the same distribution even at $n = 20$ and essentially identical distributions at $n = 50$. As $\tau$ increases, they depart first at the right tail and then over the whole range. At $\tau = 1.8$, $T_W$ is stochastically much smaller than $T_{LR}^{(1)}$. This discrepancy is proportional to the sample size $n$.

Insert Figure 6 about here

Figures 6(a) and (b) plot the quantiles of $T_{LR}^{(2)}$ against those of $\chi^2_{15}(\varphi)$. At $n = 20$ and $\tau = 0$, the quantiles near the left tail of $T_{LR}^{(2)}$ approximate those of $\chi^2_{15}(\varphi)$ reasonably well, but $T_{LR}^{(2)}$ has a longer right tail. As $\tau$ increases, the right tail of $T_{LR}^{(2)}$ becomes shorter than that of $\chi^2_{15}(\varphi)$ and the left tail becomes longer. At $n = 50$, the overall distribution of $T_{LR}^{(2)}$ is better approximated by $\chi^2_{15}(\varphi)$ when $\tau$ is small. But, as $\tau$ increases, the left tail of $T_{LR}^{(2)}$ becomes longer while the right tail becomes shorter than those of $\chi^2_{15}(\varphi)$.
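The asymptotic powers that these comparisons refer back to are simply noncentral chi-square tail areas beyond a central chi-square critical value. A sketch of that computation (only the degrees of freedom and NCP of a statistic are assumed):

```python
from scipy.stats import chi2, ncx2

def asymptotic_power(df, ncp, alpha=0.05):
    """Power of referring a chi^2_df(ncp)-distributed statistic to the
    central chi^2_df critical value at level alpha."""
    crit = chi2.ppf(1.0 - alpha, df)
    return float(ncx2.sf(crit, df, ncp))
```

For a fixed NCP, spreading the same noncentrality over more degrees of freedom lowers the power, which is one reason an overall 15-df test compares poorly with the 1-df statistics in Table 2.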
Figures 1 to 6 imply that, except for the statistic $T_F$, no statistic can be well approximated by its target or asymptotic distribution. One can show that, as $n \to \infty$, the distribution of $T^2$ can be closely described by $\chi^2_p(\delta_1)$ regardless of $\delta_1$. However, for the statistics $T_W$, $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$, a large sample size alone cannot guarantee that their distributions are approximately chi-square. Actually, the alternative hypothesis $\tau$ is more critical than the sample size $n$ in describing the distributions of the statistics $T_W$, $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$. When $\tau$ is small, the behavior of $T_W$ and $T_{LR}^{(1)}$ can be described by their asymptotic distributions quite well even when $n$ is small. When $\tau$ is large, a larger $n$ does not make $T_W$, $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$ better approximated by their asymptotic distributions; but a larger $n$ does make them better described by their asymptotic distributions when $\tau = 0$.

The figures also provide additional information regarding the empirical powers and NCPs reported in Tables 3 and 4. Because most of the quantiles of $T^2$ are above the corresponding
ones of $\chi^2_p(\delta_1)$, $\hat\beta(T^2) > \beta(T^2)$ and $\bar\delta_{12} > \delta_1$. The discrepancy will gradually disappear as $n \to \infty$. As $\tau$ increases, the quantiles from the median to the right tail of $T_W$ become increasingly smaller than the corresponding ones of $\chi^2_1(\delta_2)$. Although the lower quantiles of $T_W$ also become greater than the corresponding ones of $\chi^2_1(\delta_2)$, the excess in the lower quantiles is not as large as the deficit in the upper ones; thus, $\bar\delta_2 < \delta_2$ for larger $\tau$. Notice that the empirical power $\hat\beta(T_W)$ is evaluated by comparing each $T_{Wi}$ to the critical value $(\chi^2_1)^{-1}(0.95) \approx 3.841$. The quantiles of $T_W$ around 3.841 are greater than those of $\chi^2_1(\delta_2)$. Although the upper tail of $T_W$ is shorter than that of $\chi^2_1(\delta_2)$, this makes no difference as long as it lies above $(\chi^2_1)^{-1}(0.95)$. This also explains why $\hat\beta(T_W) > \beta(T_W)$. For essentially the same reason, $\hat\beta(T_{LR}^{(1)}) > \beta(T_{LR}^{(1)})$ and $\bar\varphi_1 < \varphi$ for larger $\tau$. It is obvious from the QQ plots in Figure 5 that the quantiles of $T_W$ match those of $T_{LR}^{(1)}$ very well around the critical value $(\chi^2_1)^{-1}(0.95)$, but those of $T_W$ are much smaller than the corresponding ones of $T_{LR}^{(1)}$ at the upper tails, especially for larger $\tau$. This explains why $\hat\beta(T_W)$ and $\hat\beta(T_{LR}^{(1)})$ are about the same while $\bar\varphi_1 > \bar\delta_2$. Similarly, because the quantiles of $T_{LR}^{(2)}$ around $(\chi^2_{15})^{-1}(0.95) \approx 24.996$ are greater than those of $\chi^2_{15}(\varphi)$, $\hat\beta(T_{LR}^{(2)}) > \beta(T_{LR}^{(2)})$; and $\bar\varphi_2 > \varphi$ when the majority of the quantiles of $T_{LR}^{(2)}$ are greater than the corresponding ones of $\chi^2_{15}(\varphi)$.

A final note is that the empirical performance of $T_W$, $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$ under alternative hypotheses obtained in this section might differ from that reported in Chou and Bentler (1990) and Curran et al. (2002). Without a mean structure, Chou and Bentler as well as Curran et al. studied the commonly used statistics with misspecified covariance structures; our results hold for a misspecified mean structure with a correctly specified covariance structure. Moreover, all 500 replications in all the conditions reported in Figures 1 to 6 converged, while Curran et al.'s conclusion was based on a converged subset of a larger number of replications.

4. Parameter Biases due to Model Misspecifications

In this section we develop some analytical results regarding the effect of misspecified models on parameters. Yuan, Marshall and Bentler (2003) studied the effect of a misspecified model on parameter estimates in the context of covariance structure analysis. This section
generalizes their result to mean structures. Specifically, parameters will be biased when the model is misspecified, and the bias makes the LR test and the $z$ or Wald test on parameter values inaccurate. We also relate the analytical results obtained here to empirical findings reported in the literature.

In the context of MANOVA, $\mu$ and $\Sigma$ are saturated, and the parameter estimates $\hat\mu = \bar{y}$ and $\hat\Sigma = S$ are unbiased. A possible bias in estimating the covariance matrix can arise from misspecifying a common $\Sigma$ when separate groups have different variance-covariance matrices. Using the setup of subsection 2.3, the true population covariance matrix of $\bar{y} - \bar{x}$ is
$$\mathrm{Cov}(\bar{y} - \bar{x}) = \frac{1}{n_1}\Sigma_1 + \frac{1}{n_2}\Sigma_2.$$
When assuming a common covariance matrix, the estimator of $\mathrm{Cov}(\bar{y} - \bar{x})$ in the $T^2$ statistic in (21) is
$$\widehat{\mathrm{Cov}}(\bar{y} - \bar{x}) = \frac{N}{(N-2)n_1 n_2}[(n_1 - 1)S_1 + (n_2 - 1)S_2].$$
When $\Sigma_1 \neq \Sigma_2$, the bias is
$$B = E[\widehat{\mathrm{Cov}}(\bar{y} - \bar{x})] - \mathrm{Cov}(\bar{y} - \bar{x}) = \frac{(N-1)(n_2 - n_1)}{(N-2)n_1 n_2}(\Sigma_2 - \Sigma_1).$$
This bias will affect the performance of the statistics $T_F$ and $T^2$. Without loss of generality, let us consider $n_2 > n_1$. When $\Sigma_2 - \Sigma_1 > 0$, $B > 0$. Let $T_c^2$ be the corresponding Hotelling statistic when $\Sigma_1 = \Sigma_2$. Then $E(T^2) < E(T_c^2)$ and $T^2$ is stochastically smaller than $T_c^2$. Similarly, when $\Sigma_2 < \Sigma_1$, $E(T^2) > E(T_c^2)$ and $T^2$ is stochastically greater than $T_c^2$. This explains why the type I error of $T_F$ is smaller than the nominal level when $n_2 > n_1$ and $\Sigma_2 > \Sigma_1$, as has been reported in simulation studies on $T_F$ by Algina and Oshima (1990) and Hakstian, Roed and Lind (1979). More elaborate studies on $T^2$ using asymptotics can be found in Ito and Schull (1964).

Yuan, Marshall and Bentler (2003) studied the bias in $\hat\theta$ caused by a misspecified $C(\theta)$ in covariance structure analysis only. In the context of SEM, both $m(\theta)$ and $C(\theta)$ can be misspecified. Let $m_*(\upsilon)$ and $C_*(\upsilon)$ be the correct mean and covariance structures; thus there exists a vector $\upsilon_0$ such that $\mu = m_*(\upsilon_0)$ and $\Sigma = C_*(\upsilon_0)$. Let the misspecified models be $m(\theta)$ and $C(\theta)$. We assume that the misspecification is due to $m(\theta)$ and $C(\theta)$
missing parameters $\vartheta$ of $\upsilon = (\theta', \vartheta')'$. Under standard regularity conditions (e.g., Kano, 1986; Shapiro, 1984), $\hat\theta$ converges to $\theta^*$, which minimizes $F_{ML}(\theta, \mu, \Sigma)$. Note that in general $\theta^*$ does not equal its counterpart $\theta_0$ in $\upsilon_0 = (\theta_0', \vartheta_0')'$, which is the population value of the correctly specified models. We will call $\Delta\theta = \theta^* - \theta_0$ the bias in $\theta^*$, which is also the asymptotic bias in $\hat\theta$. It is obvious that, if the sample is generated by $\mu_0 = m(\theta_0)$ and $\Sigma_0 = C(\theta_0)$, then $\theta^*$ will have no bias. We may regard the true population $(\mu, \Sigma)$ as a perturbation to $(\mu_0, \Sigma_0)$. Due to the perturbation, $\theta^* \neq \theta_0$, although some parameters in $\theta^*$ can still equal the corresponding ones in $\theta_0$ (see Yuan et al., 2003).

Because $\theta_0$ minimizes $F_{ML}(\theta, \mu_0, \Sigma_0)$, under standard regularity conditions $\theta_0$ satisfies the normal equation $h(\theta_0, \mu_0, \sigma_0) = 0$, where
$$h(\theta, \mu, \sigma) = \dot{m}'(\theta)C^{-1}(\theta)[\mu - m(\theta)] + \dot{c}'(\theta)W(\theta)\{\sigma + \mathrm{vech}[(\mu - m(\theta))(\mu - m(\theta))'] - c(\theta)\}.$$
Because
$$\dot{h}_{\theta}(\theta_0, \mu_0, \sigma_0) = -(\dot{m}'C^{-1}\dot{m} + \dot{c}'W\dot{c})$$
is invertible, under standard regularity conditions (see Rudin, 1976, pp. 224-225) the equation $h(\theta, \mu, \sigma) = 0$ defines $\theta$ as an implicit function of $\mu$ and $\sigma$ in a neighborhood of $(\mu_0, \sigma_0)$. Denote this function as $\theta = f(\mu, \sigma)$; then it is continuously differentiable in the neighborhood of $(\mu_0, \sigma_0)$ with
$$\dot{f}(\mu_0, \sigma_0) = (\dot{m}'C^{-1}\dot{m} + \dot{c}'W\dot{c})^{-1}(\dot{m}'C^{-1}, \ \dot{c}'W). \quad (31)$$
It follows from (31) that the perturbation $(\Delta\mu, \Delta\sigma)$ causes a perturbation in $\theta$ approximately equal to
$$\Delta\theta \approx (\dot{m}'C^{-1}\dot{m} + \dot{c}'W\dot{c})^{-1}(\dot{m}'C^{-1}\Delta\mu + \dot{c}'W\Delta\sigma). \quad (32)$$
In the context of a covariance structure model only, $\Delta\mu = 0$, and Yuan et al. (2003) formulated the perturbation $\Delta\sigma$ further as a function of the omitted parameters in $C(\theta)$, for example, the covariances among errors in a factor model. Equation (32) extends the result of Yuan et al. (2003) to mean structures as well as to any perturbations $\Delta\mu = \mu - \mu_0$ and $\Delta\sigma = \sigma - \sigma_0$. So the bias in $\theta^*$ caused by the perturbation of misspecifications in $m(\theta)$ and
$c(\theta)$ is approximately given by the $\Delta\theta$ in (32). Equation (32) implies that the biases in $\theta^*$ caused by $\Delta\mu$ and $\Delta\sigma$ are approximately additive.

Let $q$ be the number of free parameters in $\theta$. The coefficients of $\Delta\sigma$ in the bias $\Delta\theta$ are contained in the $q \times p^*$ matrix
$$(\dot{m}'C^{-1}\dot{m} + \dot{c}'W\dot{c})^{-1}\dot{c}'W.$$
For example, the $l$th row $(c_{l,11}, \ldots, c_{l,p1}; c_{l,22}, \ldots, c_{l,p2}; \ldots; c_{l,pp})$ of the matrix contains the coefficients for $(\Delta\sigma_{11}, \ldots, \Delta\sigma_{p1}; \Delta\sigma_{22}, \ldots, \Delta\sigma_{p2}; \ldots; \Delta\sigma_{pp})$ corresponding to the $l$th parameter in $\theta$. When $c_{l,ij} = 0$, the perturbation $\Delta\sigma_{ij}$ has no effect on $\theta_l$. Similarly, the coefficients of $\Delta\mu$ in $\Delta\theta$ are contained in the $q \times p$ matrix
$$(\dot{m}'C^{-1}\dot{m} + \dot{c}'W\dot{c})^{-1}\dot{m}'C^{-1}.$$
To better understand equation (32), we next consider the effect of a misspecified covariance structure on a mean parameter. Suppose the population mean vector $\mu$ and covariance matrix $\Sigma$ are generated by $\mu = \tau\lambda_1$ and $\Sigma = \Lambda\Lambda' + \Psi$, where $\lambda_1 = (\lambda_{11}', \lambda_{21}')'$ and
$$\Lambda = \begin{pmatrix} \lambda_{11} & \lambda_{12} & 0 \\ \lambda_{21} & 0 & \lambda_{23} \end{pmatrix}.$$
Let $\mu_0 = \mu$ and $\Sigma_0 = \lambda_1\lambda_1' + \Psi$. It is obvious that model (1) with $\nu = 0$ represents correct mean and covariance structures for $(\mu_0, \Sigma_0)$. The misspecification in (1) for $(\mu, \Sigma)$ is in the covariance structure $C(\theta)$, not in the mean structure $m(\theta)$. The perturbation is
$$\Delta\sigma = \mathrm{vech}\begin{pmatrix} \lambda_{12}\lambda_{12}' & 0 \\ 0 & \lambda_{23}\lambda_{23}' \end{pmatrix}.$$
With $\theta = (\tau, \lambda', \psi')'$, the bias in $\tau$ due to the omitted factors in the factor model is approximately given by the first number in the vector
$$(\dot{m}'C^{-1}\dot{m} + \dot{c}'W\dot{c})^{-1}\dot{c}'W\,\mathrm{vech}\begin{pmatrix} \lambda_{12}\lambda_{12}' & 0 \\ 0 & \lambda_{23}\lambda_{23}' \end{pmatrix}.$$
This bias is proportional to $\lambda_{12}$ and $\lambda_{23}$.
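Formula (32) is easy to check numerically in the simplest setting, where only the mean structure carries the free parameter, so the $\dot{c}'W$ terms vanish and the bias reduces to $\Delta\theta \approx (\dot{m}'C^{-1}\dot{m})^{-1}\dot{m}'C^{-1}\Delta\mu$. The model, covariance matrix, and perturbation below are hypothetical, chosen only to illustrate the first-order approximation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

C = np.diag([1.0, 2.0, 1.5])              # assumed known covariance
Cinv = np.linalg.inv(C)

def m(theta):                             # a simple nonlinear mean structure
    return np.array([theta, theta**2, theta])

def mdot(theta):                          # derivative of m with respect to theta
    return np.array([1.0, 2.0 * theta, 1.0])

theta0 = 1.0
dmu = np.array([0.05, -0.02, 0.03])       # perturbation Delta-mu
mu = m(theta0) + dmu                      # perturbed population mean

# First-order bias from (32) with the covariance-part terms equal to zero.
md = mdot(theta0)
bias_approx = float(md @ Cinv @ dmu / (md @ Cinv @ md))

# "True" bias: theta* minimizes the ML discrepancy for the perturbed mean.
res = minimize_scalar(lambda t: (mu - m(t)) @ Cinv @ (mu - m(t)),
                      bounds=(0.0, 2.0), method="bounded")
bias_true = float(res.x - theta0)
```

For this small perturbation the two agree closely; the approximation degrades as the perturbation grows, in line with the first-order nature of (32).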
In special cases, we can get the exact bias instead of the approximate one given in (32). Consider the situation in which the population mean vector $\mu$ and population covariance matrix $\Sigma$ are generated by model (1) with $\nu = 0$ and $\Lambda = \lambda$ being a vector, but one assumes $\tau = 0$ in the model. Then $m = 0$ and $\theta = (\lambda', \psi')'$. The value of $\theta^*$ is obtained by minimizing $F_{ML}(\theta, \mu, \Sigma)$. The corresponding score function (gradient) is
$$g(\theta) = \dot{c}'(\theta)W(\theta)[\mathrm{vech}(\mu\mu' + \Sigma) - c(\theta)],$$
where
$$\mu\mu' + \Sigma = \tau^2\lambda\lambda' + \lambda\lambda' + \Psi = (1 + \tau^2)\lambda\lambda' + \Psi.$$
It is obvious that
$$\lambda^* = \lambda\sqrt{1 + \tau^2} \quad \text{and} \quad \psi^* = \psi$$
satisfy $g(\theta^*) = 0$. So the bias in $\lambda^*$ is $(\sqrt{1 + \tau^2} - 1)\lambda$.

Parallel to (32), in the two-group mean comparison, as in section 2.3, the $\theta^*$ that minimizes $F_{ML}(\theta, \mu_1, \Sigma_1, \mu_2, \Sigma_2)$ also contains biases. With perturbations $\Delta\mu_1 = \mu_1 - \mu_{01}$, $\Delta\sigma_1 = \sigma_1 - \sigma_{01}$, $\Delta\mu_2 = \mu_2 - \mu_{02}$, $\Delta\sigma_2 = \sigma_2 - \sigma_{02}$, the bias is approximately
$$\Delta\theta \approx L^{-1}[r_1\dot{m}_1'C_1^{-1}\Delta\mu_1 + r_1\dot{c}_1'W_1\Delta\sigma_1 + r_2\dot{m}_2'C_2^{-1}\Delta\mu_2 + r_2\dot{c}_2'W_2\Delta\sigma_2],$$
where
$$L = r_1\dot{m}_1'C_1^{-1}\dot{m}_1 + r_1\dot{c}_1'W_1\dot{c}_1 + r_2\dot{m}_2'C_2^{-1}\dot{m}_2 + r_2\dot{c}_2'W_2\dot{c}_2.$$
There are conditions under which misspecifications in the covariance matrix do not affect the mean parameters at all. Using the setup of Theorem 1, when $\dot{m}_1'C^{-1}\dot{m}_2 = 0$, there exist
$$(\dot{m}'C^{-1}\dot{m} + \dot{c}'W\dot{c})^{-1} = \begin{pmatrix} (\dot{m}_1'C^{-1}\dot{m}_1)^{-1} & 0 \\ 0 & (\dot{m}_2'C^{-1}\dot{m}_2 + \dot{c}_2'W\dot{c}_2)^{-1} \end{pmatrix},$$
$$\dot{m}'C^{-1} = \begin{pmatrix} \dot{m}_1'C^{-1} \\ \dot{m}_2'C^{-1} \end{pmatrix}, \qquad \dot{c}'W = \begin{pmatrix} 0 \\ \dot{c}_2'W \end{pmatrix}.$$
Thus,
$$\Delta\theta = \begin{pmatrix} \Delta\theta_1 \\ \Delta\theta_2 \end{pmatrix} = \begin{pmatrix} (\dot{m}_1'C^{-1}\dot{m}_1)^{-1}\dot{m}_1'C^{-1}\Delta\mu \\ (\dot{m}_2'C^{-1}\dot{m}_2 + \dot{c}_2'W\dot{c}_2)^{-1}(\dot{c}_2'W\Delta\sigma + \dot{m}_2'C^{-1}\Delta\mu) \end{pmatrix}.$$
So the perturbations in $\Delta\sigma$ do not contribute to $\Delta\theta_1$, which contains all the mean parameters. A special case of this is the growth curve model (17) with predetermined growth rates,
where a misspecified $C(\theta)$ does not affect $\tau_1$ and $\tau_2$. There are other special cases in which a misspecified $C(\theta)$ does not interfere with the estimation of mean parameters. When $\mu = 0$ in the one-group factor model considered in section 2.1, $\tau^* = 0$ whether $C(\theta)$ is misspecified or not. When $\mu = 0$ in the nonlinear growth curve model, $\tau_1^* = \tau_2^* = 0$ even when $C(\theta)$ is misspecified. Similarly, when $\mu_1 = \mu_2$ in the two-group factor model considered in subsection 2.3, $\tau^* = 0$ even when $C(\theta)$ is misspecified. These conclusions can be obtained by directly inspecting the $F_{ML}$ function.

Insert Table 5 about here

Let us next consider the effect of a misspecified covariance structure on the NCPs for the test statistics $T_W$, $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$, using a numerical example. For comparison purposes, the same model as that generating Tables 1 to 4 is used here, and $(\mu_0, \Sigma_0)$ is the same as specified in generating the previous tables. When $\sigma_{34}$, $\sigma_{45}$ and $\sigma_{56}$ are perturbed simultaneously by 0.2, formula (32) gives the approximate bias in $\tau^*$ as $\Delta\tau = 0.2(c_{1,34} + c_{1,45} + c_{1,56})$. We also evaluated the true bias $\Delta^*\tau = \tau^* - \tau_0$. The second and third columns of numbers in Table 5 show that formula (32) approximates the true bias quite well. As $\tau$ increases, the bias also increases. Notice that the biases are obtained using the population parameters and are not affected by the sample size $n$.

When $\Sigma$ is perturbed, the asymptotic NCP of the Wald statistic is affected. When $\tau^*$ becomes smaller due to the bias, the corresponding NCP $\delta_2^*$ should be smaller than the NCP $\delta_2$ without the perturbation. However, $\delta_2^* > \delta_2$ when $\tau \geq 1.6$; it can be seen from equation (14b) that $\delta_2$ does not necessarily increase as $\tau$ increases while $\Sigma$ changes. The NCP $\varphi_1^*$ corresponding to $T_{ML}^{(1)}$ also decreases when $C(\theta)$ is misspecified. Unlike $\delta_2$, $\varphi_1^*$ monotonically increases as $\tau$ increases, and so does $(\varphi_1^* - \varphi)$. The last column of numbers in Table 5 contains the NCP $\varphi_2^*$ corresponding to $T_{ML}^{(2)}$, which tests both the mean and covariance structures simultaneously. The significance of $T_{ML}^{(2)}$ can be due to a misspecified $m(\theta)$ or a misspecified $C(\theta)$. Although $T_{ML}^{(1)}$ may not correctly reflect the magnitude of $\tau$ with a misspecified $C(\theta)$, it is much more reliable than $T_{ML}^{(2)}$. This is quite different from the results in Yuan and Bentler (in press), where a misspecified $C_a(\theta)$ can have a much greater effect on the nested chi-square test when used to test a further restricted $C_b(\theta)$. Several numerical examples illustrating different aspects of misspecified covariance
structures on mean parameters are also provided in Yuan and Bentler (in press). For example, they showed that a misspecified mean structure $m(\theta)$ has a much more dramatic effect on the mean parameters than a misspecified covariance structure, and that a misspecified $C_a(\theta)$ makes the nested chi-square test totally unreliable when testing the structure of $C_b(\theta)$. With a misspecified $C(\theta)$, we should not expect any of the statistics to behave empirically better than when $C(\theta)$ is correctly specified.

5. Distribution Violations

In previous sections we studied mean inference for normally distributed data. In practice, data may not be normally distributed. In this section, we study the effect of nonnormal data on the test statistics $T^2$, $T_F$, $T_W$, $T_{ML}^{(1)}$ and $T_{ML}^{(2)}$. In the MV approach to mean comparison, the statistic $T^2$ asymptotically follows a noncentral chi-square distribution as characterized in (3) whether or not the data are normally distributed. Similarly, the statistic $T_F$ in (15) is also asymptotically robust to nonnormally distributed data (Arnold, 1981, pp. 340-342), though its distribution can be moderately affected by skewness or kurtosis when the sample size is small (Kano, 1995; Wakaki, Yanagihara & Fujikoshi, 2002).

To study the properties of $T_W$, $T_{ML}^{(1)}$ and $T_{ML}^{(2)}$ when data are nonnormally distributed, let $t = (\bar{y}', s')'$, $\eta = (\mu', \sigma')'$ and $\beta(\theta) = (m'(\theta), c'(\theta))'$. When the mean and covariance structure models are correctly specified, $\beta(\theta_0) = \eta$. When $y_i$ has finite fourth-order moments, it follows from the central limit theorem that
$$\sqrt{n}(t - \eta) \xrightarrow{L} N(0, \Gamma),$$
where $\Gamma = \mathrm{Acov}(\sqrt{n}\,t)$. We will mainly consider model (1) and use the setup developed at the beginning of section 2. Let $T_W$ be defined by (5), where $\mathrm{Acov}(\hat\theta_1)$ is the asymptotic covariance matrix based on inverting the normal theory based information matrix in (7).

Theorem 2. Suppose both $m(\theta)$ and $C(\theta)$ are correctly specified with $\mu = \dot{m}_1\theta_{01}$. Consider a sequence of local alternative hypotheses, that is, $\mu_n = \dot{m}_1\theta_{01}$ with $\theta_{01} = \delta/\sqrt{n}$. When $\dot{m}_1'\Sigma^{-1}\dot{m}_2 = 0$,
$$T_W \xrightarrow{L} \chi^2_{q_1}(\delta_2),$$
where $\delta_2$ is given by (6b).

Proof: With nonnormally distributed data, the asymptotic covariance matrix of $\sqrt{n}\,\hat\theta$ is given by the sandwich-type covariance matrix
$$n\mathrm{Acov}(\hat\theta) = I^{-1}\dot\beta' G\Gamma G\dot\beta I^{-1},$$
where $I$ is given by (7),
$$\dot\beta = \begin{pmatrix} \dot{m}_1 & \dot{m}_2 \\ 0 & \dot{c}_2 \end{pmatrix}, \qquad G = \begin{pmatrix} \Sigma^{-1} & 0 \\ 0 & W \end{pmatrix}.$$
When $\dot{m}_1'\Sigma^{-1}\dot{m}_2 = 0$, $I^{-1}$ is block diagonal. Let
$$\Gamma = \begin{pmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{21} & \Gamma_{22} \end{pmatrix},$$
where $\Gamma_{11} = \Sigma$. After a tedious but rather direct computation we have
$$n\mathrm{Acov}(\hat\theta_1) = (\dot{m}_1'\Sigma^{-1}\dot{m}_1)^{-1}.$$
Thus, $\mathrm{Acov}(\hat\theta_1)$ is invariant when the distribution of the data changes, and so the asymptotic distribution of $T_W$ does not depend on the normality of the data.

Unlike in Theorem 1, the condition $\dot{m}_1'\Sigma^{-1}\dot{m}_2 = 0$ in Theorem 2 is sufficient but not necessary. The necessary condition can be obtained by setting the upper-left block of $\mathrm{Acov}(\hat\theta)$ that corresponds to $\theta_1$ at $(\dot{m}_1'\Sigma^{-1}\dot{m}_1)^{-1}$. The resulting condition is slightly less stringent, but it is also less transparent and thus more difficult to verify.

Similarly, the asymptotic distribution of $T_{LR}^{(1)}$ does not depend on the normality assumption either. We need to introduce extra notation before stating such a result. Suppose $m(\theta)$ is correctly specified and one wants to further test a new mean structure that lies in a manifold of $m(\theta)$ while the covariance structures of the two models are the same. Then we can express the new structures as $m(\theta_1(\gamma_1), \gamma_2)$ and $C(\gamma_2)$, where $\gamma_2 = \theta_2$. Denote $\theta(\gamma) = (\theta_1'(\gamma_1), \gamma_2')'$ and
$$\beta(\gamma) = \beta[\theta(\gamma)] = (m'[\theta_1(\gamma_1), \gamma_2], \ c'(\gamma_2))'.$$
We will use the subscripts $\gamma$ and $\theta$ to indicate derivatives as well as information matrices corresponding to the parameters $\gamma$ and $\theta$, respectively. Let
$$T_{LR}^{(1)} = n[F_{ML}(\theta(\hat\gamma), \bar{y}, S) - F_{ML}(\hat\theta, \bar{y}, S)].$$
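The distribution-free behavior asserted for this nested LR statistic can be illustrated in a deliberately simplified linear setting (a hypothetical stand-in for the SEM models, with the covariance treated as known and equal to the identity): the LR statistic for a nested linear mean structure then reduces to a difference of GLS discrepancies, and its null rejection rate stays near the nominal level even for markedly nonnormal data:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, reps, p = 200, 2000, 3
Xf = np.column_stack([np.ones(p), [1.0, 0.0, -1.0]])  # full mean structure
Xr = np.ones((p, 1))                                  # nested mean structure

def proj(X):
    # Projection onto the column space of X (Euclidean, since C = I is known)
    return X @ np.linalg.inv(X.T @ X) @ X.T

D = proj(Xf) - proj(Xr)          # rank-1 difference of nested projections
crit = chi2.ppf(0.95, 1)         # df = dim(theta_1) - dim(gamma_1) = 1

rejections = 0
for _ in range(reps):
    y = rng.exponential(1.0, size=(n, p)) - 1.0  # nonnormal data, mean zero
    ybar = y.mean(axis=0)
    t_lr = n * ybar @ D @ ybar   # LR difference statistic
    rejections += t_lr > crit
rate = rejections / reps         # close to the nominal 0.05
```

This mirrors only the mean-structure part of the result; in the SEM setting the covariance structure is estimated as well, and the orthogonality condition $\dot{m}_1'\Sigma^{-1}\dot{m}_2 = 0$ is what keeps the nonnormality of the covariance part from leaking into the mean test.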
We have the following theorem for testing the nested structure.

Theorem 3. When $\theta(\gamma)$ is correctly specified and $\dot{m}_1'\Sigma^{-1}\dot{m}_2 = 0$,
$$T_{LR}^{(1)} \xrightarrow{L} \chi^2_q,$$
where $q = \dim(\theta_1) - \dim(\gamma_1)$.

The proof of the theorem is quite involved and is provided in the appendix. Under a sequence of local alternative hypotheses, $T_{LR}^{(1)}$ may also approach a noncentral chi-square distribution, but we are unable to provide a proof.

For the statistic $T_{LR}^{(2)}$, using equation (A4) in the appendix we have
$$T_{LR}^{(2)} = z_{p+p^*}'\Gamma^{1/2}U\Gamma^{1/2}z_{p+p^*} + o_p(1) = \sum_{j=1}^{q} \rho_j z_j^2,$$
where
$$U = G - G\dot\beta_\theta I_\theta^{-1}\dot\beta_\theta' G,$$
the $\rho_j$ are the nonzero eigenvalues of $\Gamma U$, and the $z_j$ are independent standard normal random variables. Unless all the $\rho_j$s are equal to 1.0, $T_{LR}^{(2)}$ will not follow a chi-square distribution. The condition $\dot{m}_1'C^{-1}\dot{m}_2 = 0$ plus the asymptotic robustness conditions identified in covariance structure analysis (see Amemiya & Anderson, 1990; Browne & Shapiro, 1988; Satorra & Bentler, 1990; Yuan & Bentler, 1999) are not enough for $T_{LR}^{(2)}$ to follow a chi-square distribution.

6. Discussion and Conclusion

Mean comparison is of fundamental interest in statistical inference, and many classical statistical procedures are motivated by it. Classical procedures such as ANOVA and MANOVA have been well studied. In contrast, statistical procedures for mean comparisons with LV models are not well understood, although they are widely used and becoming increasingly popular. We studied the statistical properties of several commonly used statistics and compared their merits and drawbacks.
First, there are no misspecified models in formulating the statistic $T_F$ in the one-group case. Its distribution is also insensitive to nonnormally distributed data when $n$ is large. However, all one can learn from $T_F$ is whether $\mu = 0$ or $\mu_1 = \mu_2$. In the two-group case, the distribution of $T_F$ can be affected by heterogeneous covariance matrices together with unequal sample sizes. The statistic $T^2$ can be regarded as equivalent to $T_F$ at a huge sample size. Inference based on $T^2 \sim \chi^2_p$ is influenced by the sample size, the number of variables $p$, and, in the two-group case, heterogeneous covariance matrices. A modification to $T^2$ is
$$T_m^2 = (\bar{y} - \bar{x})'\Big(\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\Big)^{-1}(\bar{y} - \bar{x}),$$
whose asymptotic distribution is not affected by heterogeneous covariance matrices but is still affected by small sample sizes together with nonnormality. A parallel statistic $T_{mF}$ based on $T_m^2$ can also be constructed. More research is needed in this direction.

In the LV approach to mean comparison, the target distributions of the statistics $T_W$, $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$ are chi-squares, justified by asymptotics under idealized conditions. When the null hypothesis is true, their empirical behavior can be described very well when $n$ is relatively large. With 1 degree of freedom, $n = 20$ is enough for the LR or Wald statistics to be well described by $\chi^2_1$, as illustrated in section 3. However, when the alternative hypothesis is not trivially different from the null hypothesis, the distributions of $T_W$, $T_{LR}^{(1)}$ and $T_{LR}^{(2)}$ cannot be described by the corresponding noncentral chi-square distributions regardless of how large the sample size is. Under certain conditions, as specified in Theorems 2 and 3, the asymptotic distributions of $T_W$ and $T_{LR}^{(1)}$ are not influenced by nonnormality. The distributions of $T_W$ and $T_{LR}^{(1)}$ can be moderately influenced by a misspecified covariance structure and strongly influenced by a misspecified mean structure. There seems to be no remedy for such an effect of misspecification.

Unless the null hypothesis is true, $T_{LR}^{(1)}$ is stochastically greater than $T_W$, and their difference increases as the alternative hypothesis departs from the null. Although the distributions of $T_W$ and $T_{LR}^{(1)}$ cannot be described by noncentral chi-square distributions, there exist only minor differences between their empirical power and idealized power; the discrepancy at the far tail does not matter for the purpose of power. Actually, $T_W$ and $T_{LR}^{(1)}$ have approximately the same empirical power even when conditions are not ideal.
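The modified statistic $T_m^2$ introduced earlier in this section, together with the pooled-covariance $T^2$ it modifies, is straightforward to compute; a sketch on hypothetical two-group data:

```python
import numpy as np

def hotelling_T2(y, x):
    """T^2 with the pooled covariance estimator of Cov(ybar - xbar)."""
    n1, n2 = len(y), len(x)
    N = n1 + n2
    d = y.mean(axis=0) - x.mean(axis=0)
    S1, S2 = np.cov(y, rowvar=False), np.cov(x, rowvar=False)
    pooled = N * ((n1 - 1) * S1 + (n2 - 1) * S2) / ((N - 2) * n1 * n2)
    return float(d @ np.linalg.solve(pooled, d))

def modified_T2(y, x):
    """T^2_m: replaces the pooled estimator by S1/n1 + S2/n2."""
    n1, n2 = len(y), len(x)
    d = y.mean(axis=0) - x.mean(axis=0)
    V = np.cov(y, rowvar=False) / n1 + np.cov(x, rowvar=False) / n2
    return float(d @ np.linalg.solve(V, d))

rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, size=(40, 3))   # hypothetical group 1
x = rng.normal(0.3, 1.0, size=(60, 3))   # hypothetical group 2
t2, t2m = hotelling_T2(y, x), modified_T2(y, x)
```

With homogeneous covariance matrices the two statistics are close; with heterogeneous matrices and unequal group sizes, only $T_m^2$ retains its asymptotic $\chi^2_p$ reference distribution.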
When the purpose is to detect a mean difference, TF should be the preferred statistic because it is quite robust to violation of conditions, although one needs to check whether the covariance matrices are homogeneous. A factor model may not hold for every population. When it does hold, testing the difference in factor means may be substantively interesting. In the LV approach, we recommend the statistics TW and T_LR^(1). When using them, one needs to make sure that the covariance and especially the mean structures are correctly specified in the base model. Nonnormal data will generally affect the distributions of TW and T_LR^(1) when $\dot m_1' C^{-1}\dot m_2 \neq 0$. A possible remedy for TW with nonnormal data is to use the sandwich-type covariance matrix instead of the normal-theory-based information matrix in obtaining $\widehat{\mathrm{Acov}}(\hat\theta_1)$. Similarly, the statistic T_LR can be rescaled to make it less sensitive to distributional violations. More empirical study is needed to see how these statistics perform.

Comparing the LV and MV approaches to mean comparison, the LV approach based on TW and T_LR^(1) is slightly more powerful, although, under idealized conditions, the NCP corresponding to T² or TF in the MV approach is much greater. When T_LR^(1) or TW is significant, it is most likely that TF will also be significant. The statistic T_LR^(2) is not recommended for mean inference. With violation of conditions such as nonnormally distributed data and misspecified models, the bootstrap procedure may have some advantage over procedures based on asymptotics (see Efron & Tibshirani, 1993). In applying the bootstrap procedure, there may exist convergence problems in the LV approach; then TF should be the choice if the interest is to test µ = 0.

Appendix

Proof of Theorem 3: Using a Taylor expansion (see Yuan, Marshall & Bentler, 2002) on $F_{ML}(\theta, \bar y, S)$ we have
$$F_{ML}(\hat\theta, \bar y, S) = (t - \beta(\hat\theta))' G (t - \beta(\hat\theta)) + o_p(1/n). \eqno(A1)$$
Notice that
$$\hat\theta - \theta_0 = I_\theta^{-1}\dot\beta_\theta' G(t - \eta) + o_p(1/\sqrt{n}). \eqno(A2)$$
It follows from (A2) that
$$\begin{aligned}
t - \beta(\hat\theta) &= (t - \beta(\theta_0)) - (\beta(\hat\theta) - \beta(\theta_0))\\
&= (t - \beta(\theta_0)) - \dot\beta_\theta(\hat\theta - \theta_0) + o_p(1/\sqrt{n})\\
&= [I_{p+p^*} - \dot\beta_\theta I_\theta^{-1}\dot\beta_\theta' G](t - \eta) + o_p(1/\sqrt{n}).
\end{aligned} \eqno(A3)$$
Putting (A3) into (A1) leads to
$$nF_{ML}(\hat\theta, \bar y, S) = n(t - \eta)'G^{1/2}(I_{p+p^*} - P_\theta)G^{1/2}(t - \eta) + o_p(1), \eqno(A4)$$
where $P_\theta = G^{1/2}\dot\beta_\theta I_\theta^{-1}\dot\beta_\theta' G^{1/2}$. Similarly,
$$nF_{ML}(\hat\gamma, \bar y, S) = n(t - \eta)'G^{1/2}(I_{p+p^*} - P_\gamma)G^{1/2}(t - \eta) + o_p(1), \eqno(A5)$$
where $P_\gamma = G^{1/2}\dot\beta_\gamma I_\gamma^{-1}\dot\beta_\gamma' G^{1/2}$. Combining (A4) and (A5) leads to
$$T_{LR}^{(1)} = n(t - \eta)'G^{1/2}(P_\theta - P_\gamma)G^{1/2}(t - \eta) + o_p(1). \eqno(A6)$$
Notice that $\beta(\gamma) = \beta[\theta(\gamma)]$, $I_\gamma = \dot\beta_\gamma' G\dot\beta_\gamma$ and $I_\theta = \dot\beta_\theta' G\dot\beta_\theta$. There exists $\dot\beta_\gamma = \dot\beta_\theta\dot\theta_\gamma$ and
$$P_\gamma = G^{1/2}\dot\beta_\theta\dot\theta_\gamma(\dot\theta_\gamma' I_\theta\dot\theta_\gamma)^{-1}\dot\theta_\gamma'\dot\beta_\theta' G^{1/2}.$$
Because $\theta(\gamma) = (\theta_1'(\gamma_1), \gamma_2')'$,
$$\dot\theta_\gamma = \begin{pmatrix}\dot\theta_{1\gamma} & 0\\ 0 & I_{q_2}\end{pmatrix}$$
and
$$\dot\theta_\gamma' I_\theta\dot\theta_\gamma = \begin{pmatrix}\dot\theta_{1\gamma}'\dot m_1'C^{-1}\dot m_1\dot\theta_{1\gamma} & \dot\theta_{1\gamma}'\dot m_1'C^{-1}\dot m_2\\ \dot m_2'C^{-1}\dot m_1\dot\theta_{1\gamma} & \dot m_2'C^{-1}\dot m_2 + \dot c_2'W\dot c_2\end{pmatrix}.$$
When $\dot m_1'C^{-1}\dot m_2 = 0$,
$$\dot\theta_\gamma(\dot\theta_\gamma' I_\theta\dot\theta_\gamma)^{-1}\dot\theta_\gamma' = \begin{pmatrix}\dot\theta_{1\gamma}(\dot\theta_{1\gamma}'\dot m_1'C^{-1}\dot m_1\dot\theta_{1\gamma})^{-1}\dot\theta_{1\gamma}' & 0\\ 0 & (\dot m_2'C^{-1}\dot m_2 + \dot c_2'W\dot c_2)^{-1}\end{pmatrix}. \eqno(A7)$$
It follows from (A7) that
$$P_\theta - P_\gamma = G^{1/2}\dot\beta_\theta[I_\theta^{-1} - \dot\theta_\gamma(\dot\theta_\gamma' I_\theta\dot\theta_\gamma)^{-1}\dot\theta_\gamma']\dot\beta_\theta' G^{1/2} = \begin{pmatrix}R & 0\\ 0 & 0\end{pmatrix}, \eqno(A8)$$
where
$$R = C^{-1/2}\dot m_1[(\dot m_1'C^{-1}\dot m_1)^{-1} - \dot\theta_{1\gamma}(\dot\theta_{1\gamma}'\dot m_1'C^{-1}\dot m_1\dot\theta_{1\gamma})^{-1}\dot\theta_{1\gamma}']\dot m_1'C^{-1/2}.$$
Combining (A6) and (A8) leads to
$$T_{LR}^{(1)} = z'Rz + o_p(1),$$
where $z = C^{-1/2}(\bar y - \mu) \xrightarrow{L} N(0, I_p)$. The theorem follows by noticing that $R$ is a projection matrix with $\mathrm{tr}(R) = \dim(\theta_1) - \dim(\gamma_1)$.
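The final step of the proof can be checked numerically. The sketch below (our own illustration, with arbitrary small full-rank matrices and C = I so that the factor C^{-1/2} drops out) builds R as the difference of the projectors onto col(ṁ₁) and col(ṁ₁θ̇₁γ), and verifies that R is a projection matrix with trace dim(θ₁) − dim(γ₁):

```python
# Check that R = P_M - P_{MT} is a projection with tr(R) = dim(theta1) - dim(gamma1).
# Illustration only: M stands in for m-dot-1 (p x q1), T for theta-dot-1gamma (q1 x r).

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        piv = M[i][i]
        M[i] = [v / piv for v in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [v - f * w for v, w in zip(M[r], M[i])]
    return [row[n:] for row in M]

def projector(X):
    """Orthogonal projector P_X = X (X'X)^{-1} X'."""
    Xt = transpose(X)
    return matmul(matmul(X, inverse(matmul(Xt, X))), Xt)

# p = 4, dim(theta1) = 3, dim(gamma1) = 1 (arbitrary full-rank choices)
M = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0], [2.0, 0.0, 1.0]]
T = [[1.0], [2.0], [0.5]]

MT = matmul(M, T)
R = [[pm - pmt for pm, pmt in zip(r1, r2)] for r1, r2 in zip(projector(M), projector(MT))]

trace_R = sum(R[i][i] for i in range(len(R)))
RR = matmul(R, R)
max_err = max(abs(RR[i][j] - R[i][j]) for i in range(4) for j in range(4))
print(round(trace_R, 6), max_err < 1e-9)  # trace equals 3 - 1 = 2; R*R == R
```

Since col(MT) is contained in col(M), the difference of the two projectors is itself a projector, which is exactly the structure the theorem exploits.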
References

Algina, J., & Oshima, T. C. (1990). Robustness of the independent samples Hotelling's T² to variance-covariance heteroscedasticity when sample sizes are unequal and in small ratios. Psychological Bulletin, 108, 308–313.
Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453–1463.
Anderson, T. W. (1984). An introduction to multivariate statistical analysis (2nd ed.). New York: Wiley.
Arnold, S. F. (1981). The theory of linear models and multivariate analysis. New York: Wiley.
Bagozzi, R. P. (1977). Structural equation models in experimental research. Journal of Marketing Research, 14, 209–226.
Bagozzi, R. P., & Yi, Y. (1989). On the use of structural equation models in experimental designs. Journal of Marketing Research, 26, 271–284.
Bentler, P. M. (1968). Alpha-maximized factor analysis (Alphamax): Its relation to alpha and canonical factor analysis. Psychometrika, 33, 335–345.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P. M., & Yuan, K.-H. (2000). On adding a mean structure to a covariance structure model. Educational and Psychological Measurement, 60, 326–339.
Browne, M. W., & Arminger, G. (1995). Specification and estimation of mean and covariance structure models. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 185–249). New York: Plenum.
Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193–208.
Buse, A. (1982). The likelihood ratio, Wald, and Lagrange multiplier tests: An expository note. American Statistician, 36, 153–157.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factorial covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105, 456–466.
Chou, C.-P., & Bentler, P. M. (1990). Model modification in covariance structure modeling: A comparison among likelihood ratio, Lagrange multiplier, and Wald tests. Multivariate Behavioral Research, 25, 115–136.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cole, D. A., Maxwell, S. E., Arvey, R., & Salas, E. (1993). Multivariate group comparisons of variable systems: MANOVA and structural equation modeling. Psychological Bulletin, 114, 174–184.
Curran, P. J. (2000). A latent curve framework for the study of developmental trajectories in adolescent substance use. In J. R. Rose, L. Chassin, C. C. Presson, & S. J. Sherman (Eds.), Multivariate applications in substance use research: New methods for new questions (pp. 1–42). Mahwah, NJ: Erlbaum.
Curran, P. J., Bollen, K. A., Paxton, P., Kirby, J., & Chen, F. (2002). The noncentral chi-square distribution in misspecified structural equation models: Finite sample results from a Monte Carlo simulation. Multivariate Behavioral Research, 37, 1–36.
Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., & Alpert, A. (1999). An introduction to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah, NJ: Erlbaum.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Engle, R. (1984). Wald, likelihood ratio and Lagrange multiplier tests in econometrics. In Z. Griliches & M. D. Intriligator (Eds.), Handbook of econometrics, Vol. II (pp. 775–826). Amsterdam: North-Holland.
Ferguson, T. (1996). A course in large sample theory. London: Chapman & Hall.
Hakstian, A. R., Roed, J. C., & Lind, J. C. (1979). Two-sample T² procedure and the assumption of homogeneous covariance matrices. Psychological Bulletin, 86, 1255–1263.
Hancock, G. R. (2001). Effect size, power, and sample size determination for structured means modeling and MIMIC approaches to between-groups hypothesis testing of means on a single latent construct. Psychometrika, 66, 373–388.
Hancock, G. R. (2003). Fortune cookies, measurement error, and experimental design. Journal of Modern Applied Statistical Methods, 2, 293–305.
Hancock, G. R., Lawrence, F. R., & Nevitt, J. (2000). Type I error and power of latent mean methods and MANOVA in factorially invariant and noninvariant latent variable systems. Structural Equation Modeling, 7, 534–556.
Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future (pp. 195–216). Lincolnwood, IL: Scientific Software International.
Ito, K., & Schull, W. J. (1964). On the robustness of the T₀² test in multivariate analysis of variance when variance-covariance matrices are not equal. Biometrika, 51, 71–82.
Kano, Y. (1986). Conditions on consistency of estimators in covariance structure model. Journal of the Japan Statistical Society, 16, 75–80.
Kano, Y. (1995). An asymptotic expansion of the distribution of Hotelling's T²-statistic under general distributions. American Journal of Mathematical and Management Sciences, 15, 317–341.
Kano, Y. (2001). Structural equation modeling for experimental data. In R. Cudeck, S. H. C. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future (pp. 381–402). Lincolnwood, IL: Scientific Software International.
Kaplan, D., & George, R. (1995). A study of the power associated with testing factor mean differences under violations of factorial invariance. Structural Equation Modeling, 2, 101–118.
Kühnel, S. M. (1988). Testing MANOVA designs with LISREL. Sociological Methods and Research, 16, 504–523.
Li, H. (1997). A unifying expression for the maximal reliability of a linear composite. Psychometrika, 62, 245–249.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.
Magnus, J. R., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics. New York: Wiley.
McArdle, J. J., & Epstein, D. (1987). Latent growth curves within developmental structural equation models. Child Development, 58, 110–133.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107–122.
Muthén, B. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54, 557–585.
Raykov, T. (2004). Estimation of maximal reliability: A note on a covariance structure modeling approach. To appear in British Journal of Mathematical and Statistical Psychology, 57.
Rudin, W. (1976). Principles of mathematical analysis (3rd ed.). New York: McGraw-Hill.
Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified approach. Psychometrika, 54, 131–151.
Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Satorra, A., & Saris, W. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Shapiro, A. (1984). A note on the consistency of estimators in the analysis of moment structures. British Journal of Mathematical and Statistical Psychology, 37, 84–88.
Sörbom, D. (1974). A general method for studying differences in factor means and factor structures between groups. British Journal of Mathematical and Statistical Psychology, 27, 229–239.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253–264.
Stroud, T. W. F. (1972). Fixed alternatives and Wald's formulation of the noncentral asymptotic behavior of the likelihood ratio statistic. Annals of Mathematical Statistics, 43, 447–454.
Wakaki, H., Yanagihara, H., & Fujikoshi, Y. (2002). Asymptotic expansions of the null distributions of test statistics for multivariate linear hypothesis under nonnormality. Hiroshima Mathematical Journal, 32, 17–50.
Yuan, K.-H., & Bentler, P. M. (1998). Normal theory based test statistics in structural equation modeling. British Journal of Mathematical and Statistical Psychology, 51, 289–309.
Yuan, K.-H., & Bentler, P. M. (1999). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.
Yuan, K.-H., & Bentler, P. M. (2002). On robustness of the normal-theory based asymptotic distributions of three reliability coefficient estimates. Psychometrika, 67, 251–259.
Yuan, K.-H., & Bentler, P. M. (in press). On chi-square difference and z tests in mean and covariance structure analysis when the base model is misspecified. Educational and Psychological Measurement (www.nd.edu/~kyuan/papers/nest-chisq-z.pdf).
Yuan, K.-H., Marshall, L. L., & Bentler, P. M. (2002). A unified approach to exploratory factor analysis with missing data, nonnormal data, and in the presence of outliers. Psychometrika, 67, 95–122.
Yuan, K.-H., Marshall, L. L., & Bentler, P. M. (2003). Assessing the effect of model misspecifications on parameter estimates in structural equation models. Sociological Methodology, 33, 241–265.
Yung, Y.-F., & Bentler, P. M. (1999). On added information for ML factor analysis with mean and covariance structures. Journal of Educational and Behavioral Statistics, 24, 1–20.
Table 1. Powers¹ (β) of T², TW and TLR under idealized conditions

                  MV                              LV
 τ        δ1       β(T²)    β(TF)       δ2       β(TW)    β(T_LR^(2))
n = 20
0.2      0.712    0.080    0.070       0.696    0.133    0.067
0.4      2.846    0.196    0.142       2.611    0.366    0.126
0.6      6.404    0.429    0.289       5.324    0.636    0.236
0.8     11.385    0.711    0.504       8.367    0.824    0.380
1.0     17.790    0.907    0.725      11.377    0.921    0.524
1.2     25.617    0.982    0.885      14.140    0.964    0.643
1.4     34.868    0.998    0.965      16.566    0.983    0.732
1.6     45.542    1.000    0.992      18.642    0.991    0.795
1.8     57.639    1.000    0.999      20.394    0.995    0.839
2.0     71.159    1.000    1.000      21.864    0.997    0.870
n = 50
0.2      1.779    0.134    0.122       1.740    0.261    0.097
0.4      7.116    0.476    0.416       6.527    0.724    0.291
0.6     16.011    0.869    0.810      13.310    0.954    0.609
0.8     28.464    0.991    0.978      20.918    0.996    0.851
1.0     44.474    1.000    0.999      28.443    1.000    0.955

¹ Calculated using the noncentral chi-square distribution.
Table 2. Powers¹ (β) of TW and TLR with the violation of "local alternatives"

                 NCP δ2                           NCP ϕ
 τ        δ2      β(TW)    β(T_LR^(2))    ϕ       β(T_LR^(1))  β(T_LR^(2))
n = 20
0.2      0.696    0.133    0.067        0.699    0.133    0.067
0.4      2.611    0.366    0.126        2.661    0.371    0.128
0.6      5.324    0.636    0.236        5.556    0.654    0.246
0.8      8.367    0.824    0.380        9.012    0.851    0.411
1.0     11.377    0.921    0.524       12.726    0.946    0.584
1.2     14.140    0.964    0.643       16.491    0.982    0.730
1.4     16.566    0.983    0.732       20.184    0.994    0.835
1.6     18.642    0.991    0.795       23.739    0.998    0.902
1.8     20.394    0.995    0.839       27.127    0.999    0.944
2.0     21.864    0.997    0.870       30.337    1.000    0.968
n = 50
0.2      1.740    0.261    0.097        1.748    0.262    0.097
0.4      6.527    0.724    0.291        6.653    0.732    0.297
0.6     13.310    0.954    0.609       13.890    0.961    0.633
0.8     20.918    0.996    0.851       22.531    0.997    0.883
1.0     28.443    1.000    0.955       31.815    1.000    0.975

¹ Calculated using the noncentral chi-square distribution.
Table 3. Empirical powers¹ (β̂) of TF, T², TW, T_LR^(1) and T_LR^(2)

             MV                      LV
 τ      β̂(TF)   β̂(T²)     β̂(TW)   β̂(T_LR^(1))  β̂(T_LR^(2))
n = 20
0.0     0.056   0.274     0.044   0.044   0.150
0.2     0.084   0.318     0.136   0.134   0.188
0.4     0.154   0.436     0.376   0.376   0.256
0.6     0.262   0.640     0.692   0.696   0.354
0.8     0.490   0.856     0.886   0.888   0.602
1.0     0.734   0.944     0.980   0.980   0.736
1.2     0.894   0.990     0.996   0.996   0.886
1.4     0.974   1.000     1.000   1.000   0.952
1.6     0.994   1.000     1.000   1.000   0.994
1.8     0.998   1.000     1.000   1.000   0.996
2.0     1.000   1.000     1.000   1.000   1.000
n = 50
0.0     0.048   0.102     0.056   0.056   0.062
0.2     0.112   0.232     0.278   0.278   0.118
0.4     0.424   0.570     0.762   0.760   0.360
0.6     0.822   0.904     0.986   0.986   0.746
0.8     0.974   0.994     1.000   1.000   0.930
1.0     1.000   1.000     1.000   1.000   0.994

¹ Based on 500 replications with normal data.
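Empirical powers such as those in Table 3 come from Monte Carlo simulation: generate data under a given τ, compute each statistic, and record the proportion of rejections. A stripped-down univariate illustration (our own; a 1-df statistic n·z̄² stands in for the test statistics of the paper, so the rejection region is the χ²₁ critical value):

```python
import random

def empirical_power(mu, n, reps=2000, crit=3.841, seed=1234):
    """Monte Carlo rejection rate of the test n * zbar^2 > chi2_1 critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        zbar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        if n * zbar * zbar > crit:
            rejections += 1
    return rejections / reps

# At mu = 0 the rate estimates the nominal type I error (0.05);
# at mu > 0 it estimates the power at NCP = n * mu^2.
size = empirical_power(0.0, 20)
power = empirical_power(0.5, 20)   # NCP = 20 * 0.25 = 5
print(size, power)
```

With 2,000 replications the Monte Carlo error of such a rate is about one percentage point near 0.05, which is why empirical and idealized powers in the tables agree only to roughly that accuracy.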
Table 4. Empirical NCP¹

 τ        δ1       δ̄11      δ̄12      δ2       δ̄2       ϕ        ϕ̄1       ϕ̄2
n = 20
0.0      0.000    0.058    3.592    0.000    0.023    0.000    0.034    3.214
0.2      0.712    0.926    4.966    0.696    0.717    0.699    0.747    4.055
0.4      2.846    2.711    7.793    2.611    2.479    2.661    2.638    5.461
0.6      6.404    5.967   12.948    5.324    5.068    5.556    5.527    8.174
0.8     11.385   10.530   20.172    8.367    7.623    9.012    8.515   12.133
1.0     17.790   18.087   32.138   11.377   10.747   12.726   12.650   15.805
1.2     25.617   26.140   44.888   14.140   13.492   16.491   16.588   20.117
1.4     34.868   34.282   57.779   16.566   15.821   20.184   20.062   23.393
1.6     45.542   45.030   74.797   18.642   17.752   23.739   23.407   27.310
1.8     57.639   56.714   93.297   20.394   19.319   27.127   26.806   29.997
2.0     71.159   72.163  117.758   21.864   20.871   30.337   30.383   33.785
n = 50
0.0      0.000   -0.204    0.762    0.000    0.056    0.000    0.061    0.741
0.2      1.779    1.617    2.887    1.740    1.773    1.748    1.801    2.542
0.4      7.116    7.185    9.382    6.527    6.335    6.653    6.531    8.124
0.6     16.011   16.084   19.765   13.310   13.292   13.890   14.050   15.191
0.8     28.464   28.600   34.367   20.918   20.283   22.531   22.166   23.508
1.0     44.474   44.044   52.385   28.443   27.608   31.815   31.412   32.491
1.2     64.043   64.021   75.692   35.351   34.606   41.228   41.147   42.448
1.4     87.170   86.642  102.082   41.416   40.584   50.460   50.275   51.116
1.6    113.854  112.286  132.001   46.605   45.440   59.348   58.772   59.963
1.8    144.097  140.658  165.101   50.986   49.761   67.817   67.046   68.472
2.0    177.897  178.223  208.927   54.660   53.557   75.844   75.811   76.691

¹ Based on 500 replications with normal data.
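The empirical NCPs in Table 4 are obtained by averaging a statistic over replications and subtracting its degrees of freedom, since E[χ²_df(δ)] = df + δ. A minimal illustration (our own) with a 1-df chi-square statistic n·z̄²:

```python
import random

def empirical_ncp(mu, n, df=1, reps=3000, seed=7):
    """Estimate the NCP of T = n * zbar^2 as mean(T) - df,
    using E[chi2_df(ncp)] = df + ncp."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        zbar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        total += n * zbar * zbar
    return total / reps - df

# The theoretical NCP is n * mu^2; e.g. mu = 0.3, n = 50 gives 4.5.
print(round(empirical_ncp(0.3, 50), 2))
```

Because Var(χ²_df(δ)) = 2(df + 2δ), the sampling error of such an estimate grows with the NCP itself, which is one source of the discrepancies visible in Table 4 at large τ.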
Table 5. NCP with a misspecified covariance structure (∆σ34 = 0.2, ∆σ45 = 0.2, ∆σ56 = 0.2), n = 20

 τ       ∆τ       ∆τ*      δ2       δ2*      ϕ        ϕ1*      ϕ2*
0.2     -0.015   -0.017    0.696    0.616    0.699    0.617   16.261
0.4     -0.029   -0.032    2.611    2.351    2.661    2.368   18.191
0.6     -0.042   -0.047    5.324    4.896    5.556    5.002   21.043
0.8     -0.054   -0.060    8.367    7.865    9.012    8.205   24.452
1.0     -0.066   -0.073   11.377   10.910   12.726   11.702   28.122
1.2     -0.077   -0.085   14.140   13.800   16.491   15.290   31.849
1.4     -0.088   -0.097   16.566   16.411   20.184   18.844   35.509
1.6     -0.100   -0.110   18.642   18.699   23.739   22.289   39.038
1.8     -0.111   -0.122   20.394   20.670   27.127   25.591   42.404
2.0     -0.122   -0.135   21.864   22.352   30.337   28.733   45.597
Figure 1(a). QQ plots of the Wald statistic TF versus the noncentral F distribution F_{p,n−p}(δ1), n = 20. [Panels: τ = 0 (against the central F distribution), 0.2, 1.0, 1.8.]

Figure 1(b). QQ plots of the Wald statistic TF versus the noncentral F distribution F_{p,n−p}(δ1), n = 50. [Panels: τ = 0 (against the central F distribution), 0.2, 1.0, 1.8.]

Figure 2(a). QQ plots of the statistic T² versus the noncentral chi-square distribution χ²_p(δ1), n = 20. [Panels: τ = 0 (against the central chi-square distribution), 0.2, 1.0, 1.8.]

Figure 2(b). QQ plots of the statistic T² versus the noncentral chi-square distribution χ²_p(δ1), n = 50. [Panels: τ = 0 (against the central chi-square distribution), 0.2, 1.0, 1.8.]

Figure 3(a). QQ plots of the Wald statistic TW versus the noncentral chi-square distribution χ²_1(δ2), n = 20. [Panels: τ = 0 (against the central chi-square distribution), 0.2, 1.0, 1.8.]

Figure 3(b). QQ plots of the Wald statistic TW versus the noncentral chi-square distribution χ²_1(δ2), n = 50. [Panels: τ = 0 (against the central chi-square distribution), 0.2, 1.0, 1.8.]

Figure 4(a). QQ plots of the likelihood ratio statistic T_LR^(1) versus the noncentral chi-square distribution χ²_1(ϕ), n = 20. [Panels: τ = 0 (against the central chi-square distribution), 0.2, 1.0, 1.8.]

Figure 4(b). QQ plots of the likelihood ratio statistic T_LR^(1) versus the noncentral chi-square distribution χ²_1(ϕ), n = 50. [Panels: τ = 0 (against the central chi-square distribution), 0.2, 1.0, 1.8.]

Figure 5(a). QQ plots of the Wald statistic TW versus the likelihood ratio statistic T_LR^(1), n = 20. [Panels: τ = 0, 0.2, 1.0, 1.8.]

Figure 5(b). QQ plots of the Wald statistic TW versus the likelihood ratio statistic T_LR^(1), n = 50. [Panels: τ = 0, 0.2, 1.0, 1.8.]

Figure 6(a). QQ plots of the likelihood ratio statistic T_LR^(2) versus the noncentral chi-square distribution χ²_15(ϕ), n = 20. [Panels: τ = 0 (against the central chi-square distribution), 0.2, 1.0, 1.8.]

Figure 6(b). QQ plots of the likelihood ratio statistic T_LR^(2) versus the noncentral chi-square distribution χ²_15(ϕ), n = 50. [Panels: τ = 0 (against the central chi-square distribution), 0.2, 1.0, 1.8.]