Methods of Psychological Research Online 1997, Vol.2, No.2 Internet: http://www.pabst-publishers.de/mpr/
© 1998 Pabst Science Publishers
Methodological Problems of Estimating Latent Interaction Effects

Helfried Moosbrugger, Karin Schermelleh-Engel, Andreas Klein
Johann Wolfgang Goethe-University, Frankfurt am Main

Abstract

Approaches developed for the analysis of latent interaction effects are confronted with several methodological problems resulting from measurement error of indicator variables, nonlinearity of parameters, specification of mean structures, inadequate variable transformations, and, as the major characteristic of interaction models, non-normally distributed variables. While some of these problems can be solved with covariance structure analysis or least squares estimation procedures in structural equation modeling, the new approach "Latent Moderated Structural Equations (LMS)" (Klein & Moosbrugger, submitted; Klein, in prep.) is the only method that takes the non-normal distributions of latent interaction models explicitly into account. LMS produces maximum likelihood estimates by analyzing mixture densities with the EM algorithm, provides efficient parameter estimates and yields unbiased standard error estimates for inferential statistics.

Keywords: Latent interaction models, mixture distribution, ML estimation, structural equation modeling (SEM), EM algorithm.
1 Introduction: Latent interaction models

In structural equation modeling (SEM) the latent variables are usually linearly related. But in some cases theory may suggest that the linear structural relationship itself is moderated by a latent variable: The slope of the regression of an endogenous variable on an exogenous variable varies with the realizations of a second exogenous variable, the 'moderator variable'. For an analysis of this relationship, a product term of exogenous variables is included in the structural equation, modeling a latent interaction effect. Let us consider an elementary latent interaction model (Equation 1) with two exogenous variables $\xi_1$ and $\xi_2$, a latent product term $\xi_1\xi_2$, an endogenous variable $\eta$, an intercept term $\alpha$, structural parameters $\gamma_1$, $\gamma_2$, $\gamma_3$, and a disturbance term $\zeta$. Each exogenous variable is measured by two observed indicators ($x_1$, $x_2$, and $x_3$, $x_4$, respectively, see Equation 2), and the endogenous variable is measured by one observed indicator $y$ (Equation 3). The measurement models of $\xi_1$ and $\xi_2$ include the factor loadings $\lambda_{21}$ and $\lambda_{42}$ ($x_1$ and $x_3$ are the scaling variables) and the error variables $\delta_1, \ldots, \delta_4$.

Structural equation:

$$\eta = \alpha + \gamma_1\xi_1 + \gamma_2\xi_2 + \gamma_3\xi_1\xi_2 + \zeta \qquad (1)$$

This research has been supported by the Deutsche Forschungsgemeinschaft (DFG), grants No. Mo 474/3-1 and Mo 474/3-2.
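Measurement models:

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ 0 & 1 \\ 0 & \lambda_{42} \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} + \begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \end{pmatrix} \qquad (2)$$

$$y = \eta \qquad (3)$$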
The following assumptions are made: $x_1, \ldots, x_4$ are multivariate normal with zero expectations, $\xi_1$ and $\xi_2$ are bivariate normal with zero expectations, $\delta_1, \ldots, \delta_4$ are normal with zero expectations, $\zeta$ is normal with zero expectation, $\delta_i$ and $\delta_j$ are independent for $i \neq j$ ($i = 1, \ldots, 4$; $j = 1, \ldots, 4$), $\delta_i$ and $\xi_k$ are independent for $i = 1, \ldots, 4$ and $k = 1, 2$, and $\zeta$ is independent of $\delta_i$ and $\xi_k$ for $i = 1, \ldots, 4$ and $k = 1, 2$. In the analysis of latent interaction models the latent interaction effect $\gamma_3$ is of special interest, as this parameter indicates how much the slope of $\eta$ on $\xi_1$ (see Footnote 1) is predicted to change given a one unit change in the moderator variable $\xi_2$.
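The model of Equations 1 to 3 can be made concrete by generating data from it. The following is a minimal sketch, not part of the original article: the function name and all parameter values are illustrative assumptions, except for the variances and covariance of the exogenous latent variables, which are the values the text later quotes from Jöreskog and Yang (1996).

```python
import numpy as np


def simulate_interaction_model(n, alpha, gammas, phi, lam21, lam42,
                               var_delta, var_zeta, seed=0):
    """Draw n rows (x1, x2, x3, x4, y) from Equations 1 to 3."""
    rng = np.random.default_rng(seed)
    # Exogenous latent variables: bivariate normal with zero expectations.
    xi = rng.multivariate_normal(np.zeros(2), phi, size=n)
    xi1, xi2 = xi[:, 0], xi[:, 1]
    # Measurement errors and disturbance: independent zero-mean normals.
    delta = rng.normal(0.0, np.sqrt(var_delta), size=(n, 4))
    zeta = rng.normal(0.0, np.sqrt(var_zeta), size=n)
    # Structural equation (Equation 1).
    g1, g2, g3 = gammas
    eta = alpha + g1 * xi1 + g2 * xi2 + g3 * xi1 * xi2 + zeta
    # Measurement models (Equations 2 and 3); x1 and x3 are scaling variables.
    x = np.column_stack([xi1, lam21 * xi1, xi2, lam42 * xi2]) + delta
    return np.column_stack([x, eta])


# Hypothetical parameter values, chosen for illustration only.
phi = np.array([[0.49, 0.2352], [0.2352, 0.64]])
data = simulate_interaction_model(
    n=1000, alpha=1.0, gammas=(0.2, 0.4, 0.7), phi=phi,
    lam21=0.6, lam42=0.7, var_delta=[0.51, 0.64, 0.36, 0.51], var_zeta=0.2)
```

Each row of `data` holds the four indicators of the exogenous variables followed by `y`. Because `y` contains the product term, it is not normally distributed even though all random inputs are.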
2 Methodological problems of interactions in structural equation modeling

Since 1984, several approaches to the analysis of continuous latent interaction effects have been developed (see Footnote 2). Most of them study the elementary latent interaction model ('Kenny-Judd model') proposed by Kenny and Judd (1984), which is an interaction model with one latent interaction effect. The continuous-variable approaches for latent interaction models are confronted with various problems, which arise mainly from a) measurement error of indicator variables, b) nonlinearity of parameters, c) mean structures, d) inadequate interpretation of transformation of variables, and e) non-normal distribution of indicator and latent variables (cf. Schermelleh-Engel, Moosbrugger, Frank & Klein, 1996). In the following, these problems will be investigated and explained in more detail.
2.1 Measurement error of indicator variables
Measurement error is a substantial problem for the analysis not only of latent interaction models, but also of manifest moderator models based on multiple regression, when observed variables are treated as if each was a perfectly reliable measure. Although this assumption is rather common in empirical research, it is often incorrect.
Footnote 1: The structural equation (Equation 1) can be rewritten in a form which shows that the total effect of $\xi_1$ on $\eta$ depends on a linear moderating function: $\eta = \alpha + (\gamma_1 + \gamma_3\xi_2)\xi_1 + \gamma_2\xi_2 + \zeta$. Then, the interpretation of $\gamma_1$ is reasonable only as part of the moderating function $(\gamma_1 + \gamma_3\xi_2)$. As can be seen, $\gamma_1$ and $\gamma_2$ cannot be interpreted separately (cf. Steyer, 1994).

Footnote 2: For more information on categorical-variable interaction models (e.g. "multi-sample analysis" with LISREL) see Jaccard and Wan (1996) or Schumacker and Lomax (1996).
Figure 1: Reliability of the product term, $\mathrm{Rel}(x_1x_2)$, in relation to the correlation of the variables, $\mathrm{Corr}(x_1, x_2)$, here with $\mathrm{Rel}(x_1) = \mathrm{Rel}(x_2) = 0.50$.
It is well-known in social and behavioral sciences that observed variables often contain random and nonrandom errors (Bollen, 1989, p. 151). Ignoring measurement errors can lead to biased estimations of the regression coefficients (see Footnote 3), a problem that will even be aggravated by adding a product variable $x_1x_2$ for the interaction effect to the regression equation (Equation 4).

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon \qquad (4)$$
The reliability of the product variable $x_1x_2$ does not only depend on the reliability of the predictor $x_1$ and the moderator $x_2$, but also on the correlation of $x_1$ and $x_2$ (Busemeyer & Jones, 1983). Fig. 1 illustrates this relation for two variables of reliability 0.5 ($\mathrm{Rel}(x_1) = \mathrm{Rel}(x_2) = 0.5$). If, for example, $x_1$ and $x_2$ are uncorrelated ($\mathrm{Corr}(x_1, x_2) = 0$), the reliability of the product variable $x_1x_2$ is as low as the product of the reliability coefficients of the predictor and the moderator variable ($\mathrm{Rel}(x_1x_2) = 0.25$). If the two variables are correlated ($0 < \mathrm{Corr}(x_1, x_2) \le 0.5$), the reliability of the product variable increases but does not reach the reliability of $x_1$ or $x_2$. Note that the correlation of the variables $x_1$ and $x_2$ cannot exceed the maximum of the reliabilities of $x_1$ and $x_2$, so the abscissa in Figure 1 has 0.5 as its upper bound. Even with very reliable measures (e.g. $\mathrm{Rel}(x_1) = \mathrm{Rel}(x_2) = 0.90$), the reliability $\mathrm{Rel}(x_1x_2)$ will be significantly smaller than 0.90. The reliability of the product will only be one if the measures are perfectly reliable.

As least squares estimates of the regression coefficients in multiple regression analysis with unreliable variables are biased and inconsistent, the low reliability of the product term in moderated regression analysis even aggravates the bias and, as a consequence, leads to an underestimation of the interaction effect and a loss of statistical power for testing the hypothesis $\beta_3 \neq 0$. In contrast to multiple regression analysis with observed variables, methods developed for the analysis of structural equation models (e.g. LISREL, Jöreskog & Sörbom, 1993; EQS, Bentler, 1995; AMOS, Arbuckle, 1997) take the biasing effects of measurement error explicitly into account by establishing measurement models for the latent variables. Moreover, power is still a problem in moderated regression as well as in latent interaction analyses.
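The dependence described above can be sketched numerically. The formula used below is an assumption on our part, since the text does not state it: it is the expression commonly attributed to Busemeyer and Jones (1983) for mean-centered, bivariate normal variables, $\mathrm{Rel}(x_1x_2) = (r^2 + \mathrm{Rel}(x_1)\,\mathrm{Rel}(x_2)) / (r^2 + 1)$ with $r = \mathrm{Corr}(x_1, x_2)$. The function name is hypothetical.

```python
def product_reliability(rel_x1, rel_x2, corr_x1_x2):
    """Reliability of the product x1*x2 of two centered observed variables.

    Assumed formula (centered, bivariate normal variables):
    (r**2 + rel_x1 * rel_x2) / (r**2 + 1), with r the observed correlation.
    """
    r2 = corr_x1_x2 ** 2
    return (r2 + rel_x1 * rel_x2) / (r2 + 1.0)


# The two end points of the abscissa of Figure 1.
uncorrelated = product_reliability(0.5, 0.5, 0.0)
max_correlation = product_reliability(0.5, 0.5, 0.5)
```

Under this formula the product of two uncorrelated variables with reliability 0.5 has reliability 0.25, and the value stays below 0.5 over the whole admissible range of the correlation, which agrees with the description of Figure 1.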
The power of a significance test is lowered when the reliability of the observed variables decreases (Bollen, 1989; Kaplan, 1995). As unreliable measures severely limit the validity of conclusions that can be drawn from analyses using these measures, latent interaction models including appropriate indicator variables should be preferred.

Footnote 3: A 'correction for attenuation' (Aiken & West, 1991, p. 145 ff.) in multiple regression analysis has become obsolete with the development of structural equation models.
2.2 Nonlinearity of parameters
In covariance structure analysis of latent interaction models one of the main problems is the appropriate specification of a measurement model for the latent product term $\xi_1\xi_2$. Methods applying covariance structure analysis need to enhance the measurement equation (Equation 2) in order to get indicators for the latent product term $\xi_1\xi_2$. Kenny and Judd (1984) were the first to suggest using four product variables ($x_1x_3$, $x_1x_4$, $x_2x_3$, and $x_2x_4$) as a measurement model for the latent product term $\xi_1\xi_2$ in the structural equation (see Footnote 4). But the multiplication of indicator variables leads to nonlinear terms of the factor loadings and to complex error terms (see Schermelleh-Engel, Moosbrugger, Frank & Klein, 1996; Jöreskog & Yang, 1996), for example

$$x_2x_4 = \lambda_{21}\lambda_{42}\,\xi_1\xi_2 + \lambda_{21}\,\xi_1\delta_4 + \lambda_{42}\,\xi_2\delta_2 + \delta_2\delta_4 \qquad (5)$$
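Equation 5 follows from multiplying the two measurement equations $x_2 = \lambda_{21}\xi_1 + \delta_2$ and $x_4 = \lambda_{42}\xi_2 + \delta_4$. The short check below, with hypothetical loading values, confirms the expansion numerically and shows where the nonlinear loading $\lambda_{21}\lambda_{42}$ and the three-part error term come from.

```python
import numpy as np

rng = np.random.default_rng(1)
lam21, lam42 = 0.6, 0.7                      # hypothetical factor loadings
xi1, xi2, delta2, delta4 = rng.normal(size=(4, 5))

# Indicators according to the measurement model (Equation 2).
x2 = lam21 * xi1 + delta2
x4 = lam42 * xi2 + delta4

# Right-hand side of Equation 5: loading product plus composite error term.
signal = lam21 * lam42 * xi1 * xi2
error = lam21 * xi1 * delta4 + lam42 * xi2 * delta2 + delta2 * delta4
expansion = signal + error
```

The loading of the product indicator is a product of two free parameters, and its error term mixes latent variables with measurement errors, which is why the specification requires nonlinear parameter constraints.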
For that reason, Hayduk (1987) had to implement a great number of phantom variables (Rindskopf, 1983, 1984) to handle the various parameter constraints resulting from nonlinearity in LISREL 7 (Jöreskog & Sörbom, 1989). This complex specification task has become obsolete with the development of LISREL 8 (Jöreskog & Sörbom, 1993), which allows for nonlinear restrictions of the parameters to implement the elementary latent interaction model. Although modeling nonlinear parameter terms can be regarded as solved in covariance structure analysis, other major problems still remain.
2.3 Mean structures
Methods analyzing covariance matrices in structural equation modeling (e.g. LISREL) usually assume that the observed and latent variables are centered variables, i.e. variables given in mean deviation form. But even if $\xi_1$ and $\xi_2$ have zero expectations, the product term $\xi_1\xi_2$ does not have zero expectation in general. This fact was not considered in early LISREL implementations (e.g. Hayduk, 1987). The misspecification was that the latent product term $\xi_1\xi_2$, the indicator $y$ of the endogenous variable, and the products of indicators formed in the measurement model of $\xi_1\xi_2$ were treated as if they were centered variables.

To illustrate this misspecification problem, consider a structural equation model with centered predictor and moderator variables ($E(\xi_1) = E(\xi_2) = 0$), a criterion variable $\eta$ and a latent product term $\xi_1\xi_2$ (Equation 1). As can be easily verified, $\xi_1\xi_2$ has zero expectation only if the latent variables $\xi_1$ and $\xi_2$ are uncorrelated:

$$E(\xi_1\xi_2) = E(\xi_1)\,E(\xi_2) + \mathrm{Cov}(\xi_1, \xi_2) = \mathrm{Cov}(\xi_1, \xi_2) \qquad (6)$$
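Under the assumption $E(\zeta) = 0$, it follows from Equations 1 and 6 for the expectation of $\eta$:

$$\begin{aligned} E(\eta) &= E(\alpha + \gamma_1\xi_1 + \gamma_2\xi_2 + \gamma_3\xi_1\xi_2 + \zeta) \\ &= \alpha + \gamma_1 E(\xi_1) + \gamma_2 E(\xi_2) + \gamma_3 E(\xi_1\xi_2) + E(\zeta) \\ &= \alpha + \gamma_3 E(\xi_1\xi_2) \\ &= \alpha + \gamma_3\,\mathrm{Cov}(\xi_1, \xi_2) \end{aligned} \qquad (7)$$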
Footnote 4: Jöreskog and Yang (1997) stated that the elementary latent interaction model is already identified with one product variable only.
If $\eta$ is measured by one indicator variable $y$ only ($y = \eta$), the expectation of $y$ equals the intercept term $\alpha$ plus the covariance of $\xi_1$ and $\xi_2$ weighted by $\gamma_3$. To avoid misspecification, the constant intercept term $\alpha$ and the expectation of the latent product term $\xi_1\xi_2$ (in LISREL: $E(\xi_1\xi_2) = \kappa_3$) need to be estimated in LISREL models (Jöreskog & Yang, 1996). Ignoring these facts can lead to errors in the estimates.
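A small simulation can illustrate Equation 7. This is a sketch under assumed parameter values (only the covariance matrix of the latent variables is taken from the example the text quotes from Jöreskog and Yang, 1996): although $\xi_1$, $\xi_2$ and $\zeta$ all have zero expectation, the mean of $\eta$ is shifted away from the intercept by $\gamma_3\,\mathrm{Cov}(\xi_1, \xi_2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000

# Latent covariance matrix from the example quoted in the text;
# intercept, slopes and disturbance variance are hypothetical.
phi = np.array([[0.49, 0.2352], [0.2352, 0.64]])
alpha, g1, g2, g3 = 1.0, 0.2, 0.4, 0.7

xi = rng.multivariate_normal([0.0, 0.0], phi, size=n)
zeta = rng.normal(0.0, np.sqrt(0.2), size=n)
eta = alpha + g1 * xi[:, 0] + g2 * xi[:, 1] + g3 * xi[:, 0] * xi[:, 1] + zeta

# Equation 7: E(eta) = alpha + gamma3 * Cov(xi1, xi2).
expected_mean = alpha + g3 * phi[0, 1]
sample_mean = eta.mean()
```

Treating `eta` (and hence `y`) as a centered variable would therefore confound the intercept with the interaction effect whenever the latent predictors are correlated.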
2.4 Variable transformation problems
Another peculiarity of interaction models which might be important for the interpretation of the structural model derives from the behavior of the model under linear transformation of latent variables (see Aiken & West, 1991). It is well-known that in linear structural equation models (without product terms) a translation of the predictor scales (i.e., adding constants to predictor scores) does not affect the structural parameters $\gamma_i$, and it does not affect the path diagram's structure given by the structural equation. However, this structural invariance is generally false for interaction models. Transformation by adding constants has a profound effect on structural coefficients. To demonstrate this problem, we define the following linear transformation of the predictor variables $\xi_1$ and $\xi_2$ under the assumption of $\gamma_3$ being different from zero:

$$\xi_1' := \xi_1 + \frac{\gamma_2}{\gamma_3}, \qquad \xi_2' := \xi_2 + \frac{\gamma_1}{\gamma_3}. \qquad (8)$$
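Then the structural equation (Equation 1) of the interaction model can be rewritten as follows:

$$\begin{aligned} \eta &= \alpha + \gamma_1\xi_1 + \gamma_2\xi_2 + \gamma_3\xi_1\xi_2 + \zeta \\ &= \alpha + \gamma_1\Big(\xi_1' - \frac{\gamma_2}{\gamma_3}\Big) + \gamma_2\Big(\xi_2' - \frac{\gamma_1}{\gamma_3}\Big) + \gamma_3\Big(\xi_1' - \frac{\gamma_2}{\gamma_3}\Big)\Big(\xi_2' - \frac{\gamma_1}{\gamma_3}\Big) + \zeta \\ &= \alpha + \gamma_1\Big(\xi_1' - \frac{\gamma_2}{\gamma_3}\Big) + \gamma_2\Big(\xi_2' - \frac{\gamma_1}{\gamma_3}\Big) + \gamma_3\Big(\xi_1' - \frac{\gamma_2}{\gamma_3}\Big)\xi_2' - \gamma_3\Big(\xi_1' - \frac{\gamma_2}{\gamma_3}\Big)\frac{\gamma_1}{\gamma_3} + \zeta \\ &= \alpha + \gamma_2\Big(\xi_2' - \frac{\gamma_1}{\gamma_3}\Big) + \gamma_3\Big(\xi_1' - \frac{\gamma_2}{\gamma_3}\Big)\xi_2' + \zeta \\ &= \alpha + \gamma_2\xi_2' - \gamma_2\frac{\gamma_1}{\gamma_3} + \gamma_3\xi_1'\xi_2' - \gamma_3\frac{\gamma_2}{\gamma_3}\xi_2' + \zeta \\ &= \alpha - \gamma_2\frac{\gamma_1}{\gamma_3} + \gamma_3\xi_1'\xi_2' + \zeta \\ &= \alpha' + \gamma_3\xi_1'\xi_2' + \zeta, \end{aligned} \qquad (9)$$

where $\alpha' = \alpha - \gamma_2\,\gamma_1/\gamma_3$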
is the intercept term of the transformed equation. As can be seen in the bottom line of Equation 9, the transformed structural equation now lacks regression parameters for the linear effects of the predictors $\xi_1'$ and $\xi_2'$ ($\gamma_1' = \gamma_2' = 0$). The effect of $\xi_1$ on $\eta$ and of $\xi_2$ on $\eta$ is now being evaluated at different zero points of the predictors than was originally the case. In our example, we chose zero points of $\xi_1'$ and $\xi_2'$ in such a way that the effect on $\eta$ evaluated at these points became zero (see Footnote 5).

A common linear transformation of the latent variables is mean centering, which is often done in LISREL models. As can be seen from Equation 10, mean centering of the latent variables $\xi_1$ and $\xi_2$ ($\xi_1'' = \xi_1 - \mu_1$, $\xi_2'' = \xi_2 - \mu_2$) in a structural equation with a latent interaction effect $\gamma_3$ leads to different values of $\alpha''$, $\gamma_1''$ and $\gamma_2''$, but it does not alter $\gamma_3$:

$$\begin{aligned} \eta &= \alpha + \gamma_1\xi_1 + \gamma_2\xi_2 + \gamma_3\xi_1\xi_2 + \zeta \\ &= \alpha + \gamma_1(\xi_1'' + \mu_1) + \gamma_2(\xi_2'' + \mu_2) + \gamma_3(\xi_1'' + \mu_1)(\xi_2'' + \mu_2) + \zeta \\ &= \alpha + \gamma_1\mu_1 + \gamma_2\mu_2 + \gamma_3\mu_1\mu_2 + (\gamma_1 + \gamma_3\mu_2)\,\xi_1'' + (\gamma_2 + \gamma_3\mu_1)\,\xi_2'' + \gamma_3\xi_1''\xi_2'' + \zeta \\ &= \alpha'' + \gamma_1''\xi_1'' + \gamma_2''\xi_2'' + \gamma_3\xi_1''\xi_2'' + \zeta \end{aligned} \qquad (10)$$

with intercept $\alpha'' = \alpha + \gamma_1\mu_1 + \gamma_2\mu_2 + \gamma_3\mu_1\mu_2$, and structural parameters $\gamma_1'' = \gamma_1 + \gamma_3\mu_2$ and $\gamma_2'' = \gamma_2 + \gamma_3\mu_1$. As the latter translation (Equation 10) illustrates, the moderator effect $\gamma_3$ is again independent of any translation of $\xi_1$ and $\xi_2$.

In a structural equation with a latent interaction effect, the parameters $\gamma_1$ and $\gamma_2$ do not represent constant effects of the latent variables. In contrast to structural equation models without latent interaction terms, the structural parameters $\gamma_1$ and $\gamma_2$ are not independent of translations of the latent variables, whereas the latent interaction effect $\gamma_3$ is unaffected by the scale translation. Therefore, the parameters $\gamma_1$ and $\gamma_2$ must be interpreted in relation to the scaling chosen for the latent variables $\xi_1$ and $\xi_2$. Again, one should not interpret the parameters $\gamma_1$ and $\gamma_2$ on their own, but interpret the way in which the linear relationship between $\eta$ and $\xi_1$ is moderated by $\xi_2$ (see Footnote 1).

Footnote 5: If the variables $\xi_2$ and $\xi_2'$ are interpreted as moderator variables, the original moderating function $(\gamma_1 + \gamma_3\xi_2)$ is transformed into $(\gamma_3\xi_2')$, so the transformation implies a shifting of the moderating function.

Table 1: Distributional characteristics of the latent product term $\xi_1\xi_2$ vs. a normally distributed latent variable with same expectation and variance.

| Distributional characteristic | Product term $\xi_1\xi_2$ | Normally distributed variable |
|---|---|---|
| Expectation | 0.235 | 0.235 |
| Variance | 0.369 | 0.369 |
| Skewness | 1.999 | 0.000 |
| Kurtosis (centered) | 6.187 | 0.000 |
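The translation argument of Section 2.4 (Equations 8 to 10) can be checked numerically. The helper below is an illustrative sketch with a hypothetical name and hypothetical coefficient values: it returns the coefficients of the structural equation after the predictors are shifted by arbitrary constants, following the algebra of Equation 10.

```python
import numpy as np


def translate(alpha, g1, g2, g3, c1, c2):
    """Coefficients after replacing xi1, xi2 by xi1 - c1, xi2 - c2.

    Follows the algebra of Equation 10 with (c1, c2) in place of the means.
    """
    alpha_new = alpha + g1 * c1 + g2 * c2 + g3 * c1 * c2
    g1_new = g1 + g3 * c2
    g2_new = g2 + g3 * c1
    return alpha_new, g1_new, g2_new, g3  # the interaction effect is unchanged


alpha, g1, g2, g3 = 1.0, 0.2, 0.4, 0.7       # hypothetical values

# Arbitrary shift: the linear coefficients change, gamma3 does not.
shifted = translate(alpha, g1, g2, g3, c1=0.5, c2=-1.0)

# The shift of Equation 8 (xi1' = xi1 + g2/g3, xi2' = xi2 + g1/g3).
eq8 = translate(alpha, g1, g2, g3, c1=-g2 / g3, c2=-g1 / g3)

# Both parametrisations predict the same eta (disturbance omitted).
rng = np.random.default_rng(0)
xi1, xi2 = rng.normal(size=(2, 10))
eta_original = alpha + g1 * xi1 + g2 * xi2 + g3 * xi1 * xi2
a, b1, b2, b3 = shifted
s1, s2 = xi1 - 0.5, xi2 + 1.0
eta_shifted = a + b1 * s1 + b2 * s2 + b3 * s1 * s2
```

With the shift of Equation 8 both linear coefficients vanish and the intercept becomes $\alpha - \gamma_1\gamma_2/\gamma_3$, while any other shift merely changes $\gamma_1$ and $\gamma_2$; in every case the predicted values and $\gamma_3$ are the same.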
2.5 Non-normality of variables: skewness and kurtosis
A latent interaction model involves a product term in the structural equation (Equation 1). Even if all indicators of the latent exogenous variables and the latent variables $\xi_1$ and $\xi_2$ themselves are normally distributed, the distribution of the product term $\xi_1\xi_2$ is not normal. Figure 2 illustrates the density of the product term $\xi_1\xi_2$ vs. the density of a normally distributed variable with the same expectation and variance. The variables $\xi_1$ and $\xi_2$ follow a bivariate normal distribution with zero expectation and covariance matrix $\Phi$. The parameter values of $\Phi$ are chosen from an example of the elementary latent interaction model given by Jöreskog and Yang (1996, p. 74): $\mathrm{Var}(\xi_1) = 0.49$, $\mathrm{Cov}(\xi_1, \xi_2) = 0.2352$, and $\mathrm{Var}(\xi_2) = 0.64$. The skewness and kurtosis of the densities in Figure 2 are listed in Table 1.

The distribution of the latent product term $\xi_1\xi_2$ in Figure 2 shows considerable skewness and kurtosis, indicating its deviation from normality. In contrast to the non-normal density, a normally distributed variable with the same expectation and variance has skewness zero and centered kurtosis zero. The deviation from normality will be even more extreme if the covariance of $\xi_1$ and $\xi_2$ increases. As the product term $\xi_1\xi_2$ is one component of the structural equation, this indicates that the endogenous variable $\eta$ cannot be normally distributed either. The degree of non-normality in the distribution of $\eta$ depends on the non-normality of $\xi_1\xi_2$, the size of the interaction effect $\gamma_3$, and on the variance of the disturbance term $\zeta$ in relation to the variance of $\eta$. For a similar reason, the distribution of the indicator $y$ as well as the indicators of the latent product term of the Kenny-Judd model deviate substantially from normality. Methods are needed that explicitly take the non-normal distribution of the indicator variables into account.

Figure 2: Density of latent product term $\xi_1\xi_2$ vs. density of a normally distributed variable with equal expectation (0.235) and variance (0.369).
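The expectation and variance reported for the product term can be reproduced from the covariance matrix alone. The sketch below relies on a standard result for centered bivariate normal variables that the text does not spell out, namely $\mathrm{Var}(\xi_1\xi_2) = \mathrm{Var}(\xi_1)\mathrm{Var}(\xi_2) + \mathrm{Cov}(\xi_1, \xi_2)^2$, and adds a simulation as a cross-check; variable names are illustrative.

```python
import numpy as np

# Covariance values of the example from Joreskog and Yang (1996, p. 74).
phi11, phi22, phi12 = 0.49, 0.64, 0.2352

# Moments of the product of two centered bivariate normal variables.
mean_product = phi12                          # Equation 6
var_product = phi11 * phi22 + phi12 ** 2      # standard result, assumed here

# Cross-check by simulation.
rng = np.random.default_rng(0)
xi = rng.multivariate_normal([0.0, 0.0],
                             [[phi11, phi12], [phi12, phi22]], size=400_000)
product = xi[:, 0] * xi[:, 1]
centered = product - product.mean()

# A positive sample skewness indicates the long right tail seen in Figure 2.
sample_skewness = (centered ** 3).mean() / product.std() ** 3
```

Rounded to three decimals, `mean_product` and `var_product` give the expectation and variance quoted in the caption of Figure 2, and the simulated product shows the asymmetry that a normal variable with the same two moments lacks.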
3 Methods developed for the analysis of latent interaction models

The methodological problems of latent interaction models arising from measurement error of indicator variables, nonlinearity of the parameters, and misspecified mean structures (see Sections 2.1 - 2.3) are general problems that can be solved quite easily with methods developed for the analysis of latent moderator models, while variable transformation problems (see Section 2.4) are independent of the method used, as these problems deal with the correct interpretation of interaction effects. But all methods have to deal with the problem of multivariate non-normality of the indicator variables (see Section 2.5).

For the methods developed until now, this distribution problem can lead to a dilemma: Either, if one uses an estimation procedure which assumes the indicator variables to be normally distributed (as LISREL-ML does), this procedure might not be robust against the non-normality of interaction models. Or, using an asymptotically distribution-free estimation procedure (as LISREL-WLSA or 2SLS), this procedure might yield inefficient parameter estimates, because distribution-free methods do not exploit the specific distributions induced by product terms.

Table 2 gives an overview of recently developed methods and their estimation characteristics. For an evaluation of these methods, the results of simulation studies (if conducted) were examined regarding bias of parameter estimates, fulfillment of normality assumptions, and utilization of non-normal distributions. The listed approaches can be divided into three groups: Estimation methods with the assumption of normally distributed indicator variables, distribution-free methods, and the LMS method tailored for non-normal distributions induced by interaction models.

Table 2: Three groups of methods developed for the analysis of latent interaction models. Estimation methods with the assumption of normally distributed indicator variables, distribution-free methods, and the LMS method tailored for non-normal distributions.

| Method | Evaluation studies | Bias of parameter estimators | Fulfillment of distributional assumptions | Utilization of non-normal distributions |
|---|---|---|---|---|
| LISREL 7-ML | Hayduk (1987) | not investigated | no | no |
| LISREL 7-ML (2-Step-T.) | Moosbrugger, Frank and Schermelleh-Engel (1991); Ping (1996) | not investigated | no | no |
| LISREL 8-ML | Jaccard and Wan (1995); Jöreskog and Yang (1996, 1997); Schermelleh-Engel, Klein and Moosbrugger (1998); Yang Jonsson (1997) | unbiased | no | no |
| LME-ULS | Moosbrugger, Klein, Frank and Schermelleh-Engel (1993, 1996) | asymptot. unbiased | yes | no |
| PLS | Chin, Marcolin and Newsted (1996) | asymptot. unbiased | yes | no |
| BALAM (Bayes) | Wittenberg and Arminger (1997) | not investigated | yes | no |
| LISREL 8-WLS | Jöreskog and Yang (1996); Yang Jonsson (1997) | asymptot. unbiased | yes | no |
| LISREL 8-WLSA | Jöreskog and Yang (1996, 1997); Schermelleh-Engel, Klein and Moosbrugger (1998); Yang Jonsson (1997) | asymptot. unbiased | yes | no |
| 2SLS | Bollen (1995, 1996); Schermelleh-Engel, Klein and Moosbrugger (1998) | unbiased | yes | no |
| LMS-ML | Klein, Moosbrugger, Schermelleh-Engel and Frank (1997); Klein and Moosbrugger (submitted); Schermelleh-Engel, Klein and Moosbrugger (1998) | unbiased | yes | yes |

Hayduk (1987) was the first to implement the elementary latent interaction model (Kenny & Judd, 1984) in LISREL 7 using the maximum likelihood estimation method based on the normality assumption. The implementation was quite complex and required the specification of many additional phantom variables (Rindskopf, 1983, 1984) to obtain estimates of the nonlinear loading terms and error variances. A simplification of this method using two-step techniques was proposed by Moosbrugger, Frank and Schermelleh-Engel (1991) as well as by Ping (1996), but all approaches based on LISREL 7 have become obsolete with the development of LISREL 8 (Jöreskog & Sörbom, 1993) allowing nonlinear parameter constraints. Still, the implementation of the latent interaction model is complex even for the elementary latent interaction model, and needs a careful derivation of nonlinear parameter terms.

The main problem of applying the LISREL-ML approach to latent interaction models is the violation of distributional assumptions. Using LISREL-ML with non-normal variables leads to serious underestimation of standard errors (a correction proposed by Jöreskog and Yang, 1996, resulted in overestimation of the standard errors) and biased chi-square values (Klein, Moosbrugger, Schermelleh-Engel & Frank, 1997; Jöreskog & Yang, 1996, 1997; Schermelleh-Engel, Klein & Moosbrugger, 1998). Despite these problems LISREL 8-ML may be used in cases where the interaction effect is not too high and sample size is not too small.

On the other hand, the distribution-free methods in the second part of Table 2 are not based on the normality assumption, but their estimators are only asymptotically unbiased (with the exception of 2SLS which produces unbiased estimates).
Although there is no violation of distributional assumptions, these methods do not exploit the type of non-normal distributions induced by interaction effects (Bollen, 1995, 1996; Chin, Marcolin & Newsted, 1996; Moosbrugger, Klein, Frank & Schermelleh-Engel, 1996; Jöreskog & Yang, 1996, 1997; Wittenberg & Arminger, 1997; Yang Jonsson, 1997). Simulation studies are needed to examine efficiency and statistical power of distribution-free methods for finite samples.

The only Bayesian approach developed for the analysis of latent interaction models is BALAM (Bayesian Analysis of Latent Variable Models) by Wittenberg and Arminger (1997). The posterior distributions of the parameters are estimated using the Gibbs sampler and the Metropolis-Hastings algorithm, while the computations are performed with the GAUSS program system BALAM. Preliminary results of a simulation study with sample size N = 100 show biased parameter estimates and relatively large standard errors. Further investigation is needed to get information about bias and efficiency of parameter estimates in larger samples.

LISREL 8-WLS and LISREL 8-WLSA are least squares approaches. WLS (weighted least squares) analyzes the covariance matrix, WLSA (weighted least squares based on the augmented moment matrix) the augmented moment matrix, i.e., the covariance matrix augmented by the mean vector of indicator variables. Simulation studies (Jöreskog & Yang, 1996; Yang Jonsson, 1997) show that standard errors of parameter estimates and chi-square statistics are prone to error for WLS because WLS uses an incorrect weight matrix, whereas WLSA uses the correct weight matrix and therefore provides asymptotically correct standard errors and chi-square values (Jöreskog & Yang, 1996, p. 64). But our own simulation studies (Schermelleh-Engel, Klein & Moosbrugger, 1998; Klein, in prep.)
show that for small to medium sample size ($N \le 400$) the efficiency of the LISREL-WLSA estimates is relatively low compared to LMS-ML (Latent Moderated Structural Equations; Klein & Moosbrugger, submitted; Klein, in prep.). In addition to that, the standard errors are often grossly underestimated in LISREL-WLSA applications, and they cannot be interpreted easily for inferential statistics and confidence intervals because of large type I errors. These results confirm Yang Jonsson's statement that LISREL-WLSA should only be used with large samples (Yang Jonsson, 1997, p. 94).

Least squares methods analyzing the raw scores instead of the covariance or augmented moment matrix are LME-ULS (Latent Moderated Effects – Unweighted Least Squares; Moosbrugger, Klein, Frank & Schermelleh-Engel, 1996), PLS (Partial Least Squares; Chin, Marcolin & Newsted, 1996), and 2SLS (Two-Stage Least Squares; Bollen, 1995, 1996). 2SLS outperforms LME-ULS and PLS as well as LISREL-WLSA by providing unbiased parameter estimates (see Table 2). Although the performance of the 2SLS method has not yet been fully investigated for interaction models, we demonstrated in a simulation study of medium sample sizes (Schermelleh-Engel, Klein & Moosbrugger, 1998) that the 2SLS estimator of the interaction effect $\gamma_3$ is unbiased and that 2SLS yields unbiased, but very high, standard errors of the estimates. The disadvantage of 2SLS lies in its low power and low efficiency relative to LMS-ML, but also relative to LISREL-ML and LISREL-WLSA. In practice, the application of 2SLS requires large sample sizes to provide estimates of satisfying precision for interaction models.

To summarize Table 2, only LISREL-ML, 2SLS, and LMS-ML (Latent Moderated Structural Equations, see Section 4) are known to provide unbiased parameter estimates. But LMS-ML is the only maximum likelihood estimation method that explicitly utilizes the non-normal distributions. The LMS maximum likelihood estimation method has the preferable large sample properties given for ML estimates (see Section 4). Additionally, the finite sample properties were investigated: LMS provides efficient parameter estimates and yields unbiased standard error estimates for inferential statistics (Schermelleh-Engel, Klein & Moosbrugger, 1998). The new LMS estimation method is explained in more detail in the following section.
4 Latent Moderated Structural Equations (LMS): A new approach to the analysis of latent interaction effects
With the development of Latent Moderated Structural Equations (LMS; Klein, Moosbrugger, Schermelleh-Engel & Frank, 1997; Klein & Moosbrugger, submitted; Klein, in prep.; see Footnote 6) there now exists a method that utilizes the type of multivariate non-normal distribution in latent interaction models. Because the density function of the indicator variables cannot be maximized directly for the model parameters, LMS uses the EM algorithm (Expectation-Maximization algorithm; Dempster, Laird & Rubin, 1977). The EM algorithm can be adapted to non-normal distributions and yields maximum likelihood estimates of the model parameters. These estimates are consistent, unbiased and efficient. We give only an outline of the new LMS-ML estimation technique; a detailed description will be presented elsewhere (Klein & Moosbrugger, submitted), where a practical example for an LMS analysis is also discussed.

LMS is based on an analysis of the multivariate density function of the indicator variables, which takes the non-normal distribution of product terms explicitly into account. The result of the density analysis yields a representation of the distribution of the joint indicator vector as a finite mixture of normal distributions (see Footnote 7). This makes it possible to exploit the mathematical structure of the product term for the estimation procedure without loss of statistical information. By adaptation of the EM algorithm to the mixture distribution, LMS provides an iterative computation of maximum likelihood estimates for the model parameters.

The elements of the new method are explained with reference to the elementary latent interaction model (see Equations 1, 2, 3); a description for generalized latent interaction models with multiple interaction effects and more complex measurement models is given elsewhere (Klein & Moosbrugger, submitted). The model has 14 parameters, and we define the 14-dimensional parameter vector

$$\theta = \big(\alpha, \gamma_1, \gamma_2, \gamma_3, \mathrm{Var}(\xi_1), \mathrm{Var}(\xi_2), \mathrm{Cov}(\xi_1, \xi_2), \lambda_{21}, \lambda_{42}, \mathrm{Var}(\delta_1), \mathrm{Var}(\delta_2), \mathrm{Var}(\delta_3), \mathrm{Var}(\delta_4), \mathrm{Var}(\zeta)\big). \qquad (11)$$

Since the structural equation (Equation 1) includes the product term $\xi_1\xi_2$, the indicator variable $y$ does not follow a normal distribution (see Section 2.5). However, the conditioned variable $y \mid \xi_2$ ($\xi_2$ is the moderator variable) and the conditioned joint indicator vector $(x_1, x_2, x_3, x_4, y) \mid \xi_2$ still follow a normal distribution. The LMS approach takes advantage of the relations between unconditional and conditional distributions and approximates the multivariate density $f$ of the joint indicator vector $(\mathbf{x}, \mathbf{y}) = (x_1, x_2, x_3, x_4, y)$ by a finite mixture of multivariate normal densities (a detailed deduction of this analytical step is given in Klein & Moosbrugger, submitted):

$$f(\mathbf{x} = x, \mathbf{y} = y) = \sum_{j=1}^{M} \rho_j\, \varphi_{\mu_j, \Sigma_j}(x, y), \qquad (12)$$

Footnote 6: The LMS method is developed as part of the doctoral thesis of Andreas Klein.

Footnote 7: A survey of the application of mixture density concepts is given by Rost and Erdfelder (1996).
where $\rho_j$ ($j = 1, \ldots, M$) are the mixture probabilities and $\varphi_{\mu_j, \Sigma_j}(x, y)$ ($j = 1, \ldots, M$) are the densities of the normally distributed mixture components. The mixture probabilities $\rho_j$ are derived analytically, and they do not depend on the model parameters. The model implied mean vectors $\mu_j$ and covariance matrices $\Sigma_j$ of the densities $\varphi_{\mu_j, \Sigma_j}(x, y)$ are complex functions of the parameter vector $\theta$ (see Footnote 8). The number $M$ of terms in the finite mixture depends on the accuracy chosen for the approximation. For example, with regard to the elementary latent interaction model, simulations showed that a choice of $M = 16$ terms provides a sufficiently precise approximation of the density.

A general approach to the maximum likelihood estimation of parametric models with mixture densities is the application of the EM algorithm (for an overview see Redner & Walker, 1984). The general principle of this algorithm interprets the indicator vector as the incomplete part of a complete, not fully observable vector. By using the expected log-likelihood function of the complete vector, it produces ML estimates for the model in an iterative estimation procedure (Dempster, Laird & Rubin, 1977). An outline of the LMS implementation is given below.

The components of the mixture densities (Equation 12) are indexed from 1 to $M$. Consider a multinomially distributed variable $\mathbf{j}$ with $P(\mathbf{j} = j) = \rho_j$, $j = 1, \ldots, M$. Then each row of the $(N \times 5)$ data matrix (X, Y), i.e. $(x_i, y_i) = (x_{i1}, \ldots, x_{i4}, y_i)$, $i = 1, \ldots, N$, can be interpreted as the outcome of a two-step random process: In the first step, a component index $j_i$ is drawn as an outcome of the multinomially distributed random variable $\mathbf{j}$; in the second step, a data row $(x_i, y_i)$ is drawn from the component distribution with index $j_i$. Then, every data row $(x_i, y_i)$ can be treated as the incomplete part of a complete data row $(j_i, x_i, y_i)$, where $j_i$ indicates the component index drawn for the $i$-th data row.
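To make the idea of a finite normal mixture tangible, the sketch below builds one for the elementary model by conditioning on the moderator $\xi_2$ and replacing the integral over $\xi_2$ with a Gauss-Hermite quadrature rule. This is an illustration under our own assumptions, not the authors' derivation (which is given in Klein & Moosbrugger, submitted): the function names, the quadrature construction and all numeric parameter values are hypothetical choices. It uses NumPy's `hermegauss`, whose weight function is $\exp(-z^2/2)$, so the normalised weights play the role of mixture probabilities that do not depend on the model parameters.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def mixture_components(alpha, g1, g2, g3, phi11, phi12, phi22,
                       lam21, lam42, var_delta, var_zeta, M=16):
    """Weights, means and covariances of (x1, x2, x3, x4, y) given xi2."""
    nodes, w = hermegauss(M)              # nodes/weights for exp(-z**2 / 2)
    rho = w / np.sqrt(2.0 * np.pi)        # normalised: sums to one
    b = phi12 / phi22                     # regression of xi1 on xi2
    v = phi11 - phi12 ** 2 / phi22        # Var(xi1 | xi2)
    resid = np.diag(np.append(var_delta, var_zeta))
    mus, sigmas = [], []
    for z in nodes:
        c = np.sqrt(phi22) * z            # value of xi2 at this node
        slope = g1 + g3 * c               # moderating function at xi2 = c
        # Dependence of (x1, x2, x3, x4, y) on xi1 when xi2 is fixed at c.
        load = np.array([1.0, lam21, 0.0, 0.0, slope])
        const = np.array([0.0, 0.0, c, lam42 * c, alpha + g2 * c])
        mus.append(const + load * (b * c))
        sigmas.append(v * np.outer(load, load) + resid)
    return rho, np.array(mus), np.array(sigmas)


def mixture_density(row, rho, mus, sigmas):
    """Evaluate sum_j rho_j * N(row; mu_j, Sigma_j) as in Equation 12."""
    total = 0.0
    for r, mu, sig in zip(rho, mus, sigmas):
        d = row - mu
        _, logdet = np.linalg.slogdet(sig)
        quad = d @ np.linalg.solve(sig, d)
        total += r * np.exp(-0.5 * (quad + logdet + len(row) * np.log(2 * np.pi)))
    return total


# Hypothetical parameter values (latent covariances as in Section 2.5).
rho, mus, sigmas = mixture_components(
    1.0, 0.2, 0.4, 0.7, 0.49, 0.2352, 0.64,
    0.6, 0.7, [0.51, 0.64, 0.36, 0.51], 0.2)
density_at_origin = mixture_density(np.zeros(5), rho, mus, sigmas)
mixture_mean_y = (rho * mus[:, 4]).sum()
```

Every component is an ordinary multivariate normal density, yet the mixture as a whole reproduces the mean shift of Equation 7: `mixture_mean_y` equals $\alpha + \gamma_3\,\mathrm{Cov}(\xi_1, \xi_2)$ up to rounding error.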
Thus, every row $(x_i, y_i)$ of the data matrix (X, Y) is symbolically enhanced by the (unobserved) corresponding component label $j_i$, and $(j_i, x_i, y_i)$ is interpreted as a row of the complete $(N \times 6)$ data matrix (J, X, Y). The data matrices, densities and log-likelihood functions of the incomplete vector (x, y) and the complete vector (j, x, y) are given in Table 3. For the ML estimation of the parameter vector $\Theta$, the EM algorithm employs the log-likelihood function $L_\Theta(J, X, Y)$ in order to maximize $L_\Theta(X, Y)$ for the parameter vector $\Theta$. The procedure requires a vector $\Theta^{(0)}$ of starting values for the parameters and the observed data matrix (X, Y).

⁸ A detailed derivation of these functions in matrix notation is given by Klein and Moosbrugger (submitted).

Table 3: Data matrix, density function, and log-likelihood function for the incomplete and the complete vector.

Incomplete vector (observed)
  Data matrix (with sample size N):
  $$(X, Y) = \begin{pmatrix} x_{11}, \ldots, x_{14}, y_1 \\ \vdots \\ x_{i1}, \ldots, x_{i4}, y_i \\ \vdots \\ x_{N1}, \ldots, x_{N4}, y_N \end{pmatrix}$$
  Density function: $f(\mathbf{x} = x, \mathbf{y} = y) = \sum_{j=1}^{M} \rho_j \, \varphi_{\mu_j, \Sigma_j}(x, y)$
  Log-likelihood function: $L_\Theta(X, Y) = \sum_{i=1}^{N} \ln f(x_{i1}, \ldots, x_{i4}, y_i)$

Complete vector (not fully observed)
  Data matrix (with sample size N):
  $$(J, X, Y) = \begin{pmatrix} j_1, x_{11}, \ldots, x_{14}, y_1 \\ \vdots \\ j_i, x_{i1}, \ldots, x_{i4}, y_i \\ \vdots \\ j_N, x_{N1}, \ldots, x_{N4}, y_N \end{pmatrix}$$
  Density function: $g(\mathbf{j} = j, \mathbf{x} = x, \mathbf{y} = y) = \rho_j \, \varphi_{\mu_j, \Sigma_j}(x, y)$
  Log-likelihood function: $L_\Theta(J, X, Y) = \sum_{i=1}^{N} \ln g(j_i, x_{i1}, \ldots, x_{i4}, y_i)$

The iterative procedure starts from $\Theta^{(0)}$, updates the parameters of $\Theta$ in every cycle of the iteration and converges to a maximum likelihood estimate $\hat{\Theta}$ for $\Theta$. Each cycle consists of two subsidiary parts, the estimation step and the maximization step (see Figure 3). We illustrate the EM algorithm assuming the current value of the parameter vector to be $\Theta^{(p)}$. The component indices $j_i$ of the complete data matrix (J, X, Y) are unobserved. Thus, the log-likelihood function $L_\Theta(J, X, Y)$ cannot be applied directly, and the EM principle calculates in the estimation step a log-likelihood function $LEM_{\Theta^{(p+1)}}(X, Y)$ for the incomplete data, which is the conditional expectation of $L_{\Theta^{(p+1)}}(J, X, Y)$ under the condition of $\Theta^{(p)}$ and the observed data matrix (X, Y). As a result of the estimation step, we obtain a function $LEM_{\Theta^{(p+1)}}$ which depends on the parameter vector $\Theta^{(p+1)}$ only. In the maximization step we calculate the parameter vector $\Theta^{(p+1)}$ as the argument vector which maximizes the function $LEM_{\Theta^{(p+1)}}$. Then, we compare $\Theta^{(p+1)}$ with the previous parameter vector $\Theta^{(p)}$: If the difference between the two parameter vectors does not fulfill the convergence criterion, the iterative estimation proceeds using $\Theta^{(p+1)}$ as the current value $\Theta^{(p)}$ of the parameter vector. If the convergence criterion is fulfilled, the algorithm stops and $\Theta^{(p+1)}$ is regarded as an estimate of $\Theta$. Then, $\hat{\Theta} := \Theta^{(p+1)}$ maximizes the log-likelihood function $L_\Theta(X, Y)$ of the observed variables. As proved by the general theory of the EM principle described by Dempster et al. (1977), the iterative algorithm yields a maximum likelihood estimate for $\Theta$ with respect to the log-likelihood function $L_\Theta(X, Y)$ of the observed indicator variables. The non-normal distribution type of the indicator vector is explicitly taken into account while executing the estimation and maximization steps with the appropriate mixture density functions.
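The cycle described above can be sketched on a deliberately simple stand-in model. The code below is not LMS: it assumes a univariate normal mixture with known, fixed mixture probabilities (as in LMS, they are not estimated), unit component variances, and the component means as the only unknown parameters. All function names and numerical settings are hypothetical and chosen only for illustration. The data are generated by the two-step random process, and each EM cycle computes the conditional expectation over the unobserved component labels and then maximizes it in closed form.

```python
import math
import random


def normal_pdf(x, mean, var=1.0):
    """Density of a univariate normal distribution N(mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def draw_sample(n, rho, means, rng):
    """Two-step process: draw a component index, then a data value from it."""
    data = []
    for _ in range(n):
        j = rng.choices(range(len(rho)), weights=rho)[0]
        data.append(rng.gauss(means[j], 1.0))
    return data


def log_likelihood(data, rho, means):
    """Log-likelihood of the observed (incomplete) data under the mixture."""
    return sum(
        math.log(sum(r * normal_pdf(x, m) for r, m in zip(rho, means)))
        for x in data
    )


def em_fixed_weights(data, rho, start, tol=1e-8, max_iter=500):
    """EM for the component means of a mixture with fixed weights rho."""
    theta = list(start)
    history = [log_likelihood(data, rho, theta)]
    for _ in range(max_iter):
        # Expectation over the unobserved labels: posterior probability
        # of each component for each data value, given the current theta.
        resp_sum = [0.0] * len(rho)
        resp_x = [0.0] * len(rho)
        for x in data:
            joint = [r * normal_pdf(x, m) for r, m in zip(rho, theta)]
            total = sum(joint)
            for j, g in enumerate(joint):
                resp_sum[j] += g / total
                resp_x[j] += g / total * x
        # Maximization: closed-form maximizer of the expected
        # complete-data log-likelihood (responsibility-weighted means).
        new_theta = [resp_x[j] / resp_sum[j] for j in range(len(rho))]
        history.append(log_likelihood(data, rho, new_theta))
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta, history
```

A call such as `em_fixed_weights(draw_sample(2000, [0.3, 0.7], [-2.0, 2.0], random.Random(0)), [0.3, 0.7], [-1.0, 1.0])` returns the final parameter values together with the observed-data log-likelihood after each cycle, which should never decrease. In LMS the maximization step is harder, because the component means and covariance matrices are functions of the 14 model parameters rather than free parameters.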
Figure 3: Iterative estimation procedure of LMS-ML (a detailed description is given in the text). [Flowchart: the starting values $\Theta^{(0)}$ (p = 0) and the observed data (X, Y) enter the estimation step, which forms the conditional expectation of the log-likelihood function for the complete data, $LEM_{\Theta^{(p+1)}}(X, Y) := E(L_{\Theta^{(p+1)}}(J, X, Y) \mid \Theta^{(p)}, X, Y)$. The maximization step maximizes $LEM_{\Theta^{(p+1)}}(X, Y)$ for $\Theta^{(p+1)}$. If $\Theta^{(p+1)} \approx \Theta^{(p)}$ does not hold, p + 1 replaces p and the cycle is repeated; otherwise the procedure stops with $\hat{\Theta} := \Theta^{(p+1)}$.]

Unlike the implementation of latent interaction models under LISREL (Jöreskog & Yang, 1996, 1997; Yang Jonsson, 1997), LMS uses the full raw data information of the indicator variables for estimation, and no products of indicator variables are necessary. The large sample properties of the ML estimators computed by LMS are given by general ML estimation theory (Breiman, 1973): LMS estimators are consistent, asymptotically unbiased, asymptotically efficient, and asymptotically normally distributed. The finite sample properties of LMS estimators have been examined in simulation studies (Schermelleh-Engel, Klein & Moosbrugger, 1998), where sample sizes and interaction effect sizes were varied at different levels. The simulation results show that LMS estimators are unbiased and very efficient, even for small sample sizes (e.g. N = 200). The density function $f$ of the joint indicator vector (x, y) is explicitly derived and classified in LMS, and the standard errors are obtained from the Fisher information matrix. In the simulation studies, the LMS standard error estimates proved to be unbiased, which enables the calculation of correct confidence intervals for statistical inference.
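How standard errors and confidence intervals follow from the information about a parameter can be shown on a one-parameter toy model. The sketch below is an assumption-laden illustration, not the LMS computation: it uses the mean of a normal distribution with known variance, approximates the observed information by a numerical second derivative of the log-likelihood at the ML estimate, and forms a Wald interval. The paper does not say how LMS evaluates the Fisher information matrix, and all function names here are hypothetical.

```python
import math
import random


def log_likelihood(mu, data, var=1.0):
    """Log-likelihood of a normal sample with unknown mean mu, known variance."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mu) ** 2 / var
        for x in data
    )


def observed_information(loglik, theta_hat, h=1e-3):
    """Negative second derivative of loglik at theta_hat (central difference)."""
    second = (loglik(theta_hat + h) - 2.0 * loglik(theta_hat) + loglik(theta_hat - h)) / h ** 2
    return -second


def wald_interval(theta_hat, info, z=1.96):
    """Standard error from the inverse information, and a Wald interval."""
    se = math.sqrt(1.0 / info)
    return se, (theta_hat - z * se, theta_hat + z * se)
```

For this toy model the information equals N divided by the variance, so the standard error reduces to the familiar value of the standard deviation divided by the square root of N. For a parameter vector, the scalar inverse is replaced by the inverse of the information matrix, whose diagonal elements give the squared standard errors.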
5 Discussion

The analysis of interaction models is confronted with several methodological problems which have different consequences for the development of estimation methods. Measurement errors and reliability of indicator variables, nonlinear parameter relations, and the correct modeling of mean structures are chiefly model specification problems: With the development of covariance structure analysis programs for latent variable models, such as EQS or LISREL, these specification problems can be solved. The application of these methods requires forming products of indicators as a measurement model for the latent product term. Further, one needs a careful implementation of the correct parameter constraints for the model-implied covariance matrix and mean vector, because the estimation algorithms of these methods are based on a fit function minimizing the difference between the empirical and the model-implied covariance matrices and mean vectors.

For empirical research, it is important that the size of the structural parameters is interpreted carefully. In contrast to linear models, a characteristic property of interaction models is that the structural coefficients $\gamma_1$ and $\gamma_2$ (Equation 1) are not invariant when the latent exogenous variables are rescaled by translations. In contrast, the structural coefficient $\gamma_3$ of the latent interaction model is not affected by such a rescaling of the latent exogenous variables. The size of the structural coefficients can therefore only be interpreted in relation to the scales defined for the latent variables.

The major problem of latent interaction models is caused by the joint distribution of the observed variables, which deviates substantially from multivariate normality if interaction effects exist. This is even more the case with the application of covariance structure analysis, because these methods require products of indicator variables as a measurement model of the latent interaction term, and these products are non-normally distributed. For that reason, covariance structure analysis of interaction models with LISREL-ML violates the normality assumption of this method. If the interaction effect is not too high and the sample size is large, LISREL-ML provides consistent and unbiased parameter estimates, but the estimation of standard errors remains biased. Therefore, inferential statistics and confidence intervals based on the estimated standard errors are not reliable. Although LISREL-ML violates the assumption of normally distributed indicator variables, it could still be used for exploratory purposes, preliminary data analysis or the analysis of large samples.

Covariance structure analysis with the LISREL-WLSA method is asymptotically distribution free, so there is no violation of distributional assumptions. Still, it provides asymptotically unbiased and consistent parameter estimates. But despite these distributional properties there are drawbacks for the practical application of this method. 'Even with a good theory, one must have a very large sample to accurately estimate the asymptotic covariance matrix $W_a$ needed for WLSA' (Jöreskog & Yang, 1996, p. 85).
Our own simulation studies (Schermelleh-Engel, Klein & Moosbrugger, 1998) have shown that for small or medium sample sizes the efficiency of the LISREL-WLSA estimates is very low compared to LMS, and that the standard errors are still underestimated in LISREL-WLSA applications for medium sample sizes; therefore they cannot easily be interpreted for inferential statistics.

Alternatively, latent interaction models can be analyzed by a two-stage least squares estimation procedure, the 2SLS method (Bollen, 1995, 1996). 2SLS provides unbiased estimates of parameters and standard errors, so that confidence intervals calculated with the standard error estimates are reliable measures for inferential statistics. But as we have shown (see Schermelleh-Engel, Klein & Moosbrugger, 1998), the disadvantage of 2SLS lies in its low power and low efficiency, not only relative to LMS-ML but also relative to LISREL-ML and LISREL-WLSA. In practice, the application of 2SLS to interaction models requires a large sample size to compute estimates of satisfactory precision.

The LMS estimation method (Klein & Moosbrugger, submitted; Klein, in prep.) outperforms the other methods developed so far for the analysis of latent interaction models. This new method takes the distributional characteristics of non-normally distributed variables explicitly into account and implements an iterative ML estimation of the parameters. Simulation studies confirm that the LMS parameter estimates are consistent, unbiased and efficient, even for small or medium sample sizes (Klein, Moosbrugger, Schermelleh-Engel & Frank, 1997; Schermelleh-Engel, Klein & Moosbrugger, 1998). Moreover, the estimation of standard errors in LMS is unbiased, which allows hypothesis testing of interaction effects. For the analysis of empirical data,⁹ LMS assumes the indicators of the exogenous variables to be normally distributed, which should be verified before using LMS.
But regardless of the method used, interaction effects should not be analyzed routinely; in general there should be theoretical reasons for setting up a latent interaction model.

⁹ The LMS algorithm has been programmed, tested and verified for the elementary latent interaction model with one interaction effect and sample sizes of N ≥ 200. A test version of LMS restricted to the elementary latent interaction model is available from the authors upon request.
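The scale dependence of the structural coefficients noted in this discussion can be made explicit with a short derivation. The shift constants $c_1$ and $c_2$ and the starred variables are introduced here for illustration only. Let $\xi_1^* = \xi_1 + c_1$ and $\xi_2^* = \xi_2 + c_2$ be the translated latent exogenous variables. Substituting $\xi_1 = \xi_1^* - c_1$ and $\xi_2 = \xi_2^* - c_2$ into Equation 1 gives

$$
\begin{aligned}
\eta &= \alpha + \gamma_1 (\xi_1^* - c_1) + \gamma_2 (\xi_2^* - c_2) + \gamma_3 (\xi_1^* - c_1)(\xi_2^* - c_2) + \zeta \\
     &= (\alpha - \gamma_1 c_1 - \gamma_2 c_2 + \gamma_3 c_1 c_2) + (\gamma_1 - \gamma_3 c_2)\, \xi_1^* + (\gamma_2 - \gamma_3 c_1)\, \xi_2^* + \gamma_3\, \xi_1^* \xi_2^* + \zeta .
\end{aligned}
$$

The coefficients of $\xi_1^*$ and $\xi_2^*$ change to $\gamma_1 - \gamma_3 c_2$ and $\gamma_2 - \gamma_3 c_1$ whenever $\gamma_3 \neq 0$, whereas the coefficient of the product term remains $\gamma_3$.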
References
[1] Aiken, L. S. & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park: SAGE Publications.
[2] Arbuckle, J. L. (1997). AMOS Users' Guide Version 3.6. Chicago: SmallWaters Corporation.
[3] Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
[4] Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
[5] Bollen, K. A. (1995). Structural equation models that are nonlinear in latent variables. In P. V. Marsden (Ed.), Sociological methodology 1995 (Volume 25). Washington, DC: American Sociological Association.
[6] Bollen, K. A. (1996). An alternative two-stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61, 109-121.
[7] Breiman, L. (1973). Statistics. Boston: Houghton Mifflin Company.
[8] Busemeyer, J. R. & Jones, L. E. (1983). Analyses of multiplicative combination rules when the causal variables are measured with error. Psychological Bulletin, 93, 549-562.
[9] Chin, W. W., Marcolin, B. L. & Newsted, P. R. (1996). A partial least squares latent variable modeling approach for measuring interaction effects: Results from a Monte Carlo simulation study and voice mail emotion/adaptation study. In J. I. DeGross, S. Jarvenpaa & A. Srinivasan (Eds.), Proceedings of the Seventeenth International Conference on Information Systems (pp. 21-41), Cleveland, Ohio.
[10] Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Ser. B, 39, 1-38.
[11] Hayduk, L. A. (1987). Structural equation modeling with LISREL. Baltimore: Johns Hopkins University Press.
[12] Jaccard, J. & Wan, C. K. (1995). Measurement error in the analysis of interaction effects between continuous predictors using multiple regression: Multiple indicator and structural equation approaches. Psychological Bulletin, 117, 348-357.
[13] Jaccard, J. & Wan, C. K. (1996). LISREL approaches to interaction effects in multiple regression (Quantitative Applications in the Social Sciences No. 114). Thousand Oaks, CA: Sage.
[14] Jöreskog, K. G. & Sörbom, D. (1989). LISREL 7: A guide to the program and applications (2nd Ed.). Chicago, IL: SPSS.
[15] Jöreskog, K. G. & Sörbom, D. (1993). New features in LISREL 8. Chicago, IL: Scientific Software.
[16] Jöreskog, K. G. & Yang, F. (1996). Nonlinear structural equation models: The Kenny-Judd model with interaction effects. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 57-87). Mahwah, NJ: Lawrence Erlbaum Associates.
[17] Jöreskog, K. G. & Yang, F. (1997). Estimation of interaction models using the augmented moment matrix: Comparison of asymptotic standard errors. In W. Bandilla & F. Faulbaum (Eds.), SoftStat '97 (Advances in Statistical Software 6, pp. 467-478). Stuttgart: Lucius & Lucius.
[18] Kaplan, D. (1995). Statistical power in structural equation modeling. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 100-117). Thousand Oaks, CA: Sage Publications.
[19] Kenny, D. A. & Judd, C. M. (1984). Estimating the nonlinear and interactive effects of latent variables. Psychological Bulletin, 96, 201-210.
[20] Klein, A. (in prep.). Ein neues Verfahren zur Schätzung latenter Moderatoreffekte: Die LMS-Methode (A new approach to the estimation of latent interaction effects: The LMS method). Doctoral Thesis at J. W. Goethe-University, Frankfurt am Main, Department of Psychological Methodology.
[21] Klein, A. & Moosbrugger, H. Maximum likelihood estimation of latent interaction effects with the LMS method (manuscript submitted for publication).
[22] Klein, A., Moosbrugger, H., Schermelleh-Engel, K. & Frank, D. (1997). A new approach to the estimation of latent interaction effects in structural equation models. In W. Bandilla & F. Faulbaum (Eds.), SoftStat '97 (Advances in Statistical Software 6, pp. 479-486). Stuttgart: Lucius & Lucius.
[23] Moosbrugger, H., Frank, D. & Schermelleh-Engel, K. (1991). Zur Überprüfung von latenten Moderatoreffekten mit linearen Strukturgleichungsmodellen (Estimating latent interaction effects in structural equation models). Zeitschrift für Differentielle und Diagnostische Psychologie, 12, 245-255.
[24] Moosbrugger, H., Klein, A., Frank, D. & Schermelleh-Engel, K. (1993). On estimating parameters of latent moderator effects in structural equation models. Arbeiten aus dem Institut für Psychologie der J. W. Goethe-Universität, Heft 11/1993. Frankfurt am Main: Institut für Psychologie der J. W. Goethe-Universität.
[25] Moosbrugger, H., Klein, A., Frank, D. & Schermelleh-Engel, K. (1996). Zum Problem der Schätzung von latenten Moderatoreffekten (On the problem of estimating latent moderator effects). In R. Brandmaier & Ch. Rietz (Hrsg.), Methodische Grundlagen und Anwendungen von Strukturgleichungsmodellen (Band 2, S. 5-35). Mannheim: Forschung Raum und Gesellschaft e.V.
[26] Ping, R. A. (1996). Latent variable interaction and quadratic effect estimation: A two-step technique using structural equation analysis. Psychological Bulletin, 119, 166-175.
[27] Redner, R. A. & Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26, 195-239.
[28] Rindskopf, D. (1983). Parameterizing inequality constraints on unique variances in linear structural models. Psychometrika, 48, 73-83.
[29] Rindskopf, D. (1984). Using phantom and imaginary latent variables to parameterize constraints in linear structural models. Psychometrika, 49, 37-47.
[30] Rost, J. & Erdfelder, E. (1996). Mischverteilungsmodelle (Models with mixture distributions). In E. Erdfelder, R. Mausfeld, T. Meiser & G. Rudinger (Hrsg.), Quantitative Methoden. Weinheim: Psychologie Verlags Union.
[31] Schermelleh-Engel, K., Moosbrugger, H., Frank, D. & Klein, A. (1996). Grundlagen und Probleme von latenten Moderatoreffekten in Strukturgleichungsmodellen: Ein Vergleich von LISREL und LME (Fundamental principles and problems of latent interaction effects in structural equation models: A comparison of LISREL and LME). Arbeiten aus dem Institut für Psychologie der J. W. Goethe-Universität, Heft 5/1996. Frankfurt am Main: Psychologisches Institut der J. W. Goethe-Universität.
[32] Schermelleh-Engel, K., Klein, A. & Moosbrugger, H. (1998). Estimating nonlinear effects using a Latent Moderated Structural Equations Approach. In R. E. Schumacker & G. A. Marcoulides (Eds.), Interaction and nonlinear effects in structural equation modeling. Mahwah, NJ: Lawrence Erlbaum Associates (in press).
[33] Schumacker, R. E. & Lomax, R. G. (1996). A beginner's guide to structural equation modeling. Mahwah, NJ: Lawrence Erlbaum Associates.
[34] Steyer, R. (1994). Stochastische Modelle. In W. H. Tack & T. Herrmann (Hrsg.), Enzyklopädie der Psychologie, Themenbereich B: Methodologie und Methoden, Serie I: Forschungsmethoden der Psychologie, Band 1: Methodologische Grundlagen der Psychologie (S. 649-693). Göttingen: Hogrefe.
[35] West, S. G., Finch, J. F. & Curran, P. J. (1995). Structural equation models with non-normal variables. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56-75). Thousand Oaks, CA: Sage Publications.
[36] Wittenberg, J. & Arminger, G. (1997). Bayesian nonlinear latent variable models - specification and estimation with the program system BALAM. In W. Bandilla & F. Faulbaum (Eds.), SoftStat '97 (Advances in Statistical Software 6, pp. 487-494). Stuttgart: Lucius & Lucius.
[37] Yang Jonsson, F. (1997). Nonlinear structural equation models: Simulation studies of the Kenny-Judd model. Uppsala: University.