Rehabilitation Psychology 2008, Vol. 53, No. 3, 340–356

Copyright 2008 by the American Psychological Association 0090-5550/08/$12.00 DOI: 10.1037/a0013039


An Introduction to Using Structural Equation Models in Rehabilitation Psychology

Rebecca Weston, Southern Illinois University
Paul A. Gore, Jr., University of Utah
Fong Chan, University of Wisconsin–Madison
Denise Catalano, University of North Texas

Objective: To provide an overview of structural equation modeling (SEM) using an example drawn from the rehabilitation psychology literature. Design: To illustrate the 5 steps in SEM (model specification, identification, estimation methods, interpretation of results, and model modification), an example is presented, with details on determining whether alternative models result in a significant improvement in fit to the observed data. Data are from a sample of 274 people with spinal cord injury. Issues commonly encountered in preparing data for SEM analyses (e.g., missing data, nonnormality) are reviewed, as is the debate surrounding some aspects of SEM (e.g., acceptable sample size). Conclusion: SEM can be a powerful procedure for empirically representing complex and sophisticated theoretical models of interest to rehabilitation psychologists.

Keywords: structural equation modeling, covariance structure analysis, statistical guide

The use of structural equation modeling (SEM) in rehabilitation psychology has increased in the last decade. In our review of two complete volumes (2006–2007) of Rehabilitation Psychology, we found 11 articles in which some form of SEM analysis was used to test the research hypothesis, suggesting that SEM is increasingly being used by rehabilitation psychology researchers to empirically test complex theories (Chan, Chan, Siu, & Poon, 2007; Chronister & Chan, 2006; Corrigan, Larson, & Kuwabara, 2007; Elliott, Bush, & Chen, 2006; Harkins, Elliott, & Wan, 2006; Hui, Elliott, Shewchuk, & Rivera, 2007; Lee, Chan, & Berven, 2007; Motl, McAuley, & Snook, 2007; Motl, Snook, McAuley, Scott, & Gliottoni, 2007; Suzuki, Krahn, McCarthy, & Adams, 2007; Ziegelmann, Luszczynska, Lippke, & Schwarzer, 2007). Perhaps one of the greatest advantages of SEM over other techniques is its ability to more accurately represent constructs through the use of multiple measures. Using a single measure of a construct restricts the generalizability of results, whereas using multiple measures of a construct decreases the limitations associated with measurement error and incomplete representation of the construct.

The goal of this article is to provide an overview of SEM, illustrating the advantages of using multiple measures versus single measures in theory testing. Our example data and model are drawn from a study conducted by Catalano (2006) to validate Kumpfer's (1999) resiliency framework model for predicting life adaptation in a sample of 274 people with spinal cord injury. Most SEM guides outline the following five steps in modeling: model specification, identification, estimation, evaluation of fit, and modification (Kline, 2005; Schumacker & Lomax, 2004). This overview discusses each of the steps in testing a structural equation model, using the Catalano (2006) data as an example. Table 1 includes a guide to commonly used terms in SEM.

The example model shown in Figure 1 includes six constructs drawn from Kumpfer's (1999) resiliency framework model. Resiliency, in this model, implies the ability to demonstrate competence in the context of significant challenges to adaptation or development (Masten & Coatsworth, 1998). According to the model, adaptation by individuals to an initial stressor results from a developmental and interactive process involving environmental context (e.g., social support), transactional processes between individuals and their environment (e.g., coping strategies), individual characteristics (e.g., personal strength, optimism, tenacity), and resiliency processes, which are a current focus of resiliency research efforts. Outcomes of social competence or the absence of emotional or behavioral maladjustment are considered representative of positive adaptation (Luthar, Cicchetti, & Becker, 2000). In rehabilitation research, the outcome of adaptation involves psychosocial adaptation to disability and chronic illness. Psychosocial adaptation is also frequently measured by an absence of maladjustment symptomatology, such as depression (Livneh & Wilson, 2003). Consistent with Kumpfer's (1999) model, the acute or chronic stressor is represented by Severity of Disability in Figure 1.

Editor’s Note. Kathleen Chwalisz served as the action editor for this article.—TRE

Rebecca Weston, Department of Psychology, Southern Illinois University; Paul A. Gore, Jr., Department of Educational Psychology, University of Utah; Fong Chan, Department of Rehabilitation Psychology and Special Education, University of Wisconsin–Madison; Denise Catalano, Department of Rehabilitation, Social Work and Addictions, University of North Texas.

Portions of this article are based on Weston and Gore (2006).

Correspondence concerning this article should be addressed to Rebecca Weston, Department of Psychology, Southern Illinois University, Mailcode 6502, Carbondale, IL 62901. E-mail: [email protected]


Table 1
Common Terms and Symbols in Structural Equation Modeling

Latent variable (also called a factor or construct): An unobserved hypothetical variable (e.g., Perceived Stress). Example: Perceived Stress in Figure 1.

Indicator (measured or manifest variable): An observed variable (e.g., harassment). Example: Irritability in Figure 1.

Parameter (path): A hypothesized association between two variables, represented by an arrow. Example: the arrows in all figures.

Factor loading (path loading): The correlation between a latent variable and an indicator, represented by a unidirectional arrow. Example: the arrow from Perceived Stress to Irritability in Figure 1.

Direct effect (unmediated effect): An unmediated directional association between two latent variables, represented by a unidirectional arrow. Example: the arrow from Severity of Disability to Perceived Stress in Figure 1.

Covariance (nondirectional association, correlation): A correlation between two latent variables, represented by a bidirectional arrow.

Indicator error (predictor error, measurement error; symbol: e): Error in an indicator that is not accounted for by the latent variable; indicator error is itself considered a latent variable. Example: the e associated with each indicator in Figure 1.

Disturbance (predictor error; symbol: D): Error in a dependent latent variable that is not accounted for by its predictors. Example: the D associated with each dependent latent variable in Figure 1.

Explained variance (symbol: R2): The percentage of variance in a dependent latent variable accounted for by its predictor(s). Example: 1 − D2 in Figures 3 and 4.

Independent variable (exogenous variable, predictor): A variable that is not dependent upon or predicted by other latent variables or indicators. Examples: Severity of Disability in Figures 1, 3, and 4; predictor error in Figure 1.

Dependent variable (endogenous variable, criterion): A variable that is predicted by other latent variables or indicators. Examples: Depression in Figures 1, 3, and 4; co-occurring pain in Figure 1.

Constrained parameter (fixed parameter, set path): A parameter that is set at a constant and not estimated. Parameters constrained at 1.0 reflect an expected 1:1 association between variables; parameters constrained at 0 reflect the assumption that no relationship exists. Parameters set at nonzero values are labeled (e.g., 1.0); parameters set at 0 are omitted from diagrams. Examples: the parameter set at 1.0 from Severity of Disability to co-occurring pain in Figure 1; the parameter set at 0 (omitted) from Severity of Disability to Social Support in Figure 1.

Free parameter (estimated parameter): A parameter that is not constrained and is to be estimated using the observed data, represented with an asterisk or simply unlabeled. Example: the parameter from Perceived Stress to Irritability in Figure 1.

Covariance matrix (sample matrix; symbol: S): The unstandardized associations between all pairs of variables. Example: the lower left diagonal of Table 2.

Skewness (asymmetry): The degree of asymmetry observed in the distribution of a variable.

Kurtosis (flatness/peakedness): The degree of peakedness of the distribution of a variable.

An important aspect of resiliency is that the individual must perceive the initial stressor as being a stressor; thus, Perceived Stress is included as well. Social Support represents the environmental context, and Positive Coping represents the person-environment transactional processes. Individual characteristics (i.e., tenacity, optimism, personal strength) are represented by Resilience. Finally, Depression represents the outcome of psychosocial adaptation.

Severity of Disability was measured by co-occurring pain, number of hospitalizations, and ability to perform activities of daily living, assessed with the Self-Efficacy for Daily Activities subscales of the Moorong Self-Efficacy Scale (Middleton, Tate, & Geraghty, 2003). Perceived Stress was measured with 7 subscales of the Perceived Stress Questionnaire (Levenstein et al., 1993): (a) Harassment, (b) Overload, (c) Irritability, (d) Lack of Joy, (e) Fatigue, (f) Worries, and (g) Tension. Social Support was measured with the Multidimensional Scale of Perceived Social Support (Zimet, Dahlem, Zimet, & Farley, 1988), a 12-item multidimensional scale that measures the respondent's perception of the social support available from three sources: family, friends, and significant others. Positive Coping was measured with the Active Coping and Planning subscales of the Brief COPE (Carver, 1997). Depression was represented by Radloff's (1977) Center for Epidemiologic Studies Depression Scale, including four subscales: Somatic, Positive Affect, Negative Affect, and Interpersonal. Finally, Resilience was measured with the Connor-Davidson Resilience Scale (Connor & Davidson, 2003), using the three factors reported by Yu and Zhang (2007): (a) tenacity, (b) personal strength, and (c) optimism. Descriptive statistics for all variables are provided in Table 2.

[Figure 1 appears here: a path diagram of the proposed model.]
Figure 1. Proposed model. Asterisks represent parameters to be estimated. D = disturbance error associated with dependent latent variables; e = indicator error associated with observed variables; ADL = activities of daily living.

SEM Background

SEM is a hybrid of two statistical techniques: factor analysis and path analysis. In factor analysis, intercorrelations among measured variables are analyzed to confirm the unobserved constructs. In contrast, path analysis is a method used by investigators to describe the correlations among a set of variables when no underlying constructs are assumed to exist. By showing pictorially how correlations among variables were related to model parameters, Wright (1920) created the path diagram. Path diagrams can be used to illustrate and test direct, indirect, and total effects among observed variables. The combination of factor and path analysis is based on Jöreskog's (1973) outline of the general structural equation model, which consists of two parts: measurement models and structural models. Factor analysis is used in building the measurement models that define latent variables. Latent variables are the unobserved constructs, free of random error, hypothesized to underlie observed variables, or indicators. In the model shown in Figure 1, six measurement models are proposed; the first is as follows:

Co-occurring pain = function of Severity of Disability + error

Number of hospitalizations = function of Severity of Disability + error

Ability to perform activities of daily living = function of Severity of Disability + error
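To make the measurement-model equations concrete, here is a small simulation sketch. It is not from the original study; the latent scores, loadings, and error variances are invented purely for illustration.

```python
import numpy as np

# Toy simulation of the first measurement model above: each indicator is
# generated as loading * latent + error. All numbers here are invented
# for illustration; they are not the study's estimates.
rng = np.random.default_rng(1)
n = 274
severity = rng.normal(size=n)  # latent Severity of Disability scores

loadings = {"pain": 1.0, "hospitalizations": 0.7, "adl_ability": -0.6}
indicators = {
    name: lam * severity + rng.normal(scale=0.8, size=n)
    for name, lam in loadings.items()
}

# The indicators correlate only because they share the latent cause.
print(np.corrcoef(list(indicators.values())).round(2))
```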

Table 2
Covariance and Correlation Matrices With Means and Standard Deviations for Observed Variables

[The 22 × 22 matrix of covariances (lower left, with variances in bold on the diagonal) and correlations (upper right), with means and standard deviations, for the 22 indicators (pain; hospitalizations; ability to perform ADL; harassment; overload; irritability; lack of joy; fatigue; worries; tension; SS-friends; SS-significant others; SS-family; coping planning; active coping; somatic; negative affect; positive affect; interpersonal; personal strength; tenacity; optimism) could not be reproduced legibly here.]

Note. Covariances appear in the lower left of the matrix, with variances on the diagonal in bold. Correlations appear in the upper right of the matrix. ADL = activities of daily living; SS = Social Support.


As in factor analysis, measures that have little error should have higher standardized loadings on latent variables and will generally be better indicators. For example, in the standardized solution, if number of hospitalizations had a higher factor loading on Severity of Disability compared to ability to perform activities of daily living, number of hospitalizations would be considered a better indicator of Severity of Disability than ability to perform activities of daily living would be. Structural equations specify relationships between independent and dependent latent variables. Two of the structural equations depicted in Figure 1 are the following:


Depression = Resilience + Perceived Stress + error

Resilience = Perceived Stress + Social Support + Positive Coping + error

These equations are the result of the first step in SEM: model specification.

Model Specification

Model specification refers to the process in which a researcher describes the relationships hypothesized to exist among all variables. Both observed variables (indicators; e.g., Harassment in Figure 1) and unobserved variables (latent variables; e.g., Perceived Stress in Figure 1) are specified in this step. The researcher describes how each element of the model is expected to relate to the other elements by setting parameter values.

Parameter Values

Relationships among variables are called parameters or paths and are shown with arrows in Figure 1. Parameters are (a) constrained to a nonzero value and not estimated, (b) constrained to zero and not estimated, or (c) left free to be estimated from the data. The first condition is seen most often when parameters are constrained to 1.0 to scale latent variables (e.g., the parameter from Perceived Stress to Harassment in Figure 1). Unlike in regression analyses, where the scales of the predictors and criterion are defined by the variables themselves, latent variables have no inherent scale. This is a problem when trying to interpret the associations among latent variables. For example, without a scale, we would not be able to understand how changes in the Severity of Disability construct relate to changes in the Perceived Stress construct. To scale a latent variable, either the variance of the latent variable can be constrained to 1.0 or one of the factor loadings (the parameter from the latent variable to an indicator) can be constrained to 1.0. Constraining the variance of the latent variable standardizes all indicators, which is useful in confirmatory factor analysis where the goal is to estimate all factor loadings. However, when researchers are interested in determining whether the factor structure of a scale is similar across groups (i.e., testing for factorial invariance), the latent factor variance should not be constrained to 1.0. Instead, one factor loading should be constrained to 1.0 for each latent variable; this is the more commonly used approach in SEM and results in the scale of the latent variable being equal to the scale of the measured variable. When using this approach, researchers should choose the most representative indicator as the marker variable. In Figure 1, factor loadings for harassment, co-occurring pain, friends, coping planning, somatic, and resilience were constrained to 1.0 to scale their respective latent variables. When only a single group is analyzed, the overall fit of the model is not affected by the option chosen.

Parameters may also be constrained to other nonzero values on the basis of findings from previous research. For example, error variances may be constrained to a nonzero value based on the reliability of a specific indicator. Researchers may also constrain parameters to equality when past research has suggested that they are similarly impacted by an independent variable or have similar effects on a dependent variable.

Parameters that are constrained to zero are often not considered, especially in complex models, because the focus of SEM is more often on the relationships that are thought to exist (e.g., the parameter from Severity of Disability to Perceived Stress in Figure 1). Parameters constrained to zero should be carefully considered by researchers because they reflect the hypothesized lack of a relationship between two variables. Researchers are urged to consider the implications of parameters that are not estimated in addition to the implications of estimated parameters. For example, in Figure 1, if the direct parameters from Severity of Disability to Depression and to Resilience are indeed zero, the resulting fully mediated model has implications that differ from those of a partially mediated model that includes direct effects of Severity of Disability on Depression and on Resilience. In the example in Figure 1, the direct relationship between Severity of Disability and Social Support is assumed to be equal to zero, although a mediated relationship (through Perceived Stress) is hypothesized. If Severity of Disability and Social Support are directly related, this omission results in misspecification of the structural model, because the parameter between the two latent variables is not displayed in Figure 1 and is therefore assumed to be constrained to zero.

Misspecification occurs when the relationships hypothesized in a structural equation model do not reflect the relationships observed in the data. The simple example in Figure 2 provides an illustration of misspecification. The observed data for this example are in the matrix shown in Figure 2. The model specified in Figure 2 implies that some subsets of the observed variables are related and others are unrelated. A matrix could be generated to summarize the relationships described by the model; this matrix is referred to as the model-implied matrix, or Σ. All structural equation models are built from raw data, and those data are in the form of either a covariance matrix (an unstandardized correlation matrix) or a correlation matrix. Raw data or covariance matrices are preferred because the statistical theory underlying SEM is based on covariance, not correlation. To simplify the example in Figure 2, we have included a correlation matrix on the left side of the figure. This matrix is referred to as the observed or sample matrix (or S). Differences are evident when comparing the sample matrix and the model in Figure 2. For example, b and d are assumed to be unrelated in the model, but the sample matrix shows a moderate relationship (r = .57). In essence, we have created a model that says the relationship between b and d is equal to zero when, in fact, it is equal to .57. Our model will suffer (i.e., be misspecified) to the extent that our observed relationships are not captured by the hypothesized relationships in our models. Therefore, it is important that the researcher carefully consider all specified relationships.
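Because the statistical theory underlying SEM is based on covariance, it is sometimes necessary to rebuild a covariance matrix from a published correlation matrix and standard deviations. A minimal sketch follows; the correlations are those of the Figure 2 example, but the standard deviations are hypothetical, since the figure reports correlations only.

```python
import numpy as np

# Rebuilding a covariance matrix S from a correlation matrix R and a
# vector of standard deviations: cov(i, j) = r(i, j) * sd(i) * sd(j).
R = np.array([[1.00, 0.81, 0.10, 0.04],
              [0.81, 1.00, 0.09, 0.57],
              [0.10, 0.09, 1.00, 0.72],
              [0.04, 0.57, 0.72, 1.00]])
sd = np.array([1.2, 0.8, 2.5, 1.1])  # hypothetical SDs for a, b, c, d

S = np.diag(sd) @ R @ np.diag(sd)
print(S.round(3))
```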

[Figure 2 appears here: the example correlation matrix (left) and the misspecified two-factor model (right).]

     a     b     c     d
a  1.00
b   .81  1.00
c   .10   .09  1.00
d   .04   .57   .72  1.00

Figure 2. Example correlation matrix with misspecified model. Asterisks represent parameters to be estimated.
Parameters

Three types of parameters can be specified: direct effects, variances, and covariances. Direct effects describe the relationships between latent variables and indicators (called factor loadings) and the relationships between latent variables and other latent variables (called path coefficients). In Figure 1, one factor loading for each latent variable has been set at 1.0 to scale the latent variables (e.g., Severity of Disability to co-occurring pain). Free parameters, indicated with asterisks, are to be estimated for 16 factor loadings (parameters between latent variables and indicators, e.g., Severity of Disability to number of hospitalizations) and 7 path coefficients (between latent variables, e.g., Severity of Disability to Perceived Stress), for a total of 23 direct effects. One drawback of hypothesizing directional effects in SEM is the perception that causal relationships are being tested. As with other methods involving prediction (e.g., regression), causality may not be inferred from cross-sectional data.

One of the advantages of SEM over other methods is its capacity to accommodate estimates of error variance, in contrast to methods such as regression and path analysis, which assume all variables are measured without error. The error associated with indicators and dependent latent variables can be modeled in two ways. One option is to estimate the variance for each of the 22 error terms while constraining the loadings of the error terms to 1.0 (the default in most software programs and what is shown in Figure 1). This results in the estimation of parameters representing error variance. Alternatively, the error variances can be constrained to 1.0 and the factor loadings estimated. The second option results in a standardized error term and the estimation of parameters representing factor loadings. Overall model fit will be the same regardless of the option chosen. We opted to constrain the loadings of each error term to 1.0 and estimate the 22 error variances in Figure 1. When a factor loading for an independent latent variable is constrained at 1.0, the variance of the independent latent variable is then estimated. This was the case for Severity of Disability and Positive Coping in the model shown in Figure 1. To summarize, variance was estimated for the indicator error associated with the 22 indicators, for the disturbance in the four dependent latent variables (Perceived Stress, Social Support, Depression, and Resilience), and for the two independent latent variables (Severity of Disability and Positive Coping).

Finally, covariances are nondirectional associations among independent variables. If a researcher expected that two factors were associated but that a causal relationship did not exist, then he or she would specify a covariance between the factors. Because covariances can exist only between independent variables, researchers who anticipate correlations or nondirectional associations between endogenous latent variables should specify covariances between the respective disturbance error variances for those latent variables. Given the theoretical background of the model in Figure 1, no covariances were included in the model. Therefore, 51 parameters were specified for estimation: 16 factor loadings, 7 path coefficients, 22 indicator error variances, 4 disturbance error variances, and 2 independent latent variable variances.

Model Identification

Before data collection, researchers should ensure that their specified models are identified and thus able to be estimated with SEM software. Model identification describes whether a researcher can create a set of unique estimates for the parameters of interest given the number of known parameters. Known parameters are the elements within the variance/covariance matrix. This ratio of known parameters to unknown parameters is at the heart of any discussion of identification. Rather than counting the number of variances and covariances in Table 2, the number of known parameters (i.e., the number of unique elements in a variance/covariance matrix) can be easily calculated using the following formula:

[Number of observed variables × (Number of observed variables + 1)] / 2

The number of unknown, or estimated, parameters is determined by the specified model. Unless archival data are used, researchers should specify their models before data collection to avoid specifying a model that cannot be identified. That is, researchers may specify more free parameters (marked with asterisks in Figures 1 and 2) in a model than can be estimated with the available data.

Computing the degrees of freedom is the first of several rules proposed by Bollen (1989) that help determine whether a model is identified. Degrees of freedom are calculated by subtracting the estimated parameters from the known parameters. When the calculated degrees of freedom are negative, the model is underidentified. For example, if the model in Figure 2 were changed slightly so that the factor loadings for indicator a and indicator c were freely estimated (ignoring for the moment that this would result in latent variables that are not scaled), the degrees of freedom would be negative. With 10 unique elements in the matrix (shown in Figure 2, or calculated with the formula above: [4(4 + 1)]/2 = 10) and 11 parameters (4 indicator error variances, 4 factor loadings in the modified model, 1 latent variable variance for Factor 1, 1 path coefficient, and 1 disturbance variance), the model has −1 degree of freedom and is underidentified. In addition to illustrating underidentification, this example demonstrates the importance of scaling latent variables.

When degrees of freedom are positive, the model may be identified. In Figure 1, there are 22 observed variables (thus 253 unique elements in the variance/covariance matrix; [22(22 + 1)]/2 = 253) and 51 estimated parameters, noted with asterisks. For this model, there are 202 degrees of freedom, calculated by subtracting the parameters to be estimated (51) from the known parameters (253).

The model in Figure 1 is further evaluated by examining the interrelationships of the dependent latent variables. If none of the dependent latent variables predict other dependent latent variables, the model is identified. Figure 1 shows that dependent variables do have effects on other dependent variables (e.g., Perceived Stress predicts Depression), so this rule is not satisfied. Not meeting this rule does not imply that the model is not identified, only that we should continue to evaluate the model. Had we met this requirement, further evaluation would not be needed, because this rule is sufficient to determine identification.

Models can be described as recursive or nonrecursive. Recursive models have no reciprocal relationships between variables, or "feedback loops"; in other words, all relationships between variables are unidirectional. The model in Figure 1 is recursive. Recursive models are identified; this is a sufficient condition for identification. If we were to add a parameter from Depression to Resilience, a reciprocal relationship would exist between the two latent variables and the model would be nonrecursive. When models are nonrecursive, additional steps must be taken to determine whether the model is identified. Kline (2005) has provided an excellent guide for determining the identification of nonrecursive models.

Identified models may be overidentified or just-identified. When the degrees of freedom are positive, as is the case in Figure 1 with 202 degrees of freedom, models are said to be overidentified. In essence, there are multiple possible estimates for each of the unknown parameters. An overidentified model has the potential to not fit the observed data and is thus of great value to researchers. In contrast, a just-identified model is one in which the known parameters equal the estimated parameters.
For example, in Figure 2, if a parameter were added from Factor 1 to indicator c, 10 parameters would be estimated in the model. With 10 known parameters in the sample matrix, the model would have 0 degrees of freedom, only one possible solution, and a perfect model fit. Because the fit of such a model is known a priori, it is of little interest to researchers. The notion of parsimony is replete within the SEM literature (e.g., there are parsimony-adjusted measures of model fit) precisely because of the delicate balance researchers must achieve between specifying too many parameters (e.g., capitalizing on the tendency of overspecified or just-identified models to approach perfect fit) and not specifying enough parameters to capture the hypothesized relations among observed and latent variables (thus resulting in poor fit).
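The counting rules above reduce to simple arithmetic. A small sketch, using only the formula and the parameter counts given in the text:

```python
# Counting rule for identification: known parameters are the unique
# elements of the variance/covariance matrix; df = known - estimated.
def degrees_of_freedom(n_observed: int, n_estimated: int) -> int:
    known = n_observed * (n_observed + 1) // 2
    return known - n_estimated

print(degrees_of_freedom(22, 51))  # Figure 1 model: 253 - 51 = 202 (overidentified)
print(degrees_of_freedom(4, 11))   # modified Figure 2 model: 10 - 11 = -1 (underidentified)
print(degrees_of_freedom(4, 10))   # just-identified: df = 0, perfect fit known a priori
```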

Data Collection/Screening

After determining that the specified model is identified, researchers collect data or screen archival data. As with all statistical analyses, several assumptions must be met; only those specific to SEM are discussed in detail here.

Sample Size

Two factors contributing to the need for large samples in SEM are model complexity and the use of maximum likelihood (ML) estimation, one of the most popular estimation methods in SEM. Depending on which source is consulted, researchers may find conflicting information on what sample size is adequate for SEM. Previous guidelines (e.g., Kline, 1998) indicated that 10 to 20 participants per indicator would result in a sufficient sample. Using this guideline, we would need at least 220 participants to test the model in Figure 1. Kline (2005) revised his guidelines for sample size to suggest that samples with fewer than 100 participants are small, those with 100 to 200 participants are medium, and those with more than 200 participants are large. Sample size is an especially salient issue in rehabilitation research, where data collection is often intensive and several years of sampling may be necessary to obtain samples of 200 participants. Research by Jackson (2001, 2003, 2007) is relevant here: testing the hypothesis that an inadequate sample would result in poor-fitting models, he found only a small effect of sample size on model fit. Jackson's research suggests that the reliability of the observed measures and the number of indicators per factor are important determinants of model fit. When few acceptable measures of a construct exist, when multiple measures of a construct are not at least moderately related to each other, or when the reliability of measures is low, careful researchers will use larger samples. By these guidelines, the available sample of 274 participants is acceptable for testing the model in Figure 1 because multiple indicators per latent variable are specified, all of which have internal consistency reliabilities of α = .65 or greater.

Additional research on small samples is promising. Results indicate that samples of 5 participants per indicator (Nevitt & Hancock, 2004) or samples of fewer than 100 participants (Bentler & Yuan, 1999) can generate stable solutions when statistical corrections, such as Bartlett's (1950) k-factor correction, are applied and data are normally distributed. Because small-sample modeling requires advanced statistical methods, we do not recommend conducting structural equation models with samples of fewer than 200 unless researchers are well versed in SEM.


Multicollinearity

Multicollinearity refers to situations in which measured variables are so highly related that they are essentially redundant. This is a concern in SEM because researchers intentionally use related measures as indicators of a latent variable. A rough guideline for checking for multicollinearity is to screen the bivariate correlations among all indicators; bivariate correlations higher than r = .85 can signal potential problems (Kline, 2005). Screening for multicollinearity is important because a high degree of correlation between indicators of the same latent variable (e.g., between coping planning and active coping in Figure 1) may result in poor fit indices when the indicators are highly reliable (Browne, MacCallum, Kim, Andersen, & Glaser, 2002). In effect, highly reliable, highly related indicators can result in models that appear to fit poorly. When two observed variables are reliable and highly correlated, one solution is to remove one of the redundant variables. Another option is to note the potential for a problem and reconsider removing the variable if problems arise in estimation. The upper right matrix in Table 2 shows that none of the correlations between indicators is above r = .85.
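A screen of this kind is easy to automate. The sketch below flags redundant indicator pairs; the matrix and variable labels are hypothetical, and the cutoff is Kline's rough |r| > .85 guideline.

```python
import numpy as np

# Flag indicator pairs whose bivariate correlation exceeds the cutoff.
def flag_collinear(corr, names, cutoff=0.85):
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > cutoff:
                flagged.append((names[i], names[j], corr[i, j]))
    return flagged

# Hypothetical correlations among three indicators:
corr = np.array([[1.00, 0.91, 0.40],
                 [0.91, 1.00, 0.35],
                 [0.40, 0.35, 1.00]])
print(flag_collinear(corr, ["coping planning", "active coping", "somatic"]))
# -> the coping planning / active coping pair (r = .91) is flagged
```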

Outliers

Univariate and multivariate outliers must also be examined. Participants' scores are univariate outliers if they are extreme on only one variable. When participants have two or more extreme scores or an unusual configuration of scores, they are considered multivariate outliers. Univariate outliers can be either transformed or changed to the next most extreme score, depending on the normality of the data. Multivariate outliers can be recoded (e.g., to a value equal to three or four standard deviations beyond the mean) or removed from the analysis with sufficient justification.

Normality

Most statistics used in SEM assume that the multivariate distribution is normal. Violating this assumption is problematic because nonnormality affects the accuracy of statistical tests: if a model is tested with nonnormally distributed data, the results may incorrectly suggest that the model fits the data well or poorly, depending on the degree and type of the problem. Tabachnick and Fidell (2001) reviewed the assumptions that must be met for multivariate normality: (a) all univariate distributions must be normal, (b) the joint distribution for any combination of variables must be normal, and (c) all bivariate scatterplots must show linearity and homoscedasticity. Homoscedasticity exists when the variance of criterion scores is evenly distributed along the regression line for the predictor (e.g., the variance of co-occurring pain is the same at all levels of number of hospitalizations). Testing whether the assumptions for multivariate normality are met is impractical, as it would involve examining an infinite number of linear combinations. Screening data for univariate normality, however, can inform researchers whether multivariate normality may be an issue.

To determine whether univariate normality exists, univariate distributions are examined for skewness and kurtosis. Skewness is the degree to which the distribution for a variable is asymmetrical, with positive skew describing a distribution in which many scores are at the low end of a scale (e.g., the score distribution for a very difficult test). For the skewness index, absolute values greater than 2.0 are considered extreme (Curran, West, & Finch, 1996). Kurtosis is an index of the peak and tails of the distribution. Positive kurtosis reflects distributions that are very peaked, with short, thick tails (also called heavy-tailed distributions); negative kurtosis exists when the distribution is quite flat, with long, thin tails. Values over 7.0 for the kurtosis index suggest a problem (Curran et al., 1996), and values over 20.0 are considered extreme (Kline, 2005).

Univariate normality is especially important to consider because distributions can vary from normality in at least four ways. For example, some variables may be positively skewed whereas others are negatively skewed, which increases the likelihood that the assumption of homoscedasticity will be violated. Tabachnick and Fidell (2001) have suggested that transforming all data to increase normality may be the safest approach. Common transformations include the square root (used when data are moderately positively skewed), the logarithm (for more than moderate positive skew), and the inverse (for severe positive skew, such as an L-shaped distribution). When data are negatively skewed, transformation involves first reflecting the data (subtracting each score from a constant equal to the largest value plus one) before applying the appropriate technique. Deletion or transformation of univariate or multivariate outliers also enhances multivariate normality. Researchers should be aware, however, that transformation is not desirable when the variable is meaningful (e.g., height, temperature) or widely used (e.g., IQ scores), because transformation would hinder interpretation of the standardized solution.
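The screening and transformation steps described above can be sketched as follows. The data are simulated for illustration only; note that scipy's kurtosis function returns the excess-kurtosis index used with the 7.0 cutoff.

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Screen each variable against the cutoffs above: |skewness| > 2.0 is
# extreme, and kurtosis over 7.0 suggests a problem.
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=(274, 3))  # positively skewed

for col in range(data.shape[1]):
    x = data[:, col]
    print(f"variable {col}: skew = {skew(x):.2f}, kurtosis = {kurtosis(x):.2f}")

# Square-root transformation for moderate positive skew; a negatively
# skewed variable would first be reflected: x = (max(x) + 1) - x.
print(f"skew after sqrt: {skew(np.sqrt(data[:, 0])):.2f}")
```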

Missing Data

Rubin (1976) described three categories of missing data: data may be missing completely at random (MCAR), missing at random (MAR), or not missing at random (NMAR). Missing data in the first two conditions are considered ignorable; in the third, they are not. NMAR implies a systematic loss of data. For example, if participants were missing data on the Social Support construct because they have little support, the data would be NMAR. In longitudinal research, data missing due to attrition are a concern when the reasons for attrition are related to study variables (e.g., attrition due to death in a health study). Unfortunately, the researcher is able to determine only whether data are missing completely at random or not, and not whether data are MAR or NMAR. This is problematic because methods for handling missing data vary according to whether the "missingness" is ignorable. Allison's (2003) review of techniques such as listwise deletion, pairwise deletion, ML, and multiple imputation is quite informative for readers interested in details beyond the scope of this introduction. Minimally, we suggest that researchers address the issue of missing data by noting the extent of the problem and how it was handled.
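Minimal reporting of missingness might look like the following sketch; the data frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Report the extent of missingness per indicator and the cost of
# listwise deletion, as suggested above.
df = pd.DataFrame({
    "harassment": [2.0, np.nan, 3.5, 1.0],
    "overload":   [1.5, 2.5, np.nan, 2.0],
    "tension":    [3.0, 2.0, 2.5, 1.5],
})

print(df.isna().mean())   # proportion of missing values per indicator
print(len(df.dropna()))   # remaining N under listwise deletion
```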

Model Estimation

After the model has been specified and identified, and data have been collected from a sufficiently large sample (or archival data have been selected) and screened, the model can be estimated.


Estimation involves determining the values of the unknown (free) parameters. For each parameter, an estimate of the unstandardized coefficient (analogous to a B weight in a regression analysis) and the standard error of the estimate are generated, along with the standardized coefficient (analogous to β). Estimates of the free parameters are generated using an SEM software program. Many programs are available, including LISREL (Jöreskog & Sörbom, 1996), AMOS (Arbuckle, 2003), PROC CALIS (Hartmann, 1992) in SAS (SAS Institute, Inc., 2000), EQS (Bentler, 1995), and Mplus (L. K. Muthén & B. Muthén, 2004). Programs differ in their ability to compare multiple groups and to estimate parameters for categorical indicators, and in the specific fit indices provided as output. A complete discussion of the advantages and disadvantages of each package is beyond the scope of this article and would likely become quickly outdated given advances in programming. Readers are encouraged to consult software manuals and SEM guides, some of which include student versions of software (e.g., Schumacker & Lomax, 2004), before selecting a software package. Because we are familiar with EQS, the most recent available version (EQS 6.1) was used to analyze the covariance matrix included in the lower diagonal of Table 2.

Types of Estimators

Estimation procedures include ML, least squares (LS), unweighted LS, generalized LS (GLS), and asymptotic distribution free (ADF). Researchers must select an estimation method before conducting their analysis. Some methods are better able to correct for violations of statistical assumptions. One deciding factor is whether the data are normally distributed: ML and GLS methods assume multivariate normality, whereas LS and ADF do not. Although LS estimation is not useful for making valid inferences about the population, ADF is generalizable when the sample is sufficiently large. ML is one of the most commonly used techniques and is robust to moderate violations of the normality assumption (Anderson & Gerbing, 1984; Chou, Bentler, & Satorra, 1991; Hu, Bentler, & Kano, 1992; B. O. Muthén & Kaplan, 1992). Many researchers opt to use ML when data are moderately nonnormal, slightly beyond the skewness and/or kurtosis values of 2.0 and 7.0, respectively.

If the data are severely nonnormal, the researcher has three options (Kline, 2005). First, nonnormal data may be analyzed with corrected statistics to reduce bias. Corrected test statistics include scaled goodness-of-fit tests and robust standard errors (Kline, 2005). Satorra and Bentler's (1994) scaled χ2 is an example of a corrected statistic; it decreases the value of the standard χ2 by a constant that is determined by the observed kurtosis. This option is chosen quite often by researchers who have nonnormal data. Second, the data may be transformed (e.g., square root) and then estimated using ML or LS. GLS and ML are scale invariant: if the original scale data are transformed, the obtained parameter estimate is typically retransformed to the original scale to allow the researcher to interpret the results. Kline (2005) noted that unweighted LS is sensitive to transformation and is generally not effective with transformed data. Third, nonnormal data may be estimated with methods such as ADF, which do not assume multivariate normality. ADF adjusts for kurtosis but has the disadvantage of requiring very large samples to generate stable, accurate estimates. Samples of 100 or fewer are much too small for this technique; simple models can be estimated with sample sizes of 500 or more, and complex models require thousands of participants (Yuan & Bentler, 1998).

Approaches to Estimation

In Anderson and Gerbing's (1988) approach, confirmatory factor analysis is used to test the measurement model before the full model is estimated. The confirmatory factor analysis tests whether indicators load on specific latent variables as proposed. After model estimation, factor loadings are examined to determine whether any indicators fail to load as expected. Examples would be indicators that load on multiple latent variables (i.e., cross-load) when expected to load on only one latent variable, or indicators that fail to load significantly on the expected latent variable.

Most SEM computer programs provide the researcher with means to identify specific parameter misspecification. Misspecified free parameters present clearly, with low or nonsignificant parameter estimates. Misspecified constrained (zero) parameters are sometimes more difficult to detect. Examination of the standardized residual tables in programs such as EQS is helpful in identifying misspecification. Residual tables show the difference between the observed correlation and the model-implied correlation, with larger values indicating greater misspecification. Positive residuals indicate observed relationships that are not specified in the model, suggesting that parameters constrained to zero might more appropriately be freed to be estimated. Negative residuals indicate model-implied relationships that are not observed in the data, suggesting that freely estimated parameters should be constrained to zero. Such results would indicate that the measurement model may be inadequate and may require modification. Ensuring that the measurement model provides the best fit possible allows the researcher to eliminate the measurement model as a source of misspecification should the full model not fit well. In other words, if the measurement and structural models were tested simultaneously in a single step and the resulting model did not fit well, it would be difficult to determine whether the misspecification was related to the measurement or the structural portion of the model. Therefore, we recommend, as do others (e.g., Anderson & Gerbing, 1988; Kline, 2005), that reasonable necessary changes be made to the measurement model. In the second step, the full model is tested by estimating the expected directional associations among latent variables, indicated with unidirectional arrows in Figure 1.

Mulaik and Millsap (2000) proposed an alternative to Anderson and Gerbing's (1988) approach involving four phases. Mulaik and Millsap's first phase is the equivalent of estimating an exploratory factor analysis, with parameters from each latent variable to all indicators. For example, each of the six latent variables in Figure 1 would have a direct effect on number of hospitalizations rather than only Severity of Disability. This step allows the researcher greater precision in determining potential problems with the measurement model. In the second phase, the confirmatory factor analysis is tested as described in Anderson and Gerbing's approach. Third, the researcher tests the measurement and structural portions of the model simultaneously, as described in the second step of Anderson and Gerbing's approach. The goal of the fourth phase is to test a priori hypotheses about parameters specified by the researcher. For example, this phase would involve testing whether the freely estimated parameter from Severity of Disability to Depression is significantly different from zero. By including their fourth phase, Mulaik and Millsap explicitly addressed the need to consider potential alternatives to the proposed model.

Researchers are encouraged to follow one of the two multiphase approaches and test the measurement portion of their models before testing the full model. In this example, Anderson and Gerbing's (1988) approach was followed. Although a detailed discussion of the confirmatory factor analysis is beyond the scope of this article, we briefly review its results below.

Measurement Model

The confirmatory factor analysis was estimated with ML estimation. The fit indices, shown in Table 3, suggested that model fit was acceptable; these indices are discussed in more detail in the next section. All factor loadings were significant, and there was no evidence of cross-loading for any indicator. Examination of the residual correlations, which are the differences between the observed and model-implied correlations, revealed a potential problem: the overload variable, specified as an indicator of Perceived Stress, was more related to indicators of other latent variables (including ability to perform activities of daily living, personal strength, coping planning, tenacity, positive affect, negative affect, and active coping) than implied in the model shown in Figure 1. To the extent that observed relationships between variables are not reflected in the model, the model will be misspecified.

One option in this situation is to specify covariances between the error variances associated with the indicators in question (e.g., a covariance between the error variances for overload and personal strength) to incorporate the observed relationship into the model. Covariances between the observed variables themselves cannot be specified, because dependent variables cannot covary; therefore, covariances between the associated error variances must be specified when dependent variables are thought to correlate. Although specifying covariances was an option, it would have resulted in a much more complex and theoretically problematic model. For this demonstration, we instead opted to remove the problematic indicator from the measurement model. The resulting model reached acceptable levels of fit and was the starting point for testing the structural model. We caution readers that making such a modification results in exploratory rather than confirmatory model testing.
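A residual check of the kind described above can be sketched as a simple matrix comparison. Both matrices below are hypothetical stand-ins for the observed and model-implied correlations; in practice they would come from the SEM program's output.

```python
import numpy as np

# Residual correlations: observed minus model-implied. Positive
# residuals suggest omitted relationships; negative residuals suggest
# specified relationships that the data do not support.
R_observed = np.array([[1.00, 0.81, 0.10],
                       [0.81, 1.00, 0.09],
                       [0.10, 0.09, 1.00]])
R_implied = np.array([[1.00, 0.78, 0.30],
                      [0.78, 1.00, 0.28],
                      [0.30, 0.28, 1.00]])

residuals = R_observed - R_implied
print(residuals.round(2))
print("pairs with |residual| > .10:",
      np.argwhere(np.abs(np.triu(residuals, k=1)) > 0.10).tolist())
```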

Structural Model

The model in Figure 1, without the overload indicator for Perceived Stress, was also estimated with ML. Standardized results for the structural portion of the full model are shown in Figure 3. Most SEM software programs provide standardized and unstandardized output, analogous to the standardized betas and unstandardized B weights (accompanied by standard errors) in regression analyses. Researchers typically present standardized estimates, but significance is determined by examining the unstandardized portion of the output. For example, Figure 3 shows a significant relationship between Perceived Stress and Depression (β = .67, p < .001). The significance of this path coefficient was determined by examining the unstandardized output, which showed that the unstandardized coefficient (the B weight) was 0.705, with a standard error of 0.079. Although the critical ratio (i.e., a z score) is automatically calculated and provided with the output in EQS and other programs, researchers can easily determine whether a coefficient is significant at a given alpha level (i.e., z ≥ 1.96 for p ≤ .05) by dividing the unstandardized coefficient by its standard error. Here, 0.705 divided by 0.079 is 8.92, which is greater than the critical z value of 3.29 (at p = .001), indicating that the parameter is significant.

Examination of standardized estimates is also informative. Because different variables may have different scales, comparing standardized parameter estimates (rather than unstandardized estimates) allows researchers to determine which variables have a greater impact than others. However, significant differences in parameter strength can be determined only by comparing the χ2 values of two models (as described below), where one model includes the parameters of interest constrained to be equal and the second includes no such constraints. In addition to providing standardized parameter estimates, most software programs include estimates of the proportion of variance explained (R2) for all dependent variables. This information is helpful in determining which indicators contain the most (and least) measurement error.
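The hand calculation above is easy to reproduce. A sketch using the reported estimate and standard error:

```python
from scipy.stats import norm

# Critical ratio for the Perceived Stress -> Depression path, using the
# unstandardized estimate and standard error reported above.
b, se = 0.705, 0.079
z = b / se                 # 8.92
p = 2 * norm.sf(abs(z))    # two-tailed p value

print(f"z = {z:.2f}, p = {p:.1e}")  # z far exceeds 3.29, so p < .001
```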

Table 3
Fit Indices for Models With Chi-Square Difference Tests for Nested Models (N = 274)

Model                                  df    χ2          CFI   RMSEA (90% CI)   SRMR   Δdf   Δχ2         R2 Resil.   R2 Dep.
Confirmatory factor analyses
  Proposed model                       194   535.68***   .90   .08 (.07, .08)   .07
  Model with overload removed          174   418.28***   .92   .07 (.06, .08)   .06    20    117.40***
Structural models
  Proposed model (overload removed)    182   492.85***   .90   .08 (.07, .09)   .10                      .516        .746
  Modified Model 1 (SD → SS)           181   486.06***   .91   .08 (.07, .09)   .11    1     6.79**      .517        .745
  Modified Model 2 (SD → PC)           180   435.10***   .92   .07 (.06, .08)   .06    2     57.75***    .590        .760

Note. df = degrees of freedom; CFI = comparative fit index; RMSEA = root-mean-square error of approximation; CI = confidence interval; SRMR = standardized root-mean-square residual; Resil. = Resilience; Dep. = Depression; SD = Severity of Disability; SS = Social Support; PC = Positive Coping.
** p < .01. *** p < .001.
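The chi-square difference tests reported in Table 3 can be reproduced from the model chi-squares and degrees of freedom. A sketch using the Table 3 values (appropriate for standard ML chi-squares; scaled chi-squares require an adjusted procedure):

```python
from scipy.stats import chi2

# Chi-square difference test for nested models: the difference between
# the two models' chi-squares is chi-square distributed, with df equal
# to the difference in model df.
def chi_square_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    d_chi2 = chi2_restricted - chi2_full
    d_df = df_restricted - df_full
    return d_chi2, d_df, chi2.sf(d_chi2, d_df)

# Proposed structural model vs. Modified Model 2 (values from Table 3):
print(chi_square_difference(492.85, 182, 435.10, 180))
# -> (57.75, 2, ~3e-13): the added paths significantly improve fit
```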


[Figure 3 appears here: path diagram of the estimated structural model, showing standardized path coefficients among Severity of Disability, Perceived Stress, Social Support, Positive Coping, Resilience, and Depression.]

Figure 3. Estimated model (N = 274). Fit indices for this model are as follows: χ²(182, N = 274) = 492.85, p < .001, comparative fit index = .90, root-mean-square error of approximation = .08 (90% confidence interval: .07, .09), standardized root-mean-square residual = .10. ** p < .01. *** p < .001.

Evaluating Model Fit

The objective when evaluating the estimated model is to determine whether the associations among indicators and latent variables in the model (e.g., the model shown in Figure 3) adequately reflect the observed associations in the data (e.g., the data shown in Table 2). Statisticians agree that fit should be evaluated in terms of (a) the significance and strength of estimated parameters, (b) the variance accounted for in indicators and latent variables, and (c) the overall fit of the model to the observed data, as indicated by a variety of fit indices. Addressing the first two criteria is fairly straightforward; however, considerable disagreement exists over what constitutes "acceptable" values for global fit indices. Multiple indices are available to evaluate model fit. The most stringent concept of fit requires that the model exactly replicate the observed data. A second perspective is that models approximating the observed data are acceptable. For more information on the "exact versus close fit" debate, readers are referred to discussions by Quintana and Maxwell (1999) and by Marsh, Hau, and Wen (2004). Martens's (2005) study suggested that the perspective commonly taken by social scientists reflects the assumption that approximating observed data is acceptable and can result in important contributions to the literature. Hoyle and Panter (1995) recommended that researchers report several indices of overall model fit, a practice that many have followed (Martens, 2005). We present the fit indices that are reported by most software programs and that empirical research has shown to be the most accurate under a variety of conditions.

Goodness-of-Fit Index (GFI) and χ²

Absolute fit indices directly assess how well a model fits the observed data (i.e., how well the model in Figure 3 describes the data in Table 2) and include the GFI (Jöreskog & Sörbom, 1981; Tanaka & Huba, 1985, 1989), χ² (Bollen, 1989), and the scaled χ² (Satorra & Bentler, 1994). GFI is analogous to the R² used in regression to summarize the variance explained in a dependent variable, except that GFI refers to the variance accounted for in the entire model. However, GFI is not reported as consistently as χ² is. Both χ² values are actually tests of model misspecification. Thus, a significant χ² suggests the model does not fit the sample data. In contrast, a nonsignificant χ² indicates that the model-implied relationships between variables do not differ significantly from those observed in the data. Although χ² is the most commonly reported absolute fit index, it has two limitations. First, χ² tests whether the model is an exact fit to the data, and an exact fit is rare, especially when the sample size is very large. Second, as with most statistics, large sample sizes increase power, resulting in significance even with small effect sizes. Consequently, a nonsignificant χ² may be unlikely even when the model fits the observed data closely, although not exactly. Therefore, additional fit indices are typically considered to determine whether the model fit is acceptable. Table 3 shows that the χ² values for all models tested in this demonstration were significant, indicating that none is an exact fit to the data. Many additional indices are available, but not all software programs provide the same indices. This has resulted in the lack of a standard format for reporting fit and an inability to compare across studies. In addition, reviewers may prefer to see specific indices. Following the recommendations of Boomsma (2000), MacCallum and Austin (2000), and McDonald and Ho (2002), three indices are presented in addition to the model χ²: Bentler's (1990) comparative fit index (CFI), Steiger's root-mean-square error of approximation (RMSEA; Steiger, 1990; Steiger & Lind, 1980), including the associated 90% confidence interval (CI), and the standardized root-mean-square residual (SRMR; Bentler, 1995).
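As an illustration, the exact-fit test for the proposed measurement model in Table 3 can be reproduced in a few lines of Python (a sketch using scipy; the values are taken from Table 3):

```python
from scipy.stats import chi2

# Chi-square test of exact fit for the proposed measurement model.
chi_sq, df = 535.68, 194

p = chi2.sf(chi_sq, df)   # survival function = 1 - CDF
print(f"chi2({df}) = {chi_sq}, p = {p:.3g}")  # p far below .001: exact fit rejected
```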

CFI

Bentler's (1990) CFI is an example of an incremental fit index. This type of index compares the improvement in fit of the researcher's model over a more restricted model, called an independence or null model, which specifies no relationships among variables. CFI ranges from 0 to 1.0, with values closer to 1.0 indicating better fitting models. Table 3 shows that the proposed structural model has a CFI of .90.


RMSEA

RMSEA (Steiger, 1990; Steiger & Lind, 1980) is also suggested as an index of fit. This index corrects for the complexity of a model: when two models explain the observed data equally well, the simpler model will have the more favorable RMSEA value. An RMSEA of 0.00 indicates that the model exactly fits the data. In addition to the RMSEA itself, a recent practice has been to report the 90% CI for the RMSEA, which incorporates the sampling error associated with the estimated RMSEA. The 90% CI is interpreted much as one would interpret the CI in a one-sample t test. If, for example, an RMSEA cutoff of 0.05 were defined as the criterion for adequate fit, a lower bound of the 90% CI below 0.05 would lead a researcher to conclude that the model was in the adequate range. Clearly, this can lead to confusion when the 90% CI is wide and, for example, ranges from 0.00 to 0.15. In such a case, the wide CI would indicate that substantial error exists in the estimate of RMSEA and that the results should be interpreted cautiously. The RMSEA for the proposed model in Table 3 is 0.08, and the narrow CI suggests there is little error in the estimate.
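Although RMSEA is routinely reported by SEM software, its point estimate can be computed directly from the model χ², the degrees of freedom, and the sample size. The sketch below uses one common formula, RMSEA = √[max(χ² − df, 0) / (df(N − 1))]; some programs divide by N rather than N − 1, so results may differ slightly in the later decimals.

```python
import math

def rmsea(chi_sq, df, n):
    """Point estimate of RMSEA using one common formula; some programs
    divide by n rather than n - 1, so results may differ slightly."""
    return math.sqrt(max(chi_sq - df, 0.0) / (df * (n - 1)))

# Proposed measurement model from Table 3: chi2(194) = 535.68, N = 274.
print(f"RMSEA = {rmsea(535.68, 194, 274):.3f}")  # ~= 0.080, matching Table 3
```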

SRMR

SRMR (Bentler, 1995) is an index based on covariance residuals, with smaller values indicating better fit. The SRMR summarizes how much difference exists between the observed data and the model: it is the mean of the absolute differences between the observed and model-implied correlations. A mean of 0.0 indicates no difference between the observed data and the correlations implied in the model; thus, an SRMR of 0.00 indicates perfect fit. The results in Table 3 for the proposed model show an SRMR of 0.10. Regardless of whether researchers opt to report this index, we strongly recommend that the standardized residuals be examined to identify any associations in the observed data that are not reflected in the estimated model.
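A sketch of this computation, following the simplified description given above (mean of the absolute residual correlations), is shown below. The matrices are hypothetical, reused from the earlier residual-correlation sketch; note that many programs instead compute SRMR as the square root of the mean squared standardized residual, so reported values can differ somewhat.

```python
import numpy as np

def srmr_simple(observed, implied):
    """Mean absolute residual correlation, following the simplified
    description in the text; software implementations may differ."""
    resid = observed - implied
    # Use each unique off-diagonal element once (upper triangle).
    iu = np.triu_indices_from(resid, k=1)
    return np.mean(np.abs(resid[iu]))

# Hypothetical matrices for illustration only.
obs = np.array([[1.00, 0.48, 0.55],
                [0.48, 1.00, 0.31],
                [0.55, 0.31, 1.00]])
imp = np.array([[1.00, 0.46, 0.33],
                [0.46, 1.00, 0.30],
                [0.33, 0.30, 1.00]])
print(f"SRMR (simplified) = {srmr_simple(obs, imp):.3f}")
```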

Guidelines for Fit

Previously, guidelines for acceptable fit included a nonsignificant χ², CFI greater than .90 (Hu & Bentler, 1995), RMSEA less than 0.10 with an upper bound of the 90% CI of 0.10 (Browne & Cudeck, 1993), and SRMR less than 0.10 (Bentler, 1995). These guidelines are still followed by many, as evidenced by the positive evaluation of models with fit indices at or near these values. However, readers should be aware that some debate exists among statisticians (and likely among reviewers) regarding acceptable fit. Recent studies (e.g., Hu & Bentler, 1998, 1999) have suggested a minimum cutoff of .95 for CFI and a maximum cutoff of 0.06 for RMSEA. However, the work by Hu and Bentler and others (e.g., Marsh et al., 2004) also indicates that appropriate cutoff values are affected by sample size, model complexity, and degree of misspecification. Inappropriately applying the "new" cutoff criteria could result in the incorrect rejection of acceptable models when sample sizes are smaller than N = 500 and models are not complex. On the other hand, using "old" criteria to evaluate complex models estimated with samples larger than N = 500 can result in the incorrect acceptance of misspecified models. Empirical research suggests that models with fit indices failing the old guidelines (i.e., CFI ≥ .90, RMSEA ≤ 0.10, and SRMR ≤ 0.10) would likely not be acceptable, and that models with indices exceeding the new criteria (i.e., CFI ≥ .95, RMSEA ≤ 0.06, and SRMR ≤ 0.08) would be acceptable. This suggests to us that when CFI values between .90 and .95, RMSEA values between 0.05 and 0.10, and SRMR values between 0.08 and 0.15 are observed, readers should take into account the sample size used to estimate the model (using more stringent criteria for larger samples) and the complexity of the model (using more stringent criteria for less complex models). All fit indices should be considered together to determine overall model fit. Although the CFI for the proposed structural model in Table 3 indicates the model may be acceptable, the significant χ² and the high RMSEA and SRMR values suggest the model may fit the data poorly. In addition, the narrow 90% CI for the RMSEA indicates that there is little error in the point estimate of that index and thus that it is unlikely that the model in Figure 3 adequately describes the data in Table 2. Overall, the fit indices suggest that some misspecification may exist.
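These guidelines resist reduction to a single rule, but their rough logic can be sketched as a screening function. The function below is our heuristic encoding of the guidance just described (the cutoffs come from the text; the sample-size and complexity switch is a simplification), not a substitute for judgment.

```python
def fit_verdict(cfi, rmsea, srmr, n, complex_model):
    """Heuristic screen based on the guidelines above: apply the stricter
    'new' cutoffs when the sample is large (N >= 500) or the model is
    simple, and the older cutoffs otherwise."""
    use_strict = (n >= 500) or (not complex_model)
    if use_strict:
        acceptable = cfi >= 0.95 and rmsea <= 0.06 and srmr <= 0.08
    else:
        acceptable = cfi >= 0.90 and rmsea <= 0.10 and srmr <= 0.10
    return "plausible fit" if acceptable else "likely misspecified"

# Proposed structural model from Table 3 (N = 274, a fairly complex model):
# borderline values pass the older screen, though as noted above the high
# SRMR still warrants scrutiny of the residuals.
print(fit_verdict(cfi=0.90, rmsea=0.08, srmr=0.10, n=274, complex_model=True))
```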

Parameter Estimates

Although it is important to consider overall model fit, the significance of estimated parameters must also be examined, because fit indices do not take into account the significance of individual parameters. A model that fits the data quite well but has few significant parameters would be meaningless. Figure 3 includes standardized estimates for path coefficients, interpreted as regression coefficients. (Factor loadings for indicators are not included here because they would have been examined for significance when the confirmatory factor analysis was conducted during testing of the measurement model.) Standardized estimates allow the relationships among latent variables to be compared. In Figure 3, a stronger relationship is shown between Perceived Stress and Resilience (β = −.45) than between Social Support and Resilience (β = .17). However, the relationship between Perceived Stress and Resilience is also partially mediated by Social Support, so two paths from Perceived Stress to Resilience can be traced in the model (Perceived Stress → Resilience and Perceived Stress → Social Support → Resilience). Altogether, Perceived Stress, Social Support, and Positive Coping explain 51.6% of the variance in Resilience. The amount of explained variance can be determined by squaring the standardized disturbance estimate (.696² = .484) and subtracting the value from 1 (R² = 1 − standardized D²). Resilience and Perceived Stress, with indirect effects of Social Support, Positive Coping, and Severity of Disability, explain 74.6% of the variance in Depression. Severity of Disability, as the sole predictor of Perceived Stress, explains 43.7% of the variance in Perceived Stress, and Perceived Stress explains 20.9% of the variance in Social Support. All proposed parameters were significant and in the expected direction. One problem that can arise, but that was not observed here, is a negative error variance. Negative estimates of error variance are called Heywood cases and result from a standardized factor loading or path coefficient greater than 1.0. The presence of Heywood cases indicates a poorly specified model (Kline, 2005), and the researcher should consider potential problems in specifying the model that may have led to the Heywood case.
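The R² arithmetic for Resilience is easy to verify; a one-line check (the disturbance value is read from the standardized output):

```python
# Standardized disturbance (D) for Resilience, read from the output.
d = 0.696
r_squared = 1 - d ** 2          # R^2 = 1 - (standardized D)^2
print(f"R2 for Resilience = {r_squared:.3f}")   # ~= .516, matching Table 3
```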


Model Modification/Testing Alternative Models

The fit of a proposed model is often less than ideal. As a result, researchers may modify (or respecify) their models to correct misfit. Model modification involves adjusting the estimated model by freeing or constraining parameters. In other words, parameters that were initially free to be estimated may be constrained to a value (e.g., 0.0) and not estimated, and parameters previously constrained may be respecified as free and estimated in the modified model. Such changes occur most often when researchers encounter nonsignificant parameter estimates and opt to constrain those parameters to 0.0 to simplify the model. Modification is a controversial topic that has been likened to the debate about post hoc comparisons in analysis of variance (Hoyle, 1995). Readers interested in specific aspects of the discussion are referred to Bollen and Long's (1993) edited volume, which is devoted entirely to the debate, as well as to work by Hoyle and Panter (1995), MacCallum and Austin (2000), and McDonald and Ho (2002). As with analysis of variance and regression, problems with model modification include capitalization on chance and results that are data driven and thus specific to a sample. Although there is disagreement regarding the acceptability of post hoc model modification, statisticians and applied researchers alike emphasize the need to state clearly when post hoc modification has occurred rather than implying that analyses were a priori. As Martens (2005) found, modification is generally accomplished by using statistical search strategies (often called a specification search) to determine which adjustments will result in a better-fitting model. The Lagrange multiplier test suggests which parameters constrained to 0 (not estimated) should be freed (estimated), and the Wald test suggests which free parameters should be constrained to 0 and, effectively, removed from the model. Schumacker and Lomax (2004) and Kline (2005) provided detailed information on conducting specification searches using modification indices. Researchers who opt to engage in post hoc modification should be aware (and ready to address reviewers' concerns) that such capitalization on chance often results in the estimation of models that do not generalize across samples (e.g., Chou & Bentler, 1990; Green, Thompson, & Babyak, 1998). This problem is more likely when (a) small samples are used, (b) researchers do not limit modifications to those that are theoretically acceptable, and (c) the initial model is severely misspecified (Green et al., 1998). Careful researchers will modify their models within the limits of their theory. For example, if a Wald test indicated that the freely estimated parameter from Perceived Stress to Resilience should be removed, that modification would not be made, because the suggested change contradicts theory and research.

Modification Example

The Lagrange multiplier test revealed two potential modifications to the proposed model that were not contrary to theory or past research. First, results suggested that freely estimating a parameter from Severity of Disability to Social Support would improve model fit. In the proposed model, this parameter is constrained to 0 and not estimated. By freeing the parameter and releasing the constraint, we reduce the degrees of freedom by 1 (from the current 182) and specify a slightly more complex model. The second suggestion was to estimate a parameter from Severity of Disability to Positive Coping. Although these two parameters were estimated in steps so that each addition to the model could be evaluated, Figure 4 shows the final model with both parameters. Table 3 shows the fit statistics at each step.

[Figure 4 appears here: path diagram of the modified structural model, showing standardized path coefficients, including the added paths from Severity of Disability to Social Support and to Positive Coping.]

Figure 4. Modified model with both additional paths (N = 274). Fit indices for this model are as follows: χ²(180, N = 274) = 435.10, p < .001, comparative fit index = .92, root-mean-square error of approximation = .07 (90% confidence interval: .06, .08), standardized root-mean-square residual = .06. * p < .05. ** p < .01. *** p < .001.


The fit of the first modified model is compared with that of the proposed model in three ways: (a) parameters are evaluated by examining the significance of parameter estimates, (b) the change in explained variance for Resilience and Depression is considered, and (c) improvement in model fit is tested with a chi-square difference test and examined through changes in the other fit indices. Figure 4 shows that all original parameters and the two additional parameters are significant. This is the first indication that the first modified model may be acceptable. Second, essentially no change in explained variance for Resilience or Depression is observed in Table 3. Ideally, model modification would result in an increase in explained variance or, at least, no change. Third, the fit of the proposed model is compared with the fit of the first modified model. Table 3 shows that the χ² for the first modified model is smaller but still significant. There is an increase in the SRMR and no change in the RMSEA and its 90% CI, but the CFI shows an improvement in fit. Thus, the addition of the parameter from Severity of Disability to Social Support improved the model slightly, and it is retained in the next modified model, where a parameter from Severity of Disability to Positive Coping is added. The second modified model also shows an improvement. The added parameter is significant and results in increases in explained variance in Resilience and Depression. Finally, the decrease in χ² is significant, and all other fit indices show improvement.

Additional fit indices, used to determine which of two or more competing models fits best, include the Akaike information criterion (AIC; Akaike, 1974) and the expected cross-validation index (Browne & Cudeck, 1993). Both are considered predictive fit indices and indicate how well models would be expected to fit sample data drawn from the same population. These indices are not informative in determining how well a single model fits the data but are generally used to choose between models. Only the AIC is reported here, as EQS does not include the expected cross-validation index in its output. Smaller values indicate better fitting models, with more parsimonious models favored. For the model shown in Figure 3, AIC = 128.85. In comparison, for the final model shown in Figure 4, AIC = 75.10, indicating that the modified model, although more complex, is a better fit to the data.

The model in Figure 3 is more restricted than the model in Figure 4 because the parameters from Severity of Disability to Social Support and Positive Coping are constrained to 0 and not estimated. Because those two parameters are freely estimated in the modified model in Figure 4, the proposed model in Figure 3 is nested within the less restrictive modified model. Nested models can be directly compared to determine which provides a better fit. If the decrease in the χ² value is significant, the less restrictive modified model provides a better fit to the data. The difference is calculated by subtracting the χ² value of the less restrictive first modified model (486.06) from the χ² value of the more restrictive proposed model (492.85). This calculation yields the χ² difference, 6.79, shown in Table 3. To determine the associated degrees of freedom, the degrees of freedom for the less restrictive modified model (df = 181) are subtracted from those for the more restrictive proposed model (df = 182). Using the resulting degrees of freedom, the researcher consults a χ² table (found in the appendix of most univariate statistics texts) for the critical value at the appropriate significance level. The critical value for a χ² with 1 degree of freedom at p < .05 is 3.84. Our value of 6.79 exceeds this critical value. Thus, the χ² difference test suggests that the additional parameter in the modified model has resulted in a significant improvement in fit. Considered together with the improvement in the CFI and the smaller AIC, this suggests that the added parameter is indeed significantly different from zero and should be included in the final model. Comparing the second modified model with the proposed model shows the following: Δχ²(2, N = 274) = 57.75, p < .05. This difference, along with the improvement in the other fit indices, supports the decision to retain the second modified model as the final model.
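The chi-square difference test can likewise be carried out outside the SEM package. The sketch below (using scipy; the function name is ours) reproduces the comparison of the proposed model with the first modified model from Table 3.

```python
from scipy.stats import chi2

def chi_sq_difference(chi_more, df_more, chi_less, df_less, alpha=0.05):
    """Chi-square difference test for nested models: the more restrictive
    model has the larger chi-square and more degrees of freedom."""
    delta_chi = chi_more - chi_less
    delta_df = df_more - df_less
    critical = chi2.ppf(1 - alpha, delta_df)
    p = chi2.sf(delta_chi, delta_df)
    return delta_chi, delta_df, critical, p

# Proposed model vs. first modified model (values from Table 3).
d_chi, d_df, crit, p = chi_sq_difference(492.85, 182, 486.06, 181)
print(f"delta chi2({d_df}) = {d_chi:.2f}, critical = {crit:.2f}, p = {p:.4f}")
# delta chi2(1) = 6.79 exceeds the critical value 3.84: retain the added path
```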

SEM in Rehabilitation Psychology

Rehabilitation psychologists often approach disability and disability-related interventions using a biopsychosocial model, concomitantly addressing a variety of factors that affect individuals, such as social systems, psychological/personal characteristics, and biological systems (Lynch, 2005). SEM can be a powerful procedure for empirically representing complex and sophisticated theoretical models, facilitating an increased understanding of the relationships among multiple biopsychosocial factors. Using SEM procedures, models can be developed to identify factors that may be important to consider in developing effective interventions designed to improve the lives of individuals with disabilities and chronic illness. Consensus on the measurement of these biopsychosocial factors, however, continues to challenge rehabilitation researchers; SEM will therefore be useful as a tool for confirming the measurement structures of psychological instruments as well as for theory and model building.

SEM has direct implications for treatment and intervention programs. For example, in the modified resilience model described herein, there appear to be direct and significant relationships between Perceived Stress and Depression and between Resilience and Depression. A treatment program developed to help individuals manage stress effectively and to enhance resilience could be an effective means of diminishing depressive symptomatology. Resilience could be enhanced through interventions that improve an individual's level of social support and coping skills. Experimental studies and clinical trials can then be conducted to determine the effectiveness of the interventions. Competing models and interventions can be tested against one another to determine whether a particular intervention appears to be more effective and efficient for a particular disorder in a particular population. Using SEM as a model- and theory-testing procedure allows continued progress in rehabilitation research efforts to identify effective and efficient standards of practice that enhance the health, general well-being, and psychosocial functioning of individuals with disabilities and chronic illness. Applications beyond the examples shown here include the examination of change over time through the analysis of repeated measures (e.g., growth curve models, Bollen & Curran, 2006; longitudinal mediation models, Cole & Maxwell, 2003), the comparison of multiple groups to test for factorial invariance (Widaman & Reise, 1997), and the analysis of hierarchical data with multilevel modeling (Hox, 2002; Stapleton, 2006). Given the prevalence of SEM studies, it is now not uncommon to see meta-analyses based in part or in whole on previous SEM studies.

Although many new, easy-to-use software programs have increased the accessibility of this quantitative method, SEM remains a complex family of statistical procedures that requires a great deal of judgment on the part of the researcher to avoid misuse and misinterpretation. Users of SEM face decisions about how many participants to use, how to handle nonnormality in their data, which estimation methods and fit indices to use, and how to evaluate the meaning of those fit indices. We offer several guidelines for answering these questions. SEM requires larger samples as more complex models are specified. Guidelines for appropriate fit remain a topic of active debate among statisticians, resulting in some uncertainty about what constitutes good fit in applications of SEM. To address this issue, we recommend that researchers report the fit indices described here (i.e., χ², CFI, RMSEA with its 90% CI, and SRMR), as well as the standardized parameter estimates with their significance. In addition, we strongly recommend that researchers include (or reviewers request) a covariance matrix, or a correlation matrix with means and standard deviations, which allows others to duplicate the results and independently assess model fit. Perhaps the most important suggestion we can offer those interested in understanding SEM is not to attempt to master any static set of decision rules. Instead, we emphasize the need to continue to seek out information on the appropriate application of this technique. Guides to SEM by Kline (2005) and Schumacker and Lomax (2004) are accessible to new users. The classic text by Pedhazur (1982) provides information on basic path analysis concepts and principles that apply to SEM. Another resource for novice and advanced SEM users alike is SEMNET, the structural equation modeling discussion network (http://www2.gsu.edu/~mkteer/semnet.html). It is likely that best practices will change as consensus is reached, affecting the way in which researchers make decisions.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Allison, P. D. (2003). Missing data techniques for structural equation modeling. Journal of Abnormal Psychology, 112, 545–557.
Anderson, J. C., & Gerbing, D. W. (1984). The effects of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155–173.
Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103, 411–423.
Arbuckle, J. L. (2003). Amos user's guide. Chicago: SmallWaters.
Bartlett, M. S. (1950). Tests of significance in factor analysis. British Journal of Psychology: Statistical Section, 3, 77–85.
Bentler, P. M. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238–246.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P. M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34, 181–197.
Bollen, K. A. (1989). A new incremental fit index for general structural equation models. Sociological Methods and Research, 17, 303–316.
Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation approach. Hoboken, NJ: Wiley.
Bollen, K. A., & Long, J. S. (Eds.). (1993). Testing structural equation models. Newbury Park, CA: Sage.
Boomsma, A. (2000). Reporting analyses of covariance structures. Structural Equation Modeling, 7, 461–483.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Browne, M. W., MacCallum, R. C., Kim, C., Andersen, B. L., & Glaser, R. (2002). When fit indices and residuals are incompatible. Psychological Methods, 7, 403–421.
Carver, C. S. (1997). You want to measure coping but your protocol's too long: Consider the Brief COPE [Electronic version]. International Journal of Behavioral Medicine, 41, 91–100.
Catalano, D. (2006). Resiliency as a framework for predicting life adaptation in a community sample of Canadians with spinal cord injuries. Unpublished doctoral dissertation, University of Wisconsin–Madison.
Chan, S., Chan, C. C. H., Siu, A. M. H., & Poon, P. K. K. (2007). Stages of change in self-management of chronic diseases: Psychometric properties of the Chinese version of the URICA scale. Rehabilitation Psychology, 52, 103–112.
Chou, C.-P., & Bentler, P. M. (1990). Model modification in covariance structure modeling: A comparison among the likelihood ratio, Lagrange multiplier, and Wald tests. Multivariate Behavioral Research, 25, 115–136.
Chou, C.-P., Bentler, P. M., & Satorra, A. (1991). Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: A Monte Carlo study. British Journal of Mathematical and Statistical Psychology, 44, 347–357.
Chronister, J. A., & Chan, F. (2006). A stress process model of caregiving for individuals with traumatic brain injury. Rehabilitation Psychology, 51, 190–201.
Cole, D. A., & Maxwell, S. E. (2003). Testing mediational models with longitudinal data: Questions and tips in the use of structural equation modeling. Journal of Abnormal Psychology, 112, 558–577.
Connor, K. M., & Davidson, J. R. T. (2003). Development of a new resilience scale: The Connor-Davidson Resilience Scale (CD-RISC). Depression and Anxiety, 18, 76–82.
Corrigan, P. W., Larson, J. E., & Kuwabara, S. A. (2007). Mental illness stigma and the fundamental components of supported employment. Rehabilitation Psychology, 52, 451–457.
Curran, P. J., West, S. G., & Finch, J. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.
Elliott, T. R., Bush, B. A., & Chen, Y. (2006). Social problem-solving abilities predict pressure sore occurrence in the first 3 years of spinal cord injury. Rehabilitation Psychology, 51, 69–77.
Green, S. B., Thompson, M. S., & Babyak, M. A. (1998). A Monte Carlo investigation of methods for controlling Type I errors with specification searches in structural equation modeling. Multivariate Behavioral Research, 33, 365–383.
Harkins, S. W., Elliott, T. R., & Wan, T. T. H. (2006). Emotional distress and urinary incontinence among older women. Rehabilitation Psychology, 52, 346–355.
Hartmann, W. M. (1992). The CALIS procedure: Extended user's guide. Cary, NC: SAS Institute.
Hox, J. J. (2002). Multilevel analysis: Techniques and applications. Mahwah, NJ: Erlbaum.
Hoyle, R. H. (Ed.). (1995). Structural equation modeling: Concepts, issues and applications. Thousand Oaks, CA: Sage.
Hoyle, R. H., & Panter, A. T. (1995). Writing about structural equation models. In R. H. Hoyle (Ed.), Structural equation modeling (pp. 158–176). Thousand Oaks, CA: Sage.
Hu, L.-T., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues and applications (pp. 76–99). Thousand Oaks, CA: Sage.
Hu, L.-T., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424–453.
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Hu, L.-T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Hui, S. K. A., Elliott, T. R., Shewchuk, R., & Rivera, P. (2007). Communal behavior and psychosocial adjustment of family caregivers and persons with spinal cord injury. Rehabilitation Psychology, 52, 113–119.
Jackson, D. L. (2001). Sample size and number of parameter estimates in maximum likelihood confirmatory factor analysis: A Monte Carlo investigation. Structural Equation Modeling, 8, 205–223.
Jackson, D. L. (2003). Revisiting sample size and number of parameter estimates: Some support for the N:q hypothesis. Structural Equation Modeling, 10, 128–141.
Jackson, D. L. (2007). The effect of the number of observations per parameter in misspecified confirmatory factor analytic models. Structural Equation Modeling, 14, 48–76.
Jöreskog, K. G. (1973). A general method for estimating a linear structural equation system. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 85–112). New York: Academic Press.
Jöreskog, K. G., & Sörbom, D. (1981). Analysis of linear structural relationships by maximum likelihood and least squares methods (Research Report No. 81-8). Uppsala: University of Sweden.
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: User's reference guide. Chicago: Scientific Software International.
Kline, R. B. (1998). Principles and practice of structural equation modeling. New York: Guilford.
Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford.
Kumpfer, K. (1999). Factors and processes contributing to resilience: The resilience framework. In M. D. Glantz & J. L. Johnson (Eds.), Resilience and development: Positive life adaptations (pp. 179–224). New York: Kluwer Academic/Plenum Publishers.
Lee, G. K., Chan, F., & Berven, N. L. (2007). Factors affecting depression among people with chronic musculoskeletal pain: A structural equation model. Rehabilitation Psychology, 52, 33–43.
Levenstein, S., Prantera, C., Varvo, V., Scribano, M. L., Berto, E., Luzi, C., & Andreoli, A. (1993). Development of the Perceived Stress Questionnaire: A new tool for psychosomatic research. Journal of Psychosomatic Research, 37, 19–32.
Livneh, H., & Wilson, L. M. (2003). Coping strategies as predictors and mediators of disability-related variables and psychosocial adaptation: An exploratory investigation. Rehabilitation Counseling Bulletin, 46, 194–208.
Luthar, S. S., Cicchetti, D., & Becker, B. (2000). The construct of resilience: A critical evaluation and guidelines for future work. Child Development, 71, 543–562.
Lynch, R. T. (2005). Promotion of health and enhanced life functioning for individuals with traumatic injuries and chronic health conditions. In F. Chan, M. J. Leahy, & J. L. Saunders (Eds.), Case management for rehabilitation health professionals (2nd ed., Vol. 2, pp. 44–63). Osage Beach, MO: Aspen Professional Services.
MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in psychological research. Annual Review of Psychology, 51, 201–226.
Marsh, H. W., Hau, K.-T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling, 11, 320–341.
Martens, M. P. (2005). The use of structural equation modeling in counseling psychology research. The Counseling Psychologist, 33, 269–298.
Masten, A. S., & Coatsworth, J. D. (1998). The development of competence in favorable and unfavorable environments: Lessons from research on successful children. American Psychologist, 53, 205–220.
McDonald, R. P., & Ho, M.-H. R. (2002). Principles and practice in reporting structural equation analyses. Psychological Methods, 7, 64–82.
Middleton, J. W., Tate, R. L., & Geraghty, T. J. (2003). Self-efficacy and spinal cord injury: Psychometric properties of a new scale. Rehabilitation Psychology, 48, 281–288. Retrieved April 28, 2005, from the PsycARTICLES database.
Motl, R. W., McAuley, E., & Snook, E. M. (2007). Physical activity and quality of life in multiple sclerosis: Possible roles of social support, self-efficacy, and functional limitations. Rehabilitation Psychology, 52, 143–151.
Motl, R. W., Snook, E. M., McAuley, E., Scott, J. A., & Gliottoni, R. C. (2007). Are physical activity and symptoms correlates of functional limitations and disability in multiple sclerosis? Rehabilitation Psychology, 52, 463–469.
Mulaik, S. A., & Millsap, R. E. (2000). Doing the four-step right. Structural Equation Modeling, 7, 36–73.
Muthén, B. O., & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19–30.
Muthén, L. K., & Muthén, B. O. (2004). Mplus user's guide (3rd ed.). Los Angeles: Author.
Nevitt, J., & Hancock, G. R. (2004). Evaluating small sample approaches for model test statistics in structural equation modeling. Multivariate Behavioral Research, 39, 439–478.
Pedhazur, E. J. (1982). Multiple regression in behavioral research (2nd ed.). Austin, TX: Holt, Rinehart and Winston.
Quintana, S. M., & Maxwell, S. E. (1999). Implications of recent developments in structural equation modeling for counseling psychology. The Counseling Psychologist, 27, 485–527.
Radloff, L. S. (1977). The CES-D Scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
SAS Institute, Inc. (2000). SAS/ETS software: Changes and enhancements, Release 8.1. Cary, NC: Author.
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis (pp. 399–419). Thousand Oaks, CA: Sage.
Schumacker, R. E., & Lomax, R. G. (2004). A beginner's guide to structural equation modeling (2nd ed.). Mahwah, NJ: Erlbaum.
Stapleton, L. M. (2006). Using multilevel structural equation modeling techniques with complex sample data. In G. R. Hancock & R. Mueller (Eds.), A second course in structural equation modeling (pp. 345–384). Greenwich, CT: Information Age Publishing.
Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25, 173–180.
Steiger, J. H., & Lind, J. C. (1980, May). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City, IA.
Suzuki, R., Krahn, G. L., McCarthy, M. J., & Adams, E. J. (2007). Understanding health outcomes: Physical secondary conditions in people with spinal cord injury. Rehabilitation Psychology, 52, 338–350.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Tanaka, J. S., & Huba, G. J. (1985). A fit index for covariance structural models under arbitrary GLS estimation. British Journal of Mathematical and Statistical Psychology, 38, 197–201.
Tanaka, J. S., & Huba, G. J. (1989). A general coefficient of determination for covariance structure models under arbitrary GLS estimation. British Journal of Mathematical and Statistical Psychology, 42, 233–239.
Weston, R., & Gore, P. A. (2006). A brief guide to structural equation modeling. The Counseling Psychologist, 34, 719–751.
Widaman, K. F., & Reise, S. P. (1997). Exploring the measurement invariance of psychological instruments: Applications in the substance use domain. In K. J. Bryant, M. Windle, & S. G. West (Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 281–324). Washington, DC: American Psychological Association.
Wright, S. (1920). The relative importance of heredity and environment in determining the piebald pattern of guinea-pigs. Proceedings of the National Academy of Sciences, USA, 6, 320–332.
Yu, X., & Zhang, J.-X. (2007). Factor analysis of the Connor-Davidson Resilience Scale (CD-RISC) with Chinese people. Social Behavior and Personality: An International Journal, 35, 19–30.
Yuan, K.-H., & Bentler, P. M. (1998). Normal theory based test statistics in structural equation modeling. British Journal of Mathematical and Statistical Psychology, 51, 289–309.
Ziegelmann, J. P., Luszczynska, A., Lippke, S., & Schwarzer, R. (2007). Are goal intentions or implementation intentions better predictors of health behavior? A longitudinal study in orthopedic rehabilitation. Rehabilitation Psychology, 52, 97–102.
Zimet, G. D., Dahlem, N. W., Zimet, S. G., & Farley, G. K. (1988). The multidimensional scale of perceived social support. Journal of Personality Assessment, 52, 30–41.

Received May 19, 2008
Revision received June 11, 2008
Accepted June 12, 2008