Language Learning

ISSN 0023-8333

CHAPTER 7

Structural Equation Modeling: Possibilities for Language Learning Researchers

Gregory R. Hancock (University of Maryland) and Rob Schoonen (University of Amsterdam)

Although classical statistical techniques have been a valuable tool in second language (L2) research, L2 research questions have started to grow beyond those techniques’ capabilities, and indeed are often limited by them. Questions about how complex constructs relate to each other or to constituent subskills, about longitudinal development in those constructs and factors affecting that development, and about differences among populations in average amounts of complex constructs or in their relations require a broader analytical framework. Fortunately, that of structural equation modeling (SEM), a versatile and ever-expanding family of techniques, is able to accommodate such questions and many more. The current article describes some of the questions that can be addressed by SEM, presents some research examples within the existing L2 literature, and then provides examples of the incredible potential of SEM, cautions in its practice, and resources for further information.

Keywords: SEM; covariance structure analysis; modeling relations; multi-group comparisons

Introduction
Although traditional analytical techniques such as analysis of variance (ANOVA) and multiple linear regression have been the workhorses of many disciplines for decades, the increased complexity of research questions has demanded a commensurate increase in the capabilities of our analytical tools. In second language (L2) research, for example, such questions might include how complex constructs relate to each other or to constituent subskills, how constructs develop longitudinally (and the factors affecting that development), and what differences exist among populations in amounts of complex constructs or

Correspondence concerning this article should be addressed to Gregory R. Hancock, Department of Human Development and Quantitative Methodology, 1230 Benjamin Building, University of Maryland, College Park, MD 20742–1115. E-mail: [email protected]

Language Learning 65:Suppl. 1, June 2015, pp. 160–184. © 2015 Language Learning Research Club, University of Michigan

DOI: 10.1111/lang.12116


Hancock and Schoonen

SEM Possibilities for Language Learning

their relations (where those populations are known a priori or are inferred from the data themselves). Each of these is elaborated upon briefly below.

How Constructs Relate
Multiple correlation and regression have been the traditional methods of choice for studying how constructs relate to each other, where those constructs are operationalized in the form of scores on particular assessments, instruments, or observations. But what if those constructs are measured with error, which they almost certainly are in L2 research, and/or are part of a larger network of hypothesized causal relations with many constructs believed to be influencing each other? Accommodating the attenuating and often misleading effects of measurement error, as well as hypotheses involving multiple independent and dependent variables, quickly exceeds the capabilities of the general linear model.

How Constructs Develop Over Time
Classical repeated measures analyses are certainly valuable for a number of longitudinal research scenarios. They are quite constricting, however: when there are individual differences in development over time and a research interest in the determinants of those individual differences, when there is attrition, and when measurement error and potential differences in measurement quality exist over time, a more comprehensive and flexible longitudinal analytical framework is quickly needed.

Differences in Constructs and Construct Relations Across Populations
ANOVA in its many forms is a well-established technique for evaluating population mean differences involving one or more independent grouping variables. Under the very real conditions of nonnormality, heterogeneity of variance, and dependence of data such as through complex sampling designs, however, the technique becomes strained to say the least.
And if one wishes to accommodate error in measurement (which can greatly affect power and the ability to estimate the magnitude of effect sizes), or investigate potential populations that are unseen but suspected to underlie the data, or focus on population differences in construct relations themselves, ANOVA and the general linear model more broadly break down completely. The analytical paradigm able to accommodate all of the above research question scenarios, and more, is that of structural equation modeling (SEM). Within L2 research, unfortunately, SEM’s presence is thus far relatively sporadic. For example, a cursory search in July of 2014 of the PsycINFO

bibliographical database yielded 7507 hits for second language acquisition (SLA), but only 145 when cross-referenced with SEM. Even fewer results came from the Linguistics and Language Behavior Abstracts bibliographical database, wherein, out of the 27461 hits for SLA, only 51 referenced SEM. Still, these studies are first and foremost a hopeful sign of the shape of analytical things to come in L2 research. Second, they are merely the tip of the iceberg in terms of potential applications for SEM in the L2 domain. In an attempt to provide additional encouragement, this article is structured in three parts: (1) a relatively brief but accessible outline of some of the different types of models that can be examined within SEM; (2) some cautions in the practice of SEM, especially as might be relevant to L2 research; and (3) resources for learning more about SEM.

Some Useful Types of Structural Equation Models for L2 Research
As mentioned above, SEM is a fairly large, and expanding, analytical framework, with many specialized model types falling under the SEM heading. These types include, but are not limited to, measured variable path models, confirmatory factor models, latent variable path models, multisample covariance structure models, latent means models for between-subjects and within-subjects designs, latent growth models, and mixture models. For each of these specific model types, we will provide a brief conceptual overview, refer to an example of such a model as appears in the L2 or L2-related literature, and then provide additional examples and/or possibilities for that model type within the L2 domain.

Covariance Structure Models
Understanding why variables relate, or covary, is at the core of research. Indeed, most scientific endeavors are squarely aimed at articulating and testing particular models about the underlying causal mechanisms that constitute the structure of variables’ patterns of covariance in a population.
These models contain parameters that characterize, for example, the magnitude and direction of the causal connections hypothesized to give rise to the variables’ relations in the population, relations that are evidenced in, and estimated from, a covariance matrix for sample data drawn from that population. In SEM such models are often referred to as covariance structure models, which may be described as falling into three specific, although interrelated, model types: measured variable path models, confirmatory factor models, and latent variable path models. Each of these is addressed below.
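The idea that a causal model implies a particular covariance structure can be made concrete with a small simulation. The sketch below (in Python, with invented path values; it illustrates the logic only and is not an SEM estimation routine) generates data from a simple causal chain X → M → Y and confirms that the sample covariance matrix matches the matrix the model implies:

```python
import numpy as np

# Hypothetical causal chain: X -> M -> Y, with standardized path
# coefficients a (X -> M) and b (M -> Y).  Values are invented.
a, b = 0.5, 0.7

rng = np.random.default_rng(0)
n = 100_000

# Generate data consistent with the hypothesized causal structure,
# keeping all three variables at unit variance.
X = rng.standard_normal(n)
M = a * X + np.sqrt(1 - a**2) * rng.standard_normal(n)
Y = b * M + np.sqrt(1 - b**2) * rng.standard_normal(n)

# The model implies a specific covariance (here correlation) structure:
# cov(X, M) = a, cov(M, Y) = b, and cov(X, Y) = a * b (no direct path).
implied = np.array([[1.0,   a,   a * b],
                    [a,     1.0, b    ],
                    [a * b, b,   1.0  ]])
sample_cov = np.cov(np.vstack([X, M, Y]))
print(np.round(sample_cov, 2))  # close to `implied`
```

Fitting a covariance structure model reverses this process: given the sample matrix, the path parameters are estimated so that the implied matrix reproduces it as closely as possible.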


Figure 1 Measured variable path model from MacIntyre and Charos (1996).

Measured Variable Path Models
Measured variable path models, or more simply (and historically) path models, are related to multiple linear regression models in the sense that they specify linear models relating independent and dependent variables. Three critical distinctions exist, however. First, rather than having a single dependent variable, the researcher may specify multiple dependent variables simultaneously, allowing linear connections from independent to dependent variables as well as from dependent to other dependent variables, as is consistent with theory. Second, and at the core of all SEM, each of the hypothesized links does not merely represent the empirical association of an outcome variable with predictor variables, as is typical in regression, but rather a hypothesized causal link whereby variables are believed to exert influence on other variables. And third, assuming all variables are not directly linked to each other (a saturated model), a measured variable path model is rejectable; that is, if the pattern of relations among the measured variables is sufficiently inconsistent with the hypothesized connections specified in the model, then the model as a whole may be rejected as an explanatory system. An example from the L2 literature comes from MacIntyre and Charos (1996), who modeled L2 willingness to communicate in monolingual university students. As seen in Figure 1, the seven measured variables shown in rectangles were hypothesized to have causal relations as depicted by directional (single-headed) arrows. For models such as this, then,

measured variable path analysis provides a useful framework for specifying and assessing hypothesized causal relations among sets of measured variables.

Confirmatory Factor Analysis (CFA) Models
CFA models allow for the introduction of latent variables, also referred to as factors or constructs. While many definitions of latent variables have been offered, for our purposes they are quite simply variables hypothesized to exist but for which we have no direct measures at hand, often because such variables are by their very nature unobservable. In the same sense as in exploratory factor analysis (EFA), the factors are typically believed to be the unobserved underlying sources of patterns of covariation among measured variables. The key difference between EFA and CFA, however, as implied by their names, is that the latter does not involve exploring the data in search of such latent variables but rather explicitly specifies them a priori within a model and then assesses that model’s ability to explain the observed variables’ characteristics. Most typically, although not necessarily, the observed variables are selected or constructed specifically in an attempt to serve as observable (manifest) indicators of the hypothesized constructs, as in, for example, the creation of various reading and listening span tasks to try to reflect the latent working memory capacity those tasks are believed to assess. The CFA model thus serves as an analytical framework to gather support for, or to refute, hypothesized constructs’ existence and their relations to their observed indicators, as well as relations to other constructs when more than one is contained in the model. A nice example from the L2 literature comes from a commentary by Isemonger (2007), in which he points out the ability of CFA to compare models representing competing explanations, specifically regarding implicit and explicit language knowledge.
In Figure 2, the competing models depict potential latent variables, shown in ellipses, that are theorized to influence scores on various task measures, while the latent variables themselves are hypothesized to covary as indicated by their nondirectional (i.e., two-headed) arrow.

Latent Variable Path Models
Latent variable path models combine aspects of measured variable path models and CFA models, representing a general covariance structure modeling framework for assessing hypothesized relations among independent and dependent measured and latent variables. In models involving measured variables in general, whether path models or otherwise (e.g., regression, ANOVA), error of measurement can greatly affect the estimation of those measured variables’ relations, specifically leading to attenuated estimates of the

Figure 2 Competing confirmatory factor analysis models from Isemonger (2007).

parameters trying to explain those relations, as well as diminished statistical power in the tests of those parameters. As most variables are typically assumed to be mere operationalizations of error-free constructs, latent variable path models insert CFA models for each such construct in place of many or all measured variables from a path model. The result is a model combining both a measurement (CFA) model and a theoretical (path) structural model, where the structure captures relations among the constructs of interest rather than among their error-prone measured indicators. This has the benefits of (1) directly representing the construct relations that are typically of interest to the researcher; (2) estimating those relations without attenuation and with increased statistical power; and (3) being rejectable, that is, allowing for the assessment of consistency or inconsistency with the pattern of covariation in the data to determine whether or not the model as a whole is viable. As an example, as shown in Figure 3, Yashima (2002) integrated the latent variables of L2 Communication Confidence (two indicators), L2 Learning Motivation (two indicators), L2

Figure 3 Yashima’s (2002) latent variable path model of Japanese students’ willingness to communicate in English.

Proficiency (three indicators), and attitude toward the international community (International Posture; four indicators) into a model with a measured outcome of Japanese students’ willingness to communicate in English.

Covariance Structure Models: L2 Research Possibilities
Covariance structure models offer many possibilities, some of which have already appeared in the literature. For example, researchers have studied the constituent parts of (second) language proficiency, such as work by Schoonen et al. (2003) and Schoonen, Van Gelderen, Stoel, Hulstijn, and De Glopper (2011) analyzing L2 writing proficiency, and work by Shiotsu and Weir (2007), Van Gelderen et al. (2004), and Van Gelderen, Schoonen, Stoel, De Glopper, and Hulstijn (2007) analyzing L2 reading proficiency. In these studies researchers modeled the subskills that they hypothesized to be an important part of overall language ability, and then assessed whether these model predictions held. This is reminiscent of regression analyses, but SEM provides the possibility to model the relations among the predictor variables as well, and, as was mentioned above, can take into account differences in reliability of the measures used, providing a fairer comparison of predictors. For instance, Shiotsu and Weir (2007) showed that in different samples of students both vocabulary and syntactic knowledge are strong predictors of reading comprehension, but that syntactic knowledge tends to be the stronger predictor of the two. As the researchers applied a latent variable path model, the comparison between the two components of reading is not contaminated by possible differences in reliability of the measures.
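The reliability point deserves emphasis, and a brief simulation makes it concrete. In the hypothetical sketch below (Python; all values are invented), a predictor truly correlated .60 with an outcome is observed with reliability .70, which attenuates the observed correlation to roughly .60 × √.70 ≈ .50; dividing by √.70 recovers the construct-level relation, which is in essence what modeling the predictor as a latent variable accomplishes:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# True (error-free) predictor T and outcome Y, correlated at rho = 0.6.
rho = 0.6
T = rng.standard_normal(n)
Y = rho * T + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Observed score = true score + measurement error; reliability 0.7 means
# 70% of the observed variance is true-score variance.
rel = 0.7
X_obs = np.sqrt(rel) * T + np.sqrt(1 - rel) * rng.standard_normal(n)

r_true = np.corrcoef(T, Y)[0, 1]       # about 0.60
r_obs = np.corrcoef(X_obs, Y)[0, 1]    # attenuated to about 0.60 * sqrt(0.7)
r_corrected = r_obs / np.sqrt(rel)     # disattenuated, back to about 0.60
```

When two predictors differ in reliability, this attenuation hits them unevenly, which is exactly why comparing their raw regression weights can mislead.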


For L2 research the modeling facility affords opportunities to address key questions regarding the role of L1 abilities in L2 language learning. For example, is there a direct effect of L1 writing ability on L2 writing, or is this effect mediated by something like metacognitive or metalinguistic knowledge? The modeling would be analogous to Yashima’s model (Figure 3), showing not only that International Posture has an indirect effect on Willingness to Communicate (via Motivation, Proficiency, and Confidence), but a direct effect as well. In an L1–L2 model one could theorize that not all of the L1 influence is mediated by metacognitive and metalinguistic skills, but that a direct effect of L1 on L2 writing remains. In a similar vein, one can investigate how background variables need to be modeled. Does language background (native vs. nonnative) per se have an influence on reading proficiency, or does it make more sense to presume that language background influences exposure to the L2, that L2 exposure affects vocabulary uptake, and that vocabulary size ultimately determines reading comprehension? This chain of effects is theoretically very plausible, but cannot be tested in a typical regression analysis. With the appropriate data such a covariance structure model can be put to the test and compared with a model that postulates a direct effect of language background on reading comprehension.

Multisample Covariance Structure Models
Each of the three types of models presented so far has been an example of a covariance structure model, that is, a model whose purpose is to explain why observed variables vary and covary in the manner that they do. Additionally, each did so with attention focused on a specific population.
One may be interested, however, in fitting a model to multiple known populations simultaneously, with the aim of assessing whether key model relations appear to be the same across populations (i.e., are invariant) or appear to differ (i.e., are noninvariant). This type of model may be used in a variety of multisample settings, experimental or otherwise, with particular utility in cross-cultural research. For multisample CFA models the goal is often to examine whether factors’ measured indicators function comparably in those different populations, whereas for measured and latent variable path models the attention is on potential population differences in the theorized causal connections among the model’s key structural elements. In work by Van Gelderen et al. (2003), the authors compared structural models of linguistic skills’ influence on reading proficiency in samples of Dutch native students and Dutch non-native students. The latent variable path model, where reading proficiency and the various linguistic skills are latent and have two measured indicator variables each, is shown in Figure 4.
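The logic of a multisample comparison can be sketched with ordinary regression as a conceptual stand-in (Python; the two groups and all values are invented, and per-group OLS replaces genuine SEM fitting). Each group's structural coefficient is estimated freely, while a constrained model forcing a single pooled coefficient embodies the invariance hypothesis; in SEM the fit of these two models would be formally compared:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two hypothetical populations (say, two L1 backgrounds) in which the
# structural coefficient linking a predictor to an outcome may differ.
def simulate(n, slope):
    x = rng.standard_normal(n)
    y = slope * x + 0.5 * rng.standard_normal(n)
    return x, y

x1, y1 = simulate(50_000, slope=0.8)   # group 1: stronger relation
x2, y2 = simulate(50_000, slope=0.5)   # group 2: weaker relation

# Free model: each group gets its own coefficient.
b1 = np.polyfit(x1, y1, 1)[0]
b2 = np.polyfit(x2, y2, 1)[0]

# Constrained ("invariant") model: one coefficient for both groups.
# A worse fit for this model, relative to the free one, is evidence
# of noninvariance.
b_pooled = np.polyfit(np.concatenate([x1, x2]),
                      np.concatenate([y1, y2]), 1)[0]
```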


Figure 4 Multisample covariance structure model based on Van Gelderen et al. (2003); see Schoonen et al. (2002) for a similar model for writing proficiency.


Multisample Covariance Structure Models: L2 Research Possibilities
For L2 research the capability to work with separate samples simultaneously is very helpful. When constrained by traditional techniques, we are inclined simply to compare (mean) scores on single tests, while from a theoretical perspective it is more interesting (and appropriate) to take a broader view and to include the relations between variables (covariances and path coefficients) in the comparisons. Relations between variables in L2 learning, for example, are likely to differ by teaching context (e.g., communicative approach or not), country, age, L1, and so forth. In all of those cases it might be worthwhile to take a multisample approach. An example with different age groups and teaching contexts is the study by Kormos, Kiddle, and Csizer (2011) regarding English language learning motivation and how it is affected by L2 learning attitude, parental encouragement, knowledge orientation, international posture, ideal L2 self, and ought-to L2 self, in three populations: secondary school students, university students, and language institute students. Kormos and colleagues found self-related beliefs to be important for learning motivation, but these effects differed across samples. From a crosslinguistic perspective, multisample analyses offer the possibility to evaluate the effect of typological differences between a learner’s L1 and the target L2. Do learners coming from an L1 with verb-second word order have the same problems in learning a subject-object-verb language as do learners with a verb-initial L1? And are the same variables involved in successful learning? Learners with different L1s would form the multiple samples. It is also conceivable that the samples are drawn from learners with the same L1 but learning different L2s.
In work by Kieffer and Lesaux (2012a) four different samples of young speakers and learners of English were involved, namely native speakers and L1 Spanish, L1 Vietnamese, and L1 Filipino learners. The authors investigated the direct and indirect effects of the latent variable morphological awareness on (latent) reading comprehension, potentially mediated by reading vocabulary and silent word reading fluency. Despite the differences in the morphology of the L1s of the (young) learners, no differences were found between the samples regarding the important direct and indirect role morphological awareness plays in reading comprehension.

Latent Means Models—Between-Groups Designs and Within-Groups Designs
All of the models covered so far have been about how variables relate rather than how well individuals score on average. By analogy, multiple regression examines how independent variables relate to a dependent variable, whereas

Figure 5 Latent means model from Matsumura (2001) with between-group and within-group comparisons.

ANOVA focuses on group differences in the average score on one or more variables. And while it is certainly possible to frame the latter as a special case of the former, ANOVA methods are utilized with different types of research questions in mind, specifically, questions regarding population differences in mean levels of measured outcome variables of interest. By extension, then, one could be interested in differences in mean levels of latent outcome variables of interest. Indeed, as measured variables are typically mere operationalizations of error-free latent variables, one could argue that research questions regarding means are always about latent variables, at least implicitly. To complement the covariance structure methods presented so far, latent means models allow the researcher to assess population differences directly in latent variables of interest, thereby providing standardized estimates of effect size that are not attenuated by measurement error and providing more statistical power in the tests of those effects than measured variable methods. As an example involving multiple groups (i.e., a between-groups design), Matsumura (2001) investigated the mean difference in the latent variable of sociocultural perception of social status between Japanese students studying English in Japan and Japanese students who were studying English in Canada. Mean differences in the latent variable, which had three measured indicators, were assessed at four different time points throughout a year. This model, an abbreviated version of which is shown in Figure 5 (and whose nuances are beyond the scope of this article), found that early in the year the group studying abroad scored statistically significantly lower on average in the latent variable, but throughout the year caught up and surpassed the group studying in Japan.
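The gain from comparing groups on the latent variable rather than on its error-prone indicators can be illustrated with a small simulation (Python; all values are invented). Two groups differ by 0.5 standard deviations on a construct, but with indicators loading .80 on that construct, the standardized difference observed in any single indicator shrinks to about .40:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Latent construct scores: group B averages 0.5 SD higher (latent d = 0.5).
eta_a = rng.standard_normal(n)
eta_b = rng.standard_normal(n) + 0.5

# Three indicators per person, each loading 0.8 on the construct and
# standardized to unit variance.
lam = 0.8
def indicators(eta):
    return np.stack([lam * eta + np.sqrt(1 - lam**2) * rng.standard_normal(n)
                     for _ in range(3)])

x_a, x_b = indicators(eta_a), indicators(eta_b)

d_latent = eta_b.mean() - eta_a.mean()   # about 0.50
d_obs = x_b[0].mean() - x_a[0].mean()    # attenuated: about 0.8 * 0.5 = 0.40
```

A latent means model targets the .50 directly, whereas an ANOVA on a single measure can only recover the attenuated .40, with a corresponding loss of power.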


For cases where a researcher has dependent groups, such as matched pairs or the same individuals at different points in time and/or under different conditions, one can fit latent means models for within-groups designs as well. In these cases the focus is on the mean difference in the construct of interest across those pairs, times, or conditions. The Matsumura (2001) study mentioned above and shown in Figure 5, for example, had a within-groups aspect in addition to the between-groups portion of the design. Mean levels of the latent sociocultural perception of social status construct were assessed at four points in 3-month intervals, showing relative stability in latent means for those studying English in Japan but a consistent increase over time in the average amount of the construct for the sample studying English abroad in Canada.

Latent Means Models: L2 Research Possibilities
For L2 research, latent means models offer many advantages. L2 researchers are interested in the level of language proficiency attained by language learners, whether in natural or educational settings. This level may then be compared to the level at an earlier point in time, or to the level of a different sample of language learners or native speakers. Latent means models let researchers avoid untrustworthy comparisons that could result from differences in measurement error at different points in time or in different samples. Moreover, in a repeated measures design a researcher might want to refrain from administering all tasks twice; a mix of repeated tasks and new tasks can be accommodated within a latent means model. In their longitudinal study of L1 and English as a foreign language (EFL) writing proficiency, Schoonen et al. (2011) administered three writing assignments per language at the first test administration, of which one per language was repeated at the next administration together with two new tasks.
In a model with latent means, the authors showed that on average the scores in L1 hardly increased, whereas the students showed reasonable development in their EFL. Kieffer and Lesaux (2012b) administered a large battery of vocabulary measures to sixth grade L1 and L2 speakers of English. In a multisample CFA (see above), they found that the relations between the vocabulary measures could be explained by postulating three related but distinct latent variables (factors) of vocabulary knowledge: vocabulary breadth, contextual sensitivity, and morphological awareness. This underlying structure held for both samples. However, in the latent means model the authors could establish statistically significant differences between the two samples in their averages on these latent variables, also showing that estimated standardized effect sizes differed for the three latent variables, ranging from .37 (morphological awareness) to

Figure 6 Linear latent growth model with predictors from Lervåg and Aukrust (2010) (shown for one group only).

.52 (vocabulary breadth). Provided adequate data, all comparisons made with traditional techniques can be made in a latent means model, and many more.

Latent Growth Models
In the case of the within-groups latent means model over time, described above, there is an opportunity to learn about mean-level change over time, that is, whether the group as a whole is increasing or decreasing. What such models lack, however, is information regarding the extent to which there are individual differences in that change over time. For example, while the group as a whole might be increasing in mean levels of an outcome, some individuals might not be increasing, or perhaps are increasing but at a slower rate, while others might be increasing even more rapidly than average. Latent growth models are modifications of CFA models that allow for an assessment of the functional form of growth trajectories over time (e.g., linear, quadratic), degree of variability around those functional forms (e.g., variance in individual trajectories’ initial levels and growth slopes), and predictors of those individual differences. A fairly typical latent growth model was analyzed by Lervåg and Aukrust (2010), as shown in Figure 6, who examined linear growth in reading comprehension in Norwegian children from first through fourth grade, and the

extent to which decoding and vocabulary measures were predictive of individual differences in initial levels of comprehension (intercepts) and growth in comprehension (slopes). Such models were analyzed and compared across two groups, L1 Norwegian speakers and L2 Norwegian Urdu speakers from Pakistan, finding, among other results, that vocabulary is a critical predictor of growth in reading comprehension, and more strongly so among the L2 learners than among the L1 learners.

Latent Growth Models: L2 Research Possibilities
It is obvious that latent growth models open up very appealing possibilities for SLA research, as the ultimate goal of this research is to understand, predict, and promote progress or growth in language learning. In L2 literacy studies, for example, it is not only possible to investigate the predictive power of L1 literacy on the L2 level, but also to study how L1 literacy level predicts development in L2 literacy (see Guglielmi, 2008). Of course, many more research scenarios can be envisioned. For instance, it can be expected that features of language instruction received or learning styles affect both students’ initial level (Intercept in Figure 6) and their rate of development (Slope in Figure 6). More concretely, one could investigate how the amount of (different sorts of) language exposure (e.g., communicative vs. instruction-based) shapes, for example, the decrease over time of grammatical errors in spontaneous speech. Another interesting possibility is to combine two growth models, one for the development of L2 lexical skills and one for L2 speaking. This combination creates the opportunity to investigate whether lexical development propels speaking development. For a similar example on self-confidence and language proficiency (and school investment) in grades 2, 4, 6, and 8, see Stoel, Peetsma, and Roeleveld (2003), who found that the developments in language proficiency, self-confidence, and school investment are all positively related.
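The core idea, individual trajectories whose intercepts and slopes vary and can be predicted, can be sketched in two stages (Python; the predictor label "vocab" and all values are invented). A true latent growth model estimates everything in a single step, but fitting a line per person and then regressing the fitted slopes on a predictor conveys the logic:

```python
import numpy as np

rng = np.random.default_rng(4)
n, waves = 5_000, 4
t = np.arange(waves)  # four equally spaced measurement occasions

# Individual growth: intercept_i + slope_i * t.  A person-level predictor
# (here labeled "vocab", purely illustrative) partly determines the slopes.
vocab = rng.standard_normal(n)
intercepts = 2.0 + 0.5 * rng.standard_normal(n)
slopes = 0.6 + 0.3 * vocab + 0.1 * rng.standard_normal(n)
y = (intercepts[:, None] + slopes[:, None] * t
     + 0.2 * rng.standard_normal((n, waves)))

# Two-stage stand-in for a latent growth model: fit a line per person,
# then regress the fitted slopes on the predictor.
fitted = np.polyfit(t, y.T, 1)                       # row 0 holds slopes
slope_on_vocab = np.polyfit(vocab, fitted[0], 1)[0]  # recovers about 0.3
```

The one-step latent growth model improves on this two-stage sketch by weighting individuals appropriately and separating true slope variance from occasion-specific measurement error.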
Mixture Models
For all of the single-group models presented so far, and for many other traditional analyses as well (e.g., correlation, multiple regression), the assumption is implicit that the sample at hand is drawn from and represents a single population. As a result, the model’s parameter estimates, such as measured or latent variable path coefficients, latent means, or slope variances within a latent growth model, are assumed to reflect characteristics or processes within that single population. What if, however, a sample has members who, unbeknownst to the researcher, come from unobserved subpopulations (so-called latent classes)? In fact, having a sample that is actually a mixture of individuals from different populations may be a fairly realistic scenario across a variety of

different model types. In a measured or latent variable path model there might be some individuals for whom a predictor has less impact on an outcome than for other individuals. In a CFA model there might be some subjects for whom certain measured variables are not useful factor indicators, or even subjects for whom a different factor structure is operating altogether. In a latent means model there might be a subset of individuals for whom a treatment is completely ineffective. Similarly, in a latent growth model where most individuals are growing steadily over time, there could be a class of subjects for whom growth never occurs. For these scenarios a mixture model might be appropriate, in which a structural equation model is fit allowing for the existence of multiple simultaneously operating sets of parameters (or possibly even different structures across classes entirely). This technique differs from methods such as discriminant function analysis and logistic regression in at least two important ways: (1) the category/subpopulation to which subjects belong is unknown and (2) subpopulation membership is more often viewed as an independent or predictor variable rather than an outcome, potentially affecting the nature of the structural model. Perhaps the closest analog to mixture models is cluster analysis, but again critical differences exist there as well, primarily residing in the ability of mixture models to specify and formally test a hypothetical structure relating measured and possibly latent variables. To the best of our knowledge, in the field of L2 research no examples of mixture models have been published; however, in the broader field of educational studies there are a number of studies using mixture models. Amtmann, Abbott, and Berninger (2008), for example, examined the development of spelling errors by low-achieving second-grade students.
They applied growth mixture modeling to the 24 repeated measures and found that three latent classes could be distinguished in their sample of students: those who started with a low initial level and developed slowly, those who started at a relatively high level and showed a fast grow, and those who started at an even higher level but developed slowly. Latent class membership could, in turn, be predicted for those children from rapid automatic naming (RAN) and orthographic coding, that is, the coding of written words into working memory and making judgments about them. The model is represented in Figure 7. Mixture Models: L2 Research Possibilities Mixture models offer various possibilities for L2 researchers. In an analysis of the components of speaking proficiency, for example, it is easy to imagine that latent classes of language learners may exist: language learners who struggle with the phonological features of the L2, learners with little grammatical Language Learning 65:Suppl. 1, June 2015, pp. 160–184

174

Hancock and Schoonen

SEM Possibilities for Language Learning

Figure 7 Amtmann et al. (2008) latent growth mixture model with predictors.

knowledge but strong socio-pragmatic skills to compensate, and so forth. The latent classes could be associated with the L1 of the language learner, that is, L1–L2 phonetic distance, with learning style, and/or with personality traits such as extraversion. A second application of mixture models could be to explore whether, for a given model under investigation, classes of language learners are identified among an L1 heterogeneous group and, importantly, whether their L1s can be categorized meaningfully as a result of the classes that emerged. Third, the different trajectory classes that language learners may display according to a latent growth model could constitute subpopulations arising from different language instruction situations. Students coming from communicative approaches, for example, could have steeper growth curves for oral language proficiency than students from a grammar-translation method.
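The intuition behind a growth mixture analysis can be sketched informally in two stages: estimate each learner's growth parameters, then look for latent classes among those parameters. The Python sketch below is only an approximation of that logic (a true growth mixture model estimates both parts simultaneously in SEM software); it simulates three hypothetical trajectory classes loosely patterned on the Amtmann et al. (2008) result and recovers them with a Gaussian mixture. All class parameters and sample sizes here are invented for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
waves = np.arange(5)  # five hypothetical measurement occasions

# Three simulated trajectory classes (intercept, slope), loosely patterned
# on the Amtmann et al. pattern: low/slow, high/fast, higher/slow.
class_params = {0: (10.0, 0.5), 1: (20.0, 4.0), 2: (30.0, 0.5)}
scores, true_class = [], []
for c, (b0, b1) in class_params.items():
    for _ in range(100):
        intercept = b0 + rng.normal(0.0, 1.0)
        slope = b1 + rng.normal(0.0, 0.2)
        scores.append(intercept + slope * waves + rng.normal(0.0, 1.0, waves.size))
        true_class.append(c)
scores = np.array(scores)

# Stage 1: each learner's OLS growth parameters (slope, intercept).
growth = np.array([np.polyfit(waves, s, deg=1) for s in scores])

# Stage 2: a mixture of Gaussians over those growth parameters; the
# component assignments play the role of latent trajectory classes.
gmm = GaussianMixture(n_components=3, random_state=0).fit(growth)
labels = gmm.predict(growth)
```

In a real growth mixture model the class-specific growth curves and class membership are estimated simultaneously (e.g., in Mplus), and predictors such as RAN could then be added to the class membership portion of the model.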

SEM in Practice

The practice of SEM, like that of any statistical method, requires great care. Indeed, with all its breadth and versatility come many opportunities for misuse. The goal of this section of the article is not to provide a comprehensive enumeration of all best practices in SEM, but rather to highlight a few that are important in general and/or especially relevant for current and potential L2 research.


Data Screening and Assumptions

The goal of SEM, as with any other statistical method, is to provide accurate inferences regarding a model and regarding parameters hypothesized to characterize one or more populations. In order for the model to do so, that is, to keep from yielding spurious results regarding data–model fit and/or model parameters, the model's underlying assumptions must be met, or statistics that are robust to violations of those assumptions (e.g., rescaled test statistics, bootstrapping) must be employed. It is therefore incumbent upon researchers to address issues such as, but not limited to, outliers, nonnormality, and dependence of observations, as detailed in some of the resources provided in the final section of this article.

Data–Model Fit Assessment

Although a χ² test for a model as a whole is commonly reported, it is typically viewed as overly strict given its power to detect even trivial deviations of one's data from a proposed model. Researchers should therefore report multiple data–model fit indices, typically drawing from three broad classes (whose index cutoff criteria are discussed in resources provided in this article's final section). Absolute indices, such as the standardized root mean square residual, evaluate the overall discrepancy between an observed covariance matrix and the covariance matrix implied by the parameter estimates from the hypothesized model specification; fit improves as more parameters are added to the model. Parsimonious indices, such as the root mean-square error of approximation, evaluate the overall discrepancy between observed and implied covariance matrices while taking into account a model's complexity; fit improves as more parameters are added to the model only insofar as those parameters make a useful contribution.
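Some of these quantities can be computed directly from a model's χ² statistic. As a small illustration, one common formulation of the root mean-square error of approximation uses only the χ² value, the model degrees of freedom, and the sample size; the fit values below are purely hypothetical, and software packages differ slightly in the formula (some use N rather than N − 1).

```python
from math import sqrt

from scipy.stats import chi2


def rmsea(chi_sq, df, n):
    """Root mean-square error of approximation from a model chi-square.

    One common formulation: sqrt(max(chi^2 - df, 0) / (df * (n - 1))).
    """
    return sqrt(max(chi_sq - df, 0.0) / (df * (n - 1)))


# Hypothetical fit results: chi-square of 85.2 on 48 df with N = 250.
chi_sq, df, n = 85.2, 48, 250
point_estimate = rmsea(chi_sq, df, n)  # parsimonious index
exact_fit_p = chi2.sf(chi_sq, df)      # the (strict) chi-square exact-fit test
print(f"RMSEA = {point_estimate:.3f}, exact-fit p = {exact_fit_p:.4f}")
```

Note how the two can disagree: the χ² exact-fit test may reject the model even when the RMSEA falls in a range many researchers would consider acceptable, which is precisely why multiple indices are reported.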
Incremental indices, such as the comparative fit index, assess absolute or parsimonious fit relative to a baseline model, usually the null/independence model that specifies no relations among the observed variables. Finally, for models involving mean structures (e.g., latent means models and often latent growth models), as well as for mixture models, additional considerations are required in the assessment of data–model fit, as discussed in more formal treatments of these topics.

Competing Models

Rather than entering the modeling arena with a single model, the outcome of which is either model retention or model rejection, the evaluation of competing alternative models can strengthen a study by providing a more complete picture of the current thinking in a domain of inquiry (see the previously mentioned
commentary by Isemonger, 2007, in the context of CFA). These competing models may then be evaluated on their own and relative to each other, on grounds of data–model fit, replicability, and substantive interpretability.

Exploration Versus Confirmation

SEM is intended to be a confirmatory (or, more correctly, disconfirmatory) process, that is, one in which potentially rejectable models are articulated a priori on the basis of theory rather than derived from patterns in the data. This implies that results from a separate exploratory analysis, such as data mining or EFA, should not be used to construct and then evaluate a model using the same data; doing so provides no test of the model at all, as the model was constructed from observed patterns in those very data. Evaluating such a model using separate data, however, would be acceptable from a statistical standpoint; whether the model is theoretically meaningful remains for the researcher to defend. On the other hand, it should also be noted that smaller scale deviations from this general confirmatory principle of SEM practice do exist. When a model does not fit satisfactorily, for example, SEM programs commonly offer indices that suggest potential modifications to the model that would improve data–model fit. Modifications made to the measurement portion of a latent variable path model (see above), such as cross-loadings of variables on secondary factors or covariances between measured variables' residual errors, are typically considered more acceptable than modifications made to key theoretical links among the latent variables in the structural portion of such a model. In short, when the theory-derived structural portion of a model is altered or even abandoned based on the suggestions of a sample of data, one exchanges the hat of hypothesis (dis)confirmer for that of hypothesis explorer, thereby undermining the nature and purpose of SEM.
Suffice it to say that if post hoc structural modifications are to be entertained, they should not only be defensible theoretically (albeit in hindsight) but should also be explicitly recognized as exploratory and in need of cross-validation in future samples.

Latent Versus Emergent Factors

The term latent in the context of SEM, in addition to meaning unobserved, connotes a system whereby a factor is hypothesized to have a causal bearing on one or more measured indicator variables. Thus, in a latent variable system, individuals are believed to vary on the measured indicators in part because they vary on the underlying factor, and their indicator scores covary because they have a common latent cause. On the other hand, theory might instead
dictate that measured variables have a causal bearing on their factor, making that factor emergent rather than latent. In this case, variation in such measured indicators (which themselves might or might not covary) is hypothesized to cause and partially explain variation in the emergent factor. In L2 studies this could apply to, for instance, parental education: there is no latent variable that causes certain educational levels for the mother and the father; rather, their education constitutes an emergent parental education factor (for an example, see Aunio & Niemivirta, 2010). The same goes for a construct of L2 language exposure, which most likely emerges from measures such as the number of hours of instruction and the amount of contact with native speakers. It is incumbent upon the researcher to justify each factor's status as latent or emergent rather than simply presume a latent status for all factors, as is traditional within SEM. Such a presumption could lead to a misspecified measurement model and, in turn, to incorrect inferences regarding the relations between the construct in question and other variables (latent or measured) within a model.

Factor Indicators

Latent variables serve at least two very important and related functions in a model: (1) from a theoretical perspective, they allow the model to focus directly on the entities of true interest rather than on their fallible measured proxies; and, in doing so, (2) from a statistical perspective, they facilitate a correction for the attenuating effects of observed variables' inherent measurement error, thereby enabling the model to estimate directly the relations and amounts associated with those latent variables. That said, latent variables are only as good as their measured indicators. Indicators of low reliability, for example, which have weak factor–indicator relations (loadings), can result in factors with relatively low reliability/replicability (e.g., Hancock & Mueller, 2001).
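One useful summary of how strongly a set of indicators reflects its factor is the construct reliability coefficient (coefficient H) proposed by Hancock and Mueller (2001). The sketch below computes it from standardized loadings; the loading values are hypothetical.

```python
def coefficient_h(loadings):
    """Construct reliability (coefficient H) from standardized loadings,
    following Hancock and Mueller (2001)."""
    s = sum(l ** 2 / (1.0 - l ** 2) for l in loadings)
    return s / (1.0 + s)


# Hypothetical standardized loadings for two competing measurement designs.
three_weak = coefficient_h([0.4, 0.4, 0.4])
two_strong = coefficient_h([0.8, 0.8])
print(f"three weak indicators: H = {three_weak:.3f}")
print(f"two strong indicators: H = {two_strong:.3f}")
```

Two loadings of .80 yield a substantially higher H than three loadings of .40, illustrating why a few strong indicators can serve a factor better than many weak ones.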
Such factors in turn can yield a model with misleading data–model fit indices, low statistical power to test structural relations in the model, and low sensitivity to detect structural misspecifications through modification indices (Hancock & Mueller, 2011). How many indicators should one have, then, to achieve sufficient reliability? The answer, unfortunately, is that it depends. Models with, say, two very strong indicators can suffice quite adequately in terms of facilitating the desired structural power and sensitivity, although they can lead to model identification problems if their factor is only weakly related to others in the model. Having at least four indicators per factor, on the other hand, for the most part ensures model identification and can address power and sensitivity, provided that there are strong indicators in the set. For more information regarding the
number of indicators in SEM, see Marsh, Hau, Balla, and Grayson (1998) and Gagné and Hancock (2006). In addition to issues of reliability, the validity of factors vis-à-vis their indicators can be a challenge as well, in particular when the indicators all come from the same type of data source. Using all retrospective questionnaires as indicators, for example, however reliable they might be, can create factors that confound the latent traits of interest with the methods of measurement. Whenever possible, then, indicators should be chosen from multiple modes of measurement (e.g., survey, behavioral observation) so that each factor separates trait from method as much as possible. In addition, accommodating any remaining method covariance through correlated errors, or more systematically through multitrait–multimethod models, can help to maximize the validity of inferences regarding the estimates of key factors' characteristics of theoretical interest (i.e., means, variances, relations with other factors). For further examples, see Bachman and Palmer (1982) and Quellmalz, Capell, and Chou (1982).

Factor Invariance

When a researcher's intention is to examine the same factor across different populations (as in cross-cultural studies) or across time (as in longitudinal models), the assumption is typically made that the construct is indeed the same entity across those populations or points in time. This may or may not be reasonable, however, and should be addressed squarely in such studies on both theoretical and statistical grounds. Regarding theoretical grounds, the case must be made that the construct exists and is of the same nature under the different circumstances examined. For example, the operationalization of participation in academic group discussion may differ cross-culturally (Jones, 1999), as do different kinds of speech act behavior, such as offering a compliment (Yu, 2011).
In a similar theoretical vein, the validity of reading tests or items indicating reading proficiency may change over the course of (early) reading development. Regarding statistical grounds, it is common to test the invariance of parameters associated with a factor's indicators (e.g., the loadings), the concern being that the fewer such parameters behave similarly across populations or points in time, the less confidence one should have that the construct represents the same thing. Although it is possible that constructs are the same in nature even though some indicators function differently (so-called partial invariance), an invariance assessment of this type provides evidence on which researchers and consumers can judge the stability of each construct's identity before interpreting comparisons of that construct's characteristics across populations or time.
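On the statistical side, loading invariance is commonly examined with a χ² (likelihood-ratio) difference test between a model in which the loadings are free across groups and a nested model in which they are constrained equal. The sketch below assumes the two models have already been fit in SEM software; all fit values are hypothetical.

```python
from scipy.stats import chi2


def lr_test(chi_free, df_free, chi_constrained, df_constrained):
    """Chi-square (likelihood-ratio) difference test between nested models.

    The constrained model (e.g., loadings held equal across groups) never
    fits better than the free model; a significant difference argues
    against invariance of the constrained parameters.
    """
    delta_chi = chi_constrained - chi_free
    delta_df = df_constrained - df_free
    return delta_chi, delta_df, chi2.sf(delta_chi, delta_df)


# Hypothetical two-group results: equating five loadings adds 5 df and
# raises the model chi-square from 102.4 (df = 76) to 110.9 (df = 81).
d_chi, d_df, p = lr_test(102.4, 76, 110.9, 81)
print(f"delta chi-square = {d_chi:.1f} on {d_df} df, p = {p:.3f}")
```

A p value above the chosen alpha is consistent with loading invariance; a significant result would instead point toward probing partial invariance by freeing individual loadings.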


Sample Size Determination

Historically, many different recommendations have been offered regarding the sample size necessary to analyze structural equation models. The truth, however, is that there is no such thing as a one-sample-size-fits-all recommendation for SEM, just as there is not for any other statistical test. Statistical tests attempt to detect specific effects that are believed to exist, and the sample size necessary to do so with adequate power depends on, among other things, the magnitude of those effects. Hence, just as there is no universal magnitude of effect, for SEM or otherwise, there can be no universal sample size. The process of conducting SEM in particular includes testing at the level of data–model fit as well as at the level of key parameters within a given model; effect sizes thus exist both for fit and for parameters. Like any other research endeavor, then, planning for adequate sample size in SEM is essential, as too small a sample could yield insufficient sensitivity to evaluate a model and/or its parameters, while too large a sample could waste resources. Within L2 research that has used SEM, mention of a priori sample size determination appears to be virtually nonexistent, rendering potentially tenuous the conclusions regarding adequate data–model fit or regarding nonsignificant model parameter estimates (e.g., paths). Future L2 research should make a concerted effort to incorporate sample size planning into the SEM process just as in other types of analyses (see, e.g., Hancock & French, 2013).

Additional Resources

As indicated previously, the current article's goal is not to be a comprehensive introduction to SEM, but rather an invitation of sorts to learn more about this family of statistical methods with an eye toward expanding the questions that L2 researchers are able to address.
Resources for learning more about SEM include semester-length university courses, online self-paced tutorials, and intensive training workshops sponsored by universities, private instructional agencies, or professional associations, each of which offers its own mix of theory and hands-on practice using any of a number of SEM software packages (e.g., AMOS, EQS, LISREL, Mplus, Mx, R, SAS, STATA). In addition, written materials include a variety of short pieces addressing general best practices in SEM-related methods (see, e.g., chapters in Hancock & Mueller, 2010), as well as full-length textbooks. Among the latter at the introductory level are popular and accessible offerings by Byrne (1998, 2006, 2010, 2012), Kline (2010), Loehlin (2004), Raykov and Marcoulides (2006), and Schumacker and Lomax (2010), along with the more comprehensive and technical introduction
by Bollen (1989). While each of these resources addresses more advanced SEM topics to varying degrees, edited volumes by Hancock and Mueller (2013) and by Hoyle (2012) provide researchers with a compendium of focused chapters on specific areas mentioned previously (e.g., latent means models, latent growth models, mixture models) as well as areas beyond the current article (e.g., multilevel SEM, latent variable interactions, Bayesian SEM). It is our hope that the overview, examples, and resources we have provided in this article will motivate researchers to roll up their sleeves, learn more about SEM, and embrace the exciting possibilities that this family of analytical methods can bring to the L2 research domain.

Final revised version accepted 30 September 2014

Note

1 This article is based in part on a paper presented by the first author at the Conference on Improving Quantitative Reasoning in Second Language Research, Georgetown University, Washington, DC, October 26–27, 2013.

References

Amtmann, D., Abbott, R., & Berninger, V. (2008). Identifying and predicting classes of response to explicit, phonological spelling instruction during independent composing. Journal of Learning Disabilities, 41, 218–234.
Aunio, P., & Niemivirta, M. (2010). Predicting children's mathematical performance in grade one by early numeracy. Learning and Individual Differences, 20, 427–435.
Bachman, L. F., & Palmer, A. S. (1982). The construct validation of some components of communicative proficiency. TESOL Quarterly, 16, 449–465.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic concepts, applications, and programming. Mahwah, NJ: Erlbaum.
Byrne, B. M. (2006). Structural equation modeling with EQS: Basic concepts, applications, and programming (2nd ed.). Mahwah, NJ: Erlbaum.
Byrne, B. M. (2010). Structural equation modeling with AMOS: Basic concepts, applications, and programming (2nd ed.). New York: Taylor & Francis.
Byrne, B. M. (2012). Structural equation modeling with Mplus: Basic concepts, applications, and programming. New York: Taylor & Francis.
Gagné, P. E., & Hancock, G. R. (2006). Measurement model quality, sample size, and solution propriety in confirmatory factor models. Multivariate Behavioral Research, 41, 65–83.
Guglielmi, R. S. (2008). Native language proficiency, English literacy, academic achievement, and occupational attainment in limited-English-proficient students: A latent growth modeling perspective. Journal of Educational Psychology, 100, 322–342.
Hancock, G. R., & French, B. F. (2013). Power analysis in covariance structure models. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 117–159). Charlotte, NC: Information Age Publishing.
Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent variable systems. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future—A Festschrift in honor of Karl Jöreskog (pp. 195–216). Lincolnwood, IL: Scientific Software International.
Hancock, G. R., & Mueller, R. O. (Eds.). (2010). The reviewer's guide to quantitative methods in the social sciences. New York: Routledge.
Hancock, G. R., & Mueller, R. O. (2011). The reliability paradox in assessing structural relations within covariance structure models. Educational and Psychological Measurement, 71, 306–324.
Hancock, G. R., & Mueller, R. O. (Eds.). (2013). Structural equation modeling: A second course (2nd ed.). Charlotte, NC: Information Age Publishing.
Hoyle, R. (Ed.). (2012). Handbook of structural equation modeling. New York: Guilford Press.
Isemonger, I. M. (2007). Operational definitions of explicit and implicit knowledge: Response to R. Ellis (2005) and some recommendations for future research in this area. Studies in Second Language Acquisition, 29, 101–118.
Jones, J. F. (1999). From silence to talk: Cross-cultural ideas on students' participation in academic group discussion. English for Specific Purposes, 18, 243–259.
Kieffer, M. J., & Lesaux, N. K. (2012a). Direct and indirect roles of morphological awareness in the English reading comprehension of native English, Spanish, Filipino, and Vietnamese speakers. Language Learning, 62, 1170–1204.
Kieffer, M. J., & Lesaux, N. K. (2012b). Knowledge of words, knowledge about words: Dimensions of vocabulary in first and second language learners in sixth grade. Reading and Writing, 25, 347–373.
Kline, R. B. (2010). Principles and practice of structural equation modeling (3rd ed.). New York: Guilford Press.
Kormos, J., Kiddle, T., & Csizér, K. (2011). Systems of goals, attitudes, and self-related beliefs in second-language-learning motivation. Applied Linguistics, 32, 495–516.
Lervåg, A., & Aukrust, V. G. (2010). Vocabulary knowledge is a critical determinant of the difference in reading comprehension growth between first and second language learners. The Journal of Child Psychology and Psychiatry, 51, 612–620.
Loehlin, J. C. (2004). Latent variable models (4th ed.). Hillsdale, NJ: Erlbaum.
MacIntyre, P. D., & Charos, C. (1996). Personality, attitudes, and affect as predictors of second language communication. Journal of Language and Social Psychology, 15, 3–26.
Marsh, H. W., Hau, K. T., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral Research, 33, 181–220.
Matsumura, S. (2001). Learning the rules for offering advice: A quantitative approach to second language socialization. Language Learning, 51, 635–679.
Quellmalz, E. S., Capell, F. J., & Chou, C.-P. (1982). Effects of discourse and response mode on the measurement of writing competence. Journal of Educational Measurement, 19, 241–258.
Raykov, T., & Marcoulides, G. A. (2006). A first course in structural equation modeling. Mahwah, NJ: Erlbaum.
Schoonen, R., Van Gelderen, A., De Glopper, K., Hulstijn, J., Simis, A., Snellings, P., et al. (2002). Linguistic knowledge, metacognitive knowledge and retrieval speed in L1, L2, and EFL writing. In S. Ransdell & M.-L. Barbier (Eds.), New directions for research in L2 writing (pp. 101–122). Dordrecht, Netherlands: Kluwer Academic.
Schoonen, R., Van Gelderen, A., De Glopper, K., Hulstijn, J., Simis, A., Snellings, P., et al. (2003). First language and second language writing: The role of linguistic fluency, linguistic knowledge and metacognitive knowledge. Language Learning, 53, 165–202.
Schoonen, R., Van Gelderen, A., Stoel, R., Hulstijn, J., & De Glopper, K. (2011). Modeling the development of L1 and EFL writing proficiency of secondary-school students. Language Learning, 61, 31–79.
Schumacker, R. E., & Lomax, R. G. (2010). A beginner's guide to structural equation modeling (3rd ed.). Hillsdale, NJ: Erlbaum.
Shiotsu, T., & Weir, C. J. (2007). The relative significance of syntactic knowledge and vocabulary breadth in the prediction of reading comprehension test performance. Language Testing, 24, 99–128.
Stoel, R. D., Peetsma, T. T. D., & Roeleveld, J. (2003). Relations between the development of school investment, self-confidence, and language achievement in elementary education: A multivariate latent growth curve approach. Learning and Individual Differences, 13, 313–333.
Van Gelderen, A., Schoonen, R., De Glopper, K., Hulstijn, J., Simis, A., Snellings, P., et al. (2004). Linguistic knowledge, processing speed and metacognitive knowledge in first and second language reading comprehension: A componential analysis. Journal of Educational Psychology, 96, 19–30.
Van Gelderen, A., Schoonen, R., De Glopper, K., Hulstijn, J., Snellings, P., Simis, A., et al. (2003). Roles of linguistic knowledge, metacognitive knowledge and processing speed in L3, L2, and L1 reading comprehension: A structural equation modeling approach. International Journal of Bilingualism, 7, 7–25.
Van Gelderen, A., Schoonen, R., Stoel, R. D., De Glopper, K., & Hulstijn, J. (2007). Development of adolescent reading comprehension in language 1 and language 2: A longitudinal analysis of constituent components. Journal of Educational Psychology, 99, 477–491.
Yashima, T. (2002). Willingness to communicate in a second language: The Japanese EFL context. The Modern Language Journal, 86, 54–66.
Yu, M.-C. (2011). Learning how to read situations and know what is the right thing to say or do in an L2: A study of socio-cultural competence and language transfer. Journal of Pragmatics, 43, 1127–1147.
