On Matters of Invariance in Latent Variable Models: Reflections on the Concept, and its Relations in Classical and Item Response Theory Bruno D. Zumbo

Abstract An overview is provided of the author’s program of research on measurement invariance. Two questions are addressed. First, when do theoreticians and practitioners talk about invariance, and what is it that we are talking about? Second, is invariance only a property of latent variable models such as IRT, or is there also invariance in classical test theory? If so, what form does it take in the observed score and latent variable formulations?

1 Introduction

This is an overview of my longstanding program of research on the paradox that is measurement invariance (Li and Zumbo 2009; Rupp and Zumbo 2003, 2004, 2006; Sawatzky et al. 2012; Wu et al. 2007; Zimmerman and Zumbo 2001; Zumbo and Rupp 2004; Zumbo 1999, 2007a, b, 2008, 2009). On the one hand, under a mathematical lens, it is a trivial identity; on the other hand, under a historical and conceptual lens, it is probably the most important property of latent variable measurement models, and of item response theory (IRT) in particular. Furthermore, according to much of the contemporary psychometric literature, invariance is what sets IRT apart from classical test theory (CTT) models. This state of affairs has left me with two perplexing questions that will be the focus of this chapter.

1. When do we talk about invariance? And what is it that we are talking about? This is most often discussed in the context of IRT, sometimes called “modern test theory”. I will not be discussing Rasch models, per se.

B.D. Zumbo
Department of ECPS (also Department of Statistics, and Institute of Applied Mathematics), University of British Columbia, Vancouver, BC, Canada
e-mail: [email protected]

P. Giudici et al. (eds.), Statistical Models for Data Analysis, Studies in Classification, Data Analysis, and Knowledge Organization, DOI 10.1007/978-3-319-00032-9_45, © Springer International Publishing Switzerland 2013


2. Is “invariance” only a property of latent variable models (such as IRT), or is there also “invariance” in classical test theory (CTT)? If so, what form does it take in the observed score and latent variable formulations?

I find it useful in considering the matter of invariance in measurement to distinguish two formulations (parameterizations) of CTT: observed score CTT (e.g., Novick 1966; Lord and Novick 1968), and latent variable CTT (e.g., McDonald 1999). Clearly, these two formulations are interrelated, but it is useful, for my purposes herein, to distinguish them.

2 When Do We Talk About Invariance?

Matters of invariance come up very often in contemporary applied and theoretical work, including axiomatic measurement. The minimum that we need is a (parent) population and some vector-valued random variable, V, which is an indicator (selection function) of the sub-populations or range of conditions of interest selected from the parent on the basis of V. We then talk about invariance with respect to elements of the selection function V (e.g., the elements of V are indicators for age, gender, ethnicity, or other such demographics). The concern is for invariance in sub-populations versus their union. For example, the target population of all examinees in grade 6 has grade 6 boys and grade 6 girls as sub-populations; these sub-populations come together to form the target population of grade 6 students.

2.1 Most Often Discussed in the Context of Item Response Theory (IRT)

The versatility of IRT models has made them the tool of choice in many psychometric settings, but beyond the flexibility of IRT models it is the often misunderstood feature of parameter invariance that is frequently cited in introductory or advanced texts as one of their most important characteristics. It is the property of parameter invariance which is the major foundation for their widespread use in equating and adaptive testing and assessment. As a brief review, IRT can be written in the following way. In conventional descriptions of IRT, examinees are indexed by $i = 1, \ldots, I$; items are indexed by $j = 1, \ldots, J$; $\theta$ signifies the unidimensional latent indicator, and $P_{ij}(\theta)$ is the probability of examinee $i$ responding correctly to item $j$ as a function of the continuous latent variable $\theta$. From IRT the unidimensional three-parameter logistic (3-PL) model for dichotomously scored items can be written as follows:

$$P_{ij}(\theta) = \gamma_j + (1 - \gamma_j)\,\frac{\exp(\alpha_j(\theta_i - \beta_j))}{1 + \exp(\alpha_j(\theta_i - \beta_j))}, \qquad 0 \le \gamma_j < 1;\ \ \alpha_j > 0;\ \ -\infty < \beta_j, \theta_i < \infty$$


wherein $\alpha_j$ is the “item discrimination” parameter related to the slope of an item characteristic curve (ICC), $\beta_j$ is the “item difficulty” parameter related to the location of the ICC, and $\gamma_j$ is the “pseudo-guessing” parameter, which is the lower asymptote of the ICC. In what is often referred to as the two-parameter logistic (2PL) IRT model, $\gamma_j$ is zero (or, in some cases, a constant other than zero), and in the one-parameter logistic (1PL) IRT model, $\alpha_j = 1$ (or some other constant) and $\gamma_j$ is zero (or, in some cases, a constant other than zero). The concept of item parameter invariance then stipulates that, with a sufficiently large pool of examinees, item parameters are independent of the ability distribution of the examinees. Likewise, the concept of person parameter (ability or theta) invariance stipulates that, with a sufficiently large set of items, respondents’ ability scores and the overall distribution of the ability scores are independent of the set of test items. In describing invariance in IRT models, Lord (1980) states that:

the probability of a correct answer to item i from examinees at a given ability level $\theta_0$ depends only on $\theta_0$ not on the number of people at $\theta_0$ nor on the number of people at other ability levels $\theta_1, \theta_2, \ldots$ Since the regression is invariant, its lower asymptote, its point of inflexion, and the slope at this point all stay the same regardless of the distribution of ability in the group tested. [...] According to the model, they remain the same regardless of the group tested. (p. 34)

In this citation it is the phrase “according to the model” that is key to an understanding of invariance. The phrase can be translated to “if the model holds” and indeed renders invariance a relatively trivial issue (as the author implies himself), because one can say that if a given model holds perfectly for examinees and items in the respective populations, then the sets of item and examinee parameters are invariant. “In other words, invariance only holds when the fit of the model to the data is exact in the population.” (Hambleton et al. 1991, p. 23) In this sense, the model is the “glue” that binds the examinees and items together. Put differently, parameter invariance is a term denoting an absolute state, so any discussion about whether there are “degrees of invariance” or whether there is “some invariance” is technically inappropriate (Hambleton et al. 1991). Moreover, the question of whether there is invariance in a single population or under a single condition is illogical, as invariance requires at least two (sub-) populations or conditions for parameter comparisons. Nor is parameter invariance guaranteed by the mere fact that an IRT model, or any other latent variable model for that matter, is fit to data (see Engelhard 1994; van der Linden and Hambleton 1997).

This illustrates the paradox that is parameter invariance: on the one hand, under a mathematical lens, it is a trivial identity; on the other hand, under a historical and conceptual lens, it is probably the most important property of IRT models that sets them apart from classical test theory (CTT) models. Indeed, the property of parameter invariance unifies related investigations of scaling, differential item functioning (DIF), item parameter drift, and latent class mixture models. In an important sense these are all instantiations of a lack of invariance.
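Lord's point that the item regression depends only on the ability level, and not on how ability is distributed in the group tested, can be illustrated with a minimal sketch. The item parameter values and the two groups below are hypothetical and chosen only for illustration; the function name `p_3pl` is likewise an assumption, not notation from the chapter.

```python
import math

def p_3pl(theta, a, b, c):
    """3-PL item characteristic curve: c + (1 - c) * logistic(a * (theta - b))."""
    z = a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

# Hypothetical item parameters (illustrative values, not taken from the chapter).
A, B, C = 1.2, 0.3, 0.2

# Two hypothetical groups whose ability distributions differ.
low_group = [-2.0, -1.5, -1.0, -0.5, 0.0]
high_group = [0.0, 0.5, 1.0, 1.5, 2.0]

def proportion_correct(thetas):
    """Model-implied proportion correct in a group: the ICC averaged over its abilities."""
    return sum(p_3pl(t, A, B, C) for t in thetas) / len(thetas)

# The marginal proportion correct depends on the group's ability distribution ...
marginal_low = proportion_correct(low_group)
marginal_high = proportion_correct(high_group)

# ... but the probability at a given ability level is one and the same function
# value, whichever group an examinee with theta = 0.0 happens to belong to.
conditional_at_zero = p_3pl(0.0, A, B, C)

print(f"marginal (low group):  {marginal_low:.3f}")
print(f"marginal (high group): {marginal_high:.3f}")
print(f"P(correct | theta=0):  {conditional_at_zero:.3f}")
```

The sketch only restates the "if the model holds" reading: the conditional probability is group-free by construction, which is exactly why invariance is trivial under a mathematical lens and says nothing about whether the model fits real data in each sub-population.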
Mathematically, parameter invariance is a simple identity of parameters that are on the same scale; yet the latent scale in IRT models is arbitrary, so that unlinked sets of parameters are invariant only up to a set of linear transformations specific to a given IRT model. When estimating these parameters in unidimensional IRT models with calibration samples, this indeterminacy is typically resolved by requiring that the latent indicator $\theta$ be normally distributed with mean 0 and standard deviation 1 (i.e., $\theta \sim N(0,1)$). In orthogonal multidimensional IRT models, the latent scale indeterminacy implies that parameters are identical up to an orthogonal rotation, a translation transformation, and a single dilation or contraction.
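As a rough illustration of how fixing the latent scale removes the indeterminacy in the unidimensional case, the following sketch rescales a set of ability values to mean 0 and standard deviation 1 and adjusts 2PL item parameters to match. The function names, the ability values, and the item parameters are all hypothetical; the sketch fixes only the first two moments of a sample and does not impose normality, so it is a simplification of the $N(0,1)$ constraint described above.

```python
from statistics import mean, pstdev

def logit_2pl(theta, a, b):
    """Linear predictor (logit of a correct response) under the 2PL model."""
    return a * (theta - b)

def standardize_scale(thetas, items):
    """Re-express abilities and 2PL item parameters on a mean-0, SD-1 ability scale.

    thetas: ability values on an arbitrary scale
    items:  (discrimination, difficulty) pairs on that same scale
    """
    m = mean(thetas)
    s = pstdev(thetas)
    new_thetas = [(t - m) / s for t in thetas]
    # Difficulty shifts and shrinks with theta; discrimination stretches by s
    # so that a * (theta - b) is left unchanged.
    new_items = [(a * s, (b - m) / s) for a, b in items]
    return new_thetas, new_items

# Hypothetical values on an arbitrary reporting scale (illustration only).
thetas = [48.0, 50.0, 55.0, 61.0, 66.0]
items = [(0.10, 52.0), (0.25, 60.0)]

new_thetas, new_items = standardize_scale(thetas, items)

for (a, b), (a_new, b_new) in zip(items, new_items):
    for t, t_new in zip(thetas, new_thetas):
        # The model-implied logits agree on both scales (up to rounding).
        assert abs(logit_2pl(t, a, b) - logit_2pl(t_new, a_new, b_new)) < 1e-9
```

The two parameter sets look different numerically but describe the same response probabilities, which is why unlinked parameter estimates must be placed on a common scale before they are compared across sub-populations.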

2.2 A Familiar Description of Invariance from an IRT Framework

In the IRT framework we refer to item parameters and examinee parameters, and invariance means identical values of parameters in different populations or subpopulations indexed by V, as described above. We can get a sense of the IRT usage of invariance by considering the common 2PL model. In this case, for parameters from two populations (primed and unprimed) to be invariant,

$$\alpha' = \alpha; \quad \beta' = \beta; \quad \theta' = \theta;$$

but due to the indeterminacy of the latent scale, we obtain:

$$\alpha_j' = \delta^{-1}\alpha_j; \quad \beta_j' = \varepsilon + \delta\beta_j; \quad \theta_i' = \varepsilon + \delta\theta_i;$$

and, on the logit scale,

$$\operatorname{logit} P_j(\theta_i') = \alpha_j'(\theta_i' - \beta_j') = \alpha_j(\theta_i - \beta_j) = \operatorname{logit} P_j(\theta_i).$$

We can see from the 2PL example that the matter of invariance in IRT is a rather complicated statement involving a solution to the latent scale indeterminacy. However, there may be situations wherein we do not have any sense of a population or sub-population, and in those contexts we are, in essence, not concerned with invariance. This would be a type of calibrative measurement or assessment context. In more common contexts, however, the aim is to use a statistical measurement model to draw inferences from calibration samples to the respective populations from which these were drawn. The additional focus is on the range of possible conditions under which invariance is expected to hold. It depends, then, on the type (or strength) of inferences one wants to draw.
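Returning to the 2PL relations of this section, the intermediate algebra behind the final equality can be spelled out as a short worked step. It assumes $\delta \neq 0$ and that the 2PL is written on the logit scale, so that the linear predictor is $\alpha_j(\theta_i - \beta_j)$; these are assumptions of the sketch rather than statements from the text.

```latex
\begin{align*}
\alpha_j'(\theta_i' - \beta_j')
  &= \delta^{-1}\alpha_j \left[ (\varepsilon + \delta\theta_i) - (\varepsilon + \delta\beta_j) \right] \\
  &= \delta^{-1}\alpha_j \, \delta \, (\theta_i - \beta_j) \\
  &= \alpha_j(\theta_i - \beta_j)
\end{align*}
```

The translation $\varepsilon$ cancels in the difference and the dilation $\delta$ cancels against $\delta^{-1}$, so the response probabilities are unchanged even though none of the three parameters is numerically identical across the two scales.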

2.3 Desired Types of Inferences

Zumbo (2001, 2007) presented the following framework modeled on Draper’s (1995) approach to classifying causal claims in the social sciences and, in turn, on Lindley’s (1972) and de Finetti’s (1974–1975) predictive approach to inference.


Fig. 1 Zumbo’s Draper–Lindley–de Finetti framework

Unlike Draper, Zumbo focused on the inferences about items and persons made in assessment and testing. The foundation of the approach is the exchangeability of:

• Sampled and un-sampled respondents (i.e., examinees or test-takers); this could be based on the selection function for sub-populations.
• Realized and unrealized items.
• Exchangeable sub-populations of respondents and items.

Exchangeability can be thought of here in the purely mechanical sense. I have found this useful to help me think of the various possibilities, whether they happen regularly or not. This also helps me detail the range of conditions under which invariance is expected to hold. Figure 1 describes the resulting fourfold table and the corresponding range of invariance claims.

So the answer to the question of whether there is invariance in CTT is “yes”, although not of the flavor and sense of invariance that we get with latent variable models. For example, we proved that measurements that are parallel in a given population are also parallel in any subpopulation. This was a point also emphasized by Lord and Novick (1968). We provide other new results, but they are of the flavor seen in the sentence above about parallel tests. Let us then write CTT from a latent variable framework and see what we get in terms of invariance therein.


3 Is There “Invariance” in CTT?

Zimmerman and Zumbo (2001) described and extended a model of tests and measurements that identifies test scores with Hilbert space vectors and true and error components of scores with linear operators. One of the niceties of that level of abstraction is that we were able to clearly show that some of the properties and quantities from CTT hold for the entire population of individuals, as well as any subpopulation of individuals. For our purposes today I will not go into the details of the algebra, but it is important to note that the feel for invariance in the observed score characterization of CTT is strikingly different from that in IRT. Furthermore, many of the examples of invariance we prove in the Zimmerman and Zumbo (2001) paper are of the variety seen in all classical mathematical statistics. It is well known in CTT that:

$$X_i = \tau_i + \varepsilon_i, \qquad \operatorname{var}(X_i) = \operatorname{var}(\tau_i) + \operatorname{var}(\varepsilon_i)$$

Properties of $\tau_i$, $\varepsilon_i$:

$$\operatorname{cov}(\tau_i, \varepsilon_i) = 0, \qquad E(\varepsilon_i) = 0$$

It should be noted that these last two statements about the covariance and expected value are not assumptions, per se, but rather properties implied from the definition of true and error variables. I will sketch a bit of a model of CTT to motivate my remarks. See Steyer (2001) for a full description of the details. Let us focus on the essentially tau-equivalent test model because it is the one that is most commonly referred to in observed score CTT. The model is: $X_i$ and $X_j$ are a pair of tests from the set $X_1, \ldots, X_m$ with the assumptions:

(1) $\tau_i = \tau_j + \lambda_{ij}$, $\lambda_{ij} \in \ldots$
