Document not found! Please try again

Monotonicity and conditional independence in models for ... - CiteSeerX

1 downloads 0 Views 581KB Size Report
Dec 22, 2000 - ½, LI and M, has been studied under many names in the literature. ... The principal tools of that theory are adaptations of Loevinger's (1948) А.
Monotonicity and conditional independence in models for student assessment and attitude measurement Brian W. Junker Carnegie Mellon University Pittsburgh PA [email protected] December 22, 2000

Abstract Since the beginnings of factor analysis in the early 20th century, through the development of item response theory (IRT) models by Guttman, Lazarsfeld, Rasch, Lord, and others, the properties of monotonicity and conditional independence have played a key role. In these models, respondents, examinees or experimental subjects produce a multivariate response, in which each coordinate is a response to a different stimulus (task assignment, exam question, opinion survey question, etc.). The goal is to use these observed responses to make inferences about a common underling latent variable, for each respondent; such models are common in student assessment and attitude measurement. Monotonicity asserts that each such response is stochastically ordered by the latent variable; this allows us to interpret the latent variable as a ”propensity” to respond positively to each stimulus. Conditional independence asserts that the responses are conditionally independent given the latent variable. This gives factor analyis models, IRT models, and related models a simple graphical structure. Monotonicity and conditional independence together impose strong conditions on the marginal distribution of the data, treating the latent variable as missing data. First I review some of the modern theory of monotonicity and conditional independence in (especially) nonparametric IRT models, including the representation theorem of Junker and Ellis (1997) and the relationship between the stochastic ordering property described above, and a similar stochastic ordering property in which an observed total score is substituted for the latent trait (Junker, 1993; Junker and Sijtsma, 2000). I will also suggest some ways, primarily due to Stout (1990) and Ramsay (1991), in which these two conditions can be weakened without inferential harm. Next I illustrate the role of these conditions in model building, interpretation, and inference, especially in recent models for student assessment (e.g. Mislevy, 1996; Junker, 2001) and attitude measurement (e.g. Andrich and Luo, 1993; Johnson, 2001).

Contents 1 Introduction

1

2 Item Response Theory (IRT)

2

2.1

Basic Assumptions: The Monotone Unidimensional IRT Model . . . . . . . . . . .

2

2.2

Nonparametric IRT: Scale Construction and Model Features . . . . . . . . . . . .

6

2.3

Parametric IRT: Modeling Dependence . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Example: Extending IRT Ideas to Cognitive Assessment 3.1

13

Two IRT-like Cognitive Assessment Models . . . . . . . . . . . . . . . . . . . . . 15 3.1.1

Deterministic Inputs, Noisy “And” Gate (DINA) . . . . . . . . . . . . . . 17

3.1.2

Noisy Inputs, Deterministic “And” Gate (NIDA) . . . . . . . . . . . . . . 18

3.2

Exploring Monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.3

Monotonicity Properties in Cognitive Assessment Models . . . . . . . . . . . . . . 22

4 Example: Direct-Response Probabilistic Unfolding Models

25

4.1

IRT-like Unfolding Models and the Monotonicity Assumption . . . . . . . . . . . 27

4.2

Nonparametric Estimators of Subject Locations and IRF’s in Unfolding Data . . . 38

4.3

Nonparametric Curve Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

5 Summary

43

6 References

44

Monotonicity and conditional independence

1

1 Introduction In education and the social sciences, we often ask subjects to respond to a set of items (questions, statements or tasks) on survey forms, self-report inventories, and mental tests, that are coded as discrete—often dichotomous—variables. In many settings it is natural to think, in analogy with factor analysis, that there is one or more continuous latent variables for each subject—such as political agency, extrovertedness, or ability in some area of mathematics—that can be measured or estimated using the positive and negative responses to these items. Item response theory (IRT; e.g., Fischer & Molenaar, 1995; Van der Linden & Hambleton, 1997) is a psychometric approach to modeling data from social surveys and educational and psychological tests, dating back at least to Lord (1952) and Rasch (1960), and to the work of Lazarsfeld, Loevinger and Guttman before them. IRT enables us to study the characteristics of test or survey items across multiple respondent populations, and to study respondents’ propensities to answer positively across various items. IRT has arguably been one of the most successful and widely used techniques in psychometrics, with applications in developmental, social, educational and cognitive psychology for example, as well as in medical research, demography and other social science settings. In Section 2, we review review some of the modern theory of monotonicity and conditional independence in (especially) nonparametric IRT models, including the representation theorem of Junker and Ellis (1997) as well as certain useful monotonicity and stochastic ordering properties. We also review some basic parametric IRT models and model building methodology. The meat of the paper is in Sections 3 and 4. In Section 3 we explore two models for cognitive assessment (that is, for inferring from examination data what skills students do or do not possess), that generalize the basic IRT model by allowing for a discrete, multidimensional latent variable, rather than the usual continuous unidimensional latent variable of item response theory (e.g. Mislevy, 1996; Junker, 2001). In this example we see that IRT models are closely related to discrete-node Bayesian inference networks, which allow us to model violations of the basic condi-

Monotonicity and conditional independence

2

tional independence assumption in IRT based on common cognitive features of different examination items. We also see that the fundamental idea of monotonicity—positive association between examination items modeled by dependence on the latent variable(s)—in IRT models still plays a fundamental role in these cognitive assessment models. In Section 4 we consider a different class of models, direct-response unfolding models, also called proximity models. These models are used in social psychology and political science research, going back to the work of Thurstone (1928) and Coombs (1964) for example. In these models, positive responses arise when a survey respondent holds an opinion “close to” the opinion or attitude expressed by the item. This idea leads us to replace the monotonicity assumption of Section 2 with a “unimodality” assumption. Nevertheless, we show two ways in which monotone IRT models play a role in unfolding models (e.g. Andrich and Luo, 1993; Johnson, 2001), and illustrate with an application to a data set concerning political motivations of college students. These examples illustrate how models for a variety of novel situations can be built up by using and generalizing the assumptions and building blocks developed in Section 2. They also illustrate the central roles of monotonicity and conditional indepedence in specifying models for highly multivariate discrete response data. We return to these themes in Section 5 at the end of this paper.

2 Item Response Theory (IRT) 2.1 Basic Assumptions: The Monotone Unidimensional IRT Model To fix notation, let us consider J dichotomous item response variables for each of

N examinees Xij = 1 if subject i responds correctly to task j , and 0 otherwise, with values xij , i = 1; : : : ; N , j = 1; : : : ; J . The distribution of each Xij typically depends on some person parameter i (which may be multidimensional) and some item parameter j (which may be multidimensional). Two cases are almost ubiquitous in IRT: binary responses, in which Xij takes on only the two values 0 to indicate a negative response and 1 to indicate a positive response; and polytomous responses,

Monotonicity and conditional independence

3

in which Xij takes mj values, perhaps 1, 2, . . . , mj , or perhaps 0, 1, . . . , mj

1. We will denote

unspecified observed values of Xij by xij , or sometimes simply s. The usual unidimensional item response theory (IRT) models for polytomous responses have the form

P [Xi1 = xi1 ; : : : ; XiJ = xiJ ℄ =

Z

P [Xi1 = xi1 ; : : : ; XiJ = xiJ ji ℄ dF (i )

(1)

where we assume Unidimensionality (UD): i takes values on the real line; and Local Independence (LI):

P [Xi1 = xi1 ; : : : ; XiJ = xiJ j℄ =

J

Y

j =1

P [Xij = xij ji ℄:

(2)

The item category response curves (ICRF’s)

Pjs(i ) = P [Xij = sji ℄ might assume nonparametric or parametric forms; see van der Linden and Hambleton (1997) for a variety of specifications of the ICRF’s that are currently in use. When the responses can be ordered in some way it is also natural to assume Monotonicity (M): The item step response function (ISRF)

Pjs (i ) = P [Xij > sji ℄ =

m

j X

x=(s+1)

Pjx(i ) is nondecreasing in i for all s.

(3)

Occasionally, it will be clear from context that assumptions or assertions are being made for every i (the subject or examinee index). In these cases we will drop i from the notation for simplicity. For discrete polytomous responses it is equivalent to consider the monotonicity of P [Xj

or

P [Xj

> sj℄

 sj℄ (the difference being whether category s is included in the probability or not).

Monotonicity and conditional independence

4

When the response is dichotomous (0/1), we refer to the ICRF Pj ( )

 P [Xj = 1j℄ as the item

response function (IRF). In that case, equation (2) reduces to a product-binomial form familiar in the IRT literature,

P [Xi1 = xi1 ; : : : ; XiJ = xiJ ℄ =

J

Y

j =1

Pj (i )xij [1 Pj (i )℄1

xij :

(4)

The model described by d = 1, LI and M, has been studied under many names in the literature. Mokken (1971) studies binary versions of the model under the name monotone homogeneity; Holland and Rosenbaum (1986) call it the monotone unidimensional latent trait model; Junker (1991, 1993) calls it the strictly unidimensional model. The assumptions of unidimensionality, local independence and monotonicity can be relaxed in various ways, as we shall see in Sections 3 and 4, but this basic model has been the foundation of much progress in IRT work. A basic and familiar model in this area is the “two-parameter logistic”, or 2PL, model for dichotomous item response variables (e.g., Chapter 1 of Van der Linden & Hambleton, 1997), given by the monotone homogeneity assumptions (1), (2) and (3) and the assumption of a logistic form for the IRF’s

Pj (i ; j ; j )  P [Xij = 1ji ; j ; j ℄ =

1

1 + exp( j [i j ℄) ;

(5)

describing the dichotomous response of examinee i to item j . The “discrimination” parameter j controls the rate of increase of this logistic curve, and is directly related to the Fisher information for estimating  , and the “difficulty” parameter j is the location on the information is maximal; note also that at i

 scale at which the

= j , P [Xij = 1℄ = 1=2. The 3PL (three-parameter

logistic) model extends the 2PL model by adding a non-zero lower asymptote to each item response function; on the other hand the Rasch or 1PL (one-parameter logistic) model is a restriction of the 2PL model obtained by setting j identically equal to some constant, usually 1. Figure 1 displays a Rasch IRF and a 3PL IRF, to fix ideas. Item response theory can be approached from two perspectives: nonparametric IRT and parametric IRT. Nonparametric IRT focuses on two primary issues: developing exploratory data analysis techniques for selecting items that measure the same latent variable (an enterprise called “scale

Monotonicity and conditional independence

0.0

0.2

0.4

0.6

0.8

1.0

5

-4

-2

0

2

4

Figure 1: Logistic response functions Pj ( ) for binary data. The lower curve is a Rasch IRF, which may be obtained by taking j = 1 and j = 0:25 in (5). The upper curve is a 3PL IRF, which adds a lower asymptote (0.4 in this case) to the basic form in (5). The dotted lines indicate the value of  for which Pj ( ) is halfway between its lower and upper asymptotes. This is the value j in each of the Rasch, 2PL and 3PL models.

Monotonicity and conditional independence

6

construction”), and identifying modeling and measurement properties that follow from basic assumptions like LI, M and UD. Parametric IRT combines parametric forms for the terms in the likelihood, like (5), with hierarchical Bayes and mixture modeling techniques and computations, to develop models and do detailed inference in complex situations that generalize the basic framework we have described above. In the remainder of this section we survey some ideas in nonparametric and parametric IRT. In the remainder of this section, we review some ideas in nonparametric IRT and parametric IRT.

2.2 Nonparametric IRT: Scale Construction and Model Features For dichotomous items (Xj

= 0 or 1 indicating incorrect or correct answer) a theory of scale

construction—selecting groups of items that hang together well in the sense that the monotone homogeneity model is probably appropriate for them—has existed at least since Mokken (1971; see also Molenaar, 1997). The principal tools of that theory are adaptations of Loevinger’s (1948) H coefficients, comparing the marginal covariance Cov (Xi ; Xj ) of each item pair with the maximum covariance Cov max (Xi ; Xj ) possible, preserving the margins of the observed

Xi  Xj table. The

bound Cov max (Xi ; Xj ) is obtained by adjusting the table to remove Guttman errors (i.e. cases in which an examinee misses the easy item and gets the hard item); and indeed the original formulas for the H coefficients were expressed as ratios of Guttman errors. The

H coefficients are directly sensitive only to high or low correlations between items, rather than to local independence given  as in equations (2) and (4). If the correlations are near zero, we may be unsatisfied to assume that such a  exists (see for example the discussion of co-monotonicity in Junker & Ellis, 1997). While

a perfect Guttman scale would produce only indirect evidence of a

H coefficients equal to one, large H coefficients provide

 “explaining” covariation in the item responses in the sense of local

independence. More direct attacks on the problem of establishing such a

 from data analysis have been

pursued by Stout, Ramsay and their students and colleagues. Stout (1990; and subsequent work, for

Monotonicity and conditional independence

7

example Stout, Habing, Douglas, Kim, Roussos & Zhang, 1996) basically constructs a proxy for  from the total score on a specially-selected subset of the items and uses it to test a weakened version of monotone homogeneity, Stout’s essential unidimensionality model. The central assumptions of Stout’s approach are essential independence, which asserts that any conditional dependence beteen items, given  , is so weak that the conditional variance of the test score X J to zero as J

! 1,

J lim J !1 2

!

XX

g jX+ = s℄ is non-decreasing in s, 8 :

(12)

Monotonicity and conditional independence

9

Hemker, Sijtsma, Molenaar and Junker (1997) call this property “SOL” (Stochastic Ordering of the Latent trait by the sum score), and show that, surprisingly, this property does not generalize to “most” nonparametric ordered-polytomous response IRT models. Thus for example, rules based on cutoffs for X+ need not be most powerful for “mastery decisions” in the sense of 

> ; on the

other hand, such cutoff rules for X+ are most powerful for mastery decisions in the nonparametric dichotomous response case (Grayson, 1988; Huynh, 1994). In the process of developing these stochastic ordering ideas, Hemker et al. (1997) and Sijtsma and Hemker (1998) developed a useful taxonomy of nonparametric and parametric item response models, based on the cumulative, continuation-ratio, and adjacent-category logits that are commonly used to define parametric families of polytomous IRT models. Common forms of graded response models (GRM; e.g. Samejima, 1997), sequential models (SM; e.g. Tutz, 1997), and partial credit models (PCM; Masters 1982) assume, respectively, that the logit functions logit P [Xj

>

j℄, logit P [Xj > jX > 1; ℄, and logit P [Xj = + 1jXj 2 f ; + 1g; ℄ are linear in . An illustration of the PCM is given in Figure 2. The graph on the left shows the ICRF’s P [Xj = k j ℄, and the graph on the right shows the cumulative curves P [Xj > k j ℄. Hemker’s analogous nonparametric model classes, the np-GRM, np-SM and np-PCM, assume only that these logits are non-decreasing in  .

This taxonomy is a powerful way to organize ideas about model definition and model development in applications of both parametric and nonparametric IRT. It follows that the np-PCM class is nested within the np-SM class, which is nested within the np-GRM class; moreover all three linear-logit families above (GRM, SM and PCM) are in fact subsets of the np-PCM class. As Sijtsma and Hemker (1998) show, this approach also highlights links between polytomous IRT and the machinery of generalized linear models (McCullagh & Nelder, 1989), just as it has long been realized that parametric dichotomous IRT is basically multivariate mixed effects logistic regression (e.g., Lee & Nelder, 1996).

Monotonicity and conditional independence

0.8

0.8

1.0

10

0.6

1 2 3 4 5

0.0

0.0

0.2

0.2

0.4

0.4

0.6

1 2 3 4 5

-3

-2

-1

0

1

2

3

Figure 2: Partial credit response functions. P [Xij

-3

-2

-1

0

1

2

3

= kj℄ on the left; P [Xij  kj℄ on the right.

2.3 Parametric IRT: Modeling Dependence Parametric IRT, as surveyed for example in the edited volumes of Fischer and Molenaar (1995) and Van der Linden and Hambleton (1997), is a well-established, wildly successful statistical modeling enterprise. IRT models have greatly extended the data analytic reach of psychometricians, social scientists, and educational measurement specialists. Parametric IRT models, extended by hierarchical mixture/Bayesian modeling and estimation strategies, make it possible in principle and in practice to incorporate covariates and other structure. Many violations of the basic local independence assumption of IRT models are in fact due to unmodeled heterogeneity of subjects and items, that can now be explicitly modeled using these methods. The estimation of group effects and the use of examinee and item covariates in estimating item parameters plays an important role in the analysis of large multi-site educational assessments such as the National Assessment of Educational Progress (NAEP; e.g., Algina, 1992; Zwick 1992). These efforts, which go back at least to Mislevy (1985), can be recognized as the wedding of

Monotonicity and conditional independence

11

hierarchical linear or multi-level modeling methodology with standard dichotomous and polytomous IRT models. The general model is a two-way hierarchical structure for N individuals and J response variables, as follows First level:

Second level:

Third level:

Xij

 P (i; j

P (i ; j )℄ i = 1; : : : ; N ; j = 1; : : : ; J

)Xij [1

 fi(jf ); j  gj ( jg ); i

f g

 f (f )  g (g )

where i is the usual person parameter,

each i each j

(1

Xij ) ;

9 > > > > > > > > > > > > > > > > > > > > > > > = > > > > > > > > > > > > > > > > > > > > > > > ;

(13)

j is the vector of item parameters for item j (e.g., j =

( j ; j ) in the 2PL model (5) above), and where independence is assumed between j ’s conditional on i at the first level and between i’s at the second level. The terms f and g represent sets of hyperparameters needed to specify the person distributions fi and item distributions gj , with hyperprior distributions f and g , respectively. Terms in the first level, for example, are multiplied together to produce the usual joint likelihood for the N  J item response matrix [Xij ℄; the second

and third levels can be used to impose constraints on the first level parameters and latent variable, to deduce what integrations are needed for marginal likelihood approaches, etc. The model (13) is expressed for dichotomous items, for simplicity of exposition, but can easily be generalized to polytomous items, or combinations of item types (see for example Patz & Junker, 1999). It is also usual to assume for fi ( ) a single latent trait distribution not depending on i, and similarly for gj . Alternatively, we may allow the distribution of  to depend hierarchically on examinee covari-

ates, that is, instead of taking fi () to be a single latent trait distribution in (13), we can allow it to depend on examinee covariates to model population heterogeneity, as in the multi-group IRT models of Mislevy (1985) and Bock and Zimowski (1997), or to reflect hierarchical linear structure as

Monotonicity and conditional independence

12

in Fox and Glas (1998). We may also elaborate gj (), for example by building linear structure into

the item parameters. For example in the 2PL model, where j 2 6 6 6 6 6 6 6 6 6 6 6 6 4

1 2 .. .

J J

= ( j ; j ), we might take

3 7 7 7 7 7 7 7 7 7 7 7 1 7 5

3

2

=Q

6 6 6 6 6 6 6 6 4

1 2

.. .

7 7 7 7 7 7 7 7 5

;

(14)

K

where Q is an appropriate design matrix of full column rank, to reflect common sources ( k ’s) of item difficulty ( j ’s) across items. In the case of Rasch (1PL) IRF’s, this is the linear logistic test model (LLTM; Scheiblechner, 1972; Fischer, 1973). This model and its various generalizations continues to be used for psychological experiments with multiple outcomes per subject (e.g., Fischer and Molenaar, 1995, and the references therein) and for research in cognitively-motivated test design. Alternatively, we might hypothesize that i is in fact d-dimensional, i

= (i ; : : : ; id), and 1

obtain an IRF for dichotomous responses of the form

P [Xj = 1j1 ; : : : ; d ℄ = P (aj 11 +    + ajd d

j );

(15)

where P (t) might be the logistic or probit response function for example, and all ajk are assumed to be nonnegative. B´eguin and Glas (1999) survey the area well (see also several contributed chapters in Van der Linden & Hamilton, 1997) and give an MCMC algorithm for estimating these models; Gibbons and Hedeker (1997) pursue related developments in biostatistical and psychiatric applications. As an alternative to the additive, compensatory dependence on the coordinates of

 in equa-

tion (15), conjunctive models have also been explored. One example is the multicomponent latent trait model (MLTM) of Embretson (e.g 1985, 1997), that combines unidimensional models for

Monotonicity and conditional independence

13

components of response conjunctively, so that

P [Xj = 1j1 ; : : : ; d ℄ =

d

Y

`=1

Pj`(` )

(16)

where Pj` (` ) are parametric unidimensional dichotomous response functions. The usual interpre-

tation is that the Pj` (` ) represent skills or subtasks all of which must be performed correctly in

order to generate a correct response to the item itself. Janssen and De Boeck (1997) give a recent application. Both the additive [equation (15)] and the conjunctive [equation (16)] models described here relax the basic IRT assumptions discussed in Section 2.1, by allowing  to be multidimensional. Both types of model satisfy a coordinatewise monotonicity condition that naturally generalizes monotonicity discussed in (3) to the case of multidimensional  , namely that the IRF

P [Xj = 1j1 ; : : : ; d ℄ is monotone in each coordinate k individually, holding the other k0 ’s, k 0

1; : : : ; d fixed.

= 1; 2; : : : ; k 1; k +

3 Example: Extending IRT Ideas to Cognitive Assessment For our first example, we will examine two IRT-like models that have been proposed for use in cognitive assessment and cognitive diagnosis. Such models, and inferences based on them, are useful when we are using tests, not for rank-ordering examinees or certifying broad levels of mastery of the testing material, but for diagnosing which specific skills, pieces of knowledge, beliefs, and other cognitive attributes a subject may or may not have, for designing or improving an instructional program for children, or to provide feedback on particular aspects of learning and performance to teachers and students. Like the models just described in Section 2.3, these models have a multidimensional latent variable, and under simple conditions satisfy the coordinatewise monotonicity condition discussed at the end of Section 2.3. The remainder of this section is an abridged version of an example presented by Junker and Sijtsma (2001).

Monotonicity and conditional independence

14

Sijtsma and Verweij (1999) consider a set of transitive reasoning tasks, in which 417 school

A, B , C , : : :, with physical attributes YA, YB , YC , : : :. For example the objects might be sticks, YA the length of stick A, YB the length of stick B , etc. Relationships among some pairs of the attributes, such as YA < YB and YB < YC are shown to each child. On the children were shown objects

basis of the pairwise relationships shown, the child is asked to reason about what the relationship between some pair not shown, e.g. YA vs. YC in this example. Reasoning directly from the premises

YA < YB and YB < YC to the conclusion YA < YC , without guessing and without using any other information, is an example of transitive reasoning (see Sijtsma & Verweij, 1999; and Verweij, Sijtsma & Koops, 1999, for summaries of the relevant developmental psychology). These tasks occurred in the context of three types of objects featuring different attributes, and using different numbers of objects per task. Within each task, students were asked to solve a fixed number of deductive items, depending on the task (see Sijtsma & Verweij, 1999, for details). A summary of the nine tasks is given in Table 1. In order to facilitate analyses with binary-response models, we have recoded the Sijtsma and Verweij so that a task is scored as correct if all the items within that task were answered correctly using a correct deductive strategy; otherwise the task was scored as incorrect. This leads to a 417  9 array of 1’s (correct) and 0’s (incorrect). In this array, the scores for all examinees on tasks 5 and 6, both involving size of disks, were 0’s. If we desire to use the transitive reasoning scale as evidence in designing or improving an instructional program for children, or to provide feedback on particular aspects of transitive reasoning to teachers and students, then analyses with the monotone unidimensional IRT model described in Section 2.1 will not help, because they only tell us the relative positions of examinees on a unidimensional latent scale. Instead we must entertain models that explicitly models the task performance in terms of presence or absence of particular cognitive attributes related to transitive reasoning. To illustrate, let us consider the rough task analysis for these nine tasks in Table 2, corresponding to the task summary in Table 1. The first three attributes are simply the ability to recognize or reason about transitivity in the context of length, size and weight. In addition, the tasks place differential load on subjects’ working memory capacity. Thus the next three cogni-

Monotonicity and conditional independence

Task Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 Task 7 Task 8 Task 9

Number & Type of Objects 3 Sticks 4 Sticks 5 Sticks 3 Disks 4 Disks 5 Disks 3 Balls 4 Balls 5 Balls

15

Featured Number of Attribute Premises Length 2 Length 3 Length 4 Size 2 Size 3 Size 4 Weight 2 Weight 3 Weight 4

Number of Items 1 2 3 1 2 3 1 2 3

Table 1: The nine transitive reasoning tasks of Sijtsma and Verweij (1999). Expected a posteriori (EAP) estimates and posterior standard deviations (PSD) for the difficulty parameters in a Bayesian fit of the Rasch model to the data are recorded on the right, for later reference. tive attributes correspond to three levels of working memory capacity: maintaining the first given premise in a task in working memory, maintaining the second task premise in working memory, and maintaining the third task premise in working memory.

3.1 Two IRT-like Cognitive Assessment Models We focus here on two discrete latent attribute models, that allow both for modeling the cognitive loads of items and inferences about the cognitive attributes of examinees. In both models the latent variable is a vector of 0’s and 1’s for each examinee, indicating the absence or presence of particular cognitive attributes for that examinee, and we use Table 2 to determine which attributes the examinee needs to perform each task correctly. As in Section 2, we will assume there are

N

examinees and J binary task performance variables, and in addition we suppose there is a fixed set of K cognitive attributes involved in performing these tasks; different subsets of attributes may be

Monotonicity and conditional independence

Qjk 1 2 3 4 5 6 7 8 9

16

Context Premise st Length Size Weight 1 2nd 3rd 1 2 3 4 5 6 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 1 1 0 1 0 1 0 0 0 1 0 1 1 0 0 1 0 1 1 1 0 0 1 1 0 0 0 0 1 1 1 0 0 0 1 1 1 1

Table 2: Decomposition of the Sijtsma and Verweij (1999) tasks (j = 1; : : : ; 9) into hypothetical cognitive attributes (k = 1; : : : ; 6). Qjk = 1 if and only iff task j requires attribute k . involved in different tasks. For both models we define

Xij = 1 or 0 indicating whether or not student i performed task j correctly

Qjk = 1 or 0 indicating whether or not attribute k is relevant to task j ik = 1 or 0 indicating whether or not student i possesses attribute k

(17)

The values Qij are fixed in advance, like the design matrix in an LLTM model of equation (14). The

Qij can in fact be assembled into a Q-matrix of the type discussed by Tatsuoka (1995). Figure 3 illustrates the structure defined by Xij , Qjk and ik graphically as a Bayes network. We wish to make inferences about the latent variables ik , or to make inferences about the relationship between these attributes and observed task performance. Both models are most easily specified using the latent response framework elaborated by Maris (1995), which is closely related to the notion of data augmentation in statistical estimation (e.g. Tanner, 1996).

           

Monotonicity and conditional independence

Attribute i1

Attribute i2

17

Attribute i3

A AA   A  AA   AA  AU 

Task Xi1

Attribute i4

Attribute i5





 





 

 

 

R  ? /

Task Xi2

Task Xi3

Figure 3: One-layer Bayes network for discrete cognitive attributes models. For examinee i, ik = 1 or 0 describes presence or absence of latent attribute k; Xij = 1 or 0 describes success or failure performing task j ; and Qjk = 1 or 0 describes the presence or absence of edges in the graph. Tasks are conditionally independent given attributes (local independence); attributes relevant to a task may combine conjunctively, as in models (19) and (21), or nonconjunctively, to influence task performance. 3.1.1 Deterministic Inputs, Noisy “And” Gate (DINA) The first model we consider has been the foundation of several approaches to cognitive diagnosis and assessment (see the references in Tatsuoka, 1995; and Doignon & Falmange, 1999). It was considered in detail by Haertel (1989; see also Macready & Dayton, 1977) who identified it as a constrained latent class model. In this model, we also define latent response variables

ij

 j ( i) =

Y

k: Qjk =1

ik =

K

Y

k=1

ikQjk ;

i has all the attributes required for task j . In Tatsuoka’s terminology, the latent vectors i = ( i1 ; : : : ; iK ) are called “knowledge states”, and the vectors i = (i1 ; : : : ; iJ ) are called “ideal response patterns”, since they represent a determinisitic preindicating whether or not student

diction of task performance from each examinee’s knowledge state.

Monotonicity and conditional independence

18

The latent response variables ij are related to observed task performances

Xij according to

sj = P [Xij = 0jij = 1℄, a per-task “slip” probability; and gj = P [Xij = 1jij = 0℄, a per-task “guessing” probability. Note that sj and gj are merely the false negative and false positive rates in a simple signal detection model for detecting ij from noisy observations

Xij . We have chosen the names “slip” and

“guessing” probabilities to be suggestive, but there are other reasons than slips and guessing— ranging from poor wording of the task description for students to inadequate specification of the Q matrix—for the signal detection to be less than perfect (see for example the discussion in DiBello, Stout, & Roussos, 1995). The IRF for a single task is

P [Xij = 1j ; s; g ℄ = (1 sj )ij gj1

ij

 Pj ( i);

(18)

Note that each ij functions as an “and” gate (i.e., it is a binary function of binary inputs whose Q

value is 1 if and only if all the inputs are 1’s), combining determinisic inputs ikjk which indicate which task-relevant attributes are possessed by the examinee; and each task performance

Xij is

modeled as a noisy observation of each ij . We refer to the model hereafter as the DINA (deter-

ministic inputs, noisy “and”) model. From (18) it is also clear that Pj ( i ) will be coordinatewise monotone (M) in i if and only if 1

sj > gj .. Assuming local independence and independence

among examinees, the joint likelihood for all responses under the DINA model is

P [Xij = xij ; 8 i; j j ; s; g ℄ =

=

N Y J h Y i=1 j =1

(1 sj )xij sj

1

N J

Y Y

i=1 j =1

i gxij (1 g ) j j

xij j ( i

Pj ( i )xij [1 Pj ( i )℄1

) h

1

xij

i1

i

j (

)

xij

(19)

3.1.2 Noisy Inputs, Deterministic “And” Gate (NIDA) The second model we consider is a discrete-latent-space analogue of Embretson’s multicomponent latent trait models (MLTM’s); this model was recently discussed by Maris (1999) for example. In

Monotonicity and conditional independence

19

this model we take Xij , Qjk and ik as in (17), and define latent response variables

ijk = 1 or 0 indicating whether or not student i’s performance in the context of task j is consistent with possessing attribute k The latent response variables ijk are related to the student’s knowledge state i according to

sk = P [ijk = 0j ik = 1; Qjk = 1℄, a per-attribute “slip” probability, and gk = P [ijk = 1j ik = 0; Qjk = 1℄, a per-attribute “guessing” probability; and for completeness we define P [ijk

= 1j ik = a; Qjk = 0℄ = 1, regardless of the value a of ik .

Again, sk and gk are merely false negative and false positive error probabilites in a signal detection model. Observed task performance is related to the latent response variables via Y

Xij =

k: Qjk =1

ijk 

K

Y

k=1

ijk ;

so that the IRF for this new model is

P [Xij = 1j ; s; g ℄ =

=

K

Y h

k=1

iQ sk ) ik gk1 ik jk

(1

=

K

Y

k=1 K Y

P [ijk = 1j ik ; Qjk ℄

1 sk

!

ik Qjk Y K

gk

k=1

k=1

gkQjk

 Pj ( i):

(20)

In this model, noisy inputs ijk reflecting possession of attributes ik by examinees are combined in a deterministic “and” gate Xij ; we refer to the model hereafter as the NIDA model (noisy inputs, deterministic “and”). Again it is clear that the vector i plays the role of the latent variable i , that

sk and gk play the role of j , and that this IRF is monotone in the coordinates of i as long as (1 sj ) > gj . The joint model for all responses in the NIDA model is P [Xij = xij ; 8 i; j j ; s; g ℄ =

=

N J

Y Y

(

K

Y h

i=1 j =1 k=1

(1

N J

Y Y

i=1 j =1

Pj ( i)xij [1 Pj ( i)℄1

iQ jk sk ) ik gk1 ik

xij

)

(

1

K

Y h

k=1

(1

xij

iQ jk sk ) ik gk1 ik

)1

xij

(21)

Monotonicity and conditional independence

20

3.2 Exploring Monotonicity Both the DINA and NIDA models are stochastic conjunctive models for task performance; under the monotonicity conditions

1

s > g , examinees must possess all attributes listed for each

task, in order to maximize the probability of successful performance of that task. Both models are constrained latent class models, and therefore closely related to IRT models, as we have tried to suggest in the joint likelihood expressions (19) and (21). They can also both be seen as simple onelayer Bayes inference networks for discrete variables (e.g. Mislevy, 1996) for task performance, as illustrated in Figure 3. In general Bayes net models need not be conjunctive (e.g. Heckerman, 1996) but when examinees are presumed to be using a single strategy, conjunctive models seem natural. To explore the issue of monotonicity in real data, we used BUGS 0.6 (Spiegelhalter et al., 1997) to fit the DINA and NIDA models to the dichotomous task data described above, using the

Q-matrix in Table 2. We used Bayesian formulations of the models, in which population probabilities k = P [ ik = 1℄ were assumed to have independent, uniform priors Unif[0,1] on the unit interval. Independent, flat priors Unif[0,gmax ] and Unif[0,smax ] were also used on the guessing and slip probabilities in each model, with the upper bounds gmax and smax estimated from the data (when gmax and smax are small, these prior choices tend to prefer fits satisfying the monotonicity conditions 1 s > g ). For each model we ran five Markov Chain Monte Carlo (MCMC) chains of length 3000 from various randomly-chosen start points; the first 2000 steps of each chain were discarded as burn-in, and the remaining 1000 steps were thinned by retaining only every fifth observation, for a total of 200 observations per chain. Tables 3 and 4 list estimated posterior means (expected a posteriori values, or EAP’s) and posterior standard deviations (PSD’s) for each of the guessing and slip probabilities in the two models, using 1000 MCMC steps obtained by pooling the five thinned chains for each model. Both models showed some evidence of under-identification (slow mixing and multiple maxima), as would be expected from Tatsuoka (1995) and Maris (1999). However, we were still able to make some tentative conclusions about monotonicity conditions

Monotonicity and conditional independence

Task (j ) 1 2 3 4 5 6 7 8 9

g^j EAP 0.478 0.363 0.419 0.657 0.002 0.002 0.391 0.539 0.411

21

s^j PSD 0.167 0.162 0.255 0.199 0.002 0.002 0.420 0.242 0.162

g^max

EAP 0.486 0.487 0.479 0.488 0.462 0.464 0.486 0.489 0.480

PSD 0.277 0.281 0.292 0.279 0.270 0.270 0.274 0.275 0.283

1 s^p j > g^j ? (1 g^j )=g^j  (1 s^j )=s^j

p p  p p p  p

1.15 1.85 1.51 0.55 581.09 576.43 1.65 0.89 1.55

s^max

0.910 0.081 0.910 0.079 Table 3: Tentative expected a posteriori (EAP) estimates and posterior standard deviations (PSD) for the guessing and slip probabilities in the DINA model, for the transitive reasoning data, with Q-matrix as in Table 2. The last column is discussed in Section 3.3.

g^k

s^k

Attrib (k )

EAP

PSD

EAP

PSD

1 2 3 4 5 6

0.467 0.749 0.764 0.364 0.176 0.061

0.364 0.207 0.246 0.319 0.168 0.115

0.369 0.161 0.005 0.163 0.785 0.597

0.392 0.125 0.009 0.318 0.129 0.294

g^max

s^max

1 s^k > g^k ?

p p p p p p

1

s^k g^k

1.351 1.120 1.302 2.299 1.222 6.607

log

1

s^k g^k

0.301 0.113 0.264 0.833 0.200 1.888

0.877 0.109 0.877 0.108 Table 4: Tentative expected a posteriori (EAP) estimates and posterior standard deviations (PSD) for the guessing and slip probabilities in the NIDA model, for the transitive reasoning data, with Q-matrix as in Table 2. The last two columns are discussed in Section 3.3.

Monotonicity and conditional independence

22

from Tables 3 and 4. First, most of the point estimates satisfy the monotonicty condition 1

or equivalently, g + s

s > g,

< 1 (the exceptions are the guessing and slip probabilities for tasks 4 and 8

under the DINA model). Examination of the posterior distributions in each model shows that the posterior probability that 1

s > g for each task (DINA model) or latent attribute (NIDA model)

is near 0.50 in each case: this certainly does not contradict the hypothesis that M holds, but neither is it a strong confirmation of M. Second, in the DINA model, tasks 5 and 6 on which all examinees scored zeros yield the very plausible estimates gj

= 0:002 (P SD = 0:002): since all responses in

which the examinee did not have the requisite attributes are zeros, there is overwhelming evidence that these examinees could not “guess” the task and hence gj is near zero with great certainty. Third, except for these two guessing probabilities, all the guessing and slip probabilities in the DINA model are near their prior means, with fairly large posterior SD’s. On the other hand, the guessing and slip probabilities in the NIDA model seem to have moved farther from their prior means, and in some cases with relatively small PSD’s, indicating some certainty in these values.

3.3 Monotonicity Properties in Cognitive Assessment Models One of the strengths of the nonparametric IRT approach outlined in Section 2.2 is that it helps us to take a step back and consider fundamendal model properties that are important for inference about latent variables from observed data. In this section we consider the DINA and NIDA models, described in (19) and (21) respectively, in this context. We have seen already that LI holds by construction in these models. And, as discussed in Section 3.1, M holds as long and 1

s > g for

each task or latent attribute, and Section 3.2 provides some evidence in favor of this property in the example data set. For models satisfying LI, M and LD, it follows immediately from Lemma 2 of Holland and Rosenbaum (1986) that for any non-decreasing summary g (X ) of X

= (X ; : : : ; XJ ), E [g(X ) j i℄ 1

is non-decreasing in each coordinate ik of i ; this implies the SOM (Stochastic Ordering of the Manifest score

X+ by the latent trait) property of Hemker et al. (1997), that P [X+ > j i ℄ is

Monotonicity and conditional independence

23

non-decreasing in each coordinate ik of i . Not much is known about the inverse and more useful property SOL (Stochastic Ordering of the Latent variables by the manifest score al., 1997), that P [ i1

X+ ; Hemker et

> 1 ; : : : ; iK > K j X+ = s℄ is nondecreasing in s, when the latent “trait”

is multidimensional. We prove next that a weaker, related property, that

h

P ik = 1 i1 ; : : : ; i(k

1)

X

; i(k+1) ; : : : ; iK ; and

Xij = s

i

j :Qjk =1

(22)

is non-decreasing in s, holds for the NIDA model but not the DINA model. For the NIDA model, we define

mik =

P

j : Qjk =1 xij

= nk = =

=

J x Q j =1 ij jk

P

number of tasks correct involving attribute k

(23)

J Q j =1 jk

P

total number of tasks involving attribute k

and note that the posterior odds of ik

= 1, conditional on the data and all other parameter values

in the model, are equal to

1 sk

mik

1 ik (1 sk ) 1 ik gk

!

gk

nk mik

!

Y n

k0 6=k

(1 sk0 ) ik0 gk0 ik0 1



(24)

ik

where

ik =

(1)  ik (0)

o

Qjk0

;

(1)= (0) are the prior odds. Property (22) is immediate from (24), since by (23), and ik ik

equals the sum

mik

P

j :Qjk =1 Xij on the right in (22).

The DINA model, however, need not satisfy (22). In this model the posterior odds of ik

= 1,

conditional on the data and all other parameter values in the model, turn out to be J

Y

j =1 k)

=

Q

sj

1 gj

ik k) Qjk

! (



J

Y

j =1

2 6 4

1 gj  1 sj gj sj

ik k) Qjk

! (

xij

3 7 5

 ik (1) (0)

ik

(25)

`6=k:Qj` =1 i` , which indicates presence of all attributes needed for task j , except (1)= (0) are again the prior odds. Clearly, if the products of odds (1 for attribute k ; and ik ik

where ij (

Monotonicity and conditional independence

24

gj )=gj  (1 sj )=sj vary greatly, it need not be the case that (25) is monotone in mik = j :Qjk =1 Xij , ( k) the number of correctly performed tasks involving attribute k . Note also the presence of ij : only tasks for which the examinee possesses all attributes except attribute k will contribute to increasing the odds of ik = 1; Thus ik is less sensitive to mik , and may not even be monotone in mik , in the P

DINA model. Finally we turn to a new kind of monotonicity that seems plausible for some cognitive assessment models. In a standard monotone unidimensional IRT model of the kind reviewed in Section 1, the more of  there is, the higher the probability of getting a task right. What is there “more of” in a cognitive assessment model like the NIDA and DINA models, that might increase the probability of correctly performing an assessment task? One answer might be, the more task-relevant latent attributes the examinee has, the higher the probability of correct task performance should be. In other words, we are asking whether each of the IRF’s in equations (18) and (20) is nondecreasing in

mij =

k

X

k=1

ik Qjk = number of task-relevant attributes possessed

(26)

This monotonicity property is easy to see for the DINA model, since in that model Pj ( i ) = (1

sj )ij gj1

ij

equals gj as long as mij


. We define

the location j of Pj to be the midpoint of all ’s satifying the weak unimodality condition, i.e.

j = midpointft : Pj (t) = max Pj ()g. This merely formalizes the shape displayed in Figure 4.

Various specific parametric models with shapes like that in Figure 4 have been proposed

Monotonicity and conditional independence

28

and used to analyze attitudinal data (e.g. Hoijtink, 1990; Andrich, 1996; Roberts, Donoghue and Laughlin, 2000)). We briefly indicate two examples. Example 4.1 Squared logistic models. Andrich (1988) suggests a variant of the Rasch model

Pj () =

1

1 + expf [ j ℄g

for unimodal, or unfolding response processes called the simple squared logistic model (SSL). The SSL item response function is defined by:

Pj () =

1 1 + expf( j ) g

(29)

2

Note that the maximal probability of endorsement for this item occurs at j and is equal to P ( j ) = 1 2

, and that the IRF Pj ( ) is symmetric around j . Figure 4 shows an example of the SSL in (29),

located at j

= 1. A generalization of this model, called the two-parameter squared logistic

model (2PSL), has IRF

Pj () =

1

1 + expf( j )

2

j g

(30)

The parameter j , sometimes called the item threshold, or item unit, controls both the height of the curve at the maximum 

= j and the range of ’s for which Pj () > 0:5. Both models are related

to a more general parametric form suggested by DeSarbo and Hoffman (1986). 2 Example 4.2 The hyperbolic cosine model (HCM).

Andrich and Luo (1993) proposed a different parametric form for Pj ( ). Their hyperbolic cosine model (HCM) for unfolding responses has IRF’s

Pj () =

e j

e j + 2cosh(

j )

:

(31)

Note the IRF is symmetric around the location j , and the maximal endorsement probability

Pj ( j ) = 1+2 exp1 f j g is a function of the what Andrich and Luo (1993) call the item unit parameter j , which, as in the 2PSL model (DeSarbo and Hoffman, 1986), controls both the height

Monotonicity and conditional independence

1.0

1.0

29

Item 1

0.2

0.4

P(theta)

Item 2

0.0

0.0

0.2

0.4

P(theta)

0.8

0.8

Item 3

0.6

Item 2

0.6

Item 3

Item 1

−6

−4

−2

0 theta

2

4

6

−6

−4

−2

0

2

4

6

theta

Figure 5: Typical IRF’s for the hyperbolic cosine model (31). On the left, 1 = 2, 2 = 0, 3 and 1 = 2 = 3 = 2. On the right, 1 = 2 = 3 = 0, and 1 = 1; 2 = 2; 3 = 3.

= 3,

 = j and the range of ’s for which Pj () > 0:5. Figure 5 displays several HCM IRF’s Pj ( ) with various values of item location parameters j and item unit parameters j . 2 of the curve at the maximum

When a subject disagrees with the attitudinal statement “I follow politics because it bothers me when I don’t” on Mulhberger’s politically integrated motivation scale, the disagreement may be because the subject is expressing a lower developmental level of political motivation (e.g. the subject feels that other people expect him or her to follow politics) or because the subject is expressing a higher developmental level of political motivation (e.g. the subject follows politics because it is fun). This ambiguity of location information when a subject disagrees—does the subject disagree because his/her level is “below” or “above” the level expressed by the item?—was recognized by both Thurstone (1928) and Coombs (1964). More recently (e.g. Andrich and Luo, 1993; Andrich, 1996; Verhelst and Verstralen, 1993; Roberts et al., 2000) this idea has been exploited to express unfolding models in terms of standard monotone unidimensional IRT models for polytomous responses (cf. Section 2). For each item j on the attitude scale we define a latent response variable

Monotonicity and conditional independence

30

(cf. e.g. Maris, 1995) 8 > > > > >
> > > > :

0; 1; 2;

if subject i disagrees with item j “from below”; if subject i agrees with item j ;

if subject i disagrees with item j “from above”;

and then define the observed response Xij in terms of the latent response ij as

Xij =

8 > < > :

0; 1;

= 0 or 2; if ij = 1; if ij

= 1 j1 ij j: As with Xij , we can and will assume that the ij , j

= 1; : : : ; J , are conditionally independent given

i , for each i = 1; : : : ; N . If we denote the ICRF’s for the latent response variables ij by Rjk (i ) = P [ij = k j i ℄; k = 1; 2; 3 it immediately follows that the IRF for the observed response variables Xij must be

Pj (i ) = P [Xij = 1 j i ℄ = Rj 1 (i ) 1 Pj (i ) = P [Xij = 0 j i ℄ = Rj0(i ) + Rj2(i ) Conversely,

Rj 0 (i ) = P [ij = 0 j i ℄ = [1 Pj (i )℄[1 g j (j )℄ Rj 1 (i ) = P [ij = 1 j i ℄ = Pj (i ) Rj 2 (i ) = P [ij = 2 j i ℄ = [1 Pj (i )℄g j (i )

(32)

g j (i ) = P [ij = 2jXij = 0; i ; j ℄:

(33)

where

Moreover it is easy to see that the IRT model for ij will satisfy the monotonicity condition M (equation (3)), if and only if Rj 2 ( ) is nondecreasing and Rj 0 ( ) is nonincreasing.

Monotonicity and conditional independence

31

For any weakly unimodal response function Pj ( ) attaining its maximum at 

= j , Johnson

(2001) observes that a latent response model can always be constructed using (32) with the function 8 >
:

1; 0;

if 

> ;

(34)

else

It is easy to see from (32) that Rj 2 ( ) will be nondecreasing and Rj 0 ( ) will be nonincreasing, so that the model for the ij is a monotone unidimensional IRT model. Under equation (34), a subject

= 0, and for whom i > j , automatically “disagrees from above”, j = 2. Conversely, a subject for whom Xij = 0 and i < j automatically “disagrees from below”, ij = 0. The discontinuity at in this g ( ) also makes Rjk ( ), k = 0; 2, discontinuous functions, even if Pj ( ) is smooth. In practice it is usually possible to find functions g ( ) that are who disagrees with item j , Xij

continuous, as in the examples that come next. Example 4.3 The simple squared logistic model. If the unfolding response model P (t) is defined by the SSL in (29) and if we take

g () =

e( ) 1 + e( )

in (33), then the latent response model derived from (32) is:

expf( ) g (1 + expf g)(1 + expf( ) g) 1 Rj () = P [j = 1 j ℄ = 1 + expf( ) g expf g expf( ) g Rj () = P [j = 2 j ℄ = (1 + expf g)(1 + expf( ) g) 2

Rj 0 () = P [j = 0 j ℄ = 1

2

2

2

2

2

and is a monotone item response model for the latent responses j . Figure 6 shows the category response functions for the latent responses j . 2

Example 4.4 The hyperbolic cosine model (HCM).

Monotonicity and conditional independence

X=ξ=1 ξ=0 ξ=2

0.0

0.2

0.4

P(θ)

0.6

0.8

1.0

32

−4

−2

0

2

4

θ

Figure 6: Latent response model developed for the simple squared logistic model (SSL) in Example 4.3.

Monotonicity and conditional independence

33

Andrich and Luo (1993) and Verhelst and Verstralen (1993) model unfolding responses as a latent response model where the latent responses are assumed to follow a partial credit model (PCM; Masters (1982). We can obtain their formulation by taking

g () =

e2( ) 1 + e2( )

in (33). This leads, after a little algebra, to the latent response ICRF’s

1

Rj 0 () = P [j = 0 j ℄ =

1 + expf j + j g + expf2( j )g expf j + j g Rj () = P [j = 1 j ℄ = 1 + expf j + j g + expf2( j )g expf2( j )g Rj () = P [j = 2 j ℄ = 1 + expf j + j g + expf2( j )g 1

(35)

2

for an item located at j , and with a threshold of j . Note that

Rj 0 () and Rj 2 () are monotone

decreasing, and monotone increasing respectively. Clearly this setup induces the hyperbolic cosine model (HCM) of (31):

Rj 1 () = Pj () =

e j

e j + 2cosh( j )

:

Johnson (2001) shows that another representation using

e3( ) g () = 1 + e3( ) is possible, leading to a different latent response model,

Rj 0 () = P [j = 0 j ℄ = Rj 1 () = P [j = 1 j ℄ =

( osh( )) (1 + e + osh( ))  ) )(e e

3(

e + osh( ) e3( ) ( osh( )) Rj 2 () = P [j = 2 j ℄ = (1 + e3( ) )(e + osh( ))

(36)

Figure 7 displays the HCM IRF along with the ICRF’s corresponding to the latent response models of equations (35) and (36). The solid lines correspond to the PCM model (35), whereas the dashed lines correspond to the alternate model (36). 2

Monotonicity and conditional independence

0.0

0.2

0.4

P(θ)

0.6

0.8

1.0

34

−4

−2

0

2

4

θ

Figure 7: Latent response ICRF’s for two different latent response models developed for the HCM in Example 4.4. The solid lines correspond to the partial credit model in equation (35), and the dashed lines correspond to the IRF’s described in (36).

Monotonicity and conditional independence

35

Thus, there are many ways of representing a single unfolding model IRF in terms of latent responses that follow a monotone IRT model. This argues against a substantive psychological interpretation of the latent response process ij , but developing the latent response process is nevertheless a useful exercise, closely related to data augmentation methods (e.g. Tanner, 1996) in other areas of statistics. Johnson (2001) has elucidated and generalized this latent response / data augmentation methodology; he is able to give explicit conditions that guarantee that the IRF’s Pj ( ) for the observed responses will be unimodal, in terms of conditions on the latent response ICRF’s

Rjk () for example, and he also proposes several computational algorithms for estimating direct response unfolding models based on the latent response idea. As an illustration Table 6 and Figure 8 show the estimated IRF’s for Muhlberger’s (1999) eight items measuring politically internal motivation. Equal-tailed 95% posterior credible intervals (CI’s) for the item location parameters ( 0 s) and item unit parameters ( ’s) are also given in Table 6. Bayesian estimation of the 2PSL was carried out using the software package BUGS (Spiegelhalter et al., 1997). In this analysis the  distribution in (1) was taken to be Gaussian with weak hyperpriors on the mean and variance, and the item location and item unit parameters were constrained to sum to zero (i.e.

P

j j

= 0 and

P

j j

= 0).

Recall that the items are numbered in order of increasing developmental maturity, in Muhlberger’s assesssment. The item parameter estimates in Table 6 suggest that this ordering is more or less correct, except at the top end of the scale where items 7 and 8 might be reversed; however the 95% CI’s are much wider than the differences in these item locations. More generally, judging from the 95% CI’s, it seems that the first three of Muhlberger’s items are definitely different from the last three or so, with the middle two items occupying an expected middle ground. These differences are less evident in Figure 8, since the varying item units j cause the IRF’s Pj ( ) to “spread” across one another more (especially for the most popular item, item 6, for example). Nevertheless some separation into a “low group” and a “high group” of items seems evident in the figure.

Monotonicity and conditional independence

Item 1: Expected

Parameter Location 1 Unit 1 2: Upset Location 2 Unit 2 3: Feel Bad Location 3 Unit 3 4: Bother Location 4 Unit 4 5: Learn Location 5 Unit 5 6: Important Location 6 Unit 6 7: Fun Location 7 Unit 7 8: Enjoy Location 8 Unit 8

36

Median

1:12 1:16 0:91 4:69 0:53 1:45 0:02 0:15 0:45 1:90 0:64 3:37 0:78 0:43 0:69 1:23

95% CI ( 1:59; 0:65) ( 1:98; 0:17) ( 1:74; 0:09) ( 7:03; 3:04) ( 1:01; 0:03) ( 2:22; 0:66) ( 0:38; 0:39) ( 0:40; 0:73) (0:08; 0:84) (1:28; 2:68) (0:11; 1:17) (2:40; 4:74) (0:30; 1:30) ( 0:22; 1:76) (0:25; 1:23) (0:56; 2:75)

Table 6: Posterior medians and 95% credible intervals for the squared logistic model parameters for the Muhlberger (1999) political motivation data set.

Monotonicity and conditional independence

1.0

37

0.0

0.2

0.4

P(θ)

0.6

0.8

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Item 8

−4

−2

0

2

4

θ

Figure 8: Squared logistic IRF’s Pj ( ) estimated from the Muhlberger (1999) political motivation data.

Monotonicity and conditional independence

38

4.2 Nonparametric Estimators of Subject Locations and IRF's in Unfolding Data

A natural nonparametric generalization of the direct-response unfolding models considered in Section 4.1 is the set of all models satisfying the local independence and unidimensionality assumptions of Section 2 and, in addition, having IRF's $P_j(\theta)$ satisfying the weak unimodality condition discussed at the beginning of Section 4.1, but without any particular parametric form. We will refer to the model defined by only these three assumptions as the nonparametric direct-response unfolding model. In particular, all of the parametric models considered in Section 4.1 satisfy the conditions of this nonparametric unfolding model.

Although unimodality is a natural a priori assumption to make about the IRF's $P_j(\theta)$ in this context, it is useful to have diagnostic tools to assess whether the IRF's are actually unimodal, or deviate from unimodality in some systematic way. One could assess the shapes of IRF's by examining nonparametric kernel-smoothed estimates as in equation (8). Unfortunately, the total score $X_+$ and the rest score $X_{+(-j)}$ are not ordinally consistent estimators of $\theta$, as in equation (7), in an unfolding model. A different "method of moments" style estimator is needed.

Thurstone (1928) proposed a two-stage procedure for measuring attitudes that does not depend on a parametric model. In the first stage of Thurstone's procedure, the $J$ items might be reviewed by judges who would rank or otherwise locate each item relative to the others. This location information could be summarized in item-by-item weights $a_j$. In the second stage, subjects would respond to (agree/disagree with) the $J$ items, producing dichotomous responses $\mathbf{X}_i = (X_{i1}, \ldots, X_{iJ})$.

Thurstone then proposed using the weighted average

$$ \left\{ \begin{array}{ll} \dfrac{\sum_{j=1}^{J} a_j X_{ij}}{\sum_{j=1}^{J} X_{ij}}, & \text{if } \sum_j X_{ij} > 0, \\[2ex] \dfrac{\sum_{j=1}^{J} a_j}{J}, & \text{otherwise,} \end{array} \right. \tag{37} $$

to estimate the location of subject $i$. This Thurstone score can be expressed in many forms, depending on the choice of the weights $a_j$.
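As a concrete illustration of (37), the short Python sketch below computes Thurstone scores for a 0/1 response matrix and an arbitrary weight vector $a$. The sketch is not part of the original analyses; the function and variable names are ours, and the toy data are invented purely to show both branches of (37) (this score is denoted $T_J^a(\mathbf{X})$ below).

    import numpy as np

    def thurstone_scores(X, a):
        """Thurstone scores (37): the average of the weights a_j over the items
        endorsed by subject i; a subject endorsing no items gets the overall
        mean of the weights, as in the second branch of (37)."""
        X = np.asarray(X, dtype=float)   # N x J matrix of 0/1 responses
        a = np.asarray(a, dtype=float)   # length-J weight vector
        n = X.sum(axis=1)                # sum_j X_ij for each subject i
        weighted = X @ a                 # sum_j a_j X_ij for each subject i
        return np.where(n > 0, weighted / np.where(n > 0, n, 1.0), a.mean())

    # Toy example: 3 subjects, 4 items, van Schuur-style weights a_j = r_j / J
    X = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 0, 0]])
    a = np.array([1, 2, 3, 4]) / 4.0
    print(thurstone_scores(X, a))   # expect 0.375, 0.875, 0.625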

In a nonparametric direct-response unfolding model, the IRF's $P_j(\theta)$ achieve their maxima at the locations $\delta_j$, so that a natural choice for the weights in equation (37) is $a_j = \delta_j$. If the $\delta_j$ are not known but their ranks $r_j$ are known, then another natural choice is $a_j = r_j / J$, as suggested by van Schuur (1988). Johnson (2001), following Post (1992), shows how to obtain estimates $\hat{r}_j$ of these ranks by using sample estimates of the conditional probabilities $P[X_j = 1 \mid X_k = 1]$. More generally we will consider the Thurstone score in (37) for any $a = (a_1, \ldots, a_J)$ that is an ordered scoring scheme (compare Junker, 1991), in the sense that the $a_j$ are comonotone with the $\delta_j$ (i.e., $\delta_j \le \delta_k$ implies $a_j \le a_k$, for all $j, k \le J$). We now show that any Thurstone score based on an ordered scoring scheme will be an ordinally consistent estimator of $\theta$, under the nonparametric direct-response unfolding model.

We begin by considering the true Thurstone score

$$ T_J^a(\theta) \;\equiv\; \frac{\sum_{j=1}^{J} a_j P_j(\theta)}{\sum_{j=1}^{J} P_j(\theta)} \;=\; \sum_{j=1}^{J} a_j \frac{P_j(\theta)}{\sum_{k=1}^{J} P_k(\theta)}, \tag{38} $$

in which the $X_{ij}$ are replaced by their expected values $P_j(\theta_i)$. The terms $P_j(\theta) / \sum_k P_k(\theta)$ in the true Thurstone score have an interesting interpretation. Suppose we ask subjects to endorse one and only one of the $J$ attitude items, rather than endorsing all that they agree with. Let $Y_i \in \{1, 2, \ldots, J\}$ be respondent $i$'s forced choice. Then $Y_i$ is a single polytomous item response, with

$$ P[Y = j \mid \theta_i] \;=\; R_j(\theta_i) \;\equiv\; \frac{P_j(\theta_i)}{\sum_{k=1}^{J} P_k(\theta_i)}. \tag{39} $$

Now suppose the items have been numbered so that $\delta_1 \le \delta_2 \le \cdots \le \delta_J$. It is natural to expect that if $\theta_{i_1} < \theta_{i_2}$, then $Y_{i_1}$ is less likely to be a high value than $Y_{i_2}$, i.e. respondents located lower on the latent scale will endorse an item with a lower location. This condition is easily formalized; we are asserting that the hypothetical $J$-category polytomous item satisfies the monotonicity condition that

$$ P[Y \ge k \mid \theta] \;=\; \sum_{j=k}^{J} R_j(\theta) \quad \text{is non-decreasing in } \theta \tag{40} $$

for all $k = 1, \ldots, J$.


This model for the forced-choice response variable $Y$ will be called the associated cumulative model (ACM) for the unfolding model. The condition (40) asserts that the ACM is a nonparametric graded response model (np-GRM); cf. Section 2.2. The monotonicity condition in (40) need not hold for all unfolding models, but when it does, the true Thurstone score is comonotone with $\theta$:

Lemma 4.1 (Johnson, 2001). If the associated cumulative model (ACM) for the unfolding model with IRF's $P_j(\theta)$, $j = 1, \ldots, J$, satisfies (40), then for any ordered scoring scheme $a$, the true Thurstone score $T_J^a(\theta)$ will be non-decreasing in $\theta$.

The proof of this lemma is straightforward and is omitted for brevity. Post (1992) also considers nonparametric direct-response unfolding models, with the added assumption that the IRF's $P_j(\theta)$ satisfy her monotone traceline ratio (MTR) condition:

$$ \delta_j > \delta_k \;\text{ implies }\; \frac{P_j(\theta)}{P_k(\theta)} \;\text{ is nondecreasing in } \theta. \tag{41} $$

Johnson (2001) shows that any nonparametric direct-response unfolding model satisfying the MTR condition in (41) has a monotone ACM (in fact, the ACM will be an np-PCM; see Section 2.2); thus in Post's nonparametric framework, $T_J^a(\theta)$ will be monotone in $\theta$, for any ordered scoring scheme $a$.
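A small numerical illustration may help fix ideas. The Python sketch below is not from Johnson (2001); it simply constructs unimodal, Gaussian-shaped IRF's with a common width (an arbitrary choice that satisfies the MTR condition (41), because the log of the ratio $P_j(\theta)/P_k(\theta)$ is then linear in $\theta$), and verifies on a grid of $\theta$ values that the ACM probabilities $P[Y \ge k \mid \theta]$ in (40) and the true Thurstone score $T_J^a(\theta)$ in (38) are non-decreasing, as Lemma 4.1 predicts. The item locations, maximal endorsement probabilities, and width are illustrative values only.

    import numpy as np

    # Unimodal IRFs P_j(theta) = c_j * exp(-(theta - delta_j)^2 / (2 s^2));
    # with a common width s these satisfy the MTR condition (41).
    delta = np.array([-2.0, -0.5, 0.5, 2.0])   # item locations (illustrative)
    c     = np.array([0.9, 0.6, 0.8, 0.7])     # maximal endorsement probabilities
    s     = 1.0
    theta = np.linspace(-4, 4, 401)

    P = c * np.exp(-(theta[:, None] - delta) ** 2 / (2 * s ** 2))   # grid x items

    # Forced-choice probabilities R_j(theta) as in (39) and ACM curves as in (40)
    R = P / P.sum(axis=1, keepdims=True)
    acm = np.cumsum(R[:, ::-1], axis=1)[:, ::-1]   # P[Y >= k | theta], k = 1..J

    # True Thurstone score (38) with the ordered scoring scheme a_j = delta_j
    a = delta
    T = (P * a).sum(axis=1) / P.sum(axis=1)

    print("ACM curves non-decreasing:", np.all(np.diff(acm, axis=0) >= -1e-12))
    print("True Thurstone score non-decreasing:", np.all(np.diff(T) >= -1e-12))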

Building on Lemma 4.1, Johnson (2001) establishes an ordinal consistency result for the Thurstone score $T_J^a(\mathbf{X})$ that is analogous to equation (7).

Theorem 4.1 (Johnson, 2001). Suppose that the dichotomous item response variables $X_1, \ldots, X_J$ follow a nonparametric direct-response unfolding model satisfying the monotonicity condition in (40), and suppose that the true Thurstone score $T_J^a(\theta)$ with ordered scoring scheme $a$ satisfies the LAD condition in equation (6). Then the Thurstone score $T_J^a(\mathbf{X})$ is ordinally consistent for $\theta$, that is, there exist monotone transformations $t_J(x)$ such that

$$ \lim_{J \to \infty} \left| t_J\!\left( T_J^a(\mathbf{X}) \right) - \theta \right| \;=\; 0 \quad \text{in probability.} $$


This result, whose proof is similar to Stout's (1990) proof of (7), suggests that, by analogy with (8), we explore the unimodality of $P_j(\theta)$ by examining the kernel-smoothed regression function

$$ \hat{P}_j(\theta) \;=\; \frac{\displaystyle\sum_{i=1}^{N} X_{ij}\, K\!\left( \frac{\theta - t_J(T_J^a(\mathbf{X}_i))}{h} \right)}{\displaystyle\sum_{i=1}^{N} K\!\left( \frac{\theta - t_J(T_J^a(\mathbf{X}_i))}{h} \right)}, \tag{42} $$

where once again $K(t)$ is a unimodal, symmetric kernel function. Johnson (2001) shows that including $X_j$ in the estimator $T_J^a(\mathbf{X})$ can obscure some features of the IRF's. So, as in the discussion of Ramsay's (1991) nonparametric regression estimates of IRF's in the monotone unidimensional IRT case (equation 8), we consider the "rest-Thurstone scores" $T^{a}_{J(-j)}(\mathbf{X})$, defined in the same way as in (37) but omitting $X_j$ from the numerator and denominator, for estimating each $P_j(\theta)$. For the ordered scoring scheme we adopt van Schuur's (1988) proposal $a_j = \hat{r}_j / J$, with the item ranks $\hat{r}_j$ estimated as outlined in Johnson (2001, Chapter 5; see also Post, 1992).

Muhlberger (1999). From the item location parameter estimates ^j obtained there, we concluded that the items could be ranked in the order 1, 2, 3, 4, 5, 6, 8, 7, in increasing order of developmental maturity, though there was considerable uncertainty about the relative ranks of adjacent items in this ordering. This is the same rank order as that obtained by looking at the matrix of sample

= 1jXk = 1℄; these ranks are obtained by ranking the entries of the column of this matrix corresponding to the lowest value of P [Xj = 1jXk = 1℄ (cf. Johnson, 2001,

conditional probabilities Pb [Xj

b

Chapter 5, for details).

Applying the kernel estimator in (42) with the rest-Thurstone scores TJa

(

j)

(X ) and the esti-

mated ranks 1, 2, 3, 4, 5, 6, 8, 7, we obtain the nonparametric regression estimates of the unfolding IRF’s shown in Figure 9. These have been overlaid on the 2PSL estimates from Section 4.1 for comparison.

Monotonicity and conditional independence

42

2

4

−2

0

2

0.0 0.2 0.4 0.6 0.8 1.0

4

−4

−2

0

Item 5

Item 6

0

2

4

Pj(θ) −4

−2

0

θ

θ

Item 7

Item 8

0 θ

2

4

2

4

2

4

2

4

0.0 0.2 0.4 0.6 0.8 1.0

Item 4 0.0 0.2 0.4 0.6 0.8 1.0

θ

Pj(θ) −4

−2

θ

Pj(θ) −2

Pj(θ) −4

θ

0.0 0.2 0.4 0.6 0.8 1.0

−4

0.0 0.2 0.4 0.6 0.8 1.0

Pj(θ) 0

Item 3

−4

−2

0 θ

0.0 0.2 0.4 0.6 0.8 1.0

Pj(θ) Pj(θ)

−2

0.0 0.2 0.4 0.6 0.8 1.0

−4

Pj(θ)

Item 2

0.0 0.2 0.4 0.6 0.8 1.0

Item 1

−4

−2

0

2

4

θ

Figure 9: Non-parametric curves of the eight political motivation items from Muhlberger (1999).

There appears to be substantial disagreement between the non-parametric item response function estimates [solid lines] in Figure 9 and the parametric 2PSL estimates [dashed lines]. Examining items 4, 5, and 6, we find that the 2PSL over-estimates the maximal endorsement probability and is not wide enough, relative to the corresponding nonparametric estimate. Recall that the item unit parameter $\gamma_j$ is a single parameter that controls both the height and width of the IRF. It may be that, in capturing approximately the width of the true IRF, the unit parameter $\gamma_j$ is forcing the maximal endorsement probability to be over-estimated. This suggests that an unfolding model with separate parameters measuring the height and width of the response function may be needed to fit this data. One example of such a model is the generalized graded unfolding model (Roberts et al., 2000). The nonparametric IRF estimates for the other five items even seem to argue against a unimodal unfolding model for this data, though a careful analysis would have to include both (a) confidence envelopes for the curves; and (b) a more careful consideration of the weights $a_j = \hat{r}_j / J$ used in the Thurstone scores on which the nonparametric regression estimates are based.


This analysis illustrates a real strength of nonparametric methods in this context: comparing nonparametric estimates of the IRF’s with parametric ones can suggest defects in the parametric estimates, leading to a better choice of a parametric model for future analyses. A more formal approach to assessing the fits of parametric IRT models, by comparing with nonparametric fits, is developed by Douglas and Cohen (2001).

5 Summary

Discrete multivariate response data is ubiquitous in education, psychology and social science. Often, the nature of the dependence between the discrete variables, and the desires of researchers to summarize these data with fewer variables, leads to latent variable models for the multivariate discrete data. In Section 2 we reviewed one of the most successful modeling frameworks for this kind of problem, item response theory (IRT). The basic assumptions of IRT are that the discrete response variables become independent when we condition on a unidimensional, continuous latent subject variable, and that the probability of a positive response on any one of the variables is monotone increasing as a function of the latent variable. These models are the monotone unidimensional IRT models and can be approached from both a nonparametric and a parametric point of view. We also briefly considered some approaches, due primarily to Stout (1990) and Ramsay (1991), that allow the conditional independence and monotonicity assumptions of this basic IRT modeling framework to be weakened.

In the remainder of this paper we have considered two examples that show both the flexibility of this basic framework and the ability of the framework to be generalized to handle novel situations. In Section 3 we considered generalizations of the basic model in which the continuous unidimensional subject variable was replaced with a multivariate vector of dichotomous variables indicating presence or absence of specific skills. Two such generalizations were considered, and compared using various monotonicity properties that aid in the interpretation of the models. In Section 4 the monotone relationship between positive response and the latent variable was replaced


with a unimodal or proximity relationship, so that the probability of positive response increased as the “distance” between the subject and the item decreased. A well-known relationship between these so-called “unfolding” models and monotone unidimensional IRT models (e.g. Andrich and Luo, 1996) was reviewed and illustrated. In addition, a second way of relating unfolding models to monotone unidimensional IRT models (Johnson, 2001) was considered, and used to motivate a nonparametric approach to assessing the unimodality assumption in these models.

6 References

Algina, J. (1992). Special issue: the National Assessment of Educational Progress (Editor's Note). Journal of Educational Measurement, 29, 93–94.

Andrich, D. (1988). The application of an unfolding model of the PIRT type for the measurement of attitude. Applied Psychological Measurement, 12, 33–51.

Andrich, D. (1996). A hyperbolic cosine latent trait model for unfolding polytomous responses: Reconciling Thurstone and Likert methodologies. British Journal of Mathematical and Statistical Psychology, 49, 347–365.

Andrich, D. and Luo, G. (1993). A hyperbolic cosine latent trait model for unfolding dichotomous single-stimulus responses. Applied Psychological Measurement, 17, 253–276.

Bartolucci, F., and Forcina, A. (in press). A likelihood ratio test for MTP2 within binary variables. Annals of Statistics, in press.

Béguin, A. A., and Glas, C. A. W. (1999). MCMC estimation of multidimensional IRT models. (Research Report 98-14). Department of Educational Measurement and Data Analysis, University of Twente, the Netherlands. [Online]. Available: http://to-www.edte.utwente.nl/TO/omd/report98/rr9814.htm. Accessed 28 April 2000.

Bock, R. D., and Zimowski, M. F. (1997). Multi-group IRT. In W. J. Van der Linden, and R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433–448). New York: Springer Verlag.

Coombs, C. H. (1964). A theory of data. New York: Wiley.

DeSarbo, W. S. and Hoffman, D. L. (1986). Simple and weighted unfolding threshold models for the spatial representation of binary choice data. Applied Psychological Measurement, 10, 247–264.

DiBello, L. V., Stout, W. F. & Roussos, L. A. (1995). Unified cognitive/psychometric diagnostic assessment likelihood-based classification techniques. Chapter 15, pp. 361–389, in Nichols, P. D., Chipman, S. F. & Brennan, R. L. (eds.) (1995). Cognitively diagnostic assessment. Hillsdale, NJ: Erlbaum.

Doignon, J.-P. & Falmagne, J.-C. (1999). Knowledge spaces. New York: Springer-Verlag.

Douglas, J. and Cohen, A. (2001). Nonparametric ICC estimation to assess fit of parametric models. Applied Psychological Measurement, in press.

Embretson, S. E. (1985). Multicomponent latent trait models for test design. pp. 195–218 in Embretson, S. E. (ed.) (1985). Test design: developments in psychology and psychometrics. New York: Academic Press.

Embretson, S. E. (1997). Multicomponent response models. In W. J. Van der Linden and R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 305–321). New York: Springer Verlag.

Fischer, G. H. (1973). Linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–374.

Fischer, G. H., and Molenaar, I. W. (eds.) (1995). Rasch models: foundations, recent developments, and applications. New York: Springer-Verlag.

Fox, G. J. A., and Glas, C. A. W. (1998). A multi-level IRT model with measurement error in the predictor variables. (Research Report 98-16). Department of Educational Measurement and Data Analysis, University of Twente, the Netherlands. [Online]. Available: http://to-www.edte.utwente.nl/TO/omd/report98/report98.htm. Accessed 28 April 2000.

Gibbons, R. D., and Hedeker, D. R. (1997). Random effects probit and logistic regression models for three-level data. Biometrics, 53, 1527–1537.

Grayson, D. A. (1988). Two-group classification in latent trait theory: scores with monotone likelihood ratio. Psychometrika, 53, 383–392.

Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 301–321.

Heckerman, D. (1996). A tutorial on learning with Bayesian networks. Microsoft Research tech. report, MSR-TR-95-06. Obtained November 2000 from ftp://ftp.research.microsoft.com/pub/tr/TR-95-06.PS

Hemker, B. T., Sijtsma, K., Molenaar, I. W., and Junker, B. W. (1997). Stochastic ordering using the latent trait and the sum score in polytomous IRT models. Psychometrika, 62, 331–347.

Hoijtink, H. (1990). A latent trait model for dichotomous choice data. Psychometrika, 55, 641–656.

Holland, P. W., and Rosenbaum, P. R. (1986). Conditional association and unidimensionality in monotone latent trait models. Annals of Statistics, 14, 1523–1543.

Huynh, H. (1994). A new proof for monotone likelihood ratio for the sum of independent Bernoulli random variables. Psychometrika, 59, 77–79.

Janssen, R., and De Boeck, P. (1997). Psychometric modeling of componentially designed synonym tasks. Applied Psychological Measurement, 21, 37–50.

Johnson, M. S. (2001). Unfolding response models. Draft doctoral dissertation. Pittsburgh, PA: Department of Statistics, Carnegie Mellon University.


Junker, B. W. (1991). Essential independence and likelihood-based ability estimation for polytomous items. Psychometrika, 56, 255–278.

Junker, B. W. (1993). Conditional association, essential independence and monotone unidimensional item response models. Annals of Statistics, 21, 1359–1378.

Junker, B. W. (2001). On the interplay between nonparametric and parametric IRT, with some thoughts about the future. Invited paper to appear in Boomsma, A., Van Duijn, M. A. J. and Snijders, T. A. B. (Eds.) (2001). Essays on item response theory. New York: Springer-Verlag.

Junker, B. W. and Ellis, J. L. (1997). A characterization of monotone unidimensional latent variable models. Annals of Statistics, 25, 1327–1343.

Junker, B. W. and Sijtsma, K. (2000). Latent and manifest monotonicity in item response models. Applied Psychological Measurement, 24, 65–81.

Junker, B. W. and Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric IRT. Applied Psychological Measurement, in press.

Lee, Y., and Nelder, J. A. (1996). Hierarchical generalized linear models. Journal of the Royal Statistical Society, Series B, 58, 619–678.

Loevinger, J. (1948). The technique of homogeneous tests compared with some aspects of "scale analysis" and factor analysis. Psychological Bulletin, 45, 507–530.

Lord, F. M. (1952). A theory of test scores. Psychometric Monographs, No. 7.

Macready, G. B. & Dayton, C. M. (1977). The use of probabilistic models in the assessment of mastery. Journal of Educational Statistics, 2, 99–120.

Maris, E. (1995). Psychometric latent response models. Psychometrika, 60, 523–547.

Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64, 187–212.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.


McCullagh, P., and Nelder, J. A. (1989). Generalized linear models (2nd edition). New York: Chapman and Hall.

Mislevy, R. J. (1985). Estimation of latent group effects. Journal of the American Statistical Association, 80, 993–997.

Mislevy, R. J. (1996). Test theory reconceived. Journal of Educational Measurement, 33, 379–416.

Mokken, R. J. (1971). A theory and procedure of scale analysis. The Hague: Mouton.

Molenaar, I. W. (1997). Nonparametric methods for polytomous responses. In W. J. Van der Linden, and R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 369–380). New York: Springer Verlag.

Muhlberger, P. (1999). A general unfolding, non-folding scaling model and algorithm. Presented at the 1999 American Political Science Association Annual Meeting, Atlanta, GA.

Patz, R. J. and Junker, B. W. (1999). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24, 342–366.

Post, W. J. (1992). Nonparametric unfolding models: a latent structure approach. Leiden, Netherlands: DSWO Press.

Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56, 611–630.

Ramsay, J. O. (1995). A similarity-based smoothing approach to nondimensional item analysis. Psychometrika, 60, 323–339.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. (Expanded edition, 1980.) Chicago, IL: University of Chicago Press.

Roberts, J. S., Donoghue, J. R. and Laughlin, J. E. (2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24, 3–32.

Samejima, F. (1997). Graded response model. Chapter 5, pp. 85–100, in Van der Linden, W. J., and Hambleton, R. K. (eds.) (1997). Handbook of modern item response theory. New York: Springer Verlag.

Scheiblechner, H. (1972). Das Lernen und Lösen komplexer Denkaufgaben. [The learning and solving of complex reasoning items.] Zeitschrift für Experimentelle und Angewandte Psychologie, 3, 456–506.

Sijtsma, K., and Hemker, B. T. (1998). Nonparametric polytomous IRT models for invariant item ordering, with results for parametric models. Psychometrika, 63, 183–200.

Sijtsma, K. & Verweij, A. (1999). Knowledge of solution strategies and IRT modeling of items for transitive reasoning. Applied Psychological Measurement, 23, 55–68.

Spiegelhalter, D. J., Thomas, A., Best, N. G., & Gilks, W. R. (1997). BUGS: Bayesian inference Using Gibbs Sampling, Version 0.6. Cambridge: MRC Biostatistics Unit.

Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.

Stout, W. F., Habing, B., Douglas, J., Kim, H. R., Roussos, L., and Zhang, J. (1996). Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20, 331–354.

Tanner, M. A. (1996). Tools for statistical inference: methods for the exploration of posterior distributions and likelihood functions (3rd edition). New York: Springer-Verlag.

Tatsuoka, K. K. (1995). Architecture of knowledge structures and cognitive diagnosis: a statistical pattern recognition and classification approach. In P. D. Nichols, S. F. Chipman, and R. L. Brennan (Eds.), Cognitively diagnostic assessment (pp. 327–359). Hillsdale, NJ: Lawrence Erlbaum Associates.


Thurstone, L. L. (1928). Attitudes can be measured. American Journal of Sociology, 33, 529–554.

Tutz, G. (1997). Sequential models for ordered responses. Chapter 8, pp. 139–152, in Van der Linden, W. J., and Hambleton, R. K. (eds.) (1997). Handbook of modern item response theory. New York: Springer Verlag.

Van der Linden, W. J., and Hambleton, R. K. (eds.) (1997). Handbook of modern item response theory. New York: Springer Verlag.

van Schuur, W. H. (1988). Stochastic unfolding. Chapter 9, pp. 137–157, in Saris, W. E. and Gallhofer, I. N. (Eds.) (1988). Sociometric Research, Volume 1. London: MacMillan.

Verhelst, N. D. and Verstralen, H. H. F. M. (1993). A stochastic unfolding model derived from the partial credit model. Kwantitatieve Methoden, 42, 73–92.

Verweij, A., Sijtsma, K. & Koops, W. (1999). An ordinal scale for transitive reasoning by means of a deductive strategy. International Journal of Behavioral Development, 23, 241–264.

Zwick, R. (1992). Special issue on the National Assessment of Educational Progress. Journal of Educational Measurement, 17, 93–94.
