Detecting Answer Copying Using Alternate Test ... - Wiley Online Library

Journal of Educational Measurement Summer 2008, Vol. 45, No. 2, pp. 99–117

Detecting Answer Copying Using Alternate Test Forms and Seat Locations in Small-Scale Examinations

L. Andries van der Ark, Wilco H. M. Emons, and Klaas Sijtsma
Tilburg University

Two types of answer-copying statistics for detecting copiers in small-scale examinations are proposed. One statistic identifies the “copier-source” pair, and the other in addition suggests who is copier and who is source. Both types of statistics can be used when the examination has alternate test forms. A simulation study shows that the statistics do not depend on the total-test score. Another simulation study compares the statistics with two known statistics, and shows that they have substantial power. The new statistics are applied to data from a small-scale examination (N = 230) with two alternate test forms. Auxiliary information on the seat location of the examinees and the test scores of the examinees was used to determine whether or not examinees could be suspected.

Answer copying is a serious problem in high-stakes examinations. Stern and Havlicek (1986, cited in Cizek, 1999, p. 25) reported that 71% of a sample of undergraduate students at a large Midwestern university admitted to having engaged in answer copying at least once during their college career. A similar percentage admitted to having permitted another examinee to look at their answer sheet; that is, to being an active source to an answer copier. Answer copying is a great threat to the validity of the scores, both for the educational organization and for society. Only when a copier is caught red-handed does the testing authority have compelling evidence for taking appropriate measures against him/her and, possibly, the source. However, only a few copiers and sources are caught. Thus, additional tools for identifying answer copying are badly needed.

This study proposes new statistics for identifying answer copying in small-scale examinations. Answer copying involves at least two examinees, the copier and the source. One statistic identifies the “copier-source” pair, and the other in addition suggests who is copier and who is source. Both statistics have a simpler version and a statistically more involved version. It is assumed unknown whether two examinees sitting next to one another have copied. Thus, the approach studied here treats answer-copying statistics as screening devices to explore the sample for potential cheaters. It does this by taking seat locations of potential copier and source into consideration, and by using two different versions of the test. Only examinees sitting next to one another are considered potential copier and source. Half of these pairs were administered the same test and the other half were administered different test versions. Our approach excludes other forms of cheating and ignores the possibility that examinees sitting apart (even in different rooms) might have had telephone contact.

This paper is organized as follows.
First, features of the sample and the examination are described. Second, the new answer-copying statistics are proposed. Our statistical approach follows earlier work (Angoff, 1974; Frary & Tideman, 1997; Frary, Tideman, & Watts, 1977; Holland, 1996; Sotaridona & Meijer, 2002, 2003; Wollack, 1997, 2004) but is tailored to the present application. This means that the answer-copying statistics: (a) are designed for relatively small samples whereas previous answer-copying statistics require large samples for accurate estimation of the examinees’ latent trait values (e.g., Wollack, 1997) or separate distributions for each test score (Angoff, 1974); and (b) allow the use of different test versions containing both common and unique items. Third, simulation studies were conducted to investigate: (a) possible dependence of the statistics on the test score, and (b) the detection rate of the answer-copying statistics. Fourth, alternative explanations for obtaining high answer-copying values were sought using auxiliary information such as seat location and test score.

Copyright © 2008 by the National Council on Measurement in Education

Description of the Sample and the Examination

Sample
The sample consisted of 231 psychology students from Tilburg University, who took an examination in introductory test theory and psychodiagnostics. None of them were caught in the act of cheating. One examinee missed the last item and was excluded from the analyses. The remaining 230 examinees completed the examination.

Location
The examination took place in rooms A, C, and Q. Three rooms were necessary to have empty chairs between, in front of, and behind each examinee. Figure 1 shows the map of room C. Examinees were identified by their seat numbers (e.g., examinee C.K12 was seated in room C on seat K12). Because the location of each examinee was known, his/her neighbors were also known. These neighbors were potential sources to a potential copier and vice versa. Two test versions were used. These test versions were distributed such that in a “square” of four neighbors, examinees at diagonally opposing corners received the same versions (dashed lines in Figure 1) and examinees in other pairs received different versions (solid lines).
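The distribution rule described above (same version on the diagonals of a square of four neighbors, different versions otherwise) amounts to a checkerboard pattern. The following is a minimal sketch, assuming seats can be indexed by integer row and column in a rectangular grid; the actual rooms were arena-shaped, and the function names `assign_version` and `same_version` are hypothetical.

```python
def assign_version(row: int, col: int) -> int:
    """Return test version 1 or 2 for a seat in a hypothetical rectangular grid.

    Assumption: a checkerboard layout, so that diagonal neighbors share a
    version while horizontal and vertical neighbors receive different ones.
    """
    return 1 if (row + col) % 2 == 0 else 2


def same_version(seat_a: tuple, seat_b: tuple) -> bool:
    """Check whether two (row, col) seats were given the same test version."""
    return assign_version(*seat_a) == assign_version(*seat_b)


if __name__ == "__main__":
    # Print a small 4 x 6 block of seats to show the pattern.
    for row in range(4):
        print(" ".join(str(assign_version(row, col)) for col in range(6)))
```

Under this layout, half of the neighbor pairs in each square share a version and half do not, which matches the design described in the text.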
Instrument
Examinees did not know that two versions were used. The test versions consisted of 24 four-choice items with one correct answer option each. Each test version had 16 unique items and eight common items. Each unique item in Test Version 1 looked physically similar to a unique item in Test Version 2 (cf. Kvam, 1996). Thus, a glance at one’s neighbor’s answer sheet would not reveal that these items were not exactly the same. For example, in Test Version 1, one unique item required computing Cronbach’s alpha from a given variance-covariance matrix and in Test Version 2 the corresponding unique item had one different entry in this matrix. Both items had the same four answer options, but in one version option b was correct and in the other version option c was correct. Thus, only very close reading would reveal this difference. It was not expected that examinees would do this given that they did not know that there were two test versions.

FIGURE 1. Schematic map of arena-shaped room C (top of figure corresponds to back of room). The three bounded areas are the seats. Numbered seats (e.g., A07 = Row A, Seat 07) were occupied by examinees. Dashed lines indicate that two neighbors had the same version and solid lines indicate that they had different versions.

For eight items, it was impossible to prepare unique pairs because that would unavoidably result in items that looked different or items that looked the same but posed different challenges to the examinees. Hence, there were eight common items.

Answer-Copying Statistics

Given N = 230, there are $\frac{1}{2}N(N-1) = 26{,}335$ different examinee pairs in total. Each examinee pair is denoted $(v, w)$. Examinee v’s score on item j is denoted as $X_{vj}$: $X_{vj} = 1$ if the correct option was chosen, and $X_{vj} = 0$ otherwise. The scores of examinee v and examinee w on item j constitute their pair-score. This pair-score is denoted $X_{vw,j}$, and it has realizations $x_{vw,j} \in \{(0,0), (0,1), (1,0), (1,1)\}$. The test score is defined as $X_+ = \sum_j X_j$; for examinee v it is denoted as $X_{+v}$.

It was assumed that examinees v and w can only be copier and source with respect to item j if they (a) were neighbors and (b) chose the same option. Let $m_{vw,j}$ be an indicator variable: $m_{vw,j} = 1$ if the examinees chose the same option to item j (called a match), and $m_{vw,j} = 0$ otherwise. Whether or not a match can be considered to be representative of answer copying further depends on the values of the pair-score. Three cases of suspicious pair-scores can be distinguished:

1. Same option, incorrect answer: Both examinees chose the same incorrect option [i.e., $m_{vw,j} = 1$ and $X_{vw,j} = (0, 0)$]. This can occur when item j is either common or unique. Examinees v and w may have been cheating but there is no information on who is copier and who is source.
2. Same option, different answers: Both examinees chose the same option (i.e., $m_{vw,j} = 1$), but for one examinee this option was correct and for the other incorrect [i.e., $X_{vw,j} = (1, 0)$ or $X_{vw,j} = (0, 1)$]. This can occur only when item j is unique. The examinee having item j correct may be source and the other may be copier.

3. Same option, correct answer: Both examinees chose the same correct option [i.e., $m_{vw,j} = 1$ and $X_{vw,j} = (1, 1)$]. This can occur only when item j is common. This result is not suspicious when examinees chose the correct option because they knew the answer. It may be suspicious if, based on the answers to all other items that are easier than item j, it is unlikely that one of the examinees would have answered item j correctly.

The first two cases are used to define two normed answer-copying statistics, and all three cases are used to define two normed answer-copying statistics based on item difficulty ordering.

Normed Answer-Copying Statistics

One normed answer-copying statistic singles out suspicious examinee pairs, and the other in addition suggests who is source and who is copier. Both statistics are based on a count of an examinee pair’s suspicious pair-scores (i.e., either same option, incorrect answer or same option, different answers). The statistics are explained using an example of three examinees answering five three-choice items in two different test versions (Figure 2).

Statistic $\tau_1$. The first statistic, denoted $\tau_1$, is a normed count of the suspicious pair-scores. The raw count of suspicious pair-scores is denoted $T_1$. Examinees q and r in Figure 2 (middle table) have three matching options (for items 1, 2, and 3). The pair-scores on items 2 and 3 are not suspicious because q and r both gave correct answers (same option, correct answer), but the pair-score on item 1 is suspicious because q and r both gave incorrect answers (same option, incorrect answer); consequently $T_{1qr} = 1$. By definition this count is the same for examinees q and r, so that $T_{1qr} = T_{1rq} = 1$. Similarly, examinee pair (q, s) has three suspicious pair-scores (i.e., $T_{1qs} = T_{1sq} = 3$), and examinee pair (r, s) has one suspicious pair-score (i.e., $T_{1rs} = T_{1sr} = 1$). For an arbitrary examinee pair (v, w) the raw count of the number of suspicious pair-scores is given by

$$T_{1vw} = \sum_{j=1}^{J} \left[ m_{vw,j} \times (1 - X_{vj} \times X_{wj}) \right]. \tag{1}$$
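As an illustration, Equation (1) can be applied to the three examinees of Figure 2. This is only a sketch: the function name `raw_count_t1` and the dictionaries holding the example data are hypothetical names introduced here, and the option strings are transcribed from the figure.

```python
# Figure 2 example data: answer keys per test version and chosen options.
KEYS = {"I": "ccaaa", "II": "ccbbb"}
VERSION = {"q": "I", "r": "I", "s": "II"}
OPTIONS = {"q": "acaca", "r": "acaab", "s": "ccaca"}


def item_scores(examinee: str) -> list:
    """Score each item 1 (correct) or 0 (incorrect) against the examinee's own key."""
    key = KEYS[VERSION[examinee]]
    return [1 if chosen == correct else 0
            for chosen, correct in zip(OPTIONS[examinee], key)]


def raw_count_t1(v: str, w: str) -> int:
    """Raw count T1 of Equation (1): matches that are not correct for both examinees."""
    total = 0
    for ov, ow, xv, xw in zip(OPTIONS[v], OPTIONS[w], item_scores(v), item_scores(w)):
        match = 1 if ov == ow else 0      # m_{vw,j}
        total += match * (1 - xv * xw)    # drop matches where both answers are correct
    return total


if __name__ == "__main__":
    for pair in [("q", "r"), ("q", "s"), ("r", "s")]:
        print(pair, raw_count_t1(*pair))
```

The counts agree with the values quoted in the text for the pairs (q, r), (q, s), and (r, s), and the function is symmetric in its two arguments.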

The Appendix shows that $T_{1vw}$ has an ambiguous interpretation because its maximum value is a function of the number of common and unique items in the test and the test scores, $X_{+v}$ and $X_{+w}$. To solve this problem, $T_{1vw}$ is transformed into the normed answer-copying statistic $\tau_{1vw}$: the difference is taken between $T_{1vw}$ and its expected value given no answer copying, denoted by $E(T_{1vw})$, and this difference is compared to the difference between the maximum value of $T_{1vw}$, denoted by $T^{\max}_{1vw}$, and $E(T_{1vw})$:

$$\tau_{1vw} = \frac{T_{1vw} - E(T_{1vw})}{T^{\max}_{1vw} - E(T_{1vw})}. \tag{2}$$

It may be noted that $\tau_{1vw} = 1$ if $T_{1vw} = T^{\max}_{1vw}$, and that $\tau_{1vw} = 0$ if $T_{1vw} = E(T_{1vw})$.

Answer key
Version   Correct options
          Item 1  Item 2  Item 3  Item 4  Item 5
I         c       c       a       a       a
II        c       c       b       b       b

Data
                    Options (Items 1-5)    Item scores (Items 1-5)
Examinee  Version   1  2  3  4  5          1  2  3  4  5
q         I         a  c  a  c  a          0  1  1  0  1
r         I         a  c  a  a  b          0  1  1  1  0
s         II        c  c  a  c  a          1  1  0  0  0

Raw counts
T1                         T2
T1qr = 1+0+0+0+0 = 1       T2qr = 1+0+0+0+0 = 1
T1rq = 1+0+0+0+0 = 1       T2rq = 1+0+0+0+0 = 1
T1qs = 0+0+1+1+1 = 3       T2qs = 0+0+0+1+0 = 1
T1sq = 0+0+1+1+1 = 3       T2sq = 0+0+2+1+2 = 5
T1rs = 0+0+1+0+0 = 1       T2rs = 0+0+0+0+0 = 0
T1sr = 0+0+1+0+0 = 1       T2sr = 0+0+2+0+0 = 2

FIGURE 2. Example of answer key (upper table), data (middle table), and raw counts T1 and T2 (lower table).

In the Appendix, it is shown how $T^{\max}_{1vw}$ can be computed and how $E(T_{1vw})$ can be
approximated without much computational effort. If examinee pair (v, w) did not copy, then $\tau_{1vw}$ is expected to be approximately zero. The higher $\tau_{1vw}$, the more suspicious the examinee pair.

Statistic $\tau_2$. Unlike $\tau_1$, statistic $\tau_2$ distinguishes between copier and source by assigning different weights to suspicious pair-scores. Raw count $T_{2vw}$ assigns positive weight to a suspicious pair-score if examinee v is copier and weight zero if w is copier. Analogously, $T_{2wv}$ assigns positive weight to a suspicious pair-score if w is copier and weight zero if v is copier. For example, examinees q and s in Figure 2 have four matching options (for items 2, 3, 4, and 5). Pair-score $X_{qs,2}$ is not suspicious because both examinees gave the correct answer (same option, correct answer). The remaining three pair-scores are suspicious. Pair-score $X_{qs,3} = (1, 0)$ shows that examinee q answered Item 3 correctly and s answered Item 3 incorrectly (same option, different answers). If copying occurred here, the most likely scenario is that examinee q knew the answer and felt no need to copy, whereas examinee s did not know the answer and copied option a assuming it had the same content as examinee q’s option a. Because it is suspicious and informative, this pair-score counts double (weight 2) when computing $T_{2sq}$, but when computing $T_{2qs}$ this pair-score does not count (weight 0) because q is not suspected of copying. The same scenario is assumed for the suspicious pair-score $X_{qs,5} = (1, 0)$. Pair-score $X_{qs,4} = (0, 0)$ also is suspicious but uninformative on who is copier and who is source (same option, incorrect answer). Because it is suspicious but uninformative, this pair-score has weight 1 when computing both $T_{2sq}$ and $T_{2qs}$. Similarly, the weighted counts for examinee pairs (q, r) and (r, s) can be computed from Figure 2. For an arbitrary examinee pair (v, w), raw count $T_{2vw}$ is defined as

$$T_{2vw} = \sum_{j=1}^{J} m_{vw,j} \times \left[ 2X_{wj} \times (1 - X_{vj}) + (1 - X_{vj}) \times (1 - X_{wj}) \right]. \tag{3}$$
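Equation (3) can be checked against the worked example in the same way. The sketch below uses the options and item scores from the middle table of Figure 2; the function name `raw_count_t2` and the two dictionaries are hypothetical names introduced for illustration. The first argument plays the role of the suspected copier v and the second the role of the suspected source w.

```python
# Chosen options and item scores for the three examinees of Figure 2.
OPTIONS = {"q": "acaca", "r": "acaab", "s": "ccaca"}
SCORES = {"q": [0, 1, 1, 0, 1], "r": [0, 1, 1, 1, 0], "s": [1, 1, 0, 0, 0]}


def raw_count_t2(v: str, w: str) -> int:
    """Raw count T2 of Equation (3) with v as suspected copier and w as suspected source."""
    total = 0
    for ov, ow, xv, xw in zip(OPTIONS[v], OPTIONS[w], SCORES[v], SCORES[w]):
        match = 1 if ov == ow else 0
        informative = 2 * xw * (1 - xv)        # weight 2: w correct, v incorrect
        uninformative = (1 - xv) * (1 - xw)    # weight 1: both incorrect
        total += match * (informative + uninformative)
    return total


if __name__ == "__main__":
    for v, w in [("q", "r"), ("r", "q"), ("q", "s"), ("s", "q"), ("r", "s"), ("s", "r")]:
        print(f"T2 with copier {v} and source {w}: {raw_count_t2(v, w)}")
```

Unlike the first raw count, this one is not symmetric: swapping the two examinees changes the result whenever a same option, different answers pair-score is present.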

If two examinees were administered the same test version, so that same option, different answers is impossible, all weights equal 1, and $T_{1vw} = T_{1wv} = T_{2vw} = T_{2wv}$. The same problems as encountered with $T_{1vw}$ give rise to defining the normed answer-copying statistic $\tau_{2vw}$:

$$\tau_{2vw} = \frac{T_{2vw} - E(T_{2vw})}{T^{\max}_{2vw} - E(T_{2vw})}. \tag{4}$$

The Appendix describes the computation of $T^{\max}_{2vw}$ and the approximation of $E(T_{2vw})$. It may be noted that $\tau_{2vw} = 1$ if $T_{2vw} = T^{\max}_{2vw}$, and $\tau_{2vw} = 0$ if $T_{2vw} = E(T_{2vw})$. If examinee pair (v, w) did not copy, it is expected that $\tau_{2vw} \approx 0$. High values of $\tau_{2vw}$ indicate that examinee v copied from source w. It may be noted that $\tau_{2vw}$ need not be equal to $\tau_{2wv}$.

Distribution of the answer-copying statistics. For the interpretation of answer-copying statistics, the distributions of $\tau_1$ and $\tau_2$ under the null hypothesis of no answer copying were estimated and cutoff scores were determined at the xth (e.g., 95th or 99th) percentile. Values of $\tau_{1vw}$ or $\tau_{2vw}$ exceeding the cutoff score suggest that examinee pair (v, w) is suspicious (cf. Wollack, 1997). The null distributions can be estimated by means of the empirical distributions of $\tau_{1vw}$ and $\tau_{2vw}$ for nonneighboring examinees. Two notes are in order. First, distributions must be estimated for four types of examinee pairs: a distribution when both members of a pair were administered Test Version 1 (Type 11); both were administered Test Version 2 (Type 22); the potential copier was administered Test Version 1 and the potential source Test Version 2 (Type 12); and vice versa (Type 21). Second, Angoff (1974) and Sotaridona and Meijer (2002) argued that separate distributions should be estimated for different combinations of test scores, $X_{+v}$ and $X_{+w}$. This is infeasible for smaller sample sizes such as our N = 230. However, this problem was avoided because the normed answer-copying statistics $\tau_1$ and $\tau_2$ do not depend on the test score. This was verified using both real and simulated data.
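The norming in Equations (2) and (4) and the percentile-based cutoff can be sketched as follows. The expected value and the maximum are taken as given inputs because their computation is described in the Appendix. The nearest-rank percentile rule used here is an assumption, since the text does not state which percentile definition was applied, and all function names are hypothetical.

```python
import math


def normed_statistic(raw: float, expected: float, maximum: float) -> float:
    """Normed statistic of Equations (2) and (4): (T - E(T)) / (T_max - E(T)).

    Assumption: maximum differs from expected; the degenerate case is not handled.
    """
    return (raw - expected) / (maximum - expected)


def empirical_cutoff(null_values: list, percentile: int) -> float:
    """Nearest-rank percentile of an empirical null distribution (assumed rule)."""
    ordered = sorted(null_values)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def is_suspicious(value: float, cutoff: float) -> bool:
    """Flag a pair whose statistic exceeds the cutoff score."""
    return value > cutoff


if __name__ == "__main__":
    # Hypothetical null values, e.g. statistics of non-neighboring pairs of one type.
    null_values = [0.01 * k for k in range(-20, 80)]
    cutoff = empirical_cutoff(null_values, 95)
    print(cutoff, is_suspicious(normed_statistic(3, 1, 3), cutoff))
```

In an application, one such cutoff would be needed for each of the four examinee-pair types (11, 12, 21, and 22).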

Normed Answer-Copying Statistics Based on Item Ordering

For statistics similar to $\tau_1$ and $\tau_2$, Sotaridona and Meijer (2003) showed that incorporating information on matching correct options (i.e., same option, correct answer) while taking the item difficulties into account increased the power of the statistics. Such information is ignored by statistics $\tau_1$ and $\tau_2$ but statistics $\tau_1^*$ and $\tau_2^*$ to be defined next take this information into account.

Consider same option, correct answer pair-scores. One could argue that the degree to which examinee v’s correct answer to item j is suspicious depends on his/her scores on the easier items. If several of the items easier than item j were answered incorrectly, then a correct answer to item j could be suspicious: Examinee v might have copied the option number from examinee w; else there is no reason to suspect examinee v. Items are ordered and numbered by decreasing proportions of correct answers (p-values), such that $p_1 > p_2 > \cdots > p_J$. Then it is counted how often examinee v failed an item i (i = 1, . . . , j − 1) that was easier than the item under consideration (e.g., item j). This count is denoted as $G_{vj}$ (G stands for Guttman error; see, e.g., Sijtsma & Molenaar, 2002, p. 53). This counting is repeated for each item for which the examinee pair (v, w) produced a same option, correct answer. Noting that $G_{v1}$ cannot be computed, the total number of Guttman errors for examinee v is equal to

$$G_v = \sum_{j=2}^{J} G_{vj}. \tag{5}$$
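The counting rule behind Equation (5) can be sketched as follows, assuming the items have already been sorted from easiest to hardest and that options and item scores are available for both examinees. The function name `guttman_errors` and the small data set in the usage example are hypothetical.

```python
def guttman_errors(options_v, options_w, scores_v, scores_w) -> int:
    """Total Guttman-error count G_v of Equation (5) for suspected copier v.

    Assumption: positions 0..J-1 are ordered from the easiest to the hardest item.
    For every item on which v and w chose the same option and both were correct,
    the number of easier items failed by v is added to the total.
    """
    total = 0
    failed_easier = 0  # number of easier items that v answered incorrectly so far
    for j in range(len(scores_v)):
        same_option_correct = (
            options_v[j] == options_w[j] and scores_v[j] == 1 and scores_w[j] == 1
        )
        if j >= 1 and same_option_correct:  # the count is not defined for the easiest item
            total += failed_easier
        if scores_v[j] == 0:
            failed_easier += 1
    return total


if __name__ == "__main__":
    # Hypothetical example: v fails items 2 and 3 but matches w on the harder items 4 and 5.
    options_v = ["a", "b", "b", "c", "d"]
    options_w = ["a", "a", "a", "c", "d"]
    scores_v = [1, 0, 0, 1, 1]
    scores_w = [1, 1, 1, 1, 1]
    print(guttman_errors(options_v, options_w, scores_v, scores_w))
```

In the usage example each of the two hard matched items contributes the two easier failures of v, whereas the same call with the roles of v and w swapped gives zero because w failed no items.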

The maximum value of $G_v$ depends on test score $X_{+v}$ and the number of common items. Thus, statistic G is normed against its maximum, which means rescaling G to the interval from 0 to 1 (see Appendix). Statistic $G_v$ is incorporated in $\tau_1$ and $\tau_2$ such that $\tau_1^*$ and $\tau_2^*$ are obtained as follows,

$$\tau^*_{1vw} = \tau_{1vw} + \frac{1}{2}\left(\frac{G_v}{G_v^{\max}}\right); \tag{6}$$

$$\tau^*_{2vw} = \tau_{2vw} + \frac{1}{2}\left(\frac{G_v}{G_v^{\max}}\right). \tag{7}$$

Simulations showed that the weighted sum of the two pieces of information (cf. Sotaridona & Meijer, 2003) using weight 1 for either statistic $\tau_{1vw}$ or $\tau_{2vw}$ and weight $\frac{1}{2}$ for the normed Guttman errors produced the highest detection rates. For examinee pairs of which the members were administered the same test version, it was shown already that $\tau_{1vw} = \tau_{2vw}$. It follows from equations (6) and (7) that in this case $\tau^*_{1vw} = \tau^*_{2vw}$. It may be noted that $\tau^*_{1vw}$ and $\tau^*_{2vw}$ can exceed 1. This is no problem when decisions are based on cutoff scores under empirical sampling distributions, as will be done later on.
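Equations (6) and (7) share one form, which the following sketch makes explicit. The maximum Guttman count is passed in as an argument because its computation is given in the Appendix; returning the unadjusted statistic when that maximum is zero is an assumption made here to avoid a division by zero, and the function name `tau_star` is hypothetical.

```python
def tau_star(tau: float, guttman_count: float, guttman_max: float,
             weight: float = 0.5) -> float:
    """Combine a normed statistic with the normed Guttman count (Equations 6 and 7).

    Assumption: if guttman_max is 0 the Guttman term is dropped.
    """
    if guttman_max == 0:
        return tau
    return tau + weight * (guttman_count / guttman_max)


if __name__ == "__main__":
    print(tau_star(0.25, 2, 4))   # moderate statistic, half of the maximum Guttman count
    print(tau_star(1.0, 4, 4))    # both components at their maximum
```

The second call illustrates the remark in the text that the combined statistics can exceed 1.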

Investigating Total-Score Dependence of the Answer-Copying Statistics

To investigate whether the proposed answer-copying statistics depend on the total score, the sample was broken up into the four quartiles defined by total score $X_+$. Twenty-three of the 117 examinees that were administered Test Version 1 belonged to the first quartile of the total score ($0 \le X_+ \le 9$), another 23 to the second quartile ($10 \le X_+ \le 11$), 35 to the third quartile ($12 \le X_+ \le 14$), and 36 to the fourth quartile ($15 \le X_+ \le 24$). Twenty-four of the 113 examinees that were administered Test Version 2 belonged to the first quartile of the total score ($0 \le X_+ \le 9$), 37 to the second quartile ($10 \le X_+ \le 12$), 25 to the third quartile ($13 \le X_+ \le 14$), and 27 to the fourth quartile ($15 \le X_+ \le 24$). Table 1 shows for each quartile and each type of examinee pair the means and standard deviations of the four normed statistics ($\tau_1$, $\tau_2$, $\tau_1^*$, and $\tau_2^*$) and two unnormed statistics ($T_1$ and $T_2$). A monotone trend across quartiles of the means or the standard deviations of a statistic indicates dependence on the total score. For example, for examinee-pair Type 22, the means of $\tau_1$ across quartiles equaled .155, .160, .170, and .143; this is not a monotone trend. For the normed statistics, trends were absent or weakly decreasing in the means. Standard deviations were similar across all quartiles.

TABLE 1
Means (M) and Standard Deviations (SD) of τ1, τ2, τ1*, τ2*, T1, and T2 for Four Quartiles of Test Scores

                 τ1             τ2             τ1*            τ2*            T1              T2
Type  Quartile   M      SD      M      SD      M      SD      M      SD      M      SD       M      SD
11    1          .173   .146    .173   .146    .208   .156    .208   .156    5.758  1.758    5.758  1.758
11    2          .157   .143    .157   .143    .209   .153    .209   .153    4.177  1.519    4.177  1.519
11    3          .149   .149    .149   .149    .203   .153    .203   .153    2.993  1.332    2.993  1.332
11    4          .133   .199    .133   .199    .206   .200    .206   .200    1.553  1.258    1.553  1.258
12    1          .077   .116    .077   .146    .105   .118    .111   .146    6.595  2.105    6.780  2.752
12    2          .103   .135    .103   .122    .137   .141    .108   .126    5.946  1.904    6.078  2.397
12    3          .050   .141    .050   .134    .093   .147    .071   .136    4.471  1.715    4.516  2.418
12    4          −.005  .132    −.005  .184    .058   .143    .067   .193    2.773  1.516    2.878  1.963
21    1          .077   .116    .065   .141    .110   .119    .098   .148    6.595  2.105    6.411  2.796
21    2          .103   .135    .074   .119    .139   .141    .110   .128    5.946  1.904    5.814  2.339
21    3          .050   .141    .042   .119    .106   .152    .098   .138    4.471  1.715    4.425  2.024
21    4          −.005  .132    −.008  .170    .071   .142    .068   .175    2.773  1.516    2.668  1.948
22    1          .155   .139    .155   .139    .203   .150    .203   .150    5.525  1.590    5.525  1.590
22    2          .160   .149    .160   .149    .218   .154    .218   .154    3.946  1.589    3.946  1.589
22    3          .170   .164    .170   .164    .249   .164    .249   .164    2.953  1.409    2.953  1.409
22    4          .143   .185    .143   .185    .239   .193    .239   .193    1.416  .949     1.416  .949

Note. Type = examinee pair type, with levels: both examinees were administered Version 1 (11), the alleged copier was administered Version 1 and the alleged source was administered Version 2 (12), the alleged copier was administered Version 2 and the alleged source was administered Version 1 (21), and both examinees were administered Version 2 (22).

For the unnormed statistics the means showed a clearly decreasing trend
(differences up to three standard deviations), and the standard deviations showed a weakly decreasing trend. These results were double-checked using simulated data that were free from answer copying. The results were similar (not tabulated). This study provides evidence that the normed statistics do not depend on $X_+$ but the unnormed counts $T_1$ and $T_2$ decrease as $X_+$ increases.

Detection Rates

The detection rates of the four $\tau$ statistics were investigated by means of a simulation study, and compared with the detection rates of statistic $\bar{K}_2$ (Sotaridona & Meijer, 2003), statistic ω (Wollack, 1997) and an upper benchmark, denoted UB, which was based on statistic ω. $\bar{K}_2$ considers the number of matches on items incorrectly answered by both source and copier. If $\bar{K}_2$ is high, there is evidence of answer copying. The sampling distribution of $\bar{K}_2$ uses the distribution of matches conditional on sum score $X_+$. For small samples and long tests, $\bar{K}_2$ may be inaccurate and unstable because of too few observations at each $X_+$ level. Sotaridona and Meijer (2003) showed that ω had higher detection rates than other answer-copying statistics. Statistic ω is based on the likelihood of the number of matches with the potential source, given the copier’s estimated ability and the estimated item parameters under the nominal response model (Bock, 1972). In our analyses, we used small samples, N = 120, to estimate six nonredundant parameters per item. As a result, we found extreme-valued and unstable item parameter estimates in several samples, causing unstable ability estimates. Using unstable ability estimates in the computation of ω may affect its detection rate and, as a result, ω may not be a good upper benchmark. Therefore, we included a theoretical upper benchmark, UB, which is defined as ω in which the true item parameters are inserted that were used to generate the data. Statistic ω uses the estimated item parameters. Statistic UB is truly an upper benchmark, because it is the best performing answer-copying statistic under theoretical conditions.

Method

Detection rates were investigated in simulated data that had the same characteristics as the real data. That is, two test versions each had 24 four-choice items with one correct option each. Sixteen items were unique and eight were common. The sample size was N = 240 instead of N = 230; 120 simulees instead of 117 were administered Test Version 1 and 120 simulees instead of 113 simulees, Test Version 2. These slightly modified frequencies were necessary for making convenient design choices.

Null model of no answer copying. The nominal response model describes responses to multiple-choice items with one correct option, assuming that there was no copying. The nominal response model was fitted to the empirical data of Test Version 1 and Test Version 2 using concurrent calibration to locate the items on a common metric. Estimation was done using the computer program MULTILOG (Thissen, Chen, & Bock, 2003) assuming a standard normal distribution for the latent trait θ. The sets of parameter estimates for the two test versions were used to simulate data sets.
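To make the null model concrete, the sketch below draws one response to a four-choice item under the nominal response model, in which the probability of option k is proportional to exp(a_k θ + c_k). The slope and intercept values are hypothetical placeholders, not the MULTILOG estimates used in the study, and the function names are introduced here for illustration only.

```python
import math
import random


def nrm_probabilities(theta: float, slopes: list, intercepts: list) -> list:
    """Option probabilities under the nominal response model (softmax of a_k*theta + c_k)."""
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    peak = max(logits)                        # subtract the maximum for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_response(theta: float, slopes: list, intercepts: list,
                      rng: random.Random) -> int:
    """Draw the index of the chosen option for one simulee on one item."""
    probs = nrm_probabilities(theta, slopes, intercepts)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical item: option 0 is the correct one. Slopes and intercepts each sum
# to zero, leaving six nonredundant parameters for a four-choice item.
SLOPES = [1.2, -0.4, -0.4, -0.4]
INTERCEPTS = [0.5, 0.0, -0.2, -0.3]

if __name__ == "__main__":
    rng = random.Random(2008)
    theta = rng.gauss(0.0, 1.0)               # latent trait from a standard normal distribution
    print(theta, simulate_response(theta, SLOPES, INTERCEPTS, rng))
```

Repeating the draw over 24 items and 120 simulees per test version would give one simulated data set of the size described above.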


For simulee-pairs Type 11, 12, 21, and 22, the detection rates of each of the statistics were determined as follows:

1. For each test version, 120 item-score vectors were generated under the nominal response model (N = 240).
2. Wollack (1997) and Sotaridona and Meijer (2002) assumed that a copier has lower ability than a source. In each sample, 240 θs were generated at random from a standard normal distribution. For each θ, an item-score vector (J = 24) was generated under the nominal response model. Consider the first 12 pairs of θs; in each pair the smaller θ was designated $\theta_c$ (copier) and the larger θ was designated $\theta_s$ (source). To mimic answer copying, a set of items was selected, and the copier’s chosen options on these items were replaced by the source’s chosen options on the corresponding items. Thus, 12 item-score vectors out of 240 (i.e., 5%) were partly the result of answer copying.
3. Each statistic had four empirical null distributions: one for simulee-pair Type 11, one for Type 22, one for Type 12, and one for Type 21. It was assumed that 120 simulees were in one room and the other 120 simulees in another room. In each room, 60 simulees were administered Test Version 1 and the other 60 simulees Test Version 2. As an example, $\tau_1$ for simulee-pair Type 11 was computed for each simulee pair who were in different rooms. This produced 60 × 60 = 3,600 values of $\tau_1$. These values constituted the empirical null distribution of $\tau_1$ for Type 11. Other empirical null distributions were determined analogously for each combination of a statistic and a simulee-pair type.

Initial simulations showed that the empirical Type I error rates were higher than the nominal level (maximum difference was .024). These inflated Type I error rates result from simulating answer copying under the condition that $\theta_c < \theta_s$ (i.e., the copiers are not a random subset from the general population).
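The copying step in item 2 of the procedure above can be sketched as follows. The function name `simulate_copying`, its parameters, and the restriction to a candidate item set are hypothetical names introduced for illustration; the sketch only overwrites the copier's chosen options and leaves rescoring against the copier's own answer key to the caller.

```python
import random


def simulate_copying(copier_options, source_options, n_copied, rng,
                     candidate_items=None):
    """Replace the copier's options on randomly selected items by the source's options.

    candidate_items optionally restricts copying to a subset of item indices,
    for example the indices of the most difficult items (an assumed interface).
    Returns the altered option vector and the sorted list of copied item indices.
    """
    if candidate_items is None:
        candidate_items = range(len(copier_options))
    copied_items = rng.sample(list(candidate_items), n_copied)
    altered = list(copier_options)
    for j in copied_items:
        altered[j] = source_options[j]
    return altered, sorted(copied_items)


if __name__ == "__main__":
    rng = random.Random(45)
    copier = ["a"] * 24   # placeholder option vectors for a 24-item test
    source = ["b"] * 24
    altered, copied = simulate_copying(copier, source, 8, rng)
    print(copied, altered)
```

Passing the indices of the 12 most difficult items as `candidate_items` would restrict copying to difficult items, whereas the default draws the copied items from all 24.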
To keep the Type I error rates close to the nominal level, we used as the reference distributions the distributions of statistics obtained between simulee pairs of which the members were in different rooms and for which the suspected copier had a lower $X_+$ than the source (in practical applications of the statistics, the latent trait is not estimated).

Independent variables. The following independent variables were included in a completely crossed design:

1. Answer-copying statistics. Included were $\tau_1$, $\tau_2$, $\tau_1^*$, $\tau_2^*$, UB, ω, and $\bar{K}_2$.
2. Simulee-pair type had four levels, denoted 11, 12, 21, and 22.
3. Type I error rate (α) had two levels: α = .01 (cutoff at the 99th percentile of the null distribution) and α = .05 (cutoff at the 95th percentile).
4. Number of copied answers had four levels: either 0 (i.e., the null model), 5, 8, or 12 answers were copied.
5. Item set had two levels: All indicates that the copied items were a random sample from the 24 items and difficult indicates that the copied items were a random sample from the 12 most difficult items.

Dependent variable. The dependent variable was the detection rate: this is the fraction of the simulee pairs that were simulated under the alternative model of
answer copying with an answer-copying value greater than the 95th percentile (α = .05) or 99th percentile (α = .01) of the empirical null distribution. Detection rates were obtained in 1,500 replications.

Results and Discussion

Tables 2 and 3 show the detection rates when copying could occur on any of the items (level all) and only on the 12 most difficult items (level difficult), respectively. The standard errors of the detection rates were small (range = [.002, .018]; not tabulated). Because for identical test versions $\tau_1 = \tau_2$ and $\tau_1^* = \tau_2^*$, these statistics yielded identical detection rates. Table 2 also shows the empirical Type I error rate for each statistic (rows with $J_{cop} = 0$). For the $\tau$ statistics, they were close to the nominal level. For statistics UB, ω, and $\bar{K}_2$, Type I error rates tended to be smaller than the nominal level. The number of items copied had the strongest effect on the detection rates. For the $\tau$ statistics, detection rates ranged from .223 to .482 (α = .05) and from .061 to .208 (α = .01) when five items were copied; and from .772 to .999 (α = .05) and .417 to .963 (α = .01) when 12 items were copied. Among the $\tau$ statistics, $\tau_1^*$ had the highest detection rates, followed by $\tau_2^*$, $\tau_1$, and $\tau_2$, respectively, but differences were small.

TABLE 2
Detection Rates for Seven Answer Copying Statistics by Type I Error Rate (α), Type of Examinee Pair, and Number of Copied Answers (J_cop), When Copying Could Occur on Any Item

        α = .05                                            α = .01
J_cop   τ1     τ2     τ1*    τ2*    UB     ω      K̄2      τ1     τ2     τ1*    τ2*    UB     ω      K̄2
Copier 1, Source 1
0       .051   .051   .046   .046   .036   .032   .017    .010   .010   .010   .010   .006   .004   .002
5       .223   .223   .229   .229   .271   .246   .138    .061   .061   .062   .062   .088   .073   .031
8       .438   .438   .450   .450   .553   .504   .325    .158   .158   .156   .156   .265   .234   .107
12      .772   .772   .784   .784   .863   .827   .652    .420   .420   .417   .417   .638   .584   .356
Copier 1, Source 2
0       .051   .058   .049   .055   .050   .049   .012    .011   .014   .010   .013   .010   .011   .001
5       .392   .358   .420   .393   .483   .446   .100    .146   .128   .158   .145   .222   .191   .019
8       .772   .697   .812   .748   .863   .818   .215    .439   .370   .485   .420   .608   .537   .060
12      .988   .965   .994   .983   .998   .993   .461    .893   .805   .911   .859   .974   .942   .195
Copier 2, Source 1
0       .051   .058   .049   .055   .051   .050   .013    .011   .013   .010   .012   .010   .011   .001
5       .439   .445   .480   .482   .534   .488   .087    .175   .184   .198   .208   .261   .226   .016
8       .822   .782   .871   .832   .909   .867   .204    .519   .488   .585   .551   .696   .614   .051
12      .996   .980   .999   .990   1.000  .999   .433    .935   .887   .963   .936   .991   .976   .172
Copier 2, Source 2
0       .052   .052   .048   .048   .035   .030   .020    .011   .011   .010   .010   .006   .005   .003
5       .224   .224   .227   .227   .274   .252   .139    .062   .062   .061   .061   .088   .080   .034
8       .442   .442   .453   .453   .563   .519   .322    .156   .156   .160   .160   .269   .237   .110
12      .776   .776   .795   .795   .862   .839   .641    .418   .418   .420   .420   .643   .599   .344

TABLE 3
Detection Rates for Seven Answer Copying Statistics by Type I Error Rate (α), Type of Examinee Pair, and Number of Copied Answers (J_cop), When Copying Could Occur on Difficult Items

        α = .05                                            α = .01
J_cop   τ1     τ2     τ1*    τ2*    UB     ω      K̄2      τ1     τ2     τ1*    τ2*    UB     ω      K̄2
Copier 1, Source 1
5       .305   .305   .320   .320   .315   .292   .194    .096   .096   .095   .095   .104   .091   .048
8       .601   .601   .653   .653   .644   .588   .477    .266   .266   .280   .280   .337   .294   .187
12      .911   .911   .943   .943   .919   .879   .836    .681   .681   .739   .739   .760   .702   .610
Copier 1, Source 2
5       .379   .304   .412   .338   .494   .471   .140    .146   .108   .159   .128   .223   .200   .029
8       .746   .593   .809   .670   .879   .854   .361    .424   .300   .465   .355   .634   .596   .120
12      .994   .930   .998   .968   .999   .999   .731    .899   .717   .931   .793   .984   .974   .410
Copier 2, Source 1
5       .411   .382   .446   .418   .508   .491   .116    .157   .149   .178   .167   .236   .208   .023
8       .797   .727   .856   .786   .898   .874   .276    .480   .411   .536   .476   .667   .613   .075
12      .998   .980   1.000  .990   .999   .998   .621    .956   .869   .974   .909   .989   .981   .284
Copier 2, Source 2
5       .292   .292   .312   .312   .294   .266   .182    .088   .088   .092   .092   .094   .080   .044
8       .584   .584   .634   .634   .607   .569   .447    .242   .242   .265   .265   .305   .259   .165
12      .916   .916   .954   .954   .901   .856   .830    .647   .647   .705   .705   .714   .653   .559

The highest detection rates were
obtained for τ*1 when α = .05, and when the two simulees were administered different versions and copied all twelve difficult items. The magnitude of the detection rates indicates that the statistics can detect a substantial number of copiers, especially those who copied many items. K̄2 had the lowest detection rates, and thus performed worse in small samples than the τ statistics. UB had the highest detection rates in most conditions, but the differences with the detection rates of ω and the τ statistics were small. The detection rates of τ*1 were comparable to those of ω: in some conditions ω had detection rates that were a little higher; in other conditions τ*1 performed better. Thus, given that the detection rates for our τ statistics were comparable to those for ω, and not substantially smaller than those for UB, the important conclusion is that the τ statistics are feasible alternatives in small-sample applications.

Compared with copying on any item, when copying could only be done on the twelve most difficult items (Table 3), detection rates were lower for simulee pairs of Types 11 and 22 than for simulee pairs of Types 12 and 21. Further, when copying could be done on any of the 24 items (Table 2), there were only small differences in detection rate among the different types of simulee pairs.

Empirical Data Analysis

The empirical study served three goals. First, the degree of answer copying in the sample was estimated. Second, explanations other than cheating were sought for high answer-copying values. Third, possible copiers and sources were identified.

Frequencies of Suspicious Examinee Pairs

The distributions of statistics τ1, τ2, τ*1, and τ*2 were estimated under the null model of no copying by means of the distribution of the statistics of all examinee pairs in which the members were in different rooms. Table 4 shows the proportions of examinee pairs having answer-copying values exceeding the cutoff score. For examinee pairs in different rooms, this proportion by definition equals .05 for α = .05 and .01 for α = .01. In real data, these proportions may differ from these values because several examinee pairs may have a value equal to the cutoff score. For pairs of non-neighboring examinees in the same room the proportion was a little higher when α = .05, but for pairs of neighboring examinees this proportion was considerably higher and even doubled when α = .01 (given 497 pairs of neighboring examinees, five pairs were expected to have values beyond the cutoff score, but 12 examinee pairs were found). Most suspicious examinee pairs were in Room C.

TABLE 4
Proportions of Examinee Pairs With Answer-Copying Value Greater Than Cutoff Score

                                          α = .05                        α = .01
Location           No. of Pairs    τ1     τ2     τ*1    τ*2       τ1     τ2     τ*1    τ*2
Different Rooms          16,561   .050   .048   .051   .052      .010   .010   .011   .011
Same room/NN              9,277   .053   .051   .055   .056      .010   .011   .011   .013
Neighbors                   497   .060   .063   .067   .061      .024   .021   .020   .018
Neighbors A                  84   .048   .048   .060   .048      .024   .012   .012   .012
Neighbors C                 230   .065   .067   .072   .072      .030   .028   .024   .020
Neighbors Q                 183   .060   .068   .055   .055      .016   .016   .019   .019

Note. Same room/NN = examinee pairs in which both members were not neighbors but were present in the same room.

Studying Extreme Examinee Pairs

For each examinee-pair type, the 20 most extreme examinee pairs (having the highest answer-copying values) were studied individually. It was found that most of the extreme values were obtained by non-neighboring examinees, and thus cannot be due to answer copying. For high values of the statistics, three explanations are given which should be considered when interpreting these values. Numerical results illustrating these explanations are given in Table 5.

(1) Explanation A: High test score. Answer-copying values may be high due to chance when one of the examinees in a pair has a high test score. For example, examinee C.A11 (Table 5) failed only the four most difficult items; thus, X+ = 20. For examinee pairs of Type 11 that included C.A11, T1max ≤ 4 (Appendix). If the other examinee had answers matching these four incorrectly answered items, then τ1 = τ2 = 1; this was estimated to happen with a probability of .01. Hence, values of τ1 = 1 may be due to high test scores rather than copying, and τ1 still was high if the other examinee had matching answers on three instead of four of the most difficult items. This conclusion holds for the majority of extreme answer-copying values in all examinee-pair types.

(2) Explanation B: Copying from the direct neighbor. When examinee C.A07 was considered copier and her neighbor examinee C.A09 was considered source, their statistics were high but they were lower when roles were reversed (Table 5). Examinee C.A07 had X+ = 4 but would have had X+ = 17 if she had answered the same test version as C.A09. This provided evidence that C.A07 copied from C.A09. Examinee Q.C22 (X+ = 12) had high values with her neighbors Q.C20 and Q.B22 and not with the other neighbors. She would have had X+ = 16 if she had taken the other test version. Q.C22 had matching options on 22 out of 24 items, the first half mostly with Q.B22 and the second half mostly with Q.C20. This provided evidence that Q.C22 copied from two neighbors, whereas there was no evidence of answer copying for either Q.C20 or Q.B22.

(3) Explanation C: Other examinee is similar to source. A copier has a high answer-copying value with a source due to copying (e.g., copier C.A09 and source C.A07) but also has a high value with a third examinee (a non-source, e.g., Q.C05) who took the same test version as the source and produced an answer pattern similar to that of the source. Source C.A07 and non-source Q.C05 both have high test scores on Test Version 2 and have high answer-copying values because of the resemblance between their item-score vectors.

In most of the other examinee pairs of Type 11 and 22 with high answer-copying values, one of the examinees had a low test score. Because such examinee pairs occurred most frequently in the sample, it seems natural that they also appear most frequently among the examinee pairs with high answer-copying values.

TABLE 5
Numerical Illustrations of Explanations for High Values on Answer-Copying Statistics, 1% Cutoff Score Between Parentheses

Copier   Source   Type   Neighbors   τ1           τ2           τ*1          τ*2

Exp. A
Q.Q09    C.A11    11     No          1.00 (.62)   1.00 (.62)   1.16 (.69)   1.16 (.69)
Q.I11    C.A11    11     No          1.00 (.62)   1.00 (.62)   1.10 (.69)   1.10 (.69)
Q.L03    C.A11    11     No          1.00 (.62)   1.00 (.62)   1.05 (.69)   1.05 (.69)
C.G18    C.A11    11     No          1.00 (.62)   1.00 (.62)   1.03 (.69)   1.03 (.69)

Exp. B
C.A09    C.A07    21     Yes         1.00 (.41)    .86 (.40)   1.16 (.49)   1.02 (.50)
C.A07    C.A09    12     Yes         1.00 (.41)    .44 (.40)   1.03 (.48)    .47 (.46)

Exp. B
Q.C22    Q.C20    21     Yes          .51 (.41)    .64 (.40)    .66 (.49)    .79 (.50)
Q.C20    Q.C22    12     Yes          .51 (.41)    .01 (.40)    .56 (.48)    .06 (.46)
Q.C22    Q.B22    21     Yes          .42 (.41)    .44 (.40)    .53 (.49)    .55 (.50)
Q.B22    Q.C22    12     Yes          .42 (.41)    .09 (.40)    .52 (.48)    .19 (.46)
Q.C20    Q.B22    11     Yes          .27 (.62)    .27 (.62)    .37 (.69)    .37 (.69)

Exp. B and C
C.A09    C.A07    21     Yes         1.00 (.41)    .86 (.40)   1.16 (.49)   1.02 (.50)
C.A09    Q.C05    21     No           .64 (.41)    .65 (.40)    .72 (.49)    .72 (.50)
Q.C05    C.A07    11     No           .05 (.62)    .05 (.62)    .14 (.69)    .14 (.69)

Note. Type = examinee pair type; levels are described in the text. Exp. = explanation; levels are described in the text. Bold face indicates that the examinee had a high test score (X+ ≥ 19). Underlining indicates that the examinee would have had a substantially (at least two points) higher test score had (s)he answered the other test version.

Studying Neighboring Examinees

For 55 pairs of neighboring examinees (11%) at least one statistic exceeded the cutoff score at the 95th percentile. Seat location and test score were used to better understand these results. For alleged copier C.E20 and alleged source C.G18, τ1 = .42, τ2 = .40 (both exceeding the cutoff at α = .01), and τ*1 = .44 and τ*2 = .42 (both exceeding the cutoff at α = .05). Switching roles, τ1 = .42, τ2 = .29, and τ*1 = .54 (all exceeding the cutoff at α = .05), but τ*2 did not exceed the cutoff. These results suggest that C.E20 is copier and C.G18 is source, but because C.G18 sat behind C.E20 in a higher location (Figure 1), C.E20 could not look at his/her answer sheet. One could argue that C.G18 whispered the answers in C.E20's ear, but this is unlikely given that C.E20 had X+ = 20 and C.G18 had X+ = 8. Given this information, this examinee pair was not suspected of copying.

For examinee-pair Types 11 and 22, statistics τ*1 and τ*2 were used to identify copier and source. Examinees C.G18 (X+ = 8) and C.E18 (X+ = 20) were both administered Test Version 1, but τ1 = τ2 = .63 provided evidence of answer copying. Assuming C.E18 was copier and C.G18 was source, τ*1 and τ*2 did not exceed the cutoff score. Switching roles, τ*1 and τ*2 exceeded the cutoff score, thus providing evidence that C.E18 was source and C.G18 copier. This suggestion was corroborated by the seat locations (Figure 1) and the test scores.
Based on the statistics, the test scores, and the seat locations, five examinees in total were suspected of copying (among them C.A07 and Q.C22, discussed earlier).

Discussion

Most importantly, the proposed answer-copying statistics are normed counts of suspicious pair-scores rather than raw counts (e.g., Holland, 1996). Norming was done similarly to the norming underlying Cohen's (1960) kappa. This reduced the statistics' dependence on test scores. Thus, the distributions of the statistics need not be estimated for each test score separately. Also, using auxiliary information and the alternative explanations for high answer-copying values provided here, the new answer-copying statistics can identify potential copiers and sources in small-scale examinations. This approach was sought because traditional answer-copying detection requires large samples, either for determining distributions of statistics conditional on test scores, or for obtaining accurate estimates of model parameters in an IRT framework.

The statistics τ1 (identifying suspicious examinee pairs) and τ2 (identifying copier and source) are easy to interpret. Statistic τ2 cannot identify copier and source when two examinees were administered the same test version. Statistics τ*1 and τ*2 are more difficult to interpret because they are based on a mixture of two types of information and do not have a fixed upper bound. However, they have more power and may be used to identify copier and source when two examinees were administered the same test version. In real data, strong agreement was found among the four statistics. In each of the six examinee pairs we considered highly suspicious, all statistics agreed.

The detection rates of the four τ statistics were a little lower than the detection rate of the theoretical upper benchmark UB. Because UB uses all information available in the data, and because it uses the true item parameters of the nominal response model under which the data were generated, it is unlikely that any statistic will perform better than UB. Thus, we believe further improvement of the τ statistics is futile. This conjecture is confirmed by the similarity of the detection rates of the τ statistics and ω, which is the best answer-copying statistic according to Sotaridona and Meijer (2003). It may be noted that in our simulation studies, ω may have been given an unfair advantage because the same IRT model was assumed for both data generation and parameter estimation (which was needed to compute ω), whereas in practice the model that generates the data is unknown. The τ statistics, on the other hand, do not assume any model. This gives the τ statistics another advantage over ω. Computation of the τ statistics does not require additional software for IRT parameter estimation. Also, in small samples we encountered some computational problems in IRT parameter estimation, which required arbitrary decisions. For example, location parameter estimates were sometimes as extreme as −30 and had very large standard errors. An arbitrary decision to set the corresponding response probability equal to zero was needed to obtain the ability estimates required for the computation of ω.
The results showed that the answer-copying statistics identified more suspicious examinee pairs of Type 12 and 21 than suspicious examinee pairs of Type 11 and 22. This is partly due to the way suspicious pair scores are counted but also to the setup of the experiment. Direct neighbors of an examinee (right, left, front, back) were administered a different test version, so there was more opportunity to copy from a neighbor with a different test version than from a (more distant) neighbor with the same test version.

One of the reviewers noted that the nominal Type I error rates in this study (5% and 1%) can be unacceptably high if procedures for disciplining students caught cheating are complicated or the threat of the university being sued is serious. It is important to emphasize that statistical indices can never prove that an examinee has cheated, even if previous research shows that the Type I error rate is close to 0%. Moreover, if an examinee with a high answer-copying value was selected from a sample as being suspected, the answer-copying statistic cannot be used again as evidence that he or she has cheated (instead, independent evidence would be required). Therefore, answer-copying statistics are only useful (1) for detecting suspected examinees, (2) as circumstantial evidence, and (3) for gaining insight into central (group) tendencies of cheating. These considerations show that the choice of Type I error rate depends on the seriousness of the offence and its consequences for the parties involved. However, there is no clear-cut recipe for determining the exact Type I error rate.

Future research may concentrate on the use of external evidence to investigate the validity of the answer-copying statistics in practice. Possible examples are (nonthreatening) post-examination interviews and camera surveillance.
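The cutoff scores used in the empirical analysis were taken from the distribution of each statistic among examinee pairs seated in different rooms. The following Python sketch illustrates that idea under stated assumptions: the function names, the tie-handling rule (the smallest observed value whose exceedance proportion does not exceed α), and the toy data are hypothetical and are not the authors' R code.

```python
from bisect import bisect_right


def empirical_cutoff(null_values, alpha):
    """Smallest observed value c with P(null > c) <= alpha (assumed rule)."""
    vals = sorted(null_values)
    n = len(vals)
    if n == 0:
        raise ValueError("null_values must not be empty")
    for c in vals:
        # Number of null values strictly greater than c.
        n_above = n - bisect_right(vals, c)
        if n_above / n <= alpha:
            return c
    return vals[-1]  # not reached: the maximum always has zero exceedance


def proportion_exceeding(values, cutoff):
    """Proportion of pairs whose statistic lies strictly above the cutoff."""
    return sum(v > cutoff for v in values) / len(values)


# Toy null distribution (hypothetical values standing in for different-room pairs).
null = [i / 100 for i in range(100)]
cut = empirical_cutoff(null, alpha=0.05)
print(cut, proportion_exceeding(null, cut))

# With many ties at the cutoff, the realized proportion can fall below alpha.
tied = [0.0] * 90 + [1.0] * 10
cut_tied = empirical_cutoff(tied, alpha=0.05)
print(cut_tied, proportion_exceeding(tied, cut_tied))
```

The second toy example mirrors the remark that, in real data, the proportion of different-room pairs above the cutoff may deviate from the nominal α when several pairs have a value equal to the cutoff. Applying `proportion_exceeding` to the neighboring pairs with the different-room cutoff gives the kind of comparison reported in Table 4; for instance, with 497 neighboring pairs and α = .01, about 497 × .01 ≈ 5 pairs would be expected above the cutoff by chance.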

Appendix

This appendix shows how to compute $T^{\max}_{1vw}$ (equation (2)) and $T^{\max}_{2vw}$ (equation (4)), how to approximate $E(T_{1vw})$ (equation (2)) and $E(T_{2vw})$ (equation (4)), and provides an algorithm to compute $G^{\max}_{v}$ (equations (6) and (7)). For explanation and computer codes in the R language, we refer to Van der Ark, Emons, and Sijtsma (2007). Throughout, it is assumed that a test has $J$ items ($U$ unique items and $J - U$ common items). The test scores of examinee $v$ and examinee $w$ are denoted $X_{v+}$ and $X_{w+}$, respectively, with realizations $X_{v+} = L$ and $X_{w+} = K$.

Computation of $T^{\max}_{1vw}$ and $T^{\max}_{2vw}$. For the computation of $T^{\max}_{1vw}$, assume without loss of generality that $L \geq K$. $T^{\max}_{1vw}$ is the sum of three parts, $A$, $B$, and $C$, with

$$A = \min(U,\; L,\; J - K),$$

$$B = \min(U - A,\; J - L,\; K),$$

and

$$C = \min(J - A - B,\; J - B - L,\; J - A - K).$$

Hence, $T^{\max}_{1vw} = A + B + C$. For the computation of $T^{\max}_{2vw}$, there are no inequality constraints on $L$ and $K$. $T^{\max}_{2vw}$ is the weighted sum of two parts, $A^{*}$ and $B^{*}$, with

$$A^{*} = \min(U,\; K,\; J - L),$$

and

$$B^{*} = \min(J - K,\; J - A^{*} - L).$$

Then, $T^{\max}_{2vw} = 2A^{*} + B^{*}$.

Approximation of $E(T_{1vw})$ and $E(T_{2vw})$. For simplicity, we assume that the items are interchangeable and that they have $m$ response options of which 1 is correct and the other $m - 1$ are incorrect. This renders the solution an approximation of the real expected value but circumvents tedious calculations. The estimated probability that examinee $v$ chooses the correct option is $P_v = X_{v+}/J$. The estimated probability that examinee $v$ chooses a particular incorrect option is

$$Q_v = \frac{1 - P_v}{m - 1}.$$

The approximation of $E(T_{1vw})$ is the sum of two parts, $E(U)$ (the expected number of suspicious pair-scores on the unique items given that the item scores of $v$ and $w$ are independent) and $E(J - U)$ (the expected number of suspicious pair-scores on the common items given that the item scores of $v$ and $w$ are independent), with

$$E(U) = U \times \left[ P_v Q_w + Q_v P_w + (m - 2) Q_v Q_w \right],$$

and

$$E(J - U) = (J - U)(m - 1) Q_v Q_w.$$

Hence, $E(T_{1vw}) = E(U) + E(J - U)$. The approximation of $E(T_{2vw})$ follows the same logic. However, now

$$E(U) = U \left[ 2 \times Q_v P_w + (m - 2) Q_v Q_w \right],$$

and

$$E(J - U) = \frac{(J - U)}{2} (m - 1) Q_v Q_w.$$

Hence, $E(T_{2vw}) = E(U) + E(J - U)$.

Computation of $G^{\max}_{v}$. Let the items be ordered and numbered by increasing difficulty. The maximum value of $G_v$ depends on $X_{v+}$. For $X_{+} = K$ ($K = 0, \ldots, J$) there exists an item-score vector $\mathbf{X}_K$ (consisting of $K$ ones and $J - K$ zeroes) producing the maximum value of $G_v$ given that $X_{+} = K$. The algorithm for finding $\mathbf{X}_K$ proceeds as follows. First, for $K = 0$, the zero vector $\mathbf{X}_0 = \mathbf{0}$ yields $G^{\max} = 0$ for $X_{+} = 0$. Second, in a stepwise procedure, the 0s in $\mathbf{X}_0$ are replaced by 1s, one replacement per step. In Step 1, score 0 pertaining to the most difficult common item is replaced by score 1; the number of Guttman errors from the resulting vector, denoted $\mathbf{X}_1$, yields $G^{\max}$ for $X_{+} = 1$. In each next step, two cases are compared. Case 1: For the most difficult common item which still has score 0, score 0 is replaced by score 1, and the number of Guttman errors is computed. Case 2: For the most difficult unique item which still has score 0, score 0 is replaced by score 1, and the number of Guttman errors is computed. In both cases, the number of Guttman errors is computed under the condition that the scores on common items always match the item scores of a potential source. If in Step $K$ the number of Guttman errors in Case 1 is greater than the number of Guttman errors in Case 2, then $\mathbf{X}_K$ is the item-score vector that resulted from Case 1 and the replacement in Case 2 is cancelled; if the number of Guttman errors in Case 2 is greater than the number of Guttman errors in Case 1, then $\mathbf{X}_K$ is the item-score vector that resulted from Case 2 and the replacement in Case 1 is cancelled. These steps are repeated until we have a vector $\mathbf{X}_J$ in which all items have a score 1.

References

Angoff, W. H. (1974). The development of statistical indices for detecting cheaters. Journal of the American Statistical Association, 69, 44–49.

Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.

Cizek, G. J. (1999). Cheating on tests: How to do it, detect it, and prevent it. Mahwah, NJ: Erlbaum.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.

Frary, R. B., & Tideman, T. N. (1997). Comparison of two indices of answer copying and development of a spliced index. Educational and Psychological Measurement, 57, 20–32.

Frary, R. B., Tideman, T. N., & Watts, T. M. (1977). Indices of cheating on multiple-choice tests. Journal of Educational Statistics, 2, 235–256.

Holland, P. W. (1996). Assessing unusual agreement between the incorrect answers of two examinees using the K-index: Statistical theory and empirical support (ETS Technical Report No. 96-4). Princeton, NJ: Educational Testing Service.

Kvam, P. H. (1996). Using exam scores to estimate the prevalence of classroom cheating. The American Statistician, 50, 238–242.

Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Thousand Oaks, CA: Sage.

Sotaridona, L. S., & Meijer, R. R. (2002). Statistical properties of the K-index for detecting answer copying. Journal of Educational Measurement, 39, 115–132.

Sotaridona, L. S., & Meijer, R. R. (2003). Two new statistics to detect answer copying. Journal of Educational Measurement, 40, 53–69.

Stern, E. B., & Havlicek, L. (1986). Academic misconduct: Results of faculty and undergraduate student surveys. Journal of Allied Health, 15(2), 129–142.

Thissen, D., Chen, W.-H., & Bock, R. D. (2003). MULTILOG [Computer software]. Lincolnwood, IL: Scientific Software International.

Van der Ark, L. A., Emons, W. H. M., & Sijtsma, K. (2007). The computation of τ1, τ2, τ*1, and τ*2 (http://spitswww.uvt.nl/~avdrark/research/research.htm#OtherPublicationsMark) (Accessed December 5, 2007). Unpublished manuscript.

Wollack, J. A. (1997). A nominal response model approach for detecting answer copying. Applied Psychological Measurement, 21, 307–320.

Wollack, J. A. (2004). Detecting answer copying on high-stakes tests. The Bar Examiner, 73, 35–45.
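The Appendix formulas for the maxima and approximate expectations of T1 and T2 can be sketched in Python as follows. This is an illustrative transcription, not the authors' R code: the function names are hypothetical, setting U = 0 for a same-version pair is an assumption, the value m = 4 is made up for the example, and `kappa_style_norm` is only an assumed reading of the kappa-like norming mentioned in the Discussion (the defining equations of the τ statistics are not reproduced in this part of the article).

```python
def t1_max(J, U, L, K):
    """Maximum of T1 for test scores L and K (Appendix: A + B + C)."""
    if L < K:  # the Appendix assumes L >= K without loss of generality
        L, K = K, L
    A = min(U, L, J - K)
    B = min(U - A, J - L, K)
    C = min(J - A - B, J - B - L, J - A - K)
    return A + B + C


def t2_max(J, U, L, K):
    """Maximum of T2 for X_v+ = L and X_w+ = K (Appendix: 2A* + B*)."""
    a_star = min(U, K, J - L)
    b_star = min(J - K, J - a_star - L)
    return 2 * a_star + b_star


def _p_q(score, J, m):
    """P = estimated P(correct); Q = estimated P(a particular incorrect option)."""
    P = score / J
    return P, (1 - P) / (m - 1)


def expected_t1(J, U, m, L, K):
    """Approximate E(T1) under independence, items treated as interchangeable."""
    Pv, Qv = _p_q(L, J, m)
    Pw, Qw = _p_q(K, J, m)
    e_unique = U * (Pv * Qw + Qv * Pw + (m - 2) * Qv * Qw)
    e_common = (J - U) * (m - 1) * Qv * Qw
    return e_unique + e_common


def expected_t2(J, U, m, L, K):
    """Approximate E(T2); the common-item part carries the factor 1/2."""
    Pv, Qv = _p_q(L, J, m)
    Pw, Qw = _p_q(K, J, m)
    e_unique = U * (2 * Qv * Pw + (m - 2) * Qv * Qw)
    e_common = (J - U) / 2 * (m - 1) * Qv * Qw
    return e_unique + e_common


def kappa_style_norm(T, expected, t_max):
    """ASSUMPTION: kappa-like norming (T - E) / (Tmax - E); not taken from the source."""
    return (T - expected) / (t_max - expected)


# Same-version pair (U = 0 assumed), J = 24 items, scores 20 and 12,
# m = 4 response options (hypothetical value).
print(t1_max(24, 0, 20, 12))
print(expected_t1(24, 0, 4, 20, 12))
```

With J = 24, U = 0, and one test score equal to 20, `t1_max` cannot exceed 4, which agrees with the bound quoted for examinee C.A11 under Explanation A. Under the assumed norming, an observed count equal to the maximum gives a normed value of 1.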

Authors

L. ANDRIES VAN DER ARK is Associate Professor at the Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, 5000 LE, Tilburg, The Netherlands; [email protected]; http://www.tilburguniversity.nl/webwijs/show/?uid=a.vdark. His primary research interests include item response theory, latent class analysis, and missing data analysis.

WILCO H. M. EMONS is Assistant Professor at the Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, 5000 LE, Tilburg, The Netherlands; [email protected]; http://www.tilburguniversity.nl/webwijs/show/?uid=w.h.m.emons. His primary research interests include person-fit analysis and item response theory.

KLAAS SIJTSMA is Full Professor at the Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, 5000 LE, Tilburg, The Netherlands; [email protected]; http://www.tilburguniversity.nl/webwijs/show/?uid=k.sijtsma. His primary research interests include measurement of individual differences.
