Conducting Self-Adapted Testing Using MicroCAT

Linda L. Roos and Steven L. Wise University of Nebraska-Lincoln Michael E. Yoes Assessment Systems Corporation Thomas R. Rocklin University of Iowa

Correspondence regarding this article should be addressed to Linda L. Roos, 501 Building, Room 118, University of Nebraska, Lincoln, NE 68588-0203 (E-Mail: [email protected]).


Abstract

This article describes MCATL program code for using the MicroCAT 3.5 testing software to administer several types of self-adapted tests. Code is provided for (a) a basic self-adapted test, (b) a self-adapted version of an adaptive mastery test, and (c) a restricted self-adapted test.

Conducting Self-Adapted Testing Using MicroCAT

Computer-based testing is a growing area of educational and psychological measurement. During the past few decades, much attention has been paid to computer-based versions of conventional (i.e., linear) tests and computerized adaptive tests (CATs). In particular, a CAT is very attractive to measurement practitioners because it employs item response theory (IRT) methods to efficiently select and administer the items that are most informative to the estimation of a given examinee's level of proficiency, resulting in a substantial reduction in testing time as compared with a conventional test.

Innovative computer-based testing methods continue to be developed. One such method is self-adapted testing (Rocklin & O'Donnell, 1987). A self-adapted test, or SAT, is different from a CAT in one major respect--in a SAT examinees are allowed to choose the difficulty level of each test item, whereas in a CAT the computer chooses the difficulty level of each item. Although this difference might seem innocuous, research has indicated that examinees experience less stress and (sometimes) higher test performance when administered a SAT, as compared with a CAT. Moreover, a SAT yields proficiency estimates that are less correlated with examinee anxiety, which suggests that scores yielded by a SAT may be more valid than those yielded by a CAT. Rocklin (1994) and Wise (1994) provide discussions of this research.

The actual use of computer-based testing methods by researchers and practitioners is dependent, in part, on the availability of software for test delivery. In 1984, Assessment Systems Corporation introduced MicroCAT, a software package that allows users to develop computerized test banks and to administer both CATs and computer-based conventional tests.

An integral aspect of MicroCAT is the Minnesota Computerized Adaptive Testing Language (MCATL), which provides a basis for users to tailor MicroCAT programs to their specific measurement needs regarding (a) the order and manner in which items are administered, (b) the collection and storage of various types of information during the testing session, such as the response time for each item and which items were administered, and (c) the amount and types of information provided to examinees regarding their test performance.

Although a number of research studies have emerged regarding the effects of self-adapted testing, it is not widely known how a SAT can be administered through MicroCAT. The purpose of this article, therefore, is to provide examples of MCATL program code for several types of SATs, using the most recent version of MicroCAT (3.5; Assessment Systems Corporation, 1994). This is not intended to be a tutorial on the use of MicroCAT; rather, it is a description of how to use MicroCAT to administer a testing procedure that is not described in the MicroCAT documentation but has received increasing attention in the measurement research literature. It is expected that the reader is familiar with the MicroCAT testing software and has a working knowledge of MCATL.

Item Bank Characteristics and Program Setup

In a SAT, the IRT-calibrated items in the bank are ranked according to difficulty and divided into several difficulty strata, each possessing an equal number of items. In the examples that follow, assume that there are 50 items in each stratum. The items in each stratum are arranged in some predetermined fashion. For example, the items can be arranged randomly or, if the items are calibrated using an IRT model in which a discrimination parameter is estimated, the items in a stratum could be ordered in terms of decreasing discrimination. Although as many as nine strata may be utilized by MCATL, the maximum number of strata available to an examinee at any one time is constrained to be six. In a SAT, an examinee's choice of difficulty level occurs via his/her response to a multiple choice 'item' in which the response options correspond to the available difficulty levels. In MicroCAT, a maximum of six response options may be specified for a multiple choice item.

An effective way to initiate a test administration is to use the 'Default' test included with the MicroCAT software, which first requests the examinee's ID and name and then requests the name of the test to be administered. The student identification information is stored along with the test results.
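The stratification itself is carried out when the item bank is constructed, before any MCATL code is written. As a rough illustration of this step (a sketch in Python rather than MCATL, and not part of MicroCAT; the item fields 'id', 'b', and 'a' are assumed names for the item identifier and the IRT difficulty and discrimination parameters), the bank can be ranked by difficulty, cut into strata of equal size, and reordered within each stratum by decreasing discrimination:

# Sketch of the bank-stratification step described above (not MicroCAT code).
# Assumes each item is a dict with hypothetical fields:
#   "id" (item name), "b" (IRT difficulty), "a" (IRT discrimination).

def build_strata(items, n_strata=6):
    """Rank items by difficulty, split into equal strata, and order each
    stratum by decreasing discrimination."""
    ranked = sorted(items, key=lambda item: item["b"])   # easiest to hardest
    per_stratum = len(ranked) // n_strata                # e.g., 300 // 6 = 50
    strata = []
    for s in range(n_strata):
        stratum = ranked[s * per_stratum:(s + 1) * per_stratum]
        stratum.sort(key=lambda item: item["a"], reverse=True)  # most discriminating first
        strata.append(stratum)
    return strata

Each resulting stratum would then correspond to one sequence/endsequence block (e.g., $STR1 through $STR6) in the MCATL test file.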

Insert Figure 1 about here

Basic MCATL Code for Conducting a SAT

Figure 1 shows an example of the basic MCATL code for the administration of a SAT. The program begins by administering an 'item' (line 2) containing instructions to examinees regarding the basic SAT testing process and the scoring methods used. An example of the instructions used in previous SAT studies can be found in Wise, Plake, Johnson, and Roos (1992). Line 3 shows two examples of scoring methods--maximum likelihood proficiency estimation and number of items answered correctly. The information written to the output file after each content item is administered is displayed on lines 4 and 5 and includes the count of items administered, the ID of the item, the key and the examinee's response to the item, the examinee's proficiency and variance estimates, and the amount of time spent on the item and on the test to that point. The termination criterion is specified on line 6; in this example, the test will terminate after 25 items have been administered.

Line 7 administers the item used to branch to one of the six strata displayed in lines 8 through 37, by matching the examinee's difficulty level choice to the corresponding stratum. Each stratum contains a sequence/endsequence block with the items assigned to the stratum listed within the block. In order to make testing conditions equivalent when comparing CAT and SAT, item feedback is administered via a MicroCAT post-item; therefore, whether an examinee responds correctly or not, the test branches to an item (lines 38 and 39) that allows the examinee to choose the difficulty level of his/her next item. When the termination criterion is met, the program branches to line 40 and the total testing time, total number of items answered correctly, and final proficiency and variance estimates are written to the output file.
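As an aid to reading Figure 1, the loop it implements can be summarized as follows. This is only a conceptual sketch in Python, not MCATL; administer_choice, administer_item, and update_estimate are hypothetical stand-ins for the choice items (#CH001 and #CH002), the content items, and MicroCAT's built-in maximum likelihood scoring, and items are assumed to be dicts with an 'id' field, as in the earlier sketch.

# Conceptual sketch of the basic SAT loop in Figure 1 (not MicroCAT code).

MAX_ITEMS = 25  # termination criterion from line 6 of Figure 1

def run_basic_sat(strata, administer_choice, administer_item, update_estimate):
    mode, error = 0.0, 1.0          # proficiency estimate and its variance
    results = []
    for count in range(1, MAX_ITEMS + 1):
        level = administer_choice()          # examinee picks a stratum (1-6)
        item = strata[level - 1].pop(0)      # next unused item in that stratum
        correct = administer_item(item)      # item is shown; feedback follows
        mode, error = update_estimate(mode, error, item, correct)
        results.append((count, item["id"], correct, mode, error))
    return results                           # the record written to the output file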

Insert Figure 2 about here

Adaptive Mastery Testing Using a SAT Format

MicroCAT can also be used to administer an adaptive mastery test (AMT; Weiss & Kingsbury, 1984). An AMT is an efficient way to assess whether an examinee's proficiency level is above or below some preestablished criterion. In the typical AMT algorithm, items are targeted to the current estimate of an examinee's proficiency, and the test terminates when either (a) a confidence interval constructed around the examinee's proficiency estimate does not contain the criterion value or (b) a maximum number of administered items is reached. By replacing lines 6 through 39 of Figure 1 with the code shown in Figure 2, an AMT can be modified to a SAT administration format (a self-adapted mastery test; SAMT). A maximum-likelihood scoring method is utilized in this test, so the Mode (proficiency estimate) and Error (variance) variables are initialized at the beginning of the test. The Terminate statement considers both termination criteria mentioned previously. Before branching to the next difficulty level choice, the current proficiency estimate and standard error (computed from the variance via the built-in square root function) are used to construct a confidence interval (in this example, a 95% confidence interval), which is then evaluated to determine whether a stopping criterion has been satisfied.
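The confidence-interval logic that Figure 2 expresses in MCATL can be restated as follows. This Python sketch is only a summary of that logic; the mastery criterion of -0.2 on the proficiency scale, the 95% confidence interval, and the minimum of 10 administered items are taken directly from the example code.

import math

CRITERION = -0.2   # mastery cutoff used in the Figure 2 example
Z = 1.96           # 95% confidence interval
MAX_ITEMS = 25
MIN_ITEMS = 10     # corresponds to @ItemsAdministered > 9 in Figure 2

def samt_should_stop(mode, error, items_administered):
    """Return True when the SAMT termination criterion is met."""
    if items_administered >= MAX_ITEMS:
        return True
    if items_administered < MIN_ITEMS:
        return False
    sd = math.sqrt(error)           # standard error from the variance
    lower = mode - Z * sd           # 'neg' in Figure 2
    upper = mode + Z * sd           # 'pos' in Figure 2
    # Stop when the interval lies entirely above or entirely below the criterion.
    return lower > CRITERION or upper < CRITERION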

Insert Figure 3 about here

Administering a Restricted SAT

Although it has been found that most examinees administered a SAT tend to choose item difficulty levels that are reasonably matched to their proficiency levels, a small proportion choose difficulty levels that are not well matched. The undesirable consequence is that the standard errors of the resulting proficiency estimates for these examinees will be large. To address this problem, Wise, Kingsbury, and Houser (1993) proposed using a restricted SAT, in which a narrower range of difficulty level choices is established around an examinee's current proficiency estimate. This procedure would prevent examinees from choosing item difficulty levels that are much too easy or difficult for them. An example of the code for administering a restricted SAT is shown in Figure 3, which would be substituted for lines 7-39 of Figure 1. In this illustration, the item bank has been divided into nine strata. Moreover, each examinee is allowed to choose from among five difficulty levels; these levels correspond to the five strata that are closest to the examinee's current proficiency estimate. The initial difficulty level choice allows an examinee to choose among the middle five difficulty levels. After each item has been administered and scored, the examinee's proficiency estimate is updated and the five closest strata are presented in the next choice. The four values specified in the IF statements near the end of the code in Figure 3 represent the cutting points corresponding to the beginning of strata four through seven; these statements are used to identify the five closest strata.
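The selection of the five closest strata can be restated compactly. The following Python sketch is only a summary of the branching logic in Figure 3, not MCATL; the four cut points are those used in the figure and would, of course, be specific to a particular item bank.

# Cut points from Figure 3: proficiency values at which the offered window of
# five strata shifts (they mark the beginning of strata four through seven).
CUT_POINTS = [-1.3, -0.4, 0.5, 1.4]

def restricted_choice_window(mode, window=5):
    """Return the strata (1-based) offered at the next difficulty choice."""
    # Count how many cut points the current estimate exceeds: 0 through 4.
    shift = sum(1 for cut in CUT_POINTS if mode > cut)
    first = 1 + shift                    # lowest stratum in the window
    return list(range(first, first + window))

# Example: an estimate of 0.0 exceeds two cut points, so strata 3-7 are offered,
# matching the (Mode > -0.4) branch of Figure 3.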

Possible Modifications

There are a variety of modifications possible when developing programs for administering a SAT. For example, feedback can be administered in at least two ways. One method is illustrated in the code for the basic SAT (Figure 1) and might be used if CAT and SAT are being compared and equivalent feedback is desired. After each content item is administered, the examinee responds to the item and presses the 'Enter' key. The item feedback is then displayed on the screen via a MicroCAT post-item and is of the form 'a is the correct answer'. A second method would possibly be more desirable if one were administering only the SAT. The feedback is displayed in the same 'item' as the difficulty level choice and is of the form 'your previous answer was incorrect'. This would require that the Sequence line for each of the strata contain different branches for correct and incorrect responses:

$STR5
Sequence IN: $INCORRECT CO: $CORRECT

Of course, the provision of any feedback at all is optional.

Other possible modifications include the number of strata constructed, the type of scoring procedure used, and the type and amount of information stored in each examinee's output file. When one does not wish to record any of the score and response information (e.g., time taken to choose) for the difficulty level choice (i.e., branching) items, these items should be defined with an item type of XX, preventing them from being included in any item counts or scoring methods. This allows the choice items to be administered without affecting the count of content items administered. If output is desired for the difficulty level choice items, the item type is left blank; in that case the administration of each choice item is included in the item count, which must be considered when calculating the number of content items that have been administered.
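When the choice items are counted, the arithmetic is straightforward but easy to overlook. The following Python sketch assumes, as in the structure of Figure 1, that one difficulty-level choice item precedes every content item, that the choice items are given a blank item type and are therefore counted, and that no other counted items intervene.

# Bookkeeping when choice items are counted along with content items.
# Assumes one difficulty-level choice item precedes every content item.

def item_counts(items_administered):
    """Split the combined item count into choice items and content items."""
    choice_items = (items_administered + 1) // 2   # choices come first
    content_items = items_administered // 2
    return choice_items, content_items

Under this assumption, a test intended to administer 25 content items would terminate when the combined count reaches 50 rather than 25.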

References

Assessment Systems Corporation. (1994). User's manual for the MicroCAT Testing System, Version 3.5. St. Paul, MN: Author.

Rocklin, T. (1994). Self-adapted testing. Applied Measurement in Education, 7, 3-14.

Rocklin, T., & O'Donnell, A. M. (1987). Self-adapted testing: A performance-improving variant of computerized adaptive testing. Journal of Educational Psychology, 79, 315-319.

Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361-376.

Wise, S. L. (1994). Understanding self-adapted testing: The perceived control hypothesis. Applied Measurement in Education, 7, 15-24.

Wise, S. L., Kingsbury, G. G., & Houser, R. L. (1993, April). An investigation of restricted self-adapted testing. Paper presented at the annual meeting of the National Council on Measurement in Education, Atlanta, GA.

Wise, S. L., Plake, B. S., Johnson, P. L., & Roos, L. L. (1992). A comparison of self-adapted and computerized adaptive tests. Journal of Educational Measurement, 29, 329-339.

Wise, S. L., Roos, L. L., Plake, B. S., & Nebelsick-Gullett, L. J. (1994). The relationship between examinee anxiety and preference for self-adapted testing. Applied Measurement in Education, 7, 81-91.

Figure Captions

Figure 1. MCATL code for a SAT. The line numbers would not be included in a MicroCAT file; they are included only to provide reference points for the code contained in Figures 2 and 3.

Figure 2. MCATL code for a SAMT.

Figure 3. MCATL code for a restricted SAT.

Figure 1

 1. Test SELFADAP
 2. #INSTR1
 3. SetScore @MaximumLikelihood(Mode, Error), @NumberCorrect(NC)
 4. AutoKeep(@ItemsAdministered, @ItemID, @ItemKey, @ItemResponse, &
 5.     Mode, Error, @ItemLatency, @TestLatency, @NL)
 6. Terminate $FINISH (@ItemsAdministered = 25)
 7. #CH001 AL:A=$STR1 B=$STR2 C=$STR3 D=$STR4 E=$STR5 F=$STR6
 8. $STR1
 9. Sequence IN: $NEXTCH CO: $NEXTCH
10. #item001
11. #item050
12. EndSequence
13. $STR2
14. Sequence IN: $NEXTCH CO: $NEXTCH
15. #item051
16. #item100
17. EndSequence
18. $STR3
19. Sequence IN: $NEXTCH CO: $NEXTCH
20. #item101
21. #item150
22. EndSequence
23. $STR4
24. Sequence IN: $NEXTCH CO: $NEXTCH
25. #item151
26. #item200
27. EndSequence
28. $STR5
29. Sequence IN: $NEXTCH CO: $NEXTCH
30. #item201
31. #item250
32. EndSequence
33. $STR6
34. Sequence IN: $NEXTCH CO: $NEXTCH
35. #item251
36. #item300
37. EndSequence
38. $NEXTCH
39. #CH002 AL:A=$STR1 B=$STR2 C=$STR3 D=$STR4 E=$STR5 F=$STR6
40. $FINISH
41. @KeepLine("Total time: ", @TestLatency)
42. @KeepLine("Number correct: ", NC)
43. @KeepLine("Final proficiency and var estimates: ", Mode, Error)
44. EndTest

Figure 2

Set Mode = 0
Set Error = 1
Terminate $FINISH ((@ItemsAdministered = 25) or &
    ((@ItemsAdministered > 9) and ((neg > -0.2) or (pos < -0.2))))
#CH001 AL:A=$STR1 B=$STR2 C=$STR3 D=$STR4 E=$STR5 F=$STR6
$STR1
Sequence IN: $NEXTCH CO: $NEXTCH
#item001
#item050
EndSequence
$STR2
Sequence IN: $NEXTCH CO: $NEXTCH
#item051
#item100
EndSequence
$STR3
Sequence IN: $NEXTCH CO: $NEXTCH
#item101
#item150
EndSequence
$STR4
Sequence IN: $NEXTCH CO: $NEXTCH
#item151
#item200
EndSequence
$STR5
Sequence IN: $NEXTCH CO: $NEXTCH
#item201
#item250
EndSequence
$STR6
Sequence IN: $NEXTCH CO: $NEXTCH
#item251
#item300
EndSequence
$NEXTCH
Set SD = @Sqrt(Error)
Set neg = Mode - (1.96 * SD)
Set pos = Mode + (1.96 * SD)
#CH002 AL:A=$STR1 B=$STR2 C=$STR3 D=$STR4 E=$STR5 F=$STR6
$FINISH

Figure 3

#CH003 AL:A=$STR3 B=$STR4 C=$STR5 D=$STR6 E=$STR7
$STR1
Sequence IN: $NEXTCH CO: $NEXTCH
#item001
#item050
EndSequence
$STR2
Sequence IN: $NEXTCH CO: $NEXTCH
#item051
#item100
EndSequence
$STR3
Sequence IN: $NEXTCH CO: $NEXTCH
#item101
#item150
EndSequence
$STR4
Sequence IN: $NEXTCH CO: $NEXTCH
#item151
#item200
EndSequence
$STR5
Sequence IN: $NEXTCH CO: $NEXTCH
#item201
#item250
EndSequence
$STR6
Sequence IN: $NEXTCH CO: $NEXTCH
#item251
#item300
EndSequence
$STR7
Sequence IN: $NEXTCH CO: $NEXTCH
#item301
#item350
EndSequence
$STR8
Sequence IN: $NEXTCH CO: $NEXTCH
#item351
#item400
EndSequence
$STR9
Sequence IN: $NEXTCH CO: $NEXTCH
#item401
#item450
EndSequence
$NEXTCH
IF (Mode > 1.4)
  #CH004 AL:A=$STR5 B=$STR6 C=$STR7 D=$STR8 E=$STR9
ELSE
  IF (Mode > 0.5)
    #CH004 AL:A=$STR4 B=$STR5 C=$STR6 D=$STR7 E=$STR8
  ELSE
    IF (Mode > -0.4)
      #CH004 AL:A=$STR3 B=$STR4 C=$STR5 D=$STR6 E=$STR7
    ELSE
      IF (Mode > -1.3)
        #CH004 AL:A=$STR2 B=$STR3 C=$STR4 D=$STR5 E=$STR6
      ELSE
        #CH004 AL:A=$STR1 B=$STR2 C=$STR3 D=$STR4 E=$STR5
      endif
    endif
  endif
endif
