The Role of Item Models in Automatic Item Generation

Mark J. Gierl and Hollis Lai
Centre for Research in Applied Measurement and Evaluation, University of Alberta

Paper Presented at the Symposium “Item Modeling and Item Generation for the Measurement of Quantitative Skills: Recent Advances and Prospects” Annual Meeting of the National Council on Measurement in Education New Orleans, LA April, 2011

INTRODUCTION

Randy Bennett (2001) claimed, a decade ago, that no topic would become more central to innovation and future practice in educational assessment than computers and the internet. His prediction has proven to be accurate. Educational assessment and computer technology have evolved at a staggering pace since 2001. As a result, many educational assessments, which were once given in a paper-and-pencil format, are now administered by computer using the internet. Education Week's 2009 Technology Counts, for example, reported that 27 US states now administer internet-based computerized educational assessments. Many popular and well-known exams in North America, such as the Graduate Management Admission Test (GMAT), the Graduate Record Exam (GRE), the Test of English as a Foreign Language (TOEFL iBT), and the American Institute of Certified Public Accountants Uniform CPA examination (CBT-e), to cite but a few examples, are administered by computer over the internet. Canadian testing agencies are also implementing internet-based computerized assessments. For example, the Medical Council of Canada Qualifying Exam Part I (MCCQE I), which is written by all medical students seeking entry into supervised clinical practice, is administered by computer. Provincial testing agencies in Canada are also making the transition to internet-based assessment. Alberta Education, for instance, will introduce a computer-based assessment for elementary school students in 2011 as part of their Diagnostic Mathematics Program. Internet-based computerized assessment offers many advantages to students and educators compared to more traditional paper-based assessments. For instance, computers support the development of innovative item types and alternative item formats (Sireci & Zenisky, 2006; Zenisky & Sireci, 2002); items on computer-based tests can be scored immediately, thereby providing students with instant feedback (Drasgow & Mattern, 2006); and computers permit continuous testing and testing on-demand for students (van der Linden & Glas, 2010). But possibly the most important advantage of

computer-based assessment is that it allows educators to measure more complex performances by integrating test items and digital media to substantially increase the types of knowledge, skills, and competencies that can be measured (Bartram, 2006; Zenisky & Sireci, 2002). The advent of computer-based testing has also raised new challenges, particularly in the area of item development (Downing & Haladyna, 2006; Schmeiser & Welch, 2006). Large numbers of items are needed to develop the banks necessary for computerized testing because items are continuously administered and, therefore, exposed. As a result, these banks must be frequently replenished to minimize item exposure and maintain test security. Because testing agencies are now faced with the daunting task of creating thousands of new items for computer-based assessments, alternative methods of item development are desperately needed. One method that may be used to address this challenge is automatic item generation (Drasgow, Luecht, & Bennett, 2006; Embretson & Yang, 2007; Irvine & Kyllonen, 2002). Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists develop item models, which are comparable to templates or prototypes, that highlight the features or elements in the assessment task that must be manipulated. Second, these item model elements are manipulated to generate new items with the aid of computer-based algorithms. With this two-step process, hundreds or even thousands of new items can be created from a single item model. The purpose of our paper is to describe seven different but related topics that are central to the development and use of item models for automatic item generation. We start by defining item model and highlighting some related concepts; we describe how item models are developed; we present an item model taxonomy; we illustrate how item models can be used for automatic item generation; we

outline some benefits of using item models; we introduce the idea of an item model bank; and finally, we demonstrate how statistical procedures can be used to estimate the parameters of the generated items without the need for extensive field or pilot testing. We begin by describing two general factors that, we feel, will directly affect educational measurement—including emerging methods such as automatic item generation—in the 21st century.

TWO FACTORS THAT WILL SHAPE EDUCATIONAL MEASUREMENT IN THE 21ST CENTURY

We assert that the first factor that will shape educational measurement in the 21st century is the growing view that the science of educational assessment will prevail in guiding the design, development, administration, scoring, and reporting practices in educational testing. In their seminal chapter on "Technology and Testing" in the 4th edition of the handbook Educational Measurement, Drasgow, Luecht, and Bennett (2006, p. 471) begin with this bold claim:

This chapter describes our vision of a 21st-century testing program that capitalizes on modern technology and takes advantage of recent innovations in testing. Using an analogy from engineering, we envision a modern testing program as an integrated system of systems. Thus, there is an item generation system, an item pretesting system, an examinee registration system, and so forth. This chapter discusses each system and illustrates how technology can enhance and facilitate the core processes of each system.

Drasgow et al. present a view of educational measurement where integrated technology-enhanced systems govern and direct all testing processes. Ric Luecht has coined this technology-based approach to educational measurement "assessment engineering" (Luecht, 2006a, 2006b, 2007, 2011). Assessment engineering is an innovative approach to measurement practice where engineering-based principles and technology-enhanced processes are used to direct the design and development of assessments as well as the analysis, scoring, and reporting of assessment results. With this approach, the measurement specialist begins by defining the construct of interest using specific, empirically derived cognitive models of task performance. Next, item models are created to produce replicable

assessment tasks. Finally, statistical models are applied to the examinee response data collected using the item models to produce scores that are both replicable and interpretable. The second factor that will likely shape educational measurement in the 21st century stems from the fact that the boundaries of our discipline are becoming more porous. As a result, developments from other disciplines such as cognitive science, mathematical statistics, medical education, educational psychology, operations research, educational technology, and computing science will permeate and influence educational testing. These interdisciplinary contributions will also create opportunities for both theoretical and practical change. That is, educational measurement specialists will begin to draw on interdisciplinary developments to enhance their own research and practice. At the same time, students across a host of other disciplines will begin to study educational measurement.1 These interdisciplinary forces that promote new ideas and innovations will begin to evolve, perhaps slowly at first, but then at a much faster pace, leading to even more changes in our discipline. It may also mean that other disciplines will begin to adopt our theories and practices more readily as students with educational measurement training move back to their own content domains and areas of specialization.

ITEM MODELING: DEFINITION AND RELATED CONCEPTS

An item model (Bejar, 1996, 2002; Bejar, Lawless, Morley, Wagner, Bennett, & Revuelta, 2003; LaDuca, Staples, Templeton, & Holzman, 1986)—which has also been described as a schema (Singley & Bennett, 2002), blueprint (Embretson, 2002), template (Mislevy & Riconscente, 2006), form (Hively, Patterson, & Page, 1968), clone (Glas & van der Linden, 2003), and shell (Haladyna & Shindoll, 1989)—serves as an explicit representation of the variables in an assessment task, which includes the stem, the options, and oftentimes auxiliary information (Gierl, Zhou, & Alves, 2008). The stem is the part of an

1 We have already noticed this change in our own program. We currently have 14 students in the Measurement, Evaluation, and Cognition (MEC) graduate program at the University of Alberta. These students represent a diverse disciplinary base, which includes education, cognitive psychology, engineering, computing science, medicine (one of our students is a surgery resident), occupational therapy, nursing, forensic psychology, statistics, and linguistics.

item which formulates context, content, and/or the question the examinee is required to answer. The options contain the alternative answers, with one correct option and one or more incorrect options or distracters. When dealing with a multiple-choice item model, both the stem and options are required. With an open-ended or constructed-response item model, only the stem is created. Auxiliary information includes any additional material, in either the stem or options, required to generate an item, including digital media such as text, images, tables, diagrams, sound, and/or video. The stem and options can be divided further into elements. These elements are denoted as strings (S), which are non-numeric values, and integers (I), which are numeric values. By systematically manipulating the elements, measurement specialists can generate large numbers of items from one item model. If the generated items or instances of the item model are intended to measure content at similar difficulty levels, then the generated items are isomorphic. When the goal of item generation is to create isomorphic instances, the measurement specialist manipulates the incidental elements, which are the surface features of an item that do not alter item difficulty. Conversely, if the instances are intended to measure content at different difficulty levels, then the generated items are variants. When the goal of item generation is to create variant instances, the measurement specialist can manipulate the incidental elements, but must manipulate one or more radical elements in the item model. The radicals are the deep features that alter item difficulty, and may even affect test characteristics such as dimensionality. To illustrate some of these concepts, an example from Grade 6 mathematics is presented in Figure 1. The item model is represented as the stem and options variables with no auxiliary information. The stem contains two integers (I1, I2). The I1 element includes Ann's payment. It ranges from $1525 to $1675 in increments of $75. The I2 element includes the cost of the lawn, as either $30/m2 or $45/m2. The four alternatives, labelled A to D, are generated using algorithms produced from the integer values I1 and I2 (including the correct option, which is A).

Figure 1. Simple item model in Grade 6 mathematics with two integer elements.

Ann has paid $1525 for planting her lawn. The cost of lawn is $45/m2. Given the shape of her lawn is square, what is the side length of Ann's lawn?
A. 5.8  B. 6.8  C. 4.8  D. 7.3

ITEM MODEL VARIABLES
Stem: Ann has paid $I1 for planting her lawn. The cost of lawn is $I2/m2. Given the shape of her lawn is square, what is the side length of Ann's lawn?
Elements: I1 Value Range: 1525-1675 by 75; I2 Value Range: 30 or 45
Options: A = sqrt(I1 / I2); B = sqrt(I1 / I2) + 1; C = sqrt(I1 / I2) - 1; D = sqrt(I1 / I2) + 1.5
Key: A
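To make the generative logic of this item model concrete, the following Python sketch enumerates the I1 and I2 elements and computes the four options. It is our own illustration rather than IGOR code, and the function name is ours.

```python
import math
from itertools import product

def generate_lawn_items():
    """Enumerate the Figure 1 item model: I1 is Ann's payment, I2 the cost per square metre."""
    i1_values = range(1525, 1676, 75)   # 1525, 1600, 1675
    i2_values = (30, 45)
    items = []
    for i1, i2 in product(i1_values, i2_values):
        side = math.sqrt(i1 / i2)       # keyed value for option A
        stem = (f"Ann has paid ${i1} for planting her lawn. The cost of lawn is ${i2}/m2. "
                f"Given the shape of her lawn is square, what is the side length of Ann's lawn?")
        options = {"A": round(side, 1),           # key
                   "B": round(side + 1, 1),
                   "C": round(side - 1, 1),
                   "D": round(side + 1.5, 1)}
        items.append({"stem": stem, "options": options, "key": "A"})
    return items

for item in generate_lawn_items():      # six generated instances
    print(item["options"])
```

Running the sketch produces the six isomorphic instances discussed later in the section on item generation.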

DEVELOPING ITEM MODELS

Test development specialists have the critical role of designing and developing the item models used for automatic item generation. The principles, standards, and practices that guide traditional item development (cf. Case & Swanson, 2002; Downing & Haladyna, 2006; Schmeiser & Welch, 2006) have been recommended for use in item model development. Although a growing number of item model examples are available in the literature (e.g., Bejar et al., 2003; Case & Swanson, 2002; Gierl et al., 2008), there are currently no published studies describing either the principles or standards required to

Item Models 8 develop these models. Drasgow et al. (2006) advise test development specialists to engage in the creative task of developing item models by using design principles and guidelines discerned from a combination of experience, theory, and research. Initially, these principles and guidelines are used to identify a parent item model. One way to identify a parent item model is by using a cognitive theory of task performance. Within this theory, cognitive models, as described by Luecht in his assessment engineering framework, may be identified or discerned. With this type of “strong theory” approach, cognitive features are identified in such detail that item features that predict test performance can be not only specified but also controlled. The benefit of using strong theory to create item models is that item difficulty for the generated items is predictable and, as a result, the generated items may be calibrated without the need for extensive field or pilot testing because the factors that govern the item difficulty level can be specified and, therefore, explicitly modeled and controlled. Unfortunately, few cognitive theories currently exist to guide our item development practices (Leighton & Gierl, in press). As a result, the use of strong theory for automatic item generation has, thus far, been limited to narrow content domains, such as mental rotation (Bejar, 1990) and spatial ability (Embretson, 2002). In the absence of strong theory, parent item models can be identified using “weak theory” by reviewing items from previously administered exams or by drawing on an inventory of existing test items in an attempt to identify an underlying structure. This structure, if identified, provides a point-ofreference for creating alternative item models, where features in the alternative models can be manipulated to generate new items. Test development specialists can also create their own unique item models. The weak theory approach to developing parent models using previously administered items, drawing on an inventory of existing items, or creating new models is well-suited to broad content domains where few theoretical descriptions exist on the cognitive skills required to solve test items (Drasgow et al., 2006). The main drawback of using weak theory to create item models is that

item difficulty for the generated items is unpredictable and, therefore, field or pilot testing may be required.

ITEM MODEL TAXONOMY

Gierl et al. (2008) described a taxonomy of item model types as a way of offering guidelines for creating item models. The taxonomy pertains to generating multiple-choice items and classifies models based on the different types of elements used in the stems and options. The stem is the section of the model used to formulate context, content, and/or questions. The elements in the stem can function in four different ways. Independent indicates that the ni element(s) (ni ≥ 1) in the stem are unrelated to one another. That is, a change in one element will have no effect on the other stem elements in the

item model. Dependent indicates that all nd element(s) (nd ≥ 2) in the stem are directly related to one another.

Mixed Independent/Dependent includes both independent (ni ≥ 1) and dependent (nd ≥ 1) elements in the stem, where at least one pair of stem elements is directly related. Fixed represents a constant

stem format with no variation or change. The options contain the alternatives for the item model. The elements in the options can function in three different ways. Randomly-selected options refer to the manner in which the distracters are selected from their corresponding content pools. The distracters are selected randomly. Constrained options mean that the keyed option and the distracters are generated according to specific constraints, such as formulas, calculation, and/or context. Fixed options occur when both the keyed option and distracters are invariant or unchanged in the item model. By crossing the stem and options, a matrix of item model types can be produced (see Table 1). This taxonomy is useful for creating item models because it provides the guiding principles necessary for designing diverse models by outlining their structure, function, similarities, and differences. It can also be used to help ensure that test development specialists do not design item models with exactly the

same elements. Ten functional combinations are designated with a checkmark, "√". The two remaining combinations are labelled not applicable, "NA", because a model with a fixed stem and constrained options is an infeasible item type and a model with a fixed stem and fixed options produces a single multiple-choice item type (i.e., a traditional multiple-choice item). Gierl et al. also presented 20 examples (i.e., two examples for each of the 10 cells in the item model taxonomy) to illustrate each unique combination. Their examples were drawn from diverse content areas, including science, social studies, mathematics, language arts, and architecture.

Table 1. Plausible Stem-by-Option Combinations in the Gierl et al. (2008) Item Model Taxonomy

Options \ Stem       Independent   Dependent   Mixed   Fixed
Randomly Selected         √            √         √       √
Constrained               √            √         √       NA
Fixed                     √            √         √       NA
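As a simple illustration, the feasible combinations in Table 1 can be encoded so that a newly designed item model can be checked against the taxonomy. The sketch below is our own encoding, not software described by Gierl et al. (2008).

```python
STEM_TYPES = ("independent", "dependent", "mixed", "fixed")
OPTION_TYPES = ("randomly selected", "constrained", "fixed")

# The two infeasible cells identified in the text.
NOT_APPLICABLE = {("fixed", "constrained"), ("fixed", "fixed")}

def is_feasible(stem_type: str, option_type: str) -> bool:
    """Return True if the stem-by-option combination is one of the ten usable model types."""
    if stem_type not in STEM_TYPES or option_type not in OPTION_TYPES:
        raise ValueError("unknown stem or option type")
    return (stem_type, option_type) not in NOT_APPLICABLE

assert not is_feasible("fixed", "constrained")    # infeasible item type
assert is_feasible("mixed", "randomly selected")  # one of the ten workable combinations
```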

USING ITEM MODELS TO AUTOMATICALLY GENERATE ITEMS

Once the item models are developed by the test development specialists, automatic item generation can begin. Automatic item generation is the process of using item models to generate test items with the aid of computer technology. The role of the test development specialist is critical for the creative task of designing and developing meaningful item models. The role of computer technology is critical for the generative task of systematically combining large numbers of elements in each model to produce items. By combining content expertise and computer technology, item modeling can be used to generate items. If we return to the simple math example in Figure 1, the generative process can be illustrated. Recall, the stem in this example contains two integers (I1, I2). The generative task for this example involves generating six items with the following I1, I2 combinations: I1=$1525 and I2=30/m2;

I1=$1600 and I2=30/m2; I1=$1675 and I2=30/m2; I1=$1525 and I2=45/m2; I1=$1600 and I2=45/m2; I1=$1675 and I2=45/m2. Gierl et al. (2008, pp. 25-31) also created a software tool that automatically creates, saves, and stores items. The software is called IGOR (which stands for Item GeneratOR). It was written in Sun Microsystems Java SE 6.0. The purpose of IGOR is to generate multiple items from a single item model. The user interface for IGOR is structured using the same sections as the example in Figure 1 (i.e., stem, elements, options). The Item Model Editor window is used to enter and structure each item model (see Figure 2a). The editor has three components. The stem panel is the starting point for item generation where the item prompt is specified. Next, the elements panel is used to identify the string and integer variables as well as specify the constraints required among the elements for successful item generation. The options panel is used to specify possible answers to a given test item. The options are classified as either a key or distracter. The Elements and Options panels also contain three editing buttons. The first of these adds a new element or option to its panel. The second opens a window to edit the currently selected element or option. The third removes the currently selected element or option from the model. To generate items from a model, the Test Item Generator dialogue box is presented where the user specifies the item model file, the item bank output file, and the answer key file. If the option 'Create answer key' is not selected, then the resulting test bank will always display the correct answer as the last option (or alternative). If the option 'Create answer key' is selected, then the resulting test bank will randomly order the options. Once the files have been specified in the Test Item Generator dialogue box, the program can be executed by selecting the 'Generate' button (see Figure 2b).
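The option-ordering behaviour described above can be sketched as follows. This is not IGOR itself, which is a Java application; it is a minimal Python illustration, building on the item structure from the earlier sketch, of how a generator might order the options with and without an answer-key file.

```python
import random

def render_item(item, create_answer_key=False, seed=0):
    """Order the options as the text describes IGOR's output files: the key is
    written last when no answer-key file is requested, and the options are
    randomly ordered (with the keyed letter recorded) when one is."""
    pairs = [("key", item["options"][item["key"]])]
    pairs += [("distractor", v) for k, v in item["options"].items() if k != item["key"]]
    if create_answer_key:
        random.Random(seed).shuffle(pairs)
    else:
        pairs = pairs[1:] + pairs[:1]            # correct answer shown as the last option
    letters = "ABCDE"[:len(pairs)]
    rendered = dict(zip(letters, (value for _, value in pairs)))
    keyed_letter = next(l for l, (role, _) in zip(letters, pairs) if role == "key")
    return rendered, keyed_letter

example = {"options": {"A": 5.8, "B": 6.8, "C": 4.8, "D": 7.3}, "key": "A"}
print(render_item(example, create_answer_key=True))
```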

Figure 2. IGOR interface illustrating the (a) input panels and editing functions as well as the (b) generating functions.

Preliminary research has been conducted with IGOR. Gierl et al., working with two mathematics test development specialists, developed 10 mathematics item models. IGOR generated 331371 unique items from the 10 item models. That is, each model produced, on average, 33137 items, thereby providing an initial demonstration of the practicality and feasibility of item generation using IGOR.

BENEFITS OF ITEM MODELING

Item modeling can enhance educational assessment in many ways. The purpose of item modeling is to create a single model that yields many test items. Multiple models can then be developed which will yield hundreds or thousands of new test items. These items, in turn, are used to generate item banks. Computerized assessments or automatic test assembly algorithms then draw on a sample of the items from the bank to create a new test. With this approach, item exposure through test administration is minimized, even with continuous testing, because a large bank of operational items is available. Item modeling can also lead to more cost-effective item development because the model is continually re-used to yield many test items compared with developing each item for a test from scratch. Moreover, costly, yet common, errors in item development—including omissions or additions of words, phrases, or expressions as well as spelling, punctuation, capitalization, item structure, typeface, and formatting problems—can be avoided because only specific elements in the stem and options are manipulated across large numbers of items (Schmeiser & Welch, 2006). In other words, the item model serves as a template or prototype where test development specialists manipulate only specific, well-defined elements. The remaining components in the template or prototype are not altered. The view of an item model as a template or prototype with both fixed and variable elements contrasts with the more conventional view of a single item where every element is unique, both within and across items. Drasgow et al. (2006) explain:

The demand for large numbers of items is challenging to satisfy because the traditional approach to test development uses the item as the fundamental unit of currency. That is, each item is individually hand-crafted—written, reviewed, revised, edited, entered into a computer, and calibrated—as if no other like it had ever been created before.

But possibly the most important benefit of item modeling stems from the logic of this approach to test development. With item modeling, the model is treated as the fundamental unit of analysis where a single model is used to generate many items compared with a more traditional approach where the item is treated as the unit of analysis (Drasgow et al., 2006). Hence, with item modeling, the cost per item is lower because the unit of analysis is multiple instances per model rather than single instances per test development specialist. As a result, large numbers of items can be generated from a single item model rather than relying on each test development specialist to develop a large number of unique items. The item models can also be re-used, particularly when only a small number of the generated items are used on a particular test form.

ITEM MODEL BANK

Current practices in test development and analysis are grounded in the test item. That is, each item is individually written, reviewed, revised, edited, banked, and calibrated. If, for instance, a developer intends to have 1236 operational test items in her bank, then she has 1236 unique items that must be created, edited, reviewed, field tested, and, possibly, revised. An item bank serves as an electronic repository for maintaining and managing information on each item. The maintenance task focuses on item-level information. For example, the format of the item must be coded. Item formats and item types can include multiple choice, numeric response, written response, linked items, passage-based items, and items containing multimedia. The content for the item must be coded. Content fields include general learning outcomes, blueprint categories, item identification number, item response format, type of directions required, links, field test number, date, source of item, item sets, and copyright. The developer attributes must be coded. These attributes include year the item was written, item writer name, item writer demographics, editor information, development status, and review status. The statistical characteristics for the item must also be coded. Statistical characteristics often include word count, readability, classical item analyses, item response theory parameters, distracter functioning, item history, field test item analyses, item drift, differential item functioning flags, and history of item use. The management task focuses on person-level information and process. That is, item bank management requires explicit processes that guide the use of the item bank. Many different people within a testing organization are often involved in the development process including the test development specialists, subject matter experts (who often reside in both internal and external committees), psychometricians, editors, graphic artists, word processors, and document production specialists. Many testing programs field test their items and then review committees evaluate the items

Item Models 15 prior to final test production. Hence, field tested items are often the item bank entry point. Rules must be established for who has access to the bank and when items can be added, modified, or removed during field testing. The same rules must also apply to the preparation of the final form of the test because field testing can, and often does, occur in a different unit of a testing organization or at a different stage in development process and, therefore, may involve different people. Item models, rather than single test items, serve as the unit of analysis in an item model bank. With an item model bank, the test development specialist creates an electronic repository of item models for maintaining and managing information on each model. However, a single item model which is individually written, reviewed, revised, edited, and banked will also allow the developer to generate many test items. If, for instance, a developer intends to have 331371 items, then she may only require 10 item models (as was illustrated in our previous section on “Using Item Models to Automatically Generate Items”). Alternatively, if a particularly ambitious developer aspired to have a very large inventory of 10980640827 items, then she would require 331371 item models [i.e., if each item model generated, on average, 33137 mathematics items as was illustrated in our previous section on “Using Item Models to Automatically Generate Items”, then 331371 item models could be used to generate 10980640827 (33137*331371) items]. An item model bank serves as an electronic repository for maintaining and managing information on each item model. Because an item model serves as the unit of analysis, the banks contain a complex assortment of information on every model, but not necessarily on every item. The maintenance task focuses on model -level information. For example, the format of the item model must be coded. Content fields must be coded. The developer attributes must be coded. Some statistical characteristics of the model must also coded, including word count, readability, and item model history. The item model bank may also contain coded information on the item model ID, item model name, expected

Item Models 16 grade levels for use, item model stem type, item model option type, number of constraints for the model, the number of elements (e.g., integers and strings) in the model, and the number of generated items. The management task focuses on person-level information and process. That is, item model bank management requires explicit processes that guide the use of the item model bank. As with a more traditional approach to item development, many different people within a testing organization are involved in the process including the test development specialists, subject matter experts, psychometricians, editors, graphic artists, and word processors. Because of the generative process required for item model banking, an additional type of specialist may also be involved: the item model programmer. This specialist is skilled in test development, but also in computer programming and database management. In other words, this is a 21st century career! Their role is, first, to bridge the gap between the test development specialist who creates the item model and required programming tasks necessary to format and generate items using IGOR. In other words, the item model programmer helps the test development specialist identify and manipulate the fixed and variable elements in each item model (which is where test development experience will be helpful), enter the newly created item models into IGOR, and then execute the program to generate items (the latter two steps require computer programming skills, at least at this stage in the development of automatic item generation 2). Second, the item model programmer is responsible for entering the models into the item model bank, maintaining the contents of the bank, and managing the use of the item model bank (which requires

2

In 2009, we worked with 12 test development specialists at the Learner Assessment Branch at Alberta Education to create item models for achievement tests in Grade 3 Language Arts and Mathematics as well as Grade 6 and 9 Language Arts, Mathematics, Science, and Social Studies. The project yielded 284 unique item models at all three grade levels and in four different content areas. The test development specialists in this project had the most difficulty specifying the fixed and variable elements in their model and, despite repeated training, were unable to code their models and run IGOR consistently.

database management skills). The responsibilities of the item model programmer are presented in Figure 3.

Figure 3. Basic overview of workflow using traditional item banking and item model banking.
Traditional item banking process: Item Writing → Item Bank → Form Assembly.
Item model banking process: Item Model Writing → Item Model Programmer → Item Model Database → Item Generation → Form Assembly.
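To illustrate what a model-level record in an item model bank might contain, the following sketch defines a hypothetical record type. The field names are assumed from the lists above rather than taken from any published schema, and the example values are invented.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ItemModelRecord:
    """A model-level bank record; field names follow the maintenance lists above."""
    model_id: str
    model_name: str
    grade_levels: List[int]
    stem_type: str                # e.g., "mixed", using the Table 1 taxonomy
    option_type: str              # e.g., "constrained"
    n_integer_elements: int
    n_string_elements: int
    n_constraints: int
    n_generated_items: int
    word_count: int = 0
    readability: float = 0.0
    model_history: List[str] = field(default_factory=list)

# Hypothetical entry for the Figure 1 lawn item model.
example = ItemModelRecord(
    model_id="MATH-G6-001", model_name="Square lawn side length",
    grade_levels=[6], stem_type="dependent", option_type="constrained",
    n_integer_elements=2, n_string_elements=0, n_constraints=0,
    n_generated_items=6)
```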

ESTIMATING STATISTICAL CHARACTERISTICS OF GENERATED ITEMS

Drasgow et al. (2006, p. 473) claim that:

Ideally, automatic item generation has two requirements. The first requirement is that an item class can be described sufficiently for a computer to create instances of that class automatically or at least semi-automatically. The second requirement is that the determinants of item difficulty be understood well enough so that each of the generated instances need not be calibrated individually.

In the previous six sections of this paper, we described and highlighted the issues related to Drasgow et al.'s first requirement—describing an item class and automatically generating items—with the use of item models. In this section, we address the challenges related to Drasgow et al.'s second requirement by illustrating how generated items could be calibrated automatically. To be useful in test assembly,

items must have statistical characteristics. These characteristics can be obtained by administering the items on field tests to collect preliminary information from a small sample of examinees. Item statistics can also be obtained by embedding pilot items within a form as part of an operational test administration, but not using the pilot items for examinee scoring. An alternative approach is to account for the variation among the generated items in an item model and, using this information, to estimate item difficulty with a statistical procedure, thereby making field and pilot testing for the generated items unnecessary (or, at least, dramatically reduced). A number of statistical procedures have been developed to accomplish this task, including the linear logistic test model (LLTM; Fischer, 1973; see also Embretson & Daniel, 2008), the 2PL-constrained model (Embretson, 1999), the hierarchical IRT model (Glas & van der Linden, 2003), the Bayesian hierarchical model (Sinharay, Johnson, & Williamson, 2003; Sinharay & Johnson, 2005), and the expected response function approach (Mislevy, Wingersky, & Sheehan, 1994). Janssen (2010; see also Janssen, Schepers, & Peres, 2004) also described a promising approach for modeling item design features using an extension of the LLTM called the random-effects LLTM (LLTM-R). The probability that person $j$ successfully answers item $i$ is given by the LLTM as follows:

$$P(X_{ij} = 1 \mid \theta_j, \mathbf{q}_i, \boldsymbol{\eta}) = \frac{\exp\left(\theta_j - \sum_{k=1}^{K} q_{ik}\eta_k\right)}{1 + \exp\left(\theta_j - \sum_{k=1}^{K} q_{ik}\eta_k\right)}.$$

In this formula, the item difficulty parameter $\beta_i$ found in the Rasch model is replaced with an item difficulty model specified as $\beta_i = \sum_{k=1}^{K} q_{ik}\eta_k$, where item difficulty is specified by a linear combination of item predictors, including a parameter for the item design feature, $q_{ik}$, which is the score of item $i$ on item design feature $k$, and a parameter $\eta_k$, which is the difficulty weight associated with item design feature $k$. Building on this LLTM formulation, the LLTM-R adds a random error term to $\beta_i$ to estimate that component of item difficulty that may not be accounted for in the item difficulty model:

$$\beta_i = \sum_{k=1}^{K} q_{ik}\eta_k + \varepsilon_i = \beta_i^{*} + \varepsilon_i, \quad \text{where } \beta_i \sim N\!\left(\beta_i^{*}, \sigma_{\varepsilon}^{2}\right).$$

By adding $\sigma_{\varepsilon}^{2}$ to the model, random variation can be used to account for design principles that yield the same items but not necessarily the same item difficulty values across these items.
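The two equations can be illustrated numerically with a short sketch. This is not an estimation routine; it simply evaluates the LLTM response probability and draws one LLTM-R item difficulty, and the feature weights used in the illustration are made up.

```python
import math
import random

def lltm_probability(theta, q_i, eta):
    """P(X_ij = 1 | theta_j, q_i, eta) under the LLTM: the Rasch difficulty is
    replaced by a linear combination of item design features."""
    beta_i = sum(q * e for q, e in zip(q_i, eta))
    return 1.0 / (1.0 + math.exp(-(theta - beta_i)))

def lltmr_difficulty(q_i, eta, sigma_eps, rng=random.Random(0)):
    """Draw one item difficulty under the LLTM-R: beta_i = beta_i* + eps_i,
    with eps_i ~ N(0, sigma_eps^2)."""
    beta_star = sum(q * e for q, e in zip(q_i, eta))
    return beta_star + rng.gauss(0.0, sigma_eps)

# Illustration with invented numbers: two design features with weights eta.
eta = [-1.0, 0.8]
print(lltm_probability(theta=0.5, q_i=[1, 1], eta=eta))       # response probability
print(lltmr_difficulty(q_i=[1, 1], eta=eta, sigma_eps=0.33))  # one generated-item difficulty
```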

Janssen (2010) also described the logic that underlies the LLTM-R, as it applies to automatic item generation. The LLTM-R consists of two parts. The first part of the model specifies the person parameters associated with $\theta_j$, which include $\mu_\theta$ and $\sigma_\theta^2$, and the second part specifies the item parameters associated with the $\beta_i$, which include $\beta_i^{*}$ and $\sigma_\varepsilon^2$. The parameter $\varepsilon_i$ accounts for the random variation of all items created within the same item design principles, leading to similar, but not necessarily the same, item difficulty levels. Taken together, the LLTM-R can be used to describe three meaningful components: persons (i.e., $\mu_\theta$, $\sigma_\theta^2$), items ($\beta_i^{*}$), and item populations ($\sigma_\varepsilon^2$). For modeling outcomes in an automatic item generation context, our focus is on the items and item populations (where the items are nested within the item population). Next, we develop a working example using the logic for automatic item generation presented in Janssen (2010). Our example is developed using operational data from a diagnostic mathematics program (see Gierl, Taylor-Majeau, & Alves, 2010). The purpose of the Gierl et al. (2010) study was to apply the attribute hierarchy method in an operational diagnostic mathematics program at the elementary school levels to promote cognitive inferences about students' problem-solving skills. The attribute hierarchy method is a statistical procedure for classifying examinees' test item responses into a set of structured attribute patterns associated with a cognitive model. Principled test design procedures were used to design the exam and evaluate the student response data. To begin, cognitive models were created by test development specialists who outlined the knowledge and skills required to solve mathematical tasks in Grades 3 and 6. Then, items were written specifically to measure the skills in the cognitive models. Finally, confirmatory statistical analyses were used to evaluate the student response

Item Models 20 data by estimating model-data fit, attribute probabilities for diagnostic score reporting, and attribute reliabilities. The cognitive model and item development steps from the diagnostic math program were used in the current example to create item models. Cognitive models for CDA have four defining characteristics (Gierl, Alves, Roberts, & Gotzmann, 2009). First, the model contains skills that are specified at a fine grain size because these skills must magnify the cognitive processes underlying test performance. Second, the skills must be measurable. That is, each skill must be described in way that would allow a test developer to create an item to measure that skill. Third, the skills must be instructionally relevant to a broad group of educational stakeholders, including students, parents, and teachers. Fourth, a cognitive model will often reflect a hierarchy of ordered skills within a domain because cognitive processes share dependencies and function within a much larger network of inter-related processes, competencies, and skills. Figure 4 provides one example taken from a small section of a larger cognitive model developed to yield diagnostic inferences in SAT algebra (cf. Gierl, Wang, & Zhou, 2008). As a prerequisite skill, cognitive attribute A1 includes the most basic arithmetic operation skills, such as addition, subtraction, multiplication, and division of numbers. In attribute A2, the examinee needs to have the basic arithmetic skills (i.e., attribute A1) as well as knowledge about the property of factors. In attribute A3, the examinee not only requires basic arithmetic skills (i.e., attribute A1) and knowledge of factoring (i.e., attribute A2), but also the skills required for the application of factoring. The attributes are specified at a fine grain size; each attribute is measurable; each attribute, and its associated item, is intended to be instructionally relevant and meaningful; and attributes are ordered from simple to more complex as we move from A1 to A3.

Figure 4. Three sample items designed to measure three ordered skills in a linear cognitive model.

Cognitive model hierarchy (from prerequisite to most complex): A1: Arithmetic operations → A2: Properties of Factors → A3: Application of Factoring.

Sample test items:
Item 1 (measures A1): If 6(m+n)-3=15, then m+n=? A. 2 B. 3 C. 4 D. 5 E. 6
Item 2 (measures A1 and A2): If (x+2)/(m-1)=0 and m≠1, what is the value of x? A. 2 B. -1 C. 0 D. 1 E. -2
Item 3 (measures A1, A2, and A3): If 4a+4b = 3c-3d, then (2a+2b)/(5c-5d)=? A. 2/5 B. 4/3 C. 3/4 D. 8/15 E. 3/10
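The ordering constraint implied by a linear hierarchy such as the one in Figure 4 can be expressed as a simple admissibility check on attribute patterns. The sketch below is our own illustration of that constraint, not the attribute hierarchy method itself.

```python
def consistent_with_linear_hierarchy(pattern):
    """Return True if an attribute pattern such as (1, 1, 0) respects the ordering
    A1 -> A2 -> A3, i.e., no attribute is mastered without its prerequisites."""
    seen_zero = False
    for attribute in pattern:
        if attribute == 1 and seen_zero:
            return False
        if attribute == 0:
            seen_zero = True
    return True

assert consistent_with_linear_hierarchy((1, 1, 0))       # the skills measured by Item 2
assert not consistent_with_linear_hierarchy((1, 0, 1))   # A3 without A2 is not admissible
```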

The same test design principles were used to develop four item models in our working example. We selected four parent items that had been field tested with 100 students from the diagnostic mathematics project. These parent items, in turn, were used to create item models. The item models were then used for item generation. The four item models are presented in Appendix A. The item models in Appendix A are ordered from least to most complex according to their cognitive features, meaning that item model 1 measures number sequencing skills; item model 2 measures number sequencing skills and numerical comparison skills; item model 3 measures number sequencing skills, numerical comparison skills, and addition skills; item model 4 measures number sequencing skills, numerical comparison skills, addition skills, and the ability to solve fractions (please note that the ordering of the item models in this example has not been validated; rather, the models are used to illustrate how the LLTM-R could be used for item generation). The LLTM-R was implemented in two steps. In step 1, parameters were estimated for the persons, items, and item population with the LLTM. Using a field test consisting of 20 items specifically written to measure the cognitive features of number sequencing, numerical comparison, addition, and fractions

(i.e., five items per cognitive feature), the person and item parameters were estimated using the dichotomously-scored response vectors for 100 students who solved these items. The item feature parameter estimates were specified as fixed effects in the LLTM and the person and item population estimates were specified as random effects. The estimated item fixed-effect parameter weights and their associated standard errors are presented in Table 2.

Table 2. Estimated Weights and Standard Errors Using the Cognitive Features Associated with the Four Diagnostic Test Items

Cognitive Feature                    Estimate (Standard Error)
Number Sequencing (Least Complex)    -2.06 (0.22)
Numerical Comparisons                 0.94 (0.27)
Addition                              0.86 (0.25)
Fractions (Most Complex)              1.03 (0.25)

The estimated weights in Table 2 were then used to create a cognitive feature effect for each parent item. The cognitive feature effect is calculated by taking the sum of the products for the pre-requisite cognitive features as measured by each parent item. For example, a parent item that measures the cognitive feature numerical comparisons would have a skill pattern of 1,1,0,0 because the features are ordered in a hierarchy from least to most complex. This pattern would be multiplied and summed across the estimated weights in Table 2 to produce the cognitive feature effect for each of the four parent items in our example. The cognitive feature effect for the parent item measuring numerical comparisons, for instance, would be (-2.06 X 1) + (0.94 X 1) +(0.86 X 0) + (1.03 X 0) = -1.13. The random effects estimated for the person and item population, as reported in standard deviation units, are 0.99 and 0.33, respectively. In step 2, the four parent items were selected from the field test and used to create item models (Appendix A), the item models were used to generate items, and the difficulty parameters for the generated items were estimated. Number sequencing is the first, and hence, most basic cognitive

feature. This model generated 105 items. The second cognitive feature, numerical comparison, resulted in a model that generated 90 items. The third cognitive feature was addition. The addition item model generated 30 items. Fractions is the fourth, and therefore, most complex cognitive feature. The fraction item model generated 18 items. In total, the four item models yielded 243 generated items. For our illustrative example, the four item models are also differentiated by three key item features. Each generated item had a different combination of these three item features. These features were coded for each item and factored into our estimation process because they were expected to affect item difficulty. The item features and their codes (reported in parentheses) include: all start patterns are 0 (0), or not (1); no use of odd numbers (0) or use of odd numbers (1); sum of last digit is less than 10 (0) or sum is greater than 10 (1); some parts are 1/8 (0) or no parts are 1/8 (1); pattern by 10s (0), pattern by 20s and 5s (1), pattern by 15s and 25s (2); 1 group (0), 2 groups (1), 3 groups (2); no odd number (0), one odd number (1), two odd numbers (2); lowest common denominator less than 8 (0) or lowest common denominator greater than 8 (1); first number ends with 0 (0), or not (1); group size of 5 (0) or group size of 10 (1); use of number in multiples of 10 (0) or no number with multiples of 10 (1). These item features, when crossed with the four cognitive features (i.e., four parent items), are shown in Appendix B. They serve as our best guess as to the variables that could affect item difficulty for the generated items in each of the four item models. These item features would need to be validated prior to use in a real item generation study. To compute the difficulty parameter estimate for each of the generated items, four sources of information must be combined. These sources include the cognitive feature effect (estimated in step 1), the item feature coding weight, the item population standard deviation (from step 1), and random error 3. These sources are combined as follows: Difficulty Level for the Generated Item = Cognitive Feature Effect + [(Item Feature Effect) x (Item Population Standard Deviation) x (Random Error)]. Returning to our previous example from step 1, the difficulty level for a generated item with the numerical comparisons cognitive feature and an item feature effect of 0,1,1 (i.e., use of odd number; use of two groups; use of a group size of 5) would be -1.21 [-1.13 + (-0.5) x (0.33) x (0.48)]. The item feature effect code of 0,1,1 is represented as -0.5 to standardize the item feature results in our calculation, given that different cognitive features have different numbers of item features (see Appendix B). This method is then applied to all 243 generated items to yield their item difficulty estimates.
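The combination rule can be sketched as follows. The skill pattern, standardized item feature effect, item population standard deviation, and random error draw are the values given in the example above; because the Table 2 weights are rounded to two decimals, the sketch reproduces the reported result only to within rounding.

```python
FEATURE_WEIGHTS = [              # Table 2 estimates, ordered from least to most complex
    ("number sequencing", -2.06),
    ("numerical comparisons", 0.94),
    ("addition", 0.86),
    ("fractions", 1.03),
]

def cognitive_feature_effect(skill_pattern):
    """Sum of the Table 2 weights for the prerequisite skills an item measures,
    e.g. (1, 1, 0, 0) for an item written to the numerical-comparisons model."""
    return sum(weight * skill for (_, weight), skill in zip(FEATURE_WEIGHTS, skill_pattern))

def generated_item_difficulty(skill_pattern, item_feature_effect,
                              item_population_sd, random_error):
    """Cognitive Feature Effect + [(Item Feature Effect) x (Item Population SD) x (Random Error)]."""
    return (cognitive_feature_effect(skill_pattern)
            + item_feature_effect * item_population_sd * random_error)

# Worked example: numerical-comparisons model, standardized item feature effect -0.5,
# item population SD 0.33, random error draw 0.48 (values taken from the text above).
difficulty = generated_item_difficulty((1, 1, 0, 0), item_feature_effect=-0.5,
                                       item_population_sd=0.33, random_error=0.48)
print(round(difficulty, 2))  # about -1.20 with the rounded weights (reported as -1.21 in the text)
```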

3 The random error component allowed us to introduce error into our analysis, which is how we modeled the LLTM-R using the LLTM estimates from step 1 for our example.

SUMMARY AND FUTURE DIRECTIONS

Internet-based computerized assessment is proliferating. Assessments are now routinely administered over the internet, where students respond to test items containing text, images, tables, diagrams, sound, and video. But the growth of internet-based computerized testing has also focused our attention on the need for new testing procedures and practices because this form of assessment requires a continual supply of new test items. Automatic item generation is the process of using item models to generate test items with the aid of computer technology. Automatic item generation can be used to initially develop item banks and then replenish the banks needed for computer-based testing. The purpose of our paper was to describe seven topics that are central to the development and use of item models for automatic item generation. We defined item model and highlighted related concepts; we described how item models are developed; we presented an item model taxonomy; we illustrated how item models can be used for automatic item generation; we outlined some benefits of using item models; we introduced the idea of an item model bank; and we demonstrated how statistical procedures could be used to calibrate the item parameter estimates for generated items without the need for extensive field or pilot testing. We also attempted to contextualize the growing interest in

Item Models 25 automatic item generation by highlighting the fact that the science of educational assessment is beginning to influence educational measurement theory and practice and by claiming that interdisciplinary forces and factors are beginning to exert a stronger affect on how we solve problems in the discipline of educational assessment. Research on item models is warranted in at least two different areas. The first area is item model development. To our knowledge, there has been no focused research on item model development. Currently, the principles, standards, and practices that guide traditional item development are also recommended for use with item model development. These practices have been used to design and develop item model examples that are cited in the literature (e.g., Bejar et al., 2003; Case & Swanson, 2002; Gierl et al., 2008). But much more research is required on designing, developing, and, most importantly, evaluating the items produced by these models. By working more closely with test development specialists in diverse content areas, researchers can begin to better understand how to design and develop item models by carefully documenting the process. Research must also be conducted to evaluate these item models by focusing on their generative capacity (i.e., the number of items that can be generated from a single item model) as well as their generative veracity (i.e., the usefulness of the generated items, particularly from the view of test development specialists and content experts). The second area is the calibration of generated items using an item modelling approach. As noted by Drasgow et al. (2006), automatic item generation can minimize, if not eliminate, the need for item field or pilot testing because items generated from a parent model can be pre-calibrated, meaning that the statistical characteristics from the parent item model can be applied to the generated items. We illustrated how the LLTM-R could be used to estimate the difficulty parameter for 243 generated items in a diagnostic mathematics program. But a host of other statistical procedures are also available for

Item Models 26 estimating the statistical characteristics of generated items, including the 2PL-constrained model (Embretson, 1999), the hierarchical IRT model (Glas & van der Linden, 2003), the Bayesian hierarchical model (Sinharay, Johnson, & Williamson, 2003; Sinharay & Johnson, 2005), and the expected response function approach (Mislevy, Wingersky, & Sheehan, 1994). These different statistical procedures could be used with the same item models to permit parameter estimate comparisons across generated items, without the use of sample data. This type of study would allow researchers to assess the comparability of the predicted item statistics across the procedures. These statistical procedures could also be used with the same item models to permit parameter estimate comparisons across generated items relative to parameter estimates computed from a sample of examinees who actually wrote the generated items. This type of study would allow researchers to assess the predictive utility of the statistical procedures (i.e., the agreement between the predicted item characteristics on the generated items using a statistical procedure compared to the actual item characteristics on the generated items using examinee response data), which, we expect, will emerge as the “gold standard” for evaluating the feasibility and, ultimately, the success of automatic item generation.

Item Models 27 REFERENCES Bartram, D. (2006). Testing on the internet: Issues, challenges, and opportunities in the field of occupational assessment. In D. Bartram & R. Hambleton (Eds.), Computer-based testing and the internet (pp. 13-37). Hoboken, NJ: Wiley. Bejar, I. I. (1990). A generative analysis of a three-dimensional spatial task. Applied Psychological Measurement, 14, 237-245. Bejar, I. I. (1996). Generative response modeling: Leveraging the computer as a test delivery medium (ETS Research Report 96-13). Princeton, NJ: Educational Testing Service. Bejar, I. I. (2002). Generative testing: From conception to implementation. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development (pp.199-217). Hillsdale, NJ: Erlbaum. Bejar, I. I., Lawless, R., Morley, M. E., Wagner, M. E., Bennett, & R. E., Revuelta, J. (2003). A feasibility study of on-the-fly item generation in adaptive testing. Journal of Technology, Learning, and Assessment, 2(3). Available from http://www.jtla.org. Bennett, R. (2001). How the internet will help large-scale assessment reinvent itself. Educational Policy Analysis Archives, 9, 1-23. Case, S. M., & Swanson, D. B (2002). Constructing written test questions for the basic and clinical sciences (3rd edition). Philadelphia, PA: National Board of Medical Examiners. Downing, S. M., & Haladyna, T. M. (2006). Handbook of test development. Mahwah, NJ: Erlbaum. Drasgow, F., Luecht, R. M., & Bennett, R. (2006). Technology and testing. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 471-516). Washington, DC: American Council on Education. Drasgow, F., & Mattern, K. (2006). New tests and new items: Opportunities and issues. In D. Bartram & R. Hambleton (Eds.), Computer-based testing and the internet (pp. 59-76). Hoboken, NJ: Wiley.

Item Models 28 Embretson, S. E. (1999). Generating items during testing: Psychometric issues and models. Psychometrika, 64, 407-433. Embretson, S. E. (2002). Generating abstract reasoning items with cognitive theory. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development (pp. 219-250). Mahwah, NJ: Erlbaum. Embretson, S. E., & Daniel, R. C. (2008). Understanding and quantifying cognitive complexity level in mathematical problem solving items. Psychological Science Quarterly, 50, 328-344. Embretson, S. E., & Yang, X. (2007). Automatic item generation and cognitive psychology. In C. R. Rao & S. Sinharay (Eds.) Handbook of Statistics: Psychometrics, Volume 26 (pp. 747-768). North Holland, UK: Elsevier. Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374. Gierl, M. J., Wang, C., & Zhou, J. (2008). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees’ cognitive skills in algebra on the SAT©. Journal of Technology, Learning, and Assessment, 6 (6). Retrieved [date] from http://www.jtla.org. Gierl, M. J., Zhou, J., & Alves, C. (2008). Developing a taxonomy of item model types to promote assessment engineering. Journal of Technology, Learning, and Assessment, 7(2). Retrieved [date] from http://www.jtla.org. Gierl, M. J. Alves, C., Roberts, M., & Gotzmann, A. (2009, April). Using judgments from content specialists to develop cognitive models for diagnostic assessments. In J. Gorin (Chair), How to Build a Cognitive Model for Educational Assessments. Paper presented in symposium conducted at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

Item Models 29 Gierl, M. J., Alves, C., & Taylor-Majeau, R. (2010). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees’ skills in mathematics: An operational implementation of cognitive diagnostic assessment. International Journal of Testing, 10, 318-341. Glas, C. A. W., & van der Linder, W. J. (2003). Computerized adaptive testing with item cloning. Applied Psychological Measurement, 27, 247-261. Haladyna, T., & Shindoll, R. (1989). Items shells: A method for writing effective multiple-choice test items. Evaluation and the Health Professions, 12, 97-106. Hively, W., Patterson, H. L., & Page, S. H. (1968). A “universe-defined” system of arithmetic achievement tests. Journal of Educational Measurement, 5, 275-290. Irvine, S. H., & Kyllonen, P. C. (2002). Item generation for test development. Hillsdale, NJ: Erlbaum. Janssen, R. (2010). Modeling the effect of item designs within the Rasch model. In S. E. Embretson (Ed.), Measuring psychological constructs: Advances in model-based approaches (pp. 227-245). Washington DC: American Psychological Association. Janssen, R., Schepers, J., & Peres, D. (2004). Models with item and group predictors. In P. DeBoeck & M. Wilson (Eds.), Explanatory item response models: A generalized linear and non-linear approach (pp. 189-212). New York: Springer. LaDuca, A., Staples, W. I., Templeton, B., & Holzman, G. B. (1986). Item modeling procedures for constructing content-equivalent multiple-choice questions. Medical Education, 20, 53-56. Leighton, J. P., & Gierl, M. J. (in press). The learning sciences in educational assessment: The role of cognitive models. Cambridge, UK: Cambridge University Press. Luecht, R. M. (2006a, May). Engineering the test: From principled item design to automated test assembly. Paper presented at the annual meeting of the Society for Industrial and Organizational Psychology, Dallas, TX.

Item Models 30 Luecht, R. M. (2006b, September). Assessment engineering: An emerging discipline. Paper presented in the Centre for Research in Applied Measurement and Evaluation, University of Alberta, Edmonton, AB, Canada. Luecht, R. M. (April, 2007). Assessment Engineering in Language Testing: From Data Models and Templates to Psychometrics. Invited paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL. Luecht, R. M. (February, 2011). Assessment design and development, version 2.0: From art to engineering. Invited paper presented at the annual meeting of the Association of Test Publishers, Phoenix, AZ. Mislevy, R. J., & Riconscente, M. M. (2006). Evidence-centered assessment design. In S. M. Downing & T. Haladyna (Eds.), Handbook of test development (pp. 61-90). Mahwah, NJ: Erlbaum. Mislevy, R. J., Wingersky, M. S., & Sheehan, K. M. (1994). Dealing with uncertainty about item parameters: Expected response functions (ETS Research Report 94-28-ONR). Princeton, NJ: Educational Testing Service. Schmeiser, C.B., & Welch, C.J. (2006). Test development. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 307-353). Westport, CT: National Council on Measurement in Education and American Council on Education. Singley, M. K., & Bennett, R. E. (2002). Item generation and beyond: Applications of schema theory to mathematics assessment. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development (pp. 361-384). Mahwah, NJ: Erlbaum. Sinharay, S., Johnson, M. S., & Williamson, D. M. (2003). Calibrating item families and summarizing the results using family expected response functions. Journal of Educational and Behavioral Statistics, 28, 295-313.

Item Models 31 Sinharay, S., & Johnson, M. (2005). Analysis of data from an admissions test with item models. (ETS Research Report 05-06). Princeton, NJ: Educational Testing Service. Sireci, S. G., & Zenisky, A. L. (2006). Innovative item formats in computer-based testing: In pursuit of improved construct representation. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp.329-348). Mahwah, NJ: Erlbaum. van der Linden, W., & Glas, C. A. W. (2010). Elements of adaptive testing. New York: Springer. Zenisky, A. L., & Sireci, S. G. (2002). Technological innovations in large-scale assessment. Applied Measurement in Education, 15, 337-362.

Item Models 32 Appendix A Item model #1 in mathematics used to generate isomorphic instances of numerical sequences. If the pattern continues, then the next three numbers should be 700 695 690 685 _____ _____ _____ A. 680, 675, 670 B. 700, 695, 690 C. 680, 677, 675 D. 685, 680, 675 ITEM MODEL VARIABLES Stem If the pattern continues, then the next three numbers should be I1 I1-I2 I1-(2*I2) I1-(3*I2) _____ _____ _____ Elements I1 Value Range: 700-800 by 5 I2 Value Range: 5-25 by 5 Options A= I1 - ( 4 * I2 ), I1 - ( 5 * I2 ), I1 - ( 6 * I2 ) B= I1 - ( 3 * I2 ), I1 - ( 4 * I2 ), I1 - ( 5 * I2 ) C= I1 - ( 4 * I2 ), I1 - ( round( 4.5 * I2 ) , I1 - ( 5 * I2 ) D= I1, I1 - ( 1 * I2 ) , I1 - ( 2 * I2 ) Key A

Item Models 33 Item model #2 in mathematics used to generate isomorphic instances of numerical comparisons. The number that is 1 group of 5 fewer than 201 is ... A. 196 B. 190 C. 197 D. 191 ITEM MODEL VARIABLES Stem The number that is I1 group of I2 fewer than I3 is ... Elements I1 Value Range: 1-3 by 1 I2 Value Range: 5-10 by 5 I3 Value Range: 201-245 by 3 Options A= I3 - ( I2 * I1 ) B= I3 - ( I2 * ( I1 + 1 ) ) C= I3 - ( I2 * I1 ) + 1 D= I3 - ( I2 * ( I1 + 1 ) ) - 1 Key A

Item Models 34 Item model #3 in mathematics used to generate isomorphic instances for addition. What is 15 + 18 ? A. 33 B. 48 C. 32 D. 34 ITEM MODEL VARIABLES Stem What is I1 + I2 ? Elements I1 Value Range: 15-30 by 3 I2 Value Range: 15-30 by 3 Options A= I1 + I2 B= I1 + I2 + 1 C= I1 + I1 + I2 - 1 D= I1 + I1 + I2 Key A

Item Models 35 Item model #4 in mathematics used to generate isomorphic instances for fractions. What fraction of the measuring cup has oil in it?

A. 2/8 B. 2/3 C. 3/10 D. 3/8 ITEM MODEL VARIABLES Stem What fraction of the measuring cup has oil in it? Diagram: I1 of Water and I2 of oil in one cup. Elements I1 Value Range: 0.125-1.0 by 0.125 I2 Value Range: 0.125-1.0 by 0.125 Options A= ( I2 * 8 ) / 8 B= ( I2 * 8 ) + ( ( I1 * 8 ) / 8 ) C= ( I2 * 8 ) + ( ( I1 * 8 ) / 10 ) D= ( I2 * 8) / ( ( I2 * 8 ) + 1 ) Key A

Appendix B

The cognitive feature codes were used to develop the four parent items for our example. The item feature codes serve as variables that could affect the difficulty level for the generated items.

Cognitive Feature Code 1 (Number Sequencing)
  Item Feature 1: 0 = All start patterns are 0; 1 = All start patterns not 0
  Item Feature 2: 0 = Pattern by 10s; 1 = Pattern by 20s and 5s; 2 = Pattern by 15s and 25s
  Item Feature 3: 0 = First number ends with 0; 1 = First number does not end with 0

Cognitive Feature Code 2 (Numerical Comparisons)
  Item Feature 1: 0 = No use of odd number; 1 = Use of odd number
  Item Feature 2: 0 = 1 group less; 1 = 2 groups less; 2 = 3 groups less
  Item Feature 3: 0 = Group size of 10; 1 = Group size of 5

Cognitive Feature Code 3 (Addition)
  Item Feature 1: 0 = Sum of last digit < 10; 1 = Sum of last digit > 10
  Item Feature 2: 0 = No odd number; 1 = One use of odd numbers; 2 = Two use of odd numbers
  Item Feature 3: 0 = Use of number in multiples of 10; 1 = No number with multiples of 10

Cognitive Feature Code 4 (Fractions)
  Item Feature 1: 0 = Some parts are 1/8; 1 = No parts are 1/8
  Item Feature 2: 0 = Lowest common denominator < 8; 1 = Lowest common denominator = 8
