Balanced Item Pool Assembly in Computerized Adaptive Testing (RR ...

LSAC RESEARCH REPORT SERIES

Balanced Item Pool Assembly in Computerized Adaptive Testing

Dmitry I. Belov

Law School Admission Council Research Report 07-04 October 2007

A publication of the Law School Admission Council

The Law School Admission Council (LSAC) is a nonprofit corporation whose members are more than 200 law schools in the United States, Canada, and Australia. Headquartered in Newtown, PA, USA, the Council was founded in 1947 to facilitate the law school admission process. The Council has grown to provide numerous products and services to law schools and to more than 85,000 law school applicants each year. All law schools approved by the American Bar Association (ABA) are LSAC members. Canadian law schools recognized by a provincial or territorial law society or government agency are also members. Accredited law schools outside of the United States and Canada are eligible for membership at the discretion of the LSAC Board of Trustees. © 2009 by Law School Admission Council, Inc. All rights reserved. No part of this work, including information, data, or other portions of the work published in electronic form, may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage and retrieval system, without permission of the publisher. For information, write: Communications, Law School Admission Council, 662 Penn Street, Box 40, Newtown, PA, 18940-0040. This study is published and distributed by LSAC. The opinions and conclusions contained in this report are those of the author(s) and do not necessarily reflect the position or policy of LSAC.


Table of Contents

Executive Summary
Introduction
Test Assembly
Definitions
Integer Programs
Integer Programming Solvers
A Formal Statement of the Problems and an Outline of the Algorithms
    Direct Problem
    Inverse Problem
    Direct Algorithm
    Inverse Algorithm
Computer Experiments
Discussion
References


Executive Summary

In computerized adaptive testing (CAT), test items (i.e., questions) for administration to an individual test taker are selected from a pool of items with the goal of matching the difficulty level of the test to the ability level of the test taker. In the recent literature on CAT, researchers have developed methods for designing the CAT item pool as a special set of nonoverlapping forms reflecting the skill levels of an assumed population of test takers. The input includes the original item pool called the master pool, required test form characteristics (e.g., content coverage), the assumed test-taker ability levels, and the number of nonoverlapping forms to assemble.

Two problems with this approach have been identified. First, since these methods produce test forms that maximize the measurement precision (called information in the mathematical model applied here) at corresponding ability levels, the best items from the master pool are depleted. Second, since all forms are assembled simultaneously, the optimization problem is quite large and potentially intractable.

To resolve both issues, this research introduces an additional input parameter: a threshold on the degree of information for each form at the corresponding ability level. This parameter allows the large optimization problem to be subdivided into smaller subproblems. By varying this parameter, both measurement accuracy and master pool utilization can be balanced. Then the direct problem identifies the maximum number of such nonoverlapping forms. When the master pool, test assembly constraints, and information threshold are fixed, there exists a certain ability density that will maximize the objective of the direct problem among all possible densities. The inverse problem is to identify such a density. Based on combinatorial optimization techniques, direct and inverse algorithms are developed that provide a feasible solution to the direct and inverse problems.

Computer experiments with a pool of Law School Admission Test (LSAT) items and LSAT assembly constraints are presented. The direct and inverse algorithms provide testing organizations with effective means to maintain their master pools and produce CAT pools that balance measurement accuracy and item exposure.

Introduction

Item pool design for computerized adaptive testing (CAT) has been addressed by a number of authors (Stocking & Swanson, 1998; Veldkamp & van der Linden, 2000; Ariel, Veldkamp, & van der Linden, 2004; van der Linden, Ariel, & Veldkamp, 2006). An important issue with CAT is to create an item pool (called a CAT pool), used for CAT administrations, from a larger item pool (called a master pool) such that two conflicting goals are simultaneously met: (a) item exposure during CAT administration is minimized; and (b) for each test taker, the set of items administered by the CAT meets content constraints and maximizes information at that test taker's ability level. Current approaches include the blueprint design by Veldkamp and van der Linden (2000); constructing rotating pools by Ariel, Veldkamp, and van der Linden (2004); and building a CAT pool as a set of nonoverlapping forms satisfying content constraints and reflecting the expected ability density by van der Linden, Ariel, and Veldkamp (2006).

This paper is an extension of the last approach. Given a master pool, test assembly constraints, and the expected ability density, a CAT pool is assembled as a specified number of nonoverlapping forms reflecting the density. This heuristic can be justified. First, each form satisfies content constraints used in CAT. Therefore, an adaptive test with content constraints will be feasible. Second, each form maximizes information at a certain ability level, and these levels are distributed according to the expected ability density. Therefore, an item exposure control will be reasonable. Empirical studies have demonstrated strong advantages of using such a CAT pool. For example, van der Linden, Ariel, and Veldkamp (2006) reported a CAT with nearly uniform bias and mean-squared error functions for the ability estimator and item exposure rates satisfying the target for all items in the CAT pool.

However, these authors assembled all nonoverlapping forms simultaneously. Therefore, the overall number of constraints is proportional to the product of the number of test assembly constraints and the number of forms to assemble; in practice, this creates an intractable optimization problem. In addition, each form maximizes information at the corresponding ability level. This causes a depletion of the more informative items from the master pool. Van der Linden, Ariel, and Veldkamp (2006) reported the assembly of a CAT pool with 10 nonoverlapping forms (500 items total) from a master pool of 5,316 items (9% utilization of the master pool). Since the most informative items were withdrawn from the master pool, the next CAT pool assembled from the 4,816 leftover items will have a much lower information function than the first CAT pool. Therefore, each ensuing CAT pool has a much lower measurement quality than the previous one.

Taking into account the substantial cost of developing new items, it would be more practical to control the depletion of the informative items, which leads to the new approach presented in this paper. We introduce an additional input parameter: an information threshold τ for each form at the corresponding ability level. This parameter allows subdividing the large optimization problem into smaller subproblems. By varying this parameter, one can balance master pool utilization and measurement accuracy. Then the direct problem is to determine the maximum number of such nonoverlapping forms. When the master pool, test assembly constraints, and information threshold are fixed, there exists a certain ability density that will maximize the objective of the direct problem among all possible densities. The inverse problem is to identify such a density f(θ). A solution f(θ) to the inverse problem can be used to measure a misfit μ between the population density g(θ) and the master pool. The misfit μ is computed by


\[
w(\theta) = g(\theta) - f(\theta), \qquad
0 \le \mu = \frac{1}{2} \int_{-\infty}^{+\infty} \lvert w(x) \rvert \, dx \le 1.
\tag{1}
\]

Function w(θ ) provides the test developer with guidance as to the properties of items needed to reduce the misfit. The approach consists of two steps: (a) find a maximum of w(θ ) ; and (b) write new items with high information in the neighborhood of the maximum to decrease the misfit. This will save a substantial amount of time and resources for testing organizations. The following sections outline assembly of a test form, give definitions, describe integer programs and integer programming solvers used in this paper, and formulate the direct and inverse problems. This is followed by a description of the corresponding direct and inverse algorithms. Then the results of computer experiments are presented. A summary discussion is given in the final section of the paper.
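Equation (1) can be approximated numerically once both densities are available. The following is a minimal sketch, assuming the integrand is the absolute value of w (otherwise the integral of a difference of two densities would vanish). The function names, the integration grid, and the two normal densities are hypothetical and serve only as an illustration.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def misfit(g, f, lo=-6.0, hi=6.0, step=0.01):
    """Approximate mu = 0.5 * integral |g(x) - f(x)| dx with a midpoint rule.

    Returns (mu, theta_star), where theta_star is the grid point at which
    w(theta) = g(theta) - f(theta) is largest.
    """
    n = int(round((hi - lo) / step))
    total, best_w, best_theta = 0.0, -math.inf, lo
    for i in range(n):
        x = lo + (i + 0.5) * step
        w = g(x) - f(x)
        total += abs(w) * step
        if w > best_w:
            best_w, best_theta = w, x
    return 0.5 * total, best_theta

# Hypothetical densities: population g = N(0, 1), pool-implied f = N(0.5, 1).
mu, theta_star = misfit(normal_pdf, lambda x: normal_pdf(x, mean=0.5))
```

In this illustration, theta_star marks the ability region where the population outweighs what the pool supports, which is where step (b) would direct new item writing.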

Test Assembly

A test form (or just form) is composed of items (test questions) and passages (stimulus material that the items refer to). A database of items, passages, and their associated characteristics is called an item bank or master pool. Consider a master pool where the three-parameter logistic (3PL) model (Lord, 1980) is used for each item. In this study, a form is a sequence of items from the master pool satisfying the following constraints, most common in practice:

Number of passages: The number of passages in a form must be within a specified range.
Number of items: The number of items in a form must be within a specified range.
Item-passage specifications: The number of items associated with a passage must be within a specified range.
Topic: A form must have a specified number of passages for each topic area.
Cognitive skill distribution: Items in a form must satisfy the specific distribution of the cognitive skills being tested.
Enemies: Some item pairs and passage pairs must not be included on the same form.
Word count: The total number of words in each form must be within a specified range.
Answer key count distribution: The distribution of the multiple-choice answer keys is constrained for each form.
Item response theory (IRT) constraint: The Fisher information for a form must exceed a given information threshold τ for a given ability level (Figure 1). Note that there is no maximization of the information; only a threshold must be exceeded. This requirement encourages an economical use of informative items during the assembly of multiple forms.

Since the information function is smooth, a form satisfying the IRT constraint at an ability level θ also satisfies it over an ability range [l, r] containing θ. Given θ, a binary search is used to find the bounds l and r.
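The IRT constraint and the binary search for the bounds can be sketched as follows. This is an illustration under stated assumptions, not the report's implementation: it uses the standard 3PL item information formula with the scaling constant D = 1.7 (the report does not state the constant), it assumes the form information crosses τ at most once on each side of θ within [−3, 3], and the 25-item form is hypothetical.

```python
import math

D = 1.7  # scaling constant; an assumption, the report does not state it

def item_information(theta, a, b, c):
    """Fisher information of one 3PL item at ability theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def form_information(theta, items):
    """Form information is the sum of the item informations."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)

def threshold_range(items, theta, tau, lo=-3.0, hi=3.0, tol=1e-4):
    """Return (l, r) around theta where information >= tau, or None.

    Assumes the information crosses tau at most once on each side of theta.
    """
    if form_information(theta, items) < tau:
        return None

    def boundary(inside, outside):
        # Invariant: information >= tau at `inside` and < tau at `outside`.
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if form_information(mid, items) >= tau:
                inside = mid
            else:
                outside = mid
        return inside

    l = lo if form_information(lo, items) >= tau else boundary(theta, lo)
    r = hi if form_information(hi, items) >= tau else boundary(theta, hi)
    return l, r

# Hypothetical 25-item form: identical items with a = 1.0, b = 0.0, c = 0.2.
form = [(1.0, 0.0, 0.2)] * 25
bounds = threshold_range(form, theta=0.0, tau=4.0)
```

The returned bounds always lie on the side where the threshold is met, so both endpoints themselves satisfy the IRT constraint.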


FIGURE 1. An illustration of IRT constraint: The information function exceeds an information threshold τ = 4 along ability range [l , r ] = [-0.85, 1.45] .

Given an item pool and test assembly constraints, a form is a solution to a finite system of inequalities induced by the constraints. All solutions to the system constitute a finite feasible set. Consider a uniform test assembly, where each form from the feasible set has an equal probability of being assembled. The uniform test assembly allows the construction of a representative subset of the feasible set, which is crucial for the analysis below. This paper employs a Monte Carlo test assembler by Belov and Armstrong (2005) to attain uniformity. For a fixed-length test, however, the following integer program can be considered as an approximation of uniform test assembly:

\[
\begin{aligned}
\text{maximize}\quad & \sum_{j=1}^{h} \alpha_j x_j \\
\text{subject to}\quad & \sum_{j=1}^{h} x_j = b, \\
& x_j \in \{0,1\}, \quad j = 1,\dots,h, \\
& h > b > 0, \\
& \text{any other constraints for } x = (x_1, x_2, \dots, x_h)^T,
\end{aligned}
\tag{2}
\]

where coordinates α_1, α_2, ..., α_h of vector α are independent random variables uniformly distributed on [0, 1]; h is the size of the master pool; and the number of items in a test is b. Vector α is resampled each time before Problem (2) is solved. More on uniform assembly and the properties of Problem (2) can be found in Belov (2008).

By changing the parameter τ, one can balance master pool utilization and measurement accuracy for CAT administrations: the higher τ is, the more informative the items chosen for assembled forms, which leads to smaller standard errors of measurement; the lower τ is, the larger the number of forms that can be assembled, which leads to a higher utilization of the master pool. In this study, a Monte Carlo method computes τ as an expectation of Fisher information over the uniform sample from the feasible set. However, the actual value of τ should be selected by the testing organization in order to satisfy bounds on standard errors of measurement. The following algorithm simulates random variable γ; its expectation is an estimate for τ.


Algorithm 1 (one simulation for the information threshold):

Input: Master pool, test assembly constraints without IRT constraint.
Output: Random variable γ.
Step 1: Generate random θ uniformly distributed on [−3, 3].
Step 2: Assemble a form uniformly.
Step 3: Set γ to the Fisher information of the form at ability level θ.
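Averaging repeated runs of Algorithm 1 gives the Monte Carlo estimate of τ. The sketch below is a simplified illustration: uniform assembly is replaced by Problem (2) with only the fixed-length constraint, in which case the optimum is simply the b items with the largest random weights. The 3PL information formula with D = 1.7, the function names, and the randomly generated pool are assumptions for illustration, not the LSAT pool or the assembler used in the report.

```python
import math
import random

D = 1.7  # assumed scaling constant for the 3PL model

def item_information(theta, a, b, c):
    """Fisher information of one 3PL item at ability theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def assemble_uniform(pool, length, rng):
    """Problem (2) with only the length constraint: draw alpha_j ~ U[0, 1]
    and keep the `length` items with the largest weights."""
    weights = [rng.random() for _ in pool]
    order = sorted(range(len(pool)), key=lambda j: weights[j], reverse=True)
    return [pool[j] for j in order[:length]]

def simulate_gamma(pool, length, rng):
    """One run of Algorithm 1; returns the random variable gamma."""
    theta = rng.uniform(-3.0, 3.0)                                # Step 1
    form = assemble_uniform(pool, length, rng)                    # Step 2
    return sum(item_information(theta, *item) for item in form)   # Step 3

def estimate_tau(pool, length, runs, seed=0):
    """Average gamma over `runs` independent simulations."""
    rng = random.Random(seed)
    return sum(simulate_gamma(pool, length, rng) for _ in range(runs)) / runs

# Hypothetical pool of 200 items with (a, b, c) drawn at random.
_rng = random.Random(1)
pool = [(_rng.uniform(0.4, 1.2), _rng.gauss(0.0, 1.0), _rng.uniform(0.1, 0.25))
        for _ in range(200)]
tau_hat = estimate_tau(pool, length=25, runs=1000)
```

With further content constraints, the top-b shortcut no longer applies and Problem (2) would have to be passed to an IP solver or a Monte Carlo assembler.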

Definitions

The following definitions are referred to throughout the text:

Definition 1 (sequential assembly): Assembling forms one by one.

Definition 2 (overlapping and nonoverlapping forms): If two forms have common items they are called overlapping; otherwise, they are called nonoverlapping.

In sequential assembly, if the items from each assembled form are withdrawn from the master pool, then multiple nonoverlapping forms are produced; otherwise, if items are not withdrawn, overlapping forms are possible. When producing nonoverlapping forms, sequential assembly eventually fails because there is a lack of forms available from the pool or because a limit for search iterations has been exceeded.

Definition 3 (ability levels induced by density): Given a positive integer k and a density p(θ), a set of ability levels θ_1 < θ_2 < ... < θ_k is induced by the density p(θ) if

\[
\int_{-\infty}^{\theta_1} p(x)\,dx = \frac{1}{2k}
\]

and for each i = 1, 2, ..., k − 1

\[
\int_{\theta_i}^{\theta_{i+1}} p(x)\,dx = \frac{1}{k}.
\]

An example of 10 ability levels induced by N (0,1) density is illustrated in Figure 2. To compute ability levels for an arbitrary density function, the function is transformed to a step function with support [−3, 3]; this study uses a 0.05 discretization step.
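Definition 3 implies that the cumulative probability at the ith level is (2i − 1)/(2k), so for a normal density the levels follow directly from the inverse cumulative distribution function. The sketch below shows that shortcut and a grid-based variant for an arbitrary density on [−3, 3] with a 0.05 step; the function names and the linear interpolation inside a grid cell are illustrative choices, not taken from the report.

```python
from statistics import NormalDist

def induced_levels_normal(k, mean=0.0, sd=1.0):
    """Levels with cumulative probabilities 1/(2k), 3/(2k), ..., (2k-1)/(2k)."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((2 * i - 1) / (2 * k)) for i in range(1, k + 1)]

def induced_levels_grid(density, k, lo=-3.0, hi=3.0, step=0.05):
    """Same levels for an arbitrary density, approximated by a step function
    on [lo, hi] that is renormalized to integrate to 1."""
    n = int(round((hi - lo) / step))
    mids = [lo + (i + 0.5) * step for i in range(n)]
    mass = [density(x) * step for x in mids]
    total = sum(mass)
    levels, cum, i = [], 0.0, 0
    for r in range(1, k + 1):
        target = (2 * r - 1) / (2 * k) * total
        # Advance until cell i contains the target cumulative mass.
        while i < n - 1 and cum + mass[i] < target:
            cum += mass[i]
            i += 1
        # Interpolate linearly inside the cell.
        frac = (target - cum) / mass[i] if mass[i] > 0 else 0.0
        levels.append(lo + (i + min(max(frac, 0.0), 1.0)) * step)
    return levels

levels = induced_levels_normal(3)
grid_levels = induced_levels_grid(NormalDist().pdf, 3)
```

For k = 3 and N(0,1), the first function gives levels near −0.967, 0, and 0.967; the grid version differs slightly because the density is truncated to [−3, 3].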


FIGURE 2. Ten ability levels, induced by N(0,1) density, are marked on the ability axis (θ). Each region under the density curve between two vertical dashed lines has size 1/10 (see Definition 3).

Definition 4 (forms reflecting ability levels): Given k forms and k ability levels {θ_1, θ_2, ..., θ_k}, forms reflect the ability levels if the ith form has Fisher information exceeding threshold τ at θ_i, i = 1, 2, ..., k.

Definition 5 (forms reflecting density): If in Definition 4 the ability levels {θ_1, θ_2, ..., θ_k} are induced by an ability density, then we say that forms reflect the ability density.

For 10 ability levels induced by N(0,1) (see Figure 2), each of 10 forms must be assigned to a single θ_i, i = 1, 2, ..., 10. The ith form may qualify for more than one ability level, because the information function is usually smooth and can exceed the threshold along an ability range [l_i, r_i] (see Figure 1). However, the assignment must be one-to-one, that is, one form to one ability level. Definitions 3–5 generalize the mathematical model by van der Linden, Ariel, and Veldkamp (2006, pp. 82–83). Indeed, as the information threshold increases, the corresponding range [l_i, r_i] shrinks, and the ability level λ_i ∈ [l_i, r_i] at which the Fisher information of the ith assigned form is maximized converges to θ_i ∈ [l_i, r_i]. Thus, as the information threshold increases, the ability levels λ_i, i = 1, 2, ..., k, will satisfy Definition 3.

Integer Programs

Given a set of forms, the identification of the biggest subset of nonoverlapping forms can be formulated as the following integer program (IP):

\[
\begin{aligned}
\text{maximize}\quad & k \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} y_j \le 1, \quad i = 1,\dots,m, \\
& y_j \in \{0,1\}, \quad j = 1,\dots,n, \\
& \sum_{j=1}^{n} y_j = k,
\end{aligned}
\tag{3}
\]

where n is the number of forms in the given set, m is the number of items used in these forms, and the variable y_j = 1 when the jth form is included in the solution (y_j = 0 otherwise); coefficient a_ij = 1 if the jth form contains the ith item (a_ij = 0 otherwise). Problem (3) is called the maximum set packing problem, known to be NP-hard (Garey & Johnson, 1979).
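For a handful of forms, Problem (3) can be checked by exhaustive search. The sketch below is only an illustration (exponential in the number of forms, and not the branch-and-bound solver used in the report); the function name is hypothetical, and the data are the five forms A to E from the example that follows in the text.

```python
from itertools import combinations

def max_set_packing(forms):
    """Exhaustive solution of Problem (3): a largest family of pairwise
    disjoint forms. `forms` maps a form name to its set of items.
    Exponential in the number of forms, so suitable for tiny inputs only."""
    names = sorted(forms)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            items = [i for name in subset for i in forms[name]]
            if len(items) == len(set(items)):  # no item is used twice
                return list(subset)
    return []

forms = {
    "A": {1, 2, 3}, "B": {4, 5, 6}, "C": {7, 8, 9},
    "D": {1, 4, 7}, "E": {5, 8, 10},
}
packing = max_set_packing(forms)  # ['A', 'B', 'C']
```

Searching from the largest subset size downward means the first disjoint subset found is already of maximum size.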


Consider an abstract example of the maximum set packing problem. Given a set of five forms {A, B, C, D, E} consisting of three items each: A = {1, 2, 3}, B = {4, 5, 6}, C = {7, 8, 9}, D = {1, 4, 7}, and E = {5, 8, 10}, where items are denoted by numbers, Problem (3) is instantiated as follows:

\[
\begin{aligned}
\text{maximize}\quad & k \\
\text{subject to}\quad &
\begin{pmatrix}
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{pmatrix}
\le
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \\
& y_1, y_2, y_3, y_4, y_5 \in \{0,1\}, \\
& y_1 + y_2 + y_3 + y_4 + y_5 = k.
\end{aligned}
\]

The following pairs of forms overlap: (A, D), (B, D), (B, E), (C, D), and (C, E). In the maximum set packing problem, we ask for a subset of maximum size containing only nonoverlapping forms. For this example, the solution is

\[
y = (1, 1, 1, 0, 0)^T,
\]

which corresponds to subset {A, B, C}.

Given a set of forms, the identification of the biggest subset of nonoverlapping forms, such that they reflect a given density, can be formulated as the following modification of Problem (3):

\[
\begin{aligned}
\text{maximize}\quad & k \\
\text{subject to}\quad & \sum_{r=1}^{k}\sum_{j=1}^{n} a_{ij} z_{rj} \le 1, \quad i = 1,\dots,m, \\
& z_{rj} \in \{0,1\}, \quad r = 1,\dots,k;\ j = 1,\dots,n, \\
& \sum_{r=1}^{k}\sum_{j=1}^{n} b_{rj} z_{rj} = k, \\
& \sum_{j=1}^{n} b_{rj} z_{rj} \le 1, \quad r = 1,\dots,k, \\
& \sum_{r=1}^{k} b_{rj} z_{rj} \le 1, \quad j = 1,\dots,n,
\end{aligned}
\tag{4}
\]

where the set of k ability levels {θ_1, θ_2, ..., θ_k} is induced by the density and the coefficient b_rj = 1 if the information of the jth form exceeds τ at ability level θ_r, r = 1, ..., k; otherwise, b_rj = 0. Decision variable z_rj = 1 if the jth form is assigned to ability level θ_r, r = 1, ..., k; otherwise, z_rj = 0. The last two inequalities from Problem (4) require a one-to-one assignment between forms and ability levels.

In Problem (4), the values of the coefficients b_rj and the numbers of inequalities and variables all depend on the value of k being maximized; therefore, it is difficult to solve directly. However, one can immediately see that for the same set of n forms the size of an optimal solution to Problem (3) is an upper bound for an optimal solution to Problem (4). This fact will be a foundation for the two approaches for solving Problem (4). The following algorithm solves Problem (4) as a series of its relaxations:

Algorithm 2 (finding the maximum subset of nonoverlapping forms reflecting an ability density):

Input: Ability density, set of n possibly overlapping forms.
Output: Maximum subset of nonoverlapping forms satisfying Definition 5.
Step 1: Solve Problem (3), where the resulting optimal solution is k nonoverlapping forms.
Step 2: Solve the following relaxation of Problem (4):

\[
\begin{aligned}
\text{maximize}\quad & \sum_{r=1}^{k}\sum_{j=1}^{n} z_{rj} \\
\text{subject to}\quad & \sum_{r=1}^{k}\sum_{j=1}^{n} a_{ij} z_{rj} \le 1, \quad i = 1,\dots,m, \\
& z_{rj} \in \{0,1\}, \quad r = 1,\dots,k;\ j = 1,\dots,n, \\
& \sum_{r=1}^{k}\sum_{j=1}^{n} b_{rj} z_{rj} = k, \\
& \sum_{j=1}^{n} b_{rj} z_{rj} \le 1, \quad r = 1,\dots,k, \\
& \sum_{r=1}^{k} b_{rj} z_{rj} \le 1, \quad j = 1,\dots,n.
\end{aligned}
\tag{5}
\]

Step 3: If an optimal solution of Problem (5) is found, then stop; otherwise, decrement k and, if k > 0, restate Problem (5) by recomputing ability levels {θ_1, θ_2, ..., θ_k} and coefficients b_rj and go to Step 2.

Problem (5) is not dynamic because k is fixed. Algorithm 2 finds an optimal solution to Problem (4) or stops when k = 0. The next section will discuss IP solvers including the one that solves Problem (4) directly.

Consider running Algorithm 2 on the previous example when the ability density is N(0,1). After Step 1, k = 3 because the maximum subset of nonoverlapping forms is {A, B, C}. Then the corresponding ability levels are θ_1 = −0.967, θ_2 = 0, and θ_3 = 0.967. Assume that form A exceeds the information threshold at θ_1 and θ_2; B at θ_2; C at θ_2 and θ_3; D at θ_1; and E at θ_1 and θ_2. Then the matrix b_rj, r = 1, ..., 3, j = 1, ..., 5, is equal to:

\[
\begin{pmatrix}
1 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 0
\end{pmatrix}.
\]

The corresponding instance of Problem (5) has an optimal solution:

\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{pmatrix},
\]

which corresponds to subset {A, B, C}, where form A is assigned to θ_1, B to θ_2, and C to θ_3.
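The worked example can be reproduced with a small brute-force version of Algorithm 2. This sketch replaces the IP solver with exhaustive enumeration, so it only suits tiny inputs. The ability ranges assigned to forms A to E are hypothetical values chosen so that they yield the same matrix b_rj as in the example for k = 3; the function names are likewise illustrative.

```python
from itertools import combinations, permutations
from statistics import NormalDist

def induced_levels(k):
    """Ability levels induced by N(0,1) (Definition 3)."""
    return [NormalDist().inv_cdf((2 * i - 1) / (2 * k)) for i in range(1, k + 1)]

def disjoint(forms, subset):
    """True if no item appears in two forms of the subset."""
    items = [i for name in subset for i in forms[name]]
    return len(items) == len(set(items))

def algorithm2(forms, ranges):
    """Return {form: ability level} for a largest subset of nonoverlapping
    forms with a one-to-one assignment to the induced levels."""
    names = sorted(forms)
    # Step 1: Problem (3) by exhaustive search gives the starting k.
    k = max((len(s) for n in range(1, len(names) + 1)
             for s in combinations(names, n) if disjoint(forms, s)), default=0)
    while k > 0:
        levels = induced_levels(k)
        # Step 2: for fixed k, look for k disjoint forms and an assignment.
        for subset in combinations(names, k):
            if not disjoint(forms, subset):
                continue
            for order in permutations(subset):
                if all(ranges[f][0] <= t <= ranges[f][1]
                       for f, t in zip(order, levels)):
                    return dict(zip(order, levels))
        k -= 1  # Step 3: no solution for this k, so try a smaller one.
    return {}

forms = {"A": {1, 2, 3}, "B": {4, 5, 6}, "C": {7, 8, 9},
         "D": {1, 4, 7}, "E": {5, 8, 10}}
# Hypothetical ranges [l, r] where each form's information exceeds tau.
ranges = {"A": (-1.5, 0.3), "B": (-0.5, 0.5), "C": (-0.2, 1.5),
          "D": (-2.0, -0.5), "E": (-1.2, 0.2)}
assignment = algorithm2(forms, ranges)
```

Here the search stops at k = 3 with A, B, and C assigned to the first, second, and third ability levels, matching the solution matrix above.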


Integer Programming Solvers

Problems (2)–(5) are examples of integer programs. To solve an integer linear or quadratic program, a commercial IP solver is usually used: for example, CPLEX (ILOG, Inc., 2003). Problems (2), (3), and (5) can be solved this way; for Problem (4), however, Algorithm 2 should be applied.

In this paper, a Monte Carlo test assembler is employed (Belov & Armstrong, 2005). It is based on pure random search, which provides each form with an equal probability of being assembled. This is crucial for preparing n forms that are representative of all forms available from a given master pool.

Problem (3) is solved with the branch-and-bound algorithm (Belov & Armstrong, 2006). This algorithm performs an enumerative search in a space of size 2^n. Employing various lower and upper bounds accelerates the convergence. The lower bound is the size of the largest subset of nonoverlapping forms found so far. The upper bound is an upper estimate of the size of an optimal solution. Given the lower bound, the upper bound helps avoid a search through a set of forms with no optimal solution inside.

In order to solve Problem (4), we modify the branch-and-bound algorithm by Belov and Armstrong (2006). Clearly, for the same set of forms, the size of an optimal solution to Problem (4) cannot exceed the size of an optimal solution to Problem (3). Thus, an upper bound for Problem (3) is an upper bound for Problem (4). However, during the optimization, each subset of nonoverlapping forms having greater size than the lower bound has to be checked to see if it satisfies Definition 5. If it does, then the lower bound is updated with this subset; otherwise, the subset is filtered such that it will satisfy Definition 5, and then the lower bound is updated, if necessary.

We use the following notation to describe the algorithms. Capital letters A, B, C, ... denote sets, and small letters a, b, c, ... denote scalars, functions, and elements of a set. The number of elements in a set S is denoted by |S|; ∅ denotes an empty set; := is an assignment operator. The following algorithm performs the filtering:

Algorithm 3 (filtering forms):

Input: Lower bound L and the set of nonoverlapping forms Q, such that |Q| > L, each form q ∈ Q having an ability range [l_q, r_q] where the form's information exceeds τ; forms are presorted in ascending order of the left limit l.
Output: Subset S ⊆ Q satisfying Definition 5.
Step 1: Set k := |Q|.
Step 2: Build vector {θ_1, θ_2, ..., θ_k} induced by the density.
Step 3: Run Algorithm 4 to compute S given Q and {θ_1, θ_2, ..., θ_k}.
Step 4: If S ≠ ∅ then set L := |S| and stop.
Step 5: Set k := k − 1, and if k > L then go to Step 2; otherwise, stop.

Algorithm 4 (finding k-match):

Input: Set of nonoverlapping forms Q and vector {θ_1, θ_2, ..., θ_k}.
Output: Subset S ⊆ Q, |S| = k, satisfying Definition 5, or ∅ if no such subset exists.
Step 1: Set i := 1.
Step 2: Construct subset W ⊆ Q such that θ_i ∈ [l_w, r_w], w ∈ W.
Step 3: If W = ∅, then set S := ∅ and stop.
Step 4: Find form s ∈ W with the minimal right limit r_s.
Step 5: Add form s to S.
Step 6: If i < k, then set i := i + 1 and go to Step 2; otherwise, stop.


For each k = |Q|, |Q| − 1, ..., L + 1, Algorithm 3 employs Algorithm 4 to check if Q contains a subset of size k satisfying Definition 5. Algorithm 4 solves a k-matching problem on a bipartite graph G(V_1, V_2, E), where V_1 contains |Q| forms and V_2 contains k ability levels; an edge from E connects a form q with an ability level θ if θ ∈ [l_q, r_q]. A solution to the k-matching problem consists of k disjoint edges. Since forms are presorted, the complexity of Algorithm 4 is O(|Q| + |E|). Note that Problems (4) and (5) also require that a k-match be found.
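Algorithms 3 and 4 can be sketched in a few lines. This is an unoptimized illustration: it rescans all forms for every ability level rather than exploiting the presorted order, so it does not attain the stated complexity. It also assumes that a form already placed in S is excluded from later candidate sets W, which the one-to-one requirement implies but the steps do not spell out. The function names, the use of N(0,1) for the induced levels, and the sample ranges are hypothetical.

```python
from statistics import NormalDist

def induced_levels(k):
    """Ability levels induced by N(0,1) (Definition 3)."""
    return [NormalDist().inv_cdf((2 * i - 1) / (2 * k)) for i in range(1, k + 1)]

def k_match(forms, levels):
    """Algorithm 4: greedy k-match. `forms` is a list of (name, l, r) and
    `levels` is ascending. Returns the matched names, or [] if none exists."""
    chosen, used = [], set()
    for theta in levels:
        # Candidate set W; forms already in S are skipped (assumption).
        candidates = [f for f in forms
                      if f[0] not in used and f[1] <= theta <= f[2]]
        if not candidates:
            return []
        best = min(candidates, key=lambda f: f[2])  # minimal right limit
        chosen.append(best[0])
        used.add(best[0])
    return chosen

def filter_forms(forms, lower_bound, levels_for=induced_levels):
    """Algorithm 3: try k = |Q|, |Q| - 1, ..., lower_bound + 1."""
    for k in range(len(forms), lower_bound, -1):
        match = k_match(forms, levels_for(k))
        if match:
            return match
    return []

# Hypothetical nonoverlapping forms with their ability ranges.
q = [("A", -1.5, 0.3), ("B", -0.5, 0.5), ("C", -0.2, 1.5)]
subset = filter_forms(q, lower_bound=0)  # ['A', 'B', 'C']
```

Choosing the candidate with the smallest right limit keeps forms with wider reach available for the higher ability levels still to be matched.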

A Formal Statement of the Problems and an Outline of the Algorithms

Now we are ready to formulate the direct and inverse problems and outline an approach to solving each.

Direct Problem

Given a master pool, test assembly constraints, and an expected ability density, assemble the maximum number of nonoverlapping forms reflecting this ability density (see Definition 5).

Given k nonoverlapping forms to assemble, information threshold τ, and ability density p(θ), it is simple to build a CAT pool. First, compute k ability levels {θ_1, θ_2, ..., θ_k} induced by density p(θ). Second, for each θ_i, i = 1, 2, ..., k, assemble a form with Fisher information exceeding threshold τ at that θ_i, and in the case of successful assembly, withdraw the items of the assembled form from the master pool. After k successful assemblies, the resulting CAT pool will contain k nonoverlapping forms reflecting density p(θ). For a large k, however, the withdrawal of items after each successful assembly can block further assembly of nonoverlapping forms; that is, it can make the next assembly infeasible. This paper addresses this difficulty by the following steps: (a) assemble multiple forms, possibly overlapping; and (b) find a maximum subset of nonoverlapping forms reflecting the density p(θ). Thus, to solve the direct problem we will assemble n forms suited to the ability density and then solve Problem (4).

Inverse Problem

Given a master pool and test assembly constraints, identify the ability density that will maximize the objective of the direct problem. To solve the inverse problem we will assemble n forms suited to the uniform ability density and then solve Problem (3). The resulting forms will be used to construct the desired density.

Direct Algorithm

A feasible solution to the direct problem is provided by the following:

Algorithm 5:

Input: Ability density, master pool, test assembly constraints, and number η of sets of nonoverlapping forms to assemble.
Output: Nonoverlapping forms reflecting the ability density.
Step 1: Generate a random value of θ from the ability density.
Step 2: Assemble a form such that its information at θ is greater than τ.
Step 3: In the case of success, withdraw this form's items from the pool and go to Step 1.
Step 4: Decrement η.
Step 5: If η > 0, then return all items of the assembled forms back to the master pool and go to Step 1.
Step 6: Find among all assembled forms the largest subset of nonoverlapping forms reflecting the ability density; that is, solve Problem (4).

Note that Steps 1–3 perform a sequential assembly of nonoverlapping forms. However, because of Step 5, forms considered at Step 6 may overlap. As a result of Step 1, each of the η sets of nonoverlapping forms tends to reflect the ability density. The major advantage of the direct algorithm over the approach by van der Linden, Ariel, and Veldkamp (2006) is that multiple forms are assembled one by one and the size of Problem (4) is independent of the number of test assembly constraints, which makes the direct problem considerably more tractable. This permits the handling of very large master pools and assembling a greater number of nonoverlapping forms under numerous constraints.

Inverse Algorithm

A possible strategy for solving the inverse problem is presented by the following modification of the direct algorithm:

Algorithm 6:

Input: Master pool, test assembly constraints, and number η of sets of nonoverlapping forms to assemble.
Output: Ability density.
Step 1: Generate a random value of θ uniformly distributed on [−3, 3].
Step 2: Assemble a form such that its information at θ is greater than τ.
Step 3: In case of success, withdraw this form's items from the pool and go to Step 1.
Step 4: Decrement η.
Step 5: If η > 0, then return all items of assembled forms back to the master pool and go to Step 1.
Step 6: Find among all assembled forms the largest subset S of nonoverlapping forms; that is, solve Problem (3).
Step 7: For each form s ∈ S, construct the following step function:

\[
f_s(\theta) =
\begin{cases}
1, & \theta \in [l_s, r_s], \\
0, & \theta \notin [l_s, r_s],
\end{cases}
\tag{6}
\]

where the Fisher information of form s exceeds the threshold τ along ability range [l_s, r_s].

Step 8: The output of the algorithm is a normalized sum of the step functions:

\[
\frac{\sum_{s \in S} f_s(\theta)}{\int_{-\infty}^{+\infty} \sum_{s \in S} f_s(x)\,dx}.
\tag{7}
\]
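Steps 7 and 8 amount to counting, at each ability value, how many selected forms exceed the threshold there, and then normalizing. The sketch below evaluates Expressions (6) and (7) on a grid; the grid on [−3, 3] with step 0.05, the function name, and the three sample ranges are assumptions made for illustration.

```python
def density_from_ranges(ranges, lo=-3.0, hi=3.0, step=0.05):
    """Normalized sum of indicator functions of the ranges [l_s, r_s],
    evaluated at the midpoints of a grid on [lo, hi].
    Returns (midpoints, density values); the values integrate to 1 on the grid."""
    n = int(round((hi - lo) / step))
    mids = [lo + (i + 0.5) * step for i in range(n)]
    # Sum of the step functions f_s at each midpoint (Expression 6).
    counts = [sum(1 for l, r in ranges if l <= x <= r) for x in mids]
    area = sum(counts) * step
    if area == 0:
        raise ValueError("no range intersects the grid")
    # Normalization by the integral of the sum (Expression 7).
    return mids, [c / area for c in counts]

# Hypothetical ranges for three nonoverlapping forms.
ranges = [(-1.5, 0.3), (-0.5, 0.5), (-0.2, 1.5)]
thetas, f = density_from_ranges(ranges)
```

The resulting density is highest where the ability ranges of many forms coincide, which is where the master pool can support the most forms.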

Computer Experiments

Consider a master pool with 1,830 Law School Admission Test (LSAT) items. The sum of the information functions of all items in this pool is shown in Figure 3. The distributions of the discrimination, difficulty, and guessing parameters of the items in the pool had the following means and variances, respectively: (a) mean 0.75, variance 0.06; (b) mean 0.48, variance 1.33; (c) mean 0.17, variance 0.01.


FIGURE 3. Information function of the master pool

Each assembled form must contain 25 items and satisfy test assembly constraints for the Logical Reasoning section of the LSAT. Algorithm 1 ran 1,000 times, resulting in τ = 4.00. For the direct and inverse algorithms, η = 10, and the number of iterations of branch-and-bound algorithms for Problems (3) and (4) was limited to 10,000. For the direct problem, the ability density was assumed to be N(0,1), and the direct algorithm assembled a CAT pool P17, with 17 nonoverlapping forms. For the inverse problem, the output of the inverse algorithm is shown in Figure 4, where the number of assembled nonoverlapping forms was 58.

FIGURE 4. A feasible solution f(θ) to the inverse problem, computed from 58 nonoverlapping forms


The output of the inverse algorithm was used as an input for the direct algorithm. Then the direct algorithm assembled a CAT pool P59, with 59 nonoverlapping forms. The information functions of the CAT pools P17 and P59 were normalized by dividing by the number of nonoverlapping forms: 17 and 59, respectively. The normalized information functions of the CAT pools P17 and P59 are shown in Figure 5. One can see how two structurally different CAT pools can have a close resemblance in total information.

FIGURE 5. Normalized information functions of CAT pools P17 (thin line) and P59 (thick line)

Discussion

The first half of this paper extends the ideas of van der Linden, Ariel, and Veldkamp (2006) to assemble a CAT pool from a master pool as a specially constructed set of nonoverlapping forms. The extension aims toward balancing a specific precision of measurement and utilization of the master pool. By fixing an information threshold and then solving the direct problem, it is possible to assemble such a CAT pool of maximum size. A higher value of the information threshold allows the assembly of a CAT pool with higher measurement accuracy. A lower value of the information threshold allows the assembly of a CAT pool with a greater number of items.

Algorithm 5 (direct algorithm) was formulated to provide a feasible solution to the direct problem. At Step 6, it runs a branch-and-bound search for the solution to Problem (4) by employing Algorithms 3 and 4.

Computer experiments were performed on a master pool with 1,830 items. The ability density of N(0, 1) was assumed. Algorithm 1 estimated the value of the information threshold as 4.00. Then the direct algorithm assembled a CAT pool of 17 nonoverlapping forms. It resulted in 17 × 25/1830 × 100 = 23% utilization of the master pool.

The major advantage of the direct algorithm is the division of a large optimization problem considered by van der Linden, Ariel, and Veldkamp (2006) into two smaller subproblems: (a) assemble multiple forms sequentially; and (b) find the maximum subset of nonoverlapping forms reflecting an assumed ability density. The direct algorithm can be easily implemented using a commercial IP solver, such as CPLEX (ILOG, Inc., 2003). For this, at Step 6 of the direct algorithm, Algorithm 2 should be applied.

The second half of this paper addresses the problem of identifying the ability density that will maximize the objective of the direct problem. It is formulated as an inverse problem, and Algorithm 6 (inverse algorithm) provides a feasible solution. Computer experiments demonstrated that a feasible solution to the inverse problem allowed the direct algorithm to assemble a CAT pool with 59 nonoverlapping forms for the same information threshold. It resulted in 59 × 25/1830 × 100 = 81% utilization of the master pool. Figure 4 shows that the identified ability density is shifted to the difficult side of N(0, 1) with a maximum at about 0.5. This is consistent with the heavier right tail of the information function of the master pool in Figure 3. The inverse algorithm can be easily implemented by employing a commercial IP solver, such as CPLEX (ILOG, Inc., 2003), since at Step 6 the linear Problem (3) is solved.

The methods presented can be used to assemble multiple nonoverlapping CAT pools of a fixed size. First, assemble multiple overlapping CAT pools, where each CAT pool has the same number of nonoverlapping forms. Nonoverlapping forms for each CAT pool can be assembled sequentially. Second, interpret each assembled CAT pool as a subset of items, and then formulate a corresponding maximum set packing problem; its solution provides multiple nonoverlapping CAT pools.
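The maximum set packing step can be illustrated with a small exhaustive branch-and-bound search in Python. This is a minimal sketch for tiny instances, not the report's Algorithms 3 and 4 and not a substitute for an IP solver; the function name `max_nonoverlapping` and the example forms are hypothetical. Each element of the input may represent either a form or a whole CAT pool, given as a set of item identifiers.

```python
def max_nonoverlapping(forms):
    """Largest subset of pairwise disjoint sets, by exhaustive search.

    forms: list of sets of item ids (forms or whole CAT pools).
    Returns a list of indices into forms. Worst-case running time is
    exponential, so this is suitable for small illustrations only.
    """
    forms = [frozenset(f) for f in forms]
    best = []

    def search(k, chosen, used):
        nonlocal best
        # Bound: taking every remaining set still cannot beat the incumbent.
        if len(chosen) + (len(forms) - k) <= len(best):
            return
        if k == len(forms):
            best = list(chosen)
            return
        # Branch 1: include set k if it shares no item with the chosen sets.
        if forms[k].isdisjoint(used):
            chosen.append(k)
            search(k + 1, chosen, used | forms[k])
            chosen.pop()
        # Branch 2: skip set k.
        search(k + 1, chosen, used)

    search(0, [], frozenset())
    return best

# Hypothetical forms: each shares one item with its neighbors.
example = [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {1, 7, 8}]
selected = max_nonoverlapping(example)  # indices of a largest disjoint subset
```

In this example at most two of the four sets can be selected, because every set overlaps both of its neighbors.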


The direct and inverse algorithms provide testing organizations with effective means to maintain their master pools and produce CAT pools balancing measurement accuracy and item exposure.

References

Ariel, A., Veldkamp, B. P., & van der Linden, W. J. (2004). Constructing rotating item pools for constrained adaptive testing. Journal of Educational Measurement, 41(4), 345–359.
Belov, D. I. (2008). Uniform test assembly. Psychometrika, 73(1), 21–38.
Belov, D. I., & Armstrong, R. D. (2005). Monte Carlo test assembly for item pool analysis and extension. Applied Psychological Measurement, 29(4), 239–261.
Belov, D. I., & Armstrong, R. D. (2006). A constraint programming approach to extract the maximum number of nonoverlapping test forms. Computational Optimization and Applications, 33(2/3), 319–332.
Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NP-completeness. New York, NY: W.H. Freeman and Company.
ILOG, Inc. (2003). CPLEX 9.0 [Computer program and manual]. Mountain View, CA: Author. [www.ilog.com]
Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Stocking, M. L., & Swanson, L. (1998). Optimal design of item pools for computerized adaptive tests. Applied Psychological Measurement, 17, 271–279.
van der Linden, W. J., Ariel, A., & Veldkamp, B. P. (2006). Assembling a CAT item pool as a set of linear tests. Journal of Educational and Behavioral Statistics, 31(1), 81–99.
Veldkamp, B. P., & van der Linden, W. J. (2000). Designing item pools for computerized adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 149–162). Boston, MA: Kluwer Academic.
