Documentation for the Software Package SQE

Research Memorandum ETS RM–14-02

Lili Yao, Sandip Sinharay, Shelby J. Haberman

April 2014

ETS Research Memorandum Series

EIGNOR EXECUTIVE EDITOR
James Carlson, Principal Psychometrician

ASSOCIATE EDITORS
Beata Beigman Klebanov, Research Scientist
Gary Ockey, Research Scientist
Heather Buzick, Research Scientist
Donald Powers, Managing Principal Research Scientist
Brent Bridgeman, Distinguished Presidential Appointee
Gautam Puhan, Senior Psychometrician
Keelan Evanini, Managing Research Scientist
John Sabatini, Managing Principal Research Scientist
Marna Golub-Smith, Principal Psychometrician
Matthias von Davier, Director, Research
Shelby Haberman, Distinguished Presidential Appointee
Rebecca Zwick, Distinguished Presidential Appointee

PRODUCTION EDITORS
Kim Fryer, Manager, Editing Services
Ayleen Stellhorn, Editor

Since its 1947 founding, ETS has conducted and disseminated scientific research to support its products and services, and to advance the measurement and education fields. In keeping with these goals, ETS is committed to making its research freely available to the professional community and to the general public. Published accounts of ETS research, including papers in the ETS Research Memorandum series, undergo a formal peer-review process by ETS staff to ensure that they meet established scientific and professional standards. All such ETS-conducted peer reviews are in addition to any reviews that outside organizations may provide as part of their own publication processes. Peer review notwithstanding, the positions expressed in the ETS Research Memorandum series and other published accounts of ETS research are those of the authors and not necessarily those of the Officers and Trustees of Educational Testing Service. The Daniel Eignor Editorship is named in honor of Dr. Daniel R. Eignor, who from 2001 until 2011 served the Research and Development division as Editor for the ETS Research Report series. The Eignor Editorship has been created to recognize the pivotal leadership role that Dr. Eignor played in the research publication process at ETS.


Lili Yao, Sandip Sinharay,¹ and Shelby J. Haberman
Educational Testing Service, Princeton, New Jersey


Find other ETS-published reports by searching the ETS ReSEARCHER database at http://search.ets.org/researcher/

To obtain a copy of an ETS research report, please visit http://www.ets.org/research/contact.html

Action Editor: Matthias von Davier
Reviewers: Yi-Hsuan Lee and Jonathan Weeks

Copyright © 2014 by Educational Testing Service. All rights reserved. ETS, the ETS logo, and LISTENING. LEARNING. LEADING. are registered trademarks of Educational Testing Service (ETS).

Abstract

This paper introduces documentation for the software package SQE (Subscore Quality Evaluation). This package, which includes SAS macro functions and a FORTRAN program, is designed to produce subscores based on the methods of Haberman (2008) and Wainer et al. (2001). An example illustrates use of this software package with different types of input files.

Key words: observed subscores, augmented subscores


Recently, subscores have attracted much attention from psychometricians, statisticians, and test users because of their potential diagnostic value (Haberman, 2008; Haberman, Sinharay, & Puhan, 2009). To estimate a true subscore more accurately, a linear combination of an observed subscore and an observed total score (an augmented subscore) can be employed (Haberman, 2008). To satisfy the needs of researchers and test users, a software package called SQE (Subscore Quality Evaluation) has been produced to evaluate the value of observed subscores and to produce augmented subscores. The package includes a FORTRAN 90 program for list-directed files and SAS macro programs for SAS data files. The package serves three purposes: (a) to determine if subscores have added value, (b) to determine if augmented subscores have added value, and (c) to produce augmented subscores.

Description and Example

SQE is a software package that is designed to produce several observed subscores and augmented subscores. The package contains a FORTRAN 90 program and SAS IML macro programs. The software can be obtained from Lili Yao ([email protected]).

The FORTRAN code file is SQE.f90, and it has a corresponding executable file, SQE.exe, for use under Windows. Use of the program requires an input control file and may also require a file of item data for individual examinees. All input is list-directed, so entries in records are separated by spaces or commas. There are two cases for the input, depending on whether the analysis uses all examinee responses or only summary statistics.

In the first case, in which all examinee responses are used, the input control file includes the number of subscores, the case of the input file, the number of examinees, the number of items in each subscore, the name of the item response file, the name of the output file for the results evaluating the subscore quality, and the name of the output file for the subscores and augmented subscores of each examinee.
That is, in this case, there are two sets of output files. The first record in the input control file gives the number of subscores, the second record indicates the case (1) for the input file, and the third record provides the number of examinees. The fourth record provides the number of items per subscore. The fifth record gives the name of the input file (the item response file). The sixth record gives the name selected by the user for the file containing the analysis of added value of subscores, and the last record provides the name selected by the user for the output file containing the subscores and augmented subscores.

In the item response file, each record represents an examinee and includes each item score. The scores on the items contributing to the first subscore should appear first, followed by those for the second subscore, and so on. The total score used in the analysis is the sum of all item scores, while a subscore is the sum of all item scores for items associated with the subscore.

In the second case, in which a data summary is available rather than the original item responses, the input control file provides the number of subscores, the case of the input file, the reliability and standard deviation of each subscore, the correlation matrix of the subscores, and the name of the output file for the results of evaluating the subscores. That is, there is only one output file in this case. The first record in the input control file gives the number of subscores, and the second record indicates the case (2) for the input file. In the example control file in Table 3, the third record gives the number of observations. Then, there are two records with the reliabilities and standard deviations of the subscores. The next records display the correlation matrix of the subscores, with each record providing one row of the matrix. The final record specifies the file name of the output file for the analysis.

To illustrate use of this software package with both versions of input control files, consider a case with four subscores and 31,001 observations. Table 1 summarizes the data information required to construct input control files.

Table 1
The Data Structure of the Example

Variable                        All items       Subscore 1  Subscore 2  Subscore 3  Subscore 4
Number of subscores             4
Number of observations          31,001
Number of items in a subscore                   30          29          30          30
Item response file              Responses.dat
Subscore reliability                            0.71        0.83        0.73        0.71
Subscore SD                                     3.79        5.36        4.47        4.30
Correlation with Subscore 1                     1.00        0.59        0.58        0.59
Correlation with Subscore 2                     0.59        1.00        0.53        0.60
Correlation with Subscore 3                     0.58        0.53        1.00        0.64
Correlation with Subscore 4                     0.59        0.60        0.64        1.00

Based on the data in Table 1, the input control files for these two versions are shown in Tables 2 and 3.
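The scoring rule described earlier in this section (each subscore is the sum of its own items, and the total score is the sum of all items) can be illustrated with a short sketch. This is Python written for illustration only and is not part of SQE; the function name score_examinee is hypothetical, and the toy record is not taken from the example data.

```python
from itertools import accumulate


def score_examinee(item_scores, items_per_subscore):
    """Return (subscores, total) for one examinee's item-score record.

    Assumes the record is ordered as described in the text: items for the
    first subscore come first, then items for the second subscore, and so on.
    """
    if len(item_scores) != sum(items_per_subscore):
        raise ValueError("record length does not match the item counts")
    # Cumulative item counts give the end position of each subscore's block.
    ends = list(accumulate(items_per_subscore))
    starts = [0] + ends[:-1]
    subscores = [sum(item_scores[a:b]) for a, b in zip(starts, ends)]
    # The total score is the sum of all item scores.
    return subscores, sum(subscores)


# Toy record with two subscores of 3 and 2 items (hypothetical data).
subs, total = score_examinee([1, 0, 1, 1, 1], [3, 2])
print(subs, total)
```

With the item counts of the example (30, 29, 30, and 30), each record would need 119 item scores.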


Table 2
FORTRAN Input Control File for Item Scores

Input field       Comment field
4                 !The number of subscores
1                 !The case of the input file
31001             !The number of observations
30 29 30 30       !The number of items per subscore
Responses.dat     !The item response file
Result1.out       !The output file with results from evaluation of subscore quality
Augscores.out     !The output file with subscores and augmented subscores
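As a minimal sketch of how the seven records of Table 2 fit together, the following Python fragment assembles them in order from the quantities in Table 1. It is an illustration only and is not distributed with SQE; the function name case1_control_records is hypothetical, and the fragment writes only the input fields because the source does not state whether the comment fields belong in the file itself.

```python
def case1_control_records(items_per_subscore, n_examinees,
                          response_file, result_file, augmented_file):
    """Return the seven records of a case-1 control file, in Table 2 order."""
    return [
        str(len(items_per_subscore)),            # number of subscores
        "1",                                     # case of the input file
        str(n_examinees),                        # number of observations
        " ".join(map(str, items_per_subscore)),  # number of items per subscore
        response_file,                           # item response file
        result_file,                             # results of the quality evaluation
        augmented_file,                          # subscores and augmented subscores
    ]


records = case1_control_records([30, 29, 30, 30], 31001,
                                "Responses.dat", "Result1.out", "Augscores.out")
print("\n".join(records))
```

Deriving the first record from the length of the item-count list keeps the number of subscores and the fourth record consistent with each other.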

Table 3
FORTRAN Input Control File for Data Summary

Input field             Comment field
4                       !The number of subscores
2                       !The case of input file
31001                   !The number of observations
0.71 0.83 0.73 0.71     !The reliabilities of the subscores
3.79 5.36 4.47 4.30     !The standard deviations of the subscores
1.00 0.59 0.58 0.59     !Correlations of subscores with the first subscore
0.59 1.00 0.53 0.60     !Correlations of subscores with the second subscore
0.58 0.53 1.00 0.64     !Correlations of subscores with the third subscore
0.59 0.60 0.64 1.00     !Correlations of subscores with the fourth subscore
Result2.out             !The output file
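The record layout of Table 3 can be sketched in the same way. The Python fragment below is an illustration under stated assumptions, not part of SQE: the function name case2_control_records is hypothetical, and the two-decimal formatting simply mirrors the values printed in Table 3.

```python
def case2_control_records(n_examinees, reliabilities, sds, corr, result_file):
    """Return the records of a case-2 control file, in Table 3 order."""
    k = len(reliabilities)
    # All summary statistics must describe the same number of subscores.
    if len(sds) != k or len(corr) != k or any(len(row) != k for row in corr):
        raise ValueError("summary statistics have inconsistent dimensions")

    def fmt(values):
        return " ".join(f"{v:.2f}" for v in values)

    return (
        [str(k), "2", str(n_examinees), fmt(reliabilities), fmt(sds)]
        + [fmt(row) for row in corr]   # one record per row of the matrix
        + [result_file]
    )


corr = [
    [1.00, 0.59, 0.58, 0.59],
    [0.59, 1.00, 0.53, 0.60],
    [0.58, 0.53, 1.00, 0.64],
    [0.59, 0.60, 0.64, 1.00],
]
records = case2_control_records(31001, [0.71, 0.83, 0.73, 0.71],
                                [3.79, 5.36, 4.47, 4.30], corr, "Result2.out")
print("\n".join(records))
```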

The SAS macro programs are also prepared for both cases: SQE_ItemScores.sas is the SAS macro file for the input of item scores, and SQE_DataSummary.sas is the SAS macro file for the input of a data summary.

In SQE_ItemScores.sas, SQE_ItemScores, the function for item scores, has five input parameters: the item response file, the number of subscores, the vector containing the number of items in each subscore, the output file for the results evaluating the subscore quality, and the output file for subscores and augmented subscores for all examinees. That is, there are two sets of output files in this case. The path and the name of each output file can be defined by users. Note that the input format of the item response file is an SAS data file, so it is necessary to convert the item response file to an SAS data set if the original file is in some other format, as is the case in the example. An SAS program for the example is provided in Table 4.

Table 4
SAS Code for Input of Item Scores

Command                                                      Comment
filename Responses 'C:/work/Responses.dat';                  /*Input file*/
filename outfile                                             /*Output file with results from evaluation of subscore quality*/
filename augfile                                             /*Output file with subscores and augmented subscores*/
data score;                                                  /*The SAS data file*/
infile Responses ;                                           /*The reading specification*/
input item001 1. (item002-item119)(2.);                      /*The input format*/
run;
%SQE_ItemScores (score, 4, 30 29 30 30, outfile, augfile);   /*The macro function for item scores*/
run;
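The INPUT statement in Table 4 reads the first item score from one column and each of the remaining 118 item scores from two columns, so a complete record is 237 characters wide. The following Python sketch, which is an illustration and not part of SQE, splits a record under the same column layout; the function name parse_record is hypothetical. A blank or absent field is returned as None here, whereas the way SAS treats short records depends on the INFILE options in effect.

```python
def parse_record(line, n_items=119):
    """Split one fixed-width record: item 1 in column 1, later items in 2 columns each."""
    line = line.rstrip("\n")
    widths = [1] + [2] * (n_items - 1)
    scores, pos = [], 0
    for width in widths:
        field = line[pos:pos + width].strip()
        # A blank or missing field is treated as a missing score.
        scores.append(int(field) if field else None)
        pos += width
    return scores


# Toy record with three items (hypothetical data): widths 1, 2, 2.
print(parse_record("1 0 1", n_items=3))
```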

In SQE_DataSummary.sas, SQE_DataSummary, the function for data summary, has five input parameters: the number of subscores, the vector of subscore reliabilities, the vector of standard deviations of subscores, the SAS data set that provides the correlation matrix of the subscores, and the output file for the results of evaluating the subscores. That is, there is only one output file in this case. The path and the name of the output file can be defined by users. In this example, the correlation matrix is read from the file ‘corsub.txt’ specified in Figure 1. The SAS program for this case is provided in Table 5.

1.00 0.59 0.58 0.59
0.59 1.00 0.53 0.60
0.58 0.53 1.00 0.64
0.59 0.60 0.64 1.00

Figure 1. The correlation matrix file corsub.txt.
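Before a file such as corsub.txt is passed to the macro, it may be worth checking that it holds a valid correlation matrix. The Python sketch below is one hedged way to do this and is not part of SQE; the function name read_correlation_matrix and the tolerance are assumptions. It checks that the matrix is square and symmetric, has a unit diagonal, and has off-diagonal entries between -1 and 1.

```python
def read_correlation_matrix(text, tol=1e-8):
    """Parse whitespace-separated rows and check basic correlation-matrix properties."""
    rows = [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]
    k = len(rows)
    if any(len(row) != k for row in rows):
        raise ValueError("matrix is not square")
    for i in range(k):
        if abs(rows[i][i] - 1.0) > tol:
            raise ValueError(f"diagonal entry {i + 1} is not 1")
        for j in range(i):
            if abs(rows[i][j] - rows[j][i]) > tol:
                raise ValueError(f"entries ({i + 1},{j + 1}) and ({j + 1},{i + 1}) differ")
            if not -1.0 <= rows[i][j] <= 1.0:
                raise ValueError(f"entry ({i + 1},{j + 1}) is outside [-1, 1]")
    return rows


# The matrix of Figure 1, given inline instead of being read from disk.
figure1 = """\
1.00 0.59 0.58 0.59
0.59 1.00 0.53 0.60
0.58 0.53 1.00 0.64
0.59 0.60 0.64 1.00
"""
matrix = read_correlation_matrix(figure1)
print(len(matrix), matrix[2][3])
```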


Table 5
SAS Code for Data Summary

Command                                                                  Comment
filename inputxt 'C:\work\corsub.txt';                                   /*Input file*/
filename outfile 'C:\work\OutFile_DataSummary.pdf';                      /*Output file*/
data corsub;                                                             /*Data set*/
infile inputxt;                                                          /*Input*/
input @1 corsub1 5.3 @7 corsub2 5.3 @13 corsub3 5.3 @19 corsub4 5.3;     /*Format*/
run;
%SQE_DataSummary(4,0.71 0.83 0.73 0.71,3.79 5.36 4.47                    /*Macro*/
run;

The summary output of both the FORTRAN program and the SAS macros includes the means of the subscores, the standard deviations of the subscores, the correlations between subscores, the total-score reliability, and the proportional reductions in mean squared error (PRMSEs) for several estimates of the true subscore: the subscore-based estimate, the total-score-based estimate, the Haberman-augmented subscores (Haberman, 2008), and the Wainer-augmented subscores (Wainer et al., 2001). In addition, indicators of whether those estimates have added value are produced. The output file for the example is shown in Figure 2. If the original item responses of the examinees are provided as input (the first case), the programs also produce a file of Haberman-augmented subscores (Haberman, 2008). Each line of this file contains the subscores of an examinee followed by the corresponding Haberman-augmented subscores.
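The value-added indicator for the subscore-based estimate can be illustrated from the PRMSEs reported for the example. Under the criterion of Haberman (2008), an observed subscore has added value when its PRMSE exceeds the PRMSE of the total-score-based estimate. The Python sketch below applies that comparison; it is an illustration rather than SQE code, the function name subscore_value_added is hypothetical, and it does not attempt to reproduce the rule SQE applies to augmented subscores, which the source does not state.

```python
def subscore_value_added(prmse_subscore, prmse_total):
    """Return a string of 1s and 0s: 1 where the subscore PRMSE exceeds the total-score PRMSE."""
    return "".join(
        "1" if s > x else "0"
        for s, x in zip(prmse_subscore, prmse_total)
    )


# PRMSEs reported for the example output.
prmse_subscore = [0.71, 0.83, 0.73, 0.71]
prmse_total = [0.77, 0.74, 0.75, 0.82]
print(subscore_value_added(prmse_subscore, prmse_total))  # prints 0100
```

In the example, only the second subscore has a PRMSE (0.83) above that of the total-score-based estimate (0.74), which agrees with the indicator reported for the subscore-based estimate.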


Means of the subscores: 22.90 18.89 17.57 19.14
Standard deviations of the subscores: 3.79 5.36 4.47 4.30
Simple correlations between subscores:
1.00 0.59 0.58 0.59
0.59 1.00 0.53 0.60
0.58 0.53 1.00 0.64
0.59 0.60 0.64 1.00
Disattenuated correlations between subscores:
1.00 0.78 0.80 0.84
0.78 1.00 0.68 0.78
0.80 0.68 1.00 0.88
0.84 0.78 0.88 1.00
Total score reliability: 0.911
PRMSEs for subscore-based estimate: 0.71 0.83 0.73 0.71
PRMSEs for total-score-based estimate: 0.77 0.74 0.75 0.82
PRMSEs for Haberman-augmented subscores: 0.82 0.86 0.81 0.84
PRMSEs for Wainer-augmented subscores: 0.82 0.86 0.82 0.84
Value-added indicator for subscore-based estimate: 0100
Value-added indicator for Haberman-augmented subscores: 1111
Value-added indicator for Wainer-augmented subscores: 0000

Figure 2. Output file for the example.
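The disattenuated correlations in Figure 2 can be approximated from the simple correlations and the subscore reliabilities with the standard correction for attenuation, in which each correlation is divided by the square root of the product of the two reliabilities. The source does not state the formula SQE uses, so the following Python sketch is an assumption-based illustration, and the function name disattenuate is hypothetical. Because it starts from the two-decimal values printed in the output, its results can differ from Figure 2 by about 0.01 in some cells.

```python
from math import sqrt


def disattenuate(corr, reliabilities):
    """Divide each off-diagonal correlation by sqrt(rel_i * rel_j); keep the diagonal at 1."""
    k = len(reliabilities)
    return [
        [
            1.0 if i == j
            else corr[i][j] / sqrt(reliabilities[i] * reliabilities[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


corr = [
    [1.00, 0.59, 0.58, 0.59],
    [0.59, 1.00, 0.53, 0.60],
    [0.58, 0.53, 1.00, 0.64],
    [0.59, 0.60, 0.64, 1.00],
]
reliabilities = [0.71, 0.83, 0.73, 0.71]
for row in disattenuate(corr, reliabilities):
    print(" ".join(f"{value:.2f}" for value in row))
```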

Program Availability and Conditions of Use

The SQE software package is designed to evaluate the value of several observed subscores and augmented subscores. This package includes a FORTRAN 90 program and SAS macro programs. The SAS macro functions can be used on any computing platform for which SAS IML is available. The FORTRAN program can be used on a computing platform where a FORTRAN 90, 95, or 2003 compiler is available. The software package is fully available internally within ETS. Outside ETS, the programs are available at no cost for academic and other noncommercial use.


References

Haberman, S. J. (2008). When can subscores have value? Journal of Educational and Behavioral Statistics, 33, 204–229.

Haberman, S. J., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62, 79–95.

Wainer, H., Vevea, J. L., Camacho, F., Reeve, B. B., Swygert, K. A., & Thissen, D. (2001). Augmented scores—“Borrowing strength” to compute scores based on small numbers of items. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 343–387). Hillsdale, NJ: Lawrence Erlbaum.


Notes

¹ Sandip Sinharay is now at CTB/McGraw-Hill ([email protected]).
