A Test Development Life Cycle Framework for Testing Program Planning

Pamela Ing Stemmer, Ph.D.
February 29, 2016

How can an organization (test sponsor) successfully execute a testing program? Operating a testing program with a valid, legally defensible examination is no small task, and reviewing the necessary information and making the numerous required decisions can be overwhelming. Effective planning and successful execution of a testing program may be achieved by using a test development life cycle framework. In this paper, we present such a framework for planning and decision making to guide the successful execution of new and existing testing programs. We recommend using this test development life cycle framework for initial planning and also as a structure for evaluating how an existing testing program is functioning. In this document, we introduce each test development stage. Further details, including issues to address at each stage of the test development cycle, questions to consider as a test sponsor, and questions to ask potential or existing vendors, will be presented throughout the rest of this white paper series.

This white paper series focuses on testing programs in the domain of certification and licensure. The purpose of a credential is to provide evidence to the public that an individual meets the minimum level of competence required for safe practice in a particular profession. Many certification and licensure programs develop one or more valid tests to assure that the knowledge and performance capabilities of their credential holders meet those standards. Achieving validity, legal defensibility, and public confidence in the credential requires effort throughout the planning, execution, and evaluation of a testing program: important questions need to be considered by test sponsors, and information needs to be gathered from potential and/or existing vendors. Planning ahead with a test development life cycle framework, such as the one presented in Figure 1, can ensure that important questions and decisions are not overlooked throughout the planning and execution of a testing program. The benefits of such planning include, but are not limited to, security, quality assurance, validity evidence, legal defensibility, and preparation for an accreditation application. Measurement professionals employ the 2014 Joint Standards for Educational and Psychological Testing (AERA/APA/NCME, 2014) and the National Commission for Certifying Agencies (NCCA) 2016 Standards for the Accreditation of Certification Programs (Institute for Credentialing Excellence, 2014) as psychometric criteria for validating examinations such as licensing and certification examinations.
The test development life cycle includes the following stages: job analysis, test specifications, item development, test assembly, standard setting, equating, test administration, scoring and reporting, and technical report.
[Figure 1: a diagram of the test development life cycle showing nine stages in sequence: 1. Job Analysis, 2. Test Specifications, 3. Item Development, 4. Test Assembly, 5. Standard Setting, 6. Equating, 7. Test Administration, 8. Scoring and Reporting, 9. Technical Report.]
Figure 1. The test development life cycle provides a useful framework for addressing important questions and decisions throughout the planning, execution, and evaluation of a testing program.
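As a rough, purely illustrative sketch of how the life cycle can double as a structure for evaluating an existing program, the Python snippet below treats the nine stages of Figure 1 as a documentation checklist; the stage statuses shown are hypothetical, and this is only one of many ways a test sponsor might organize such a review.

    # A minimal sketch: the nine life cycle stages used as a planning/evaluation checklist.
    # Stage names follow Figure 1; the example statuses are invented for illustration.
    LIFE_CYCLE_STAGES = (
        "Job Analysis",
        "Test Specifications",
        "Item Development",
        "Test Assembly",
        "Standard Setting",
        "Equating",
        "Test Administration",
        "Scoring and Reporting",
        "Technical Report",
    )

    def stages_needing_attention(status_by_stage):
        """Return stages lacking documented evidence, in life cycle order."""
        return [s for s in LIFE_CYCLE_STAGES if not status_by_stage.get(s, False)]

    # Hypothetical self-review: True means documented evidence is on file.
    current_status = {stage: True for stage in LIFE_CYCLE_STAGES}
    current_status["Standard Setting"] = False   # e.g., passing score never re-validated
    current_status["Technical Report"] = False   # e.g., no vendor documentation received

    print(stages_needing_attention(current_status))
    # ['Standard Setting', 'Technical Report']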
1. Job Analysis

A job analysis (or practice analysis) is an empirical method used to define the content that is to be measured by an examination (Downing, 2006; Figure 1). Use of a job analysis study as a method of defining examination content is clearly endorsed by both the 2016 NCCA Standards (Standards 13 and 14) and the 2014 Joint Standards (Standards 4.12, 11.03, and 11.13). For credentialing programs, NCCA Standard 14 states that a job analysis must be conducted and documented in order to delineate and evaluate the job responsibilities and content domains associated with the credential's purpose. Numerous approaches may be taken when conducting a job analysis; common methods and issues to consider when planning one will be discussed in the next installment of this white paper series.

2. Test Specifications

Test specifications provide a detailed, comprehensive description of an examination's components and features (Figure 1). They include a test blueprint, which outlines the number or proportion of items assigned to measure each content domain (and subdomain, if applicable). Other elements of test specifications include test length; item format(s); maximum testing time; candidate directions; test administration procedures; any permissible materials; and procedures for scoring and reporting (AERA/APA/NCME, 2014). According to the 2014 Joint Standards (Standards 4.01 and 4.02) and the 2016 NCCA Standards (Standard 15), comprehensive test specifications must be established and documented. Issues to consider when creating and updating test specifications, such as the time and processes involved, will be presented in an upcoming installment of this white paper series.
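To make the blueprint concept concrete, the sketch below converts a hypothetical blueprint's domain weights into item counts for a form; the domain names, weights, and 150-item test length are invented for illustration and are not drawn from any particular program.

    # Hypothetical test blueprint: content domains and their target proportions.
    # Domain names, weights, and the 150-item test length are illustrative only.
    blueprint = {
        "Assessment and Diagnosis": 0.30,
        "Treatment Planning": 0.25,
        "Intervention": 0.30,
        "Professional Responsibility and Ethics": 0.15,
    }
    test_length = 150

    def items_per_domain(blueprint, test_length):
        """Convert blueprint proportions into whole-number item counts.

        Rounds each domain's allocation, then adjusts the largest domain so the
        counts sum exactly to the intended test length.
        """
        counts = {d: round(w * test_length) for d, w in blueprint.items()}
        shortfall = test_length - sum(counts.values())
        counts[max(counts, key=counts.get)] += shortfall
        return counts

    print(items_per_domain(blueprint, test_length))
    # {'Assessment and Diagnosis': 45, 'Treatment Planning': 38,
    #  'Intervention': 45, 'Professional Responsibility and Ethics': 22}

The resulting counts then constrain item development (stage 3) and form assembly (stage 4), discussed below.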
3. Item Development

Item development comprises item writing and item review (Figure 1). Important item development issues to consider include, but are not limited to, subject matter expert (SME) recruitment and training; guidelines for item content and structure; security of item content; item authoring methods; and item banking. The content domains outlined in the test blueprint guide the item development process. The 2014 Joint Standards (Standards 4.07 and 4.08) and the 2016 NCCA Standards (Standards 13 and 16) emphasize the importance of developing and documenting systematic item development procedures. Questions to address regarding item development will be discussed in an upcoming installment of this white paper series.

4. Test Assembly

Item selection for an examination form should be guided by the test blueprint (Figure 1). In addition to alignment with the test blueprint, there are many issues for test sponsors to consider regarding test assembly, including key balance; placement of anchor items and accounting for the equating model; inclusion and coverage of pretest items; test security; and quality control. There are many approaches to test form assembly, which may be appropriate for paper-and-pencil administration, computer-based testing (CBT), or both. Test form assembly should be guided by psychometric principles, such as classical test theory (CTT) or item response theory (IRT), and tests may be assembled for presentation in a fixed format or with individualized elements, as in computer adaptive testing (CAT) or linear on-the-fly testing (LOFT). Expectations for test assembly are addressed in the 2016 NCCA Standards (Standard 16) and the 2014 Joint Standards (Standard 4). Further discussion of test assembly issues and questions will be presented in an upcoming installment of this white paper series.

5. Standard Setting

Standard setting refers to the process by which a passing score for an examination is established (Cizek, 2006; Figure 1). In certification and licensure, the goal of standard setting is to determine the level of performance (i.e., knowledge, skill, and/or ability) required of an individual to demonstrate minimal competence in a particular profession. Issues for test sponsors to consider include method selection; SME panel composition, recruitment, and training; and determining when standard setting is appropriate versus equating a new form to an existing examination form. Standard setting is endorsed by the 2016 NCCA Standards (Standards 13 and 17) and the 2014 Joint Standards (Standards 5.22 and 11.16). Methods of standard setting and issues to consider when planning a standard setting study will be presented in a future installment of this white paper series.

6. Equating

According to the 2016 NCCA Standards (Standard 21), testing programs must establish that candidates receive no advantage or disadvantage from differences in content structure and/or difficulty across different forms of an examination (Figure 1). Issues for test sponsors to consider regarding equating include selection of appropriate statistical procedures (i.e., equating models); establishing equivalence across examination forms that have been translated into different languages; and ensuring that examination forms comply with the requirements of the test blueprint and the selected equating model. Questions to ask and issues to consider regarding equating will be discussed in an upcoming installment of this white paper series.
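As a worked illustration of placing two forms on a common scale, the sketch below implements one of the simplest models, linear equating under an equivalent-groups assumption; the score data are invented, and operational programs typically select a design and model suited to their situation (for example, common-item designs with anchor items), so this is a conceptual sketch rather than a recommended procedure.

    from statistics import mean, stdev

    def linear_equate(new_form_scores, reference_form_scores):
        """Return a function mapping raw scores on a new form onto the scale of a
        reference form using linear equating:

            equated(x) = mu_ref + (sd_ref / sd_new) * (x - mu_new)

        Assumes the two candidate groups are randomly equivalent, which is a
        strong assumption; many programs instead use common-item designs.
        """
        mu_new, sd_new = mean(new_form_scores), stdev(new_form_scores)
        mu_ref, sd_ref = mean(reference_form_scores), stdev(reference_form_scores)
        return lambda x: mu_ref + (sd_ref / sd_new) * (x - mu_new)

    # Invented raw-score samples from candidates taking each form.
    new_form = [52, 61, 58, 70, 66, 55, 63, 74, 49, 68]
    reference_form = [57, 64, 60, 75, 71, 59, 66, 78, 54, 72]

    to_reference_scale = linear_equate(new_form, reference_form)
    # A raw score of 60 on the new form, expressed in reference-form units:
    print(round(to_reference_scale(60), 1))

The effect is to match the two forms' score distributions in mean and spread, so that a given reported score reflects comparable performance regardless of which form a candidate received.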
7. Test Administration

Test administration is the most visible component of the test development life cycle, and when it is executed in an organized, consistent, and effective manner it delivers vital evidence of test score validity (Downing, 2006; Figure 1). There are many elements of test administration for test sponsors to consider, such as proper candidate identification; standardization of testing environments; examination security; proctor training and monitoring; quality control; and candidate accommodations. The 2014 Joint Standards (Standards 6.01–6.07) and the 2016 NCCA Standards (Standard 18) assert the need for testing programs to establish and comply with policies and procedures designed to protect examination content and ensure standardized candidate experiences. Further discussion of test administration issues will be presented in a future installment of this white paper series.

8. Scoring and Reporting

There are many ways to apply a scoring key to candidate responses, ranging from the simple (e.g., number correct out of the total number of items) to the complex (e.g., partial credit, penalties for incorrect responses; Figure 1). The reported score scale may be adjusted to suit the program's needs or candidates' understanding through the use of raw, percentage, or scaled scores. Other important scoring issues for test sponsors to address include key validation, item analysis, and any additional statistical analysis information required for the testing program (e.g., quarterly reports, annual reports). With respect to score reporting, decisions need to be made regarding what information will be provided to candidates; what information will be reported to the test sponsor (e.g., total scores, subscores, average examination performance statistics); in what format information will be provided to candidates and the sponsor organization (e.g., paper, electronic, preliminary, official); whether contextual cues such as explanatory text, visual representations, or uncertainty information will be included; and on what timeline the information will be provided (e.g., instant, delayed). Issues related to scoring and reporting are addressed by both the 2016 NCCA Standards (Standard 19) and the 2014 Joint Standards (Standards 6.10–6.16). Further detail regarding scoring and reporting issues will be discussed in an upcoming installment of this white paper series.

9. Technical Report

Thorough documentation of the entire test development process provides validity evidence for the testing program (Figure 1). With respect to technical reports, it is important to consider the level of documentation that test sponsors should receive from contracted vendors, guided by the 2014 Joint Standards (Standard 7.04) and the 2016 NCCA Standards (Standards 13–21). Additional considerations regarding technical documentation will be provided in a future installment of this white paper series.

Operating a successful testing program with a valid, legally defensible examination is no small feat, and the required information gathering, review, and decision making can be overwhelming. Using a test development life cycle framework may facilitate effective planning and successful execution of a testing program. For more detailed descriptions, issues, and questions to ask regarding each stage of the test development life cycle, check the Comira website at the end of each month for the next white paper publication. If you would like a free one-hour consultation on the quality of your testing program or program planning, please use the following link to contact Comira's Psychometric Team.
References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: AERA.

Cizek, G. J. (2006). Standard setting. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 225-258). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Downing, S. M. (2006). Twelve steps for effective test development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 3-25). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Institute for Credentialing Excellence. (2014). National Commission for Certifying Agencies Standards for the Accreditation of Certification Programs. Washington, DC: ICE.