ScienceDirect Procedia - Social and Behavioral Sciences 217 (2016) 806 – 812
Future Academy®’s Multidisciplinary Conference
Computerized Adaptive Testing with Reflective Feedback: A Conceptual Framework
Nhabhat Chaimongkol a, Shotiga Pasiphol b*, & Sirichai Kanjanawasee c
a Ph.D. Candidate, Faculty of Education, Chulalongkorn University, Bangkok 10330, Thailand
b Associate Professor, Ph.D., Faculty of Education, Chulalongkorn University, Bangkok 10330, Thailand
c Professor, Ph.D., Faculty of Education, Chulalongkorn University, Bangkok 10330, Thailand
Abstract
Computerized Adaptive Testing (CAT) has been under development for many years. The main concept of CAT is the administration of items that are appropriate to each examinee's ability. In general, CAT has three important components: the starting point, item selection and ability estimation, and the stopping rules. Although CAT has been used for a long time, it still has weaknesses in ability estimation. In 2014, Zheng and Chang developed a new algorithm to address these weaknesses, called On-the-Fly Assembled Multistage Adaptive Testing (OMST). Their study showed that OMST outperformed traditional CAT. Furthermore, CAT reports test scores, including feedback, as a summative assessment. However, feedforward, a form of formative assessment, is important for improving learners. In this research, feedback and feedforward together are referred to as reflective feedback. We propose a new conceptual framework that combines OMST and reflective feedback. The test items come from the Information Technology Professional Examination (IP level only) from 2010 to 2014. Items selected for the item pool will be analyzed using item response theory (IRT). CAT under this framework will help prevent over- and underestimation of examinees' ability and will present instant feedback and feedforward simultaneously.
© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of Future Academy® Cognitive Trading.
Keywords: Computerized Adaptive Testing; Reflective Feedback
* Corresponding author. Tel.: +66 2218 2565-97 ext. 504; fax: +66 2218 2559.
1877-0428 © 2016 Published by Elsevier Ltd. doi:10.1016/j.sbspro.2016.02.151
Nhabhat Chaimongkol et al. / Procedia - Social and Behavioral Sciences 217 (2016) 806 – 812
1. Introduction
Computerized adaptive testing (CAT) is a tailored testing concept in which items are selected according to the ability of each examinee. Testing starts with an item of moderate difficulty. If the examinee answers an item correctly, the next item administered is more difficult; if not, an easier item is administered. Testing continues until the stopping criterion is met (Kanjanawasee, 2012). At present, CAT is widely used for admissions tests such as the Graduate Record Examination (GRE) and the Graduate Management Admission Test (GMAT) (Chang, 2004). CAT integrates test theory and technology to achieve more precise measurement. It has three main components: the starting point, item selection and ability estimation, and the stopping criterion (Chang, 2004; 2014; Phankokkruad, 2012).
Even though CAT has many advantages, it still has shortcomings in ability estimation, as seen in the GRE and GMAT in 2000 and 2003, respectively. This is because estimation in CAT is based on maximum information, which becomes more complicated when the test is large scale. Researchers therefore developed multistage testing (MST), which helps reduce over- and underestimation of ability (Chang, 2014; Zheng & Chang, 2014). MST does not rely on a single item to update the ability estimate; it estimates ability after each stage. MST groups items into stages; each stage has several modules, and each module has a different difficulty level. This helps reduce over- and underestimation of the examinee's ability at the start of the test. Moreover, MST helps reduce examinees' stress because it allows them to revisit previous questions within the same stage, which traditional CAT does not. Although MST has many advantages over CAT, it is still limited. The panels of MST are pre-assembled, so large-scale testing may contain more error because the panels are assembled by humans.
Therefore, in 2014 Zheng and Chang integrated the concepts of CAT and MST into On-the-Fly Assembled Multistage Adaptive Testing (OMST). OMST is similar to traditional CAT in that items are selected on the fly, and similar to MST in that items are grouped into stages. In addition, OMST reduces human error by assembling stages and controlling psychometric properties with a computer algorithm (Chang, 2014; Zheng & Chang, 2014). The OMST concept has so far only been prototyped. If it could be implemented in real testing, it would be an extremely helpful testing system.
Generally, CAT reports feedback in the form of a test score, but feedforward is not provided alongside the feedback. In this research, feedback and feedforward together are called reflective feedback. In this paper we propose a new conceptual framework that combines OMST and reflective feedback, using the Information Technology Professional Examination (ITPE), IP level only, as a case study.
2. Literature
The objective of this study is to conceptualize a framework for computerized adaptive testing with reflective feedback. The framework involves several components. First, in computerized adaptive testing (CAT), items are selected to match the ability of the examinee. Second, in multistage adaptive testing (MST), items are grouped into stages and assembled before testing. Third, in on-the-fly assembled multistage adaptive testing (OMST), items are grouped into stages, but each item is selected on the fly by a computer algorithm, as in traditional CAT. Last, reflective feedback combines feedback and feedforward, which helps improve the examinee's learning.
2.1. Computerized Adaptive Testing
The concept of computerized adaptive testing (CAT) dates back to the Binet-Simon intelligence test of 1905. Items were selected according to the mental age of the examinee, which was derived from the responses to previous items. The test continued until the examinee's actual mental age could be estimated.
The concept of CAT has been in use for a long time. It can be found in oral tests, where the interviewer selects items suited to the examinee (van der Linden, 2000). The main principle of CAT is item selection: each item is selected based on the response to the previous item. Each examinee starts with an item of moderate difficulty. After ability is estimated from the previous item response, the next item is selected based on the item parameters (discrimination power, difficulty and guessing probability) so that it matches the ability of the examinee. If the previous item is answered correctly, the next item will be harder. If the previous item is answered incorrectly,
the next item will be easier. Testing continues until the ability estimate of the examinee is stable or the stopping criterion is met, at which point testing terminates (Kanjanawasee, 2012).
Item response theory (IRT) is used in CAT. IRT describes the relationship between ability and item response, which is represented by the item characteristic curve (ICC). The ICC is a logistic function or a normal ogive function. IRT has three item parameters: discrimination power (a), difficulty (b) and guessing probability (c). The item response model relates the probability of a correct response to ability and has three main forms (Kanjanawasee, 2012):
1) Three-parameter logistic model (3PL)
2) Two-parameter logistic model (2PL)
3) One-parameter logistic model (1PL)
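The formulas for the three models are not reproduced in this text. As an illustration only, the sketch below uses the standard logistic form of the 3PL model, P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))), from which the 2PL model follows by setting c = 0 and the 1PL model by additionally fixing a to a common value. The function name `p_correct` is hypothetical, and the absence of a scaling constant (such as D = 1.7) is an assumption, since the source does not state which convention it follows.

```python
import math


def p_correct(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: discrimination power, b: difficulty, c: guessing probability
    Assumption: logistic form without a scaling constant.
    With c=0 this is the 2PL model; with c=0 and a common a, the 1PL model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# An examinee whose ability equals the item difficulty (theta == b)
# answers correctly with probability c + (1 - c) / 2.
print(p_correct(0.0, a=1.2, b=0.0, c=0.2))
```

When θ equals b, the logistic term is exactly one half, so the guessing parameter c lifts the probability from 0.5 to c + (1 − c)/2.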
2.2. Multistage Adaptive Testing
Multistage adaptive testing (MST) operates differently from CAT. Its important components are the panel, module, stage and pathway. A panel consists of items with different levels of difficulty. Each test has more than one panel, so the panels should be parallel. Each panel contains several modules, and the items within a module have the same level of difficulty. In general, modules are divided into easy, medium and difficult levels. MST organizes items into stages. The first stage usually contains items of moderate difficulty. The number of stages can be set by the test administrator. The last component of MST is the pathway, or routing. When the examinee finishes a stage, the system estimates the examinee's ability and then routes the examinee to the module that matches that ability (Zheng, Nozawa, Gao, & Chang, 2012).
2.3. On-the-Fly Assembled Multistage Adaptive Testing
Traditional computerized adaptive testing (CAT) and multistage adaptive testing (MST) have different advantages and disadvantages. Zheng and Chang therefore integrated both methods into on-the-fly assembled multistage adaptive testing (OMST). OMST outperformed both CAT and MST in prototype testing. The initial stage of OMST uses items of moderate difficulty. After the first stage is completed, the items for the second stage are selected to match the examinee's ability. Item selection from the second stage onward is based on a computer algorithm that assembles each stage automatically. Other conditions such as content coverage, item exposure and other psychometric properties are controlled (Zheng & Chang, 2014).
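To make the on-the-fly assembly step more concrete, the following minimal sketch selects the items for the next stage by maximum Fisher information at the provisional ability estimate. This is an illustration under stated assumptions, not the algorithm of Zheng and Chang (2014): the item pool layout (a dictionary of hypothetical item ids mapped to (a, b, c) tuples), the function names, and the use of the 3PL information function without a scaling constant are all assumptions, and the content coverage and item exposure constraints mentioned above are not modeled.

```python
import math


def p_correct(theta, a, b, c):
    """3PL probability of a correct response (no scaling constant assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def assemble_stage(theta, pool, administered, stage_length=15):
    """Pick the most informative unused items for the next stage.

    pool: dict mapping item id -> (a, b, c); a hypothetical layout.
    administered: set of item ids this examinee has already seen.
    """
    candidates = [item_id for item_id in pool if item_id not in administered]
    # Rank the remaining items by information at the provisional ability.
    candidates.sort(
        key=lambda item_id: item_information(theta, *pool[item_id]),
        reverse=True,
    )
    return candidates[:stage_length]


# Hypothetical three-item pool with equal discrimination and no guessing.
pool = {
    "easy": (1.0, -2.0, 0.0),
    "medium": (1.0, 0.0, 0.0),
    "hard": (1.0, 2.0, 0.0),
}
print(assemble_stage(1.5, pool, administered=set(), stage_length=2))
```

For an examinee with a provisional ability of 1.5, the hard item (difficulty 2.0) is the closest match and is ranked first, followed by the medium item. A production system would add the constraint handling that the paper describes.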
[Figure 1: an initial stage (Stage 1) followed by adaptive stages (Stage 2, Stage 3), with ability estimation between stages and constrained control.]
Fig. 1. On-the-fly assembled multistage adaptive testing framework
2.4. Reflective Feedback
Feedback is conceptual information provided by a source such as a teacher, friends or a book. Feedback is also a principle of formative assessment. Its main objective is to close the gap between present and desired capacities. Effective feedback has to answer three main questions: Where am I going? How am I going? Where to next? Reflective feedback, however, is formative feedback aimed at improvement, for which there are two sets of guidelines: things to do and things to avoid when giving feedback (Hattie & Timperley, 2007; Shute, 2007).
Feedback has long been considered a leadership skill. The problem with feedback is that it focuses on past activities that have already been completed, so it lacks a sense of improvement. Feedforward should be used as well. Feedforward does not focus on past activities but on future ones. It is a recommendation for future improvement, which helps recipients be happier in life (Goldsmith, 2003). In this research we combine feedback and feedforward and call the combination reflective feedback.
2.5. Information Technology Professional Examination
The Information Technology Professional Examination (ITPE) is a collaborative effort by seven countries: Japan, Malaysia, the Philippines, Vietnam, Myanmar, Mongolia and Thailand. The examination tests knowledge and skills in information technology and is not tied to any brand. It has four levels: first, the Information Technology Passport Examination (IP); second, the Fundamental Information Technology Engineer Examination (FE); third, the Applied Information Technology Engineering Examination (AP); and fourth, the Advance Professional Examination (National Electronics and Computer Technology Center, 2015). In the new framework we use only the IP level because it is the basic level of information technology skill. Examinees who pass this level are considered to have information technology skills.
Other levels may be considered in the future.
2.6. System evaluation
In this research, three evaluations of the system are used. The first is heuristic evaluation, which searches for problems in the user interface design. Evaluators examine the user interface and make decisions about it before the system is implemented. They also recommend which parts are good and which parts should be improved (Nielsen & Molich, 1990; Nielsen, 1992). The second is user satisfaction with the human-computer interface; user satisfaction is a key factor in system development (Chin, Diehl, & Norman, 1988; Navas et al., 2007). The third is standard evaluation, based on Stufflebeam's evaluation concept, which consists of four components: utility, feasibility, propriety and accuracy (Kanjanawasee, 2011).
3. Conceptual Framework
The conceptual framework of this study is shown in Figure 2. The system is built on the principles and theory of CAT, MST, OMST, ITPE and reflective feedback. System development has four phases: studying the OMST algorithm and the reflective feedback report, designing and developing the testing system, validating and improving the quality of the system, and trying out the system. The OMST with reflective feedback system consists of five components: initial testing, ability estimation and item selection, item exposure control, the stopping criterion, and the reflective feedback report. The quality of the system will be assessed by three evaluations: heuristic evaluation, evaluation of user satisfaction with the human-computer interface, and standard evaluation.
Develop OMST with reflective feedback system
- Study algorithm of OMST and reflective feedback report
- Design and develop testing system
- Validate quality of system and improve
- Trial
OMST with reflective feedback system
- Initial testing
- Ability estimation and items selection
- Item exposure control
- Stopping criterion
- Reflective feedback report
Principle and theory of OMST with reflective feedback
- CAT
- MST
- OMST
- ITPE
- Reflective Feedback
OMST with reflective feedback system evaluation
- Heuristic
- User satisfaction of the human-computer interface
- Standard
Fig. 2. Conceptual Framework of Computerized Adaptive Testing with Reflective Feedback
The operation of the CAT system is shown in Figure 3. The system starts at stage one with items of moderate difficulty and then estimates the examinee's ability. If the standard error (SE) is less than or equal to 0.3, testing terminates; if not, testing continues to stage two. Each item in stage two is selected based on the provisional ability estimate from the first stage, using the on-the-fly assembly algorithm. In stage two, the examinee takes another 15 items. If the standard error is then less than or equal to 0.3, testing terminates; if not, testing proceeds to a further stage. Testing continues until the standard error is less than or equal to 0.3. After testing has finished, the system gives instant reflective feedback in the form of a test score and a recommendation report for information technology occupations.
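The stopping rule above depends on an ability estimate and its standard error after each stage. The paper does not state which estimator the system uses, so the sketch below is only one possible reading: it assumes expected a posteriori (EAP) estimation on a grid with a standard normal prior and takes the posterior standard deviation as SE(θ). The function names, grid settings and response layout are hypothetical.

```python
import math


def p_correct(theta, a, b, c):
    """3PL probability of a correct response (no scaling constant assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def eap_estimate(responses, lo=-4.0, hi=4.0, step=0.05):
    """Return (theta_hat, se) by EAP with a standard normal prior.

    responses: list of ((a, b, c), u) pairs, u = 1 if correct, else 0.
    Assumption: the posterior standard deviation stands in for SE(theta).
    """
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n_points)]
    log_post = []
    for theta in grid:
        value = -0.5 * theta * theta  # log of the unnormalised N(0, 1) prior
        for (a, b, c), u in responses:
            p = p_correct(theta, a, b, c)
            value += math.log(p if u == 1 else 1.0 - p)
        log_post.append(value)
    peak = max(log_post)  # subtract the peak to avoid underflow in exp
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def should_stop(se, threshold=0.3):
    """Stopping criterion from the flow chart: SE(theta) <= 0.3."""
    return se <= threshold


# Hypothetical stage of 15 moderate items, 10 answered correctly.
stage = [((1.0, 0.0, 0.0), 1)] * 10 + [((1.0, 0.0, 0.0), 0)] * 5
theta_hat, se = eap_estimate(stage)
print(round(theta_hat, 2), round(se, 2), should_stop(se))
```

After each stage, the new responses would be appended to the list, the estimate recomputed, and another stage assembled only while `should_stop` returns False.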
[Flow chart: Start; Stage 1 (15 moderate items); if SE(θ) ≤ 0.3, stop testing; otherwise Stage n (15 items, each item selected from ability in stage 1); repeat Stage n until SE(θ) ≤ 0.3, then stop testing; Reflective Feedback (report test score, recommendation for occupation); End.]
Fig. 3. Computerized Adaptive Testing with Reflective Feedback Flow Chart
4. Conclusion
Computerized adaptive testing is effective at administering items that match the ability of the examinee. Although CAT is effective, it still has weaknesses in ability estimation. OMST is used to replace traditional CAT. Unlike MST, OMST does not pre-assemble panels or testlets; instead, it selects items immediately after ability estimation using a computer algorithm. Testing is controlled for psychometric properties such as content balance and item exposure. After the test is finished, the system reports an instant test score and a career recommendation, which together are called reflective feedback. This research applies the framework to the IP level of the Information Technology Professional Examination, a collaboration among seven countries: Japan, Malaysia, the Philippines, Vietnam, Myanmar, Mongolia and Thailand. However, the system is intended for preparation before real testing; examinees cannot claim a certificate from this CAT system. The system applies a new CAT algorithm together with reflective feedback. Future research may apply this framework to other examinations, such as driving license tests.
Acknowledgements
I would like to thank Associate Professor Shotiga Phasiphol, Ph.D. and Professor Sirichai Kanjanawasee, Ph.D. for their recommendations and comments on my research.
References
Kanjanawasee, S. (2011). Evaluation Theory (Ed. 4). Bangkok: Chulalongkorn University Press.
Kanjanawasee, S. (2012). Modern Test Theory (Ed. 4). Bangkok: Chulalongkorn University Press.
Chang, H. H. (2004). Understanding computerized adaptive testing: From Robbins-Monro to Lord and beyond. In D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences. Thousand Oaks, CA: Sage.
Chang, H. H. (2014). Psychometrics behind computerized adaptive testing. Psychometrika. Published online Feb. 2014. DOI: 10.1007/S11336-014-9401-5.
Chin, J. P., Diehl, V. A., & Norman, K. L. (1988). Development of an Instrument Measuring User Satisfaction of the Human-Computer Interface. Proceeding of the SIGCHI Conference on Human Factors in Computing Systems, 213-218.
Goldsmith, M. (2003). Leadership Development: Try Feedforward Instead of Feedback. Journal of Excellence, 8, 15-19.
Phankokkruad, M. (2012). Association Rules for Data Mining in Item Classification Algorithm: Web Service Approach. Proceeding of Digital Information and Communication Technology and it's Applications (DICTAP), 2012 Second International Conference.
National Electronics and Computer Technology Center. (2015, April). Information Technology Professional Examination. Retrieved from http://www.nstdaacademy.com/webnsa/index.php/sss.
Navas, H., Osornio, A. L., Baum, A., Gomez, A., Luna, D., & Bernaldo, F. G. (2007). Creation and Evaluation of a Terminology Server for the Interactive Coding of Discharge Summaries. Proceeding of MEDINFO 2007, 650-654.
Nielsen, J., & Molich, R. (1990). Heuristic evaluation of user interfaces. Proceeding of ACM CHI'90 (Seattle, WA, 1-5 April), 249-256.
Nielsen, J. (1992). Finding Usability Problems through Heuristic Evaluation. Proceeding of ACM CHI'92 (Monterey, CA, 3-7 May), 373-380.
Zheng, Y., & Chang, H. H. (2014). On-the-Fly assembled multistage adaptive testing. Applied Psychological Measurement. Published online on September 5, 2014. DOI: 10.1177/0146621614544519.
van der Linden, W. J. (2000). Optimal assembly of tests with item sets. Applied Psychological Measurement, 24, 225-240.
Zheng, Y., Nozawa, Y., Gao, X., & Chang, H. H. (2012). Multistage Adaptive Testing for a Large-Scale Classification Test: The Designs, Automated Heuristic Assembly, and Comparison with Other Testing Modes. ACT Research Reports 2012-6.
Hattie, J., & Timperley, H. (2007). The Power of Feedback. Review of Educational Research, 77(1), 81-112.
Shute, V. J. (2007). Focus on Formative Feedback. Educational Testing Service (ETS) research report.