Journal of Engineering Science and Technology Special Issue on UKM Teaching and Learning Congress 2013, June (2015) 42 - 52 © School of Engineering, Taylor’s University
RASCH MODEL APPROACH FOR FINAL EXAMINATION QUESTIONS CONSTRUCT VALIDITY OF TWO SUCCESSIVE COHORTS

MST. SADIA MAHZABIN1, ROSZILAH HAMID2,*, SHAHRIZAN BAHAROM2

1 Department of Civil Engineering, Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman (UTAR), Jalan Genting Klang, Setapak, 53300 KL, Malaysia
2 Centre for Engineering Education Research, Department of Civil and Structural Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor Darul Ehsan, Malaysia
*Corresponding Author: [email protected]
Abstract

This paper describes a Rasch-based measurement model as a performance assessment tool to measure the construct validity of the ‘Materials Technology’ course final examination questions for a given cohort. An overview of the measurement model, its key concepts, and item and person performance is presented and illustrated. The outcome of this investigation shows that some final examination questions for the ‘Materials Technology’ course need to be revised. The same assessment was performed on the final examination questions for the previous cohort, and the results are compared with the current assessment. The construction of future final examination questions for the course can be improved based on the assessments of both cohorts.

Keywords: Rasch model, ‘Materials Technology’ course, Examination questions, Validation, Assessment.
1. Introduction

This paper describes an alternative approach that uses the Rasch Measurement Model as an accurate performance assessment tool to measure the construct validity of the ‘Materials Technology’ course final examination questions. Nowadays, the evaluation of students depends largely on their performance in tasks such as quizzes, examinations, assignments and laboratory work. The Department of Civil and Structural Engineering (Dept. of C & S), Faculty of Engineering and Built Environment, UKM offers the ‘Materials Technology’ course (coded KKKH2264) to second-year students.
‘Materials Technology’ is a 4-credit-hour course with two thirds of the contact hours devoted to lectures and one third to laboratory work. The course introduces the physical and engineering properties of all categories of construction materials via lectures and students’ project presentations. Related tests on one of the construction materials, concrete, are delivered through laboratory work.

The Rasch model is a modern measurement method that provides a sound platform of measurement matching the SI unit criteria, in that it acts as an instrument of measurement with a defined unit and is thus replicable [1]. The model uses students’ empirical data taken directly from the given tasks via the lecturer’s assessment [2]. Osman et al. [3] studied how Rasch model analysis measured the performance of students in the ‘Civil Engineering Design II’ course at the Dept. of C & S, UKM. That study was conducted on all 64 final-year students of the department, and the results showed a pattern similar to the conventional method. The WinSteps software that implements the Rasch method uses the ‘logit’ as its measurement unit and thus transforms the assessment results onto a linear scale. Rasch analysis can therefore be used as a measurement tool to determine and better explain students’ results.

Rasch analysis produces an outcome scale in logit units against a mathematical measurement. The model was developed by Georg Rasch [4] and has since been widely used, proving its effectiveness and accuracy in producing results and predictions. Saidfudin et al. [5] described how Rasch model analysis proved more meaningful for assessing academic reports and also improved students’ achievement in meeting the targeted Course Learning Outcomes (CLO) through student classification, and hence better management of assessment. The Rasch model offers an excellent and comprehensive assessment of CLO, which can enhance the understanding of education alignment and assist educators in developing and maintaining quality in engineering education [1]. The Rasch method, based on a logistic regression model, can measure and classify student learning ability using a small number of selected students and dimensions; its results were more accurate than those of the traditional CGPA method [2]. An assessment in an engineering course was done using Bloom’s Taxonomy, with behavioural learning characteristics as the parameters against which each dimension of ability is measured. The results were evaluated on how well they relate to the attributes being assessed and were then checked against the CLO maps for consistency, serving as a guide for future improvement of teaching methods and style. Lecturers are thereby expected to gain a more accurate insight into students’ level of learning competency [1, 2].

Following the early intervention [6] and the previous assessment of the final examination questions’ construct validity [7], this study aims to measure the construct validity of the ‘Materials Technology’ course final examination questions and to identify unexpected patterns of item and person performance in one cohort. The measurement is based on the final examination questions of the 2011/2012 session cohort, whose raw marks are analysed with the Rasch model. Results are compared with the analysis of a different cohort (2009/2010 session) [7].
2. Measurement Methodology

Since a student’s ability is latent and cannot be observed directly, this study uses the Rasch Rating Scale Model, which estimates person measures and item difficulties on the same linear scale in standard units (logits).
The study was conducted on all 46 students (Male = 22, Female = 24) registered for the Materials Technology course in the 2011/2012 session. The degree of a student’s ability is indicated by the separation between the item and the student’s location on the map: the further the separation, the more likely an able person responds correctly to that item. Similarly, the extent of an item’s difficulty is reflected by its position on the logit scale; an item located higher is perceived to be more difficult than an item located lower [8].

The Rasch model is closely related to Item Response Theory (IRT) but is derived from a distinct set of fundamental postulates, the most important concept being specific objectivity. The WinSteps program is built on these fundamental principles, which are deemed important and indispensable. In the Rasch philosophy the data have to comply with these principles, in other words the data have to fit the model; hence validity. The Rasch Unidimensional Measurement Method (using the WinSteps software) is applied because accurate findings can be yielded even from a small data set, and the item difficulty encountered in building up the students’ required learning ability and cognitive skills development can be duly analysed. Table 1 shows the linkage of the learning performance measurement to each identified student cognitive ability based on Bloom’s Taxonomy.

Table 1. Cognitive level (Bloom’s Taxonomy) of final examination questions.

Question No. | Learning Topic                                          | Taxonomy Bloom  | Entry
1            | Masonry, concrete components, concrete, steel, plastic | Knowledge (K)   | A1
2            | Concrete                                                | Analysis (An)   | A2
3            | Steel, ferrous metals                                   | Knowledge (K)   | A3
4            | Concrete, steel, timber                                 | Knowledge (K)   | B1
5            | Concrete mix design                                     | Analysis (An)   | B2
6            | Concrete                                                | Evaluation (E)  | B3
7            | Sustainable construction                                | Application (A) | B4

Table 1 also shows the learning topics covered by the final examination questions. Each question was constructed based on the outcomes expected from the learning topics designed before the class was offered. The Bloom’s Taxonomy cognitive learning levels, Level 1: Knowledge (K), 2: Comprehension (C), 3: Application (A), 4: Analysis (An), 5: Evaluation (E) and 6: Creation (Cr), are all measurable. For course KKKH2264, the students were expected to develop the levels up to 5, i.e., evaluation of knowledge, which requires them to analyse situations and provide appropriate solutions. The questions are entered under the entry numbers shown in Table 1, and each item is labelled with its question number, Bloom’s Taxonomy level and learning topic; thus, for entry number 1, the item is QA1_K. The assessment marks distribution is as predetermined in the typical course outline. In Rasch measurement the data are multidimensional but the measurement is unidimensional, hence putting everything on the same scale.
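As background, the core Rasch relation expresses the probability of success as a logistic function of the difference between person ability β and item difficulty δ, both in logits. A minimal Python sketch of the dichotomous form (a simplification of the full rating scale model estimated by WinSteps) is:

```python
import math

def rasch_probability(beta: float, delta: float) -> float:
    """Dichotomous Rasch model: P(X = 1) = exp(beta - delta) / (1 + exp(beta - delta)),
    where beta is the person ability and delta the item difficulty, in logits."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

# A person located 1 logit above an item succeeds about 73% of the time;
# equal ability and difficulty gives exactly 50%.
print(round(rasch_probability(beta=0.5, delta=-0.5), 2))  # -> 0.73
print(round(rasch_probability(beta=0.0, delta=0.0), 2))   # -> 0.5
```

This is why separation on the person-item map translates directly into the likelihood of a correct response, as described above.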
Rasch shifts the emphasis of reliability and validity from the traditional Cronbach-α and factor analysis towards the reproducibility of measures rather than the reproducibility of raw scores [9]. Marks were prepared based on the percentage distribution, and the final examination results were compiled and tabulated as shown in Table 2. Students were sorted according to their gender, M for male and F for female, followed by the student’s number.
Table 2. Students’ examination results: raw data (%) and rated raw scores (*prn tabulation; x = no response).

            Raw Data (%)                       Rated Raw Scores
Student  QA1  QA2  QA3  QB1  QB2  QB3  QB4   QA1  QA2  QA3  QB1  QB2  QB3  QB4
M01       50    0   60   83   75   55    0     3    1    4    5    5    3    x
M02       38    0   60   35  100   35    0     2    1    4    2    5    2    x
F01       38    0   30   33   60   20    0     2    1    2    2    4    1    x
F02       43    0   30   23    5    0   20     2    1    2    1    1    x    1
F03       40    0   45   65   65   70    0     2    1    3    4    4    5    x
M03       65    0   65   63  100    0   55     4    1    4    4    5    x    3
M04       15    0   40    0   80    0   60     1    1    2    1    5    x    4
M05       48    0   70   38   70    0   80     3    1    5    2    5    x    5
F04       50    0   65   50   70    0   25     3    1    4    3    5    x    1
M06       45    0   60    0   70   65   60     3    1    4    x    5    4    4
M07       53    0   45   73  100   75    0     3    1    3    5    5    5    x
F05       60    0   80   48   75    0   60     4    1    5    3    5    x    4
M08       48    0    0   38   80   20    0     3    1    1    2    5    1    x
M09       23    0   70   55   55    0   30     1    1    5    3    3    x    2
M10       45    0   65   30   75    0   30     3    1    4    2    5    x    2
F06       33    0   55   33   80    0   75     2    1    3    2    5    x    5
M11       48    0   40   50  100   78    0     3    1    2    3    5    5    x
M12       10    0   15    0   40   20   43     1    1    1    x    2    1    2
M13        5    0   60   38   75    0   45     1    1    4    2    5    x    2
F07       30   50   35    0   70   58   60     2    3    2    x    5    3    4
M14       48    0   70   33   60    0   65     3    1    5    2    4    x    4
M15       33    0   25    0   45    0   20     2    1    1    1    3    x    1
F08       33    0   45   48   75   43    0     2    1    3    3    5    2    x
F09       20    0   55   43    0   35   40     1    1    3    2    x    2    2
F10       45    0   45   75   75    0   73     3    1    3    5    5    x    5
M16       53    0   55   23   65    0   30     3    1    3    1    4    x    2
M17       40    0   35    0   65   30   90     2    1    2    x    4    2    5
F11       20    0   40   38   50   35    0     1    1    2    2    3    2    x
M18       48   20   65   45  100   50    0     3    1    4    3    5    3    x
F12       58    0   85    0   70   45   48     3    1    5    x    5    3    3
F13       38    0   60    0   60   23   35     2    1    4    x    4    1    2
F14       28    0   45   50   80    0   85     1    1    3    3    5    x    5
F15       15    0    0    5   65    8    0     1    1    1    1    5    1    x
F16       25    0   30   25   75   30    0     1    1    2    1    5    2    x
M19       18    0   50   40   70    0   30     1    1    3    2    5    x    2
F17       18    0   30   15   65    0    0     1    1    2    1    4    x    1
M20       30    0   40    0   85   23   10     2    1    2    x    5    1    1
F18       23    0   20   13   55   28    0     1    1    1    1    3    1    x
F19       48    0   35   45    0   25   55     3    1    2    3    x    1    3
M21       38    0   50    0  100   50   20     2    1    3    x    5    3    1
F20       25    0   15   33   45    0   20     1    1    1    2    3    x    1
F21       35   40    0   58   75    0   20     2    2    1    3    5    x    1
F22       55    0   60   30   65    0   65     3    1    4    2    4    x    4
F23       40    0   35   30   65   55    0     2    1    2    2    4    3    x
F24       38    0   55   55    0   50   65     2    1    3    5    x    3    4
M22       43    0   70   78   60    0    0     2    1    5    5    4    1    x
The raw data are then transformed into a grade rating in which students are rated according to their achievement. In this study, the responses to each item are scored 1 to 5: 5 for marks greater than 70 percent, 4 for marks of 60 to 69, 3 for 45 to 59, 2 for 30 to 44 and 1 for marks of 29 percent or less, which is considered a fail. This grade rating, tabulated in the Excel *prn format, is shown in Table 2. The number coding is necessary for further evaluation of student achievement using the Rasch software, WinSteps.
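For illustration, a minimal Python sketch of this banding follows. The boundary at 70 is taken as inclusive because the tabulated data rate a mark of exactly 70 as 5; zero marks on some items appear in Table 2 as ‘x’, i.e., no response:

```python
def grade_rating(mark: float) -> int:
    """Band a raw percentage mark into the 1-5 rating used in Table 2."""
    if mark >= 70:   # text says "greater than 70", but the data rate 70 itself as 5
        return 5
    if mark >= 60:
        return 4
    if mark >= 45:
        return 3
    if mark >= 30:
        return 2
    return 1         # 29 or below: considered a fail

# Example: student M01's raw marks for QA1, QA3 and QB1.
print([grade_rating(m) for m in (50, 60, 83)])  # -> [3, 4, 5]
```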
3. Discussion and Findings

Data were gathered and analysed in WinSteps 3.72.3 to produce the Rasch output. Figure 1 shows the Person Item Distribution Map (PIDM), in which the persons are the students and the items are the question learning topics. The PIDM plots both variables on the same logit scale: student levels run from the lowest at the bottom to the highest at the top of the scale, while the right side shows the questions together with their cognitive domain levels.
Fig. 1. Person item distribution map.
Overall, question QA2_An (Analysis) is the most difficult question based on the PIDM, as no one answered it correctly, while QB2_An (Analysis) is the easiest question for the students, as shown in Fig. 1. There are also huge gaps between QA1_K (Knowledge) and QA2_An (Analysis), and between QB4_A (Application) and QB2_An (Analysis). These gaps show the extent of difficulty the students had to deal with in answering the questions and indicate a large separation between the easy and the difficult questions.
The region beyond the most difficult question also shows an empty area that needs to be patched, and both gaps require a review to close them. Students M07, F05, F10 and M01 are the exceptional students in this cohort because they are at the top of the list; two of them are male and two are female. However, a person’s performance in any course undertaking is subjective. It may be affected by factors such as the student’s health condition or the gap between two examinations, which influences the student’s preparation. These factors are not taken into account in this model and may well affect it. Likewise, QA2_An is flagged as the most difficult question by the PIDM, but the model does not consider that a failure to answer may be due to insufficient examination time. The model assumes that the time allocation is sufficient, so it is the instructor’s responsibility to ensure this before applying the Rasch model to assess the questions.

Figures 2 and 3 show the summary statistics for the person and item measures. Figure 2 shows a fair person spread of 4.86 logits (from a maximum of 1.25 to a minimum of −3.61) with separation G = 1.73 and a fair reliability of Cronbach-α = 0.57. The major finding is the person mean value, µperson = −0.82 logit, which is lower than the threshold value, MeanItem = 0. This shows that the students’ performance is below the expected performance: only 32.6% (N = 15; (15/46) × 100) of the students were above the MeanItem, while 67.4% (N = 31; (31/46) × 100) were below it. According to the raw scores, 84.8% (N = 39) of the students passed the final examination, as shown in Table 3, which presents the statistical analysis of raw marks for both the 2011/2012 and 2009/2010 [7] sessions.
Fig. 2. Summary statistics: person measures.
Fig. 3. Summary statistics: item measures.
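As an aside, the separation index G and the reliability R reported in these summaries are linked by the standard Rasch relation R = G²/(1 + G²). A minimal sketch, using the values reported above, illustrates the check:

```python
def reliability_from_separation(g: float) -> float:
    """Standard Rasch relation between the separation index G and reliability R."""
    return g ** 2 / (1 + g ** 2)

# Item side: the reported separation G = 5.36 reproduces the reported reliability of 0.97.
print(round(reliability_from_separation(5.36), 2))  # -> 0.97

# Person side: G = 1.73 corresponds to a model reliability of about 0.75; the paper
# reports Cronbach-alpha = 0.57, a related but distinct raw-score statistic.
print(round(reliability_from_separation(1.73), 2))  # -> 0.75
```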
Table 3. Statistical analysis of raw marks.

Statistical Analysis   | No. (S2009/2010) | No. (S2011/2012)
Marks > 70             | 8                | 0
60 ≤ Marks < 70        | 9                | 2
45 ≤ Marks < 60        | 22               | 18
30 ≤ Marks < 45        | 11               | 19
Marks ≤ 29             | 4                | 7
Average                | 53.10            | 42.5
Standard Deviation     | 14.21            | 11.91
Maximum                | 81               | 64.5
Minimum                | 15.5             | 18.5
No. of input data      | 54               | 46
Figure 3 shows the item summary with separation G = 5.36 and a very high reliability of 0.97. It has a good item spread of 6.99 logits with SDi = 1.89, but both the difficult and the easy ends of the item scale require review. Nevertheless, students F02, F18 and M12 are definitely in trouble, as they are located below all the items and thus have serious difficulty with the course content.

Analysis of the Point Measure Correlation (PMC) indicates the construct validity of the questions; a quality question has a point measure x within the range 0.4 < x < 0.8. Figure 4 shows that all questions met the criteria of a quality question, so no review is required on this ground. For the 2009/2010 cohort [7], questions with MNSQ > 1.5 and ZSTD > 2 indicated that poor students could answer those difficult questions. On the other hand, for the 2011/2012 cohort only question QA2_An failed, with point measure x = 0.09 < 0.4, MNSQ > 1.5 and ZSTD > 2: many students could not answer this question, yet poor students could answer this difficult one. The other six questions properly met all three controls. From the person measure table for the 2009/2010 cohort [7], five students were found to be misfits, failing all three criteria. For the 2011/2012 cohort the same checks reveal that all students fit and pass the criteria.
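A minimal sketch of the three quality controls applied above may make the checks concrete. The acceptance ranges follow the text and common Rasch practice; the helper function and the MNSQ/ZSTD values used for QA2_An are hypothetical, since only its PMC of 0.09 is given in the text:

```python
def item_ok(pmc: float, mnsq: float, zstd: float) -> bool:
    """An item passes when 0.4 < PMC < 0.8, 0.5 < MNSQ < 1.5 and |ZSTD| < 2."""
    return 0.4 < pmc < 0.8 and 0.5 < mnsq < 1.5 and abs(zstd) < 2

# QA2_An (2011/2012 cohort): PMC = 0.09 from the text; the MNSQ and ZSTD values
# are illustrative stand-ins consistent with "MNSQ > 1.5 and ZSTD > 2".
print(item_ok(pmc=0.09, mnsq=1.8, zstd=2.4))  # -> False
```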
4. Conclusions

The construct validity and improvement of the ‘Materials Technology’ course final examination questions have been measured using the Rasch Measurement Model for two different cohorts. In the 2009/2010 cohort, the students found half of the topics studied difficult to answer, with Q3c-wood (Comprehension), Q3b-concrete (Knowledge) and Q5b-concrete (Analysis) being the most difficult, as most of the students were unable to answer them satisfactorily. In the 2011/2012 cohort, the most difficult questions were reduced to only one, QA2_An (Analysis), and QB2_An (Analysis) was the easiest question. The level of difficulty of each question in both cohorts was determined through Rasch analysis, even though the Bloom Taxonomy levels of the questions differ; the instructors found that the level of difficulty of the questions can be set at different levels of the Bloom Taxonomy. Five students from the 2009/2010 cohort were found to be misfits, failing all three person-measure criteria, whereas in the 2011/2012 cohort all students fit and pass the criteria. The results also show that the item set is valid and reliable for measuring student performance in this course. The results generated from this measurement can guide the lecturer in determining appropriate improvements to the teaching method as well as in judging the quality of the questions prepared.
Acknowledgements The authors acknowledge the financial support from Universiti Kebangsaan Malaysia through grants PTS-2013-004 and PTS-2013-017.
References

1. Saidfudin, M.; and Ghulman, H.A. (2009). Modern measurement paradigm in engineering education: Easier to read and better analysis using Rasch-based approach. International Conference on Engineering Education (ICEED), Dec. 9-10, Shah Alam, Selangor, Malaysia.
2. Saidfudin, M.; Ghulman, H.A.; Razimah, A.; and Rozeha, A. (2008). Application of Rasch-based ESPEGS model in measuring generic skills of engineering students: A new paradigm. WSEAS Transactions on Advances in Engineering Education, 5(8), 591-602.
3. Osman, S.A.; Naam, S.I.; Jaafar, O.; Badaruzzaman, W.H.W.; and Rahmat, R.A.A. (2012). Application of Rasch model in measuring students’ performance in Civil Engineering Design II course. International Conference on Teaching and Learning in Higher Education (ICTLHE 2012) in conjunction with RCEE & RHED 2012. Procedia - Social and Behavioral Sciences, 56, 59-66.
4. Tennant, A.; and Conaghan, P.G. (2007). Using Rasch analysis to compare the psychometric properties of the short form 36 physical function score and the health assessment questionnaire disability index in patients with psoriatic arthritis and rheumatoid arthritis. Arthritis & Rheumatism (Arthritis Care & Research), 57(8), 1358-1362.
5. Saidfudin, M.; Azlinah, M.; Azrilah, A.A.; Nor Habibah, A.; and Sohaimi, Z. (2007). Appraisal of course learning outcomes using Rasch measurement: A case study in Information Technology education. International Journal of Systems Applications, Engineering & Development, 1(4), 164-172.
6. Hamid, R.; Yusof, K.M.; Osman, S.A.; and Rahmat, R.A.O.K. (2009). Improvement in delivery methods in teaching materials technology. WSEAS Transactions on Advances in Engineering Education, 6(3), 77-86.
7. Hamid, R.; Othman, E.; Osman, S.A.; Hamzah, N.; Jaffar, O.; and Kasim, A.A.A. (2011). Determination of materials technology course final examination questions construct validity through Rasch model approach. Recent Researches in Education: Proceedings of the 10th WSEAS International Conference on Education and Educational Technology (EDU ’11), 154-160.
8. Zakaria, S.; Aziz, A.A.; Mohamed, A.; Habibah, N.; Ghulman, A.H.A.; and Masodi, M.S. (2008). Assessment of information managers’ competency using Rasch measurement. Third International Conference on Convergence and Hybrid Information Technology, IEEE, 190-196.
9. Mok, M.; and Wright, B. (2004). An overview of Rasch model and measurement. University of Chicago, USA.
10. Saidfudin, M.; Azrilah, A.A.; Rodzo’an, N.A.; Omar, M.Z.; Zaharim, A.; and Basri, H. (2010). Easier learning outcomes analysis using Rasch model in engineering education research. EDUCATION ’10: Proceedings of the 7th WSEAS International Conference on Engineering Education, 442-447.
11. Hamid, R.; Baharom, S.; Taha, M.R.; and Kasim, A.A.A. (2012). Sustainable and economical open-ended project for materials technology course laboratory work. Procedia - Social and Behavioral Sciences (UKM Teaching and Learning Congress 2011), 60, 3-7.