Journal of Medicine and Medical Science Vol. 1(7), pp. 269-272, August 2010. Available online at http://www.interesjournals.org/JMMS. Copyright ©2010 International Research Journals

Full length Research Paper

Objective structured clinical examination: Examiners’ bias and recommendations to improve its reliability

Salman Yousuf Guraya, Abdu Hassan Alzobydi, Shaista Salman

Department of Surgery, College of Medicine, Taibah University, Al Madinah Al Munawarah, Kingdom of Saudi Arabia

Accepted 21 June, 2010

ABSTRACT

To objectively evaluate the validity of scores awarded at examiner-based manned stations (MS) and non-examiner-based unmanned stations (UnMS) of the surgery Objective Structured Clinical Examination (OSCE), using the scores of a written paper as the outcome variable. Scoring of OSCEs during the surgical courses for fourth-year undergraduate students was evaluated. Students rotated through twelve 3-minute stations (5 MS, 5 UnMS, and 2 rest stations). A well-structured blueprint was applied to define the competence to be evaluated in each group. The scores of the manned and unmanned stations for each student were compared, followed by comparison with the score of the written paper, which was marked electronically. 96 students were evaluated. The mean score (± SD) for student assessment at the MS, 89.87% (± 3.5), was significantly higher than that of the UnMS, 72.17% (± 2.3); P < 0.001. The correlation coefficient for MS and UnMS scores was 0.31 (P = 0.005). With the written paper score as the dependent variable, there was no significant interaction between MS and UnMS scores, nor between MS and written paper scores (regression coefficient = –0.93, P = 0.06). There is clear evidence of examiner bias in the OSCE assessment, which necessitates examiners’ training and experience to achieve professional goals.

Keywords: Examiner bias, OSCE, medical education

*Corresponding author. Email: [email protected]

INTRODUCTION

Since the introduction of the Objective Structured Clinical Examination (OSCE) by Harden in 1975 (Harden et al., 1975), it has emerged as a ‘gold standard’ of health professional assessment in a variety of disciplines (Merrick et al., 2000; Bartfay et al., 2004). The evaluation of competence using the traditional clinical examination has its limitations because of low reliability and validity (Hijazi and Downing, 2008). Reliability refers to the reproducibility of assessment data and estimates the random error of judgement (Downing, 2004), whereas validity is the accumulation of evidence that supports meaningful interpretation of the assessment results (Downing, 2003). The OSCE is superior to the oral clinical examination because it overcomes the problem of case specificity by sampling a broad area of competency and knowledge (Newble, 2004). Compared to written evaluations, the OSCE format attempts to enhance examination fidelity by more closely simulating realistic clinical scenarios, and has been shown to have both reliability and construct validity (Cohen et al., 1990).

A key factor of reliability in the OSCE is the accurate judgment made by the examiners (Rushforth, 2007). This becomes a paramount issue when a typical OSCE station relies solely on a single examiner. Interestingly, there is minimal debate in the literature as to whether the qualifications, experience, or training of the examiners affect the reliability of the OSCE. This study compares the results of Manned Stations (MS) versus Unmanned Stations (UnMS) of the OSCE conducted for the 4th year surgery students at the College of Medicine, Taibah University, Al Madinah Al Munawarah, Kingdom of Saudi Arabia.

METHODS

The Department of Surgery, College of Medicine, Taibah University, Ministry of Higher Education, Al Madinah Al Munawarah, Saudi Arabia, conducted two 9-week courses for the 4th year medical students. Each course comprised a Mid Course Examination (MCE) and a Final Examination (FE), undertaken during the 5th and 9th week of the course respectively. The MCE contained a written paper of Single Response Questions (SRQ) and Extended Matching Questions (EMQ) and an OSCE, whereas the FE had three components: a written paper, an OSCE, and a clinical examination.


Table 1. Result analysis of the OSCE and written examinations for 4th year surgery students

                        OSCE Manned Stations            OSCE Unmanned Stations          Written
Group                   Passed No (%)   Failed No (%)   Passed No (%)   Failed No (%)   Passed No (%)   Failed No (%)
Female MCE* (No 43)     39 (90.6)       4 (9.3)         30 (69.7)       13 (30.3)       32 (74.4)       11 (25.6)
Female Final (No 43)    37 (86)         6 (14)          31 (72)         12 (28)         33 (76.7)       10 (23.3)
Male MCE (No 53)        50 (94.3)       3 (5.7)         38 (71.6)       15 (28.4)       37 (69.8)       16 (30.1)
Male Final (No 53)      47 (88.6)       6 (11.4)        40 (75.4)       13 (24.6)       39 (73.5)       14 (36.6)

* Mid course examination

Figure 1. Comparative results of manned and unmanned stations of OSCE and written examinations

The written paper contained problem-based scenarios with high reliability and content validity. Based on the curriculum, a blueprint was developed for each OSCE to capture the clinical competencies in the covered subjects. Every OSCE utilized twelve time-controlled, 3-minute stations: 5 MS, 5 UnMS, and 2 rest stations. The MS and UnMS were piloted and standardized to portray the same degree of difficulty. A map of the stations was devised to guide the examinees and organizers, and clear written instructions were given to the examiners, patients, and examinees. For the MS, examiners were provided with a marking sheet and a relevant station task-specific checklist for marking. At the beginning of the OSCE, examination sheets were distributed to the students to write their answers at the UnMS, where the required tasks were clearly stated. Students moved between stations on the command of the timekeepers. Examiners supervised each station throughout the session, and the whole group of students was assessed by a near-identical process. At the end, the marking and answer sheets were collected from the examiners and students respectively. The student answers for the UnMS were corrected against a predesigned checklist. The mean scores for the written, MS, and UnMS components were calculated for each student and maintained in an electronic database for interpretation. Statistical tests used were the two-sample t-test, Pearson’s correlation coefficient, and multiple linear regression, performed in GraphPad InStat Version 3.00 (GraphPad Software, San Diego, CA, USA). A P value < 0.05 was considered significant.
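For illustration only, the analysis described above can be reproduced with open-source tools. The following is a minimal sketch in Python using NumPy and SciPy rather than GraphPad InStat, which the study actually used; the score arrays are synthetic placeholders (not the study data), and the variable names are assumptions.

```python
# Minimal sketch of the reported analysis: two-sample t-test, Pearson
# correlation, and multiple linear regression. The data below are
# synthetic placeholders, not the study's records.
import numpy as np
from scipy import stats

# Hypothetical per-student mean scores (percentages) for 96 students.
rng = np.random.default_rng(0)
ms = rng.normal(89.9, 3.5, 96)       # manned-station scores
unms = rng.normal(72.2, 2.3, 96)     # unmanned-station scores
written = rng.normal(74.0, 5.0, 96)  # written-paper scores

# Two-sample t-test comparing MS and UnMS means.
t_stat, t_p = stats.ttest_ind(ms, unms)

# Pearson correlation between MS and UnMS scores.
r, r_p = stats.pearsonr(ms, unms)

# Multiple linear regression (ordinary least squares): written score as
# the dependent variable, MS and UnMS scores as predictors.
X = np.column_stack([np.ones_like(ms), ms, unms])
coef, *_ = np.linalg.lstsq(X, written, rcond=None)

print(f"t = {t_stat:.2f}, P = {t_p:.3g}")
print(f"Pearson r = {r:.2f}, P = {r_p:.3g}")
print(f"intercept = {coef[0]:.2f}, b_MS = {coef[1]:.2f}, b_UnMS = {coef[2]:.2f}")
```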

RESULTS

A total of 96 students (43 female and 53 male) were examined during 4 OSCEs through the academic year 2007-2008. The mean score (± SD) for student performance as judged at the MS was 89.87% (± 3.5), compared to 72.17% (± 2.3) recorded at the UnMS (Table 1). This difference was statistically significant (P < 0.001). The correlation coefficient for MS and UnMS scores was 0.31 (P = 0.005). With the written result as the dependent variable, there was no significant interaction between MS and UnMS scores. Also, there was no significant relationship between MS and written paper scores (regression coefficient = –0.93, P = 0.06). The scores of the UnMS and the written paper were significantly and positively related (regression coefficient = 0.79, P = 0.49), as demonstrated in Figure 1.


DISCUSSION

The OSCE is considered a validated measure to assess the competency of medical students, with a potential for skilled application of knowledge in clinical and technical scenarios (Zyromski et al., 2003). It is a dynamic evaluation tool with a broad range of skills tested (Schuwirth and van der Vleuten, 2003), a motivation for learning (Smee, 2003), and a useful component of multimethod assessment, complementing multiple-choice tests and subjective ratings (Epstein and Hundert, 2002).

An OSCE is a series of timed stations (usually ranging from 8 to 20) through which the examinees are evaluated by performing a standardized clinical task using a well-defined, structured marking sheet. The clinical task can be history taking, clinical examination, communication skills, problem-based scenarios, technical skills, a visual stimulus (imaging or picture), or taking informed consent. The OSCE assesses physician performance under examination conditions (competence-based assessment), which is a pre-requisite for physician performance in real life (Rethans et al., 2002). However, the OSCE is not without its potential limitations, not only in terms of students’ stress but also the orchestration of the process, faculty time, cost, staffing (Bartfay et al., 2004), and efforts to ensure confidentiality of the OSCE stations when student cohorts need to be assessed in various subgroups (Mavis et al., 1996). The OSCE does not address a trainee’s ethics and behavior, which are mandatory components of medical education. At the same time, not all clinical scenarios can be reproduced by the OSCE, and not all the components of a clinical task can be captured by a checklist.

There is still tremendous debate about the checklist method of evaluation. It has been well documented that there is substantial variability in the approach of ‘novices’ versus ‘experts’ towards clinical problem solving (Hodges et al., 1999). This has led to variation in generating a gold-standard checklist even for the same clinical topic. The checklist used in the present study was designed to break down performance into a series of discrete items, and examiners were required to tick each element on the list as either ‘done’ or ‘not done’ (a minimal illustration of this scoring scheme is sketched below). The examiners were not allowed to mark the overall rating of the station, which was left to the OSCE organizer, who corrected all stations at the end of the examination.

The results of this study showed significantly higher scores at the MS compared to the scores of the UnMS, which may partly stem from ‘determination bias’. The examiners awarded high marks to favor more pleasant student-teacher encounters, which unfortunately created a ‘halo effect’ in the evaluation of the students. The authors believe that this can be minimized by inviting more external educators as examiners from various universities and medical institutions. External educators will not have prior knowledge of the students, and there will be only a remote possibility of a future student-teacher encounter, both factors being instrumental in the overestimation of students’ competence.
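For illustration only, the ‘done / not done’ checklist scoring described above could be computed as in the following sketch; the item names, equal weighting, and percentage scale are assumptions, not the study’s actual marking sheets.

```python
# Illustrative 'done / not done' checklist scoring for a single manned
# station. The items and equal weighting are hypothetical, not taken
# from the study's actual checklists.
from typing import Dict


def station_score(checklist: Dict[str, bool]) -> float:
    """Return the percentage of checklist items ticked as 'done'."""
    if not checklist:
        return 0.0
    done = sum(1 for ticked in checklist.values() if ticked)
    return 100.0 * done / len(checklist)


# Hypothetical checklist for an abdominal-examination station.
marks = {
    "introduces self and obtains consent": True,
    "inspects the abdomen": True,
    "palpates all four quadrants": False,
    "summarises findings": True,
}
print(f"Station score: {station_score(marks):.1f}%")  # -> 75.0%
```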

Another reason for this overestimation of students’ competence might be that the examiners were not adequately trained or experienced to examine the students rigorously enough to achieve professional standards. Wilkinson and Frampton (2004) stressed the examiners’ training and commitment to the whole process of the OSCE to minimize variation in rating and improve objectivity. This seems to be the key factor in the entire exercise of the OSCE, as reflected in the present study.

In this study, the piloting of the MS and UnMS was based on a scientifically generated blueprint which assigned tasks of similar difficulty to each category. This practice dismissed the assumption of allocating easy tasks to the MS and difficult ones to the UnMS.

The literature is scarce as to who should evaluate the various components of the OSCE to assess examinee performance. McLaughlin and co-authors (McLaughlin et al., 2006) utilized standardized patients in evaluating third-year medical students during the OSCE and documented that standardized patients gave higher marks than those awarded by physician examiners. The inflated scoring by the standardized patients may be the result of their limited training, which could not distinguish students with surface knowledge from those with an in-depth understanding of the subject. With this in view, Whelan and others (Whelan et al., 2005) advocated a hybrid form of OSCE in which each attribute is assessed by the person best suited to evaluate that particular task: aspects of communication are best evaluated by the patient (or standardized patient), whereas problem-solving skills are best evaluated by content experts.

Many researchers believe that rater bias, with the resultant central tendency and ‘halo effect’, is an established major threat to the validity of the OSCE (Downing et al., 2004). The present OSCE relied heavily on the judgment of a single examiner at each MS, which might have influenced the outcome. To overcome this threat to the validity of the OSCE, Humphris and Kaney (2001) delegated two examiners per station, but even this improvisation could not completely eliminate the possibility of examiner bias. Accuracy can be tested experimentally by measuring inter-rater reliability, with selected stations being independently marked by two assessors whose marks are later compared (a minimal sketch of such a comparison follows this discussion). This is, of course, labor-intensive and time-consuming, with a debatable outcome.

This study reported a positive relation between UnMS and written paper scores, which reflected an unbiased evaluation of the medical students. At the same time, this finding highlights an important limitation of the study: the written paper, which tests higher cognitive functions, was employed as the outcome measure, and this is not fully congruent with the OSCE, which evaluated different aspects of competence. This demands further evidence-based projects to validate our study.
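For illustration only, the inter-rater reliability check mentioned above (two assessors independently marking the same station) could be quantified with Cohen’s kappa, as in this sketch; the tick patterns are synthetic examples, not data from this study.

```python
# Illustrative inter-rater agreement for a double-marked OSCE station:
# Cohen's kappa over two assessors' binary 'done / not done' ticks.
# The marks below are synthetic examples, not data from this study.
import numpy as np

assessor_a = np.array([1, 1, 0, 1, 1, 0, 1, 0, 1, 1])  # 1 = done, 0 = not done
assessor_b = np.array([1, 0, 0, 1, 1, 0, 1, 1, 1, 1])

po = np.mean(assessor_a == assessor_b)          # observed agreement
p_done = assessor_a.mean() * assessor_b.mean()  # chance both tick 'done'
p_not = (1 - assessor_a.mean()) * (1 - assessor_b.mean())  # both 'not done'
pe = p_done + p_not                             # agreement expected by chance
kappa = (po - pe) / (1 - pe)                    # Cohen's kappa

print(f"Observed agreement = {po:.2f}, Cohen's kappa = {kappa:.2f}")
```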


To conclude, examiner bias affects the overall validity of OSCE assessment. To minimize this element, the examiners should be adequately trained and experienced to supervise their respective stations. Training workshops, medical education courses, and mock OSCEs are recommended to achieve desirable standards.

ACKNOWLEDGEMENTS

Thanks to all those from the teaching staff, academic support, clinical teams, patients, and students who helped to develop and execute all the OSCEs.

REFERENCES

Bartfay W, Rombough R, Howse E, Leblanc R (2004). The OSCE approach in nursing education. Can. Nurse 100(3):18-23.
Cohen R, Reznick RK, Taylor BR, Provan J, Rothman A (1990). Reliability and validity of the objective structured clinical examination in assessing surgical residents. Am. J. Surg. 160:302-305.
Downing SM, Haladyna TM (2004). Validity threats: overcoming interference with proposed interpretations of assessment data. Med. Educ. 38(3):327-333.
Downing SM (2003). Validity: on the meaningful interpretation of assessment data. Med. Educ. 37(9):830-837.
Downing SM (2004). Reliability: on the reproducibility of assessment data. Med. Educ. 38:1006-1012.
Epstein RM, Hundert EM (2002). Defining and assessing professional competence. JAMA 287(2):226-235.
Harden RM, Stevenson M, Downie WW, Wilson GM (1975). Assessment of clinical competence using objective structured examination. Br. Med. J. 1:447-451.
Hijazi M, Downing SM (2008). Objective structured clinical examination as an assessment method in residency training: practical considerations. Ann. Saudi Med. 28(3):192-199.
Hodges B, Regehr G, McNaughton N, Tiberius R, Hanson M (1999). OSCE checklists do not capture increasing levels of expertise. Acad. Med. 74:1129-1134.
Humphris G, Kaney S (2001). Examiner fatigue in communication skills objective structured clinical examinations. Med. Educ. 35:444-449.
Mavis B, Henry R, Ogle K, Hoppe R (1996). The emperor’s new clothes: the OSCE revisited. Acad. Med. 71(5):447-453.

McLaughlin K, Gregor L, Jones A, Coderre S (2006). Can standardized patients replace physicians as OSCE examiners? BMC Med. Educ. 6(12):1-7.
Merrick H, Nowacek G, Boyer J, Robertson J (2000). Comparison of the objective structured clinical examination with the performance of third-year medical students in surgery. Am. J. Surg. 179:286-288.
Newble D (2004). Techniques for measuring clinical competence: objective structured clinical examinations. Med. Educ. 38(2):199-203.
Rethans JJ, Norcini JJ, Baron-Maldonado M, Blackmore D, Jolly BC, LaDuca T (2002). The relationship between competence and performance: implications for assessing practical performance. Med. Educ. 36(10):901-909.
Rushforth HE (2007). Objective structured clinical examination (OSCE): review of literature and implications for nursing education. Nurse Educ. Today 27(5):481-490.
Schuwirth L, van der Vleuten C (2003). The use of clinical simulations in assessment. Med. Educ. 37(Suppl 1):65-71.
Smee S (2003). Skill based assessment. Br. Med. J. 326(7391):703-706.
Whelan GP, Boulet JR, McKinley DW, et al. (2005). Scoring standardized patient examinations: lessons learned from the development and administration of the ECFMG Clinical Skills Assessment. Med. Teach. 27:200-206.
Wilkinson T, Frampton C (2004). Comprehensive undergraduate medical assessments improve prediction of clinical performance. Med. Educ. 38:1111-1116.
Zyromski NJ, Staren ED, Merrick HW (2003). Surgery residents’ perception of the objective structured clinical examination (OSCE). Curr. Surg. 60(5):533-537.