Correlation of selection scores with subsequent assessment scores during surgical training

Zaita Oldfield,* Spencer W. Beasley,† Julian Smith,‡ Adrian Anthony§ and Anthony Watt¶

*Education Development and Research Department, Royal Australasian College of Surgeons, Melbourne, Victoria, Australia
†Court of Examiners, Royal Australasian College of Surgeons, Melbourne, Victoria, Australia
‡Department of Surgery (MMC), Monash University and Department of Cardiothoracic Surgery, Monash Medical Centre, Melbourne, Victoria, Australia
§Department of Surgery, The Queen Elizabeth Hospital, Adelaide, South Australia, Australia
¶College of Education, Victoria University, Melbourne, Victoria, Australia
Key words: assessment, education, examination, selection, surgical training.

Correspondence: Zaita Oldfield, Education Development and Research Department, Royal Australasian College of Surgeons, College of Surgeons' Gardens, 250-290 Spring Street, East Melbourne, Vic. 3002, Australia. Email: [email protected]

Z. Oldfield MEd; S. W. Beasley MBChB, MS, FRACS; J. Smith MBBS, MS, FRACS, FACS; A. Anthony MBBS, FRACS; A. Watt PhD.

Accepted for publication 6 January 2013.

doi: 10.1111/ans.12176
Abstract

Background: Determining admission criteria to select candidates most likely to succeed in surgical training in Australia and New Zealand has been an imprecise art, with little empirical evidence informing decisions. Selection to the Royal Australasian College of Surgeons' Surgical Education and Training programme is based entirely on applicants' performance in a structured curriculum vitae (CV), referees' reports and interviews. This retrospective review compared General Surgery (GS) trainees' performance in selection with their subsequent performance in assessments during training.

Methods: Data from three cohorts of GS trainees were sourced. Scores for four selection items were compared with scores from six training assessments. Interrelationships within each of the sets of selection and assessment variables were determined.

Results: A single significant relationship was found between scores on the three selection tools. High scores in the CV did not correlate with higher scores in any subsequent assessment. The structured referee report score, multi-station interview score and total selection score all correlated with performance in subsequent work-based assessments and examinations. Direct observation of procedural skills (DOPS) scores appear to reflect increasing acquisition of operative skills. Performance in mini clinical examinations (Mini-CEX) was variable, perhaps reflecting limitations of this assessment. Candidates who performed well in one examination tended to perform well in all three examinations.

Conclusions: No selection tool demonstrated strong relationships with scores in all subsequent assessments; however, referee reports, multi-station interviews and total selection scores are indicators of performance in particular assessments. This may engender confidence that candidates admitted into the GS training programme are likely to progress successfully through the programme.
Introduction

How to select the most suitable candidates for surgical training has attracted increasing interest in recent times. In the face of greater academic and financial accountability, increased economic pressures on specialist training and reduced working hours, there is an imperative to ensure that selection practices identify those who are most likely to succeed in training and beyond.1–3 Determining appropriate admission criteria, however, has tended to be an imprecise art with little empirical evidence to inform decisions.4
In Australia and New Zealand, surgical training is specialty specific at the outset and typically takes five or more years to complete.5 Selection to the Royal Australasian College of Surgeons' (RACS) General Surgery (GS) Surgical Education and Training (SET) programme, undertaken annually, is a highly competitive process – applicants are an elite group of closely matched, highly motivated and skilled individuals. Decisions about selection onto the GS SET programme comply with the Brennan principles6 and are based entirely on applicants' performance in three selection activities: a structured curriculum vitae (CV), structured referee reports (RR) and semi-structured multi-station interviews (MSI).
Each of the selection tools constitutes a defined proportion of the total selection score (TSS): the CV accounts for 20%, and the RR and MSI each account for 40% (a weighting illustrated in the sketch at the end of this section). Applicants are ranked according to their TSS, with those achieving the highest ranks being offered positions in the GS SET programme.

The selection tools gather information about specific attributes, with slight variations between years and between countries in the information being sought. The CV scores applicants' self-reported, authenticated biographic information, clinical experience, and academic and personal accomplishments. The RR scores applicants' workplace performance, as judged by their supervisors, against criteria aligned to the RACS competencies of collaboration, communication, judgement and clinical decision making, management and leadership, medical expertise, professionalism, scholar and teacher, and technical expertise. The MSI scores applicants' attributes as they relate to scholar and teacher, communication and collaboration, management and leadership, health advocacy and cultural awareness, and professionalism and contribution to GS. These qualities are assessed by multiple interviewers at five interview stations.

Most SET training occurs in hospitals during trainees' work-based activities, with short courses and formal teaching augmenting this clinical training. Major assessments comprise work-based and in-training assessments and summative examinations. Trainees undertake at least one direct observation of procedural skills (DOPS) and one mini clinical examination (Mini-CEX) formative assessment for every 'rotation' in the first 2 years of SET, and complete one summative end-of-term assessment (ETA) for every 6-month rotation throughout SET. In the first two SET years, trainees must also complete two multiple-choice surgical sciences examinations (SSEs) – the generic SSE (Gen SSE) and the specialty-specific SSE (Sp-sp SSE) – and an objective structured clinical examination (CE) in order to progress in the programme. The final assessment, the fellowship examination, is usually undertaken in the fifth or sixth SET year.

The different assessments reflect the range of knowledge, skills and attributes required of surgical trainees. The two work-based assessments (DOPS and Mini-CEX) provide 'snapshots' of trainees' clinical skills in the workplace; the ETA in-training assessments review trainees' clinical, operative and professional skills over time. Most workplace supervisors who implement these assessments have received training in conducting work-based and in-training assessments. The written Gen SSE and Sp-sp SSE test trainees' knowledge of the surgical sciences; the CE tests trainees' application of knowledge and performance of clinical skills in a multi-station examination setting.

This study was conducted to investigate the relationship between trainees' scores in selection to the RACS GS training programme and their scores in assessments during the first 3 years of training, to determine whether applicants' performance in any selection tool was associated with their subsequent performance as GS trainees. This is the first study to examine the relationship between selection tools and assessments in the context of Australian and New Zealand surgical training.
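As a concrete illustration of the TSS weighting described earlier (CV 20%, RR 40%, MSI 40%), the following minimal Python sketch combines percentage scores into a TSS and ranks applicants. The function name and the applicant scores are hypothetical, illustrative only and not RACS's actual scoring implementation.

def total_selection_score(cv: float, rr: float, msi: float) -> float:
    """Combine percentage scores for the three selection tools into a TSS,
    using the weighting described in the text: CV 20%, RR 40%, MSI 40%."""
    return 0.20 * cv + 0.40 * rr + 0.40 * msi

# Hypothetical applicants with percentage scores for each selection tool.
applicants = {
    "Applicant A": total_selection_score(cv=85.0, rr=72.0, msi=78.0),
    "Applicant B": total_selection_score(cv=70.0, rr=88.0, msi=81.0),
}

# Applicants are ranked by TSS; the highest-ranked are offered positions.
for name, tss in sorted(applicants.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: TSS = {tss:.1f}%")

Note that under this weighting a strong RR and MSI outweigh a strong CV, consistent with the CV's smaller share of the TSS.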
Materials and methods

The study was approved by the Royal Australasian College of Surgeons Ethics Committee and by the Victoria University Ethics Committee.

Data were reviewed for 367 Australian and New Zealand GS trainees who had been selected into three consecutive cohorts in 2008 (n = 105), 2009 (n = 107) and 2010 (n = 155) to commence training in 2009, 2010 and 2011, respectively. At the time of the study, the majority of those selected in 2008 had progressed into their third year of training (SET3), with the 2009 and 2010 cohorts in SET2 and SET1, respectively.

The data collected comprised scores for four selection items and scores for performance in six assessments during training. Name, identification number, year of application, country and state of origin, and selection scores were collected for all trainees in the study. At the time of initial data collection, many trainees selected in 2009 and 2010 (commencing in 2010 and 2011) had not yet completed assessments that were implemented in SET2 and SET3; however, sample sizes were sufficient for statistical analysis of all assessment items except DOPS4 and Mini-CEX4. Table 1 summarizes the selection tools and assessment items in which trainee performance was scored.
Table 1 Selection and assessment items in which trainee performance was scored

Selection items (n = 367)
• Structured curriculum vitae (CV)
• Structured referee reports (RR)
• Semi-structured multi-station interview (MSI)
• Total selection score (TSS)

Assessment items

In-training and work-based assessments†
• Direct observation of procedural skills (DOPS): DOPS1 (n = 178), DOPS2 (n = 119), DOPS3 (n = 44), DOPS4 (n = 16) and average DOPS (n = 179)
• Mini clinical examinations (Mini-CEX)‡: Mini-CEX1 (n = 175), Mini-CEX2 (n = 115), Mini-CEX3 (n = 39), Mini-CEX4 (n = 12) and average Mini-CEX (n = 178)
• End-of-term assessments (ETA): ETA1 (n = 240), ETA2 (n = 168), ETA3 (n = 125), ETA4 (n = 70) and average ETA (n = 246)

Examinations
• Generic surgical sciences examination (Gen SSE) (n = 325)
• Specialty-specific surgical sciences examination (Sp-sp SSE) (n = 276)
• Clinical examination (CE) (n = 327)

†The average DOPS, Mini-CEX and ETA assessments include trainees whose available scores omitted DOPS1, Mini-CEX1 and/or ETA1.
‡Equivalent to mini clinical evaluation exercise.
Data were collected on trainee performance for the three cohorts (2008–2010). All trainees' scores were calculated for each selection activity; scores were converted to percentages, and a TSS percentage was derived for every trainee. Assessment scores were recorded and converted to percentages for trainees' first attempts at the three examinations (Gen SSE, Sp-sp SSE, CE). Trainees' DOPS, Mini-CEX and ETA assessments for each rotation were reviewed, and supervisor ratings were converted to numeric scores and percentages for each of these reports.
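The following sketch illustrates the kind of conversion described above, averaging supervisor ratings and expressing them as a percentage. The 1–5 rating scale and the ratings themselves are hypothetical; the actual DOPS, Mini-CEX and ETA rating forms are not reproduced here.

def ratings_to_percentage(ratings: list[int], scale_max: int = 5) -> float:
    """Average a set of supervisor ratings (1..scale_max) and express as a percentage."""
    return 100.0 * sum(ratings) / (len(ratings) * scale_max)

# Hypothetical supervisor ratings for the items on one DOPS report.
dops_score = ratings_to_percentage([4, 5, 3, 4])
print(f"DOPS score: {dops_score:.1f}%")  # 80.0%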
Data analysis

Two types of analysis were conducted, for the three cohorts combined, using the Statistical Package for the Social Sciences 19.0 (SPSS Inc., Chicago, IL, USA). First, Pearson's correlations were calculated to examine the interrelationships within each of the sets of variables: for selection, to determine the strength of the interrelationships between scores in the CV, the RR and the MSI; for the work-based and in-training assessments, to determine the interrelationships between DOPS, Mini-CEX and ETA; and for examinations, to determine the interrelationships between the Gen SSE, Sp-sp SSE and CE. Second, correlation analyses were conducted to determine the degree of association between the selection items and performance in each of the subsequent assessment items.
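As a sketch of the correlation analysis described above, the following Python fragment computes a single Pearson correlation using SciPy (substituted here for illustration; the authors used SPSS 19.0). The score arrays are hypothetical, not study data; scipy.stats.pearsonr returns Pearson's r and a two-tailed P value.

from scipy.stats import pearsonr

# Hypothetical percentage scores for six trainees.
rr_scores = [62.0, 71.5, 58.0, 80.0, 67.5, 74.0]    # structured referee report
eta1_scores = [65.0, 70.0, 60.0, 78.0, 66.0, 75.0]  # first end-of-term assessment

r, p = pearsonr(rr_scores, eta1_scores)
print(f"RR vs ETA1: r = {r:.3f}, P = {p:.3f}")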
Results

Pearson's correlation analysis

Selection tools
Correlations between the selection tools were generally small and not statistically significant. The only statistically significant relationship, between the CV and the MSI, was a negative correlation (r = -0.190, P < 0.001).

Direct observation of procedural skills
There were significant medium correlations between DOPS1 and DOPS2 (r = 0.412, P < 0.001) and between DOPS2 and DOPS3 (r = 0.317, P = 0.036).

Mini clinical examinations
No significant correlations were observed between Mini-CEX scores.

End-of-term assessments
A significant medium correlation was observed between performance in ETA3 and ETA4 (r = 0.326, P = 0.007). A small correlation approaching significance was observed between performance in ETA1 and ETA2 (r = 0.152, P = 0.053).

Examinations
Table 2 shows significant medium-to-large correlations between the examinations. The Gen SSE and Sp-sp SSE were strongly correlated with each other, and there was a medium correlation between each of the SSEs and the CE.
Relationship between selection tools and assessment items

Correlation analysis was used to identify associations between scores in the selection tools and scores in assessments during training for all participants. Table 3 summarizes the significant positive and negative correlations.
Discussion

Interpretation of significant correlations

The small correlations between trainee scores for the selection tools (i.e. between the CV, RR and MSI) indicate that performance in any one selection tool was not closely related to performance in the others. This may indicate that each of the selection tools captured information about different personal attributes, or it may reflect a shortcoming of the selection tools.

The moderate correlations between performance in DOPS2 and both DOPS1 and DOPS3 may indicate that this assessment tool identifies a progressive improvement in trainee performance within this domain, with DOPS2 representing an intermediate stage between DOPS1 and DOPS3, as might be expected as trainees rapidly acquire operative skills in the early phases of training. DOPS1 and DOPS2 are undertaken in SET1, so this result may provide evidence of consistent trainee performance within a SET year or stage of training.

The lack of significant correlations between any of the Mini-CEX reports may reflect the unpredictability of clinical encounters and the variability inherent in this type of assessment. Trainees nominate the procedures in which they will be assessed by DOPS and may therefore anticipate requirements, whereas the Mini-CEX assessments are conducted during incidental clinical encounters for which trainees cannot specifically prepare. Trainees may be challenged by unfamiliar or complex conditions, particularly during their early SET years. Furthermore, it has been shown that the reliability of the Mini-CEX assessments increases with more encounters and more assessors.7–11

ETA1 and ETA2 are usually undertaken in a single SET year (SET1), and ETA3 and ETA4 in SET2. The moderate correlation between ETA3 and ETA4 may similarly indicate consistent trainee performance within a SET year.
Table 2 Correlations for examinations, for combined selection years 2008–2010

Examination                n     Pearson's correlation (r)†
                                 Generic SSE    Specialty-specific SSE    Clinical exam
Generic SSE                325   1.000
Specialty-specific SSE     276   0.854*         1.000
Clinical exam              327   0.357*         0.348*                    1.000

*P < 0.01. †r approaching 1 indicates a large positive correlation; r approaching -1 indicates a large negative correlation.
Table 3 Significant positive and negative correlations between selection tools and subsequent performance

Selection tool                               Significantly correlated assessments (Pearson's r)
Structured CV                                DOPS2 (r = 0.131), ETA3 (r = 0.142), Generic SSE passdiff† (r = 0.251)
Structured referee reports                   Average DOPS (r = 0.305), ETA1 (r = 0.183), ETA3 (r = 0.223), ETA4 (r = 0.194), Average ETA (r = 0.129), CE (r = 0.127), CE passdiff† (r = 0.212)
Semi-structured multi-station interview      Mini-CEX1 (r = 0.222), Sp-sp SSE (r = 0.142), CE (r = 0.124), CE passdiff† (r = 0.142)
Total selection score                        DOPS1 (r = 0.294), Average DOPS (r = 0.125), ETA1 (r = 0.113), ETA4 (r = 0.281), Average ETA (r = 0.253), Sp-sp SSE (r = …), CE (r = …), CE passdiff† (r = …)

[Remainder of table, including the P values, the positive/negative column assignments and the † footnote, truncated in source.]