
and in written and audio form. The use of audiotapes is vital not only for those whose language is predominantly a spoken one (Sylheti, for example) but also for the estimated 23% of the adult British population who are functionally illiterate.17 This initiative is to be applauded: we can only hope that other services follow suit, and rapidly.

In addition to offering women information about the test in ways that meet their literacy needs, those providing the information need to check that it has been understood. Such checking is not routine: in an analysis of five videotaped consultations from each of 2094 trainee GPs, 45% did not check understanding in any of their five consultations, and fewer than 1% did so in all five.18 Such checking can be very effective in increasing understanding, particularly for those with low levels of education.19

Finally, tests need to be presented in ways that help individuals to make choices that reflect their own values and not those of the person presenting the test options. While decision aids can achieve this,20 further research is needed to establish how it might be achieved in routine consultations in which screening for sickle cell and thalassaemia is offered to women with a wide range of literacy and with values that reflect the multiple cultures that make up the population of most developed countries. In the context of sickle cell and thalassaemia screening, the recent launch of the NHS programme has begun to address many of the organisational challenges of facilitating timely and informed choices for women from diverse ethnic backgrounds and with a wide range of literacy needs.

Further research and continued audit will be needed to ensure that we have moved far away from the charge of institutional racism, which, in the previous decade, could rightly be levelled at the provision of this and other services in the UK.

Theresa Marteau
Professor of Health Psychology

Elizabeth Dormandy
Research Fellow, Psychology and Genetics Research Group, King’s College London

REFERENCES

1. NHS. NHS Sickle Cell and Thalassaemia Screening Programme. http://www.kcl-phs.org.uk/haemscreening/AimsObject.htm (accessed 4 May 2005).
2. Thomas P, Oni L, Alli M, et al. Antenatal screening for haemoglobinopathies in primary care: a whole system participatory action research project. Br J Gen Pract 2005; 55: 424–428.
3. Wright J, Rati N, Kennefick A, et al. A pilot study of ‘fast track’ antenatal screening for haemoglobinopathies. J Med Screen 2003; 10(4): 169–171.
4. Dormandy E, Michie S, Hooper R, Marteau TM. Low uptake of antenatal Down syndrome screening in ethnic and socially deprived groups: a reflection of women’s attitudes or a failure to facilitate informed choices? Int J Epidemiol 2005; 34(2): 346–352.
5. Stewart A, Rao JN, Osho-Williams G, et al. Audit of primary care angina management in Sandwell, England. J R Soc Health 2002; 122(2): 112–117.
6. Skinner J, Weinstein JN, Sporer SM, Wennberg JE. Racial, ethnic, and geographic disparities in rates of knee arthroplasty among Medicare patients. N Engl J Med 2003; 349(14): 1350–1359.
7. Hussain-Gambles M, Atkin K, Leese B. Why ethnic minority groups are under-represented in clinical trials: a review of the literature. Health Soc Care Community 2004; 12(5): 382–388.
8. Feder G, Crook AM, Magee P, et al. Ethnic differences in invasive management of coronary disease: prospective cohort study of patients undergoing angiography. BMJ 2002; 324: 511–516.
9. Rowe RE, Garcia J, Davidson LL. Social and ethnic inequalities in the offer and uptake of prenatal screening and diagnosis in the UK: a systematic review. Public Health 2004; 118(3): 177–189.
10. Modell B, Harris R, Lane B, et al. Informed choice in genetic screening for thalassaemia during pregnancy: audit from a national confidential inquiry. BMJ 2000; 320(7231): 337–341.
11. Modell M, Wonke B, Anionwu E, et al. A multidisciplinary approach for improving services in primary care: randomised controlled trial of screening for haemoglobin disorders. BMJ 1998; 317(7161): 788–791.
12. Ali S, Atkin K. Primary healthcare and South Asian populations: meeting the challenges. Oxford: Radcliffe Medical Press Ltd, 2004.
13. Welch HG. Informed choice in cancer screening. JAMA 2001; 285(21): 2776–2778.
14. UK National Screening Committee. Second report of the UK National Screening Committee. London: Department of Health, 2000. http://www.nsc.nhs.uk/pdfs/secondreport.pdf (accessed 4 May 2005).
15. Marteau TM, Dormandy E, Michie S. A measure of informed choice. Health Expect 2001; 4: 99–108.
16. Dormandy E, Hooper R, Michie S, Marteau TM. Informed choice to undergo prenatal screening: a comparison of two hospitals conducting testing either as part of a routine visit or requiring a separate visit. J Med Screen 2002; 9: 109–114.
17. Office for National Statistics. Adult literacy in Britain. London: Office for National Statistics, 1997. http://www.statistics.gov.uk/StatBase/Product.asp?vlnk=1314 (accessed 9 May 2005).
18. Campion P, Foulkes J, Neighbour R, Tate P. Patient centredness in the MRCGP video examination: analysis of large cohort. Membership of the Royal College of General Practitioners. BMJ 2002; 325(7366): 691–692.
19. Baker H, Uus K, Bamford J, Marteau TM. Increasing knowledge about a screening test: preliminary evaluation of a structured, chart-based, screener presentation. Patient Educ Couns 2004; 52(1): 55–59.
20. O’Connor AM, Legare F, Stacey D. Risk communication in practice: the contribution of decision aids. BMJ 2003; 327(7417): 736–740.

ADDRESS FOR CORRESPONDENCE

Theresa Marteau Psychology and Genetics Research Group, Department of Psychology, Institute of Psychiatry, King’s College London, 5th Floor Thomas Guy House, Guy’s Campus, London SE1 9RT. E-mail: [email protected]

British Journal of General Practice, June 2005

Editorials

The changing face of assessment: swings and roundabouts

‘The novelties of one generation are only the resuscitated fashions of the generation before.’ George Bernard Shaw. From the preface to Three Plays for Puritans.

This quotation aptly reflects the tensions in the pursuit of a ‘Holy Grail’ ideal assessment. In the early 20th century the goal was integration. Flexner, the late 19th century American educationalist, held the firm belief that assessment must focus on a student’s ability to assess in full ‘a concrete case to collect all the relevant data and to suggest the positive procedures applicable to the conditions disclosed’.1 Long cases and oral presentations were in favour. Subsequently, the logistics of ensuring fair and equitable challenge across cases and during unstructured vivas led to an increasing focus on more objective testing methodologies (some believe at the cost of being too reductionist), such as multiple choice questions (MCQs) and objective structured clinical examinations (OSCEs), using simulation rather than reality.

A century later, the international focus is on the need to assess doctors’ performance, highlighted by Miller’s now famous pyramid,2 within the context of their work (that is, what a doctor ‘does’). The aim is to ensure that more formative processes support the individual’s needs during training. Fashion has swung almost full circle.3 We are back to searching for more integrated approaches, using real patients. Yet much of this change has lacked evidence. We know little of the psychometrics of the original long case and orals.4 The logistics of work-based assessment are still to be explored.

Two contrasting papers in this month’s Journal are to be welcomed. Simpson and Ballard have investigated the content validity of the more traditional oral format used in the Membership of the Royal College of General Practitioners (MRCGP) examination.5 Swanwick and Chana6 reflect on workplace assessment as it moves forwards to take an increasing role in the licensing of doctors.

There is no ideal assessment method. All are flawed in some way. The current trend is to design innovative assessment packages to drive educational agendas, acknowledging the impact this has on learning. We must be realistic and remain open to the potential application of all available tools, and ensure that the assessment programme, when completed, has addressed the ultimate goals of training.7

The overriding principle underpinning any measurement of professional performance is that of ‘content specificity’; that is, the need to assess across a broad range of contexts. An individual’s performance in one situation, such as the assessment of a diabetic patient, will not necessarily mirror their performance in another, such as the assessment of an asthmatic patient. It makes sense. We have all trained and worked in different ways. It is becoming increasingly apparent that knowledge, which is inevitably stronger in some areas than others, underpins much of clinical practice.
Traditionally, medical training objectives have been subdivided into knowledge, skills and attitudes. Yet Flexner was right: the three are, in reality, intertwined. We must resist making assessment too atomistic. Attempts in OSCEs to assess communication skills as a separate item illustrate this. As a very frustrated surgeon once said in an undergraduate OSCE committee, ‘it is no good giving full marks to a student for “communicating bad news” if in actuality they’ve given the wrong news’.

To judge professional competency, assessment has to be carefully designed to cover adequate content. In the 20th century, when the need to address content specificity became apparent, there was an international divergence in trends. The US was quick to abandon long cases and orals in favour of knowledge tests, which covered broad content and were reliable and legally defensible. Elsewhere, traditional methods have only gradually moved towards the more objective models. Ironically, these changes were made with little research into the psychometrics of the original traditional tests. The reality is now dawning that it does not matter which assessment method you use, be it orals, MCQs, long cases or OSCEs, provided that the test time is long enough to ensure sufficient content is examined.8 The limiting factor is the feasibility of the test. Eight to 10 long cases could be as reliable as a 3-hour OSCE, but relatively impractical to deliver.

We can learn from this original divergence. There is increasing awareness of the need for research into the validity, reliability and impact of assessment methodologies and educational developments.9 We must not lose any opportunity to produce an evidence base to support change.

Internationally, there is a big drive towards new assessment tools as we strive to measure performance and improve our tests of competency. In the UK, this has been accelerated by rapidly changing frameworks for postgraduate training, such as the Department of Health’s proposals for Modernising Medical Careers (MMC), where the emphasis is now placed on competency-based curricula and the establishment of clear standards by the Postgraduate Medical Education and Training Board (PMETB).
These changes aim to provide assessment packages that enable trainees to demonstrate competencies as and when they learn them, and to avoid examination structures that unnecessarily delay progress. Some of the change is to be welcomed: in particular, more formative assessments, robust standards and lay involvement in the quality assurance of the process. We face rapid change that is not underpinned by clear research evidence and is overshadowed by concerns that it is driven by politics rather than educational rationale.

Caution is justified. Demonstration of competence may not equate with competency acquired through experience.10 We are at risk of losing the appraisal of higher skills learnt through experience if assessment becomes a tick list of ‘can do’, be it in the workplace or in an examination. It is difficult to see how these new assessments will address the all-important issue of content specificity. Trainee-led completion of competencies in the workplace will need careful planning to ensure that they are tested across a suitable range of clinical contexts. This may not be feasible, given the current pressures on clinical service delivery in both secondary and primary care. Trainers will be asked not only to teach and appraise but also to judge in the workplace. There are significant tensions between these roles.11

The new tools under development for the workplace are essentially based on old formats. Four methods are proposed for the new MMC Foundation programme, the first of which is the mini-CEX, a format modified from an observed long case by John Norcini in the US12 that takes ‘snapshots’ of the integrated assessment, focusing (among other things) on observation of history taking or examination, but not on the entire process. The other methods are: case-based discussions grounded in good oral techniques; direct observation of procedural skills (DOPS) (which are a type of OSCE in the work environment); and a mini peer assessment tool modelled on 360° appraisal. The MRCGP also faces radical change to accommodate the new focus on workplace assessment.
Thus, the skills of experienced examiners trained to assess videos or conduct oral examinations remain crucial but within an entirely new and challenging framework. As Tom Stoppard observed in his play Indian Ink, ‘If an idea’s worth having once, it’s worth having twice’. This time round, however, we must ensure these methods are adequately appraised for validity as well as reliability.


Simpson and Ballard set a good example.5 Their study highlights how tests do not always measure what they set out to measure; in this case decision making in the orals. Swanwick and Chana raise interesting issues on the high validity of evidential, locally-based assessment, suggesting ways of enhancing its reliability.6 If we believe the published literature,11 the task in hand is challenging. The danger of current change is the impetus with which it is taking place. The new approach may not lead us to the Holy Grail of a test that accurately predicts future unobserved practice; but we do face a major change in philosophy. Assessment programmes designed to ensure education is efficiently and appropriately delivered are replacing the examinations that are known to be reliable but are at times lacking in validity. We need well-designed research to support or refute this new rationale.

Val Wass
Professor of Community-based Education, Manchester

REFERENCES

1. Flexner A. Medical education in the United States and Canada. Bethesda, MD: Science and Health Publications, 1910.
2. Miller GE. The assessment of clinical skills/competence/performance. Acad Med 1990; 65: S63–S67.
3. Schuwirth LWT, van der Vleuten CPM. The use of clinical simulations in assessment. Med Educ 2003; 37 Suppl 1: 65–71.
4. Wass V, van der Vleuten CPM. The long case. Med Educ 2004; 38(11): 1176–1180.
5. Simpson RG, Ballard KD. What is being assessed in the MRCGP oral examination? A qualitative study. Br J Gen Pract 2005; 55: 430–436.
6. Swanwick T, Chana N. Workplace assessment for licensing in general practice. Br J Gen Pract 2005; 55: 461–467.
7. van der Vleuten CPM, Schuwirth LWT. Assessing professional competence: from methods to programmes. Med Educ 2005; 39: 309–317.
8. Wass V, van der Vleuten CPM, Shatzer J, Jones R. Assessment of clinical competence. Lancet 2001; 357: 945–949.
9. Regehr G. Trends in medical education research. Acad Med 2004; 79(10): 939–947.
10. Talbot M. Monkey see, monkey do: a critique of the competency model in graduate medical education. Med Educ 2004; 38: 587–592.
11. Williams RG, Klamen DK, McGaghie WC. Cognitive, social and environmental sources of bias in clinical performance ratings. Teach Learn Med 2003; 15(4): 270–292.
12. Norcini JJ, Blank LL, Duffy FD, Fortuna GS. The mini-CEX: a method for assessing clinical skills. Ann Intern Med 2003; 138(6): 476–481.

ADDRESS FOR CORRESPONDENCE

Val Wass Professor of Community-based Education, First Floor, Rusholme Health Centre, Walmer St, Manchester M14 5NP E-mail: [email protected]

Making general practice fit for the 21st century

For some while it has been customary to argue that good primary care is an essential part of an effective, comprehensive system of health care. Barbara Starfield’s work comparing the systems in different countries has supplied convincing objective evidence to support the contention: countries with poor orientation towards primary care have worse health outcomes.1 The same relationship has been found when comparing the different US states,1 and on a more local scale, mortality rates in English hospitals have been found to be closely related to the supply of GPs in the area.2 Policy experts observing from countries lacking such systems, most obviously the US, bemoan their deprivation in this respect.3

Older doctors and patients lament the passing of the familiar family doctor, who through long knowledge of the patients became a family friend. The difficulty with this vision is that such long-term commitment has become unusual, eroded by the mobility of patients and by the needs of doctors for career development and adherence to the European Working Time Directive. So unusual that it is now unrealistic to plan a system on the basis of long-term relationships. Worse is the risk that constructing a system on the basis of such relationships sets it up to fail.

It is worth taking stock to define what the task should be now, and how we can set about achieving it. First and foremost, doctors working in primary care have to practise medicine to a high degree of technical competence. It is easy to dismiss this, or regard it as a truism. But truisms are often true. It has been customary to take technical excellence for granted, either by assuming its universal existence among all primary care doctors, or through fear of what we might find if we looked a bit harder. However, we know that there have always been doctors working in primary care whose practice would not, if tested, reach an acceptable standard. One of the benefits of a robust system of revalidation, when it is in place, is that it will tell us whether such doctors comprise 0.1%, 1% or 10% of the total.

Technical competence demands two added dimensions. First, doctors all have to keep abreast of a discipline that changes constantly, with the appearance of new diseases (and the disappearance of some old ones), development of new diagnostic tests and techniques, and a constant supply of new therapies. Second, doctors working in primary care should continue to aspire to be good generalists. One of the strengths of primary care is to have enough expertise across all areas of medicine to relieve patients of any anxiety about having to decide which specialist is appropriate for their problem. Also, the last thing patients want is a primary care doctor who can say, as specialists can (and sometimes do), ‘Sorry but this problem is outside my area of expertise’. Because the idea of excellence which turns its face against specialism is so alien in our culture, the insistence that primary care has to remain rooted in generalism is a vital message to get into the heads of policy makers. This is especially true right now in the UK, with developments pulling in the opposite