Opinion
Reviews and Commentary
Avoiding Testocracy
Richard B. Gunderman, MD, PhD Zachary Ballenger, MD Darel E. Heitkamp, MD
"If they can get you asking the wrong questions, they don't have to worry about the answers."
Thomas Pynchon, Gravity's Rainbow (1)
From the Department of Radiology, Indiana University School of Medicine, 705 Riley Hospital Dr, Room 1053, Indianapolis, IN 46202-5200. Received May 21, 2012; revision requested June 6; revision received June 29; final version accepted July 18. Address correspondence to R.B.G. (e-mail: [email protected]). © RSNA, 2012

Recent controversy involving the American Board of Radiology (ABR) has garnered a great deal of attention. Allegations concerning widespread cheating on qualifying examinations, against which the ABR has mounted a spirited defense, have surfaced in national news media (2). In particular, critics allege that recently tested ABR candidates have inappropriately shared examination content with future candidates, artificially raising scores (3). Partly in response to these allegations, the ABR is requiring candidates, program directors, program coordinators, and department chairs to sign attestations prohibiting cheating, including the sharing of questions from examinations.

One sign of a robust profession is the quality of dialogue surrounding its methods for educating and certifying the performance of new entrants. In what follows, we do not intend in any way to criticize or undermine the important work of the ABR or its personnel, trustees, and many volunteers. We seek only to stimulate reflection and dialogue concerning the purpose of the ABR and its examinations. By thinking about the ABR's policies in the larger contexts of radiology as a profession, and the responsibility of every radiologist to promote the interests of patients and the public, we may better understand and appraise the board certification process (4).

To begin with, we should avoid operating on the presumption that any human organization, whether the United States of America, the New York Yankees,
or the ABR, is infallible. The very fact that such organizations change their strategies and policies over time indicates that they have not achieved perfection. For example, the fact that the ABR only recently initiated a crackdown against decades of alleged widespread cheating on its examinations is an indication that the organization is still learning and may still have room to improve. A key ingredient in the recipe for such improvement is honest dialogue, grounded in mutual respect and the shared search for understanding.

All professional boards face an inherent hazard. They rely largely on examinations to identify incompetent candidates. In the case of the ABR, thousands of person-hours are invested in designing, improving, and implementing these examinations (5). Such examinations often dominate discussions at the workplace. This is certainly true of some radiology candidates, who sometimes let board preparation supersede other residency responsibilities (6).

However, the mission of the ABR is not to create or administer examinations. According to the ABR Web site, its mission is to "serve patients, the public, and the medical profession by certifying that its diplomates have acquired, demonstrated, and maintained a requisite standard of knowledge, skill, and understanding essential to the practice of diagnostic radiology" (7). If examinations can help achieve such missions, then medical boards should produce them, and such examinations should be evaluated on how effectively they promote the mission. But everyone should avoid confusing the examination with the mission.

Examinations are not the be-all and end-all of any educational organization, including a medical board. The examination is but a tool, and an imperfect tool at that. What matters most is not how well candidates perform on an examination, but how well they have been prepared for practice. In most cases, examinations are far less effective in predicting how well candidates will perform as full-fledged members of a profession than in foretelling how well they would perform if they sat again for such an examination (8). It is quite possible that some candidates who would perform adequately or even well in a field are weeded out by examinations that, for whatever reason, are not a good fit for their test-taking approach.

Perhaps more importantly, examinations can test only some aspects of a candidate's education, and different types of examinations invariably suit quite different aspects (9). For example, a computer-based multiple-choice examination may perform well at assessing factual knowledge. Such examinations tend not to perform so well at testing communication skills or professionalism. Oral examinations, on the other hand, can do a better job of testing these latter aspects of a candidate's preparation but tend to be more expensive to administer and more subject to the personal biases of examiners. In principle, the best test is the one that most closely approximates what candidates will be expected to do in practice.

The ABR physics, clinical, and new "exam of the future" certifying examinations are examples of standardized, high-stakes tests. Standardized simply means that each candidate takes essentially the same test. The term high-stakes comes from gambling, where players wager a substantial amount of money on the outcome of a single event, such as a spin of the roulette wheel. In the case of the ABR examinations, at stake is whether candidates will attain board certification, and the event on which the candidates are staking their future is the outcome of the examination. If they pass the examination, they realize this goal. If they fail, serious consequences may ensue, both formal and informal.
In the minds of many candidates, an investment consisting of 4 or more years of their lives seems to hang in the balance.

There are numerous inherent drawbacks to standardized, high-stakes testing. These include an incentive to "teach to the test," creation of a competitive learning environment, and biases toward certain types of learners and against others. Moreover, all such tests are inherently limited in their ability to distinguish between those who have mastered the material and those who have not (10). A single test at the end of training tends to provide a less rich and reliable assessment of learning than continuous assessment over time. Although well-designed standardized tests promote a kind of objectivity, a one-time computerized examination inevitably provides a narrower and more superficial assessment.

Of course, narrative evaluations have their own pitfalls. Evaluators are not invariably competent and trustworthy. But narrative evaluations also have the potential to provide much richer and more encompassing assessments. Those who doubt this need ask themselves but a single question: On which would I prefer to stake my choice among several job candidates: their pass-fail results on a multiple-choice examination or narrative assessments from trusted colleagues who have worked with them?

Board certification status is one possible proxy for excellence. But some very talented radiologists who are widely recognized as models of excellence in patient care, research, and education are not board certified. Conversely, some problematic radiologists have both attained and maintained their board certification. What matters most is not test performance but what people bring, professionally and personally, to the daily practice of radiology.

This raises the question of failure rates. What is the appropriate failure rate on the board examinations? During a recent 5-year period, pass-fail rates for the ABR certification examinations demonstrated very little variation, with a 92%–96% pass rate for 3rd-year residents taking the clinical examination (11).

One sign that examinations are increasingly regarded as ends in themselves is the ascendancy of psychometricians on the staffs of many professional boards.
Although psychometricians construct tests for a variety of purposes, they most commonly seek to stratify examinees. A question that all examinees get correct is likely to seem useless to them, as would a question that all examinees get wrong. But stratifying candidates should not be the top priority for a professional board, whose purpose is simply to establish with certainty that the examinee is competent. For professionals, whose primary loyalty should be to patients and the public, there are some things that every radiologist should know. Ultimately, we are not trying to produce the best possible test. We are trying to educate the best possible radiologist.

We have heard many references to examination integrity. The integrity of an examination is important and should be protected and promoted. Yet the integrity of the examination should take a backseat to higher ends, such as the integrity of the profession, the welfare of its current and future members, and, above all, the profession's fiduciary responsibility to promote the interests of patients and the public. From the patient's standpoint, no test is as important as the knowledge, skills, and character of the professionals delivering care. To let a test dominate our field of view is to forget an elemental truth familiar to anyone who has ever worked on a ranch: you do not fatten the cattle by weighing them.

The higher an examination's stakes, the greater the incentive for learners and educators to discover what it is like. Candidates are desperate to pass. Educators and program directors want to tout their program's examination preparation to attract top applicants. Residents routinely ask faculty members whether they need to know particular concepts for the board examination, and faculty members routinely inform residents when they believe that material tends to appear on the test. Some faculty members even ask recently tested candidates for information on the examination so they can better prepare future candidates.
Published online 10.1148/radiol.12121008. Radiology 2012;265:332–335. Conflicts of interest are listed at the end of this article.
Where does effective board preparation end and cheating begin? In essence, cheating occurs when someone gets the questions right without truly knowing the material. A cheater is someone who performs artificially well on an examination without having invested the time and energy to learn the material. Cheaters subvert fairness, whether they happen to pass or not. Clear examples of cheating would be illicitly obtaining a copy of an examination or answer key and using it to gain a higher score. But there is a difference between merely memorizing questions and answers and using general knowledge of examination content to guide study and preparation.

Sharing information about examinations is not a new phenomenon, and the practice has some merits. It fosters the development of a cooperative attitude among learners. It encourages more senior learners to take greater responsibility for the education of their more junior colleagues. It builds a collaborative spirit among learners and educators. It also helps alleviate frequently counterproductive anxiety among learners. Examination preparation should not be a zero-sum game in which learners find it unthinkable to help one another. Like the practice of medicine, it should be a cooperative and even collaborative endeavor (12). What if there were material on an examination that could save a patient's life? Would it be undesirable if candidates let other candidates know that it appeared on the examination?

For better or worse, expectations concerning assessment shape learning objectives. Although teaching to the test has regrettable implications, it would be naive to suppose that testing does not influence curriculum. So long as the test focuses on material that is truly essential to radiology practice, and the sharing of information does not replace actual mastery of the material, such sharing may actually promote clinical excellence (13).
For example, if senior residents were to relay to junior residents that they had seen a question about bremsstrahlung radiation on the examination, this might encourage the latter to devote more study to the topic. Some educators have suggested that the use of recalls in radiology departments has often had such a salutary effect (14).

One way to get candidates to stop sharing information about examinations is to forbid the practice and make the probability of detection and the severity of the punishment so great that people feel intimidated into refraining. Another way is to structure the examination so that the incentive to memorize questions is lessened or eliminated. There is an important difference between reducing the incentive to share examination questions and driving such sharing underground. To promote the former, the use of repeat questions could be avoided. The same essential material could and should be tested from year to year, but new questions could be devised, or old questions could be reformulated, so that merely memorizing questions and answers would provide no advantage. In addition, the ABR could publish the actual examination questions following each administration, as the American College of Radiology has done.

An even higher priority is to deemphasize the examination. No less important than how well candidates perform on a standardized high-stakes test are their work ethic, reliability, compassion for patients, interpersonal skills, and character. Educators' long-term observation of and collaboration with candidates provide a much more robust indicator of their level of preparedness for independent practice than any standardized, high-stakes test ever could. To encourage richer and more comprehensive assessments of candidates, we would need to place more trust in faculty members and program directors to take seriously their professional responsibility to promote and protect the interests of the profession, patients, and the public at large. There is evidence that such a system can work.
For example, educational institutions from K–12 schools to colleges and universities have been failing poorly performing students for years. Although standardized testing typically plays a part in these pass-fail decisions, the final determination often rests on multiple methods of evaluation. Those who would shirk such responsibility on the grounds that it might subject them to lawsuits by failed candidates should focus more on fulfilling their educational responsibilities and less on legal expediency (15,16).

Whatever we do, we must assiduously avoid inflating even further the perceived importance of board examinations. In the great scheme of postgraduate education, and especially radiology education, such examinations already draw too much time and attention. Results on such tests pale in educational significance beside what good educators already know about their trainees. We must be careful not to allow postgraduate education to degenerate into a testocracy, in which board examinations so dominate learners' and educators' perspectives that we lose sight of the real purpose of education. As one program director put it, "I worry so much about the board examinations because I feel I have to, but I wish I could focus more on education."

Disclosures of Conflicts of Interest: R.B.G. No relevant conflicts of interest to disclose. Z.B. No relevant conflicts of interest to disclose. D.E.H. No relevant conflicts of interest to disclose.
References
1. Pynchon T. Gravity's rainbow. New York, NY: Viking Press, 1973.
2. Becker GJ. ABR's exam security. Tucson, Ariz: American Board of Radiology, 2012. http://www.theabr.org. Accessed April 16, 2012.
3. Zamost S, Griffin D, Ansari A. Exclusive: doctors cheated on exams. http://www.cnn.com/2012/01/13/health/prescription-for-cheating/index.html. Published January 13, 2012. Accessed April 16, 2012.
4. Yang JC, Kazerooni EA, Bosma JL, Gerdeman AM, Becker GJ, Vydareny KH. Practice analysis: a basis for content validity for American Board of Radiology examinations in diagnostic radiology. J Am Coll Radiol 2012;9(2):121–128.
5. Yang JC, Gerdeman AM, Becker GJ, Bosma JL. American Board of Radiology diagnostic radiology initial qualifying (written) examinations. AJR Am J Roentgenol 2010;195(1):10–12.
6. Hall FM, Janower ML. The new requirements and testing for American Board of Radiology certification: a contrary opinion. Radiology 2008;248(3):710–712.
7. American Board of Radiology. About the ABR. http://www.theabr.org/about-landing. Accessed May 2, 2012.
8. Noddings N. High stakes testing: Why? Theory Res Educ 2004;2(3):263–269.
9. Mislevy R, Braun H. Intuitive test theory. Kingston, NJ: Princeton Association for Computing Machinery and Institute of Electrical and Electronics Engineers, 2003.
10. Langenfeld K, Thurlow M, Scott D. High stakes testing for students: unanswered questions and implications for students with disabilities. National Center for Educational Outcomes Synthesis Report 26. Minneapolis, Minn: National Center for Educational Outcomes, 1997.
11. Diagnostic radiology: exam scoring and results. Tucson, Ariz: American Board of Radiology, 2012. http://www.theabr.org/ic-dr-score. Accessed May 12, 2012.
12. Keppell M, Au E, Ma A, Chan C. Peer learning and learning-oriented assessment in technology-enhanced environments. Assess Eval High Educ 2006;31(4):453–464.
13. Ruchman RB, Kwak AJ, Jaeger J. The written clinical diagnosis board examination: survey of program director and resident opinions. AJR Am J Roentgenol 2008;191(4):954–961.
14. Hall FM. The ABR and resident recall "cheating". Radiology 2012;263(2):323–325.
15. Greene J. Resident dismissal: suing the system. http://www.ama-assn.org/amednews/2000/05/08/prsa0508.htm. Published May 8, 2000. Accessed June 26, 2012.
16. Smith JJ, Berlin L. Is being sued for malpractice grounds for dismissal from a residency program? AJR Am J Roentgenol 2000;175(2):315–318.