Teaching and Assessing Procedural Skills Using Simulation: Metrics and Methodology

Richard L. Lammers, MD, Moira Davenport, MD, Frederick Korley, MD, Sharon Griswold-Theodorson, MD, Michael T. Fitch, MD, Aneesh T. Narang, MD, Leigh V. Evans, MD, Amy Gross, Elliot Rodriguez, MD, Kelly L. Dodge, MD, Cara J. Hamann, MPH, Walter C. Robey III, MD

Abstract
Simulation allows educators to develop learner-focused training and outcomes-based assessments. However, the effectiveness and validity of simulation-based training in emergency medicine (EM) require further investigation. Teaching and testing technical skills require methods and assessment instruments that are somewhat different from those used for cognitive or team skills. Drawing from work published by other medical disciplines as well as educational, behavioral, and human factors research, the authors developed six research themes: measurement of procedural skills; development of performance standards; assessment and validation of training methods, simulator models, and assessment tools; optimization of training methods; transfer of skills learned on simulator models to patients; and prevention of skill decay over time. The article reviews relevant and established educational research methodologies and identifies gaps in our knowledge of how physicians learn procedures. The authors present questions requiring further research that, once answered, will advance understanding of simulation-based procedural training and assessment in EM.

ACADEMIC EMERGENCY MEDICINE 2008; 15:1079–1087 © 2008 by the Society for Academic Emergency Medicine

Keywords: emergency medicine, medical education, simulation, procedures, technical skills, competence

From the Department of Emergency Medicine, Michigan State University/Kalamazoo Center for Medical Studies (RLL), Kalamazoo, MI; the Departments of Sports Medicine and Emergency Medicine, Allegheny General Hospital (MD), Pittsburgh, PA; the Department of Emergency Medicine, The Johns Hopkins School of Medicine (FK), Baltimore, MD; the Department of Emergency Medicine, Drexel University College of Medicine (SGT), Philadelphia, PA; the Department of Emergency Medicine, Wake Forest University Health Sciences (MTF), Winston-Salem, NC; the Department of Emergency Medicine, Boston Medical Center (ATN), Boston, MA; the Department of Emergency Medicine, Yale University School of Medicine (LVE, KLD, CJH), New Haven, CT; the Department of Psychology, Western Michigan University (AG), Kalamazoo, MI; the Department of Emergency Medicine, SUNY Upstate Medical University (ER), Syracuse, NY; and the Department of Emergency Medicine, East Carolina University/Brody School of Medicine (WCR), Greenville, NC. Received June 27, 2008; revisions received July 3 and July 10, 2008; accepted July 10, 2008. This is a proceeding from a workshop session of the 2008 Academic Emergency Medicine Consensus Conference, "The Science of Simulation in Healthcare: Defining and Developing Clinical Expertise," Washington, DC, May 28, 2008. Address for correspondence and reprints: Richard L. Lammers, MD; e-mail: [email protected].

© 2008 by the Society for Academic Emergency Medicine. doi: 10.1111/j.1553-2712.2008.00233.x

The Accreditation Council for Graduate Medical Education (ACGME) Outcomes Project was implemented in 2001 to shift the focus of graduate medical education from a process-based curriculum to an outcome-based curriculum. To encourage programs to formulate, evaluate, and utilize outcome-based evaluation instruments, the Emergency Medicine Residency Review Committee (EMRRC)1 implemented new formative evaluation requirements, which include an annual direct observation of every resident managing a chief complaint, managing a resuscitation, and performing an unspecified procedure. The method of evaluation is left to the discretion of each residency program.2 However, the ACGME Toolbox of Assessment Methods recommends the use of simulation to assess competence in four of six domains and ranks simulation among the "most desirable methods" for assessing medical procedure skills in the patient care domain.2

This article presents the work done in preparation for and completed during the 2008 Academic Emergency Medicine-sponsored Consensus Conference (CC), "The Science of Simulation in Healthcare: Defining and Developing Clinical Expertise," held on May 28, 2008, in Washington, DC. One of the CC propositions was that well-designed simulations should become an integral component of this new approach to medical education.

Therefore, emergency medicine (EM) educators and researchers must 1) utilize appropriate methodologies to measure skills, develop performance standards, and validate training and assessment tools, and 2) identify optimal methods for the initial acquisition and maintenance of clinical and procedural skills that transfer readily to the clinical environment. During the CC, the Technical Expertise Consensus Group focused on technical, or "procedural," skills. The goals of this article are 1) to review past work in these areas, drawing from disciplines within and outside of medicine; 2) to report on the findings of the consensus group; and 3) to propose a research agenda that will advance understanding of simulation-based EM procedural training and assessment.

METHODS FOR ESTABLISHING A CONSENSUS

An interest group consisting of EM educators convened at the 2007 SAEM Technology in Medical Education Committee meeting, recruited additional emergency physicians (EPs), behavioral psychologists, and researchers, and established an online discussion group to consider the scope of this project. Following a literature search, review of existing work, and identification of gaps in knowledge, the group selected six research themes: measurement of technical skills; development of performance standards; assessment and validation of training methods, simulator models, and assessment tools; optimization of training methods; transfer of skills learned on simulator models to patients; and prevention of skill decay over time. Proceedings from the discussion among participants of the Technical Expertise Session of the CC were incorporated into the final document.

Some terms presented in this paper have acquired a variety of meanings in the literature. For clarity and consistency, specific definitions are listed in the Appendix.

RESEARCH QUESTION 1: WHAT ARE THE BEST METHODS FOR MEASURING TECHNICAL PERFORMANCE?

What Research Methodology and Study Designs Are Most Appropriate for Investigations of Procedural Training and Technical Skill Acquisition?
Study designs that investigators might consider for simulation-based procedural research include observational studies, quasi-experimental studies, and randomized trials. Observational studies can identify baseline skill level, which is useful in a curriculum needs analysis. Investigators may observe, record, and analyze performances by experts to establish optimal methods and set performance standards. Quasi-experimental studies include pretest/posttest designs without random selection of subjects or randomized allocation of subjects to control and experimental groups. A randomized trial is the ideal experimental design for demonstrating cause and effect and for controlling bias. Randomized trials may be necessary to prove the efficacy of expensive simulation programs or to validate high-stakes examinations.
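
As a concrete illustration of the randomized pretest/posttest design described above, the following sketch randomizes trainees to a simulation or a control group and compares pretest-to-posttest gains in checklist scores. Everything here is an assumption made for illustration (the group labels, the use of gain scores, and Welch's t statistic); it is not an analysis plan endorsed by the consensus group.

```python
import random
from math import sqrt
from statistics import mean, stdev

def randomize(trainee_ids, seed=0):
    """Randomly allocate trainees to a simulation group and a control group."""
    rng = random.Random(seed)
    shuffled = list(trainee_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"simulation": shuffled[:half], "control": shuffled[half:]}

def gain_scores(pretest, posttest, group):
    """Posttest minus pretest checklist score for each trainee in a group."""
    return [posttest[t] - pretest[t] for t in group]

def welch_t(a, b):
    """Welch's two-sample t statistic comparing mean gains between groups."""
    return (mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
```

A quasi-experimental version of the same study would simply omit the randomization step and compare preexisting groups, which is why it controls selection bias less well.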



What Are the Best Assessment Instruments for Measuring Procedural Skills?
There are two general approaches to rating technical performance: global rating scales and checklists. Global rating scales have been validated for clinical skills examinations with standardized patients in other specialties.3,4 Global rating scales are subjective, but they provide flexibility in evaluating alternative or innovative approaches to a problem.5 Although global assessment tools that have been validated for objective structured clinical examination exercises may be transferable to EM simulations involving patient assessment, decision-making, and team management, these tools may not be optimal for procedural assessments. Because procedures tend to be sequential and predictable, checklists appear better suited for assessing technical skills. Checklists allow more thorough, structured, and objective assessments of the component skills of a procedure.6 Although a detailed checklist may include more steps than a trainee can memorize, it may still be useful to an instructor for guiding the trainee through a complex procedure.

Can Technical Competence Only Be Assessed While the Procedure Is Being Performed on a Patient?
Some procedures in EM are performed infrequently; therefore, opportunities for training and assessment of these skills are unpredictable and often simply unavailable.7 Simulation-based assessment removes issues of patient safety, instructor distraction, and unpredictability. Simulations allow instructors to develop standardized, controlled, and reviewable assessments.8 It may not be possible to transform a competent performer into an expert using only simulation-based training, but it is likely that technical competence can be demonstrated sufficiently on a procedural simulator. For rarely performed procedures, there may be no other alternative.

Can Procedural Simulators Be Used in "High-stakes" Exams?
High-stakes assessments are those that determine whether a trainee should be given the privilege of performing a procedure in the clinical setting. The testing model and the methodology must be validated, and "the basis for the validation must be consistent, reproducible, and appropriately simulate the basic skills required to perform the procedure safely."9 The EM Council of Residency Directors has developed and shared procedural assessment tools for central venous catheter placement, lumbar puncture, thoracostomy tube insertion, and endotracheal intubation. However, few high-stakes assessment tools have been validated for EM procedures.
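
To make the two rating approaches concrete, the sketch below combines a weighted checklist with a global rating in a single assessment record. The items, weights, and 5-point global scale are hypothetical illustrations, not a validated instrument.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class ChecklistItem:
    description: str        # one observable step derived from task analysis
    weight: float = 1.0     # expert-assigned importance (hypothetical)
    completed: bool = False

@dataclass
class ProceduralAssessment:
    procedure: str
    items: list[ChecklistItem] = field(default_factory=list)
    global_rating: int | None = None   # e.g., 1 (poor) to 5 (expert); illustrative scale

    def checklist_score(self) -> float:
        """Weighted fraction of checklist steps completed."""
        total = sum(item.weight for item in self.items)
        done = sum(item.weight for item in self.items if item.completed)
        return done / total if total else 0.0

# Hypothetical example: a few lumbar puncture steps
lp = ProceduralAssessment(
    procedure="lumbar puncture",
    items=[
        ChecklistItem("Obtains informed consent", completed=True),
        ChecklistItem("Identifies the L3-L4 interspace", weight=2.0, completed=True),
        ChecklistItem("Maintains sterile technique", weight=2.0, completed=False),
    ],
    global_rating=3,
)
score = lp.checklist_score()   # 0.6 with these made-up entries
```

Whether to report the checklist score, the global rating, or some combination of both is exactly the open question raised in Consensus Statement 1 below.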

Consensus Statement 1
Further research is needed to determine the best models and methods for assessing technical competence for EM procedures. Checklists derived through task analysis appear to be well suited for describing procedures, but it is not clear whether a combination of checklists and global assessments is preferable.

While context and fidelity may influence technical skill performance, the optimal setting and the degree of realism needed for assessment are unknown. The importance of the context of the assessment (i.e., a procedure embedded within a clinical scenario or isolated from the clinical context) and the number of raters and independent assessments needed in high-stakes examinations of procedural skills should be determined by further research.

Assessments and scoring protocols should be procedure-specific and tailored to the learner. The components of a procedure have varying importance and may need to be weighted in a specific scoring protocol. Research is needed to determine whether the use of expert panels is the most effective method of assigning weighting factors.

RESEARCH QUESTION 2: HOW SHOULD PERFORMANCE STANDARDS FOR PROCEDURAL COMPETENCIES BE SET?

How Can the Components of Procedures Be Defined?
Defining the detailed steps of a procedure is a prerequisite to both training and testing. The description of a procedure needs to be objective, clear, and complete. It should include observable characteristics, be unambiguous, and denote what is included in and excluded from the behavior.10 Most procedures can be deconstructed into individual steps.11 Task analysis is a commonly used technique for analyzing job activities, and it has become an accepted research methodology with good intra- and interrater reliability in other specialties.12 Task analysis must provide a representative sample of the competencies expected, if not a complete replication of the procedure.13 Skill learning begins as problem solving, and trainees must recall and implement each step during practice. Once trainees reach a level of competence, many of these steps are combined into one automatic, fluid motion.

What Other Characteristics of Procedures Should Be Included in an Assessment?
Operational definitions of competence may include critical actions, behavioral ratings, compliance ratings relative to algorithms, and simulated patient mortality.14 Procedural task complexity can be quantified using four dimensions: component complexity (e.g., number of steps in the procedure), coordinative complexity (requirements of timing and sequence), dynamic complexity (changes in the task over time), and cognitive complexity (knowledge recall and decision-making).15 Team interactions, affective issues, and external (system) complexity may impact performance of EM procedures. Performance time may also be an important feature of some EM procedures.8 Outcome measures might include overall success of the procedure and absence of complications.

What Standard-setting Methods Are Appropriate for Defining Procedural Competency?
A standard is a quantifiable level of performance that serves as a boundary between those who perform well enough and those who do not; in other words, a "minimally adequate level of performance."16 Standard setting is the process of determining a passing score, or the lowest score that permits licensing, credentialing, or certification of competence.16 Using performance standards, educators can determine whether a skill has been successfully transferred to clinical practice.

There are two general categories of performance standards: absolute and relative. Absolute standards (also known as "criterion-referenced" standards) are set as a function of the amount of knowledge an examinee is expected to demonstrate.17 They may be reported as numbers or percentages of questions answered correctly on a written test or of steps achieved on a psychomotor test.13,18 Relative standards (also called "reference-based" or "norm-referenced" standards) depend on the performances of the examinees; the passing score is determined by comparing an individual's performance to that of a group. These standards are reported as a number or percentage of examinees or as percentile scores. A relative passing score is not a fixed standard because the competence of groups can vary over time. Consequently, relative standards are rarely used in high-stakes examinations.17 Because the goal of procedural training in EM residency programs is to bring all residents to an acceptable level of performance, not to rank-order them, only absolute performance standards are appropriate.19 Therefore, it is necessary to set reference standards for a defined performance measure.20 A variety of specific standard-setting methods for absolute standards have been reported.16,18,21–23 Standard-setting methods are imperfect and time-consuming and require examiner training, but they have been validated for knowledge-based competency assessment. Despite extensive research, there is no agreed-upon method of setting performance standards specifically for technical skills.24
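
The difference between the two categories of standards can be shown with a short sketch. The 80% criterion and the bottom-quartile cutoff are arbitrary values chosen for illustration, not recommended standards.

```python
def absolute_pass(scores, criterion=0.80):
    """Criterion-referenced: pass everyone at or above a fixed fraction of
    checklist steps, regardless of how the rest of the cohort performs."""
    return [s >= criterion for s in scores]

def relative_pass(scores, fail_fraction=0.25):
    """Norm-referenced: fail the bottom fraction of the cohort, so the
    effective cutoff score shifts whenever the cohort changes."""
    cutoff = sorted(scores)[int(len(scores) * fail_fraction)]
    return [s >= cutoff for s in scores]

weak_cohort = [0.55, 0.60, 0.65, 0.70, 0.75, 0.78]
strong_cohort = [0.82, 0.85, 0.88, 0.90, 0.93, 0.95]

# The absolute standard is stable across cohorts: every weak-cohort score
# fails and every strong-cohort score passes.
print(absolute_pass(weak_cohort), absolute_pass(strong_cohort))

# The relative standard fails the weakest examinees in either cohort, even the
# 0.82 performer who exceeds the absolute criterion, which is one reason
# relative standards are rarely used for high-stakes decisions.
print(relative_pass(weak_cohort), relative_pass(strong_cohort))
```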

For Which Procedures Should Emergency Physicians Be Certified and Recertified?
EM's core content was revised in 2005 to include a comprehensive list of "Procedures and Skills Integral to the Practice of Emergency Medicine."25 The EMRRC lists 10 procedures with suggested performance numbers per resident. Despite calls to establish procedural competency standards, neither of EM's certifying bodies (the American Board of Emergency Medicine and the RRC) has established a specific definition.26 It is the responsibility of individual programs and institutions to define technical competence for their residents and postgraduate physicians. There are no published studies that resolve the question of which technical procedures in EM deserve certification or recertification.

Consensus Statement 2
Assessment tools that establish a trainee's procedural competence in the simulation laboratory before skills are attempted on patients would provide significant contributions to medical education and patient safety. The characteristics and components of each procedure must be clearly defined before developing an assessment tool. Absolute performance standards are preferred over relative standards to assess procedural competence; however, this requires identifying procedural experts who can develop valid performance standards. Standard setting may require an iterative process whereby experts suggest a standard that is then tested by experienced clinicians.

It is not clear whether educators have the resources to bring all resident physicians to a demonstrable and well-defined level of competence for all EM procedures required in clinical practice. Educators must select the necessary procedures for which EM residents must acquire competence.

RESEARCH QUESTION 3: WHAT METHODS SHOULD BE USED TO EVALUATE THE QUALITY OF SIMULATION-BASED TRAINING AND ASSESSMENT TOOLS?

What Methods Should Be Used to Validate Simulation Training?
When creating new complex, case-based simulations, educational product developers should describe the setting, events and conditions, training strategy, scripted behaviors of the trainers, expected behaviors of the trainees, and immediate and delayed consequences of trainees' decisions and actions.27 Altering even one aspect of training can change the outcome. Although procedural simulations may require less detailed scenario descriptions, the expected actions of the trainee should be carefully defined.

Investigators should include an evaluation of training integrity in the experimental study design. The trainers' behavior should be monitored to ensure that the simulation is executed as designed and that each training group is presented with the appropriate material in the intended fashion.27 This is most commonly accomplished by having a nontrainer observe live or videotaped training sessions and rate the number of completed training behaviors on a checklist. Training integrity is frequently reported as a percentage of training behaviors completed. "Training the trainers" prior to implementing the instructional program is the first step toward improving training integrity.28
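
As a small sketch of how training integrity might be tracked, the snippet below computes the percentage of scripted trainer behaviors that an observer checked off for one session; the behavior list and the 90% threshold are hypothetical, not values taken from the article.

```python
def training_integrity(observed: dict) -> float:
    """Percentage of scripted trainer behaviors the observer marked as completed."""
    return 100.0 * sum(observed.values()) / len(observed)

session = {
    "stated learning objectives": True,
    "demonstrated each step before practice": True,
    "gave immediate feedback after each attempt": False,
    "completed structured debriefing": True,
}
integrity_pct = training_integrity(session)     # 75.0 for this made-up session
needs_trainer_retraining = integrity_pct < 90   # hypothetical threshold
```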



What Methods Should Be Used to Validate an Assessment Instrument or Simulator Model?
Demonstrating the effectiveness and impact of any training method requires an assessment strategy that yields valid, acceptable, reliable, accurate, and sensitive measures with which to quantify learning and performance. The quality of any assessment tool is judged against these five benchmarks.

Validity. Validity implies that the results of an assessment represent what the researcher intended them to represent.29 Although traditional educational methods (such as textbooks, lectures, or supervised practice on patients) are seldom validated, various stakeholders are requiring validation of simulation, possibly because of initially high capital costs and faculty time commitment. Experimental designs should control for major threats to internal validity, such as motivation effects, maturation of clinical skills over time, selection bias associated with using volunteer trainees, reactivity to (learning from) testing, invalid measures, attrition, and repeated measurement effects. The validity of the training model, the protocol, and the assessment instrument should all be assessed.

Reliability. Reliability, also known as "reproducibility" or "precision," is the ability of multiple measurements to produce similar results. Reliability implies that the same individual will perform a procedure in the same manner on separate occasions or that an evaluator will score repeated performances similarly.29,30 The greatest sources of measurement error are case design and raters.31 The most commonly used methods of testing reliability are 1) test–retest reproducibility and 2) internal consistency.32 In contrast to actual clinical encounters, one advantage of using simulation for competence assessment is the ability to present the situation, problem, or task in a consistent manner through scripting, programming, and fixed model design. These characteristics provide reliability for evaluators and reproducibility for researchers.33

Sensitivity. A good assessment tool should be sensitive enough to detect important changes in skill level that occur as a result of training.29 One common flaw in simulation research is the expectation of a large training effect from a small intervention measured with an insensitive instrument.
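
For example, agreement between two raters scoring the same performance on dichotomous checklist items is often summarized with a chance-corrected statistic such as Cohen's kappa. The choice of statistic and the ratings below are assumptions made for illustration; the article does not prescribe a particular reliability coefficient.

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on yes/no checklist items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_yes_a, p_yes_b = sum(rater_a) / n, sum(rater_b) / n
    expected = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

# Hypothetical ratings of ten checklist items (1 = step performed correctly)
rater_1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
rater_2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
kappa = cohens_kappa(rater_1, rater_2)   # about 0.52 with these made-up ratings
```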

Consensus Statement 3
Performance standards should be validated to the extent that is consistent with the purpose of the examination. For example, standards for high-stakes certification examinations should undergo more rigorous validation than formative evaluations used to satisfy instructors that new learners have achieved an acceptable skill level. Who sets the performance standard, and how it was set, should be described. Investigators should consider the recommendations of Cizek16 for demonstrating the validity or credibility of performance standards: 1) use task analysis or similar methods (content validity); 2) provide clear definitions of key constructs (e.g., "minimal competence" or "borderline performance"); 3) document how the standard-setting panel of experts was selected; 4) assure that the experts were trained, understood the method, and applied it correctly; and/or 5) obtain an opinion from external sources that the standard is reasonable and appropriate for the procedure. A standard-setting process that requires an expert panel to evaluate each step in a procedure is usually preferred to one that uses a global decision about how many examinees should pass or what percentage of the procedure should be done correctly.

Further research is needed to develop valid, accurate, reliable, sensitive, and acceptable assessment tools for many EM procedures. Training assessment tools should be sensitive enough to identify important, incremental changes in skill level and to track performance changes over time. The ideal combination of training devices needs to be determined for each procedure.

Demonstrating predictive validity is the ultimate goal of simulation research. Investigators should also consider other patient-centered outcomes, such as rates of success and complications, or patient satisfaction.

RESEARCH QUESTION 4: WHAT ARE THE OPTIMAL CONDITIONS FOR LEARNING PROCEDURAL SKILLS USING SIMULATORS?

What Are the Most Effective and Efficient Procedural Training Methods?
Through a systematic review of simulation studies, Issenberg et al.34 determined that high-fidelity medical simulations facilitate learning under the "right conditions." The optimal conditions applicable to procedural training are feedback during the learning experience, repetitive practice, integration of the simulation into the overall curriculum, increasing levels of difficulty during practice, a controlled environment, individualized learning, clearly defined and measured outcomes, and validation of the simulator. Educators who develop training and engineers who design simulators should incorporate these concepts into their training methods and models.

How Many Times Does a Trainee Need to Repeat a Specific Procedure to Achieve Competence?
In a systematic review of the simulation literature, McGaghie et al.35 found a strong association between hours of practice on high-fidelity medical simulators and standardized learning outcomes. Additional research is needed to demonstrate the same effect on lower-fidelity simulators. Although the number of repetitions of a procedure, either in clinical practice or on a procedure simulator, is correlated with progression toward a performance standard, repetition alone is not an adequate metric by which to define competence.36 The number of repetitions needed to achieve competence is not clearly defined, but it likely varies among individuals and procedures.

Can a Learning Curve for Specific Procedures Be Measured?
Acquisition of most skills, knowledge, and confidence occurs gradually. Improvement in performance over time is described as the "learning curve." If the learning curve for a given procedure can be established, performance level and training variables might be quantifiable. Educators could use a learning curve to determine the amount of student and faculty time required to teach a technical skill to a group of students, or the amount of additional effort required to raise an individual student to a level of competence. Although the shape of the learning curve is likely to be different for each trainee, final learning outcomes should be identical.34 The best methods for measuring procedural learning curves are undefined.
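
One way to quantify a learning curve is to fit a simple model of improvement against attempt number. The sketch below assumes the classic power law of practice for completion time and uses fabricated times; the functional form, the metric, and the data are all assumptions, since the article does not specify how a learning curve should be modeled.

```python
from math import exp, log

def fit_power_law(times):
    """Least-squares fit of T(n) = a * n**(-b) on log-transformed data,
    where n is the attempt number and T(n) is the completion time."""
    xs = [log(n) for n in range(1, len(times) + 1)]
    ys = [log(t) for t in times]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return exp(y_bar - slope * x_bar), -slope   # (a, b)

# Hypothetical completion times (minutes) for successive simulator attempts
times = [14.0, 10.5, 9.0, 8.2, 7.6, 7.2, 6.9]
a, b = fit_power_law(times)
predicted_10th_attempt = a * 10 ** (-b)   # extrapolated time for a tenth attempt
```

The fitted exponent b gives a crude single-number summary of how quickly a trainee improves, which could serve as one candidate interim benchmark of the kind discussed below.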

Can Procedural Simulators Accelerate the Learning Curve for Skill Acquisition?
One goal of procedural simulation is to transfer the steep portion of the learning curve to the simulation laboratory, allowing trainees to perform their first attempt on a live patient at a level closer to competence. Complex procedures may be learned more effectively if the complexity of a set of tasks or steps is increased gradually. Lengthy procedures may be learned more rapidly if taught sequentially in smaller segments and then later combined into one seamless performance.37 Therefore, different simulator models may be needed for different stages of training. Although studies suggest that physicians at all stages improve their skills with simulation-based training, novices usually demonstrate the greatest skill improvement for a given amount of instructor time.

What Characteristics of Procedure Simulator Models and Training Methods Optimize Training Efficiency and Effectiveness?
Simulation models have become more sophisticated (as well as more costly) without having been subjected to prerequisite validation studies. It is not known which procedures must be taught on animal models rather than on simulators. Clearly defined performance objectives, outcomes, or benchmarks for each learner level should dictate the design and use of a simulator model.38

Consensus Statement 4
Many factors can enhance the effectiveness of procedure-based simulation training. These include the quality and realism of the scenario, the level of experience of the learners, prior participant preparation, adequate time and space, the number of repetitions performed, the nature of the repetitions (i.e., deliberate practice), the training schedule, the instructor's expertise, a low instructor-to-trainee ratio, the effective use of debriefing and feedback, and an institutional "culture of learning." There has been limited research (with small sample sizes) examining the impact of each of these factors on learning curves.

Learning curves with interim benchmarks should be established for various procedures to advance our understanding of skill acquisition. Learning curves may be unique to each procedure and to the characteristics of the learner. Consequently, it may take years for EM researchers to conduct rigorously designed, randomized trials demonstrating training efficacy for the procedures that EM physicians must master, at a time when procedural competence assessments are urgently needed. Educators must determine for which procedures time- and resource-intensive simulation-based training methods are justified.

RESEARCH QUESTION 5: HOW EFFECTIVELY ARE PROCEDURAL SKILLS THAT ARE LEARNED ON A SIMULATOR TRANSFERRED TO THE CLINICAL ENVIRONMENT?

Is There Existing Evidence for the Transfer of Procedural Skills from Simulators to Patients?
Skill transfer is the ability to perform a task on a real patient that was initially learned in a simulation-based environment. There is mixed evidence that simulation-based technical skills transfer to the actual clinical environment.39,40 The medical education literature does not describe how well this transfer occurs, the conditions that affect it, or the point in simulation training at which the learner is ready to attempt a procedural skill at the bedside. Most research studies to date have demonstrated improved performance by residents and students on partial task trainers, laparoscopic simulators, and high-fidelity simulators.41,42 Several authors have evaluated the transfer of surgical and airway skills from the simulator to the clinical environment. There is some evidence that transfer occurs; however, the quality and consistency of methodologies and the strength of inferences vary widely. Transferability of a learned skill to the bedside appears to be complex and difficult to study.

Consensus Statement 5
The ultimate benefit of procedural simulators is to improve performance in the clinical setting and improve patient outcomes. To be considered effective, simulation should predict the learner's technical performance on a patient.43 There is insufficient research on the transfer of simulator-based procedural competence to performance on patients. Self-reported competency and the number of exposures to a procedure in a simulation laboratory are insufficient measures of skill transfer.

Some simulation-based assessments cannot be practically or ethically validated by comparison with actual clinical practice. Assessments of rarely performed procedures should not be held to this standard. EPs may be trained to perform infrequently encountered procedures using simulators and training methods that have not achieved the highest levels of validation. This will require a commitment by the learner to receive and incorporate formative assessment. Skill transfer to the clinical setting may be influenced by a number of variables, including the environment, clinical supervision and real-time prompts or cues during the procedure, performance feedback, observational bias, and nontechnical aspects of the procedure, such as the physician's level of confidence.
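
As a sketch of how predictive validity of transfer might be examined, the snippet below correlates final simulator checklist scores with scores on each trainee's first supervised attempt on a patient. The paired scores are fabricated, and Pearson correlation is only one of several reasonable choices of statistic.

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation between simulator scores and supervised bedside scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical paired scores (fraction of checklist steps completed)
simulator_final = [0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
first_bedside = [0.60, 0.72, 0.70, 0.80, 0.85, 0.88]
r = pearson_r(simulator_final, first_bedside)
```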



RESEARCH QUESTION 6: WHAT FACTORS INFLUENCE SKILL RETENTION?

What Variables Should Be Considered in Studies of Skill Retention?
Procedural skill decay, or the loss of some or all of the component skills necessary to perform a procedure after a period of nonuse, has been well established.44 The major factors that influence the retention of skills over extended periods of nonuse are listed in Table 1. It is unknown how often procedural skills should be refreshed or retrained, or which procedures are most susceptible to skill decay. Procedures with task characteristics that decay more rapidly (such as a requirement for speed) may need to be retrained more frequently than those with more stable characteristics.

Table 1. Factors That Influence Skill Retention

Methodologic factors (modifiable during training):
- Instructional methods
- Degree of overlearning
- Conditions of retrieval
- Method of testing
- Evaluation criteria

Task-related factors (inherent to the procedure):
- Length of the "retention interval" (between training and application)
- Individual trainee abilities and motivation
- Characteristics of the procedure:
  1. Difficulty
  2. Complexity
  3. Closed-loop tasks (well-defined) vs. open-loop tasks (requiring continuous responses)
  4. Level of integration
  5. Physical vs. cognitive nature of the task
  6. Requirements for speed and accuracy
  7. Natural vs. artificial nature of the task

Created from Arthur et al.44

How Should Technical Skill Retention Be Measured?
When designing studies of skill retention, investigators should consider the following recommendations:

1. Measure the skill level of trainees prior to the training exercise.
2. Define the point at which a skill is acquired. In some studies, investigators train individuals to one error-free trial; others use relative standards (the percentage of students performing the procedure correctly) as the criterion to stop training. Performance at the end of training may not, in itself, adequately predict long-term retention.
3. Use a repeated-measures design to evaluate skill retention. Unfortunately, studying procedural skill decay in the emergency department (ED) is logistically difficult because opportunities for procedures are unpredictable. If procedural competency in the simulated environment is shown to be predictive of clinical competency, skill retention may be studied effectively using simulators (a brief analysis sketch follows this list).
4. Account for interval learning. Ideally, subjects should not study or practice the procedure during the time between training and testing. Thus, procedures that are performed infrequently will be easiest to study.
5. Account for the training effect that may occur during testing sessions.
6. Blind study subjects to the assessment date, if possible. This may reduce the chance that anxious subjects will study or practice the procedure prior to a performance date.
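
A minimal sketch of the repeated-measures approach in recommendation 3, using fabricated scores and arbitrary retention intervals:

```python
from statistics import mean

# Hypothetical repeated measures: checklist score at the end of training
# (week 0) and at fixed retention intervals with no interim practice.
retention_data = {
    "resident_A": {0: 0.92, 6: 0.85, 12: 0.78, 24: 0.70},
    "resident_B": {0: 0.88, 6: 0.86, 12: 0.80, 24: 0.76},
}

def percent_retained(series):
    """Score at each retention interval as a fraction of the end-of-training score."""
    baseline = series[0]
    return {week: score / baseline for week, score in series.items() if week > 0}

# Average retention curve for the group (the weeks sampled are arbitrary)
group_curve = {
    week: mean(percent_retained(scores)[week] for scores in retention_data.values())
    for week in (6, 12, 24)
}
```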

Consensus Statement 6
Multiple variables should be measured in studies of procedural skill retention. Repeated assessments are necessary to establish learning and skill retention curves. Investigators should adjust for the training effect of repeated assessments whenever possible. Simulators allow practicing physicians to review infrequently used procedures as well as learn new procedures introduced into clinical practice after residency training. Understanding the rate of decay of EM procedural skills will help educators establish appropriate adjustments to training interventions.

CONCLUSIONS

The consensus group explored six major research questions about simulation-based procedural training and assessment facing EM educators. The consensus statements may guide researchers attempting to advance our knowledge of technical skill acquisition. Currently, there are insufficient data to determine how well procedural skills learned on a simulator are transferred to the clinical environment. With further research, simulation could be integrated into routine procedural training as well as high-stakes competency certification processes.

References

1. Accreditation Council for Graduate Medical Education. Emergency Medicine Residency Review Committee. Available at: http://www.acgme.org/acWebsite/navPages/nav_110.asp. Accessed Jul 21, 2008.
2. Accreditation Council for Graduate Medical Education. General Competency Standards. Available at: http://www.acgme.org/outcome/comp/GeneralCompetenciesStandards21307.pdf. Accessed Jul 21, 2008.
3. Rothman A, Cohen R. A comparison of empirically- and rationally-defined standards for clinical skills checklists. Acad Med. 1996; 71(Suppl):S1–3.
4. Bardes C, Colliver J, Alonso D, Swartz M. Validity of standardized-patient examination scores as an indicator of faculty observer ratings. Acad Med. 1996; 71(Suppl):S82–3.
5. Gordon J, McLaughlin S, Shapiro M, Bond W, Spillane L. Simulation in emergency medicine. In: Loyd G, Lake C, Greenberg C, eds. Practical Health Care Simulations. Philadelphia, PA: Elsevier, 2004: pp 299–337.
6. Rosenthal R, Gantert W, Hamel C, et al. Assessment of construct validity of a virtual reality laparoscopy simulator. J Lap Adv Surg Tech. 2007; 17:407–13.
7. Hayden S, Panacek E. Procedural competency in emergency medicine: the current range of resident experience. Acad Emerg Med. 1999; 6:728–35.
8. Chapman D, Rhee K, Marx J, et al. Open thoracotomy procedural competency: validity study of teaching and assessment modalities. Ann Emerg Med. 1996; 28:641–7.
9. Nicholson W, Cates C, Patel A, et al. Face and content validation of virtual reality simulation for carotid angiography: results from the first 100 physicians attending the Emory Neuro Anatomy Carotid Training (ENACT) program. Simul Healthc. 2006; 1:147–50.
10. Kazdin AE. Behavior Modification in Applied Settings. Florence, KY: Brooks/Cole; 1989, pp 54–6.
11. Aggarwal R, Grantcharov T, Darzi A. Framework for systematic training and assessment of technical skills. J Am Coll Surg. 2007; 204:697–705.
12. Slagle R, Weinger M, Dinh M, Brumer V, Williams K. Assessment of the intra-rater and inter-rater reliability of an established clinical task analysis methodology. Anesthesiology. 2002; 96:1129–39.

13. Newble D. Guidelines for assessing clinical competence. Teach Learn Med. 1994; 4:213–20.
14. Bond W, Spillane L. The use of simulation for emergency medicine resident assessment. Acad Emerg Med. 2002; 9:1295–9.
15. Wood R. Task complexity: definition of a construct. Organ Behav Hum Dec Proc. 1986; 37:60–82.
16. Cizek G. Setting passing scores. Educ Meas: Issues Pract. 1996; (Summer):20–31.
17. deChamplain A. Ensuring that the competent are truly competent: an overview of common methods and procedures used to set standards on high-stakes examinations. J Vet Med Educ. 2004; 31:62–6.
18. Norcini J. Setting standards on educational tests. Med Educ. 2003; 37:464–9.
19. McKinley D, Boulet J, Hambleton R. A work-centered approach for setting passing scores on performance-based assessments. Eval Health Prof. 2005; 28:349–69.
20. Boulet J, Champlain Ad, McKinley D. Setting defensible performance standards on OSCEs and standardized patient examinations. Med Teach. 2003; 25:245–9.
21. Boulet J, Murray D, Kras J, Woodhouse J. Setting performance standards for mannequin-based acute care scenarios. Simul Healthc. 2008; 3:72–81.
22. Morrison H. The passing score in the objective structured clinical examination. Med Educ. 1996; 30:345–8.
23. Plake B. Setting performance standards for professional licensure and certification. Appl Meas Educ. 1998; 11:65–80.
24. Talente G, Haist S, Wilson J. A model for setting performance standards for standardized patient examinations. Eval Health Prof. 2003; 26:427–46.
25. Thomas H, Binder L, Chapman D, et al. The 2003 model of the clinical practice of emergency medicine: the 2005 update. Acad Emerg Med. 2006; 13:1070–3.
26. Chapman D. Definitively defining the specialty of emergency medicine: issues of procedural competency. Acad Emerg Med. 1999; 6:678–81.
27. Bailey J, Burch M. Research Methods in Applied Behavior Analysis. Thousand Oaks, CA: Sage Publications, 2002, pp 63–80.
28. Kazdin A. The token economy: a decade later. J Appl Behav Anal. 1982; 15:431–45.
29. Johnston J, Pennypacker H. Strategies and Tactics of Behavioral Research. Hillsdale, NJ: Lawrence Erlbaum, 1985, pp 135–43.
30. Reznick R. Teaching and testing technical skills. Am J Surg. 1993; 165:358–61.
31. Ferrell B. Clinical performance assessment using standardized patients: a primer. Fam Med. 1995; 27:14–9.
32. Carter F, Schijven M, Aggarwal R, et al. Consensus guidelines for validation of virtual reality surgical simulators. Simul Healthc. 2006; 1:171–9.
33. Scalese R, Obeso V, Issenberg S. Simulation technology for skills training and competency assessment in medical education. J Gen Intern Med. 2007; 23(Suppl):46–9.

34. Issenberg S, McGaghie W, Petrusa E. Features and uses of high-fidelity medical simulations that lead to effective learning: a BEME systematic review. Med Teach. 2005; 27:10–28.
35. McGaghie W, Issenberg S, Petrusa E, Scalese R. Effect of practice on standardized learning outcomes in simulation-based medical education. Med Educ. 2006; 40:792–7.
36. Ericsson K. Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Acad Med. 2004; 10(Suppl):S1–12.
37. Brydges R, Carnahan H, Backstein D, Dubrowski A. Application of motor learning principles to complex surgical tasks: searching for the optimal practice schedule. J Motor Behav. 2007; 39:40–8.
38. Lujan H, DeCarlo S. First year medical students prefer multiple learning styles. Adv Physiol Educ. 2006; 30:13–6.
39. Prystowsky J, Regehr G, Rogers D, Loan J, Hiemenz L, Smith K. A virtual reality module for intravenous catheter placement. Am J Surg. 1999; 177:171–5.
40. Anastakis D, Wanzel K, Brown M, McIlroy J, Hamstra S, Ali J. Evaluating the effectiveness of a 2-year curriculum in a surgical skills center. Am J Surg. 2003; 185:378–85.
41. Fried G. Proving the value of simulation in laparoscopic surgery. Ann Surg. 2004; 240:518–28.
42. Kneebone R, Kidd J, Nestel D. Blurring the boundaries: scenario-based simulation in a clinical setting. Med Educ. 2005; 39:580–7.
43. Wong J, Matsumoto E. Primer: cognitive motor learning for teaching surgical skill–how are surgical skills taught and assessed? Nat Clin Pract Urol. 2008; 5:47–54.
44. Arthur W Jr, Bennett W Jr, Day EA, McNelly TL. Skill Decay: A Comparative Assessment of Training Protocols and Individual Differences in the Loss and Reacquisition of Complex Skills. Springfield, VA: National Technical Information Service, 2002.
45. Stolovitch H, Keeps E. Telling Ain't Training. Alexandria, VA: ASTD Press, 2006.
46. Kneebone R, Kidd J, Nestel D, Asvall S, Paraskeva P, Darzi A. An innovative model for teaching and learning clinical procedures. Med Educ. 2002; 36:628–34.
47. McLaughlin S, Doezema D, Sklar D. Human simulation in emergency medicine training: a model curriculum. Acad Emerg Med. 2002; 9:1310–8.
48. Dawson S. Procedural simulation: a primer. J Vasc Intervent Radiol. 2006; 17:205–13.
49. Ericsson K. The influence of experience and deliberate practice on the development of superior expert performance. In: The Cambridge Handbook of Expertise and Expert Performance. Cambridge, UK: Cambridge University Press, 2006, pp 685–706.
50. Ericsson K, Lehmann A. Expert and exceptional performance: evidence of maximal adaptation to task constraints. Annu Rev Psychol. 1996; 47:273–305.
51. Bereiter C, Scardamalia M. Surpassing Ourselves: An Inquiry into the Nature and Implications of Expertise. Chicago and La Salle, IL: Open Court Publishing Company, 1993, p 11.



APPENDIX
Glossary of Terms

Procedure—any maneuver or technique requiring manual dexterity that is used to accomplish a specific task during the medical management of a patient. Procedures are composed of a series of discrete steps, tasks, or actions that are sequential and that have a beginning and an end. In this article, the term "skill" refers to the ability to perform an entire procedure, or any component of it, as a result of training and/or practice. The purpose of "training" is to create a change in the behavior of learners (including knowledge and skills) that is consistently reproduced without variation. With additional training, the learned behavior is performed with fewer errors, with greater speed and automaticity, and under more demanding conditions. The purpose of "instruction" is to help learners generalize beyond the specifics of what is taught. The purpose of "education" is to build general mental models and value systems by synthesizing principles, concepts, and experiences, and by role modeling.45 Procedural skills require consistent and well-rehearsed performances. In this article, we refer to the teaching of procedural skills as a process of training.

Medical simulation—a method of training, instruction, education, or assessment in which learners use models, devices, or other representations that imitate patients, anatomic regions, clinical tasks, or processes in realistic situations and settings in which medical services are rendered.33 Some simulations utilize actors portraying patients or hybrids of models and actors.46 Simulation can be used to teach decision-making and other cognitive skills, teamwork, technical or psychomotor skills, professional skills, physiology, pharmacology, and other elements of a curriculum.14,33,47

Simulator—a device, model, or representation used to recreate a patient or patient care environment. Simulators have been classified by degree of realism, components, capabilities, or intended use and include procedural simulators, partial task trainers, computer-enhanced mannequins, hybrid simulators, and immersive or virtual reality simulators.33 The term "procedural simulator" as used in this article describes a tool that replicates the technical aspects of a clinical procedure. Procedural simulators range in complexity, cost, interactivity, and realism, from a knot-tying board or an orange (as a skin model for intramuscular injection) to an angiography simulator (used to simulate coronary or carotid artery catheterization in an angiography suite). "Partial task trainer" is another term used to describe procedural simulators that represent three-dimensional body parts (e.g., intubation heads and peripheral intravenous access arms). Partial task trainers are generally used to teach a portion of a procedure or other isolated skills. "Virtual reality" simulators are computer-driven devices that display and replicate the visual, auditory, and tactile elements of the physical world and user interactions on a computer screen or other display.33 Not all medical simulations rely on technical equipment or simulator models, but essentially all procedure simulations do.

Fidelity—a qualitative description of the realism of a simulation experience or simulator. High-fidelity simulators closely replicate the external appearance, physiology, anatomy, behaviors, and social responses of a real patient that are relevant to the educational exercise. Low-fidelity simulators are inexact approximations of these features. Newer, computer-enhanced mannequins are technologically complex devices consisting of plastic, polymers, internal wiring, and air hoses, and they are driven by computer software that can be programmed by an instructor. Perhaps the lowest-fidelity simulations are oral examination "thought" exercises in which the learner elicits and responds to information provided verbally by the instructor. The importance of simulation fidelity is determined by the goal of the educational exercise.

Assessment tool or assessment instrument—a mechanism for recording performance, based on a system of educational metrics that identifies progress and learning during training.48 Assessment tools range from checklists and surveys to electronically flagged video recordings. All assessment tools are accompanied by scoring protocols used to evaluate performance.

Competence—an implied level of skill, knowledge, goal-directed strategies, or experience sufficient to accomplish a procedure successfully and safely under normal conditions or usual circumstances.49

Proficiency—further advancement in knowledge or skill, plus an ability to adapt to situations as they arise. Someone who is proficient can accomplish tasks or procedures with less planning and problem-solving.

Expertise—a level of skill resulting from a combination of a superior body of formal and informal knowledge, practice, refinement of skills, and experience in problem-solving.50 Knowledge itself is not expertise, and expertise can exist apart from specialization. The expert addresses new problems within his or her field at the upper limit of possible complexity, whereas the experienced nonexpert is more likely to rely on practiced routines.51

Overlearning—deliberate training beyond what is required to meet performance standards.