American Journal of Infection Control 41 (2013) 28-32


Major article

Development and validation of tools for assessing use of personal protective equipment in health care

Camille K. Williams MHSc a,*, Heather Carnahan PhD b,c

a Graduate Department of Rehabilitation Science, and Wilson Centre for Research in Education, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
b Centre for Ambulatory Care Education, Women's College Hospital, Toronto, ON, Canada
c Department of Occupational Science and Occupational Therapy, and Wilson Centre for Research in Education, Faculty of Medicine, University of Toronto, Toronto, ON, Canada

Key words: routine precaution; infection control; procedural skill; checklist; global rating scale

Background: Incorrect use of personal protective equipment (PPE) may lead to the spread of infectious agents among health care workers and patients. Although novel education programs show promise, there is no standard evaluation for the competencies developed during training.

Methods: A Delphi methodology was used in which checklist and global rating items for evaluating the performance of PPE skills involving gloves, gowns, masks, eye protection, and hand hygiene were generated and iteratively distributed to a panel of experts. The panel rated the importance of each item until agreement was reached, and the relevant items were used to form the Tools for Assessment of PPE Skills (TAPS), comprising 3 checklist sections (hand hygiene, donning, and doffing) and a global rating scale. Newly trained and experienced PPE users participated in experiments to evaluate the reliability, construct validity, and responsiveness of TAPS.

Results: TAPS demonstrated interobserver reliability, and its global rating scale differentiated the performance of newly trained users and experienced users and was sensitive to changes in performance over time.

Conclusions: Pending further validation studies, the TAPS may facilitate the development and evaluation of educational programs to support learning and retention of PPE skills, leading to enhanced patient and health care worker safety.

Copyright © 2013 by the Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.

The transmission of health care-associated infections (HCAIs) is a major concern for most health care facilities, threatening patient safety by contributing to unnecessary suffering, morbidity, and sometimes mortality. In Britain, 8% of patients admitted to hospitals are affected,1 in the United States, HCAIs were the leading reportable disease in 2002,2 and in Canada, an estimated >220,000 HCAIs occur in hospitals, leading to >8,000 deaths annually.3 HCAIs present physical, social, psychological, and financial costs to patients and their families, as well as financial costs to health care systems.4 Two important measures to help prevent and limit the transmission of HCAIs are hand hygiene and the use of personal protective equipment (PPE), including gloves, gowns, masks, and various forms of eye protection.

* Address correspondence to Camille K. Williams, MHSc, Graduate Department of Rehabilitation Science and Wilson Centre for Research in Education, Faculty of Medicine, University of Toronto, 160-500 University Avenue, Toronto, ON, M5G 1V7, Canada. E-mail address: [email protected] (C.K. Williams). H.C. is supported by the BMO Financial Chair in Health Professions Education Research. Conflict of interest: None to report.

This was confirmed in a survey of required infection prevention and control (IPC) competencies for various hospital-based health care workers, in which researchers found that proper hand hygiene, selection of appropriate PPE for each category of transmission-based precautions, and demonstration of donning and doffing PPE were required objectives for all hospital-based health care workers.2 Unfortunately, even when PPE is used, errors in technique may reduce or negate its intended effects.5 Furthermore, a lack of assessment of infection control competencies6 may suggest to learners that these aspects of clinical competency are less valuable than others. The proper use of PPE depends on knowledge of infection control techniques, as well as on an understanding of the infection control principles that inform routine and additional precautions, such as the route of transmission of infectious agents, clinical activities, and the clinical environment. Infection control audits developed to examine practices and procedures in clinical wards or services from a system perspective have proven beneficial to facility-wide infection control programs.7,8 However, audits do not directly assess a particular educational program and are not able to directly assess the competence of a particular health care worker or trainee.



Furthermore, there is currently no widely accepted and standardized metric for assessing IPC competencies, particularly the use of PPE and hand hygiene. Consequently, each study that undertakes evaluation of an IPC education program does so using an evaluation method that is not validated, and the inconsistencies that may result are unknown.

The overall objective of the present study was to produce a validated PPE skills assessment tool in 2 phases. In the first phase, we developed assessment tools, which we call Tools for Assessment of PPE Skills (TAPS), by conducting a Delphi survey of IPC experts across Canada to identify key aspects of PPE skills. The Delphi method has been used for various purposes in health care and other fields.9 This consensus-building process is meant to enhance the individual opinions of experts and obtain a collective expert opinion about a particular question or issue.9,10 Delphi panelists remain anonymous, so that the group process is not unduly influenced by the reputation or opinion of any one panelist.10 In the second phase, we performed validation tests on the novel assessment tools to determine their suitability for evaluating trainees' IPC skills. We hypothesized that the tools would demonstrate that PPE users with more experience perform better than newly trained users, and that newly trained users improve immediately after acquisition and practice but regress over a 1-week period without practice or review.

METHODS

Development of assessment tools

The Delphi process, described in detail below, was classified as a program evaluation activity by our institution's Office of Research Ethics and did not require ethics approval. The primary investigator generated an initial list of items describing the procedures for hand hygiene, as well as donning and doffing PPE for routine practices, from 3 sources: (1) the Infection Prevention and Control Core Competency Education module in Routine Practices developed by the Ontario Ministry of Health and Long-Term Care,11 (2) federal and provincial guidelines for IPC in acute and long-term care facilities, and (3) published academic and nonacademic literature, media, and online forums on the topic of PPE use. The initial list included 27 technique-specific checklist items and 3 rating scale items describing global or holistic performance measures. Two local IPC experts (the advisory committee) reviewed the list, which was then uploaded to SurveyMonkey (http://www.surveymonkey.com). Delphi panelists were recruited via electronic mail from the following groups in Canada: IPC instructors in teaching hospitals or universities, allied health or medical professionals who regularly practice IPC, leaders in provincial and/or national health policy development regarding IPC, and authors of relevant articles in peer-reviewed journals.

Item evaluation and analysis of Delphi data

Thirty Delphi panelists responded to the first round, 25 responded to the second round, and 23 responded to the third round (76.7% overall retention). The respondents were quite experienced and represented a range of IPC perspectives: 19 (63%) had been practicing IPC for >10 years, 13 (43%) were involved in clinical practice, 25 (83%) were in education, 15 (50%) were involved in policy making, and 16 (53%) were in research.
In each Delphi round, the panelists rated and commented on the checklist and global rating items, identified any additional performance indicators for PPE use, and revised any items that were ambiguous or inadequate. For each checklist and global rating item, participants used a 5-point Likert scale (1, completely unimportant, to 5, extremely important) to rate the importance of that item for assessing PPE skills. In the second round, the results were resent to the group, with the items that had achieved consensus highlighted and the group median, mode, and range of responses provided. Each panelist received a file outlining his or her response to each item in the previous round. The panelists then rerated the items, after which the new responses were recorded. This process was repeated for a third round.

The results were analyzed using median and mode responses to determine which items achieved positive or negative consensus.12 Positive consensus was defined as 80% of respondents choosing 4 (somewhat important) or 5 (extremely important), and negative consensus was defined as 80% of respondents choosing 1 (completely unimportant) or 2 (somewhat unimportant). Eight checklist items and all 3 global rating items were accepted in round one. Panelists requested a separate checklist section to assess hand hygiene, 3 new global rating items, and a pass/fail assessment item. They also expressed concerns about the consistency of results that would be generated from global rating evaluation, as well as the quality and usefulness of the feedback that users would receive. At the end of round three, 43 checklist items and 6 global rating items had been accepted. The final assessment tools are summarized in Figure 1 (a copy of the tools is available on request from the corresponding author).

Cronbach's α, providing an estimated reliability of the sum of the panelists' responses,10 was calculated to measure the group's consistency for each round. Alpha values >0.7 are adequate for research purposes, whereas values >0.9 are required for clinical applications.13 Cronbach's α values across all 3 rounds ranged from 0.82 to 0.99 (Table 1). There was no α value for hand hygiene in round one or for global rating in round three, because hand hygiene was introduced as a separate section in round two and all global rating items achieved consensus at the end of round two.

Scoring

The scoring system gave credit for the selection, donning, and doffing of each item of PPE, as well as for the sequence in which multiple items were donned and doffed. For all sections of the TAPS, tasks that are not relevant or not assessed because of the test scenario or environment are marked "not applicable" (NA). For the hand hygiene, donning, and doffing checklists, 1 point is awarded for each task done correctly and 0 points for each task not done or done incorrectly (dichotomous scoring). The total hand hygiene score is the sum of all points awarded. The donning checklist also includes a score for PPE item selection, with 5 points added for each required PPE item selected and 5 points subtracted for each required PPE item not selected. Both the donning and doffing checklists also include a sequence score awarded for donning or doffing all required items in the correct order, with 5 points awarded for each required PPE item in a perfect sequence. If there are any errors in sequence, 0 points are awarded for the sequence score. The total donning score is the sum of the dichotomous checklist scores, the selection score, and the donning sequence score. The total doffing score is the sum of the dichotomous checklist scores and the doffing sequence score. For each item on the global rating scale, the participant receives NA or a numerical score on a Likert scale of 1-5. The total global rating score is the sum of the numerical scores.
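To make the scoring arithmetic concrete, the following Python sketch implements the rules as we read them. The function names and data structures are our own illustrative choices, not part of the published TAPS:

```python
from typing import Optional

def checklist_score(tasks: list[Optional[bool]]) -> int:
    """1 point per task done correctly; 0 for incorrect/omitted; NA (None) ignored."""
    return sum(1 for done in tasks if done is True)

def selection_score(required: set[str], selected: set[str]) -> int:
    """+5 for each required PPE item selected, -5 for each required item missed."""
    return 5 * len(required & selected) - 5 * len(required - selected)

def sequence_score(required_order: list[str], observed_order: list[str]) -> int:
    """5 points per required item if the entire sequence is correct, else 0."""
    return 5 * len(required_order) if observed_order == required_order else 0

def total_donning(tasks, required, selected, required_order, observed_order) -> int:
    """Total donning score: checklist points + selection score + sequence score."""
    return (checklist_score(tasks)
            + selection_score(required, selected)
            + sequence_score(required_order, observed_order))

# Example: one checklist error, all required items selected, perfect sequence
tasks = [True, True, False, True, None]  # None = not applicable
order = ["gown", "mask", "eye protection", "gloves"]
print(total_donning(tasks, set(order), set(order), order, order))  # 3 + 20 + 20 = 43
```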
Validation of assessment tools

Our institution's Research Ethics Board approved the validation protocol, and all participants provided voluntary informed consent before participating, in accordance with the guidelines set out by the 1964 Declaration of Helsinki and our institution's Office of Research Ethics.


Fig 1. Summary of the Tools for Assessment of Personal Protective Equipment Skills, showing major sections with the number of assessment items in parentheses.

Table 1. Cronbach's α values for each section and iteration of the Delphi survey

                Hand hygiene    Donning    Doffing    Global rating
Round one       NA              0.88       0.82       0.94
Round two       0.89            0.98       0.96       0.93
Round three     0.96            0.99       0.97       NA

NA, not applicable.
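As an illustration of the two quantitative checks behind the Delphi analysis and Table 1, here is a minimal Python sketch, written for this discussion rather than taken from the study, of the 80% consensus rule and the standard Cronbach's α formula applied to a panelists-by-items rating matrix (the panel data shown are synthetic):

```python
import numpy as np

def positive_consensus(item_ratings, threshold: float = 0.80) -> bool:
    """True if at least 80% of panelists rated the item 4 or 5 on the 5-point scale."""
    ratings = np.asarray(item_ratings)
    return np.mean(ratings >= 4) >= threshold

def cronbach_alpha(ratings) -> float:
    """Cronbach's alpha for a panelists (rows) x items (columns) rating matrix."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                           # number of items
    item_vars = ratings.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = ratings.sum(axis=1).var(ddof=1)    # variance of panelists' totals
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Synthetic example: 5 panelists rating 4 items
panel = np.array([[5, 4, 5, 4],
                  [4, 4, 5, 3],
                  [5, 5, 4, 4],
                  [4, 3, 5, 4],
                  [5, 4, 4, 5]])
print([positive_consensus(panel[:, j]) for j in range(panel.shape[1])])
print(round(cronbach_alpha(panel), 2))
```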

Data collection

Twenty-nine individuals with no or minimal experience in health care delivery (the newly trained group) underwent a Web-based training session comprising the Ontario Ministry of Health and Long-Term Care Core Competency Education modules in hand hygiene and routine practices (maximum, 40 minutes).11 After this training, the newly trained participants responded to a mock clinical scenario (baseline test), and their performance was videotaped. The participants selected and demonstrated donning and doffing of PPE for the clinical activities described while explaining their choices and actions. They then had an additional 20 minutes to obtain feedback from the investigator, review the material, and/or practice using the PPE independently. A second mock scenario (immediate transfer test), different from the baseline scenario, was then administered, and participants returned 1 week later for a delayed transfer test (using the same scenario as in the immediate transfer test). Eleven individuals with moderate to extensive experience using PPE in health care (the experienced group) had the option to review the training modules (maximum, 40 minutes) before completing the baseline test but did not participate in the review/practice session, the immediate transfer test, or the delayed transfer test.

The principal investigator trained 2 IPC experts (infection control practitioners) in the use of the assessment tools. Blinded to the users' levels of experience, these experts reviewed the randomized videotaped performances and evaluated each performance using the TAPS.

Data analysis

Interobserver reliability (the consistency of measurements between independent observers14) was evaluated by comparing baseline scores for the same performance from the 2 expert observers using a single-measure intraclass correlation coefficient (ICC) with a 95% confidence interval; an ICC >0.75 was considered acceptable. Construct validity refers to a tool's ability to measure the intended construct and can be inferred if the tool is able to discriminate among different levels of trainees or skill.15 This comparison was done by subjecting baseline scores on each section of the TAPS to a one-way analysis of variance (ANOVA) with level of experience (experienced or newly trained) as a between-subject variable. Responsiveness or sensitivity refers to a tool's ability to reflect important change in the construct being evaluated, in this case the skills of users over time.16 Repeated-measures ANOVA was performed with time of test (baseline, immediate transfer, or delayed transfer) as a within-subject variable for the newly trained users' hand hygiene and global rating scores.
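The reliability and validity analyses can be sketched in code. The study itself used SPSS; the Python sketch below, using the pingouin and scipy libraries on synthetic scores, is only an illustrative analogue of the single-measure ICC and one-way ANOVA described above:

```python
import numpy as np
import pandas as pd
import pingouin as pg
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 40 videotaped performances
# (29 newly trained + 11 experienced), each scored by 2 blinded observers.
true_skill = np.concatenate([rng.normal(16, 3, 29), rng.normal(19, 3, 11)])
long = pd.DataFrame({
    "performance": np.tile(np.arange(40), 2),
    "observer": np.repeat(["A", "B"], 40),
    "score": np.concatenate([true_skill + rng.normal(0, 1, 40),
                             true_skill + rng.normal(0, 1, 40)]),
})

# Interobserver reliability: intraclass correlation with 95% CI
# (pingouin reports single-measure and average-measure ICC forms)
icc = pg.intraclass_corr(data=long, targets="performance",
                         raters="observer", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

# Construct validity: one-way ANOVA on averaged scores by experience level
avg = long.groupby("performance")["score"].mean().to_numpy()
F, p = stats.f_oneway(avg[:29], avg[29:])
print(f"F = {F:.2f}, p = {p:.3f}")
```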

Fig 2. Mean scores (with standard error bars) for newly trained and experienced users at baseline testing showing a statistically significant difference between groups for the global rating (as determined by one-way ANOVA). The triangles indicate the maximum possible scores for each category.

Only the scores from the hand hygiene and global rating sections were compared, because unlike the donning and doffing sections, these sections are not directly related to the specific scenario used for testing. The scenario used in both transfer tests differed from that used for the baseline test, which required fewer PPE items and would inevitably produce lower scores for donning and doffing. In addition, scores from only 21 participants were used in the analysis because of attrition and technical difficulties. Effects were considered significant at P < .05, and post hoc comparisons were analyzed using Bonferroni correction with SPSS 17.0 (IBM, Armonk, NY).

RESULTS

Before the baseline test, users from the newly trained group used the tutorial for a longer period of time (mean ± SEM, 38 ± 1 minutes) than experienced users (13 ± 3 minutes). Reliability analysis revealed that scores provided by both observers were in agreement (ICC = 0.92 for both consistency and absolute agreement interpretations of the data). Thus, average interobserver scores were used for all subsequent analyses.

Analysis of between-group performance revealed no significant difference between the groups for hand hygiene (P = .108), donning (P = .649), or doffing (P = .319). However, experienced users scored significantly higher than newly trained users on the global rating [F(1,38) = 6.82; P = .013; ω² = .13] (Fig 2). Kirk has suggested that ω² values of 0.01, 0.06, and 0.14 represent small, medium, and large effect sizes, respectively.17

Analysis of differences for time of test (Fig 3) revealed no significant difference among scores for the 3 tests on the hand hygiene checklist (P = .099). For the global rating time-of-test analysis, Mauchly's test indicated that the assumption of sphericity had been violated [χ²(2) = 7.04; P = .03]; therefore, degrees of freedom were corrected using Huynh-Feldt estimates of sphericity (ε > 0.75). The corrected results showed that the global ratings of the 3 tests were significantly different, with a medium effect size [F(2,19) = 5.20; P = .016; ω² = 0.08], and post hoc tests revealed that baseline scores were significantly lower than delayed transfer scores (P = .006), but that there was no difference between immediate transfer and delayed transfer scores.
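The reported effect sizes can be recovered from the test statistics themselves. Assuming the standard formulas ω² = df_b(F − 1)/(df_b(F − 1) + N) for a one-way ANOVA and r = √(t²/(t² + df)) for a t-test (the latter used for the comparison reported in the next section), a quick check reproduces the published values:

```python
import math

def omega_squared(F: float, df_between: int, n_total: int) -> float:
    """Omega-squared effect size from a one-way ANOVA F statistic."""
    num = df_between * (F - 1)
    return num / (num + n_total)

def r_from_t(t: float, df: float) -> float:
    """Effect size r from a t statistic: r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))

# Global rating, experienced vs newly trained: F(1,38) = 6.82, N = 40
print(round(omega_squared(6.82, 1, 40), 2))  # 0.13, as reported

# Experienced baseline vs newly trained delayed transfer: t(13.61) = 1.89
print(round(r_from_t(1.89, 13.61), 2))       # 0.46, as reported
```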


Fig 3. Mean scores (and standard error bars) for newly trained users at 3 test points showing a statistically significant difference between baseline and delayed transfer for the global rating (as determined by repeated-measures ANOVA with Bonferroni adjustments of post hoc comparisons). The triangles indicate the maximum possible scores for each category.

Using an independent t-test (equal variances not assumed), we found no significant difference in global rating scores between experienced users at baseline and newly trained users at delayed transfer (18.7 ± 1.3 vs 16.2 ± 0.5; t(13.61) = 1.89; P = .081; r = 0.46). Based on Cohen's 1988 and 1992 studies,17 r = 0.5 represents a large effect size. Post hoc power analysis18 using G*Power 3.1 revealed >80% power, indicating that the experiment was sufficiently powered to detect any effects that might exist.

DISCUSSION

Using the Delphi consensus technique, we successfully engaged a sample of IPC professionals to identify the most relevant items for evaluating health care workers' performance in selecting, donning, and doffing PPE. Retention of participants throughout the Delphi process was moderate, and the internal consistency of the group was moderate to high. The items on which panelists achieved consensus included both technique-specific checklist items, which are scored dichotomously, and holistic global rating items, which are scored on a 5-point Likert scale anchored by 3 descriptive phrases. The checklist and global rating items were organized into a modular assessment tool that can be adapted to suit the context of the educational/assessment program.

Interobserver reliability analysis suggests that multiple independent observers can use the TAPS to achieve similar scores for the same performance. Although this is an important finding, future studies are needed to assess test-retest reliability and intraobserver reliability.

Interestingly, the results of the construct validity analysis showed that the checklist sections of the TAPS (hand hygiene and donning and doffing of PPE) were not able to differentiate between newly trained and experienced users, even though the donning and doffing sections showed a trend toward higher scores for the experienced users. In contrast, the global rating demonstrated a significant difference between newly trained and experienced users, with a large effect size. We propose that several factors may be responsible for this result, including characteristics of the skill, the testing procedure, the participants, and the assessment tool.

First, the nature of our testing environment might not have been equally conducive to validating the checklist and global rating sections of the TAPS. Use of PPE may be described as an open skill: one that occurs within a dynamic and perhaps unpredictable environment.


However, we created an artificially closed environment that, without interaction with patients or other health care workers or the need to perform a clinical activity, was stable and predictable. This likely allowed newly trained users to focus on each component of the skill separately and thus score well on the technique-specific checklist sections of the assessment. Furthermore, testing newly trained users immediately after completion of the tutorial for skill acquisition might have created a situation in which improved performance represented a transient practice effect instead of a relatively permanent improvement in ability (ie, learning19). This practice effect highlights the purpose of transfer tests: to avoid drawing conclusions about learning based on postacquisition performance.

Second, the differences in skill between the 2 participant groups were not immense; experienced users were not IPC experts, newly trained users were not true novices, and users in both groups reported a range of experiences. Obviously, 2 distinctly different user groups would increase the chances of obtaining significant between-group differences; however, we believe that this scenario would not reflect the skill levels of those most likely to be evaluated using the TAPS.

Finally, previous studies of assessment of medical competence have suggested that checklists might not be ideal for differentiating between novices and experts, because experts perform at a more automated level that is not easily broken down into component steps.20 These component steps tend to characterize novice performance and are the basis for checklist assessments. In contrast, appropriate performance, as judged by holistic or global indicators, requires practice and experience. We believe that the TAPS global rating scale is able to capture this in the performance of the experienced users and thus provides a better differentiation of skill level in our testing scenarios.

Although we have discussed validity as a property of the TAPS, others have suggested that discussing it as a property of the application of the test is more productive.21,22 In fact, validity is highly contextual, and as such, our tool might or might not prove valid in other environments with different participants using different testing procedures. Further testing in other contexts, particularly those including a real or simulated patient, would allow validation of the entire global rating scale as well as address the utility of the TAPS for providing effective feedback to the participants being assessed. Indeed, the ability of the TAPS global rating scale to detect differences despite the aforementioned limitations suggests that it might provide more meaningful performance data, as well as feedback on higher-order decision-making skills that can help users deepen their understanding of infection control procedures and principles.

The responsiveness of the TAPS was inferred from the differences in performance of newly trained users on the baseline, immediate transfer, and delayed transfer tests. However, the scenario used for each test was a confounding factor in this analysis, hindering comparisons of user performance in real-world applications.
As mentioned earlier, the scenario was changed between the baseline and immediate transfer tests to prevent participants from relying on simple recollection of previous performance and feedback instead of skill acquisition and understanding, and to emphasize the fact that the use of PPE is a procedural skill that encompasses the ability to assess any given situation and respond appropriately. Given that the scores for the TAPS hand hygiene checklist and global rating scale did not vary directly by scenario due to the number of test items, we chose to compare performances across test periods/scenarios using only these sections. However, trainers could also use percentage-of-maximum scores to compare scores on all sections within and between testing scenarios and participants. In fact, we designed the tools modularly to support such adaptations in use.
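For instance, such a percentage-of-maximum comparison could be implemented as simply as the following hypothetical sketch; the scenario maxima shown are made-up numbers for illustration only:

```python
def percent_of_max(score: float, max_score: float) -> float:
    """Express a TAPS section score as a percentage of the scenario's maximum."""
    return 100.0 * score / max_score

# Example: donning scores from two scenarios requiring different numbers of PPE items
print(percent_of_max(43, 55))  # scenario A
print(percent_of_max(30, 38))  # scenario B
```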


The analysis of responsiveness yielded results similar to those of the between-group analysis, with significant differences in global rating scale scores but not in hand hygiene checklist scores. Newly trained PPE users had higher delayed transfer scores than baseline scores. Comparison of newly trained users' delayed transfer scores and experienced users' baseline scores revealed that even though the experiment was adequately powered to detect an effect, there was no statistically significant difference between the groups. However, the effect size for this analysis was large, and thus future experiments are warranted to explore whether the between-group differences observed here are clinically significant for patient safety outcomes. Nonetheless, the trend in scores and the statistical analyses suggest that users benefited from practice and feedback and, contrary to our hypothesis, were able to consolidate their knowledge over a 1-week period and increase their performance scores between the immediate transfer and delayed transfer tests.

In summary, the global rating section of the TAPS demonstrated construct validity, distinguishing the performance of newly trained and experienced users, as well as responsiveness, detecting improvement in the performance of newly trained users after practice and a 1-week period. It is likely that newly trained users were not distinguishable from experienced users based on checklist items because these items are easier to recall immediately after skill acquisition, and because checklist items might not be ideal for evaluating experienced users. At 1 week after skill acquisition, newly trained users were able to achieve global rating scores similar to those of experienced users; however, further studies are needed to determine what scores accurately reflect clinically significant skill levels. Further validation testing with other clinical scenarios and IPC experts may serve to validate the utility of the checklist sections or confirm the present findings that the TAPS global rating scale is sufficient and efficient for assessment of PPE skills.

Acknowledgment

We thank Zahir Hirji and Dr Michael Gardam for lending their expertise as members of the Advisory Committee. We also thank the members of the Delphi panel: Chingiz Amirov, Mary Agnes Beduz, Frederic Bergeron, Dr Erika Bontovics, Candice Botelho, Laurie Boyer, Risa Cashmore, Jennette Coates, Marg Creen, Tim Cronsberry, Sarah Eden, Jim Gauthier, Dr Elizabeth Henderson, Dr B. Lynn Johnston, Linda Kingsbury, Debbie Lam-Li, Dr Allison McGeer, Charlene McMahon, Dr Donna Moralejo, Dr Kathryn Nichol, Ramona Rodrigues, Suzanne Rowland, Denise Sorel, Dr Geoffrey D. Taylor, Victoria Williams, Samantha Woolsey, and 4 other panelists who wish to remain anonymous.

References

1. Jenner EA, Wilson JA. Educating the infection control team: past, present and future. A British perspective. J Hosp Infect 2000;46:96-105.
2. Carrico RM, Rebmann T, English JF, Mackey J, Cronin SN. Infection prevention and control competencies for hospital-based health care personnel. Am J Infect Control 2008;36:691-701.
3. Zoutman DE, Ford BD, Bryce E, Goudeau M, Hebert G, Henderson E, et al. The state of infection surveillance and control in Canadian acute care hospitals. Am J Infect Control 2003;31:266-72.
4. Pratt RJ, Pellowe CM, Shelley J, Adams J, Loveday HP, King D, et al. Using a blended e-learning model to provide accessible infection prevention and control training for NHS staff: the NHSU/TVU/Intuition approach. Br J Infect Control 2005;6:16-8.
5. Hambraeus A. Lowbury lecture 2005: infection control from a global perspective. J Hosp Infect 2006;64:217-23.
6. McKinley RK, Strand J, Ward L, Gray T, Alun-Jones T, Miller H. Checklists for assessment and certification of clinical procedural skills omit essential competencies: a systematic review. Med Educ 2008;42:338-49.
7. Bryce EA, Scharf S, Walker M, Walsh A. The infection control audit: the standardized audit as a tool for change. Am J Infect Control 2007;35:271-83.
8. Millward S, Barnett J, Thomlinson D. A clinical infection control audit programme: evaluation of an audit tool used by infection control nurses to monitor standards and assess effective staff training. J Hosp Infect 1993;24:219-32.
9. de Villiers MR, de Villiers PJT, Kent AP. The Delphi technique in health sciences education research. Med Teach 2005;27:639-43.
10. Graham B, Regehr G, Wright JG. Delphi as a method to establish consensus for diagnostic criteria. J Clin Epidemiol 2003;56:1150-6.
11. Ontario Ministry of Health and Long-Term Care. Infection prevention and control core competency education modules. Available from: http://www.health.gov.on.ca/english/providers/program/infectious/infect_prevent/ipccce_ref.html. Accessed January 21, 2009.
12. Svensson E. Guidelines to statistical evaluation of data from rating scales and questionnaires. J Rehab Med 2001;33:47-8.
13. Bland JM, Altman DG. Cronbach's alpha (statistics notes). BMJ 1997;314:572.
14. Thorndike R. Reliability. In: Haertel GD, Walberg HJ, editors. The international encyclopedia of educational evaluation. New York, NY: Pergamon Press; 1990. p. 260-73.
15. Zeller R. Validity. In: Haertel GD, Walberg HJ, editors. The international encyclopedia of educational evaluation. New York, NY: Pergamon Press; 1990. p. 259.
16. Guyatt GH, Deyo RA, Charlson M, Levine MN, Mitchell A. Responsiveness and validity in health status measurement: a clarification. J Clin Epidemiol 1989;42:403-8.
17. Field AP. Discovering statistics using SPSS (and sex and drugs and rock 'n' roll). 3rd ed. London: Sage; 2009. p. 57, 390.
18. Faul F, Erdfelder E, Lang A, Buchner A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods 2007;39:175-91.
19. Guadagnoli MA, Lee TD. Challenge point: a framework for conceptualizing the effects of various practice conditions in motor learning. J Mot Behav 2004;36:212-24.
20. Hodges B, Regehr G, McNaughton N, Tiberius R, Hanson M. OSCE checklists do not capture increasing levels of expertise. Acad Med 1999;74:1129-34.
21. Howley LD. Performance assessment in medical education. Eval Health Prof 2004;27:285-303.
22. Hodges B. Validity and the OSCE. Med Teach 2003;25:250-4.
