Interrater agreement with a standard scheme for classifying medication errors

RYAN A. FORREY, CRAIG A. PEDERSEN, AND PHILIP J. SCHNEIDER
Medication errors are common, cause significant patient harm, and are costly to society. For nearly 50 years, problems with medication errors have been documented in the medical literature.1-3 With the publication of the Institute of Medicine’s To Err Is Human: Building a Safer Health System,4 patient safety became a greater focus in health care delivery. This led organizations to examine the causes of medication errors and to question the clinical significance of these common errors. In 1995, the National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) was created from 24 national health care organizations. One of the objectives of NCC MERP was to “develop standardization or classification systems for the collection of medication error reports so that databases will reflect reports and grading systems.”5
Purpose. The interrater agreement for and reliability of the National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) index for categorizing medication errors were determined.
Methods. A letter was sent by the U.S. Pharmacopeia to all 550 contacts in the MEDMARX system user database. Participants were asked to categorize 27 medication scenarios using the NCC MERP index and were randomly assigned to one of three tools (the index alone, a paper-based algorithm, or a computer-based algorithm) to assist in categorization. Because the NCC MERP index accounts for harm and cost, and because categories could be interpreted as substantially similar, study results were also analyzed after the nine error categories were collapsed to six. The interrater agreement was measured using Cohen’s kappa value.
Results. Of 119 positive responses, 101 completed surveys were returned for a response rate of 85%. There were no significant differences in baseline demographics among the three groups.
RYAN A. FORREY, PHARM.D., M.S., is Assistant Director, Department of Pharmacy, The Ohio State University Medical Center (OSUMC), Columbus; at the time of the study he was a Resident in Pharmacy Practice Management at OSUMC and a graduate student in health-system pharmacy administration at The Ohio State University (OSU), Columbus. CRAIG A. PEDERSEN, PH.D., FAPHA, is Associate Professor and Director of Graduate Studies, Division of Pharmacy Practice and Administration, College of Pharmacy; and PHILIP J. SCHNEIDER, M.S., FASHP, is Clinical Professor and Director, Latiolais Leadership Program, College of Pharmacy, OSU. Address correspondence to Dr. Pedersen at the College of Pharmacy, The Ohio State University, 500 West 12th Avenue, Columbus, OH 43210-1291 ([email protected]).
The overall interrater agreement for the participants, regardless of group assignment, was substantial at 0.61 (95% confidence interval [CI], 0.41–0.81). There was no difference among the kappa values of the three study groups and the tools used to aid in medication error classification. When the index was condensed from nine categories to six, the interrater agreement increased, with a kappa value of 0.74 (95% CI, 0.56–0.90).
Conclusion. Overall interrater agreement for the NCC MERP index for categorizing medication errors was substantial. The tool provided to assist with categorization did not influence overall categorization. Further refining of the scale could improve the usefulness and validity of medication error categorization.
Index terms: Classification; Data collection; Errors, medication; Methodology; National Coordinating Council for Medication Error Reporting and Prevention; Reports
Am J Health-Syst Pharm. 2007; 64:175-81
The assistance of Rodney Hicks, R.N., M.S.N., M.P.A., and Amy Lehman, M.S., is acknowledged. Funded by a grant from the U.S. Pharmacopeia. Presented in part at the National Patient Safety Foundation Annual Patient Safety Congress, Orlando, FL, May 4–6, 2005, and at the ASHP Midyear Clinical Meeting, New Orleans, LA, December 7, 2004. Copyright © 2007, American Society of Health-System Pharmacists, Inc. All rights reserved. 1079-2082/07/0102-0175$06.00. DOI 10.2146/ajhp060109
In 1995, NCC MERP began work to create a standard taxonomy for medication errors and an index for categorizing medication errors. This index was completed in 1996 and was most recently revised in 2001.6 The current NCC MERP index for categorizing medication errors includes nine categories of medication errors based on severity and patient outcomes (Figure 1).7 The index includes harm and cost factors, such as increased length of stay. To facilitate error-index categorization, an algorithm was created (Figure 2).8 The algorithm guides users through a series of questions designed to help reporters assign the appropriate error category to the event. This tool is intended to reduce variability in the interpretation of index categories and to increase the consistency with which error categories are assigned. Despite the availability of this tool, medication errors may still be miscategorized using the NCC MERP index.

To date, the NCC MERP index has not been validated for interrater agreement and reliability. Despite the lack of validation, use of the NCC MERP index is widespread. Many national and local organizations and health care institutions use the NCC MERP scale.9 The U.S. Pharmacopeia (USP) uses the NCC MERP scale in its medication error reporting system, MEDMARX. Because of the standardized format of MEDMARX, based on the NCC MERP index, the system allows sharing of information and comparisons across more than 500 participating facilities.9 To further minimize variability and facilitate consistent error-index classification, USP also developed a computer-based algorithm based on the paper-based algorithm published by NCC MERP. However, without validation of the NCC MERP index at the core of the system, the reliability of the information obtained from the more than 950,000 records may be questioned. Therefore, this study was designed to assess the reliability and validity of the NCC MERP index for categorizing medication errors.

Methods

A letter was sent by USP to all 550 contacts in the MEDMARX system user database explaining the NCC MERP index for categorizing medication errors and asking for participation in the study. Those who chose to participate responded via fax or e-mail to USP or directly to the study investigators with their contact information. All responses provided to USP were forwarded to the investigators, who at no time had access to the MEDMARX subscriber database.
Figure 1. Index for categorizing medication errors developed by the National Coordinating Council for Medication Error Reporting and Prevention. © 2001, National Coordinating Council for Medication Error Reporting and Prevention. All rights reserved.
Figure 2. Algorithm developed by the National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) for applying the NCC MERP index for categorizing medication errors. © 2001, National Coordinating Council for Medication Error Reporting and Prevention. All rights reserved.
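For readers who think of such decision aids in programmatic terms, the sketch below is a rough, hypothetical rendering of the kind of question flow that Figure 2 represents. The function name, parameter names, and branching order are illustrative assumptions only; they paraphrase the nine NCC MERP severity definitions rather than reproduce the algorithm's actual wording, and the published index and algorithm remain the authoritative references.

```python
# Illustrative sketch only: a simplified, hypothetical approximation of a
# question-driven severity categorization in the spirit of Figure 2. It is not
# the NCC MERP algorithm itself; consult Figures 1 and 2 for the authoritative
# definitions of categories A-I.

def categorize_error(error_occurred: bool,
                     reached_patient: bool,
                     required_monitoring_or_intervention: bool,
                     caused_harm: bool,
                     harm_was_temporary: bool,
                     required_hospitalization: bool,
                     required_lifesaving_intervention: bool,
                     resulted_in_death: bool) -> str:
    """Return an approximate severity category (A-I) for a medication error report."""
    if not error_occurred:
        return "A"  # circumstances or events with the capacity to cause error
    if not reached_patient:
        return "B"  # an error occurred but did not reach the patient
    if not caused_harm:
        # Reached the patient without harm; monitoring or intervention to
        # preclude harm distinguishes D from C.
        return "D" if required_monitoring_or_intervention else "C"
    if resulted_in_death:
        return "I"
    if required_lifesaving_intervention:
        return "H"
    if not harm_was_temporary:
        return "G"  # permanent patient harm
    # Temporary harm: hospitalization distinguishes F from E.
    return "F" if required_hospitalization else "E"
```

Under this sketch, for example, a wrong-dose error that reached the patient and required additional monitoring but caused no harm would map to category D.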
The study was approved by The Ohio State University institutional review board.

Each participant was sent a booklet containing 27 medication error scenarios. These scenarios were de-identified actual medication errors reported to USP through the MEDMARX system. The scenarios were selected by an expert panel to provide three scenarios from each of the nine NCC MERP medication error categories, but study participants were not told of this distribution. The expert panel consisted of three users of the NCC MERP index who had at least five years’ experience rating medication errors and who worked at USP. The error classification for each scenario determined by the expert panel served as the gold standard; the panel reached consensus on the classification of every scenario. Each study group (described below) was compared with this gold standard to assess the accuracy of error classification by the study participants.

Participants were randomly assigned to one of three study groups (1, 2, and 3) and asked to evaluate and assign an NCC MERP medication error category to each of the 27 scenarios. Each group was provided with and asked to use a different tool to aid in evaluation and assignment of the NCC MERP error category. Group 1 evaluated the scenarios using only a printed listing of the NCC MERP index for categorizing medication errors as shown in Figure 1. Group 2 evaluated the scenarios using the printed algorithm shown in Figure 2. Group 3 evaluated the scenarios using a computer-based, interactive algorithm program developed by USP from the paper-based algorithm published by NCC MERP. Participants in group 3 accessed the computer-based algorithm on a password-protected website.
Each participant was asked to assign an error category rating to the medication error described by each standardized scenario and to record it on the documentation sheets provided in the booklet. The booklet was then returned to the researchers in a prestamped return envelope.

Statistical analyses

Enrollment of at least 51 current users of the NCC MERP index (17 in each group) was needed for adequate statistical power (power = 0.80). Additional participants were allowed to enroll to gain greater statistical power and prevent experimental bias. The error category classifications were analyzed using the kappa statistic as a measure of agreement.10-12 One rater was chosen from each of the three scoring groups, and kappa statistics were calculated to assess the agreement among these three reviewers. This process was repeated for all possible combinations of reviewers (one rater from each of the three groups). Kappa statistics were used to score agreement within each error category (A–I) and to calculate overall agreement for the entire NCC MERP index for categorizing medication errors. A standard kappa statistic interpretation was used.12 Kappa values from 0.80 to 1.00 were considered “almost perfect” agreement; values between 0.60 and 0.80 were considered “substantial,” between 0.40 and 0.60 “moderate,” between 0.20 and 0.40 “fair,” and between 0.00 and 0.20 “slight”; kappa values less than 0 were considered “poor.”

The accuracy of the raters was determined by comparing each of their scores with the gold standard score for a given medication error scenario. The overall percentage of responses matching the gold standard ratings was compared among the three groups using the Kruskal–Wallis test.
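As a rough illustration of the agreement analysis just described, the following sketch computes pairwise Cohen's kappa values for a set of raters, applies the interpretation bands used in this study, and encodes the category-collapse mapping used in the six-category reanalysis described below. The data layout, the use of scikit-learn and SciPy, and the averaging of pairwise kappas as a multi-rater summary are assumptions for illustration; this is not the study's actual analysis code.

```python
# Illustrative sketch, not the study's analysis code. Assumes each rater's
# categorizations are stored as a list of NCC MERP categories ("A"-"I"),
# one entry per scenario, in the same scenario order for every rater.
from itertools import combinations
from scipy.stats import kruskal                 # accuracy comparison across groups
from sklearn.metrics import cohen_kappa_score   # pairwise kappa

def interpret_kappa(k: float) -> str:
    """Interpretation bands used in the study (Landis-Koch style)."""
    if k < 0.00:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"

def mean_pairwise_kappa(ratings_by_rater):
    """Average Cohen's kappa over all rater pairs (one simple multi-rater summary)."""
    kappas = [cohen_kappa_score(a, b) for a, b in combinations(ratings_by_rater, 2)]
    return sum(kappas) / len(kappas)

# Collapse mapping for the six-category reanalysis described below:
# C and D are merged, and E, F, and H are merged.
COLLAPSE = {"A": "A", "B": "B", "C": "C/D", "D": "C/D",
            "E": "E/F/H", "F": "E/F/H", "G": "G", "H": "E/F/H", "I": "I"}

def collapse(ratings):
    """Map nine-category ratings onto the collapsed six-category index."""
    return [COLLAPSE[r] for r in ratings]

# Accuracy (percentage of ratings matching the gold standard) can be compared
# across the three study groups with a Kruskal-Wallis test, e.g.:
#   kruskal(accuracy_group1, accuracy_group2, accuracy_group3)
```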
Because the NCC MERP index accounts for harm and cost, and because some categories could be interpreted as substantially similar, study results were reanalyzed after the nine error categories were collapsed to six. Categories E, F, and H were collapsed into one category, and categories C and D were collapsed into another. These categories were collapsed because the distinctions between them were considered ambiguous or the categories seemed substantially similar. The overall kappa and the kappa for each category were recalculated for the revised six-category index. The accuracy in coding was also recalculated for each group and for all study participants using the revised error categories. The a priori level of significance was 0.05.

Results

The study population was 119 NCC MERP index users who agreed to participate, with 101 (85%) returning the booklet fully or partially evaluated. Overall, 29 (29%) respondents were in group 1 (definitions only), 41 (41%) were in group 2 (paper-based algorithm), and 31 (31%) were in group 3 (computer-based algorithm). Demographic information for all respondents is shown in Table 1. There were no significant differences in baseline demographics among the three groups.

The overall interrater agreement for the participants, regardless of group assignment, was substantial at 0.61 (95% CI, 0.41–0.81) (Table 2). Category I had the highest interrater agreement (0.84), interpreted as “almost perfect” agreement. Categories A, D, F, and H had a “moderate” level of interrater agreement, and categories B, C, and G had a “substantial” level. Category E had the lowest interrater agreement (0.36), interpreted as “fair” agreement.

There was no difference among the study groups in overall interrater agreement. That is, all groups had
similar kappa values for the ratings of all 27 scenarios. Group 1, using the definitions only, had an overall kappa of 0.63 (95% CI, 0.32–0.91); group 2, using the paper-based algorithm, had a kappa of 0.60 (95% CI, 0.29–0.87); and group 3, using the computer-
Table 1.
Participant Demographic Information^a

Characteristic                              n (%)
Profession (n = 98)
   Nurse
   Pharmacist
   Other
Sex (n = 98)
   Male
Rating tool usually used (n = 97)
   Definition
   Algorithm
   Both
   Nothing
Hospital bed size (n = 96)