Inter-observer reliability and intra-observer reproducibility of the Weber classification of ankle fractures

I. A. Malek, B. Machani, A. M. Mevcha, N. H. Hyder
From Leighton Hospital, Crewe, England

I. A. Malek, MRCSEd, Senior House Officer
B. Machani, MRCS, Specialist Registrar
A. M. Mevcha, MBBS, Senior House Officer
N. H. Hyder, FRCS(Orth), Consultant Orthopaedic Surgeon
Leighton Hospital, Middlewich Road, Crewe, CW1 4QJ, UK.

Correspondence should be sent to Mr I. A. Malek at 47 Dorman Gardens, Oxford Road, Middlesborough, TS5 5DS, UK; e-mail: [email protected]

©2006 British Editorial Society of Bone and Joint Surgery
doi:10.1302/0301-620X.88B9.17954 $2.00

J Bone Joint Surg [Br] 2006;88-B:1204-6.
Received 27 March 2006; Accepted 8 May 2006


Our aim was to assess the reproducibility and the reliability of the Weber classification system for fractures of the ankle based on anteroposterior and lateral radiographs. Five observers with varying clinical experience reviewed 50 sets of blinded radiographs. The same observers reviewed the same radiographs again after an interval of four weeks. Inter- and intra-observer agreement was assessed from the proportion of agreement and the kappa coefficient. For inter-observer agreement, the mean kappa value was 0.61 (0.59 to 0.63) and the proportion of agreement was 78% (76% to 79%); for intra-observer agreement, the mean kappa value was 0.74 (0.39 to 0.86) with an observed agreement of 85% (60% to 93%). These results show that the Weber classification of fractures of the ankle, based on two radiological views, has substantial inter-observer reliability and intra-observer reproducibility.

Classification systems for fractures are often based on the mechanism of injury and are used to devise a plan of management or to predict prognosis. They give clarity to communication and facilitate comparison of published results. Several authors have attempted to evaluate the reliability and reproducibility of these classification systems,1-8 some of which have better reproducibility and reliability than others.1,3,8,9 Since Cohen10 established the kappa coefficient in 1960, it has been widely used to assess the level of agreement between observers. Weber11 classified fractures of the ankle into categories A (fracture of the fibula distal to the syndesmosis), B (fracture at the level of the syndesmosis) and C (fracture proximal to the syndesmosis). Thomsen et al9 showed acceptable inter- and intra-observer agreement for both the Weber and the Lauge-Hansen12 classification systems using the kappa coefficient. However, several studies have shown that the kappa coefficient alone is not sufficient to assess agreement between multiple observers.13-16 We have assessed the inter- and intra-observer agreement for the Weber classification using standard anteroposterior and lateral radiographs of the ankle, rather than the frontal, lateral and oblique views used in the study of Thomsen et al.9

Patients and Methods
Using the hospital coding system, 50 patients with ankle fractures were randomly selected. The first author (IAM) reviewed the radiographs, and those which did not show a definite fibular fracture were excluded. Patients who had special radiographs, such as stress views, or further imaging, such as CT, were also excluded. The selected radiographs were blinded and then reviewed by five observers with different levels of clinical experience. Two were consultants (including NHH) and three were orthopaedic registrars (including BM) in their second, fourth and sixth years of training. The observers were asked to classify the fractures according to the Weber classification. The first author (IAM) and a co-author (AMM) recorded the responses from the observers; to minimise bias, they did not take part in the process of classification. After an interval of four weeks, the observers were again asked to classify the same set of radiographs, which had been mixed in order to minimise any chance of recollection.
Statistical analysis. The proportion of agreement (observed agreement), the kappa statistics for agreement between multiple observers and those for intra-observer reproducibility were calculated using Statsdirect software (Statsdirect Ltd, Sale, United Kingdom).


The observed agreement is the proportion of cases for which the observers agree, defined as the number of occasions of complete agreement divided by the total number of occasions. The expected agreement is the probability that two observers will give the same response for any given patient by chance alone (chance agreement). The kappa coefficient is the agreement observed above and beyond that due to chance:

kappa = (observed agreement − expected agreement) / (1 − expected agreement)

A kappa value of 1.00 indicates perfect agreement and a value of 0.00 agreement equal to that expected by chance alone; a negative kappa value implies agreement worse than chance. Landis and Koch17 have characterised ranges of kappa values with respect to suggested degrees of agreement: a value of less than 0.00 suggests poor agreement, 0.00 to 0.20 slight agreement, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial and 0.81 to 1.00 almost perfect agreement.
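The calculation above can be illustrated with a short sketch. This is not the authors' Statsdirect analysis; the expected-agreement figure in the example is hypothetical, chosen only so that the result falls close to the overall kappa of 0.61 reported below.

```python
# Minimal sketch of the kappa calculation and the Landis and Koch bands
# described above (illustrative only; not the authors' Statsdirect analysis).

def kappa(observed_agreement: float, expected_agreement: float) -> float:
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    return (observed_agreement - expected_agreement) / (1.0 - expected_agreement)


def landis_koch(k: float) -> str:
    """Map a kappa value to the Landis and Koch descriptive bands."""
    if k < 0.00:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


# Observed agreement of 0.78 with a hypothetical expected agreement of 0.43
# gives kappa of roughly 0.61, i.e. "substantial" agreement.
print(landis_koch(kappa(0.78, 0.43)))
```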

Results

Inter-observer reliability. The proportion of agreement on the two occasions was 76% (64% to 84%) and 79% (64% to 90%), respectively, with an overall proportion of agreement of 78% (76% to 79%). The generalised kappa coefficient values for multiobserver agreement on the two occasions were:
1) kappa1 = 0.59 (SEM 0.045), 95% confidence interval (CI) 0.50 to 0.68, p < 0.00001;
2) kappa2 = 0.63 (SEM 0.049), 95% CI 0.53 to 0.73, p < 0.00001.
The mean generalised kappa value for multiobserver agreement was 0.61 (0.59 to 0.63), which implies substantial agreement. Kappa coefficients for category-specific agreement were: type A, 0.44; type B, 0.60; and type C, 0.67, indicating moderate, moderate and substantial agreement, respectively.
Intra-observer reproducibility. Intra-observer agreement was substantial, with a kappa coefficient of 0.74 (0.39 to 0.86) and a mean proportion of agreement of 85% (60% to 93%). Four of the five observers had almost perfect agreement; the exception was the junior orthopaedic registrar (observer 2; Table I).
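The paper does not specify which generalisation of kappa Statsdirect applies for multiple observers. As an illustration only, a Fleiss-type generalised kappa for five observers assigning Weber grades A, B or C could be computed as in the sketch below; the ratings in the example are invented and the function name is ours.

```python
# Illustrative sketch of a Fleiss-type generalised kappa for several observers
# (an assumption about the "generalised kappa" used; not the authors' code).
from collections import Counter

def fleiss_kappa(ratings, categories=("A", "B", "C")):
    """ratings: one list per case, each containing one Weber grade per observer."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    # n_ij: how many observers placed case i into category j
    counts = [[Counter(case)[c] for c in categories] for case in ratings]
    # overall proportion of all assignments falling into each category
    p_j = [sum(row[j] for row in counts) / (n_cases * n_raters)
           for j in range(len(categories))]
    # per-case pairwise agreement, then mean observed agreement
    p_i = [(sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_obs = sum(p_i) / n_cases
    p_exp = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

# Invented example: three radiographs, five observers each
print(fleiss_kappa([["B", "B", "B", "B", "A"],
                    ["C", "C", "C", "C", "C"],
                    ["A", "A", "B", "A", "A"]]))  # approximately 0.60
```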

Discussion
Thomsen et al9 assessed multiobserver agreement for the Weber11 and Lauge-Hansen12 classifications based on frontal, lateral and oblique views. Four observers reviewed 94 sets of radiographs at an interval of three months. For the Weber classification, the observers were asked to classify the radiographs into four categories: A, B, C and 'non-classifiable'. Inter-observer agreement was found to be acceptable, with observed agreement of 0.737 (73.7%) and 0.740 (74%) on the two occasions and kappa values of 0.58 and 0.56, respectively.9 The observed agreement for intra-observer variation was 0.732 to 0.856, with kappa values of 0.60 to 0.76 for the four observers.9


Table I. Intra-observer agreement for each observer

Observer   Kappa   SEM    95% CI*          Percentage of agreement   p value
1          0.80    0.12   0.56 to 1.00     90                        < 0.0001
2          0.39    0.09   0.21 to 0.58     60                        < 0.0001
3          0.86    0.12   0.61 to 1.00     93                        < 0.0001
4          0.86    0.11   0.63 to 1.00     92                        < 0.0001
5          0.80    0.11   0.56 to 1.00     89                        < 0.0001

* 95% CI, 95% confidence interval

Table II. Review of studies on the agreement of classification systems for fractures

Author/s                   Classification system*                           Kappa coefficient
Thomsen et al1             Garden's femoral neck                            0.39
Kreder et al2              AO: distal radius                                0.48
Siebenrock and Gerber3     AO: proximal humerus                             0.53
Siebenrock and Gerber3     Neer: proximal humerus                           0.40
Swiontkowski et al4        AO/OTA: pilon fracture                           0.57
Brumback and Jones5        Gustillo: compound fracture                      0.60
Sidor et al6               Neer: proximal humerus                           0.52
Smith et al7               Ficat: osteonecrosis of the head of the femur    0.46

* AO, Arbeitsgemeinschaft für Osteosynthesefragen; OTA, Orthopaedic Trauma Association

We feel that the addition of the fourth category, 'non-classifiable', affected the results inappropriately, since it is not part of the Weber classification. There were two reasons for repeating this study: first, to measure the effect on the level of agreement of using standard anteroposterior and lateral radiographs alone; and second, to eliminate the 'non-classifiable' category by selecting only ankles with a definite fibular fracture. In our study, inter-observer agreement was substantial, with observed agreement of 76% and 79% and kappa coefficients of 0.59 and 0.63 on the two occasions. For intra-observer agreement, overall agreement was 60% to 93% with kappa values of 0.39 to 0.86. Four observers had almost perfect agreement; the exception was the most junior orthopaedic registrar. Overall, these results were comparable to those of Thomsen et al.9 A review of the orthopaedic literature showed that these values are higher than those achieved by many other classification systems which have been assessed (Table II).
The validity of the kappa coefficient in assessing multiobserver agreement requires discussion. Several studies have described problems and paradoxes with the kappa coefficient.13-16 A low kappa value is sometimes possible despite high observed agreement, owing to an 'unfair' correction for chance agreement: when one fracture type dominates the series, the expected chance agreement is itself high, so kappa can remain low even when the observers agree on almost every case. In simple terms, a low kappa value does not necessarily mean that there is poor agreement. To overcome this phenomenon, Feinstein and Cicchetti13 suggested the use of observed agreement alone, without imposing the 'unfair' correction for chance associated with the kappa coefficient, when assessing multiobserver agreement.
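To make this paradox concrete, here is a small numerical sketch (not taken from the paper): two hypothetical observers classifying a series dominated by type B fractures agree on 93% of cases, yet kappa is only about 0.43, because the agreement expected by chance is itself very high.

```python
# Numerical illustration of the kappa paradox described above (hypothetical data).
def cohen_kappa(confusion):
    """confusion[i][j]: proportion of cases graded i by observer 1 and j by observer 2."""
    k = len(confusion)
    p_o = sum(confusion[i][i] for i in range(k))                       # observed agreement
    row = [sum(confusion[i]) for i in range(k)]                        # observer 1 marginals
    col = [sum(confusion[i][j] for i in range(k)) for j in range(k)]   # observer 2 marginals
    p_e = sum(row[i] * col[i] for i in range(k))                       # chance agreement
    return p_o, p_e, (p_o - p_e) / (1 - p_e)

# Series dominated by Weber type B fractures (rows/columns: A, B, C):
skewed = [[0.02, 0.03, 0.00],
          [0.03, 0.90, 0.01],
          [0.00, 0.00, 0.01]]
print(cohen_kappa(skewed))  # observed agreement 0.93, but kappa only about 0.43
```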



The reason for our inclusion of the kappa value was to allow comparison of our results with those of similar studies of other classification systems. The interpretation of the kappa value as indicating poor, moderate or strong agreement is also controversial, and two systems are in common use in the medical literature. The criteria of Landis and Koch17 have been described earlier. Svanholm et al18 used different criteria, stating that a kappa coefficient of greater than 0.75 suggests excellent agreement, while a value of less than 0.50 indicates poor agreement. We suggest that future studies testing the reliability and reproducibility of any classification system should report the observed agreement alongside the kappa coefficient, and should state which interpretative system has been used, so that unnecessary confusion is avoided.
We conclude that the Weber classification of fractures of the ankle based on standard anteroposterior and lateral radiographs has substantial inter-observer reliability and intra-observer reproducibility.

No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article.

References
1. Thomsen NO, Jensen CM, Skovgaard NP, et al. Observer variation in the radiographic classification of fractures of the neck of the femur using Garden's system. Int Orthop 1996;20:326-9.
2. Kreder HJ, Hanel DP, McKee M, et al. Consistency of AO fracture classification for the distal radius. J Bone Joint Surg [Br] 1996;78-B:726-31.

3. Siebenrock KA, Gerber C. The reproducibility of classification of fractures of the proximal end of the humerus. J Bone Joint Surg [Am] 1993;75-A:1751-5.
4. Swiontkowski MF, Sands AK, Agel J, et al. Interobserver variation in the AO/OTA fracture classification system for pilon fractures: is there a problem? J Orthop Trauma 1997;11:467-70.
5. Brumback RJ, Jones AL. Interobserver agreement in the classification of open fractures of the tibia: the results of a survey of two hundred and forty-five orthopaedic surgeons. J Bone Joint Surg [Am] 1994;75-A:1162-6.
6. Sidor ML, Zuckerman JD, Lyon T, et al. The Neer classification system for proximal humerus fractures: an assessment of interobserver reliability and intraobserver reproducibility. J Bone Joint Surg [Am] 1993;75-A:1745-50.
7. Smith SW, Meyer RA, Connor PM, Smith SE, Hanley EN Jr. Interobserver reliability and intraobserver reproducibility of the modified Ficat classification system of osteonecrosis of the femoral head. J Bone Joint Surg [Am] 1996;78-A:1702-6.
8. Lenke LG, Betz RR, Harms J, et al. Adolescent idiopathic scoliosis: a new classification to determine extent of spinal arthrodesis. J Bone Joint Surg [Am] 2001;83-A:1169-81.
9. Thomsen NO, Overgaard S, Olsen LH, Hansen H, Nielsen ST. Observer variation in the radiographic classification of ankle fractures. J Bone Joint Surg [Br] 1991;73-B:676-8.
10. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
11. Weber BG. Die Verletzungen des oberen Sprunggelenkes. Second ed. Berne: Verlag Hans Huber, 1972.
12. Yde J. The Lauge Hansen classification of malleolar fractures. Acta Orthop Scand 1980;51:181-92.
13. Feinstein AR, Cicchetti DV. High agreement but low kappa: I. the problems of two paradoxes. J Clin Epidemiol 1990;43:543-9.
14. Byrt T, Bishop J, Carlin JB. Bias, prevalence and kappa. J Clin Epidemiol 1993;46:423-9.
15. Zwick R. Another look at interrater agreement. Psychol Bull 1988;103:374-8.
16. Thomsen NO, Olsen LH, Nielsen ST. Kappa statistics in the assessment of observer variation: the significance of multiple observers classifying ankle fractures. J Orthop Sci 2002;7:163-6.
17. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
18. Svanholm H, Starklint H, Gundersen HJ, et al. Reproducibility of histomorphologic diagnoses with special reference to the kappa statistic. APMIS 1989;97:689-98.
