ANNOTATION
Health-care quality registers OUTCOME-ORIENTATED RANKING OF HOSPITALS IS UNRELIABLE
J. Ranstam, P. Wagner, O. Robertsson, L. Lidgren From Lund University Hospital, Lund, Sweden
J. Ranstam, PhD, CStat, Director P. Wagner, MSc, Biostatistician Swedish National Musculoskeletal Competence Centre (NKO) O. Robertsson, MD, PhD, Orthopaedic Surgeon L. Lidgren, MD, PhD, Honorary FRCS, Professor Department of Orthopaedics Lund University Hospital, SE22185 Lund, Sweden. Correspondence should be sent to Dr J. Ranstam; e-mail:
[email protected] ©2008 British Editorial Society of Bone and Joint Surgery doi:10.1302/0301-620X.90B12.21172 $2.00 J Bone Joint Surg [Br] 2008;90-B:1558-61.
Public disclosure of outcome-orientated ranking of hospitals is becoming increasingly popular and is routinely used by Swedish health-care authorities. Whereas uncertainty about an outcome is usually presented with 95% confidence intervals, rankings based on the same outcome are typically presented without any concern for bias or statistical precision. In order to study the effect of incomplete registration of re-operation on hospital ranking, we performed a simulation study using published data on the two-year risk of re-operation after total hip replacement. This showed that whereas minor registration incompleteness has little effect on the observed risk of revision, it can lead to major errors in the ranking of hospitals. We doubt whether a level of data entry sufficient to generate a correct ranking can be achieved, and recommend that when ranking hospitals, the uncertainties about data quality and random events should be clearly described as an integral part of the results.
An outcome-orientated ranking of hospitals is routinely published by the Swedish health-care authorities1 as a means to assess the quality of health-care provision. The rankings are based on figures from health-care quality registers. Whereas the uncertainty of outcome, such as the risk of re-operation, is usually presented with 95% confidence intervals, rankings based on the same outcome are typically presented without any concern for bias or statistical precision. The reason for this may be a belief that data from the national registers describe a finite population, rather than a random sample. Sampling errors do not exist for a finite population and confidence intervals are therefore inapplicable. Under these circumstances, ranking based on observed data can simply be accepted. We question the validity of this belief. First, it is not appropriate to assume a finite population when evaluating differences between hospitals, because no allowance is made for random variation. Events that occur randomly cannot be rationally interpreted. Second, all registers contain some registration errors and incomplete data; a finite population approach requires a complete data set.
To study the effect of incomplete registration of re-operations on hospital ranking, we simulated missing re-operations in a set of previously published data on the two-year risk of re-operation after total hip replacement (THR).2 Our hypothesis was that incomplete data affect ranking to a greater extent than has previously been realised.
Materials and Methods
The simulations used data published in a report on the two-year risk of revision after primary THR in 53 962 patients from 77 hospitals.2 The first nine hospitals and their observed numbers of primary and revision operations are shown in Table I. Monte Carlo simulation was used.3 This is a technique for analysing the propagation of uncertainty and can determine how the reliability and sensitivity of a system are affected by lack of data, random variation and error. In the first simulation it was assumed that 95% of revisions had been correctly identified. It was also assumed that missing revisions were distributed randomly among hospitals, in proportion to the number of primary operations. In each simulation cycle the missing revisions were randomly distributed among the hospitals, and the hospitals were ranked twice: once according to the proportion of observed revisions and once according to the proportion of total revisions (observed plus simulated missing revisions). In all, 5000 simulation cycles were performed to describe the distribution of the number of hospitals with erroneous rankings and of the maximum rank error. The median value of each simulated parameter was used as a point estimate and the 2.5 and
THE JOURNAL OF BONE AND JOINT SURGERY
Table I. Reoperations within two years per hospital (2003 to 2006)

Hospital                   Observed     Observed       Observed      Additional simulated   Total
                           number of    number of      number rank   reoperations           number rank
                           operations   reoperations
Karolinska/Huddinge        923          15             53            2                      50
Karolinska/Solna           1038         44             76            1                      76
Linköping                  447          6              48            0                      44
Lund                       394          13             69            0                      70
Malmö                      479          7              45            1                      47
Sahlgrenska/Sahlgrenska    781          12             49            1                      48
Sahlgrenska/Östra          458          2              15            0                      7
Umeå                       287          3              26            1                      29
Uppsala                    1110         36             70            0                      69
Total                      5917         138            -             6                      -
Fig. 1. Graph showing number of erroneous rankings (x-axis: number of erroneous rankings; y-axis: frequency).
Fig. 2. Graph showing maximum ranking errors (x-axis: maximum ranking error; y-axis: frequency).
97.5 percentiles as 95% confidence intervals (CI). We then performed further simulations using incompleteness rates ranging from 0% to 15%. We also calculated the adjusted risk estimate taking account of the incompleteness of the register. A re-operation risk estimate calculated from register data depends on the register’s degree of completeness: with incomplete registration of re-operations, the observed risk underestimates the true risk. If the level of completeness is known, however, the risk estimate can be adjusted by dividing the observed risk by the completeness. For example, with a level of completeness of 0.9 and an observed re-operation risk of 0.2, the adjusted risk estimate is 0.2/0.9 = 0.22. The R statistical package (R Foundation for Statistical Computing, Vienna, Austria) was used for the calculations.4
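The procedure described in this section can be illustrated with a minimal sketch. It is not the authors' program (which was written in R and used all 77 hospitals); it is a Python approximation restricted to the nine hospitals listed in Table I, so its output will not reproduce the published figures. The function names, the tie-breaking rule, the rounding rule for the number of missing re-operations and the random seed are all assumptions made for illustration.

```python
import random
import statistics

# (primary operations, observed reoperations) for the nine hospitals in Table I.
# The published simulation used all 77 hospitals; only these nine are listed.
HOSPITALS = {
    "Karolinska/Huddinge": (923, 15),
    "Karolinska/Solna": (1038, 44),
    "Linköping": (447, 6),
    "Lund": (394, 13),
    "Malmö": (479, 7),
    "Sahlgrenska/Sahlgrenska": (781, 12),
    "Sahlgrenska/Östra": (458, 2),
    "Umeå": (287, 3),
    "Uppsala": (1110, 36),
}


def adjusted_risk(observed_risk, completeness):
    """Observed risk divided by register completeness, e.g. 0.2 / 0.9."""
    return observed_risk / completeness


def rank(rates):
    """Rank 1 = lowest rate; ties are broken by list position (an assumption)."""
    order = sorted(range(len(rates)), key=lambda i: (rates[i], i))
    ranks = [0] * len(rates)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def simulate_cycle(ops, reops, completeness, rng):
    """One cycle: add missing reoperations, then compare the two rankings."""
    # Assumed rounding rule, derived from observed = completeness * true total.
    n_missing = round(sum(reops) * (1 - completeness) / completeness)
    extra = [0] * len(ops)
    # Each missing reoperation is assigned to a hospital with probability
    # proportional to that hospital's number of primary operations.
    for i in rng.choices(range(len(ops)), weights=ops, k=n_missing):
        extra[i] += 1
    observed = rank([r / n for r, n in zip(reops, ops)])
    total = rank([(r + e) / n for r, e, n in zip(reops, extra, ops)])
    errors = [abs(a - b) for a, b in zip(observed, total)]
    # Number of hospitals whose rank changed, and the largest rank shift.
    return sum(e > 0 for e in errors), max(errors)


def summarise(values):
    """Median plus approximate 2.5 and 97.5 percentiles of the simulated values."""
    s = sorted(values)
    lo = s[int(0.025 * (len(s) - 1))]
    hi = s[int(0.975 * (len(s) - 1))]
    return statistics.median(s), lo, hi


def run_simulation(completeness=0.95, n_cycles=5000, seed=2008):
    """Repeat the cycle and summarise both error measures."""
    rng = random.Random(seed)
    ops = [n for n, _ in HOSPITALS.values()]
    reops = [r for _, r in HOSPITALS.values()]
    cycles = [simulate_cycle(ops, reops, completeness, rng)
              for _ in range(n_cycles)]
    return {
        "erroneous_rankings": summarise([c[0] for c in cycles]),
        "max_rank_error": summarise([c[1] for c in cycles]),
    }


if __name__ == "__main__":
    print(round(adjusted_risk(0.2, 0.9), 2))
    print(run_simulation(completeness=0.95))
```

Calling `run_simulation` with completeness values between 0.85 and 1.0 would mimic the sweep over incompleteness rates of 0% to 15%. With only nine hospitals the ranks are far less crowded than among 77, so the sketch is likely to show fewer rank changes than the full analysis.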
Results
In the case of risk of re-operation, adjustment for register incompleteness is fairly straightforward. With randomly distributed and reasonably low incompleteness rates, and low risks for re-operation, register incompleteness has a negligible effect. For example, the average observed re-operation rate within two years, according to the register report,2 is 0.015. An estimated re-operation incompleteness of 5% results in an adjusted re-operation risk of 0.016.
By contrast, a revision incompleteness of 5% has major consequences for hospital ranking. The simulation study showed that 63 of 77 hospitals (95% CI 54 to 70) were incorrectly ranked (Fig. 1). Furthermore, the maximum rank error was 14 (95% CI 8 to 29), which cannot be considered a minor consequence (Fig. 2). When the simulation program was rerun with varying completeness rates, the incidence of incorrect ranking was substantial even with incompleteness rates as low as 1% to 2% (Fig. 3). The result was similar for maximum rank error (Fig. 4). In fact, if the incompleteness rate is 4% or more, it is of little importance whether it is 5%, 10% or 15%.
To investigate whether incorrect ranking was less of a problem with higher revision rates, i.e. with greater variance in the distribution of hospital-specific re-operation rates, the simulation program was rerun once again with the re-operation rate artificially increased tenfold. This indicated that incorrect ranking is related to re-operation rate, but even when the real re-operation rate is increased ten times, the error remains substantial (Figs 5 and 6).
VOL. 90-B, No. 12, DECEMBER 2008
Fig. 3. Graph showing register incompleteness and the number of erroneous rankings (x-axis: register incompleteness, 0 to 0.15; y-axis: number of erroneous rankings).
Fig. 4. Graph showing register incompleteness and maximum ranking error (x-axis: register incompleteness, 0 to 0.15; y-axis: maximum ranking error).
Fig. 5. Graph showing register incompleteness and the number of erroneous rankings with the revision rate increased ten times (x-axis: register incompleteness, 0 to 0.15; y-axis: number of erroneous rankings).
Fig. 6. Graph showing register incompleteness and maximum ranking error with the revision rate increased ten times (x-axis: register incompleteness, 0 to 0.15; y-axis: maximum ranking error).
Discussion
Performance measurement using ranking has previously been criticised on the grounds of statistical imprecision,5 and this weakness has been demonstrated in empirical studies which have found ranks to be ‘not reliable indicators of performance or best practice’,6 and even ‘extremely unreliable statistical summaries of performance’.7 Consequently, serious concerns have been expressed about using registry data for ranking purposes.8 We have previously described the effects on ranking of issues related to statistical precision and suggested a statistical method that minimised the problem.9 To the best of our knowledge, the current report is the only one that quantifies ranking errors caused by incomplete registration.
It is almost impossible to avoid some incompleteness in large registers. The Swedish Knee Arthroplasty Register is a good example.10 Despite regular annual enquiries, corrections by participating hospitals, a postal survey to all living patients and cross-referencing with the official reimbursing (ICD10) databases, only 95% of all revisions can be identified.11
The data completeness of other Swedish national health-care quality registers varies greatly and in several instances is unknown. However, not even those registers that claim a very high completeness12 present higher completeness rates than 96% to 97%. Our results clearly show that even such high rates of completeness are insufficient for valid ranking, at least when Swedish hospitals are ranked according to the two-year risk of re-operation after hip replacement. For most of the registers we doubt whether it is practically possible to achieve the level of data quality required to make a correct ranking.
Risk estimates related to implants, surgical technique and other similar clinical factors remain relatively robust measures in incomplete registers as long as the incompleteness is randomly distributed. Furthermore, when such estimates are presented as a basis for clinical decision, they usually appear with confidence intervals. This makes such data a sound basis for rational decisions. On the other hand, as we show in our example, observed rankings without any indication of variability due to the method of sampling cannot be used as a basis for rational decision making.
Statistical methods for computing confidence intervals for ranks have been described.6 It is possible to develop these so as to account for the uncertainties caused by register incompleteness. If, in spite of their unreliability, hospital rankings are presented as a basis for clinical improvement, the actual margin of error should be assessed and clearly described for each individual hospital so that unnecessary mistakes in ranking are avoided.
No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article.
References
1. No authors listed. Quality and efficiency in Swedish Health Care: regional Comparisons 2007. Sveriges Kommuner och Landsting, Stockholm, Sweden, 2006.
2. No authors listed. Swedish Hip Arthroplasty Register. Annual Report 2006. Department of Orthopaedics, Sahlgrenska University Hospital, Gothenburg, Sweden, 2007.
3. Metropolis N, Ulam S. The Monte Carlo method. J Am Stat Assoc 1949;44:335-41.
4. No authors listed. R Development Core Team. A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2007.
5. Goldstein H, Spiegelhalter DJ. League tables and their limitations: statistical issues in comparisons of institutional performance. J R Statist Soc (A) 1996;159:385-443.
6. Parry GJ, Gould CR, McCabe CJ, Tarnow-Mordi WO. Annual league tables of mortality in neonatal intensive care units: longitudinal study. BMJ 1998;316:1931-5.
7. Marshall EC, Spiegelhalter DJ. Reliability of league tables of in vitro fertilisation clinics: retrospective analysis of live birth rates. BMJ 1998;316:1701-4.
8. Philipson MR, Westwood MJ, Geoghegan JM, Henry A, Jefferiss CD. Shortcomings of the National Joint Registry: a survey of consultants’ views. Ann R Coll Surg Engl 2005;87:109-12.
9. Robertsson O, Ranstam J, Lidgren L. Variation in outcome and ranking of hospitals: an analysis from the Swedish knee arthroplasty register. Acta Orthop 2006;77:487-93.
10. Robertsson O, Lewold S, Knutson K, Lidgren L. The Swedish Knee Arthroplasty Project. Acta Orthop Scand 2000;71:7-18.
11. Robertsson O, Dunbar MJ, Knutson K, Lewold S, Lidgren L. Validation of the Swedish Knee Arthroplasty Register: a postal survey regarding 30,376 knees operated on between 1975-1995. Acta Orthop Scand 1999;70:467-72.
12. Troëng T. Swedvasc’s numbers are reliable. Läkartidningen 2008;105:557 (in Swedish).