[ original research ]
APPROPRIATENESS OF PRIMARY TOTAL HIP AND KNEE REPLACEMENTS IN REGIONS OF ONTARIO WITH HIGH AND LOW UTILIZATION RATES
Carl van Walraven,¶ MD; J. Michael Paterson,* MSc; Moira Kapral,† MD; Ben Chan,*|| MD, MPA, MPH; Mary Bell,†‡ MD, MSc; Gillian Hawker,†‡ MD, MSc; Jeffrey Gollish,§ MD; Joseph Schatzker,§ MD; J. Ivan Williams,*†|| PhD; C. David Naylor,*†‡§|| MD, DPhil
Objective: To compare the appropriateness of case selection for primary hip and knee replacements between two regions in Ontario: one with a high population-based utilization rate and one with a low rate.
Design: Random audit of medical records sampled from hospital discharge abstracts, with subsequent implicit and explicit criteria-based assessments of the appropriateness of surgery.
Study population: People aged 60 years or over who underwent elective, single-joint, non-fracture-related, primary hip or knee replacement between Apr. 1, 1992, and Mar. 31, 1993, at one of seven hospitals in a high-rate region (comprising Brant, Huron and Oxford counties) or one of eight hospitals in a low-rate region (comprising the cities of Scarborough and Toronto).
Interventions: Structured review of hospital medical records, with additional review of information from surgeons' and family physicians' office charts if necessary. Three physicians reviewed patient data and rated the preoperative pain level and functional status of patients, with agreement among at least two reviewers. The proportion of inappropriate cases was then assessed according to explicit criteria defined by a multidisciplinary panel using the Delphi process. Profiles of each case were also subjected to independent implicit review by two rheumatologists and two orthopedic surgeons.
Outcome measures: Proportion of joint replacements deemed inappropriate in the high- and low-rate regions according to either the explicit criteria or the implicit review, as well as preoperative pain levels and functional status of patients in the high- and low-rate regions.
Results: Hip replacements were more common among patients sampled in the low-rate region than among those in the high-rate region (57.3% v. 39.3%; p < 0.002), although the patients' baseline characteristics, including severity of preoperative pain and dysfunction, were otherwise similar between the regions. Inappropriate surgery, determined by explicit criteria, was equally uncommon in the two regions (6.4% and 6.1%). On implicit review, the two rheumatologists rated fewer cases as appropriate than did the two orthopedic surgeons (63.0% v. 80.0%; p < 0.001); however, the proportion of cases rated as inappropriate by the subspecialists was similar in the high- and low-rate regions (11.4% and 11.0%, respectively, by the rheumatologists, and 6.3% and 10.4%, respectively, by the orthopedic surgeons).
Conclusions: Patients selected for primary hip or knee replacement are similar in the high- and low-rate regions of Ontario. Inappropriate use of this procedure does not account for the high rate of surgery in some areas. Further studies will be required to determine which other factors account for the regional variations in the utilization rates and whether there is underservicing in low-rate areas.
From *the Institute for Clinical Evaluative Sciences in Ontario (ICES), North York, Ont., †the Clinical Epidemiology and Health Care Research Program, University of Toronto, Toronto, Ont., and the departments of ‡Medicine, §Surgery and ||Health Administration, University of Toronto, Toronto, Ont.
¶At the time of the study Dr. van Walraven was a visiting research fellow with ICES; he is now with the Department of Medicine, University of Ottawa, Ottawa, Ont.
Reprint requests to: Dr. C. David Naylor, Institute for Clinical Evaluative Sciences in Ontario, Sunnybrook Health Science Centre, Rm. G106, 2075 Bayview Ave., North York ON M4N 3M5; fax 416 480-6048
© 1996 Canadian Medical Association (text and abstract/résumé)
CAN MED ASSOC J * SEPT. 15, 1996; 155 (6)
Large, inexplicable geographic variations in the population-based service rates for many surgical procedures, including hip and knee arthroplasty, have been noted.[1-4] In Ontario annual age- and sex-adjusted rates of joint replacement in small geographic areas for the 1989-92 period ranged from 50 to 113 per 100 000 adults for hip replacement and from 37 to 127 per 100 000 adults for knee replacement.[1,2] These two procedures are expensive and dramatically affect patients' quality of life. Are residents in higher-rate areas undergoing joint replacements for less appropriate indications than residents in lower-rate areas? Although such a relation is plausible, when rates of other procedures have been analysed with the use of explicit criteria, inappropriate use has been shown to explain little or none of the observed variations in utilization rates. This raises the possibility that residents in lower-rate areas are underserviced.
We accordingly set out to examine whether there is a significant difference in the appropriateness of hip and knee replacements between high- and low-rate regions of Ontario. Randomly sampled cases were appraised with the use of both explicit case-selection criteria derived from a multidisciplinary panel and implicit case review based on the judgements of orthopedic surgeons and rheumatologists who were unaware of the patients' place of residence.
METHODS
The study was conceptualized as a variation on medical record review for either quality assurance or research purposes. It was approved by the Research Ethics Board of the Sunnybrook Health Science Centre, with specific safeguards. Executives and surgeons at each hospital were given a detailed summary of the study goals and methods; all hospitals agreed to participate, and only 1 of the 44 surgeons refused to allow review of his records. It was agreed that the identities of hospitals and surgeons would not be revealed in the research reports. Sampling was based on non-nominal hospital administrative data, and procedure codes and hospital chart numbers were used to identify potentially eligible cases. Patient names identified through review of the hospital medical records were kept separate in a secured database and used only when needed to contact the orthopedic surgeon or family physician for additional details from the office records. No patients were contacted for information by study personnel. Names were deleted from all records or computer files as soon as possible after information was received.
DETERMINATION OF UTILIZATION RATES AND PATIENT SELECTION
Hospital discharge data from the Canadian Institute for Health Information were reviewed for the fiscal year 1992-93 (Apr. 1, 1992, to Mar. 31, 1993) to identify patients assigned a procedure code (Canadian Classification of Diagnostic, Therapeutic and Surgical Procedures[14]) of 93.51 (total hip replacement with use of methyl methacrylate), 93.59 (other total hip replacement) or 93.41 (total knee replacement, geomedic and polycentric). We excluded patients who were less than 60 years of age (16.3%), those who underwent multiple simultaneous joint replacements (2.1%), those who underwent joint replacement because of an acute fracture of the joint (5.4%; International Classification of Diseases codes 820-829[15]) and those with an out-of-province or invalid residence code (1.1%); these patients accounted for about 25% of the cases overall. By excluding patients less than 60 years of age we hoped to avoid contentious judgements about osteotomy candidacy. (Osteotomy procedures are commonly performed in younger patients; poorer results of osteotomy, decreased patient longevity and reduced prosthesis demand make joint arthroplasty relatively more attractive in patients over 60.)
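The exclusion rules above can be read as a simple record filter. The following is a minimal sketch only: the `DischargeRecord` class and its field names are hypothetical stand-ins for a discharge abstract and are not taken from the study's data files; only the procedure codes, the age cut-off and the fracture code range come from the text.

```python
from dataclasses import dataclass, field

# Procedure codes named in the text (hip and knee replacement).
ELIGIBLE_PROCEDURES = {"93.51", "93.59", "93.41"}


@dataclass
class DischargeRecord:
    """Hypothetical, simplified discharge abstract (illustrative fields only)."""
    procedure_code: str
    age: int
    joints_replaced: int
    diagnosis_codes: list = field(default_factory=list)  # ICD codes as strings
    residence_valid: bool = True


def has_fracture_code(codes):
    """Return True if any diagnosis code falls in the ICD range 820-829."""
    for code in codes:
        prefix = code.split(".")[0]
        if prefix.isdigit() and 820 <= int(prefix) <= 829:
            return True
    return False


def is_eligible(rec):
    """Apply the exclusion rules described in the text to one record."""
    return (
        rec.procedure_code in ELIGIBLE_PROCEDURES
        and rec.age >= 60                      # patients under 60 excluded
        and rec.joints_replaced == 1           # no simultaneous multiple joints
        and not has_fracture_code(rec.diagnosis_codes)
        and rec.residence_valid                # in-province, valid residence code
    )
```

For example, `is_eligible(DischargeRecord("93.41", 70, 1, ["820.0"]))` is false because of the fracture code, whereas the same record with a non-fracture diagnosis would pass.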
Residence codes were used to assign patients to one of 49 prespecified small areas in Ontario according to previously published methods.[2] Age- and sex-adjusted population-based utilization rates for joint replacements (hip and knee replacements combined) were calculated for each small area. To choose geographic areas for comparison, we reviewed all small areas with joint-replacement rates that differed significantly (p < 0.001, χ² test) from the provincial mean, and looked for outlier areas whose residents had most of their operations in a few hospitals. We chose two regions to represent high- and low-rate areas. The high-rate region comprised Brant, Huron and Oxford counties; the low-rate region comprised the cities of Scarborough and Toronto. The proportion of cases coded as revisions was similar between the high- and low-rate regions, and rankings of regional rates were robust when primary joint replacements were considered alone.
Of 414 eligible patients in the high-rate region, 371 (89.6%) underwent surgery in one of seven hospitals (three university affiliated, four community based). Of 703 eligible patients in the low-rate region, 565 (80.4%) underwent surgery in one of eight hospitals (five university affiliated, three community based). A random sample of patient chart numbers was chosen from each hospital, with the number of charts requested exceeding the desired number by one third to allow for miscoding or patient ineligibility. For the case to be eligible for the chart audit, the operation had to have been performed during the study period and confirmed as the procedure of interest without concomitant fracture or other unexpected indication for surgery. Because the explicit case-selection criteria were oriented to primary procedures (see "Explicit case review") we excluded revision procedures. Each hospital's contribution to the study sample was proportional to its contribution to the number of procedures performed in the study region.
DATA COLLECTION AND RATING OF PREOPERATIVE PAIN AND FUNCTIONAL STATUS
One of two physicians (M.K. and C.v.W.) independently reviewed and abstracted each hospital medical record with the use of a structured form. The following information was obtained: hospital; patient's date of birth, height, weight and age; surgeon, date of surgery, family physician, reason for chart exclusion if applicable, whether the initial orthopedic consultation note was on the chart, date of consultation, whether fusion takedown was performed, presence of sepsis in the joint, hip-abduction strength, knee-extension strength, presence of dementia in the patient, patient's type of residence (residing in nursing home or receiving home care), diagnoses, medications, radiology report on the joint, documentation of function, documentation of pain, need for a mobility aid or wheelchair, sleep disturbance or night pain, need for narcotics and their effect on pain.
We required standardized ratings of preoperative pain and functional status, to allow us to compare case selection across regions and to assess cases against the explicit criteria (see "Explicit case review"). Thus, three of us (B.C., M.K., C.v.W.) independently reviewed each profile and rated each patient's level of pain and functional status according to the schemata endorsed by the criteria-setting panel (Appendix 1). The reviewers' confidence in the interpretations of each case was categorized as follows: confident (certain about ratings for both pain and function), uncertain (certain about only one of the two ratings) and not confident (not certain about either rating). Whenever two of the three reviewers were not confident about their judgements of pain and functional status, orthopedic surgeons and family physicians were sent a standardized form requesting additional details from their office charts or photocopies of radiology reports, or both. Additional information was required in 41 (25.9%) of 158 cases in the high-rate area, as compared with 28 (17.1%) of 164 in the low-rate area (p < 0.01). Final ratings of preoperative pain and functional status were based on agreement among at least two of the three reviewers. In the rare instances when independent majority agreement did not occur, ratings were determined by panel deliberation.
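The two decision rules in this section (when to request office records, and how a final rating was reached) can be sketched as follows. This is an illustration under stated assumptions, not the study's own procedure: the function names are hypothetical, the rule for requesting records is read as "at least two of three reviewers not confident", and the rating labels in the usage note are placeholders because the Appendix 1 scales are not reproduced here.

```python
from collections import Counter

NOT_CONFIDENT = "not confident"


def needs_office_records(confidences):
    """True when at least two of the three reviewers were not confident."""
    return sum(c == NOT_CONFIDENT for c in confidences) >= 2


def final_rating(ratings):
    """Return the rating shared by at least two of three reviewers.

    Returns None when all three disagree, which in the text is the case
    referred to panel deliberation.
    """
    value, count = Counter(ratings).most_common(1)[0]
    return value if count >= 2 else None
```

With placeholder labels, `final_rating(["severe", "severe", "moderate"])` returns `"severe"`, while three different ratings return `None` and would go to the panel.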
EXPLICIT CASE REVIEW
The methods used to develop explicit criteria for assessing surgical appropriateness for hip and knee replacement surgery were a variation of the Delphi process developed by the RAND Corporation and described in detail elsewhere. In brief, members of an expert panel (four orthopedic surgeons, two rheumatologists, two family physicians, an internist, an epidemiologist and a physiotherapist) surveyed the literature for indications for total hip and knee replacement and agreed on key factors in surgical case selection. These factors were used to develop sets of hypothetical cases that could be rated by the panel as to the appropriateness of surgery. Each hypothetical case was then assigned an appropriateness summary score, and scoring algorithms were derived by statistical analysis. We used the clinical information obtained through the structured chart review to determine whether patients would be deemed inappropriate candidates for surgery by the panel's criteria. For simplicity in applying these criteria, we assumed that each patient could expect average long-term prosthesis survival.
IMPLICIT REVIEW
For blinded implicit review we used the information obtained through the structured chart review to develop case descriptions. Each description included the patient's age, sex, height, weight, living circumstances, extent of functional incapacity, severity of pain while walking and at rest, and type, frequency and doses of analgesics or nonsteroidal anti-inflammatory agents used. Gestalt assessments of appropriateness for each case were made independently on a simple three-level basis (inappropriate, uncertain or appropriate) by two rheumatologists (M.B. and G.H.) and two orthopedic surgeons (J.G. and J.S.). That is, one clinician from each specialty reviewed half of the 322 cases, which were randomly chosen. Each reviewer also indicated his or her degree of confidence in judging each case (not, somewhat or very confident).
To assess levels of intra- and intersubspecialty agreement, a random subsample of 120 case descriptions was reviewed by all four physicians. Ten fictitious patient profiles, representing cases of clearly inappropriate surgery, were included randomly with the actual profiles. The four reviewers were told that an unspecified number of "ringers" were included, in the hope that this would mitigate any reluctance to use the "inappropriate" category. Analysis of the ratings of these fictitious cases also gave us a way to determine the reviewers' willingness to judge procedures as "inappropriate."
OUTCOME MEASURES
The primary outcome measure was the proportion of joint replacements deemed inappropriate in the high- and low-rate regions according to the explicit case-selection criteria. Secondary outcome measures included the proportion of joint replacements deemed inappropriate in the two regions as determined by implicit review, the preoperative pain and functional status among patients in the high- and low-rate regions, and the levels of agreement between ratings determined by explicit and implicit methods and among and within clinical subspecialties for implicit review.
SAMPLE SIZE
Based on our pilot work and parallel analyses for other surgical procedures we assumed that about 10% of the cases in the low-rate region would be deemed potentially inappropriate according to the explicit criteria. Our null hypothesis was that the proportion of inappropriate joint replacements in the high-rate region would be similar. The alternative hypothesis was directional: that is, we were interested in detecting only a clearly higher proportion of inappropriate surgery in the high-rate region. Therefore, we used a one-tailed probability of Type I error of 0.05. From a quality-of-care and policy-making standpoint we considered a 10% absolute increase in the proportion of inappropriate joint replacements in the high-rate region to be unequivocally important. A review of 314 patients was needed to have an 80% chance of detecting, and labelling as significant, an increment from 10% (in low-rate areas) to 20% (in high-rate areas) in the proportion of inappropriate procedures.
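The sample-size figure can be checked with a standard normal-approximation formula for comparing two proportions. The text does not say which formula was used, so the sketch below is an assumption: it uses the common pooled-variance form with a one-tailed alpha, equal group sizes and no continuity correction, and the function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Patients needed per group to detect p1 versus p2 (one-tailed test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                      # pooled proportion under H0
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


n = n_per_group(0.10, 0.20)  # 10% in the low-rate v. 20% in the high-rate region
total = 2 * n
```

Under these assumptions the formula gives 157 patients per region, or 314 in total, which agrees with the number stated above.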
DATA ANALYSIS
Categorical data were analysed with the use of either the χ² test or Fisher's exact test, depending on cell sizes. Continuous data were analysed with the use of an unpaired Student's t-test. The concordance between implicit rater pairs, and implicit and explicit assessment methods, was measured with the use of simple proportional agreement and a κ coefficient to correct for chance agreement.
RESULTS
BASELINE CHARACTERISTICS
Of the 49 prespecified small areas of Ontario, the age- and sex-adjusted population-based rates for the three counties in the high-rate study region (Huron, Oxford and Brant) ranked first, second and fifth respectively. Scarborough and Toronto, the low-rate study region, had the lowest rates of joint replacement in the province (Table 1). The age- and sex-adjusted rate for each of the five study regions differed significantly from the provincial mean (p < 0.001).
The proportion of patients eligible for random selection in the two study areas is delineated in Table 2, along with reasons for case exclusion. Coding errors contributed to the exclusion of about 7% of the charts reviewed. Twelve surgeons performed the 158 procedures in the high-rate region, and 32 performed the 164 replacements in the low-rate region. The baseline characteristics of these patients are shown in Table 3. The patients selected for study were similar in terms of age, sex and prevalence of prior primary hip or knee replacement, but not the index procedure: knee replacements were significantly more common among the patients in the high-rate region (p = 0.002). This difference is consistent with the higher ratio of knee replacement to hip replacement in the hospitals in the high-rate region than in those in the low-rate region during 1992-93 (1.22 v. 0.88 respectively; p < 0.001).
Table 1: Age- and sex-adjusted rates of total hip and knee replacement per 100 000 people aged 60 and over in 1992-93, by city, county or district of patient residence
City, county or district                  Rate
Huron County                              844.6*
Oxford County                             818.3*
Lambton County                            697.1*
Haldimand-Norfolk Regional Municipality   697.0*
Brant County                              690.4*
Elgin County                              688.4†
Perth County                              687.6†
Grey County                               665.3†
Dufferin County                           665.3†
Kent County                               656.2†
Bruce County                              647.0
Frontenac County                          631.7†
Peterborough County                       624.1†
Lennox and Addington County               617.0
Prince Edward County                      617.0
Hastings County                           615.1
Muskoka District                          614.2
Haliburton County                         614.2
Lanark County                             595.6
Renfrew County                            586.1
Essex County                              585.4†
Thunder Bay District                      581.0
Nipissing District                        572.6
Timiskaming District                      572.6
Cochrane District                         572.5
Middlesex County                          555.9
Sudbury Regional Municipality             553.9
Sudbury District                          553.9
Manitoulin District                       553.9
Simcoe County                             551.4
Halton County                             536.8
Hamilton-Wentworth Regional Municipality  536.3
Waterloo Regional Municipality            534.4
Ottawa, Western Region                    526.5
York Regional Municipality                521.4
Northumberland County                     515.0
Parry Sound District                      500.6
Kenora District                           500.2
Rainy River District                      500.2
Leeds & Grenville County                  494.3
Peel: Mississauga, City of                490.3
Stormont, Dundas and Glengarry County     486.3
Prescott and Russell County               486.3
Toronto: East York, City of               484.2
Durham County                             479.0
Toronto: Etobicoke, City of               478.2
Ottawa, City of                           469.5
Victoria County                           469.1
Peel: Brampton, City of                   465.6
Wellington County                         462.2
Niagara Regional Municipality             460.1†
Toronto: North York, City of              432.5*
Algoma District                           420.6
Toronto: York, City of                    417.8†
Ottawa, Eastern Region                    411.2
Toronto: Scarborough, City of             398.7*
Toronto: Toronto, City of                 382.5*
*p < 0.001. †p < 0.01. ‡p < 0.05. (The † and ‡ markers cannot be told apart in this copy; entries shown with † carry one or the other.)
Table 2: Number of hip and knee replacements and of charts reviewed and excluded, by study region
Table 3: Patient characteristics and distribution of pain, function, and confidence classifications by study region
RATINGS OF PAIN AND FUNCTIONAL STATUS
The results of the rating exercises are shown in Table 3. Reviewers continued to show a lower level of confidence in their judgements for patients in the high-rate region, despite the acquisition of additional information from office records. However, the key finding was that patients in the two study regions were similar with respect to their preoperative pain level and functional status.
ASSESSMENT OF SURGICAL APPROPRIATENESS WITH EXPLICIT CRITERIA
According to the explicit criteria, lower-limb joint replacement was clearly inappropriate for patients over 60 years of age with mild pain and class II functional capacity. We found no significant difference in the proportion of patients with these characteristics between the two areas (Table 4). Given that the reviewers lacked confidence more frequently in their judgements about cases in the high-rate region than in their judgements about those in the low-rate region, we reanalysed the data two ways: first, we excluded profiles that were classified as "not confident"; and, second, we considered cases classified as "not confident" to be "inappropriate." In both reanalyses, interregional differences in the proportion of inappropriate cases did not reach statistical significance.
Table 4: Appropriateness of hip and knee replacements determined by assessors using explicit case-selection criteria, by study region
ASSESSMENT OF SURGICAL APPROPRIATENESS BY IMPLICIT REVIEW
Implicit judgements of surgical appropriateness made by the rheumatologists and orthopedic surgeons confirmed the findings based on the explicit criteria: no interregional differences in the distributions of appropriateness ratings for either rater pair were found (Fig. 1). Since distributions of confidence ratings were also similar, it is not surprising that excluding cases with low confidence ratings or assuming them to be "inappropriate" had no effect on the results (data available upon request). Upon analysing each reviewer's ratings of the 120 cases reviewed by all four specialist assessors, we again found no interregional differences in the distributions of appropriateness ratings.
Analysis of the reviewers' assessment of the fictitious inappropriate cases confirmed a very high sensitivity for rating inappropriate cases. Except for one case about which the two rheumatologists were uncertain, the reviewers were very confident in correctly judging all 10 hypothetical cases to be inappropriate.
Fig. 1: Distribution of cases of total hip or knee replacement rated by two orthopedic surgeons and two rheumatologists as inappropriate (I), uncertain (U) or appropriate (A) in a high-rate region (striped bars) and a low-rate region (black bars) in Ontario. (Panels: orthopedic surgeons; rheumatologists. Horizontal axis: rating.)
AGREEMENT AMONG ASSESSMENT METHODS AND RATERS
Table 5 shows the extent of agreement among the methods and the raters. There was overall agreement on from 85% to 90% of the cases. However, this high rate of agreement was driven largely by a very high concordance on decisions in favour of joint replacement; for decisions against surgery, agreement ranged from 27% to 43% of the cases. Corresponding κ values were only fair (range 0.20 to 0.38), in part because the low prevalence of inappropriate surgery causes κ to be conservatively biased (while biasing overall raw agreement upwards).
DISCUSSION
Analyses of small-area variations often lead to questions about whether the people selected for a procedure are similar in areas with different rates of intervention. We approached this question by comparing randomly selected Ontario residents who had undergone a hip or knee replacement in the fiscal year 1992-93 in high- and low-rate regions: we found no differences in patients' age, sex, symptoms or functional impairment, or in the proportion undergoing inappropriate surgery, as determined by two distinct methods of assessment. These findings provide specific evidence refuting the notion that higher rates of surgery mean poorer case selection and greater chances of inappropriate treatment for patients in need of joint replacement. More generally, our study provides an example of the type of research that has been advocated for clarifying the relation between clinical practice style and geographic variations in practice.
Table 5: Extent of agreement among assessment methods and raters
                                                  Extent of agreement
Comparison                                No. of  No. (and %) of   % of decisions in  % of decisions  κ
                                          cases   all decisions*   favour of joint    against joint
                                                                   replacement        replacement
Explicit criteria v. orthopedic surgeons  321     290 (90.3)       94.8               34.0            0.29
Explicit criteria v. rheumatologists      321†    289 (90.0)       94.5               42.9            0.38
Orthopedic surgeons v. rheumatologists    322     281 (87.3)       92.9               34.9            0.28
Surgeon 1 v. surgeon 2                    120     89 (74.2)        92.7               27.2            0.20
Rheumatologist 1 v. rheumatologist 2      120     102 (85.0)       91.5               35.7            0.30
*"Appropriate" and "uncertain" categories were collapsed into "not inappropriate" for implicit review.
†One case was excluded from explicit review because of a lack of information.
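The raw agreement and kappa (K) values in Table 5 are related in the way sketched below for two raters and two categories. The cell counts in the example are hypothetical, chosen only to show how a low prevalence of "inappropriate" ratings lets high raw agreement coexist with a modest kappa; they are not the study's data, and the function name is illustrative.

```python
def agreement_and_kappa(both_yes, only_a_yes, only_b_yes, both_no):
    """Raw agreement and Cohen's kappa for a 2 x 2 table of two raters.

    "yes" stands for "not inappropriate" and "no" for "inappropriate".
    """
    n = both_yes + only_a_yes + only_b_yes + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + only_a_yes) / n   # rater A's share of "yes" ratings
    b_yes = (both_yes + only_b_yes) / n   # rater B's share of "yes" ratings
    # Agreement expected by chance if the raters were independent.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: 320 cases, few of them rated "inappropriate".
observed, kappa = agreement_and_kappa(280, 12, 18, 10)
```

With these made-up counts the raters agree on about 91% of cases, yet kappa is only about 0.35, because most of that agreement would be expected by chance when nearly every case is rated "not inappropriate".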
ROBUST RESULTS
Methods used to assess the appropriateness of medical care are frequently the target of criticism.[22,23] Two study limitations are noteworthy: (a) the use of medical records as the source of patient data and (b) the subjective nature of panel judgements. We sought to minimize the effect of these factors on the overall results by supplementing the hospital record information with that from office charts and by assessing surgical appropriateness with the use of two complementary methods. The predefined explicit criteria developed by a multidisciplinary panel were applied to three groups of patients: a random sample; the same sample, assuming that surgery was not appropriate when reviewers' judgements about symptom status and disability lacked confidence; and a subgroup of patients for whom reviewers were confident in their judgements. In addition, we asked two musculoskeletal specialists to independently review each case as separate physician-patient encounters and provide appropriateness ratings based on their implicit clinical judgement. These judgements were analysed by reviewer and by specialty. Regardless of the criteria used or the conditions under which they were applied, we could not demonstrate a difference in the prevalence of inappropriate surgery between the high- and low-rate regions.
POTENTIAL SOURCES OF BIAS
What potential sources of bias might explain the similarity of patients from the high- and low-rate regions? Blinding could have been compromised, since the two chart abstractors were later involved in rating level of pain and functional status. However, abstracting and rating were separated by enough time (at least 6 months) and the number of charts abstracted was large enough (more than 150 for each physician) that this would have a small effect on the results. There was also a third physician involved in rating pain and functional status who had no involvement in chart abstraction.
The two abstractors obviously could not be blinded as to region when reviewing the actual medical records. However, since identical, systematic data-abstraction methods were used in the high- and low-rate regions, no bias should have been introduced.
The explicit reviewers lacked confidence in their rating of pain level and functional status more frequently for the patients in the high-rate region than for those in the low-rate region. Therefore, we had to approach more surgeons and family physicians in the high-rate region than in the low-rate region for supplementary information. If information from the office charts led to higher ratings of pain or dysfunction than did information from the hospital records alone, differences between the high- and low-rate regions may have been masked. However, we addressed this possibility in our sensitivity analysis, when we assumed that "not confident" cases were "inappropriate"; doing so did not change the overall results (Table 4).
The ratio of knee replacement to hip replacement was greater in the high-rate than in the low-rate region. This difference reflected the actual distribution of procedures in those regions but could have biased our results if the reviewers systematically favoured one procedure over the other. We believe this bias was unlikely for three reasons. First, the explicit case-selection criteria dealt exclusively with level of pain and functional status; the joint involved was not considered in the assessment process. Second, the specialists who participated in the implicit review were very experienced with both procedures, and two clinicians from each subspecialty contributed to the overall results. Third, when explicit and implicit ratings for hip and knee replacements were compared, no significant difference in appropriateness was found between the two types of procedures.
Finally, we excluded patients who underwent joint replacement in non-study hospitals. This accounted for about 10% and 20% of all replacements performed in the high- and low-rate regions respectively. If included, virtually all of these cases would have to have been inappropriate in the high-rate region to demonstrate a significantly higher rate of inappropriate surgery than in the low-rate region. (For example, let us assume that the rate of inappropriate surgery in the low-rate region is no lower than 6.1% among the patients excluded from the study sample. With a proportional sample of the 43 excluded patients in the high-rate region, 16 additional cases would be reviewed. To come close to the 10% absolute increase in the rate of inappropriate surgery deemed important at the outset of this study, all 16 cases would need to be rated as inappropriate by explicit criteria, thereby shifting the total to 26 [14.9%] of 174 cases being deemed inappropriate in the high-rate region.)
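The worst-case arithmetic in the parenthetical example can be checked with a short sketch. The starting counts used here (10 inappropriate cases among 158 sampled) are not reported in this passage; they are back-calculated from "26 of 174" minus the 16 added cases and should be read as an assumption. The function name `pooled_rate` is illustrative only.

```python
def pooled_rate(inappropriate, total, added_inappropriate, added_total):
    """Proportion of inappropriate cases after adding extra reviewed cases."""
    return (inappropriate + added_inappropriate) / (total + added_total)


# Assumption: back-calculated from the text, not reported figures.
# 26 - 16 = 10 inappropriate cases; 174 - 16 = 158 sampled cases.
SAMPLED_INAPPROPRIATE = 10
SAMPLED_TOTAL = 158

# From the text: 16 additional cases drawn proportionally from the
# 43 excluded patients, all assumed inappropriate (worst case).
ADDED_CASES = 16

# From the text: observed rate in the low-rate region, in percent.
LOW_RATE_REGION_PCT = 6.1

worst_case_pct = 100 * pooled_rate(
    SAMPLED_INAPPROPRIATE, SAMPLED_TOTAL, ADDED_CASES, ADDED_CASES
)
gap_pct_points = worst_case_pct - LOW_RATE_REGION_PCT

print(f"Worst-case rate in high-rate region: {worst_case_pct:.1f}%")
print(f"Gap over low-rate region: {gap_pct_points:.1f} percentage points")
```

Under these assumptions, even the extreme scenario leaves the interregional gap below the 10% absolute increase specified at the outset, which is the point the example makes.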
IMPLICATIONS
Findings from studies like this one are time-dependent and, for numerous reasons, not always generalizable to other jurisdictions. For example, although regional variations in rates of hip and knee replacement have been documented in the United States, one cannot assume that high- and low-rate areas in that country would have equally high rates of appropriate case selection.
Theoretically, one might like to see our findings replicated in Canada, with other case reviewers and perhaps different sets of appropriateness criteria. Whether such studies are worth while, however, is open to debate. According to our panel's explicit criteria, the observed rate of inappropriate surgery in the low-rate region was clearly lower than our a priori estimate (6.1% v. 10%), and the observed regional proportions were virtually identical. Given our sample size and the observed proportion of inappropriate cases (6.1%) in the low-rate region, we had sufficient power to detect an 8% higher rate of inappropriate surgery in the high-rate region. Should future studies be designed to detect smaller increments? If so, how small should one go? Does the price paid for detecting minor interregional differences in already low rates of inappropriate surgery represent a worthwhile investment of limited health services research dollars? These questions need to be considered before further studies are launched.
Since differences in case selection do not contribute to the regional variations in Ontario's joint-replacement rates, other factors must account for the observed differences. One plausible inference is that too few joint replacements are being performed in the lower-rate region. Other potential factors are the number of surgeons per capita, the availability of and referrals to other therapies and services, availability of operating rooms, size of hospital budgets for joint implants, and patients' preferences. However, there is little reason to expect major interregional differences in patients' preferences. Although the other factors may explain why rates vary, the question remains, are rates too low in some areas?
A crucial step will be to rule out regional differences in the prevalence or burden of arthritis. A population-based survey of disease burden has recently been funded by the Medical Research Council of Canada and is now under way to determine definitively whether the prevalence of severe arthritis differs between areas in Ontario with low and high joint-replacement rates.
We thank the chief executives, medical directors and chiefs of staff at the following participating centres: Brantford General Hospital, Brantford; Centenary Health Centre, Scarborough; Mount Sinai Hospital, Toronto; Orthopaedic and Arthritic Hospital, Toronto; St. Joseph's Health Centre of London, London; St. Joseph's Hospital, Brantford; Scarborough General Hospital, Scarborough; Stratford General Hospital, Stratford; Sunnybrook Health Science Centre, North York; The Toronto Hospital, Toronto; The Wellesley Hospital, Toronto; Toronto East General and Orthopaedic Hospital, Toronto; University Hospital, London; Victoria Hospital, London; Woodstock General Hospital, Woodstock. We are particularly indebted to the medical records staff of these institutions for their help. We also thank the orthopedic surgeons at these hospitals who were openly supportive of the study and who facilitated the project by sharing information directly with us.
References
1. Cohen MM, deBoer D, Young W. Total hip replacement. In: Naylor CD, Anderson GM, Goel V, editors. Patterns of health care in Ontario. Ottawa: Canadian Medical Association, 1994:72-6.
2. Cohen MM, deBoer D, Young W. Total knee replacement. In: Naylor CD, Anderson GM, Goel V, editors. Patterns of health care in Ontario. Ottawa: Canadian Medical Association, 1994:76-9.
3. Peterson MG, Hollenberg JP, Szatrowski TP, Johanson NA, Mancuso CA, Charlson ME. Geographic variations in the rates of elective total hip and knee arthroplasties among Medicare beneficiaries in the United States. J Bone Joint Surg [Am] 1992;74:1530-9.
4. Keller RB, Soule DN, Wennberg JE, Hanley DF. Dealing with geographic variations in the use of hospitals. The experience of the Maine Medical Assessment Foundation Orthopedic Study Group. J Bone Joint Surg [Am] 1990;72:1286-93.
5. Laupacis A, Bourne R, Rorabeck C, Feeny D, Wong C, Tugwell P, et al. The effect of elective total hip replacement on health-related quality of life. J Bone Joint Surg [Am] 1993;75:1619-26.
6. Cleary PD, Greenfield S, McNeil BJ. Assessing quality of life after surgery. Controlled Clin Trials 1991;12:189S-203S.
7. Kantz ME, Harris WJ, Levitsky K, Ware JE Jr, Davies AR. Methods for assessing condition-specific and generic functional status outcomes after total knee replacement. Med Care 1992;30:MS240-52.
8. Katz J, Larson M, Phillips C, Fossel A, Liang M. Comparative measurement sensitivity of short and longer health status instruments. Med Care 1992;30:917-25.
9. Chassin MR, Kosecoff J, Park RE, Winslow CM, Kahn KL, Merrick NJ, et al. Does inappropriate use explain geographic variations in the use of health care services? A study of three procedures. JAMA 1987;258:2533-7.
10. Leape LL, Park RE, Solomon DH, Chassin MR, Kosecoff J, Brook RH. Does inappropriate use explain small-area variations in the use of health care services? JAMA 1990;263:669-72.
11. Chassin MR, Brook RH, Park RE, Keesey J, Fink A, Kosecoff J, et al. Variations in the use of medical and surgical services by the Medicare population. N Engl J Med 1986;314:285-90.
12. Roos NP, Roos LL Jr, Henteleff PD. Elective surgical rates - Do high rates mean lower standards? Tonsillectomy and adenoidectomy in Manitoba. N Engl J Med 1977;297:360-5.
13. Naylor CD, Williams JI and the Ontario Panel on Hip and Knee Arthroplasty. Primary hip and knee replacement surgery: the Ontario criteria for case selection and surgical priority. Qual Health Care 1996;5:20-30.
14. Statistics Canada. Canadian classification of diagnostic, therapeutic and surgical procedures. 2nd printing. Ottawa: Ministry of Industry, Science and Technology, 1993.
15. World Health Organization. International classification of diseases, 9th revision, clinical modification. Vol 1. Ann Arbor, Mich: Edward Bros, 1993.
16. Broughton NS, Newman JH, Baily RAJ. Unicompartmental replacement and high tibial osteotomy for osteoarthritis of the knee. A comparative study after 5-10 years' follow-up. J Bone Joint Surg [Br] 1986;68:447-52.
17. Maistrelli GL, Gerundini M, Fusco U, Bombelli R, Bombelli M, Avai A. Valgus-extension osteotomy for osteoarthritis of the hip: indications and long-term results. J Bone Joint Surg [Br] 1990;72:653-7.
18. Wedge JH. Osteotomy of the pelvis for the management of hip disease in young adults. Can J Surg 1995;38(suppl 1):S25-32.
19. Santore RF, Dabezies EJ. Femoral osteotomy for secondary arthritis of the hip in young adults. Can J Surg 1995;38(suppl 1):S33-8.
20. Brook RH, Chassin MR, Fink A, Solomon DH, Kosecoff JB, Park RE. A method for the detailed assessment of the appropriateness of medical technologies. Int J Technol Assess Health Care 1986;2:53-64.
21. Feinstein AR, Cicchetti DV. High agreement but low kappa: 1. The problems of two paradoxes. J Clin Epidemiol 1990;43:543-9.
22. Hicks NR. Some observations on attempts to measure appropriateness of care. BMJ 1994;309:730-3.
23. Phelps CE. The methodologic foundations of studies of the appropriateness of medical care. N Engl J Med 1993;329:1241-5.
24. Steinbrocker O, Traeger CH, Batterman RC. Therapeutic criteria in rheumatoid arthritis. JAMA 1949;140:659-62.
Appendix 1: Criteria used to assess preoperative level of pain and functional status
CAN MED ASSOC J * 15 SEPT. 1996; 155 (6)