Appropriate questionnaires for knee arthroplasty
RESULTS OF A SURVEY OF 3600 PATIENTS FROM THE SWEDISH KNEE ARTHROPLASTY REGISTRY

M. J. Dunbar, O. Robertsson, L. Ryd, L. Lidgren
From Lund University Hospital, Sweden

The Swedish Knee Arthroplasty Registry (SKAR) has recorded knee arthroplasties prospectively in Sweden since 1975. The only outcome measure available to date has been revision status. While questionnaires on health outcome may function as more comprehensive endpoints, it is unclear which are the most appropriate. We tested various outcome questionnaires in order to determine which is the best for patients who have had knee arthroplasty as applied in a cross-sectional, discriminative, postal survey. Four general health questionnaires (NHP, SF-12, SF-36 and SIP) and three disease/site-specific questionnaires (Lequesne, Oxford-12, and WOMAC) were tested on 3600 patients randomly selected from the SKAR. Differences were found between questionnaires in response rate, time required for completion, the need for assistance, the efficiency of completion, the validity of the content and the reliability. The mean overall ranks for each questionnaire were generated. The SF-12 ranked the best for the general health, and the Oxford-12 for the disease/site-specific questionnaires. These two questionnaires could therefore be recommended as the most appropriate for use with a large knee arthroplasty database in a cross-sectional population.

J Bone Joint Surg [Br] 2001;83-B:339-44. Received 10 March 2000; Accepted after revision 14 November 2000

M. J. Dunbar, MD, FRCS C, Assistant Professor of Orthopaedics
Dalhousie University, Suite 4822, QE II Health Sciences Centre, 1796 Summer Street, Halifax, Nova Scotia, Canada B3H 3A7.
O. Robertsson, MD, Head, Swedish Knee Arthroplasty Registry
L. Ryd, MD, PhD, Associate Professor of Orthopaedics
L. Lidgren, MD, PhD, Professor of Orthopaedics
Department of Orthopaedics, Lund University Hospital, S-221 85 Lund, Sweden.
Correspondence should be sent to Dr M. J. Dunbar.
©2001 British Editorial Society of Bone and Joint Surgery 0301-620X/01/311134 $2.00
VOL. 83-B, NO. 3, APRIL 2001

Although over 70 000 knee arthroplasties have been registered with the Swedish Knee Arthroplasty Registry (SKAR),1,2 the only outcome available has been revision status, which yields data on the small number of operations that fail, but gives no information on the status of most of the patients.3 Questionnaires on health outcome can be used to define more comprehensive endpoints.4-9 Numerous questionnaires are available for application to knee arthroplasty but there is no consensus as to which are the most appropriate.10,11

Questionnaires are evaluated and compared using psychometric properties such as validity (content and construct), reliability and responsiveness. Direct comparison of questionnaires for knee arthroplasty is not possible since the psychometric properties of many of the questionnaires advocated have not been fully determined. Where they have been determined, it has often been for a general population. Patients who undergo arthroplasty of the knee are distinct from the general population in that they are older, fitter and have a longer life expectancy,12,13 and previously defined properties of questionnaires may therefore not be directly transferable to them.14

We have identified the relevant general health and disease/site-specific outcome questionnaires for knee arthroplasty and simultaneously tested them on a large random sample from the SKAR. We hypothesised that the validity and reliability properties, as well as the feasibility of use (usable response rate and patient burden) in this population, would differ by questionnaire and that some questionnaires would be more appropriate for this application than others.

Patients and Methods

A list of relevant questionnaires for knee arthroplasty was compiled by means of a literature review using the following selection criteria: 1) previous application of the questionnaire to knee arthroplasty patients; 2) previous application to patients with osteoarthritis; and 3) previous publication of the results of validity, reliability and responsiveness tests. Four general health questionnaires, the Nottingham Health Profile (NHP),15-17 a 12-item short-form health survey (SF-12),18 a 36-item short-form health survey (SF-36)19-21 and the Sickness Impact Profile (SIP),22 and three disease/site-specific questionnaires, the Algofunctional Index for the Knee (Lequesne),23 the Oxford 12-item Knee Score (Oxford-12),24 and the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC),25 met the above criteria.

A total of 3600 patients was randomly selected from the registry. They had a diagnosis of primary osteoarthritis, age ≥55 years at the time of surgery but ≤95 years at the time of mail-out, and either a medial unicompartmental, lateral unicompartmental, bilateral (same knee) unicompartmental or total knee arthroplasty. We excluded patients known to have subsequently required an excision arthroplasty, amputation or arthrodesis. The patients were randomly divided into 12 groups of 300, each being sent a combination of one general health and one disease/site-specific questionnaire (four general health questionnaires × three disease/site-specific questionnaires = 12 groups). A covering letter with instructions was included with a postage-paid return envelope and a third questionnaire enquiring about the length of time required and the need for assistance to complete the questionnaires. A reminder letter was sent two weeks later to non-responders.

The mean age of the patients at the time of mail-out was 77.7 years (57 to 94) and at the time of surgery 71.0 years (55 to 90); the mean follow-up time was 6.7 years (1 to 23). Of the sample, 2511 (69.8%) were female; 94.5% had not undergone revision surgery (removal, addition or exchange of a component), 2086 (57.9%) had had tricompartmental knee replacements, 1295 (36.0%) medial unicompartmental replacements, and 219 (6.1%) had had either a lateral unicompartmental or both compartments of the same knee replaced by a unicompartmental prosthesis. The adequacy of patient randomisation was confirmed by checking the distribution of these variables by questionnaire group.
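
The allocation described above (3600 patients, 12 groups of 300, one group per pairing of a general health and a disease/site-specific questionnaire) can be sketched in a few lines. This is only an illustration under assumptions: the function name `allocate`, the integer patient identifiers and the fixed seed are hypothetical, and the paper does not describe the software or procedure actually used for randomisation.

```python
import random
from itertools import product

GENERAL = ["NHP", "SF-12", "SF-36", "SIP"]
SPECIFIC = ["Lequesne", "Oxford-12", "WOMAC"]


def allocate(patient_ids, seed=0):
    """Shuffle patients and deal them evenly across the questionnaire pairs.

    Hypothetical helper: returns {(general, specific): [patient ids]}.
    """
    pairs = list(product(GENERAL, SPECIFIC))  # 4 x 3 = 12 combinations
    ids = list(patient_ids)
    if len(ids) % len(pairs) != 0:
        raise ValueError("sample size must be divisible by the number of groups")
    random.Random(seed).shuffle(ids)  # seeded only so the sketch is repeatable
    size = len(ids) // len(pairs)
    return {pair: ids[i * size:(i + 1) * size] for i, pair in enumerate(pairs)}


groups = allocate(range(3600))  # placeholder identifiers 0..3599
sent_sf12 = sum(len(g) for (general, _), g in groups.items() if general == "SF-12")
sent_womac = sum(len(g) for (_, specific), g in groups.items() if specific == "WOMAC")
print(len(groups), sent_sf12, sent_womac)
```

With this design each general health questionnaire goes to 900 patients and each disease/site-specific questionnaire to 1200, which is in line with the numbers received by patients in Table I once undeliverable packages and deceased patients are subtracted.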
Three weeks after the first mailing, 420 patients (60 patients × seven questionnaires) were randomly selected from those who had responded to the first mail-out and were sent one repeat questionnaire (generic or disease/site-specific) in order to test the reproducibility of each questionnaire.

The data were analysed by SPSS 8.0 (SPSS Inc, Chicago, Illinois). ANOVA was used when comparing the time required for completion data. The chi-squared test was used when comparing the ratio of returned to unreturned questionnaires, the ratio of complete to incomplete questionnaires, and the ratio of patients requiring assistance to those not requiring assistance, for each type of questionnaire. A p value of less than 0.05 was considered to be significant for all tests. Geometric means were calculated for each score.

Validity reflects how well the questionnaire works as a measure of the condition of interest. Content validity determines how well the questionnaire sampled the condition, and had three parameters: floor effect, ceiling effect and skew of the frequency distribution.26 Patients scoring the best possible score on a questionnaire cannot demonstrate improvement on a subsequent application of the same questionnaire, even if they have improved clinically. This is referred to as the floor effect. The ceiling effect is the opposite. The skew of the distribution of questionnaire scores is a quantitative indicator of how far the distribution deviates from that of a normal distribution. The larger the skew value, the greater is the deviation. Construct validity measures how the results correlate with a predefined hypothesis, or construct.

Reliability is a measure of a questionnaire’s stability over repeated applications when the condition measured remains unchanged. It was investigated by examining two parameters: test-retest and internal consistency. Test-retest reliability examines the extent to which a measure reproduces similar results on repeated trials27 and was calculated using the intraclass correlation coefficient.28 This is a value from 0 to 1, with a score of 0 indicating no reliability and a score of 1 perfect reliability. Cronbach’s alpha coefficient was calculated to investigate the internal consistency of the questions within each questionnaire.29 Internal consistency refers to what extent the individual questions within a questionnaire ask about a similar concept. Cronbach’s alpha coefficient ranges from 0 to 1, with a score of 0 indicating that the questions are enquiring about completely dissimilar concepts and a score of 1 that they are about the same concept.

Responsiveness is a measure of a questionnaire’s ability to detect a change in the condition of interest.

Feasibility was investigated by calculating the usable response rate for each questionnaire, as well as the time required for each patient to complete, and the need for assistance (burden).

A low questionnaire score usually reflects a patient’s favourable impression of their health status, while a high score is the opposite. The exception is for the SF-12 and the SF-36. For the purposes of this paper, the scores for the SF-12 and SF-36 have been inverted to aid comparison of questionnaire results.
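
The content validity and reliability statistics defined above can be written out compactly. The following is a minimal sketch, not the SPSS procedures used in the study: the function names are hypothetical, the skew is the plain moment coefficient (statistical packages usually apply a small-sample adjustment), and the intraclass correlation coefficient is assumed to be the one-way random-effects form for a single test and retest score per patient, which the paper does not specify. All example data are invented.

```python
from statistics import mean, pvariance


def floor_ceiling(scores, best, worst):
    """Percentage of respondents at the best (floor) and worst (ceiling) possible score."""
    n = len(scores)
    floor = 100.0 * sum(s == best for s in scores) / n
    ceiling = 100.0 * sum(s == worst for s in scores) / n
    return floor, ceiling


def skewness(scores):
    """Moment coefficient of skewness; 0 for a symmetric distribution."""
    m = mean(scores)
    n = len(scores)
    m2 = sum((s - m) ** 2 for s in scores) / n
    m3 = sum((s - m) ** 3 for s in scores) / n
    return m3 / m2 ** 1.5


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item answers per respondent."""
    k = len(rows[0])
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_oneway(pairs):
    """One-way random-effects ICC for (test, retest) score pairs."""
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    subject_means = [(a + b) / k for a, b in pairs]
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented scores on a 0 (best) to 100 (worst) scale, for illustration only.
scores = [0, 0, 0, 10, 20, 100]
print(floor_ceiling(scores, best=0, worst=100))
print(skewness(scores))
print(cronbach_alpha([[1, 2, 2], [2, 3, 3], [4, 4, 5], [5, 5, 4]]))
print(icc_oneway([(10, 12), (20, 19), (30, 33), (40, 38)]))
```

In line with the definition used in this paper, the floor is taken at the best possible score, so `best` and `worst` must be passed according to the scoring direction of each questionnaire.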
The Oxford-12 and the Lequesne questionnaires produce a single score, while the other questionnaires yield a number of subscores referred to as domains. Content validity and reliability results for questionnaires with domains were calculated by averaging the results from each domain.

Ranks were assigned for each of the above tested parameters for each questionnaire. An average rank for each questionnaire by class (general health or disease/site-specific) was generated.
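
The averaging and ranking steps can be illustrated with figures reported elsewhere in this paper: the NHP floor effects by domain (Table II), the mean skew values quoted in the Results, and the SF-12 ranks (Table III). The helper `rank_with_ties` is a hypothetical name, and giving tied values the mean of their ranks is an assumption, although it is consistent with the ranks of 1.5 shown in Table III.

```python
def rank_with_ties(values):
    """Rank values ascending (1 = best); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


# Floor effect (%) for the six NHP domains, from Table II.
nhp_floor = [58.37, 27.99, 49.59, 38.10, 25.11, 74.97]
nhp_floor_mean = round(sum(nhp_floor) / len(nhp_floor), 2)
print(nhp_floor_mean)  # 45.69, the NHP mean floor effect in Table II

# Mean skew for NHP, SF-12, SF-36 and SIP as quoted in the Results.
skew_ranks = rank_with_ties([1.4, 0.1, 0.1, 2.9])
print(skew_ranks)  # [3.0, 1.5, 1.5, 4.0]

# SF-12 ranks for the nine parameters, from Table III.
sf12_ranks = [1, 3, 1, 1, 1, 1, 1.5, 2, 4]
sf12_average = round(sum(sf12_ranks) / len(sf12_ranks), 1)
print(sf12_average)  # 1.7
```

For parameters where a higher value is better, such as the response rate or the intraclass correlation coefficient, the values would be negated before ranking so that rank 1 still denotes the best result.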

Results

Four weeks after the first mail-out 3052 (84.8%) patients responded by returning their questionnaire package with some attempt to complete the questions. In an additional 131 (3.6%) either the patients, family members or carers wrote back to explain why the questionnaires had not been completed. Eight envelopes were returned by the post office with the address unknown, leaving 409 (11.4%) of the questionnaire packages unaccounted for.

The response rates for the SF-12, SF-36 and the NHP (87.4%, 86.6% and 85.3%, respectively) were significantly higher than those for the SIP (81.4%; chi-squared test, p < 0.001; Table I). There was no difference in the response rates for the disease/site-specific questionnaires (Table I).

For the general health questionnaires the SF-12 had the highest percentage of completed questionnaires returned (75.4%; chi-squared test, p < 0.001); the SIP (67.9%) and the NHP (67.2%) were indistinguishable. The SF-36 had a significantly lower efficiency of completion (63.0%; chi-squared test, p < 0.001; Table I). Of the disease/site-specific questionnaires, the Oxford-12 had the highest percentage of completed questionnaires (89.4%) followed by the WOMAC (83.0%) and the Lequesne (79.1%) (Table I). The completion efficiencies for the disease/site-specific questionnaires were all significantly different (chi-squared test, p < 0.001).

Multiplying the response rate by the percentage of completed forms yields the net percentage of questionnaires which were returned completed. The highest net percentage for the general health questionnaires was for the SF-12 (65.9%), followed by the NHP (57.3%), the SIP (55.3%) and the SF-36 (54.6%) (Table I). The Oxford-12 was the highest for the disease/site-specific questionnaires (76.7%), followed by the WOMAC (70.5%) and the Lequesne (59.2%) (Table I).

Table I. Gross and net response rates for general health and disease/site-specific questionnaires

Questionnaire     Number received  Number    Gross % returned      % Completed†          Net % return‡
                  by patients*     returned  (95% CI)              (95% CI)              (95% CI)
General health
  NHP             896              764       85.3 (85.2 to 85.4)   67.2 (67.1 to 67.3)   57.3 (57.2 to 57.4)
  SF-12           895              782       87.4 (87.3 to 87.5)   75.4 (75.3 to 75.5)   65.9 (65.8 to 66.0)
  SF-36           899              779       86.6 (86.5 to 86.7)   63.0 (62.9 to 63.1)   54.6 (54.5 to 54.7)
  SIP             893              727       81.4 (81.3 to 81.5)   67.9 (67.8 to 68.0)   55.3 (55.2 to 55.4)
Disease-specific
  Lequesne        1194             1012      84.8 (84.7 to 84.9)   79.1 (79.0 to 79.2)   59.2 (59.1 to 59.3)
  Oxford-12       1194             1026      85.9 (85.8 to 86.0)   89.4 (89.3 to 89.5)   76.7 (76.6 to 76.8)
  WOMAC           1195             1014      84.9 (84.8 to 85.0)   83.0 (82.9 to 83.1)   70.5 (70.4 to 70.6)

* number of patients sent a questionnaire package minus those returned by post office or with note indicating that the patient was deceased
† percentage of questionnaires returned which were fully completed
‡ percentage returned multiplied by percentage completed

Of the general health questionnaires, the SIP required the most time to complete (23.2 minutes, 95% confidence interval (CI) 21.5 to 24.9) followed by the SF-36 (14.2 minutes, 95% CI 13.5 to 14.9), the NHP (10.5 minutes, 95% CI 9.8 to 11.2) and the SF-12 (7.7 minutes, 95% CI 7.2 to 8.2). The WOMAC required the most time to complete for the disease/site-specific questionnaires (11.7 minutes, 95% CI 11.0 to 12.4), followed by the Oxford-12 (9.6 minutes, 95% CI 9.1 to 10.1) and the Lequesne (8.2 minutes, 95% CI 7.7 to 8.7). The differences between all times for the general health and disease/site-specific questionnaires were significant (ANOVA, p < 0.0001).

Patients reported a significantly greater frequency (28.7%) of requiring assistance to complete the SF-36 compared with the other general health questionnaires (chi-squared test, p = 0.005). The SF-12 had the second highest reported frequency (23.8%) followed by the NHP (22.9%) and the SIP (21.0%). For the disease/site-specific questionnaires, similar frequencies for requiring assistance were observed (Lequesne 25.5%, WOMAC 23.3%, Oxford-12 22.7%).

Considerable floor effects were found for the NHP (45.7%) and the SIP (55.8%). The mean floor effect for the SF-36 (17.1%) was much lower, and the SF-12 had no floor effect (Table II). The highest floor effect for the disease/site-specific questionnaires was found for the WOMAC (18.3%), with lower values for the Lequesne and Oxford-12 (6.4% and 6.8%, respectively) (Table II).

A wide variation was seen for ceiling effects with the general health questionnaires. The SF-36 was the highest (12.5%), followed by the NHP (4.7%) and the SIP (4.4%). The SF-12 had no ceiling effect (Table II). Much lower ceiling effects were seen for the disease/site-specific questionnaires. The WOMAC (0.8%) was slightly higher than the Oxford-12 (0.1%) (Table II), and there was no ceiling effect for the Lequesne.

The distributions for the domains of the SIP and the NHP were, on average, quite skewed (2.9 and 1.4, respectively). The distributions for the SF-12 and SF-36 were less skewed and similar to each other (0.1). Generally, the distributions for the disease/site-specific questionnaires were less skewed than those for the general health questionnaires (Table II).

The mean intraclass correlation coefficients for the group of general health questionnaires ranged from 0.91 (NHP) to 0.75 (SF-36) (Table II). The intraclass correlation coefficients for the disease/site-specific questionnaires ranged from 0.94 (Oxford-12) to 0.85 (Lequesne). The Cronbach’s alpha coefficient for the SF-12 (0.62) was lower than all the others.

The SF-12 ranked best overall for the general health questionnaires, and the Oxford-12 ranked best overall for the disease/site-specific questionnaires, when the mean individual ranks for each parameter were considered (Table III).
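
The response-rate figures can be checked by hand. The sketch below recomputes the net percentage returned for the general health questionnaires from the gross and completed percentages in Table I, and then runs a Pearson chi-squared test on a single 2 × 2 comparison (SF-12 against SIP, returned against not returned). The pairwise comparison and the omission of a continuity correction are assumptions made for illustration; the paper does not state how its chi-squared tests were configured, and the function name is hypothetical.

```python
import math

# Gross % returned and % completed for the general health questionnaires (Table I).
table_i = {
    "NHP": (85.3, 67.2),
    "SF-12": (87.4, 75.4),
    "SF-36": (86.6, 63.0),
    "SIP": (81.4, 67.9),
}

# Net % return = gross % returned x % completed.
net = {name: round(gross * completed / 100, 1)
       for name, (gross, completed) in table_i.items()}
print(net)


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    return stat, p


# Returned versus not returned: SF-12 (782 of 895) against SIP (727 of 893).
stat, p = chi_squared_2x2(782, 895 - 782, 727, 893 - 727)
print(stat, p)
```

Under these assumptions the p value falls below 0.001, in line with the difference in response rate reported between the SF-12 and the SIP.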

Table II. Breakdown of reliability and construct validity factors, as well as scores by domain for general health and disease/site-specific questionnaires

Column groups: reliability (Cronbach's alpha, ICC); content validity (floor, ceiling, skew); scores (mean, possible score range).

Questionnaire                 Alpha*   ICC†   Floor   Ceiling  Skew    Mean‡ (95% CI)         Possible score range

General health
Nottingham Health Profile (n = 764)
  Emotional reaction          0.85     0.84   58.37   1.16     2.04    13.3 (11.6 to 15.0)    0 to 100
  Sleep                       0.72     0.89   27.99   2.79     1.10    25.1 (23.1 to 27.1)    0 to 100
  Energy                      0.64     0.91   49.59   19.61    0.68    33.8 (30.9 to 36.7)    0 to 100
  Pain                        0.85     0.95   38.10   2.77     1.18    23.0 (20.9 to 25.0)    0 to 100
  Physical mobility           0.80     0.97   25.11   1.56     0.71    28.2 (26.4 to 30.1)    0 to 100
  Social isolation            0.60     0.87   74.97   0.42     2.37    9.3 (7.9 to 10.7)      0 to 100
  Mean                        0.74     0.91   45.69   4.72     1.35    N/A
SF-12 (n = 782)
  Physical component summary  0.62     0.85   0.02    0.00     0.25    37.3 (36.4 to 38.1)    0 to 100
  Mental component summary    0.62     0.92   0.02    0.00     -0.42   49.7 (48.8 to 50.7)    0 to 100
  Mean                                 0.88   0.02    0.00     -0.09   N/A
SF-36 (n = 779)
  Physical functioning        0.90     0.89   0.79    5.83     -0.14   43.2 (41.2 to 45.2)    0 to 100
  Role-physical               0.88     0.57   21.32   49.50    -0.69   34.1 (31.1 to 37.0)    0 to 100
  Body pain                   0.92     0.86   17.70   3.37     -0.14   56.3 (54.1 to 57.6)    0 to 100
  General health              0.81     0.88   3.26    0.59     0.02    55.9 (54.1 to 57.6)    0 to 100
  Vitality                    0.82     0.69   3.26    1.88     0.08    52.9 (51.0 to 54.8)    0 to 100
  Social functioning          0.75     0.77   36.02   2.45     0.85    73.5 (71.4 to 75.5)    0 to 100
  Role-emotion                0.88     0.71   41.41   36.08    0.09    52.4 (49.1 to 55.7)    0 to 100
  Mental health               0.83     0.80   12.82   0.43     0.68    72.1 (70.5 to 73.8)    0 to 100
  Transition                  N/A      0.56   N/A     N/A      0.21    3.2 (2.3 to 4.2)       0 to 100
  Mean                        0.85     0.75   17.07   12.52    0.11    N/A
  Physical component summary  N/A      0.93   N/A     N/A      -0.29   33.3 (32.4 to 34.3)    0 to 100
  Mental component summary    N/A      0.82   N/A     N/A      0.43    47.9 (46.8 to 49.1)    0 to 100
Sickness Impact Profile (n = 727)
  Sleep and rest              0.62     0.81   40.54   0.57     1.84    22.4 (21.2 to 23.6)    0 to 100
  Emotional behaviour         0.80     0.96   68.49   0.72     3.16    23.4 (22.2 to 24.6)    0 to 100
  Body care and movement      0.88     0.87   42.67   0.58     2.37    18.5 (17.3 to 19.6)    0 to 100
  Home management             0.86     0.87   52.77   46.37    1.62    34.1 (32.3 to 35.8)    0 to 100
  Mobility                    0.81     0.89   59.97   0.88     2.59    23.0 (21.8 to 24.3)    0 to 100
  Social interaction          0.88     0.76   46.04   0.72     3.83    13.2 (12.3 to 14.2)    0 to 100
  Ambulation                  0.82     0.88   28.12   0.43     1.14    18.6 (17.3 to 20.0)    0 to 100
  Alertness behaviour         0.85     0.63   67.91   1.16     3.12    25.5 (24.2 to 26.8)    0 to 100
  Communication               0.75     0.73   75.00   0.58     4.27    19.8 (18.9 to 20.6)    0 to 100
  Work                        N/C      0.68   69.00   0.23     1.14    61.7 (58.8 to 64.5)    0 to 100
  Recreation and pastimes     0.71     0.85   37.98   0.30     1.25    27.6 (26.1 to 29.1)    0 to 100
  Eating                      0.84     0.52   80.97   0.59     8.64    2.3 (1.6 to 2.9)       0 to 100
  Mean                        0.80     0.79   55.79   4.43     2.91    N/A
  Physical dimension          N/A      0.92   28.15   0.30     2.18    12.3 (11.1 to 13.4)    0 to 100
  Psychosocial dimension      N/A      0.87   41.63   0.44     4.07    6.8 (5.9 to 7.6)       0 to 100
  Total score                 N/A      0.97   22.98   0.00     2.57    8.9 (7.9 to 9.9)       0 to 100

Disease/site-specific
Lequesne (n = 1012)           0.77     0.85   6.38    0.00     0.42    8.9 (8.6 to 9.3)       0 to 25
Oxford-12 (n = 1026)          0.93     0.94   6.76    0.11     0.73    25.5 (24.9 to 26.2)    12 to 60
WOMAC (n = 1014)
  Pain                        0.91     0.95   20.48   0.52     0.72    5.1 (4.8 to 5.4)       5 to 25
  Stiffness                   0.91     0.90   25.76   1.93     0.54    2.3 (2.2 to 2.4)       2 to 10
  Physical function           0.98     0.92   8.59    0.12     0.34    23.0 (21.9 to 24.2)    17 to 75
  Mean                        0.93     0.92   18.27   0.85     0.53    N/A

* Cronbach's alpha (see text)
† intraclass correlation coefficient (see text)
‡ geometric mean

Table III. Mean ranked values for general health and disease/site-specific questionnaires for each parameter (1 = highest rank, 4 = lowest rank)

Column groups: burden (time, help); feasibility (response, % comp); content validity (floor, ceiling, skew); reliability (ICC, C. alpha).

Questionnaire               Time  Help  Response  % Comp  Floor  Ceiling  Skew  ICC*  C. alpha†  Average rank

General health
  Nottingham Health Profile 2     2     3         3       3      3        3     1     3          2.6
  SF-12                     1     3     1         1       1      1        1.5   2     4          1.7
  SF-36                     3     4     2         4       2      4        1.5   4     1          2.8
  Sickness Impact Profile   4     1     4         2       4      2        4     3     2          2.9
Disease/site-specific
  Lequesne                  1     3     3         3       1      1        1     3     3          2.1
  Oxford-12                 2     1     1         1       2      2        3     1     1.5        1.6
  WOMAC                     3     2     2         2       3      3        2     2     1.5        2.3

* intra-class correlation coefficient (see text)
† Cronbach's alpha (see text)

Discussion

We have avoided comparing the construct validity of the questionnaires tested. Constructs are generated for each questionnaire because there are no established standards to which the results of outcome questionnaires can be compared. Paradoxically, because there is no established standard for knee arthroplasty, constructs are validated against other questionnaires or against the surgeon’s perception of the patient’s status. Validating one questionnaire against another involves circuitous logic, and the use of ‘objective’ ratings by surgeons introduces bias.30,31 Therefore, we elected to concentrate on the content validity as opposed to construct validity.

We also avoided comparing the responsiveness of the questionnaires tested, since the purpose of our study was to define questionnaires which would be appropriate for a cross-sectional, discriminative postal survey. Responsiveness is immaterial for this purpose.11 We did not limit this study to tricompartmental or primary arthroplasties intentionally so that the results could be applied to future postal surveys and to most patients who have been registered.

Previous published comparative studies have reported various aspects of specific outcome questionnaires. All questionnaires tested had a response rate which was higher than expected when compared with other published results.32-34 Stucki et al35 compared the SF-36 with the SIP on 54 patients undergoing elective total hip replacement. They also found large floor effects for the SIP, and concluded that it was a less relevant questionnaire than the SF-36 for total hip arthroplasty. This agrees with our results.

Beaton, Hogg-Johnson and Bombardier36 investigated the reliability and responsiveness of five general health questionnaires as applied to workers with musculoskeletal complaints. The questionnaires included the NHP, SF-36 and SIP. Reliability estimates (intraclass correlation coefficients) for these questionnaires were slightly higher than ours, which may reflect the fact that the population which was studied was younger. Their reliability estimates, however, ranked in the same order as ours (NHP>SIP>SF-36).

Essink-Bot et al14 compared four general health questionnaires, including the SF-36 and NHP, on a population suffering from migraine. They found the NHP to have better feasibility but a more skewed distribution, with a larger percentage of minimum scores, and lower internal consistency (Cronbach’s alpha) than the SF-36. These findings are in agreement with ours.

In conclusion, we found the SF-12 and the Oxford-12 to have the best overall ranking for general health and disease/site-specific questionnaires, respectively, and based on this, these are the questionnaires which we recommend for use in a cross-sectional discriminative application, at least in Sweden. The Lequesne, WOMAC, SF-36 and NHP performed satisfactorily. Based on poor performance over multiple parameters, we do not recommend the use of the SIP in this context.
We thank Jonas Ranstam for statistical consultation. The study was funded by grants from the Arthritis Society of Canada, the Medical Research Council of Sweden (Project 9509), Stiftelsen för bistånd åt vanföra i Skåne, the Medical Faculty of the University of Lund, and Socialstyrelsen.

No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article.

References

1. Robertsson O, Dunbar MJ, Knutson K, Lewold S, Lidgren L. The Swedish Knee Arthroplasty Register: 25 years experience. Bull Hosp Jt Dis 1999;58:133-8.
2. Robertsson O, Dunbar M, Knutson K, Lewold S, Lidgren L. Validation of the Swedish Knee Arthroplasty Register: a postal survey regarding 30 376 knees operated on between 1975 and 1995. Acta Orthop Scand 1999;70:467-72.
3. Apley AG. An assessment of assessment. J Bone Joint Surg [Br] 1990;72-B:957-8.
4. Lieberman JR, Dorey F, Shekelle P, et al. Outcome after total hip arthroplasty: comparison of a traditional disease-specific and a quality-of-life measurement of outcome. J Arthroplasty 1997;12:639-45.
5. Ritter MA, Albohm MJ, Keating EM, Faris PM, Meding JB. Comparative outcomes of total joint arthroplasty. J Arthroplasty 1995;10:737-41.
6. Bombardier C, Melfi CA, Paul J, et al. Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care 1995;33 Suppl 4:131-44.
7. Wiklund I, Romanus B. A comparison of quality of life before and after arthroplasty in patients who had arthrosis of the hip joint. J Bone Joint Surg [Am] 1991;73-A:765-9.
8. Rissanen P, Aro S, Sintonen H, Slatis P, Paavolainen P. Quality of life and functional ability in hip and knee replacements: a prospective study. Qual Life Res 1996;5:56-64.
9. McGrory BJ, Harris WH. Can the Western Ontario and McMaster Universities (WOMAC) osteoarthritis index be used to evaluate different hip joints in the same patient? J Arthroplasty 1996;11:841-4.
10. Kreibich DN, Vaz M, Bourne RB, et al. What is the best way of assessing outcome after total knee replacement? Clin Orthop 1996;331:221-5.
11. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis 1985;38:27-36.
12. Schroder HM, Kristensen PW, Petersen MB, Nielsen PT. Patient survival after total knee arthroplasty: 5-year data in 926 patients. Acta Orthop Scand 1998;69:35-8.
13. Ries MD, Philbin EF, Groff GD, et al. Improvement in cardiovascular fitness after total knee arthroplasty. J Bone Joint Surg [Am] 1996;78-A:1696-701.
14. Essink-Bot ML, Krabbe PF, Bonsel GJ, Aaronson NK. An empirical comparison of four generic health status measures: The Nottingham Health Profile, the Medical Outcomes Study 36-item Short-Form Health Survey, the COOP/WONCA charts, and the EuroQol instrument. Med Care 1997;35:522-37.
15. Hunt SM, McKenna SP, McEwen J, et al. A quantitative approach to perceived health status: a validation study. J Epidemiol Community Health 1980;34:281-6.
16. Wiklund I, Romanus B, Hunt SM. Self-assessed disability in patients with arthrosis of the hip joint: reliability of the Swedish version of the Nottingham Health Profile. Int Disabil Stud 1988;10:159-63.
17. Hunt SM, McKenna SP, McEwen J, Williams J, Papp E. The Nottingham Health Profile: subjective health status and medical consultations. Soc Sci Med A 1981;15:221-9.
18. Ware J Jr, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care 1996;34:220-33.
19. Ware JE, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. conceptual framework and item selection. Med Care 1992;30:473-83.
20. Brazier JE, Harper R, Jones NM, et al. Validating the SF-36 health survey questionnaire: new outcome measure for primary care (see comments). BMJ 1992;305:160-4.
21. Sullivan M, Karlsson J, Ware JE Jr. The Swedish SF-36 Health Survey–I: evaluation of data quality, scaling assumptions, reliability and construct validity across general populations in Sweden. Soc Sci Med 1995;41:1349-58.
22. Pollard WE, Bobbitt RA, Bergner M, Martin DP, Gilson BS. The Sickness Impact Profile: reliability of a health status measure. Med Care 1976;14:146-55.
23. Lequesne MG, Mery C, Samson M, Gerard P. Indexes of severity for osteoarthritis of the hip and knee: validation value in comparison with other assessment tests. Scand J Rheumatol Suppl 1987;65:85-9.
24. Dawson J, Fitzpatrick R, Murray D, Carr A. Questionnaire on the perceptions of patients about total knee replacement. J Bone Joint Surg [Br] 1998;80-B:63-9.
25. Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988;15:1833-40.
26. Hays RD, Anderson R, Revicki D. Psychometric considerations in evaluating health-related quality of life measures. Qual Life Res 1993;2:441-9.
27. Ware JE, Snow KK, Kosinski M, Gandek B. SF-36 Health Survey: Manual and Interpretation Guide. Boston: Nimrod Press, 1993.
28. Bland JM, Altman DG. Measurement error and correlation coefficients. BMJ 1996;313:41-2.
29. Bland JM, Altman DG. Cronbach’s alpha. BMJ 1997;314:572.
30. Ryd L, Dahlberg L. On bias. Acta Orthop Scand 1994;65:499-504.
31. Ryd L, Kärrholm J, Ahlvin P, and the score assessment group. Knee scoring systems in gonarthrosis: evaluation of interobserver variability and the envelope of bias. Acta Orthop Scand 1997;68:41-5.
32. Asch DA, Christakis NA. Different response rates in a trial of two envelope styles in mail survey research. Epidemiology 1994;5:364-5.
33. McHorney CA, Kosinski M, Ware JE Jr. Comparisons of the costs and quality of norms for the SF-36 health survey collected by mail versus telephone interview: results from a national survey. Med Care 1994;32:551-67.
34. Plant P, McEwen J, Prescott K. Use of the Nottingham Health Profile to test the validity of census variables to proxy the need for health care. J Public Health Med 1996;18:313-20.
35. Stucki G, Liang MH, Phillips C, Katz JN. The Short Form-36 is preferable to the SIP as a generic health status measure in patients undergoing elective total hip arthroplasty. Arthritis Care Res 1995;8:174-81.
36. Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol 1997;50:79-93.
