Assessing Change Over Time in Patients With Low Back Pain

Research Report


Background and Purpose. This study compared the ability of the Roland-Morris (RM), Oswestry (OSW), and Jan van Breemen Institute (JVB) pain and function questionnaires to detect change over time. Subjects. The sample consisted of 88 patients with mechanical low back pain who were referred by physicians to the outpatient physical therapy department of a teaching hospital. Methods. Questionnaires were completed by the subjects at their initial visit and 4 to 6 weeks later. Clinically important change was estimated by having the subject and the clinician independently complete two rating scales. Sensitivity to change was assessed using receiver operating characteristic (ROC) curve analysis. Results. The ROC curve areas for the RM (0.79), OSW (0.78), and JVB pain (0.79) questionnaires were significantly greater than for the JVB function questionnaire (0.66). Blank and multiple responses per item were present on approximately 20% of the OSW questionnaires and 14% of the JVB questionnaires. Words rather than checks were evident on 3% of the RM questionnaires. Conclusion and Discussion. Based on the latter finding, we believe the RM questionnaire may be the preferred instrument for assessing change over time in patients with low back pain. [Stratford PW, Binkley J, Solomon P, et al. Assessing change over time in patients with low back pain. Phys Ther. 1994;74:528-533.]

Paul W Stratford, Jill Binkley, Patricia Solomon, Caroline Gill, Elspeth Finch

Key Words: Backache, Evaluation, Functional status, Measurement scales.

PW Stratford, MSc, is Assistant Professor, School of Occupational Therapy and Physiotherapy, McMaster University, and Consultant, West End Physiotherapy Clinic, Hamilton, Ontario, Canada. Address all correspondence to Mr Stratford at School of Occupational Therapy and Physiotherapy, Health Sciences Centre (Rm 1J11), McMaster University, 1200 Main St W, Hamilton, Ontario, Canada L8N 3Z5.

J Binkley, MClSc, COMP, is Assistant Professor, Department of Physical Therapy, North Georgia College, Barnes Hall, Rm A-6, Dahlonega, GA 30597, and Orthopaedic Clinical Specialist, St Joseph's Hospital, Dahlonega, GA 30533. At the time of this study, she was Assistant Professor, School of Occupational Therapy and Physiotherapy, McMaster University, and Clinical Specialist, Physiotherapy Department, St Joseph's Hospital.

P Solomon, MHSc, is Assistant Professor, School of Occupational Therapy and Physiotherapy, McMaster University.

C Gill, Dip PT, is Acting Senior Physiotherapist, St Joseph's Hospital.

E Finch, MHSc, is Assistant Professor, School of Occupational Therapy and Physiotherapy, McMaster University. At the time of this study, she was Assistant Professor, Division of Physical Therapy, Department of Rehabilitation Medicine, University of Toronto, and Research Associate, Department of Rehabilitation, Orthopaedic and Arthritic Hospital, Toronto, Ontario, Canada M5V 1A6.

This project was funded by the McGregor Clinic Fund of the Hamilton Foundation.

This article was submitted June 28, 1993, and was accepted January 14, 1994.

Physical Therapy/Volume 74, Number 6/June 1994

Low back pain is a condition for which outcomes such as death and cure are usually inappropriate.1 In recognition of the deficiencies associated with these measures, investigators have sought alternate strategies for assessing patient outcome. One approach, the functional status or disability questionnaire, has received attention over the past decade. Also during this period, an increased understanding of methodological criteria for assessing functional status measures has been advanced.

Kirshner and Guyatt2 have offered a convenient classification scheme that divides measures into (1) those that discriminate among patients at a single point in time, (2) those that predict a subsequent event or outcome, and (3) those that evaluate longitudinal change in a patient. From a measurement perspective, reliability is the required property of the first measure, validity (at a single point in time) is the required property of the second measure, and the ability to assess change over time is the required property of the third measure.1 Specifically, an evaluative measure should yield small variation between replicate measures (reliability), display a strong relationship in change scores between itself and external measures (validity), and have the power to detect a clinically important change (responsiveness).

Numerous functional status questionnaires have been proposed for patients with low back pain (LBP), and Deyo1 has provided a thoughtful summary of the measurement properties of many of these questionnaires. From his review, it is evident that a moderate amount of information concerning the reliability and validity (concurrent at a single point in time) of these questionnaires is available. Furthermore, it is clear that only limited inquiry has been directed toward identifying the evaluative properties of these measures. Clinical intervention trials are expensive, both in terms of time and money, and the anticipated therapeutic response is modest at best (d-index of 0.34).3 (The d-index is a measure of effect size. It represents the between-intervention group difference divided by the common standard deviation.3) Accordingly, information concerning the evaluative property of outcome measures is crucial for those who plan and conduct clinical trials.

Perhaps the greatest information concerning the evaluative property of functional status questionnaires related to LBP exists for the Sickness Impact Profile (SIP)4 and its abbreviated adaptation, the Roland-Morris (RM) questionnaire.5 The SIP was developed to provide a behaviorally based measure of perceived health status and consists of 136 items that can be self-administered or administered by an interviewer in about 25 minutes.4 Furthermore, the SIP was constructed to be applicable across a spectrum of illnesses and among various demographic and cultural subgroups.
Two important facets of the SIP are its ability to detect longitudinal change within a group and its proficiency in determining differences in health status among groups.

The RM questionnaire consists of 24 items chosen from the SIP to cover a variety of activities of daily living, is self-administered, and takes about 5 minutes to complete.5 In order to improve the specificity of the response, the phrase "because of my back" was added to each item. An item receives a score of 1 if it is checked as applicable by the respondent and a score of 0 if it is not marked. Thus, total scores can vary from 0 (no disability) to 24 (severe disability). Both of these measures have demonstrated high test-retest reliability (Pearson r) (SIP r=.81, RM r=.83) and moderate levels of validity when correlated with the patient's self-rating of pain (SIP r=.38, RM r=.42).6 Furthermore, the RM questionnaire has demonstrated moderate correlations (r=.50-.63) with other functional assessment and pain disability indexes.7

Both the SIP and RM questionnaires have been shown to reflect positive changes in patients over a 3- to 4-week period.6,8 In an attempt to examine the validity of the change, difference scores (ie, pretest score minus posttest score) have been correlated with a three-point (better, unchanged, worse) clinical rating scale. Surprisingly, the correlations were quite low (RM r=.16, SIP r=[illegible]). These results could signal that the questionnaires cannot be used to detect clinically meaningful change. Alternatively, it could be argued that the low correlation coefficients result from the insensitivity of the three-point scale. (Presumably, most patients would indicate they were better; however, the question is: How much better?) In support of this conjecture, Solomon and colleagues9 have reported a correlation of .63 when RM questionnaire difference scores were correlated with a 15-point global rating of change scale.
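The RM scoring rule and the difference score described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names and the item numbers in the example are hypothetical, and items are assumed to be identified by the integers 1 to 24.

```python
RM_ITEM_COUNT = 24  # the RM questionnaire has 24 items


def rm_score(checked_items):
    """Total RM score: 1 point per checked item, 0 for each unmarked item."""
    items = set(checked_items)  # a repeated item number counts once
    if not all(1 <= i <= RM_ITEM_COUNT for i in items):
        raise ValueError("item numbers must lie between 1 and 24")
    return len(items)


def difference_score(pretest, posttest):
    """Pretest score minus posttest score; positive values mean less disability."""
    return pretest - posttest


# Hypothetical responses from one patient (item numbers are illustrative only).
initial = rm_score([1, 2, 3, 5, 8, 9, 11, 12, 14, 17, 20, 22])
follow_up = rm_score([2, 5, 8, 11, 14, 17, 20])
change = difference_score(initial, follow_up)
print(initial, follow_up, change)
```

Because unmarked items simply contribute 0, a blank questionnaire scores 0 (no disability) and a fully checked one scores 24 (severe disability).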
In a subsequent study using the receiver operating characteristic (ROC) curve analysis, Deyo and Centor10 reported SIP and RM questionnaire areas of 0.70 and 0.72, respectively, for the criterion return to full activity and corresponding areas of 0.61 and 0.67 for a criterion based on a patient-clinician consensus using a six-point improvement scale. (An area of 0.50 represents accuracy equivalent to chance alone in identifying patients who improve and those who do not improve, whereas an area of 1.00 represents perfect accuracy.) Follick and colleagues11 have also reported significant SIP change scores for patients with LBP receiving treatment compared with wait-listed patients not receiving treatment.

A number of competing disability questionnaires suitable for patients with LBP exist. Two popular alternatives to those discussed are the Oswestry (OSW) questionnaire12 and the Jan van Breemen Institute (JVB) pain and functional capacity questionnaire.13 The OSW questionnaire is divided into 10 sections, each with six response statements.1 Each section is scored on a six-point scale (0-5), and the overall score is expressed as a percentage. Thus, scores can vary from 0% (no disability) to 100% (a great deal of disability). This questionnaire is self-administered and takes about 5 minutes to complete. The OSW questionnaire has demonstrated a high level of test-retest reliability (r=.99) when assessed on consecutive days and has displayed significant positive change over a 3-week period in a group of patients with a high likelihood of spontaneous recovery.12

The JVB questionnaire consists of two sets of questions: Six questions address pain, and nine questions deal with functional capacity. Each question is scored on an 11-point scale (0-10). Thus, the pain scale scores can vary from 0 (no pain) to 60 (maximum pain), and the functional scale scores can vary from 0 (poor function) to 90 (excellent function).13 This questionnaire can be self-administered, and our own experience suggests that a patient can complete the questionnaire in less than 5 minutes. Lankhorst et al13 have shown that the JVB questionnaire has a high level of test-retest and interrater reliability (about .90 for each of the pain and function sections). Existing work has not examined the validity or responsiveness of this instrument.

Deyo points out that "while the development of new instruments should be encouraged where necessary, we may hope that investigators will not reinvent the wheel."1(p105*) Furthermore, Deyo acknowledges that there are few data to support one questionnaire being superior to the rest, and he calls for direct comparisons of existing instruments. The purpose of our study was to compare the ability of the RM, OSW, and JVB questionnaires to detect change over time. We hypothesized that there would be no difference in the ability of these questionnaires to perform this task.
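The scoring rules given above for the OSW and JVB questionnaires can be illustrated with a short Python sketch. Two points are assumptions rather than statements from the text: the OSW percentage is taken to be the section total divided by the maximum possible total of 50, and each JVB scale score is taken to be the simple sum of its question scores. The function names are hypothetical.

```python
def osw_percent(section_scores):
    """OSW score as a percentage: 10 sections, each scored 0-5.

    Assumption: percentage = 100 * (sum of sections) / 50.
    """
    if len(section_scores) != 10:
        raise ValueError("the OSW questionnaire has 10 sections")
    if any(not 0 <= s <= 5 for s in section_scores):
        raise ValueError("each section is scored from 0 to 5")
    return 100 * sum(section_scores) / 50


def jvb_scale_score(question_scores, n_questions):
    """JVB scale score: sum of questions, each scored 0-10.

    Use n_questions=6 for the pain scale (range 0-60) and
    n_questions=9 for the function scale (range 0-90).
    """
    if len(question_scores) != n_questions:
        raise ValueError("unexpected number of questions")
    if any(not 0 <= q <= 10 for q in question_scores):
        raise ValueError("each question is scored from 0 to 10")
    return sum(question_scores)


# Hypothetical responses.
print(osw_percent([2, 3, 1, 2, 2, 3, 2, 1, 2, 2]))       # OSW percentage
print(jvb_scale_score([5, 6, 4, 7, 5, 6], 6))             # JVB pain
print(jvb_scale_score([4, 5, 6, 3, 5, 4, 6, 5, 4], 9))    # JVB function
```

Note the opposite orientation of the two JVB scales: higher pain scores mean more pain, whereas higher function scores mean better function.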

Method

Subjects

Eighty-eight consecutive patients with mechanical LBP who were referred by physicians to the outpatient physical therapy department of a large teaching hospital were eligible for this study. Patients were included in this study if they (1) were diagnosed as having musculoskeletal LBP, (2) could read English, and (3) provided written informed consent. Fifty of the patients were male, and the average age of the entire sample was 41 years (SD=11.6). The mean duration of symptoms was 48 days (SD=36). Of the 88 consecutive patients fulfilling the eligibility criteria, 76 patients had sustained work-related injuries.

Design

The patients completed the RM, OSW, and JVB questionnaires prior to commencing physical therapy and following 4 to 6 weeks of treatment. The order of administration of the questionnaires was randomly assigned to the subjects in balanced blocks of three. The instructions provided with the questionnaires were identical to those published by their authors. We must emphasize that the aim of this study was to compare the ability of the questionnaires to measure change over time and not to evaluate the treatment received by the patients.

The administration of physical therapy and the 4- to 6-week interval served as our construct for change. An estimate of true and meaningful change was obtained by having the patient and the clinician independently complete two 15-point global rating scales (integer values varying from -7 to +7). One scale addressed the magnitude of change, and the other scale measured the importance of the change. Specifically, this instrument assessed whether the patient was better, about the same, or worse. If the response was either better or worse, the respondent indicated which of the following terms best described the change: (1) tiny bit, almost the same; (2) a little bit; (3) somewhat; (4) moderately; (5) quite a bit; (6) a great deal; and (7) a very great deal. This transitional scale has been reported previously.9,14 Clinicians completed their rating after assessing the patients at follow-up, but prior to the patients completing the questionnaires. The patients completed their questionnaires within 1 day of the clinician's assessment. The patients and clinicians were unaware of each other's responses.

Data Analysis

Means and standard deviations were calculated for initial, follow-up, and change scores for the three questionnaires. The criterion standard for change was obtained by averaging the patient's and clinician's ratings of change and the importance of change.
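As a rough illustration of the criterion standard described above, the sketch below averages the four global ratings for one patient. Exactly how the ratings were combined is not spelled out here, so a plain arithmetic mean of the patient's and clinician's change and importance ratings is an assumption, as are the function and parameter names.

```python
def criterion_change(patient_change, patient_importance,
                     clinician_change, clinician_importance):
    """Criterion change score for one patient.

    Assumption: the criterion is the arithmetic mean of the four
    15-point global ratings, each an integer from -7 to +7.
    """
    ratings = (patient_change, patient_importance,
               clinician_change, clinician_importance)
    for r in ratings:
        if not isinstance(r, int) or not -7 <= r <= 7:
            raise ValueError("ratings are integers from -7 to +7")
    return sum(ratings) / len(ratings)


# Hypothetical ratings for one patient.
print(criterion_change(6, 5, 6, 7))
```

Averaging yields a score that is no longer restricted to integers, which is one reason a rank-based statistic is a natural choice when the criterion is later correlated with questionnaire change scores.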

Change scores for the RM, OSW, and JVB pain questionnaires were calculated by subtracting the follow-up score from the initial score. The JVB questionnaire function scale change score was computed by subtracting the initial score from the follow-up score. Accordingly, a positive change score represented improvement for all scales. The validity of the change scores was analyzed two ways. In order to preserve the highest scale property of the clinician-patient criterion change score (ie, ordinal rather than nominal), Spearman rank-order correlation coefficients were calculated between the criterion score and the change score of each of the three questionnaires. The second analysis method involved calculating ROC curves.10 A ROC curve plots sensitivity (y-axis) against 1-specificity (x-axis). Sensitivity is defined as the number of patients correctly identified (by a given questionnaire) as having undergone a clinically important change divided by all patients who truly underwent a clinically important change. Specificity refers to the number of patients who were correctly identified (by a given questionnaire) as not undergoing a clinically important change divided by all patients who truly did not undergo a clinically important change. The greater the area under the curve, the greater the questionnaire's ability to distinguish patients who did and did not undergo clinically important change. The area under the curve can

Table 1. Descriptive Statistics of Initial, Follow-up, and Change Scores

Questionnaire(a)    Initial         Follow-up       Change          Percentage
                    Mean    SD      Mean    SD      Mean    SD      of Change
RM                  11.8    6.2     7.1     5.7     4.7     5.0     40
JVB (pain)          34.5    11.8    21.9    14.1    12.6    13.3    37
JVB (function)      40.8    16.8    57.2    18.4    16.4    17.7    33(b)
OSW                 40.5    17.8    24.4    15.5    16.1    15.9    40

(a) RM=Roland-Morris, JVB=Jan van Breemen, OSW=Oswestry.
(b) Because of the scale orientation for this measure, the percentage of change was calculated as (100 x change score/[90 - initial score]).

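The percentage-of-change column in Table 1 can be checked from the reported means. The sketch below assumes that, for scales on which a high score is bad, the percentage is 100 x change/initial score, and that the JVB function footnote reads 100 x change/(90 - initial score); both readings reproduce the reported values. The function name and the dictionary layout are illustrative only.

```python
def percent_change(initial_mean, change_mean, scale_max=None):
    """Percentage of change relative to the room available for improvement.

    If scale_max is None, a high score is bad and the room for
    improvement is the initial score itself. Otherwise a high score is
    good and the room for improvement is scale_max - initial score.
    """
    room = initial_mean if scale_max is None else scale_max - initial_mean
    return 100 * change_mean / room


# (initial mean, change mean, scale maximum if high scores are good)
table_1 = {
    "RM": (11.8, 4.7, None),
    "JVB (pain)": (34.5, 12.6, None),
    "JVB (function)": (40.8, 16.4, 90),
    "OSW": (40.5, 16.1, None),
}

for name, (initial, change, scale_max) in table_1.items():
    print(name, round(percent_change(initial, change, scale_max)))
```

Rounded to whole numbers, the four values come out as 40, 37, 33, and 40, matching the table.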

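Table 2. Change Score Correlations

Columns: JVB (pain), JVB (function), OSW, Criterion Change. Rows: RM, JVB (pain), JVB (function), OSW.(a)

[Only three coefficients survive in the source: 0.59 and 0.63, listed after the RM row label, and 0.56, whose position is unclear. The remaining cells are missing.]

(a) RM=Roland-Morris, JVB=Jan van Breemen, OSW=Oswestry.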

be interpreted as the probability of correctly identifying a patient who has undergone a clinically important change from randomly selected pairs of patients who have and have not undergone an important change.10 This approach requires the criterion rating of change to be dichotomized. Based on our clinical experience, we chose a cutoff point of ≤5 to represent small and clinically unimportant change and scores of >5 to represent important change.9 Specifically, we have observed that patients who report global change scores of ≤5 continue to seek treatment. The relative discriminatory ability of the questionnaires was evaluated by comparing the area under the curve using the approach described by Centor.15 Two-tailed probability values were calculated for the correlational and ROC comparisons, and the critical value was set at .05.
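The definitions of sensitivity, specificity, and ROC curve area given above can be made concrete with a small Python sketch. It is not the authors' analysis: the data are invented, the function names are hypothetical, and the area is computed directly from its pairwise interpretation (the proportion of improved/not-improved pairs in which the improved patient has the larger change score, with ties counted as one half). The statistical comparison of areas attributed to Centor is not implemented.

```python
def sensitivity_specificity(change_scores, improved, cutoff):
    """Sensitivity and specificity when 'change score >= cutoff' is read as improved."""
    pairs = list(zip(change_scores, improved))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)
    fn = sum(1 for s, y in pairs if y and s < cutoff)
    tn = sum(1 for s, y in pairs if not y and s < cutoff)
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_area(change_scores, improved):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(change_scores, improved) if y]
    neg = [s for s, y in zip(change_scores, improved) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical criterion ratings and RM change scores for eight patients.
criterion = [6.5, 5.5, 6.0, 3.0, 5.0, 1.5, 7.0, 4.0]
rm_change = [9, 4, 7, 2, 5, 0, 10, 6]

# Dichotomize the criterion: scores above 5 count as important change.
improved = [c > 5 for c in criterion]

print(roc_area(rm_change, improved))
print(sensitivity_specificity(rm_change, improved, cutoff=5))
```

Sweeping the cutoff over every observed change score and plotting sensitivity against 1-specificity traces the ROC curve itself; the pairwise calculation gives the same area without drawing the curve.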

Results

Figure. Receiver operating characteristic (ROC) curves for the Roland-Morris (RM), Jan van Breemen (JVB), and Oswestry (OSW) questionnaires. A ROC curve plots sensitivity (y-axis) against 1-specificity (x-axis).

The correlations between the therapists' and patients' global rating of change and importance of change were .76 and .72, respectively (P<.05). Table 1 provides a summary of the initial, follow-up, and change scores. Similar percentage of change scores (approximately 40%) were noted for the RM, JVB pain, and OSW scales; however, the JVB function scale score showed less change. Table 2 shows the correlations between the criterion measure of change and the RM, JVB pain, and OSW questionnaire change scores. The correlation between the JVB function scale score and the criterion measure was less than for the JVB pain (P