Arthritis & Rheumatism (Arthritis Care & Research) Vol. 49, No. 5S, October 15, 2003, pp S113–S133 DOI 10.1002/art.11414 © 2003, American College of Rheumatology

MEASURES OF QUALITY OF LIFE

Adult Measures of Quality of Life

The Arthritis Impact Measurement Scales (AIMS/AIMS2), Disease Repercussion Profile (DRP), EuroQoL, Nottingham Health Profile (NHP), Patient Generated Index (PGI), Quality of Well-Being Scale (QWB), RAQoL, Short Form-36 (SF-36), Sickness Impact Profile (SIP), SIP-RA, and World Health Organization’s Quality of Life Instruments (WHOQoL, WHOQoL-100, WHOQoL-Bref)

Alison Carr

ARTHRITIS IMPACT MEASUREMENT SCALES (AIMS/AIMS2) General Description Purpose. Disease-specific measure of physical, social, and emotional well-being designed as a measure of outcome in arthritis (1). Content. There are 9 scales: mobility, physical activity (walking, bending, lifting), dexterity, household activity (managing money and medications, housekeeping), social activities, activities of daily living, pain, depression, and anxiety. AIMS2 includes arm function, social support, and work. Developer/contact information. AIMS was developed by Robert F. Meenan, Dean, Boston University School of Public Health, 715 Albany St., T-C-306, Boston, MA 02118. E-mail: rmeenan@bu.edu. Versions. There is an original version, a shortened version, an expanded version (AIMS2), a short form of the AIMS2 (AIMS2-SF), a child version, and a version for the elderly (Geri-AIMS). AIMS has been translated into many languages including Portuguese, Canadian French, Italian, Spanish, French, Dutch, Swedish, Turkish, and Norwegian.

Alison Carr, MSc, PhD: University of Nottingham, United Kingdom. Address correspondence to Alison Carr, MSc, PhD, Special Lecturer in Musculoskeletal Epidemiology, University of Nottingham, Academic Rheumatology Clinical Sciences Building, City Hospital, Nottingham, NG5 1PB, UK. E-mail: [email protected]. Submitted for publication June 9, 2003; accepted June 19, 2003.

Number of items in scale. AIMS 45, Shortened AIMS 18, AIMS2 101, and AIMS2-SF 26. Subscales. AIMS 9 (listed above); shortened AIMS 9; AIMS2 12. Populations. Developmental/target. Developed in patients with rheumatoid arthritis and osteoarthritis to assess the outcome of health care. Other uses. AIMS has been used in other conditions including psoriatic arthritis, ankylosing spondylitis, fibromyalgia, carpal tunnel syndrome, Colles’ fracture, and hemophilia, and in patients undergoing joint replacement surgery. A 1-page summary of results has been developed for use in clinical practice. WHO ICF Components. Activity limitation, Participation restriction.

Administration Method. Self-administered and relatively easy to complete. Training. None required. Time to administer/complete. AIMS 15 minutes, Shortened AIMS 6–8 minutes, AIMS2 20–30 minutes, AIMS2-SF 10 minutes. Equipment needed. None. Availability/cost. Available with user manual from Dr. Meenan (contact information above). Also available at http://www.qolid.org. (Click on free access/disease-specific measures/rheumatology/AIMS2/public domain access/copy of original AIMS2 question/copy of the user’s manual.) Dutch and Italian translations available at same website.

Scoring Responses. Scale. Guttman. Score range. Range is 0–10 for each section. Total health score 0–60. Interpretation of scores. Zero represents good health status, 10 and 60 represent poor health status. Method of scoring. Each section contains a Guttman Scale (a series of questions/statements that are graded so that endorsement of one level of disability automatically indicates disability on all levels below it). In AIMS the number of response options within the Guttman scales varies across sections. In AIMS2, the response format has been standardized across sections to 5-point scales. For scoring, the Guttman scaling is ignored and each item is scored separately without weights. Higher scores indicate greater disability. The score for each section is standardized to a 0–10 scale using a standardization formula. The total health score is calculated by summing the standardized scores for mobility, physical and household activities, dexterity, pain, and depression. Time to score. Scoring by hand takes around 10 minutes. Computerized scoring can be completed in seconds. Training to score. Minimal training is required for scoring. Users’ guides are available. Training to interpret. No specific training is required for interpretation of scores but familiarity with the range and direction of scoring is helpful. Norms available. None.
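The arithmetic of the AIMS total health score can be sketched as follows. This is an illustration only, not the published scoring algorithm: the standardization formula is not stated here, so the linear rescaling in `standardize` is an assumption, and the function names and section keys are hypothetical.

```python
# Sketch of AIMS section and total health scoring.
# ASSUMPTION: a simple linear rescale to 0-10. The text only says that
# "a standardization formula" is used and does not state it.

# The six sections that the text says are summed for the total score.
TOTAL_SCORE_SECTIONS = (
    "mobility",
    "physical_activity",
    "household_activity",
    "dexterity",
    "pain",
    "depression",
)


def standardize(raw, raw_min, raw_max):
    """Rescale a raw section score to 0-10 (0 = good health, 10 = poor)."""
    if raw_max <= raw_min:
        raise ValueError("raw_max must exceed raw_min")
    if not raw_min <= raw <= raw_max:
        raise ValueError("raw score outside its possible range")
    return 10.0 * (raw - raw_min) / (raw_max - raw_min)


def total_health_score(section_scores):
    """Sum the six standardized section scores, giving a 0-60 total."""
    missing = [s for s in TOTAL_SCORE_SECTIONS if s not in section_scores]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return sum(section_scores[s] for s in TOTAL_SCORE_SECTIONS)
```

Under these assumptions, a respondent scoring at the midpoint of every section would obtain 5 on each section and 30 on the total health score.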

Psychometric Information Reliability. AIMS. Guttman scale coefficients for scalability >0.6. Guttman scale coefficients for reproducibility >0.9. Internal consistency via Cronbach’s alpha >0.60 for each of the 9 sections. Test-retest correlations between 2 administrations over a 2-week period in several studies >0.80. Shortened AIMS. Internal consistency and test-retest reliability similar to the original AIMS. AIMS2. Internal consistency via Cronbach’s alpha over the 9 sections ranges from 0.72 to 0.91.

Carr

Test-retest intraclass correlation coefficients range from 0.65 to 0.90 over a 10-day period, and from 0.78 to 0.94 over a 3-week period. AIMS2-SF. Test-retest intraclass correlation coefficients >0.70 over a 1-week period. Validity. AIMS content validity. Items in AIMS are based on the content of the Rand Health Survey Questionnaires, the Quality of Well-Being Scale, and Katz’s Index of Activities of Daily Living. Items on dexterity and pain were added. Factor analysis identified 3 factors (physical function, psychological, and pain), which have been replicated in subsequent studies. AIMS construct. Relevant subscales of AIMS correlate strongly with other measures of the construct (e.g., physical activity AIMS scale with Health Assessment Questionnaire (HAQ), AIMS pain scale with HAQ pain scale, AIMS and Functional Status Questionnaire). Physical functioning AIMS scales correlate more strongly with measures of disease activity than AIMS psychological or social scales. All scales correlate with increasing age (i.e., reduced function with increasing age). AIMS2 content. Derived from AIMS but expanded to include arm function, social support, and work, giving a 5-factor structure (lower extremity function, upper extremity function, affect, pain, and social interaction). AIMS2 criterion. Moderate correlations with general health status measures: NHP, SIP, and Short Form-36 (SF-36). AIMS2 construct. Moderate, expected correlations with disease activity (swollen joint count, pain visual analog scale (VAS), and erythrocyte sedimentation rate). AIMS2-SF content. Derived from AIMS2 using Delphi and nominal group techniques. Principal components factor analysis confirmed the same 5-factor structure as AIMS2. AIMS2-SF criterion. Comparison between AIMS2 and AIMS2-SF using the Bland and Altman method for measuring agreement found almost complete agreement. Moderate correlations with other general health status measures (MHAQ, SF-36, SIP) were very similar to the correlations between these measures and AIMS2.

Adult Quality of Life

AIMS2-SF construct. Correlations with clinical and disease factors were moderate and as expected. Responsiveness/sensitivity to change. Responsiveness of AIMS is better than that of most other generic and disease-specific measures (SIP, QWB, HAQ, Functional Status Index, McMaster Health Index). AIMS2 and AIMS2-SF have similar responsiveness. Standardized response means for changes in AIMS2-SF scores over 3 months range from 0.36 (small) to 0.8 (high).
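For readers unfamiliar with the statistic, a standardized response mean is the mean of the individual change scores divided by the standard deviation of those change scores. The sketch below shows that calculation. The function name is hypothetical, and taking change as baseline minus follow-up (so that improvement on a scale where higher scores mean greater disability is positive) is an assumption made for this illustration.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of change for paired scores.

    ASSUMPTION: change = baseline - follow_up, so that improvement on a
    scale where higher scores mean greater disability is positive.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    if len(baseline) < 2:
        raise ValueError("at least two patients are needed")
    changes = [b - f for b, f in zip(baseline, follow_up)]
    sd_change = stdev(changes)  # sample standard deviation
    if sd_change == 0:
        raise ValueError("SD of change is zero; SRM is undefined")
    return mean(changes) / sd_change
```

Because the denominator is the variability of change rather than of baseline scores, the statistic is large when most patients change by a similar amount.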

Comments and Critique The AIMS is a widely used disease-specific measure that has a broad scope, measuring many aspects of health status. It is more responsive in patients with arthritis than any of the generic measures. The revised version, AIMS2, has good psychometric properties and the advantage of including measures of satisfaction with health and patients’ priorities for improvement. The full-length versions are quite time consuming to complete, and the short form (AIMS2-SF), which has similar psychometric properties to the full-length versions, may be more appropriate for postal surveys, studies where patients are required to complete several questionnaires, and clinical practice.

References 1. (Original) Meenan RF, Gertman PM, Mason JH. Measuring health status in arthritis: the Arthritis Impact Measurement Scales. Arthritis Rheum 1980;23:146–52.

Additional References Meenan RF, Mason JH, Anderson JJ, Guccione AA, Kazis LE. AIMS2: the content and properties of a revised and expanded Arthritis Impact Measurement Scales health status questionnaire. Arthritis Rheum 1992;35:1–10. Guillemin F, Coste J, Pouchot J, Ghezail M, Bregeon C, Sany J, and the French Quality of Life in Rheumatology Group. The AIMS2-SF: a short form of the Arthritis Impact Measurement Scales 2. Arthritis Rheum 1997;40:1267–74. Kazis LE, Anderson JJ, Meenan RF. Health status information in clinical practice: the development and testing of patient profile reports. J Rheumatol 1988;15:338–44. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopaedic evaluation. Med Care 1990;28:632–42.

DISEASE REPERCUSSION PROFILE (DRP) General Description Purpose. An individualized measure of perceived disadvantage resulting from illness. Designed specifically for use in routine clinical practice in arthritis (1). Content. Perceived disadvantage is measured in 6 domains: physical activity, social activity, socioeconomic status, relationships, emotions, and body image. In each domain, patients are asked to specify the problem or problems they are currently experiencing and to rate the degree of importance that those problems have for them. The domains are standardized but the specific problems within domains are unique to each individual completing the questionnaire. Developer/contact information. Developed by Alison Carr, Department of Academic Rheumatology, University of Nottingham, Clinical Sciences Building, City Hospital, Nottingham NG5 2PR, UK. Versions. One version, and adaptations to scoring (1997). Number of items in scale. There are 6 items. Subscales. A profile measure with 6 domains listed above. Populations. Developmental/target. Developed for use in routine clinical practice with patients with rheumatoid arthritis (RA), to identify specific problems that might be amenable to intervention and to assess the effectiveness of care. Other uses. Adapted for use in osteoarthritis, low back pain, upper limb problems, osteoporosis, and patients undergoing joint replacement surgery. Used in evaluative studies as a measure of outcome. It has also been used in clinical practice as the basis for goal-setting for patients with RA. WHO ICF Components. Activity limitation, Participation restriction.

Administration Method. Self-administered. Easy to complete for all age groups and severity of disease. Training. None required. Time to administer/complete. Time is 2–15 minutes depending on the number of problems patients are experiencing. Problems are specified using free text. Equipment needed. None. Availability/cost. Available from Dr. Carr at the above address. No charge for research/clinical use. Also available at the Arthritis Care & Research Web site at http://www.interscience.wiley.com/jpages/0004-3591:1/suppmat/index.html.

Scoring Responses. Scale. Ordinal. Score range. Range is 0–10 for each domain. Interpretation of scores. Produces a profile of disadvantage across the 6 domains. The profile is presented as a bar chart. Zero represents no problems or no perceived disadvantage resulting from problems, and 10 represents severe disadvantage. Method of scoring. Each domain has a screening question asking whether any problems are currently experienced in this area. If yes, patients are asked to specify the problem and its consequences using free text. They are then asked to rate the importance of the consequences. Screening questions are coded 0 for no problem, 1 for yes. Domain scores are calculated by multiplying the screening question by the importance rating. These are plotted on a bar chart. The qualitative information is not scored but is used as the basis for setting treatment goals. Time to score. Scoring by hand takes around 1 minute. Training to score. Minimal. Training to interpret. Minimal. The bar chart profile enables problem areas to be identified at a glance. The questionnaire can then be reviewed to identify the specific problems experienced by the patient. Norms available. None.

Psychometric Information Reliability. Internal consistency. Assessment of internal consistency is not appropriate in this sort of profile measure where each domain is distinct and items within the domain are individually specified by patients. Test-retest. Good reliability over a 1-week period. No differences in profile scores.

Validity. Content. The domains covered were determined by a 2-step process: qualitative in-depth interviews with patients, followed by a large-scale postal survey of patients to confirm the content specified in the interviews. Construct. Moderate correlations between individual domains and other measures of the same construct (e.g., functional activity with HAQ disability index and SF-36 physical function score; social activity with SF-36 social function score; relationships with Quality of Social Support Scale; emotions with Hospital Anxiety and Depression Scale; and body image with Body Satisfaction Scale). Socioeconomic scores are higher in unemployed and invalid-retired patients than in employed or retired patients. Weak correlations between measures of disease activity (C-reactive protein [CRP], early morning stiffness, Ritchie Index, pain) and most domains, demonstrating the conceptual difference between impairment and disadvantage. Responsiveness/sensitivity to change. More data needed from clinical trials. As responsive to change in clinical practice as other measures of health status (HAQ and SF-36). A ceiling effect for some domains reflects the design of the tool as an individualized measure for use in clinical practice (i.e., a minority of patients report problems in some domains which may be important in clinical practice).

Comments and Critique The DRP is different from most other general health status measures, being individualized rather than standardized and designed specifically for use in clinical practice. This means it may need to be assessed using methods that differ slightly from traditional psychometric methods. This problem is common to other individualized measures such as the Patient Generated Index and the Schedule for the Evaluation of Individualized Quality of Life (SEIQoL) and may account for some problems in assessing the construct validity of individualized measures. It has been successfully used as the basis for a goal-setting approach to disease management in RA, and its main strength remains its use in routine clinical practice, although it has also been used as an outcome measure in evaluative studies.
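The DRP domain scoring described under Scoring (screening code multiplied by importance rating) can be sketched as below. This is a minimal illustration: the domain keys, the function name, and the input layout are hypothetical, and the 0–10 bound on the importance rating is inferred from the stated 0–10 domain score range.

```python
# Sketch of DRP profile scoring: domain score = screening (0/1) x importance.
# ASSUMPTION: importance ratings run from 0 to 10, inferred from the stated
# 0-10 range of each domain score.

DRP_DOMAINS = (
    "physical_activity",
    "social_activity",
    "socioeconomic_status",
    "relationships",
    "emotions",
    "body_image",
)


def drp_profile(responses):
    """Return a dict of domain scores.

    `responses` maps a domain name to (has_problem, importance). Domains
    that are absent are treated as "no problem" for this sketch.
    """
    profile = {}
    for domain in DRP_DOMAINS:
        has_problem, importance = responses.get(domain, (False, 0))
        screening = 1 if has_problem else 0  # coded 0 = no problem, 1 = yes
        if screening and not 0 <= importance <= 10:
            raise ValueError(f"importance out of range for {domain}")
        # The importance rating only counts when a problem was reported.
        profile[domain] = screening * (importance if screening else 0)
    return profile
```

The free-text problem descriptions are deliberately left out of the calculation, since the text states that the qualitative information is not scored.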

References 1. (Original) Carr AJ. A patient-centered approach to evaluation and treatment in rheumatoid arthritis: the development of a clinical tool to measure patient-perceived handicap. Br J Rheumatol 1996;35:921–32.

Additional References Sharpe L, Sensky T, Brewin CR, Allard S. Characteristics of handicap for patients with recent onset rheumatoid arthritis: the validity of the Disease Repercussion Profile. Rheumatology 2001;40:1169–74. Carr AJ. Beyond disability: measuring the social and personal consequences of osteoarthritis. Osteoarthritis Cartilage 1999;7:230–8.

EUROQOL/EQ-5D General Description Purpose. Generic measure of health-related quality of life designed for use in evaluative studies to allow comparison across patient groups. It is also a utility measure that can assess patients’ preferences for health states for use in economic analyses (1). Content. In its simplest form for use in clinical research, it assesses quality of life in 5 dimensions: mobility, self-care, usual activity, pain/discomfort, and anxiety/depression. It also includes an overall assessment of patients’ perception of their health state. Developer/contact information. The EuroQol (EQ-5D) was developed by the EuroQol Group in 1990. The Group is an international network of researchers from different disciplines. Its administrative office is in Rotterdam: Frank de Charro, Centre for Health Policy and Law, Erasmus University, PO Box 1738, 3000 DR Rotterdam, the Netherlands. E-mail: [email protected]. Versions. The EQ-5D was developed simultaneously in English, Dutch, Finnish, Norwegian, and Swedish. It has also been translated into 20 other languages including most European languages. Number of items in scale. There are 5 items plus the 0–100 graphic rating scale for overall health status. Subscales. Each of the 5 items and the general health status scale can act as subscales. Populations. Developmental/target. Developed for use in evaluative studies in all patients. Other uses. Health economic studies, population surveys, healthcare audit, needs assessment studies. WHO ICF Components. Activity limitation, Participation restriction.

Administration Method. Self-completed. Suitable for postal surveys. Very quick and easy to complete. Training. None required. Time to administer/complete. 2–5 minutes. Equipment needed. None required. Availability/cost. Available from: Frank de Charro at Erasmus University (address above) or Professor Paul Kind: Centre for Health Economics, University of York, York YO1 5DD UK. E-mail: [email protected]. Also available through: http://www.euroqol.org. There is no charge for its use in clinical research. Users are encouraged to register and share their results with the EuroQoL Group.

Scoring Responses. Scale. Ratio. Score range. Index score 0–1, general health status score 0–100. Interpretation of scores. For the index score, 1 represents full health and 0, death. On the general health status scale, 0 represents the worst and 100 the best imaginable health state. Method of scoring. EQ-5D can give 3 different scores: a profile, a weighted health index, and the general health status score. Each of the 5 items/dimensions is divided into 3 levels (no problem, some problem, and extreme problem). To produce the profile score, the levels for each item are simply represented as a 5-digit number, each digit representing the level of difficulty on one of the dimensions. To calculate the weighted index across the 5 dimensions, population weights are assigned for each level of difficulty selected and are subtracted from 1. In the UK, the population weights were obtained from a health survey in 1993. To obtain a general health status score, patients are asked to place a horizontal line across the point on a 100-point graphic rating scale that corresponds to their current health state and the score is read directly from the scale. Time to score. The profile and general health state scoring can be done by hand and takes 1–2 minutes. Computer software can be used to calculate the weighted index. Training to score. Minimal. User manual available. Training to interpret. Minimal. Norms available. In the UK, norms are available by age group, sex, and socioeconomic class.
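The profile and weighted index calculations described under Method of scoring can be sketched as follows. The weights below are hypothetical placeholders chosen only to make the example run; they are not the UK population weights or any published value set, and real value sets may include terms beyond simple per-level decrements. Coding the three levels as 1, 2, and 3 follows the usual EQ-5D convention; the dimension keys and function names are illustrative.

```python
# Sketch of EQ-5D profile and weighted index scoring.
# Levels: 1 = no problem, 2 = some problem, 3 = extreme problem.

DIMENSIONS = (
    "mobility",
    "self_care",
    "usual_activity",
    "pain_discomfort",
    "anxiety_depression",
)

# HYPOTHETICAL decrements for illustration only; NOT a published value set.
EXAMPLE_WEIGHTS = {
    "mobility": {1: 0.0, 2: 0.05, 3: 0.30},
    "self_care": {1: 0.0, 2: 0.05, 3: 0.20},
    "usual_activity": {1: 0.0, 2: 0.05, 3: 0.10},
    "pain_discomfort": {1: 0.0, 2: 0.10, 3: 0.35},
    "anxiety_depression": {1: 0.0, 2: 0.05, 3: 0.25},
}


def _check_levels(levels):
    for dim in DIMENSIONS:
        if levels.get(dim) not in (1, 2, 3):
            raise ValueError(f"level for {dim} must be 1, 2, or 3")


def profile_score(levels):
    """Return the 5-digit profile, one digit per dimension (e.g. '21111')."""
    _check_levels(levels)
    return "".join(str(levels[dim]) for dim in DIMENSIONS)


def weighted_index(levels, weights):
    """Subtract the weight for each selected level from 1."""
    _check_levels(levels)
    return 1.0 - sum(weights[dim][levels[dim]] for dim in DIMENSIONS)
```

With any set of weights that assigns zero to level 1, a respondent reporting no problems on all 5 dimensions has profile 11111 and an index of 1.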

Psychometric Information Reliability. Test-retest. Intraclass correlations for test-retest reliability over a 1 week period range from 0.69 to 0.94. Validity. Content. The questionnaire was devised by a group of experts, based on knowledge of existing health status measures. The aim for content was brevity but with the most important dimensions of quality of life represented. It is therefore not exhaustively comprehensive but is designed to be used in conjunction with other quality of life measures. Criterion. Correlates with SF-36 and the Health Utilities Index. Construct. Patterns of responses across recent users of healthcare, age group, gender and socioeconomic group are as expected. Correlations with specific measures of individual domains (HAQ, Hospital Anxiety and Depression Questionnaire, pain VAS) are moderately strong. Responsiveness/sensitivity to change. Responsiveness to change has been demonstrated, but the EQ-5D is less responsive than the SF-36, and in specific conditions (e.g., back pain), it is less responsive than disease-specific measures. There are also reports of a ceiling effect in the function dimension with 95% of respondents at the ceiling for function.

Comments and Critique The EQ-5D has become an increasingly popular measure of health status. Its attractions include its simplicity and length, which make it quick and easy to complete, and the potential for its use as a utility measure in economic analyses. For these reasons it is widely used in clinical trials of treatment interventions. Its brevity means that, on its own, it would probably not provide an adequate assessment of quality of life in a study where quality of life is a major outcome, but it can be, and often is, used alongside more detailed quality of life questionnaires. It is probably also more appropriate in studies in which large health changes are expected. Although some users in rheumatic disease have found the EQ-5D valid and responsive, others have been critical of its inability to discriminate patients with moderate morbidity, the restricted distribution of scores, and differences between patient and societal utility tariffs. This last point is an increasingly recognised phenomenon: people with chronic or severe diseases may value the quality of their lives in different ways from the healthy population. This “disability paradox” explains the high quality of life ratings of severely disabled or seriously ill patients and has implications for the appropriateness of societally determined weights.

References 1. (Original) EuroQol Group. EuroQol: a new facility for the measurement of health-related quality of life. Health Policy 1990;16:199–208.

Additional References Brazier JE, Harper R, Munro J, Walters SJ, Snaith ML. Generic and condition-specific outcome measures for people with osteoarthritis of the knee. Rheumatology 1999;38:870–7. Johnson JA, Coons SJ, Ergo A, Szava-Kovats G. Valuation of EuroQol (EQ-5D) health states in an adult US sample. Pharmacoeconomics 1998;13:421–33. Kind P, Dolan P, Gudex C, Williams A. Variations in population health status: results from a United Kingdom national questionnaire survey. BMJ 1998;316:736–41. Polsky D, Wilkie RJ, Scott K, Schulman KA, Glick HA. A comparison of scoring weights for the EuroQol derived from patients and the general public. Health Econ 2001;10:27–37. Wolfe F, Hawley DJ. Measurement of the quality of life in rheumatic disorders using the EuroQol. Br J Rheumatol 1997;36:786–93.

NOTTINGHAM HEALTH PROFILE (NHP) General Description Purpose. Generic measure of general health status designed for use in primary care settings (1). Content. It measures functional ability, pain, sleep, energy, emotional problems, and participation (work and social activities). Developer/contact information. Sonja Hunt, Galen Research, Enterprise House, Manchester Science Park, Lloyd St. North, Manchester M15 6SE, UK. Versions. There is an original and a revised version and it has been translated into most European languages and Arabic. Number of items in scale. There are 38 items (in Part 1). Subscales. The questionnaire is in 2 parts. Part 1 has 6 subscales: physical activities (8 items); pain (8 items); sleep (5 items); social isolation (5 items); emotional reactions (9 items); and energy level (3 items). Part 2 is optional and assesses handicap in 7 items (occupation, household tasks, personal relationships, sex life, social life, holidays, and hobbies). Populations. Developmental/target. Developed in patients with acute and chronic illness. Designed for use in primary care settings. Other uses. The NHP has been widely used in clinical trials in secondary care settings and in population surveys. WHO ICF Components. Activity limitation, Participation restriction.

Administration Method. Self-administered. The yes/no response options have the apparent advantage of simplicity, but there are reports that some patients find it frustrating to be limited to these options and may make arbitrary responses as a result. Training. None required. Time to administer/complete. 10–15 minutes. Equipment needed. None required. Availability/cost. Available from Galen Research, Enterprise House, Manchester Science Park, Lloyd St. North, Manchester M15 6SE, UK. Users must be registered with Galen Research and there is a small charge for its use. Instrument may be viewed at http://www.medal.org/adocs/docs_ch1/doc_ch1.07.html.

Scoring Responses. Scale. Interval. Each item has dichotomous yes/no responses. Score range. The range is 0–100 for each section. The overall score is 0–1. Interpretation of scores. For the section scores: 0 represents no problems, 100 means all the items within the section have been selected (i.e., maximum impact of condition on health status). For the overall score the meaning is reversed: 0 represents a poor health status score, and 1 means good health status. Method of scoring. The questionnaire consists of a series of statements about current health status. Patients answer yes or no to each statement within a section. In Part 1, each item has a preassigned weight and the weights of the positive responses within each section are summed to give a score out of 100. The overall score for Part 1 can be generated in several different ways. The simplest of these is to calculate what proportion of the 38 items have positive responses and then subtract this from 1. This gives an unweighted overall score, but comparison with methods that generated weighted overall scores found little difference between them. Items in Part 2 are unweighted, and it is scored by the number of positive responses. Time to score. Scoring by hand can take up to 10 minutes. If data are entered onto a computer, use of scoring algorithms makes scoring quick and easy. Training to score. Some training required if the questionnaire is to be scored by hand. Minimal if computer scoring is used. Training to interpret. None required. 0–100 scores make intuitive sense. The handicap score is more difficult to interpret and requires guidance or familiarity. Norms available. For healthy people by age group, social class, and sex and for some patient groups. Norms should be used with caution because of concerns about the lack of smooth trends in health status across expected groups.
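The two Part 1 calculations described under Method of scoring can be sketched as below. This is an illustration under stated assumptions: the item weights in the usage note are hypothetical placeholders (the real preassigned weights are not reproduced here), the function names are invented for the example, and the item count of 38 is taken from the Part 1 subscale listing (8 + 8 + 5 + 5 + 9 + 3).

```python
# Sketch of NHP Part 1 scoring.

PART1_ITEM_COUNT = 38  # 8 + 8 + 5 + 5 + 9 + 3 items across the 6 subscales


def section_score(answers, weights):
    """Sum the weights of the items answered 'yes' within one section.

    `answers` is a list of booleans and `weights` the matching item
    weights, which are expected to total 100 within a section.
    """
    if len(answers) != len(weights):
        raise ValueError("one weight is needed per item")
    return sum(w for yes, w in zip(answers, weights) if yes)


def unweighted_overall_score(all_answers):
    """1 minus the proportion of Part 1 items answered 'yes'."""
    if len(all_answers) != PART1_ITEM_COUNT:
        raise ValueError(f"expected {PART1_ITEM_COUNT} answers")
    positives = sum(1 for yes in all_answers if yes)
    return 1.0 - positives / PART1_ITEM_COUNT
```

For example, with hypothetical weights of 40, 35, and 25 for the 3 energy items, answering yes to the first and third gives a section score of 65. A respondent answering yes to half of the Part 1 items has an unweighted overall score of 0.5.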

Psychometric Information Reliability. Test-retest correlation coefficients for repeated administrations over a 4-week period were reported as 0.75–0.88 for Part 1 and 0.44–0.86 for Part 2. In patients with musculoskeletal disease, the intraclass correlation coefficient was 0.95. Validity. Content. The questionnaire is based on patients’ values and descriptions of the impact of disease on their quality of life rather than professional views, ensuring good content validity. Criterion validity. The NHP shows moderate correlations with other measures of general health status (SF-36, SIP, Dartmouth COOP charts, and AIMS) and is less good than those measures at detecting minor disability, indicating a floor effect. Construct. Moderate to strong correlations with clinical and disease measures (McGill Pain Questionnaire, physician ratings, Ritchie index, pain VAS) in expected directions. Discriminates between patients before and after total hip replacement surgery; between those patients and their well spouses; between patients with physical and mental handicaps; between stroke patients and healthy controls; and between different severities of the same disease. However, it was unable to discriminate differences in the severity and frequency of angina. Responsiveness/sensitivity to change. Data on the responsiveness of NHP are mixed. It does not seem to be responsive to small changes in health (probably because of the dichotomous response scaling), and the score distribution in patients with COPD was skewed (>50% of patients had the best score). This and the floor effects (46% of people in a community survey reported no problems on the NHP) probably mean it is best avoided in health surveys, studies of mild disability/disease severity, or where treatment effects are expected to be small.

Comments and Critique Before the development of the SF-36, the NHP was one of the most widely used health status measures in Europe. It was an innovative measure at the time of its development, capturing the patient’s perception of their health status. Although it is still a valuable tool in some situations, it should be used with caution in healthy populations (or those with mild disability) because of the floor effects, and there have been suggestions that it is not suitable for use in health surveys because the range of disability covered by each item is uneven. Its system of weighting and scoring has also been criticized, particularly in people whose disabilities limit their roles. The norms are considered of limited value as standards because there are no smooth trends in health status across expected groups.

References 1. (Original) Hunt SM, McEwen J. The development of a subjective health indicator. Sociol Health Illness 1980;2:231–46.

Additional References Donovan JL, Frankel SJ, Eyles JD. Assessing the need for health status measures. J Epidemiol Community Health 1993;47:158–62. Jenkinson C. Why are we weighting? A critical examination of the use of item weights in a health status measure. Soc Sci Med 1991;32:1413–6. Kind P, Carr-Hill R. The Nottingham Health Profile: a useful tool for epidemiologists? Soc Sci Med 1987;25:905–10.

PATIENT GENERATED INDEX (PGI) General Description Purpose. Generic, individualized measure of quality of life (1). Content. Based on Calman’s definition of health-related quality of life as the gap between expectations and reality. It asks patients to list the 5 most important areas/activities in their life that are affected by their condition and assesses the severity of impact of their condition on these areas/activities. Developer/contact information. Developed by Dr. Danny A. Ruta, Department of Epidemiology and Public Health, Ninewells Hospital and Medical School, Ninewells Road, Dundee, DD1 9SY, UK. Versions. Interviewer and self-completed versions. Number of items in scale. There are 5 items. Subscales. Profile measure where each area/activity reported constitutes a scale. Populations. Developmental/target. Designed to assess quality of life in evaluative studies across all conditions. Other uses. It has been used in population health surveys. WHO ICF Components. Not applicable.

Administration Method. Interviewer-administered and self-completed. The self-completed version uses a prompt list to encourage patients to specify their 5 areas/activities. Generally easy to complete although some patients have difficulty with the valuation exercise in which they are asked to allocate 60 points between their 5 areas on the basis of which is most important. Difficulties in self-completion have been reported among elderly and disabled patients. Training. Minimal training required for interviewer-administered version. Time to administer/complete. 10–20 minutes. Equipment needed. None. Availability/cost. Available at no charge from Dr. Ruta at the above address. It is also in the public domain (reference 1).

Scoring
Responses. Scale. Interval.
Score range. Range is 0–100 for each of 5 domains, 0–100 for the overall quality of life index.
Interpretation of scores. The scores represent the extent to which reality falls short of expectations. Zero represents a situation that is the worst imaginable, 100 represents a situation that is as good as the patient wants it to be.
Method of scoring. Having specified the 5 areas/activities affected by their condition that are of most importance to them, patients then rate the degree of impact of their condition in each of these 5 areas on a 0–100 scale. They are then asked to distribute 60 points between the 5 areas to reflect their relative importance. The overall index is calculated by multiplying the rating for each area by the points allocated to that area and summing the 5 areas.
Time to score. Hand scoring is relatively quick and easy and takes a few minutes.
Training to score. Minimal.
Training to interpret. Scores of 0–100 are readily interpreted.
Norms available. None.
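The PGI scoring method described above can be illustrated with a minimal Python sketch. The function name `pgi_index`, the example ratings, and the example points are hypothetical. The division by the 60 allocated points is an assumption made here so that the weighted sum lands on the stated 0–100 range; the text itself describes only the multiplication and summation.

```python
def pgi_index(ratings, points, total_points=60):
    """Return an overall index on a 0-100 scale from area ratings and points.

    ratings: impact rating for each area on the 0-100 scale.
    points:  importance points given to each area (must sum to total_points).
    """
    if len(ratings) != len(points):
        raise ValueError("each rated area needs a points allocation")
    if any(not 0 <= r <= 100 for r in ratings):
        raise ValueError("ratings must lie on the 0-100 scale")
    if any(p < 0 for p in points) or sum(points) != total_points:
        raise ValueError(f"points must be non-negative and sum to {total_points}")
    # Multiply each rating by its points and sum across areas, as in the text.
    weighted_sum = sum(r * p for r, p in zip(ratings, points))
    # Assumption: divide by the total points to rescale the sum to 0-100.
    return weighted_sum / total_points


# Hypothetical patient: five areas rated 0-100, 60 points spread by importance.
example_ratings = [40, 70, 20, 90, 50]
example_points = [20, 10, 15, 5, 10]
print(pgi_index(example_ratings, example_points))
```

Under this assumption, a patient who rates every area at 50 obtains an index of 50 regardless of how the points are distributed, which is one way to check a hand calculation.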

Psychometric Information
Reliability. Internal consistency. As with the DRP, the nature and structure of the scale make assessment of internal consistency inappropriate.
Test-retest. The original evaluation reported Pearson correlations of 0.7 over a 2-week period. More recent evaluation suggests some elderly patients incorrectly interpret scoring instructions, lowering reliability estimates (intraclass correlation coefficient [ICC] = 0.55). Reliability was still good in patients who had interpreted the instructions correctly (ICC = 0.67).
Validity. Content. Content is specified by each individual patient, thereby ensuring validity for each patient.
Construct. Discriminates between patients with mild, moderate, and severe arthritis, between users and non-users of health services, and between primary care and specialist-referred patients with the same condition. Correlates with general health status measures (SF-36 and AIMS) and with symptom severity.
Responsiveness/sensitivity to change. Data on responsiveness are mixed. It is more responsive to clinical change in back pain than disease-specific instruments and more responsive to change following therapy in patients with sleep apnea than the EQ-5D or SF-36. However, it was unable to detect changes in health in a study of 1,027 patients with arthritis.

Comments and Critique
The PGI is an innovative measure, one of a handful that attempt to capture the individual nature of quality of life. Unlike many standardized quality of life measures, it is based on an underlying theory of what constitutes quality of life. Its main drawback is the difficulty some patients have with the valuation exercise. Where quality of life is to be measured in elderly or disabled patients, it may be worth using the interviewer-administered form to overcome these problems. Disease-specific trigger lists for the PGI are being developed for use in some rheumatic conditions (for example, ankylosing spondylitis) in an attempt to increase responsiveness and reduce problems with completion. These versions are still under evaluation.

References
1. (Original) Ruta DA, Garratt AM, Leng M, Russell IT, MacDonald LM. A new approach to measurement of quality of life: The Patient Generated Index. Med Care 1994;32:1109–26.

Additional References
MacDuff C, Russell E. The problem of measuring change in individual health-related quality of life by postal questionnaire: use of the patient generated index in a disabled population. Qual Life Res 1998;7:761–9.
Ruta DA, Garratt AM, Russell IT. Patient centred assessment of quality of life for patients with four common conditions. Qual Health Care 1999;8:22–9.
Tully M, Cantrill J. The test-retest reliability of the modified Patient Generated Index. J Health Serv Res Policy 2002;7:81–9.
Tully MP, Cantrill J. The validity of the modified patient generated index: a quantitative and qualitative approach. Qual Life Res 2000;9:509–20.

QUALITY OF WELL-BEING SCALE (QWB)
General Description
Purpose. Generic measure of general health status that places individuals on a continuum of wellness from death to full asymptomatic function. It combines this index with an assessment of prognosis and mortality to estimate quality-adjusted life years (QALYs). It was designed as a measure of outcome for evaluative studies and in needs assessment exercises (1).
Content. Symptoms and problems and current status in terms of mobility, physical activity, and social activity.
Developer/contact information. Developed by J. W. Bush and R. M. Kaplan. Contact R. M. Kaplan, PhD, Professor and Chief, Division of Health Care Sciences 0622, School of Medicine, University of California, San Diego, La Jolla, CA 92093-0622.
Versions. Original (interviewer-administered) with several modifications (most recent version from 1994), a self-completed version, a child version, and a version with simplified scoring (Functional Status Index).
Number of items in scale. There are 30 items.
Subscales. Quality of life index has 3 items; Symptom/problem complexes have 27 items.
Populations. Developmental/target. Patients with all types of disease. Designed to enable comparison between effectiveness of treatment across different disease groups. Other uses. Evaluative studies (clinical trials).
WHO ICF Components. Not applicable.

Administration
Method. Interviewer administered. Self-completion of the original version was associated with poorer ability to detect disabilities. The newer self-completed version is still being assessed.
Training. Two weeks of training for the interviewer-administered version.
Time to administer/complete. 30 minutes.
Equipment needed. None.
Availability/cost. Interview schedule and manual available at cost from Dr. Kaplan at address above. Also see http://medicine.ucsd.edu/fpm/hoap/instruments.html.

Scoring
Responses. Scale. Ratio.
Score range. Quality of well-being index 0–1.
Interpretation of scores. Zero represents death, 1 represents full, asymptomatic functioning.
Method of scoring. Each of the 3 items in the functional status section of the questionnaire has several levels reflecting the severity of impact of disease on functioning (i.e., mobility [3 levels], physical activity [3 levels], social activity [5 levels]). Each level has a pre-assigned weight representing the social undesirability of that health state. Patients are categorized by level of severity within each section. The presence of any of the 27 symptoms/problems is recorded (whether or not it affects functional ability). As with the levels of functional status, each symptom/problem complex has a pre-assigned weight. Where there are multiple symptoms, only the most undesirable is scored. The QWB is calculated from the following formula:
W (QWB) = 1 – (mobility level × mobility weight) – (physical activity level × physical activity weight) – (social activity level × social activity weight) – (symptom × symptom weight).
QALYs associated with the QWB score are calculated by multiplying the QWB score by the amount of time spent in that state. Where several measurements are available over time, this calculation is repeated for each change in health state and the results summed to give QALYs. Where there are data available on the probabilities of transition to a better or worse level of functioning for the specific disease/treatment being assessed, the QWB score is adjusted to reflect this prognosis.
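The QWB and QALY calculations described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the weight tables are placeholder values invented for the example and are not the published QWB weights, the function names are hypothetical, and each "level × weight" term in the formula is read here as the pre-assigned weight of the level the patient occupies.

```python
# Placeholder weights for illustration only; NOT the published QWB weights.
# Assumption: level 1 means no limitation and therefore carries no deduction.
MOBILITY_WEIGHTS = {1: 0.00, 2: 0.05, 3: 0.10}
PHYSICAL_WEIGHTS = {1: 0.00, 2: 0.06, 3: 0.08}
SOCIAL_WEIGHTS = {1: 0.00, 2: 0.03, 3: 0.06, 4: 0.09, 5: 0.11}


def qwb_score(mobility, physical, social, symptom_weights):
    """Return W = 1 minus the weights of the observed levels and worst symptom.

    symptom_weights: weights of every symptom/problem the patient reports.
    Only the most undesirable (largest weight) is deducted, as in the text.
    """
    worst_symptom = max(symptom_weights, default=0.0)
    return (1.0
            - MOBILITY_WEIGHTS[mobility]
            - PHYSICAL_WEIGHTS[physical]
            - SOCIAL_WEIGHTS[social]
            - worst_symptom)


def qalys(states):
    """Sum QWB score x time over a sequence of (score, years_in_state) pairs."""
    return sum(score * years for score, years in states)


# Hypothetical course: 2 years in a limited state, then 1 year fully well.
limited = qwb_score(mobility=2, physical=2, social=2, symptom_weights=[0.10, 0.20])
print(limited, qalys([(limited, 2.0), (1.0, 1.0)]))
```

The sketch omits the prognosis adjustment mentioned in the text, because that step depends on disease-specific transition probabilities that are not given here.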


Time to score. Scoring by hand is time consuming. Scoring by computer algorithm is quick and easy.
Training to score. Training is necessary for scoring by hand. For computerized scoring, little training is necessary beyond writing the computer algorithm.
Training to interpret. Some training necessary to interpret the scores. Interpretation requires a familiarity with indices and QALYs.
Norms available. Norms are available for the US population and for some disease groups (such as rheumatoid arthritis).

Psychometric Information
Reliability. Interrater reliability of weights for the functional status scale is high (0.90). Test-retest is assessed by comparing scores on day 1 with the mean of daily scores on 8 subsequent days. Correlation between day 1 and subsequent scores is >0.93.
Validity. Content. The scale has a broad scope but does not specifically include any assessment of psychosocial functioning in the physical functioning index (although the symptom/problem list does include psychological symptoms).
Criterion. Moderate correlations with SIP, SF-36, and AIMS.
Construct. Moderate correlations with clinical tests in cystic fibrosis (forced expiratory volume [FEV1], peak oxygen consumption [VO2 max]) and chronic obstructive pulmonary disease (exercise tolerance treadmill test). Moderate correlations with Jette’s Functional Status Index. Strong correlations with the number of reported symptoms and chronic health problems. Moderate correlations with the number of recent physician contacts.
Responsiveness/sensitivity to change. QWB scale has demonstrated significant treatment effects in a number of conditions and treatments, including chronic obstructive pulmonary disease, diabetes, AIDS, and rheumatoid arthritis (RA). In the RA trial, the QWB scale was more responsive than AIMS.

Comments and Critique
The QWB scale is a well-established measure that is based on a clear conceptual background. Its advantage is that it is able to produce utilities for health economic evaluations with more detailed data than the EQ-5D, and the inclusion of a symptom/problem list is likely to make it more sensitive to minor morbidity than some other generic measures. One of its major drawbacks in the interviewer-administered form is that it is time consuming and appears relatively complex to score. The development of a self-administered version that is as accurate as the interviewer-administered version and the use of computerized scoring may significantly reduce these problems. Another major criticism has been the exclusion of psychological or emotional functioning from the scale, although the symptom list does include psychological symptoms and the QWB score has shown weak-moderate correlations with the Center for Epidemiologic Studies Depression Scale and the AIMS psychological score.

References
1. (Original) Kaplan RM, Bush JW, Berry CC. Health status: types of validity and the Index of Well-Being. Health Serv Res 1976;11:478–507.

Additional References
Anderson JP, Bush JW, Berry CC. Classifying function for health outcome and quality of life evaluation: self versus interviewer modes. Med Care 1986;24:454–69.
Andresen EM, Rothenberg BM, Kaplan RM. Performance of a self-administered mailed version of the quality of well-being (QWB-SA) questionnaire among older adults. Med Care 1998;36:1349–60.
Kaplan RM, Ganiats TG, Sieber WJ, Anderson JP. The quality of well-being scale: critical similarities and differences with SF-36. Int J Qual Health Care 1998;10:509–20.

RAQOL
General Description
Purpose. Disease-specific quality of life measure for use in patients with rheumatoid arthritis (RA) (1).
Content. Assesses the impact of RA on activities of daily living, social interaction, emotional well-being, and relationships.
Developer/contact information. Developed by D van der Heijde, D Whalley, SP McKenna, and Z de Jong in 1997. Galen Research, Enterprise House, Manchester Science Park, Lloyd St. North, Manchester M15 6SE, UK.
Versions. The original version was developed simultaneously in English and Dutch. It has since been translated into other languages including Swedish, Canadian French, and Danish.
Number of items in scale. There are 30 items.
Subscales. None.
Populations. Developmental/target. Patients with RA in the context of evaluative, economic and cohort studies, to determine the burden of illness, and in clinical practice. Other uses. None.
WHO ICF Components. Activity limitation, Participation restriction.

Administration
Method. Self-administered.
Training. None required.
Time to administer/complete. Around 6 minutes, although it can be considerably longer in some patients.
Equipment needed. None required.
Availability/cost. Available from Galen Research. Users are required to register. Items may be viewed at http://rheumatology.oupjournals.org/cgi/reprint/36/8/878.pdf (in appendix).

Scoring
Responses. Scale. Ordinal. Dichotomous (yes/no) responses to items.
Score range. The range is 0–30.
Interpretation of scores. A higher score represents a poor quality of life.
Method of scoring. The questionnaire consists of 30 statements that have a yes/no response. Items are scored 1 for yes and 0 for no. Scores for each item are summed to give an overall quality of life score.
Time to score. The questionnaire can be scored by hand in a few minutes.
Training to score. Minimal.
Training to interpret. Minimal but some guidance/familiarity about what scores represent is necessary.
Norms available. None.

Psychometric Information
Reliability. Internal consistency via Cronbach’s alpha is 0.90. Test-retest via Spearman rank correlations between first and second administrations 14 days apart is 0.90–0.94.
Validity. Content. Content of the questionnaire is based on qualitative interviews with RA patients, ensuring content validity. Simultaneous development in the Netherlands and the UK ensures content validity across 2 cultural settings.
Criterion. Moderate correlations with Nottingham Health Profile.
Construct. Discriminates between different severities of disease and between active and inactive disease.
Responsiveness/sensitivity to change. Responsiveness currently under evaluation in a randomized controlled trial. Preliminary results suggest moderate responsiveness, similar to the Nottingham Health Profile.

Comments and Critique
The RAQoL is quick and easy to use and has the advantage of being specifically designed for use in RA. It is a new tool and needs further assessment in different research settings to evaluate its performance. It also needs to be evaluated in clinical practice and its responsiveness in individual patients should be established. The dichotomous response options may make it susceptible to similar problems in terms of responsiveness and questionnaire completion as those suffered by the Nottingham Health Profile. In addition, its superiority over other generic measures of health status/quality of life should be established for it to have a significant role in clinical research.

References
1. (Original) De Jong Z, van der Heijde D, McKenna SP, Whalley D. The reliability and construct validity of the RAQoL: a rheumatoid arthritis-specific quality of life instrument. Br J Rheumatol 1997;36:878–83.

Additional Reference
Wells G, Boers M, Shea B, Tugwell P, Westhovens R, Suarez-Almazor M, et al, and the OMERACT/ILAR Task Force on Generic Quality of Life, the Life Outcome Measures in Rheumatology, International League of Associations For Rheumatology. Sensitivity to change of generic quality of life instruments in patients with rheumatoid arthritis: preliminary findings in the generic health OMERACT study. J Rheumatol 1999;26:217–21.

SHORT FORM-36 (SF-36)
General Description
Purpose. A generic measure of general health status (health-related quality of life) designed for use in population surveys (1,2).
Content. The questionnaire assesses 8 dimensions of health status: physical functioning, role limitations due to physical problems, bodily pain, social functioning, mental health, role limitations due to emotional problems, vitality, and overall/general health. It aims to measure positive as well as negative health status.
Developer/contact information. Developed by the Rand Corporation and John E. Ware. SF-36 Health Survey, The Health Institute, New England Medical Center Hospitals, Box 345, 750 Washington Street, Boston, MA, 02111.
Versions. The SF-36 has been altered slightly on at least 2 occasions since the original version was developed. SF-36v2 is a revised and improved version of the original questionnaire that is now recommended in place of the SF-36v1. Improvements include changes to the questions and responses that increase compatibility between cultural settings; changes to question and instruction wording and to the layout that simplify and clarify the questionnaire and reduce the number of missing responses; increased sensitivity of the role functioning scales by changing the dichotomous responses to 5-level responses; and simplification of the mental health and vitality scales by reducing the responses from 6-level to 5-level. SF-36 has been translated and adapted for use in a number of other languages including French, German, Dutch, Danish, Swedish, Spanish, Italian, and Japanese and has also been adapted for use in the UK. The translated/adapted versions may be based on different original versions of the questionnaire. For example, the UK version was adapted from the first version, before the two more recent revisions were made. There is an alternative form available that measures change over shorter time periods (over the last week) to enable assessment in acute health conditions. Some disease-specific versions have also been developed to improve responsiveness. For example, there is a version designed specifically to measure health status in patients undergoing knee replacement surgery. Two shorter versions of the SF-36 are available: SF-12 and SF-8. These are based on the SF-36 and designed to reduce responder burden in large-scale population surveys.
Number of items in scale. SF-36 has 36 items, SF-12 has 12 items, SF-8 has 8 items.
Subscales. All of the SF versions have the same 8 dimensions/subscales: Physical functioning (SF-36 has 10 items), Social functioning (SF-36 has 2 items), Bodily pain (SF-36 has 2 items), Energy/vitality (SF-36 has 4 items), Mental health (SF-36 has 5 items), Role limitations due to physical problems (SF-36 has 4 items), Role limitations due to emotional problems (SF-36 has 3 items), General health (SF-36 has 5 items). The SF-12 and SF-36 share 12 items with identical wording and response options, and the SF-8 and SF-36 share 1 item.
Populations. Developmental/target. The SF-36 is a shortened version of the questionnaires designed for use in the Rand Corporation’s Health Insurance Study. The target population was adults aged 14–61 years with a full spectrum of medical conditions. Other uses. It has been used in older patients (>61 years) and in many evaluative studies including pharmaceutical trials.
WHO ICF Components. Activity limitation, Participation restriction.

Administration
Method. Questionnaires that can be self-completed or interviewer administered (at interview or by telephone). Generally easy to administer, although there are reports that some patient groups (for example, the elderly) may experience some difficulty in completion, resulting in missing data. There is also an internet version available that enables patients to complete the questionnaire online on repeated occasions. Physicians can then access their patients’ data through the internet and monitor their progress over time.
Training. None required.
Time to administer/complete. SF-36 takes approximately 10 minutes for most patient groups, 15–20 minutes for some elderly patients. The SF-12 takes 5 minutes, and the SF-8 takes 2 minutes.
Equipment needed. None required.
Availability/cost. All SF-36 survey instruments, scoring manuals and licences for use are available from QualityMetric at www.qualitymetric.com. There is a charge for the manual and use of the questionnaires. Different charges are levied for academic and commercial use. The internet version is available from QualityMetric and is listed under Small Group Patient Tracking. There is a $99 set-up fee and then each survey transaction is charged at $0.50 (with a minimum purchase of 500 transactions). Computerized scoring systems are available from Response Technologies, Inc., 3399 South Country Trail, East Greenwich, Rhode Island, 02818. Items may be viewed at: http://www.medal.org/adocs/docs_ch1/doc_ch1.08.html#A01.08.01 (SF-36), http://www.medal.org/adocs/docs_ch1/doc_ch1.08.html#A01.08.03 (SF-12).

Scoring
Responses. Scale. Ordinal. There is a mixture of 3-, 5-, and 6-point scales for different items. In SF-36v2, dichotomous responses and 6-point scales have been changed to 5-point scales. All versions (SF-36v1, SF-36v2, SF-12v2, and SF-8) produce a profile of 8 QoL scores. Two summary scores can also be calculated from the profile scores: physical health (physical functioning, role-physical, bodily pain, general health), and mental health (vitality, social functioning, role-emotional, mental health).
Score range. Range is 0–100 for each of the 8 dimensions in SF-36v1. SF-36v2, SF-12v2, and SF-8 use norm-based scoring (mean 50, SD 10).
Interpretation of scores. Zero indicates poor health status, 100 indicates very good health status (i.e., no impact of condition on general health status). For norm-based scores, any score above or below 50 can be considered above or below the population average health status for that dimension and each point on the scale is 1/10 of the standard deviation.
Method of scoring. Computerized scoring recommended. Responses on each item are entered into the computer using the codes given on the questionnaire. Ten of the items are then recoded. Raw scores for each dimension are computed by summing across the items in the same scale/dimension. Raw scores are transformed to give scores 0–100 for each dimension for SF-36v1. Norm-based scores are calculated for SF-36v2, SF-12v2, and SF-8 by including population norms in the scoring algorithms.
Time to score. Once questionnaire data are entered on the computer, scoring takes a few seconds. Computer software for analysis of SF-36 is available from a number of sources. Scanning programs that enable the completed questionnaires to be optically scanned into the database are also available. Computerized scoring recommended for versions that use norm-based scoring.
Training to score. Minimal if computer software is used.
Training to interpret. Minimal. The 0–100 scores produced are easy to understand. What constitutes a meaningful change in score is less clear. Norm-based scoring makes interpretation easier in relation to the general population and makes change within populations more accurate and meaningful. It also allows direct comparison between scores of each of the versions of the SF survey (SF-36v2, SF-12v2, SF-8) and comparison between scores on each of the individual dimensions within the questionnaires.
Norms available. Population norms available for US and UK. In the US, these are given for 7 age groups and by sex. In the UK, they are given by sex, age, socioeconomic class, and for chronic health conditions.
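The two scoring steps described under Method of scoring (transforming a raw dimension score to 0–100, then expressing it as a norm-based score with mean 50 and SD 10) can be sketched in Python. This is a simplified illustration, not the licensed scoring algorithm: the function names are hypothetical, item recoding is assumed to have been done already, and the population mean and SD in the example are placeholder values rather than published norms.

```python
def transform_0_100(raw, lowest, highest):
    """Linearly rescale a raw dimension score to the 0-100 range.

    lowest/highest: the minimum and maximum possible raw scores for the
    dimension (the sum of item minimums and the sum of item maximums).
    """
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    return (raw - lowest) / (highest - lowest) * 100


def norm_based(score, population_mean, population_sd):
    """Express a 0-100 score relative to population norms (mean 50, SD 10)."""
    return 50 + 10 * (score - population_mean) / population_sd


# Physical functioning has 10 items; assuming each is coded 1 to 3,
# the raw score can range from 10 to 30.
pf_0_100 = transform_0_100(raw=25, lowest=10, highest=30)
# Placeholder norms (mean 80, SD 20), chosen only to show the arithmetic.
pf_norm = norm_based(pf_0_100, population_mean=80, population_sd=20)
print(pf_0_100, pf_norm)
```

With norm-based scores, each point corresponds to 1/10 of a standard deviation, so a score of 45 sits half a standard deviation below the population average for that dimension.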

Psychometric Information
SF-36 is the version that has been most extensively used and evaluated. SF-12 has been evaluated in some studies and further evaluation of it and SF-8 is being undertaken in large-scale population surveys and clinical trials.
Reliability. Internal consistency. Median Cronbach’s alpha across several reliability studies >0.80 for all dimensions except social function (0.76). Studies suggest all dimensions reliable for comparisons between groups of patients and that the physical function dimension may be reliable for comparison within individuals.
Test-retest. Correlation coefficients for test-retest over a 2-week period >0.80 across all dimensions. Agreement between scores was high; differences between measurements did not exceed 1 point on the 100-point scales.
Validity. Face and content validity. Questionnaire was derived from pre-existing questionnaires used in large population studies.
Criterion validity. Many studies have shown associations between SF-36 and other general health status measures such as NHP, EuroQoL, QWB, and SIP, indicating criterion validity. Criterion validity for individual dimensions has been established by association with ability to work and pain ratings. SF-12 and SF-8 were developed from the content of the SF-36. The items in SF-12 are identical to 12 of the items in SF-36. One of the items in SF-8 is identical to an item in SF-36 and the others were developed from evaluation of the SF-36 in population studies and different cultural settings.
Construct validity. Distinguishes between known groups (ill and healthy, severe disease and mild disease, and chronic medical condition from medical condition combined with psychological problem).

Responsiveness/sensitivity to change. Mixed results from studies assessing responsiveness. Appears responsive in some chronic conditions (e.g., in patients with soft tissue injuries). Some studies have identified floor and ceiling effects with specific patient groups (e.g., floor effects of the role dimensions in people on hemodialysis), although floor and ceiling effects should be reduced with version 2. In other studies, responsiveness was superior to other health status measures (e.g., SF-36 identifies minor impacts on health status that cannot be identified using the NHP).

Comments and Critique
The SF-36 is the most widely used health status measure worldwide. This means that, theoretically, the data in cross-cultural studies can be aggregated or directly compared. It is a well-designed measure that has undergone exhaustive testing in a number of situations. Some sensitivity and comprehensiveness have inevitably been sacrificed in the interests of brevity and standardization, and users should evaluate their results in the context of what has actually been measured. This consideration is even more pertinent for SF-12 and SF-8. Although most patients find it easy to use, there have been some criticisms that elderly patients find the questionnaire difficult to complete, often because the questions do not have significance for them, and it should be used with caution in patients >60 years old, possibly in an interviewer-administered form. This concern would be relevant to many populations of people with rheumatic disease.
The SF-36 has been used in many studies in rheumatology to document the relative health status of different rheumatic conditions, assess the effectiveness of interventions, or to assess the validity of new disease-specific questionnaires. The shortened versions may be of use in population surveys of rheumatic disease where respondent burden is a concern. A computerized version of the SF-36 has been successfully used to collect health status data in routine clinical practice in rheumatology.
One of the strongest arguments for using a generic measure is the ability to make comparisons, not only between patients with the same condition in different settings or studies, but also across patient groups, but this should be undertaken with caution. Studies of different patient groups have found anomalies in the ways in which patients rate their quality of life, with many chronically and seriously ill patients rating their quality of life higher than patients with acute, mild disease. These differences may also exist between different cultural settings. This disability paradox undermines the use of generic measures in making global comparisons.

References
1. (Original) Ware JE Jr, Sherbourne C. The MOS-36 item short form health survey 1: conceptual framework and item selection. Med Care 1992;30:473–83.
2. Ware JE Jr, Kosinski M, Keller SD. A 12-item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care 1996;34:220–33.

Additional References
Anderson JJ, Ruwe M, Miller DR, Kazis L, Felson DT, Prashker M. Relative costs and effectiveness of specialist and general internist ambulatory care for patients with 2 musculoskeletal conditions. J Rheumatol 2002;29:1488–95.
Angst F, Aeschlimann A, Steiner W, Stucki G. Responsiveness of the WOMAC osteoarthritis index as compared with the SF-36 in patients with osteoarthritis of the legs undergoing a comprehensive rehabilitation intervention. Ann Rheum Dis 2001;60:834–40.
McHorney CA, Haley SM, Ware JE Jr. Evaluation of the MOS SF-36 Physical Functioning Scale (PF-10): II. Comparison of relative precision using Likert and Rasch scoring methods. J Clin Epidemiol 1997;50:451–61.
Stucki G, Daltroy L, Katz JN, Johannesson M, Liang MH. Interpretation of change scores in ordinal clinical scales and health status measures: the whole may not equal the sum of the parts. J Clin Epidemiol 1996;49:711–7.
Wilson AS, Kitas GD, Carruthers DM, Reay C, Skan J, Harris S, et al. Computerized information-gathering in specialist rheumatology clinics: an initial evaluation of an electronic version of the Short Form-36. Rheumatology (Oxford) 2002;41:268–73.

SICKNESS IMPACT PROFILE (SIP)
General Description
Purpose. Generic measure of general health status designed for use in population surveys, evaluative studies of the outcomes of care, and in clinical practice to monitor patients. It measures changes in behavior and daily activities due to sickness (1).
Content. The SIP assesses performance of activities in 12 categories: sleep and rest; eating; work; home management; recreation and pastimes; ambulation; mobility; body care and movement; social interaction; alertness behavior; emotional behavior; and communication.
Developer/contact information. Developed by Marilyn Bergner, Health Policy and Management, School of Hygiene and Public Health, Johns Hopkins University, 624 Broadway, Baltimore MD 21205.
Versions. The SIP has been adapted for use in the UK (known as the Functional Limitations Profile) and has been translated into many languages including Swedish, Spanish, French, and Dutch. There are also shorter, disease- or situation-specific versions for use in rheumatoid arthritis (SIP-RA), back pain, and nursing homes.
Number of items in scale. There are 136 items.
Subscales. There are 12 category scales (listed in Content section) that can be aggregated into 2 subscales: physical (ambulation, mobility, body care and movement) and psychosocial (social interaction, alertness behavior, emotional behavior, and communication).
Populations. Developmental/target. Patients with a range of medical conditions. Other uses. Specific disease groups including RA, head injury, back pain, and the elderly.
WHO ICF Components. Activity limitation, Participation restriction.

Administration
Method. Interviewer-administered or self-completed.
Training. Minimal. User manuals available.
Time to administer/complete. Approximately 20–30 minutes.
Equipment needed. None needed.
Availability/cost. Available from Health Policy and Management, School of Hygiene and Public Health, Johns Hopkins University, 624 Broadway, Baltimore MD 21205. Also available from the Medical Outcomes Trust, PO Box 1917, Boston, MA, 02205 or through their Website: www.outcomes-trust.org.

Scoring
Responses. Scale. Interval.
Score range. Each of the 12 categories is scored from 0 to 100. The overall score ranges from 0 to 100.
Interpretation of scores. Zero represents good health status or no change of behavior resulting from sickness. 100 represents poor health status or large impact of sickness on behavior.
Method of scoring. The questionnaire consists of a series of statements. Respondents check those items that most relate to them. Each item is individually weighted to indicate the severity of impact. The overall score is calculated by summing the weighted values of the items selected, dividing this sum by the summed scale values for all items, and multiplying the result by 100.
Time to score. 10 minutes.
Training to score. Minimal.
Training to interpret. Minimal. The 0–100 scores have intuitive meaning.
Norms available. No.

Psychometric Information
Reliability. Cronbach’s alpha values from several studies for internal consistency range from 0.91 to 0.95 for overall score, 0.84–0.93 for the 2 subscales, and 0.60–0.90 for the 12 categories. The test-retest correlations between the 2 administrations range from 0.88 to 0.92 for the overall score. Test-retest correlation for interviewer-administered version was 0.97 and was 0.87 for the self-completed version. The interrater reliability (between-rater kappa) was 0.87.
Validity. Content. Face and content validity ensured by the way in which questionnaire items were generated. Items were derived from a literature review and interviews with professionals, patients, and a healthy population.
Criterion. Concurrent validity established by comparison with patients’ self-assessments of functional and sickness limitations and clinicians’ assessments of functional and sickness limitation. There are strong correlations between SIP and AIMS (0.83) and AIMS2 (0.73).
Construct. Individual items and subscales correlated with other measures (e.g., Katz’s Index of Activities of Daily Living, the Barthel Index, Carroll Rating Scale for Depression, and the Geriatric Depression Scale). Correlations with measures of disease activity and severity are weaker (e.g., in RA, SIP scores correlate weakly [0.17–0.26] with disease duration, early morning stiffness, erythrocyte sedimentation rate, and anatomic stage).
Responsiveness/sensitivity to change. Data on the responsiveness of SIP are mixed. One study in RA patients suggested good responsiveness. Others have found the SIP less responsive in patients with musculoskeletal disease than other health status measures.

Comments and Critique

Before the advent of the SF-36, the SIP was considered the gold standard measure of general health status. It is very comprehensive and has been used over many years. It has been used in rheumatology to quantify health status and to assess the effectiveness of interventions/health care delivery in RA. Limitations to its use in rheumatology relate to general problems in scoring and completion. Problems with responsiveness may be worse if overall scores are used instead of category scores because the same overall score can be achieved through many different combinations of item scores. This may obscure change in specific items of importance to certain patient groups or types of treatment. An alternative method of scoring, based on the highest category score, has been proposed, but this requires further evaluation. The main disadvantage of the SIP is that it is time-consuming to complete and score. This may limit its usefulness in clinical practice and in studies where patients are required to complete several questionnaires. A shortened, disease-specific version for rheumatoid arthritis (SIP-RA) has been developed to overcome some of these problems.

References

1. (Original) Bergner M, Bobbitt RA, Kressel S, Pollard WE, Gilson BS, Morris JR. The sickness impact profile: conceptual formulation and methodology for the development of a health status measure. Int J Health Serv 1976;6:393–415.

Additional References

Ahlmen M, Sullivan M, Bjelle A. Team versus non-team outpatient care in rheumatoid arthritis: a comprehensive outcome evaluation including an overall health measure. Arthritis Rheum 1988;31:471–9.

Bergner M, Bobbitt RA, Carter WB, Gilson BS. The Sickness Impact Profile: development and final revision of a health status measure. Med Care 1981;19:787–805.

Pollard B, Johnston M. Problems with the sickness impact profile: a theoretically based analysis and a proposal for a new method of implementation and scoring. Soc Sci Med 2001;52:921–34.

Sullivan M, Ahlmen M, Bjelle A, Karlsson J. Health status assessment in rheumatoid arthritis. II. Evaluation of a modified Shorter Sickness Impact Profile. J Rheumatol 1993;20:1500–7.

SIP-RA

General Description

Purpose. Shortened version of the SIP developed to measure health status in patients with rheumatoid arthritis (RA) in clinical practice (1). Content. SIP-RA measures performance in 10 of the original 12 SIP categories: body care and movement; mobility; emotional behavior; social interaction; alertness behavior; communication; sleep and rest; home management; recreation and pastimes; and eating. Developer/contact information. Marianne Sullivan, Professor of Psychology, Health Care Research Unit, Sahlgrenska University Hospital, S-413 45 Gothenburg, Sweden. Versions. Original, adapted from SIP. Number of items in scale. There are 64 items. Subscales. Physical: body care and movement (14 items), mobility (6 items); Psychosocial: emotional behavior (7 items), social interaction (9 items), alertness behavior (6 items), communication (2 items); and free-standing categories: sleep and rest (4 items), home management (10 items), recreation and pastimes (5 items), eating (1 item). Populations. Developmental/target. Developed in women with RA. Target population, all patients with RA. Other uses. None to date. WHO ICF Components. Activity limitation, Participation restriction.

Administration

Method. Self-administered and interviewer-administered. Relatively easy to complete. Requires patients to indicate which statements relate to their current health status. Training. Minimal. Time to administer/complete. Time is 10–15 minutes. Equipment needed. None. Availability/cost. Available from the developer.

Scoring

Responses. Scale. Interval. Score range. Each of the 10 categories is scored from 0 to 100. The overall score ranges from 0 to 100. Interpretation of scores. Zero represents good health status or no change of behavior resulting from sickness, and 100 represents poor health status or large impact of sickness on behavior. Method of scoring. The questionnaire consists of a series of statements. Respondents check those items that most relate to them. Each item is individually weighted to indicate the severity of impact. The overall score is calculated by summing the weighted values of the items selected, dividing that sum by the sum of the scale values for all items, and multiplying the result by 100. Time to score. Time is 8–10 minutes. Training to score. Minimal. Training to interpret. None required. Norms available. No.
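The scoring rule given under Method of scoring (weights of the checked items, summed, divided by the summed weights of all items, times 100) can be sketched in Python as follows. This is only an illustration: the function name, item identifiers, and weights are invented for the example and are not the published SIP or SIP-RA scale values.

```python
def weighted_percent_score(weights, endorsed):
    """Return a 0-100 score: endorsed weight as a percentage of total weight.

    weights  -- dict mapping item id -> scale value (hypothetical values here)
    endorsed -- iterable of item ids that the respondent checked
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    checked_items = set(endorsed)  # an item counts once even if listed twice
    unknown = checked_items - set(weights)
    if unknown:
        raise KeyError(f"unknown item ids: {sorted(unknown)}")
    checked = sum(weights[item] for item in checked_items)
    return 100.0 * checked / total


# Hypothetical 4-item questionnaire; the weights are made up for this sketch.
example_weights = {"item_a": 5.0, "item_b": 10.0, "item_c": 2.5, "item_d": 7.5}
example_score = weighted_percent_score(example_weights, ["item_b", "item_c"])
print(example_score)  # 12.5 of 25.0 total weight -> 50.0
```

With no items checked the function returns 0 (good health status) and with every item checked it returns 100 (large impact of sickness), matching the interpretation of scores given above.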

Psychometric Information

Reliability. Internal consistency. Cronbach’s alpha values 0.59–0.86. Test-retest. No published data available. Validity. Content. Derived from the SIP. Item reduction was performed in a 3-step process based on the dependent variables: physical discomfort, bodily pain scale, overall mood (mood adjective checklist), and joint function (Keitel Index). Criterion. High correlations between SIP-RA and the original SIP. Some correlation would be expected because the SIP-RA items are drawn from the same instrument. No data are available for Bland and Altman’s method for measuring agreement, which removes this bias. Construct. Discriminates between known groups on the basis of American Rheumatism Association functional class. Weaker correlations with measures of disease activity/severity (pain scale, Keitel Index, Ritchie Index, Lansbury Articular Index, C-reactive protein), similar to correlations between the original SIP and disease measures. Responsiveness/sensitivity to change. Responsiveness was assessed by correlating change in SIP-RA with changes in measures of disease activity over a 1-year period. The results show weak but statistically significant correlations, similar to correlations between the original version of SIP and changes in measures of disease activity. No specific measures of responsiveness (effect size or standardized response means) have been published, which makes the responsiveness of SIP-RA difficult to assess.
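For readers unfamiliar with the two responsiveness statistics named above, they are conventionally defined as the mean change divided by the standard deviation of the baseline scores (effect size) and the mean change divided by the standard deviation of the change scores (standardized response mean). A minimal Python sketch of those conventional definitions follows. The scores are invented for illustration and are not SIP-RA data.

```python
from statistics import mean, stdev


def _changes(baseline, followup):
    """Per-patient change scores (follow-up minus baseline)."""
    if len(baseline) != len(followup):
        raise ValueError("baseline and followup must have the same length")
    return [f - b for b, f in zip(baseline, followup)]


def effect_size(baseline, followup):
    """Mean change divided by the sample SD of the baseline scores."""
    return mean(_changes(baseline, followup)) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """Mean change divided by the sample SD of the change scores."""
    changes = _changes(baseline, followup)
    return mean(changes) / stdev(changes)


# Invented 0-100 scores for five patients (higher = worse health status).
baseline_scores = [30.0, 42.0, 25.0, 38.0, 35.0]
followup_scores = [24.0, 35.0, 22.0, 30.0, 31.0]

print(round(effect_size(baseline_scores, followup_scores), 2))
print(round(standardized_response_mean(baseline_scores, followup_scores), 2))
```

Both statistics are negative here because the invented follow-up scores are lower (better) than baseline; the magnitude, not the sign, is what is usually compared between instruments.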

Comments and Critique

This disease-specific shorter form of the SIP has some potential for use in rheumatology. It is based on a well-established measure, and the reduction in length will make it much easier and more practical to use and score. One possible criticism is that the adaptation was based on data only from women with RA, which may limit its generalizability to men with RA. The published data do not indicate that it has any significant advantages over the original version of SIP in terms of responsiveness, but this needs further investigation in clinical trials, and statistical indicators of responsiveness should be published.

References

1. (Original) Sullivan M, Ahlmen M, Bjelle A, Karlsson J. Health status assessment in rheumatoid arthritis. II. Evaluation of a modified Shorter Sickness Impact Profile. J Rheumatol 1993;20:1500–7.

WORLD HEALTH ORGANIZATION’S QUALITY OF LIFE INSTRUMENTS: WHOQoL, WHOQoL-100, WHOQoL-Bref

General Description

Purpose. Multidimensional, multilingual profile designed for the cross-cultural assessment of quality of life (1). Content. Based on the WHO definitions of health, it measures the positive as well as the negative effects of ill health. It assesses individuals’ perceptions of their position in life in relation to their goals and expectations and within the context of their culture and value systems. WHOQoL covers 30 facets of QoL: pain and discomfort; energy and fatigue; sexual activity; sleep and rest; sensory functions; positive feelings; thinking, learning, memory and concentration; self-esteem; body image and appearance; negative feelings; mobility; activities of daily living; dependence on medication or treatment; dependence on non-medicinal substances; communication capacity; working capacity; personal relationships; practical social support; activities as provider/supporter; physical safety and security; home environment; work satisfaction; financial resources; health and social care; opportunities for acquiring new skills; participation (recreation and leisure); physical environments; transport; spirituality. There is also a section for overall perceptions of health and quality of life. WHOQoL-100 covers the same areas but excludes sensory functions; dependence on non-medicinal substances; activities as supporter/provider; and work satisfaction. WHOQoL-Bref has the same content as WHOQoL-100 but with fewer items. Developer/contact information. The WHOQoL instruments were developed by the WHOQoL group, a collaboration of international experts at WHOQOL Group Program on Mental Health, World Health Organization, CH-1211, Geneva 27, Switzerland. Further information about the WHOQoL instruments can be obtained from the WHO Website: http://www.who.int/evidence/assessment-instruments/qol/index.html.

Versions. The original version, a shortened version (WHOQoL-100), and a brief version of the WHOQoL-100 (WHOQoL-Bref). WHOQoL instruments were developed simultaneously in many countries and are available in more than 40 different languages worldwide.

Number of items in scale. WHOQoL: 281, WHOQoL-100: 100, WHOQoL-Bref: 25. Subscales. WHOQoL and WHOQoL-100 have 6 domains containing different facets of QoL (each facet has several items): Physical (WHOQoL: 5 facets, WHOQoL-100: 4 facets); Psychological (5 facets); Activities (WHOQoL: 6 facets, WHOQoL-100: 4 facets); Social relationships (3 facets); Environment (WHOQoL: 9 facets, WHOQoL-100: 8 facets); Spirituality (1 facet). WHOQoL-Bref has 4 domains: Physical health (7 facets); Psychological (6 facets); Social relationships (3 facets); Environment (8 facets). Populations. Developmental/target. Measurement of quality of life in all well and ill populations cross-culturally. Other uses. None. WHO ICF Components. Impairment, Activity limitation, Participation restriction, Environmental factors.

Administration

Method. Self-completed. Even the longest version (WHOQoL) is easy to complete for most populations. Training. None required. Time to administer/complete. The longest version (WHOQoL) takes 20 minutes for a well population to complete but can take up to 1.5 hours for severely ill people to complete. Completion of WHOQoL-100 and WHOQoL-Bref is much quicker. Equipment needed. None. Availability/cost. Available for use with permission from the WHOQoL group at the address above. Manuals and syntax files for scoring are also available from the WHOQoL group. The questionnaires can be viewed on the WHO Website: http://www.who.int/evidence/assessment-instruments/qol/index.html.

Scoring

Responses. Scale. Ordinal. Score range. Profiles of scores given for each domain. For WHOQoL-Bref, domain scores are 4–20. Interpretation of scores. High scores indicate a better quality of life. Method of scoring. Individual items are scored on one of 5 different 5-point rating scales. Domain scores for WHOQoL-100 are calculated by multiplying the mean of all facet scores within the domain by 4. Domain scores for WHOQoL-Bref are calculated by multiplying the mean of all items included in the domain by 4. Time to score. Minimal if computerized methods are used. Training to score. Minimal. Training to interpret. Minimal, but some guidance about the meaning of scores is necessary. Norms available. None yet.

Psychometric Information

Reliability. Internal consistency. Cronbach’s alpha values for the domain scores range from 0.66 to 0.97. Validity. Content. The unique method of development (simultaneous development worldwide with the ability to ensure conceptual equivalence in content) ensured the content validity of the WHOQoL instruments. The content is very comprehensive, covering all aspects of life and assessing the positive as well as negative effects of health on quality of life. Construct. All instruments discriminate between well and ill groups and between inpatients and outpatients. Quality of life scores for all domains except spirituality correlate with self-rated general health (higher quality of life scores are associated with positively rated health and with SF-36 scores). The pain and discomfort facets correlate with the McGill Pain Questionnaire. Responsiveness/sensitivity to change. Limited responsiveness data available to date. WHOQoL-100 and WHOQoL-Bref are responsive to change post liver transplantation, after a pain management program in chronic pain, and to improvement in depression following antidepressant therapy.

Comments and Critique

The WHOQoL instruments are a unique development in the measurement of quality of life. Their conceptual underpinning, the methods used in their development, and the comprehensiveness of the data they provide are very different from any of the other available health status measures. They are still under psychometric evaluation but the results to date have been very promising. The availability of a short form instrument (WHOQoL-Bref) makes use in postal surveys and clinical practice more achievable, although using the short form necessarily involves sacrificing some of the comprehensiveness of the measure. For this reason, it has been recommended that if measurement of social interactions/relationships is of primary importance, the more detailed WHOQoL-100 or the original instrument should be used. To date, the WHOQoL instruments have achieved only limited use in rheumatology in patients with RA. However, their unique cross-cultural validity, ease of completion, and comprehensive content make them potentially valuable tools for clinical practice and research in rheumatology and in particular for international studies.
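The WHOQoL-Bref domain scoring rule described under Method of scoring (mean of the items in the domain, multiplied by 4) can be sketched in Python as below. This is a minimal illustration only: the function name is hypothetical, the item responses are invented, and the sketch assumes every item has already been coded 1 to 5 with higher values meaning better quality of life. It is not a substitute for the scoring syntax files available from the WHOQoL group.

```python
from statistics import mean


def bref_domain_score(item_scores):
    """Domain score as described in the text: mean of the item scores times 4.

    item_scores -- responses on a 5-point scale, assumed to be coded 1-5
                   with higher = better (an assumption of this sketch).
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    if any(not 1 <= score <= 5 for score in item_scores):
        raise ValueError("item scores must lie on the 1-5 rating scale")
    return 4 * mean(item_scores)


# Invented responses for a hypothetical 3-item domain.
example_domain = bref_domain_score([4, 3, 5])
print(example_domain)  # mean is 4, so the domain score is 16
```

Because each item lies between 1 and 5, the result always falls between 4 and 20, which is the score range quoted above for WHOQoL-Bref domains.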

References 1. (Original) The WHOQOL Group. The World Health Organization Quality of Life assessment (the WHOQOL) position paper from the World Health Organization. Soc Sci Med 1995;41:1403–9.

Additional References

Skevington SM. Measuring quality of life in Britain: introducing the WHOQOL-100. J Psychosom Res 1999;47:449–59.

The WHOQOL Group. Development of the World Health Organization WHOQOL-BREF Quality of Life Assessment. Psychol Med 1998;28:551–8.

Wirnsberger RM, De Vries J, Jansen TL, Van Heck GL, Wouters EF, Drent M. Impairment of quality of life: rheumatoid arthritis versus sarcoidosis. Neth J Med 1999;54:86–95.

Summary Table for Adult Quality of Life Measures

AIMS
Content: Mobility, physical activity, dexterity, household activity, social activities, activities of daily living, pain, depression and anxiety.
Item formats: 45 items, standardized questions.
Response format: Guttman profile and index.
Method of administration: Self.
Time for administration: 15 minutes.
Validated populations: Disease-specific: adults with arthritis.
Psychometric properties: Reliability: Good. Validity: Good. Responsiveness: Good.

AIMS2
Content: Mobility, physical activity, dexterity, household activity, social activities, activities of daily living, pain, depression and anxiety, arm function, social support, work.
Item formats: 101 items (26 in short form), standardized questions.
Response format: Guttman profile and index.
Method of administration: Self.
Time for administration: 20–30 minutes (short form 10 minutes).
Validated populations: Disease-specific: adults with arthritis.
Psychometric properties: Reliability: Good. Validity: Good. Responsiveness: Good.

DRP
Content: Physical activity, social activity, socioeconomic status, relationships, emotions, and body image.
Item formats: 6 items, individualized.
Response format: Individualized, free text and 10-point graphic rating scales. Profile.
Method of administration: Self.
Time for administration: 2–15 minutes.
Validated populations: Disease-specific: adults with arthritis, clinical practice.
Psychometric properties: Reliability: Good. Validity: Good. Responsiveness: Insufficient data.

EQ-5D
Content: Mobility; self-care; usual activity; pain/discomfort; anxiety/depression; overall assessment of health state.
Item formats: 6 items, standardized questions about health status.
Response format: For 5 items, 3-level response (no difficulty, some difficulty, extreme difficulty). General health question: 100-point graphic rating scale. Index and profile.
Method of administration: Self.
Time for administration: 2–5 minutes.
Validated populations: Generic, multinational, adults.
Psychometric properties: Reliability: Good. Validity: Good. Responsiveness: Reasonable. Less responsive than disease-specific measures in some situations.

NHP
Content: Functional ability, pain, sleep, energy, emotional problems, participation (work and social activities).
Item formats: 38 items, series of statements describing health status.
Response format: Dichotomous yes/no. Profile.
Method of administration: Self.
Time for administration: 10–15 minutes.
Validated populations: Generic, multinational, adults.
Psychometric properties: Reliability: Good. Validity: Good. Responsiveness: Not responsive to small changes in health. Significant floor effects.

PGI
Content: 5 most important areas/activities in patients’ lives that are affected by their condition.
Item formats: 5 items, individualized.
Response format: Individualized, free text and 0–100 severity scale. Profile and index.
Method of administration: Self.
Time for administration: 10–20 minutes.
Validated populations: Generic. Adults.
Psychometric properties: Reliability: Good. Validity: Good. Responsiveness: Mixed.

QWB
Content: Symptoms, mobility, physical activity and social activity.
Item formats: 30 items, statements about health status in each dimension.
Response format: Classification by level of severity (3 levels in each dimension). Index.
Method of administration: Interview.
Time for administration: 30 minutes.
Validated populations: Generic. Adults and children.
Psychometric properties: Reliability: Good. Validity: Good. Responsiveness: Good.

RAQoL
Content: Activities of daily living, social interaction, emotional well-being, relationships.
Item formats: 30 items, series of statements about health status.
Response format: Dichotomous yes/no response to each item (unweighted). Index.
Method of administration: Self.
Time for administration: 6 minutes.
Validated populations: Disease-specific. Adult patients with RA.
Psychometric properties: Reliability: Good. Validity: Good. Responsiveness: Insufficient data.

SF-36
Content: Physical functioning, role limitations due to physical problems, bodily pain, social functioning, mental health, role limitations due to emotional problems, vitality, overall/general health.
Item formats: 36 items, standardized questions.
Response format: 3-, 5-, and 6-point scales. Profile.
Method of administration: Self or interview.
Time for administration: 10–20 minutes.
Validated populations: Generic, multinational, adults.
Psychometric properties: Reliability: Good. Validity: Good. Responsiveness: Good. Floor effects with some patient groups.

SIP
Content: Sleep and rest; eating; work; home management; recreation and pastimes; ambulation; mobility; body care and movement; social interaction; alertness behavior; emotional behavior; communication.
Item formats: 136 items, series of statements describing different levels of health within each dimension.
Response format: Endorsed statements (weighted). Profile and index.
Method of administration: Self or interview.
Time for administration: 20–30 minutes.
Validated populations: Generic, multinational, adults.
Psychometric properties: Reliability: Insufficient data. Validity: Good. Responsiveness: Mixed.

SIP-RA
Content: Body care and movement; mobility; emotional behavior; social interaction; alertness behavior; communication; sleep and rest; home management; recreation and pastimes; eating.
Item formats: 64 items, series of statements describing different levels of health within each dimension.
Response format: Endorsed statements (weighted). Profile and index.
Method of administration: Self or interview.
Time for administration: 10–15 minutes.
Validated populations: Disease-specific: women with RA.
Psychometric properties: Reliability: Insufficient data. Validity: Good. Responsiveness: Insufficient data.

WHOQOL
Content: Pain and discomfort; energy and fatigue; sexual activity; sleep and rest; sensory functions; positive feelings; thinking, learning, memory and concentration; self-esteem; body image and appearance; negative feelings; mobility; activities of daily living; dependence on medication or treatment; dependence on non-medicinal substances; communication capacity; working capacity; personal relationships; practical social support; activities as provider/supporter; physical safety and security; home environment; work satisfaction; financial resources; health and social care; opportunities for acquiring new skills; participation (recreation and leisure); physical environments; transport; spirituality.
Item formats: 281 items, standardized questions.
Response format: 5-point rating scales.
Method of administration: Self.
Time for administration: 20 minutes (well person) up to 1.5 hours (severely ill person).
Validated populations: Generic, multinational, patients and healthy populations, adults.
Psychometric properties: Reliability: Insufficient data. Validity: Good. Responsiveness: Insufficient data.

WHOQOL-100
Content: Pain and discomfort; energy and fatigue; sexual activity; sleep and rest; positive feelings; thinking, learning, memory and concentration; self-esteem; body image and appearance; negative feelings; mobility; activities of daily living; dependence on medication or treatment; communication capacity; working capacity; personal relationships; practical social support; physical safety and security; home environment; financial resources; health and social care; opportunities for acquiring new skills; participation (recreation and leisure); physical environments; transport; spirituality.
Item formats: 100 items, standardized questions.
Response format: 5-point rating scales.
Method of administration: Self.
Time for administration: 10–20 minutes.
Validated populations: Generic, multinational, patients and healthy populations, adults.
Psychometric properties: Reliability: Good. Validity: Good. Responsiveness: Insufficient data.

WHOQOL-BREF
Content: Pain and discomfort; energy and fatigue; sexual activity; sleep and rest; positive feelings; thinking, learning, memory and concentration; self-esteem; body image and appearance; negative feelings; mobility; activities of daily living; dependence on medication or treatment; communication capacity; working capacity; personal relationships; practical social support; physical safety and security; home environment; financial resources; health and social care; opportunities for acquiring new skills; participation (recreation and leisure); physical environments; transport; spirituality.
Item formats: 25 items, standardized questions.
Response format: 5-point rating scales.
Method of administration: Self.
Time for administration: 10 minutes.
Validated populations: Generic, multinational, patients and healthy populations, adults.
Psychometric properties: Reliability: Good. Validity: Good. Responsiveness: Good.