10.1177/0272989X03253652 MEDICAL DECISION MAKING/MAY–JUNE 2003 KING, STYN, TSEVAT, ROBERTS “PERFECT-HEALTH” VERSUS “DISEASE-FREE” UPPER ANCHOR UTILITY ASSESSMENT AND DECISION PSYCHOLOGY
ARTICLE
“Perfect Health” versus “Disease Free”: The Impact of Anchor Point Choice on the Measurement of Preferences and the Calculation of Disease-Specific Disutilities

Joseph T. King, Jr., MD, MSCE, Mindi A. Styn, BS, Joel Tsevat, MD, MPH, Mark S. Roberts, MD, MPP
Background. During preference testing, some investigators use “perfect health” as the upper anchor point of their measurement scale (“Q scale”), whereas others use “disease free” (“q scale”), which can confound the interpretation and comparison of study results. Methods. We measured current health preferences among 74 patients with cervical spondylotic myelopathy (CSM) on both the Q and q scales using the visual analogue scale (VAS), standard gamble (SG), time tradeoff (TTO), and willingness to pay (WTP). Results. There were significant differences in mean Q and q scale values for the VAS, SG, and WTP (for all, P < 0.011); there were no significant differences for mean TTO values (P = 0.592). CSM accounted for 63% to 82% of total disutility, whereas other comorbidities accounted for 18% to 37%. Conclusions. Preferences for CSM differ when measured on the Q and q scales. Caution should be used when comparing and interpreting health values measured on scales with different upper anchors.

Key words: measurement; preferences; standard gamble; time tradeoff; utility; visual analogue scale; willingness to pay. (Med Decis Making 2003;23:212–225)
Received 18 March 2002 from the Section of Neurosurgery, Surgical Service Line (JTK), and the Center for Health Equity Research and Promotion (JTK, MAS), VA Pittsburgh Healthcare System; the Department of Neurological Surgery (JTK), the Section of Decision Sciences and Clinical Systems Modeling, Division of General Internal Medicine, Department of Medicine (JTK, MSR), the Division of General Internal Medicine (MSR), and the Center for Research on Health Care (MSR), University of Pittsburgh; the Section of Outcomes Research, Division of General Internal Medicine, Department of Internal Medicine, and the Center for Clinical Effectiveness, Institute for Health Policy and Health Services Research, University of Cincinnati Medical Center (JT); and the Veterans Affairs Medical Center, Cincinnati, Ohio (JT). Dr. King is a staff physician at the VA Pittsburgh Healthcare System and is supported by a Veterans Administration VISN 4 CPPF grant and a Mentored Patient-Oriented Research Career Award from the National Institute of Neurological Disorders and Stroke (1K23 NS02169-01). Financial support for this study was provided in part by grants from the National Institutes of Health and the Veterans Administration VISN 4. The funding agreements ensured the authors’ independence in designing the study, interpreting the data, and writing and publishing the report. Revision accepted for publication 8 January 2003. Address correspondence and reprint requests to Joseph T. King, Jr., MD, MSCE, VA Connecticut Healthcare System, Surgical Service/112, 950 Campbell Ave., West Haven, CT 06516; telephone: (203) 937-3409; fax: (203) 937-3845; e-mail: [email protected]. DOI: 10.1177/0272989X03253652

The standard gamble (SG), time tradeoff (TTO), and rating scale techniques such as the visual analogue scale (VAS) are common methods for quantifying preferences for health states.1 As originally proposed, these methods value preferences on a ratio scale with 2 anchor points: 1) death, assigned a value of 0.0, and 2) perfect health (also expressed as maximal health, full health, or excellent health), valued at 1.0. Preference elicitation techniques such as the SG or TTO present participants with a hypothetical choice involving a risk of immediate death (SG) or the loss of future survival time (TTO) in exchange for perfect health and then calculate preferences on the basis of responses. The VAS asks participants to rate a health state on a linear scale. Contingency valuation techniques such as willingness to pay (WTP) have also been used to assess preferences for health states using a monetary metric.2

Preferences are often measured in study populations in which participants suffer from more than 1 disease or condition that contributes to their imperfect health. Several additive, multiplicative, and higher order models have been proposed to quantify the relative contributions of multiple diseases to health state preferences. However, many investigators effectively ignore the contributions of other diseases during preference testing, attributing the entire difference between current health and perfect health to the disease of interest by substituting a “disease-free” upper anchor for perfect health.

Fryback and Lawrence3 noted the methodological problems inherent when investigators deviate from the perfect-health scale upper anchor point while measuring preferences in participants with multiple diseases or conditions. They introduced terminology to distinguish between the 2 measurement scales: the Q scale, using a perfect-health scale upper anchor point, and the q scale, using a disease-free upper anchor point. If a participant has only 1 disease or condition that detracts from perfect health, then the Q and q scales are identical. For any participant with multiple sources of disutility, the scales are different, and assuming equivalence introduces errors into the interpretation of preference values measured on the nonconventional q scale. The use of the q scale results in the overestimation of the disutility produced by the disease of interest and the corresponding inflation of the apparent benefits of curing the condition (Fig. 1).3 Such disutility inflation distorts comparisons of the study results with those from other studies that do not assume that the value of the disease-free health state is equivalent to perfect health. In addition, incorporating such inflated values into a cost-utility analysis decreases the cost-utility ratio associated with the successful treatment of the disease of interest, increasing the apparent economic attractiveness of the intervention.

Fryback and Lawrence3 proposed 2 solutions to the problem of calculating disease-specific disutility.
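As a rough illustration of the disutility inflation described above, the following Python sketch reworks the angina values reported by Fryback and Lawrence (angina valued at 0.7 on the q scale, “no angina” valued at 0.87 on the Q scale). The function name `q_to_Q`, the treatment cost, and the time horizon are hypothetical placeholders chosen only to show the direction of the effect on a cost-utility ratio; they are not taken from any study.

```python
def q_to_Q(value_on_q: float, disease_free_on_Q: float) -> float:
    """Proportional translation: rescale a q-scale value onto the Q scale."""
    return value_on_q * disease_free_on_Q


# Values from the Fryback and Lawrence angina example.
NO_ANGINA_ON_Q = 0.87  # "no angina" assessed on the Q scale (Beaver Dam data)
ANGINA_ON_q = 0.7      # angina assessed with "no angina" as the upper anchor

gain_q = 1.0 - ANGINA_ON_q                        # apparent gain in q units
angina_on_Q = q_to_Q(ANGINA_ON_q, NO_ANGINA_ON_Q)
gain_Q = NO_ANGINA_ON_Q - angina_on_Q             # gain in Q units

# Hypothetical inputs (assumptions for illustration only).
COST = 10_000.0  # cost of a curative treatment, in dollars
YEARS = 10.0     # years over which the gain is assumed to persist

ratio_q = COST / (gain_q * YEARS)  # cost per quality-adjusted year, q units
ratio_Q = COST / (gain_Q * YEARS)  # cost per quality-adjusted year, Q units

print(f"gain on q scale: {gain_q:.2f}; gain on Q scale: {gain_Q:.2f}")
print(f"cost-utility ratio with q values: {ratio_q:.0f}")
print(f"cost-utility ratio with Q values: {ratio_Q:.0f}")
```

Under these assumptions the ratio computed from q scale values is smaller than the ratio computed from Q scale values, which is the sense in which q scale measurement makes an intervention look more attractive.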
[Figure 1: two vertical scales. “Q”uality scale: excellent = 1.0; no angina = 0.87; angina = 0.61 (= 0.7 x 0.87); dead = 0.0; gain from relieving angina = 0.26 Q. “q”uality scale: no angina = “1.0”; angina = “0.7”; dead = 0.0; gain from relieving angina = 0.3 q.]

Figure 1 In their 1997 article, Fryback and Lawrence3 depicted the relationship between values measured using excellent health (i.e., perfect health) as the scale upper anchor point, designated the Q scale, and those obtained using the absence of the disease or condition of interest as the scale upper anchor point, which they called the q scale (reproduced with the permission of the authors and the publisher): “The relationship between the Q- and q-scale is illustrated for the case of an analysis of an intervention that relieves angina. On the right, the state ‘no angina’ is scaled at 1.0 on the q scale; however, in the Beaver Dam data4 this state received an average of 0.87 when assessments were made on the Q scale. In the q units, the health related quality of life gain from relieving angina is 0.3; in Q units the gain is 0.26. Because the state of having angina is assessed in proportion to the state of no angina on the q scale, it is possible to make a proportional translation from the q scale to the Q scale” (p. 278).

The 1st solution (henceforth referred to as “population-proxy adjustment”) uses population-derived normative values for quality of life to adjust the raw measurements of disease-specific disutility as follows: 1) measure preferences for current health on the Q scale in participants with the disease of interest, 2) use age-stratified normative population preference values (e.g., Beaver Dam quality-of-life values4) as a proxy for the value of the disease-free health state, and 3) calculate the disease-specific disutility by subtracting the value of current health from the proxy value for the disease-free health state. The 2nd proposed solution (henceforth referred to as “pre- and postcure assessment”) uses the difference between pre- and posttreatment values as follows: 1) measure preferences for current health on the Q scale in participants with the disease of interest, 2) measure preferences for current health on the Q scale in those participants who have been successfully treated for the disease of interest, and 3) calculate the disease-specific disutility from the difference between successful pre- and posttreatment values. Both techniques are designed to isolate the disutility of the disease of interest from other conditions that may be contributing to global disutility.*

*Both of these techniques as originally proposed assume that utilities are additive.

Fryback and Lawrence3 also noted that it is possible to effect a “proportional translation” between the Q and q scales. Using proportional translation, we propose a 3rd solution to the problem of calculating disease-specific disutility: dual-preference testing using both the Q and q scales for the same participant, followed by the transformation of all values onto the Q scale with a perfect-health upper anchor point (henceforth referred to as “upper-anchor-point adjustment”) (Fig. 2). Upper-anchor-point adjustment provides the direct measurement of current health on a scale anchored by death and perfect health and allows the calculation and mapping of the value for disease-free health onto the same scale. The difference between the 2 values provides the disease-specific disutility. Upper-anchor-point adjustment offers several potential advantages over the other 2 options: 1) it directly measures individual patients’ preferences for the disease-free health state, as opposed to using population-derived normative values; and 2) it does not require a successfully treated population to estimate the effects of curing the disease of interest but rather bases its estimate on 2 preference measurements in the untreated population. Another method of calculating disease-specific disutility uses the valuation of hypothetical disease-free health state scenarios on the Q scale (henceforth referred to as “disease-free scenario adjustment”). The results of such testing were first described by Nichol and colleagues5 in 1996 for patients with angina who were asked to value personalized angina-free health state scenarios on the Q scale using the SG. All of these methods (population-proxy adjustment, pre- and postcure assessment, upper-anchor-point adjustment, and disease-free scenario adjustment) use different measurements to parse the value of current health into its constituents. Figure 3 provides graphical representations comparing the 4 methods.

We are not aware of any studies that have compared preference values measured on the Q and q scales for the same participants. We would make several a priori predictions about the results of such testing: 1) if all of a participant’s disutility comes from the study disease, then we would expect identical responses to testing using the Q and q scales; and 2) if a participant’s disutility is a combination of the study disease and other diseases, then we would expect different results from the Q and q scale testing.
[Figure 2: two vertical preference scales. Left, “Q”uality scale: perfect health = 1.0; no CSM = 0.75 (calculated); current health = 0.60 (measured), labeled U(CH)Q; dead = 0.0; segments a, b (0.15 Q), and c. Right, “q”uality scale: no CSM = “1.0”; current health = “0.8” (measured), labeled U(CH)q; dead = 0.0; segments d (0.20 q) and e.]

Figure 2 The left-hand scale is the traditional preference measurement scale anchored on perfect health = 1.0 and death = 0.0 (alias Q scale of Fryback and Lawrence3): a = disutility associated with other diseases; b = disutility associated with cervical spondylotic myelopathy (CSM); c = utility of current health (including CSM and other diseases) measured during the utility elicitation process. The right-hand scale is anchored on disease free (i.e., no CSM) = 1.0 and death = 0.0 (alias q scale of Fryback and Lawrence3): d = “disutility” of CSM; e = “utility” of current health (including CSM but ignoring other diseases) measured during the utility elicitation process. Participants with several conditions contributing to disutility will report different values for their current health depending on whether the Q or q scale is used during preference assessment. In this example, a participant afflicted with CSM and several other conditions is tested using perfect-health and disease-free upper anchor points. On the left-hand Q scale, the measured value of current health with CSM and other diseases = 0.60. On the right-hand q scale, the measured value of current health with CSM (and other diseases) = 0.80. The ratio between current health and disease-free health on the two scales allows the calculation of the value of the CSM disease-free health state expressed on the left-hand Q scale anchored on death and perfect health using the following equations:

$$DU(CSM)_D = b = \frac{c\,(1.0 - e)}{e} = \frac{c}{e} - \frac{c \cdot e}{e} = \frac{c}{e} - c = \frac{0.60}{0.80} - 0.60 = 0.15$$

$$DU(Other)_D = a = 1.0 - b - c = 1.0 - 0.15 - 0.60 = 0.25$$

$$U(DF)_D = U(CH)_Q + DU(CSM)_D = c + b = 0.60 + 0.15 = 0.75$$

$$U(DF)_D = U(CH)_Q + DU(CSM)_D = c + \frac{c}{e} - c = \frac{c}{e} = \frac{0.60}{0.80} = 0.75$$

Using proportional translation, the q unit disutility of CSM = 0.20 is mapped onto the Q scale, yielding a corrected Q unit CSM disease-specific disutility = 0.15.

Furthermore, these differences should vary in a predictable fashion: 1) VAS: participants should select a point on the scale further from the perfect-health upper anchor point compared to the point selected on the disease-free upper anchor point scale; 2) SG: participants should accept a larger risk of death to obtain perfect health compared to their maximum acceptable risk to obtain disease-free health; 3) TTO: participants should trade more time to obtain perfect health compared to their maximum time traded for disease-free health; and 4) WTP: participants should be willing to pay more for perfect health than for disease-free health (Table 1). As part of a study on preferences for spine disease, we designed an experiment to examine the effects of using the Q or q scale during preference measurements with the VAS, SG, TTO, and WTP. We compared the results for disease-specific disutility calculated using the population-proxy and upper-anchor-point adjustment techniques. We also explored the relationship between disease-specific disutility, other-disease disutility, and possible explanatory factors such as severity of the disease under study and burden of comorbid diseases.
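The directional predictions above (summarized in Table 1) can be expressed as a small classification rule for each participant's paired responses. The sketch below is only illustrative: the function name `classify_pair` and the category labels are assumptions, and exact equality is used for “identical” responses because the elicitation tools record values in fixed increments.

```python
# Predicted sign of (Q scale value - q scale value) for each technique.
# VAS, SG, TTO: Q value expected to be lower; WTP: Q value expected to be higher.
PREDICTED_SIGN = {"VAS": -1, "SG": -1, "TTO": -1, "WTP": +1}


def classify_pair(method: str, value_Q: float, value_q: float) -> str:
    """Classify one participant's paired responses for one technique.

    Returns "identical" when the two responses match, "predicted" when the
    difference has the expected direction, and "paradoxical" otherwise
    (i.e., disease-free health appears to be valued above perfect health).
    """
    if method not in PREDICTED_SIGN:
        raise ValueError(f"unknown method: {method}")
    diff = value_Q - value_q
    if diff == 0:
        return "identical"
    sign = 1 if diff > 0 else -1
    return "predicted" if sign == PREDICTED_SIGN[method] else "paradoxical"


# Example with the values used in Figure 2 (Q = 0.60, q = 0.80).
print(classify_pair("VAS", 0.60, 0.80))
```

Tabulating these categories by technique gives the proportions of identical and paradoxical responses that a comparison of the two scales needs to account for.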
[Figure 3: five panels, each showing vertical preference scales running from 0.0 to 1.0 with the disease-specific disutility marked. “Truth”: DF and CH on a single scale. A: Population-Proxy Adjustment (Q and Q scales; DFPopulation, CH). B: Upper-Anchor-Point Adjustment (Q and q scales; “1.0” (DF), CH, CHq). C: Pre- and Post-Cure Assessment (Q and Q scales; CHPre, CHPost). D: Disease-Free Scenario Adjustment (Q and Q scales; DFHypothetical, CH).]
Figure 3 The “truth” diagram depicts an ideal scale of health state valuations anchored on death = 0.0 and perfect health = 1.0. On this idealized scale, the value of current health (CH, an amalgam of the effects of the study disease of interest and other diseases) and the value of current health absent the effects of the study disease (DF, study disease–free health) are known, and the difference between these values is the disutility attributable to the study disease (disease-specific disutility). Using various measurement techniques in combination, we can approximate the “truth” and calculate the disease-specific disutility by 4 methods: (A) Population-proxy adjustment: measure the value of current health on the Q scale (CH), measure population values for current health on the Q scale (e.g., Beaver Dam values) and assume that these individuals do not have the study disease (DFPopulation), and the difference between these values is the disutility attributable to the study disease. (B) Upper-anchor-point adjustment: measure current health on the Q scale (CH), measure current health on the q scale (CHq), transform the q scale values onto the Q scale to obtain the value of disease-free health, and subtract the value of current health from that of disease-free health to obtain the disutility attributable to the study disease. (C) Pre- and postcure assessment (treatment-effect adjustment): measure current health on the Q scale (CHPre), “cure” the study disease with a medical intervention, measure postcure health on the Q scale (CHPost), and subtract the value of precure current health from the value of postcure health to obtain the disutility attributable to the study disease. (D) Disease-free scenario adjustment: measure current health on the Q scale (CH), measure the value of a hypothetical study disease–free health state on the Q scale (DFHypothetical), and subtract the value of current health from the value of hypothetical study disease–free health to obtain the disutility attributable to the study disease.
Table 1 Predicted Preference Testing Responses on the Q and q Scales^a

| Technique | Predicted Preference Testing Responses |
| --- | --- |
| Visual analogue scale | Q scale value < q scale value |
| Standard gamble | Q scale value < q scale value |
| Time tradeoff | Q scale value < q scale value |
| Willingness to pay | Q scale value > q scale value |

Note: Q scale = preference values obtained during testing with a perfect-health upper anchor; q scale = preference values obtained during testing with a study disease–free upper anchor.
a. Predictions apply only to participants whose disutility is a function of the study disease and other diseases. Participants with no other diseases or conditions contributing to disutility should have identical responses to testing using the Q and q scales.

METHODS

Data Collection

We studied patients with cervical spondylotic myelopathy (CSM), a degenerative narrowing of the spinal canal in the neck that presses on the spinal cord, producing varying amounts of neck pain, numbness and clumsiness of the hands, gait difficulties, sphincter dysfunction, and impotence. We recruited participants from the outpatient neurosurgery clinic at the Veterans Affairs (VA) Pittsburgh Healthcare System from October 2000 to September 2001. Prospective participants were asked to participate in a study designed “to learn more about how Veterans feel about cervical spinal stenosis.” Informed consent was obtained from all participants prior to data acquisition, and the protocol was approved by the institutional review boards of the VA Pittsburgh Healthcare System and the University of Pittsburgh. Neurosurgery clinic physicians completed a brief screening survey on all new clinic patients to identify patients with CSM on the basis of symptoms, physical signs, and imaging study results. Participants underwent structured interviews to collect data on demographics, habits, symptoms, functional status, and comorbidities. We assessed comorbid diseases by tabulating common medical conditions that might affect the risk-benefit ratio when considering surgery (e.g., cardiovascular disease, hypertension) and selected neurological conditions relevant to the assessment of outpatients in a neurosurgery clinic (e.g., carpal tunnel syndrome, stroke). Preference testing was administered by a trained research assistant using a script. A standardized neurological examination was performed by a neurosurgical clinical nurse specialist or neurosurgeon. Motor function, sensation, and bladder function were used to assign each participant a CSM severity score using a Western modification* of the Japanese Orthopaedic Association (mJOA) scale.6 The mJOA scale ranges from 17 (full arm function, unimpaired ambulation, normal sensation, and normal bladder function) to 0 (absent arm function, inability to walk, severe sensory impairment, and inability to void). Participants received $25 at the completion of the data collection process.

Preference Testing

Participants underwent 2 consecutive rounds of testing to measure their preferences for their current health. Each round consisted of a VAS, SG, TTO, and WTP. One round of testing used the Q scale with perfect health as the upper anchor point, defined as “The best possible health that you can imagine.
You are cured of your spinal stenosis, and you are cured of all other health problems.” The other round used the q scale with absence of CSM (i.e., “disease free”) as the upper anchor point, described to participants as a cure of only their spinal stenosis symptoms: “Numb, clumsy hands; difficulty walking; problems with bladder control; problems with sexual function.” During each round of testing, participants were given a card printed with the anchor point definition as a mnemonic aid. The order of the testing rounds was randomized on the basis of the last digit of the participants’ study ID numbers. A research assistant performed preference testing using a script and interactive graphical software running on a portable computer. We used iMPACT3 software7 for VAS, SG, and TTO testing and a Visual Basic program written by one of the investigators (MAS) to assess WTP. We describe here the details of testing using the q scale; analogous instructions were used during the Q scale testing.

*The criterion for full arm function in the original Japanese Orthopaedic Association scale is the ability to use chopsticks for self-feeding. The Western modification substitutes the ability to use a knife and fork.

VAS

Participants were presented with a vertical thermometer depicted on the computer screen anchored at the lower end by death = 0% and at the upper end by disease-free health = 100%. Participants indicated the value of their current health by positioning the level of the indicator bar on the thermometer. The software allowed responses in increments of 5%, the equivalent of 0.05 utility units.

SG

Participants were offered a choice between continuing in their current states of health or accepting a hypothetical treatment for their CSM symptoms. The treatment had 2 possible outcomes: death or complete CSM symptom cure. The probabilities of death and CSM cure were systematically varied using a ping-pong technique until the participant was indifferent between current health and the treatment. The probability of death was graphically represented on the computer screen by blackening out a corresponding proportion of a grid of 100 faces. The utility was equivalent to the probability of cure at the indifference point. The iMPACT3 software permitted probabilities to vary by as little as 1%, the equivalent of 0.01 utility units.

TTO

Participants were offered a choice between continuing in their current states of health or reducing their life expectancy by trading years of life in exchange for an immediate permanent cure of their CSM symptoms. The number of years required to obtain a cure was systematically varied using a ping-pong technique until the participant was indifferent between current health and the tradeoff.
We presented all participants with a 20-year life expectancy, the maximum permitted by the iMPACT3 software. The relationships between 20 years of life in current health, reduced life expectancy in disease-free health, and time lost from early death were displayed by horizontal bars on the computer screen. The utility was equivalent to the ratio between time in CSM disease-free health and time in current health at the indifference point. The minimal incremental change permitted by the iMPACT3 software was 1 year, the equivalent of 0.05 utility units.

WTP

We used a closed-ended contingency valuation WTP method to determine WTP for a hypothetical treatment resulting in a cure of CSM symptoms. We asked participants to imagine that they could purchase a cure for their spinal stenosis symptoms with a single payment. Participants were encouraged to consider the financial consequences of buying the treatment by the following statement: “To pay for your treatment, you might use your savings, your present household income, loans that you would have to pay back, and possible future increases in your income after you are cured of your spinal stenosis symptoms.” The interviewer then quoted a series of prices to the participants, and for each amount, they were asked: “Would you be willing to pay $X for a cure for your spinal stenosis symptoms?” A computer program calculated each successive price offer on the basis of an algorithm incorporating annual household income and the participant’s last response. Participants were first asked if they were willing to pay $1. If they were willing to pay $1 (virtually all were), the next price offer was the equivalent value of 1 month’s income. Offers were then systematically increased or decreased until convergence on a final monetary value was obtained. The maximum price permitted was 10 times the participant’s own annual household income.
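The bidding procedure described above could be sketched roughly as follows. The text specifies only the first two offers ($1, then 1 month's income) and the cap (10 times annual household income); the doubling-then-bisection search rule, the function name `elicit_wtp`, the `would_pay` callback, and the $100 stopping tolerance are assumptions made for illustration and should not be read as the investigators' actual Visual Basic algorithm.

```python
from typing import Callable, Optional


def elicit_wtp(annual_income: float,
               would_pay: Callable[[float], bool],
               tolerance: float = 100.0) -> float:
    """Approximate a participant's maximum willingness to pay.

    would_pay(price) stands in for the participant's yes/no answer.
    The search rule (doubling, then bisection) is an assumption.
    """
    cap = 10.0 * annual_income        # maximum price permitted (from the text)
    if not would_pay(1.0):            # first offer (from the text): $1
        return 0.0
    low = 1.0                         # highest price accepted so far
    high: Optional[float] = None      # lowest price refused so far
    offer = annual_income / 12.0      # second offer (from the text): 1 month's income

    # Phase 1 (assumed): double the offer until it is refused or the cap is hit.
    while high is None:
        if would_pay(offer):
            low = offer
            if offer >= cap:
                return cap
            offer = min(offer * 2.0, cap)
        else:
            high = offer

    # Phase 2 (assumed): bisect between the last accepted and refused offers.
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if would_pay(mid):
            low = mid
        else:
            high = mid
    return low


# Simulated participant whose true maximum is $25,000 on a $30,000 income.
print(elicit_wtp(30_000.0, lambda price: price <= 25_000.0))
```

A bracketing search of this kind converges in a handful of questions, which matters in an interview setting, but any rule that raises offers after a “yes” and lowers them after a “no” would fit the description given.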
Data Analysis

We calculated means, standard deviations, and medians for each of the preference measurement techniques. We compared the values obtained from testing with the disease-free and perfect-health anchor points using the Wilcoxon signed rank test. Fisher’s exact test was used to examine proportional differences. We also calculated the disease-specific disutility for CSM and other-disease disutility using the VAS, SG, and TTO values for both the population-proxy adjustment and upper-anchor-point adjustment methods as follows:

• Population-proxy adjustment: The responses from the Q scale assessments were used to assign the values for current health. Population-based age- and sex-stratified normative preference values from the Beaver Dam study4 were used as a proxy for the CSM disease-free health state. The disease-specific disutility was then calculated as

$$DU(CSM)_P = U(DF)_P - U(CH)_Q, \tag{1}$$

where DU(CSM)P is the disutility attributable to CSM calculated using population-proxy adjustment, expressed on the Q scale; U(DF)P is the utility of the CSM disease-free health state using population-based values from the Beaver Dam study, expressed on the Q scale; and U(CH)Q is the utility of the current health state measured on the Q scale. The other-disease disutility was calculated as

$$DU(Other)_P = 1 - \text{Beaver Dam Value}, \tag{2}$$

where DU(Other)P is the disutility attributable to other diseases calculated using population-proxy adjustment, expressed on the Q scale.

• Upper-anchor-point adjustment: The responses from the Q scale measurements were used to assign the values for current health. We then calculated adjusted disease-free health state values, mapping the q scale values onto the Q scale using the following formula:

$$U(DF)_D = \frac{U(CH)_Q}{U(CH)_q}, \tag{3}$$

where U(DF)D is the utility of the CSM disease-free health state calculated using upper-anchor-point adjustment, expressed on the Q scale; and U(CH)q is the utility of the current health state measured on the q scale. Similarly, the disease-specific disutility was then calculated as

$$DU(CSM)_D = U(DF)_D - U(CH)_Q \tag{4}$$

$$= \frac{U(CH)_Q}{U(CH)_q} - U(CH)_Q, \tag{5}$$

where DU(CSM)D is the disutility attributable to CSM calculated using upper-anchor-point adjustment, expressed on the Q scale. The other-disease disutility was calculated as

$$DU(Other)_D = 1 - U(DF)_D, \tag{6}$$

where DU(Other)D is the disutility attributable to other diseases calculated using upper-anchor-point adjustment, expressed on the Q scale.
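The two adjustment calculations defined by equations 1 through 6 can be written as a short Python sketch. The function names are hypothetical, and the population value of 0.85 used in the example is a made-up placeholder rather than a Beaver Dam figure; the other example inputs (0.60 on the Q scale, 0.80 on the q scale) are those of the worked example in Figure 2.

```python
from typing import Tuple


def population_proxy_adjustment(u_ch_Q: float,
                                u_df_population: float) -> Tuple[float, float]:
    """Return (DU(CSM)_P, DU(Other)_P) on the Q scale."""
    du_csm = u_df_population - u_ch_Q   # equation 1
    du_other = 1.0 - u_df_population    # equation 2
    return du_csm, du_other


def upper_anchor_point_adjustment(u_ch_Q: float,
                                  u_ch_q: float) -> Tuple[float, float, float]:
    """Return (U(DF)_D, DU(CSM)_D, DU(Other)_D) on the Q scale."""
    if u_ch_q <= 0.0:
        # A q scale value of 0 (current health equal to death) cannot be
        # translated proportionally; how such cases were handled is not stated.
        raise ValueError("q scale value must be greater than 0")
    u_df = u_ch_Q / u_ch_q              # equation 3
    du_csm = u_df - u_ch_Q              # equations 4 and 5
    du_other = 1.0 - u_df               # equation 6
    return u_df, du_csm, du_other


# Worked example from Figure 2: U(CH)_Q = 0.60, U(CH)_q = 0.80.
u_df, du_csm, du_other = upper_anchor_point_adjustment(0.60, 0.80)
print(f"U(DF)_D = {u_df:.2f}, DU(CSM)_D = {du_csm:.2f}, DU(Other)_D = {du_other:.2f}")

# Population-proxy version with a hypothetical normative value of 0.85.
du_csm_p, du_other_p = population_proxy_adjustment(0.60, 0.85)
print(f"DU(CSM)_P = {du_csm_p:.2f}, DU(Other)_P = {du_other_p:.2f}")
```

Note that the upper-anchor-point calculation yields a value of disease-free health above 1.0 whenever a participant's q scale response is lower than the Q scale response, which is one practical reason to screen for paradoxical response pairs before applying it.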
WTP values are not measured on a ratio scale bounded by 0.0 and 1.0 but are rather expressed as nonzero dollar amounts (in the present study, these values were capped at 10 times annual household income). Larger WTP values are associated with less desirable health states. The impact of the study disease on current health was directly assessed by the WTP measured on the q scale. Likewise, the combined effects of the study disease and other diseases were directly mea-
Table 2 Preference Values for Cervical Spondylotic Myelopathy (n = 74) Visual Analogue Scale q Scale
Q Scale
Standard Gamble Q Scale
q Scale
Time Tradeoff Q Scale
q Scale
Willingness to Pay Q Scale
q Scale
Median 0.50 $43,750 0.79 0.80 0.75 0.75 0.50 $25,000 Mean (SD) 0.57 (0.20) 0.50 (0.19) 0.71 (0.22) 0.73 (0.23) 0.70 (0.28) 0.72 (0.28) $94,335 ($119,286) $76,472 ($112,216)

0.174). Only the VAS with upper-anchor-point adjustment showed a trend toward association between the number of comorbid conditions and the absolute disutility attributable to other diseases (P = 0.099); there was no association with the VAS and population-proxy adjustment or with either adjustment technique for the SG or TTO (for all, P ≥ 0.123). There were also no associations between the proportion of total disutility attributable to other diseases and any preference measurement or adjustment technique (for all, P ≥ 0.114).

DISCUSSION

We explored the impact of using 2 different upper anchor points (perfect health and CSM disease free) on VAS, SG, TTO, and WTP preference values for current health in a sample of patients afflicted with CSM. We found significant differences in the paired values for current health using the Q and q scales for the VAS, SG, and WTP testing protocols; there was no difference in the values obtained with the TTO. All mean differences were consistent with participants valuing perfect health more than CSM disease-free health. We then compared 2 methods to adjust preference values and calculate CSM disease-specific disutility expressed on the Q scale. Population-proxy adjustment uses preferences measured on the Q scale and normative values from the Beaver Dam study to calculate disease-specific disutility. Upper-anchor-point adjustment uses preference values measured with both the Q and q scales to calculate disease-specific disutility. We found that with both adjustment methods, for the majority of preference measurement techniques, the adjusted values for CSM disease-specific disutility measured on the Q scale were significantly different from the unadjusted values measured on the q scale. Although the absolute values for current health, CSM disease-specific disutility, and other-disease disutility varied across the preference testing methods, the proportion of disutility attributable to CSM (63%–82%) and other diseases (18%–37%) was relatively consistent for the VAS, SG, TTO, and WTP. There was also a significant association between CSM disease-specific disutility and CSM disease severity with WTP.

Upper-Anchor-Point Testing

The definition of the scale upper anchor point has a significant impact on the value of current health assessed with several preference measurement techniques. Fryback and Lawrence3 postulated that measurements using the q scale would produce inflated values for disease-specific disutility compared to values measured on the Q scale (Fig. 1). The results of the current study using 2 different methods to calculate adjusted disease-specific disutility confirm their contention. In these patients with CSM, preferences measured on the q scale overestimated the value of current health by as much as 0.14 (25% of mean TTO values, Table 4) and overestimated the value of disease-specific disutility by up to 0.05 (12% of mean VAS values or 17% of mean TTO values, Table 4) compared to Q scale values. If CSM q scale values were incorporated into a cost-effectiveness analysis of a curative intervention, the resulting cost-effectiveness ratio would be proportionally reduced in favor of the intervention.
A large proportion of participants (39%–72%) gave identical responses on the Q and q scales for each of the preference testing modalities, and another 3% to 23% gave paradoxical answers indicating that they valued disease-free health more than perfect health. Multiple factors may have contributed to this finding, including 1) minor or absent comorbid disease, 2) the failure of participants to differentiate between the Q and q scales during testing, 3) a failure to understand or cooperate with testing, and 4) a study-induced focusing effect.

Comorbid Diseases and CSM Severity

If CSM were the only disease that detracted from a participant’s perfect health, or if other problems were relatively minor, then we would predict that preferences for CSM disease-free health would be the same as for perfect health. This may be the case for the 39% to
72% of participants (depending on the preference testing methodology) who gave identical responses to testing on the perfect-health and disease-free upper anchor scales. As a corollary, we would expect similar proportions of participants with identical responses across all 4 preference-testing methodologies. In fact, there were substantial differences across testing methods in the proportion of participants with identical responses, from a low of 39% for the VAS to a high of 72% for WTP. This variation suggests that other factors contributed to finding identical responses for Q and q scale testing. We also expected a relationship between comorbid disease burden and disutility attributable to other diseases; however, the data did not support this hypothesis.

Failure to Differentiate between Q and q Scales

Patients may not have appreciated the distinction between the 2 scale upper anchor points. We took great care to emphasize the distinction during testing: 1) the interviewing script introducing each preference test explicitly stated the upper anchor point; 2) the graphical software screen display included text defining the upper anchor point; 3) the interviewing script stressed the change in upper anchor point when transitioning between testing rounds; and 4) during testing, participants were given a card to remind them of the upper-anchor-point definition. Nevertheless, during debriefing, it was clear that some participants did not appreciate the distinction between the 2 testing scales.

Failure to Understand or Cooperate with Testing Techniques

Some participants offered comments during the testing protocol that indicated that they were not following test instructions. For example, on the SG, 1 participant stated, “I’ll just play the odds, make it 50:50.” At least 1 other participant distinctly made an effort to recall and reproduce previous answers, believing that they were being retested using the same questions.
Study-Induced Focusing Effect

Participants had an increased awareness of CSM and its symptoms because of the recruitment and testing protocols: 1) the consent form mentioned the symptoms of CSM; 2) the interview immediately prior to preference testing asked numerous detailed questions about CSM symptoms; 3) some participants were offered a surgical procedure to treat their CSM during their encounters with their neurosurgeons, producing a heightened awareness of the impact of the condition on health; and 4) participants were offered a $25 incentive for study participation. These factors may have produced a temporary focusing effect, exaggerating the effects of CSM-related symptoms beyond their usual impact on daily activities and quality of life.

Disutility of CSM

Is it plausible that 63% to 82% of CSM patients’ total disutility is due to CSM, even if they are afflicted with numerous comorbid diseases? There are data that suggest that CSM has a wide-ranging impact on many aspects of health-related quality of life. A recent study of health status using the SF-36 health survey demonstrated the adverse impact of CSM on all SF-36 health domains.10 Such a disease could dominate an individual’s quality of life, producing results such as those seen here. The consistency of the impact of CSM on quality of life across 4 very different techniques and 2 methods of preference adjustment lends additional credibility to these results. Although unlikely, the aforementioned study-induced focusing effect may account for this consistent disease impact by uniformly affecting all preference measurement and adjustment techniques.

Comparing Population-Proxy and Upper-Anchor-Point Adjustment

Several criteria can be applied to compare population-proxy and upper-anchor-point adjustment: 1) the proportion of nonsensical responses requiring the elimination of responses; 2) the consistency of the proportion of disutility attributable to CSM and other diseases across VAS, SG, TTO, and WTP; 3) disutility values attributable to CSM and other diseases; 4) the use of population-derived and individual preference values; and 5) the burden of data acquisition.
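To make the 2 techniques being compared more concrete, the sketch below shows one way each could split total disutility under an additive model. It is an illustration under stated assumptions, not the calculation used in the study: the relation `disease_free_Q = current_Q / current_q` (the q scale value read as current health expressed as a fraction of disease-free health) is an assumption, as are all function names and the example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disutility:
    csm: float    # disutility attributed to the study disease, Q scale
    other: float  # disutility attributed to all other diseases, Q scale

    @property
    def total(self) -> float:
        return self.csm + self.other

    @property
    def csm_share(self) -> float:
        """Proportion of total disutility attributed to the study disease."""
        if self.total == 0:
            raise ValueError("no disutility to apportion")
        return self.csm / self.total


def split_disutility(current_Q: float, disease_free_Q: float) -> Disutility:
    """Additive split, given current and disease-free health on the Q scale."""
    if not (0.0 < current_Q <= disease_free_Q <= 1.0):
        # Covers paradoxical inputs, e.g. disease-free health valued above 1.
        raise ValueError("expected 0 < current_Q <= disease_free_Q <= 1")
    return Disutility(csm=disease_free_Q - current_Q, other=1.0 - disease_free_Q)


def upper_anchor_point_adjustment(current_Q: float, current_q: float) -> Disutility:
    """Uses the individual's own Q and q scale values (assumed relation)."""
    if not 0.0 < current_q <= 1.0:
        raise ValueError("current_q must be in (0, 1]")
    # Assumption: q = Q / (value of disease-free health on the Q scale).
    return split_disutility(current_Q, current_Q / current_q)


def population_proxy_adjustment(current_Q: float, population_norm: float) -> Disutility:
    """Uses a normative population value as the disease-free value."""
    return split_disutility(current_Q, population_norm)


# Hypothetical participant: Q scale 0.50, q scale 0.625, population norm 0.85.
individual = upper_anchor_point_adjustment(0.50, 0.625)
proxy = population_proxy_adjustment(0.50, 0.85)
print(f"upper-anchor-point: CSM share {individual.csm_share:.0%}")
print(f"population-proxy:   CSM share {proxy.csm_share:.0%}")
```

In this sketch a paradoxical response (a Q scale value above the q scale value) implies a disease-free value above 1 and raises an error, which corresponds to a response that would have to be excluded.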
Nonsensical Responses and Disutility Proportions

For the first 2 criteria, neither technique is clearly superior: both techniques generated similar proportions of nonsensical responses, and the proportion of disutility attributable to CSM (63%–82%) and other diseases (18%–37%) was fairly consistent across the VAS, SG, TTO, and WTP (and quite similar when comparing population-proxy and upper-anchor-point adjustment).

Absolute Disutility Values

The absolute values of the disutilities attributed to CSM and other diseases were quite similar across the 2
techniques. In the absence of a gold standard, the slight differences do not clearly favor either technique.

Population-Derived versus Individual Preference Values

Population-derived preferences have been criticized for not being representative of the values of individual patients.11 A theoretical advantage of upper-anchor-point adjustment over the population-proxy adjustment is that the former uses the preferences only of individual patients. Others concede that although individual preference values are preferable for individualized decision making, population-derived preferences may be more appropriate for policy formation or clinical recommendations.12 The intended application of the study results may dictate which technique might be more appropriate.

Burden of Data Acquisition

Because it does not require preference data collection using 2 different scales, population-proxy adjustment is a more resource-efficient method of determining disease-specific disutility. This is more appealing both to researchers, who can collect data with less effort, and to research participants, who have a lower respondent burden. However, note that investigators using WTP cannot use population-proxy adjustment (there are no normative population values) and thus must employ upper-anchor-point adjustment if they wish to avoid the discrepancy between the Q and q scales.

Modeling Disutility

Most of our hypotheses regarding predicted relationships between disutility and comorbid diseases or CSM severity were not supported by the data. There was no association between comorbid disease burden and other-disease disutility. This may be a consequence of the constellation of comorbid disease variables collected during the study.
Our assessment of comorbid diseases consisted of a tabulation of common medical conditions that might affect the risk-benefit ratio when considering surgery (e.g., cardiovascular disease, hypertension) and of selected neurological conditions relevant to the assessment of outpatients in a neurosurgery clinic (e.g., carpal tunnel syndrome, stroke). Some of these diseases, such as hypertension or a resolved stroke, may not be overtly symptomatic in certain individuals and thus may not have a measurable impact on the valuation of current health. The study-induced focusing effect discussed above may also have contributed to the lack of association by
deemphasizing comorbid conditions. The only disutility hypotheses supported by the data were inconsistent relationships between CSM disease severity measured on the mJOA scale and CSM disease-specific disutility: there was a significant association between mJOA scores and CSM disease-specific disutility calculated with upper-anchor-point adjustment for VAS and WTP values. For both VAS and WTP, greater disease severity was associated with lower preferences (i.e., lower VAS scores and greater WTP).

Alternative Strategies for Measuring Disease-Specific Disutility

In this initial exploration of using Q and q scale measurements to calculate disutility, we did not use pre- and postcure measurement or disease-free scenario adjustment. Pre- and postcure adjustment has several requirements: 1) the pre- and postintervention assessment of health values and 2) specific criteria for a “cure” of the disease of interest. The determination of a cure is particularly vexing in a disease such as CSM, which affects numerous domains that contribute to quality-of-life and health valuations.10 Nichol and others5 were the first to assess the value of disease-free health on the Q scale using what we have termed disease-free scenario adjustment, using personalized hypothetical disease-free health state scenarios and the SG. Rather than valuing current health while using a hypothetical health state for the upper scale anchor (i.e., “disease-free health” defining the upper anchor of the q scale), this method asks participants to value a hypothetical health state (i.e., a disease-free health state scenario) on the Q scale. The disease-specific disutility is then calculated from the difference between the values of current health and the scenario.
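The difference calculation described above can be written as a short sketch. The function name and example values are hypothetical, and treating the remaining gap to perfect health as other-disease disutility is an additive-model assumption rather than a step stated in this section.

```python
def scenario_adjustment(current_health: float, disease_free_scenario: float) -> dict:
    """Split disutility from 2 values that were both measured on the Q scale.

    current_health: participant's value for his or her current health.
    disease_free_scenario: value for the hypothetical disease-free state.
    """
    for name, value in (("current_health", current_health),
                        ("disease_free_scenario", disease_free_scenario)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    if disease_free_scenario < current_health:
        raise ValueError("scenario valued below current health")
    return {
        # Difference between the scenario and current health, as described above.
        "disease_specific": disease_free_scenario - current_health,
        # Additive-model assumption: whatever separates the scenario from 1.0.
        "other_disease": 1.0 - disease_free_scenario,
    }


# Hypothetical participant: current health 0.60, disease-free scenario 0.85.
result = scenario_adjustment(0.60, 0.85)
print(result)
```

Both inputs sit on the Q scale, so no rescaling step is needed before taking the difference.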
The advantages of disease-free scenario adjustment are that 1) participants do not need to change the boundaries of their frames of reference during the testing process, because all measurements are on the Q scale; 2) the uniform use of the Q scale eliminates the need to transform values measured on the q scale (as in upper-anchor-point adjustment); and 3) the calculation of disease-specific and other-disease disutilities uses individualized data rather than population-normative values (as in population-proxy adjustment). The disadvantages of disease-free scenario adjustment are that 1) participants must value hypothetical health states (i.e., the absence of the study disease with residual comorbid diseases or the absence of comorbid diseases with persistent study disease) rather than their current health; and 2) 2 measurements are required to obtain the data necessary to calculate disease-specific and
other-disease disutility (i.e., Q scale measurements of current health state and disease-free or other-disease-free health state), contrasted with the single measurement on the Q scale (current health measured on the Q scale, followed by adjustment with population normative values; applies to population-proxy adjustment only). Future work comparing these and other utility techniques is necessary to better understand the advantages and disadvantages of each technique for determining study disease-specific and other-disease utility.

Limitations

This initial exploration of the impact of the Q and q scales on preference values used a sample of patients with CSM. We chose a disease with a defined set of symptoms to enable participants to differentiate between CSM symptoms and symptoms from other comorbid diseases during upper-anchor-point testing, and we used a standardized characterization of CSM symptoms. Nevertheless, these symptoms could also have been caused by other diseases (e.g., osteoarthritis, peripheral neuropathy), and thus, the disutility that we attributed to CSM may have included contributions from other diseases. The similarities of disutilities calculated using population-proxy and upper-anchor-point adjustment may not be generalizable to other diseases.

The population-proxy adjustment method used values from the Beaver Dam study obtained with the TTO. Using these TTO-derived values for calculations of disease-specific disutility in combination with values assessed with the VAS or SG may have introduced errors.

Both preference adjustment methods require excluding participants who give nonsensical responses. The relative contribution of CSM and other diseases to global disutility in these excluded participants remains unknown, and the exclusion of the responses of a significant proportion of the study population may detract from the validity of our findings, indicating a conceptual or practical flaw in the adjustment methodologies.
Future modifications of the preference elicitation protocols, including the direct measurement of multiple hypothetical health state values on the Q scale, may reduce the rates of nonsensical responses. Our findings may not be generalizable to other populations. Participants with other diseases may not show the same consistent ratios of disease-specific and other-disease disutility across the VAS, SG, TTO, and WTP shown by our CSM study population. The TTO responses were the most problematic in this study population. The TTO produced the greatest
proportion of paradoxical responses (23%), for which the value of perfect health was lower than the value of a CSM disease-free health state. The large proportion of paradoxical responses contributed to the lack of a difference in preference values for perfect health compared to CSM disease-free health states with the TTO. When the paradoxical responses were excluded, the expected rank ordering of disease-free and perfect health states emerged. Alternatively, the TTO responses may represent random differences between the Q and q scale testing, and censoring “paradoxical” responses may have created a difference although none actually existed. The difficulties with TTO responses may be a peculiarity of our study population or an artifact of our measurement protocol using the iMPACT3 software and a script read aloud by the research assistant. The software used during preference elicitation may have limited or influenced our results. The iMPACT3 software provided excellent graphical depictions of the VAS, SG, and TTO during preference testing. Nevertheless, the software’s limits on the range and gradations of responses during testing (VAS and TTO values were limited to the equivalent of 0.05 utility unit increments) may have influenced participants’ responses. Participants with minimal disutility may have had difficulty differentiating between their current health and perfect health because of the granularity of the VAS and TTO scales. The WTP Visual Basic program also limited gradations in the response options and imposed an upper limit (responses were capped at 10 times annual income), which may have restricted some participants’ preferences. Our analysis was limited to an additive model of utility. Such a model has the advantage of simplicity; however, the influence of multiple conditions on utility is probably more complex than a simple linear model. 
Some investigators have found that multiplicative or higher order models do a better job of describing the relative contributions of multiple conditions to global utility.13 Applying population-proxy and upper-anchor-point adjustment techniques as an adjunct to nonlinear utility models needs further exploration.

CONCLUSIONS

The upper anchor point affects VAS, SG, TTO, and WTP values for current health in patients with CSM. Values measured on the Q scale (with a perfect-health anchor) were generally lower than those measured on the q scale (with a disease-free anchor). Using 2 mathematical conversions (population-proxy adjustment and upper-anchor-point adjustment) of raw preference values measured on the Q and q scales to calculate CSM disease-specific disutility and other-disease disutility, we found that q scale measurements overestimated the disutility attributable to CSM compared to adjusted Q scale values.

It is not clear whether population-proxy or upper-anchor-point adjustment is superior. The proportion of total disutility attributable to CSM (63%–82%) is consistent across the VAS, SG, TTO, and WTP using either population-proxy or upper-anchor-point adjustment. Nonsensical values, disutility proportions, and absolute disutility values are similar using either technique. Population-proxy adjustment reduces respondent burden and researcher expense, but upper-anchor-point adjustment may have a theoretical advantage because it uses individual preference values instead of population-derived values for the disease-free health state, and it can also be used for WTP values. These preliminary findings need validation in other populations and other diseases.

REFERENCES

1. Torrance GW. Social preference for health states. Socio-Economic Planning Sci. 1976;10:129–36.
2. Diener A, O’Brien B, Gafni A. Health care contingent valuation studies: a review and classification of the literature. Health Econ. 1998;7:313–26.
3. Fryback DG, Lawrence WFJ. Dollars may not buy as many QALYs as we think: a problem with defining quality-of-life adjustments. Med Decis Making. 1997;17:276–84.
4. Fryback DG, Dasbach EJ, Klein R, Klein BEK, Peterson K, Martin PA. The Beaver Dam health outcomes study: initial catalog of health-state quality factors. Med Decis Making. 1993;13:89–102.
5. Nichol G, Llewellyn-Thomas HA, Thiel EC, Naylor CD. The relationship between cardiac functional capacity and patients’ symptom-specific utilities for angina: some findings and methodologic lessons. Med Decis Making. 1996;16:78–85.
6. Chiles BW III, Leonard MA, Choudhri HF, Cooper PR. Cervical spondylotic myelopathy: patterns of neurological deficit and recovery after anterior cervical decompression. Neurosurgery. 1999;44:762–70.
7. Lenert LA, Sturley A, Watson ME. iMPACT3: internet-based development and administration of utility elicitation protocols. Med Decis Making. 2002;22:464–74.
8. Spearman C. The proof and measurement of association between two things. Am J Psychol. 1904;15:72–101.
9. Cuzick J. A Wilcoxon-type test for trend. Statistics Med. 1985;4:87–90.
10. King JT Jr, Roberts MS. Validity and reliability of the SF-36 in cervical spondylotic myelopathy. J Neurosurg. 2002;97(suppl 2):180–5.
11. Boyd NF, Sutherland HJ, Heasman KZ, Tritchler DL, Cummings BJ. Whose utilities for decision analysis? Med Decis Making. 1990;10:67.
12. Gold MR, Siegel JE, Russell LB, Weinstein MC. Cost-Effectiveness in Health and Medicine. New York: Oxford University Press; 1996.
13. Torrance GW, Furlong WJ, Feeny D, Boyle MH. Multi-attribute preference functions: health utilities index. PharmacoEconomics. 1995;9:503–20.