JOURNAL OF THE ROYAL SOCIETY OF MEDICINE
Volume 90
December 1997
Are randomized controlled trials controlled? Patient preferences and unblind trials
Klim McPherson PhD1
Annie R Britton MSc1
John E Wennberg MD2
J R Soc Med 1997;90:652-656
SUMMARY
The most reliable information about treatment effects comes from randomized controlled trials (RCTs). However, the possibility of subtle interactions (for example, between treatment preferences and treatment effects) is generally subordinated in the quest for evidence about main treatment effects. If patient preferences can influence the effectiveness of treatments through poorly understood (psychological) pathways, then RCTs, particularly when unblinded, may wrongly attribute effects solely to a treatment's physiological/pharmacological properties. To interpret the RCT evidence base it is important to know whether any preference effects exist and, if so, by how much they affect outcome. Reliable measurement of these effects is difficult and will require new approaches to the conduct of trials. In view of the fanciful image with which such effects are portrayed and the uncertainties about their true nature and biological mechanisms, existing evidence is unlikely to provide sufficient justification for investment in trials. This is a Catch 22. Until an escape is found we might never know, even approximately, how much of modern medicine is attributable to psychological processes.
1Department of Public Health & Policy, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK; 2Center for the Evaluative Clinical Sciences, Hanover, New Hampshire, USA
Correspondence to: Professor Klim McPherson
INTRODUCTION
Medicine is best if based on hard evidence, and the most reliable evidence for treatment effects comes from randomized controlled trials (RCTs). In theory, by randomizing patients between a treatment and a control, the average biological effect, uncontaminated by all confounding, can be estimated by comparing the mean response between two groups. The pragmatic question of whether the treatment works in a consistent way dominates the legitimate quest for evidence but subordinates the complexities of interaction to second-order questions. One possible kind of interaction is that between physical and psychological influences of therapy, and of particular interest is the possibility that a preference for a treatment will alter the therapeutic effect. We can call this the therapeutic effect of patient preference, which is possibly strongly related to the well-known placebo effect1. This paper is not about treatment choices consequent upon preferences for particular outcomes. A preference for mastectomy or lumpectomy for breast cancer, or prostatectomy or watchful waiting for benign hypertrophy of the prostate, will be based on important attributable aspects of outcome which are known to be different. Such matters are clearly separate from the attributable effects of preference itself on therapeutic outcome. The existence of preferences for particular treatment can readily be studied but their possible attributable therapeutic effects remain poorly understood. The ability to detect such effects, if they exist, is intrinsically compromised. First, one can never randomize between enthusiasm for a treatment and hating the whole idea of it. Secondly, the serious possibility of confounding2 is usually present; people who tend to prefer something may be different in other ways, plausibly related to prognosis, from those who do not3,4. Thirdly, where people have strong preferences the possibility of randomizing between competing treatments is limited anyway5. Fourthly, detecting interactions is always difficult because of power considerations; possibly small interactions on small main effects can only be estimated with precision from very large numbers. An essential, but neglected, part of the evidence-based agenda is to disentangle the main physical effect of a treatment from any possible influence of individual preferences, notwithstanding all these difficulties.
EVIDENCE FOR PREFERENCE EFFECTS
How strong is the evidence that preference has important effects on outcome? Irrefutable evidence is sparse partly, perhaps, because the effects are difficult to detect reliably but also because the dominant agenda has been to control for or circumvent, rather than measure them. The possibility itself is often regarded as fanciful. Studies that have attempted to assess the possible impact of patient preferences include that by McKay et al. In this study of alcoholism rehabilitation, patients without preferences were randomized to day-hospital or inpatient treatment. Those with preferences selected one of the two treatment settings. There were no significant differences, in terms of relapses or psychosocial outcomes, between those with preferences and those without. Others have tried similar approaches6, but none managed to show conclusively that preferences affect outcome. All that can be concluded from the limited empirical evidence is that preferences exist and that the characteristics of patients who choose their treatments may be different from those of patients who agree to randomization.
Indirect evidence for such effects is not difficult to find. Links have been found between psychological factors (for example, beliefs or enthusiasm) and subsequent physiological outcomes. There is ample evidence to support the placebo effect7,8, especially in surgery9. If compliance with a placebo is a measure of some enthusiasm or belief in its worth, then these psychological factors may seem to have an important effect on outcome which is not strictly or directly pharmacological. None of the evidence is conclusive10-12, but, combined with the fascinating debate on the possible biological pathways for such psychogenic effects13,14, the issue is in principle important. We need to understand the circumstances in which such preference effects might be potent. Preference effects are most likely to influence the results of RCTs when blinding is difficult or impossible15, though even in supposedly blind trials patients may think they know which arm they are in and hence be susceptible to preference effects.
A SIMPLE ADDITIVE MODEL
In the absence of empirical evidence, patient preference effects can be investigated by means of a simple theoretical model. Imagine two treatments for some condition designated by A and B. Let us assume that, with regard to the purely physiological effect, A benefits on average a proportion $P$ of eligible people, and B a higher proportion $P+x$. Thus, taking an example where measured outcome is five year survival, if $P$ is 0.50 and $x$ is 0.10, then on average 60% would be alive at five years on treatment B. Assume also that having a preference for A bestows an extra average advantage for treatment A of an amount $y$, to $P+y$, and alternatively a preference for B of a similar amount $y$ (for simplicity), to $P+x+y$ for treatment B. Conversely, of those who prefer A, only $P+x-y$ will be affected if given treatment B, and of those who prefer B, $P-y$ will benefit if given A. These are postulated average interaction effects for patients among whom these treatments would be appropriate, and this simple model allows for a preference interaction even if the main effect ($x$) of the new treatment B is zero. These effects are summarized in Table 1.

Table 1 Postulated average effects
Treatment    Prefer A    Indifferent    Prefer B
On A         P+y         P              P-y
On B         P+x-y       P+x            P+x+y

If the proportion of the eligible population who prefer treatment A is $\alpha$, while $\beta$ prefer B and $\gamma$ are indifferent, then the model requires that $\alpha+\beta+\gamma=1$. More complex models could be imagined in which the effects of preference were multiplicative, graded, different for each treatment and/or asymmetric, but since these effects are poorly understood we want to investigate the simplest possible theoretical effects. It can be shown (by subtracting the estimated mean effect in the A group from that in the B group) that the best estimate of the attributable effect of treatment B over treatment A in a large well-conducted randomized comparison will be:

$$x + 2y(\beta - \alpha)$$

This is different from $x$ (the true 'physiological' effect) by an amount equal to $2y(\beta-\alpha)$ (the preference component). Hence such trials will only estimate the main treatment effect correctly either if $y$ is zero (no effect of preference) or if $\beta=\alpha$ (an equal proportion prefer A and B). Remember that for simplicity we have assumed that the preference effects are equal for both treatments (a value of 'y' is added to both). A more complex model would be needed to incorporate graded or changing preferences. Also here the effects of random variation are ignored, and in general therefore distinguishing reliably between $x$ and $y$ will require very large trials. In practice, of course, this may be hard to achieve since those with strong preferences are likely to seek treatment elsewhere.
Under reasonable assumptions on $y$ and on $\beta-\alpha$ (ref. 16), by how much might RCTs wrongly estimate main treatment effects, if the true nature of any preference effect is poorly understood? Consider the size of the difference in the proportion preferring the two treatments; if 35% prefer treatment B and 60% treatment A, the difference is 25%, i.e. $(\beta-\alpha)=-0.25$. In this model if the average 'physiological' effect of B over A (i.e. $x$) is 10% (let the effect of A alone be arbitrarily 50%) and if the preference advantage (i.e. $y$) is 5% then the actual treatment effects will be as in Table 2.

Table 2 Postulated effects of treatment
Treatment    Indifferent    Prefer A    Prefer B
On A         50%            55%         45%
On B         60%            55%         65%

If these values are ever appropriate then simple substitution will indicate that a fair RCT will be 25% 'out', i.e. in this case, $2y(\beta-\alpha)$ is 25% of $x$. $x$ (the 'physiological' effect) will be estimated as $\tfrac{3}{4}x$, or if 60% prefer B and 35% A, i.e. $(\beta-\alpha)=0.25$, then the unbiased RCT estimate will be $1\tfrac{1}{4}x$. Either way these results would be wrong, since the estimated effect will be attributed to the treatment alone but will in reality reflect the (possibly changing) distribution and extent of (usually unknown) preference effects. If the difference in the proportions who prefer A or B is 50% then the size of the 'bias' from a randomized comparison rises to 50%, for these hypothetical values of $x$=10% and $y$=5%. If, however, $y$ is only 1% then the 'biases' in the results of RCTs will be reduced to 5% for a 25% difference in proportions with contrasting preferences, and 10% for a 50% difference. However, if $y$ is 10% (i.e. the role of preference is more profound than the physiological treatment effect) then the trials will be, respectively, 50% and 100% 'out', on average (i.e. the treatment effect will be estimated as $1\tfrac{1}{2}x$ or $2x$). This is potentially important if such large differences in the prevalence of preferences can be shown to be plausible. Consider an unblind or poorly blinded trial (in the sense that patients might well be able to tell which treatment they are on) comparing placebo with a supposedly active new treatment, where the benefits are biologically plausible but which in fact has no additional physiological benefit. The difference in the prevalence of preferences might be large, with 90% preferring the new 'active' treatment, with 5% preferring the control and 5% being indifferent. If plausible, $\beta-\alpha$ becomes 0.85 and hence, if the preference effect remains as much as 5%, the bias is 8.5% in absolute terms. Consequently, if the natural history is such that 50% survive anyway, such a trial would suggest that treatment improved this to 58.5%.
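The arithmetic of the additive model can be checked numerically. The following is a minimal Python sketch, not part of the original analysis, that takes the six postulated cell values (On A or On B, for those who prefer A, are indifferent, or prefer B), averages them over the preference distribution within each randomized arm, and subtracts the A-arm mean from the B-arm mean. The function names (`arm_mean`, `rct_estimate`, `preference_component`) are illustrative assumptions; the parameters `P`, `x`, `y`, `alpha` and `beta` follow the definitions in the text, and random variation is ignored, as it is there. Under these assumptions the expected benefit is $P + y(\alpha-\beta)$ in the A arm and $P + x + y(\beta-\alpha)$ in the B arm, so the difference is $x + 2y(\beta-\alpha)$.

```python
"""Expected RCT estimate under the simple additive preference model (sketch)."""


def arm_mean(treatment, P, x, y, alpha, beta):
    """Expected proportion benefiting in one randomized arm.

    alpha and beta are the proportions preferring A and B respectively;
    the remainder (gamma = 1 - alpha - beta) are indifferent.
    """
    gamma = 1.0 - alpha - beta
    if gamma < -1e-12:
        raise ValueError("alpha + beta must not exceed 1")
    if treatment == "A":
        prefer_a, indifferent, prefer_b = P + y, P, P - y
    elif treatment == "B":
        prefer_a, indifferent, prefer_b = P + x - y, P + x, P + x + y
    else:
        raise ValueError("treatment must be 'A' or 'B'")
    # Randomization gives each arm the same mix of preferences.
    return alpha * prefer_a + gamma * indifferent + beta * prefer_b


def rct_estimate(P, x, y, alpha, beta):
    """Expected difference (B arm minus A arm) in a large randomized trial."""
    return arm_mean("B", P, x, y, alpha, beta) - arm_mean("A", P, x, y, alpha, beta)


def preference_component(y, alpha, beta):
    """Closed-form amount by which the estimate differs from x."""
    return 2.0 * y * (beta - alpha)


if __name__ == "__main__":
    # Scenarios taken from the worked figures in the text.
    scenarios = [
        ("60% prefer A, 35% prefer B", dict(P=0.50, x=0.10, y=0.05, alpha=0.60, beta=0.35)),
        ("35% prefer A, 60% prefer B", dict(P=0.50, x=0.10, y=0.05, alpha=0.35, beta=0.60)),
        ("placebo-like trial, x = 0", dict(P=0.50, x=0.00, y=0.05, alpha=0.05, beta=0.90)),
    ]
    for label, params in scenarios:
        estimate = rct_estimate(**params)
        component = preference_component(params["y"], params["alpha"], params["beta"])
        print(f"{label}: estimate = {estimate:.3f}, preference component = {component:+.3f}")
```

Computing the estimate from the cell values, rather than from the closed form alone, makes it easy to swap in asymmetric or graded preference effects and see how the estimate changes.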
This raises concern for cancer and cardiovascular treatments, and much else besides, which have been evaluated in many enormous unblind trials (and meta-analyses) with highly significant estimated benefits of around 10% improvement on the conventional treatment. These treatments are now offered on the essentially unquestioned assumption that they act through stable, general (and plausible) physiological processes. Preference effects may, however, not be stable, and will not be general. The implications of these arguments are important in our interpretation of evidence of new treatments. As clinical researchers are encouraged to randomize between successive new treatments rather than compare every new
treatment with placebo17, the average net treatment effect in RCTs is minimized (new treatments are, on average, less different from current treatment than from a placebo). Patients with a chronic disease for which the current treatment is not very effective may be attracted to new treatments. Although Chalmers shows that new treatments are just as likely to be worse as better than their predecessors18, such a finding is unlikely to be widely absorbed either by enthusiastic clinicians or by their anxious patients. Believing in a treatment does not, of course, necessarily enhance its effectiveness. Patients may prefer one treatment over another not because they believe the outcome to be superior, but because they find the process of that preferred treatment more acceptable. However, if preferences do enhance outcome, the consequences for the uptake of new treatments may be important. If new treatments are favoured over established ones for conditions with poor prognosis (50% survival for example), then new treatments may gain in apparent effectiveness even if they have no additional straight physiological benefit. There is then a tendency for this process to escalate as evidence from each successive RCT may affect patient preferences, directly or indirectly19. Such a process might accrue more and more expensive (and possibly unpleasant) treatments which are actually no better, in the sense of the postulated physiological mechanism, than the standard treatment. From a pragmatic standpoint, this may not matter; the patients will fare better for psychological reasons. However, if the preferences component (2y above) is larger than the physiological effect of treatment (x above), then RCTs could wrongly attribute benefits, causing some to suffer a net disadvantage.
CONCLUSIONS
The argument that patient preferences can have important effects on the results of RCTs has been made theoretically, but empirical evidence is hard to find. We do, however, need to understand the nature of the phenomenon better and, in particular, where preferences are important and where they are not, and what the implications of this are for our understanding of the RCT evidence base. The obvious progression is to mount randomized trials from which the size of preference effects can be reliably measured. But this is difficult. Gerta Rucker has postulated a two-stage design based on randomization between two groups (Figure 1)2. The two arms compare outcomes when patients are given no choice with outcomes when patient preferences, where they exist, are followed. If people with strong preferences can be recruited into such a trial (Torgerson et al.4, for example, demonstrate that it is possible, with limitations), the estimation of any preference effect remains complex even if the simple model discussed above is realistic. The problem is one of interpretation since, in this case, subtracting the means from the two randomized groups provides an estimate of a complex combined algebraic function of the main physiological effects and any preference effect ($x$ and $y$, respectively). As we have emphasized, measuring the existence of main physiological effects is, in the known absence of preference effects, straightforward, but estimating interactions, such as $y$, is difficult. This is true of the simple assumptions made here, but more complex assumptions quickly render the solution more intractable. The physiological effect itself can be estimated from the randomized arm, but with an unknown preference component, as above, based on imprecise estimates of the proportions $\alpha$ and $\beta$ from the preference arm. A preference effect may thus be estimable from comparison of the results from the two arms, but the error structures are formidable. The algebra, which is highly laborious (and available from the authors), confirms the difficulty of estimating these interactions reliably. Large numbers are essential for sufficient precision, even for moderate effects.
Figure 1 Rucker's two-stage design
It may be easier to put into practice the trial design described by Brewin and Bradley20 (Figure 2), but this will produce results for which a preference effect cannot be disentangled from the possible confounding arising from differences between patients with particular treatment preferences. The physiological effect can be estimated from those patients who agreed to randomization, but this finding will have limited generalizability. Alternative methods4 involve recording preferences before randomization as a covariate and estimating a preference effect by including a 'y' type term in a regression equation estimating the main (x type) effects. Unless the trial is enormous such estimates will probably be too imprecise to distinguish them reliably from main physiological effects.
Figure 2 Brewin and Bradley's design
To interpret RCT results properly we need to know that estimates of treatment effects are free from important preference components $2y(\beta-\alpha)$, particularly as preferences can change while physiological effects may be less volatile. It is crucial, ultimately, to understand why a treatment works. Preference trials21 can answer contemporary pragmatic questions about which treatment works best, incorporating both individual choice and their preference effects, but double-blind trials are more likely to control for any psychological effects and hence detect the physiological effects alone. Physiological effects cannot be reliably observed from unblind trials at all, since certainty that there are no preference effects is never justifiable experimentally. Sufficiently large trials to discern preference effects reliably will require a formidable biological or clinical justification to obtain enough enthusiastic support. Invoking somewhat fanciful mechanisms for which there is not much evidence is unlikely to be sufficient justification. It remains to be seen whether enough evidence can be obtained from what we now already know and can agree upon. If it cannot, the larger question will be whether sufficient evidence can ever be amassed to justify a good enough trial. Perhaps the first step should be the systematic documentation of the importance of the preference effect on clinical outcomes. Such studies can proceed with less formidable designs as part of the informed consent process required for human experimentation, for example by following the preference arm in the Rucker model (see Figure 1).
Under this design, patients are carefully and fully informed about the relevant scientific uncertainties and invited to choose their treatment. Those with little or no preference for treatment are encouraged to accept randomization. The systematic follow-up of such cohorts would offer the opportunity to establish through randomization the physiological effects of treatment among those with no preference and to learn whether patients with apparently similar prognostic characteristics who actively choose their treatments have different outcomes from those predicted by randomization. How much of modern medicine is attributable to such psychological factors is an important question. Patients with similar clinical conditions and severity of illness can have quite different preferences for treatment22. Ultimately the demands of clinical research will be to provide reliable information on the chances for particular outcomes based on the experience of patients who actually choose their
treatment. To understand the therapeutic effects of preferences is a vital and formidable empirical task.
Acknowledgment For important contributions to earlier drafts we thank Iain Chalmers, Nick Black, Andrew Herxheimer, Martin McKee, Ann McPherson, Ann Oakley, Colin Sanderson, Deborah Waller, Kaye Wellings and Charles Warlow.
REFERENCES
1 O'Boyle CA. Diseases with passion. Lancet 1993;342:1126-7
2 Rucker G. A two-stage trial design for testing treatment, self-selection and treatment preference effects. Stats Med 1989;8:477-85
3 McKay JR, Alterman AI, McLellan, Snider EC, O'Brien CP. Effect of random versus nonrandom assignment in a comparison of inpatient and day hospital rehabilitation for male alcoholics. J Consulting Clin Psychol 1995;63:70-8
4 Torgerson DJ, Klaber-Moffett J, Russell IT. Patient preferences in randomised trials: threat or opportunity. J Health Services Res Policy 1996;1:194-7
5 Lilford RJ. Equipoise and the ethics of randomisation. J R Soc Med 1995;88:552-9
6 Nicolaides K, de Lourdes Brizot M, Patel F, Snijders R. Comparison of chorionic villus sampling and amniocentesis for fetal karyotyping at 10-13 weeks' gestation. Lancet 1994;344:435-9
7 The Coronary Drug Project Team. Influence of adherence to treatment and response of cholesterol on mortality in the Coronary Drug Project. N Engl J Med 1980;303:1038-41
8 McPherson K. The best and the enemy of the good: randomised controlled trials, uncertainty, and assessing the role of patient preference in medical decision making. The Cochrane Lecture. J Epidemiol Commun Health 1994;48:6-15
9 Barsamian EM. The rise and fall of internal mammary artery ligation in the treatment of angina pectoris and the lessons learned. In: Bunker JP, Barnes BA, Mosteller F, eds. Costs, Risks, and Benefits of Surgery. New York: Oxford University Press, 1977:212-22
10 Chaput de Saintonge DM, Herxheimer A. Harnessing placebo effects in health care. Lancet 1994;344:995-8
11 Kleijnen J, de Craen AJM, Everdingen JV, Krol L. Placebo effect in double-blind clinical trials: a review of interactions with medications. Lancet 1994;344:1347-9
12 Phillips DP, Ruth TE, Wagner LM. Psychology and survival. Lancet 1993;342:1142-5
13 Levy SM, Herberman RB, Lee J, Whiteside T, Beadle M, Heiden L, Simons A. Persistently low natural killer cell activity, age, and environmental stress as predictors of infectious morbidity. Nat Immun Cell Growth Regul 1991;10:289-307
14 Redd WH, Silverfarb PM, Anderson BL, Andrykowski MS, Bovbjerg DH et al. Physiologic and psychobehavioral research in oncology. Cancer 1991;67:813-22
15 MacIntyre IMC. Tribulations for clinical trials. Poor recruitment is hampering research. BMJ 1991;302:1099-100
16 McPherson K. Patients' preferences and randomised trials. Lancet 1996;347:1119
17 Rothman KJ. Placebo mania. BMJ 1996;313:3-4
18 Chalmers I. What is the prior probability of a proposed new treatment being superior to established treatments? BMJ 1997;311:74-5
19 Strong PM. The Ceremonial Order of the Clinic. London: Routledge & Kegan Paul, 1979
20 Brewin CR, Bradley C. Patient preferences and randomised clinical trials. BMJ 1989;299:313-15
21 Wennberg JE. What is outcomes research? In: Gelijns AC, ed. Medical Innovations at the Crossroads: Modern Methods of Clinical Investigation, Vol 1. Washington DC: National Academy Press, 1990:33-46
22 Onel E, Hammond C, Wasson JH, Berlin BB, et al. An assessment of the feasibility and impact of shared decision making in prostate cancer. Urology (in press)