Assessing Placebo Response Using Bayesian Hierarchical Survival Models

Dalene K. Stangl, Duke University
Joel B. Greenhouse, Carnegie Mellon University

January 6, 1995

Abstract

The National Institute of Mental Health (NIMH) Collaborative Study of Long-Term Maintenance Drug Therapy in Recurrent Affective Illness was a multicenter randomized controlled clinical trial designed to determine the efficacy of a pharmacotherapy for the prevention of the recurrence of unipolar affective disorders. The outcome of interest in this study was the time until the recurrence of a depressive episode. The data show much heterogeneity between centers for the placebo group. The aim of this paper is to use Bayesian hierarchical survival models to investigate the heterogeneity of placebo effects among centers in the NIMH study. This heterogeneity is explored in terms of the marginal posterior distributions of parameters of interest and predictive distributions of future observations. The Gibbs sampling algorithm is used to approximate posterior and predictive distributions. Sensitivity of results to the assumption of a constant hazard survival distribution at the first stage of the hierarchy is examined by comparing results derived from a two component exponential mixture and a two component exponential changepoint model to the results derived from an exponential model. The second component of the mixture and changepoint models is assumed to be a surviving fraction. For each of these first stage parametric models sensitivity of results to second stage prior distributions is also examined.

KEY WORDS: Bayesian; Hierarchical; Survival; Mixture; Changepoint; Predictive.

Acknowledgments

This work was supported in part by grants from the National Institute of Mental Health, MH15758 and MHCRC30915, by a grant from the National Cancer Institute, CA54852, and by a grant from the John D. and Catherine T. MacArthur Research Network on the Psychobiology of Depression.

1 Introduction

For many individuals, unipolar depression is a recurrent, disabling illness. It is estimated that 50 percent of those suffering one depressive episode will suffer another within the ensuing ten years, and that those who have experienced two depressive episodes have almost a 90 percent chance of experiencing a third (Keller et al. 1982). Because clinical depression is such a debilitating illness and because recurrent depression is, in fact, a major public health problem (Wells et al. 1989), there has been increasing interest during the last fifteen years in treatments for the prevention of the recurrence of future depressive episodes. These interventions have focused primarily on treating non-symptomatic patients with maintenance doses of pharmacotherapies, such as imipramine, that have been shown to be effective in the treatment of acute episodes of depression.

In the late 1970's the National Institute of Mental Health (NIMH) sponsored a collaborative five-center randomized controlled clinical trial to evaluate the comparative efficacy of maintenance treatment for the prevention of the recurrence of depression (Prien et al. 1984). In this study, patients in an acute episode of depression who had experienced at least one previous episode of depression within the past 2 1/2 years were eligible to participate in a randomized controlled maintenance therapy trial if i) they responded to imipramine for treatment of the acute illness, and ii) once stabilized remained symptom free, that is, did not show signs of depressive symptomatology, for a period of eight consecutive weeks. Eligible patients were then randomly assigned either to receive maintenance doses of imipramine or placebo. (For more details see Greenhouse, Stangl, Kupfer and Prien 1991.) Patients were followed prospectively for two years or until they had a recurrence of depression.
The objective of the NIMH study was to determine whether time-to-recurrence was prolonged for patients receiving the active treatment, maintenance doses of imipramine. The use of a placebo control is a fundamental tenet in randomized clinical trials for diseases in which there are no effective standard therapies such as for the prevention of the recurrence of depression. We define placebo to be an intervention designed to simulate therapy, but which is not believed to be a specific therapy for the target condition. With the use of a placebo it is hoped that patient attitudes to the trial in both treatment and control groups are as similar as possible. A placebo-control group also provides the opportunity to make the trial double-blind and thereby eliminate observer bias. Underlying the use of placebo in clinical investigations is the recognition that there are nonspecific effects of interventions attributable to factors other than specific active components.

Placebo effects have been noted across many different conditions involving a number of different interventions (Beecher 1955). For example, based on a literature synthesis spanning 30 years of clinical trials, Brown (1994) found that double-blind placebo controlled antidepressant efficacy studies have consistently shown that 30 to 40 percent of moderately to severely depressed patients improve with placebo treatment. Yet in general, little is understood about placebo response and the extent to which placebo can account for improvements observed in clinical studies (e.g., see Turner et al. 1994). In psychopharmacologic clinical trials, for instance, placebo is clearly more than an inert capsule. As Brown (1994) notes, ... [placebo patients] are the recipients of the common treatment factors present in

any plausible treatment situation. These include expectation of improvement, demand for improvement, and clinical enthusiasm, effort, and commitment. The subjects of antidepressant clinical trials also receive to varying degrees, depending on the treatment setting and clinician, the opportunity to verbalize distress, encouragement, mobilization of hope, attention and positive regard.

No wonder the non-active intervention arm in psychopharmacology trials is often affectionately referred to as the "warm placebo".

The aim of this paper is to use the multicenter design of the NIMH study to investigate the heterogeneity of the placebo response for the prevention of the recurrence of depression. Our methodological aim is to develop an approach to the analysis of multicenter clinical trials where time-to-response is the outcome of interest. Specifically, we consider a Bayesian treatment of hierarchical parametric survival models. Skene and Wakefield (1990) have taken a similar approach to the analysis of multicenter clinical trials when the response variable is binary. Gray (1994) investigates institutional differences in such studies using a Bayesian hierarchical model that incorporates a proportional hazards structure.

The hierarchical models that we consider in this paper are highly parameterized. A basic and challenging problem in the practice of statistics for both Bayesian and classical statisticians is the problem of model specification. Since no statistical model can be assumed adequate it is important to consider methods for model criticism (Box 1980). One feature of the Bayesian approach is that it provides a formal methodology for criticizing a statistical model and for assessing the sensitivity of inferences to the model. From the Bayesian point of view, the model consists of all aspects of

the available information about the substantive problem that one can express in the joint density obtained by combining the likelihood and the prior. Perhaps because much of the criticism of Bayesian methods has focused on the subjectivity in the choice of the prior distribution, there is now a relatively large literature on methods for Bayesian prior sensitivity analysis to assess the sensitivity or robustness of an analysis to possible misspecification of the prior distribution (see for example, Berger 1984; Kass and Greenhouse 1989; Wasserman 1992; Greenhouse and Wasserman 1995). On the other hand, because it is often argued that the model for the data (e.g., the specification of a survival distribution) has some theoretical justification or may have some external validity, this component of the specification of the model has in practice been treated as if known with certainty. As a result there are few formal methods for investigating the sensitivity of inferences to this specification (see, for example, Cox, 1990, and Lehmann, 1990; notable exceptions include Box, 1980; Skene, Shaw and Lee, 1986; Smith and Spiegelhalter, 1980).

In this paper several different model specifications are considered. Each complete specification is a hierarchical model. Within a center (the first stage of the hierarchy), a constant hazard rate model and two different specifications for nonconstant hazard rate models are considered. At the second stage of the hierarchy, a range of prior distributions will be specified for the parameters of the survival models specified at stage I. Sensitivity of the placebo effects within each center due to the different prior specifications will be examined by comparing the posterior distributions of the parameters of interest across the alternative Stage II specifications.
Sensitivity of the treatment effects at each center due to changes in Stage I specifications will be examined by comparing the predictive distributions for future observations across the alternative Stage I specifications. The calculation of posterior and predictive distributions is facilitated by using the Gibbs sampling algorithm (Geman and Geman, 1984; Gelfand and Smith, 1990), which makes the type of sensitivity analysis described here feasible. Using this algorithm it is relatively easy to specify alternative likelihood functions as well as alternative specifications for prior distributions.

The outline for the remainder of the paper is as follows. In the next section the original analysis and major results of the NIMH trial are reviewed. In section 3 several hierarchical parametric survival models for the analysis of treatment effects in a multicenter clinical trial are specified. The hierarchical nature of these models allows us to investigate effects due to center. The results from the placebo arm of the NIMH trial are discussed in section 4. The paper concludes with a summary and discussion of the methodology in section 5.

2 Background and Results of the NIMH Study

For many individuals, unipolar depression is a recurrent, disabling illness. It is estimated that 50 percent of those suffering one depressive episode will suffer another within the ensuing ten years, and that those who have experienced two depressive episodes have almost a 90 percent chance of experiencing a third (Keller et al. 1982). Because clinical depression is such a debilitating illness and because recurrent depression is, in fact, a major public health problem (Wells et al. 1989), there has been increasing interest during the last fifteen years in treatments for the prevention of the recurrence of future depressive episodes. These interventions have focused primarily on treating non-symptomatic patients with maintenance doses of pharmacotherapies, such as imipramine, that have been shown to be effective in the treatment of acute episodes of depression.

The NIMH Collaborative Study was a five center randomized controlled clinical trial designed to evaluate the comparative efficacies of several maintenance treatments for the prevention of the recurrence of depression (Prien et al. 1984). In this study, patients in an acute episode of depression who had experienced at least one previous episode of depression within the past 2 1/2 years were eligible to participate in a randomized controlled maintenance therapy trial if i) they responded to imipramine for treatment of the acute illness, and ii) once stabilized remained symptom free, that is, did not show signs of depressive symptomatology, for a period of eight consecutive weeks. Eligible patients were randomly assigned to one of four maintenance treatment groups. Patients were followed prospectively for two years or until they had a recurrence of depression. The initial objective of the NIMH study was to determine whether time-to-recurrence was prolonged for patients receiving the active treatment, maintenance imipramine.
This paper will focus only on subjects who were withdrawn from imipramine at the time of randomization. This group served as the placebo arm of the clinical trial, and will be referred to as such throughout the rest of this paper. Table 1 presents the number of patients randomly assigned to the placebo group in each center. As noted above, the outcome of interest in the NIMH study was the time until the first recurrence of a depressive episode after randomization to maintenance therapy. In Figure 1 the results of the placebo arm of the NIMH study using the methods of Prien et al. (1984) are presented. The primary analysis of time-to-recurrence of depression was based on the Kaplan-Meier estimates. In the original analysis data from all centers were pooled. That analysis ignored the heterogeneity across centers evident in Figure 1.

Table 1 and Figure 1 about here

It is clear from this figure that the recurrence rates are very different between the centers. The largest difference is seen between the two largest centers, Center D and Center E. By 10 weeks 70% of the subjects had already experienced a recurrence of depression at Center E, while at Center D only about 50% experienced a recurrence by the end of the study at 104 weeks. These differences clearly warrant further investigation and must be taken into account in the analysis. The Generalized Savage (Mantel-Cox) statistic was 16.95 on 4 degrees of freedom with a p-value of .002.
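The reported p-value can be checked by hand, since the upper tail of a chi-square distribution with an even number of degrees of freedom has a closed form. The sketch below is only an arithmetic check of the quoted statistic (16.95 on 4 degrees of freedom); the function name is illustrative and is not part of the original analysis.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)


# Statistic and degrees of freedom as quoted in the text.
p_value = chi2_sf_even_df(16.95, 4)
print(f"p-value = {p_value:.4f}")
```

The result rounds to .002, in agreement with the value quoted above.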

3 Model Specification

The individual center Kaplan-Meier survival curves suggest that there is variability in the response among centers. A natural approach to account for this variability across centers is to build a multi-stage hierarchical model, where at the first stage of the model, a survival distribution for time-to-recurrence conditional on center-specific parameters is specified. In effect, it is assumed that patients on a given treatment within a center are similar and their survival experience is modeled with a common survival distribution. Then at the second stage, heterogeneity among the centers is allowed by specifying a model for a population distribution of centers which in turn will be indexed by a set of hyperparameters. In other words, centers are treated as random.

A feature of the hierarchical model is that it permits investigation of the relationship between first stage parameters and overall population parameters. Specifically, the structure imposed by the hierarchical model uses information from all centers to provide estimates for specific center effects. Specification of second and third stage parameters allows control over how much information is shared among centers. This notion of sharing information across centers has been called "borrowing strength" and results in shrinkage-type estimators. (See for example, Lindley and Smith, 1972; Kass and Steffey, 1989.)

In the models to be considered here, survival times within each treatment-center subgroup will be modeled as conditionally independent. That is, conditional on a set of center parameters, observations within a center are assumed independent and identically distributed. Similarly, conditional on a set of hyperparameters, that is, second-stage parameters, the center parameters of the first stage are treated as independent and identically distributed at the second stage. Finally, the hyperparameters of the second stage are assumed known or given a prior distribution of their own.

3.1 Stage I Specification

Let $T_i$ denote the random time to recurrence of a depressive episode for patients in center $i$, $i = 1, \ldots, 5$. $S(t)$ will denote a survival distribution, i.e., $S(t) = P(T > t)$. Three parametric stage I models for the distribution of time-to-recurrence are considered.

Model A: Exponential

$$S_e(t) = e^{-\theta_i t} \tag{1}$$

Remarks. The specification of an exponential survival distribution at stage I for each center implies that the hazard rate within each treatment group is constant. Inspection of Figure 1 suggests that this is not a valid assumption. However, the exponential survival model is considered here for comparative purposes.

Model B: Exponential Mixture Model with a Surviving Fraction

$$S_m(t) = (1 - \pi_i)\, e^{-\theta_i t} + \pi_i \tag{2}$$

Remarks. The parameter $\pi_i$ is the probability of not having a recurrence and is sometimes called the surviving fraction or cure rate (Boag 1949; Farewell 1983; Greenhouse and Wolfe 1984). This model implies that the patients in center $i$ receiving treatment $j$ are a mixed population with respect to their risk of a recurrence of depression and therefore the model implies a nonconstant hazard rate. In effect, this model can be thought of as an approximation to a survival function that is a mixture of two survival distributions where the hazard rate for one subgroup of the population is very small.

Model C: Exponential Changepoint Model with a Surviving Fraction

$$S_c(t) = \begin{cases} e^{-\theta_i t}, & t < \tau_i \\ e^{-\theta_i \tau_i}, & t \ge \tau_i \end{cases} \tag{3}$$

Remarks. The parameter $\tau_i$ refers to the point in time where the hazard rate changes. (See, for example, Achcar and Bolfarine, 1989; Carlin et al. 1992; Raftery and Akman, 1986; and Smith, 1980.) In the application presented here, consider the time $\tau$ to mark the end of a vulnerable period for the recurrence of depression. Throughout the time interval $(0, \tau)$ observations occur according to an exponential distribution. If a patient survives this time interval it is assumed that the patient will not experience a recurrence. The survival function at time $\tau$ represents the proportion of patients who will not experience a recurrence and hence can also be thought of as a surviving fraction. This model differs from the previous model in an important way. Whereas in model B the population consists of a mixture of two subgroups, the changepoint model consists of a homogeneous population whose survival distribution changes at time $\tau$.
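The three stage I survival functions can be written down directly, which makes the plateau behaviour of Models B and C easy to see. The following is a minimal sketch; the argument names (`theta` for the hazard rate, `pi_` for the surviving fraction, `tau` for the changepoint) and the parameter values in the usage lines are illustrative assumptions, not estimates from the NIMH data.

```python
import math


def surv_exponential(t: float, theta: float) -> float:
    """Model A: constant hazard theta."""
    return math.exp(-theta * t)


def surv_mixture(t: float, theta: float, pi_: float) -> float:
    """Model B: a fraction pi_ never recurs; the rest recur at hazard theta."""
    return (1.0 - pi_) * math.exp(-theta * t) + pi_


def surv_changepoint(t: float, theta: float, tau: float) -> float:
    """Model C: hazard theta up to tau, no recurrences afterwards."""
    return math.exp(-theta * min(t, tau))


# Hypothetical parameter values, chosen only to illustrate the shapes.
theta, pi_, tau = 0.03, 0.25, 52.0
for weeks in (0, 26, 52, 104):
    print(
        weeks,
        round(surv_exponential(weeks, theta), 3),
        round(surv_mixture(weeks, theta, pi_), 3),
        round(surv_changepoint(weeks, theta, tau), 3),
    )
```

Under Model A the curve keeps falling toward zero, under Model B it levels off at the surviving fraction, and under Model C it is flat from the changepoint onward.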

3.2 Stage II and Stage III Specifications

The specification at stage II of the hierarchical model attempts to describe the relationships between the first stage parameters and the overall population parameters. When interest focuses on combining or summarizing information about treatment response across centers, the fixed stage II and III parameters can be thought of as summarizing the prior information about the placebo effects in the population. For the analyses considered here, different specifications of the second and third stage parameters allow control over how much information is shared between centers. Thus, for each stage I model a set of prior distributions reflecting diffuse prior information about the parameters of interest as well as a set of prior distributions that reflect more certainty about the location of the placebo effect will be used. This allows investigation of sensitivity of results due to different specifications of stage I models.

For the exponential survival distribution, a conjugate family of gamma distributions is introduced at stage II to model the heterogeneity in the hazard rates of stage I. The gamma distributions are indexed by the parameters $\alpha$ and $\beta$ (with mean $\alpha/\beta$ and variance $\alpha/\beta^2$). Fixing the mean of the gamma distribution, the parameter $\alpha$ can be thought of as reflecting a prior sample size. As $\alpha$ increases the posterior distribution of $\theta$ is more heavily influenced by the prior distribution. The parameter $\beta$ can be thought of as representing the a priori sum of recurrence times for those patients.

For the exponential mixture model with a surviving fraction, consider for the prior distribution of the surviving fraction, $\pi$, the conjugate family of beta distributions indexed by the parameters $\gamma$ and $\delta$ (with mean $\gamma/(\gamma+\delta)$ and variance $\gamma\delta/[(\gamma+\delta+1)(\gamma+\delta)^2]$). The parameter $\gamma$ can be thought of as reflecting a prior assessment of the number of subjects in the surviving fraction (successes) out of a total of $\gamma + \delta$ subjects. The prior distribution for the hazard rate of the non-surviving fraction will be a gamma distribution.

For stage II of model C, the exponential changepoint model, two distributions are needed: one for the changepoint and one for the pre-changepoint hazard rate. A distribution that was linearly increasing on the integers from 26 to 51, uniform on the integers from 52 to 78, and linearly decreasing on the integers from 79 to 104 was used for the changepoint. The prior was zero elsewhere. A gamma distribution will be used for the prior of the pre-changepoint hazard rate.

At the third stage of the hierarchy, specifications are made for the stage II parameters. While in a fully Bayesian treatment all stage II parameters would receive prior distributions, for comparison purposes some of the stage II parameters will be given fixed values. The prior specification for the parameter $\beta$ from the stage II gamma distributions will be a gamma distribution with parameters $\nu$ and $\omega$. Interpreting $\alpha$ as a prior sample size, the distribution of $\beta$ can be thought of as the prior distribution for the sum of recurrence times for $\alpha$ patients. By considering a fixed mean, $\alpha/\beta$, for small $\alpha$ the prior is relatively flat, so more heterogeneity is modeled and less information will be shared across the centers in obtaining posterior estimates for placebo effects within specific centers.
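As $\alpha$ increases the prior becomes more peaked, allowing less variability between the centers and more shrinkage of the estimates toward a common pooled estimate. The stage II and stage III specifications for each stage I model are summarized below.

Model A: Exponential

Stage II:

$$\theta_i \mid \alpha, \beta \overset{\text{iid}}{\sim} \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \theta_i^{\alpha-1} e^{-\beta\theta_i} \tag{4}$$

Stage III:

$$\beta \sim \frac{\nu^{\omega}}{\Gamma(\omega)}\, \beta^{\omega-1} e^{-\nu\beta} \tag{5}$$

$\alpha, \nu, \omega$ fixed

Model B: Exponential Mixture Model with Surviving Fraction

Stage II:

$$\theta_i \mid \alpha, \beta \overset{\text{iid}}{\sim} \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \theta_i^{\alpha-1} e^{-\beta\theta_i} \tag{6}$$

$$\pi_i \mid \gamma, \delta \overset{\text{iid}}{\sim} \frac{\Gamma(\gamma+\delta)}{\Gamma(\gamma)\Gamma(\delta)}\, \pi_i^{\gamma-1} (1-\pi_i)^{\delta-1} \tag{7}$$

Stage III:

$$\beta \sim \frac{\nu^{\omega}}{\Gamma(\omega)}\, \beta^{\omega-1} e^{-\nu\beta} \tag{8}$$

$\alpha, \nu, \omega, \gamma, \delta$ fixed

Model C: Exponential Changepoint Model with a Surviving Fraction

Stage II:

$$\theta_i \mid \alpha, \beta \overset{\text{iid}}{\sim} \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \theta_i^{\alpha-1} e^{-\beta\theta_i} \tag{9}$$

$$\tau_i \overset{\text{iid}}{\sim} -.0192 + .00074\,\tau_i, \qquad 26 \le \tau_i \le 51 \tag{10}$$

$$\phantom{\tau_i \overset{\text{iid}}{\sim}} \; .0192, \qquad 52 \le \tau_i \le 78 \tag{11}$$

$$\phantom{\tau_i \overset{\text{iid}}{\sim}} \; .0769 - .00074\,\tau_i, \qquad 79 \le \tau_i \le 104 \tag{12}$$

Stage III:

$$\beta \sim \frac{\nu^{\omega}}{\Gamma(\omega)}\, \beta^{\omega-1} e^{-\nu\beta} \tag{13}$$

$\alpha, \nu, \omega$ fixed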

To complete the model specification, values for the fixed stage II and III parameters, that is, the hyperparameters, must be specified. The specific values chosen for the NIMH trial and our basis for this selection are described in the next section. Briefly, we select values for the fixed parameters that span a range of prior distributions. One set of prior distributions, denoted in what follows as Prior I, is specified to reflect diffuse information, resulting in only a small amount of shrinkage across centers. The other set of prior distributions, called Prior II, is specified to reflect more information sharing across centers and hence more shrinkage.
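For Model A the gamma priors at stages II and III are conditionally conjugate, so a Gibbs sampler alternates between two gamma draws. The sketch below assumes right-censored exponential data summarized per center by an event count and a total follow-up time. Under that assumption the full conditionals are: hazard for a center, given the stage II rate, is Gamma(shape = alpha + events, rate = rate + exposure); the stage II rate, given the hazards, is Gamma(shape = omega + k * alpha, rate = nu + sum of hazards), with k centers. The data values are hypothetical, the function name is illustrative, and this is not the authors' implementation.

```python
import random


def gibbs_exponential(events, exposure, alpha, omega, nu,
                      n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler sketch for the hierarchical exponential model.

    events[i]   : number of observed recurrences in center i
    exposure[i] : total follow-up time in center i (events and censored cases)
    Returns a list of (hazards, beta) draws kept after burn-in.
    """
    rng = random.Random(seed)
    k = len(events)
    beta = (omega - 1.0) / nu if omega > 1 else omega / nu  # prior mode (or mean)
    draws = []
    for it in range(n_iter):
        # random.gammavariate takes (shape, scale), hence scale = 1 / rate.
        theta = [rng.gammavariate(alpha + d, 1.0 / (beta + total))
                 for d, total in zip(events, exposure)]
        beta = rng.gammavariate(omega + k * alpha, 1.0 / (nu + sum(theta)))
        if it >= burn_in:
            draws.append((theta, beta))
    return draws


# Hypothetical data for five centers (NOT the NIMH data).
events = [8, 5, 12, 20, 30]
exposure = [400.0, 300.0, 500.0, 1500.0, 700.0]
# alpha = 3 and omega = 3 follow the text; nu = 2/75 is derived here from
# (omega - 1) / nu = 25 * alpha and is an assumption, not a value from Table 2.
draws = gibbs_exponential(events, exposure, alpha=3.0, omega=3.0, nu=2.0 / 75.0)
post_mean = [sum(theta[i] for theta, _ in draws) / len(draws)
             for i in range(len(events))]
print([round(m, 4) for m in post_mean])
```

The conditional for each hazard shows the shrinkage mechanism directly: the prior contributes alpha pseudo-events over beta pseudo-weeks, so small centers are pulled further toward the pooled rate than large ones.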

3.3 Specification of Fixed Stage II and III Hyperparameters for the NIMH Trial

The major source of prior information about recurrence times came from the study's own recruitment criteria. To be recruited into the study patients must have had at least one episode of depression within the 2 1/2 years prior to the index episode. In the NIMH study, there was an average of approximately 1 1/2 episodes in the 2 1/2 years prior to the index episode. Episodes had to be separated by eight consecutive "well" weeks to be considered a distinct episode. Using this information, prior belief should reflect recurrence times between 8 and 122 weeks. Previous studies were also reviewed to glean prior information, although they yielded little useful information. Because the goal of the analysis was to explore results over a range of prior distributions which demonstrated differing beliefs in the amount of heterogeneity between centers, two sets of priors were defined for each stage I model. The two sets of priors will be referred to as Prior I and Prior II.

Model A: Exponential

The prior for the exponential parameter was a gamma distribution. The parameters of the gamma were specified as $\alpha$ and $\beta$. The mode of the distribution for $\beta$ was set to a weighted average of the mean recurrence time for each center multiplied by $\alpha$, which was set to 3 under Prior I and to 10 under Prior II. Specifically, the mode of the distribution of $\beta$ was set to $25\alpha$. For the small centers these prior sample sizes will have a large impact if the center deviates greatly from the other centers, while for the larger centers the effect will be much less. For both priors, $\omega$ was set to 3, and $\nu$ was determined by solving the equation

$$\frac{\omega - 1}{\nu} = 25\alpha. \tag{14}$$
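Equation (14) can be solved in one line once the shape of the stage III gamma prior and the prior sample size are fixed. The sketch below reads (14) as "mode of the stage III gamma prior equals 25 times the prior sample size"; the function name is illustrative, and the printed values are computed from that reading rather than copied from Table 2.

```python
def stage3_rate(alpha: float, omega: float, weeks: float = 25.0) -> float:
    """Solve (omega - 1) / nu = weeks * alpha for the stage III rate nu."""
    if omega <= 1:
        raise ValueError("the gamma mode (omega - 1) / nu requires omega > 1")
    return (omega - 1.0) / (weeks * alpha)


for label, alpha in (("Prior I", 3.0), ("Prior II", 10.0)):
    nu = stage3_rate(alpha, omega=3.0)
    print(f"{label}: mode = {25.0 * alpha:.0f}, nu = {nu:.4f}")
```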

The actual values of $\omega$ and $\nu$ are presented in Table 2.

Table 2 about here

Under Prior I the mode of the Gamma distribution for $\beta$, $\tilde{\beta}$ say, is 75, so that conditional on $\tilde{\beta}$ the prior distribution for the hazard rates $\theta_i$ is a Gamma distribution with parameters $\alpha = 3$ and $\tilde{\beta} = 75$. These values were chosen so that the first prior is more diffuse and will result in less shrinkage in the posterior distributions of the hazard rates. For the placebo group under Prior I, the 5th and 95th percentile of this distribution correspond to average recurrence times of about 13 and 94 weeks, respectively, and under Prior II of about 16 and 52 weeks, respectively. This translates to a mode for the population mean for both Prior I and Prior II of approximately 0.02, representing a mean time to recurrence of about one year. Here the mass of the distribution for the population mean covers hazard rates representing mean time to recurrence of six months to two years, and hence is consistent with expectations from the recruitment criteria.

Model B: Exponential Model with Surviving Fraction

Specifications of the prior distributions for this stage I model are presented in Table 3. Prior distributions for $\beta$ were chosen in a fashion analogous to those for the exponential model just described; $\alpha$ was fixed and the mode of the prior distribution was chosen proportional to the sample size represented by $\alpha$. However, since in this model the $\theta_i$'s have a different interpretation, that is, they are the conditional hazard rates for patients who are not "cured", appropriate modifications in the specification of the prior distribution are made. As in the exponential model, $\alpha$ was set to 3 and 10 in Prior I and Prior II respectively.

Table 3 about here

For both Prior I and Prior II, the distribution on the surviving fraction, $\pi_i$, was taken to be Beta(2,5). This prior was chosen as its mean and standard deviation approximate the average and standard deviation of the maximum likelihood estimates for these parameters.
The mean of this prior is .29 and the variance is .025. This distribution places about 10% of our prior probability on surviving fractions greater than .5, and about 10% of our prior probability on surviving fractions less than .1.
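The Beta(2,5) summaries quoted above can be verified without a statistics library, because for integer parameters the beta distribution function reduces to a binomial tail sum. The sketch below is only a numerical check; the helper name is illustrative.

```python
import math


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) for positive integers a, b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(math.comb(n, j) * x ** j * (1.0 - x) ** (n - j)
               for j in range(a, n + 1))


a, b = 2, 5
mean = a / (a + b)
variance = a * b / ((a + b) ** 2 * (a + b + 1))
mode = (a - 1) / (a + b - 2)
upper_tail = 1.0 - beta_cdf_int(0.5, a, b)   # P(surviving fraction > .5)
lower_tail = beta_cdf_int(0.1, a, b)         # P(surviving fraction < .1)

print(f"mean={mean:.3f} variance={variance:.4f} mode={mode:.2f}")
print(f"P(>.5)={upper_tail:.3f} P(<.1)={lower_tail:.3f}")
```

The mean is 2/7 (about .29) and the variance about .025, while the mode of Beta(2,5) is .20; both tail probabilities come out a little above 10%.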

Model C: Exponential Changepoint Model with a Surviving Fraction

Specifications of the prior distributions for this stage I model are presented in Table 4. Prior distributions for $\beta$ were chosen as described previously: $\alpha$ was fixed, and the mode of the prior distribution on $\beta$ was chosen proportional to the sample size represented by $\alpha$. The prior distribution for $\tau$, the changepoint, was taken to be linearly increasing (slope = .0007, intercept = -.02) on the integers from 26 to 51, uniform on the integers from 52 to 78, and linearly decreasing (slope = -.0007, intercept = .08) on the integers from 79 to 104. This prior places 25%, 50%, and 25% of the probability on changepoints 26-51, 52-78, and 79-104 weeks respectively. This provides a relatively diffuse prior, yet places no probability on the changepoint being less than 6 months.

Table 4 about here

4 Analysis of Center Differences

Each of the three hierarchical survival models described in the previous section will be applied to the placebo arm of the NIMH study. Sensitivity of the placebo effects to the different specifications of prior distributions within a class of stage I models will be explored. Sensitivity of inferences about the placebo effects due to changes in the stage I model will also be examined. Our primary inferential tool will be the marginal posterior densities of the parameters of interest, such as hazard rates within centers.

4.1 Comparison of Results Within Stage I Distributions

Model A: Exponential

The plots in Figure 2 display for Prior I and II, respectively, the marginal posterior distributions for the $\theta_i$'s. Prior I is more diffuse and models more heterogeneity among the centers than Prior II (see Table 2). Comparing these figures it can be seen that the posterior modes for each center are further apart and there is less overlap in the distributions under Prior I compared to Prior II. The values of the marginal posterior modes are presented in the upper-right-hand corner of the plots. The figure also shows that there are still considerable differences between centers D and E for the placebo group even under Prior II. For the placebo treatment group the posterior modes for Center D and E are 0.016 and 0.035 respectively, corresponding to survival times of 62.5 weeks and 28.6 weeks. Note in Figure 2 there is a very small posterior probability that the hazard rate for Center D is greater than 0.02 (50 weeks survival), and a very small posterior probability that the hazard rate for Center E is less than 0.02.

Figure 2 about here

Each plot contains a sixth curve, referred to as F, that represents the predictive distribution of the hazard rate for an unobserved or future center that is assumed a priori exchangeable with those included in the study. This predictive distribution given the data is as follows.

$$p(\theta_F \mid t) = \int_{\alpha, \beta} p(\theta_F \mid \alpha, \beta)\, p(\alpha, \beta \mid t)\, d\alpha\, d\beta \tag{15}$$

This predictive distribution is estimated from the final Gibbs sample, $(\alpha^{(r)}, \beta^{(r)})$ for $r = 1, \ldots, R$, by averaging across the conditional distribution (Geman and Geman, 1984; Gelfand and Smith, 1990):

$$p(\theta_F \mid t) = \frac{1}{R} \sum_{r=1}^{R} p(\theta_F \mid \alpha^{(r)}, \beta^{(r)}) \tag{16}$$
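The estimate in (16) is a finite mixture of gamma densities, one per retained Gibbs draw. The sketch below evaluates that mixture at a point, assuming the stage II prior is a gamma distribution with a shape and a rate parameter; the three draws listed are hypothetical placeholders, not output from the NIMH analysis, and the function names are illustrative.

```python
import math


def gamma_pdf(x: float, shape: float, rate: float) -> float:
    """Gamma density with the given shape and rate, for x > 0."""
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_pdf)


def predictive_hazard_density(x: float, draws) -> float:
    """Average the conditional gamma density over (shape, rate) draws."""
    return sum(gamma_pdf(x, a, b) for a, b in draws) / len(draws)


# Hypothetical posterior draws of (shape, rate); illustration only.
draws = [(3.0, 60.0), (3.0, 75.0), (3.0, 90.0)]
for hazard in (0.01, 0.02, 0.04, 0.08):
    print(hazard, round(predictive_hazard_density(hazard, draws), 2))
```

Because each component is a proper density, the average is one as well; draws that are tightly clustered give a more peaked predictive curve.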

Figure 2 shows that the increased shrinkage of Prior II results in a more peaked predictive distribution as well.

Model B: Exponential Mixture Model with Surviving Fraction

Figure 3 displays the marginal posterior distributions under Priors I and II for the stage I mixture model. As was the case in the previous section for model A, Prior II specifies a more concentrated prior distribution for the $\theta_i$'s than does Prior I (see Table 3). It can be seen that the marginal posterior distributions for the $\theta_i$'s in the first column of the figure behave in a fashion similar to model A. As expected, there is more shrinkage due to Prior II than Prior I. However, since the interpretation of the $\theta_i$'s is different in this stage I model (these are the conditional hazard rates for patients who are not "cured"), these posterior distributions are located around relatively shorter recurrence times.

Figure 3 about here

The posterior modes of the surviving fractions for the placebo group remained virtually unchanged across the two prior distributions. Note that because the parameters of the prior on the surviving fraction were fixed there is no sharing of information between centers, and the predictive distribution for the surviving fraction at an unsampled center (referred to as F) is the same as the prior distribution.

Model C: Exponential Changepoint Model with a Surviving Fraction

Figure 4 displays the marginal posterior distributions under Prior I and II for the changepoint model. The $\theta_i$'s in this model are the pre-changepoint hazard rates. The prior for the $\theta_i$'s is more diffuse under Prior I, and more peaked or informative under Prior II. The prior distribution for $\tau$, the changepoint, is the same across the two priors (see Table 4). The first column of Figure 4 displays the marginal posterior distributions of the pre-changepoint hazard rates. As expected, more shrinkage occurs under Prior II. Under Prior II the posterior modes of the placebo pre-changepoint hazard rates for Centers D and E are 0.014 and 0.039. These modes correspond to survival times of 71 weeks and 26 weeks, respectively. Note that the marginal posterior distribution for $\tau$, the changepoint, remains essentially the same under both prior specifications. As for the surviving fraction of the mixture model, because no parameters of the prior on the changepoint were treated as random, the predictive distribution for the changepoint at an unsampled center (referred to as F) is the same as the prior distribution.

Figure 4 about here

4.2 Comparison of Results Across Stage I Models

In this section attention focuses on how the assessment of placebo effects varies as a function of the specification of the stage I model. Because the stage I models are not nested, one cannot simply compare analogous parameters across models to decide how inferences about placebo effects would change. Instead, summary measures that incorporate the heterogeneity seen in each parameter are required. Our primary inferential tool for these comparisons is the predictive distribution of a future patient's survival time, T, under each treatment within a center. The predictive distribution takes into account the heterogeneity seen in all the parameters. The predictive distribution for T given the data is

$$p(t \mid \mathbf{t}) = \int p(t \mid \theta)\, p(\theta \mid \mathbf{t})\, d\theta, \tag{17}$$

where $\mathbf{t}$ denotes the observed data, $p(t \mid \theta)$ is the conditional predictive distribution and $p(\theta \mid \mathbf{t})$ is the posterior distribution of $\theta$. The predictive distribution can be approximated from the final Gibbs sample, $(\theta^{(r)},\ r = 1, \ldots, R)$, by averaging the conditional distribution over the sample (Geman and Geman, 1984; Gelfand and Smith, 1990):

$$p(t \mid \mathbf{t}) \approx \frac{1}{R} \sum_{r=1}^{R} p(t \mid \theta^{(r)}). \tag{18}$$
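The averaging in equation (18) can be sketched for the simplest stage I model. The example below is a minimal illustration, not the authors' implementation: it assumes an exponential model, applies the same average to the conditional CDF rather than the density, and uses simulated Gamma draws as a hypothetical stand-in for a final Gibbs sample of hazard rates.

```python
import math
import random


def conditional_cdf(t, hazard):
    """P(T <= t | hazard) for an exponential survival time."""
    return 1.0 - math.exp(-hazard * t)


def predictive_cdf(t, hazard_draws):
    """Average the conditional CDF over the posterior draws, as in equation (18)."""
    return sum(conditional_cdf(t, h) for h in hazard_draws) / len(hazard_draws)


# Hypothetical posterior sample: Gamma draws (shape 10, scale 0.002), mean about 0.02.
rng = random.Random(0)
hazard_draws = [rng.gammavariate(10.0, 0.002) for _ in range(5000)]

for weeks in (26, 52, 104):
    print(weeks, round(predictive_cdf(weeks, hazard_draws), 3))
```

For the mixture and changepoint models only `conditional_cdf` would change; the averaging step stays the same.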

A detailed derivation of the conditional predictive distributions for the exponential, exponential mixture, and exponential changepoint models may be found in Stangl (1991). Figure 5 displays the predictive cumulative distribution function for each of the five centers in the NIMH study and for a sixth, unsampled center, again referred to as F. These plots show that the main difference between the models lies in the upper tail. The surviving fractions of the mixture and changepoint models create horizontal asymptotes that clearly deviate from the monotonically-increasing-to-one tails of the exponential model. Table 5 presents the median of each of these predictive distributions. Table 6 presents the value of the predictive CDFs at 104 weeks. Both Tables 5 and 6 show that the exponential model gives the largest median survival time and the largest value of the CDF at 104 weeks within each center. The smallest median survival time and the smallest value of the CDF at 104 weeks alternate between the mixture and changepoint models. While the Kaplan-Meier curves are included in these tables as a point of reference and to aid in understanding the differences between the models, comparisons between the Kaplan-Meier curves and the three parametric models should be made remembering that there is no sharing of information across centers in the Kaplan-Meier curves.

Figure 5 about here

Table 5 and Table 6 about here

Table 5 also shows that within models the prior makes the largest difference in estimating median survival for Center D. Moving from Prior I to Prior II pulls down the median by 13, 7, and 10 weeks for the exponential, mixture, and changepoint models, respectively. The smallest difference across the priors is seen at Center E. Moving from Prior I to Prior II pulls up the median by 4, 2, and 4 weeks for the exponential, mixture, and changepoint models, respectively.

Exploring the heterogeneity across centers, Table 5 shows that the median survival time is clearly smallest for Center E and largest for Center D regardless of prior and model. Proceeding from column 2 to column 8 of Table 5, the range across the centers is 15-73, 19-60, 11-62, 13-55, 14-59, 18-49, and 5-72. The heterogeneity is greatest for the Kaplan-Meier curves because no information is shared across the centers. These ranges clearly demonstrate the increased shrinkage under Prior II for each first stage model, and the increased shrinkage of all models as compared to the Kaplan-Meier curves. The last row of Table 5 shows how each model extrapolates to an unobserved center (denoted by the letter F). All estimates are within 26 to 34 weeks, with the estimates for the exponential model falling slightly above the estimate of median survival under a pooled Kaplan-Meier curve, and the estimates for the mixture and changepoint models falling slightly below. The question remains, however, of whether it makes sense to perform such extrapolations when such diversity exists between centers.
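The ranges quoted from Table 5 can be reproduced with a short sketch that takes the minimum and maximum of each column over Centers A through E. The values are transcribed from Table 5; the dictionary layout and the `spread` helper are illustrative choices, not part of the original analysis.

```python
# Median survival times (weeks) for Centers A, B, C, D, E, transcribed from Table 5.
table5 = {
    "Exponential, Prior I": [50, 43, 23, 73, 15],
    "Exponential, Prior II": [43, 38, 27, 60, 19],
    "Mixture, Prior I": [43, 31, 17, 62, 11],
    "Mixture, Prior II": [36, 29, 19, 55, 13],
    "Changepoint, Prior I": [40, 38, 19, 59, 14],
    "Changepoint, Prior II": [35, 34, 24, 49, 18],
    "Kaplan-Meier": [49, 46, 9, 72, 5],
}


def spread(values):
    """Return (minimum, maximum, width) for one column of medians."""
    low, high = min(values), max(values)
    return low, high, high - low


for label, medians in table5.items():
    low, high, width = spread(medians)
    print(f"{label}: {low}-{high} (width {width})")
```

A narrower width under Prior II than under Prior I, and under every model than under Kaplan-Meier, is the numerical form of the shrinkage described above.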

4.3 Sensitivity of Results to Particular Centers

Goodness of fit in hierarchical models is an open area for research. Because hierarchical models are often used to borrow strength for analyses where samples are too small for subset analysis, sample sizes are also too small to test model fit rigorously. In the example presented here, only one treatment center had a sample size greater than 20. Because this paper uses survival models, the number of events is also important in measuring goodness of fit. In our example, no center had more than 15 events. Assessing goodness of fit at the second stage of the model is even more difficult, as the sample sizes at the second stage are likely to be small as well. In our example the second stage sample size was only 5.

As an alternative to traditional goodness-of-fit methods, an analysis was done to determine the sensitivity of results to the exclusion of each particular treatment center. The six models were rerun five times, each time excluding one treatment center. The impact of the exclusion is summarized in Table 7. This table presents the median survival time for an unobserved center, again denoted Center F. The columns are labeled by the excluded center. The last column within each model gives the median survival time when all centers are included. The first three rows of numbers are the results under Prior I, the last three rows those under Prior II. The row labeled Difference A presents the median for an unobserved center when a center is excluded minus the median when all centers are included. Using the exclusion of Center A under the exponential model as an example, the median for Center F when Center A is excluded is 29, and the median when all centers are included is 34. Hence the difference is 29 - 34 = -5. Difference B presents the error in our prediction if we used the median for an unobserved center to predict the Kaplan-Meier median for the excluded center, that is, the Kaplan-Meier median minus the predicted median.

Table 7 about here

The table shows that median survival times for an unobserved center were most affected by the exclusion of Centers D and E. The largest increase in median survival under each of the six models occurred when Center E was excluded, the largest decrease when Center D was excluded. When Center E was excluded, the median survival for an unobserved center ranged from 35 (Mixture Model, Prior II) to 45 (Exponential Model, Prior II). The increases in median survival resulting from exclusion of Center E range from 6 to 14 weeks and occur under the Exponential Model, Priors I and II, respectively. When Center D was excluded, the median survival ranged from 23 (Mixture Model, Prior II, or Changepoint Model, either prior) to 26 (Mixture Model, Prior I, or Exponential Model, Prior II). For exclusion of Center D, decreases from the median survival when all centers were included ranged from 3 to 9 weeks. As expected, exclusion of Center C resulted in the second largest increase, Center A the second smallest increase, and Center B falls in the middle.

If we now use the median survival times for an unobserved center to predict the median survival for each of the observed centers as they were removed from the analysis, we see that our predictions would be far from the target. For example, look at Difference B for Center A under Prior I. From Table 5, the Kaplan-Meier estimate of median survival for Center A was 49 weeks. Excluding Center A from the analysis gave a predictive survival median for an unobserved center of 29 weeks. If we used this value to predict Center A's median survival we would miss by 20 weeks. This is the Difference B reported in Table 7. We see differences for other centers ranging from -40 to +49. Differences are of similar magnitude across all six models. Our predictions are much too high for Centers C and E, while our predictions are much too low for Centers A, B, and D. Predictions are closest for Center B, as it is the closest to being an "average" center. This result again highlights the extreme heterogeneity in placebo response among the five centers. When there is heterogeneity between centers, we would not expect predictions for extreme centers to be accurate even if flat priors were used at stage II.

Finally, we see from the differences presented in the column where all centers are included that we do well in predicting an average center. For this difference we compared our prediction for an unobserved center to the median survival from the pooled Kaplan-Meier estimate. Here our predictions are within ±4 weeks regardless of model.
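The two difference rows of Table 7 follow directly from the medians, as the sketch below shows for the exponential model under Prior I. The numbers are transcribed from Tables 5 and 7; the function and variable names are hypothetical and serve only to make the arithmetic explicit.

```python
# Kaplan-Meier medians (weeks) by center, from Table 5; pooled value from its footnote.
km_median = {"A": 49, "B": 46, "C": 9, "D": 72, "E": 5}
pooled_km_median = 30

# Center F medians under the exponential model, Prior I, from Table 7.
median_when_omitted = {"A": 29, "B": 32, "C": 38, "D": 25, "E": 40}
median_all_centers = 34


def difference_a(omitted_center):
    """Center F median with one center omitted minus the median with all centers."""
    return median_when_omitted[omitted_center] - median_all_centers


def difference_b(omitted_center):
    """Kaplan-Meier median of the omitted center minus the Center F prediction."""
    return km_median[omitted_center] - median_when_omitted[omitted_center]


for center in "ABCDE":
    print(center, difference_a(center), difference_b(center))

# Prediction error for an "average" center: pooled Kaplan-Meier median minus prediction.
print("All in", pooled_km_median - median_all_centers)
```

Negative values of Difference B mean the prediction for an unobserved center is too high for the omitted center, which is the pattern seen for Centers C and E.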

5 Discussion

In this paper we have used Bayesian hierarchical survival models to investigate the range of placebo effects among centers in a multicenter clinical trial. Three different stage I models, an exponential model, an exponential mixture model, and an exponential changepoint model, were used to describe the distribution of the time to recurrence of depression in the NIMH maintenance therapy trial. The latter two survival models allow for nonconstant hazard rates within centers. For these two models the second component was restricted to represent a surviving fraction. To assess the sensitivity of inferences about placebo effects across centers, we compared the marginal posterior distributions of the parameters of interest within each stage I model across different prior, or stage II, specifications.

Examining the posterior distributions for the placebo treatment group under each model, we saw that hazard rates differed greatly across centers. Posterior modes showed considerable differences, and only a small overlap was seen in the distributions of the most extreme centers. This heterogeneity between centers in the placebo effect could be due to several factors. First, the "warm placebo" effect discussed by Brown (1994) need not be of the same "warmth" at each center. There is likely a great deal of variation between centers in the type of interpersonal interaction and cognitive support that supplements pharmacotherapy. Second, the placebo group was withdrawn from imipramine at the time of randomization. Deviations in drug withdrawal protocols are not unlikely.

When the variance of the second stage prior decreased, more information was shared across centers and less heterogeneity was seen in the posterior distributions of the hazard rates across the centers. However, even with increased shrinkage the differences across centers were still large. This finding highlights the important question of how the results of this study should be extrapolated to other, unobserved centers. Does pooling across centers make sense when such diversity is present? The answer to this question requires a utility analysis. Do we maximize our utility by treating unobserved centers as a weighted average of observed centers?

In the NIMH study there were only five treatment centers, and the within-center sample sizes were small for several centers. Because of the small number of centers participating in the study, it is impossible to check the assumption of the parametric form of the stage II distributions. Because of the small within-center sample sizes, the posterior distributions of the center-specific parameters were sensitive to the prior specification. It would be advantageous to have more treatment centers and more observations within each center; however, because research resources are limited, a compromise between these two sample sizes is necessary. Further investigation is being carried out (Stangl and Mukhopadhyay, 1993) to demonstrate clearly the tradeoff between information about the population and center-specific parameters as we make compromises between the number of centers and the number of subjects within centers.


Table 1. Number of subjects by center.

Center   Sample size   Recurrences
A            11             6
B            10             6
C             8             7
D            25            12
E            17            15


Table 2. Stage III Distributions for Exponential Model

Hazard rates ~ Gamma(α, β), with β ~ Gamma(ω, ν)

            α      ω      ν
Prior I     3      3    .0266
Prior II   10      3    .0080

Table 3. Stage III Distributions for Mixture Model

Hazard rates ~ Gamma(α, β), with β ~ Gamma(ω, ν)
Surviving fractions ~ Beta(2, 5) under both priors

            α      ω      ν
Prior I     3      3    .0428
Prior II   10      3    .0130

Table 4. Stage III Distributions for Changepoint Model

Hazard rates ~ Gamma(α, β), with β ~ Gamma(ω, ν)
Changepoints ~ Nonstandard(a, b, c, d): linearly increasing on [a, b], uniform on [b, c], linearly decreasing on [c, d]

            α      ω      ν       a     b     c     d
Prior I     3      3    .0311    26    52    78   104
Prior II   10      3    .0093    26    52    78   104

Table 5. Median Survival Times (Weeks) Under Each Model

          Exponential     Mixture      Changepoint
Center      I     II      I     II      I     II     Kaplan-Meier
A          50     43     43     36     40     35         49
B          43     38     31     29     38     34         46
C          23     27     17     19     19     24          9
D          73     60     62     55     59     49         72
E          15     19     11     13     14     18          5
F          34     31     29     27     26     28         30a

a This is the median survival obtained from the Kaplan-Meier curve when all 5 centers were pooled.

Table 6. Predictive CDF At 104 Weeks Under Each Model

          Exponential     Mixture      Changepoint
Center      I     II      I     II      I     II     1-Kaplan-Meier
A         .75    .81    .64    .66    .69    .74         .63
B         .80    .85    .69    .69    .78    .82         .65
C         .95    .93    .80    .80    .86    .80         .88
D         .62    .70    .58    .57    .61    .68         .55
E         .99    .97    .84    .85    .95    .91         .89
F         .83    .88    .69    .70    .76    .78         .70b

b This is the value of one minus the Kaplan-Meier curve at 104 weeks when all 5 centers were pooled.


Table 7. Median Survival Times (Weeks) For an Unobserved Center When Observed Centers are Excluded

Exponential
                               Center Omitted           All
                          A     B     C     D     E     In
Prior I   Center F       29    32    38    25    40     34
          Difference A   -5    -2    +4    -9    +6
          Difference B   20    14   -29    47   -35     -4
Prior II  Center F       28    30    35    26    45     31
          Difference A   -3    -1    +4    -5   +14
          Difference B   21    16   -26    46   -40     -1

Mixture
                               Center Omitted           All
                          A     B     C     D     E     In
Prior I   Center F       28    29    31    26    40     29
          Difference A   -1     0    +2    -3   +11
          Difference B   21    17   -22    46   -35      1
Prior II  Center F       24    24    27    23    35     27
          Difference A   -3    -3     0    -4    +8
          Difference B   25    22   -18    49   -30      3

Changepoint
                               Center Omitted           All
                          A     B     C     D     E     In
Prior I   Center F       25    24    30    23    39     26
          Difference A   -1    -2    +4    -3   +13
          Difference B   24    22   -21    49   -34      4
Prior II  Center F       26    26    31    23    37     28
          Difference A   -2    -2    +3    -5    +9
          Difference B   23    20   -22    49   -32      2

Figure 1: Kaplan-Meier Survival Curves for the Placebo Group at Each Center
[Survival (0.0 to 1.0) versus time (0 to 100), one curve for each of Centers A-E. Generalized Savage (Mantel-Cox) statistic 16.95, df 4, p-value .002.]

Figure 2: Posterior Distributions for Exponential Model
[Two panels, Prior I and Prior II, showing the posterior densities of the hazard rates for Centers A-E and the unsampled Center F.]
Posterior modes, Prior I: A .013, B .016, C .030, D .009, E .047, F .017.
Posterior modes, Prior II: A .016, B .018, C .025, D .012, E .035, F .020.

Figure 3: Posterior Distributions for Mixture Model
[Four panels: posterior densities of the hazard rates and of the surviving fractions, each under Prior I and Prior II, for Centers A-E and the unsampled Center F.]
Posterior modes of the hazard rates, Prior I: A .031, B .038, C .056, D .027, E .088, F .032.
Posterior modes of the hazard rates, Prior II: A .039, B .043, C .052, D .035, E .069, F .044.
Posterior modes of the surviving fractions, Prior I: A .29, B .27, C .15, D .38, E .13, F .20.
Posterior modes of the surviving fractions, Prior II: A .30, B .28, C .15, D .40, E .12, F .20.

Figure 4: Posterior Distributions for Changepoint Model
[Four panels: posterior densities of the pre-changepoint hazard rates and of the changepoints, each under Prior I and Prior II, for Centers A-E and the unsampled Center F.]
Posterior modes of the hazard rates, Prior I: A .017, B .017, C .035, D .012, E .051, F .018.
Posterior modes of the hazard rates, Prior II: A .020, B .020, C .029, D .014, E .039, F .023.
Posterior modes of the changepoints, Prior I: A 56, B 75, C 51, D 72, E 47, F 52-78.
Posterior modes of the changepoints, Prior II: A 56, B 75, C 52, D 72, E 50, F 52-78.

Figure 5: Predictive Distributions for Future Observations
[Six panels of predictive cumulative distribution functions (0.0 to 1.0) versus survival time (0 to 150) for Centers A-F. Columns: Exponential, Mixture, Changepoint. Rows: Prior I, Prior II.]

References

Achcar, J. A., and Bolfarine, H. (1989). Constant hazard against a change-point alternative: a Bayesian approach with censored data. Commun. Statist.-Theory Meth., 18(10), 3801-3819.

Beecher, H.K. (1955). The powerful placebo effect. Journal of the American Medical Association, 159:1602-1606.

Berger, J. (1984). The robust Bayesian viewpoint. In Joseph B. Kadane (ed.), Robustness of Bayesian Analyses, North-Holland, New York.

Boag, J.W. (1949). Maximum likelihood estimation of the proportion of patients cured by cancer treatment. Journal of the Royal Statistical Society, 11, 15-53.

Box, G. E. P. (1980). Sampling and Bayes' Inference in Scientific Modelling and Robustness. Journal of the Royal Statistical Society, A, 143(4), 383-430.

Brown, W. A. (1994). Placebo as a treatment for depression (with discussion). Neuropsychopharmacology, 10:265-288.

Carlin, B., Gelfand, A.E., and Smith, A.F.M. (1992). Hierarchical Bayesian analysis of change point problems. Applied Statistics, 41(2), 389-405.

Cox, D.R. (1990). Models in statistical analysis. Statistical Science, 5, 169-174.

Farewell, V.T. (1983). The use of mixture models for the analysis of survival data with long-term survivors. Biometrics, 38, 1041-1046.

Fleiss, J. (1986). Analysis of data from multicenter clinical trials. Controlled Clinical Trials, 7, 267-275.

Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.

Geisser, S. (1987). Comments on "Prediction of future observations in growth curve models". Statistical Science, 2, 465-467.

Geisser, S. (1990). On hierarchical Bayes procedures for predicting simple exponential survival. Biometrics, 46, 225-230.

Gelfand, A.E. and Smith, A.F.M. (1990). Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.

Gray, R. J. (1994). On institutional effects in multicenter clinical trials. Biometrics, 50, 244-253.

Greenhouse, J. and Wasserman, L. (1995). Robust Bayesian analysis of clinical trials. Statistics in Medicine, in press.

Greenhouse, J., Stangl, D., Kupfer, D., and Prien, R. (1991). Methodological issues in maintenance therapy clinical trials. Archives of General Psychiatry, 48, 313-318.

Greenhouse, J.B. and Wolfe, R.A. (1984). A competing risks derivation of a mixture model for the analysis of survival data. Communications in Statistics - Theory and Methods, 13, 3133-3154.

Grieve, A.P. (1988). Some uses of predictive distributions in pharmaceutical research. In Biometry - Clinical trials and related topics, ed. T. Okuno, 83-99.

Kass, R.E. and Greenhouse, J.B. (1989). Comment: A Bayesian perspective on "Investigating therapies of potentially great benefit: ECMO" by James H. Ware. Statistical Science, 4, 298-340.

Kass, R.E. and Steffey, D. (1989). Approximate Bayesian inference in conditionally independent hierarchical models. Journal of the American Statistical Association, 84, 717-726.

Keller, M.B., Shapiro, R.W., Lavori, P.W., and Wolfe, N. (1982). Relapse in major depressive disorder: Analysis with the life table. Archives of General Psychiatry, 39, 911-915.

Lehmann, E.L. (1990). Model Specification: The views of Fisher and Neyman and later developments. Statistical Science, 5, 160-168.

Lindley, D.V. and Smith, A.F.M. (1972). Bayes estimates for the linear model (with discussion). Journal of the Royal Statistical Society, Series B, 34, 1-42.

Prien, R.F., Kupfer, D.J., Mansky, P.A., Small, J.G., Tuason, V.B., Voss, C.B., and Johnson, W.E. (1984). Drug therapy in prevention of recurrences in unipolar and bipolar affective disorders. Archives of General Psychiatry, 41, Nov., 1096-1104.

Raftery, A.E., and Akman, V.E. (1986). Bayesian analysis of a Poisson process with a changepoint. Biometrika, 73, 1, 85-89.

Skene, A.M., Shaw, J.E.H., and Lee, T.D. (1986). Bayesian modelling and sensitivity analysis. The Statistician, 35, 281-288.

Skene, A.M., and Wakefield, J.C. (1990). Hierarchical models for multicentre binary response studies. Statistics in Medicine, 9, 919-929.

Smith, A.F.M. (1975). A Bayesian approach to inference about a change-point in a sequence of random variables. Bayesian Statistics, Vol. I, 83-98.

Smith, A.F.M. (1980). Change-point problems: approaches and applications. In Bayesian Statistics, Vol. I, eds. J.M. Bernardo, M.H. DeGroot, D.V. Lindley, and A.F.M. Smith, Oxford U.K.: Oxford University Press, 83-98.

Smith, A.F.M. and Spiegelhalter, D.A. (1980). Bayes factors and choice criteria for linear models. Journal of the Royal Statistical Society, B, 42, 213-220.

Stangl, D. (1991). Modeling heterogeneity in multi-center clinical trials using Bayesian hierarchical survival models. Ph.D. dissertation, Carnegie Mellon University.

Stangl, D. and Mukhopadhyay, S. (1993). Balancing Centers and Observations in Multicenter Clinical Trials. Tech Report 93-A13, ISDS, Duke University.

Stangl, D. (1994). Modeling and Decision Making Using Bayesian Hierarchical Survival Models. Technical Report 94-22, ISDS, Duke University.

Turner, J.A., Deyo, R.A., Loeser, J.D., Von Korff, M., and Fordyce, W.E. (1994). The importance of placebo effects in pain treatment and research. The Journal of the American Medical Association, 271:1609-1614.

Wasserman, L. (1992). Recent methodological advances in robust Bayesian inference. In Bayesian Statistics, Vol. 4, eds. J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, Oxford U.K.: Oxford University Press, 483-502.

Wells, K.B., Steward, A., Hayes, R.D., Burnam, A., Rogers, W., Daniels, M., Berry, S., Greenfield, S., and Ware, J. (1989). The functioning and well-being of depressed patients: Results from the Medical Outcomes Study. Journal of the American Medical Association, 262, 914-919.