Multilevel Bayesian Models for Survival Times and ... - Biostatistics

4 downloads 33735 Views 226KB Size Report
Dec 21, 2011 - Boston, MA 02445 (email: [email protected]) ...... data: an application to B2B electronic payments system adoption,” Annals of ...
Multilevel Bayesian Models for Survival Times and Longitudinal Patient-Reported Outcomes with Many Zeros Laura A. Hatfield Department of Health Care Policy, Harvard Medical School Boston, MA 02445 (email: [email protected]) Mark E. Boye Global Health Outcomes, Eli Lilly and Company Indianapolis, IN 46285 (email: [email protected]) Michelle D. Hackshaw Global Health Outcomes - Oncology, Merck & Co., Inc. Whitehouse Station, NJ 08889 (email: [email protected]) Bradley P. Carlin Division of Biostatistics, University of Minnesota Minneapolis, MN 55455 (email: [email protected]) December 21, 2011 Author’s Footnote Laura A. Hatfield is Assistant Professor, Harvard Medical School, Department of Health Care Policy, Boston, MA 02445 (email: [email protected]); Mark E. Boye is Senior Research Scientist, Global Health Outcomes, Eli Lilly and Company, Indianapolis, IN 46285; Michelle D. Hackshaw, PhD was Associate Research Scientist, Global Health Outcomes - Oncology, Eli Lilly and Company, Indianapolis, IN 46285 at the time of this research. She is currently Manager, Global Health Outcomes - Oncology, Merck & Co., Inc., Whitehouse Station, NJ 08889 (email: [email protected]); and Bradley P. Carlin is Professor and Head of Biostatistics and Mayo Professor in Public Health, Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 (email: [email protected]). The work of Hatfield and Carlin was supported by a grant from Eli Lilly and Company, Global Health Outcomes - Oncology #2010-103, while that of Carlin was also supported by National Cancer Institute Grant 1-R01-CA95955. The authors thank the associate editor and two anonymous referees for their particularly insightful and constructive suggestions.

Abstract Regulatory approval of new therapies often depends on demonstrating prolonged survival. Particularly when these survival benefits are modest, consideration of therapeutic benefits to patient-reported outcomes (PROs) may add value to the traditional biomedical clinical trial endpoints. We extend a popular class of joint models for longitudinal and survival data to accommodate the excessive zeros common in PROs, building hierarchical Bayesian models that combine information from longitudinal PRO measurements and survival outcomes. Model development is motivated by a clinical trial for malignant pleural mesothelioma, a rapidly fatal form of pulmonary cancer usually associated with asbestos exposure. By separately modeling the presence and severity of PROs, using our zero-augmented beta likelihood, we are able to model PROs on their original scale and learn about individual-level parameters from both presence and severity of symptoms. Correlations among an individual’s PROs and survival are modeled using latent random variables, adjusting the fitted trajectories to better accommodate the observed data for each individual. This work contributes to understanding the impact of treatment on two aspects of mesothelioma: patients’ subjective experience of the disease process and their progression-free survival times. We uncover important differences between outcome types that are associated with therapy (periodic, worse in both treatment groups after therapy initiation) and those that are responsive to treatment (aperiodic, gradually widening gap between treatment groups). Finally, our work raises questions for future investigation into multivariate modeling, choice of link functions, and the relative contributions of multiple data sources in joint modeling contexts. Keywords: beta distribution, cancer, failure time, random effects model, repeated measures, Weibull distribution

1.

INTRODUCTION

1.1 Joint Modeling In many settings, a biological process is studied over time by repeatedly measuring features of the process and observing the times when significant events occur. Joint consideration of such longitudinal and time-to-event data is important for several reasons. The longitudinal measurements may yield early indications of impending events, the event may (informatively) censor the sequence of longitudinal measurements, or the effects of an intervention on multiple aspects of the process simultaneously may be of interest. In the clinical trial we consider, all of the foregoing are relevant to our joint modeling approach. The biological process of interest is a fatal lung disease. Time until clinical progression of disease or death is the primary survival endpoint of a clinical trial, but the effect of treatment on symptom evolution is also important. We build on the considerable joint modeling literature to develop Bayesian hierarchical models for these data. Our primary goals are 1) to simultaneously estimate treatment effects on symptoms and survival, 2) to study the connection between these two outcome types, and 3) to understand individual variation in the progression of disease. Joint longitudinal-survival models have been extensively developed in the literature (see Tsiatis and Davidian (2004) for a review). Examples of non-clinical settings in which joint modeling is useful include degradation measurements paired with time-to-failure data in reliability (Lu and Meeker 1993), and fruit fly reproductive fertility and lifespan measurements in fecundity studies (Hanson et al. 2011). 1.2 Excess Zeros One complication of modeling our clinical trial data is an excess of zeros in the longitudinal patient-reported outcomes (PROs). Symptom-based assessments may be right-skewed with a mode at zero corresponding to asymptomatic patients who complete a symptom ques-

1

tionnaire. Conceptually, a symptom score of zero may indicate either a censored value for an individual with a low actual score (i.e., a floor effect), or the complete absence of the measured outcome. Such mixed processes should not be modeled with zero representing another point on the severity continuum. Rather, the zero/one dichotomy should be modeled separately to account for the absence/presence of the measured outcome, respectively. Such zero-inflation is commonly addressed in count data by mixing a degenerate point mass at zero with a count distribution (e.g., Lachenbruch (2002)). This readily extends to the case of a continuous random variable that includes zero in its support (e.g., Ghosh and Albert (2009)). Recently, others have used copulas to build models for zero-inflated binary longitudinal outcomes in conjunction with survival (Rizopoulos et al. 2008). These authors considered a degenerate logistic model component for study participants for whom every longitudinal observation was a zero. Our zero-augmentation setting differs from these approaches; we mix a point mass at zero with a continuous distribution that does not include zero in its support. The resulting two-part longitudinal model is linked to a survival model by extending the latent variables framework, as described in Section 3 below. Regardless of this distinction, the mixture component from which an observation derives is naturally thought of as a Bernoulli trial with some success probability p. Suppose that the trial is a “success” if the observation is not from the zero point mass. Modeling this binary outcome follows the generalized linear model (GLM) convention of using a link function to transform the support of a linear predictor (Xβ ∈ ℜ) to that of a probability (p ∈ [0, 1]). These link functions frequently take the form of a CDF for an unbounded continuous random variable since these are monotonically increasing functions from ℜ → [0, 1]. Previous authors have studied the impact of link misspecification in binary GLM models (Pregibon 1980; Neuhaus 1999; Czado and Santner 1992), finding that skewness impacts bias and efficiency more severely than kurtosis. The two most common link functions (logit

2

and probit) are symmetric, while a third common link (complementary log-log) has only minimal skew. One flexible class of skewed links follows the CDF idea, but adds additional parameters to control the tail behavior and allow asymmetry (Wang and Dey 2010). Another recent paper considers the skewed link problem specifically in the setting of zero inflation and proposes a flexible so-called natural link (Ghosh et al. 2011). However, their derivation is in the context of zero-inflating a count variable, and does not readily extend to our setting of zero-augmenting a continuous variable. In the clinical trial setting, patients with poor survival have few PRO measurement occasions, and this censoring is informative in the sense that such patients may have had the worst observed symptom trajectories had they survived. The zero augmentation allows further fine-tuning of the associations among time-to-event outcomes and PROs by allowing the presence and severity of symptoms to be modeled separately. Our approach’s ability to flexibly borrow information across outcome types is generalizable to other settings that include both longitudinal endpoints, time-to-event endpoints, or both. The remainder of this paper proceeds as follows. In Section 2, we introduce our mesothelioma clinical trial data and consider the challenges of using patient-reported outcomes. In Section 3, we describe our general joint modeling framework, giving specifics of the implementation for the trial in Section 4. Section 5 presents detailed results for model parameters and considers the benefit of joint modeling. Lastly, Section 6 concludes with discussion of results and limitations, as well as suggestions for future work. 2.

DATA DESCRIPTION

2.1 Mesothelioma Clinical Trial Malignant pleural mesothelioma is a fatal cancer of the pleura (the membrane structure surrounding the lungs) usually caused by asbestos exposure (Wagner et al. 1960). After lung cancer, mesothelioma constitutes the second most important occupational cancer among

3

industrial workers (Galateau-Sall 2006). The disease is characterized by a long (20 to 40 year) post-exposure latency period, the time between the first inhalation of asbestos and the diagnosis of mesothelioma (Berman and Crump 2003). The disease is managed by chemotherapy, supportive care, surgery, radiation, or combinations of these (Zucali et al. R combined with cisplatin (pemetrexed/cisplatin) is the only 2009). Pemetrexed (Alimta )

treatment approved by the US Food and Drug Administration (FDA) as initial (first-line) chemotherapy for patients who are not candidates for curative surgery (U.S. FDA 2004; Eli Lilly and Company 2010) and has been established as standard of care (Ray and Kindler 2009). Here, we consider data from EMPHACIS (Evaluation of MTA in Mesothelioma in a Phase 3 Study with Cisplatin), a global phase III clinical trial conducted to evaluate the efficacy of pemetrexed/cisplatin, compared with cisplatin alone, as first-line treatment for patients with mesothelioma (Vogelzang et al. 2003). The trial enrolled 448 patients with locally advanced or metastatic disease who had received no prior chemotherapy, although some had received prior radiotherapy. This study showed an overall survival treatment benefit of 2.8 months (median 12.1 months in pemetrexed/cisplatin versus 9.3 in cisplain alone) as well as increased progression-free survival (PFS) time of 1.8 months (median 5.7 months in pemetrexed/cisplatin versus 3.9 in cisplain alone). Longitudinal Measurements. The event schedule included two pre-randomization visits (4-6 and 1-2 days before treatment initiation) at which PRO data were first collected. On days 8, 15, and 19 of each 21-day treatment cycle, PROs were assessed again. We limit consideration to the first 6 protocol-specified cycles, excluding observations from optional additional cycles and from the post-treatment phase. As a result, each participant contributed between 1 and 20 (median 14) PRO measurements. Longitudinal PROs were measured using the Lung Cancer Symptom Scale (LCSS), which

4

includes six lung-cancer-related symptoms (anorexia, fatigue, cough, dyspnea, hemoptysis, and pain) and three global measures (symptom distress, interference with carrying out normal activities, and quality of life). Consistent with previous work indicating that hemoptysis is not relevant to mesothelioma (Hollen et al. 2006), we exclude that item from our analysis. Participants reported their symptoms for the 24 hours prior to data collection by placing a mark on a 100mm visual analog scale. The lowest possible patient endorsement (i.e., a zero) indicated a lack of a symptom, no effect on normal activities, or a completely unrestricted quality of life. Conversely, patients with a 100mm endorsement represented that during the last 24 hours they experienced the worst possible level of a lung cancer symptom, the most possible functional impairment, or greatest impairment of quality of life. We observed a large number of zero responses in the EMPHACIS data, ranging from 5% (quality of life) to 25% (cough) at baseline. In contrast, there were very few observations of 100. Time-to-Event Outcomes. Following the EMPHACIS protocol, progression of disease was defined on the basis of the size of the primary tumor, and the PFS variable was defined as the time in days from randomization to progression or death from any cause. Our analysis considers only the PFS time-to-event outcome, not overall survival, and we take a parametric proportional hazards approach to the survival modeling. Visual inspection of the KaplanMeier curves (not shown) and likelihood-based comparison of various parametric survival models suggested that the Weibull distribution was a reasonable error distribution for PFS. Baseline Covariates. At the pre-randomization visits, study physicians rated participants on the Karnofsky performance scale. Scores range in intervals of 10 from 100 (normal) to 0 (dead), and the trial inclusion criteria specified a score of at least 70 at entry. We characterize each participant as having low (< 90) or high (90 − 100) Karnofsky according to their first pre-randomization value. Disease at baseline was staged according to International Mesothelioma Interest Group criteria (Rusch 1995). We collapse stage into a binary indicator 5

Table 1: Baseline characteristics of patients in each treatment group. Pemetrexed/cisplatin

Cisplatin only

n (%)

n (%)

Race: White

204 (90%)

206 (93%)

Gender: Male

184 (81%)

181 (82%)

Karnofsky: High (90-100)

117 (52%)

125 (56%)

Stage: I/II

52 (23%)

49 (22%)

Vitamin Supplement: Full

168 (74%)

163 (73%)

of advanced (Stage III/IV) or early (Stage I/II) disease. Vitamin supplementation with folic acid and B12 was added to the treatment regimen after early results showed that these could ameliorate the risk of severe treatment-related toxicity (Niyikiza et al. 2002). Not all patients in the EMPHACIS study received vitamin supplementation in every treatment cycle, so we dichotomize patients into fully supplemented and partially/not supplemented groups. Information on race/ethnicity, gender, and other demographics was also collected. Patient population baseline characteristics are summarized in Table 1. 2.2 Patient-Reported Outcomes and Survival There are several reasons to consider a joint modeling approach in a clinical trial setting. First, the FDA has expressed concerns about the use of the PFS surrogate endpoint to establish treatment benefit and support drug approvals (Laffler 2009). Second, opinion leaders and regulators have communicated the need for decision-relevant PROs to better inform cancer care by adding value to the traditional biomedical clinical trial endpoints (Lipscomb et al. 2007). In fact, the FDA has a history of considering PROs as direct evidence to support a treatment benefit for cancer drug approvals (Rock et al. 2007). The lethality of mesothelioma notwithstanding, the treatment effect on PFS in the EMPHACIS trial is

6

modest. Analyses showing additional treatment benefit on PRO evolution would provide compelling additional evidence on behalf of pemetrexed/cisplatin in this first-line setting. Although researchers have investigated PFS and PROs in mesothelioma clinical trials (Bottomley et al. 2007), there is scant literature on joint modeling or the concomitant zero-augmentation in this or similar settings. To address this gap, we extend a common class of joint longitudinal-survival models to account for the bounded support of the PRO observations and the substantial number of zeros. Our approach affords significant flexibility in the way that covariate information is accommodated, as well as how correlation is captured via subject-specific random effects. As a result, we are able to estimate treatment effects separately for multiple outcome measures and understand individual-level variation in the latent disease process. Our models also accommodate starkly different longitudinal trends for PROs associated with side effects of the therapy, as well as a previously unappreciated temporal periodicity in these PRO trajectories. 3.

ZERO-AUGMENTED JOINT MODELING APPROACH

We develop a zero-augmented beta (ZAB) distribution to address both the bounded support of the PRO measurement scale and the excessive zeros. First, we divide all values by 100 and subtract .005 from the values equal to 1 (comprising fewer than 1% of all the scores) to obtain values in [0, 1). Our ZAB is a two-part mixture distribution comprising a degenerate point mass at zero and a beta distribution, so that its support is Y ∈ {0} ∪ (0, 1). To formalize our ZAB distribution, we define three parameters: ω for the probability of Y ∈ (0, 1) (i.e., not arising from the point mass), µ for the mean of Y ∈ (0, 1), and φ for the dispersion of Y ∈ (0, 1). That is, Y ∼ ZAB(ω, µ, φ) corresponds to Y = ZB, where Z ∼ Bernoulli(ω) and B ∼ Beta(µφ, (1 − µ)φ), independently. Standard calculations yield h i φµ+1 E(B) = µ and V ar(B) = µ(1−µ) , so that E(Y ) = ωµ and V ar(Y ) = ωµ − ωµ . To our φ+1 φ+1

knowledge, the only previously published zero-augmented beta approach (Cook et al. 2008)

7

specified a different parameterization. This previous model specified the two parameters of the beta distribution mixture component as α and α exp{−X′ β}, while we use φ/(1 + exp{−X′ β}) and φ/(1 + exp{X′ β}) (assuming a logit link). Having specified a distributional form for the PROs, we now wish to develop a model that links the both the presence and severity of PROs with an individual’s PFS time. Let Yi (s) denote the PRO score for the ith patient (i = 1, . . . , N ) at time s. We write the ZAB longitudinal component as follows: Yi (s) ∼ ZAB(ωi (s), µi (s), φ) , logit(ωi (s)) = X0i (s)β 0 + U0i (s) ,

(1)

and logit(µi (s)) = X1i (s)β 1 + U1i (s). Here, β 0 is a p0 -vector of fixed effects with design vector X0i for the probability of non-zero score, and β 1 is a p1 -vector of fixed effects with design vector X1i for the mean of a nonzero score. Given an r0 -vector of individual-specific parameters, U0i = (U0i1 , . . . , U0ir0 )′ , we form a latent trajectory that underlies the probability of a non-zero score at time s, U0i (s). Similarly, given the r1 -vector of parameters U1i = (U1i1 , . . . , U1ir1 )′ , U1i (s) is an individualspecific latent trajectory that underlies the mean of a non-zero score. We assume that the latent variables U0i and U1i are distributed according to a multivariate normal. The formulation of (1) uses logit links in both longitudinal submodels. However, although we require two link functions that transform the linear predictor onto (0, 1), there is no reason to believe that the same link will be best for both. As described above, misspecification of the link, particularly skewness and kurtosis, can negatively impact the analysis; the relationship between linear predictor and modeled probabilities or means must be investigated to determine appropriate links. In the case of the presence submodel, there is a convenient interpretation of the parameters under the logit link (i.e., the exponentiated coefficients are odds ratios). In the case of the severity submodel, there is no such interpretation; the link function is simply a mathematical mechanism. If we wish to use a more general link function 8

containing additional control parameters, it is likely that the severity submodel has more information available to estimate them. We return to the choice of links in the context of our data analysis in Section 5 below. Moving on to the survival component of the model, let Ti denote the PFS time (with indicator δi = 1 if the progression/death event was observed, and δi = 0 if it was censored), and write the survival model as Ti ∼ Weibull(γ, λi ) , if δi = 1

(2)

for log(λi ) = X2i β 2 + A(α, U0i , U1i ) , where we use the survival function as the likelihood contribution for censored PFS observations; see (4) and (6) below. The hazard, λi , depends on a p2 -vector of fixed effects, β 2 , with corresponding design vector X2i . We link the longitudinal and survival components of the hierarchical model together using the function A, which depends on the vectors of random effects, U0i and U1i , and a parameter vector α. We remark that, though we have written this modeling framework in full generality, below we consider only simple forms for U0i (s) and U1i (s) so that r0 and r1 are small (e.g., U0i (s) = U0i whence r0 = 1). Our work is informed by Rizipoulos et al. (2008), who distinguish between models that allow greater flexibility in the form of association among measurements (“correlated effects”) and those that are simpler, involving parameters shared between submodels for different measurement types (“shared effects”). We prefer to use the most flexible model that is feasibly fit using the available data and existing software. In the formulation of (1), the two correlated sets of random effects in the longitudinal submodels allow great flexibility. But with only one survival observation per person, it is not feasible to add a third correlated random trajectory to (2). In preliminary modeling, we considered a single normal random intercept shared across all three submodels; three jointly normal intercepts, one for each submodel; two jointly normal intercepts, found in the longitudinal submodels only; and two jointly normal intercepts for the longitudinal submodels, both shared into the survival submodel. 9

We settled upon the hybrid approach above as a compromise between the flexibility of correlated effects and the scant information available about each individual in their survival observation compared to their longitudinal trajectories. 4.

ZAB JOINT MODELING IN THE MESOTHELIOMA CLINICAL TRIAL

Exploratory data analysis revealed possible periodicity in the PRO measurements according to the day within treatment cycle on which they were measured. On several PRO items (particularly anorexia and fatigue), PRO measurements were consistently highest on day 8 of each treatment cycle, declined on day 15, and improved further on day 19 before spiking again on day 8 of the next cycle. As a result, we use models that include parameters for both cycle and day-within-cycle. For the random trajectories, this parameterization limits our ability to include person-specific parameters, since estimating person-specific parameters at every cycle is infeasible. Thus, we restrict attention to random intercept models; in the notation of (1), we write U0i (s) = U0i and U1i (s) = U1i . At the suggestion of a referee, we also investigated the residuals from our hybrid model to look for additional longitudinal autocorrelation not accounted for by the shared random intercepts. We discovered that in one of our two representative outcome measures, the residuals from the presence submodel exhibited strong (i.e., ∼ .7) positive lag-1 autocorrelation. Thus, we also consider including a first-order autoregressive term in the presence submodel by modifying (1) to logit(ωi (sj )) = X0i (sj )β 0 + U0i + θYi (sj−1 ) ,

(3)

for j = 2, . . . , ni with no autoregressive term for j = 1 and an intercept in X0i (s). Such an AR(1) term could also be incorporated in the severity submodel. Notice that the covariance between two longitudinal observations is substantially more complicated under this formulation because conditional independence no longer holds within the longitudinal observations, and the covariance between any two longitudinal observations depends not only on their 10

shared Ui but also on the path of observations between them. We discuss the use of this model enhancement for our clinical trial data in Section 5 below. Thus, we carry on with the following implementation of the joint model given by equations (1) and (2). Let the fixed effect design matrices for the longitudinal submodels, X0i (s) = X1i (s) = Xi (s), consist of the following 21 items: an intercept; five contrast-coded indicators of baseline covariates race/ethnicity (= 1 if white/Caucasian, = −1 if non-white), gender (= 1 if male, = −1 if female), Karnofsky performance status at baseline (= 1 if 90 − 100, = −1 if < 90, ), baseline stage of disease (= 1 if < Stage I/II, = −1 if Stage III/IV, ), vitamin supplementation status (= 1 if fully supplemented, = −1 if not), and treatment assignment (= 1 if pemetrexed/cisplatin, = −1 if cisplatin alone); two contrast-coded day-within-cycle variables (day 8 as baseline); six dummy-coded cycle variables (pre-randomization visits as baseline); and six interaction terms for treatment assignment with each cycle. The seven fixed effect entries in survival model design matrix X2i in (2) are an intercept and contrastcoded indicators of baseline covariates race/ethnicity, gender, Karnofsky status, baseline stage of disease, vitamin supplementation, and treatment assignment. The model given above produces simple-to-interpret overall treatment effects for each PRO score and PFS, and we refer to it as the base model. To address additional analysis goals, we also consider model selection from a full model that includes all pairwise interactions of these important baseline covariates with each other. We took a stochastic variable selection (SVS) approach based on a “spike and slab” prior formulation to dynamically select from this large collection of interactions. Turning to the individual-specific latent parameters, we suppose that the vectors Ui = (U0i , U1i )′ are independently and identically distributed from a bivariate normal distribution with mean zero and 2 × 2 unstructured covariance matrix Σ. Our survival submodel uses a vector α = (α1 , α2 )′ to link the three random effects for an individual to the log hazard; in this paper we select A(α, U0i , U1i ) = α1 U0i + α2 U1i . Together, Σ and α parameterize the

11

associations among the probability and severity of PROs and PFS times for individuals. We specify vague prior distributions for the fixed regression effects, β 0 ∼ Np0 (0, 102 Ip0 ), β 1 ∼ Np1 (0, 102 Ip1 ), and β 2 ∼ Np2 (0, 102 Ip2 ), where In denotes an n × n identity matrix and p0 , p1 , and p2 are as in (1) and (2). Second, we adopt a similarly vague IW (rIr , r) prior for Σ, where IW denotes the inverse Wishart distribution and r = r0 + r1 is the dimension of the combined collection of random effects for both longitudinal submodels. We choose a G(1/10, 10) (mean 1, variance 10) prior for the Weibull shape parameter, γ, and N2 (0, 102 I2 ) prior for the PRO-PFS linking parameter, α. Finally, our only informative ˆ specification for the beta precision parameter φ, where φˆ is an empirical prior is a G(1, φ) Bayes-type estimate based on preliminary analysis. These priors represent a balance between minimal informativeness and the need to constrain the parameters to a reasonable range to achieve acceptable MCMC convergence. 4.1 Joint Model Likelihood For fixed effect vector Θ = (β 0 , β 1 , β 2 , Σu , φ, γ, α)′ and complete random effect vector U = (U01 , . . . , U0N , U11 , . . . , U1N )′ , we obtain the following expression proportional to the full joint posterior distribution: # (N "n i Y Y p(yij |β 0 , β 1 , U0i , U1i , φ) p(Ui |Σu ) p(Θ, U|Y, T, ∆) ∝ i=1

j=1

)

(4)

× p (ti |β 2 , U1i , U1i , α, γ)δi S (ti |β 2 , U1i , U1i , α, γ)1−δi π(Θ) , where the ith person’s j th PRO is yij at time sij , ti is the PFS/censoring time, and δi is the censoring indicator. Then Y, T, and ∆ are the complete vectors of these observed quantities over all i = 1, . . . , N and j = 1, . . . , ni . The prior π(Θ) is a product of the independent prior distributions on each element of Θ described above. The ZAB likelihood contribution

12

for each PRO observation yij , p(yij |β 0 , β 1 , U0i , U1i , φ), is [1 − ωij (β 0 , U0i )] I(yij = 0) # " + ωij (β 0 , U0i ) cij yij µij (β 1 ,U1i )φ−1 (1 − yij )(1−µij (β 1 ,U1i ))φ−1 I(yij > 0) ,

(5)

where the parameters are cij =

Γ(φ) , Γ (µij (β 1 , U1i )φ) Γ ((1 − µij (β 1 , U1i ))φ)

ωij (β 0 , U0i ) = logit−1 (X0ij β 0 + U0i ) , and µij (β 1 , U1i ) = logit−1 (X1ij β 1 + U1i ) . Note that ωij contains an AR(1) term θyi,j−1 , as needed. The Weibull likelihood contribution for the ith person is p (ti |β 2 , U1i , U1i , α, γ) = γλi (β 2 , U0i , U1i , α)tγ−1 exp −λi (β 2 , U0i , U1i , α)tγi i  or S (ti |β 2 , U1i , U1i , α, γ) = exp −λi (β 2 , U0i , U1i , α)tγi ,



(6)

according to whether the terminal event was observed or censored, respectively. The Weibull parameter is λi (β 2 , U0i , U1i , α) = exp (X2i β 2 + α1 U0i + α2 U1i ). 4.2 Computing We use the WinBUGS 1.4 software, freely available at www.mrc-bsu.cam.ac.uk/bugs, to fit these models. Convergence was monitored via MCMC chain histories, auto- and crosscorrelations, density plots, and Brooks-Gelman-Rubin statistics (Brooks and Gelman 1997). The results below reflect 2000 iterations of burn-in followed by 5000 production samples from each of three parallel chains. The “ones trick” (Spiegelhalter et al. 2004) for implementing our ZAB distribution led to very long computation times. As such, we instead utilize the WinBUGS Development Interface (WBDev; Lunn (2003)) to compile functions and distributions in component Pascal. Running WinBUGS from within the BlackBox Component Builder 1.5 (Oberon Microsystems, Zurich) development environment provides access to the hand-coded, three-parameter univariate ZAB distribution. All of this is efficiently accomplished via the R2WinBUGS (Sturtz et al. 2005) package for R. 13

5.

RESULTS

We demonstrate results for two PROs that illustrate very different responses to therapy. As mentioned above, preliminary data analysis suggested that some PROs were associated with day-within-cycle periodicity, as well as increased severity at treatment visits (compared to pre-randomization baseline visits) in both treatment groups. Anorexia and fatigue were most clearly in this group, which is consistent with the documented side effects. The most commonly reported are stomach upset, low blood cell counts, fatigue, oral sores, anorexia, and rash (Eli Lilly and Co. 2011). In contrast, other PROs (pain, dyspnea, and cough) exhibited much weaker periodicity and displayed different trends in the the treatment versus control groups. The three global measures followed their own patterns, not belonging clearly to either the therapy-associated or disease-associated clusters. In the results below, we present findings from one PRO in each of the two clusters; anorexia as a representative therapy-associated outcome, and pain as a disease-associated outcome. Preliminary results showed residual lag-1 autocorrelation among residuals for the presence of pain. Thus, only the submodel for the presence of pain includes an AR(1) term as in (3). This model represents a substantial DIC improvement (of approximately 40) over a model for pain with no AR(1) term. Figure 1 shows the impact on of this modeling change on the residuals. We use deviance residuals for the presence submodel (left) and logit residuals for the severity submodel (right). The top row displays these residuals from a hybrid joint model for pain that uses random intercepts only, and the bottom panels show the results for these same model with an added AR(1) term in the presence submodel. From this figure, we can see the strong positive autocorrelation among residuals from the presence of pain submodel (top left) is alleviated by inclusion of the AR(1) term (bottom left). 5.1 Fixed Effect Parameters The posterior medians and 95% equal-tail credible intervals for the baseline covariate effects in β 0 in the base model are shown in the leftmost column of Figure 2. These parameters 14

PAIN RI Only, Severity

0.0

0.0

0.5

Density 0.4 0.8

Density 1.0 1.5

1.2

2.0

PAIN RI Only, Presence

−1.0

−0.5 0.0 0.5 Lag 1 Autocorrelation

1.0

−1.0

1.0

PAIN RI + AR, Severity

0.0

0.0

Density 0.4 0.8

Density 0.4 0.8

1.2

PAIN RI + AR, Presence

−0.5 0.0 0.5 Lag 1 Autocorrelation

−1.0

−0.5 0.0 0.5 Lag 1 Autocorrelation

1.0

−1.0

−0.5 0.0 0.5 Lag 1 Autocorrelation

1.0

Figure 1: Distribution of lag 1 autocorrelation among residuals within each person’s data from the presence (left) and severity (right) submodels for hybrid joint models for pain that contain random intercepts only (top row) and added AR(1) terms in the presence submodel (bottom row). capture the effects of the baseline covariates on the overall logit probability of a non-zero PRO (negative parameters represent lower probability of non-zero symptoms). Corresponding posterior summaries for baseline covariate effects in β 1 are shown in the middle column of Figure 2. These represent the effects of covariates on the logit of the overall mean of non-zero PRO scores (negative parameters represent less severe PROs). Finally, posterior medians and intervals for the baseline covariate effects in β2 are shown in the right column of Figure 2. These represent the effects of covariates on the hazard of progression or death (negative parameters represent decreased hazard). Note that in all of these, the parameters have been doubled so that they represent the difference (on the logit scale) between the two 15

groups defined by the -1/1 contrast coding mentioned in Section 4. Figure 2 clearly shows the difference in the periodicity between anorexia, where the day-within-cycle parameters in the first two columns are significant, and pain, where they are (mostly) not, though there is some indication of periodicity in the severity of pain. Receiving full vitamin supplementation is associated with lower severity of anorexia, but higher probability of reporting pain. High baseline Karnofsky status is associated with lower probability and severity of both pain and anorexia. Male gender was associated with lower severity of both PROs (the effect for pain was nearly significant). Turning to the PFS outcome, both early stage of disease and high Karnofsy status at baseline are associated with decreased hazard of progression/death in models joint with either PRO. We turn next to the full model containing all pairwise covariate interaction terms and stochastic variable selection. The SVS approach yields posterior means of the binary inclusion/exclusion variables for each interaction term. These indicate the posterior probability that the covariate interaction should be included in the model. Sadly, none of the interaction terms reached a 0.5 posterior probability of inclusion for either anorexia and pain. In the results below, we therefore restrict attention to the base model containing no covariate interaction terms. We note however that in model fits for a shared effect model (in which a single random effect governed both probability and severity of symptoms) several interaction terms did emerge as significant. This suggests a trade-off between a more flexible random effect structure that can accommodate greater individual variability at the cost of potentially “using up” more of the information. Turning to the effects of treatment, Figure 3 displays posterior medians and posterior credible intervals for the modeled treatment and control group trajectories by treatment cycle, for both presence of non-zero PRO (left column) and the mean non-zero PRO severity (right). In this figure, the baseline covariates were all set to 1 for purposes of illustration, and these represent day 8 fits for the treatment cycles (cycle 0 indicates pre-randomization

16

visit).

Note that the intervals are wide and may overlap substantially even when the

treatment×cycle effect is “significant” because the fitted probabilities/means include variability in other effects due to the non-linear inverse logit transformation. To see why, note that even if we set all the covariates to 0 and vary only the treatment assignment, we obtain a fitted probability of symptoms of [exp {−(β01 + β0,trt )}]−1 in the treatment group and [exp {−(β01 − β0,trt )}]−1 in the control group. We cannot isolate the treatment effect β0,trt because the intercept does not “cancel out” as it would in a linear model. Notice that in the therapy-associated PRO (anorexia, top row), the probability and severity increases immediately upon therapy initiation, and the groups do not differ over time. Contrast this to the disease-associated PRO (pain, bottom row), where the differences between the treatment groups evolve over time, growing wider with additional treatment cycles, particularly on the severity side. The effect of treatment on hazard of progression/death was similar whether it was estimated in a model joint with pain or anorexia. The posterior median difference in fitted months of median PFS between treatment and control groups was 2.1 (CI: .96 − 3.6) in the model with anorexia and 1.4 (CI: 0.62 − 2.7) in the model with pain. These treatment differences are computed with all of the baseline covariates set to 1 (i.e., in white males with Stage I/II disease, baseline Karnofsky ≥ 90 and full vitamin supplementation). Moderate to strong positive association was observed between the two random intercepts (U0i and U1i ). Posterior medians for their correlation were .53 (CI: .39-.64) and .59 (CI: .46-.69) in the anorexia and pain models, respectively. This indicates that the baseline probability of having non-zero PRO score and the baseline PRO severity for an individual are positively associated, but not so highly that a single set of random effects would suffice. The two α parameters control the impact of individual-specific intercepts and slopes on the hazard submodel. In the base models for both anorexia and pain, while the median for α1 is positive, the posterior credible interval narrowly includes 0, indicating that the relationship

17

between the person-specific intercept in the presence submodel and the survival submodel is weak. However, the median for α2 is positive and its interval excludes zero, indicating a significant relationship between the person-specific intercept in the severity submodel and the survival submodel. The sign of this parameter indicates that higher severity of these two PROs in an individual is associated with greater hazard of progression/death. 5.2 Checking for Differential Skewness in the Links To investigate adequacy of the logit link in the two longitudinal submodels, we consider plots of the linear predictor versus the predicted probability. Figure 4 illustrates this empirical approach to examining link misspecfication in the base model for anorexia (top) and pain (bottom). We consider the linear predictors based on individual-level fit, that is, X0i (s)β 0 +u0i +θyi,j−1 in the presence submodel and X1i (s)β 1 +u1i in the severity submodel. We divide these linear predictors into 10 intervals containing roughly equal numbers of observations. For the presence submodel, we plot the distribution of inverse-logit transformed linear predictors in each bin (black boxplots), that is, the fitted probabilities of non-zero PRO. We compare the shape of this relationship to the observed proportion of PRO observations that are greater than 0 in each bin (grey line). For the severity submodel, we again plot the distribution of inverse-logit transformed linear predictors (black boxplots), which now represent the fitted mean of non-zero PROs. Then we overlay the empirical distributions of the observed non-zero PROs (grey boxplots). These figures show little visual evidence of link misspecification, which we would see as differences in the shapes of the fitted and observed trends; Additional model fits using other links (e.g., probit, and skewed links with fixed or random skewness parameters) either failed to converge or did not meaningfull change these relationships. These plots suggest that the standard logit links are adequate for these data in both the presence and severity submodels.

18

5.3 Individual-Level Parameters We also wish to examine how the individual-level random effects work to adjust the fitted trajectories. For the base anorexia model, Figure 5 displays individual and group results for six individuals chosen on the basis of their fitted individual-specific intercepts. To illustrate the anorexia presence submodel (top row), we choose three individuals on the basis of their posterior median U0i : one in the bottom decile (left), one near 0 (middle), and one in the top decile (right). To investigate the severity submodel (bottom row), we choose three additional individuals, this time with posterior median U1i values in the bottom, middle, or top deciles. For each individual, we plot the group-level fitted trajectory (grey), individual-level fitted trajectory (black), and observed data (solid circles); the individual-specific random effect posterior medians are shown above each subplot. This figure clearly shows the difference in the magnitude and direction of individual-level adjustment that are possible in the two longitudinal submodels. In the presence submodel (top row), the individual with many zero PRO observations (left, i = 1) has a much lower fitted trajectory than his group. However, the individual with no zero observations (right, i = 90) cannot have higher fitted trajectory, because the group fit is already so close to the maximum value of 1. In contrast, individuals show both significantly positive and negative adjustments to the group fit in the severity submodel (bottom row). We see that an individual with consistently low severity (left, i = 45) has a lower trajectory than the group, and an individual with consistently high severity (right, i = 59) has a higher fitted trajectory. The patterns seen for these six individuals hold in the whole clinical trial population. None of the credible intervals for U0i excluded zero in the positive direction, while 75 out of 448 were significantly negative. Compare this to U1i , where 92 of the credible intervals excluded 0 from the positive side and 93 excluded 0 from the negative side. This asymmetry in the random effects may contribute to the failure of α1 (the parameter linking U0i to the survival submodel) to exclude the null value of 0, while α2 is significantly positive. It is also 19

further reason to include separate random effects in these two models, rather than assuming that a single random effect governs both presence and severity. 5.4 Joint versus Separate Modeling Finally, we wish to investigate the relative benefit obtained by fitting these models using both sources of data (longitudinal and survival) compared to utilizing them separately. For a longitudinal-only model, we could simply delete the survival submodel from the joint model, retaining the same random effect structure. However, for the survival-only model, the two individual-level intercepts U0i and U1i and their corresponding coefficients α1 and α2 are not identified with only one observation per person. Thus, we fit a modified survivaliid

only model having the form log(λi ) = X2i β 2 + ui where we assume ui ∼ N (0, σu2 ) and we employ a σu ∼ U nif (0.01, 100) prior on the random effect standard deviation. To make our comparisons more fair, we similarly modify the joint and longitudinal-only models to use a single random intercept per person. Specifically, we use logit(ωi (s)) = X0i (s)β 0 + α1 ui in the presence submodel and logit(µi (s))) = X1i (s)β 1 + ui in the severity submodel for both the longitudinal-only and joint models. Then we use log(λi ) = X2i β 2 + α2 ui in the survival submodel of the joint model. Comparing posterior credible intervals for β 0 and β 1 , they were sometimes wider and sometimes narrower in the joint model compared to the longitudinal-only model, but these widths differed by at most 12%. By contrast, the β 2 credible intervals in the joint model were 65% to 89% narrower than those obtained using only the survival data. This demonstrates yet another aspect of the differential information content from different data sources. We obtain much greater benefit by adding the comparatively information-rich longitudinal data to the survival data, thereby improving parameter estimates in the survival side of the model.

20

6.

DISCUSSION

Although clinically-determined endpoints (e.g., survival outcomes) have generally been the basis for recommendation and approval of effective cancer therapies, PROs can be equally important study outcomes. Moreover, compared with survival endpoints, PROs can be used to better assess the treatment effect when minimal or no survival differences are observed (Fairclough 2004). Our analysis highlights several features of these PROs not previously reported. First, we were able to distinguish between two groups of symptoms, which we call therapy- and disease-associated. In the representative example of the former group, anorexia, periodicity within treatment cycle manifests as significantly negative day 15 and day 19 parameters for that model in Figure 2. The pain scores show much less periodicity; thus, only one of the corresponding day of measurement parameters in Figure 2 has a credible interval that excludes 0. Second, the trends across cycles of treatment also emerged as rather different in the two kinds of PROs. In anorexia, the fitted scores were substantially higher at each post-randomization cycle in both treatment groups. The control group receives an active control, so that all patients are receiving a dose of therapeutic agent that appears to produce loss of appetite as a side effect. Third, individual-level parameters are included to induce association both across longitudinal PRO measurements and between the PRO and PFS outcomes. As illustrated in Figure 5, we observed an asymmetry in the ability of individuallevel parameters to adjust the probability of non-zero symptoms versus the severity of those symptoms. Importantly, our methods allow us to simultaneously and efficiently evaluate several treatment effects on endpoints, including the probability of experiencing a PRO item, the severity of each non-zero PRO item, and the PFS benefit. In the case of pain, we found differences between treatment and control groups across treatment cycles that widened and were significant for both presence and severity. This was in contrast to the lack of treatment

21

benefit in anorexia. Our results for the benefit of treatment on PFS are consistent with the original analysis of these data, which also showed an approximately 2-month increase in median time to progression/death (Vogelzang et al. 2003). Of course, criticisms of our analysis can be made. One is a possible violation of the proportional hazards assumption. In the Kaplan-Meier curves for the two treatment groups (not shown), we find some indication that the curves converge at later times, when there are few events and the confidence intervals overlap completely. Also, na¨ıve likelihood estimates of treatment effects (i.e., log hazard ratio for treatment versus control) made at several points in time (not shown) indicate that a piecewise exponential model for PFS (see e.g., Ibrahim et al. (2002, Example 4.3)) may be more appropriate, since the treatment effect appears to be changing over time. We have used parametric (Weibull) survival for computational convenience, allowing the models to be fit in standard software (i.e., WinBUGS) and keeping our methods accessible to statistical support staff. However, customized MCMC algorithms could provide the flexibility to fit models containing semi-parametric survival, such as Cox proportional hazards, or more complex forms for the latent individual-specific effects. Turning to future work, we are interested in investigating the relative contributions of multiple data types to learning about both group- and individual-level effects, building upon and formalizing the work in Section 5.4. In the present data, we informally observed that the “richest” source of information about individuals is the collection of repeated, continuous longitudinal severity observations, rather than the single survival time per person. In addition, the “ceiling effect” in binary longitudinal observations may lead to asymmetry in our ability to learn about individual-level parameters. However, in settings with a large proportion of censoring, the occurrence of the terminal clinical event may be the most informative observation about an individual. Quantifying the impact of the number and type of longitudinal observations, as well as the proportion of censoring, on learning in joint models is a subject of current research (Hatfield et al. 2011).

22

Second, the choice of link functions is an important and often overlooked element of model building. In our zero-augmentation setting, the need to choose not one but two links is complicated by the asymmetry noted above. Although we found no evidence of a need for skewed or otherwise flexible links in these data, the choice of link and desire to learn about the link from skewed binary data in zero-inflation/augmentation settings remain subjects for future investigation. Third, we wish to extend these models to combine multiple longitudinal outcomes with survival. In such a model, specifying appropriate random effect structures is complicated; for example, we may assume that the same underlying random effects are common across the longitudinal measures, or we could model separate sets of random effects for each longitudinal outcome separately. In the latter case, specifying the distribution that governs these many random effects and defining the function A that links them to the survival submodel may be substantially more difficult. REFERENCES Berman, D. and Crump, K. (2003), “Technical Support Document for a Protocol to Assess Asbestos-Related Risk : Final Draft,” Tech. rep., United States Environmental Protection Agency. Bottomley, A., Coens, C., Efficace, F., Gaafar, R., Manegold, C., Burgers, S., Vincent, M., Legrand, C., van Meerbeeck, J., and EORTC-NCIC (2007), “Symptoms and patientreported well-being: do they predict survival in malignant pleural mesothelioma? A prognostic factor analysis of EORTC-NCIC 08983: randomized phase III study of cisplatin with or without raltitrexed in patients with malignant pleural mesothelioma,” Journal of Clinical Oncology, 25, 5770–5776. Brooks, S. and Gelman, A. (1997), “General methods for monitoring convergence of iterative simulations,” Journal of Computational and Graphical Statistics, 7, 434–455. 23

Cook, D., Kieschnick, R., and McCullough, B. (2008), “Regression analysis of proportions in finance with self selection,” Journal of Empirical Finance, 15, 860–867. Czado, C. and Santner, T. (1992), “The effect of link misspecification on binary regression inference,” Journal of Statistical Planning and Inference, 33, 213–231. R Eli Lilly and Co. (2011), “Information for Patients and Caregivers, Alimta (pemetrexed for

injection),” [Package insert] http://pi.lilly.com/us/alimta-ppi.pdf. Eli Lilly and Company (2010), “Alimta prescribing information,” http://pi.lilly.com/ us/alimta-pi.pdf, Published 2010, Accessed 10/7/2010. Fairclough, D. (2004), “Patient reported outcomes as endpoints in medical research,” Statistical Methods in Medical Research, 13, 115–138. Galateau-Sall, F. (2006), “Epidemiology of mesothelioma,” in Pathology of Malignant Mesothelioma, London: Springer. Ghosh, P. and Albert, P. (2009), “A Bayesian analysis for longitudinal semicontinuous data with an application to an acupunture clinical trial,” Computational Statistics and Data Analysis, 53, 699–706. Ghosh, S., Gelfand, A., Clark, J., and Zhu, K. (2011), “The k-ZIG: flexible modeling for zero-inflated counts,” Tech. rep., Department of Statistics, Duke University. Hanson, T., Branscum, A., and Johnson, W. (2011), “Predictive comparison of joint longitudinal-survival modeling: a case study illustrating competing approaches,” Lifetime Data Analysis, 17, 3–28. Hatfield, L., Hodges, J., and Carlin, B. (2011), “Bayesian learning in hierarchical joint models,” Tech. rep., Division of Biostatistics, School of Public Health, University of Minnesota.

24

Hollen, P., Gralla, R., Liepa, A., Symanowski, J., and Rusthoven, J. (2006), “Measuring quality of life in patients with pleural mesothelioma using a modified version of the Lung Cancer Symptom Scale (LCSS): psychometric properties of the LCSS-Meso,” Supportive Care in Cancer, 14, 11–21. Ibrahim, J., Chen, M.-H., and Sinha, D. (2002), Bayesian Survival Analysis, New York: Springer-Verlag. Lachenbruch, P. (2002), “Analysis of data with excess zeros,” Statistical Methods in Medical Research, 11, 297–302. Laffler, M. (2009), “FDA will revisit appropriate use of PFS endpoints at advisory committee,” The Pink Sheet, September 17. Lipscomb, J., Reeve, B., Clauser, S., Abrams, J., Bruner, D., Burke, L., Denicoff, A., Ganz, P., Gondek, K., Minasian, L., O’Mara, A., Revicki, D., Rock, E., Rowland, J., Sgambati, M., and Trimble, E. (2007), “Patient-reported outcomes assessment in cancer trials: taking stock, moving forward,” Journal of Clinical Oncology, 25, 5133–5140. Lu, C. and Meeker, W. (1993), “Using degradation measures to estimate a time-to-failure distribution,” Technometrics, 35, 161–174. Lunn, D. (2003), “WinBUGS Development Interface (WBDev),” ISBA Bulletin, 10, 10–11. Neuhaus, J. (1999), “Bias and efficiency loss due to misclassified responses in binary regression,” Biometrika, 86, 843–855. Niyikiza, C., Baker, S., Seitz, D., Walling, J., Nelson, K., Rusthoven, J., Stabler, S., Paoletti, P., Calvert, A., and Allen, R. (2002), “Homocysteine and methylmalonic acid: markers to predict and avoid toxicity from pemetrexed therapy,” Molecular Cancer Therapeutics, 1, 545–552. 25

Pregibon, D. (1980), “Goodness of link tests for generalized linear models,” Applied Statistics, 29, 15–24. Ray, M. and Kindler, H. L. (2009), “Malignant pleural mesothelioma,” Chest, 136, 888–896. Rizipoulos, D., Verbeke, G., and Molenberghs, G. (2008), “Shared parameter models under random effects misspecification,” Biometrika, 95, 63–74. Rizopoulos, D., Verbeke, G., Lesaffre, E., and Vanrenterghem, Y. (2008), “A two-part joint model for the analysis of survival and longitudinal binary data with excess zeros,” Biometrics, 64, 611–619. Rock, E., Kennedy, D., Furness, M., Pierce, W., Pazdur, R., and Burke, L. (2007), “Patientreported outcomes supporting anticancer product approvals,” Journal of Clinical Oncology, 25, 5094–5099. Rusch, V. (1995), “A proposed new international TNM staging system for malignant pleural mesothelioma. From the International Mesothelioma Interest Group,” Chest, 108, 1122– 1128. Spiegelhalter, D., Thomas, A., Best, N., and Lunn, D. (2004), WinBUGS User Manual, v 1.4 ed. Sturtz, S., Ligges, U., and Gelman, A. (2005), “R2WinBUGS: A package for running WinBUGS from R,” Journal of Statistical Software, 12, 1–16. Tsiatis, A. and Davidian, M. (2004), “Joint modeling of longitudinal and time-to-event data: an overview,” Statistica Sinica, 14, 809–834. U.S. FDA (2004), “Alimta label and approval history, NDA 021677,” http://www. accessdata.fda.gov/scripts/cder/drugsatfda/index.cfm, Last updated 8/19/2004, Accessed 10/7/2010. 26

Vogelzang, N., Rusthoven, J., Symanowski, J., Denham, C., Kaukel, E., Ruffie, P., Gatzemeier, U., Boyer, M., Emri, S., Manegold, C., Niyikiza, C., and Paoletti, P. (2003), “Phase III study of pemetrexed in combination with cisplatin versus cisplatin alone in patients with malignant pleural mesothelioma,” Journal of Clinical Oncology, 21, 2636–2644. Wagner, J., Sleggs, C., and Marchand, P. (1960), “Diffuse pleural mesothelioma and asbestos exposure in the North Western Cape Province,” British Journal of Industrial Medicine, 260–271. Wang, X. and Dey, D. (2010), “Generalized extreme value regression for binary response data: an application to B2B electronic payments system adoption,” Annals of Applied Statistics, 4, 2000–2003. Zucali, P., De Vincenzo, F., Simonelli, M., and Santoro, A. (2009), “Future developments in the management of malignant pleural mesothelioma,” Expert Review of Anticancer Therapy, 9, 453–467.

27

Presence

Severity

Hazard

Day 19

Day 15

Full Vit.

Stage I/II

High status

Male

White

−2 −1

0

1

2

−1.0 −0.5

Presence

0.0

0.5

−0.6

Severity

0.0

0.4

0.8

Hazard

Day 19

Day 15

Full Vit.

Stage I/II

High status

Male

White

−2

−1

0

1

2 −1.0 −0.6 −0.2

0.2

−0.6

0.0

0.4

0.8

Figure 2: Posterior medians and 95% credible intervals for effects of covariates on probability of non-zero PRO (β 0 , left), severity of non-zero PRO (β 1 , middle), and hazard of progression/death (β 2 , right) in anorexia (top row) and pain (bottom row). Credible intervals that exclude the null value of 0 are shaded darker.

28

Severity

0.85

20

Fitted Mean (Posterior Median, 95% CI) 30 40 50

Fitted Probability (Posterior Median, 95% CI) 0.90 0.95

60

1.00

Presence

Trt Ctrl 0

1

2 3 4 Treatment Cycle

5

6

0

1

5

6

Severity

15

Fitted Mean (Posterior Median, 95% CI) 20 25

Fitted Probability (Posterior Median, 95% CI) 0.6 0.7 0.8 0.9

1.0

Presence

2 3 4 Treatment Cycle

0.5

Trt Ctrl 0

1

2 3 4 Treatment Cycle

5

6

0

1

2 3 4 Treatment Cycle

5

6

Figure 3: Fitted probability of non-zero PRO or mean non-zero PRO severity in the treatment (black) and control (grey) groups for anorexia (top) and pain (bottom) base models. Cycles in which the treatment×cycle parameter’s 95% credible interval excludes the null value of 0 are plotted with stars.

29

0.8 E(Y|Y>0) 0.4 0.6 0.2

0.2

0.4

P(Y>0)

0.6

0.8

1.0

Severity

1.0

Presence

−2.1

1.5 4.08 6.12 Linear Predictor (deciles)

0.0

0.0

Observed Fitted 9

−2.31

1.16

0.8 E(Y|Y>0) 0.4 0.6 0.2

0.2

0.4

P(Y>0)

0.6

0.8

1.0

Severity

1.0

Presence

−1.29 −0.53 0.23 Linear Predictor (deciles)

−2.92

−0.42 2.36 4.4 Linear Predictor (deciles)

0.0

0.0

Observed Fitted 8.05

−2.53

−1.67 −0.89 0.07 Linear Predictor (deciles)

0.93

Figure 4: Observed and modeled relationship between linear predictor and fitted probability of non-zero PRO (left) or mean non-zero PRO severity (right) in anorexia (top row) and pain (bottom row). Modeled logit link relationships are plotted in black while the empirical proportions and distributions are shown in grey.

30

0

1

2

3 4 Cycle

5

6

Pr(Y>0) 0.0 0.2 0.4 0.6 0.8 1.0

U0, 90 = 3.54

Pr(Y>0) 0.0 0.2 0.4 0.6 0.8 1.0

U0, 62 = 0.07

Pr(Y>0) 0.0 0.2 0.4 0.6 0.8 1.0

U0, 1 = −5.55

0

2

3 4 Cycle

5

6

1

2

3 4 Cycle

5

1

6

2

3 4 Cycle

5

6

5

6

E(Y|Y>0) 0.0 0.2 0.4 0.6 0.8 1.0

U1, 59 = 1.17

E(Y|Y>0) 0.0 0.2 0.4 0.6 0.8 1.0 0

0

U1, 114 = 0.01

E(Y|Y>0) 0.0 0.2 0.4 0.6 0.8 1.0

U1, 45 = −1.16

1

Fitted Group Fitted Indiv Observed

0

1

2

3 4 Cycle

5

6

0

1

2

3 4 Cycle

Figure 5: Posterior medians and 95% credible intervals for the probability (top row) and mean severity (bottom row) of anorexia. Grey lines are for the group-level fit (i.e., ignoring Ui ) and black lines are for individual-level fit. Observations are plotted as solid circles and the posterior medians of the random effects are given above each plot.

31

Suggest Documents