JOINT MODELING OF MULTIPLE MIXED-TYPE OUTCOMES USING BAYESIAN SEMIPARAMETRICS: AN APPLICATION TO ACUTE MYOCARDIAL INFARCTION PATIENTS

Alessandra Guglielmi1, Francesca Ieva1, Anna Maria Paganoni1 and Elena Prandoni1

Dipartimento di Matematica, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy (e-mail: [email protected], [email protected], [email protected])

ABSTRACT. We propose a Bayesian semiparametric regression model to represent mixed-type multiple responses for acute myocardial infarction. We carefully selected the covariates to include in the likelihood and assumed a prior which yields a clustering of the hospitals the patients were treated in. We considered data collected in the ST-Elevation Myocardial Infarction (STEMI) Archive, a multicenter observational prospective clinical study planned within the Strategic Program of Regione Lombardia. The data come from a registry collecting clinical and process indicators, outcomes and personal information on patients admitted to all hospitals of Regione Lombardia with STEMI diagnosis.

1 INTRODUCTION

In this work, a Bayesian semiparametric multivariate model is assumed to represent data including in-hospital and 60-day survival of patients admitted to a hospital with ST-elevation myocardial infarction (STEMI) diagnosis. STEMI is caused by an occlusion of a coronary artery, which causes an ischaemia that, if untreated, can damage heart cells and kill them (infarction). It is fundamental for the patient’s recovery that reperfusion therapy is delivered as quickly as possible, since its benefits decrease in a highly non-linear way with delay in treatment.

In this study, we consider data collected in the STEMI Archive, a multicenter observational prospective clinical study planned within the Strategic Program of Regione Lombardia. All patients were treated with percutaneous transluminal coronary angioplasty. Data were recorded in a registry collecting clinical outcomes, process and time indicators and personal information on patients admitted to hospitals of Regione Lombardia with STEMI diagnosis.

We introduce a multivariate regression model where the response has three mixed components: door to balloon time (DB), i.e. the time between the admission to the hospital and angioplasty, in-hospital survival, and survival after 60 days from admission. The first response (continuous) is essential in quantifying the efficiency of health providers and plays a key role in the success of the therapy; the second is the basic indicator of success or failure of the treatment; the third concerns a 60-day period, during which the effectiveness of the treatment, in terms of survival and quality of life, can be truly evaluated. Note that the last two responses are binary, so that, as a whole, the multivariate response is of mixed type.

For each patient, the joint conditional distribution of the three responses (i.e. the likelihood), given the parameters, is modeled as the product of (i) the distribution of the continuous response (DB time), (ii) the distribution of the in-hospital survival given DB time, and (iii) the distribution of the 60-day survival given DB time and in-hospital survival. All these conditional distributions lie within the class of (univariate) generalized linear mixed models, with random effects given by hospital intercepts. Covariates corresponding to the other regression parameters include hospital admission variables (i.e. times to treatment, mode of transportation to hospital), the patient’s clinical variables at hospital admission and the patient’s general health status variables. As usual, we assume conditional independence among patients.

The prior for the hospital effects is given non-parametrically, as an ANOVA-Poisson-Dirichlet process prior, that is, a family of distributions of dependent random probability measures with (marginal) almost surely discrete trajectories, generalizing the Dirichlet process. Such priors induce a random partition of the hospital labels. This is particularly useful when we want to estimate a latent clustering among the hospitals in our dataset, identifying groups of providers that affect outcomes at patient level in a similar way. Therefore, in this context, a clustering analysis of the hospitals is straightforward, being based on posterior estimates of the random partition parameter itself. The posterior inference provided here includes posterior distributions of all parameters, predictive survival probabilities, and hospital clustering.

2 DESCRIBING THE DATASET

We are interested in predicting three responses, the first being continuous and the others binary: (i) DB, the time between the admission to the hospital (Door) and angioplasty (Balloon), (ii) the in-hospital survival (ALIVEIN), and (iii) the survival after 60 days from infarction (ALIVE60). It is known that DB is an important indicator of the efficiency of healthcare providers and plays a key role in the success of the therapy. On the other hand, ALIVEIN is the basic indicator of success or failure of treatment, while ALIVE60 is an important outcome, since doctors believe that it is over a 60-day period that the effectiveness of the treatment, in terms of survival and quality of life, can be truly evaluated.

The dataset analyzed here collects information about n = 697 patients treated with angioplasty in 33 hospitals of Lombardia, 12 of these in Milan. The number of patients per hospital ranges from a minimum of 5 to a maximum of 60, with mean 21. The dataset is strongly unbalanced: 96.84% of patients are alive at discharge and 98.37% of them are alive after 60 days. There are many covariates available at patient level, and fewer at hospital level. After model selection (under both frequentist and Bayesian procedures), we considered the following ones:

• ACCESS ($x_1$): 1 if the patient came to hospital by any rescue unit, 0 otherwise (by oneself);
• ECG ($x_2$): time of the first electrocardiogram;
• WEEKEND ($x_3$): 1 if the admission was on a holiday, at the weekend or between 6pm and 8am, 0 otherwise;
• AGE ($z_1$): age of the patient;
• RISK ($z_2$): 1 if the patient had risk factors such as diabetes, smoking etc., 0 otherwise;
• KILLIP ($z_3$): 1 if the infarction was severe, 0 otherwise;
• EF ($s_1$): ejection fraction at admission to hospital, i.e. the volumetric fraction of blood pumped out of the ventricle with each heart beat;
• COMP ($s_2$): 1 if there were complications after the angioplasty, 0 otherwise;
• CKD ($s_3$): 1 if the patient had chronic kidney disease, 0 otherwise;
• STres ($s_4$): 1 if the treatment was not effective, 0 otherwise;
• HOSPITAL ($j$): hospital of admission of the patient;
• MILAN ($\phi$): 1 if the hospital is in Milano, 0 otherwise.

3 THE MULTIPLE RESPONSE MODEL

We consider a hierarchical generalized linear mixed model in which Poisson-Dirichlet process priors (Pitman and Yor (1997)), which generalize the well-known Dirichlet process, model the random-effects distribution of the grouping factor, namely the hospital of admission. For each patient $i = 1, \dots, n$, treated in hospital $j[i]$, let $Y_i := (Y_{i1}, Y_{i2}, Y_{i3}) = (DB_i, ALIVEIN_i, ALIVE60_i)$ be the multiple response. We assume that observations, given parameters and covariates, are independent and that the law of the response factorizes into three parts:

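\[
\mathcal{L}(Y_i \mid par, cov) = \mathcal{L}(Y_{i1} \mid par_1, cov_{i1}) \, \mathcal{L}(Y_{i2} \mid Y_{i1}, par_2, cov_{i2}) \, \mathcal{L}(Y_{i3} \mid Y_{i2}, par_3, cov_{i3}).
\]

In particular, we assume:

\[
Y_{i1} \mid \mu_i, \sigma \sim N(\mu_i, \sigma^2), \qquad \mu_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + t_{\phi_{j[i]}\, j[i]} \tag{1}
\]

\[
Y_{i2} \mid p_i, Y_{i1} \sim Be(p_i), \qquad \mathrm{logit}(p_i) = \alpha_1 z_{i1} + \alpha_2 z_{i2} + \alpha_3 \exp(Y_{i1}) + \alpha_4 z_{i3} + b_{\phi_{j[i]}\, j[i]} \tag{2}
\]

\[
Y_{i3} \mid r_i, Y_{i2} = 1 \sim Be(r_i), \qquad \mathrm{logit}(r_i) = \gamma_0 + \gamma_1 s_{i1} + \gamma_2 s_{i2} + \gamma_3 s_{i3} + \gamma_4 s_{i4}. \tag{3}
\]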
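As an illustration of how the factorization in (1)-(3) turns into a computation, the following Python sketch evaluates the log-likelihood contribution of a single patient. It is not the authors' JAGS code: the function and argument names are hypothetical, and it assumes that a patient with $Y_{i2} = 0$ (death in hospital) contributes no third factor, since (3) is only stated conditionally on $Y_{i2} = 1$.

```python
import math


def logistic(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def patient_loglik(y1, y2, y3, x, z, s, beta, alpha, gamma, sigma, t, b):
    """Log-likelihood of one patient under the factorization (1)-(3).

    y1: continuous DB response; y2, y3: 0/1 survival indicators.
    x = (x1, x2, x3), z = (z1, z2, z3), s = (s1, s2, s3, s4): covariates.
    t, b: random intercepts of the patient's hospital (hypothetical inputs).
    """
    # (1) Gaussian factor for the DB response
    mu = sum(bk * xk for bk, xk in zip(beta, x)) + t
    ll = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    ll -= (y1 - mu) ** 2 / (2.0 * sigma ** 2)

    # (2) Bernoulli factor for in-hospital survival, given Y1.
    # Term order follows (2): alpha3 multiplies exp(Y1), alpha4 multiplies z3.
    eta_p = (alpha[0] * z[0] + alpha[1] * z[1]
             + alpha[2] * math.exp(y1) + alpha[3] * z[2] + b)
    p = logistic(eta_p)
    ll += math.log(p) if y2 == 1 else math.log(1.0 - p)

    # (3) Bernoulli factor for 60-day survival, only when Y2 = 1 (assumption)
    if y2 == 1:
        eta_r = gamma[0] + sum(gk * sk for gk, sk in zip(gamma[1:], s))
        r = logistic(eta_r)
        ll += math.log(r) if y3 == 1 else math.log(1.0 - r)
    return ll
```

Summing `patient_loglik` over patients gives the full log-likelihood under the conditional independence assumption; the sketch makes no attempt at numerical safeguards for extreme linear predictors.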

Note that the parameters $t$ and $b$ in the expressions of $\mu_i$ and $\mathrm{logit}(p_i)$ in (1) and (2) represent random intercepts depending on $\phi_{j[i]}$, the value of the covariate MILAN for the hospital of patient $i$. In particular, for patients in hospitals in Milano those parameters become $t_{1 j[i]}$ and $b_{1 j[i]}$ respectively, and $t_{0 j[i]}$ and $b_{0 j[i]}$ in hospitals outside the city. The peculiarity of our model lies precisely in assuming different marginals for the random effects in and outside Milano, while still modelling dependence between hospitals through a hierarchical Bayesian model. Consequently, the parameter is $\theta = (\beta, \alpha, \gamma, \sigma, (t_{0j}, t_{1j}, b_{0j}, b_{1j}, j = 1, \dots, 33))$. We assumed a priori independence of all components of $\theta$ and

\[
\beta \sim N_3(0, 100 I_3), \quad \alpha \sim N_4(0, 100 I_4), \quad \gamma \sim N_5(0, 100 I_5), \quad \sigma^2 \sim \text{inv-gamma}(0.1, 0.1),
\]

\[
(t_{0j}, t_{1j}, b_{0j}, b_{1j}) \mid P \stackrel{iid}{\sim} P, \qquad P \sim PD(a, g, P_0). \tag{4}
\]

By $P \sim PD(a, g, P_0)$ we mean that $P$ is a Poisson-Dirichlet process with parameters $0 \le a < 1$ and $g > 0$, while $P_0$ is a probability measure on $\mathbb{R}^4$. When $a = 0$, the Dirichlet process case is recovered. For ease of computation it is useful to introduce the stick-breaking representation for $P$ (see Ishwaran and James (2001)):

\[
P = \sum_{i=1}^{\infty} V_i \, \delta_{\theta_i},
\]

where $\{V_i\} \perp \{\theta_i\}$, $\theta_i \stackrel{iid}{\sim} P_0$, and $\{V_i\}$ are stick-breaking weights, i.e.

\[
V_1 = Z_1, \qquad V_j = Z_j \prod_{i=1}^{j-1} (1 - Z_i), \quad j \ge 2, \qquad Z_i \stackrel{ind}{\sim} Beta(1 - a, g + ia), \quad i = 1, 2, \dots
\]

parameter    2.5%     97.5%        parameter    2.5%     97.5%
ACCESS      -0.004    0.15         KILLIP      -4.62    -2.42
ECG          0.09     0.16         intercept    4.38     6.57
WEEKEND      0.003    0.15         EF           0.92     1.72
AGE         -1.33    -0.26         COMP        -2.78    -0.75
RISK        -1.97     0.58         CKD         -2.54    -0.51
exp(Y1)     -0.01     0.01         STres       -1.32     0.40

Table 1. Posterior CIs for the fixed-effects parameters.
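The stick-breaking weights above can be simulated directly once the infinite sum is truncated. The following Python sketch draws the first few weights for fixed values of $a$ and $g$; the truncation level, the chosen values of `a` and `g`, and the function name are illustrative assumptions rather than part of the original analysis.

```python
import random


def pd_stick_breaking_weights(a, g, n_atoms, rng):
    """Draw the first n_atoms stick-breaking weights V_1, ..., V_n.

    Uses Z_i ~ Beta(1 - a, g + i*a) independently and
    V_j = Z_j * prod_{i<j} (1 - Z_i). Also returns the stick mass
    left over after truncation.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for i in range(1, n_atoms + 1):
        z = rng.betavariate(1.0 - a, g + i * a)
        weights.append(z * remaining)
        remaining *= 1.0 - z
    return weights, remaining


# Illustrative values only; in the model a and g are themselves random.
rng = random.Random(0)
weights, leftover = pd_stick_breaking_weights(a=0.3, g=3.0, n_atoms=50, rng=rng)
```

The returned `leftover` is the total mass of the atoms dropped by the truncation, which gives a rough check on whether `n_atoms` is large enough for a given pair $(a, g)$.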

The base probability measure $P_0$ on $\mathbb{R}^4$ is chosen as the product measure of four independent Gaussian distributions with random means and variances:

\[
P_0 = N(m_1, \lambda_1^2) \times N(m_2, \lambda_2^2) \times N(m_3, \lambda_3^2) \times N(m_4, \lambda_4^2),
\]

\[
m_i \stackrel{iid}{\sim} N(4.5, 100), \qquad \lambda_i \stackrel{iid}{\sim} U(0, 7), \qquad i = 1, \dots, 4.
\]

The hyperparameters $a$ and $g$ are also given prior distributions, namely $a \sim U(0, 0.9)$ and $g \sim \text{gamma}(6, 2)$. For details on the prior choice see Prandoni (2013). This model extends the one proposed in Guglielmi et al. (2013) for the in-hospital survival in a similar dataset.
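To give a feeling for the random partition of the 33 hospital labels that this prior induces, the following Python sketch simulates one partition with the standard Pitman-Yor urn scheme: with $n$ items already allocated to $k$ clusters, the next item joins an existing cluster of size $n_c$ with probability $(n_c - a)/(g + n)$ and opens a new one with probability $(g + ak)/(g + n)$. This is an alternative to the stick-breaking route, not the sampler used in the paper, and it assumes that gamma(6, 2) is parameterized by shape and rate, which the text does not state.

```python
import random


def sample_py_partition(n_items, a, g, rng):
    """Draw cluster labels for n_items from the Pitman-Yor urn scheme."""
    labels = [0]  # the first item always opens cluster 0
    sizes = [1]   # current cluster sizes
    for _ in range(1, n_items):
        k = len(sizes)
        # Unnormalized probabilities: existing clusters, then a new cluster.
        probs = [size - a for size in sizes] + [g + a * k]
        c = rng.choices(range(k + 1), weights=probs)[0]
        if c == k:
            sizes.append(1)
        else:
            sizes[c] += 1
        labels.append(c)
    return labels


rng = random.Random(1)
a = rng.uniform(0.0, 0.9)          # a ~ U(0, 0.9)
g = rng.gammavariate(6.0, 0.5)     # shape 6, scale 1/2 (rate 2 is assumed)
labels = sample_py_partition(33, a, g, rng)
n_clusters = len(set(labels))
```

Repeating the draw many times gives the prior distribution of the number of hospital clusters implied by the hyperpriors on $a$ and $g$.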

4 CASE STUDY

Posterior estimates have been obtained through a Gibbs sampler algorithm, implemented in JAGS with the aid of R. Table 1 reports the posterior credible intervals of the fixed-effects parameters: it is clear that patients who were not delivered by the 118 service and/or arrive at weekends (or nights, etc.) are penalized in terms of DB time. Furthermore, an increase of ECG time yields an increase of DB. For the in-hospital survival probability, elderly patients are penalized, as are people who have a severe infarction and/or some risk factors. The values of the parameter corresponding to exp(Y1) (which represents DB) are concentrated around 0, so it is uncertain whether it should be considered an important variable for the short-period survival probability. However, all cardiologists agree on its clinical relevance. For the 60-day survival probability, the presence of complications after the angioplasty, as well as CKD, are negative factors, while a high value of EF (which indicates good heart function) increases this probability. We also carried out a goodness-of-fit analysis, using posterior predictive distributions. However, the classification of patients was quite poor for two reasons: the high empirical rate of survival and the choice of covariates we made, aimed at detecting models accounting for variability in the hospitals, instead of focusing on prediction.

The nonparametric prior component, thanks to the discreteness of its trajectories, induces a random partition of the hospitals’ labels: it allows us to provide a clustering of the hospitals according to the similarity of their effect on patients’ outcomes, in particular on the DB time and on in-hospital survival. The Bayesian cluster estimate was computed here as the partition of the hospital labels {1, 2, . . . , 33} minimizing the posterior expectation of Binder’s loss function, as proposed in Lau and Green (2007); this function assigns cost w when two elements are wrongly clustered together and cost u when two elements are erroneously assigned to different clusters. For equal misclassification costs w and u, we obtained six clusters.

Figure 1. Posterior 95% CIs of the random intercepts t (left) and b (right): hospitals located in Milano are depicted in dashed lines, those outside Milano in solid lines.

In Figure 1 we provide posterior 95% CIs of the hospital random intercepts $t_{\phi j}$ (left) and $b_{\phi j}$ (right); hospitals in the same cluster are represented with the same color. The figure suggests that there are no differences between hospitals in and outside Milano. Hospitals show a higher variability with respect to DB than with respect to in-hospital survival probability. We conjecture that this is due to the model specification: while the mean DB is a function of covariates depending on hospital efficiency, the in-hospital survival probability depends on covariates indicating the health status of patients. To conclude, we underline that this Bayesian semiparametric model allows providers’ profiling “automatically” (i.e., without fixing the number of clusters in advance), through the posterior distribution of the hospital-effects random partition provided by the non-parametric setting specified by (4).
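The Binder-loss cluster estimate described in this section can be sketched in Python as follows. With $\pi_{ij}$ the posterior probability that items $i$ and $j$ share a cluster, the posterior expected loss of a candidate partition is the sum over pairs of $w (1 - \pi_{ij})$ for pairs it puts together and $u \, \pi_{ij}$ for pairs it separates. The sketch restricts the search to the partitions visited by the sampler, which is a common shortcut and not necessarily the optimization procedure of Lau and Green (2007); the toy draws and all function names are hypothetical.

```python
from itertools import combinations


def coclustering_probs(draws):
    """Estimate pi[(i, j)], the probability that i and j share a cluster."""
    n = len(draws[0])
    return {
        (i, j): sum(d[i] == d[j] for d in draws) / len(draws)
        for i, j in combinations(range(n), 2)
    }


def expected_binder_loss(partition, pi, w=1.0, u=1.0):
    """Posterior expected Binder loss of a candidate partition."""
    loss = 0.0
    for (i, j), p in pi.items():
        if partition[i] == partition[j]:
            loss += w * (1.0 - p)  # pair joined, but separate with prob 1 - p
        else:
            loss += u * p          # pair split, but together with prob p
    return loss


def binder_estimate(draws, w=1.0, u=1.0):
    """Pick the sampled partition with the smallest expected Binder loss."""
    pi = coclustering_probs(draws)
    return min(draws, key=lambda d: expected_binder_loss(d, pi, w, u))


# Toy MCMC output: four sampled partitions of five items (made-up labels).
draws = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2],
    [0, 1, 2, 2, 3],
]
best = binder_estimate(draws)
```

In an application, `draws` would hold the cluster labels of the 33 hospitals saved at each MCMC iteration, and `w` and `u` would be set equal to reproduce the equal-cost case discussed above.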

REFERENCES

GUGLIELMI, A., IEVA, F., PAGANONI, A.M., RUGGERI, F., SORIANO, J. (2013): Semiparametric Bayesian models for clustering and classification in presence of unbalanced in-hospital survival. Journal of the Royal Statistical Society, C, to appear.

ISHWARAN, H., JAMES, L. (2001): Gibbs Sampling Methods for Stick-Breaking Priors. Journal of the American Statistical Association, 96, 161–173.

LAU, J.W., GREEN, P.J. (2007): Bayesian Model-Based Clustering Procedures. Journal of Computational and Graphical Statistics, 16, 526–558.

PITMAN, J., YOR, M. (1997): The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. The Annals of Probability, 25, 855–900.

PRANDONI, E. (2013): Modelli bayesiani semiparametrici multivariati per le probabilità di sopravvivenza in seguito ad infarto miocardico acuto. Master Thesis. Dipartimento di Matematica, Politecnico di Milano.
