
Biometrics Section

Comparing Approaches for Predicting Prostate Cancer from Longitudinal Data

Christopher H. Morrell (1,2), Larry J. Brant (2), and Shan L. Sheng (2)

(1) Mathematical Sciences Department, Loyola College in Maryland, 4501 North Charles St., Baltimore, MD 21210-2699 USA, [email protected], (410)617-2629, Fax (410)617-2803
(2) National Institute on Aging, 5600 Nathan Shock Drive, Baltimore, MD 21224 USA

Abstract

Classification approaches are compared using longitudinal data to predict the onset of disease. The data are modeled using linear mixed-effects models. Posterior probabilities of group membership are computed starting with the first observation and adding observations until the subject is classified as developing the disease or until the last measurement is used. From the longitudinal analysis we first use the marginal distributions of the mixed-effects models. Next, conditional on group-specific random effects, the conditional distribution is used to compute the posterior probabilities. The third approach uses the distributions of the random effects. Finally, the subjects’ data are summarized by the most recent value and rate of change, which are used in a logistic regression model to obtain formulae that can be applied at each visit to obtain probabilities of group membership.

KEY WORDS: Classification, Linear Mixed-Effects Models, Sensitivity and Specificity

1. Introduction

Prostate cancer is one of the common causes of cancer deaths in men, and after lung and stomach cancer, it accounts for the largest number of new non-skin cancer cases reported worldwide per year (Pisani et al., 1999). In the United States, for example, prostate cancer is the most common clinically diagnosed non-skin cancer with about 1 in 10 American men eventually getting a positive diagnosis.
Since the chance of a diagnosis of prostate cancer increases with age, the present shift in the age distribution toward larger numbers of older men is expected to result in an even larger increase in the number of men diagnosed with prostate cancer (Carter and Coffey, 1990).

Prostate specific antigen (PSA) is a glycoprotein that is produced by prostatic epithelium and can be measured in serum samples by immunoassay. Since PSA correlates with the cancer volume of the prostate, it has been found to be useful in the management of men with prostate cancer. As PSA increases, the extent of cancer and its chance of detection increase (Carter and Pearson, 1994). While PSA has been found to be a useful tumor marker for the diagnosis of men with prostate cancer, in some individual cases changes in PSA may not be predictive of cancer prognosis (Cadeddu et al., 1993). Also, studies have found that approximately 1 in 4 prostate cancer patients do not attain an elevated PSA level (Catalona et al., 1991; Carter et al., 1992).

Brant et al. (2003) applied mixed-effects models to longitudinal PSA measurements (while the subject was cancer free) to obtain posterior probabilities of prostate cancer. These posterior probabilities are used to predict the future development of prostate cancer. Their approach first models the longitudinal PSA data using a mixed-effects model taking into account group membership so that each group will have its own mean trajectory. Then, sequentially adding one observation at a time, an individual’s PSA data are examined, and at each time, the marginal density of the individual’s data is computed for each group and Bayes’ Rule is applied to obtain the posterior probability of group membership. These posterior probabilities are then used to classify a subject as going on to develop cancer or not.

In this study we compare the marginal approach taken in Brant et al. (2003) to a number of other ways of using the longitudinal data to obtain predictions of prostate cancer. That is, using longitudinal data on PSA before the onset of prostate cancer, we seek to predict the future development of prostate cancer based on a number of computational strategies.

Other approaches to the prediction of prostate cancer could include applying mixtures of distributions or jointly modeling the multivariate longitudinal data and time to diagnosis. Verbeke and Lesaffre (1996) use a mixture of distributions in the random effects to account for heterogeneity in the data that can be related to some disease process. Posterior probabilities are obtained for membership in each group and these probabilities could be used to predict which outcomes belong to each group.
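The sequential marginal classification attributed above to Brant et al. (2003) can be illustrated with a minimal sketch. It assumes a random intercept-and-slope model, under which the first k observations of a subject in group g are marginally normal with mean X beta_g and covariance Z D_g Z' + sigma_g^2 I, and it applies Bayes' Rule after each added visit. The two groups, the parameter values, the prior probabilities, the 0.5 cutoff, and all function names are hypothetical choices made for illustration; they are not estimates or code from the paper.

```python
import numpy as np

def log_mvn_density(y, mean, cov):
    """Log density of a multivariate normal distribution evaluated at y."""
    resid = y - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + quad)

def marginal_log_density(times, y, beta, D, sigma2):
    """Marginal log density of y under a random intercept-and-slope model."""
    X = np.column_stack([np.ones_like(times), times])  # fixed effects: intercept, time
    Z = X                                              # random effects share the design
    V = Z @ D @ Z.T + sigma2 * np.eye(len(times))      # marginal covariance
    return log_mvn_density(y, X @ beta, V)

def sequential_posteriors(times, y, groups, priors):
    """Posterior group probabilities using the first k observations, k = 1..n."""
    results = []
    for k in range(1, len(y) + 1):
        logs = np.array([
            np.log(priors[g]) + marginal_log_density(times[:k], y[:k], **groups[g])
            for g in groups
        ])
        weights = np.exp(logs - logs.max())  # subtract the max for numerical stability
        results.append(dict(zip(groups, weights / weights.sum())))
    return results

def first_classified_visit(posteriors, group, cutoff):
    """First visit (1-based) at which the posterior for `group` exceeds `cutoff`."""
    for k, post in enumerate(posteriors, start=1):
        if post[group] > cutoff:
            return k
    return None

# Hypothetical parameters (NOT from the paper): same covariance, different slopes.
D = np.diag([0.1, 0.001])
groups = {
    "control": dict(beta=np.array([0.0, 0.0]), D=D, sigma2=0.05),
    "cancer": dict(beta=np.array([0.0, 0.2]), D=D, sigma2=0.05),
}
priors = {"control": 0.7, "cancer": 0.3}  # assumed prior probabilities

times = np.array([0.0, 2.0, 4.0, 6.0, 8.0])  # years since first visit (made up)
y = 0.2 * times                              # marker values on an assumed transformed scale

posteriors = sequential_posteriors(times, y, groups, priors)
visit = first_classified_visit(posteriors, group="cancer", cutoff=0.5)
```

In this toy setup both groups share the same mean at the first visit, so the first posterior simply equals the prior, and the evidence for the steeper trajectory accumulates as visits are added. The conditional and random-effects approaches named in the abstract would replace `marginal_log_density` with a different density while leaving the sequential loop unchanged.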
Recently, numerous papers have considered both longitudinal changes in a variable and the associated effect on the length of time to the occurrence of an event (see, for example, Altman and De Stavola, 1994; Bycott and Taylor, 1998; De Stavola and Christensen, 1996; De Gruttola and Tu, 1994; Hogan and Laird, 1997; Tsiatis et al., 1995; Wulfsohn and Tsiatis, 1997; Henderson et al., 2000; Wang and Taylor, 2001; and Guo and Carlin, 2004). Lin et al. (2002) considered the joint modeling of multiple longitudinal variables along with the time-to-event data. Law et al. (2002) applied a cure model for treated prostate cancer subjects that uses a mixture to model two groups of subjects and incorporates a joint model of the longitudinal PSA profiles and time to recurrence of disease.


Table 1. Descriptive statistics describing the BLSA sample.

Mean (Minimum, Maximum)   Control (C)   Low Risk Cancer (L)   High Risk Cancer (H)   p-value   Comparisons
Number                    136           57                    12
Visits                    6.6 (4, 13)   7.1 (4, 14)           9.6 (5, 15)            0.0001    C
