Journal of Statistical Software


MMMMMM YYYY, Volume VV, Issue II.

http://www.jstatsoft.org/

K-Adaptive Partitioning for Survival Data with an Application to SEER: The kaps Add-on Package for R

Soo-Heang Eo (Korea University)

Seung-Mo Hong (University of Ulsan)

HyungJun Cho (Korea University)

Abstract

In medical research, partitioning an ordered prognostic factor is important for obtaining several groups with heterogeneous survival. For this purpose, a binary split has often been used, either once or recursively. We propose the use of a multi-way split to obtain an optimal set of cut-off points. In practice, the number of groups (K) may not be specified in advance, so we also suggest finding an optimal K by cross-validation. The algorithm is implemented in an R package called kaps, which is freely available and convenient to use. We illustrate it with a toy dataset and apply it to a real dataset of colorectal cancer cases from the Surveillance, Epidemiology, and End Results (SEER) program.

Keywords: adaptive partitioning, multi-way split, staging, SEER.

1. Introduction

Clinicians are interested in obtaining several groups with heterogeneous survival by partitioning an ordered prognostic factor. A staging system can be constructed by this kind of partitioning. The tumor node metastasis (TNM) staging system is the most widely used cancer staging system, and it provides critical information about prognosis and about the expected responsiveness of cancer patients to specific treatments (Edge, Byrd, Compton, Fritz, Greene, and Trotti 2010). The TNM staging system is composed of three classifications: the T classification, based on the extent or size of the primary tumor; the N classification, determined by the involvement of the regional lymph nodes (LNs); and the M classification, determined by distant metastasis. Each T, N, or M classification is decided by grouping cases with similar prognosis. When T classification, based solely on the size of the primary tumor such as in breast cancer, or N classification, in several gastrointestinal tract cancers, was determined, increased tumor size or


R> str(toy)
'data.frame':   150 obs. of  3 variables:
 $ meta  : int  1 4 0 9 0 1 0 5 0 0 ...
 $ status: num  0 1 1 1 1 0 0 0 1 0 ...
 $ time  : int  0 26 22 15 70 96 97 10 32 127 ...

Selecting a set of cut-off points for given K

Suppose we specify the number of subgroups in advance, for instance K = 3. To select an optimal set of two cut-off points when K = 3, the function kaps is called via the following statements:


R> fit1 <- kaps(Surv(time, status) ~ meta, data = toy, K = 3)
R> fit1
Call:
kaps(formula = Surv(time, status) ~ meta, data = toy, K = 3)

        K-Adaptive Partitioning for Survival Data

Samples= 150

Selecting a set of cut-off points:
       X df Pr(>|X|)    Xk df Pr(>|Xk|) cut-off points
K=3 4.66  1   0.0309 19.48  2     1e-04          1, 10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

P-values of pairwise comparisons
0
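As a rough illustration of how a pair of cut-off points such as 1 and 10 can be read, the following sketch splits an ordered factor into three groups and computes log-rank statistics with the survival package. It is not the kaps implementation. The data are simulated stand-ins rather than the toy dataset, the helper pair_stat is hypothetical, and the interval convention (meta <= 1, 1 < meta <= 10, meta > 10) is an assumption. The overall test has 2 degrees of freedom and each test between adjacent groups has 1, which matches the df columns in the output above.

```r
library(survival)  # recommended package distributed with R

set.seed(1)

# Simulated stand-in data (NOT the toy dataset shipped with kaps)
n <- 150
meta <- rnbinom(n, size = 1, mu = 5)                      # ordered prognostic factor
time <- ceiling(rexp(n, rate = 0.02 * (1 + 0.2 * meta)))  # higher meta, shorter survival
status <- rbinom(n, 1, 0.7)                               # 1 = event, 0 = censored
sim <- data.frame(meta = meta, status = status, time = time)

# Assumed convention: meta <= 1, 1 < meta <= 10, meta > 10
cuts <- c(1, 10)
sim$group <- cut(sim$meta, breaks = c(-Inf, cuts, Inf), right = TRUE)

# Overall log-rank test across the K = 3 groups (df = K - 1 = 2)
overall <- survdiff(Surv(time, status) ~ group, data = sim)

# Hypothetical helper: log-rank statistic for one pair of groups (df = 1)
pair_stat <- function(g1, g2) {
  sub <- droplevels(sim[sim$group %in% c(g1, g2), ])
  survdiff(Surv(time, status) ~ group, data = sub)$chisq
}

lev <- levels(sim$group)
pairwise <- c(pair_stat(lev[1], lev[2]), pair_stat(lev[2], lev[3]))

# Chi-squared p-values for the overall and the adjacent-pair statistics
p_overall <- pchisq(overall$chisq, df = length(cuts), lower.tail = FALSE)
p_pairwise <- pchisq(pairwise, df = 1, lower.tail = FALSE)

print(table(sim$group))
print(c(overall = overall$chisq, min_pairwise = min(pairwise)))
```

Repeating this calculation over every candidate pair of cut-off points and comparing the resulting statistics is one simple way to explore a multi-way split by hand; the criterion that kaps itself optimizes should be taken from the package documentation.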
