www.nature.com/scientificreports
OPEN
Received: 18 September 2014; Accepted: 26 March 2015; Published: 05 May 2015
Identification and validation of a two-gene expression index for subtype classification and prognosis in Diffuse Large B-Cell Lymphoma
Qinghua Xu¹,²,³,⁴,⁵, Cong Tan¹,²,³, Shujuan Ni¹,²,³, Qifeng Wang¹,²,³, Fei Wu⁴,⁵, Fang Liu⁴, Xun Ye⁴,⁵, Xia Meng⁴,⁵, Weiqi Sheng¹,²,³ & Xiang Du¹,²,³
The division of diffuse large B-cell lymphoma (DLBCL) into germinal center B-cell-like (GCB) and activated B-cell-like (ABC) subtypes based on gene expression profiling has proved to be a landmark in understanding the pathogenesis of the disease. This study aims to identify a novel biomarker to facilitate the translation of research into clinical practice. Using a training set of 350 patients, we identified a two-gene expression signature, the “LIMD1-MYBL1 Index”, which is significantly associated with cell-of-origin subtypes and clinical outcome. This two-gene index was further validated in two additional datasets. Tested against the gold standard method, the LIMD1-MYBL1 Index achieved 81% sensitivity and 89% specificity for the ABC group, and 81% sensitivity and 87% specificity for the GCB group. The ABC group had significantly worse overall survival than the GCB group (hazard ratio = 3.5, P = 0.01). Furthermore, the performance of the LIMD1-MYBL1 Index was satisfactory compared with common immunohistochemical algorithms. Thus, the LIMD1-MYBL1 Index has considerable clinical value for DLBCL subtype classification and prognosis. Our results might prompt the further development of this two-gene index into a simple assay amenable to routine clinical practice.
Diffuse large B-cell lymphoma (DLBCL) is the most common lymphoma worldwide, accounting for approximately 30 to 40% of non-Hodgkin’s lymphoma cases. DLBCL is highly heterogeneous from both morphological and clinical standpoints. The standard therapy for patients with DLBCL is rituximab combined with cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP), and this regimen results in a long-term disease-free survival rate of approximately 50%¹. The International Prognostic Index (IPI) is the current standard approach to estimating the prognoses of DLBCL patients. The IPI stratifies DLBCL patients into four risk groups (low, low-intermediate, high-intermediate, and high). However, within each of these IPI risk groups, outcomes differ considerably, suggesting underlying biological heterogeneity that is not accounted for by the traditional clinical parameters. Through gene expression profiling, Alizadeh et al. identified two major cell-of-origin (COO) phenotypes with distinct prognoses: the favorable germinal center B-cell-like (GCB) and the unfavorable activated B-cell-like (ABC) subtypes². The distinct biological and clinical features of these subtypes have been
¹Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China. ²Department of Pathology, Fudan University Shanghai Cancer Center, Shanghai, China. ³Institute of Pathology, Fudan University, Shanghai, China. ⁴Fudan University Shanghai Cancer Center – Institut Mérieux Laboratory, Shanghai, China. ⁵bioMérieux (Shanghai) Company Limited, Shanghai, China. Correspondence and requests for materials should be addressed to X.D. (email: [email protected])
Scientific Reports | 5:10006 | DOI: 10.1038/srep10006
Study cohort          DLBCL-1               DLBCL-2           DLBCL-3
No. of patients       414                   88                68
Specimen type         Frozen                Frozen            Frozen, FFPE
Therapy               Mixture               Mixture           R-CHOP
End point             COO, OS               COO, OS           COO, OS
Median age (range)    63 (14-92)            61 (15-86)        61 (16-86)
COO, n (%)
  ABC-like            167 (40)              32 (36)           28 (41)
  GCB-like            183 (44)              37 (42)           30 (44)
  Unclassified        64 (16)               19 (22)           10 (15)
Platform              HG-U133 Plus 2        HG-U133 Plus 2    HG-U133 Plus 2
LIMD1 probe set       222762_x_at           222762_x_at       222762_x_at
MYBL1 probe set       213906_at             213906_at         213906_at
Reference             Lenz et al., 2008⁸                      Scott et al., 2014¹²
Table 1. Summary of DLBCL datasets. Abbreviations: FFPE, formalin-fixed, paraffin-embedded; COO, cell-of-origin classification; OS, overall survival.
independently validated³, and therefore these two groupings are recognized as DLBCL subtypes in the current World Health Organization classification⁴. With the rapid evolution of microarray technology over the last decade, multiple follow-up studies in this field have used standardized genome-wide microarrays⁵⁻¹², generating large volumes of gene expression data. Given the vast amount of publicly available microarray data, integrative analysis, in which data from multiple studies are combined to increase the sample size and avoid laboratory-specific bias, has the potential to yield biological insights that are not possible from a single study, as already demonstrated for prostate and other cancers¹³. Here, we describe an integrative analysis leading to the identification and validation of a novel biomarker for both subtype classification and survival prediction in DLBCL.
Results
The LIMD1-MYBL1 Index was associated with the COO subtypes in DLBCL. In this study, we included three gene expression datasets for biomarker discovery and validation. The DLBCL-1 dataset was used as a training set to identify gene expression signatures, and the DLBCL-2 and DLBCL-3 datasets were used as independent test sets for validation. Details of study designs and sample characteristics are provided in Table 1. The DLBCL-1 cohort included 167 ABC and 183 GCB DLBCL patients according to the gold standard method described by Wright et al.⁵. We performed a two-class unpaired t-test to select genes that were differentially expressed between the ABC and GCB subgroups, and then ranked the genes in descending order of statistical significance. The top two probesets were particularly interesting. One probeset, “213906_at”, which targets the gene MYBL1 (v-myb myeloblastosis viral oncogene homolog (avian)-like 1), exhibited a 10-fold higher expression level in the GCB group than in the ABC group (P = 1.5E-64; Fig. 1a). Sensitivity versus 1-specificity was plotted to construct a Receiver Operating Characteristic (ROC) curve, and good discrimination between the two groups was observed, with an Area Under Curve (AUC) of 0.93. In sharp contrast, the probeset “222762_x_at”, targeting the gene LIMD1 (LIM domains containing 1), was significantly over-expressed in the ABC group compared with the GCB group (P = 5.7E-58; Fig. 1b). Its discriminatory power, measured by the AUC, was 0.94. Since LIMD1 and MYBL1 exhibited distinct expression patterns in ABC- and GCB-DLBCLs, respectively, we integrated these two genes into a Bayesian classifier similar to the gold standard method⁵ and defined it as the “LIMD1-MYBL1 Index”. For each patient, a probability score was estimated. A sample is classified as the ABC or GCB subtype if the probability that it belongs to that subgroup is greater than 80%; otherwise it is considered unclassified.
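The classification rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes the index follows the linear-predictor-score style of the Wright et al. classifier (t-statistic weights, a Gaussian fitted to the scores of each subtype, equal priors), it uses the Welch form of the unpaired t statistic, and the function names and all expression values are invented for the example.

```python
from statistics import NormalDist, mean, stdev

def welch_t(a, b):
    """Unpaired two-sample t statistic (Welch form, an assumption here)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

def fit_index(abc, gcb):
    """Fit a toy two-gene index from (LIMD1, MYBL1) log-expression pairs."""
    # One weight per gene: positive if higher in ABC, negative if higher in GCB.
    weights = [welch_t([s[g] for s in abc], [s[g] for s in gcb]) for g in (0, 1)]

    def score(sample):
        return sum(w * x for w, x in zip(weights, sample))

    # Model the score distribution of each subtype as a Gaussian.
    abc_dist = NormalDist.from_samples([score(s) for s in abc])
    gcb_dist = NormalDist.from_samples([score(s) for s in gcb])
    return score, abc_dist, gcb_dist

def classify(sample, model, cutoff=0.80):
    """Return (label, P(ABC)) using Bayes' rule with equal priors."""
    score, abc_dist, gcb_dist = model
    lps = score(sample)
    like_abc, like_gcb = abc_dist.pdf(lps), gcb_dist.pdf(lps)
    # Note: for extreme scores both densities can underflow to zero;
    # a log-space version would be more robust.
    p_abc = like_abc / (like_abc + like_gcb)
    if p_abc > cutoff:
        return "ABC", p_abc
    if 1.0 - p_abc > cutoff:
        return "GCB", p_abc
    return "Unclassified", p_abc

# Hypothetical training values (LIMD1, MYBL1), invented for illustration only.
TOY_ABC = [(10.0, 6.0), (10.5, 6.4), (9.5, 5.8), (11.0, 7.0), (9.8, 6.6), (10.2, 5.2)]
TOY_GCB = [(b, a) for a, b in TOY_ABC]  # mirror image: low LIMD1, high MYBL1

model = fit_index(TOY_ABC, TOY_GCB)
for sample in [(10.4, 6.1), (6.1, 10.4), (8.0, 8.0)]:
    print(sample, classify(sample, model))
```

With these mirrored toy data, a sample with high LIMD1 and low MYBL1 is called ABC, the reverse pattern is called GCB, and a sample with equal expression of both genes sits at a probability of about 0.5 and stays unclassified, which is the role of the 80% cutoff.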
Accordingly, the LIMD1-MYBL1 Index correctly classified 137 of 167 ABC cases and 151 of 183 GCB cases, resulting in 82% sensitivity and 86% specificity for the ABC group, and 83% sensitivity and 83% specificity for the GCB group (Fig. 1c). The discriminatory power measured by the AUC improved further to 0.97.
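As a quick check of the figures above, the reported sensitivities follow directly from the case counts, and the AUC can be read as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The sketch below assumes that pairwise definition of the AUC; the helper names and the toy scores are hypothetical. The specificities cannot be recomputed here, because the text does not give the counts of misassigned cases.

```python
def sensitivity(correct, total):
    """Fraction of cases of a given true subtype that the index labels correctly."""
    return correct / total

def auc(pos_scores, neg_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

abc_sens = sensitivity(137, 167)  # counts reported for the training set
gcb_sens = sensitivity(151, 183)
print(f"ABC {abc_sens:.0%}, GCB {gcb_sens:.0%}")  # prints "ABC 82%, GCB 83%"

# Hypothetical probability scores: 8 of the 9 pairs are ordered correctly.
toy_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(round(toy_auc, 3))
```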
The LIMD1-MYBL1 Index was an independent factor for DLBCL prognosis. The most important test of the LIMD1-MYBL1 Index was the ability to predict clinical outcome. Overall survival rates were significantly different between the ABC and GCB subgroups classified by the LIMD1-MYBL1 Index (P