Virchows Arch (2010) 456:533–541 DOI 10.1007/s00428-010-0896-6
ORIGINAL ARTICLE
Microsatellite instability of the colorectal carcinoma can be predicted in the conventional pathologic examination. A prospective multicentric study and the statistical analysis of 615 cases consolidate our previously proposed logistic regression model
Ruth Román · Montse Verdú · Miquel Calvo · August Vidal · Xavier Sanjuan · Mireya Jimeno · Antonio Salas · Josefina Autonell · Isabel Trias · Marta González · Beatriz García · Natalia Rodón · Xavier Puig
Received: 21 October 2009 / Revised: 11 February 2010 / Accepted: 17 February 2010 / Published online: 15 April 2010 © Springer-Verlag 2010
Abstract High microsatellite instability (MSI-H) identifies a subset of colorectal carcinomas associated with a good prognosis and a higher incidence of Lynch syndrome. The aim of this work was to assess interobserver variability and to optimize our previously published MSI-H prediction model based on phenotypic features. The validation series, collected from five different hospitals, included 265 primary colorectal carcinomas from the same number of patients. The eight clinicopathological parameters that make up our original model were evaluated at the corresponding centers. Homogeneity assessment revealed significant differences between hospitals in the estimation of the growth pattern, presence of Crohn-like reaction, percentage of cribriform structures, and Ki-67 positivity. Despite this observation, our model was globally able to predict MSI-H with a negative predictive value of 97.0%. The optimization studies were carried out with 615 cases and resulted in a new prediction model, RERtest8, which includes the presence of tumor-infiltrating lymphocytes at the expense of the percentage of cribriform structures. This refined model achieves a negative predictive value of 97.9%, which is maintained even when the immunohistochemical parameters are left out (RERtest6). The high negative predictive value achieved by our models reduces the number of cases to be tested for MSI to less than 10%. Furthermore, because the parameters included in the model are easy to evaluate, it is a useful tool for routine practice and can reinforce other published models and current clinical protocols for detecting the subset of colorectal cancer patients at risk of hereditary nonpolyposis colorectal cancer and/or with an MSI-H phenotype.
Keywords Microsatellite instability · Prediction model · Colorectal cancer · Pathological parameters · Hereditary nonpolyposis colorectal cancer

R. Román (*) · M. Verdú · M. González · B. García · N. Rodón · X. Puig
BIOPAT. Biopatologia Molecular, SL, Grup Assistència, Avda. Diagonal 660, planta-1, 08034 Barcelona, Spain
e-mail: [email protected]
A. Vidal · X. Sanjuan
Department of Pathology, Hospital Universitari de Bellvitge, Barcelona, Spain
M. Jimeno
Department of Pathology, Hospital del Mar, Barcelona, Spain
X. Puig
Hospital de Barcelona-SCIAS, Grup Assistència, Barcelona, Spain
A. Salas
Department of Pathology, Hospital Mútua de Terrassa, Terrassa, Spain
M. Verdú · A. Vidal · X. Puig
Histopat Laboratoris, Barcelona, Spain
J. Autonell
Department of Pathology, Hospital General de Vic, Vic, Spain
M. Calvo
Statistics Department, Universitat de Barcelona, Barcelona, Spain
I. Trias
Department of Pathology, Hospital Plató, Barcelona, Spain
Introduction
The study of global genomic and epigenomic changes occurring during colorectal carcinogenesis, mainly chromosomal instability (CIN), microsatellite instability (MSI), and the CpG island methylator phenotype, is allowing colorectal tumors to be classified according to their molecular status [1] and correlations to be established between this status and key parameters such as patient outcome and treatment response [2]. Two main alternative carcinogenic mechanisms have been proposed: the CIN and the microsatellite instability (MIN) pathways. The more common CIN pathway, involving approximately 90% of sporadic colorectal tumors and all cases of familial adenomatous polyposis, follows the adenoma-carcinoma sequence through the sequential accumulation of mutations in crucial regulatory genes, with frequent chromosomal gains and losses [3]. In contrast, tumors developing through the MIN pathway show a high degree of MSI (MSI-H) [4], which is due to alterations in the mismatch repair (MMR) genes, mainly MLH1, MSH2, and MSH6 [5]. These alterations can be caused by germline mutations, which give rise to hereditary nonpolyposis colorectal cancer (HNPCC) [6], or by epigenetic silencing through hypermethylation of the MLH1 promoter [7]. Patients with either sporadic or hereditary MIN colorectal cancers have a better prognosis and a poorer response to 5-fluorouracil-based chemotherapy than those with CIN tumors [8, 9]. It is therefore essential to distinguish between these two main types of carcinoma. However, testing every colorectal cancer for MSI or for loss of MMR proteins would be a very expensive and time-consuming way to identify only the approximately 10% of tumors that follow the MIN pathway.
In recent years it has become clear that these tumors share morphological characteristics, such as proximal location, mucinous differentiation, solid growth pattern, presence of intraepithelial lymphocytes, and Crohn-like lymphocytic reaction, that can distinguish them from CIN tumors [10-12]. Several studies in the literature have attempted to identify the MIN status of colorectal tumors. Recently, Jenkins and colleagues [13] used pathology features described in the Bethesda guidelines to predict MSI in patients diagnosed with colorectal carcinoma before age 60, with the primary aim of identifying HNPCC candidates. More recently, Greenson and colleagues [14] published another model that classifies MSI-H tumors with 85% accuracy on the basis of pathologic features. The objective of our work was to validate, in a multicentric study, and to improve a logistic model based on clinicopathological features that we previously designed to predict microsatellite instability [15]. We used a nonselected population of 615 colorectal carcinomas to provide pathologists with an easy-to-use tool for identifying a large subset of carcinomas that are not MSI-H and therefore need no MSI analysis. The major aim of our study was to achieve a high negative predictive value, which would greatly reduce the number of cases to be tested for MSI.
Materials and methods
Validation series
For our validation series, a total of 265 unselected primary colorectal cancer cases were prospectively collected from five different centers in Catalonia, Spain. Cases were evaluated by the corresponding pathologist following standard published criteria, without prior specific training. Immunohistochemical analysis was also carried out at each participating center according to its own routine protocols, in order to preserve interobserver heterogeneity. Histopathologic features were recorded as previously described [15]. Briefly, these features included categorical and numerical variables. The categorical variables recorded were gender, tumor location, tumor configuration, extent of invasion, intramural and extramural thin-walled vessel invasion (TWVI), venous vessel invasion (VVI), and perineural invasion (PNI), growth pattern, peritumoral Crohn-like reaction (considered positive when at least three nodular lymphocytic aggregates were present within a single low-power field, 4× magnification), presence of tumor-infiltrating lymphocytes (TIL), defined as at least four intraepithelial lymphocytes in a high-power field (40× magnification) [16], and presence of residual adenoma. The numerical variables evaluated were age, tumor size, number of affected lymph nodes, percentage of solid [17], mucinous, cribriform, micropapillary, and microglandular patterns, and immunohistochemical expression of Ki-67 and p53. Representative formalin-fixed, paraffin-embedded blocks of paired tumor and normal tissue were then sent to our center for MSI-H prediction and molecular MSI analysis. Simultaneously, a prospective series of 148 cases was collected and assessed identically in our institution; this added interobserver variability and also served to enlarge the global validation series.
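The two lymphoid criteria above reduce to simple count thresholds. The Python sketch below only illustrates that dichotomization; it assumes the pathologist records, for each criterion, the highest count seen in any single field, and the names `FieldCounts`, `crohn_like_present`, and `til_present` are hypothetical rather than part of the original protocol.

```python
from dataclasses import dataclass

# Cut-offs taken from the text: Crohn-like reaction positive with >= 3 nodular
# lymphoid aggregates in one low-power (4x) field; TIL present with >= 4
# intraepithelial lymphocytes in one high-power (40x) field.
CROHN_MIN_AGGREGATES = 3
TIL_MIN_INTRAEPITHELIAL_LYMPHOCYTES = 4


@dataclass
class FieldCounts:
    """Hypothetical record: highest count observed in any single field."""
    aggregates_per_4x_field: int
    intraepithelial_lymphocytes_per_40x_field: int


def crohn_like_present(counts: FieldCounts) -> bool:
    # Dichotomous variable: True = present, False = absent.
    return counts.aggregates_per_4x_field >= CROHN_MIN_AGGREGATES


def til_present(counts: FieldCounts) -> bool:
    # Dichotomous variable: True = present, False = absent.
    return (
        counts.intraepithelial_lymphocytes_per_40x_field
        >= TIL_MIN_INTRAEPITHELIAL_LYMPHOCYTES
    )
```

For example, a tumor with two aggregates per low-power field but nine intraepithelial lymphocytes per high-power field would be scored as Crohn-like reaction absent and TIL present.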
Optimization series
The global series used to improve our prediction model included the 265 external cases of our validation series, another 146 cases from the prospective series evaluated in our institution, and the 204 cases from our first study [15]. A summary of the histopathological variables of the global series is displayed in Table 1.
Microsatellite instability analysis
Genomic DNA was extracted from ten 5-µm-thick sections of paired normal and tumor samples by macrodissection of selected areas followed by proteinase K digestion and phenol/chloroform extraction. DNA (200 ng) was used for each PCR reaction after the assessment of DNA quality. MSI status was evaluated as described previously [15], using a panel of 11 microsatellites: the five microsatellites of the NCI panel (BAT25, BAT26, D5S346, D2S123, and D17S250) [18], amplified in a multiplex PCR reaction; five additional microsatellites originally intended to detect LOH on chromosome 18q (D18S55, D18S58, D18S61, D18S64, and D18S69) [19], also amplified in a multiplex reaction; and a microsatellite at the TP53 locus on 17p (P53CA) [20]. Fluorescent amplicons were analyzed on an automated ABI PRISM® 310 Genetic Analyzer using GeneScan software (PE Applied Biosystems). According to the consensus definitions of the US NCI, tumors were classified as MSI-H when 30% or more of the tested loci were unstable and as non-MSI-H when fewer than 30% were unstable. Tumors exhibiting low microsatellite instability (up to three unstable markers out of 11) were grouped with stable tumors.
Immunohistochemical analysis
Immunohistochemical analysis in our institution was performed with the ABC immunoperoxidase staining method, using the mouse monoclonal antibodies DO-7 and MIB-1 (DakoCytomation, Denmark A/S) to detect p53 and Ki-67 proteins, respectively. Positive and negative controls were included in each experiment.
Immunohistochemical evaluation was conducted double-blind by scoring the estimated percentage of tumor cells showing nuclear staining.
Statistical analysis
Our previous experience with a logistic regression model [15] that predicted MSI very accurately prompted us to apply this modeling approach again to the multicenter data set. Our purpose is to obtain a mathematical expression that estimates the probability of a tumor exhibiting MSI-H according to the following equation:

$$P = \frac{1}{1 + e^{-x}} \qquad (*)$$
where P is the expected probability of MSI-H and $x = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$ is the linear component of (*). Here $\beta_0$ is the independent term (intercept) of the regression equation, $\beta_i$ is the regression coefficient for the i-th explanatory variable, and $x_i$ is the value of the i-th variable for an individual tumor. For dichotomous variables, $x_i$ takes the value 0 or 1. Instead of the classical stepwise regression used in our previous publication [15], here we follow a different strategy to select the variables included in the model and to estimate their coefficients. In the first stage of our current approach, variables are selected automatically using recently developed shrinkage methods for model selection. More precisely, we use the methodology Regularization Paths for Generalized Linear Models via Coordinate Descent described by Friedman et al. [21, 22] and implemented by these authors in the glmnet package for the R statistical environment [23]. Because the glmnet package does not currently provide any stopping criterion to the user, our R script combines the Schwarz criterion [24] with tenfold cross-validation. For each of the ten cross-validation runs, in which 10% of the full data set is successively excluded, glmnet with the Schwarz criterion selects, from our 30 possible explanatory candidates, the subset of variables to be entered in the logistic equation. This yields a table of how often each candidate variable is included in the ten final models, one per validation data set. These results are used in the second stage of our approach, which discards any variables included in less than 70% of the final models. To configure the final form of the equation, this step also takes into account clinicopathological knowledge of the variables.
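The fold construction of the first stage and the 70% frequency filter of the second stage can be sketched in Python as follows. This is an illustration under stated assumptions, not the R script described above: the per-fold variable selection (glmnet with the Schwarz criterion) is assumed to have been run elsewhere and is passed in as one set of variable names per fold, and the helper names, the shuffling seed, and the variable labels in the usage note are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Set


def tenfold_partitions(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle case indices 0..n-1 and deal them into k folds of near-equal size.

    Each fold is the ~10% of cases left out in one of the k selection runs.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stable_variables(selected_per_fold: Sequence[Set[str]], min_percent: int = 70) -> List[str]:
    """Keep variables chosen in at least min_percent % of the per-fold models.

    Integer arithmetic avoids floating-point edge cases exactly at the threshold.
    """
    counts = Counter(v for chosen in selected_per_fold for v in chosen)
    n_models = len(selected_per_fold)
    return sorted(v for v, c in counts.items() if 100 * c >= min_percent * n_models)
```

With ten folds, a variable chosen in 7 of the 10 per-fold models is kept and one chosen in 6 is discarded; as stated above, the final equation may still be adjusted on clinicopathological grounds.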
Table 1 Clinicopathological variables of the global optimization series

| Categorical variable | Category | MSS, n=563 (%) | MSI-H, n=52 (%) |
|---|---|---|---|
| Gender | Male | 336 (59.7) | 22 (42.3) |
| | Female | 227 (40.3) | 30 (57.7) |
| Location | Proximal | 177 (31.4) | 45 (86.5) |
| | Distal | 386 (68.6) | 7 (13.5) |
| Configuration | Exophytic | 244 (43.3) | 23 (44.2) |
| | Ulcerated | 230 (40.9) | 23 (44.2) |
| | Stenosing | 89 (15.8) | 6 (11.5) |
| Extent of invasion (pT) | pT1 | 37 (6.6) | 1 (1.9) |
| | pT2 | 83 (14.7) | 7 (13.5) |
| | pT3 | 271 (48.1) | 33 (63.5) |
| | pT4 | 172 (30.6) | 11 (21.2) |
| Intramural TWVI | Present | 172 (30.6) | 16 (30.7) |
| | Absent | 391 (69.4) | 36 (69.2) |
| Extramural TWVI | Present | 141 (25.0) | 11 (21.2) |
| | Absent | 422 (75.0) | 41 (78.9) |
| Intramural VVI | Present | 40 (7.1) | 2 (3.8) |
| | Absent | 522 (92.7) | 50 (96.2) |
| Extramural VVI | Present | 113 (20.1) | 8 (15.4) |
| | Absent | 450 (79.9) | 44 (84.6) |
| Intramural PNI | Present | 54 (9.6) | 2 (3.8) |
| | Absent | 509 (90.4) | 50 (96.2) |
| Extramural PNI | Present | 76 (13.5) | 3 (5.8) |
| | Absent | 487 (86.5) | 49 (94.2) |
| Growth pattern | Expansive | 193 (34.3) | 32 (61.5) |
| | Infiltrative | 370 (65.7) | 20 (38.5) |
| Crohn-like lymphoid reaction | Present | 190 (33.7) | 41 (78.8) |
| | Absent | 373 (66.3) | 11 (21.2) |
| TIL | Present | 86 (15.3) | 29 (55.8) |
| | Absent | 477 (84.7) | 23 (44.2) |
| Residual adenoma | Present | 176 (31.3) | 15 (28.8) |
| | Absent | 387 (68.7) | 37 (71.2) |

Table 1 (continued)

| Numerical variable | MSS, mean±SD | MSI-H, mean±SD |
|---|---|---|
| Age | 69.1±11.6 | 69.8±12.9 |
| Tumor size (mm) | 43.2±21.1 | 57.5±17.1 |
| Solid pattern (%) | 5.0±12.9 | 23.7±35.2 |
| Mucinous pattern (%) | 9.2±20.7 | 32.3±32.5 |
| Cribriform pattern (%) | 4.3±10.1 | 12.3±19.7 |
| Micropapillary pattern (%) | 2.0±6.9 | 0.3±1.5 |
| Microglandular pattern (%) | 3.2±9.8 | 0.6±2.2 |
| Nodal involvement (n) | 2.0±3.7 | 2.2±5.7 |
| Ki67 proliferative index (%) | 62.4±21.7 | 72.6±17.0 |
| p53 overexpression (%) | 43.3±38.5 | 17.9±24.1 |

When the variables on the right-hand side of the equation have been definitively established, we estimate the coefficients of the logistic model and validate its predictive capability. In this third stage, we use an extensive resampling approach whose basic element is a random split of the full data set into two subsets. The first subset, the training set, is used to compute the coefficients of the logistic model; these are also obtained with the glmnet package, with the shrinkage parameter set to 0 (that is, without penalization). The predictive capability is then computed on the second subset, the validation set. We repeat this basic procedure of random splitting and prediction on the resulting validation set 1,000 times to avoid potential biases due to any single random selection. We compute the average value of the coefficients and the average predictive capability of the newly proposed model over the 1,000 random data sets, and we compare these results with those obtained by our previous prediction model [15] on the same 1,000 validation data sets.
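The third-stage procedure and the predictive measures reported in the Results can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: `fit` and `predict` are hypothetical callables standing in for the unpenalized glmnet fit (shrinkage parameter 0) and for thresholding the probability P from equation (*); MSI-H is coded as 1; the 50/50 training/validation proportion is an assumption, because the text does not state the split ratio; and averaging of the fitted coefficients is omitted for brevity.

```python
import math
import random
from typing import Any, Callable, Dict, List, Mapping, Sequence

Case = Mapping[str, float]  # one tumor: variable name -> coded value


def logistic_probability(beta0: float, betas: Mapping[str, float], case: Case) -> float:
    """P = 1 / (1 + e^(-x)) with x = beta_0 + sum_i beta_i * x_i."""
    x = beta0 + sum(b * case[name] for name, b in betas.items())
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # numerically stable form for negative x
    return z / (1.0 + z)


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV and NPV with MSI-H coded as 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan  # undefined ratio -> NaN

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "fraction_flagged": ratio(tp + fp, len(pairs)),  # cases sent for MSI testing
    }


def repeated_split_validation(
    cases: Sequence[Case],
    y: Sequence[int],
    fit: Callable[[List[Case], List[int]], Any],
    predict: Callable[[Any, List[Case]], List[int]],
    n_repeats: int = 1000,
    train_fraction: float = 0.5,  # assumption: the text does not give the split ratio
    seed: int = 0,
) -> Dict[str, float]:
    """Mean of each measure over n_repeats random training/validation splits.

    fit and predict are hypothetical placeholders for the actual model fitting.
    """
    rng = random.Random(seed)
    n_train = round(train_fraction * len(y))
    collected: Dict[str, List[float]] = {}
    for _ in range(n_repeats):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        train, valid = idx[:n_train], idx[n_train:]
        model = fit([cases[i] for i in train], [y[i] for i in train])
        preds = predict(model, [cases[i] for i in valid])
        for name, value in diagnostic_metrics([y[i] for i in valid], preds).items():
            if not math.isnan(value):  # skip splits where a measure is undefined
                collected.setdefault(name, []).append(value)
    # Measures that were never defined in any split are omitted.
    return {name: sum(v) / len(v) for name, v in collected.items()}
```

With the original model, for instance, `predict` would label a case MSI-H when P > 0.29, the cut-off given in the Results; the `fraction_flagged` entry then estimates the share of cases that would still be sent for MSI testing.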
Results

Validation studies

The first approach to testing the robustness of our original prediction model was to assess the homogeneity between the participating hospitals, using the model's equation:

$$P = \frac{1}{1 + e^{-x}}$$

$$x = -2.648 - 1.6a - 1.884b - 2.401c + 0.045d + 0.021e - 0.118f + 0.05g - 0.029h$$

Cases were assigned as MSI-H when P > 0.29. The terms of the linear component x correspond to: (a) location (0=proximal, 1=distal), (b) growth pattern (0=expansive, 1=infiltrative), (c) Crohn-like response (0=present, 1=absent), (d) solid pattern%, (e) mucinous pattern%, (f) cribriform pattern%, (g) Ki-67%, and (h) p53 accumulation%. Significant differences between centers were found in the estimation of the expansive growth pattern, presence of Crohn-like inflammatory response, percentage of cribriform structures, and Ki-67 expression (Table 2). Despite these interobserver discrepancies, our original model was globally able to predict MSI-H with a negative predictive value of 97.0%, reducing the number of cases to be tested for MSI to just 11%. The results obtained with our own prospective series were equivalent, with a negative predictive value of 97.8%, and did not differ from those of the initial study, where the negative predictive value was also 97.8%. The accuracy, sensitivity, specificity, and positive and negative predictive values of the three series are shown in Table 3.

Optimization studies

Out of the 615 colorectal carcinomas included in our global series, 52 (8.5%) exhibited MSI-H. The clinicopathological variables of this series are summarized in Table 1. The glmnet cross-validation analysis of the global series, involving a total of 1,000 random data sets, found that tumor location, percentage of solid and mucinous components, and presence of Crohn-like response and TIL were strongly associated with MSI status, being present in 976 to 1,000 of the generated models. Growth pattern (included in 715 models) and the expression of Ki-67 (592 models) and p53 (651 models) were also considered predictive of MSI status and were therefore included in our optimized model, named RERtest8:
$$P = \frac{1}{1 + e^{-x}}$$

$$x = 2.925 + 1.563a + 1.034b + 1.426c + 0.588d - 0.046e - 0.028f - 0.031g + 0.021h$$

The terms of the linear component x correspond to: (a) location (0=proximal, 1=distal), (b) growth pattern (0=expansive, 1=infiltrative), (c) Crohn-like response (0=present, 1=absent), (d) TIL%, (e) solid pattern%, (f) mucinous pattern%, (g) Ki-67%, and (h) p53 accumulation%. Compared with our original model, TIL has been included in the equation at the expense of the proportion of cribriform pattern, which has been left out of the optimized model.

Table 2 Homogeneity assessment between the five participating hospitals. Variables compared across centers: location (proximal/distal), growth pattern (infiltrative/expansive), Crohn-like reaction (present/absent), solid pattern (%), mucinous pattern (%), cribriform pattern (%), Ki-67 proliferative index (%), and p53 expression (%). Significant differences are highlighted in bold.

Table 4 shows the accuracy,
sensitivity, specificity, and positive and negative predictive values achieved by the optimized model. Because Ki-67 and p53 immunohistochemical expression were the variables most weakly related to MSI status, and also the only parameters that could not be determined by morphological assessment alone, we constructed a second, alternative model that excluded these two variables, RERtest6:

$$P = \frac{1}{1 + e^{-x}}$$

$$x = 1.114 + 1.523a + 1.046b + 1.591c + 0.893d - 0.042e - 0.031f$$

The terms of the linear component x correspond to: (a) location (0=proximal, 1=distal), (b) growth pattern (0=expansive, 1=infiltrative), (c) Crohn-like response (0=present, 1=absent), (d) TIL%, (e) solid pattern%, and (f) mucinous pattern%. The statistical parameters of this alternative model are also shown in Table 4. A probability value of a tumor being MSS lower than 80% (P