Discovery and validation of novel expression

SUPPLEMENTARY INFORMATION Discovery and Validation of Novel Expression Signature for Postcystectomy Recurrence in High-Risk Bladder Cancer

Anirban P. Mitra, Lucia L. Lam, Mercedeh Ghadessi, Nicholas Erho, Ismael A. Vergara,  Mohammed Alshalalfa, Christine Buerki, Zaid Haddad, Thomas Sierocinski, Timothy J. Triche,  Eila C. Skinner, Elai Davicioni, Siamak Daneshmand, Peter C. Black                       

Journal of the National Cancer Institute 

 

DOI: 10.1093/jnci/dju290

 

Discovery and Validation of Novel Expression Signature for Postcystectomy Recurrence in High‐Risk Bladder Cancer 

CONTENTS

Supplementary Methods
  Specimen processing and initial microarray analysis
  Development and validation of prognostic classifiers
  Sample size assessment
  Statistical analyses
  Analysis of biological interactions between prognostic markers
  Comparison with prior signatures
  Independent validation of genomic markers in external datasets
Supplementary Results
  Overlap of genomic markers with prior signatures and their relevance to cancer
Supplementary References
Supplementary Figures 1 to 10
Supplementary Tables 1 to 4

Page 2 of 27  │  Supplementary Information  │  JNCI 

 

Mitra AP et al. 


SUPPLEMENTARY METHODS

Specimen Processing and Initial Microarray Analysis

Archival formalin-fixed paraffin-embedded (FFPE) primary tumor specimens were obtained for 225 study patients following radical cystectomy. All tumor specimens underwent histopathological rereview to confirm the diagnosis of urothelial carcinoma, reassess tumor stage, grade and other pathological characteristics, and ensure sampling of an area that was most representative of overall tumor histology. Hematoxylin/eosin-stained sections were used to guide tumor specimen selection using a 1 mm diameter core punch. Total RNA was extracted and purified using RNeasy FFPE kit (Qiagen, Valencia, CA). RNA was amplified and labeled using the Ovation WTA FFPE system (NuGen, San Carlos, CA) and hybridized to GeneChip Human Exon 1.0 ST oligonucleotide microarrays (Affymetrix, Santa Clara, CA) according to the manufacturer’s recommendations. GeneChip Human Exon arrays use 5,362,207 probes to interrogate over one million exon clusters (collections of overlapping exons) with over 1.4 million probe selection regions (PSRs), referred to as features in this study.

Microarray data quality control was assessed by Affymetrix Power Tools packages and internally developed metrics (1). A total of 26 samples failed quality control. The median (interquartile range [IQR]) ages of the FFPE tissues for the resulting discovery (n = 133) and validation (n = 66) sets were comparable at 12.5 (9.9–15.9) and 13.1 (10.7–17.1) years, respectively (P = .21). Array files for these cases are available from the National Center for Biotechnology Information’s Gene Expression Omnibus (NCBI–GEO) database (http://www.ncbi.nlm.nih.gov/geo/) under GEO accession code GSE57933. Feature summarization and normalization were performed by frozen robust multi-array analysis (fRMA), which is available through Bioconductor (2). A custom set of frozen vectors was generated by randomly selecting 10 arrays from each of the eight batches across the study. Features interrogated with fewer than four probes, any cross-hybridizing probes as defined by Affymetrix, or deemed unreliable due to non-unique mapping to the genome (defined by probe sequence realignment to hg19) were removed.

Development and Validation of Prognostic Classifiers

With bioinformaticians who generated the prognostic classifiers remaining blinded to clinical data, two-thirds of the cohort was assigned to a discovery set and one-third to a validation set, keeping clinicopathologic characteristics balanced between both sets by testing their distribution using Fisher’s exact test, chi-square test, ANOVA or logistic regression. Features identifying patients who did and did not recur were visualized using normal quantile-quantile plots of t-test statistics using the qqnorm function in R v2.15.2. To identify features most clinically relevant to recurrence-free survival (RFS), area under the receiver-operating characteristic (ROC) curve, t-test statistics and median fold difference (MFD) were calculated in the discovery set. Based on these metrics, features were filtered by area under ROC curve (AUC) > 0.6, unadjusted t-test P < .050 and MFD > 1.5. Selected features were combined as a signature to produce a genomic classifier (GC) score by a random forest algorithm using the randomForest R package with 50,000 trees based on the discovery set (3). The mtry and nodesize parameters of the model were first assessed using the tune function in the e1071 R package (4). Briefly, a 20 (nodesize: 25 to 100 in increments of 5) by 5 (mtry: 1, 3, 5, 10, 15) grid was searched using 10-fold cross-validation with 1,000 trees built per try. The top 10 performing parameter pairs (nodesize and mtry pairs based on the 20 by 5 grid search) were further fine-tuned by manual assessment with 50,000 trees in the random forest model, and selected by minimizing the out-of-bag error in the discovery set. The randomForest package was also used to assess the variable importance of each selected feature. The GC outputs a continuous score between 0 and 1, with higher scores indicating higher probability of recurrence.
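As an illustration of the feature filter described above, the following Python sketch computes two of the three filtering metrics (AUC and MFD) for a single feature. It is not the authors' code (the study used R). The function names, the direction-agnostic treatment of AUC, and the definition of MFD as the ratio of group medians after back-transforming log2 expression values are assumptions made for illustration only. The t-test criterion is omitted to keep the sketch free of external dependencies.

```python
from statistics import median


def auc(pos, neg):
    """AUC from the Mann-Whitney statistic: the fraction of (pos, neg)
    pairs in which the pos value is larger, counting ties as one half."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_fold_difference(pos, neg):
    """Assumed definition: fold difference between group medians,
    oriented to be >= 1, for expression values on the log2 scale."""
    return 2.0 ** abs(median(pos) - median(neg))


def passes_filter(recurred, not_recurred, auc_min=0.6, mfd_min=1.5):
    """Apply the AUC > 0.6 and MFD > 1.5 thresholds from the text.
    Treating AUC as direction-agnostic is an assumption."""
    a = auc(recurred, not_recurred)
    a = max(a, 1.0 - a)
    return a > auc_min and median_fold_difference(recurred, not_recurred) > mfd_min


# Hypothetical log2 expression values for one feature.
up_feature = ([8.0, 8.5, 9.0, 9.5], [7.0, 7.2, 7.5, 8.2])
flat_feature = ([7.0, 7.1, 7.2, 7.3], [7.0, 7.1, 7.2, 7.3])
print(passes_filter(*up_feature), passes_filter(*flat_feature))
```

In practice the same test would be repeated for every retained feature in the discovery set, and only features passing all three criteria would enter the random forest.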

A “clinical-only” classifier (CC) was also developed on the discovery set. This incorporated age, gender, pathological stage, and lymphovascular invasion status modeled using logistic regression. Interactions between different clinicopathologic factors were explored in the discovery set and applied to the model when significant. A complete model with all clinicopathologic variables and significant interaction terms was built. Several nested models were subsequently developed that excluded less significant variables. Analysis of variance and chi-squared tests were used to select between these nested models. If the difference between two nested models was not significant, then the model with fewer variables was used.

Risk scores for postcystectomy recurrence based on clinicopathologic variables were also calculated for each patient based on the clinical nomogram from the International Bladder Cancer Nomogram Consortium (IBCNC) (5). To evaluate the joint prognostic value of genomic information and clinicopathologic variables, GC was combined with IBCNC and CC by logistic regression in the discovery set into integrated G-IBCNC and G-CC, respectively. Analogous to GC, patients were scored using the above classifiers between 0 and 1. All locked genomic, clinical and genomic-clinicopathologic classifiers were applied to patients in the validation set in a blinded fashion.


Sample Size Assessment

Sizes of the discovery and validation sets following microarray data quality control were assessed for adequacy for developing classifiers using high dimensional data (6). A discovery set of 133 patients (with 68 cases that recurred) was expected to produce a classifier with accuracy within 5% of an optimal classifier if the largest standardized fold change was as low as 1.1. Additionally, a validation set of 66 patients (with 33 cases that recurred) would have 97% power to achieve a minimum AUC of 0.75 (based on Diagnostic Test module of PASS v11.0.8).

Statistical Analyses

Univariable prognostic abilities of classifiers were compared using discrimination boxplots, Wilcoxon rank-sum test and logistic regression. Boxes in the boxplots represent score quartiles, and notches were calculated using the boxplot.stats function in R based on the formula by McGill et al (7,8). This method estimates the 95% confidence interval (CI) for the median and is insensitive to underlying distributions of the sample, thereby allowing comparison of two medians. Cumulative incidence curves for RFS were constructed using Fine-Gray competing risks analysis (9). AUCs were used for classifier performance assessment with respect to binary outcomes using the pROC R library (10). Extension of AUC for censored data was also used to compare performance of genomic classifiers, external signatures and clinical nomograms with respect to RFS (survival-ROC analysis using survival R library) (11). Time-dependent survival ROCs were evaluated for prediction of recurrence within four years postcystectomy (12). For survival ROC, the nearest-neighbor estimator with λ = 0.002 was used to approximate survival function density. The 95% CIs for survival-ROC AUCs were approximated through bootstrapping. Decision curve analyses were used to assess the net clinical benefit of genomic versus clinical models across different threshold probabilities (13). Reassignment of patients to risk strata based on addition of genomic information was assessed using reclassification plots (14).
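The notch calculation referred to above can be sketched as follows. This Python fragment assumes the commonly cited form of the McGill et al formula, median ± 1.58 × IQR / √n. The function name is hypothetical, and the quartiles come from Python's "inclusive" quantile method, which may differ slightly from the hinges that R uses for some sample sizes.

```python
from math import sqrt
from statistics import median, quantiles


def median_notch(scores):
    """Approximate 95% CI for the median: median +/- 1.58 * IQR / sqrt(n).
    Quartiles use the 'inclusive' method (an assumption; R uses hinges)."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / sqrt(len(scores))
    m = median(scores)
    return m - half_width, m + half_width


# Hypothetical classifier scores for two outcome groups.
recurred = [0.55, 0.61, 0.64, 0.70, 0.72, 0.78, 0.81, 0.85, 0.90]
not_recurred = [0.10, 0.18, 0.22, 0.25, 0.30, 0.33, 0.38, 0.41, 0.47]
lo_r, hi_r = median_notch(recurred)
lo_n, hi_n = median_notch(not_recurred)
# Non-overlapping notches suggest that the two medians differ.
print(lo_r > hi_n)
```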

Importance of genomic-based classifiers relative to individual and combined clinical information, and their independent prognostic abilities were evaluated by multivariable Cox regression models. Proportional hazards assumptions of the Cox model were confirmed by evaluating the scaled Schoenfeld residuals (15). For univariable and multivariable analyses, classifier scores were assessed as continuous variables (step size = 0.1) unless dichotomization was required. For analyses where classifier score was categorized, majority rule criterion with classifier scores of ≥ 0.5 and < 0.5 grouped as high-risk and low-risk, respectively, was used.

Estimates of censoring distribution were used to calculate follow-up duration (16). Analyses were performed using R v2.15.2. All tests were two-sided with type I error probability of 5%.

Analysis of Biological Interactions between Prognostic Markers

To understand the biological interactions of constituent features within GC, first-degree partners of their respective genes were extracted using the Human Signaling Network v5, a curated database with nearly 63,000 directed and undirected interactions between approximately 6,300 proteins (17–20). Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 was used to assess biological processes, molecular functions and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of the gene network (21–23). Enriched processes, molecular functions or KEGG pathways with Benjamini-Hochberg-corrected P < .050 were selected (24). Enrichment Map, a network-based gene set enrichment visualization method implemented as a Cytoscape plugin, was used to visualize the highly enriched biological concepts and group similar terms together (25). Nodes in the Enrichment Map represent individual Gene Ontology (GO) terms or KEGG pathways and lines represent the amount of overlap between the terms.
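For readers unfamiliar with the Benjamini-Hochberg correction used to select enriched terms, a minimal Python sketch of the standard step-up adjustment is given below. It illustrates the general procedure only and is not the DAVID implementation. The function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.
    Each adjusted value is the smallest p * m / rank over all ranks at
    or above the P value's own rank, capped at 1."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the running minimum
    # keeps the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw enrichment P values for four terms.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
selected = [i for i, p in enumerate(adjusted) if p < 0.05]
print(adjusted, selected)
```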

Comparison with Prior Signatures

The performance of GC was compared to that of previously described prognostic signatures for muscle-invasive urothelial carcinoma of the bladder (UCB). We identified five studies that defined seven prognostic genomic signatures developed exclusively for muscle-invasive or node-positive UCB (Supplementary Table 3) (26–30). As these signatures were developed using early-generation microarrays with relatively limited genomic coverage, it was feasible to reconstruct the gene lists on our study cohort as GeneChip Human Exon 1.0 ST microarrays offer transcriptome-wide coverage with the ability to interrogate entire lengths of protein-coding genes and noncoding RNAs. Genomic markers and genes that comprised the individual signatures were mapped back to the associated GeneChip Human Exon array core transcript clusters where possible, or extended or full transcript clusters when not aligned with the core transcript cluster. When an appropriate transcript-level feature was not identified, the closest one to the probe on the original array was used. While some prior signatures were modeled using other machine learning algorithms, their code was not publicly available. Moreover, other prior studies only presented a list of differentially expressed genes. Therefore, summarized expression values for transcripts in each signature were combined using random forest as the standard machine learning algorithm for modeling and to provide a comparison against GC. Nodesize and mtry parameters were optimized for each model to minimize the out-of-bag error rate, thereby maximizing their prognostic performance in the discovery set. As with GC, the optimized models of the prior signatures were then applied in a blinded manner on the validation set.

Independent Validation of Genomic Markers in External Datasets

The combined prognostic ability of genes represented by the GC features was independently validated on four external UCB datasets using the SurvExpress biomarker validation tool (31). The first dataset comprised patients with chemotherapy-naïve, muscle-invasive, high-grade tumors (T2-T4aNxMx) accessed through the Cancer Genome Atlas (TCGA) database for bladder urothelial carcinoma (n = 54) (32). Three datasets were accessed through the NCBI–GEO database. Dataset accession GSE13507 (n = 164) included UCB patients across all tumor stages and grades (26). Dataset accession GSE5287 (n = 30) included patients with locally advanced or metastatic disease (33). Dataset accession GSE31684 (n = 93) included patients with nonmuscle-invasive and muscle-invasive disease with no evidence of metastatic disease at radical cystectomy (27). UCB patients from each dataset with publicly available gene expression and clinical information were included. While the external datasets included high-risk bladder cancer patients, their composition did not optimally resemble the discovery and validation cohorts in this study. Nevertheless, an attempt was made to validate GC in these datasets based on the hypothesis that the constituent markers were generally prognostic for high-risk bladder cancer. Official symbols of genes associated with each of the constituent features within GC were provided as input. Briefly, for each dataset, the probes associated with each gene were identified where available, and the prognostic index of each patient was estimated using a Cox model. Each cohort was then divided into low-risk and high-risk groups defined by a median split. Discrimination ability of the GC was then assessed by hazard-ratio estimate, concordance index, and log-rank test of differences between risk groups with associated Kaplan-Meier curves using default settings. Overall survival was used as the common denominator for comparison as it was the only clinical endpoint available for all four datasets.
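The median split and concordance index mentioned above can be illustrated with a short Python sketch. It takes a prognostic index as given (fitting the Cox model is out of scope here) and uses Harrell's definition of concordance. It is not SurvExpress code. The function names, the handling of values equal to the median, and the example data are assumptions.

```python
from statistics import median


def median_split(prognostic_index):
    """Label each patient 'high' or 'low' risk relative to the cohort median.
    Assigning values equal to the median to 'low' is an assumption."""
    cut = median(prognostic_index)
    return ["high" if x > cut else "low" for x in prognostic_index]


def concordance_index(times, events, risk):
    """Harrell's C: among usable pairs (the patient with the shorter time
    had an observed event), the fraction in which that patient also has
    the higher risk score. Tied risk scores count one half."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical follow-up times (months), event flags (1 = death) and risk scores.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
groups = median_split(risk)
c_index = concordance_index(times, events, risk)
print(groups, c_index)
```

A concordance index of 0.5 corresponds to chance-level ordering of patients, and 1.0 to perfect ordering.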


SUPPLEMENTARY RESULTS

Overlap of Genomic Markers with Prior Signatures and Their Relevance to Cancer

Previously identified genes associated with UCB were cataloged by a thorough PubMed search for all publications in English listed through January 2014. This resulted in the identification of 2,059 unique genes. A comparison with the markers represented within GC revealed that four genes had been previously documented as being associated with UCB:

MECOM: Associated with tumor grade (34); progression in non-muscle-invasive bladder cancer (35).

PPP1R12A: Prognostic for overall survival in muscle-invasive bladder cancer (27); associated with outcome in patients with muscle-invasive and node-positive disease (28).

SYPL1: Associated with tumor stage (36).

ARFGEF1: Associated with tumor stage (34).

MECOM corresponds to the most important feature within GC (Supplementary Figure 3). Human Signaling Network analysis showed that the protein encoded by MECOM can interact with CTBP1, SMAD3, CREBBP, MAPK8 and MAPK9, and has been associated with clinical outcomes in leukemia and ovarian cancer (37,38).


SUPPLEMENTARY REFERENCES

1. Lockstone HE. Exon array data analysis using Affymetrix power tools and R statistical software. Brief Bioinform. 2011;12(6):634–644.
2. McCall MN, Bolstad BM, Irizarry RA. Frozen robust multiarray analysis (fRMA). Biostatistics. 2010;11(2):242–253.
3. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
4. Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. Misc Functions of the Department of Statistics (e1071), TU Wien. v1.6-1. The Comprehensive R Archive Network; 2012. http://cran.r-project.org/web/packages/e1071/index.html. Published September 12, 2012.
5. Bochner BH, Kattan MW, Vora KC. Postoperative nomogram predicting risk of recurrence after radical cystectomy for bladder cancer. J Clin Oncol. 2006;24(24):3967–3972.
6. Dobbin KK, Simon RM. Sample size planning for developing classifiers using high-dimensional DNA microarray data. Biostatistics. 2007;8(1):101–117.
7. Chambers JM, Cleveland WS, Kleiner B, Tukey PA. Graphical Methods for Data Analysis. New Jersey, NJ: Wadsworth & Brooks/Cole Publishing Company; 1983.
8. McGill R, Tukey JW, Larsen WA. Variations of box plots. Am Stat. 1978;32(1):12–16.
9. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94(446):496–509.
10. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77.
11. Therneau T. A Package for Survival Analysis in S. R package v2.36-14. 2012.
12. Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000;56(2):337–344.
13. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26(6):565–574.
14. Pepe MS. Problems with risk reclassification methods for evaluating prediction models. Am J Epidemiol. 2011;173(11):1327–1335.
15. Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika. 1994;81(3):515–526.
16. Korn EL. Censoring distributions as a measure of follow-up in survival analysis. Stat Med. 1986;5(3):255–260.
17. Cui Q, Ma Y, Jaramillo M, et al. A map of human cancer signaling. Mol Syst Biol. 2007;3:152.
18. Awan A, Bari H, Yan F, et al. Regulatory network motifs and hotspots of cancer genes in a mammalian cellular signalling network. IET Syst Biol. 2007;1(5):292–297.
19. Li L, Tibiche C, Fu C, et al. The human phosphotyrosine signaling network: evolution and hotspots of hijacking in cancer. Genome Res. 2012;22(7):1222–1230.
20. Newman RH, Hu J, Rho HS, et al. Construction of human activity-based phosphorylation networks. Mol Syst Biol. 2013;9:655.
21. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44–57.
22. Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1–13.
23. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004;32(Database issue):D277–D280.
24. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995;57(1):289–300.
25. Merico D, Isserlin R, Stueker O, Emili A, Bader GD. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One. 2010;5(11):e13984.
26. Kim WJ, Kim EJ, Kim SK, et al. Predictive value of progression-related gene classifier in primary non-muscle invasive bladder cancer. Mol Cancer. 2010;9:3.
27. Riester M, Taylor JM, Feifer A, et al. Combination of a novel gene expression signature with a clinical nomogram improves the prediction of survival in high-risk bladder cancer. Clin Cancer Res. 2012;18(5):1323–1333.
28. Sanchez-Carbayo M, Socci ND, Lozano J, Saint F, Cordon-Cardo C. Defining molecular profiles of poor outcome in patients with invasive bladder cancer using oligonucleotide microarrays. J Clin Oncol. 2006;24(5):778–789.
29. Blaveri E, Simko JP, Korkola JE, et al. Bladder cancer outcome and subtype classification by gene expression. Clin Cancer Res. 2005;11(11):4044–4055.
30. Kim WJ, Kim SK, Jeong P, et al. A four-gene signature predicts disease progression in muscle invasive bladder cancer. Mol Med. 2011;17(5-6):478–485.
31. Aguirre-Gamboa R, Gomez-Rueda H, Martinez-Ledesma E, et al. SurvExpress: an online biomarker validation tool and database for cancer gene expression data using survival analysis. PLoS One. 2013;8(9):e74250.
32. The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature. 2014;507(7492):315–322.
33. Als AB, Dyrskjøt L, von der Maase H, et al. Emmprin and survivin predict response and survival following cisplatin-containing chemotherapy in patients with advanced bladder cancer. Clin Cancer Res. 2007;13(15):4407–4414.
34. Lindgren D, Frigyesi A, Gudjonsson S, et al. Combined gene expression and genomic profiling define two intrinsic molecular subtypes of urothelial carcinoma and gene signatures for molecular grading and outcome. Cancer Res. 2010;70(9):3463–3472.
35. Wang R, Morris DS, Tomlins SA, et al. Development of a multiplex quantitative PCR signature to predict progression in non-muscle-invasive bladder cancer. Cancer Res. 2009;69(9):3810–3818.
36. Dyrskjøt L, Thykjaer T, Kruhoffer M, et al. Identifying distinct classes of bladder carcinoma using microarrays. Nat Genet. 2003;33(1):90–96.
37. Ho PA, Alonzo TA, Gerbing RB, et al. High EVI1 expression is associated with MLL rearrangements and predicts decreased survival in paediatric acute myeloid leukaemia: a report from the Children’s Oncology Group. Br J Haematol. 2013;162(5):670–677.
38. Nanjundan M, Nakayama Y, Cheng KW, et al. Amplification of MDS1/EVI1 and EVI1, located in the 3q26.2 amplicon, is associated with favorable patient prognosis in ovarian cancer. Cancer Res. 2007;67(7):3074–3084.


Supplementary Figure 1. Study design schema for classifier discovery and initial validation. A “clinical-only” classifier (CC) based on clinicopathologic variables and a genomic classifier (GC) based on 15 RNA features predictive for cancer recurrence were developed on patients in the discovery set (n = 133). Postcystectomy recurrence probabilities were also calculated based on the International Bladder Cancer Nomogram Consortium (IBCNC) nomogram. GC and IBCNC were integrated into G-IBCNC, and GC and CC were combined as G-CC. All classifier models were then locked and applied in a blinded manner on patients in the validation set (n = 66).


Supplementary Figure 2. Normal quantile-quantile plot of t-test statistics for features separating patients who did and did not recur in the A) discovery set and B) validation set. Lines fitted through the first and third quartiles are shown in black.


Supplementary Figure 3. Variable importance of the constituent features within GC. Importance for the 15 GC features was measured by their accuracy (left) and “Gini importance” (right) metrics using the random forest algorithm. The plots indicate the degree by which an importance measure decreased when each feature was removed individually. The higher the mean decrease, the more important a feature was for the model. GC = genomic classifier.


Supplementary Figure 4. Decision curve analyses comparing the net benefit of clinical-only models versus genomic-clinicopathologic classifiers. Model performance is compared to extremes of classifying all patients as being at risk for recurrence (thus warranting treatment of all patients; gray curve) versus classifying no patients at risk (thus treating none; horizontal black line). The “decision-to-treat” threshold, which represents the probability of recurrence used to trigger a decision to treat, ranges from 0 to 100 with sensitivity and specificity of each prediction model calculated at each threshold to determine net benefit. An optimal classifier has high net benefit above the gray “treat all” curve. When comparing IBCNC versus G-IBCNC (left panel), and CC versus G-CC (right panel), net benefits of the respective genomic-clinicopathologic classifiers were superior to clinical-only models over a wide range of decision-to-treat thresholds. As an example, if a decision to administer adjuvant therapy was triggered by a 50% threshold probability for post-cystectomy recurrence, then in comparison to “treat-all” or “treat-none” scenarios (where no prediction models would be used), employing genomic-clinicopathologic classifiers would reduce unnecessary treatment (i.e., for low-risk patients) by 28% with G-CC compared to 17% with IBCNC. IBCNC = postcystectomy recurrence nomogram from the International Bladder Cancer Nomogram Consortium; G-IBCNC = integrated genomic-IBCNC classifier; CC = “clinical-only” classifier; G-CC = integrated genomic-CC classifier.
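The net benefit plotted in decision curves of this kind is commonly computed with the formula of Vickers and Elkin (13): net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the decision-to-treat threshold expressed as a probability. The Python sketch below is a minimal illustration under that assumption. The function names and the example data are hypothetical and do not reproduce the study's results.

```python
def net_benefit(scores, outcomes, threshold):
    """Net benefit of treating patients whose score is >= threshold.
    threshold must lie strictly between 0 and 1."""
    n = len(scores)
    tp = sum(1 for s, y in zip(scores, outcomes) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, outcomes) if s >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(outcomes, threshold):
    """Reference curve in which every patient is treated."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical classifier scores and recurrence status (1 = recurred).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0]
nb_model = net_benefit(scores, outcomes, 0.5)
nb_all = net_benefit_treat_all(outcomes, 0.5)
print(nb_model, nb_all)
```

The "treat none" reference has a net benefit of zero at every threshold, so a model is useful at a given threshold only if its net benefit exceeds both zero and the "treat all" value.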


Supplementary Figure 5. Discrimination plots for patients in the validation set. Notched box plots indicate distributions of A) IBCNC, B) CC, C) GC, D) G-IBCNC, and E) G-CC scores of patients based on their recurrence status. Box represents median score and its 25th and 75th percentiles. Notches represent 95% confidence interval of the median; whiskers extend to 1.5 times the interquartile range from the median. Blue and red dots represent patients who did not recur and recurred, respectively. P values were determined by Wilcoxon rank-sum test and are two-sided. IBCNC = postcystectomy recurrence nomogram from the International Bladder Cancer Nomogram Consortium; CC = “clinical-only” classifier; GC = genomic classifier; G-IBCNC = integrated genomic-IBCNC classifier; G-CC = integrated genomic-CC classifier.


Supplementary Figure 6. Survival AUCs plotted over time for patients in the validation set. Survival AUCs determined over a range of timepoints following cystectomy indicate that A) G‐CC has the best performance among all patients, and B) G‐CC and GC have comparable performance that is superior to CC among the subset of node‐negative patients for predicting recurrence. AUC = area under receiver‐operating characteristic curve; CC = “clinical‐only” classifier; GC = genomic classifier; G‐CC = integrated genomic‐CC classifier.


Supplementary Figure 7. Distribution of GC scores across pathological stages among patients in the validation set. Dots and lines represent individual patient and median GC scores, respectively. Blue and red dots represent patients who did not recur and recurred, respectively. Median GC scores were higher in patients who recurred than those who did not experience recurrence at last follow‐up for patients with pT2N0M0 (0.64 versus 0.38), pT3‐4aN0M0 (0.59 versus 0.28) and pTanyN1‐3M0 (0.57 versus 0.34) disease. GC = genomic classifier.


Supplementary Figure 8. Distribution of classifier scores among patient subsets in the validation set based on nodal status. Among node‐negative patients (left panel), GC scores provided better discrimination based on recurrence status compared to CC scores (P = .008 versus P = .069, respectively). Among node‐positive patients (right panel), while comparative significance was not achieved for GC and CC scores (P = .16 versus P = .41, respectively) due to the limited number of node‐positive patients who did not recur, GC scores still provided relatively better discrimination as evidenced by nonoverlapping 95% confidence intervals of its median values. Black line and gray box represent median score and its associated 95% confidence interval, respectively. Blue and red dots represent patients who did not recur and recurred, respectively. P values were determined by Wilcoxon rank‐sum test. CC = “clinical‐only” classifier; GC = genomic classifier.


Supplementary Figure 9. Major GO terms associated with the 15 GC markers. First‐degree partners of the constituent features within GC were identified and GO term enrichment of the associated genes was assessed. Each red node represents a significantly enriched pathway associated with a GO term, and green links represent overlapping genes between pathways. Thickness of green links represents the frequency of overlapping genes between the GO terms. GO terms representing similar functions are grouped within dotted blue ellipses, and super‐grouped within dotted red ellipses. GC = genomic classifier; GO = Gene Ontology; MAPK = mitogen‐activated protein kinase.


Supplementary Figure 10. Performance of GC and previously reported prognostic genomic signatures for muscle‐invasive bladder cancer as assessed by survival‐ROC analysis in the discovery and validation sets for predicting postcystectomy recurrence. As all predictors including GC were optimized on the discovery set, their AUCs within this subgroup of patients were comparable. However, upon blinded assessment in the validation set, GC had the highest AUC of all genomic predictors. Circles and whiskers represent AUCs and associated 95% confidence intervals, respectively. AUCs for predictors are also listed under the respective patient sets. Names of individual predictors are as referenced in Supplementary Table 3. Dotted orange line and shaded region highlight median AUC of GC and its 95% confidence interval, respectively, in the validation set. ROC = receiver‐operating characteristic; AUC = area under ROC curve; GC = genomic classifier.


 

Supplementary Table 1. Summary description of features comprising the genomic classifier for predicting postcystectomy bladder cancer recurrence*

                                    Recurrence‐specific feature metrics
                                    Discovery set                 Validation set
Associated  Feature   Chromosomal
gene        and type  location      AUC     t‐test P  MFD         AUC     t‐test P  MFD
FOXO6       1†        1p34.2        0.6251  .0065     1.6514      0.7704  .0005     1.9620
HSD17B7     2†        1q23          0.6234  .0138     1.5401      0.7282  .0016     1.3508
ARID4B      3†        1q42.1–q43    0.6223  .0310     1.5427      0.6593  .0142     1.5130
ENAH        4†        1q42.12       0.6617  .0028     1.6644      0.7107  .0012     1.5311
MAP4K3      5‡        2p22.1        0.6067  .0327     1.5895      0.7521  .0003     1.5929
MARCH7      6†        2q24.2        0.6020  .0275     1.5425      0.7420  .0006     1.6120
MECOM       7§        3q26.2        0.6434  .0045     1.5113      0.6345  .0275     1.1318
LRBA        8†        4q31.3        0.6261  .0081     1.5432      0.6373  .0286     1.4009
MUT         9†        6p12.3        0.6206  .0232     1.5150      0.6107  .0940     1.2203
CRCP        10†       7q11.21       0.6135  .0259     1.5144      0.7319  .0013     1.9790
SYPL1       11†       7q22.3        0.6064  .0485     1.5328      0.6777  .0064     1.5042
ARFGEF1     12†       8q13          0.6322  .0354     1.5580      0.6777  .0075     1.4951
EHF         13†       11p12         0.6437  .0022     1.7470      0.6437  .0545     1.1631
METTL7A     14†       12q13.12      0.6128  .0201     1.6586      0.7337  .0012     1.8840
PPP1R12A    15†       12q15–q21     0.6104  .0406     1.5429      0.6630  .0049     1.3429

Important biological process(es) (column headings; the per‐gene assignments are not recoverable from the extracted table): Regulation of transcription; Cell differentiation; Cell proliferation/cell‐cycle regulation/apoptosis; Signaling pathways/signal transduction; Other (methyltransferase activity, cell adhesion, lipid metabolism, protein ubiquitination, angiogenesis regulation, cholesterol metabolism, histone methylation).

* All statistical tests were two‐sided.
† Exonic feature
‡ Exonic antisense feature
§ Intronic antisense feature

AUC = area under receiver‐operating characteristic curve; MFD = median fold difference.

Biological processes summarized using Gene Ontology Annotation database [Dimmer EC, Huntley RP, Alam‐Faruque Y, et al. The UniProt‐GO Annotation database in 2011. Nucleic Acids Res. 2012;40(Database issue):D565–D570].
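The per‐feature AUC and MFD columns of Supplementary Table 1 can be illustrated with a short sketch. The AUC is computed as the probability that a randomly chosen patient who recurred has a higher feature value than one who did not (ties counted as one half), which is the standard rank‐based definition. The MFD is assumed here to be the ratio of the two group medians on a linear, strictly positive expression scale, reported as a value of at least 1; the table does not give the exact definition, so this is an assumption. Function names and data are hypothetical.

```python
from statistics import median


def feature_auc(recurred, not_recurred):
    """Rank-based AUC: P(value in recurred > value in not_recurred), ties count 0.5."""
    wins = sum(
        (r > s) + 0.5 * (r == s)
        for r in recurred
        for s in not_recurred
    )
    return wins / (len(recurred) * len(not_recurred))


def median_fold_difference(group_a, group_b):
    """Ratio of the larger group median to the smaller one.

    Assumes linear-scale, strictly positive expression values (an assumption,
    not a definition taken from the source).
    """
    med_a, med_b = median(group_a), median(group_b)
    if min(med_a, med_b) <= 0:
        raise ValueError("medians must be positive on a linear scale")
    return max(med_a, med_b) / min(med_a, med_b)


if __name__ == "__main__":
    # Made-up expression values for a single feature, not study data.
    recurred = [8.0, 6.0, 9.0, 7.0]
    not_recurred = [4.0, 5.0, 6.0, 3.0]
    print(feature_auc(recurred, not_recurred))
    print(median_fold_difference(recurred, not_recurred))
```

If expression values were stored on a log2 scale instead, the fold difference would be obtained by exponentiating the difference of medians rather than dividing them.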


Supplementary Table 2. Multivariable analysis comparing genomic‐clinicopathologic versus clinical‐only models in the validation set*

               Relative risk of recurrence
               Hazard ratio (95% CI)    P
Model 1
   G‐IBCNC     1.18 (1.03 ‐ 1.36)       .016
   IBCNC       1.04 (0.90 ‐ 1.20)       .62
Model 2
   G‐CC        1.18 (1.04 ‐ 1.33)       .008
   CC          1.10 (0.92 ‐ 1.31)       .30

* Hazard ratio estimated by Cox proportional hazards analysis with ridge regression. All statistical tests were two‐sided.

CI = confidence interval; IBCNC = postcystectomy recurrence nomogram from the International Bladder Cancer Nomogram Consortium; G‐IBCNC = integrated genomic‐IBCNC classifier; CC = “clinical‐only” classifier; G‐CC = integrated genomic‐CC classifier.


  74 ¶ 

  40 ¶ 

  38 ¶ 

Sanchez‐Carbayo 2006  

Blaveri 2005  

Kim 2011  

5

 

 

 

 

 

 

   



24 

85 

20 

44 

49 

15  61 

(mappable  transcripts, n) 

Signature size 

Progression

Overall survival

Overall survival

Overall survival

Progression

Cancer‐specific survival

Recurrence‐free survival  Overall survival

Prognostic endpoint 

0.82 (0.75 ‐ 0.90)

0.83 (0.76 ‐ 0.90)

0.86 (0.80 ‐ 0.93)

0.79 (0.71 ‐ 0.89)

0.82 (0.75 ‐ 0.90)

0.82 (0.75 ‐ 0.91)

0.83 (0.75 ‐ 0.91)  0.85 (0.78 ‐ 0.92)

Survival‐ROC  AUC (95% CI) 

1.66 (1.43 ‐ 1.92) 

1.81 (1.53 ‐ 2.15) 

1.98 (1.63 ‐ 2.42) 

1.60 (1.36 ‐ 1.88) 

1.64 (1.40 ‐ 1.93) 

1.84 (1.51 ‐ 2.24) 

1.36 (1.23 ‐ 1.49)  2.19 (1.75 ‐ 2.75) 

Hazard ratio  (95% CI) 

Discovery set