
Identification of prostate cancer modifier pathways using parental strain expression mapping

Qing Xu*, Pradip K. Majumder†, Kenneth Ross‡, Yeonju Shim†, Todd R. Golub†‡, Massimo Loda†§, and William R. Sellers†‡§¶‖

*Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115; †Departments of Medical Oncology and Pediatric Oncology, Dana–Farber Cancer Institute, Boston, MA 02115; §Departments of Medicine and Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115; ‡The Broad Institute, Harvard Medical School and Massachusetts Institute of Technology, Cambridge, MA 02141; and ¶Novartis Institutes of BioMedical Research, Cambridge, MA 02139

Communicated by David M. Livingston, Dana–Farber Harvard Cancer Institute, Boston, MA, September 6, 2007 (received for review April 5, 2007)

MEDICAL SCIENCES

Inherited genetic risk factors play an important role in cancer. However, other than the Mendelian cancer susceptibility genes found in familial cancer syndromes, little is known about risk modifiers that control individual susceptibility. Here we developed a strategy, parental strain expression mapping, that utilizes the homogeneity of inbred mice and genome-wide mRNA expression analyses to directly identify candidate germ-line modifier genes and pathways underlying phenotypic differences among murine strains exposed to transgenic activation of AKT1. We identified multiple candidate modifier pathways and, specifically, the glycolysis pathway as a candidate negative modulator of AKT1-induced proliferation. In keeping with the findings in the murine models, in multiple human prostate expression data sets, we found that enrichment of glycolysis pathways in normal tissues was associated with decreased rates of cancer recurrence after prostatectomy. Together, these data suggest that parental strain expression mapping can directly identify germ-line modifier pathways of relevance to human disease.

glycolysis | proliferation

Author contributions: Q.X., P.K.M., and W.R.S. designed research; Q.X., P.K.M., and Y.S. performed research; K.R., T.R.G., and M.L. contributed new reagents/analytic tools; Q.X. and K.R. analyzed data; and Q.X. and W.R.S. wrote the paper.
The authors declare no conflict of interest.
Abbreviations: GSEA, gene set enrichment analysis; PIN, prostatic intraepithelial neoplasia; PSEM, parental strain expression mapping; VP, ventral prostate.
Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE7270).
‖To whom correspondence should be addressed at: Novartis Institutes of Biomedical Research, 250 Massachusetts Avenue, 4A/245, Cambridge, MA 02139. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/cgi/content/full/0708476104/DC1.
© 2007 by The National Academy of Sciences of the USA
www.pnas.org/cgi/doi/10.1073/pnas.0708476104
PNAS | November 6, 2007 | vol. 104 | no. 45 | 17771–17776

The acquisition of somatic genetic alterations is a key feature of human cancer. Somatic genetic events are overlaid on the germ-line genetic background of individuals, and thus germ-line genetic variation can alter cancer-related phenotypes. For example, among familial cancer syndromes associated with germ-line mutation of tumor suppressor genes, there is often significant phenotypic variation between families, suggesting that germ-line genetic factors modify the consequence of major gene mutation. Similarly, germ-line genetic variation can influence the development of somatic mutations; notably, EGFR mutation frequency is strongly related to ethnicity in sporadic lung adenocarcinoma (1), and MC1R germ-line variants are correlated with increased BRAF mutation frequency in melanoma (2). Taken together, these observations indicate that germ-line genetic variation can modify susceptibility or resistance to human malignancies.

Although the etiology of prostate cancer remains unclear, twin studies have estimated that heritable factors contribute 42% of the risk of developing prostate cancer, the highest heritable contribution among the common epithelial malignancies (3). Despite this, it has been difficult to identify specific genes in which germ-line genetic variation reproducibly contributes to prostate cancer risk. Most methods for identifying modifier genes in human complex diseases focus either on population-based association studies of candidate genes or family-based linkage analyses of the entire genome. Both methods remain challenging (4).

Experimental murine genetic models of complex human disease can provide insights into human disease pathogenesis because genetic contexts play an important role in strain-dependent variation of murine phenotypes. In murine models, the search for genes modifying disease phenotypes typically relies on quantitative trait locus mapping approaches (5). Several recent reports demonstrate the feasibility of the in silico correlation approach to map modifiers in inbred strains by associating the phenotype and genotype strain distribution patterns (6).

In addition, DNA microarray technology has made it possible to treat the expression levels of thousands of genes as quantitative or "complex" traits and inheritable phenotypes. Combining the traditional quantitative trait locus mapping strategy with DNA microarray technology led to the identification of genomic regions controlling the natural variation in gene expression in maize, mouse, and humans (7). It is therefore likely that expression differences could serve as surrogate markers for the genomic polymorphisms that lead to phenotypic variation.

To exploit this idea, here we propose a method, parental strain expression mapping (PSEM), to directly identify candidate genes and modifier pathways associated with the phenotypic variations induced by functional genetic perturbations in mice. PSEM uses DNA microarray analysis of genome-wide mRNA expression in parental murine tissues together with the recently developed pathway-oriented gene expression analytical method, gene set enrichment analysis (GSEA) (8–11).

In the context of our studies of the phenotypic consequences of activation of AKT1 in the murine prostate (12), PSEM was used to identify multiple candidate modifier pathways in which variation in the germ line was correlated or anticorrelated with proliferation induced by AKT1. Strikingly, the results identified the glycolysis pathway as a candidate negative modulator of AKT1-induced proliferation. In keeping with this observation, variation in glycolysis pathway expression in normal human prostate tissue was correlated with nonrecurrent clinical outcomes in patients with prostate cancer. Moreover, when AKT1 was placed in the context of an inbred C57BL/6 strain having a low parental glycolysis signature, the AKT1-mediated induction of glycolysis genes was significantly blunted compared with the FVB-AKT1 mice. Together, these data raise the possibility that modifier pathways can be identified through PSEM without the identification of specific genomic loci.

Results

Ventral Prostate Expression Profiling Captures Strain-Specific Genetic Variation in Mice. To identify candidate modifier genes according to mRNA expression levels, we first asked whether there was significant gene expression variation found in the ventral prostates (VPs) of distinct inbred strains. To this end, gene expression data (MOE430A arrays; Affymetrix, Santa Clara, CA) were derived by using total RNA extracted from the VPs of 8-week-old inbred mice from six strains (AKR, SWR, 129X1, FVB, BALB/c, and C57BL/6). We then applied the one-versus-all classification method (13) to determine whether strain-specific transcripts could be detected. Statistically meaningful differential gene expression was identified for each strain (from 188 to 350 genes, permuted P value ≤ 0.01) [Fig. 1A and supporting information (SI) Fig. 7]. The strain-dependent variation of one representative gene for each strain was validated by quantitative RT-PCR (SI Fig. 8).

We next sought to determine whether strain-specific differences in expression were the major source of variation in this data set. To this end, we applied an unsupervised algorithm known as self-organizing maps (14). By using a 4 × 3 grid allowing for segregation into 12 total classes, self-organizing map analysis separated the 24 VP samples into only six distinct clusters, each of which consisted of samples from a single strain (Fig. 1B). Moreover, the presence of multiple empty cells suggested that samples from the same strain were tightly related to each other in gene expression space. An independent clustering scheme, hierarchical clustering, also led to similar strain-specific clusters (SI Fig. 9). These results demonstrate that the germ-line genetic variation among strains is the major contributor to RNA expression differences in the samples analyzed. In other words, environmental influences on gene expression (within-strain variances) are much smaller than the genetic influences (between-strain variances). Thus, gene expression can potentially be used to assess germ-line contribution to phenotypic variation.

Fig. 1. Genetic differences of six inbred mice are reflected by expression differences of VP. (A) The genes most highly expressed for each strain were selected by one-versus-all. The expression of each probe set (rows) in each sample (columns) is represented by the number of standard deviations above (red) or below (blue) the mean for that probe set across all samples. (B) A 4 × 3 self-organizing map separates all four samples from each strain into distinct strain-specific clusters. Shown in each cell is the mean (black diamonds) and expression range (red lines) of each of 2,212 genes used for sample clustering. For each cluster, the majority of average gene expressions are not discernible and form a thick black line at the base of each cell, whereas the distinguishable black diamonds form different patterns for each cell (cluster).

Modifier Genes Modulate Ventral Prostate Proliferation Induced by AKT1. Transgenic expression of AKT1 in FVB mice induces proliferation and a robust, highly penetrant, and uniform prostatic intraepithelial neoplasia (PIN) phenotype. Thus, as a starting point for modeling prostate cancer-related phenotypes that vary according to different genetic contexts, we used the rate of proliferation (quantified by BrdU incorporation) in the VP induced by transgenic expression of AKT1. In addition, this measure served as a quantitative trait characterizing one aspect of the qualitative PIN phenotype induced by AKT1. First, to test whether genetic background differences among murine strains could modulate the
proliferation of luminal epithelial cells of VP in response to AKT1, we intercrossed FVB-AKT1 mice with C57BL/6 mice and compared the number of proliferating cells in 8-week-old FVB-AKT1 mice with F1 (FVB-AKT1 × C57BL/6) mice by BrdU immunohistochemistry. In either the WT (Fig. 2 A, C, and E) or transgenic (Fig. 2 B, D, and E) context, the number of proliferating cells was higher in the VPs from F1 hybrid mice (designated FB6) than from FVB mice. Proliferation in transgenic AKT1-expressing VPs was increased over WT controls in either F1 (Fig. 2 C–E) or FVB (Fig. 2 A, B, and E) animals. The effects of modifier genes on proliferation appeared to be synergistic with that of AKT1 because the proliferation difference between the two transgenic mice was significantly larger than that between the two WT strains (Δ2 vs. Δ1; Fig. 2E). These data raise the possibility that modifier genes can influence prostate proliferation induced by AKT1.

Fig. 2. Introduction of the AKT1 transgene into F1 mice with FVBxB6 background results in increased BrdU incorporation in the VP. (A–D) Representative images of BrdU incorporation in the VPs of FVB, FVB-AKT1, FB6-WT, and FB6-AKT1 mice, respectively. FB6, F1(FVB-AKT1 × C57BL/6). (Scale bar, 50 µm.) (E) Quantitation of BrdU levels; the data are expressed as means ± SE. The number of mice analyzed for each group is indicated. Δ2 = µ(FB6-AKT1) − µ(FVB-AKT1); Δ1 = µ(FB6-WT) − µ(FVB-WT). By using a two-way complete ANOVA model to compare the six group means, the following statistics were found: *, P = 0.0014 (FB6 vs. FVB); **, P < 0.0001 (FB6-AKT1 vs. FVB-AKT1); ***, P = 0.0153 (Δ2 vs. Δ1); ****, P < 0.002 (FVB-AKT1 vs. FVB); *****, P < 0.00001 (FB6-AKT1 vs. FB6).

Quantitative Differences in AKT1-Regulated Proliferation Among the F1 Transgenic Progeny. Next we wanted to determine whether additional murine strains would differ in the phenotypic response to AKT1 expression in F1 hybrid mice. To this end, FVB-AKT1 mice were intercrossed with C57BL/6, AKR, SWR, 129X1, and BALB/c mice, and the BrdU index of VPs of transgenic male F1 progeny at 8 weeks was determined. A representative VP field of each F1 transgenic progeny is shown (Fig. 3 A–F). Mean values of FAK and FB6 were the highest, followed by FSW, F129, and FBc and then by FVB-AKT1 (Fig. 3G). One-way ANOVA of the group means with assumption of equal variance demonstrated that BrdU indexes fell into three different classes at the 95% individual confidence interval, which were significantly different according to Tukey pairwise comparison at 90% simultaneous confidence intervals (SI Fig. 10). These data suggest that AKT1 can induce proliferation in VP differentially among distinct murine strains, likely because of modifier gene variation among strains.

To ensure that this variation in BrdU incorporation did not simply arise as a result of differences in the expression or activity of the transgene and transgenic protein product, we collected whole cell lysate of VPs from each F1 transgenic mouse and measured the phospho-Ser-473 AKT1 level by immunoblotting. Quantitation of two independent results demonstrated that the phospho-Ser-473 AKT1 level did not correlate with the BrdU index (SI Fig. 11), suggesting that modification of the BrdU incorporation is not caused by the differential activity of AKT1.

Fig. 3. Variation in BrdU incorporation among FVB-AKT1 and five F1 transgenic mice. (A–F) Representative images of BrdU incorporation in the transgenic VPs of FAK [F1(FVB-AKT1 × AKR)] (A); FSW [F1(FVB-AKT1 × SWR)] (B); F129 [F1(FVB-AKT1 × 129X1)] (C); FVB (FVB-AKT1) (D); FBc [F1(FVB-AKT1 × BALB/c)] (E); and FB6 (F) mice. (Scale bar, 50 µm.) (G) Quantitation of BrdU levels per ×40 field; the data are expressed as means ± SE. The number of mice in each group is indicated.

Fig. 4. Identification of candidate genes with significant positive or negative correlation to the BrdU profile. See Fig. 1A for the heat map method. The ideal proliferation index is shown in the first row. The genes are ranked by the descending Pearson coefficient. Shown in the upper half are the top 20 probe sets positively correlated, whereas the lower half shows the top 20 probe sets negatively correlated with the BrdU index. Probe sets that passed Bonferroni P ≤ 0.05 are in bold characters.
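As an illustration of the ranking summarized in Fig. 4, the following Python sketch orders probe sets by their Pearson coefficient to a phenotype profile, attaches an empirical permutation P value, and computes a Bonferroni cutoff. The function names, the two-sided permutation scheme, and the add-one correction are assumptions made for illustration; this is not the GenePattern implementation used in the study, and any inputs passed to it here would be synthetic rather than the measured BrdU indexes.

```python
import numpy as np


def pearson_to_profile(expr, profile):
    """Pearson r between every row of expr and one phenotype profile.

    expr:    (n_probes, n_samples) expression matrix (hypothetical input)
    profile: (n_samples,) phenotype value per sample, e.g. a BrdU index
    """
    expr = np.asarray(expr, dtype=float)
    profile = np.asarray(profile, dtype=float)
    xc = expr - expr.mean(axis=1, keepdims=True)   # center each probe set
    yc = profile - profile.mean()                  # center the profile
    denom = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        # Probe sets with zero variance get r = 0 instead of NaN.
        r = np.where(denom > 0, (xc @ yc) / denom, 0.0)
    return r


def rank_probes(expr, profile):
    """Return probe indices sorted by descending r, plus r itself."""
    r = pearson_to_profile(expr, profile)
    order = np.argsort(-r, kind="stable")
    return order, r


def permutation_pvalues(expr, profile, n_perm=1000, seed=0):
    """Empirical two-sided P values from shuffling the sample labels.

    Assumption: significance is judged per probe set against |r| obtained
    with randomly permuted profiles; the add-one form avoids P = 0.
    """
    rng = np.random.default_rng(seed)
    profile = np.asarray(profile, dtype=float)
    observed = np.abs(pearson_to_profile(expr, profile))
    hits = np.zeros_like(observed)
    for _ in range(n_perm):
        shuffled = rng.permutation(profile)
        hits += np.abs(pearson_to_profile(expr, shuffled)) >= observed
    return (hits + 1) / (n_perm + 1)


def bonferroni_threshold(alpha, n_tests):
    """Per-test cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests
```

For the 14,447 tests mentioned in the text, `bonferroni_threshold(0.05, 14447)` gives a per-test cutoff of about 3.5 × 10^-6, which shows how much stricter the Bonferroni criterion is than the permutation threshold of P ≤ 0.001.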

Correlation of mRNA Expression Profiles with Quantitative Differences in AKT1-Induced Proliferation Reveals Underlying Candidate Modifier Genes. In applying PSEM (SI Fig. 12), we reasoned that some modifier genes contributing to the phenotypic differences (BrdU incorporation) might have a pattern of expression across the parental inbred strains either correlated or anticorrelated with the BrdU profile found in the AKT transgenic F1 hybrids. Here it is important to note that in all of the transgenic mice, there is an equal mix of one constant strain, FVB, with a new second strain and that we are relating gene expression in parental strains in the absence of AKT1 expression to the resulting phenotype after AKT1 expression. Therefore, considering the BrdU profile from all F1 hybrids as the response variable and the expression of each gene as an independent variable, we tried to identify those genes having the most similar or inverse linear relationship with the BrdU incorporation across all strains by using the Pearson coefficient. We first obtained a ranked list of genes based on descending Pearson coefficient and then permuted the samples to obtain the statistically significant sublist. At P ≤ 0.001, 190 probe sets (171 genes) were positively and 184 probe sets (152 genes) were negatively correlated with the BrdU profile, and the top 20 of each list are shown in Fig. 4. These genes spanned all 20 chromosomes and were involved in a wide variety of biological processes (SI Table 1). Eight probe sets on the positive list and 15 probe sets on the negative list also passed a highly stringent Bonferroni-corrected P value (≤0.05) (corrected for 14,447 tests) (Fig. 4). The complete rank-ordered list of genes was then used in subsequent analyses designed to identify modifier pathways.

GSEA Identified Modifier Pathways That May Underlie the Phenotypic Differences. It is likely that by selecting significantly correlated individual genes, we might miss those genes that contribute more moderately to the phenotypic differences. Moreover, coding polymorphisms might result in the functional change of a gene (a transcription factor, for example) but might not change the transcript level of that particular gene and thus would not be detected. Finally, it is possible that one might be able to identify molecular pathways, the function of which, as a whole, could be implicated in the modification of a genetic trait without directly identifying the genetically varied locus. To address these concerns and search for pathways capable of acting as trait modifiers, we applied GSEA (8) to the rank-ordered gene expression correlates described above. A total of 540 functional gene sets were used. The association of pathway members within the list of ranked genes according to the phenotypic pattern was measured by enrichment score (ES), and the significance was evaluated by permutation testing of the phenotype labels. One positively associated pathway (EGF receptor pathway) and 14 negatively associated pathways were obtained (nominal P value ≤ 0.02; false discovery rate, q ≤ 0.1) (Fig. 5A). This result suggests that variations in multiple signaling pathways account for the differential proliferation across strains induced by AKT1.

Among the 14 negatively enriched gene sets, multiple glycolytic pathways and a manually curated Hif-1α target gene set (11) appeared to overlap functionally. To examine the relationship of these gene sets and to identify essential functional subsets, we performed leading-edge analysis (8). The leading-edge subset can be interpreted as the core of a gene set that accounts for the enrichment signal (see Methods). A prominent cluster was identified in which the glycolytic gene sets and Hif-1α target gene set shared probe sets for the glycolytic enzymes, and these gene sets were grouped together (Fig. 5B and SI Fig. 13). We validated the expression of Ldhb, Pgk1, and Eno1 by using quantitative RT-PCR across the six parental VPs (Fig. 5C). The negative correlation of Ldhb with the BrdU profile was consistent with the single gene correlation analysis and was significant at genome-wide P < 0.001, whereas Pgk1 and Eno1 were not (SI Table 1). This result suggests that GSEA is robust in identifying subtle but consistent expression variations underlying phenotypic alterations. These data raised the possibility that preexisting increased levels of glycolytic pathway genes might be associated with decreased proliferation in response to AKT1 activation.

Fig. 5. The differential expression of the glycolysis pathway is most prominently associated with AKT1-induced proliferation. (A) The significant gene sets enriched for the BrdU profile ordered by increasing normalized enrichment score (NES). The glycolytic gene sets and Hif-1 target gene set are highlighted in blue, and the Egfr gene set is highlighted in red. (B) Leading-edge analysis of 14 negatively enriched pathways passing the false discovery rate threshold of 0.1. The presence of a probe set in a given predefined gene set is indicated in red, and the absence of a probe set is indicated in white. Seven glycolysis gene sets, indicated in blue, are grouped into one cluster by the core leading-edge genes they share. The blue square points out the shared core genes (SI Fig. 10). (C) Validation of differential expression of core glycolysis genes in the VP by using quantitative RT-PCR. Expression values are shown relative to that of FVB mice. Ldhb, lactate dehydrogenase b; Pgk1, phosphoglycerate kinase 1; Eno1, enolase 1. The Pearson coefficient to the ideal BrdU index is shown for each gene.

Glycolysis Pathways and Human Prostate Cancer Progression. We next sought to determine whether glycolysis pathway variation might be associated with modification of human prostate cancer outcomes. Here, we hypothesized that a human normal prostate expression data set might provide an orthogonal means to test this hypothesis. We reasoned that if germ-line genetic variation in mice leading to higher levels of glycolysis pathway expression in the murine prostate were associated with reduced oncogene-induced (AKT-induced) proliferation, then the expression of glycolysis genes in normal human prostate tissue might be negatively associated with clinicopathological features such as patient outcome. Although, ideally, one might ask how this signature could contribute to the risk for developing prostate cancer, expression data from normal prostate tissue do not exist for populations of men before prostate cancer diagnosis. On the other hand, we have identified previously a five-gene predictor for prostate cancer outcome by using expression data from primary prostate cancer samples (15). In this data set, expression data from normal prostate tissue were available for 18 patients for whom there were substantial long-term follow-up data. Among these 18 patients, six were classified as recurrent (defined as two consecutive prostate serum antigen values of >0.2 ng/ml after surgery), whereas 12 were classified as nonrecurrent patients (no prostate serum antigen elevations after ≥4 years of follow-up). To test whether glycolysis pathway expression in the normal tissue would be negatively associated with patient outcome, we applied neighborhood analysis (16). Although no single glycolytic gene was statistically associated with patient outcome (data not shown), glycolysis genes with expressions that passed the variation filter, after stratification into high and low expression, together were significantly associated with nonrecurrence, as assessed by Fisher’s exact test (Fig. 6A).

Fig. 6. High germ-line glycolysis gene expressions associate with nonrecurrent outcome. (A) Higher expression of glycolysis genes in the normal human prostate associates with nonrecurrent patient outcome. See Fig. 1A for the heat map method. The red and blue circles below the heat map indicate high and low expression, respectively, of all glycolysis genes in the corresponding sample. The contingency table below summarizes the result. Fisher’s exact test: P = 0.04. (B) The significant (nominal P value < 0.05) gene sets enriched for nonrecurrence in human normal prostate expression profiles. The gene sets are shown along the y axis ordered by NES, and NES is shown along the x axis. (C) MAP00010 is significantly enriched for the nonrecurrent phenotype in the second data set (17). The enrichment score running curve is shown in red, and positions of each gene in this set among all genes are shown as blue vertical bars. (D) Differential activation of glycolytic gene expression in FVB-AKT1 mice when compared with inbred (eighth generation) B6-AKT1 mice. All probe sets for glycolysis genes were compared by t test. Shown are those significantly different between FVB-AKT1 and B6-AKT1 (P < 0.05) (data are depicted as in Fig. 1A).

To assess the significance of this association and rule out the possibility of random association, expression correlates for the class distinction recurrent vs. nonrecurrent were derived by using a signal-to-noise metric, and then GSEA was applied to the rank-ordered gene list with 440 predefined human gene sets. Fourteen gene sets were significantly enriched for the nonrecurrent phenotype at P ≤ 0.05, whereas none was enriched for the recurrent phenotype. Strikingly, two glycolysis gene sets (P < 0.01) were ranked at the top (Fig. 6B). The majority of genes in the MAP00010 glycolysis gene set (64%, 41 of 64) were expressed at higher levels in the nonrecurrent phenotype. To further confirm these results, we directly assessed the enrichment of the glycolysis gene set (MAP00010) in a second independent human prostate cancer and normal tissue data set for which long-term clinical outcome was available (17). Here, we compared eight nonrecurrent vs. 20 recurrent adjacent normal samples and found again that MAP00010 was significantly enriched in the normal prostate samples from nonrecurrent patients (Fig. 6C). GSEA was then applied to the rank-ordered gene list with 440 predefined human gene sets, and again MAP00010 was significantly
Alterations in the Expression of Glycolysis Genes After AKT1 Introduction. We have previously shown that in FVB-AKT1 mice the

expressions of glycolysis pathway genes (and Hif1 targets) are significantly up-regulated in an mammalian target of rapamycindependent fashion. We therefore asked whether the expression of the AKT1 transgene would override preexisting parental strain differences in glycolysis expression, perhaps rendering such parental variation irrelevant. To this end, we backcrossed FVB-AKT1 mice to B6 for eight generations to generate B6-AKT1 mice. In the VP proliferation at 8 weeks, B6-AKT1 mice had ⬇5-fold higher rates of proliferation than that of the FVB-AKT1 mice (data not shown). We then compared expression levels of glycolysis genes between these two strains. We found that all 11 glycolytic enzymes had higher expression (1.6- to 2.5-fold) in FVB-AKT1 mice than in B6-AKT1 mice (Fig. 6D). Moreover, the enrichment of the glycolysis pathway by GSEA in B6-AKT1 mice (compared with B6 parental mice) was substantially reduced to that seen in the relevant FVB-AKT1 comparison (data not shown). These data suggest that the AKT1-induced expressions of glycolysis genes is attenuated in the B6 inbred AKT1 transgenic mice despite a higher rate of proliferation and strongly support the hypothesis that high levels of glycolytic enzymes are associated with a reduction in prostate cell proliferation. In the first PSEM study, the PIN lesions assessed at 8 weeks did not appear to be grossly different among the six F1 strains. However, we found at 8 and 18 months, the inbred B6-AKT1 had a more severe PIN phenotype, as well as higher rates of proliferation than FVB-AKT1 (SI Fig. 15). This provides evidence that proliferation may be a marker of a more severe phenotype. Discussion It is thought that genetic modification of cancer susceptibility or progression may be a general phenomenon and that modifier genes with weak genetic effects may have much greater impact on cancer-related public health than rare high-penetrance alleles (4, 18). 
However, genetic dissection of quantitative traits in model systems that use traditional methods requires multiple often time-consuming steps, including locus detection, localization, gene identification, and functional validation (19). These concerns led us to try to identify functional modifier pathways without the use of genetic markers. Differential gene expression levels are linked to regulatory region polymorphisms in the genome. Here we used mRNA expression levels as surrogates for the patterns of allelic similarities and differences among inbred strains to discern the pathway variations underlying phenotypic differences upon subsequent genetic perturbation. We reasoned that use of parental expression patterns may provide direct causal candidate genes rather than genomic susceptibility loci. We also reasoned that the ultimate goal of modifier studies is not necessarily the identification of a single locus or gene but rather the identification of an opportunity or pathway for therapeutic or preventative intervention. Thus, if one could understand the germ-line variation in terms of the impact on molecular pathway activity and its contribution to a cancer phenotype, one might have a direct handle on points of intervention. Specifically, we modeled the acquisition of somatic mutations in the context of varied germ-line alleles by introducing an activated allele of AKT1 into six distinct heterozygote strains, maintaining a constant single germ-line genome (i.e., FVB) and varying the recipient genome. In this instance, the effects of modifier alleles were reduced to one-half of the homozygous background, whereas the correlation structure was not changed. We rank ordered a variety of inbred strain VP-specific genes based on their correlation with the profile of AKT1-Tg-induced proliferation. By using an Xu et al.

empirical permutation threshold, we identified genes with expressions that significantly associated with the phenotype variation. From this input, GSEA was used to identify the modifier pathways that may be a better reflection of the functional consequences of the genetic variations at the DNA level and may include all of the modifier genes explaining small fractions of genetic variances. In this approach, the use of inbred strains and the highly penetrant nature of the murine PIN phenotype (⬇100%), in which all luminal epithelial cells of PIN lesions expressed Tg-AKT1, serve to minimize environmental, phenotypic, and complex genetic variation (12). These features may provide certain advantages for such model studies in the mouse over the direct study of humans, in which extragenetic variation and phenotype differences are legion. By using this approach, we identified the glycolysis pathway as a negative modifier of AKT1-regulated proliferation in murine ventral prostate. Moreover, the expression of genes across the glycolysis pathways was enriched in the normal prostate tissues of men with prostate cancer who did not recur when compared with those who did. Although these data appear to contradict the finding of elevated glycolysis in most cancers, a key regulator of this pathway, namely Hif1␣, has bifunctional roles depending on the cellular context. For example, in von Hippel Lindau (VHL)-null vs. VHL WT mouse embryonic fibroblasts and fibrosarcomas, elevated Hif1␣ was associated with and required for a decrease in proliferation in the VHL⫺/⫺ cells, which in turn were less aggressive in their in vivo growth rate compared with VHL WT cells (20). Similarly, Carmeliet et al. showed that HIF1␣ knockout cells were unable to arrest under hypoxic conditions, supporting a role for HIF1␣ in an antiproliferative response (21). Moreover, Zwerschke et al. found dramatic up-regulation of glycolytic enzymes in senescent vs. presenescent fibroblasts (22). 
In keeping with these data in human cancers, down-regulation of enolase has been associated with poor outcome (23) and is among the highest scoring genes in our data set. Our results were extended to two independent human data sets, yet the total number of available samples remained small. Nonetheless, it is important to note that there were three independent data sets, that all reflect the same findings (SI Fig. 14), and that this validation provides compelling evidence for the robustness of these findings. Because human samples from men who did not develop prostate cancers were unavailable, we are unable to test whether high expression of the glycolysis pathway in normal tissues could prevent prostate cancer incidence. However, the association of high glycolysis pathway expression with the nonrecurrence outcome supported its role in modifying risk of prostate cancer progression and is consistent with our murine data. It will be of significant interest to expand this study to much larger population-based cohorts. As a first step, it may be necessary to determine whether germ-line variation leading to prostate tissue expression variance can be recapitulated or captured in a surrogate tissue such as lymphocytes. If so, this might allow for robust sample and data acquisition in larger studies in which prostate tissue is unlikely to be available. It is also important to note that although AKT1 was studied here, these results do not preclude the possibility that glycolysis gene expression may modify proliferation in response to other oncogenes or somatically acquired mutations. In conclusion, PSEM is an approach for identifying candidate modifier pathways with germ-line variations that contribute to phenotype variation. From these data, it appears that preexisting variation in glycolysis pathway could contribute to the phenotypic variation in proliferation in the murine prostate and in patient outcome in humans. Methods Mice. 
Mice were bred and housed at a Dana–Farber Cancer Institute animal research facility. Mice bearing a transgene directing the prostate-specific expression of human AKT1 [FVBTg(Pbsn-AKT1wrs9)] have been described previously (12). Female FVB-AKT1 animals were crossed to WT male C57BL/6 (B6), SWR, AKR, BALB/c, and 129X1 mice, which were purchased from The Jackson Laboratory (Bar Harbor, ME), to generate the relevant hybrid F1 mice. Isolation of tail DNA, PCR-based genotyping, prostate dissection, tissue fixation, and H&E staining were performed as described (12). B6-AKT1 mice were generated through continuous backcrossing of female FVB-AKT1 mice to B6 mice for eight generations.

PNAS | November 6, 2007 | vol. 104 | no. 45 | 17775

enriched (SI Fig. 14). These data suggest that basal increases in germ-line expression of glycolytic genes may favor the disease-free outcome in human prostate cancer.

Immunohistochemistry and Quantitation of BrdU Incorporation.

Eight-week-old mice were injected intraperitoneally with 0.2 ml of 10 mg/ml BrdU (Roche, Indianapolis, IN) and killed 16 h after injection. BrdU immunohistochemistry was carried out as previously described (11). BrdU-positive cells in the VP were scored under the ×40 objective of an E200 light microscope (Nikon, Tokyo, Japan). The BrdU index was calculated as the average number of BrdU-positive cells found in eight ×40 fields.

RNA Isolation and Microarray Expression Analysis. Total RNA from the VPs of eight 8-week-old inbred mice for each parental strain was isolated by using the TRIzol method (Invitrogen, Carlsbad, CA), followed by RNeasy MinElute cleanup (Qiagen, Valencia, CA) according to the manufacturers' instructions. Two samples of RNA from the same strain were combined to increase the amount of total RNA for each microarray. Briefly, 10 μg of total RNA was used to generate biotinylated target cRNA, which was hybridized to Affymetrix MOE430A arrays. The CEL files and MAS5-processed expression values have been deposited in the Gene Expression Omnibus (GEO) database (accession no. GSE7270). After setting a lower limit of 10 units and an upper limit of 16,000 units, 2,122 genes passing a variation filter (>5-fold difference ratio and an absolute difference >50) were selected for supervised one-versus-all analysis for strain-specific genes (13) or unsupervised self-organizing map analysis (14) by using GenePattern (24). Statistical significance was assessed by permutation testing in which we compared the observed scores with those derived from 1,000 randomly permuted data sets (16). For the human normal prostate expression data set (15), CEL files were normalized with dChip (25), and expression values were computed and exported for use in GenePattern. Expression values of glycolysis genes were filtered with the criteria of >5-fold difference ratio and absolute difference >50.
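The thresholding and variation filter described above can be illustrated with a minimal sketch. It assumes that the "difference ratio" is the maximum divided by the minimum expression value across samples after clipping to the 10 to 16,000 unit range, and that the "absolute difference" is the maximum minus the minimum; the text does not spell out these definitions, and the function name, gene names, and expression values are hypothetical.

```python
def passes_variation_filter(values, floor=10.0, ceiling=16000.0,
                            min_fold=5.0, min_delta=50.0):
    """Return True if one gene varies enough across samples (assumed definition)."""
    # Clip every value to the stated lower and upper limits first.
    clipped = [min(max(v, floor), ceiling) for v in values]
    lo, hi = min(clipped), max(clipped)
    # Keep the gene only if both the fold ratio and the absolute
    # difference exceed their thresholds.
    return (hi / lo) > min_fold and (hi - lo) > min_delta


# Hypothetical expression values for three genes across four samples.
expression = {
    "gene_a": [12.0, 300.0, 150.0, 80.0],     # large fold change and difference
    "gene_b": [200.0, 260.0, 240.0, 210.0],   # nearly constant
    "gene_c": [2.0, 40.0, 30.0, 5.0],         # low values raised to the floor
}

selected = [gene for gene, vals in expression.items()
            if passes_variation_filter(vals)]
```

Clipping before computing the ratio keeps near-zero measurements from producing inflated fold changes, which is presumably why the limits are set before the filter is applied.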
CEL files for B6-AKT1 mice were combined with those of FVB-AKT1 mice obtained previously (11) and normalized with dChip. Expression values of glycolysis genes were compared by using Student's t test.
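As a reminder of what the comparison above computes, the following sketch evaluates the pooled-variance (Student's) two-sample t statistic for one gene. It is an illustration only: the function name and the expression values are hypothetical, and converting the statistic to a P value would additionally require the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t statistic with pooled variance, plus degrees of freedom."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance of the two groups.
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical expression values of one glycolysis gene in two strains.
b6_values = [820.0, 790.0, 845.0, 810.0]
fvb_values = [610.0, 655.0, 630.0, 640.0]
t_stat, dof = pooled_t_statistic(b6_values, fvb_values)
```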

cDNA from 20 ng of total RNA was then used as a template in a 25-μl PCR with 50 nM primers targeting the 3′ end of the transcripts (SI Table 2). The standard curve was generated by using 2-fold serial dilutions of cDNA from the strain having the highest expression of the gene based on the microarray result.

Correlation Analysis. The expression values of each gene in each parental sample were correlated with the group mean ventral prostate BrdU score for each F1 intercross (see above) by using the Pearson coefficient. Statistical significance of the Pearson correlation coefficients was determined by using permutation tests.

GSEA. GSEA was carried out by using the GSEA-P software, as described previously (8). In this analysis, a pathway file containing 540 (for the MOE430A array) or 440 (for the hgU95Av2 array) metabolic and signaling pathways was used.

Leading-Edge Analysis. The leading edge of a significantly enriched pathway/gene set has been defined by Subramanian et al. (8). A matrix with probe sets in rows and gene sets in columns was constructed in which the presence of a probe set in a given gene set was set to 1 and its absence was set to 0. The relatedness of gene sets is defined by the probe sets they share, which can be identified through hierarchical clustering of all leading-edge probe sets without standardizing the values across rows.

Human Prostate Expression Data Sets. In both described data sets, the recurrent patient was defined as having had two consecutive serum prostate-specific antigen values of >0.2 ng/ml after surgery, whereas the nonrecurrent patient was defined as having no serum prostate-specific antigen elevations after ≥4 years of follow-up. In a study by Singh et al. (15), matched normal and cancer samples were available for 50 patients, 21 of whom were evaluable with respect to recurrence after surgery. The hgU95Av2 array data were available for 18 normal tissue samples of the 21 patients, of which six were recurrent and 12 were nonrecurrent. In a study by Yu et al. (17), among 70 hgU95Av2 array data for cancer samples, 31 were evaluable with respect to recurrence after surgery. hgU95Av2 array data were available for 28 adjacent normal samples from these patients, of which 20 were recurrent and eight were nonrecurrent.

Statistical Analysis. Student's t test, one-way ANOVA, ANOVA with a two-way complete model, and Tukey pairwise comparison were performed with MiniTab software. P values for Pearson correlation coefficients of candidate genes and Fisher's exact test were computed with R software.
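The correlation analysis described in these Methods, a Pearson coefficient between a gene's expression and the BrdU score with significance judged by permutation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that permutation means shuffling the phenotype values across samples, the default of 1,000 permutations is borrowed from the microarray analysis, and all names and data values are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation P value for the Pearson correlation of x and y."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        # Assumption: the null distribution is built by shuffling phenotypes.
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= abs(observed):
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: expression of one gene in six parental samples and
# the mean ventral prostate BrdU score of the corresponding F1 crosses.
gene_expression = [210.0, 340.0, 150.0, 480.0, 390.0, 275.0]
brdu_score = [14.0, 9.5, 16.0, 6.0, 8.0, 11.5]
r, p = permutation_p_value(gene_expression, brdu_score)
```

Repeating this for every gene and comparing against an empirical threshold yields a ranked gene list of the kind that GSEA takes as input.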

7500 Real-Time PCR system and reagents (Applied Biosystems, Foster City, CA). Briefly, 2 μg of total RNA was reverse transcribed into cDNA by using oligo-d(T)16 primer. Equal amounts of

We thank Haiyan Xu for help with statistical analysis; Mark Daly and Andrew Kirby for helpful discussions; and Peter Sicinski, Thomas Benjamin, and Levi Garraway for critical comments. This work was supported by the Linda and Arthur Gelb Center for Translational Research and by National Cancer Institute Grant P01 CA089021.

1. Sellers WR, Meyerson M (2005) J Natl Cancer Inst 97:326–328.
2. Landi MT, Bauer J, Pfeiffer RM, Elder DE, Hulley B, Minghetti P, Calista D, Kanetsky PA, Pinkel D, Bastian BC (2006) Science 313:521–522.
3. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Pukkala E, Skytthe A, Hemminki K (2000) N Engl J Med 343:78–85.
4. Risch NJ (2000) Nature 405:847–856.
5. Flint J, Valdar W, Shifman S, Mott R (2005) Nat Rev Genet 6:271–286.
6. Cuppen E (2005) Trends Genet 21:318–322.
7. Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V, Ruff TG, Milligan SB, Lamb JR, Cavet G, et al. (2003) Nature 422:297–302.
8. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. (2005) Proc Natl Acad Sci USA 102:15545–15550.
9. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, et al. (2003) Nat Genet 34:267–273.
10. Lamb J, Ramaswamy S, Ford HL, Contreras B, Martinez RV, Kittrell FS, Zahnow CA, Patterson N, Golub TR, Ewen ME (2003) Cell 114:323–334.
11. Majumder PK, Febbo PG, Bikoff R, Berger R, Xue Q, McMahon LM, Manola J, Brugarolas J, McDonnell TJ, Golub TR, et al. (2004) Nat Med 10:594–601.
12. Majumder PK, Yeh JJ, George DJ, Febbo PG, Kum J, Xue Q, Bikoff R, Ma H, Kantoff PW, Golub TR, et al. (2003) Proc Natl Acad Sci USA 100:7841–7846.
13. Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, et al. (2001) Proc Natl Acad Sci USA 98:15149–15154.
14. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR (1999) Proc Natl Acad Sci USA 96:2907–2912.
15. Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D'Amico AV, Richie JP, et al. (2002) Cancer Cell 1:203–209.
16. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al. (1999) Science 286:531–537.
17. Yu YP, Landsittel D, Jing L, Nelson J, Ren B, Liu L, McDonald C, Thomas R, Dhir R, Finkelstein S, et al. (2004) J Clin Oncol 22:2790–2799.
18. Balmain A, Gray J, Ponder B (2003) Nat Genet 33(Suppl):238–244.
19. Darvasi A, Pisante-Shalom A (2002) Trends Genet 18:489–491.
20. Mack FA, Patel JH, Biju MP, Haase VH, Simon MC (2005) Mol Cell Biol 25:4565–4578.
21. Carmeliet P, Dor Y, Herbert JM, Fukumura D, Brusselmans K, Dewerchin M, Neeman M, Bono F, Abramovitch R, Maxwell P, et al. (1998) Nature 394:485–490.
22. Zwerschke W, Mazurek S, Stockl P, Hutter E, Eigenbrodt E, Jansen-Durr P (2003) Biochem J 376:403–411.
23. Chang YS, Wu W, Walsh G, Hong WK, Mao L (2003) Clin Cancer Res 9:3641–3644.
24. Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP (2006) Nat Genet 38:500–501.
25. Li C, Wong WH (2001) Proc Natl Acad Sci USA 98:31–36.

Quantitative RT-PCR. Quantitative RT-PCR was carried out with a

17776 | www.pnas.org/cgi/doi/10.1073/pnas.0708476104

Xu et al.